NUMB3RS

Project files: NUMB3RS.zip.

The NUMB3RS example, from Di Jasio (chapter 4), illustrates the multiplication operation for a variety of data types.

The simpler integer types (char, short, and int) are multiplied with a single hardware multiply instruction. Extended-precision integers and floating-point numbers require multiple instructions to perform a multiplication.

The StopWatch window (see Measuring Performance) allows us to measure the number of instruction cycles required to perform various operations.

The final results are shown in the table below:

line #	type	bytes	cycles
20	char	1	6
24	short	2	6
28	int	4	6
32	long long	8	21
36	float	4	71
40	double	8	100

When the C source window is active, stepping through the program proceeds one C statement at a time. When the disassembly window is active, stepping proceeds one instruction at a time. This allows us to check the cycle count instruction by instruction.

For char, short, and int data types, a multiplication statement requires 4 instructions:

24:    s3= s1 * s2;	
   lhu         v1,20(s8)  v1 <= s1
   lhu         v0,22(s8)  v0 <= s2
   mul         v0,v1,v0   v0 <= v1*v0
   sh          v0,24(s8)  s3 <= v0

Two instructions (lhu is load half-word unsigned) load s1 and s2. One instruction performs the multiplication, and the last instruction (sh, store half-word) stores the result to s3. The three load/store instructions execute in one cycle each. The multiply instruction requires three cycles, for a total of six.

Multiplication of two 64-bit integers is performed inline (see the disassembly listing) using three 32-bit multiply instructions (one multu and two mul) and four addu instructions, two of which simply copy one register to another.
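The decomposition can be sketched in portable C. This is an illustration of the technique that the disassembly suggests, not the compiler's actual code. The function name mul64_by_parts and the variable names are hypothetical, and the comments map each step to the instruction it is assumed to correspond to.

```c
#include <stdint.h>

/* Hypothetical helper: 64-bit multiply built from 32-bit pieces. */
uint64_t mul64_by_parts(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a;
    uint32_t a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b;
    uint32_t b_hi = (uint32_t)(b >> 32);

    /* multu: full 64-bit product of the two low words (HI:LO) */
    uint64_t low_product = (uint64_t)a_lo * b_lo;
    uint32_t result_lo = (uint32_t)low_product;          /* mflo */
    uint32_t result_hi = (uint32_t)(low_product >> 32);  /* mfhi */

    /* mul + addu: each cross product contributes only its low
       32 bits to the upper word of the result */
    result_hi += a_lo * b_hi;
    result_hi += b_lo * a_hi;

    /* a_hi * b_hi would land entirely above bit 63, so it is skipped */
    return ((uint64_t)result_hi << 32) | result_lo;
}
```

With the values from the example program, mul64_by_parts(1234, 5678) yields 7006652, the same as a direct 64-bit multiplication.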

The long long multiplication uses 17 instructions. The first two multiply instructions require two cycles each, and the last requires three. These four additional cycles bring the total to 21, as shown in the table above. The extra cycle for the last multiplication is probably because its result is used by the instruction that follows, so the instruction pipeline must stall for one cycle.
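The cycle totals in the table can be checked with simple arithmetic. The sketch below assumes that every non-multiply instruction (load, store, mflo, mfhi, addu) takes one cycle, which is what the measured totals imply. The helper name statement_cycles is hypothetical.

```c
/* Hypothetical helper: total cycles for one C statement.
 * single_cycle_count: instructions assumed to take one cycle each
 * multiply_cycles:    measured cycles for each multiply instruction
 * multiply_count:     number of entries in multiply_cycles */
int statement_cycles(int single_cycle_count,
                     const int *multiply_cycles,
                     int multiply_count)
{
    int total = single_cycle_count;
    int i;

    for (i = 0; i < multiply_count; i++) {
        total += multiply_cycles[i];
    }
    return total;
}
```

For the short multiplication, three single-cycle instructions plus one three-cycle multiply give 6. For the long long multiplication, 14 single-cycle instructions plus multiplies of 2, 2, and 3 cycles give 21.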

Multiplication of float and double numbers is performed by an external function.

`NUMB3RS4.c`

01: /*
02: ** NUMB3RS
03: **
04: ** Example4
05: */
06: 
07: #include <p32xxxx.h>
08: 
09: main ()
10: {
11:     char        c1, c2, c3;
12:     short       s1, s2, s3;
13:     int         i1, i2, i3;
14:     long long   ll1, ll2, ll3;
15:     float       f1,f2, f3;
16:     double      d1, d2, d3;
17:     
18:     c1 = 12;        // testing char integers (8-bit)
19:     c2 = 34;
20:     c3 = c1 * c2;
21:     
22:     s1 = 1234;      // testing short integers (16-bit)
23:     s2 = 5678;  
24:     s3= s1 * s2;        
25: 
26:     i1 = 1234567;     // testing (long) integers (32-bit)
27:     i2 = 3456789;       
28:     i3= i1 * i2;
29:     
30:     ll1 = 1234L;   // testing long long integers (64-bit)
31:     ll2 = 5678L;        
32:     ll3= ll1 * ll2;     
33: 
34:     f1 = 12.34;     // testing single precision floating point
35:     f2 = 56.78;
36:     f3= f1 * f2;
37: 
38:     d1 = 12.34L;    // testing double precision floating point
39:     d2 = 56.78L;        
40:     d3= d1 * d2;        
41: } // main

Disassembly Listing

---  C:\pic32\NUMB3RS\NUMB3RS4.c  ----------------------------------------------------------------
1:                   /*
2:                   ** NUMB3RS
3:                   **
4:                   ** Example4
5:                   */
6:                   
7:                   #include <p32xxxx.h>
8:                   
9:                   main ()
10:                  {
9D000018  27BDFF90   addiu       sp,sp,-112
9D00001C  AFBF006C   sw          ra,108(sp)
9D000020  AFBE0068   sw          s8,104(sp)
9D000024  03A0F021   addu        s8,sp,zero
11:                      char        c1, c2, c3;
12:                      short       s1, s2, s3;
13:                      int         i1, i2, i3;
14:                      long long   ll1, ll2, ll3;
15:                      float       f1,f2, f3;
16:                      double      d1, d2, d3;
17:                      
18:                      c1 = 12;        // testing char integers (8-bit)
9D000028  2402000C   addiu       v0,zero,12
9D00002C  A3C20010   sb          v0,16(s8)
19:                      c2 = 34;
9D000030  24020022   addiu       v0,zero,34
9D000034  A3C20011   sb          v0,17(s8)
20:                      c3 = c1 * c2;
9D000038  93C30010   lbu         v1,16(s8)
9D00003C  93C20011   lbu         v0,17(s8)
9D000040  70621002   mul         v0,v1,v0
9D000044  A3C20012   sb          v0,18(s8)
21:                      
22:                      s1 = 1234;      // testing short integers (16-bit)
9D000048  240204D2   addiu       v0,zero,1234
9D00004C  A7C20014   sh          v0,20(s8)
23:                      s2 = 5678;	
9D000050  2402162E   addiu       v0,zero,5678
9D000054  A7C20016   sh          v0,22(s8)
24:                      s3= s1 * s2;	
9D000058  97C30014   lhu         v1,20(s8)
9D00005C  97C20016   lhu         v0,22(s8)
9D000060  70621002   mul         v0,v1,v0
9D000064  A7C20018   sh          v0,24(s8)
25:                  
26:                      i1 = 1234567;     // testing (long) integers (32-bit)
9D000068  3C020012   lui         v0,0x12
9D00006C  3442D687   ori         v0,v0,0xd687
9D000070  AFC2001C   sw          v0,28(s8)
27:                      i2 = 3456789;	
9D000074  3C020034   lui         v0,0x34
9D000078  3442BF15   ori         v0,v0,0xbf15
9D00007C  AFC20020   sw          v0,32(s8)
28:                      i3= i1 * i2;
9D000080  8FC3001C   lw          v1,28(s8)
9D000084  8FC20020   lw          v0,32(s8)
9D000088  70621002   mul         v0,v1,v0
9D00008C  AFC20024   sw          v0,36(s8)
29:                      
30:                      ll1 = 1234L;   // testing long long integers (64-bit)
9D000090  240204D2   addiu       v0,zero,1234
9D000094  00001821   addu        v1,zero,zero
9D000098  AFC20028   sw          v0,40(s8)
9D00009C  AFC3002C   sw          v1,44(s8)
31:                      ll2 = 5678L;	
9D0000A0  2402162E   addiu       v0,zero,5678
9D0000A4  00001821   addu        v1,zero,zero
9D0000A8  AFC20030   sw          v0,48(s8)
9D0000AC  AFC30034   sw          v1,52(s8)
32:                      ll3= ll1 * ll2;	
9D0000B0  8FC30028   lw          v1,40(s8)
9D0000B4  8FC20030   lw          v0,48(s8)
9D0000B8  00620019   multu       v1,v0
9D0000BC  00002012   mflo        a0
9D0000C0  00002810   mfhi        a1
9D0000C4  8FC30028   lw          v1,40(s8)
9D0000C8  8FC20034   lw          v0,52(s8)
9D0000CC  70621802   mul         v1,v1,v0
9D0000D0  00A01021   addu        v0,a1,zero
9D0000D4  00431021   addu        v0,v0,v1
9D0000D8  8FC60030   lw          a2,48(s8)
9D0000DC  8FC3002C   lw          v1,44(s8)
9D0000E0  70C31802   mul         v1,a2,v1
9D0000E4  00431021   addu        v0,v0,v1
9D0000E8  00402821   addu        a1,v0,zero
9D0000EC  AFC40038   sw          a0,56(s8)
9D0000F0  AFC5003C   sw          a1,60(s8)
33:                  
34:                      f1 = 12.34;	    // testing single precision floating point
9D0000F4  3C029D07   lui         v0,0x9d07
9D0000F8  8C420DF0   lw          v0,3568(v0)
9D0000FC  AFC20040   sw          v0,64(s8)
35:                      f2 = 56.78;
9D000100  3C029D07   lui         v0,0x9d07
9D000104  8C420DF4   lw          v0,3572(v0)
9D000108  AFC20044   sw          v0,68(s8)
36:                      f3= f1 * f2;
9D00010C  8FC40040   lw          a0,64(s8)
9D000110  8FC50044   lw          a1,68(s8)
9D000114  0F4002E3   jal         0x9d000b8c
9D000118  00000000   nop         
9D00011C  AFC20048   sw          v0,72(s8)
37:                  
38:                      d1 = 12.34L;    // testing double precision floating point
9D000120  3C029D07   lui         v0,0x9d07
9D000124  8C430DFC   lw          v1,3580(v0)
9D000128  8C420DF8   lw          v0,3576(v0)
9D00012C  AFC20050   sw          v0,80(s8)
9D000130  AFC30054   sw          v1,84(s8)
39:                      d2 = 56.78L;	
9D000134  3C029D07   lui         v0,0x9d07
9D000138  8C430E04   lw          v1,3588(v0)
9D00013C  8C420E00   lw          v0,3584(v0)
9D000140  AFC20058   sw          v0,88(s8)
9D000144  AFC3005C   sw          v1,92(s8)
40:                      d3= d1 * d2;	
9D000148  8FC40050   lw          a0,80(s8)
9D00014C  8FC50054   lw          a1,84(s8)
9D000150  8FC60058   lw          a2,88(s8)
9D000154  8FC7005C   lw          a3,92(s8)
9D000158  0F4001B1   jal         0x9d0006c4
9D00015C  00000000   nop         
9D000160  AFC20060   sw          v0,96(s8)
9D000164  AFC30064   sw          v1,100(s8)
41:                  } // main
9D000168  03C0E821   addu        sp,s8,zero
9D00016C  8FBF006C   lw          ra,108(sp)
9D000170  8FBE0068   lw          s8,104(sp)
9D000174  27BD0070   addiu       sp,sp,112
9D000178  03E00008   jr          ra
9D00017C  00000000   nop

Measuring Performance

After building the project, open the StopWatch window (select Debugger -- StopWatch from the top menu).

Zero the stopwatch and execute a Step-Over command at a multiplication statement. You must manually record the number of instruction cycles required to execute the statement before proceeding to the next statement whose performance you wish to measure.

Reference

Lucio Di Jasio, Programming 32-bit Microcontrollers in C: Exploring the PIC32, Newnes (Elsevier), 2008. ISBN 978-0-7506-8709-5. Chapter 4, NUMB3RS.

Exercises

Test the performance of the division operation for the various data types, using the methodology described above.
Measure the performance of floating-point addition and subtraction.
Test the performance of the trigonometric functions relative to the standard arithmetic operations.
Test the performance of multiplication for complex data types (see Di Jasio, pp. 77-78).
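A possible starting point for the first exercise is sketched below. It mirrors the structure of NUMB3RS4.c with division in place of multiplication. The names DivisionResults and divide_all_types are hypothetical, and the operands are the ones from the original program with the larger value as the dividend. On the PIC32 the function would be called from main, with each division statement stepped over while the StopWatch is zeroed.

```c
/* Hypothetical results holder so each quotient can be inspected. */
typedef struct {
    char      c3;
    short     s3;
    int       i3;
    long long ll3;
    float     f3;
    double    d3;
} DivisionResults;

/* One division statement per data type, as in NUMB3RS4.c. */
DivisionResults divide_all_types(void)
{
    DivisionResults r;
    char      c1 = 34,        c2 = 12;
    short     s1 = 5678,      s2 = 1234;
    int       i1 = 3456789,   i2 = 1234567;
    long long ll1 = 5678LL,   ll2 = 1234LL;
    float     f1 = 56.78f,    f2 = 12.34f;
    double    d1 = 56.78,     d2 = 12.34;

    r.c3  = c1 / c2;      /* char (8-bit) */
    r.s3  = s1 / s2;      /* short (16-bit) */
    r.i3  = i1 / i2;      /* int (32-bit) */
    r.ll3 = ll1 / ll2;    /* long long (64-bit) */
    r.f3  = f1 / f2;      /* single-precision floating point */
    r.d3  = d1 / d2;      /* double-precision floating point */

    return r;
}
```

Build without optimization when measuring, since an optimizing compiler may evaluate these constant expressions at compile time and leave nothing to time.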

Maintained by John Loomis, updated Tue Aug 05 10:26:50 2008