Project files: NUMB3RS.zip.
The NUMB3RS example, from Di Jasio (chapter 4), illustrates multiplication for a variety of data types.
The simpler integers (char, short, and int) use a single hardware multiply instruction. Extended-precision integers and floating-point numbers require multiple instructions to perform a multiplication.
The StopWatch window (see Measuring Performance) allows us to count the instruction cycles needed to perform various operations.
The final results are shown in the table below:
line # | type | bytes | cycles |
---|---|---|---|
20 | char | 1 | 6 |
24 | short | 2 | 6 |
28 | int | 4 | 6 |
32 | long long | 8 | 21 |
36 | float | 4 | 71 |
40 | double | 8 | 100 |
When the C source window is active, stepping through the program proceeds one C statement at a time. When the disassembly window is active, stepping through the program proceeds one instruction at a time. This allows us to check the cycle time instruction by instruction.
For char, short, and int data types, a multiplication statement requires 4 instructions:
    24: s3= s1 * s2;
        lhu v1,20(s8)      v1 <= s1
        lhu v0,22(s8)      v0 <= s2
        mul v0,v1,v0       v0 <= v1*v0
        sh  v0,24(s8)      s3 <= v0
Two instructions (lhu is load half-word unsigned) are used to load s1 and s2. The multiplication requires one instruction. The last instruction stores the result to s3. The load/store instructions execute in one cycle each. The multiply operation requires three cycles, for a total of six.
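One consequence of this load, multiply, store sequence is worth checking on a host compiler: mul produces a 32-bit result, but sb and sh store only its low 8 or 16 bits. The sketch below reproduces that narrowing for the char and short operands used in the program. The helper names low8_product and low16_product are hypothetical, and unsigned result types are an assumption made here so that the narrowing is well defined in standard C.

```c
#include <stdint.h>

/* Hypothetical helpers: keep only the bits that sb / sh would store. */

/* Low 8 bits of a product, as stored by "sb" after "mul". */
uint8_t low8_product(int8_t a, int8_t b)
{
    int32_t full = (int32_t)a * b;   /* full-width product, like mul */
    return (uint8_t)full;            /* keep bits 7..0 */
}

/* Low 16 bits of a product, as stored by "sh" after "mul". */
uint16_t low16_product(int16_t a, int16_t b)
{
    int32_t full = (int32_t)a * b;   /* full-width product, like mul */
    return (uint16_t)full;           /* keep bits 15..0 */
}
```

With the values from the listing, low8_product(12, 34) keeps 152 of the full product 408, and low16_product(1234, 5678) keeps 59836 of the full product 7006652.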
Multiplication of two 64-bit integers is performed inline (see disassembly) using three 32-bit multiply instructions and four additions.
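The same decomposition can be written out in C. This is a minimal sketch of the technique, not the compiler's actual code: the function name mul64_from_32 is hypothetical, and unsigned types are assumed so that wraparound is well defined. The comments map each step to the corresponding instruction in the disassembly.

```c
#include <stdint.h>

/* Hypothetical helper: 64-bit product (low 64 bits) from 32-bit pieces. */
uint64_t mul64_from_32(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a;
    uint32_t a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b;
    uint32_t b_hi = (uint32_t)(b >> 32);

    /* multu: full 32x32 -> 64-bit product of the two low words */
    uint64_t low_product = (uint64_t)a_lo * b_lo;
    uint32_t lo = (uint32_t)low_product;          /* mflo */
    uint32_t hi = (uint32_t)(low_product >> 32);  /* mfhi */

    /* mul + addu: only the low 32 bits of each cross product matter,
       because anything higher falls outside the 64-bit result */
    hi += a_lo * b_hi;
    hi += b_lo * a_hi;

    /* the a_hi * b_hi term is shifted out entirely and never computed */
    return ((uint64_t)hi << 32) | lo;
}
```

For the operands in the program, mul64_from_32(1234, 5678) returns 7006652, the same value as a native 64-bit multiply.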
There are 17 instructions used for the long long multiplication. The first two multiplication instructions require two cycles each. The last multiplication instruction requires three cycles. These additional cycles bring the total to 21, as shown in the table above. The additional cycle delay for the last multiplication is probably because the result is used in the following instruction, so that the instruction pipeline must be delayed by one cycle.
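The cycle count can be checked with a small tally. The sketch below assumes the simple model described in the text: one cycle per instruction, plus extra cycles for each multiply. The function name total_cycles is hypothetical.

```c
/* Hypothetical helper: one cycle per instruction plus extra multiply cycles. */
int total_cycles(int instructions, const int *extra, int n_extra)
{
    int total = instructions;        /* base cost: one cycle each */
    for (int i = 0; i < n_extra; i++) {
        total += extra[i];           /* cycles beyond the first, per multiply */
    }
    return total;
}
```

For the long long case, 17 instructions with extra cycles of 1, 1, and 2 for the three multiplies give 21. For the char, short, and int cases, 4 instructions with 2 extra cycles for the single three-cycle multiply give 6.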
Multiplication of float and double numbers is performed by an external function.
NUMB3RS4.c
    01: /*
    02: ** NUMB3RS
    03: **
    04: ** Example4
    05: */
    06:
    07: #include <p32xxxx.h>
    08:
    09: main ()
    10: {
    11:     char c1, c2, c3;
    12:     short s1, s2, s3;
    13:     int i1, i2, i3;
    14:     long long ll1, ll2, ll3;
    15:     float f1,f2, f3;
    16:     double d1, d2, d3;
    17:
    18:     c1 = 12; // testing char integers (8-bit)
    19:     c2 = 34;
    20:     c3 = c1 * c2;
    21:
    22:     s1 = 1234; // testing short integers (16-bit)
    23:     s2 = 5678;
    24:     s3= s1 * s2;
    25:
    26:     i1 = 1234567; // testing (long) integers (32-bit)
    27:     i2 = 3456789;
    28:     i3= i1 * i2;
    29:
    30:     ll1 = 1234L; // testing long long integers (64-bit)
    31:     ll2 = 5678L;
    32:     ll3= ll1 * ll2;
    33:
    34:     f1 = 12.34; // testing single precision floating point
    35:     f2 = 56.78;
    36:     f3= f1 * f2;
    37:
    38:     d1 = 12.34L; // testing double precision floating point
    39:     d2 = 56.78L;
    40:     d3= d1 * d2;
    41: } // main
    --- C:\pic32\NUMB3RS\NUMB3RS4.c ----------------------------------------------------------------
    1: /*
    2: ** NUMB3RS
    3: **
    4: ** Example4
    5: */
    6:
    7: #include <p32xxxx.h>
    8:
    9: main ()
    10: {
        9D000018 27BDFF90 addiu sp,sp,-112
        9D00001C AFBF006C sw ra,108(sp)
        9D000020 AFBE0068 sw s8,104(sp)
        9D000024 03A0F021 addu s8,sp,zero
    11: char c1, c2, c3;
    12: short s1, s2, s3;
    13: int i1, i2, i3;
    14: long long ll1, ll2, ll3;
    15: float f1,f2, f3;
    16: double d1, d2, d3;
    17:
    18: c1 = 12; // testing char integers (8-bit)
        9D000028 2402000C addiu v0,zero,12
        9D00002C A3C20010 sb v0,16(s8)
    19: c2 = 34;
        9D000030 24020022 addiu v0,zero,34
        9D000034 A3C20011 sb v0,17(s8)
    20: c3 = c1 * c2;
        9D000038 93C30010 lbu v1,16(s8)
        9D00003C 93C20011 lbu v0,17(s8)
        9D000040 70621002 mul v0,v1,v0
        9D000044 A3C20012 sb v0,18(s8)
    21:
    22: s1 = 1234; // testing short integers (16-bit)
        9D000048 240204D2 addiu v0,zero,1234
        9D00004C A7C20014 sh v0,20(s8)
    23: s2 = 5678;
        9D000050 2402162E addiu v0,zero,5678
        9D000054 A7C20016 sh v0,22(s8)
    24: s3= s1 * s2;
        9D000058 97C30014 lhu v1,20(s8)
        9D00005C 97C20016 lhu v0,22(s8)
        9D000060 70621002 mul v0,v1,v0
        9D000064 A7C20018 sh v0,24(s8)
    25:
    26: i1 = 1234567; // testing (long) integers (32-bit)
        9D000068 3C020012 lui v0,0x12
        9D00006C 3442D687 ori v0,v0,0xd687
        9D000070 AFC2001C sw v0,28(s8)
    27: i2 = 3456789;
        9D000074 3C020034 lui v0,0x34
        9D000078 3442BF15 ori v0,v0,0xbf15
        9D00007C AFC20020 sw v0,32(s8)
    28: i3= i1 * i2;
        9D000080 8FC3001C lw v1,28(s8)
        9D000084 8FC20020 lw v0,32(s8)
        9D000088 70621002 mul v0,v1,v0
        9D00008C AFC20024 sw v0,36(s8)
    29:
    30: ll1 = 1234L; // testing long long integers (64-bit)
        9D000090 240204D2 addiu v0,zero,1234
        9D000094 00001821 addu v1,zero,zero
        9D000098 AFC20028 sw v0,40(s8)
        9D00009C AFC3002C sw v1,44(s8)
    31: ll2 = 5678L;
        9D0000A0 2402162E addiu v0,zero,5678
        9D0000A4 00001821 addu v1,zero,zero
        9D0000A8 AFC20030 sw v0,48(s8)
        9D0000AC AFC30034 sw v1,52(s8)
    32: ll3= ll1 * ll2;
        9D0000B0 8FC30028 lw v1,40(s8)
        9D0000B4 8FC20030 lw v0,48(s8)
        9D0000B8 00620019 multu v1,v0
        9D0000BC 00002012 mflo a0
        9D0000C0 00002810 mfhi a1
        9D0000C4 8FC30028 lw v1,40(s8)
        9D0000C8 8FC20034 lw v0,52(s8)
        9D0000CC 70621802 mul v1,v1,v0
        9D0000D0 00A01021 addu v0,a1,zero
        9D0000D4 00431021 addu v0,v0,v1
        9D0000D8 8FC60030 lw a2,48(s8)
        9D0000DC 8FC3002C lw v1,44(s8)
        9D0000E0 70C31802 mul v1,a2,v1
        9D0000E4 00431021 addu v0,v0,v1
        9D0000E8 00402821 addu a1,v0,zero
        9D0000EC AFC40038 sw a0,56(s8)
        9D0000F0 AFC5003C sw a1,60(s8)
    33:
    34: f1 = 12.34; // testing single precision floating point
        9D0000F4 3C029D07 lui v0,0x9d07
        9D0000F8 8C420DF0 lw v0,3568(v0)
        9D0000FC AFC20040 sw v0,64(s8)
    35: f2 = 56.78;
        9D000100 3C029D07 lui v0,0x9d07
        9D000104 8C420DF4 lw v0,3572(v0)
        9D000108 AFC20044 sw v0,68(s8)
    36: f3= f1 * f2;
        9D00010C 8FC40040 lw a0,64(s8)
        9D000110 8FC50044 lw a1,68(s8)
        9D000114 0F4002E3 jal 0x9d000b8c
        9D000118 00000000 nop
        9D00011C AFC20048 sw v0,72(s8)
    37:
    38: d1 = 12.34L; // testing double precision floating point
        9D000120 3C029D07 lui v0,0x9d07
        9D000124 8C430DFC lw v1,3580(v0)
        9D000128 8C420DF8 lw v0,3576(v0)
        9D00012C AFC20050 sw v0,80(s8)
        9D000130 AFC30054 sw v1,84(s8)
    39: d2 = 56.78L;
        9D000134 3C029D07 lui v0,0x9d07
        9D000138 8C430E04 lw v1,3588(v0)
        9D00013C 8C420E00 lw v0,3584(v0)
        9D000140 AFC20058 sw v0,88(s8)
        9D000144 AFC3005C sw v1,92(s8)
    40: d3= d1 * d2;
        9D000148 8FC40050 lw a0,80(s8)
        9D00014C 8FC50054 lw a1,84(s8)
        9D000150 8FC60058 lw a2,88(s8)
        9D000154 8FC7005C lw a3,92(s8)
        9D000158 0F4001B1 jal 0x9d0006c4
        9D00015C 00000000 nop
        9D000160 AFC20060 sw v0,96(s8)
        9D000164 AFC30064 sw v1,100(s8)
    41: } // main
        9D000168 03C0E821 addu sp,s8,zero
        9D00016C 8FBF006C lw ra,108(sp)
        9D000170 8FBE0068 lw s8,104(sp)
        9D000174 27BD0070 addiu sp,sp,112
        9D000178 03E00008 jr ra
        9D00017C 00000000 nop
After building the project, open the StopWatch window (select Debugger > StopWatch from the top menu).
Zero the stopwatch and execute a Step-Over command at a multiplication statement. You must manually record the instruction cycles required to execute the statement before proceeding to the next statement whose performance you wish to measure.
Lucio Di Jasio, Programming 32-bit Microcontrollers in C, Exploring the PIC32, Newnes (Elsevier), 2008. ISBN 978-0-7506-8709-5. Chapter 4 NUMB3RS
Maintained by John Loomis, updated Tue Aug 05 10:26:50 2008