# Lab2: RISC-V RV32I[MA] emulator with ELF support

contributed by < `xl86305955` >

###### tags: `Computer Architecture`, `RISC-V`

## Rewrite Programs from [Assignment1: RISC-V Assembly and Instruction Pipeline](https://hackmd.io/@jserv/By5OE6fOr)

### Factorial C code:

```clike=
#define NUM 7

void main()
{
    volatile char* tx = (volatile char*) 0x40002000;
    const char* output = "7! = ";
    int mul = 1;
    int o_int;
    char o_char[50];

    /* print the prefix "7! = " */
    while (*output) {
        *tx = *output;
        output++;
    }

    /* mul = NUM! */
    for (int i = 1; i <= NUM; i++)
        mul *= i;

    int tmp;
    int count = 0;
    int flag = 1;
    o_int = mul;
    /* store the decimal digits, least significant first */
    do {
        if (o_int < 10) {
            flag = 0;
        }
        tmp = o_int % 10;
        if (tmp == 0) o_char[count] = '0';
        if (tmp == 1) o_char[count] = '1';
        if (tmp == 2) o_char[count] = '2';
        if (tmp == 3) o_char[count] = '3';
        if (tmp == 4) o_char[count] = '4';
        if (tmp == 5) o_char[count] = '5';
        if (tmp == 6) o_char[count] = '6';
        if (tmp == 7) o_char[count] = '7';
        if (tmp == 8) o_char[count] = '8';
        if (tmp == 9) o_char[count] = '9';
        count++;
        o_int = o_int / 10;
    } while (o_int > 10 || flag == 1);

    /* print out the value of result */
    /* note: count is one past the last stored digit here, so the first
       iteration writes o_char[count], a slot that was never filled */
    for (; count >= 0; count--) {
        *tx = o_char[count];
    }
}
```

Without optimization:

```
$ riscv-none-embed-gcc frac.c setup.o -march=rv32im -mabi=ilp32 -nostdlib -T link.ld -o frac
$ ./emu-rv32i frac
7! = 5040
>>> Execution time: 121626 ns
>>> Instruction count: 453 (IPS=3724532)
>>> Jumps: 65 (14.35%) - 44 forwards, 21 backwards
>>> Branching T=59 (84.29%) F=11 (15.71%)
```

With `-O2`:

```
$ riscv-none-embed-gcc frac.c setup.o -march=rv32im -mabi=ilp32 -O2 -nostdlib -T link.ld -o frac
$ ./emu-rv32i frac
7! = 5040
>>> Execution time: 90074 ns
>>> Instruction count: 197 (IPS=2187090)
>>> Jumps: 34 (17.26%) - 20 forwards, 14 backwards
>>> Branching T=30 (75.00%) F=10 (25.00%)
```

With `-O3`:

```
$ make frac
riscv-none-embed-gcc frac.c setup.o -march=rv32im -mabi=ilp32 -O3 -nostdlib -T link.ld -o frac
$ ./emu-rv32i frac
7! = 5040
>>> Execution time: 78733 ns
>>> Instruction count: 197 (IPS=2502127)
>>> Jumps: 34 (17.26%) - 20 forwards, 14 backwards
>>> Branching T=30 (75.00%) F=10 (25.00%)
```

|                   | Without optimization        | O2                          | O3                          |
| ----------------- | --------------------------- | --------------------------- | --------------------------- |
| Execution time    | 121626 ns                   | 90074 ns                    | 78733 ns                    |
| Instruction count | 453                         | 197                         | 197                         |
| Jumps             | 65                          | 34                          | 34                          |
| Branching         | T=59 (84.29%) F=11 (15.71%) | T=30 (75.00%) F=10 (25.00%) | T=30 (75.00%) F=10 (25.00%) |

The instruction count is the same under `-O2` and `-O3`. The disassembly explains why: for this program the two optimization levels generate the same assembly code. The emulator therefore executes the same 197 instructions in both cases, so the gap in execution time (90074 ns vs. 78733 ns) is most likely run-to-run noise on the host.

```
$ riscv-none-embed-objdump -d frac

frac:     file format elf32-littleriscv


Disassembly of section .text:

00000194 <main>:
  194:  f9010113  addi   sp,sp,-112
  198:  000007b7  lui    a5,0x0
  19c:  06812623  sw     s0,108(sp)
  1a0:  06912423  sw     s1,104(sp)
  1a4:  07212223  sw     s2,100(sp)
  1a8:  07312023  sw     s3,96(sp)
  1ac:  05412e23  sw     s4,92(sp)
  1b0:  05512c23  sw     s5,88(sp)
  1b4:  05612a23  sw     s6,84(sp)
  1b8:  05712823  sw     s7,80(sp)
  1bc:  05812623  sw     s8,76(sp)
  1c0:  05912423  sw     s9,72(sp)
  1c4:  05a12223  sw     s10,68(sp)
  1c8:  05b12023  sw     s11,64(sp)
  1cc:  34078793  addi   a5,a5,832  # 340 <__rodata_start>
  1d0:  03700713  li     a4,55
  1d4:  400026b7  lui    a3,0x40002
  1d8:  00e68023  sb     a4,0(a3)  # 40002000 <__stack+0x40000cb8>
  1dc:  00178793  addi   a5,a5,1
  1e0:  0007c703  lbu    a4,0(a5)
  1e4:  fe071ae3  bnez   a4,1d8 <main+0x44>
  1e8:  00001637  lui    a2,0x1
  1ec:  3b060613  addi   a2,a2,944  # 13b0 <__stack+0x68>
  1f0:  00a00513  li     a0,10
  1f4:  02a667b3  rem    a5,a2,a0
  1f8:  00c10693  addi   a3,sp,12
  1fc:  00100593  li     a1,1
  200:  00068713  mv     a4,a3
  204:  00900d93  li     s11,9
  208:  03900d13  li     s10,57
  20c:  00700c93  li     s9,7
  210:  00800c13  li     s8,8
  214:  03800b93  li     s7,56
  218:  03700b13  li     s6,55
  21c:  00600a93  li     s5,6
  220:  03600a13  li     s4,54
  224:  00500993  li     s3,5
  228:  03500913  li     s2,53
  22c:  00100813  li     a6,1
  230:  00200493  li     s1,2
  234:  00300413  li     s0,3
  238:  00400393  li     t2,4
  23c:  03400293  li     t0,52
  240:  03300f93  li     t6,51
  244:  03000e93  li     t4,48
  248:  40d58333  sub    t1,a1,a3
  24c:  06d00893  li     a7,109
  250:  06300e13  li     t3,99
  254:  08079863  bnez   a5,2e4 <main+0x150>
  258:  01d70023  sb     t4,0(a4)
  25c:  0b379263  bne    a5,s3,300 <main+0x16c>
  260:  01270023  sb     s2,0(a4)
  264:  00e307b3  add    a5,t1,a4
  268:  06c8c063  blt    a7,a2,2c8 <main+0x134>
  26c:  05058e63  beq    a1,a6,2c8 <main+0x134>
  270:  00f687b3  add    a5,a3,a5
  274:  400025b7  lui    a1,0x40002
  278:  0080006f  j      280 <main+0xec>
  27c:  00060793  mv     a5,a2
  280:  0007c703  lbu    a4,0(a5)
  284:  fff78613  addi   a2,a5,-1
  288:  00e58023  sb     a4,0(a1)  # 40002000 <__stack+0x40000cb8>
  28c:  fef698e3  bne    a3,a5,27c <main+0xe8>
  290:  06c12403  lw     s0,108(sp)
  294:  06812483  lw     s1,104(sp)
  298:  06412903  lw     s2,100(sp)
  29c:  06012983  lw     s3,96(sp)
  2a0:  05c12a03  lw     s4,92(sp)
  2a4:  05812a83  lw     s5,88(sp)
  2a8:  05412b03  lw     s6,84(sp)
  2ac:  05012b83  lw     s7,80(sp)
  2b0:  04c12c03  lw     s8,76(sp)
  2b4:  04812c83  lw     s9,72(sp)
  2b8:  04412d03  lw     s10,68(sp)
  2bc:  04012d83  lw     s11,64(sp)
  2c0:  07010113  addi   sp,sp,112
  2c4:  00008067  ret
  2c8:  00ce27b3  slt    a5,t3,a2
  2cc:  02a64633  div    a2,a2,a0
  2d0:  40f007b3  neg    a5,a5
  2d4:  00f5f5b3  and    a1,a1,a5
  2d8:  00170713  addi   a4,a4,1
  2dc:  02a667b3  rem    a5,a2,a0
  2e0:  f6078ce3  beqz   a5,258 <main+0xc4>
  2e4:  01079863  bne    a5,a6,2f4 <main+0x160>
  2e8:  03100f13  li     t5,49
  2ec:  01e70023  sb     t5,0(a4)
  2f0:  f6dff06f  j      25c <main+0xc8>
  2f4:  00979c63  bne    a5,s1,30c <main+0x178>
  2f8:  03200f13  li     t5,50
  2fc:  01e70023  sb     t5,0(a4)
  300:  01579a63  bne    a5,s5,314 <main+0x180>
  304:  01470023  sb     s4,0(a4)
  308:  f5dff06f  j      264 <main+0xd0>
  30c:  02879063  bne    a5,s0,32c <main+0x198>
  310:  01f70023  sb     t6,0(a4)
  314:  01979663  bne    a5,s9,320 <main+0x18c>
  318:  01670023  sb     s6,0(a4)
  31c:  f49ff06f  j      264 <main+0xd0>
  320:  01879a63  bne    a5,s8,334 <main+0x1a0>
  324:  01770023  sb     s7,0(a4)
  328:  f3dff06f  j      264 <main+0xd0>
  32c:  f27798e3  bne    a5,t2,25c <main+0xc8>
  330:  00570023  sb     t0,0(a4)
  334:  f3b798e3  bne    a5,s11,264 <main+0xd0>
  338:  01a70023  sb     s10,0(a4)
  33c:  f29ff06f  j      264 <main+0xd0>
```

Different optimization levels also change the start of the section headers. The section header table follows the section contents, and the optimized `.text` is smaller.

With `-O3`:

```
$ riscv-none-embed-readelf -h frac
  Start of section headers:          1556 (bytes into file)
```

Without optimization:

```
$ riscv-none-embed-readelf -h frac
  Start of section headers:          1744 (bytes into file)
```

#### Try to reduce instruction count

Brute force (without optimization), replacing the `for` loop with explicit multiplications:

```clike=
int a = 1;
int b = 2;
int c = 3;
int d = 4;
int e = 5;
int f = 6;
int g = 7;
mul = a * b * c * d * e * f * g;
```

```
$ ./emu-rv32i frac
7! = 5040
>>> Execution time: 100834 ns
>>> Instruction count: 405 (IPS=4016502)
>>> Jumps: 57 (14.07%) - 43 forwards, 14 backwards
>>> Branching T=52 (83.87%) F=10 (16.13%)
```

This removes 48 instructions (453 to 405). It also saves 8 jumps (65 to 57, with 7 fewer backward jumps now that the loop is gone) and 8 branches (70 to 62).

With `-O2` or `-O3`, however, the result is the same as the `for` loop version. In both cases GCC evaluates 7! at compile time. The optimized code only loads the constant 5040 (0x13b0, via `lui a2,0x1` and `addi a2,a2,944` in the listing above) and performs no multiplication at run time.
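In the optimized listing, the constant 5040 (0x13b0) reaches `a2` through the pair `lui a2,0x1` / `addi a2,a2,944`. Because `addi` sign-extends its 12-bit immediate, the upper 20 bits must be rounded up whenever bit 11 of the constant is set. The C sketch below reproduces that split so such pairs can be decoded by hand. `split_li` is a hypothetical helper name, not part of any toolchain, and the sketch assumes the standard RV32I `lui`/`addi` encoding.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the immediates of a `lui rd, upper20`
 * followed by `addi rd, rd, lower12` that together load `value`.
 * addi sign-extends its 12-bit immediate, so adding 0x800 before the
 * shift rounds the upper part up when bit 11 of value is set.
 */
static void split_li(uint32_t value, uint32_t *upper20, int32_t *lower12)
{
    uint32_t hi = ((value + 0x800u) >> 12) & 0xFFFFFu; /* lui immediate */
    /* amount addi must add after lui has placed hi << 12 in the register */
    int32_t lo = (int32_t)(value - (hi << 12));
    assert(lo >= -2048 && lo <= 2047); /* must fit addi's signed 12-bit field */
    *upper20 = hi;
    *lower12 = lo;
}
```

For example, `split_li(5040u, &hi, &lo)` gives `hi == 0x1` and `lo == 944`, matching the listing. `split_li(0x40002000u, &hi, &lo)` gives `hi == 0x40002` and `lo == 0`, which matches the `lui a3,0x40002` used to address the `tx` port.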
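Another way to cut instructions, not measured in this lab, is to drop the ten `if` statements in the digit loop. ASCII `'0'` to `'9'` are contiguous, so each digit character is simply `'0' + n % 10`. The sketch below is host-testable and assumes that ASCII layout. `format_uint` is a hypothetical name, and on the emulator the resulting characters would still be written to `*tx` one by one as in the original program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: write the decimal digits of n into buf, most
 * significant digit first and NUL-terminated, and return the digit count.
 * buf must hold at least 11 bytes (10 digits for UINT32_MAX + NUL).
 */
static size_t format_uint(uint32_t n, char *buf)
{
    char tmp[10];   /* digits collected least significant first */
    size_t len = 0;

    assert(buf != NULL); /* caller must supply a buffer */

    do {
        tmp[len++] = (char)('0' + n % 10); /* replaces the ten if statements */
        n /= 10;
    } while (n != 0);   /* do-while so that 0 still produces "0" */

    for (size_t i = 0; i < len; i++)   /* reverse into buf */
        buf[i] = tmp[len - 1 - i];
    buf[len] = '\0';
    return len;
}
```

Something like `len = format_uint(mul, buf);` followed by `for (i = 0; i < len; i++) *tx = buf[i];` could replace both the conversion loop and the print loop. Its instruction count on `emu-rv32i` has not been measured here.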