# Lab2: RISC-V RV32I[MA] emulator with ELF support
<contributed by:`xl86305955`>
###### tags:`Computer Architecture`, `RISC-V`
## Rewrite Programs from [Assignment1: RISC-V Assembly and Instruction Pipeline](https://hackmd.io/@jserv/By5OE6fOr)
### factorial (`frac.c`)
C code:
```clike=
#define NUM 7
void main()
{
    /* Bytes stored to this address are printed by the emulator */
    volatile char* tx = (volatile char*) 0x40002000;
    const char* output = "7! = ";
    int mul = 1;
    int o_int;
    char o_char[50];
    /* print the prefix string */
    while (*output) {
        *tx = *output;
        output++;
    }
    /* compute NUM! */
    for (int i = 1; i <= NUM; i++)
        mul *= i;
    int tmp;
    int count = 0;
    int flag = 1;
    o_int = mul;
    /* convert the result to decimal digits, least significant first */
    do {
        if (o_int < 10) {
            flag = 0;
        }
        tmp = o_int % 10;
        if (tmp == 0) o_char[count] = '0';
        if (tmp == 1) o_char[count] = '1';
        if (tmp == 2) o_char[count] = '2';
        if (tmp == 3) o_char[count] = '3';
        if (tmp == 4) o_char[count] = '4';
        if (tmp == 5) o_char[count] = '5';
        if (tmp == 6) o_char[count] = '6';
        if (tmp == 7) o_char[count] = '7';
        if (tmp == 8) o_char[count] = '8';
        if (tmp == 9) o_char[count] = '9';
        count++;
        o_int = o_int / 10;
    } while (o_int > 10 || flag == 1);
    /* print out the value of result, most significant digit first;
       note that count is one past the last stored digit here, so the
       first byte sent is an uninitialized element of o_char */
    for (; count >= 0; count--) {
        *tx = o_char[count];
    }
}
```
Without optimization:
```
$ riscv-none-embed-gcc frac.c setup.o -march=rv32im -mabi=ilp32 -nostdlib -T link.ld -o frac
$ ./emu-rv32i frac
7! = 5040
>>> Execution time: 121626 ns
>>> Instruction count: 453 (IPS=3724532)
>>> Jumps: 65 (14.35%) - 44 forwards, 21 backwards
>>> Branching T=59 (84.29%) F=11 (15.71%)
```
With O2:
```
$ riscv-none-embed-gcc frac.c setup.o -march=rv32im -mabi=ilp32 -O2 -nostdlib -T link.ld -o frac
$ ./emu-rv32i frac
7! = 5040
>>> Execution time: 90074 ns
>>> Instruction count: 197 (IPS=2187090)
>>> Jumps: 34 (17.26%) - 20 forwards, 14 backwards
>>> Branching T=30 (75.00%) F=10 (25.00%)
```
With O3:
```
$ make frac
riscv-none-embed-gcc frac.c setup.o -march=rv32im -mabi=ilp32 -O3 -nostdlib -T link.ld -o frac
$ ./emu-rv32i frac
7! = 5040
>>> Execution time: 78733 ns
>>> Instruction count: 197 (IPS=2502127)
>>> Jumps: 34 (17.26%) - 20 forwards, 14 backwards
>>> Branching T=30 (75.00%) F=10 (25.00%)
```
| | Without Optimization | O2 | O3 |
| ----------------- | --------------------------- |:--------------------------- | -------------------------- |
| Execution Time | 121626 ns | 90074 ns | 78733 ns |
| Instruction Count | 453 | 197 | 197 |
| Jumps | 65 | 34 | 34 |
| Branching | T=59 (84.29%) F=11 (15.71%) | T=30 (75.00%) F=10 (25.00%) | T=30 (75.00%) F=10 (25.00%) |
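
The IPS figures printed with each run are consistent with dividing the instruction count by the execution time. The sketch below reproduces that arithmetic. The function name `ips` and the truncating integer division are assumptions chosen to match the reported numbers; they are not taken from the emulator's source. For instance, `ips(453, 121626)` evaluates to 3724532, the value shown for the unoptimized build. IPS here reflects emulator throughput, which is why the unoptimized build shows a higher IPS even though its total time is longer.

```c
#include <stdint.h>

/* Instructions per second from an instruction count and an elapsed
 * time in nanoseconds.
 * Assumption: truncating integer division, which matches the figures
 * reported in the runs above. */
uint64_t ips(uint64_t instruction_count, uint64_t elapsed_ns)
{
    if (elapsed_ns == 0)
        return 0; /* avoid division by zero on a degenerate measurement */
    return instruction_count * UINT64_C(1000000000) / elapsed_ns;
}
```
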
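
The instruction count is the same with `-O2` and `-O3`.
The disassembly confirms that both levels generate identical code for this program. It also shows that the factorial loop has been folded into the constant 5040 (`lui a2,0x1` followed by `addi a2,a2,944` at `1e8` and `1ec`), so no multiplication is executed at run time: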
```
$ riscv-none-embed-objdump -d frac
frac: file format elf32-littleriscv
Disassembly of section .text:
00000194 <main>:
194: f9010113 addi sp,sp,-112
198: 000007b7 lui a5,0x0
19c: 06812623 sw s0,108(sp)
1a0: 06912423 sw s1,104(sp)
1a4: 07212223 sw s2,100(sp)
1a8: 07312023 sw s3,96(sp)
1ac: 05412e23 sw s4,92(sp)
1b0: 05512c23 sw s5,88(sp)
1b4: 05612a23 sw s6,84(sp)
1b8: 05712823 sw s7,80(sp)
1bc: 05812623 sw s8,76(sp)
1c0: 05912423 sw s9,72(sp)
1c4: 05a12223 sw s10,68(sp)
1c8: 05b12023 sw s11,64(sp)
1cc: 34078793 addi a5,a5,832 # 340 <__rodata_start>
1d0: 03700713 li a4,55
1d4: 400026b7 lui a3,0x40002
1d8: 00e68023 sb a4,0(a3) # 40002000 <__stack+0x40000cb8>
1dc: 00178793 addi a5,a5,1
1e0: 0007c703 lbu a4,0(a5)
1e4: fe071ae3 bnez a4,1d8 <main+0x44>
1e8: 00001637 lui a2,0x1
1ec: 3b060613 addi a2,a2,944 # 13b0 <__stack+0x68>
1f0: 00a00513 li a0,10
1f4: 02a667b3 rem a5,a2,a0
1f8: 00c10693 addi a3,sp,12
1fc: 00100593 li a1,1
200: 00068713 mv a4,a3
204: 00900d93 li s11,9
208: 03900d13 li s10,57
20c: 00700c93 li s9,7
210: 00800c13 li s8,8
214: 03800b93 li s7,56
218: 03700b13 li s6,55
21c: 00600a93 li s5,6
220: 03600a13 li s4,54
224: 00500993 li s3,5
228: 03500913 li s2,53
22c: 00100813 li a6,1
230: 00200493 li s1,2
234: 00300413 li s0,3
238: 00400393 li t2,4
23c: 03400293 li t0,52
240: 03300f93 li t6,51
244: 03000e93 li t4,48
248: 40d58333 sub t1,a1,a3
24c: 06d00893 li a7,109
250: 06300e13 li t3,99
254: 08079863 bnez a5,2e4 <main+0x150>
258: 01d70023 sb t4,0(a4)
25c: 0b379263 bne a5,s3,300 <main+0x16c>
260: 01270023 sb s2,0(a4)
264: 00e307b3 add a5,t1,a4
268: 06c8c063 blt a7,a2,2c8 <main+0x134>
26c: 05058e63 beq a1,a6,2c8 <main+0x134>
270: 00f687b3 add a5,a3,a5
274: 400025b7 lui a1,0x40002
278: 0080006f j 280 <main+0xec>
27c: 00060793 mv a5,a2
280: 0007c703 lbu a4,0(a5)
284: fff78613 addi a2,a5,-1
288: 00e58023 sb a4,0(a1) # 40002000 <__stack+0x40000cb8>
28c: fef698e3 bne a3,a5,27c <main+0xe8>
290: 06c12403 lw s0,108(sp)
294: 06812483 lw s1,104(sp)
298: 06412903 lw s2,100(sp)
29c: 06012983 lw s3,96(sp)
2a0: 05c12a03 lw s4,92(sp)
2a4: 05812a83 lw s5,88(sp)
2a8: 05412b03 lw s6,84(sp)
2ac: 05012b83 lw s7,80(sp)
2b0: 04c12c03 lw s8,76(sp)
2b4: 04812c83 lw s9,72(sp)
2b8: 04412d03 lw s10,68(sp)
2bc: 04012d83 lw s11,64(sp)
2c0: 07010113 addi sp,sp,112
2c4: 00008067 ret
2c8: 00ce27b3 slt a5,t3,a2
2cc: 02a64633 div a2,a2,a0
2d0: 40f007b3 neg a5,a5
2d4: 00f5f5b3 and a1,a1,a5
2d8: 00170713 addi a4,a4,1
2dc: 02a667b3 rem a5,a2,a0
2e0: f6078ce3 beqz a5,258 <main+0xc4>
2e4: 01079863 bne a5,a6,2f4 <main+0x160>
2e8: 03100f13 li t5,49
2ec: 01e70023 sb t5,0(a4)
2f0: f6dff06f j 25c <main+0xc8>
2f4: 00979c63 bne a5,s1,30c <main+0x178>
2f8: 03200f13 li t5,50
2fc: 01e70023 sb t5,0(a4)
300: 01579a63 bne a5,s5,314 <main+0x180>
304: 01470023 sb s4,0(a4)
308: f5dff06f j 264 <main+0xd0>
30c: 02879063 bne a5,s0,32c <main+0x198>
310: 01f70023 sb t6,0(a4)
314: 01979663 bne a5,s9,320 <main+0x18c>
318: 01670023 sb s6,0(a4)
31c: f49ff06f j 264 <main+0xd0>
320: 01879a63 bne a5,s8,334 <main+0x1a0>
324: 01770023 sb s7,0(a4)
328: f3dff06f j 264 <main+0xd0>
32c: f27798e3 bne a5,t2,25c <main+0xc8>
330: 00570023 sb t0,0(a4)
334: f3b798e3 bne a5,s11,264 <main+0xd0>
338: 01a70023 sb s10,0(a4)
33c: f29ff06f j 264 <main+0xd0>
```
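In the listing above, the instructions at `1e8` and `1ec` load the precomputed factorial into `a2` instead of running the loop. The sketch below shows how a `lui`/`addi` pair forms a 32-bit constant and checks it against the loop from `frac.c`. The helper names `lui_addi` and `factorial` are hypothetical and exist only for this illustration. With the operands from the listing, `lui_addi(0x1, 944)` gives 4096 + 944 = 5040, which equals `factorial(7)`.

```c
#include <stdint.h>

/* Value produced by `lui rd, imm20` followed by `addi rd, rd, imm12`.
 * imm12 is the signed 12-bit immediate of addi, in [-2048, 2047]. */
int32_t lui_addi(uint32_t imm20, int32_t imm12)
{
    uint32_t upper = imm20 << 12;              /* lui fills bits 31..12 */
    return (int32_t)(upper + (uint32_t)imm12); /* addi adds the immediate */
}

/* The multiplication loop from frac.c, kept as a reference. */
int factorial(int n)
{
    int mul = 1;
    for (int i = 1; i <= n; i++)
        mul *= i;
    return mul;
}
```
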
The optimization level also changes where the section header table starts in the ELF file:
With O3:
```
$ riscv-none-embed-readelf -h frac
Start of section headers: 1556 (bytes into file)
```
Without Optimization:
```
$ riscv-none-embed-readelf -h frac
Start of section headers: 1744 (bytes into file)
```
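The value `readelf -h` prints here is the `e_shoff` field of the ELF file header. In a 32-bit ELF file this is a 4-byte field at byte offset 32 of the 52-byte header. The sketch below extracts it from a header already read into memory. The function name `elf32_section_header_offset` is hypothetical, and the code assumes a little-endian ELF32 file, as the `elf32-littleriscv` format in the disassembly suggests. The section header table usually follows the section contents, so a smaller `.text` would plausibly move it earlier, which fits the 188-byte difference between the two builds.

```c
#include <stddef.h>
#include <stdint.h>

#define ELF32_EHDR_SIZE      52 /* size of the ELF32 file header */
#define ELF32_E_SHOFF_OFFSET 32 /* byte offset of e_shoff in that header */

/* Returns e_shoff (start of section headers, in bytes into the file),
 * or 0 if the buffer is not a little-endian ELF32 header. */
uint32_t elf32_section_header_offset(const unsigned char *hdr, size_t len)
{
    if (len < ELF32_EHDR_SIZE)
        return 0;
    /* e_ident: magic bytes, then class and data encoding */
    if (hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F')
        return 0;
    if (hdr[4] != 1 /* ELFCLASS32 */ || hdr[5] != 1 /* ELFDATA2LSB */)
        return 0;
    const unsigned char *p = hdr + ELF32_E_SHOFF_OFFSET;
    /* assemble the little-endian 32-bit value byte by byte */
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

To use it, read the first 52 bytes of `frac` into a buffer and pass the buffer with its length.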
#### Try to reduce instruction count
Brute force (without optimization):
```clike=
int a = 1;
int b = 2;
int c = 3;
int d = 4;
int e = 5;
int f = 6;
int g = 7;
mul = a * b * c * d * e * f * g;
```
```
$ ./emu-rv32i frac
7! = 5040
>>> Execution time: 100834 ns
>>> Instruction count: 405 (IPS=4016502)
>>> Jumps: 57 (14.07%) - 43 forwards, 14 backwards
>>> Branching T=52 (83.87%) F=10 (16.13%)
```
This saves `48` instructions (from 453 to 405) and also reduces jumps (from 65 to 57) and branches (from 70 to 62).
With `-O2` or `-O3`, however, the result is the same as for the `for` loop version.
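
Another possible reduction, which was not measured here, is to replace the ten-way `if` chain with a single addition: C guarantees that the characters `'0'` to `'9'` are contiguous, so `'0' + tmp` yields the digit character directly. The sketch below is a minimal version of that idea, assuming a non-negative input. The function name `to_decimal` and its buffer interface are hypothetical and not part of `frac.c`. It also emits only the digits that were stored, whereas the original print loop starts at `o_char[count]`, one element past the last stored digit.

```c
#include <stddef.h>

/* Writes the decimal digits of a non-negative value into buf, most
 * significant digit first, and returns the number of digits written.
 * buf must hold at least 10 bytes (enough for a 32-bit int). */
size_t to_decimal(int value, char *buf)
{
    char rev[10]; /* digits in reverse order, least significant first */
    size_t n = 0;

    do {
        rev[n++] = (char)('0' + value % 10); /* digit to character */
        value /= 10;
    } while (value > 0);

    /* copy back in printing order */
    for (size_t i = 0; i < n; i++)
        buf[i] = rev[n - 1 - i];
    return n;
}
```

In `frac.c`, each byte of `buf` could then be stored to `*tx` in order.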