# Lab2: RISC-V RV32I[MA] emulator with ELF support
###### tags: `RISC-V` `computer architecture 2021`
I chose this problem because I had read it before. At that time, even after understanding what the [problem](https://hackmd.io/@kuouu/2021-arch-homework1) asked, I had no idea how to solve it.
## Rewrite in C language
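```c=
// Return the complement of num: flip every bit up to and including
// its highest set bit (e.g. 0b10101010 -> 0b01010101).
int findComplement(int num) {
    unsigned mask = 0xffffffff;   // start with all ones
    while (mask & num)            // shift zeros in from the right until
        mask <<= 1;               // mask no longer overlaps num
    return num ^ ~mask;           // ~mask keeps only num's significant bits
}
// Print num as 32 binary digits by storing each character to the
// emulator's output address 0x40002000.
void printNum(int num) {
    volatile char *tx = (volatile char *) 0x40002000;
    for (int i = 31; i >= 0; i--)
        *tx = ((num >> i) & 1) ? '1' : '0';
}
// Print a NUL-terminated string through the same output address.
void printStr(const char *str) {
    volatile char *tx = (volatile char *) 0x40002000;
    while (*str) {
        *tx = *str;
        str++;
    }
}
// Entry point: built with -nostdlib, so there is no C runtime and no main().
int _start() {
    const char *before = "Before: ";
    const char *after = "After : ";
    const char *newLine = "\n";
    int input = 170;
    printStr(before);
    printNum(input);
    printStr(newLine);
    int result = findComplement(input);
    printStr(after);
    printNum(result);
}
```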
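I rewrote the output routines in an `emu-rv32i`-compatible form, storing each character to the address `0x40002000`, because `emu-rv32i` cannot perform the `ecall`-based system calls (print integer, exit) that the original assembly uses.
In addition, the results are printed in binary, which makes the values before and after the complement easier to compare.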
## Comparing optimization levels and hand-written assembly with the GNU Toolchain
### Executing the ELF file
- Original assembly code written by 郭又宗.
```clike=
.data
input: .word 0x00000005
.text
main:
lw a0, input
jal ra, findComplement
jal ra, printResult
li a7, 10 # end program
ecall
findComplement:
li t0, 0xffffffff # mask
loop:
and t1, t0, a0 # mask & input
beq t1, x0, exit # t1 == 0, goto exit
slli t0, t0, 1 # mask <<= 1
j loop
exit:
not t0, t0 # mask = ~mask
xor a0, a0, t0 # input = input ^ mask
jr ra
printResult:
li a7, 1 # print integer
ecall
jr ra
```
- produced by gcc with -O3
```bash=
$riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -O3 -nostdlib -o numberComplementO3 numberComplement.c
$./emu-rv32i numberComplementO3
Before: 00000000000000000000000010101010
After : 00000000000000000000000001010101
>>> Execution time: 515684 ns
>>> Instruction count: 468 (IPS=907532)
>>> Jumps: 77 (16.45%) - 0 forwards, 77 backwards
>>> Branching T=76 (95.00%) F=4 (5.00%)
```
- produced by gcc with -O0
```bash=
$riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -O0 -nostdlib -o numberComplementO0 numberComplement.c
$./emu-rv32i numberComplementO0
Before: 00000000000000000000000010101010
After : 00000000000000000000000001010101
>>> Execution time: 713477 ns
>>> Instruction count: 1197 (IPS=1677699)
>>> Jumps: 172 (14.37%) - 76 forwards, 96 backwards
>>> Branching T=145 (91.19%) F=14 (8.81%)
```
- produced by gcc with -Os
```bash=
$riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -Os -nostdlib -o numberComplementOs numberComplement.c
$./emu-rv32i numberComplementOs
Before: 00000000000000000000000010101010
After : 00000000000000000000000001010101
>>> Execution time: 54474 ns
>>> Instruction count: 551 (IPS=10114917)
>>> Jumps: 125 (22.69%) - 31 forwards, 94 backwards
>>> Branching T=87 (93.55%) F=6 (6.45%)
```
### objdump
- produced by gcc with -O3
```bash=
$ riscv-none-embed-objdump -d numberComplementO3
numberComplementO3: file format elf32-littleriscv
Disassembly of section .text:
00010054 <findComplement>:
10054: 02050063 beqz a0,10074 <findComplement+0x20>
10058: fff00793 li a5,-1
1005c: 00179793 slli a5,a5,0x1
10060: 00f57733 and a4,a0,a5
10064: fe071ce3 bnez a4,1005c <findComplement+0x8>
10068: 00f54533 xor a0,a0,a5
1006c: fff54513 not a0,a0
10070: 00008067 ret
10074: 00000513 li a0,0
10078: 00008067 ret
0001007c <printNum>:
1007c: 01f00713 li a4,31
10080: 40002637 lui a2,0x40002
10084: fff00693 li a3,-1
10088: 40e557b3 sra a5,a0,a4
1008c: 0017f793 andi a5,a5,1
10090: 03078793 addi a5,a5,48
10094: 00f60023 sb a5,0(a2) # 40002000 <__global_pointer$+0x3fff068b>
10098: fff70713 addi a4,a4,-1
1009c: fed716e3 bne a4,a3,10088 <printNum+0xc>
100a0: 00008067 ret
000100a4 <printStr>:
100a4: 00054783 lbu a5,0(a0)
100a8: 00078c63 beqz a5,100c0 <printStr+0x1c>
100ac: 40002737 lui a4,0x40002
100b0: 00f70023 sb a5,0(a4) # 40002000 <__global_pointer$+0x3fff068b>
100b4: 00154783 lbu a5,1(a0)
100b8: 00150513 addi a0,a0,1
100bc: fe079ae3 bnez a5,100b0 <printStr+0xc>
100c0: 00008067 ret
000100c4 <_start>:
100c4: 000107b7 lui a5,0x10
100c8: 16078793 addi a5,a5,352 # 10160 <_start+0x9c>
100cc: 04200713 li a4,66
100d0: 400026b7 lui a3,0x40002
100d4: 00e68023 sb a4,0(a3) # 40002000 <__global_pointer$+0x3fff068b>
100d8: 0017c703 lbu a4,1(a5)
100dc: 00178793 addi a5,a5,1
100e0: fe071ae3 bnez a4,100d4 <_start+0x10>
100e4: 01f00713 li a4,31
100e8: 0aa00593 li a1,170
100ec: 400026b7 lui a3,0x40002
100f0: fff00613 li a2,-1
100f4: 40e5d7b3 sra a5,a1,a4
100f8: 0017f793 andi a5,a5,1
100fc: 03078793 addi a5,a5,48
10100: 00f68023 sb a5,0(a3) # 40002000 <__global_pointer$+0x3fff068b>
10104: fff70713 addi a4,a4,-1
10108: fec716e3 bne a4,a2,100f4 <_start+0x30>
1010c: 00a00793 li a5,10
10110: 00f68023 sb a5,0(a3)
10114: 000107b7 lui a5,0x10
10118: 16c78793 addi a5,a5,364 # 1016c <_start+0xa8>
1011c: 04100713 li a4,65
10120: 400026b7 lui a3,0x40002
10124: 00e68023 sb a4,0(a3) # 40002000 <__global_pointer$+0x3fff068b>
10128: 0017c703 lbu a4,1(a5)
1012c: 00178793 addi a5,a5,1
10130: fe071ae3 bnez a4,10124 <_start+0x60>
10134: 01f00713 li a4,31
10138: 05500593 li a1,85
1013c: 40002637 lui a2,0x40002
10140: fff00693 li a3,-1
10144: 40e5d7b3 sra a5,a1,a4
10148: 0017f793 andi a5,a5,1
1014c: 03078793 addi a5,a5,48
10150: 00f60023 sb a5,0(a2) # 40002000 <__global_pointer$+0x3fff068b>
10154: fff70713 addi a4,a4,-1
10158: fed716e3 bne a4,a3,10144 <_start+0x80>
1015c: 00008067 ret
```
- produced by gcc with -O0
```bash=
$ riscv-none-embed-objdump -d numberComplementO0
numberComplementO0: file format elf32-littleriscv
Disassembly of section .text:
00010054 <findComplement>:
10054: fd010113 addi sp,sp,-48
10058: 02812623 sw s0,44(sp)
1005c: 03010413 addi s0,sp,48
10060: fca42e23 sw a0,-36(s0)
10064: fff00793 li a5,-1
10068: fef42623 sw a5,-20(s0)
1006c: 0100006f j 1007c <findComplement+0x28>
10070: fec42783 lw a5,-20(s0)
10074: 00179793 slli a5,a5,0x1
10078: fef42623 sw a5,-20(s0)
1007c: fdc42703 lw a4,-36(s0)
10080: fec42783 lw a5,-20(s0)
10084: 00f777b3 and a5,a4,a5
10088: fe0794e3 bnez a5,10070 <findComplement+0x1c>
1008c: fdc42703 lw a4,-36(s0)
10090: fec42783 lw a5,-20(s0)
10094: 00f747b3 xor a5,a4,a5
10098: fff7c793 not a5,a5
1009c: 00078513 mv a0,a5
100a0: 02c12403 lw s0,44(sp)
100a4: 03010113 addi sp,sp,48
100a8: 00008067 ret
000100ac <printNum>:
100ac: fd010113 addi sp,sp,-48
100b0: 02812623 sw s0,44(sp)
100b4: 03010413 addi s0,sp,48
100b8: fca42e23 sw a0,-36(s0)
100bc: 400027b7 lui a5,0x40002
100c0: fef42423 sw a5,-24(s0)
100c4: 01f00793 li a5,31
100c8: fef42623 sw a5,-20(s0)
100cc: 0380006f j 10104 <printNum+0x58>
100d0: fec42783 lw a5,-20(s0)
100d4: fdc42703 lw a4,-36(s0)
100d8: 40f757b3 sra a5,a4,a5
100dc: 0017f793 andi a5,a5,1
100e0: 00078663 beqz a5,100ec <printNum+0x40>
100e4: 03100793 li a5,49
100e8: 0080006f j 100f0 <printNum+0x44>
100ec: 03000793 li a5,48
100f0: fe842703 lw a4,-24(s0)
100f4: 00f70023 sb a5,0(a4)
100f8: fec42783 lw a5,-20(s0)
100fc: fff78793 addi a5,a5,-1 # 40001fff <__global_pointer$+0x3fff05e5>
10100: fef42623 sw a5,-20(s0)
10104: fec42783 lw a5,-20(s0)
10108: fc07d4e3 bgez a5,100d0 <printNum+0x24>
1010c: 00000013 nop
10110: 00000013 nop
10114: 02c12403 lw s0,44(sp)
10118: 03010113 addi sp,sp,48
1011c: 00008067 ret
00010120 <printStr>:
10120: fd010113 addi sp,sp,-48
10124: 02812623 sw s0,44(sp)
10128: 03010413 addi s0,sp,48
1012c: fca42e23 sw a0,-36(s0)
10130: 400027b7 lui a5,0x40002
10134: fef42623 sw a5,-20(s0)
10138: 0200006f j 10158 <printStr+0x38>
1013c: fdc42783 lw a5,-36(s0)
10140: 0007c703 lbu a4,0(a5) # 40002000 <__global_pointer$+0x3fff05e6>
10144: fec42783 lw a5,-20(s0)
10148: 00e78023 sb a4,0(a5)
1014c: fdc42783 lw a5,-36(s0)
10150: 00178793 addi a5,a5,1
10154: fcf42e23 sw a5,-36(s0)
10158: fdc42783 lw a5,-36(s0)
1015c: 0007c783 lbu a5,0(a5)
10160: fc079ee3 bnez a5,1013c <printStr+0x1c>
10164: 00000013 nop
10168: 00000013 nop
1016c: 02c12403 lw s0,44(sp)
10170: 03010113 addi sp,sp,48
10174: 00008067 ret
00010178 <_start>:
10178: fd010113 addi sp,sp,-48
1017c: 02112623 sw ra,44(sp)
10180: 02812423 sw s0,40(sp)
10184: 03010413 addi s0,sp,48
10188: 000107b7 lui a5,0x10
1018c: 20078793 addi a5,a5,512 # 10200 <_start+0x88>
10190: fef42623 sw a5,-20(s0)
10194: 000107b7 lui a5,0x10
10198: 20c78793 addi a5,a5,524 # 1020c <_start+0x94>
1019c: fef42423 sw a5,-24(s0)
101a0: 000107b7 lui a5,0x10
101a4: 21878793 addi a5,a5,536 # 10218 <_start+0xa0>
101a8: fef42223 sw a5,-28(s0)
101ac: 0aa00793 li a5,170
101b0: fef42023 sw a5,-32(s0)
101b4: fec42503 lw a0,-20(s0)
101b8: f69ff0ef jal ra,10120 <printStr>
101bc: fe042503 lw a0,-32(s0)
101c0: eedff0ef jal ra,100ac <printNum>
101c4: fe442503 lw a0,-28(s0)
101c8: f59ff0ef jal ra,10120 <printStr>
101cc: fe042503 lw a0,-32(s0)
101d0: e85ff0ef jal ra,10054 <findComplement>
101d4: fca42e23 sw a0,-36(s0)
101d8: fe842503 lw a0,-24(s0)
101dc: f45ff0ef jal ra,10120 <printStr>
101e0: fdc42503 lw a0,-36(s0)
101e4: ec9ff0ef jal ra,100ac <printNum>
101e8: 00000013 nop
101ec: 00078513 mv a0,a5
101f0: 02c12083 lw ra,44(sp)
101f4: 02812403 lw s0,40(sp)
101f8: 03010113 addi sp,sp,48
101fc: 00008067 ret
```
- produced by gcc with -Os
```bash=
$ riscv-none-embed-objdump -d numberComplementOs
numberComplementOs: file format elf32-littleriscv
Disassembly of section .text:
00010054 <findComplement>:
10054: fff00793 li a5,-1
10058: 00f57733 and a4,a0,a5
1005c: 00071863 bnez a4,1006c <findComplement+0x18>
10060: 00f54533 xor a0,a0,a5
10064: fff54513 not a0,a0
10068: 00008067 ret
1006c: 00179793 slli a5,a5,0x1
10070: fe9ff06f j 10058 <findComplement+0x4>
00010074 <printNum>:
10074: 01f00713 li a4,31
10078: 40002637 lui a2,0x40002
1007c: fff00693 li a3,-1
10080: 40e557b3 sra a5,a0,a4
10084: 0017f793 andi a5,a5,1
10088: 03078793 addi a5,a5,48
1008c: 00f60023 sb a5,0(a2) # 40002000 <__global_pointer$+0x3fff06d3>
10090: fff70713 addi a4,a4,-1
10094: fed716e3 bne a4,a3,10080 <printNum+0xc>
10098: 00008067 ret
0001009c <printStr>:
1009c: 40002737 lui a4,0x40002
100a0: 00054783 lbu a5,0(a0)
100a4: 00079463 bnez a5,100ac <printStr+0x10>
100a8: 00008067 ret
100ac: 00f70023 sb a5,0(a4) # 40002000 <__global_pointer$+0x3fff06d3>
100b0: 00150513 addi a0,a0,1
100b4: fedff06f j 100a0 <printStr+0x4>
000100b8 <_start>:
100b8: 00010537 lui a0,0x10
100bc: ff010113 addi sp,sp,-16
100c0: 11450513 addi a0,a0,276 # 10114 <_start+0x5c>
100c4: 00112623 sw ra,12(sp)
100c8: 00812423 sw s0,8(sp)
100cc: fd1ff0ef jal ra,1009c <printStr>
100d0: 0aa00513 li a0,170
100d4: fa1ff0ef jal ra,10074 <printNum>
100d8: 00010537 lui a0,0x10
100dc: 12050513 addi a0,a0,288 # 10120 <_start+0x68>
100e0: fbdff0ef jal ra,1009c <printStr>
100e4: 0aa00513 li a0,170
100e8: f6dff0ef jal ra,10054 <findComplement>
100ec: 00050413 mv s0,a0
100f0: 00010537 lui a0,0x10
100f4: 12450513 addi a0,a0,292 # 10124 <_start+0x6c>
100f8: fa5ff0ef jal ra,1009c <printStr>
100fc: 00040513 mv a0,s0
10100: f75ff0ef jal ra,10074 <printNum>
10104: 00c12083 lw ra,12(sp)
10108: 00812403 lw s0,8(sp)
1010c: 01010113 addi sp,sp,16
10110: 00008067 ret
```
|                                           |    -O3     |     -O0      |     -Os      |
| ----------------------------------------- |:----------:|:------------:|:------------:|
| Instruction count                         |    468     |     1197     |     551      |
| Execution time (ns)                       |   515684   |    713477    |    54474     |
| Disassembly lines (instructions + labels) |     71     |     111      |      52      |
| Jumps (forward, backward)                 | 77 (0, 77) | 172 (76, 96) | 125 (31, 94) |
| Branches (taken, not taken)               | 80 (76, 4) | 159 (145, 14)|  93 (87, 6)  |
- Observation
  - Counters in the `emu-rv32i` simulator
    - `jump_counter` counts every executed control transfer: each `jal`/`jalr` (including the pseudo-instructions `j`, `jr` and `ret`) plus each taken conditional branch.
    - `forwards_counter` counts the executed jumps whose target address is larger than the address of the jump instruction (a `forward jump`).
    - `backwards_counter` counts the executed jumps whose target address is smaller than the address of the jump instruction (a `backward jump`).
    - `true_counter` (`T`) counts the executed conditional branches that were taken.
    - `false_counter` (`F`) counts the executed conditional branches that were not taken.
  - The following points can be observed from the table above:
    - `-O0` disables optimization, so it has the highest instruction count, execution time and code size.
    - `-O3` and `-Os` execute a similar number of instructions (468 vs. 551), and both produce much smaller code than `-O0` (`text` of 289 and 217 bytes versus 454, as reported by `size`). Their measured execution times, however, differ by roughly 10× (515684 ns vs. 54474 ns) even though `-Os` executes more instructions, so the wall-clock time reported by the emulator appears to be dominated by host-side noise and is not a reliable basis for comparison.
    - With `-O3` there are no forward jumps: GCC inlined every call into `_start` and even folded `findComplement(170)` into the constant 85 (`li a1,85`), so the only jumps left are the taken loop branches, which all go backward, and the final `ret`.
    - With `-O3` and `-Os`, a larger share of the executed branches is taken (95.00% and 93.55%, versus 91.19% for `-O0`). When the CPU executes a branch, it either continues with the next instruction (not taken) or loads the target address into the `program counter` and continues from there (taken). `-O0` compiles the `? :` in `printNum` into a conditional branch that falls through for every `1` bit, whereas the optimized builds compute the digit arithmetically without a branch.
### readelf
- produced by gcc with -O3
```bash=
$ riscv-none-embed-readelf -h numberComplementO3
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x100c4
Start of program headers: 52 (bytes into file)
Start of section headers: 872 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 1
Size of section headers: 40 (bytes)
Number of section headers: 7
Section header string table index: 6
```
- produced by gcc with -O0
```bash=
$ riscv-none-embed-readelf -h numberComplementO0
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x10178
Start of program headers: 52 (bytes into file)
Start of section headers: 1036 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 1
Size of section headers: 40 (bytes)
Number of section headers: 7
Section header string table index: 6
```
- produced by gcc with -Os
```bash=
$ riscv-none-embed-readelf -h numberComplementOs
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x100b8
Start of program headers: 52 (bytes into file)
Start of section headers: 800 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 1
Size of section headers: 40 (bytes)
Number of section headers: 7
Section header string table index: 6
```
### size
- produced by gcc with -O3
```bash=
$ riscv-none-embed-size numberComplementO3
text data bss dec hex filename
289 0 0 289 121 numberComplementO3
```
- produced by gcc with -O0
```bash=
$ riscv-none-embed-size numberComplementO0
text data bss dec hex filename
454 0 0 454 1c6 numberComplementO0
```
- produced by gcc with -Os
```bash=
$ riscv-none-embed-size numberComplementOs
text data bss dec hex filename
217 0 0 217 d9 numberComplementOs
```