# Lab2: RISC-V RV32I[MA] emulator with ELF support
###### tags: `RISC-V` `computer architure 2021`

I chose this problem because I had read it before, but at that time I had no idea how to solve it, even after understanding what the [problem](https://hackmd.io/@kuouu/2021-arch-homework1) asks.

## Rewrite in C language
```c=
/* Flip every bit of num up to and including its highest set bit. */
int findComplement(int num) {
    unsigned mask = 0xffffffff ;
    /* Shift the mask left until it no longer overlaps any set bit of num. */
    while ( mask & num ) mask <<= 1 ;
    /* ~mask now covers exactly the significant bits of num. */
    return num ^ ~mask ;
}

/* Print all 32 bits of num, most significant bit first. */
void printNum(int num) {
    /* Output address used for emu-rv32i: each byte stored here is printed. */
    volatile char *tx = (volatile char *) 0x40002000 ;
    for ( int i = 31 ; i >=0 ; i-- )
        *tx = ((num >> i) & 1) ? '1' : '0' ;
}

/* Print a NUL-terminated string one byte at a time. */
void printStr(const char *str) {
    volatile char *tx = (volatile char *) 0x40002000 ;
    while (*str) {
        *tx = *str ;
        str++ ;
    }
}

/* Entry point (built with -nostdlib, so there is no main). */
int _start() {
    const char *before = "Before: " ;
    const char *after = "After : " ;
    const char *newLine = "\n" ;
    int input = 170 ;
    printStr(before) ;
    printNum(input) ;
    printStr(newLine) ;
    int result = findComplement(input) ;
    printStr(after) ;
    printNum(result) ;
    /* No return statement: kept as measured, so the listings below still match. */
}
```
I rewrote the output part into an `emu-rv32i`-compatible version because `emu-rv32i` cannot perform `ecall`. In addition, the result is printed in binary, which makes it easier to compare the value before and after.

## Comparing different optimization levels and handwritten assembly in GNU Toolchain
### Execute the ELF file
- Original assembly code written by 郭又宗.
```clike=
.data
input: .word 0x00000005

.text
main:
    lw a0, input
    jal ra, findComplement
    jal ra, printResult
    li a7, 10               # end program
    ecall

findComplement:
    li t0, 0xffffffff       # mask
loop:
    and t1, t0, a0          # mask & input
    beq t1, x0, exit        # t1 == 0, goto exit
    slli t0, t0, 1          # mask <<= 1
    j loop
exit:
    not t0, t0              # mask = ~mask
    xor a0, a0, t0          # input = input ^ mask
    jr ra

printResult:
    li a7, 1                # print integer
    ecall
    jr ra
```
- produced by gcc with -O3
```bash=
$ riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -O3 -nostdlib -o numberComplementO3 numberComplement.c
$ ./emu-rv32i numberComplementO3
Before: 00000000000000000000000010101010
After : 00000000000000000000000001010101
>>> Execution time: 515684 ns
>>> Instruction count: 468 (IPS=907532)
>>> Jumps: 77 (16.45%) - 0 forwards, 77 backwards
>>> Branching T=76 (95.00%) F=4 (5.00%)
```
- produced by gcc with -O0
```bash=
$ riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -O0 -nostdlib -o numberComplementO0 numberComplement.c
$ ./emu-rv32i numberComplementO0
Before: 00000000000000000000000010101010
After : 00000000000000000000000001010101
>>> Execution time: 713477 ns
>>> Instruction count: 1197 (IPS=1677699)
>>> Jumps: 172 (14.37%) - 76 forwards, 96 backwards
>>> Branching T=145 (91.19%) F=14 (8.81%)
```
- produced by gcc with -Os
```bash=
$ riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -Os -nostdlib -o numberComplementOs numberComplement.c
$ ./emu-rv32i numberComplementOs
Before: 00000000000000000000000010101010
After : 00000000000000000000000001010101
>>> Execution time: 54474 ns
>>> Instruction count: 551 (IPS=10114917)
>>> Jumps: 125 (22.69%) - 31 forwards, 94 backwards
>>> Branching T=87 (93.55%) F=6 (6.45%)
```
### objdump
- produced by gcc with -O3
```bash=
$ riscv-none-embed-objdump -d numberComplementO3

numberComplementO3:     file format elf32-littleriscv

Disassembly of section .text:

00010054 <findComplement>:
   10054:  02050063  beqz a0,10074 <findComplement+0x20>
   10058:  fff00793  li a5,-1
   1005c:  00179793  slli a5,a5,0x1
   10060:  00f57733  and a4,a0,a5
   10064:  fe071ce3  bnez a4,1005c <findComplement+0x8>
   10068:  00f54533  xor a0,a0,a5
   1006c:  fff54513  not a0,a0
   10070:  00008067  ret
   10074:  00000513  li a0,0
   10078:  00008067  ret

0001007c <printNum>:
   1007c:  01f00713  li a4,31
   10080:  40002637  lui a2,0x40002
   10084:  fff00693  li a3,-1
   10088:  40e557b3  sra a5,a0,a4
   1008c:  0017f793  andi a5,a5,1
   10090:  03078793  addi a5,a5,48
   10094:  00f60023  sb a5,0(a2) # 40002000 <__global_pointer$+0x3fff068b>
   10098:  fff70713  addi a4,a4,-1
   1009c:  fed716e3  bne a4,a3,10088 <printNum+0xc>
   100a0:  00008067  ret

000100a4 <printStr>:
   100a4:  00054783  lbu a5,0(a0)
   100a8:  00078c63  beqz a5,100c0 <printStr+0x1c>
   100ac:  40002737  lui a4,0x40002
   100b0:  00f70023  sb a5,0(a4) # 40002000 <__global_pointer$+0x3fff068b>
   100b4:  00154783  lbu a5,1(a0)
   100b8:  00150513  addi a0,a0,1
   100bc:  fe079ae3  bnez a5,100b0 <printStr+0xc>
   100c0:  00008067  ret

000100c4 <_start>:
   100c4:  000107b7  lui a5,0x10
   100c8:  16078793  addi a5,a5,352 # 10160 <_start+0x9c>
   100cc:  04200713  li a4,66
   100d0:  400026b7  lui a3,0x40002
   100d4:  00e68023  sb a4,0(a3) # 40002000 <__global_pointer$+0x3fff068b>
   100d8:  0017c703  lbu a4,1(a5)
   100dc:  00178793  addi a5,a5,1
   100e0:  fe071ae3  bnez a4,100d4 <_start+0x10>
   100e4:  01f00713  li a4,31
   100e8:  0aa00593  li a1,170
   100ec:  400026b7  lui a3,0x40002
   100f0:  fff00613  li a2,-1
   100f4:  40e5d7b3  sra a5,a1,a4
   100f8:  0017f793  andi a5,a5,1
   100fc:  03078793  addi a5,a5,48
   10100:  00f68023  sb a5,0(a3) # 40002000 <__global_pointer$+0x3fff068b>
   10104:  fff70713  addi a4,a4,-1
   10108:  fec716e3  bne a4,a2,100f4 <_start+0x30>
   1010c:  00a00793  li a5,10
   10110:  00f68023  sb a5,0(a3)
   10114:  000107b7  lui a5,0x10
   10118:  16c78793  addi a5,a5,364 # 1016c <_start+0xa8>
   1011c:  04100713  li a4,65
   10120:  400026b7  lui a3,0x40002
   10124:  00e68023  sb a4,0(a3) # 40002000 <__global_pointer$+0x3fff068b>
   10128:  0017c703  lbu a4,1(a5)
   1012c:  00178793  addi a5,a5,1
   10130:  fe071ae3  bnez a4,10124 <_start+0x60>
   10134:  01f00713  li a4,31
   10138:  05500593  li a1,85
   1013c:  40002637  lui a2,0x40002
   10140:  fff00693  li a3,-1
   10144:  40e5d7b3  sra a5,a1,a4
   10148:  0017f793  andi a5,a5,1
   1014c:  03078793  addi a5,a5,48
   10150:  00f60023  sb a5,0(a2) # 40002000 <__global_pointer$+0x3fff068b>
   10154:  fff70713  addi a4,a4,-1
   10158:  fed716e3  bne a4,a3,10144 <_start+0x80>
   1015c:  00008067  ret
```
- produced by gcc with -O0
```bash=
$ riscv-none-embed-objdump -d numberComplementO0

numberComplementO0:     file format elf32-littleriscv

Disassembly of section .text:

00010054 <findComplement>:
   10054:  fd010113  addi sp,sp,-48
   10058:  02812623  sw s0,44(sp)
   1005c:  03010413  addi s0,sp,48
   10060:  fca42e23  sw a0,-36(s0)
   10064:  fff00793  li a5,-1
   10068:  fef42623  sw a5,-20(s0)
   1006c:  0100006f  j 1007c <findComplement+0x28>
   10070:  fec42783  lw a5,-20(s0)
   10074:  00179793  slli a5,a5,0x1
   10078:  fef42623  sw a5,-20(s0)
   1007c:  fdc42703  lw a4,-36(s0)
   10080:  fec42783  lw a5,-20(s0)
   10084:  00f777b3  and a5,a4,a5
   10088:  fe0794e3  bnez a5,10070 <findComplement+0x1c>
   1008c:  fdc42703  lw a4,-36(s0)
   10090:  fec42783  lw a5,-20(s0)
   10094:  00f747b3  xor a5,a4,a5
   10098:  fff7c793  not a5,a5
   1009c:  00078513  mv a0,a5
   100a0:  02c12403  lw s0,44(sp)
   100a4:  03010113  addi sp,sp,48
   100a8:  00008067  ret

000100ac <printNum>:
   100ac:  fd010113  addi sp,sp,-48
   100b0:  02812623  sw s0,44(sp)
   100b4:  03010413  addi s0,sp,48
   100b8:  fca42e23  sw a0,-36(s0)
   100bc:  400027b7  lui a5,0x40002
   100c0:  fef42423  sw a5,-24(s0)
   100c4:  01f00793  li a5,31
   100c8:  fef42623  sw a5,-20(s0)
   100cc:  0380006f  j 10104 <printNum+0x58>
   100d0:  fec42783  lw a5,-20(s0)
   100d4:  fdc42703  lw a4,-36(s0)
   100d8:  40f757b3  sra a5,a4,a5
   100dc:  0017f793  andi a5,a5,1
   100e0:  00078663  beqz a5,100ec <printNum+0x40>
   100e4:  03100793  li a5,49
   100e8:  0080006f  j 100f0 <printNum+0x44>
   100ec:  03000793  li a5,48
   100f0:  fe842703  lw a4,-24(s0)
   100f4:  00f70023  sb a5,0(a4)
   100f8:  fec42783  lw a5,-20(s0)
   100fc:  fff78793  addi a5,a5,-1 # 40001fff <__global_pointer$+0x3fff05e5>
   10100:  fef42623  sw a5,-20(s0)
   10104:  fec42783  lw a5,-20(s0)
   10108:  fc07d4e3  bgez a5,100d0 <printNum+0x24>
   1010c:  00000013  nop
   10110:  00000013  nop
   10114:  02c12403  lw s0,44(sp)
   10118:  03010113  addi sp,sp,48
   1011c:  00008067  ret

00010120 <printStr>:
   10120:  fd010113  addi sp,sp,-48
   10124:  02812623  sw s0,44(sp)
   10128:  03010413  addi s0,sp,48
   1012c:  fca42e23  sw a0,-36(s0)
   10130:  400027b7  lui a5,0x40002
   10134:  fef42623  sw a5,-20(s0)
   10138:  0200006f  j 10158 <printStr+0x38>
   1013c:  fdc42783  lw a5,-36(s0)
   10140:  0007c703  lbu a4,0(a5) # 40002000 <__global_pointer$+0x3fff05e6>
   10144:  fec42783  lw a5,-20(s0)
   10148:  00e78023  sb a4,0(a5)
   1014c:  fdc42783  lw a5,-36(s0)
   10150:  00178793  addi a5,a5,1
   10154:  fcf42e23  sw a5,-36(s0)
   10158:  fdc42783  lw a5,-36(s0)
   1015c:  0007c783  lbu a5,0(a5)
   10160:  fc079ee3  bnez a5,1013c <printStr+0x1c>
   10164:  00000013  nop
   10168:  00000013  nop
   1016c:  02c12403  lw s0,44(sp)
   10170:  03010113  addi sp,sp,48
   10174:  00008067  ret

00010178 <_start>:
   10178:  fd010113  addi sp,sp,-48
   1017c:  02112623  sw ra,44(sp)
   10180:  02812423  sw s0,40(sp)
   10184:  03010413  addi s0,sp,48
   10188:  000107b7  lui a5,0x10
   1018c:  20078793  addi a5,a5,512 # 10200 <_start+0x88>
   10190:  fef42623  sw a5,-20(s0)
   10194:  000107b7  lui a5,0x10
   10198:  20c78793  addi a5,a5,524 # 1020c <_start+0x94>
   1019c:  fef42423  sw a5,-24(s0)
   101a0:  000107b7  lui a5,0x10
   101a4:  21878793  addi a5,a5,536 # 10218 <_start+0xa0>
   101a8:  fef42223  sw a5,-28(s0)
   101ac:  0aa00793  li a5,170
   101b0:  fef42023  sw a5,-32(s0)
   101b4:  fec42503  lw a0,-20(s0)
   101b8:  f69ff0ef  jal ra,10120 <printStr>
   101bc:  fe042503  lw a0,-32(s0)
   101c0:  eedff0ef  jal ra,100ac <printNum>
   101c4:  fe442503  lw a0,-28(s0)
   101c8:  f59ff0ef  jal ra,10120 <printStr>
   101cc:  fe042503  lw a0,-32(s0)
   101d0:  e85ff0ef  jal ra,10054 <findComplement>
   101d4:  fca42e23  sw a0,-36(s0)
   101d8:  fe842503  lw a0,-24(s0)
   101dc:  f45ff0ef  jal ra,10120 <printStr>
   101e0:  fdc42503  lw a0,-36(s0)
   101e4:  ec9ff0ef  jal ra,100ac <printNum>
   101e8:  00000013  nop
   101ec:  00078513  mv a0,a5
   101f0:  02c12083  lw ra,44(sp)
   101f4:  02812403  lw s0,40(sp)
   101f8:  03010113  addi sp,sp,48
   101fc:  00008067  ret
```
- produced by gcc with -Os
```bash=
$ riscv-none-embed-objdump -d numberComplementOs

numberComplementOs:     file format elf32-littleriscv

Disassembly of section .text:

00010054 <findComplement>:
   10054:  fff00793  li a5,-1
   10058:  00f57733  and a4,a0,a5
   1005c:  00071863  bnez a4,1006c <findComplement+0x18>
   10060:  00f54533  xor a0,a0,a5
   10064:  fff54513  not a0,a0
   10068:  00008067  ret
   1006c:  00179793  slli a5,a5,0x1
   10070:  fe9ff06f  j 10058 <findComplement+0x4>

00010074 <printNum>:
   10074:  01f00713  li a4,31
   10078:  40002637  lui a2,0x40002
   1007c:  fff00693  li a3,-1
   10080:  40e557b3  sra a5,a0,a4
   10084:  0017f793  andi a5,a5,1
   10088:  03078793  addi a5,a5,48
   1008c:  00f60023  sb a5,0(a2) # 40002000 <__global_pointer$+0x3fff06d3>
   10090:  fff70713  addi a4,a4,-1
   10094:  fed716e3  bne a4,a3,10080 <printNum+0xc>
   10098:  00008067  ret

0001009c <printStr>:
   1009c:  40002737  lui a4,0x40002
   100a0:  00054783  lbu a5,0(a0)
   100a4:  00079463  bnez a5,100ac <printStr+0x10>
   100a8:  00008067  ret
   100ac:  00f70023  sb a5,0(a4) # 40002000 <__global_pointer$+0x3fff06d3>
   100b0:  00150513  addi a0,a0,1
   100b4:  fedff06f  j 100a0 <printStr+0x4>

000100b8 <_start>:
   100b8:  00010537  lui a0,0x10
   100bc:  ff010113  addi sp,sp,-16
   100c0:  11450513  addi a0,a0,276 # 10114 <_start+0x5c>
   100c4:  00112623  sw ra,12(sp)
   100c8:  00812423  sw s0,8(sp)
   100cc:  fd1ff0ef  jal ra,1009c <printStr>
   100d0:  0aa00513  li a0,170
   100d4:  fa1ff0ef  jal ra,10074 <printNum>
   100d8:  00010537  lui a0,0x10
   100dc:  12050513  addi a0,a0,288 # 10120 <_start+0x68>
   100e0:  fbdff0ef  jal ra,1009c <printStr>
   100e4:  0aa00513  li a0,170
   100e8:  f6dff0ef  jal ra,10054 <findComplement>
   100ec:  00050413  mv s0,a0
   100f0:  00010537  lui a0,0x10
   100f4:  12450513  addi a0,a0,292 # 10124 <_start+0x6c>
   100f8:  fa5ff0ef  jal ra,1009c <printStr>
   100fc:  00040513  mv a0,s0
   10100:  f75ff0ef  jal ra,10074 <printNum>
   10104:  00c12083  lw ra,12(sp)
   10108:  00812403  lw s0,8(sp)
   1010c:  01010113  addi sp,sp,16
   10110:  00008067  ret
```

|                                                |    -O3    |     -O0      |     -Os     |
| ---------------------------------------------- |:---------:|:------------:|:-----------:|
| Instruction count                              |    468    |     1197     |     551     |
| Execution time (ns)                            |  515684   |    713477    |    54474    |
| Lines of code (disassembly, incl. 4 labels)    |    71     |     111      |     52      |
| Jumps(forwards, backwards)                     | 77(0, 77) | 172(76, 96)  | 125(31, 94) |
| Branch(T, F)                                   | 80(76, 4) | 159(145, 14) |  93(87, 6)  |

- Observation
    - Counters in the `emu-rv32i` simulator
        - `jump_counter` indicates how many jumps (`j`, `jal`, `ret`, etc.) are executed while the program is running. Judging from the numbers above, taken branches are counted as well: in the `-O3` run, 76 taken branches plus the final `ret` give 77 jumps.
        - `forwards_counter` indicates the number of forward jumps executed while the program is running. A jump is a forward jump if the target address is larger than the address of the jump instruction.
        - `backwards_counter` indicates the number of backward jumps executed while the program is running. A jump is a backward jump if the target address is less than the address of the jump instruction.
        - `true_counter` (`T`) matches the number of branches that are taken, not the number of successful predictions. In the `-O3` binary the four inlined loops run 8, 32, 8 and 32 times, so their closing branches are taken 7 + 31 + 7 + 31 = 76 times, which is exactly `T=76`.
        - `false_counter` (`F`) matches the number of branches that are not taken. Each of the four loops above falls through once, which gives `F=4`.
    - The following points can be observed from the table above:
        - With `-O0` the program is not optimized, so the instruction count, the execution time and the code size are the highest.
        - With `-O3` or `-Os` the program is optimized, and the instruction count (468 vs. 551) and the code size (71 vs. 52 lines) are similar. The reported execution time is not similar (515684 ns vs. 54474 ns), but the reported IPS also differs by about 10x between the two runs, so the time probably reflects variation on the host rather than the quality of the generated code.
        - With `-O3` there is no forward jump. The disassembly shows why: everything is inlined into `_start` and `findComplement(170)` is folded into the constant 85 (`li a1,85`), so only the loop-closing backward branches and the final `ret` remain.
        - With `-O3` and `-Os`, the fraction of taken branches (95.00% and 93.55%) is higher than with `-O0` (91.19%). When the CPU encounters a branch, it either executes the next instruction (branch not taken) or updates the program counter with the target address and continues from the instruction the new program counter points to (branch taken).
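One way to check how the counters relate to each other is to replay the control flow of the `-O3` binary by hand. The sketch below is not the `emu-rv32i` source; `flow_stats`, `record_jump`, `record_branch`, `replay_loop` and `replay_o3` are hypothetical names introduced here, and it assumes that a taken branch is also counted as a jump and that the final `ret` returns to an address below `_start`. The branch addresses are copied from the `-O3` disassembly above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counter set, named after the counters discussed in this note. */
typedef struct {
    unsigned jump;      /* every control transfer that changes the PC */
    unsigned forward;   /* transfers with target > pc */
    unsigned backward;  /* transfers with target <= pc */
    unsigned taken;     /* conditional branches that were taken (T) */
    unsigned not_taken; /* conditional branches that fell through (F) */
} flow_stats;

/* Record a control transfer from pc to target. */
static void record_jump(flow_stats *s, uint32_t pc, uint32_t target)
{
    s->jump++;
    if (target > pc)
        s->forward++;
    else
        s->backward++;
}

/* Record a conditional branch. Assumption: a taken branch also counts as a jump. */
static void record_branch(flow_stats *s, uint32_t pc, uint32_t target, bool taken)
{
    if (taken) {
        s->taken++;
        record_jump(s, pc, target);
    } else {
        s->not_taken++;
    }
}

/* A loop closed by a branch at pc: taken (iterations - 1) times, then falls through once. */
static void replay_loop(flow_stats *s, uint32_t pc, uint32_t target, unsigned iterations)
{
    for (unsigned i = 0; i < iterations; i++)
        record_branch(s, pc, target, i + 1 < iterations);
}

/* Replay _start of the -O3 binary: four inlined loops and one final ret. */
static flow_stats replay_o3(void)
{
    flow_stats s = {0};
    replay_loop(&s, 0x100e0, 0x100d4, 8);   /* print "Before: " (8 characters) */
    replay_loop(&s, 0x10108, 0x100f4, 32);  /* print the 32 bits of 170 */
    replay_loop(&s, 0x10130, 0x10124, 8);   /* print "After : " (8 characters) */
    replay_loop(&s, 0x10158, 0x10144, 32);  /* print the 32 bits of 85 */
    record_jump(&s, 0x1015c, 0);            /* final ret; return address 0 is an assumption */
    return s;
}
```

Under these assumptions `replay_o3()` ends with 77 jumps (0 forward, 77 backward), 76 taken and 4 not-taken branches, the same figures the emulator reports for `numberComplementO3`.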
### readelf
- produced by gcc with -O3
```bash=
$ riscv-none-embed-readelf -h numberComplementO3
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x100c4
  Start of program headers:          52 (bytes into file)
  Start of section headers:          872 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         1
  Size of section headers:           40 (bytes)
  Number of section headers:         7
  Section header string table index: 6
```
- produced by gcc with -O0
```bash=
$ riscv-none-embed-readelf -h numberComplementO0
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x10178
  Start of program headers:          52 (bytes into file)
  Start of section headers:          1036 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         1
  Size of section headers:           40 (bytes)
  Number of section headers:         7
  Section header string table index: 6
```
- produced by gcc with -Os
```bash=
$ riscv-none-embed-readelf -h numberComplementOs
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x100b8
  Start of program headers:          52 (bytes into file)
  Start of section headers:          800 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         1
  Size of section headers:           40 (bytes)
  Number of section headers:         7
  Section header string table index: 6
```
### size
- produced by gcc with -O3
```bash=
$ riscv-none-embed-size numberComplementO3
   text    data     bss     dec     hex filename
    289       0       0     289     121 numberComplementO3
```
- produced by gcc with -O0
```bash=
$ riscv-none-embed-size numberComplementO0
   text    data     bss     dec     hex filename
    454       0       0     454     1c6 numberComplementO0
```
- produced by gcc with -Os
```bash=
$ riscv-none-embed-size numberComplementOs
   text    data     bss     dec     hex filename
    217       0       0     217     d9 numberComplementOs
```
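
The fields that `readelf -h` prints in the readelf section come from the first 52 bytes of the file, so they can be decoded by hand. The following is a minimal sketch, not the way `readelf` or `emu-rv32i` is implemented: `elf32_summary`, `rd16`, `rd32` and `parse_elf32_header` are hypothetical names, the field offsets follow the standard ELF32 header layout, and only little-endian ELF32 files such as the three binaries in this note are accepted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical summary of the ELF32 header fields shown by readelf -h. */
typedef struct {
    uint16_t type;      /* e_type: 2 is EXEC */
    uint16_t machine;   /* e_machine: 243 is RISC-V */
    uint32_t entry;     /* entry point address */
    uint32_t phoff;     /* start of program headers (bytes into file) */
    uint32_t shoff;     /* start of section headers (bytes into file) */
    uint16_t ehsize;    /* size of this header */
    uint16_t phentsize; /* size of one program header */
    uint16_t phnum;     /* number of program headers */
    uint16_t shentsize; /* size of one section header */
    uint16_t shnum;     /* number of section headers */
    uint16_t shstrndx;  /* section header string table index */
} elf32_summary;

/* Read little-endian values byte by byte, independent of host endianness. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if buf is not a little-endian ELF32 header. */
static int parse_elf32_header(const uint8_t *buf, size_t len, elf32_summary *out)
{
    static const uint8_t magic[4] = {0x7f, 'E', 'L', 'F'};

    if (len < 52 || memcmp(buf, magic, sizeof magic) != 0)
        return -1;
    if (buf[4] != 1 || buf[5] != 1) /* class must be ELF32, data must be little endian */
        return -1;

    out->type      = rd16(buf + 16);
    out->machine   = rd16(buf + 18);
    out->entry     = rd32(buf + 24);
    out->phoff     = rd32(buf + 28);
    out->shoff     = rd32(buf + 32);
    out->ehsize    = rd16(buf + 40);
    out->phentsize = rd16(buf + 42);
    out->phnum     = rd16(buf + 44);
    out->shentsize = rd16(buf + 46);
    out->shnum     = rd16(buf + 48);
    out->shstrndx  = rd16(buf + 50);
    return 0;
}
```

To use it, read the first 52 bytes of a binary such as `numberComplementO3` into a buffer and pass it to `parse_elf32_header`; for that file the `entry` field should come out as `0x100c4`, the address of `_start` in the `-O3` disassembly.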