# Lab2: RISC-V RV32I[MA] emulator with ELF support ###### tags: `Computer Architecture 2020` ## Configure some variable Need to do so everytime you boot. ```shell $ cd $HOME/riscv-none-embed-gcc $ echo "export PATH=`pwd`/8.2.0-3.1/bin:$PATH" > setenv $ cd $HOME $ source riscv-none-embed-gcc/setenv $ riscv-none-embed-gcc -v #testing ``` ## Rewrite assembly to C ```cpp int fib(int n){ if(n == 0 || n == 1) return n; return (fib(n-1)+fib(n-2)); } void _start(){ volatile char* tx = (volatile char*) 0x40002000; int res = fib(10); for (int i = 31; i >= 0; i--) if ((res >> i) & 0x01) *tx = '1'; else *tx = '0'; } ``` To make generated assembly easier, print result in binary. ## Using objdump objdump can display information from object files. One of usefull feature is flag `d` which can disassemble assembly in `.text` section. ```shell $ riscv-none-embed-objdump -d test1 ``` First of all, try to trace generated assembly in `fib`: Then I tried to compile in `O1`: ``` 00010054 <fib>: 10054: 00100793 li a5,1 10058: 04a7f263 bgeu a5,a0,1009c <fib+0x48> 1005c: ff010113 addi sp,sp,-16 10060: 00112623 sw ra,12(sp) 10064: 00812423 sw s0,8(sp) 10068: 00912223 sw s1,4(sp) 1006c: 00050413 mv s0,a0 10070: fff50513 addi a0,a0,-1 10074: fe1ff0ef jal ra,10054 <fib> 10078: 00050493 mv s1,a0 1007c: ffe40513 addi a0,s0,-2 10080: fd5ff0ef jal ra,10054 <fib> 10084: 00a48533 add a0,s1,a0 10088: 00c12083 lw ra,12(sp) 1008c: 00812403 lw s0,8(sp) 10090: 00412483 lw s1,4(sp) 10094: 01010113 addi sp,sp,16 10098: 00008067 ret 1009c: 00008067 ret ``` Trace assmebly: In 10058, only branch when parameter `a0` is zero or one (`a5` >= `a0` be true). Also, it's good to know result is store in `a0`. ``` 1006c: 00050413 mv s0,a0 10070: fff50513 addi a0,a0,-1 10074: fe1ff0ef jal ra,10054 <fib> 10078: 00050493 mv s1,a0 ``` In 1006c to 10078 is trying to evaluate `fib(n-1)`. The temp result for later addition is stored in `s1`. ``` 1007c: ffe40513 addi a0,s0,-2 10080: fd5ff0ef jal ra,10054 <fib> 10084: 00a48533 add a0,s1,a0 ``` In 1007c to 10084 is trying to evaluate second value (`fib(n-2)`) in expression. The result of `fib(n-2)` is stored in `a0`. Then, store result of `fib(n-1) + fib(n-2)` in `a0` which is return value for current function. Last thing need to do is restoring the value on stack (`ra`, `s0`, `s1`). Performance: ``` 00000000000000000000000000110111 >>> Execution time: 731802 ns >>> Instruction count: 2062 (IPS=2817702) >>> Jumps: 478 (23.18%) - 92 forwards, 386 backwards >>> Branching T=117 (48.55%) F=124 (51.45%) ``` <br> Following is compiled with flag `O3`: ``` 00010054 <fib>: 10054: 00100793 li a5,1 10058: 06a7f663 bgeu a5,a0,100c4 <fib+0x70> 1005c: fe010113 addi sp,sp,-32 10060: 01312623 sw s3,12(sp) 10064: ffe50993 addi s3,a0,-2 10068: 01212823 sw s2,16(sp) 1006c: ffe9f793 andi a5,s3,-2 10070: ffd50913 addi s2,a0,-3 10074: 00812c23 sw s0,24(sp) 10078: 00912a23 sw s1,20(sp) 1007c: 00112e23 sw ra,28(sp) 10080: fff50493 addi s1,a0,-1 10084: 40f90933 sub s2,s2,a5 10088: 00000413 li s0,0 1008c: 00048513 mv a0,s1 10090: fc5ff0ef jal ra,10054 <fib> 10094: ffe48493 addi s1,s1,-2 10098: 00a40433 add s0,s0,a0 1009c: ff2498e3 bne s1,s2,1008c <fib+0x38> 100a0: 0019f513 andi a0,s3,1 100a4: 00850533 add a0,a0,s0 100a8: 01c12083 lw ra,28(sp) 100ac: 01812403 lw s0,24(sp) 100b0: 01412483 lw s1,20(sp) 100b4: 01012903 lw s2,16(sp) 100b8: 00c12983 lw s3,12(sp) 100bc: 02010113 addi sp,sp,32 100c0: 00008067 ret 100c4: 00008067 ret ``` Performance: ```shell $ ./emu-rv32i test1 ``` ``` 00000000000000000000000000110111 >>> Execution time: 963655 ns >>> Instruction count: 1947 (IPS=2020432) >>> Jumps: 268 (13.76%) - 46 forwards, 222 backwards >>> Branching T=100 (41.67%) F=140 (58.33%) ``` In frist glance, I have no idea what the generated code did. Then, I tried to find which flag make it change a lot. Finally, I found flag `foptimize-sibling-calls` make it changes a lot! The flag's description is "The compiler optimizes tail recursive calls". Let's explaing tail call first. ```cpp int foo(int a, int b) { // some code ... return bar(b); } ``` we call `bar` for tail call which appears at the tail position of `foo`. How about **Tail Recursive Calls**? Tail recursive call is a special case of tail call where callee function is same as caller function. ```cpp int foo(int a, int b) { if (a > 0) return foo(a - 1, b + 1); // Tail recursive call else return b; } ``` Note: Trap for tail recursive call example ```cpp function foo1(data) { return a(data) + 1; } ``` The call to `a(data)` is not in tail position in `foo1`, because control must return to the caller to allow it to inspect or modify the return value before returning it. Last ralative stuff is **Sibling Calls**. Sibling call is another special case of tail call where caller function and callee function do not need to be same, but they have compatible stack footprint. It means the return types of both functions must be same, and the arguments being passed must take same stack space. ```cpp int foo(int a, int b) { // some code ... return bar(a - 1, b); } ``` Basic idea for TCO is to change procedure call into jump: Before optimization: ``` foo: call B call A ret ``` After CTO: ``` foo: call B jmp A ``` It is good to know we can just ignore `ret` and `ra` when jump into `A` is the next line called `foo`. It makes sense even ignore `ret` in `foo`. Back to `fib`, it is tail recursive call. Also, it is sibling call since the definition implies every function is a sibling of itself. Now, let's try to trace CTO assembly. First tail recursion occure in `return n;`. W/O CTO, if `n` is zero or one, `pc` will jump to `ret`.(redundant) ``` 10058: 04a7f263 bgeu a5,a0,1009c <fib+0x48> ... 10098: 00008067 ret 1009c: 00008067 ret ``` With CTO ``` 1007c: 02a7fe63 bgeu a5,a0,100b8 <fib+0x64> ... 100b8: 01248533 add a0,s1,s2 100bc: 01c12083 lw s4,8(sp) ... 100d4: 02010113 addi sp,sp,32 100d8: 00008067 ret ``` `n` is zero or one. Instead of calling another procedure, it just sum up `s1` and `s2` and store 1 in `a0`. Also restore `s1`~`s4` and `ra`. On the other hand, `return fib(n-1) + fib(n - 2)` didn't call `fib` twice. It uses `bne s0,s3,10098 <fib+0x44>` to evaluate tail recursion. ## ELF header use `readelf` to check it ``` ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 .. OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: RISC-V .. ``` We can found it is exe for RISC-V through attribute `Machine`. (It makes sense because we cross-compile it into RISC-V) ### Reference [TCO](https://en.wikipedia.org/wiki/Tail_call)