# RISC-V Toolchain 1 ###### tags: `Computer Architecture 2020` ## [Factorial](https://hackmd.io/@Tim096/rJOiVg_8v) The following assembly program was implemented by 曾鈜寬同學. ### RISC-V asm with recursion ```clike= .data argument: .word 6 str1: .string "Factorial of " str2: .string " = " .text main: lw a0, argument # n = 6 jal ra, fact mv a1, a0 lw a0, argument jal ra, printResult li a7, 10 ecall fact: addi sp, sp, -16 sw ra, 8(sp) sw a0, 0(sp) addi t0, a0, -1 bge t0, zero, nfact addi a0, zero, 1 addi sp, sp, 16 jr x1 nfact: addi a0, a0, -1 jal ra, fact addi t1, a0, 0 lw a0, 0(sp) lw ra, 8(sp) addi sp, sp, 16 mul a0, a0, t1 ret # a0: Value which factorial number was computed from # a1: Factorial result printResult: mv t0, a0 mv t1, a1 la a0, str1 li a7, 4 ecall mv a0, t0 li a7, 1 ecall la a0, str2 li a7, 4 ecall mv a0, t1 li a7, 1 ecall ret ``` ### Rewrite in C int main() was replaced by void _start() and __mulsi3 was added to implement mult. ```c= int Factorial(int n); unsigned int __mulsi3 (unsigned int a, unsigned int b); void _start() { volatile int n = Factorial(6); } int Factorial(int n) { if (n>=1) return __mulsi3(n, Factorial(n-1)); else return 1; } unsigned int __mulsi3 (unsigned int a, unsigned int b){ unsigned int r = 0; while (a) { if (a & 1) r += b; a >>= 1; b <<= 1; } return r; } ``` ### Run with rv32emu emulator ```bash $ riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -O3 -nostdlib -o fact fact.c ``` ``` smallhanley@smallhanley-VirtualBox:~/rv32emu$ ./emu-rv32i fact >>> Execution time: 1011 ns >>> Instruction count: 6 (IPS=5934718) >>> Jumps: 1 (16.67%) - 0 forwards, 1 backwards >>> Branching T=0 (-nan%) F=0 (-nan%) ``` #### objdump with O3 optimization ``` Disassembly of section .text: 00010054 <Factorial.part.0>: 10054: ff010113 addi sp,sp,-16 10058: 00812423 sw s0,8(sp) 1005c: 00112623 sw ra,12(sp) 10060: 00050413 mv s0,a0 10064: fff50513 addi a0,a0,-1 10068: 00100793 li a5,1 1006c: 00a05663 blez a0,10078 <Factorial.part.0+0x24> 10070: fe5ff0ef jal ra,10054 <Factorial.part.0> 10074: 00050793 mv a5,a0 10078: 00000513 li a0,0 1007c: 00040e63 beqz s0,10098 <Factorial.part.0+0x44> 10080: 00147713 andi a4,s0,1 10084: 00145413 srli s0,s0,0x1 10088: 00070463 beqz a4,10090 <Factorial.part.0+0x3c> 1008c: 00f50533 add a0,a0,a5 10090: 00179793 slli a5,a5,0x1 10094: fe0416e3 bnez s0,10080 <Factorial.part.0+0x2c> 10098: 00c12083 lw ra,12(sp) 1009c: 00812403 lw s0,8(sp) 100a0: 01010113 addi sp,sp,16 100a4: 00008067 ret 000100a8 <_start>: 100a8: ff010113 addi sp,sp,-16 100ac: 2d000793 li a5,720 100b0: 00f12623 sw a5,12(sp) 100b4: 01010113 addi sp,sp,16 100b8: 00008067 ret 000100bc <Factorial>: 100bc: 00a04663 bgtz a0,100c8 <Factorial+0xc> 100c0: 00100513 li a0,1 100c4: 00008067 ret 100c8: fe010113 addi sp,sp,-32 100cc: 00812c23 sw s0,24(sp) 100d0: 00912a23 sw s1,20(sp) 100d4: 00112e23 sw ra,28(sp) 100d8: 01212823 sw s2,16(sp) 100dc: 01312623 sw s3,12(sp) 100e0: 01412423 sw s4,8(sp) 100e4: 01512223 sw s5,4(sp) 100e8: 01612023 sw s6,0(sp) 100ec: fff50493 addi s1,a0,-1 100f0: 00050413 mv s0,a0 100f4: 00100793 li a5,1 100f8: 04049463 bnez s1,10140 <Factorial+0x84> 100fc: 00000513 li a0,0 10100: 00147713 andi a4,s0,1 10104: 00145413 srli s0,s0,0x1 10108: 00070463 beqz a4,10110 <Factorial+0x54> 1010c: 00f50533 add a0,a0,a5 10110: 00179793 slli a5,a5,0x1 10114: fe0416e3 bnez s0,10100 <Factorial+0x44> 10118: 01c12083 lw ra,28(sp) 1011c: 01812403 lw s0,24(sp) 10120: 01412483 lw s1,20(sp) 10124: 01012903 lw s2,16(sp) 10128: 00c12983 lw s3,12(sp) 1012c: 00812a03 lw s4,8(sp) 10130: 00412a83 lw s5,4(sp) 10134: 00012b03 lw s6,0(sp) 10138: 02010113 addi sp,sp,32 1013c: 00008067 ret 10140: ffe50913 addi s2,a0,-2 10144: 00100713 li a4,1 10148: 02091263 bnez s2,1016c <Factorial+0xb0> 1014c: 00000793 li a5,0 10150: 0014f693 andi a3,s1,1 10154: 0014d493 srli s1,s1,0x1 10158: 00068463 beqz a3,10160 <Factorial+0xa4> 1015c: 00e787b3 add a5,a5,a4 10160: 00171713 slli a4,a4,0x1 10164: fe0496e3 bnez s1,10150 <Factorial+0x94> 10168: f95ff06f j 100fc <Factorial+0x40> 1016c: ffd50993 addi s3,a0,-3 10170: 03304263 bgtz s3,10194 <Factorial+0xd8> 10174: 00000713 li a4,0 10178: 00197693 andi a3,s2,1 1017c: 00195913 srli s2,s2,0x1 10180: 00068463 beqz a3,10188 <Factorial+0xcc> 10184: 00f70733 add a4,a4,a5 10188: 00179793 slli a5,a5,0x1 1018c: fe0916e3 bnez s2,10178 <Factorial+0xbc> 10190: fbdff06f j 1014c <Factorial+0x90> 10194: ffc50a13 addi s4,a0,-4 10198: 03404263 bgtz s4,101bc <Factorial+0x100> 1019c: 00000793 li a5,0 101a0: 0019f693 andi a3,s3,1 101a4: 0019d993 srli s3,s3,0x1 101a8: 00068463 beqz a3,101b0 <Factorial+0xf4> 101ac: 00e787b3 add a5,a5,a4 101b0: 00171713 slli a4,a4,0x1 101b4: fe0996e3 bnez s3,101a0 <Factorial+0xe4> 101b8: fbdff06f j 10174 <Factorial+0xb8> 101bc: ffb50a93 addi s5,a0,-5 101c0: 03504263 bgtz s5,101e4 <Factorial+0x128> 101c4: 00000713 li a4,0 101c8: 001a7693 andi a3,s4,1 101cc: 001a5a13 srli s4,s4,0x1 101d0: 00068463 beqz a3,101d8 <Factorial+0x11c> 101d4: 00f70733 add a4,a4,a5 101d8: 00179793 slli a5,a5,0x1 101dc: fe0a16e3 bnez s4,101c8 <Factorial+0x10c> 101e0: fbdff06f j 1019c <Factorial+0xe0> 101e4: ffa50b13 addi s6,a0,-6 101e8: 03604263 bgtz s6,1020c <Factorial+0x150> 101ec: 00000793 li a5,0 101f0: 001af693 andi a3,s5,1 101f4: 001ada93 srli s5,s5,0x1 101f8: 00068463 beqz a3,10200 <Factorial+0x144> 101fc: 00e787b3 add a5,a5,a4 10200: 00171713 slli a4,a4,0x1 10204: fe0a96e3 bnez s5,101f0 <Factorial+0x134> 10208: fbdff06f j 101c4 <Factorial+0x108> 1020c: ff950513 addi a0,a0,-7 10210: 02a04263 bgtz a0,10234 <Factorial+0x178> 10214: 00000713 li a4,0 10218: 001b7693 andi a3,s6,1 1021c: 001b5b13 srli s6,s6,0x1 10220: 00068463 beqz a3,10228 <Factorial+0x16c> 10224: 00f70733 add a4,a4,a5 10228: 00179793 slli a5,a5,0x1 1022c: fe0b16e3 bnez s6,10218 <Factorial+0x15c> 10230: fbdff06f j 101ec <Factorial+0x130> 10234: e21ff0ef jal ra,10054 <Factorial.part.0> 10238: 00050793 mv a5,a0 1023c: fd9ff06f j 10214 <Factorial+0x158> 00010240 <__mulsi3>: 10240: 00050793 mv a5,a0 10244: 00000513 li a0,0 10248: 02078063 beqz a5,10268 <__mulsi3+0x28> 1024c: 0017f713 andi a4,a5,1 10250: 0017d793 srli a5,a5,0x1 10254: 00070463 beqz a4,1025c <__mulsi3+0x1c> 10258: 00b50533 add a0,a0,a1 1025c: 00159593 slli a1,a1,0x1 10260: fe0796e3 bnez a5,1024c <__mulsi3+0xc> 10264: 00008067 ret 10268: 00008067 ret ``` #### ELF Header ``` ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: RISC-V Version: 0x1 Entry point address: 0x100a8 Start of program headers: 52 (bytes into file) Start of section headers: 1084 (bytes into file) Flags: 0x0 Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 1 Size of section headers: 40 (bytes) Number of section headers: 6 Section header string table index: 5 ``` #### Size ``` text data bss dec hex filename 536 0 0 536 218 fact ``` Compare with the asm written by 曾鈜寬同學, the code generated by objdump seems interesting. Look at the following section. ``` 000100a8 <_start>: 100a8: ff010113 addi sp,sp,-16 100ac: 2d000793 li a5,720 100b0: 00f12623 sw a5,12(sp) 100b4: 01010113 addi sp,sp,16 100b8: 00008067 retS ``` ![](https://i.imgur.com/mflbtiu.png) In this section, we can find the label <_start>, which means the start position. The thing that very surpises me is that the code directly uses `li a5,720` that moves 720 to regiter a5. The number `720` is exactly the result of Factorial(6). Additionally, we can see that objdump also generates the sections about `Factorial` ans `__mulsi3` function. However, the sections except for `_start` didn't be called. This is very cool. #### Optimization comparison | | O0 | O1 | O2 | O3 | Os | |:------------------- |:----- | ----- |:----- |:---- |:----- | | Execution time (ns) | 29552 | 14202 | 12081 | 1023 | 25646 | | Instruction count | 453 | 214 | 179 | 6 | 200 | | Jumps | 59 | 59 | 38 | 1 | 58 | | Jumps forwards | 25 | 26 | 17 | 0 | 31 | | Jumps backwards | 34 | 33 | 21 | 1 | 27 | | Branching T | 20 | 17 | 18 | 0 | 20 | | Branching F | 21 | 24 | 22 | 0 | 19 |