# RISC-V Toolchain 1
###### tags: `Computer Architecture 2020`
## [Factorial](https://hackmd.io/@Tim096/rJOiVg_8v)
The following assembly program was implemented by 曾鈜寬同學.
### RISC-V asm with recursion
```clike=
.data
argument: .word 6
str1: .string "Factorial of "
str2: .string " = "
.text
main:
lw a0, argument # n = 6
jal ra, fact
mv a1, a0
lw a0, argument
jal ra, printResult
li a7, 10
ecall
fact:
addi sp, sp, -16
sw ra, 8(sp)
sw a0, 0(sp)
addi t0, a0, -1
bge t0, zero, nfact
addi a0, zero, 1
addi sp, sp, 16
jr x1
nfact:
addi a0, a0, -1
jal ra, fact
addi t1, a0, 0
lw a0, 0(sp)
lw ra, 8(sp)
addi sp, sp, 16
mul a0, a0, t1
ret
# a0: Value which factorial number was computed from
# a1: Factorial result
printResult:
mv t0, a0
mv t1, a1
la a0, str1
li a7, 4
ecall
mv a0, t0
li a7, 1
ecall
la a0, str2
li a7, 4
ecall
mv a0, t1
li a7, 1
ecall
ret
```
### Rewrite in C
int main() was replaced by void _start() and __mulsi3 was added to implement mult.
```c=
int Factorial(int n);
unsigned int __mulsi3 (unsigned int a, unsigned int b);
void _start() {
volatile int n = Factorial(6);
}
int Factorial(int n) {
if (n>=1)
return __mulsi3(n, Factorial(n-1));
else
return 1;
}
unsigned int __mulsi3 (unsigned int a, unsigned int b){
unsigned int r = 0;
while (a) {
if (a & 1)
r += b;
a >>= 1;
b <<= 1;
}
return r;
}
```
### Run with rv32emu emulator
```bash
$ riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -O3 -nostdlib -o fact fact.c
```
```
smallhanley@smallhanley-VirtualBox:~/rv32emu$ ./emu-rv32i fact
>>> Execution time: 1011 ns
>>> Instruction count: 6 (IPS=5934718)
>>> Jumps: 1 (16.67%) - 0 forwards, 1 backwards
>>> Branching T=0 (-nan%) F=0 (-nan%)
```
#### objdump with O3 optimization
```
Disassembly of section .text:
00010054 <Factorial.part.0>:
10054: ff010113 addi sp,sp,-16
10058: 00812423 sw s0,8(sp)
1005c: 00112623 sw ra,12(sp)
10060: 00050413 mv s0,a0
10064: fff50513 addi a0,a0,-1
10068: 00100793 li a5,1
1006c: 00a05663 blez a0,10078 <Factorial.part.0+0x24>
10070: fe5ff0ef jal ra,10054 <Factorial.part.0>
10074: 00050793 mv a5,a0
10078: 00000513 li a0,0
1007c: 00040e63 beqz s0,10098 <Factorial.part.0+0x44>
10080: 00147713 andi a4,s0,1
10084: 00145413 srli s0,s0,0x1
10088: 00070463 beqz a4,10090 <Factorial.part.0+0x3c>
1008c: 00f50533 add a0,a0,a5
10090: 00179793 slli a5,a5,0x1
10094: fe0416e3 bnez s0,10080 <Factorial.part.0+0x2c>
10098: 00c12083 lw ra,12(sp)
1009c: 00812403 lw s0,8(sp)
100a0: 01010113 addi sp,sp,16
100a4: 00008067 ret
000100a8 <_start>:
100a8: ff010113 addi sp,sp,-16
100ac: 2d000793 li a5,720
100b0: 00f12623 sw a5,12(sp)
100b4: 01010113 addi sp,sp,16
100b8: 00008067 ret
000100bc <Factorial>:
100bc: 00a04663 bgtz a0,100c8 <Factorial+0xc>
100c0: 00100513 li a0,1
100c4: 00008067 ret
100c8: fe010113 addi sp,sp,-32
100cc: 00812c23 sw s0,24(sp)
100d0: 00912a23 sw s1,20(sp)
100d4: 00112e23 sw ra,28(sp)
100d8: 01212823 sw s2,16(sp)
100dc: 01312623 sw s3,12(sp)
100e0: 01412423 sw s4,8(sp)
100e4: 01512223 sw s5,4(sp)
100e8: 01612023 sw s6,0(sp)
100ec: fff50493 addi s1,a0,-1
100f0: 00050413 mv s0,a0
100f4: 00100793 li a5,1
100f8: 04049463 bnez s1,10140 <Factorial+0x84>
100fc: 00000513 li a0,0
10100: 00147713 andi a4,s0,1
10104: 00145413 srli s0,s0,0x1
10108: 00070463 beqz a4,10110 <Factorial+0x54>
1010c: 00f50533 add a0,a0,a5
10110: 00179793 slli a5,a5,0x1
10114: fe0416e3 bnez s0,10100 <Factorial+0x44>
10118: 01c12083 lw ra,28(sp)
1011c: 01812403 lw s0,24(sp)
10120: 01412483 lw s1,20(sp)
10124: 01012903 lw s2,16(sp)
10128: 00c12983 lw s3,12(sp)
1012c: 00812a03 lw s4,8(sp)
10130: 00412a83 lw s5,4(sp)
10134: 00012b03 lw s6,0(sp)
10138: 02010113 addi sp,sp,32
1013c: 00008067 ret
10140: ffe50913 addi s2,a0,-2
10144: 00100713 li a4,1
10148: 02091263 bnez s2,1016c <Factorial+0xb0>
1014c: 00000793 li a5,0
10150: 0014f693 andi a3,s1,1
10154: 0014d493 srli s1,s1,0x1
10158: 00068463 beqz a3,10160 <Factorial+0xa4>
1015c: 00e787b3 add a5,a5,a4
10160: 00171713 slli a4,a4,0x1
10164: fe0496e3 bnez s1,10150 <Factorial+0x94>
10168: f95ff06f j 100fc <Factorial+0x40>
1016c: ffd50993 addi s3,a0,-3
10170: 03304263 bgtz s3,10194 <Factorial+0xd8>
10174: 00000713 li a4,0
10178: 00197693 andi a3,s2,1
1017c: 00195913 srli s2,s2,0x1
10180: 00068463 beqz a3,10188 <Factorial+0xcc>
10184: 00f70733 add a4,a4,a5
10188: 00179793 slli a5,a5,0x1
1018c: fe0916e3 bnez s2,10178 <Factorial+0xbc>
10190: fbdff06f j 1014c <Factorial+0x90>
10194: ffc50a13 addi s4,a0,-4
10198: 03404263 bgtz s4,101bc <Factorial+0x100>
1019c: 00000793 li a5,0
101a0: 0019f693 andi a3,s3,1
101a4: 0019d993 srli s3,s3,0x1
101a8: 00068463 beqz a3,101b0 <Factorial+0xf4>
101ac: 00e787b3 add a5,a5,a4
101b0: 00171713 slli a4,a4,0x1
101b4: fe0996e3 bnez s3,101a0 <Factorial+0xe4>
101b8: fbdff06f j 10174 <Factorial+0xb8>
101bc: ffb50a93 addi s5,a0,-5
101c0: 03504263 bgtz s5,101e4 <Factorial+0x128>
101c4: 00000713 li a4,0
101c8: 001a7693 andi a3,s4,1
101cc: 001a5a13 srli s4,s4,0x1
101d0: 00068463 beqz a3,101d8 <Factorial+0x11c>
101d4: 00f70733 add a4,a4,a5
101d8: 00179793 slli a5,a5,0x1
101dc: fe0a16e3 bnez s4,101c8 <Factorial+0x10c>
101e0: fbdff06f j 1019c <Factorial+0xe0>
101e4: ffa50b13 addi s6,a0,-6
101e8: 03604263 bgtz s6,1020c <Factorial+0x150>
101ec: 00000793 li a5,0
101f0: 001af693 andi a3,s5,1
101f4: 001ada93 srli s5,s5,0x1
101f8: 00068463 beqz a3,10200 <Factorial+0x144>
101fc: 00e787b3 add a5,a5,a4
10200: 00171713 slli a4,a4,0x1
10204: fe0a96e3 bnez s5,101f0 <Factorial+0x134>
10208: fbdff06f j 101c4 <Factorial+0x108>
1020c: ff950513 addi a0,a0,-7
10210: 02a04263 bgtz a0,10234 <Factorial+0x178>
10214: 00000713 li a4,0
10218: 001b7693 andi a3,s6,1
1021c: 001b5b13 srli s6,s6,0x1
10220: 00068463 beqz a3,10228 <Factorial+0x16c>
10224: 00f70733 add a4,a4,a5
10228: 00179793 slli a5,a5,0x1
1022c: fe0b16e3 bnez s6,10218 <Factorial+0x15c>
10230: fbdff06f j 101ec <Factorial+0x130>
10234: e21ff0ef jal ra,10054 <Factorial.part.0>
10238: 00050793 mv a5,a0
1023c: fd9ff06f j 10214 <Factorial+0x158>
00010240 <__mulsi3>:
10240: 00050793 mv a5,a0
10244: 00000513 li a0,0
10248: 02078063 beqz a5,10268 <__mulsi3+0x28>
1024c: 0017f713 andi a4,a5,1
10250: 0017d793 srli a5,a5,0x1
10254: 00070463 beqz a4,1025c <__mulsi3+0x1c>
10258: 00b50533 add a0,a0,a1
1025c: 00159593 slli a1,a1,0x1
10260: fe0796e3 bnez a5,1024c <__mulsi3+0xc>
10264: 00008067 ret
10268: 00008067 ret
```
#### ELF Header
```
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x100a8
Start of program headers: 52 (bytes into file)
Start of section headers: 1084 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 1
Size of section headers: 40 (bytes)
Number of section headers: 6
Section header string table index: 5
```
#### Size
```
text data bss dec hex filename
536 0 0 536 218 fact
```
Compare with the asm written by 曾鈜寬同學, the code generated by objdump seems interesting. Look at the following section.
```
000100a8 <_start>:
100a8: ff010113 addi sp,sp,-16
100ac: 2d000793 li a5,720
100b0: 00f12623 sw a5,12(sp)
100b4: 01010113 addi sp,sp,16
100b8: 00008067 retS
```

In this section, we can find the label <_start>, which means the start position. The thing that very surpises me is that the code directly uses `li a5,720` that moves 720 to regiter a5. The number `720` is exactly the result of Factorial(6). Additionally, we can see that objdump also generates the sections about `Factorial` ans `__mulsi3` function. However, the sections except for `_start` didn't be called. This is very cool.
#### Optimization comparison
| | O0 | O1 | O2 | O3 | Os |
|:------------------- |:----- | ----- |:----- |:---- |:----- |
| Execution time (ns) | 29552 | 14202 | 12081 | 1023 | 25646 |
| Instruction count | 453 | 214 | 179 | 6 | 200 |
| Jumps | 59 | 59 | 38 | 1 | 58 |
| Jumps forwards | 25 | 26 | 17 | 0 | 31 |
| Jumps backwards | 34 | 33 | 21 | 1 | 27 |
| Branching T | 20 | 17 | 18 | 0 | 20 |
| Branching F | 21 | 24 | 22 | 0 | 19 |