# Assignment2: RISC-V Toolchain
contributed by: <`huang-me`>
###### tags: `Computer Architecture`
## Rewrite
[Leetcode 191. Number of 1 Bits](https://leetcode.com/problems/number-of-1-bits/)
Write a function that takes an unsigned integer and returns the number of '1' bits it has (also known as the [Hamming weight](https://en.wikipedia.org/wiki/Hamming_weight)).
### C code
```c
int cntbit(int n) {
if(n == 0) return 0;
int cnt = 0;
while(n &= n-1)
cnt++;
cnt++;
return cnt;
}
void _start()
{
volatile char* tx = (volatile char*) 0x40002000;
volatile int *tx2 = (volatile int*) 0x40000008;
int n = 2147483647;
int cnt = cntbit(n);
char *str = "The bitcount of ";
while (*str)
*tx = *str++;
*tx2 = n;
str = " is ";
while(*str)
*tx = *str++;
*tx2 = cnt;
return;
}
```
### Disassemble the ELF
Codes compiled into RV32 binary by compiler with different compiler optimization level.
disassemble command: `$ riscv-none-embed-objdump -d countBits`
#### `-O3`
Compiled with command: `$ riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -O3 -nostdlib countBits.c -o countBits`
- Observations:
- instruction count: `97`
- stack usage: `0`
- register usage: `4` registers used
- `a0`: store value of `n`
- `a3`: store temperary value `n-1`
- `a4`: store value of `cnt`
- `a5`: store temperary value `n & (n-1)`
```
00010054 <cntbit>:
10054: 02050863 beqz a0,10084 <cntbit+0x30>
10058: fff50793 addi a5,a0,-1
1005c: 00a7f7b3 and a5,a5,a0
10060: 02078663 beqz a5,1008c <cntbit+0x38>
10064: 00000713 li a4,0
10068: fff78693 addi a3,a5,-1
1006c: 00d7f7b3 and a5,a5,a3
10070: 00070513 mv a0,a4
10074: 00170713 addi a4,a4,1
10078: fe0798e3 bnez a5,10068 <cntbit+0x14>
1007c: 00250513 addi a0,a0,2
10080: 00008067 ret
10084: 00000513 li a0,0
10088: 00008067 ret
1008c: 00100513 li a0,1
10090: 00008067 ret
```
#### `-O1`
Compiled with command: `$ riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -O1 -nostdlib countBits.c -o countBits`
- Observations:
- instruction count: `230`
- stack usage: `0`
- register usage: `3` registers used
- `a0`: store value of `n` and `cnt`
- `a4`: store temperary value `n-1`
- `a5`: store temperary value `n & (n-1)`
```
00010054 <cntbit>:
10054: 02050463 beqz a0,1007c <cntbit+0x28>
10058: fff50793 addi a5,a0,-1
1005c: 00f577b3 and a5,a0,a5
10060: 02078063 beqz a5,10080 <cntbit+0x2c>
10064: 00000513 li a0,0
10068: 00150513 addi a0,a0,1
1006c: fff78713 addi a4,a5,-1
10070: 00e7f7b3 and a5,a5,a4
10074: fe079ae3 bnez a5,10068 <cntbit+0x14>
10078: 00150513 addi a0,a0,1
1007c: 00008067 ret
10080: 00078513 mv a0,a5
10084: ff5ff06f j 10078 <cntbit+0x24>
```
#### `-O0`
Compiled with command: `$ riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -O0 -nostdlib countBits.c -o countBits`
- Observations:
- instruction count: `544`
- stack usage: `2`
- `-36(s0)`: store value of `n & (n-1)`
- `-20(s0)`: store value of `cnt`
- register usage: `3` registers used
- `a0, a4, a5` are used to store temperary value
```
00010054 <cntbit>:
10054: fd010113 addi sp,sp,-48
10058: 02812623 sw s0,44(sp)
1005c: 03010413 addi s0,sp,48
10060: fca42e23 sw a0,-36(s0)
10064: fdc42783 lw a5,-36(s0)
10068: 00079663 bnez a5,10074 <cntbit+0x20>
1006c: 00000793 li a5,0
10070: 0440006f j 100b4 <cntbit+0x60>
10074: fe042623 sw zero,-20(s0)
10078: 0100006f j 10088 <cntbit+0x34>
1007c: fec42783 lw a5,-20(s0)
10080: 00178793 addi a5,a5,1
10084: fef42623 sw a5,-20(s0)
10088: fdc42783 lw a5,-36(s0)
1008c: fff78793 addi a5,a5,-1
10090: fdc42703 lw a4,-36(s0)
10094: 00f777b3 and a5,a4,a5
10098: fcf42e23 sw a5,-36(s0)
1009c: fdc42783 lw a5,-36(s0)
100a0: fc079ee3 bnez a5,1007c <cntbit+0x28>
100a4: fec42783 lw a5,-20(s0)
100a8: 00178793 addi a5,a5,1
100ac: fef42623 sw a5,-20(s0)
100b0: fec42783 lw a5,-20(s0)
100b4: 00078513 mv a0,a5
100b8: 02c12403 lw s0,44(sp)
100bc: 03010113 addi sp,sp,48
100c0: 00008067 ret
```
### Summary
:::info
from -O0 to -O1:
- reduce `lw` and `sw` operation
from -O1 to -O3:
- replace the `jump` operation with `ret` (only operation after `jump` is ret), reducing `jump` lead to fewer instruction flushes.
:::
| | -O0 | -O1 | -O3 |
|:------------------:|:---:|:---:|:---:|
| Instruction count | 544 | 230 | 97 |
| Line of code | 345 | 197 | 185 |
| stack used (bytes) | 48 | 0 | 0 |
## Results
### -O0
```=
The bitcount of 2147483647 is 31
>>> Execution time: 162137 ns
>>> Instruction count: 544 (IPS=3355187)
>>> Jumps: 57 (10.48%) - 5 forwards, 52 backwards
>>> Branching T=51 (94.44%) F=3 (5.56%)
```
### -O1
```=
The bitcount of 2147483647 is 31
>>> Execution time: 216318 ns
>>> Instruction count: 230 (IPS=1063249)
>>> Jumps: 50 (21.74%) - 1 forwards, 49 backwards
>>> Branching T=47 (90.38%) F=5 (9.62%)
```
### -O3
```=
The bitcount of 2147483647 is 31
>>> Execution time: 68878 ns
>>> Instruction count: 97 (IPS=1408287)
>>> Jumps: 19 (19.59%) - 0 forwards, 19 backwards
>>> Branching T=18 (90.00%) F=2 (10.00%)
```