# Assignment2: RISC-V Toolchain contributed by: <`huang-me`> ###### tags: `Computer Architecture` ## Rewrite [Leetcode 191. Number of 1 Bits](https://leetcode.com/problems/number-of-1-bits/) Write a function that takes an unsigned integer and returns the number of '1' bits it has (also known as the [Hamming weight](https://en.wikipedia.org/wiki/Hamming_weight)). ### C code ```c int cntbit(int n) { if(n == 0) return 0; int cnt = 0; while(n &= n-1) cnt++; cnt++; return cnt; } void _start() { volatile char* tx = (volatile char*) 0x40002000; volatile int *tx2 = (volatile int*) 0x40000008; int n = 2147483647; int cnt = cntbit(n); char *str = "The bitcount of "; while (*str) *tx = *str++; *tx2 = n; str = " is "; while(*str) *tx = *str++; *tx2 = cnt; return; } ``` ### Disassemble the ELF Codes compiled into RV32 binary by compiler with different compiler optimization level. disassemble command: `$ riscv-none-embed-objdump -d countBits` #### `-O3` Compiled with command: `$ riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -O3 -nostdlib countBits.c -o countBits` - Observations: - instruction count: `97` - stack usage: `0` - register usage: `4` registers used - `a0`: store value of `n` - `a3`: store temperary value `n-1` - `a4`: store value of `cnt` - `a5`: store temperary value `n & (n-1)` ``` 00010054 <cntbit>: 10054: 02050863 beqz a0,10084 <cntbit+0x30> 10058: fff50793 addi a5,a0,-1 1005c: 00a7f7b3 and a5,a5,a0 10060: 02078663 beqz a5,1008c <cntbit+0x38> 10064: 00000713 li a4,0 10068: fff78693 addi a3,a5,-1 1006c: 00d7f7b3 and a5,a5,a3 10070: 00070513 mv a0,a4 10074: 00170713 addi a4,a4,1 10078: fe0798e3 bnez a5,10068 <cntbit+0x14> 1007c: 00250513 addi a0,a0,2 10080: 00008067 ret 10084: 00000513 li a0,0 10088: 00008067 ret 1008c: 00100513 li a0,1 10090: 00008067 ret ``` #### `-O1` Compiled with command: `$ riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -O1 -nostdlib countBits.c -o countBits` - Observations: - instruction count: `230` - stack usage: `0` - register usage: `3` registers used - `a0`: store value of `n` and `cnt` - `a4`: store temperary value `n-1` - `a5`: store temperary value `n & (n-1)` ``` 00010054 <cntbit>: 10054: 02050463 beqz a0,1007c <cntbit+0x28> 10058: fff50793 addi a5,a0,-1 1005c: 00f577b3 and a5,a0,a5 10060: 02078063 beqz a5,10080 <cntbit+0x2c> 10064: 00000513 li a0,0 10068: 00150513 addi a0,a0,1 1006c: fff78713 addi a4,a5,-1 10070: 00e7f7b3 and a5,a5,a4 10074: fe079ae3 bnez a5,10068 <cntbit+0x14> 10078: 00150513 addi a0,a0,1 1007c: 00008067 ret 10080: 00078513 mv a0,a5 10084: ff5ff06f j 10078 <cntbit+0x24> ``` #### `-O0` Compiled with command: `$ riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -O0 -nostdlib countBits.c -o countBits` - Observations: - instruction count: `544` - stack usage: `2` - `-36(s0)`: store value of `n & (n-1)` - `-20(s0)`: store value of `cnt` - register usage: `3` registers used - `a0, a4, a5` are used to store temperary value ``` 00010054 <cntbit>: 10054: fd010113 addi sp,sp,-48 10058: 02812623 sw s0,44(sp) 1005c: 03010413 addi s0,sp,48 10060: fca42e23 sw a0,-36(s0) 10064: fdc42783 lw a5,-36(s0) 10068: 00079663 bnez a5,10074 <cntbit+0x20> 1006c: 00000793 li a5,0 10070: 0440006f j 100b4 <cntbit+0x60> 10074: fe042623 sw zero,-20(s0) 10078: 0100006f j 10088 <cntbit+0x34> 1007c: fec42783 lw a5,-20(s0) 10080: 00178793 addi a5,a5,1 10084: fef42623 sw a5,-20(s0) 10088: fdc42783 lw a5,-36(s0) 1008c: fff78793 addi a5,a5,-1 10090: fdc42703 lw a4,-36(s0) 10094: 00f777b3 and a5,a4,a5 10098: fcf42e23 sw a5,-36(s0) 1009c: fdc42783 lw a5,-36(s0) 100a0: fc079ee3 bnez a5,1007c <cntbit+0x28> 100a4: fec42783 lw a5,-20(s0) 100a8: 00178793 addi a5,a5,1 100ac: fef42623 sw a5,-20(s0) 100b0: fec42783 lw a5,-20(s0) 100b4: 00078513 mv a0,a5 100b8: 02c12403 lw s0,44(sp) 100bc: 03010113 addi sp,sp,48 100c0: 00008067 ret ``` ### Summary :::info from -O0 to -O1: - reduce `lw` and `sw` operation from -O1 to -O3: - replace the `jump` operation with `ret` (only operation after `jump` is ret), reducing `jump` lead to fewer instruction flushes. ::: | | -O0 | -O1 | -O3 | |:------------------:|:---:|:---:|:---:| | Instruction count | 544 | 230 | 97 | | Line of code | 345 | 197 | 185 | | stack used (bytes) | 48 | 0 | 0 | ## Results ### -O0 ```= The bitcount of 2147483647 is 31 >>> Execution time: 162137 ns >>> Instruction count: 544 (IPS=3355187) >>> Jumps: 57 (10.48%) - 5 forwards, 52 backwards >>> Branching T=51 (94.44%) F=3 (5.56%) ``` ### -O1 ```= The bitcount of 2147483647 is 31 >>> Execution time: 216318 ns >>> Instruction count: 230 (IPS=1063249) >>> Jumps: 50 (21.74%) - 1 forwards, 49 backwards >>> Branching T=47 (90.38%) F=5 (9.62%) ``` ### -O3 ```= The bitcount of 2147483647 is 31 >>> Execution time: 68878 ns >>> Instruction count: 97 (IPS=1408287) >>> Jumps: 19 (19.59%) - 0 forwards, 19 backwards >>> Branching T=18 (90.00%) F=2 (10.00%) ```