# Assignment2: RISC-V Toolchain
contributed by < [`paulpeng-popo`](https://github.com/paulpeng-popo/ComputerArchitecture/tree/main/hw2) >
## Choose a question
I selected [Bitwise AND of Numbers Range](https://hackmd.io/@JoshuaLee0321/asm-fall-2023) from `JoshuaLee`, because its concept is similar to my topic which I have chosen in assignment 1.
## Analyzation of Original code
This question is asking us to find the common prefix of left and right 's binary code.
To find the common prefix of left and right, we can shift both of them to the right until they are equal.
```c
int rangeBitwiseAnd(int left, int right) {
int res = 0;
while (left != right) {
left >>= 1;
right >>= 1;
res += 1;
}
return left << res;
}
```
After making optimization by `JoshuaLee` it looks like
```c
int rangeBitwiseAnd(int L, int R) {
return L == R ? L : L & ~(0xffffffff >> clz(L ^ R));
}
```
He uses bitmask trick to remove loops in previous implementation.
To compare between these two function, I add `read_cycles` to calculate how much clock cycles will take during execution.
```c
static inline uint64_t
read_cycles(void)
{
uint64_t cycles;
__asm__ volatile("rdcycle %0" : "=r"(cycles));
return cycles;
}
```
And modify `main()`
```c
int main()
{
int left = 26, right = 30;
uint64_t cnt0 = read_cycles();
printf("%d\n", rangeBitwiseAnd(left, right));
uint64_t cyclecount = read_cycles() - cnt0;
printf("Cycle Count: %u\n", (unsigned int)cyclecount);
return 0;
}
```
Compile it
```shell
$ riscv-none-elf-gcc -march=rv32i_zicsr_zifencei -mabi=ilp32 -Wall -S -o range.s range.c
$ riscv-none-elf-gcc -march=rv32i_zicsr_zifencei -mabi=ilp32 -Wall -c -o range.o range.s
$ riscv-none-elf-gcc -o a.elf range.o
```
Result of execution on rv32emu
```
24
Cycle Count: 1573
```
Disassembly of `a.elf`
```
a.elf: file format elf32-littleriscv
Disassembly of section .text:
00010094 <exit>:
10094: 1141 add sp,sp,-16
10096: 4581 li a1,0
10098: c422 sw s0,8(sp)
1009a: c606 sw ra,12(sp)
1009c: 842a mv s0,a0
1009e: 2f1000ef jal 10b8e
...
00010142 <clz>:
10142: 1101 add sp,sp,-32
10144: ce22 sw s0,28(sp)
10146: 1000 add s0,sp,32
10148: fea42623 sw a0,-20(s0)
1014c: fec42783 lw a5,-20(s0)
10150: 8385 srl a5,a5,0x1
10152: fec42703 lw a4,-20(s0)
10156: 8fd9 or a5,a5,a4
10158: fef42623 sw a5,-20(s0)
1015c: fec42783 lw a5,-20(s0)
10160: 8389 srl a5,a5,0x2
...
00010264 <rangeBitwiseAnd2>:
10264: 1101 add sp,sp,-32
10266: ce06 sw ra,28(sp)
10268: cc22 sw s0,24(sp)
1026a: 1000 add s0,sp,32
1026c: fea42623 sw a0,-20(s0)
10270: feb42423 sw a1,-24(s0)
10274: fec42703 lw a4,-20(s0)
10278: fe842783 lw a5,-24(s0)
1027c: 00f71563 bne a4,a5,10286
...
```
Display the `a.elf` header
```
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x100c2
Start of program headers: 52 (bytes into file)
Start of section headers: 69304 (bytes into file)
Flags: 0x1, RVC, soft-float ABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
```
Program size
```
text data bss dec hex filename
51524 1876 1528 54928 d690 a.elf
```
## Make the original Assembly run properly on rv32emu
According to [rv32emu docs](https://github.com/sysprog21/rv32emu/blob/master/docs/syscall.md), `rv32emu` does not support printf system call.
Error will be like
```
$ rv32emu a.elf
unknown syscall 1
```
TODO: Modify assembly code
## Observe different compilation options result
From [gcc man page](https://man7.org/linux/man-pages/man1/gcc.1.html), there’s 8 different valid -O options
> Optimization Options
> -faggressive-loop-optimizations
> ...
> -fweb -fwhole-program -fwpa -fuse-linker-plugin --param
> name=value -O -O0 -O1 -O2 -O3 -Os -Ofast -Og
- -O/O1 (Optimize)
- -O0 (Do no optimization. Default)
- -O2 (Optimize even more)
- -O3 (Optimize yet more)
- -Os (Optimize for size)
- -Ofast (Disregard strict standards compliance)
- -Og (Optimize debugging experience)
In Makefile, we add `OPT` for variable to change compilation settings.
Default is `-O0`
```Makefile=
SRC = range
OUT = a.elf
OPT = -O0
.PHONY: clean
ASFLAGS = -march=rv32i_zicsr_zifencei -mabi=ilp32
$(OUT): $(SRC).o
riscv-none-elf-gcc -o $(OUT) $(SRC).o
$(SRC).o: $(SRC).s
riscv-none-elf-gcc $(OPT) -c -o $(SRC).o $(SRC).s
$(SRC).s: $(SRC).c
riscv-none-elf-gcc $(OPT) -S -o $(SRC).s $(SRC).c
clean:
rm -f $(SRC).s $(SRC).o $(OUT)
```
Below we can observe at code size and cycle count it took
### -O0 option
```
$ make OPT=-O0
$ riscv-none-elf-size a.elf > sizeO0
```
Cycle Count: 1573
```
text data bss dec hex filename
51524 1876 1528 54928 d690 a.elf
```
### -O1 option
```
$ make OPT=-O1
$ riscv-none-elf-size a.elf > sizeO1
```
Cycle Count: 1493
```
text data bss dec hex filename
51194 1876 1528 54598 d546 a.elf
```
### -O2 option
```
$ make OPT=-O2
$ riscv-none-elf-size a.elf > sizeO2
```
Cycle Count: 1493
```
text data bss dec hex filename
51186 1876 1528 54590 d53e a.elf
```
### -O3 option
```
$ make OPT=-O3
$ riscv-none-elf-size a.elf > sizeO3
```
Cycle Count: 1452
```
text data bss dec hex filename
51256 1876 1528 54660 d584 a.elf
```
### -Os option
```
$ make OPT=-Os
$ riscv-none-elf-size a.elf > sizeOs
```
Cycle Count: 1493
```
text data bss dec hex filename
51188 1876 1528 54592 d540 a.elf
```
### -Ofast option
```
$ make OPT=-Ofast
$ riscv-none-elf-size a.elf > sizeOfast
```
Cycle Count: 1452
```
text data bss dec hex filename
51256 1876 1528 54660 d584 a.elf
```
### -Og option
```
$ make OPT=-Og
$ riscv-none-elf-size a.elf > sizeOg
```
Cycle Count: 1506
```
text data bss dec hex filename
51176 1876 1528 54580 d534 a.elf
```
### Conclusion
**CSR**
| Name | cycle count | size |
| ------ | -------- | ----- |
| O0 | 1573 | 51524 |
| O1 | 1493 | 51194 |
| O2 | 1493 | 51186 |
| O3 | 1452 | 51256 |
| Os | 1493 | 51188 |
| Ofast | 1452 | 51256 |
Besides option `-O0`, other options explicitly reduce the cycle count. The reason could be that the code is too simple, there is no more room for optimization.
And with option `-Os`, it indeed reduce the code size compare to others.
:::warning
Improve your English writing. Explain more according to the experiments.
:notes: jserv
:::