# Assignment2: GNU Toolchain
Contributed by [Yan-You Chen](https://github.com/y0y0alex/CA/tree/main/homework2)
## Install Ubuntu on VirtualBox
My computer runs Windows, so I use VirtualBox with Ubuntu to run rv32emu.
### Issues encountered during installation
The first problem I encountered was a version problem.
I installed ==Ubuntu Linux 20.04-LTS== on VirtualBox, following the [installation suggestions](https://hackmd.io/@sysprog/SJAR5XMmi).
After running ==make== in rv32emu, the next step is ==make check==, but it failed.
#### Failure message
```
ca@ca-VirtualBox:~rc32emu$ make check
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Running hello.elf ... Failed.
make: *** [Makefile: 153 : check] Error 1
```
After searching online and asking the professor, I learned that Ubuntu should be updated to ==Ubuntu Linux 22.04== or a later version.
## Basic instructions for using the RISC-V GCC
The GCC used with rv32emu is ==riscv-none-elf-gcc==.
* Set the environment variables
> cd $HOME
> source riscv-none-elf-gcc/setenv
* Compile
> riscv-none-elf-gcc -march=rv32i -mabi=ilp32 -O3 -o test test.c
* Disassemble the machine instructions into assembler mnemonics and save them to a .txt file
> riscv-none-elf-objdump -d test > test.txt
* Check the ELF header
> riscv-none-elf-readelf -h ./test
* Check the ELF size
> riscv-none-elf-size ./test
* Run the code
> build/rv32emu test
## What you should know before modifying the code
To read the CSRs, we build on the [perfcounter](https://github.com/sysprog21/rv32emu/tree/master/tests/perfcounter) test.
So we first need to understand what perfcounter does.
#### perfcounter/main.c
```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* These functions live in the same perfcounter directory and read the CSRs. */
extern uint64_t get_cycles();
extern uint64_t get_instret();

int main(void)
{
    /* measure cycles */
    uint64_t instret = get_instret();
    uint64_t oldcount = get_cycles();
    /* put the C code you want to measure here */
    uint64_t cyclecount = get_cycles() - oldcount;

    printf("cycle count: %u\n", (unsigned int) cyclecount);
    printf("instret: %x\n", (unsigned) (instret & 0xffffffff));
    return 0;
}
```
This code is the base template I adapted from perfcounter to read the CSRs.
Another important file you need to check is the Makefile.
#### Makefile
```makefile
.PHONY: clean
include ../../mk/toolchain.mk
# Change the optimization level here (-O1, -O2, -Ofast, ...)
CFLAGS = -march=rv32i_zicsr_zifencei -mabi=ilp32 -O1 -Wall
# If you want to add a file, add its object file to this list
OBJS = \
    getcycles.o \
    getinstret.o \
    main.o
# The final executable that is generated
BIN = perfcount.elf
# Recipe lines must start with a tab character
%.o: %.S
	$(CROSS_COMPILE)gcc $(CFLAGS) -c -o $@ $<
%.o: %.c
	$(CROSS_COMPILE)gcc $(CFLAGS) -c -o $@ $<
all: $(BIN)
$(BIN): $(OBJS)
	$(CROSS_COMPILE)gcc -o $@ $^
clean:
	$(RM) $(BIN) $(OBJS)
```
After you finish those steps, you can start working.
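The `get_cycles` and `get_instret` helpers declared in the template are written in assembly. On RV32 a 64-bit counter is exposed as two 32-bit CSRs (for example `cycle` and `cycleh`), so a consistent value is usually obtained by reading the high word, the low word, and the high word again, and retrying if the high word changed. The following is only a sketch of that pattern in portable C. The names `read_counter64`, `read_word_fn`, `fake_read_hi`, `fake_read_lo`, and `fake_counter` are hypothetical stand-ins for the real CSR reads, so that the logic can be tried on a host machine.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit CSR read (a `csrr` on the target). */
typedef uint32_t (*read_word_fn)(void);

/* Simulated 64-bit counter, placed just below a low-word rollover. */
static uint64_t fake_counter = 0xFFFFFFFEu;

uint32_t fake_read_hi(void)
{
    return (uint32_t) (fake_counter >> 32);
}

uint32_t fake_read_lo(void)
{
    uint32_t lo = (uint32_t) fake_counter;
    fake_counter += 3; /* pretend a few cycles pass between reads */
    return lo;
}

/* Read high, low, high; retry when the high word changed in between. */
uint64_t read_counter64(read_word_fn read_hi, read_word_fn read_lo)
{
    uint32_t hi, lo, hi_again;
    do {
        hi = read_hi();
        lo = read_lo();
        hi_again = read_hi();
    } while (hi != hi_again);
    return ((uint64_t) hi << 32) | lo;
}
```

With the simulated counter, the first attempt straddles the rollover (the high word reads 0, then 1) and is discarded; the second attempt returns a consistent pair.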
## Example workflow
### 1. cd to the directory where you placed the code
```
ca@ca-VirtualBox:~rv32emu/tests/hw2$ ls
getcycle.S getinstret.S main.c Makefile
```
:::warning
:warning: Don't put the screenshots which contain plain text only. Instead, utilize HackMD syntax to annotate the text.
:notes: jserv
:::
### 2. Use `make` to compile your code

The Makefile compiles the files for you.
### 3. Execute the .elf file to check the result

### 4. To remove the generated files, use `make clean`

---
## Choose a Question
* Problem: I chose **Implement log base power of 2 with CLZ** from [洪胤勛](https://hackmd.io/@KXkA4u0LQuyNTwOorDw2RA/HJSryZ4g6)
* Motivation: It is interesting that a basic math operation can be carried out with bit shifts.
### The original code
```c
#include <stdint.h>
#include <stdio.h>

uint16_t count_leading_zeros(uint64_t x)
{
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x |= (x >> 16);
    x |= (x >> 32);

    /* count ones (population count) */
    x -= ((x >> 1) & 0x5555555555555555);
    x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333);
    x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f;
    x += (x >> 8);
    x += (x >> 16);
    x += (x >> 32);

    return (64 - (x & 0x7f));
}

// log base power of 2
uint16_t logp2(int power, uint16_t clz)
{
    uint16_t result = 0;
    int tmp = 64 - clz;
    while (1) {
        tmp -= power;
        if (tmp <= 0)
            break;
        result++;
    }
    return result;
}

int main(int argc, char* argv[])
{
    uint64_t a = 64;
    uint16_t clz = count_leading_zeros(a);
    printf("%d\n", logp2(2, clz));
    return 0;
}
```
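The idea behind `count_leading_zeros` is that OR-ing the shifted copies sets every bit below the highest set bit, so the population count equals the position of that bit plus one, and `64` minus that count is the number of leading zeros. For the input 64 (bit 6 set) the smeared value is `0x7f`, the population count is 7, and the result is 57. As a rough cross-check, the sketch below compares the same technique with a naive bit-by-bit scan. The names `clz64_smear` and `clz64_naive` are hypothetical and are not part of the original code.

```c
#include <stdint.h>

/* Same bit-smearing + population-count approach as the code above. */
uint16_t clz64_smear(uint64_t x)
{
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x |= (x >> 16);
    x |= (x >> 32);

    /* count ones (population count) */
    x -= ((x >> 1) & 0x5555555555555555ULL);
    x = ((x >> 2) & 0x3333333333333333ULL) + (x & 0x3333333333333333ULL);
    x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0fULL;
    x += (x >> 8);
    x += (x >> 16);
    x += (x >> 32);

    return (uint16_t) (64 - (x & 0x7f));
}

/* Hypothetical reference: test bits from the top until a 1 is found. */
uint16_t clz64_naive(uint64_t x)
{
    uint16_t n = 0;
    for (uint64_t mask = 1ULL << 63; mask != 0 && (x & mask) == 0; mask >>= 1)
        n++;
    return n;
}
```

Both functions should agree on every input, including the edge cases 0 (64 leading zeros) and a value with the top bit set (0 leading zeros).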
### Code modified to read the CSRs
```c
#include <stdint.h>
#include <stdio.h>

extern uint64_t get_cycles();
extern uint64_t get_instret();

uint16_t count_leading_zeros(uint64_t x)
{
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x |= (x >> 16);
    x |= (x >> 32);

    /* count ones (population count) */
    x -= ((x >> 1) & 0x5555555555555555);
    x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333);
    x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f;
    x += (x >> 8);
    x += (x >> 16);
    x += (x >> 32);

    return (64 - (x & 0x7f));
}

// log base power of 2
uint16_t logp2(int power, uint16_t clz)
{
    uint16_t result = 0;
    int tmp = 64 - clz;
    while (1) {
        tmp -= power;
        if (tmp <= 0)
            break;
        result++;
    }
    return result;
}

int main(void)
{
    /* measure cycles */
    uint64_t instret = get_instret();
    uint64_t oldcount = get_cycles();

    uint64_t a = 64;
    uint16_t clz = count_leading_zeros(a);
    uint16_t ans = logp2(1, clz);

    uint64_t cyclecount = get_cycles() - oldcount;

    printf("cycle count: %u\n", (unsigned int) cyclecount);
    printf("instret: %x\n", (unsigned) (instret & 0xffffffff));
    /* a is unsigned 64-bit, so print it with %llu */
    printf("Input data is : %llu\n", (unsigned long long) a);
    printf("The log based 2 is : %d\n", ans);
    return 0;
}
```
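The loop in `logp2` subtracts `power` from the bit length `64 - clz` until the remainder is no longer positive, and counts one fewer than the number of subtractions. For a bit length of at least 1 and `power >= 1`, this works out to `(bits - 1) / power` with integer division. The sketch below places the loop next to that closed form so the arithmetic can be checked. `logp2_loop` and `logp2_closed` are hypothetical names, and the closed form is an illustration rather than part of the original code. On RV32I without the M extension the division would be compiled into a library call, so it is not offered as an optimization.

```c
#include <stdint.h>

/* Loop version, as in the code above. */
uint16_t logp2_loop(int power, uint16_t clz)
{
    uint16_t result = 0;
    int tmp = 64 - clz;
    while (1) {
        tmp -= power;
        if (tmp <= 0)
            break;
        result++;
    }
    return result;
}

/* Hypothetical closed form, assuming power >= 1. */
uint16_t logp2_closed(int power, uint16_t clz)
{
    int bits = 64 - clz; /* bit length of the original input */
    if (bits <= 0)
        return 0;        /* input was 0: the loop also returns 0 */
    return (uint16_t) ((bits - 1) / power);
}
```

For the input 64, `clz` is 57 and the bit length is 7, so both versions give 6 with `power = 1` (log base 2) and 3 with `power = 2` (log base 4).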
## **Compare Assembly Code**
```
All tests use 64 as the input data and compute the base-2 logarithm.
```
## -O1 Optimized Assembly Code
### Assembly code
```c
0001016c <count_leading_zeros>:
1016c: 01f59713 sll a4,a1,0x1f
10170: 00155793 srl a5,a0,0x1
10174: 00f767b3 or a5,a4,a5
10178: 0015d713 srl a4,a1,0x1
1017c: 00a7e533 or a0,a5,a0
10180: 00b765b3 or a1,a4,a1
10184: 01e59713 sll a4,a1,0x1e
10188: 00255793 srl a5,a0,0x2
1018c: 00f767b3 or a5,a4,a5
10190: 0025d613 srl a2,a1,0x2
10194: 00a7e533 or a0,a5,a0
10198: 00b66633 or a2,a2,a1
1019c: 01c61713 sll a4,a2,0x1c
101a0: 00455793 srl a5,a0,0x4
101a4: 00f767b3 or a5,a4,a5
101a8: 00465693 srl a3,a2,0x4
101ac: 00a7e733 or a4,a5,a0
101b0: 00c6e6b3 or a3,a3,a2
101b4: 01869613 sll a2,a3,0x18
101b8: 00875793 srl a5,a4,0x8
101bc: 00f667b3 or a5,a2,a5
101c0: 0086d613 srl a2,a3,0x8
101c4: 00e7e7b3 or a5,a5,a4
101c8: 00d66633 or a2,a2,a3
101cc: 01061713 sll a4,a2,0x10
101d0: 0107d693 srl a3,a5,0x10
101d4: 00d766b3 or a3,a4,a3
101d8: 01065713 srl a4,a2,0x10
101dc: 00f6e6b3 or a3,a3,a5
101e0: 00c76733 or a4,a4,a2
101e4: 00d766b3 or a3,a4,a3
101e8: 01f71613 sll a2,a4,0x1f
101ec: 0016d793 srl a5,a3,0x1
101f0: 00f667b3 or a5,a2,a5
101f4: 00175593 srl a1,a4,0x1
101f8: 55555637 lui a2,0x55555
101fc: 55560613 add a2,a2,1365 # 55555555 <__BSS_END__+0x55537805>
10200: 00c7f7b3 and a5,a5,a2
10204: 00c5f633 and a2,a1,a2
10208: 40f687b3 sub a5,a3,a5
1020c: 00f6b6b3 sltu a3,a3,a5
10210: 40c70733 sub a4,a4,a2
10214: 40d70733 sub a4,a4,a3
10218: 01e71613 sll a2,a4,0x1e
1021c: 0027d693 srl a3,a5,0x2
10220: 00d666b3 or a3,a2,a3
10224: 00275593 srl a1,a4,0x2
10228: 33333637 lui a2,0x33333
1022c: 33360613 add a2,a2,819 # 33333333 <__BSS_END__+0x333155e3>
10230: 00c6f6b3 and a3,a3,a2
10234: 00c5f5b3 and a1,a1,a2
10238: 00c7f7b3 and a5,a5,a2
1023c: 00c77733 and a4,a4,a2
10240: 00f687b3 add a5,a3,a5
10244: 00d7b6b3 sltu a3,a5,a3
10248: 00e58733 add a4,a1,a4
1024c: 00e686b3 add a3,a3,a4
10250: 01c69613 sll a2,a3,0x1c
10254: 0047d713 srl a4,a5,0x4
10258: 00e66733 or a4,a2,a4
1025c: 0046d613 srl a2,a3,0x4
10260: 00f707b3 add a5,a4,a5
10264: 00e7b733 sltu a4,a5,a4
10268: 00d606b3 add a3,a2,a3
1026c: 00d70733 add a4,a4,a3
10270: 0f0f16b7 lui a3,0xf0f1
10274: f0f68693 add a3,a3,-241 # f0f0f0f <__BSS_END__+0xf0d31bf>
10278: 00d7f7b3 and a5,a5,a3
1027c: 00d77733 and a4,a4,a3
10280: 01871613 sll a2,a4,0x18
10284: 0087d693 srl a3,a5,0x8
10288: 00d666b3 or a3,a2,a3
1028c: 00875613 srl a2,a4,0x8
10290: 00f687b3 add a5,a3,a5
10294: 00d7b6b3 sltu a3,a5,a3
10298: 00e60733 add a4,a2,a4
1029c: 00e686b3 add a3,a3,a4
102a0: 01069613 sll a2,a3,0x10
102a4: 0107d713 srl a4,a5,0x10
102a8: 00e66733 or a4,a2,a4
102ac: 0106d613 srl a2,a3,0x10
102b0: 00f707b3 add a5,a4,a5
102b4: 00e7b733 sltu a4,a5,a4
102b8: 00d606b3 add a3,a2,a3
102bc: 00d70733 add a4,a4,a3
102c0: 00f70733 add a4,a4,a5
102c4: 07f77713 and a4,a4,127
102c8: 04000513 li a0,64
102cc: 40e50533 sub a0,a0,a4
102d0: 01051513 sll a0,a0,0x10
102d4: 01055513 srl a0,a0,0x10
102d8: 00008067 ret
000102dc <logp2>:
102dc: 00050713 mv a4,a0
102e0: 04000793 li a5,64
102e4: 40b785b3 sub a1,a5,a1
102e8: 40a585b3 sub a1,a1,a0
102ec: 02b05063 blez a1,1030c <logp2+0x30>
102f0: 00000513 li a0,0
102f4: 00150513 add a0,a0,1
102f8: 01051513 sll a0,a0,0x10
102fc: 01055513 srl a0,a0,0x10
10300: 40e585b3 sub a1,a1,a4
10304: feb048e3 bgtz a1,102f4 <logp2+0x18>
10308: 00008067 ret
1030c: 00000513 li a0,0
10310: 00008067 ret
00010314 <main>:
10314: ff010113 add sp,sp,-16
10318: 00112623 sw ra,12(sp)
1031c: 00812423 sw s0,8(sp)
10320: 00912223 sw s1,4(sp)
10324: 01212023 sw s2,0(sp)
10328: e31ff0ef jal 10158 <get_instret>
1032c: 00050493 mv s1,a0
10330: e15ff0ef jal 10144 <get_cycles>
10334: 00050913 mv s2,a0
10338: 03900593 li a1,57
1033c: 00100513 li a0,1
10340: f9dff0ef jal 102dc <logp2>
10344: 00050413 mv s0,a0
10348: dfdff0ef jal 10144 <get_cycles>
1034c: 412505b3 sub a1,a0,s2
10350: 0001c537 lui a0,0x1c
10354: c3850513 add a0,a0,-968 # 1bc38 <__clzsi2+0x72>
10358: 430000ef jal 10788 <printf>
1035c: 00048593 mv a1,s1
10360: 0001c537 lui a0,0x1c
10364: c4c50513 add a0,a0,-948 # 1bc4c <__clzsi2+0x86>
10368: 420000ef jal 10788 <printf>
1036c: 04000613 li a2,64
10370: 00000693 li a3,0
10374: 0001c537 lui a0,0x1c
10378: c5c50513 add a0,a0,-932 # 1bc5c <__clzsi2+0x96>
1037c: 40c000ef jal 10788 <printf>
10380: 00040593 mv a1,s0
10384: 0001c537 lui a0,0x1c
10388: c7450513 add a0,a0,-908 # 1bc74 <__clzsi2+0xae>
1038c: 3fc000ef jal 10788 <printf>
10390: 00000513 li a0,0
10394: 00c12083 lw ra,12(sp)
10398: 00812403 lw s0,8(sp)
1039c: 00412483 lw s1,4(sp)
103a0: 00012903 lw s2,0(sp)
103a4: 01010113 add sp,sp,16
103a8: 00008067 ret
```
### elf size

### execute

* Observation:
    * Lines of code: `147`
    * Allocates `16` bytes on the stack
    * Registers used: `ra`, `sp`, `s0~s2`, `a0~a5`
    * Number of `lw` and `sw`: `4` and `4`
* Execution output:
    * cycle count: 48
    * instret: 2c6
## -O2 Optimized Assembly Code
### Assembly code
```c
000101ec <count_leading_zeros>:
101ec: 01f59713 sll a4,a1,0x1f
101f0: 00155793 srl a5,a0,0x1
101f4: 00f767b3 or a5,a4,a5
101f8: 0015d713 srl a4,a1,0x1
101fc: 00b765b3 or a1,a4,a1
10200: 00a7e533 or a0,a5,a0
10204: 01e59713 sll a4,a1,0x1e
10208: 00255793 srl a5,a0,0x2
1020c: 00f767b3 or a5,a4,a5
10210: 0025d713 srl a4,a1,0x2
10214: 00b76733 or a4,a4,a1
10218: 00a7e533 or a0,a5,a0
1021c: 01c71693 sll a3,a4,0x1c
10220: 00455793 srl a5,a0,0x4
10224: 00f6e7b3 or a5,a3,a5
10228: 00475693 srl a3,a4,0x4
1022c: 00e6e6b3 or a3,a3,a4
10230: 00a7e733 or a4,a5,a0
10234: 01869613 sll a2,a3,0x18
10238: 00875793 srl a5,a4,0x8
1023c: 00f667b3 or a5,a2,a5
10240: 0086d613 srl a2,a3,0x8
10244: 00d66633 or a2,a2,a3
10248: 00e7e7b3 or a5,a5,a4
1024c: 0107d693 srl a3,a5,0x10
10250: 01061713 sll a4,a2,0x10
10254: 00d766b3 or a3,a4,a3
10258: 01065713 srl a4,a2,0x10
1025c: 00c76733 or a4,a4,a2
10260: 00f6e6b3 or a3,a3,a5
10264: 00d766b3 or a3,a4,a3
10268: 01f71593 sll a1,a4,0x1f
1026c: 0016d793 srl a5,a3,0x1
10270: 55555637 lui a2,0x55555
10274: 55560613 add a2,a2,1365 # 55555555 <__BSS_END__+0x55537805>
10278: 00f5e7b3 or a5,a1,a5
1027c: 00c7f7b3 and a5,a5,a2
10280: 00175593 srl a1,a4,0x1
10284: 40f687b3 sub a5,a3,a5
10288: 00c5f633 and a2,a1,a2
1028c: 00f6b6b3 sltu a3,a3,a5
10290: 40c70733 sub a4,a4,a2
10294: 40d70733 sub a4,a4,a3
10298: 01e71593 sll a1,a4,0x1e
1029c: 0027d693 srl a3,a5,0x2
102a0: 33333637 lui a2,0x33333
102a4: 33360613 add a2,a2,819 # 33333333 <__BSS_END__+0x333155e3>
102a8: 00d5e6b3 or a3,a1,a3
102ac: 00c6f6b3 and a3,a3,a2
102b0: 00275593 srl a1,a4,0x2
102b4: 00c7f7b3 and a5,a5,a2
102b8: 00f687b3 add a5,a3,a5
102bc: 00c5f5b3 and a1,a1,a2
102c0: 00c77733 and a4,a4,a2
102c4: 00e58733 add a4,a1,a4
102c8: 00d7b6b3 sltu a3,a5,a3
102cc: 00e686b3 add a3,a3,a4
102d0: 01c69613 sll a2,a3,0x1c
102d4: 0047d713 srl a4,a5,0x4
102d8: 00e66733 or a4,a2,a4
102dc: 00f707b3 add a5,a4,a5
102e0: 0046d613 srl a2,a3,0x4
102e4: 00d60633 add a2,a2,a3
102e8: 00e7b733 sltu a4,a5,a4
102ec: 0f0f16b7 lui a3,0xf0f1
102f0: f0f68693 add a3,a3,-241 # f0f0f0f <__BSS_END__+0xf0d31bf>
102f4: 00c70733 add a4,a4,a2
102f8: 00d77733 and a4,a4,a3
102fc: 00d7f7b3 and a5,a5,a3
10300: 01871613 sll a2,a4,0x18
10304: 0087d693 srl a3,a5,0x8
10308: 00d666b3 or a3,a2,a3
1030c: 00f687b3 add a5,a3,a5
10310: 00875613 srl a2,a4,0x8
10314: 00e60733 add a4,a2,a4
10318: 00d7b6b3 sltu a3,a5,a3
1031c: 00e686b3 add a3,a3,a4
10320: 01069613 sll a2,a3,0x10
10324: 0107d713 srl a4,a5,0x10
10328: 00e66733 or a4,a2,a4
1032c: 00f707b3 add a5,a4,a5
10330: 0106d613 srl a2,a3,0x10
10334: 00e7b733 sltu a4,a5,a4
10338: 00d606b3 add a3,a2,a3
1033c: 00d70733 add a4,a4,a3
10340: 00f70733 add a4,a4,a5
10344: 07f77713 and a4,a4,127
10348: 04000513 li a0,64
1034c: 40e50533 sub a0,a0,a4
10350: 01051513 sll a0,a0,0x10
10354: 01055513 srl a0,a0,0x10
10358: 00008067 ret
0001035c <logp2>:
1035c: 04000793 li a5,64
10360: 40b785b3 sub a1,a5,a1
10364: 40a585b3 sub a1,a1,a0
10368: 00050713 mv a4,a0
1036c: 00000513 li a0,0
10370: 00b05e63 blez a1,1038c <logp2+0x30>
10374: 00150513 add a0,a0,1
10378: 01051513 sll a0,a0,0x10
1037c: 40e585b3 sub a1,a1,a4
10380: 01055513 srl a0,a0,0x10
10384: feb048e3 bgtz a1,10374 <logp2+0x18>
10388: 00008067 ret
1038c: 00008067 ret
000100b0 <main>:
100b0: ff010113 add sp,sp,-16
100b4: 00112623 sw ra,12(sp)
100b8: 00812423 sw s0,8(sp)
100bc: 00912223 sw s1,4(sp)
100c0: 118000ef jal 101d8 <get_instret>
100c4: 00050413 mv s0,a0
100c8: 0fc000ef jal 101c4 <get_cycles>
100cc: 00050493 mv s1,a0
100d0: 0f4000ef jal 101c4 <get_cycles>
100d4: 409505b3 sub a1,a0,s1
100d8: 0001c537 lui a0,0x1c
100dc: c1850513 add a0,a0,-1000 # 1bc18 <__clzsi2+0x6e>
100e0: 68c000ef jal 1076c <printf>
100e4: 0001c537 lui a0,0x1c
100e8: 00040593 mv a1,s0
100ec: c2c50513 add a0,a0,-980 # 1bc2c <__clzsi2+0x82>
100f0: 67c000ef jal 1076c <printf>
100f4: 0001c537 lui a0,0x1c
100f8: 04000613 li a2,64
100fc: 00000693 li a3,0
10100: c3c50513 add a0,a0,-964 # 1bc3c <__clzsi2+0x92>
10104: 668000ef jal 1076c <printf>
10108: 0001c537 lui a0,0x1c
1010c: 00600593 li a1,6
10110: c5450513 add a0,a0,-940 # 1bc54 <__clzsi2+0xaa>
10114: 658000ef jal 1076c <printf>
10118: 00c12083 lw ra,12(sp)
1011c: 00812403 lw s0,8(sp)
10120: 00412483 lw s1,4(sp)
10124: 00000513 li a0,0
10128: 01010113 add sp,sp,16
1012c: 00008067 ret
```
### elf size

### execute

* Observation:
    * Lines of code: `140`
    * Allocates `16` bytes on the stack
    * Registers used: `ra`, `sp`, `s0~s1`, `a0~a5`
    * Number of `lw` and `sw`: `3` and `3`
* Execution output:
    * cycle count: 7
    * instret: 2c5
## -O3 Optimized Assembly Code
### Assembly code
```c
000101ec <count_leading_zeros>:
101ec: 01f59713 sll a4,a1,0x1f
101f0: 00155793 srl a5,a0,0x1
101f4: 00f767b3 or a5,a4,a5
101f8: 0015d713 srl a4,a1,0x1
101fc: 00b765b3 or a1,a4,a1
10200: 00a7e533 or a0,a5,a0
10204: 01e59713 sll a4,a1,0x1e
10208: 00255793 srl a5,a0,0x2
1020c: 00f767b3 or a5,a4,a5
10210: 0025d713 srl a4,a1,0x2
10214: 00b76733 or a4,a4,a1
10218: 00a7e533 or a0,a5,a0
1021c: 01c71693 sll a3,a4,0x1c
10220: 00455793 srl a5,a0,0x4
10224: 00f6e7b3 or a5,a3,a5
10228: 00475693 srl a3,a4,0x4
1022c: 00e6e6b3 or a3,a3,a4
10230: 00a7e733 or a4,a5,a0
10234: 01869613 sll a2,a3,0x18
10238: 00875793 srl a5,a4,0x8
1023c: 00f667b3 or a5,a2,a5
10240: 0086d613 srl a2,a3,0x8
10244: 00d66633 or a2,a2,a3
10248: 00e7e7b3 or a5,a5,a4
1024c: 0107d693 srl a3,a5,0x10
10250: 01061713 sll a4,a2,0x10
10254: 00d766b3 or a3,a4,a3
10258: 01065713 srl a4,a2,0x10
1025c: 00c76733 or a4,a4,a2
10260: 00f6e6b3 or a3,a3,a5
10264: 00d766b3 or a3,a4,a3
10268: 01f71593 sll a1,a4,0x1f
1026c: 0016d793 srl a5,a3,0x1
10270: 55555637 lui a2,0x55555
10274: 55560613 add a2,a2,1365 # 55555555 <__BSS_END__+0x55537805>
10278: 00f5e7b3 or a5,a1,a5
1027c: 00c7f7b3 and a5,a5,a2
10280: 00175593 srl a1,a4,0x1
10284: 40f687b3 sub a5,a3,a5
10288: 00c5f633 and a2,a1,a2
1028c: 00f6b6b3 sltu a3,a3,a5
10290: 40c70733 sub a4,a4,a2
10294: 40d70733 sub a4,a4,a3
10298: 01e71593 sll a1,a4,0x1e
1029c: 0027d693 srl a3,a5,0x2
102a0: 33333637 lui a2,0x33333
102a4: 33360613 add a2,a2,819 # 33333333 <__BSS_END__+0x333155e3>
102a8: 00d5e6b3 or a3,a1,a3
102ac: 00c6f6b3 and a3,a3,a2
102b0: 00275593 srl a1,a4,0x2
102b4: 00c7f7b3 and a5,a5,a2
102b8: 00f687b3 add a5,a3,a5
102bc: 00c5f5b3 and a1,a1,a2
102c0: 00c77733 and a4,a4,a2
102c4: 00e58733 add a4,a1,a4
102c8: 00d7b6b3 sltu a3,a5,a3
102cc: 00e686b3 add a3,a3,a4
102d0: 01c69613 sll a2,a3,0x1c
102d4: 0047d713 srl a4,a5,0x4
102d8: 00e66733 or a4,a2,a4
102dc: 00f707b3 add a5,a4,a5
102e0: 0046d613 srl a2,a3,0x4
102e4: 00d60633 add a2,a2,a3
102e8: 00e7b733 sltu a4,a5,a4
102ec: 0f0f16b7 lui a3,0xf0f1
102f0: f0f68693 add a3,a3,-241 # f0f0f0f <__BSS_END__+0xf0d31bf>
102f4: 00c70733 add a4,a4,a2
102f8: 00d77733 and a4,a4,a3
102fc: 00d7f7b3 and a5,a5,a3
10300: 01871613 sll a2,a4,0x18
10304: 0087d693 srl a3,a5,0x8
10308: 00d666b3 or a3,a2,a3
1030c: 00f687b3 add a5,a3,a5
10310: 00875613 srl a2,a4,0x8
10314: 00e60733 add a4,a2,a4
10318: 00d7b6b3 sltu a3,a5,a3
1031c: 00e686b3 add a3,a3,a4
10320: 01069613 sll a2,a3,0x10
10324: 0107d713 srl a4,a5,0x10
10328: 00e66733 or a4,a2,a4
1032c: 00f707b3 add a5,a4,a5
10330: 0106d613 srl a2,a3,0x10
10334: 00e7b733 sltu a4,a5,a4
10338: 00d606b3 add a3,a2,a3
1033c: 00d70733 add a4,a4,a3
10340: 00f70733 add a4,a4,a5
10344: 07f77713 and a4,a4,127
10348: 04000513 li a0,64
1034c: 40e50533 sub a0,a0,a4
10350: 01051513 sll a0,a0,0x10
10354: 01055513 srl a0,a0,0x10
10358: 00008067 ret
0001035c <logp2>:
1035c: 04000793 li a5,64
10360: 40b785b3 sub a1,a5,a1
10364: 40a585b3 sub a1,a1,a0
10368: 00050713 mv a4,a0
1036c: 00000513 li a0,0
10370: 00b05e63 blez a1,1038c <logp2+0x30>
10374: 00150513 add a0,a0,1
10378: 01051513 sll a0,a0,0x10
1037c: 40e585b3 sub a1,a1,a4
10380: 01055513 srl a0,a0,0x10
10384: feb048e3 bgtz a1,10374 <logp2+0x18>
10388: 00008067 ret
1038c: 00008067 ret
000100b0 <main>:
100b0: ff010113 add sp,sp,-16
100b4: 00112623 sw ra,12(sp)
100b8: 00812423 sw s0,8(sp)
100bc: 00912223 sw s1,4(sp)
100c0: 118000ef jal 101d8 <get_instret>
100c4: 00050413 mv s0,a0
100c8: 0fc000ef jal 101c4 <get_cycles>
100cc: 00050493 mv s1,a0
100d0: 0f4000ef jal 101c4 <get_cycles>
100d4: 409505b3 sub a1,a0,s1
100d8: 0001c537 lui a0,0x1c
100dc: c1850513 add a0,a0,-1000 # 1bc18 <__clzsi2+0x6e>
100e0: 68c000ef jal 1076c <printf>
100e4: 0001c537 lui a0,0x1c
100e8: 00040593 mv a1,s0
100ec: c2c50513 add a0,a0,-980 # 1bc2c <__clzsi2+0x82>
100f0: 67c000ef jal 1076c <printf>
100f4: 0001c537 lui a0,0x1c
100f8: 04000613 li a2,64
100fc: 00000693 li a3,0
10100: c3c50513 add a0,a0,-964 # 1bc3c <__clzsi2+0x92>
10104: 668000ef jal 1076c <printf>
10108: 0001c537 lui a0,0x1c
1010c: 00600593 li a1,6
10110: c5450513 add a0,a0,-940 # 1bc54 <__clzsi2+0xaa>
10114: 658000ef jal 1076c <printf>
10118: 00c12083 lw ra,12(sp)
1011c: 00812403 lw s0,8(sp)
10120: 00412483 lw s1,4(sp)
10124: 00000513 li a0,0
10128: 01010113 add sp,sp,16
1012c: 00008067 ret
```
### elf size
```
ca@ca-VirtualBox:~rv32emu/tests/hw2$ riscv-none-elf-size ./perfcount.elf
text data bss dec hex filename
51608 1876 1528 55012 d6e4 ./perfcount.elf
```
### execute

:::warning
:warning: Don't put the screenshots which contain plain text only. Instead, utilize HackMD syntax to annotate the text.
:notes: jserv
:::
* Observation:
    * Lines of code: `140`
    * Allocates `16` bytes on the stack
    * Registers used: `ra`, `sp`, `s0~s1`, `a0~a5`
    * Number of `lw` and `sw`: `3` and `3`
* Execution output:
    * cycle count: 7
    * instret: 2c5
## -Os Optimized Assembly Code
### Assembly code
```c
00010204 <count_leading_zeros>:
10204: 01f59713 sll a4,a1,0x1f
10208: 00155793 srl a5,a0,0x1
1020c: 00f767b3 or a5,a4,a5
10210: 0015d713 srl a4,a1,0x1
10214: 00b765b3 or a1,a4,a1
10218: 00a7e533 or a0,a5,a0
1021c: 01e59713 sll a4,a1,0x1e
10220: 00255793 srl a5,a0,0x2
10224: 00f767b3 or a5,a4,a5
10228: 0025d613 srl a2,a1,0x2
1022c: 00b66633 or a2,a2,a1
10230: 00a7e533 or a0,a5,a0
10234: 01c61713 sll a4,a2,0x1c
10238: 00455793 srl a5,a0,0x4
1023c: 00f767b3 or a5,a4,a5
10240: 00465693 srl a3,a2,0x4
10244: 00a7e733 or a4,a5,a0
10248: 00c6e6b3 or a3,a3,a2
1024c: 01869613 sll a2,a3,0x18
10250: 00875793 srl a5,a4,0x8
10254: 00f667b3 or a5,a2,a5
10258: 0086d613 srl a2,a3,0x8
1025c: 00d66633 or a2,a2,a3
10260: 00e7e7b3 or a5,a5,a4
10264: 0107d693 srl a3,a5,0x10
10268: 01061713 sll a4,a2,0x10
1026c: 00d766b3 or a3,a4,a3
10270: 01065713 srl a4,a2,0x10
10274: 00c76733 or a4,a4,a2
10278: 00f6e6b3 or a3,a3,a5
1027c: 00d766b3 or a3,a4,a3
10280: 01f71613 sll a2,a4,0x1f
10284: 0016d793 srl a5,a3,0x1
10288: 00f667b3 or a5,a2,a5
1028c: 55555637 lui a2,0x55555
10290: 55560613 add a2,a2,1365 # 55555555 <__BSS_END__+0x55537805>
10294: 00175593 srl a1,a4,0x1
10298: 00c7f7b3 and a5,a5,a2
1029c: 40f687b3 sub a5,a3,a5
102a0: 00c5f633 and a2,a1,a2
102a4: 00f6b6b3 sltu a3,a3,a5
102a8: 40c70733 sub a4,a4,a2
102ac: 40d70733 sub a4,a4,a3
102b0: 01e71613 sll a2,a4,0x1e
102b4: 0027d693 srl a3,a5,0x2
102b8: 00d666b3 or a3,a2,a3
102bc: 33333637 lui a2,0x33333
102c0: 33360613 add a2,a2,819 # 33333333 <__BSS_END__+0x333155e3>
102c4: 00c6f6b3 and a3,a3,a2
102c8: 00275593 srl a1,a4,0x2
102cc: 00c7f7b3 and a5,a5,a2
102d0: 00c5f5b3 and a1,a1,a2
102d4: 00f687b3 add a5,a3,a5
102d8: 00c77733 and a4,a4,a2
102dc: 00e58733 add a4,a1,a4
102e0: 00d7b6b3 sltu a3,a5,a3
102e4: 00e686b3 add a3,a3,a4
102e8: 01c69613 sll a2,a3,0x1c
102ec: 0047d713 srl a4,a5,0x4
102f0: 00e66733 or a4,a2,a4
102f4: 00f707b3 add a5,a4,a5
102f8: 0046d613 srl a2,a3,0x4
102fc: 00d606b3 add a3,a2,a3
10300: 00e7b733 sltu a4,a5,a4
10304: 00d70733 add a4,a4,a3
10308: 0f0f16b7 lui a3,0xf0f1
1030c: f0f68693 add a3,a3,-241 # f0f0f0f <__BSS_END__+0xf0d31bf>
10310: 00d77733 and a4,a4,a3
10314: 00d7f7b3 and a5,a5,a3
10318: 01871613 sll a2,a4,0x18
1031c: 0087d693 srl a3,a5,0x8
10320: 00d666b3 or a3,a2,a3
10324: 00f687b3 add a5,a3,a5
10328: 00875613 srl a2,a4,0x8
1032c: 00e60733 add a4,a2,a4
10330: 00d7b6b3 sltu a3,a5,a3
10334: 00e686b3 add a3,a3,a4
10338: 01069613 sll a2,a3,0x10
1033c: 0107d713 srl a4,a5,0x10
10340: 00e66733 or a4,a2,a4
10344: 00f707b3 add a5,a4,a5
10348: 0106d613 srl a2,a3,0x10
1034c: 00e7b733 sltu a4,a5,a4
10350: 00d606b3 add a3,a2,a3
10354: 00d70733 add a4,a4,a3
10358: 00f70733 add a4,a4,a5
1035c: 07f77713 and a4,a4,127
10360: 04000513 li a0,64
10364: 40e50533 sub a0,a0,a4
10368: 01051513 sll a0,a0,0x10
1036c: 01055513 srl a0,a0,0x10
10370: 00008067 ret
00010374 <logp2>:
10374: 04000793 li a5,64
10378: 00050713 mv a4,a0
1037c: 40b785b3 sub a1,a5,a1
10380: 00000513 li a0,0
10384: 40e585b3 sub a1,a1,a4
10388: 00b05a63 blez a1,1039c <logp2+0x28>
1038c: 00150793 add a5,a0,1
10390: 01079513 sll a0,a5,0x10
10394: 01055513 srl a0,a0,0x10
10398: fedff06f j 10384 <logp2+0x10>
1039c: 00008067 ret
000100b0 <main>:
100b0: ff010113 add sp,sp,-16
100b4: 00112623 sw ra,12(sp)
100b8: 00812423 sw s0,8(sp)
100bc: 00912223 sw s1,4(sp)
100c0: 01212023 sw s2,0(sp)
100c4: 12c000ef jal 101f0 <get_instret>
100c8: 00050493 mv s1,a0
100cc: 110000ef jal 101dc <get_cycles>
100d0: 00050913 mv s2,a0
100d4: 03900593 li a1,57
100d8: 00100513 li a0,1
100dc: 298000ef jal 10374 <logp2>
100e0: 00050413 mv s0,a0
100e4: 0f8000ef jal 101dc <get_cycles>
100e8: 412505b3 sub a1,a0,s2
100ec: 0001c537 lui a0,0x1c
100f0: c2850513 add a0,a0,-984 # 1bc28 <__clzsi2+0x6e>
100f4: 688000ef jal 1077c <printf>
100f8: 0001c537 lui a0,0x1c
100fc: 00048593 mv a1,s1
10100: c3c50513 add a0,a0,-964 # 1bc3c <__clzsi2+0x82>
10104: 678000ef jal 1077c <printf>
10108: 0001c537 lui a0,0x1c
1010c: 04000613 li a2,64
10110: 00000693 li a3,0
10114: c4c50513 add a0,a0,-948 # 1bc4c <__clzsi2+0x92>
10118: 664000ef jal 1077c <printf>
1011c: 0001c537 lui a0,0x1c
10120: 00040593 mv a1,s0
10124: c6450513 add a0,a0,-924 # 1bc64 <__clzsi2+0xaa>
10128: 654000ef jal 1077c <printf>
1012c: 00c12083 lw ra,12(sp)
10130: 00812403 lw s0,8(sp)
10134: 00412483 lw s1,4(sp)
10138: 00012903 lw s2,0(sp)
1013c: 00000513 li a0,0
10140: 01010113 add sp,sp,16
10144: 00008067 ret
```
### elf size

### execute

* Observation:
    * Lines of code: `144`
    * Allocates `16` bytes on the stack
    * Registers used: `ra`, `sp`, `s0~s2`, `a0~a5`
    * Number of `lw` and `sw`: `4` and `4`
* Execution output:
    * cycle count: 54
    * instret: 2c6
## -Ofast Optimized Assembly Code
### Assembly code
```c
000101ec <count_leading_zeros>:
101ec: 01f59713 sll a4,a1,0x1f
101f0: 00155793 srl a5,a0,0x1
101f4: 00f767b3 or a5,a4,a5
101f8: 0015d713 srl a4,a1,0x1
101fc: 00b765b3 or a1,a4,a1
10200: 00a7e533 or a0,a5,a0
10204: 01e59713 sll a4,a1,0x1e
10208: 00255793 srl a5,a0,0x2
1020c: 00f767b3 or a5,a4,a5
10210: 0025d713 srl a4,a1,0x2
10214: 00b76733 or a4,a4,a1
10218: 00a7e533 or a0,a5,a0
1021c: 01c71693 sll a3,a4,0x1c
10220: 00455793 srl a5,a0,0x4
10224: 00f6e7b3 or a5,a3,a5
10228: 00475693 srl a3,a4,0x4
1022c: 00e6e6b3 or a3,a3,a4
10230: 00a7e733 or a4,a5,a0
10234: 01869613 sll a2,a3,0x18
10238: 00875793 srl a5,a4,0x8
1023c: 00f667b3 or a5,a2,a5
10240: 0086d613 srl a2,a3,0x8
10244: 00d66633 or a2,a2,a3
10248: 00e7e7b3 or a5,a5,a4
1024c: 0107d693 srl a3,a5,0x10
10250: 01061713 sll a4,a2,0x10
10254: 00d766b3 or a3,a4,a3
10258: 01065713 srl a4,a2,0x10
1025c: 00c76733 or a4,a4,a2
10260: 00f6e6b3 or a3,a3,a5
10264: 00d766b3 or a3,a4,a3
10268: 01f71593 sll a1,a4,0x1f
1026c: 0016d793 srl a5,a3,0x1
10270: 55555637 lui a2,0x55555
10274: 55560613 add a2,a2,1365 # 55555555 <__BSS_END__+0x55537805>
10278: 00f5e7b3 or a5,a1,a5
1027c: 00c7f7b3 and a5,a5,a2
10280: 00175593 srl a1,a4,0x1
10284: 40f687b3 sub a5,a3,a5
10288: 00c5f633 and a2,a1,a2
1028c: 00f6b6b3 sltu a3,a3,a5
10290: 40c70733 sub a4,a4,a2
10294: 40d70733 sub a4,a4,a3
10298: 01e71593 sll a1,a4,0x1e
1029c: 0027d693 srl a3,a5,0x2
102a0: 33333637 lui a2,0x33333
102a4: 33360613 add a2,a2,819 # 33333333 <__BSS_END__+0x333155e3>
102a8: 00d5e6b3 or a3,a1,a3
102ac: 00c6f6b3 and a3,a3,a2
102b0: 00275593 srl a1,a4,0x2
102b4: 00c7f7b3 and a5,a5,a2
102b8: 00f687b3 add a5,a3,a5
102bc: 00c5f5b3 and a1,a1,a2
102c0: 00c77733 and a4,a4,a2
102c4: 00e58733 add a4,a1,a4
102c8: 00d7b6b3 sltu a3,a5,a3
102cc: 00e686b3 add a3,a3,a4
102d0: 01c69613 sll a2,a3,0x1c
102d4: 0047d713 srl a4,a5,0x4
102d8: 00e66733 or a4,a2,a4
102dc: 00f707b3 add a5,a4,a5
102e0: 0046d613 srl a2,a3,0x4
102e4: 00d60633 add a2,a2,a3
102e8: 00e7b733 sltu a4,a5,a4
102ec: 0f0f16b7 lui a3,0xf0f1
102f0: f0f68693 add a3,a3,-241 # f0f0f0f <__BSS_END__+0xf0d31bf>
102f4: 00c70733 add a4,a4,a2
102f8: 00d77733 and a4,a4,a3
102fc: 00d7f7b3 and a5,a5,a3
10300: 01871613 sll a2,a4,0x18
10304: 0087d693 srl a3,a5,0x8
10308: 00d666b3 or a3,a2,a3
1030c: 00f687b3 add a5,a3,a5
10310: 00875613 srl a2,a4,0x8
10314: 00e60733 add a4,a2,a4
10318: 00d7b6b3 sltu a3,a5,a3
1031c: 00e686b3 add a3,a3,a4
10320: 01069613 sll a2,a3,0x10
10324: 0107d713 srl a4,a5,0x10
10328: 00e66733 or a4,a2,a4
1032c: 00f707b3 add a5,a4,a5
10330: 0106d613 srl a2,a3,0x10
10334: 00e7b733 sltu a4,a5,a4
10338: 00d606b3 add a3,a2,a3
1033c: 00d70733 add a4,a4,a3
10340: 00f70733 add a4,a4,a5
10344: 07f77713 and a4,a4,127
10348: 04000513 li a0,64
1034c: 40e50533 sub a0,a0,a4
10350: 01051513 sll a0,a0,0x10
10354: 01055513 srl a0,a0,0x10
10358: 00008067 ret
0001035c <logp2>:
1035c: 04000793 li a5,64
10360: 40b785b3 sub a1,a5,a1
10364: 40a585b3 sub a1,a1,a0
10368: 00050713 mv a4,a0
1036c: 00000513 li a0,0
10370: 00b05e63 blez a1,1038c <logp2+0x30>
10374: 00150513 add a0,a0,1
10378: 01051513 sll a0,a0,0x10
1037c: 40e585b3 sub a1,a1,a4
10380: 01055513 srl a0,a0,0x10
10384: feb048e3 bgtz a1,10374 <logp2+0x18>
10388: 00008067 ret
1038c: 00008067 ret
000100b0 <main>:
100b0: ff010113 add sp,sp,-16
100b4: 00112623 sw ra,12(sp)
100b8: 00812423 sw s0,8(sp)
100bc: 00912223 sw s1,4(sp)
100c0: 118000ef jal 101d8 <get_instret>
100c4: 00050413 mv s0,a0
100c8: 0fc000ef jal 101c4 <get_cycles>
100cc: 00050493 mv s1,a0
100d0: 0f4000ef jal 101c4 <get_cycles>
100d4: 409505b3 sub a1,a0,s1
100d8: 0001c537 lui a0,0x1c
100dc: c1850513 add a0,a0,-1000 # 1bc18 <__clzsi2+0x6e>
100e0: 68c000ef jal 1076c <printf>
100e4: 0001c537 lui a0,0x1c
100e8: 00040593 mv a1,s0
100ec: c2c50513 add a0,a0,-980 # 1bc2c <__clzsi2+0x82>
100f0: 67c000ef jal 1076c <printf>
100f4: 0001c537 lui a0,0x1c
100f8: 04000613 li a2,64
100fc: 00000693 li a3,0
10100: c3c50513 add a0,a0,-964 # 1bc3c <__clzsi2+0x92>
10104: 668000ef jal 1076c <printf>
10108: 0001c537 lui a0,0x1c
1010c: 00600593 li a1,6
10110: c5450513 add a0,a0,-940 # 1bc54 <__clzsi2+0xaa>
10114: 658000ef jal 1076c <printf>
10118: 00c12083 lw ra,12(sp)
1011c: 00812403 lw s0,8(sp)
10120: 00412483 lw s1,4(sp)
10124: 00000513 li a0,0
10128: 01010113 add sp,sp,16
1012c: 00008067 ret
```
### elf size

### execute

* Observation:
    * Lines of code: `140`
    * Allocates `16` bytes on the stack
    * Registers used: `ra`, `sp`, `s0~s1`, `a0~a5`
    * Number of `lw` and `sw`: `3` and `3`
* Execution output:
    * cycle count: 7
    * instret: 2c5
## Handwritten Assembly Code
```
For the full code, please check the file "handwrite.S" on GitHub.
```
### Details
To run the handwritten assembly code and check its result, we use the asm-hello directory as a reference
and add some assembly code so that the program can execute.
Some of the details are listed below.
```
.global _start
.set STDOUT, 1
.set SYSEXIT, 93
.set SYSWRITE, 64
---------------------------
_start:
jal get_cycles
addi sp, sp, -4
sw a0, 0(sp)
...skip main things...
li a7, SYSWRITE
li a0, 1
la a1, str_cycle
li a2, 13
ecall
jal get_cycles
lw t0, 0(sp) # t0 = pre cycle
sub a0, a0, t0 # a0 = new cycle
addi sp, sp, 4
li a1, 4
jal print_ascii
mv t0, a0
li a0, 1
la a1, buffer
li a2, 4
li a7, SYSWRITE
ecall
li a7, SYSWRITE
li a0, 1
la a1, endl
li a2, 2
ecall
li a7, SYSEXIT # "exit" syscall
add a0, x0, 0 # Use 0 return code
ecall # invoke syscall to terminate the program
------------------------
get_cycles:
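csrr a1, cycleh
csrr a3, cycle # low 32 bits are read into a3, although the caller reads the result from a0
csrr a2, cycleh
bne a1, a2, get_cycles # retry if the high word changed between the two reads
ret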
print_ascii:
mv t0, a0 # load integer
li t1, 0 # t1 = quotient
li t2, 0 # t2 = remainder
li t3, 10 # t3 = divisor
mv t4, a1 # t4 = number of digits left to write
check_less_then_ten:
bge t0, t3, divide
mv t2, t0
mv t0, t1 # t0 = quotient
j to_ascii
divide:
sub t0, t0, t3
addi t1, t1, 1
j check_less_then_ten
to_ascii:
addi t2, t2, 48 # remainder to ascii
la t5, buffer # t5 = buffer addr
addi t4, t4, -1
add t5, t5, t4
sb t2, 0(t5)
# counter = 0 exit
beqz t4, convert_loop_done
li t1, 0 # refresh quotient
j check_less_then_ten
convert_loop_done:
ret
```
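The `print_ascii` routine converts an integer into a fixed number of decimal digits without any divide instruction: it subtracts 10 repeatedly to obtain the quotient and remainder, stores the remainder as an ASCII digit from the end of the buffer, and repeats with the quotient until the requested number of digits has been written. A rough C rendering of the same idea is sketched below. The function name `to_ascii_fixed` and its signature are hypothetical, and it uses unsigned arithmetic, which is an assumption not taken from the assembly.

```c
#include <stdint.h>

/* Writes the `width` least-significant decimal digits of `value` into `buf`,
 * zero-padded on the left, without using division or modulo.
 * No terminating '\0' is written. */
void to_ascii_fixed(uint32_t value, char *buf, int width)
{
    while (width > 0) {
        uint32_t quotient = 0;
        while (value >= 10) { /* repeated subtraction replaces value / 10 */
            value -= 10;
            quotient++;
        }
        buf[--width] = (char) ('0' + value); /* remainder becomes a digit */
        value = quotient;                    /* continue with the quotient */
    }
}
```

For example, converting 13 with a width of 4 fills the buffer with the characters `0013`.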
:::warning
:warning: Don't put the screenshots which contain plain text only. Instead, utilize HackMD syntax to annotate the text.
:notes: jserv
:::
## Conclusion
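* From -O1 to -O2, the code has fewer lines, uses fewer `s` registers, and uses `lw`/`sw` less often.
* From -O1 to -O2, the cycle count drops sharply: 48 -> 7. In the -O2 listing, `main` no longer calls `logp2`; it loads the result as a constant (`li a1,6`) and the two `get_cycles` calls are back to back, so the measured interval no longer contains the computation.
* -Os has the highest cycle count: 54.
* In this case, the -O2 and -Ofast results are almost identical (-O3 reports the same numbers as well).
* The handwritten version also reduces the cycle count considerably: 13.
* So it is faster than the others, except for the -O2, -O3, and -Ofast builds (7 cycles).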
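In the -O2, -O3, and -Ofast listings, `main` loads the answer directly with `li a1,6` and calls `get_cycles` twice in a row, which suggests the whole computation was evaluated at compile time. One possible way to keep the work at run time, offered here only as a sketch, is to read the input from a `volatile` variable so the compiler cannot assume its value. The names `input_value` and `measured_log2` are hypothetical and do not appear in the original code. Whether the computation actually stays between the two counter reads would still need to be confirmed in the disassembly.

```c
#include <stdint.h>

/* Reading through a volatile object keeps the value unknown at compile time. */
static volatile uint64_t input_value = 64;

static uint16_t count_leading_zeros(uint64_t x)
{
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x |= (x >> 16);
    x |= (x >> 32);
    /* count ones (population count) */
    x -= ((x >> 1) & 0x5555555555555555ULL);
    x = ((x >> 2) & 0x3333333333333333ULL) + (x & 0x3333333333333333ULL);
    x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0fULL;
    x += (x >> 8);
    x += (x >> 16);
    x += (x >> 32);
    return (uint16_t) (64 - (x & 0x7f));
}

static uint16_t logp2(int power, uint16_t clz)
{
    uint16_t result = 0;
    int tmp = 64 - clz;
    while (1) {
        tmp -= power;
        if (tmp <= 0)
            break;
        result++;
    }
    return result;
}

/* Hypothetical wrapper for the region placed between the get_cycles() calls. */
uint16_t measured_log2(void)
{
    uint64_t a = input_value; /* volatile read: cannot be folded to a constant */
    return logp2(1, count_leading_zeros(a));
}
```

Calling `measured_log2()` in place of the constant-input computation should still yield 6 for the input 64, while forcing the compiler to emit the actual shift and loop instructions.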
:::warning
TODO: Revise the handwritten RISC-V assembly code.
:notes: jserv
:::