# Assignment2: RISC-V Toolchain
### Contributed by < Rei-Fu-Zhang >
## Install a VM with Oracle VM VirtualBox
First, download VirtualBox [here](https://www.virtualbox.org/).

My CPU is an AMD R7 5800X3D, so one preparatory step is needed:
enter the BIOS and enable the CPU's virtualization function.
Then select your VM as in the picture below.

## Problem description
### The problem I selected: 122. Best Time to Buy and Sell Stock II, from [陳邦尉](https://github.com/ram-a11y/Computer-Architecture)
### Motivation: I'm a hopeless stock maniac
### His code
```
#include <stdio.h>

int maxProfit(const int *prices, int pricesSize);

int main(void) {
    const int prices[] = {7, 1, 5, 3, 6, 4};
    const int pricesSize = 6;
    printf("%d\n", maxProfit(prices, pricesSize));
    return 0;
}

/* The definition takes const int * so that it matches the prototype above. */
int maxProfit(const int *prices, int pricesSize) {
    int sum = 0;
    for (int i = 1; i < pricesSize; i++) {
        if (prices[i - 1] < prices[i]) {
            // the price rose since yesterday, so add the difference
            sum = sum + (prices[i] - prices[i - 1]);
        }
    }
    return sum;
}
```
### My version
```
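#include <stdio.h>
#include <stdlib.h>

/* Declare maxProfit before main uses it. */
int maxProfit(int *prices, int pricesSize);

int main(void) {
    int arr[] = {99, 32, 3, 56, 0, 2, 56, 99};
    int size = 8;
    int a = maxProfit(arr, size);
    printf("%d\n", a);
    return 0;
}

int maxProfit(int *prices, int pricesSize) {
    int totalProfit = 0;
    int i = 0;
    int increases = 0; /* gain of the current rising run */
    for (i = 1; i < pricesSize; i++) {
        if (prices[i] <= prices[i - 1]) {
            /* the run ended: bank its gain and start over */
            totalProfit += increases;
            increases = 0;
        }
        else
            increases += prices[i] - prices[i - 1];
    }
    totalProfit += increases; /* bank the last run */
    return totalProfit;
}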
```
## Rewrite in RV32I assembly
```
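.data
arr: .word 9,2,3,6,0,2,6,9
size: .word 8
.text
main:
    la s0, arr              # s0 = &arr[0]
    lw s1, size             # s1 = size
    li s2, 0                # s2 = i
    li s3, 0                # s3 = increases
    li s10, 0               # s10 = totalProfit
    jal ra, profit
    add s10, s10, s3        # add the last rising run
    j print
profit:
    addi s2, s2, 1          # i++ (first pass: i = 1)
    bge s2, s1, done        # stop once i >= size, before reading arr[i]
    add t1, s2, zero        # t1 = i
    slli t1, t1, 2          # t1 = i * 4
    add t1, t1, s0          # t1 = &arr[i]
    lw s4, 0(t1)            # s4 = arr[i]
    lw s5, -4(t1)           # s5 = arr[i - 1]
    bge s5, s4, process     # arr[i] <= arr[i - 1]: the rising run ended
    sub s8, s4, s5
    add s3, s3, s8          # increases += arr[i] - arr[i - 1]
    j profit
process:
    add s10, s10, s3        # totalProfit += increases
    add s3, zero, zero      # increases = 0
    j profit
done:
    jr ra
print:
    addi a0, s10, 0         # a0 = totalProfit
    li a7, 1                # ecall 1: print integer
    ecall
    li a7, 10               # ecall 10: exit
    ecall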
```
Modifying my code to run on rv32emu:
```
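# RISC-V assembly program that prints the maximum profit to stdout.
.org 0
# Provide program starting address to linker
.global _start
/* newlib system calls */
.set SYSEXIT, 93
.set SYSWRITE, 64
.data
arr: .word 9,2,3,6,0,2,6,9
size: .word 8
start_input_line: .string "Input:["
start_output_line: .string "Output:["
endline: .string "]\n"
buf: .space 12              # scratch space for the decimal digits
buf_end:
.text
_start:
    la s0, arr              # s0 = &arr[0]
    lw s1, size             # s1 = size
    li s2, 0                # s2 = i
    li s3, 0                # s3 = increases
    li s10, 0               # s10 = totalProfit
    jal ra, profit
    add s10, s10, s3        # add the last rising run
    j print
profit:
    addi s2, s2, 1          # i++ (first pass: i = 1)
    bge s2, s1, done        # stop once i >= size, before reading arr[i]
    add t1, s2, zero        # t1 = i
    slli t1, t1, 2          # t1 = i * 4
    add t1, t1, s0          # t1 = &arr[i]
    lw s4, 0(t1)            # s4 = arr[i]
    lw s5, -4(t1)           # s5 = arr[i - 1]
    bge s5, s4, process     # arr[i] <= arr[i - 1]: the rising run ended
    sub s8, s4, s5
    add s3, s3, s8          # increases += arr[i] - arr[i - 1]
    j profit
process:
    add s10, s10, s3        # totalProfit += increases
    add s3, zero, zero      # increases = 0
    j profit
done:
    jr ra
print:
    # The write syscall takes bytes, so convert s10 to decimal text first.
    la t0, buf_end          # fill buf from its end backwards
    li t2, 10
    addi t0, t0, -1
    sb t2, 0(t0)            # trailing '\n' (ASCII 10)
convert:
    li t3, 0                # t3 = quotient of s10 / 10
divide:
    blt s10, t2, store      # RV32I has no div, so subtract 10 repeatedly
    addi s10, s10, -10
    addi t3, t3, 1
    j divide
store:
    addi s10, s10, 48       # remainder -> ASCII digit
    addi t0, t0, -1
    sb s10, 0(t0)
    mv s10, t3
    bnez s10, convert       # more digits left
    li a7, SYSWRITE
    li a0, 1                # fd 1 = stdout
    mv a1, t0               # start of the digit string
    la a2, buf_end
    sub a2, a2, t0          # length in bytes
    ecall
    li a7, SYSEXIT
    li a0, 0
    ecall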
```
### CSR cycle count: 2781
## Summary of how to use the GCC toolchain
### 1. compile your C code
`riscv-none-elf-gcc -march=rv32i -mabi=ilp32 -O<level> -o <output file> <C source file>`, where `-O<level>` is one of `-O0`, `-O1`, `-O2`, `-O3`, `-Os`, or `-Ofast`
### 2. look at the ELF header
`riscv-none-elf-readelf -h <output file>`
### 3. check size
`riscv-none-elf-size <output file>`
## -Os
### read elf

### check size

### execute

### Disassembly code
```
000101cc <maxProfit>:
101cc: 00000793 li a5,0
101d0: 00100613 li a2,1
101d4: 00000693 li a3,0
101d8: 00b64663 blt a2,a1,101e4 <maxProfit+0x18>
101dc: 00f68533 add a0,a3,a5
101e0: 00008067 ret
101e4: 00452703 lw a4,4(a0)
101e8: 00052803 lw a6,0(a0)
101ec: 00e84c63 blt a6,a4,10204 <maxProfit+0x38>
101f0: 00f686b3 add a3,a3,a5
101f4: 00000793 li a5,0
101f8: 00160613 addi a2,a2,1
101fc: 00450513 addi a0,a0,4
10200: fd9ff06f j 101d8 <maxProfit+0xc>
10204: 41070733 sub a4,a4,a6
10208: 00e787b3 add a5,a5,a4
1020c: fedff06f j 101f8 <maxProfit+0x2c>
```
## -Ofast
### code
### execute and check size

### check elf

### execute

### Disassembly code
```
000101ac <maxProfit>:
101ac: 00100793 li a5,1
101b0: 0cb7d263 bge a5,a1,10274 <maxProfit+0xc8>
101b4: 00300793 li a5,3
101b8: 0cb7d263 bge a5,a1,1027c <maxProfit+0xd0>
101bc: ffc58713 addi a4,a1,-4
101c0: 00052783 lw a5,0(a0)
101c4: ffe77713 andi a4,a4,-2
101c8: 00450693 addi a3,a0,4
101cc: 00370713 addi a4,a4,3
101d0: 00000813 li a6,0
101d4: 00100313 li t1,1
101d8: 00000893 li a7,0
101dc: 0006a603 lw a2,0(a3)
101e0: 00000e13 li t3,0
101e4: 40f60eb3 sub t4,a2,a5
101e8: 06c7d863 bge a5,a2,10258 <maxProfit+0xac>
101ec: 0046a783 lw a5,4(a3)
101f0: 010e8e33 add t3,t4,a6
101f4: 00000813 li a6,0
101f8: 40c78eb3 sub t4,a5,a2
101fc: 06f65863 bge a2,a5,1026c <maxProfit+0xc0>
10200: 01ce8833 add a6,t4,t3
10204: 00230313 addi t1,t1,2 # 101a2 <frame_dummy+0x16>
10208: 00868693 addi a3,a3,8
1020c: fce318e3 bne t1,a4,101dc <maxProfit+0x30>
10210: 00271793 slli a5,a4,0x2
10214: 00f507b3 add a5,a0,a5
10218: 0180006f j 10230 <maxProfit+0x84>
1021c: 00170713 addi a4,a4,1
10220: 010888b3 add a7,a7,a6
10224: 00478793 addi a5,a5,4
10228: 00000813 li a6,0
1022c: 02b75263 bge a4,a1,10250 <maxProfit+0xa4>
10230: 0007a603 lw a2,0(a5)
10234: ffc7a683 lw a3,-4(a5)
10238: 40d60533 sub a0,a2,a3
1023c: fec6d0e3 bge a3,a2,1021c <maxProfit+0x70>
10240: 00170713 addi a4,a4,1
10244: 00a80833 add a6,a6,a0
10248: 00478793 addi a5,a5,4
1024c: feb742e3 blt a4,a1,10230 <maxProfit+0x84>
10250: 01088533 add a0,a7,a6
10254: 00008067 ret
10258: 0046a783 lw a5,4(a3)
1025c: 010888b3 add a7,a7,a6
10260: 00000813 li a6,0
10264: 40c78eb3 sub t4,a5,a2
10268: f8f64ce3 blt a2,a5,10200 <maxProfit+0x54>
1026c: 01c888b3 add a7,a7,t3
10270: f95ff06f j 10204 <maxProfit+0x58>
10274: 00000513 li a0,0
10278: 00008067 ret
1027c: 00000813 li a6,0
10280: 00100713 li a4,1
10284: 00000893 li a7,0
10288: f89ff06f j 10210 <maxProfit+0x64>
```
## -O1
### compile your c code -O1

### read elf header

### check size

### executing

### Disassembly code
```
00010184 <maxProfit>:
10184: 00100793 li a5,1
10188: 04b7d263 bge a5,a1,101cc <maxProfit+0x48>
1018c: 00450793 addi a5,a0,4
10190: 00259593 slli a1,a1,0x2
10194: 00b505b3 add a1,a0,a1
10198: 00000613 li a2,0
1019c: 00000513 li a0,0
101a0: 0140006f j 101b4 <maxProfit+0x30>
101a4: 40d70733 sub a4,a4,a3
101a8: 00e60633 add a2,a2,a4
101ac: 00478793 addi a5,a5,4
101b0: 02b78263 beq a5,a1,101d4 <maxProfit+0x50>
101b4: 0007a703 lw a4,0(a5)
101b8: ffc7a683 lw a3,-4(a5)
101bc: fee6c4e3 blt a3,a4,101a4 <maxProfit+0x20>
101c0: 00c50533 add a0,a0,a2
101c4: 00000613 li a2,0
101c8: fe5ff06f j 101ac <maxProfit+0x28>
101cc: 00000613 li a2,0
101d0: 00000513 li a0,0
101d4: 00c50533 add a0,a0,a2
101d8: 00008067 ret
```
## -O2
### compile your c code -O2

### read elf header

### check size

### execute

### Disassembly code
```
00010200 <maxProfit>:
10200: 00100793 li a5,1
10204: 04b7da63 bge a5,a1,10258 <maxProfit+0x58>
10208: 00259593 slli a1,a1,0x2
1020c: 00052703 lw a4,0(a0)
10210: 00450793 addi a5,a0,4
10214: 00b505b3 add a1,a0,a1
10218: 00000613 li a2,0
1021c: 00000513 li a0,0
10220: 0140006f j 10234 <maxProfit+0x34>
10224: 00478793 addi a5,a5,4
10228: 00c50533 add a0,a0,a2
1022c: 00000613 li a2,0
10230: 02b78063 beq a5,a1,10250 <maxProfit+0x50>
10234: 00070693 mv a3,a4
10238: 0007a703 lw a4,0(a5)
1023c: 40d70833 sub a6,a4,a3
10240: fee6d2e3 bge a3,a4,10224 <maxProfit+0x24>
10244: 00478793 addi a5,a5,4
10248: 01060633 add a2,a2,a6
1024c: feb794e3 bne a5,a1,10234 <maxProfit+0x34>
10250: 00c50533 add a0,a0,a2
10254: 00008067 ret
10258: 00000513 li a0,0
1025c: 00008067 ret
```
## -O3
### compile your c code -O3

### read elf header

### check size

### execute

### Disassembly code
```
000101ac <maxProfit>:
101ac: 00100793 li a5,1
101b0: 0cb7d263 bge a5,a1,10274 <maxProfit+0xc8>
101b4: 00300793 li a5,3
101b8: 0cb7d263 bge a5,a1,1027c <maxProfit+0xd0>
101bc: ffc58713 addi a4,a1,-4
101c0: 00052783 lw a5,0(a0)
101c4: ffe77713 andi a4,a4,-2
101c8: 00450693 addi a3,a0,4
101cc: 00370713 addi a4,a4,3
101d0: 00000813 li a6,0
101d4: 00100313 li t1,1
101d8: 00000893 li a7,0
101dc: 0006a603 lw a2,0(a3)
101e0: 00000e13 li t3,0
101e4: 40f60eb3 sub t4,a2,a5
101e8: 06c7d863 bge a5,a2,10258 <maxProfit+0xac>
101ec: 0046a783 lw a5,4(a3)
101f0: 010e8e33 add t3,t4,a6
101f4: 00000813 li a6,0
101f8: 40c78eb3 sub t4,a5,a2
101fc: 06f65863 bge a2,a5,1026c <maxProfit+0xc0>
10200: 01ce8833 add a6,t4,t3
10204: 00230313 addi t1,t1,2 # 101a2 <frame_dummy+0x16>
10208: 00868693 addi a3,a3,8
1020c: fce318e3 bne t1,a4,101dc <maxProfit+0x30>
10210: 00271793 slli a5,a4,0x2
10214: 00f507b3 add a5,a0,a5
10218: 0180006f j 10230 <maxProfit+0x84>
1021c: 00170713 addi a4,a4,1
10220: 010888b3 add a7,a7,a6
10224: 00478793 addi a5,a5,4
10228: 00000813 li a6,0
1022c: 02b75263 bge a4,a1,10250 <maxProfit+0xa4>
10230: 0007a603 lw a2,0(a5)
10234: ffc7a683 lw a3,-4(a5)
10238: 40d60533 sub a0,a2,a3
1023c: fec6d0e3 bge a3,a2,1021c <maxProfit+0x70>
10240: 00170713 addi a4,a4,1
10244: 00a80833 add a6,a6,a0
10248: 00478793 addi a5,a5,4
1024c: feb742e3 blt a4,a1,10230 <maxProfit+0x84>
10250: 01088533 add a0,a7,a6
10254: 00008067 ret
10258: 0046a783 lw a5,4(a3)
1025c: 010888b3 add a7,a7,a6
10260: 00000813 li a6,0
10264: 40c78eb3 sub t4,a5,a2
10268: f8f64ce3 blt a2,a5,10200 <maxProfit+0x54>
1026c: 01c888b3 add a7,a7,t3
10270: f95ff06f j 10204 <maxProfit+0x58>
10274: 00000513 li a0,0
10278: 00008067 ret
1027c: 00000813 li a6,0
10280: 00100713 li a4,1
10284: 00000893 li a7,0
10288: f89ff06f j 10210 <maxProfit+0x64>
```
## observation
| | Lines of code | Bytes allocated on stack | Registers used | CSR cycle count |
| -------- | -------- | -------- | -------- | -------- |
| O0 | 22 | 96 | `ra` `sp` `a0 ~ a7` `s0` `t1` | 2832 |
| O1 | 22 | 96 | `ra` `sp` `a0 ~ a7` `s0` `t1` | 2822 |
| O2 | 24 | 80 | `ra` `sp` `a0 ~ a7` `s0` | 2826 |
| O3 | 96 | 16 | `ra` `sp` `a0 ~ a7` | 2760 |
| Ofast | 96 | 16 | `ra` `sp` `a0 ~ a7` | 2760 |
| Os | 17 | 42 | `ra` `sp` `a0 ~ a7` `s0` | 2861 |
### Analysis
* **O0** to **O1**: almost the same. Lines of code, stack usage, and registers used are identical; only the cycle count drops slightly (2832 to 2822).
* **O1** to **O2**: stack usage drops from 96 to 80 bytes and the register `t1` is no longer used, while lines of code (22 to 24) and the cycle count (2822 to 2826) rise slightly.
* **O2** to **O3**: stack usage drops further (80 to 16 bytes), the cycle count falls to 2760, and the register `s0` is no longer used, but lines of code grow from 24 to 96. Furthermore, in O3 the address of `main` comes before that of the `maxProfit` function.
* **Ofast**: the same as O3, with no difference.
* **Os**: the fewest lines of code (17). Its stack usage (42 bytes) falls between O2 and O3, but its cycle count (2861) is the highest of all levels.
* Comparison (">" means better)
  * CSR: O3 = Ofast > O1 > O2 > O0 > Os
  * LOC total: Os > O0 = O1 > O2 > O3 = Ofast