# Assignment2: RISC-V Toolchain
### Contributed by <Rei-Fu-Zhang>
## Install a VM with Oracle VM VirtualBox first
[click here](https://www.virtualbox.org/)
![](https://i.imgur.com/uaAoh93.jpg)

My CPU is an AMD R7 5800X3D, so one preparatory step is needed: go into the BIOS and enable the CPU's virtualization function, then select your VM as in the picture below.
![](https://i.imgur.com/QlDdVi2.png)
## Problem description
### The problem I selected: 122. Best Time to Buy and Sell Stock II, from [陳邦尉](https://github.com/ram-a11y/Computer-Architecture)
### Motivation: I'm a hopeless stock maniac
### His code
```
#include <stdio.h>

int maxProfit(const int *, int);

int main() {
    const int prices[] = {7, 1, 5, 3, 6, 4};
    const int pricesSize = 6;
    printf("%d\n", maxProfit(prices, pricesSize));
    return 0;
}

int maxProfit(const int *prices, int pricesSize) {
    int sum = 0;
    for (int i = 1; i < pricesSize; i++) {
        if (prices[i - 1] < prices[i]) {
            // check if growing then add it
            sum = sum + (prices[i] - prices[i - 1]);
        }
    }
    return sum;
}
```
### My version
```
#include <stdio.h>
#include <stdlib.h>

int maxProfit(int *prices, int pricesSize);

int main() {
    int arr[] = {99, 32, 3, 56, 0, 2, 56, 99};
    int size = 8;
    int a = maxProfit(arr, size);
    printf("%d\n", a);
    return 0;
}

int maxProfit(int *prices, int pricesSize) {
    int totalProfit = 0;
    int i = 0;
    int increases = 0;  // gain accumulated over the current rising run
    for (i = 1; i < pricesSize; i++) {
        if (prices[i] <= prices[i - 1]) {
            // the run ended: bank its gain and start a new run
            totalProfit += increases;
            increases = 0;
        } else
            increases += prices[i] - prices[i - 1];
    }
    totalProfit += increases;  // bank the final run
    return totalProfit;
}
```
## Rewriting the code in RV32I
```
.data
arr:  .word 9,2,3,6,0,2,6,9
size: .word 8

.text
main:
    la   s0, arr
    lw   s1, size
    addi s1, s1, -1       # s1 = size-1, the last valid index
    jal  ra, profit
    add  s10, s10, s3     # add the gain of the final rising run
    j    print

profit:
    addi s2, s2, 1        # i++ (s2 starts at 0, so the first i is 1)
    add  t1, s2, zero     # t1 = i
    slli t1, t1, 2        # t1 = i*4
    add  t1, t1, s0       # t1 = address of arr[i]
    lw   s4, 0(t1)        # s4 = arr[i]
    lw   s5, -4(t1)       # s5 = arr[i-1]
    bge  s5, s4, process  # price did not rise: the run ends
    sub  s8, s4, s5
    add  s3, s3, s8       # increases += arr[i] - arr[i-1]
    blt  s2, s1, profit   # loop while i < size-1
    jr   ra

process:
    add  s10, s10, s3     # totalProfit += increases
    add  s3, zero, zero   # increases = 0
    blt  s2, s1, profit   # loop while i < size-1
    jr   ra

print:
    addi a0, s10, 0
    li   a7, 1            # print integer
    ecall
    li   a7, 10           # exit
    ecall
```
Trying to modify my code to fit rv32emu:
```
# RISC-V assembly program to compute the maximum stock profit.

.org 0
# Provide program starting address to linker
.global _start

/* newlib system calls */
.set SYSEXIT, 93
.set SYSWRITE, 64

.data
arr:  .word 9,2,3,6,0,2,6,9
size: .word 8
start_input_line:  .string "Input:["
start_output_line: .string "Output:["
endline:           .string "]\n"

.text
_start:
    la   s0, arr
    lw   s1, size
    addi s1, s1, -1       # s1 = size-1, the last valid index
    jal  ra, profit
    add  s10, s10, s3     # add the gain of the final rising run
    j    print

profit:
    addi s2, s2, 1        # i++ (s2 starts at 0, so the first i is 1)
    add  t1, s2, zero     # t1 = i
    slli t1, t1, 2        # t1 = i*4
    add  t1, t1, s0       # t1 = address of arr[i]
    lw   s4, 0(t1)        # s4 = arr[i]
    lw   s5, -4(t1)       # s5 = arr[i-1]
    bge  s5, s4, process  # price did not rise: the run ends
    sub  s8, s4, s5
    add  s3, s3, s8       # increases += arr[i] - arr[i-1]
    blt  s2, s1, profit   # loop while i < size-1
    jr   ra

process:
    add  s10, s10, s3     # totalProfit += increases
    add  s3, zero, zero   # increases = 0
    blt  s2, s1, profit   # loop while i < size-1
    jr   ra

print:
    # 1 = print integer, 10 = exit; SYSWRITE and SYSEXIT defined above are not used yet
    addi a0, s10, 0
    li   a7, 1
    ecall
    li   a7, 10
    ecall
```
### CSR cycle count: 2781
## Summary of how to use the GCC toolchain
### 1. Compile your C code
`riscv-none-elf-gcc -march=rv32i -mabi=ilp32 -O(1~3) -o (filename you want) (your C code filename)`
### 2. Look at the ELF header
`riscv-none-elf-readelf -h (your filename)`
### 3. Check the size
`riscv-none-elf-size (your filename)`
## -Os
### Read the ELF header
![](https://i.imgur.com/fKnIjvP.png)
### Check size
![](https://i.imgur.com/lp6y39R.png)
### Execute
![](https://i.imgur.com/ZNXd9w4.png)
### Disassembly code
```
000101cc <maxProfit>:
   101cc:   00000793    li    a5,0
   101d0:   00100613    li    a2,1
   101d4:   00000693    li    a3,0
   101d8:   00b64663    blt   a2,a1,101e4 <maxProfit+0x18>
   101dc:   00f68533    add   a0,a3,a5
   101e0:   00008067    ret
   101e4:   00452703    lw    a4,4(a0)
   101e8:   00052803    lw    a6,0(a0)
   101ec:   00e84c63    blt   a6,a4,10204 <maxProfit+0x38>
   101f0:   00f686b3    add   a3,a3,a5
   101f4:   00000793    li    a5,0
   101f8:   00160613    addi  a2,a2,1
   101fc:   00450513    addi  a0,a0,4
   10200:   fd9ff06f    j     101d8 <maxProfit+0xc>
   10204:   41070733    sub   a4,a4,a6
   10208:   00e787b3    add   a5,a5,a4
   1020c:   fedff06f    j     101f8 <maxProfit+0x2c>
```
## -Ofast
### Code
### Execute and check size
![](https://i.imgur.com/v0wTB8z.png)
### Check the ELF header
![](https://i.imgur.com/EQnDuW2.png)
### Execute
![](https://i.imgur.com/QYATqKU.png)
### Disassembly code
```
000101ac <maxProfit>:
   101ac:   00100793    li    a5,1
   101b0:   0cb7d263    bge   a5,a1,10274 <maxProfit+0xc8>
   101b4:   00300793    li    a5,3
   101b8:   0cb7d263    bge   a5,a1,1027c <maxProfit+0xd0>
   101bc:   ffc58713    addi  a4,a1,-4
   101c0:   00052783    lw    a5,0(a0)
   101c4:   ffe77713    andi  a4,a4,-2
   101c8:   00450693    addi  a3,a0,4
   101cc:   00370713    addi  a4,a4,3
   101d0:   00000813    li    a6,0
   101d4:   00100313    li    t1,1
   101d8:   00000893    li    a7,0
   101dc:   0006a603    lw    a2,0(a3)
   101e0:   00000e13    li    t3,0
   101e4:   40f60eb3    sub   t4,a2,a5
   101e8:   06c7d863    bge   a5,a2,10258 <maxProfit+0xac>
   101ec:   0046a783    lw    a5,4(a3)
   101f0:   010e8e33    add   t3,t4,a6
   101f4:   00000813    li    a6,0
   101f8:   40c78eb3    sub   t4,a5,a2
   101fc:   06f65863    bge   a2,a5,1026c <maxProfit+0xc0>
   10200:   01ce8833    add   a6,t4,t3
   10204:   00230313    addi  t1,t1,2 # 101a2 <frame_dummy+0x16>
   10208:   00868693    addi  a3,a3,8
   1020c:   fce318e3    bne   t1,a4,101dc <maxProfit+0x30>
   10210:   00271793    slli  a5,a4,0x2
   10214:   00f507b3    add   a5,a0,a5
   10218:   0180006f    j     10230 <maxProfit+0x84>
   1021c:   00170713    addi  a4,a4,1
   10220:   010888b3    add   a7,a7,a6
   10224:   00478793    addi  a5,a5,4
   10228:   00000813    li    a6,0
   1022c:   02b75263    bge   a4,a1,10250 <maxProfit+0xa4>
   10230:   0007a603    lw    a2,0(a5)
   10234:   ffc7a683    lw    a3,-4(a5)
   10238:   40d60533    sub   a0,a2,a3
   1023c:   fec6d0e3    bge   a3,a2,1021c <maxProfit+0x70>
   10240:   00170713    addi  a4,a4,1
   10244:   00a80833    add   a6,a6,a0
   10248:   00478793    addi  a5,a5,4
   1024c:   feb742e3    blt   a4,a1,10230 <maxProfit+0x84>
   10250:   01088533    add   a0,a7,a6
   10254:   00008067    ret
   10258:   0046a783    lw    a5,4(a3)
   1025c:   010888b3    add   a7,a7,a6
   10260:   00000813    li    a6,0
   10264:   40c78eb3    sub   t4,a5,a2
   10268:   f8f64ce3    blt   a2,a5,10200 <maxProfit+0x54>
   1026c:   01c888b3    add   a7,a7,t3
   10270:   f95ff06f    j     10204 <maxProfit+0x58>
   10274:   00000513    li    a0,0
   10278:   00008067    ret
   1027c:   00000813    li    a6,0
   10280:   00100713    li    a4,1
   10284:   00000893    li    a7,0
   10288:   f89ff06f    j     10210 <maxProfit+0x64>
```
## -O1
### Compile your C code with -O1
![](https://i.imgur.com/WHYz4FO.png)
### Read the ELF header
![](https://i.imgur.com/wOO0wYz.png)
### Check size
![](https://i.imgur.com/ERRyBHL.png)
### Execute
![](https://i.imgur.com/YVNWToo.png)
### Disassembly code
```
00010184 <maxProfit>:
   10184:   00100793    li    a5,1
   10188:   04b7d263    bge   a5,a1,101cc <maxProfit+0x48>
   1018c:   00450793    addi  a5,a0,4
   10190:   00259593    slli  a1,a1,0x2
   10194:   00b505b3    add   a1,a0,a1
   10198:   00000613    li    a2,0
   1019c:   00000513    li    a0,0
   101a0:   0140006f    j     101b4 <maxProfit+0x30>
   101a4:   40d70733    sub   a4,a4,a3
   101a8:   00e60633    add   a2,a2,a4
   101ac:   00478793    addi  a5,a5,4
   101b0:   02b78263    beq   a5,a1,101d4 <maxProfit+0x50>
   101b4:   0007a703    lw    a4,0(a5)
   101b8:   ffc7a683    lw    a3,-4(a5)
   101bc:   fee6c4e3    blt   a3,a4,101a4 <maxProfit+0x20>
   101c0:   00c50533    add   a0,a0,a2
   101c4:   00000613    li    a2,0
   101c8:   fe5ff06f    j     101ac <maxProfit+0x28>
   101cc:   00000613    li    a2,0
   101d0:   00000513    li    a0,0
   101d4:   00c50533    add   a0,a0,a2
   101d8:   00008067    ret
```
## -O2
### Compile your C code with -O2
![](https://i.imgur.com/BYUwRhP.png)
### Read the ELF header
![](https://i.imgur.com/TblE5lx.png)
### Check size
![](https://i.imgur.com/EKME9N9.png)
### Execute
![](https://i.imgur.com/GBiVwh2.png)
### Disassembly code
```
00010200 <maxProfit>:
   10200:   00100793    li    a5,1
   10204:   04b7da63    bge   a5,a1,10258 <maxProfit+0x58>
   10208:   00259593    slli  a1,a1,0x2
   1020c:   00052703    lw    a4,0(a0)
   10210:   00450793    addi  a5,a0,4
   10214:   00b505b3    add   a1,a0,a1
   10218:   00000613    li    a2,0
   1021c:   00000513    li    a0,0
   10220:   0140006f    j     10234 <maxProfit+0x34>
   10224:   00478793    addi  a5,a5,4
   10228:   00c50533    add   a0,a0,a2
   1022c:   00000613    li    a2,0
   10230:   02b78063    beq   a5,a1,10250 <maxProfit+0x50>
   10234:   00070693    mv    a3,a4
   10238:   0007a703    lw    a4,0(a5)
   1023c:   40d70833    sub   a6,a4,a3
   10240:   fee6d2e3    bge   a3,a4,10224 <maxProfit+0x24>
   10244:   00478793    addi  a5,a5,4
   10248:   01060633    add   a2,a2,a6
   1024c:   feb794e3    bne   a5,a1,10234 <maxProfit+0x34>
   10250:   00c50533    add   a0,a0,a2
   10254:   00008067    ret
   10258:   00000513    li    a0,0
   1025c:   00008067    ret
```
## -O3
### Compile your C code with -O3
![](https://i.imgur.com/MMMMrYe.png)
### Read the ELF header
![](https://i.imgur.com/hCLeGgk.png)
### Check size
![](https://i.imgur.com/oHTKB9D.png)
### Execute
![](https://i.imgur.com/3gJmtOC.png)
### Disassembly code
```
000101ac <maxProfit>:
   101ac:   00100793    li    a5,1
   101b0:   0cb7d263    bge   a5,a1,10274 <maxProfit+0xc8>
   101b4:   00300793    li    a5,3
   101b8:   0cb7d263    bge   a5,a1,1027c <maxProfit+0xd0>
   101bc:   ffc58713    addi  a4,a1,-4
   101c0:   00052783    lw    a5,0(a0)
   101c4:   ffe77713    andi  a4,a4,-2
   101c8:   00450693    addi  a3,a0,4
   101cc:   00370713    addi  a4,a4,3
   101d0:   00000813    li    a6,0
   101d4:   00100313    li    t1,1
   101d8:   00000893    li    a7,0
   101dc:   0006a603    lw    a2,0(a3)
   101e0:   00000e13    li    t3,0
   101e4:   40f60eb3    sub   t4,a2,a5
   101e8:   06c7d863    bge   a5,a2,10258 <maxProfit+0xac>
   101ec:   0046a783    lw    a5,4(a3)
   101f0:   010e8e33    add   t3,t4,a6
   101f4:   00000813    li    a6,0
   101f8:   40c78eb3    sub   t4,a5,a2
   101fc:   06f65863    bge   a2,a5,1026c <maxProfit+0xc0>
   10200:   01ce8833    add   a6,t4,t3
   10204:   00230313    addi  t1,t1,2 # 101a2 <frame_dummy+0x16>
   10208:   00868693    addi  a3,a3,8
   1020c:   fce318e3    bne   t1,a4,101dc <maxProfit+0x30>
   10210:   00271793    slli  a5,a4,0x2
   10214:   00f507b3    add   a5,a0,a5
   10218:   0180006f    j     10230 <maxProfit+0x84>
   1021c:   00170713    addi  a4,a4,1
   10220:   010888b3    add   a7,a7,a6
   10224:   00478793    addi  a5,a5,4
   10228:   00000813    li    a6,0
   1022c:   02b75263    bge   a4,a1,10250 <maxProfit+0xa4>
   10230:   0007a603    lw    a2,0(a5)
   10234:   ffc7a683    lw    a3,-4(a5)
   10238:   40d60533    sub   a0,a2,a3
   1023c:   fec6d0e3    bge   a3,a2,1021c <maxProfit+0x70>
   10240:   00170713    addi  a4,a4,1
   10244:   00a80833    add   a6,a6,a0
   10248:   00478793    addi  a5,a5,4
   1024c:   feb742e3    blt   a4,a1,10230 <maxProfit+0x84>
   10250:   01088533    add   a0,a7,a6
   10254:   00008067    ret
   10258:   0046a783    lw    a5,4(a3)
   1025c:   010888b3    add   a7,a7,a6
   10260:   00000813    li    a6,0
   10264:   40c78eb3    sub   t4,a5,a2
   10268:   f8f64ce3    blt   a2,a5,10200 <maxProfit+0x54>
   1026c:   01c888b3    add   a7,a7,t3
   10270:   f95ff06f    j     10204 <maxProfit+0x58>
   10274:   00000513    li    a0,0
   10278:   00008067    ret
   1027c:   00000813    li    a6,0
   10280:   00100713    li    a4,1
   10284:   00000893    li    a7,0
   10288:   f89ff06f    j     10210 <maxProfit+0x64>
```
## Observation
| | Lines of code | Bytes allocated on stack | Registers used | CSR cycle count |
| -------- | -------- | -------- | -------- | -------- |
| O0 | 22 | 96 | `ra` `sp` `a0 ~ a7` `s0` `t1` | 2832 |
| O1 | 22 | 96 | `ra` `sp` `a0 ~ a7` `s0` `t1` | 2822 |
| O2 | 24 | 80 | `ra` `sp` `a0 ~ a7` `s0` | 2826 |
| O3 | 96 | 16 | `ra` `sp` `a0 ~ a7` | 2760 |
| Ofast | 96 | 16 | `ra` `sp` `a0 ~ a7` | 2760 |
| Os | 17 | 42 | `ra` `sp` `a0 ~ a7` `s0` | 2861 |
### Analysis
* **O0** to **O1**: Almost the same. Lines of code, stack usage, and registers are identical; the CSR cycle count drops slightly, from 2832 to 2822.
* **O1** to **O2**: Stack usage drops from 96 to 80 bytes, and register `t1` is no longer used from O2 on. In the disassembly, the O2 loop issues one `lw` per iteration instead of two, because the previous price is kept in a register. Lines of code rise slightly (22 to 24) and the CSR cycle count is nearly unchanged (2822 vs. 2826).
* **O2** to **O3**: Stack usage drops further, to 16 bytes, register `s0` is no longer used, and the CSR cycle count falls to 2760, the lowest in the table. Lines of code increase sharply, because the O3 loop handles two elements per iteration. Furthermore, at O3 the `main` function is placed before the `maxProfit` function.
* **Ofast**: The same as O3, with no difference.
* **Os**: The fewest lines of code (17). Stack usage (42 bytes) is between O2 and O3, and the register set is the same as at O2, but the CSR cycle count (2861) is the highest in the table.
* Comparison (">" means better)
    * CSR: O3 = Ofast > O1 > O2 > O0 > Os
    * LOC total: Os > O0 = O1 > O2 > O3 = Ofast