# Assignment2: RISC-V Toolchain
###### tags: `Computer Architecture` `RISC-V` `jserv`

> Contributed by <[tonych1997](https://github.com/tonych1997/Computer-Architecture)>

## Information for Assignment 2
* [Assignment2: RISC-V Toolchain](https://hackmd.io/@sysprog/2022-arch-homework2)
* [Lab2: RISC-V RV32I[MACF] emulator with ELF support](https://hackmd.io/@sysprog/SJAR5XMmi)
* [rv32emu](https://github.com/sysprog21/rv32emu)
* [RISC-V Assembly Programmer's Manual](https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md)

## Install RISC-V rv32imacf toolchain on WSL (Ubuntu 20.04)
Since my PC runs Windows, I installed [WSL (Windows Subsystem for Linux)](https://learn.microsoft.com/zh-tw/windows/wsl/install) to get a Linux environment, which lets me run Ubuntu 20.04. The steps follow [this note](https://hackmd.io/@sysprog/SJAR5XMmi).

### Required packages
```
$ sudo apt update
$ sudo apt install make
$ sudo apt install gcc
$ sudo apt-get install libsdl2-dev
```

### Install GNU Toolchain for RISC-V
[xPack GNU RISC-V Embedded GCC](https://xpack.github.io/riscv-none-elf-gcc/)
```
$ cd /tmp
$ wget https://github.com/xpack-dev-tools/riscv-none-elf-gcc-xpack/releases/download/v12.2.0-1/xpack-riscv-none-elf-gcc-12.2.0-1-linux-x64.tar.gz
$ tar zxvf xpack-riscv-none-elf-gcc-12.2.0-1-linux-x64.tar.gz
$ cp -af xpack-riscv-none-elf-gcc-12.2.0-1 $HOME/riscv-none-elf-gcc
```

### Configure $PATH
```
$ cd $HOME/riscv-none-elf-gcc
$ echo "export PATH=`pwd`/bin:$PATH" > setenv
```
Update the $PATH environment variable:
```
$ cd $HOME
$ source riscv-none-elf-gcc/setenv
```
Only one problem came up; every other step ran correctly.

I hit a [problem](https://blog.csdn.net/wangchengming1/article/details/76946412) when I executed `$ source riscv-none-elf-gcc/setenv`. In WSL, some of the directories in $PATH are Windows directories, and the shell cannot execute the generated `export` line when those paths contain ` ` (space), `(`, or `)`.
So I used the [vim search-and-replace command](https://vim.fandom.com/wiki/Search_and_replace) `:%s/foo/bar/g` to replace ` `, `(`, and `)` with `\\ `, `\\(`, and `\\)` in `setenv` (in a vim replacement string, `\\` produces a single literal backslash), then ran `source` again.

:::warning
The blank after the `export` command is not part of a directory name, but the substitution turns every blank into `\ ` (backslash + space), so remember to delete the `\` before the blank that follows `export`.
:::

### Check $PATH
Check $PATH with the following [commands](https://blog.csdn.net/DLUTBruceZhang/article/details/8811456).
```
$ export         # list exported variables, including PATH
$ echo $PATH     # print the PATH environment variable
```
Check that the `riscv-none-elf-gcc` toolchain is in $PATH.
```
$ riscv-none-elf-gcc -v
```
Then I can see the following message:
`gcc version 12.2.0 (xPack GNU RISC-V Embedded GCC x86_64)`
![](https://i.imgur.com/svwEIRm.png)

Run `hello.elf` using the command line below:
```
$ build/rv32emu build/hello.elf
```
Finally, I get the correct output.
![](https://i.imgur.com/PCC3Fpd.png)

## Pick a Question
The question I picked from [Assignment 1](https://hackmd.io/@sysprog/2022-arch-homework1) is **Search Insert Position**.
> [陳奕萍](https://hackmd.io/@P76111482/BJxisP5Mi) ([Leetcode 35](https://leetcode.com/problems/search-insert-position/))
> Given a sorted array of distinct integers and a target value, return the index if the target is found. If not, return the index where it would be if it were inserted in order.

I chose this question because it uses arrays, and since my previous assignment also used arrays, I wanted to start with an array question.

[Source code](https://github.com/qwer951212/ComputerArchitecture_HW1)

### Original C code
The original implementation of the question is as follows.
```clike=
#include <stdio.h>

int searchInsert(int *nums, int numsSize, int target)
{
    int left = 0;
    int right = numsSize - 1;
    while (left <= right) {
        int mid = (left + right) / 2;
        if (target == nums[mid])
            return mid;
        else if (target < nums[mid])
            right = mid - 1;
        else
            left = mid + 1;
    }
    return left;
}

int main()
{
    int data[] = {1, 3, 5, 6};
    int size = 4;
    int tar1 = 5, tar2 = 2, tar3 = 7;
    int index1 = searchInsert(data, size, tar1);
    int index2 = searchInsert(data, size, tar2);
    int index3 = searchInsert(data, size, tar3);
    printf("The target1 insert position is %d\n", index1);
    printf("The target2 insert position is %d\n", index2);
    printf("The target3 insert position is %d\n", index3);
    return 0;
}
```

### Handwritten Assembly
* Adapt the C and Assembly code to rv32emu style.
* I expected that no special changes would be needed.

```clike=
.data
data: .word 1, 3, 5, 6        # data[] = {1, 3, 5, 6}
size: .word 4                 # array size = 4
tar1: .word 5                 # target1 = 5
tar2: .word 2                 # target2 = 2
tar3: .word 7                 # target3 = 7
str:  .string "The insert position is "
n1:   .string "\n"

.text
main:
    la   s1, data             # s1 = nums[]
    lw   s2, size             # s2 = size
    addi s5, x0, 0            # s5 = 0 //left = 0
    addi s6, s2, -1           # s6 = size - 1 //right = size - 1
    lw   s3, tar1             # s3 = 5 //tar1
    jal  ra, searchInsert
    jal  ra, print
    lw   s3, tar2             # s3 = 2 //tar2
    jal  ra, searchInsert
    jal  ra, print
    lw   s3, tar3             # s3 = 7 //tar3
    jal  ra, searchInsert
    jal  ra, print
    li   a7, 10               # System call: Exit
    ecall

searchInsert:
    addi t0, s5, 0
    addi t1, s6, 0
loop:
    slt  t6, t1, t0           # if left > right, t6 = 1
    bne  t6, zero, return
    add  t2, t0, t1           # mid = left + right
    srli t2, t2, 1            # mid = mid / 2
    slli t3, t2, 2            # offset = mid * 4
    add  t4, s1, t3           # address = base address + offset
    lw   t5, 0(t4)            # t5 = data[mid]
    beq  s3, t5, equal        # if target == data[mid], go to equal
    blt  s3, t5, less         # if target < data[mid], go to less
    addi t0, t2, 1            # if target > data[mid], left = mid + 1
    j    loop                 # jump to loop
equal:
    addi s4, t2, 0            # s4 = mid
    ret                       # return to main
less:
    addi t1, t2, -1           # if target < data[mid], right = mid - 1
    j    loop                 # jump to loop
return:
    addi s4, t0, 0            # s4 = left
    ret                       # return to main

print:
    la   a0, str              # load string
    li   a7, 4                # System call: PrintString
    ecall
    addi a0, s4, 0            # load result
    li   a7, 1                # System call: PrintInt
    ecall
    la   a0, n1
    li   a7, 4
    ecall
    ret
```

## Using GNU/GCC Toolchain
1. Compile C code
    ```
    $ riscv-none-elf-gcc -march=rv32i -mabi=ilp32 -O(1~3) -o (Generated file) (C code file)
    ```
    * -march=rv32i : Specify RV32I instruction set architecture.
    * -mabi=ilp32 : Specify ILP32 ABI.
    * -Ofast: optimized for speed
    * -Os: optimized for size
    * -O(1~3): increasing levels of optimization
2. Display the assembler mnemonics for the machine instructions
    ```
    $ riscv-none-elf-objdump -d (file)
    ```
    * Use [file redirection](https://www.guru99.com/linux-redirection.html) to write the result to a text file
        ```
        riscv-none-elf-objdump -d (file) > asb.txt
        ```
    * [Search keywords in vim](https://officeguide.cc/vim-search-operations-tutorial-examples/)
        ```
        / (keywords)
        ```
3. Display the [ELF](https://zh.wikipedia.org/zh-tw/%E5%8F%AF%E5%9F%B7%E8%A1%8C%E8%88%87%E5%8F%AF%E9%8F%88%E6%8E%A5%E6%A0%BC%E5%BC%8F) file header
    ```
    $ riscv-none-elf-readelf -h (file)
    ```
4. List the section sizes
    ```
    $ riscv-none-elf-size (file)
    ```
5. Execute code
    ```
    $ build/rv32emu (file)
    ```
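
Since the same build, disassembly, and size steps are repeated for every optimization level, they can be scripted. The following is only a sketch under a few assumptions: the source file is named `hw2.c`, the outputs follow the `hw2_<level>` naming used in this note, and `run`, `disasm`, and `DRY_RUN` are hypothetical names introduced here for illustration. When the toolchain or the source file cannot be found, the script only prints the commands it would execute.

```shell
#!/usr/bin/env bash
# Sketch: build one C file at several GCC optimization levels, then
# disassemble each binary and list its section sizes.
# run, disasm and DRY_RUN are hypothetical helpers, not part of any tool.
set -eu

CROSS=riscv-none-elf                 # toolchain prefix used in this note
SRC=${SRC:-hw2.c}                    # assumed source file name
LEVELS=${LEVELS:-"1 2 3 fast s"}     # becomes -O1 -O2 -O3 -Ofast -Os

# Fall back to a dry run when the compiler or the source file is missing.
if ! command -v "$CROSS-gcc" >/dev/null 2>&1 || [ ! -f "$SRC" ]; then
    DRY_RUN=1
fi

# Execute a command, or just print it in dry-run mode.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Write the disassembly of binary $1 to asb_$1.txt.
disasm() {
    "$CROSS-objdump" -d "$1" > "asb_$1.txt"
}

for level in $LEVELS; do
    out="hw2_$level"
    run "$CROSS-gcc" -march=rv32i -mabi=ilp32 "-O$level" -o "$out" "$SRC"
    run disasm "$out"
    run "$CROSS-size" "$out"
done
```

Setting `DRY_RUN=1` in the environment prints the commands without touching any files, which is a quick way to check the flags before building.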
## Compare Assembly code
### -O1
```
$ riscv-none-elf-gcc -march=rv32i -mabi=ilp32 -O1 -o hw2_1 hw2.c
```
#### Assembly Code
```
$ riscv-none-elf-objdump -d hw2_1 > asb_hw2_1.txt
```
* main
```clike=
000101dc <main>:
   101dc: fe010113 addi sp,sp,-32
   101e0: 00112e23 sw ra,28(sp)
   101e4: 00812c23 sw s0,24(sp)
   101e8: 00912a23 sw s1,20(sp)
   101ec: 01212823 sw s2,16(sp)
   101f0: 000217b7 lui a5,0x21
   101f4: 77478793 addi a5,a5,1908 # 21774 <__clzsi2+0xf8>
   101f8: 0007a603 lw a2,0(a5)
   101fc: 0047a683 lw a3,4(a5)
   10200: 0087a703 lw a4,8(a5)
   10204: 00c7a783 lw a5,12(a5)
   10208: 00c12023 sw a2,0(sp)
   1020c: 00d12223 sw a3,4(sp)
   10210: 00e12423 sw a4,8(sp)
   10214: 00f12623 sw a5,12(sp)
   10218: 00500613 li a2,5
   1021c: 00400593 li a1,4
   10220: 00010513 mv a0,sp
   10224: f61ff0ef jal ra,10184 <searchInsert>
   10228: 00050913 mv s2,a0
   1022c: 00200613 li a2,2
   10230: 00400593 li a1,4
   10234: 00010513 mv a0,sp
   10238: f4dff0ef jal ra,10184 <searchInsert>
   1023c: 00050493 mv s1,a0
   10240: 00700613 li a2,7
   10244: 00400593 li a1,4
   10248: 00010513 mv a0,sp
   1024c: f39ff0ef jal ra,10184 <searchInsert>
   10250: 00050413 mv s0,a0
   10254: 00090593 mv a1,s2
   10258: 00021537 lui a0,0x21
   1025c: 70850513 addi a0,a0,1800 # 21708 <__clzsi2+0x8c>
   10260: 260000ef jal ra,104c0 <printf>
   10264: 00048593 mv a1,s1
   10268: 00021537 lui a0,0x21
   1026c: 72c50513 addi a0,a0,1836 # 2172c <__clzsi2+0xb0>
   10270: 250000ef jal ra,104c0 <printf>
   10274: 00040593 mv a1,s0
   10278: 00021537 lui a0,0x21
   1027c: 75050513 addi a0,a0,1872 # 21750 <__clzsi2+0xd4>
   10280: 240000ef jal ra,104c0 <printf>
   10284: 00000513 li a0,0
   10288: 01c12083 lw ra,28(sp)
   1028c: 01812403 lw s0,24(sp)
   10290: 01412483 lw s1,20(sp)
   10294: 01012903 lw s2,16(sp)
   10298: 02010113 addi sp,sp,32
   1029c: 00008067 ret
```
* searchInsert
```clike=
00010184 <searchInsert>:
   10184: 00050693 mv a3,a0
   10188: fff58593 addi a1,a1,-1
   1018c: 0405c463 bltz a1,101d4 <searchInsert+0x50>
   10190: 00000713 li a4,0
   10194: 00c0006f j 101a0 <searchInsert+0x1c>
   10198: 00150713 addi a4,a0,1
   1019c: 02e5c863 blt a1,a4,101cc <searchInsert+0x48>
   101a0: 00b707b3 add a5,a4,a1
   101a4: 01f7d513 srli a0,a5,0x1f
   101a8: 00f50533 add a0,a0,a5
   101ac: 40155513 srai a0,a0,0x1
   101b0: 00251793 slli a5,a0,0x2
   101b4: 00f687b3 add a5,a3,a5
   101b8: 0007a783 lw a5,0(a5)
   101bc: 00c78e63 beq a5,a2,101d8 <searchInsert+0x54>
   101c0: fcf65ce3 bge a2,a5,10198 <searchInsert+0x14>
   101c4: fff50593 addi a1,a0,-1
   101c8: fd5ff06f j 1019c <searchInsert+0x18>
   101cc: 00070513 mv a0,a4
   101d0: 00008067 ret
   101d4: 00000513 li a0,0
   101d8: 00008067 ret
```
#### ELF Header
```
$ riscv-none-elf-readelf -h hw2_1
```
![](https://i.imgur.com/vQhZ1WO.png)
#### Size
```
$ riscv-none-elf-size hw2_1
```
![](https://i.imgur.com/Gnuxl5C.png)
#### Execution result and CSR count
```
$ build/rv32emu --stats hw2/hw2_1
```
![](https://i.imgur.com/noHgdmB.png)

### -O2
```
$ riscv-none-elf-gcc -march=rv32i -mabi=ilp32 -O2 -o hw2_2 hw2.c
```
#### Assembly Code
```
$ riscv-none-elf-objdump -d hw2_2 > asb_hw2_2.txt
```
* main
```clike=
000100c4 <main>:
   100c4: 000217b7 lui a5,0x21
   100c8: 76c78793 addi a5,a5,1900 # 2176c <__clzsi2+0xf8>
   100cc: 0007a803 lw a6,0(a5)
   100d0: 0047a683 lw a3,4(a5)
   100d4: 0087a703 lw a4,8(a5)
   100d8: 00c7a783 lw a5,12(a5)
   100dc: fe010113 addi sp,sp,-32
   100e0: 00500613 li a2,5
   100e4: 00400593 li a1,4
   100e8: 00010513 mv a0,sp
   100ec: 00112e23 sw ra,28(sp)
   100f0: 00812c23 sw s0,24(sp)
   100f4: 00912a23 sw s1,20(sp)
   100f8: 01212823 sw s2,16(sp)
   100fc: 01012023 sw a6,0(sp)
   10100: 00d12223 sw a3,4(sp)
   10104: 00e12423 sw a4,8(sp)
   10108: 00f12623 sw a5,12(sp)
   1010c: 13c000ef jal ra,10248 <searchInsert>
   10110: 00050913 mv s2,a0
   10114: 00200613 li a2,2
   10118: 00400593 li a1,4
   1011c: 00010513 mv a0,sp
   10120: 128000ef jal ra,10248 <searchInsert>
   10124: 00700613 li a2,7
   10128: 00050493 mv s1,a0
   1012c: 00400593 li a1,4
   10130: 00010513 mv a0,sp
   10134: 114000ef jal ra,10248 <searchInsert>
   10138: 00050413 mv s0,a0
   1013c: 00021537 lui a0,0x21
   10140: 00090593 mv a1,s2
   10144: 70050513 addi a0,a0,1792 # 21700 <__clzsi2+0x8c>
   10148: 370000ef jal ra,104b8 <printf>
   1014c: 00021537 lui a0,0x21
   10150: 00048593 mv a1,s1
   10154: 72450513 addi a0,a0,1828 # 21724 <__clzsi2+0xb0>
   10158: 360000ef jal ra,104b8 <printf>
   1015c: 00021537 lui a0,0x21
   10160: 00040593 mv a1,s0
   10164: 74850513 addi a0,a0,1864 # 21748 <__clzsi2+0xd4>
   10168: 350000ef jal ra,104b8 <printf>
   1016c: 01c12083 lw ra,28(sp)
   10170: 01812403 lw s0,24(sp)
   10174: 01412483 lw s1,20(sp)
   10178: 01012903 lw s2,16(sp)
   1017c: 00000513 li a0,0
   10180: 02010113 addi sp,sp,32
   10184: 00008067 ret
```
* searchInsert
```clike=
00010248 <searchInsert>:
   10248: fff58593 addi a1,a1,-1
   1024c: 00050693 mv a3,a0
   10250: 0405c063 bltz a1,10290 <searchInsert+0x48>
   10254: 00000713 li a4,0
   10258: 00b70533 add a0,a4,a1
   1025c: 40155513 srai a0,a0,0x1
   10260: 00251793 slli a5,a0,0x2
   10264: 00f687b3 add a5,a3,a5
   10268: 0007a783 lw a5,0(a5)
   1026c: 02c78463 beq a5,a2,10294 <searchInsert+0x4c>
   10270: 00f65a63 bge a2,a5,10284 <searchInsert+0x3c>
   10274: fff50593 addi a1,a0,-1
   10278: fee5d0e3 bge a1,a4,10258 <searchInsert+0x10>
   1027c: 00070513 mv a0,a4
   10280: 00008067 ret
   10284: 00150713 addi a4,a0,1
   10288: fce5d8e3 bge a1,a4,10258 <searchInsert+0x10>
   1028c: ff1ff06f j 1027c <searchInsert+0x34>
   10290: 00000513 li a0,0
   10294: 00008067 ret
```
#### ELF Header
```
$ riscv-none-elf-readelf -h hw2_2
```
![](https://i.imgur.com/ynPZf80.png)
#### Size
```
$ riscv-none-elf-size hw2_2
```
![](https://i.imgur.com/JOe1yuT.png)
#### Execution result and CSR count
```
$ build/rv32emu --stats hw2/hw2_2
```
![](https://i.imgur.com/pPDZDLP.png)

### -O3
```
$ riscv-none-elf-gcc -march=rv32i -mabi=ilp32 -O3 -o hw2_3 hw2.c
```
#### Assembly Code
```
$ riscv-none-elf-objdump -d hw2_3 > asb_hw2_3.txt
```
* main
```clike=
000100c4 <main>:
   100c4: 000227b7 lui a5,0x22
   100c8: 84478793 addi a5,a5,-1980 # 21844 <__clzsi2+0xf4>
   100cc: 0007a603 lw a2,0(a5)
   100d0: 0047a683 lw a3,4(a5)
   100d4: 0087a703 lw a4,8(a5)
   100d8: 00c7a783 lw a5,12(a5)
   100dc: fe010113 addi sp,sp,-32
   100e0: 00c12023 sw a2,0(sp)
   100e4: 00d12223 sw a3,4(sp)
   100e8: 00112e23 sw ra,28(sp)
   100ec: 00812c23 sw s0,24(sp)
   100f0: 00912a23 sw s1,20(sp)
   100f4: 00e12423 sw a4,8(sp)
   100f8: 00f12623 sw a5,12(sp)
   100fc: 00300693 li a3,3
   10100: 00000593 li a1,0
   10104: 00500613 li a2,5
   10108: 00b687b3 add a5,a3,a1
   1010c: 4017d793 srai a5,a5,0x1
   10110: 00279713 slli a4,a5,0x2
   10114: 01070713 addi a4,a4,16
   10118: 00270733 add a4,a4,sp
   1011c: ff072703 lw a4,-16(a4)
   10120: 12c70c63 beq a4,a2,10258 <main+0x194>
   10124: 10e65863 bge a2,a4,10234 <main+0x170>
   10128: fff78693 addi a3,a5,-1
   1012c: fcb6dee3 bge a3,a1,10108 <main+0x44>
   10130: 00300693 li a3,3
   10134: 00000493 li s1,0
   10138: 00200613 li a2,2
   1013c: 00d487b3 add a5,s1,a3
   10140: 4017d793 srai a5,a5,0x1
   10144: 00279713 slli a4,a5,0x2
   10148: 01070713 addi a4,a4,16
   1014c: 00270733 add a4,a4,sp
   10150: ff072703 lw a4,-16(a4)
   10154: 0cc70c63 beq a4,a2,1022c <main+0x168>
   10158: 0ae65863 bge a2,a4,10208 <main+0x144>
   1015c: fff78693 addi a3,a5,-1
   10160: fc96dee3 bge a3,s1,1013c <main+0x78>
   10164: 00000413 li s0,0
   10168: 00300693 li a3,3
   1016c: 00700613 li a2,7
   10170: 00d407b3 add a5,s0,a3
   10174: 4017d793 srai a5,a5,0x1
   10178: 00279713 slli a4,a5,0x2
   1017c: 01070713 addi a4,a4,16
   10180: 00270733 add a4,a4,sp
   10184: ff072703 lw a4,-16(a4)
   10188: 06c70c63 beq a4,a2,10200 <main+0x13c>
   1018c: 04e65863 bge a2,a4,101dc <main+0x118>
   10190: fff78693 addi a3,a5,-1
   10194: fc86dee3 bge a3,s0,10170 <main+0xac>
   10198: 00021537 lui a0,0x21
   1019c: 7d850513 addi a0,a0,2008 # 217d8 <__clzsi2+0x88>
   101a0: 3f4000ef jal ra,10594 <printf>
   101a4: 00021537 lui a0,0x21
   101a8: 00048593 mv a1,s1
   101ac: 7fc50513 addi a0,a0,2044 # 217fc <__clzsi2+0xac>
   101b0: 3e4000ef jal ra,10594 <printf>
   101b4: 00022537 lui a0,0x22
   101b8: 00040593 mv a1,s0
   101bc: 82050513 addi a0,a0,-2016 # 21820 <__clzsi2+0xd0>
   101c0: 3d4000ef jal ra,10594 <printf>
   101c4: 01c12083 lw ra,28(sp)
   101c8: 01812403 lw s0,24(sp)
   101cc: 01412483 lw s1,20(sp)
   101d0: 00000513 li a0,0
   101d4: 02010113 addi sp,sp,32
   101d8: 00008067 ret
   101dc: 00178413 addi s0,a5,1
   101e0: fa86cce3 blt a3,s0,10198 <main+0xd4>
   101e4: 00d407b3 add a5,s0,a3
   101e8: 4017d793 srai a5,a5,0x1
   101ec: 00279713 slli a4,a5,0x2
   101f0: 01070713 addi a4,a4,16
   101f4: 00270733 add a4,a4,sp
   101f8: ff072703 lw a4,-16(a4)
   101fc: f8c718e3 bne a4,a2,1018c <main+0xc8>
   10200: 00078413 mv s0,a5
   10204: f95ff06f j 10198 <main+0xd4>
   10208: 00178493 addi s1,a5,1
   1020c: f496cce3 blt a3,s1,10164 <main+0xa0>
   10210: 00d487b3 add a5,s1,a3
   10214: 4017d793 srai a5,a5,0x1
   10218: 00279713 slli a4,a5,0x2
   1021c: 01070713 addi a4,a4,16
   10220: 00270733 add a4,a4,sp
   10224: ff072703 lw a4,-16(a4)
   10228: f2c718e3 bne a4,a2,10158 <main+0x94>
   1022c: 00078493 mv s1,a5
   10230: f35ff06f j 10164 <main+0xa0>
   10234: 00178593 addi a1,a5,1
   10238: eeb6cce3 blt a3,a1,10130 <main+0x6c>
   1023c: 00b687b3 add a5,a3,a1
   10240: 4017d793 srai a5,a5,0x1
   10244: 00279713 slli a4,a5,0x2
   10248: 01070713 addi a4,a4,16
   1024c: 00270733 add a4,a4,sp
   10250: ff072703 lw a4,-16(a4)
   10254: ecc718e3 bne a4,a2,10124 <main+0x60>
   10258: 00078593 mv a1,a5
   1025c: ed5ff06f j 10130 <main+0x6c>
```
* searchInsert
```clike=
00010320 <searchInsert>:
   10320: fff58593 addi a1,a1,-1
   10324: 00050693 mv a3,a0
   10328: 0405c263 bltz a1,1036c <searchInsert+0x4c>
   1032c: 00000713 li a4,0
   10330: 00b70533 add a0,a4,a1
   10334: 40155513 srai a0,a0,0x1
   10338: 00251793 slli a5,a0,0x2
   1033c: 00f687b3 add a5,a3,a5
   10340: 0007a783 lw a5,0(a5)
   10344: 00c78a63 beq a5,a2,10358 <searchInsert+0x38>
   10348: 00f65a63 bge a2,a5,1035c <searchInsert+0x3c>
   1034c: fff50593 addi a1,a0,-1
   10350: fee5d0e3 bge a1,a4,10330 <searchInsert+0x10>
   10354: 00070513 mv a0,a4
   10358: 00008067 ret
   1035c: 00150713 addi a4,a0,1
   10360: fce5d8e3 bge a1,a4,10330 <searchInsert+0x10>
   10364: 00070513 mv a0,a4
   10368: ff1ff06f j 10358 <searchInsert+0x38>
   1036c: 00000513 li a0,0
   10370: 00008067 ret
```
#### ELF Header
```
$ riscv-none-elf-readelf -h hw2_3
```
![](https://i.imgur.com/4K1IURy.png)
#### Size
```
$ riscv-none-elf-size hw2_3
```
![](https://i.imgur.com/wgA1WvI.png)
#### Execution result and CSR count
```
$ build/rv32emu --stats hw2/hw2_3
```
![](https://i.imgur.com/sT7CUsX.png)

### -Ofast
```
$ riscv-none-elf-gcc -march=rv32i -mabi=ilp32 -Ofast -o hw2_fast hw2.c
```
#### Assembly Code
```
$ riscv-none-elf-objdump -d hw2_fast > asb_hw2_fast.txt
```
* main
```clike=
000100c4 <main>:
   100c4: 000227b7 lui a5,0x22
   100c8: 84478793 addi a5,a5,-1980 # 21844 <__clzsi2+0xf4>
   100cc: 0007a603 lw a2,0(a5)
   100d0: 0047a683 lw a3,4(a5)
   100d4: 0087a703 lw a4,8(a5)
   100d8: 00c7a783 lw a5,12(a5)
   100dc: fe010113 addi sp,sp,-32
   100e0: 00c12023 sw a2,0(sp)
   100e4: 00d12223 sw a3,4(sp)
   100e8: 00112e23 sw ra,28(sp)
   100ec: 00812c23 sw s0,24(sp)
   100f0: 00912a23 sw s1,20(sp)
   100f4: 00e12423 sw a4,8(sp)
   100f8: 00f12623 sw a5,12(sp)
   100fc: 00300693 li a3,3
   10100: 00000593 li a1,0
   10104: 00500613 li a2,5
   10108: 00b687b3 add a5,a3,a1
   1010c: 4017d793 srai a5,a5,0x1
   10110: 00279713 slli a4,a5,0x2
   10114: 01070713 addi a4,a4,16
   10118: 00270733 add a4,a4,sp
   1011c: ff072703 lw a4,-16(a4)
   10120: 12c70c63 beq a4,a2,10258 <main+0x194>
   10124: 10e65863 bge a2,a4,10234 <main+0x170>
   10128: fff78693 addi a3,a5,-1
   1012c: fcb6dee3 bge a3,a1,10108 <main+0x44>
   10130: 00300693 li a3,3
   10134: 00000493 li s1,0
   10138: 00200613 li a2,2
   1013c: 00d487b3 add a5,s1,a3
   10140: 4017d793 srai a5,a5,0x1
   10144: 00279713 slli a4,a5,0x2
   10148: 01070713 addi a4,a4,16
   1014c: 00270733 add a4,a4,sp
   10150: ff072703 lw a4,-16(a4)
   10154: 0cc70c63 beq a4,a2,1022c <main+0x168>
   10158: 0ae65863 bge a2,a4,10208 <main+0x144>
   1015c: fff78693 addi a3,a5,-1
   10160: fc96dee3 bge a3,s1,1013c <main+0x78>
   10164: 00000413 li s0,0
   10168: 00300693 li a3,3
   1016c: 00700613 li a2,7
   10170: 00d407b3 add a5,s0,a3
   10174: 4017d793 srai a5,a5,0x1
   10178: 00279713 slli a4,a5,0x2
   1017c: 01070713 addi a4,a4,16
   10180: 00270733 add a4,a4,sp
   10184: ff072703 lw a4,-16(a4)
   10188: 06c70c63 beq a4,a2,10200 <main+0x13c>
   1018c: 04e65863 bge a2,a4,101dc <main+0x118>
   10190: fff78693 addi a3,a5,-1
   10194: fc86dee3 bge a3,s0,10170 <main+0xac>
   10198: 00021537 lui a0,0x21
   1019c: 7d850513 addi a0,a0,2008 # 217d8 <__clzsi2+0x88>
   101a0: 3f4000ef jal ra,10594 <printf>
   101a4: 00021537 lui a0,0x21
   101a8: 00048593 mv a1,s1
   101ac: 7fc50513 addi a0,a0,2044 # 217fc <__clzsi2+0xac>
   101b0: 3e4000ef jal ra,10594 <printf>
   101b4: 00022537 lui a0,0x22
   101b8: 00040593 mv a1,s0
   101bc: 82050513 addi a0,a0,-2016 # 21820 <__clzsi2+0xd0>
   101c0: 3d4000ef jal ra,10594 <printf>
   101c4: 01c12083 lw ra,28(sp)
   101c8: 01812403 lw s0,24(sp)
   101cc: 01412483 lw s1,20(sp)
   101d0: 00000513 li a0,0
   101d4: 02010113 addi sp,sp,32
   101d8: 00008067 ret
   101dc: 00178413 addi s0,a5,1
   101e0: fa86cce3 blt a3,s0,10198 <main+0xd4>
   101e4: 00d407b3 add a5,s0,a3
   101e8: 4017d793 srai a5,a5,0x1
   101ec: 00279713 slli a4,a5,0x2
   101f0: 01070713 addi a4,a4,16
   101f4: 00270733 add a4,a4,sp
   101f8: ff072703 lw a4,-16(a4)
   101fc: f8c718e3 bne a4,a2,1018c <main+0xc8>
   10200: 00078413 mv s0,a5
   10204: f95ff06f j 10198 <main+0xd4>
   10208: 00178493 addi s1,a5,1
   1020c: f496cce3 blt a3,s1,10164 <main+0xa0>
   10210: 00d487b3 add a5,s1,a3
   10214: 4017d793 srai a5,a5,0x1
   10218: 00279713 slli a4,a5,0x2
   1021c: 01070713 addi a4,a4,16
   10220: 00270733 add a4,a4,sp
   10224: ff072703 lw a4,-16(a4)
   10228: f2c718e3 bne a4,a2,10158 <main+0x94>
   1022c: 00078493 mv s1,a5
   10230: f35ff06f j 10164 <main+0xa0>
   10234: 00178593 addi a1,a5,1
   10238: eeb6cce3 blt a3,a1,10130 <main+0x6c>
   1023c: 00b687b3 add a5,a3,a1
   10240: 4017d793 srai a5,a5,0x1
   10244: 00279713 slli a4,a5,0x2
   10248: 01070713 addi a4,a4,16
   1024c: 00270733 add a4,a4,sp
   10250: ff072703 lw a4,-16(a4)
   10254: ecc718e3 bne a4,a2,10124 <main+0x60>
   10258: 00078593 mv a1,a5
   1025c: ed5ff06f j 10130 <main+0x6c>
```
* searchInsert
```clike=
00010320 <searchInsert>:
   10320: fff58593 addi a1,a1,-1
   10324: 00050693 mv a3,a0
   10328: 0405c263 bltz a1,1036c <searchInsert+0x4c>
   1032c: 00000713 li a4,0
   10330: 00b70533 add a0,a4,a1
   10334: 40155513 srai a0,a0,0x1
   10338: 00251793 slli a5,a0,0x2
   1033c: 00f687b3 add a5,a3,a5
   10340: 0007a783 lw a5,0(a5)
   10344: 00c78a63 beq a5,a2,10358 <searchInsert+0x38>
   10348: 00f65a63 bge a2,a5,1035c <searchInsert+0x3c>
   1034c: fff50593 addi a1,a0,-1
   10350: fee5d0e3 bge a1,a4,10330 <searchInsert+0x10>
   10354: 00070513 mv a0,a4
   10358: 00008067 ret
   1035c: 00150713 addi a4,a0,1
   10360: fce5d8e3 bge a1,a4,10330 <searchInsert+0x10>
   10364: 00070513 mv a0,a4
   10368: ff1ff06f j 10358 <searchInsert+0x38>
   1036c: 00000513 li a0,0
   10370: 00008067 ret
```
#### ELF Header
```
$ riscv-none-elf-readelf -h hw2_fast
```
![](https://i.imgur.com/iJ4TuoS.png)
#### Size
```
$ riscv-none-elf-size hw2_fast
```
![](https://i.imgur.com/nrDIg8o.png)
#### Execution result and CSR count
```
$ build/rv32emu --stats hw2/hw2_fast
```
![](https://i.imgur.com/kjomFrD.png)

### -Os
```
$ riscv-none-elf-gcc -march=rv32i -mabi=ilp32 -O1 -o hw2_s hw2.c
```
#### Assembly Code
```
$ riscv-none-elf-objdump -d hw2_s > asb_hw2_s.txt
```
* main
```clike=
000101dc <main>:
   101dc: fe010113 addi sp,sp,-32
   101e0: 00112e23 sw ra,28(sp)
   101e4: 00812c23 sw s0,24(sp)
   101e8: 00912a23 sw s1,20(sp)
   101ec: 01212823 sw s2,16(sp)
   101f0: 000217b7 lui a5,0x21
   101f4: 77478793 addi a5,a5,1908 # 21774 <__clzsi2+0xf8>
   101f8: 0007a603 lw a2,0(a5)
   101fc: 0047a683 lw a3,4(a5)
   10200: 0087a703 lw a4,8(a5)
   10204: 00c7a783 lw a5,12(a5)
   10208: 00c12023 sw a2,0(sp)
   1020c: 00d12223 sw a3,4(sp)
   10210: 00e12423 sw a4,8(sp)
   10214: 00f12623 sw a5,12(sp)
   10218: 00500613 li a2,5
   1021c: 00400593 li a1,4
   10220: 00010513 mv a0,sp
   10224: f61ff0ef jal ra,10184 <searchInsert>
   10228: 00050913 mv s2,a0
   1022c: 00200613 li a2,2
   10230: 00400593 li a1,4
   10234: 00010513 mv a0,sp
   10238: f4dff0ef jal ra,10184 <searchInsert>
   1023c: 00050493 mv s1,a0
   10240: 00700613 li a2,7
   10244: 00400593 li a1,4
   10248: 00010513 mv a0,sp
   1024c: f39ff0ef jal ra,10184 <searchInsert>
   10250: 00050413 mv s0,a0
   10254: 00090593 mv a1,s2
   10258: 00021537 lui a0,0x21
   1025c: 70850513 addi a0,a0,1800 # 21708 <__clzsi2+0x8c>
   10260: 260000ef jal ra,104c0 <printf>
   10264: 00048593 mv a1,s1
   10268: 00021537 lui a0,0x21
   1026c: 72c50513 addi a0,a0,1836 # 2172c <__clzsi2+0xb0>
   10270: 250000ef jal ra,104c0 <printf>
   10274: 00040593 mv a1,s0
   10278: 00021537 lui a0,0x21
   1027c: 75050513 addi a0,a0,1872 # 21750 <__clzsi2+0xd4>
   10280: 240000ef jal ra,104c0 <printf>
   10284: 00000513 li a0,0
   10288: 01c12083 lw ra,28(sp)
   1028c: 01812403 lw s0,24(sp)
   10290: 01412483 lw s1,20(sp)
   10294: 01012903 lw s2,16(sp)
   10298: 02010113 addi sp,sp,32
   1029c: 00008067 ret
```
* searchInsert
```clike=
00010184 <searchInsert>:
   10184: 00050693 mv a3,a0
   10188: fff58593 addi a1,a1,-1
   1018c: 0405c463 bltz a1,101d4 <searchInsert+0x50>
   10190: 00000713 li a4,0
   10194: 00c0006f j 101a0 <searchInsert+0x1c>
   10198: 00150713 addi a4,a0,1
   1019c: 02e5c863 blt a1,a4,101cc <searchInsert+0x48>
   101a0: 00b707b3 add a5,a4,a1
   101a4: 01f7d513 srli a0,a5,0x1f
   101a8: 00f50533 add a0,a0,a5
   101ac: 40155513 srai a0,a0,0x1
   101b0: 00251793 slli a5,a0,0x2
   101b4: 00f687b3 add a5,a3,a5
   101b8: 0007a783 lw a5,0(a5)
   101bc: 00c78e63 beq a5,a2,101d8 <searchInsert+0x54>
   101c0: fcf65ce3 bge a2,a5,10198 <searchInsert+0x14>
   101c4: fff50593 addi a1,a0,-1
   101c8: fd5ff06f j 1019c <searchInsert+0x18>
   101cc: 00070513 mv a0,a4
   101d0: 00008067 ret
   101d4: 00000513 li a0,0
   101d8: 00008067 ret
```
#### ELF Header
```
$ riscv-none-elf-readelf -h hw2_s
```
![](https://i.imgur.com/XnzO7qG.png)
#### Size
```
$ riscv-none-elf-size hw2_s
```
![](https://i.imgur.com/6vNRxqV.png)
#### Execution result and CSR count
```
$ build/rv32emu --stats hw2/hw2_s
```
![](https://i.imgur.com/BQackIC.png)

## Contrast Handwritten and Compiler-optimized Assembly
### Observations and explanations
* [LOC (lines of code)](https://zh.wikipedia.org/zh-tw/%E5%8E%9F%E5%A7%8B%E7%A2%BC%E8%A1%8C%E6%95%B8)
* [CSR (Control/Status Register)](https://en.wikipedia.org/wiki/Control/Status_Register)

The best items are shown in bold.

| | Handwritten | O1 | O2 | O3 | Ofast | Os |
| -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| CSR | - | 4913 | 4899 | **4890** | **4890** | 4913 |
| LOC: main | - | **50** | **50** | 103 | 104 | 51 |
| LOC: searchInsert | - | 23 | **21** | 22 | 22 | 23 |
| LOC: total | - | 73 | **71** | 125 | 126 | 74 |

* Comparison (">" means better)
    * CSR: O3 = Ofast > O2 > O1 = Os
    * LOC total: O2 > O1 > Os > O3 > Ofast

> Note: I am still having trouble editing and executing the handwritten assembly. I will add the handwritten comparison once that is resolved.

## Write a Faster / Smaller Assembly Program
* Check the results of rv32emu --stats for the statistics of your program’s execution.
* Then, try to optimize the handwritten/generated assembly. You shall read [RISC-V Assembly Programmer’s Manual](https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md) carefully.
    * drop some function calls
    * apply techniques such as [loop unrolling](https://en.wikipedia.org/wiki/Loop_unrolling) and [peephole optimization](http://homepage.cs.uiowa.edu/~dwjones/compiler/notes/38.shtml)

I think that, under ideal conditions, it is possible to write better code than the compiler. In practice, though, we may overlook something, and the code ends up not as good as we expected. A more feasible approach might be to start from the C code and remove unnecessary work there.

## Reference
* [Assignment 2 Example page](https://hackmd.io/@wIVnCcUaTouAktrkMVLEMA/SJEP_amvK)
* [Assignment 2 -1](https://hackmd.io/@zKOCm3SSTKyUyiPV-nfEjw/Hyj0urMQo)
* [Assignment 2 -2](https://hackmd.io/ADNQPiEFSPC_daP2GJ_sSQ)
* [Assignment2 2021](https://hackmd.io/@sysprog/2021-arch-homework2)
* [Assignment 2 - Search Insert Position](https://hackmd.io/@N9qHU_eLRvKyfDfJk8cDXA/Hy0O4vFwF)