# Assignment2: RISC-V Toolchain

Contributed by [tobychui](https://github.com/tobychui)

## Install Debian 11 on VirtualBox

As my laptop runs Windows only, I decided to install VirtualBox and run Debian on top of it. I chose Debian because, in my previous experience with Linux, it is usually more stable and less resource hungry, with the advantage of being largely compatible with Ubuntu.

I first partitioned a 16GB virtual disk and soon found out that the installation takes more space than I had allocated. Thus, I reinstalled Debian on a 64GB virtual disk and disabled the desktop installation. With this setup, the rv32ima toolchain installed successfully.

![](https://i.imgur.com/SioyToq.png)

:::warning
Never show the plaintext with pictures! Instead, use the markdown syntax.
:notes: jserv
:::

It turns out that the whole toolchain together with Debian 11 (no desktop / GUI) takes around 8.1GB of disk space.

Install the basic utils as follows.

```shell
# ifconfig is provided by the net-tools package
sudo apt-get install net-tools git sudo -y
# Setup sudo by editing /etc/sudoers
sudo ifconfig
# Write down the IP address of your VM. If you are not using a bridged network adapter, you can skip this step
```

:::warning
Make use of [ip](https://linux.die.net/man/8/ip) as the modern option.
:notes: jserv
:::

Connect to the VM instance using SSH and the IP address listed in the ifconfig output.

## Install riscv rv32ima toolchain

Follow the instructions from [this note](https://hackmd.io/@sysprog/rJAufgHYS).

Replace this line

```bash
echo "export PATH=`pwd`/bin:$PATH" > setenv
```

with this line, so that the path is appended to the `.bashrc` in your home directory

```bash
echo "export PATH=`pwd`/bin:$PATH" >> ~/.bashrc
```

then ```source``` the .bashrc file

```bash
source ~/.bashrc
```

Validate that the installation succeeded by executing

```
riscv-none-embed-gcc -v
```

The following result is printed on the terminal.
![](https://i.imgur.com/OdwrxyT.png)

`make test` also passed.

![](https://i.imgur.com/v9NbPD6.png)

## Pick a Question

The following question is picked from Assignment 1.

> 張力尹 Sort Colors (LeetCode 75)
> Given an array nums with n objects colored red, white, or blue, sort them in-place so that objects of the same color are adjacent, with the colors in the order red, white, and blue.
> We will use the integers 0, 1, and 2 to represent the color red, white, and blue, respectively.
> You must solve this problem without using the library's sort function.

[Source code](https://github.com/liing0228/computer-hw)

The original implementation of the question is as follows.

```c=
#include<stdio.h>
int main(){
    int i, j, temp;
    int nums[9]={0,1,1,1,2,2,2,0,1,0};
    int numsSize=9;
    for (i = numsSize-1; i >=0; --i){
        for (j = 0; j < i ; ++j){
            if (nums[j] > nums[j + 1]){
                temp = nums[j];
                nums[j] = nums[j + 1];
                nums[j + 1] = temp;
            }
        }
    }
}
```

(As you may have noticed, the initializer lists 10 elements while `nums` is declared with length 9. GCC only warns about the excess element, so this does not affect the compilation.)

I chose this question because the code uses one of my favorite sorting algorithms, [Bubble Sort](https://www.youtube.com/watch?v=Cq7SMsQBEUw). For simple algorithms like this, it is also easier to spot the differences between hand-written assembly and the optimized assembly generated by the compiler.
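The length mismatch noted above can be avoided by letting the compiler count the initializer. The following is only a minimal sketch, not the original author's code: the names `sample`, `sample_len` and `bubble_sort` are hypothetical, and the sort is moved into a function so that it works on caller-provided storage.

```c
#include <stddef.h>

/* Same sample data as the original program. The array size is left to
 * the compiler, so the initializer and the length cannot disagree. */
int sample[] = {0, 1, 1, 1, 2, 2, 2, 0, 1, 0};
const size_t sample_len = sizeof(sample) / sizeof(sample[0]);

/* Bubble sort: after each outer pass the largest remaining value has
 * moved to index i, so the inner loop only scans indices 0..i. */
void bubble_sort(int *nums, size_t n)
{
    if (n < 2)
        return; /* nothing to sort, and avoids n - 1 wrapping around */

    for (size_t i = n - 1; i > 0; --i) {
        for (size_t j = 0; j < i; ++j) {
            if (nums[j] > nums[j + 1]) {
                int temp = nums[j];
                nums[j] = nums[j + 1];
                nums[j + 1] = temp;
            }
        }
    }
}
```

Calling `bubble_sort(sample, sample_len)` sorts all ten values in place; with `sizeof`, adding or removing an element from the initializer needs no second edit.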
#### The Hand Written Assembly

The hand-written assembly looks like this

```asm
.data
list: .word 0,2,7,4,5,6,3,8,1
len:  .word 9
cma:  .string ","
ent:  .string "\n"

.text
main:
    la   s1, list
    lw   s2, len
    addi s9, s2, -1
    add  s3, zero, zero   # i = 0 (s3)
    add  s4, s1, zero     # address of list[i] (s4)
loop1:
    addi s5, s3, 0        # j = i (s5), incremented at the top of loop2
    add  s6, s4, zero     # address of list[j] (s6)
loop2:
    addi s5, s5, 1        # j = j+1
    addi s6, s6, 4        # advance to the address of list[j]
    lw   t1, 0(s4)        # value of list[i]
    lw   t2, 0(s6)        # value of list[j]
    blt  t1, t2, loop2_
    sw   t2, 0(s4)
    sw   t1, 0(s6)
loop2_:
    blt  s5, s9, loop2
    addi s3, s3, 1        # i = i+1
    addi s4, s4, 4        # advance to the address of list[i]
    blt  s3, s9, loop1
    la   a0, ent          # new line
    li   a7, 4            # print
    ecall
showList:
    lw   t1, 0(s1)
    add  a0, t1, zero     # load integer
    li   a7, 1            # print result
    ecall
    la   a0, cma          # separator
    li   a7, 4            # print
    ecall
    addi s1, s1, 4
    addi s2, s2, -1
    bgtz s2, showList
    li   a7, 10           # end program
    ecall
```

### Compiling using RISC-V gcc

To compile the .c file, I first tried the following command.

```bash
riscv-none-embed-gcc -Os ./leetcode.c
```

But it failed at the link step with the following error.

![](https://i.imgur.com/vCZjsyJ.png)

```
exit.c:(.text.exit+0x18): undefined reference to `_exit'
```

After some googling, it seems that this is a known [issue](https://github.com/riscv-collab/riscv-gnu-toolchain/issues/418#issuecomment-475200297) with the RISC-V GNU toolchain: the C library expects system call stubs such as `_exit`, and the link fails when they are not defined. Hence, a specific parameter has to be added in order to "make the linker happy", as quoted. It links in stub implementations of the missing functions. The parameter is listed below.

```
--specs=nosys.specs
```

After adding the new compile parameter, it works.

![](https://i.imgur.com/coTvVfh.png)

#### Analyzing Compiler Generated Assembly

By using the following command, I can see the assembly generated by the compiler.

```bash
riscv-none-embed-objdump -d ./a.out
```

The above command outputs the following content to the terminal.
```asm= ./a.out: file format elf32-littleriscv Disassembly of section .text: 00010074 <exit>: 10074: 1141 addi sp,sp,-16 10076: 4581 li a1,0 10078: c422 sw s0,8(sp) 1007a: c606 sw ra,12(sp) 1007c: 842a mv s0,a0 1007e: 2a7d jal 1023c <__call_exitprocs> 10080: c281a503 lw a0,-984(gp) # 117e8 <_global_impure_ptr> 10084: 5d5c lw a5,60(a0) 10086: c391 beqz a5,1008a <exit+0x16> 10088: 9782 jalr a5 1008a: 8522 mv a0,s0 1008c: 2e31 jal 103a8 <_exit> 0001008e <main>: 1008e: 4501 li a0,0 10090: 8082 ret 00010092 <register_fini>: 10092: 00000793 li a5,0 10096: c791 beqz a5,100a2 <register_fini+0x10> 10098: 00000517 auipc a0,0x0 1009c: 26250513 addi a0,a0,610 # 102fa <__libc_fini_array> 100a0: ac49 j 10332 <atexit> 100a2: 8082 ret 000100a4 <_start>: 100a4: 00002197 auipc gp,0x2 100a8: b1c18193 addi gp,gp,-1252 # 11bc0 <__global_pointer$> 100ac: c3018513 addi a0,gp,-976 # 117f0 <completed.1> 100b0: c4c18613 addi a2,gp,-948 # 1180c <__BSS_END__> 100b4: 8e09 sub a2,a2,a0 100b6: 4581 li a1,0 100b8: 28f1 jal 10194 <memset> 100ba: 00000517 auipc a0,0x0 100be: 27850513 addi a0,a0,632 # 10332 <atexit> 100c2: c511 beqz a0,100ce <_start+0x2a> 100c4: 00000517 auipc a0,0x0 100c8: 23650513 addi a0,a0,566 # 102fa <__libc_fini_array> 100cc: 249d jal 10332 <atexit> 100ce: 28b1 jal 1012a <__libc_init_array> 100d0: 4502 lw a0,0(sp) 100d2: 004c addi a1,sp,4 100d4: 4601 li a2,0 100d6: 3f65 jal 1008e <main> 100d8: bf71 j 10074 <exit> 000100da <__do_global_dtors_aux>: 100da: 1141 addi sp,sp,-16 100dc: c422 sw s0,8(sp) 100de: c3018413 addi s0,gp,-976 # 117f0 <completed.1> 100e2: 00044783 lbu a5,0(s0) 100e6: c606 sw ra,12(sp) 100e8: ef99 bnez a5,10106 <__do_global_dtors_aux+0x2c> 100ea: 00000793 li a5,0 100ee: cb89 beqz a5,10100 <__do_global_dtors_aux+0x26> 100f0: 00000517 auipc a0,0x0 100f4: 2bc50513 addi a0,a0,700 # 103ac <__FRAME_END__> 100f8: 00000097 auipc ra,0x0 100fc: 000000e7 jalr zero # 0 <exit-0x10074> 10100: 4785 li a5,1 10102: 00f40023 sb a5,0(s0) 10106: 40b2 lw ra,12(sp) 10108: 4422 lw 
s0,8(sp) 1010a: 0141 addi sp,sp,16 1010c: 8082 ret 0001010e <frame_dummy>: 1010e: 00000793 li a5,0 10112: cb99 beqz a5,10128 <frame_dummy+0x1a> 10114: c3418593 addi a1,gp,-972 # 117f4 <object.0> 10118: 00000517 auipc a0,0x0 1011c: 29450513 addi a0,a0,660 # 103ac <__FRAME_END__> 10120: 00000317 auipc t1,0x0 10124: 00000067 jr zero # 0 <exit-0x10074> 10128: 8082 ret 0001012a <__libc_init_array>: 1012a: 1141 addi sp,sp,-16 1012c: c422 sw s0,8(sp) 1012e: c04a sw s2,0(sp) 10130: 00001417 auipc s0,0x1 10134: 28040413 addi s0,s0,640 # 113b0 <__init_array_start> 10138: 00001917 auipc s2,0x1 1013c: 27890913 addi s2,s2,632 # 113b0 <__init_array_start> 10140: 40890933 sub s2,s2,s0 10144: c606 sw ra,12(sp) 10146: c226 sw s1,4(sp) 10148: 40295913 srai s2,s2,0x2 1014c: 00090963 beqz s2,1015e <__libc_init_array+0x34> 10150: 4481 li s1,0 10152: 401c lw a5,0(s0) 10154: 0485 addi s1,s1,1 10156: 0411 addi s0,s0,4 10158: 9782 jalr a5 1015a: fe991ce3 bne s2,s1,10152 <__libc_init_array+0x28> 1015e: 00001417 auipc s0,0x1 10162: 25240413 addi s0,s0,594 # 113b0 <__init_array_start> 10166: 00001917 auipc s2,0x1 1016a: 25290913 addi s2,s2,594 # 113b8 <__do_global_dtors_aux_fini_array_entry> 1016e: 40890933 sub s2,s2,s0 10172: 40295913 srai s2,s2,0x2 10176: 00090963 beqz s2,10188 <__libc_init_array+0x5e> 1017a: 4481 li s1,0 1017c: 401c lw a5,0(s0) 1017e: 0485 addi s1,s1,1 10180: 0411 addi s0,s0,4 10182: 9782 jalr a5 10184: fe991ce3 bne s2,s1,1017c <__libc_init_array+0x52> 10188: 40b2 lw ra,12(sp) 1018a: 4422 lw s0,8(sp) 1018c: 4492 lw s1,4(sp) 1018e: 4902 lw s2,0(sp) 10190: 0141 addi sp,sp,16 10192: 8082 ret 00010194 <memset>: 10194: 433d li t1,15 10196: 872a mv a4,a0 10198: 02c37363 bgeu t1,a2,101be <memset+0x2a> 1019c: 00f77793 andi a5,a4,15 101a0: efbd bnez a5,1021e <memset+0x8a> 101a2: e5ad bnez a1,1020c <memset+0x78> 101a4: ff067693 andi a3,a2,-16 101a8: 8a3d andi a2,a2,15 101aa: 96ba add a3,a3,a4 101ac: c30c sw a1,0(a4) 101ae: c34c sw a1,4(a4) 101b0: c70c sw a1,8(a4) 101b2: c74c sw 
a1,12(a4) 101b4: 0741 addi a4,a4,16 101b6: fed76be3 bltu a4,a3,101ac <memset+0x18> 101ba: e211 bnez a2,101be <memset+0x2a> 101bc: 8082 ret 101be: 40c306b3 sub a3,t1,a2 101c2: 068a slli a3,a3,0x2 101c4: 00000297 auipc t0,0x0 101c8: 9696 add a3,a3,t0 101ca: 00a68067 jr 10(a3) 101ce: 00b70723 sb a1,14(a4) 101d2: 00b706a3 sb a1,13(a4) 101d6: 00b70623 sb a1,12(a4) 101da: 00b705a3 sb a1,11(a4) 101de: 00b70523 sb a1,10(a4) 101e2: 00b704a3 sb a1,9(a4) 101e6: 00b70423 sb a1,8(a4) 101ea: 00b703a3 sb a1,7(a4) 101ee: 00b70323 sb a1,6(a4) 101f2: 00b702a3 sb a1,5(a4) 101f6: 00b70223 sb a1,4(a4) 101fa: 00b701a3 sb a1,3(a4) 101fe: 00b70123 sb a1,2(a4) 10202: 00b700a3 sb a1,1(a4) 10206: 00b70023 sb a1,0(a4) 1020a: 8082 ret 1020c: 0ff5f593 andi a1,a1,255 10210: 00859693 slli a3,a1,0x8 10214: 8dd5 or a1,a1,a3 10216: 01059693 slli a3,a1,0x10 1021a: 8dd5 or a1,a1,a3 1021c: b761 j 101a4 <memset+0x10> 1021e: 00279693 slli a3,a5,0x2 10222: 00000297 auipc t0,0x0 10226: 9696 add a3,a3,t0 10228: 8286 mv t0,ra 1022a: fa8680e7 jalr -88(a3) 1022e: 8096 mv ra,t0 10230: 17c1 addi a5,a5,-16 10232: 8f1d sub a4,a4,a5 10234: 963e add a2,a2,a5 10236: f8c374e3 bgeu t1,a2,101be <memset+0x2a> 1023a: b7a5 j 101a2 <memset+0xe> 0001023c <__call_exitprocs>: 1023c: 7179 addi sp,sp,-48 1023e: cc52 sw s4,24(sp) 10240: c281aa03 lw s4,-984(gp) # 117e8 <_global_impure_ptr> 10244: d04a sw s2,32(sp) 10246: 148a2903 lw s2,328(s4) 1024a: d606 sw ra,44(sp) 1024c: d422 sw s0,40(sp) 1024e: d226 sw s1,36(sp) 10250: ce4e sw s3,28(sp) 10252: ca56 sw s5,20(sp) 10254: c85a sw s6,16(sp) 10256: c65e sw s7,12(sp) 10258: c462 sw s8,8(sp) 1025a: 02090863 beqz s2,1028a <__call_exitprocs+0x4e> 1025e: 8b2a mv s6,a0 10260: 8bae mv s7,a1 10262: 4a85 li s5,1 10264: 59fd li s3,-1 10266: 00492483 lw s1,4(s2) 1026a: fff48413 addi s0,s1,-1 1026e: 00044e63 bltz s0,1028a <__call_exitprocs+0x4e> 10272: 048a slli s1,s1,0x2 10274: 94ca add s1,s1,s2 10276: 020b8663 beqz s7,102a2 <__call_exitprocs+0x66> 1027a: 1044a783 lw a5,260(s1) 1027e: 
03778263 beq a5,s7,102a2 <__call_exitprocs+0x66> 10282: 147d addi s0,s0,-1 10284: 14f1 addi s1,s1,-4 10286: ff3418e3 bne s0,s3,10276 <__call_exitprocs+0x3a> 1028a: 50b2 lw ra,44(sp) 1028c: 5422 lw s0,40(sp) 1028e: 5492 lw s1,36(sp) 10290: 5902 lw s2,32(sp) 10292: 49f2 lw s3,28(sp) 10294: 4a62 lw s4,24(sp) 10296: 4ad2 lw s5,20(sp) 10298: 4b42 lw s6,16(sp) 1029a: 4bb2 lw s7,12(sp) 1029c: 4c22 lw s8,8(sp) 1029e: 6145 addi sp,sp,48 102a0: 8082 ret 102a2: 00492783 lw a5,4(s2) 102a6: 40d4 lw a3,4(s1) 102a8: 17fd addi a5,a5,-1 102aa: 04878263 beq a5,s0,102ee <__call_exitprocs+0xb2> 102ae: 0004a223 sw zero,4(s1) 102b2: dae1 beqz a3,10282 <__call_exitprocs+0x46> 102b4: 18892783 lw a5,392(s2) 102b8: 008a9733 sll a4,s5,s0 102bc: 00492c03 lw s8,4(s2) 102c0: 8ff9 and a5,a5,a4 102c2: ef89 bnez a5,102dc <__call_exitprocs+0xa0> 102c4: 9682 jalr a3 102c6: 00492703 lw a4,4(s2) 102ca: 148a2783 lw a5,328(s4) 102ce: 01871463 bne a4,s8,102d6 <__call_exitprocs+0x9a> 102d2: fb2788e3 beq a5,s2,10282 <__call_exitprocs+0x46> 102d6: dbd5 beqz a5,1028a <__call_exitprocs+0x4e> 102d8: 893e mv s2,a5 102da: b771 j 10266 <__call_exitprocs+0x2a> 102dc: 18c92783 lw a5,396(s2) 102e0: 0844a583 lw a1,132(s1) 102e4: 8f7d and a4,a4,a5 102e6: e719 bnez a4,102f4 <__call_exitprocs+0xb8> 102e8: 855a mv a0,s6 102ea: 9682 jalr a3 102ec: bfe9 j 102c6 <__call_exitprocs+0x8a> 102ee: 00892223 sw s0,4(s2) 102f2: b7c1 j 102b2 <__call_exitprocs+0x76> 102f4: 852e mv a0,a1 102f6: 9682 jalr a3 102f8: b7f9 j 102c6 <__call_exitprocs+0x8a> 000102fa <__libc_fini_array>: 102fa: 1141 addi sp,sp,-16 102fc: c422 sw s0,8(sp) 102fe: 00001797 auipc a5,0x1 10302: 0be78793 addi a5,a5,190 # 113bc <__fini_array_end> 10306: 00001417 auipc s0,0x1 1030a: 0b240413 addi s0,s0,178 # 113b8 <__do_global_dtors_aux_fini_array_entry> 1030e: 8f81 sub a5,a5,s0 10310: c226 sw s1,4(sp) 10312: c606 sw ra,12(sp) 10314: 4027d493 srai s1,a5,0x2 10318: c881 beqz s1,10328 <__libc_fini_array+0x2e> 1031a: 17f1 addi a5,a5,-4 1031c: 943e add s0,s0,a5 1031e: 
401c lw a5,0(s0)
10320: 14fd addi s1,s1,-1
10322: 1471 addi s0,s0,-4
10324: 9782 jalr a5
10326: fce5 bnez s1,1031e <__libc_fini_array+0x24>
10328: 40b2 lw ra,12(sp)
1032a: 4422 lw s0,8(sp)
1032c: 4492 lw s1,4(sp)
1032e: 0141 addi sp,sp,16
10330: 8082 ret

00010332 <atexit>:
10332: 85aa mv a1,a0
10334: 4681 li a3,0
10336: 4601 li a2,0
10338: 4501 li a0,0
1033a: a009 j 1033c <__register_exitproc>

0001033c <__register_exitproc>:
1033c: c281a703 lw a4,-984(gp) # 117e8 <_global_impure_ptr>
10340: 14872783 lw a5,328(a4)
10344: c3a1 beqz a5,10384 <__register_exitproc+0x48>
10346: 43d8 lw a4,4(a5)
10348: 487d li a6,31
1034a: 04e84d63 blt a6,a4,103a4 <__register_exitproc+0x68>
1034e: 00271813 slli a6,a4,0x2
10352: c11d beqz a0,10378 <__register_exitproc+0x3c>
10354: 01078333 add t1,a5,a6
10358: 08c32423 sw a2,136(t1) # 101a8 <memset+0x14>
1035c: 1887a883 lw a7,392(a5)
10360: 4605 li a2,1
10362: 00e61633 sll a2,a2,a4
10366: 00c8e8b3 or a7,a7,a2
1036a: 1917a423 sw a7,392(a5)
1036e: 10d32423 sw a3,264(t1)
10372: 4689 li a3,2
10374: 00d50d63 beq a0,a3,1038e <__register_exitproc+0x52>
10378: 0705 addi a4,a4,1
1037a: c3d8 sw a4,4(a5)
1037c: 97c2 add a5,a5,a6
1037e: c78c sw a1,8(a5)
10380: 4501 li a0,0
10382: 8082 ret
10384: 14c70793 addi a5,a4,332
10388: 14f72423 sw a5,328(a4)
1038c: bf6d j 10346 <__register_exitproc+0xa>
1038e: 18c7a683 lw a3,396(a5)
10392: 0705 addi a4,a4,1
10394: c3d8 sw a4,4(a5)
10396: 8e55 or a2,a2,a3
10398: 18c7a623 sw a2,396(a5)
1039c: 97c2 add a5,a5,a6
1039e: c78c sw a1,8(a5)
103a0: 4501 li a0,0
103a2: 8082 ret
103a4: 557d li a0,-1
103a6: 8082 ret

000103a8 <_exit>:
103a8: a001 j 103a8 <_exit>
```

This does not look normal: as you can see, the `main` section contains only two instructions.

```asm=
0001008e <main>:
1008e: 4501 li a0,0
10090: 8082 ret
```

I debugged this for a few days, which included

1. **Switching to Ubuntu 20.04; the same error occurred (so it is not a Debian 11 issue)**
2. 
Testing other students' work with a structure similar to my chosen question, like [this one](https://github.com/chungen0126/TwoSum) by 李仲恩. A similar error occurred when trying to build or analyze the assembly.
   ![](https://i.imgur.com/7UyZVZK.png)
3. Looking for issues on the GNU toolchain repo. No issue exactly matches the error I am seeing. The closest one I could find is [this](https://github.com/riscv-collab/riscv-gnu-toolchain/issues/893#issuecomment-830740919), but I have no idea how to resolve it.
4. Other common debugging actions (e.g. restarting the machine, changing file permissions), with no success.

:::info
I don't know the reason. But if you add `volatile` to each variable, it seems OK. Maybe you can try to find out what the difference is between `volatile` and non-volatile.

```asm=
00010054 <main>:
10054: 000107b7 lui a5,0x10
10058: 19478793 addi a5,a5,404 # 10194 <main+0x140>
1005c: 0007a683 lw a3,0(a5)
10060: 0047a703 lw a4,4(a5)
10064: fc010113 addi sp,sp,-64
10068: 0087a603 lw a2,8(a5)
1006c: 00d12e23 sw a3,28(sp)
10070: 00c7a683 lw a3,12(a5)
10074: 02e12023 sw a4,32(sp)
10078: 0107a703 lw a4,16(a5)
1007c: 02c12223 sw a2,36(sp)
10080: 0147a603 lw a2,20(a5)
10084: 02d12423 sw a3,40(sp)
10088: 0187a683 lw a3,24(a5)
1008c: 02e12623 sw a4,44(sp)
10090: 01c7a703 lw a4,28(a5)
10094: 02c12823 sw a2,48(sp)
10098: 0207a783 lw a5,32(a5)
1009c: 02d12a23 sw a3,52(sp)
100a0: 02e12c23 sw a4,56(sp)
100a4: 02f12e23 sw a5,60(sp)
100a8: 00900793 li a5,9
100ac: 00f12c23 sw a5,24(sp)
100b0: 01812783 lw a5,24(sp)
100b4: fff78793 addi a5,a5,-1
100b8: 00f12623 sw a5,12(sp)
100bc: 00c12783 lw a5,12(sp)
100c0: 0c07c463 bltz a5,10188 <main+0x134>
100c4: 00012823 sw zero,16(sp)
100c8: 01012703 lw a4,16(sp)
...
```

For your information, ChingHongFang
:::

### Update 1: After adding volatile to the C program

Based on the TA's idea, the `volatile` qualifier was added to each int variable as follows.
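As a text rendering of that change (the screenshot remains the authoritative version), here is a minimal sketch that assumes `volatile` was simply added to every declaration of the original program. The wrapper function `sort_colors_volatile` and its `out` parameter are hypothetical and exist only so the result can be inspected; the original code lives directly in `main`. The tenth initializer is dropped here to match the declared length of 9.

```c
/* Hypothetical wrapper around the modified program body.
 * Copies the sorted values into out (at least 9 ints) and
 * returns the number of elements. */
int sort_colors_volatile(int *out)
{
    /* volatile forces every read and write of these objects to be
     * performed, so the compiler cannot drop the loops below. */
    volatile int i, j, temp;
    volatile int nums[9] = {0, 1, 1, 1, 2, 2, 2, 0, 1};
    volatile int numsSize = 9;

    for (i = numsSize - 1; i >= 0; --i) {
        for (j = 0; j < i; ++j) {
            if (nums[j] > nums[j + 1]) {
                temp = nums[j];
                nums[j] = nums[j + 1];
                nums[j + 1] = temp;
            }
        }
    }

    /* Not part of the original program: expose the result. */
    for (i = 0; i < numsSize; ++i)
        out[i] = nums[i];
    return numsSize;
}
```

The sorting loops are the same as in the original listing; only the qualifiers and the copy-out step at the end differ.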
![](https://i.imgur.com/vO403tt.png) The following was compiled and returned ![](https://i.imgur.com/3KCRAHs.png) ``` ../a.out: file format elf32-littleriscv Disassembly of section .text: 00010074 <exit>: 10074: 1141 addi sp,sp,-16 10076: 4581 li a1,0 10078: c422 sw s0,8(sp) 1007a: c606 sw ra,12(sp) 1007c: 842a mv s0,a0 1007e: 24c1 jal 1033e <__call_exitprocs> 10080: c281a503 lw a0,-984(gp) # 11908 <_global_impure_ptr> 10084: 5d5c lw a5,60(a0) 10086: c391 beqz a5,1008a <exit+0x16> 10088: 9782 jalr a5 1008a: 8522 mv a0,s0 1008c: 2939 jal 104aa <_exit> 0001008e <register_fini>: 1008e: 00000793 li a5,0 10092: c791 beqz a5,1009e <register_fini+0x10> 10094: 00000517 auipc a0,0x0 10098: 36850513 addi a0,a0,872 # 103fc <__libc_fini_array> 1009c: ae61 j 10434 <atexit> 1009e: 8082 ret 000100a0 <_start>: 100a0: 00002197 auipc gp,0x2 100a4: c4018193 addi gp,gp,-960 # 11ce0 <__global_pointer$> 100a8: c3018513 addi a0,gp,-976 # 11910 <completed.1> 100ac: c4c18613 addi a2,gp,-948 # 1192c <__BSS_END__> 100b0: 8e09 sub a2,a2,a0 100b2: 4581 li a1,0 100b4: 22cd jal 10296 <memset> 100b6: 00000517 auipc a0,0x0 100ba: 37e50513 addi a0,a0,894 # 10434 <atexit> 100be: c511 beqz a0,100ca <_start+0x2a> 100c0: 00000517 auipc a0,0x0 100c4: 33c50513 addi a0,a0,828 # 103fc <__libc_fini_array> 100c8: 26b5 jal 10434 <atexit> 100ca: 228d jal 1022c <__libc_init_array> 100cc: 4502 lw a0,0(sp) 100ce: 004c addi a1,sp,4 100d0: 4601 li a2,0 100d2: 2891 jal 10126 <main> 100d4: b745 j 10074 <exit> 000100d6 <__do_global_dtors_aux>: 100d6: 1141 addi sp,sp,-16 100d8: c422 sw s0,8(sp) 100da: c3018413 addi s0,gp,-976 # 11910 <completed.1> 100de: 00044783 lbu a5,0(s0) 100e2: c606 sw ra,12(sp) 100e4: ef99 bnez a5,10102 <__do_global_dtors_aux+0x2c> 100e6: 00000793 li a5,0 100ea: cb89 beqz a5,100fc <__do_global_dtors_aux+0x26> 100ec: 00000517 auipc a0,0x0 100f0: 3e450513 addi a0,a0,996 # 104d0 <__FRAME_END__> 100f4: 00000097 auipc ra,0x0 100f8: 000000e7 jalr zero # 0 <exit-0x10074> 100fc: 4785 li a5,1 100fe: 
00f40023 sb a5,0(s0) 10102: 40b2 lw ra,12(sp) 10104: 4422 lw s0,8(sp) 10106: 0141 addi sp,sp,16 10108: 8082 ret 0001010a <frame_dummy>: 1010a: 00000793 li a5,0 1010e: cb99 beqz a5,10124 <frame_dummy+0x1a> 10110: c3418593 addi a1,gp,-972 # 11914 <object.0> 10114: 00000517 auipc a0,0x0 10118: 3bc50513 addi a0,a0,956 # 104d0 <__FRAME_END__> 1011c: 00000317 auipc t1,0x0 10120: 00000067 jr zero # 0 <exit-0x10074> 10124: 8082 ret 00010126 <main>: 10126: 715d addi sp,sp,-80 10128: c6a2 sw s0,76(sp) 1012a: 0880 addi s0,sp,80 1012c: 67c1 lui a5,0x10 1012e: 4ac78793 addi a5,a5,1196 # 104ac <_exit+0x2> 10132: 0007a303 lw t1,0(a5) 10136: 0047a883 lw a7,4(a5) 1013a: 0087a803 lw a6,8(a5) 1013e: 47c8 lw a0,12(a5) 10140: 4b8c lw a1,16(a5) 10142: 4bd0 lw a2,20(a5) 10144: 4f94 lw a3,24(a5) 10146: 4fd8 lw a4,28(a5) 10148: 539c lw a5,32(a5) 1014a: fc642023 sw t1,-64(s0) 1014e: fd142223 sw a7,-60(s0) 10152: fd042423 sw a6,-56(s0) 10156: fca42623 sw a0,-52(s0) 1015a: fcb42823 sw a1,-48(s0) 1015e: fcc42a23 sw a2,-44(s0) 10162: fcd42c23 sw a3,-40(s0) 10166: fce42e23 sw a4,-36(s0) 1016a: fef42023 sw a5,-32(s0) 1016e: 47a5 li a5,9 10170: faf42e23 sw a5,-68(s0) 10174: fbc42783 lw a5,-68(s0) 10178: 17fd addi a5,a5,-1 1017a: fef42623 sw a5,-20(s0) 1017e: a871 j 1021a <main+0xf4> 10180: fe042423 sw zero,-24(s0) 10184: a041 j 10204 <main+0xde> 10186: fe842783 lw a5,-24(s0) 1018a: 078a slli a5,a5,0x2 1018c: ff040713 addi a4,s0,-16 10190: 97ba add a5,a5,a4 10192: fd07a703 lw a4,-48(a5) 10196: fe842783 lw a5,-24(s0) 1019a: 0785 addi a5,a5,1 1019c: 078a slli a5,a5,0x2 1019e: ff040693 addi a3,s0,-16 101a2: 97b6 add a5,a5,a3 101a4: fd07a783 lw a5,-48(a5) 101a8: 04e7d963 bge a5,a4,101fa <main+0xd4> 101ac: fe842783 lw a5,-24(s0) 101b0: 078a slli a5,a5,0x2 101b2: ff040713 addi a4,s0,-16 101b6: 97ba add a5,a5,a4 101b8: fd07a783 lw a5,-48(a5) 101bc: fef42223 sw a5,-28(s0) 101c0: fe842783 lw a5,-24(s0) 101c4: 0785 addi a5,a5,1 101c6: fe842683 lw a3,-24(s0) 101ca: 078a slli a5,a5,0x2 101cc: ff040713 addi 
a4,s0,-16 101d0: 97ba add a5,a5,a4 101d2: fd07a703 lw a4,-48(a5) 101d6: 00269793 slli a5,a3,0x2 101da: ff040693 addi a3,s0,-16 101de: 97b6 add a5,a5,a3 101e0: fce7a823 sw a4,-48(a5) 101e4: fe842783 lw a5,-24(s0) 101e8: 0785 addi a5,a5,1 101ea: fe442703 lw a4,-28(s0) 101ee: 078a slli a5,a5,0x2 101f0: ff040693 addi a3,s0,-16 101f4: 97b6 add a5,a5,a3 101f6: fce7a823 sw a4,-48(a5) 101fa: fe842783 lw a5,-24(s0) 101fe: 0785 addi a5,a5,1 10200: fef42423 sw a5,-24(s0) 10204: fe842703 lw a4,-24(s0) 10208: fec42783 lw a5,-20(s0) 1020c: f6f74de3 blt a4,a5,10186 <main+0x60> 10210: fec42783 lw a5,-20(s0) 10214: 17fd addi a5,a5,-1 10216: fef42623 sw a5,-20(s0) 1021a: fec42783 lw a5,-20(s0) 1021e: f607d1e3 bgez a5,10180 <main+0x5a> 10222: 4781 li a5,0 10224: 853e mv a0,a5 10226: 4436 lw s0,76(sp) 10228: 6161 addi sp,sp,80 1022a: 8082 ret 0001022c <__libc_init_array>: 1022c: 1141 addi sp,sp,-16 1022e: c422 sw s0,8(sp) 10230: c04a sw s2,0(sp) 10232: 00001417 auipc s0,0x1 10236: 2a240413 addi s0,s0,674 # 114d4 <__init_array_start> 1023a: 00001917 auipc s2,0x1 1023e: 29a90913 addi s2,s2,666 # 114d4 <__init_array_start> 10242: 40890933 sub s2,s2,s0 10246: c606 sw ra,12(sp) 10248: c226 sw s1,4(sp) 1024a: 40295913 srai s2,s2,0x2 1024e: 00090963 beqz s2,10260 <__libc_init_array+0x34> 10252: 4481 li s1,0 10254: 401c lw a5,0(s0) 10256: 0485 addi s1,s1,1 10258: 0411 addi s0,s0,4 1025a: 9782 jalr a5 1025c: fe991ce3 bne s2,s1,10254 <__libc_init_array+0x28> 10260: 00001417 auipc s0,0x1 10264: 27440413 addi s0,s0,628 # 114d4 <__init_array_start> 10268: 00001917 auipc s2,0x1 1026c: 27490913 addi s2,s2,628 # 114dc <__do_global_dtors_aux_fini_array_entry> 10270: 40890933 sub s2,s2,s0 10274: 40295913 srai s2,s2,0x2 10278: 00090963 beqz s2,1028a <__libc_init_array+0x5e> 1027c: 4481 li s1,0 1027e: 401c lw a5,0(s0) 10280: 0485 addi s1,s1,1 10282: 0411 addi s0,s0,4 10284: 9782 jalr a5 10286: fe991ce3 bne s2,s1,1027e <__libc_init_array+0x52> 1028a: 40b2 lw ra,12(sp) 1028c: 4422 lw s0,8(sp) 1028e: 4492 
lw s1,4(sp) 10290: 4902 lw s2,0(sp) 10292: 0141 addi sp,sp,16 10294: 8082 ret 00010296 <memset>: 10296: 433d li t1,15 10298: 872a mv a4,a0 1029a: 02c37363 bgeu t1,a2,102c0 <memset+0x2a> 1029e: 00f77793 andi a5,a4,15 102a2: efbd bnez a5,10320 <memset+0x8a> 102a4: e5ad bnez a1,1030e <memset+0x78> 102a6: ff067693 andi a3,a2,-16 102aa: 8a3d andi a2,a2,15 102ac: 96ba add a3,a3,a4 102ae: c30c sw a1,0(a4) 102b0: c34c sw a1,4(a4) 102b2: c70c sw a1,8(a4) 102b4: c74c sw a1,12(a4) 102b6: 0741 addi a4,a4,16 102b8: fed76be3 bltu a4,a3,102ae <memset+0x18> 102bc: e211 bnez a2,102c0 <memset+0x2a> 102be: 8082 ret 102c0: 40c306b3 sub a3,t1,a2 102c4: 068a slli a3,a3,0x2 102c6: 00000297 auipc t0,0x0 102ca: 9696 add a3,a3,t0 102cc: 00a68067 jr 10(a3) 102d0: 00b70723 sb a1,14(a4) 102d4: 00b706a3 sb a1,13(a4) 102d8: 00b70623 sb a1,12(a4) 102dc: 00b705a3 sb a1,11(a4) 102e0: 00b70523 sb a1,10(a4) 102e4: 00b704a3 sb a1,9(a4) 102e8: 00b70423 sb a1,8(a4) 102ec: 00b703a3 sb a1,7(a4) 102f0: 00b70323 sb a1,6(a4) 102f4: 00b702a3 sb a1,5(a4) 102f8: 00b70223 sb a1,4(a4) 102fc: 00b701a3 sb a1,3(a4) 10300: 00b70123 sb a1,2(a4) 10304: 00b700a3 sb a1,1(a4) 10308: 00b70023 sb a1,0(a4) 1030c: 8082 ret 1030e: 0ff5f593 andi a1,a1,255 10312: 00859693 slli a3,a1,0x8 10316: 8dd5 or a1,a1,a3 10318: 01059693 slli a3,a1,0x10 1031c: 8dd5 or a1,a1,a3 1031e: b761 j 102a6 <memset+0x10> 10320: 00279693 slli a3,a5,0x2 10324: 00000297 auipc t0,0x0 10328: 9696 add a3,a3,t0 1032a: 8286 mv t0,ra 1032c: fa8680e7 jalr -88(a3) 10330: 8096 mv ra,t0 10332: 17c1 addi a5,a5,-16 10334: 8f1d sub a4,a4,a5 10336: 963e add a2,a2,a5 10338: f8c374e3 bgeu t1,a2,102c0 <memset+0x2a> 1033c: b7a5 j 102a4 <memset+0xe> 0001033e <__call_exitprocs>: 1033e: 7179 addi sp,sp,-48 10340: cc52 sw s4,24(sp) 10342: c281aa03 lw s4,-984(gp) # 11908 <_global_impure_ptr> 10346: d04a sw s2,32(sp) 10348: 148a2903 lw s2,328(s4) 1034c: d606 sw ra,44(sp) 1034e: d422 sw s0,40(sp) 10350: d226 sw s1,36(sp) 10352: ce4e sw s3,28(sp) 10354: ca56 sw s5,20(sp) 10356: 
c85a sw s6,16(sp) 10358: c65e sw s7,12(sp) 1035a: c462 sw s8,8(sp) 1035c: 02090863 beqz s2,1038c <__call_exitprocs+0x4e> 10360: 8b2a mv s6,a0 10362: 8bae mv s7,a1 10364: 4a85 li s5,1 10366: 59fd li s3,-1 10368: 00492483 lw s1,4(s2) 1036c: fff48413 addi s0,s1,-1 10370: 00044e63 bltz s0,1038c <__call_exitprocs+0x4e> 10374: 048a slli s1,s1,0x2 10376: 94ca add s1,s1,s2 10378: 020b8663 beqz s7,103a4 <__call_exitprocs+0x66> 1037c: 1044a783 lw a5,260(s1) 10380: 03778263 beq a5,s7,103a4 <__call_exitprocs+0x66> 10384: 147d addi s0,s0,-1 10386: 14f1 addi s1,s1,-4 10388: ff3418e3 bne s0,s3,10378 <__call_exitprocs+0x3a> 1038c: 50b2 lw ra,44(sp) 1038e: 5422 lw s0,40(sp) 10390: 5492 lw s1,36(sp) 10392: 5902 lw s2,32(sp) 10394: 49f2 lw s3,28(sp) 10396: 4a62 lw s4,24(sp) 10398: 4ad2 lw s5,20(sp) 1039a: 4b42 lw s6,16(sp) 1039c: 4bb2 lw s7,12(sp) 1039e: 4c22 lw s8,8(sp) 103a0: 6145 addi sp,sp,48 103a2: 8082 ret 103a4: 00492783 lw a5,4(s2) 103a8: 40d4 lw a3,4(s1) 103aa: 17fd addi a5,a5,-1 103ac: 04878263 beq a5,s0,103f0 <__call_exitprocs+0xb2> 103b0: 0004a223 sw zero,4(s1) 103b4: dae1 beqz a3,10384 <__call_exitprocs+0x46> 103b6: 18892783 lw a5,392(s2) 103ba: 008a9733 sll a4,s5,s0 103be: 00492c03 lw s8,4(s2) 103c2: 8ff9 and a5,a5,a4 103c4: ef89 bnez a5,103de <__call_exitprocs+0xa0> 103c6: 9682 jalr a3 103c8: 00492703 lw a4,4(s2) 103cc: 148a2783 lw a5,328(s4) 103d0: 01871463 bne a4,s8,103d8 <__call_exitprocs+0x9a> 103d4: fb2788e3 beq a5,s2,10384 <__call_exitprocs+0x46> 103d8: dbd5 beqz a5,1038c <__call_exitprocs+0x4e> 103da: 893e mv s2,a5 103dc: b771 j 10368 <__call_exitprocs+0x2a> 103de: 18c92783 lw a5,396(s2) 103e2: 0844a583 lw a1,132(s1) 103e6: 8f7d and a4,a4,a5 103e8: e719 bnez a4,103f6 <__call_exitprocs+0xb8> 103ea: 855a mv a0,s6 103ec: 9682 jalr a3 103ee: bfe9 j 103c8 <__call_exitprocs+0x8a> 103f0: 00892223 sw s0,4(s2) 103f4: b7c1 j 103b4 <__call_exitprocs+0x76> 103f6: 852e mv a0,a1 103f8: 9682 jalr a3 103fa: b7f9 j 103c8 <__call_exitprocs+0x8a> 000103fc <__libc_fini_array>: 
103fc: 1141 addi sp,sp,-16 103fe: c422 sw s0,8(sp) 10400: 00001797 auipc a5,0x1 10404: 0e078793 addi a5,a5,224 # 114e0 <impure_data> 10408: 00001417 auipc s0,0x1 1040c: 0d440413 addi s0,s0,212 # 114dc <__do_global_dtors_aux_fini_array_entry> 10410: 8f81 sub a5,a5,s0 10412: c226 sw s1,4(sp) 10414: c606 sw ra,12(sp) 10416: 4027d493 srai s1,a5,0x2 1041a: c881 beqz s1,1042a <__libc_fini_array+0x2e> 1041c: 17f1 addi a5,a5,-4 1041e: 943e add s0,s0,a5 10420: 401c lw a5,0(s0) 10422: 14fd addi s1,s1,-1 10424: 1471 addi s0,s0,-4 10426: 9782 jalr a5 10428: fce5 bnez s1,10420 <__libc_fini_array+0x24> 1042a: 40b2 lw ra,12(sp) 1042c: 4422 lw s0,8(sp) 1042e: 4492 lw s1,4(sp) 10430: 0141 addi sp,sp,16 10432: 8082 ret 00010434 <atexit>: 10434: 85aa mv a1,a0 10436: 4681 li a3,0 10438: 4601 li a2,0 1043a: 4501 li a0,0 1043c: a009 j 1043e <__register_exitproc> 0001043e <__register_exitproc>: 1043e: c281a703 lw a4,-984(gp) # 11908 <_global_impure_ptr> 10442: 14872783 lw a5,328(a4) 10446: c3a1 beqz a5,10486 <__register_exitproc+0x48> 10448: 43d8 lw a4,4(a5) 1044a: 487d li a6,31 1044c: 04e84d63 blt a6,a4,104a6 <__register_exitproc+0x68> 10450: 00271813 slli a6,a4,0x2 10454: c11d beqz a0,1047a <__register_exitproc+0x3c> 10456: 01078333 add t1,a5,a6 1045a: 08c32423 sw a2,136(t1) # 101a4 <main+0x7e> 1045e: 1887a883 lw a7,392(a5) 10462: 4605 li a2,1 10464: 00e61633 sll a2,a2,a4 10468: 00c8e8b3 or a7,a7,a2 1046c: 1917a423 sw a7,392(a5) 10470: 10d32423 sw a3,264(t1) 10474: 4689 li a3,2 10476: 00d50d63 beq a0,a3,10490 <__register_exitproc+0x52> 1047a: 0705 addi a4,a4,1 1047c: c3d8 sw a4,4(a5) 1047e: 97c2 add a5,a5,a6 10480: c78c sw a1,8(a5) 10482: 4501 li a0,0 10484: 8082 ret 10486: 14c70793 addi a5,a4,332 1048a: 14f72423 sw a5,328(a4) 1048e: bf6d j 10448 <__register_exitproc+0xa> 10490: 18c7a683 lw a3,396(a5) 10494: 0705 addi a4,a4,1 10496: c3d8 sw a4,4(a5) 10498: 8e55 or a2,a2,a3 1049a: 18c7a623 sw a2,396(a5) 1049e: 97c2 add a5,a5,a6 104a0: c78c sw a1,8(a5) 104a2: 4501 li a0,0 104a4: 8082 ret 
104a6: 557d li a0,-1
104a8: 8082 ret

000104aa <_exit>:
104aa: a001 j 104aa <_exit>
```

The execution results and binary size are as follows.

![](https://i.imgur.com/9pf3bXH.png)

readelf returns the following information

![](https://i.imgur.com/JjMpWkS.png)

```
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x100a0
  Start of program headers:          52 (bytes into file)
  Start of section headers:          4348 (bytes into file)
  Flags:                             0x1, RVC, soft-float ABI
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         2
  Size of section headers:           40 (bytes)
  Number of section headers:         15
  Section header string table index: 14
```

In the generated assembly, we can see a few features that the hand-written assembly does not have. For example, it contains large chunks of predefined, template-based code that handle startup, finalizer registration, exit and so on (e.g. `_start`, `register_fini`, `exit`).

Jump instructions still exist in the code, but the symbolic jump labels are replaced by resolved instruction addresses. Here is an example of such a jump instruction.

Assembly (Hand Written)

```asm
lw t1, 0(s4) # value of list[i]
lw t2, 0(s6) # value of list[j]
blt t1, t2, loop2_
```

Assembly (Compiler)

```asm=
10178: 17fd addi a5,a5,-1
1017a: fef42623 sw a5,-20(s0)
1017e: a871 j 1021a <main+0xf4>
```

There is no way to compare true_counter and jump_counter, as emu-rv32i does not seem to work with binaries compiled by the latest version of the RISC-V GNU compiler.

![](https://i.imgur.com/GdvzvhE.png)

:::info
Can you modify the source of `emu-rv32i` to enable debugging? So that we can check the internals during emulating RISC-V instructions.
:notes: jserv
:::

### Update 2: Enabling rv32emu debug mode

As the comments in rv32emu are well written, we can easily spot the debug output flags here

![](https://i.imgur.com/7jPoTqi.png)

After un-commenting the above definitions and recompiling rv32emu with the ```make``` command, we can observe the following errors.

![](https://i.imgur.com/M08eo7m.png)

The rest of the output is the register table and the statistics of the instructions used in the binary file.

![](https://i.imgur.com/PzdKGFM.png)

![](https://i.imgur.com/CbPyL40.png)

After some tracing, I noticed the error was raised by this line of code in the rv32emu source code.

![](https://i.imgur.com/U7m1M2L.png)

Tracing back to the binary, it was triggered by the following instruction in the start section

![](https://i.imgur.com/PH7ybFj.png)

but the emulator clearly supports this instruction, as written in the code below.

![](https://i.imgur.com/9EsXq00.png)

So I guess this might be a bug in the emulator, or the implementation of the emulator is not compatible with the latest version of the RISC-V GNU compiler toolchain. I have opened a [GitHub issue](https://github.com/sysprog21/rv32emu/issues/11) for easier discussion.

#### --- END OF UPDATE 2 ---

However, just from the assembly, we can observe pipeline hazards around the bgez instruction. Here is an example of this instruction in the compiled binary file.

```asm
10180: fe042423 sw zero,-24(s0)
10184: a041 j 10204 <main+0xde>
10186: fe842783 lw a5,-24(s0)
...
1021e: f607d1e3 bgez a5,10180 <main+0x5a>
10222: 4781 li a5,0
```

As you can see, assuming a classic five-stage pipeline, the processor fetches the next sequential instruction, `li a5,0`, while the bgez instruction is still in its ID stage. However, if the bgez instruction branches to address 10180, that `li a5,0` must not take effect: execution continues inside the loop, where a5 is reloaded from the stack, for example from -24(s0), which holds the inner loop counter `j` rather than an element of the array being re-arranged.
That means a pipeline hazard occurs in this section of the code: the pipeline needs to flush the prefetched instruction and fetch the instruction at the branch target instead.

### Write a Faster / Smaller Assembly program

Yes, in theory a professional assembly developer can write faster and/or smaller code than what the GNU compiler generates with its default settings. However, it is also possible to optimize your program by using methods like

1. Faster / optimized algorithms
2. Manually linking only what is required during compilation
3. Using compression software (e.g. UPX) to post-process your binary

Here is an [interesting video](https://youtu.be/ExwqNreocpg?t=723) using all these techniques to fit a program written in assembly into a QR code.

### Possible reasons for failed-to-compile for non-volatile int

Strictly speaking, the non-volatile program does compile; its `main` is just reduced to `li a0,0` and `ret`. The most likely reason is that the sorted array is never read or printed afterwards, so at `-Os` the compiler can treat the sorting loops as dead code and remove them. The ```volatile``` qualifier forces the compiler to keep the int values in memory and to perform every access, because the values may change at any time. As quoted from a StackOverflow post

> The only effect of volatile is to warn the compiler that the value of x might be changed from another thread.

Possibly related StackOverflow post: https://stackoverflow.com/questions/57415934/volatile-int-vs-int-variable-effect

According to another article from this [website](https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword), the behavior comes from the nature of embedded C, where volatile tells the compiler not to optimize accesses to the variables declared with it.

>C's volatile keyword is a qualifier that is applied to a variable when it is declared. It tells the compiler that the value of the variable may change at any time--without any action being taken by the code the compiler finds nearby.
Since every int variable in the program is declared volatile, the compiler skips optimizing the accesses to those variables. The reads and writes therefore really happen during program execution, and the sorting loops remain in the compiled binary.
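The same point can be sketched without `volatile`: if the result of the sort is observable, the compiler is no longer free to drop the computation entirely, although an optimizer may still evaluate it at compile time because the input is constant. The helper name `sorted_digits` below is hypothetical, and this is an illustration under those assumptions rather than the original author's code.

```c
/* Hypothetical helper: sorts a local, non-volatile array with the same
 * bubble sort and folds the sorted values into one decimal number, so
 * the caller can observe the outcome of the sort. */
int sorted_digits(void)
{
    int nums[] = {0, 1, 1, 1, 2, 2, 2, 0, 1};
    int n = (int)(sizeof(nums) / sizeof(nums[0]));

    for (int i = n - 1; i >= 0; --i) {
        for (int j = 0; j < i; ++j) {
            if (nums[j] > nums[j + 1]) {
                int temp = nums[j];
                nums[j] = nums[j + 1];
                nums[j + 1] = temp;
            }
        }
    }

    /* Each element becomes one decimal digit; leading zeros vanish. */
    int digits = 0;
    for (int k = 0; k < n; ++k)
        digits = digits * 10 + nums[k];
    return digits;
}
```

Returning this value from `main`, or printing it, makes the result part of the program's observable behavior. Another option, not shown here, is to keep the sort in a function with external linkage that receives the array from its caller.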