# Assignment2: RISC-V Toolchain contributed by < [`Jack`](https://github.com/Chi-you) > ###### tags: `RISC-V`, `Computer Architure 2021` ## Environment setting ![](https://i.imgur.com/U6iTbBZ.png) ## Sort Colors ### Motivation I choose the problem from [張力尹](https://hackmd.io/ZwAmFgXdTzSwIt1u5osvhQ?view), which is because it is a medium problem and I want to try it by myself. ### Problem description Given an array nums with n objects colored red, white, or blue, sort them `in-place` so that objects of the same color are adjacent, with the colors in the order red, white, and blue. >In-place sort: A sort algorithm in which the sorted items occupy the same storage as the original ones. We will use the integers 0, 1, and 2 to represent the color red, white, and blue, respectively. You must solve this problem without using the library's sort function. ### Example ``` Input: nums = [2,0,2,1,1,0] Output: [0,0,1,1,2,2] ``` ## C code implementation The C code below is modified to fit in the `rv32emu` emulator. ``` void sortColors(int* nums, int numsSize){ int two_ptr = numsSize -1; int zero_ptr = 0; int one_ptr = 0; while (one_ptr <= two_ptr){ if (nums[one_ptr] == 0){ int save = nums[zero_ptr]; nums[zero_ptr] = nums[one_ptr]; nums[one_ptr] = save; one_ptr++; zero_ptr++; } else if (nums[one_ptr] == 1){ one_ptr++; } else{ int save = nums[two_ptr]; nums[two_ptr] = nums[one_ptr]; nums[one_ptr] = save; two_ptr--; } } } void _start(){ int A[] = {2,0,2,1,1,0}; int n = sizeof(A)/sizeof(A[0]); sortColors(A, n); volatile char* tx = (volatile char*) 0x40002000; *tx='['; for (int i = 0; i < n; i++){ *tx=(char)((A[i]+'0') & 0x000000FF); if(i < n-1) *tx=','; } *tx=']'; } ``` ## GNU tools Optimization ### -O3 Compiled from command ``` riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -O3 -nostdlib -o test test2.c ``` ### Execution Result ``` ./emu-rv32i test [0,0,1,1,2,2] >>> Execution time: 41600 ns >>> Instruction count: 142 (IPS=3413461) >>> Jumps: 13 (9.15%) - 6 forwards, 7 backwards >>> Branching T=11 (68.75%) F=5 (31.25%) ``` ### Mnemonics code from GNU Toolchain ``` riscv-none-embed-objdump -d test ``` ``` test: file format elf32-littleriscv Disassembly of section .text: 00010054 <sortColors>: 10054: fff58593 addi a1,a1,-1 10058: 0405c063 bltz a1,10098 <sortColors+0x44> 1005c: 00000693 li a3,0 10060: 00000813 li a6,0 10064: 00100313 li t1,1 10068: 00269793 slli a5,a3,0x2 1006c: 00f507b3 add a5,a0,a5 10070: 0007a603 lw a2,0(a5) 10074: 00281713 slli a4,a6,0x2 10078: 00e50733 add a4,a0,a4 1007c: 02061063 bnez a2,1009c <sortColors+0x48> 10080: 00072603 lw a2,0(a4) 10084: 00072023 sw zero,0(a4) 10088: 00168693 addi a3,a3,1 1008c: 00c7a023 sw a2,0(a5) 10090: 00180813 addi a6,a6,1 10094: fcd5dae3 bge a1,a3,10068 <sortColors+0x14> 10098: 00008067 ret 1009c: 00259713 slli a4,a1,0x2 100a0: 00e50733 add a4,a0,a4 100a4: 00660e63 beq a2,t1,100c0 <sortColors+0x6c> 100a8: 00072883 lw a7,0(a4) 100ac: 00c72023 sw a2,0(a4) 100b0: fff58593 addi a1,a1,-1 100b4: 0117a023 sw a7,0(a5) 100b8: fad5d8e3 bge a1,a3,10068 <sortColors+0x14> 100bc: fddff06f j 10098 <sortColors+0x44> 100c0: 00168693 addi a3,a3,1 100c4: fad5d2e3 bge a1,a3,10068 <sortColors+0x14> 100c8: fd1ff06f j 10098 <sortColors+0x44> 000100cc <_start>: 100cc: 000107b7 lui a5,0x10 100d0: 21478793 addi a5,a5,532 # 10214 <_start+0x148> 100d4: 0007a503 lw a0,0(a5) 100d8: 0047a583 lw a1,4(a5) 100dc: 00c7a683 lw a3,12(a5) 100e0: 0087a603 lw a2,8(a5) 100e4: 0107a703 lw a4,16(a5) 100e8: 0147a783 lw a5,20(a5) 100ec: fe010113 addi sp,sp,-32 100f0: 00a12423 sw a0,8(sp) 100f4: 00b12623 sw a1,12(sp) 100f8: 00d12a23 sw a3,20(sp) 100fc: 00c12823 sw a2,16(sp) 10100: 00e12c23 sw a4,24(sp) 10104: 00f12e23 sw a5,28(sp) 10108: 00500593 li a1,5 1010c: 00000513 li a0,0 10110: 00000693 li a3,0 10114: 00100893 li a7,1 10118: 00810613 addi a2,sp,8 1011c: 00269793 slli a5,a3,0x2 10120: 00251713 slli a4,a0,0x2 10124: 00f607b3 add a5,a2,a5 10128: 00e60733 add a4,a2,a4 1012c: 0007a603 lw a2,0(a5) 10130: 0a061863 bnez a2,101e0 <_start+0x114> 10134: 00072603 lw a2,0(a4) 10138: 00072023 sw zero,0(a4) 1013c: 00168693 addi a3,a3,1 10140: 00c7a023 sw a2,0(a5) 10144: 00150513 addi a0,a0,1 10148: fcd5d8e3 bge a1,a3,10118 <_start+0x4c> 1014c: 400027b7 lui a5,0x40002 10150: 05b00713 li a4,91 10154: 00e78023 sb a4,0(a5) # 40002000 <__global_pointer$+0x3fff05d4> 10158: 00812683 lw a3,8(sp) 1015c: 02c00713 li a4,44 10160: 03068693 addi a3,a3,48 10164: 0ff6f693 andi a3,a3,255 10168: 00d78023 sb a3,0(a5) 1016c: 00e78023 sb a4,0(a5) 10170: 00c12683 lw a3,12(sp) 10174: 03068693 addi a3,a3,48 10178: 0ff6f693 andi a3,a3,255 1017c: 00d78023 sb a3,0(a5) 10180: 00e78023 sb a4,0(a5) 10184: 01012683 lw a3,16(sp) 10188: 03068693 addi a3,a3,48 1018c: 0ff6f693 andi a3,a3,255 10190: 00d78023 sb a3,0(a5) 10194: 00e78023 sb a4,0(a5) 10198: 01412683 lw a3,20(sp) 1019c: 03068693 addi a3,a3,48 101a0: 0ff6f693 andi a3,a3,255 101a4: 00d78023 sb a3,0(a5) 101a8: 00e78023 sb a4,0(a5) 101ac: 01812683 lw a3,24(sp) 101b0: 03068693 addi a3,a3,48 101b4: 0ff6f693 andi a3,a3,255 101b8: 00d78023 sb a3,0(a5) 101bc: 00e78023 sb a4,0(a5) 101c0: 01c12703 lw a4,28(sp) 101c4: 03070713 addi a4,a4,48 101c8: 0ff77713 andi a4,a4,255 101cc: 00e78023 sb a4,0(a5) 101d0: 05d00713 li a4,93 101d4: 00e78023 sb a4,0(a5) 101d8: 02010113 addi sp,sp,32 101dc: 00008067 ret 101e0: 00259713 slli a4,a1,0x2 101e4: 00810813 addi a6,sp,8 101e8: 00e80733 add a4,a6,a4 101ec: 01160e63 beq a2,a7,10208 <_start+0x13c> 101f0: 00072803 lw a6,0(a4) 101f4: 00c72023 sw a2,0(a4) 101f8: fff58593 addi a1,a1,-1 101fc: 0107a023 sw a6,0(a5) 10200: f0d5dce3 bge a1,a3,10118 <_start+0x4c> 10204: f49ff06f j 1014c <_start+0x80> 10208: 00168693 addi a3,a3,1 1020c: f0d5d6e3 bge a1,a3,10118 <_start+0x4c> 10210: f3dff06f j 1014c <_start+0x80> ``` ### Size ``` text data bss dec hex filename 472 0 0 472 1d8 test ``` --- ### -O0 Compiled from command ``` riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -O0 -nostdlib -o test test2.c ``` ### Execution result ``` ./emu-rv32i test [0,0,1,1,2,2] >>> Execution time: 46200 ns >>> Instruction count: 398 (IPS=8614718) >>> Jumps: 28 (7.04%) - 14 forwards, 14 backwards >>> Branching T=19 (63.33%) F=11 (36.67%) ``` ### Mnemonics code from GNU Toolchain Generated from command ``` riscv-none-embed-objdump -d test ``` ``` test: file format elf32-littleriscv Disassembly of section .text: 00010054 <sortColors>: 10054: fc010113 addi sp,sp,-64 10058: 02812e23 sw s0,60(sp) 1005c: 04010413 addi s0,sp,64 10060: fca42623 sw a0,-52(s0) 10064: fcb42423 sw a1,-56(s0) 10068: fc842783 lw a5,-56(s0) 1006c: fff78793 addi a5,a5,-1 10070: fef42623 sw a5,-20(s0) 10074: fe042423 sw zero,-24(s0) 10078: fe042223 sw zero,-28(s0) 1007c: 1200006f j 1019c <sortColors+0x148> 10080: fe442783 lw a5,-28(s0) 10084: 00279793 slli a5,a5,0x2 10088: fcc42703 lw a4,-52(s0) 1008c: 00f707b3 add a5,a4,a5 10090: 0007a783 lw a5,0(a5) 10094: 06079c63 bnez a5,1010c <sortColors+0xb8> 10098: fe842783 lw a5,-24(s0) 1009c: 00279793 slli a5,a5,0x2 100a0: fcc42703 lw a4,-52(s0) 100a4: 00f707b3 add a5,a4,a5 100a8: 0007a783 lw a5,0(a5) 100ac: fef42023 sw a5,-32(s0) 100b0: fe442783 lw a5,-28(s0) 100b4: 00279793 slli a5,a5,0x2 100b8: fcc42703 lw a4,-52(s0) 100bc: 00f70733 add a4,a4,a5 100c0: fe842783 lw a5,-24(s0) 100c4: 00279793 slli a5,a5,0x2 100c8: fcc42683 lw a3,-52(s0) 100cc: 00f687b3 add a5,a3,a5 100d0: 00072703 lw a4,0(a4) 100d4: 00e7a023 sw a4,0(a5) 100d8: fe442783 lw a5,-28(s0) 100dc: 00279793 slli a5,a5,0x2 100e0: fcc42703 lw a4,-52(s0) 100e4: 00f707b3 add a5,a4,a5 100e8: fe042703 lw a4,-32(s0) 100ec: 00e7a023 sw a4,0(a5) 100f0: fe442783 lw a5,-28(s0) 100f4: 00178793 addi a5,a5,1 100f8: fef42223 sw a5,-28(s0) 100fc: fe842783 lw a5,-24(s0) 10100: 00178793 addi a5,a5,1 10104: fef42423 sw a5,-24(s0) 10108: 0940006f j 1019c <sortColors+0x148> 1010c: fe442783 lw a5,-28(s0) 10110: 00279793 slli a5,a5,0x2 10114: fcc42703 lw a4,-52(s0) 10118: 00f707b3 add a5,a4,a5 1011c: 0007a703 lw a4,0(a5) 10120: 00100793 li a5,1 10124: 00f71a63 bne a4,a5,10138 <sortColors+0xe4> 10128: fe442783 lw a5,-28(s0) 1012c: 00178793 addi a5,a5,1 10130: fef42223 sw a5,-28(s0) 10134: 0680006f j 1019c <sortColors+0x148> 10138: fec42783 lw a5,-20(s0) 1013c: 00279793 slli a5,a5,0x2 10140: fcc42703 lw a4,-52(s0) 10144: 00f707b3 add a5,a4,a5 10148: 0007a783 lw a5,0(a5) 1014c: fcf42e23 sw a5,-36(s0) 10150: fe442783 lw a5,-28(s0) 10154: 00279793 slli a5,a5,0x2 10158: fcc42703 lw a4,-52(s0) 1015c: 00f70733 add a4,a4,a5 10160: fec42783 lw a5,-20(s0) 10164: 00279793 slli a5,a5,0x2 10168: fcc42683 lw a3,-52(s0) 1016c: 00f687b3 add a5,a3,a5 10170: 00072703 lw a4,0(a4) 10174: 00e7a023 sw a4,0(a5) 10178: fe442783 lw a5,-28(s0) 1017c: 00279793 slli a5,a5,0x2 10180: fcc42703 lw a4,-52(s0) 10184: 00f707b3 add a5,a4,a5 10188: fdc42703 lw a4,-36(s0) 1018c: 00e7a023 sw a4,0(a5) 10190: fec42783 lw a5,-20(s0) 10194: fff78793 addi a5,a5,-1 10198: fef42623 sw a5,-20(s0) 1019c: fe442703 lw a4,-28(s0) 101a0: fec42783 lw a5,-20(s0) 101a4: ece7dee3 bge a5,a4,10080 <sortColors+0x2c> 101a8: 00000013 nop 101ac: 03c12403 lw s0,60(sp) 101b0: 04010113 addi sp,sp,64 101b4: 00008067 ret 000101b8 <_start>: 101b8: fc010113 addi sp,sp,-64 101bc: 02112e23 sw ra,60(sp) 101c0: 02812c23 sw s0,56(sp) 101c4: 04010413 addi s0,sp,64 101c8: 000107b7 lui a5,0x10 101cc: 2c07a503 lw a0,704(a5) # 102c0 <_start+0x108> 101d0: 2c078713 addi a4,a5,704 101d4: 00472583 lw a1,4(a4) 101d8: 2c078713 addi a4,a5,704 101dc: 00872603 lw a2,8(a4) 101e0: 2c078713 addi a4,a5,704 101e4: 00c72683 lw a3,12(a4) 101e8: 2c078713 addi a4,a5,704 101ec: 01072703 lw a4,16(a4) 101f0: 2c078793 addi a5,a5,704 101f4: 0147a783 lw a5,20(a5) 101f8: fca42623 sw a0,-52(s0) 101fc: fcb42823 sw a1,-48(s0) 10200: fcc42a23 sw a2,-44(s0) 10204: fcd42c23 sw a3,-40(s0) 10208: fce42e23 sw a4,-36(s0) 1020c: fef42023 sw a5,-32(s0) 10210: 00600793 li a5,6 10214: fef42423 sw a5,-24(s0) 10218: fcc40793 addi a5,s0,-52 1021c: fe842583 lw a1,-24(s0) 10220: 00078513 mv a0,a5 10224: e31ff0ef jal ra,10054 <sortColors> 10228: 400027b7 lui a5,0x40002 1022c: fef42223 sw a5,-28(s0) 10230: fe442783 lw a5,-28(s0) 10234: 05b00713 li a4,91 10238: 00e78023 sb a4,0(a5) # 40002000 <__global_pointer$+0x3fff0528> 1023c: fe042623 sw zero,-20(s0) 10240: 0540006f j 10294 <_start+0xdc> 10244: fec42783 lw a5,-20(s0) 10248: 00279793 slli a5,a5,0x2 1024c: ff040713 addi a4,s0,-16 10250: 00f707b3 add a5,a4,a5 10254: fdc7a783 lw a5,-36(a5) 10258: 0ff7f793 andi a5,a5,255 1025c: 03078793 addi a5,a5,48 10260: 0ff7f713 andi a4,a5,255 10264: fe442783 lw a5,-28(s0) 10268: 00e78023 sb a4,0(a5) 1026c: fe842783 lw a5,-24(s0) 10270: fff78793 addi a5,a5,-1 10274: fec42703 lw a4,-20(s0) 10278: 00f75863 bge a4,a5,10288 <_start+0xd0> 1027c: fe442783 lw a5,-28(s0) 10280: 02c00713 li a4,44 10284: 00e78023 sb a4,0(a5) 10288: fec42783 lw a5,-20(s0) 1028c: 00178793 addi a5,a5,1 10290: fef42623 sw a5,-20(s0) 10294: fec42703 lw a4,-20(s0) 10298: fe842783 lw a5,-24(s0) 1029c: faf744e3 blt a4,a5,10244 <_start+0x8c> 102a0: fe442783 lw a5,-28(s0) 102a4: 05d00713 li a4,93 102a8: 00e78023 sb a4,0(a5) 102ac: 00000013 nop 102b0: 03c12083 lw ra,60(sp) 102b4: 03812403 lw s0,56(sp) 102b8: 04010113 addi sp,sp,64 102bc: 00008067 ret ``` ### Size Generated from command ``` riscv-none-embed-size test ``` ``` text data bss dec hex filename 644 0 0 644 284 test ``` ## Conclusion Below is the comparison of each level of optimization. | | Without optimization | -O2 | -O3 | |:----------------- |:-------------------- |:----------- |:----------- | | Execution time | 138300 ns | 127900 ns | 121600 ns | | Instruction count | 398 | 151 | 142 | | Jumps | 28 | 19 | 13 | | Branching | T: 19; F: 11 | T: 15; F: 7 | T: 11; F: 5 | > The execution time might be change every time because it depends on the status of the CPU. Compare without optimization to -O3 optimization, the instuction count reduced 65% and the number of Jump almost reduced by 50%. Deeply look into the assembly code generated by `RISC-V Toolchain`, I found that the `-O3 optimization` is more clear and simpler than `-O0`, which exists many redundancy instructions. Also, there are `nop` instruction in `-O0`, which means that it exists `load use data hazard`; therefore, it increases the number of cycles. The reason that I do not choose the optimize option `-Os` is because it has some error `test2.c:(.text+0x88): undefined reference to memcpy`, but I cannot figure it out right now.