# Assignment2: RISC-V Toolchain
contributed by < [`Jack`](https://github.com/Chi-you) >
###### tags: `RISC-V`, `Computer Architure 2021`
## Environment setting

## Sort Colors
### Motivation
I choose the problem from [張力尹](https://hackmd.io/ZwAmFgXdTzSwIt1u5osvhQ?view), which is because it is a medium problem and I want to try it by myself.
### Problem description
Given an array nums with n objects colored red, white, or blue, sort them `in-place` so that objects of the same color are adjacent, with the colors in the order red, white, and blue.
>In-place sort: A sort algorithm in which the sorted items occupy the same storage as the original ones.
We will use the integers 0, 1, and 2 to represent the color red, white, and blue, respectively.
You must solve this problem without using the library's sort function.
### Example
```
Input: nums = [2,0,2,1,1,0]
Output: [0,0,1,1,2,2]
```
## C code implementation
The C code below is modified to fit in the `rv32emu` emulator.
```
void sortColors(int* nums, int numsSize){
int two_ptr = numsSize -1;
int zero_ptr = 0;
int one_ptr = 0;
while (one_ptr <= two_ptr){
if (nums[one_ptr] == 0){
int save = nums[zero_ptr];
nums[zero_ptr] = nums[one_ptr];
nums[one_ptr] = save;
one_ptr++;
zero_ptr++;
}
else if (nums[one_ptr] == 1){
one_ptr++;
}
else{
int save = nums[two_ptr];
nums[two_ptr] = nums[one_ptr];
nums[one_ptr] = save;
two_ptr--;
}
}
}
void _start(){
int A[] = {2,0,2,1,1,0};
int n = sizeof(A)/sizeof(A[0]);
sortColors(A, n);
volatile char* tx = (volatile char*) 0x40002000;
*tx='[';
for (int i = 0; i < n; i++){
*tx=(char)((A[i]+'0') & 0x000000FF);
if(i < n-1) *tx=',';
}
*tx=']';
}
```
## GNU tools Optimization
### -O3
Compiled from command
```
riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -O3 -nostdlib -o test test2.c
```
### Execution Result
```
./emu-rv32i test
[0,0,1,1,2,2]
>>> Execution time: 41600 ns
>>> Instruction count: 142 (IPS=3413461)
>>> Jumps: 13 (9.15%) - 6 forwards, 7 backwards
>>> Branching T=11 (68.75%) F=5 (31.25%)
```
### Mnemonics code from GNU Toolchain
```
riscv-none-embed-objdump -d test
```
```
test: file format elf32-littleriscv
Disassembly of section .text:
00010054 <sortColors>:
10054: fff58593 addi a1,a1,-1
10058: 0405c063 bltz a1,10098 <sortColors+0x44>
1005c: 00000693 li a3,0
10060: 00000813 li a6,0
10064: 00100313 li t1,1
10068: 00269793 slli a5,a3,0x2
1006c: 00f507b3 add a5,a0,a5
10070: 0007a603 lw a2,0(a5)
10074: 00281713 slli a4,a6,0x2
10078: 00e50733 add a4,a0,a4
1007c: 02061063 bnez a2,1009c <sortColors+0x48>
10080: 00072603 lw a2,0(a4)
10084: 00072023 sw zero,0(a4)
10088: 00168693 addi a3,a3,1
1008c: 00c7a023 sw a2,0(a5)
10090: 00180813 addi a6,a6,1
10094: fcd5dae3 bge a1,a3,10068 <sortColors+0x14>
10098: 00008067 ret
1009c: 00259713 slli a4,a1,0x2
100a0: 00e50733 add a4,a0,a4
100a4: 00660e63 beq a2,t1,100c0 <sortColors+0x6c>
100a8: 00072883 lw a7,0(a4)
100ac: 00c72023 sw a2,0(a4)
100b0: fff58593 addi a1,a1,-1
100b4: 0117a023 sw a7,0(a5)
100b8: fad5d8e3 bge a1,a3,10068 <sortColors+0x14>
100bc: fddff06f j 10098 <sortColors+0x44>
100c0: 00168693 addi a3,a3,1
100c4: fad5d2e3 bge a1,a3,10068 <sortColors+0x14>
100c8: fd1ff06f j 10098 <sortColors+0x44>
000100cc <_start>:
100cc: 000107b7 lui a5,0x10
100d0: 21478793 addi a5,a5,532 # 10214 <_start+0x148>
100d4: 0007a503 lw a0,0(a5)
100d8: 0047a583 lw a1,4(a5)
100dc: 00c7a683 lw a3,12(a5)
100e0: 0087a603 lw a2,8(a5)
100e4: 0107a703 lw a4,16(a5)
100e8: 0147a783 lw a5,20(a5)
100ec: fe010113 addi sp,sp,-32
100f0: 00a12423 sw a0,8(sp)
100f4: 00b12623 sw a1,12(sp)
100f8: 00d12a23 sw a3,20(sp)
100fc: 00c12823 sw a2,16(sp)
10100: 00e12c23 sw a4,24(sp)
10104: 00f12e23 sw a5,28(sp)
10108: 00500593 li a1,5
1010c: 00000513 li a0,0
10110: 00000693 li a3,0
10114: 00100893 li a7,1
10118: 00810613 addi a2,sp,8
1011c: 00269793 slli a5,a3,0x2
10120: 00251713 slli a4,a0,0x2
10124: 00f607b3 add a5,a2,a5
10128: 00e60733 add a4,a2,a4
1012c: 0007a603 lw a2,0(a5)
10130: 0a061863 bnez a2,101e0 <_start+0x114>
10134: 00072603 lw a2,0(a4)
10138: 00072023 sw zero,0(a4)
1013c: 00168693 addi a3,a3,1
10140: 00c7a023 sw a2,0(a5)
10144: 00150513 addi a0,a0,1
10148: fcd5d8e3 bge a1,a3,10118 <_start+0x4c>
1014c: 400027b7 lui a5,0x40002
10150: 05b00713 li a4,91
10154: 00e78023 sb a4,0(a5) # 40002000 <__global_pointer$+0x3fff05d4>
10158: 00812683 lw a3,8(sp)
1015c: 02c00713 li a4,44
10160: 03068693 addi a3,a3,48
10164: 0ff6f693 andi a3,a3,255
10168: 00d78023 sb a3,0(a5)
1016c: 00e78023 sb a4,0(a5)
10170: 00c12683 lw a3,12(sp)
10174: 03068693 addi a3,a3,48
10178: 0ff6f693 andi a3,a3,255
1017c: 00d78023 sb a3,0(a5)
10180: 00e78023 sb a4,0(a5)
10184: 01012683 lw a3,16(sp)
10188: 03068693 addi a3,a3,48
1018c: 0ff6f693 andi a3,a3,255
10190: 00d78023 sb a3,0(a5)
10194: 00e78023 sb a4,0(a5)
10198: 01412683 lw a3,20(sp)
1019c: 03068693 addi a3,a3,48
101a0: 0ff6f693 andi a3,a3,255
101a4: 00d78023 sb a3,0(a5)
101a8: 00e78023 sb a4,0(a5)
101ac: 01812683 lw a3,24(sp)
101b0: 03068693 addi a3,a3,48
101b4: 0ff6f693 andi a3,a3,255
101b8: 00d78023 sb a3,0(a5)
101bc: 00e78023 sb a4,0(a5)
101c0: 01c12703 lw a4,28(sp)
101c4: 03070713 addi a4,a4,48
101c8: 0ff77713 andi a4,a4,255
101cc: 00e78023 sb a4,0(a5)
101d0: 05d00713 li a4,93
101d4: 00e78023 sb a4,0(a5)
101d8: 02010113 addi sp,sp,32
101dc: 00008067 ret
101e0: 00259713 slli a4,a1,0x2
101e4: 00810813 addi a6,sp,8
101e8: 00e80733 add a4,a6,a4
101ec: 01160e63 beq a2,a7,10208 <_start+0x13c>
101f0: 00072803 lw a6,0(a4)
101f4: 00c72023 sw a2,0(a4)
101f8: fff58593 addi a1,a1,-1
101fc: 0107a023 sw a6,0(a5)
10200: f0d5dce3 bge a1,a3,10118 <_start+0x4c>
10204: f49ff06f j 1014c <_start+0x80>
10208: 00168693 addi a3,a3,1
1020c: f0d5d6e3 bge a1,a3,10118 <_start+0x4c>
10210: f3dff06f j 1014c <_start+0x80>
```
### Size
```
text data bss dec hex filename
472 0 0 472 1d8 test
```
---
### -O0
Compiled from command
```
riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -O0 -nostdlib -o test test2.c
```
### Execution result
```
./emu-rv32i test
[0,0,1,1,2,2]
>>> Execution time: 46200 ns
>>> Instruction count: 398 (IPS=8614718)
>>> Jumps: 28 (7.04%) - 14 forwards, 14 backwards
>>> Branching T=19 (63.33%) F=11 (36.67%)
```
### Mnemonics code from GNU Toolchain
Generated from command
```
riscv-none-embed-objdump -d test
```
```
test: file format elf32-littleriscv
Disassembly of section .text:
00010054 <sortColors>:
10054: fc010113 addi sp,sp,-64
10058: 02812e23 sw s0,60(sp)
1005c: 04010413 addi s0,sp,64
10060: fca42623 sw a0,-52(s0)
10064: fcb42423 sw a1,-56(s0)
10068: fc842783 lw a5,-56(s0)
1006c: fff78793 addi a5,a5,-1
10070: fef42623 sw a5,-20(s0)
10074: fe042423 sw zero,-24(s0)
10078: fe042223 sw zero,-28(s0)
1007c: 1200006f j 1019c <sortColors+0x148>
10080: fe442783 lw a5,-28(s0)
10084: 00279793 slli a5,a5,0x2
10088: fcc42703 lw a4,-52(s0)
1008c: 00f707b3 add a5,a4,a5
10090: 0007a783 lw a5,0(a5)
10094: 06079c63 bnez a5,1010c <sortColors+0xb8>
10098: fe842783 lw a5,-24(s0)
1009c: 00279793 slli a5,a5,0x2
100a0: fcc42703 lw a4,-52(s0)
100a4: 00f707b3 add a5,a4,a5
100a8: 0007a783 lw a5,0(a5)
100ac: fef42023 sw a5,-32(s0)
100b0: fe442783 lw a5,-28(s0)
100b4: 00279793 slli a5,a5,0x2
100b8: fcc42703 lw a4,-52(s0)
100bc: 00f70733 add a4,a4,a5
100c0: fe842783 lw a5,-24(s0)
100c4: 00279793 slli a5,a5,0x2
100c8: fcc42683 lw a3,-52(s0)
100cc: 00f687b3 add a5,a3,a5
100d0: 00072703 lw a4,0(a4)
100d4: 00e7a023 sw a4,0(a5)
100d8: fe442783 lw a5,-28(s0)
100dc: 00279793 slli a5,a5,0x2
100e0: fcc42703 lw a4,-52(s0)
100e4: 00f707b3 add a5,a4,a5
100e8: fe042703 lw a4,-32(s0)
100ec: 00e7a023 sw a4,0(a5)
100f0: fe442783 lw a5,-28(s0)
100f4: 00178793 addi a5,a5,1
100f8: fef42223 sw a5,-28(s0)
100fc: fe842783 lw a5,-24(s0)
10100: 00178793 addi a5,a5,1
10104: fef42423 sw a5,-24(s0)
10108: 0940006f j 1019c <sortColors+0x148>
1010c: fe442783 lw a5,-28(s0)
10110: 00279793 slli a5,a5,0x2
10114: fcc42703 lw a4,-52(s0)
10118: 00f707b3 add a5,a4,a5
1011c: 0007a703 lw a4,0(a5)
10120: 00100793 li a5,1
10124: 00f71a63 bne a4,a5,10138 <sortColors+0xe4>
10128: fe442783 lw a5,-28(s0)
1012c: 00178793 addi a5,a5,1
10130: fef42223 sw a5,-28(s0)
10134: 0680006f j 1019c <sortColors+0x148>
10138: fec42783 lw a5,-20(s0)
1013c: 00279793 slli a5,a5,0x2
10140: fcc42703 lw a4,-52(s0)
10144: 00f707b3 add a5,a4,a5
10148: 0007a783 lw a5,0(a5)
1014c: fcf42e23 sw a5,-36(s0)
10150: fe442783 lw a5,-28(s0)
10154: 00279793 slli a5,a5,0x2
10158: fcc42703 lw a4,-52(s0)
1015c: 00f70733 add a4,a4,a5
10160: fec42783 lw a5,-20(s0)
10164: 00279793 slli a5,a5,0x2
10168: fcc42683 lw a3,-52(s0)
1016c: 00f687b3 add a5,a3,a5
10170: 00072703 lw a4,0(a4)
10174: 00e7a023 sw a4,0(a5)
10178: fe442783 lw a5,-28(s0)
1017c: 00279793 slli a5,a5,0x2
10180: fcc42703 lw a4,-52(s0)
10184: 00f707b3 add a5,a4,a5
10188: fdc42703 lw a4,-36(s0)
1018c: 00e7a023 sw a4,0(a5)
10190: fec42783 lw a5,-20(s0)
10194: fff78793 addi a5,a5,-1
10198: fef42623 sw a5,-20(s0)
1019c: fe442703 lw a4,-28(s0)
101a0: fec42783 lw a5,-20(s0)
101a4: ece7dee3 bge a5,a4,10080 <sortColors+0x2c>
101a8: 00000013 nop
101ac: 03c12403 lw s0,60(sp)
101b0: 04010113 addi sp,sp,64
101b4: 00008067 ret
000101b8 <_start>:
101b8: fc010113 addi sp,sp,-64
101bc: 02112e23 sw ra,60(sp)
101c0: 02812c23 sw s0,56(sp)
101c4: 04010413 addi s0,sp,64
101c8: 000107b7 lui a5,0x10
101cc: 2c07a503 lw a0,704(a5) # 102c0 <_start+0x108>
101d0: 2c078713 addi a4,a5,704
101d4: 00472583 lw a1,4(a4)
101d8: 2c078713 addi a4,a5,704
101dc: 00872603 lw a2,8(a4)
101e0: 2c078713 addi a4,a5,704
101e4: 00c72683 lw a3,12(a4)
101e8: 2c078713 addi a4,a5,704
101ec: 01072703 lw a4,16(a4)
101f0: 2c078793 addi a5,a5,704
101f4: 0147a783 lw a5,20(a5)
101f8: fca42623 sw a0,-52(s0)
101fc: fcb42823 sw a1,-48(s0)
10200: fcc42a23 sw a2,-44(s0)
10204: fcd42c23 sw a3,-40(s0)
10208: fce42e23 sw a4,-36(s0)
1020c: fef42023 sw a5,-32(s0)
10210: 00600793 li a5,6
10214: fef42423 sw a5,-24(s0)
10218: fcc40793 addi a5,s0,-52
1021c: fe842583 lw a1,-24(s0)
10220: 00078513 mv a0,a5
10224: e31ff0ef jal ra,10054 <sortColors>
10228: 400027b7 lui a5,0x40002
1022c: fef42223 sw a5,-28(s0)
10230: fe442783 lw a5,-28(s0)
10234: 05b00713 li a4,91
10238: 00e78023 sb a4,0(a5) # 40002000 <__global_pointer$+0x3fff0528>
1023c: fe042623 sw zero,-20(s0)
10240: 0540006f j 10294 <_start+0xdc>
10244: fec42783 lw a5,-20(s0)
10248: 00279793 slli a5,a5,0x2
1024c: ff040713 addi a4,s0,-16
10250: 00f707b3 add a5,a4,a5
10254: fdc7a783 lw a5,-36(a5)
10258: 0ff7f793 andi a5,a5,255
1025c: 03078793 addi a5,a5,48
10260: 0ff7f713 andi a4,a5,255
10264: fe442783 lw a5,-28(s0)
10268: 00e78023 sb a4,0(a5)
1026c: fe842783 lw a5,-24(s0)
10270: fff78793 addi a5,a5,-1
10274: fec42703 lw a4,-20(s0)
10278: 00f75863 bge a4,a5,10288 <_start+0xd0>
1027c: fe442783 lw a5,-28(s0)
10280: 02c00713 li a4,44
10284: 00e78023 sb a4,0(a5)
10288: fec42783 lw a5,-20(s0)
1028c: 00178793 addi a5,a5,1
10290: fef42623 sw a5,-20(s0)
10294: fec42703 lw a4,-20(s0)
10298: fe842783 lw a5,-24(s0)
1029c: faf744e3 blt a4,a5,10244 <_start+0x8c>
102a0: fe442783 lw a5,-28(s0)
102a4: 05d00713 li a4,93
102a8: 00e78023 sb a4,0(a5)
102ac: 00000013 nop
102b0: 03c12083 lw ra,60(sp)
102b4: 03812403 lw s0,56(sp)
102b8: 04010113 addi sp,sp,64
102bc: 00008067 ret
```
### Size
Generated from command
```
riscv-none-embed-size test
```
```
text data bss dec hex filename
644 0 0 644 284 test
```
## Conclusion
Below is the comparison of each level of optimization.
| | Without optimization | -O2 | -O3 |
|:----------------- |:-------------------- |:----------- |:----------- |
| Execution time | 138300 ns | 127900 ns | 121600 ns |
| Instruction count | 398 | 151 | 142 |
| Jumps | 28 | 19 | 13 |
| Branching | T: 19; F: 11 | T: 15; F: 7 | T: 11; F: 5 |
> The execution time might be change every time because it depends on the status of the CPU.
Compare without optimization to -O3 optimization, the instuction count reduced 65% and the number of Jump almost reduced by 50%.
Deeply look into the assembly code generated by `RISC-V Toolchain`, I found that the `-O3 optimization` is more clear and simpler than `-O0`, which exists many redundancy instructions. Also, there are `nop` instruction in `-O0`, which means that it exists `load use data hazard`; therefore, it increases the number of cycles.
The reason that I do not choose the optimize option `-Os` is because it has some error `test2.c:(.text+0x88): undefined reference to memcpy`, but I cannot figure it out right now.