owned this note
owned this note
Published
Linked with GitHub
# Lab3: srv32 - RISCV RV32IM Soft CPU
###### tags: `Computer Architecture`
Link: [2020 Lab3](https://hackmd.io/@sysprog/rJw2A5DqS), [2020 HW3](https://hackmd.io/@sysprog/2020-arch-homework3), [2021 Lab3](https://hackmd.io/@sysprog/S1Udn1Xtt), [2021 HW3](https://hackmd.io/@sysprog/2021-arch-homework3)
## Environment Construction
### Download Latest `riscv-none-embed-gcc` Version
In this assignment, I need to use `riscv-none-embed-gcc` to cross compile, then I get the `Illegal Instruction Error`. So, I fix the bug by modify [Lab2: RISC-V RV32I[MA] emulator with ELF support](https://hackmd.io/@sysprog/rJAufgHYS) Command which is shown as following:
Download the source code and unzip:
```
$ cd /tmp
$ wget https://github.com/xpack-dev-tools/riscv-none-embed-gcc-xpack/releases/download/v10.2.0-1.2/xpack-riscv-none-embed-gcc-10.2.0-1.2-linux-x64.tar.gz
$ tar zxvf xpack-riscv-none-embed-gcc-10.2.0-1.2-linux-x64.tar.gz
$ cp -af xpack-riscv-none-embed-gcc-10.2.0-1.2 $HOME/riscv-none-gcc
```
> There are some bugs in `xpack-riscv-none-embed-gcc-10.2.0-1.1-linux-x64.tar.gz` library, so just download the latest version then the issue is solved
Add the enviornment variable:
```
sudo vim ~/.bashrc
export PATH:$HOME/riscv-none-gcc/bin:$PATH
export PATH:$HOME/riscv-none-gcc/riscv-none-embed/bin:$PATH
esc + :wq
```
## Requirement
* Source Code in [Lab1: RV321 Simulator](https://hackmd.io/tw25UTrwTHWk1ayuZLBEQg)
```C=
#include <stdio.h>
#include <stdlib.h>
#define min(a, b) (((a) < (b)) ? (a) : (b))
int minCostClimbingStairs(int* cost, int costSize){
if(costSize==0) return 0;
if(costSize==1) return *cost;
if(costSize==2) return min((*cost+1), *cost);
int prev2 = *cost;
int prev1 = *(cost+1);
for(int i=2;i<costSize;i++){
int temp = min(prev1, prev2) + *(cost+i);
prev2 = prev1;
prev1 = temp;
}
return min(prev1, prev2);
}
int main(){
int costSize=10;
int cost[10]={1, 100, 1, 1, 1, 100, 1, 1, 100, 1};
int ans=minCostClimbingStairs(cost, costSize);
printf("Min Cost = %d\n", ans);
return 0;
}
```
* Compile C:
```
~/srv32/sim$ make leetcode.run
```
Output:
```
Min Cost = 6
Excuting 1508 instructions, 1968 cycles, 1.305 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.044 s
Simulation cycles: 1979
Simulation speed : 0.0449773 MHz
```
:::spoiler Assembly Code
```Assembly=
0000003c <minCostClimbingStairs>:
3c: 00050693 mv a3,a0
40: 04058c63 beqz a1,98 <minCostClimbingStairs+0x5c>
44: fff58793 addi a5,a1,-1
48: 00100713 li a4,1
4c: 00052503 lw a0,0(a0)
50: 04f77663 bgeu a4,a5,9c <minCostClimbingStairs+0x60>
54: 00200713 li a4,2
58: 0046a783 lw a5,4(a3)
5c: 02b75863 bge a4,a1,8c <minCostClimbingStairs+0x50>
60: 00259593 slli a1,a1,0x2
64: 00868713 addi a4,a3,8
68: 00b686b3 add a3,a3,a1
6c: 00078613 mv a2,a5
70: 00f55463 bge a0,a5,78 <minCostClimbingStairs+0x3c>
74: 00050613 mv a2,a0
78: 00072583 lw a1,0(a4)
7c: 00470713 addi a4,a4,4
80: 00078513 mv a0,a5
84: 00b607b3 add a5,a2,a1
88: fee692e3 bne a3,a4,6c <minCostClimbingStairs+0x30>
8c: 00a7d863 bge a5,a0,9c <minCostClimbingStairs+0x60>
90: 00078513 mv a0,a5
94: 00008067 ret
98: 00000513 li a0,0
9c: 00008067 ret
000000a0 <main>:
a0: 00020537 lui a0,0x20
a4: ff010113 addi sp,sp,-16
a8: 00600593 li a1,6
ac: 03c50513 addi a0,a0,60 # 2003c <__malloc_trim_threshold+0x4>
b0: 00112623 sw ra,12(sp)
b4: 054000ef jal ra,108 <printf>
b8: 00c12083 lw ra,12(sp)
bc: 00000513 li a0,0
c0: 01010113 addi sp,sp,16
c4: 00008067 ret
```
:::
* Gernate Wave File:
```
~/srv32/sim$ ./sim +dump
```
Output:
```
Min Cost = 6
Excuting 1508 instructions, 1968 cycles, 1.305 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.024 s
Simulation cycles: 1979
Simulation speed : 0.0824583 MHz
```
### GTKWave Analyze
1. There is no Data Hazard in my code, but there are some control hazard
2. Branch Instruction with wrong prediction

* I'm sorry that the image is small, but [Here](https://i.imgur.com/Xlaesmc.png) can see the Full Image
* In `000034A4`, there is a branch instruction and get a wrong prediction. Thus, there are two cycle penalty
```Assembly=
000034a4 <__sinit>:
34a4: 03852783 lw a5,56(a0)
34a8: 00078463 beqz a5,34b0 <__sinit+0xc>
34ac: 00008067 ret
34b0: cb9ff06f j 3168 <__sinit.part.0>
```
## How can I improve the code?
In my C code, I found the redundant branch instruction. In this case, that is `cost[10]={1, 100, 1, 1, 1, 100, 1, 1, 100, 1};`. Thus, the Boundary Condition in the original code is unnecessary. So I think i need to rewrite the code to reduce the usage of branch instruction.
### Optimization 1, Delete the Boundary Condition
```C=
#define min(a, b) (((a) < (b)) ? (a) : (b))
int minCostClimbingStairs(int* cost, int costSize){
int prev1 = 0, prev2 = 0, curr;
for(int i = 2; i < costSize + 1; i++){
curr = min(prev2 + cost[i-2], prev1 + cost[i-1]);
prev2 = prev1;
prev1 = curr;
}
return curr;
}
int main(){
int costSize=10;
int cost[10]={1, 100, 1, 1, 1, 100, 1, 1, 100, 1};
int ans=minCostClimbingStairs(cost, costSize);
printf("Min Cost = %d\n", ans);
return 0;
}
```
* Compile C
Command:
```
~/srv32/sim$ make leetcode.run
```
Output:
```
Min Cost = 6
Excuting 1508 instructions, 1968 cycles, 1.305 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.024 s
Simulation cycles: 1979
Simulation speed : 0.0824583 MHz
```
* Faster than Original 0.02 seconds, approximate 46% improvement. However, I see the number of instructions and the simulation Cycles, I know that there is no optimization.
* Its just an implementation time decrease(may be my computer is strong at that time, I don't know), but the essential is not change, so I try the second optimization
* How to optimize the code?
* In this code and the original code, i use iterative method, I think I should try the recursion one
### Optimization 2, Recursion
```C=
#define min(a, b) (((a) < (b)) ? (a) : (b))
int minCostClimbingStairs(int* cost, int costSize){
for(int i=2; i<costSize; i++)
cost[i] += min(cost[i-1], cost[i-2]);
return min(cost[costSize-1], cost[costSize-2]);
}
int main(){
int costSize=10;
int cost[10]={1, 100, 1, 1, 1, 100, 1, 1, 100, 1};
int ans=minCostClimbingStairs(cost, costSize);
printf("Min Cost = %d\n", ans);
return 0;
}
```
* Compile C
Command:
```
~/srv32/sim$ make leetcode.run
```
Output:
```
Min Cost = 6
Excuting 1615 instructions, 2099 cycles, 1.299 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.022 s
Simulation cycles: 2110
Simulation speed : 0.0959091 MHz
```
* This time I get the different result. The number of instructions is less than the original, and the simulation cycles is more than original.
:::spoiler Assembly Code
```Assemble=
0000003c <minCostClimbingStairs>:
3c: 00200793 li a5,2
40: 08b7d663 bge a5,a1,cc <minCostClimbingStairs+0x90>
44: 00400793 li a5,4
48: 0cb7de63 bge a5,a1,124 <minCostClimbingStairs+0xe8>
4c: ffb58693 addi a3,a1,-5
50: 00052703 lw a4,0(a0)
54: 00452783 lw a5,4(a0)
58: ffe6f693 andi a3,a3,-2
5c: 00850613 addi a2,a0,8
60: 00468693 addi a3,a3,4
64: 00200813 li a6,2
68: 00280813 addi a6,a6,2
6c: 00078893 mv a7,a5
70: 08f74c63 blt a4,a5,108 <minCostClimbingStairs+0xcc>
74: 00062703 lw a4,0(a2)
78: 00e88733 add a4,a7,a4
7c: 00e62023 sw a4,0(a2)
80: 00070893 mv a7,a4
84: 06e7c463 blt a5,a4,ec <minCostClimbingStairs+0xb0>
88: 00462783 lw a5,4(a2)
8c: 00860613 addi a2,a2,8
90: 00f887b3 add a5,a7,a5
94: fef62e23 sw a5,-4(a2)
98: fcd818e3 bne a6,a3,68 <minCostClimbingStairs+0x2c>
9c: 00269793 slli a5,a3,0x2
a0: 00f507b3 add a5,a0,a5
a4: ffc7a603 lw a2,-4(a5)
a8: ff87a703 lw a4,-8(a5)
ac: 00168693 addi a3,a3,1
b0: 00c75463 bge a4,a2,b8 <minCostClimbingStairs+0x7c>
b4: 00070613 mv a2,a4
b8: 0007a703 lw a4,0(a5)
bc: 00478793 addi a5,a5,4
c0: 00c70733 add a4,a4,a2
c4: fee7ae23 sw a4,-4(a5)
c8: fcb6cee3 blt a3,a1,a4 <minCostClimbingStairs+0x68>
cc: 00259593 slli a1,a1,0x2
d0: ff858593 addi a1,a1,-8
d4: 00b505b3 add a1,a0,a1
d8: 0005a783 lw a5,0(a1)
dc: 0045a503 lw a0,4(a1)
e0: 00a7d463 bge a5,a0,e8 <minCostClimbingStairs+0xac>
e4: 00078513 mv a0,a5
e8: 00008067 ret
ec: 00078893 mv a7,a5
f0: 00462783 lw a5,4(a2)
f4: 00860613 addi a2,a2,8
f8: 00f887b3 add a5,a7,a5
fc: fef62e23 sw a5,-4(a2)
100: f6d814e3 bne a6,a3,68 <minCostClimbingStairs+0x2c>
104: f99ff06f j 9c <minCostClimbingStairs+0x60>
108: 00070893 mv a7,a4
10c: 00062703 lw a4,0(a2)
110: 00e88733 add a4,a7,a4
114: 00e62023 sw a4,0(a2)
118: 00070893 mv a7,a4
11c: f6e7d6e3 bge a5,a4,88 <minCostClimbingStairs+0x4c>
120: fcdff06f j ec <minCostClimbingStairs+0xb0>
124: 00200693 li a3,2
128: f75ff06f j 9c <minCostClimbingStairs+0x60>
0000012c <main>:
12c: 000207b7 lui a5,0x20
130: 04c78793 addi a5,a5,76 # 2004c <__malloc_trim_threshold+0x14>
134: 0007af03 lw t5,0(a5)
138: 0047ae83 lw t4,4(a5)
13c: 0087ae03 lw t3,8(a5)
140: 00c7a303 lw t1,12(a5)
144: 0107a883 lw a7,16(a5)
148: 0147a803 lw a6,20(a5)
14c: 0187a603 lw a2,24(a5)
150: 01c7a683 lw a3,28(a5)
154: 0207a703 lw a4,32(a5)
158: 0247a783 lw a5,36(a5)
15c: fc010113 addi sp,sp,-64
160: 00a00593 li a1,10
164: 00810513 addi a0,sp,8
168: 02112e23 sw ra,60(sp)
16c: 01e12423 sw t5,8(sp)
170: 01d12623 sw t4,12(sp)
174: 01c12823 sw t3,16(sp)
178: 00612a23 sw t1,20(sp)
17c: 01112c23 sw a7,24(sp)
180: 01012e23 sw a6,28(sp)
184: 02c12023 sw a2,32(sp)
188: 02d12223 sw a3,36(sp)
18c: 02e12423 sw a4,40(sp)
190: 02f12623 sw a5,44(sp)
194: ea9ff0ef jal ra,3c <minCostClimbingStairs>
198: 00050593 mv a1,a0
19c: 00020537 lui a0,0x20
1a0: 03c50513 addi a0,a0,60 # 2003c <__malloc_trim_threshold+0x4>
1a4: 054000ef jal ra,1f8 <printf>
1a8: 03c12083 lw ra,60(sp)
1ac: 00000513 li a0,0
1b0: 04010113 addi sp,sp,64
1b4: 00008067 ret
```
:::
### Optimization3, Iterative with an array
```C=
#include <stdio.h>
#include <stdlib.h>
#define min(a, b) (((a) < (b)) ? (a) : (b))
int minCostClimbingStairs(int* cost, int costSize){
int *sum = (int *)malloc(costSize * sizeof(int));
sum[0] = cost[0];
sum[1] = cost[1];
for (int i = 2; i < costSize; i++)
sum[i] = min(sum[i - 1], sum[i - 2]) + cost[i];
return min(sum[costSize - 1], sum[costSize - 2]);
}
int main(){
int costSize=10;
int cost[10]={1, 100, 1, 1, 1, 100, 1, 1, 100, 1};
int ans=minCostClimbingStairs(cost, costSize);
printf("Min Cost = %d\n", ans);
return 0;
}
```
* Compile C
Command:
```
~/srv32/sim$ make leetcode.run
```
Output:
```
Min Cost = 6
Excuting 1717 instructions, 2225 cycles, 1.295 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.028 s
Simulation cycles: 2236
Simulation speed : 0.0798571 MHz
```
## Summary(`#TODO`)
1. Compare the optimization code
| | Original | Iterative | Recursion | Iterative with array|
| -------- | -------- | -------- | -------- |-------- |
| Simulation Time |0.044s|0.024s|0.022s|0.028s |
| Simulation Cycle |1979|1979|2110|2236|
| Executing Instructions |1508|1508|1615|1717|
| Execution Cycles|1968|1968|2099|2225|
|CPI|1.305|1.305|1.299|1.295|
3. Summarize how RISC-V Compliance Tests works and why the signature should be matched.
4. Explain how srv32 works with Verilator.