# Assignment3: SoftCPU
## Environment
#### RISC-V toolchains.
```
$ sudo apt install autoconf automake autotools-dev curl gawk git \
build-essential bison flex texinfo gperf libtool patchutils bc git \
libmpc-dev libmpfr-dev libgmp-dev gawk zlib1g-dev libexpat1-dev
$ git clone --recursive https://github.com/riscv/riscv-gnu-toolchain
$ cd riscv-gnu-toolchain
$ mkdir -p build && cd build
$ ../configure --prefix=/opt/riscv --enable-multilib
$ make -j$(nproc)
# Install the dependent packages.
$ sudo apt install build-essential ccache
```
#### srv32
```
$ git clone https://github.com/sysprog21/srv32.git
$ cd ~/srv32/tools/
$ make
$ cd ~/srv32/sim/
$ make
# Check Lab2: riscv-none-embed-gcc
$ cd $HOME
$ source riscv-none-embed-gcc/setenv
$ riscv-none-embed-gcc -v
```
#### Run compliance tests on SW simulator
##### Result:
```
$ cd ~/srv32/tests/
$ make tests-sw
================================
OK: 48/48 RISCV_TARGET=srv32 RISCV_DEVICE=rv32i RISCV_ISA=rv32i
OK: 8/8 RISCV_TARGET=srv32 RISCV_DEVICE=rv32im RISCV_ISA=rv32im
OK: 6/6 RISCV_TARGET=srv32 RISCV_DEVICE=rv32Zicsr RISCV_ISA=rv32Zicsr
--------------------------------
$ make tests
================================
OK: 48/48 RISCV_TARGET=srv32 RISCV_DEVICE=rv32i RISCV_ISA=rv32i
OK: 8/8 RISCV_TARGET=srv32 RISCV_DEVICE=rv32im RISCV_ISA=rv32im
OK: 6/6 RISCV_TARGET=srv32 RISCV_DEVICE=rv32Zicsr RISCV_ISA=rv32Zicsr
```
## Assignment1 - MoveZeroes
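* Rewrite the source code from my earlier assignment, changing every `int` to **volatile int** so that the compiler does not optimize the memory accesses away.
* Create a `hw3` directory under `sw`, and put `hw3.c` and a `Makefile` in it.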
:::spoiler Code
```C=
#include <stdio.h>
#include <stdlib.h>

/* Move all non-zero elements to the front, keeping their order,
   then fill the rest of the array with zeroes. */
void moveZeroes(volatile int nums[], volatile int numsSize){
    volatile int index=0;   /* next position to write a non-zero value */
    for(volatile int i = 0; i < numsSize; ++i){
        if(nums[i])
            nums[index++] = nums[i];
    }
    /* Zero-fill the remaining positions. */
    while(index < numsSize)
        nums[index++] = 0;
}

int main()
{
    volatile int nums[] = {0,1,0,3,12};
    volatile int numsSize = 5;
    printf("\nBefore move zeroes = ");
    volatile int i;
    for(i = 0; i < numsSize; ++i){
        printf("%d ", nums[i]);
    }
    moveZeroes(nums, numsSize);
    printf("\nAfter move zeroes = ");
    for(i = 0; i < numsSize; ++i){
        printf("%d ", nums[i]);
    }
}
```
:::
```
$ cd $HOME/srv32
$ make hw3
make[2]: Leaving directory '/home/jeff/srv32/sw/hw3'
make[1]: Leaving directory '/home/jeff/srv32/sw'
make[1]: Entering directory '/home/jeff/srv32/sim'
Before move zeroes = 0 1 0 3 12
After move zeroes = 1 3 12 0 0
Excuting 9454 instructions, 12126 cycles, 1.282 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.094 s
Simulation cycles: 12137
Simulation speed : 0.129117 MHz
make[1]: Leaving directory '/home/jeff/srv32/sim'
make[1]: Entering directory '/home/jeff/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/hw3/hw3.elf
Before move zeroes = 0 1 0 3 12
After move zeroes = 1 3 12 0 0
Excuting 9454 instructions, 12126 cycles, 1.283 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.005 s
Simulation cycles: 12126
Simulation speed : 2.680 MHz
make[1]: Leaving directory '/home/jeff/srv32/tools'
```
## Waveform
:::spoiler MoveZeroes Assembly Code Generated By SRV32
```C=
0000003c <moveZeroes>:
3c: fe010113 addi sp,sp,-32
40: 00b12623 sw a1,12(sp)
44: 00012c23 sw zero,24(sp)
48: 00012e23 sw zero,28(sp)
4c: 01c12703 lw a4,28(sp)
50: 00c12783 lw a5,12(sp)
54: 06f75e63 bge a4,a5,d0 <moveZeroes+0x94>
58: 01c12783 lw a5,28(sp)
5c: 00279793 slli a5,a5,0x2
60: 00f507b3 add a5,a0,a5
64: 0007a783 lw a5,0(a5)
68: 02078663 beqz a5,94 <moveZeroes+0x58>
6c: 01c12703 lw a4,28(sp)
70: 01812783 lw a5,24(sp)
74: 00271713 slli a4,a4,0x2
78: 00178693 addi a3,a5,1
7c: 00d12c23 sw a3,24(sp)
80: 00e50733 add a4,a0,a4
84: 00072703 lw a4,0(a4)
88: 00279793 slli a5,a5,0x2
8c: 00f507b3 add a5,a0,a5
90: 00e7a023 sw a4,0(a5)
94: 01c12783 lw a5,28(sp)
98: 00178793 addi a5,a5,1
9c: 00f12e23 sw a5,28(sp)
a0: 01c12703 lw a4,28(sp)
a4: 00c12783 lw a5,12(sp)
a8: faf748e3 blt a4,a5,58 <moveZeroes+0x1c>
ac: 01812703 lw a4,24(sp)
b0: 00c12783 lw a5,12(sp)
b4: 02f75463 bge a4,a5,dc <moveZeroes+0xa0>
b8: 01812783 lw a5,24(sp)
bc: 00178713 addi a4,a5,1
c0: 00279793 slli a5,a5,0x2
c4: 00e12c23 sw a4,24(sp)
c8: 00f507b3 add a5,a0,a5
cc: 0007a023 sw zero,0(a5)
d0: 01812703 lw a4,24(sp)
d4: 00c12783 lw a5,12(sp)
d8: fef740e3 blt a4,a5,b8 <moveZeroes+0x7c>
dc: 02010113 addi sp,sp,32
e0: 00008067 ret
```
:::
```
$ sudo apt install gtkwave
$ gtkwave
# Load the file $HOME/srv32/sim/wave.fst.
```
* This core is a three-stage pipeline processor with the stages IF/ID, EXE, and WB.
* A branch is resolved in the EXE stage. When it is taken, the two instructions that have already been fetched into the pipeline must be flushed, which causes a delay.

In my MoveZeroes program, we focus on the branch instructions.
When a branch is taken, the instructions that have already been fetched into the pipeline must be flushed.
```
68: 02078663 beqz a5,94 <moveZeroes+0x58>
6c: 01c12703 lw a4,28(sp)
70: 01812783 lw a5,24(sp)
.
.
94: 01c12783 lw a5,28(sp)
.
```
The instruction **beqz a5,94** takes the branch, so the instructions at 6c and 70 are flushed and the instruction at 94 is fetched instead.

|Address| Instruction | Cycle1 | Cycle2 |Cycle3 | Cycle4 | Cycle5 |Cycle6 |
|-------| -------- | -------- | -------- |-------- |-------- |-------- |-------- |
|68 | beqz a5,94 | IF/ID | EXE |WB | | | |
|6c | lw a4,28(sp)| | ~~IF/ID~~ |**NOP** |**NOP** | | |
|70 | lw a5,24(sp)| | |~~IF/ID~~ |**NOP** | | |
|94 | lw a5,28(sp)| | | |IF/ID |EXE |WB |

A jump instruction behaves like a taken branch: the two instructions that follow it must also be flushed.

```
0000003c <moveZeroes>:
3c: fe010113 addi sp,sp,-32
.
.
.
184: eb9ff0ef jal ra,3c <moveZeroes>
188: 00020537 lui a0,0x20
18c: 05850513 addi a0,a0,88 # 20058 <__malloc_trim_threshold+0x20>
```
## Software Optimizations
Every taken branch or jump instruction causes a delay of two instructions.
To eliminate these stalls and shorten the cycle count, we can unroll the loop.
* First, the size of the array is known, so each element can be handled by straight-line code instead of a for loop.
* Second, we move the MoveZeroes logic into the main function, which removes the jump instructions for the call and return.
:::spoiler optimal_hw3 Code
```C=
#include <stdio.h>

int main()
{
    volatile int nums[] = {0,1,0,3,12};
    volatile int numsSize = 5;
    volatile int i;
    volatile int index=0;
    printf("\nBefore move zeroes = ");
    for(i = 0; i < numsSize; ++i){
        printf("%d ", nums[i]);
    }
    /* Unrolled compaction loop: one test per element, no loop branch. */
    if(nums[0]){nums[index++] = nums[0];}
    if(nums[1]){nums[index++] = nums[1];}
    if(nums[2]){nums[index++] = nums[2];}
    if(nums[3]){nums[index++] = nums[3];}
    if(nums[4]){nums[index++] = nums[4];}
    /* Zero-fill the remaining positions. */
    while(index < numsSize){
        nums[index++] = 0;
    }
    printf("\nAfter move zeroes = ");
    for(i = 0; i < numsSize; ++i){
        printf("%d ", nums[i]);
    }
    printf("\n");
}
```
:::
After optimization, the total instruction and cycle counts are as follows.
```
make[2]: Leaving directory '/home/jeff/srv32/sw/optimal_hw3'
make[1]: Leaving directory '/home/jeff/srv32/sw'
make[1]: Entering directory '/home/jeff/srv32/sim'
Before move zeroes = 0 1 0 3 12
After move zeroes = 1 3 12 0 0
Excuting 9109 instructions, 11673 cycles, 1.281 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.086 s
Simulation cycles: 11684
Simulation speed : 0.13586 MHz
make[1]: Leaving directory '/home/jeff/srv32/sim'
make[1]: Entering directory '/home/jeff/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/optimal_hw3/optimal_hw3.elf
Before move zeroes = 0 1 0 3 12
After move zeroes = 1 3 12 0 0
Excuting 9109 instructions, 11673 cycles, 1.281 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.004 s
Simulation cycles: 11673
Simulation speed : 2.904 MHz
make[1]: Leaving directory '/home/jeff/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```
* The cycle count dropped from 12126 to 11673, a saving of 453 cycles (about 3.7%), and the instruction count dropped from 9454 to 9109.