# Lab3: SoftCPU
contributed by < [geniuseric](https://github.com/geniuseric) >
## Requirement
[Assignment3: SoftCPU](https://hackmd.io/@sysprog/2021-arch-homework3)
## Setup Environment
1. Prepare GNU Toolchain for RISC-V. See [The xPack GNU RISC-V Embedded GCC](https://xpack.github.io/riscv-none-embed-gcc/).
```shell
$ cd /tmp
$ wget https://github.com/xpack-dev-tools/riscv-none-embed-gcc-xpack/releases/download/v10.2.0-1.2/xpack-riscv-none-embed-gcc-10.2.0-1.2-linux-x64.tar.gz
$ tar zxvf xpack-riscv-none-embed-gcc-10.2.0-1.2-linux-x64.tar.gz
$ cp -af xpack-riscv-none-embed-gcc-10.2.0-1.2 $HOME/riscv-none-embed-gcc
```
2. Configure `$PATH`.
```shell
$ cd $HOME/riscv-none-embed-gcc
$ echo "export PATH=`pwd`/bin:$PATH" > setenv
```
Once steps (1) and (2) are complete, you can update the `$PATH` environment variable in any new shell via:
```shell
$ cd $HOME
$ source riscv-none-embed-gcc/setenv
```
The first time, check that the toolchain is found on `$PATH`:
```shell
$ riscv-none-embed-gcc -v
```
3. Fetch [SRV32](https://github.com/sysprog21/srv32).
```shell
$ cd $HOME
$ git clone https://github.com/sysprog21/srv32
```
4. Build the tools and simulator, then run the compliance tests (see the [SRV32 repository](https://github.com/sysprog21/srv32)).
```shell
$ cd $HOME/srv32/tools/
$ make
$ cd $HOME/srv32/sim/
$ make
$ cd $HOME/srv32/tests/
$ make tests # Run test v1 for RTL
$ make tests-sw # Run test v1 for ISS simulator
```
## Simple 3-stage pipeline RISC-V processor
- A detailed description is available [here](https://hackmd.io/@sysprog/S1Udn1Xtt).
- [SRV32](https://github.com/sysprog21/srv32) is a three-stage pipeline processor with IF/ID, EX, and WB stages.
- It passes the RV32IM compliance tests.
- It supports **full forwarding**, which means data hazards can be resolved without stalling the processor.
- The **branch penalty** is 2 cycles.
- The pipeline architecture is shown below.

## Search Insert Position
- The original C code is written in [searchInsert.c](https://github.com/geniuseric/Computer_Architecture/blob/master/HW1/searchInsert.c).
- To run on SRV32, I changed `int` to `volatile int`.
- I also changed `mid = (down + up) / 2` to `mid = (down + up) >> 1` as an optimization.
- I placed the [searchInsert](https://github.com/geniuseric/Computer_Architecture/tree/master/HW3/searchInsert) folder, which contains the modified C code and its Makefile, into the `$HOME/srv32/sw` folder.
- Run the program on both the RTL and ISS simulators.
```shell
$ cd $HOME/srv32
$ make searchInsert
```
- The result is shown below.
```shell
make[1]: Entering directory '/home/airobots/srv32/sw'
make -C common
make[2]: Entering directory '/home/airobots/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/airobots/srv32/sw/common'
make[2]: Entering directory '/home/airobots/srv32/sw/searchInsert'
riscv-none-embed-gcc -O3 -Wall -march=rv32im -mabi=ilp32 -nostartfiles -nostdlib -L../common -o searchInsert.elf searchInsert.c -lc -lm -lgcc -lsys -T ../common/default.ld
riscv-none-embed-objcopy -j .text -O binary searchInsert.elf imem.bin
riscv-none-embed-objcopy -j .data -O binary searchInsert.elf dmem.bin
riscv-none-embed-objcopy -O binary searchInsert.elf memory.bin
riscv-none-embed-objdump -d searchInsert.elf > searchInsert.dis
riscv-none-embed-readelf -a searchInsert.elf > searchInsert.symbol
make[2]: Leaving directory '/home/airobots/srv32/sw/searchInsert'
make[1]: Leaving directory '/home/airobots/srv32/sw'
make[1]: Entering directory '/home/airobots/srv32/sim'
The insert position is 4
Excuting 1705 instructions, 2217 cycles, 1.300 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.018 s
Simulation cycles: 2228
Simulation speed : 0.123778 MHz
make[1]: Leaving directory '/home/airobots/srv32/sim'
make[1]: Entering directory '/home/airobots/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/searchInsert/searchInsert.elf
The insert position is 4
Excuting 1705 instructions, 2217 cycles, 1.300 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.001 s
Simulation cycles: 2217
Simulation speed : 4.024 MHz
make[1]: Leaving directory '/home/airobots/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```
## View Waveform
- Install [GTKWave](http://gtkwave.sourceforge.net/) and open it.
```shell
$ sudo apt install gtkwave
$ gtkwave
```
- Load the file ```$HOME/srv32/sim/wave.fst```.
- Show the signals/events of the branch instruction ```beq a4,a5,ec``` at instruction address ```0x90```.

- Explanation of the instruction:
1. The instruction preceding `0x90` is `lw a5,8(sp)` at `0x8c`. The value of register `a5` is forwarded from `0x8c` in the WB stage to `0x90` in the EX stage.
2. When `0x90` is in the EX stage, the values of registers `a4` and `a5` are equal. Therefore, the branch is taken and the next instruction after `0x90` is the one at `0xec`.
3. The control signals `ex_flush` and `wb_flush` show when the instructions at `0x94` and `0x98` are flushed. These two flushed instructions match the **branch penalty** of 2.
## Propose Software Optimization
- For further optimization, I replaced the **while** loop with **goto** statements.
- The optimized code is written in [opt_searchInsert.c](https://github.com/geniuseric/Computer_Architecture/blob/master/HW3/opt_searchInsert/opt_searchInsert.c).
- The optimized result is shown below. Compared with the original version, it saves `2217 - 2214 = 3` cycles (and `1705 - 1702 = 3` instructions).
```shell
make[1]: Entering directory '/home/airobots/srv32/sw'
make -C common
make[2]: Entering directory '/home/airobots/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/airobots/srv32/sw/common'
make[2]: Entering directory '/home/airobots/srv32/sw/opt_searchInsert'
riscv-none-embed-gcc -O3 -Wall -march=rv32im -mabi=ilp32 -nostartfiles -nostdlib -L../common -o opt_searchInsert.elf opt_searchInsert.c -lc -lm -lgcc -lsys -T ../common/default.ld
riscv-none-embed-objcopy -j .text -O binary opt_searchInsert.elf imem.bin
riscv-none-embed-objcopy -j .data -O binary opt_searchInsert.elf dmem.bin
riscv-none-embed-objcopy -O binary opt_searchInsert.elf memory.bin
riscv-none-embed-objdump -d opt_searchInsert.elf > opt_searchInsert.dis
riscv-none-embed-readelf -a opt_searchInsert.elf > opt_searchInsert.symbol
make[2]: Leaving directory '/home/airobots/srv32/sw/opt_searchInsert'
make[1]: Leaving directory '/home/airobots/srv32/sw'
make[1]: Entering directory '/home/airobots/srv32/sim'
The insert position is 4
Excuting 1702 instructions, 2214 cycles, 1.300 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.019 s
Simulation cycles: 2225
Simulation speed : 0.117105 MHz
make[1]: Leaving directory '/home/airobots/srv32/sim'
make[1]: Entering directory '/home/airobots/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/opt_searchInsert/opt_searchInsert.elf
The insert position is 4
Excuting 1702 instructions, 2214 cycles, 1.301 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.001 s
Simulation cycles: 2214
Simulation speed : 3.919 MHz
make[1]: Leaving directory '/home/airobots/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```