# Assignment3: SoftCPU
###### tags: `CA`
contributed by < [reputation0809](https://github.com/reputation0809) >
## Requirement
[Assignment3: SoftCPU](https://hackmd.io/@sysprog/2021-arch-homework3)
## Set Up the Environment
1. Download the RISC-V toolchain for srv32, [The xPack GNU RISC-V Embedded GCC](https://xpack.github.io/riscv-none-embed-gcc/) (`riscv-none-embed-gcc`).
```shell=
cd $HOME
wget https://github.com/xpack-dev-tools/riscv-none-embed-gcc-xpack/releases/download/v10.2.0-1.2/xpack-riscv-none-embed-gcc-10.2.0-1.2-linux-x64.tar.gz
tar zxvf xpack-riscv-none-embed-gcc-10.2.0-1.2-linux-x64.tar.gz
mv xpack-riscv-none-embed-gcc-10.2.0-1.2 riscv-none-embed-gcc
cd riscv-none-embed-gcc
```
2. Configure `$PATH`.
```shell=
echo "export PATH=$(pwd)/bin:\$PATH" > setenv
cd $HOME
source riscv-none-embed-gcc/setenv
```
3. Install the required packages.
```shell=
sudo apt-get install lcov
sudo apt-get install ccache
```
4. Clone srv32.
```shell=
git clone https://github.com/sysprog21/srv32
```
5. Enter srv32, then build the tools and the simulator.
```shell=
cd srv32/tools
make
cd ../sim
make
```
6. Check the setup by running `make all` from the srv32 root.
```shell=
cd ..
make all
```

## Requirement 1
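* The original C code can be found [here](https://github.com/reputation0809/ca_hw3/blob/main/origin_main.c).
* For srv32, I changed `int` to `volatile int` so that the compiler cannot optimize away reads and writes of these variables: every access to a `volatile` object must actually be performed.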
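As a hypothetical illustration of the `volatile` change (the array `samples` and the function `sum_samples` are made up, not taken from `origin_main.c`): when a result depends only on constant data, an optimizing compiler may compute it at compile time, so the loop would never execute on srv32. Declaring the data `volatile` forces every element to be loaded at run time.

```c
/* Hypothetical data; volatile forces every element to be loaded at run time,
 * so the compiler cannot fold the sum into a constant. */
volatile int samples[3] = {4, 8, 15};

int sum_samples(void)
{
    int sum = 0;
    for (int i = 0; i < 3; i++)
        sum += samples[i]; /* each access is a real load instruction */
    return sum;
}
```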
### Run simulation on srv32
1. Create a new folder `main` in `sw`.
2. Create a C file called `main.c` in `main`.
3. Copy the `Makefile` from `hello` to `main`.
4. Modify the [Makefile](https://github.com/reputation0809/ca_hw3/tree/main) in `main`.
5. Go back to the srv32 root and run `make main`.

## Requirement 2
### SRV32 - Simple 3-stage pipeline RISC-V processor Architecture
* srv32 is a simple three-stage pipelined RISC-V processor that supports FreeRTOS.
* The three pipeline stages are Fetch & Decode (F/D), Execute (E), and Write Back (WB).
* **Register Forwarding**: A data hazard occurs when an instruction needs a register value that an earlier instruction has computed but not yet written back. It can be solved with a simple hardware technique called forwarding: when the next instruction accesses the same register, the execution result is forwarded to it directly instead of being read from the register file.
* **Branch Penalty**: When a branch is taken in the execute stage, the instructions already fetched into the pipeline must be flushed. This causes a delay of two instructions, so a taken branch costs two extra cycles.
* **Memory Interface**: There is one instruction memory and one data memory. The instruction memory is read-only with a single read port, while the data memory has two ports, one for reading and one for writing.
* **SRV32 pipeline architecture:**

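The forwarding and branch-penalty rules above can be sketched as a toy C model. This is a simplified illustration, not srv32's RTL: `struct pending_write`, `read_operand`, and `estimate_cycles` are hypothetical names, and the cycle estimate ignores every stall except taken branches.

```c
#include <stdint.h>

/* Hypothetical: an older instruction whose result is computed but not yet
 * written to the register file. */
struct pending_write {
    int      valid;  /* 1 if that instruction writes a register */
    int      rd;     /* its destination register index */
    uint32_t result; /* the value it will write back */
};

/* Forwarding mux: if the source register matches the pending destination,
 * use the in-flight result instead of the stale register file value.
 * x0 is hard-wired to zero and is never forwarded. */
uint32_t read_operand(const uint32_t regs[32], int rs,
                      const struct pending_write *pw)
{
    if (rs == 0)
        return 0;
    if (pw->valid && pw->rd == rs)
        return pw->result;
    return regs[rs];
}

/* Simplified cycle estimate: one cycle per instruction plus two extra cycles
 * for every taken branch (all other stalls are ignored). */
uint64_t estimate_cycles(uint64_t instructions, uint64_t taken_branches)
{
    return instructions + 2 * taken_branches;
}
```

For example, `estimate_cycles(100, 10)` returns 120: ten taken branches add twenty cycles on top of one cycle per instruction.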
### Use GTKWave
- Install [GTKWave](http://gtkwave.sourceforge.net/).
```shell
sudo apt install gtkwave
gtkwave
```
* Launch GTKWave and load `sim/wave.fst` generated by srv32.

* Append the signals of interest and display them.

* Take an `lw` instruction as an example.
* Its entry in `sim/trace.log` is:
```shell=
418 00002c74 01042683 read 0x0002123c => 0x00000000, x13 (a3) <= 0x00000000
```
* The `pc` is `00002c74`, the `inst` is `01042683`, the read address is `0x0002123c`, and the read data is `0x00000000`, which is written to `x13` (`a3`).
* First of all, the `pc` is passed along the pipeline from the `if` stage to the `wb` stage.
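The fields of the trace entry above can be extracted with `sscanf`. The format string is an assumption inferred from this single `lw` line, not from a documented srv32 trace format, and `struct load_trace` and `parse_load_trace` are hypothetical names. Note that `%x` also accepts an optional `0x` prefix.

```c
#include <stdio.h>

/* Fields of one load entry in sim/trace.log (layout assumed from the example). */
struct load_trace {
    unsigned long counter; /* leading counter column */
    unsigned int  pc;
    unsigned int  inst;
    unsigned int  addr;    /* read address */
    unsigned int  data;    /* read data */
    int           rd;      /* destination register index */
};

/* Returns 1 if the line matches the assumed load format, 0 otherwise. */
int parse_load_trace(const char *line, struct load_trace *t)
{
    return sscanf(line, "%lu %x %x read %x => %x, x%d",
                  &t->counter, &t->pc, &t->inst,
                  &t->addr, &t->data, &t->rd) == 6;
}
```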

* In the `if` stage, the instruction is fetched from `imem`: the read address is on `imem_addr` and the fetched instruction is on `imem_rdata`.
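The fetched word `01042683` can be decoded with the RV32I I-type field layout. The hypothetical helper `decode_itype` below extracts the fields. For this word it yields opcode `0x03` (LOAD), funct3 `2` (LW), `rd = 13`, `rs1 = 8`, and `imm = 16`, which is `lw a3, 16(s0)`.

```c
#include <stdint.h>

/* I-type fields of an RV32I instruction word. */
struct itype {
    uint32_t opcode, rd, funct3, rs1;
    int32_t  imm; /* sign-extended 12-bit immediate */
};

struct itype decode_itype(uint32_t inst)
{
    struct itype d;
    d.opcode = inst & 0x7f;          /* bits [6:0]   */
    d.rd     = (inst >> 7) & 0x1f;   /* bits [11:7]  */
    d.funct3 = (inst >> 12) & 0x7;   /* bits [14:12] */
    d.rs1    = (inst >> 15) & 0x1f;  /* bits [19:15] */
    d.imm    = (int32_t)(inst >> 20); /* bits [31:20] */
    if (d.imm & 0x800)
        d.imm -= 0x1000;             /* sign-extend portably */
    return d;
}
```

Since a load reads from `rs1 + imm`, the read address `0x0002123c` in the trace implies that `s0` held `0x0002122c`.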

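* In the `ex` stage, the processor computes the load address (base register plus immediate offset) and puts it on `dmem_raddr`.
* `wb_mem2reg=1` means the data loaded from memory, rather than an ALU result, is written to the register.
* In the `wb` stage, the data read from DMEM arrives on `dmem_rdata` and is written to the register selected by `wb_raddress`.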
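The `lw` walkthrough, together with the separate instruction and data memories described under **Memory Interface**, can be mimicked by a toy C model. It runs the stages one after another instead of overlapping them, uses a hypothetical 1024-word memory with wrap-around indexing, and borrows the GTKWave signal names only as local variable names. It is not srv32's RTL.

```c
#include <stdint.h>

#define MEM_WORDS 1024u /* hypothetical size; addresses wrap around */

/* Toy machine with separate instruction and data memories. */
struct machine {
    uint32_t imem[MEM_WORDS]; /* only read while executing */
    uint32_t dmem[MEM_WORDS]; /* one read port, one write port */
    uint32_t regs[32];
};

/* Byte address -> word index, wrapping around the toy memory. */
uint32_t word_index(uint32_t addr)
{
    return (addr >> 2) % MEM_WORDS;
}

uint32_t dmem_read(const struct machine *m, uint32_t raddr)
{
    return m->dmem[word_index(raddr)];
}

void dmem_write(struct machine *m, uint32_t waddr, uint32_t wdata)
{
    m->dmem[word_index(waddr)] = wdata;
}

/* Runs one lw at pc, stage by stage (sequentially, not pipelined). */
void run_lw(struct machine *m, uint32_t pc)
{
    /* IF: imem_addr = pc, imem_rdata = fetched instruction word */
    uint32_t imem_rdata = m->imem[word_index(pc)];
    uint32_t rd  = (imem_rdata >> 7) & 0x1f;
    uint32_t rs1 = (imem_rdata >> 15) & 0x1f;
    int32_t imm = (int32_t)(imem_rdata >> 20);
    if (imm & 0x800)
        imm -= 0x1000; /* sign-extend the 12-bit offset */

    /* EX: dmem_raddr = x[rs1] + offset */
    uint32_t dmem_raddr = m->regs[rs1] + (uint32_t)imm;

    /* WB: the loaded word (dmem_rdata) is written to rd; x0 stays zero */
    uint32_t dmem_rdata = dmem_read(m, dmem_raddr);
    if (rd != 0)
        m->regs[rd] = dmem_rdata;
}
```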

## Requirement 3
* Running my original C code gave the following simulation statistics:
```shell=
Simulation statistics
=====================
Simulation time : 0.014 s
Simulation cycles: 2471
Simulation speed : 0.179 MHz
```
* Next, I looked for a way to optimize my code so that it needs fewer instructions and cycles.
* I figured out that my original program uses binary search, but the array only has three elements.
* Therefore, I expected that a simpler, more direct approach would need fewer instructions.
* You can find the new code I propose [here](https://github.com/reputation0809/ca_hw3/blob/main/opt_main.c).
* Finally, I ran the simulation again and compared the results.

* Voilà, the instruction count drops by `1887-1830=57` and the cycle count drops by `2471-2412=59`.
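The idea can be illustrated with a hypothetical pair of functions. These are not the author's `origin_main.c` and `opt_main.c`, which are linked above. On a sorted three-element array, `binary_search` spends instructions on loop control and midpoint arithmetic, while `three_way_search` is a fixed chain of at most three comparisons. Whether this saves instructions in a particular program should still be measured on srv32 as done above.

```c
/* Both functions return the index of key in a sorted array, or -1. */

int binary_search(const int *a, int n, int key)
{
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2; /* midpoint arithmetic every step */
        if (a[mid] == key)
            return mid;
        if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}

/* With only three elements, a straight comparison chain has no loop
 * bookkeeping and no midpoint computation. */
int three_way_search(const int a[3], int key)
{
    if (a[0] == key) return 0;
    if (a[1] == key) return 1;
    if (a[2] == key) return 2;
    return -1;
}
```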
## Requirement 4
### How the RISC-V Compliance Tests Work
* The [README](https://github.com/riscv-non-isa/riscv-arch-test#readme) of [riscv-arch-test](https://github.com/riscv-non-isa/riscv-arch-test) explains how the compliance tests work: each test program runs on the target and writes its results to a *signature* region in memory, which is then compared with a reference signature.
* Running the following commands from the srv32 root clones the repository into `srv32/tests` and runs the compliance tests.
```shell=
make tests # run the compliance test for RTL
make tests-sw # run the compliance test for ISS simulator
```
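The final comparison step can be sketched in C. This assumes each signature is text with one hexadecimal word per line. `first_signature_mismatch` is a hypothetical helper; the real riscv-arch-test flow performs the comparison with its own tooling.

```c
#include <stdlib.h>

/* Returns the 0-based index of the first differing word, or -1 if the two
 * signatures have the same length and identical words. */
int first_signature_mismatch(const char *dut, const char *ref)
{
    for (int index = 0; ; index++) {
        char *dut_end, *ref_end;
        unsigned long a = strtoul(dut, &dut_end, 16);
        unsigned long b = strtoul(ref, &ref_end, 16);
        int dut_done = (dut_end == dut); /* no more words in the DUT output */
        int ref_done = (ref_end == ref); /* no more words in the reference */
        if (dut_done && ref_done)
            return -1;
        if (dut_done || ref_done || a != b)
            return index; /* length mismatch or value mismatch */
        dut = dut_end;
        ref = ref_end;
    }
}
```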
* Results of running `make tests-sw`
```shell=
OK: 48/48 RISCV_TARGET=srv32 RISCV_DEVICE=rv32i RISCV_ISA=rv32i
OK: 8/8 RISCV_TARGET=srv32 RISCV_DEVICE=rv32im RISCV_ISA=rv32im
OK: 6/6 RISCV_TARGET=srv32 RISCV_DEVICE=rv32Zicsr RISCV_ISA=rv32Zicsr
```
### How srv32 works with Verilator
* Verilator is an RTL simulator for Verilog code: it translates the Verilog design into a C++ model that is compiled and executed.
* In srv32, the RTL simulation built with Verilator runs our program and reports the simulation statistics, including time, cycles, and speed.
* With the data reported by Verilator, we can analyze how our program performs on the RTL design.