# Assignment3: SoftCPU
## Start with Lab3
Let's start this assignment by going through [Lab3: srv32 - RISCV RV32IM Soft CPU](https://hackmd.io/@sysprog/S1Udn1Xtt). `srv32` is a three-stage pipeline processor that implements the RV32IM instruction set (**RV** stands for RISC-V, **32I** for the 32-bit base integer instruction set, and **M** for the multiply and divide extension).
Here, I use Ubuntu in VirtualBox.
### Set up verilator
To use `srv32`, we first have to set up `verilator`. `verilator` is a compiler that converts a Verilog HDL design into a C++ or SystemC model, which can be executed after compiling. We follow the [step-by-step installation](https://verilator.org/guide/latest/install.html) tutorial on the Verilator website. After installing all the packages it needs and setting up the environment variables, we run the following command in the verilator folder:
`make test`
and get:

Seems like everything is in place. Then we try:
`sudo make install`

and finish the installation of verilator.
### Set up srv32
Now that verilator is set up, we can start installing `srv32`. Before the installation, we have to make sure the **RISC-V toolchain** is already set up in our environment, which means we can run commands like `riscv-none-embed-gcc` or `riscv-none-embed-objdump` in our terminal. Since the RISC-V toolchain was already installed in HW2, we only need to run `sudo apt install build-essential ccache`, and then we can try:

to see if we set up srv32 properly.
### Start simulation with srv32
First, we start with:
`make all`
and we get a very long result, in which we can see a lot of simulations going on, such as comparing the traces of **RTL and ISS**, as well as the **Coremark** and **Dhrystone** benchmarks.
We can see there are some statistics of time, cycles and speed:

And we find that there are 9 simulations, all of which pass.

But the results are still confusing...
So let's try other commands to see if we can find some clues.
`make all-sw`

After trying this command, we can see that it uses the instruction set simulator (ISS) `rvsim` in the tool folder to run through all the programs in the sw folder. It outputs the result and the statistics of every program.
`make tests-all`
From the result of this command, we can see that it first executes the RTL simulation and then the ISS simulation for every program in sw.
And from the Makefile in the srv32 folder, we can see:

`tests-all` runs all of `make tests-sw`, `make tests`, `make all` and `make all-sw`. We also know that when we type `make tests`, the compliance tests are run on the RTL simulator (sim), and the output is compared with the reference output specified by riscv-compliance AND with the output of the ISS simulation (rvsim).
## Requirement 1
I pick the Hamming distance code for this assignment. First, I referred to the other C code in the `sw` folder and made my own directory, `HW3`.
```=linux
# in sw folder make homework directory
mkdir HW3
# Generate our own c code
# Copy other Makefile for reference
# Modify our own Makefile file
```
### C code
I took the C code from last time and adjusted it so that we can also build it with the command `gcc -o Hw3.out HW3.c`.
```=c
#include <stdio.h>

int hammingDistance(int x, int y){
    // XOR operation will output 1 if two bits are different.
    int cmpResult = x ^ y;
    // printf("..%d",cmpResult);
    // Counting the bits of the compared result.
    int dis = 0;
    while (cmpResult) {
        if (cmpResult & 1) {
            ++dis; // If the rightmost bit is 1, result value + 1
        }
        cmpResult >>= 1; // Shift right one bit on every loop.
        // printf("...%d",cmpResult);
    }
    return dis;
}

int main(void){
    int x=4;
    int y=1;
    printf("Hamming distance is %d\n",hammingDistance(x,y));
    return 0;
}
```
### Makefile
I took the Makefile from another folder in `sw` as a reference and simply modified the names.
```
include ../common/Makefile.common

EXE      = .elf
SRC      = HW3.c
CFLAGS  += -L../common
LDFLAGS += -T ../common/default.ld
TARGET   = HW3
OUTPUT   = $(TARGET)$(EXE)

.PHONY: all clean

all: $(TARGET)

$(TARGET): $(SRC)
	$(CC) $(CFLAGS) -o $(OUTPUT) $(SRC) $(LDFLAGS)
	$(OBJCOPY) -j .text -O binary $(OUTPUT) imem.bin
	$(OBJCOPY) -j .data -O binary $(OUTPUT) dmem.bin
	$(OBJCOPY) -O binary $(OUTPUT) memory.bin
	$(OBJDUMP) -d $(OUTPUT) > $(TARGET).dis
	$(READELF) -a $(OUTPUT) > $(TARGET).symbol

clean:
	$(RM) *.o $(OUTPUT) $(TARGET).dis $(TARGET).symbol [id]mem.bin memory.bin
```
### Run the code by the simulator
Now we can run the simulation and compare the results of the RTL simulation and the ISS simulation.
```
# Run the make command at the root path srv32 directory
make HW3
```
### Result
* RTL

* ISS

* Pass message

From the results, we can see that the CPI is almost the same between the two simulations. Also, the ISS simulation is much faster.
## Requirement 2
In this part, we use the tool `GTKWave` to better understand the control signals and the 3-stage pipeline of srv32.
First we install it with the command:
`sudo apt install gtkwave`
Once it is properly installed, we need the `wave.fst` file to import into GTKWave. To get it, we go to the `sim` directory and use the command:
`make HW3.run`
which generates the wave.fst file we want. Then we can import it into GTKWave to get the signal graph we want to observe.
Before analysing the control signals of our program, we first need to know the architecture of srv32, which is:

And then, we can observe from our signal graph

that there are three pipeline stages: IF, EX and WB. The 32-bit instruction `00014297` (hexadecimal) is the first instruction in our assembly code, `auipc t0, 0x14`.
Also, from `fetch_pc`, `if_pc`, `ex_pc`, `wb_pc` and `next_pc`, we can see that the program counter value is passed along through every stage of the pipeline, and `next_pc` gets its value when the program counter reaches `ex_pc`.

## Requirement 3
According to the information from the lab, the srv32 pipeline suffers from the **branch stall issue**. We can see the issue in the following image:

To improve this, I came up with the following code:
```=c
#include <stdio.h>

int hammingDistance(int x, int y){
    // XOR operation will output 1 if two bits are different.
    int cmpResult = x ^ y;
    // printf("..%d",cmpResult);
    // Counting the bits of the compared result.
    int dis = 0;
    while (cmpResult) {
        dis += (cmpResult & 1); // If the rightmost bit is 1, result value + 1
        cmpResult >>= 1; // Shift right one bit on every loop.
        // printf("...%d",cmpResult);
    }
    return dis;
}

int main(void){
    int x=4;
    int y=1;
    printf("Hamming distance is %d\n",hammingDistance(x,y));
    return 0;
}
```
which eliminates the `if` branch in the loop, in the hope of improving our code. However, after removing this one branch from the code, the performance got worse.

We can see there is no reduction in the cycles, and the speed is slower than the original. So I traced the program logic and found that although the new code has one less `if` statement, it performs an addition on every iteration no matter what, whereas the original uses the `if` statement to skip the addition when the bit is 0. Therefore, we failed to improve our code by eliminating the `if` statement.
## Requirement 4
* The RISC-V compliance test is made to confirm that the basic operations of a processor conform to the specification, which benefits the software community and the tools/OS ecosystem. Although it checks that the basic operations work, the RISC-V compliance test does not test every aspect of the processor. Instead, it sets some benchmarks and checks whether the processor meets them. It also uses a technique called `signatures`, which expects specific values at certain places in memory to make sure the processor works properly.
* srv32 is a 3-stage RISC-V processor written in Verilog. Verilator is a simulation tool for Verilog: it converts the Verilog HDL into a cycle-based, behavior-oriented model in C++ for simulation. So in conclusion, srv32 is written in the RTL language Verilog and uses Verilator to turn it into C++, which makes design and testing more intuitive thanks to the behavior-oriented model.