CA 2021 Term Project - Analyze kleine-riscv and validate its pipeline design

# CA 2021 Term Project - Analyze kleine-riscv and validate its pipeline design

In this project, although I have no experience with Verilog, I will try to use everything I learned about computer architecture in class this semester to understand how a processor works and how it is implemented.

This article has two parts. First, I walk through my experience of setting up kleine-riscv step by step, to help other beginners like me who want to try kleine-riscv but can hardly find documentation on the official GitHub page or anywhere else on the internet. Second, I read the Verilog code and combine it with what I learned in class about pipelines, control signals and hazards to learn how a processor is designed. I will share what I learned during this process.

Without further ado, let's get started!

## Implement kleine-riscv on Ubuntu

[Kleine-riscv](https://github.com/rolandbernard/kleine-riscv) is a small RISC-V core written in synthesizable Verilog that supports the RV32I unprivileged ISA and parts of the privileged ISA, namely M-mode.

`# M-mode is for simple embedded systems that run trusted application code with no memory protection (beyond trapping non-existent memory addresses); very low cost to implement.`

To set up kleine-riscv, first prepare an operating system, in my case Ubuntu, and then download the kleine-riscv code from GitHub.

```
git clone https://github.com/rolandbernard/kleine-riscv.git
```

After downloading, change into the directory and run `make`.

```
# change dir
cd kleine-riscv
# make file
make
```

Ohh no, first error message...

![](https://i.imgur.com/lqDteZ9.png)

Fix it by installing `clang`.

```
sudo apt-get install clang
```

Try again! Ohh no, not again...

![](https://i.imgur.com/dfPae8e.png)

From the error message, it seems that `ld` does not have the emulation mode the build wants. This is because the `ld` on my Ubuntu is the GNU linker, which comes from GNU Binutils.
This `ld` does not seem to work here, so I replaced it by building `lld` from the LLVM Compiler Infrastructure Project.

```
git clone https://github.com/llvm/llvm-project llvm-project
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS=lld -DCMAKE_INSTALL_PREFIX=/usr/local ../llvm-project/llvm
make install
```

Then remember to link the new linker into the system path (for kleine-riscv, we need `ld.lld`). The path below is the one from my machine.

```
sudo ln -s /home/harvey/llvm-project/build/bin/ld.lld /usr/bin/ld
```

Now we can finally run `make`. From the `makefile` we can see that it runs `make sim` and `make test`. `make sim` creates the build directory and generates `Vcore`, the core program that runs C or RISC-V programs. Once `Vcore` is built, `make test` scans the `tests` directory for `.c` (C code) and `.S` (assembly code) files, builds the ELF files in `/kleine-riscv/tests/build`, and then uses `Vcore` to run each ELF file.

![](https://i.imgur.com/imTsvJo.png)

These tests use the header files in `/kleine-riscv/tests/include`.

![](https://i.imgur.com/GKi5Rew.png)

You can write your own C/assembly programs, design test cases and see whether the output is correct. For example, in `/kleine-riscv/tests/misc/prime.c` we alter test case #5,

![](https://i.imgur.com/DFKeTq2.png)

run `make test`, and get an error message because 4 is not a prime number.

![](https://i.imgur.com/OaxuZLh.png)

To conclude the first part: we set up kleine-riscv, fixed some obstacles and observed what it is capable of. One concern is that `Vcore` does not print any message when a program completes successfully, so it took me a lot of time to figure out what it can do.

## Analyse the design & architecture of the kleine-riscv processor

In the second part, since kleine-riscv is a 5-stage pipelined processor, I build on what I learned in the course about pipelines, control signals and hazards.
I will compare its design with the course materials to see the differences and understand how the processor is designed, so that maybe someday I can design my own processor.

Let's first look at the architecture of the 5-stage pipelined processor from the lecture materials.

![](https://i.imgur.com/mHkYOpu.png)

I assume `kleine-riscv` looks very much like this, so now I will look through the Verilog code and see whether it is similar.

First, we start by observing its structure.

```
└── kleine-riscv/
    └── src/
        ├── pipeline/
        │   ├── decode.v
        │   ├── execute.v
        │   ├── fetch.v
        │   ├── memory.v
        │   ├── pipeline.v
        │   └── writeback.v
        ├── units/
        │   ├── alu.v
        │   ├── busio.v
        │   ├── cmp.v
        │   ├── csr.v
        │   ├── hazard.v
        │   └── regfile.v
        ├── core.v
        └── params.vh
```

From `pipeline`, we can see the 5 pipeline stages: `IF`, `ID`, `EX`, `MEM` and `WB`. From `units`, we can guess that these are the components in the diagram above, such as the `alu`, `bus io`, `branch compare unit`, `control and status registers`, `hazard bypass wire` and `register`.

Now we'll dive into some of the Verilog code and see what is inside, starting with the components in `units`.

* `alu.v`
  This component does the calculations. It has inputs for 2 operands, a and b (32 bits each), a `funct3` input that selects which operation the ALU executes, and a `function_modifier` input as an additional signal. The following are the operations supported by the ALU.
  ![](https://i.imgur.com/TInj4s7.png)
  However, in the code there are 2 outputs, a 1st-cycle output and a 2nd-cycle output, and I still have no idea why.
* `busio.v`
  This part carries the communication signals between the registers and memory.
* `cmp.v`
  This part compares the input data a and b. It also takes an unsigned signal that decides whether the inputs are treated as signed numbers, and it outputs the branch-equal and branch-less-than signals, exactly as the diagram above shows.
* `csr.v`
  The control and status registers (CSRs). This part of the code is hard to understand.
* `hazard.v`
  This unit detects whether the pipeline has a hazard.
  Its goal is to output two signals, `stall` and `invalidate`, which tell all the components in the pipeline how to handle their data.
* `regfile.v`
  This component is the register file used in the `ID` stage. It has inputs for the rs1, rs2 and rd register addresses (5 bits each) and the rd data (32 bits), and outputs the rs1 and rs2 data. A `clk` input signal updates the rd register when the clock goes positive.

Now that we have briefly gone through the code, let's see how the processor deals with hazards.

Kleine-riscv uses the hazard unit to deal with hazards. These are the hazards it handles: **EX data hazard**, **MEM data hazard** and **control hazard**.

![](https://i.imgur.com/GjbleuN.png)

First, from `pipeline/execute.v` we can see that there is an input that selects whether the data going into the ALU is the original data from the registers or the data sent from EX/MEM to ID.

![](https://i.imgur.com/Wxea5Z8.png)

So we learn that kleine-riscv has a mechanism for bypassing a value from one stage to another.

Next, we check `units/hazard.v` to see how it is designed to deal with these hazards. We know that the hazard unit generates the `stall` and `invalidate` signals that let the system resolve hazards. If `stall=True`, the components in the pipeline stall and do nothing. If `stall=False`, they also check `invalidate` and proceed with their work only if it is `False`.

So let's see how these two signals are generated. From the following chunk of code:

![](https://i.imgur.com/ps9cOJ6.png)

We learn that if `reset` is true, all of the stages get the `invalidate=True` signal. If `trap_invalidate` is True, then `branch_invalidate` is also True, which also results in `invalidate=True`.

![](https://i.imgur.com/bPTs1w8.png)

So why does it use two variables to get the same result? It is because of the scenario when a **branch is taken**.
`invalidate` becomes true when the branch is taken, so IF, ID and EX should flush (do nothing with) what they are working on, because whether the branch is taken is only decided during the EX stage.

As for the `EX` and `MEM` data hazards, we can observe how they are handled from this chunk of code:

![](https://i.imgur.com/vcpSThm.png)

In the first section, it checks whether the rs1 or rs2 register address of the instruction in the `ID` stage is the same as the destination register address of the instruction in the `EX` stage. This is the scenario where the destination register is used in the next 2 instructions.

![](https://i.imgur.com/M8lx87z.png)

In the second section, it checks whether the rs1 or rs2 register address of the instruction in the `ID` stage is the same as the destination register address of the instruction in the `MEM` stage. This is the scenario where the destination register is used in the next instruction while there is a `lw` instruction in the `EX` stage.

![](https://i.imgur.com/2O20qKq.png)

Both scenarios cause `IF` to stall.

After observing how the processor deals with hazards, however, I am not sure whether the design detects the second instruction after the producer in scenario 1.

```
add t0,t1,t2
sub t4,t0,t3
and t5,t0,t6 # this instruction
```

This instruction also causes a data hazard, so maybe the hazard unit should also check the rs1 and rs2 addresses of the instruction in the `IF` stage to make the check more complete.

```verilog
// units/hazard.v
module hazard (
    input reset,

    // ##### add this
    // from fetch
    input valid_fetch,
    input [4:0] rs1_address_fetch,
    input [4:0] rs2_address_fetch,
    input uses_rs1_fetch,
    input uses_rs2_fetch,
    // #####

    // from decode
    input valid_decode,
    input [4:0] rs1_address_decode, // change variable name
    input [4:0] rs2_address_decode, // change variable name
    input uses_rs1,
    input uses_rs2,
    input uses_csr,
    // . . .
);

wire data_hazard = valid_decode && (
    (valid_execute && rd_address_execute != 0 && (
        uses_rs1 && rs1_address_decode == rd_address_execute
        || uses_rs2 && rs2_address_decode == rd_address_execute
    )) // change variable name
    // ##### add here
    || (valid_execute && rd_address_execute != 0 && (
        uses_rs1_fetch && rs1_address_fetch == rd_address_execute
        || uses_rs2_fetch && rs2_address_fetch == rd_address_execute
    ))
    // #####
    || (valid_memory && rd_address_memory != 0 && !bypass_memory && (
        uses_rs1 && rs1_address_decode == rd_address_memory
        || uses_rs2 && rs2_address_decode == rd_address_memory
    ))
    || uses_csr && (
        csr_write_execute && valid_execute
        || csr_write_memory && valid_memory
        || csr_write_writeback && valid_writeback
    ));
```
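
To make the comparisons in the `data_hazard` expression above easier to try out, here is a minimal, self-contained sketch of just the `ID` versus `EX` and `ID` versus `MEM` checks, together with a tiny testbench. It is a simplified illustration, not the actual kleine-riscv hazard unit: the module names `hazard_sketch` and `hazard_sketch_tb` are made up for this example, the CSR clause is left out, and I assume that `bypass_memory = 1` means the result in the `MEM` stage can be forwarded. The testbench puts `sub t4,t0,t3` in `ID` while `add t0,t1,t2` is first in `EX` and then in `MEM`.

```verilog
// Hypothetical, simplified sketch; not the actual kleine-riscv hazard unit.
module hazard_sketch (
    input        valid_decode,
    input  [4:0] rs1_address_decode,
    input  [4:0] rs2_address_decode,
    input        uses_rs1,
    input        uses_rs2,
    input        valid_execute,
    input  [4:0] rd_address_execute,
    input        valid_memory,
    input  [4:0] rd_address_memory,
    input        bypass_memory,   // assumption: 1 = MEM result can be forwarded
    output       data_hazard
);
    // The instruction in ID reads a register that the instruction in EX will write.
    wire hazard_execute = valid_execute && (rd_address_execute != 5'd0) && (
        (uses_rs1 && (rs1_address_decode == rd_address_execute)) ||
        (uses_rs2 && (rs2_address_decode == rd_address_execute)));

    // Same comparison against the MEM stage, unless the value can be bypassed.
    wire hazard_memory = valid_memory && (rd_address_memory != 5'd0) && !bypass_memory && (
        (uses_rs1 && (rs1_address_decode == rd_address_memory)) ||
        (uses_rs2 && (rs2_address_decode == rd_address_memory)));

    assign data_hazard = valid_decode && (hazard_execute || hazard_memory);
endmodule

module hazard_sketch_tb;
    reg        valid_decode, uses_rs1, uses_rs2;
    reg        valid_execute, valid_memory, bypass_memory;
    reg  [4:0] rs1_address_decode, rs2_address_decode;
    reg  [4:0] rd_address_execute, rd_address_memory;
    wire       data_hazard;

    hazard_sketch dut (
        .valid_decode(valid_decode),
        .rs1_address_decode(rs1_address_decode),
        .rs2_address_decode(rs2_address_decode),
        .uses_rs1(uses_rs1),
        .uses_rs2(uses_rs2),
        .valid_execute(valid_execute),
        .rd_address_execute(rd_address_execute),
        .valid_memory(valid_memory),
        .rd_address_memory(rd_address_memory),
        .bypass_memory(bypass_memory),
        .data_hazard(data_hazard)
    );

    initial begin
        // sub t4,t0,t3 sits in ID: rs1 = t0 (x5), rs2 = t3 (x28).
        valid_decode = 1; uses_rs1 = 1; uses_rs2 = 1;
        rs1_address_decode = 5'd5; rs2_address_decode = 5'd28;

        // Case 1: add t0,t1,t2 is in EX (rd = x5), MEM holds nothing valid.
        valid_execute = 1; rd_address_execute = 5'd5;
        valid_memory = 0; rd_address_memory = 5'd0; bypass_memory = 0;
        #1 $display("add in EX               : data_hazard = %b", data_hazard);

        // Case 2: add has moved on to MEM and its result can be bypassed.
        valid_execute = 0; rd_address_execute = 5'd0;
        valid_memory = 1; rd_address_memory = 5'd5; bypass_memory = 1;
        #1 $display("add in MEM, bypass      : data_hazard = %b", data_hazard);

        // Case 3: same as case 2, but the value cannot be bypassed.
        bypass_memory = 0;
        #1 $display("add in MEM, no bypass   : data_hazard = %b", data_hazard);

        $finish;
    end
endmodule
```

If Icarus Verilog is installed, saving this as `hazard_sketch.v` (a file name chosen for this example) and running `iverilog -o hazard_sketch hazard_sketch.v && vvp hazard_sketch` should report `data_hazard` as 1 in the first case, 0 in the second and 1 in the third. Keeping the two comparisons in separate wires makes it easy to add a further clause, such as the `IF` stage check proposed above, and observe its effect in the same testbench.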