# Assignment3: SoftCPU
###### tags: `CA`

contributed by < [reputation0809](https://github.com/reputation0809) >

## Requirement
[Assignment3: SoftCPU](https://hackmd.io/@sysprog/2021-arch-homework3)

## Set Environment
1. Download riscv-none-embed-gcc for srv32 from [The xPack GNU RISC-V Embedded GCC](https://xpack.github.io/riscv-none-embed-gcc/).
```shell=
cd $HOME
wget https://github.com/xpack-dev-tools/riscv-none-embed-gcc-xpack/releases/download/v10.2.0-1.2/xpack-riscv-none-embed-gcc-10.2.0-1.2-linux-x64.tar.gz
tar zxvf xpack-riscv-none-embed-gcc-10.2.0-1.2-linux-x64.tar.gz
cd riscv-none-embed-gcc
```
2. Configure `$PATH`.
```shell=
echo "export PATH=`pwd`/bin:$PATH" > setenv
cd $HOME
source riscv-none-embed-gcc/setenv
```
3. Install the required packages.
```shell=
sudo apt-get install lcov
sudo apt-get install ccache
```
4. Clone srv32.
```shell=
git clone https://github.com/sysprog21/srv32
```
5. Enter srv32 and build the tools and the simulator.
```shell=
cd srv32/tools
make
cd ../sim
make
```
6. Verify the build.
```shell=
cd ..
make all
```
![](https://i.imgur.com/NVQSgFv.png)

## Requirement 1
* The original C code is available [here](https://github.com/reputation0809/ca_hw3/blob/main/origin_main.c).
* To run it on srv32, I had to change `int` to `volatile int`.

### Run simulation on srv32
1. Create a new folder `main` in `sw`.
2. Create a C file called `main.c` in `main`.
3. Copy the `Makefile` from `hello` to `main`.
4. Modify the [Makefile](https://github.com/reputation0809/ca_hw3/tree/main) in `main`.
5. Go back to the srv32 root and run `make main`.

![](https://i.imgur.com/yCVCT5v.png)

## Requirement 2
### SRV32 - Simple 3-stage pipeline RISC-V processor Architecture
* This is a simple RISC-V 3-stage pipeline processor that supports FreeRTOS.
* The core has three pipeline stages: Fetch & Decode (F/D), Execute (E), and Write Back (WB).
* **Register Forwarding**: Data hazards between consecutive instructions are resolved with a simple hardware technique called forwarding. When an instruction reads the register that the preceding instruction writes, the execution result is forwarded directly to it.
* **Branch Penalty**: When a branch is taken in the execute stage, the instructions already fetched into the pipeline must be flushed. This costs two instructions, so the branch penalty is two.
* **Memory Interface**: One instruction memory and one data memory. The instruction memory is read-only with a single read port, while the data memory has two ports, one for reading and one for writing.
* **SRV32 pipeline architecture:**

![](https://i.imgur.com/RSfT3FO.jpg)

### Use GTKWave
* Install [GTKWave](http://gtkwave.sourceforge.net/).
```shell
sudo apt install gtkwave
gtkwave
```
* Launch GTKWave and load `sim/wave.fst`, which is generated by srv32.

![](https://i.imgur.com/WcGDCAz.png)
* Append the signals of interest to display them.

![](https://i.imgur.com/JeMl5y7.png)
* Take an `lw` instruction as an example.
* Its entry in `sim/trace.log` is:
```shell=
418 00002c74 01042683 read 0x0002123c => 0x00000000, x13 (a3) <= 0x00000000
```
* The `pc` is `00002c74`, the `inst` is `01042683`, the `read address` is `0x0002123c`, and the `read data` is `0x00000000`.
* First, `pc` is passed along from the `if` stage to the `wb` stage.

![](https://i.imgur.com/DYXYfL6.png)
* In the `if` stage, the instruction is fetched from `imem`: the read address is `imem_addr` and the read data is `imem_rdata`.

![](https://i.imgur.com/Uu05nJv.png)
* In the `ex` stage, the core computes the read address, `dmem_raddr`.
* `wb_mem2reg=1` means that data loaded from memory is written to a register.
* In the `wb` stage, the data is read from DMEM as `dmem_rdata` and written to the register selected by `wb_raddress`.
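
To connect the trace entry above with what the pipeline decodes, the instruction word `01042683` can be split into its I-type fields by hand. The following is a minimal sketch using only shell arithmetic; `decode_itype` is a hypothetical helper written for this note and is not part of srv32.

```shell
# Decode the I-type fields of a 32-bit RISC-V instruction word.
# decode_itype is a hypothetical helper, not something srv32 provides.
decode_itype() {
    inst=$(( $1 ))
    opcode=$(( inst & 0x7f ))          # bits [6:0]
    rd=$(( (inst >> 7) & 0x1f ))       # bits [11:7]
    funct3=$(( (inst >> 12) & 0x7 ))   # bits [14:12]
    rs1=$(( (inst >> 15) & 0x1f ))     # bits [19:15]
    imm=$(( (inst >> 20) & 0xfff ))    # bits [31:20]
    # Sign-extend the 12-bit immediate.
    if [ "$imm" -ge 2048 ]; then imm=$(( imm - 4096 )); fi
    printf 'opcode=0x%02x rd=x%d funct3=%d rs1=x%d imm=%d\n' \
        "$opcode" "$rd" "$funct3" "$rs1" "$imm"
}

decode_itype 0x01042683
# prints: opcode=0x03 rd=x13 funct3=2 rs1=x8 imm=16
```

Opcode `0x03` with `funct3=2` is `lw`, and `rd=x13` matches the `x13 (a3)` destination in the trace, so the instruction reads as `lw a3, 16(s0)`. Since the trace reports a read address of `0x0002123c`, this suggests that `x8` held `0x0002122c` at that point. That value is inferred from the trace, not read from the waveform.
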
![](https://i.imgur.com/NwPNg7U.png)

## Requirement 3
* Running my original C code gave the following simulation statistics:
```shell=
Simulation statistics
=====================
Simulation time  : 0.014 s
Simulation cycles: 2471
Simulation speed : 0.179 MHz
```
* Next, I looked for a way to reduce the instruction and cycle counts.
* I realized that my program used binary search even though the array holds only three elements.
* For an array this small, a simpler and more direct approach should need fewer instructions.
* The new code I propose is [here](https://github.com/reputation0809/ca_hw3/blob/main/opt_main.c).
* Finally, I ran the simulation and observed the results.

![](https://i.imgur.com/0sMcHG9.png)
* Voilà, the instruction count drops by `1887-1830=57` and the cycle count by `2471-2412=59`.

## Requirement 4
### How RISC-V Compliance Tests work
* The [README](https://github.com/riscv-non-isa/riscv-arch-test#readme) of [riscv-arch-test](https://github.com/riscv-non-isa/riscv-arch-test) explains how the compliance tests work.
* Running either of the following commands clones the repository into `srv32/tests` and runs the compliance tests.
```shell=
make tests     # run the compliance test for RTL
make tests-sw  # run the compliance test for ISS simulator
```
* Results of running `make tests-sw`:
```shell=
OK: 48/48 RISCV_TARGET=srv32 RISCV_DEVICE=rv32i RISCV_ISA=rv32i
OK: 8/8 RISCV_TARGET=srv32 RISCV_DEVICE=rv32im RISCV_ISA=rv32im
OK: 6/6 RISCV_TARGET=srv32 RISCV_DEVICE=rv32Zicsr RISCV_ISA=rv32Zicsr
```

### How srv32 works with Verilator
* Verilator is an RTL simulator for Verilog code.
* Running the Verilator simulation in srv32 shows how our code executes and reports simulation statistics, including time, cycles, and speed.
* With the data from Verilator, we can analyze the performance of our RTL code.
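
As one example of such an analysis, the cycle and instruction counts reported in Requirement 3 can be turned into cycles per instruction (CPI). This is a minimal sketch; `cpi` is a hypothetical helper written for this note, and it assumes `awk` is installed.

```shell
# cpi is a hypothetical helper: cycles per instruction, three decimal places.
cpi() {
    awk -v cycles="$1" -v insts="$2" 'BEGIN { printf "%.3f\n", cycles / insts }'
}

cpi 2471 1887   # original program,  prints 1.309
cpi 2412 1830   # optimized program, prints 1.318
```

Both versions run at about 1.3 cycles per instruction. The optimization therefore shortens the run mainly by executing fewer instructions, not by lowering the cost of each one. A CPI above 1 is consistent with the two-instruction branch penalty described in Requirement 2, though other stalls may contribute as well.
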