# Lab3: SoftCPU
contributed by < [geniuseric](https://github.com/geniuseric) >

## Requirement
[Assignment3: SoftCPU](https://hackmd.io/@sysprog/2021-arch-homework3)

## Setup Environment
1. Prepare the GNU Toolchain for RISC-V. See [The xPack GNU RISC-V Embedded GCC](https://xpack.github.io/riscv-none-embed-gcc/).
    ```shell
    $ cd /tmp
    $ wget https://github.com/xpack-dev-tools/riscv-none-embed-gcc-xpack/releases/download/v10.2.0-1.2/xpack-riscv-none-embed-gcc-10.2.0-1.2-linux-x64.tar.gz
    $ tar zxvf xpack-riscv-none-embed-gcc-10.2.0-1.2-linux-x64.tar.gz
    $ cp -af xpack-riscv-none-embed-gcc-10.2.0-1.2 $HOME/riscv-none-embed-gcc
    ```
2. Configure `$PATH`.
    ```shell
    $ cd $HOME/riscv-none-embed-gcc
    $ echo "export PATH=`pwd`/bin:$PATH" > setenv
    ```
    Once steps (1) and (2) are complete, you can update the `$PATH` environment variable with:
    ```shell
    $ cd $HOME
    $ source riscv-none-embed-gcc/setenv
    ```
    The first time, verify that the toolchain is found on `$PATH`:
    ```shell
    $ riscv-none-embed-gcc -v
    ```
3. Fetch [SRV32](https://github.com/sysprog21/srv32).
    ```shell
    $ cd $HOME
    $ git clone https://github.com/sysprog21/srv32
    ```
4. Run compliance tests (https://github.com/sysprog21/srv32).
    ```shell
    $ cd $HOME/srv32/tools/
    $ make
    $ cd $HOME/srv32/sim/
    $ make
    $ cd $HOME/srv32/tests/
    $ make tests     # Run test v1 for RTL
    $ make tests-sw  # Run test v1 for ISS simulator
    ```

## Simple 3-stage pipeline RISC-V processor
- A detailed description is available [here](https://hackmd.io/@sysprog/S1Udn1Xtt).
- [SRV32](https://github.com/sysprog21/srv32) is a three-stage pipeline processor with IF/ID, EX, and WB stages.
- It passes the RV32IM compliance tests.
- It supports **full forwarding**, which means data hazards can be resolved without stalling the processor.
- The **branch penalty** is 2.
- The pipeline architecture is shown below.

![](https://i.imgur.com/9lbFKBM.jpg)

## Search Insert Position
- The original C code is in [searchInsert.c](https://github.com/geniuseric/Computer_Architecture/blob/master/HW1/searchInsert.c).
- To run on SRV32, I changed `int` to **volatile** `int`.
- I also changed ```mid = (down + up) / 2``` to ```mid = (down + up) >> 1``` as an optimization.
- I put the [searchInsert](https://github.com/geniuseric/Computer_Architecture/tree/master/HW3/searchInsert) folder, which contains the modified C code and Makefile, into the ```$HOME/srv32/sw``` folder.
- Run the code on the RTL and ISS simulators.

```shell
$ cd $HOME/srv32
$ make searchInsert
```

- The result is shown below.

```c=
make[1]: Entering directory '/home/airobots/srv32/sw'
make -C common
make[2]: Entering directory '/home/airobots/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/airobots/srv32/sw/common'
make[2]: Entering directory '/home/airobots/srv32/sw/searchInsert'
riscv-none-embed-gcc -O3 -Wall -march=rv32im -mabi=ilp32 -nostartfiles -nostdlib -L../common -o searchInsert.elf searchInsert.c -lc -lm -lgcc -lsys -T ../common/default.ld
riscv-none-embed-objcopy -j .text -O binary searchInsert.elf imem.bin
riscv-none-embed-objcopy -j .data -O binary searchInsert.elf dmem.bin
riscv-none-embed-objcopy -O binary searchInsert.elf memory.bin
riscv-none-embed-objdump -d searchInsert.elf > searchInsert.dis
riscv-none-embed-readelf -a searchInsert.elf > searchInsert.symbol
make[2]: Leaving directory '/home/airobots/srv32/sw/searchInsert'
make[1]: Leaving directory '/home/airobots/srv32/sw'
make[1]: Entering directory '/home/airobots/srv32/sim'
The insert position is 4
Excuting 1705 instructions, 2217 cycles, 1.300 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.018 s
Simulation cycles: 2228
Simulation speed : 0.123778 MHz
make[1]: Leaving directory '/home/airobots/srv32/sim'
make[1]: Entering directory '/home/airobots/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/searchInsert/searchInsert.elf
The insert position is 4
Excuting 1705 instructions, 2217 cycles, 1.300 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.001 s
Simulation cycles: 2217
Simulation speed : 4.024 MHz
make[1]: Leaving directory '/home/airobots/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```

## View Waveform
- Install [GTKWave](http://gtkwave.sourceforge.net/) and open it.

```shell
$ sudo apt install gtkwave
$ gtkwave
```

- Load the file ```$HOME/srv32/sim/wave.fst```.
- Show the signals/events of the branch instruction ```beq a4,a5,ec``` at instruction address ```0x90```.

![](https://i.imgur.com/1lqB7KA.png)

- Explanation of the instruction:
    1. The instruction before 0x90 is at ```0x8c```, which is ```lw a5,8(sp)```. The value of register ```a5``` is forwarded from 0x8c in the WB stage to 0x90 in the EX stage.
    2. When 0x90 is in the EX stage, the values of registers ```a4``` and ```a5``` are equal. Therefore, the branch is taken and the next instruction after 0x90 is at ```0xec```.
    3. Control signals ```ex_flush``` and ```wb_flush``` show when the instructions at ```0x94``` and ```0x98``` are flushed. The two flushed instructions match the **branch penalty** of 2.

## Propose Software Optimization
- For further optimization, I eliminated the **while** loop and used **goto** instead.
- The optimized code is in [opt_searchInsert.c](https://github.com/geniuseric/Computer_Architecture/blob/master/HW3/opt_searchInsert/opt_searchInsert.c).
- The optimized result is shown below. It saves ```2217-2214=3``` cycles, and the instruction count drops from 1705 to 1702.

```c=
make[1]: Entering directory '/home/airobots/srv32/sw'
make -C common
make[2]: Entering directory '/home/airobots/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/airobots/srv32/sw/common'
make[2]: Entering directory '/home/airobots/srv32/sw/opt_searchInsert'
riscv-none-embed-gcc -O3 -Wall -march=rv32im -mabi=ilp32 -nostartfiles -nostdlib -L../common -o opt_searchInsert.elf opt_searchInsert.c -lc -lm -lgcc -lsys -T ../common/default.ld
riscv-none-embed-objcopy -j .text -O binary opt_searchInsert.elf imem.bin
riscv-none-embed-objcopy -j .data -O binary opt_searchInsert.elf dmem.bin
riscv-none-embed-objcopy -O binary opt_searchInsert.elf memory.bin
riscv-none-embed-objdump -d opt_searchInsert.elf > opt_searchInsert.dis
riscv-none-embed-readelf -a opt_searchInsert.elf > opt_searchInsert.symbol
make[2]: Leaving directory '/home/airobots/srv32/sw/opt_searchInsert'
make[1]: Leaving directory '/home/airobots/srv32/sw'
make[1]: Entering directory '/home/airobots/srv32/sim'
The insert position is 4
Excuting 1702 instructions, 2214 cycles, 1.300 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.019 s
Simulation cycles: 2225
Simulation speed : 0.117105 MHz
make[1]: Leaving directory '/home/airobots/srv32/sim'
make[1]: Entering directory '/home/airobots/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/opt_searchInsert/opt_searchInsert.elf
The insert position is 4
Excuting 1702 instructions, 2214 cycles, 1.301 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.001 s
Simulation cycles: 2214
Simulation speed : 3.919 MHz
make[1]: Leaving directory '/home/airobots/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```
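
The `while`-to-`goto` rewrite proposed in this section can be sketched roughly as follows. This is only an illustration of the idea, not the code in `opt_searchInsert.c`: the function names, the signature, the sample array, and the helper `check_examples` are all assumptions made for this sketch, and the `volatile` qualifiers used on SRV32 are left out for brevity. Both versions keep the `>> 1` midpoint computation mentioned earlier.

```c
#include <assert.h>

/* Hypothetical loop version: returns the index of target, or the index
 * where it would be inserted to keep nums sorted in ascending order. */
int search_insert_while(const int *nums, int size, int target)
{
    int down = 0, up = size - 1;
    while (down <= up) {
        int mid = (down + up) >> 1; /* same as / 2 for non-negative sums */
        if (nums[mid] == target)
            return mid;
        if (nums[mid] < target)
            down = mid + 1;
        else
            up = mid - 1;
    }
    return down;
}

/* Hypothetical goto version: same search, with the loop written as an
 * explicit label and jump instead of a while statement. */
int search_insert_goto(const int *nums, int size, int target)
{
    int down = 0, up = size - 1, mid;
loop:
    if (down > up)
        return down;
    mid = (down + up) >> 1;
    if (nums[mid] == target)
        return mid;
    if (nums[mid] < target)
        down = mid + 1;
    else
        up = mid - 1;
    goto loop;
}

/* Assumed sample input; checks that both versions agree. */
void check_examples(void)
{
    const int nums[] = {1, 3, 5, 6};
    for (int target = 0; target <= 7; target++)
        assert(search_insert_while(nums, 4, target) ==
               search_insert_goto(nums, 4, target));
}
```

Calling `check_examples()` from a test `main` confirms that the two forms return the same position for every target in the sample range. Whether the `goto` form actually saves cycles depends on how the compiler lays out the branches, so the generated `.dis` file is the place to check the effect.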