# Assignment3: Single-cycle RISC-V CPU
contributed by < [`yutingshih`](https://github.com/yutingshih) >

## Environment Setup
- OS: macOS 14.1
- Architecture: AArch64

### Install Verilator and GTKWave
```shell
brew install verilator gtkwave
```

### Install JDK and SBT
```shell
## Install sdkman
curl -s "https://get.sdkman.io" | bash
source "$HOME/.sdkman/bin/sdkman-init.sh"

## Install Eclipse Temurin JDK 11
sdk install java 11.0.21-tem
sdk install sbt
```

## Chisel HDL
### Hello World in Chisel
This `Hello` module blinks the LED with a period of 50000000 cycles and a duty cycle of 50%. That is, the LED is turned on for 25000000 cycles, and then turned off for 25000000 cycles.

```scala=
import chisel3._

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  // Toggle the LED each time the counter reaches half the period
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg = RegInit(0.U(32.W))
  val blkReg = RegInit(0.U(1.W))

  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
```

Since `CNT_MAX` (24999999) is far below 4294967295 ($=2^{32} - 1$), `cntReg` does not need 32 bits. It only needs $\lceil \log_2(24999999) \rceil = 25$ bits. The enhanced version of the `Hello` module is as follows, which saves 7 register bits.

```scala=
import chisel3._

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  // 25 bits are enough to hold 24999999
  val cntReg = RegInit(0.U(25.W))
  val blkReg = RegInit(0.U(1.W))

  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
```

For a more flexible design, we can further enhance the `Hello` module by parameterizing the blinking period, using Chisel's hardware generator syntax. Note that the `log2Up` function computes the number of bits the counter `cntReg` needs.
```scala=
import chisel3._
import chisel3.util.log2Up

class Hello(val period: Int = 50000000) extends Module {
  require(period >= 2, "period must cover at least one on cycle and one off cycle")
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (period / 2 - 1).U
  // log2Up takes a Scala integer, not a hardware UInt.
  // The counter holds period / 2 distinct values (0 to period / 2 - 1).
  val cntReg = RegInit(0.U(log2Up(period / 2).W))
  val blkReg = RegInit(0.U(1.W))

  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
```

## Single Cycle CPU
[yutingshih/ca2023-lab3](https://github.com/yutingshih/ca2023-lab3)

Get the repository:
```shell
$ git clone https://github.com/sysprog21/ca2023-lab3
$ cd ca2023-lab3
```

To simulate and run the tests for this project, execute the following command in the `ca2023-lab3` directory.
```shell
$ sbt test
```

Alternatively, run `make` or `make test`. Before the blanks in the given implementation are filled in, the following test suites fail:
```
[info] *** 6 TESTS FAILED ***
[error] Failed tests:
[error]         riscv.singlecycle.InstructionDecoderTest
[error]         riscv.singlecycle.ByteAccessTest
[error]         riscv.singlecycle.InstructionFetchTest
[error]         riscv.singlecycle.ExecuteTest
[error]         riscv.singlecycle.FibonacciTest
[error]         riscv.singlecycle.QuicksortTest
[error] (Test / test) sbt.TestsFailedException: Tests unsuccessful
```

### Waveform Analysis
[GTKWave](https://gtkwave.sourceforge.net/) fails to open on macOS 14 (see [issue #250](https://github.com/gtkwave/gtkwave/issues/250)). Fortunately, [WaveTrace](https://www.wavetrace.io/), a waveform viewer extension for VSCode, can be used as an alternative to GTKWave.

### Instruction Fetch Stage
The `InstructionFetchTest` randomly drives the control signal `jump_flag_id` (branch taken or not taken) and checks that the value of `instruction_address` (the program counter, PC) output by the `InstructionFetch` module is as expected.
- For the case of **branch not taken**, the next PC should be the current PC + 4.
- For the case of **branch taken**, the next PC should be `jump_address_id`.
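The two cases above amount to a two-way selection on the next PC. As a minimal software sketch of that selection (a C reference model for checking expected values, not the Chisel code from the lab; the function name `next_pc` is made up for illustration and the parameter names simply mirror the signal names used by the test):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reference model of the next-PC selection described above. */
static uint32_t next_pc(uint32_t pc, bool jump_flag_id, uint32_t jump_address_id)
{
    /* Assumption: RV32I without compressed instructions, so the PC is 4-byte aligned. */
    assert((pc & 0x3u) == 0);

    /* Branch taken: use the jump target. Branch not taken: fall through to pc + 4. */
    return jump_flag_id ? jump_address_id : pc + 4u;
}
```

For instance, `next_pc(0x1000, false, 0x2000)` evaluates to `0x1004`, and `next_pc(0x1004, true, 0x1000)` evaluates to `0x1000`.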
#### Testbench
```shell
$ WRITE_VCD=1 sbt "testOnly riscv.singlecycle.InstructionFetchTest"
```

```
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/)
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] Run completed in 4 seconds, 625 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

#### Case 1: Branch Not Taken
**21 ps**: `io_jump_flag_id` is low, so the next value of `pc` is set to `pc + 4` (`0x1004`)

![image](https://hackmd.io/_uploads/HJbqgwQYT.png)

#### Case 2: Branch Taken
**23 ps**: `io_jump_flag_id` is high, so the next value of `pc` is set to `io_jump_address_id` (`0x1000`)

![image](https://hackmd.io/_uploads/SyAceDXtT.png)

### Instruction Decode Stage
The `InstructionDecoderTest` feeds the sw (`sw x10, 4(x0)`), lui (`lui x5, 2`), and add (`add x3, x1, x2`) instructions in sequence to test whether the `InstructionDecoder` module correctly decodes them into the corresponding fields.

The blanks left in `InstructionDecoder.scala` must be filled in to generate the `memory_read_enable` and `memory_write_enable` control signals for data memory (DM) reads and writes.

#### Testbench
```shell
$ WRITE_VCD=1 sbt "testOnly riscv.singlecycle.InstructionDecoderTest"
```

```
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/)
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] Run completed in 5 seconds, 118 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

**3 ps**: when `io_instruction` is `0x00A02223` (`sw a0, 4(x0)`), `io_memory_write_enable` should be `0b1`, `io_ex_immediate` should be `0x4`, `io_reg1_read_address` should be `0x0`, and `io_reg2_read_address` should be `0xA` (`a0`).

![image](https://hackmd.io/_uploads/SyyLBvmYa.png)

### Execute Stage
The `Execute` module is responsible for the following computations:
- arithmetic and logic computation
- address of data memory access
- control signal and target address of branches and jumps

Therefore, the `ExecuteTest` executes 100 additions (`add x3, x2, x1`) with randomly generated operands assigned to `x1` and `x2`, and then tests the branch (`beq x1, x2, 2`) functionality with both the taken and not-taken cases.

#### Testbench
```shell
$ WRITE_VCD=1 sbt "testOnly riscv.singlecycle.ExecuteTest"
```

```
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/)
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] Run completed in 4 seconds, 769 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

#### Case 1: Arithmetic Operations
`io_instruction` is `0x001101B3` (`add x3, x2, x1`)

**3 ps**: `io_reg1_data` is `0x0517DC88` and `io_reg2_data` is `0x0848030E`, so `io_mem_alu_result` should be `0x0D5FDF96` (`= 0x0517DC88 + 0x0848030E`)

**5 ps**: `io_reg1_data` is `0x0CC22046` and `io_reg2_data` is `0x049DBDC0`, so `io_mem_alu_result` should be `0x115FDE06` (`= 0x0CC22046 + 0x049DBDC0`)

![image](https://hackmd.io/_uploads/SJaEwvQtp.png)

#### Case 2: Branch Operations
`io_instruction` is `0x00208163` (`beq x1, x2, 2`)

**205 ps**: `io_reg1_data` is not equal to `io_reg2_data`, so `io_if_jump_flag` should be low.

**207 ps**: `io_reg1_data` is equal to `io_reg2_data`, so `io_if_jump_flag` should be high, and the next `pc` should be set to the value of `io_if_jump_address`.

![image](https://hackmd.io/_uploads/ByOVKD7K6.png)

### CPU Integration - Fibonacci
The `FibonacciTest` loads the `src/main/resources/fibonacci.asmbin` executable file, which recursively calculates the 10th term of the Fibonacci sequence.

```shell
$ WRITE_VCD=1 sbt "testOnly riscv.singlecycle.FibonacciTest"
```

```
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/)
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] Run completed in 6 seconds, 186 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

### CPU Integration - Quick Sort
The `QuicksortTest` loads the `src/main/resources/quicksort.asmbin` executable file, which performs a quicksort on 10 integers.

```shell
$ WRITE_VCD=1 sbt "testOnly riscv.singlecycle.QuicksortTest"
```

```
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/)
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] Run completed in 6 seconds, 840 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

### CPU Integration - Byte Access
The `ByteAccessTest` loads the `src/main/resources/sb.asmbin` executable file, which stores and loads a single byte.

```shell
$ WRITE_VCD=1 sbt "testOnly riscv.singlecycle.ByteAccessTest"
```

```
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/)
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] Run completed in 5 seconds, 959 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

### Putting It All Together
```shell
$ sbt test
```

```
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/)
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] InstructionDecoderTest:
[info] ExecuteTest:
[info] FibonacciTest:
[info] Execution of Single Cycle CPU
[info] InstructionDecoder of Single Cycle CPU
[info] - should execute correctly
[info] Single Cycle CPU
[info] - should produce correct control signal
[info] - should recursively calculate Fibonacci(10)
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] Run completed in 25 seconds, 9 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

## Software Porting
Port the program in [Homework 2](https://hackmd.io/@yutingshih/arch2023-homework2) to MyCPU.

### Modification
Instead of showing the result on the screen, I modified `csrc/ln_bf16.c` to write the result (the error code returned by the unit test function) to address `0x4`, the same as `csrc/fibonacci.c` does.
```c
int main()
{
    float average_error = 0, maximal_error = 0;
    int error_code = test_ln_bf16(40, &average_error, &maximal_error);
    /* Store the error code at address 0x4, where the testbench reads it back */
    *((volatile int *) (4)) = error_code;
    return error_code;
}
```

I reuse the Makefile from Homework 2 and add the targets `update` and `%.asmbin` for Homework 3.

```makefile
# Usage:
#   make [all]                              compile all the targets
#   make TARGET                             compile a specific target
#   make TARGET [TARGET [...]]              compile specific targets
#   make test                               run tests for all the targets
#   make test_TARGET                        run test for a specific target
#   make test_TARGET [test_TARGET [...]]    run tests for specific targets
#   make clean                              delete all the executables
#
# Example:
#   make                    (compile all the targets)
#   make clean test         (delete all the executables, compile and run all the tests)
#   make all test_mul_bf16  (compile all the targets but only run test for mul_bf16)

NAME ?= ln_bf16
ELF = $(NAME).elf
BIN = $(NAME).asmbin
OBJ = $(addsuffix .o, $(BIN))

CROSS ?= riscv-none-elf-
CC := $(CROSS)gcc
OBJCOPY := $(CROSS)objcopy

OPT := 0
CFLAGS := -Wall -O$(OPT)
CPPFLAGS =
LDLIBS = -lm

ifdef CROSS
CFLAGS += -march=rv32i_zicsr_zifencei -mabi=ilp32
RUNTIME ?= rv32emu
endif

all: $(BIN)

%.elf: %.c
	-$(CC) $(CPPFLAGS) $(CFLAGS) -o $@ $< $(LDLIBS)

%.asmbin: %.elf
	$(OBJCOPY) -O binary -j .text -j .data $< $@

test: $(addprefix test_, $(NAME))

test_%: %.elf
	-@$(RUNTIME) $<

bench: CPPFLAGS += -I../perfcounter
bench: LDLIBS += perfcounter/perfcount.a
bench: test

update: $(BIN)
	cp -f $(BIN) ../src/main/resources

clean:
	-@$(RM) -v $(BIN)
```

### Testbench
I extended the testbench in `src/test/scala/riscv/singlecycle/CPUTest.scala` with `LnBf16Test`, which runs the BFloat16 natural logarithm approximation from `src/main/resources/ln_bf16.asmbin` to verify the design of MyCPU. The test reads address `0x4` to get the error code of the unit test function and checks that it is `0`.
```scala
class LnBf16Test extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "calculate approximation of ln_bf16" in {
    test(new TestTopModule("ln_bf16.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 50) {
        c.clock.step(1000)
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      // Read back the error code that main() stored at address 0x4
      c.io.mem_debug_read_address.poke(4.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0.U)
    }
  }
}
```

### Correctness
Compile the C source code and copy the binary to the `src/main/resources` directory.
```shell
$ make -f ln_bf16.mk test_ln_bf16 update
```

Run the testbench.
```shell
$ WRITE_VCD=1 sbt "testOnly riscv.singlecycle.LnBf16Test"
```

```
[info] compiling 1 Scala source to /Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/target/scala-2.13/test-classes ...
[info] LnBf16Test:
[info] Single Cycle CPU
[info] - should calculate approximation of ln_bf16
[info] Run completed in 10 seconds, 276 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

At 102003 ps, the testbench reads memory address `0x4`, and the stored value is `0x0`, as expected (error code 0 means no error).

![image](https://hackmd.io/_uploads/rkoOUtmFp.png)

### Compatibility
To ensure compatibility between the program `ln_bf16.asmbin` and MyCPU, I temporarily removed the `RDCYCLE`/`RDCYCLEH` instructions by commenting out the benchmark-related C code in `ln_bf16.c`.
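An alternative to commenting the code out would be to guard the cycle-counter reads with a preprocessor macro, so that one source file can be built both with and without the counter instructions. The sketch below assumes a hypothetical macro `ENABLE_BENCH` and a hypothetical helper `read_cycles`; neither name comes from the original `ln_bf16.c`.

```c
#include <stdint.h>

/* ENABLE_BENCH and read_cycles are illustrative names, not part of ln_bf16.c. */
#ifdef ENABLE_BENCH
/* Read the 64-bit cycle counter on RV32. The high half is read twice and the
 * sequence is retried if it changed, which means the low half wrapped around
 * between the reads. */
static inline uint64_t read_cycles(void)
{
    uint32_t hi, lo, hi2;
    do {
        __asm__ volatile("rdcycleh %0" : "=r"(hi));
        __asm__ volatile("rdcycle %0" : "=r"(lo));
        __asm__ volatile("rdcycleh %0" : "=r"(hi2));
    } while (hi != hi2);
    return ((uint64_t)hi << 32) | lo;
}
#else
/* Without ENABLE_BENCH, no RDCYCLE/RDCYCLEH instruction is emitted. */
static inline uint64_t read_cycles(void)
{
    return 0;
}
#endif
```

Under this assumption, passing `-DENABLE_BENCH` to the compiler would keep the measurements for the emulator build, while the MyCPU build would simply omit the flag.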