
# Assignment 3: Your Own RISC-V CPU

Contributed by <`ryanycs`>

[GitHub repo](https://github.com/ryanycs/ca2025-mycpu)

[TOC]

## Chisel Bootcamp

### Hello World in Chisel

```scala
import chisel3._

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  // Counter limit: the LED toggles once every 25,000,000 clock cycles.
  val CNT_MAX = (50000000 / 2 - 1).U

  val cntReg = RegInit(0.U(32.W)) // free-running cycle counter
  val blkReg = RegInit(0.U(1.W))  // current LED state

  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
```

The Chisel "Hello World" example implements an LED blinker driven by a counter.

## 1-single-cycle

### Test Summary

#### InstructionDecoderTest

This test validates the Instruction Decode (ID) stage of a single-cycle RV32I CPU. It checks that each instruction type generates the correct control signals, including:
- `ex_aluop1_source`
- `ex_aluop2_source`
- `ex_immediate`
- `regs_reg1_read_address`
- `reg_write_enable`
- `reg_write_address`
- `wb_reg_write_source`
- `memory_read_enable`
- `memory_write_enable`

#### InstructionFetchTest

This test validates the Instruction Fetch (IF) stage of the single-cycle CPU. It focuses on the Program Counter (PC) update logic: either a sequential increment (`PC + 4`) or a control-flow change (`Jump`). The test randomly switches the control signals between "no jump" and "jump" and expects the fetch stage to update the PC correctly in both cases.

#### ExecuteTest

This test validates the Execute (EX) stage of the single-cycle CPU. It focuses on the ALU computation result (`ADD`), the branch conditions (`BEQ`, `BEQU`, `BNE`), and jump target generation.

#### RegisterFileTest

This test validates the Register File module of the single-cycle RISC-V CPU. It focuses on register writes and reads, the behavior of register `x0`, and write-through support.

#### CPUTest

This test validates the full single-cycle CPU integration. It focuses on end-to-end execution of real RISC-V programs, such as `fibonacci.asmbin` and `quicksort.asmbin`.
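
To make the `x0` and write-through behaviors checked by RegisterFileTest concrete, here is a minimal behavioral model in plain Scala (not Chisel). The class and method names are hypothetical and not taken from the repository. The sketch only assumes 32 registers of 32 bits, a read port that sees a write issued in the same cycle, and an `x0` that always reads as zero.

```scala
// Behavioral model only (plain Scala, not Chisel); all names are hypothetical.
final class RegFileModel {
  private val regs = Array.fill(32)(0L)

  // Commit a write at the clock edge; writes to x0 are dropped.
  def write(enable: Boolean, addr: Int, data: Long): Unit =
    if (enable && addr != 0) regs(addr) = data & 0xffffffffL

  // Combinational read with write-through: reading the register that is
  // being written in the same cycle returns the incoming data.
  def read(addr: Int, wEnable: Boolean, wAddr: Int, wData: Long): Long =
    if (addr == 0) 0L
    else if (wEnable && wAddr == addr) wData & 0xffffffffL
    else regs(addr)
}
```

In this model, a read of `x5` while `x5` is being written returns the new value before the write is committed, which is the situation the write-through test case describes.
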
### Issues Encountered

#### MuxLookup

When an expression is used as the result value in a `MuxLookup` mapping, the expression must be wrapped in parentheses:

```scala
// ============================================================
// [CA25: Exercise 8] WriteBack Source Selection
// ============================================================
io.regs_write_data := MuxLookup(io.regs_write_source, io.alu_result)(
  Seq(
    RegWriteSource.Memory                 -> io.memory_read_data,
    RegWriteSource.NextInstructionAddress -> (io.instruction_address + 4.U) // <- parentheses required
  )
)
```

Without the parentheses, the compiler reports a type mismatch error.

```log
[error] /home/ryanycs/ca2025-mycpu/1-single-cycle/src/main/scala/riscv/core/WriteBack.scala:49:75: type mismatch;
[error]  found   : chisel3.UInt
[error]  required: String
[error]         RegWriteSource.NextInstructionAddress -> io.instruction_address + 4.U
[error]                                                                           ^
[error] one error found
[error] (Compile / compileIncremental) Compilation failed
```

This error confused me for a while because the `type mismatch` message does not directly indicate that the problem is caused by missing parentheses.
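
The underlying cause is Scala operator precedence: `->` and `+` have the same precedence and associate to the left, so `a -> b + c` is parsed as `(a -> b) + c`. Scala 2 then tries to apply string concatenation to the resulting tuple, which is why the compiler asks for a `String`. The following plain-Scala sketch (no Chisel needed; the object and value names are made up for illustration) shows the parenthesized form producing the intended pair:

```scala
object ArrowPrecedence {
  val base = 10

  // With parentheses, the second element of the pair is the sum.
  val withParens: (String, Int) = "next" -> (base + 4)

  // Equivalent explicit tuple, for comparison.
  val explicit: (String, Int) = ("next", base + 4)

  // Without parentheses, `"next" -> base + 4` would parse as
  // `("next" -> base) + 4`, which fails to compile because `+` on a
  // tuple only accepts a String operand in Scala 2.

  def main(args: Array[String]): Unit = {
    assert(withParens == explicit)
    println(withParens) // prints (next,14)
  }
}
```
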
### Test Result

```log
[info] InstructionDecoderTest:
[info] InstructionDecoder
[info] - should decode RV32I instructions and generate correct control signals
[info] ByteAccessTest:
[info] Single Cycle CPU - Integration Tests
[info] - should correctly handle byte-level store/load operations (SB/LB)
[info] InstructionFetchTest:
[info] InstructionFetch
[info] - should correctly update PC and handle jumps
[info] ExecuteTest:
[info] Execute
[info] - should execute ALU operations and branch logic correctly
[info] FibonacciTest:
[info] Single Cycle CPU - Integration Tests
[info] - should correctly execute recursive Fibonacci(10) program
[info] RegisterFileTest:
[info] RegisterFile
[info] - should correctly read previously written register values
[info] - should keep x0 hardwired to zero (RISC-V compliance)
[info] - should support write-through (read during write cycle)
[info] QuicksortTest:
[info] Single Cycle CPU - Integration Tests
[info] - should correctly execute Quicksort algorithm on 10 numbers
[info] Run completed in 30 seconds, 312 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 31 s, completed Dec 8, 2025 5:57:28 PM
```

#### Compliance Test

![image](https://hackmd.io/_uploads/Hkhhe7VGWl.png)

## 2-mmio-trap

### Test Summary

#### CLINTCSRTest

This test validates the Core Local Interruptor (CLINT) and the Control and Status Registers (CSR) of the single-cycle CPU. It focuses on machine-mode interrupt handling, trap/exception processing, and environmental instructions.

#### ExecuteTest

This test validates the CSR (Control and Status Register) write-back logic within the Execute stage of the single-cycle CPU. It focuses on computing the correct value to write to the CSR registers for the different CSR instructions.

#### TimerTest

This test validates the Timer module’s memory-mapped I/O (MMIO) registers. Specifically, it checks that the limit register and the enable status can be written and read back correctly.

#### UartMMIOTest

This test validates the UART MMIO interface in the single-cycle CPU. It ensures that TX and RX operations, as well as baud rate and enable control, function correctly when driven by CPU memory accesses.

### Test Result

```log
[info] ByteAccessTest:
[info] [CPU] Byte access program
[info] - should store and load single byte
[info] CLINTCSRTest:
[info] [CLINT] Machine-mode interrupt flow
[info] - should handle external interrupt
[info] - should handle environmental instructions
[info] UartMMIOTest:
[info] [UART] Comprehensive TX+RX test
[info] - should pass all TX and RX tests
[info] ExecuteTest:
[info] [Execute] CSR write-back
[info] - should produce correct data for csr write
[info] FibonacciTest:
[info] [CPU] Fibonacci program
[info] - should calculate recursively fibonacci(10)
[info] TimerTest:
[info] [Timer] MMIO registers
[info] - should read and write the limit
[info] InterruptTrapTest:
[info] [CPU] Interrupt trap flow
[info] - should jump to trap handler and then return
[info] QuicksortTest:
[info] [CPU] Quicksort program
[info] - should quicksort 10 numbers
[info] Run completed in 32 seconds, 550 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 33 s, completed Dec 8, 2025 5:58:58 PM
```

#### Compliance Test

![image](https://hackmd.io/_uploads/SJRAU7NGZx.png)

## 3-pipeline

### Test Summary

#### PipelineProgramTest

This test verifies that the pipelined CPU:
- Executes programs correctly (`fibonacci.asmbin`, `quicksort.asmbin`, `sb.asmbin`).
- Handles data and control hazards correctly.
- Handles machine-mode traps.

#### PipelineRegisterTest

This test validates that the pipeline register module correctly handles:
- normal data propagation (`stall = false`, `flush = false`): the register takes the current input value
- stall (`stall = true`, `flush = false`): the register keeps its previous value
- flush (`stall = false`, `flush = true`): the register returns to its default value

### Hazard Detection Summary and Analysis

- Why do we need to stall for load-use hazards? (Hint: Consider data dependency and forwarding limitations)

> [!Note] Ans
> Forwarding is sufficient to solve RAW data hazards when the result is computed in the `EX` stage of an instruction, because the result can then be forwarded to the `EX` stage of the next instruction. However, the `lw` instruction does not finish reading data until the end of the `MEM` stage, so its result cannot be forwarded to the `EX` stage of the next instruction.
>
> A solution is to *stall* the pipeline, holding up operation until the data is available.

- What is the difference between "stall" and "flush" operations? (Hint: Compare their effects on pipeline registers and PC)

> [!Note] Ans
> Stalling a stage is performed by **disabling its pipeline register** (i.e., the register to the left of the stage) so that the stage’s inputs do not change. When a stage is stalled, all previous stages must also be stalled so that no subsequent instructions are lost. The pipeline register directly after the stalled stage must be **cleared** (flushed) to prevent bogus information from propagating forward.
>
> More specifically, *stall* is the control signal connected to the *enable* input of the pipeline registers, and *flush* is the control signal connected to their *reset* input.

- Why does jump instruction with register dependency need stall? (Hint: When is jump target address available?)

> [!Note] Ans
> Because the jump target address is computed in the `ID` stage, a jump instruction with a register dependency (e.g., `jalr x0, x1, 0`) must stall when the register it reads (`x1`) is still being produced by a preceding instruction that has not yet reached the `WB` stage, or has not yet reached a stage from which its result can be forwarded to the `ID` stage:
> ```
> IF   ID   EX   MEM  WB             (add x1, x2, x3)
>                │
>                │ forward
>                ▼
>      IF   ID   ID   EX   MEM  WB   (jalr x0, x1, 0)
> ```

- In this design, why is branch penalty only 1 cycle instead of 2? (Hint: Compare ID-stage vs EX-stage branch resolution)

> [!Note] Ans
> The branch penalty is only 1 cycle because branch resolution is moved from the `EX` stage to the `ID` stage.
> - ID-stage branch resolution
>   If the branch is taken, only the instruction in the IF/ID register needs to be flushed. The penalty is 1 cycle.
>   ```
>   IF   ID
>        IF   ◄─ flush!
>   ```
> - EX-stage branch resolution
>   If the branch is taken, two instructions must be flushed (the one in IF/ID and the one in ID/EX). The penalty is 2 cycles.
>   ```
>   IF   ID   EX
>        IF   ID   ◄─ flush!
>             IF   ◄─ flush!
>   ```

- What would happen if we removed the hazard detection logic entirely? (Hint: Consider data hazards and control flow correctness)

> [!Note] Ans
> If the hazard detection logic were removed entirely, the pipelined CPU would compute wrong results. Two kinds of hazards would go unhandled:
> - Data Hazard
>   A data hazard occurs when an instruction tries to read a register that has not yet been written back by a previous instruction.
> - Control Hazard
>   A control hazard occurs when the decision of which instruction to fetch next has not been made by the time the fetch takes place.

- Complete the stall condition summary:

> [!Note] Ans
> Stall is needed when:
> 1. The instruction in the `ID` stage is a `Jump` or the instruction in the `EX` stage is a `Load`, and the instruction in the `ID` stage requires a register value being produced by the instruction in the `EX` stage. (EX stage condition)
> 2. The `Jump` instruction in the `ID` stage requires a register value being produced by a `Load` instruction in the `MEM` stage. (MEM stage condition)
>
> Flush is needed when:
> 1. Branch is taken (Branch/Jump condition)

### Handwritten RISC-V Assembly Code in Homework2

I use `int32_to_bf16.S` to check that the pipelined RISC-V CPU functions correctly.

#### 3-pipeline/src/test/scala/riscv/PipelineProgramTest.scala

```scala
it should "convert int32 to bf16 correctly" in {
  runProgram("int32_to_bf16.asmbin", cfg) { c =>
    c.clock.setTimeout(0) // 0 means no timeout on clock stepping
    c.clock.step(10000)
    c.io.regs_debug_read_address.poke(10.U) // a0
    c.io.regs_debug_read_data.expect(0x0.U)
  }
}
```

```log
[info] PipelineProgramTest:
[info] Three-stage Pipelined CPU
[info] - should convert int32 to bf16 correctly
[info] Five-stage Pipelined CPU with Stalling
[info] - should convert int32 to bf16 correctly
[info] Five-stage Pipelined CPU with Forwarding
[info] - should convert int32 to bf16 correctly
[info] Five-stage Pipelined CPU with Reduced Branch Delay
[info] - should convert int32 to bf16 correctly
```

### Issues Encountered

#### chiseltest.TimeoutException

To test my pipelined CPU, I used `int32_to_bf16.S`, which I wrote in Homework 2. The program needs more than 1000 cycles to complete. However, after I changed the clock stepping from `c.clock.step(1000)` to `c.clock.step(10000)`, the test failed with a `chiseltest.TimeoutException` reporting a timeout at 1000 idle cycles.

```log
[info] - should convert int32 to bf16 correctly *** FAILED ***
[info] chiseltest.TimeoutException: timeout on TestTopModule.clock: IO[Clock] at 1000 idle cycles. You can extend the timeout by calling .setTimeout(<n>) on your clock (setting it to 0 means 'no timeout').
```

The solution is to add `c.clock.setTimeout(0)`, which tells chiseltest that there is no timeout for clock stepping.

### Test Result

```log
[info] PipelineProgramTest:
[info] Three-stage Pipelined CPU
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
[info] Five-stage Pipelined CPU with Stalling
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
[info] Five-stage Pipelined CPU with Forwarding
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
[info] Five-stage Pipelined CPU with Reduced Branch Delay
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
[info] PipelineUartTest:
[info] Three-stage Pipelined CPU UART Comprehensive Test
[info] - should pass all TX and RX tests
[info] Five-stage Pipelined CPU with Stalling UART Comprehensive Test
[info] - should pass all TX and RX tests
[info] Five-stage Pipelined CPU with Forwarding UART Comprehensive Test
[info] - should pass all TX and RX tests
[info] Five-stage Pipelined CPU with Reduced Branch Delay UART Comprehensive Test
[info] - should pass all TX and RX tests
[info] PipelineRegisterTest:
[info] Pipeline Register
[info] - should be able to stall and flush
[info] Run completed in 1 minute, 53 seconds.
[info] Total number of tests run: 29
[info] Suites: completed 3, aborted 0
[info] Tests: succeeded 29, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 114 s (01:54), completed Dec 8, 2025 6:05:29 PM
```

#### Compliance Test

![image](https://hackmd.io/_uploads/r1kWwXNGZe.png)

## Acknowledgment

This assignment was completed with assistance from [GitHub Copilot](https://github.com/features/copilot) autocompletion for writing code and comments (Agent mode was not used). [ChatGPT](https://chatgpt.com/) was used to refine the writing of this document.

## Reference

- Digital Design and Computer Architecture, RISC-V Edition