# Assignment3: Your Own RISC-V CPU

contributed by [< rara0857 >](https://github.com/rara0857/ca2025-mycpu)

:::warning
**Disclaimer:** I used AI tools such as ChatGPT / Gemini to better understand algorithm and architecture descriptions.
:::

## Environment

- WSL version: 2.6.2.0
- WSL type: WSL2
- Distro: Ubuntu 24.04.1 LTS

## Environment Setup

```shell
sudo apt install build-essential verilator gtkwave
curl -s "https://get.sdkman.io" | bash
# Load SDKMAN into the current shell so the `sdk` command is available
source "$HOME/.sdkman/bin/sdkman-init.sh"
# Install the newest Temurin Java 8 build listed by SDKMAN
sdk install java $(sdk list java | grep -o "\b8\.[0-9]*\.[0-9]*\-tem" | head -1)
sdk install sbt
```

## Steps

Forked from https://github.com/sysprog21/ca2025-mycpu

Goal: finish `0-minimal`, `1-single-cycle`, `2-mmio-trap`, and `3-pipeline`, and make sure all the corresponding unit tests pass.

### Test

```shell
$ sbt "project minimal" test
$ sbt "project singleCycle" test
$ sbt "project mmioTrap" test
$ sbt "project pipeline" test
```

### 0-minimal

**Console output:**

```console.log
[info] JITTest:
[info] Minimal CPU - JIT Test
[info] - should correctly execute jit.asmbin and set a0 to 42
[info] Run completed in 46 seconds, 614 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

### 1-single-cycle

There are 9 exercises in `1-single-cycle`:

| Exercise | Name                            | File                    |
|:--------:|:-------------------------------:|:-----------------------:|
| 1        | Immediate Extension             | InstructionDecode.scala |
| 2        | Control Signal Generation       | InstructionDecode.scala |
| 3        | ALU Control Decode              | ALUControl.scala        |
| 4        | Branch Comparison Logic         | Execute.scala           |
| 5        | Jump Target Address Calculation | Execute.scala           |
| 6        | Load Data Extension             | MemoryAccess.scala      |
| 7        | Store Data Alignment            | MemoryAccess.scala      |
| 8        | Write-Back Multiplexer          | WriteBack.scala         |
| 9        | PC Update Logic                 | InstructionFetch.scala  |

**Console output:**

```console.log
[info] InstructionDecoderTest:
[info] InstructionDecoder
[info] - should decode RV32I instructions and generate correct control signals
[info] ByteAccessTest:
[info] Single Cycle CPU - Integration Tests
[info] - should correctly handle byte-level store/load operations (SB/LB)
[info] InstructionFetchTest:
[info] InstructionFetch
[info] - should correctly update PC and handle jumps
[info] ExecuteTest:
[info] Execute
[info] - should execute ALU operations and branch logic correctly
[info] FibonacciTest:
[info] Single Cycle CPU - Integration Tests
[info] - should correctly execute recursive Fibonacci(10) program
[info] RegisterFileTest:
[info] RegisterFile
[info] - should correctly read previously written register values
[info] - should keep x0 hardwired to zero (RISC-V compliance)
[info] - should support write-through (read during write cycle)
[info] QuicksortTest:
[info] Single Cycle CPU - Integration Tests
[info] - should correctly execute Quicksort algorithm on 10 numbers
[info] Run completed in 29 seconds, 613 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

### 2-mmio-trap

There are 13 exercises in `2-mmio-trap`:

| Exercise | Name                                        | File                    |
|:--------:|:-------------------------------------------:|:-----------------------:|
| 1        | Immediate Extension                         | InstructionDecode.scala |
| 2        | Control Signal Generation                   | InstructionDecode.scala |
| 3        | ALU Control Logic                           | ALUControl.scala        |
| 4        | Branch Comparison Logic                     | Execute.scala           |
| 5        | Jump Target Address Calculation             | Execute.scala           |
| 6        | Load Data Extension                         | MemoryAccess.scala      |
| 7        | Store Data Alignment                        | MemoryAccess.scala      |
| 8        | WriteBack Source Selection with CSR Support | WriteBack.scala         |
| 9        | CSR Register Lookup Table                   | CSR.scala               |
| 10       | CSR Write Priority Logic                    | CSR.scala               |
| 11       | Interrupt Entry: mstatus Transition         | CLINT.scala             |
| 12       | Trap Return (MRET): mstatus Restoration     | CLINT.scala             |
| 13       | PC Update Logic with Interrupts             | InstructionFetch.scala  |

**Console output:**

```console.log
[info] ByteAccessTest:
[info] [CPU] Byte access program
[info] - should store and load single byte
[info] CLINTCSRTest:
[info] [CLINT] Machine-mode interrupt flow
[info] - should handle external interrupt
[info] - should handle environmental instructions
[info] UartMMIOTest:
[info] [UART] Comprehensive TX+RX test
[info] - should pass all TX and RX tests
[info] ExecuteTest:
[info] [Execute] CSR write-back
[info] - should produce correct data for csr write
[info] FibonacciTest:
[info] [CPU] Fibonacci program
[info] - should calculate recursively fibonacci(10)
[info] TimerTest:
[info] [Timer] MMIO registers
[info] - should read and write the limit
[info] InterruptTrapTest:
[info] [CPU] Interrupt trap flow
[info] - should jump to trap handler and then return
[info] QuicksortTest:
[info] [CPU] Quicksort program
[info] - should quicksort 10 numbers
[info] Run completed in 33 seconds, 142 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

**Nyancat Animation:**

![Screenshot 2025-12-15 175021](https://hackmd.io/_uploads/rJiI5I6fWg.png)

### 3-pipeline

There are 5 coding exercises in `3-pipeline`, plus the hazard-analysis questions of Exercise 21 answered below:

| Exercise | Name                          | File                             |
|:--------:|:-----------------------------:|:--------------------------------:|
| 16       | ALU Operation Implementation  | ALU.scala                        |
| 17       | Data Forwarding to EX Stage   | fivestage_final/Forwarding.scala |
| 18       | Data Forwarding to ID Stage   | fivestage_final/Forwarding.scala |
| 19       | Pipeline Hazard Detection     | fivestage_final/Control.scala    |
| 20       | Pipeline Register Flush Logic | fivestage_final/IF2ID.scala      |

**Console output:**

```console.log
[info] PipelineProgramTest:
[info] Three-stage Pipelined CPU
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
[info] Five-stage Pipelined CPU with Stalling
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
[info] Five-stage Pipelined CPU with Forwarding
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
[info] Five-stage Pipelined CPU with Reduced Branch Delay
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
[info] PipelineUartTest:
[info] Three-stage Pipelined CPU UART Comprehensive Test
[info] - should pass all TX and RX tests
[info] Five-stage Pipelined CPU with Stalling UART Comprehensive Test
[info] - should pass all TX and RX tests
[info] Five-stage Pipelined CPU with Forwarding UART Comprehensive Test
[info] - should pass all TX and RX tests
[info] Five-stage Pipelined CPU with Reduced Branch Delay UART Comprehensive Test
[info] - should pass all TX and RX tests
[info] PipelineRegisterTest:
[info] Pipeline Register
[info] - should be able to stall and flush
[info] Run completed in 1 minute, 36 seconds.
[info] Total number of tests run: 29
[info] Suites: completed 3, aborted 0
[info] Tests: succeeded 29, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

## Hazard Detection Summary and Analysis (Exercise 21)

### Q1: Why do we need to stall for load-use hazards?

Load data only becomes available at the end of the MEM stage, after the memory read. The very next instruction needs that value at the start of its EX stage, which is the same cycle, and forwarding cannot forward a value that does not exist yet. The pipeline therefore has to **stall** for one cycle, after which the loaded value can be forwarded.

---

### Q2: What is the difference between "stall" and "flush" operations?

- **Stall:** pause pipeline progress by holding the **PC** and the affected pipeline registers, so the same instructions stay in place for another cycle.
- **Flush:** cancel an instruction by replacing it with a **NOP** (a bubble), so it does not change architectural state.

---

### Q3: Why does a jump instruction with a register dependency need a stall?

`jalr` needs `rs1` to compute the target address, and in this design the target is computed in the ID stage. If `rs1` is produced by an earlier instruction that has not finished (an ALU result still in EX, or a load still in EX or MEM), the value cannot be forwarded to ID in time, so the target is not ready yet. The pipeline must **stall** to avoid jumping to the wrong address. Branches resolved in ID have the same problem with the `rs1`/`rs2` values they compare.

---

### Q4: In this design, why is the branch penalty only 1 cycle instead of 2?

If branches were resolved in EX, the two instructions fetched behind the branch (in IF and ID) would both be wrong and would both have to be flushed. Here the branch/jump decision is made in the **ID stage** (using **ID forwarding** for its operands), so only the single instruction already fetched in **IF** is wrong, which gives a **1-cycle penalty**.

---

### Q5: What would happen if we removed the hazard detection logic entirely?
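Without hazard detection, a dependent instruction could read a stale operand (for example, use a load result before the load has read memory), and a branch/jump resolved in ID could compare or compute its target with stale register values. The CPU would then produce incorrect results or jump to wrong addresses, and because the failures depend on instruction ordering, they would appear only in some programs and look random.

---

### Q6: Complete the stall condition summary

#### Stall is needed when:

> 1. `(jump_id || load_ex)` and the ID-stage instruction reads `rd_ex`
>    - `rd_ex` matches `rs1`/`rs2`, and `rd_ex != x0`
> 2. `jump_id && load_mem` and the ID-stage instruction reads `rd_mem`
>    - `rd_mem` matches `rs1`/`rs2`, and `rd_mem != x0`

#### Flush is needed when:

> 1. a branch/jump is taken (`jump_flag`): flush **IF** (insert a **NOP**)

## Reference

- https://hackmd.io/@sysprog/B1Qxu2UkZx#Chisel-Tutorial
- https://hackmd.io/@sysprog/2025-arch-homework3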
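The Q6 stall and flush conditions can be cross-checked with a small software model. The sketch below is plain Scala rather than Chisel, so it runs without any hardware library; `HazardInputs`, `HazardOutputs`, `HazardModel`, and every field name are hypothetical and are not the IO ports of `fivestage_final/Control.scala`. It also assumes the usual bubble scheme: on a stall the PC and IF/ID hold, and the instruction leaving ID is replaced by a NOP.

```scala
// Hypothetical plain-Scala model of the Q6 hazard conditions (not the repository's Chisel code).
final case class HazardInputs(
    rs1Id: Int,       // ID-stage source register 1
    rs2Id: Int,       // ID-stage source register 2
    jumpId: Boolean,  // ID holds a branch/jump that is resolved in ID
    rdEx: Int,        // EX-stage destination register
    loadEx: Boolean,  // EX holds a load
    rdMem: Int,       // MEM-stage destination register
    loadMem: Boolean, // MEM holds a load
    jumpFlag: Boolean // a branch/jump is taken this cycle
)

final case class HazardOutputs(
    pcStall: Boolean, // hold the PC
    ifStall: Boolean, // hold the IF/ID register
    idFlush: Boolean, // turn the instruction leaving ID into a bubble (assumed scheme)
    ifFlush: Boolean  // replace the fetched instruction with a NOP
)

object HazardModel {
  // True when rd is a real register (not x0) that the ID-stage instruction reads.
  private def readById(rd: Int, in: HazardInputs): Boolean =
    rd != 0 && (rd == in.rs1Id || rd == in.rs2Id)

  def detect(in: HazardInputs): HazardOutputs = {
    // Condition 1: load-use on an EX load, or an ID jump needing an EX result
    // that cannot be forwarded to ID yet.
    val exHazard = (in.jumpId || in.loadEx) && readById(in.rdEx, in)
    // Condition 2: an ID jump needing a load result that is still in MEM.
    val memHazard = in.jumpId && in.loadMem && readById(in.rdMem, in)
    val stall = exHazard || memHazard
    // Flush rule from Q6: a taken branch/jump flushes IF.
    HazardOutputs(pcStall = stall, ifStall = stall, idFlush = stall, ifFlush = in.jumpFlag)
  }
}
```

For example, `lw x5, 0(x1)` in EX followed by `add x7, x5, x6` in ID stalls, while an ALU result in EX feeding a non-jump instruction in ID does not, because EX forwarding already covers that case.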