# Assignment3: Your Own RISC-V CPU
contributed by [< rara0857 >](https://github.com/rara0857/ca2025-mycpu)
::: warning
**Disclaimer:** I used AI tools such as ChatGPT / Gemini to better understand algorithm and architecture descriptions.
:::
## Environment
- WSL version: 2.6.2.0
- WSL type: WSL2
- Distro: Ubuntu 24.04.1 LTS
## Environment Setup
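```shell
# Toolchain for building and simulating the Chisel designs
sudo apt install build-essential verilator gtkwave
# Install SDKMAN!, then load it into the current shell so `sdk` is available
curl -s "https://get.sdkman.io" | bash
source "$HOME/.sdkman/bin/sdkman-init.sh"
# Install the first listed Temurin JDK whose version starts with 8, then sbt
sdk install java $(sdk list java | grep -o "\b8\.[0-9]*\.[0-9]*\-tem" | head -1)
sdk install sbt
```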
## Step
Forked from https://github.com/sysprog21/ca2025-mycpu
Goal: finish `0-minimal`, `1-single-cycle`, `2-mmio-trap`, `3-pipeline`, and make sure all the corresponding unit tests pass.
### Test
```shell
$ sbt "project minimal" test
$ sbt "project singleCycle" test
$ sbt "project mmioTrap" test
$ sbt "project pipeline" test
```
### 0-minimal
**Console output:**
```console.log
[info] JITTest:
[info] Minimal CPU - JIT Test
[info] - should correctly execute jit.asmbin and set a0 to 42
[info] Run completed in 46 seconds, 614 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
### 1-single-cycle
There are 9 exercises in `1-single-cycle`:
| Exercise | Name | File |
|:--------:|:-------------------------------:|:-----------------------:|
| 1 | Immediate Extension | InstructionDecode.scala |
| 2 | Control Signal Generation | InstructionDecode.scala |
| 3 | ALU Control Decode | ALUControl.scala |
| 4 | Branch Comparison Logic | Execute.scala |
| 5 | Jump Target Address Calculation | Execute.scala |
| 6 | Load Data Extension | MemoryAccess.scala |
| 7 | Store Data Alignment | MemoryAccess.scala |
| 8 | Write-Back Multiplexer | WriteBack.scala |
| 9 | PC Update Logic | InstructionFetch.scala |
**Console output:**
```console.log
[info] InstructionDecoderTest:
[info] InstructionDecoder
[info] - should decode RV32I instructions and generate correct control signals
[info] ByteAccessTest:
[info] Single Cycle CPU - Integration Tests
[info] - should correctly handle byte-level store/load operations (SB/LB)
[info] InstructionFetchTest:
[info] InstructionFetch
[info] - should correctly update PC and handle jumps
[info] ExecuteTest:
[info] Execute
[info] - should execute ALU operations and branch logic correctly
[info] FibonacciTest:
[info] Single Cycle CPU - Integration Tests
[info] - should correctly execute recursive Fibonacci(10) program
[info] RegisterFileTest:
[info] RegisterFile
[info] - should correctly read previously written register values
[info] - should keep x0 hardwired to zero (RISC-V compliance)
[info] - should support write-through (read during write cycle)
[info] QuicksortTest:
[info] Single Cycle CPU - Integration Tests
[info] - should correctly execute Quicksort algorithm on 10 numbers
[info] Run completed in 29 seconds, 613 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
### 2-mmio-trap
There are 13 exercises in `2-mmio-trap`:
| Exercise | Name | File |
|:--------:|:-------------------------------------------:|:-----------------------:|
| 1 | Immediate Extension | InstructionDecode.scala |
| 2 | Control Signal Generation | InstructionDecode.scala |
| 3 | ALU Control Logic | ALUControl.scala |
| 4 | Branch Comparison Logic | Execute.scala |
| 5 | Jump Target Address Calculation | Execute.scala |
| 6 | Load Data Extension | MemoryAccess.scala |
| 7 | Store Data Alignment | MemoryAccess.scala |
| 8 | WriteBack Source Selection with CSR Support | WriteBack.scala |
| 9 | CSR Register Lookup Table | CSR.scala |
| 10 | CSR Write Priority Logic | CSR.scala |
| 11 | Interrupt Entry — mstatus Transition | CLINT.scala |
| 12 | Trap Return (MRET) — mstatus Restoration | CLINT.scala |
| 13 | PC Update Logic with Interrupts | InstructionFetch.scala |
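
Exercises 11 and 12 deal with the `mstatus` transitions on interrupt entry and on `MRET`. The following is a minimal Python sketch of those two transitions as described in the RISC-V privileged specification, not the Chisel code in `CLINT.scala`. The function names are hypothetical, and the sketch assumes a machine-mode-only core, so `MPP` is always written as `0b11` on entry and left unchanged by `MRET`.

```python
MIE = 1 << 3          # mstatus.MIE: global machine interrupt enable
MPIE = 1 << 7         # mstatus.MPIE: MIE value saved on trap entry
MPP_M = 0b11 << 11    # mstatus.MPP = machine mode (assumed M-mode-only core)

def trap_entry(mstatus):
    """Interrupt entry: MPIE <- MIE, MIE <- 0, MPP <- M."""
    saved = MPIE if mstatus & MIE else 0
    return ((mstatus & ~(MIE | MPIE)) | saved | MPP_M) & 0xFFFFFFFF

def mret(mstatus):
    """Trap return: MIE <- MPIE, MPIE <- 1 (MPP left as M in this sketch)."""
    restored = MIE if mstatus & MPIE else 0
    return ((mstatus & ~MIE) | restored | MPIE) & 0xFFFFFFFF

entered = trap_entry(0x8)    # interrupts were enabled before the trap
print(hex(entered))          # 0x1880
print(hex(mret(entered)))    # 0x1888
```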
**Console output:**
```console.log
[info] ByteAccessTest:
[info] [CPU] Byte access program
[info] - should store and load single byte
[info] CLINTCSRTest:
[info] [CLINT] Machine-mode interrupt flow
[info] - should handle external interrupt
[info] - should handle environmental instructions
[info] UartMMIOTest:
[info] [UART] Comprehensive TX+RX test
[info] - should pass all TX and RX tests
[info] ExecuteTest:
[info] [Execute] CSR write-back
[info] - should produce correct data for csr write
[info] FibonacciTest:
[info] [CPU] Fibonacci program
[info] - should calculate recursively fibonacci(10)
[info] TimerTest:
[info] [Timer] MMIO registers
[info] - should read and write the limit
[info] InterruptTrapTest:
[info] [CPU] Interrupt trap flow
[info] - should jump to trap handler and then return
[info] QuicksortTest:
[info] [CPU] Quicksort program
[info] - should quicksort 10 numbers
[info] Run completed in 33 seconds, 142 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
**Nyancat Animation:**

### 3-pipeline
There are 5 exercises (16 to 20) in `3-pipeline`:
| Exercise | Name | File |
|:--------:|:-------------------------------------------:|:-------------------------------------------:|
| 16 | ALU Operation Implementation | ALU.scala |
| 17 | Data Forwarding to EX Stage | fivestage_final/Forwarding.scala |
| 18 | Data Forwarding to ID Stage | fivestage_final/Forwarding.scala |
| 19 | Pipeline Hazard Detection | fivestage_final/Control.scala |
| 20 | Pipeline Register Flush Logic | fivestage_final/IF2ID.scala |
**Console output:**
```console.log
[info] PipelineProgramTest:
[info] Three-stage Pipelined CPU
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
[info] Five-stage Pipelined CPU with Stalling
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
[info] Five-stage Pipelined CPU with Forwarding
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
[info] Five-stage Pipelined CPU with Reduced Branch Delay
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
[info] PipelineUartTest:
[info] Three-stage Pipelined CPU UART Comprehensive Test
[info] - should pass all TX and RX tests
[info] Five-stage Pipelined CPU with Stalling UART Comprehensive Test
[info] - should pass all TX and RX tests
[info] Five-stage Pipelined CPU with Forwarding UART Comprehensive Test
[info] - should pass all TX and RX tests
[info] Five-stage Pipelined CPU with Reduced Branch Delay UART Comprehensive Test
[info] - should pass all TX and RX tests
[info] PipelineRegisterTest:
[info] Pipeline Register
[info] - should be able to stall and flush
[info] Run completed in 1 minute, 36 seconds.
[info] Total number of tests run: 29
[info] Suites: completed 3, aborted 0
[info] Tests: succeeded 29, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
## Hazard Detection Summary and Analysis (Exercise 21)
### Q1 — Why do we need to stall for load-use hazards?
A load's data is only available after the memory read completes.
The next instruction may need it earlier, and forwarding can't forward a value that doesn't exist yet → **stall** until the loaded value can be forwarded.
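
The timing argument above can be checked with a small Python sketch. It assumes a classic five-stage pipeline in which a result exists at the end of its producing stage and can be forwarded from the following cycle onward. The stage numbering and the `stall_cycles` helper are illustrative assumptions, not names from the repository.

```python
# Stage indices of a classic five-stage pipeline (assumption of this sketch).
IF, ID, EX, MEM, WB = range(5)

def stall_cycles(produce_stage, consume_stage, distance=1):
    """Bubbles needed between a producer and a consumer `distance` instructions later.

    The producer's result exists at the end of `produce_stage`; the consumer
    needs it at the start of `consume_stage`. With ideal forwarding, a value
    produced in cycle c is usable from cycle c + 1 onward.
    """
    ready_cycle = produce_stage + 1           # first cycle the value can be forwarded
    needed_cycle = distance + consume_stage   # cycle the unstalled consumer would use it
    return max(0, ready_cycle - needed_cycle)

# ALU result (end of EX) used by the next instruction in EX: forwarding suffices.
print(stall_cycles(EX, EX))    # 0
# Load result (end of MEM) used by the next instruction in EX: one bubble.
print(stall_cycles(MEM, EX))   # 1
# Load result used by a jump resolved in ID right behind it: two bubbles.
print(stall_cycles(MEM, ID))   # 2
```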
---
### Q2 — What is the difference between "stall" and "flush" operations?
- **Stall:** pause pipeline progress (hold **PC** / pipeline registers).
- **Flush:** cancel an instruction by injecting a **NOP**, so it won’t change architectural state.
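
The difference between the two operations can be modelled with one pipeline register in Python. This is a hedged sketch: the `PipelineRegister` class below is a software stand-in rather than the Chisel module, and giving flush priority over stall is an assumption of the sketch.

```python
NOP = 0x00000013  # addi x0, x0, 0: the canonical RISC-V NOP encoding

class PipelineRegister:
    """Software model of one pipeline register with stall and flush inputs."""

    def __init__(self, reset_value=NOP):
        self.reset_value = reset_value
        self.value = reset_value

    def tick(self, next_value, stall=False, flush=False):
        """Advance one clock edge and return the value now held."""
        if flush:                    # flush: replace the content with a bubble
            self.value = self.reset_value
        elif not stall:              # normal operation: latch the new input
            self.value = next_value
        # stall without flush: keep the old value, the instruction waits here
        return self.value

reg = PipelineRegister()
print(hex(reg.tick(0x00500093)))               # 0x500093 (latched)
print(hex(reg.tick(0x00108133, stall=True)))   # 0x500093 (held)
print(hex(reg.tick(0x00108133, flush=True)))   # 0x13     (bubble)
```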
---
### Q3 — Why does jump instruction with register dependency need stall?
Jumps are resolved in the ID stage, so `jalr` needs `rs1` there to compute the target address.
If `rs1` depends on a previous unfinished instruction (especially a load), the target isn't ready → **stall** to avoid jumping to the wrong address.
---
### Q4 — In this design, why is branch penalty only 1 cycle instead of 2?
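Branches and jumps are resolved in the **ID stage** (with **ID forwarding**), so when one is taken only the instruction already fetched in **IF** is on the wrong path and must be flushed → **1-cycle penalty**. Resolving in EX would leave two wrong-path instructions in the pipeline (in IF and ID).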
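
The penalty count can be written down as a tiny Python sketch. It assumes one instruction is fetched per cycle and that every instruction younger than a taken branch is discarded once the branch resolves. The helper names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def branch_penalty(resolve_stage):
    """Wrong-path instructions fetched before a taken branch redirects the PC.

    One younger instruction enters the pipeline for every stage the branch
    passes through before it is resolved.
    """
    return STAGES.index(resolve_stage)

def total_cycles(instructions, taken_branches, resolve_stage):
    """Cycle count with no other stalls: pipeline fill + one per instruction + penalties."""
    fill = len(STAGES) - 1
    return fill + instructions + taken_branches * branch_penalty(resolve_stage)

print(branch_penalty("ID"))           # 1
print(branch_penalty("EX"))           # 2
print(total_cycles(100, 10, "ID"))    # 114
print(total_cycles(100, 10, "EX"))    # 124
```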
---
### Q5 — What would happen if we removed the hazard detection logic entirely?
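Without hazard detection, dependent instructions would read stale register values, and branches/jumps would be resolved with wrong operands or leave wrong-path instructions in the pipeline → incorrect results that depend on how instructions happen to be ordered.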
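
A stale operand read can be reproduced with a very small Python model. This is only a sketch under simplifying assumptions: registers are read in ID (cycle `i + 1` for instruction `i`), written in WB (cycle `i + 4`) with write-through, and `hazard_handling=True` stands in for "stalling and forwarding always deliver the newest value". The `execute` and `commit` helpers and the tuple format are made up for illustration.

```python
def commit(regs, pending, up_to_cycle):
    """Apply every pending write whose write-back cycle has been reached."""
    remaining = []
    for cycle, rd, value in pending:
        if cycle <= up_to_cycle:
            if rd != 0:               # x0 stays hardwired to zero
                regs[rd] = value
        else:
            remaining.append((cycle, rd, value))
    return remaining

def execute(program, hazard_handling):
    """Run (op, rd, rs1, src2) tuples; op is "addi" (src2 = imm) or "add" (src2 = rs2)."""
    regs = [0] * 32
    pending = []                      # (write-back cycle, rd, value)
    for i, (op, rd, rs1, src2) in enumerate(program):
        read_cycle = i + 1            # operands are read in ID
        visible = float("inf") if hazard_handling else read_cycle
        pending = commit(regs, pending, visible)
        a = regs[rs1]
        b = src2 if op == "addi" else regs[src2]
        pending.append((i + 4, rd, (a + b) & 0xFFFFFFFF))
    commit(regs, pending, float("inf"))
    return regs

program = [("addi", 1, 0, 5),         # x1 = 5
           ("add", 2, 1, 1)]          # x2 = x1 + x1
print(execute(program, hazard_handling=True)[2])    # 10
print(execute(program, hazard_handling=False)[2])   # 0 (x1 was read before write-back)
```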
---
### Q6 — Complete the stall condition summary:
#### Stall is needed when:
1. `(jump_id || load_ex)` and ID uses `rd_ex`
   - `rd_ex` matches `rs1`/`rs2`, and `rd_ex != x0`
2. `jump_id && load_mem` and ID uses `rd_mem`
   - `rd_mem` matches `rs1`/`rs2`, and `rd_mem != x0`
#### Flush is needed when:
1. a branch/jump is taken (`jump_flag`) → flush **IF** (insert **NOP**)
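
The summary above can be restated as a small Python function. This is a behavioural sketch using the signal names from the summary, not the Chisel code in `fivestage_final/Control.scala`. The `HazardInputs` container and the choice to let a stall take priority over a flush in the same cycle are assumptions of the sketch.

```python
from dataclasses import dataclass

@dataclass
class HazardInputs:
    rs1_id: int = 0           # source registers of the instruction in ID
    rs2_id: int = 0
    jump_id: bool = False     # ID holds a branch/jump, resolved in ID
    rd_ex: int = 0            # destination register of the instruction in EX
    load_ex: bool = False     # EX holds a load
    rd_mem: int = 0           # destination register of the instruction in MEM
    load_mem: bool = False    # MEM holds a load
    jump_flag: bool = False   # branch/jump in ID is taken

def id_uses(rd, inp):
    """True when ID reads `rd`; x0 never creates a dependency."""
    return rd != 0 and rd in (inp.rs1_id, inp.rs2_id)

def hazard_control(inp):
    """Return (stall, flush_if) following the two stall rules and the flush rule."""
    stall = ((inp.jump_id or inp.load_ex) and id_uses(inp.rd_ex, inp)) or \
            (inp.jump_id and inp.load_mem and id_uses(inp.rd_mem, inp))
    flush_if = inp.jump_flag and not stall   # assumption: stall wins over flush
    return stall, flush_if

# Load-use: `lw x5, ...` in EX while ID reads x5 -> stall.
print(hazard_control(HazardInputs(rs1_id=5, rd_ex=5, load_ex=True)))   # (True, False)
# Taken jump with no dependency -> flush IF.
print(hazard_control(HazardInputs(jump_id=True, jump_flag=True)))      # (False, True)
```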
## Reference
- https://hackmd.io/@sysprog/B1Qxu2UkZx#Chisel-Tutorial
- https://hackmd.io/@sysprog/2025-arch-homework3