# Assignment 3: Your Own RISC-V CPU
Contributed by <`ryanycs`>
[GitHub repo](https://github.com/ryanycs/ca2025-mycpu)
[TOC]
## Chisel Bootcamp
### Hello World in Chisel
```scala
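import chisel3._

// Blinks an LED by toggling a 1-bit register each time a counter wraps.
class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  // Toggle interval in clock cycles, minus one (the counter starts at 0).
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg = RegInit(0.U(32.W)) // free-running cycle counter
  val blkReg = RegInit(0.U(1.W))  // current LED state

  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    // Wrap the counter and invert the LED.
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}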
```
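The 'Hello World in Chisel' example implements an LED blinker using a counter. `cntReg` increments every clock cycle; each time it reaches `CNT_MAX` it resets to zero and `blkReg`, which drives `io.led`, is inverted. The LED therefore toggles every 25,000,000 cycles, which is one full blink per second if the clock runs at 50 MHz (an assumption suggested by the constant, not stated in the code).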
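
The counter-and-toggle behavior can be checked in software with a small model. The following is only a plain-Scala sketch, not hardware and not part of the assignment code; `BlinkModel` and `ledTrace` are hypothetical names, and a small `cntMax` stands in for `CNT_MAX` to keep the trace short.

```scala
object BlinkModel {
  // Software model of the counter/toggle logic (not Chisel hardware).
  // Returns the LED value visible during each simulated clock cycle.
  def ledTrace(cntMax: Int, cycles: Int): Vector[Int] = {
    var cnt = 0 // models cntReg
    var led = 0 // models blkReg
    val trace = Vector.newBuilder[Int]
    for (_ <- 0 until cycles) {
      trace += led // registers show their old value during the cycle
      if (cnt == cntMax) {
        cnt = 0  // wrap the counter
        led ^= 1 // invert the LED
      } else {
        cnt += 1
      }
    }
    trace.result()
  }
}
```

With `cntMax = 2` the LED holds each value for three cycles, which mirrors the `CNT_MAX + 1` cycle interval of the Chisel module.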
## 1-single-cycle
### Test Summary
#### InstructionDecoderTest
This test validates the Instruction Decode (ID) stage of a single-cycle RV32I CPU. It checks that each instruction type generates the correct control signals, including:
- `ex_aluop1_source`
- `ex_aluop2_source`
- `ex_immediate`
- `regs_reg1_read_address`
- `reg_write_enable`
- `reg_write_address`
- `wb_reg_write_source`
- `memory_read_enable`
- `memory_write_enable`
#### InstructionFetchTest
This test validates the Instruction Fetch (IF) stage of the single-cycle CPU. It focuses on the Program Counter (PC) update logic: either a sequential increment (`PC + 4`) or a control-flow change (`Jump`).
The test randomly switches the control signals between "no jump" and "jump" and expects the instruction fetch unit to update the PC correctly in both cases.
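
The PC update rule being tested can be sketched as a pure function. This is a simplified plain-Scala model under my own assumptions; `FetchModel`, `jumpFlag`, and `jumpAddress` are illustrative names and may not match the signal names in the actual module.

```scala
object FetchModel {
  private val Mask = 0xFFFFFFFFL // keep addresses within 32 bits

  // Next-PC selection: take the jump target when a jump is requested,
  // otherwise advance to the next 4-byte instruction.
  def nextPc(pc: Long, jumpFlag: Boolean, jumpAddress: Long): Long =
    if (jumpFlag) jumpAddress & Mask else (pc + 4) & Mask
}
```

For example, `FetchModel.nextPc(0x1000L, false, 0x2000L)` gives `0x1004`, while the same call with `jumpFlag = true` gives `0x2000`.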
#### ExecuteTest
This test validates the Execute (EX) stage of the single-cycle CPU. It focuses on ALU computation results (`ADD`), branch conditions (`BEQ`, `BNE`), and jump target generation.
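
As a reference for what the EX stage has to compute, here is a minimal plain-Scala sketch of 32-bit addition, the six RV32I branch conditions, and the PC-relative target. The object and method names are hypothetical, and the test described above covers only a subset of these branches.

```scala
object ExecuteModel {
  private val Mask = 0xFFFFFFFFL

  // 32-bit wrapping addition, as performed by the ALU for ADD.
  def add(a: Long, b: Long): Long = (a + b) & Mask

  // Branch decision for the RV32I conditional branches.
  // Operands are 32-bit values carried in a Long.
  def branchTaken(op: String, rs1: Long, rs2: Long): Boolean = op match {
    case "beq"  => (rs1 & Mask) == (rs2 & Mask)
    case "bne"  => (rs1 & Mask) != (rs2 & Mask)
    case "blt"  => rs1.toInt < rs2.toInt          // signed compare
    case "bge"  => rs1.toInt >= rs2.toInt         // signed compare
    case "bltu" => (rs1 & Mask) < (rs2 & Mask)    // unsigned compare
    case "bgeu" => (rs1 & Mask) >= (rs2 & Mask)   // unsigned compare
    case other  => throw new IllegalArgumentException(s"unknown branch: $other")
  }

  // Branch/JAL target: PC plus the sign-extended immediate.
  def branchTarget(pc: Long, imm: Int): Long = (pc + imm) & Mask
}
```

The signed and unsigned comparisons differ only for operands with the top bit set: `0xFFFFFFFF` is less than `1` for `blt` but not for `bltu`.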
#### RegisterFileTest
This test validates the Register File module of the single-cycle RISC-V CPU. It focuses on register writes and reads, the behavior of register `x0`, and write-through support (a read during a write cycle returns the value being written).
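
The three properties can be expressed with a small behavioral model. This is a plain-Scala sketch under my own assumptions, not the Chisel module: `RegisterFileModel`, `write`, `read`, and `tick` are hypothetical names, and `tick` stands in for the rising clock edge.

```scala
final class RegisterFileModel {
  private val regs = Array.fill(32)(0L)
  // Write presented during the current cycle, committed on tick().
  private var pending: Option[(Int, Long)] = None

  // Present a write for this cycle; writes to x0 are ignored.
  def write(addr: Int, data: Long): Unit =
    if (addr != 0) pending = Some((addr, data & 0xFFFFFFFFL))

  // Combinational read: x0 is always zero, and a read of the register
  // being written this cycle returns the new data (write-through).
  def read(addr: Int): Long =
    if (addr == 0) 0L
    else pending match {
      case Some((a, d)) if a == addr => d
      case _                         => regs(addr)
    }

  // Clock edge: commit the pending write, if any.
  def tick(): Unit = {
    pending.foreach { case (a, d) => regs(a) = d }
    pending = None
  }
}
```

A read issued in the same cycle as `write(1, v)` already returns `v`, and it still returns `v` after `tick()`, which corresponds to the write-through and read-after-write cases in the test.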
#### CPUTest
This test validates the full single-cycle CPU integration. It focuses on end-to-end execution of real RISC-V programs, such as `fibonacci.asmbin` and `quicksort.asmbin`.
### Issues Encountered
#### MuxLookup
When an expression is used as the result value of a `MuxLookup` mapping, the expression must be wrapped in parentheses:
```scala
// ============================================================
// [CA25: Exercise 8] WriteBack Source Selection
// ============================================================
io.regs_write_data := MuxLookup(io.regs_write_source, io.alu_result)(
  Seq(
    RegWriteSource.Memory                 -> io.memory_read_data,
    RegWriteSource.NextInstructionAddress -> (io.instruction_address + 4.U) // <- here
  )
)
```
Without the parentheses, the compiler reports a type mismatch error:
```log
[error] /home/ryanycs/ca2025-mycpu/1-single-cycle/src/main/scala/riscv/core/WriteBack.scala:49:75: type mismatch;
[error] found : chisel3.UInt
[error] required: String
[error] RegWriteSource.NextInstructionAddress -> io.instruction_address + 4.U
[error] ^
[error] one error found
[error] (Compile / compileIncremental) Compilation failed
```
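This error confused me for a while because the `type mismatch` message does not directly indicate that the issue is caused by missing parentheses. The cause is operator precedence: Scala ranks infix operators by their first character, and `-` and `+` share the same level, so `a -> b + 4.U` is parsed as `(a -> b) + 4.U`. A tuple has no `+` method, so the compiler falls back to the implicit string-concatenation conversion, which expects a `String` argument.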
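
The grouping can be made visible without Chisel. The sketch below uses a toy type, `Sig`, whose `+` and `->` methods only record how they were applied; `Sig` and `ArrowPrecedenceDemo` are hypothetical names introduced for illustration.

```scala
object ArrowPrecedenceDemo {
  // Toy type whose operators record how the expression was grouped.
  final case class Sig(text: String) {
    def +(that: Sig): Sig = Sig(s"(${text} + ${that.text})")
    def ->(that: Sig): Sig = Sig(s"(${text} -> ${that.text})")
  }

  val key = Sig("key")
  val addr = Sig("addr")
  val four = Sig("4")

  // No parentheses: `->` and `+` have equal precedence and are
  // left-associative, so the pairing happens before the addition.
  val unparenthesized: Sig = key -> addr + four

  // Parentheses force the addition to be evaluated first.
  val parenthesized: Sig = key -> (addr + four)
}
```

Here `unparenthesized.text` is `((key -> addr) + 4)` while `parenthesized.text` is `(key -> (addr + 4))`, which is the grouping `MuxLookup` needs.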
### Test Result
```log
[info] InstructionDecoderTest:
[info] InstructionDecoder
[info] - should decode RV32I instructions and generate correct control signals
[info] ByteAccessTest:
[info] Single Cycle CPU - Integration Tests
[info] - should correctly handle byte-level store/load operations (SB/LB)
[info] InstructionFetchTest:
[info] InstructionFetch
[info] - should correctly update PC and handle jumps
[info] ExecuteTest:
[info] Execute
[info] - should execute ALU operations and branch logic correctly
[info] FibonacciTest:
[info] Single Cycle CPU - Integration Tests
[info] - should correctly execute recursive Fibonacci(10) program
[info] RegisterFileTest:
[info] RegisterFile
[info] - should correctly read previously written register values
[info] - should keep x0 hardwired to zero (RISC-V compliance)
[info] - should support write-through (read during write cycle)
[info] QuicksortTest:
[info] Single Cycle CPU - Integration Tests
[info] - should correctly execute Quicksort algorithm on 10 numbers
[info] Run completed in 30 seconds, 312 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 31 s, completed Dec 8, 2025 5:57:28 PM
```
#### Compliance Test

## 2-mmio-trap
### Test Summary
#### CLINTCSRTest
This test validates the Core Local Interruptor (CLINT) and the Control and Status Registers (CSRs) of the single-cycle CPU. It focuses on machine-mode interrupt handling, trap/exception processing, and environment instructions.
#### ExecuteTest
This test validates the CSR (Control and Status Register) write-back logic within the Execute stage of the single-cycle CPU. It focuses on computing the correct value to write to a CSR for each kind of CSR instruction.
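
For reference, the value written to a CSR by each Zicsr instruction can be summarized as follows. This is a plain-Scala sketch with hypothetical names, not the module under test; `src` is the `rs1` value for the register forms and the zero-extended 5-bit immediate for the `*i` forms. The sketch leaves out the rule that `csrrs`/`csrrc` perform no write at all when the source is `x0` (or a zero immediate).

```scala
object CsrWriteModel {
  private val Mask = 0xFFFFFFFFL

  // New CSR value for a given operation, old CSR value, and source operand.
  def newCsrValue(op: String, oldCsr: Long, src: Long): Long = op match {
    case "csrrw" | "csrrwi" => src & Mask             // overwrite
    case "csrrs" | "csrrsi" => (oldCsr | src) & Mask  // set bits
    case "csrrc" | "csrrci" => (oldCsr & ~src) & Mask // clear bits
    case other => throw new IllegalArgumentException(s"unknown CSR op: $other")
  }
}
```

In all six cases the value returned to the destination register is the old CSR value; only the value written back to the CSR differs.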
#### TimerTest
This test validates the Timer module’s memory-mapped I/O (MMIO) registers. Specifically, it checks that the limit register and enable status can be written to and read back correctly.
#### UartMMIOTest
This test validates the UART MMIO interface in the single-cycle CPU. It ensures that TX and RX operations, as well as baud rate and enable control, function correctly when driven by CPU memory accesses.
### Test Result
```log
[info] ByteAccessTest:
[info] [CPU] Byte access program
[info] - should store and load single byte
[info] CLINTCSRTest:
[info] [CLINT] Machine-mode interrupt flow
[info] - should handle external interrupt
[info] - should handle environmental instructions
[info] UartMMIOTest:
[info] [UART] Comprehensive TX+RX test
[info] - should pass all TX and RX tests
[info] ExecuteTest:
[info] [Execute] CSR write-back
[info] - should produce correct data for csr write
[info] FibonacciTest:
[info] [CPU] Fibonacci program
[info] - should calculate recursively fibonacci(10)
[info] TimerTest:
[info] [Timer] MMIO registers
[info] - should read and write the limit
[info] InterruptTrapTest:
[info] [CPU] Interrupt trap flow
[info] - should jump to trap handler and then return
[info] QuicksortTest:
[info] [CPU] Quicksort program
[info] - should quicksort 10 numbers
[info] Run completed in 32 seconds, 550 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 33 s, completed Dec 8, 2025 5:58:58 PM
```
#### Compliance Test

## 3-pipeline
### Test Summary
#### PipelineProgramTest
This test verifies that the pipelined CPU:
- Executes programs correctly (`fibonacci.asmbin`, `quicksort.asmbin`, `sb.asmbin`).
- Handles data and control hazards correctly.
- Handles machine-mode traps.
#### PipelineRegisterTest
This test validates that the pipeline register module correctly handles:
- normal data propagation (`stall = false`, `flush = false`): the register takes the new input value
- stall (`stall = true`, `flush = false`): the register keeps its previous value
- flush (`stall = false`, `flush = true`): the register resets to its default value
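
The three cases above can be sketched as a single update rule. This is a behavioral model in plain Scala with hypothetical names, not the Chisel module. The test only exercises the three combinations listed, so giving `flush` priority when both signals are high is my assumption.

```scala
final class PipelineRegisterModel(default: Long) {
  private var value: Long = default

  // Value currently held by the register.
  def out: Long = value

  // One clock edge: flush resets, stall holds, otherwise latch the input.
  // Assumption: flush wins if stall and flush are asserted together.
  def tick(in: Long, stall: Boolean, flush: Boolean): Unit =
    value =
      if (flush) default
      else if (stall) value
      else in
}
```

Stepping the model with `(5, stall = false, flush = false)`, then `(7, stall = true, flush = false)`, then `(9, stall = false, flush = true)` yields outputs 5, 5, and the default value.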
### Hazard Detection Summary and Analysis
- Why do we need to stall for load-use hazards? (Hint: Consider data dependency and forwarding limitations)
> [!Note] Ans
> Forwarding is sufficient to solve RAW data hazards when the result is computed in the `EX` stage of an instruction because its result can then be forwarded to the `EX` stage of the next instruction. However, the `lw` instruction does not finish reading data until the end of the `MEM` stage, so its result cannot be forwarded to the `EX` stage of the next instruction.
>
> A solution is to *stall* the pipeline, holding up operation until the data is available.
- What is the difference between "stall" and "flush" operations? (Hint: Compare their effects on pipeline registers and PC)
> [!Note] Ans
> Stalling a stage is performed by **disabling its pipeline register** (i.e., the register to the left of a stage) so that the stage’s inputs do not change. When a stage is stalled, all previous stages must also be stalled so that no subsequent instructions are lost. The pipeline register directly after the stalled stage must be **cleared** (flushed) to prevent bogus information from propagating forward.
>
> More specifically, the *stall* is the control signal connected to *enable* of the pipeline registers, and the *flush* is the control signal connected to *reset* of the pipeline registers.
- Why does jump instruction with register dependency need stall? (Hint: When is jump target address available?)
> [!Note] Ans
> Since the jump target address is computed in the `ID` stage, a jump instruction with a register dependency (e.g., `jalr x0, x1, 0`) must stall when the register it reads (`x1`) is written by a preceding instruction whose result is not yet available, that is, the preceding instruction has reached neither the `WB` stage nor a stage from which its result can be forwarded to the `ID` stage:
> ```
> IF  ID  EX  MEM WB          (add  x1, x2, x3)
>             │
>             │ forward
>             ▼
>     IF  ID  ID  EX  MEM WB  (jalr x0, x1, 0)
> ```
- In this design, why is branch penalty only 1 cycle instead of 2? (Hint: Compare ID-stage vs EX-stage branch resolution)
> [!Note] Ans
> The branch penalty is only 1 cycle because branch resolution is moved from the EX stage to the ID stage.
> - ID-stage branch resolution
> If the branch is taken, only the instruction in the IF/ID register needs to be flushed, so the penalty is 1 cycle.
> ```
> IF  ID
>     IF  ◄─ flush!
> ```
> - EX-stage branch resolution
> If the branch is taken, two instructions must be flushed (the one in IF/ID and the one in ID/EX), so the penalty is 2 cycles.
> ```
> IF  ID  EX
>     IF  ID  ◄─ flush!
>         IF  ◄─ flush!
> ```
- What would happen if we removed the hazard detection logic entirely? (Hint: Consider data hazards and control flow correctness)
> [!Note] Ans
> If the hazard detection logic were removed entirely, the pipelined CPU would compute wrong results. Two kinds of hazards would go unhandled:
> - Data Hazard
> A data hazard occurs when an instruction tries to read a register that has not yet been written back by a previous instruction.
> - Control Hazard
> A control hazard occurs when the decision of what instruction to fetch next has not been made by the time the fetch takes place.
- Complete the stall condition summary:
> [!Note] Ans
> Stall is needed when:
> 1. The instruction in the `ID` stage is a `Jump` or the instruction in the `EX` stage is a `Load`, and the instruction in the `ID` stage requires a register value being produced by the instruction in the `EX` stage. (EX stage condition)
> 2. The `Jump` instruction in the `ID` stage requires a register value being produced by a `Load` instruction in the `MEM` stage. (MEM stage condition)
>
> Flush is needed when:
> 1. A branch or jump is taken. (Branch/Jump condition)
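
The stall and flush conditions summarized above can be written as a small decision function. This is a plain-Scala sketch, not the actual control unit: all names are hypothetical, and two details are my own assumptions rather than statements from the answers above, namely that `x0` never creates a dependency and that a stall takes priority over a flush in the same cycle.

```scala
object HazardControlModel {
  final case class Inputs(
      jumpInId: Boolean, // instruction in ID is a jump
      rs1Id: Int,        // source registers read in ID
      rs2Id: Int,
      loadInEx: Boolean, // instruction in EX is a load
      rdEx: Int,         // destination register of the EX instruction
      loadInMem: Boolean, // instruction in MEM is a load
      rdMem: Int,        // destination register of the MEM instruction
      jumpTaken: Boolean // a branch or jump is taken
  )
  final case class Outputs(stall: Boolean, flush: Boolean)

  // True when the ID instruction reads register `rd` (x0 excluded).
  private def idDependsOn(rd: Int, in: Inputs): Boolean =
    rd != 0 && (rd == in.rs1Id || rd == in.rs2Id)

  def control(in: Inputs): Outputs = {
    // EX stage condition: jump in ID or load in EX, with a dependency on EX.
    val exCond = (in.jumpInId || in.loadInEx) && idDependsOn(in.rdEx, in)
    // MEM stage condition: jump in ID depends on a load still in MEM.
    val memCond = in.jumpInId && in.loadInMem && idDependsOn(in.rdMem, in)
    val stall = exCond || memCond
    // Assumption: when a stall is required, the flush waits.
    Outputs(stall, flush = !stall && in.jumpTaken)
  }
}
```

A load in `EX` writing `x1` followed by an instruction in `ID` reading `x1` produces a stall, whereas a load in `MEM` only stalls an `ID`-stage jump, since other instructions can receive the loaded value through forwarding.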
### Handwritten RISC-V Assembly Code in Homework2
I use `int32_to_bf16.S` to check that the pipelined RISC-V CPU works correctly. The test runs the program for 10000 cycles and then expects register `a0` (`x10`) to hold 0.
#### 3-pipeline/src/test/scala/riscv/PipelineProgramTest.scala
```scala
it should "convert int32 to bf16 correctly" in {
  runProgram("int32_to_bf16.asmbin", cfg) { c =>
    c.clock.setTimeout(0)
    c.clock.step(10000)
    c.io.regs_debug_read_address.poke(10.U) // a0
    c.io.regs_debug_read_data.expect(0x0.U)
  }
}
```
```log
[info] PipelineProgramTest:
[info] Three-stage Pipelined CPU
[info] - should convert int32 to bf16 correctly
[info] Five-stage Pipelined CPU with Stalling
[info] - should convert int32 to bf16 correctly
[info] Five-stage Pipelined CPU with Forwarding
[info] - should convert int32 to bf16 correctly
[info] Five-stage Pipelined CPU with Reduced Branch Delay
[info] - should convert int32 to bf16 correctly
```
### Issues Encountered
#### chiseltest.TimeoutException
To test my pipelined CPU, I used `int32_to_bf16.S`, which I wrote in Homework 2. The program needs more than 1000 cycles to complete, but after I changed `c.clock.step(1000)` to `c.clock.step(10000)`, chiseltest threw a `chiseltest.TimeoutException` reporting that the clock had reached 1000 idle cycles:
```log
[info] - should convert int32 to bf16 correctly *** FAILED ***
[info] chiseltest.TimeoutException: timeout on TestTopModule.clock: IO[Clock] at 1000 idle cycles. You can extend the timeout by calling .setTimeout(<n>) on your clock (setting it to 0 means 'no timeout').
```
The solution is to call `c.clock.setTimeout(0)` before stepping, which tells chiseltest not to enforce a timeout on the clock.
### Test Result
```log
[info] PipelineProgramTest:
[info] Three-stage Pipelined CPU
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
[info] Five-stage Pipelined CPU with Stalling
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
[info] Five-stage Pipelined CPU with Forwarding
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
[info] Five-stage Pipelined CPU with Reduced Branch Delay
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
[info] PipelineUartTest:
[info] Three-stage Pipelined CPU UART Comprehensive Test
[info] - should pass all TX and RX tests
[info] Five-stage Pipelined CPU with Stalling UART Comprehensive Test
[info] - should pass all TX and RX tests
[info] Five-stage Pipelined CPU with Forwarding UART Comprehensive Test
[info] - should pass all TX and RX tests
[info] Five-stage Pipelined CPU with Reduced Branch Delay UART Comprehensive Test
[info] - should pass all TX and RX tests
[info] PipelineRegisterTest:
[info] Pipeline Register
[info] - should be able to stall and flush
[info] Run completed in 1 minute, 53 seconds.
[info] Total number of tests run: 29
[info] Suites: completed 3, aborted 0
[info] Tests: succeeded 29, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 114 s (01:54), completed Dec 8, 2025 6:05:29 PM
```
#### Compliance Test

## Acknowledgments
This assignment was completed with assistance from [GitHub Copilot](https://github.com/features/copilot) auto-completion for writing code and comments (Agent mode was not used). [ChatGPT](https://chatgpt.com/) was used to refine the writing of this document.
## Reference
Sarah L. Harris and David Harris, *Digital Design and Computer Architecture, RISC-V Edition*