# Assignment 3: RISC-V CPU with Chisel

## Part 1: Chisel Bootcamp Summary, Environment Setup, and Enhanced Hello Module

Contributed by [yuyan7498](https://github.com/yuyan7498)

Code: [GitHub repository](https://github.com/yuyan7498/ca2025-mycpu)

---

# 1. Environment & Toolchain Setup

To begin Lab3, I first needed a working Chisel/Scala environment. The final, stable setup I used:

- **Platform:** Windows 10 + WSL2 (Ubuntu)
- **JDK & sbt:** installed via SDKMAN
- **Editor:** VS Code + Metals
- **Project Base:** `chisel-tutorial`
- **Testing Framework:** iotesters (consistent with the tutorial repo)

### 1.1 Installing SDKMAN

Initially I ran:

```bash
curl -s "https://get.sdkman.io" | bash
```

but the installer failed because `unzip` was missing. SDKMAN depends on `unzip`, so I had to install it first:

```bash
sudo apt update
sudo apt install -y unzip zip
```

Then I re-ran the SDKMAN installer and it succeeded.

### 1.2 Installing Java & sbt

```bash
sdk install java 11.0.21-tem
sdk install sbt
```

`java -version` and `sbt about` confirmed the installation.

### 1.3 Cloning chisel-tutorial

```bash
mkdir ~/Chisel
cd ~/Chisel
git clone https://github.com/ucb-bar/chisel-tutorial.git
cd chisel-tutorial
code .
```

VS Code + Metals detected the project automatically.

### 1.4 sbt run vs sbt test

- `sbt run` → Runs a **main object**, such as generating Verilog or running a dedicated testbench.
- `sbt test` → Compiles and runs **all ScalaTest + iotesters** test files under `src/test/scala`.

For my enhanced Hello module, I mainly used:

```bash
sbt "runMain hello.Hello"
```

to run exactly the `hello.Hello` object (which internally runs the iotesters-style HelloTests).

---

# 2. Chisel Bootcamp: Learning Process, Confusions, and What I Mastered

This section summarizes what I learned while completing Chisel Bootcamp (up to Part 3.6), including the typical confusions I ran into and how I resolved them.

---

## 2.1 Understanding What Chisel Actually Is

Initially I vaguely thought “Chisel = nicer Verilog.” After going through the notebooks and repeatedly asking questions, I finally internalized:

- **Chisel is a Scala *library* that describes hardware generators.**
- My Scala code runs once to **elaborate** the circuit graph (FIRRTL).
- Simulation and synthesis operate on the generated circuit, not on Scala.

This solved several early confusions:

- Why `println()` only appears once at elaboration, not per clock.
- Why loops in Scala (e.g., `for (i <- 0 until n)`) replicate hardware instead of creating a runtime loop.
- Why changing Scala variables after elaboration has no effect on the already-generated hardware.

---

## 2.2 Basic Hardware: IO Bundles & Combinational Logic

I got used to the basic pattern:

```scala
import chisel3._

class MyModule extends Module {
  val io = IO(new Bundle {
    val in  = Input(UInt(8.W))
    val out = Output(UInt(8.W))
  })
  // Combinational adder: out follows in + 1 (wraps at 8 bits)
  io.out := io.in + 1.U
}
```

Key points I learned and sometimes mixed up:

- `Input/Output` and `UInt(width.W)` are **hardware types**, not Scala integers.
- `:=` describes a **connection** in the circuit, not an assignment that happens “now”.
- When several `:=` drive the same signal, the last connection whose condition holds wins (last-connect semantics), much as a later assignment overrides an earlier one inside a Verilog `always` block.

The Bootcamp exercises around adders, bit operations, and small combinational circuits helped me build intuition that the Chisel code is a **graph of wires and operations**, not a sequence of instructions.

---

## 2.3 Sequential Logic: Registers, State, and when/elsewhen/otherwise

Registers were the first place I felt “this is real hardware”:

```scala
val reg = RegInit(0.U(8.W))
reg := reg + 1.U
```

Confusions I had:

- I expected assignment order to behave like software (top-down execution), but Chisel merges all assignments.
- I wondered why moving lines around did not change behavior the way I imagined.
The resolution:

- All `reg := ...` connections to a register are merged at elaboration time into a single next-value expression, selected by the enclosing `when / elsewhen / otherwise` conditions.
- The register itself updates only on the clock edge, so statements do not execute one after another as in software; source order only decides which connection wins when several apply.

---

## 2.4 Scala vs Hardware: Constants, Types, and Generators

A classic example that clarified the difference:

```scala
val s = true.B
io.out := Mux(s, 3.U, 0.U)
```

Here:

- `s` is a **Chisel Bool node representing a constant 1**.
- `Mux(s, 3.U, 0.U)` is a hardware multiplexer, not an if-statement.
- The decision is made in hardware (combinational) every cycle, not once in Scala.

Bootcamp’s generator examples (e.g., using `Vec`, higher-order functions, and parameterized modules) showed that:

- Scala is the **meta-language** for describing families of circuits (different widths, depths, structures).
- The final design is still a plain circuit with wires and registers.

---

## 2.5 iotesters vs chiseltest (and where I got stuck)

In the Chisel Bootcamp notebooks, tests use the **iotesters** style:

```scala
poke(c.io.in, 3.U)
step(1)
expect(c.io.out, 9.U)
```

At one point I accidentally tried to use modern `chiseltest` syntax inside the Bootcamp environment:

```scala
c.io.in.poke(3.U)
c.io.out.expect(9.U)
```

This caused errors like:

```text
value poke is not a member of chisel3.UInt
value expect is not a member of chisel3.UInt
```

From this I learned:

- Bootcamp/Jupyter uses iotesters and its own helper `test(...)` or `Driver.execute`.
- The standalone `chisel-tutorial` repository’s `build.sbt` originally only includes:
  - `"chisel-iotesters"`
  - older ScalaTest APIs

To avoid dependency conflicts and focus on the lab goals, I decided to:

- Keep the tutorial repository on **iotesters**.
- Use `PeekPokeTester` and `Driver(...)` conventions for `Hello` and other examples.

---

## 2.6 println vs printf: Software vs Hardware Debugging

I also clarified the difference between:

- `println(...)`
  - executes once during elaboration / Scala runtime
  - good for debugging generator structure
- `printf(...)`
  - becomes part of the hardware simulation
  - prints once per cycle (when enabled) in the simulation log
  - similar to Verilog’s `$display`

For hardware debugging (especially later in pipeline hazard analysis), `printf` and waveforms are essential tools.

---

# 3. Enhanced “Hello World in Chisel”

The assignment explicitly requires:

> Describe the operation of “Hello World in Chisel” and **enhance it by incorporating logic circuit**.

The original Hello example in the tutorial is extremely simple: it just drives a constant value out of the module, with no internal state and no behavior that changes over time.

To make it a more realistic hardware “Hello World”, I enhanced it into a **counter-driven LED blinker** that exhibits clear sequential behavior over time.

---

## 3.1 What Was Enhanced Compared to the Original Hello

Original behavior (conceptually):

- `io.out := 42.U` (or some constant)
- No internal registers
- No clock-dependent behavior
- Not very representative of an actual FPGA “hello world”

Enhanced behavior:

1. **Added a 32-bit counter register (`cntReg`)**
   - Implemented with `RegInit(0.U(32.W))`
   - Increments by 1 on every clock cycle.
2. **Added a 1-bit LED state register (`blkReg`)**
   - Implemented with `RegInit(0.U(1.W))`
   - Stores the current LED on/off state.
   - Flips (bitwise NOT) whenever the counter reaches a threshold.
3. **Introduced a configurable threshold constant `CNT_MAX`**
   - For real hardware, this could be something like `(50000000 / 2 - 1).U` for a 1 Hz blink on a 50 MHz clock.
   - For simulation, I used a much smaller value: `CNT_MAX = 4.U(32.W)`.
4. **Added a comparator and conditional logic**
   - `when (cntReg === CNT_MAX) { ... }`
   - When the counter equals the maximum:
     - `cntReg` is reset to 0
     - `blkReg` is toggled via `blkReg := ~blkReg`
5. **Connected the LED output to the internal register**
   - `io.led := blkReg`
   - Now the LED output reflects a real internal state that changes over time, not just a constant value.

In terms of logical building blocks, the enhanced Hello module now clearly contains:

- A **counter register**
- A **1-bit state register**
- A **comparator (`===`)**
- A **NOT gate**
- A small piece of **control logic** (`when` block)

All of these are exactly the kinds of “logic circuits” the homework expects us to start playing with, even before diving into the full MyCPU.

---

## 3.2 Hello Module Internal Logic (Summary)

### Registers

```text
cntReg:  32-bit counter
blkReg:  1-bit LED state
CNT_MAX = 4.U (small for simulation)
```

### Behavior (per cycle)

```text
cntReg := cntReg + 1.U
when (cntReg === CNT_MAX) {
  cntReg := 0.U
  blkReg := ~blkReg
}
io.led := blkReg
```

Effectively:

- The LED (`io.led`) toggles every **CNT_MAX + 1** cycles.
- With CNT_MAX = 4, the LED period is 10 cycles (5 cycles OFF, then 5 cycles ON, repeating).
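The toggle period can be checked with a cycle-level software model. The sketch below is plain Scala, not Chisel, and only mirrors the register updates listed above; `BlinkModel`, `State`, `step`, and `ledTrace` are hypothetical names introduced for illustration, and both registers are assumed to reset to 0 with `CNT_MAX = 4`.

```scala
// Plain-Scala reference model of the blinker (illustration only, not the Chisel module).
object BlinkModel {
  // State after reset: both registers are 0, mirroring RegInit(0.U).
  final case class State(cnt: Long, blk: Int)

  // One rising clock edge: both registers take their next values together.
  def step(s: State, cntMax: Long): State =
    if (s.cnt == cntMax) State(0L, s.blk ^ 1) // wrap the counter and toggle the LED bit
    else State(s.cnt + 1, s.blk)              // otherwise just count up

  // LED value at cycle 0 (reset state) and after each of `cycles` clock edges.
  def ledTrace(cycles: Int, cntMax: Long = 4L): List[Int] =
    Iterator.iterate(State(0L, 0))(step(_, cntMax)).take(cycles + 1).map(_.blk).toList
}
```

`BlinkModel.ledTrace(10)` gives `0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0`: the LED stays at 0 for cycles 0 to 4, is 1 for cycles 5 to 9, and returns to 0 at cycle 10.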

---

## 3.3 Hello Module: ASCII Hardware Diagram (for HackMD)

High-level view:

```text
              +-----------------------------------------+
              |              Hello Module               |
              |-----------------------------------------|
clk --------->| cntReg (32-bit counter)                 |
              |   cntReg := cntReg + 1                  |
              |                                         |
              |   when cntReg == CNT_MAX                |
              |     cntReg := 0                         |
              |     blkReg := NOT(blkReg)               |
              |                                         |
              |   led := blkReg                         |
              +-----------------------------------------+
```

Lower-level, focusing on the logic we added:

```text
     clk
      |
      v
+------------+      +----------------+      +--------+
|   cntReg   |----->| cntReg == MAX  |----->|  NOT   |-----> blkReg
| (register) |      |  (comparator)  |      |        |
+------------+      +----------------+      +--------+
      |                      |                  |
      +----------------------+------------------+
                     (control logic)

led output: led = blkReg
```

This ASCII diagram is compatible with HackMD and explicitly shows the new logic components:

- Counter register
- Comparator against CNT_MAX
- NOT gate feeding the LED state register

---

## 3.4 Testing Strategy (HelloTests)

To verify that the enhanced Hello really behaves as designed, I wrote an iotesters-style test:

- Assumption: at reset, `blkReg` is initialized to 0, hence `io.led` starts at 0.
- With `CNT_MAX = 4`, the LED should flip every 5 cycles.

Test flow:

1. Immediately after reset, check that LED = 0.
2. Step 5 cycles, then LED should become 1.
3. Step another 5 cycles, then LED should go back to 0.

Conceptually:

```text
Cycle 0:    led = 0 (reset state)
Cycle 1–4:  led = 0
Cycle 5:    led toggles to 1
Cycle 6–9:  led = 1
Cycle 10:   led toggles back to 0
...
```

The testbench uses:

- `step(5)`
- `expect(c.io.led, 1.U)`
- then `step(5)` and `expect(c.io.led, 0.U)`

I run it together with the Hello object:

```bash
sbt "runMain hello.Hello"
```

which elaborates the Hello module, runs the tester, and prints the success message.

---

# 4. **Single-Cycle CPU**

The implemented CPU follows a single-cycle execution model, where each instruction completes all five classical stages (Instruction Fetch, Decode, Execute, Memory Access, and Write Back) within one clock cycle. This design has no pipeline hazards and keeps the control logic simple, so the focus stays on instruction correctness and datapath completeness, which must be in place before extending the design to MMIO and pipelined architectures.

---

## 4.1 Completed CA25 Exercises and Implementation Summary

### Exercise 1: Immediate Extraction
**Goal:** Correctly reconstruct S-type, B-type, and J-type immediates.
**Verification:** InstructionDecoderTest and waveform inspection for branch/jump targets.

### Exercise 2: Control Signal Generation
**Goal:** Generate correct control signals for ALU operand selection and write-back source.
**Verification:** Decode-stage unit tests and correct execution of arithmetic and jump instructions.

### Exercise 3: ALU Control Decode
**Goal:** Map opcode, funct3, and funct7 to ALU operations.
**Key Issue:** Distinguishing ADD/SUB and SRL/SRA using funct7[5].
**Verification:** ExecuteTest.

### Exercise 4: Branch Condition Evaluation
**Goal:** Implement all six RV32I branch conditions.
**Verification:** Execute-stage unit tests and branch-heavy programs.

### Exercise 5: Jump Target Calculation
**Goal:** Compute correct targets for JAL, JALR, and branches.
**Critical Detail:** Clearing the least significant bit for JALR targets.
**Verification:** Correct function calls and returns in Fibonacci.

### Exercise 6: Load Data Extension
**Goal:** Correct sign or zero extension for LB/LH/LW/LBU/LHU.
**Verification:** ByteAccessTest with negative and positive values.

### Exercise 7: Store Data Alignment
**Goal:** Generate correct byte strobes and data alignment for SB/SH/SW.
**Verification:** ByteAccessTest and memory waveform inspection.
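The byte-lane handling behind Exercises 6 and 7 can be sketched as a plain-Scala reference model. This is not the repository's Chisel code: `ByteAccessModel`, `storeAlign`, and `loadExtend` are hypothetical names, and the sketch assumes a little-endian 32-bit memory word with four byte strobes and halfword accesses aligned to 2 bytes. The `funct3` values are the RV32I load/store encodings.

```scala
// Plain-Scala model of byte-lane handling for loads and stores (illustration only).
object ByteAccessModel {
  // funct3 encodings from the RV32I base ISA
  val LB = 0; val LH = 1; val LW = 2; val LBU = 4; val LHU = 5
  val SB = 0; val SH = 1; val SW = 2

  // Store: returns (4-bit strobe mask, write data shifted into the addressed lane).
  def storeAlign(funct3: Int, addr: Int, data: Int): (Int, Int) = {
    val byteOff = addr & 0x3
    val halfOff = byteOff & 0x2 // ASSUMPTION: halfwords are 2-byte aligned
    funct3 match {
      case SB => (0x1 << byteOff, (data & 0xFF) << (8 * byteOff))
      case SH => (0x3 << halfOff, (data & 0xFFFF) << (8 * halfOff))
      case _  => (0xF, data) // SW writes all four lanes
    }
  }

  // Load: picks the addressed lane out of a 32-bit word, then sign- or zero-extends it.
  def loadExtend(funct3: Int, addr: Int, word: Int): Int = {
    val byteOff = addr & 0x3
    val b = (word >>> (8 * byteOff)) & 0xFF
    val h = (word >>> (8 * (byteOff & 0x2))) & 0xFFFF
    funct3 match {
      case LB  => b.toByte.toInt  // sign-extend from bit 7
      case LBU => b               // zero-extend
      case LH  => h.toShort.toInt // sign-extend from bit 15
      case LHU => h               // zero-extend
      case _   => word            // LW returns the full word
    }
  }
}
```

For example, an `SB` to an address ending in `0x3` asserts only strobe bit 3 and moves the byte into bits 31:24, while `LB` and `LBU` on a byte `0xFF` return -1 and 255 respectively, which is the negative/positive distinction the ByteAccessTest exercises.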
### Exercise 8: Write-Back Source Selection
**Goal:** Select ALU result, memory data, or PC+4 correctly.
**Verification:** End-to-end program tests (Fibonacci, Quicksort).

### Exercise 9: Program Counter Update
**Goal:** Correct PC update for sequential execution and control flow changes.
**Verification:** InstructionFetchTest and waveform analysis.

---

## 4.2 Testing and Verification

### Unit Testing

All **9 Chisel unit tests** passed, together with the 41 compliance tests that the same run executes (50 tests across 8 suites in total), using:

```bash
make test
```

```bash
cd .. && sbt "project singleCycle" test
[info] welcome to sbt 1.10.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2025-mycpu-build-build from metals.sbt...
[info] loading project definition from /home/toby/Chisel/ca2025-mycpu/project/project
[info] loading settings for project ca2025-mycpu-build from metals.sbt...
[info] loading project definition from /home/toby/Chisel/ca2025-mycpu/project
[success] Generated .bloop/ca2025-mycpu-build.json
[success] Total time: 2 s, completed Dec 17, 2025, 2:49:53 PM
[info] loading settings for project root from build.sbt...
[info] set current project to mycpu-root (in build file:/home/toby/Chisel/ca2025-mycpu/) [info] set current project to mycpu-single-cycle (in build file:/home/toby/Chisel/ca2025-mycpu/) [info] InstructionDecoderTest: [info] InstructionDecoder [info] - should decode RV32I instructions and generate correct control signals [info] ByteAccessTest: [info] Single Cycle CPU - Integration Tests [info] - should correctly handle byte-level store/load operations (SB/LB) [info] ComplianceTest: [info] MyCPU Compliance ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/add-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/add-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/addi-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/addi-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/and-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/and-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/andi-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/andi-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/auipc-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/auipc-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/beq-01.S/dut/DUT-mycpu.signature [info] - should pass test 
/home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/beq-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/bge-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/bge-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/bgeu-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/bgeu-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/blt-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/blt-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/bltu-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/bltu-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/bne-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/bne-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/fence-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/fence-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/jal-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/jal-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/jalr-01.S/dut/DUT-mycpu.signature [info] - should pass 
test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/jalr-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/lb-align-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lb-align-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/lbu-align-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lbu-align-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/lh-align-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lh-align-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/lhu-align-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lhu-align-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/lui-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lui-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/lw-align-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/lw-align-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/or-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/or-01.S ✅ Test completed - signature: 
/home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/ori-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/ori-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/sb-align-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sb-align-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/sh-align-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sh-align-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/sll-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sll-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/slli-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/slli-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/slt-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/slt-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/slti-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/slti-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/sltiu-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sltiu-01.S ✅ Test 
completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/sltu-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sltu-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/sra-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sra-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/srai-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/srai-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/srl-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/srl-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/srli-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/srli-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/sub-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sub-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/sw-align-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/sw-align-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/xor-01.S/dut/DUT-mycpu.signature [info] - should pass test 
/home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/xor-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/I/src/xori-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/xori-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/hints/src/fence-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/hints/src/fence-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/hints/src/srl-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/hints/src/srl-01.S ✅ Test completed - signature: /home/toby/Chisel/ca2025-mycpu/tests/riscof_work_1sc/rv32i_m/privilege/src/misalign1-jalr-01.S/dut/DUT-mycpu.signature [info] - should pass test /home/toby/Chisel/ca2025-mycpu/tests/riscv-arch-test/riscv-test-suite/rv32i_m/privilege/src/misalign1-jalr-01.S [info] InstructionFetchTest: [info] InstructionFetch [info] - should correctly update PC and handle jumps [info] ExecuteTest: [info] Execute [info] - should execute ALU operations and branch logic correctly [info] FibonacciTest: [info] Single Cycle CPU - Integration Tests [info] - should correctly execute recursive Fibonacci(10) program [info] RegisterFileTest: [info] RegisterFile [info] - should correctly read previously written register values [info] - should keep x0 hardwired to zero (RISC-V compliance) [info] - should support write-through (read during write cycle) [info] QuicksortTest: [info] Single Cycle CPU - Integration Tests [info] - should correctly execute Quicksort algorithm on 10 numbers [info] Run completed in 7 minutes, 48 seconds. 
[info] Total number of tests run: 50
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 50, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 469 s (07:49), completed Dec 17, 2025, 2:57:43 PM
```

```bash
make compliance
```

![image](https://hackmd.io/_uploads/Hy7yMzgmbl.png)

```bash
make sim SIM_ARGS="-instruction src/main/resources/fibonacci.asmbin"
```

![image](https://hackmd.io/_uploads/H1aqYflQ-l.png)

---

## 4.3 Simulation and Waveform Analysis

Verilator simulation with VCD waveform generation was used to analyze:

- Immediate decoding correctness
- Branch and jump behavior
- Load/store alignment
- Write-back data selection

Waveform inspection was essential for debugging subtle datapath issues not immediately visible from test results alone.

---

## 4.4 Lessons Learned

- Waveform-based debugging is crucial for understanding datapath behavior.

# 5. **Single-Cycle CPU with MMIO, Trap, and Interrupt Support**

This report documents the implementation of **Memory-Mapped I/O (MMIO)**, **trap**, and **interrupt handling** in the 2-mmio-trap stage of Assignment 3. The design extends the single-cycle RV32I CPU with machine-mode privileged support, allowing the processor to interact with peripherals and safely handle asynchronous and synchronous events.

## 5.1 Memory-Mapped I/O (MMIO) Architecture

### Address Decoding and Bus Routing

In this project, MMIO is implemented at the **top-level test module (TestTopModule)**.

The CPU exposes a unified memory interface (`cpu.io.memory_bundle`) and a peripheral selector (`cpu.io.deviceSelect`). The address is logically divided into two parts:

- **High-order bits** (`deviceSelect`): select the target device
- **Lower bits**: index within the selected device or memory

Example mapping:

- `deviceSelect = 0x0` → Main memory
- `deviceSelect = 0x8` → Timer (MMIO)
- Other values → extended peripherals (UART, VGA, CLINT)

This design allows the CPU to remain unaware of peripherals.
From the CPU’s perspective, all accesses are standard `lw` / `sw` operations.

---

### MMIO Bus Demultiplexing

The top module demultiplexes the memory bus based on `deviceSelect`:

- If the access targets **main memory**, the CPU memory bundle is directly forwarded.
- If the access targets **Timer MMIO**, the bus is intercepted:
  - Writes update `timerLimit` or `timerEnable` registers
  - Reads return `TimerReadData`

As a result:

- **No special I/O instructions are required**
- The CPU datapath for load/store remains unchanged

This approach naturally scales to multiple MMIO slaves such as UART, VGA, and CLINT.

---

### Software View of MMIO

From software’s point of view, MMIO is simply memory access to specific addresses:

```c
*(volatile uint32_t*)0x80000004 = 1000; // Timer limit
*(volatile uint32_t*)0x80000008 = 1;    // Enable timer
```

The `volatile` qualifier is required because MMIO accesses have side effects and must not be optimized away.

---

## 5.2 Trap and Interrupt Architecture

### CLINT: Central Trap Controller

The **Core-Local Interruptor (CLINT)** is responsible for:

- Monitoring external interrupt signals (`cpu.io.interrupt_flag`)
- Detecting trap-causing instructions (`ecall`, `ebreak`, `mret`)
- Coordinating CSR updates during trap entry and exit
- Requesting PC redirection during control flow changes

CLINT does not execute instructions; instead, it orchestrates **state transitions**.

---

### Trap Entry: Interrupt and Exception Handling

When an interrupt or exception occurs, CLINT performs the following actions:

1. **Event Detection**
   - Timer or peripheral asserts `interrupt_flag`
   - Or CPU executes `ecall` / `ebreak`
2. **CSR State Update Request**
   CLINT asserts `direct_write_enable` and supplies CSR write data:
   - `mstatus.MPIE ← mstatus.MIE`
   - `mstatus.MIE ← 0`
   - `mepc ← PC + 4`
   - `mcause ← interrupt or exception code`
3. **PC Redirection Request**
   CLINT asserts `interrupt_assert` and provides `interrupt_handler_address = mtvec`

CSR updates and interrupt assertion occur atomically at instruction boundaries.

---

### PC Update Priority (Instruction Fetch)

The **InstructionFetch** stage updates the PC according to strict priority:

1. `interrupt_assert`
2. `jump_flag` (branch / JAL / JALR / MRET)
3. `PC + 4`

Any interrupt overrides normal control flow.

---

### CSR Write Priority and Atomicity

The **CSR module** enforces strict write priority:

- When `direct_write_enable = 1`:
  - CLINT writes `mstatus`, `mepc`, and `mcause`
  - CPU CSR instructions are ignored
- Otherwise:
  - CPU CSR instructions execute normally

This guarantees atomic trap state transitions.

---

## 5.3 Trap Return (`mret`) Flow

When `mret` is executed:

1. CLINT restores:
   - `mstatus.MIE ← MPIE`
   - `mstatus.MPIE ← 1`
2. PC redirected to `mepc`
3. Program execution resumes

---

## 5.4 MMIO and Interrupt Interaction

MMIO peripherals can trigger interrupts using the same mechanism as software traps.

Example:

- Timer reaches limit → `interrupt_flag` asserted
- CLINT initiates trap
- CPU jumps to `mtvec`
- Trap handler runs
- `mret` returns execution

---

## 5.5 End-to-End Flow Summary

```
CPU lw/sw → MMIO address
        ↓
Top-level routing
        ↓
Peripheral updates
        ↓
interrupt_flag
        ↓
CLINT + CSR
        ↓
PC → mtvec
        ↓
Handler
        ↓
mret → mepc
```

---

## 5.6 Testing and Verification

```bash
make test
```

```bash
cd .. && sbt "project mmioTrap" test
[info] welcome to sbt 1.10.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2025-mycpu-build-build from metals.sbt...
[info] loading project definition from /home/toby/Chisel/ca2025-mycpu/project/project
[info] loading settings for project ca2025-mycpu-build from metals.sbt...
[info] loading project definition from /home/toby/Chisel/ca2025-mycpu/project
[success] Generated .bloop/ca2025-mycpu-build.json
[success] Total time: 2 s, completed Dec 18, 2025, 8:36:40 PM
[info] loading settings for project root from build.sbt...
[info] set current project to mycpu-root (in build file:/home/toby/Chisel/ca2025-mycpu/)
[info] set current project to mycpu-mmio-trap (in build file:/home/toby/Chisel/ca2025-mycpu/)
[info] ByteAccessTest:
[info] [CPU] Byte access program
[info] - should store and load single byte
[info] CLINTCSRTest:
[info] [CLINT] Machine-mode interrupt flow
[info] - should handle external interrupt
[info] - should handle environmental instructions
[info] UartMMIOTest:
[info] [UART] Comprehensive TX+RX test
[info] - should pass all TX and RX tests
[info] ExecuteTest:
[info] [Execute] CSR write-back
[info] - should produce correct data for csr write
[info] FibonacciTest:
[info] [CPU] Fibonacci program
[info] - should calculate recursively fibonacci(10)
[info] TimerTest:
[info] [Timer] MMIO registers
[info] - should read and write the limit
[info] InterruptTrapTest:
[info] [CPU] Interrupt trap flow
[info] - should jump to trap handler and then return
[info] QuicksortTest:
[info] [CPU] Quicksort program
[info] - should quicksort 10 numbers
[info] Run completed in 42 seconds, 333 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

```bash
make compliance
```

![image](https://hackmd.io/_uploads/BkWa8OZXZg.png)

```bash
make demo
```

![image](https://hackmd.io/_uploads/ByDv85GQZg.png)

# 6. **Pipeline CPU**

Four pipeline variants are provided:

- **ThreeStage**: Minimal pipeline with folded EX/MEM/WB stage.
- **FiveStageStall**: Five-stage pipeline using stalls for hazard resolution.
- **FiveStageForward**: Adds forwarding paths to reduce stalls.
- **FiveStageFinal**: Fully optimized pipeline with forwarding, refined hazard detection, and correct CLINT/CSR interaction.

This report focuses on **FiveStageFinal**.

## 6.1 MMIO in the Pipelined CPU

### Architectural Consistency with 2-mmio-trap

The MMIO mechanism in the pipelined CPU is **architecturally identical** to the 2-mmio-trap design.

- The CPU exposes a unified memory interface (`io.memory_bundle`) and a device selector (`io.deviceSelect`).
- Address decoding and MMIO routing are performed **outside the CPU**, typically in the top-level test module (e.g., `TestTopModule`).
- High-order address bits select peripherals such as Timer or UART, while remaining addresses map to main memory.

From the CPU’s perspective, MMIO accesses are indistinguishable from normal memory accesses.

---

### MemoryAccess Stage in a Pipeline Context

In the five-stage pipeline, all load/store operations (`lw`, `sw`, `lb`, `sb`, etc.) are handled by the **MemoryAccess** stage (`3-pipeline/src/main/scala/riscv/core/fivestage_final/MemoryAccess.scala`).

Responsibilities include:

- Computing effective addresses
- Driving read/write strobes and data
- Forwarding the request to external memory or MMIO devices

The pipeline CPU does **not** contain device-specific logic. Whether an access targets RAM or MMIO is determined externally using `deviceSelect`, preserving modularity and consistency with the single-cycle design.

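
The external routing described above can be illustrated with a small behavioral model in plain Scala. This is only a sketch, not the repository's Chisel code: the object `MmioDecode`, the choice of the top three address bits as the device index, and the Timer base address `0x8000_0000` are assumptions made for illustration. The UART base `0x4000_0000` is the address used by `uart_write` in this report.

```scala
// Behavioral model in plain Scala (NOT the repository's Chisel source).
// Assumptions: bits 31..29 of the address form the device index,
// and the Timer lives at 0x8000_0000 (hypothetical).
object MmioDecode {
  sealed trait Device
  case object MainMemory extends Device
  case object Uart extends Device
  case object Timer extends Device

  // Extract bits 31..29 of a 32-bit address stored in an Int.
  // >>> is a logical shift, so addresses with bit 31 set work too.
  def deviceSelect(address: Int): Int = (address >>> 29) & 0x7

  // Route an access to a peripheral or fall back to main memory.
  def route(address: Int): Device = deviceSelect(address) match {
    case 2 => Uart       // 0x4000_0000 region
    case 4 => Timer      // 0x8000_0000 region (assumed)
    case _ => MainMemory // all remaining addresses
  }
}
```

With these assumptions, `MmioDecode.route(0x40000000)` selects the UART, while a program address such as `0x1000` falls through to main memory. The CPU-side load/store logic is the same in both cases, which is the point of keeping the decode outside the core.
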

---

## 6.2 Trap and Interrupt Mechanism in a Pipeline

### CLINT Placement and Inputs

In the pipelined design, the **CLINT** module is located at:

```
3-pipeline/src/main/scala/riscv/core/fivestage_final/CLINT.scala
```

Key inputs include:

- `instruction_id`: instruction currently in the ID stage
- `instruction_address_if`: PC value from the IF stage
- `jump_flag` and `jump_address`: control flow resolution from the pipeline
- `interrupt_flag`: external interrupt sources (timer or external)

This placement allows trap decisions to be made early while maintaining precise control.

---

### Trap Entry Logic (Interrupts and Exceptions)

When the ID stage detects:

- `ecall` or `ebreak`, or
- an asserted `interrupt_flag` with `mstatus.MIE` and the corresponding `mie` bit enabled,

CLINT initiates trap entry by performing the following actions:

1. **CSR State Preparation**
   - `MPIE ← MIE`
   - `MIE ← 0`
   - `MPP ← 0b11` (machine mode)
   - `mepc ← current (or post-jump) PC`
   - `mcause ← interrupt or exception code`
2. **Atomic CSR Update**
   - `direct_write_enable := true`
   - CSR module overwrites `mstatus`, `mepc`, and `mcause` in the same cycle
3. **PC Redirection Request**
   - `id_interrupt_assert := true`
   - `id_interrupt_handler_address := mtvec`

This mechanism is functionally equivalent to 2-mmio-trap, with the key difference that trap detection now occurs in the **ID stage**.

---

### Pipeline Flush and Control Coordination

Because multiple instructions are in flight, additional pipeline control is required.

- **InstructionFetch** gives interrupt redirection highest priority:
  ```
  Interrupt > Jump > Sequential (PC + 4)
  ```
- **Control.scala** asserts flush signals for the IF and ID stages when a trap is taken.
- Two pipeline bubbles are injected to ensure no incorrect instructions progress further.
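
The fetch priority `Interrupt > Jump > Sequential (PC + 4)` can be sketched as a plain-Scala behavioral model. This is not the repository's `InstructionFetch` code: the object `PcSelect` and its parameter names are hypothetical, chosen to mirror the signal names used in this report.

```scala
// Behavioral model of next-PC selection (illustrative only, not the Chisel source).
// PcSelect and its parameter names are hypothetical.
object PcSelect {
  def nextPc(
      pc: Long,
      interruptAssert: Boolean,
      handlerAddress: Long,
      jumpFlag: Boolean,
      jumpAddress: Long
  ): Long =
    if (interruptAssert) handlerAddress // highest priority: trap entry or return
    else if (jumpFlag) jumpAddress      // then branch / JAL / JALR
    else (pc + 4) & 0xFFFFFFFFL         // otherwise sequential, wrapped to 32 bits
}
```

When both `interruptAssert` and `jumpFlag` are true in the same cycle, the model returns `handlerAddress`, which matches the rule that an interrupt overrides a pending jump.
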

The `IF2ID` pipeline register:

- Replaces instructions with NOPs on flush
- Resets PC to the entry address
- Clears interrupt flags

This guarantees that the pipeline is in a clean state when execution enters the trap handler.

---

### Trap Return (`mret`) Handling

When `mret` is detected in the ID stage, CLINT performs trap return:

1. **CSR Restoration**
   - `MIE ← MPIE`
   - `MPIE ← 1`
2. **PC Redirection**
   - Handler address set to `mepc`
3. **Atomic CSR Write**
   - `direct_write_enable := true`
4. **Pipeline Flush**
   - IF and ID stages are flushed to remove stale instructions

Execution resumes at the precise instruction saved in `mepc`.

---

## 6.3 CSR Priority and Atomicity

The CSR module enforces strict priority rules:

- When `direct_write_enable = 1`, CLINT updates override CSR instructions.
- When `direct_write_enable = 0`, CSR instructions (`csrrw`, `csrrs`, etc.) execute normally.

This guarantees atomic trap entry and exit even under pipelined execution.

---

## 6.4 Comparison with 2-mmio-trap

### Shared Mechanisms

- MMIO via address decoding outside the CPU
- CLINT-managed atomic CSR updates
- Identical `mstatus`, `mepc`, and `mcause` semantics
- PC update priority: Interrupt > Jump > Sequential

### Pipeline-Specific Extensions

- Trap detection in ID stage
- Multi-stage pipeline flush on trap entry and exit
- Coordination with hazard detection and stall logic
- Forwarding networks ensure flushed instructions do not write back

---

## 6.5 Testing and Verification

```bash
make test
```

```bash
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

```bash
make compliance
```

![image](https://hackmd.io/_uploads/r1CFjYGQ-l.png)

## 6.6 Running assignment2 on the Five-Stage Final CPU

### Key changes

- `linker.ld`: Load address set to `0x1000` to match `Parameters.EntryAddress` used by the ROM loader.
- `Makefile`: Use local `riscv-none-elf-` toolchain with `-ffreestanding -nostdlib`; objcopy now keeps `.text.startup/.text.trap` so the main code is present.
- `start.S`:
  - Provide `uart_write` (MMIO at `0x4000_0000`) and `host_exit` (`exit_flag = 0xBABECAFE`), so all I/O is MMIO and no longer depends on ecall/trap.
  - BSS fields: `exit_code`, `exit_flag`, `debug_state`, and `uart_log` (first 256 UART bytes for debugging).
  - Boot prints `BOOT\n`; `debug_state`: 1 at boot, 3 before calling `main`, 6 after entering `main`, 5 on normal exit.
- `hanoi.s`: Replace ecall I/O with calls to `uart_write`.
- `board/verilator/Top.scala`: Default to `ImplementationType.FiveStageFinal` and expose `device_select` for UART MMIO.

### Build steps

1) Rebuild the program and binary:

```bash
cd 3-pipeline/assignment2
make clean
make
make test.asmbin
cp test.asmbin ../src/main/resources/assignment2.asmbin
```

2) (Only if hardware changed) Regenerate Verilog and Verilator sim:

```bash
cd /home/toby/Chisel/ca2025-mycpu/3-pipeline
PATH=$HOME/.local/bin:$PATH \
SBT_OPTS="-Dsbt.boot.directory=$PWD/../.sbtboot -Dsbt.global.base=$PWD/../.sbtboot/global -Dsbt.ivy.home=$PWD/../.ivy2" \
make verilator
```

3) Run simulation:

```bash
cd verilog/verilator/obj_dir
./VTop -instruction ../../../src/main/resources/assignment2.asmbin \
  -time 100000000 \
  -halt 0x6154 \
  -signature 0x6150 0x6270 ../../../assignment2_signature.txt
```

- `-halt 0x6154`: stop when `exit_flag` becomes `0xBABECAFE`.
- `assignment2_signature.txt` stores `exit_code`/`exit_flag`/`debug_state` and the first 256 bytes of `uart_log`.

### Issues and fixes

- Verilator treats UART MMIO reads (0x4xxxxxxx) as out-of-range, spamming `invalid read address ...` on stdout. Current approach: ignore the warnings; for clean output, patch `verilog/verilator/sim.cpp` to return 0 and suppress messages for UART reads.
- Initially objcopy omitted `.text.startup/.text.trap`, so the main program was missing; fixed in the Makefile.
- Early ecall/trap I/O path was replaced by pure MMIO (`uart_write`/`host_exit`), plus `debug_state` and `uart_log` to aid debugging.

### Results

- Typical signature snapshot: `exit_flag = 0xBABECAFE`, `exit_code = 0`, `debug_state = 5`, `uart_log_idx = 256`. The `uart_log` confirms text output is emitted.
- Terminal (UART) output excerpt (decoded from `uart_log`, first 256 bytes):

```
BOOT
4000 78 0 3f80 89 0 40c0 171 0 4040 313 0 PASSED PASSED PASSED 66 0 4000 127 0 3f80 96 0 40c0 153 0 4040 191 0 PASSED PASSED PASSED 55 input: 31 expected: 46 got: 46 36 input: 480 expected: 79 got: 79 145 input: 0xf00000
```

---

# Reference

- https://github.com/sysprog21/ca2025-mycpu
- https://hackmd.io/@sysprog/2025-arch-homework3