arch2025-homework3

# Assignment 3: Construct a RISC-V CPU with Chisel

contributed by <[Wei-Chen Lai](https://github.com/Winstonllllai)>

[**`code`**](https://github.com/Winstonllllai/ca2025-HW3)

:::spoiler **Table of Contents**
[TOC]
:::

## Set up Environment

### Install Dependent Packages

To run this project on macOS 26, install Verilator:

```
$ brew install verilator
```

### Install sbt

```bash
# Uninstall any existing sbt and jenv
brew uninstall sbt
brew uninstall jenv

# Install sdkman
curl -s "https://get.sdkman.io" | bash
source "$HOME/.sdkman/bin/sdkman-init.sh"

# Install Eclipse Temurin JDK 11
sdk install java 11.0.29-tem
sdk install sbt
```

### Install GTKWave & Surfer

[GTKWave](https://gtkwave.sourceforge.net/) and [Surfer](https://surfer-project.org/) are waveform viewers for inspecting signals.

- Install Icarus Verilog
  ```
  brew install icarus-verilog
  ```
- GTKWave
  ```
  brew install desktop-file-utils shared-mime-info \
      gobject-introspection gtk-mac-integration \
      meson ninja pkg-config gtk+3 gtk4
  git clone "https://github.com/gtkwave/gtkwave.git"
  cd gtkwave
  meson setup build && cd build && meson install
  ```
- Surfer
  ```
  $ brew install verilator surfer
  ```

## Learn Chisel!

[Chisel](https://www.chisel-lang.org/) stands for Constructing Hardware in a [Scala](https://www.scala-lang.org/) Embedded Language. It is an open-source hardware construction language developed at UC Berkeley. Its best-known application is the design of RISC-V processors (such as Rocket Chip and BOOM).

Essence: Chisel is not a brand-new standalone language; rather, it is a domain-specific language (DSL) embedded in the Scala programming language.

Core Concept: "Hardware Generator". Chisel code is actually a Scala program. When this program is executed, it generates the corresponding circuit logic (which is ultimately converted into Verilog).

### Pros & Cons

#### Pros

- High Parameterization: Ideal for creating flexible, modular designs (like CPUs).
- Concise Code: Drastically reduces boilerplate using object-oriented features.
- Type Safety: Catches wiring and width errors before generating Verilog.
- Ecosystem: Strong support from the open-source RISC-V community.

#### Cons

- Steep Learning Curve: Requires mastering Scala and functional programming.
- Tricky Debugging: Difficult to map simulation errors back to the original Chisel code.
- Complex Setup: Requires software-centric tools (JDK, sbt) unfamiliar to many hardware engineers.

### How to Get Started

We learned Chisel from the [Chisel Bootcamp](https://github.com/freechipsproject/chisel-bootcamp), which can also be run online: [Learn Chisel Online](https://mybinder.org/v2/gh/sysprog21/chisel-bootcamp/HEAD)

#### Running Chisel Bootcamp locally on macOS

Install Jupyter:

```
pip3 install --upgrade pip
pip3 install jupyter --ignore-installed
pip3 install jupyterlab
```

Install the Almond Scala kernel:

```
curl -L -o coursier https://git.io/coursier-cli && chmod +x coursier
SCALA_VERSION=2.12.10 ALMOND_VERSION=0.9.1
./coursier bootstrap -r jitpack \
    -i user -I user:sh.almond:scala-kernel-api_$SCALA_VERSION:$ALMOND_VERSION \
    sh.almond:scala-kernel_$SCALA_VERSION:$ALMOND_VERSION \
    --sources --default=true \
    -o almond
./almond --install
```

Install Bootcamp:

```
git clone https://github.com/freechipsproject/chisel-bootcamp.git
cd chisel-bootcamp
mkdir -p ~/.jupyter/custom
cp source/custom.js ~/.jupyter/custom/custom.js
```

Start Jupyter:

```
jupyter notebook
```

### 1. Basic Data Types

Chisel types describe hardware wires and are distinct from Scala types.

* **Unsigned Integer:** `UInt(8.W)` (8-bit width)
* **Signed Integer:** `SInt(10.W)` (10-bit width)
* **Boolean:** `Bool()` (1-bit)
* **Literals:**
    * `10.U` (width inferred)
    * `"hff".U(8.W)` (hex literal with explicit width)
    * `true.B`, `false.B`

### 2. Modules & IO

Every hardware module extends `Module`. Ports are defined in `IO`.

```scala
import chisel3._

class MyModule extends Module {
  val io = IO(new Bundle {
    val a   = Input(UInt(8.W))
    val b   = Input(UInt(8.W))
    val out = Output(UInt(8.W))
  })

  // Logic: 8-bit add; `+` keeps the operand width, so the carry is dropped
  io.out := io.a + io.b
}
```

### 3. Connections & Operators

Connections are made from **right to left**: the right-hand expression drives the left-hand signal.
* **Assignment:** `:=`
* **Arithmetic:** `+`, `-`, `*`, `/`, `%`
* **Bitwise:** `&`, `|`, `^`, `~`
* **Shift:** `<<` (Left), `>>` (Right)
* **Comparison:** `===` (Equal), `=/=` (Not Equal), `>`, `<`
* **Bit Extraction:** `val x = sig(3, 0)`
* **Concatenation:** `val cat = Cat(high, low)`

### 4. Registers (Sequential Logic)

Clock and reset are implicit inside a `Module`.

* **Reg with Reset (Best Practice):** `val reg = RegInit(0.U(8.W))`
* **Simple Reg:** `val reg = Reg(UInt(8.W))`
* **RegNext (Delay):** `val delayed = RegNext(io.in)`

### 5. Control Flow

**Note:** Use `when` for conditions on hardware signals. Scala's `if` is evaluated while the generator runs, so it selects which hardware gets built rather than describing a multiplexer.

* **Conditional:**
  ```scala
  when(io.a > 10.U) {
    reg := 1.U
  } .elsewhen(io.a === 0.U) {
    reg := 2.U
  } .otherwise {
    reg := 0.U
  }
  ```
* **Multiplexer:** `Mux(cond, trueVal, falseVal)`
* **Switch/Is:**
  ```scala
  import chisel3.util._

  switch(state) {
    is(sIdle) { /* logic */ }
  }
  ```

## Minimal CPU

The CPU supports exactly these RISC-V instructions:

* `AUIPC` (Add Upper Immediate to PC) - for PC-relative addressing
* `ADDI` (Add Immediate) - for arithmetic and register initialization
* `LW` (Load Word) - word-aligned memory reads only
* `SW` (Store Word) - word-aligned memory writes only
* `JALR` (Jump and Link Register) - for function calls and returns

### Test Result

```
[info] JITTest:
[info] Minimal CPU - JIT Test
[info] - should correctly execute jit.asmbin and set a0 to 42
[info] Run completed in 35 seconds, 469 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

## RISC-V Single Cycle CPU

![image](https://hackmd.io/_uploads/HygnIY28yZx.png)

* Adopts a single-cycle execution model where each instruction completes in one clock cycle (CPI = 1). Since the design is not pipelined, there are no data or control hazards.
* **Instruction Set**: Fully supports the RV32I base integer instruction set.
* **Memory**: Uses a Harvard architecture, separating instruction and data memory to avoid access conflicts.

### Supported Instructions

* Arithmetic and Logic Instructions: `add`, `sub`, `slt`, etc.
* Memory Access Instructions: `lb`, `lw`, `sb`, etc.
* Branch and Jump Instructions: `beq`, `jal`, etc.

### Exercises

* **InstructionDecode.scala:**
    * `Exercise 1`: Implement immediate extension and bit reordering for S, B, and J types.
    * `Exercise 2`: Generate control signals for write-back sources and ALU operand selection.
* **ALUControl.scala:**
    * `Exercise 3`: Decode Opcode, Funct3, and Funct7 to generate correct ALU operation codes.
* **Execute.scala:**
    * `Exercise 4`: Implement comparison logic for all six branch instructions (e.g., BEQ, BNE).
    * `Exercise 5`: Calculate jump target addresses for Branch, JAL, and JALR instructions.
* **MemoryAccess.scala:**
    * `Exercise 6`: Handle sign-extension and zero-extension for Load instructions.
    * `Exercise 7`: Generate byte strobes and align data for Store instructions.
* **WriteBack.scala:**
    * `Exercise 8`: Implement the multiplexer to select the register write-back source (ALU/Memory/PC+4).
* **InstructionFetch.scala:**
    * `Exercise 9`: Implement PC update logic to handle sequential execution and jumps.

### Testcases

* **Unit Tests**
    * `InstructionFetchTest`: Verifies the Program Counter (PC) update logic, including sequential execution (PC+4) and handling of Jump instructions.
    * `InstructionDecoderTest`: Checks that the decoder correctly parses the RV32I instruction formats (R, I, S, B, U, J types) and generates accurate control signals (e.g., ALU source selection, register write enable).
    * `ExecuteTest`: Validates ALU operations (arithmetic, logic, shift, comparison) and the decision logic for Branch instructions.
    * `RegisterFileTest`: Tests register read/write behavior, ensuring register x0 is always zero and supporting write-through.
* **Integration Tests** (CPUTest.scala)
    * `FibonacciTest`: Executes a recursive Fibonacci program (fibonacci.asmbin) to verify function calls and stack operations.
    * `QuicksortTest`: Runs a quicksort program (quicksort.asmbin) to sort 10 numbers, verifying complex control flow and data processing.
    * `ByteAccessTest`: Runs a byte access program (sb.asmbin) to verify lb/sb instructions for memory access, alignment, and sign extension correctness.

### Test Result

```
[info] InstructionDecoderTest:
[info] InstructionDecoder
[info] - should decode RV32I instructions and generate correct control signals
[info] ByteAccessTest:
[info] Single Cycle CPU - Integration Tests
[info] - should correctly handle byte-level store/load operations (SB/LB)
[info] InstructionFetchTest:
[info] InstructionFetch
[info] - should correctly update PC and handle jumps
[info] ExecuteTest:
[info] Execute
[info] - should execute ALU operations and branch logic correctly
[info] FibonacciTest:
[info] Single Cycle CPU - Integration Tests
[info] - should correctly execute recursive Fibonacci(10) program
[info] RegisterFileTest:
[info] RegisterFile
[info] - should correctly read previously written register values
[info] - should keep x0 hardwired to zero (RISC-V compliance)
[info] - should support write-through (read during write cycle)
[info] QuicksortTest:
[info] Single Cycle CPU - Integration Tests
[info] - should correctly execute Quicksort algorithm on 10 numbers
[info] Run completed in 20 seconds, 523 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

### RISCOF Compliance Test

```
Validating RISCOF installation...
RISCOF found: /Users/winston/miniconda3/envs/chisel/bin/riscof
Version: RISC-V Architectural Test Framework., version 1.25.3
Running RISCOF compliance tests for 1-single-cycle (RV32I)...
Running RISCOF compliance tests for 1-single-cycle...
Using config: config-1-single-cycle.ini
Using RISCOF: /Users/winston/miniconda3/envs/chisel/bin/riscof
Using toolchain:
Starting compliance test run at Mon Dec 8 14:14:03 CST 2025
This may take 10-15 minutes for the full test suite...
INFO | ****** RISCOF: RISC-V Architectural Test Framework 1.25.3 *******
INFO | using riscv_isac version : 0.18.0
INFO | using riscv_config version : 3.18.3
INFO | Reading configuration from: /Users/winston/Documents/ca2025-HW3/tests/config-1-single-cycle.ini
INFO | Preparing Models
INFO | Input-ISA file
INFO | ISACheck: Loading input file: /Users/winston/Documents/ca2025-HW3/tests/mycpu_plugin/mycpu_isa_rv32i.yaml
INFO | ISACheck: Load Schema /Users/winston/miniconda3/envs/chisel/lib/python3.13/site-packages/riscv_config/schemas/schema_isa.yaml
INFO | ISACheck: Processing Hart:0
INFO | ISACheck: Initiating Validation for Hart:0
INFO | ISACheck: No errors for Hart:0
INFO | ISACheck: Updating fields node for each CSR in Hart:0
INFO | ISACheck: Dumping out Normalized Checked YAML: /Users/winston/Documents/ca2025-HW3/tests/riscof_work_1sc/mycpu_isa_rv32i_checked.yaml
INFO | Input-Platform file
INFO | Loading input file: /Users/winston/Documents/ca2025-HW3/tests/mycpu_plugin/mycpu_platform.yaml
INFO | Load Schema /Users/winston/miniconda3/envs/chisel/lib/python3.13/site-packages/riscv_config/schemas/schema_platform.yaml
INFO | Initiating Validation
INFO | No Syntax errors in Input Platform Yaml. :)
INFO | Dumping out Normalized Checked YAML: /Users/winston/Documents/ca2025-HW3/tests/riscof_work_1sc/mycpu_platform_checked.yaml
INFO | Generating database for suite: /Users/winston/Documents/ca2025-HW3/tests/riscv-arch-test/riscv-test-suite
INFO | Database File Generated: /Users/winston/Documents/ca2025-HW3/tests/riscof_work_1sc/database.yaml
INFO | Env path set to/Users/winston/Documents/ca2025-HW3/tests/riscv-arch-test/riscv-test-suite/env
INFO | Running Build for DUT
INFO | Running Build for Reference
INFO | Selecting Tests.
INFO | Running Tests on DUT.
INFO | === BATCH MODE: Preparing 41 tests ===
INFO | Compiling test 1/41: /Users/winston/Documents/ca2025-HW3/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/add-01.S
...
INFO | Compiling test 41/41: /Users/winston/Documents/ca2025-HW3/tests/riscv-arch-test/riscv-test-suite/rv32i_m/privilege/src/misalign1-jalr-01.S
INFO | === Generating batch test file with 41 tests ===
INFO | === Running all 41 tests in single SBT session ===
INFO | [1/41] Running: /Users/winston/Documents/ca2025-HW3/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/add-01.S
...
INFO | [41/41] Running: /Users/winston/Documents/ca2025-HW3/tests/riscv-arch-test/riscv-test-suite/rv32i_m/privilege/src/misalign1-jalr-01.S
INFO | Batch test completed. Full log: /Users/winston/Documents/ca2025-HW3/tests/riscof_work_1sc/batch_test.log
INFO | Results: 41 passed, 0 failed
INFO | Running Tests on Reference Model.
INFO | Reference signature generated: /Users/winston/Documents/ca2025-HW3/tests/riscof_work_1sc/rv32i_m/I/src/add-01.S/ref/Reference-rv32emu.signature
...
INFO | Reference signature generated: /Users/winston/Documents/ca2025-HW3/tests/riscof_work_1sc/rv32i_m/privilege/src/misalign1-jalr-01.S/ref/Reference-rv32emu.signature
INFO | Initiating signature checking.
INFO | Following 41 tests have been run :
INFO | TEST NAME : COMMIT ID : STATUS
INFO | /Users/winston/Documents/ca2025-HW3/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/add-01.S : - : Passed
...
INFO | Test report generated at /Users/winston/Documents/ca2025-HW3/tests/riscof_work_1sc/report.html.
INFO | Opening test report in web-browser
✅ Compliance tests complete. Results in riscof_work_1sc/
Completion time: Mon Dec 8 14:16:28 CST 2025
Copying results to results/ directory...
Cleaning up auto-generated RISCOF test files...
✅ Compliance tests complete. Results in results/
📊 View report: results/report.html
```

## RISC-V CPU with MMIO Peripherals and Trap Handling

* **Extensions**: Based on the single-cycle design, adds the Zicsr extension (CSR registers) and the Machine-mode privilege architecture.
* **Peripherals & Interrupts**: Integrates Timer, UART, and VGA peripherals via MMIO, and introduces the CLINT (Core-Local Interrupt Controller) to handle hardware interrupts and software traps.
* **Atomicity**: Ensures the atomicity of interrupt handling and CSR updates; CLINT write operations take priority over CPU instructions.

### Exercises

* **CSR.scala:**
    * `Exercise 10`: Implement the CSR register lookup table (mapping addresses to mstatus, mepc, etc.).
    * `Exercise 11`: Implement CSR write priority logic to prioritize CLINT updates.
* **WriteBack.scala:**
    * `Exercise 12` (WriteBack Source Selection): Extend the write-back multiplexer to include CSR read data (RegWriteSource.CSR) as a source, enabling the result of CSR instructions to be written to the register file.
* **MemoryAccess.scala:**
    * `Exercise 12` (Load Data Extension): Implement sign and zero extension for load instructions (LB, LBU, LH, LHU). This requires extracting the correct byte or halfword based on the address alignment (mem_address_index) and extending it to 32 bits.
* **CLINT.scala:**
    * `Exercise 13`: Implement mstatus state transition for interrupt entry (save MIE to MPIE).
    * `Exercise 14`: Implement mstatus restoration for trap return (MRET).
* **InstructionFetch.scala:**
    * `Exercise 15`: Implement PC update logic with interrupt assertions as the highest priority.

### Testcases

* **Module Tests**
    * `ExecuteTest (CSR)`: Extended to verify the read/modify/write logic of CSR instructions (csrw, csrr, etc.).
    * `TimerTest`: Verifies if the Timer peripheral's MMIO registers (limit, enable) can be correctly read and written.
    * `UartMMIOTest`: Tests the UART peripheral's transmit (TX) and receive (RX) functionality and its interrupt flags.
* **CLINTCSRTest**:
    * `External Interrupt`: Simulates an external interrupt trigger to verify mstatus/mepc/mcause updates and the mret return flow.
    * `Environmental Instructions`: Verifies that the exception instructions ecall and ebreak trigger a trap and jump to the handler.
* **Integration Tests** (CPUTest.scala)
    * `InterruptTrapTest`: Executes irqtrap.asmbin to simulate an interrupt, verifying the CPU correctly jumps to the Trap Handler and returns, while checking CSR states.
    * Includes basic tests like `FibonacciTest` and `QuicksortTest` to ensure backward compatibility.

### Test Result

```
[info] ByteAccessTest:
[info] [CPU] Byte access program
[info] - should store and load single byte
[info] CLINTCSRTest:
[info] [CLINT] Machine-mode interrupt flow
[info] - should handle external interrupt
[info] - should handle environmental instructions
[info] UartMMIOTest:
[info] [UART] Comprehensive TX+RX test
[info] - should pass all TX and RX tests
[info] ExecuteTest:
[info] [Execute] CSR write-back
[info] - should produce correct data for csr write
[info] FibonacciTest:
[info] [CPU] Fibonacci program
[info] - should calculate recursively fibonacci(10)
[info] TimerTest:
[info] [Timer] MMIO registers
[info] - should read and write the limit
[info] InterruptTrapTest:
[info] [CPU] Interrupt trap flow
[info] - should jump to trap handler and then return
[info] QuicksortTest:
[info] [CPU] Quicksort program
[info] - should quicksort 10 numbers
[info] Run completed in 21 seconds, 129 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

### RISCOF Compliance Test

```
Validating RISCOF installation...
RISCOF found: /Users/winston/miniconda3/envs/chisel/bin/riscof
Version: RISC-V Architectural Test Framework., version 1.25.3
Running RISCOF compliance tests for 2-mmio-trap (RV32I + Zicsr)...
Running RISCOF compliance tests for 2-mmio-trap...
Using config: config-2-mmio-trap.ini
Using RISCOF: /Users/winston/miniconda3/envs/chisel/bin/riscof
Using toolchain:
Starting compliance test run at Mon Dec 8 14:45:47 CST 2025
This may take 10-15 minutes for the full test suite...
INFO | ****** RISCOF: RISC-V Architectural Test Framework 1.25.3 *******
INFO | using riscv_isac version : 0.18.0
INFO | using riscv_config version : 3.18.3
INFO | Reading configuration from: /Users/winston/Documents/ca2025-HW3/tests/config-2-mmio-trap.ini
INFO | Preparing Models
INFO | Input-ISA file
INFO | ISACheck: Loading input file: /Users/winston/Documents/ca2025-HW3/tests/mycpu_plugin/mycpu_isa_rv32i_zicsr.yaml
INFO | ISACheck: Load Schema /Users/winston/miniconda3/envs/chisel/lib/python3.13/site-packages/riscv_config/schemas/schema_isa.yaml
INFO | ISACheck: Processing Hart:0
INFO | ISACheck: Initiating Validation for Hart:0
INFO | ISACheck: No errors for Hart:0
INFO | ISACheck: Updating fields node for each CSR in Hart:0
INFO | ISACheck: Dumping out Normalized Checked YAML: /Users/winston/Documents/ca2025-HW3/tests/riscof_work_2mt/mycpu_isa_rv32i_zicsr_checked.yaml
INFO | Input-Platform file
INFO | Loading input file: /Users/winston/Documents/ca2025-HW3/tests/mycpu_plugin/mycpu_platform.yaml
INFO | Load Schema /Users/winston/miniconda3/envs/chisel/lib/python3.13/site-packages/riscv_config/schemas/schema_platform.yaml
INFO | Initiating Validation
INFO | No Syntax errors in Input Platform Yaml. :)
INFO | Dumping out Normalized Checked YAML: /Users/winston/Documents/ca2025-HW3/tests/riscof_work_2mt/mycpu_platform_checked.yaml
INFO | Generating database for suite: /Users/winston/Documents/ca2025-HW3/tests/riscv-arch-test/riscv-test-suite
INFO | Database File Generated: /Users/winston/Documents/ca2025-HW3/tests/riscof_work_2mt/database.yaml
INFO | Env path set to/Users/winston/Documents/ca2025-HW3/tests/riscv-arch-test/riscv-test-suite/env
INFO | Running Build for DUT
INFO | Running Build for Reference
INFO | Selecting Tests.
INFO | Running Tests on DUT.
INFO | === BATCH MODE: Preparing 119 tests ===
INFO | Compiling test 1/119: /Users/winston/Documents/ca2025-HW3/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/add-01.S
INFO | Compiling test 2/119: /Users/winston/Documents/ca2025-HW3/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/addi-01.S
...
INFO | /Users/winston/Documents/ca2025-HW3/tests/riscv-arch-test/riscv-test-suite/rv32i_m/privilege/src/misalign2-jalr-01.S : - : Passed
INFO | Test report generated at /Users/winston/Documents/ca2025-HW3/tests/riscof_work_2mt/report.html.
INFO | Opening test report in web-browser
✅ Compliance tests complete. Results in riscof_work_2mt/
Completion time: Mon Dec 8 14:42:35 CST 2025
Copying results to results/ directory...
Cleaning up auto-generated RISCOF test files...
✅ Compliance tests complete. Results in results/
📊 View report: results/report.html
```

### Nyancat VGA Display Demo

```
🐱 Starting VGA demo with nyancat animation...
Display: 640×480@72Hz with SDL2 visualization
Program: nyancat.asmbin (12-frame nyancat animation)
Note: Frame upload + animation takes significant time
Duration: 500M cycles (~5 minutes, includes full animation)
cd verilog/verilator/obj_dir && ./VTop -vga -instruction ../../../src/main/resources/nyancat.asmbin -time 500000000
[SDL2] Window opened: 640x480 'VGA Display - MyCPU'
[SDL2] Press ESC or close window to stop simulation early
Simulation progress: 1%
Simulation progress: 2%
Simulation progress: 3%
...
Simulation progress: 96%
Simulation progress: 97%
Simulation progress: 98%
Simulation progress: 99%
Simulation progress: 100%
✅ Demo complete! You should have seen animated nyancat.
```

![image](https://hackmd.io/_uploads/HJdjkTffZx.png)

## Pipelined RISC-V CPU

* **Performance Optimization**: Splits instruction execution into multiple stages (3 or 5) to shorten the clock period and raise instruction throughput. In the ideal case CPI stays close to 1; hazards that force stalls or flushes push it higher.
* **Multiple Implementations**: Provides various pipeline versions, ranging from basic Stall mechanisms to advanced Data Forwarding and Early Branch Resolution in the ID stage.
* **Hazard Handling**: Implements a complete Hazard Unit to detect and resolve Data Hazards and Control Hazards.

### Exercises

* **ALU.scala:**
    * `Exercise 16`: Implement the remaining ALU operations, including shift operations (sll, srl, sra), comparison operations (slt, sltu), and logical operations (xor, or, and).
* **fivestage_final/Forwarding.scala:**
    * `Exercise 17`: Implement data forwarding logic for the EX stage to resolve RAW hazards for rs1 and rs2 using data from MEM and WB stages.
    * `Exercise 18`: Implement data forwarding logic for the ID stage to enable early branch resolution by forwarding operands from MEM and WB stages.
* **fivestage_final/Control.scala:**
    * `Exercise 19`: Implement pipeline hazard detection logic to identify load-use hazards and jump dependencies, asserting pc_stall, if_stall, and id_flush signals when necessary.
* **fivestage_final/IF2ID.scala:**
    * `Exercise 20`: Implement the stall and flush logic for the IF/ID pipeline registers (instruction, address, and interrupt flag), ensuring NOPs are inserted during a flush.

### Testcases

* **Basic & Peripheral Tests**
    * `PipelineRegisterTest`: Verifies the stall and flush mechanisms of pipeline registers (e.g., IF/ID, ID/EX).
    * `PipelineUartTest`: Verifies UART peripheral MMIO read/write functions within the pipelined architecture.
* **Program Execution Tests** (PipelineProgramTest)
    * `Basic Programs`: Executes fibonacci, quicksort, sb to ensure correct basic instruction execution in the pipeline.
    * `Hazard Test` (hazard.asmbin): Tests basic Data Hazards (RAW) and Control Hazards to verify the correctness of Forwarding or Stall mechanisms.
    * `Hazard Extended` (hazard_extended.asmbin): A comprehensive hazard test covering WAW, Load-Use Hazards, consecutive Loads, branch condition dependencies, JAL return address dependencies, CSR read/write dependencies, and more complex scenarios.
    * `Trap Test` (irqtrap.asmbin): Verifies the interrupt flow in the pipeline architecture, specifically the pipeline flush behavior when a Trap occurs.

### Test Result

```
[info] PipelineProgramTest:
[info] Three-stage Pipelined CPU
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
[info] Five-stage Pipelined CPU with Stalling
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
[info] Five-stage Pipelined CPU with Forwarding
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
[info] Five-stage Pipelined CPU with Reduced Branch Delay
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
[info] PipelineUartTest:
[info] Three-stage Pipelined CPU UART Comprehensive Test
[info] - should pass all TX and RX tests
[info] Five-stage Pipelined CPU with Stalling UART Comprehensive Test
[info] - should pass all TX and RX tests
[info] Five-stage Pipelined CPU with Forwarding UART Comprehensive Test
[info] - should pass all TX and RX tests
[info] Five-stage Pipelined CPU with Reduced Branch Delay UART Comprehensive Test
[info] - should pass all TX and RX tests
[info] PipelineRegisterTest:
[info] Pipeline Register
[info] - should be able to stall and flush
[info] Run completed in 1 minute, 8 seconds.
[info] Total number of tests run: 29
[info] Suites: completed 3, aborted 0
[info] Tests: succeeded 29, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

### RISCOF Compliance Test

```
Validating RISCOF installation...
RISCOF found: /Users/winston/miniconda3/envs/chisel/bin/riscof
Version: RISC-V Architectural Test Framework., version 1.25.3
Running RISCOF compliance tests for 3-pipeline (RV32I + Zicsr)...
Running RISCOF compliance tests for 3-pipeline...
Using config: config-3-pipeline.ini
Using RISCOF: /Users/winston/miniconda3/envs/chisel/bin/riscof
Using toolchain:
Starting compliance test run at Mon Dec 8 14:47:31 CST 2025
This may take 10-15 minutes for the full test suite...
INFO | ****** RISCOF: RISC-V Architectural Test Framework 1.25.3 *******
INFO | using riscv_isac version : 0.18.0
INFO | using riscv_config version : 3.18.3
INFO | Reading configuration from: /Users/winston/Documents/ca2025-HW3/tests/config-3-pipeline.ini
INFO | Preparing Models
INFO | Input-ISA file
INFO | ISACheck: Loading input file: /Users/winston/Documents/ca2025-HW3/tests/mycpu_plugin/mycpu_isa_rv32i_zicsr.yaml
INFO | ISACheck: Load Schema /Users/winston/miniconda3/envs/chisel/lib/python3.13/site-packages/riscv_config/schemas/schema_isa.yaml
INFO | ISACheck: Processing Hart:0
INFO | ISACheck: Initiating Validation for Hart:0
INFO | ISACheck: No errors for Hart:0
INFO | ISACheck: Updating fields node for each CSR in Hart:0
INFO | ISACheck: Dumping out Normalized Checked YAML: /Users/winston/Documents/ca2025-HW3/tests/riscof_work_3pl/mycpu_isa_rv32i_zicsr_checked.yaml
INFO | Input-Platform file
INFO | Loading input file: /Users/winston/Documents/ca2025-HW3/tests/mycpu_plugin/mycpu_platform.yaml
INFO | Load Schema /Users/winston/miniconda3/envs/chisel/lib/python3.13/site-packages/riscv_config/schemas/schema_platform.yaml
INFO | Initiating Validation
INFO | No Syntax errors in Input Platform Yaml. :)
INFO | Dumping out Normalized Checked YAML: /Users/winston/Documents/ca2025-HW3/tests/riscof_work_3pl/mycpu_platform_checked.yaml
INFO | Generating database for suite: /Users/winston/Documents/ca2025-HW3/tests/riscv-arch-test/riscv-test-suite
INFO | Database File Generated: /Users/winston/Documents/ca2025-HW3/tests/riscof_work_3pl/database.yaml
INFO | Env path set to/Users/winston/Documents/ca2025-HW3/tests/riscv-arch-test/riscv-test-suite/env
INFO | Running Build for DUT
INFO | Running Build for Reference
INFO | Selecting Tests.
INFO | Running Tests on DUT.
INFO | === BATCH MODE: Preparing 119 tests ===
INFO | Compiling test 1/119: /Users/winston/Documents/ca2025-HW3/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/add-01.S
INFO | Compiling test 2/119: /Users/winston/Documents/ca2025-HW3/tests/riscv-arch-test/riscv-test-suite/rv32i_m/I/src/addi-01.S
...
INFO | /Users/winston/Documents/ca2025-HW3/tests/riscv-arch-test/riscv-test-suite/rv32i_m/privilege/src/misalign2-jalr-01.S : - : Passed
INFO | Test report generated at /Users/winston/Documents/ca2025-HW3/tests/riscof_work_3pl/report.html.
INFO | Opening test report in web-browser
✅ Compliance tests complete. Results in riscof_work_3pl/
Completion time: Mon Dec 8 14:53:18 CST 2025
Copying results to results/ directory...
Cleaning up auto-generated RISCOF test files...
✅ Compliance tests complete. Results in results/
📊 View report: results/report.html
```

### Hazard Detection Summary and Analysis (Exercise 21)

* ==**Q1:**== Why do we need to stall for load-use hazards?
    A: Load data becomes available only in the MEM stage. If the very next instruction needs that value in the ID or EX stage, it cannot be forwarded in time, so the pipeline must stall for one cycle to wait for the data.
* ==**Q2:**== What is the difference between "stall" and "flush" operations?
    A: A stall freezes the pipeline and keeps the current state (the PC remains unchanged) while waiting for data. A flush clears the pipeline registers and discards the current instruction (turning it into a NOP) to handle incorrect predictions or insert bubbles.
* ==**Q3:**== Why does a jump instruction with a register dependency need a stall?
    A: The jump target is calculated in the ID stage. If the register it depends on is still being computed in the EX stage or loaded in the MEM stage, the ID stage cannot obtain the correct value yet, so a stall is required.
* ==**Q4:**== In this design, why is the branch penalty only 1 cycle instead of 2?
    A: Because this design resolves branches in the ID stage.
Once a jump is decided in the ID (2nd) stage, only the incorrectly fetched instruction in the IF stage needs to be flushed, resulting in a 1-cycle penalty. * ==**Q5:**== What would happen if we removed the hazard detection logic entirely? A: Data hazards would result in reading old values, and control hazards would cause the execution of instructions from the wrong path, causing the CPU to malfunction. * ==**Q6:**== Complete the stall condition summary: * Stall is needed when: 1. A Load-Use hazard occurs, or an ID jump instruction depends on an EX result. 2. An ID jump instruction depends on a MEM stage Load result. * Flush is needed when: 1. A branch is taken or a jump is executed in the ID stage. ## Run HW2 on RISC-V Pipelined CPU ### Modifications 1. Remove `ecall` instruction because we are running program on bare metal and there are no system call supported. 2. Write all the result to absolute memory address directly when the program finished. 3. Extend Scala test program for hanoi.asmbin. ### Code #### hanoi.S ```asm= .text .globl _start _start: li x2, 0x2000 # Fix disk positions (BLANK 1-3: neutralize x5 effect) # BLANK 1: Fix position at x2+20 sw x0, 20(x2) # BLANK 2: Fix position at x2+24 sw x0, 24(x2) # BLANK 3: Fix position at x2+28 sw x0, 28(x2) addi x8, x0, 1 game_loop: # BLANK 4: Check loop termination (2^3 moves) addi x5, x0, 8 beq x8, x5, finish_game # Gray code formula: gray(n) = n XOR (n >> k) # BLANK 5: What is k for Gray code? srli x5, x8, 1 # BLANK 6: Complete Gray(n) calculation xor x6, x8, x5 # BLANK 7-8: Calculate previous value and its shift addi x7, x8, -1 srli x28, x7, 1 # BLANK 9: Generate Gray(n-1) xor x7, x7, x28 # BLANK 10: Which bits changed? 
xor x5, x6, x7 # Initialize disk number addi x9, x0, 0 # BLANK 11: Mask for testing LSB andi x6, x5, 1 # BLANK 12: Branch if disk 0 moves bne x6, x0, disk_found # BLANK 13: Set disk 1 addi x9, x0, 1 # BLANK 14: Test second bit with proper mask andi x6, x5, 2 bne x6, x0, disk_found # BLANK 15: Last disk number addi x9, x0, 2 disk_found: # BLANK 16: Check impossible pattern (multiple bits) andi x30, x5, 5 addi x31, x0, 5 beq x30, x31, pattern_match jal x0, continue_move pattern_match: continue_move: # BLANK 17: Word-align disk index (multiply by what?) slli x5, x9, 2 # BLANK 18: Base offset for disk array addi x5, x5, 20 add x5, x2, x5 lw x18, 0(x5) bne x9, x0, handle_large # BLANK 19: Small disk moves by how many positions? addi x19, x18, 2 # BLANK 20: Number of pegs addi x6, x0, 3 blt x19, x6, display_move sub x19, x19, x6 jal x0, display_move handle_large: # BLANK 21: Load reference disk position lw x6, 20(x2) # BLANK 22: Sum of all peg indices (0+1+2) addi x19, x0, 3 sub x19, x19, x18 sub x19, x19, x6 display_move: # BLANK 26: Calculate storage offset slli x5, x9, 2 addi x5, x5, 20 add x5, x2, x5 # BLANK 27: Update disk position sw x19, 0(x5) # BLANK 28-29: Increment counter and loop addi x8, x8, 1 jal x0, game_loop finish_game: li x5, 4 sw x8, 0(x5) # Total steps # Disk 0 Position (Offset 20) -> 0x8 lw x6, 20(x2) sw x6, 4(x5) # Disk 1 Position (Offset 24) -> 0xC lw x6, 24(x2) sw x6, 8(x5) # Disk 2 Position (Offset 28) -> 0x10 lw x6, 28(x2) sw x6, 12(x5) loop: j loop ``` #### PipelineProgramTest.scala ```diff ... 
         for (i <- 1 to 1000) {
           c.clock.step()
           c.io.mem_debug_read_address.poke((i * 4).U)
         }
         c.io.csr_debug_read_address.poke(CSRRegister.MSTATUS)
         c.clock.step()
         c.io.csr_debug_read_data.expect(0x1888.U)
         c.io.csr_debug_read_address.poke(CSRRegister.MCAUSE)
         c.clock.step()
         val cause = c.io.csr_debug_read_data.peek().litValue
         assert(mcauseAcceptable.contains(cause), f"unexpected mcause 0x${cause}%x")
         c.io.mem_debug_read_address.poke(0x4.U)
         c.clock.step()
         c.io.mem_debug_read_data.expect(0x2022L.U)
       }
     }
+    it should "solve hanoi tower (3 disks)" in {
+      runProgram("hanoi.asmbin", cfg) { c =>
+        c.clock.setTimeout(0)
+        c.clock.step(50000)
+        c.io.regs_debug_read_address.poke(8.U)
+        c.clock.step()
+        c.io.regs_debug_read_data.expect(8.U)
+      }
+    }
   }
 }
```

#### Makefile

```diff
 BINS = \
     fibonacci.asmbin \
+    hanoi.asmbin \
     hazard.asmbin \
     quicksort.asmbin \
     sb.asmbin \
     uart.asmbin \
     irqtrap.asmbin
```

### Analysis

To simulate `hanoi.S` on MyCPU, go to `3-pipeline/csrc` and run the commands below to generate `hanoi.asmbin` in `3-pipeline/src/main/resources`:

```bash
make CROSS_COMPILE=riscv64-unknown-elf- hanoi.asmbin
make update
```

Run the command below to generate the waveform (`trace.vcd`) and `memory_dump.txt`, which covers addresses 0 to 256:

```bash
make sim SIM_ARGS="-instruction src/main/resources/hanoi.asmbin -signature 0 256 memory_dump.txt"
```

#### memory_dump.txt

```
00000000
00000008 // total step
00000002 // disk 1 location (peg)
00000002 // disk 2 location (peg)
00000002 // disk 3 location (peg)
00000000
...
00000000
```

- The step counter is initialized to 1, so a final value of 8 means the program took 7 moves to finish moving the disks.
- All three disks should end up on the same peg; in this dump that is peg 2.
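As a cross-check on the dump, the loop in `hanoi.S` can be mirrored in plain Scala. The sketch below is only a software model written for illustration: the names `HanoiGrayModel` and `run` are hypothetical and are not part of the project, and it assumes the same three-disk setup and the same move rules as the assembly above.

```scala
object HanoiGrayModel {
  /** Software model of the loop in hanoi.S.
    * Returns (final counter, peg index of disk 0, 1, 2).
    */
  def run(): (Int, Vector[Int]) = {
    val pos = Array(0, 0, 0) // disk positions, as stored at x2+20, x2+24, x2+28
    var n = 1                // x8: step counter, starts at 1
    while (n != 8) {         // the assembly exits when the counter reaches 2^3
      val gray     = n ^ (n >> 1)
      val prevGray = (n - 1) ^ ((n - 1) >> 1)
      val changed  = gray ^ prevGray // consecutive Gray codes differ in one bit
      val disk =
        if ((changed & 1) != 0) 0
        else if ((changed & 2) != 0) 1
        else 2
      val target =
        if (disk == 0) (pos(0) + 2) % 3 // smallest disk advances two pegs, wrapping at 3
        else 3 - pos(disk) - pos(0)     // larger disk takes the peg that is neither its own nor disk 0's
      pos(disk) = target
      n += 1
    }
    (n, pos.toVector)
  }

  def main(args: Array[String]): Unit = {
    val (counter, pegs) = run()
    println(s"counter = $counter, pegs = $pegs")
  }
}
```

Under these assumptions `run()` returns `(8, Vector(2, 2, 2))`, which agrees with the values at addresses `0x4` through `0x10` in `memory_dump.txt`.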
### Test Result

```
sbt "project pipeline" "testOnly riscv.PipelineProgramTest -- -z hanoi"
[info] welcome to sbt 1.10.7 (Eclipse Adoptium Java 11.0.29)
[info] loading project definition from /Users/winston/Documents/ca2025-HW3/project/project/project
[info] loading settings for project ca2025-hw3-build-build from metals.sbt...
[info] loading project definition from /Users/winston/Documents/ca2025-HW3/project/project
[info] loading settings for project ca2025-hw3-build from metals.sbt...
[info] loading project definition from /Users/winston/Documents/ca2025-HW3/project
[success] Generated .bloop/ca2025-hw3-build.json
[success] Total time: 1 s, completed 2025年12月10日下午4:23:02
[info] loading settings for project root from build.sbt...
[info] set current project to mycpu-root (in build file:/Users/winston/Documents/ca2025-HW3/)
[info] set current project to mycpu-pipeline (in build file:/Users/winston/Documents/ca2025-HW3/)
[info] PipelineProgramTest:
[info] Three-stage Pipelined CPU
[info] - should solve hanoi tower (3 disks)
[info] Five-stage Pipelined CPU with Stalling
[info] - should solve hanoi tower (3 disks)
[info] Five-stage Pipelined CPU with Forwarding
[info] - should solve hanoi tower (3 disks)
[info] Five-stage Pipelined CPU with Reduced Branch Delay
[info] - should solve hanoi tower (3 disks)
[info] Run completed in 12 seconds, 270 milliseconds.
[info] Total number of tests run: 4
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 4, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

## References

- [Lab3: Construct a RISC-V CPU with Chisel](https://hackmd.io/@sysprog/B1Qxu2UkZx)
- [Chisel Bootcamp](https://github.com/freechipsproject/chisel-bootcamp)
- [MacOS 15+版本iverilog+GtkWAVE](https://blog.csdn.net/m0_51389066/article/details/146957073)