# **Assignment 3: Your Own RISC-V CPU**

## **0. Abstract / Executive Summary**

---

## **1. Environment Setup**

### **1.1 Development Environment**

All experiments were conducted on **macOS (M1)** using the toolchain required by the lab materials. The CPU projects were built and tested with **Eclipse Temurin JDK 11.0.29** and **sbt 1.10.7**, as specified in the Lab3 instructions. Hardware simulation was performed with **Verilator**, and waveforms were inspected with **Surfer**. **SDL2** was installed to enable VGA output for the Nyancat animation in the *2-mmio-trap* project. The official **ca2025-mycpu** repository served as the working base for all implementations.

### **1.2 Setup Procedures & Issues Encountered**

* **Java Version Conflict**: During the initial setup, `sbt test` repeatedly failed with the following error message:

  ```
  bad constant pool index
  ```

  This issue was caused by sbt running on **Java 21** instead of the **Eclipse Temurin JDK 11** required by the lab instructions. The problem was resolved by switching the default Java version to **11.0.29-tem**:

  ```
  sdk default java 11.0.29-tem
  ```

  The Java version was then verified with:

  ```
  java -version
  ```

  Expected output:

  ```
  openjdk version "11.0.29" 2025-10-21
  OpenJDK Runtime Environment Temurin-11.0.29+7 (build 11.0.29+7)
  OpenJDK 64-Bit Server VM Temurin-11.0.29+7 (build 11.0.29+7, mixed mode)
  ```

  After switching to JDK 11, sbt compiled all modules successfully. However, **after reopening the terminal, the problem returned**: the shell again defaulted to **Java 21**, so **Java 11** had to be selected persistently.
  **Solution:** Edit the Zsh configuration file:

  ```bash
  nano ~/.zshrc
  ```

  Add the following configuration to explicitly set `JAVA_HOME` to the SDKMAN-managed Java 11 (Temurin):

  ```bash
  export SDKMAN_DIR="$HOME/.sdkman"
  [[ -s "$SDKMAN_DIR/bin/sdkman-init.sh" ]] && source "$SDKMAN_DIR/bin/sdkman-init.sh"

  export JAVA_HOME="$HOME/.sdkman/candidates/java/11.0.29-tem"
  export PATH="$JAVA_HOME/bin:$PATH"
  ```

  **Verification:** Reload the configuration and check the Java environment:

  ```bash
  source ~/.zshrc
  java -version
  echo $JAVA_HOME
  which java
  ```

  Expected output:

  ```text
  openjdk version "11.0.29" 2025-10-21
  OpenJDK Runtime Environment Temurin-11.0.29+7 ...
  ```

* **Verilator Compiler Standard**: An error stated that *“Verilator requires a C++14 or newer compiler.”* Initially, the tests were run with `CXXFLAGS="-std=c++14" sbt test`, but this did not fully resolve the problem. The working solution was to enforce the C++14 standard at the environment level before rebuilding in the `1-single-cycle` directory:

  ```bash
  cd ~/ca2025-mycpu/1-single-cycle

  # Check whether the environment is polluted by unexpected flags
  echo "CXX=$CXX"
  echo "CXXFLAGS=$CXXFLAGS"

  # Force the use of clang++ and specify the C++14 standard
  export CXX=clang++
  export CXXFLAGS="-std=c++14"

  # Clean the Verilator / test build cache
  rm -rf ../test_run_dir ./test_run_dir

  # Rebuild
  make
  ```

  After applying these steps, the build completed successfully and the program ran as expected.

* **RISCOF Compliance Testing Setup Note**: During the RISCOF compliance testing setup, the reference model `rv32emu` initially failed to build because of a missing configuration file. To resolve this, the reference emulator was configured and built manually before rerunning the compliance tests.
  The following steps were performed:

  ```bash
  cd /Users/suniachiu/ca2025-mycpu/tests/rv32emu
  make defconfig
  make ENABLE_ARCH_TEST=1 ENABLE_FULL4G=1

  cd /Users/suniachiu/ca2025-mycpu/1-single-cycle
  make compliance
  ```

  After applying the default configuration (`make defconfig`) and enabling architectural-test support, `rv32emu` built successfully and was used by RISCOF as the reference model. The compliance flow could then proceed, including signature generation and comparison between the DUT (single-cycle CPU) and the reference model.

* **RISCOF Compliance Testing – Current Status (Unresolved)**: All 41 RV32I tests in the RISCOF compliance suite **compile successfully**. However, the DUT (single-cycle CPU) produces almost no valid signatures, and those it does generate are incorrect. Consequently, **all tests are reported as Failed**.

  The RISCOF logs show two key observations:

  * **DUT side**: All 41 tests fail with messages such as *“Signature not created”*, indicating that the DUT does not generate valid architectural signatures during execution.
  * **Reference side**: Each test produces a corresponding `Reference-rv32emu.signature`, confirming that the reference model and test infrastructure work correctly.

  This discrepancy points to the DUT’s execution path or signature-generation mechanism as the root cause, rather than test compilation or the reference model.

  Inspecting the symbols of a representative test (`add-01.S`) confirms that the test binary defines a valid signature region:

  ```text
  00003000 D begin_signature
  00003004 d signature_x3_0
  00003004 d signature_x3_1
  00003048 d signature_x8_0
  00003088 d signature_x1_0
  00003888 d signature_x1_1
  00003940 D end_signature
  ```

  This indicates that the test program defines the signature memory region correctly.
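To make the expected DUT output concrete, here is a minimal Python sketch of what a signature dump involves. It is hypothetical and not taken from the RISCOF plugin or the ca2025-mycpu sources: it assumes the DUT's data memory can be read back after the test halts as an address-to-word dictionary, and that the signature file lists one 32-bit word per line as 8 lowercase hex digits, covering the range from `begin_signature` up to (but excluding) `end_signature`. The helper names `parse_symbols` and `dump_signature` are illustrative.

```python
def parse_symbols(nm_lines):
    """Map symbol names to addresses from lines like '00003000 D begin_signature'."""
    symbols = {}
    for line in nm_lines:
        parts = line.split()
        if len(parts) == 3:
            address, _kind, name = parts
            symbols[name] = int(address, 16)
    return symbols


def dump_signature(memory, begin, end):
    """Format the words in [begin, end) as 8-digit hex lines (assumed format)."""
    if begin % 4 or end % 4:
        raise ValueError("signature bounds must be word-aligned")
    # Words never written by the test are treated as 0 in this sketch.
    return [f"{memory.get(addr, 0) & 0xFFFFFFFF:08x}" for addr in range(begin, end, 4)]


nm_output = """\
00003000 D begin_signature
00003004 d signature_x3_0
00003940 D end_signature
""".splitlines()

symbols = parse_symbols(nm_output)
begin, end = symbols["begin_signature"], symbols["end_signature"]
print((end - begin) // 4)  # 0x940 bytes / 4 bytes per word = 592

# Toy memory contents standing in for a read-back of the DUT's data memory.
memory = {0x3000: 0xDEADBEEF, 0x3004: 0x2A}
signature = dump_signature(memory, begin, end)
print(signature[:3])  # ['deadbeef', '0000002a', '00000000']
```

With the `add-01.S` bounds, the region is 0x940 bytes, i.e. 592 words, so under these assumptions the DUT would have to emit 592 signature lines for RISCOF to compare against `Reference-rv32emu.signature`.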
  However, the DUT does not yet produce matching signature contents, and the underlying cause remains unresolved at this stage.

---

## **2. The Minimal CPU (0-minimal)**

### **2.1 Supported Instructions**

The 0-minimal CPU is designed only to run the **`jit.asmbin` self-modifying program**, so it implements exactly the five RV32I instructions that this workflow requires:

1. **AUIPC (Add Upper Immediate to PC)**
   Used for **PC-relative address generation**, e.g., computing the address of the JIT code buffer.
   `rd = PC + (imm20 << 12)`

2. **ADDI (Add Immediate)**
   Used for **simple arithmetic, address increments, loop counters, and register initialization**, including setting the final result `a0 = 42`.
   `rd = rs1 + imm12` (immediate sign-extended)

3. **LW (Load Word)**
   Used to **load 32-bit words from memory**; here, encoded instructions are treated as data so they can be copied into the executable buffer.
   `rd = Mem[rs1 + offset]`

4. **SW (Store Word)**
   Used to **store 32-bit words to memory**. In `jit.asmbin`, this is the core mechanism that **writes the copied instructions into the JIT code buffer**, enabling self-modifying code.
   `Mem[rs1 + offset] = rs2`

5. **JALR (Jump and Link Register)**
   Used for **indirect control flow** (entering the buffer and returning). The JIT program jumps to the buffer at **0x102c** and later returns via `jalr`.
   `rd = PC + 4; PC = (rs1 + imm) & ~1`

#### How These Five Instructions Implement `jit.asmbin`

As described in Lab3, `jit.asmbin` demonstrates self-modifying code by **copying instructions into an executable buffer, jumping to the copied code, executing it, and returning with `a0 = 42`.** The five instructions map onto this workflow as follows:

* **AUIPC** computes the **JIT buffer address** (PC-relative).
* **LW** reads the source instructions (treated as data).
* **SW** writes those instruction words into the **code buffer** (the key self-modifying step).
* **ADDI** updates pointers/counters for the copy loop and sets the final result **`a0 = 42`**.
* **JALR** transfers control into the buffer and later returns to the caller.

Overall, although the 0-minimal CPU is intentionally limited, these five instructions are sufficient to execute the complete JIT self-modifying workflow required by the lab.

---

### **2.2 Architecture Overview**

### **2.3 Execution Flow & Waveform Study**

#### Parse `trace.vcd`

The VCD analysis reports **Overall Status: PASS**, meaning the CPU behaved as expected. It shows that the PC stayed in the JIT code buffer (starting at 0x102c) for 999,978 of the 1,000,000 sampled cycles, that the expected memory writes were detected (2 stores), and, most importantly, that the architectural result is `a0` (x10) = 42.

```
Parsed 5 relevant signals.
======================================================================
VCD Trace Analysis Report - 0-minimal RISC-V CPU
======================================================================

Overall Status: [PASS]

Key Findings:
  - JIT Code Execution: OK (999978 cycles at buffer)
  - Register a0 == 42: YES
  - Memory Writes Detected: YES (2 writes)

Detailed Statistics:
  - PC Samples: 1000000
  - Max PC Address: 0x00001030
  - Register Writes: 499998
  - Writes to a0 (x10): 249995

Expected Memory Layout:
  - Entry Point: 0x00001000
  - JIT Code Buffer: 0x0000102c
  - JIT Instructions: 0x00001034

Interpretation:
  - [OK] CPU successfully executed the JIT self-modifying code.
  - Note: ChiselTest validates a0=42 via a separate debug interface.
======================================================================
```

#### Waveform in Surfer

![image](https://hackmd.io/_uploads/SyzX26P7Wx.png)

#### What I saw (Surfer waveform)

| Signal                   | Value / Observation                                                 |
| ------------------------ | ------------------------------------------------------------------- |
| `io_instruction_address` | **0x102c / 0x1030** repeating                                       |
| `io_instruction`         | **0x02A00513, 0x00008067, 0x00001300…**                             |
| `io_jump_flag_id`        | **0** at the inspected cycle (no jump)                              |
| `io_jump_address_id`     | sometimes **0x0000002a** (an ALU-computed value, not a jump target) |
| `pc`                     | **0x102c → 0x1030 → 0x102c → …** repeating                          |

---

#### What it means (signal interpretation)

1) `io_instruction[31:0]`

This is the **instruction currently being decoded** by the CPU.

**Example A: `0x02A00513`**

* Decodes to: `addi a0, zero, 42`
* Fields: `rd = x10 (a0)`, `rs1 = x0`, `imm = 42`
* Meaning: this instruction **sets `a0 = 42`**, which matches the JIT program’s intended behavior.

**Example B: `0x00008067`**

* Decodes to: `jalr x0, ra, 0`
* Meaning: this is the **return instruction** at the end of the JIT code.

---

2) `io_instruction_address[31:0]`

This is the **PC value output from the IF stage** (the memory address of the instruction being fetched).

The waveform shows `0x0000102c` and `0x00001030` alternating repeatedly.

Meaning:

* The CPU is **fetching instructions from the JIT code buffer**.
* The repeated 0x102c ↔ 0x1030 pattern indicates that the CPU is **executing a loop inside the JIT buffer**.

---

3) `io_instruction_read_data[31:0]`

This is the **instruction word returned by memory**.

* It should match `io_instruction` (when `instruction_valid = 1`).
* Surfer shows consistent values:

  ```
  02A00513
  00008067
  00001300
  ```

Meaning:

* The **IF → ID path is working correctly** (fetch and decode are consistent).

---

4) `io_instruction_valid`

Surfer shows:

* `instruction_valid = 1`

Meaning:

* Instruction memory loading (InstructionROM → RAM) has **completed** (`load_finished`).
* The CPU is **no longer stalled**.

Rule of thumb:

* `valid = 0` → PC held (stall)
* `valid = 1` → CPU fetches / increments / jumps normally

---

5) `io_jump_address_id[31:0]` and `io_jump_flag_id`

Surfer shows:

* `io_jump_address_id = 0x0000002a`
* `io_jump_flag_id = 0`

Meaning:

* `0x0000002a` (decimal 42) is **not a taken jump target**; it is an **ALU-computed value**, consistent with the result of `addi a0, zero, 42` (0 + 42).
* Since `io_jump_flag_id = 0` at this cycle, the control logic indicates **“do not jump”** (no JAL/JALR taken, no taken branch). The flag must be asserted during the `jalr` cycle at 0x1030 for the PC to return to 0x102c.

---

6) `pc[31:0]`

The waveform shows:

* `0x102c → 0x1030 → 0x102c → 0x1030 → …`

Meaning:

* The CPU is executing code that has been **copied into the JIT buffer**.
* The `jalr x0, ra, 0` at 0x1030 jumps back to 0x102c (so `ra` points to 0x102c) and re-enters the buffer; the PC therefore **bounces between these two addresses**.

---

7) `reset`, `clock`

Surfer shows:

* `reset = 0`
* `clock` toggling

Meaning:

* The CPU has exited reset and is **running normally**.

---

## **3. Single-Cycle CPU (1-single-cycle)**

### **3.1 Core Architecture**

The CPU is a single-cycle design whose datapath is organized into five logical stages: **Fetch (IF)**, **Decode (ID)**, **Execute (EX)**, **Memory (MEM)**, and **Write-back (WB)**. All five stages complete within one clock cycle.

### **3.2 Control Logic**

I implemented the mapping between instruction fields and hardware control signals:

* **Control Signal Generation**: In `InstructionDecode.scala`, I defined the logic for `wbSource`, `aluOp1Sel`, and `aluOp2Sel`. For example, `aluOp1Sel` selects `InstructionAddress` for branches and JAL to support PC-relative addressing.
* **ALU Control**: In `ALUControl.scala`, I used `funct3` and `funct7(5)` to distinguish similar instructions such as `ADD/SUB` and `SRL/SRA`.

### **3.3 Instruction Flow and Waveform Examples**

* **Instruction Fetch (IF) Stage:** The program counter (PC) is initialized to the entry address (0x1000) and updates once per cycle when an instruction is available.
  I completed the PC update logic in `InstructionFetch`, which selects between sequential execution and control-flow redirection with a multiplexer:

  * **If `jump_flag_id` = 1:** the PC jumps to `jump_address_id`.
  * **Otherwise:** the PC advances to the next instruction (`pc + 4`).
  * **When `instruction_valid` = 0:** the PC holds its value and a NOP (`0x00000013`) is injected to prevent incorrect execution.

* **Decode (ID) Stage:** I implemented the control unit that generates control signals from the instruction opcode. A critical part of this stage is the **Immediate Generator (Exercise 1)**, which handles five immediate formats. For J-type instructions (like `JAL`), for example, I implemented the bit reordering that reconstructs the 21-bit immediate as `{inst[31], inst[19:12], inst[20], inst[30:21], 0}`, which is then sign-extended to 32 bits. This ensures that `ex_immediate` is ready for the Execute stage.

* **Execute (EX) Stage:** The ALU performs the core computation based on the signals from `ALUControl`, which distinguishes similar instructions such as `ADD/SUB` or `SRL/SRA` using `funct7(5)`. For the `JALR` instruction **(Exercise 5)**, I implemented the target-address calculation as `(rs1 + immediate) & ~1`. In the waveform, when `JALR` executes, `io_if_jump_flag` is asserted and `jalrTarget` is sent back to the IF stage to update the PC.

* **Memory Access (MEM) Stage:** This stage handles data alignment for loads and stores **(Exercises 6 & 7)**. For byte stores (`SB`), I implemented **write strobes** (`io.memory_bundle.write_strobe`) so that only the target byte within a 32-bit word is modified. For loads such as `LB`, the module extracts the selected byte and sign-extends it to 32 bits before passing it to the Write-back stage.

* **Write-Back (WB) Stage:** The final stage uses a multiplexer to select the data written into the register file **(Exercise 8)**.
  Based on `regs_write_source`, the module chooses among `alu_result` (for arithmetic), `memory_read_data` (for loads), and `PC + 4` (for jumps). In the waveform, `io_regs_write_data` settles within each cycle and is captured by the register file on the next rising clock edge.

### **3.4 Test Results**

![image](https://hackmd.io/_uploads/rJkgguyrZe.png)

```
WRITE_VCD=1 sbt test
surfer /Users/suniachiu/ca2025-mycpu/1-single-cycle/test_run_dir/InstructionFetch_should_correctly_update_PC_and_handle_jumps/InstructionFetch.vcd
```

![image](https://hackmd.io/_uploads/SJP-ockBWg.png)

```
make verilator
./run-verilator.sh -instruction src/main/resources/fibonacci.asmbin -time 2000 -vcd dump.vcd
surfer dump.vcd
```

![image](https://hackmd.io/_uploads/B1BJlRkH-e.png)

* **Verilator**

![image](https://hackmd.io/_uploads/S1Ksn1xBWe.png)

The Verilator-generated waveforms show that each stage of the single-cycle CPU behaves correctly.

In the Instruction Fetch (IF) stage, `io_instruction_address` starts at `0x1060` and increments by 4 each cycle, while `io_instruction` updates accordingly, indicating that the program counter (PC) and the instruction fetch mechanism work as expected.

In the Instruction Decode (ID) stage, the CPU correctly identifies instruction types such as `load`, `store`, and `addi`, and generates the appropriate control signals. During the Execute (EX) stage, the ALU computes the stack-relative addresses used by the subsequent memory accesses.

In the Memory (MEM) stage, `write_enable` is asserted only for `store` instructions, and the correct data is written to memory. For `load` instructions, previously stored values are read back successfully, demonstrating that the data memory interface operates correctly.
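As a cross-check of the per-stage behavior described above, the following minimal Python sketch models one single-cycle step for just ADDI, LW and SW. Everything in it is illustrative: the `step` function and its state dictionary are not the Chisel module interfaces, and the three-instruction program at 0x1060 with its initial `sp`/`ra` values is a hypothetical stack save/restore, not the actual contents of `fibonacci.asmbin`.

```python
MASK32 = 0xFFFFFFFF
OP_IMM, OP_LOAD, OP_STORE = 0x13, 0x03, 0x23  # ADDI, LW, SW opcodes


def sext(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def step(state):
    """Run one instruction through IF/ID/EX/MEM/WB; return the store write-enable."""
    pc, regs, mem = state["pc"], state["regs"], state["mem"]
    inst = mem[pc]                                # IF: fetch the word at PC
    opcode = inst & 0x7F                          # ID: split the fields
    if opcode not in (OP_IMM, OP_LOAD, OP_STORE):
        raise ValueError(f"unsupported opcode {opcode:#x}")
    rd = (inst >> 7) & 0x1F
    rs1 = (inst >> 15) & 0x1F
    rs2 = (inst >> 20) & 0x1F
    if opcode == OP_STORE:                        # S-type immediate
        imm = sext(((inst >> 25) << 5) | ((inst >> 7) & 0x1F), 12)
    else:                                         # I-type immediate
        imm = sext(inst >> 20, 12)
    alu_result = (regs[rs1] + imm) & MASK32       # EX: sum or memory address
    write_enable = opcode == OP_STORE             # MEM: only stores write memory
    if write_enable:
        mem[alu_result] = regs[rs2]
    if opcode == OP_IMM:                          # WB: select the write-back source
        regs_write_data = alu_result
    elif opcode == OP_LOAD:
        regs_write_data = mem.get(alu_result, 0)
    else:
        regs_write_data = None                    # SW writes no register
    if regs_write_data is not None and rd != 0:   # x0 stays hard-wired to zero
        regs[rd] = regs_write_data
    state["pc"] = (pc + 4) & MASK32               # no jumps in this subset
    return write_enable


regs = [0] * 32
regs[1] = 0x1234  # ra: arbitrary example value
regs[2] = 0x4000  # sp: arbitrary example stack pointer
state = {"pc": 0x1060, "regs": regs, "mem": {    # unified toy memory
    0x1060: 0xFF010113,  # addi sp, sp, -16
    0x1064: 0x00112623,  # sw   ra, 12(sp)
    0x1068: 0x00C12503,  # lw   a0, 12(sp)
}}

enables = []
for _ in range(3):
    pc = state["pc"]
    enables.append(step(state))
    print(f"pc=0x{pc:04x} write_enable={int(enables[-1])}")
print(hex(regs[2]), hex(regs[10]))  # 0x3ff0 0x1234
```

Each step advances the PC by 4, only the store asserts `write_enable`, and the load returns the value the store wrote, mirroring the waveform observations. In the hardware, this whole sequence is combinational within one clock period, with the register file and memory writes committed at the clock edge.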
Although Write Back (WB) signals are not directly shown in the waveform, the values returned by the `load` instructions suggest that data is correctly written back to the register file.

Overall, the waveform is consistent with each instruction completing the IF, ID, EX, MEM, and WB stages within a single clock cycle, as expected for a single-cycle CPU, and indicates that the program executes correctly at the instruction level over the inspected window.

---

## **4. MMIO, Trap, and Interrupt Support (2-mmio-trap)**

### **4.1 CSR Implementation**

### **4.2 Interrupt Handling Flow**

### **4.3 MMIO Devices**

### **4.4 Nyancat VGA Demo**

---

## **5. Pipelined CPU (3-pipeline)**

### **5.1 Pipeline Versions and Comparison**

### **5.2 Data Hazards**

### **5.3 Control Hazards**

### **5.4 Waveform-Based Hazard Analysis**

### **5.5 Pipeline Test Suite**

---

## **6. RISC-V Architectural Compliance (Optional but High-Value)**

---

## **7. Debugging, Challenges, and Iterative Improvements**

* **Issue: VTop Cannot Open Instruction Binary (Unresolved)**: During simulation, `VTop` also failed to open the specified instruction binary. When the simulator was run with the `-instruction` option, it reported:

  > *Could not open file `<instruction binary>`*

  This issue was traced to the interaction between relative paths, working directories, and how `VTop` resolves the instruction binary location at runtime. Although the instruction binary (`*.asmbin`) exists in the expected directory, `VTop` may be invoked from a different working directory (e.g., `verilog/verilator/obj_dir`), causing relative paths to resolve incorrectly.

  As a workaround, absolute paths were used when invoking `VTop` manually, which allowed the simulation to proceed.
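One way to make the absolute-path workaround systematic is to resolve the binary path before launching the simulator, so the working directory no longer matters. The Python helper below is a hypothetical sketch: `build_vtop_command` and its `project_root` convention are illustrative assumptions, and only the `-instruction` option comes from the observed `VTop` usage.

```python
import tempfile
from pathlib import Path


def build_vtop_command(vtop, instruction, project_root):
    """Build a VTop argv whose -instruction path is absolute.

    Relative `instruction` paths are resolved against `project_root`
    (an assumed convention) instead of the process's current directory.
    """
    binary = Path(instruction)
    if not binary.is_absolute():
        binary = Path(project_root) / binary
    binary = binary.resolve()
    if not binary.is_file():
        # Fail early, before the simulator starts, with a similar message.
        raise FileNotFoundError(f"Could not open file {binary}")
    return [str(Path(vtop).resolve()), "-instruction", str(binary)]


# Usage with a throwaway directory standing in for the project root.
with tempfile.TemporaryDirectory() as root:
    resources = Path(root, "src", "main", "resources")
    resources.mkdir(parents=True)
    (resources / "fibonacci.asmbin").write_bytes(b"\x13\x00\x00\x00")  # one NOP word, little-endian
    expected = str((resources / "fibonacci.asmbin").resolve())

    cmd = build_vtop_command("obj_dir/VTop", "src/main/resources/fibonacci.asmbin", root)
    print(cmd[1:])

    try:
        build_vtop_command("obj_dir/VTop", "missing.asmbin", root)
        missing_reported = False
    except FileNotFoundError as err:
        missing_reported = True
        print(err)
```

The returned list could then be passed to `subprocess.run(cmd, check=True)`; whether the RISCOF DUT plugin can call such a helper depends on how it launches `VTop`, which this sketch does not cover.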
  However, this workaround has not yet been integrated into the automated RISCOF compliance flow, so the path-resolution issue may still affect DUT execution when binaries are generated and launched by the test framework.

---

## **8. Summary and Reflection**

---

## **9. Bonus: Enhanced ECALL and EBREAK (Optional)**