# arch2025 - homework3
> Name: Cheng-Han, Zhu
> Source Code: [GitHub](https://github.com/n26141876/neil-ca2025-mycpu/tree/main)
## 1\. Overview
In this assignment, I successfully constructed a complete 32-bit RISC-V processor using Chisel. The project evolved through three major phases: implementing a basic **Single-Cycle** core, extending it with **MMIO and Interrupt/Trap** support, and finally optimizing it into a **5-Stage Pipelined** architecture with full hazard handling.
The CPU fully complies with the **RV32I** base instruction set and the **Zicsr** extension, verified against the official **RISC-V Architectural Test Suite (RISCOF)**.
## 2\. Development Environment & Toolchain Setup
One of the most significant challenges in this project was setting up a compatible development environment.
### 2.1 The Toolchain Compatibility Issue
Initially, the legacy `riscv64-unknown-elf-gcc` toolchain provided in the environment did not support the `-march=rv32i_zicsr` flag required by the latest `riscof` (1.25.3) framework. This caused persistent compilation failures during compliance testing ("unsupported ISA subset 'z'").
### 2.2 Solution: Toolchain Upgrade & Symlinking
To resolve this, I manually installed a modern toolchain (`riscv32-elf-gcc`) to `/opt/riscv-new-toolchain`. However, the Chisel test suite (SBT) hardcoded the toolchain prefix as `riscv64-unknown-elf-`.
**My Solution:**
I created symbolic links (symlinks) in the new toolchain's `bin` directory to alias the `riscv32-elf-*` executables to `riscv64-unknown-elf-*`. This successfully "tricked" both SBT and RISCOF into using the modern compiler while satisfying the naming requirements of the legacy scripts.
**Command used for fix:**
```bash
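# Run inside the new toolchain's bin directory so the glob matches its executables
cd /opt/riscv-new-toolchain/bin && sudo bash -c 'for file in riscv32-elf-*; do ln -sf "$file" "riscv64-unknown-elf-${file#riscv32-elf-}"; done'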
```
-----
## 3\. Phase 1: Single-Cycle CPU
The first phase focused on implementing the fundamental data path and control logic for the RV32I instruction set.
### 3.1 Implementation Details
* **Instruction Fetch (IF):** Implemented PC update logic to handle sequential execution (`PC+4`) and unconditional jumps (`JAL`, `JALR`).
* **Instruction Decode (ID):** Implemented control signal generation for ALU operations, memory access (`MemRead`, `MemWrite`), and register write-back.
* **Execute (EX):** Integrated the ALU to support all arithmetic, logical, and shift operations. Added branch comparison logic.
* **Write Back (WB):** Implemented a Multiplexer to select data from ALU, Data Memory, or PC+4 (for linking).
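
The PC-update and write-back selections listed above can be modeled in a few lines of Python. This is a behavioral sketch, not the Chisel source: the function names `next_pc` and `writeback_value` and the `WB_*` select encoding are hypothetical, chosen only for illustration.

```python
MASK32 = 0xFFFFFFFF

def next_pc(pc, is_jal=False, is_jalr=False, branch_taken=False, imm=0, rs1=0):
    """Pick the next PC for one instruction of a single-cycle RV32I core."""
    if is_jalr:
        # JALR: target is rs1 + imm with the lowest bit cleared
        return (rs1 + imm) & MASK32 & ~1
    if is_jal or branch_taken:
        # JAL and taken branches: PC-relative target
        return (pc + imm) & MASK32
    # Sequential execution
    return (pc + 4) & MASK32

# Hypothetical encoding of the write-back multiplexer select
WB_ALU, WB_MEM, WB_PC4 = 0, 1, 2

def writeback_value(wb_sel, alu_result, mem_data, pc):
    """Choose the value written to the register file."""
    if wb_sel == WB_MEM:
        return mem_data              # loads
    if wb_sel == WB_PC4:
        return (pc + 4) & MASK32     # link address for JAL/JALR
    return alu_result                # arithmetic and logic results
```

For example, `next_pc(0x1000, is_jalr=True, rs1=0x2001, imm=4)` clears bit 0 of the sum and yields `0x2004`.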
### 3.2 Verification
* **Unit Tests:** Passed `FibonacciTest`, `QuicksortTest`, and `ByteAccessTest`.
* **Compliance Tests:** Passed all 41 tests in the `rv32i` suite.
* **`make test`:** (screenshot of the passing test run)
* **`make compliance`:** (screenshot of the passing compliance run)
-----
## 4\. Phase 2: MMIO & Trap Support
This phase transformed the calculation core into a system-capable processor by adding **CSRs (Control and Status Registers)**, **CLINT (Core Local Interruptor)**, and **MMIO (Memory Mapped I/O)**.
### 4.1 Implementation Details
* **CSR Module:** Implemented machine-mode CSRs (`mstatus`, `mie`, `mip`, `mcause`, `mepc`, `mtvec`) with atomic read/write logic.
* **CLINT:** Integrated a timer interrupt generator. The CLINT asserts an interrupt signal when `mtime >= mtimecmp`.
* **Trap Handling Logic:**
    * **Trap Entry:** When an interrupt (Timer/External) or exception (`ecall`) occurs, the hardware automatically saves the current PC to `mepc`, updates `mcause`, disables interrupts in `mstatus`, and jumps to the handler address in `mtvec`.
    * **Trap Return (`mret`):** Restores the PC from `mepc` and re-enables interrupts.
* **Peripherals:** Integrated UART and Timer modules mapped to specific memory addresses.
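
The trap entry and return sequence can be sketched as a small Python model. It follows the RISC-V privileged specification for a machine-mode-only core (`mstatus.MIE` is bit 3, `mstatus.MPIE` is bit 7) and assumes `mtvec` is in direct mode. The dictionary-based CSR file and the function names are hypothetical and may differ from the actual Chisel modules.

```python
MSTATUS_MIE = 1 << 3                 # global machine interrupt enable
MSTATUS_MPIE = 1 << 7                # MIE value saved on trap entry
MCAUSE_MACHINE_TIMER = 0x80000007    # interrupt bit set, cause code 7
MCAUSE_ECALL_M = 11                  # environment call from M-mode

def timer_pending(mtime, mtimecmp):
    """CLINT comparison: the timer interrupt is raised once mtime reaches mtimecmp."""
    return mtime >= mtimecmp

def trap_enter(csr, pc, cause):
    """Return (updated CSRs, next PC) for a trap taken at `pc`."""
    old_mie = (csr["mstatus"] >> 3) & 1
    mstatus = csr["mstatus"] & ~(MSTATUS_MIE | MSTATUS_MPIE)  # MIE <- 0
    mstatus |= old_mie << 7                                   # MPIE <- old MIE
    updated = dict(csr, mstatus=mstatus, mepc=pc, mcause=cause)
    return updated, csr["mtvec"] & ~0x3   # assumption: direct mode, base address only

def trap_return(csr):
    """Return (updated CSRs, next PC) for an mret instruction."""
    mpie = (csr["mstatus"] >> 7) & 1
    mstatus = csr["mstatus"] & ~MSTATUS_MIE
    mstatus |= mpie << 3                  # MIE <- MPIE
    mstatus |= MSTATUS_MPIE               # MPIE <- 1
    return dict(csr, mstatus=mstatus), csr["mepc"]
```

Starting from `mstatus = 0x8` (interrupts enabled), `trap_enter` leaves `mstatus = 0x80`, and a following `trap_return` gives `0x88` and resumes at the saved `mepc`.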
### 4.2 Verification
* **Unit Tests:** Passed `InterruptTrapTest` (verified jump to trap handler), `TimerTest`, and `UartMMIOTest`.
* **Compliance Tests:** Passed all 119 tests covering `RV32I` and `Zicsr` extensions.
* **`make test`:** (screenshot of the passing test run)
* **`make compliance`:** (screenshot of the passing compliance run)
-----
## 5\. Phase 3: 5-Stage Pipelined CPU
The final phase involved pipelining the datapath into **IF, ID, EX, MEM, WB** stages to improve throughput.
### 5.1 Handling Hazards
To ensure correctness, I implemented a **Hazard Unit** and **Forwarding Unit**:
1. **Data Hazards (RAW):**
    * **Forwarding:** Data from the MEM or WB stage is forwarded directly to the EX stage if the source register matches the destination register of a previous instruction. This eliminates bubbles for most arithmetic sequences.
    * **Load-Use Hazard:** Load data only becomes available at the end of the MEM stage, so it cannot be forwarded in time to an instruction that immediately follows the Load. In that case the dependent instruction is held in the ID stage for one cycle (a **stall**), after which forwarding supplies the value.
2. **Control Hazards:**
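    * **Flush:** When a branch is taken or a jump occurs, the instruction fetched from the wrong path is flushed (converted to a NOP). As analyzed in Section 5.2, this design resolves branches and jumps in the ID stage, so only the instruction in the IF stage has to be discarded.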
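
The forwarding decision for one EX-stage source operand can be sketched as follows. It uses the standard textbook priority (the MEM stage holds the younger producer, so it wins over WB) and ignores writes to `x0`. The function name and the `FWD_*` encoding are hypothetical, and the actual Forwarding Unit may be organized differently.

```python
# Hypothetical encoding of the forwarding multiplexer select
FWD_NONE, FWD_FROM_MEM, FWD_FROM_WB = 0, 1, 2

def forward_select(rs, mem_rd, mem_reg_write, wb_rd, wb_reg_write):
    """Choose where an EX-stage source operand `rs` should come from."""
    # x0 is hardwired to zero, so a producer targeting x0 is never forwarded
    if mem_reg_write and mem_rd != 0 and mem_rd == rs:
        return FWD_FROM_MEM      # most recent producer takes priority
    if wb_reg_write and wb_rd != 0 and wb_rd == rs:
        return FWD_FROM_WB
    return FWD_NONE              # use the value read from the register file
```

If both the MEM and WB stages write the same register, `forward_select` returns `FWD_FROM_MEM`, because that instruction is later in program order and holds the newer value.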
### 5.2 Hazard Analysis (Exercise 21)
In this section, I analyze the logic behind the Hazard Detection mechanism implemented in the Control and HazardUnit modules.
1. **Why do we need to stall for load-use hazards?**
> A Load retrieves its data from memory in the MEM stage, but the immediately following instruction needs that data at the start of its EX stage, which falls in the same cycle. Even with forwarding, the data cannot travel back in time from the end of the MEM stage to the start of the EX stage of the next instruction. Therefore, we must insert a 1-cycle stall to wait for the data to be ready.
2. **What is the difference between "stall" and "flush" operations?**
> **Stall:** Freezes the PC and pipeline registers (keeping the instruction in the current stage) to delay execution. This effectively "pauses" the pipeline.
>
> **Flush:** Clears the pipeline register (setting it to NOP/zero), effectively discarding the instruction currently in that stage. This is used when an instruction shouldn't be executed (e.g., wrong branch path).
3. **Why does jump instruction with register dependency need stall?**
> Indirect jumps (like JALR) calculate the target address in the ID stage using a register value. If the instruction that produces that register is still in the EX stage (e.g., an ALU operation computing the value, or a Load that has not yet accessed memory), the value is not yet available in the ID stage. Therefore, the pipeline must stall until the value is forwarded or written back.
4. **In this design, why is branch penalty only 1 cycle instead of 2?**
> In our design, the branch decision (taken/not taken) and target address calculation are resolved in the ID stage (or data is forwarded to ID for comparison). Since we detect the branch direction early in the decode stage, we only need to flush the one incorrect instruction that was just fetched (in the IF stage). If we resolved it in the EX stage, we would need to flush two instructions.
5. **What would happen if we removed the hazard detection logic entirely?**
> **Data Hazards (RAW):** Instructions would read outdated values from registers before previous instructions have written the new results, leading to incorrect calculations.
>
> **Control Hazards:** The CPU would execute instructions immediately following a taken branch or jump, corrupting the program state with instructions that should have been skipped.
6. **Stall and Flush Conditions Summary**
- Stall is needed when:
    - The instruction in the EX stage is a Load AND its destination register matches one of the source registers in the ID stage.
    - The instruction in the ID stage is JALR AND its source register depends on the destination of the instruction in the EX stage.
- Flush is needed when:
    - A Branch is taken OR a Jump (JAL/JALR) is taken (Control Hazard detected).
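
The stall and flush conditions summarized above can be written as one small decision function. This is a sketch under stated assumptions rather than the actual `Control` module: all signal names are hypothetical, every instruction is treated as if it used both source registers, and a stalled instruction is assumed not to trigger a flush until it is allowed to proceed.

```python
def hazard_control(ex_is_load, ex_reg_write, ex_rd,
                   id_rs1, id_rs2, id_is_jalr, id_redirect):
    """Return the stall/flush decision for the current cycle.

    id_redirect is True when the ID stage holds a taken branch or a jump.
    """
    # A dependency on x0 is never a hazard because x0 is always zero
    depends_rs1 = ex_rd != 0 and ex_rd == id_rs1
    depends_rs2 = ex_rd != 0 and ex_rd == id_rs2

    # Condition 1: load-use hazard
    load_use = ex_is_load and (depends_rs1 or depends_rs2)
    # Condition 2: JALR in ID needs a register still being produced in EX
    jalr_dep = id_is_jalr and ex_reg_write and depends_rs1

    stall = load_use or jalr_dep
    # Assumption: while stalled, the redirect is not yet resolved, so no flush
    flush_if = id_redirect and not stall
    return {"stall": stall, "flush_if": flush_if}
```

A Load to `x5` in EX followed by an instruction reading `x5` in ID gives `{"stall": True, "flush_if": False}`, while a taken branch with no dependency gives `{"stall": False, "flush_if": True}`.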
### 5.3 Verification
* **Unit Tests:** Passed `FibonacciTest` and `QuicksortTest`, showing that data dependencies and recursion are handled correctly by the pipeline logic.
* **Compliance Tests:** Passed all architectural tests.
* **`make test`:** (screenshot of the passing test run)
* **`make compliance`:** (screenshot of the passing compliance run)
-----
## 6\. Custom Application Integration (Homework 2)
I successfully ported the C/Assembly program from Homework 2 (`main.asmbin`) to the Chisel-based CPU.
* **Integration:** I added a new test class `MainTest` in `CPUTest.scala` that loads the compiled `main.asmbin` into the CPU's instruction memory.
* **Execution & Analysis:** The program was executed on the 5-Stage Pipelined CPU. The CPU correctly handled the data dependencies within the assembly code. Specifically, the Forwarding Unit was active during the arithmetic sequences, allowing the pipeline to execute without inserting unnecessary bubbles.
* **Verification:** The correctness of the execution is verified by the waveform below. As shown in the highlighted section, when `io_write_enable` is high, the destination register address `io_write_address` becomes `0xA` (register `a0`), and the write data `io_write_data` is `0xA` (decimal 10). This matches the expected result of the arithmetic sequence calculated by the program.
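
Reading the waveform requires mapping the register index on `io_write_address` to its ABI name. The sketch below does that lookup and checks a single sample of the write port. The sample dictionary simply reuses the three signal names and values quoted above, and the helper `wrote` is hypothetical.

```python
# RV32I integer register ABI names, indexed by register number x0..x31
ABI_NAMES = (
    ["zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2", "s0", "s1"]
    + [f"a{i}" for i in range(8)]        # x10..x17
    + [f"s{i}" for i in range(2, 12)]    # x18..x27
    + [f"t{i}" for i in range(3, 7)]     # x28..x31
)

def wrote(sample, reg_name, value):
    """True if this write-port sample commits `value` to register `reg_name`."""
    return (sample["io_write_enable"] == 1
            and ABI_NAMES[sample["io_write_address"]] == reg_name
            and sample["io_write_data"] == value)

# Values taken from the highlighted waveform section described above
sample = {"io_write_enable": 1, "io_write_address": 0xA, "io_write_data": 0xA}
print(wrote(sample, "a0", 10))
```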

-----
## 7\. Conclusion
This project provided a deep dive into computer architecture. I learned not only how to implement the logical structure of a CPU (Pipeline, Hazard handling, CSRs) but also how to navigate complex toolchain environments. Successfully passing the RISC-V architectural compliance suite gives me high confidence in the correctness of my design.