# Assignment3: Your Own RISC-V CPU

## 1-single-cycle

### Results

:::spoiler Outputs
```
[info] InstructionDecoderTest:
[info] InstructionDecoder
[info] - should decode RV32I instructions and generate correct control signals
[info] ByteAccessTest:
[info] Single Cycle CPU - Integration Tests
[info] - should correctly handle byte-level store/load operations (SB/LB)
[info] InstructionFetchTest:
[info] InstructionFetch
[info] - should correctly update PC and handle jumps
[info] ExecuteTest:
[info] Execute
[info] - should execute ALU operations and branch logic correctly
[info] FibonacciTest:
[info] Single Cycle CPU - Integration Tests
[info] - should correctly execute recursive Fibonacci(10) program
[info] RegisterFileTest:
[info] RegisterFile
[info] - should correctly read previously written register values
[info] - should keep x0 hardwired to zero (RISC-V compliance)
[info] - should support write-through (read during write cycle)
[info] QuicksortTest:
[info] Single Cycle CPU - Integration Tests
[info] - should correctly execute Quicksort algorithm on 10 numbers
[info] Run completed in 43 seconds, 927 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
:::

![image](https://hackmd.io/_uploads/BkfrnpQMZe.png)

:::success
After the reset signal is pulled low, the CPU fetches the first instruction `0x00001197` at `0x1000`, which is `auipc x3, 1`. `regs_io_data` therefore carries `0x1000 (pc) + 0x1000 (upper imm) = 0x2000`, the value written back to `x3` of `RegFile`.
:::

## 2-mmio-trap

### Results

:::spoiler Outputs
```
[info] ByteAccessTest:
[info] [CPU] Byte access program
[info] - should store and load single byte
[info] CLINTCSRTest:
[info] [CLINT] Machine-mode interrupt flow
[info] - should handle external interrupt
[info] - should handle environmental instructions
[info] UartMMIOTest:
[info] [UART] Comprehensive TX+RX test
[info] - should pass all TX and RX tests
[info] ExecuteTest:
[info] [Execute] CSR write-back
[info] - should produce correct data for csr write
[info] FibonacciTest:
[info] [CPU] Fibonacci program
[info] - should calculate recursively fibonacci(10)
[info] TimerTest:
[info] [Timer] MMIO registers
[info] - should read and write the limit
[info] InterruptTrapTest:
[info] [CPU] Interrupt trap flow
[info] - should jump to trap handler and then return
[info] QuicksortTest:
[info] [CPU] Quicksort program
[info] - should quicksort 10 numbers
[info] Run completed in 45 seconds, 491 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
:::

![image](https://hackmd.io/_uploads/Bkv0_8SfZx.png)

:::success
We can observe that three `Zicsr` instructions are executed:
1. `0x30529373: csrrw x6, mtvec, x5`
2. `0x3002a073: csrrs x0, mstatus, x5`
3. `0x3042a073: csrrs x0, mie, x5`

The CSRs are updated from the contents of `x5`: `csrrw` writes the value directly, while `csrrs` sets the bits that are 1 in `x5`.
:::

### Nyancat Animation

![Recording 2025-12-08 200334](https://hackmd.io/_uploads/HJiwJBEz-e.gif)

#### Compression

## 3-pipeline

#### Hazard Detection Analysis with Waveforms

:::info
#### Why do we need to stall for load-use hazards?
(Hint: Consider data dependency and forwarding limitations)

A load's data is not available until the end of the MEM stage, but an instruction immediately following it needs that value during its EX stage, which overlaps the load's MEM stage. Forwarding cannot supply a value that has not yet been read from memory, so the pipeline must stall until the loaded data can be forwarded.
:::

:::info
#### What is the difference between "stall" and "flush" operations?
(Hint: Compare their effects on pipeline registers and PC)

A stall holds the current values in the pipeline registers and the PC, so no new instruction enters the stalled stages. A flush clears a pipeline register, turning its instruction into a bubble, to discard an incorrect or unwanted instruction.
:::

:::info
#### Why does jump instruction with register dependency need stall?
(Hint: When is jump target address available?)

The jump target address is computed from a register value that may not be available yet if a previous instruction is still producing it. The pipeline must stall until the correct value is ready so that the jump goes to the correct address.
:::

:::info
#### In this design, why is branch penalty only 1 cycle instead of 2?
(Hint: Compare ID-stage vs EX-stage branch resolution)

Branches are resolved in the ID stage rather than the EX stage. When a branch is taken, only the instruction in the IF stage has been fetched down the wrong path, so only that one instruction is flushed and the penalty is a single cycle.
:::

:::info
#### What would happen if we removed the hazard detection logic entirely?
(Hint: Consider data hazards and control flow correctness)

The processor would execute instructions incorrectly because data and control hazards would go unresolved. Instructions would read stale or invalid register values, and wrong-path instructions after branches and jumps would not be discarded, leading to incorrect results, unpredictable behavior, and possible program crashes.
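
As a rough illustration of the load-use case discussed above, the stall condition can be modelled in plain Scala (not Chisel). This is only a sketch under stated assumptions: a classic five-stage pipeline with forwarding, and hypothetical names (`IdStage`, `ExStage`, `HazardControl`, `HazardModel.detectLoadUse`) that do not come from the assignment's source code.

```scala
// Simplified software model of load-use hazard detection.
// All names are illustrative and are NOT taken from the assignment's Chisel code.

// Instruction currently in ID: its source registers and whether it reads them.
final case class IdStage(rs1: Int, rs2: Int, usesRs1: Boolean, usesRs2: Boolean)

// Instruction currently in EX: its destination register and whether it is a load.
final case class ExStage(rd: Int, isLoad: Boolean)

// stall: hold the PC and the IF/ID register.
// flushIdEx: insert a bubble into ID/EX so the dependent instruction does not advance.
final case class HazardControl(stall: Boolean, flushIdEx: Boolean)

object HazardModel {
  def detectLoadUse(id: IdStage, ex: ExStage): HazardControl = {
    // x0 is hardwired to zero, so it never creates a dependency.
    val dependsOnLoad = ex.isLoad && ex.rd != 0 &&
      ((id.usesRs1 && id.rs1 == ex.rd) || (id.usesRs2 && id.rs2 == ex.rd))
    HazardControl(stall = dependsOnLoad, flushIdEx = dependsOnLoad)
  }
}
```

For example, `HazardModel.detectLoadUse(IdStage(5, 0, true, false), ExStage(5, true))` reports a stall, while the same call with `isLoad = false` does not, because an ALU result can be forwarded directly. Removing this check corresponds to always returning `HazardControl(false, false)`, in which case the dependent instruction would read a stale register value.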
:::

### Results

:::spoiler Outputs
```
[info] PipelineProgramTest:
[info] Three-stage Pipelined CPU
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
[info] Five-stage Pipelined CPU with Stalling
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
[info] Five-stage Pipelined CPU with Forwarding
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
[info] Five-stage Pipelined CPU with Reduced Branch Delay
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
[info] PipelineUartTest:
[info] Three-stage Pipelined CPU UART Comprehensive Test
[info] - should pass all TX and RX tests
[info] Five-stage Pipelined CPU with Stalling UART Comprehensive Test
[info] - should pass all TX and RX tests
[info] Five-stage Pipelined CPU with Forwarding UART Comprehensive Test
[info] - should pass all TX and RX tests
[info] Five-stage Pipelined CPU with Reduced Branch Delay UART Comprehensive Test
[info] - should pass all TX and RX tests
[info] PipelineRegisterTest:
[info] Pipeline Register
[info] - should be able to stall and flush
[info] Run completed in 2 minutes, 22 seconds.
[info] Total number of tests run: 29
[info] Suites: completed 3, aborted 0
[info] Tests: succeeded 29, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
:::

:::success
The CPU fetches the instruction `0x28c28293` (`addi x5, x5, 652`) from memory, which is initialized by `TestTopModule.scala`.
:::

![image](https://hackmd.io/_uploads/r1xcfYubbg.png)

:::success
The decoder extracts the `opcode` of `addi` (`'b0010011`), looks up the instruction type, and sign-extends the immediate field of the I-type instruction (`0x28c` = 652).
:::

![image](https://hackmd.io/_uploads/rJqCfYdW-g.png)

:::success
`ALUControl` decodes `opcode`, `funct3` and `funct7` into the control signal for the `ALU`, and `Execute` selects the operand sources. The `ALU` switch statement matches `ALUFunctions.add` and adds `op1` and `op2`.
:::

![image](https://hackmd.io/_uploads/B1yg7tO-We.png)

:::success
No memory access is needed for `addi`.
:::

![image](https://hackmd.io/_uploads/BkeZmFd--g.png)

:::success
The result `0x1298` is written back to register `x5`.
:::

![image](https://hackmd.io/_uploads/rkGf7tOWWx.png)

## Test on Assignment 2

1. Add `sqrt.S` in `3-pipeline/csrc`
2. Modify `Makefile`
```
BINS = \
    fibonacci.asmbin \
    hazard.asmbin \
    quicksort.asmbin \
    sb.asmbin \
    uart.asmbin \
    irqtrap.asmbin \
    sqrt.asmbin
```
3. Modify `PipelineProgramTest.scala`
```
it should "calculate inversed square root of 96" in {
  runProgram("sqrt.asmbin", cfg) { c =>
    // Step in chunks and poke between them to avoid the test time-out
    for (i <- 1 to 50) {
      c.clock.step(1000)
      c.io.mem_debug_read_address.poke((i * 4).U)
    }
    // Check the value stored at address 2
    c.io.mem_debug_read_address.poke(2.U)
    c.io.mem_debug_read_data.expect(0x42c0.U)
    // Run another 2000 cycles, then check the value stored at address 10
    c.io.mem_debug_read_address.poke(10.U)
    for (i <- 1 to 2000) {
      c.clock.step()
    }
    c.io.mem_debug_read_data.expect(0x3c2b.U)
  }
}
```
4. Run `make test`
```
[info] Three-stage Pipelined CPU
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should calculate inversed square root of 96
[info] - should solve data and control hazards
[info] - should handle all hazard types comprehensively
[info] - should handle machine-mode traps
```
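
As a cross-check on the instruction words quoted in the waveform notes earlier in this report, the RV32I fields can be extracted with a few shifts and masks. The following is a minimal plain-Scala sketch (not Chisel); the object `Rv32Decode` and its method names are hypothetical helpers written for this illustration and are not part of the assignment's code base.

```scala
// Field extraction for 32-bit RISC-V instruction words.
// Illustrative helper only; names are not taken from the assignment's source.
object Rv32Decode {
  def opcode(inst: Long): Int = (inst & 0x7f).toInt          // bits 6:0
  def rd(inst: Long): Int     = ((inst >> 7) & 0x1f).toInt   // bits 11:7
  def funct3(inst: Long): Int = ((inst >> 12) & 0x7).toInt   // bits 14:12
  def rs1(inst: Long): Int    = ((inst >> 15) & 0x1f).toInt  // bits 19:15

  // CSR address of a Zicsr instruction: bits 31:20, not sign-extended.
  def csr(inst: Long): Int = ((inst >> 20) & 0xfff).toInt

  // I-type immediate: bits 31:20, sign-extended from 12 bits.
  def immI(inst: Long): Int = {
    val raw = ((inst >> 20) & 0xfff).toInt
    if ((raw & 0x800) != 0) raw - 0x1000 else raw
  }

  // U-type immediate: bits 31:12 kept in place, low 12 bits zeroed.
  def immU(inst: Long): Long = inst & 0xfffff000L
}
```

With these helpers, `Rv32Decode.immI(0x28c28293L)` yields 652 (`0x28c`) with `rd` and `rs1` both 5, matching `addi x5, x5, 652`; `0x1000L + Rv32Decode.immU(0x00001197L)` yields `0x2000`, matching the `auipc x3, 1` write-back in the single-cycle waveform; and `Rv32Decode.csr(0x30529373L)` yields `0x305`, the address of `mtvec`.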