# Enhancing Homework3
I added waveform analysis for J-type and S-type in [Homework 3](https://hackmd.io/@jeremytsai/computer_architecture_hw3).
# Extend Homework3 to a 3-stage pipelined processor
contributed by < [`jeremy90307`](https://github.com/jeremy90307) >
The code for the Three-Stage processor is [ca2023-lab3_3-Stage](https://github.com/jeremy90307/ca2023-lab3_3-Stage.git), and the reference source is [YatCPU_Lab](https://github.com/hrpccs/2022-fall-yatcpu-repo/).
## Interrupt
### Introduction
Before constructing the Three-Stage CPU, it is essential to understand the concept of interrupts. Without interrupts, a program simply runs its predefined instruction sequence and cannot be preempted midway. A practical CPU, however, must always be ready to respond to external events and process them promptly.
Please refer to [Lab2 of YatCPU](https://yatcpu.sysu.tech/labs/lab2-interrupt/) for a detailed introduction.
### CSR
Control and Status Registers (CSR) are utilized to control and store the current status of various features in the CPU.
- `mstatus` register
  The `mstatus` register records the current state, such as whether interrupts are enabled.
- `mepc` register
  The `mepc` register saves the address of the instruction to be executed after the interrupt returns. When the CPU handles an interrupt, the `mepc` register is automatically set to the address of the current instruction.
- `mcause` register
  The `mcause` register stores the reason for the interrupt.
- `mtvec` register
  The `mtvec` register stores the address of the interrupt handling routine. When the CPU encounters an interrupt, the program counter (`pc`) register is automatically set to the address stored in the `mtvec` register, pointing to the interrupt handling routine.

When an interrupt occurs, the CPU needs to flush and stall the pipeline, and write interrupt-related information into CSR registers.
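
The CSR updates described above can be sketched as a small software model in plain Scala (not Chisel). This is an illustrative sketch, not the code from the repository: the `State` case class and `enterTrap` function are hypothetical names, `mtvec` is assumed to be in direct mode with its low mode bits equal to zero, and the `MIE`/`MPIE` bit positions follow the RISC-V privileged specification.

```scala
object TrapEntryModel {
  // mstatus bit positions from the RISC-V privileged specification.
  val MIE: Long  = 1L << 3 // machine interrupt enable
  val MPIE: Long = 1L << 7 // holds the previous value of MIE during a trap

  // Hypothetical container for the pc and the four CSRs discussed above.
  final case class State(pc: Long, mstatus: Long, mepc: Long, mcause: Long, mtvec: Long)

  // Models what happens when an interrupt with the given cause code is accepted.
  // Assumption: mtvec is in direct mode, so the handler address is mtvec itself.
  def enterTrap(s: State, cause: Long): State = {
    val mieWasSet = (s.mstatus & MIE) != 0
    val cleared   = s.mstatus & ~(MIE | MPIE)                // disable interrupts
    val newStatus = if (mieWasSet) cleared | MPIE else cleared // remember old MIE
    s.copy(pc = s.mtvec, mstatus = newStatus, mepc = s.pc, mcause = cause)
  }
}
```

For example, entering a trap from `pc = 0x1000` with `mtvec = 0x2000` leaves `mepc = 0x1000` and redirects `pc` to `0x2000`, while `MIE` is cleared so the handler is not interrupted again.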
### How each pipeline stage handles CSR instructions
- ID: The ID stage identifies CSR instructions and generates the corresponding control signals and data for the other modules.
- EXE: A CSR instruction performs a read-modify-write. The current value is first fetched from the CSR, modified according to the instruction semantics, and then written back to the CSR. The ALU inside the EXE stage is idle at this moment, so it can be reused to obtain the value from the CSR.
- WB: With CSR instructions supported, the data sources for writing back to the destination register also include the value read from the CSR before modification.
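
The read-modify-write behaviour above can be illustrated with a minimal plain-Scala model of the three base CSR operations defined by the RISC-V specification (`csrrw`, `csrrs`, `csrrc`). The object and function names are hypothetical, and `src` stands for either the `rs1` value or the zero-extended immediate of the `*i` variants; this is a sketch of the semantics, not the lab's Chisel implementation.

```scala
object CsrOpModel {
  sealed trait CsrOp
  case object Csrrw extends CsrOp // write: CSR := src
  case object Csrrs extends CsrOp // set bits: CSR := CSR | src
  case object Csrrc extends CsrOp // clear bits: CSR := CSR & ~src

  private val Mask32 = 0xFFFFFFFFL // keep values within 32 bits

  // Returns (new CSR value, value written back to the destination register).
  // The destination register always receives the CSR value from before the update.
  def execute(op: CsrOp, oldCsr: Long, src: Long): (Long, Long) = {
    val newCsr = op match {
      case Csrrw => src
      case Csrrs => oldCsr | src
      case Csrrc => oldCsr & ~src
    }
    (newCsr & Mask32, oldCsr & Mask32)
  }
}
```

The second element of the returned pair corresponds to the extra write-back source mentioned in the WB item: the CSR value before modification.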
### CLINT (Core Local Interruptor)
The purpose of an interrupt controller is to detect external interrupts. When an interrupt arrives, the controller interrupts the current execution flow of the CPU. After the relevant CSR information has been set, the processor jumps to the interrupt handling routine.
### Incorporate interrupts into the single-cycle architecture diagram

## Introduction
In the previous HW3 single-cycle CPU design, the critical path was too long, making it difficult to raise the clock frequency. Because every instruction had to complete within one long clock cycle, instruction throughput was low. A 3-stage CPU (IF, ID, EXE) was therefore designed to address these issues. Pipelining, however, introduces control hazards on branch and jump instructions.
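
To make the critical-path argument concrete, the following plain-Scala sketch compares the minimum clock period of the two designs. All delay numbers are hypothetical values chosen for illustration; they are not measurements of this CPU, and `registerOverheadPs` is an assumed cost for the pipeline registers.

```scala
object ClockPeriodModel {
  // Hypothetical combinational delays per stage in picoseconds (illustration only).
  val stageDelaysPs: Map[String, Int] = Map("IF" -> 2000, "ID" -> 1000, "EXE" -> 4000)

  // Single-cycle: one instruction passes through all logic within one clock period.
  def singleCyclePeriodPs(delays: Map[String, Int]): Int =
    delays.values.sum

  // Pipelined: the slowest stage sets the period, plus an assumed register overhead.
  def pipelinedPeriodPs(delays: Map[String, Int], registerOverheadPs: Int): Int =
    delays.values.max + registerOverheadPs
}
```

With these assumed numbers the single-cycle period is 7000 ps, while the pipelined period with a 100 ps register overhead is 4100 ps. The gain is bounded by the slowest stage, here EXE.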
## Three-Stage CPU
I build the Three-Stage CPU by following the tutorial from [YatCPU](https://yatcpu.sysu.tech/labs/lab3-pipelined-cpu/) and complete the tasks it specifies.
### Three-Stage CPU architecture diagram

Using the `IF2ID` and `ID2EX` pipeline registers, the single-cycle CPU is divided into three stages.
- Instruction Fetch (IF): Fetches the instruction from memory at the address held in the program counter.
- Instruction Decode (ID): Decodes the instruction into control signals and reads the operands from the registers.
- Execute (EXE): Performs ALU operations, accesses memory, and writes back the results.
### Pipeline registers
Pipeline registers act as buffers in a pipeline, splitting the combinational logic and shortening the critical path. Their function is straightforward: on each clock cycle, depending on the flush (pipeline clear) and stall (pipeline pause) signals, the register content is reset to its default value, held, or updated with the new input. The output of the register is the value stored in it. For versatility, we define a `PipelineRegister` module parameterized by data width, so it can implement pipeline registers of varying widths.
Task: Complete the `PipelineRegister.scala` module with the provided template.
```scala=
package riscv.core

import chisel3._
import riscv.Parameters

class PipelineRegister(width: Int = Parameters.DataBits, defaultValue: UInt = 0.U) extends Module {
  val io = IO(new Bundle {
    val stall = Input(Bool())
    val flush = Input(Bool())
    val in    = Input(UInt(width.W))
    val out   = Output(UInt(width.W))
  })
  // Lab3(PipelineRegister)
  val reg = RegInit(UInt(width.W), defaultValue)
  when(io.flush) {
    // Flush has priority: restore the default value.
    reg := defaultValue
  }.elsewhen(!io.stall) {
    // Not stalled: latch the new input. When stalled, the register keeps its value.
    reg := io.in
  }
  io.out := reg
  // Lab3(PipelineRegister) End
}
```
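
The priority between `flush` and `stall` in the module above can be restated as a pure next-state function. The following plain-Scala sketch (a hypothetical `PipelineRegisterModel`, not part of the repository) is a software reference model that can be handy when reasoning about expected values in a test.

```scala
object PipelineRegisterModel {
  // Next-state function of the register:
  // flush wins over stall, and stall wins over loading the new input.
  def next(current: Long, in: Long, stall: Boolean, flush: Boolean, defaultValue: Long = 0L): Long =
    if (flush) defaultValue
    else if (stall) current
    else in
}
```

Note that when `flush` and `stall` are asserted together, the register is cleared rather than held.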
### Hazard
- Data Hazard
  In a three-stage pipeline, there are no data hazards since all data processing operations take place in the EXE stage.
- Control Hazard
  There are three situations in which a program jump may occur:
  1. The EXE stage executes a jump instruction.
  2. The EXE stage executes a branch instruction, and the branch condition is satisfied.
  3. An interrupt occurs, and the EXE stage receives the `InterruptAssert` signal from CLINT.

  In all these scenarios, the EXE stage sends the jump signal, consisting of `jump_flag` and `jump_address`, to the IF stage. By the time `jump_address` is written to the program counter (`pc`), the IF and ID stages already hold two instructions that should not execute. Because neither of them has written to the registers yet, clearing the corresponding pipeline registers is enough to flush these two instructions.
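
The effect of the flush can be sketched with a cycle-level toy model in plain Scala. Everything here is hypothetical and simplified for illustration: the `Instr` types, the `run` function, and the use of instruction indices instead of byte addresses are assumptions, jump targets are assumed to be non-negative, and the program is assumed to terminate (no backward jumps that loop forever).

```scala
import scala.collection.mutable.ListBuffer

object PipelineFlushModel {
  sealed trait Instr
  case object Plain extends Instr                  // any instruction that does not redirect the pc
  final case class Jump(target: Int) extends Instr // taken jump to an instruction index

  // Returns (indices of the instructions that reached EXE, total clock cycles).
  def run(program: Vector[Instr]): (List[Int], Int) = {
    var pc = 0
    var if2id: Option[Int] = None // instruction held in the IF2ID register
    var id2ex: Option[Int] = None // instruction held in the ID2EX register
    val executed = ListBuffer.empty[Int]
    var cycles = 0
    while (pc < program.length || if2id.nonEmpty || id2ex.nonEmpty) {
      cycles += 1
      id2ex.foreach(executed += _) // EXE works on the content of ID2EX
      val jumpTarget = id2ex.map(program).collect { case Jump(t) => t }
      jumpTarget match {
        case Some(t) => // jump_flag asserted: clear both registers, redirect the pc
          if2id = None
          id2ex = None
          pc = t
        case None =>    // normal advance: ID -> EXE, IF -> ID
          id2ex = if2id
          if (pc < program.length) { if2id = Some(pc); pc += 1 }
          else if2id = None
      }
    }
    (executed.toList, cycles)
  }
}
```

For the program `Vector(Plain, Jump(4), Plain, Plain, Plain)`, only instructions 0, 1, and 4 reach EXE, and the run takes 7 cycles: 3 executed instructions, 2 cycles to fill the pipeline, and 2 cycles lost to the flushed instructions at indices 2 and 3.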
Task: Fill in the missing code in the `Control.scala` module.
```scala=
package riscv.core.threestage

import chisel3._

class Control extends Module {
  // Lab3(Flush)
  val io = IO(new Bundle {
    val jump_flag = Input(Bool())
    val if_flush  = Output(Bool())
    val id_flush  = Output(Bool())
  })
  io.if_flush := io.jump_flag
  io.id_flush := io.jump_flag
  // Lab3(Flush) End
}
```
Task: Complete the code for the flush part in the `CPU.scala` module.
```scala=
// Lab3(Flush)
ctrl.io.jump_flag := ex.io.if_jump_flag
if2id.io.flush := ctrl.io.if_flush
id2ex.io.flush := ctrl.io.id_flush
// Lab3(Flush) End
```
### CPUTest
:::spoiler PipelineRegisterTest
```scala=
package riscv

import chisel3._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec
import riscv.core.PipelineRegister
import scala.math.pow
import scala.util.Random

class PipelineRegisterTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior of "Pipeline Register"
  it should "be able to stall and flush" in {
    val rand = new Random
    val default_value = rand.nextInt(pow(2, Parameters.DataBits).toInt).asUInt(Parameters.DataWidth)
    test(new PipelineRegister(Parameters.DataBits, default_value)).withAnnotations(TestAnnotations.annos) { c =>
      var pre = default_value
      for (_ <- 1 to 1000) {
        val cur = rand.nextInt(pow(2, Parameters.DataBits).toInt).asUInt(Parameters.DataWidth)
        c.io.in.poke(cur)
        rand.nextInt(3) match {
          case 0 => // normal load
            c.io.stall.poke(false.B)
            c.io.flush.poke(false.B)
            c.clock.step()
            c.io.out.expect(cur)
            pre = cur
          case 1 => // stall: hold the previous value
            c.io.stall.poke(true.B)
            c.io.flush.poke(false.B)
            c.clock.step()
            c.io.out.expect(pre)
          case 2 => // flush: back to the default value
            c.io.stall.poke(false.B)
            c.io.flush.poke(true.B)
            c.clock.step()
            c.io.out.expect(default_value)
            pre = default_value
        }
      }
    }
  }
}
```
:::
:::spoiler ThreeStageCPUTest
```scala=
package riscv

import chisel3._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec

class ThreeStageCPUTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior of "Three-stage Pipelined CPU"
  it should "calculate recursively fibonacci(10)" in {
    test(new TestTopModule("fibonacci.asmbin", ImplementationType.ThreeStage)).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 50) {
        c.clock.step(1000)
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      c.io.mem_debug_read_address.poke(4.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(55.U)
    }
  }
  it should "quicksort 10 numbers" in {
    test(new TestTopModule("quicksort.asmbin", ImplementationType.ThreeStage)).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 50) {
        c.clock.step(1000)
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      for (i <- 1 to 10) {
        c.io.mem_debug_read_address.poke((4 * i).U)
        c.clock.step()
        c.io.mem_debug_read_data.expect((i - 1).U)
      }
    }
  }
  it should "store and load single byte" in {
    test(new TestTopModule("sb.asmbin", ImplementationType.ThreeStage)).withAnnotations(TestAnnotations.annos) { c =>
      c.clock.step(1000)
      c.io.regs_debug_read_address.poke(5.U)
      c.io.regs_debug_read_data.expect(0xDEADBEEFL.U)
      c.io.regs_debug_read_address.poke(6.U)
      c.io.regs_debug_read_data.expect(0xEF.U)
      c.io.regs_debug_read_address.poke(1.U)
      c.io.regs_debug_read_data.expect(0x15EF.U)
    }
  }
  it should "solve control hazards" in {
    test(new TestTopModule("hazard.asmbin", ImplementationType.ThreeStage)).withAnnotations(TestAnnotations.annos) { c =>
      c.clock.step(1000)
      c.io.regs_debug_read_address.poke(1.U)
      c.io.regs_debug_read_data.expect(26.U)
      c.io.mem_debug_read_address.poke(4.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(1.U)
      c.io.mem_debug_read_address.poke(8.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(3.U)
    }
  }
}
```
:::
**Result**
```
~/ca2023-lab3_3-Stage$ sbt test
[info] welcome to sbt 1.8.1 (Temurin Java 1.8.0_392)
[info] loading settings for project ca2023-lab3_3-stage-build from plugins.sbt ...
[info] loading project definition from /home/jeremytsai/ca2023-lab3_3-Stage/project
[info] loading settings for project root from build.sbt ...
[info] set current project to yatcpu (in build file:/home/jeremytsai/ca2023-lab3_3-Stage/)
[info] PipelineRegisterTest:
[info] Pipeline Register
[info] - should be able to stall and flush
[info] ThreeStageCPUTest:
[info] Three-stage Pipelined CPU
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve control hazards
[info] Run completed in 11 seconds, 609 milliseconds.
[info] Total number of tests run: 5
[info] Suites: completed 2, aborted 0
[info] Tests: succeeded 5, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 12 s, completed 2024/1/13 下午 09:31:52
```
# Integrate HW3 into the Three-Stage CPU and pass the test
### Process
1. Place the assembly code for HW2 (`hw2.S`) into the `ca2023-lab3_3-stage/csrc` directory.
2. Modify `hw2.S` to remove `ecall` and add the `_start:` label.
3. Modify the `Makefile` and add `hw2.asmbin` under `BINS`.
4. Run `$ make update` in the directory to generate `hw2.asmbin`.
5. In `ThreeStageCPUTest.scala`, add a test for `hw2.asmbin`.
```scala=
it should "calculate the scale" in {
  test(new TestTopModule("hw2.asmbin", ImplementationType.ThreeStage)).withAnnotations(TestAnnotations.annos) { c =>
    for (i <- 1 to 50) {
      c.clock.step(1000)
      c.io.mem_debug_read_address.poke((i * 4).U)
    }
    c.io.regs_debug_read_address.poke(16.U) // a6
    c.clock.step()
    c.io.regs_debug_read_data.expect(0x41d00000.U)
  }
}
```
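
As a side note on the expected value `0x41d00000`: if `a6` holds an IEEE-754 single-precision number (an assumption, since the format of the result is not stated here), the bit pattern can be decoded with the standard Java library as in this small sketch. The object name `ExpectedValueDecode` is hypothetical.

```scala
object ExpectedValueDecode {
  // Reinterpret a 32-bit pattern as an IEEE-754 single-precision float.
  def asFloat(bits: Int): Float = java.lang.Float.intBitsToFloat(bits)
}
```

Under that assumption, `ExpectedValueDecode.asFloat(0x41d00000)` evaluates to `26.0f` (sign 0, exponent 131 - 127 = 4, significand 1.625, so 1.625 × 16 = 26).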
**Test**
```
$ sbt test
```
**Output**
```
~/ca2023-lab3_3-Stage$ sbt test
[info] welcome to sbt 1.8.1 (Temurin Java 1.8.0_392)
[info] loading settings for project ca2023-lab3_3-stage-build from plugins.sbt ...
[info] loading project definition from /home/jeremytsai/ca2023-lab3_3-Stage/project
[info] loading settings for project root from build.sbt ...
[info] set current project to yatcpu (in build file:/home/jeremytsai/ca2023-lab3_3-Stage/)
[info] compiling 1 Scala source to /home/jeremytsai/ca2023-lab3_3-Stage/target/scala-2.13/test-classes ...
[info] PipelineRegisterTest:
[info] Pipeline Register
[info] - should be able to stall and flush
[info] ThreeStageCPUTest:
[info] Three-stage Pipelined CPU
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve control hazards
[info] - should calculate the scale
[info] Run completed in 14 seconds, 100 milliseconds.
[info] Total number of tests run: 6
[info] Suites: completed 2, aborted 0
[info] Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 17 s, completed 2024/1/14 下午 03:27:02
```
:::warning
No CPI comparison and analysis.
:::
# Reference
- [YatCPU](https://yatcpu.sysu.tech/)