
Enhancing Homework3

I added waveform analysis for J-type and S-type in Homework 3.

Extend Homework3 to 3-stage pipeline processor

contributed by < jeremy90307 >

The code for the Three-Stage processor is in ca2023-lab3_3-Stage, and the reference source is YatCPU_Lab.

Interrupt

Introduction

Before constructing the Three-Stage CPU, it is essential to understand interrupts. Without interrupts, a program simply runs its pre-defined instructions in order and cannot be interrupted midway. A practical CPU, however, must always be ready to respond to external events, so it needs a way to handle interrupts promptly.

Please refer to Lab2 of YatCPU for a detailed introduction.

CSR

Control and Status Registers (CSRs) control various CPU features and record their current status.

  • mstatus register
    The mstatus register is used to record the current state, such as whether interrupts are enabled.

  • mepc register
    The mepc register saves the address of the instruction at which execution resumes after the interrupt handler returns. When the CPU takes an interrupt, it automatically sets mepc to this resume address, i.e. the address of the instruction that was interrupted.

  • mcause register
    The mcause register stores the reason for the interrupt.

  • mtvec register
    The mtvec register stores the address of the interrupt handling routine. When the CPU encounters an interrupt, the program counter (pc) register is automatically set to the address stored in the mtvec register, pointing to the interrupt handling routine.

When an interrupt occurs, the CPU must flush and stall the pipeline and write the interrupt-related information into the CSRs.

CSR instructions are handled in each stage as follows:

  • ID: the ID stage identifies CSR instructions and generates the corresponding control signals and data for the other modules.

  • EXE: a CSR instruction first reads the current value of the target CSR, modifies it according to the instruction semantics, and then writes the result back to the CSR. Since the ALU in the EXE stage is otherwise idle during a CSR instruction, it can be reused for this computation.

  • WB: with CSR support, the data written back to the destination register can also be the value read from the target CSR before modification.
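To make the read-modify-write sequence concrete, here is a small plain-Scala model (not Chisel, and not the YatCPU source) of the three register forms csrrw, csrrs, and csrrc. The names CsrModel and execute are hypothetical. It returns both the new CSR value and the pre-modification value that WB writes to rd. It ignores the rule that csrrs/csrrc with rs1 = x0 must not write the CSR, which only matters for CSRs with write side effects.

```scala
// Plain-Scala model of the Zicsr read-modify-write (not Chisel, not YatCPU code).
object CsrModel {
  sealed trait CsrOp
  case object Csrrw extends CsrOp // csr := src
  case object Csrrs extends CsrOp // csr := csr | src   (set bits)
  case object Csrrc extends CsrOp // csr := csr & ~src  (clear bits)

  /** Executes one CSR instruction.
    * @param csrOld value read from the CSR before the instruction
    * @param src    rs1's value (or the zero-extended zimm for the *i forms)
    * @return (value written back to the CSR, value written to rd)
    */
  def execute(op: CsrOp, csrOld: Int, src: Int): (Int, Int) = {
    val csrNew = op match {
      case Csrrw => src
      case Csrrs => csrOld | src
      case Csrrc => csrOld & ~src
    }
    // WB: rd receives the CSR value read before modification
    (csrNew, csrOld)
  }
}
```

For example, CsrModel.execute(CsrModel.Csrrs, 0x1800, 1 << 3) sets bit 3 (MIE) of an mstatus value and returns (0x1808, 0x1800): the new CSR value and the old value destined for rd.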

CLINT (Core Local Interruptor)

The interrupt controller detects external interrupts. When an interrupt arrives, it breaks the CPU's current execution flow, writes the relevant CSRs, and then makes the processor jump to the handler address to execute the interrupt service routine.
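The CSR updates performed on trap entry and return can be sketched as a plain-Scala model. This is an assumption-laden sketch, not YatCPU's CLINT: it follows the RISC-V privileged-spec convention of saving mstatus.MIE into MPIE on trap entry and restoring it on mret, uses the machine-timer cause code (interrupt bit set, code 7), assumes mtvec is in direct mode, and picks the resume address as the jump target if the instruction in EXE is jumping, otherwise the next sequential instruction. Csrs, TrapModel, and all addresses are hypothetical.

```scala
// Plain-Scala sketch of trap entry / mret (an assumption, not YatCPU's CLINT).
final case class Csrs(mstatus: Int, mepc: Int, mcause: Int, mtvec: Int)

object TrapModel {
  val MIE: Int  = 1 << 3 // mstatus.MIE: global machine interrupt enable
  val MPIE: Int = 1 << 7 // mstatus.MPIE: MIE saved on trap entry
  // Interrupt bit (bit 31) set, exception code 7 = machine timer interrupt
  val MachineTimerCause: Int = Int.MinValue | 7

  /** Assumed resume address: the jump target if EXE is jumping, else pc + 4. */
  def resumePc(exePc: Int, jumpFlag: Boolean, jumpAddress: Int): Int =
    if (jumpFlag) jumpAddress else exePc + 4

  /** An interrupt is taken only if it is pending and MIE is set. */
  def shouldTake(csr: Csrs, pending: Boolean): Boolean =
    pending && (csr.mstatus & MIE) != 0

  /** Trap entry: save MIE into MPIE, clear MIE, record mepc/mcause, and
    * return the next pc (mtvec, assuming direct mode with MODE bits = 0). */
  def enter(csr: Csrs, resume: Int, cause: Int): (Csrs, Int) = {
    val savedMpie = if ((csr.mstatus & MIE) != 0) MPIE else 0
    val mstatus = (csr.mstatus & ~(MIE | MPIE)) | savedMpie
    (csr.copy(mstatus = mstatus, mepc = resume, mcause = cause), csr.mtvec)
  }

  /** mret: restore MIE from MPIE, set MPIE, and jump back to mepc. */
  def mret(csr: Csrs): (Csrs, Int) = {
    val restoredMie = if ((csr.mstatus & MPIE) != 0) MIE else 0
    val mstatus = (csr.mstatus & ~MIE) | restoredMie | MPIE
    (csr.copy(mstatus = mstatus), csr.mepc)
  }
}
```

With MIE set, enter writes mepc, mcause, and mstatus and returns mtvec as the next pc; further interrupts stay masked until mret returns to mepc and re-enables them.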

[Figure: single-cycle architecture diagram with interrupts incorporated]

Introduction

In the single-cycle CPU from HW3, every instruction has to complete within one clock cycle, so the critical path is long and the clock frequency is hard to increase; and because only one instruction is in flight at a time, instruction throughput is low. A 3-stage pipelined CPU (IF, ID, EXE) is therefore designed to address these issues by splitting the datapath into stages and overlapping the execution of consecutive instructions. However, branch and jump instructions introduce control hazards in the pipeline.
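A rough way to see the trade-off is to count cycles. The sketch below assumes an ideal 3-stage pipeline with a 2-cycle fill, one instruction completing per cycle afterwards, and a 2-cycle penalty for each taken jump or branch (the two wrong-path instructions in IF and ID are flushed). PipelineTiming and every number used with it are hypothetical, not measurements from this lab.

```scala
// Back-of-the-envelope cycle model; every number fed to it is hypothetical.
object PipelineTiming {
  /** Single-cycle CPU: each instruction takes one (long) clock period. */
  def singleCycleNs(instructions: Long, periodNs: Double): Double =
    instructions * periodNs

  /** Ideal pipeline: (stages - 1) fill cycles, then one instruction per
    * cycle, plus `flushPenalty` bubbles for every taken jump or branch. */
  def pipelinedCycles(instructions: Long, takenJumps: Long,
                      stages: Int = 3, flushPenalty: Int = 2): Long =
    instructions + (stages - 1) + takenJumps * flushPenalty

  /** Cycles per instruction. */
  def cpi(cycles: Long, instructions: Long): Double =
    cycles.toDouble / instructions
}
```

For instance, with 1000 instructions and 100 taken jumps the pipeline needs 1000 + 2 + 200 = 1202 cycles (CPI 1.202). If the clock period dropped from a hypothetical 10 ns to 4 ns, the run time would go from 10000 ns to 4808 ns; in practice, pipeline-register overhead keeps the period from shrinking by a full factor of three.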

Three-Stage CPU

I build the Three-Stage CPU by following the YatCPU tutorial and complete each task as described there.

[Figure: Three-Stage CPU architecture diagram]

The IF2ID and ID2EX pipeline registers divide the single-cycle CPU into three stages:

  • Instruction Fetch (IF): fetches the instruction from memory at the address held in the program counter.
  • Instruction Decode (ID): decodes the instruction into control signals and reads the operands from the register file.
  • Execute (EXE): performs ALU operations, accesses memory, and writes back the results.

Pipeline registers

Pipeline registers act as buffers between pipeline stages: they split the combinational logic and thereby shorten the critical path. Their behavior is straightforward. On each clock cycle, the stored value is cleared to a default value when flush (pipeline clear) is asserted, held when stall (pipeline pause) is asserted, and otherwise replaced with the new input; the output is always the stored value. For versatility, we define a parameterized PipelineRegister module so that the same logic implements pipeline registers of different data widths.

Task: Complete the PipelineRegister.scala module with the provided template.

package riscv.core

import chisel3._
import riscv.Parameters

class PipelineRegister(width: Int = Parameters.DataBits, defaultValue: UInt = 0.U) extends Module {
  val io = IO(new Bundle {
    val stall = Input(Bool())
    val flush = Input(Bool())
    val in = Input(UInt(width.W))
    val out = Output(UInt(width.W))
  })

  // Lab3(PipelineRegister)
  val reg = RegInit(UInt(width.W), defaultValue)
  when(io.flush) {
    reg := defaultValue // flush has priority: clear to the default value
  }.elsewhen(!io.stall) {
    reg := io.in // no stall: latch the new input
  } // stall (without flush): hold the current value
  io.out := reg
  // Lab3(PipelineRegister) End
}
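The register's priority rules (flush over stall, stall over load) can also be checked without a Chisel simulator. The class below is a hypothetical plain-Scala model of one PipelineRegister stepped once per clock edge; it is not part of the lab code. Unlike the randomized test, it also covers the case where stall and flush are asserted together, in which the when/elsewhen chain gives flush priority.

```scala
// Hypothetical software model of one PipelineRegister (not the Chisel module).
final class PipelineRegModel(defaultValue: Long, width: Int = 32) {
  private val mask: Long = if (width >= 64) -1L else (1L << width) - 1
  private var reg: Long = defaultValue & mask

  /** Like io.out: always the stored value. */
  def out: Long = reg

  /** One rising clock edge, following the when/elsewhen priority. */
  def step(in: Long, stall: Boolean, flush: Boolean): Unit = {
    if (flush) reg = defaultValue & mask // flush wins, even if stall is set
    else if (!stall) reg = in & mask     // normal case: latch the new input
    // stall without flush: hold the current value
  }
}
```

A default of 0x13 (the encoding of addi x0, x0, 0, i.e. nop) is one plausible choice for an instruction pipeline register; the lab's actual default values may differ.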

Hazard

  • Data Hazard
    In this three-stage pipeline there are no data hazards, since all data processing (ALU operations, memory access, and write-back) takes place in the EXE stage.
  • Control Hazard
    A change of control flow can occur in three situations:
    1. The EXE stage executes a jump instruction.
    2. The EXE stage executes a branch instruction and the branch condition is satisfied.
    3. An interrupt is taken: the EXE stage receives the InterruptAssert signal from the CLINT.

In all of these cases, the EXE stage sends a jump signal, consisting of jump_flag and jump_address, to the IF stage. By the time jump_address is written into the program counter (pc), the IF and ID stages already hold two unwanted instructions from the wrong path. Since neither has written to any register yet, it is sufficient to clear the corresponding pipeline registers (IF2ID and ID2EX) to flush these two instructions.
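The flush behavior can be traced with a cycle-level software model. This is a hypothetical sketch, not the Chisel implementation: instructions are identified by index, jumps maps an instruction to its target (standing in for a jump, a taken branch, or an interrupt resolved in EXE), and when EXE redirects the pc, both IF2ID and ID2EX are cleared to bubbles, mirroring what Control does with jump_flag.

```scala
// Cycle-level sketch of IF -> IF2ID -> ID -> ID2EX -> EXE with jump flushes.
// Instructions are identified by index; None in a pipeline register is a bubble.
object ThreeStageFlushModel {
  /** `jumps` maps an instruction index to its target index; the redirect
    * happens when that instruction is in EXE. Returns the indices that
    * executed in EXE and the number of cycles taken. */
  def run(programLength: Int, jumps: Map[Int, Int], count: Int,
          maxCycles: Int = 10000): (Vector[Int], Int) = {
    var pc = 0
    var ifId: Option[Int] = None // IF2ID pipeline register
    var idEx: Option[Int] = None // ID2EX pipeline register
    var executed = Vector.empty[Int]
    var cycles = 0
    while (executed.size < count && cycles < maxCycles) {
      cycles += 1
      // EXE: execute the instruction in ID2EX and check for a redirect
      val jumpTarget = idEx.flatMap(jumps.get)
      idEx.foreach(i => executed = executed :+ i)
      // IF: fetch at pc (a bubble once pc runs past the program)
      val fetched = if (pc < programLength) Some(pc) else None
      jumpTarget match {
        case Some(target) =>
          // jump_flag set: flush IF2ID and ID2EX, load jump_address into pc
          ifId = None
          idEx = None
          pc = target
        case None =>
          // normal clock edge: shift the pipeline registers
          idEx = ifId
          ifId = fetched
          pc += 1
      }
    }
    (executed, cycles)
  }
}
```

Running run(6, Map(1 -> 4), 4) executes instructions 0, 1, 4, 5 in 8 cycles: 4 instructions, 2 fill cycles, and 2 bubbles from flushing the wrong-path instructions 2 and 3.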

Task: Fill in the missing code in the Control.scala module.

package riscv.core.threestage

import chisel3._

class Control extends Module {
  // Lab3(Flush)
  val io = IO(new Bundle {
    val jump_flag = Input(Bool())
    val if_flush = Output(Bool())
    val id_flush = Output(Bool())
  })

  io.if_flush := io.jump_flag
  io.id_flush := io.jump_flag
  // Lab3(Flush) End
}

Task: Complete the flush logic in the CPU.scala module.

// Lab3(Flush)
ctrl.io.jump_flag := ex.io.if_jump_flag
if2id.io.flush := ctrl.io.if_flush
id2ex.io.flush := ctrl.io.id_flush
// Lab3(Flush) End

CPUTest

PipelineRegisterTest
package riscv

import chisel3._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec
import riscv.core.PipelineRegister

import scala.math.pow
import scala.util.Random

class PipelineRegisterTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior of "Pipeline Register"
  it should "be able to stall and flush" in {
    val rand = new Random
    val default_value = rand.nextInt(pow(2, Parameters.DataBits).toInt).asUInt(Parameters.DataWidth)
    test(new PipelineRegister(Parameters.DataBits, default_value)).withAnnotations(TestAnnotations.annos) { c =>
      var pre = default_value
      for (_ <- 1 to 1000) {
        val cur = rand.nextInt(pow(2, Parameters.DataBits).toInt).asUInt(Parameters.DataWidth)
        c.io.in.poke(cur)
        rand.nextInt(3) match {
          case 0 =>
            c.io.stall.poke(false.B)
            c.io.flush.poke(false.B)
            c.clock.step()
            c.io.out.expect(cur)
            pre = cur
          case 1 =>
            c.io.stall.poke(true.B)
            c.io.flush.poke(false.B)
            c.clock.step()
            c.io.out.expect(pre)
          case 2 =>
            c.io.stall.poke(false.B)
            c.io.flush.poke(true.B)
            c.clock.step()
            c.io.out.expect(default_value)
            pre = default_value
        }
      }
    }
  }
}
ThreeStageCPUTest
package riscv

import chisel3._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec

class ThreeStageCPUTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior of "Three-stage Pipelined CPU"

  it should "calculate recursively fibonacci(10)" in {
    test(new TestTopModule("fibonacci.asmbin", ImplementationType.ThreeStage)).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 50) {
        c.clock.step(1000)
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      c.io.mem_debug_read_address.poke(4.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(55.U)
    }
  }

  it should "quicksort 10 numbers" in {
    test(new TestTopModule("quicksort.asmbin", ImplementationType.ThreeStage)).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 50) {
        c.clock.step(1000)
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      for (i <- 1 to 10) {
        c.io.mem_debug_read_address.poke((4 * i).U)
        c.clock.step()
        c.io.mem_debug_read_data.expect((i - 1).U)
      }
    }
  }

  it should "store and load single byte" in {
    test(new TestTopModule("sb.asmbin", ImplementationType.ThreeStage)).withAnnotations(TestAnnotations.annos) { c =>
      c.clock.step(1000)
      c.io.regs_debug_read_address.poke(5.U)
      c.io.regs_debug_read_data.expect(0xDEADBEEFL.U)
      c.io.regs_debug_read_address.poke(6.U)
      c.io.regs_debug_read_data.expect(0xEF.U)
      c.io.regs_debug_read_address.poke(1.U)
      c.io.regs_debug_read_data.expect(0x15EF.U)
    }
  }

  it should "solve control hazards" in {
    test(new TestTopModule("hazard.asmbin", ImplementationType.ThreeStage)).withAnnotations(TestAnnotations.annos) { c =>
      c.clock.step(1000)
      c.io.regs_debug_read_address.poke(1.U)
      c.io.regs_debug_read_data.expect(26.U)
      c.io.mem_debug_read_address.poke(4.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(1.U)
      c.io.mem_debug_read_address.poke(8.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(3.U)
    }
  }
}

Result

~/ca2023-lab3_3-Stage$ sbt test
[info] welcome to sbt 1.8.1 (Temurin Java 1.8.0_392)
[info] loading settings for project ca2023-lab3_3-stage-build from plugins.sbt ...
[info] loading project definition from /home/jeremytsai/ca2023-lab3_3-Stage/project
[info] loading settings for project root from build.sbt ...
[info] set current project to yatcpu (in build file:/home/jeremytsai/ca2023-lab3_3-Stage/)
[info] PipelineRegisterTest:
[info] Pipeline Register
[info] - should be able to stall and flush
[info] ThreeStageCPUTest:
[info] Three-stage Pipelined CPU
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve control hazards
[info] Run completed in 11 seconds, 609 milliseconds.
[info] Total number of tests run: 5
[info] Suites: completed 2, aborted 0
[info] Tests: succeeded 5, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 12 s, completed 2024/1/13 PM 09:31:52

Integrate HW3 into the Three-Stage CPU and pass the test

Process

  1. Place the assembly code for HW2 (hw2.S) in the ca2023-lab3_3-stage/csrc directory.
  2. Modify hw2.S: remove ecall and add a _start: label.
  3. Modify the Makefile and add hw2.asmbin to BINS.
  4. Run make update in the csrc directory to generate hw2.asmbin.
  5. Add a test for hw2.asmbin to ThreeStageCPUTest.scala:
it should "calculate the scale" in {
  test(new TestTopModule("hw2.asmbin", ImplementationType.ThreeStage)).withAnnotations(TestAnnotations.annos) { c =>
    for (i <- 1 to 50) {
      c.clock.step(1000)
      c.io.mem_debug_read_address.poke((i * 4).U)
    }
    c.io.regs_debug_read_address.poke(16.U) // a6
    c.clock.step()
    c.io.regs_debug_read_data.expect(0x41d00000.U)
  }
}
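The expected value 0x41d00000 is easier to sanity-check when decoded. Assuming (the note does not say so) that a6 holds an IEEE 754 single-precision result, the bit pattern has sign 0, unbiased exponent 4, and fraction 0x500000, so it equals 1.625 × 2^4 = 26.0. The object and helper names below are hypothetical; java.lang.Float.intBitsToFloat is the standard JDK conversion.

```scala
// Decode the expected a6 value, assuming it is an IEEE 754 single-precision float.
object BitPatternCheck {
  val expectedA6: Int = 0x41d00000

  /** Reinterpret the 32-bit pattern as a Float (no numeric conversion). */
  def asFloat(bits: Int): Float = java.lang.Float.intBitsToFloat(bits)

  /** (sign, unbiased exponent, fraction bits), valid for normal numbers. */
  def fields(bits: Int): (Int, Int, Int) =
    ((bits >>> 31) & 1, ((bits >>> 23) & 0xFF) - 127, bits & 0x7FFFFF)
}
```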

Test

$ sbt test

Output

~/ca2023-lab3_3-Stage$ sbt test
[info] welcome to sbt 1.8.1 (Temurin Java 1.8.0_392)
[info] loading settings for project ca2023-lab3_3-stage-build from plugins.sbt ...
[info] loading project definition from /home/jeremytsai/ca2023-lab3_3-Stage/project
[info] loading settings for project root from build.sbt ...
[info] set current project to yatcpu (in build file:/home/jeremytsai/ca2023-lab3_3-Stage/)
[info] compiling 1 Scala source to /home/jeremytsai/ca2023-lab3_3-Stage/target/scala-2.13/test-classes ...
[info] PipelineRegisterTest:
[info] Pipeline Register
[info] - should be able to stall and flush
[info] ThreeStageCPUTest:
[info] Three-stage Pipelined CPU
[info] - should calculate recursively fibonacci(10)
[info] - should quicksort 10 numbers
[info] - should store and load single byte
[info] - should solve control hazards
[info] - should calculate the scale
[info] Run completed in 14 seconds, 100 milliseconds.
[info] Total number of tests run: 6
[info] Suites: completed 2, aborted 0
[info] Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 17 s, completed 2024/1/14 PM 03:27:02

A CPI comparison and analysis are not included.

Reference