
Assignment3: single-cycle RISC-V CPU

contributed by <Paintakotako>

Install

I followed the instructions in Lab3 to install the required dependencies, but Java was not installed on my machine, so I had to install it as well.

Chisel bootcamp notes

Module

class Passthrough extends Module {
  val io = IO(new Bundle {
    val in = Input(UInt(4.W))
    val out = Output(UInt(4.W))
  })
  io.out := io.in
}
  • Module is a built-in Chisel class that all hardware modules must extend.
  • val io = IO(...)
    • Declares all input and output ports in the io val.
    • Must be called io and be an IO object.
  • new Bundle {...}
    • A hardware struct type; here it contains the named signals in and out.

Tester

test(new Passthrough()) { c =>
    c.io.in.poke(0.U)     // Set our input to value 0
    c.io.out.expect(0.U)  // Assert that the output correctly has 0
    c.io.in.poke(1.U)     // Set our input to value 1
    c.io.out.expect(1.U)  // Assert that the output correctly has 1
    c.io.in.poke(2.U)     // Set our input to value 2
    c.io.out.expect(2.U)  // Assert that the output correctly has 2
}
println("SUCCESS!!") // Scala Code: if we get here, our tests passed!
  • test accepts an instance of the Passthrough module.
  • Inputs are set using poke.
  • Outputs are checked using expect.
    • If all expect statements hold, the test passes.

Operators

  • true.B and false.B are the preferred ways to create Chisel Bool literals.

  • Mux selects between two values and works like the ternary operator.

  • Cat concatenates bit values; the first argument ends up in the most significant bits.

    • e.g. Cat("b10".U, "b1".U) = "b101".U

Ternary operator:
Also known as the conditional operator, it is a shorthand way of writing an if-else statement. Its syntax is as follows:

condition ? expression_if_true : expression_if_false;
val s = true.B
io.outmux := Mux(s, 3.U, 0.U) // outmux is 3.U because s is true
io.outcat := Cat(2.U, 1.U)    // concatenates 2 (b10) with 1 (b1), so outcat is 5 (b101)
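Because the result of Cat depends on the bit width of each operand, it can help to reproduce these values in software. The following is a minimal plain-Scala model (not Chisel itself) of what Mux and Cat compute on unsigned literals. It assumes each literal has the minimal width Chisel infers for it (0.U and 1.U are 1 bit, 2.U is 2 bits); the object and function names are hypothetical.

```scala
// Plain-Scala model of Chisel's Mux and Cat on unsigned literals (no Chisel needed).
object MuxCatModel {
  // Mux(sel, a, b): a when sel is true, otherwise b.
  def mux(sel: Boolean, a: BigInt, b: BigInt): BigInt = if (sel) a else b

  // Minimal width Chisel infers for an unsigned literal (0.U and 1.U are 1 bit, 2.U is 2 bits).
  def litWidth(v: BigInt): Int = math.max(v.bitLength, 1)

  // Cat(parts...): earlier arguments end up in the more significant bits.
  def cat(parts: BigInt*): BigInt =
    parts.foldLeft(BigInt(0)) { (acc, p) => (acc << litWidth(p)) | p }
}
```

For example, cat(BigInt(2), BigInt(1)) gives 5, matching outcat above. cat(BigInt(2), BigInt(0)) gives 4 rather than 2, because 0.U still occupies one bit.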

Control flow

when, elsewhen, and otherwise

when(someBooleanCondition) {
  // things to do when true
}.elsewhen(someOtherBooleanCondition) {
  // things to do on this condition
}.otherwise {
  // things to do if none of the boolean conditions are true
}
  • when describes the behavior of the hardware (which connections are made under which condition).
  • Note: when does not return a value.
    • e.g. val result = when(squareIt) { x * x }.otherwise { x } is not valid

The Wire Construct

  • Defines a circuit component
  • Wire can serve as an intermediary between two circuits.
  • Reference figure: Sort4 sorting network (image unavailable)
class Sort4 extends Module {
  val io = IO(new Bundle {
    val in0 = Input(UInt(16.W))
    val in1 = Input(UInt(16.W))
    val in2 = Input(UInt(16.W))
    val in3 = Input(UInt(16.W))
    val out0 = Output(UInt(16.W))
    val out1 = Output(UInt(16.W))
    val out2 = Output(UInt(16.W))
    val out3 = Output(UInt(16.W))
  })

  val row10 = Wire(UInt(16.W))
  val row11 = Wire(UInt(16.W))
  val row12 = Wire(UInt(16.W))
  val row13 = Wire(UInt(16.W))

  when(io.in0 < io.in1) {
    row10 := io.in0            // preserve first two elements
    row11 := io.in1
  }.otherwise {
    row10 := io.in1            // swap first two elements
    row11 := io.in0
  }

  when(io.in2 < io.in3) {
    row12 := io.in2            // preserve last two elements
    row13 := io.in3
  }.otherwise {
    row12 := io.in3            // swap last two elements
    row13 := io.in2
  }

  val row21 = Wire(UInt(16.W))
  val row22 = Wire(UInt(16.W))

  when(row11 < row12) {
    row21 := row11            // preserve middle 2 elements
    row22 := row12
  }.otherwise {
    row21 := row12            // swap middle two elements
    row22 := row11
  }

  val row20 = Wire(UInt(16.W))
  val row23 = Wire(UInt(16.W))
  when(row10 < row13) {
    row20 := row10            // preserve outer two elements
    row23 := row13
  }.otherwise {
    row20 := row13            // swap outer two elements
    row23 := row10
  }

  when(row20 < row21) {
    io.out0 := row20            // preserve first two elements
    io.out1 := row21
  }.otherwise {
    io.out0 := row21            // swap first two elements
    io.out1 := row20
  }

  when(row22 < row23) {
    io.out2 := row22            // preserve last two elements
    io.out3 := row23
  }.otherwise {
    io.out2 := row23            // swap last two elements
    io.out3 := row22
  }
}
  • Wires such as row10, row11, ... hold the intermediate values between the inputs and the outputs; each when/otherwise block acts as one compare-and-swap step.
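Each when/otherwise pair in Sort4 is a compare-and-swap, so the wiring can be checked in software against an ordinary sort. The following plain-Scala sketch mirrors the row10 to row23 wires one-to-one. Sort4Model and cas are hypothetical names and are not part of the bootcamp.

```scala
// Plain-Scala model of the Sort4 sorting network (no Chisel needed).
object Sort4Model {
  // One compare-and-swap node: (smaller, larger), like each when/otherwise block.
  private def cas(a: Int, b: Int): (Int, Int) = if (a < b) (a, b) else (b, a)

  def sort4(in0: Int, in1: Int, in2: Int, in3: Int): Seq[Int] = {
    val (row10, row11) = cas(in0, in1)     // stage 1: first pair
    val (row12, row13) = cas(in2, in3)     // stage 1: last pair
    val (row21, row22) = cas(row11, row12) // stage 2: middle pair
    val (row20, row23) = cas(row10, row13) // stage 2: outer pair
    val (out0, out1)   = cas(row20, row21) // stage 3: first two outputs
    val (out2, out3)   = cas(row22, row23) // stage 3: last two outputs
    Seq(out0, out1, out2, out3)
  }
}
```

Running sort4 on every permutation of a few inputs and comparing the result with sorted works like the exhaustive tester that comes with Sort4 in the bootcamp.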

when vs if in Chisel

  • when does not return a value. It describes hardware behavior, such as conditionally setting signals to specific values, and its condition is a hardware Bool evaluated while the circuit runs.

  • if does not control hardware behavior. It makes static choices while the circuit is being generated, so it is typically used for parameter-dependent logic rather than for describing runtime hardware behavior.

Sequential Logic

Reg

  • A Reg holds its output value until the rising edge of its clock, at which time it takes on the value of its input.
  • In other words, a value applied to the input during one clock cycle appears at the output in the next cycle.
class RegisterModule extends Module {
  val io = IO(new Bundle {
    val in  = Input(UInt(12.W))
    val out = Output(UInt(12.W))
  })
  
  val register = Reg(UInt(12.W))
  register := io.in + 1.U
  io.out := register
}

test(new RegisterModule) { c =>
  for (i <- 0 until 100) {
    c.io.in.poke(i.U)
    c.clock.step(1)
    c.io.out.expect((i + 1).U)
  }
}
  • In the test, poke sets the input and clock.step(1) advances the clock by one cycle, which makes the register pass its input to its output; expect then checks that the output is i + 1.
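To see why expect has to come after step, here is a cycle-level sketch of RegisterModule in plain Scala. RegisterModel is a hypothetical name. Reg(UInt(12.W)) has no reset value, so the init parameter is an arbitrary assumption. The + 1 wraps at 12 bits, like the UInt(12.W) register.

```scala
// Cycle-level model of RegisterModule: `register` only changes on a clock edge.
final class RegisterModel(init: Int = 0) {
  // Reg(UInt(12.W)) has no reset value; `init` is an arbitrary stand-in.
  private var register: Int = init // value currently driven on io.out
  private var in: Int = 0          // value currently poked on io.in

  def poke(v: Int): Unit = { in = v }
  def out: Int = register

  // Rising clock edge: capture io.in + 1, truncated to 12 bits like the UInt(12.W) register.
  def step(): Unit = { register = (in + 1) & 0xFFF }
}
```

As in the chiseltest loop, poking i and then stepping once makes out equal i + 1. Poking alone does not change out.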

RegNext

In the previous example, we had to specify the register type explicitly. RegNext avoids this: it creates the register and lets Chisel infer its type automatically.

class RegNextModule extends Module {
  val io = IO(new Bundle {
    val in  = Input(UInt(12.W))
    val out = Output(UInt(12.W))
  })
  
  // register bitwidth is inferred from io.out
  io.out := RegNext(io.in + 1.U)
}

test(new RegNextModule) { c =>
  for (i <- 0 until 100) {
    c.io.in.poke(i.U)
    c.clock.step(1)
    c.io.out.expect((i + 1).U)
  }
}

Hello World in Chisel

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U;
  val cntReg  = RegInit(0.U(32.W))
  val blkReg  = RegInit(0.U(1.W))
  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}

The module has only an output and no input. The output is a 1-bit UInt, so it can be either 0 or 1.

CNT_MAX is a constant, not a register: 50000000 / 2 - 1 = 24,999,999. The .U converts it into a Chisel unsigned integer literal.

cntReg is a register initialized with 0 as an unsigned integer, with a width of 32 bits. This means that cntReg can represent a number in the range from 0 to \(2^{32} - 1\).

blkReg is a 1-bit register, initialized to 0, that holds the current LED state.

On every clock cycle, cntReg is incremented by 1. When it reaches CNT_MAX, the when block overrides that update: cntReg is reset to 0 and blkReg is inverted (~blkReg). Because io.led is always driven by blkReg, the LED toggles once every CNT_MAX + 1 = 25,000,000 cycles, and the process repeats.
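The counter-and-toggle behavior can be sketched in plain Scala with a small, hypothetical cntMax in place of 24,999,999 so that the toggling is easy to observe. BlinkModel is an illustrative name. Both new register values are computed from the old ones, which is how registers behave within one clock cycle.

```scala
// Plain-Scala model of the Hello blinker; one call to step() = one clock cycle.
final class BlinkModel(cntMax: Int) {
  private var cntReg = 0 // RegInit(0.U(32.W))
  private var blkReg = 0 // RegInit(0.U(1.W))

  def led: Int = blkReg // io.led := blkReg

  def step(): Unit = {
    val hit = cntReg == cntMax          // cntReg === CNT_MAX, using the old value
    cntReg = if (hit) 0 else cntReg + 1 // wrap to 0 or keep counting
    blkReg = if (hit) 1 - blkReg else blkReg // ~blkReg on a 1-bit value
  }
}
```

With cntMax = 3, the LED reads 0, 0, 0, 1, 1, 1, 1, 0 over the first eight cycles, so it flips every cntMax + 1 cycles. With the original constant, and assuming a 50 MHz clock (which the value 50000000 suggests), that works out to one toggle every 0.5 s.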

The same behavior can be written with Mux instead of when, using the following pattern:

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U;
  val cntReg  = RegInit(0.U(32.W))
  val blkReg  = RegInit(0.U(1.W))
  cntReg := Mux(cntReg === CNT_MAX, 0.U, cntReg + 1.U)
  blkReg := Mux(cntReg === CNT_MAX, ~blkReg, blkReg)
  io.led := blkReg   
}

Lab 3: Single-Cycle RISC-V CPU

Implementation

Refer to the following figure for the architecture of the single-cycle machine.

[Figure: Single-cycle CPU architecture]

InstructionFetch stage

Here we need to determine the next value of the program counter (pc) based on whether a jump is required. If a jump is necessary, set the pc to the jump address; otherwise, set it to pc + 4.

We can inspect the tester's code to examine its poke and expect operations.

case 0 => // no jump
  cur = pre + 4
  c.io.jump_flag_id.poke(false.B)
  c.clock.step()
  c.io.instruction_address.expect(cur)
  pre = pre + 4
case 1 => // jump
  c.io.jump_flag_id.poke(true.B)
  c.io.jump_address_id.poke(entry)
  c.clock.step()
  c.io.instruction_address.expect(entry)
  pre = entry

From this, we can infer that after one clock step, instruction_address is expected to equal jump_address_id when jump_flag_id is true. Otherwise, it is expected to equal the previous program counter plus 4 (pre + 4).
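The rule the tester checks can be written as a small plain-Scala function. FetchModel and nextPc are hypothetical names, and the Chisel line in the comment is only a sketch that assumes the PC register is called pc.

```scala
// Plain-Scala model of the next-PC rule checked by InstructionFetchTest.
object FetchModel {
  val InstructionBytes = 4 // every RV32I instruction is 4 bytes

  // PC after one clock step. In Chisel this would be roughly (assumed name `pc`):
  //   pc := Mux(io.jump_flag_id, io.jump_address_id, pc + 4.U)
  def nextPc(pc: Long, jumpFlag: Boolean, jumpAddress: Long): Long =
    if (jumpFlag) jumpAddress else pc + InstructionBytes
}
```

Replaying the two tester cases, nextPc(0x1000L, false, 0L) gives 0x1004, and nextPc(0x1004L, true, 0x2000L) gives 0x2000.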

After implementing this logic, the test passes:

$ sbt "testOnly riscv.singlecycle.InstructionFetchTest"
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] Run completed in 2 seconds, 590 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 3 s

InstructionDecode stage

In the original code, the module defines the following:

    val regs_reg1_read_address = Output(UInt(Parameters.PhysicalRegisterAddrWidth))
    val regs_reg2_read_address = Output(UInt(Parameters.PhysicalRegisterAddrWidth))
    val ex_immediate           = Output(UInt(Parameters.DataWidth))
    val ex_aluop1_source       = Output(UInt(1.W))
    val ex_aluop2_source       = Output(UInt(1.W))
    val memory_read_enable     = Output(Bool())
    val memory_write_enable    = Output(Bool())
    val wb_reg_write_source    = Output(UInt(2.W))
    val reg_write_enable       = Output(Bool())
    val reg_write_address      = Output(UInt(Parameters.PhysicalRegisterAddrWidth))

Two of these outputs have not been implemented yet:

  • memory_read_enable
  • memory_write_enable

Looking at InstructionDecoderTest reveals a gap in the test coverage: it never checks these two outputs. In other words, assigning arbitrary values to them still lets the test pass, for example:

  io.memory_read_enable := 0.U
  io.memory_write_enable := 0.U

With both outputs hard-wired to 0, the test still passes:

$ sbt "testOnly riscv.singlecycle.InstructionDecoderTest"
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] Run completed in 2 seconds, 658 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

A correct implementation, however, must classify the instruction by its opcode. Load instructions (lb, lh, lw, lbu, lhu; opcode 0000011) read from memory, so memory_read_enable must be 1. Store instructions (sb, sh, sw; opcode 0100011) write to memory, so memory_write_enable must be 1. For all other instructions, both signals are 0.
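The opcode check can be sketched in plain Scala using the RV32I major opcodes from the RISC-V ISA specification. MemControlModel and memEnables are hypothetical names. The Chisel line in the comment assumes the decoder extracts opcode = io.instruction(6, 0). That name is an assumption and is not necessarily the lab's code.

```scala
// Plain-Scala model of the decoder's memory enables (no Chisel needed).
object MemControlModel {
  // RV32I major opcodes (instruction bits 6..0), per the RISC-V ISA spec.
  val OpLoad  = 0x03 // 0000011: lb, lh, lw, lbu, lhu
  val OpStore = 0x23 // 0100011: sb, sh, sw

  def opcode(instruction: Int): Int = instruction & 0x7f

  // Returns (memory_read_enable, memory_write_enable).
  // Chisel sketch (assumed names): io.memory_read_enable := opcode === "b0000011".U
  def memEnables(instruction: Int): (Boolean, Boolean) = {
    val op = opcode(instruction)
    (op == OpLoad, op == OpStore)
  }
}
```

For example, lw x1, 0(x2) (0x00012083) yields (true, false), sw x1, 0(x2) (0x00112023) yields (false, true), and add x1, x2, x3 (0x003100b3) yields (false, false). Pairs like these could be added to InstructionDecoderTest as expect checks to close the coverage gap noted earlier.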

With both signals derived from the opcode, the test passes as well:

$ sbt "testOnly riscv.singlecycle.InstructionDecoderTest"
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] Run completed in 2 seconds, 616 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

Execute stage

In the Execute module, two additional modules are declared, namely ALU and ALUControl.

The ALU computes its result from op1 and op2 according to func. For example, if func is add, then result = op1 + op2. So before the ALU can operate, ALUControl (alu_ctrl) must supply the function type, and the two operands must be selected.

Based on the single-cycle CPU architecture diagram, the missing part of Execute is the wiring of the ALU inputs: func, op1 and op2. func comes from ALUControl, so the alu_funct output of alu_ctrl is connected to the ALU's func input. Then op1 and op2 are selected by multiplexers:

  • op1 (selected by the 1-bit ex_aluop1_source signal) can be:
    • regRd1 (the value read from register rs1) or the instruction address (pc)
  • op2 (selected by the 1-bit ex_aluop2_source signal) can be:
    • regRd2 (the value read from register rs2) or the sign-extended immediate
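A plain-Scala sketch of this data path: ALUControl's func selects the operation, and two multiplexers pick the operands. The sketch assumes, as in common RV32I single-cycle designs, that op1 is chosen between the register value and the PC (needed by auipc) and op2 between the register value and the immediate. The 0/1 encodings, the three-function ALU subset and all names are illustrative assumptions, not the lab's definitions.

```scala
// Plain-Scala model of the Execute stage's operand muxes and a tiny ALU.
object ExecuteModel {
  // Assumed encodings of the 1-bit select signals.
  val Op1FromRegister  = 0
  val Op1FromPc        = 1
  val Op2FromRegister  = 0
  val Op2FromImmediate = 1

  // A small subset of ALU functions; a real RV32I ALU supports more.
  sealed trait Funct
  case object Add extends Funct
  case object Sub extends Funct
  case object Xor extends Funct

  private val Mask = 0xFFFFFFFFL // keep results 32 bits wide, like UInt(32.W)

  def alu(func: Funct, op1: Long, op2: Long): Long = func match {
    case Add => (op1 + op2) & Mask
    case Sub => (op1 - op2) & Mask
    case Xor => (op1 ^ op2) & Mask
  }

  def execute(func: Funct, op1Source: Int, op2Source: Int,
              reg1: Long, reg2: Long, pc: Long, imm: Long): Long = {
    val op1 = if (op1Source == Op1FromPc) pc else reg1         // op1 multiplexer
    val op2 = if (op2Source == Op2FromImmediate) imm else reg2 // op2 multiplexer
    alu(func, op1, op2)
  }
}
```

In this model, add uses both register operands, addi with imm = -1 (0xFFFFFFFF as 32 bits) adds the immediate instead, and auipc-style execution uses pc as op1.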

With the ALU inputs connected, the Execute test passes:

$ sbt "testOnly riscv.singlecycle.ExecuteTest"
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] Run completed in 2 seconds, 709 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

Combining into a CPU

Now that the modules for each stage have been defined, the next step is to connect the inputs and outputs of each stage in the CPU module. Once this is done, the single-cycle machine is complete.

$ sbt "testOnly riscv.singlecycle.CPUTest"
[info] Passed: Total 0, Failed 0, Errors 0, Passed 0
[info] No tests to run for Test / testOnly
[success] Total time: 2 s

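There is no test suite named CPUTest, so sbt has nothing to run. The complete CPU is instead exercised by the program-level tests (ByteAccessTest, FibonacciTest, QuicksortTest), so we run the whole test suite: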

$ sbt test
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] Run completed in 6 seconds, 745 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 7 s,