# Assignment3: single-cycle RISC-V CPU
contributed by <[`Paintakotako`](https://github.com/Paintako)>
## Install
Follow the instructions in Lab3 to install the required dependencies. I ran into a situation where Java was not installed, so it had to be installed first.
## Chisel bootcamp notes
### Module
```scala
class Passthrough extends Module {
val io = IO(new Bundle {
val in = Input(UInt(4.W))
val out = Output(UInt(4.W))
})
io.out := io.in
}
```
* `Module` is a built-in Chisel class that all hardware modules must extend.
* `val io = IO(...)`
    * Declares all input and output ports in the `io` val.
    * **Must be** called `io` and be an `IO` object.
* `new Bundle {...}`
    * A hardware struct type that contains the named signals `in` and `out`.
### Tester
```scala
test(new Passthrough()) { c =>
c.io.in.poke(0.U) // Set our input to value 0
c.io.out.expect(0.U) // Assert that the output correctly has 0
c.io.in.poke(1.U) // Set our input to value 1
c.io.out.expect(1.U) // Assert that the output correctly has 1
c.io.in.poke(2.U) // Set our input to value 2
c.io.out.expect(2.U) // Assert that the output correctly has 2
}
println("SUCCESS!!") // Scala Code: if we get here, our tests passed!
```
* `test` accepts a `Passthrough` module.
* Inputs are set with `poke`.
* Expected outputs are checked with `expect`.
* If all `expect` statements hold, the test passes.
### Operators
* `true.B` and `false.B` are the preferred ways to create Chisel `Bool` literals.
* `Mux` selects between two values and works like the **ternary operator**.
* `Cat` concatenates bit values, with the first argument in the most significant position.
    * e.g. `Cat("b10".U, "b1".U)` gives `b101`
:::warning
Ternary operator:
Also known as the conditional operator, is a shorthand way of writing an if-else statement. Its syntax is as follows:
```c
condition ? expression_if_true : expression_if_false;
```
:::
```scala
val s = true.B
io.outmux := Mux(s, 3.U, 0.U) // outmux is 3.U because s is true
io.outcat := Cat(2.U, 1.U) // concatenates 2 (b10) with 1 (b1); outcat is 5 (b101)
```
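To make the bit arithmetic behind `Mux` and `Cat` concrete, here is a minimal plain-Scala sketch. It is a software model, not Chisel hardware: the object `BitOps` and the explicit `loWidth` parameter are assumptions made for illustration, since Chisel tracks bit widths on its own.

```scala
// Plain-Scala model of Mux and Cat on unsigned values (a software sketch, not Chisel hardware).
object BitOps {
  // Mux: return `ifTrue` when `sel` is set, otherwise `ifFalse`.
  def mux(sel: Boolean, ifTrue: Long, ifFalse: Long): Long =
    if (sel) ifTrue else ifFalse

  // Cat: shift `hi` left by the width of `lo`, then OR in `lo`.
  // `loWidth` is the bit width of `lo` (hypothetical parameter for this model).
  def cat(hi: Long, lo: Long, loWidth: Int): Long = {
    require(loWidth > 0 && loWidth < 32, "loWidth out of range for this sketch")
    require(lo >= 0 && lo < (1L << loWidth), "lo must fit in loWidth bits")
    (hi << loWidth) | lo
  }
}
```

With these definitions, `BitOps.cat(2, 1, 1)` shifts `b10` left by one bit and ORs in `b1`, which matches the `b101` result described above.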
### Control flow
#### when, elsewhen, and otherwise
```scala
when(someBooleanCondition) {
// things to do when true
}.elsewhen(someOtherBooleanCondition) {
// things to do on this condition
}.otherwise {
// things to do if none of the boolean conditions are true
}
```
* `when` describes the behavior of hardware.
* Note: `when` does not return a value.
    * e.g. `val result = when(squareIt) { x * x }.otherwise { x }` is not valid
#### The Wire Construct
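* Defines a **circuit component** that can be assigned to and read from.
* A `Wire` can serve as an **intermediary** between two circuits.
* The `Sort4` example below uses wires for the intermediate rows of a four-input sorting network.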
```scala
class Sort4 extends Module {
val io = IO(new Bundle {
val in0 = Input(UInt(16.W))
val in1 = Input(UInt(16.W))
val in2 = Input(UInt(16.W))
val in3 = Input(UInt(16.W))
val out0 = Output(UInt(16.W))
val out1 = Output(UInt(16.W))
val out2 = Output(UInt(16.W))
val out3 = Output(UInt(16.W))
})
val row10 = Wire(UInt(16.W))
val row11 = Wire(UInt(16.W))
val row12 = Wire(UInt(16.W))
val row13 = Wire(UInt(16.W))
when(io.in0 < io.in1) {
row10 := io.in0 // preserve first two elements
row11 := io.in1
}.otherwise {
row10 := io.in1 // swap first two elements
row11 := io.in0
}
when(io.in2 < io.in3) {
row12 := io.in2 // preserve last two elements
row13 := io.in3
}.otherwise {
row12 := io.in3 // swap last two elements
row13 := io.in2
}
val row21 = Wire(UInt(16.W))
val row22 = Wire(UInt(16.W))
when(row11 < row12) {
row21 := row11 // preserve middle 2 elements
row22 := row12
}.otherwise {
row21 := row12 // swap middle two elements
row22 := row11
}
val row20 = Wire(UInt(16.W))
val row23 = Wire(UInt(16.W))
when(row10 < row13) {
row20 := row10 // preserve outer two elements
row23 := row13
}.otherwise {
row20 := row13 // swap outer two elements
row23 := row10
}
when(row20 < row21) {
io.out0 := row20 // preserve first two elements
io.out1 := row21
}.otherwise {
io.out0 := row21 // swap first two elements
io.out1 := row20
}
when(row22 < row23) {
io.out2 := row22 // preserve last two elements
io.out3 := row23
}.otherwise {
io.out2 := row23 // swap last two elements
io.out3 := row22
}
}
```
* Wires such as `row10`, `row11`, ... hold the intermediate values between the inputs and the outputs.
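The compare-and-swap structure of `Sort4` can be checked in software. The following is a plain-Scala sketch of the same three stages, not Chisel code; the object `Sort4Model` and the helper `order` are hypothetical names, while the `row..` names mirror the wires above.

```scala
// Software model of the Sort4 network: each `order` call is one when/otherwise block.
object Sort4Model {
  // Compare-and-swap: return the pair with the smaller value first.
  private def order(a: Int, b: Int): (Int, Int) = if (a < b) (a, b) else (b, a)

  def sort4(in0: Int, in1: Int, in2: Int, in3: Int): Seq[Int] = {
    // Stage 1: order the first pair and the last pair.
    val (row10, row11) = order(in0, in1)
    val (row12, row13) = order(in2, in3)
    // Stage 2: order the middle pair and the outer pair.
    val (row21, row22) = order(row11, row12)
    val (row20, row23) = order(row10, row13)
    // Stage 3: order the first pair and the last pair again.
    val (out0, out1) = order(row20, row21)
    val (out2, out3) = order(row22, row23)
    Seq(out0, out1, out2, out3)
  }
}
```

Running `sort4` over every permutation of four distinct values is a quick way to convince yourself that three stages are enough.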
:::warning
`when` vs `if` in Chisel
* `when` does not return a value; instead, it is used to **describe the behavior of hardware**, such as setting signals to specific values or performing certain operations.
* `if` is not used to control the behavior of hardware; instead, it makes static choices during the generation process. It is typically used for **deterministic parameter logic** rather than representing hardware behavior.
:::
### Sequential Logic
#### Reg
* A `Reg` holds its output value until the **rising edge** of its clock, at which time it takes on the value of its input.
* i.e. a value presented at the input of a `Reg` during one clock cycle appears at its output only after the next rising edge.
```scala
class RegisterModule extends Module {
val io = IO(new Bundle {
val in = Input(UInt(12.W))
val out = Output(UInt(12.W))
})
val register = Reg(UInt(12.W))
register := io.in + 1.U
io.out := register
}
test(new RegisterModule) { c =>
for (i <- 0 until 100) {
c.io.in.poke(i.U)
c.clock.step(1)
c.io.out.expect((i + 1).U)
}
}
```
* In the test, `poke` sets the input and `step(1)` **ticks the clock once**, which causes the **register to pass its input to its output.**
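The poke/step/expect sequence can be mimicked with a small plain-Scala model. This is only a sketch of the timing, not Chisel: the class `RegisterModel` is a hypothetical name, and starting the register at 0 is an assumption, because a `Reg` without an initial value has no defined reset state.

```scala
// Software model of `register := io.in + 1.U` with a 12-bit register.
final class RegisterModel(width: Int = 12) {
  private val mask = (1 << width) - 1
  private var state = 0 // value currently visible on the output (assumed 0 at start)
  private var next = 0  // value currently driven on the register input

  // Drive the input; the output does not change yet.
  def poke(in: Int): Unit = { next = (in + 1) & mask }

  // Rising clock edge: the register takes on its input value.
  def step(): Unit = { state = next }

  def out: Int = state
}
```

Calling `poke(5)` leaves `out` unchanged; only after `step()` does `out` become 6, which is why the test steps the clock before calling `expect`.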
#### RegNext
In the previous example we had to specify the register type explicitly. `RegNext` avoids this: the register type is **determined automatically**, inferred from the **register's connections.**
```scala
class RegNextModule extends Module {
val io = IO(new Bundle {
val in = Input(UInt(12.W))
val out = Output(UInt(12.W))
})
// register bitwidth is inferred from io.out
io.out := RegNext(io.in + 1.U)
}
test(new RegNextModule) { c =>
for (i <- 0 until 100) {
c.io.in.poke(i.U)
c.clock.step(1)
c.io.out.expect((i + 1).U)
}
}
```
## Hello World in Chisel
```scala
class Hello extends Module {
val io = IO(new Bundle {
val led = Output(UInt(1.W))
})
val CNT_MAX = (50000000 / 2 - 1).U;
val cntReg = RegInit(0.U(32.W))
val blkReg = RegInit(0.U(1.W))
cntReg := cntReg + 1.U
when(cntReg === CNT_MAX) {
cntReg := 0.U
blkReg := ~blkReg
}
io.led := blkReg
}
```
The module has one output and no inputs. The output is a `UInt` that is 1 bit wide, so it can only be `0` or `1`.
`CNT_MAX` is a constant (not a register) with the value 24,999,999. The `.U` makes it an unsigned integer hardware literal.
`cntReg` is a 32-bit register initialized to 0, so it can hold values from 0 to $2^{32} - 1$. It increments by one every clock cycle.
`blkReg` is a 1-bit register initialized to 0 that holds the current LED state.
When `cntReg` reaches `CNT_MAX`, `cntReg` is reset to zero and `blkReg` is inverted (`~blkReg`). The LED is driven by `blkReg`, so it toggles once every 25,000,000 clock cycles, and the process repeats.
The same behavior can be expressed with `Mux` instead of `when`:
```scala
class Hello extends Module {
val io = IO(new Bundle {
val led = Output(UInt(1.W))
})
val CNT_MAX = (50000000 / 2 - 1).U;
val cntReg = RegInit(0.U(32.W))
val blkReg = RegInit(0.U(1.W))
cntReg := Mux(cntReg === CNT_MAX, 0.U, cntReg + 1.U)
blkReg := Mux(cntReg === CNT_MAX, ~blkReg, blkReg)
io.led := blkReg
}
```
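To check the counting behavior without hardware, the two `Mux` assignments can be modeled in plain Scala. This is a sketch only: `BlinkModel` is a hypothetical name, and a small `cntMax` is used so the simulation stays short.

```scala
// Software model of the blinking LED: (cnt, led) advances once per clock cycle.
object BlinkModel {
  // One clock cycle, mirroring the two Mux expressions:
  // at cntMax the counter wraps to 0 and the LED bit flips, otherwise the counter increments.
  def step(cnt: Long, led: Int, cntMax: Long): (Long, Int) =
    if (cnt == cntMax) (0L, led ^ 1) else (cnt + 1, led)

  // Run `cycles` clock cycles starting from the reset state (0, 0).
  def run(cycles: Long, cntMax: Long): (Long, Int) = {
    var state = (0L, 0)
    var i = 0L
    while (i < cycles) {
      state = step(state._1, state._2, cntMax)
      i += 1
    }
    state
  }
}
```

With `cntMax = 3` the LED flips on every 4th cycle. By the same reasoning, `CNT_MAX = 24,999,999` flips it every 25,000,000 cycles, so one full on/off period takes 50,000,000 cycles. If the clock runs at 50 MHz (an assumption suggested by the constant `50000000`, not stated in the code), that is one blink per second.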
## Lab 3 : Single Cycle RISC-V CPU
### Implementation
Refer to the full single-cycle datapath diagram for the implementation.
### InstructionFetch stage
Here we need to determine the next value of the program counter (`pc`) based on whether a jump is required. If a jump is taken, set `pc` to the jump address; otherwise, set it to `pc + 4`.
We can inspect the tester's code to examine its poke and expect operations.
```scala
case 0 => // no jump
cur = pre + 4
c.io.jump_flag_id.poke(false.B)
c.clock.step()
c.io.instruction_address.expect(cur)
pre = pre + 4
case 1 => // jump
c.io.jump_flag_id.poke(true.B)
c.io.jump_address_id.poke(entry)
c.clock.step()
c.io.instruction_address.expect(entry)
pre = entry
```
From this we can infer that when a jump is taken the expected value is `jump_address_id`, and otherwise it is the previous program counter plus 4.
The following is the result after incorporating the above-mentioned feature:
```bash
$ sbt "testOnly riscv.singlecycle.InstructionFetchTest"
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] Run completed in 2 seconds, 590 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 3 s
```
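The next-`pc` rule that the tester checks can be written down as a small plain-Scala function. This is a software sketch rather than the lab's Chisel code: `NextPc` and its parameter names are hypothetical, and the 32-bit address mask is an assumption.

```scala
// Software model of next-pc selection in the instruction fetch stage.
object NextPc {
  private val AddrMask = 0xFFFFFFFFL // assumption: 32-bit instruction addresses

  // Jump target when a jump is taken, otherwise the address of the next instruction.
  def next(pc: Long, jumpFlag: Boolean, jumpAddress: Long): Long =
    (if (jumpFlag) jumpAddress else pc + 4) & AddrMask
}
```

In Chisel the same choice would typically be a single `Mux` on the jump flag that feeds the `pc` register.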
### InstructionDecode stage
In the original code, the module defines the following:
```scala
val regs_reg1_read_address = Output(UInt(Parameters.PhysicalRegisterAddrWidth))
val regs_reg2_read_address = Output(UInt(Parameters.PhysicalRegisterAddrWidth))
val ex_immediate = Output(UInt(Parameters.DataWidth))
val ex_aluop1_source = Output(UInt(1.W))
val ex_aluop2_source = Output(UInt(1.W))
val memory_read_enable = Output(Bool())
val memory_write_enable = Output(Bool())
val wb_reg_write_source = Output(UInt(2.W))
val reg_write_enable = Output(Bool())
val reg_write_address = Output(UInt(Parameters.PhysicalRegisterAddrWidth))
```
Two of these outputs have not been implemented yet:
* `memory_read_enable`
* `memory_write_enable`
Looking at the test in `InstructionDecoderTest`, we can identify a gap in test coverage: neither of the two missing outputs is checked. In other words, assigning arbitrary constants to them still lets the test pass.
```scala
io.memory_read_enable := 0.U
io.memory_write_enable := 0.U
```
With both outputs set to the constant `0.U`, the test still passes:
```bash
$ sbt "testOnly riscv.singlecycle.InstructionDecoderTest"
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] Run completed in 2 seconds, 658 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
In a correct implementation, however, we need to classify the instruction by its opcode. A load instruction reads from memory, so `memory_read_enable` must be set to `1`. Conversely, a store instruction writes to memory, so `memory_write_enable` must be set to `1`.
Here is the test result after setting the outputs based on the opcode:
```bash
$ sbt "testOnly riscv.singlecycle.InstructionDecoderTest"
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] Run completed in 2 seconds, 616 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
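The opcode comparison can be illustrated with a plain-Scala sketch. It is not the lab's decoder: `MemEnables` and its function names are hypothetical, and only the RV32I opcode values for loads (`0000011`) and stores (`0100011`) come from the RISC-V specification.

```scala
// Software model of the two memory enable signals, derived from the 7-bit opcode.
object MemEnables {
  val OpcodeLoad = 0x03  // 0b0000011: lb, lh, lw, lbu, lhu
  val OpcodeStore = 0x23 // 0b0100011: sb, sh, sw

  // The opcode is the lowest 7 bits of the 32-bit instruction word.
  def opcode(instruction: Long): Int = (instruction & 0x7F).toInt

  def memoryReadEnable(instruction: Long): Boolean = opcode(instruction) == OpcodeLoad
  def memoryWriteEnable(instruction: Long): Boolean = opcode(instruction) == OpcodeStore
}
```

For example, `0x00052283` (`lw t0, 0(a0)`) enables only the read, `0x00552023` (`sw t0, 0(a0)`) enables only the write, and `0x00000013` (`addi x0, x0, 0`) enables neither. Checks of this kind are what the decoder test would need in order to cover the two outputs.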
### Execute stage
In the `Execute` module, two additional modules are instantiated: `ALU` and `ALUControl`.
`ALU` performs an operation on `op1` and `op2` according to the given `func`.
For example, if `func` is `add`, then `result = op1 + op2`. Before the ALU can compute anything, `alu_ctrl` has to tell it which function to perform, and the two operands have to be selected.
Following the single-cycle CPU diagram, the missing code is exactly this wiring: `func`, `op1`, and `op2` of `ALU` are not assigned yet.
`func` comes from `alu_ctrl`, so the `alu_funct` output of `alu_ctrl` is connected to `ALU`. Then `op1` and `op2` are selected:
* `op1` can be:
    * 0 or regRd1
* `op2` can be:
    * regRd2 or imm16
The following is the result after incorporating the above-mentioned feature:
```bash
$ sbt "testOnly riscv.singlecycle.ExecuteTest"
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] Run completed in 2 seconds, 709 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
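The relationship between `func`, `op1`, and `op2` can be sketched as a plain-Scala model. This is an illustration under assumptions, not the lab's `ALU`: the object `AluModel`, the function names, and the reduced set of five operations are hypothetical, and results are truncated to 32 bits.

```scala
// Software model of an ALU: `func` selects the operation applied to op1 and op2.
object AluModel {
  sealed trait Func
  case object Add extends Func
  case object Sub extends Func
  case object And extends Func
  case object Or extends Func
  case object Xor extends Func

  private val Mask = 0xFFFFFFFFL // assumption: 32-bit data width

  def execute(func: Func, op1: Long, op2: Long): Long = {
    val raw = func match {
      case Add => op1 + op2
      case Sub => op1 - op2
      case And => op1 & op2
      case Or  => op1 | op2
      case Xor => op1 ^ op2
    }
    raw & Mask // wrap around like fixed-width hardware
  }
}
```

The model makes the dependency explicit: the result is only meaningful once the function and both operands have been supplied, which is the wiring this stage was missing.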
### Combining into a CPU
Now that the modules for each stage have been defined, the next step is to connect the inputs and outputs of each stage. Once this is done, the single-cycle machine is complete.
```bash
$ sbt "testOnly riscv.singlecycle.CPUTest"
[info] Passed: Total 0, Failed 0, Errors 0, Passed 0
[info] No tests to run for Test / testOnly
[success] Total time: 2 s
```
`testOnly riscv.singlecycle.CPUTest` matches no test suite, so sbt reports that there are no tests to run. With the individual stage tests passing, we can instead run the whole test suite:
```bash
$ sbt test
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] Run completed in 6 seconds, 745 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 7 s,
```