# Assignment3: single-cycle RISC-V CPU
contributed by <[`Paintakotako`](https://github.com/Paintako)>

## Install
Follow the instructions in Lab3 to install the required dependencies. In my case Java was not installed, so it had to be installed separately.

## Chisel bootcamp notes
### Module
```scala
class Passthrough extends Module {
  val io = IO(new Bundle {
    val in = Input(UInt(4.W))
    val out = Output(UInt(4.W))
  })
  io.out := io.in
}
```
* `Module` is a built-in Chisel class that all hardware modules must extend.
* `val io = IO(...)`
    * Declares all input and output ports in the `io` val
    * **Must be** called `io` and be an `IO` object
* `new Bundle {...}`
    * Hardware struct type, contains the named signals `in` and `out`

### Tester
```scala
test(new Passthrough()) { c =>
  c.io.in.poke(0.U)     // Set our input to value 0
  c.io.out.expect(0.U)  // Assert that the output correctly has 0
  c.io.in.poke(1.U)     // Set our input to value 1
  c.io.out.expect(1.U)  // Assert that the output correctly has 1
  c.io.in.poke(2.U)     // Set our input to value 2
  c.io.out.expect(2.U)  // Assert that the output correctly has 2
}
println("SUCCESS!!") // Scala Code: if we get here, our tests passed!
```
* `test` accepts a `Passthrough` module
* Set the input using `poke`
* State the expected output using `expect`
* If all `expect` statements hold, the test passes.

### Operators
* `true.B` and `false.B` are the preferred ways to create Chisel Bool literals
* `Mux` selects between two values and operates like the **ternary operator**
* `Cat` concatenates two bit values
    * e.g. `Cat(b10, b1)` = `b101`

:::warning
Ternary operator: also known as the conditional operator, it is a shorthand way of writing an if-else statement. Its syntax is as follows:
```c
condition ?
expression_if_true : expression_if_false;
```
:::

```scala
val s = true.B
io.outmux := Mux(s, 3.U, 0.U) // outmux is 3.U because s is true
io.outcat := Cat(2.U, 1.U)    // concatenates 2 (b10) with 1 (b1); outcat is 5 (b101)
```

### Control flow
#### when, elsewhen, and otherwise
```scala
when(someBooleanCondition) {
  // things to do when true
}.elsewhen(someOtherBooleanCondition) {
  // things to do on this condition
}.otherwise {
  // things to do if none of the boolean conditions are true
}
```
* `when` describes the behavior of hardware
* Note: `when` does not return a value
    * e.g. `val result = when(squareIt) { x * x }.otherwise { x }` is not valid

#### The Wire Construct
* Defines a **circuit component**
* `Wire` can serve as an **intermediary** between two circuits.
* The reference image is as follows:
    * ![image](https://hackmd.io/_uploads/SkqeXWGrT.png)

```scala
class Sort4 extends Module {
  val io = IO(new Bundle {
    val in0 = Input(UInt(16.W))
    val in1 = Input(UInt(16.W))
    val in2 = Input(UInt(16.W))
    val in3 = Input(UInt(16.W))
    val out0 = Output(UInt(16.W))
    val out1 = Output(UInt(16.W))
    val out2 = Output(UInt(16.W))
    val out3 = Output(UInt(16.W))
  })

  val row10 = Wire(UInt(16.W))
  val row11 = Wire(UInt(16.W))
  val row12 = Wire(UInt(16.W))
  val row13 = Wire(UInt(16.W))

  when(io.in0 < io.in1) {
    row10 := io.in0 // preserve first two elements
    row11 := io.in1
  }.otherwise {
    row10 := io.in1 // swap first two elements
    row11 := io.in0
  }

  when(io.in2 < io.in3) {
    row12 := io.in2 // preserve last two elements
    row13 := io.in3
  }.otherwise {
    row12 := io.in3 // swap last two elements
    row13 := io.in2
  }

  val row21 = Wire(UInt(16.W))
  val row22 = Wire(UInt(16.W))

  when(row11 < row12) {
    row21 := row11 // preserve middle two elements
    row22 := row12
  }.otherwise {
    row21 := row12 // swap middle two elements
    row22 := row11
  }

  val row20 = Wire(UInt(16.W))
  val row23 = Wire(UInt(16.W))

  when(row10 < row13) {
    row20 := row10 // preserve first and last elements
    row23 := row13
  }.otherwise {
    row20 := row13 // swap first and last elements
    row23 := row10
  }

  when(row20 < row21) {
    io.out0 := row20 // preserve first two elements
    io.out1 := row21
  }.otherwise {
    io.out0 := row21 // swap first two elements
    io.out1 := row20
  }

  when(row22 < row23) {
    io.out2 := row22 // preserve last two elements
    io.out3 := row23
  }.otherwise {
    io.out2 := row23 // swap last two elements
    io.out3 := row22
  }
}
```
* We can define `Wire`s such as `row10, row11, ...` to act as intermediate values between the inputs and the outputs.

:::warning
`when` vs `if` in Chisel
* `when` does not return a value; instead, it is used to **describe the behavior of hardware**, such as setting signals to specific values or performing certain operations.
* `if` does not control the behavior of hardware; instead, it makes static choices while the circuit is being generated. It is typically used for **deterministic parameter logic** rather than representing hardware behavior.
:::

### Sequential Logic
#### Reg
* A `Reg` holds its output value until the **rising edge** of its clock, at which time it takes on the value of its input.
    * i.e. the value presented at the input of a `Reg` during one clock cycle appears at its output only after the next rising edge.
```scala
class RegisterModule extends Module {
  val io = IO(new Bundle {
    val in = Input(UInt(12.W))
    val out = Output(UInt(12.W))
  })

  val register = Reg(UInt(12.W))
  register := io.in + 1.U
  io.out := register
}

test(new RegisterModule) { c =>
  for (i <- 0 until 100) {
    c.io.in.poke(i.U)
    c.clock.step(1)
    c.io.out.expect((i + 1).U)
  }
}
```
* In the test case, the input is set using `poke`, and `step` is used to **tick the clock once**, which causes the **register to pass its input to its output.**

#### RegNext
In the previous case we had to specify the register type. Instead, we can use `RegNext`, which **automatically determines the register type**, inferring it from the **register's output connection.**
```scala
class RegNextModule extends Module {
  val io = IO(new Bundle {
    val in = Input(UInt(12.W))
    val out = Output(UInt(12.W))
  })

  // register bitwidth is inferred from io.out
  io.out := RegNext(io.in + 1.U)
}

test(new RegNextModule) { c =>
  for (i <- 0 until 100) {
    c.io.in.poke(i.U)
    c.clock.step(1)
    c.io.out.expect((i + 1).U)
  }
}
```

## Hello World in Chisel
```scala
class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg = RegInit(0.U(32.W))
  val blkReg = RegInit(0.U(1.W))

  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
```
The module has only an output and no input; the output is a `UInt` with a `width of 1 bit`, which means it can be either `0` or `1`. `CNT_MAX` is a constant (not a register) that holds the counter limit, 24,999,999. The `.U` indicates that this value is an `unsigned integer`. `cntReg` is a register initialized to 0 as an unsigned integer, with a `width of 32 bits`. This means that `cntReg` can represent a number in the range from 0 to $2^{32} - 1$, and it is incremented by one every clock cycle. `blkReg` is a 1-bit register, also initialized to 0, that holds the current LED state. The LED is driven by the value of `blkReg`, so it stays at `0` until `cntReg` first reaches 24,999,999.
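As a quick check of the constant, the toggle interval can be worked out by hand. This assumes the clock runs at 50 MHz, which the literal `50000000` suggests but the notes do not state; $N_{\max}$ stands for `CNT_MAX` and $f_{\text{clk}}$ for the assumed clock frequency.

$$
N_{\max} = \frac{50{,}000{,}000}{2} - 1 = 24{,}999{,}999
$$

$$
T_{\text{toggle}} = \frac{N_{\max} + 1}{f_{\text{clk}}} = \frac{25{,}000{,}000}{50 \times 10^{6}\ \text{Hz}} = 0.5\ \text{s}
$$

The counter passes through $N_{\max} + 1$ values (0 through 24,999,999) between toggles, so under this assumption the LED changes state every half second and completes one on/off cycle per second.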
Whenever `cntReg` reaches `CNT_MAX`, it is reset to zero and starts counting again, and the LED value is updated to the complement of `blkReg` (`~blkReg`); this process repeats.

We can refactor the original code using logic circuits like `Mux` with the following pattern:
```scala
class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg = RegInit(0.U(32.W))
  val blkReg = RegInit(0.U(1.W))

  // Wrap the counter at CNT_MAX, otherwise keep counting
  cntReg := Mux(cntReg === CNT_MAX, 0.U, cntReg + 1.U)
  // Toggle the LED state only on the wrap cycle
  blkReg := Mux(cntReg === CNT_MAX, ~blkReg, blkReg)
  io.led := blkReg
}
```

## Lab 3 : Single Cycle RISC-V CPU
### Implementation
Refer to the following image for the implementation of a single-cycle machine.
- [ ] Full
![Single-cycle CPU architecture](https://hackmd.io/_uploads/SJzK891ra.png)

### InstructionFetch stage
Here we need to determine the next value of the `program counter (pc)` based on whether a `jump` is required. If a jump is necessary, set the pc to the `jump address`; otherwise, set it to `pc + 4`.

We can inspect the tester's code to examine its poke and expect operations.
```scala
case 0 => // no jump
  cur = pre + 4
  c.io.jump_flag_id.poke(false.B)
  c.clock.step()
  c.io.instruction_address.expect(cur)
  pre = pre + 4
case 1 => // jump
  c.io.jump_flag_id.poke(true.B)
  c.io.jump_address_id.poke(entry)
  c.clock.step()
  c.io.instruction_address.expect(entry)
  pre = entry
```
It can be inferred that when `jump_flag_id` is set the expected value is `jump_address_id`, while for non-jump operations the expected value is the previous `program counter` plus 4.

The following is the result after incorporating the above-mentioned feature:
```bash
$ sbt "testOnly riscv.singlecycle.InstructionFetchTest"
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] Run completed in 2 seconds, 590 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 3 s
```

### InstructionDecode stage
In the original code, the module defines the following outputs:
```scala
val regs_reg1_read_address = Output(UInt(Parameters.PhysicalRegisterAddrWidth))
val regs_reg2_read_address = Output(UInt(Parameters.PhysicalRegisterAddrWidth))
val ex_immediate = Output(UInt(Parameters.DataWidth))
val ex_aluop1_source = Output(UInt(1.W))
val ex_aluop2_source = Output(UInt(1.W))
val memory_read_enable = Output(Bool())
val memory_write_enable = Output(Bool())
val wb_reg_write_source = Output(UInt(2.W))
val reg_write_enable = Output(Bool())
val reg_write_address = Output(UInt(Parameters.PhysicalRegisterAddrWidth))
```
Two of these outputs have not been implemented yet:
* memory_read_enable
* memory_write_enable

Checking the test file InstructionDecoderTest, we can identify a gap in the tests: the two missing outputs are never checked. In other words, driving them with arbitrary values still allows the test to pass.
```scala
io.memory_read_enable := 0.U
io.memory_write_enable := 0.U
```
Here is the test output with these placeholder assignments.
```bash
$ sbt "testOnly riscv.singlecycle.InstructionDecoderTest"
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] Run completed in 2 seconds, 658 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
However, a correct implementation has to classify the instruction by its type. If it is of the `load word` type, it reads from memory, so `memory_read_enable` needs to be set to `1`.
Conversely, if it is of the `store word` type, it writes to memory, so `memory_write_enable` should be set to `1`.

Here are the results after setting the outputs based on a comparison with the opcode.
```bash
$ sbt "testOnly riscv.singlecycle.InstructionDecoderTest"
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] Run completed in 2 seconds, 616 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

### Execute stage
In the Execute module, two additional modules are declared, namely `ALU` and `ALUControl`. Of these, `ALU` performs an operation on the values of `op1` and `op2` according to the given `func`. For example, if `func` is `add`, then `result = op1 + op2`.

Therefore, before entering `ALU`, the `function type` to be given to `ALU` must be determined by `alu_ctrl`. After obtaining the function type from `alu_ctrl`, the operands (`operand1` and `operand2`) for the `ALU` operation need to be specified.

Following the single-cycle CPU architecture image, the missing parts of the circuit are the assignment of `op1` and `op2` to `ALU` and the corresponding `func`. The `func` is obtained from `alu_ctrl`, so the `alu_funct` of `alu_ctrl` is assigned to `ALU`. Next, `op1` and `op2` need to be specified.
* op1 can be:
    * 0 or regRd1
* op2 can be:
    * regRd2 or imm16

The following is the result after incorporating the above-mentioned feature:
```bash
$ sbt "testOnly riscv.singlecycle.ExecuteTest"
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] Run completed in 2 seconds, 709 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

### Combining into a CPU
Now that the modules for each stage have been defined, the next step is to connect the inputs and outputs of each stage. Once this is done, the single-cycle machine will be complete.
```bash
$ sbt "testOnly riscv.singlecycle.CPUTest"
[info] Passed: Total 0, Failed 0, Errors 0, Passed 0
[info] No tests to run for Test / testOnly
[success] Total time: 2 s
```
Having completed the individual tests mentioned above, we can now execute all the test cases.
```bash
$ sbt test
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] Run completed in 6 seconds, 745 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 7 s,
```
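
The next-PC selection from the InstructionFetch stage and the memory enables from the InstructionDecode stage are described above only in prose, so here is a minimal Chisel sketch of both. It is written under assumptions and is not the lab's source: the module names `NextPcSketch` and `MemEnableSketch`, the 32-bit widths, the `instruction` port, and the zero reset value of `pc` are hypothetical. The remaining port names come from the tester excerpt and the output list above, and the opcode values `0000011` (LOAD) and `0100011` (STORE) come from the RISC-V base ISA.

```scala
import chisel3._

// Hypothetical stand-alone modules for illustration; the lab's real
// InstructionFetch and InstructionDecode have more ports than shown here.

// Next-PC selection: take the jump target if a jump is requested,
// otherwise advance to the next sequential instruction (pc + 4).
class NextPcSketch extends Module {
  val io = IO(new Bundle {
    val jump_flag_id        = Input(Bool())
    val jump_address_id     = Input(UInt(32.W))
    val instruction_address = Output(UInt(32.W))
  })

  // Assumed reset value; the lab takes its entry address from Parameters.
  val pc = RegInit(0.U(32.W))

  // The register updates on each rising clock edge.
  pc := Mux(io.jump_flag_id, io.jump_address_id, pc + 4.U)

  io.instruction_address := pc
}

// Memory enables derived from the 7-bit opcode field of the instruction.
class MemEnableSketch extends Module {
  val io = IO(new Bundle {
    val instruction         = Input(UInt(32.W))
    val memory_read_enable  = Output(Bool())
    val memory_write_enable = Output(Bool())
  })

  val opcode = io.instruction(6, 0)

  io.memory_read_enable  := opcode === "b0000011".U // LOAD  (lb, lh, lw, ...)
  io.memory_write_enable := opcode === "b0100011".U // STORE (sb, sh, sw)
}
```

Comparing on the whole opcode means every load or store width asserts the corresponding enable, not only `lw` and `sw`. A tester that pokes one load and one store instruction and calls `expect` on both enables would also close the gap noted in the InstructionDecode section, where placeholder values still pass.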