# Assignment3: Single-cycle RISC-V CPU

contributed by [chihenliu](https://hackmd.io/@chihenliu)

## 1. Environment Setup

My OS is `Ubuntu 22.04.3 LTS`.

### 1.1. Install the dependent packages

```shell
$ sudo apt install build-essential verilator gtkwave
```

### 1.2. Install sbt/JDK/SDKMAN

#### 1.2.1 Install SDKMAN

Follow the instructions to install SDKMAN:

```shell
$ curl -s "https://get.sdkman.io" | bash
$ source "$HOME/.sdkman/bin/sdkman-init.sh"
$ sdk version
```

#### 1.2.2 Install sbt

Follow the instructions to install sbt:

```shell
$ sdk install java $(sdk list java | grep -o "\b8\.[0-9]*\.[0-9]*\-tem" | head -1)
$ sdk install sbt
```

The installation of sbt is complete.

#### 1.2.3 Install JDK

Follow the instructions in [Lab3: Construct a single-cycle RISC-V CPU with Chisel](https://hackmd.io/@sysprog/r1mlr3I7p#Lab3-Construct-a-single-cycle-RISC-V-CPU-with-Chisel):

```shell
$ sdk install java 11.0.21-tem
```

The installation of JDK is complete.

#### 1.2.4 Install GTKWave

The general installation steps are:

1. `./configure`
2. `make`
3. `sudo make install`

However, these steps produced errors on my Ubuntu system, so I first installed the dependencies listed in the README.md:

```shell!
$ sudo apt-get install libjudy-dev
$ sudo apt-get install libbz2-dev
$ sudo apt-get install liblzma-dev
$ sudo apt-get install libgconf2-dev
$ sudo apt-get install libgtk2.0-dev
$ sudo apt-get install tcl-dev
$ sudo apt-get install tk-dev
$ sudo apt-get install gperf
$ sudo apt-get install gtk2-engines-pixbuf
```

After installing the packages above, I installed GTKWave using `./configure`, `make`, and `sudo make install`.

## 2. Explanation of Hello World in Chisel

### 2.1 Chisel tutorials

Follow the instructions:

```shell
$ git clone https://github.com/ucb-bar/chisel-tutorial
$ cd chisel-tutorial
$ git checkout release
$ sbt run
```

Output:

```
test Hello Success: 1 tests passed in 6 cycles taking 0.004980 seconds
[info] [0.002] RAN 1 CYCLES PASSED
[success] Total time: 2 s, completed
```

You can also run all examples:

```shell
$ ./run-examples.sh all
```

### 2.2 Chisel Bootcamp

- [x] 1.Introduction to Scala
- [x] 2.1.Your First Chisel Module
- [x] 2.2.Combinational Logic
- [x] 2.3.Control Flow
- [x] 2.4.Sequential Logic
- [x] 2.5.Putting it all Together: An FIR Filter
- [x] 2.6.More on ChiselTest
- [x] 3.1.Generators: Parameters
- [x] 3.2.Generators: Collections
- [x] 3.3.Chisel Standard Library
- [x] 3.4.Higher-Order Functions
- [x] 3.5.Functional Programming
- [x] 3.6.Object Oriented Programming
- [x] 3.7.Generators: Types

I will go through all the steps before `Dec.1`.

### 2.3 Explanation of Hello World

```scala
class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg  = RegInit(0.U(32.W))
  val blkReg  = RegInit(0.U(1.W))
  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
```

* The I/O `Bundle` has only one output signal, `led`, and no input signals. `led` is an unsigned integer with a bit width of 1.
* `cntReg` is a 32-bit unsigned integer register, initialized to 0.
* `blkReg` is a 1-bit unsigned integer register, initialized to 0 and used to control the state of the LED.
* `CNT_MAX` is a constant with a value of `24999999`.
  This value is typically set based on the system's clock frequency and is used to control the blink rate of the LED.
* In each clock cycle, the value of `cntReg` increases by 1.
* When `cntReg` reaches the value of `CNT_MAX`, `cntReg` is reset to 0 and the value of `blkReg` is inverted.

We can obtain a different LED behavior by eliminating `blkReg`:

```scala
// Hello in Chisel, after eliminating blkReg
class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val cntMax = (50000000 / 2 - 1).U
  val cntReg = RegInit(0.U(32.W))
  cntReg := Mux(cntReg === cntMax, 0.U, cntReg + 1.U)
  io.led := cntReg === cntMax
}
```

* When `cntReg` reaches `cntMax`, the LED is driven ON (1) directly; at all other times the LED is OFF (0).
* The LED therefore flashes for a single clock cycle each time `cntReg` reaches `cntMax`, rather than remaining illuminated until the next counting cycle completes.

## 3. Complete Lab 3 `MyCPU` code

### 3.1. Single-cycle CPU

#### Single-cycle CPU diagram

![Single-cycle CPU architecture](https://hackmd.io/_uploads/SJzK891ra.png)

#### InstructionFetch stage

#### InstructionDecode stage

#### Execute stage

#### Memory Access stage

#### Write-Back stage

### 3.2. Finish My CPU code

We need to add code to four Scala files to complete the modules in `src/main/scala/riscv/core`:

* InstructionFetch.scala
* InstructionDecode.scala
* Execute.scala
* CPU.scala

After completing the `Instruction Fetch`, `Instruction Decode`, and `Execute` stages, I used these components to complete the `CPU` module. Here is my [repository](https://github.com/chihen0709/ca2023-lab3) for Lab 3, which was forked from [ca2023-lab3](https://github.com/sysprog21/ca2023-lab3).

### 3.3. MyCPU test and Waveform

Test command:

```shell
$ sbt test
```

Before the CPU code is completed, this produces the following output:

```shell
[info] *** 6 TESTS FAILED ***
[error] Failed tests:
[error]         riscv.singlecycle.InstructionDecoderTest
[error]         riscv.singlecycle.ByteAccessTest
[error]         riscv.singlecycle.InstructionFetchTest
[error]         riscv.singlecycle.ExecuteTest
[error]         riscv.singlecycle.FibonacciTest
[error]         riscv.singlecycle.QuicksortTest
[error] (Test / test) sbt.TestsFailedException: Tests unsuccessful
```

After completing the missing code for the `Instruction Fetch`, `Instruction Decode`, and `Execute` stages as well as the CPU, I ran the tests again with the command provided in Lab 3:

```shell
$ sbt test
```

This time the output is:

```shell
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] Run completed in 9 seconds, 325 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 10 s, completed Nov 28, 2023, 5:41:06 PM
```

To run a single test case, use the following command:

```shell
$ sbt "testOnly riscv.singlecycle.XXXTest"
```

#### 3.3.1. InstructionFetch test

The `PC` is initialized to `ProgramCounter.EntryAddress`. The control signal `jump_flag_id` determines whether a jump is executed. If it is true, the jump is taken and the `PC` is updated to the address provided by `jump_address_id`. If it is false, the `PC` is incremented by 4 to execute the next instruction.

##### InstructionFetchTest.scala

```scala
class InstructionFetchTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("InstructionFetch of Single Cycle CPU")
  it should "fetch instruction" in {
    test(new InstructionFetch).withAnnotations(TestAnnotations.annos) { c =>
      val entry = 0x1000
      var pre   = entry
      var cur   = pre
      c.io.instruction_valid.poke(true.B)
      var x = 0
      for (x <- 0 to 100) {
        Random.nextInt(2) match {
          case 0 => // no jump
            cur = pre + 4
            c.io.jump_flag_id.poke(false.B)
            c.clock.step()
            c.io.instruction_address.expect(cur)
            pre = pre + 4
          case 1 => // jump
            c.io.jump_flag_id.poke(true.B)
            c.io.jump_address_id.poke(entry)
            c.clock.step()
            c.io.instruction_address.expect(entry)
            pre = entry
        }
      }
    }
  }
}
```

In the given example, a random number is generated on each iteration. If this random number is 0, the program continues without any jump, and the Program Counter (`PC`) simply increments by 4 (to `pre + 4`).
Conversely, if the random number is 1, the program executes a jump to the entry address.

##### Waveform

* `jump_flag_id` set to 1

![screenshot 2023-11-28 203504](https://hackmd.io/_uploads/rybtyDXHT.png)

![screenshot 2023-11-28 203516](https://hackmd.io/_uploads/HJZY1PQHa.png)

When `jump_flag_id` is set to `1`, you can observe that instead of incrementing by `4` to become `0x100C`, the PC jumps directly to `0x1000` from its original address `0x1008`.

* `jump_flag_id` set to 0

![screenshot 2023-11-28 204421](https://hackmd.io/_uploads/H1d1ZPXB6.png)

![screenshot 2023-11-28 204411](https://hackmd.io/_uploads/S1YyWPQrp.png)

You can observe that when `jump_flag_id` is set to `0`, the PC address transitions from `0x1000` to `0x1004` after the next clock cycle, following `PC+4`.

#### 3.3.2. Instruction Decode test

In the `ID` stage, the input signal `instruction` is decoded by the `ID` unit, which generates the control signals for the rest of the circuit. After completing the `ID` module, it provides a total of `10` outputs.

##### InstructionDecodeTest.scala

```scala
class InstructionDecoderTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("InstructionDecoder of Single Cycle CPU")
  it should "produce correct control signal" in {
    test(new InstructionDecode).withAnnotations(TestAnnotations.annos) { c =>
      c.io.instruction.poke(0x00a02223L.U) // S-type
      c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
      c.io.ex_aluop2_source.expect(ALUOp2Source.Immediate)
      c.io.regs_reg1_read_address.expect(0.U)
      c.io.regs_reg2_read_address.expect(10.U)
      c.clock.step()

      c.io.instruction.poke(0x000022b7L.U) // lui
      c.io.regs_reg1_read_address.expect(0.U)
      c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
      c.io.ex_aluop2_source.expect(ALUOp2Source.Immediate)
      c.clock.step()

      c.io.instruction.poke(0x002081b3L.U) // add
      c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
      c.io.ex_aluop2_source.expect(ALUOp2Source.Register)
      c.clock.step()
    }
  }
}
```

The above code verifies three instructions: an `S-type` store, `lui`, and `add`. I added two signals, `memory_read_enable` and `memory_write_enable`, in the `InstructionDecode.scala` file, but the test case above does not check `memory_write_enable`. Additional test cases for `memory_write_enable` could be added as part of completing Assignment 3.

##### Waveform

![screenshot 2023-11-28 213124](https://hackmd.io/_uploads/H12KnDXr6.png)

**S-type** Waveform

![screenshot 2023-11-28 213214](https://hackmd.io/_uploads/rk2KhwmBa.png)

**lui** Waveform

![screenshot 2023-11-28 213256](https://hackmd.io/_uploads/H12FhP7Ha.png)

**add** Waveform

![screenshot 2023-11-28 213424](https://hackmd.io/_uploads/SkTY3DmBp.png)

![screenshot 2023-11-28 221000](https://hackmd.io/_uploads/S1TyB_7rp.png)

![screenshot 2023-11-28 221031](https://hackmd.io/_uploads/S1pySO7rp.png)

#### 3.3.3. Execute test

Based on `Execute.scala`, this stage is primarily composed of two modules: `ALU` and `ALU Control`. `ALU Control` takes the `opcode`, `funct3`, and `funct7` fields of the instruction and generates the function code for the ALU.
Subsequently, the `ALU` performs the operation selected by that function code, and the Execute stage produces the output signals `if_jump_flag` and `if_jump_address`.

##### ExecuteTest.scala

```scala
class ExecuteTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Execution of Single Cycle CPU")
  it should "execute correctly" in {
    test(new Execute).withAnnotations(TestAnnotations.annos) { c =>
      c.io.instruction.poke(0x001101b3L.U) // x3 = x2 + x1

      var x = 0
      for (x <- 0 to 100) {
        val op1    = scala.util.Random.nextInt(429496729)
        val op2    = scala.util.Random.nextInt(429496729)
        val result = op1 + op2
        val addr   = scala.util.Random.nextInt(32)

        c.io.reg1_data.poke(op1.U)
        c.io.reg2_data.poke(op2.U)

        c.clock.step()
        c.io.mem_alu_result.expect(result.U)
        c.io.if_jump_flag.expect(0.U)
      }

      // beq test
      c.io.instruction.poke(0x00208163L.U) // pc + 2 if x1 === x2
      c.io.instruction_address.poke(2.U)
      c.io.immediate.poke(2.U)
      c.io.aluop1_source.poke(1.U)
      c.io.aluop2_source.poke(1.U)
      c.clock.step()

      // equ
      c.io.reg1_data.poke(9.U)
      c.io.reg2_data.poke(9.U)
      c.clock.step()
      c.io.if_jump_flag.expect(1.U)
      c.io.if_jump_address.expect(4.U)

      // not equ
      c.io.reg1_data.poke(9.U)
      c.io.reg2_data.poke(19.U)
      c.clock.step()
      c.io.if_jump_flag.expect(0.U)
      c.io.if_jump_address.expect(4.U)
    }
  }
}
```

I added the previously missing signal assignments for `alu.io.func`, `alu.io.op1`, and `alu.io.op2` in Execute. This test verifies three cases: the addition `x3 = x1 + x2`, `beq` with equal operands (`equ`), and `beq` with unequal operands (`not equ`).

##### Waveform

**X3=X1+X2**

![screenshot 2023-11-28 220115](https://hackmd.io/_uploads/H1tB7_QrT.png)

**beq**

![screenshot 2023-11-28 220203](https://hackmd.io/_uploads/HytrQdmB6.png)

**not beq**

![screenshot 2023-11-28 220219](https://hackmd.io/_uploads/SktBXd7Bp.png)

## 4. HomeWork2 Assembly Code Adapt on MyCPU

### 4.1. Modify the original Homework 2 code

Because the single-cycle CPU lacks system calls, I removed the `ecall`, `rdcycle`, and `rdcycleh` instructions and added the `_start` and `loop` labels instead.

```Assembly
.global itof_clz
.global _start
_start:
    la t0, num
    lw a0, 12(t0)
    lw a1, 8(t0)
    jal itof_clz
    li t0,1
    li t1,2
    li t2,3
loop:
    j loop
```

### 4.2. Test my RISC-V assembly

I added my test to CPUTest; here is the test program:

```scala
class itof_clzTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "convert integer to floating point" in {
    test(new TestTopModule("itof_clz.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 500) {
        c.clock.step(1000) // Avoid timeout
        c.io.mem_debug_read_address.poke((i * 4).U) // Assume the converted result is stored in memory sequentially
      }
      c.io.regs_debug_read_address.poke(10.U)
      println(s"${c.io.regs_debug_read_data.peek()}")
      c.io.regs_debug_read_data.expect(1088462400.U)
      c.io.regs_debug_read_address.poke(11.U)
      println(s"${c.io.regs_debug_read_data.peek()}")
      c.io.regs_debug_read_data.expect(0.U)
    }
  }
}
```

The main goal is to test whether my integer is converted correctly into an IEEE-754 floating-point value.

Command to run this single test:

```shell
$ sbt "testOnly riscv.singlecycle.itof_clzTest"
```

Running the test produces the following success message:

```shell
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /home/chihen/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/chihen/ca2023-lab3/)
UInt<32>(1088462400)
UInt<32>(0)
[info] itof_clzTest:
[info] Single Cycle CPU
[info] - should convert integer to floating point
[info] Run completed in 18 seconds, 664 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 19 s, completed Nov 29, 2023, 8:36:22 PM
```

| Input    | Output               |
| -------- | -------------------- |
| 0x84f2   | UInt<32>(1088462400) |

I checked this output with an IEEE 754 converter and it is correct.

### 4.3. Using Verilator to Run the Assembly and Visualizing using waveforms

Use the following commands to generate the simulation executable of the CPU and run the program:

```shell
$ make verilator
$ ./run-verilator.sh -instruction src/main/resources/itof_clz.asmbin -time 4000 -vcd itofclz01.vcd
```

Output:

```shell
-time 4000
-memory 1048576
-instruction src/main/resources/itof_clz.asmbin
[-------------------->] 100%
```

#### Waveform

An online [RISC-V instruction encoder/decoder](https://luplab.gitlab.io/rvcodecjs/#q=sw+a0,+0(sp)&abi=false&isa=AUTO) lets us quickly identify the registers and fields behind each instruction, which makes it easier to locate the instruction and follow the changes in the waveform.

* R-type **sub a3,a3,a2**

Assembly = **sub x13, x13, x12**
Binary = `0100 0000 1100 0110 1000 0110 1011 0011`
Hexadecimal = `0x40c686b3`

ID stage

![screenshot 2023-11-30 182748](https://hackmd.io/_uploads/S1HTQyUS6.png)

`io_reg_write_enable` indicates whether the instruction writes its result back to the register file, which is the case for `R-Type` instructions.

EX stage

![screenshot 2023-11-30 184252](https://hackmd.io/_uploads/rk-uw1IHa.png)

`alu_op` has successfully retrieved the value from the register and is ready to perform the operation with it.

Reg

![screenshot 2023-11-30 184629](https://hackmd.io/_uploads/rJzXOJUrT.png)

In this stage, the values in the registers change once the `clock` enters the next cycle.
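
As a cross-check of the R-type encoding above, the following plain-Scala sketch packs the fields of `sub x13, x13, x12` into a 32-bit word and then extracts the register indices again, which is roughly what the online encoder/decoder does. The object `RTypeEncoding` and the helpers `encodeRType` and `bits` are hypothetical names introduced only for this illustration; they are not part of the Lab 3 code base.

```scala
// Illustrative only: plain Scala, no Chisel required.
// RTypeEncoding, encodeRType and bits are made-up names for this sketch.
object RTypeEncoding {
  // R-type layout: funct7[31:25] | rs2[24:20] | rs1[19:15] | funct3[14:12] | rd[11:7] | opcode[6:0]
  def encodeRType(funct7: Int, rs2: Int, rs1: Int, funct3: Int, rd: Int, opcode: Int): Int =
    (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

  // Extract bits [hi:lo] of a 32-bit word (intended for fields narrower than 32 bits)
  def bits(word: Int, hi: Int, lo: Int): Int =
    (word >>> lo) & ((1 << (hi - lo + 1)) - 1)

  def main(args: Array[String]): Unit = {
    // sub x13, x13, x12: funct7 = 0100000, funct3 = 000, opcode = 0110011
    val sub = encodeRType(0x20, 12, 13, 0x0, 13, 0x33)
    println(f"0x$sub%08x") // prints 0x40c686b3
    println(s"rd=x${bits(sub, 11, 7)} rs1=x${bits(sub, 19, 15)} rs2=x${bits(sub, 24, 20)}")
  }
}
```

The same field positions are what the `ID` stage slices out of `instruction`, so comparing these values with the register addresses shown in the waveform is a quick sanity check.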
* I-type **lw a0,12(t0)**

Assembly = **lw x10, 12(x5)**
Binary = `0000 0000 1100 0010 1010 0101 0000 0011`
Hexadecimal = `0x00c2a503`

ID stage

![screenshot 2023-11-30 185308](https://hackmd.io/_uploads/BJSjK18BT.png)

`mem_read_enable` and `reg_write_enable` are both set, so that data is read from the memory address and then written into the destination register.

EX stage

![screenshot 2023-11-30 185728](https://hackmd.io/_uploads/Skso918Bp.png)

We can observe the address `00001308`, obtained from the register read.

Reg stage

![screenshot 2023-11-30 190315](https://hackmd.io/_uploads/BkqWnJIS6.png)

In this stage, the values in the registers change once the clock enters the next cycle.

* S-type **sw a0, 0(sp)**

Assembly = **sw x10, 0(x2)**
Binary = `0000 0000 1010 0001 0010 0000 0010 0011`
Hexadecimal = `0x00a12023`

ID stage

![screenshot 2023-11-30 190645](https://hackmd.io/_uploads/ryg1618rp.png)

`io_reg_write_enable` indicates whether the instruction writes to the register file. For an `S-Type` instruction it should stay low, since a store writes to memory rather than to a register.

EX stage

![screenshot 2023-11-30 191010](https://hackmd.io/_uploads/B1wjTyIBa.png)

We can observe that `alu_op` is determined by the changes in `alu_op_source`, which in turn affects the data in the register.

## 5. Conclusion

Through this practical assignment, I came to realize my own shortcomings and learned a new programming language. Going through Lab 3 step by step to understand the architecture of a `single-cycle CPU` has given me a deeper understanding of computer architecture and its design. Perhaps future assignments related to `GPU` design will allow us to delve even further into the principles behind computer components. I also look forward to continuing to learn under the guidance of our teacher and pushing myself to bridge the significant gap between myself and those who excel in the field.
## 6. Reference

* [Construct a single-cycle RISC-V CPU with Chisel](https://hackmd.io/CvrOEhLKSxOJKdblTjhEqQ?view)
* [Single-Cycle Processor](https://hackmd.io/@joanne8826/S1jWF0it8)
* [Building a RISC-V Processor](https://docs.google.com/presentation/d/1SbeyDTycsb97201QvzxGa4CmE9bmRDyd/edit#slide=id.p2)
* [Datapath Control](https://docs.google.com/presentation/d/1UvXegiqDEGa5IOWMnnybxK4jftMY7MOF/edit#slide=id.p3)
* [Chisel Breakdown 3](https://docs.google.com/presentation/d/1gMtABxBEDFbCFXN_-dPyvycNAyFROZKwk-HMcnxfTnU/edit#slide=id.p)