# Assignment3: Single-cycle RISC-V CPU
contributed by < [`jimmylu890303`](https://github.com/jimmylu890303) >

## Hello World in Chisel
```scala
import chisel3._

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg = RegInit(0.U(32.W))
  val blkReg = RegInit(0.U(1.W))

  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
```
- The module has no input ports. Its only output is `led`, an unsigned integer with a bit width of 1.
- `CNT_MAX` is a constant set to 24999999 (`50000000 / 2 - 1`).
- `cntReg` is a 32-bit unsigned integer register initialized to 0.
- `blkReg` is a 1-bit unsigned integer register initialized to 0.
- On each clock cycle, `cntReg` is incremented by one.
- When `cntReg` reaches `CNT_MAX`, `cntReg` is reset to zero and `blkReg` is inverted.
- The output `led` is driven by the value stored in `blkReg`.

Below is a version of the same module that replaces the `when` block with `Mux` expressions. Each register selects its next value depending on whether `cntReg` equals `CNT_MAX`.
```scala
import chisel3._

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg = RegInit(0.U(32.W))
  val blkReg = RegInit(0.U(1.W))

  // Wrap the counter at CNT_MAX, otherwise count up
  cntReg := Mux(cntReg === CNT_MAX, 0.U, cntReg + 1.U)
  // Toggle the LED register only when the counter wraps
  blkReg := Mux(cntReg === CNT_MAX, ~blkReg, blkReg)
  io.led := blkReg
}
```

## Lab 3 : Single Cycle RISC-V CPU
We need to add code to four Scala files to complete the modules:
- InstructionFetch.scala
- InstructionDecode.scala
- Execute.scala
- CPU.scala

### InstructionFetch.scala:
In the InstructionFetch.scala file, the IF module determines the next instruction address to be stored in the program counter based on the `jump_flag_id` signal.

![image](https://hackmd.io/_uploads/Hyh9GaMH6.png)

#### Test InstructionFetch
The test drives the InstructionFetch module for 101 iterations (`0 to 100`).
Each time, a random number (0 or 1) is generated.
- If the number is 0 (no jump), the output signal `instruction_address` is expected to be `pre + 4`.
- If the number is 1 (jump), the jump target is `entry`, so the output signal `instruction_address` is expected to be `entry`.

```scala
for (x <- 0 to 100) {
  Random.nextInt(2) match {
    case 0 => // no jump
      cur = pre + 4
      c.io.jump_flag_id.poke(false.B)
      c.clock.step()
      c.io.instruction_address.expect(cur)
      pre = pre + 4
    case 1 => // jump
      c.io.jump_flag_id.poke(true.B)
      c.io.jump_address_id.poke(entry)
      c.clock.step()
      c.io.instruction_address.expect(entry)
      pre = entry
  }
}
```
> src/test/scala/riscv/singlecycle/InstructionFetchTest.scala

#### Analysis with GTKWave
- `jump_flag_id` is set to 1

  ![Screenshot from 2023-11-23 20-03-06](https://hackmd.io/_uploads/Sku3x624a.png)
  ![Screenshot from 2023-11-23 20-12-19](https://hackmd.io/_uploads/HkARba3Np.png)

  When `jump_flag_id` is set to 1, the program counter (pc) is set to 0x1000 (`entry`) in the next cycle.
- `jump_flag_id` is set to 0

  ![Screenshot from 2023-11-23 20-27-12](https://hackmd.io/_uploads/HkNPS6hN6.png)
  ![Screenshot from 2023-11-23 20-27-25](https://hackmd.io/_uploads/BJNPHpn4a.png)

  When `jump_flag_id` is set to 0, the program counter (pc) is set to PC + 4 (0x1004 + 4 = 0x1008) in the next cycle.

### InstructionDecode.scala:
In the InstructionDecode.scala file, the ID module decodes the input signal `instruction` and generates the control signals for the rest of the circuit.

![image](https://hackmd.io/_uploads/B1DE7TMBa.png)

The complete InstructionDecode module derives the following 10 output signals from the 32-bit instruction:
- `regs_reg1_read_address`
- `regs_reg2_read_address`
- `ex_immediate`
- `ex_aluop1_source`
- `ex_aluop2_source`
- `memory_read_enable`
- `memory_write_enable`
- `wb_reg_write_source`
- `reg_write_enable`
- `reg_write_address`

#### Test InstructionDecode
```scala
c.io.instruction.poke(0x00a02223L.U) // S-type: sw x10, 4(x0)
c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
c.io.ex_aluop2_source.expect(ALUOp2Source.Immediate)
c.io.regs_reg1_read_address.expect(0.U)
c.io.regs_reg2_read_address.expect(10.U)
c.clock.step()

c.io.instruction.poke(0x000022b7L.U) // lui x5, 2
c.io.regs_reg1_read_address.expect(0.U)
c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
c.io.ex_aluop2_source.expect(ALUOp2Source.Immediate)
c.clock.step()

c.io.instruction.poke(0x002081b3L.U) // add x3, x1, x2
c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
c.io.ex_aluop2_source.expect(ALUOp2Source.Register)
c.clock.step()
```
> src/test/scala/riscv/singlecycle/InstructionDecoderTest.scala

This test checks three instructions: an S-type store, `lui`, and `add`.

#### Analysis with GTKWave
- Instruction `0x00a02223` (S-type)

  ![Screenshot from 2023-11-23 21-36-03](https://hackmd.io/_uploads/ryUPUA3Na.png)
- Instruction `0x000022b7` (lui)

  ![Screenshot from 2023-11-23 21-39-12](https://hackmd.io/_uploads/Hkwj8A3ET.png)
- Instruction `0x002081b3` (add)

  ![Screenshot from 2023-11-23 21-40-01](https://hackmd.io/_uploads/B1qlv0h4a.png)

### Execute.scala
The Execute.scala file contains two main modules. The `ALU control` generates the ALU function code from the opcode, funct3, and funct7 fields of the input instruction. The `ALU` performs the operation selected by that function code. The complete Execute module outputs the ALU result together with the signals `if_jump_flag` and `if_jump_address`.
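As a rough illustration of the behaviour described above, the following is a plain-Scala software model (not Chisel, and not the lab's actual `Execute` module). It covers only `add` and `beq`. The names `ExecuteModel`, `ExecuteResult`, and `execute` are hypothetical, and computing the jump target as `pc + imm` is an assumption that happens to match the values checked in the Execute test.

```scala
// Plain-Scala sketch of the Execute stage for two instructions only.
// All names here are hypothetical and exist for illustration.
object ExecuteModel {
  val OpcodeRType  = 0x33 // R-type arithmetic, e.g. add
  val OpcodeBranch = 0x63 // conditional branches, e.g. beq

  final case class ExecuteResult(aluResult: Long, jumpFlag: Boolean, jumpAddress: Long)

  // Keep values inside 32 bits, like the hardware datapath
  private def mask32(v: Long): Long = v & 0xffffffffL

  def execute(instruction: Long, pc: Long, imm: Long, reg1: Long, reg2: Long): ExecuteResult = {
    // "ALU control" part: pick the operation from opcode, funct3, funct7
    val opcode = (instruction & 0x7f).toInt
    val funct3 = ((instruction >> 12) & 0x7).toInt
    val funct7 = ((instruction >> 25) & 0x7f).toInt
    // Assumed jump target: instruction address plus immediate
    val target = mask32(pc + imm)

    opcode match {
      case OpcodeRType if funct3 == 0 && funct7 == 0 =>
        // add: ALU adds the two register operands, never jumps
        ExecuteResult(mask32(reg1 + reg2), jumpFlag = false, target)
      case OpcodeBranch if funct3 == 0 =>
        // beq: jump only when both register operands are equal
        ExecuteResult(target, jumpFlag = reg1 == reg2, target)
      case _ =>
        throw new IllegalArgumentException(f"unsupported instruction 0x$instruction%08x")
    }
  }
}
```

For example, `ExecuteModel.execute(0x00208163L, pc = 2, imm = 2, reg1 = 9, reg2 = 9)` models the taken `beq` case, where the jump flag is set and the target is 4.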
![image](https://hackmd.io/_uploads/SJXrLpGrp.png)

#### Test Execute
```scala
c.io.instruction.poke(0x001101b3L.U) // x3 = x2 + x1
var x = 0
for (x <- 0 to 100) {
  val op1 = scala.util.Random.nextInt(429496729)
  val op2 = scala.util.Random.nextInt(429496729)
  val result = op1 + op2
  val addr = scala.util.Random.nextInt(32)
  c.io.reg1_data.poke(op1.U)
  c.io.reg2_data.poke(op2.U)
  c.clock.step()
  c.io.mem_alu_result.expect(result.U)
  c.io.if_jump_flag.expect(0.U)
}

// beq test
c.io.instruction.poke(0x00208163L.U) // pc + 2 if x1 === x2
c.io.instruction_address.poke(2.U)
c.io.immediate.poke(2.U)
c.io.aluop1_source.poke(1.U)
c.io.aluop2_source.poke(1.U)
c.clock.step()

// equ
c.io.reg1_data.poke(9.U)
c.io.reg2_data.poke(9.U)
c.clock.step()
c.io.if_jump_flag.expect(1.U)
c.io.if_jump_address.expect(4.U)

// not equ
c.io.reg1_data.poke(9.U)
c.io.reg2_data.poke(19.U)
c.clock.step()
c.io.if_jump_flag.expect(0.U)
c.io.if_jump_address.expect(4.U)
```
> src/test/scala/riscv/singlecycle/ExecuteTest.scala

This test covers two instructions: `add` (`x3 = x2 + x1`) and `beq`.

#### Analysis with GTKWave
- x3 = x2 + x1

  ![Screenshot from 2023-11-23 22-12-05](https://hackmd.io/_uploads/ByNJCC2VT.png)
- beq (operands equal)

  ![Screenshot from 2023-11-23 22-15-31](https://hackmd.io/_uploads/HyJ-1ypN6.png)
- beq (operands not equal)

  ![Screenshot from 2023-11-23 22-16-38](https://hackmd.io/_uploads/BkUZk16Ea.png)

## Modify the handwritten RISC-V assembly code in Homework2
### Modify the original homework2 code
Because the Single Cycle CPU has no system call for printing, the assembly code cannot print its result directly while running. Instead of using the print system call, I adapted the code to store the output result in memory.

In Homework 2, the code had to convert the result from a number to ASCII and display it through the write system call in rv32emu:
```asm
jal ra, pimo
addi a0, a0, 48
la t0, buffer
sb zero, 1(t0)
sb a0, 0(t0)
li a0, 1
la a1, buffer
li a2, 2
li a7, SYSWRITE
ecall                 # print result of pimo (which is in a0)
```
In Homework 3, the result is stored to memory instead:
```asm
# sw result in mem
sw a0, 0(s3)
```
The following steps outline the modifications I made so that the code runs on the Single Cycle CPU.
- Put my code `main.S` into the `/csrc` directory.
- Save the output results at memory addresses `0x4, 0x8, 0xC, and 0x10`.
- Modify the `Makefile` to generate `main.asmbin`.
- After generating `main.asmbin`, move this file to the directory `src/main/resources`.
- Add a corresponding test named `Hw2Test` in the `CPUTest.scala` file.

### Test my RISC-V assembly
To test my RISC-V assembly code, I added a test named `Hw2Test` to CPUTest.scala. It verifies the results at memory addresses `0x4, 0x8, 0xC, and 0x10`.
```scala
class Hw2Test extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "Implementation of multiplication overflow prediction for unsigned integers using CLZ" in {
    test(new TestTopModule("main.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 10) {
        c.clock.step(1000)
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      // result should be 0 0 1 1
      c.io.mem_debug_read_address.poke(4.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0.U)

      c.io.mem_debug_read_address.poke(8.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0.U)

      c.io.mem_debug_read_address.poke(12.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(1.U)

      c.io.mem_debug_read_address.poke(16.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(1.U)
    }
  }
}
```
Run test:
```shell
sbt "testOnly riscv.singlecycle.Hw2Test"
```
Output:
```
[info] welcome to sbt 1.9.7 (OpenLogic Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ..
[info] loading project definition from /home/jimmy/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/jimmy/ca2023-lab3/)
[info] Hw2Test:
[info] Single Cycle CPU
[info] - should Implementation of multiplication overflow prediction for unsigned integers using CLZ
[info] Run completed in 29 seconds, 876 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 35 s, completed Nov 25, 2023, 2:34:59 PM
```

### Using Verilator to Run the Assembly
```shell
./run-verilator.sh -instruction src/main/resources/main.asmbin -time 2000 -vcd dump01.vcd
```
Output:
```
-time 2000
-memory 1048576
-instruction src/main/resources/main.asmbin
[-------------------->] 100%
```
Inspect the waveform with GTKWave.

Case 1:
![Screenshot from 2023-11-25 16-00-24](https://hackmd.io/_uploads/r1v-5mkSa.png)
> prev cycle

![Screenshot from 2023-11-25 16-00-35](https://hackmd.io/_uploads/Bkv-5XkHT.png)
> next cycle

- Instruction `0x024000EF` is `jal ra, pimo` (jal x1, 36).
- PC is now `0x00001050` and regs_write_source is `0b11`, so the value written back is ra = PC + 4.

  ![image](https://hackmd.io/_uploads/r1R9jm1r6.png)
- The jump target address is `0x1074` (computed by the ALU as 0x1050 + 36), and `if_jump_flag` is 1 (computed by the jump judge).
- So in the next cycle PC is set to `0x1074`.

Case 2:
![Screenshot from 2023-11-25 16-29-48](https://hackmd.io/_uploads/SkP2gVkr6.png)
- Instruction `0x00512023` is `sw t0, 0(sp)` (sw x5, 0(x2)).
- `io_memory_write_enable` is 1 because it is a store word instruction.
- ALU.op1 is the value of sp (the base address), and ALU.op2 is the offset (immediate).
- `im_mem_alu_result` is the target memory address for the write.
- `regs_io_read_datas` is the value stored in `t0`, the source register of the store.

Case 3:
![Screenshot from 2023-11-25 16-42-56](https://hackmd.io/_uploads/SkFh741rT.png)
- Instruction `0x00400993` is `li s3, 4` (addi x19, x0, 4).
- ALU.op1 is the value of x0, and ALU.op2 is the immediate value.
- `ALU.mem_alu_result` is 0x4 and `wb_reg_write_source` is 0b00, so `regs_write_data` takes its value from `ALU.mem_alu_result`.
- The target register is 0x13 (`$s3`) and `regs_io_write_enable` is 1, so `$s3` will be set to 4.
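
The write-back selection seen in Case 1 and Case 3 can be summarised with a small plain-Scala model (a sketch, not the Chisel write-back module). The encodings `0b00` (ALU result) and `0b11` (PC + 4) come from the waveform notes above. The encoding `0b01` for memory data and the names `WriteBackModel` and `select` are assumptions made for illustration.

```scala
// Plain-Scala sketch of the register write-back multiplexer.
// Names are hypothetical; the memory encoding (1) is an assumption.
object WriteBackModel {
  val FromAlu    = 0x0 // 0b00: write the ALU result (seen in Case 3)
  val FromMemory = 0x1 // 0b01: write loaded data (assumed encoding)
  val FromNextPc = 0x3 // 0b11: write PC + 4 (seen in Case 1, jal)

  def select(source: Int, aluResult: Long, memoryData: Long, pc: Long): Long =
    source match {
      case FromAlu    => aluResult
      case FromMemory => memoryData
      case FromNextPc => (pc + 4) & 0xffffffffL // link address, kept to 32 bits
      case other =>
        throw new IllegalArgumentException(s"unknown write-back source $other")
    }
}
```

With the values from Case 1, `WriteBackModel.select(0x3, 0x1074L, 0L, 0x1050L)` yields `0x1054`, the return address written to `ra`. With the values from Case 3, `WriteBackModel.select(0x0, 4L, 0L, 0L)` yields 4, the value written to `s3`.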