# Assignment3: single-cycle RISC-V CPU
contributed by < [`CSIE523`](https://github.com/CSIE523/ca2023-lab3) >

## Complete Lab3: Construct a single-cycle RISC-V CPU with Chisel
### InstructionFetch
InstructionFetchTest.scala
```scala
class InstructionFetchTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("InstructionFetch of Single Cycle CPU")
  it should "fetch instruction" in {
    test(new InstructionFetch).withAnnotations(TestAnnotations.annos) { c =>
      val entry = 0x1000
      var pre   = entry
      var cur   = pre
      c.io.instruction_valid.poke(true.B)
      var x = 0
      for (x <- 0 to 100) {
        Random.nextInt(2) match {
          case 0 => // no jump
            cur = pre + 4
            c.io.jump_flag_id.poke(false.B)
            c.clock.step()
            c.io.instruction_address.expect(cur)
            pre = pre + 4
          case 1 => // jump
            c.io.jump_flag_id.poke(true.B)
            c.io.jump_address_id.poke(entry)
            c.clock.step()
            c.io.instruction_address.expect(entry)
            pre = entry
        }
      }
    }
  }
}
```
According to `InstructionFetchTest`, the instruction address starts at 0x1000. The loop runs 101 times (`0 to 100` is inclusive), and each iteration randomly picks 0 or 1 to decide whether `jump_flag_id` is asserted, that is, whether to jump to a specified address. If it is not asserted, the program counter simply advances by 4.

For example:

if jump_flag_id == 0:
The previous instruction address is 0x1000. Because jump_flag_id is 0, the program counter adds 4 to get the next instruction address, so the current address is 0x1004.
![image](https://hackmd.io/_uploads/rk1DTRpVp.png)

if jump_flag_id == 1:
Although the previous instruction address is 0x1004, jump_flag_id is 1 in the next cycle, so the program counter is assigned the jump address directly. In the test this address is `entry`, whose value is 0x1000.
![image](https://hackmd.io/_uploads/SyxHp0T46.png)

### InstructionDecode
InstructionDecoderTest.scala
```scala
class InstructionDecoderTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("InstructionDecoder of Single Cycle CPU")
  it should "produce correct control signal" in {
    test(new InstructionDecode).withAnnotations(TestAnnotations.annos) { c =>
      c.io.instruction.poke(0x00a02223L.U) // S-type
      c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
      c.io.ex_aluop2_source.expect(ALUOp2Source.Immediate)
      c.io.regs_reg1_read_address.expect(0.U)
      c.io.regs_reg2_read_address.expect(10.U)
      c.clock.step()

      c.io.instruction.poke(0x000022b7L.U) // lui
      c.io.regs_reg1_read_address.expect(0.U)
      c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
      c.io.ex_aluop2_source.expect(ALUOp2Source.Immediate)
      c.clock.step()

      c.io.instruction.poke(0x002081b3L.U) // add
      c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
      c.io.ex_aluop2_source.expect(ALUOp2Source.Register)
      c.clock.step()
    }
  }
}
```
The Decode stage has three test cases: 0x00a02223, 0x000022b7 and 0x002081b3. I used the [Online RISC-V Decoder tools](https://luplab.gitlab.io/rvcodecjs/) to find the corresponding instructions.

First, the decoder uses specific bits of the instruction delivered by the instruction fetch stage to determine the instruction type.

#### `0x00a02223`: `sw x10, 4(x0)`
![image](https://hackmd.io/_uploads/B1uHmRbSa.png)

In the waveform the opcode is 0100011, so the instruction is S-type. The funct3 is 010, so it is sw.
![image](https://hackmd.io/_uploads/HJPZERbS6.png)

- regs_reg1_read_address: rs1 => 00000
- regs_reg2_read_address: rs2 => 01010
- aluop1_source: ALUOp1Source.Register
- aluop2_source: ALUOp2Source.Immediate
- ex_immediate: `Cat(Fill(21, io.instruction(31)), io.instruction(30, 25), io.instruction(11, 7))` => bit 31 replicated 21 times as the sign extension, followed by bits 30~25 and bits 11~7; the value is 0x00000004

An S-type instruction writes to memory, so memory_write_enable needs to be 1.
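The S-type immediate assembled by the decoder can also be checked outside the simulator. The following is a minimal C sketch and not part of the lab code: `s_type_immediate` is a hypothetical helper that models the same bit concatenation, and `0xfea02e23` (`sw x10, -4(x0)`) is an extra encoding assumed here only to show the sign extension with a negative offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of the decoder's S-type immediate:
 * Cat(Fill(21, inst(31)), inst(30, 25), inst(11, 7)). */
static uint32_t s_type_immediate(uint32_t inst) {
    uint32_t sign  = (inst >> 31) & 0x1u;          /* inst(31) */
    uint32_t upper = sign ? 0xfffff800u : 0u;      /* Fill(21, inst(31)) -> bits 31..11 */
    uint32_t mid   = ((inst >> 25) & 0x3fu) << 5;  /* inst(30, 25)       -> bits 10..5  */
    uint32_t low   = (inst >> 7) & 0x1fu;          /* inst(11, 7)        -> bits 4..0   */
    return upper | mid | low;
}

/* Checks the test-case encoding and one assumed negative-offset encoding. */
static void check_s_type_examples(void) {
    assert(s_type_immediate(0x00a02223u) == 0x00000004u); /* sw x10, 4(x0)  */
    assert(s_type_immediate(0xfea02e23u) == 0xfffffffcu); /* sw x10, -4(x0) */
}
```

Calling `check_s_type_examples()` from a small `main` is enough to run the check. The second case is the reason bit 31 has to be replicated: without the fill, the offset -4 would decode as 0x00000ffc.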
#### `0x000022b7`: `lui x5, 2`
![image](https://hackmd.io/_uploads/Hkg-_0ZHa.png)

In the waveform the opcode is 0110111, so the instruction is lui.
![image](https://hackmd.io/_uploads/Syu1O0WBT.png)

- regs_reg1_read_address: 0.U(Parameters.PhysicalRegisterAddrWidth)
- aluop1_source: ALUOp1Source.Register
- aluop2_source: ALUOp2Source.Immediate
- reg_write_address: rd => 00101
- ex_immediate: `Cat(io.instruction(31, 12), 0.U(12.W))` => bits 31~12 followed by 12 zeros; the value is 0x00002000

lui writes to a register, so reg_write_enable needs to be 1.

#### `0x002081b3`: `add x3, x1, x2`
![image](https://hackmd.io/_uploads/Sk79g1GS6.png)

In the waveform the opcode is 0110011, so the instruction is R-type. The funct3 is 000 and the funct7 is 0000000, so it is add.
![image](https://hackmd.io/_uploads/ryvGr1zra.png)

- regs_reg1_read_address: rs1 => 00001
- regs_reg2_read_address: rs2 => 00010
- aluop1_source: ALUOp1Source.Register
- aluop2_source: ALUOp2Source.Register
- reg_write_address: rd => 00011

add writes to a register, so reg_write_enable needs to be 1.
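The field values read from the waveforms for `lui` and `add` can be reproduced with a few shifts and masks. This is a minimal C sketch under the RV32I base instruction formats; the helper names (`opcode_of`, `rd_of`, and so on) are hypothetical and do not come from the lab sources.

```c
#include <assert.h>
#include <stdint.h>

/* Field extractors following the RV32I base instruction formats. */
static uint32_t opcode_of(uint32_t inst) { return inst & 0x7fu; }
static uint32_t rd_of(uint32_t inst)     { return (inst >> 7) & 0x1fu; }
static uint32_t funct3_of(uint32_t inst) { return (inst >> 12) & 0x7u; }
static uint32_t rs1_of(uint32_t inst)    { return (inst >> 15) & 0x1fu; }
static uint32_t rs2_of(uint32_t inst)    { return (inst >> 20) & 0x1fu; }
static uint32_t funct7_of(uint32_t inst) { return (inst >> 25) & 0x7fu; }

/* Models Cat(inst(31, 12), 0.U(12.W)): keep bits 31..12, clear bits 11..0. */
static uint32_t u_type_immediate(uint32_t inst) { return inst & 0xfffff000u; }

/* Checks the two encodings used by InstructionDecoderTest. */
static void check_lui_and_add(void) {
    const uint32_t lui = 0x000022b7u; /* lui x5, 2 */
    assert(opcode_of(lui) == 0x37u);  /* 0110111 */
    assert(rd_of(lui) == 5u);
    assert(u_type_immediate(lui) == 0x00002000u);

    const uint32_t add = 0x002081b3u; /* add x3, x1, x2 */
    assert(opcode_of(add) == 0x33u);  /* 0110011 */
    assert(funct3_of(add) == 0u && funct7_of(add) == 0u);
    assert(rs1_of(add) == 1u && rs2_of(add) == 2u && rd_of(add) == 3u);
}
```

Calling `check_lui_and_add()` from a small `main` runs the checks; every assertion corresponds to one value listed in the two subsections above.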
### Execute
ExecuteTest.scala
```scala
class ExecuteTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Execution of Single Cycle CPU")
  it should "execute correctly" in {
    test(new Execute).withAnnotations(TestAnnotations.annos) { c =>
      c.io.instruction.poke(0x001101b3L.U) // x3 = x2 + x1

      var x = 0
      for (x <- 0 to 100) {
        val op1    = scala.util.Random.nextInt(429496729)
        val op2    = scala.util.Random.nextInt(429496729)
        val result = op1 + op2
        val addr   = scala.util.Random.nextInt(32)

        c.io.reg1_data.poke(op1.U)
        c.io.reg2_data.poke(op2.U)

        c.clock.step()
        c.io.mem_alu_result.expect(result.U)
        c.io.if_jump_flag.expect(0.U)
      }

      // beq test
      c.io.instruction.poke(0x00208163L.U) // pc + 2 if x1 === x2
      c.io.instruction_address.poke(2.U)
      c.io.immediate.poke(2.U)
      c.io.aluop1_source.poke(1.U)
      c.io.aluop2_source.poke(1.U)
      c.clock.step()

      // equ
      c.io.reg1_data.poke(9.U)
      c.io.reg2_data.poke(9.U)
      c.clock.step()
      c.io.if_jump_flag.expect(1.U)
      c.io.if_jump_address.expect(4.U)

      // not equ
      c.io.reg1_data.poke(9.U)
      c.io.reg2_data.poke(19.U)
      c.clock.step()
      c.io.if_jump_flag.expect(0.U)
      c.io.if_jump_address.expect(4.U)
    }
  }
}
```
The Execute stage has test cases for two instruction types, R-type and B-type.
![image](https://hackmd.io/_uploads/H1BJyfmr6.png)

The R-type test case is `0x001101b3`, and the corresponding assembly is `add x3, x2, x1`. The testbench performs 101 random additions (`0 to 100` is inclusive), and because there is no branch, if_jump_flag is 0.
![image](https://hackmd.io/_uploads/Hy0hQMXr6.png)![image](https://hackmd.io/_uploads/BkRRQGXr6.png)

The B-type test case is `0x00208163`, and the corresponding assembly is `beq x1, x2, 2`. If x1 and x2 are equal, the branch target is the instruction address plus 2, which is 2 + 2 = 4 in this test. The test pokes 1 into both aluop1_source and aluop2_source. The add test above leaves both at 0 and adds reg1_data and reg2_data, so 1 selects the other inputs: alu.op1 and alu.op2 take instruction_address and immediate. The comparison between x1 and x2 is made on reg1_data and reg2_data directly.

In the first case, the values in registers x1 and x2 are the same.
In that case, if_jump_address is `4` and if_jump_flag is 1, so the program counter jumps to `4`. On the other hand, if x1 and x2 are not equal, if_jump_address still reads `4` but if_jump_flag is 0, so the jump is not taken.

After filling in the blanks, I entered `sbt test` in the command line to verify my answer
```
$ sbt test
```
and got the result.
```
$ sbt test
[info] welcome to sbt 1.9.7 (Temurin Java 1.8.0_392)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /home/dejuva9487/Desktop/2023_Computer_Architecture_FALL/hw3/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/dejuva9487/Desktop/2023_Computer_Architecture_FALL/hw3/ca2023-lab3/)
[info] compiling 1 Scala source to /home/dejuva9487/Desktop/2023_Computer_Architecture_FALL/hw3/ca2023-lab3/target/scala-2.13/test-classes ...
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] Run completed in 12 seconds, 744 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

## Modify the handwritten RISC-V assembly code in Homework2
To run Homework2 on the `single-cycle RISC-V CPU with Chisel`, I modified the code and the makefile so that `binarization.c` is translated into `binarization.asmbin`. The program stores each binarized value at a fixed memory address (2, 4, 6, 8 and 10) so that `CPUTest` can read them back and verify the answer.
```c
#include <stdint.h>
#include <stdio.h>

uint16_t count_leading_zeros_16(uint16_t x)
{
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);

    x -= ((x >> 1) & 0x5555);
    x = ((x >> 2) & 0x3333) + (x & 0x3333);
    x = ((x >> 4) + x) & 0x0f0f;
    x += (x >> 8);

    return (16 - (x & 0x1f)); // change 0x3f to 0x1f
}

static int binarization(uint16_t *arr, uint16_t threshold, int i){
    uint16_t sub = threshold - *(arr+i);
    uint16_t clz = count_leading_zeros_16(sub);
    return (clz) ? 0 : 255;
}

int main(){
    // pixel test
    // 8-bit color depth for black and white photo
    uint16_t picture[5] = {0,80,127,150,231};
    uint16_t threshold = 127;
    uint16_t *pixel = picture;

    *((volatile uint16_t *) (2)) = binarization(pixel, threshold, 0);
    *((volatile uint16_t *) (4)) = binarization(pixel, threshold, 1);
    *((volatile uint16_t *) (6)) = binarization(pixel, threshold, 2);
    *((volatile uint16_t *) (8)) = binarization(pixel, threshold, 3);
    *((volatile uint16_t *) (10)) = binarization(pixel, threshold, 4);

    return 0;
}
```
The test class `hw2Test` below is added to `CPUTest.scala`. Homework2 binarizes the 5 values 0, 80, 127, 150, 231 with threshold 127, so the expected answer is 0, 0, 0, 255, 255.
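The expected values can be cross-checked on a host machine before running the simulation. The sketch below is not the code that runs on the CPU: `binarize_by_sign_bit` is a hypothetical helper based on the observation that `count_leading_zeros_16(sub)` returns 0 exactly when bit 15 of `sub` is set, which for 8-bit pixel values happens only when the pixel is greater than the threshold.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side equivalent of binarization():
 * a leading-zero count of 0 means bit 15 of (threshold - pixel) is set. */
static int binarize_by_sign_bit(uint16_t pixel, uint16_t threshold) {
    /* The subtraction wraps modulo 2^16 when pixel > threshold. */
    uint16_t sub = (uint16_t)(threshold - pixel);
    return (sub & 0x8000u) ? 255 : 0;
}

/* Checks the five pixels from binarization.c against the values expected by the test. */
static void check_expected_pixels(void) {
    const uint16_t picture[5] = {0, 80, 127, 150, 231};
    const int expected[5]     = {0, 0, 0, 255, 255};
    for (int i = 0; i < 5; i++) {
        assert(binarize_by_sign_bit(picture[i], 127) == expected[i]);
    }
}
```

Calling `check_expected_pixels()` from a small `main` confirms the five values. Note that a pixel equal to the threshold (127) gives a difference of 0 and is therefore mapped to 0, not 255.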
```scala
class hw2Test extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "binarize the pixel" in {
    test(new TestTopModule("binarization.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 500) {
        c.clock.step(100)
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      c.io.mem_debug_read_address.poke(2.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0x0.U)

      c.io.mem_debug_read_address.poke(4.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0x0.U)

      c.io.mem_debug_read_address.poke(6.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0x0.U)

      c.io.mem_debug_read_address.poke(8.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0xff.U)

      c.io.mem_debug_read_address.poke(10.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0xff.U)
    }
  }
}
```
This is the successful test result of hw2Test captured from the `sbt test` command line.
```
$ sbt test
[info] welcome to sbt 1.9.7 (Temurin Java 1.8.0_392)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /home/dejuva9487/Desktop/2023_Computer_Architecture_FALL/hw3/ca2023-lab3/projec
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/dejuva9487/Desktop/2023_Computer_Architecture_FALL/hw3/ca2023-lab3/)
...
[info] hw2Test:
[info] Single Cycle CPU
[info] - should binarize the pixel
...
[info] Run completed in 12 seconds, 349 milliseconds.
[info] Total number of tests run: 10
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 10, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
Here is the waveform.
![image](https://hackmd.io/_uploads/B1pHHYEH6.png)