# Assignment3: single-cycle RISC-V CPU
contributed by < [`eeeXun`](https://github.com/eeeXun) >

## Installation
On Arch Linux

```
sudo pacman -S sbt jdk17-openjdk verilator gtkwave
```

The installed versions of the packages

```
pacman -Q | grep "sbt\|jdk\|verilator\|gtkwave"
```
```
gtkwave 3.3.117-1
jdk17-openjdk 17.0.9.u8-2
jre17-openjdk 17.0.9.u8-2
jre17-openjdk-headless 17.0.9.u8-2
sbt 1:1.8.3-1
verilator 5.018-1
```

## Pass all tests
At the beginning, I had no idea where to start, so I looked into the tests that failed.

In `InstructionFetchTest`, I saw that when `jump_flag_id` is turned on, `instruction_address` should be `entry`, and the value of `entry` is passed into `instruction_address_id`. If `jump_flag_id` is turned off, `instruction_address` should be `cur`, where `cur` is `prev` + 4. With this, I could make `InstructionFetchTest` pass.

In `InstructionDecoderTest`, however, the test only examines four values: `ex_aluop1_source`, `ex_aluop2_source`, `regs_reg1_read_address` and `regs_reg2_read_address`. These four signals were already assigned in `InstructionDecode`, and I believed they were all assigned correctly. The error message shows

```
[info] - should produce correct control signal *** FAILED ***
[info] : io.memory_write_enable <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/InstructionDecode.scala 127:14] : [module InstructionDecode] Reference io is not fully initialized.
```

`memory_write_enable` is not initialized, but at that point I still had no idea why the test would fail. In the [bootcamp](https://github.com/freechipsproject/chisel-bootcamp), I had only seen failures occur when a value did not match the one given to the `expect` function. In hindsight, this is an elaboration error rather than a failed `expect`: FIRRTL refuses to build a module whose output is never driven, so the test aborts before any `expect` runs.

So I looked at [Lab3](https://hackmd.io/@sysprog/r1mlr3I7p) again and found that I had missed this [diagram](https://hackmd.io/@sysprog/r1mlr3I7p#Single-cycle-CPU-architecture-diagram).
This diagram shows all the signals required for this homework. After I followed this diagram, all the failing tests were quickly resolved.

## Assignment 2
In the assembly code of [assignment 2](https://github.com/eeeXun/computer_architecture/blob/master/hw2/hw2.s), I removed all the instructions related to `ecall`. In [exit](https://github.com/eeeXun/computer_architecture/blob/master/hw2/hw2.s#L270), I changed it to an infinite jump

```assembly!
exit:
    j exit
```

There is only one test case, and its result is stored in register `s3`. So my test only checks whether the value of register `s3` is correct.

```scala!
class HW2Test extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "multiply two bfloat16" in {
    test(new TestTopModule("hw2.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      // Run 50 x 1000 cycles so the program reaches the `exit` loop
      for (i <- 1 to 50) {
        c.clock.step(1000)
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      c.io.regs_debug_read_address.poke(19.U) // x19 is s3
      c.io.regs_debug_read_data.expect(0x440a0000.U)
    }
  }
}
```

## Waveform During Testing
### Startup
I loaded the `vcd` file of assignment 2 and first compared `clock` with `instruction_address` and `instruction`. I observed that it takes a long time before the first instruction starts.

![image](https://hackmd.io/_uploads/HJROqGAVT.png)

Then I compared it with the memory loader, `rom_loader`.

![image](https://hackmd.io/_uploads/r1aXhGRVT.png)

The time at which the first instruction starts is almost identical to the time at which `rom_loader` stops changing. Then I looked at the objdump output of assignment 2 and compared it with what I observed in GTKWave.

:::spoiler objdump
```assembly!
hw2.o: file format elf32-littleriscv

Disassembly of section .text:

00000000 <_start>:
   0:  00100893  li a7,1
   4:  00000617  auipc a2,0x0
   8:  00060613  mv a2,a2
   c:  00062803  lw a6,0(a2) # 4 <_start+0x4>
  10:  028000ef  jal 38 <f32_b16_p1>
  14:  000807b3  add a5,a6,zero
  18:  00462803  lw a6,4(a2)
  1c:  01c000ef  jal 38 <f32_b16_p1>
  20:  00080733  add a4,a6,zero
  24:  0a8000ef  jal cc <encoder>
  28:  00098cb3  add s9,s3,zero
  2c:  0b8000ef  jal e4 <decoder>
  30:  0d0000ef  jal 100 <Multi_bfloat>
  34:  25c0006f  j 290 <exit>

00000038 <f32_b16_p1>:
  38:  01012023  sw a6,0(sp)
  3c:  000802b3  add t0,a6,zero
  40:  7f800fb7  lui t6,0x7f800
  44:  01f2f333  and t1,t0,t6
  48:  00800fb7  lui t6,0x800
  4c:  ffff8f93  add t6,t6,-1 # 7fffff <str+0x7ffd53>
  50:  01f2f3b3  and t2,t0,t6
  54:  7f800fb7  lui t6,0x7f800
  58:  07f30463  beq t1,t6,c0 <inf_or_zero>
  5c:  00736e33  or t3,t1,t2
  60:  060e0063  beqz t3,c0 <inf_or_zero>
  64:  00800fb7  lui t6,0x800
  68:  01f3e3b3  or t2,t2,t6
  6c:  00008fb7  lui t6,0x8
  70:  01f383b3  add t2,t2,t6
  74:  0183df13  srl t5,t2,0x18
  78:  020f0063  beqz t5,98 <no_overflow>
  7c:  00800fb7  lui t6,0x800
  80:  01f30333  add t1,t1,t6
  84:  0113d393  srl t2,t2,0x11
  88:  07f00f93  li t6,127
  8c:  01f3f3b3  and t2,t2,t6
  90:  01039393  sll t2,t2,0x10
  94:  0140006f  j a8 <f32_b16_p2>

00000098 <no_overflow>:
  98:  0103d393  srl t2,t2,0x10
  9c:  07f00f93  li t6,127
  a0:  01f3f3b3  and t2,t2,t6
  a4:  01039393  sll t2,t2,0x10

000000a8 <f32_b16_p2>:
  a8:  01f2d293  srl t0,t0,0x1f
  ac:  01f29293  sll t0,t0,0x1f
  b0:  0062e2b3  or t0,t0,t1
  b4:  0072e2b3  or t0,t0,t2
  b8:  00028833  add a6,t0,zero
  bc:  00008067  ret

000000c0 <inf_or_zero>:
  c0:  01085813  srl a6,a6,0x10
  c4:  01081813  sll a6,a6,0x10
  c8:  00008067  ret

000000cc <encoder>:
  cc:  000782b3  add t0,a5,zero
  d0:  00070333  add t1,a4,zero
  d4:  01035313  srl t1,t1,0x10
  d8:  0062e2b3  or t0,t0,t1
  dc:  000289b3  add s3,t0,zero
  e0:  00008067  ret

000000e4 <decoder>:
  e4:  000c82b3  add t0,s9,zero
  e8:  ffff0937  lui s2,0xffff0
  ec:  0122f333  and t1,t0,s2
  f0:  01029393  sll t2,t0,0x10
  f4:  00030b33  add s6,t1,zero
  f8:  00038ab3  add s5,t2,zero
  fc:  00008067  ret

00000100 <Multi_bfloat>:
 100:  000a82b3  add t0,s5,zero
 104:  000b0333  add t1,s6,zero
 108:  7f800fb7  lui t6,0x7f800
 10c:  01f2fe33  and t3,t0,t6
 110:  01f373b3  and t2,t1,t6
 114:  007e0e33  add t3,t3,t2
 118:  3f800fb7  lui t6,0x3f800
 11c:  41fe0e33  sub t3,t3,t6
 120:  0062c3b3  xor t2,t0,t1
 124:  01f3d393  srl t2,t2,0x1f
 128:  01f39393  sll t2,t2,0x1f
 12c:  007e6e33  or t3,t3,t2
 130:  00929293  sll t0,t0,0x9
 134:  0092d293  srl t0,t0,0x9
 138:  005e62b3  or t0,t3,t0
 13c:  007f0fb7  lui t6,0x7f0
 140:  01f2f3b3  and t2,t0,t6
 144:  01f37e33  and t3,t1,t6
 148:  00839393  sll t2,t2,0x8
 14c:  80000fb7  lui t6,0x80000
 150:  01f3e3b3  or t2,t2,t6
 154:  0013d393  srl t2,t2,0x1
 158:  008e1e13  sll t3,t3,0x8
 15c:  01fe6e33  or t3,t3,t6
 160:  001e5e13  srl t3,t3,0x1
 164:  00000333  add t1,zero,zero
 168:  80000fb7  lui t6,0x80000
 16c:  001fdf93  srl t6,t6,0x1
 170:  01f3feb3  and t4,t2,t6
 174:  01d03433  snez s0,t4
 178:  40800433  neg s0,s0
 17c:  01c474b3  and s1,s0,t3
 180:  00930333  add t1,t1,s1
 184:  001e5e13  srl t3,t3,0x1
 188:  001fdf93  srl t6,t6,0x1
 18c:  01f3feb3  and t4,t2,t6
 190:  01d03433  snez s0,t4
 194:  40800433  neg s0,s0
 198:  01c474b3  and s1,s0,t3
 19c:  00930333  add t1,t1,s1
 1a0:  001e5e13  srl t3,t3,0x1
 1a4:  001fdf93  srl t6,t6,0x1
 1a8:  01f3feb3  and t4,t2,t6
 1ac:  01d03433  snez s0,t4
 1b0:  40800433  neg s0,s0
 1b4:  01c474b3  and s1,s0,t3
 1b8:  00930333  add t1,t1,s1
 1bc:  001e5e13  srl t3,t3,0x1
 1c0:  001fdf93  srl t6,t6,0x1
 1c4:  01f3feb3  and t4,t2,t6
 1c8:  01d03433  snez s0,t4
 1cc:  40800433  neg s0,s0
 1d0:  01c474b3  and s1,s0,t3
 1d4:  00930333  add t1,t1,s1
 1d8:  001e5e13  srl t3,t3,0x1
 1dc:  001fdf93  srl t6,t6,0x1
 1e0:  01f3feb3  and t4,t2,t6
 1e4:  01d03433  snez s0,t4
 1e8:  40800433  neg s0,s0
 1ec:  01c474b3  and s1,s0,t3
 1f0:  00930333  add t1,t1,s1
 1f4:  001e5e13  srl t3,t3,0x1
 1f8:  001fdf93  srl t6,t6,0x1
 1fc:  01f3feb3  and t4,t2,t6
 200:  01d03433  snez s0,t4
 204:  40800433  neg s0,s0
 208:  01c474b3  and s1,s0,t3
 20c:  00930333  add t1,t1,s1
 210:  001e5e13  srl t3,t3,0x1
 214:  001fdf93  srl t6,t6,0x1
 218:  01f3feb3  and t4,t2,t6
 21c:  01d03433  snez s0,t4
 220:  40800433  neg s0,s0
 224:  01c474b3  and s1,s0,t3
 228:  00930333  add t1,t1,s1
 22c:  001e5e13  srl t3,t3,0x1
 230:  001fdf93  srl t6,t6,0x1
 234:  01f3feb3  and t4,t2,t6
 238:  01d03433  snez s0,t4
 23c:  40800433  neg s0,s0
 240:  01c474b3  and s1,s0,t3
 244:  00930333  add t1,t1,s1
 248:  001e5e13  srl t3,t3,0x1
 24c:  80000fb7  lui t6,0x80000
 250:  01f37eb3  and t4,t1,t6
 254:  000e8a63  beqz t4,268 <not_overflow>
 258:  00131313  sll t1,t1,0x1
 25c:  00800fb7  lui t6,0x800
 260:  01f282b3  add t0,t0,t6
 264:  0080006f  j 26c <Mult_end>

00000268 <not_overflow>:
 268:  00231313  sll t1,t1,0x2

0000026c <Mult_end>:
 26c:  01835313  srl t1,t1,0x18
 270:  00130313  add t1,t1,1
 274:  00135313  srl t1,t1,0x1
 278:  01031313  sll t1,t1,0x10
 27c:  0172d293  srl t0,t0,0x17
 280:  01729293  sll t0,t0,0x17
 284:  0062e2b3  or t0,t0,t1
 288:  000289b3  add s3,t0,zero
 28c:  00008067  ret

00000290 <exit>:
 290:  0000006f  j 290 <exit>

00000294 <test0>:
 294:  4141f9a7  .word 0x4141f9a7
 298:  423645a2  .word 0x423645a2

0000029c <test1>:
 29c:  3fa66666  .word 0x3fa66666
 2a0:  42c63333  .word 0x42c63333

000002a4 <test2>:
 2a4:  43e43a5e  .word 0x43e43a5e
 2a8:  42b1999a  .word 0x42b1999a

000002ac <str>:
 2ac:  0000000a  .word 0x0000000a
```
:::

![image](https://hackmd.io/_uploads/HkIl1QC4T.png)

So this period should be the time spent loading the program into memory!

### InstructionFetch
After `inst_fetch.io.instruction_read_data` is loaded, it takes 3 CPU clock cycles for `inst_fetch.io.instruction_address`, which is the `PC`, to change. After `inst_fetch.io.instruction_address` changes, it takes 1 CPU clock cycle to load `inst_fetch.io.instruction_read_data`.

![image](https://hackmd.io/_uploads/ryMCgoJSa.png)
![image](https://hackmd.io/_uploads/HJimDsJBa.png)

These 3 CPU clock cycles plus 1 CPU clock cycle correspond to one instruction fetch cycle.

![image](https://hackmd.io/_uploads/rkwxh7xST.png)

### InstructionDecode
`InstructionDecode` produces its output immediately when the input `instruction` changes, and it holds that state for 4 CPU cycles.

![image](https://hackmd.io/_uploads/H1SV7Per6.png)

Unlike `InstructionFetch`, `InstructionDecode` has no clock signal in the waveform. I guess this is because there is no register inside `InstructionDecode`. For comparison, the register inside `InstructionFetch` is

```scala!
val pc = RegInit(ProgramCounter.EntryAddress)
```

### Execute
Once `Execute` receives all of its input signals (`instruction`, `instruction_address`, `reg1_data`, `reg2_data`, `immediate`, `aluop1_source` and `aluop2_source`), it generates the outputs `mem_alu_result`, `if_jump_flag` and `if_jump_address` immediately and holds them for 4 CPU cycles.

In some cases, however, something odd happens: the state is not held for 4 CPU cycles. Take the following `auipc` instruction as an example

```assembly!
00000000 <_start>:
   0:  00100893  li a7,1
   4:  00000617  auipc a2,0x0
```

`mem_alu_result` changes at the 3rd CPU cycle. I found that this happens because `ex.io.instruction` (which comes from `inst_fetch.io.instruction`) and `ex.io.instruction_address` (which comes from `inst_fetch.io.instruction_address`) do not change at the same time.

![image](https://hackmd.io/_uploads/Byg-4axH6.png)

### MemoryAccess
In the following `lw` example, the data is loaded from memory 1 CPU cycle after the instruction is fetched, and it is then held for 3 CPU cycles.

```assembly!
00000000 <_start>:
   0:  00100893  li a7,1
   4:  00000617  auipc a2,0x0
   8:  00060613  mv a2,a2
   c:  00062803  lw a6,0(a2) # 4 <_start+0x4>
```

![image](https://hackmd.io/_uploads/rJhG8RxST.png)

### WriteBack
In the following `auipc` example, `wb_io_regs_write_data` comes from `ex_io_mem_alu_result`

```assembly!
00000000 <_start>:
   0:  00100893  li a7,1
   4:  00000617  auipc a2,0x0
```

![image](https://hackmd.io/_uploads/SkXF8T-ST.png)

In the following `lw` example, the instruction loads the word `0x4141f9a7`. `wb_io_regs_write_data` arrives 1 CPU cycle after the instruction is loaded, because the data comes from `mem_io_wb_memory_read_data`

```assembly!
00000000 <_start>:
   0:  00100893  li a7,1
   4:  00000617  auipc a2,0x0
   8:  00060613  mv a2,a2
   c:  00062803  lw a6,0(a2) # 4 <_start+0x4>
```

![image](https://hackmd.io/_uploads/Sk5GqpbST.png)

## Waveform on Verilator
The `vcd` generated by Verilator differs from the `vcd` file generated during testing. There is no boot-up time: from the first CPU cycle, the CPU continuously fetches, decodes and executes instructions. The CPU cycle is also 4 ps, compared with 2 ps in the `vcd` generated during testing.

![image](https://hackmd.io/_uploads/rJNuC6Wra.png)

The output `inst_fetch_io_instruction_read_data` appears a quarter of a CPU cycle (1 ps) after `inst_fetch_io_instruction_address` changes. During testing, the same delay is 1 CPU cycle (2 ps).

The interval between one instruction fetch and the next is 1 CPU cycle (4 ps). During testing, the same interval is 4 CPU cycles (8 ps).

## ecall
I implemented `ecall` in this [branch](https://github.com/eeeXun/ca2023-lab3/tree/ecall) and tested it with the [assembly program that prints `RISC-V\n`](https://github.com/sysprog21/rv32emu/blob/master/docs/syscall.md#risc-v-calling-conventions).

First, I added five signals to `CPUBundle.scala`: `ecall_flag`, which is turned on when an `ecall` instruction is decoded, and `ecall_a0`, `ecall_a1`, `ecall_a2` and `ecall_a7`. `ecall_a1`, `ecall_a2` and `ecall_a7` carry the data of the corresponding registers.

In the `Simulator.run` function of `verilog/verilator/sim_main.cpp`, I check whether `top->io_ecall_flag` is true. If it is, I read the code in `top->io_ecall_a7` and compare it with the [system call number](https://github.com/sysprog21/rv32emu/blob/master/docs/syscall.md#newlib-integration). If the code is that of `write`, I get the data from the `memory->read` function, using `top->io_ecall_a1` as the starting address and `top->io_ecall_a2` as the length.
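
The check described above can be sketched as a small standalone function. This is only an illustration under stated assumptions, not the code in my branch: `EcallPorts`, `SimMemory`, `read_byte` and `handle_ecall` are hypothetical names, the real `memory->read` may have a different signature, and only the `io_ecall_*` signal names and the `write` number 64 from the linked syscall table are taken from the text.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the ecall-related outputs of the Verilated top module.
struct EcallPorts {
  bool io_ecall_flag = false;
  uint32_t io_ecall_a0 = 0;  // file descriptor
  uint32_t io_ecall_a1 = 0;  // start address of the buffer
  uint32_t io_ecall_a2 = 0;  // length in bytes
  uint32_t io_ecall_a7 = 0;  // system call number
};

// Hypothetical byte-addressable memory; the simulator's real memory class may differ.
struct SimMemory {
  std::vector<uint8_t> bytes;
  uint8_t read_byte(uint32_t addr) const {
    return addr < bytes.size() ? bytes[addr] : 0;  // out-of-range reads return 0
  }
};

constexpr uint32_t kSysWrite = 64;  // `write` in the newlib syscall table

// Returns the text a `write` ecall would print, or an empty string
// when no ecall is pending or the system call is not `write`.
std::string handle_ecall(const EcallPorts& top, const SimMemory& mem) {
  if (!top.io_ecall_flag || top.io_ecall_a7 != kSysWrite) {
    return "";
  }
  std::string out;
  for (uint32_t i = 0; i < top.io_ecall_a2; ++i) {
    out.push_back(static_cast<char>(mem.read_byte(top.io_ecall_a1 + i)));
  }
  return out;
}
```

Inside the simulation loop, the returned string would then be written to standard output. The sketch ignores `io_ecall_a0`, so it treats every file descriptor as stdout.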

When I ran `make verilator`, it failed with these errors

```
[error] firrtl.passes.PassExceptions:
[error] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/board/verilator/Top.scala 15:14] : [module Top] Reference io is not fully initialized.
[error] : io.ecall_a7 <= VOID
[error] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/board/verilator/Top.scala 15:14] : [module Top] Reference io is not fully initialized.
[error] : io.ecall_flag <= VOID
[error] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/board/verilator/Top.scala 15:14] : [module Top] Reference io is not fully initialized.
[error] : io.ecall_a0 <= VOID
[error] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/board/verilator/Top.scala 15:14] : [module Top] Reference io is not fully initialized.
[error] : io.ecall_a2 <= VOID
[error] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/board/verilator/Top.scala 15:14] : [module Top] Reference io is not fully initialized.
[error] : io.ecall_a1 <= VOID
[error] firrtl.passes.PassException: 5 errors detected!
```

So I added the code below to `src/main/scala/board/verilator/Top.scala`

```diff!
--- a/src/main/scala/board/verilator/Top.scala
+++ b/src/main/scala/board/verilator/Top.scala
@@ -25,6 +25,12 @@ class Top extends Module {

   cpu.io.instruction := io.instruction
   cpu.io.instruction_valid := io.instruction_valid
+
+  io.ecall_flag := cpu.io.ecall_flag
+  io.ecall_a0 := cpu.io.ecall_a0
+  io.ecall_a1 := cpu.io.ecall_a1
+  io.ecall_a2 := cpu.io.ecall_a2
+  io.ecall_a7 := cpu.io.ecall_a7
 }

 object VerilogGenerator extends App {
```

Then it works! I am still not sure about the exact relationship between `Top.scala` and `CPU.scala`. Judging from the error messages and the diff, `Top` wraps the `cpu` instance and exposes the same bundle as its own `io`, so every output added to the bundle also has to be forwarded from `cpu.io` to `io`.

When I ran the Verilator simulation, I found a bug in my code: the string `RISC-V\n` is printed 3 times. So I inspected the waveform dumped from Verilator.
The `ecall` behaves just like any other instruction: it is held for 1 CPU cycle.

![image](https://hackmd.io/_uploads/rJVLdiHBT.png)

So I suspect the bug is caused by the while loop in the `Simulator.run` function, which does not iterate exactly once per CPU cycle, so the same `ecall` may be seen in more than one iteration.
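
If that suspicion is right, one possible fix is to handle the `ecall` only in the iteration where the flag goes from low to high. The following is a minimal sketch of that idea, not a verified fix: `EcallEdgeDetector` and `count_handled` are hypothetical names, and I am assuming that the loop samples `io_ecall_flag` several times while it stays high.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: reports an ecall only in the iteration where the
// sampled flag rises from false to true, however long it then stays high.
class EcallEdgeDetector {
 public:
  bool should_handle(bool ecall_flag) {
    const bool rising = ecall_flag && !prev_;
    prev_ = ecall_flag;
    return rising;
  }

 private:
  bool prev_ = false;  // flag value seen in the previous iteration
};

// Counts how many times the handler would run for a sequence of
// flag values, one value per iteration of the simulation loop.
int count_handled(const std::vector<bool>& samples) {
  EcallEdgeDetector detector;
  int handled = 0;
  for (bool flag : samples) {
    if (detector.should_handle(flag)) {
      ++handled;
    }
  }
  return handled;
}
```

With this helper, a flag that stays high for three consecutive iterations triggers the handler once instead of three times. One limitation of the sketch: two `ecall` instructions executed back to back would keep the flag high and be counted as one, so comparing the instruction address as well may be needed.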