---
tags: [jserv, 2023-arch, RISC-V, Chisel]
---

# Assignment3: single-cycle RISC-V CPU

> contributed by [freshLiver](https://github.com/freshLiver/ca2023-lab3)

## Setup the Environment (in Docker)

The systems I currently have are Ubuntu 20.04 and Debian 12; however, Lab 3 recommends Ubuntu 22.04, so I use the given Dockerfile to build the development environment in Docker.

### Auto-testing with GitHub Action

Create a `XXX.yml` file under `ca2023-lab3/.github/workflows` with the following content:

```yml
name: MyCPU Testing
run-name: ${{ github.actor }} is testing out GitHub Actions 🚀
on: [push]
jobs:
  Explore-GitHub-Actions:
    runs-on: ubuntu-22.04 # https://github.com/actions/runner-images/tree/main
    steps:
      - name: Check out repository code
        uses: actions/checkout@v4
      - name: Setup Chisel Env
        run: |
          sudo apt-get update && sudo apt-get -y install build-essential curl wget zip unzip verilator
          sbt --version
      - name: Setup RISC-V Env
        run: |
          wget https://github.com/xpack-dev-tools/riscv-none-elf-gcc-xpack/releases/download/v13.2.0-2/xpack-riscv-none-elf-gcc-13.2.0-2-linux-x64.tar.gz
          tar zxvf xpack-riscv-none-elf-gcc-13.2.0-2-linux-x64.tar.gz
          cp -af xpack-riscv-none-elf-gcc-13.2.0-2 $HOME/riscv-none-elf-gcc
          export PATH="$HOME/riscv-none-elf-gcc/bin:$PATH"
          riscv-none-elf-gcc -v
      - name: Build and Test
        run: |
          export PATH="$HOME/riscv-none-elf-gcc/bin:$PATH"
          make -C csrc update
          sbt test
```

## Complete the Single-cycle RISC-V CPU

To complete the missing parts, we must refer to the wiring between the components of the CPU:

![HW3 CPU Arch](https://hackmd.io/_uploads/SJzK891ra.png)

### Instruction Fetch

The missing part in IF is the MUX that determines the PC of the next cycle. The PC of the next cycle should be `PC + 4` in most cases; however, when the CPU finds that a branch/jump instruction was executed in the EXE stage of the previous cycle, IF should change the PC to the address specified by the branch/jump instruction.
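
As a rough behavioural sketch (plain Scala rather than Chisel, so it is not the lab solution; the object `NextPc` and its parameter names are made up for illustration), the selection described above amounts to:

```scala
// Plain-Scala model of the next-PC choice; NextPc, jumpFlag and jumpAddress
// are illustrative names, not identifiers from the lab code.
object NextPc {
  private val Mask32 = 0xffffffffL // keep results inside 32 bits

  // Returns the PC for the next cycle: the jump target when a branch/jump
  // was taken, otherwise the address of the following instruction.
  def apply(pc: Long, jumpFlag: Boolean, jumpAddress: Long): Long =
    if (jumpFlag) jumpAddress & Mask32 else (pc + 4) & Mask32
}
```

For example, `NextPc(0x1000L, jumpFlag = false, jumpAddress = 0L)` yields `0x1004`, while a taken `j loop` located at `0x1020` keeps the PC at `0x1020`.
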
In the IF part, whether a branch/jump instruction was executed is indicated by the `jump_flag_id` input signal, so here we just use the `Mux` provided by Chisel to implement the missing MUX.

:::danger
:warning: **Refrain from copying and pasting your solution directly into the HackMD note**. Instead, provide a concise summary of the various test cases, outlining the aspects of the CPU they evaluate, the techniques employed for loading test program instructions, and the outcomes of these test cases.
:::

### Instruction Decode

In the ID stage, two output signals (`memory_read_enable` and `memory_write_enable`) are not handled, so we just raise each signal based on the type of the input instruction (L type for reading memory, S type for writing memory):

:::danger
:warning: **Refrain from copying and pasting your solution directly into the HackMD note**. Instead, provide a concise summary of the various test cases, outlining the aspects of the CPU they evaluate, the techniques employed for loading test program instructions, and the outcomes of these test cases.
:::

### Execute

In the EXE stage, the missing parts are the inputs of the ALU: its output is already wired to the output (`io.mem_alu_result`), but its inputs (the outputs of the two MUXs and the `ALUFunct` from ALU Control) are not connected. The output of ALU Control (`alu_ctrl.io.alu_funct`) can simply be wired to the ALU input (`alu.io.func`), but the input data of the ALU (`alu.io.op1` and `alu.io.op2`) come from the two MUXs.

The first (upper) MUX determines which data is passed to the ALU as the first operand. From the implementation of the ID stage, we can see that when the signal `ex_aluop1_source` is 0, we should pass the register data (`reg1_data` in the code, `Reg1RD` in the image above); otherwise (`ALUOp1Source.InstructionAddress`), the first MUX should pass the instruction address to the ALU.

```scala
object ALUOp1Source {
  val Register = 0.U(1.W)
  val InstructionAddress = 1.U(1.W)
}
...
class InstructionDecode extends Module {
  ...
  io.ex_aluop1_source := Mux(
    opcode === Instructions.auipc || opcode === InstructionTypes.B || opcode === Instructions.jal,
    ALUOp1Source.InstructionAddress,
    ALUOp1Source.Register
  )
  ...
  io.ex_aluop2_source := Mux(
    opcode === InstructionTypes.RM,
    ALUOp2Source.Register,
    ALUOp2Source.Immediate
  )
  ...
}
```

:::warning
In the above image, when `ALUOp1Src` is set, the MUX should pass `Reg1RD` to the ALU. However, in the given implementation, `Reg1RD` is passed to the ALU when the signal `ex_aluop1_source` is 0, which differs from the wiring in the image.
:::

Similarly, the second MUX passes the register data (`reg2_data`) to the ALU when `ex_aluop2_source` is `ALUOp2Source.Register`; otherwise it passes the immediate.

:::danger
:warning: **Refrain from copying and pasting your solution directly into the HackMD note**. Instead, provide a concise summary of the various test cases, outlining the aspects of the CPU they evaluate, the techniques employed for loading test program instructions, and the outcomes of these test cases.
:::

### CPU

Finally, we need to connect the required signals and data of each component, which is handled by the CPU module. In this module, the only missing part is the wiring of the signals and data needed by the EXE stage:

:::danger
:warning: **Refrain from copying and pasting your solution directly into the HackMD note**. Instead, provide a concise summary of the various test cases, outlining the aspects of the CPU they evaluate, the techniques employed for loading test program instructions, and the outcomes of these test cases.
:::

## Run the Handwritten Assembly Code of Lab2 on MyCPU

### How the Assembly Codes are Loaded

#### The `.asmbin` and `.txt` Files

To understand how the assembly code is executed by MyCPU, we can first trace the test functions. When we test MyCPU with the command `sbt "testOnly riscv.singlecycle.XXXXTest"`, the corresponding test defined in `ca2023-lab3/src/test/scala/riscv/singlecycle/XXXXTest.scala` will be executed.
Take `ByteAccessTest` for example:

```bash
$ sbt "testOnly riscv.singlecycle.ByteAccessTest"
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project lab3-build from plugins.sbt ...
[info] loading project definition from /lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/lab3/)
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] Run completed in 6 seconds, 446 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success]
```

Then, looking at the implementation of the executed test, we can see that it uses the module `TestTopModule` to load the assembly:

```scala
class ByteAccessTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "store and load a single byte" in {
    test(new TestTopModule("sb.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 500) {
        c.clock.step()
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      c.io.regs_debug_read_address.poke(5.U) // t0
      c.io.regs_debug_read_data.expect(0xdeadbeefL.U)
      c.io.regs_debug_read_address.poke(6.U) // t1
      c.io.regs_debug_read_data.expect(0xef.U)
      c.io.regs_debug_read_address.poke(1.U) // ra
      c.io.regs_debug_read_data.expect(0x15ef.U)
    }
  }
}
```

If we dive into the implementation of `TestTopModule`, we can find that it uses `peripheral.InstructionROM` to read the specified binary (a `*.asmbin` file under the `ca2023-lab3/src/main/resources` directory) into memory, and converts it into a `.txt` file under the `ca2023-lab3/verilog` directory:

```scala
class InstructionROM(instructionFilename: String) extends Module {
  ...
  val (instructionsInitFile, capacity) = readAsmBinary(instructionFilename)
  ...
  def readAsmBinary(filename: String) = {
    ...
    var instructions = new Array[BigInt](0)
    val arr = new Array[Byte](4)
    ...
    instructions = instructions :+ BigInt(0x00000013L)
    instructions = instructions :+ BigInt(0x00000013L)
    instructions = instructions :+ BigInt(0x00000013L)
    val currentDir = System.getProperty("user.dir")
    val exeTxtPath = Paths.get(currentDir, "verilog", f"${instructionFilename}.txt")
    val writer = new FileWriter(exeTxtPath.toString)
    for (i <- instructions.indices) {
      writer.write(f"@$i%x\n${instructions(i)}%08x\n")
    }
    writer.close()
    (exeTxtPath, instructions.length)
  }
}
```

We can see that the function `readAsmBinary` appends 3 [`nop`](https://luplab.gitlab.io/rvcodecjs/#q=0x00000013) instructions to the end of the array `instructions`, and then writes the instructions to the txt file in hexadecimal plain-text format, preceding each instruction with `@${instruction-number}`.

:::warning
**Why add 3 `nop` instead of 4 (ID, EXE, MEM, WB) ?**
:::

Now, if we check the content of the created txt file `sb.asmbin.txt`:

```text
@0
00400513 # addi x10, x0, 4
@1
deadc2b7 # lui x5, -136484
@2
eef28293 # addi x5, x5, -273
@3
00550023 # sb x5, 0(x10)
@4
00052303 # lw x6, 0(x10)
@5
01500913 # addi x18, x0, 21
@6
012500a3 # sb x18, 1(x10)
@7
00052083 # lw x1, 0(x10)
@8
0000006f # jal x0, 0
@9
00000013 # nop
@a
00000013 # nop
@b
00000013 # nop
```

Comparing it with the disassembly of the source assembly code, we can see that it contains all the instructions of the source:

```bash
$ riscv-none-elf-objdump -d sb.o

sb.o:     file format elf32-littleriscv

Disassembly of section .text:

00000000 <_start>:
   0: 00400513  li a0,4
   4: deadc2b7  lui t0,0xdeadc
   8: eef28293  add t0,t0,-273 # deadbeef <loop+0xdeadbecf>
   c: 00550023  sb t0,0(a0)
  10: 00052303  lw t1,0(a0)
  14: 01500913  li s2,21
  18: 012500a3  sb s2,1(a0)
  1c: 00052083  lw ra,0(a0)

00000020 <loop>:
  20: 0000006f  j 20 <loop>
```

#### Load `.txt` to RAM

:::info
Imagine that there is a ROM beside our CPU: we first need to load our instructions from the ROM into RAM, and then start the CPU, which fetches the
instructions from the RAM, to execute our program.
:::

However, this only shows that the specified assembly code is converted into a txt file; we still need to know how it is loaded by the `CPU` module. So next, let's look at how the outputs of `readAsmBinary` are used during the simulation.

After the assembly is converted to a txt file, [the `loadMemoryFromFileInline` function](https://javadoc.io/doc/org.chipsalliance/chisel_2.13/latest/chisel3/util/experimental/loadMemoryFromFileInline$.html) is used to load the content of the given txt file into the ROM:

```scala
class InstructionROM(instructionFilename: String) extends Module {
  val io = IO(new Bundle {
    val address = Input(UInt(Parameters.AddrWidth))
    val data = Output(UInt(Parameters.InstructionWidth))
  })

  val (instructionsInitFile, capacity) = readAsmBinary(instructionFilename)
  val mem = Mem(capacity, UInt(Parameters.InstructionWidth))
  annotate(new ChiselAnnotation {
    override def toFirrtl = MemorySynthInit
  })
  loadMemoryFromFileInline(mem, instructionsInitFile.toString.replaceAll("\\\\", "/"))
  io.data := mem.read(io.address)
  ...
}
```

According to the Chisel documentation:

> *loadMemoryFromFileInline* is an annotation generator that helps with loading a memory from a text file inlined in the Verilog module. This relies on Verilator and Verilog's $readmemh or $readmemb.

:::info
**The content of the generated txt file ?**

This function uses `$readmemh` to load the hexadecimal values, so we can check [the Verilog spec](https://ieeexplore.ieee.org/stampPDF/getPDF.jsp?tp=&arnumber=1620780&ref=aHR0cHM6Ly9pZWVleHBsb3JlLmllZWUub3JnL2RvY3VtZW50LzE2MjA3ODA=#G6.120119) ([another source](https://www.eg.bucknell.edu/~csci320/2016-fall/wp-content/uploads/2015/08/verilog-std-1364-2005.pdf#G6.120119) if the IEEE source is too slow) for the input format of `$readmemh`:

> The **numbers shall have neither the length nor the base format specified**. For $readmemb, each number shall be binary.
> For $readmemh, the numbers shall be hexadecimal. [...] **White space and/or comments** shall be used to separate the numbers.
>
> In the following discussion, the term **address** refers to an index into the array that models the memory.
>
> As the file is read, each number encountered is assigned to a successive word element of the memory. Addressing is controlled both by specifying start and/or finish addresses in the system task invocation and by specifying addresses in the data file. **When addresses appear in the data file, the format is an at character (@) followed by a hexadecimal number as follows**:
>
> @hh...h
>
> Both uppercase and lowercase digits are allowed in the number. No white space is allowed between the @ and the number. As many address specifications as needed within the data file can be used. When the system task encounters an address specification, it loads subsequent data starting at that memory address.

And now we understand why we need to convert the assembly into a txt file in a special format (`@${instruction-number}`).
:::

:::success
**Tweak the addresses to verify the understanding**

To verify our understanding of the addresses in the input file, let's make some modifications to the `InstructionROM` module:

1. Change the txt format from `f"@$i%x\n${instructions(i)}%08x\n"` to `f"@${i+1}%x\n${instructions(i)}%08x\n"`
2. Increase the capacity returned by the function `readAsmBinary` by 1 (`instructions.length + 1`)

Then, run the CPU and we can see that the first data output by the `InstructionROM` module is now `00000000`:

```text
$ sbt "testOnly riscv.singlecycle.ByteAccessTest"
...
[warn] one warning found
InstructionROM output 00000000
Memory Poke 0, output instruction -> 00000000
InstructionROM output 00400513
Memory Poke 1, output instruction -> 00000000
InstructionROM output deadc2b7
Memory Poke 2, output instruction -> 00000000
InstructionROM output eef28293
Memory Poke 3, output instruction -> 00000000
CPU input instruction -> 00000000 (0)
InstructionROM output 00550023
Memory Poke 4, output instruction -> 00000000
InstructionROM output 00052303
Memory Poke 5, output instruction -> 00000000
InstructionROM output 01500913
Memory Poke 6, output instruction -> 00000000
InstructionROM output 012500a3
Memory Poke 7, output instruction -> 00000000
```
:::

From the code, we can see that the `InstructionROM` module always outputs the instruction at the specified address (`io.address`) on its `io.data` port. In the `TestTopModule`, we can find that this output (`instruction_rom.io.data`) is connected to `rom_loader.io.rom_data`, with the `load_address` port of `ROMLoader` set to `Parameters.EntryAddress` (`0x1000`):

```scala
class TestTopModule(exeFilename: String) extends Module {
  ...
  val mem = Module(new Memory(8192))
  val instruction_rom = Module(new InstructionROM(exeFilename))
  val rom_loader = Module(new ROMLoader(instruction_rom.capacity))

  rom_loader.io.rom_data := instruction_rom.io.data
  rom_loader.io.load_address := Parameters.EntryAddress
  instruction_rom.io.address := rom_loader.io.rom_address
  ...
}
```

And from the implementation of `ROMLoader`, we can see that the input instructions are written to the RAM (through `RAMBundle`), starting from the given address `load_address`:

```scala
class ROMLoader(capacity: Int) extends Module {
  val io = IO(new Bundle {
    val bundle = Flipped(new RAMBundle)
    ...
  })

  val address = RegInit(0.U(32.W))
  val valid = RegInit(false.B)
  ...
  when(address <= (capacity - 1).U) {
    io.bundle.write_enable := true.B
    io.bundle.write_data := io.rom_data
    io.bundle.address := (address << 2.U).asUInt + io.load_address
    io.bundle.write_strobe := VecInit(Seq.fill(Parameters.WordSize)(true.B))
    address := address + 1.U
    when(address === (capacity - 1).U) {
      valid := true.B
    }
  }
  io.load_finished := valid
  io.rom_address := address
}
```

Once all the instructions are loaded into the RAM (detected by comparing the register `address` with the `capacity` given by `InstructionROM`), the `ROMLoader` sets `load_finished` to signal the CPU to start fetching instructions from the RAM.

Now, back to the test module: we can suppose that each iteration of the loop (`c.clock.step()` followed by `c.io.mem_debug_read_address.poke()`) makes the `InstructionROM` output the next instruction to `ROMLoader`, which then writes it into RAM. After all the instructions are loaded into RAM, the CPU starts fetching the instructions from RAM:

```scala
class ByteAccessTest extends AnyFlatSpec with ChiselScalatestTester {
  ...
    test(new TestTopModule("sb.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 500) {
        c.clock.step()
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      ...
    }
  }
}
```

### How the Assembly Codes are Executed

Next, let's check how the instructions are executed. In the `TestTopModule` module, we can find that the CPU runs on a separate clock that ticks only once every 4 cycles of the default clock (which drives `InstructionROM` and `ROMLoader`).

:::warning
**Why does the CPU module need another clock?**
:::

```scala
class TestTopModule(exeFilename: String) extends Module {
  val io = IO(new Bundle {
    val mem_debug_read_address = Input(UInt(Parameters.AddrWidth))
    val regs_debug_read_address = Input(UInt(Parameters.PhysicalRegisterAddrWidth))
    val regs_debug_read_data = Output(UInt(Parameters.DataWidth))
    val mem_debug_read_data = Output(UInt(Parameters.DataWidth))
  })
  ...
  val CPU_clkdiv = RegInit(UInt(2.W), 0.U)
  val CPU_tick = Wire(Bool())
  val CPU_next = Wire(UInt(2.W))
  CPU_next := Mux(CPU_clkdiv === 3.U, 0.U, CPU_clkdiv + 1.U)
  CPU_tick := CPU_clkdiv === 0.U
  CPU_clkdiv := CPU_next

  withClock(CPU_tick.asClock) {
    val cpu = Module(new CPU)
    cpu.io.debug_read_address := 0.U
    cpu.io.instruction_valid := rom_loader.io.load_finished
    mem.io.instruction_address := cpu.io.instruction_address
    cpu.io.instruction := mem.io.instruction

    when(!rom_loader.io.load_finished) {
      rom_loader.io.bundle <> mem.io.bundle
      cpu.io.memory_bundle.read_data := 0.U
    }.otherwise {
      rom_loader.io.bundle.read_data := 0.U
      cpu.io.memory_bundle <> mem.io.bundle
    }

    cpu.io.debug_read_address := io.regs_debug_read_address
    io.regs_debug_read_data := cpu.io.debug_read_data
  }

  mem.io.debug_read_address := io.mem_debug_read_address
  io.mem_debug_read_data := mem.io.debug_read_data
}
```

:::info
**How can `ROMLoader` access the `Memory` instance created in `TestTopModule` ?**

The trick seems to be inside the `when` statement:

```scala
when(!rom_loader.io.load_finished) {
  rom_loader.io.bundle <> mem.io.bundle
  cpu.io.memory_bundle.read_data := 0.U
}.otherwise {
  rom_loader.io.bundle.read_data := 0.U
  cpu.io.memory_bundle <> mem.io.bundle
}
```

According to the official documentation, the [bulk connect operator `<>`](https://javadoc.io/doc/org.chipsalliance/chisel_2.13/latest/chisel3/Bundle.html#%3C%3E(that:=%3Echisel3.Data)(implicitsourceInfo:chisel3.experimental.SourceInfo):Unit) seems to connect the fields with the same names in two bundles. So, while the `ROMLoader` is still loading the instructions, the RAM (`mem`) in the `TestTopModule` is connected to the `RAMBundle` of the `ROMLoader`; once the instructions have been loaded into RAM, the RAM stops accepting data from the `ROMLoader` and accepts data from the CPU instead.
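
As a loose software analogy (plain Scala, not Chisel; `BusRequest` and `BusHandover` are hypothetical names that do not exist in the lab code), the `when`/`.otherwise` pair acts like a two-way selector deciding whose request reaches the memory:

```scala
// Hypothetical plain-Scala model of the handover; it mirrors only the
// selection logic, not Chisel's actual connection semantics.
final case class BusRequest(address: Long, writeEnable: Boolean, writeData: Long)

object BusHandover {
  // The loader drives the memory until loading has finished; the CPU
  // drives it afterwards.
  def activeRequest(loadFinished: Boolean, loader: BusRequest, cpu: BusRequest): BusRequest =
    if (!loadFinished) loader else cpu
}
```

With `loadFinished = false`, the loader's write (for instance `0x00400513` to `0x1000`) is the request the memory sees; once the flag is set, only the CPU's request gets through.
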
:::

And from the code, we can see that the input port `io.mem_debug_read_address` is wired to the RAM's `mem.io.debug_read_address`, meaning that every time the module is poked, the RAM outputs the data at the specified address (`io.mem_debug_read_address`).

In addition to the `io.mem_debug_read_address` port, `cpu.io.instruction_address` is also wired to the RAM, and the RAM outputs the instruction (`mem.io.instruction`) at the specified address to the CPU. This means that once every 4 default cycles (4 `c.io.mem_debug_read_address.poke()` calls), the CPU is triggered and fetches an instruction (if the `instruction_valid` signal is also set) by specifying the `instruction_address`.

Now, after adding some `printf` calls to inspect the data during simulation, we can see that the output supports our assumption:

```text
$ sbt "testOnly riscv.singlecycle.ByteAccessTest"
...
[warn] one warning found
(default cycle) InstructionROM: output 00400513 from 00000000
(default cycle) ROMLoader: write 00400513 to 00001000 // first instruction loaded
(default cycle) InstructionROM: output deadc2b7 from 00000001
(default cycle) ROMLoader: write deadc2b7 to 00001004
(default cycle) InstructionROM: output eef28293 from 00000002
(default cycle) ROMLoader: write eef28293 to 00001008
(default cycle) InstructionROM: output 00550023 from 00000003
(default cycle) ROMLoader: write 00550023 to 0000100c
(CPU cycle) CPU: fetch 00400513 from 00001000 (valid: 0)
(default cycle) InstructionROM: output 00052303 from 00000004
(default cycle) ROMLoader: write 00052303 to 00001010
(default cycle) InstructionROM: output 01500913 from 00000005
(default cycle) ROMLoader: write 01500913 to 00001014
(default cycle) InstructionROM: output 012500a3 from 00000006
(default cycle) ROMLoader: write 012500a3 to 00001018
(default cycle) InstructionROM: output 00052083 from 00000007
(default cycle) ROMLoader: write 00052083 to 0000101c
(CPU cycle) CPU: fetch 00400513 from 00001000 (valid: 0)
(default cycle) InstructionROM: output 0000006f from 00000008
(default cycle) ROMLoader: write 0000006f to 00001020
(default cycle) InstructionROM: output 00000013 from 00000009
(default cycle) ROMLoader: write 00000013 to 00001024
(default cycle) InstructionROM: output 00000013 from 0000000a
(default cycle) ROMLoader: write 00000013 to 00001028
(default cycle) InstructionROM: output 00000013 from 0000000b
(default cycle) ROMLoader: write 00000013 to 0000102c // last instruction loaded
(CPU cycle) CPU: fetch 00400513 from 00001000 (valid: 0)
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(CPU cycle) CPU: fetch 00400513 from 00001000 (valid: 1) // CPU started
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(CPU cycle) CPU: fetch deadc2b7 from 00001004 (valid: 1)
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
... // running
(CPU cycle) CPU: fetch 00052083 from 0000101c (valid: 1)
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(CPU cycle) CPU: fetch 0000006f from 00001020 (valid: 1)
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(CPU cycle) CPU: fetch 0000006f from 00001020 (valid: 1)
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
... // repeat `j loop`...
```

After the `for` loop in the test finishes, the remaining part of the test reads the registers selected via the `regs_debug_read_address` port and compares their values with the expected ones:

```scala
class ByteAccessTest extends AnyFlatSpec with ChiselScalatestTester {
  ...
    test(new TestTopModule("sb.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      ...
      c.io.regs_debug_read_address.poke(5.U) // t0
      c.io.regs_debug_read_data.expect(0xdeadbeefL.U)
      c.io.regs_debug_read_address.poke(6.U) // t1
      c.io.regs_debug_read_data.expect(0xef.U)
      c.io.regs_debug_read_address.poke(1.U) // ra
      c.io.regs_debug_read_data.expect(0x15ef.U)
    }
  }
}
```

### Custom Test

#### Address of `.data` Section

Now, let's create a simple custom test that runs our handwritten assembly code on MyCPU, to verify our understanding of the overall workflow of MyCPU.

First, define some data in the `.data` section and load the first word into the `a0` register:

```asm
# csrc/simple.S
.data
mdata:
    .word 0x12345678
    .word 0x22345678
    .word 0x32345678
    .word 0x42345678

.text
main:
    li s1, 1234
    la s0, .data
    lw a0, 0(s0)
loop:
    j loop
```

Then run `make` and check the generated asmbin with the `xxd` command (with the `-e` option); we can see that the `.data` section is placed right after the instructions:

```text
00000000: 4d200493 00001417 ffc40413 00042503  .. M.........%..
00000010: 0000006f 12345678 22345678 32345678  o...xV4.xV4"xV42
00000020: 42345678                             xV4B
```

Then, create a custom test module in `ca2023-lab3/src/test/scala/riscv/singlecycle/CPUTest.scala`.
This test checks the values of the registers (`s0`, `s1`, `a0`) after all the instructions have been executed:

```scala
class SimpleTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Simple Test")
  it should "Test Handwritten Assembly on MyCPU" in {
    test(new TestTopModule("simple.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 50) {
        c.clock.step()
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }

      c.io.mem_debug_read_address.poke(0x1000.U) // first instruction
      c.clock.step()
      println(f"mem[${c.io.mem_debug_read_address.peek().litValue}%08x]: ${c.io.mem_debug_read_data.peek().litValue}%08x")

      c.io.regs_debug_read_address.poke(8.U) // s0
      println(f"s0: ${c.io.regs_debug_read_data.peek().litValue}%08x")

      c.io.regs_debug_read_address.poke(9.U) // s1
      c.io.regs_debug_read_data.expect(1234.U) // s1
      c.io.regs_debug_read_address.poke(10.U) // a0
      c.io.regs_debug_read_data.expect(0x12345678L.U)
    }
  }
}
```

Then, execute this test on MyCPU by running `sbt "testOnly riscv.singlecycle.SimpleTest"`, and we get the following result:

```bash
$ sbt "testOnly riscv.singlecycle.SimpleTest"
[info] welcome to sbt 1.9.7 (Temurin Java 1.8.0_392)
[info] loading settings for project lab3-build from plugins.sbt ...
[info] loading project definition from /lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/lab3/)
mem[00001000]: 4d200493
s0: 00000ffc
[info] SimpleTest:
[info] Simple Test
[info] - should Test Handwritten Assembly on MyCPU *** FAILED ***
[info]   io_regs_debug_read_data=0 (0x0) did not equal expected=305419896 (0x12345678) (lines in CPUTest.scala: 136, 120) (CPUTest.scala:136)
[info] Run completed in 6 seconds, 923 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 0, failed 1, canceled 0, ignored 0, pending 0
[info] *** 1 TEST FAILED ***
...
```

The debug message shows that the address of the `.data` section (stored in `s0`) is `0x0FFC`. However, according to the [loading flow](#load-txt-to-ram) explained above and the generated txt file:

```text
@0
4d200493
@1
00001417
@2
ffc40413
@3
00042503
@4
0000006f
@5
12345678
@6
22345678
@7
32345678
@8
42345678
@9
00000013
@a
00000013
@b
00000013
```

the `.text` section should begin at `0x1000`, and the `.data` section comes right after the `.text` section (5 instructions, 20 bytes), meaning that the `.data` section should start at `0x1014`. We can verify this by specifying the address statically in the code:

```asm
...
.text
main:
    li s1, 1234
    li s0, 0x1014
    lw a0, 0(s0)
loop:
    j loop
```

And the result shows that our assumption is correct:

```bash
$ sbt "testOnly riscv.singlecycle.SimpleTest"
...
mem[00001000]: 4d200493
s0: 00001014
[info] SimpleTest:
[info] Simple Test
[info] - should Test Handwritten Assembly on MyCPU
[info] Run completed in 7 seconds, 434 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success]
```

:::danger
TODO: solve the `.data` address problem
:::

### Port HW2 Code

## References

- <https://hackmd.io/@sysprog/r1mlr3I7p>
- <https://luplab.gitlab.io/rvcodecjs/#q=lw&abi=false&isa=RV32I>
- <https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf>
- <https://javadoc.io/doc/org.chipsalliance/chisel_2.13/latest/chisel3/util/MuxLookup$.html>
- <https://www.scala-sbt.org/1.x/docs/Testing.html#testOnly>
- <https://www.scala-lang.org/api/2.13.3/scala/Array.html>