
Assignment3: single-cycle RISC-V CPU

contributed by freshLiver

Set Up the Environment (in Docker)

The systems I currently have are Ubuntu 20.04 and Debian 12, but Lab 3 recommends Ubuntu 22.04, so I use the provided Dockerfile to build the development environment in Docker.

Auto-testing with GitHub Actions

Create a workflow file (e.g. XXX.yml) under ca2023-lab3/.github/workflows with the following content:

name: MyCPU Testing
run-name: ${{ github.actor }} is testing out GitHub Actions 🚀
on: [push]
jobs:
  Explore-GitHub-Actions:
    runs-on: ubuntu-22.04 # https://github.com/actions/runner-images/tree/main
    steps:
      - name: Check out repository code
        uses: actions/checkout@v4

      - name: Setup Chisel Env
        run:  |
              sudo apt-get update && sudo apt-get -y install build-essential curl wget zip unzip verilator
              sbt --version

      - name: Setup RISC-V Env
        run:  |
              wget https://github.com/xpack-dev-tools/riscv-none-elf-gcc-xpack/releases/download/v13.2.0-2/xpack-riscv-none-elf-gcc-13.2.0-2-linux-x64.tar.gz
              tar zxvf xpack-riscv-none-elf-gcc-13.2.0-2-linux-x64.tar.gz
              cp -af xpack-riscv-none-elf-gcc-13.2.0-2 $HOME/riscv-none-elf-gcc
              # `export` only affects this step; appending to $GITHUB_PATH
              # also puts the toolchain on PATH for all later steps
              export PATH="$HOME/riscv-none-elf-gcc/bin:$PATH"
              echo "$HOME/riscv-none-elf-gcc/bin" >> "$GITHUB_PATH"
              riscv-none-elf-gcc -v

      - name: Build and Test
        run:  |
              make -C csrc update   # build the assembly test programs under csrc
              sbt test

Complete the Single-cycle RISC-V CPU

To complete the missing parts, we must refer to the wiring between the components of the CPU:

HW3 CPU Arch

Instruction Fetch

The missing part in IF is the MUX that selects the PC for the next cycle. In most cases the next PC is PC + 4. However, if the instruction executed in the current cycle is a taken branch or a jump (resolved in the EXE stage), IF must instead load the target address of that branch/jump.

In the IF module, whether a branch/jump is taken is indicated by the jump_flag_id input signal, so the missing MUX can simply be implemented with Chisel's Mux.

:warning: Refrain from copying and pasting your solution directly into the HackMD note. Instead, provide a concise summary of the various test cases, outlining the aspects of the CPU they evaluate, the techniques employed for loading test program instructions, and the outcomes of these test cases.

Instruction Decode

In the ID stage, two output signals (memory_read_enable and memory_write_enable) are not yet driven. Each should be raised according to the instruction type: L type (loads) reads memory, and S type (stores) writes memory.

:warning: Refrain from copying and pasting your solution directly into the HackMD note. Instead, provide a concise summary of the various test cases, outlining the aspects of the CPU they evaluate, the techniques employed for loading test program instructions, and the outcomes of these test cases.

Execute

In the EXE stage, the missing parts are the ALU inputs. The ALU output is already wired to io.mem_alu_result, but its inputs are not connected yet: the outputs of the two MUXs and ALUFunct from ALU Control.

The output of ALU Control (alu_ctrl.io.alu_funct) can simply be wired to the ALU input alu.io.func. The ALU operands (alu.io.op1 and alu.io.op2), however, come from the two MUXs.

The first (upper) MUX selects the first operand. The ID stage implementation below shows two cases:
- When ex_aluop1_source is 0 (ALUOp1Source.Register), the MUX should pass the register data (reg1_data in the code, Reg1RD in the image above).
- Otherwise (ALUOp1Source.InstructionAddress, used by auipc, B-type instructions, and jal), it should pass the instruction address (the PC of the current instruction) to the ALU.

object ALUOp1Source {
  val Register           = 0.U(1.W)
  val InstructionAddress = 1.U(1.W)
}
...
class InstructionDecode extends Module {
  ...
  io.ex_aluop1_source := Mux(
    opcode === Instructions.auipc || opcode === InstructionTypes.B || opcode === Instructions.jal,
    ALUOp1Source.InstructionAddress,
    ALUOp1Source.Register
  )
  ...
  io.ex_aluop2_source := Mux(
    opcode === InstructionTypes.RM,
    ALUOp2Source.Register,
    ALUOp2Source.Immediate
  )
  ...
}

In the image above, the MUX passes Reg1RD to the ALU when ALUOp1Src is set.

However, in the given implementation, Reg1RD is passed to the ALU when ex_aluop1_source is 0. In other words, the select polarity is the opposite of the wiring in the image.

Similarly, the second MUX passes the register data (reg2_data) to the ALU when ex_aluop2_source is ALUOp2Source.Register (register-register instructions, opcode InstructionTypes.RM). Otherwise it passes the immediate.

:warning: Refrain from copying and pasting your solution directly into the HackMD note. Instead, provide a concise summary of the various test cases, outlining the aspects of the CPU they evaluate, the techniques employed for loading test program instructions, and the outcomes of these test cases.

CPU

Finally, the signals and data required by each component must be connected; this is done in the CPU module.

In this module, the only missing part is wiring the signals and data needed by the EXE stage:

:warning: Refrain from copying and pasting your solution directly into the HackMD note. Instead, provide a concise summary of the various test cases, outlining the aspects of the CPU they evaluate, the techniques employed for loading test program instructions, and the outcomes of these test cases.

Run the Handwritten Assembly Code of Lab2 on MyCPU

How the Assembly Codes are Loaded

The .asmbin and .txt Files

To understand how assembly programs are executed by MyCPU, we can first trace the test functions.

When we test MyCPU with the command sbt "testOnly riscv.singlecycle.XXXXTest", the test class XXXXTest defined under ca2023-lab3/src/test/scala/riscv/singlecycle/ is executed. The whole-CPU tests, such as ByteAccessTest, are defined in CPUTest.scala. Take ByteAccessTest for example:

$ sbt "testOnly riscv.singlecycle.ByteAccessTest"
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project lab3-build from plugins.sbt ...
[info] loading project definition from /lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/lab3/)
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] Run completed in 6 seconds, 446 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] 

Looking at the implementation of this test, we can see that it uses the TestTopModule module to load the program:

class ByteAccessTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "store and load a single byte" in {
    test(new TestTopModule("sb.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 500) {
        c.clock.step()
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      c.io.regs_debug_read_address.poke(5.U) // t0
      c.io.regs_debug_read_data.expect(0xdeadbeefL.U)
      c.io.regs_debug_read_address.poke(6.U) // t1
      c.io.regs_debug_read_data.expect(0xef.U)
      c.io.regs_debug_read_address.poke(1.U) // ra
      c.io.regs_debug_read_data.expect(0x15ef.U)
    }
  }
}

Diving into the implementation of TestTopModule, we find that it uses peripheral.InstructionROM to read the specified program (an *.asmbin file under the ca2023-lab3/src/main/resources directory). It then converts the program into a .txt file under the ca2023-lab3/verilog directory:

class InstructionROM(instructionFilename: String) extends Module {
  ...
  val (instructionsInitFile, capacity) = readAsmBinary(instructionFilename)
  ...

  def readAsmBinary(filename: String) = {
    ...
    var instructions = new Array[BigInt](0)
    val arr          = new Array[Byte](4)
    ...
    instructions = instructions :+ BigInt(0x00000013L)
    instructions = instructions :+ BigInt(0x00000013L)
    instructions = instructions :+ BigInt(0x00000013L)
    val currentDir = System.getProperty("user.dir")
    val exeTxtPath = Paths.get(currentDir, "verilog", f"${instructionFilename}.txt")
    val writer     = new FileWriter(exeTxtPath.toString)
    for (i <- instructions.indices) {
      writer.write(f"@$i%x\n${instructions(i)}%08x\n")
    }
    writer.close()
    (exeTxtPath, instructions.length)
  }
}

The function readAsmBinary first reads the program into the instructions array. It then appends three nop instructions (0x00000013, i.e. addi x0, x0, 0) to the end of the array. Finally, it writes every word into the .txt file as 8 hexadecimal digits, each preceded by its index in hexadecimal in the form @<index>.
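To make the format concrete, here is a minimal plain-Scala sketch (no Chisel) of the same conversion. It assumes the .asmbin stores 32-bit words in little-endian byte order, which is consistent with RV32 being little-endian and with the xxd -e dump later in this note. The object AsmBinToTxt and its helpers are hypothetical and not part of the lab code.

```scala
// Hypothetical plain-Scala model of readAsmBinary's output format (not the lab code)
object AsmBinToTxt {
  // 0x00000013 encodes `addi x0, x0, 0`, the canonical RISC-V nop
  val Nop: Long = 0x00000013L

  /** Decode little-endian bytes into 32-bit words (assumption: the .asmbin is little-endian).
    * A trailing group shorter than 4 bytes is dropped. */
  def toWords(bytes: Array[Byte]): Seq[Long] =
    bytes.grouped(4).filter(_.length == 4).map { w =>
      w.zipWithIndex.foldLeft(0L) { case (acc, (b, i)) => acc | ((b & 0xffL) << (8 * i)) }
    }.toList

  /** Append three nops, then emit "@<index in hex>" followed by the word as 8 hex digits. */
  def toReadmemh(words: Seq[Long]): String =
    (words ++ Seq.fill(3)(Nop)).zipWithIndex.map { case (w, i) => f"@$i%x\n$w%08x\n" }.mkString

  def main(args: Array[String]): Unit = {
    // First word of sb.asmbin (li a0, 4) as it is stored in the file
    val bytes = Array(0x13, 0x05, 0x40, 0x00).map(_.toByte)
    print(toReadmemh(toWords(bytes)))
  }
}
```

Running main prints @0 / 00400513, followed by three nop entries at @1 to @3. This matches the pattern at the tail of sb.asmbin.txt.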

Why add 3 nops instead of 4 (ID, EXE, MEM, WB)?

Now, let's check the content of the generated file sb.asmbin.txt:

@0
00400513 # addi x10, x0, 4
@1
deadc2b7 # lui x5, -136484
@2
eef28293 # addi x5, x5, -273
@3
00550023 # sb x5, 0(x10)
@4
00052303 # lw x6, 0(x10)
@5
01500913 # addi x18, x0, 21
@6
012500a3 # sb x18, 1(x10)
@7
00052083 # lw x1, 0(x10)
@8
0000006f # jal x0, 0
@9
00000013 # nop
@a
00000013 # nop
@b
00000013 # nop

Compared with the disassembly of the source program, it contains every instruction of the source program, followed by the three nops:

$ riscv-none-elf-objdump -d sb.o

sb.o:     file format elf32-littleriscv


Disassembly of section .text:

00000000 <_start>:
   0:   00400513                li      a0,4
   4:   deadc2b7                lui     t0,0xdeadc
   8:   eef28293                add     t0,t0,-273 # deadbeef <loop+0xdeadbecf>
   c:   00550023                sb      t0,0(a0)
  10:   00052303                lw      t1,0(a0)
  14:   01500913                li      s2,21
  18:   012500a3                sb      s2,1(a0)
  1c:   00052083                lw      ra,0(a0)

00000020 <loop>:
  20:   0000006f                j       20 <loop>
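The .txt annotations and the objdump output describe the same words in different notations. For example, the .txt file shows lui x5, -136484 where objdump shows lui t0,0xdeadc. The small hypothetical decoder below (plain Scala, using the standard RV32I field positions) shows how the fields are extracted. It also shows why the 20-bit U-type field 0xdeadc reads as -136484 when treated as a signed number.

```scala
// Hypothetical RV32I field decoder, for checking the annotations by hand
object FieldDecode {
  // Standard RV32I field positions
  def opcode(inst: Long): Int = (inst & 0x7fL).toInt
  def rd(inst: Long): Int     = ((inst >>> 7) & 0x1fL).toInt
  def funct3(inst: Long): Int = ((inst >>> 12) & 0x7L).toInt
  def rs1(inst: Long): Int    = ((inst >>> 15) & 0x1fL).toInt

  /** The 20-bit U-type field (bits 31..12), interpreted as a signed 20-bit number. */
  def uFieldSigned(inst: Long): Long = {
    val field = (inst >>> 12) & 0xfffffL
    if (field >= 0x80000L) field - 0x100000L else field
  }

  def main(args: Array[String]): Unit = {
    val lui = 0xdeadc2b7L // lui t0, 0xdeadc
    println(f"opcode=0x${opcode(lui)}%02x rd=x${rd(lui)} imm=0x${(lui >>> 12) & 0xfffffL}%05x (${uFieldSigned(lui)})")
    val lw = 0x00052303L // lw t1, 0(a0)
    println(s"opcode=0x${opcode(lw).toHexString} funct3=${funct3(lw)} rd=x${rd(lw)} rs1=x${rs1(lw)}")
  }
}
```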

Load .txt to RAM

Imagine that there is a ROM beside our CPU. We first need to load our instructions from the ROM into RAM, and then start the CPU, which fetches the instructions from RAM to execute our program.

So far, we have only seen that the specified program is converted into a .txt file; we still need to know how the CPU module gets it. Next, let's look at how the outputs of readAsmBinary are used during the simulation.

After the program is converted into the .txt file, the loadMemoryFromFileInline function is used to load the content of that file into the ROM:

class InstructionROM(instructionFilename: String) extends Module {
  val io = IO(new Bundle {
    val address = Input(UInt(Parameters.AddrWidth))
    val data    = Output(UInt(Parameters.InstructionWidth))
  })

  val (instructionsInitFile, capacity) = readAsmBinary(instructionFilename)
  val mem      = Mem(capacity, UInt(Parameters.InstructionWidth))
  annotate(new ChiselAnnotation {
    override def toFirrtl =
      MemorySynthInit
  })

  loadMemoryFromFileInline(mem, instructionsInitFile.toString.replaceAll("\\\\", "/"))
  io.data := mem.read(io.address)
  ...
}

According to the Chisel documentation:

loadMemoryFromFileInline is an annotation generator that helps with loading a memory from a text file inlined in the Verilog module. This relies on Verilator and Verilog's $readmemh or $readmemb.

The content of the generated txt file?

This function uses $readmemh to load hexadecimal values, so we can check the Verilog specification (or another source if the IEEE site is too slow) for the input format of $readmemh:

The numbers shall have neither the length nor the base format specified. For $readmemb, each number shall be binary. For $readmemh, the numbers shall be hexadecimal. [] White space and/or comments shall be used to separate the numbers.

In the following discussion, the term address refers to an index into the array that models the memory.

As the file is read, each number encountered is assigned to a successive word element of the memory. Addressing is controlled both by specifying start and/or finish addresses in the system task invocation and by specifying addresses in the data file. When addresses appear in the data file, the format is an at character (@) followed by a hexadecimal number as follows:

@hhh

Both uppercase and lowercase digits are allowed in the number. No white space is allowed between the @ and the number. As many address specifications as needed within the data file can be used. When the system task encounters an address specification, it loads subsequent data starting at that memory address.

Now we understand why the program is converted into a .txt file in this particular format (@<index>): each address line tells $readmemh which memory word the following values are loaded into.
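A minimal plain-Scala model of the $readmemh rules quoted above makes the effect of the address lines easy to check. In these rules, whitespace separates hex words and @hhh sets the next word index. The object ReadMemH is hypothetical. It ignores comments and the start/finish arguments, and it models words that are never written as 0, whereas real Verilog leaves them uninitialized.

```scala
// Hypothetical, simplified model of $readmemh (not a Verilog implementation)
object ReadMemH {
  /** Load $readmemh-style text into a `capacity`-word memory. */
  def load(text: String, capacity: Int): Array[Long] = {
    val mem  = Array.fill(capacity)(0L) // unwritten words modeled as 0 (assumption)
    var addr = 0
    for (tok <- text.split("\\s+") if tok.nonEmpty) {
      if (tok.startsWith("@")) {
        addr = Integer.parseInt(tok.drop(1), 16) // address line: move the write pointer
      } else {
        mem(addr) = java.lang.Long.parseLong(tok, 16) // data word: store it, then advance
        addr += 1
      }
    }
    mem
  }

  def main(args: Array[String]): Unit = {
    val original = load("@0\n00400513\n@1\ndeadc2b7\n", 3)
    val shifted  = load("@1\n00400513\n@2\ndeadc2b7\n", 3)
    println(original.map(w => f"$w%08x").mkString(" "))
    println(shifted.map(w => f"$w%08x").mkString(" "))
  }
}
```

The shifted input mirrors the address tweak in the next part. Moving every address up by one leaves word 0 unwritten and needs one extra word of capacity.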

Tweak the addresses to verify the understanding

To verify our understanding of the addresses in the input file, let's make some modifications to the InstructionROM module:

  1. Change the txt format from f"@$i%x\n${instructions(i)}%08x\n" to f"@${i+1}%x\n${instructions(i)}%08x\n", so every word is placed one slot later
  2. Increase the capacity returned by readAsmBinary by 1 (instructions.length + 1), so the ROM has room for the shifted last word

Then, running the test again (with some printf added), we can see that the first data output by the InstructionROM module is now 00000000:

$ sbt "testOnly riscv.singlecycle.ByteAccessTest"
...
[warn] one warning found
InstructionROM output 00000000
Memory Poke          0, output instruction -> 00000000
InstructionROM output 00400513
Memory Poke          1, output instruction -> 00000000
InstructionROM output deadc2b7
Memory Poke          2, output instruction -> 00000000
InstructionROM output eef28293
Memory Poke          3, output instruction -> 00000000
CPU input instruction -> 00000000 (0)
InstructionROM output 00550023
Memory Poke          4, output instruction -> 00000000
InstructionROM output 00052303
Memory Poke          5, output instruction -> 00000000
InstructionROM output 01500913
Memory Poke          6, output instruction -> 00000000
InstructionROM output 012500a3
Memory Poke          7, output instruction -> 00000000

From the code, InstructionROM outputs the word stored at io.address on its io.data port. mem.read on a Mem is a combinational read, so with the tweak, address 0 now returns the unwritten slot.

In TestTopModule, the ROM output (instruction_rom.io.data) is connected to rom_loader.io.rom_data, and the ROM address is driven by rom_loader.io.rom_address. The load_address port of ROMLoader is set to Parameters.EntryAddress (0x1000):

class TestTopModule(exeFilename: String) extends Module {
  ...

  val mem             = Module(new Memory(8192))
  val instruction_rom = Module(new InstructionROM(exeFilename))
  val rom_loader      = Module(new ROMLoader(instruction_rom.capacity))

  rom_loader.io.rom_data     := instruction_rom.io.data
  rom_loader.io.load_address := Parameters.EntryAddress
  instruction_rom.io.address := rom_loader.io.rom_address
  ...
}

The implementation of ROMLoader writes each incoming word into RAM through the RAMBundle interface, starting from load_address. The word at ROM index i goes to byte address (i << 2) + load_address, i.e. load_address + 4 * i:

class ROMLoader(capacity: Int) extends Module {
  val io = IO(new Bundle {
    val bundle = Flipped(new RAMBundle)
    ...
  })

  val address = RegInit(0.U(32.W))
  val valid   = RegInit(false.B)
  ...
  when(address <= (capacity - 1).U) {
    io.bundle.write_enable := true.B
    io.bundle.write_data   := io.rom_data
    io.bundle.address      := (address << 2.U).asUInt + io.load_address
    io.bundle.write_strobe := VecInit(Seq.fill(Parameters.WordSize)(true.B))
    address                := address + 1.U
    when(address === (capacity - 1).U) {
      valid := true.B
    }
  }
  io.load_finished := valid
  io.rom_address   := address
}

Once all the instructions are loaded into RAM, ROMLoader sets load_finished. Completion is tracked by the address register counting up to capacity - 1, where capacity comes from InstructionROM. TestTopModule feeds load_finished to the CPU as instruction_valid, which signals the CPU to start fetching instructions from RAM.
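The address and valid registers can be mimicked cycle by cycle in plain Scala. This lets us check the resulting RAM addresses against the printf log further below, which shows 0x1000 to 0x102c for the 12 words of sb.asmbin. This is a hypothetical software model of the Chisel code above, not part of the lab; each call to step stands for one rising edge of the default clock.

```scala
// Hypothetical cycle-level model of ROMLoader's registers (not the lab code)
object ROMLoaderModel {
  /** The two registers of ROMLoader: `address` (next ROM index) and `valid` (load_finished). */
  final case class State(address: Int, valid: Boolean)

  /** One rising edge, mirroring `when(address <= (capacity - 1).U) { ... }`.
    * Returns the next state and the RAM byte address written in this cycle, if any. */
  def step(s: State, capacity: Int, loadAddress: Long): (State, Option[Long]) =
    if (s.address <= capacity - 1) {
      val ramAddress = (s.address.toLong << 2) + loadAddress // (address << 2) + load_address
      (State(s.address + 1, s.valid || s.address == capacity - 1), Some(ramAddress))
    } else {
      (s, None) // loading finished: the loader no longer writes
    }

  /** Run `cycles` edges from reset; return the final state and all RAM addresses written. */
  def run(capacity: Int, loadAddress: Long, cycles: Int): (State, List[Long]) = {
    var s       = State(address = 0, valid = false)
    val written = scala.collection.mutable.ListBuffer.empty[Long]
    for (_ <- 0 until cycles) {
      val (next, w) = step(s, capacity, loadAddress)
      written ++= w
      s = next
    }
    (s, written.toList)
  }

  def main(args: Array[String]): Unit = {
    // sb.asmbin: 9 program words + 3 nops = capacity 12, loaded at Parameters.EntryAddress
    val (s, written) = run(capacity = 12, loadAddress = 0x1000L, cycles = 16)
    println(written.map(a => f"$a%08x").mkString(" "))
    println(s"load_finished = ${s.valid}, rom_address = ${s.address}")
  }
}
```

In the model, rom_address stops at 12 (0xc) once loading is done, which matches the repeated "from 0000000c" lines in the log.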

Now, back to the test module. Each c.clock.step() advances the default clock by one cycle, during which InstructionROM outputs the word at rom_address and ROMLoader writes it into RAM. The poke in the loop only changes the debug read address (per its comment, to avoid a timeout). After all the instructions are loaded into RAM, the CPU starts fetching them from RAM:

class ByteAccessTest extends AnyFlatSpec with ChiselScalatestTester {
  ...
    test(new TestTopModule("sb.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 500) {
        c.clock.step()
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      ...
    }
  }
}

How the Assembly Codes are Executed

Next, let's check how the instructions are executed.

In TestTopModule, the CPU is placed in another clock domain driven by CPU_tick. CPU_tick is high only once every 4 cycles of the default clock, which is the clock used by InstructionROM, ROMLoader, and the RAM.

Why does the CPU module need another clock?

class TestTopModule(exeFilename: String) extends Module {
  val io = IO(new Bundle {
    val mem_debug_read_address  = Input(UInt(Parameters.AddrWidth))
    val regs_debug_read_address = Input(UInt(Parameters.PhysicalRegisterAddrWidth))
    val regs_debug_read_data    = Output(UInt(Parameters.DataWidth))
    val mem_debug_read_data     = Output(UInt(Parameters.DataWidth))
  })
  ...
  val CPU_clkdiv = RegInit(UInt(2.W), 0.U)
  val CPU_tick   = Wire(Bool())
  val CPU_next   = Wire(UInt(2.W))
  CPU_next   := Mux(CPU_clkdiv === 3.U, 0.U, CPU_clkdiv + 1.U)
  CPU_tick   := CPU_clkdiv === 0.U
  CPU_clkdiv := CPU_next

  withClock(CPU_tick.asClock) {
    val cpu = Module(new CPU)
    cpu.io.debug_read_address  := 0.U
    cpu.io.instruction_valid   := rom_loader.io.load_finished
    mem.io.instruction_address := cpu.io.instruction_address
    cpu.io.instruction         := mem.io.instruction

    when(!rom_loader.io.load_finished) {
      rom_loader.io.bundle <> mem.io.bundle
      cpu.io.memory_bundle.read_data := 0.U
    }.otherwise {
      rom_loader.io.bundle.read_data := 0.U
      cpu.io.memory_bundle <> mem.io.bundle
    }

    cpu.io.debug_read_address := io.regs_debug_read_address
    io.regs_debug_read_data   := cpu.io.debug_read_data
  }

  mem.io.debug_read_address := io.mem_debug_read_address
  io.mem_debug_read_data    := mem.io.debug_read_data
}
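The divider can be replayed in plain Scala to see the 4:1 ratio that also shows up in the printf log below (four default-cycle lines per CPU cycle). This is a hypothetical model. It treats the CPU as clocked on each 0 -> 1 transition of CPU_tick and assumes the tick level before reset is low.

```scala
// Hypothetical model of CPU_clkdiv / CPU_tick in TestTopModule (not the lab code)
object ClockDividerModel {
  /** CPU_clkdiv over `cycles` default-clock cycles: RegInit 0, then 0,1,2,3,0,... */
  def clkdiv(cycles: Int): List[Int] =
    Iterator.iterate(0)(d => if (d == 3) 0 else d + 1).take(cycles).toList

  /** CPU_tick = (CPU_clkdiv === 0); count its rising edges, i.e. CPU clock edges. */
  def cpuEdges(cycles: Int): Int = {
    val ticks = false :: clkdiv(cycles).map(_ == 0) // level before reset assumed low
    ticks.sliding(2).count {
      case List(prev, cur) => !prev && cur
      case _               => false
    }
  }

  def main(args: Array[String]): Unit = {
    println(clkdiv(8).mkString(",")) // 0,1,2,3,0,1,2,3
    println(cpuEdges(16))            // 4 CPU edges per 16 default cycles
  }
}
```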

How can ROMLoader access the Memory instance created in TestTopModule?

The trick is inside the when statement:

when(!rom_loader.io.load_finished) {
  rom_loader.io.bundle <> mem.io.bundle
  cpu.io.memory_bundle.read_data := 0.U
}.otherwise {
  rom_loader.io.bundle.read_data := 0.U
  cpu.io.memory_bundle <> mem.io.bundle
}

According to the official Chisel documentation, the bulk connection operator <> connects the identically named fields of two bundles. Each field is driven in the direction given by its Input/Output declaration, reversed for Flipped bundles.

So, while ROMLoader is still loading the instructions, the RAM (mem) in TestTopModule is connected to the RAMBundle of ROMLoader. Once the instructions have been loaded into RAM, the RAM stops accepting data from ROMLoader and is connected to the CPU's memory_bundle instead.

The code also wires the input port io.mem_debug_read_address to the RAM's mem.io.debug_read_address, so io.mem_debug_read_data returns the RAM word at the address poked by the test.

In addition to the debug port, cpu.io.instruction_address is also wired to the RAM, and the RAM returns the instruction at that address (mem.io.instruction) to the CPU.

So once every 4 default cycles (4 iterations of the c.clock.step() loop), the CPU is clocked. If instruction_valid is also set, it fetches the instruction at instruction_address.

Adding some printf calls to inspect the data during simulation, we can see that the output supports this reading:

$ sbt "testOnly riscv.singlecycle.ByteAccessTest"
...
[warn] one warning found
(default cycle) InstructionROM: output 00400513 from 00000000
(default cycle) ROMLoader: write 00400513 to 00001000           // first instruction loaded
(default cycle) InstructionROM: output deadc2b7 from 00000001
(default cycle) ROMLoader: write deadc2b7 to 00001004
(default cycle) InstructionROM: output eef28293 from 00000002
(default cycle) ROMLoader: write eef28293 to 00001008
(default cycle) InstructionROM: output 00550023 from 00000003
(default cycle) ROMLoader: write 00550023 to 0000100c
(CPU cycle) CPU: fetch 00400513 from 00001000 (valid: 0)
(default cycle) InstructionROM: output 00052303 from 00000004
(default cycle) ROMLoader: write 00052303 to 00001010
(default cycle) InstructionROM: output 01500913 from 00000005
(default cycle) ROMLoader: write 01500913 to 00001014
(default cycle) InstructionROM: output 012500a3 from 00000006
(default cycle) ROMLoader: write 012500a3 to 00001018
(default cycle) InstructionROM: output 00052083 from 00000007
(default cycle) ROMLoader: write 00052083 to 0000101c
(CPU cycle) CPU: fetch 00400513 from 00001000 (valid: 0)
(default cycle) InstructionROM: output 0000006f from 00000008
(default cycle) ROMLoader: write 0000006f to 00001020
(default cycle) InstructionROM: output 00000013 from 00000009
(default cycle) ROMLoader: write 00000013 to 00001024
(default cycle) InstructionROM: output 00000013 from 0000000a
(default cycle) ROMLoader: write 00000013 to 00001028
(default cycle) InstructionROM: output 00000013 from 0000000b
(default cycle) ROMLoader: write 00000013 to 0000102c           // last instruction loaded
(CPU cycle) CPU: fetch 00400513 from 00001000 (valid: 0)
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(CPU cycle) CPU: fetch 00400513 from 00001000 (valid: 1)        // CPU started
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(CPU cycle) CPU: fetch deadc2b7 from 00001004 (valid: 1)
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
...                                                             // running
(CPU cycle) CPU: fetch 00052083 from 0000101c (valid: 1)
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(CPU cycle) CPU: fetch 0000006f from 00001020 (valid: 1)
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(CPU cycle) CPU: fetch 0000006f from 00001020 (valid: 1)
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
(default cycle) InstructionROM: output 00000000 from 0000000c
(default cycle) ROMLoader: write 00000000 to 00000000
...                                                             // repeat `j loop`...

After the stepping loop ends (the program itself keeps spinning on j loop), the rest of the test selects registers through the regs_debug_read_address port and compares their values with the expected values:

class ByteAccessTest extends AnyFlatSpec with ChiselScalatestTester {
  ...
    test(new TestTopModule("sb.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      ...
      c.io.regs_debug_read_address.poke(5.U) // t0
      c.io.regs_debug_read_data.expect(0xdeadbeefL.U)
      c.io.regs_debug_read_address.poke(6.U) // t1
      c.io.regs_debug_read_data.expect(0xef.U)
      c.io.regs_debug_read_address.poke(1.U) // ra
      c.io.regs_debug_read_data.expect(0x15ef.U)
    }
  }
}
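The expected values follow from little-endian byte stores:
- lui/addi build t0 = 0xdeadbeef.
- sb t0 stores only its low byte, 0xef, into byte 0 of the word at address 4.
- sb s2 stores 21 = 0x15 into byte 1 of the same word, giving 0x15ef.

The small hypothetical plain-Scala model below reproduces these values, assuming the word at address 4 is 0 before the program runs.

```scala
// Hypothetical model of the sb.S data flow (assumes mem[4] starts as 0)
object ByteStoreModel {
  private val Mask32 = 0xffffffffL

  /** lui rd, hi20 followed by addi rd, rd, lo12 (lo12 already sign-extended). */
  def luiAddi(hi20: Long, lo12: Long): Long = ((hi20 << 12) + lo12) & Mask32

  /** sb: replace byte `offset` (0..3) of a little-endian 32-bit word with the low byte of `value`. */
  def sb(word: Long, offset: Int, value: Long): Long = {
    val shift = 8 * offset
    ((word & ~(0xffL << shift)) | ((value & 0xffL) << shift)) & Mask32
  }

  def main(args: Array[String]): Unit = {
    val t0    = luiAddi(0xdeadcL, -273L) // lui t0, 0xdeadc; addi t0, t0, -273
    val word1 = sb(0L, 0, t0)            // sb t0, 0(a0) with a0 = 4; lw t1 reads this word
    val word2 = sb(word1, 1, 21L)        // li s2, 21; sb s2, 1(a0); lw ra reads this word
    println(f"t0=$t0%08x t1=$word1%08x ra=$word2%08x") // t0=deadbeef t1=000000ef ra=000015ef
  }
}
```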

Custom Test

Address of .data Section

Now, let's create a simple custom test that runs our handwritten assembly code on MyCPU, to verify our understanding of the overall workflow.

First, define some data in the .data section and read the first word into register a0:

# csrc/simple.S
.data
mdata:
    .word 0x12345678
    .word 0x22345678
    .word 0x32345678
    .word 0x42345678

.text
main:
    li s1, 1234
    la s0, .data
    lw a0, 0(s0)
loop:
    j loop

Run make and inspect the generated .asmbin with xxd -e, which dumps little-endian 4-byte groups. The .data words appear right after the instructions:

00000000: 4d200493 00001417 ffc40413 00042503  .. M.........%..
00000010: 0000006f 12345678 22345678 32345678  o...xV4.xV4"xV42
00000020: 42345678                             xV4B

Then, add a custom test class to ca2023-lab3/src/test/scala/riscv/singlecycle/CPUTest.scala. After the program has run, it prints the first RAM word and s0, and checks the values of s1 and a0:

class SimpleTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Simple Test")
  it should "Test Handwritten Assembly on MyCPU" in {
    test(new TestTopModule("simple.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 50) {
        c.clock.step()
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      c.io.mem_debug_read_address.poke(0x1000.U) // first instruction
      c.clock.step()
      println(f"mem[${c.io.mem_debug_read_address.peek().litValue}%08x]: ${c.io.mem_debug_read_data.peek().litValue}%08x")

      c.io.regs_debug_read_address.poke(8.U) // s0
      println(f"s0: ${c.io.regs_debug_read_data.peek().litValue}%08x")
      
      c.io.regs_debug_read_address.poke(9.U) // s1
      c.io.regs_debug_read_data.expect(1234.U) // s1

      c.io.regs_debug_read_address.poke(10.U) // a0
      c.io.regs_debug_read_data.expect(0x12345678L.U)
    }
  }
}

Then, run this test on MyCPU with sbt "testOnly riscv.singlecycle.SimpleTest". The result is:

$ sbt "testOnly riscv.singlecycle.SimpleTest"
[info] welcome to sbt 1.9.7 (Temurin Java 1.8.0_392)
[info] loading settings for project lab3-build from plugins.sbt ...
[info] loading project definition from /lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/lab3/)
mem[00001000]: 4d200493
s0: 00000ffc
[info] SimpleTest:
[info] Simple Test
[info] - should Test Handwritten Assembly on MyCPU *** FAILED ***
[info]   io_regs_debug_read_data=0 (0x0) did not equal expected=305419896 (0x12345678) (lines in CPUTest.scala: 136, 120) (CPUTest.scala:136)
[info] Run completed in 6 seconds, 923 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 0, failed 1, canceled 0, ignored 0, pending 0
[info] *** 1 TEST FAILED ***
...

The debug message shows that s0, which should hold the address of the .data section, is 0x0FFC. However, consider the loading flow explained above and the generated txt file:

@0
4d200493
@1
00001417
@2
ffc40413
@3
00042503
@4
0000006f
@5
12345678
@6
22345678
@7
32345678
@8
42345678
@9
00000013
@a
00000013
@b
00000013

The .text section is loaded starting at 0x1000, and the .data section follows right after its five instructions (5 * 4 = 0x14 bytes), so .data should start at 0x1014. We can verify this by specifying the address statically in the code:

...
.text
main:
    li s1, 1234
    li s0, 0x1014
    lw a0, 0(s0)
loop:
    j loop

The result confirms this assumption:

$ sbt "testOnly riscv.singlecycle.SimpleTest"
...
mem[00001000]: 4d200493
s0: 00001014
[info] SimpleTest:
[info] Simple Test
[info] - should Test Handwritten Assembly on MyCPU
[info] Run completed in 7 seconds, 434 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success]
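For the TODO, it may help to decode what la s0, .data was assembled into. The first xxd line shows 00001417 (auipc s0, 0x1) and ffc40413 (addi s0, s0, -4). So s0 = PC + 0x1000 - 4, where PC is the address of the auipc.

The hypothetical helper below evaluates this expression. With the auipc at 0x1004 (the second word loaded from 0x1000), the result is 0x2000. The observed 0x0ffc is what the same formula gives with a PC operand of 0.

This suggests two things worth checking, though neither has been verified here:
- Whether the first ALU operand for auipc really is the instruction address.
- Where the linker places .data, since even a correct auipc would yield 0x2000 at runtime rather than 0x1014.

```scala
// Hypothetical check of the `la` expansion seen in simple.asmbin
object LaExpansionCheck {
  /** Sign-extend the low `bits` bits of `x`. */
  def signExtend(x: Long, bits: Int): Long = (x << (64 - bits)) >> (64 - bits)

  /** U-type immediate (auipc/lui): instruction bits 31..12 kept in place, sign-extended from 32 bits. */
  def uImm(inst: Long): Long = signExtend(inst & 0xfffff000L, 32)

  /** I-type immediate (addi/lw): instruction bits 31..20, sign-extended from 12 bits. */
  def iImm(inst: Long): Long = signExtend((inst >>> 20) & 0xfffL, 12)

  /** Value of rd after `auipc rd, U` at address `pc` followed by `addi rd, rd, I`. */
  def laResult(pc: Long, auipc: Long, addi: Long): Long =
    (pc + uImm(auipc) + iImm(addi)) & 0xffffffffL

  def main(args: Array[String]): Unit = {
    val (auipc, addi) = (0x00001417L, 0xffc40413L)
    println(f"auipc at 0x1004 -> ${laResult(0x1004L, auipc, addi)}%08x") // 00002000
    println(f"PC operand 0    -> ${laResult(0L, auipc, addi)}%08x")      // 00000ffc
  }
}
```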

TODO: solve the .data address problem

Port HW2 Code

References