Assignment3: Single-cycle RISC-V CPU

contributed by < Yao1201 >

tags: `RISC-V`, `jserv`

Prerequisites

Environment settings

The environment setup is explained in detail in Lab3: Construct a single-cycle CPU with Chisel.

Chisel Bootcamp

I learned Chisel by working through the exercises in Chisel Bootcamp.

Hello World in Chisel

// Hello World in Chisel
class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U;
  val cntReg  = RegInit(0.U(32.W))
  val blkReg  = RegInit(0.U(1.W))
  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg                                                         
  }
  io.led := blkReg
}

In this code, cntReg increments by one every cycle. When cntReg equals CNT_MAX, cntReg resets to 0 and blkReg toggles. The output io.led always reflects the current state of blkReg.

io.led is a 1-bit output driven by blkReg
CNT_MAX is an unsigned integer equal to 24999999
cntReg is a 32-bit unsigned integer register initialized to 0
blkReg is a 1-bit unsigned integer register initialized to 0, which is assigned to io.led

The code below is my enhancement of the original code, expressing the same logic with multiplexers:

// After enhancing
class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U;
  val cntReg  = RegInit(0.U(32.W))
  val blkReg  = RegInit(0.U(1.W))
  cntReg := Mux(cntReg === CNT_MAX, 0.U, cntReg + 1.U)  
  blkReg := Mux(cntReg === CNT_MAX, ~blkReg, blkReg)
  io.led := blkReg
}

I replaced the when block with Mux expressions, so each register's next value is selected explicitly by the condition cntReg === CNT_MAX.
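To check that the two versions behave the same, here is a minimal software model in Python. It is only a sketch: the function names are hypothetical, the counter limit is shrunk to 4 so the loop finishes quickly, and it models register updates as plain integer arithmetic rather than real hardware.

```python
# Software model of the blinker. cnt_max is shrunk so the loop finishes quickly.

def step_when(cnt, blk, cnt_max):
    """`when` style: default increment, overridden when the counter hits cnt_max."""
    next_cnt, next_blk = cnt + 1, blk
    if cnt == cnt_max:
        next_cnt, next_blk = 0, blk ^ 1  # reset the counter, toggle the 1-bit register
    return next_cnt, next_blk


def step_mux(cnt, blk, cnt_max):
    """`Mux` style: each register selects its next value with one conditional."""
    hit = cnt == cnt_max
    return (0 if hit else cnt + 1), ((blk ^ 1) if hit else blk)


def run(step, cycles, cnt_max):
    """Clock the model for `cycles` cycles and record blk after every cycle."""
    cnt, blk, trace = 0, 0, []
    for _ in range(cycles):
        cnt, blk = step(cnt, blk, cnt_max)
        trace.append(blk)
    return trace


when_trace = run(step_when, 40, 4)
mux_trace = run(step_mux, 40, 4)

# With the real constant, blkReg toggles once every CNT_MAX + 1 cycles.
toggle_period = (50_000_000 // 2 - 1) + 1
```

Both traces are identical, and with the real constant the LED toggles every 25,000,000 cycles. If the clock runs at 50 MHz (an assumption suggested by the constant 50000000, not stated in the code), that is one toggle every 0.5 seconds.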

Single-cycle RISC-V CPU

First, we need to complete the following files:

InstructionFetch.scala
InstructionDecode.scala
Execute.scala
CPU.scala

src/main/scala/riscv/core/*.scala

The completed code can be found in my repository on GitHub (forked from sysprog21/ca2023-lab3).

Single-cycle CPU architecture diagram

Implementation

After completing the files above, we can use the following command to run a single test:

$ sbt "testOnly riscv.singlecycle.InstructionFetchTest"

Waveform

If we set the environment variable WRITE_VCD to 1 while running the tests, waveform files are generated.

$ WRITE_VCD=1 sbt test

Afterward, we can find .vcd files in various subdirectories under the test_run_dir directory.

InstructionFetch

previous

current

In these two diagrams, io_instruction_address starts at 0x1000, and we can observe that when io_jump_flag_id = 0, pc = pc + 4.

previous

current

On the other hand, when io_jump_flag_id = 1, pc takes the value of io_jump_address_id, which equals 0x1000 (i.e. pc = io_jump_address_id).

We can compare this with InstructionFetchTest.scala:

class InstructionFetchTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("InstructionFetch of Single Cycle CPU")
  it should "fetch instruction" in {
    test(new InstructionFetch).withAnnotations(TestAnnotations.annos) { c =>
      val entry = 0x1000
      var pre   = entry
      var cur   = pre
      c.io.instruction_valid.poke(true.B)
      var x = 0
      for (x <- 0 to 100) {
        Random.nextInt(2) match {
          case 0 => // no jump
            cur = pre + 4
            c.io.jump_flag_id.poke(false.B)
            c.clock.step()
            c.io.instruction_address.expect(cur)
            pre = pre + 4
          case 1 => // jump
            c.io.jump_flag_id.poke(true.B)
            c.io.jump_address_id.poke(entry)
            c.clock.step()
            c.io.instruction_address.expect(entry)
            pre = entry
        }
      }
    }
  }
}

src/test/scala/riscv/singlecycle/InstructionFetchTest.scala

entry = 0x1000, which is why io_instruction_address = 0x1000 in the beginning.
When io.jump_flag_id = false, pre = pre + 4 (i.e. pc = pc + 4).
When io.jump_flag_id = true, pre = entry (i.e. pc = 0x1000).

This is consistent with the waveform analysis above.
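The update rule that the test checks can be sketched as a small Python model. The function name next_pc is hypothetical, and holding the pc when instruction_valid is false is an assumption, since the test always pokes instruction_valid to true.

```python
ENTRY = 0x1000  # same entry address as in InstructionFetchTest


def next_pc(pc, jump_flag, jump_address, instruction_valid=True):
    """Return the pc for the next cycle.

    Assumption: when the instruction is not valid, the pc is held.
    """
    if not instruction_valid:
        return pc
    return jump_address if jump_flag else pc + 4


# Replay a short sequence: two sequential fetches, one jump, one sequential fetch.
pc = ENTRY
trace = []
for jump in [False, False, True, False]:
    pc = next_pc(pc, jump, ENTRY)
    trace.append(pc)
```

The trace is 0x1004, 0x1008, 0x1000, 0x1004, which is the same pattern as pre and cur in the test.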

InstructionDecoder

In this diagram, we can see io_instruction = 0x00A02223. Based on the RISC-V ISA and the diagram below, we get opcode = 0x23, funct3 = 0x2, rs1 = 0, rs2 = 0xA, imm = 4. Hence the instruction is sw a0, 4(x0), since register x10 is a0. Because it is an S-type instruction, memory write should be enabled and memory read should not.

Back in the waveform diagram, the signals match these inferences: opcode = 0x23, io_memory_read_enable = 0, io_memory_write_enable = 1, and so on.
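The field extraction can be reproduced by hand with a few shifts and masks. The sketch below follows the S-type layout from the RISC-V specification; the helper names bits, sign_extend and decode_s_type are hypothetical and are not part of the lab code.

```python
def bits(word, hi, lo):
    """Extract bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Interpret a `width`-bit value as a two's-complement signed integer."""
    return value - (1 << width) if value & (1 << (width - 1)) else value


def decode_s_type(word):
    """Split an S-type word into its fields; imm is {word[31:25], word[11:7]}."""
    imm = (bits(word, 31, 25) << 5) | bits(word, 11, 7)
    return {
        "opcode": bits(word, 6, 0),
        "funct3": bits(word, 14, 12),
        "rs1": bits(word, 19, 15),
        "rs2": bits(word, 24, 20),
        "imm": sign_extend(imm, 12),
    }


fields = decode_s_type(0x00A02223)
```

For 0x00A02223 this gives opcode = 0x23, funct3 = 2, rs1 = 0, rs2 = 10 and imm = 4. Register x10 is a0 in the ABI naming.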

We can also compare this with InstructionDecoderTest.scala:

class InstructionDecoderTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("InstructionDecoder of Single Cycle CPU")
  it should "produce correct control signal" in {
    test(new InstructionDecode).withAnnotations(TestAnnotations.annos) { c =>
      c.io.instruction.poke(0x00a02223L.U) // S-type
      c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
      c.io.ex_aluop2_source.expect(ALUOp2Source.Immediate)
      c.io.regs_reg1_read_address.expect(0.U)
      c.io.regs_reg2_read_address.expect(10.U)
      c.clock.step()
    }
  }
}

src/test/scala/riscv/singlecycle/InstructionDecoderTest.scala

All of them align with our inferences.

Execute

R-type

In this diagram, opcode = 0x33, funct3 = 0x0 and funct7 = 0x0, which means the instruction is an add instruction. We can observe that:

io_op1 equals alu_io_op1 and io_reg1_data.
io_op2 equals alu_io_op2 and io_reg2_data.
io_result = alu_io_op1 + alu_io_op2.
io_if_jump_flag = 0.

B-type

In this diagram, opcode = 0x63 and funct3 = 0x0, which means the instruction is a beq instruction. At this moment, we can observe that:

io_aluop1_source and io_aluop2_source both equal 1.
Because io_reg1_data != io_reg2_data, io_if_jump_flag = 0.
alu_io_op1 = io_instruction_address and alu_io_op2 = io_immediate.
io_alu_funct = 1, which corresponds to ALUFunction.add.
io_if_jump_address = 4, which equals io_result = alu_io_op1 + alu_io_op2.

At the next moment, we can see io_reg1_data = io_reg2_data. Therefore, io_if_jump_flag changes to 1 and pc jumps to io_if_jump_address.

Assembly Code Adaptation from Homework2

To run and test the assembly code on MyCPU, we need to make the following modifications:

Because the single-cycle CPU lacks a system call for printing, we can't print the result directly. We need to modify the assembly code to store the result in memory instead.
To ensure compatibility between the programs used in Homework2 and MyCPU, remove the RDCYCLE/RDCYCLEH instructions.
Edit the Makefile to generate the .asmbin file, then put Hammingdist.S into the correct folder.
Add a test class, HammingdistTest, to CPUTest.scala.
Run the test.

The code below is only a partial snippet. You can find the complete code in ca2023-lab3 on GitHub (forked from sysprog21/ca2023-lab3).

1. Modify assembly code

In Homework2, I used a syscall to print an integer:

# print the result #
    li a7, SYSINT	 #"printint" syscall
    add a1, a0, x0      # address of string(move result of hd_cal to a1)
    li a0, 1 		 #1 = standard output (stdout)
    ecall               # print result of hd_cal

However, in Homework3, I store the result in memory instead:

# store the result in memory (0x4, 0x8, 0xC)
    addi s11, s11, 4    # advance the store address by 4 (s11 starts at 0)
    sw a0, 0(s11)

In addition, remove the RDCYCLE/RDCYCLEH instructions.

2. Edit the `Makefile` and regenerate RISC-V program

After the modifications mentioned above, we need to edit csrc/Makefile so that it generates Hammingdist.asmbin:

BINS = \
	fibonacci.asmbin \
	hello.asmbin \
	mmio.asmbin \
	quicksort.asmbin \
	sb.asmbin \
+	Hammingdist.asmbin

csrc/Makefile

Next, place the modified assembly code Hammingdist.S into csrc/.

To regenerate the RISC-V programs utilized for unit tests, change to the csrc directory and run the make update command. Ensure that the $PATH environment variable is correctly configured to include the GNU toolchain for RISC-V.

$ cd $HOME/riscv-none-elf-gcc
$ echo "export PATH=`pwd`/bin:$PATH" > setenv
$ cd $HOME
$ source riscv-none-elf-gcc/setenv

$ cd csrc
$ make update

3. Add functional test in `CPUTest.scala`

class HammingdistTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "Calculate hammingdistance of two sequences" in {
    test(new TestTopModule("Hammingdist.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 50) {
        c.clock.step(1000)
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      
      c.io.mem_debug_read_address.poke(4.U) //read memory(0x4)
      c.clock.step()
      c.io.mem_debug_read_data.expect(21.U) //expect result = 21
      c.io.mem_debug_read_address.poke(8.U) // read memory(0x8)
      c.clock.step()
      c.io.mem_debug_read_data.expect(63.U) //expect result = 63
      c.io.mem_debug_read_address.poke(12.U) //read memory(0xC)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0.U) //expect result = 0
    }
  }
}

src/test/scala/riscv/singlecycle/CPUTest.scala
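The expected values 21, 63 and 0 can be cross-checked on the host with a short reference computation. This is a sketch in Python, not the assembly implementation: it takes the three input pairs from the test_data_1 to test_data_3 entries in Hammingdist.S and counts the differing bits directly. The function name hamming_distance is hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # the inputs are 64-bit .dword values


def hamming_distance(a, b):
    """Number of bit positions in which two 64-bit values differ."""
    return bin((a ^ b) & MASK64).count("1")


# Input pairs copied from test_data_1..test_data_3 in Hammingdist.S.
TEST_DATA = [
    (0x0000000000100000, 0x00000000000FFFFF),
    (0x0000000000000001, 0x7FFFFFFFFFFFFFFE),
    (0x000000028370228F, 0x000000028370228F),
]

expected = [hamming_distance(a, b) for a, b in TEST_DATA]
```

The list comes out as [21, 63, 0], matching the values the test expects at addresses 4, 8 and 12.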

4. Run test

Finally, we can test our assembly code on MyCPU with the following commands:

$ cd $HOME/ca2023-lab3
$ sbt "testOnly riscv.singlecycle.HammingdistTest"

Here is the output when you pass the test :

lab02@ubuntu:~/ca2023-lab3$ sbt "testOnly riscv.singlecycle.HammingdistTest"
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /home/lab02/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/lab02/ca2023-lab3/)
[info] HammingdistTest:
[info] Single Cycle CPU
[info] - should Calculate hammingdistance of two sequences
[info] Run completed in 8 seconds, 665 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 11 s, completed Nov 29, 2023, 2:44:51 AM

Make Verilator

To test our program quickly, we can use Verilator for simulation. Before the first run and every time the Chisel code is modified, execute the following command in the project’s root directory to generate the Verilog files:

$ make verilator

After compilation, we can load the Hammingdist.asmbin file, simulate for 1000 cycles, and save the simulation waveform to dump1.vcd by running:

$ ./run-verilator.sh -instruction src/main/resources/Hammingdist.asmbin -time 2000 -vcd dump1.vcd

Output:

-time 2000
-memory 1048576
-instruction src/main/resources/Hammingdist.asmbin
[-------------------->] 100%

Then, run gtkwave dump1.vcd to analyze its waveform.

We can observe that the signal io_instruction begins with 00000000 and 00100000. To cross-check, let’s look at the hexadecimal dump of Hammingdist.asmbin:

$ hexdump src/main/resources/Hammingdist.asmbin | head -1

Its output:

0000000 0000 0010 0000 0000 ffff 000f 0000 0000

This aligns with the waveform.

Waveform analysis

The io_instruction_address starts from 0x1000, which corresponds to link.lds:
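Reading the dump takes one extra step, because hexdump prints 16-bit units by default. The sketch below regroups them into the 32-bit words the CPU fetches, assuming the dump was taken on a little-endian host so that the first unit of each pair is the low half. The function name hexdump_words is hypothetical.

```python
def hexdump_words(line):
    """Turn one default-format hexdump line into 32-bit little-endian words."""
    _offset, *units = line.split()
    halfwords = [int(u, 16) for u in units]
    # Each pair of 16-bit units forms one 32-bit word: low half first.
    return [lo | (hi << 16) for lo, hi in zip(halfwords[0::2], halfwords[1::2])]


words = hexdump_words("0000000 0000 0010 0000 0000 ffff 000f 0000 0000")
```

The first four words are 0x00100000, 0x00000000, 0x000FFFFF and 0x00000000, so the file starts with the 00100000 value seen on io_instruction. They are also the low and high halves of the two .dword values in test_data_1.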

OUTPUT_ARCH("riscv")
ENTRY(_start)

SECTIONS
{
  . = 0x00001000;
  .text : { *(.text.init) *(.text.startup) *(.text) }
  .data ALIGN(0x1000) : { *(.data*) *(.rodata*) *(.sdata*) }
  .bss : { *(.bss) }
  _end = .;
}

csrc/link.lds

Next, I analyze the assembly code in comparison with the waveform.

.section .text.Hammingdist
.global _start
.data
   test_data_1: .dword 0x0000000000100000, 0x00000000000FFFFF
   test_data_2: .dword 0x0000000000000001, 0x7FFFFFFFFFFFFFFE
   test_data_3: .dword 0x000000028370228F, 0x000000028370228F 

_start:  #0x1000
    addi sp, sp, -12
    # push pointers of test data onto the stack
    la t0, test_data_1
    sw t0, 0(sp)
    la t0, test_data_2
    ...

First instruction: addi sp, sp, -12

io_instruction_address = 0x1030
io_instruction = 0xFF410113
io_jump_flag_id = 0
io_immediate = -12
io_memory_read_enable = 0, io_memory_write_enable = 0

Because it's an I-type arithmetic instruction, it neither jumps nor accesses memory, so io_jump_flag_id = 0, io_memory_read_enable = 0 and io_memory_write_enable = 0.

In theory, addi is the first instruction after _start and should sit at the entry address 0x1000, but the waveform shows it at 0x1030. My speculation is that the preceding test data is laid out, and executed, first: the three test_data entries hold six 8-byte .dword values, 48 = 0x30 bytes in total, which would move addi to 0x1000 + 0x30 = 0x1030.

Next instruction: la t0, test_data_1

Because la is a pseudo-instruction, it is implemented with two instructions, 0x00000297 (auipc) and 0xFCC28293 (addi).

io_instruction_address = 0x1034, 0x1038
io_instruction = 0x00000297, 0xFCC28293
io_jump_flag_id = 0
io_immediate = 0 (for the first instruction, auipc)
io_memory_read_enable = 0, io_memory_write_enable = 0
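Decoding the two words shows which address la actually loads. The sketch below follows the U-type and I-type layouts from the RISC-V specification and uses the instruction address 0x1034 listed above. The helper names are hypothetical.

```python
def sign_extend(value, width):
    """Interpret a `width`-bit value as a two's-complement signed integer."""
    return value - (1 << width) if value & (1 << (width - 1)) else value


def decode_auipc(word):
    """Return (rd, imm) for a U-type auipc; imm is word[31:12] shifted left by 12."""
    assert word & 0x7F == 0x17, "not an auipc"
    return (word >> 7) & 0x1F, sign_extend(word & 0xFFFFF000, 32)


def decode_addi(word):
    """Return (rd, rs1, imm) for an I-type addi; imm is word[31:20], sign-extended."""
    assert word & 0x7F == 0x13 and (word >> 12) & 0x7 == 0, "not an addi"
    return (word >> 7) & 0x1F, (word >> 15) & 0x1F, sign_extend(word >> 20, 12)


auipc_pc = 0x1034                        # address of the auipc in the waveform
rd, upper = decode_auipc(0x00000297)     # auipc t0, 0
_, rs1, low = decode_addi(0xFCC28293)    # addi t0, t0, -52
t0 = (auipc_pc + upper + low) & 0xFFFFFFFF
```

Here rd and rs1 are both 5 (t0), the auipc immediate is 0, and the addi immediate is -52, so t0 ends up as 0x1034 - 0x34 = 0x1000. That is where test_data_1 would sit if the data is placed at the 0x1000 entry address.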

Next instruction: sw t0, 0(sp)

io_instruction_address = 0x0000103C
io_instruction = 0x00512023
io_jump_flag_id = 0
io_immediate = 0
io_memory_read_enable = 0, io_memory_write_enable = 1

Because it's an S-type instruction, io_memory_write_enable = 1.

Branch instruction: beq a1, zero, clz_lower_set_one

io_instruction = 0x02058A63
io_jump_flag_id = 1
pc = 0x1190
io_immediate = 0x34
io_jump_address_id = 0x000011C4, which equals to pc + io_immediate

Because it's a B-type instruction whose condition holds (a1 equals zero), io_jump_flag_id = 1.
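The immediate and the jump target can be recomputed from the instruction word. The sketch below follows the B-type layout from the RISC-V specification, where the immediate bits are scattered across the word. The helper names are hypothetical.

```python
def bits(word, hi, lo):
    """Extract bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Interpret a `width`-bit value as a two's-complement signed integer."""
    return value - (1 << width) if value & (1 << (width - 1)) else value


def b_type_imm(word):
    """Reassemble the 13-bit branch offset; bit 0 is always zero."""
    imm = (
        (bits(word, 31, 31) << 12)   # imm[12]
        | (bits(word, 7, 7) << 11)   # imm[11]
        | (bits(word, 30, 25) << 5)  # imm[10:5]
        | (bits(word, 11, 8) << 1)   # imm[4:1]
    )
    return sign_extend(imm, 13)


word = 0x02058A63
rs1, rs2 = bits(word, 19, 15), bits(word, 24, 20)  # x11 (a1) and x0 (zero)
imm = b_type_imm(word)
target = 0x1190 + imm  # pc + immediate
```

The decoded registers are x11 (a1) and x0, the immediate is 0x34, and the target is 0x1190 + 0x34 = 0x11C4, the same value as io_jump_address_id.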

Since io_jump_flag_id = 1, next pc will jump to io_jump_address_id = 0x000011C4. The waveform is shown as:

Reference

Lab3: Construct a single-cycle CPU with Chisel