Assignment3: single-cycle RISC-V CPU

contributed by < kkkkk1109 >

Introduction

In this assignment, we are asked to learn Chisel and implement a single-cycle RISC-V CPU. Following the steps in Lab3, I completed mycpu, the single-cycle CPU mentioned above, and used it to test my assembly code from Assignment 2. This report also explains the signals at each stage of mycpu and shows evidence that the tests pass.

Hello World in Chisel

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U;
  val cntReg  = RegInit(0.U(32.W))
  val blkReg  = RegInit(0.U(1.W))
  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
  • led is the output of this circuit
  • CNT_MAX is the unsigned integer 24999999
  • cntReg is a 32-bit register initialized to 0
  • blkReg is a 1-bit register initialized to 0

cntReg increases by one every cycle. When cntReg equals CNT_MAX, cntReg resets to 0 and the bit in blkReg flips. The output led is the value of blkReg.

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U;
  val cntReg  = RegInit(0.U(32.W))
  val blkReg  = RegInit(0.U(1.W))
  cntReg := Mux(cntReg === CNT_MAX,
                0.U, cntReg + 1.U)  
  blkReg := Mux(cntReg === CNT_MAX,
                ~blkReg, blkReg)
  io.led := blkReg

}

We can also implement Hello World in Chisel using only Mux: each register gets a single Mux that selects its next value.
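To check that the two formulations behave the same, here is a minimal software model in plain Scala (not Chisel). The names BlinkModel, stepWhen, stepMux and run are hypothetical, and cntMax is shrunk to 3 so the trace stays short; it is only a sketch of the register update rules described above.

```scala
object BlinkModel {
  // One clock cycle of the `when` version: the default assignment
  // (cnt + 1) is overridden when the counter reaches cntMax.
  def stepWhen(cnt: Long, blk: Int, cntMax: Long): (Long, Int) = {
    var nextCnt = cnt + 1
    var nextBlk = blk
    if (cnt == cntMax) {
      nextCnt = 0L
      nextBlk = blk ^ 1
    }
    (nextCnt, nextBlk)
  }

  // One clock cycle of the Mux version: each register gets a single
  // two-way selection.
  def stepMux(cnt: Long, blk: Int, cntMax: Long): (Long, Int) = {
    val hit = cnt == cntMax
    (if (hit) 0L else cnt + 1, if (hit) blk ^ 1 else blk)
  }

  // LED value observed on each of the first `cycles` cycles,
  // starting from the reset state (cnt = 0, blk = 0).
  def run(step: (Long, Int, Long) => (Long, Int), cntMax: Long, cycles: Int): List[Int] =
    Iterator.iterate((0L, 0)) { case (c, b) => step(c, b, cntMax) }
      .map(_._2).take(cycles).toList

  def main(args: Array[String]): Unit = {
    val cntMax = 3L // small stand-in for 24999999
    println(run(stepWhen, cntMax, 12).mkString)
    println(run(stepMux, cntMax, 12).mkString)
  }
}
```

Both lines printed by main should be identical, since the two update rules differ only in how they are written. With the real CNT_MAX, the LED toggles once every 25,000,000 cycles.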

Single-cycle RISC-V CPU

To complete the Single-cycle RISC-V CPU, we need to add code to Scala files in src/main/scala/riscv/core. The following strategies outline how to complete each module and what the code should look like when tests pass.

Instruction Fetch

Check whether jump_flag_id is true. If it is, set the program counter (pc) to jump_address; otherwise, add 4.U to pc.
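The selection rule can be sketched as a plain Scala function. This is a software reference model under stated assumptions, not the Chisel code from the lab; FetchModel and nextPc are hypothetical names, and the addresses in main are made up for illustration.

```scala
object FetchModel {
  // Next program counter for one cycle. The mask keeps the value within
  // 32 bits, as a 32-bit hardware register would.
  def nextPc(pc: Long, jumpFlag: Boolean, jumpAddress: Long): Long =
    (if (jumpFlag) jumpAddress else pc + 4) & 0xFFFFFFFFL

  def main(args: Array[String]): Unit = {
    // No jump: fall through to the next instruction.
    println(f"0x${nextPc(0x1000L, jumpFlag = false, jumpAddress = 0x2000L)}%08x")
    // Jump taken: pc is replaced by the jump address.
    println(f"0x${nextPc(0x1000L, jumpFlag = true, jumpAddress = 0x2000L)}%08x")
  }
}
```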

$ sbt "testOnly riscv.singlecycle.InstructionFetchTest"
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] Run completed in 4 seconds, 609 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 6 s, completed Nov 29, 2023, 10:13:46 PM

Instruction Decode

The part we need to fill in outputs the memory read and write enables, which are determined by decoding the instruction. If the instruction is of L-type (a load), set memory_read_enable to true; if it is of S-type (a store), set memory_write_enable to true.
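As a rough illustration of this decoding rule, the sketch below checks the 7-bit opcode field in plain Scala. The opcode values are the standard RV32I ones (0x03 for loads, 0x23 for stores); DecodeModel and its helper names are hypothetical and do not come from the lab code.

```scala
object DecodeModel {
  val OpcodeLoad  = 0x03 // 0b0000011: lb, lh, lw, lbu, lhu
  val OpcodeStore = 0x23 // 0b0100011: sb, sh, sw

  // The opcode occupies instruction bits 6..0.
  def opcode(instruction: Int): Int = instruction & 0x7f

  def memoryReadEnable(instruction: Int): Boolean =
    opcode(instruction) == OpcodeLoad

  def memoryWriteEnable(instruction: Int): Boolean =
    opcode(instruction) == OpcodeStore

  def main(args: Array[String]): Unit = {
    val lw   = 0x00032283 // lw   x5, 0(x6)
    val sw   = 0x00532023 // sw   x5, 0(x6)
    val addi = 0x00300093 // addi x1, x0, 3
    for (inst <- Seq(lw, sw, addi))
      println(f"0x$inst%08x read=${memoryReadEnable(inst)} write=${memoryWriteEnable(inst)}")
  }
}
```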

$ sbt "testOnly riscv.singlecycle.InstructionDecoderTest"
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] Run completed in 4 seconds, 820 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 12 s, completed Nov 29, 2023, 10:28:23 PM

Execute

In the Execute stage, we need to define the ALU inputs op1 and op2. At first, I wrote the code like this.

alu.io.op1 :=  io.reg1_data
alu.io.op2 :=  io.reg2_data

Even with this code, I still passed the Execute test.

$ sbt "testOnly riscv.singlecycle.ExecuteTest"
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] Run completed in 4 seconds, 685 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 10 s, completed Nov 29, 2023, 10:35:42 PM

However, this mistake caused the CPU tests to fail.

$ sbt test
[error] Failed tests:
[error] 	riscv.singlecycle.ByteAccessTest
[error] 	riscv.singlecycle.FibonacciTest
[error] 	riscv.singlecycle.QuicksortTest
[error] (Test / test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 31 s, completed Nov 29, 2023, 10:37:59 PM

It took me hours to debug, because I had passed the Execute test and assumed the problem was in the CPU code. I then found that I had missed a section in Lab3:

Taking ex_aluop1_source control signal as an example, this control signal determines the input for the first operand of the ALU. It assigns a value to ex_aluop1_source based on the opcode. When the instruction type is either auipc, jal, or B, ex_aluop1_source is set to 0, controlling the ALU’s first operand input to be the instruction address. In other cases, ex_aluop1_source is set to 1, controlling the ALU’s first operand input to be a register.

I had forgotten to select whether op1 and op2 should be an instruction address, a register value, or an immediate. After taking this into account, I passed all the tests.

[info] Run completed in 27 seconds, 916 milliseconds.
[info] Total number of tests run: 10
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 10, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
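The operand selection that was missing can be sketched in plain Scala as follows. This is a software model, not the lab's Chisel code: ExecuteModel, op1 and op2 are hypothetical names, the op1 rule follows the passage quoted from Lab3, and the op2 rule (register value only for R-type instructions, immediate otherwise) is an assumption made for this example.

```scala
object ExecuteModel {
  // RV32I major opcodes (instruction bits 6..0).
  val OpcodeAuipc  = 0x17
  val OpcodeJal    = 0x6f
  val OpcodeBranch = 0x63
  val OpcodeRType  = 0x33

  // First operand: the instruction address for auipc, jal and branches,
  // the register value otherwise.
  def op1(opcode: Int, instructionAddress: Long, reg1Data: Long): Long =
    if (Set(OpcodeAuipc, OpcodeJal, OpcodeBranch).contains(opcode)) instructionAddress
    else reg1Data

  // Second operand (assumed rule): the register value for R-type
  // instructions, the immediate otherwise.
  def op2(opcode: Int, reg2Data: Long, immediate: Long): Long =
    if (opcode == OpcodeRType) reg2Data else immediate

  def main(args: Array[String]): Unit = {
    // addi x1, x0, 3 (opcode 0x13): register 0 plus immediate 3.
    println(op1(0x13, 0x1000L, 0L) + op2(0x13, 0L, 3L))
    // jal at a made-up address 0x1004 with offset 8: address plus immediate.
    println(op1(OpcodeJal, 0x1004L, 0L) + op2(OpcodeJal, 0L, 8L))
  }
}
```

Wiring op1 and op2 directly to reg1_data and reg2_data corresponds to always taking the register branch of both functions, which gives wrong operands for any instruction that needs the instruction address or an immediate.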

Waveform

I wrote an assembly file, Instruction.s, in csrc to inspect the waveform.

.text
_start:
addi x1, x0, 3
exit:
j exit
  • Instruction Fetch
    When instruction_valid is on, instruction takes the value of the instruction_read_data signal. Also, when jump_flag_id is on, instruction_address becomes the jump address, which here is the label exit.
    image

  • Instruction Decode
    Since the instruction here is addi, aluop1_source is 0, which means the first operand is a register value, and aluop2_source is 1, which means the second operand is the immediate.
    image

When the instruction is j, aluop1_source becomes 1, which means the first operand is now the instruction address.
image

This stage also decodes the instruction into the opcode, rs, rd, register addresses, and so on.

  • Execute
    In this stage, the signals op1 and op2 carry the value 0 from x0 and the immediate 3, and the signal result shows 3. When the instruction changes to j, jump_flag goes to 1.

image

  • Memory
    There is no memory write or read in Instruction.s, so the memory_read_enable and memory_write_enable are both 0.
    image

  • Write Back
    In the write-back stage, the computed data or the data read from memory is written into the registers.
    image
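The field decoding mentioned for the Instruction Decode stage can be reproduced by hand for addi x1, x0, 3, whose RV32I encoding is 0x00300093. The plain Scala sketch below extracts the I-type fields; FieldModel and its function names are hypothetical and are not taken from the lab code.

```scala
object FieldModel {
  def opcode(inst: Int): Int = inst & 0x7f          // bits 6..0
  def rd(inst: Int): Int     = (inst >>> 7) & 0x1f  // bits 11..7
  def funct3(inst: Int): Int = (inst >>> 12) & 0x7  // bits 14..12
  def rs1(inst: Int): Int    = (inst >>> 15) & 0x1f // bits 19..15
  // I-type immediate: bits 31..20, sign-extended by the arithmetic shift.
  def immI(inst: Int): Int   = inst >> 20

  def main(args: Array[String]): Unit = {
    val addi = 0x00300093 // addi x1, x0, 3
    println(f"opcode=0x${opcode(addi)}%02x rd=x${rd(addi)} rs1=x${rs1(addi)} imm=${immI(addi)}")
  }
}
```

The decoded values (destination x1, source x0, immediate 3) match the operands 0 and 3 seen on op1 and op2 in the Execute stage.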

Modify handwritten RISC-V code in Assignment 2

I modified the assembly from Assignment 2 by removing the ecall and RDCYCLE/RDCYCLEH instructions. The result is stored in register s3.

Makefile

Add the assembly files hw3.s and instruction.s to BINS so that the corresponding .asmbin files are generated.

CROSS_COMPILE ?= riscv-none-elf-

ASFLAGS = -march=rv32i_zicsr -mabi=ilp32
CFLAGS = -O0 -Wall -march=rv32i_zicsr -mabi=ilp32
LDFLAGS = --oformat=elf32-littleriscv

AS := $(CROSS_COMPILE)as
CC := $(CROSS_COMPILE)gcc
LD := $(CROSS_COMPILE)ld
OBJCOPY := $(CROSS_COMPILE)objcopy

%.o: %.S
	$(AS) -R $(ASFLAGS) -o $@ $<
%.elf: %.S
	$(AS) -R $(ASFLAGS) -o $(@:.elf=.o) $<
	$(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o)
%.elf: %.c init.o
	$(CC) $(CFLAGS) -c -o $(@:.elf=.o) $<
	$(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o) init.o

%.asmbin: %.elf
	$(OBJCOPY) -O binary -j .text -j .data $< $@

BINS = \
	fibonacci.asmbin \
	hello.asmbin \
	mmio.asmbin \
	quicksort.asmbin \
	sb.asmbin\
+	hw3.asmbin\
+	instruction.asmbin\

# Clear the .DEFAULT_GOAL special variable, so that the following turns
# to the first target after .DEFAULT_GOAL is not set.
.DEFAULT_GOAL :=

all: $(BINS)

update: $(BINS)
	cp -f $(BINS) ../src/main/resources

clean:
	$(RM) *.o *.elf *.asmbin

$ make update

In src/test/scala/riscv/singlecycle/CPUTest.scala, add the test file

class hw3 extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "do bfloat16 multiplication" in {
    test(new TestTopModule("hw3.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 500) {
        c.clock.step()
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      c.io.regs_debug_read_address.poke(19.U) // s3 (x19)
      c.io.regs_debug_read_data.expect(0x43000000.U)

    }
  }
}

In my assembly code, the result of the bfloat16 multiplication is stored in s3, which is x19, and the result of 1.29999 * 99.09999 should be 0x43000000.
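The expected value can be cross-checked with a small software model in plain Scala. It assumes that bfloat16 conversion keeps the upper 16 bits of the float32 pattern (truncation), which may differ from the rounding used in the assembly; Bf16Model, toBf16Bits and mul are hypothetical names.

```scala
object Bf16Model {
  // Keep the upper 16 bits of the float32 bit pattern (truncation is an
  // assumption of this sketch), leaving the lower 16 bits zero.
  def toBf16Bits(x: Float): Int =
    java.lang.Float.floatToIntBits(x) & 0xFFFF0000

  // Multiply two values after reducing each to bfloat16 precision, then
  // reduce the product to bfloat16 precision as well.
  def mul(a: Float, b: Float): Int = {
    val fa = java.lang.Float.intBitsToFloat(toBf16Bits(a))
    val fb = java.lang.Float.intBitsToFloat(toBf16Bits(b))
    toBf16Bits(fa * fb)
  }

  def main(args: Array[String]): Unit =
    println(f"0x${mul(1.29999f, 99.09999f)}%08x")
}
```

Under this assumption the operands become 1.296875 and 99.0, their product is 128.390625, and its upper 16 bits are 0x4300.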
Then, run the test file.

$ sbt test
[info] - should do bfloat16 multiplication *** FAILED ***
[info]   io_regs_debug_read_data=133 (0x85) did not equal expected=1124073472 (0x43000000) (lines in CPUTest.scala: 126, 120) (CPUTest.scala:126)
[info] *** 1 TEST FAILED ***
[error] Failed tests:
[error] 	riscv.singlecycle.hw3
[error] (Test / test) sbt.TestsFailedException: Tests unsuccessful

The test failed: the register value does not match the expected value 0x43000000. I used GTKWave to check the signals.

$ ./run-verilator.sh -instruction src/main/resources/hw3.asmbin -time 2000 -vcd dump.vcd
$ gtkwave dump.vcd

image

The waveform shows that at cycle 500 the program is only halfway through. By cycle 681, register s3 holds the expected value.

image

Increase the number of simulated cycles to 1000.

class hw3 extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "do bfloat16 multiplication" in {
    test(new TestTopModule("hw3.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 1000) {
        c.clock.step()
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      c.io.regs_debug_read_address.poke(19.U) // s3 (x19)
      c.io.regs_debug_read_data.expect(0x43000000.U)
    }
  }
}

All tests passed!

[info] hw3:
[info] Single Cycle CPU
[info] - should do bfloat16 multiplication
[info] Run completed in 29 seconds, 278 milliseconds.
[info] Total number of tests run: 10
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 10, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.