contributed by < kkkkk1109 >
In this assignment, we are asked to learn Chisel and implement a single-cycle RISC-V CPU. Following the steps in Lab3, I completed mycpu, the single-cycle CPU mentioned above, and tested my assembly code from Assignment 2 on it. This report also explains the signals at the different stages of mycpu and includes evidence of successful tests.
class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg  = RegInit(0.U(32.W))
  val blkReg  = RegInit(0.U(1.W))
  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
led is the output of this circuit.
CNT_MAX is the unsigned integer 24999999.
cntReg is a 32-bit register initialized to 0.
blkReg is a 1-bit register initialized to 0.
cntReg increases by one every cycle. When cntReg equals CNT_MAX, cntReg is reset to 0 and the bit in blkReg flips. The output led is the value in blkReg.
class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg  = RegInit(0.U(32.W))
  val blkReg  = RegInit(0.U(1.W))
  cntReg := Mux(cntReg === CNT_MAX, 0.U, cntReg + 1.U)
  blkReg := Mux(cntReg === CNT_MAX, ~blkReg, blkReg)
  io.led := blkReg
}
The when block can simply be replaced with Mux to implement the same Hello World in Chisel.
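To check that the when version and the Mux version describe the same behavior, the register updates can be mirrored in a small plain-Scala software model. This is only a sketch, not Chisel hardware: BlinkState, BlinkModel, step, and run are hypothetical names, and cntMax is made a parameter so the wrap-around can be checked with a small value instead of 24999999.

```scala
// Plain-Scala software model of the blink counter (not Chisel hardware).
// cnt models cntReg, led models blkReg. All names here are illustrative.
final case class BlinkState(cnt: Long, led: Int)

object BlinkModel {
  // One clock edge: mirrors the two Mux expressions in the Hello module.
  def step(s: BlinkState, cntMax: Long): BlinkState = {
    val wrap = s.cnt == cntMax
    BlinkState(
      cnt = if (wrap) 0L else s.cnt + 1L, // Mux(cntReg === CNT_MAX, 0.U, cntReg + 1.U)
      led = if (wrap) s.led ^ 1 else s.led // Mux(cntReg === CNT_MAX, ~blkReg, blkReg)
    )
  }

  // Run n clock cycles starting from the reset state (both registers 0).
  def run(n: Int, cntMax: Long): BlinkState =
    (0 until n).foldLeft(BlinkState(0L, 0))((s, _) => step(s, cntMax))
}
```

In this model the LED value flips once every cntMax + 1 cycles, so with CNT_MAX = 24999999 it flips every 25,000,000 cycles. If the clock were 50 MHz (an assumption suggested by the constant 50000000, not stated in the lab text here), that would be a toggle every half second.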
To complete the single-cycle RISC-V CPU, we need to add code to the Scala files in src/main/scala/riscv/core. The following sections outline how each module is completed and show the test output once the tests pass.
Check whether jump_flag_id is true. If it is, set the program counter (pc) to jump_address; otherwise, add 4.U to pc.
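The selection described above can be sketched as a plain-Scala function. This is a software model under stated assumptions, not the Chisel code from the lab: PcModel and nextPc are hypothetical names, and the program counter is assumed to be 32 bits wide.

```scala
// Software model of the next-PC choice in the instruction fetch stage.
// jumpFlag and jumpAddress stand in for jump_flag_id and jump_address.
object PcModel {
  def nextPc(pc: Long, jumpFlag: Boolean, jumpAddress: Long): Long = {
    val next = if (jumpFlag) jumpAddress else pc + 4L
    next & 0xFFFFFFFFL // assumption: pc is a 32-bit register, so it wraps
  }
}
```

For example, nextPc(0x1000L, false, 0L) moves on to the following instruction, while nextPc(0x1000L, true, 0x2000L) redirects fetch to the jump target.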
$ sbt "testOnly riscv.singlecycle.InstructionFetchTest"
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] Run completed in 4 seconds, 609 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 6 s, completed Nov 29, 2023, 10:13:46 PM
The blank to fill in here is the output of the memory read and write enables. To determine them, decode the instruction: if it is of L-type (a load), set memory_read_enable to true; if it is of S-type (a store), set memory_write_enable to true.
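A minimal plain-Scala sketch of this decode step follows. It is a software model rather than the lab's Chisel decoder: MemEnableModel and memEnables are hypothetical names, and the two opcode values are the RV32I base opcodes for loads and stores.

```scala
// Software model: derive the memory enables from the opcode field.
object MemEnableModel {
  // Base opcodes (instruction bits 6..0) from the RV32I encoding.
  val OpcodeLoad  = 0x03 // 0b0000011: lb, lh, lw, lbu, lhu
  val OpcodeStore = 0x23 // 0b0100011: sb, sh, sw

  // Returns (memory_read_enable, memory_write_enable) for a 32-bit instruction word.
  def memEnables(instruction: Long): (Boolean, Boolean) = {
    val opcode = (instruction & 0x7FL).toInt
    (opcode == OpcodeLoad, opcode == OpcodeStore)
  }
}
```

As a usage example, 0x00012083 encodes lw x1, 0(x2) and enables only the read, 0x00112023 encodes sw x1, 0(x2) and enables only the write, and 0x00300093 (addi x1, x0, 3) enables neither.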
$ sbt "testOnly riscv.singlecycle.InstructionDecoderTest"
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] Run completed in 4 seconds, 820 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 12 s, completed Nov 29, 2023, 10:28:23 PM
In the Execute stage, we need to define the ALU inputs op1 and op2. At first, I wrote the code like this.
alu.io.op1 := io.reg1_data
alu.io.op2 := io.reg2_data
Even with this code, the Execute test still passed.
$ sbt "testOnly riscv.singlecycle.ExecuteTest"
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] Run completed in 4 seconds, 685 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 10 s, completed Nov 29, 2023, 10:35:42 PM
However, this mistake caused the CPU tests to fail.
$ sbt test
[error] Failed tests:
[error] riscv.singlecycle.ByteAccessTest
[error] riscv.singlecycle.FibonacciTest
[error] riscv.singlecycle.QuicksortTest
[error] (Test / test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 31 s, completed Nov 29, 2023, 10:37:59 PM
It took me hours to debug, because I had passed the Execute test and assumed the CPU code was at fault. I then found that I had missed a section in Lab3:
Taking ex_aluop1_source control signal as an example, this control signal determines the input for the first operand of the ALU. It assigns a value to ex_aluop1_source based on the opcode. When the instruction type is either auipc, jal, or B, ex_aluop1_source is set to 0, controlling the ALU’s first operand input to be the instruction address. In other cases, ex_aluop1_source is set to 1, controlling the ALU’s first operand input to be a register.
I had forgotten to check whether op1 and op2 should be an instruction address, a register value, or an immediate. After taking this into account, I passed all the tests.
[info] Run completed in 27 seconds, 916 milliseconds.
[info] Total number of tests run: 10
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 10, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
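The operand selection that fixed the failing tests can be sketched in plain Scala. This is a software model, not the lab's Execute module: AluOperandModel and its members are hypothetical names, the opcode constants are the RV32I base opcodes, and the rule for op1 follows the Lab3 passage quoted above. The two selectors are passed as booleans so the sketch does not depend on how the control signals are numerically encoded.

```scala
// Software model of choosing the ALU operands from the control signals.
object AluOperandModel {
  // RV32I base opcodes (instruction bits 6..0).
  val OpcodeAuipc  = 0x17
  val OpcodeJal    = 0x6F
  val OpcodeBranch = 0x63 // B-type

  // Per the Lab3 passage: auipc, jal, and B-type use the instruction address as op1.
  def op1IsInstructionAddress(opcode: Int): Boolean =
    opcode == OpcodeAuipc || opcode == OpcodeJal || opcode == OpcodeBranch

  // Returns (op1, op2) given the two selectors and the candidate values.
  def selectOperands(op1FromPc: Boolean, op2FromImm: Boolean,
                     pc: Long, reg1: Long, reg2: Long, imm: Long): (Long, Long) =
    (if (op1FromPc) pc else reg1, if (op2FromImm) imm else reg2)
}
```

Wiring op1 and op2 directly to reg1_data and reg2_data corresponds to calling selectOperands with both selectors false, which gives wrong operands for any instruction that needs the instruction address or an immediate.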
I wrote a short assembly program, Instruction.s, in csrc to examine the waveform.
.text
_start:
addi x1, x0, 3
exit:
j exit
Instruction Fetch
When instruction_valid is asserted, instruction takes the value of the instruction_read_data signal. Also, when jump_flag_id is asserted, instruction_address becomes the jump address, which here is the label exit.
Instruction Decode
Since the instruction here is addi, aluop1_source is 0, which means the first operand comes from a register, while aluop2_source is 1, which means the second operand is an immediate.
When the instruction is j, aluop1_source becomes 1, which means the first operand is now the instruction address.
This stage also decodes the instruction into the opcode, the register addresses (rs, rd), and so on.
Execute
op1 and op2 are the value 0 from x0 and the immediate 3, and the signal result shows the result 3. When the instruction changes to j, jump_flag goes to 1.
Memory
There is no memory read or write in Instruction.s, so memory_read_enable and memory_write_enable are both 0.
Write Back
In the write-back stage, the computed result or the data read from memory is written into the registers.
I modified the assembly from Assignment 2 by removing the ecall and RDCYCLE/RDCYCLEH instructions, so that the result is stored in register s3.
Then I added the assembly files hw3.s and instruction.s to the Makefile so that the corresponding .asmbin files are generated.
CROSS_COMPILE ?= riscv-none-elf-
ASFLAGS = -march=rv32i_zicsr -mabi=ilp32
CFLAGS = -O0 -Wall -march=rv32i_zicsr -mabi=ilp32
LDFLAGS = --oformat=elf32-littleriscv
AS := $(CROSS_COMPILE)as
CC := $(CROSS_COMPILE)gcc
LD := $(CROSS_COMPILE)ld
OBJCOPY := $(CROSS_COMPILE)objcopy
%.o: %.S
	$(AS) -R $(ASFLAGS) -o $@ $<
%.elf: %.S
	$(AS) -R $(ASFLAGS) -o $(@:.elf=.o) $<
	$(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o)
%.elf: %.c init.o
	$(CC) $(CFLAGS) -c -o $(@:.elf=.o) $<
	$(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o) init.o
%.asmbin: %.elf
	$(OBJCOPY) -O binary -j .text -j .data $< $@
BINS = \
fibonacci.asmbin \
hello.asmbin \
mmio.asmbin \
quicksort.asmbin \
sb.asmbin\
+ hw3.asmbin\
+ instruction.asmbin\
# Clear the .DEFAULT_GOAL special variable, so that the following turns
# to the first target after .DEFAULT_GOAL is not set.
.DEFAULT_GOAL :=
all: $(BINS)
update: $(BINS)
	cp -f $(BINS) ../src/main/resources
clean:
	$(RM) *.o *.elf *.asmbin
$ make update
In src/test/scala/riscv/singlecycle/CPUTest.scala, add the following test.
class hw3 extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "do bfloat16 multiplication" in {
    test(new TestTopModule("hw3.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 500) {
        c.clock.step()
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      c.io.regs_debug_read_address.poke(19.U) // s3 (x19)
      c.io.regs_debug_read_data.expect(0x43000000.U)
    }
  }
}
In my assembly code, the result of the bfloat16 multiplication is stored in s3, which is x19, and the product of 1.29999 and 99.09999 should be 0x43000000.
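The expected value can be cross-checked with a small plain-Scala model. This is a sketch under an assumption: it takes bfloat16 to be the upper 16 bits of the float32 bit pattern and converts by truncation, which may differ from how the Assignment 2 assembly (not shown here) performs the conversion and the multiplication. Bf16Model, toBf16Bits, and mul are hypothetical names.

```scala
// Software model of bfloat16 multiplication via float32 bit patterns.
object Bf16Model {
  // Keep the upper 16 bits of the float32 pattern (truncation, no rounding).
  def toBf16Bits(x: Float): Int =
    java.lang.Float.floatToRawIntBits(x) & 0xFFFF0000

  // Truncate both inputs to bfloat16, multiply as floats, truncate the product.
  def mul(a: Float, b: Float): Int = {
    val fa = java.lang.Float.intBitsToFloat(toBf16Bits(a))
    val fb = java.lang.Float.intBitsToFloat(toBf16Bits(b))
    toBf16Bits(fa * fb)
  }
}
```

Under this model, 1.29999 becomes 0x3FA60000 (1.296875) and 99.09999 becomes 0x42C60000 (99.0). Their product is 128.390625, which truncates to 0x43000000 (128.0), matching the value the test expects in x19.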
Then, run the test file.
$ sbt test
[info] - should do bfloat16 multiplication *** FAILED ***
[info] io_regs_debug_read_data=133 (0x85) did not equal expected=1124073472 (0x43000000) (lines in CPUTest.scala: 126, 120) (CPUTest.scala:126)
[info] *** 1 TEST FAILED ***
[error] Failed tests:
[error] riscv.singlecycle.hw3
[error] (Test / test) sbt.TestsFailedException: Tests unsuccessful
The test failed: the register value does not match the expected value 0x43000000. I used GTKWave to inspect the signals.
$ ./run-verilator.sh -instruction src/main/resources/hw3.asmbin -time 2000 -vcd dump.vcd
$ gtkwave dump.vcd
With the 500-cycle limit, the program only gets about halfway through. The waveform shows that register s3 holds the expected value by cycle 681.
I therefore changed the number of cycles to 1000.
class hw3 extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "do bfloat16 multiplication" in {
    test(new TestTopModule("hw3.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 1000) {
        c.clock.step()
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      c.io.regs_debug_read_address.poke(19.U) // s3 (x19)
      c.io.regs_debug_read_data.expect(0x43000000.U)
    }
  }
}
All tests passed!
[info] hw3:
[info] Single Cycle CPU
[info] - should do bfloat16 multiplication
[info] Run completed in 29 seconds, 278 milliseconds.
[info] Total number of tests run: 10
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 10, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.