contributed by < Yao1201 >
RISC-V, jserv
There is a very detailed explanation in Lab3: Construct a single-cycle CPU with Chisel.
I used Chisel Bootcamp to learn Chisel by completing the exercises it provides.
// Hello World in Chisel
import chisel3._

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg = RegInit(0.U(32.W))
  val blkReg = RegInit(0.U(1.W))
  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
This code implies that when cntReg equals CNT_MAX, cntReg resets to 0, blkReg toggles its state, and the output is the current state of blkReg. Otherwise, cntReg increases by one on each cycle.
- io.led is a 1-bit unsigned integer output.
- CNT_MAX is an unsigned integer equal to 24999999.
- cntReg is a 32-bit unsigned integer register initialized to 0.
- blkReg is a 1-bit unsigned integer register initialized to 0, which is assigned to io.led.
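The value of CNT_MAX fixes how often the LED toggles. The short Python calculation below works that out; it is only a sketch, and the 50 MHz clock frequency is an assumption suggested by the constant 50000000 rather than something the source states.

```python
# Assumption: the board clock runs at 50 MHz (suggested by the constant
# 50000000 in the Chisel code, but not stated in the source).
CLOCK_HZ = 50_000_000

CNT_MAX = CLOCK_HZ // 2 - 1              # same expression as in the Chisel code
cycles_per_toggle = CNT_MAX + 1          # cntReg counts 0..CNT_MAX inclusive
toggle_interval_s = cycles_per_toggle / CLOCK_HZ
blink_period_s = 2 * toggle_interval_s   # a full off/on cycle needs two toggles

print(CNT_MAX, cycles_per_toggle, toggle_interval_s, blink_period_s)
```

Under that assumption, blkReg toggles every half second, so the LED completes one blink per second.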
The code below is my enhancement of the original code using logic circuits:
// After enhancing
class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg = RegInit(0.U(32.W))
  val blkReg = RegInit(0.U(1.W))
  cntReg := Mux(cntReg === CNT_MAX, 0.U, cntReg + 1.U)
  blkReg := Mux(cntReg === CNT_MAX, ~blkReg, blkReg)
  io.led := blkReg
}
I replaced the when block with Mux expressions, so each register's next value is selected by an explicit multiplexer.
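To check that the two versions behave the same, the register updates can be modeled in plain Python. This is a behavioral sketch, not Chisel: the helper names (step_when, step_mux, mux, run) and the small limit SMALL_MAX are hypothetical and exist only for this illustration.

```python
def step_when(cnt, blk, cnt_max):
    """Model of the when-based version: default increment, overridden on a match."""
    next_cnt = cnt + 1
    next_blk = blk
    if cnt == cnt_max:
        next_cnt = 0
        next_blk = blk ^ 1  # ~blkReg on a 1-bit register
    return next_cnt, next_blk


def mux(sel, if_true, if_false):
    """Two-input multiplexer, same argument order as Chisel's Mux."""
    return if_true if sel else if_false


def step_mux(cnt, blk, cnt_max):
    """Model of the Mux-based version."""
    hit = cnt == cnt_max
    return mux(hit, 0, cnt + 1), mux(hit, blk ^ 1, blk)


def run(step, cycles, cnt_max):
    """Record the LED value (blk) seen at the start of each cycle."""
    cnt, blk, trace = 0, 0, []
    for _ in range(cycles):
        trace.append(blk)
        cnt, blk = step(cnt, blk, cnt_max)
    return trace


SMALL_MAX = 3  # hypothetical small limit so the trace stays short
trace_when = run(step_when, 12, SMALL_MAX)
trace_mux = run(step_mux, 12, SMALL_MAX)
print(trace_when)
print(trace_when == trace_mux)
```

Both traces toggle every SMALL_MAX + 1 cycles, which suggests the rewrite changes only how the logic is expressed, not what it does.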
First, we need to complete the following files:
src/main/scala/riscv/core/*.scala
The completed code can be found in my repository on GitHub (forked from sysprog21/ca2023-lab3).
After completing the files above, we can use the following command to run a single test:
$ sbt "testOnly riscv.singlecycle.InstructionFetchTest"
If we set the environment variable WRITE_VCD to 1 while running tests, waveform files will be generated.
$ WRITE_VCD=1 sbt test
Afterward, we can find .vcd files in various subdirectories under the test_run_dir directory.
previous
current
From these two diagrams, in the beginning, io_instruction_address = 0x1000, and we can observe that when io_jump_flag_id = 0, pc = pc + 4.
previous
current
On the other hand, when io_jump_flag_id = 1, pc takes the value of io_jump_address_id, which equals 0x1000 (i.e. pc = io_jump_address_id).
We can compare this with InstructionFetchTest.scala:
class InstructionFetchTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("InstructionFetch of Single Cycle CPU")
  it should "fetch instruction" in {
    test(new InstructionFetch).withAnnotations(TestAnnotations.annos) { c =>
      val entry = 0x1000
      var pre = entry
      var cur = pre
      c.io.instruction_valid.poke(true.B)
      var x = 0
      for (x <- 0 to 100) {
        Random.nextInt(2) match {
          case 0 => // no jump
            cur = pre + 4
            c.io.jump_flag_id.poke(false.B)
            c.clock.step()
            c.io.instruction_address.expect(cur)
            pre = pre + 4
          case 1 => // jump
            c.io.jump_flag_id.poke(true.B)
            c.io.jump_address_id.poke(entry)
            c.clock.step()
            c.io.instruction_address.expect(entry)
            pre = entry
        }
      }
    }
  }
}
src/test/scala/riscv/singlecycle/InstructionFetchTest.scala
- entry = 0x1000, which is why io_instruction_address = 0x1000 in the beginning.
- When io.jump_flag_id = false, pre = pre + 4 (i.e. pc = pc + 4).
- When io.jump_flag_id = true, pre = entry (i.e. pc = 0x1000).
This is consistent with our previous analysis.
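The rule the test exercises can be written as a small Python model. This is a sketch of the expected behavior only, not the Chisel implementation: next_pc and simulate are hypothetical helper names, and the model ignores io.instruction_valid, which the test always holds at true.

```python
import random

ENTRY = 0x1000  # entry address used by the test


def next_pc(pc, jump_flag, jump_address):
    """Assumed next-PC rule: take the jump target when the flag is set, else pc + 4."""
    return jump_address if jump_flag else pc + 4


def simulate(flags, entry=ENTRY):
    """Apply one jump flag per cycle; like the test, every jump targets entry."""
    pc = entry
    seen = [pc]
    for flag in flags:
        pc = next_pc(pc, flag, entry)
        seen.append(pc)
    return seen


trace = simulate([0, 0, 1, 0])
print([hex(a) for a in trace])

# Randomized run in the spirit of the test (fixed seed so it is repeatable).
rng = random.Random(0)
random_trace = simulate([rng.randint(0, 1) for _ in range(100)])
```

In the short trace, the address advances by 4 twice, returns to 0x1000 on the jump, and then advances by 4 again.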
In this diagram, we can see io_instruction = 0x00A02223. Based on the RISC-V ISA and the diagram below, we get opcode = 0x23, funct3 = 0x2, rs1 = 0, rs2 = 0xA, imm = 4. Hence, we can infer that the instruction is sw a0, 4(x0), since register x10 is a0. Because it is an S-type instruction, memory write should be enabled and memory read should not.
Back in the waveform diagram, the signals match these inferences: opcode = 0x23, io_memory_read_enable = 0, io_memory_write_enable = 1, and so on.
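The field extraction above can be reproduced with a few lines of Python. This is a sketch based on the standard RV32I S-type layout; the helper names (bits, sign_extend, decode_s_type) are hypothetical and not part of the lab code.

```python
def bits(word, hi, lo):
    """Return word[hi:lo] (inclusive bit range) as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Interpret a width-bit value as a two's-complement signed integer."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def decode_s_type(word):
    """Split a 32-bit S-type instruction into its fields."""
    imm = (bits(word, 31, 25) << 5) | bits(word, 11, 7)
    return {
        "opcode": bits(word, 6, 0),
        "funct3": bits(word, 14, 12),
        "rs1": bits(word, 19, 15),
        "rs2": bits(word, 24, 20),
        "imm": sign_extend(imm, 12),
    }


fields = decode_s_type(0x00A02223)
print(fields)
```

The decoded rs2 is 10, i.e. register x10 (a0), and the store offset is 4.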
Besides, we can also compare with InstructionDecoderTest.scala:
class InstructionDecoderTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("InstructionDecoder of Single Cycle CPU")
  it should "produce correct control signal" in {
    test(new InstructionDecode).withAnnotations(TestAnnotations.annos) { c =>
      c.io.instruction.poke(0x00a02223L.U) // S-type
      c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
      c.io.ex_aluop2_source.expect(ALUOp2Source.Immediate)
      c.io.regs_reg1_read_address.expect(0.U)
      c.io.regs_reg2_read_address.expect(10.U)
      c.clock.step()
    }
  }
}
src/test/scala/riscv/singlecycle/InstructionDecoderTest.scala
All of them align with our inferences.
In this diagram, opcode = 0x33, funct3 = 0x0, funct7 = 0x0, which means the instruction is an add instruction. So, we can observe that:
- io_op1 equals alu_io_op1 and io_reg1_data.
- io_op2 equals alu_io_op2 and io_reg2_data.
- io_result = alu_io_op1 + alu_io_op2.
- io_if_jump_flag = 0.
In this diagram, opcode = 0x63, funct3 = 0x0, which means the instruction is a beq instruction. At this moment, we can observe that:
- io_aluop1_source and io_aluop2_source both equal 1.
- io_reg1_data != io_reg2_data, so io_if_jump_flag = 0.
- alu_io_op1 = io_instruction_address and alu_io_op2 = io_immediate.
- io_alu_funct = 1, which means ALUFunction.add.
- io_if_jump_address = 4, which equals io_result = alu_io_op1 + alu_io_op2.
At the next moment, we can see io_reg1_data = io_reg2_data. Therefore, io_if_jump_flag changes to 1, and pc will jump to io_if_jump_address.
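The beq behavior described above can be summarized in a small Python model. It is a sketch under stated assumptions, not the Execute module itself: execute_beq is a hypothetical helper, and the operand values (pc = 0, imm = 4, and the register contents) are made up so that the target matches the observed io_if_jump_address = 4.

```python
def execute_beq(pc, imm, reg1, reg2):
    """Assumed beq behavior: the ALU adds pc and the immediate to form the
    jump address, and the jump flag is the register equality comparison."""
    jump_address = (pc + imm) & 0xFFFFFFFF
    jump_flag = 1 if reg1 == reg2 else 0
    return jump_flag, jump_address


# Hypothetical operand values chosen only for illustration.
not_taken = execute_beq(pc=0, imm=4, reg1=1, reg2=2)
taken = execute_beq(pc=0, imm=4, reg1=2, reg2=2)
print(not_taken, taken)
```

The jump address is the same in both cases; only the flag changes once the two register values become equal.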
To ensure the smooth operation and testing of the assembly code on MyCPU, we need to make the following modifications:
- Remove the RDCYCLE/RDCYCLEH instructions.
- Edit the Makefile to generate .asmbin, then put Hammingdist.S into the correct folder.
- Add HammingdistTest to CPUTest.scala.
The code below is only a partial snippet. You can find the complete code in ca2023-lab3 on GitHub (forked from sysprog21/ca2023-lab3).
In Homework2, I added a syscall for printing an integer:
# print the result #
li a7, SYSINT #"printint" syscall
add a1, a0, x0 # address of string(move result of hd_cal to a1)
li a0, 1 #1 = standard output (stdout)
ecall # print result of hd_cal
However, in Homework3, I store the result in memory instead:
# store the results in memory (0x4, 0x8, 0xC)
addi s11, s11, 4 # advance the store address by 4 (s11 starts at 0)
sw a0, 0(s11)
I also removed the RDCYCLE/RDCYCLEH instructions.
Makefile and regenerate RISC-V program
After the modifications mentioned above, we need to edit csrc/Makefile to generate .asmbin:
BINS = \
fibonacci.asmbin \
hello.asmbin \
mmio.asmbin \
quicksort.asmbin \
sb.asmbin \
+ Hammingdist.asmbin
csrc/Makefile
Next, place the modified assembly code Hammingdist.S into csrc/.
To regenerate the RISC-V programs used for unit tests, change to the csrc directory and run the make update command. Ensure that the $PATH environment variable is correctly configured to include the GNU toolchain for RISC-V.
$ cd $HOME/riscv-none-elf-gcc
$ echo "export PATH=`pwd`/bin:$PATH" > setenv
$ cd $HOME
$ source riscv-none-elf-gcc/setenv
$ cd $HOME/ca2023-lab3/csrc
$ make update
CPUTest.scala
class HammingdistTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "Calculate hammingdistance of two sequences" in {
    test(new TestTopModule("Hammingdist.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 50) {
        c.clock.step(1000)
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      c.io.mem_debug_read_address.poke(4.U) // read memory(0x4)
      c.clock.step()
      c.io.mem_debug_read_data.expect(21.U) // expect result = 21
      c.io.mem_debug_read_address.poke(8.U) // read memory(0x8)
      c.clock.step()
      c.io.mem_debug_read_data.expect(63.U) // expect result = 63
      c.io.mem_debug_read_address.poke(12.U) // read memory(0xC)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0.U) // expect result = 0
    }
  }
}
src/test/scala/riscv/singlecycle/CPUTest.scala
Finally, we can test our assembly code on MyCPU using the following commands:
$ cd $HOME/ca2023-lab3
$ sbt "testOnly riscv.singlecycle.HammingdistTest"
Here is the output when the test passes:
lab02@ubuntu:~/ca2023-lab3$ sbt "testOnly riscv.singlecycle.HammingdistTest"
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /home/lab02/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/lab02/ca2023-lab3/)
[info] HammingdistTest:
[info] Single Cycle CPU
[info] - should Calculate hammingdistance of two sequences
[info] Run completed in 8 seconds, 665 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 11 s, completed Nov 29, 2023, 2:44:51 AM
To quickly test our program, we can use Verilator for simulation. On the first run, and every time you modify the Chisel code, you need to execute the following command in the project's root directory to generate the Verilog files:
$ make verilator
After compilation, to load the Hammingdist.asmbin file, simulate 1000 cycles, and save the simulation waveform to the dump1.vcd file, we can run:
$ ./run-verilator.sh -instruction src/main/resources/Hammingdist.asmbin -time 2000 -vcd dump1.vcd
Output:
-time 2000
-memory 1048576
-instruction src/main/resources/Hammingdist.asmbin
[-------------------->] 100%
Then, run gtkwave dump1.vcd to analyze its waveform.
We can observe that the signal io_instruction begins with 00000000 and 00100000. To check this, let's look at the hexadecimal representation of Hammingdist.asmbin:
$ hexdump src/main/resources/Hammingdist.asmbin | head -1
Its output:
0000000 0000 0010 0000 0000 ffff 000f 0000 0000
This aligns with the waveform.
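Because hexdump prints 16-bit groups by default, the line has to be regrouped before it can be compared with 32-bit instruction words. The Python sketch below does that regrouping; it assumes the dump was taken on a little-endian host, which the source does not state.

```python
import struct

# The eight 16-bit groups printed by hexdump for the first 16 bytes.
groups = "0000 0010 0000 0000 ffff 000f 0000 0000".split()

# Assumption: hexdump ran on a little-endian host, so each group is a
# little-endian halfword. Repack the groups into the original byte stream.
raw = b"".join(struct.pack("<H", int(g, 16)) for g in groups)

# Read the same bytes back as four little-endian 32-bit words.
words = struct.unpack("<4I", raw)
print([f"0x{w:08X}" for w in words])
```

The words come out as 0x00100000, 0x00000000, 0x000FFFFF and 0x00000000, which are the low and high halves of the two values in test_data_1.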
The io_instruction_address starts from 0x1000, which corresponds to link.lds:
OUTPUT_ARCH("riscv")
ENTRY(_start)
SECTIONS
{
. = 0x00001000;
.text : { *(.text.init) *(.text.startup) *(.text) }
.data ALIGN(0x1000) : { *(.data*) *(.rodata*) *(.sdata*) }
.bss : { *(.bss) }
_end = .;
}
csrc/link.lds
Next, I analyze the assembly code in comparison with the waveform.
.section .text.Hammingdist
.global _start
.data
test_data_1: .dword 0x0000000000100000, 0x00000000000FFFFF
test_data_2: .dword 0x0000000000000001, 0x7FFFFFFFFFFFFFFE
test_data_3: .dword 0x000000028370228F, 0x000000028370228F
_start: #0x1000
addi sp, sp, -12
# push pointers of test data onto the stack
la t0, test_data_1
sw t0, 0(sp)
la t0, test_data_2
...
First instruction: addi sp, sp, -12
- io_instruction_address = 0x1030
- io_instruction = 0xFF410113
- io_jump_flag_id = 0
- io_immediate = -12
- io_memory_read_enable = 0, io_memory_write_enable = 0
Because addi is an I-type arithmetic instruction that neither jumps nor accesses memory, io_jump_flag_id = 0, io_memory_read_enable = 0 and io_memory_write_enable = 0.
In theory, the io_instruction_address for addi should be 0x1004, but in the waveform it is 0x1030. My speculation is that the preceding input data is fetched first: the three pairs of .dword test data occupy 6 × 8 = 48 (0x30) bytes ahead of _start, which pushes the address of addi to 0x1030.
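The offset can be checked with a short calculation. This Python sketch assumes the test data is laid out contiguously from the link.lds start address, directly ahead of _start; the hexdump of the binary is consistent with that, but the source does not confirm it.

```python
BASE = 0x1000     # start address from csrc/link.lds
DWORD_BYTES = 8   # size of one .dword value

# The three labelled data items from the assembly source, two .dword values each.
test_data = [
    (0x0000000000100000, 0x00000000000FFFFF),
    (0x0000000000000001, 0x7FFFFFFFFFFFFFFE),
    (0x000000028370228F, 0x000000028370228F),
]

# Assumption: the data sits contiguously at BASE, directly before _start.
data_bytes = sum(len(pair) for pair in test_data) * DWORD_BYTES
first_instruction = BASE + data_bytes
print(data_bytes, hex(first_instruction))
```

Under that assumption, the first real instruction lands at 0x1030, which is the address seen in the waveform.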
Next instruction: la t0, test_data_1
Because la is a pseudo-instruction, it is implemented with two instructions, 0x00000297 (auipc t0, 0) and 0xFCC28293 (addi t0, t0, -52).
- io_instruction_address = 0x1034, 0x1038
- io_instruction = 0x00000297, 0xFCC28293
- io_jump_flag_id = 0
- io_immediate = 0
- io_memory_read_enable = 0, io_memory_write_enable = 0
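To see which address this instruction pair produces, the two encodings can be decoded and combined by hand. The Python sketch below follows the standard RV32I U-type and I-type immediate layouts; bits, sign_extend and resolve_la are hypothetical helper names used only here.

```python
def bits(word, hi, lo):
    """Return word[hi:lo] (inclusive bit range) as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Interpret a width-bit value as a two's-complement signed integer."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def resolve_la(pc, auipc_word, addi_word):
    """Address formed by an auipc at pc followed by an addi on the same register."""
    upper = auipc_word & 0xFFFFF000                   # U-type immediate, already shifted
    lower = sign_extend(bits(addi_word, 31, 20), 12)  # I-type immediate
    return (pc + upper + lower) & 0xFFFFFFFF


address = resolve_la(0x1034, 0x00000297, 0xFCC28293)
print(hex(address))
```

The result is 0x1000, the start address from link.lds, which is where test_data_1 would sit if the data is placed at the beginning of the image.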
Next instruction: sw t0, 0(sp)
- io_instruction_address = 0x0000103C
- io_instruction = 0x00512023
- io_jump_flag_id = 0
- io_immediate = 0
- io_memory_read_enable = 0, io_memory_write_enable = 1
Because it is an S-type (store) instruction, io_memory_write_enable = 1.
Branch instruction: beq a1, zero, clz_lower_set_one
- io_instruction = 0x02058A63
- io_jump_flag_id = 1
- pc = 0x1190
- io_immediate = 0x34
- io_jump_address_id = 0x000011C4, which equals pc + io_immediate
Because it is a B-type instruction whose condition holds (a1 equals zero), io_jump_flag_id = 1.
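The immediate and the jump target can be recomputed from the encoding. The Python sketch below uses the standard RV32I B-type immediate layout; bits, sign_extend and decode_b_imm are hypothetical helper names used only for this check.

```python
def bits(word, hi, lo):
    """Return word[hi:lo] (inclusive bit range) as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Interpret a width-bit value as a two's-complement signed integer."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def decode_b_imm(word):
    """Reassemble the scattered B-type immediate (bit 0 is always zero)."""
    imm = (
        (bits(word, 31, 31) << 12)
        | (bits(word, 7, 7) << 11)
        | (bits(word, 30, 25) << 5)
        | (bits(word, 11, 8) << 1)
    )
    return sign_extend(imm, 13)


word = 0x02058A63
imm = decode_b_imm(word)
rs1, rs2 = bits(word, 19, 15), bits(word, 24, 20)
target = 0x1190 + imm
print(hex(imm), rs1, rs2, hex(target))
```

The decoded registers are x11 (a1) and x0 (zero), and the target equals the io_jump_address_id value read from the waveform.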
Since io_jump_flag_id = 1, the next pc will jump to io_jump_address_id = 0x000011C4. The waveform is shown as: