contributed by < ChengChiTing >
RISC-V, jserv
Ubuntu 22.04.3
Following the instructions in Lab3: Construct a single-cycle RISC-V CPU with Chisel, install the essential packages:
$ sudo apt install build-essential verilator gtkwave
Verilator version: 4.038
GTKWave Analyzer version: 3.3.104
$ curl -s "https://get.sdkman.io" | bash
$ source "$HOME/.sdkman/bin/sdkman-init.sh"
$ sdk install java 11.0.21-tem
$ sdk install sbt
Java version: 11.0.21
sbt version: 1.9.7
Note: the Java version is crucial!
Java 11 and 17 can both run sbt commands, but I encountered errors when I used Java 21 during my first environment setup.
Before we start Lab 3, we have to learn the fundamental concepts of Chisel. Chisel is a domain specific language (DSL) implemented using Scala’s macro features. We can get the tutorial repository with git:
$ git clone https://github.com/ucb-bar/chisel-tutorial
Then, we can use the following commands to check whether sbt is installed and runs correctly on our system:
$ cd chisel-tutorial
$ sbt run
The first run downloads the necessary components. If sbt runs successfully, we get the following output:
[info] Loading project definition from /home/riscv/chisel-tutorial/project
[info] Loading settings for project chisel-tutorial from build.sbt ...
[info] Set current project to chisel-tutorial (in build file:/home/riscv/chisel-tutorial/)
[info] running hello.Hello
[info] [0.001] Elaborating design...
[info] [0.049] Done elaborating.
Computed transform order in: 106.1 ms
Total FIRRTL Compile Time: 223.9 ms
End of dependency graph
Circuit state created
[info] [0.001] SEED 1701115466014
test Hello Success: 1 tests passed in 6 cycles taking 0.008942 seconds
[info] [0.002] RAN 1 CYCLES PASSED
[success] Total time: 2 s, completed Nov 28, 2023, 4:04:27 AM
Following the instructions in Lab3: Construct a single-cycle RISC-V CPU with Chisel, install Docker on Ubuntu and run the following command:
$ docker run -it --rm -p 8888:8888 sysprog21/chisel-bootcamp
Copy the URL in the output message, which starts with http://127.0.0.1:8888/, and paste it into your web browser to access the Jupyter Notebook.
Learn how to write basic Scala code and the concepts of Chisel from the Chisel tutorial.
We should go through the following chapters and complete the exercises:
After completing all the exercises above, we can begin working on the single-cycle RISC-V CPU.
Fork the GitHub repository ca2023-lab3, then clone it:
$ git clone https://github.com/sysprog21/ca2023-lab3
$ cd ca2023-lab3
We can use the following command to check whether the single-cycle RISC-V CPU is implemented correctly:
$ sbt test
However, the Scala code in this repository is not complete. If we run the tests without filling in the missing code, we get the following error messages:
[info] Run completed in 9 seconds, 985 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 3, failed 6, canceled 0, ignored 0, pending 0
[info] *** 6 TESTS FAILED ***
[error] Failed tests:
[error] riscv.singlecycle.InstructionDecoderTest
[error] riscv.singlecycle.ByteAccessTest
[error] riscv.singlecycle.InstructionFetchTest
[error] riscv.singlecycle.ExecuteTest
[error] riscv.singlecycle.FibonacciTest
[error] riscv.singlecycle.QuicksortTest
[error] (Test / test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 15 s, completed Nov 28, 2023, 10:50:50 AM
Therefore, we have to fill in the missing Scala code until all of the core tests pass. The code related to the core is located in the src/main/scala/riscv directory. To run a single test, such as InstructionFetchTest only, execute the following command:
$ sbt "testOnly riscv.singlecycle.InstructionFetchTest"
The single-cycle CPU architecture diagram helps us complete the CPU core.
Code can be found in src/main/scala/riscv/core/InstructionFetch.scala
class InstructionFetch extends Module {
  val io = IO(new Bundle {
    val jump_flag_id          = Input(Bool())
    val jump_address_id       = Input(UInt(Parameters.AddrWidth))
    val instruction_read_data = Input(UInt(Parameters.DataWidth))
    val instruction_valid     = Input(Bool())

    val instruction_address = Output(UInt(Parameters.AddrWidth))
    val instruction         = Output(UInt(Parameters.InstructionWidth))
  })
  val pc = RegInit(ProgramCounter.EntryAddress)

  when(io.instruction_valid) {
    io.instruction := io.instruction_read_data
    // lab3(InstructionFetch) begin

    // lab3(InstructionFetch) end
  }.otherwise {
    pc             := pc
    io.instruction := 0x00000013.U
  }
  io.instruction_address := pc
}
We can compare this with the instruction fetch stage diagram:
In the instruction fetch stage, we have four inputs (jump_flag_id, jump_address_id, instruction_read_data, instruction_valid) and two outputs (instruction_address, instruction). We first check whether the instruction is valid with instruction_valid, and then check jump_flag_id. If jump_flag_id is True, the PC is directed to jump_address_id; otherwise, it is incremented to PC + 4. When instruction_valid is False, the existing code keeps the PC unchanged and outputs 0x00000013 (addi x0, x0, 0, i.e. a NOP).
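To make the PC rule concrete, here is a minimal plain-Scala model of it. It is a software sketch for reasoning about expected values, not the Chisel code the lab asks for, and the object and parameter names are hypothetical. Addresses are kept in a Long and masked to 32 bits to mimic the hardware register.

```scala
// Plain-Scala model (not Chisel) of the PC update rule in the fetch stage.
// Hypothetical names; addresses are Long values masked to 32 bits.
object InstructionFetchModel {
  private val Mask32 = 0xFFFFFFFFL

  /** Returns the PC for the next cycle. */
  def nextPc(pc: Long, instructionValid: Boolean, jumpFlag: Boolean, jumpAddress: Long): Long =
    if (!instructionValid) pc               // invalid fetch: hold the PC (a NOP is issued)
    else if (jumpFlag) jumpAddress & Mask32 // jump taken: go to jump_address_id
    else (pc + 4) & Mask32                  // sequential: PC + 4, wrapping at 32 bits
}
```

For example, nextPc(0x100CL, instructionValid = true, jumpFlag = false, jumpAddress = 0x1000L) gives 0x1010, and the same call from 0x1010 with jumpFlag = true gives 0x1000, which matches the waveform walkthrough later in this note. In Chisel, the same priority would typically be written with when/.otherwise on jump_flag_id inside the existing when(io.instruction_valid) block.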
If we run InstructionDecoderTest, we get the following error:
[info] firrtl.passes.PassExceptions: firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/InstructionDecode.scala 127:14] : [module InstructionDecode] Reference io is not fully initialized.
[info] : io.memory_write_enable <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/InstructionDecode.scala 127:14] : [module InstructionDecode] Reference io is not fully initialized.
[info] : io.memory_read_enable <= VOID
Therefore, we need to find the correct signals to drive io.memory_write_enable and io.memory_read_enable. io.memory_write_enable is used to implement store operations, and io.memory_read_enable is used to implement load operations.
When we check the instruction types defined in InstructionDecode.scala, load instructions are defined in InstructionsTypeL and store instructions in InstructionsTypeS (these objects list the funct3 value of each instruction).
Code can be found in src/main/scala/riscv/core/InstructionDecode.scala
object InstructionsTypeL {
  val lb  = "b000".U
  val lh  = "b001".U
  val lw  = "b010".U
  val lbu = "b100".U
  val lhu = "b101".U
}

object InstructionsTypeI {
  val addi  = 0.U
  val slli  = 1.U
  val slti  = 2.U
  val sltiu = 3.U
  val xori  = 4.U
  val sri   = 5.U
  val ori   = 6.U
  val andi  = 7.U
}

object InstructionsTypeS {
  val sb = "b000".U
  val sh = "b001".U
  val sw = "b010".U
}
We can use the opcode to check whether the instruction is a load or a store, and then set the corresponding enable signal to True or False.
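A small plain-Scala sketch of this opcode check follows. It assumes the standard RV32I major opcodes (0000011 for loads, 0100011 for stores) and that the repository's own opcode constants hold the same values; the object and method names are hypothetical.

```scala
// Hypothetical software model of the decode-stage memory enables.
// Opcode values are the standard RV32I major opcodes.
object MemoryEnableModel {
  val OpcodeLoad  = 0x03 // 0000011: lb, lh, lw, lbu, lhu
  val OpcodeStore = 0x23 // 0100011: sb, sh, sw

  /** Returns (memoryReadEnable, memoryWriteEnable) for a 32-bit instruction word. */
  def enables(instruction: Int): (Boolean, Boolean) = {
    val opcode = instruction & 0x7f // bits [6:0]
    (opcode == OpcodeLoad, opcode == OpcodeStore)
  }
}
```

For instance, enables(0x00A02223) (sw x10, 4(x0)) returns (false, true), enables(0x00002283) (lw x5, 0(x0)) returns (true, false), and an arithmetic instruction enables neither.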
If we run ExecuteTest, we get the following error:
[info] firrtl.passes.PassExceptions: firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/Execute.scala 32:24] : [module Execute] Reference alu is not fully initialized.
[info] : alu.io.op1 <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/Execute.scala 32:24] : [module Execute] Reference alu is not fully initialized.
[info] : alu.io.op2 <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/Execute.scala 32:24] : [module Execute] Reference alu is not fully initialized.
[info] : alu.io.func <= VOID
According to the execute stage diagram, the two inputs aluop1_source and aluop2_source determine where the ALU operands (alu.io.op1 and alu.io.op2) come from.
If we check InstructionDecode.scala, we can see the relationship between each aluop source value and the operand it selects. Therefore, we can use conditionals to complete our code.
Code can be found in src/main/scala/riscv/core/InstructionDecode.scala
object ALUOp1Source {
  val Register           = 0.U(1.W)
  val InstructionAddress = 1.U(1.W)
}

object ALUOp2Source {
  val Register  = 0.U(1.W)
  val Immediate = 1.U(1.W)
}
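Using the encodings above, the two operand selections can be sketched in plain Scala as follows. This is a model for reasoning, not the lab's Chisel code, and the function and parameter names are hypothetical. In Chisel, each if/else here would typically become a Mux on the corresponding aluop source signal.

```scala
// Plain-Scala model of the two operand multiplexers in the Execute stage.
// Source encodings mirror ALUOp1Source / ALUOp2Source; names are hypothetical.
object AluOperandModel {
  val Op1Register           = 0 // ALUOp1Source.Register
  val Op1InstructionAddress = 1 // ALUOp1Source.InstructionAddress
  val Op2Register           = 0 // ALUOp2Source.Register
  val Op2Immediate          = 1 // ALUOp2Source.Immediate

  // op1: register data, or the instruction's own address
  def op1(aluop1Source: Int, reg1Data: Long, instructionAddress: Long): Long =
    if (aluop1Source == Op1InstructionAddress) instructionAddress else reg1Data

  // op2: register data, or the decoded immediate
  def op2(aluop2Source: Int, reg2Data: Long, immediate: Long): Long =
    if (aluop2Source == Op2Immediate) immediate else reg2Data
}
```

With aluop1_source = 0 and aluop2_source = 1, op1 is the register data and op2 is the immediate, which is the combination the lui walkthrough later in this note reports.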
Another missing part is alu.io.func. The ALU function signal comes from the ALU Control unit, which is defined in ALUControl.scala. We connect the ALU Control output alu_ctrl.io.alu_funct to the ALU input alu.io.func.
Code can be found in src/main/scala/riscv/core/ALUControl.scala
class ALUControl extends Module {
  val io = IO(new Bundle {
    val opcode = Input(UInt(7.W))
    val funct3 = Input(UInt(3.W))
    val funct7 = Input(UInt(7.W))

    val alu_funct = Output(ALUFunctions())
  })
  // ... (decoding logic omitted here)
}
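For intuition about which function alu_funct should carry, here is a hedged plain-Scala sketch of the standard RV32I funct3/funct7 decoding for register and immediate arithmetic. It returns operation names as strings rather than the repository's ALUFunctions values, and it assumes every other opcode (loads, stores, lui, and so on) only needs an add; the names and that fallback are assumptions, not the repository's ALUControl.

```scala
// Hypothetical software model of RV32I ALU-function decoding.
object AluFunctionModel {
  val OpcodeOpImm = 0x13 // I-type arithmetic: addi, slti, xori, ...
  val OpcodeOp    = 0x33 // R-type arithmetic: add, sub, sll, ...

  def aluFunction(opcode: Int, funct3: Int, funct7: Int): String =
    if (opcode != OpcodeOp && opcode != OpcodeOpImm) "add" // assumption: other opcodes need add
    else {
      val alternate = (funct7 & 0x20) != 0 // instruction bit 30 selects sub / sra
      (funct3 & 0x7) match {
        case 0 => if (opcode == OpcodeOp && alternate) "sub" else "add" // addi has no sub form
        case 1 => "sll"
        case 2 => "slt"
        case 3 => "sltu"
        case 4 => "xor"
        case 5 => if (alternate) "sra" else "srl"
        case 6 => "or"
        case _ => "and" // funct3 == 7
      }
    }
}
```

Note that bit 30 matters for both OP and OP-IMM shifts (srai vs. srli), but for funct3 = 0 it only selects sub in the register form, since addi uses those bits as part of its immediate.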
After filling in the missing Scala code above, we can run sbt test again. However, some errors still appear, and the output messages are shown below:
[info] firrtl.passes.PassExceptions: firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU] Reference ex is not fully initialized.
[info] : ex.io.aluop2_source <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU] Reference ex is not fully initialized.
[info] : ex.io.aluop1_source <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU] Reference ex is not fully initialized.
[info] : ex.io.reg2_data <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU] Reference ex is not fully initialized.
[info] : ex.io.immediate <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU] Reference ex is not fully initialized.
[info] : ex.io.instruction_address <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU] Reference ex is not fully initialized.
[info] : ex.io.reg1_data <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU] Reference ex is not fully initialized.
[info] : ex.io.instruction <= VOID
After checking the error messages, we have to drive the missing input signals of the Execute stage. Comparing with the single-cycle CPU architecture diagram above, we can figure out that the aluop1_source/aluop2_source, immediate and instruction signals come from the InstructionDecode stage, the reg1_data/reg2_data signals come from the RegisterFile, and the instruction_address signal comes from the InstructionFetch stage. We can then check the corresponding Scala code and connect the right input signals.
Once we fix all the missing parts of the Scala code, we can run sbt test again. If all of the Scala code is correct, we get the following output:
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] InstructionDecoderTest:
[info] ExecuteTest:
[info] InstructionDecoder of Single Cycle CPU
[info] Execution of Single Cycle CPU
[info] - should produce correct control signal
[info] - should execute correctly
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] Run completed in 13 seconds, 26 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 17 s, completed Nov 29, 2023, 12:23:55 AM
We can use waveform simulation to check the signals generated by our CPU. Run the following command:
$ WRITE_VCD=1 sbt test
Afterward, we can find the generated .vcd files in the test_run_dir folder, open them with GTKWave, and analyze the relationships between different signals.
At time 33ps, io_instruction_address is 0x100C. At time 35ps, since io_instruction_valid is True and io_jump_flag_id is False (low signal), the fetch stage adds 4 to the address and writes it into pc. Therefore, pc and io_instruction_address are both 0x1010 at time 35ps.
At time 37ps, pc and io_instruction_address would be 0x1014 if they were simply incremented by 4. However, io_jump_flag_id is True (high signal) and io_jump_address_id is 0x1000, so pc and io_instruction_address become 0x1000 instead.
At time 2ps, the input instruction is 0x00A02223. Translating this machine code into RISC-V assembly gives sw x10, 4(x0). We can then check the output signals. The offset 4 corresponds to io_ex_immediate, and io_memory_write_enable is True because we are executing a store instruction. A store has no destination register: the 4 in 4(x0) is an offset, so the target memory address is x0 + 4 = 0x4 (bits [11:7] of the instruction, where rd sits in other formats, hold the low five bits of that offset). The source data comes from x10, which corresponds to io_regs_regs2_read_address.
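The hand decoding above can be double-checked with a short plain-Scala S-type decoder using the standard RV32I field layout. The names are hypothetical; the sketch also computes the effective address as base register value plus the sign-extended offset.

```scala
// Hypothetical plain-Scala decoder for RV32I S-type (store) instructions.
object StoreDecodeModel {
  final case class Store(funct3: Int, rs1: Int, rs2: Int, imm: Int)

  def decodeStore(inst: Int): Store = {
    val immLow  = (inst >>> 7) & 0x1f  // bits [11:7]  = imm[4:0]
    val funct3  = (inst >>> 12) & 0x7  // bits [14:12] (2 = sw)
    val rs1     = (inst >>> 15) & 0x1f // bits [19:15] = base register
    val rs2     = (inst >>> 20) & 0x1f // bits [24:20] = source data register
    val immHigh = inst >> 25           // bits [31:25] = imm[11:5], arithmetic shift sign-extends
    Store(funct3, rs1, rs2, (immHigh << 5) | immLow)
  }

  // Base register value + offset, wrapped to 32 bits.
  def effectiveAddress(rs1Value: Long, imm: Int): Long = (rs1Value + imm) & 0xFFFFFFFFL
}
```

decodeStore(0x00A02223) yields Store(2, 0, 10, 4): funct3 = 2 (sw), base x0, source x10, offset 4, and effectiveAddress(0L, 4) is 0x4.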
At time 2ps, io_instruction is 0x001101B3, which corresponds to add x3, x2, x1. io_func is 1, and in ALUControl.scala this value also corresponds to the add operation. Therefore, the ALU adds io_op1 to io_op2, and the result should be 0x19CAEB99 (0x155EEA9E + 0x046C00FB).
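The same result can be reproduced in plain Scala: decode the R-type fields of 0x001101B3, then add the two operand values with 32-bit wrap-around. This is a sketch with hypothetical names, using the standard RV32I field layout.

```scala
// Hypothetical plain-Scala model: R-type field decoding plus a 32-bit adder.
object AddModel {
  final case class RType(rd: Int, funct3: Int, rs1: Int, rs2: Int, funct7: Int)

  def decodeR(inst: Int): RType = RType(
    rd     = (inst >>> 7) & 0x1f,  // bits [11:7]
    funct3 = (inst >>> 12) & 0x7,  // bits [14:12]
    rs1    = (inst >>> 15) & 0x1f, // bits [19:15]
    rs2    = (inst >>> 20) & 0x1f, // bits [24:20]
    funct7 = (inst >>> 25) & 0x7f  // bits [31:25]
  )

  // Scala Int addition wraps modulo 2^32, like a 32-bit ALU adder.
  def add32(op1: Int, op2: Int): Int = op1 + op2
}
```

decodeR(0x001101B3) gives rd = 3, rs1 = 2, rs2 = 1 with funct3 and funct7 both 0 (add x3, x2, x1), and add32(0x155EEA9E, 0x046C00FB) gives 0x19CAEB99.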
We can use the GNU toolchain to run our HW2 program on MyCPU.
First, we have to set up the environment on our system. According to Assignment2: RISC-V Toolchain, we execute the following commands from the home directory:
$ cd $HOME
$ source riscv-none-elf-gcc/setenv
Second, put our RISC-V assembly code in the csrc directory and modify the Makefile so that our program's .asmbin is listed in BINS. After that, to regenerate the RISC-V programs used for unit tests, change to the csrc directory and run the make update command:
CROSS_COMPILE ?= riscv-none-elf-

ASFLAGS = -march=rv32i_zicsr -mabi=ilp32
CFLAGS = -O0 -Wall -march=rv32i_zicsr -mabi=ilp32
LDFLAGS = --oformat=elf32-littleriscv

AS := $(CROSS_COMPILE)as
CC := $(CROSS_COMPILE)gcc
LD := $(CROSS_COMPILE)ld
OBJCOPY := $(CROSS_COMPILE)objcopy

%.o: %.S
	$(AS) -R $(ASFLAGS) -o $@ $<

%.elf: %.S
	$(AS) -R $(ASFLAGS) -o $(@:.elf=.o) $<
	$(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o)

%.elf: %.c init.o
	$(CC) $(CFLAGS) -c -o $(@:.elf=.o) $<
	$(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o) init.o

%.asmbin: %.elf
	$(OBJCOPY) -O binary -j .text -j .data $< $@

BINS = \
	fibonacci.asmbin \
	hello.asmbin \
	ipcheck.asmbin \
	mmio.asmbin \
	quicksort.asmbin \
	sb.asmbin

# Clear the .DEFAULT_GOAL special variable, so that the following turns
# to the first target after .DEFAULT_GOAL is not set.
.DEFAULT_GOAL :=

all: $(BINS)

update: $(BINS)
	cp -f $(BINS) ../src/main/resources

clean:
	$(RM) *.o *.elf *.asmbin
$ cd csrc
$ make update
If the make command runs successfully, we get the new .asmbin files in the csrc folder, and make update copies them to src/main/resources. In addition, we can use sbt test to test our program. sbt test runs the Chiseltest-based test cases that validate the CPU implementation, so we have to modify CPUTest.scala first.
File: src/test/scala/riscv/singlecycle/CPUTest.scala
The code I added to the file is shown below. We can run the sbt test command and confirm whether our .asmbin file passes the test.
// Added to CPUTest.scala; relies on the imports already at the top of that file.
class ipcheckTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "report success" in {
    test(new TestTopModule("ipcheck.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 2000) {
        c.clock.step()
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      // Check the final register values produced by ipcheck.asmbin
      c.io.regs_debug_read_address.poke(1.U)
      c.io.regs_debug_read_data.expect(0x1300.U)
      c.io.regs_debug_read_address.poke(2.U)
      c.io.regs_debug_read_data.expect(0xf80.U)
      c.io.regs_debug_read_address.poke(8.U)
      c.io.regs_debug_read_data.expect(0xfb0.U)
      c.io.regs_debug_read_address.poke(9.U)
      c.io.regs_debug_read_data.expect(0xfc4.U)
      c.io.regs_debug_read_address.poke(12.U)
      c.io.regs_debug_read_data.expect(0x10.U)
      c.io.regs_debug_read_address.poke(15.U)
      c.io.regs_debug_read_data.expect(0xfcc.U)
      c.io.regs_debug_read_address.poke(16.U)
      c.io.regs_debug_read_data.expect(0x8.U)
    }
    println("success")
  }
}
Third, execute the following command in the project’s root directory to generate the Verilog files:
$ make verilator
To load the ipcheck.asmbin file, simulate for 1000 cycles, and save the simulation waveform to the dump.vcd file, we can run:
$ ./run-verilator.sh -instruction src/main/resources/ipcheck.asmbin -time 2000 -vcd dump.vcd
Then, open dump.vcd with GTKWave to check its waveform.
At time 2ps, io_instruction_valid is True and io_jump_flag_id is False, so the PC points to instruction address 0x00001000. The instruction at that address is 0x00001137, which is sent to the InstructionDecode stage.
The instruction is decoded in this stage. According to the RISC-V instruction set manual, 0x00001137 corresponds to lui x2, 1. We can confirm the signals in GTKWave. Since lui writes its result to a destination register, io_reg_write_enable is True. rd and io_reg_write_address are 02 because the target register is x2. lui places the U-immediate value in the top 20 bits of the destination register rd, filling in the lowest 12 bits with zeros. Therefore, 0x00001000 will be placed in register x2.
io_aluop1_source is 0 (Register) and io_aluop2_source is 1 (Immediate). The rs1 field of 0x00001137 is x0, so op1 is 0. The ALU performs add because func is 1, so the result is 0 + 0x00001000 = 0x00001000.
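The lui walkthrough can be checked the same way with a small plain-Scala U-type decoder (hypothetical names, standard RV32I encoding). The result is the instruction word with its low 12 bits cleared, which is also what the ALU produces when it adds that immediate to x0.

```scala
// Hypothetical plain-Scala decoder for the RV32I lui instruction.
object LuiModel {
  val OpcodeLui = 0x37 // 0110111

  final case class Lui(rd: Int, imm20: Int, result: Int)

  def decodeLui(inst: Int): Lui = {
    require((inst & 0x7f) == OpcodeLui, f"0x$inst%08x is not a lui instruction")
    val rd     = (inst >>> 7) & 0x1f // bits [11:7]
    val imm20  = inst >>> 12         // bits [31:12], the U-immediate
    val result = inst & ~0xfff       // imm20 << 12: upper 20 bits kept, low 12 cleared
    Lui(rd, imm20, result)
  }
}
```

decodeLui(0x00001137) gives rd = 2, imm20 = 1 and result = 0x1000, matching lui x2, 1 and the value written back to x2.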
io_write_address is 2 and io_write_data is 0x00001000. The value of register x2 changes to 0x00001000 at time 5ps, when the next cycle begins.
The lui instruction does not read from or write to memory, so most of the signals in this stage are 0.
It sends 0x00001000 back to the register file, and x2 is updated when the next cycle begins.