郭君瑋
In my term project, my goal is to study ChiselRiscV and the books mentioned on the project page to understand how to build a RISC-V processor using Chisel. The processor must pass the riscv-arch-test and support the RV32IM and CSR instructions. Finally, I select and modify at least three RISC-V programs from the course quizzes and make them run on my improved ChiselRiscV processor.
The entire process can be divided into the following three main steps:
Before beginning my study and implementation, I first had to set up my environment. The following are the OS and software I used in this project.
For Chisel
For riscv-arch-test
I use Docker to set up my environment. The detailed installation steps for the software above, and for other software not mentioned here, are recorded in the following Dockerfile.
The official RISCOF documentation contains a small mistake regarding the installation of SAIL: the version of ocaml-base-compiler needs to be at least 4.08.1 instead of 4.06.1. Additionally, when RISCOF is installed with pip, it is placed in /home/user/.local/bin, so you need to run export PATH="$PATH:/home/user/.local/bin" in your terminal before the riscof command becomes available.
ChiselRiscV is a 32-bit RISC-V CPU implemented according to the book "CPU Design with RISC-V and Chisel - First step to custom CPU implementation with open-source ISA". To understand how to build a RISC-V CPU in Chisel, the first step is to clone this repository.
After cloning the repository, we can see the following file structure:
In Chisel, the source code is divided into two parts: main and test. main contains the code for all hardware behaviors. test is similar to a testbench in Verilog: it provides inputs and verifies the outputs. build.sbt contains the build configuration and the versions of Scala and Chisel.
To run all the tests, we use sbt test. To run a specific test, we use testOnly.
Memory is made up of UInt(8.W) elements, each representing one byte. RISC-V is little-endian, which means the least significant byte of a word is stored at the lowest address.
fetch.hex in the 8-bit memory:
Address | Data |
---|---|
0 | 11 |
1 | 12 |
2 | 13 |
3 | 14 |
4 | 21 |
… | … |
11 | 34 |
fetch.hex after being organized into 32-bit words:
Address | Data |
---|---|
0 | 14131211 |
4 | 24232221 |
8 | 34333231 |
Hence, instruction fetch behaves as follows. When the test starts, pcReg is initialized to StartAddr. Every cycle, the CPU fetches the instruction at the address held in pcReg, and pcReg then advances by 4 to the next instruction.
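The following minimal plain-C model (not the author's Chisel code) makes the byte ordering and the fetch loop concrete. The names fetch_word and run_fetch are hypothetical. The model assumes a byte-addressed memory and ignores jumps and branches, so pc simply advances by 4 each cycle.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble the 32-bit word starting at byte address `addr`.
 * RISC-V is little-endian: the byte at `addr` is the least significant byte. */
static uint32_t fetch_word(const uint8_t *mem, uint32_t addr) {
    return (uint32_t)mem[addr]
         | ((uint32_t)mem[addr + 1] << 8)
         | ((uint32_t)mem[addr + 2] << 16)
         | ((uint32_t)mem[addr + 3] << 24);
}

/* Simulate `cycles` fetch cycles: pc starts at start_addr, the word at pc is
 * fetched, then pc advances by 4 (no jumps or branches are modeled). */
static void run_fetch(const uint8_t *mem, uint32_t start_addr,
                      size_t cycles, uint32_t *out) {
    uint32_t pc = start_addr;
    for (size_t i = 0; i < cycles; i++) {
        out[i] = fetch_word(mem, pc);
        pc += 4;
    }
}
```

This example assumes the elided bytes of fetch.hex follow the pattern of the 32-bit table (0x21 to 0x24, then 0x31 to 0x34). Under that assumption, fetching three cycles from address 0 yields 0x14131211, 0x24232221 and 0x34333231.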
RV32I defines 32 registers, which are used to store data and addresses. Each register is 32 bits wide.
Address | register |
---|---|
0 | zero |
1 | ra |
2 | sp |
3 | gp |
4 | tp |
5 | t0 |
6 | t1 |
7 | t2 |
8 | s0/fp |
9 | s1 |
10 | a0 |
11 | a1 |
12 | a2 |
13 | a3 |
14 | a4 |
15 | a5 |
16 | a6 |
17 | a7 |
18 | s2 |
19 | s3 |
20 | s4 |
21 | s5 |
22 | s6 |
23 | s7 |
24 | s8 |
25 | s9 |
26 | s10 |
27 | s11 |
28 | t3 |
29 | t4 |
30 | t5 |
31 | t6 |
The register file is declared as a memory in Core, and we can use the table above to find the register we want to access.
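Below is a plain-C sketch of the register file. RegFile, reg_read and reg_write are hypothetical names, not ChiselRiscV identifiers. The sketch also encodes one rule from the RISC-V specification that the table implies but does not state: x0 (zero) always reads as 0, and writes to it are discarded.

```c
#include <stdint.h>

/* 32 x 32-bit integer registers, indexed by the 5-bit register address. */
typedef struct {
    uint32_t x[32];
} RegFile;

/* Read a register; x0 is hardwired to zero. */
static uint32_t reg_read(const RegFile *rf, uint32_t addr) {
    addr &= 0x1Fu;                 /* only 5 address bits exist */
    return addr == 0 ? 0u : rf->x[addr];
}

/* Write a register; writes to x0 are ignored. */
static void reg_write(RegFile *rf, uint32_t addr, uint32_t value) {
    addr &= 0x1Fu;
    if (addr != 0) rf->x[addr] = value;
}
```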
The RV32I base instruction set consists of 32-bit instructions.
There are six instruction formats: R-Type, I-Type, S-Type, U-Type, J-Type and B-Type. B-Type and J-Type are variants of S-Type and U-Type respectively; they differ only in how the immediate bits are arranged. So we can say that there are four basic instruction formats.
[R-Type]
+-------------------------------------------------------------------------------------------------+
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
+---------------------+--------------+--------------+--------+--------------+---------------------+
|       funct7        |     rs2      |     rs1      | funct3 |      rd      |       opcode        |
+---------------------+--------------+--------------+--------+--------------+---------------------+
[I-Type]
+-------------------------------------------------------------------------------------------------+
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
+------------------------------------+--------------+--------+--------------+---------------------+
|            imm_i(11:0)             |     rs1      | funct3 |      rd      |       opcode        |
+------------------------------------+--------------+--------+--------------+---------------------+
[S-Type]
+-------------------------------------------------------------------------------------------------+
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
+---------------------+--------------+--------------+--------+--------------+---------------------+
|     imm_s(11:5)     |     rs2      |     rs1      | funct3 |  imm_s(4:0)  |       opcode        |
+---------------------+--------------+--------------+--------+--------------+---------------------+
[U-Type]
+-------------------------------------------------------------------------------------------------+
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
+------------------------------------------------------------+--------------+---------------------+
|                        imm_u(31:12)                        |      rd      |       opcode        |
+------------------------------------------------------------+--------------+---------------------+
[J-Type]
+-------------------------------------------------------------------------------------------------+
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
+------------------------------------------------------------+--------------+---------------------+
|                 imm_j(20, 10:1, 11, 19:12)                 |      rd      |       opcode        |
+------------------------------------------------------------+--------------+---------------------+
[B-Type]
+-------------------------------------------------------------------------------------------------+
| 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00 |
+---------------------+--------------+--------------+--------+--------------+---------------------+
|   imm_b(12, 10:5)   |     rs2      |     rs1      | funct3 |imm_b(4:1, 11)|       opcode        |
+---------------------+--------------+--------------+--------+--------------+---------------------+
The CPU decodes each instruction according to these formats to obtain the register addresses it needs (rs1, rs2 and rd). The immediate is extracted in the same way.
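The C sketch below shows how the register addresses and the five kinds of immediate can be extracted from these formats. The helper names are hypothetical. The bit positions follow the diagrams above and the RISC-V specification. Each immediate is sign-extended to 32 bits, except the U-type immediate, which simply occupies the upper 20 bits.

```c
#include <stdint.h>

/* Extract bits hi..lo (inclusive) of an instruction. */
static uint32_t bits(uint32_t inst, int hi, int lo) {
    return (inst >> lo) & ((1u << (hi - lo + 1)) - 1u);
}

/* Sign-extend the low `width` bits of `value` (width < 32). */
static int32_t sign_extend(uint32_t value, int width) {
    uint32_t sign = 1u << (width - 1);
    value &= (1u << width) - 1u;
    return (int32_t)((value ^ sign) - sign);
}

/* Register address fields, at the same position in every format that has them. */
static uint32_t rd_of(uint32_t inst)  { return bits(inst, 11, 7); }
static uint32_t rs1_of(uint32_t inst) { return bits(inst, 19, 15); }
static uint32_t rs2_of(uint32_t inst) { return bits(inst, 24, 20); }

/* Immediates, reassembled from the scattered fields of each format. */
static int32_t imm_i(uint32_t inst) { return sign_extend(bits(inst, 31, 20), 12); }
static int32_t imm_s(uint32_t inst) {
    return sign_extend((bits(inst, 31, 25) << 5) | bits(inst, 11, 7), 12);
}
static int32_t imm_b(uint32_t inst) {   /* bit 0 is always 0 */
    return sign_extend((bits(inst, 31, 31) << 12) | (bits(inst, 7, 7) << 11) |
                       (bits(inst, 30, 25) << 5)  | (bits(inst, 11, 8) << 1), 13);
}
static uint32_t imm_u(uint32_t inst) { return inst & 0xFFFFF000u; }
static int32_t imm_j(uint32_t inst) {   /* bit 0 is always 0 */
    return sign_extend((bits(inst, 31, 31) << 20) | (bits(inst, 19, 12) << 12) |
                       (bits(inst, 20, 20) << 11) | (bits(inst, 30, 21) << 1), 21);
}
```

Here are some example encodings and their immediates:

- addi x1, x0, -1 encodes as 0xFFF00093, and imm_i returns -1.
- sw x2, 8(x1) encodes as 0x0020A423, and imm_s returns 8.
- beq x0, x0, -4 encodes as 0xFE000EE3, and imm_b returns -4.
- lui x1, 0x12345 encodes as 0x123450B7, and imm_u returns 0x12345000.
- jal x0, -4 encodes as 0xFFDFF06F, and imm_j returns -4.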
Since each instruction behaves differently, control signals are needed to select the circuit paths. The following are the signals used in this CPU.
A lookup table is required to decode each instruction and determine all of these signals.
Take the add instruction as an example of how the lookup table works. The bit pattern of add is b0000000??????????000?????0110011, where ? marks a don't-care bit. The CPU matches the fetched instruction against the patterns in the table, identifies which instruction it is, and gets back the corresponding set of control signals.
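The matching step can be sketched in C as a bit-by-bit comparison that skips the ? positions. It uses the same "b...?..." string format as Chisel's BitPat. The table is a hypothetical three-entry subset, and its signal names (exe_fun, op2_sel, reg_write) are illustrative rather than the ones in ChiselRiscV. The real table covers every supported instruction and produces more signals.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if `inst` matches a pattern such as "b0000000??????????000?????0110011":
 * an optional leading 'b', then 32 characters, MSB first, '?' = don't care. */
static bool bitpat_matches(const char *pattern, uint32_t inst) {
    if (pattern[0] == 'b') pattern++;
    if (strlen(pattern) != 32) return false;
    for (int i = 0; i < 32; i++) {
        char c = pattern[i];
        uint32_t bit = (inst >> (31 - i)) & 1u;
        if (c == '?') continue;
        if ((c == '1' ? 1u : 0u) != bit) return false;
    }
    return true;
}

/* Hypothetical control signals (a tiny subset, for illustration only). */
typedef enum { EXE_ADD, EXE_SUB } ExeFun;
typedef enum { OP2_RS2, OP2_IMM_I } Op2Sel;

typedef struct {
    const char *name;
    const char *pattern;
    ExeFun exe_fun;   /* which ALU operation to perform */
    Op2Sel op2_sel;   /* second ALU operand: rs2 or the I-type immediate */
    bool reg_write;   /* write the result back to rd */
} DecodeEntry;

static const DecodeEntry DECODE_TABLE[] = {
    {"add",  "b0000000??????????000?????0110011", EXE_ADD, OP2_RS2,   true},
    {"sub",  "b0100000??????????000?????0110011", EXE_SUB, OP2_RS2,   true},
    {"addi", "b?????????????????000?????0010011", EXE_ADD, OP2_IMM_I, true},
};

/* Return the first matching entry, or NULL for an instruction not in the table. */
static const DecodeEntry *decode(uint32_t inst) {
    size_t n = sizeof DECODE_TABLE / sizeof DECODE_TABLE[0];
    for (size_t i = 0; i < n; i++)
        if (bitpat_matches(DECODE_TABLE[i].pattern, inst)) return &DECODE_TABLE[i];
    return NULL;
}
```

With this table, decode(0x002081B3), which is add x3, x1, x2, returns the add entry. An instruction with no entry, such as a store, returns NULL.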
Depending on the instruction format and usage, the operation is sometimes performed not on two registers but on a register and an immediate value.
After op1Data and op2Data are determined, they are sent to the ALU, where the control signal determines which operation is executed.
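As a behavioral sketch in C, operand selection becomes a multiplexer and the ALU becomes a switch over the ALU function. The names are hypothetical, and this is not the author's Chisel ALU. Shifts use only the low 5 bits of op2, as RV32I requires. SRA fills the sign bit explicitly, so the result does not depend on how the C compiler shifts negative numbers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ALU function codes. */
typedef enum {
    ALU_ADD, ALU_SUB, ALU_AND, ALU_OR, ALU_XOR,
    ALU_SLL, ALU_SRL, ALU_SRA, ALU_SLT, ALU_SLTU
} AluFun;

/* Second operand: rs2 for register-register ops, else the sign-extended immediate. */
static uint32_t select_op2(bool use_imm, uint32_t rs2_data, int32_t imm) {
    return use_imm ? (uint32_t)imm : rs2_data;
}

static uint32_t alu(AluFun fun, uint32_t op1, uint32_t op2) {
    uint32_t shamt = op2 & 0x1Fu;   /* RV32 shifts use the low 5 bits only */
    switch (fun) {
    case ALU_ADD:  return op1 + op2;
    case ALU_SUB:  return op1 - op2;
    case ALU_AND:  return op1 & op2;
    case ALU_OR:   return op1 | op2;
    case ALU_XOR:  return op1 ^ op2;
    case ALU_SLL:  return op1 << shamt;
    case ALU_SRL:  return op1 >> shamt;
    case ALU_SRA:  /* copy the sign bit into the vacated upper bits */
        return (op1 & 0x80000000u) ? (op1 >> shamt) | ~(0xFFFFFFFFu >> shamt)
                                   : op1 >> shamt;
    case ALU_SLT:  return (int32_t)op1 < (int32_t)op2 ? 1u : 0u;
    case ALU_SLTU: return op1 < op2 ? 1u : 0u;
    }
    return 0;
}
```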
B-format instructions also require a branch comparator.
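Because B- and J-format instructions (and jalr) may jump to another address, pcReg must be updated accordingly. For jal and jalr, the jump address is written directly to pcReg. B-format instructions work the same way, except that the branch address is written only when the branch comparator reports that the branch is taken.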
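The branch comparator and the jump targets can be sketched in C as follows. The function names are hypothetical. The funct3 values and target formulas follow the RISC-V specification, including the rule that jalr clears the lowest bit of rs1 + imm.

```c
#include <stdbool.h>
#include <stdint.h>

/* Branch comparator: is the B-type branch selected by funct3 taken? */
static bool branch_taken(uint32_t funct3, uint32_t rs1, uint32_t rs2) {
    switch (funct3) {
    case 0x0: return rs1 == rs2;                    /* beq  */
    case 0x1: return rs1 != rs2;                    /* bne  */
    case 0x4: return (int32_t)rs1 <  (int32_t)rs2;  /* blt  (signed)   */
    case 0x5: return (int32_t)rs1 >= (int32_t)rs2;  /* bge  (signed)   */
    case 0x6: return rs1 <  rs2;                    /* bltu (unsigned) */
    case 0x7: return rs1 >= rs2;                    /* bgeu (unsigned) */
    default:  return false;                         /* not a branch funct3 */
    }
}

/* Target addresses written to pcReg. */
static uint32_t branch_target(uint32_t pc, int32_t imm_b) { return pc + (uint32_t)imm_b; }
static uint32_t jal_target(uint32_t pc, int32_t imm_j)    { return pc + (uint32_t)imm_j; }
static uint32_t jalr_target(uint32_t rs1, int32_t imm_i)  {
    return (rs1 + (uint32_t)imm_i) & ~1u;           /* lowest bit cleared */
}
```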
When the CPU executes ecall, the trap handler must be invoked. Since this CPU only implements M-mode (Machine mode), when an ecall occurs, the value of mtvec is written to pcReg so that the CPU jumps to the trap handler.
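Putting the cases together, the value written to pcReg can be modeled as one selection. This is a hypothetical C sketch: PcControl and next_pc are illustrative names. At most one flag is set for a given instruction, so the order of the checks does not change the result.

On a trap, the privileged specification also saves the trapping pc in mepc and the cause in mcause. This sketch, like the description above, only models the jump. It clears the two MODE bits of mtvec to obtain the handler base address; in direct mode, that is simply the value of mtvec.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what the current instruction decided. */
typedef struct {
    bool     branch_taken;   /* B-type instruction whose condition holds */
    uint32_t branch_target;  /* pc + imm_b */
    bool     is_jump;        /* jal or jalr */
    uint32_t jump_target;    /* pc + imm_j, or (rs1 + imm_i) & ~1 */
    bool     is_ecall;
} PcControl;

/* Value written to pcReg at the end of the cycle. */
static uint32_t next_pc(uint32_t pc, const PcControl *c, uint32_t mtvec) {
    if (c->is_ecall)     return mtvec & ~3u;   /* trap handler base (MODE bits cleared) */
    if (c->is_jump)      return c->jump_target;
    if (c->branch_taken) return c->branch_target;
    return pc + 4;                             /* sequential execution */
}
```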
Load and store instructions must access data memory; in the original design, only lw and sw are involved in this stage.
For sw, the CPU accesses data memory at the address formed by the register value and the immediate (rs1Data + immS). For lw, the address is rs1Data + immI.
The memWen signal is true only for sw. In other words, only sw writes to data memory, while lw only reads from it.
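Here is a minimal C sketch of the two accesses. It assumes a byte-addressed, little-endian data memory. exec_lw and exec_sw are hypothetical names, and the caller must keep addresses in range.

```c
#include <stdint.h>

/* lw: read the little-endian word at rs1 + imm_i (memory is only read). */
static uint32_t exec_lw(const uint8_t *dmem, uint32_t rs1, int32_t imm_i) {
    uint32_t addr = rs1 + (uint32_t)imm_i;
    return (uint32_t)dmem[addr] | ((uint32_t)dmem[addr + 1] << 8) |
           ((uint32_t)dmem[addr + 2] << 16) | ((uint32_t)dmem[addr + 3] << 24);
}

/* sw: the only access with memWen set; write rs2 at rs1 + imm_s, low byte first. */
static void exec_sw(uint8_t *dmem, uint32_t rs1, int32_t imm_s, uint32_t rs2) {
    uint32_t addr = rs1 + (uint32_t)imm_s;
    for (int i = 0; i < 4; i++) dmem[addr + i] = (uint8_t)(rs2 >> (8 * i));
}
```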
Except for lw, sw, and J- and B-format instructions, the remaining base instructions write aluOut back to the destination register when regFileWen === RenS. lw, jal and jalr also write back to a register. However, lw writes the data loaded from memory, while jal and jalr write back pcPlus4 instead of aluOut. sw and B-format instructions do nothing in this stage.
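The write-back choice amounts to a multiplexer driven by wbSel. In the C sketch below, the WbSel values are hypothetical labels for the cases described above. A CSR case is added because CSR instructions also write the old CSR value to rd.

```c
#include <stdint.h>

/* Hypothetical labels for the write-back sources. */
typedef enum { WB_ALU, WB_MEM, WB_PC, WB_CSR } WbSel;

/* Value written to rd (only used when the register-write enable is set). */
static uint32_t write_back_value(WbSel sel, uint32_t alu_out, uint32_t mem_data,
                                 uint32_t pc, uint32_t csr_rdata) {
    switch (sel) {
    case WB_MEM: return mem_data;    /* lw: data loaded from memory */
    case WB_PC:  return pc + 4;      /* jal / jalr: pcPlus4 (return address) */
    case WB_CSR: return csr_rdata;   /* CSR instructions: old CSR value */
    case WB_ALU:
    default:     return alu_out;     /* arithmetic and logic instructions */
    }
}
```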
CSR instructions are atomic: they cannot be divided into separate steps.
In other words, a CSR instruction reads and writes the same CSR within a single instruction. Take csrrw ("Atomic Read/Write CSR") as an example. csrrw reads the old value of the CSR and writes it to rd, while the value in rs1 is written to the CSR.
For csrrs, s stands for set: the operand is ORed with the CSR value.
For csrrc, c stands for clear: the operand is inverted and ANDed with the CSR value.
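Here is a C sketch of the three CSR operations; CsrCmd and csr_exec are hypothetical names. A single call returns the old value (for rd) and also writes the new one, mirroring the read-and-write-together behavior described above. According to the specification, csrrs and csrrc with rs1 = x0 do not write the CSR at all. That difference only matters for side effects, because ORing with 0 or clearing with 0 leaves the value unchanged.

```c
#include <stdint.h>

typedef enum { CSR_RW, CSR_RS, CSR_RC } CsrCmd;

/* Apply csrrw / csrrs / csrrc to *csr using the rs1 value `operand`.
 * Returns the old CSR value, which is written to rd. */
static uint32_t csr_exec(CsrCmd cmd, uint32_t *csr, uint32_t operand) {
    uint32_t old = *csr;
    switch (cmd) {
    case CSR_RW: *csr = operand;         break;  /* write */
    case CSR_RS: *csr = old | operand;   break;  /* set bits */
    case CSR_RC: *csr = old & ~operand;  break;  /* clear bits */
    }
    return old;
}
```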
exit is the end signal. When the instruction is unimp, the exit signal is raised high. Once the test detects that exit is high, the whole CPU test finishes.
Through my research, I found that this CPU lacks some basic RV32I instructions and also needs the M-extension instructions.
The missing instructions are lh, lhu, lb, lbu, sh and sb. To add them to the CPU, the first step is to modify the decoder. Hence, I added the bit patterns of these instructions and adjusted the lookup table and some of the control signals.
Originally, the memWen signal is true only when the instruction is sw. After adding sh and sb, this signal is expanded to four states so that it indicates how many bytes should be stored.
wbSel is also expanded to 8 states, so that each load instruction has its own write-back behavior.
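The extra load and store behaviors can be sketched in C as follows. The names are hypothetical, memory is byte-addressed and little-endian, and there are no alignment or range checks. funct3 selects the width. lb and lh sign-extend, while lbu and lhu zero-extend. This is one way to picture what the expanded memWen and wbSel states have to distinguish, not the author's implementation.

```c
#include <stdint.h>

/* Read `nbytes` bytes starting at `addr` as a little-endian value. */
static uint32_t read_le(const uint8_t *mem, uint32_t addr, int nbytes) {
    uint32_t v = 0;
    for (int i = 0; i < nbytes; i++) v |= (uint32_t)mem[addr + i] << (8 * i);
    return v;
}

/* lb, lh, lw, lbu, lhu: funct3 selects width and signedness (RV32I encoding). */
static uint32_t mem_load(const uint8_t *mem, uint32_t addr, uint32_t funct3) {
    switch (funct3) {
    case 0x0: return (uint32_t)(int32_t)(int8_t)read_le(mem, addr, 1);  /* lb: sign-extend  */
    case 0x1: return (uint32_t)(int32_t)(int16_t)read_le(mem, addr, 2); /* lh: sign-extend  */
    case 0x2: return read_le(mem, addr, 4);                             /* lw               */
    case 0x4: return read_le(mem, addr, 1);                             /* lbu: zero-extend */
    case 0x5: return read_le(mem, addr, 2);                             /* lhu: zero-extend */
    default:  return 0;                                                 /* not a load       */
    }
}

/* sb, sh, sw: write only the low 1, 2 or 4 bytes of rs2. */
static void mem_store(uint8_t *mem, uint32_t addr, uint32_t funct3, uint32_t rs2) {
    int nbytes;
    switch (funct3) {
    case 0x0: nbytes = 1; break;   /* sb */
    case 0x1: nbytes = 2; break;   /* sh */
    case 0x2: nbytes = 4; break;   /* sw */
    default:  return;              /* not a store: memory untouched */
    }
    for (int i = 0; i < nbytes; i++) mem[addr + i] = (uint8_t)(rs2 >> (8 * i));
}
```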
Adding the M-extension instructions is similar to adding the load and store instructions: their bit patterns must be added and the lookup table modified.
For the M-extension instructions, it is sufficient to extend the ALU. The remaining control signals are the same as for the add instruction.
For division, you have to be careful about division by zero. The specification also defines the result for this case: "The quotient of division by zero has all bits set, and the remainder of division by zero equals the dividend."
(from Implemented M-extension of RISCV)
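A C reference model of the RV32M results may help when checking the extended ALU. The function names are hypothetical. Besides division by zero, the model also covers the signed-overflow case defined by the specification: dividing -2^31 by -1 gives a quotient of -2^31 and a remainder of 0. C integer division truncates toward zero, which matches RISC-V.

```c
#include <stdint.h>

/* mul: low 32 bits of the product (same for signed and unsigned). */
static uint32_t mul(uint32_t a, uint32_t b) { return a * b; }

/* mulh / mulhsu / mulhu: high 32 bits of the 64-bit product. */
static uint32_t mulh(uint32_t a, uint32_t b) {          /* signed x signed */
    int64_t p = (int64_t)(int32_t)a * (int64_t)(int32_t)b;
    return (uint32_t)((uint64_t)p >> 32);
}
static uint32_t mulhsu(uint32_t a, uint32_t b) {        /* signed x unsigned */
    int64_t p = (int64_t)(int32_t)a * (int64_t)b;
    return (uint32_t)((uint64_t)p >> 32);
}
static uint32_t mulhu(uint32_t a, uint32_t b) {         /* unsigned x unsigned */
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

/* div / rem with the special cases from the specification. */
static uint32_t div_s(uint32_t a, uint32_t b) {
    if (b == 0) return 0xFFFFFFFFu;                     /* quotient: all bits set */
    if (a == 0x80000000u && b == 0xFFFFFFFFu) return a; /* overflow: -2^31 / -1 */
    return (uint32_t)((int32_t)a / (int32_t)b);
}
static uint32_t rem_s(uint32_t a, uint32_t b) {
    if (b == 0) return a;                               /* remainder = dividend */
    if (a == 0x80000000u && b == 0xFFFFFFFFu) return 0;  /* overflow */
    return (uint32_t)((int32_t)a % (int32_t)b);
}
static uint32_t div_u(uint32_t a, uint32_t b) { return b == 0 ? 0xFFFFFFFFu : a / b; }
static uint32_t rem_u(uint32_t a, uint32_t b) { return b == 0 ? a : a % b; }
```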
riscv-arch-test is an evolving set of tests created to help ensure that software written for a given RISC-V Profile/Specification will run on all implementations that comply with that profile. The older 2.x versions of the framework are based on Makefiles, while the current version 3.10, which I adopted, uses RISCOF as its base system.
RISCOF (The RISC-V Compatibility Framework) is a Python-based framework that enables testing of a RISC-V target (hard or soft implementations) against a standard RISC-V golden reference model using a suite of RISC-V architectural assembly tests.
RISCOF generates standard pre-built templates for DUTs and reference models via the setup command, as shown below:
The above command will generate the following files and directories in the current directory:
The generated config.ini template looks something like this by default:
Before running RISCOF, you should fill in config.ini with the paths of the files that describe your hardware model, such as the plugin, ispec and pspec.
A typical DUT plugin directory has the following structure:
The Python plugin files capture the behavior of the model: compiling tests, executing them on the DUT and finally extracting the signature for each test.
The YAML specs in the DUT plugin directory are the most important inputs to the RISCOF framework. All decisions about filtering tests depend on these YAML files. The files must follow the syntax/format specified by riscv-config, and RISCOF validates them using riscv-config.
The env folder can also contain other plugin-specific files needed for pre- and post-processing of logs, signatures, ELFs, etc.
For ChiselRiscV, the input is a .hex file compiled from a .c file with riscv-gnu-toolchain. I use the following command to compile the test program.
It produces a .hex file as the input to the hardware and a dump file for debugging.
The test command is the following.
I added the argument -DprogramFile to specify where the .hex file is, and the loadMemoryFromFileInline function then loads it into mem.
In RISCOF, a test compares the signature, a region of the program's .data section, produced by the DUT with the one produced by the reference model to verify whether the hardware behaves correctly. Therefore, the DUT must be able to print out memory contents. However, Chisel has no function for writing data to a file, like $fwrite in Verilog. So I used 2>&1 | tee output.stdout to save the terminal output and capture the memory data from it. This method was taken from nucleusrv.
The remaining task is to print the contents of a specified memory address. Here, I followed the MMIO approach and reserved two rarely used memory addresses as output addresses. After the value to print is stored at OutAddr, storing 1 to PrintAddr makes the hardware print the value at OutAddr to the terminal.
At the end of the test program, I add the following assembly code. It reads the begin_signature and end_signature addresses, then prints the values in this memory region one by one using the method described above.
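The following small C simulation illustrates the MMIO print mechanism together with the signature dump. OUT_ADDR, PRINT_ADDR, the memory size and the Machine type are all hypothetical, because the report does not give the actual addresses. Printing goes into a buffer instead of the terminal. Each word is printed as eight lowercase hex digits per line, which is assumed here to match the signature layout that RISCOF compares.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical values: the report does not give the real addresses. */
#define MEM_SIZE   0x4000u
#define OUT_ADDR   0x3FF8u   /* word to print */
#define PRINT_ADDR 0x3FFCu   /* storing 1 here triggers the print */

typedef struct {
    uint8_t mem[MEM_SIZE];
    char    log[1024];       /* stands in for the terminal output */
    size_t  log_len;
} Machine;

static uint32_t load_word(const Machine *m, uint32_t addr) {
    if (addr > MEM_SIZE - 4) return 0;          /* out of range: read as 0 */
    return (uint32_t)m->mem[addr] | ((uint32_t)m->mem[addr + 1] << 8) |
           ((uint32_t)m->mem[addr + 2] << 16) | ((uint32_t)m->mem[addr + 3] << 24);
}

static void store_word(Machine *m, uint32_t addr, uint32_t value) {
    if (addr > MEM_SIZE - 4) return;            /* out of range: ignored */
    for (int i = 0; i < 4; i++) m->mem[addr + i] = (uint8_t)(value >> (8 * i));
    if (addr == PRINT_ADDR && value == 1u) {    /* MMIO: print the word at OUT_ADDR */
        size_t room = sizeof m->log - m->log_len;
        int n = snprintf(m->log + m->log_len, room, "%08x\n",
                         (unsigned)load_word(m, OUT_ADDR));
        if (n > 0 && (size_t)n < room) m->log_len += (size_t)n;
    }
}

/* What the appended assembly does: for each word in [begin, end),
 * copy it to OUT_ADDR, then store 1 to PRINT_ADDR. */
static void dump_signature(Machine *m, uint32_t begin, uint32_t end) {
    for (uint32_t addr = begin; addr < end; addr += 4) {
        store_word(m, OUT_ADDR, load_word(m, addr));
        store_word(m, PRINT_ADDR, 1u);
    }
}
```

For example, if memory holds 0xDEADBEEF at 0x100 and 0x12345678 at 0x104, then dump_signature(&m, 0x100, 0x108) leaves the two lines deadbeef and 12345678 in m.log.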
This test is modified from Quiz2 Problem A
The test result is 0xffffffc1, which equals -63.
This test is modified from Quiz2 Problem D
The test result is 0x0000005b, which equals 91.
This test is modified from Quiz4 Problem A
The test result is 0x0000a7c6, which equals 42950.
If you run the following C program, you will get the same result. The program implements the following equation.