This is the term project of NCKU Computer Architecture (Fall 2023), contributed by Kuanch.
This project adapts riscv-mini to support part of RV32M and adds a new 4-stage design.
Go to Adaptions to see what was adapted.
riscv-mini is a simple 3-stage RISC-V pipeline written in Chisel. Its datapath diagram is the following:
riscv-mini provides tools to visualize the waveform, by
Since we are working on RV32M, it is crucial to see the improvement in multiplication. By producing the waveform of ./VTile tests/multiply.riscv.hex
, we get the following waveform:
It is also important to know the hierarchy shown in the waveform before we begin:
riscv-mini uses riscv-tools to build the project. Run
to build the toolchain.
After building the toolchain, compile the C program with
Dump the ELF file with
to keep only RV32 instructions.
Modify custom-bmark/Makefile
to match your C program
to produce the .vcd
and get the number of cycles. You will see something like
Simulation completed at time 2492 (cycle 249)
in output/main.out.
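When collecting cycle counts for several programs, reading the number out of the log by hand gets tedious. The following is a minimal sketch of a helper that extracts the cycle count from a line shaped like the one above; the function name `parse_cycle_count` is hypothetical and the line format is assumed to be exactly the one quoted.

```c
#include <stdio.h>

/* Extracts the cycle count from a line such as
 * "Simulation completed at time 2492 (cycle 249)".
 * Returns -1 when the line does not have that shape. */
long parse_cycle_count(const char *line)
{
    long time_units = 0;
    long cycles = 0;

    if (sscanf(line, "Simulation completed at time %ld (cycle %ld)",
               &time_units, &cycles) == 2) {
        return cycles;
    }
    return -1;
}
```

Feeding it each line of output/main.out and keeping the first non-negative result would be one way to fill in a results table.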
Assignment 3 covered a single-cycle design, so how the multi-stage pipeline is built is still unclear. It is necessary to look into it before we really dive into the design.
All the code below is defined in src/main/scala/mini/Datapath.scala
.
The pipeline registers are easy to find in the Datapath file. They are defined as follows:
and
The pipelining works with
and
Now it is clear where to find the instructions between stages, which is important for understanding and debugging.
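As a rough software analogy of what a pipeline register does on each clock edge, here is a minimal sketch. It is not the Chisel code from Datapath.scala: the type `stage_reg_t`, the function `stage_reg_next`, and the `stall`/`kill` inputs are hypothetical names, and the policy (hold on stall, latch a NOP on kill) is an assumption about a typical fetch/execute register.

```c
#include <stdbool.h>
#include <stdint.h>

#define NOP_INST 0x00000013u /* addi x0, x0, 0 */

/* Hypothetical stand-in for a fetch/execute pipeline register. */
typedef struct {
    uint32_t pc;
    uint32_t inst;
} stage_reg_t;

/* Value of the register after one clock edge.
 * stall: keep the current contents.
 * kill:  latch a NOP instead of the fetched instruction (a bubble). */
stage_reg_t stage_reg_next(stage_reg_t cur, uint32_t fetched_pc,
                           uint32_t fetched_inst, bool stall, bool kill)
{
    stage_reg_t next;

    if (stall) {
        return cur;
    }
    next.pc = fetched_pc;
    next.inst = kill ? NOP_INST : fetched_inst;
    return next;
}
```

Watching the `inst` field of each such register in the waveform is what lets us follow a single instruction from stage to stage.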
The following behavior is defined in src/main/scala/mini/Control.scala
; the Decode stage is also fused into Control.scala
, where signals like
A_sel
, B_sel
and alu_op
are fed into the ALU.
WriteBack
DMem Accessing
Unless otherwise specified, the adaptions below pass all the tests when running
under the sbt
environment. The tests fall into 2 parts:
1. unit tests
ALUTests, BrCondTests, ImmGenTests, CSRTests, CacheTests, DatapathTests
2. integrated tests
CoreSimpleTests, CoreISATests, CoreBmarkTests (median, multiply, qsort, towers, vvadd)
TL;DR: You can see all the commits on the mul_in_alu branch on GitHub.
There are several places that need care in order to comply with RV32M:
1. Instruction and Opcode
Before starting, we need to define the format in instructions.scala
as follows:
and also, in Opcode.scala
, define Funct3
and Funct7
for MUL
, which will be used later to identify the control signals and for testing.
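To make the bit pattern concrete: in the RISC-V specification, MUL is an R-type instruction with funct7 = 0000001, funct3 = 000 and opcode = 0110011. A minimal sketch of the same match in C follows; the macro and function names are hypothetical, and the register fields are treated as don't-care bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits that identify the instruction: funct7 [31:25], funct3 [14:12], opcode [6:0]. */
#define MUL_MASK  0xfe00707fu
/* funct7 = 0000001, funct3 = 000, opcode = 0110011. */
#define MUL_MATCH 0x02000033u

/* True when the 32-bit word encodes MUL, whatever rd, rs1 and rs2 are. */
bool is_mul(uint32_t inst)
{
    return (inst & MUL_MASK) == MUL_MATCH;
}
```

The mask plays the role of the `?` positions in a Chisel BitPat: every bit cleared in `MUL_MASK` is ignored by the comparison.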
2. Control
riscv-mini defines the behavior of each kind of instruction in Control.scala
in advance. The format is like the following, with the meaning of each signal tagged above it:
For MUL
, we add
Its next PC is PC+4, RS2 is a register instead of an immediate, and it is not a load, store or CSR instruction, but it requires a writeback (wb_en=Y).
3. ALU
The primary modification is in AluArea
: by mapping io.alu_op
in the Mux tree, it returns the corresponding calculation:
Since this is a 32-bit system, we implement a half-precision multiplier by
which multiplies only the lower 16 bits of each operand, ensuring that the result fits within the 32-bit width.
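The effect of the half-precision multiplier can be modelled in a few lines of C. This is a sketch under assumptions, not the Chisel code: operands are treated as unsigned, and the names `mul_low16` and `mul_rv32m` are hypothetical. The second function is what the RV32M specification defines for MUL (the low 32 bits of the full product) and is included only for comparison.

```c
#include <stdint.h>

/* Half-precision multiply: only the low 16 bits of each operand are used,
 * so the product is at most 0xffff * 0xffff = 0xfffe0001 and fits in 32 bits. */
uint32_t mul_low16(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xffffu;
    uint32_t b_lo = b & 0xffffu;
    return a_lo * b_lo;
}

/* Reference: RV32M MUL keeps the low 32 bits of the full product. */
uint32_t mul_rv32m(uint32_t a, uint32_t b)
{
    return a * b; /* unsigned arithmetic wraps modulo 2^32 */
}
```

The two functions agree whenever both operands fit in 16 bits. They differ once an operand is wider, e.g. `mul_low16(0x10000, 2)` is 0 while `mul_rv32m(0x10000, 2)` is 0x20000, which is why this design covers only part of RV32M.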
4. ALUTests and TestUtils
There are 50 test cases already prepared in src/test/scala/mini/TestUtils.scala
; the corresponding Funct7
, Funct3
and Opcode
are determined in advance for testing the instructions, and the registers are generated randomly.
We integrate a mul
test case like
And in src/test/scala/mini/ALUTests.scala
,
is added to perform a half-precision multiplication.
TL;DR: You can see all the mentioned files in this commit on GitHub.
In this section, we verify the work above with a C program, the object file information and the waveform, respectively.
I wrote a simple C program for testing, as follows:
Then we compile it, dump the ELF, and convert it into a hex file:
Finally, we can find 02f707b3
in the .hex file.
It will be fetched by the riscv-mini design; we can verify this by running Verilator to produce the waveform.
It is clear that 02f707b3
is really executed in the design.
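To confirm that 02f707b3 is indeed a MUL, the word can be split into the standard R-type fields. The sketch below uses hypothetical names (`rtype_t`, `decode_rtype`); the field positions are those of the RISC-V base instruction format.

```c
#include <stdint.h>

/* Fields of an R-type instruction. */
typedef struct {
    uint32_t opcode; /* bits  6:0  */
    uint32_t rd;     /* bits 11:7  */
    uint32_t funct3; /* bits 14:12 */
    uint32_t rs1;    /* bits 19:15 */
    uint32_t rs2;    /* bits 24:20 */
    uint32_t funct7; /* bits 31:25 */
} rtype_t;

rtype_t decode_rtype(uint32_t inst)
{
    rtype_t f;
    f.opcode = inst & 0x7fu;
    f.rd     = (inst >> 7) & 0x1fu;
    f.funct3 = (inst >> 12) & 0x7u;
    f.rs1    = (inst >> 15) & 0x1fu;
    f.rs2    = (inst >> 20) & 0x1fu;
    f.funct7 = (inst >> 25) & 0x7fu;
    return f;
}
```

For `decode_rtype(0x02f707b3)` this gives opcode 0x33, funct3 0 and funct7 1, with rd = x15, rs1 = x14 and rs2 = x15. In ABI names that is `mul a5, a4, a5`.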
The following results were obtained by building the ELF with
As you can see, the number of cycles drops sharply, and it grows much more slowly with the number of multiplications:
#cycles | rv32i | support M |
---|---|---|
1 mul | 832 | 249 |
10 mul | 3470 | 342 |
100 mul | 29931 | 1152 |
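The difference in growth can be quantified from the table itself by dividing the extra cycles by the extra multiplications between two rows. A small sketch (the function name `marginal_cycles` is hypothetical):

```c
/* Average number of extra cycles per additional multiplication
 * between two measurements taken from the table. */
double marginal_cycles(long cycles_lo, long cycles_hi,
                       long muls_lo, long muls_hi)
{
    return (double)(cycles_hi - cycles_lo) / (double)(muls_hi - muls_lo);
}
```

Between the 1 mul and 100 mul rows, `marginal_cycles(832, 29931, 1, 100)` is (29931 - 832) / 99, about 294 cycles per extra multiplication for rv32i, while `marginal_cycles(249, 1152, 1, 100)` is (1152 - 249) / 99, about 9 cycles with M support.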
We use the following code to test "support M"
and the following code for "rv32i"
TL;DR: You can see all the commits on the 4stages branch on GitHub.
In this section, I would like to separate the MEM stage, since the fused MEM+WB stage is now the least efficient part of riscv-mini, especially as the original architecture supports RV32I only.
To better understand the architecture and smooth the path to the 4-stage design, I manually labeled some of the signals in the diagram with their variable names:
You can find exactly the corresponding variables in Datapath.scala
, which helps us divide the stages.
1. Pipeline Register
The first step is to separate the registers for Memory and WriteBack.
em_reg
keeps the same interface as ew_reg
for now; an alternative is to move the CSR to EXE, as some of the materials suggest.
We create a new pipeline register mw_reg
to pass the writeback state: mw_reg.alu
simply receives em_reg.alu
, mw_reg.wdata
receives the data read from memory, e.g. for lw
, and since CSR instructions such as CSRRW
, CSRRS
, or CSRRC
always write back to the register file, we also pipeline the CSR output into mw_reg
.
And we pipeline it with
2. WriteBack Registers
There are 4 inputs to the Mux at the WriteBack stage. I pipelined them into mw_reg
and wired them into the corresponding ports
and the Mux at the WB stage
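The selection done by the WB-stage Mux can be sketched in C as below. This is an illustration, not the Chisel code: the four inputs are assumed to be the ALU result, the memory read data, PC+4 and the CSR output, and the type `mw_reg_t`, the fields `pc` and `csr_out`, and the function `writeback_value` are hypothetical names (`alu` and `wdata` follow the text above).

```c
#include <stdint.h>

/* Assumed writeback sources, selected by wb_sel. */
typedef enum { WB_ALU, WB_MEM, WB_PC4, WB_CSR } wb_sel_t;

/* Hypothetical snapshot of the MEM/WB pipeline register. */
typedef struct {
    uint32_t pc;      /* PC of the instruction, used for jump-and-link */
    uint32_t alu;     /* ALU result passed on from em_reg.alu */
    uint32_t wdata;   /* data read from memory, e.g. for lw */
    uint32_t csr_out; /* value read from the CSR file */
    wb_sel_t wb_sel;
} mw_reg_t;

/* Value written to the register file in the WB stage. */
uint32_t writeback_value(const mw_reg_t *r)
{
    switch (r->wb_sel) {
    case WB_MEM: return r->wdata;
    case WB_PC4: return r->pc + 4u;
    case WB_CSR: return r->csr_out;
    case WB_ALU:
    default:     return r->alu;
    }
}
```

Because every input of the Mux now comes out of `mw_reg`, the value chosen always belongs to the instruction that is actually in the WB stage.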
3. Bypass
In the original 3-stage design, only the WB->EXE
bypass exists, which forwards em_reg.alu
back to rs1
and rs2
. After separating the Memory stage from the WB stage, a new WB->EXE
bypass must be added.
MEM->EXE
(WB->EXE
in the 3-stage design)
WB->EXE
(new)
After the above adaptions, this is how the design looks:
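To make the two bypass paths concrete, here is a minimal C sketch of how an EXE-stage operand could be chosen. It is an illustration under assumptions, not the code in Datapath.scala: the type `fwd_src_t`, its fields and the function names are hypothetical, and `forwardable` stands in for whatever wb_sel check the design applies before forwarding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what a later stage is about to write back. */
typedef struct {
    bool     wb_en;       /* the instruction writes the register file */
    bool     forwardable; /* stands in for the wb_sel check */
    uint32_t rd;          /* destination register index */
    uint32_t value;       /* value that will be written to rd */
} fwd_src_t;

static bool fwd_hit(uint32_t rs, fwd_src_t s)
{
    /* x0 is hard-wired to zero and must never be forwarded. */
    return rs != 0 && s.wb_en && s.forwardable && s.rd == rs;
}

/* Operand seen by EXE for source register rs.
 * MEM->EXE has priority because it holds the younger producer. */
uint32_t forward_operand(uint32_t rs, uint32_t regfile_value,
                         fwd_src_t mem_stage, fwd_src_t wb_stage)
{
    if (fwd_hit(rs, mem_stage)) {
        return mem_stage.value;
    }
    if (fwd_hit(rs, wb_stage)) {
        return wb_stage.value;
    }
    return regfile_value;
}
```

The ordering matters: when both stages target the same register, the MEM-stage instruction is the more recent one in program order, so its result is the one EXE must see.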
(1/8): more tests pass after correcting wb_sel
in the bypass logic.
(1/7): calculations seem to be correct, although some CSR operations trigger abnormal behavior; the root cause remains unknown.
Latest Update: 1/8
Test | PassItem |
---|---|
ALUTests | all |
BrCondTests | all |
ImmGenTests | all |
CSRTests | all |
CacheTests | all |
DatapathTests | all |

Test | PassItem |
---|---|
CoreSimpleTests | all |
CoreISATests | all |
CoreBmarkTests | all |
TileSimpleTests | all |
TileISATests | 33/41 |
TileBmarkTests | 0/5 |
TileLargeBmarkTests | 0/5 |
We failed some of the integrated tests, but they are hard to debug because the tests are highly integrated: they are shipped as .hex files that are hard to read (issue: Test hexfile creation documentation), the pass conditions are unclear, etc. We therefore take notes on the debugging progress.
When we compile a C program as we did in Customized C Program run on riscv-mini, it returns TOHOST=1337
even with an empty main function. We further examined the .dump file and found the line
It is actually possible to trace the cause of the exception code. Interestingly, ChatGPT gives another idea:
Regarding the specific code "1337", without additional context, it's hard to determine its exact meaning. However, "1337" (or "leet" in leetspeak) is often used in programming and gaming cultures to signify expertise or to flag something as special or unusual. It's possible that this code is being used humorously or symbolically to indicate a unique or noteworthy state in the program.
We now track down the meaning of the 1337 code using the hex code.
By examining the code, we can write it as
or
At this step, we still could not figure out the root cause of the exception code 1337.
Some of them (qsort.riscv.hex, the customized program) cannot end normally; looking over the waveforms, they return TOHOST=1337.
We also investigated several simple tests like rv32ui-p-addi.hex
; the calculations seem fine, but the core enters traps.
* rv32ui-p-addi.hex (TOHOST=668)
(Three pairs of waveforms comparing 3 stages and 4 stages.)
but these still result in test failures (TOHOST=668). Surprisingly, when we run a more complicated test like median.riscv-large.hex
, the error is much less serious (TOHOST=1) and the test is judged as passing.
More inspection of riscv-tests is needed, e.g. how the tests are produced and what they target.
Lack of essential instruction coverage.
riscv-tests is a repository hosting unit tests for RISC-V processors.
Let's start from a really basic example. The following code defines an ISA test of ADD
:
With this kind of interface, it is easier to cover tests for similar instructions.
Tile*
things (updated at 1/30)
As you can see, all the Core
tests passed but most of the Tile
tests failed. We don't see anything related to Tile
in riscv-tests
, and it is difficult to compare them with the Core
tests since both are .hex
files with no C code or dump file.
The only way is to look further into Tile.scala
and TileTester.scala
:
In the code, NastiBundle
is introduced to simulate the I/O hardware outside the CPU, which interacts with the memory inside the CPU,
as you can see at lines 114 and 115.
The CPU cache interacts with the datapath through
Now it is clearer that TileTester.scala
tests how the CPU works with external memory, which is mimicked with NastiBundle
.
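According to ChatGPT-4 (note that in the Rocket Chip code base NASTI is usually read as "Not A STandard Interface", so the expansion quoted below is questionable):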
The NastiBundle is part of the NASTI (North American Slave and Master Interface) protocol, which is a specific implementation of the AXI (Advanced eXtensible Interface) protocol defined by ARM. The AXI protocol is widely used for high-speed data transfer between components in a system-on-chip (SoC), such as between processors, memory, and peripherals. NASTI is a Chisel implementation of this protocol, allowing for easy integration and interface definition in Chisel-based hardware designs.
And given:
So in TileTester.scala
, dpath.io.dcache is now indirectly connected to Tile.io.nasti.
(updated at 2/1)
When I remove the assertion in TileTester.scala
at
all the tests pass. The assertion refers to the exception code, with values including 27, 9, 7, 21, 19, 11 etc. It would be great to know the meaning of these codes, but again, the tests are all .hex
files, which makes it really hard to dig in.
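One possible reading of these codes, offered as an assumption rather than a confirmed diagnosis: in riscv-tests, a failing test reports `(TESTNUM << 1) | 1` through tohost, and a pass reports 1. If the .hex files were built with that convention, the odd values above can be decoded as in the sketch below (all type and function names are hypothetical).

```c
#include <stdint.h>

typedef enum { TOHOST_RUNNING, TOHOST_PASS, TOHOST_FAIL } tohost_kind_t;

typedef struct {
    tohost_kind_t kind;
    uint32_t test_num; /* meaningful only when kind == TOHOST_FAIL */
} tohost_result_t;

/* Decodes a tohost value, ASSUMING the riscv-tests convention:
 * 0 = nothing written yet, 1 = pass, otherwise (TESTNUM << 1) | 1. */
tohost_result_t decode_tohost(uint32_t tohost)
{
    tohost_result_t r = { TOHOST_RUNNING, 0 };

    if (tohost == 0) {
        return r;
    }
    if (tohost == 1) {
        r.kind = TOHOST_PASS;
        return r;
    }
    r.kind = TOHOST_FAIL;
    r.test_num = tohost >> 1;
    return r;
}
```

Under this assumption, 27, 9, 7, 21, 19 and 11 would point at test cases 13, 4, 3, 10, 9 and 5 of the respective ISA tests. It is also worth noting that 1337 >> 1 = 668, so the TOHOST=668 seen earlier may simply be the 1337 code printed after a one-bit shift.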
Error building binutils in riscv-gnu-toolchain target with GCC 4.9.4
RISC-V 初探 [A First Look at RISC-V] (building toolchain)
riscv-gnu-toolchain工具链-从下载到运行 [The riscv-gnu-toolchain: From Download to Running] (!!CSDN!!) (building toolchain)
Lab2: RISC-V RV32I[MACF] emulator with ELF support
Where to put CSR unit in 5-stage pipeline ? (reddit)
Extending lowRISC with new devices
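深入 AXI4 总线(一)握手机制 [A Deep Dive into the AXI4 Bus (1): The Handshake Mechanism] (知乎 / Zhihu)
浅谈AXI总线 [A Brief Discussion of the AXI Bus] (!!CSDN!!)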
Test hexfile creation documentation
Error information *** FAILED *** (tohost = 1337)
Using C++ Emulator fails when calling printf syscall from a RISC-V baremetal program