CA2022 Project: RISCV-Atom and implement RV32M
contributed by <鄒崴丞 StevenChou>, <王漢祺 WangHanChi>
Prerequisites
Install Dependency
- Git clone the source code
git clone https://github.com/saursin/riscv-atom.git
- Install git, make, python3, gcc & other tools
sudo apt install git python3 build-essential
- Install Verilator (version 5.002)
cd $HOME
git clone https://github.com/verilator/verilator
cd verilator
git checkout stable
export VERILATOR_ROOT=$(pwd)
autoconf
./configure
make
export VERILATOR_ROOT=$HOME/verilator
export PATH=$VERILATOR_ROOT/bin:$PATH
- Install GTKWave
sudo apt install gtkwave
- Install Screen
sudo apt install screen
- Install RISC-V GNU Toolchain
:warning: Before doing this, we need to add a file to riscv-atom.
The file is here, or you can do the operation here
cd riscv-atom
sudo chmod +x install-toolchain.sh
sudo ./install-toolchain.sh -x
or install from source: riscv-gnu-toolchain
- Install Doxygen
sudo apt install doxygen
- Install LaTeX-related packages
sudo apt -y install texlive-latex-recommended texlive-pictures texlive-latex-extra latexmk
- Install sphinx & other python dependencies
cd docs/ && pip install -r requirements.txt
- Install the socat package
sudo apt install socat
Building RISC-V Atom
- RISC-V Atom environment variables
cd riscv-atom
source sourceme
echo "source <YOUR - PATH>/sourceme" >> ~/.bashrc
- Building the Simulator
make soctarget=atombones
atomsim --help
:warning: Because our Verilator is installed from source instead of with sudo apt install verilator, we need to change the Verilator path in ~/riscv-atom/sim/makefile.
Fix a small error
:warning: Because there is a bug in the author's install-toolchain.sh, I installed riscv-gnu-toolchain from source, which provides the configure file in the riscv-gnu-toolchain directory. We can therefore modify line 60 of install-toolchain.sh accordingly.
Running Examples on AtomSim
Hello World Example
Switch to the examples directory
cd ~/riscv-atom/sw/examples
Compile with the RISC-V gcc cross-compiler to generate hello.elf in the hello-asm directory
make soctarget=atombones ex=hello-asm compile
Run the example
atomsim hello-asm/hello.elf
Alternatively, use make run to run the example
make soctarget=atombones ex=hello-asm run
The syntax is as follows:
make soctarget=<TARGET> ex=<EXAMPLE> compile
make soctarget=<TARGET> ex=<EXAMPLE> run
The Runexamples Script
Automatically compile and simulate all examples
atomsim-runexamples
make soctarget=atombones run-all
Using Atomsim Vuart
atomsim-gen-vports
screen $RVATOM/userport 9600
In another terminal
atomsim hello-asm/hello.elf --vuart=$RVATOM/simport
To close the screen session, press ctrl+a, type :quit and press Enter.
Test AtomSim (CPU core)
We can type atomsim --help to list all the options of the Atom simulator.
- Use -t to get the trace.vcd (fst graph) in ~/riscv-atom/
- Use -v to get detailed output
Analyze RISCV-Atom
2-Stage Pipeline
In RISC-V Atom (Core), the author says that the design was inspired by the Arm Cortex-M0+. The MICROCHIP DEVELOPER HELP website mentions that the M0+ minimizes its branch penalty thanks to its two-stage pipeline. Because this reduces accesses to Flash, it also reduces power consumption; Flash accesses usually account for a large share of a microcontroller's power consumption. The core can therefore work at ultra-low power.

This picture shows the pipeline of the Arm Cortex-M0+. The Decode stage is divided into Pre-decode and Main decode, a design that minimizes the branch penalty.

Atom inherits this advantage. As the figure above shows, it is divided into two stages: stage 1 mainly fetches instructions, and stage 2 is responsible for Decode, Execute and Write-back. This reduces the branch penalty to 1 cycle.
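As a rough back-of-the-envelope model (an illustration only, not a measurement of Atom), the effect of the branch penalty can be estimated as one cycle per instruction plus a fixed penalty per taken branch. The function name and the workload numbers below are hypothetical, and the model ignores pipeline fill and memory stalls.

```python
def estimated_cycles(instructions: int, taken_branches: int, branch_penalty: int) -> int:
    """Idealized cycle count: 1 cycle per instruction plus a fixed penalty per taken branch."""
    return instructions + taken_branches * branch_penalty


# Hypothetical workload: 1000 instructions, 150 of which are taken branches.
two_stage = estimated_cycles(1000, 150, branch_penalty=1)  # penalty of 1, as described for Atom
deeper = estimated_cycles(1000, 150, branch_penalty=2)     # assumed deeper pipeline, for comparison
print(two_stage, deeper)
```

The difference between the two numbers grows linearly with the number of taken branches, which is why a short pipeline helps branch-heavy microcontroller code.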
Design motivation
Putting Decode in stage 2 reduces the use of the register file, which in turn reduces LUT usage. A LUT (lookup table) is a common building block of an FPGA. It behaves like a small ROM: the results of a logic function are stored in it in advance, and the input signals are used as the address to look up the stored result.
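The "LUT as a ROM" idea can be sketched in a few lines. This is a software model for illustration, not FPGA vendor code; the helper names `make_lut3` and `lut3_read` are made up for this example.

```python
def make_lut3(func):
    """Precompute the 8-entry truth table of a 3-input logic function (the LUT contents)."""
    return [func((addr >> 2) & 1, (addr >> 1) & 1, addr & 1) & 1 for addr in range(8)]


def lut3_read(table, a, b, c):
    """Use the three input bits as the address into the precomputed table."""
    return table[(a << 2) | (b << 1) | c]


# Example function: the carry-out of a full adder (majority of three bits).
carry_lut = make_lut3(lambda a, b, c: (a & b) | (a & c) | (b & c))
print(carry_lut)
print(lut3_read(carry_lut, 1, 0, 1))
```

Any 3-input function fits in the same 8 entries; only the stored table changes, not the lookup logic.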

LUT usage can be reduced because an FPGA is composed of basic blocks called CLBs (Configurable Logic Blocks) or LABs (Logic Array Blocks). The following figure shows a typical CLB structure, which includes a full adder (FA), a D-type flip-flop, three multiplexers (mux) and two three-input lookup tables (3-LUTs). Therefore, using fewer CLBs also reduces LUT usage.
Validate RISC-V Atom by Verilator, including Dhrystone.
Validate
The version of Verilator in our environment is 5.002.
As before, we can type make sim in the ROOTDIR of riscv-atom to build atomsim. The corresponding rule in the makefile is
After building atomsim, we can type atomsim --help to get the usage guide, which is
Now we can run a simple example to check atomsim
atomsim sw/examples/hello-asm/hello.elf
And we will get the
We can also add flags such as -v, -d, -t, and so on.
This completes the validation of RISC-V Atom with Verilator.
Dhrystone
The author, saursin, had already ported Dhrystone, but some bugs in the code prevented it from running. I commented out some code to fix it. Now we can type make dhrystone in the ROOTDIR to get the Dhrystone result.
This code is from dhry_1.c, which is one part of Dhrystone.
Here is the result of Dhrystone
Study Atomsim
AtomSim is the main part of riscv-atom. It has two main parts, core and uncore: the core is the processor core itself, and the uncore is the remaining part of the processor.
Core
First we take a look at the core part of AtomSim. The CPU is a 2-stage pipeline: the first stage is the fetch stage, and the second stage performs decode, execute, memory access and write-back.
Defs.vh defines all the macros used in the other files, mainly the instruction types, ALU operations and comparator operations.
Decode.v decodes the fetched instruction: it extracts the opcode, func3 and func7 fields and identifies the destination register rd and the source registers rs1 and rs2.
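The field extraction described above can be sketched as plain bit slicing. This is a Python model of the standard RISC-V field positions, not the Verilog from the repository; the function name `decode_fields` is made up for this example.

```python
def decode_fields(instr: int) -> dict:
    """Slice the fixed-position fields out of a 32-bit RISC-V instruction word."""
    return {
        "opcode": instr & 0x7F,          # bits [6:0]
        "rd":     (instr >> 7) & 0x1F,   # bits [11:7]
        "funct3": (instr >> 12) & 0x7,   # bits [14:12]
        "rs1":    (instr >> 15) & 0x1F,  # bits [19:15]
        "rs2":    (instr >> 20) & 0x1F,  # bits [24:20]
        "funct7": (instr >> 25) & 0x7F,  # bits [31:25]
    }


# add x3, x1, x2 encodes as 0x002081B3
fields = decode_fields(0x002081B3)
print(fields)
```

Because these fields sit at the same bit positions in every instruction format that has them, the decoder can extract them before it knows which instruction it is looking at.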
After decomposing the fetched instruction, the decoder works out which operation is needed, enables or disables the register file, and selects the comparison type.
Once the register file control and the comparison type are known, the last thing the decoder needs to do is generate a 32-bit immediate for later calculation.
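Immediate generation can be sketched as follows. This models the immediate layouts of the RISC-V base ISA (I, S, B, U and J formats) in Python; it is not a transcription of the repository's decoder, and the function names are chosen for this example only.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def immediate(instr: int, fmt: str) -> int:
    """Reassemble the immediate of a 32-bit instruction word for the given format."""
    if fmt == "I":
        return sign_extend(instr >> 20, 12)
    if fmt == "S":
        return sign_extend(((instr >> 25) << 5) | ((instr >> 7) & 0x1F), 12)
    if fmt == "B":
        imm = ((((instr >> 31) & 1) << 12) | (((instr >> 7) & 1) << 11)
               | (((instr >> 25) & 0x3F) << 5) | (((instr >> 8) & 0xF) << 1))
        return sign_extend(imm, 13)
    if fmt == "U":
        return sign_extend(instr & 0xFFFFF000, 32)
    if fmt == "J":
        imm = ((((instr >> 31) & 1) << 20) | (((instr >> 12) & 0xFF) << 12)
               | (((instr >> 20) & 1) << 11) | (((instr >> 21) & 0x3FF) << 1))
        return sign_extend(imm, 21)
    raise ValueError(f"unknown format: {fmt}")


print(immediate(0xFFF00093, "I"))  # addi x1, x0, -1
print(immediate(0xFE000EE3, "B"))  # beq x0, x0, -4
```

In every format the sign bit comes from bit 31 of the instruction, which keeps sign extension cheap in hardware.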
RegisterFile.v stores all the RISC-V CPU registers and provides a reset function. Within one cycle we can write one register and read two registers at the same time.
Because a RISC-V CPU has 32 registers, we need 5 bits to address each of them. The first register, x0, is hard-wired to zero, which means it cannot hold any value other than zero.
A write to the register file happens on a clock edge when the write enable signal Data_We_i is high; the register to be written is chosen by the Rd_Sel_i signal. If the reset signal Rst_i is high, all registers are cleared to zero.
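The behaviour described above can be modelled with a small class. This is a behavioural sketch in Python, not the Verilog module; the method arguments loosely correspond to Rd_Sel_i (`rd`), Data_We_i (`write_enable`) and Rst_i (`reset`), and the class and method names are assumptions made for this example.

```python
class RegisterFile:
    """Behavioural model: 32 registers, two read ports, one write port, x0 fixed at zero."""

    def __init__(self):
        self.regs = [0] * 32

    def reset(self):
        """Model of the reset signal: clear every register."""
        self.regs = [0] * 32

    def read(self, rs1: int, rs2: int):
        """Two simultaneous reads, each addressed with 5 bits."""
        return self.regs[rs1 & 0x1F], self.regs[rs2 & 0x1F]

    def write(self, rd: int, data: int, write_enable: bool):
        """One write per clock edge; writes to x0 are ignored."""
        rd &= 0x1F
        if write_enable and rd != 0:
            self.regs[rd] = data & 0xFFFFFFFF


rf = RegisterFile()
rf.write(5, 123, write_enable=True)
rf.write(0, 999, write_enable=True)   # dropped: x0 stays zero
rf.write(6, 456, write_enable=False)  # dropped: write enable is low
print(rf.read(5, 0), rf.read(6, 0))
```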
ALU.v is responsible for the main calculations of the RISC-V CPU. It takes its operands from the decoder output, which is either a register value or an immediate. The supported operations are add, subtract, shift left, logical and arithmetic shift right, and bitwise AND, OR and exclusive OR.
The ALU uses the select signal sel_i to choose which operation to perform.
Because addition and subtraction are very similar at the hardware level, they can share the same adder.
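One common way to share an adder between the two operations is the two's-complement identity a - b = a + ~b + 1. The sketch below illustrates that identity in Python; whether the repository's ALU is written exactly this way is an assumption, and the function name is made up.

```python
MASK32 = 0xFFFFFFFF


def add_sub(a: int, b: int, subtract: bool) -> int:
    """One 32-bit adder for both operations: a - b == a + (~b) + 1 (mod 2**32)."""
    operand = (~b & MASK32) if subtract else (b & MASK32)
    carry_in = 1 if subtract else 0
    return ((a & MASK32) + operand + carry_in) & MASK32


print(add_sub(5, 3, subtract=False))
print(add_sub(5, 3, subtract=True))
print(hex(add_sub(3, 5, subtract=True)))
```

The only extra hardware compared with a plain adder is a row of inverters on the second operand and a carry-in bit.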
An arithmetic right shift pads with the most significant bit (the sign bit), while a logical shift only pads with zeros. The shift amount is taken from the bottom 5 bits of the signal b_i.
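The difference between the two right shifts, and the 5-bit shift amount, can be illustrated like this. It is a Python model for illustration; the function name `shift_right` is not taken from the repository.

```python
MASK32 = 0xFFFFFFFF


def shift_right(a: int, b: int, arithmetic: bool) -> int:
    """32-bit right shift; only the bottom 5 bits of b are used as the shift amount."""
    shamt = b & 0x1F
    a &= MASK32
    if arithmetic and (a & 0x80000000):
        a -= 1 << 32  # reinterpret as negative so Python's >> replicates the sign bit
    return (a >> shamt) & MASK32


print(hex(shift_right(0x80000000, 4, arithmetic=False)))
print(hex(shift_right(0x80000000, 4, arithmetic=True)))
print(hex(shift_right(0x80000000, 36, arithmetic=True)))  # 36 & 0x1F == 4
```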
After all the calculations are done, the select signal picks the correct result for the output.
Implement the rv32M-Extension
To implement RV32M, we first have to understand what RV32M stands for. RV32M is an optional standard extension on top of RV32I. It provides multiplication, division and remainder instructions for signed and unsigned integers.
Read the risc-v spec

The picture shows the machine encoding of the M-extension instructions.

We can also see that all M instructions are R-type.
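Since every M instruction is R-type, recognizing one comes down to checking the opcode and funct7 and then indexing by funct3. The sketch below uses the encodings from the RISC-V specification (opcode 0110011, funct7 0000001); the function and constant names are made up for this example.

```python
OPCODE_OP = 0b0110011       # R-type register-register opcode
FUNCT7_MULDIV = 0b0000001   # funct7 value shared by all RV32M instructions
M_MNEMONICS = ["mul", "mulh", "mulhsu", "mulhu", "div", "divu", "rem", "remu"]  # indexed by funct3


def decode_m(instr: int):
    """Return the RV32M mnemonic for an instruction word, or None if it is not an M instruction."""
    if (instr & 0x7F) != OPCODE_OP or ((instr >> 25) & 0x7F) != FUNCT7_MULDIV:
        return None
    return M_MNEMONICS[(instr >> 12) & 0x7]


print(decode_m(0x022081B3))  # mul x3, x1, x2
print(decode_m(0x0220C1B3))  # div x3, x1, x2
print(decode_m(0x002081B3))  # add x3, x1, x2 is not an M instruction
```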
According to the description on page 44 of the RISC-V specification :
REM and REMU provide the remainder of the corresponding division operation. For REM, the sign of the result equals the sign of the dividend.
…
For both signed and unsigned division, it holds that dividend = divisor × quotient + remainder.
Therefore, for REM the sign of the remainder must be the same as the sign of the dividend (REMU works on unsigned values). All division operations must also satisfy dividend = divisor × quotient + remainder.
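These two rules together mean that signed division rounds toward zero. Python's own `//` and `%` round toward negative infinity instead, so a small helper (hypothetical name `trunc_divmod`) is needed to illustrate the RISC-V behaviour:

```python
def trunc_divmod(dividend: int, divisor: int):
    """Signed division rounding toward zero, as DIV/REM do; divisor must be nonzero."""
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - divisor * quotient
    return quotient, remainder


for dividend, divisor in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    q, r = trunc_divmod(dividend, divisor)
    # The identity from the specification holds in every case.
    assert dividend == divisor * q + r
    print(dividend, divisor, q, r)
```

For -7 divided by 2 this gives a quotient of -3 and a remainder of -1, whereas Python's built-in `divmod(-7, 2)` returns (-4, 1).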
- In addition, we need to pay attention to two special cases when doing division operations. The first is the case where the divisor is 0, and the second is overflow in signed division. The following are the operation results in these special cases:
| Condition | Dividend | Divisor | DIVU | REMU | DIV | REM |
|---|---|---|---|---|---|---|
| Division by zero | \(x\) | \(0\) | \(2^L-1\) | \(x\) | \(-1\) | \(x\) |
| Overflow (signed only) | \(-2^{L-1}\) | \(-1\) | – | – | \(-2^{L-1}\) | \(0\) |
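The special cases in the table can be captured in a reference model that works on 32-bit patterns (L = 32). This is a Python sketch for checking results, not the Verilog added to the ALU; all function names are chosen for this example.

```python
MASK32 = 0xFFFFFFFF
INT_MIN = -(1 << 31)


def to_signed(x: int) -> int:
    """Reinterpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def divu(a: int, b: int) -> int:
    a, b = a & MASK32, b & MASK32
    return MASK32 if b == 0 else a // b          # divide by zero: all ones (2**32 - 1)


def remu(a: int, b: int) -> int:
    a, b = a & MASK32, b & MASK32
    return a if b == 0 else a % b                # divide by zero: the dividend


def div(a: int, b: int) -> int:
    sa, sb = to_signed(a), to_signed(b)
    if sb == 0:
        return MASK32                            # divide by zero: -1
    if sa == INT_MIN and sb == -1:
        return 0x80000000                        # overflow: result is -2**31
    q = abs(sa) // abs(sb)
    return (-q if (sa < 0) != (sb < 0) else q) & MASK32


def rem(a: int, b: int) -> int:
    sa, sb = to_signed(a), to_signed(b)
    if sb == 0:
        return a & MASK32                        # divide by zero: the dividend
    if sa == INT_MIN and sb == -1:
        return 0                                 # overflow: remainder is 0
    q = abs(sa) // abs(sb)
    q = -q if (sa < 0) != (sb < 0) else q
    return (sa - sb * q) & MASK32


print(hex(div(0x80000000, 0xFFFFFFFF)), rem(0x80000000, 0xFFFFFFFF))
print(hex(divu(7, 0)), remu(7, 0))
```

None of these cases raises an exception in RISC-V; the instruction simply returns the value listed in the table.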
We need to extend the ALU for our M instructions.
We remove the old definitions and extend the ALU function select by one bit to cover the I + M instructions. For example, ALU_FUNC_ADD changes from 3'd0 to 4'd0.
We only change this Verilog file a little, because it is not related to Decode: we widen the wire that carries the decoded ALU opcode.
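The multiply operations that the widened ALU select makes room for can be described by a short reference model. This Python sketch follows the semantics in the RISC-V specification (low 32 bits for MUL, high 32 bits of the 64-bit product for the MULH variants); it says nothing about how the repository implements them in hardware, and the function names are assumptions.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Reinterpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul(a: int, b: int) -> int:
    """Low 32 bits of the product (same for signed and unsigned operands)."""
    return (a * b) & MASK32


def mulh(a: int, b: int) -> int:
    """High 32 bits of signed x signed."""
    return ((to_signed(a) * to_signed(b)) >> 32) & MASK32


def mulhu(a: int, b: int) -> int:
    """High 32 bits of unsigned x unsigned."""
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32


def mulhsu(a: int, b: int) -> int:
    """High 32 bits of signed (rs1) x unsigned (rs2)."""
    return ((to_signed(a) * (b & MASK32)) >> 32) & MASK32


x = 0xFFFFFFFF  # -1 when signed, 2**32 - 1 when unsigned
print(hex(mul(x, x)), hex(mulh(x, x)), hex(mulhu(x, x)), hex(mulhsu(x, x)))
```

The same bit pattern gives three different upper halves depending on how each operand is interpreted, which is why the three MULH variants need separate select codes.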
riscv-tests
We finally decided to use riscv-tests to test atomsim. Several places need to be modified to meet its requirements.
- Because the author of riscv-atom did not implement the fence instruction, we modify decode.v to avoid an error.
- Modify the linker script for atomsim so that it matches the linker script of riscv-tests: we make the .text section begin at 0x00000000.
- git clone the riscv-tests project and put it in riscv-atom/test, then follow the steps in its README.md.
- Modify the linker script in riscv-tests/env/p. The reason is that we want the start sections of atomsim and riscv-tests to line up, so we follow the author's settings. In riscv-atom/sw/lib/linklink.ld he makes the ROM 64 MB and the RAM 64 MB. Therefore we should move the .data section to 0x04000000.
- Modify the test header so that it prints 0/1 to show whether a test passes. We only change the ending part, which does not change the test code itself.
First, we modify riscv_test.h in riscv-tests/env/p. One change is to unimp: because atomsim stops when it decodes ebreak, we change unimp into ebreak.
Because the tests do not print anything to tell us whether they pass, we create similar pass and fail macros that print 0/1 to show which tests pass. This step also makes it easy to write a shell script that checks the results. These changes are made in the same file, riscv_test.h in riscv-tests/env/p.
- Modify the test macros so that they print 0/1: we change the end of each macro to use our own code that prints 0/1.
One more file needs to be modified because it calls the macro directly to check the simple function, so we replace that call with our pass macro. The file is simple.S in riscv-tests/isa/rv64ui. We modify the rv64 file because the rv32ui test reuses the code in rv64ui, so the change must be made in rv64ui rather than rv32ui.
- We wrote a shell script in riscv-atom/scripts that automatically checks whether the tests pass. We can type make riscv-test to get the result!
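The address chosen for the .data section in the steps above follows directly from the memory sizes: a 64 MB ROM starting at 0x00000000 ends exactly where the RAM begins. The sketch below checks that arithmetic; the constant names and the `region` helper are made up for this example, and the layout (RAM placed directly after ROM) is inferred from the addresses mentioned above.

```python
ROM_BASE = 0x00000000
ROM_SIZE = 64 * 1024 * 1024     # 64 MB
RAM_BASE = ROM_BASE + ROM_SIZE  # assumed: RAM starts right after ROM
RAM_SIZE = 64 * 1024 * 1024     # 64 MB


def region(addr: int):
    """Name the memory region an address falls into, or None if it is unmapped in this model."""
    if ROM_BASE <= addr < ROM_BASE + ROM_SIZE:
        return "ROM"
    if RAM_BASE <= addr < RAM_BASE + RAM_SIZE:
        return "RAM"
    return None


print(hex(RAM_BASE))
print(region(0x00001000), region(0x04000000))
```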
Result of make riscv-test
hanchi@hanchi:~/riscv-atom$ make riscv-test
./scripts/riscv-test.sh
Now test rv32ui for riscv-atom!!
testing instruction add
Loading segment 1 [base=0x00000000, sz= 1724 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction addi
Loading segment 1 [base=0x00000000, sz= 1148 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction and
Loading segment 1 [base=0x00000000, sz= 1724 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction andi
Loading segment 1 [base=0x00000000, sz= 956 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction auipc
Loading segment 1 [base=0x00000000, sz= 564 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction beq
Loading segment 1 [base=0x00000000, sz= 1212 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction bge
Loading segment 1 [base=0x00000000, sz= 1276 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction bgeu
Loading segment 1 [base=0x00000000, sz= 1340 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction blt
Loading segment 1 [base=0x00000000, sz= 1212 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction bltu
Loading segment 1 [base=0x00000000, sz= 1212 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction bne
Loading segment 1 [base=0x00000000, sz= 1212 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction jal
Loading segment 1 [base=0x00000000, sz= 572 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction jalr
Loading segment 1 [base=0x00000000, sz= 700 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction lb
Loading segment 1 [base=0x00000000, sz= 1084 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
Loading segment 3 [base=0x04000000, sz= 16 bytes, at=0x04000000] ... done
1
testing instruction lbu
Loading segment 1 [base=0x00000000, sz= 1084 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
Loading segment 3 [base=0x04000000, sz= 16 bytes, at=0x04000000] ... done
1
testing instruction lh
Loading segment 1 [base=0x00000000, sz= 1148 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
Loading segment 3 [base=0x04000000, sz= 16 bytes, at=0x04000000] ... done
1
testing instruction lhu
Loading segment 1 [base=0x00000000, sz= 1212 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
Loading segment 3 [base=0x04000000, sz= 16 bytes, at=0x04000000] ... done
1
testing instruction lui
Loading segment 1 [base=0x00000000, sz= 572 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction lw
Loading segment 1 [base=0x00000000, sz= 1212 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
Loading segment 3 [base=0x04000000, sz= 16 bytes, at=0x04000000] ... done
1
testing instruction or
Loading segment 1 [base=0x00000000, sz= 1724 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction ori
Loading segment 1 [base=0x00000000, sz= 956 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction sb
Loading segment 1 [base=0x00000000, sz= 1596 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
Loading segment 3 [base=0x04000000, sz= 16 bytes, at=0x04000000] ... done
1
testing instruction sh
Loading segment 1 [base=0x00000000, sz= 1788 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
Loading segment 3 [base=0x04000000, sz= 32 bytes, at=0x04000000] ... done
1
testing instruction sll
Loading segment 1 [base=0x00000000, sz= 1852 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction simple
Loading segment 1 [base=0x00000000, sz= 444 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction slli
Loading segment 1 [base=0x00000000, sz= 1148 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction slt
Loading segment 1 [base=0x00000000, sz= 1724 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction slti
Loading segment 1 [base=0x00000000, sz= 1084 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction sltiu
Loading segment 1 [base=0x00000000, sz= 1084 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction sltu
Loading segment 1 [base=0x00000000, sz= 1724 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction sra
Loading segment 1 [base=0x00000000, sz= 1916 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction srai
Loading segment 1 [base=0x00000000, sz= 1212 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction srl
Loading segment 1 [base=0x00000000, sz= 1916 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction srli
Loading segment 1 [base=0x00000000, sz= 1148 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction sub
Loading segment 1 [base=0x00000000, sz= 1724 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction sw
Loading segment 1 [base=0x00000000, sz= 1788 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
Loading segment 3 [base=0x04000000, sz= 48 bytes, at=0x04000000] ... done
1
testing instruction xor
Loading segment 1 [base=0x00000000, sz= 1724 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction xori
Loading segment 1 [base=0x00000000, sz= 956 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
Now test rv32um for riscv-atom!!
testing instruction div
Loading segment 1 [base=0x00000000, sz= 700 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction divu
Loading segment 1 [base=0x00000000, sz= 700 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction mul
Loading segment 1 [base=0x00000000, sz= 1724 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction mulh
Loading segment 1 [base=0x00000000, sz= 1724 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction mulhu
Loading segment 1 [base=0x00000000, sz= 1724 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction mulhsu
Loading segment 1 [base=0x00000000, sz= 1724 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction rem
Loading segment 1 [base=0x00000000, sz= 700 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
testing instruction remu
Loading segment 1 [base=0x00000000, sz= 700 bytes, at=0x00000000] ... done
Loading segment 2 [base=0x00001000, sz= 72 bytes, at=0x00001000] ... done
1
==============================
rv32ui-p instruction set :
The pass rate is 38/38
The fail rate is 0/38
Pass rv32ui-p testing!
==============================
rv32um-p instruction set :
The pass rate is 8/8
The fail rate is 0/8
Pass rv32um-p testing!
==============================
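The tally at the end of the output can be reproduced from the per-test lines. The sketch below is a Python illustration of that idea, not the shell script in riscv-atom/scripts; it assumes, based on the output above, that a line containing only 1 means pass and 0 means fail. The sample text is made up, and `example_fail` is a hypothetical test name.

```python
def summarize(log_text: str) -> dict:
    """Map each tested instruction to True (pass) or False (fail) from the 1/0 line that follows it."""
    results = {}
    current = None
    prefix = "testing instruction "
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            current = line[len(prefix):]
        elif line in ("0", "1") and current is not None:
            results[current] = (line == "1")
            current = None
    return results


# Made-up sample in the same shape as the real output.
sample = """testing instruction add
Loading segment 1 [base=0x00000000, sz= 1724 bytes, at=0x00000000] ... done
1
testing instruction example_fail
Loading segment 1 [base=0x00000000, sz= 700 bytes, at=0x00000000] ... done
0
"""

results = summarize(sample)
passed = sum(results.values())
print(f"The pass rate is {passed}/{len(results)}")
```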
Issue and Pull Request
We plan to open an issue with the author.
We encountered some problems that need to be solved:
- Verilator PATH error
  - In ~/riscv-atom/sim/makefile, because our Verilator is installed from source, the path is not the same as the author's. Hence, we need to change the path before pushing to GitHub and CI/CD.
- Verilator command
  - We found that the author's Verilator version is not the same as ours. Hence, we need to change /* verilator lint_off UNUSEDSIGNAL */ to /* verilator lint_off UNUSED */
  - This version can pass GitHub CI/CD
  - This version can run in our environment
Reference Links
Computer Architecture 2022: Term Project
saursin / riscv-atom
RISC-V Atom Documentation & User Manual
srv32
riscv-spec
dhrystone
Cortex m0+