# Analysis of RV32M for [spu32](https://github.com/maikmerten/spu32) (compact RV32I core)
###### tags: `RISC-V`, `Computer Architecture 2022`, `Term project`
* RV32M denotes the RISC-V RV32I base instruction set architecture (ISA) combined with the standard "M" extension for integer multiplication and division. The extension adds two groups of operations:
* **Multiplication operations** are significant for many applications in the domain of digital signal processing, image processing, scientific computing, and many more.
* **Division operations** are needed for some scientific computing and special purpose operations like graphics rendering, etc.
* **SPU32** ("Small Processing Unit 32") is a compact RISC-V processor implementing the RV32I instruction set; the project also includes some peripherals. It is written in Verilog and designed to be synthesizable with Yosys.
* **Yosys** is an open-source framework for RTL synthesis, that is, it translates HDL into a gate-level netlist.
* We also need a Verilog/SystemVerilog simulator such as **Verilator** to simulate the RISC-V core. Verilator accepts Verilog or SystemVerilog and compiles the HDL code into a much faster, optimized, and optionally thread-partitioned model, which is in turn wrapped inside a C++/SystemC module.
### Setup environment
1. Install prerequisites:
```
$ sudo apt-get install build-essential clang bison flex libreadline-dev \
gawk tcl-dev libffi-dev git mercurial graphviz \
xdot pkg-config python python3 libftdi-dev gperf \
libboost-program-options-dev autoconf libgmp-dev \
cmake curl
```
Install yosys & verilator:
```$ sudo apt install yosys verilator```
2. Install [riscv-gnu-toolchain](https://github.com/riscv-collab/riscv-gnu-toolchain).
```
$ mkdir ~/riscv-gcc
$ cd ~/riscv-gcc
$ git clone https://github.com/riscv/riscv-gnu-toolchain
$ cd riscv-gnu-toolchain/
$ ./configure --prefix=/opt/riscv32 --with-arch=rv32ima --with-abi=ilp32
$ sudo make -j$(nproc)
$ export RISCV32=/opt/riscv32
$ export PATH=$RISCV32/bin:$PATH
```
3. Clone SPU32:
```
$ cd ~/riscv-gcc/
$ git clone https://github.com/maikmerten/spu32.git
```
### Use yosys to analyze SPU32
1. Analyse SPU32's modules:
```
$ cd ~/riscv-gcc/spu32
$ sudo yosys
yosys> read -sv cpu/cpu.v
yosys> hierarchy -top spu32_cpu
2.1. Analyzing design hierarchy..
Top module: \spu32_cpu
Used module: \spu32_cpu_branch
Used module: \spu32_cpu_registers
Used module: \spu32_cpu_decoder
Used module: \spu32_cpu_bus
Used module: \spu32_cpu_alu
Used module: \spu32_cpu_mul
Used module: \spu32_cpu_shifter
Used module: \spu32_cpu_div
```
Compared with last year's work by [劉品宏](https://hackmd.io/3gjcgqDkQXuk9jQf44856w?view#Reading-and-Elaborating-the-design-using-the-Verilog-frontend)'s team:
```
4.10. Analyzing design hierarchy..
Top module: \spu32_cpu
Used module: \spu32_cpu_branch
Used module: \spu32_cpu_registers
Used module: \spu32_cpu_decoder
Used module: \spu32_cpu_bus
Used module: \spu32_cpu_alu
Used module: \spu32_cpu_shifter
Used module: \spu32_cpu_mul
```
Surprisingly, I find that the SPU32 project now already implements the complete RV32M instruction set: the current hierarchy additionally contains ```spu32_cpu_div```! 😱
2. Synthesize the Verilog into a gate-level representation.
* write the design in Yosys's internal format
```yosys> write_ilang```
* convert processes to netlist elements and optimize
```yosys> proc; opt; memory; opt```
* display design netlist using xdot:
```yosys> show```
* translate netlist to gate logic and optimize
```yosys> techmap; opt```
* write generated netlist to a new Verilog file
```yosys> write_verilog synth.v```
## RV32M instruction
### Introduction
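The "M" extension uses the R-type format and includes the following 8 instructions: ```MUL```, ```MULH```, ```MULHSU```, ```MULHU```, ```DIV```, ```DIVU```, ```REM```, and ```REMU```. Because a 32-bit x 32-bit multiplication can produce a result of up to 64 bits, which exceeds the size of a register, ```MUL``` returns the lower 32 bits and ```MULH``` (with its variants ```MULHSU``` and ```MULHU```) returns the upper 32 bits.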

* *x* divided by zero:
```DIVU``` = *2^32^-1* ```REMU``` = *x*
```DIV``` = *-1* ```REM``` = *x*
* If both the upper and lower halves of a product (or both the quotient and the remainder) are required, it is recommended to write the two instructions back to back:
```MULH rdh, rs1,rs2; MUL rdl, rs1,rs2```
or ```DIV rdq, rs1,rs2; REM rdr, rs1,rs2```
* Signed division overflow: (-2^31^) ➗ (-1)
```DIV``` = *-2^31^* ```REM``` = *0*
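The rules above can be summarized as a small C reference model. This is only a sketch for checking expected values by hand or in a testbench; the function names (`rv_mul`, `rv_mulh`, and so on) are made up for this note and do not appear in SPU32.

```c
#include <stdint.h>

/* Illustrative reference model of the RV32M results.
   All function names are hypothetical, not taken from SPU32. */

/* MUL: lower 32 bits of the product (identical for signed and unsigned). */
uint32_t rv_mul(uint32_t a, uint32_t b) {
    return (uint32_t)((uint64_t)a * b);
}

/* MULH: upper 32 bits of signed x signed. */
uint32_t rv_mulh(int32_t a, int32_t b) {
    return (uint32_t)((uint64_t)((int64_t)a * (int64_t)b) >> 32);
}

/* MULHSU: upper 32 bits of signed x unsigned. */
uint32_t rv_mulhsu(int32_t a, uint32_t b) {
    return (uint32_t)((uint64_t)((int64_t)a * (int64_t)b) >> 32);
}

/* MULHU: upper 32 bits of unsigned x unsigned. */
uint32_t rv_mulhu(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* DIV/REM: signed, with the division-by-zero and overflow cases. */
int32_t rv_div(int32_t a, int32_t b) {
    if (b == 0) return -1;                           /* divide by zero */
    if (a == INT32_MIN && b == -1) return INT32_MIN; /* overflow       */
    return a / b;                                    /* rounds toward zero */
}

int32_t rv_rem(int32_t a, int32_t b) {
    if (b == 0) return a;
    if (a == INT32_MIN && b == -1) return 0;
    return a % b;                                    /* sign follows the dividend */
}

/* DIVU/REMU: unsigned, with the division-by-zero case. */
uint32_t rv_divu(uint32_t a, uint32_t b) {
    return b == 0 ? UINT32_MAX : a / b;
}

uint32_t rv_remu(uint32_t a, uint32_t b) {
    return b == 0 ? a : a % b;
}
```

For example, `rv_div(1, 0)` gives -1 while `rv_divu(1, 0)` gives 4294967295, and `rv_div(INT32_MIN, -1)` gives -2147483648 with `rv_rem(INT32_MIN, -1)` equal to 0.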
### Implementation
* Modified spu32/cpu/decoder.v
```=
//R-Type
case({funct3, funct7[0]})
{`FUNC_ADD_SUB, 1'b0}: aluop_op = funct7[5] ? `ALUOP_SUB : `ALUOP_ADD;
{`FUNC_SLL, 1'b0}: aluop_op = `ALUOP_SLL;
{`FUNC_SLT, 1'b0}: aluop_op = `ALUOP_SLT;
{`FUNC_SLTU, 1'b0}: aluop_op = `ALUOP_SLTU;
{`FUNC_XOR, 1'b0}: aluop_op = `ALUOP_XOR;
{`FUNC_SRL_SRA, 1'b0}: aluop_op = funct7[5] ? `ALUOP_SRA : `ALUOP_SRL;
{`FUNC_OR, 1'b0}: aluop_op = `ALUOP_OR;
{`FUNC_AND, 1'b0}: aluop_op = `ALUOP_AND;
{`FUNC_MUL, 1'b1}: begin
aluop_op = `ALUOP_MUL;
aluop_signed = 2'b00;
end
{`FUNC_MULH, 1'b1}: begin
aluop_op = `ALUOP_MULH;
aluop_signed = 2'b11;
end
{`FUNC_MULHSU, 1'b1}: begin
aluop_op = `ALUOP_MULH;
aluop_signed = 2'b01;
end
{`FUNC_MULHU, 1'b1}: begin
aluop_op = `ALUOP_MULH;
aluop_signed = 2'b00;
end
{`FUNC_DIV, 1'b1}: begin
aluop_op = `ALUOP_DIV;
aluop_signed = 2'b11;
end
{`FUNC_DIVU, 1'b1}: begin
aluop_op = `ALUOP_DIV;
aluop_signed = 2'b00;
end
{`FUNC_REM, 1'b1}: begin
aluop_op = `ALUOP_REM;
aluop_signed = 2'b11;
end
{`FUNC_REMU, 1'b1}: begin
aluop_op = `ALUOP_REM;
aluop_signed = 2'b00;
end
default: aluop_op = `ALUOP_ADD;
endcase
```
* Modified spu32/cpu/riscvdefs.vh
```=
`define FUNC_MUL 3'b000
`define FUNC_MULH 3'b001
`define FUNC_MULHSU 3'b010
`define FUNC_MULHU 3'b011
`define FUNC_DIV 3'b100
`define FUNC_DIVU 3'b101
`define FUNC_REM 3'b110
`define FUNC_REMU 3'b111
```
* Modified spu32/cpu/aludefs.vh
```=
`define ALUOP_MUL 4'b1010
`define ALUOP_MULH 4'b1011
`define ALUOP_DIV 4'b1100
`define ALUOP_REM 4'b1101
```
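To cross-check the decoder by hand, the same field extraction can be sketched in C. This is an illustrative helper, not SPU32 code: the names `rv32m_funct3` and `rv32m_names` are invented here, and unlike the Verilog above (which only inspects `funct7[0]`), the sketch compares the whole `funct7` field against `0000001`.

```c
#include <stdint.h>

/* Mnemonics indexed by funct3, in the same order as the FUNC_* defines. */
const char *const rv32m_names[8] = {
    "mul", "mulh", "mulhsu", "mulhu", "div", "divu", "rem", "remu"
};

/* Hypothetical helper: returns funct3 (0..7) if `insn` is an RV32M
   instruction, or -1 otherwise. */
int rv32m_funct3(uint32_t insn) {
    uint32_t opcode = insn & 0x7f;          /* bits 6:0   */
    uint32_t funct3 = (insn >> 12) & 0x7;   /* bits 14:12 */
    uint32_t funct7 = (insn >> 25) & 0x7f;  /* bits 31:25 */

    /* R-type OP opcode is 0110011; the M extension uses funct7 = 0000001. */
    if (opcode != 0x33 || funct7 != 0x01)
        return -1;
    return (int)funct3;
}
```

For instance, `0x02C58533` encodes `mul a0, a1, a2` and yields 0, `0x02C5C533` encodes `div a0, a1, a2` and yields 4, while `0x00C58533` (`add a0, a1, a2`) yields -1.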
## Use SPU32 with Verilator
* Run Verilator on the synthesized Verilog:
:::spoiler
```
$ verilator synth.v --sc
%Warning-UNOPTFLAT: synth6.v:2198:14: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu._0120_'
2198 | wire [1:0] _0120_;
| ^~~~~~
... Use "/* verilator lint_off UNOPTFLAT */" and lint_on around source to disable this message.
synth6.v:2198:14: Example path: spu32_cpu._0120_
synth6.v:4542:79: Example path: ASSIGNW
synth6.v:2322:15: Example path: spu32_cpu._0233_
synth6.v:2760:20: Example path: ASSIGNW
synth6.v:2198:14: Example path: spu32_cpu._0120_
%Warning-UNOPTFLAT: synth6.v:4550:21: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu.__Vconcswap4'
4550 | assign { _0250_[2], _0250_[0] } = { _0020_, _0250_[1] };
| ^
synth6.v:4550:21: Example path: spu32_cpu.__Vconcswap4
synth6.v:4550:35: Example path: ASSIGNW
synth6.v:2344:14: Example path: spu32_cpu._0250_
synth6.v:4550:35: Example path: ASSIGNW
synth6.v:4550:21: Example path: spu32_cpu.__Vconcswap4
%Warning-UNOPTFLAT: synth6.v:4547:23: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu.__Vconcswap2'
4547 | assign { _0243_[5:3], _0243_[1:0] } = { 2'h0, _0022_, 1'h0, _0243_[2] };
| ^
synth6.v:4547:23: Example path: spu32_cpu.__Vconcswap2
synth6.v:4547:39: Example path: ASSIGNW
synth6.v:2336:14: Example path: spu32_cpu._0243_
synth6.v:4547:39: Example path: ASSIGNW
synth6.v:4547:23: Example path: spu32_cpu.__Vconcswap2
%Warning-UNOPTFLAT: synth6.v:5694:14: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu.dec_inst._150_'
5694 | wire [5:0] _150_;
| ^~~~~
synth6.v:5694:14: Example path: spu32_cpu.dec_inst._150_
synth6.v:6036:19: Example path: ASSIGNW
synth6.v:5694:14: Example path: spu32_cpu.dec_inst._150_
%Warning-UNOPTFLAT: synth6.v:6533:49: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu.dec_inst.__Vconcswap2'
6533 | assign { _157_[47:45], _157_[43:18], _157_[16], _157_[14:0] } = { 4'h0, _053_[5], _053_[5], _053_[5], 1'h0, _059_[4], 1'h0, _059_[4], 1'h0, _056_[4], _056_[4], 2'h0, _053_[3], 2'h0, _050_[3], 2'h0, _059_[3], 2'h0, _059_[2], _059_[2], 4'h0, _157_[15], 1'h0, _016_, 1'h0, _016_, _016_, _015_, _015_, 2'h0, _014_, _014_, 1'h0, _014_ };
| ^
synth6.v:6533:49: Example path: spu32_cpu.dec_inst.__Vconcswap2
synth6.v:6533:65: Example path: ASSIGNW
synth6.v:5706:15: Example path: spu32_cpu.dec_inst._157_
synth6.v:5871:19: Example path: ASSIGNW
synth6.v:5575:14: Example path: spu32_cpu.dec_inst._033_
synth6.v:6496:35: Example path: ASSIGNW
synth6.v:5598:14: Example path: spu32_cpu.dec_inst._056_
synth6.v:6533:65: Example path: ASSIGNW
synth6.v:6533:49: Example path: spu32_cpu.dec_inst.__Vconcswap2
%Warning-UNOPTFLAT: synth6.v:6539:90: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu.dec_inst.__Vconcswap3'
synth6.v:6539:90: Example path: spu32_cpu.dec_inst.__Vconcswap3
synth6.v:6539:106: Example path: ASSIGNW
synth6.v:5727:16: Example path: spu32_cpu.dec_inst._173_
synth6.v:6539:106: Example path: ASSIGNW
synth6.v:6539:90: Example path: spu32_cpu.dec_inst.__Vconcswap3
%Warning-UNOPTFLAT: synth6.v:4599:15: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu.bru_inst._009_'
4599 | wire [31:0] _009_;
| ^~~~~
synth6.v:4599:15: Example path: spu32_cpu.bru_inst._009_
synth6.v:5224:19: Example path: ASSIGNW
synth6.v:4603:15: Example path: spu32_cpu.bru_inst._011_
synth6.v:5168:19: Example path: ASSIGNW
synth6.v:4599:15: Example path: spu32_cpu.bru_inst._009_
%Warning-UNOPTFLAT: synth6.v:9289:15: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu.alu_inst.mul_inst._0028_'
9289 | wire [63:0] _0028_;
| ^~~~~~
synth6.v:9289:15: Example path: spu32_cpu.alu_inst.mul_inst._0028_
synth6.v:11616:20: Example path: ASSIGNW
synth6.v:9293:15: Example path: spu32_cpu.alu_inst.mul_inst._0030_
synth6.v:11499:20: Example path: ASSIGNW
synth6.v:9289:15: Example path: spu32_cpu.alu_inst.mul_inst._0028_
%Warning-UNOPTFLAT: synth6.v:9261:15: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu.alu_inst.mul_inst._0004_'
9261 | wire [63:0] _0004_;
| ^~~~~~
synth6.v:9261:15: Example path: spu32_cpu.alu_inst.mul_inst._0004_
synth6.v:11615:20: Example path: ASSIGNW
synth6.v:9290:15: Example path: spu32_cpu.alu_inst.mul_inst._0029_
synth6.v:11206:20: Example path: ASSIGNW
synth6.v:9261:15: Example path: spu32_cpu.alu_inst.mul_inst._0004_
%Warning-UNOPTFLAT: synth6.v:158:15: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu.alu_inst._0145_'
158 | wire [31:0] _0145_;
| ^~~~~~
synth6.v:158:15: Example path: spu32_cpu.alu_inst._0145_
synth6.v:1680:17: Example path: ASSIGNW
synth6.v:200:8: Example path: spu32_cpu.alu_inst._0166_
synth6.v:1754:20: Example path: ASSIGNW
synth6.v:158:15: Example path: spu32_cpu.alu_inst._0145_
%Warning-UNOPTFLAT: synth6.v:6641:15: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu.alu_inst.div_inst._0055_'
6641 | wire [62:0] _0055_;
| ^~~~~~
synth6.v:6641:15: Example path: spu32_cpu.alu_inst.div_inst._0055_
synth6.v:9087:21: Example path: ASSIGNW
synth6.v:6641:15: Example path: spu32_cpu.alu_inst.div_inst._0055_
%Warning-UNOPTFLAT: synth6.v:6654:15: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu.alu_inst.div_inst._0063_'
6654 | wire [31:0] _0063_;
| ^~~~~~
synth6.v:6654:15: Example path: spu32_cpu.alu_inst.div_inst._0063_
synth6.v:9008:20: Example path: ASSIGNW
synth6.v:6654:15: Example path: spu32_cpu.alu_inst.div_inst._0063_
%Warning-UNOPTFLAT: synth6.v:6659:15: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu.alu_inst.div_inst._0065_'
6659 | wire [62:0] _0065_;
| ^~~~~~
synth6.v:6659:15: Example path: spu32_cpu.alu_inst.div_inst._0065_
synth6.v:8956:20: Example path: ASSIGNW
synth6.v:6659:15: Example path: spu32_cpu.alu_inst.div_inst._0065_
%Warning-UNOPTFLAT: synth6.v:6594:15: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu.alu_inst.div_inst._0027_'
6594 | wire [62:0] _0027_;
| ^~~~~~
synth6.v:6594:15: Example path: spu32_cpu.alu_inst.div_inst._0027_
synth6.v:9233:24: Example path: ASSIGNW
synth6.v:6594:15: Example path: spu32_cpu.alu_inst.div_inst._0027_
%Warning-UNOPTFLAT: synth6.v:164:15: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu.alu_inst._0148_'
164 | wire [32:0] _0148_;
| ^~~~~~
synth6.v:164:15: Example path: spu32_cpu.alu_inst._0148_
synth6.v:1791:17: Example path: ASSIGNW
synth6.v:364:8: Example path: spu32_cpu.alu_inst._0248_
synth6.v:1875:20: Example path: ASSIGNW
synth6.v:164:15: Example path: spu32_cpu.alu_inst._0148_
%Warning-UNOPTFLAT: synth6.v:2368:15: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu._0265_'
2368 | wire [14:0] _0265_;
| ^~~~~~
synth6.v:2368:15: Example path: spu32_cpu._0265_
synth6.v:4556:39: Example path: ASSIGNW
synth6.v:4556:22: Example path: spu32_cpu.__Vconcswap5
synth6.v:4556:39: Example path: ASSIGNW
synth6.v:2368:15: Example path: spu32_cpu._0265_
%Warning-UNOPTFLAT: synth6.v:2370:14: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu._0266_'
2370 | wire [4:0] _0266_;
| ^~~~~~
synth6.v:2370:14: Example path: spu32_cpu._0266_
synth6.v:4557:37: Example path: ASSIGNW
synth6.v:4557:23: Example path: spu32_cpu.__Vconcswap7
synth6.v:4557:37: Example path: ASSIGNW
synth6.v:2370:14: Example path: spu32_cpu._0266_
%Warning-UNOPTFLAT: synth6.v:5281:15: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu.bus_inst._048_'
5281 | wire [31:0] _048_;
| ^~~~~
synth6.v:5281:15: Example path: spu32_cpu.bus_inst._048_
synth6.v:5517:23: Example path: ASSIGNW
synth6.v:5281:15: Example path: spu32_cpu.bus_inst._048_
%Warning-UNOPTFLAT: synth6.v:6664:15: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu.alu_inst.div_inst._0067_'
6664 | wire [31:0] _0067_;
| ^~~~~~
synth6.v:6664:15: Example path: spu32_cpu.alu_inst.div_inst._0067_
synth6.v:8904:20: Example path: ASSIGNW
synth6.v:6664:15: Example path: spu32_cpu.alu_inst.div_inst._0067_
%Warning-UNOPTFLAT: synth6.v:2408:16: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu._0292_'
2408 | wire [119:0] _0292_;
| ^~~~~~
synth6.v:2408:16: Example path: spu32_cpu._0292_
synth6.v:2830:20: Example path: ASSIGNW
synth6.v:2216:14: Example path: spu32_cpu._0138_
synth6.v:4565:25: Example path: ASSIGNW
synth6.v:2408:16: Example path: spu32_cpu._0292_
%Warning-UNOPTFLAT: synth6.v:5586:14: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu.dec_inst._044_'
5586 | wire [3:0] _044_;
| ^~~~~
synth6.v:5586:14: Example path: spu32_cpu.dec_inst._044_
synth6.v:6531:36: Example path: ASSIGNW
synth6.v:5702:15: Example path: spu32_cpu.dec_inst._155_
synth6.v:5890:19: Example path: ASSIGNW
synth6.v:5586:14: Example path: spu32_cpu.dec_inst._044_
%Warning-UNOPTFLAT: synth6.v:4584:15: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu.bru_inst._001_'
4584 | wire [31:0] _001_;
| ^~~~~
synth6.v:4584:15: Example path: spu32_cpu.bru_inst._001_
synth6.v:5223:19: Example path: ASSIGNW
synth6.v:4600:15: Example path: spu32_cpu.bru_inst._010_
synth6.v:5000:19: Example path: ASSIGNW
synth6.v:4584:15: Example path: spu32_cpu.bru_inst._001_
%Warning-UNOPTFLAT: synth6.v:5706:15: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu.dec_inst._157_'
5706 | wire [47:0] _157_;
| ^~~~~
synth6.v:5706:15: Example path: spu32_cpu.dec_inst._157_
synth6.v:5871:19: Example path: ASSIGNW
synth6.v:5575:14: Example path: spu32_cpu.dec_inst._033_
synth6.v:6066:19: Example path: ASSIGNW
synth6.v:5595:14: Example path: spu32_cpu.dec_inst._053_
synth6.v:6533:65: Example path: ASSIGNW
synth6.v:5706:15: Example path: spu32_cpu.dec_inst._157_
%Warning-UNOPTFLAT: synth6.v:5575:14: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu.dec_inst._033_'
5575 | wire [5:0] _033_;
| ^~~~~
synth6.v:5575:14: Example path: spu32_cpu.dec_inst._033_
synth6.v:6496:35: Example path: ASSIGNW
synth6.v:5598:14: Example path: spu32_cpu.dec_inst._056_
synth6.v:5900:19: Example path: ASSIGNW
synth6.v:5575:14: Example path: spu32_cpu.dec_inst._033_
%Warning-UNOPTFLAT: synth6.v:5727:16: Signal unoptimizable: Feedback to clock or circular logic: 'spu32_cpu.dec_inst._173_'
5727 | wire [127:0] _173_;
| ^~~~~
synth6.v:5727:16: Example path: spu32_cpu.dec_inst._173_
synth6.v:5971:19: Example path: ASSIGNW
synth6.v:5628:14: Example path: spu32_cpu.dec_inst._086_
synth6.v:6539:106: Example path: ASSIGNW
synth6.v:5727:16: Example path: spu32_cpu.dec_inst._173_
%Error: Exiting due to 25 warning(s)
```
:::
* The error occurs because Verilator is very picky about combinational loops and, by default, exits when it emits warnings such as UNOPTFLAT. SPU32, however, uses a loop to implement its multiplier.

* Solution:
add ```/* verilator lint_off UNOPTFLAT */``` to the top of synth.v
* Also change spu32/software/spu32-system/libtinyc/libtiny.c
```int printf(const char* format, ...){}```
* Verification Program to test mul instruction:
```=
#include <verilated.h>
#include <stdint.h>
#include <stdio.h>
#include "Vsynth.h"
void mul(int32_t expected, int32_t s1, int32_t s2){
int32_t result;
asm volatile("mul %0, %1, %2;": "=r"(result): "r"(s1), "r"(s2));
if (result != expected){
printf("mul fail: %d %d %d but got %d\n\r", expected, s1, s2, result);
}
}
int main(){
int32_t a = 3;
int32_t b = 5;
mul(15,a,b);
printf("Hello World %d\n", a*b);
return 0;
}
```
* Execute:
```=
~/spu32$ verilator -Wno-fatal synth.v mul_tb.cpp --cc --trace --exe
~/spu32$ make -j -C obj_dir -f Vsynth.mk
make: Entering directory '/home/matthew/spu32/obj_dir'
g++ -I. -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow -Os -c -o multest.o ../software/spu32-system/apps/multest/multest.c
/software/spu32-system/apps/multest/multest.c:42: Error: number of operands mismatch for `mul'
make: *** [Vcpu.mk:59: multest.o] Error 1
make: *** Waiting for unfinished jobs....
make: Leaving directory '/home/matthew/spu32/obj_dir'
```
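The build fails with the message "number of operands mismatch for mul". If the program only prints the product without inline assembly, it builds and runs correctly. Besides, I find that even a simple instruction like ```add``` triggers the same message. A likely cause, judging from the log, is that the file is compiled by the host ```g++``` rather than by the RISC-V cross compiler, so the host assembler rejects the three-operand RISC-V syntax.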
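One way to keep such a test buildable on both sides is to guard the inline assembly with the `__riscv` macro, which RISC-V compilers predefine. The following is a minimal sketch under that assumption; `mul_under_test` and `check_mul` are hypothetical names, and on the host the fallback only exercises plain C multiplication, not the SPU32 multiplier.

```c
#include <stdint.h>
#include <stdio.h>

/* Use the hardware `mul` instruction when compiling for RISC-V; otherwise
   fall back to plain C so the file still builds with the host compiler. */
static int32_t mul_under_test(int32_t s1, int32_t s2) {
#if defined(__riscv)
    int32_t result;
    __asm__ volatile("mul %0, %1, %2" : "=r"(result) : "r"(s1), "r"(s2));
    return result;
#else
    /* Multiply as unsigned to avoid signed-overflow undefined behavior. */
    return (int32_t)((uint32_t)s1 * (uint32_t)s2);
#endif
}

/* Returns 1 if the product matches `expected`, otherwise prints a
   diagnostic and returns 0. */
int check_mul(int32_t expected, int32_t s1, int32_t s2) {
    int32_t result = mul_under_test(s1, s2);
    if (result != expected) {
        printf("mul fail: %d * %d: expected %d but got %d\n",
               (int)s1, (int)s2, (int)expected, (int)result);
        return 0;
    }
    return 1;
}
```

Calling `check_mul(15, 3, 5)` returns 1 on either target; to actually test the `mul` instruction, the file has to be built with the RISC-V toolchain installed earlier instead of the host compiler.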
### Testbench (in Verilog)
* Install [Icarus Verilog](http://iverilog.icarus.com/):
```$ sudo apt install iverilog```
* Generate vvp assembly:
```iverilog -o div.vvp cpu/tests/div_tb.v cpu/div.v```
* Check:
```$ vvp div.vvp ```
:::spoiler
```
VCD info: dumpfile ./cpu/tests/div_tb.lxt opened for output.
div: 0 / 0 = -1
div: 0 / 1 = 0
div: 0 / -1 = 0
div: 0 / 2147483647 = 0
div: 0 / -2147483648 = 0
div: 1 / 0 = -1
div: 1 / 1 = 1
div: 1 / -1 = -1
div: 1 / 2147483647 = 0
div: 1 / -2147483648 = 0
div: -1 / 0 = -1
div: -1 / 1 = -1
div: -1 / -1 = 1
div: -1 / 2147483647 = 0
div: -1 / -2147483648 = 0
div: 2147483647 / 0 = -1
div: 2147483647 / 1 = 2147483647
div: 2147483647 / -1 = -2147483647
div: 2147483647 / 2147483647 = 1
div: 2147483647 / -2147483648 = 0
div: -2147483648 / 0 = -1
div: -2147483648 / 1 = -2147483648
div: -2147483648 / -1 = -2147483648
div: -2147483648 / 2147483647 = -1
div: -2147483648 / -2147483648 = 1
divu: 0 / 0 = 4294967295
divu: 0 / 1 = 0
divu: 0 / 4294967295 = 0
divu: 0 / 2147483647 = 0
divu: 0 / 2147483648 = 0
divu: 1 / 0 = 4294967295
divu: 1 / 1 = 1
divu: 1 / 4294967295 = 0
divu: 1 / 2147483647 = 0
divu: 1 / 2147483648 = 0
divu: 4294967295 / 0 = 4294967295
divu: 4294967295 / 1 = 4294967295
divu: 4294967295 / 4294967295 = 1
divu: 4294967295 / 2147483647 = 2
divu: 4294967295 / 2147483648 = 1
divu: 2147483647 / 0 = 4294967295
divu: 2147483647 / 1 = 2147483647
divu: 2147483647 / 4294967295 = 0
divu: 2147483647 / 2147483647 = 1
divu: 2147483647 / 2147483648 = 0
divu: 2147483648 / 0 = 4294967295
divu: 2147483648 / 1 = 2147483648
divu: 2147483648 / 4294967295 = 0
divu: 2147483648 / 2147483647 = 1
divu: 2147483648 / 2147483648 = 1
divu: 4294967295 / 2684354564 = 1
remu: 4294967295 % 2684354564 = 1610612731
div: -1 / -1610612731 = 0
rem: -1 % -1610612731 = -1
divu: 4294967295 / 2684354571 = 1
remu: 4294967295 % 2684354571 = 1610612724
div: -1 / -1610612724 = 0
rem: -1 % -1610612724 = -1
divu: 4294967295 / 2684354572 = 1
remu: 4294967295 % 2684354572 = 1610612723
div: -1 / -1610612723 = 0
rem: -1 % -1610612723 = -1
divu: 4294967295 / 2684354573 = 1
remu: 4294967295 % 2684354573 = 1610612722
div: -1 / -1610612722 = 0
rem: -1 % -1610612722 = -1
divu: 4294967295 / 2684354574 = 1
remu: 4294967295 % 2684354574 = 1610612721
div: -1 / -1610612721 = 0
rem: -1 % -1610612721 = -1
divu: 4294967295 / 2684354575 = 1
remu: 4294967295 % 2684354575 = 1610612720
div: -1 / -1073741840 = 0
rem: -1 % -1073741840 = -1
divu: 4294967295 / 3221225456 = 1
remu: 4294967295 % 3221225456 = 1073741839
div: -1 / -1073741839 = 0
rem: -1 % -1073741839 = -1
divu: 4294967295 / 3221225457 = 1
remu: 4294967295 % 3221225457 = 1073741838
div: -1 / -1073741838 = 0
rem: -1 % -1073741838 = -1
divu: 4294967295 / 3221225458 = 1
remu: 4294967295 % 3221225458 = 1073741837
div: -1 / -1073741837 = 0
rem: -1 % -1073741837 = -1
divu: 4294967295 / 3221225459 = 1
remu: 4294967295 % 3221225459 = 1073741836
div: -1 / -1073741836 = 0
rem: -1 % -1073741836 = -1
divu: 4294967295 / 3221225460 = 1
remu: 4294967295 % 3221225460 = 1073741835
div: -1 / -1073741835 = 0
rem: -1 % -1073741835 = -1
divu: 4294967295 / 3221225461 = 1
remu: 4294967295 % 3221225461 = 1073741834
div: -1 / -1073741834 = 0
rem: -1 % -1073741834 = -1
divu: 4294967295 / 3221225462 = 1
remu: 4294967295 % 3221225462 = 1073741833
div: -1 / -1073741833 = 0
rem: -1 % -1073741833 = -1
divu: 4294967295 / 3221225463 = 1
remu: 4294967295 % 3221225463 = 1073741832
div: -1 / -1073741832 = 0
rem: -1 % -1073741832 = -1
divu: 4294967295 / 3221225464 = 1
remu: 4294967295 % 3221225464 = 1073741831
div: -1 / -1073741831 = 0
rem: -1 % -1073741831 = -1
divu: 4294967295 / 3221225465 = 1
remu: 4294967295 % 3221225465 = 1073741830
div: -1 / -1073741830 = 0
rem: -1 % -1073741830 = -1
divu: 4294967295 / 3221225466 = 1
remu: 4294967295 % 3221225466 = 1073741829
div: -1 / -1073741829 = 0
rem: -1 % -1073741829 = -1
divu: 4294967295 / 3221225467 = 1
remu: 4294967295 % 3221225467 = 1073741828
div: -1 / -1073741828 = 0
rem: -1 % -1073741828 = -1
divu: 4294967295 / 3221225468 = 1
remu: 4294967295 % 3221225468 = 1073741827
div: -1 / -1073741827 = 0
rem: -1 % -1073741827 = -1
divu: 4294967295 / 3221225469 = 1
remu: 4294967295 % 3221225469 = 1073741826
div: -1 / -1073741826 = 0
rem: -1 % -1073741826 = -1
divu: 4294967295 / 3221225470 = 1
remu: 4294967295 % 3221225470 = 1073741825
div: -1 / -1073741825 = 0
rem: -1 % -1073741825 = -1
divu: 4294967295 / 3221225471 = 1
remu: 4294967295 % 3221225471 = 1073741824
div: -1 / -1073741824 = 0
rem: -1 % -1073741824 = -1
divu: 4294967295 / 3221225472 = 1
remu: 4294967295 % 3221225472 = 1073741823
div: -1 / -1073741823 = 0
rem: -1 % -1073741823 = -1
divu: 4294967295 / 3221225473 = 1
remu: 4294967295 % 3221225473 = 1073741822
div: -1 / -1073741822 = 0
rem: -1 % -1073741822 = -1
divu: 4294967295 / 3221225474 = 1
remu: 4294967295 % 3221225474 = 1073741821
div: -1 / -1073741821 = 0
rem: -1 % -1073741821 = -1
divu: 4294967295 / 3221225475 = 1
remu: 4294967295 % 3221225475 = 1073741820
div: -1 / -1073741820 = 0
rem: -1 % -1073741820 = -1
divu: 4294967295 / 3221225476 = 1
remu: 4294967295 % 3221225476 = 1073741819
div: -1 / -1073741819 = 0
rem: -1 % -1073741819 = -1
divu: 4294967295 / 3221225477 = 1
remu: 4294967295 % 3221225477 = 1073741818
div: -1 / -1073741818 = 0
rem: -1 % -1073741818 = -1
divu: 4294967295 / 3221225478 = 1
remu: 4294967295 % 3221225478 = 1073741817
div: -1 / -1073741817 = 0
rem: -1 % -1073741817 = -1
divu: 4294967295 / 3221225479 = 1
remu: 4294967295 % 3221225479 = 1073741816
div: -1 / -1073741816 = 0
rem: -1 % -1073741816 = -1
divu: 4294967295 / 3221225480 = 1
remu: 4294967295 % 3221225480 = 1073741815
div: -1 / -1073741815 = 0
rem: -1 % -1073741815 = -1
divu: 4294967295 / 3221225481 = 1
remu: 4294967295 % 3221225481 = 1073741814
div: -1 / -1073741814 = 0
rem: -1 % -1073741814 = -1
divu: 4294967295 / 3221225482 = 1
remu: 4294967295 % 3221225482 = 1073741813
div: -1 / -1073741813 = 0
rem: -1 % -1073741813 = -1
divu: 4294967295 / 3221225483 = 1
remu: 4294967295 % 3221225483 = 1073741812
div: -1 / -1073741812 = 0
rem: -1 % -1073741812 = -1
divu: 4294967295 / 3221225484 = 1
remu: 4294967295 % 3221225484 = 1073741811
div: -1 / -1073741811 = 0
rem: -1 % -1073741811 = -1
divu: 4294967295 / 3221225485 = 1
remu: 4294967295 % 3221225485 = 1073741810
div: -1 / -1073741810 = 0
rem: -1 % -1073741810 = -1
divu: 4294967295 / 3221225486 = 1
remu: 4294967295 % 3221225486 = 1073741809
div: -1 / -1073741809 = 0
rem: -1 % -1073741809 = -1
divu: 4294967295 / 3221225487 = 1
remu: 4294967295 % 3221225487 = 1073741808
div: -1 / -536870928 = 0
rem: -1 % -536870928 = -1
divu: 4294967295 / 3758096368 = 1
remu: 4294967295 % 3758096368 = 536870927
div: -1 / -536870927 = 0
rem: -1 % -536870927 = -1
divu: 4294967295 / 3758096369 = 1
remu: 4294967295 % 3758096369 = 536870926
div: -1 / -536870926 = 0
rem: -1 % -536870926 = -1
divu: 4294967295 / 3758096370 = 1
remu: 4294967295 % 3758096370 = 536870925
div: -1 / -536870925 = 0
...
```
:::
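Rather than reading the log line by line, each quotient/remainder pair can be checked against the identity *n = d × q + r*. The sketch below is an illustrative checker (the names `divu_consistent` and `div_consistent` are invented here); it assumes a nonzero divisor and, for the signed case, no overflow, since those special cases follow the fixed values listed in the introduction.

```c
#include <stdint.h>

/* Unsigned check for d != 0: n == d*q + r and r < d. */
int divu_consistent(uint32_t n, uint32_t d, uint32_t q, uint32_t r) {
    return (uint64_t)d * q + r == n && r < d;
}

/* Signed check for d != 0 and no overflow: n == d*q + r, |r| < |d|,
   and a nonzero remainder has the sign of the dividend. */
int div_consistent(int32_t n, int32_t d, int32_t q, int32_t r) {
    int64_t abs_r = r < 0 ? -(int64_t)r : (int64_t)r;
    int64_t abs_d = d < 0 ? -(int64_t)d : (int64_t)d;
    return (int64_t)d * q + r == n
        && abs_r < abs_d
        && (r == 0 || (r < 0) == (n < 0));
}
```

With values taken from the log, `divu_consistent(4294967295u, 2684354564u, 1, 1610612731u)` and `div_consistent(-1, -1610612731, 0, -1)` both return 1.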
### Analysis
* Benchmark? Candidates are the **Dhrystone** and **CoreMark** benchmark programs.
* Compare the results with and without RV32M using **statistics**?
### References
* [YOSYS Installation Guide](https://symbiyosys.readthedocs.io/en/latest/install.html)
* [Verilator Homepage](https://www.veripool.org/verilator/)
* [Verilator C++ Trace](https://veripool.org/guide/latest/faq.html#how-do-i-generate-waveforms-traces-in-c)
* [SPU32](https://github.com/maikmerten/spu32/)
* [32 Verilog Mini Projects](https://github.com/sudhamshu091/32-Verilog-Mini-Projects)
* [The RISC-V Instruction Set Manual](https://iscasmc.ios.ac.cn/iscasmcwp/wp-content/uploads/2021/11/The-RISC-V-Instruction-Set-Manual-Volume-I-Unprivileged-ISA.pdf)
###### contributed by ```陳柏瑋```