# Assignment3: SoftCPU
contributed by <`Zheng-Xian Li(garyparrot)`>
## Goal
1. Follow the instructions in [this article](https://hackmd.io/@sysprog/S1Udn1Xtt).
    * Modify the assembly programs used/done with Assignment1 for srv32 simulation with Verilator.
    * Validate the results in your program.
2. Check the generated file `wave.fst` and use GTKWave to view the waveform. Then, explain how your program is executed along with srv32 simulation.
    * Show the signals/events inside `srv32` associated with
        * PC
        * Branch
        * Instruction Memory (I-MEM)
        * Data Memory (D-MEM)
        * Instruction Internals
    * Discuss the pipeline architecture along with your program.
3. Propose software optimizations (against your program) based on the pipeline design of `srv32`.
    * Goals
        * Fewer instructions
        * Shorter cycle counts
        * Eliminate unnecessary stalls.
4. Summarize how the `RISC-V Compliance Tests` work and
    * Why the signature should be matched.
5. Explain how `srv32` works with `Verilator`.
## [1] Following the instructions of that article
In this homework we use `srv32`, a simple RISC-V core with a 3-stage pipeline. It supports the `RV32IM` instruction set (verified by the `RV32IM compliance test`) ...
### Run RTL sim
First we run the RTL simulation, which is built with [Verilator](https://www.veripool.org/verilator/). Verilator compiles the Verilog RTL into a C++ model, so it can simulate the execution of a RISC-V binary at the register-transfer level.
#### My homework 1 code (sqrt)
:::spoiler my code
```c=
#include <stdio.h>

int mySqrt(int x){
    unsigned int s = 0;
    for(unsigned int i = (1 << 15); i > 0; i >>= 1)
        if((s+i) * (s+i) <= x)
            s += i;
    return s;
}

int main() {
    printf("%d\n", mySqrt(2147483647));
    return 0;
}
```
:::
#### Result
:::spoiler `make sqrt.run`
```
46340
Excuting 1897 instructions, 2415 cycles, 1.273 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.059 s
Simulation cycles: 2426
Simulation speed : 0.0411186 MHz
```
:::
### Run ISS sim
The instruction set simulator (`rvsim`) simulates the execution of a RISC-V binary purely in software, without modelling the RTL.
#### Result
:::spoiler `make sqrt.run`
```
./rvsim --memsize 128 -l trace.log ../sw/sqrt/sqrt.elf
46340
Excuting 1897 instructions, 2415 cycles, 1.273 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.002 s
Simulation cycles: 2415
Simulation speed : 1.197 MHz
```
:::
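The CPI figure in both logs is the cycle count divided by the instruction count. As a quick sanity check, the arithmetic can be sketched as below; the function name `cpi` is only illustrative and is not part of `srv32`.

```c
#include <assert.h>

/* Cycles per instruction: total cycles divided by retired instructions.
 * `cpi` is a hypothetical helper name used only for this note. */
double cpi(unsigned long cycles, unsigned long instructions)
{
    assert(instructions > 0); /* avoid dividing by zero */
    return (double)cycles / (double)instructions;
}
```

With the numbers reported above, `cpi(2415, 1897)` evaluates to roughly 1.273, which matches the summary line.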
### Run RISC-V compliance test (v1.0)
Change directory into `./tests`.
* Execute `make tests` to run the RISC-V compliance tests on the RTL simulator.
* Execute `make tests-sw` to run the RISC-V compliance tests on the software simulator (ISS).
:::spoiler `make tests`
```
if [ "1" = "1" ]; then \
if [ ! -d riscv-compliance.v1 ]; then \
git clone -b 1.0 git://github.com/riscv/riscv-arch-test.git riscv-compliance.v1; \
fi; \
rm -rf riscv-compliance.v1/riscv-target/srv32; \
cp -r srv32.v1 riscv-compliance.v1/riscv-target/srv32; \
else \
if [ ! -d riscv-compliance.v2 ]; then \
git clone git://github.com/riscv/riscv-arch-test.git riscv-compliance.v2; \
fi; \
rm -rf riscv-compliance.v2/riscv-target/srv32; \
cp -r srv32.v2 riscv-compliance.v2/riscv-target/srv32; \
fi
export ROOT_SRV32=/home/garyparrot/Programming/srv32; \
export TARGET_SIM="/home/garyparrot/Programming/srv32/sim/sim +trace"; \
export TARGET_SWSIM="/home/garyparrot/Programming/srv32/tools/rvsim --memsize 128"; \
export RISCV_PREFIX=riscv-none-embed-; \
export RISCV_TARGET=srv32; \
make rv32c=0 -C riscv-compliance.v1
make[1]: Entering directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1'
for isa in rv32i rv32im rv32Zicsr; do \
make RISCV_TARGET=srv32 RISCV_TARGET_FLAGS="" RISCV_DEVICE=$isa RISCV_ISA=$isa variant; \
rc=$?; \
if [ $rc -ne 0 ]; then \
exit $rc; \
fi \
done
make[2]: Entering directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1'
make \
RISCV_TARGET=srv32 \
RISCV_DEVICE=rv32i \
RISCV_PREFIX=riscv-none-embed- \
run -C /home/garyparrot/Programming/srv32/tests/riscv-compliance.v1/riscv-test-suite/rv32i
make[3]: Entering directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1/riscv-test-suite/rv32i'
make[3]: Nothing to be done for 'run'.
make[3]: Leaving directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1/riscv-test-suite/rv32i'
riscv-test-env/verify.sh
Compare to reference files ...
Check I-ADD-01 ... OK
Check I-ADDI-01 ... OK
Check I-AND-01 ... OK
Check I-ANDI-01 ... OK
Check I-AUIPC-01 ... OK
Check I-BEQ-01 ... OK
Check I-BGE-01 ... OK
Check I-BGEU-01 ... OK
Check I-BLT-01 ... OK
Check I-BLTU-01 ... OK
Check I-BNE-01 ... OK
Check I-DELAY_SLOTS-01 ... OK
Check I-EBREAK-01 ... OK
Check I-ECALL-01 ... OK
Check I-ENDIANESS-01 ... OK
Check I-IO-01 ... OK
Check I-JAL-01 ... OK
Check I-JALR-01 ... OK
Check I-LB-01 ... OK
Check I-LBU-01 ... OK
Check I-LH-01 ... OK
Check I-LHU-01 ... OK
Check I-LUI-01 ... OK
Check I-LW-01 ... OK
Check I-MISALIGN_JMP-01 ... OK
Check I-MISALIGN_LDST-01 ... OK
Check I-NOP-01 ... OK
Check I-OR-01 ... OK
Check I-ORI-01 ... OK
Check I-RF_size-01 ... OK
Check I-RF_width-01 ... OK
Check I-RF_x0-01 ... OK
Check I-SB-01 ... OK
Check I-SH-01 ... OK
Check I-SLL-01 ... OK
Check I-SLLI-01 ... OK
Check I-SLT-01 ... OK
Check I-SLTI-01 ... OK
Check I-SLTIU-01 ... OK
Check I-SLTU-01 ... OK
Check I-SRA-01 ... OK
Check I-SRAI-01 ... OK
Check I-SRL-01 ... OK
Check I-SRLI-01 ... OK
Check I-SUB-01 ... OK
Check I-SW-01 ... OK
Check I-XOR-01 ... OK
Check I-XORI-01 ... OK
--------------------------------
OK: 48/48 RISCV_TARGET=srv32 RISCV_DEVICE=rv32i RISCV_ISA=rv32i
make[2]: Leaving directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1'
make[2]: Entering directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1'
make \
RISCV_TARGET=srv32 \
RISCV_DEVICE=rv32im \
RISCV_PREFIX=riscv-none-embed- \
run -C /home/garyparrot/Programming/srv32/tests/riscv-compliance.v1/riscv-test-suite/rv32im
make[3]: Entering directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1/riscv-test-suite/rv32im'
make[3]: Nothing to be done for 'run'.
make[3]: Leaving directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1/riscv-test-suite/rv32im'
riscv-test-env/verify.sh
Compare to reference files ...
Check DIV ... OK
Check DIVU ... OK
Check MULH ... OK
Check MULHSU ... OK
Check MULHU ... OK
Check MUL ... OK
Check REM ... OK
Check REMU ... OK
--------------------------------
OK: 8/8 RISCV_TARGET=srv32 RISCV_DEVICE=rv32im RISCV_ISA=rv32im
make[2]: Leaving directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1'
make[2]: Entering directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1'
make \
RISCV_TARGET=srv32 \
RISCV_DEVICE=rv32Zicsr \
RISCV_PREFIX=riscv-none-embed- \
run -C /home/garyparrot/Programming/srv32/tests/riscv-compliance.v1/riscv-test-suite/rv32Zicsr
make[3]: Entering directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1/riscv-test-suite/rv32Zicsr'
make[3]: Nothing to be done for 'run'.
make[3]: Leaving directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1/riscv-test-suite/rv32Zicsr'
riscv-test-env/verify.sh
Compare to reference files ...
Check I-CSRRC-01 ... OK
Check I-CSRRCI-01 ... OK
Check I-CSRRS-01 ... OK
Check I-CSRRSI-01 ... OK
Check I-CSRRW-01 ... OK
Check I-CSRRWI-01 ... OK
--------------------------------
OK: 6/6 RISCV_TARGET=srv32 RISCV_DEVICE=rv32Zicsr RISCV_ISA=rv32Zicsr
make[2]: Leaving directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1'
make[1]: Leaving directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1'
```
:::
:::spoiler `make tests-sw`
```
if [ "1" = "1" ]; then \
if [ ! -d riscv-compliance.v1 ]; then \
git clone -b 1.0 git://github.com/riscv/riscv-arch-test.git riscv-compliance.v1; \
fi; \
rm -rf riscv-compliance.v1/riscv-target/srv32; \
cp -r srv32.v1 riscv-compliance.v1/riscv-target/srv32; \
else \
if [ ! -d riscv-compliance.v2 ]; then \
git clone git://github.com/riscv/riscv-arch-test.git riscv-compliance.v2; \
fi; \
rm -rf riscv-compliance.v2/riscv-target/srv32; \
cp -r srv32.v2 riscv-compliance.v2/riscv-target/srv32; \
fi
export ROOT_SRV32=/home/garyparrot/Programming/srv32; \
export TARGET_SIM="/home/garyparrot/Programming/srv32/tools/rvsim --memsize 128 -l trace.log"; \
export TARGET_SWSIM="/home/garyparrot/Programming/srv32/tools/rvsim --memsize 128"; \
export RISCV_PREFIX=riscv-none-embed-; \
export RISCV_TARGET=srv32; \
make rv32c=0 -C riscv-compliance.v1
make[1]: Entering directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1'
for isa in rv32i rv32im rv32Zicsr; do \
make RISCV_TARGET=srv32 RISCV_TARGET_FLAGS="" RISCV_DEVICE=$isa RISCV_ISA=$isa variant; \
rc=$?; \
if [ $rc -ne 0 ]; then \
exit $rc; \
fi \
done
make[2]: Entering directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1'
make \
RISCV_TARGET=srv32 \
RISCV_DEVICE=rv32i \
RISCV_PREFIX=riscv-none-embed- \
run -C /home/garyparrot/Programming/srv32/tests/riscv-compliance.v1/riscv-test-suite/rv32i
make[3]: Entering directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1/riscv-test-suite/rv32i'
make[3]: Nothing to be done for 'run'.
make[3]: Leaving directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1/riscv-test-suite/rv32i'
riscv-test-env/verify.sh
Compare to reference files ...
Check I-ADD-01 ... OK
Check I-ADDI-01 ... OK
Check I-AND-01 ... OK
Check I-ANDI-01 ... OK
Check I-AUIPC-01 ... OK
Check I-BEQ-01 ... OK
Check I-BGE-01 ... OK
Check I-BGEU-01 ... OK
Check I-BLT-01 ... OK
Check I-BLTU-01 ... OK
Check I-BNE-01 ... OK
Check I-DELAY_SLOTS-01 ... OK
Check I-EBREAK-01 ... OK
Check I-ECALL-01 ... OK
Check I-ENDIANESS-01 ... OK
Check I-IO-01 ... OK
Check I-JAL-01 ... OK
Check I-JALR-01 ... OK
Check I-LB-01 ... OK
Check I-LBU-01 ... OK
Check I-LH-01 ... OK
Check I-LHU-01 ... OK
Check I-LUI-01 ... OK
Check I-LW-01 ... OK
Check I-MISALIGN_JMP-01 ... OK
Check I-MISALIGN_LDST-01 ... OK
Check I-NOP-01 ... OK
Check I-OR-01 ... OK
Check I-ORI-01 ... OK
Check I-RF_size-01 ... OK
Check I-RF_width-01 ... OK
Check I-RF_x0-01 ... OK
Check I-SB-01 ... OK
Check I-SH-01 ... OK
Check I-SLL-01 ... OK
Check I-SLLI-01 ... OK
Check I-SLT-01 ... OK
Check I-SLTI-01 ... OK
Check I-SLTIU-01 ... OK
Check I-SLTU-01 ... OK
Check I-SRA-01 ... OK
Check I-SRAI-01 ... OK
Check I-SRL-01 ... OK
Check I-SRLI-01 ... OK
Check I-SUB-01 ... OK
Check I-SW-01 ... OK
Check I-XOR-01 ... OK
Check I-XORI-01 ... OK
--------------------------------
OK: 48/48 RISCV_TARGET=srv32 RISCV_DEVICE=rv32i RISCV_ISA=rv32i
make[2]: Leaving directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1'
make[2]: Entering directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1'
make \
RISCV_TARGET=srv32 \
RISCV_DEVICE=rv32im \
RISCV_PREFIX=riscv-none-embed- \
run -C /home/garyparrot/Programming/srv32/tests/riscv-compliance.v1/riscv-test-suite/rv32im
make[3]: Entering directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1/riscv-test-suite/rv32im'
make[3]: Nothing to be done for 'run'.
make[3]: Leaving directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1/riscv-test-suite/rv32im'
riscv-test-env/verify.sh
Compare to reference files ...
Check DIV ... OK
Check DIVU ... OK
Check MULH ... OK
Check MULHSU ... OK
Check MULHU ... OK
Check MUL ... OK
Check REM ... OK
Check REMU ... OK
--------------------------------
OK: 8/8 RISCV_TARGET=srv32 RISCV_DEVICE=rv32im RISCV_ISA=rv32im
make[2]: Leaving directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1'
make[2]: Entering directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1'
make \
RISCV_TARGET=srv32 \
RISCV_DEVICE=rv32Zicsr \
RISCV_PREFIX=riscv-none-embed- \
run -C /home/garyparrot/Programming/srv32/tests/riscv-compliance.v1/riscv-test-suite/rv32Zicsr
make[3]: Entering directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1/riscv-test-suite/rv32Zicsr'
make[3]: Nothing to be done for 'run'.
make[3]: Leaving directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1/riscv-test-suite/rv32Zicsr'
riscv-test-env/verify.sh
Compare to reference files ...
Check I-CSRRC-01 ... OK
Check I-CSRRCI-01 ... OK
Check I-CSRRS-01 ... OK
Check I-CSRRSI-01 ... OK
Check I-CSRRW-01 ... OK
Check I-CSRRWI-01 ... OK
--------------------------------
OK: 6/6 RISCV_TARGET=srv32 RISCV_DEVICE=rv32Zicsr RISCV_ISA=rv32Zicsr
make[2]: Leaving directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1'
make[1]: Leaving directory '/home/garyparrot/Programming/srv32/tests/riscv-compliance.v1'
```
:::
## [2] Use GTKWave to view execution signals.
In this section we observe the internal signals of the pipelined core with `GTKWave`.

Judging from the image we can see the initial execution of our program.
### imem_addr

Judging from the image, `imem_addr` holds the instruction memory address of the fetch stage (`fetch_pc`). The [pipeline](https://i.imgur.com/Hz9FNKZ.jpg) diagram shows that the `EX` stage is two stages after the `FETCH` stage, and the waveform confirms this.
### Registers Access (wb_dst_sel, ex_src1_sel, ex_src2_sel)

I highlight the following instruction in the given image.
```
4: 64028293 addi t0,t0,1600 # 14640 <trap_handler>
```
In that image I focus on the `ex` stage, which is executing the `addi` instruction.
* `ex_imm_sel` is asserted (the immediate is selected).
* `ex_imm` equals `1600` in decimal (`0x640`).
* `ex_src1_sel` selects `05`, which is the register id of `t0`.
* `ex_src2_sel` selects nothing, since the I-type format has no second source register.
* `ex_pc` equals `0x4`, the address of the highlighted `addi`.
* In the next clock cycle, `wb_dst_sel` equals `05`, which is the register id of `t0`.
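The values observed above follow from the RV32I I-type encoding of `0x64028293`. A minimal sketch of that decoding is shown below; the struct and function names are made up for this note and do not correspond to signals or code in `srv32`.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an RV32I I-type instruction such as addi.
 * `struct itype` and `decode_itype` are hypothetical names. */
struct itype {
    uint32_t opcode; /* bits  6:0  */
    uint32_t rd;     /* bits 11:7  */
    uint32_t funct3; /* bits 14:12 */
    uint32_t rs1;    /* bits 19:15 */
    int32_t  imm;    /* bits 31:20, sign-extended */
};

struct itype decode_itype(uint32_t insn)
{
    struct itype d;

    assert((insn & 0x3) == 0x3); /* 32-bit encodings end in 0b11 */
    d.opcode = insn & 0x7f;
    d.rd     = (insn >> 7) & 0x1f;
    d.funct3 = (insn >> 12) & 0x7;
    d.rs1    = (insn >> 15) & 0x1f;
    d.imm    = (int32_t)(insn >> 20); /* 12-bit field, still unsigned here */
    if (d.imm & 0x800)                /* bit 11 set: negative immediate */
        d.imm -= 0x1000;
    return d;
}
```

For `decode_itype(0x64028293)` this gives `rd = 5`, `rs1 = 5` and `imm = 1600`, in line with `wb_dst_sel`, `ex_src1_sel` and `ex_imm` in the waveform.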
## [3] Propose the software optimizations
`printf` adds too much unrelated noise to the trace, so in this section I slightly modify my code as follows.
```c=
#include <stdio.h>

int mySqrt(int x){
    unsigned int s = 0;
    for(unsigned int i = (1 << 15); i > 0; i >>= 1)
        if((s+i) * (s+i) <= x)
            s += i;
    return s;
}

int main() {
    return mySqrt(40000);
    // printf("%d\n", mySqrt(2147483647));
}
```
```
0000003c <mySqrt>:
3c: 00050593 mv a1,a0
40: 01000693 li a3,16
44: 00008737 lui a4,0x8
48: 00000513 li a0,0
4c: 0100006f j 5c <mySqrt+0x20>
50: 00175713 srli a4,a4,0x1
54: fff68693 addi a3,a3,-1
58: 00068c63 beqz a3,70 <mySqrt+0x34>
5c: 00e507b3 add a5,a0,a4
60: 02f78633 mul a2,a5,a5
64: fec5e6e3 bltu a1,a2,50 <mySqrt+0x14>
68: 00078513 mv a0,a5
6c: fe5ff06f j 50 <mySqrt+0x14>
70: 00008067 ret
```
```
Excuting 134 instructions, 180 cycles, 1.343 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.014 s
Simulation cycles: 191
Simulation speed : 0.0136429 MHz
```
According to this [article](https://hackmd.io/@sysprog/S1Udn1Xtt), the pipeline suffers from **branch stalls**. The following image shows the issue.

Analyzing the waveform, we find that the following instructions cause branch stalls (with the number of times each one is taken):
* 1 hit `` 4c: 0100006f j 5c <mySqrt+0x20>``
* 13 hit `` 64: fec5e6e3 bltu a1,a2,50 <mySqrt+0x14>``
* 3 hit `` 6c: fe5ff06f j 50 <mySqrt+0x14>``
* 1 hit `` 58: 00068c63 beqz a3,70 <mySqrt+0x34>``
* 1 hit `` 70: 00008067 ret``
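The hit counts above can be totalled to get a rough estimate of the cycles lost inside `mySqrt`. The sketch below is only an estimate under a stated assumption: the penalty per taken branch is passed in as a parameter, and the value of 2 cycles used in the note after the code is an assumption about `srv32`, not something measured here. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the taken-branch counts read off the waveform. */
unsigned total_taken(const unsigned *hits, size_t n)
{
    unsigned total = 0;

    assert(hits != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += hits[i];
    return total;
}

/* Cycles lost to stalls, given an assumed penalty per taken branch. */
unsigned stall_cycles(unsigned taken, unsigned penalty)
{
    return taken * penalty;
}
```

Feeding in the counts `{1, 13, 3, 1, 1}` gives 19 taken branches; with an assumed penalty of 2 cycles each, that is 38 stall cycles. This only covers the branches listed above, not those in the rest of the program.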
So it looks like the following code causes most of the branch stalls.
```c
if((s+i) * (s+i) <= x)
```
Since this is just a simple loop, loop unrolling will help.
### Result
My approach turns each `if` branch into the result of a boolean expression. Instead of branching on that result, I multiply it with the bit currently being tested. If the expression is true, the bit is multiplied by `1` and keeps its value. If it is false, the bit is multiplied by `0`, which has the same effect as not taking the branch.
```c=
#include <stdio.h>
int mySqrt(int x){
    unsigned int s = 0;
    /* Add bit i to s when (s+i)^2 <= x; the comparison yields 0 or 1. */
    #define roll(s,i,x) (s += i * ((s+i)*(s+i) <= x))
    roll(s ,(1 << 15), x);
    roll(s ,(1 << 14), x);
    roll(s ,(1 << 13), x);
    roll(s ,(1 << 12), x);
    roll(s ,(1 << 11), x);
    roll(s ,(1 << 10), x);
    roll(s ,(1 << 9), x);
    roll(s ,(1 << 8), x);
    roll(s ,(1 << 7), x);
    roll(s ,(1 << 6), x);
    roll(s ,(1 << 5), x);
    roll(s ,(1 << 4), x);
    roll(s ,(1 << 3), x);
    roll(s ,(1 << 2), x);
    roll(s ,(1 << 1), x);
    /* Note: the loop version also tests (1 << 0). That step is not
       unrolled here, so an odd root would lose its lowest bit. */
    return s;
}

int main() {
    return mySqrt(40000);
}
```
:::spoiler assembly

```
0000003c <mySqrt>:
3c: 400006b7 lui a3,0x40000
40: 00d536b3 sltu a3,a0,a3
44: 0016c693 xori a3,a3,1
48: 00f69693 slli a3,a3,0xf
4c: 00004737 lui a4,0x4
50: 00e68733 add a4,a3,a4
54: 02e70733 mul a4,a4,a4
58: 00e53733 sltu a4,a0,a4
5c: 00174713 xori a4,a4,1
60: 00e71713 slli a4,a4,0xe
64: 00e686b3 add a3,a3,a4
68: 00002737 lui a4,0x2
6c: 00e68733 add a4,a3,a4
70: 02e707b3 mul a5,a4,a4
74: 00f537b3 sltu a5,a0,a5
78: 0017c793 xori a5,a5,1
7c: 00d79793 slli a5,a5,0xd
80: 00d786b3 add a3,a5,a3
84: 00001737 lui a4,0x1
88: 00e68733 add a4,a3,a4
8c: 02e707b3 mul a5,a4,a4
90: 00f537b3 sltu a5,a0,a5
94: 0017c793 xori a5,a5,1
98: 00c79793 slli a5,a5,0xc
9c: 00d787b3 add a5,a5,a3
a0: 00001737 lui a4,0x1
a4: 80070713 addi a4,a4,-2048 # 800 <_text_end+0x294>
a8: 00e78733 add a4,a5,a4
ac: 02e70733 mul a4,a4,a4
b0: 00000693 li a3,0
b4: 00e56663 bltu a0,a4,c0 <mySqrt+0x84>
b8: 000016b7 lui a3,0x1
bc: 80068693 addi a3,a3,-2048 # 800 <_text_end+0x294>
c0: 00f686b3 add a3,a3,a5
c4: 40068713 addi a4,a3,1024
c8: 02e70733 mul a4,a4,a4
cc: 00e537b3 sltu a5,a0,a4
d0: 0017c793 xori a5,a5,1
d4: 00a79793 slli a5,a5,0xa
d8: 00d787b3 add a5,a5,a3
dc: 20078713 addi a4,a5,512
e0: 02e706b3 mul a3,a4,a4
e4: 00d536b3 sltu a3,a0,a3
e8: 0016c693 xori a3,a3,1
ec: 00969693 slli a3,a3,0x9
f0: 00f687b3 add a5,a3,a5
f4: 10078713 addi a4,a5,256
f8: 02e706b3 mul a3,a4,a4
fc: 00d536b3 sltu a3,a0,a3
100: 0016c693 xori a3,a3,1
104: 00869693 slli a3,a3,0x8
108: 00f686b3 add a3,a3,a5
10c: 08068713 addi a4,a3,128
110: 02e707b3 mul a5,a4,a4
114: 00f537b3 sltu a5,a0,a5
118: 0017c793 xori a5,a5,1
11c: 00779793 slli a5,a5,0x7
120: 00d786b3 add a3,a5,a3
124: 04068713 addi a4,a3,64
128: 02e707b3 mul a5,a4,a4
12c: 00f537b3 sltu a5,a0,a5
130: 0017c793 xori a5,a5,1
134: 00679793 slli a5,a5,0x6
138: 00d787b3 add a5,a5,a3
13c: 02078713 addi a4,a5,32
140: 02e706b3 mul a3,a4,a4
144: 00d536b3 sltu a3,a0,a3
148: 0016c693 xori a3,a3,1
14c: 00569693 slli a3,a3,0x5
150: 00f687b3 add a5,a3,a5
154: 01078713 addi a4,a5,16
158: 02e706b3 mul a3,a4,a4
15c: 00d536b3 sltu a3,a0,a3
160: 0016c693 xori a3,a3,1
164: 00469693 slli a3,a3,0x4
168: 00f686b3 add a3,a3,a5
16c: 00868713 addi a4,a3,8
170: 02e707b3 mul a5,a4,a4
174: 00f537b3 sltu a5,a0,a5
178: 0017c793 xori a5,a5,1
17c: 00379793 slli a5,a5,0x3
180: 00d786b3 add a3,a5,a3
184: 00468713 addi a4,a3,4
188: 02e707b3 mul a5,a4,a4
18c: 00f537b3 sltu a5,a0,a5
190: 0017c793 xori a5,a5,1
194: 00279793 slli a5,a5,0x2
198: 00d787b3 add a5,a5,a3
19c: 00278713 addi a4,a5,2
1a0: 02e70733 mul a4,a4,a4
1a4: 00e53533 sltu a0,a0,a4
1a8: 00154513 xori a0,a0,1
1ac: 00151513 slli a0,a0,0x1
1b0: 00f50533 add a0,a0,a5
1b4: 00008067 ret
```
:::
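In the listing above, most steps compile to a compare (`sltu`), an invert (`xori`), a shift (`slli`) and an add, with no branch. A minimal C sketch of that pattern, together with a self-check against the branching loop, could look as follows. It is a variant written for this note rather than the exact code measured above: it uses `uint32_t`, keeps a short counting loop instead of unrolling by hand, and covers all 16 bits including bit 0. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: the original loop with a data-dependent branch per bit. */
uint32_t sqrt_branch(uint32_t x)
{
    uint32_t s = 0;
    for (uint32_t i = 1u << 15; i > 0; i >>= 1)
        if ((s + i) * (s + i) <= x)
            s += i;
    return s;
}

/* Branch-free step: the comparison yields 0 or 1, which is shifted into
 * position instead of being used as a branch condition. */
uint32_t sqrt_branchless(uint32_t x)
{
    uint32_t s = 0;
    for (int bit = 15; bit >= 0; bit--) {
        uint32_t t = s + (1u << bit); /* candidate root, at most 65535 */
        s += (uint32_t)(t * t <= x) << bit;
    }
    return s;
}

/* Compare both versions on a few inputs, including the two used in this note. */
void self_check(void)
{
    const uint32_t inputs[] = {0, 1, 2, 3, 4, 9, 15, 16, 40000, 2147483647u};
    for (unsigned k = 0; k < sizeof inputs / sizeof inputs[0]; k++)
        assert(sqrt_branch(inputs[k]) == sqrt_branchless(inputs[k]));
}
```

Calling `self_check()` from a test `main` exercises both versions. The only branch left in `sqrt_branchless` is the loop counter, which no longer depends on the data.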
### Total branches taken

* 1 hit `` b4: 00e56663 bltu a0,a4,c0 <mySqrt+0x84>``
* 1 hit `` 1b4: 00008067 ret``
```
Excuting 119 instructions, 131 cycles, 1.100 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.019 s
Simulation cycles: 142
Simulation speed : 0.00747368 MHz
```
* Total instructions saved: 134 - 119 = 15.
* Total cycles saved: 180 - 131 = 49.