# Assignment 3: SoftCPU
## Requirement
This assignment uses the program from [assignment 1: Repeated Number](https://hackmd.io/@E4b6eQ9-RWSAX-9mP_FLhA/B1KTHywHK).
:::spoiler C Code
```c=
#include <stdio.h>
int repeatedNTimes(int* nums, int numsSize){
if(numsSize>=3)
{
if(nums[0]==nums[numsSize-1] || nums[0]==nums[numsSize-2])
return nums[0];
if(nums[1]==nums[numsSize-1] || nums[1]==nums[0])
return nums[1];
}
for(int i=2;i<numsSize;i++)
if(nums[i]==nums[i-1] || nums[i]==nums[i-2])
return nums[i];
return -1;
}
int main()
{
int a[]={5,1,5,2,5,3,5,4};
int numsSize=8;
int repeatedNum=repeatedNTimes(a,numsSize);
printf("[");
for(int i=0;i<numsSize;i++)
printf("%d ",a[i]);
printf("] RepeatedNumber: %d\n",repeatedNum);
return 0;
}
```
:::
:::spoiler Handwritten Assembly Code
```Assembly
.data
nums: .word 1,2,3,3
numsSize: .word 4
str1: .string "["
str2: .string "]\n"
str3: .string " "
str4: .string "Repeated number is "
.text
main:
la a1,nums
lw a2,numsSize
jal Check
mv a4,a0
jal ra,Print #Print Array
la a0,str4
li a7,4
ecall
mv a0,a4 #print Answer
li a7,1
ecall
li a7,10
ecall
Print: # a1:array a2: numSize
li t0,0
la a0,str1 #print started
li a7,4
ecall
mv a3,a1
PrintFor:
lw a0,0(a3) # print number
li a7,1
ecall
la a0,str3 # print space
li a7,4
ecall
addi a3,a3,4
addi t0,t0,1
bne t0,a2,PrintFor
PrintEnd:
la a0,str2 #print ended
li a7,4
ecall
jr ra
Check:
# returns the answer in a0 (-1 if no repeated number is found)
# t0: loop index i
# t3, t4, t5: values being compared
# a3: pointer to the current element
#------Start--------------------------------
mv t4,a2 # t4 = numsSize
addi t4,t4,-1 # t4 = numsSize-1
slli t4,t4,2 # t4 = (numsSize-1)*4, byte offset of the last element
addi t3,zero,3 # t3 = 3
blt a2,t3,SkipCompare # numsSize < 3: nothing to compare
# if(nums[0]==nums[numsSize-1] || nums[0]==nums[numsSize-2])
lw t3,0(a1) # t3 = nums[0]
add a3,a1,t4
lw t4,0(a3) # t4 = nums[numsSize-1]
lw t5,-4(a3) # t5 = nums[numsSize-2]
beq t3,t4,ReturnAns
beq t3,t5,ReturnAns
# if(nums[1]==nums[numsSize-1] || nums[1]==nums[0])
lw t3,4(a1) # t3 = nums[1]
lw t5,0(a1) # t5 = nums[0]
beq t3,t4,ReturnAns
beq t3,t5,ReturnAns
#---------loop Start------
li t0,2 # start from i = 2
CheckFor:
#-------getNumber-------------------------------
slli t3,t0,2 # t3 = i*4
add a3,a1,t3 # a3 = &nums[i]
lw t3,0(a3) # t3 = nums[i]
lw t4,-4(a3) # t4 = nums[i-1]
lw t5,-8(a3) # t5 = nums[i-2]
#----check nums[i]==nums[i-1] or nums[i]==nums[i-2]----
beq t3,t4,ReturnAns
beq t3,t5,ReturnAns
addi t0,t0,1 # i++
blt t0,a2,CheckFor # loop while i < numsSize
SkipCompare:
li t3,-1 # no repeated number found: return -1
ReturnAns:
add a0,t3,zero
jr ra
```
:::
## Run RTL SIM
```
Excuting 6521 instructions, 8361 cycles, 1.282 CPI
Simulation statistics
=====================
Simulation time : 0.061 s
Simulation cycles: 8372
Simulation speed : 0.137246 MHz
```
## Run ISS
```
Excuting 6521 instructions, 8361 cycles, 1.282 CPI
Simulation statistics
=====================
Simulation time : 0.002 s
Simulation cycles: 8361
Simulation speed : 3.359 MHz
```
## Waveform Analysis
:::spoiler O3 generated Assembly Code
```
Disassembly of section .text:
...
0000003c <repeatedNTimes>:
3c: 00200713 li a4,2
40: 00050793 mv a5,a0
44: 06b75063 bge a4,a1,a4 <repeatedNTimes+0x68>
48: 00259713 slli a4,a1,0x2
4c: ffc70713 addi a4,a4,-4
50: 00e50733 add a4,a0,a4
54: 00052683 lw a3,0(a0)
58: 00072603 lw a2,0(a4)
5c: 00068513 mv a0,a3
60: 04c68063 beq a3,a2,a0 <repeatedNTimes+0x64>
64: ffc72503 lw a0,-4(a4)
68: 02a68c63 beq a3,a0,a0 <repeatedNTimes+0x64>
6c: 0047a503 lw a0,4(a5)
70: 02a60863 beq a2,a0,a0 <repeatedNTimes+0x64>
74: 00200713 li a4,2
78: 00a69c63 bne a3,a0,90 <repeatedNTimes+0x54>
7c: 00008067 ret
80: 0007a503 lw a0,0(a5)
84: 00478793 addi a5,a5,4
88: 00a68c63 beq a3,a0,a0 <repeatedNTimes+0x64>
8c: 00e58c63 beq a1,a4,a4 <repeatedNTimes+0x68>
90: 0087a683 lw a3,8(a5)
94: 0047a503 lw a0,4(a5)
98: 00170713 addi a4,a4,1
9c: fea692e3 bne a3,a0,80 <repeatedNTimes+0x44>
a0: 00008067 ret
a4: fff00513 li a0,-1
a8: 00008067 ret
000000ac <main>:
ac: 000207b7 lui a5,0x20
b0: 05878793 addi a5,a5,88 # 20058 <__malloc_trim_threshold+0x20>
b4: 0007ae03 lw t3,0(a5)
b8: 0047a303 lw t1,4(a5)
bc: 0087a883 lw a7,8(a5)
c0: 00c7a803 lw a6,12(a5)
c4: 0107a603 lw a2,16(a5)
c8: 0147a683 lw a3,20(a5)
cc: 0187a703 lw a4,24(a5)
d0: 01c7a783 lw a5,28(a5)
d4: fc010113 addi sp,sp,-64
d8: 00800593 li a1,8
dc: 00010513 mv a0,sp
e0: 02112e23 sw ra,60(sp)
e4: 02812c23 sw s0,56(sp)
e8: 02912a23 sw s1,52(sp)
ec: 03212823 sw s2,48(sp)
f0: 03312623 sw s3,44(sp)
f4: 01c12023 sw t3,0(sp)
f8: 00612223 sw t1,4(sp)
fc: 01112423 sw a7,8(sp)
100: 01012623 sw a6,12(sp)
104: 00c12823 sw a2,16(sp)
108: 00d12a23 sw a3,20(sp)
10c: 00e12c23 sw a4,24(sp)
110: 00f12e23 sw a5,28(sp)
114: f29ff0ef jal ra,3c <repeatedNTimes>
118: 00050993 mv s3,a0
11c: 05b00513 li a0,91
120: 00010413 mv s0,sp
124: 0e8000ef jal ra,20c <putchar>
128: 02010913 addi s2,sp,32
12c: 000204b7 lui s1,0x20
130: 00042583 lw a1,0(s0)
134: 03c48513 addi a0,s1,60 # 2003c <__malloc_trim_threshold+0x4>
138: 00440413 addi s0,s0,4
13c: 078000ef jal ra,1b4 <printf>
140: ff2418e3 bne s0,s2,130 <main+0x84>
144: 00020537 lui a0,0x20
148: 00098593 mv a1,s3
14c: 04050513 addi a0,a0,64 # 20040 <__malloc_trim_threshold+0x8>
150: 064000ef jal ra,1b4 <printf>
154: 03c12083 lw ra,60(sp)
158: 03812403 lw s0,56(sp)
15c: 03412483 lw s1,52(sp)
160: 03012903 lw s2,48(sp)
164: 02c12983 lw s3,44(sp)
168: 00000513 li a0,0
16c: 04010113 addi sp,sp,64
170: 00008067 ret
...
```
:::
### Data loading at startup
The main function first loads the input array from memory and stores it on the stack.
C Code
```c=
int a[]={5,1,5,2,5,3,5,4};
int numsSize=8;
```
Disassembly Code
```
94: 000207b7 lui a5,0x20
98: 04478793 addi a5,a5,68 # 20044 <__malloc_trim_threshold+0xc>
9c: 0007a303 lw t1,0(a5)
a0: 0047a883 lw a7,4(a5)
a4: 0087a803 lw a6,8(a5)
a8: 00c7a583 lw a1,12(a5)
ac: 0107a603 lw a2,16(a5)
b0: 0147a683 lw a3,20(a5)
b4: 0187a703 lw a4,24(a5)
b8: 01c7a783 lw a5,28(a5)
...
```
In GTKWave

Because srv32 is a 3-stage pipeline, the loaded value appears on `wb_rdata` (the write-back data signal) two cycles after `lw t1,0(a5)` (`0007a303`) shows up on `instr`. Over the eight consecutive loads, `wb_rdata` takes the values {5,1,5,2,5,3,5,4} in order. No hazard occurs here, because the loads are independent of each other and simply read the array one word at a time (see the disassembly code above).
### Repeat N Times function
```c=
int repeatedNTimes(int* nums, int numsSize){
if(numsSize>=3)
{
if(nums[0]==nums[numsSize-1] || nums[0]==nums[numsSize-2])
return nums[0];
if(nums[1]==nums[numsSize-1] || nums[1]==nums[0])
return nums[1];
}
for(int i=2;i<numsSize;i++)
if(nums[i]==nums[i-1] || nums[i]==nums[i-2])
return nums[i];
return -1;
}
```
In GTKWave, a branch is taken when ex_pc = 0x68.

Corresponding disassembly code
```c=
60: 04c68063 beq a3,a2,a0 <repeatedNTimes+0x64>
64: ffc72503 lw a0,-4(a4)
68: 02a68c63 beq a3,a0,a0 <repeatedNTimes+0x64>
6c: 0047a503 lw a0,4(a5)
70: 02a60863 beq a2,a0,a0 <repeatedNTimes+0x64>
...
a0: 00008067 ret
```
IF/ID | EX | WB
:----:|:----:|:----:
lw | beq | lw
6c | 68 | 64
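
The branch is taken at 0x68 (`beq a3,a0,a0`), so next_pc = 0xa0 (repeatedNTimes = 0x3c, and 0x3c + 0x64 = 0xa0).
At this point `a3` holds nums[0] and `a0` holds nums[numsSize-2], which was loaded by the `lw` at 0x64. For the array {5,1,5,2,5,3,5,4} both values are 5, so the comparison succeeds.
The function then returns (0xa0: `ret`).
The taken branch costs the following penalty:
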
Cycle|1|2|3
:---|:-:|:-:|:-:
IF/ID| lw | NOP | ret
Ex | beq| NOP | NOP
WB | lw | beq | NOP
### Other Input Data
Input data={5,1,10,5,2,5,3,5,5,5,4,11}

Corresponding disassembly code
```
...
80: 0007a503 lw a0,0(a5)
84: 00478793 addi a5,a5,4
88: 00a68c63 beq a3,a0,a0 <repeatedNTimes+0x64>
8c: 00e58c63 beq a1,a4,a4 <repeatedNTimes+0x68>
90: 0087a683 lw a3,8(a5)
94: 0047a503 lw a0,4(a5)
98: 00170713 addi a4,a4,1
9c: fea692e3 bne a3,a0,80 <repeatedNTimes+0x44>
a0: 00008067 ret
...
```
This is the for loop of the C function; the loop body starts at 0x90. The backward `bne` at 0x9c is taken on every iteration, and srv32 mispredicts it each time because it keeps fetching the fall-through path.

Cycle|0|1|2|3|
:-----|:-:|:-:|:-:|:-:|
IF/ID |ret |NOP|lw(0x80) |addi|
EX |bne(0x9c)|NOP|NOP |lw |
WB |addi |bne|NOP |NOP |

This becomes costly when the array is large, because the penalty is paid on every iteration. Loop unrolling reduces the effect: with longer runs of sequential instructions, the backward jump executes less often.
## Software Optimization
Because srv32 has no RAW or load-use penalties (reference: [Lab3: srv32 - RISCV RV32IM Soft CPU](https://hackmd.io/@sysprog/S1Udn1Xtt)), instruction scheduling needs no optimization; only control hazards matter.
Based on the analysis above, I rewrote the C program with two changes to reduce branch mispredictions (control hazards):
1. Loop unrolling
Unrolling reduces the number of backward jumps. For example, when two rounds are merged into one, the loop condition is checked once instead of twice, which halves the condition checks.
2. Longer sequential runs
A break condition is added in the middle of the loop body. It is predicted not taken, so it mispredicts only once, when the work is finished. This halves the number of loop-closing branch penalties.
Input array:

### C code: loop unrolling only
```c=
int repeatedNTimes(int* nums, int numsSize){
if(numsSize>=3)
{
if(nums[0]==nums[numsSize-1] || nums[0]==nums[numsSize-2])
return nums[0];
if(nums[1]==nums[numsSize-1] || nums[1]==nums[0])
return nums[1];
}
int i=2;
for(;i+8<numsSize;i+=8)
{
if(nums[i]==nums[i-1] || nums[i]==nums[i-2])
return nums[i];
if(nums[i+1]==nums[i] || nums[i+1]==nums[i-1])
return nums[i+1];
if(nums[i+2]==nums[i+1] || nums[i+2]==nums[i])
return nums[i+2];
if(nums[i+3]==nums[i+2] || nums[i+3]==nums[i+1])
return nums[i+3];
if(nums[i+4]==nums[i+3] || nums[i+4]==nums[i+2])
return nums[i+4];
if(nums[i+5]==nums[i+4] || nums[i+5]==nums[i+3])
return nums[i+5];
if(nums[i+6]==nums[i+5] || nums[i+6]==nums[i+4])
return nums[i+6];
if(nums[i+7]==nums[i+6] || nums[i+7]==nums[i+5])
return nums[i+7];
}
for(;i<numsSize;i++)
{
if( nums[i]==nums[i-1] || nums[i]==nums[i-2])
return nums[i];
}
return -1;
}
```
### C code: loop unrolling with an additional break condition
```c=
int repeatedNTimes(int* nums, int numsSize){
if(numsSize>=3)
{
if(nums[0]==nums[numsSize-1] || nums[0]==nums[numsSize-2])
return nums[0];
if(nums[1]==nums[numsSize-1] || nums[1]==nums[0])
return nums[1];
}
int i=2;
for(;i+8<numsSize;i+=8)
{
if(nums[i]==nums[i-1] || nums[i]==nums[i-2])
return nums[i];
if(nums[i+1]==nums[i] || nums[i+1]==nums[i-1])
return nums[i+1];
if(nums[i+2]==nums[i+1] || nums[i+2]==nums[i])
return nums[i+2];
if(nums[i+3]==nums[i+2] || nums[i+3]==nums[i+1])
return nums[i+3];
if(nums[i+4]==nums[i+3] || nums[i+4]==nums[i+2])
return nums[i+4];
if(nums[i+5]==nums[i+4] || nums[i+5]==nums[i+3])
return nums[i+5];
if(nums[i+6]==nums[i+5] || nums[i+6]==nums[i+4])
return nums[i+6];
if(nums[i+7]==nums[i+6] || nums[i+7]==nums[i+5])
return nums[i+7];
i+=8;
if(i+8>=numsSize)
break;
if(nums[i]==nums[i-1] || nums[i]==nums[i-2])
return nums[i];
if(nums[i+1]==nums[i] || nums[i+1]==nums[i-1])
return nums[i+1];
if(nums[i+2]==nums[i+1] || nums[i+2]==nums[i])
return nums[i+2];
if(nums[i+3]==nums[i+2] || nums[i+3]==nums[i+1])
return nums[i+3];
if(nums[i+4]==nums[i+3] || nums[i+4]==nums[i+2])
return nums[i+4];
if(nums[i+5]==nums[i+4] || nums[i+5]==nums[i+3])
return nums[i+5];
if(nums[i+6]==nums[i+5] || nums[i+6]==nums[i+4])
return nums[i+6];
if(nums[i+7]==nums[i+6] || nums[i+7]==nums[i+5])
return nums[i+7];
}
for(;i<numsSize;i++)
{
if( nums[i]==nums[i-1] || nums[i]==nums[i-2])
return nums[i];
}
return -1;
}
```
### Assembly code: the break condition
```assembly
e8: 09168663 beq a3,a7,174 <repeatedNTimes+0x138>
ec: 07160a63 beq a2,a7,160 <repeatedNTimes+0x124>
f0: 01080813 addi a6,a6,16
f4: 0287a503 lw a0,40(a5)
#Here f8 is break condition check
f8: 0ab85c63 bge a6,a1,1b0 <repeatedNTimes+0x174>
fc: 06a88863 beq a7,a0,16c <repeatedNTimes+0x130>
100: 06a68663 beq a3,a0,16c <repeatedNTimes+0x130>
...
1b0: 00030813 mv a6,t1
1b4: fcdff06f j 180 <repeatedNTimes+0x144>
...
```
### Waveform at the break condition

When the CPU executes the instruction at ex_pc = 0xf8, the branch prediction succeeds (if_pc = if_pc + 4), so there is no branch penalty, unlike the condition check at the end of the loop.

|Simulation|Before optimization|Loop unrolling|Loop unrolling with additional break condition|
|:--:|:--:|:--:|:--:|
|RTL|109709 cycles|109424 cycles|109403 cycles|
|ISS|109698 cycles|109413 cycles|109392 cycles|
## How do the RISC-V Compliance Tests work?
The RISC-V Compliance Tests run the same RISC-V binary in two ways: one run (the RTL simulator) generates the test signature, and the other (the ISS simulator) generates the reference signature.
According to the description of [riscv-arch-test](https://github.com/riscv-non-isa/riscv-arch-test/blob/master/doc/README.adoc#the-architectural-test),
> **Signature**
> The data Written into specific memory location during the execution of the test. Values of the operations carried out in the test.

In srv32, the RTL simulator generates the test signature and the ISS simulator generates the reference signature. This is visible in the srv32 Makefile:
```bash=
$(SUBDIRS):
@$(MAKE) rv32c=$(rv32c) -C sw $@
@$(MAKE) $(if $(_verilator), verilator=1) \
$(if $(_coverage), coverate=1) \
$(if $(_top), top=1) rv32c=$(rv32c) debug=$(debug) -C sim $@.run
@$(MAKE) $(if $(_top), top=1) rv32c=$(rv32c) -C tools $@.run #Focus this line
@echo "Compare the trace between RTL and ISS simulator"
@diff --brief sim/trace.log tools/trace.log #and this line
@echo === Simulation passed ===
```
The line
```bash=
@$(MAKE) $(if $(_top), top=1) rv32c=$(rv32c) -C tools $@.run
```
runs the ISS simulator, which lives in the `tools` directory. Then
```bash=
@diff --brief sim/trace.log tools/trace.log
```
compares the two traces. Both the `tools` and `sim` directories contain a `trace.log` file. Opening `trace.log` shows:
```assembly=
1 00000000 00015297 x05 (t0) <= 0x00015000
2 00000004 87028293 x05 (t0) <= 0x00014870
3 00000008 30529073 x00 (zero) <= 0x00000000
4 0000000c 3050e073 x00 (zero) <= 0x00014870
5 00000010 00022297 x05 (t0) <= 0x00022010
6 00000014 85428293 x05 (t0) <= 0x00021864
7 00000018 00022317 x06 (t1) <= 0x00022018
8 0000001c 88c30313 x06 (t1) <= 0x000218a4
9 00000020 0002a023 write 0x00021864 <= 0x00000000
10 00000024 00428293 x05 (t0) <= 0x00021868
11 00000028 fe62ece3
14 00000020 0002a023 write 0x00021868 <= 0x00000000
15 00000024 00428293 x05 (t0) <= 0x0002186c
16 00000028 fe62ece3
19 00000020 0002a023 write 0x0002186c <= 0x00000000
20 00000024 00428293 x05 (t0) <= 0x00021870
21 00000028 fe62ece3
24 00000020 0002a023 write 0x00021870 <= 0x00000000
25 00000024 00428293 x05 (t0) <= 0x00021874
26 00000028 fe62ece3
29 00000020 0002a023 write 0x00021874 <= 0x00000000
30 00000024 00428293 x05 (t0) <= 0x00021878
31 00000028 fe62ece3
34 00000020 0002a023 write 0x00021878 <= 0x00000000
35 00000024 00428293 x05 (t0) <= 0x0002187c
36 00000028 fe62ece3
39 00000020 0002a023 write 0x0002187c <= 0x00000000
40 00000024 00428293 x05 (t0) <= 0x00021880
41 00000028 fe62ece3
44 00000020 0002a023 write 0x00021880 <= 0x00000000
45 00000024 00428293 x05 (t0) <= 0x00021884
46 00000028 fe62ece3
49 00000020 0002a023 write 0x00021884 <= 0x00000000
50 00000024 00428293 x05 (t0) <= 0x00021888
51 00000028 fe62ece3
```
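- First column: the cycle count. It jumps by 3 after each taken branch at 0x28 (11 to 14, 16 to 19, ...), which shows the two-cycle branch penalty.
- Second column: the program counter (the address of the instruction).
- Third column: the instruction word in hexadecimal.
- Fourth column: the data written by the instruction.
  - Example: `x05 (t0) <= 0x00015000` writes 0x00015000 to register t0.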
The command
```bash=
@diff --brief sim/trace.log tools/trace.log
```
checks for differences between sim/trace.log and tools/trace.log. When the files are identical, the test signature and the reference signature are the same, and the RISC-V Compliance test passes.
**Why should the signatures match?**
The reference signature comes from the RISC-V model implemented in software by rvsim; its C code is in the `tools` directory. The rvsim program (the ISS simulator) runs the RISC-V binary in simulation and outputs the operations it performs. This is the behavior the hardware is built to reproduce.
I think writing the model in C makes it easy to modify and update. Once the RISC-V model is finished, it has to be expressed in a hardware description language (Verilog) before it can be manufactured.
So the RTL simulator is written in Verilog, and we need to check that the Verilog RISC-V model behaves the same as the model built in C (or another language). How can we make sure the models are the same? **Write test cases and compare the records produced during test execution.** If the records (trace.log) differ, the Verilog code is wrong and must be debugged and updated until they match.
## How does srv32 work with Verilator?
The easiest way to describe how srv32 works with Verilator is to walk through a simple case.
Typing
```bash=
make qsort
```
in the srv32 directory produces the following series of results.
```bash=
riscv-none-embed-gcc -O3 -Wall -march=rv32im -mabi=ilp32 -nostartfiles -nostdlib -L../common -o qsort.elf qsort.c -lc -lm -lgcc -lsys -T ../common/default.ld
riscv-none-embed-objcopy -j .text -O binary qsort.elf imem.bin
riscv-none-embed-objcopy -j .data -O binary qsort.elf dmem.bin
riscv-none-embed-objcopy -O binary qsort.elf memory.bin
riscv-none-embed-objdump -d qsort.elf > qsort.dis
riscv-none-embed-readelf -a qsort.elf > qsort.symbol
```
Make enters the `qsort` directory and runs:
*riscv-none-embed-gcc*
- compiles the qsort project into qsort.elf
*riscv-none-embed-objcopy*
- copies the .text section of qsort.elf to imem.bin (for Verilog)
- copies the .data section of qsort.elf to dmem.bin (for Verilog)
- copies the whole image of qsort.elf to memory.bin
*riscv-none-embed-objdump*
- disassembles qsort.elf
- writes the output to qsort.dis (.text in assembly)
*riscv-none-embed-readelf*
- dumps the ELF headers, sections, and symbol table of qsort.elf to qsort.symbol

Then make enters the `sim` directory, and we see:
```bash=
Before sorting:
-1.922930e+02 3.821900e+02 3.821800e+02 -1.922940e+02 1.000000e-06 2.838749e+08 1.000000e-07
After sorting:
-1.922940e+02 -1.922930e+02 1.000000e-07 1.000000e-06 3.821800e+02 3.821900e+02 2.838749e+08
Excuting 768832 instructions, 990884 cycles, 1.288 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
```
The simulation has run and Verilator has finished. The sim folder contains:
- imem.bin
- dmem.bin
- sim
- sim_main.cpp
- Makefile

Let's look at the Makefile contents:
```makefile
%.run: $(TARGET) checkcode.awk
@if [ ! -f ../sw/$*/memory.bin ]; then \
make -C ../sw $*; \
fi
@cp ../sw/$*/*.bin .
@$(STDBUF) ./$(TARGET) $(RFLAGS) | awk -f $(filter %.awk, $^)
@if [ -f coverage.dat ]; then \
mv coverage.dat $*_cov.dat; \
fi
```
This rule first runs make for the program in the sw directory (such as qsort or our own program) if memory.bin does not exist yet, then copies the `*.bin` files (imem.bin and dmem.bin from the previous step) into the `sim` directory, and finally runs `./$(TARGET)` (`./sim`).
Typing `make qsort.run` in the sim directory gives the result:
```bash=
Before sorting:
-1.922930e+02 3.821900e+02 3.821800e+02 -1.922940e+02 1.000000e-06 2.838749e+08 1.000000e-07
After sorting:
-1.922940e+02 -1.922930e+02 1.000000e-07 1.000000e-06 3.821800e+02 3.821900e+02 2.838749e+08
Excuting 768832 instructions, 990884 cycles, 1.288 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 8.291 s
Simulation cycles: 990895
Simulation speed : 0.119515 MHz
```
The simulation ran, which confirms that the `sim` executable was executed. The directory also contains other files such as sim_main.cpp, which includes:
```c=
...
#include "verilated.h"
...
int main(int argc, char** argv)
{
Verilated::commandArgs(argc,argv);
Verilated::traceEverOn(true);
#ifdef HAVE_CHRONO
std::chrono::steady_clock::time_point time_begin;
#endif
Vriscv *top = new Vriscv;
top->stall = 1;
top->resetb = 0;
top->clk = 0;
#ifdef HAVE_CHRONO
time_begin = std::chrono::steady_clock::now();
#endif
while (!Verilated::gotFinish()) {
if (main_time > RESOLUTION) {
top->resetb = 1;
}
if (main_time > 30) {
top->stall = 0;
}
if ((main_time % RESOLUTION) == 1) {
top->clk = 1;
}
if ((main_time % RESOLUTION) == (RESOLUTION / 2 + 1)) {
top->clk = 0;
}
top->eval();
main_time++;
}
...
```
`verilated.h` is the header of the Verilator runtime. Verilator translates the Verilog RTL into a C++ class (`Vriscv` here); sim_main.cpp instantiates it, drives `clk`, `resetb`, and `stall`, and calls `top->eval()` in a loop until the simulation finishes (`Verilated::gotFinish()`). `std::chrono` is used to record the elapsed time.
**Summary**
How srv32 works with Verilator:
1. Use the RISC-V compiler to compile the program into an .elf file.
2. Copy the contents of the .elf file to imem.bin, dmem.bin, and the other binaries.
3. Copy these `*.bin` files to the `sim` directory.
4. Run the `sim` program, which is the CPU model built with Verilator.
5. Record and print the elapsed time and the simulation result.