# [Assignment3: SoftCPU](https://hackmd.io/@sysprog/2022-arch-homework3)
###### tags: `RISC-V` `jserv`
## Before start
While following the steps in [lab3](https://hackmd.io/@sysprog/S1Udn1Xtt) to set up the environment required for this homework, I ran into several problems.
### Installation of RISC-V toolchains
The Makefile installs the RISC-V toolchain into `/opt/riscv`. Creating a new directory under `/opt` normally requires root privileges.

Running `make -j$(nproc)` directly raised a permission denied error; running it with `sudo` fixed it.
```bash
sudo make -j$(nproc)
```
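Whether `sudo` is needed depends on who can write to the install prefix. The following is a minimal sketch, not part of lab3, that checks this before building; the function name `prefix_writable` is made up for illustration.

```bash
# prefix_writable DIR: succeed if DIR, or its nearest existing ancestor,
# is writable by the current user (hypothetical helper, not from lab3).
prefix_writable() {
    dir=$1
    # Walk up until we reach a path that exists.
    while [ ! -e "$dir" ] && [ "$dir" != "/" ] && [ "$dir" != "." ]; do
        dir=$(dirname "$dir")
    done
    [ -w "$dir" ]
}

if prefix_writable /opt/riscv; then
    echo "make can install into /opt/riscv without sudo"
else
    echo "installing into /opt/riscv needs sudo (or a prefix under \$HOME)"
fi
```

Passing a prefix under the home directory to `../configure --prefix=...` would avoid building as root, at the cost of deviating from the path that lab3 assumes.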
### Installation of verilator
Installing Verilator requires an environment variable that holds the Verilator root directory. Lab3 sets it with `export`, which only takes effect in the current shell, so the command has to be retyped after every restart. To avoid that, I added the following to `~/.profile`, so that it runs automatically whenever a login shell starts:
```bash
if [ -d "$HOME/verilator" ] ; then
VERILATOR_ROOT="$HOME/verilator"
fi
```
After adding these lines to `~/.profile`, we can source the file and check the value of the variable:
```bash
source ~/.profile
echo $VERILATOR_ROOT
```
You should see `/home/teimeiki/verilator`, with `teimeiki` replaced by your user name.

After setting the variable, the instructions say to run `./configure`, but that file does not exist yet. We first have to run `autoconf`, which generates `configure` from `configure.ac`.
The following are all the commands I used:
```bash
# build riscv toolchain from source code
sudo apt install autoconf automake autotools-dev curl gawk git \
build-essential bison flex texinfo gperf libtool patchutils bc git \
libmpc-dev libmpfr-dev libgmp-dev gawk zlib1g-dev libexpat1-dev
git clone --recursive https://github.com/riscv/riscv-gnu-toolchain
cd riscv-gnu-toolchain
mkdir -p build && cd build
../configure --prefix=/opt/riscv --with-isa-spec=20191213 \
--with-multilib-generator="rv32i-ilp32--;rv32im-ilp32--;rv32imac-ilp32--;rv32im_zicsr-ilp32--;rv32imac_zicsr-ilp32--;rv64imac-lp64--;rv64imac_zicsr-lp64--"
sudo make -j$(nproc)
```
```bash
# install some dependent packages
sudo apt install build-essential lcov ccache libsystemc-dev
```
```bash
# get verilator
cd $HOME
git clone https://github.com/verilator/verilator
cd verilator
git checkout stable
```
Add this into `~/.profile`:
```bash
if [ -d "$HOME/verilator" ] ; then
VERILATOR_ROOT="$HOME/verilator"
fi
```
```bash
# to generate ./configure
autoconf
# build verilator
./configure
make
# check that the build works
make test
```
Modify `~/.profile` again:
```bash
if [ -d "$HOME/verilator" ] ; then
VERILATOR_ROOT="$HOME/verilator"
PATH="$VERILATOR_ROOT/bin:$PATH"
fi
```
```bash
# Make sure the version of Verilator >= 5.002
verilator --version
```
```bash
# get srv32
git clone https://github.com/sysprog21/srv32
```
After cloning srv32, I ran `make all` to check whether the environment was set up correctly, and this error occurred:
```bash
Vriscv.mk:57: /usr/local/share/verilator/include/verilated.mk: No such file or directory
make[3]: *** No rule to make target '/usr/local/share/verilator/include/verilated.mk'. Stop.
make[3]: Leaving directory '/home/teimeiki/srv32/sim/sim_cc'
%Error: make -C sim_cc -f Vriscv.mk -j 1 exited with 2
%Error: Command Failed /home/teimeiki/verilator/bin/verilator_bin -O3 -cc -Wall -Wno-STMTDLY -Wno-UNUSED +define+MEMSIZE=128 --trace-fst --Mdir sim_cc --build --exe sim_main.cpp getch.cpp -o sim -f filelist.txt ../rtl/top.v
make[2]: *** [Makefile:45: sim] Error 2
make[2]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: *** [Makefile:118: pi_pthread] Error 2
make[1]: Leaving directory '/home/teimeiki/srv32'
make: *** [Makefile:94: all] Error 1
```
After reading `Vriscv.mk` in `srv32/sim/sim_cc`, I realized that the makefile assigns its own value to `VERILATOR_ROOT`. Commenting out that line did not help, because the line comes back every time `make` is called. So I tried passing the variable on the `make` command line:
```bash
make all VERILATOR_ROOT="$VERILATOR_ROOT"
```
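A variable given on the command line takes precedence over ordinary assignments inside the makefiles, so `VERILATOR_ROOT` keeps the specified value even if the `.mk` scripts reassign it. With this, the build finally worked.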
:::info
The actual cause is that I did not `export` the environment variable. After I added `export VERILATOR_ROOT` and `export CROSS_COMPILE` to `~/.profile`, the problem was solved and there is no need to pass the variable on the command line.
:::
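The difference between a plain assignment and an exported one can be reproduced in a few lines. This is a minimal sketch; `DEMO_ROOT` and `child_view` are placeholder names for illustration, and the `sh -c` child stands in for `make` and the tools it launches.

```bash
# DEMO_ROOT is a hypothetical variable used only for this illustration.
unset DEMO_ROOT
DEMO_ROOT="/tmp/demo-verilator"

# Ask a child process what it sees; prints "unset" if nothing was inherited.
child_view() {
    sh -c 'echo "${DEMO_ROOT:-unset}"'
}

before=$(child_view)   # plain assignment: visible only in the current shell
export DEMO_ROOT
after=$(child_view)    # exported: inherited by child processes

echo "before export: $before"
echo "after export:  $after"
```

The first line reports `unset` and the second reports the path, which matches the behavior described above: the variable existed in my shell, but the processes started from it never received it.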
## Srv32
### Data hazard
### Control hazard
The `README.md` of srv32 states: `Two instructions branch penalty if branch taken, CPI is 1 for other instructions.` We can observe this phenomenon with a test program.
I wanted to see this behavior in the waveform, so I used the following code:
```clike
int main(void) {
for(int i = 0; i < 1000; i++);
return 0;
}
```
It consists only of a `for` loop that does nothing, which produces a succession of taken branches. It is compiled with the `-O0` optimization flag:
```bash
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o hello.elf hello.c -lc -lm -lgcc -lsys -T ../common/default.ld
```
With this program, we can observe how srv32 behaves when it encounters successive conditional branches. The following waveform was generated:

The waveform shows that every time a branch is taken, 2 instructions are flushed.
Here are 2 assembly implementations of the C code:
#### First
```assembly
.text
.global main
main:
addi sp, sp, -4 # main is called by _start
sw ra, 0(sp)
li x25, 100
li t0, 0
li t1, 1000
for:
bge t0, t1, end
addi t0, t0, 1
j for
end:
lw ra, 0(sp)
addi sp, sp, 4
ret
```
:::spoiler {state="close"} result
```bash
teimeiki@ubuntu:~/srv32$ make myTest
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/myTest'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o myTest.elf myTest.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary myTest.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary myTest.elf dmem.bin
riscv-none-elf-objcopy -O binary myTest.elf memory.bin
riscv-none-elf-objdump -d myTest.elf > myTest.dis
riscv-none-elf-readelf -a myTest.elf > myTest.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/myTest'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'
Excuting 3039 instructions, 5049 cycles, 1.661 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.03 s
Simulation cycles: 5060
Simulation speed : 0.168667 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/myTest/myTest.elf
Excuting 3039 instructions, 5049 cycles, 1.661 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.002 s
Simulation cycles: 5049
Simulation speed : 2.395 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```
:::
#### Second
```assembly
.text
.global main
main:
addi sp, sp, -4 # main is called by _start
sw ra, 0(sp)
li t0, 0
li t1, 1000
for:
addi t0, t0, 1
blt t0, t1, for
lw ra, 0(sp)
addi sp, sp, 4
ret
```
:::spoiler {state="close"} result
```bash
teimeiki@ubuntu:~/srv32$ make myTest
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/myTest'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o myTest.elf myTest.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary myTest.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary myTest.elf dmem.bin
riscv-none-elf-objcopy -O binary myTest.elf memory.bin
riscv-none-elf-objdump -d myTest.elf > myTest.dis
riscv-none-elf-readelf -a myTest.elf > myTest.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/myTest'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'
Excuting 2040 instructions, 4048 cycles, 1.984 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.026 s
Simulation cycles: 4059
Simulation speed : 0.156115 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/myTest/myTest.elf
Excuting 2040 instructions, 4048 cycles, 1.984 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.001 s
Simulation cycles: 4048
Simulation speed : 3.393 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```
:::
We cannot avoid the run of taken branches in a `for` loop unless we unroll it, but we can reduce the instruction count by changing the implementation. The second implementation executes fewer instructions. It is effectively a `do ... while` loop, so the condition is not checked before the first iteration.
| | instruction count | cycle count | CPI |
| -------- | -------- | -------- | -------- |
| first implementation | 3039 | 5049 | 1.661 |
| second implementation | 2040 | 4048 | 1.984 |
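
The CPI column can be recomputed from the simulator logs, and the two-cycle penalty gives a rough estimate of the cycle count. The sketch below assumes `awk` is available. The helper names `cpi` and `estimate_cycles` are made up here, and the taken-branch counts are derived by hand from the loops only (1001 for the first: 1000 `j for` plus the final `bge`; 999 for the second: the `blt`), so jumps outside the loop are left out.

```bash
# cpi INSTRUCTIONS CYCLES: cycles per instruction, three decimals.
cpi() {
    awk -v i="$1" -v c="$2" 'BEGIN { printf "%.3f\n", c / i }'
}

# estimate_cycles INSTRUCTIONS TAKEN_BRANCHES: one cycle per instruction
# plus the 2-cycle penalty from the srv32 README for each taken branch.
estimate_cycles() {
    echo $(( $1 + 2 * $2 ))
}

echo "first:  CPI $(cpi 3039 5049), loop-only estimate $(estimate_cycles 3039 1001) of 5049 cycles"
echo "second: CPI $(cpi 2040 4048), loop-only estimate $(estimate_cycles 2040 999) of 4048 cycles"
```

The estimates come out 8 and 10 cycles below the measured values. That gap would be consistent with a handful of additional taken jumps, such as the call into `main` and its `ret`, though this breakdown is not verified here.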
## Port hw2 to srv32
### Try to build previous work
I tried several ways to run my assembly code on srv32, but all of them failed at first.
I referred to [OscarShiang's previous work](https://hackmd.io/@oscarshiang/arch_hw3), one of the few detailed reports written in assembly language, and downloaded the [source code](https://github.com/OscarShiang/srv32/blob/6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/count_bits/count_bits.s) from his GitHub to check whether someone else's work could run. I encountered the following issue:
:::spoiler {state="close"} error message
```bash
startup.S: Assembler messages:
startup.S:5: Error: unrecognized opcode `csrw mtvec,t0'
startup.S:6: Error: unrecognized opcode `csrrsi zero,mtvec,1'
startup.S:74: Error: unrecognized opcode `csrr t5,mepc'
startup.S:76: Error: unrecognized opcode `csrw mepc,t5'
startup.S:133: Error: unrecognized opcode `csrr t5,mepc'
startup.S:135: Error: unrecognized opcode `csrw mepc,t5'
startup.S:192: Error: unrecognized opcode `csrr t5,mepc'
startup.S:194: Error: unrecognized opcode `csrw mepc,t5'
```
:::
I modified `Makefile.common` as suggested in this [issue](https://github.com/kuopinghsu/srv32/issues/10) to make it compatible with the current ISA spec version, but that failed too:
:::spoiler {state="close"} error message
```bash
make[1]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/common'
make[2]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/count_bits'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -nostartfiles -nostdlib -L../common -o count_bits.elf count_bits.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary count_bits.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary count_bits.elf dmem.bin
riscv-none-elf-objcopy -O binary count_bits.elf memory.bin
riscv-none-elf-objdump -d count_bits.elf > count_bits.dis
riscv-none-elf-readelf -a count_bits.elf > count_bits.symbol
make[2]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/count_bits'
make[1]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw'
make[1]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sim'
Illegal instruction at PC 0x00000188
Illegal instruction at PC 0x0000018c
Illegal instruction at PC 0x00000190
Illegal instruction at PC 0x00000194
DMEM address fffffd41 out of range
- ../rtl/../testbench/testbench.v:423: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.013 s
Simulation cycles: 1329
Simulation speed : 0.102231 MHz
make[1]: *** [Makefile:69: count_bits.run] Error 1
make[1]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sim'
make: *** [Makefile:80: count_bits] Error 2
```
:::
The illegal instructions are in the `printf` subroutine, and they are compressed instructions. Because the addresses in the error message increase by 4 and `printf` contains the first compressed instruction in the whole program, I suspected that compressed instructions were the cause. After checking the srv32 README again, I enabled compressed instructions with the `rv32c=1` flag and tried again, but it still did not run:
:::spoiler {state="close"} error message
```bash
teimeiki@ubuntu:~/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06$ make count_bits rv32c=1
make rv32c=1 -C sw count_bits
make[1]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/common'
make[2]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/count_bits'
riscv-none-elf-gcc -O0 -Wall -march=rv32imac_zicsr -mabi=ilp32 -nostartfiles -nostdlib -L../common -o count_bits.elf count_bits.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary count_bits.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary count_bits.elf dmem.bin
riscv-none-elf-objcopy -O binary count_bits.elf memory.bin
riscv-none-elf-objdump -d count_bits.elf > count_bits.dis
riscv-none-elf-readelf -a count_bits.elf > count_bits.symbol
make[2]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/count_bits'
make[1]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw'
make verilator=1 \
\
rv32c=1 debug=0 -C sim count_bits.run
make[1]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sim'
Illegal instruction at PC 0x00000002
```
:::
It did not even terminate.
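
Whether a given encoding is a compressed instruction can be read from its two lowest bits: in the RISC-V base encoding, 32-bit instructions end in `11`, and any other value marks a 16-bit compressed instruction. A minimal sketch of that check follows. `insn_len` is a made-up helper name, `0xfe810113` is the `addi sp,sp,-24` that appears in the disassembly later in this note, and `0x4501` (`c.li a0,0`) is an example that does not come from this project.

```bash
# insn_len ENCODING: print the instruction length in bytes, based on the
# two lowest bits (longer-than-32-bit formats are ignored in this sketch).
insn_len() {
    if [ $(( $1 & 3 )) -eq 3 ]; then
        echo 4
    else
        echo 2
    fi
}

echo "0xfe810113 -> $(insn_len 0xfe810113) bytes"
echo "0x4501     -> $(insn_len 0x4501) bytes"
```

Applied to the words at the reported PCs in `count_bits.dis`, a check like this would show whether the core is being fed 16-bit encodings.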
If we disable `printf`, OscarShiang's code can run normally:
:::spoiler {state="close"} can run, but no output on screen
```bash
teimeiki@ubuntu:~/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06$ make count_bits
make[1]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/common'
make[2]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/count_bits'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -nostartfiles -nostdlib -L../common -o count_bits.elf count_bits.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary count_bits.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary count_bits.elf dmem.bin
riscv-none-elf-objcopy -O binary count_bits.elf memory.bin
riscv-none-elf-objdump -d count_bits.elf > count_bits.dis
riscv-none-elf-readelf -a count_bits.elf > count_bits.symbol
make[2]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/count_bits'
make[1]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw'
make[1]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sim'
Excuting 187 instructions, 257 cycles, 1.374 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.009 s
Simulation cycles: 268
Simulation speed : 0.0297778 MHz
make[1]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sim'
make[1]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/tools'
gcc -c -o rvsim.o rvsim.c -O3 -g -Wall
gcc -c -o decompress.o decompress.c -O3 -g -Wall
gcc -c -o syscall.o syscall.c -O3 -g -Wall
gcc -c -o elfread.o elfread.c -O3 -g -Wall
gcc -c -o getch.o getch.c -O3 -g -Wall
gcc -O3 -g -Wall -o rvsim rvsim.o decompress.o syscall.o elfread.o getch.o
./rvsim --memsize 128 -l trace.log ../sw/count_bits/count_bits.elf
Excuting 187 instructions, 257 cycles, 1.374 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.000 s
Simulation cycles: 257
Simulation speed : 2.705 MHz
make[1]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```
:::
:::info
The problem is caused by `printf`. In OscarShiang's work, `printf` always uses the compressed instruction format, so srv32 breaks when it is built for the normal 32-bit instruction format. Strangely, setting `rv32c=1` did not help.
However, when I tested `printf` in my project cloned from [srv32](https://github.com/sysprog21/srv32), it worked perfectly. So I suspended debugging the previous work and ported my hw2 to srv32 first.
:::
### Link start routine
At first, the linker did not link my program with `_start`; my main function was used as `_start` instead. Also, the data was appended directly to the text section rather than placed in the data section:
:::spoiler {state="close"} dumped disassembled code and memory
```assembly
teimeiki@ubuntu:~/srv32$ riscv-none-elf-objdump -d sw/myTestAs/myTest.elf
sw/myTestAs/myTest.elf: file format elf32-littleriscv
Disassembly of section .text:
00000000 <_start>:
0: fe810113 addi sp,sp,-24
4: 00000497 auipc s1,0x0
8: 08c48493 addi s1,s1,140 # 90 <arr1>
c: 00010913 mv s2,sp
10: 00000997 auipc s3,0x0
14: 08c9a983 lw s3,140(s3) # 9c <len1>
18: 00000293 li t0,0
1c: 00299313 slli t1,s3,0x2
00000020 <for1>:
20: 009283b3 add t2,t0,s1
24: 01228e33 add t3,t0,s2
28: 006e0eb3 add t4,t3,t1
2c: 0003a383 lw t2,0(t2)
30: 007e2023 sw t2,0(t3)
34: 007ea023 sw t2,0(t4)
38: 00428293 addi t0,t0,4
3c: fe62c2e3 blt t0,t1,20 <for1>
40: 00000293 li t0,0
44: 00399313 slli t1,s3,0x3
48: 04000893 li a7,64
0000004c <forPrint>:
4c: 005903b3 add t2,s2,t0
50: 00001537 lui a0,0x1
54: fff50513 addi a0,a0,-1 # fff <SYSBRK+0xf29>
58: 00038593 mv a1,t2
.
.
.
7c: fc62c8e3 blt t0,t1,4c <forPrint>
80: 0040006f j 84 <exit>
00000084 <exit>:
84: 00000513 li a0,0
88: 05d00893 li a7,93
8c: 00000073 ecall
00000090 <arr1>:
90: 00000001 .word 0x00000001
94: 00000002 .word 0x00000002
98: 00000003 .word 0x00000003
0000009c <len1>:
9c: 00000003 .word 0x00000003
000000a0 <space>:
a0: Address 0x00000000000000a0 is out of bounds.
a4:
000000a1 <comma>:
a1: Address 0x00000000000000a1 is out of bounds.
a5:
000000a2 <nline>:
a2: 000a .short 0x000a
```
```bash
teimeiki@ubuntu:~/srv32/sw/myTestAs$ hexdump memory.bin
0000000 0113 fe81 0497 0000 8493 08c4 0913 0001
0000010 0997 0000 a983 08c9 0293 0000 9313 0029
0000020 83b3 0092 8e33 0122 0eb3 006e a383 0003
0000030 2023 007e a023 007e 8293 0042 c2e3 fe62
0000040 0293 0000 9313 0039 0893 0400 03b3 0059
0000050 1537 0000 0513 fff5 8593 0003 0613 0040
0000060 0073 0000 0513 0010 0597 0000 8593 0385
0000070 0613 0010 0073 0000 8293 0042 c8e3 fc62
0000080 006f 0040 0513 0000 0893 05d0 0073 0000
0000090 0001 0000 0002 0000 0003 0000 0003 0000
00000a0 2c20 000a 0000 0000 0000 0000 0000 0000
00000b0 0000 0000 0000 0000 0000 0000 0000 0000
*
0020000 ffff ffff 0000 0000 ffff ffff 0000 0000
0020010
```
:::
This happened because I initially assembled my code with the assembler directly and therefore left out flags such as `-nostartfiles` and `-nostdlib`, which the assembler does not recognize. As a result, linking went wrong.
I modified the `Makefile` as follows, and it now builds properly:
```bash
include ../common/Makefile.common
EXE = .elf
SRC = concatenation_of_array.s
CFLAGS += -L../common
LDFLAGS += -T ../common/default.ld
TARGET = concatenation_of_array
OUTPUT = $(TARGET)$(EXE)
.PHONY: all clean
all: $(TARGET)
$(TARGET): $(SRC)
	$(CC) $(CFLAGS) -o $(OUTPUT) $(SRC) $(LDFLAGS) -g
	$(OBJCOPY) -j .text -O binary $(OUTPUT) imem.bin
	$(OBJCOPY) -j .data -O binary $(OUTPUT) dmem.bin
	$(OBJCOPY) -O binary $(OUTPUT) memory.bin
	$(OBJDUMP) -d $(OUTPUT) > $(TARGET).dis
	$(READELF) -a $(OUTPUT) > $(TARGET).symbol
clean:
	$(RM) *.o $(OUTPUT) $(TARGET).dis $(TARGET).symbol [id]mem.bin memory.bin
```
### System call
In srv32, the system call convention appears to be the same as in rv32emu. I came to this conclusion after reading [sw/common/syscall.c](https://github.com/kuopinghsu/srv32/blob/20836e7077f4bd54aeef363fff1e68d03b12ff01/sw/common/syscall.c) and [tools/syscall.c](https://github.com/kuopinghsu/srv32/blob/20836e7077f4bd54aeef363fff1e68d03b12ff01/tools/syscall.c) in the `srv32` directory.
`sw/common/syscall.c`:
```c=71
_write(int file, const void *ptr, size_t len)
{
#if HAVE_SYSCALL
int res = __internal_syscall(SYS_WRITE, (long)file, (long)ptr, (long)len, 0, 0, 0);
return res;
#else
const char *buf = (char*)ptr;
int i;
for(i=0; i<len; i++)
_putchar(buf[i]);
return len;
#endif
}
```
`tools/syscall.c`:
```c=80
case SYS_WRITE:
#if 0
if (a0 == STDOUT) {
int i;
for(i=0; i<a2; i++) {
char c = ptr[DVA2PA(a1)+i];
putchar(c);
}
fflush(stdout);
}
#else
res = (int)write(a0, (const char*)(&ptr[DVA2PA(a1)]), a2);
#endif
break;
```
In both implementations, `a0` is the output file (STDOUT), `a1` is the address of the data, and `a2` is the length of the data in bytes. So I directly reused the system call code from hw2, but it failed. After that, I followed [鄭至崴](https://hackmd.io/@Fo7UsdePRsKPVV4CPYGbpA/rydAP0OSo)'s suggestion and tried `printf` again. After I rebuilt my project, it worked fine.
The usage of `printf` is described in [wanghanchi's work](https://hackmd.io/@wanghanchi/rkOjWYqBj?fbclid=IwAR1XR-7Z33EzA4HTW0Vm4fBg33MOs-Vx3NsQtEwQGyz6qljL3WFyOVn6ERo).
There is one thing to pay attention to. `ecall` is an exception, so the state it uses (for example the return address in `mepc`) is kept separate from the GPRs, whereas `printf` is an ordinary subroutine that overwrites `ra` and may modify other GPRs. `ra` must be saved before the call, and so must any of `t0`-`t6` and `a0`-`a7` that we still need after calling `printf`. Otherwise we might get a wrong result.
### Change main into a subroutine
Because the main function becomes a subroutine of `_start`, we have to modify it. In homework 2, I placed main at address `0x00`, but this address should be reserved for the `_start` routine. I changed my code from:
```assembly
.org 0
.global _start
.set STDOUT, 1
.set SYSEXIT, 93
.set SYSWRITE, 64
.set SYSBRK, 214
.data
arr1: .word 1, 2, 3 # a[3] = {1, 2, 3}
len1: .word 3 # array length of a is 3
space: .byte ' ' # space
nline: .byte '\n'
.text
_start:
addi sp, sp, -24 # allocate space for b
la s1, arr1
mv s2, sp # base address of b
...
MY CODE
```
to:
```assembly
.global main
.data
arr1: .word 1, 2, 3 # a[3] = {1, 2, 3}
len1: .word 3 # array length of a is 3
nline: .byte '\n'
iformat: .string "%d "
.text
main:
addi sp, sp, -28 # allocate space for b
sw ra, 24(sp)
la s1, arr1
mv s2, sp # base address of b
...
MY CODE
```
Four things changed. The system call constants are removed because `printf` does not need them. The entry label is renamed to `main` because another `_start` function already exists, and duplicate names should be avoided. `.org 0` is removed because the address of `main` is decided at link time. Finally, `main` has to follow the RV32 calling convention, so it saves its return address.
The return value must also be set properly: `a0` should be 0 before returning if everything went well. Otherwise an error is passed to `make` when we run the code through `make`:
:::spoiler {state="close"} An error occurs!
```bash
teimeiki@ubuntu:~/srv32$ make concatenation_of_array
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/concatenation_of_array'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o concatenation_of_array.elf concatenation_of_array.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary concatenation_of_array.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary concatenation_of_array.elf dmem.bin
riscv-none-elf-objcopy -O binary concatenation_of_array.elf memory.bin
riscv-none-elf-objdump -d concatenation_of_array.elf > concatenation_of_array.dis
riscv-none-elf-readelf -a concatenation_of_array.elf > concatenation_of_array.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/concatenation_of_array'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'
1 2 3 1 2 3
Excuting 6995 instructions, 9287 cycles, 1.327 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.078 s
Simulation cycles: 9298
Simulation speed : 0.119205 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/concatenation_of_array/concatenation_of_array.elf
1 2 3 1 2 3
Excuting 6995 instructions, 9287 cycles, 1.328 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.003 s
Simulation cycles: 9287
Simulation speed : 2.656 MHz
make[1]: *** [Makefile:48: concatenation_of_array.run] Error 8
make[1]: Leaving directory '/home/teimeiki/srv32/tools'
make: *** [Makefile:119: concatenation_of_array] Error 2
```
:::
With a `li a0, 0` before returning:
:::spoiler {state="close"} No error!
```bash
teimeiki@ubuntu:~/srv32$ make concatenation_of_array
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/concatenation_of_array'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o concatenation_of_array.elf concatenation_of_array.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary concatenation_of_array.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary concatenation_of_array.elf dmem.bin
riscv-none-elf-objcopy -O binary concatenation_of_array.elf memory.bin
riscv-none-elf-objdump -d concatenation_of_array.elf > concatenation_of_array.dis
riscv-none-elf-readelf -a concatenation_of_array.elf > concatenation_of_array.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/concatenation_of_array'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'
1 2 3 1 2 3
Excuting 6996 instructions, 9288 cycles, 1.327 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.075 s
Simulation cycles: 9299
Simulation speed : 0.123987 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/concatenation_of_array/concatenation_of_array.elf
1 2 3 1 2 3
Excuting 6996 instructions, 9288 cycles, 1.328 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.005 s
Simulation cycles: 9288
Simulation speed : 1.974 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```
:::
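`make` decides that a step failed by looking at the exit status of the command it ran. A plausible reading of the `Error 8` above is that the simulator exits with whatever `main` left in `a0`, though that is an assumption not checked against the `rvsim` source. The mechanism on the `make` side can be imitated in the shell; `run_with_status` is a made-up helper that stands in for the simulator.

```bash
# run_with_status N: run a child process that exits with status N and
# print the status the parent observes (hypothetical stand-in for rvsim).
run_with_status() {
    status=0
    sh -c "exit $1" || status=$?
    echo "$status"
}

echo "leftover a0: status $(run_with_status 8)"   # non-zero: make reports an error
echo "li a0, 0:    status $(run_with_status 0)"   # zero: make continues
```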
### Result
Here is the code after modification:
```assembly=
.global main
.data
arr1: .word 1, 2, 3 # a[3] = {1, 2, 3}
len1: .word 3 # array length of a is 3
nline: .byte '\n' # no NUL terminator, so printf reads on into iformat; .string "\n" would be safer
iformat: .string "%d "
.text
main:
addi sp, sp, -28 # allocate space for b
sw ra, 24(sp)
la s1, arr1
mv s2, sp # base address of b
lw s3, len1
li t0, 0 # i = 0
slli t1, s3, 2 # t1 = len(arr) * 4
for1:
add t2, t0, s1 # t2 = curr address of a
add t3, t0, s2 # t3 = curr address of b
add t4, t3, t1 # t4 = curr address of b + len of a
lw t2, 0(t2) # t2 = a[i]
sw t2, 0(t3) # b[i] = a[i]
sw t2, 0(t4) # b[i + len(a)] = a[i]
addi t0, t0, 4
blt t0, t1, for1
li s4, 0
slli s5, s3, 3 # s5 = len(arr) * 8
forPrint:
la a0, iformat
add t2, s2, s4 # t2 = address of b[i]
lw a1, 0(t2)
call printf
addi s4, s4, 4
blt s4, s5, forPrint
la a0, nline
call printf
lw ra, 24(sp)
addi sp, sp, 28
li a0, 0 # return 0
ret
```
And the following is the output:
:::spoiler {state="close"} result
```bash
teimeiki@ubuntu:~/srv32$ make concatenation_of_array
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/concatenation_of_array'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o concatenation_of_array.elf concatenation_of_array.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary concatenation_of_array.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary concatenation_of_array.elf dmem.bin
riscv-none-elf-objcopy -O binary concatenation_of_array.elf memory.bin
riscv-none-elf-objdump -d concatenation_of_array.elf > concatenation_of_array.dis
riscv-none-elf-readelf -a concatenation_of_array.elf > concatenation_of_array.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/concatenation_of_array'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'
1 2 3 1 2 3
Excuting 6996 instructions, 9288 cycles, 1.327 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.075 s
Simulation cycles: 9299
Simulation speed : 0.123987 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/concatenation_of_array/concatenation_of_array.elf
1 2 3 1 2 3
Excuting 6996 instructions, 9288 cycles, 1.328 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.005 s
Simulation cycles: 9288
Simulation speed : 1.974 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```
:::
## Optimization of hw2
I found that there is no need to copy the stack pointer at line 14 (`mv s2, sp`), because `sp` does not change inside the main function. So I changed that first.
I also extended the array so that the `for` loop iterates 40 times, which makes the improvement more noticeable.
After the modification, the result is as follows:
:::spoiler {state="close"} with output on screen
```bash
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/concatenation_of_array'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o concatenation_of_array.elf concatenation_of_array.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary concatenation_of_array.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary concatenation_of_array.elf dmem.bin
riscv-none-elf-objcopy -O binary concatenation_of_array.elf memory.bin
riscv-none-elf-objdump -d concatenation_of_array.elf > concatenation_of_array.dis
riscv-none-elf-readelf -a concatenation_of_array.elf > concatenation_of_array.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/concatenation_of_array'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Excuting 71137 instructions, 91987 cycles, 1.293 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish
Simulation statistics
=====================
Simulation time : 10.029 s
Simulation cycles: 91998
Simulation speed : 0.0091732 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/concatenation_of_array/concatenation_of_array.elf
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Excuting 71137 instructions, 91987 cycles, 1.293 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.042 s
Simulation cycles: 91987
Simulation speed : 2.200 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```
:::
:::spoiler {state="close"} without output on screen
```bash
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/concatenation_of_array'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o concatenation_of_array.elf concatenation_of_array.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary concatenation_of_array.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary concatenation_of_array.elf dmem.bin
riscv-none-elf-objcopy -O binary concatenation_of_array.elf memory.bin
riscv-none-elf-objdump -d concatenation_of_array.elf > concatenation_of_array.dis
riscv-none-elf-readelf -a concatenation_of_array.elf > concatenation_of_array.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/concatenation_of_array'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'
Excuting 363 instructions, 449 cycles, 1.236 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.014 s
Simulation cycles: 460
Simulation speed : 0.0328571 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/concatenation_of_array/concatenation_of_array.elf
Excuting 363 instructions, 449 cycles, 1.237 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.000 s
Simulation cycles: 449
Simulation speed : 2.314 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```
:::
Next, I unrolled the for loop by a factor of 4. The modified code is as follows:
```assembly
.global main
.data
arr1: .word 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 # a[40] = {1, 2, ..., 40}
len1: .word 40 # array length of a is 40
nline: .byte '\n'
iformat: .string "%d "
.text
main:
addi sp, sp, -324 # allocate space for b
sw ra, 320(sp)
la s1, arr1
lw s3, len1
li t0, 0 # i = 0
slli t1, s3, 2 # t1 = len(arr) * 4
andi t2, s3, 0x3 # t2 = remainder (len(arr) % 4)
li t3, 3
beq t3, t2, three
li t3, 2
beq t3, t2, two
li t3, 1
beq t3, t2, one
for1:
add t2, t0, s1 # t2 = curr address of a
add t3, t0, sp # t3 = curr address of b
add t4, t3, t1 # t4 = curr address of b + len of a
lw t2, 0(t2) # t2 = a[i]
sw t2, 0(t3) # b[i] = a[i]
sw t2, 0(t4) # b[i + len(a)] = a[i]
addi t0, t0, 4
three: # entry point when len(arr) % 4 == 3
add t2, t0, s1 # t2 = curr address of a
add t3, t0, sp # t3 = curr address of b
add t4, t3, t1 # t4 = curr address of b + len of a
lw t2, 0(t2) # t2 = a[i]
sw t2, 0(t3) # b[i] = a[i]
sw t2, 0(t4) # b[i + len(a)] = a[i]
addi t0, t0, 4
two: # entry point when len(arr) % 4 == 2
add t2, t0, s1 # t2 = curr address of a
add t3, t0, sp # t3 = curr address of b
add t4, t3, t1 # t4 = curr address of b + len of a
lw t2, 0(t2) # t2 = a[i]
sw t2, 0(t3) # b[i] = a[i]
sw t2, 0(t4) # b[i + len(a)] = a[i]
addi t0, t0, 4
one: # entry point when len(arr) % 4 == 1
add t2, t0, s1 # t2 = curr address of a
add t3, t0, sp # t3 = curr address of b
add t4, t3, t1 # t4 = curr address of b + len of a
lw t2, 0(t2) # t2 = a[i]
sw t2, 0(t3) # b[i] = a[i]
sw t2, 0(t4) # b[i + len(a)] = a[i]
addi t0, t0, 4
blt t0, t1, for1
li s4, 0
slli s5, s3, 3 # s5 = len(arr) * 8 (size of b in bytes)
forPrint:
la a0, iformat
add t2, sp, s4 # t2 = address of b[i]
lw a1, 0(t2)
call printf
addi s4, s4, 4
blt s4, s5, forPrint
la a0, nline
call printf
lw ra, 320(sp)
addi sp, sp, 324
li a0, 0 # return 0
ret
```
The result is as follows:
:::spoiler {state="close"} with output on screen
```bash
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/concatenation_of_array'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o concatenation_of_array.elf concatenation_of_array.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary concatenation_of_array.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary concatenation_of_array.elf dmem.bin
riscv-none-elf-objcopy -O binary concatenation_of_array.elf memory.bin
riscv-none-elf-objdump -d concatenation_of_array.elf > concatenation_of_array.dis
riscv-none-elf-readelf -a concatenation_of_array.elf > concatenation_of_array.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/concatenation_of_array'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Excuting 71114 instructions, 91904 cycles, 1.292 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.976 s
Simulation cycles: 91915
Simulation speed : 0.0941752 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/concatenation_of_array/concatenation_of_array.elf
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Excuting 71114 instructions, 91904 cycles, 1.292 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.042 s
Simulation cycles: 91904
Simulation speed : 2.206 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```
:::
:::spoiler {state="close"} without output on screen
```bash
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/concatenation_of_array'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o concatenation_of_array.elf concatenation_of_array.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary concatenation_of_array.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary concatenation_of_array.elf dmem.bin
riscv-none-elf-objcopy -O binary concatenation_of_array.elf memory.bin
riscv-none-elf-objdump -d concatenation_of_array.elf > concatenation_of_array.dis
riscv-none-elf-readelf -a concatenation_of_array.elf > concatenation_of_array.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/concatenation_of_array'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'
Excuting 342 instructions, 368 cycles, 1.076 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.014 s
Simulation cycles: 379
Simulation speed : 0.0270714 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/concatenation_of_array/concatenation_of_array.elf
Excuting 342 instructions, 368 cycles, 1.076 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.000 s
Simulation cycles: 368
Simulation speed : 1.878 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```
:::
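Loop unrolling reduces the number of branch instructions executed in the for loop from $n$ to about $n / 4 + 3$, where $n$ is the length of the input array: one `blt` per four elements, plus the three `beq` instructions that select the entry point. The extra cost is 3 additional `li` and 1 `andi` instructions.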
| | instruction count | cycle count | CPI | LOC of for loop|
| -------- | -------- | -------- | -------- | -------- |
| with loop unrolling | 342 | 368 | 1.076 | 38 |
| without loop unrolling | 363 | 449 | 1.237 | 10 |
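
For reference, the control flow of the unrolled assembly can be sketched in C. This is only an illustration, not the code that was measured: the function name `concat_unrolled` is made up here, and the `n == 0` guard is an addition that the assembly version does not have. The `switch` jumps into the middle of the loop body according to `n % 4`, in the same way as the `beq` instructions jump to `three`, `two` and `one`.

```c
#include <assert.h>

/* Hypothetical helper: writes a[0..n-1] into b[0..n-1] and b[n..2n-1],
 * copying up to four elements per loop iteration. */
static void concat_unrolled(const int *a, int *b, int n)
{
    assert(n >= 0);
    if (n == 0)
        return;              /* guard added for this sketch only */

    int i = 0;
    switch (n & 3) {         /* remainder n % 4 selects the entry point */
    case 0:
        do {
            b[i] = a[i]; b[i + n] = a[i]; i++;   /* corresponds to for1  */
    case 3:
            b[i] = a[i]; b[i + n] = a[i]; i++;   /* corresponds to three */
    case 2:
            b[i] = a[i]; b[i + n] = a[i]; i++;   /* corresponds to two   */
    case 1:
            b[i] = a[i]; b[i + n] = a[i]; i++;   /* corresponds to one   */
        } while (i < n);     /* one loop branch per four elements */
    }
}
```

Calling `concat_unrolled(a, b, 40)` with the 40-element array above evaluates the loop condition 10 times instead of 40.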
### Observing waveform
In Windows Terminal, run the following command to open the waveform with GTKWave.
```bash
.\gtkwave.exe -f ..\hw2.fst
```
Then select the signals you want to observe.
#### Control hazard

In this figure, we can see that each time a branch (or jump) is taken, two instructions are flushed. This is called the branch penalty, and it is illustrated by the figure [here](https://github.com/sysprog21/srv32/blob/devel/images/branch.svg).

If the branch is taken, the instructions fetched after it are wrong, so the 2 instructions following the branch must be flushed. Branch prediction does not help here, because in this srv32 implementation the destination of every branch and jump is decided at the EXE stage.
We can observe this in the instruction-fetch waveform as well.

```assembly
dc: 007e2023 sw t2,0(t3)
e0: 007ea023 sw t2,0(t4)
e4: 00428293 addi t0,t0,4
e8: f862c8e3 blt t0,t1,78 <for1>
ec: 00000a13 li s4,0
f0: 00399a93 slli s5,s3,0x3
```
When `fetch_pc` is set to the address of `blt`, the memory passes the instruction (`blt t0,t1,78`) to the RISC-V CPU one cycle later.

When `blt` reaches the EXE stage, the branch flag is set because it is a branch instruction, and `next_pc` is set to the branch destination. `fetch_pc` is updated with `next_pc` in the next cycle. In the meantime, 2 instructions that should not be executed have already been fetched into the pipeline, so they need to be flushed.
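
The measured cycle counts are consistent with this 2-cycle penalty. The sketch below is a back-of-the-envelope check, not something taken from srv32: it assumes one cycle per instruction plus 2 extra cycles per taken branch, and it assumes that the loop branch is taken 39 times in the original loop and 9 times in the unrolled loop (n = 40). The name `cycles_saved` is hypothetical. The instruction and cycle counts are the ones from the runs without output on screen.

```c
#include <assert.h>

#define BRANCH_PENALTY 2   /* instructions flushed per taken branch */

/* Assumed model: cycles saved = fewer instructions executed
 * + fewer flushed pipeline slots. */
static int cycles_saved(int instr_before, int instr_after,
                        int taken_before, int taken_after)
{
    assert(instr_before >= instr_after);
    assert(taken_before >= taken_after);
    return (instr_before - instr_after)
         + BRANCH_PENALTY * (taken_before - taken_after);
}
```

Under these assumptions, `cycles_saved(363, 342, 39, 9)` gives 21 + 2 × 30 = 81, which equals the measured difference of 449 - 368 = 81 cycles.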
#### Save and load data

DMEM:
```assembly
0000000 ffff ffff 0000 0000 ffff ffff 0000 0000
0000010 0001 0000 0002 0000 0003 0000 0004 0000
0000020 0005 0000 0006 0000 0007 0000 0008 0000
0000030 0009 0000 000a 0000 000b 0000 000c 0000
0000040 000d 0000 000e 0000 000f 0000 0010 0000
0000050 0011 0000 0012 0000 0013 0000 0014 0000
0000060 0015 0000 0016 0000 0017 0000 0018 0000
0000070 0019 0000 001a 0000 001b 0000 001c 0000
0000080 001d 0000 001e 0000 001f 0000 0020 0000
0000090 0021 0000 0022 0000 0023 0000 0024 0000
00000a0 0025 0000 0026 0000 0027 0000 0028 0000
00000b0 0028 0000 250a 2064 0000 0000
00000bc
```
CODE:
```assembly
7c: 00228e33 add t3,t0,sp
80: 006e0eb3 add t4,t3,t1
84: 0003a383 lw t2,0(t2)
88: 007e2023 sw t2,0(t3)
8c: 007ea023 sw t2,0(t4)
```
When `lw` reaches the EXE stage, `dmem_rready` is set and `dmem_raddr` is set to the address of the data. After the offset is subtracted and the least significant 2 bits are truncated, the result is passed to `raddr`. The value stored at that address is read into the CPU one cycle later. In this example, the value is `0000 0001`.
When `sw` reaches the WB stage, the `wready` signal is set.

The result calculated by the ALU is passed to `wb_waddr`. After the offset is subtracted and the least significant 2 bits are truncated, the address is passed to `waddr` so that `wdata` is stored into data memory.
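
The address translation described here can be sketched in C. The base address `DMEM_BASE = 0x20000` is an assumption made for illustration (the real value should be checked in the linker script), and `dmem_word_index` is a made-up name. The only point is that a byte address becomes a word index by subtracting the base and dropping the 2 low bits.

```c
#include <assert.h>
#include <stdint.h>

#define DMEM_BASE 0x00020000u   /* assumed base address of data memory */

/* Hypothetical helper: byte address -> index of the 32-bit word in DMEM. */
static uint32_t dmem_word_index(uint32_t byte_addr)
{
    assert(byte_addr >= DMEM_BASE);
    return (byte_addr - DMEM_BASE) >> 2;   /* subtract offset, drop 2 LSBs */
}
```

With this assumed base, the first array element, which sits at byte offset `0x10` in the DMEM dump above, maps to word index 4.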
## Select a LeetCode problem with medium difficulty
I selected [longest-substring-without-repeating-characters](https://leetcode.com/problems/longest-substring-without-repeating-characters/description/) as the new problem to implement.
Here is the description:
>Given a string s, find the length of the longest substring without repeating characters.
>Example 1:
>```
>Input: s = "abcabcbb"
>Output: 3
>Explanation: The answer is "abc", with the length of 3.
>```
>Example 2:
>```
>Input: s = "bbbbb"
>Output: 1
>Explanation: The answer is "b", with the length of 1.
>```
>Example 3:
>```
>Input: s = "pwwkew"
>Output: 3
>Explanation: The answer is "wke", with the length of 3.
>Notice that the answer must be a substring, "pwke" is a subsequence and not a substring.
>```
>Constraints:
>
>* $0$ <= `s.length` <= $5 * 10^{4}$
>* `s` consists of English letters, digits, symbols and spaces.
### Algorithm
```clike
int lengthOfLongestSubstring(char *s) {
    /* map[c] is the index of the last occurrence of c, plus 1 (0 if unseen) */
    int map[128];
    /* ASCII only uses 7 bits, so 128 entries are enough */
    for (int i = 0; i < 128; i++) {
        map[i] = 0;
    }
    int maxLen = 0, start = 0;
    /* start is the start point of the longest repeat-free substring ending at s[i] */
    for (int i = 0; s[i] != '\0'; i++) {
        if (map[s[i]] > start)
            start = map[s[i]];
        map[s[i]] = i + 1;
        if (i + 1 - start > maxLen)
            maxLen = i + 1 - start;
    }
    return maxLen;
}
```
For an explanation, please consult [this solution](https://leetcode.com/problems/longest-substring-without-repeating-characters/solutions/1737/c-code-in-9-lines/?orderBy=most_votes).
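
To see how the window moves, the same sliding-window idea can be instrumented to print its state after each character. This is a sketch for experimenting, assuming ASCII input; the name `longest_with_trace` and the output format are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical variant of the function above that prints the current
 * window [start, i] and the best length after every character. */
static int longest_with_trace(const char *s)
{
    int last[128] = {0};   /* last[c] = index of last occurrence of c, plus 1 */
    int maxLen = 0, start = 0;

    for (int i = 0; s[i] != '\0'; i++) {
        unsigned char c = (unsigned char)s[i];
        assert(c < 128);               /* ASCII only (assumption) */
        if (last[c] > start)           /* c repeats inside the window */
            start = last[c];           /* move start past the previous c */
        last[c] = i + 1;
        if (i + 1 - start > maxLen)
            maxLen = i + 1 - start;
        printf("i=%d '%c' window=[%d,%d] maxLen=%d\n",
               i, s[i], start, i, maxLen);
    }
    return maxLen;
}
```

For the input `"pwwkew"`, the window start jumps to 2 at the second `w` and to 3 at the last `w`, and the function returns 3.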
I want to reduce the size of the sparse array `map`, so I modified the constraints to:
>Constraints:
>
>* $0$ <= `s.length` <= $255$
>* `s` consists of lower case English letters.
Because there are only 26 letters now, the size of `map` can be reduced to 26. The maximal length is now 255, so each entry fits in 8 bits, i.e. an `unsigned char` (a plain `char` may be signed, in which case it only holds values up to 127). Here is the C code after modification:
```clike
int lengthOfLongestSubstring(char *s) {
    /* one entry per lowercase letter; values go up to 255 */
    unsigned char map[26];
    for (int i = 0; i < 26; i++) {
        map[i] = 0;
    }
    int maxLen = 0, start = 0;
    /* start is the start point of the longest repeat-free substring ending at s[i] */
    for (int i = 0; s[i] != '\0'; i++) {
        int c = s[i] - 'a'; /* map 'a'..'z' to 0..25 */
        if (map[c] > start)
            start = map[c];
        map[c] = i + 1;
        if (i + 1 - start > maxLen)
            maxLen = i + 1 - start;
    }
    return maxLen;
}
```
The only modifications are the declaration and initialization of `map` and the index `s[i] - 'a'` used to access it.
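
One detail of the 8-bit table is worth checking: the stored positions go up to 255, which only fits in 8 bits when the element type is unsigned. The sketch below is a standalone variant for experimenting with that boundary under the modified constraints (lowercase letters only, length at most 255). The name `longest_lowercase` is made up for this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical standalone variant with a 26-entry, 8-bit table. */
static int longest_lowercase(const char *s)
{
    assert(strlen(s) <= 255);          /* modified constraint */
    unsigned char last[26] = {0};      /* last[c] = last index of c, plus 1 */
    int maxLen = 0, start = 0;

    for (int i = 0; s[i] != '\0'; i++) {
        assert(s[i] >= 'a' && s[i] <= 'z');
        int c = s[i] - 'a';            /* 'a'..'z' -> 0..25 */
        if (last[c] > start)
            start = last[c];
        last[c] = (unsigned char)(i + 1);   /* at most 255, fits in 8 bits */
        if (i + 1 - start > maxLen)
            maxLen = i + 1 - start;
    }
    return maxLen;
}
```

For a string of 200 `a` characters followed by `bca`, the position stored for `a` reaches 200 before the final `a` is read, and the function returns 3 (for `bca`).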
### Assembly code
:::info
todo: finish it :(
:::