# [Assignment3: SoftCPU](https://hackmd.io/@sysprog/2022-arch-homework3)
###### tags: `RISC-V` `jserv`

## Before start

When following the steps in [lab3](https://hackmd.io/@sysprog/S1Udn1Xtt) to set up the environment required by this homework, I encountered some problems.

### Installation of RISC-V toolchains

In the Makefile, we can see that the RISC-V toolchain will be installed in `/opt/riscv`. This may require root privileges, because it creates a new directory under the root directory.

![](https://i.imgur.com/r3VuIU9.png)

Running `make -j$(nproc)` directly raised a "permission denied" error. Running it with `sudo` fixes this:

```bash
sudo make -j$(nproc)
```

### Installation of verilator

When installing verilator, we have to add a new environment variable that holds the verilator root directory. Lab3 uses the `export` command, but it only takes effect in the current shell, so after restarting the shell we would have to type it again. So I modified `~/.profile` so that the variable is set automatically every time a login shell starts:

```bash
if [ -d "$HOME/verilator" ] ; then
    VERILATOR_ROOT="$HOME/verilator"
fi
```

After adding these lines to `~/.profile`, we can `source` it and check the value of the variable:

```bash
source ~/.profile
echo $VERILATOR_ROOT
```

You should see the following result, where `teimeiki` is replaced with your user name:

![](https://i.imgur.com/80LU29C.png)

After the variable is added, the instructions say to run `./configure`, but that file does not exist. We first have to run `autoconf` to generate the `configure` script from `configure.ac`.
The following are all the commands I used:

```bash
# build riscv toolchain from source code
sudo apt install autoconf automake autotools-dev curl gawk git \
    build-essential bison flex texinfo gperf libtool patchutils bc git \
    libmpc-dev libmpfr-dev libgmp-dev gawk zlib1g-dev libexpat1-dev
git clone --recursive https://github.com/riscv/riscv-gnu-toolchain
cd riscv-gnu-toolchain
mkdir -p build && cd build
../configure --prefix=/opt/riscv --with-isa-spec=20191213 \
    --with-multilib-generator="rv32i-ilp32--;rv32im-ilp32--;rv32imac-ilp32--;rv32im_zicsr-ilp32--;rv32imac_zicsr-ilp32--;rv64imac-lp64--;rv64imac_zicsr-lp64--"
sudo make -j$(nproc)
```

```bash
# install some dependent packages
sudo apt install build-essential lcov ccache libsystemc-dev
```

```bash
# get verilator
cd $HOME
git clone https://github.com/verilator/verilator
cd verilator
git checkout stable
```

Add this to `~/.profile`:

```bash
if [ -d "$HOME/verilator" ] ; then
    VERILATOR_ROOT="$HOME/verilator"
fi
```

```bash
# generate ./configure
autoconf
# build verilator
./configure
make
# check whether the installation has problems
make test
```

Modify `~/.profile` again:

```bash
if [ -d "$HOME/verilator" ] ; then
    VERILATOR_ROOT="$HOME/verilator"
    PATH="$VERILATOR_ROOT/bin:$PATH"
fi
```

```bash
# make sure the Verilator version is >= 5.002
verilator --version
```

```bash
# get srv32
git clone https://github.com/sysprog21/srv32
```

After installing srv32, I ran `make all` to check whether the environment was set up correctly, and this error occurred:

```bash
Vriscv.mk:57: /usr/local/share/verilator/include/verilated.mk: No such file or directory
make[3]: *** No rule to make target '/usr/local/share/verilator/include/verilated.mk'. Stop.
make[3]: Leaving directory '/home/teimeiki/srv32/sim/sim_cc'
%Error: make -C sim_cc -f Vriscv.mk -j 1 exited with 2
%Error: Command Failed /home/teimeiki/verilator/bin/verilator_bin -O3 -cc -Wall -Wno-STMTDLY -Wno-UNUSED +define+MEMSIZE=128 --trace-fst --Mdir sim_cc --build --exe sim_main.cpp getch.cpp -o sim -f filelist.txt ../rtl/top.v
make[2]: *** [Makefile:45: sim] Error 2
make[2]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: *** [Makefile:118: pi_pthread] Error 2
make[1]: Leaving directory '/home/teimeiki/srv32'
make: *** [Makefile:94: all] Error 1
```

After reading `Vriscv.mk` in `srv32/sim/sim_cc`, I realized that this makefile sets its own value of `$VERILATOR_ROOT`. Commenting out that line did not help, because the line comes back every time we run `make` (the file is generated by Verilator). So I passed the variable on the `make` command line:

```bash
make all VERILATOR_ROOT="$VERILATOR_ROOT"
```

A variable given on the `make` command line overrides ordinary assignments in the makefiles, so `VERILATOR_ROOT` keeps the specified value even if the `.mk` scripts reassign it. With this, the build finally works.

:::info
The root cause was that I did not `export` the environment variable. After adding `export VERILATOR_ROOT` and `export CROSS_COMPILE` to `~/.profile`, the problem was solved, and there is no need to specify the variable on the command line.
:::

## Srv32

### Data hazard

### Control hazard

The `README.md` of srv32 states: `Two instructions branch penalty if branch taken, CPI is 1 for other instructions.` This behavior can also be observed with a test program. To check it in the waveform, I used the following code:

```clike
int main(void)
{
    for (int i = 0; i < 1000; i++);
    return 0;
}
```

It consists only of an empty `for` loop, which produces a long run of taken conditional branches.
It is compiled with the `-O0` optimization flag:

```bash
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o hello.elf hello.c -lc -lm -lgcc -lsys -T ../common/default.ld
```

With this program, we can observe how srv32 behaves when it encounters a sequence of taken conditional branches. The following waveform is generated:

![](https://i.imgur.com/mz9NS3Y.png)

It shows that every time a branch is taken, two instructions are flushed. Here are two assembly implementations of the C code:

#### First

```assembly
.text
.global main

main:
    addi sp, sp, -4     # main is called by _start
    sw   ra, 0(sp)
    li   x25, 100
    li   t0, 0
    li   t1, 1000
for:
    bge  t0, t1, end
    addi t0, t0, 1
    j    for
end:
    lw   ra, 0(sp)
    addi sp, sp, 4
    ret
```

:::spoiler {state="close"} result
```bash
teimeiki@ubuntu:~/srv32$ make myTest
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/myTest'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o myTest.elf myTest.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary myTest.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary myTest.elf dmem.bin
riscv-none-elf-objcopy -O binary myTest.elf memory.bin
riscv-none-elf-objdump -d myTest.elf > myTest.dis
riscv-none-elf-readelf -a myTest.elf > myTest.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/myTest'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'
Excuting 3039 instructions, 5049 cycles, 1.661 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.03 s
Simulation cycles: 5060
Simulation speed : 0.168667 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/myTest/myTest.elf
Excuting 3039 instructions, 5049 cycles, 1.661 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.002 s
Simulation cycles: 5049
Simulation speed : 2.395 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```
:::

#### Second

```assembly
.text
.global main

main:
    addi sp, sp, -4     # main is called by _start
    sw   ra, 0(sp)
    li   t0, 0
    li   t1, 1000
for:
    addi t0, t0, 1
    blt  t0, t1, for
    lw   ra, 0(sp)
    addi sp, sp, 4
    ret
```

:::spoiler {state="close"} result
```bash
teimeiki@ubuntu:~/srv32$ make myTest
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/myTest'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o myTest.elf myTest.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary myTest.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary myTest.elf dmem.bin
riscv-none-elf-objcopy -O binary myTest.elf memory.bin
riscv-none-elf-objdump -d myTest.elf > myTest.dis
riscv-none-elf-readelf -a myTest.elf > myTest.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/myTest'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'
Excuting 2040 instructions, 4048 cycles, 1.984 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.026 s
Simulation cycles: 4059
Simulation speed : 0.156115 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/myTest/myTest.elf
Excuting 2040 instructions, 4048 cycles, 1.984 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.001 s
Simulation cycles: 4048
Simulation speed : 3.393 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```
:::

Taken branches in a `for` loop cannot be avoided without loop unrolling. The instruction count can still be reduced by restructuring the loop. The second implementation executes fewer instructions and fewer cycles. Its CPI is higher because a larger share of its instructions are taken branches. Note that it is actually a `do ... while` loop, so the condition `i < 1000` is not checked before the first iteration.
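The instruction and cycle counts in the two logs can be roughly cross-checked with a simple cost model. This is an assumption-based approximation, not srv32's actual timing logic. Following the README, it assumes every instruction costs one cycle and every taken branch or jump adds a two-cycle penalty. It models only the loop itself; `cost_while_style` and `cost_do_while_style` are hypothetical helper names.

```c
#include <stdint.h>

/* Assumed cost model (from the srv32 README): one cycle per instruction,
 * plus a 2-cycle penalty for every taken branch or jump. Only the loop
 * body is modeled; startup code and main's prologue/epilogue are not. */
#define TAKEN_PENALTY 2

typedef struct {
    uint32_t insts;  /* instructions retired inside the loop */
    uint32_t cycles; /* cycles spent inside the loop */
} loop_cost;

/* First version: bge (not taken) + addi + j on each of the n iterations,
 * then one final taken bge that leaves the loop. */
loop_cost cost_while_style(uint32_t n)
{
    uint32_t insts = 3 * n + 1;
    uint32_t taken = n + 1;  /* n backward jumps + the exiting bge */
    loop_cost c = { insts, insts + TAKEN_PENALTY * taken };
    return c;
}

/* Second version: addi + blt on each iteration; blt is taken n - 1 times
 * and falls through on the last iteration (requires n >= 1). */
loop_cost cost_do_while_style(uint32_t n)
{
    uint32_t insts = 2 * n;
    uint32_t taken = n - 1;
    loop_cost c = { insts, insts + TAKEN_PENALTY * taken };
    return c;
}
```

For `n = 1000`, the model gives 3001 instructions and 5003 cycles for the first version, and 2000 instructions and 3998 cycles for the second. The measured totals (3039/5049 and 2040/4048) are a few dozen higher. That difference presumably comes from code outside the loop, such as `_start` and the prologue and epilogue of `main`.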
| | Instruction count | Cycle count | CPI |
| -------- | -------- | -------- | -------- |
| First implementation | 3039 | 5049 | 1.661 |
| Second implementation | 2040 | 4048 | 1.984 |

## Port hw2 to srv32

### Try to build previous work

I tried several ways to run my assembly code on srv32, but all of them failed at first. I referred to [OscarShiang's previous work](https://hackmd.io/@oscarshiang/arch_hw3) because it is one of the few detailed reports written in assembly language. I downloaded the [source code](https://github.com/OscarShiang/srv32/blob/6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/count_bits/count_bits.s) from his GitHub to check whether someone else's work could run, and encountered the following issue:

:::spoiler {state="close"} error message
```bash
startup.S: Assembler messages:
startup.S:5: Error: unrecognized opcode `csrw mtvec,t0'
startup.S:6: Error: unrecognized opcode `csrrsi zero,mtvec,1'
startup.S:74: Error: unrecognized opcode `csrr t5,mepc'
startup.S:76: Error: unrecognized opcode `csrw mepc,t5'
startup.S:133: Error: unrecognized opcode `csrr t5,mepc'
startup.S:135: Error: unrecognized opcode `csrw mepc,t5'
startup.S:192: Error: unrecognized opcode `csrr t5,mepc'
startup.S:194: Error: unrecognized opcode `csrw mepc,t5'
```
:::

I modified `Makefile.common` as suggested in this [issue](https://github.com/kuopinghsu/srv32/issues/10) to make it compatible with the current ISA spec version, but it still failed:

:::spoiler {state="close"} error message
```bash
make[1]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/common'
make[2]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/count_bits'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -nostartfiles -nostdlib -L../common -o count_bits.elf count_bits.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary count_bits.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary count_bits.elf dmem.bin
riscv-none-elf-objcopy -O binary count_bits.elf memory.bin
riscv-none-elf-objdump -d count_bits.elf > count_bits.dis
riscv-none-elf-readelf -a count_bits.elf > count_bits.symbol
make[2]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/count_bits'
make[1]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw'
make[1]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sim'
Illegal instruction at PC 0x00000188
Illegal instruction at PC 0x0000018c
Illegal instruction at PC 0x00000190
Illegal instruction at PC 0x00000194
DMEM address fffffd41 out of range
- ../rtl/../testbench/testbench.v:423: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.013 s
Simulation cycles: 1329
Simulation speed : 0.102231 MHz
make[1]: *** [Makefile:69: count_bits.run] Error 1
make[1]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sim'
make: *** [Makefile:80: count_bits] Error 2
```
:::

The illegal instructions are located in the `printf` subroutine, and they are compressed instructions. The addresses in the error messages increase in steps of 4, and `printf` contains the first compressed instruction in the whole program. For these reasons, I think the problem is caused by compressed instructions.
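The suspicion that compressed instructions are involved can be checked directly on the binary. In RISC-V, an instruction whose two lowest bits are `11` is 32 bits long, while any other value there marks a 16-bit compressed instruction. The following minimal sketch scans a stream of halfwords. It assumes the stream has been loaded from a file such as `imem.bin` on a little-endian host. The function names are hypothetical, and instruction encodings longer than 32 bits are ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* RISC-V length encoding: low two bits == 0b11 means a 32-bit
 * instruction; 00, 01 or 10 means a 16-bit compressed (C) instruction. */
int is_compressed(uint16_t low_half)
{
    return (low_half & 0x3) != 0x3;
}

/* Walk a stream of halfwords in instruction order and return the byte
 * offset of the first compressed instruction, or -1 if the stream holds
 * only 32-bit instructions. */
long first_compressed_offset(const uint16_t *half, size_t count)
{
    size_t i = 0;
    while (i < count) {
        if (is_compressed(half[i]))
            return (long)(i * 2);
        i += 2; /* skip both halves of a 32-bit instruction */
    }
    return -1;
}
```

For example, the first two halfwords in the `hexdump` of `memory.bin` in this report, `0113 fe81`, form the 32-bit `addi sp,sp,-24`. A halfword such as `0x1141` (`c.addi sp,-16`) would be reported as compressed.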
After checking the srv32 README again, I enabled compressed instructions with the `rv32c=1` flag and tried again, but it still could not run:

:::spoiler {state="close"} error message
```bash
teimeiki@ubuntu:~/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06$ make count_bits rv32c=1
make rv32c=1 -C sw count_bits
make[1]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/common'
make[2]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/count_bits'
riscv-none-elf-gcc -O0 -Wall -march=rv32imac_zicsr -mabi=ilp32 -nostartfiles -nostdlib -L../common -o count_bits.elf count_bits.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary count_bits.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary count_bits.elf dmem.bin
riscv-none-elf-objcopy -O binary count_bits.elf memory.bin
riscv-none-elf-objdump -d count_bits.elf > count_bits.dis
riscv-none-elf-readelf -a count_bits.elf > count_bits.symbol
make[2]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/count_bits'
make[1]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw'
make verilator=1 \
\
rv32c=1 debug=0 -C sim count_bits.run
make[1]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sim'
Illegal instruction at PC 0x00000002
```
:::

The simulation did not even terminate.
If `printf` is disabled, OscarShiang's code runs normally:

:::spoiler {state="close"} can run, but no output on screen
```bash
teimeiki@ubuntu:~/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06$ make count_bits
make[1]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/common'
make[2]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/count_bits'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -nostartfiles -nostdlib -L../common -o count_bits.elf count_bits.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary count_bits.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary count_bits.elf dmem.bin
riscv-none-elf-objcopy -O binary count_bits.elf memory.bin
riscv-none-elf-objdump -d count_bits.elf > count_bits.dis
riscv-none-elf-readelf -a count_bits.elf > count_bits.symbol
make[2]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/count_bits'
make[1]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw'
make[1]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sim'
Excuting 187 instructions, 257 cycles, 1.374 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.009 s
Simulation cycles: 268
Simulation speed : 0.0297778 MHz
make[1]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sim'
make[1]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/tools'
gcc -c -o rvsim.o rvsim.c -O3 -g -Wall
gcc -c -o decompress.o decompress.c -O3 -g -Wall
gcc -c -o syscall.o syscall.c -O3 -g -Wall
gcc -c -o elfread.o elfread.c -O3 -g -Wall
gcc -c -o getch.o getch.c -O3 -g -Wall
gcc -O3 -g -Wall -o rvsim rvsim.o decompress.o syscall.o elfread.o getch.o
./rvsim --memsize 128 -l trace.log ../sw/count_bits/count_bits.elf
Excuting 187 instructions, 257 cycles, 1.374 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.000 s
Simulation cycles: 257
Simulation speed : 2.705 MHz
make[1]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```
:::

:::info
The problem is caused by `printf`. In OscarShiang's work, `printf` is built from compressed instructions, so srv32 breaks when it runs with only the 32-bit instruction format. Oddly, setting `rv32c=1` did not help. However, when I tested `printf` in my own project cloned from [srv32](https://github.com/sysprog21/srv32), it worked perfectly. So I suspended debugging the previous work and ported my hw2 to srv32 first.
:::

### Link start routine

At first, the linker did not link my program with the `_start` routine; my own code was used as `_start` instead.
Also, the data was appended directly to the text section instead of being placed in the data section:

:::spoiler {state="close"} dumped disassembled code and memory
```assembly
teimeiki@ubuntu:~/srv32$ riscv-none-elf-objdump -d sw/myTestAs/myTest.elf

sw/myTestAs/myTest.elf: file format elf32-littleriscv

Disassembly of section .text:

00000000 <_start>:
  0: fe810113 addi sp,sp,-24
  4: 00000497 auipc s1,0x0
  8: 08c48493 addi s1,s1,140 # 90 <arr1>
  c: 00010913 mv s2,sp
  10: 00000997 auipc s3,0x0
  14: 08c9a983 lw s3,140(s3) # 9c <len1>
  18: 00000293 li t0,0
  1c: 00299313 slli t1,s3,0x2

00000020 <for1>:
  20: 009283b3 add t2,t0,s1
  24: 01228e33 add t3,t0,s2
  28: 006e0eb3 add t4,t3,t1
  2c: 0003a383 lw t2,0(t2)
  30: 007e2023 sw t2,0(t3)
  34: 007ea023 sw t2,0(t4)
  38: 00428293 addi t0,t0,4
  3c: fe62c2e3 blt t0,t1,20 <for1>
  40: 00000293 li t0,0
  44: 00399313 slli t1,s3,0x3
  48: 04000893 li a7,64

0000004c <forPrint>:
  4c: 005903b3 add t2,s2,t0
  50: 00001537 lui a0,0x1
  54: fff50513 addi a0,a0,-1 # fff <SYSBRK+0xf29>
  58: 00038593 mv a1,t2
  .
  .
  .
  7c: fc62c8e3 blt t0,t1,4c <forPrint>
  80: 0040006f j 84 <exit>

00000084 <exit>:
  84: 00000513 li a0,0
  88: 05d00893 li a7,93
  8c: 00000073 ecall

00000090 <arr1>:
  90: 00000001 .word 0x00000001
  94: 00000002 .word 0x00000002
  98: 00000003 .word 0x00000003

0000009c <len1>:
  9c: 00000003 .word 0x00000003

000000a0 <space>:
  a0: Address 0x00000000000000a0 is out of bounds.

  a4:
000000a1 <comma>:
  a1: Address 0x00000000000000a1 is out of bounds.

  a5:
000000a2 <nline>:
  a2: 000a .short 0x000a
```

```bash
teimeiki@ubuntu:~/srv32/sw/myTestAs$ hexdump memory.bin
0000000 0113 fe81 0497 0000 8493 08c4 0913 0001
0000010 0997 0000 a983 08c9 0293 0000 9313 0029
0000020 83b3 0092 8e33 0122 0eb3 006e a383 0003
0000030 2023 007e a023 007e 8293 0042 c2e3 fe62
0000040 0293 0000 9313 0039 0893 0400 03b3 0059
0000050 1537 0000 0513 fff5 8593 0003 0613 0040
0000060 0073 0000 0513 0010 0597 0000 8593 0385
0000070 0613 0010 0073 0000 8293 0042 c8e3 fc62
0000080 006f 0040 0513 0000 0893 05d0 0073 0000
0000090 0001 0000 0002 0000 0003 0000 0003 0000
00000a0 2c20 000a 0000 0000 0000 0000 0000 0000
00000b0 0000 0000 0000 0000 0000 0000 0000 0000
*
0020000 ffff ffff 0000 0000 ffff ffff 0000 0000
0020010
```
:::

This happened because I initially built my code with the assembler alone. I therefore did not pass flags such as `-nostartfiles` and `-nostdlib`, which the assembler does not recognize, and the link step went wrong. I modified the `Makefile` as follows to build it properly:

```makefile
include ../common/Makefile.common

EXE = .elf
SRC = concatenation_of_array.s
CFLAGS += -L../common
LDFLAGS += -T ../common/default.ld
TARGET = concatenation_of_array
OUTPUT = $(TARGET)$(EXE)

.PHONY: all clean

all: $(TARGET)

$(TARGET): $(SRC)
	$(CC) $(CFLAGS) -o $(OUTPUT) $(SRC) $(LDFLAGS) -g
	$(OBJCOPY) -j .text -O binary $(OUTPUT) imem.bin
	$(OBJCOPY) -j .data -O binary $(OUTPUT) dmem.bin
	$(OBJCOPY) -O binary $(OUTPUT) memory.bin
	$(OBJDUMP) -d $(OUTPUT) > $(TARGET).dis
	$(READELF) -a $(OUTPUT) > $(TARGET).symbol

clean:
	$(RM) *.o $(OUTPUT) $(TARGET).dis $(TARGET).symbol [id]mem.bin memory.bin
```

### System call

In srv32, the system call convention seems to be the same as in rv32emu.
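Under this convention, the system call number goes in `a7`; the hw2 code uses 64 for `write` and 93 for `exit`. The arguments go in `a0`-`a2`. The toy model below sketches how a simulator might dispatch such an `ecall`. It is loosely inspired by `tools/syscall.c` but is not its actual code. `cpu_t` and `handle_ecall` are hypothetical, output is captured in a buffer instead of being passed to `write(2)`, and the result is assumed to be returned in `a0`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model of an ecall handler inside a simulator.
 * Register indices follow the RISC-V ABI names. */
enum { REG_A0 = 10, REG_A1 = 11, REG_A2 = 12, REG_A7 = 17 };
enum { SYS_WRITE = 64, SYS_EXIT = 93 };  /* numbers used by the hw2 code */

typedef struct {
    uint32_t x[32];   /* general-purpose registers */
    uint8_t mem[256]; /* tiny data memory, addressed from 0 */
    char out[256];    /* captured output instead of a real file */
    size_t out_len;
    int exited;
    int exit_code;
} cpu_t;

/* Dispatch on a7; a0..a2 carry the arguments and only a0 is written. */
void handle_ecall(cpu_t *cpu)
{
    uint32_t *x = cpu->x;
    switch (x[REG_A7]) {
    case SYS_WRITE: {
        uint32_t addr = x[REG_A1], len = x[REG_A2];
        /* a0 (file descriptor) is ignored: all output goes to cpu->out */
        if (addr > sizeof cpu->mem || len > sizeof cpu->mem - addr ||
            len > sizeof cpu->out - cpu->out_len) {
            x[REG_A0] = (uint32_t)-1;  /* reject out-of-range buffers */
            break;
        }
        memcpy(cpu->out + cpu->out_len, cpu->mem + addr, len);
        cpu->out_len += len;
        x[REG_A0] = len;               /* number of bytes written */
        break;
    }
    case SYS_EXIT:
        cpu->exited = 1;
        cpu->exit_code = (int)x[REG_A0];
        break;
    default:
        x[REG_A0] = (uint32_t)-1;      /* unsupported system call */
    }
}
```

In this model, the handler writes only `a0` and leaves every other register untouched. An ordinary subroutine such as `printf` gives no such guarantee for caller-saved registers.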
After reading [sw/common/syscall.c](https://github.com/kuopinghsu/srv32/blob/20836e7077f4bd54aeef363fff1e68d03b12ff01/sw/common/syscall.c) and [tools/syscall.c](https://github.com/kuopinghsu/srv32/blob/20836e7077f4bd54aeef363fff1e68d03b12ff01/tools/syscall.c) in the `srv32` directory, I think the calling convention is the same.

`sw/common/syscall.c`:

```c=71
_write(int file, const void *ptr, size_t len)
{
#if HAVE_SYSCALL
    int res = __internal_syscall(SYS_WRITE, (long)file, (long)ptr, (long)len, 0, 0, 0);
    return res;
#else
    const char *buf = (char*)ptr;
    int i;
    for(i=0; i<len; i++)
        _putchar(buf[i]);
    return len;
#endif
}
```

`tools/syscall.c`:

```c=80
        case SYS_WRITE:
#if 0
            if (a0 == STDOUT) {
                int i;
                for(i=0; i<a2; i++) {
                    char c = ptr[DVA2PA(a1)+i];
                    putchar(c);
                }
                fflush(stdout);
            }
#else
            res = (int)write(a0, (const char*)(&ptr[DVA2PA(a1)]), a2);
#endif
            break;
```

In both implementations, `a0` is the output file (STDOUT), `a1` is the address of the data, and `a2` is the length of the data in bytes. So I directly reused the system-call code from hw2, but it failed. After that, I followed [鄭至崴](https://hackmd.io/@Fo7UsdePRsKPVV4CPYGbpA/rydAP0OSo)'s suggestion and tried `printf` again. After I rebuilt my project, it worked fine. The usage of `printf` is described in [wanghanchi's work](https://hackmd.io/@wanghanchi/rkOjWYqBj?fbclid=IwAR1XR-7Z33EzA4HTW0Vm4fBg33MOs-Vx3NsQtEwQGyz6qljL3WFyOVn6ERo).

There is one thing to pay attention to. `ecall` raises an exception that is serviced by the execution environment, which in my hw2 setup left the GPRs I used intact. `printf`, by contrast, is an ordinary subroutine and, under the calling convention, is free to overwrite caller-saved registers. Therefore `ra` must be saved before the call. The same applies to `t0`-`t6` and `a0`-`a7` whenever their values are needed after `printf` returns; otherwise we may get a wrong result.

### Change main into a subroutine

Because `main` becomes a subroutine called by `_start`, we have to modify it. In homework 2, I placed my code at address `0x00`, but this address should be reserved for the `_start` routine.
I modified my code from:

```assembly
.org 0
.global _start

.set STDOUT, 1
.set SYSEXIT, 93
.set SYSWRITE, 64
.set SYSBRK, 214

.data
arr1:   .word 1, 2, 3   # a[3] = {1, 2, 3}
len1:   .word 3         # array length of a is 3
space:  .byte ' '       # space
nline:  .byte '\n'

.text
_start:
    addi sp, sp, -24    # allocate space for b
    la   s1, arr1
    mv   s2, sp         # base address of b
    ... MY CODE
```

to:

```assembly
.global main

.data
arr1:    .word 1, 2, 3  # a[3] = {1, 2, 3}
len1:    .word 3        # array length of a is 3
nline:   .string "\n"   # NUL-terminated, since printf expects a C string
iformat: .string "%d "

.text
main:
    addi sp, sp, -28    # allocate space for b
    sw   ra, 24(sp)
    la   s1, arr1
    mv   s2, sp         # base address of b
    ... MY CODE
```

The changes are:

- The system call constants are removed because `printf` does not need them.
- The entry label is renamed to `main`, because another `_start` routine already exists and duplicate names should be avoided.
- `.org 0` is removed, because the location of `main` should be determined at link time.
- `main` must follow the RV32 calling convention, so it saves its return address.

The return value must also be set properly: `a0` should be set to 0 before returning if everything went well. Otherwise, whatever value is left in `a0` is passed on as the exit status, and `make` reports an error when we run the program through it:

:::spoiler {state="close"} An error occurs!
```bash
teimeiki@ubuntu:~/srv32$ make concatenation_of_array
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/concatenation_of_array'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o concatenation_of_array.elf concatenation_of_array.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary concatenation_of_array.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary concatenation_of_array.elf dmem.bin
riscv-none-elf-objcopy -O binary concatenation_of_array.elf memory.bin
riscv-none-elf-objdump -d concatenation_of_array.elf > concatenation_of_array.dis
riscv-none-elf-readelf -a concatenation_of_array.elf > concatenation_of_array.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/concatenation_of_array'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'
1 2 3 1 2 3
Excuting 6995 instructions, 9287 cycles, 1.327 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.078 s
Simulation cycles: 9298
Simulation speed : 0.119205 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/concatenation_of_array/concatenation_of_array.elf
1 2 3 1 2 3
Excuting 6995 instructions, 9287 cycles, 1.328 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.003 s
Simulation cycles: 9287
Simulation speed : 2.656 MHz
make[1]: *** [Makefile:48: concatenation_of_array.run] Error 8
make[1]: Leaving directory '/home/teimeiki/srv32/tools'
make: *** [Makefile:119: concatenation_of_array] Error 2
```
:::

With a `li a0, 0` before returning:

:::spoiler {state="close"} No error!
```bash
teimeiki@ubuntu:~/srv32$ make concatenation_of_array
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/concatenation_of_array'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o concatenation_of_array.elf concatenation_of_array.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary concatenation_of_array.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary concatenation_of_array.elf dmem.bin
riscv-none-elf-objcopy -O binary concatenation_of_array.elf memory.bin
riscv-none-elf-objdump -d concatenation_of_array.elf > concatenation_of_array.dis
riscv-none-elf-readelf -a concatenation_of_array.elf > concatenation_of_array.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/concatenation_of_array'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'
1 2 3 1 2 3
Excuting 6996 instructions, 9288 cycles, 1.327 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.075 s
Simulation cycles: 9299
Simulation speed : 0.123987 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/concatenation_of_array/concatenation_of_array.elf
1 2 3 1 2 3
Excuting 6996 instructions, 9288 cycles, 1.328 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.005 s
Simulation cycles: 9288
Simulation speed : 1.974 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```
:::

### Result

Here is
the code after modification:

```assembly=
.global main

.data
arr1:    .word 1, 2, 3  # a[3] = {1, 2, 3}
len1:    .word 3        # array length of a is 3
nline:   .string "\n"   # NUL-terminated, since printf expects a C string
iformat: .string "%d "

.text
main:
    addi sp, sp, -28    # allocate space for b
    sw   ra, 24(sp)
    la   s1, arr1
    mv   s2, sp         # base address of b
    lw   s3, len1
    li   t0, 0          # i = 0
    slli t1, s3, 2      # t1 = len(arr) * 4
for1:
    add  t2, t0, s1     # t2 = address of a[i]
    add  t3, t0, s2     # t3 = address of b[i]
    add  t4, t3, t1     # t4 = address of b[i + len(a)]
    lw   t2, 0(t2)      # t2 = a[i]
    sw   t2, 0(t3)      # b[i] = a[i]
    sw   t2, 0(t4)      # b[i + len(a)] = a[i]
    addi t0, t0, 4
    blt  t0, t1, for1

    li   s4, 0
    slli s5, s3, 3      # s5 = len(arr) * 8
forPrint:
    la   a0, iformat
    add  t2, s2, s4     # t2 = address of b[i]
    lw   a1, 0(t2)
    call printf
    addi s4, s4, 4
    blt  s4, s5, forPrint

    la   a0, nline
    call printf
    lw   ra, 24(sp)
    addi sp, sp, 28
    li   a0, 0          # return 0
    ret
```

The following is the output:

:::spoiler {state="close"} result
```bash
teimeiki@ubuntu:~/srv32$ make concatenation_of_array
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/concatenation_of_array'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o concatenation_of_array.elf concatenation_of_array.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary concatenation_of_array.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary concatenation_of_array.elf dmem.bin
riscv-none-elf-objcopy -O binary concatenation_of_array.elf memory.bin
riscv-none-elf-objdump -d concatenation_of_array.elf > concatenation_of_array.dis
riscv-none-elf-readelf -a concatenation_of_array.elf > concatenation_of_array.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/concatenation_of_array'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'
1 2 3 1 2 3
Excuting 6996 instructions, 9288 cycles, 1.327 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.075 s
Simulation cycles: 9299
Simulation speed : 0.123987 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/concatenation_of_array/concatenation_of_array.elf
1 2 3 1 2 3
Excuting 6996 instructions, 9288 cycles, 1.328 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.005 s
Simulation cycles: 9288
Simulation speed : 1.974 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```
:::

## Optimization of hw2

I found that there is no need to copy the stack pointer into `s2` at line 14. `sp` does not change inside `main` after the prologue, so it can be used directly as the base address of `b`. So I made this change first.
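Before growing the array, a plain C reference for what the program should print can help. The following sketch mirrors the assembly: it sets `b[i] = b[i + n] = a[i]`, then prints every element with `"%d "` followed by one newline. The names `concat_array` and `format_array` are hypothetical. The output is written into a buffer so that it can be compared with the simulator's output.

```c
#include <stddef.h>
#include <stdio.h>

/* Reference model of concatenation_of_array.s:
 * b has length 2n and b[i] = b[i + n] = a[i]. */
void concat_array(const int *a, size_t n, int *b)
{
    for (size_t i = 0; i < n; i++) {
        b[i] = a[i];
        b[i + n] = a[i];
    }
}

/* Format b the way the assembly prints it: "%d " per element, then one
 * newline. Returns the number of characters written (excluding the NUL),
 * or -1 if buf is too small. */
int format_array(const int *b, size_t len, char *buf, size_t size)
{
    int total = 0;
    for (size_t i = 0; i < len; i++) {
        int w = snprintf(buf + total, size - (size_t)total, "%d ", b[i]);
        if (w < 0 || (size_t)(total + w) >= size)
            return -1;  /* output would be truncated */
        total += w;
    }
    if ((size_t)total + 1 >= size)
        return -1;
    buf[total++] = '\n';
    buf[total] = '\0';
    return total;
}
```

For the three-element array `{1, 2, 3}`, this produces `1 2 3 1 2 3 ` plus a newline, matching the line printed by both simulators above.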
I also extended the array to 40 elements so that the loop iterates 40 times and the improvement is more significant. After these modifications, the results are as follows:
:::spoiler {state="close"} with output on screen
```bash
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/concatenation_of_array'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o concatenation_of_array.elf concatenation_of_array.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary concatenation_of_array.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary concatenation_of_array.elf dmem.bin
riscv-none-elf-objcopy -O binary concatenation_of_array.elf memory.bin
riscv-none-elf-objdump -d concatenation_of_array.elf > concatenation_of_array.dis
riscv-none-elf-readelf -a concatenation_of_array.elf > concatenation_of_array.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/concatenation_of_array'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Excuting 71137 instructions, 91987 cycles, 1.293 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish

Simulation statistics
=====================
Simulation time : 10.029 s
Simulation cycles: 91998
Simulation speed : 0.0091732 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/concatenation_of_array/concatenation_of_array.elf
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Excuting 71137 instructions, 91987 cycles, 1.293 CPI
Program terminate

Simulation statistics
=====================
Simulation time : 0.042 s
Simulation cycles: 91987
Simulation speed : 2.200 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```
:::

:::spoiler {state="close"} without output on screen
```bash
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/concatenation_of_array'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o concatenation_of_array.elf concatenation_of_array.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary concatenation_of_array.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary concatenation_of_array.elf dmem.bin
riscv-none-elf-objcopy -O binary concatenation_of_array.elf memory.bin
riscv-none-elf-objdump -d concatenation_of_array.elf > concatenation_of_array.dis
riscv-none-elf-readelf -a concatenation_of_array.elf > concatenation_of_array.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/concatenation_of_array'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'
Excuting 363 instructions, 449 cycles, 1.236 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish

Simulation statistics
=====================
Simulation time : 0.014 s
Simulation cycles: 460
Simulation speed : 0.0328571 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/concatenation_of_array/concatenation_of_array.elf
Excuting 363 instructions, 449 cycles, 1.237 CPI
Program terminate

Simulation statistics
=====================
Simulation time : 0.000 s
Simulation cycles: 449
Simulation speed : 2.314 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```
:::

Then I unrolled the copy loop by a factor of four, dispatching on the remainder `len % 4` before entering it:
```assembly
.global main

.data
arr1:    .word 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40  # a[40] = {1, 2, ..., 40}
len1:    .word 40            # array length of a is 40
nline:   .byte '\n'          # not NUL-terminated (.string "\n" would be)
iformat: .string "%d "

.text
main:
    addi sp, sp, -324        # allocate space for b (80 words) and ra
    sw   ra, 320(sp)
    la   s1, arr1
    lw   s3, len1
    li   t0, 0               # i = 0
    slli t1, s3, 2           # t1 = len(arr) * 4
    andi t2, s3, 0x3         # t2 = len(arr) % 4 (remainder)
    li   t3, 3
    beq  t3, t2, three
    li   t3, 2
    beq  t3, t2, two
    li   t3, 1
    beq  t3, t2, one
for1:                        # i % 4 == 0
    add  t2, t0, s1          # t2 = curr address of a
    add  t3, t0, sp          # t3 = curr address of b
    add  t4, t3, t1          # t4 = curr address of b + len of a
    lw   t2, 0(t2)           # t2 = a[i]
    sw   t2, 0(t3)           # b[i] = a[i]
    sw   t2, 0(t4)           # b[i + len(a)] = a[i]
    addi t0, t0, 4
three:                       # i % 4 == 3
    add  t2, t0, s1          # t2 = curr address of a
    add  t3, t0, sp          # t3 = curr address of b
    add  t4, t3, t1          # t4 = curr address of b + len of a
    lw   t2, 0(t2)           # t2 = a[i]
    sw   t2, 0(t3)           # b[i] = a[i]
    sw   t2, 0(t4)           # b[i + len(a)] = a[i]
    addi t0, t0, 4
two:                         # i % 4 == 2
    add  t2, t0, s1          # t2 = curr address of a
    add  t3, t0, sp          # t3 = curr address of b
    add  t4, t3, t1          # t4 = curr address of b + len of a
    lw   t2, 0(t2)           # t2 = a[i]
    sw   t2, 0(t3)           # b[i] = a[i]
    sw   t2, 0(t4)           # b[i + len(a)] = a[i]
    addi t0, t0, 4
one:                         # i % 4 == 1
    add  t2, t0, s1          # t2 = curr address of a
    add  t3, t0, sp          # t3 = curr address of b
    add  t4, t3, t1          # t4 = curr address of b + len of a
    lw   t2, 0(t2)           # t2 = a[i]
    sw   t2, 0(t3)           # b[i] = a[i]
    sw   t2, 0(t4)           # b[i + len(a)] = a[i]
    addi t0, t0, 4
    blt  t0, t1, for1

    li   s4, 0
    slli s5, s3, 3           # s5 = len(arr) * 8
forPrint:
    la   a0, iformat
    add  t2, sp, s4          # t2 = address of b[i]
    lw   a1, 0(t2)
    call printf
    addi s4, s4, 4
    blt  s4, s5, forPrint

    la   a0, nline
    call printf

    lw   ra, 320(sp)
    addi sp, sp, 324
    li   a0, 0               # return 0
    ret
```

And the results are as follows:
:::spoiler {state="close"} with output on screen
```bash
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/concatenation_of_array'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o concatenation_of_array.elf concatenation_of_array.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary concatenation_of_array.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary concatenation_of_array.elf dmem.bin
riscv-none-elf-objcopy -O binary concatenation_of_array.elf memory.bin
riscv-none-elf-objdump -d concatenation_of_array.elf > concatenation_of_array.dis
riscv-none-elf-readelf -a concatenation_of_array.elf > concatenation_of_array.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/concatenation_of_array'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Excuting 71114 instructions, 91904 cycles, 1.292 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish

Simulation statistics
=====================
Simulation time : 0.976 s
Simulation cycles: 91915
Simulation speed : 0.0941752 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/concatenation_of_array/concatenation_of_array.elf
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Excuting 71114 instructions, 91904 cycles, 1.292 CPI
Program terminate

Simulation statistics
=====================
Simulation time : 0.042 s
Simulation cycles: 91904
Simulation speed : 2.206 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```
:::

:::spoiler {state="close"} without output on screen
```bash
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/concatenation_of_array'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o concatenation_of_array.elf concatenation_of_array.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary concatenation_of_array.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary concatenation_of_array.elf dmem.bin
riscv-none-elf-objcopy -O binary concatenation_of_array.elf memory.bin
riscv-none-elf-objdump -d concatenation_of_array.elf > concatenation_of_array.dis
riscv-none-elf-readelf -a concatenation_of_array.elf > concatenation_of_array.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/concatenation_of_array'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'
Excuting 342 instructions, 368 cycles, 1.076 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish

Simulation statistics
=====================
Simulation time : 0.014 s
Simulation cycles: 379
Simulation speed : 0.0270714 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/concatenation_of_array/concatenation_of_array.elf
Excuting 342 instructions, 368 cycles, 1.076 CPI
Program terminate

Simulation statistics
=====================
Simulation time : 0.000 s
Simulation cycles: 368
Simulation speed : 1.878 MHz
make[1]: Leaving directory '/home/teimeiki/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```
:::

The number of branches executed by the copy loop drops from $n$ to at most $\lceil n/4 \rceil + 3$, where $n$ is the length of the input array: the loop-back `blt` now runs once per four elements, and at most three `beq` are added by the remainder dispatch. The dispatch itself costs one `andi` and up to three `li`.
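The remainder dispatch above is the same idea as Duff's device in C: jump into the middle of an unrolled loop body so the leftover `n % 4` elements are handled on the first pass. The following is a minimal sketch, not the author's code; the helper name `concat_unrolled` is hypothetical. It performs the same copy as the unrolled assembly and also counts the conditional branches the assembly would execute, so the $\lceil n/4 \rceil + 3$ bound can be checked for any $n$. Unlike the assembly, it guards against $n = 0$, for which falling through into `for1` would still copy four elements.

```c
#include <assert.h>

/* Hypothetical helper: copy a[0..n-1] into b[0..n-1] and b[n..2n-1]
 * (b must hold 2*n ints), unrolled by 4 with a remainder dispatch like the
 * assembly. Returns the number of conditional branches the assembly would
 * execute: the beq chain plus one loop-back blt per pass. */
static int concat_unrolled(const int *a, int *b, int n)
{
    assert(n >= 0);
    if (n == 0)
        return 0;                 /* guard that the assembly does not have */

    int rem = n & 3;              /* andi t2, s3, 0x3 */
    /* The beq chain tests 3, then 2, then 1 and stops at the first match. */
    int branches = (rem == 3) ? 1 : (rem == 2) ? 2 : 3;
    int i = 0;

#define COPY_ONE() do { b[i] = a[i]; b[i + n] = a[i]; i++; } while (0)
    switch (rem) {                /* enter the unrolled body mid-way */
    case 0: do { COPY_ONE();      /* for1  */
    case 3:      COPY_ONE();      /* three */
    case 2:      COPY_ONE();      /* two   */
    case 1:      COPY_ONE();      /* one   */
                 branches++;      /* blt t0, t1, for1 */
            } while (i < n);
    }
#undef COPY_ONE
    return branches;
}
```

For $n = 40$ the remainder is 0, so the dispatch costs three `beq` and the body runs ten times, giving $3 + 10 = 13$ branches instead of 40 for the rolled loop.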
Summary of the runs without output on screen (numbers from the ISS `rvsim`):

| | Instruction count | Cycle count | CPI | LOC of copy loop |
| -------- | -------- | -------- | -------- | -------- |
| With loop unrolling | 342 | 368 | 1.076 | 38 |
| Without loop unrolling | 363 | 449 | 1.237 | 10 |

### Observing waveform
In Windows Terminal, run the following command to open the waveform dump in GTKWave:
```bash
.\gtkwave.exe -f ..\hw2.fst
```
Then select the signals you want to observe.

#### Control hazard
![](https://i.imgur.com/MXtr1nU.png)

In this figure, each time a branch (or jump) is taken, two instructions are flushed. This is called the branch penalty, and it is illustrated by the figure [here](https://github.com/sysprog21/srv32/blob/devel/images/branch.svg).

![](https://raw.githubusercontent.com/sysprog21/srv32/fa12dfb668b6ca5bf47cf10b93951ddc50de7369/images/branch.svg)

If the branch is taken, the instructions fetched right after it are on the wrong path, so the two instructions following the branch must be flushed. Branch prediction is not used here: in this srv32 implementation, the destination of every branch and jump is decided in the EXE stage, and until then the fetch stage simply continues with the following instructions.

The same effect is visible in the instruction-fetch waveform:
![](https://i.imgur.com/bSSOjDc.png)

```assembly
dc: 007e2023 sw t2,0(t3)
e0: 007ea023 sw t2,0(t4)
e4: 00428293 addi t0,t0,4
e8: f862c8e3 blt t0,t1,78 <for1>
ec: 00000a13 li s4,0
f0: 00399a93 slli s5,s3,0x3
```

When `fetch_pc` is set to the address of `blt`, the memory returns the instruction (`blt t0,t1,78`) to the CPU one cycle later.

![](https://i.imgur.com/dGCEZeG.png)

When `blt` flows into the EXE stage, since it is a branch instruction, the branch flag is set and `next_pc` is set to the branch destination. `fetch_pc` is updated with `next_pc` in the next cycle; by then, two instructions that should not be executed have already been fetched into the pipeline, so they must be flushed.
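As a back-of-the-envelope check on the summary table, assume that the only stall in the runs without output is this 2-cycle penalty per taken branch or jump (an assumption; other stall sources have not been ruled out). Then the number of taken control transfers $N_{\text{taken}}$ follows from the cycle and instruction counts:

$$
N_{\text{taken}} = \frac{N_{\text{cycles}} - N_{\text{instr}}}{2}, \qquad
\frac{449 - 363}{2} = 43 \ \text{(rolled)}, \qquad
\frac{368 - 342}{2} = 13 \ \text{(unrolled)}
$$

The difference, $43 - 13 = 30$, matches the drop in taken loop-back branches for $n = 40$: the rolled `blt` is taken 39 times, the unrolled one $40/4 - 1 = 9$ times. The remaining 4 taken transfers in each run would then come from code outside the copy loop.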
#### Save and load data
![](https://i.imgur.com/0knpkcz.png)

DMEM:
```assembly
0000000 ffff ffff 0000 0000 ffff ffff 0000 0000
0000010 0001 0000 0002 0000 0003 0000 0004 0000
0000020 0005 0000 0006 0000 0007 0000 0008 0000
0000030 0009 0000 000a 0000 000b 0000 000c 0000
0000040 000d 0000 000e 0000 000f 0000 0010 0000
0000050 0011 0000 0012 0000 0013 0000 0014 0000
0000060 0015 0000 0016 0000 0017 0000 0018 0000
0000070 0019 0000 001a 0000 001b 0000 001c 0000
0000080 001d 0000 001e 0000 001f 0000 0020 0000
0000090 0021 0000 0022 0000 0023 0000 0024 0000
00000a0 0025 0000 0026 0000 0027 0000 0028 0000
00000b0 0028 0000 250a 2064 0000 0000
00000bc
```

CODE:
```assembly
7c: 00228e33 add t3,t0,sp
80: 006e0eb3 add t4,t3,t1
84: 0003a383 lw t2,0(t2)
88: 007e2023 sw t2,0(t3)
8c: 007ea023 sw t2,0(t4)
```

When `lw` reaches the EXE stage, `dmem_rready` is set and `dmem_raddr` is set to the address of the data. After subtracting the offset and dropping the 2 least significant bits, the resulting word address is passed to `raddr`, and the value stored at that address is read into the CPU one cycle later. In this example, the value is `0000 0001` (`a[0]`, which the dump above shows at offset `0x10`).

When `sw` reaches the `WB` stage, the `wready` signal is set.

![](https://i.imgur.com/VvgYjvj.png)

The address calculated by the ALU is passed to `wb_waddr`. After subtracting the offset and dropping the 2 least significant bits, the address is passed to `waddr`, and `wdata` is stored into data memory.

## Select a LeetCode problem with medium difficulty
I selected [longest-substring-without-repeating-characters](https://leetcode.com/problems/longest-substring-without-repeating-characters/description/) as my new problem to implement. Here is the description:
>Given a string s, find the length of the longest substring without repeating characters.
>
>Example 1:
>```
>Input: s = "abcabcbb"
>Output: 3
>Explanation: The answer is "abc", with the length of 3.
>```
>
>Example 2:
>```
>Input: s = "bbbbb"
>Output: 1
>Explanation: The answer is "b", with the length of 1.
>```
>
>Example 3:
>```
>Input: s = "pwwkew"
>Output: 3
>Explanation: The answer is "wke", with the length of 3.
>Notice that the answer must be a substring, "pwke" is a subsequence and not a substring.
>```
>
>Constraints:
>
>* $0 \le$ `s.length` $\le 5 \times 10^{4}$
>* `s` consists of English letters, digits, symbols and spaces.

### Algorithm
```clike
int lengthOfLongestSubstring(char *s)
{
    int map[128]; /* ASCII only uses 7 bits; map[c] = 1 + last index of c, 0 if unseen */
    for (int i = 0; i < 128; i++) {
        map[i] = 0;
    }
    int maxLen = 0, start = 0; /* start: first index of the longest repeat-free substring ending at s[i] */
    for (int i = 0; s[i] != '\0'; i++) {
        if (map[s[i]] > start)  /* s[i] already occurs inside the window */
            start = map[s[i]];  /* move the window just past that occurrence */
        map[s[i]] = i + 1;
        if (i + 1 - start > maxLen)
            maxLen = i + 1 - start;
    }
    return maxLen;
}
```

For an explanation, please consult [this solution](https://leetcode.com/problems/longest-substring-without-repeating-characters/solutions/1737/c-code-in-9-lines/?orderBy=most_votes).

To reduce the size of the sparse array `map`, I modified the constraints to:
>Constraints:
>
>* $0 \le$ `s.length` $\le 255$
>* `s` consists of lowercase English letters.

Because there are only 26 letters now, the size of `map` can be reduced to 26. And since the maximal length is now 255, every value stored in `map` (at most `s.length`) fits in 8 bits, i.e. an `unsigned char` (plain `char` may be signed, which would cap it at 127).

Here is the C code after modification:
```clike
int lengthOfLongestSubstring(char *s)
{
    unsigned char map[26]; /* one slot per lowercase letter; values <= 255 */
    for (int i = 0; i < 26; i++) {
        map[i] = 0;
    }
    int maxLen = 0, start = 0; /* start: first index of the longest repeat-free substring ending at s[i] */
    for (int i = 0; s[i] != '\0'; i++) {
        int c = s[i] - 'a'; /* map 'a'..'z' to 0..25 */
        if (map[c] > start)
            start = map[c];
        map[c] = i + 1;
        if (i + 1 - start > maxLen)
            maxLen = i + 1 - start;
    }
    return maxLen;
}
```

Compared with the original, `map` shrinks to 26 `unsigned char` entries (and the initialization loop to 26 iterations), and each letter is indexed as `s[i] - 'a'` so the index stays inside the smaller array.

### Assembly code
:::info
todo: finish it :(
:::
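While the assembly version is still to be written, a brute-force reference could supply expected outputs to compare it against. The following is a minimal sketch in C under the modified constraints (lowercase letters only); the name `longestBrute` is hypothetical, and the quadratic scan is a deliberately simple check, not the sliding-window algorithm above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical O(n^2) reference: from every start index, extend the
 * substring until a letter repeats, and keep the longest length seen.
 * Assumes s contains only 'a'..'z'. */
static int longestBrute(const char *s)
{
    int n = (int) strlen(s);
    int best = 0;
    for (int i = 0; i < n; i++) {
        bool seen[26] = { false };
        int j = i;
        while (j < n) {
            int c = s[j] - 'a';
            assert(c >= 0 && c < 26);
            if (seen[c])
                break;          /* s[j] repeats inside s[i..j-1] */
            seen[c] = true;
            j++;
        }
        if (j - i > best)
            best = j - i;       /* s[i..j-1] has no repeated letter */
    }
    return best;
}
```

Running both this function and the sliding-window version over the same strings and comparing the results would catch indexing mistakes in `map`.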