
Assignment3: SoftCPU

tags: RISC-V jserv

Before start

While following the steps in lab3 to set up the environment required for this homework, I ran into some problems.

Installation of RISC-V toolchains

The Makefile installs the RISC-V toolchain into /opt/riscv, which requires root privileges because it creates a new directory under /opt.

Running make -j$(nproc) directly raised a permission denied error; running it with sudo fixes it.

sudo make -j$(nproc)

Installation of verilator

When installing Verilator, we have to add an environment variable that stores the Verilator root directory. Lab3 uses the export command, but that only takes effect in the current shell, so after a restart the command has to be typed again. I therefore modified ~/.profile so that the variable is set automatically every time a login shell starts.

if [ -d "$HOME/verilator" ] ; then
  VERILATOR_ROOT="$HOME/verilator"
fi

After adding these lines to ~/.profile, we can source the file and check the value of the variable.

source ~/.profile
echo $VERILATOR_ROOT

You should see the Verilator directory under your home directory, e.g. /home/teimeiki/verilator, where teimeiki is replaced with your user name.

After adding the variable, the instructions say to run ./configure, but the file did not exist. We first have to run autoconf, which generates the configure script from the settings in configure.ac.

These are all the commands I used:

# build riscv toolchain from source code
sudo apt install autoconf automake autotools-dev curl gawk git \
                 build-essential bison flex texinfo gperf libtool patchutils bc git \
                 libmpc-dev libmpfr-dev libgmp-dev gawk zlib1g-dev libexpat1-dev
git clone --recursive https://github.com/riscv/riscv-gnu-toolchain
cd riscv-gnu-toolchain
mkdir -p build && cd build
../configure --prefix=/opt/riscv --with-isa-spec=20191213 \
    --with-multilib-generator="rv32i-ilp32--;rv32im-ilp32--;rv32imac-ilp32--;rv32im_zicsr-ilp32--;rv32imac_zicsr-ilp32--;rv64imac-lp64--;rv64imac_zicsr-lp64--"
sudo make -j$(nproc)
# install some dependent packages
sudo apt install build-essential lcov ccache libsystemc-dev
# get verilator
cd $HOME
git clone https://github.com/verilator/verilator
cd verilator
git checkout stable

Add this into ~/.profile:

if [ -d "$HOME/verilator" ] ; then
  VERILATOR_ROOT="$HOME/verilator"
fi
# to generate ./configure
autoconf

# build verilator
./configure
make

# check whether the installation has problems
make test

Modify ~/.profile again:

if [ -d "$HOME/verilator" ] ; then
  VERILATOR_ROOT="$HOME/verilator"
  PATH="$VERILATOR_ROOT/bin:$PATH"
fi
# Make sure the version of Verilator >= 5.002
verilator --version
# get srv32
git clone https://github.com/sysprog21/srv32

After installing srv32, I ran make all to check whether the environment was set up correctly, and this error occurred:

Vriscv.mk:57: /usr/local/share/verilator/include/verilated.mk: No such file or directory
make[3]: *** No rule to make target '/usr/local/share/verilator/include/verilated.mk'.  Stop.
make[3]: Leaving directory '/home/teimeiki/srv32/sim/sim_cc'
%Error: make -C sim_cc -f Vriscv.mk -j 1 exited with 2
%Error: Command Failed /home/teimeiki/verilator/bin/verilator_bin -O3 -cc -Wall -Wno-STMTDLY -Wno-UNUSED +define+MEMSIZE=128 --trace-fst --Mdir sim_cc --build --exe sim_main.cpp getch.cpp -o sim -f filelist.txt ../rtl/top.v
make[2]: *** [Makefile:45: sim] Error 2
make[2]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: *** [Makefile:118: pi_pthread] Error 2
make[1]: Leaving directory '/home/teimeiki/srv32'
make: *** [Makefile:94: all] Error 1

After reading Vriscv.mk in the directory srv32/sim/sim_cc, I realized that the makefile reassigns $VERILATOR_ROOT. Commenting out that line did not help, because the line is restored automatically whenever make is called. So I tried passing the variable on the make command line:

 make all VERILATOR_ROOT="$VERILATOR_ROOT"

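A variable given on the make command line overrides ordinary assignments inside the makefiles, so VERILATOR_ROOT keeps the specified value even if the .mk scripts reassign it. With that, the build finally works.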

The real cause was that I had not exported the environment variable. After adding export VERILATOR_ROOT and export CROSS_COMPILE to ~/.profile, the problem is solved and there is no need to specify the variable on the command line.

Srv32

Data hazard

Control hazard

The README.md of srv32 states: "Two instructions branch penalty if branch taken, CPI is 1 for other instructions." We can also observe this behavior with a test program.

I wanted to observe this behavior in the waveform, so I wrote the following code:

int main(void) {
    for(int i = 0; i < 1000; i++);
    return 0;
}

It consists only of a for loop that does nothing, which produces a run of successive taken branches. It is compiled with the -O0 optimization flag:

riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32  -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o hello.elf hello.c -lc -lm -lgcc -lsys -T ../common/default.ld

With this program, we can observe in the generated waveform how srv32 behaves when it encounters successive taken branches.

The waveform shows that every time a branch is taken, 2 instructions are flushed.
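The README's statement can be turned into a rough cycle estimate: one cycle per instruction plus two extra cycles for every taken branch. The sketch below is only a back-of-the-envelope model. The function name `estimated_cycles` is hypothetical, and it assumes that taken branches are the only source of extra cycles.

```python
def estimated_cycles(instructions, taken_branches, penalty=2):
    """Rough model: CPI of 1 plus a fixed penalty per taken branch."""
    return instructions + penalty * taken_branches


# A 2-instruction loop body that ends in a taken branch, repeated 10 times:
# 20 instructions and 10 taken branches.
print(estimated_cycles(20, 10))  # prints 40
```

Measured cycle counts for a whole program will be somewhat higher than such an estimate for the loop alone, because the startup code around main also executes instructions and branches.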

Here are 2 assembly implementations of the C code:

First

.text
.global main

main:
    addi sp, sp, -4    # main is called by _start
    sw ra, 0(sp)

    li x25, 100
    li t0, 0
    li t1, 1000
for:
    bge t0, t1, end
    addi t0, t0, 1
    j for

end:
    lw ra, 0(sp)
    addi sp, sp, 4
    ret

result
teimeiki@ubuntu:~/srv32$ make myTest
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/myTest'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32  -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o myTest.elf myTest.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary myTest.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary myTest.elf dmem.bin
riscv-none-elf-objcopy -O binary myTest.elf memory.bin
riscv-none-elf-objdump -d myTest.elf > myTest.dis
riscv-none-elf-readelf -a myTest.elf > myTest.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/myTest'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'

Excuting 3039 instructions, 5049 cycles, 1.661 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish

Simulation statistics
=====================
Simulation time  : 0.03 s
Simulation cycles: 5060
Simulation speed : 0.168667 MHz

make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/myTest/myTest.elf

Excuting 3039 instructions, 5049 cycles, 1.661 CPI
Program terminate

Simulation statistics
=====================
Simulation time  : 0.002 s
Simulation cycles: 5049
Simulation speed : 2.395 MHz

make[1]: Leaving directory '/home/teimeiki/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===

Second

.text
.global main

main:
    addi sp, sp, -4    # main is called by _start
    sw ra, 0(sp)

    li t0, 0
    li t1, 1000
for:
    addi t0, t0, 1
    blt t0, t1, for

    lw ra, 0(sp)
    addi sp, sp, 4
    ret

result
teimeiki@ubuntu:~/srv32$ make myTest
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/myTest'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32  -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o myTest.elf myTest.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary myTest.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary myTest.elf dmem.bin
riscv-none-elf-objcopy -O binary myTest.elf memory.bin
riscv-none-elf-objdump -d myTest.elf > myTest.dis
riscv-none-elf-readelf -a myTest.elf > myTest.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/myTest'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'

Excuting 2040 instructions, 4048 cycles, 1.984 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish

Simulation statistics
=====================
Simulation time  : 0.026 s
Simulation cycles: 4059
Simulation speed : 0.156115 MHz

make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/myTest/myTest.elf

Excuting 2040 instructions, 4048 cycles, 1.984 CPI
Program terminate

Simulation statistics
=====================
Simulation time  : 0.001 s
Simulation cycles: 4048
Simulation speed : 3.393 MHz

make[1]: Leaving directory '/home/teimeiki/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===

Unless we unroll the loop, we cannot avoid the run of taken branches in a for loop, but we can reduce the instruction count by changing how the loop is implemented. The second implementation executes fewer instructions. Note, however, that it is really a do...while loop, so the loop condition is not checked before the first iteration.
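The difference between the two loop shapes only shows up when the bound is not positive. The following sketch mimics both control flows in Python; the function names `for_style` and `do_while_style` are hypothetical and simply count how often the loop body runs.

```python
def for_style(limit):
    """Condition checked at the top, like the first version (bge before the body)."""
    i = 0
    body_runs = 0
    while i < limit:
        i += 1
        body_runs += 1
    return body_runs


def do_while_style(limit):
    """Body runs first, like the second version (blt at the bottom)."""
    i = 0
    body_runs = 0
    while True:
        i += 1
        body_runs += 1
        if not (i < limit):
            break
    return body_runs


print(for_style(1000), do_while_style(1000))  # prints 1000 1000
print(for_style(0), do_while_style(0))        # prints 0 1
```

For the bound of 1000 used here both shapes behave identically, so the shorter loop is a safe replacement in this test program.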

Implementation           Instruction count   Cycle count   CPI
First implementation     3039                5049          1.661
Second implementation    2040                4048          1.984
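The CPI values can be recomputed from the instruction and cycle counts printed in the two simulation logs above. This small check (the helper name `cpi` is my own) also shows that the second implementation has the higher CPI yet finishes in fewer cycles, so CPI alone is not a good measure of which version is faster.

```python
def cpi(instructions, cycles):
    """Cycles per instruction."""
    return cycles / instructions


# (instructions, cycles) as reported in the simulation logs
first = (3039, 5049)
second = (2040, 4048)

print(round(cpi(*first), 3))             # prints 1.661
print(round(cpi(*second), 3))            # prints 1.984
print(first[1] - second[1])              # cycles saved: prints 1001
print(round(first[1] / second[1], 3))    # speedup: prints 1.247
```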

Port hw2 to srv32

Try to build previous work

I tried several ways to run my assembly code on srv32, but at first all of them failed.

I referred to OscarShiang's previous work because it is one of the few detailed reports that use assembly language, and I downloaded the source code from his GitHub to check whether someone else's work could run. I encountered the following issue:

error message
startup.S: Assembler messages:
startup.S:5: Error: unrecognized opcode `csrw mtvec,t0'
startup.S:6: Error: unrecognized opcode `csrrsi zero,mtvec,1'
startup.S:74: Error: unrecognized opcode `csrr t5,mepc'
startup.S:76: Error: unrecognized opcode `csrw mepc,t5'
startup.S:133: Error: unrecognized opcode `csrr t5,mepc'
startup.S:135: Error: unrecognized opcode `csrw mepc,t5'
startup.S:192: Error: unrecognized opcode `csrr t5,mepc'
startup.S:194: Error: unrecognized opcode `csrw mepc,t5'

I modified Makefile.common as suggested in this issue to make it compatible with the current ISA spec version, but the run still failed:

error message
make[1]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/common'
make[2]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/count_bits'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -nostartfiles -nostdlib -L../common -o count_bits.elf count_bits.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary count_bits.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary count_bits.elf dmem.bin
riscv-none-elf-objcopy -O binary count_bits.elf memory.bin
riscv-none-elf-objdump -d count_bits.elf > count_bits.dis
riscv-none-elf-readelf -a count_bits.elf > count_bits.symbol
make[2]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/count_bits'
make[1]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw'
make[1]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sim'
Illegal instruction at PC 0x00000188
Illegal instruction at PC 0x0000018c
Illegal instruction at PC 0x00000190
Illegal instruction at PC 0x00000194
DMEM address fffffd41 out of range
- ../rtl/../testbench/testbench.v:423: Verilog $finish

Simulation statistics
=====================
Simulation time  : 0.013 s
Simulation cycles: 1329
Simulation speed : 0.102231 MHz

make[1]: *** [Makefile:69: count_bits.run] Error 1
make[1]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sim'
make: *** [Makefile:80: count_bits] Error 2

The illegal instructions are in the printf subroutine, which consists of compressed instructions. Because the addresses in the error messages increase by 4 and printf contains the first compressed instruction in the whole program, I think the problem is caused by compressed instructions. After checking the srv32 README again, I enabled compressed instructions with the rv32c=1 flag and tried again, but it still could not run:
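Whether an instruction is compressed can be read from its two lowest bits: in the RISC-V encoding, standard 32-bit instructions end in 0b11, while 16-bit compressed instructions use 0b00, 0b01 or 0b10. The sketch below applies that rule to single halfwords; the helper name `is_compressed` and the two sample values are chosen only for illustration.

```python
def is_compressed(halfword):
    """True if the lowest two bits mark a 16-bit (C extension) encoding."""
    return (halfword & 0b11) != 0b11


# Low halfword of 0xfe810113 (addi sp,sp,-24), a standard 32-bit instruction
print(is_compressed(0x0113))  # prints False
# 0x4501 is the compressed encoding of li a0,0 (c.li)
print(is_compressed(0x4501))  # prints True
```

Running such a check over the halfwords of the disassembly is one way to confirm which routines contain compressed code.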

error message
teimeiki@ubuntu:~/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06$ make count_bits rv32c=1
make rv32c=1 -C sw count_bits
make[1]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/common'
make[2]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/count_bits'
riscv-none-elf-gcc -O0 -Wall -march=rv32imac_zicsr -mabi=ilp32 -nostartfiles -nostdlib -L../common -o count_bits.elf count_bits.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary count_bits.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary count_bits.elf dmem.bin
riscv-none-elf-objcopy -O binary count_bits.elf memory.bin
riscv-none-elf-objdump -d count_bits.elf > count_bits.dis
riscv-none-elf-readelf -a count_bits.elf > count_bits.symbol
make[2]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/count_bits'
make[1]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw'
make  verilator=1 \
                  \
                  rv32c=1 debug=0 -C sim count_bits.run
make[1]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sim'
Illegal instruction at PC 0x00000002

It did not even terminate.

If we disable printf, OscarShiang's code runs normally:

can run, but no output on screen
teimeiki@ubuntu:~/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06$ make count_bits
make[1]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/common'
make[2]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/count_bits'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32 -nostartfiles -nostdlib -L../common -o count_bits.elf count_bits.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary count_bits.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary count_bits.elf dmem.bin
riscv-none-elf-objcopy -O binary count_bits.elf memory.bin
riscv-none-elf-objdump -d count_bits.elf > count_bits.dis
riscv-none-elf-readelf -a count_bits.elf > count_bits.symbol
make[2]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw/count_bits'
make[1]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sw'
make[1]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sim'

Excuting 187 instructions, 257 cycles, 1.374 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish

Simulation statistics
=====================
Simulation time  : 0.009 s
Simulation cycles: 268
Simulation speed : 0.0297778 MHz

make[1]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/sim'
make[1]: Entering directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/tools'
gcc -c -o rvsim.o rvsim.c -O3 -g -Wall
gcc -c -o decompress.o decompress.c -O3 -g -Wall
gcc -c -o syscall.o syscall.c -O3 -g -Wall
gcc -c -o elfread.o elfread.c -O3 -g -Wall
gcc -c -o getch.o getch.c -O3 -g -Wall
gcc -O3 -g -Wall  -o rvsim rvsim.o decompress.o syscall.o elfread.o getch.o
./rvsim --memsize 128 -l trace.log ../sw/count_bits/count_bits.elf

Excuting 187 instructions, 257 cycles, 1.374 CPI
Program terminate

Simulation statistics
=====================
Simulation time  : 0.000 s
Simulation cycles: 257
Simulation speed : 2.705 MHz

make[1]: Leaving directory '/home/teimeiki/ubuntu_MK/srv32-6cc195fb9d42dfc0bd7e9c6721e02d5dba09ab06/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===

The problem is caused by printf. In OscarShiang's work, printf always uses the compressed instruction format, so srv32 breaks when it is built for normal 32-bit instructions. Strangely, setting rv32c=1 did not help.
However, when I tested printf in my own project cloned from srv32, it worked perfectly. So I suspended debugging the previous work and ported my hw2 to srv32 first.

At first, the linker did not link my program with _start; my main function was used as _start instead. Also, the data was appended directly to the text section rather than placed in the data section:

dumped disassembled code and memory
teimeiki@ubuntu:~/srv32$ riscv-none-elf-objdump -d sw/myTestAs/myTest.elf

sw/myTestAs/myTest.elf:     file format elf32-littleriscv


Disassembly of section .text:

00000000 <_start>:
   0:   fe810113                addi    sp,sp,-24
   4:   00000497                auipc   s1,0x0
   8:   08c48493                addi    s1,s1,140 # 90 <arr1>
   c:   00010913                mv      s2,sp
  10:   00000997                auipc   s3,0x0
  14:   08c9a983                lw      s3,140(s3) # 9c <len1>
  18:   00000293                li      t0,0
  1c:   00299313                slli    t1,s3,0x2

00000020 <for1>:
  20:   009283b3                add     t2,t0,s1
  24:   01228e33                add     t3,t0,s2
  28:   006e0eb3                add     t4,t3,t1
  2c:   0003a383                lw      t2,0(t2)
  30:   007e2023                sw      t2,0(t3)
  34:   007ea023                sw      t2,0(t4)
  38:   00428293                addi    t0,t0,4
  3c:   fe62c2e3                blt     t0,t1,20 <for1>
  40:   00000293                li      t0,0
  44:   00399313                slli    t1,s3,0x3
  48:   04000893                li      a7,64

0000004c <forPrint>:
  4c:   005903b3                add     t2,s2,t0
  50:   00001537                lui     a0,0x1
  54:   fff50513                addi    a0,a0,-1 # fff <SYSBRK+0xf29>
  58:   00038593                mv      a1,t2
.
.
.
  7c:   fc62c8e3                blt     t0,t1,4c <forPrint>
  80:   0040006f                j       84 <exit>

00000084 <exit>:
  84:   00000513                li      a0,0
  88:   05d00893                li      a7,93
  8c:   00000073                ecall

00000090 <arr1>:
  90:   00000001                .word   0x00000001
  94:   00000002                .word   0x00000002
  98:   00000003                .word   0x00000003

0000009c <len1>:
  9c:   00000003                .word   0x00000003

000000a0 <space>:
  a0:           Address 0x00000000000000a0 is out of bounds.

  a4:

000000a1 <comma>:
  a1:           Address 0x00000000000000a1 is out of bounds.

  a5:

000000a2 <nline>:
  a2:   000a                    .short  0x000a

teimeiki@ubuntu:~/srv32/sw/myTestAs$ hexdump memory.bin
0000000 0113 fe81 0497 0000 8493 08c4 0913 0001
0000010 0997 0000 a983 08c9 0293 0000 9313 0029
0000020 83b3 0092 8e33 0122 0eb3 006e a383 0003
0000030 2023 007e a023 007e 8293 0042 c2e3 fe62
0000040 0293 0000 9313 0039 0893 0400 03b3 0059
0000050 1537 0000 0513 fff5 8593 0003 0613 0040
0000060 0073 0000 0513 0010 0597 0000 8593 0385
0000070 0613 0010 0073 0000 8293 0042 c8e3 fc62
0000080 006f 0040 0513 0000 0893 05d0 0073 0000
0000090 0001 0000 0002 0000 0003 0000 0003 0000
00000a0 2c20 000a 0000 0000 0000 0000 0000 0000
00000b0 0000 0000 0000 0000 0000 0000 0000 0000
*
0020000 ffff ffff 0000 0000 ffff ffff 0000 0000
0020010

This happened because I first assembled my code with the assembler directly and therefore left out flags such as -nostartfiles and -nostdlib, which the assembler does not recognize. As a result, the link step went wrong.

I modified the Makefile as follows so that the program builds properly:

include ../common/Makefile.common

EXE      = .elf
SRC      = concatenation_of_array.s
CFLAGS  += -L../common
LDFLAGS += -T ../common/default.ld
TARGET   = concatenation_of_array
OUTPUT   = $(TARGET)$(EXE)

.PHONY: all clean

all: $(TARGET)

$(TARGET): $(SRC)
	$(CC) $(CFLAGS) -o $(OUTPUT) $(SRC) $(LDFLAGS) -g
	$(OBJCOPY) -j .text -O binary $(OUTPUT) imem.bin
	$(OBJCOPY) -j .data -O binary $(OUTPUT) dmem.bin
	$(OBJCOPY) -O binary $(OUTPUT) memory.bin
	$(OBJDUMP) -d $(OUTPUT) > $(TARGET).dis
	$(READELF) -a $(OUTPUT) > $(TARGET).symbol

clean:
	$(RM) *.o $(OUTPUT) $(TARGET).dis $(TARGET).symbol [id]mem.bin memory.bin

System call

In srv32, the system call convention seems to be the same as in rv32emu, judging from sw/common/syscall.c and tools/syscall.c in the srv32 directory.
sw/common/syscall.c:

_write(int file, const void *ptr, size_t len) {
#if HAVE_SYSCALL
    int res = __internal_syscall(SYS_WRITE, (long)file, (long)ptr, (long)len, 0, 0, 0);
    return res;
#else
    const char *buf = (char*)ptr;
    int i;
    for(i=0; i<len; i++)
        _putchar(buf[i]);
    return len;
#endif
}

tools/syscall.c:

case SYS_WRITE:
#if 0
    if (a0 == STDOUT) {
        int i;
        for(i=0; i<a2; i++) {
            char c = ptr[DVA2PA(a1)+i];
            putchar(c);
        }
        fflush(stdout);
    }
#else
    res = (int)write(a0, (const char*)(&ptr[DVA2PA(a1)]), a2);
#endif
    break;

In these 2 implementations, a0 is the output file (STDOUT), a1 is the address of the data, and a2 is the length of the data in bytes. So I reused my hw2 code directly to make the system call, but it failed. After that, I followed 鄭至崴's suggestion and tried printf again. After I rebuilt my project, it worked fine.

The usage of printf is described in wanghanchi's work.

There is one thing we should pay attention to. ecall traps into the execution environment, which handles the request without touching our general-purpose registers apart from the return value, whereas printf is an ordinary subroutine and is free to modify the caller-saved registers. ra must be saved before the call, and so must t0-t6 and a0-a7 whenever we still need their values after calling printf. Otherwise we might get a wrong result.

Change main into a subroutine

Because main becomes a subroutine of _start, we have to modify it. In homework 2, I placed main at address 0x00, but this address must be reserved for the _start routine. I modified my code from:

.org 0
.global _start

.set STDOUT, 1
.set SYSEXIT,  93
.set SYSWRITE, 64
.set SYSBRK, 214

.data
arr1: .word 1, 2, 3 # a[3] = {1, 2, 3}
len1: .word 3 # array length of a is 3
space: .byte ' ' # space
nline: .byte '\n'

.text

_start:
    addi sp, sp, -24    # allocate space for b
    la s1, arr1
    mv s2, sp           # base address of b
    ...
    MY CODE

to:

.global main

.data
arr1: .word 1, 2, 3 # a[3] = {1, 2, 3}
len1: .word 3 # array length of a is 3
nline: .byte '\n'
iformat: .string "%d "

.text
main:
    addi sp, sp, -28    # allocate space for b
    sw ra, 24(sp)
    la s1, arr1
    mv s2, sp           # base address of b
    ...
    MY CODE

The syscall constants are removed because printf does not need them. The entry label is renamed to main because another _start already exists, and duplicate names must be avoided. The origin of main should be determined at link time, so .org 0 is removed. Finally, main should follow the RV32 calling convention, so it saves its return address.

Also, the return value must be set properly. a0 should be set to 0 before returning if everything goes well; otherwise an error is passed to make when we run the code with the make command:

An error occurs!
teimeiki@ubuntu:~/srv32$ make concatenation_of_array
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/concatenation_of_array'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32  -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o concatenation_of_array.elf concatenation_of_array.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary concatenation_of_array.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary concatenation_of_array.elf dmem.bin
riscv-none-elf-objcopy -O binary concatenation_of_array.elf memory.bin
riscv-none-elf-objdump -d concatenation_of_array.elf > concatenation_of_array.dis
riscv-none-elf-readelf -a concatenation_of_array.elf > concatenation_of_array.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/concatenation_of_array'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'
1 2 3 1 2 3

Excuting 6995 instructions, 9287 cycles, 1.327 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish

Simulation statistics
=====================
Simulation time  : 0.078 s
Simulation cycles: 9298
Simulation speed : 0.119205 MHz

make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/concatenation_of_array/concatenation_of_array.elf
1 2 3 1 2 3

Excuting 6995 instructions, 9287 cycles, 1.328 CPI
Program terminate

Simulation statistics
=====================
Simulation time  : 0.003 s
Simulation cycles: 9287
Simulation speed : 2.656 MHz

make[1]: *** [Makefile:48: concatenation_of_array.run] Error 8
make[1]: Leaving directory '/home/teimeiki/srv32/tools'
make: *** [Makefile:119: concatenation_of_array] Error 2

With a li a0, 0 before returning:

No error!
teimeiki@ubuntu:~/srv32$ make concatenation_of_array
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/concatenation_of_array'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32  -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o concatenation_of_array.elf concatenation_of_array.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary concatenation_of_array.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary concatenation_of_array.elf dmem.bin
riscv-none-elf-objcopy -O binary concatenation_of_array.elf memory.bin
riscv-none-elf-objdump -d concatenation_of_array.elf > concatenation_of_array.dis
riscv-none-elf-readelf -a concatenation_of_array.elf > concatenation_of_array.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/concatenation_of_array'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'
1 2 3 1 2 3

Excuting 6996 instructions, 9288 cycles, 1.327 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish

Simulation statistics
=====================
Simulation time  : 0.075 s
Simulation cycles: 9299
Simulation speed : 0.123987 MHz

make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/concatenation_of_array/concatenation_of_array.elf
1 2 3 1 2 3

Excuting 6996 instructions, 9288 cycles, 1.328 CPI
Program terminate

Simulation statistics
=====================
Simulation time  : 0.005 s
Simulation cycles: 9288
Simulation speed : 1.974 MHz

make[1]: Leaving directory '/home/teimeiki/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===

Result

Here is the code after modification:

.global main

.data
arr1: .word 1, 2, 3 # a[3] = {1, 2, 3}
len1: .word 3 # array length of a is 3
nline: .byte '\n'
iformat: .string "%d "

.text
main:
    addi sp, sp, -28    # allocate space for b
    sw ra, 24(sp)
    la s1, arr1
    mv s2, sp           # base address of b
    lw s3, len1
    li t0, 0            # i = 0
    slli t1, s3, 2      # t1 = len(arr) * 4
for1:
    add t2, t0, s1      # t2 = curr address of a
    add t3, t0, s2      # t3 = curr address of b
    add t4, t3, t1      # t4 = curr address of b + len of a
    lw t2, 0(t2)        # t2 = a[i]
    sw t2, 0(t3)        # b[i] = a[i]
    sw t2, 0(t4)        # b[i + len(a)] = a[i]
    addi t0, t0, 4
    blt t0, t1, for1

    li s4, 0
    slli s5, s3, 3      # s5 = len(arr) * 8
forPrint:
    la a0, iformat
    add t2, s2, s4      # t2 = address of b[i]
    lw a1, 0(t2)
    call printf
    addi s4, s4, 4
    blt s4, s5, forPrint

    la a0, nline
    call printf

    lw ra, 24(sp)
    addi sp, sp, 28
    li a0, 0            # return 0
    ret

And the following is the output:

result
teimeiki@ubuntu:~/srv32$ make concatenation_of_array
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/concatenation_of_array'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32  -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o concatenation_of_array.elf concatenation_of_array.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary concatenation_of_array.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary concatenation_of_array.elf dmem.bin
riscv-none-elf-objcopy -O binary concatenation_of_array.elf memory.bin
riscv-none-elf-objdump -d concatenation_of_array.elf > concatenation_of_array.dis
riscv-none-elf-readelf -a concatenation_of_array.elf > concatenation_of_array.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/concatenation_of_array'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'
1 2 3 1 2 3

Excuting 6996 instructions, 9288 cycles, 1.327 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish

Simulation statistics
=====================
Simulation time  : 0.075 s
Simulation cycles: 9299
Simulation speed : 0.123987 MHz

make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/concatenation_of_array/concatenation_of_array.elf
1 2 3 1 2 3

Excuting 6996 instructions, 9288 cycles, 1.328 CPI
Program terminate

Simulation statistics
=====================
Simulation time  : 0.005 s
Simulation cycles: 9288
Simulation speed : 1.974 MHz

make[1]: Leaving directory '/home/teimeiki/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
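
For reference, the copy loop in the assembly above corresponds roughly to the following C. This is only a sketch: the function name concat_array is made up for illustration, and the caller is assumed to provide b with room for 2 * n elements.

```c
#include <stddef.h>

/* Hypothetical helper: writes a, followed by a second copy of a, into b.
   b must have room for 2 * n elements. */
void concat_array(const int *a, int *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        int v = a[i];   /* lw t2, 0(t2) */
        b[i] = v;       /* sw t2, 0(t3) */
        b[i + n] = v;   /* sw t2, 0(t4) */
    }
}
```

With a = {1, 2, 3} the buffer b ends up as {1, 2, 3, 1, 2, 3}, which matches the line printed by the simulator.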

Optimization of hw2

I found that there is no need to keep a copy of the stack pointer in s2 (mv s2, sp at line 14), because sp does not change inside main. So I first remove that instruction and use sp directly.

I also extend the array to 40 elements so that the for loop iterates 40 times, which makes the effect of the optimization more visible.

After these modifications, the result is as follows:

with output on screen
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/concatenation_of_array'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32  -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o concatenation_of_array.elf concatenation_of_array.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary concatenation_of_array.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary concatenation_of_array.elf dmem.bin
riscv-none-elf-objcopy -O binary concatenation_of_array.elf memory.bin
riscv-none-elf-objdump -d concatenation_of_array.elf > concatenation_of_array.dis
riscv-none-elf-readelf -a concatenation_of_array.elf > concatenation_of_array.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/concatenation_of_array'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

Excuting 71137 instructions, 91987 cycles, 1.293 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish

Simulation statistics
=====================
Simulation time  : 10.029 s
Simulation cycles: 91998
Simulation speed : 0.0091732 MHz

make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/concatenation_of_array/concatenation_of_array.elf
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

Excuting 71137 instructions, 91987 cycles, 1.293 CPI
Program terminate

Simulation statistics
=====================
Simulation time  : 0.042 s
Simulation cycles: 91987
Simulation speed : 2.200 MHz

make[1]: Leaving directory '/home/teimeiki/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===

without output on screen
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/concatenation_of_array'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32  -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o concatenation_of_array.elf concatenation_of_array.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary concatenation_of_array.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary concatenation_of_array.elf dmem.bin
riscv-none-elf-objcopy -O binary concatenation_of_array.elf memory.bin
riscv-none-elf-objdump -d concatenation_of_array.elf > concatenation_of_array.dis
riscv-none-elf-readelf -a concatenation_of_array.elf > concatenation_of_array.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/concatenation_of_array'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'

Excuting 363 instructions, 449 cycles, 1.236 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish

Simulation statistics
=====================
Simulation time  : 0.014 s
Simulation cycles: 460
Simulation speed : 0.0328571 MHz

make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/concatenation_of_array/concatenation_of_array.elf

Excuting 363 instructions, 449 cycles, 1.237 CPI
Program terminate

Simulation statistics
=====================
Simulation time  : 0.000 s
Simulation cycles: 449
Simulation speed : 2.314 MHz

make[1]: Leaving directory '/home/teimeiki/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===

Next, I unroll the for loop four times. The modified code is as follows:

.global main
.data
arr1: .word 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 # a[40] = {1, 2, ..., 40}
len1: .word 40 # array length of a is 40
nline: .byte '\n'
iformat: .string "%d "



.text

main:
    addi sp, sp, -324    # allocate space for b
    sw ra, 320(sp)
    la s1, arr1
    lw s3, len1
    li t0, 0            # i = 0
    slli t1, s3, 2      # t1 = len(arr) * 4

    andi t2, s3, 0x3    # t2 = remainder (len % 4)
    li t3, 3
    beq t3, t2, three
    li t3, 2
    beq t3, t2, two
    li t3, 1
    beq t3, t2, one
for1:
    add t2, t0, s1      # t2 = curr address of a
    add t3, t0, sp      # t3 = curr address of b
    add t4, t3, t1      # t4 = curr address of b + len of a
    lw t2, 0(t2)        # t2 = a[i]
    sw t2, 0(t3)        # b[i] = a[i]
    sw t2, 0(t4)        # b[i + len(a)] = a[i]
    addi t0, t0, 4
three: # entry point when len % 4 == 3
    add t2, t0, s1      # t2 = curr address of a
    add t3, t0, sp      # t3 = curr address of b
    add t4, t3, t1      # t4 = curr address of b + len of a
    lw t2, 0(t2)        # t2 = a[i]
    sw t2, 0(t3)        # b[i] = a[i]
    sw t2, 0(t4)        # b[i + len(a)] = a[i]
    addi t0, t0, 4
two: # entry point when len % 4 == 2
    add t2, t0, s1      # t2 = curr address of a
    add t3, t0, sp      # t3 = curr address of b
    add t4, t3, t1      # t4 = curr address of b + len of a
    lw t2, 0(t2)        # t2 = a[i]
    sw t2, 0(t3)        # b[i] = a[i]
    sw t2, 0(t4)        # b[i + len(a)] = a[i]
    addi t0, t0, 4
one: # entry point when len % 4 == 1
    add t2, t0, s1      # t2 = curr address of a
    add t3, t0, sp      # t3 = curr address of b
    add t4, t3, t1      # t4 = curr address of b + len of a
    lw t2, 0(t2)        # t2 = a[i]
    sw t2, 0(t3)        # b[i] = a[i]
    sw t2, 0(t4)        # b[i + len(a)] = a[i]
    addi t0, t0, 4
    blt t0, t1, for1

    li s4, 0
    slli s5, s3, 3      # s5 = len(arr) * 8
forPrint:
    la a0, iformat
    add t2, sp, s4      # t2 = address of b[i]
    lw a1, 0(t2)
    call printf

    addi s4, s4, 4
    blt s4, s5, forPrint
    la a0, nline
    call printf

    lw ra, 320(sp)
    addi sp, sp, 324
    li a0, 0            # return 0
    ret

The result is as follows:

with output on screen
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/concatenation_of_array'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32  -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o concatenation_of_array.elf concatenation_of_array.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary concatenation_of_array.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary concatenation_of_array.elf dmem.bin
riscv-none-elf-objcopy -O binary concatenation_of_array.elf memory.bin
riscv-none-elf-objdump -d concatenation_of_array.elf > concatenation_of_array.dis
riscv-none-elf-readelf -a concatenation_of_array.elf > concatenation_of_array.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/concatenation_of_array'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

Excuting 71114 instructions, 91904 cycles, 1.292 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish

Simulation statistics
=====================
Simulation time  : 0.976 s
Simulation cycles: 91915
Simulation speed : 0.0941752 MHz

make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/concatenation_of_array/concatenation_of_array.elf
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

Excuting 71114 instructions, 91904 cycles, 1.292 CPI
Program terminate

Simulation statistics
=====================
Simulation time  : 0.042 s
Simulation cycles: 91904
Simulation speed : 2.206 MHz

make[1]: Leaving directory '/home/teimeiki/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===

without output on screen
make[1]: Entering directory '/home/teimeiki/srv32/sw'
make -C common
make[2]: Entering directory '/home/teimeiki/srv32/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/teimeiki/srv32/sw/common'
make[2]: Entering directory '/home/teimeiki/srv32/sw/concatenation_of_array'
riscv-none-elf-gcc -O0 -Wall -march=rv32im_zicsr -mabi=ilp32  -misa-spec=2.2 -march=rv32im -nostartfiles -nostdlib -L../common -o concatenation_of_array.elf concatenation_of_array.s -lc -lm -lgcc -lsys -T ../common/default.ld -g
riscv-none-elf-objcopy -j .text -O binary concatenation_of_array.elf imem.bin
riscv-none-elf-objcopy -j .data -O binary concatenation_of_array.elf dmem.bin
riscv-none-elf-objcopy -O binary concatenation_of_array.elf memory.bin
riscv-none-elf-objdump -d concatenation_of_array.elf > concatenation_of_array.dis
riscv-none-elf-readelf -a concatenation_of_array.elf > concatenation_of_array.symbol
make[2]: Leaving directory '/home/teimeiki/srv32/sw/concatenation_of_array'
make[1]: Leaving directory '/home/teimeiki/srv32/sw'
make[1]: Entering directory '/home/teimeiki/srv32/sim'

Excuting 342 instructions, 368 cycles, 1.076 CPI
Program terminate
- ../rtl/../testbench/testbench.v:434: Verilog $finish

Simulation statistics
=====================
Simulation time  : 0.014 s
Simulation cycles: 379
Simulation speed : 0.0270714 MHz

make[1]: Leaving directory '/home/teimeiki/srv32/sim'
make[1]: Entering directory '/home/teimeiki/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/concatenation_of_array/concatenation_of_array.elf

Excuting 342 instructions, 368 cycles, 1.076 CPI
Program terminate

Simulation statistics
=====================
Simulation time  : 0.000 s
Simulation cycles: 368
Simulation speed : 1.878 MHz

make[1]: Leaving directory '/home/teimeiki/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===

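The number of branch instructions executed in the for loop is reduced from $n$ to $n/4 + 3$, where $n$ is the length of the input array: blt is executed once per four elements, and at most 3 beq are executed to dispatch on the remainder. In exchange, 3 additional li and 1 andi are needed.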

| | instruction count | cycle count | CPI | LOC of for loop |
| --- | --- | --- | --- | --- |
| with loop unrolling | 342 | 368 | 1.076 | 38 |
| without loop unrolling | 363 | 449 | 1.237 | 10 |
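
The structure of the unrolled loop can also be written in C. The sketch below is an illustration rather than the code that was simulated: the name concat_unrolled and the returned counter are assumptions, and the guard for n <= 0 is an addition that the assembly does not have. Like the assembly, it jumps into the middle of the loop body according to n % 4.

```c
/* Hypothetical C counterpart of the unrolled loop. Copies a into b twice
   and returns how many times the loop-back branch (the blt) is evaluated.
   b must have room for 2 * n elements. */
int concat_unrolled(const int *a, int *b, int n) {
    int i = 0;
    int loop_branches = 0;
    if (n <= 0)                 /* guard added for the sketch only */
        return 0;
    switch (n % 4) {            /* andi t2, s3, 0x3 plus the three beq */
    case 0:
        do {
            b[i] = a[i]; b[i + n] = a[i]; i++;   /* for1  */
    case 3:
            b[i] = a[i]; b[i + n] = a[i]; i++;   /* three */
    case 2:
            b[i] = a[i]; b[i + n] = a[i]; i++;   /* two   */
    case 1:
            b[i] = a[i]; b[i + n] = a[i]; i++;   /* one   */
            loop_branches++;
        } while (i < n);        /* blt t0, t1, for1 */
    }
    return loop_branches;
}
```

For n = 40 the function returns 10, compared with 40 evaluations of the branch in the loop that is not unrolled.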

Observing waveform

In a Windows terminal, enter this line to open the waveform with GTKWave.

.\gtkwave.exe -f ..\hw2.fst

Then select the signals you want to observe.

Control hazard

In this figure, we can see that each time a branch (or jump) is taken, two instructions are flushed. This is called the branch penalty, and it is illustrated by the figure here.

If the branch is taken, the instructions fetched after it are wrong, so the 2 instructions following the branch must be flushed. Branch prediction does not help because, in this srv32 implementation, the destinations of all branches and jumps are decided at the EXE stage.

We can observe this in the instruction-fetch waveform too.

  dc:   007e2023            sw  t2,0(t3)
  e0:   007ea023            sw  t2,0(t4)
  e4:   00428293            addi    t0,t0,4
  e8:   f862c8e3            blt t0,t1,78 <for1>
  ec:   00000a13            li  s4,0
  f0:   00399a93            slli    s5,s3,0x3

When fetch_pc is set to the address of blt, the memory passes the instruction (blt t0,t1,78) to the RISC-V CPU after 1 cycle.

When blt reaches the EXE stage, because it is a branch-type instruction, the branch flag is set and next_pc is set to the branch destination. fetch_pc is updated with next_pc in the next cycle. In the meantime, 2 instructions that should not be executed have already been fetched into the pipeline, so they need to be flushed.
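
The effect of the penalty on the cycle count can be estimated with a very simple model. The sketch below assumes one cycle per instruction plus two extra cycles per taken branch or jump; this model and the function names are assumptions for illustration and are not taken from the srv32 sources.

```c
/* Simple cycle model (an assumption): every instruction takes one cycle,
   and every taken branch or jump adds a fixed penalty. */
#define BRANCH_PENALTY 2

long estimate_cycles(long instructions, long taken_branches) {
    return instructions + BRANCH_PENALTY * taken_branches;
}

/* Inverse of the model: taken branches implied by the measured counts. */
long implied_taken_branches(long instructions, long cycles) {
    return (cycles - instructions) / BRANCH_PENALTY;
}
```

Under this model, the counts reported earlier without screen output (363 instructions in 449 cycles without unrolling, 342 instructions in 368 cycles with unrolling) would correspond to 43 and 13 taken branches or jumps respectively.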

Save and load data


DMEM:

0000000 ffff ffff 0000 0000 ffff ffff 0000 0000
0000010 0001 0000 0002 0000 0003 0000 0004 0000
0000020 0005 0000 0006 0000 0007 0000 0008 0000
0000030 0009 0000 000a 0000 000b 0000 000c 0000
0000040 000d 0000 000e 0000 000f 0000 0010 0000
0000050 0011 0000 0012 0000 0013 0000 0014 0000
0000060 0015 0000 0016 0000 0017 0000 0018 0000
0000070 0019 0000 001a 0000 001b 0000 001c 0000
0000080 001d 0000 001e 0000 001f 0000 0020 0000
0000090 0021 0000 0022 0000 0023 0000 0024 0000
00000a0 0025 0000 0026 0000 0027 0000 0028 0000
00000b0 0028 0000 250a 2064 0000 0000
00000bc

CODE:

  7c:   00228e33            add t3,t0,sp
  80:   006e0eb3            add t4,t3,t1
  84:   0003a383            lw  t2,0(t2)
  88:   007e2023            sw  t2,0(t3)
  8c:   007ea023            sw  t2,0(t4)

When lw reaches the EXE stage, dmem_rready is set and dmem_raddr is set to the address of the data. After the offset is subtracted and the 2 least significant bits are truncated, the result is passed to raddr. The value stored at that address is read into the CPU one cycle later. In this example, the value is 0000 0001.

When sw reaches the WB stage, the wready signal is set.

The result calculated by the ALU is passed to wb_waddr. After the offset is subtracted and the 2 least significant bits are truncated, the address is passed to waddr, and wdata is stored into data memory.
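
The address translation described above can be sketched in C as follows. The base address DMEM_BASE is a hypothetical value chosen for illustration (the real offset depends on the srv32 memory map), and the function name is also an assumption.

```c
#include <stdint.h>

/* Hypothetical base address of the data memory; the actual value
   depends on the srv32 memory map and is only assumed here. */
#define DMEM_BASE 0x00020000u

/* Convert a byte address issued by the CPU into a word index in DMEM:
   subtract the base offset, then drop the two least significant bits. */
uint32_t dmem_word_index(uint32_t byte_addr) {
    return (byte_addr - DMEM_BASE) >> 2;
}
```

Assuming the DMEM dump above starts at the base address, the first element of arr1 sits at byte offset 0x10, so it maps to word index 4.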

Select a LeetCode problem with medium difficulty

I select longest-substring-without-repeating-characters as the new problem to implement.

Here is the description:

Given a string s, find the length of the longest substring without repeating characters.
Example 1:

Input: s = "abcabcbb"
Output: 3
Explanation: The answer is "abc", with the length of 3.

Example 2:

Input: s = "bbbbb"
Output: 1
Explanation: The answer is "b", with the length of 1.

Example 3:

Input: s = "pwwkew"
Output: 3
Explanation: The answer is "wke", with the length of 3.
Notice that the answer must be a substring, "pwke" is a subsequence and not a substring.

Constraints:

  • 0 <= s.length <= 5 * 10^4
  • s consists of English letters, digits, symbols and spaces.

Algorithm

int lengthOfLongestSubstring(char *s) {
    int map[128];    
    /* ASCII only use 7 bits to store */
    for (int  i = 0; i < 128; i++) {
        map[i] = 0;
    }
    
    int maxLen = 0, start = 0;    
    /* start is the start point of longest substring containing s[i] */
    
    for (int i = 0; s[i] != '\0'; i++){
        if (map[s[i]] > start)
            start = map[s[i]];
        map[s[i]] = i + 1;
        if (i + 1 - start > maxLen)
            maxLen = i + 1 - start;
    }
    return maxLen;
}

For explanation, please consult this solution.

I want to reduce the size of the sparse array map, so I modify the constraints to:

Constraints:

  • 0 <= s.length <= 255
  • s consists of lower case English letters.

Because there are only 26 letters now, the size of map can be reduced to 26. The maximum length is now 255, so each entry fits in 8 bits, which is an unsigned char (a plain char may be signed and would then only hold values up to 127). Here is the C code after modification:

int lengthOfLongestSubstring(char *s) {
    /* map[c] holds (last index of letter c) + 1, or 0 if c has not been seen */
    unsigned char map[26];

    for (int i = 0; i < 26; i++) {
        map[i] = 0;
    }

    int maxLen = 0, start = 0;
    /* start is the start point of longest substring containing s[i] */

    for (int i = 0; s[i] != '\0'; i++) {
        int c = s[i] - 'a';    /* map 'a'..'z' to 0..25 */
        if (map[c] > start)
            start = map[c];
        map[c] = i + 1;
        if (i + 1 - start > maxLen)
            maxLen = i + 1 - start;
    }
    return maxLen;
}

The modifications are the type and size of map, the bound of the initialization loop, and the index into map, which is now s[i] - 'a'.
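
To check the reduced-size version against the three examples from the problem description, a stand-alone sketch like the following can be used. It restates the 26-entry variant so that the block compiles on its own; the name longest_unique_lower is hypothetical, and the input is assumed to contain only lower case letters.

```c
/* Hypothetical stand-alone version of the 26-entry variant.
   Assumes every character of s is in 'a'..'z' and strlen(s) <= 255. */
int longest_unique_lower(const char *s) {
    unsigned char last[26] = {0};   /* (last index of each letter) + 1 */
    int max_len = 0, start = 0;
    for (int i = 0; s[i] != '\0'; i++) {
        int c = s[i] - 'a';         /* map 'a'..'z' to 0..25 */
        if (last[c] > start)
            start = last[c];        /* window must begin after the repeat */
        last[c] = (unsigned char)(i + 1);
        if (i + 1 - start > max_len)
            max_len = i + 1 - start;
    }
    return max_len;
}
```

For the inputs "abcabcbb", "bbbbb" and "pwwkew" this returns 3, 1 and 3, which agrees with the expected outputs listed in the examples.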

Assembly code

todo: finish it :(