RISC-V
jserv
While following the steps in lab3 to set up the environment required for the homework, I encountered some problems.
In the Makefile, we can see that the RISC-V toolchain is installed in `/opt/riscv`, which may require root privileges because it creates a new directory in a root-owned location.
Running `make -j$(nproc)` directly raised a permission denied error; running it with `sudo` fixes it.
When installing Verilator, we have to add a new environment variable that stores the Verilator root directory. Lab3 uses the `export` command, but it only takes effect in the current shell; after a restart we would have to retype it. So I modified `~/.profile` so that the command runs automatically whenever a new login shell starts.
After adding these lines to `~/.profile`, we can `source` it and check the value of the environment variable.
You should see the following result, where `teimeiki` will be replaced with your user name:
After adding the environment variable, the instructions say to run `./configure`, but it turned out that the file doesn't exist. We should first run `autoconf` to generate the `configure` file with the default settings in `configure.ac`.
The following are all the commands I used:
Add this to `~/.profile`:
Modify `~/.profile` again:
After installing srv32, I ran `make all` to check whether the environment was set up correctly, and this error occurred:
After reading `Vriscv.mk` in the directory `srv32/sim/sim_cc`, I realized that the value of `$VERILATOR_ROOT` is changed in the makefile, and commenting out the line didn't help because the line is restored automatically whenever we run `make`. So I tried setting the variable on the command line when calling `make`:
A value of `VERILATOR_ROOT` given on the command line takes precedence even if the mk scripts reassign it, so the build finally works.
The real reason is that I hadn't exported the environment variables. After I added `export VERILATOR_ROOT` and `export CROSS_COMPILE` to `~/.profile`, the problem was solved and there is no need to specify the variables on the command line.
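Exporting matters because `make` runs as a child process of the shell and only inherits exported variables. The following is a minimal sketch of how a child process could check what it actually sees; the helper name `env_is_visible` is hypothetical and not part of srv32 or Verilator.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of srv32): returns 1 if `name` is present
 * in this process's environment, 0 otherwise. A variable that was only
 * assigned in the parent shell, without `export`, is not inherited, so
 * getenv() returns NULL for it. */
int env_is_visible(const char *name)
{
    assert(name != NULL);
    const char *value = getenv(name);
    if (value == NULL) {
        printf("%s is not visible to this process\n", name);
        return 0;
    }
    printf("%s=%s\n", name, value);
    return 1;
}
```

Calling `env_is_visible("VERILATOR_ROOT")` from a program started in a shell where the variable was assigned but not exported should return 0; after `export VERILATOR_ROOT`, it should return 1.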
The `README.md` file of srv32 states: "Two instructions branch penalty if branch taken, CPI is 1 for other instructions." We can also see this phenomenon with a test program.
I want to check the waveform to observe this phenomenon, so I introduced the following code:
It consists only of a for loop that does nothing, which produces a run of successive taken conditional branches. It is compiled with the `-O0` optimization flag:
With this program, we can observe the behavior of srv32 when it encounters successive conditional branches. The following waveform is generated:
We can see that every time a branch is taken, 2 instructions are flushed.
Here are 2 implementations of the C code:
We cannot avoid a burst of branches in a for loop unless we use loop unrolling, but we can reduce the instruction count by changing the implementation. The second implementation has a lower instruction count, but it is actually a `do...while` implementation, so the loop bound of 1000 is not checked before the first iteration.
implementation | instruction count | cycle count | CPI |
---|---|---|---|
first implementation | 3090 | 5049 | 1.661 |
second implementation | 2040 | 4048 | 1.984 |
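The difference between the two loop shapes can be illustrated with a small sketch. It is not the code measured above; the function names are made up for illustration, and each function only counts how many times the loop body runs.

```c
#include <assert.h>
#include <limits.h>

/* `for` loop: the bound is tested before the first iteration,
 * so the body may run zero times. */
int body_runs_for(int start, int limit)
{
    int runs = 0;
    for (int i = start; i < limit; i++)
        runs++;
    return runs;
}

/* `do...while` loop: the bound is tested only after the body,
 * so the body always runs at least once. */
int body_runs_do_while(int start, int limit)
{
    assert(start < INT_MAX); /* the body always runs once, so i++ must not overflow */
    int runs = 0;
    int i = start;
    do {
        runs++;
        i++;
    } while (i < limit);
    return runs;
}
```

Both functions agree when `start` is below `limit`; they differ only when the condition is already false on entry, where the `do...while` form still executes the body once.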
I tried several ways to run my assembly code on srv32, but all of them failed at first.
I referred to OscarShiang's previous work, because it is one of the few detailed reports that use assembly language, and downloaded the source code from his GitHub to check whether others' work can run. I encountered the following issue:
I modified `Makefile.common` as suggested in this issue to make it compatible with the current ISA spec version, but it failed too:
The illegal instruction is in the `printf` subroutine and it is a compressed instruction. Because the address in the error message increases by 4 and `printf` contains the first compressed instruction in the whole program, I think the problem might be caused by compressed instructions. After checking srv32's README again, I enabled compressed instructions with the `rv32c=1` flag and tried again, but it still could not run:
It does not even terminate.
If we disable `printf`, OscarShiang's code runs normally:
The problem is caused by `printf`. In OscarShiang's work, `printf` always uses the compressed instruction format, so when we use the normal 32-bit instruction format, srv32 breaks. The weird thing is that setting `rv32c=1` didn't help.
But when I tested `printf` in my own project cloned from srv32, it turned out that `printf` works perfectly. So I suspended debugging the previous work and ported my hw2 to srv32 first.
At first, the linker did not link my program with `_start`; my main function was used as `_start` instead. Also, the data was appended directly to the text section rather than placed in the data section:
This is because I used the assembler directly to assemble my code at first, and therefore forgot to set flags such as `-nostartfiles` and `-nostdlib`, which the assembler does not recognize. So something went wrong when linking.
I modified the `Makefile` as follows and the program now builds properly:
In srv32, the calling convention seems to be the same as in rv32emu. After looking at `sw/common/syscall.c` and `tools/syscall.c` under the `srv32` directory, I think the calling convention should be the same.
`sw/common/syscall.c`:
`tools/syscall.c`:
In these 2 implementations, a0 is the output file (STDOUT), a1 is the address of the data and a2 is the length of the data in bytes. So I directly used my hw2 code to make the system call, but it failed. After that, I followed 鄭至崴's suggestion and tried `printf` again. After I rebuilt my project, it worked fine.
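The a0/a1/a2 argument order described above mirrors the POSIX `write` call on a host machine. As a rough analogy only (this is not srv32 code, and the wrapper name `emit_bytes` is hypothetical), the same three arguments look like this in C:

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical wrapper for illustration: the parameters follow the same
 * order as the registers described above.
 *   fd  <-> a0 (output file, e.g. STDOUT_FILENO)
 *   buf <-> a1 (address of the data)
 *   len <-> a2 (length of the data in bytes)
 * Returns the number of bytes written, or -1 on error. */
ssize_t emit_bytes(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);
    return write(fd, buf, len);
}
```

For example, `emit_bytes(STDOUT_FILENO, "ok\n", 3)` writes three bytes to standard output on a POSIX host.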
The usage of `printf` is described in wanghanchi's work.
There is one thing we should pay attention to. `ecall` raises an exception, so the registers it uses are separate from the GPRs, whereas `printf` is a subroutine and will modify our GPRs. `ra` should be saved before the function call, and so should `t0`-`t6` and `a0`-`a7` if we need them after calling `printf`. Otherwise we might get a wrong result.
Because the main function becomes a subroutine of `_start`, we have to modify it. In homework 2, I placed main at address `0x00`, but this address should be reserved for the `_start` routine. I modified my code from:
to:
The syscall constants are removed because `printf` doesn't need them. The name is changed because there is another `_start` function, so it is better to avoid duplicated names. The origin of `main` should be determined at link time, so `.org 0` is removed. The main function should follow the RV32 calling convention, so it saves its return address.
Also, the return value should be set properly. `a0` should be set to 0 before returning if everything goes well; otherwise an error will be passed to `make` if we execute the code using the `make` command:
With a `li a0, 0` before returning:
Here is the code after modification:
And the following is the output:
I found that there is no need to store the value of the stack pointer at line 14, because it does not change in the main function. So I removed that first.
I also extended the array so that the for loop iterates 40 times, which makes the improvement more significant.
After modification, the result is as follows:
I modified the code as follows:
And the result is as follows:
The number of branch instructions executed in the for loop is reduced (the count depends on n, the length of the input array), at the cost of 3 additional `lw` and 1 `andi`.
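To make the trade-off concrete, here is a C sketch of loop unrolling. It is not the assembly measured in this report: the unroll factor of 4 and the array-sum workload are assumptions chosen for illustration, and the function name `sum_unrolled4` is made up.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: sums `n` ints with the loop body unrolled 4 times
 * (assumed factor). The unrolled loop takes one backward branch per
 * 4 elements instead of one per element, at the cost of extra loads in
 * the body and a mask to find the leftover elements. */
int sum_unrolled4(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    int sum = 0;
    size_t tail = n & 3;   /* leftover elements, like an `andi` with 3 */
    size_t i = 0;

    for (; i < n - tail; i += 4)   /* 4 loads per iteration */
        sum += a[i] + a[i + 1] + a[i + 2] + a[i + 3];

    for (; i < n; i++)             /* handle the remaining 0..3 elements */
        sum += a[i];

    return sum;
}
```

The second loop keeps the function correct when the length is not a multiple of the unroll factor.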
version | instruction count | cycle count | CPI | LOC of for loop |
---|---|---|---|---|
with loop unrolling | 342 | 368 | 1.076 | 38 |
without loop unrolling | 363 | 449 | 1.237 | 10 |
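The CPI column is simply the cycle count divided by the instruction count. A minimal sketch of that arithmetic (the function name `cpi` is only illustrative):

```c
#include <assert.h>

/* CPI = cycle count / instruction count. */
double cpi(unsigned long cycles, unsigned long instructions)
{
    assert(instructions > 0); /* avoid division by zero */
    return (double)cycles / (double)instructions;
}
```

With the numbers from the table, `cpi(368, 342)` is about 1.076 and `cpi(449, 363)` is about 1.237.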
In Windows Terminal, enter this line to use GTKWave.
Then select the signals you want to observe.
In this figure, we can see that each time a branch (or jump) is taken, two instructions are flushed. This is called the branch penalty, and it is illustrated by the figure here.
The instructions fetched are wrong if the branch is taken, so the 2 instructions following the branch must be flushed. Branch prediction does not help because, in this srv32 implementation, the destinations of all branches and jumps are decided at the EXE stage.
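The branch penalty can be turned into a rough cycle estimate. The sketch below is a simplified model built on the README statement (1 cycle per instruction, 2 extra cycles per taken branch or jump); it deliberately ignores pipeline fill and any other stalls, and the names `BRANCH_PENALTY` and `estimate_cycles` are made up for illustration.

```c
#include <assert.h>

/* Flushed instructions per taken branch or jump, per the srv32 README. */
enum { BRANCH_PENALTY = 2 };

/* Simplified model: every instruction costs one cycle, and each taken
 * branch or jump wastes BRANCH_PENALTY additional cycles.
 * Pipeline fill and other stalls are not modeled. */
unsigned long estimate_cycles(unsigned long instructions,
                              unsigned long taken_branches)
{
    assert(taken_branches <= instructions);
    return instructions + BRANCH_PENALTY * taken_branches;
}
```

For instance, 100 instructions with 10 taken branches give an estimate of 120 cycles; measured cycle counts may differ slightly because of the effects the model leaves out.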
We can observe it in the instruction fetching waveform too.
When `fetch_pc` is set to the address of `blt`, the memory passes the instruction (`blt t0,t1,78`) to the RISC-V CPU after 1 cycle.
When `blt` reaches the EXE stage, because it is a branch-type instruction, the branch flag is set and `next_pc` is set to the branch destination. `fetch_pc` is updated with `next_pc` in the next cycle; in the meantime, 2 instructions that shouldn't be executed have been fetched into the pipeline, so they need to be flushed.
DMEM:
CODE:
When `lw` reaches the EXE stage, `dmem_rready` is set and `dmem_raddr` is set to the address of the data. After subtracting the offset and truncating the least significant 2 bits, the result is passed to `raddr`. The value stored at that address is read into the CPU after one cycle. In this example, the value is `0000 0001`.
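The address translation described above can be sketched in C. This assumes that "truncating the least significant 2 bits" means dropping them to obtain a word index; the base value `HYPOTHETICAL_DMEM_BASE` is invented for illustration and is not taken from the srv32 sources.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up base address for illustration only; check the srv32 sources
 * for the real data memory offset. */
#define HYPOTHETICAL_DMEM_BASE 0x1000u

/* Converts a byte address into a word index inside data memory:
 * subtract the memory offset, then drop the two least significant bits
 * (each word is 4 bytes). */
uint32_t dmem_word_index(uint32_t byte_addr)
{
    assert(byte_addr >= HYPOTHETICAL_DMEM_BASE);
    return (byte_addr - HYPOTHETICAL_DMEM_BASE) >> 2;
}
```

Under these assumptions, byte addresses `0x1004` through `0x1007` all map to word index 1.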
When `sw` reaches the `WB` stage, the `wready` signal is set.
The result calculated by the ALU is passed to `wb_waddr`. After subtracting the offset and truncating the least significant 2 bits, the address is passed to `waddr` to store `wdata` into data memory.
I selected longest-substring-without-repeating-characters as my new problem to implement.
Here is the description:
Given a string s, find the length of the longest substring without repeating characters.
Example 1:
Example 2:
Example 3:
Constraints:
- <= `s.length` <=
- `s` consists of English letters, digits, symbols and spaces.
For an explanation, please consult this solution.
I want to reduce the size of the sparse array `map`, so I modified the constraints to:
Constraints:
- <= `s.length` <=
- `s` consists of lower case English letters.
Because there are only 26 letters now, the size of `map` can be reduced to 26. The maximal length is now 255, so 8 bits are enough to store it, which fits in a `char` (an `unsigned char`, to hold values up to 255). Here is the C code after modification:
The only modification is the assignment of `map`.
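As a reference point, here is a minimal sliding-window sketch under the modified constraints (lowercase letters only, length at most 255). It is an illustration rather than the exact code of this report; the function name and the choice to store "last index + 1" in `map` are assumptions.

```c
#include <assert.h>
#include <string.h>

/* Sliding window over `s`. map[c] holds (last index of letter c) + 1,
 * or 0 if the letter has not been seen yet. Because the length is at
 * most 255, every stored value fits in an unsigned char. */
int length_of_longest_substring(const char *s)
{
    unsigned char map[26] = {0};
    size_t n = strlen(s);
    assert(n <= 255);

    int best = 0;
    int start = 0; /* index of the first character in the current window */

    for (size_t i = 0; i < n; i++) {
        assert(s[i] >= 'a' && s[i] <= 'z');
        int c = s[i] - 'a';

        /* The letter already occurs inside the window: move the window
         * start just past its previous occurrence. */
        if (map[c] > start)
            start = map[c];

        map[c] = (unsigned char)(i + 1);

        int len = (int)i - start + 1;
        if (len > best)
            best = len;
    }
    return best;
}
```

For the input `"abcabcbb"` the function returns 3, corresponding to the window `"abc"`.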
todo: finish it :(