Assignment1: RISC-V Assembly and Instruction Pipeline
contributed by <hugo0406>
Find Leftmost 0-byte using CLZ
- In C, the end of a string is denoted by an all-0 byte. To find the length of a string, a C program uses the “strlen” function. This function searches the string from left to right for the 0-byte and returns the number of bytes scanned, not counting the 0-byte.
- A fast implementation of “strlen” might load and test single bytes until a word boundary is reached, then load a word at a time into a register and test the register for the presence of a 0-byte.
- The function zbytel(x) returns a value from 0 to 3 denoting bytes 0 to 3, or 4 if there is no 0-byte in the word.
- “00” denotes a 0-byte, “nn” denotes a nonzero byte, and “xx” denotes a byte that may be 0 or nonzero.
- What follows considers the 64-bit (doubleword) case instead of the 32-bit one, so the function's return value ranges from 0 to 8.
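The 64-bit case described above can be sketched in C as follows. This is a minimal sketch following the zbytel formula from Hacker's Delight, not necessarily the code originally placed under "C Code"; the names `clz64` and `zbytel64` are chosen here for illustration, and `clz64` is a plain loop rather than a fast implementation.

```c
#include <stdint.h>

/* Count leading zeros of a 64-bit value; returns 64 when x == 0.
   Simple loop version, used here only as a reference helper. */
int clz64(uint64_t x)
{
    int n = 0;
    if (x == 0)
        return 64;
    while ((x & 0x8000000000000000ULL) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Index (0 = most significant byte) of the leftmost 0-byte in x,
   or 8 if x contains no 0-byte. */
int zbytel64(uint64_t x)
{
    const uint64_t m = 0x7F7F7F7F7F7F7F7FULL;
    uint64_t y = (x & m) + m;  /* bit 7 of a byte is set if its low 7 bits are nonzero */
    y = ~(y | x | m);          /* 0x80 where the byte of x is 0x00, 0x00 elsewhere */
    return clz64(y) >> 3;      /* leading-zero count / 8 = byte index */
}
```

Because each byte sum is at most 0x7F + 0x7F = 0xFE, no carry crosses a byte boundary, which is why the assembly versions can process the two 32-bit halves independently.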
Implementation
C Code
Assembly Code
Version Ⅰ uses only one test case:
.data
test1: .dword 0x1122334455007700
str1: .string "The Leftmost 0-byte is "
.text
main:
la t0,test1
lw a0,0(t0)
lw a1,4(t0)
jal ra,zbytel
mv t0,a0
la a0,str1
li a7,4
ecall
mv a0,t0
li a7,1
ecall
li a7, 10
ecall
zbytel:
addi sp,sp,-4
sw ra,0(sp)
mv s0,a0
mv s1,a1
li t0,0x7f7f7f7f
and s2,s0,t0 # s2 = low word & 0x7f7f7f7f
add s2,s2,t0
or s2,s2,s0
or s2,s2,t0
xori s2,s2,-1
and s3,s1,t0 # s3 = high word & 0x7f7f7f7f
add s3,s3,t0
or s3,s3,s1 # OR with the high word itself
or s3,s3,t0
xori s3,s3,-1
mv a0,s2
mv a1,s3
jal clz
lw ra,0(sp)
addi sp,sp,4
srli a0,a0,3
jr ra
clz:
andi t1,a1,0x1 # bit of the high word that shifts into the low word
srli s4,a1,1
srli s5,a0,1
slli t1,t1,31
or s5,s5,t1
or a1,s4,a1
or a0,s5,a0
andi t1,a1,0x3
srli s4,a1,2
srli s5,a0,2
slli t1,t1,30
or s5,s5,t1
or a1,s4,a1
or a0,s5,a0
andi t1,a1,0xf
srli s4,a1,4
srli s5,a0,4
slli t1,t1,28
or s5,s5,t1
or a1,s4,a1
or a0,s5,a0
andi t1,a1,0xff
srli s4,a1,8
srli s5,a0,8
slli t1,t1,24
or s5,s5,t1
or a1,s4,a1
or a0,s5,a0
li t1,0xffff
and t1,a1,t1
srli s4,a1,16
srli s5,a0,16
slli t1,t1,16
or s5,s5,t1
or a1,s4,a1
or a0,s5,a0
mv s5,a1
and s4,a1,x0 # high word of (x >> 32) is 0
or a1,s4,a1
or a0,s5,a0
andi t1,a1,0x1
srli s4,a1,1
srli s5,a0,1
slli t1,t1,31
or s5,s5,t1
li t1,0x55555555
and s4,s4,t1
and s5,s5,t1
sub a1,a1,s4
sub a0,a0,s5
andi t1,a1,0x3
srli s4,a1,2
srli s5,a0,2
slli t1,t1,30
or s5,s5,t1
li t1,0x33333333
and s4,s4,t1
and s5,s5,t1
and a1,a1,t1
and a0,a0,t1
add a1,a1,s4
add a0,a0,s5
andi t1,a1,0xf
srli s4,a1,4
srli s5,a0,4
slli t1,t1,28
or s5,s5,t1
add s4,s4,a1
add s5,s5,a0
li t1,0x0f0f0f0f
and a1,s4,t1
and a0,s5,t1
andi t1,a1,0xff
srli s4,a1,8
srli s5,a0,8
slli t1,t1,24
or s5,s5,t1
add a1,a1,s4
add a0,a0,s5
li t1,0xffff
and t1,t1,a1
srli s4,a1,16
srli s5,a0,16
slli t1,t1,16
or s5,s5,t1
add a1,a1,s4
add a0,a0,s5
mv s5,a1
and s4,a1,x0 # high word of (x >> 32) is 0
add a1,a1,s4
add a0,a0,s5
andi a0,a0,0x7f # population count of the smeared value
li t1,64
sub a0,t1,a0
jr ra
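The clz routine above works on a 64-bit value split across two 32-bit registers (a1 holds the high word, a0 the low word). The following C sketch mirrors that idea under stated assumptions: the `pair64` type and the function names `shr_pair` and `clz_pair` are introduced here for illustration only. It first smears the highest set bit into all lower positions, then counts the set bits and subtracts the count from 64.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit words, like the a1:a0 register pair. */
typedef struct { uint32_t hi, lo; } pair64;

/* Logical right shift of hi:lo by n bits, valid for 0 < n < 32. */
pair64 shr_pair(pair64 x, unsigned n)
{
    pair64 r;
    r.lo = (x.lo >> n) | (x.hi << (32 - n)); /* bits leaving hi enter lo */
    r.hi = x.hi >> n;
    return r;
}

/* Count leading zeros of hi:lo; returns 64 when both words are 0. */
int clz_pair(pair64 x)
{
    pair64 s;
    unsigned n;

    /* Smear the highest set bit into every lower position. */
    for (n = 1; n < 32; n <<= 1) {
        s = shr_pair(x, n);
        x.hi |= s.hi;
        x.lo |= s.lo;
    }
    x.lo |= x.hi;                    /* x |= x >> 32 */

    /* Population count; no field ever carries across the word boundary. */
    s = shr_pair(x, 1);
    x.hi -= s.hi & 0x55555555u;
    x.lo -= s.lo & 0x55555555u;
    s = shr_pair(x, 2);
    x.hi = (x.hi & 0x33333333u) + (s.hi & 0x33333333u);
    x.lo = (x.lo & 0x33333333u) + (s.lo & 0x33333333u);
    s = shr_pair(x, 4);
    x.hi = (x.hi + s.hi) & 0x0f0f0f0fu;
    x.lo = (x.lo + s.lo) & 0x0f0f0f0fu;
    s = shr_pair(x, 8);
    x.hi += s.hi;
    x.lo += s.lo;
    s = shr_pair(x, 16);
    x.hi += s.hi;
    x.lo += s.lo;
    x.lo += x.hi;                    /* x += x >> 32 */

    return 64 - (int)(x.lo & 0x7f);  /* 64 minus the number of set bits */
}
```

The mask-shift-or sequence in the assembly (`andi`, `slli`, `or` around each pair of `srli`) plays the role of `shr_pair` here.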
Version Ⅱ loops over multiple test cases:
.data
test1: .dword 0x1122334455007700
test2: .dword 0x0123456789abcdef
test3: .dword 0x1100220033445566
str1: .string "The Leftmost 0-byte is "
endl: .string "\n"
.text
main:
la t2, test1
li t3, 3
loop:
lw a0, 0(t2)
lw a1, 4(t2)
jal zbytel
mv t4, a0
la a0, str1
li a7, 4
ecall
mv a0, t4
li a7, 1
ecall
la a0, endl
li a7, 4
ecall
addi t2, t2, 8
addi t3, t3, -1
bne t3, x0, loop
li a7, 10
ecall
zbytel:
addi sp,sp,-4
sw ra,0(sp)
mv s0,a0
mv s1,a1
li t0,0x7f7f7f7f
and s2,s0,t0 # s2 = low word & 0x7f7f7f7f
add s2,s2,t0
or s2,s2,s0
or s2,s2,t0
xori s2,s2,-1
and s3,s1,t0 # s3 = high word & 0x7f7f7f7f
add s3,s3,t0
or s3,s3,s1 # OR with the high word itself
or s3,s3,t0
xori s3,s3,-1
mv a0,s2
mv a1,s3
jal clz
lw ra,0(sp)
addi sp,sp,4
srli a0,a0,3
jr ra
clz:
andi t1,a1,0x1 # bit of the high word that shifts into the low word
srli s4,a1,1
srli s5,a0,1
slli t1,t1,31
or s5,s5,t1
or a1,s4,a1
or a0,s5,a0
andi t1,a1,0x3
srli s4,a1,2
srli s5,a0,2
slli t1,t1,30
or s5,s5,t1
or a1,s4,a1
or a0,s5,a0
andi t1,a1,0xf
srli s4,a1,4
srli s5,a0,4
slli t1,t1,28
or s5,s5,t1
or a1,s4,a1
or a0,s5,a0
andi t1,a1,0xff
srli s4,a1,8
srli s5,a0,8
slli t1,t1,24
or s5,s5,t1
or a1,s4,a1
or a0,s5,a0
li t1,0xffff
and t1,a1,t1
srli s4,a1,16
srli s5,a0,16
slli t1,t1,16
or s5,s5,t1
or a1,s4,a1
or a0,s5,a0
mv s5,a1
and s4,a1,x0 # high word of (x >> 32) is 0
or a1,s4,a1
or a0,s5,a0
andi t1,a1,0x1
srli s4,a1,1
srli s5,a0,1
slli t1,t1,31
or s5,s5,t1
li t1,0x55555555
and s4,s4,t1
and s5,s5,t1
sub a1,a1,s4
sub a0,a0,s5
andi t1,a1,0x3
srli s4,a1,2
srli s5,a0,2
slli t1,t1,30
or s5,s5,t1
li t1,0x33333333
and s4,s4,t1
and s5,s5,t1
and a1,a1,t1
and a0,a0,t1
add a1,a1,s4
add a0,a0,s5
andi t1,a1,0xf
srli s4,a1,4
srli s5,a0,4
slli t1,t1,28
or s5,s5,t1
add s4,s4,a1
add s5,s5,a0
li t1,0x0f0f0f0f
and a1,s4,t1
and a0,s5,t1
andi t1,a1,0xff
srli s4,a1,8
srli s5,a0,8
slli t1,t1,24
or s5,s5,t1
add a1,a1,s4
add a0,a0,s5
li t1,0xffff
and t1,t1,a1
srli s4,a1,16
srli s5,a0,16
slli t1,t1,16
or s5,s5,t1
add a1,a1,s4
add a0,a0,s5
mv s5,a1
and s4,a1,x0 # high word of (x >> 32) is 0
add a1,a1,s4
add a0,a0,s5
andi a0,a0,0x7f # population count of the smeared value
li t1,64
sub a0,t1,a0
jr ra
Output
Analysis & Pipeline
Generated from Ripes.
Below are a few lines of disassembled code:
Take the instruction jal x1 72 <zbytel>, located at address 0x14, and see how it moves through the pipeline:
IF Stage

- First, the PC is 0x14, meaning that we are going to fetch the instruction jal x1 72 <zbytel>.
- From the instruction memory we get the machine code 0x048000ef, i.e. jal x1 72 <zbytel>, which is passed on to the next stage.
- PC = PC + 4 if no branch is taken; this is the address of the instruction to be fetched in the next cycle.
ID Stage



- In this stage, the decoder decodes the machine code 0x048000ef.
- The opcode field is 1101111, which represents the jal instruction.
- The rd field is 00001, which represents x1 (ra).
- The imm field (instr[31:12]) is 00000100100000000000.
  - imm[10:1] = 0000100100, so the immediate value is 0x00000048.
  - instr[19:15] = 00000, so the R1 idx is 0x00.
  - instr[24:20] = 01000, so the R2 idx is 0x08.
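The field extraction listed above can be reproduced in C. This is a minimal sketch of J-type decoding as laid out in the RISC-V instruction format; the `jtype` struct and the `decode_jtype` name are made up for illustration and are not part of Ripes.

```c
#include <stdint.h>

/* Decoded fields of a J-type instruction (illustrative type). */
typedef struct {
    uint32_t opcode;  /* instr[6:0]  */
    uint32_t rd;      /* instr[11:7] */
    int32_t  imm;     /* sign-extended byte offset */
} jtype;

jtype decode_jtype(uint32_t instr)
{
    jtype d;
    uint32_t imm = 0;

    d.opcode = instr & 0x7f;
    d.rd     = (instr >> 7) & 0x1f;

    /* The immediate is stored scrambled: imm[20|10:1|11|19:12]. */
    imm |= ((instr >> 31) & 0x1)   << 20;  /* imm[20]    = instr[31]    */
    imm |= ((instr >> 21) & 0x3ff) << 1;   /* imm[10:1]  = instr[30:21] */
    imm |= ((instr >> 20) & 0x1)   << 11;  /* imm[11]    = instr[20]    */
    imm |= ((instr >> 12) & 0xff)  << 12;  /* imm[19:12] = instr[19:12] */

    /* Sign-extend from bit 20; bit 0 is always 0. */
    if (imm & (1u << 20))
        imm |= 0xffe00000u;
    d.imm = (int32_t)imm;
    return d;
}
```

For 0x048000ef this yields opcode 0x6f (binary 1101111), rd 1 and an immediate of 72 (0x48), matching the values read off in the list.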
EXE Stage

- In this stage, the ALU adds its inputs 0x00000014 (the address of the instruction) and 0x00000048 (the immediate) to get 0x0000005c, the address of zbytel.
MEM Stage

- The 2 instructions fetched after jal are flushed (replaced with nop) over the next 2 cycles.
- The IF stage fetches the instruction at 0x0000005c.
- The jal instruction doesn't involve memory access, so there are no memory-related operations in this stage.
WB Stage

- Lastly, the processor writes 0x00000018 (the return address, PC + 4) into x1 (ra).
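Putting the EXE and WB stages together, the architectural effect of jal can be summarized in a few lines of C. This is only a sketch of the instruction's semantics, not of the pipeline hardware; `jal_result` and `exec_jal` are hypothetical names.

```c
#include <stdint.h>

/* Architectural effect of jal (illustrative type). */
typedef struct {
    uint32_t next_pc;  /* address the IF stage fetches from after the jump */
    uint32_t link;     /* value written to rd in the WB stage */
} jal_result;

jal_result exec_jal(uint32_t pc, int32_t imm)
{
    jal_result r;
    r.next_pc = pc + (uint32_t)imm;  /* jump target computed by the ALU */
    r.link    = pc + 4;              /* return address */
    return r;
}
```

With pc = 0x14 and imm = 0x48 this gives a jump target of 0x5c and a link value of 0x18, the two addresses seen in the walkthrough.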
Reference
- Hacker's Delight
- RISC-V Instruction Set Manual
- RISC-V Instruction Format