Assignment1: RISC-V Assembly and Instruction Pipeline
contributed by < 李尚宸
>
Quiz1 Problem C (Counting Leading Zeros)
Choose one problem (A, B, or C) from Quiz1, translate it from C code to a complete RISC-V assembly program, and include the relevant test data.
C programs
fabsf.c
my_clz.c
Image Not Showing
Possible Reasons
- The image was uploaded to a note which you don't have access to
- The note which the image was originally uploaded to has been deleted
Learn More →
Show the full C source code corresponding to the original problem set.
fp16_to_fp32.c
Assembly programs
fabsf.s
my_clz.s

fp16_to_fp32.s
Summary of Registers Used
my_clz.s
main
function:
t0
: Pointers for num_test_cases
and test_case_array
.
t1
: Number of test cases.
t2
:i
a7
: System call register.
my_clz
function:
a0
: Argument register used for passing the current test case and returning the result
t3
: i
t4
: count
t5
: x & (1U << i)
Loop Unrolling
my_clz_loop_unrolling.s
- We perform loop unrolling in the
main
function
.data
num_test_cases:
.word 3
test_case_array:
.word 0, 1, 1024
newline:
.asciz "\n"
.text
.globl main
main:
lw t1, 3
la t0, test_case_array
li t2, 0
test_case0:
lw a0, 0(t0)
jal ra, my_clz
li a7, 1 # System call code for print integer
ecall
la a0, newline
li a7, 4 # System call code for print string
ecall
addi t0, t0, 4 # Next test_case
addi t2, t2, 1 # i++
test_case1:
lw a0, 0(t0)
jal ra, my_clz
li a7, 1
ecall
la a0, newline
li a7, 4
ecall
addi t0, t0, 4
addi t2, t2, 1
test_case2:
lw a0, 0(t0)
jal ra, my_clz
li a7, 1
ecall
la a0, newline
li a7, 4
ecall
end_main:
li a7, 10 # System call code for exit
ecall
my_clz:
li t3, 31 # i = 31
li t4, 0 # count = 0
loop_my_clz:
li t5, 1 # t5 = 1
sll t5, t5, t3 # t5 = 1 << i
and t5, a0, t5 # t5 = x & (1 << i)
bnez t5, end_my_clz # if ((x & (1 << i)) != 0), exit loop
addi t4, t4, 1 # count++
addi t3, t3, -1 # i--
bltz t3, end_my_clz # if (i < 0), exit loop
j loop_my_clz
end_my_clz:
mv a0, t4 # return count
ret

A Comparison of C, Non-Loop Unrolling, and Loop Unrolling Implementations
my_clz
|
C |
Non-Loop Unrolling Assembly |
Loop Unrolling Assembly |
Cycles |
7469 |
946 |
928 |
Instrs. retired |
5603 |
736 |
726 |
CPI |
1.33 |
1.29 |
1.28 |
IPC |
0.75 |
0.778 |
0.782 |
Clock rate |
6.48 KHz |
59.17 Hz |
1.33 KHz |
Problem
Given an array nums containing n distinct numbers in the range [0, n], return the only number in the range that is missing from the array.
Example 1:
Input: nums = [3,0,1]
Output: 2
Explanation: n = 3 since there are 3 numbers, so all numbers are in the range [0,3]. 2 is the missing number in the range since it does not appear in nums.
Example 2:
Input: nums = [0,1]
Output: 2
Explanation: n = 2 since there are 2 numbers, so all numbers are in the range [0,2]. 2 is the missing number in the range since it does not appear in nums.
Example 3:
Input: nums = [9,6,4,2,3,5,7,0,1]
Output: 8
Explanation: n = 9 since there are 9 numbers, so all numbers are in the range [0,9]. 8 is the missing number in the range since it does not appear in nums.
C Program

You should automate the testing procedures as much as possible, meaning there's no need to manually check each program output. Instead, implement internal validation to handle the result checking.
Assembly Program
.data
nums0:
.word 3, 0, 1
nums1:
.word 0, 1
nums2:
.word 9, 6, 4, 2, 3, 5, 7, 0, 1
len_nums0:
.word 3
len_nums1:
.word 2
len_nums2:
.word 9
newline:
.asciz "\n"
.text
.globl main
main:
# missing number of num0
la a0, nums0
lw a1, len_nums0
jal ra, missing_number
li a7, 1 # System call code for print integer
ecall
la a0, newline
li a7, 4 # System call code for print string
ecall
# missing number of num1
la a0, nums1
lw a1, len_nums1
jal ra, missing_number
li a7, 1
ecall
la a0, newline
li a7, 4
ecall
# missing number of num2
la a0, nums2
lw a1, len_nums2
jal ra, missing_number
li a7, 1
ecall
la a0, newline
li a7, 4
ecall
end_main:
li a7, 10 # System call code for exit
ecall
missing_number:
li t0, 0 # mask = 0
li t1, 0 # i = 0
loop_missing_number:
beq t1, a1, end_missing_number # if (i == n), exit loop
lw t2, 0(a0) # Load nums[i]
xor t0, t0, t1 # mask = mask ^ i
xor t0, t0, t2 # mask = mask ^ nums[i]
addi a0, a0, 4 # Next element
addi t1, t1, 1 # i++
j loop_missing_number
end_missing_number:
xor a0, t0, t1 # mask ^ n
ret

Summary of Registers Used
main
function:
a0
: Pointer to the current array (nums1
, nums2
, nums3
).
a1
: Length of the current array (len_nums1
, len_nums2
, len_nums3
).
a7
: System call register.
missing_number
function:
t0
: mask
t1
: i
t2
: nums[i]
Loop Unrolling
- We perform loop unrolling in the
missing_number
function
.data
nums0:
.word 3, 0, 1
nums1:
.word 0, 1
nums2:
.word 9, 6, 4, 2, 3, 5, 7, 0, 1
len_nums0:
.word 3
len_nums1:
.word 2
len_nums2:
.word 9
newline:
.asciz "\n"
.text
.globl main
main:
nums0_missing_number:
la a0, nums0
li t0, 0 # mask = 0
li t1, 0 # i = 0
lw t2, 0(a0) # Load nums0[0]
xor t0, t0, t1 # mask = mask ^ i
xor t0, t0, t2 # mask = mask ^ nums0[0]
addi a0, a0, 4 # Next element
addi t1, t1, 1 # i++
lw t2, 0(a0) # Load nums0[1]
xor t0, t0, t1 # mask = mask ^ i
xor t0, t0, t2 # mask = mask ^ nums0[1]
addi a0, a0, 4 # Next element
addi t1, t1, 1 # i++
lw t2, 0(a0) # Load nums0[2]
xor t0, t0, t1 # mask = mask ^ i
xor t0, t0, t2 # mask = mask ^ nums0[2]
addi t1, t1, 1 # i++
xor a0, t0, t1 # mask ^ n
li a7, 1 # System call code for print integer
ecall
la a0, newline
li a7, 4 # System call code for print string
ecall
nums1_missing_number:
la a0, nums1
li t0, 0 # mask = 0
li t1, 0 # i = 0
lw t2, 0(a0) # Load nums1[0]
xor t0, t0, t1 # mask = mask ^ i
xor t0, t0, t2 # mask = mask ^ nums1[0]
addi a0, a0, 4 # Next element
addi t1, t1, 1 # i++
lw t2, 0(a0) # Load nums1[1]
xor t0, t0, t1 # mask = mask ^ i
xor t0, t0, t2 # mask = mask ^ nums1[1]
addi t1, t1, 1 # i++
xor a0, t0, t1 # mask ^ n
li a7, 1
ecall
la a0, newline
li a7, 4
ecall
nums2_missing_number:
la a0, nums2
li t0, 0 # mask = 0
li t1, 0 # i = 0
lw t2, 0(a0) # Load nums2[0]
xor t0, t0, t1 # mask = mask ^ i
xor t0, t0, t2 # mask = mask ^ nums2[0]
addi a0, a0, 4 # Next element
addi t1, t1, 1 # i++
lw t2, 0(a0) # Load nums2[1]
xor t0, t0, t1 # mask = mask ^ i
xor t0, t0, t2 # mask = mask ^ nums2[1]
addi a0, a0, 4 # Next element
addi t1, t1, 1 # i++
lw t2, 0(a0) # Load nums2[2]
xor t0, t0, t1 # mask = mask ^ i
xor t0, t0, t2 # mask = mask ^ nums2[2]
addi a0, a0, 4 # Next element
addi t1, t1, 1 # i++
lw t2, 0(a0) # Load nums2[3]
xor t0, t0, t1 # mask = mask ^ i
xor t0, t0, t2 # mask = mask ^ nums2[3]
addi a0, a0, 4 # Next element
addi t1, t1, 1 # i++
lw t2, 0(a0) # Load nums2[4]
xor t0, t0, t1 # mask = mask ^ i
xor t0, t0, t2 # mask = mask ^ nums2[4]
addi a0, a0, 4 # Next element
addi t1, t1, 1 # i++
lw t2, 0(a0) # Load nums2[5]
xor t0, t0, t1 # mask = mask ^ i
xor t0, t0, t2 # mask = mask ^ nums2[5]
addi a0, a0, 4 # Next element
addi t1, t1, 1 # i++
lw t2, 0(a0) # Load nums2[6]
xor t0, t0, t1 # mask = mask ^ i
xor t0, t0, t2 # mask = mask ^ nums2[6]
addi a0, a0, 4 # Next element
addi t1, t1, 1 # i++
lw t2, 0(a0) # Load nums2[7]
xor t0, t0, t1 # mask = mask ^ i
xor t0, t0, t2 # mask = mask ^ nums2[7]
addi a0, a0, 4 # Next element
addi t1, t1, 1 # i++
lw t2, 0(a0) # Load nums2[8]
xor t0, t0, t1 # mask = mask ^ i
xor t0, t0, t2 # mask = mask ^ nums2[8]
addi t1, t1, 1 # i++
xor a0, t0, t1 # mask ^ n
li a7, 1
ecall
la a0, newline
li a7, 4
ecall
end_main:
li a7, 10 # System call code for exit
ecall

A Comparison of C, Non-Loop Unrolling, and Loop Unrolling Implementations
|
C |
Non-Loop Unrolling Assembly |
Loop Unrolling Assembly |
Cycles |
5112 |
212 |
120 |
Instrs. retired |
3847 |
148 |
102 |
CPI |
1.33 |
1.43 |
1.18 |
IPC |
0.753 |
0.698 |
0.85 |
Clock rate |
7.83 KHz |
27.25 Hz |
77.42 KHz |
Hazard
-
Data Hazards:
-
RAW (Read After Write): An instruction reads data before a previous instruction writes to it.
- Example:
- Solution: Data forwarding, pipeline stalling.
-
WAR (Write After Read): An instruction writes data before a previous one reads it.
- Example:
- Solution: Reordering instructions, or register renaming.
-
WAW (Write After Write): Two instructions write to the same register in the wrong order, causing data loss.
- Example:
- Solution: Pipeline stall, register renaming.
-
Control Hazards:
- Caused by branch instructions that change the instruction flow. Mispredicting can delay execution.
- Solution: Branch prediction, delayed branching.
-
Structural Hazards:
- Occur when multiple instructions require the same hardware resource at the same time.
- Solution: Adding more resources (e.g., multiple ALUs), or using multiple pipelines.
Analysis
Ripes Simulator
We use Ripes, a graphical processor simulator and assembly editor for the RISC-V.
There are five stages in the Ripes pipeline.
IF
: Instruction Fetch
ID
: Instruction Decode & Register Read
EX
: Execution or Address Calculation
MEM
: Data Memory Access
WB
: Write Back

Case 1: my_clz.s
- There is a RAW hazard between the
li t5, 1
instruction and the following instructions (sll
).

-
Data forwarding ensures that the value of t5
is immediately available to the sll instruction, without having to wait for it to propagate through the entire pipeline and write back to the register file.
-
Control Hazard
-
Before jal x1 48 <my_clz>
execution


- After
jal x1 48 <my_clz>
execution


Case 2: missing_number.s

- There is a RAW hazard between two
xor
instruction.

-
This example, like the one above in my_clz.s, uses data forwarding to avoid data hazards.
-
Control Hazard
-
Before jal x1 124 <missing_number>
execution


- After
jal x1 124 <missing_number>
execution


- Due to branch instructions that alter the flow of execution,
nop(flush)
instructions are inserted here.