# Assignment1: RISC-V Assembly and Instruction Pipeline
:::danger
Be aware of [headings](https://hackmd.io/c/tutorials/%2F%40docs%2Fbasic_formatting_en).
:::
## CLZ with loop
> The primary function is to count the number of leading zeros in an integer and print the result based on the input data. If the input data is 0, it prints "undefined."
1. Initialize Registers:
* t0 = 0: This is the counter for the leading zeros.
* t1 = 31: This represents the current bit index, starting at the highest bit (31st bit).
2. Loop Through Bits:
* For each bit, the mask t3 is set to 1 << t1 to target the current bit.
* t2 = a0 & (1 << t1): This checks if the current bit in a0 is set (1) or not (0).
* If the bit is 0, the counter t0 increments, and the index t1 moves to the next bit.
* If a non-zero bit is found, the loop stops.
3. Completion:
* Once a non-zero bit is found or all bits are checked, the counter (t0) is moved to a0 as the return value.
* The function returns to the caller by jumping to the address in ra (return address).
:::danger
Use fewer instructions.
:::
```c
.data
str_1: .string "\nThe leading zero of "
str_2: .string " is "
str_undefined: .string "undefined"
test_data_1: .word 19 # 00000000000000000000000000010011
test_data_2: .word 4096 # 00000000000000000001000000000000
test_data_3: .word 1073741824 # 01000000000000000000000000000000
test_data_4: .word 0 # The number 0 (to test undefined behavior)
.text
.globl main
main:
# Test case 1
la a0, test_data_1
jal ra, run_clz_test
# Test case 2
la a0, test_data_2
jal ra, run_clz_test
# Test case 3
la a0, test_data_3
jal ra, run_clz_test
# Test case 4 (undefined behavior for 0)
la a0, test_data_4
jal ra, run_clz_test
# Exit the program
li a7, 10
ecall
# run_clz_test function: Executes the my_clz function and prints the result for a given test case
run_clz_test:
addi sp, sp, -8 # Adjust the stack pointer to make space for saving registers
sw ra, 0(sp) # Save return address to the stack
lw a1, 0(a0) # Load the test data into a1
beqz a1, handle_zero # If test data is 0, jump to handle_zero
mv a0, a1 # Move test data to a0 for my_clz input
jal ra, my_clz # Call my_clz function
mv t1, a0 # Save the my_clz result (leading zero count) to t1
mv a0, a1 # Restore test data to a0
mv a1, t1 # Pass the leading zero count as a1
jal ra, printResult # Call printResult function
j restore_stack # Jump to restore stack
# Handle zero case (undefined behavior)
handle_zero:
mv t0, a1 # Save 0 as test data
la a0, str_1 # Load "The leading zero of " string
li a7, 4 # System call code for printing a string
ecall # Print "The leading zero of "
li a0, 0
li a7, 1 # System call code for printing an integer
ecall # Print 0
la a0, str_2
li a7, 4
ecall # Print " is "
la a0, str_undefined
li a7, 4
ecall # Print "undefined"
j restore_stack # Jump to restore stack
restore_stack:
lw ra, 0(sp) # Restore return address from the stack
addi sp, sp, 8 # Restore the stack pointer
ret # Return to caller
# my_clz function: Counts the number of leading zeros in an integer
# Input: a0 - input integer
# Output: a0 - count of leading zeros
my_clz:
li t0, 0 # Initialize counter t0 = 0
li t1, 31 # Set bit index t1 = 31 (starting from the highest bit)
clz_loop:
li t3, 1 # Load 1 into t3
sll t3, t3, t1 # Shift t3 left by t1 bits to create the mask (1 << t1)
and t2, a0, t3 # t2 = a0 & (1 << t1)
beqz t2, check_next_bit # If t2 == 0, jump to check_next_bit
j clz_done # If t2 != 0, jump to clz_done
check_next_bit:
addi t0, t0, 1 # Increment counter t0
addi t1, t1, -1 # Decrement bit index t1
bgez t1, clz_loop # If t1 >= 0, continue looping
clz_done:
mv a0, t0 # Move the count t0 to a0 (return value)
jr ra # Return to the caller
# printResult function: Displays the result "The leading zero of <test_data> is <leading zero count>"
# a0: The original input value (test_data)
# a1: The leading zero count
printResult:
mv t0, a0 # Save original input value (test_data) in temporary register t0
mv t1, a1 # Save leading zero count in temporary register t1
la a0, str_1
li a7, 4
ecall # Print the string "The leading zero of "
mv a0, t0
li a7, 1
ecall # Print the test_data
la a0, str_2
li a7, 4
ecall # Print the string " is "
mv a0, t1
li a7, 1
ecall # Print the leading zero count
ret # Return to the caller
```
:::danger
Do not use screenshots for plain text content, as this is inaccessible to visually impaired users.
:::
### Output
The leading zero of 19 is 27
The leading zero of 4096 is 19
The leading zero of 1073741824 is 1
The leading zero of 0 is undefined
Program exited with code: 0
### Performance
| Metric | Value |
| -------- | -------- |
| Cycles | 773 |
| Instrs. retired | 487 |
| CPI | 1.59 |
| IPC | 0.63 |
| Clock rate | 10.75 Hz |
### Pipeline diagram
| Instruction | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|------------------------|----|----|----|----|----|----|----|
| auipc x10 0x10000 | IF | ID | EX | MEM| WB | | |
| addi x10 x10 37 | | IF | ID | EX | MEM| WB | |
| jal x1 48 <run_clz_test>| | | IF | ID | EX | MEM| WB |
| auipc x10 0x10000 | | | | IF | ID | | |
| addi x10 x10 29 | | | | | IF | | |
| Instruction | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|------------------------------|----|----|----|----|----|----|----|----|----|----|----|
| addi x2 x2 -8 | IF | ID | EX | MEM| WB | | | | | | |
| sw x1 0 x2 | | IF | ID | EX | MEM| WB | | | | | |
| lw x11 0 x10 | | | IF | ID | EX | MEM| WB | | | | |
| beq x11 x0 32 <handle_zero> | | | | IF | ID | - | EX | MEM| WB | | |
| addi x10 x11 0 | | | | | IF | - | ID | EX | MEM| WB | |
| jal x1 104 <my_clz> | | | | | | | IF | ID | EX | MEM| WB |
| addi x6 x10 0 | | | | | | | | IF | ID | | |
| addi x10 x11 0 | | | | | | | | | IF | | |
#### The loop pipeline diagram

:::warning
TODO: Rewrite `my_clz` without loops.
:::
## CLZ with Binary Search Approach
This function uses a binary search approach to calculate the number of leading zeros in the input integer.
* The input integer is repeatedly shifted left, and the AND operation is used to check whether the higher bits are 0.
* Each time the higher bits are found to be 0, the corresponding offset (16, 8, 4, 2, 1) is added to the counter.
```assemly
.data
str_1: .string "\nThe leading zero of "
str_2: .string " is "
str_undefined: .string "undefined"
test_data_1: .word 19 # 00000000000000000000000000010011
test_data_2: .word 4096 # 00000000000000000001000000000000
test_data_3: .word 1073741824 # 01000000000000000000000000000000
test_data_4: .word 0 # The number 0 (to test undefined behavior)
.text
.globl main
main:
# Test case 1
la a0, test_data_1
jal ra, run_clz_test
# Test case 2
la a0, test_data_2
jal ra, run_clz_test
# Test case 3
la a0, test_data_3
jal ra, run_clz_test
# Test case 4 (Undefined behavior for 0)
la a0, test_data_4
jal ra, run_clz_test
# Exit the program
li a7, 10
ecall
# run_clz_test function: Executes the my_clz function and prints the result for a given test case
run_clz_test:
addi sp, sp, -8 # Adjust the stack pointer to make space for saving registers
sw ra, 0(sp) # Save return address to the stack
lw a1, 0(a0) # Load the test data into a1
beqz a1, handle_zero # If test data is 0, go to handle_zero
mv a0, a1 # Move test data to a0 for my_clz input
jal ra, my_clz # Call my_clz function
mv t1, a0 # Save the my_clz result (leading zero count) to t1
mv a0, a1 # Restore test data to a0
mv a1, t1 # Pass the leading zero count as a1
jal ra, printResult # Call printResult function
j restore_stack # Jump to restore stack
# Handle case where input is zero (undefined behavior)
handle_zero:
mv t0, a1 # t0 = 0
la a0, str_1 # Load "The leading zero of " string
li a7, 4 # System call code for printing a string
ecall # Print "The leading zero of "
li a0, 0
li a7, 1 # System call code for printing an integer
ecall # Print 0
la a0, str_2
li a7, 4
ecall # Print " is "
la a0, str_undefined
li a7, 4
ecall # Print "undefined"
j restore_stack # Jump to restore stack
restore_stack:
lw ra, 0(sp) # Restore return address from the stack
addi sp, sp, 8 # Restore the stack pointer
ret # Return to caller
# my_clz function: Counts the number of leading zeros in an integer using Binary Search Approach
my_clz:
li t0, 0 # Initialize counter t0 = 0
li t1, 0xFFFF0000 # t1 = 0xFFFF0000
and t2, a0, t1 # t2 = a0 & 0xFFFF0000
beqz t2, clz_high16_zero # If high 16 bits are zero, skip
j clz_check_high8
clz_high16_zero:
addi t0, t0, 16 # t0 += 16
slli a0, a0, 16 # a0 <<= 16
clz_check_high8:
li t1, 0xFF000000 # t1 = 0xFF000000
and t2, a0, t1 # t2 = a0 & 0xFF000000
beqz t2, clz_high8_zero # If next 8 bits are zero, skip
j clz_check_high4
clz_high8_zero:
addi t0, t0, 8 # t0 += 8
slli a0, a0, 8 # a0 <<= 8
clz_check_high4:
li t1, 0xF0000000 # t1 = 0xF0000000
and t2, a0, t1 # t2 = a0 & 0xF0000000
beqz t2, clz_high4_zero # If next 4 bits are zero, skip
j clz_check_high2
clz_high4_zero:
addi t0, t0, 4 # t0 += 4
slli a0, a0, 4 # a0 <<= 4
clz_check_high2:
li t1, 0xC0000000 # t1 = 0xC0000000
and t2, a0, t1 # t2 = a0 & 0xC0000000
beqz t2, clz_high2_zero # If next 2 bits are zero, skip
j clz_check_high1
clz_high2_zero:
addi t0, t0, 2 # t0 += 2
slli a0, a0, 2 # a0 <<= 2
clz_check_high1:
li t1, 0x80000000 # t1 = 0x80000000
and t2, a0, t1 # t2 = a0 & 0x80000000
beqz t2, clz_high1_zero # If highest bit is zero, skip
j clz_done
clz_high1_zero:
addi t0, t0, 1 # t0 += 1
slli a0, a0, 1 # a0 <<= 1
clz_done:
mv a0, t0 # Move result to a0
jr ra # Return to caller
# printResult function: Displays the result "The leading zero of <test_data> is <leading zero count>"
printResult:
mv t0, a0 # Save original input value (test_data) in temporary register t0
mv t1, a1 # Save leading zero count in temporary register t1
la a0, str_1
li a7, 4
ecall # Print the string "The leading zero of "
mv a0, t0
li a7, 1
ecall # Print the test_data
la a0, str_2
li a7, 4
ecall # Print the string " is "
mv a0, t1
li a7, 1
ecall # Print the leading zero count
ret # Return to the caller
```
### Output
The leading zero of 19 is 27
The leading zero of 4096 is 19
The leading zero of 1073741824 is 1
The leading zero of 0 is undefined
Program exited with code: 0
### Performance
| Metric | Value |
| -------- | -------- |
| Cycles | 330 |
| Instrs. retired | 208 |
| CPI | 1.59 |
| IPC | 0.63 |
| Clock rate | 9.71 Hz |
### Advantage
The execution time is fixed, It will not differ because of the different number of digits in the highest bit
### Pipeline diagram
| Instruction | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|------------------------------|----|----|----|----|----|----|----|
| auipc x10 0x10000 | IF | ID | EX | MEM| WB | | |
| addi x10 x10 37 | | IF | ID | EX | MEM| WB | |
| jal x1 48 <run_clz_test> | | | IF | ID | EX | MEM| WB |
| auipc x10 0x10000 | | | | IF | ID | | |
| addi x10 x10 29 | | | | | IF | | |
| Instruction | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|------------------------------|----|----|----|----|----|----|----|----|----|----|----|
| addi x2 x2 -8 | IF | ID | EX | MEM| WB | | | | | | |
| sw x1 0 x2 | | IF | ID | EX | MEM| WB | | | | | |
| lw x11 0 x10 | | | IF | ID | EX | MEM| WB | | | | |
| beq x11 x0 32 <handle_zero> | | | | IF | ID | - | EX | MEM| WB | | |
| addi x10 x11 0 | | | | | IF | - | ID | EX | MEM| WB | |
| jal x1 104 <my_clz> | | | | | | | IF | ID | EX | MEM| WB |
| addi x6 x10 0 | | | | | | | | IF | ID | | |
| addi x10 x11 0 | | | | | | | | | IF | | |
#### Without loop pipeline diagram


## LeetCode 991: Broken Calculator
1. Reverse Operations: Start from the target and progressively transform it into the startValue, calculating the required number of operations.
1. Bitwise Operations: Perform right shifts (equivalent to division by 2) for even numbers, and add 1 for odd numbers.
```c
.data
str_start: .string "\nStartValue = "
str_target: .string ", Target = "
str_output: .string ", Minimum operations needed: "
# Test data
start_1: .word 2
target_1: .word 3 # 00000000000000000000000000000011
start_2: .word 5
target_2: .word 4096 # 00000000000000000001000000000000
start_3: .word 3
target_3: .word 1073741824 # 01000000000000000000000000000000
.text
.globl main
main:
# Test case 1
la a0, start_1
la a1, target_1
jal ra, run_test
# Test case 2
la a0, start_2
la a1, target_2
jal ra, run_test
# Test case 3
la a0, start_3
la a1, target_3
jal ra, run_test
# Exit the program
li a7, 10
ecall
# run_test function: Executes min_operations function and prints the result
# Inputs:
# a0: Address of startValue
# a1: Address of target
run_test:
addi sp, sp, -12 # Allocate stack space for saved registers
sw ra, 8(sp) # Save return address
sw s0, 4(sp) # Save s0
sw s1, 0(sp) # Save s1
lw s0, 0(a0) # Load startValue into s0
lw s1, 0(a1) # Load target into s1
mv a0, s0 # Move startValue to a0
mv a1, s1 # Move target to a1
jal ra, min_operations # Call min_operations function
mv t0, s0 # Save startValue to t0
mv t1, s1 # Save target to t1
mv t2, a0 # Save result (operations) to t2
jal ra, printResult # Call printResult function
lw ra, 8(sp) # Restore return address
lw s0, 4(sp) # Restore s0
lw s1, 0(sp) # Restore s1
addi sp, sp, 12 # Deallocate stack space
ret
# min_operations function: Calculates the minimum number of operations
# Inputs:
# a0: startValue
# a1: target
# Output:
# a0: Minimum number of operations
min_operations:
addi sp, sp, -8 # Allocate stack space for saved registers
sw s0, 4(sp) # Save s0
sw s1, 0(sp) # Save s1
mv s0, a0 # s0 = startValue
mv s1, a1 # s1 = target
li t0, 0 # t0 = operations
min_loop:
bge s0, s1, min_loop_end # If startValue >= target, exit loop
andi t1, s1, 1 # t1 = target % 2
beqz t1, min_even # If t1 == 0 (even), go to min_even
# target is odd
addi s1, s1, 1 # target += 1
addi t0, t0, 1 # operations += 1
j min_loop
min_even:
srai s1, s1, 1 # target = target / 2
addi t0, t0, 1 # operations += 1
j min_loop
min_loop_end:
sub t1, s0, s1 # t1 = startValue - target
add t0, t0, t1 # operations += (startValue - target)
mv a0, t0 # a0 = operations
lw s0, 4(sp) # Restore s0
lw s1, 0(sp) # Restore s1
addi sp, sp, 8 # Deallocate stack space
ret
# printResult function: Prints startValue, target, and minimum operations
# Inputs:
# t0: startValue
# t1: target
# t2: Minimum operations
printResult:
la a0, str_start # Print "\nStartValue = "
li a7, 4
ecall
mv a0, t0 # Print startValue
li a7, 1
ecall
la a0, str_target # Print ", Target = "
li a7, 4
ecall
mv a0, t1 # Print target
li a7, 1
ecall
la a0, str_output # Print ", Minimum operations needed: "
li a7, 4
ecall
mv a0, t2 # Print minimum operations
li a7, 1
ecall
ret
```
### Output
StartValue = 2, Target = 3, Minimum operations needed: 2
StartValue = 5, Target = 4096, Minimum operations needed: 11
StartValue = 3, Target = 1073741824, Minimum operations needed: 30
Program exited with code: 0
### Performance
| Metric | Value |
| -------- | -------- |
| Cycles | 330 |
| Instrs. retired | 208 |
| CPI | 1.59 |
| IPC | 0.63 |
| Clock rate | 9.71 Hz |
### Expected to be optimized with clz
Originally I wanted to optimize that when the test case is a power of 2, it can quickly move to the right without moving bit by bit. As a result, the related code has to be expanded, so the performance is even worse.