# Assignment1: RISC-V Assembly and Instruction Pipeline :::danger Be aware of [headings](https://hackmd.io/c/tutorials/%2F%40docs%2Fbasic_formatting_en). ::: ## CLZ with loop > The primary function is to count the number of leading zeros in an integer and print the result based on the input data. If the input data is 0, it prints "undefined." 1. Initialize Registers: * t0 = 0: This is the counter for the leading zeros. * t1 = 31: This represents the current bit index, starting at the highest bit (31st bit). 2. Loop Through Bits: * For each bit, the mask t3 is set to 1 << t1 to target the current bit. * t2 = a0 & (1 << t1): This checks if the current bit in a0 is set (1) or not (0). * If the bit is 0, the counter t0 increments, and the index t1 moves to the next bit. * If a non-zero bit is found, the loop stops. 3. Completion: * Once a non-zero bit is found or all bits are checked, the counter (t0) is moved to a0 as the return value. * The function returns to the caller by jumping to the address in ra (return address). :::danger Use fewer instructions. ::: ```c .data str_1: .string "\nThe leading zero of " str_2: .string " is " str_undefined: .string "undefined" test_data_1: .word 19 # 00000000000000000000000000010011 test_data_2: .word 4096 # 00000000000000000001000000000000 test_data_3: .word 1073741824 # 01000000000000000000000000000000 test_data_4: .word 0 # The number 0 (to test undefined behavior) .text .globl main main: # Test case 1 la a0, test_data_1 jal ra, run_clz_test # Test case 2 la a0, test_data_2 jal ra, run_clz_test # Test case 3 la a0, test_data_3 jal ra, run_clz_test # Test case 4 (undefined behavior for 0) la a0, test_data_4 jal ra, run_clz_test # Exit the program li a7, 10 ecall # run_clz_test function: Executes the my_clz function and prints the result for a given test case run_clz_test: addi sp, sp, -8 # Adjust the stack pointer to make space for saving registers sw ra, 0(sp) # Save return address to the stack lw a1, 0(a0) # Load the test data into a1 beqz a1, handle_zero # If test data is 0, jump to handle_zero mv a0, a1 # Move test data to a0 for my_clz input jal ra, my_clz # Call my_clz function mv t1, a0 # Save the my_clz result (leading zero count) to t1 mv a0, a1 # Restore test data to a0 mv a1, t1 # Pass the leading zero count as a1 jal ra, printResult # Call printResult function j restore_stack # Jump to restore stack # Handle zero case (undefined behavior) handle_zero: mv t0, a1 # Save 0 as test data la a0, str_1 # Load "The leading zero of " string li a7, 4 # System call code for printing a string ecall # Print "The leading zero of " li a0, 0 li a7, 1 # System call code for printing an integer ecall # Print 0 la a0, str_2 li a7, 4 ecall # Print " is " la a0, str_undefined li a7, 4 ecall # Print "undefined" j restore_stack # Jump to restore stack restore_stack: lw ra, 0(sp) # Restore return address from the stack addi sp, sp, 8 # Restore the stack pointer ret # Return to caller # my_clz function: Counts the number of leading zeros in an integer # Input: a0 - input integer # Output: a0 - count of leading zeros my_clz: li t0, 0 # Initialize counter t0 = 0 li t1, 31 # Set bit index t1 = 31 (starting from the highest bit) clz_loop: li t3, 1 # Load 1 into t3 sll t3, t3, t1 # Shift t3 left by t1 bits to create the mask (1 << t1) and t2, a0, t3 # t2 = a0 & (1 << t1) beqz t2, check_next_bit # If t2 == 0, jump to check_next_bit j clz_done # If t2 != 0, jump to clz_done check_next_bit: addi t0, t0, 1 # Increment counter t0 addi t1, t1, -1 # Decrement bit index t1 bgez t1, clz_loop # If t1 >= 0, continue looping clz_done: mv a0, t0 # Move the count t0 to a0 (return value) jr ra # Return to the caller # printResult function: Displays the result "The leading zero of <test_data> is <leading zero count>" # a0: The original input value (test_data) # a1: The leading zero count printResult: mv t0, a0 # Save original input value (test_data) in temporary register t0 mv t1, a1 # Save leading zero count in temporary register t1 la a0, str_1 li a7, 4 ecall # Print the string "The leading zero of " mv a0, t0 li a7, 1 ecall # Print the test_data la a0, str_2 li a7, 4 ecall # Print the string " is " mv a0, t1 li a7, 1 ecall # Print the leading zero count ret # Return to the caller ``` :::danger Do not use screenshots for plain text content, as this is inaccessible to visually impaired users. ::: ### Output The leading zero of 19 is 27 The leading zero of 4096 is 19 The leading zero of 1073741824 is 1 The leading zero of 0 is undefined Program exited with code: 0 ### Performance | Metric | Value | | -------- | -------- | | Cycles | 773 | | Instrs. retired | 487 | | CPI | 1.59 | | IPC | 0.63 | | Clock rate | 10.75 Hz | ### Pipeline diagram | Instruction | 0 | 1 | 2 | 3 | 4 | 5 | 6 | |------------------------|----|----|----|----|----|----|----| | auipc x10 0x10000 | IF | ID | EX | MEM| WB | | | | addi x10 x10 37 | | IF | ID | EX | MEM| WB | | | jal x1 48 <run_clz_test>| | | IF | ID | EX | MEM| WB | | auipc x10 0x10000 | | | | IF | ID | | | | addi x10 x10 29 | | | | | IF | | | | Instruction | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |------------------------------|----|----|----|----|----|----|----|----|----|----|----| | addi x2 x2 -8 | IF | ID | EX | MEM| WB | | | | | | | | sw x1 0 x2 | | IF | ID | EX | MEM| WB | | | | | | | lw x11 0 x10 | | | IF | ID | EX | MEM| WB | | | | | | beq x11 x0 32 <handle_zero> | | | | IF | ID | - | EX | MEM| WB | | | | addi x10 x11 0 | | | | | IF | - | ID | EX | MEM| WB | | | jal x1 104 <my_clz> | | | | | | | IF | ID | EX | MEM| WB | | addi x6 x10 0 | | | | | | | | IF | ID | | | | addi x10 x11 0 | | | | | | | | | IF | | | #### The loop pipeline diagram ![image](https://hackmd.io/_uploads/rJOdeI5Jyg.png) :::warning TODO: Rewrite `my_clz` without loops. ::: ## CLZ with Binary Search Approach This function uses a binary search approach to calculate the number of leading zeros in the input integer. * The input integer is repeatedly shifted left, and the AND operation is used to check whether the higher bits are 0. * Each time the higher bits are found to be 0, the corresponding offset (16, 8, 4, 2, 1) is added to the counter. ```assemly .data str_1: .string "\nThe leading zero of " str_2: .string " is " str_undefined: .string "undefined" test_data_1: .word 19 # 00000000000000000000000000010011 test_data_2: .word 4096 # 00000000000000000001000000000000 test_data_3: .word 1073741824 # 01000000000000000000000000000000 test_data_4: .word 0 # The number 0 (to test undefined behavior) .text .globl main main: # Test case 1 la a0, test_data_1 jal ra, run_clz_test # Test case 2 la a0, test_data_2 jal ra, run_clz_test # Test case 3 la a0, test_data_3 jal ra, run_clz_test # Test case 4 (Undefined behavior for 0) la a0, test_data_4 jal ra, run_clz_test # Exit the program li a7, 10 ecall # run_clz_test function: Executes the my_clz function and prints the result for a given test case run_clz_test: addi sp, sp, -8 # Adjust the stack pointer to make space for saving registers sw ra, 0(sp) # Save return address to the stack lw a1, 0(a0) # Load the test data into a1 beqz a1, handle_zero # If test data is 0, go to handle_zero mv a0, a1 # Move test data to a0 for my_clz input jal ra, my_clz # Call my_clz function mv t1, a0 # Save the my_clz result (leading zero count) to t1 mv a0, a1 # Restore test data to a0 mv a1, t1 # Pass the leading zero count as a1 jal ra, printResult # Call printResult function j restore_stack # Jump to restore stack # Handle case where input is zero (undefined behavior) handle_zero: mv t0, a1 # t0 = 0 la a0, str_1 # Load "The leading zero of " string li a7, 4 # System call code for printing a string ecall # Print "The leading zero of " li a0, 0 li a7, 1 # System call code for printing an integer ecall # Print 0 la a0, str_2 li a7, 4 ecall # Print " is " la a0, str_undefined li a7, 4 ecall # Print "undefined" j restore_stack # Jump to restore stack restore_stack: lw ra, 0(sp) # Restore return address from the stack addi sp, sp, 8 # Restore the stack pointer ret # Return to caller # my_clz function: Counts the number of leading zeros in an integer using Binary Search Approach my_clz: li t0, 0 # Initialize counter t0 = 0 li t1, 0xFFFF0000 # t1 = 0xFFFF0000 and t2, a0, t1 # t2 = a0 & 0xFFFF0000 beqz t2, clz_high16_zero # If high 16 bits are zero, skip j clz_check_high8 clz_high16_zero: addi t0, t0, 16 # t0 += 16 slli a0, a0, 16 # a0 <<= 16 clz_check_high8: li t1, 0xFF000000 # t1 = 0xFF000000 and t2, a0, t1 # t2 = a0 & 0xFF000000 beqz t2, clz_high8_zero # If next 8 bits are zero, skip j clz_check_high4 clz_high8_zero: addi t0, t0, 8 # t0 += 8 slli a0, a0, 8 # a0 <<= 8 clz_check_high4: li t1, 0xF0000000 # t1 = 0xF0000000 and t2, a0, t1 # t2 = a0 & 0xF0000000 beqz t2, clz_high4_zero # If next 4 bits are zero, skip j clz_check_high2 clz_high4_zero: addi t0, t0, 4 # t0 += 4 slli a0, a0, 4 # a0 <<= 4 clz_check_high2: li t1, 0xC0000000 # t1 = 0xC0000000 and t2, a0, t1 # t2 = a0 & 0xC0000000 beqz t2, clz_high2_zero # If next 2 bits are zero, skip j clz_check_high1 clz_high2_zero: addi t0, t0, 2 # t0 += 2 slli a0, a0, 2 # a0 <<= 2 clz_check_high1: li t1, 0x80000000 # t1 = 0x80000000 and t2, a0, t1 # t2 = a0 & 0x80000000 beqz t2, clz_high1_zero # If highest bit is zero, skip j clz_done clz_high1_zero: addi t0, t0, 1 # t0 += 1 slli a0, a0, 1 # a0 <<= 1 clz_done: mv a0, t0 # Move result to a0 jr ra # Return to caller # printResult function: Displays the result "The leading zero of <test_data> is <leading zero count>" printResult: mv t0, a0 # Save original input value (test_data) in temporary register t0 mv t1, a1 # Save leading zero count in temporary register t1 la a0, str_1 li a7, 4 ecall # Print the string "The leading zero of " mv a0, t0 li a7, 1 ecall # Print the test_data la a0, str_2 li a7, 4 ecall # Print the string " is " mv a0, t1 li a7, 1 ecall # Print the leading zero count ret # Return to the caller ``` ### Output The leading zero of 19 is 27 The leading zero of 4096 is 19 The leading zero of 1073741824 is 1 The leading zero of 0 is undefined Program exited with code: 0 ### Performance | Metric | Value | | -------- | -------- | | Cycles | 330 | | Instrs. retired | 208 | | CPI | 1.59 | | IPC | 0.63 | | Clock rate | 9.71 Hz | ### Advantage The execution time is fixed, It will not differ because of the different number of digits in the highest bit ### Pipeline diagram | Instruction | 0 | 1 | 2 | 3 | 4 | 5 | 6 | |------------------------------|----|----|----|----|----|----|----| | auipc x10 0x10000 | IF | ID | EX | MEM| WB | | | | addi x10 x10 37 | | IF | ID | EX | MEM| WB | | | jal x1 48 <run_clz_test> | | | IF | ID | EX | MEM| WB | | auipc x10 0x10000 | | | | IF | ID | | | | addi x10 x10 29 | | | | | IF | | | | Instruction | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |------------------------------|----|----|----|----|----|----|----|----|----|----|----| | addi x2 x2 -8 | IF | ID | EX | MEM| WB | | | | | | | | sw x1 0 x2 | | IF | ID | EX | MEM| WB | | | | | | | lw x11 0 x10 | | | IF | ID | EX | MEM| WB | | | | | | beq x11 x0 32 <handle_zero> | | | | IF | ID | - | EX | MEM| WB | | | | addi x10 x11 0 | | | | | IF | - | ID | EX | MEM| WB | | | jal x1 104 <my_clz> | | | | | | | IF | ID | EX | MEM| WB | | addi x6 x10 0 | | | | | | | | IF | ID | | | | addi x10 x11 0 | | | | | | | | | IF | | | #### Without loop pipeline diagram ![image](https://hackmd.io/_uploads/SkeD7wIJ1l.png) ![image](https://hackmd.io/_uploads/r1eK7PI1Jg.png) ## LeetCode 991: Broken Calculator 1. Reverse Operations: Start from the target and progressively transform it into the startValue, calculating the required number of operations. 1. Bitwise Operations: Perform right shifts (equivalent to division by 2) for even numbers, and add 1 for odd numbers. ```c .data str_start: .string "\nStartValue = " str_target: .string ", Target = " str_output: .string ", Minimum operations needed: " # Test data start_1: .word 2 target_1: .word 3 # 00000000000000000000000000000011 start_2: .word 5 target_2: .word 4096 # 00000000000000000001000000000000 start_3: .word 3 target_3: .word 1073741824 # 01000000000000000000000000000000 .text .globl main main: # Test case 1 la a0, start_1 la a1, target_1 jal ra, run_test # Test case 2 la a0, start_2 la a1, target_2 jal ra, run_test # Test case 3 la a0, start_3 la a1, target_3 jal ra, run_test # Exit the program li a7, 10 ecall # run_test function: Executes min_operations function and prints the result # Inputs: # a0: Address of startValue # a1: Address of target run_test: addi sp, sp, -12 # Allocate stack space for saved registers sw ra, 8(sp) # Save return address sw s0, 4(sp) # Save s0 sw s1, 0(sp) # Save s1 lw s0, 0(a0) # Load startValue into s0 lw s1, 0(a1) # Load target into s1 mv a0, s0 # Move startValue to a0 mv a1, s1 # Move target to a1 jal ra, min_operations # Call min_operations function mv t0, s0 # Save startValue to t0 mv t1, s1 # Save target to t1 mv t2, a0 # Save result (operations) to t2 jal ra, printResult # Call printResult function lw ra, 8(sp) # Restore return address lw s0, 4(sp) # Restore s0 lw s1, 0(sp) # Restore s1 addi sp, sp, 12 # Deallocate stack space ret # min_operations function: Calculates the minimum number of operations # Inputs: # a0: startValue # a1: target # Output: # a0: Minimum number of operations min_operations: addi sp, sp, -8 # Allocate stack space for saved registers sw s0, 4(sp) # Save s0 sw s1, 0(sp) # Save s1 mv s0, a0 # s0 = startValue mv s1, a1 # s1 = target li t0, 0 # t0 = operations min_loop: bge s0, s1, min_loop_end # If startValue >= target, exit loop andi t1, s1, 1 # t1 = target % 2 beqz t1, min_even # If t1 == 0 (even), go to min_even # target is odd addi s1, s1, 1 # target += 1 addi t0, t0, 1 # operations += 1 j min_loop min_even: srai s1, s1, 1 # target = target / 2 addi t0, t0, 1 # operations += 1 j min_loop min_loop_end: sub t1, s0, s1 # t1 = startValue - target add t0, t0, t1 # operations += (startValue - target) mv a0, t0 # a0 = operations lw s0, 4(sp) # Restore s0 lw s1, 0(sp) # Restore s1 addi sp, sp, 8 # Deallocate stack space ret # printResult function: Prints startValue, target, and minimum operations # Inputs: # t0: startValue # t1: target # t2: Minimum operations printResult: la a0, str_start # Print "\nStartValue = " li a7, 4 ecall mv a0, t0 # Print startValue li a7, 1 ecall la a0, str_target # Print ", Target = " li a7, 4 ecall mv a0, t1 # Print target li a7, 1 ecall la a0, str_output # Print ", Minimum operations needed: " li a7, 4 ecall mv a0, t2 # Print minimum operations li a7, 1 ecall ret ``` ### Output StartValue = 2, Target = 3, Minimum operations needed: 2 StartValue = 5, Target = 4096, Minimum operations needed: 11 StartValue = 3, Target = 1073741824, Minimum operations needed: 30 Program exited with code: 0 ### Performance | Metric | Value | | -------- | -------- | | Cycles | 330 | | Instrs. retired | 208 | | CPI | 1.59 | | IPC | 0.63 | | Clock rate | 9.71 Hz | ### Expected to be optimized with clz Originally I wanted to optimize that when the test case is a power of 2, it can quickly move to the right without moving bit by bit. As a result, the related code has to be expanded, so the performance is even worse.