Assignment1: RISC-V Assembly and Instruction Pipeline

# Assignment1: RISC-V Assembly and Instruction Pipeline :::danger Check the requirements carefully. ::: ## 1342. Number of Steps to Reduce a Number to Zero Given an integer num, return the number of steps to reduce it to zero. In one step, if the current number is even, you have to divide it by 2, otherwise, you have to subtract 1 from it. **Example** : **Input**: num = 14 **Output**: 6 **Explanation**: Step 1) 14 is even; divide by 2 and obtain 7. Step 2) 7 is odd; subtract 1 and obtain 6. Step 3) 6 is even; divide by 2 and obtain 3. Step 4) 3 is odd; subtract 1 and obtain 2. Step 5) 2 is even; divide by 2 and obtain 1. Step 6) 1 is odd; subtract 1 and obtain 0. Constraints: `0 <= num <= 10^6` ## Solution First, we use simulation to solve this question directly.Implement the program according to the description of the problem. ```c int numberOfSteps(int num) { int count=0; while(num!=0){ if(num%2==0){//number is even num/=2; } else{//number is odd num-=1; } count++; } return count; } ``` If we want to change above C code to asseblely, we will face one problem how to implement `%` and `/` .In order to implement those operator, I use long division with clz mentioned in quiz1. ```c static inline int my_clz(uint32_t x) { int count = 0; for (int i = 31; i >= 0; --i) { if (x & (1U << i)) break; count++; } return count; } ``` In long division, the first thing should be done is align the MSB of two number. Find the MSB of two number and substract each other to get the shift number of divisor. Then, clz is a efficient way to find the MSB of a number. ``` .data number: .word 10000 str1: .string "The steps to Reduce a Number to Zero of " # First part of the output message str2: .string " is " # Second part of the output message .text main: lw a1, number # Load the argument (10000) into register a1 jal ra, numberOfSteps # Jump-and-link to the 'numberOfSteps' function mv a1, a0 # Move the result (count) from a0 to a1 for printing lw a0, number # Reload input argument(10000) to a0 for printing jal ra, printResult # Call the function to print the result li a7, 10 # System call code for exiting the program ecall # Make the exit system call # numberOfSteps # a1: Input argument (number) # a0: Output argument (count) numberOfSteps: addi sp, sp, -4 sw ra, 0(sp) li a0, 0 # Set count number to 0 li t0, 2 # Save dividor(2) in temporary register t0 start: beqz a1, end # If number is 0, count won't be 0. prevent this condition mv t1, a1 # Save dividend(number) in temporary register t1 bgt t0, t1, ldend # If dividend is less than divisor, skip longdivision # CLZ of input number li t2, 31 # i = 31 li t3, 1 # Save a constant(1) in temporary register t3 li t4, 0 # count = 0 clzloop: sll t5, t3, t2 # x & (1U << i) and t6, t5, t1 # Save result in temporary register t6 bge t6, t3, longdivision # If t6 is greater than 1 or equal to 1, break the loop addi t4, t4, 1 # count++ addi t2, t2, -1 # --i j clzloop longdivision: # number divide 2 li t1, 30 # CLZ of 2 is 30 sub t5, t1, t4 # Difference between clz of divisor and clz of dividend sll t2, t0, t5 # Align the most significant digit of divisor and dividend li t3, 0 # Set quotient in temporary register t3 li t4, 0 # Record number of divisor shift mv t1, a1 # Restore number to temporary register t1 ldloop: bgt t2, t1, skip # If divisor is larger than dividend, skip substraction sub t1, t1, t2 # Dividend substract divisor addi t3, t3, 1 # Set the LSB of quotient to 1 skip: beq t4, t5, ldend # If number of divisor shift is eqaul to number of alignment shift slli t3, t3, 1 # Shift left quotient srli t2, t2, 1 # Shift Right divisor addi t4, t4, 1 # The number of divisor shift add 1 j ldloop ldend: beqz t1, even # If t1(remainder) is equal to 0, number is even.Otherwise, it's odd addi a1, a1, -1 # If t1(remainder) is odd, number - 1 addi a0, a0, 1 # count add 1 j start even: mv a1, t3 # If t1(remainder) is even, number % 2(t3 is the quotient of the longdivision) addi a0, a0, 1 # count add 1 j start end: lw ra, 0(sp) addi sp, sp, 4 ret # This function prints the factorial result in the format: # a0: The original input value (number) # a1: The number of Steps to Reduce a Number to Zero (count) printResult: mv t0, a0 # Save original input value (X) in temporary register t0 mv t1, a1 # Save factorial result (Y) in temporary register t1 la a0, str1 # Load the address of the first string ("Factorial value of ") li a7, 4 # System call code for printing a string ecall # Print the string mv a0, t0 # Move the original input value (X) to a0 for printing li a7, 1 # System call code for printing an integer ecall # Print the integer (X) la a0, str2 # Load the address of the second string (" is ") li a7, 4 # System call code for printing a string ecall # Print the string mv a0, t1 # Move the factorial result (Y) to a0 for printing li a7, 1 # System call code for printing an integer ecall # Print the integer (Y) ret # Return to the caller ``` :::danger Use fewer instructions. ::: After simulation on Ripes, the result is in following picture ![螢幕擷取畫面 2024-10-14 172809](https://hackmd.io/_uploads/SkuiYn511x.png) Look like there's something for improvement. I find this code spend a lot time on clz funtion. As long as the input number is smaller, the search time of clz will be more. In order to reduce time of clz funtion, I use bit mask way to implement clz funtion. ```c int my_clz(uint32_t x) { if (x == 0) return 32; int n = 0; if (x <= 0x0000FFFF) { n += 16; x <<= 16; } if (x <= 0x00FFFFFF) { n += 8; x <<= 8; } if (x <= 0x0FFFFFFF) { n += 4; x <<= 4; } if (x <= 0x3FFFFFFF) { n += 2; x <<= 2; } if (x <= 0x7FFFFFFF) { n += 1; x <<= 1; } return n; } ``` ```c # CLZ of input number li t4, 0 # n = 0 li t2, 0x0000FFFF # Save 0x0000FFFF in temporary register t2 bgt t1, t2, S1 # If t1 is greater than t2, skip following instrcution addi t4, t4, 16 # n += 16 slli t1, t1, 16 # x <<= 16 S1: li t2, 0x00FFFFFF # Save 0x00FFFFFF in temporary register t2 bgt t1, t2, S2 # If t1 is greater than t2, skip following instrcution addi t4, t4, 8 # n += 8 slli t1, t1, 8 # n += 8 S2: li t2, 0x0FFFFFFF # Save 0x0FFFFFFF in temporary register t2 bgt t1, t2, S3 # If t1 is greater than t2, skip following instrcution addi t4, t4, 4 # n += 8 slli t1, t1, 4 # n += 8 S3: li t2, 0x3FFFFFFF # Save 0x3FFFFFFF in temporary register t2 bgt t1, t2, S4 # If t1 is greater than t2, skip following instrcution addi t4, t4, 2 slli t1, t1, 2 S4: li t2, 0x7FFFFFFF # Save 0x7FFFFFFF in temporary register t2 bgt t1, t2, longdivision addi t4, t4, 1 ``` The result is in the following picture. Compare to the oridinal version, the cycle time is reduced significantly. ![螢幕擷取畫面 2024-10-14 172501](https://hackmd.io/_uploads/B146Y25yke.png) ### Version 2 ```c int numberOfSteps(int num) { if(num==0) return 0; //if number is 0, count won't be 0. prevent this condition int count = 0; while (num!=0) { count += (num & 0x1) + 1; num >>= 1; } return count - 1; } ``` ```c .data number: .word 10000 str1: .string "The steps to Reduce a Number to Zero of " # First part of the output message str2: .string " is " # Second part of the output message .text main: lw a1, number jal ra, numberOfSteps mv a1, a0 lw a0, number jal ra, printResult li a7, 10 # System call code for exiting the program ecall # Make the exit system call # numberOfSteps # a1: Input argument (number) # a0: Output argument (count) numberOfSteps: addi sp, sp, -4 sw ra, 0(sp) li a0, 0 #set count nunber to 0 start: beqz a1, end andi t0, a1, 0x1 addi t0, t0, 1 add a0, t0, a0 srli a1, a1, 1 j start end: addi a0, a0, -1 lw ra, 0(sp) addi sp, sp, 4 ret # This function prints the factorial result in the format: # a0: The original input value (number) # a1: The number of Steps to Reduce a Number to Zero (count) printResult: mv t0, a0 # Save original input value (X) in temporary register t0 mv t1, a1 # Save factorial result (Y) in temporary register t1 la a0, str1 # Load the address of the first string ("Factorial value of ") li a7, 4 # System call code for printing a string ecall # Print the string mv a0, t0 # Move the original input value (X) to a0 for printing li a7, 1 # System call code for printing an integer ecall # Print the integer (X) la a0, str2 # Load the address of the second string (" is ") li a7, 4 # System call code for printing a string ecall # Print the string mv a0, t1 # Move the factorial result (Y) to a0 for printing li a7, 1 # System call code for printing an integer ecall # Print the integer (Y) ret # Return to the caller ``` ![螢幕擷取畫面 2024-10-14 172835](https://hackmd.io/_uploads/By1CK39Jyl.png) ### Version 3 ```c int length(uint32_t num) { uint32_t clz = 0; if ((num >> 16) == 0) { clz += 16; num <<= 16; } if ((num >> 24) == 0) { clz += 8; num <<= 8; } if ((num >> 28) == 0) { clz += 4; num <<= 4; } if ((num >> 30) == 0) { clz += 2; num <<= 2; } if ((num >> 31) == 0) { clz += 1; } return 32 - clz; } int count(uint32_t num) { num = (num & 0x55555555) + ((num >> 1) & 0x55555555); num = (num & 0x33333333) + ((num >> 2) & 0x33333333); num = (num & 0x0F0F0F0F) + ((num >> 4) & 0x0F0F0F0F); num = (num & 0x00FF00FF) + ((num >> 8) & 0x00FF00FF); num = (num & 0x0000FFFF) + ((num >> 16) & 0x0000FFFF); return num; } int numberOfSteps(uint32_t num) { if(num==0) return 0; //if number is 0, count won't be 0. prevent this condition return length(num) - 1 + count(num); } ``` ```c .data number: .word 10000 str1: .string "The steps to Reduce a Number to Zero of " # First part of the output message str2: .string " is " # Second part of the output message .text main: lw a1, number jal ra, numberOfSteps mv a1, a0 lw a0, number jal ra, printResult li a7, 10 # System call code for exiting the program ecall # Make the exit system call # numberOfSteps # a1: Input argument (number) # a0: Output argument (count) numberOfSteps: addi sp, sp, -4 sw ra, 0(sp) beqz a1, end # if number is 0, count won't be 0. prevent this condition # length funtion li a0, 0 # set clz to 0 mv t0, a1 # Save original input value (number) in temporary register t0 mv t1, a1 # Save original input value (number) in temporary register t0 srli t0, t0, 16 bne t0, x0, S1 addi a0, a0, 16 slli t1, t1, 16 S1: mv t0, t1 srli t0, t0, 24 bne t0, x0, S2 addi a0, a0, 8 slli t1, t1, 8 S2: mv t0, t1 srli t0, t0, 28 bne t0, x0, S3 addi a0, a0, 4 slli t1, t1, 4 S3: mv t0, t1 srli t0, t0, 30 bne t0, x0, S4 addi a0, a0, 2 slli t1, t1, 2 S4: mv t0, t1 srli t0, t0, 31 bne t0, x0, S5 addi a0, a0, 1 S5: li t1, 32 sub a0, t1, a0 # 32 - clz # count funtion li t1, 0x55555555 and t2, a1, t1 srli a1, a1, 1 and a1, a1, t1 add a1, a1, t2 li t1, 0x33333333 and t2, a1, t1 srli a1, a1, 2 and a1, a1, t1 add a1, a1, t2 li t1, 0x0F0F0F0F and t2, a1, t1 srli a1, a1, 4 and a1, a1, t1 add a1, a1, t2 li t1, 0x00FF00FF and t2, a1, t1 srli a1, a1, 8 and a1, a1, t1 add a1, a1, t2 li t1, 0x0000FFFF and t2, a1, t1 srli a1, a1, 16 and a1, a1, t1 add a1, a1, t2 # length(num) - 1 + count(num) add a0, a0, a1 addi a0, a0, -1 end: lw ra, 0(sp) addi sp, sp, 4 ret # This function prints the factorial result in the format: # a0: The original input value (number) # a1: The number of Steps to Reduce a Number to Zero (count) printResult: mv t0, a0 # Save original input value (X) in temporary register t0 mv t1, a1 # Save factorial result (Y) in temporary register t1 la a0, str1 # Load the address of the first string ("Factorial value of ") li a7, 4 # System call code for printing a string ecall # Print the string mv a0, t0 # Move the original input value (X) to a0 for printing li a7, 1 # System call code for printing an integer ecall # Print the integer (X) la a0, str2 # Load the address of the second string (" is ") li a7, 4 # System call code for printing a string ecall # Print the string mv a0, t1 # Move the factorial result (Y) to a0 for printing li a7, 1 # System call code for printing an integer ecall # Print the integer (Y) ret # Return to the caller ``` ![螢幕擷取畫面 2024-10-14 173457](https://hackmd.io/_uploads/SJOAK29ykl.png)