Assignment1 - HackMD

# Assignment1: RISC-V Assembly and Instruction Pipeline **Topic:function of checking if Bitwise OR Has Trailing Zeros** ## topic discribtion and motivation * **discribtion** Return true if it is possible to select two or more elements whose bitwise OR has trailing zeros, return false otherwise * **the function application** In RISC-V instruction set, having trailing zeros in a piece of data can be an indicator that the data is not an executable instruction. This property arises because, in the RISC-V instruction format, the opcode (operation code) typically occupies the least significant 7 bits, and for standard RISC-V instructions, the least significant bit of the opcode is always set to 1. Thus, if a data word has a trailing zero (i.e., its least significant bit is 0), it can be inferred that it is unlikely to be a valid instruction opcode. This application is particularly useful during instruction decoding, where it helps distinguish between actual instructions and other types of data, such as immediate values, memory addresses, or control data. By quickly checking for trailing zeros, a processor or decoding logic can filter out non-instruction data and focus processing power on valid instructions only, which can enhance efficiency. ## C code * #### Function-based programming version1(O(n^2)) ``` #include <stdbool.h> bool CheckTrailingZeros(int* nums, int numsSize) { for (int i = 0; i < numsSize; i++) { for (int j = i + 1; j < numsSize; j++) { int bitwiseOr = nums[i] | nums[j]; if ((bitwiseOr & 1) == 0) { return true; } } } return false; } int main() { int nums[] = {1, 2, 3, 4, 5}; int numsSize = 5; bool result = CheckTrailingZeros(nums, numsSize); return result; } ``` version2(O(n)) ``` #include <stdbool.h> bool CheckTrailingZeros(int* nums, int numsSize) { int countTrailingZeros = 0; for (int i = 0; i < numsSize; i++) { if ((nums[i] & 1) == 0) { countTrailingZeros++; } } if (countTrailingZeros >= 2) { return true; } return false; } int main() { int nums[] = {1, 2, 3, 4, 5}; int numsSize = 5; bool result = CheckTrailingZeros(nums, numsSize); return result; } ``` * #### Non-functional programming ``` #include <stdio.h> #include <stdbool.h> #include <unistd.h> #include <stdlib.h> int main() { int i; int *nums; int arraysize=0; printf("please enter an arraysize\n"); scanf("%d",&arraysize); printf("now array size is:%d\n",arraysize); nums = (int*)malloc(arraysize * sizeof(int)); if (nums == NULL) { printf("Memory allocation failed!\n"); return 1; } printf("please enter an array\n"); for(int i=0;i<arraysize;i++){ scanf("%d",&nums[i]); printf("%d\n",nums[i]); printf("%d\n",arraysize); } printf("nums is "); for(int i=0;i<arraysize;i++){ printf("%d ",nums[i]); } printf("\n"); for(int i=0;i<arraysize;i++){ for(int j=i+1;j<arraysize;j++){ int bitwiseOr=nums[i]|nums[j]; printf("%d\n",bitwiseOr); if ((bitwiseOr&0b1)== 0){ printf("it has trailing zero\n"); goto cleanup; } else{ printf("no trailing zero\n"); } } } cleanup: free(nums); return 0; } ``` ## Assembly code * #### C_v1_direct conversion ``` .data #static region nums: .word 1,2,3,4,5 #testdata0 numsSize: .word 5 #testdata0 newline: .string "\n" .text .globl main main: #def a0 a1 la a0, nums lw a1, numsSize #function call jal ra, checktrailingzero li a7, 1 mv a1, a0 ecall li a7, 4 la a0, newline ecall li a7, 10 ecall checktrailingzero: addi sp,sp,-8 #adjust stack for 2 items sw s1,4(sp) # save s1 value in case covered by function sw s0,0(sp) # save s0 value in case covered by function add s0,x0,x0 #int bitwise0r=0 add a2,x0,x0 #i=0 add t0,a0,x0 #save num[0] in t0 add a6,a0,x0 # Set pointers for nums[i]# a6 = &nums[i] (nums[i] pointer) loop1: bge a2,a1,loop1_done #for(int i ; i < numsSize;) addi a3,a2,1 #j=i+1 addi a7,a6,4 # a7 = &nums[i+1] (nums[j] pointer) loop2: bge a3,a1,loop2_done #for(int j ; j < numsSize;) lw a4,0(a6) #a4=num[i] pointer lw a5,0(a7) #a5=num[j] pointer or s0,a4,a5 # int bitwiseOr = nums[i] | nums[j]; andi s1,s0,1 #def (bitwiseOr & 1)=s1 beq s1,x0,findtrailingzero j unfoundtrailingzero findtrailingzero: li a0,1 # return true ret unfoundtrailingzero: addi a3,a3,1 #j++ addi a7,a7,4 #&nums[j]pointnext j loop2 loop2_done: addi a2,a2,1 #i++ addi a6,a6,4 #&num[i] pointnext j loop1 loop1_done: lw s0,0(sp) # return the s0 original value to s0 lw s1,4(sp) # return the s1 original value to s1 li a0,0 #return false jr ra ``` * #### C_v2_direct conversion_v1 ``` .data #static region nums: .word 1,2,3,4,5 #testdata0 numsSize: .word 5 #testdata0 newline: .string "\n" .text .globl main main: #def a0 a1 la a0, nums lw a1, numsSize #function call jal ra, checktrailingzero li a7, 1 mv a1, a0 ecall li a7, 4 la a0, newline ecall li a7, 10 ecall checktrailingzero: addi sp,sp,-4 # push stack for 1 sw s0,0(sp) # save s0 add s0,x0,x0 # int countTrailingZeros = 0 add a2,x0,x0 # i = 0 loop1: bge a2,a1,loop1_done # if (i >= numsSize) ,loop done # check if nums[i] is even slli t0,a2,2 # (i * 4) add t1,a0,t0 # calculate nums[i] address toward t1 lw t2,0(t1) # load nums[i] toward t2 andi t2,t2,1 # nums[i] & 1 bne t2,x0,skip_count # if nums[i] odd，skip_count addi s0,s0,1 # countTrailingZeros++ skip_count: addi a2,a2,1 # i++ j loop1 # next loop loop1_done: li t0,2 # make to=2(imm) bge s0,t0,return_true # if (countTrailingZeros >= 2) return true li a0,0 # return false (a0 = 0) j function_done return_true: li a0,1 # return true (a0 = 1) function_done: lw s0,0(sp) # reload s0_original addi sp,sp,4 # stack back jr ra # ret ``` * eliminate data hazard * load_use hazard:reschedule instruction and insert the independent instruction between two successively dependent instruction * #### C_v2_direct conversion_v2 ``` .data #static region nums: .word 1,2,3,4,5 #testdata0 numsSize: .word 5 #testdata0 newline: .string "\n" .text .globl main main: #def a0 a1 la a0, nums lw a1, numsSize #function call jal ra, checktrailingzero li a7, 1 #print mv a1, a0 ecall #system call li a7, 4 la a0, newline ecall li a7, 10 ecall checktrailingzero: add s0,x0,x0 # int countTrailingZeros = 0 addi sp,sp,-4 # push stack for 1 add a2,x0,x0 # i = 0 sw s0,0(sp) # save s0 loop1: bge a2,a1,loop1_done # if (i >= numsSize) ,loop done # check if nums[i] is even slli t0,a2,2 # (i * 4) lw t2,0(t1) # load nums[i] toward t2 add t1,a0,t0 # calculate nums[i] address toward t1 andi t2,t2,1 # nums[i] & 1 bne t2,x0,skip_count # if nums[i] odd，skip_count addi s0,s0,1 # countTrailingZeros++ skip_count: addi a2,a2,1 # i++ j loop1 # next loop loop1_done: li t0,2 # make t0=2(imm) bge s0,t0,return_true # if (countTrailingZeros >= 2) return true li a0,0 # return false (a0 = 0) j function_done return_true: li a0,1 # return true (a0 = 1) function_done: lw s0,0(sp) # reload s0_original addi sp,sp,4 # stack back jr ra # ret ``` * #### C_v2_direct conversion_v3 ``` .data #static region nums: .word 1,2,3,4,5 #testdata0 numsSize: .word 5 #testdata0 newline: .string "\n" .text .globl main main: #def a0 a1 la a0, nums lw a1, numsSize #function call jal ra, checktrailingzero li a7, 1 #print mv a1, a0 ecall #system call li a7, 4 la a0, newline ecall li a7, 10 ecall checktrailingzero: add s0,x0,x0 # int countTrailingZeros = 0 addi sp,sp,-4 # push stack for 1 sw s0,0(sp) # save s0 add a2,x0,x0 # i = 0 loop1: bge a2,a1,loop1_done # if (i >= numsSize) ,loop done nop nop # check if nums[i] is even slli t0,a2,2 # (i * 4) lw t2,0(t1) # load nums[i] toward t2 add t1,a0,t0 # calculate nums[i] address toward t1 andi t2,t2,1 # nums[i] & 1 bne t2,x0,skip_count # if nums[i] odd，skip_count nop nop addi s0,s0,1 # countTrailingZeros++ skip_count: addi a2,a2,1 # i++ j loop1 # next loop loop1_done: li t0,2 # make t0=2(imm) bge s0,t0,return_true # if (countTrailingZeros >= 2) return true nop nop li a0,0 # return false (a0 = 0) j function_done return_true: li a0,1 # return true (a0 = 1) function_done: lw s0,0(sp) # reload s0_original addi sp,sp,4 # stack back jr ra # ret ``` * eliminate control hazard : * reason:in the pipeline,following jump and branch format instructions we often see flush or NOP appear. This occurs because the processor is waiting to determine whether the branch condition is met. ![image](https://hackmd.io/_uploads/HylIrRHTkke.png) * resolution: 1. **stall**:insert NOP following branch or jump ![image](https://hackmd.io/_uploads/ry9NuIak1l.png) ![image](https://hackmd.io/_uploads/HJbhqIpkke.png) ![image](https://hackmd.io/_uploads/Hy-nuLpJyx.png) ![image](https://hackmd.io/_uploads/rJPw5IaJye.png) ![image](https://hackmd.io/_uploads/BJT6FLpkye.png) ![image](https://hackmd.io/_uploads/HkVuFI61kl.png) 3. **branch prediction**: * Explanation of Branch Prediction: In the example above, we use the bne instruction to determine whether numbers[i] is odd. If the number is odd, the program jumps to skip; otherwise, it continues to execute the addition. When running this code, the CPU needs to predict whether the bne instruction will result in a jump.Without Branch Prediction:When the CPU encounters the bne instruction, it must stop and wait to evaluate the result of numbers[i] & 1. * Branch Prediction Example Analysis:Let’s assume the CPU uses a static branch prediction strategy (e.g., always assuming no jump). Below, we analyze two different scenarios based on the input data: Scenario 1: The Array Contains Mostly Even Numbers numbers = [2, 4, 6, 8, 10] Static Prediction: Assume the branch will never be taken. Prediction Success: Most of the bne branches are not taken, and the pipeline continues without stalling. Effect: Overall performance improves. Scenario 2: The Array Contains Mostly Odd Numbers numbers = [1, 3, 5, 7, 9] Static Prediction: Assume the branch will never be taken. Prediction Failure: The bne branch needs to jump every time, forcing the CPU to flush the pipeline and reload the instructions. Effect: Overall performance decreases. ## test data | test | nums_input | output | | ------------- | ---------- | ------ | | 1. |[1,2,3,4,5] | true | | 2. | [2,4,8,16] | true | | 3. | [1,3,5,7,9] | false | ## Result * #### C test1:![image](https://hackmd.io/_uploads/BkhAJgekke.png) test2:![image](https://hackmd.io/_uploads/HJTZgxg1Je.png) test3:![image](https://hackmd.io/_uploads/BknRelgkyx.png) * #### Assembly test1:![image](https://hackmd.io/_uploads/Bkoc7lg1Jx.png) test2:![image](https://hackmd.io/_uploads/rkBhXgxy1g.png) test3:![image](https://hackmd.io/_uploads/rkXCXlekye.png) ## Analysis 1. ### C_v1_ripes vs C_v2_ripes * #### C_v1_ripes compiling ![image](https://hackmd.io/_uploads/r1nlQBqyyx.png) * #### C_v2_ripes compiling ![image](https://hackmd.io/_uploads/Bk6G7SqyJx.png) from IPC、Cycles,we can know C_v2 is more efficient than C_v1 2. ### C_v1_compiling vs C_v2_mecompiling * #### C_v1_mecompiling ![image](https://hackmd.io/_uploads/Hyl-8S5kJe.png) cycles/IPC =204.855 * #### C_v2_mecompiling)_v1 ![image](https://hackmd.io/_uploads/Sy1CrH9Jyx.png) cycles/IPC =167.72 ### To optimize the c_v2_mecompiling #### 1.optimize data hazard * #### C_v2_mecompiling_v2 ![image](https://hackmd.io/_uploads/rJJJZVTkye.png) #### 2.optimize control hazard * #### C_v2_mecompiling_v3 ![image](https://hackmd.io/_uploads/r1OSs8akkl.png) ## Reference *[MIT 6.004 L15: Introduction to Pipelining](https:/https://www.youtube.com/watch?v=5NQkhqZe8_8/) *[MIT 6.004 L16: Processor Pipelining](https:/https://www.youtube.com/watch?v=TMpjvAvQCWA/) *[cs61c Lectures](https://) *[Ripes: Teaching Computer Architecture Through Visual and Interactive Simulators](https:/https://www.youtube.com/watch?v=yEw-S8J-LkI&t=730s/)