# Assignment1: RISC-V Assembly and Instruction Pipeline
**Topic:function of checking if Bitwise OR Has Trailing Zeros**
## topic discribtion and motivation
* **discribtion**
Return true if it is possible to select two or more elements whose bitwise OR has trailing zeros, return false otherwise
* **the function application**
In RISC-V instruction set, having trailing zeros in a piece of data can be an indicator that the data is not an executable instruction. This property arises because, in the RISC-V instruction format, the opcode (operation code) typically occupies the least significant 7 bits, and for standard RISC-V instructions, the least significant bit of the opcode is always set to 1. Thus, if a data word has a trailing zero (i.e., its least significant bit is 0), it can be inferred that it is unlikely to be a valid instruction opcode.
This application is particularly useful during instruction decoding, where it helps distinguish between actual instructions and other types of data, such as immediate values, memory addresses, or control data. By quickly checking for trailing zeros, a processor or decoding logic can filter out non-instruction data and focus processing power on valid instructions only, which can enhance efficiency.
## C code
* #### Function-based programming
version1(O(n^2))
```
#include <stdbool.h>
bool CheckTrailingZeros(int* nums, int numsSize) {
for (int i = 0; i < numsSize; i++) {
for (int j = i + 1; j < numsSize; j++) {
int bitwiseOr = nums[i] | nums[j];
if ((bitwiseOr & 1) == 0) {
return true;
}
}
}
return false;
}
int main() {
int nums[] = {1, 2, 3, 4, 5};
int numsSize = 5;
bool result = CheckTrailingZeros(nums, numsSize);
return result;
}
```
version2(O(n))
```
#include <stdbool.h>
bool CheckTrailingZeros(int* nums, int numsSize) {
int countTrailingZeros = 0;
for (int i = 0; i < numsSize; i++) {
if ((nums[i] & 1) == 0) {
countTrailingZeros++;
}
}
if (countTrailingZeros >= 2) {
return true;
}
return false;
}
int main() {
int nums[] = {1, 2, 3, 4, 5};
int numsSize = 5;
bool result = CheckTrailingZeros(nums, numsSize);
return result;
}
```
* #### Non-functional programming
```
#include <stdio.h>
#include <stdbool.h>
#include <unistd.h>
#include <stdlib.h>
int main()
{
int i;
int *nums;
int arraysize=0;
printf("please enter an arraysize\n");
scanf("%d",&arraysize);
printf("now array size is:%d\n",arraysize);
nums = (int*)malloc(arraysize * sizeof(int));
if (nums == NULL) {
printf("Memory allocation failed!\n");
return 1;
}
printf("please enter an array\n");
for(int i=0;i<arraysize;i++){
scanf("%d",&nums[i]);
printf("%d\n",nums[i]);
printf("%d\n",arraysize);
}
printf("nums is ");
for(int i=0;i<arraysize;i++){
printf("%d ",nums[i]);
}
printf("\n");
for(int i=0;i<arraysize;i++){
for(int j=i+1;j<arraysize;j++){
int bitwiseOr=nums[i]|nums[j];
printf("%d\n",bitwiseOr);
if ((bitwiseOr&0b1)== 0){
printf("it has trailing zero\n");
goto cleanup;
}
else{
printf("no trailing zero\n");
}
}
}
cleanup:
free(nums);
return 0;
}
```
## Assembly code
* #### C_v1_direct conversion
```
.data #static region
nums: .word 1,2,3,4,5 #testdata0
numsSize: .word 5 #testdata0
newline: .string "\n"
.text
.globl main
main:
#def a0 a1
la a0, nums
lw a1, numsSize
#function call
jal ra, checktrailingzero
li a7, 1
mv a1, a0
ecall
li a7, 4
la a0, newline
ecall
li a7, 10
ecall
checktrailingzero:
addi sp,sp,-8 #adjust stack for 2 items
sw s1,4(sp) # save s1 value in case covered by function
sw s0,0(sp) # save s0 value in case covered by function
add s0,x0,x0 #int bitwise0r=0
add a2,x0,x0 #i=0
add t0,a0,x0 #save num[0] in t0
add a6,a0,x0 # Set pointers for nums[i]# a6 = &nums[i] (nums[i] pointer)
loop1:
bge a2,a1,loop1_done #for(int i ; i < numsSize;)
addi a3,a2,1 #j=i+1
addi a7,a6,4 # a7 = &nums[i+1] (nums[j] pointer)
loop2:
bge a3,a1,loop2_done #for(int j ; j < numsSize;)
lw a4,0(a6) #a4=num[i] pointer
lw a5,0(a7) #a5=num[j] pointer
or s0,a4,a5 # int bitwiseOr = nums[i] | nums[j];
andi s1,s0,1 #def (bitwiseOr & 1)=s1
beq s1,x0,findtrailingzero
j unfoundtrailingzero
findtrailingzero:
li a0,1 # return true
ret
unfoundtrailingzero:
addi a3,a3,1 #j++
addi a7,a7,4 #&nums[j]pointnext
j loop2
loop2_done:
addi a2,a2,1 #i++
addi a6,a6,4 #&num[i] pointnext
j loop1
loop1_done:
lw s0,0(sp) # return the s0 original value to s0
lw s1,4(sp) # return the s1 original value to s1
li a0,0 #return false
jr ra
```
* #### C_v2_direct conversion_v1
```
.data #static region
nums: .word 1,2,3,4,5 #testdata0
numsSize: .word 5 #testdata0
newline: .string "\n"
.text
.globl main
main:
#def a0 a1
la a0, nums
lw a1, numsSize
#function call
jal ra, checktrailingzero
li a7, 1
mv a1, a0
ecall
li a7, 4
la a0, newline
ecall
li a7, 10
ecall
checktrailingzero:
addi sp,sp,-4 # push stack for 1
sw s0,0(sp) # save s0
add s0,x0,x0 # int countTrailingZeros = 0
add a2,x0,x0 # i = 0
loop1:
bge a2,a1,loop1_done # if (i >= numsSize) ,loop done
# check if nums[i] is even
slli t0,a2,2 # (i * 4)
add t1,a0,t0 # calculate nums[i] address toward t1
lw t2,0(t1) # load nums[i] toward t2
andi t2,t2,1 # nums[i] & 1
bne t2,x0,skip_count # if nums[i] odd,skip_count
addi s0,s0,1 # countTrailingZeros++
skip_count:
addi a2,a2,1 # i++
j loop1 # next loop
loop1_done:
li t0,2 # make to=2(imm)
bge s0,t0,return_true # if (countTrailingZeros >= 2) return true
li a0,0 # return false (a0 = 0)
j function_done
return_true:
li a0,1 # return true (a0 = 1)
function_done:
lw s0,0(sp) # reload s0_original
addi sp,sp,4 # stack back
jr ra # ret
```
* eliminate data hazard
* load_use hazard:reschedule instruction and insert the independent instruction between two successively dependent instruction
* #### C_v2_direct conversion_v2
```
.data #static region
nums: .word 1,2,3,4,5 #testdata0
numsSize: .word 5 #testdata0
newline: .string "\n"
.text
.globl main
main:
#def a0 a1
la a0, nums
lw a1, numsSize
#function call
jal ra, checktrailingzero
li a7, 1 #print
mv a1, a0
ecall #system call
li a7, 4
la a0, newline
ecall
li a7, 10
ecall
checktrailingzero:
add s0,x0,x0 # int countTrailingZeros = 0
addi sp,sp,-4 # push stack for 1
add a2,x0,x0 # i = 0
sw s0,0(sp) # save s0
loop1:
bge a2,a1,loop1_done # if (i >= numsSize) ,loop done
# check if nums[i] is even
slli t0,a2,2 # (i * 4)
lw t2,0(t1) # load nums[i] toward t2
add t1,a0,t0 # calculate nums[i] address toward t1
andi t2,t2,1 # nums[i] & 1
bne t2,x0,skip_count # if nums[i] odd,skip_count
addi s0,s0,1 # countTrailingZeros++
skip_count:
addi a2,a2,1 # i++
j loop1 # next loop
loop1_done:
li t0,2 # make t0=2(imm)
bge s0,t0,return_true # if (countTrailingZeros >= 2) return true
li a0,0 # return false (a0 = 0)
j function_done
return_true:
li a0,1 # return true (a0 = 1)
function_done:
lw s0,0(sp) # reload s0_original
addi sp,sp,4 # stack back
jr ra # ret
```
* #### C_v2_direct conversion_v3
```
.data #static region
nums: .word 1,2,3,4,5 #testdata0
numsSize: .word 5 #testdata0
newline: .string "\n"
.text
.globl main
main:
#def a0 a1
la a0, nums
lw a1, numsSize
#function call
jal ra, checktrailingzero
li a7, 1 #print
mv a1, a0
ecall #system call
li a7, 4
la a0, newline
ecall
li a7, 10
ecall
checktrailingzero:
add s0,x0,x0 # int countTrailingZeros = 0
addi sp,sp,-4 # push stack for 1
sw s0,0(sp) # save s0
add a2,x0,x0 # i = 0
loop1:
bge a2,a1,loop1_done # if (i >= numsSize) ,loop done
nop
nop
# check if nums[i] is even
slli t0,a2,2 # (i * 4)
lw t2,0(t1) # load nums[i] toward t2
add t1,a0,t0 # calculate nums[i] address toward t1
andi t2,t2,1 # nums[i] & 1
bne t2,x0,skip_count # if nums[i] odd,skip_count
nop
nop
addi s0,s0,1 # countTrailingZeros++
skip_count:
addi a2,a2,1 # i++
j loop1 # next loop
loop1_done:
li t0,2 # make t0=2(imm)
bge s0,t0,return_true # if (countTrailingZeros >= 2) return true
nop
nop
li a0,0 # return false (a0 = 0)
j function_done
return_true:
li a0,1 # return true (a0 = 1)
function_done:
lw s0,0(sp) # reload s0_original
addi sp,sp,4 # stack back
jr ra # ret
```
* eliminate control hazard :
* reason:in the pipeline,following jump and branch format instructions we often see flush or NOP appear. This occurs because the processor is waiting to determine whether the branch condition is met.

* resolution:
1. **stall**:insert NOP following branch or jump






3. **branch prediction**:
* Explanation of Branch Prediction: In the example above, we use the bne instruction to determine whether numbers[i] is odd. If the number is odd, the program jumps to skip; otherwise, it continues to execute the addition. When running this code, the CPU needs to predict whether the bne instruction will result in a jump.Without Branch Prediction:When the CPU encounters the bne instruction, it must stop and wait to evaluate the result of numbers[i] & 1.
* Branch Prediction Example Analysis:Let’s assume the CPU uses a static branch prediction strategy (e.g., always assuming no jump). Below, we analyze two different scenarios based on the input data:
Scenario 1: The Array Contains Mostly Even Numbers
numbers = [2, 4, 6, 8, 10]
Static Prediction: Assume the branch will never be taken.
Prediction Success: Most of the bne branches are not taken, and the pipeline continues without stalling.
Effect: Overall performance improves.
Scenario 2: The Array Contains Mostly Odd Numbers
numbers = [1, 3, 5, 7, 9]
Static Prediction: Assume the branch will never be taken.
Prediction Failure: The bne branch needs to jump every time, forcing the CPU to flush the pipeline and reload the instructions.
Effect: Overall performance decreases.
## test data
| test | nums_input | output |
| ------------- | ---------- | ------ |
| 1. |[1,2,3,4,5] | true |
| 2. | [2,4,8,16] | true |
| 3. | [1,3,5,7,9] | false |
## Result
* #### C
test1:
test2:
test3:
* #### Assembly
test1:
test2:
test3:
## Analysis
1. ### C_v1_ripes vs C_v2_ripes
* #### C_v1_ripes compiling

* #### C_v2_ripes compiling

from IPC、Cycles,we can know C_v2 is more efficient than C_v1
2. ### C_v1_compiling vs C_v2_mecompiling
* #### C_v1_mecompiling

cycles/IPC =204.855
* #### C_v2_mecompiling)_v1

cycles/IPC =167.72
### To optimize the c_v2_mecompiling
#### 1.optimize data hazard
* #### C_v2_mecompiling_v2

#### 2.optimize control hazard
* #### C_v2_mecompiling_v3

## Reference
*[MIT 6.004 L15: Introduction to Pipelining](https:/https://www.youtube.com/watch?v=5NQkhqZe8_8/)
*[MIT 6.004 L16: Processor Pipelining](https:/https://www.youtube.com/watch?v=TMpjvAvQCWA/)
*[cs61c Lectures](https://)
*[Ripes: Teaching Computer Architecture Through Visual and Interactive Simulators](https:/https://www.youtube.com/watch?v=yEw-S8J-LkI&t=730s/)