# Assignment1: RISC-V Assembly and Instruction Pipeline
## Problem Description: N-Repated Element in Size $2N$ Array
Given one interger array called **nums**:
- It has $N+1$ uniques number
- The length of array: $2N$
- There has **exactly one** element is repeat **N** times.
Return the repeated element.
e.g. Array: [1,2,3,4,4,4]
- The length of array: 6
- N = 3
- Repeated element is repeat 3 times
Ans: 4
:::info
**[e.g.](https://www.aje.com/arc/editing-tip-using-eg-and-ie/)** is the abbreviation for the Latin phrase exempli gratia, meaning "for example."
:notes: jserv
:::
## Problem Analysis
Only one element can be repeated.Consider 3 cases, i is the index of current element:
- Case 1: abcddd, d are all continuous repeated.(or dddabc,abdddc)
> 1. We can compare two adjacent number to find the repeated elements.(compare $A_i$ and $A_{i+1}$)
- Case 2: dadbdc, d are all discontinuous repeated.(or adbdcd)
> 2. We can compare the element after one (compare $A_i$ and $A_{i+1}$) to find the
- Case 3: other
> 3. Case 3 must combined in case 1 and case 2,some of repeated elements are continuous repeat,some of repeated elements are discontinuous repeat.(ddadbc -> dda|dbc)
## C code
Check all element in the array $A$, if the element $A_i$ is repeated element, following event are true:
> $A_i = A_{i+1}$ or $A_i = A_{i+1}$
```c
#include <stdio.h>
int repeatedNTimes(int* nums,int numsSize)
{
for(int i=0;i<numsSize;i++)
{
if(nums[i]==nums[(i+1)%numsSize] || nums[i]==nums[(i+2)%numsSize])
return nums[i];
}
return -1;
}
int main()
{
int a[]={5,1,5,2,5,3,5,4};
int numsSize=8;
printf("[");
for(int i=0;i<numsSize;i++)
printf("%d ",a[i]);
printf("]\n");
int repeatedNum=repeatedNTimes(a,numsSize);
printf("%d ",repeatedNum);
return 0;
}
```
## Assembly Code
These codes not only calculate repeated number but also print the input arrays.
```assembly
.data
nums: .word 5,1,5,2,5,3,5,4
numsSize: .word 8
str1: .string "["
str2: .string "]\n"
str3: .string " "
str4: .string "Repeated number is "
.text
main:
la a1,nums
lw a2,numsSize
jal Check
mv a4,a0
jal ra,Print #Print Array
la a0,str4
li a7,4
ecall
mv a0,a4 #print Answer
li a7,1
ecall
li a7,10
ecall
Print: # a1:array a2: numSize
li t0,0
la a0,str1 #print started
li a7,4
ecall
mv a3,a1
PrintFor:
lw a0,(0)a3 # print number
li a7,1
ecall
la a0,str3 # print space
li a7,4
ecall
addi a3,a3,4
addi t0,t0,1
bne t0,a2,PrintFor
PrintEnd:
la a0,str2 #print ended
li a7,4
ecall
jr ra
Check: # return a0 by answer
# t0 t1 t2 i ,i+1 ,i+2
# t3 t4 t5 number of Ai Ai+1 Ai+2
# t6 :4
# a3 pointer to current number
#------ForStart--------------------------------
li t0,0
li t6,4
CheckFor:
addi t1,t0,1
addi t2,t0,2
#-------getNumberi-------------------------------
mul t3,t0,t6 # t3 = t0*4
add a3,a1,t3 # a3= a1+t3 offset calculation
lw t3,(0)a3 # t3= A[i]
#-------getNumberi+1------------------------------
rem t1,t1,a2
mul t4,t1,t6 # t4 = t1*4
add a3,a1,t4 # a3= a1+t4 offset calculation
lw t4,(0)a3 # t4= A[i+1]
#-------getNumberi+2------------------------------
rem t2,t2,a2
mul t5,t2,t6 # t5 = t2*4
add a3,a1,t5 # a3= a1+t5 offset calculation
lw t5,(0)a3 # t5= A[i+2]
#----check A[i]==A[i+1] or A[i]==A[i+2]----------------
beq t3,t4,ReturnAns
beq t3,t5,ReturnAns
addi t0,t0,1 #endFor i++
blt t0,a2,CheckFor #Check i<numsSize
ReturnAns:
add a0,t3,zero
jr ra
```
:::warning
:warning: What if your microprocessor lacks of `REM` instruction?
:notes: jserv
:::
:::info
If microprocessor lacks of `REM` instruction, there are two alternative solution:
1. add the branch if the i+1, i+2 is exceed the array
```assembly
# rem t1,t1,a2
blt t1,a2,Continue
sub t1,t1,a2
Continue:
...
```
2. Avoid to calculating the REM instruction from C code and assembly(both).Check the starting or ending of the array before loop.I prefer this solution than first solution because not need to check the array in loop.
```c=
int repeatedNTimes(int* nums, int numsSize){
if(numsSize>=3)
{
if(nums[0]==nums[numsSize-1] || nums[0]==nums[numsSize-2])
return nums[0];
if(nums[1]==nums[numsSize-1] || nums[1]==nums[0])
return nums[1];
}
for(int i=2;i<numsSize;i++)
if(nums[i]==nums[i-1] || nums[i]==nums[i-2])
return nums[i];
return -1;
}
```
assembly code in Check
```assembly
Check:
# return a0 by answer
# t0 i
# t3 t4 t5 number of Ai Ai+1 Ai+2
# a3 pointer to current number
#------Start--------------------------------
mv t4,a2 #t4= numsSize
addi t4,t4,-1 #t4=numsSize-1
slli t4,t4,2 #t4= (numsSize-1) pointer offset
addi t3,zero,3 # 3
blt a2,t3,SkipCompare
# if(nums[0]==nums[numsSize-1] || nums[0]==nums[numsSize-2])
lw t3,(0)a1 # t0=A[0]
add a3,a1,t4
lw t4,(0)a3
lw t5,(-4)a3
beq t3,t4,ReturnAns
beq t3,t5,ReturnAns
# if(nums[1]==nums[numsSize-1] || nums[1]==nums[0])
lw t3,(4)a1
lw t5,(0)a1
beq t3,t4,ReturnAns
beq t3,t5,ReturnAns
#---------loop Start------
li t0,2 #Start From 2
CheckFor:
#-------getNumber-------------------------------
slli t3,t0,2 # t3 = t0*4 t3 :i
add a3,a1,t3 # a3= a1+t0 pointer calculation
lw t3,(0)a3 # t3= A[i]
lw t4,(-4)a3
lw t5,(-8)a3
#----check A==A[i+1] or A==A[i+2]----------------
beq t3,t4,ReturnAns
beq t3,t5,ReturnAns
addi t0,t0,1 #endFor i++
blt t0,a2,CheckFor #Check i<numsSize
ReturnAns:
add a0,t3,zero
jr ra
SkipCompare:
lw t3,(0)a1
jr ra
```
:::
These code can be split to three parts:
- main
- Loading the input argument and jump to specific label such as "Check"(N-repeated number) or "Print"(print arrays), finally print the N-repeated number
- Print
- Print the array, it has a loop(PrintFor) to execute print A[i] and other like [] symbol.
- Check
- Check label do the N-repeated number calculation.
- First, it find the number of A[i],A[i+1],A[i+2].The pointer was calculate by
```assembly
mul t3,t0,t6 # t6=4
#t3 is calculate the offset of pointer
#but we also can using slli t3,t0,2
#left shift to replace line above
add a3,a1,t3
#a3=a1+t3=address of A[0]+offset
#calculating the pointer to A[i]
lw t3,(0)a3 #t3=A[i], get the number of Ai
#load word
```
- Same process also for A[i+1] and A[i+2] but more 1 instructions
```assembly
rem t2,t2,a2
#To calculate (i+2)%n, n is size of array,
#using the start of
#the array offset (i+2)%n to instead.
```
- Condition check A[i]==A[i+1] or A[i]==A[i+2] go to ReturnAns label, and also execute some instuction(like i++,i<numsSize) for loop execution.
```assembly
beq t3,t4,ReturnAns
beq t4,t5,ReturnAns
addi t0,t0,1 #endFor i++
blt t0,a3,ReturnAns #Check i<numsSize
j CheckFor
‵`
## Result
Input array and Repeated number:



## Ripes simulator
Single Cycle

A Instruction will be decode and decide to which unit result to be using and what to do.
- Is it need to jump to specific instruction?
- Is it need to write the register?
- Is it a immediately instruaction like addi?
- Is it a branch instruction?
- Is it write the data memory?
#### Example
addi x11 x11 0

- Register Write enable(addi **x11**,x11,0)
- op1 is x11(addi x11,**x11**,0)
- op2 is 0(Immdiate)

- no Branch taken

- no write any data

- ALU result need to write to register x11

If we only do the instruction one by one, is the single cycle RISC-V Processor.If we hope to get higher performance, processor do instruction pipelining.
Instruction pipelining means the every parts do the difference instruction to speed up the process such like the "pipeline" of the factory.

Let's we try to explain the how the instruction to execute in 5-stage RISC-V
### 5-stage RISC-V processor

We commonly split the pipeline to 5-stage:

- **IF(Instruction Fetch)**
- Reading instruction from memory,and calculation next PC(Program Counter) will be.(PC increment by 4,or jump to specific instruction(branch/jump))
- **ID(Instruction Decode)**
- Reading register file(prepare for Ex stage)
- Reading immediately value
- **EX(Execute)**
- Actual computation(OR,XOR,Addition...etc)
- If need to jump other instruction, the result go to PC(IF stage parts)
- **MEM(Memory Access)**
- Memory access if needed
- **WB(Write back)**
- Write results into register file.
## Hazard
Pipelining can speed up the instruction processing, but it might solve the following problem:
- **Control Hazard**(Branch Hazard)
- If we need to jump other instruction, but that is some instruction that are next instruction before jump.
```assembly
addi t0,t0,1 #endFor i++
blt t0,a3,CheckFor #Check i<numsSize
ReturnAns:
add a0,t3,zero
```
- If "blt" result is jump to CheckFor label, but other stage of pipelining doing the "add a0,t3,zero" instruction, it will get the wrong result.

- **Data Hazard**
- When the instructions need to modify data in differrent stage such as
```assembly
mul t3,t0,t6 # t3 = t0*4 t3 :i
add a3,a1,t3 # a3= a1+t0 pointer calculation
```
- called RAW(Read after Write).Other data hazard such like WAR,WAW.
- When RAW occuring, "add a3,a1,t3" can be executed before "mul t3,t0,t6" WriteBack, so "add a3,a1,t3" get the wrong "t3".
## Harzard solving
This part we will talk about how to solve harzard in 5-stage RISC-V processor.
### Control Harzard
Continue the concept of control hazards

"add x10,x28,x0" in ID stage
"blt x5,x13,-64<CheckFor>" in EX stage
In this time, the result in branch taken in "blt...".

Using Ripes simulator visualization tools, we can observe the result of the ALU(EX stage) pass though the IF stage(PC input).Let's see what happen in next cycle.

These wrong instruction will be "**nop(flush)**" in pipeline stage, avoiding the hazard occurs.
- **IF(Instruction Fetch)**
This stage not only read the next instruction but also check the Ex stage result if that has branch/jump called. It need to call other stage to flush the wrong instruction(nop(flush)).
- **EX(Execution)**
This stage calculate the result, when the result is branch taken,it need to pass signal to IF stage.
### Data Hazard
In RAW(Read After Write)

```assembly
mul t3,t0,t6 # mul x28,x5,x31
add a3,a1,t3 # add x13,x11,x28 calculation
```
The result of the "mul x28,x5,x31" is in MeM stage,but "add x13,x11,x28" is in Ex stage, the t28 in "add x13,x11,x28" will read value "before write version".

We can observed that the ALU result of the "mul x29,x5,x31" also pass through the Op2 of "add x13,x11,x28" without stall called foward & by passing.
We can check about difference of single cycle processor and 5-stage processor.

We can find the forward & by passing unit in the 5-stage processor(Ex stage), it will receive(yellow line) the MeM stage(calculation results) if the RAW occurs.
- **MeM(Memory Access)**
If the RAW or other data hazard occurs, Mem stage will pass the caculation result to EX stage(input signals) without stall.
**Load-Use Example**
```assembly
lw t5,(0)a3 # t5= A[i+2]
#----check A==A[i+1] or A==A[i+2]----------------
beq t3,t5,ReturnAns
beq t3,t4,ReturnAns
```
In this example, we exchange the "beq t3,t5,ReturnAns" and "beq t3,t4,ReturnAns" to ensure the problem ocurrs.(load t5 immdiate use)

We can observe the result of x30 will stall one cycle("nop(stall)") until meet "beq..." and the result pass through the Ex stage(For calculation) and ID stage(For write in register).
- In **Load Use** example, the result will stall one cycle(WB stage) and pass through Ex and ID stage.
- **ID(Instruction Decode)**
- not differrence between single cycle and 5-stage processor
- store the data which WB stage passing to specific register, and output signal to Ex stage
- **WB(Write Back)**
- In load-use example, WB stage need to pass signals to both ID stage and Ex stage to avoid hazard occurs.
- Other situation only pass the result to ID stage write registers.
## Conclusion
In 5-stage RISC-V processor section, we talk about how to split the single cycle processor to 5 stage pipelining.
- IF(Instruction Fetch)
- ID(InstructionDecode)
- Ex(Execution)
- MeM(Memory Access)
- WB(Write Back)
And how the 5-stage solve pipelining problems(branch occurs, data dependencies...) with signals passing(Forward & bypassing...).
## Reference
[Instruction pipelining - Wikipedia](https://en.wikipedia.org/wiki/Instruction_pipelining)
[Classic RISC pipeline - Wikipedia](https://en.wikipedia.org/wiki/Classic_RISC_pipeline)
[Hazard (computer architecture) - Wikipedia](https://en.wikipedia.org/wiki/Hazard_(computer_architecture))