# Assignment1: RISC-V Assembly and Instruction Pipeline ## Problem Description: N-Repated Element in Size $2N$ Array Given one interger array called **nums**: - It has $N+1$ uniques number - The length of array: $2N$ - There has **exactly one** element is repeat **N** times. Return the repeated element. e.g. Array: [1,2,3,4,4,4] - The length of array: 6 - N = 3 - Repeated element is repeat 3 times Ans: 4 :::info **[e.g.](https://www.aje.com/arc/editing-tip-using-eg-and-ie/)** is the abbreviation for the Latin phrase exempli gratia, meaning "for example." :notes: jserv ::: ## Problem Analysis Only one element can be repeated.Consider 3 cases, i is the index of current element: - Case 1: abcddd, d are all continuous repeated.(or dddabc,abdddc) > 1. We can compare two adjacent number to find the repeated elements.(compare $A_i$ and $A_{i+1}$) - Case 2: dadbdc, d are all discontinuous repeated.(or adbdcd) > 2. We can compare the element after one (compare $A_i$ and $A_{i+1}$) to find the - Case 3: other > 3. Case 3 must combined in case 1 and case 2,some of repeated elements are continuous repeat,some of repeated elements are discontinuous repeat.(ddadbc -> dda|dbc) ## C code Check all element in the array $A$, if the element $A_i$ is repeated element, following event are true: > $A_i = A_{i+1}$ or $A_i = A_{i+1}$ ```c #include <stdio.h> int repeatedNTimes(int* nums,int numsSize) { for(int i=0;i<numsSize;i++) { if(nums[i]==nums[(i+1)%numsSize] || nums[i]==nums[(i+2)%numsSize]) return nums[i]; } return -1; } int main() { int a[]={5,1,5,2,5,3,5,4}; int numsSize=8; printf("["); for(int i=0;i<numsSize;i++) printf("%d ",a[i]); printf("]\n"); int repeatedNum=repeatedNTimes(a,numsSize); printf("%d ",repeatedNum); return 0; } ``` ## Assembly Code These codes not only calculate repeated number but also print the input arrays. ```assembly .data nums: .word 5,1,5,2,5,3,5,4 numsSize: .word 8 str1: .string "[" str2: .string "]\n" str3: .string " " str4: .string "Repeated number is " .text main: la a1,nums lw a2,numsSize jal Check mv a4,a0 jal ra,Print #Print Array la a0,str4 li a7,4 ecall mv a0,a4 #print Answer li a7,1 ecall li a7,10 ecall Print: # a1:array a2: numSize li t0,0 la a0,str1 #print started li a7,4 ecall mv a3,a1 PrintFor: lw a0,(0)a3 # print number li a7,1 ecall la a0,str3 # print space li a7,4 ecall addi a3,a3,4 addi t0,t0,1 bne t0,a2,PrintFor PrintEnd: la a0,str2 #print ended li a7,4 ecall jr ra Check: # return a0 by answer # t0 t1 t2 i ,i+1 ,i+2 # t3 t4 t5 number of Ai Ai+1 Ai+2 # t6 :4 # a3 pointer to current number #------ForStart-------------------------------- li t0,0 li t6,4 CheckFor: addi t1,t0,1 addi t2,t0,2 #-------getNumberi------------------------------- mul t3,t0,t6 # t3 = t0*4 add a3,a1,t3 # a3= a1+t3 offset calculation lw t3,(0)a3 # t3= A[i] #-------getNumberi+1------------------------------ rem t1,t1,a2 mul t4,t1,t6 # t4 = t1*4 add a3,a1,t4 # a3= a1+t4 offset calculation lw t4,(0)a3 # t4= A[i+1] #-------getNumberi+2------------------------------ rem t2,t2,a2 mul t5,t2,t6 # t5 = t2*4 add a3,a1,t5 # a3= a1+t5 offset calculation lw t5,(0)a3 # t5= A[i+2] #----check A[i]==A[i+1] or A[i]==A[i+2]---------------- beq t3,t4,ReturnAns beq t3,t5,ReturnAns addi t0,t0,1 #endFor i++ blt t0,a2,CheckFor #Check i<numsSize ReturnAns: add a0,t3,zero jr ra ``` :::warning :warning: What if your microprocessor lacks of `REM` instruction? :notes: jserv ::: :::info If microprocessor lacks of `REM` instruction, there are two alternative solution: 1. add the branch if the i+1, i+2 is exceed the array ```assembly # rem t1,t1,a2 blt t1,a2,Continue sub t1,t1,a2 Continue: ... ``` 2. Avoid to calculating the REM instruction from C code and assembly(both).Check the starting or ending of the array before loop.I prefer this solution than first solution because not need to check the array in loop. ```c= int repeatedNTimes(int* nums, int numsSize){ if(numsSize>=3) { if(nums[0]==nums[numsSize-1] || nums[0]==nums[numsSize-2]) return nums[0]; if(nums[1]==nums[numsSize-1] || nums[1]==nums[0]) return nums[1]; } for(int i=2;i<numsSize;i++) if(nums[i]==nums[i-1] || nums[i]==nums[i-2]) return nums[i]; return -1; } ``` assembly code in Check ```assembly Check: # return a0 by answer # t0 i # t3 t4 t5 number of Ai Ai+1 Ai+2 # a3 pointer to current number #------Start-------------------------------- mv t4,a2 #t4= numsSize addi t4,t4,-1 #t4=numsSize-1 slli t4,t4,2 #t4= (numsSize-1) pointer offset addi t3,zero,3 # 3 blt a2,t3,SkipCompare # if(nums[0]==nums[numsSize-1] || nums[0]==nums[numsSize-2]) lw t3,(0)a1 # t0=A[0] add a3,a1,t4 lw t4,(0)a3 lw t5,(-4)a3 beq t3,t4,ReturnAns beq t3,t5,ReturnAns # if(nums[1]==nums[numsSize-1] || nums[1]==nums[0]) lw t3,(4)a1 lw t5,(0)a1 beq t3,t4,ReturnAns beq t3,t5,ReturnAns #---------loop Start------ li t0,2 #Start From 2 CheckFor: #-------getNumber------------------------------- slli t3,t0,2 # t3 = t0*4 t3 :i add a3,a1,t3 # a3= a1+t0 pointer calculation lw t3,(0)a3 # t3= A[i] lw t4,(-4)a3 lw t5,(-8)a3 #----check A==A[i+1] or A==A[i+2]---------------- beq t3,t4,ReturnAns beq t3,t5,ReturnAns addi t0,t0,1 #endFor i++ blt t0,a2,CheckFor #Check i<numsSize ReturnAns: add a0,t3,zero jr ra SkipCompare: lw t3,(0)a1 jr ra ``` ::: These code can be split to three parts: - main - Loading the input argument and jump to specific label such as "Check"(N-repeated number) or "Print"(print arrays), finally print the N-repeated number - Print - Print the array, it has a loop(PrintFor) to execute print A[i] and other like [] symbol. - Check - Check label do the N-repeated number calculation. - First, it find the number of A[i],A[i+1],A[i+2].The pointer was calculate by ```assembly mul t3,t0,t6 # t6=4 #t3 is calculate the offset of pointer #but we also can using slli t3,t0,2 #left shift to replace line above add a3,a1,t3 #a3=a1+t3=address of A[0]+offset #calculating the pointer to A[i] lw t3,(0)a3 #t3=A[i], get the number of Ai #load word ``` - Same process also for A[i+1] and A[i+2] but more 1 instructions ```assembly rem t2,t2,a2 #To calculate (i+2)%n, n is size of array, #using the start of #the array offset (i+2)%n to instead. ``` - Condition check A[i]==A[i+1] or A[i]==A[i+2] go to ReturnAns label, and also execute some instuction(like i++,i<numsSize) for loop execution. ```assembly beq t3,t4,ReturnAns beq t4,t5,ReturnAns addi t0,t0,1 #endFor i++ blt t0,a3,ReturnAns #Check i<numsSize j CheckFor ‵` ## Result Input array and Repeated number: ![](https://i.imgur.com/p1qRapr.png) ![](https://i.imgur.com/I9nIhBG.png) ![](https://i.imgur.com/QPwFoSd.png) ## Ripes simulator Single Cycle ![](https://i.imgur.com/nwA4XcV.png) &nbsp;&nbsp;A Instruction will be decode and decide to which unit result to be using and what to do. - Is it need to jump to specific instruction? - Is it need to write the register? - Is it a immediately instruaction like addi? - Is it a branch instruction? - Is it write the data memory? #### Example addi x11 x11 0 ![](https://i.imgur.com/Cxn6WqU.png) - Register Write enable(addi **x11**,x11,0) - op1 is x11(addi x11,**x11**,0) - op2 is 0(Immdiate) ![](https://i.imgur.com/4uTKA2r.png) - no Branch taken ![](https://i.imgur.com/d77aeU6.png) - no write any data ![](https://i.imgur.com/ZHABoEo.png) - ALU result need to write to register x11 ![](https://i.imgur.com/bEqw9ei.png) &nbsp;&nbsp;If we only do the instruction one by one, is the single cycle RISC-V Processor.If we hope to get higher performance, processor do instruction pipelining. &nbsp;&nbsp;Instruction pipelining means the every parts do the difference instruction to speed up the process such like the "pipeline" of the factory. ![](https://i.imgur.com/frNc1NJ.png) &nbsp; &nbsp;&nbsp;Let's we try to explain the how the instruction to execute in 5-stage RISC-V ### 5-stage RISC-V processor ![](https://i.imgur.com/QVs4HfX.png) We commonly split the pipeline to 5-stage: ![](https://i.imgur.com/kYHayP7.png) - **IF(Instruction Fetch)** - Reading instruction from memory,and calculation next PC(Program Counter) will be.(PC increment by 4,or jump to specific instruction(branch/jump)) - **ID(Instruction Decode)** - Reading register file(prepare for Ex stage) - Reading immediately value - **EX(Execute)** - Actual computation(OR,XOR,Addition...etc) - If need to jump other instruction, the result go to PC(IF stage parts) - **MEM(Memory Access)** - Memory access if needed - **WB(Write back)** - Write results into register file. ## Hazard &nbsp;&nbsp;Pipelining can speed up the instruction processing, but it might solve the following problem: - **Control Hazard**(Branch Hazard) - If we need to jump other instruction, but that is some instruction that are next instruction before jump. ```assembly addi t0,t0,1 #endFor i++ blt t0,a3,CheckFor #Check i<numsSize ReturnAns: add a0,t3,zero ``` - If "blt" result is jump to CheckFor label, but other stage of pipelining doing the "add a0,t3,zero" instruction, it will get the wrong result. ![](https://i.imgur.com/R8WSDIS.png) - **Data Hazard** - When the instructions need to modify data in differrent stage such as ```assembly mul t3,t0,t6 # t3 = t0*4 t3 :i add a3,a1,t3 # a3= a1+t0 pointer calculation ``` - called RAW(Read after Write).Other data hazard such like WAR,WAW. - When RAW occuring, "add a3,a1,t3" can be executed before "mul t3,t0,t6" WriteBack, so "add a3,a1,t3" get the wrong "t3". ## Harzard solving &nbsp;&nbsp;This part we will talk about how to solve harzard in 5-stage RISC-V processor. ### Control Harzard &nbsp;&nbsp;Continue the concept of control hazards ![](https://i.imgur.com/tNba2U2.png) "add x10,x28,x0" in ID stage "blt x5,x13,-64<CheckFor>" in EX stage In this time, the result in branch taken in "blt...". ![](https://i.imgur.com/teqtfKg.png) &nbsp;&nbsp;Using Ripes simulator visualization tools, we can observe the result of the ALU(EX stage) pass though the IF stage(PC input).Let's see what happen in next cycle. ![](https://i.imgur.com/1nFU3hA.png) These wrong instruction will be "**nop(flush)**" in pipeline stage, avoiding the hazard occurs. - **IF(Instruction Fetch)** This stage not only read the next instruction but also check the Ex stage result if that has branch/jump called. It need to call other stage to flush the wrong instruction(nop(flush)). - **EX(Execution)** This stage calculate the result, when the result is branch taken,it need to pass signal to IF stage. ### Data Hazard In RAW(Read After Write) ![](https://i.imgur.com/BVMAogU.png) ```assembly mul t3,t0,t6 # mul x28,x5,x31 add a3,a1,t3 # add x13,x11,x28 calculation ``` &nbsp;&nbsp;The result of the "mul x28,x5,x31" is in MeM stage,but "add x13,x11,x28" is in Ex stage, the t28 in "add x13,x11,x28" will read value "before write version". ![](https://i.imgur.com/Z58N0Lu.png) &nbsp;&nbsp;We can observed that the ALU result of the "mul x29,x5,x31" also pass through the Op2 of "add x13,x11,x28" without stall called foward & by passing. &nbsp;&nbsp;We can check about difference of single cycle processor and 5-stage processor. ![](https://i.imgur.com/f9AccrJ.png) &nbsp;&nbsp;We can find the forward & by passing unit in the 5-stage processor(Ex stage), it will receive(yellow line) the MeM stage(calculation results) if the RAW occurs. - **MeM(Memory Access)** If the RAW or other data hazard occurs, Mem stage will pass the caculation result to EX stage(input signals) without stall. **Load-Use Example** ```assembly lw t5,(0)a3 # t5= A[i+2] #----check A==A[i+1] or A==A[i+2]---------------- beq t3,t5,ReturnAns beq t3,t4,ReturnAns ``` In this example, we exchange the "beq t3,t5,ReturnAns" and "beq t3,t4,ReturnAns" to ensure the problem ocurrs.(load t5 immdiate use) ![](https://i.imgur.com/bfeKNKD.png) We can observe the result of x30 will stall one cycle("nop(stall)") until meet "beq..." and the result pass through the Ex stage(For calculation) and ID stage(For write in register). - In **Load Use** example, the result will stall one cycle(WB stage) and pass through Ex and ID stage. - **ID(Instruction Decode)** - not differrence between single cycle and 5-stage processor - store the data which WB stage passing to specific register, and output signal to Ex stage - **WB(Write Back)** - In load-use example, WB stage need to pass signals to both ID stage and Ex stage to avoid hazard occurs. - Other situation only pass the result to ID stage write registers. ## Conclusion &nbsp;&nbsp;In 5-stage RISC-V processor section, we talk about how to split the single cycle processor to 5 stage pipelining. - IF(Instruction Fetch) - ID(InstructionDecode) - Ex(Execution) - MeM(Memory Access) - WB(Write Back) &nbsp;&nbsp;And how the 5-stage solve pipelining problems(branch occurs, data dependencies...) with signals passing(Forward & bypassing...). ## Reference [Instruction pipelining - Wikipedia](https://en.wikipedia.org/wiki/Instruction_pipelining) [Classic RISC pipeline - Wikipedia](https://en.wikipedia.org/wiki/Classic_RISC_pipeline) [Hazard (computer architecture) - Wikipedia](https://en.wikipedia.org/wiki/Hazard_(computer_architecture))