Homework1: RISC-V Assembly and Instruction Pipeline

--- tags: Computer Architecture 2022 --- # Homework1: RISC-V Assembly and Instruction Pipeline ## Problem [Leetcode 905 Sort Array By Parity](https://leetcode.com/problems/sort-array-by-parity/) Given an integer array nums, move all the even integers at the beginning of the array followed by all the odd integers. Return any array that satisfies this condition. **Example 1 :** Input: nums = [3,1,2,4] Output: [2,4,3,1] Explanation: The outputs [4,2,3,1], [2,4,1,3], and [4,2,1,3] would also be accepted. **Example 2 :** Input: nums = [0] Output: [0] ## Solution We need to judge whether the number is even or odd. Put even numbers in front of the matrix and odd numbers in the back. **Step 1 : judge whether the number is even or odd** we can take the remainder of 2 to judge the number **Step2 : put the even number in front of the matrix and odd numbers in the back** we need a pointer to point to the head of array and a pointer point to the back of array. When we put the even number in front of the matrix, the pointer which point at the front of array will point at the next space. When we put the odd number in back of the matrix, the pointer which point at the back of the array will point at the pervious space. ## C Code ```c int *sortArrayByParity(int *nums, int numsSize, int *returnSize) { int *ans = malloc(sizeof(int)*numsSize); int *front = ans; int *back = ans+numsSize; for(int i = 0;i<numsSize;i++) { if(nums[i] % 2) { *--back = *(nums+i); } else { *front++ = *(nums+i); } } *returnSize = numsSize; return ans; } ``` ## Assembly Code ``` .data nums: .word 22,3,9,2,54 numsSize: .word 5 returnpointer: .word 0x100 .text main: la a0,nums # a0 is the array pointer lw a1,numsSize # a1 is the array size addi sp,sp,-4 # create a space sw ra,0(sp) # save the return address to main jal sortArrayByParity lw ra,0(sp) # load the return address to main addi sp,sp,4 # release the space j exit sortArrayByParity: mv s1,a0 # save s1 the array pointer mv s2,a1 # save s2 the array size lw a0,returnpointer # a0 is the address of head (the return address of array) slli t0,s2,2 # array size * sizeof(int) add t1,a0,t0 # t1 is the address of back mv t0,a0 # t0 is the address of head loop: bge t3,s2,done # t3 : for loop count lw t4,0(s1) # t4 : load the array element addi s1,s1,4 # point to the next element in nums array andi t2,t4,1 # if t2 = 1, that the number is even beq t2,x0,else then: # if the number is odd, run then addi t1,t1,-4 # point to the previous element sw t4,0(t1) # put the even number head of the array addi t3,t3,1 # for loop count++ add t2,x0,x0 # clean the space j end else: # if the number is odd, run else sw t4,0(t0) # put even number in the head addi t0,t0,4 # point to the next element addi t3,t3,1 # for loop count++ add t2,x0,x0 # clean the space j end end: j loop done: jr ra exit: li a7, 10 # end the program ecall ``` :::warning :warning: Can you use fewer instructions? :notes: jserv ::: :::info This is the method I try to use fewer instructions - Try to put the repeat instruction which is in label "then" and "else" into "end" label - delete "j end" in label "else" because it will run label "end" although no instruction "j end" - don't need to save the ra register in main because we don't use the function in another function ::: ``` .data nums: .word 22,3,9,2,54 numsSize: .word 5 returnpointer: .word 0x100 .text main: la a0,nums # a0 is the array pointer lw a1,numsSize # a1 is the array size jal sortArrayByParity j exit sortArrayByParity: mv s1,a0 # save s1 the array pointer mv s2,a1 # save s2 the array size lw a0,returnpointer # a0 is the address of head (the return address of array) slli t0,s2,2 # array size * sizeof(int) add t1,a0,t0 # t1 is the address of back mv t0,a0 # t0 is the address of head loop: bge t3,s2,done # t3 : for loop count lw t4,0(s1) # t4 : load the array element addi s1,s1,4 # point to the next element in nums array andi t2,t4,1 # if t2 = 1, that the number is even beq t2,x0,else then: # if the number is odd, run then addi t1,t1,-4 # point to the previous element sw t4,0(t1) # put the even number head of the array j end else: # if the number is odd, run else sw t4,0(t0) # put even number in the head addi t0,t0,4 # point to the next element end: add t2,x0,x0 # clean the space addi t3,t3,1 j loop # for loop count++ done: jr ra exit: li a7, 10 # end the program ecall ``` And I fill up the function of printf and three testing data ``` .data nums1: .word 22,3,9,2,54 nums2: .word 18,7,69,32,5,8,2 nums3: .word 3,2,5 numsSize1: .word 5 numsSize2: .word 7 numsSize3: .word 3 returnpointer: .word 0x100 .text main: la a0,nums1 # a0 is the array pointer lw a1,numsSize1 # a1 is the array size jal sortArrayByParity li t0,0 # clean the register mv t1,a1 # a1 is the array size mv t2,a0 # a0 is the array pointer which is sort by parity print_loop: bge t0,t1,exit # if print all of the element, exit lw a0,0(t2) # load the array element jal printf # print a0 to console addi t0,t0,1 # print loop count++ addi t2,t2,4 # point to next element j print_loop sortArrayByParity: mv s1,a0 # save s1 the array pointer mv s2,a1 # save s2 the array size lw a0,returnpointer # a0 is the address of head (the return address of array) slli t0,s2,2 # array size * sizeof(int) add t1,a0,t0 # t1 is the address of back mv t0,a0 # t0 is the address of head loop: bge t3,s2,done # t3 : for loop count lw t4,0(s1) # t4 : load the array element addi s1,s1,4 # point to the next element in nums array andi t2,t4,1 # if t2 = 1, that the number is even beq t2,x0,else then: # if the number is odd, run then addi t1,t1,-4 # point to the previous element sw t4,0(t1) # put the even number head of the array j end else: # if the number is odd, run else sw t4,0(t0) # put even number in the head addi t0,t0,4 # point to the next element end: add t2,x0,x0 # clean the space addi t3,t3,1 # for loop count++ j loop done: jr ra printf: li a7,1 # print int ecall li a0, 9 # tab li a7, 11 # print char ecall jr ra exit: li a7, 10 # end the program ecall ``` The intput is ![](https://i.imgur.com/vUrox63.png) And the output is ![](https://i.imgur.com/dyAMl9V.png) ## Analysis We use Ripes to simulate RISC-V processor and we can divide the processor into 5 stage ![](https://i.imgur.com/OllV6yk.png) | Stage | Description | | -------- | -------- | | IF | Instruction Fetch | | ID| Instruction Decode & Register Read| |EX|Execution or Address Calculation| |MEM|Data Memory Access| |WB|Write Back| Then we make a example to explain the function of each stage. First, we use I-type instruction such as addi instruction to explain how the processor work. For this instruction, it need to do the following five step to complete the instruction :::info Ex : addi t0 t1 1 Step 1 : Get the instruction Step 2 : Parse instruction field Step 3 : Read data based on parsed operands Step 4 : Perform operation Step 5 : Write result to our destination register ::: ### IF ![](https://i.imgur.com/LuhdOUf.png) First, IF stage will use the PC to fetch the instruction when the data pass the instruction memory. PC will be plus four to get the next instruction. :::success PC is 0 and read the instruction addi t0,t1,1. The machine code of this instruction is 0x00130293. PC will be plus four after instruction be fetch. ::: ### ID ![](https://i.imgur.com/N5R4Qs2.png) Then, ID stage decode the instruction into rs1, rs2, rd, opcode and transform to registers. If opcode is I-type,it will use Imm to operate the register. Registers will read the value of rs1 and rs2 register and transform to EX stage to execute the instruction And, if WB stage transform data to ID stage, it will get the rd register and the value to write register :::success Decode the machine code to get the rs1, rd which is 0x06,0x05 respectively. Because of the I-type, imm will decode the instruction to get the immediate. And read the rs1 rs2 register value which is 0x0 and 0x0 respectively ::: ### EX ![](https://i.imgur.com/doPbp2X.png) Then, EX stage will excute the operation and use the following table to decide which operation we want to perform. We can see that Func3 and Func7 will choose which operation to perform. When the operation be done, it will transform the result to MEM stage ![](https://i.imgur.com/Pd7VFLb.png) :::success plus the value of rs1 and imm transform to res ::: ### MEM ![](https://i.imgur.com/sT2HxCi.png) This stage will input the address and read the data from memory. Because add instruction don't need to access memory, just transform the data to WB stage. :::success Read the value of memory at 0x00000001, but addi don't need this value. It will be unused. ::: ### WB ![](https://i.imgur.com/tTRATjA.png) Finally, we need to write the data to destination register. It will transform data to ID stage to write the data to register :::success It will write the 0x01 to the 0x05 register, 0x1302 will be ignore. ::: ### CPU control Because of many type of instruction, we need control bits to decide which situation we wanted. The below picture show how many bits we need. ![](https://i.imgur.com/J2DlCfs.png) - PCSel : decide which address is my next instruction (next instruction or jump label) - ImmSel : Does this instruction have an immediate - RegWEn : Does this instruction write to the destination register rd - BrUn : Does this instruction do a branch. If so, is it unsigned or signed - BSel : Does this instruction operate on R[rs2] or an immediate - ASel : Does this instruction operate on R[rs1] or an immediate - ALUSel : What operation should we perform on the selected operands - MemRW : Do we want to write to memory? Do we want to read from memory? - WBSel : Which value do we want to write back to rd :::success for the Example : addi t0,t1,1 PCSel : 0 immSel : 1 RegWEn : 1 BrUn : 0 BSel : 1 ASel : 0 ALUSel : addi MemRW : 0 WBSel : 1 ::: ## Pipeline After the above introduction, we know that there are five stages to process the instruction. In efficiency, use these five stage simultaneously is an efficient way. ![](https://i.imgur.com/0XW7JxE.png) In this case, we need to make sure that all the stage will consume the same time, so we need register(IFID, IDEX, EX/MEM, MEM/WB) between two of stage to ensure all the stage will consume the same time. ![](https://i.imgur.com/NiIpJx7.png) ## Pipeline Hazard When we use pipeline to process our instruction, there are some problem that will not occur in one cycle process. We will talk about these problem below. ### Structural hazard When we want to write the data to register and the another instruction want to read the register data. They both need the ID stage. ![](https://i.imgur.com/yLPWrdz.png) :::info Solution : 1. Build the RegFile withc independent read and write port. 2. split RegFile access in two. Prepare to write during 1st half, write on falling edge, read during 2nd half of each clock cycle. ::: When we want to access memory data in MEM stage, and we want to get the instruction from memory at the same time, like the below picture. ![](https://i.imgur.com/xfzX8CA.png) :::info Solution : There are two cache between processor and memory. We can independent get the data and the instruction ![](https://i.imgur.com/r8jlgqG.png) ::: ### Data Hazard When the data we want to calculate, but the data have not been save yet. Like the below picture. ![](https://i.imgur.com/8GJNogl.png) :::info Solution : 1. Stalling (To wait for two tcycle) ![](https://i.imgur.com/GRRMmV6.png) 2. Compiler can arrange code to avoid hazard and stall 3. Forward result as soon as it is available(After ALU), even though it's not stored in RegFile yet (need the hardware support) ![](https://i.imgur.com/caQWyGL.png) ::: When the data we load from memory and have not save to register yet. Like the below picture. We can not use Forward to solve this problem. ![](https://i.imgur.com/aygW4SZ.png) :::info Solution :　 1. Put nop instruction between load and use ![](https://i.imgur.com/6TDyPWV.png) 2. Compiler put an unrelated instruction in that slot ![](https://i.imgur.com/rLin9sC.png) ::: ### Control Hazard When we excute the branch instruction, we need to wait the decision to jump to the label or continue, so we need to wait two instruction to get the decision ![](https://i.imgur.com/Vn8y9Qo.png) :::info Solution :　 1. The Branch decision made in 2nd stage, so only one nop is needed instead of two 2. Branch Prediction : Predict a result and continue to run next instruction. If corrent, continue run. If wrong, then flush all the pipeline and flip prediction. :::