# Assignment1: RISC-V Assembly and Instruction Pipeline ## Two sum This problem is based on [LeetCode 1](https://leetcode.com/problems/two-sum/). Given an array of integers nums and an integer target, return indices of the two numbers such that they add up the same as target. ## Implementation I use two for loops to solve this problem. First for loop aims to find the base point in the array. Second for loop aims to find the number that plus first number equal to the first one. You can find the source code [here](https://github.com/jc123488/TwoSum) ## C code I use three arrays to store three different test numbers given by leetcode. To solve two sum problem, my program will call the twoSum function to calculate the correct answer. ```c= #include <stdio.h> void twoSum(int* nums,int numSize, int target){ int i,j; for(i=0;i<numSize;i++){ for(j=i+1;j<numSize;j++){ if(nums[i]+nums[j]==target) printf("%d %d\n",i,j); } } } int main(){ int nums1[4] = {2,7,11,15}, target1 = 9, size1=4; int nums2[3] = {3,2,4}, target2 = 6, size2=3; int nums3[2] = {3,3}, target3 = 6, size3=2; twoSum(nums1,size1,target1); twoSum(nums2,size2,target2); twoSum(nums3,size3,target3); return 0; } ``` ## Test data The three test data given by leetcode. ### Test 1 >nums1[4] = {2,7,11,15} target1 = 9 size1 = 4 ### Test 2 >nums[3] = {3,2,4} target2 = 6 size2 = 3 ### Test 3 >nums[2] = {3,3} target3 = 6 size3 = 2 ## Assembly code ```s= .data # first test data nums1: .word 2, 7, 11, 15 target1: .word 9 size1: .word 4 # second test data nums2: .word 3, 2, 4 target2: .word 6 size2: .word 3 # third test data nums3: .word 3, 3 target3: .word 6 size3: .word 2 endl: .string "\n" .text main: la s0, nums1 #load nums1 address to s0 lw s1, target1 #load target1 to s1 lw s2, size1 #load size1 to s2 jal twoSum la s0, nums2 lw s1, target2 lw s2, size2 jal twoSum la s0, nums3 lw s1, target3 lw s2, size3 jal twoSum li a7, 10 #end ecall twoSum: mv t0, zero #int i=0 addi t1, t0, 1 #int j=i+1 mv a0, s0 #store nums1 address for i mv a1, s0 #store nums1 address for j loop2: lw a2, 0(a0) #load nums[i] lw a3, 4(a1) #load nums[j] add a3, a2, a3 #nums1[i]+nums2[j] beq a3, s1, print #if (nums1[i]+nums2[j]==target) go to print addi a1, a1, 4 #point to next nums1[j] address addi t1, t1, 1 #j++ blt t1, s2, loop2 loop1: addi t0, t0, 1 #i++ addi a0, a0, 4 #point to next nums[i] address mv a1, a0 addi t1, t0, 1 #j=i+1 bltu t0, s2, loop2 #if (i<numsize) go to loop2 print: mv a0, t0 #move i li a7, 1 #print integer ecall mv a0, t1 #move j ecall la a0, endl #load \n li a7, 4 #print string ecall jr ra ``` ## Machine code Ripes can translate the pseudo assembly code into machine code. The machine code is looks like below: ```s= 00000000 <main>: 0: 10000417 auipc x8 0x10000 4: 00040413 addi x8 x8 0 8: 10000497 auipc x9 0x10000 c: 0084a483 lw x9 8(x9) 10: 10000917 auipc x18 0x10000 14: 00492903 lw x18 4(x18) 18: 044000ef jal x1 0x5c <twoSum> 1c: 10000417 auipc x8 0x10000 20: ffc40413 addi x8 x8 -4 24: 10000497 auipc x9 0x10000 28: 0004a483 lw x9 0(x9) 2c: 10000917 auipc x18 0x10000 30: ffc92903 lw x18 -4(x18) 34: 028000ef jal x1 0x5c <twoSum> 38: 10000417 auipc x8 0x10000 3c: ff440413 addi x8 x8 -12 40: 10000497 auipc x9 0x10000 44: ff44a483 lw x9 -12(x9) 48: 10000917 auipc x18 0x10000 4c: ff092903 lw x18 -16(x18) 50: 00c000ef jal x1 0x5c <twoSum> 54: 00a00893 addi x17 x0 10 58: 00000073 ecall 0000005c <twoSum>: 5c: 00000293 addi x5 x0 0 60: 00128313 addi x6 x5 1 64: 00040513 addi x10 x8 0 68: 00040593 addi x11 x8 0 0000006c <loop2>: 6c: 00052603 lw x12 0(x10) 70: 0045a683 lw x13 4(x11) 74: 00d606b3 add x13 x12 x13 78: 02968263 beq x13 x9 36 <print> 7c: 00458593 addi x11 x11 4 80: 00130313 addi x6 x6 1 84: ff2344e3 blt x6 x18 -24 <loop2> 00000088 <loop1>: 88: 00128293 addi x5 x5 1 8c: 00450513 addi x10 x10 4 90: 00050593 addi x11 x10 0 94: 00128313 addi x6 x5 1 98: fd22eae3 bltu x5 x18 -44 <loop2> 0000009c <print>: 9c: 00028513 addi x10 x5 0 a0: 00100893 addi x17 x0 1 a4: 00000073 ecall a8: 00030513 addi x10 x6 0 ac: 00000073 ecall b0: 10000517 auipc x10 0x10000 b4: f8c50513 addi x10 x10 -116 b8: 00400893 addi x17 x0 4 bc: 00000073 ecall c0: 00008067 jalr x0 x1 0 ``` ## Analysis ### 5-stage pipelined processor ![](https://i.imgur.com/G4PD6s3.png) I choose the **5-stage pipelined processor with forward and hazard detection** to run this program. The 5-stage are IF, ID, EX, MEM, WB. I'm going to choose one instruction to analysis these 5-stage. ### 1. Instruction Fetch (IF) ![](https://i.imgur.com/5IC2ZGs.png) - The first instruction in my program is `auipc x8, 0x10000`. Its start at `0x00000000` in instruction memory. The purpose is to give the address of test array stored in nums1 to s0. - ==PC== will add 4 in every cycle to read the next instruction because memory uses word address instead of byte address. ### 2. Instruction Decode (ID) ![](https://i.imgur.com/0i62ejr.png) - The instruction `0x10000417` is going to decode to 7 bits opcode, 5 bits rd and 20 bits imm in this stage. - The output of imm is 32 bits, the upper 20 bits is same as upper 20 bits of instruction, and put zero to fill up last 12 bits. ### 3. Execute (EX) ![](https://i.imgur.com/1kCR3P5.png) - There are 4 multiplexers to choose the input of ALU unit in this stage. - ==ALU unit== is to operate Op1 and Op2 by add, sub, and, or, shift left, shift right and so on. - ==Forwarding unit== is to detect the data hazard. If there is data hazard occur, forwarding unit will supply two signal to multiplexer in this stage for select the output by EXE stage or MEM stage. - ==Hazard unit== is to detect load use data hazard. If there is load use data hazard occur, hazard unit will supply the signal to EXE stage for doing NOP(stall) instruction. - ==Branch unit== is to detect B-type instruction in RISC V ISA. If brach condition occur, branch unit will send the branch taken signal to PC. Thus PC can send the new address to instruction memory to fetch the new instruction out. ### 4. Memory (MEM) ![](https://i.imgur.com/U5M6qBT.png) - This stage can read the value from memory by load instruction or store value into memory by store instruction. ### 5. Write Back (WB) ![](https://i.imgur.com/6oCaCdX.png) - The multiplexer in this stage can select which data needs write back to the destination register if the instruction need write operation. ### Hazard in my program #### 1. Control hazard The control hazard occurs when a jump or branch condition holds. So we need a hazard detection unit in our RISC V CPU for solving this problem. I use an instruction in my code to make it more clear to explain how it works.. ```s= ... main: la s0, nums1 lw s1, target1 lw s2, size1 jal twoSum la s0, nums2 lw s1, target2 lw s2, size2 jal twoSum la s0, nums3 lw s1, target3 lw s2, size3 jal twoSum li a7, 10 #end ecall twoSum: ... ``` - The ==hazard unit== will detect the control hazard happen when the code `jal twoSum` in line 10 goes to the EX stage. - Like the figure below, it will insert TWO NOP instruction in ID and EX stage for flush out the original instruction that has been fatched. - ==PC== is going to fetch the new instruction in instruction memory which `jal` instruction point at. - Finally, the `jal` instruction completes its task. ![](https://i.imgur.com/OHwGOIr.png) #### 2. Load use data hazard The load use data hazard occurs when a instruction after the load instruction wants to use the data in register load by memory. Briefly, the data doesn't write back to the register and load out from memory yet. I use an instruction in my code to make it more clear to explain how it works. ```s= ... loop2: lw a2, 0(a0) lw a3, 4(a1) add a3, a2, a3 ... ``` - The ==hazard unit== will detect the load use data hazard happen when the code `add a3, a2, a3` in line 5 goes to the ID stage. It will identify whether the destination register in `add` and `lw` is the same or not. - Like the figure below, it will insert ONE NOP instruction in EX stage for waiting the data load out from memory. - The ==forwarding unit== will detect that the data doesn't write back to the register, so it pass the data that is just read from the memory to ALU. - Finally, the load use data hazard needs one NOP and one forwaring to solve it. ![](https://i.imgur.com/hrtCOaJ.png) #### 3. Data hazard (normal) The data hazard occurs when a instruction want to use the data that doesn't wrtie back to the register yet. I use an instruction in my code to make it more clear to explain how it works. ```s= ... loop1: addi t0, t0, 1 addi a0, a0, 4 mv a1, a0 ... ``` - The ==forwarding unit== will detect the data hazard happen when the code `mv a1, a0` in line 5 goes to the ID stage. It will identify whether the destination register in `mv` and `addi` is the same or not. It will pass the data output by ALU to be the new input of ALU again. This means we doesn't need to stop pipeline processing until the data write back to destination register. - The detailed process is shown in the following figure: ![](https://i.imgur.com/aoQTsqf.png) > ALU calculates the result is 0x1000001c and pass it to pipeline register between EX/MEM. ![](https://i.imgur.com/Y4AihxM.png) > Forwarding unit pass 0x1000001c to be the new ALU input for solve the data hazard. --- :::danger Progress? :notes: jserv ::: :::success Sorry, I just finish it. :::