# Assignment1: RISC-V Assembly and Instruction Pipeline ## TwoSum Given an array of integers `nums` and a integer `target`. We have to pick two of the numbers which add up to target. ## Approach First, we get a random number in the array.Then, we check the last of them if there is the number that add up to the target, else we go to the first step and get another ramdow number. ### **C Code** ```c #include <stdio.h> int main() { int nums[4] = {2, 7, 11, 15}, target = 9; for (int i = 0; i < sizeof(nums); i++){ int complement = target - nums[i]; for (int j = i + 1; j < sizeof(nums); j++){ if (nums[j] == complement) { printf("%d %d\n", i, j); } } } } ``` :::warning Can you optimize the above? :notes: jserv ::: ### **Assembly Code** ```C .data nums: .word 2, 7, 11, 15 # nums[4] = {2, 7, 11, 15} target: .word 9 # target = 9 str: .string "The answer is " len: .word 4 # nums.size() = 4 space: .string " " .text main: la s1, nums # s1 = nums lw s2, target # s2 = target lw s3, len # s3 = sizeof(nums) add t0, zero, zero # i = 0 jal ra, loop # jump to for loop branch: la a0, str # load string li a7, 4 # print string ecall add a0, t0, zero # load integer li a7, 1 # print result ecall la a0, space # load string li a7, 4 # print string ecall add a0, t1, zero # load string li a7, 1 # print result ecall li a7, 10 # end problem ecall loop: lw t3, 0(s1) # t3 = nums[i] sub t2, s2, t3 # complement = target - nums addi t1, t0,1 # j = i+1 addi s4, s1, 4 # nums[j] = nums[i+1] loop2: lw t3, 0(s4) # t3 = nums[j] beq t3, t2, branch # if complement = nums[j] go to branch addi s4, s4, 4 # nums[j++] addi t1, t1, 1 # j++ blt t1, s3, loop2 addi s1, s1, 4 # nums[i++] addi t0, t0,1 # i++ blt t0, s3, loop # if i<4, go to loop jr ra # jump back to the jal ``` ## Analysis I test the code on Ripes, and I choose the 5-stage processor. ### **Pseudo instruction** ```c= 00000000 <main>: 0: 10000497 auipc x9 0x10000 4: 00048493 addi x9 x9 0 8: 10000917 auipc x18 0x10000 c: 00892903 lw x18 8 x18 10: 10000997 auipc x19 0x10000 14: 0139a983 lw x19 19 x19 18: 000002b3 add x5 x0 x0 1c: 044000ef jal x1 68 <loop> 00000020 <branch>: 20: 10000517 auipc x10 0x10000 24: ff450513 addi x10 x10 -12 28: 00400893 addi x17 x0 4 2c: 00000073 ecall 30: 00028533 add x10 x5 x0 34: 00100893 addi x17 x0 1 38: 00000073 ecall 3c: 10000517 auipc x10 0x10000 40: feb50513 addi x10 x10 -21 44: 00400893 addi x17 x0 4 48: 00000073 ecall 4c: 00030533 add x10 x6 x0 50: 00100893 addi x17 x0 1 54: 00000073 ecall 58: 00a00893 addi x17 x0 10 5c: 00000073 ecall 00000060 <loop>: 60: 0004ae03 lw x28 0 x9 64: 41c903b3 sub x7 x18 x28 68: 00128313 addi x6 x5 1 6c: 00448a13 addi x20 x9 4 00000070 <loop2>: 70: 000a2e03 lw x28 0 x20 74: fa7e06e3 beq x28 x7 -84 <branch> 78: 004a0a13 addi x20 x20 4 7c: 00130313 addi x6 x6 1 80: ff3348e3 blt x6 x19 -16 <loop2> 84: 00448493 addi x9 x9 4 88: 00128293 addi x5 x5 1 8c: fd32cae3 blt x5 x19 -44 <loop> 90: 00008067 jalr x0 x1 0 ``` ### 5-Stage RISC-V Processor You can see the 5-Stage RISC-V Processor on the bottom, and there are pipeline registers in this processor. ![](https://i.imgur.com/5GGH1dO.png) I'll go through the five stage, and discuss each of them. ### Memory We can see the memory assignment on the bottom. The .data starts at 0X10000000, and the .text starts at 0X00000000. ![](https://i.imgur.com/AGTSu2Y.png) The .data start at 0X10000000, we can see `nums` is at 0X10000000 ~0X1000000F, `target` is at 0X10000010 ~ 0X10000013, `str` is at 0X10000014 ~ 0X10000022,`space` is at 0X10000023 ~ 0x10000024 `len` is at 0X10000025 ~ 0X10000029, space is at . ![](https://i.imgur.com/LQRSCfa.png) ### Instruction Fetch stage In this stage, it will decide the location of next instruction, and get the instruction from instruction memory. This is the first instruction of my program, and it is a normal case. * The first instruction is at 0x00000000. * PC output 0x00000000 to the adder and instruction memory. * Instruction memory get the address and output the first instruction. * The adder add the location up to 0x00000004. * The multiplexer will have the decision of the location of next instruction. In this case, the multiplexer chooses the port from adder. * Another port of the multiplexer is used for jump or branch. * The PC enable signal will be true in normal case, and it will become false when there are some stall. ![](https://i.imgur.com/y2sATqr.png) ### Instrution decode stage In this stage, it will decode the instruction, and the registers will be read or written. This is an add instruction on the bottom. ![](https://i.imgur.com/VrpEFnY.png) * The decoder get the instruction from instruction memory. * The write/enable signal is true that mean there will be a register written back. * Registers get the signals from decoder. Register x0 and register x5 will be read, and register x0 will be written. * The immediate will be used if it is an immediate value in the instruction. ### Execute stage In this stage, it will execute the instructions. It will do some calculation or some logical operation. This is an add instruction on the bottom. ![](https://i.imgur.com/M1M2KDz.png) * The first level multiplexers will decide where will the source in the register get from. 1. The middle port of them will get the source from instruction decode stage. 2. The other two port will get the source by forwarding. * The second level multiplexer on the top will decide which source will be operated from Op1. 1. The port on the top will get the source from PC. 2. The port on the bot will get the source from register. * The second level multuplexer on the bot will decide which source will be operated from Op2. 1. The port on the top will get the source from register. 2. The port on the bot will get the source from immediate. * The branch taken signal will be true if there is a branch. * ALU will operate the source from Op1 and Op2. The add instruction we can see the both source of ALU are from registers. ![](https://i.imgur.com/pQt5LxU.png) The auipc instruction we can see the source of ALU are from PC and immediate. ![](https://i.imgur.com/wXpOZ4E.png) The addi instruction we can see the source of ALU are from register and immediate. ### Memory stage In this stage, it will store the data in the data memory if it has to. This is an auipc instruction on the bot. In this case, it doesn't store any data. ![](https://i.imgur.com/6ElWn1x.png) * The write/enable signal will be true if there are some data will be stored. * ### Write back stage In this stage, it will decide what will be written in register. This is an auipc instruction on the bot. In this case, the source from data memory will be written. ![](https://i.imgur.com/kthDMbZ.png) * The multiplexer will decide which source will be written. 1. The port on the top will get the source from PC. 2. The port on the middle will get the source from ALU. 3. The port on the bot will get the source from data memory.