# Assignment1: RISC-V Assembly and Instruction Pipeline
## TwoSum
Given an array of integers `nums` and a integer `target`. We have to pick two of the numbers which add up to target.
## Approach
First, we get a random number in the array.Then, we check the last of them if there is the number that add up to the target, else we go to the first step and get another ramdow number.
### **C Code**
```c
#include <stdio.h>
int main()
{
int nums[4] = {2, 7, 11, 15}, target = 9;
for (int i = 0; i < sizeof(nums); i++){
int complement = target - nums[i];
for (int j = i + 1; j < sizeof(nums); j++){
if (nums[j] == complement) {
printf("%d %d\n", i, j);
}
}
}
}
```
:::warning
Can you optimize the above?
:notes: jserv
:::
### **Assembly Code**
```C
.data
nums: .word 2, 7, 11, 15 # nums[4] = {2, 7, 11, 15}
target: .word 9 # target = 9
str: .string "The answer is "
len: .word 4 # nums.size() = 4
space: .string " "
.text
main:
la s1, nums # s1 = nums
lw s2, target # s2 = target
lw s3, len # s3 = sizeof(nums)
add t0, zero, zero # i = 0
jal ra, loop # jump to for loop
branch:
la a0, str # load string
li a7, 4 # print string
ecall
add a0, t0, zero # load integer
li a7, 1 # print result
ecall
la a0, space # load string
li a7, 4 # print string
ecall
add a0, t1, zero # load string
li a7, 1 # print result
ecall
li a7, 10 # end problem
ecall
loop:
lw t3, 0(s1) # t3 = nums[i]
sub t2, s2, t3 # complement = target - nums
addi t1, t0,1 # j = i+1
addi s4, s1, 4 # nums[j] = nums[i+1]
loop2:
lw t3, 0(s4) # t3 = nums[j]
beq t3, t2, branch # if complement = nums[j] go to branch
addi s4, s4, 4 # nums[j++]
addi t1, t1, 1 # j++
blt t1, s3, loop2
addi s1, s1, 4 # nums[i++]
addi t0, t0,1 # i++
blt t0, s3, loop # if i<4, go to loop
jr ra # jump back to the jal
```
## Analysis
I test the code on Ripes, and I choose the 5-stage processor.
### **Pseudo instruction**
```c=
00000000 <main>:
0: 10000497 auipc x9 0x10000
4: 00048493 addi x9 x9 0
8: 10000917 auipc x18 0x10000
c: 00892903 lw x18 8 x18
10: 10000997 auipc x19 0x10000
14: 0139a983 lw x19 19 x19
18: 000002b3 add x5 x0 x0
1c: 044000ef jal x1 68 <loop>
00000020 <branch>:
20: 10000517 auipc x10 0x10000
24: ff450513 addi x10 x10 -12
28: 00400893 addi x17 x0 4
2c: 00000073 ecall
30: 00028533 add x10 x5 x0
34: 00100893 addi x17 x0 1
38: 00000073 ecall
3c: 10000517 auipc x10 0x10000
40: feb50513 addi x10 x10 -21
44: 00400893 addi x17 x0 4
48: 00000073 ecall
4c: 00030533 add x10 x6 x0
50: 00100893 addi x17 x0 1
54: 00000073 ecall
58: 00a00893 addi x17 x0 10
5c: 00000073 ecall
00000060 <loop>:
60: 0004ae03 lw x28 0 x9
64: 41c903b3 sub x7 x18 x28
68: 00128313 addi x6 x5 1
6c: 00448a13 addi x20 x9 4
00000070 <loop2>:
70: 000a2e03 lw x28 0 x20
74: fa7e06e3 beq x28 x7 -84 <branch>
78: 004a0a13 addi x20 x20 4
7c: 00130313 addi x6 x6 1
80: ff3348e3 blt x6 x19 -16 <loop2>
84: 00448493 addi x9 x9 4
88: 00128293 addi x5 x5 1
8c: fd32cae3 blt x5 x19 -44 <loop>
90: 00008067 jalr x0 x1 0
```
### 5-Stage RISC-V Processor
You can see the 5-Stage RISC-V Processor on the bottom, and there are pipeline registers in this processor.

I'll go through the five stage, and discuss each of them.
### Memory
We can see the memory assignment on the bottom. The .data starts at 0X10000000, and the .text starts at 0X00000000.

The .data start at 0X10000000, we can see `nums` is at 0X10000000 ~0X1000000F, `target` is at 0X10000010 ~ 0X10000013, `str` is at 0X10000014 ~ 0X10000022,`space` is at 0X10000023 ~ 0x10000024 `len` is at 0X10000025 ~ 0X10000029, space is at .

### Instruction Fetch stage
In this stage, it will decide the location of next instruction, and get the instruction from instruction memory.
This is the first instruction of my program, and it is a normal case.
* The first instruction is at 0x00000000.
* PC output 0x00000000 to the adder and instruction memory.
* Instruction memory get the address and output the first instruction.
* The adder add the location up to 0x00000004.
* The multiplexer will have the decision of the location of next instruction. In this case, the multiplexer chooses the port from adder.
* Another port of the multiplexer is used for jump or branch.
* The PC enable signal will be true in normal case, and it will become false when there are some stall.

### Instrution decode stage
In this stage, it will decode the instruction, and the registers will be read or written.
This is an add instruction on the bottom.

* The decoder get the instruction from instruction memory.
* The write/enable signal is true that mean there will be a register written back.
* Registers get the signals from decoder. Register x0 and register x5 will be read, and register x0 will be written.
* The immediate will be used if it is an immediate value in the instruction.
### Execute stage
In this stage, it will execute the instructions. It will do some calculation or some logical operation.
This is an add instruction on the bottom.

* The first level multiplexers will decide where will the source in the register get from.
1. The middle port of them will get the source from instruction decode stage.
2. The other two port will get the source by forwarding.
* The second level multiplexer on the top will decide which source will be operated from Op1.
1. The port on the top will get the source from PC.
2. The port on the bot will get the source from register.
* The second level multuplexer on the bot will decide which source will be operated from Op2.
1. The port on the top will get the source from register.
2. The port on the bot will get the source from immediate.
* The branch taken signal will be true if there is a branch.
* ALU will operate the source from Op1 and Op2.
The add instruction we can see the both source of ALU are from registers.

The auipc instruction we can see the source of ALU are from PC and immediate.

The addi instruction we can see the source of ALU are from register and immediate.
### Memory stage
In this stage, it will store the data in the data memory if it has to.
This is an auipc instruction on the bot. In this case, it doesn't store any data.

* The write/enable signal will be true if there are some data will be stored.
*
### Write back stage
In this stage, it will decide what will be written in register.
This is an auipc instruction on the bot. In this case, the source from data memory will be written.

* The multiplexer will decide which source will be written.
1. The port on the top will get the source from PC.
2. The port on the middle will get the source from ALU.
3. The port on the bot will get the source from data memory.