# Assignment1: RISC-V Assembly and Instruction Pipeline
contributed by < [`kaeteyaruyo`](https://github.com/kaeteyaruyo) >
## Dot product
In Algebraic definition, The dot product of two vectors $\mathbf a = [a_1, a_2, \dots, a_n]$ and $\mathbf b = [b_1, b_2, \dots, b_n]$ is defined as:$$\mathbf a \cdot \mathbf b =\sum _{i=1}^{n}a_ib_i=a_1b_1+a_2b_2+\cdots +a_nb_n$$
As in implementation, we store two vectors in two arrays that have same length, travel through them and multiply the same position elements, and add the result into summation, finally we get the dot product.
## Implementation
You can find the source code [here](https://github.com/kaeteyaruyo/Computer-Architecture/tree/master/hw1). Feel free to fork and modify it.
### C code
I use arrays of length = 3 to illustrate the dot product process. You can change the length or element inside them to see if the result is still correct.
```clike=
#include <stdio.h>
int main()
{
int a[3] = {1, 2, 3};
int b[3] = {4, 5, 6};
int sum = 0;
for (int i = 0; i < 3; ++i)
{
sum += a[i] * b[i];
}
printf("The inner product of two vectors is %d", sum);
return 0;
}
```
### Assembly code
In this example, I don't handle addition and multiplication overflow. So please ensure that the result of `mul` at line 33 and `add` at line 34 won't exceed the range of valid value ($-2^{31}$ to $2^{31} - 1$).
> There is no assembly syntax highligh in hackmd :(
> [name=kinoe_T]
```clike=
.data
arr1: .word 1, 2, 3 # a[3] = {2, 4, 6}
arr2: .word 4, 5, 6 # b[3] = {8, 10, 12}
len: .word 3 # array length = 3
str: .string "The inner product of two vectors is "
.text
# s1 = arr1 base address
# s2 = arr2 base address
# s3 = array length
# s4 = sum
# t0 = i
# t1 = a[i]
# t2 = b[i]
# t3 = a[i] * b[i] (assume no overflow, lower 32 bits)
main:
la s1, arr1 # s1 = a
la s2, arr2 # s2 = b
lw s3, len # s3 = 3
add s4, x0, x0 # sum = 0
add t0, x0, x0 # i = 0
jal ra, loop # jump to for loop
jal ra, print # jump to for loop
li a7, 10 # end program
ecall
loop:
lw t1, 0(s1) # t1 = a[i]
addi s1, s1, 4 # ++a (address move forward)
lw t2, 0(s2) # t2 = b[i]
addi s2, s2, 4 # ++b
mul t3, t1, t2 # t3 = a[i] * b[i]
add s4, s4, t3 # sum += a[i] * b[i]
addi t0, t0, 1 # ++i
blt t0, s3, loop # if i < 3, go to loop
ret # else, return to main
print:
la a0, str # load string
li a7, 4 # print string
ecall
add a0, s4, x0 # load result
li a7, 1 # print integer
ecall
ret # go back to main
```
## Analysis
We test our code using [Ripes](https://github.com/mortbopet/Ripes) simulator.
### Pseudo instruction
Put code above into editor and we will see that Ripe doesn't execute it literally. Instead, it replace [pseudo instruction (p.110)](https://riscv.org//wp-content/uploads/2017/05/riscv-spec-v2.2.pdf) into equivalent one, and change register name from ABI name to sequencial one.
The translated code looks like:
```
00000000 <main>:
0: 10000497 auipc x9 0x10000
4: 00048493 addi x9 x9 0
8: 10000917 auipc x18 0x10000
c: 00490913 addi x18 x18 4
10: 10000997 auipc x19 0x10000
14: 0089a983 lw x19 8(x19)
18: 00000a33 add x20 x0 x0
1c: 000002b3 add x5 x0 x0
20: 010000ef jal x1 0x30 <loop>
24: 030000ef jal x1 0x54 <print>
28: 00a00893 addi x17 x0 10
2c: 00000073 ecall
00000030 <loop>:
30: 0004a303 lw x6 0(x9)
34: 00448493 addi x9 x9 4
38: 00092383 lw x7 0(x18)
3c: 00490913 addi x18 x18 4
40: 02730e33 mul x28 x6 x7
44: 01ca0a33 add x20 x20 x28
48: 00128293 addi x5 x5 1
4c: ff32c2e3 blt x5 x19 -28 <loop>
50: 00008067 jalr x0 x1 0
00000054 <print>:
54: 10000517 auipc x10 0x10000
58: fc850513 addi x10 x10 -56
5c: 00400893 addi x17 x0 4
60: 00000073 ecall
64: 000a0533 add x10 x20 x0
68: 00100893 addi x17 x0 1
6c: 00000073 ecall
70: 00008067 jalr x0 x1 0
```
In each row it denotes address in instruction memory, instruction's machine code (in hex) and instruction itself respectively.
### 5-stage pipelined processor
Now we can choose a processor to run this code. Ripes provide four kinds of processor for us to choose, including **single cycle processor**, **5-stage processor**, **5-stage with hazard detection** and **5-stage with forward and hazard detection**. Here we choose the **5 stage processor**. Its block diagram look like this:

:::warning
Why is opcode output connected to immediate block? Isn't that immediate come from just instruction's immediate part according to its format type?
:::
The "5-stage" means this processor using five-stage pipeline to parallelize instructions. The stages are:
1. Instruction fetch (IF)
2. Instruction decode and register fetch (ID)
3. Execute (EX)
4. Memory access (MEM)
5. Register write back (WB)
You can see that each stage is separated by **pipeline registers** (the rectangle block with stage names on its each side) in the block diagram.
Instruction in different type of format will go through 5 stages with different signal turned on. Let's discuss each format in detail with example.
#### U-type format
The first instruction in this program is `auipc x9 0x10000`. According to [RISC-V Manual (p.14)](https://riscv.org//wp-content/uploads/2017/05/riscv-spec-v2.2.pdf):
> AUIPC (add upper immediate to pc) is used to build pc-relative addresses and uses the **U-type format**. AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the pc, then places the result in register rd.
Let's see how it go through each stage.
1. Instruction fetch (IF)

* We start from instruction put at `0x00000000`, so `addr` is equal to `0x00000000`
* The machine code of first instruction is `0x10000497` (you can look it up in the code snippet above), so `instr` is equal to `0x10000497`.
* PC will increment by 4 automatically using the above adder.
* Because there is no branch occur, next instruction will be at PC + 4, so the multiplexer before PC choose input come from adder.
2. Instruction decode and register fetch (ID)

* Instruction `0x10000497` is decoded to three part:
* `opcode` = `auipc`
* `Wr idx` = `0x09`
* `imm.` = `0x10000000` (`0x10000` in upper 20 bits, filling in the lowest 12 bits with zeros)
* U-type format doesn't read register, so `R1 idx` and `R2 idx` are both `0x00`.
* `Reg 1` and `Reg 2` read value from `0x00` register, so their value are both `0x00000000` too.
* Current PC value (`0x00000000`) and next PC value (`0x00000004`) are just send through this stage, we don't use them.
3. Execute (EX)

* First level multiplexers choose value come from `Reg 1` and `Reg 2`, but this is an U-type format instruction, we don't use them. So they are filtered by second level multiplexer.
* `Reg 1` and `Reg 2` are also send to branch block, but no branch is taken.
* Second level multiplexer choose value come from current PC value (upper one) and immediate (lower one) as `Op1` and `Op2` of ALU.
* ALU add two operand togeher, so the `Res` is equal to `0x00000000`.
* Next PC value (`0x00000004`) and `Wr idx` (`0x09`) are just send through this stage, we don't use them.
4. Memory access (MEM)

* `Res` from ALU is send to 3 ways:
* Pass through this stage and go to WB stage
* Send back to EXE stage for next instruction to use
* Use as data memory address. Memory read data at address `0x10000000`, so `Read out` is equal to `0x00000001`. The table below denotes the data section of memory.

* `Reg 2` is send to `Data in`, but memory doesn't enable writing.
* Next PC value (`0x00000004`) and `Wr idx` (`0x09`) are just send through this stage, we don't use them.
5. Register write back (WB)

* The multiplexer choose `Res` from ALU as final output. So the output value is `0x10000000`.
* The output value and `Wr idx` are send back to registers block. Finally, the value `0x10000000` will be write into `x9` register, whose ABI name is `s1`.
After all these stage are done, the register is updated like this:

#### R-type format
(Not done yet...)
* Explain your program with the visualization for multiplexer input selection, register write/enable signals and more. You have to illustrate each stage such as IF, ID, IE, MEM, and WB. In addition, you should discuss the steps of memory updates accordingly.
## Reference
* [Dot product - Wikipedia](https://en.wikipedia.org/wiki/Dot_product)
* [RISC-V Instruction Set Manual](https://riscv.org//wp-content/uploads/2017/05/riscv-spec-v2.2.pdf)
* [RISC-V Green Card](https://www.cl.cam.ac.uk/teaching/1617/ECAD+Arch/files/docs/RISCVGreenCardv8-20151013.pdf)
* [Environmental Calls](https://github.com/mortbopet/Ripes/wiki/Environment-calls)