# Assignment1: RISC-V Assembly and Instruction Pipeline
###### tags: `Computer Architecture` `RISC-V` `jserv`
> contributed by <[Tony (tonych1997)](https://github.com/tonych1997/Computer-Architecture)>
> [Requirements for this assignment](https://hackmd.io/llVsw3WOTtSq4tf7GlrwRw?view)
## Leetcode [1929. Concatenation of Array](https://leetcode.com/problems/concatenation-of-array/)
Given an integer array `nums` of length `n`, you want to create an array `ans` of length `2n` where `ans[i] == nums[i]` and `ans[i + n] == nums[i]` for `0 <= i < n` (**0-indexed**).
Specifically, `ans` is the **concatenation** of two `nums` arrays.
Return the array `ans`.
## Implementation
You can find the source code [here](https://github.com/tonych1997/Computer-Architecture). Feel free to fork and modify it.
### C code
Since the code used by Leetcode is in the form of a subroutine, I have modified the code slightly.
In code for assignment, the input array lengths and contents, and the output array lengths, are predefined.
#### Code for Leetcode
```clike=
int* getConcatenation(int* nums, int numsSize, int* returnSize)
{
*returnSize = numsSize*2;
int * ans=(int*)malloc(sizeof(int)*(numsSize*2));
int i = 0;
for(i=0; i<numsSize; i++){
ans[i]=nums[i];
ans[i+numsSize]=nums[i];
}
return ans;
}
```
#### Code for assignment
I defined an integer array of length 3 to do the concatenation action. After changing the length and content of the array, the program should still be correct.
```clike=
#include <stdio.h>
#include <stdlib.h>
int main()
{
int a[3] = {1, 2, 3};
int b[6];
int i;
for(i=0; i<3; i++){
b[i]=a[i];
b[i+3]=a[i];
}
for(i=0; i<3*2; i++) {
printf("%d ", b[i]);
}
return 0;
}
```
### Assembly code
#### Correct output code
**This code is longer, but it will output the correct answer.**
I found that i (t0) is not set to 0 in main when the array length is more than 5, which causes an error. So we can add two nop in lines 18-19 and 20-21 to solve the problem.
```clike=
.data
arr: .word 1, 2, 3 # a[3] = {1, 2, 3}
len1: .word 3 # array length of a is 3
space: .string " " # space
.text
# s1 = arr a base address
# s2 = arr b base address
# s3 = array length of a
# s4 = array length of b
# t0 = i for loopCon1, loopCon2, print
main:
la s1, arr # s1 = address of a
lw s3, len1 # s3 = length of a
add s4, s3, s3 # s4 = length of b (a*2)
add t0, x0, x0 # i = 0
jal ra, loopCon1
add t0, x0, x0 # i = 0
jal ra, loopCon2
add t0, x0, x0 # i = 0
jal ra, print
jal ra, exit
loopCon1:
add t1, t0, x0 # t1 = i
add t1, t1, t1 # t1 = t1*2
add t1, t1, t1 # t1 = t1*2 (t1*4)
add t1, t1, s1 # address of a[i] (base addr. + 4i)
lw t2, 0(t1) # t2 = s1 (content of a[i])
add t1, t0, x0 # t1 = i
add t1, t1, t1 # t1 = t1*2
add t1, t1, t1 # t1 = t1*2 (t1*4)
add t1, t1, s2 # address of b[i] (base addr. + 4i)
sw t2, 0(t1) # b[i] = t2 (content of a[i])
addi t0, t0, 1 # i++
blt t0, s3, loopCon1 # if i < length, go to loopCon1
ret # else, return to main
loopCon2:
add t1, t0, x0 # t1 = i
add t1 t1 t1 # t1 i*2
add t1 t1 t1 # t1 i*2 (t1*4)
add t1 t1 s1 # t1=i*4+base_of_arr
lw t2 0(t1) # t2 = s1 (content of a[i])
add t1 t0 s3 # t1 = i + length
add t1 t1 t1 # t1 = t1*2
add t1 t1 t1 # t1 = t1*2 (t1*4)
add t1 t1 s2 # t1 = address of b[i+length] (base addr. + 4*(i+length))
sw t2 0(t1) # t2 = content in arr[n+1]
addi t0 t0 1 # i++
blt t0, s3, loopCon2 # if i < length, go to loopCon2
ret # else, return to main
print:
lw t2 0(s2) # t2 = content of b[i]
add a0, t2, x0 # load result of array b
li a7, 1 # print integer
ecall # systemcall
la a0, space # load string - space
li a7, 4 # print string
ecall
addi s2, s2, 4 # address move forward
addi t0, t0, 1 # i++
blt t0, s4, print
ret
exit:
li a7 10 # end program
ecall
```
#### Wrong output code
This code is **shorter and more efficient**, but when exported, the first half of the value is correct and the second half is wrong.
**I think the error occurred on line 29, but I don't know the reason why.**
```clike=
.data
arr1: .word 1, 2, 3 # a[3] = {1, 2, 3}
len1: .word 3 # array length of a is 3
len2: .word 6 # array length of b is 6
space: .string " " # space
.text
# s1 = arr a base address
# s2 = arr b base address
# s3 = array length of a
# s4 = array length of b
# t0 = i
# t1 = a[i]
# t2 = b[i]
main:
la s1, arr1 # s1 = a
lw s3, len1 # s3 = 3
lw s4, len2 # s4 = 6
add t0, x0, x0 # i = 0
jal ra, loopCon # jump to loop for concatenation
add t0, x0, x0 # i = 0
jal ra, loopPrint # jump to loop for print
jal ra, exit # jump to exit
loopCon:
lw t1, 0(s1) # t1 = a[i]
sw t1, 0(s2) # b[i] = t1
sw t1, 12(s2) # b[i+3] = t1
addi s1, s1, 4 # address move forward
addi s2, s2, 4 # address move forward
addi t0, t0, 1 # i++
blt t0, s3, loopCon # if i < 3, go to loopCon
ret # else, return to main
loopPrint:
lw t2, 0(s2) # t2 = b[i]
addi s2, s2, 4 # address move forward
add a0, t2, x0 # load result of array b
li a7, 1 # print integer
ecall
la a0, space # load string - space
li a7, 4 # print string
ecall
addi t0, t0, 1 # i++
blt t0, s4, loopPrint # if i < 6, go to loopPrint
ret # else, return to main
exit:
li a7, 10 # end program
ecall
```
:::warning
:warning: If you have a comprehensive automated test suite, no `print` is really needed. Instead, everything should be self-contained.
:notes: jserv
:::
## Analysis
I test my code using [Ripes](https://github.com/mortbopet/Ripes) simulator.
### Pseudo instuction
Put code above into editor and we will see that Ripe doesn’t execute it literally. Instead, it replace [pseudo instruction (p.110)](https://riscv.org//wp-content/uploads/2017/05/riscv-spec-v2.2.pdf) into equivalent one, and change register name from [ABI (application binary interface)](https://zh.wikipedia.org/zh-tw/%E5%BA%94%E7%94%A8%E4%BA%8C%E8%BF%9B%E5%88%B6%E6%8E%A5%E5%8F%A3) name to sequencial one.
The translated code as follows:
```
00000000 <main>:
0: 10000497 auipc x9 0x10000
4: 00048493 addi x9 x9 0
8: 10000997 auipc x19 0x10000
c: 0049a983 lw x19 4 x19
10: 01398a33 add x20 x19 x19
14: 000002b3 add x5 x0 x0
18: 018000ef jal x1 24 <loopCon1>
1c: 000002b3 add x5 x0 x0
20: 044000ef jal x1 68 <loopCon2>
24: 000002b3 add x5 x0 x0
28: 070000ef jal x1 112 <print>
2c: 09c000ef jal x1 156 <exit>
00000030 <loopCon1>:
30: 00028333 add x6 x5 x0
34: 00630333 add x6 x6 x6
38: 00630333 add x6 x6 x6
3c: 00930333 add x6 x6 x9
40: 00032383 lw x7 0 x6
44: 00028333 add x6 x5 x0
48: 00630333 add x6 x6 x6
4c: 00630333 add x6 x6 x6
50: 01230333 add x6 x6 x18
54: 00732023 sw x7 0 x6
58: 00128293 addi x5 x5 1
5c: fd32cae3 blt x5 x19 -44 <loopCon1>
60: 00008067 jalr x0 x1 0
00000064 <loopCon2>:
64: 00028333 add x6 x5 x0
68: 00630333 add x6 x6 x6
6c: 00630333 add x6 x6 x6
70: 00930333 add x6 x6 x9
74: 00032383 lw x7 0 x6
78: 01328333 add x6 x5 x19
7c: 00630333 add x6 x6 x6
80: 00630333 add x6 x6 x6
84: 01230333 add x6 x6 x18
88: 00732023 sw x7 0 x6
8c: 00128293 addi x5 x5 1
90: fd32cae3 blt x5 x19 -44 <loopCon2>
94: 00008067 jalr x0 x1 0
00000098 <print>:
98: 00092383 lw x7 0 x18
9c: 00038533 add x10 x7 x0
a0: 00100893 addi x17 x0 1
a4: 00000073 ecall
a8: 10000517 auipc x10 0x10000
ac: f6850513 addi x10 x10 -152
b0: 00400893 addi x17 x0 4
b4: 00000073 ecall
b8: 00490913 addi x18 x18 4
bc: 00128293 addi x5 x5 1
c0: fd42cce3 blt x5 x20 -40 <print>
c4: 00008067 jalr x0 x1 0
000000c8 <exit>:
c8: 00a00893 addi x17 x0 10
cc: 00000073 ecall
```
In each row it denotes address in instruction memory, instruction’s machine code (in hex) and instruction itself respectively.
## 5-stage pipeline processor
There are many types RISC-V processor in Ripes, as shown as follows. (RISC-V processors are similar to MIPS processors. MIPS is use in Computer Organization textbook of 白算盤)

I choose the 5-stage proceesor, and it's block diagram as shown as follows.

5-stage means the processor has a pipeline of five stages, running instructions in parallel. The five stages as follows.
1. **Instruction Fetch (IF)**
* Get Instructions from memory (mostly cache) by using PC .
* PC + 4 ($\because$ each instruction is 4 bytes)
2. **Instruction decode and register file read (ID)**
* Command decode and read Register
3. **EXecution/address calculation (EX)**
* ALU Execute or address operation
* Memory reference (load/store)
* Register-Register ALU instruction
* Register-Immediate ALU instruction
* Conditional branch
4. **Memory access (MEM)**
* Load: the memory does a read using the effective address computed in the previous cycle.
* Store: the memory writes the data from the second register read from the register file using the effective address.
5. **Write back (WB)**
* Write the result into the register file
* Load: from the memory system
* Register-Register ALU instruction: from the ALU
### Pipeline process
Different types of instructions will use different stages.
|Instruction|<--| --- | step name |---|-->|CPI|
|:---------:|:-:|:---:|:------------------:|:-:|:-:|:-:|
|R-type |IF |ID/RF|Execution | |WB | 4 |
|lw |IF |ID/RF|addr. calc. |MA |WB | 5 |
|sw |IF |ID/RF|addr. calc. |MA | | 4 |
|beq |IF |ID/RF|compare reg. | | | 3 |
|j |IF | ID |compose target addr.| | | 3 |
| | | | | | | |
|func. unit |IM |Reg. |ALU |DM |Reg.| 3|
> Function Unit: The function unit used by the stage.
> IM: Instruction Memory
> Reg: Register file
> DM: Data Memory
The following diagram is a simple assembly language program that uses Ripes to understand how the pipeline works.

#### Assembly code
```clike=
main:
addi s1, x0, 1 # s1 = 0 + 1
addi s2, x0, 6 # s2 = 0 + 6
addi s3, x0, 13 # s3 = 0 + 13
sw s1, 0(t0) # t0[0] content = s1 content
sub t1, s3, s1 # t1 = s3(13) - s1(1)
```
#### Pseudo code
```
00000000 <main>:
0: 00100493 addi x9 x0 1
4: 00600913 addi x18 x0 6
8: 00d00993 addi x19 x0 13
c: 0092a023 sw x9 0 x5
10: 40998333 sub x6 x19 x9
```
#### IF
In the begging, the PC has been added 4, and the first instruction has been put into pipline.

#### ID
In the ID stage, the PC is set to 8 and then the value of the register or source is read in.
R1 corresponds to x0 and the value is 0x00. R2 corresponds to the value 1 and the value is 0x01.

#### EX
In the EX stage, 0x00 and 0x01 are put into the ALU for the operation.

#### MEM
In the MEM stage, read and write (lw / sw) accesses to memory are performed in this stage, and non-lw / sw instructions are passed.

#### WB
In the WB stage, the result will be written to the destination registers (R-type / lw). If you do not have to write back to the register, it will pass directly.
Result (0x00000001) is the yellow line, and the above of yello line is the destination register 0x09 (x9 / s1).

### Register value
After five stages of the first instruction, the value of the s1 register is updated to 1.

## Pipeline Hazards
Pipeline is efficient, but can cause hazard problems.
There are three main types of Hazard that Pipeline encounters
|Hazard type |Reasons for occurrence|
|:---------------:|:--------------------:|
|Structual Hazards|Hardware resource is limited. e.g. only single memory|
|Data Hazards|An instruction in the pipeline requires a result that has not yet been generated by the previous instruction (which is still in the pipeline) (data dependency)|
|Control Hazards|The result specified by Branch has not yet been generated (not yet decided whether to jump or not), and it has been executed by other commands.|
### Structual Hazards
#### Solution
* add hardware (e.g. memory)
* e.g. If only have one memory, then every instruction use the same memory, should wait the previous instruction use memory first.
### Data Hazards
* Solution
* Software
* Insert nop
* Delay branch
* Hardware
* Stall + Forwarding
#### Data Hazards example
If we choose a 5-stage processor that without forwarding or hazard detection, then process these code.

The result of register t1, it's value is 0x00. This is the wrong value. Becasuse the code in line 4 occured the Data Hazards.

So, we can insert 2 nop to solve the Data Hazards.
When the first instruction addi in WB stage, the third instruction add in ID stage. So after the first addi write back, then the third add will from ID to IF, then the third add can read the right value on t1(x9) in IF stage.

After inserting two nop, t1 successfully writes the value.

### Control Hazards
* Solution
* Software
* Insert nop
* Delay branch
* Hardware
* Predict not taken
* Flush wrong instruction
#### Control Hazards example
```clike=
main:
addi s1, x0, 1 # s1 = 0 + 1
addi s2, x0, 1 # s2 = 0 + 1
beq s1, s2, jump # if(s1==s2) then jump to 'jump' label
add t0, s1, s2 # t0 = s1 + s2 (set to = 2)
jump:
add t1, s1, s2 # t1 = s1 + s2 (set t1 = 2)
```
If Control Hazards occured, the beq in Ex stage decided to jump, but the back instructions are in the IF and ID stage.


Because the branch instruction decided to jump, so the pipeline flush the instruction in IF and ID stages, then read the branch's target instruction. Thus, solved the Control Hazards by using nop wastes two clocks.

## Reference
### Leetcode
* [Leetcode Discuss - with C](https://leetcode.com/problems/concatenation-of-array/discuss/1341644/with-C)
* [Leetcode Discuss - C / C++ / Python Simple and Short Solutions](https://leetcode.com/problems/concatenation-of-array/discuss/1418404/C-C%2B%2B-Python-Simple-and-Short-Solutions)
### RISC-V ISA
* [RISC-V 指令集架構介紹 - Integer Calling convention](https://tclin914.github.io/77838749/)
* [RISC-V指令集讲解(5)条件和无条件跳转指令](https://zhuanlan.zhihu.com/p/394860078)
### Pipeline
* [RISC-V流水线CPU 计算机组成与设计第四章第二部分](https://blog.csdn.net/Photice/article/details/125240998?ops_request_misc=&request_id=&biz_id=102&utm_term=riscv%20pipeline&utm_medium=distribute.pc_search_result.none-task-blog-2~all~sobaiduweb~default-3-125240998.nonecase)
* [Pipelining: Basic and Intermediate Concepts](https://blog.csdn.net/weixin_42437114/article/details/115285005)
* [从零开始设计RISC-V处理器——五级流水线之数据冒险](https://blog.csdn.net/qq_45677520/article/details/122806845?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522166489581516800182773726%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fall.%2522%257D&request_id=166489581516800182773726&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~all~first_rank_ecpm_v1~pc_rank_v39-12-122806845-null-null.142^v51^pc_rank_34_2,201^v3^add_ask&utm_term=riscv%20%E6%B5%81%E6%B0%B4%E7%B7%9A%20%E4%BB%8B%E7%B4%B9)
* [RISC-V Datapath Part4: Pipeline](https://zhuanlan.zhihu.com/p/267576239)
* [Day-7 Pipeline](https://ithelp.ithome.com.tw/articles/10261505)
* [Hazard](https://hackmd.io/@dZfCcN4hT8aUuDPv3B8CWQ/rk7NPf2kK)
### Computer Organization / Architecture
* [計算機組織結構](https://hackmd.io/@8bFA57f7SRG-K7AAT8s62g/ryv1NT3S#%E8%A8%88%E7%AE%97%E6%A9%9F%E7%B5%84%E7%B9%94%E7%B5%90%E6%A7%8B)
* [wjungle 筆記](https://www.ptt.cc/bbs/graduate/M.1476022765.A.174.html)
### Past assignment in this class
* [Assignment1: RISC-V Assembly and Instruction Pipeline - contributed by kaeteyaruyo](https://hackmd.io/@kaeteyaruyo/risc-v-hw1)