# Lab1: RV321 Simulator
## Decode XORed Array
### problem
[leetcode1720](https://leetcode.com/problems/decode-xored-array/)
There is a hidden integer array arr that consists of n non-negative integers.
It was encoded into another integer array encoded of length n - 1, such that encoded[i] = arr[i] XOR arr[i + 1]. For example, if arr = [1,0,2,1], then encoded = [1,2,3].
You are given the encoded array. You are also given an integer first, that is the first element of arr, i.e. arr[0].
Return the original array arr. It can be proved that the answer exists and is unique.
Example 1:
```C=
Input: encoded = [1,2,3], first = 1
Output: [1,0,2,1]
Explanation: If arr = [1,0,2,1], then first = 1 and encoded = [1 XOR 0, 0 XOR 2, 2 XOR 1] = [1,2,3]
```
Example 2:
```C=
Input: encoded = [6,2,7,3], first = 4
Output: [4,2,0,7,4]
```
### Solution
In previous example,we can find encoded[i]=output[i] ^ output[i+1]

However,If we check the XOR truth table above,we can also find B ^ Y = A,
Thus,the output[i+1] can be generated by encoded[i] ^ output[i] .
### C code
```C=
int* decode(int* encoded, int encodedSize, int first, int* returnSize){
int* result = (int*)malloc(sizeof(int)*(encodedSize+1));
result[0]=first;
for(int i=0;i<encodedSize;i++){
result[i+1]=result[i] ^ encoded[i];
}
*returnSize = encodedSize+1;
return result;
}
```
### Assembly code
```C=
.data
num1: .word 1,2,3
numSize1: .word 3
first1: .word 1
num2: .word 6,2,7,3
numSize2: .word 4
first2: .word 4
num3: .word 5,6,7,8,9
numSize3: .word 5
first3: .word 2
nextline: .string "\n"
space: .string " "
.text
main:
la a1,num1 #get num1 address
la a2,numSize1 #get numSize1 address
la a3,first1 #get first1 address
lw a2,0(a2) #a2 = numSize1 = 3
lw a3,0(a3) #a3 = first1 = 1
li a4,0 #int i = 0
li a5,0 #int count = 0
li a6,1 #input set =3
jal ra,decode.L1 #goto decode.L1
input2:
la a1,num2 #get num2 address
la a2,numSize2 #get numSize2 address
la a3,first2 #get first2 address
lw a2,0(a2) #a2 = numSize2 = 4
lw a3,0(a3) #a3 = first2 = 4
li a4,0 #int i = 0
li a5,0 #int count = 0
addi a5,a5,1 #count++
jal ra,decode.L1 #goto decode.L1
input3:
la a1,num3 #get num3 address
la a2,numSize3 #get numSize3 address
la a3,first3 #get first3 address
lw a2,0(a2) #a2 = numSize3 = 5
lw a3,0(a3) #a3 = first3 = 2
li a4,0 #int i = 0
addi a5,a5,1 ##count++
jal ra,decode.L1 #goto decode.L1
decode.L1: #only use t0,t1,t2
add t0,a3,x0
sw t0,0(a4) #result[0] = first
j decode.L2
decode.L2:
blt a4,a2,decode.L3 #if i<numSize then goto decode.L3
addi a4,x0,0 #i=0
j print #goto print
decode.L3:
slli t3,a4,2 #a4*4
add t4,t3,a1 #get num1[i]
lw t1,0(t4) #get num1[i]
lw t2,0(t3) #get result[i]
xor t5,t1,t2 #result[i] ^ num[i]
addi t6,t3,4 #a4*4,k = i + 1
sw t5,0(t6) #result[k] = result[i] ^ num[i]
addi a4,a4,1 #i++
j decode.L2 #goto decode.L2
print:
slli t3,a4,2 #a4*4
lw a0,0(t3) #get result[i]
li a7,1 #systemcall print
ecall #execute
addi a4,a4,1 #i++
la a0,space #print " "
li a7,4
ecall
ble a4,a2,print #if i<numSize then goto print
la a0,nextline #nextline
li a7,4
ecall
beq a5,x0,input2 #after excute input1,goto input2
beq a5,a6,input3 #after excute input2,goto input3
j exit
exit:
li a7,10
ecall
```
### Optimization
```C=
.data
num1: .word 1,2,3
numSize1: .word 3
first1: .word 1
num2: .word 6,2,7,3
numSize2: .word 4
first2: .word 4
num3: .word 5,6,7,8,9
numSize3: .word 5
first3: .word 2
nextline: .string "\n"
space: .string " "
.text
main:
la a1,num1 #get num1 address
la a2,numSize1 #get numSize1 address
la a3,first1 #get first1 address
lw a2,0(a2) #a2 = numSize1 = 3
lw a3,0(a3) #a3 = first1 = 1
li a4,0 #int i = 0
li a5,0 #int count = 0
li a6,3 #input set =3
jal ra,decode.L1 #goto decode.L1
la a1,num2 #get num2 address
la a2,numSize2 #get numSize2 address
la a3,first2 #get first2 address
lw a2,0(a2) #a2 = numSize2 = 4
lw a3,0(a3) #a3 = first2 = 4
jal ra,decode.L1 #goto decode.L1
la a1,num3 #get num3 address
la a2,numSize3 #get numSize3 address
la a3,first3 #get first3 address
lw a2,0(a2) #a2 = numSize3 = 5
lw a3,0(a3) #a3 = first3 = 2
decode.L1:
add t0,a3,x0
sw t0,0(a4) #result[0] = first
decode.L2: #loop
beq a4,a2,index_zero #if i=numSize then goto index_zero
slli t0,a4,2 #a4*4
add t2,t0,a1 #get num1[i]
lw t1,0(t2) #get num1[i]
lw t2,0(t0) #get result[i]
xor t1,t1,t2 #result[i] ^ num[i]
addi t2,t0,4 #a4*4,k = i + 1
sw t1,0(t2) #result[k] = result[i] ^ num[i]
addi a4,a4,1 #i++
j decode.L2 #goto decode.L2
index_zero:
addi a4,x0,0 #i=0
print:
slli t0,a4,2 #a4*4
lw a0,0(t0) #get result[i]
li a7,1 #systemcall print
ecall #execute
addi a4,a4,1 #i++
la a0,space #print " "
li a7,4
ecall
ble a4,a2,print #if i<numSize then goto print
la a0,nextline #nextline
li a7,4
ecall
addi a5,a5,1 #count++
beq a5,a6,exit #excute all dataset
li a4,0 #int i = 0
jr ra
exit:
li a7,10
ecall
```
### Results
* dataset1
> num1 = [1,2,3]
numSize1 = 3
first1 = 1
result1 = [1,1 XOR 1,0 XOR 2,2 XOR 3] = [1,0,2,1]
* dataset2
> num2 = [6,2,7,3]
numSize2 = 4
first2 = 4
result2 = [4,4 XOR 6,2 XOR 2,0 XOR 7,7 XOR 3] = [4,2,0,7,4]
* dataset3
> num3 = [5,6,7,8,9]
numSize3 = 5
first3 = 2
result3 = [2,2 XOR 5,7 XOR 6,1 XOR 7,6 XOR 8,14 XOR 9] = [2,7,1,6,14,7]
* Ripes result

### 5-stage-pipeline processor

There are 5 stage in 5-stage-pipeline processor :
(Take instruction " addi x14 x14 1 " for example)
1. IF(Instruction Fetch)
PC will sent into Instruction Memory as address to get Instruction,after Fetch Instruction,PC will plus 4 to get next Instruction.
(First Instruction)

(Second Instruction)

2. ID(Instruction Decode)
In this stage ,the instruction will be decoded,and the address is also sent into Registers to get the value.

| Instruction | opcode | rd address|rs1 address|Immediate|
| -------- | -------- | -------- | ------| ------- |
| 0x00170713 | 0x13(ADDI) | 0x0E |0X0E| 0x00000001|
3. EXE(Execution)
The data come from the ID stage will be excuted by the ALU according to the Opcode.

In the above image,we can understand the Instruction is ADDI,and the input from rs1 and Immediate is 0x00000002 and 0X00000001,thus after the ALU ADD operation , the output is 0X00000003.
4. MEM(Memory Access)
In this stage,Data Memory can read/write according to the instruction .

In ADDI operation,the Data memory Writing is disabled,
and the data 0x00000003 from EXE stage is pass to WB stage.
5. WB(Write Back)
The MUX in Write Back stage will choose which datapath can sent back to the Registers.

In the image we can find the data 0x00000003 is writeback to Registers with address 0x0E.
* Registers before instruction " addi x14 x14 1 "

* Registers after instruction " addi x14 x14 1 "

### Pipeline Hazards
#### Data Hazard
* When two consecutive instruction have data dependency,due to the pipeline structure,the data come from ID stage would be wrong.
However,it can be solved by forwarding.

* above image show that the instrution in EXE,MEM have data dependency(auipc's rd is as same as addi's rs1),and the yellow line show how forwarding work,by using mux to choose data from MEM stage,the data hazard can be easily avoided.
* when load-use occured,it need to stall for 1 cycle and use forwarding.

#### Branch Hazard
When Jump/Branch happened,the instructions in EXE stage and ID stage become useless.Thus it's important to let instruction in EXE stage and ID stage become NOP.


* The image show the instructions in address 0xC8,0xCC are still in pipeline before Jump is executed.
### Reference
[Hazard (computer architecture)](https://en.wikipedia.org/wiki/Hazard_(computer_architecture))
[RISC-V 整數基本指令集](https://ithelp.ithome.com.tw/articles/10194907)