# Assignment1: RISC-V Assembly and Instruction Pipeline -- Bubble Sort
contributed by < `Shengyuu` >
## C code
Sort integer array `arr[]` which includes 5 integer with bublle sort algorithm
```cpp
#include <stdio.h>
int main()
{
int arr[5];
arr[0] = 3;
arr[1] = 5;
arr[2] = 1;
arr[3] = 2;
arr[4] = 4;
int tmp;
for(int i = 0; i < 4; i++){
for(int j = 0; j < 4 - i; j++){
if(arr[j] > arr[j + 1]){
tmp = arr[j];
arr[j] = arr[j + 1];
arr[j + 1] = tmp;
}
}
}
return 0;
}
```
## Assambly code
```cpp
main:
#create stack
addi sp,sp,-48
#save s0
sw s0,44(sp)
#update s0
addi s0,sp,48
#init arr[] to memory
li a5,3
sw a5,-48(s0)
li a5,5
sw a5,-44(s0)
li a5,1
sw a5,-40(s0)
li a5,2
sw a5,-36(s0)
li a5,4
sw a5,-32(s0)
#init i to memory
sw zero,-20(s0)
j .L2
.L6:
#init j to memory
sw zero,-24(s0)
j .L3
.L5:
#load arr[ j ] to a4
lw a5,-24(s0)
slli a5,a5,2
addi a4,s0,-16
add a5,a4,a5
lw a4,-32(a5)
#load arr[j+1] a5
lw a5,-24(s0)
addi a5,a5,1
slli a5,a5,2
addi a3,s0,-16
add a5,a3,a5
lw a5,-32(a5)
#if arr[j + 1] > arr[j]
bge a5,a4,.L4
# arr[j+1] < arr[j]
#load arr[j] a5
lw a5,-24(s0)
slli a5,a5,2
addi a4,s0,-16
add a5,a4,a5
lw a5,-32(a5)
#store arr[j] to tmp
sw a5,-28(s0)
#load arr[j+1] to a4
lw a5,-24(s0)
addi a5,a5,1
slli a5,a5,2
addi a4,s0,-16
add a5,a4,a5
lw a4,-32(a5)
#arr[j] = arr[j+1]
lw a5,-24(s0)
slli a5,a5,2
addi a3,s0,-16
add a5,a3,a5
sw a4,-32(a5)
#store tmp to arr[j+1]
lw a5,-24(s0)
addi a5,a5,1
slli a5,a5,2
addi a4,s0,-16
add a5,a4,a5
lw a4,-28(s0)
sw a4,-32(a5)
.L4:
#j++
lw a5,-24(s0)
addi a5,a5,1
sw a5,-24(s0)
.L3:
#check (j - i) < 4
li a4,4
lw a5,-20(s0)
sub a5,a4,a5
lw a4,-24(s0)
blt a4,a5,.L5
#i++
lw a5,-20(s0)
addi a5,a5,1
sw a5,-20(s0)
.L2:
#check i < 4
lw a4,-20(s0)
li a5,3
bge a5,a4,.L6
#return
li a5,0
mv a0,a5
lw s0,44(sp)
addi sp,sp,48
jr ra
```
- .main
- Create stack and initialize the parameters `arr[]` and `i` in stack memory
- .L2
- If `i < 4` jamp to `.L6`, else free the stack memory then return
- .L6
- Initailize parameter `j` in stack memory
- .L3
- If `(j - i) < 4` jump to `.L5`, else `i++` and go to `.L2` to check `i`
- .L5
- Compare `arr[j]` to `arr[j+1]`
- If `arr[j+1] < arr[j]`
- Store `arr[j]` to `tmp`
- Assign `arr[j+1]` to `arr[j]`
- Assign `tmp` to `arr[j+1]`
- Go to `.L4` to exucute `j++` then check `j`
- If `arr[j+1]` > `arr[j]`
- Jump to `.L4` and excute `j++` then check `j`
- .L4
- Exute `j++`
### Statck Memory Layout

<details>
<summary></summary>
@startuml
rectangle rectangle[
0
----
-4 -> s0
----
...
...
...
----
-20 -> i
----
-24 -> j
----
-26 -> tmp
----
-32 -> arr[4];
----
-36 -> arr[3];
----
-40 -> arr[2];
----
-44 -> arr[1];
----
-48 -> arr[0];
]
@enduml
</details>
## Memory layout with Ripes simulator
There are instruction memory and data memory in Ripes simulator memory.
Instruction memory start from `0x00000000` to `0x0000011f` because there are 72 instructions in bubble sort program, and each instruction uses 4 instruction addresss, therefore the highest address is `(72 * 4 - 1)` which is `0x0000011f`.

...
...
...

Data memory start from `0x7ffffff0` to `0x7fffffc0` becuase the inital value of register `sp` is `0x7ffffff0` and we need 48 bytes to store parameters.

The parameter `arr[]` stores in the buttom of the stack which is `0x7fffffc0` to `0x7fffffd0`.

## How instructions work in 5 stage pipeline cpu with Ripes simulator
I will explain a couple instructions step by step
### addi x2 x2 -48 (addi sp,sp,-48)
This instruction creates a stack to store the parameter of main function.
#### IF stage
In this stage the cpu fetch the instruction at the address `0x00000000` in the instruction memory, and the value of the instruction is `0xfd010113` because the memory is in little-endian ordering.


#### DE stage
In this stage the cpu decode the instruction, find the data in register `x2` and store it to ID/EX register for next stage.

#### EXE stage
In this stage the cpu sends the data of `x2` register and immediate date to ALU for computing.

#### MEM stage
Because the instruction doesn't touch the memory, nothing to be done in this stage.

#### WB stage
In this stage the cpu has to write the data back to `x2` register.

## Issues in pipeline cpu
There are categories of data dependency situation and branch issue in pipeline cpu, i will instroduce a couple of those.
### Data dependency
The instruction `sw x15 -48(x8)` has to use `x8` register which's value is be changed by instruction `addi x8 x2 48`, hence we have to send the data of the `x8` register to the MUX of the EXE stage, preventing from using error data in `x8` register.

### jump instruction
Because the jump instruction, the cpu has to flush the the instruction which has been fetch in previous stage.

### branch instuction
Go to branch or not is decideed in decode stage, if the register's data hasn't be updated, the cpu has to add `nop` instruction to stall the pipeline cpu.

## The result of bubble sort
The datas in `arr[]` have been sorted successfully

###### tags: `Computer archetecture` `Assignment`