# Assignment 1: RISC-V Assembly and Instruction Pipeline
SC Lin
###### tags: `computer architure 2021`
## Perfect Number
Based on [LeetCode 507](https://leetcode.com/problems/perfect-number/), where a perfect number is a positive integer equal to its [aliquot sum](https://en.wikipedia.org/wiki/Aliquot_sum), the sum of all its positive divisors except the number itself. For example, 6 is perfect because 1 + 2 + 3 = 6.
## Assembly Code
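Solution: every proper divisor of _n_ is at most _n_/2, so for each candidate _d_ from _n_/2 down to 1, add _d_ to a running sum if it divides _n_ with no remainder, then check whether the sum equals _n_. The loop below is a do-while, so it assumes _n_ ≥ 2: for _n_ = 1 the counter starts at 0, the first `rem` divides by zero (which does not trap in RISC-V; it returns the dividend), and the counter then wraps around, so the loop runs 2^32 times before exiting.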
```
.data
in:     .word 6             # test input
.text
main:
        lw   a0, in         # a0 = n
        jal  ra, isPerfect  # a0 = isPerfect(n)
        li   a7, 1          # output answer (a7 = 1: print integer in a0)
        ecall
        li   a7, 10         # end program (a7 = 10: exit)
        ecall
isPerfect:                  # a0: input n, also the return value (1 or 0)
        srli a1, a0, 1      # a1 = a0 >> 1 (first candidate divisor, n/2)
        li   a3, 0          # a3 = 0 (running sum)
LOOP:                       # do {
        rem  a2, a0, a1     #   a2 = a0 % a1 (needs the M extension)
        bnez a2, EIF        #   if (a2 == 0)
        add  a3, a3, a1     #     a3 += a1
EIF:                        #   (end if)
        addi a1, a1, -1     #   a1--
        bnez a1, LOOP       # } while (a1 != 0)
        li   a1, 0          # a0 = (a3 == a0) ? 1 : 0
        bne  a3, a0, END
        li   a1, 1
END:
        mv   a0, a1         # return value in a0
        ret
```
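| Register | Use |
| - | - |
| a0 | input _n_ to `isPerfect` and its return value (1 if perfect, else 0) |
| a1 | candidate divisor, counting down from _n_/2 to 1 |
| a2 | remainder `a0 % a1` |
| a3 | running sum of divisors |
| a7 | `ecall` service number in `main` (1 = print integer, 10 = exit) |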
## Pipelining
We focus on the function's loop, which accounts for most of the execution time. The instructions involved (the loop body plus the setup instructions before it) and their RISC-V instruction formats are:

| Instruction | Format |
| - | - |
| `srli` | I |
| `rem` | R |
| `bnez` (assembled as `bne`) | B |
| `add` | R |
| `addi` (also `li`) | I |

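
The processor uses the classic 5-stage pipeline:
* Instruction Fetch (IF)
* Instruction Decode (ID)
* Execute (EX)
* Memory access (MEM)
* Write Back (WB)

Each stage has its own hardware and pipeline register, so all five stages can work on different instructions in the same cycle. Note that the only instruction in this program that actually accesses data memory in the MEM stage is the initial `lw` that loads the input; every other instruction passes through MEM without touching memory.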
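
Overlapping the stages is what makes the pipeline pay off. Ignoring stalls, flushes, and any difference in clock period, _N_ instructions through a _k_-stage pipeline need _k_ cycles for the first instruction to complete and one more cycle for each of the remaining ones:

$$
T_{\text{pipelined}} = k + (N - 1) \quad\text{cycles, versus}\quad T_{\text{unpipelined}} = kN
$$

For the seven instructions traced in the tables below (from `srli` to the loop-closing `bne`), `srli` enters IF in cycle 5 and the last `bne` leaves WB in cycle 15: 5 + 6 = 11 cycles, instead of 35 if each instruction had to finish all five stages before the next one started.
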
The figure below shows the pipelined RISC-V processor (multiplexers not shown).

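The subsections below follow a single loop iteration through the pipeline, one cycle at a time, tracking `rem` as it moves from IF to WB. The iteration shown is the first one for the test input 6 (so `a1` = 3), and the tables show no stalls in it.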
### Instruction fetch (IF)
(Cycle 7) The loop begins with `rem`. The two instructions before it, `addi` (the expansion of `li a3, 0`) and `srli`, are in the ID and EX stages respectively, and `rem` is fetched and latched into the IF/ID pipeline register.

|cycle|7|8|9|10|11|12|13|
|-|-|-|-|-|-|-|-|
|`srli`|EX
|`addi`|ID
|`rem`|IF
### Instruction decode (ID)
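(Cycle 8) `rem` is decoded and its operands are latched into the ID/EX pipeline register, while the next instruction, `bne`, is fetched into IF/ID. A read-after-write (RAW) data hazard arises here: `bne` reads `a2`, which `rem` will not write back until cycle 11. Forwarding resolves it without a stall by passing the `rem` result from the EX/MEM register straight to `bne` when `bne` executes in cycle 10.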

|cycle|7|8|9|10|11|12|13|
|-|-|-|-|-|-|-|-|
|`srli`|EX|MEM
|`addi`|ID|EX
|`rem`|IF|ID
|`bne`||IF
### Execute (EX)
(Cycle 9) `rem` leaves the ID/EX register and is executed, computing `a0 % a1`. `bne` is decoded and `add` is fetched.

|cycle|7|8|9|10|11|12|13|
|-|-|-|-|-|-|-|-|
|`srli`|EX|MEM|WB
|`addi`|ID|EX|MEM
|`rem`|IF|ID|EX
|`bne`||IF|ID
|`add`| ||IF
### Memory (MEM)
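(Cycle 10) `rem` is in the MEM stage, but since it is neither a load nor a store, nothing is read from or written to memory; its result simply passes through to the MEM/WB register. `bne` is executed, with its `a2` operand forwarded from `rem`. `add` is decoded and the next instruction, `addi`, is fetched. In this iteration `a2` = 6 % 3 = 0, so the branch is not taken and the sequentially fetched `add` and `addi` are the correct instructions.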

|cycle|7|8|9|10|11|12|13|
|-|-|-|-|-|-|-|-|
|`srli`|EX|MEM|WB
|`addi`|ID|EX|MEM|WB
|`rem`|IF|ID|EX|MEM|
|`bne`||IF|ID|EX
|`add`| ||IF|ID
|`addi`| |||IF
### Writeback (WB)
(Cycle 11) `rem` writes its result back to `a2` in the register file. `bne` passes through MEM without accessing memory. `add` is executed, `addi` is decoded, and the loop-closing `bne` (`bnez a1, LOOP`) is fetched.

|cycle|7|8|9|10|11|12|13|
|-|-|-|-|-|-|-|-|
|`srli`|EX|MEM|WB|
|`addi`|ID|EX|MEM|WB
|`rem`|IF|ID|EX|MEM|WB
|`bne`||IF|ID|EX|MEM
|`add`| ||IF|ID|EX
|`addi`| |||IF|ID
|`bne`| ||||IF