# Assignment1: RISC-V Assembly and Instruction Pipeline
contributed by < [jumha](https://github.com/jumha) >
tags: `RISC-V` `computer architecture` `homework`
## Diagonal Matrix Sum
This is one of the "Easy" problems listed at LeetCode. The problem is follows.
> Given a square matrix, return the sum of the matrix diagonals.
> Only include the sum of all the elements on the primary diagonal and all the elements on the secondary diagonal that are not part of the primary diagonal.

## C code
The function prototype is given by the problem.
```c
int diagonalSum(int** mat, int matSize, int matColSize);
```
This is the source code. The input matrix size is $3 \times 3$ and the diagonal sum is $29$.
```c
#include <stdio.h>
int diagonalSum(int** mat, int matSize, int matColSize);
int main() {
int input_matrix[3][3] = {
{1, 5, 9},
{7, 9, 7},
{6, 10, 4}
};
size_t n = sizeof(*input_matrix) / sizeof(int);
int *mat[n];
int i;
for (i = 0; i < n; ++i) { //Record the address of each row
mat[i] = input_matrix[i];
}
int sum = diagonalSum(mat, n, n);
printf("The diagonal sum of this matrix is %d\n", sum);
return 0;
}
int diagonalSum(int** mat, int matSize, int matColSize) {
int i, sum = 0;
for (i = 0; i < matSize; ++i) { //Sum the primary diagonal and secondary diagonal
sum += mat[i][i] + mat[i][matSize - i - 1];
}
if (matSize % 2) { //Substrate the intersection of primary diagonal and secondary diagonal
sum -= mat[matSize / 2][matSize / 2];
}
return sum;
}
```
### Explanation
The `matSize` is the number of the rows in matrix. The element of primary diagonal is `mat[i][i]`, and of secondary diagonal is `mat[i][matSize - i - 1]`. To solve this problem, I calculate the summation of primary diagonal and secondary diagonal first. Then, check the `matSize` is odd or even. If `matSize` is odd, it means that there is an intersection of primary diagonal and secondary diagonal, so it needs to be substrated from the sum. Otherwise, the sum is the answer.
## Assembly code
```c
.data
arr: .word 1, 5, 9, 7, 1, 7, 6, 10, 4 # input matrix
str: .string "The diagonal sum of this matrix is "
rowsize: .word 3
colsize: .word 3
.text
main:
addi sp, sp, -16
sw ra, 0(sp)
la a0, arr
lw a1, rowsize
lw a2, colsize
sw a0, 4(sp) # *mat
addi t0, a0, 12
sw t0, 8(sp) # (* + 1)mat
addi t0, a0, 24
sw t0, 12(sp) # (* + 2)mat
addi a0, sp, 4
jal ra, diagonalSum
jal ra, print
lw ra, 0(sp)
addi sp, sp, 16
li a7, 10 # end
ecall
diagonalSum:
addi sp, sp, -8
sw s0, 0(sp)
sw s1, 4(sp)
add s1, zero, zero # sum = 0
add t0, zero, zero # i = 0
loop1:
slli t1, t0, 2
add s0, t1, a0
lw s0, 0(s0) # (* + i)mat
add t1, s0, t1
lw t1, 0(t1) # (* + i)(* + i)mat
sub t2, a1, t0
addi t2, t2, -1 # matSize - i - 1
slli t2, t2, 2
add t2, s0, t2
lw t2, 0(t2) # (* + i)(* + matSize - i - 1)mat
add t1, t1, t2
add s1, s1, t1 # sum += mat[i][i] + mat[i][matSize - i - 1]
addi t0, t0, 1
slt t1, t0, a1
slli t1, t0, 2
bnez t1, loop1
andi t1, a1, 1
beqz t1, L1
srli t0, a1, 1 # rowsize / 2
slli t0, t0, 2
add s0, t0, a0
lw s0, 0(s0) # (* + i)mat
add t0, s0, t0
lw t0, 0(t0) # (* + i)(* + i)mat
sub s1, s1, t0
L1:
mv a1, s1
lw s0, 0(sp)
lw s1, 4(sp)
addi sp, sp, 8
jr ra
print:
la a0, str # load string
li a7, 4 # print string
ecall
mv a0, a1 # load sum
li a7, 1 # print integer
ecall
jr ra
```
:::warning
Can you use fewer instructions?
:notes: jserv
:::
### Explanation
The input matrix is represented as 2D array stored in memory in row major, that is, the address of next row is differ by 12 bytes(colume size * word size).
In main function, I store return address and the address of three rows of the matrix in stack memory, so I need to add 16 in stack pointer register.
## Analysis
In this homework, we use Ripes to simulate the assembly code running on RISC-V 5-stage pipelined processor(32bits).

This processor has five stage, IF, ID, EXE, MEM and WB. It means that our instruction will be executed in five stage.
* IF stage(Instruction Fecth): Read instruction from memory
* ID stage(Instruction Decode): Decode the instruction and read from registers
* EX stage(Execute): Do arithmatic or logic operation
* MEM stage(Memory access): Laod/store data from/to memory
* WB stage(Write Back): Write the data to register
#### Branch instruction
We focus on `bnez t1, loop1`. If the data in t1 is not equal to 0, the next instruction is at label loop1.

At IF stage, PC calculate the address of `bnez t1, loop1` from memory.

In next cycle, `bnez t1, loop1` at ID stage. It decode the instruction as BNE and read the data from register `t1` and `x0`. Moreover, at IF stage, processor reads instruction `andi t1, a1, 1`, where the address is increased by 4, since the pipilined processor will load the next instruction whatever the result of branch instruction is. And I think this processor has no branch prediction because it always reads instruction from address increased by 4.

In next cycle, `bnez t1, loop1` at EX stage. The processor calculate the address of labal `loop1`, and it is branch taken so IF/ID and ID/EX stage register will be clear in next stage. The processor will also flush the instructions in IF and ID stage in next stage.

We can see that the instructions in ID and EX stages are `nop` and the instruction in IF stage is `slli t1, t0, 2` which is the instruction at `loop1`.
The branch instruction will do nothing in MEM and WB stages.
#### Data hazard

In my assembly code,`lw t2, 0(t2)` is followed by `add t1, t1, t2`. It will have data hazard in register `t2` as known as RAW(Read After Write) hazard. The reason is that the data at `t2` will be read from Data memory and it happens in MEM stage. However, the next instruction read the wrong data in `t2` in ID stage.

To solve this hazard, there will be a `nop` instruction between `lw t2, 0(t2)` and `add t1, t1, t2`. After the data read from Data memory, the data will be forwarding to EX stage so that the instruction `add t1, t1, t2` can be executed correctly.

Finally, the data will be written back to register `t2` in WB stage.