# Assignment1: RISC-V Assembly and Instruction Pipeline contributed by < [jumha](https://github.com/jumha) > tags: `RISC-V` `computer architecture` `homework` ## Diagonal Matrix Sum This is one of the "Easy" problems listed at LeetCode. The problem is follows. > Given a square matrix, return the sum of the matrix diagonals. > Only include the sum of all the elements on the primary diagonal and all the elements on the secondary diagonal that are not part of the primary diagonal. ![](https://i.imgur.com/zAij7iY.png) ## C code The function prototype is given by the problem. ```c int diagonalSum(int** mat, int matSize, int matColSize); ``` This is the source code. The input matrix size is $3 \times 3$ and the diagonal sum is $29$. ```c #include <stdio.h> int diagonalSum(int** mat, int matSize, int matColSize); int main() { int input_matrix[3][3] = { {1, 5, 9}, {7, 9, 7}, {6, 10, 4} }; size_t n = sizeof(*input_matrix) / sizeof(int); int *mat[n]; int i; for (i = 0; i < n; ++i) { //Record the address of each row mat[i] = input_matrix[i]; } int sum = diagonalSum(mat, n, n); printf("The diagonal sum of this matrix is %d\n", sum); return 0; } int diagonalSum(int** mat, int matSize, int matColSize) { int i, sum = 0; for (i = 0; i < matSize; ++i) { //Sum the primary diagonal and secondary diagonal sum += mat[i][i] + mat[i][matSize - i - 1]; } if (matSize % 2) { //Substrate the intersection of primary diagonal and secondary diagonal sum -= mat[matSize / 2][matSize / 2]; } return sum; } ``` ### Explanation The `matSize` is the number of the rows in matrix. The element of primary diagonal is `mat[i][i]`, and of secondary diagonal is `mat[i][matSize - i - 1]`. To solve this problem, I calculate the summation of primary diagonal and secondary diagonal first. Then, check the `matSize` is odd or even. If `matSize` is odd, it means that there is an intersection of primary diagonal and secondary diagonal, so it needs to be substrated from the sum. Otherwise, the sum is the answer. ## Assembly code ```c .data arr: .word 1, 5, 9, 7, 1, 7, 6, 10, 4 # input matrix str: .string "The diagonal sum of this matrix is " rowsize: .word 3 colsize: .word 3 .text main: addi sp, sp, -16 sw ra, 0(sp) la a0, arr lw a1, rowsize lw a2, colsize sw a0, 4(sp) # *mat addi t0, a0, 12 sw t0, 8(sp) # (* + 1)mat addi t0, a0, 24 sw t0, 12(sp) # (* + 2)mat addi a0, sp, 4 jal ra, diagonalSum jal ra, print lw ra, 0(sp) addi sp, sp, 16 li a7, 10 # end ecall diagonalSum: addi sp, sp, -8 sw s0, 0(sp) sw s1, 4(sp) add s1, zero, zero # sum = 0 add t0, zero, zero # i = 0 loop1: slli t1, t0, 2 add s0, t1, a0 lw s0, 0(s0) # (* + i)mat add t1, s0, t1 lw t1, 0(t1) # (* + i)(* + i)mat sub t2, a1, t0 addi t2, t2, -1 # matSize - i - 1 slli t2, t2, 2 add t2, s0, t2 lw t2, 0(t2) # (* + i)(* + matSize - i - 1)mat add t1, t1, t2 add s1, s1, t1 # sum += mat[i][i] + mat[i][matSize - i - 1] addi t0, t0, 1 slt t1, t0, a1 slli t1, t0, 2 bnez t1, loop1 andi t1, a1, 1 beqz t1, L1 srli t0, a1, 1 # rowsize / 2 slli t0, t0, 2 add s0, t0, a0 lw s0, 0(s0) # (* + i)mat add t0, s0, t0 lw t0, 0(t0) # (* + i)(* + i)mat sub s1, s1, t0 L1: mv a1, s1 lw s0, 0(sp) lw s1, 4(sp) addi sp, sp, 8 jr ra print: la a0, str # load string li a7, 4 # print string ecall mv a0, a1 # load sum li a7, 1 # print integer ecall jr ra ``` :::warning Can you use fewer instructions? :notes: jserv ::: ### Explanation The input matrix is represented as 2D array stored in memory in row major, that is, the address of next row is differ by 12 bytes(colume size * word size). In main function, I store return address and the address of three rows of the matrix in stack memory, so I need to add 16 in stack pointer register. ## Analysis In this homework, we use Ripes to simulate the assembly code running on RISC-V 5-stage pipelined processor(32bits). ![](https://i.imgur.com/FB9OhH9.png) This processor has five stage, IF, ID, EXE, MEM and WB. It means that our instruction will be executed in five stage. * IF stage(Instruction Fecth): Read instruction from memory * ID stage(Instruction Decode): Decode the instruction and read from registers * EX stage(Execute): Do arithmatic or logic operation * MEM stage(Memory access): Laod/store data from/to memory * WB stage(Write Back): Write the data to register #### Branch instruction We focus on `bnez t1, loop1`. If the data in t1 is not equal to 0, the next instruction is at label loop1. ![](https://i.imgur.com/fHjDx3N.png) At IF stage, PC calculate the address of `bnez t1, loop1` from memory. ![](https://i.imgur.com/2sxsD5L.png) In next cycle, `bnez t1, loop1` at ID stage. It decode the instruction as BNE and read the data from register `t1` and `x0`. Moreover, at IF stage, processor reads instruction `andi t1, a1, 1`, where the address is increased by 4, since the pipilined processor will load the next instruction whatever the result of branch instruction is. And I think this processor has no branch prediction because it always reads instruction from address increased by 4. ![](https://i.imgur.com/wcBl9ES.png) In next cycle, `bnez t1, loop1` at EX stage. The processor calculate the address of labal `loop1`, and it is branch taken so IF/ID and ID/EX stage register will be clear in next stage. The processor will also flush the instructions in IF and ID stage in next stage. ![](https://i.imgur.com/B7T0yoq.png) We can see that the instructions in ID and EX stages are `nop` and the instruction in IF stage is `slli t1, t0, 2` which is the instruction at `loop1`. The branch instruction will do nothing in MEM and WB stages. #### Data hazard ![](https://i.imgur.com/BxrjNFD.png) In my assembly code,`lw t2, 0(t2)` is followed by `add t1, t1, t2`. It will have data hazard in register `t2` as known as RAW(Read After Write) hazard. The reason is that the data at `t2` will be read from Data memory and it happens in MEM stage. However, the next instruction read the wrong data in `t2` in ID stage. ![](https://i.imgur.com/TRAXbco.png) To solve this hazard, there will be a `nop` instruction between `lw t2, 0(t2)` and `add t1, t1, t2`. After the data read from Data memory, the data will be forwarding to EX stage so that the instruction `add t1, t1, t2` can be executed correctly. ![](https://i.imgur.com/QgWvPF5.png) Finally, the data will be written back to register `t2` in WB stage.