# Assignment1: RISC-V Assembly and Instruction Pipeline -- Bubble Sort contributed by < `Shengyuu` > ## C code Sort integer array `arr[]` which includes 5 integer with bublle sort algorithm ```cpp #include <stdio.h> int main() { int arr[5]; arr[0] = 3; arr[1] = 5; arr[2] = 1; arr[3] = 2; arr[4] = 4; int tmp; for(int i = 0; i < 4; i++){ for(int j = 0; j < 4 - i; j++){ if(arr[j] > arr[j + 1]){ tmp = arr[j]; arr[j] = arr[j + 1]; arr[j + 1] = tmp; } } } return 0; } ``` ## Assambly code ```cpp main: #create stack addi sp,sp,-48 #save s0 sw s0,44(sp) #update s0 addi s0,sp,48 #init arr[] to memory li a5,3 sw a5,-48(s0) li a5,5 sw a5,-44(s0) li a5,1 sw a5,-40(s0) li a5,2 sw a5,-36(s0) li a5,4 sw a5,-32(s0) #init i to memory sw zero,-20(s0) j .L2 .L6: #init j to memory sw zero,-24(s0) j .L3 .L5: #load arr[ j ] to a4 lw a5,-24(s0) slli a5,a5,2 addi a4,s0,-16 add a5,a4,a5 lw a4,-32(a5) #load arr[j+1] a5 lw a5,-24(s0) addi a5,a5,1 slli a5,a5,2 addi a3,s0,-16 add a5,a3,a5 lw a5,-32(a5) #if arr[j + 1] > arr[j] bge a5,a4,.L4 # arr[j+1] < arr[j] #load arr[j] a5 lw a5,-24(s0) slli a5,a5,2 addi a4,s0,-16 add a5,a4,a5 lw a5,-32(a5) #store arr[j] to tmp sw a5,-28(s0) #load arr[j+1] to a4 lw a5,-24(s0) addi a5,a5,1 slli a5,a5,2 addi a4,s0,-16 add a5,a4,a5 lw a4,-32(a5) #arr[j] = arr[j+1] lw a5,-24(s0) slli a5,a5,2 addi a3,s0,-16 add a5,a3,a5 sw a4,-32(a5) #store tmp to arr[j+1] lw a5,-24(s0) addi a5,a5,1 slli a5,a5,2 addi a4,s0,-16 add a5,a4,a5 lw a4,-28(s0) sw a4,-32(a5) .L4: #j++ lw a5,-24(s0) addi a5,a5,1 sw a5,-24(s0) .L3: #check (j - i) < 4 li a4,4 lw a5,-20(s0) sub a5,a4,a5 lw a4,-24(s0) blt a4,a5,.L5 #i++ lw a5,-20(s0) addi a5,a5,1 sw a5,-20(s0) .L2: #check i < 4 lw a4,-20(s0) li a5,3 bge a5,a4,.L6 #return li a5,0 mv a0,a5 lw s0,44(sp) addi sp,sp,48 jr ra ``` - .main - Create stack and initialize the parameters `arr[]` and `i` in stack memory - .L2 - If `i < 4` jamp to `.L6`, else free the stack memory then return - .L6 - Initailize parameter `j` in stack memory - .L3 - If `(j - i) < 4` jump to `.L5`, else `i++` and go to `.L2` to check `i` - .L5 - Compare `arr[j]` to `arr[j+1]` - If `arr[j+1] < arr[j]` - Store `arr[j]` to `tmp` - Assign `arr[j+1]` to `arr[j]` - Assign `tmp` to `arr[j+1]` - Go to `.L4` to exucute `j++` then check `j` - If `arr[j+1]` > `arr[j]` - Jump to `.L4` and excute `j++` then check `j` - .L4 - Exute `j++` ### Statck Memory Layout ![](https://i.imgur.com/pow23dx.png) <details> <summary></summary> @startuml rectangle rectangle[ 0 ---- -4 -> s0 ---- ... ... ... ---- -20 -> i ---- -24 -> j ---- -26 -> tmp ---- -32 -> arr[4]; ---- -36 -> arr[3]; ---- -40 -> arr[2]; ---- -44 -> arr[1]; ---- -48 -> arr[0]; ] @enduml </details> ## Memory layout with Ripes simulator There are instruction memory and data memory in Ripes simulator memory. Instruction memory start from `0x00000000` to `0x0000011f` because there are 72 instructions in bubble sort program, and each instruction uses 4 instruction addresss, therefore the highest address is `(72 * 4 - 1)` which is `0x0000011f`. ![](https://i.imgur.com/GAuSl6m.png) ... ... ... ![](https://i.imgur.com/K66eyyQ.png) Data memory start from `0x7ffffff0` to `0x7fffffc0` becuase the inital value of register `sp` is `0x7ffffff0` and we need 48 bytes to store parameters. ![](https://i.imgur.com/p57kWfx.png) The parameter `arr[]` stores in the buttom of the stack which is `0x7fffffc0` to `0x7fffffd0`. ![](https://i.imgur.com/H1ajD4x.png) ## How instructions work in 5 stage pipeline cpu with Ripes simulator I will explain a couple instructions step by step ### addi x2 x2 -48 (addi sp,sp,-48) This instruction creates a stack to store the parameter of main function. #### IF stage In this stage the cpu fetch the instruction at the address `0x00000000` in the instruction memory, and the value of the instruction is `0xfd010113` because the memory is in little-endian ordering. ![](https://i.imgur.com/oxixvP2.png) ![](https://i.imgur.com/cdZaPxx.png) #### DE stage In this stage the cpu decode the instruction, find the data in register `x2` and store it to ID/EX register for next stage. ![](https://i.imgur.com/I4wHe4b.png) #### EXE stage In this stage the cpu sends the data of `x2` register and immediate date to ALU for computing. ![](https://i.imgur.com/Y485pjv.png) #### MEM stage Because the instruction doesn't touch the memory, nothing to be done in this stage. ![](https://i.imgur.com/qXrYsm3.png) #### WB stage In this stage the cpu has to write the data back to `x2` register. ![](https://i.imgur.com/YYUsZhE.png) ## Issues in pipeline cpu There are categories of data dependency situation and branch issue in pipeline cpu, i will instroduce a couple of those. ### Data dependency The instruction `sw x15 -48(x8)` has to use `x8` register which's value is be changed by instruction `addi x8 x2 48`, hence we have to send the data of the `x8` register to the MUX of the EXE stage, preventing from using error data in `x8` register. ![](https://i.imgur.com/wqHNprW.png) ### jump instruction Because the jump instruction, the cpu has to flush the the instruction which has been fetch in previous stage. ![](https://i.imgur.com/dqTDKMx.png) ### branch instuction Go to branch or not is decideed in decode stage, if the register's data hasn't be updated, the cpu has to add `nop` instruction to stall the pipeline cpu. ![](https://i.imgur.com/5qeGzQR.png) ## The result of bubble sort The datas in `arr[]` have been sorted successfully ![](https://i.imgur.com/3yueYKO.png) ###### tags: `Computer archetecture` `Assignment`