Assignment1: RISC-V Assembly and Instruction Pipeline

# Assignment1: RISC-V Assembly and Instruction Pipeline
contributed by < `Cheng Yu` >

###### tags: `Computer Architecture`

## Insertion Sort
### C code
```cpp
#include <stdio.h>

int main(void)
{
    int data[5] = {2, 3, 7, 4, 1};
    int n = 5;
    int i, j, temp;

    /* Insertion sort: shift larger elements right, then place temp. */
    for (i = 1; i < n; i++) {
        temp = data[i];
        for (j = i - 1; j >= 0 && data[j] > temp; j--) {
            data[j + 1] = data[j];
        }
        data[j + 1] = temp;
    }

    printf("Sorted array = ");
    for (i = 0; i < n; i++) {
        printf("%d ", data[i]);
    }
    printf("\n");
    return 0;
}
```
### Assembly code
```cpp
.data
arr:  .word 2, 3, 7, 4, 1
str1: .string "Sorted array = "

.text
main:
    la   s0, arr
    addi t0, x0, 5        # initial n
    addi t1, x0, 0        # initial i
    jal  ra, Loopi        # sort the array

    # Print the result to console
    jal  ra, print

    # Exit program
    li   a7, 10
    ecall

Loopi:
    addi t1, t1, 1        # i++
    bge  t1, t0, EndSort  # if (i >= n) sorting is done (check before loading data[i])
    slli t4, t1, 2        # get the address of data[i]
    add  s1, s0, t4
    lw   t5, 0(s1)        # t5 = data[i]
    add  t3, t5, x0       # temp = data[i]
    addi t2, t1, -1       # j = i - 1

Loopj:
    blt  t2, x0, Loopi    # if (j < 0) leave Loopj (check before loading data[j])
    slli t4, t2, 2        # get the address of data[j]
    add  s1, s0, t4
    lw   t6, 0(s1)        # t6 = data[j]
    bge  t3, t6, Loopi    # if (temp >= data[j]) leave Loopj
    sw   t6, 4(s1)        # data[j+1] = data[j]
    sw   t3, 0(s1)        # data[j] = temp (temp is written on every shift, unlike the C code)
    addi t2, t2, -1       # j--
    j    Loopj

EndSort:
    jr   ra               # return to main

print:
    la   a0, str1         # print the "Sorted array = " string
    li   a7, 4
    ecall
    lw   t0, 0(s0)        # print data[0]
    mv   a0, t0
    li   a7, 1
    ecall
    lw   t0, 4(s0)        # print data[1]
    mv   a0, t0
    li   a7, 1
    ecall
    lw   t0, 8(s0)        # print data[2]
    mv   a0, t0
    li   a7, 1
    ecall
    lw   t0, 12(s0)       # print data[3]
    mv   a0, t0
    li   a7, 1
    ecall
    lw   t0, 16(s0)       # print data[4]
    mv   a0, t0
    li   a7, 1
    ecall
    ret
```
### Result
![](https://i.imgur.com/ybmkJ84.png)

## Ripes simulator
### Pipeline
The processor model in the simulator is a 5-stage pipeline (IF, ID, EX, MEM, WB). Each stage is introduced below:

* **IF (Instruction fetch)**
    * Fetch the instruction addressed by the program counter from instruction memory and write it into the IF/ID register.
    * If a load-use hazard happens, we must stall the pipeline in order to delay the next instruction.
      During the stall, the next instruction address is not written into the program counter and the IF/ID register is not overwritten, so the instruction that has not yet executed is kept and can proceed afterwards.
    * If a branch error happens, the wrong instruction address went into the program counter and a wrong instruction was fetched. We just flush that instruction and let the flushed instruction go into the IF/ID register.
* **ID (Instruction decode)**
    * Decode the instruction that comes from the IF/ID register.
    * If a load-use hazard happens, we do not write the result of decoding the instruction. After decoding, we find that the data we need is not ready, so we need to wait one cycle. After that cycle, the data is forwarded to the EX stage and this instruction also moves to the EX stage, so it gets the correct data.
    * If a branch error happens, we do not write the result of decoding the instruction. This instruction is wrong, so we cannot let it move to the next stage.
* **EX (Execute)**
    * In this stage, we execute the instruction according to its opcode and funct3.
    * If a data dependency exists, the results of previous instructions are forwarded to this stage.
    * This stage also checks whether a load-use hazard, a data dependency, or a branch occurs.
* **MEM (Memory access)**
    * According to the instruction opcode, this stage stores the execution result into memory, loads data from memory, or passes the result on to the next stage.
    * This stage can forward data to the EX stage if a data dependency exists between the MEM stage and the EX stage.
* **WB (Writeback)**
    * This stage writes the result into the register file.
    * This stage can forward data to the EX stage if a data dependency exists between the WB stage and the EX stage.

## Issues in pipeline CPU
In a pipeline CPU, we need to focus on dependency issues, control instructions, register updates, and memory updates.

1. dependency

    When the pipeline CPU encounters a RAW (read-after-write) hazard, it needs forwarding, or the performance will be very low. Because this CPU issues and executes in order, we do not need to care about other dependencies such as WAR and WAW.

    - example 1 (R-type, I-type)

        In this case there are two instructions. `srl` updates register x17 while `andi` needs x17 as a source register. Because the CPU has a forwarding mechanism, `andi` can read the latest value of x17 without inserting a bubble.
        ![](https://i.imgur.com/9nDspKB.jpg)

    - example 2 (lw, R-type)

        In this case a `srl` instruction follows a `lw` instruction, and `srl` needs the latest value of register x17.
        ![](https://i.imgur.com/dYMaKQH.jpg)
        In this cycle, we can see a stall before the `srl` instruction. Why is this stall needed? Because `lw` still has not accessed memory in this cycle.
        ![](https://i.imgur.com/RTYRv9h.jpg)
        In this cycle `srl` moves to the EX stage. `lw` is in the WB stage and the latest value of x17 is available, so the forwarding mechanism works.
        ![](https://i.imgur.com/CVYyKMS.jpg)

    - example 3 (control instruction)

        First we can see the instruction `beq x12,x17,12` in the decode stage while `andi x17,x17,1` is in the EX stage. Because `beq` needs the latest value of x17, it has to wait for two cycles. When the `andi` instruction is in the WB stage, `beq` can get the latest data and decide whether to branch. If the branch is taken, the fetch PC changes to the new target.
        ![](https://i.imgur.com/9brtX0u.jpg)
        ![](https://i.imgur.com/yoIc1Ny.jpg)
        ![](https://i.imgur.com/PIaioPU.jpg)
        ![](https://i.imgur.com/s6wgl6i.jpg)

2. control instruction

    Control instructions such as `j`, `jal`, `jr`, and `jalr` are checked in the decode stage. When a branch is needed, the decode stage calculates the branch PC to fetch.

    - example

        In this cycle `beq` can start to calculate. If the branch is taken, the PC becomes PC+12; otherwise it becomes PC+4.
        ![](https://i.imgur.com/k69mWfw.jpg)
        In the next cycle the decode stage is changed to a nop because the branch occurred. The PC is now PC+12, so the `addi` instruction is flushed and the `sll` instruction is fetched.
        ![](https://i.imgur.com/QwjP0Eq.jpg)

3. register updates

    The examples above already show how a register is updated in the CPU.

4. memory updates

    - example

        `sw` is in the MEM stage. In this cycle we can see that register x8 holds 0x7ffffff0 and register x15 holds 0xfffffb50, so we can predict that in the next cycle address 0x7fffffd4 (0x7ffffff0 - 0x1c) will store 0xfffffb50.
        ![](https://i.imgur.com/gYpXXcU.jpg)
        ![](https://i.imgur.com/CpnXTkP.jpg)
        ![](https://i.imgur.com/7z2Sf2U.jpg)
        In the next cycle `sw` is in the WB stage. We can see that address 0x7fffffd4 stores 0xfffffb50. The `sw` instruction has finished its job.
        ![](https://i.imgur.com/anwlNPC.jpg)
        ![](https://i.imgur.com/vMyjQUd.jpg)
        ![](https://i.imgur.com/gPNpA5V.jpg)
        ![](https://i.imgur.com/92NRil6.jpg)
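
The address arithmetic in the memory update example can be checked with a small C sketch. It assumes the store has the form `sw x15, -28(x8)`, which is inferred from the offset 0x1c in the text and is not shown in the original. The helper names `sign_extend12` and `effective_address` are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: interpret a 12-bit immediate field as a signed
 * two's-complement value, as RISC-V does for load/store offsets. */
int32_t sign_extend12(uint32_t imm12)
{
    imm12 &= 0xfffu;                      /* keep only the 12 immediate bits */
    return (imm12 & 0x800u) ? (int32_t)imm12 - 0x1000
                            : (int32_t)imm12;
}

/* Hypothetical helper: effective address = base register + signed offset,
 * computed modulo 2^32 like a 32-bit adder. */
uint32_t effective_address(uint32_t base, uint32_t imm12)
{
    return base + (uint32_t)sign_extend12(imm12);
}
```

With x8 = 0x7ffffff0 and an offset of -28 (encoded as 0xfe4), `effective_address(0x7ffffff0, 0xfe4)` gives 0x7fffffd4, the address where the example expects 0xfffffb50 to be stored.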

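The forwarding and load-use rules described in the Pipeline and dependency sections can be sketched in C. This is only a simplified model under stated assumptions, not the Ripes implementation: the `StageInfo` struct, the `Forward` enum, and both function names are hypothetical, and only the register index and two control bits of each pipeline register are modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of what one pipeline register carries. */
typedef struct {
    uint8_t rd;        /* destination register index (0 means x0) */
    bool    reg_write; /* the instruction writes rd */
    bool    mem_read;  /* the instruction is a load such as lw */
} StageInfo;

typedef enum { FWD_NONE, FWD_FROM_MEM, FWD_FROM_WB } Forward;

/* Choose where the EX stage reads source register rs from.
 * The MEM stage holds the newer instruction, so it has priority. */
Forward select_forward(uint8_t rs, StageInfo mem, StageInfo wb)
{
    if (rs == 0)
        return FWD_NONE;                 /* x0 is never forwarded */
    if (mem.reg_write && mem.rd == rs)
        return FWD_FROM_MEM;
    if (wb.reg_write && wb.rd == rs)
        return FWD_FROM_WB;
    return FWD_NONE;                     /* read the register file */
}

/* Load-use hazard: the instruction in EX is a load whose destination
 * is a source register of the instruction currently in ID. */
bool load_use_stall(StageInfo ex, uint8_t id_rs1, uint8_t id_rs2)
{
    return ex.mem_read && ex.rd != 0 &&
           (ex.rd == id_rs1 || ex.rd == id_rs2);
}
```

In terms of the examples above: when `andi` is in EX and the preceding `srl` (writing x17) is in MEM, `select_forward` picks the MEM stage, so no bubble is needed. When `lw` writing x17 is in EX and `srl` reading x17 is in ID, `load_use_stall` is true, and one cycle later than usual `srl` reaches EX with `lw` in WB, so the value is forwarded from WB.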