# Assignment1: RISC-V Assembly and Instruction Pipeline
contributed by < `Cheng Yu` >
###### tags: `Computer Architecture`
## Insertion Sort
### C code
```c
#include <stdio.h>

int main(void)
{
    int data[5] = {2, 3, 7, 4, 1};
    int n = 5;
    int i, j, temp;

    for (i = 1; i < n; i++)
    {
        temp = data[i];
        /* shift every element larger than temp one slot to the right */
        for (j = i - 1; j >= 0 && data[j] > temp; j--)
        {
            data[j + 1] = data[j];
        }
        data[j + 1] = temp;
    }

    printf("Sorted array = ");
    for (i = 0; i < n; i++)
    {
        printf("%d ", data[i]);
    }
    printf("\n");
    return 0;
}
```
### Assembly code
```asm
.data
arr:  .word 2, 3, 7, 4, 1
str1: .string "Sorted array = "
.text
main:
    la   s0, arr            # s0 = &data[0]
    addi t0, x0, 5          # initial n
    addi t1, x0, 0          # initial i
    jal  ra, Loopi
    # Print the result to console
    jal  ra, print
    # Exit program
    li   a7, 10
    ecall
Loopi:
    addi t1, t1, 1          # i++
    bge  t1, t0, Endi       # if (i >= n) leave Loopi, before reading data[i]
    slli t4, t1, 2          # get the address of data[i]
    add  s1, s0, t4
    lw   t3, 0(s1)          # temp = data[i]
    addi t2, t1, -1         # j = i - 1
Loopj:
    blt  t2, x0, Loopi      # if (j < 0) leave Loopj, before reading data[j]
    slli t4, t2, 2          # get the address of data[j]
    add  s1, s0, t4
    lw   t6, 0(s1)          # t6 = data[j]
    bge  t3, t6, Loopi      # if (temp >= data[j]) leave Loopj
    sw   t6, 4(s1)          # data[j+1] = data[j]
    sw   t3, 0(s1)          # data[j] = temp
    addi t2, t2, -1         # j--
    j    Loopj
Endi:
    jr   ra
print:
    la   a0, str1
    li   a7, 4              # print string
    ecall
    addi t1, x0, 0          # i = 0 (t0 still holds n)
Printi:
    slli t4, t1, 2          # get the address of data[i]
    add  s1, s0, t4
    lw   a0, 0(s1)          # a0 = data[i]
    li   a7, 1              # print integer
    ecall
    li   a0, 32             # ' ' between numbers, like "%d " in the C code
    li   a7, 11             # print character
    ecall
    addi t1, t1, 1          # i++
    blt  t1, t0, Printi     # loop while i < n
    li   a0, 10             # '\n'
    li   a7, 11
    ecall
    ret
```
### Result

## Ripes simulator
### Pipeline
The simulator's pipeline has five stages (IF, ID, EX, MEM, WB). Each stage works as follows:
* **IF (Instruction fetch)**
    * Fetches the instruction at the address in the program counter (PC) from instruction memory and writes it into the IF/ID register.
    * If a load-use hazard happens, the pipeline must stall so that the next instruction is delayed: the PC is not advanced and the IF/ID register is not overwritten, so the instruction that has not executed yet stays in place and is not lost.
    * If a branch is mispredicted, the instruction just fetched comes from the wrong PC, so it is flushed: a nop is written into the IF/ID register instead.
* **ID (Instruction decode)**
    * Decodes the instruction held in the IF/ID register.
    * If a load-use hazard happens, the decoded result is not passed on, because the data this instruction needs is not ready yet and it has to wait one cycle. After that cycle, the loaded value can be forwarded to the EX stage as this instruction moves into EX, so it gets the correct data.
    * If a branch is mispredicted, the decoded result is not passed on either, because this instruction was fetched by mistake and must not move to the next stage.
* **EX (Execute)**
    * Executes the instruction according to its opcode and funct3 (plus funct7 for R-type instructions).
    * If a data dependency exists, the results of earlier instructions are forwarded to this stage.
    * This stage also checks whether a load-use hazard, a data dependency, or a branch has occurred.
* **MEM (Memory access)**
    * Depending on the opcode, this stage stores the executed result into data memory, loads data from memory, or simply passes the result on to the next stage.
    * It can forward its result to the EX stage when the instruction in EX depends on the instruction in MEM.
* **WB (Writeback)**
    * Writes the result into the register file.
    * It can forward its result to the EX stage when the instruction in EX depends on the instruction in WB.
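## Issues in a pipelined CPU
In a pipelined CPU we need to pay attention to data dependencies, control instructions, register updates, and memory updates.
1. Dependency
A dependency problem appears when the pipelined CPU meets a RAW (read after write) hazard: an instruction needs a register value that an earlier instruction has not written back yet. The CPU has to forward that value; otherwise it must stall until the write-back, and performance drops sharply. Because this CPU issues and executes instructions in order, other dependencies such as WAR and WAW cannot cause hazards, so we don't need to care about them.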
- example 1 (R-type, I-type)
In this case there are two back-to-back instructions: srl updates register x17, and the following andi needs x17 as a source register. Because the CPU has a forwarding mechanism, andi gets the latest x17 value without inserting a bubble.

- example 2 (lw, R-type)
In this case an srl instruction follows an lw instruction, and srl needs the latest value of x17, which lw loads.

In this cycle we can see a stall (bubble) in front of srl. Why is this stall needed? Because lw has not accessed memory yet in this cycle, the loaded value does not exist yet and cannot be forwarded to srl in the EX stage.

In this cycle srl enters the EX stage while lw is in the WB stage. The loaded x17 value is now available, so the forwarding mechanism supplies it to srl.

- example 3 (control instruction)
First, we can see the instruction `beq x12, x17, 12` in the decode stage while `andi x17, x17, 1` is in the EX stage. Because beq needs the latest x17, it has to wait two cycles. When andi reaches the WB stage, beq can read the latest value and compute whether the branch is taken. If it is taken, the fetch PC changes to the branch target.




2. Control instructions
Control instructions such as j, jal, jr, and jalr are checked in the decode stage. When a jump or branch is taken, the decode stage calculates the target PC and sends it to fetch.
- example
In this cycle beq can start its comparison. If the branch is taken, the next PC is PC + 12; otherwise it is PC + 4.

In the next cycle the decode stage holds a nop because the branch was taken. The PC is now PC + 12, so the addi instruction fetched right after beq is flushed and the sll instruction at the branch target is fetched.

3. Register updates
The examples above already show how a register is updated in the CPU.
4. Memory updates
- example
sw is in the MEM stage. In this cycle register x8 holds 0x7ffffff0 and register x15 holds 0xfffffb50, so we can predict that in the next cycle address 0x7fffffd4 (0x7ffffff0 - 0x1c) will contain 0xfffffb50.


In the next cycle sw is in the WB stage, and address 0x7fffffd4 now holds 0xfffffb50, so the sw instruction has finished its job.



