---
tags: Computer Architecture 2022
---
# Homework1: RISC-V Assembly and Instruction Pipeline
## Problem
[Leetcode 905 Sort Array By Parity](https://leetcode.com/problems/sort-array-by-parity/)
Given an integer array nums, move all the even integers at the beginning of the array followed by all the odd integers.
Return any array that satisfies this condition.
**Example 1 :**
Input: nums = [3,1,2,4]
Output: [2,4,3,1]
Explanation: The outputs [4,2,3,1], [2,4,1,3], and [4,2,1,3] would also be accepted.
**Example 2 :**
Input: nums = [0]
Output: [0]
## Solution
We need to judge whether the number is even or odd. Put even numbers in front of the matrix and odd numbers in the back.
**Step 1 : judge whether the number is even or odd**
we can take the remainder of 2 to judge the number
**Step2 : put the even number in front of the matrix and odd numbers in the back**
we need a pointer to point to the head of array and a pointer point to the back of array. When we put the even number in front of the matrix, the pointer which point at the front of array will point at the next space. When we put the odd number in back of the matrix, the pointer which point at the back of the array will point at the pervious space.
## C Code
```c
int *sortArrayByParity(int *nums, int numsSize, int *returnSize)
{
int *ans = malloc(sizeof(int)*numsSize);
int *front = ans;
int *back = ans+numsSize;
for(int i = 0;i<numsSize;i++)
{
if(nums[i] % 2)
{
*--back = *(nums+i);
}
else
{
*front++ = *(nums+i);
}
}
*returnSize = numsSize;
return ans;
}
```
## Assembly Code
```
.data
nums: .word 22,3,9,2,54
numsSize: .word 5
returnpointer: .word 0x100
.text
main:
la a0,nums # a0 is the array pointer
lw a1,numsSize # a1 is the array size
addi sp,sp,-4 # create a space
sw ra,0(sp) # save the return address to main
jal sortArrayByParity
lw ra,0(sp) # load the return address to main
addi sp,sp,4 # release the space
j exit
sortArrayByParity:
mv s1,a0 # save s1 the array pointer
mv s2,a1 # save s2 the array size
lw a0,returnpointer # a0 is the address of head (the return address of array)
slli t0,s2,2 # array size * sizeof(int)
add t1,a0,t0 # t1 is the address of back
mv t0,a0 # t0 is the address of head
loop:
bge t3,s2,done # t3 : for loop count
lw t4,0(s1) # t4 : load the array element
addi s1,s1,4 # point to the next element in nums array
andi t2,t4,1 # if t2 = 1, that the number is even
beq t2,x0,else
then: # if the number is odd, run then
addi t1,t1,-4 # point to the previous element
sw t4,0(t1) # put the even number head of the array
addi t3,t3,1 # for loop count++
add t2,x0,x0 # clean the space
j end
else: # if the number is odd, run else
sw t4,0(t0) # put even number in the head
addi t0,t0,4 # point to the next element
addi t3,t3,1 # for loop count++
add t2,x0,x0 # clean the space
j end
end:
j loop
done:
jr ra
exit:
li a7, 10 # end the program
ecall
```
:::warning
:warning: Can you use fewer instructions?
:notes: jserv
:::
:::info
This is the method I try to use fewer instructions
- Try to put the repeat instruction which is in label "then" and "else" into "end" label
- delete "j end" in label "else" because it will run label "end" although no instruction "j end"
- don't need to save the ra register in main because we don't use the function in another function
:::
```
.data
nums: .word 22,3,9,2,54
numsSize: .word 5
returnpointer: .word 0x100
.text
main:
la a0,nums # a0 is the array pointer
lw a1,numsSize # a1 is the array size
jal sortArrayByParity
j exit
sortArrayByParity:
mv s1,a0 # save s1 the array pointer
mv s2,a1 # save s2 the array size
lw a0,returnpointer # a0 is the address of head (the return address of array)
slli t0,s2,2 # array size * sizeof(int)
add t1,a0,t0 # t1 is the address of back
mv t0,a0 # t0 is the address of head
loop:
bge t3,s2,done # t3 : for loop count
lw t4,0(s1) # t4 : load the array element
addi s1,s1,4 # point to the next element in nums array
andi t2,t4,1 # if t2 = 1, that the number is even
beq t2,x0,else
then: # if the number is odd, run then
addi t1,t1,-4 # point to the previous element
sw t4,0(t1) # put the even number head of the array
j end
else: # if the number is odd, run else
sw t4,0(t0) # put even number in the head
addi t0,t0,4 # point to the next element
end:
add t2,x0,x0 # clean the space
addi t3,t3,1
j loop # for loop count++
done:
jr ra
exit:
li a7, 10 # end the program
ecall
```
And I fill up the function of printf and three testing data
```
.data
nums1: .word 22,3,9,2,54
nums2: .word 18,7,69,32,5,8,2
nums3: .word 3,2,5
numsSize1: .word 5
numsSize2: .word 7
numsSize3: .word 3
returnpointer: .word 0x100
.text
main:
la a0,nums1 # a0 is the array pointer
lw a1,numsSize1 # a1 is the array size
jal sortArrayByParity
li t0,0 # clean the register
mv t1,a1 # a1 is the array size
mv t2,a0 # a0 is the array pointer which is sort by parity
print_loop:
bge t0,t1,exit # if print all of the element, exit
lw a0,0(t2) # load the array element
jal printf # print a0 to console
addi t0,t0,1 # print loop count++
addi t2,t2,4 # point to next element
j print_loop
sortArrayByParity:
mv s1,a0 # save s1 the array pointer
mv s2,a1 # save s2 the array size
lw a0,returnpointer # a0 is the address of head (the return address of array)
slli t0,s2,2 # array size * sizeof(int)
add t1,a0,t0 # t1 is the address of back
mv t0,a0 # t0 is the address of head
loop:
bge t3,s2,done # t3 : for loop count
lw t4,0(s1) # t4 : load the array element
addi s1,s1,4 # point to the next element in nums array
andi t2,t4,1 # if t2 = 1, that the number is even
beq t2,x0,else
then: # if the number is odd, run then
addi t1,t1,-4 # point to the previous element
sw t4,0(t1) # put the even number head of the array
j end
else: # if the number is odd, run else
sw t4,0(t0) # put even number in the head
addi t0,t0,4 # point to the next element
end:
add t2,x0,x0 # clean the space
addi t3,t3,1 # for loop count++
j loop
done:
jr ra
printf:
li a7,1 # print int
ecall
li a0, 9 # tab
li a7, 11 # print char
ecall
jr ra
exit:
li a7, 10 # end the program
ecall
```
The intput is

And the output is

## Analysis
We use Ripes to simulate RISC-V processor and we can divide the processor into 5 stage

| Stage | Description |
| -------- | -------- |
| IF | Instruction Fetch |
| ID| Instruction Decode & Register Read|
|EX|Execution or Address Calculation|
|MEM|Data Memory Access|
|WB|Write Back|
Then we make a example to explain the function of each stage.
First, we use I-type instruction such as addi instruction to explain how the processor work. For this instruction, it need to do the following five step to complete the instruction
:::info
Ex : addi t0 t1 1
Step 1 : Get the instruction
Step 2 : Parse instruction field
Step 3 : Read data based on parsed operands
Step 4 : Perform operation
Step 5 : Write result to our destination register
:::
### IF

First, IF stage will use the PC to fetch the instruction when the data pass the instruction memory. PC will be plus four to get the next instruction.
:::success
PC is 0 and read the instruction addi t0,t1,1. The machine code of this instruction is 0x00130293. PC will be plus four after instruction be fetch.
:::
### ID

Then, ID stage decode the instruction into rs1, rs2, rd, opcode and transform to registers. If opcode is I-type,it will use Imm to operate the register.
Registers will read the value of rs1 and rs2 register and transform to EX stage to execute the instruction
And, if WB stage transform data to ID stage, it will get the rd register and the value to write register
:::success
Decode the machine code to get the rs1, rd which is 0x06,0x05 respectively.
Because of the I-type, imm will decode the instruction to get the immediate.
And read the rs1 rs2 register value which is 0x0 and 0x0 respectively
:::
### EX

Then, EX stage will excute the operation and use the following table to decide which operation we want to perform. We can see that Func3 and Func7 will choose which operation to perform.
When the operation be done, it will transform the result to MEM stage

:::success
plus the value of rs1 and imm transform to res
:::
### MEM

This stage will input the address and read the data from memory.
Because add instruction don't need to access memory, just transform the data to WB stage.
:::success
Read the value of memory at 0x00000001, but addi don't need this value. It will be unused.
:::
### WB

Finally, we need to write the data to destination register. It will transform data to ID stage to write the data to register
:::success
It will write the 0x01 to the 0x05 register, 0x1302 will be ignore.
:::
### CPU control
Because of many type of instruction, we need control bits to decide which situation we wanted. The below picture show how many bits we need.

- PCSel : decide which address is my next instruction (next instruction or jump label)
- ImmSel : Does this instruction have an immediate
- RegWEn : Does this instruction write to the destination register rd
- BrUn : Does this instruction do a branch. If so, is it unsigned or signed
- BSel : Does this instruction operate on R[rs2] or an immediate
- ASel : Does this instruction operate on R[rs1] or an immediate
- ALUSel : What operation should we perform on the selected operands
- MemRW : Do we want to write to memory? Do we want to read from memory?
- WBSel : Which value do we want to write back to rd
:::success
for the Example : addi t0,t1,1
PCSel : 0
immSel : 1
RegWEn : 1
BrUn : 0
BSel : 1
ASel : 0
ALUSel : addi
MemRW : 0
WBSel : 1
:::
## Pipeline
After the above introduction, we know that there are five stages to process the instruction. In efficiency, use these five stage simultaneously is an efficient way.

In this case, we need to make sure that all the stage will consume the same time, so we need register(IFID, IDEX, EX/MEM, MEM/WB) between two of stage to ensure all the stage will consume the same time.

## Pipeline Hazard
When we use pipeline to process our instruction, there are some problem that will not occur in one cycle process. We will talk about these problem below.
### Structural hazard
When we want to write the data to register and the another instruction want to read the register data. They both need the ID stage.

:::info
Solution :
1. Build the RegFile withc independent read and write port.
2. split RegFile access in two. Prepare to write during 1st half, write on falling edge, read during 2nd half of each clock cycle.
:::
When we want to access memory data in MEM stage, and we want to get the instruction from memory at the same time, like the below picture.

:::info
Solution :
There are two cache between processor and memory. We can independent get the data and the instruction

:::
### Data Hazard
When the data we want to calculate, but the data have not been save yet. Like the below picture.

:::info
Solution :
1. Stalling (To wait for two tcycle)

2. Compiler can arrange code to avoid hazard and stall
3. Forward result as soon as it is available(After ALU), even though it's not stored in RegFile yet (need the hardware support)

:::
When the data we load from memory and have not save to register yet. Like the below picture. We can not use Forward to solve this problem.

:::info
Solution :
1. Put nop instruction between load and use

2. Compiler put an unrelated instruction in that slot

:::
### Control Hazard
When we excute the branch instruction, we need to wait the decision to jump to the label or continue, so we need to wait two instruction to get the decision

:::info
Solution :
1. The Branch decision made in 2nd stage, so only one nop is needed instead of two
2. Branch Prediction :
Predict a result and continue to run next instruction. If corrent, continue run. If wrong, then flush all the pipeline and flip prediction.
:::