---
tags: Computer architecture 2022
---
# Lab1: RV32I Simulator
contributed by < [`qwe661234`](https://github.com/qwe661234) >
> GitHub: [ComputerArchitectureHW/HW1-LeetCode-203-RISCV](https://github.com/qwe661234/ComputerArchitectureHW/tree/main/HW1-LeetCode-203-RISCV)
## 203. Remove Linked List Elements
This problem is based on [LeetCode 203](https://leetcode.com/problems/remove-linked-list-elements/). Given the head of a linked list and an integer variable `val`, remove all the nodes of the linked list that has `Node.val == val`, and return the new head.
### ListNode Structure
```c
struct ListNode {
int val;
struct ListNode *next;
};
```
## Assembly code
```cpp
.data
length:
.word 7
array:
.word 1, 2, 6, 3, 4, 5, 6
target:
.word 6
answer:
.word 1, 2, 3, 4, 5
err_msg:
.string "malloc fail"
assert_fail:
.string "assert fail"
assert_success:
.string "assert success"
.text
main:
jal create_default_list
mv t1, s1 # store list head to t1
jal print_list
mv t1, s1 # store list head to t1
jal remove_elements
mv t1, s1 # store list head to t1
jal print_list
mv t1, s1 # store list head to t1
jal assert_list
exit:
addi a7, x0 10 # exit
ecall
# ------ create_default_list ------
create_default_list:
addi sp, sp, 8
li s2, 0 # i = 0
la t1, array # load array
la t0, length # load length
lw t0, 0(t0)
bne s2, t0, create
jr ra
create:
li s0, 0x0fff0000 # address of list head
sw ra, 0(sp) # store return address
sw s0, 4(sp) # store the address of list head
jal ra, malloc # get memory for the next node
bne a0, x0, exit_error
lw s1, 0(t1) # load array[i]
addi t1, t1, 4 # i++
sw s1, 0(s0) # node->value = array[i]
sw x0, 4(s0) # node->next = NULL
mv t2, s0 # prev = node
addi s2, s2, 1
addi s0, s0, 8
beq s2, t0, ret # check end condition in for loop
loop:
jal ra, malloc # get memory for the next node
bne a0, x0, exit_error
lw s1, 0(t1) # load array[i]
addi t1, t1, 4 # i++
sw s1, 0(s0) # node->value = array[i]
sw x0, 4(s0) # node->next = NULL
sw s0, 4(t2) # prev->next = node
mv t2, s0 # prev = node
addi s2, s2, 1
addi s0, s0, 8
bne s2, t0, loop # check end condition in for loop
ret:
lw ra, 0(sp) # load return address
lw s1, 4(sp) # load the address of list head
addi sp, sp, 8
jr ra
# ------ malloc ------
malloc: # allocates 8 bytes on the heap, returns pointer to start in a0
addi a0, s0, 8 # move heap limit to current + 8, 4 byte for integer, 4 byte for pointer
addi a7, x0, 214 # brk systemcall
ecall
jr ra
# ------ remove_elements ------
remove_elements:
la s0, target
lw s0, 0(s0)
iterate:
beq t1, x0, end_iterate
lw t2, 0(t1) # load value store in node
beq t2, s0, remove
mv t3, t1 # prev = cur
lw t1, 4(t1) # load address store in node->next
j iterate
remove:
beq t1, s1, remove_head # if cur == head
lw t1, 4(t1) # load cur->next to t0
sw t1, 4(t3) # prev->next = cur->next
j iterate
remove_head:
lw t1, 4(t1) # load address store in cur->next
mv s1, t1 # head = head->next
j iterate
end_iterate:
jr ra
# ------ print_list ------
print_list:
beq t1, x0, end
lw a0, 0(t1) # load value store in node
addi a7, x0, 1 # print number
ecall
lw t1, 4(t1) # load address store in node->next
j print_list
end:
addi a0, x0, 10 # '\n' acsii number
addi a7, x0, 11 # print char
ecall
jr ra
exit_error:
la a0, err_msg
addi a7, x0 4 # print error message
ecall
addi a7, x0 10 # exit
ecall
# ------ assert_list ------
assert_list:
la t0, answer
assert_loop:
beq t1, x0, end_aseert
lw a0, 0(t1) # load value store in node
lw a1, 0(t0) # load value store in answer[i]
lw t1, 4(t1) # load address store in node->next
addi t0, t0, 4 # i++
beq a0, a1 assert_loop
la a0, assert_fail
addi a7, x0 4 # print message
ecall
jr ra
end_aseert:
la a0, assert_success
addi a7, x0 4 # print message
ecall
jr ra
```
:::warning
:warning: Can you use fewer instructions?
:notes: jserv
:::
## Interpret the assembly code
### malloc
#### C code
```c
struct ListNode *node = malloc(sizeof(struct ListNode));
```
#### Assembly code
```c
# ------ malloc ------
malloc: # allocates 8 bytes on the heap, returns pointer to start in a0
addi a0, s0, 8 # move heap limit to current + 8, 4 byte for integer, 4 byte for pointer
addi a7, x0, 214 # brk systemcall
ecall
jr ra
```
register `s0` represent the origin heap limit address, register`a0` represent the new heap limit address. We want to allocate 8 bytes for ListNode, 4 bytes for `node->value` and 4 bytes for `node->next`, so we add register `s0` with 8 and store the result in register `a0`. Then we use `brk` systemcall to set new heap limit address to address stored in register `a0`.
It is more convient to use `sbrk` systemcall to implement malloc, but Ripes only support `brk` systemcall.
### create_default_list
#### C code
```c
struct ListNode *head = NULL, *cur = head;
int array[7] = {1, 2, 6, 3, 4, 5, 6}
int length = 7;
for (int i = 0; i < len; i++) {
struct ListNode *node = malloc(sizeof(struct ListNode));
node->val = array[i];
node->next = NULL;
if (!head)
head = node;
else {
cur->next = node;
cur = cur->next;
}
}
```
#### Assembly code
```c
create_default_list:
addi sp, sp, 8
li s2, 0 # i = 0
la t1, array # load array
la t0, length # load length
lw t0, 0(t0)
bne s2, t0, create # if length == 0, return
jr ra
```
First, we store list head address and return address in stack and load array in data section.
```c
create:
li s0, 0x0fff0000 # address of list head
sw ra, 0(sp) # store return address
sw s0, 4(sp) # store the address of list head
jal ra, malloc # get memory for the next node
bne a0, x0, exit_error
lw s1, 0(t1) # load array[i]
addi t1, t1, 4 # i++
sw s1, 0(s0) # node->value = array[i]
sw x0, 4(s0) # node->next = NULL
mv t2, s0 # prev = node
addi s2, s2, 1
addi s0, s0, 8
beq s2, t0, ret # check end condition in for loop
```
Two instruction `sw ra, 0(sp)` and `sw s0, 4(sp)` store return address and the address of list head (`0x0fff0000`) in stack.
This label is for `head == NULL`, because head is NULL when `i = 0`. Instruction `sw s1, 0(s0)` store arr[0] to `node->val`, instruction `sw x0, 4(s0)` set `node->next` to NULL and instuction `mv t2, s0 ` record previous node.
Instruction `addi t1, t1, 4` means i++.
```c
loop:
jal ra, malloc # get memory for the next node
bne a0, x0, exit_error
lw s1, 0(t1) # load array[i]
addi t1, t1, 4 # i++
sw s1, 0(s0) # node->value = array[i]
sw x0, 4(s0) # node->next = NULL
sw s0, 4(t2) # prev->next = node
mv t2, s0 # prev = node
addi s2, s2, 1
addi s0, s0, 8
bne s2, t0, loop # check end condition in for loop
```
Only one instruction in this label `loop` is different from label `create`, `sw s0, 4(t2)`, this instruction store `node` to `previous_node->next`.
### remove_elements
#### C code
```c
int target = 1;
struct ListNode *cur = head, *prev = NULL;
while (cur) {
if (cur->val == target) {
if (cur == head) {
head = head->next;
cur = head;
continue;
} else {
prev->next = cur->next;
}
}
prev = cur;
cur = cur->next;
}
```
#### Assembly code
```c
iterate:
beq t1, x0, end_iterate
lw t2, 0(t1) # load value store in node
beq t2, s0, remove
mv t3, t1 # prev = cur
lw t1, 4(t1) # load address store in node->next
j iterate
```
Instruction `beq t2, s0` is to check if `node->val == target`.
```c
remove:
beq t1, s1, remove_head # if cur == head
lw t1, 4(t1) # load cur->next to t0
sw t1, 4(t3) # prev->next = cur->next
j iterate
```
Instruction `beq t1, s1` is to check if `current_node == head`.
This label is to deal with remove node from list. Register `t1` store current node address and register `t3` store previous node address.
```c
remove_head:
lw t1, 4(t1) # load address store in cur->next
mv s1, t1 # head = head->next
j iterate
```
This label is to deal with remove head from list. Register `t1` store current node address, instruction `lw t1, 4(t1)` means we load `head->next` and instruction `mv s1, t1` means we store `head->next` to `head`.
### assert_list
#### C code
```clike
void assert_list(int *answer, ListNode *head) {
ListNode *cur = head;
int count = 0;
while (cur) {
if (cur->val != answer[count++]) {
printf("assert fail");
return;
}
cur = cur->next;
}
printf("assert success");
}
```
#### Assembly code
```cpp
assert_list:
la t0, answer
assert_loop:
beq t1, x0, end_aseert
lw a0, 0(t1) # load value store in node
lw a1, 0(t0) # load value store in answer[i]
lw t1, 4(t1) # load address store in node->next
addi t0, t0, 4 # i++
beq a0, a1 assert_loop
la a0, assert_fail
addi a7, x0 4 # print message
ecall
jr ra
end_aseert:
la a0, assert_success
addi a7, x0 4 # print message
ecall
jr ra
```
This function is for testing results automatically, we load answers which we has defined and value store in list node, then we assert whether these two values are equal or not.
## Result
The original linked list is `1->2->6->3->4->5->6`. After removing target elements in linked list, the linked list would be `1->2->3->4->5`.
```
1263456
12345
assert success
Program exited with code: 0
```
## Reduce instruction usage
### create_default_list
If we create list reversely(from last node to first node), we can just store `current_node` to `pervious_node->next`, instead of set `current_node->next = NULL`. Also, we don't need to care about this creating is for head or node. Therefore, we can save instruction `sw x0, 4(s0) # node->next = NULL` and merge label `create` to label `loop`. But, we should reverse the arrange of array in data section, because you create list reversely.
#### assembly code after modify
```cpp
create_default_list:
addi sp, sp, 8
li s0, 0x0fff0000 # address of list head
sw ra, 0(sp) # store return address
sw s0, 4(sp) # store the address of list head
li s2, 0 # i = 0
la t1, array # load array
la t0, length # load length
lw t0, 0(t0)
bne s2, t0, loop
jr ra
loop:
jal ra, malloc # get memory for the next node
bne a0, x0, exit_error
lw s1, 0(t1) # load array[i]
addi t1, t1, 4 # i++
sw s1, 0(s0) # node->value = array[i]
sw t2, 4(s0) # prev->next = node
mv t2, s0 # prev = node
addi s2, s2, 1
addi s0, s0, 8
bne s2, t0, loop # check end condition in for loop
ret:
addi s1, s0, -8
lw ra, 0(sp) # load return address
addi sp, sp, 8
jr ra
```
## Pipeline feature in Ripes
`Instruction pipeline` is a technique for implementing instruction-level parallelism within a single processor. We partition an instruction to five stages.
| Stage |Full name |
| -------- | -------- |
| IF | Instruciton Fetch |
|ID | Instruction decode & read register|
| EX |Execution or Address Calculation |
| MEM | Data memory access (load data from memory) |
| WB | write back (write data back to register) |
Also, we have four `pipeline register` to store the data of pervious stage. Next stage would load data from `pipeline register` and store new data to next `pipeline register`.

### Pipeline stage
We take instruction `lw t0, 4(t1)` as example
#### IF
In this stage, load the instruction which address is PC, and store PC + 4 to IF/ID pipeline register.

#### ID
In this stage, decode the instruction loaded from IF/ID pipeline register. the instruction is `lw t0, 4(t1)`, so `R1` is register `t1`(x6) , `R2` in this instruction is useless, `imm` is 4 and register `t0` (x5) would store in ID/EX pipeline register.
Reg 1 means the address store in register `t1` (x6), here is 0x00000020.

#### EX
In this stage, we pass immediate and the address store in register `t1` (x6) to ALU, ALU will execute add operation to calculate right memory address that we want to access value.
Here is 0x00000020 + 4 (immediate) = 0x00000024.

#### MEM
In this stage, we will access data store in memory address that ALU calculate in previous stage.

#### WB
In last stage, we will write data which we access in pervious stage back to the target register.
We can notice the below image, red line represent the target register number, and blue line represent the value that access from memory.

## Pipeline Hazard
The pipeline design might encounter three types of hazard
### 1. Structure hazards
A required resource is busy, and instruction need to wait until the resource is ready.
#### Solution
* Load/store and Instruction fetch both require memory access
Separating instruction memory and data memory or separating instruction cache and data cache.
### 2. Data hazard
An instruction depends on completion of data access by a previous instruction.
There are two scenarioes for data hazard
#### RAW(Read After Write)
There is an example of RAW, before the first instruction `add s0, t0, t1` write back data to register`x8`, the second instruction `sub t2, s0, t3` has read data from register `x8`.
In this Image, `Ripe` use forwarding to avoid data hazard.

#### Load-use
There is an example of RAW, before the first instruction `lw s0, 20(t1)` access memory and write back data to register`s0`, the second instruction `sub t2, s0, t3` has read data from register `s0`.
In this Image, `Ripe` use stall and forwarding to avoid data hazard.

#### Solution
there are three solution for data hazard:
* Reodering code: reorder code to avoid use of load result in the next instruction.
* Stall: add no operation instruction to wait data write back.
* Forwarding: bypass the result to next instruction, this desgin requires extra connections in the datapath.

But, we still need a stall to avoid data hazard in load-use case.

### 3. Control hazard
Branch determines flow of control, and fetching next instruction depends on branch outcome.
#### Solution
Result of branch is known at the EX stage.
* Stall: wait until the branch determined, then fetching next instruction.
* Branch Prediction:
* Correct prediction: no stall required.
* Incorrect prediction: clean incorrect instruction and stall one cycle to wait previous instruction finish EX stage.
## Reference
* https://moodle.ncku.edu.tw/pluginfile.php/345826/mod_resource/content/2/Ch4_Processor_single.pdf
* https://moodle.ncku.edu.tw/pluginfile.php/345827/mod_resource/content/1/Ch4_Processor-pipe1.pdf