--- tags: Computer architecture 2022 --- # Lab1: RV32I Simulator contributed by < [`qwe661234`](https://github.com/qwe661234) > > GitHub: [ComputerArchitectureHW/HW1-LeetCode-203-RISCV](https://github.com/qwe661234/ComputerArchitectureHW/tree/main/HW1-LeetCode-203-RISCV) ## 203. Remove Linked List Elements This problem is based on [LeetCode 203](https://leetcode.com/problems/remove-linked-list-elements/). Given the head of a linked list and an integer variable `val`, remove all the nodes of the linked list that has `Node.val == val`, and return the new head. ### ListNode Structure ```c struct ListNode { int val; struct ListNode *next; }; ``` ## Assembly code ```cpp .data length: .word 7 array: .word 1, 2, 6, 3, 4, 5, 6 target: .word 6 answer: .word 1, 2, 3, 4, 5 err_msg: .string "malloc fail" assert_fail: .string "assert fail" assert_success: .string "assert success" .text main: jal create_default_list mv t1, s1 # store list head to t1 jal print_list mv t1, s1 # store list head to t1 jal remove_elements mv t1, s1 # store list head to t1 jal print_list mv t1, s1 # store list head to t1 jal assert_list exit: addi a7, x0 10 # exit ecall # ------ create_default_list ------ create_default_list: addi sp, sp, 8 li s2, 0 # i = 0 la t1, array # load array la t0, length # load length lw t0, 0(t0) bne s2, t0, create jr ra create: li s0, 0x0fff0000 # address of list head sw ra, 0(sp) # store return address sw s0, 4(sp) # store the address of list head jal ra, malloc # get memory for the next node bne a0, x0, exit_error lw s1, 0(t1) # load array[i] addi t1, t1, 4 # i++ sw s1, 0(s0) # node->value = array[i] sw x0, 4(s0) # node->next = NULL mv t2, s0 # prev = node addi s2, s2, 1 addi s0, s0, 8 beq s2, t0, ret # check end condition in for loop loop: jal ra, malloc # get memory for the next node bne a0, x0, exit_error lw s1, 0(t1) # load array[i] addi t1, t1, 4 # i++ sw s1, 0(s0) # node->value = array[i] sw x0, 4(s0) # node->next = NULL sw s0, 4(t2) # prev->next = node mv t2, s0 # prev = node addi s2, s2, 1 addi s0, s0, 8 bne s2, t0, loop # check end condition in for loop ret: lw ra, 0(sp) # load return address lw s1, 4(sp) # load the address of list head addi sp, sp, 8 jr ra # ------ malloc ------ malloc: # allocates 8 bytes on the heap, returns pointer to start in a0 addi a0, s0, 8 # move heap limit to current + 8, 4 byte for integer, 4 byte for pointer addi a7, x0, 214 # brk systemcall ecall jr ra # ------ remove_elements ------ remove_elements: la s0, target lw s0, 0(s0) iterate: beq t1, x0, end_iterate lw t2, 0(t1) # load value store in node beq t2, s0, remove mv t3, t1 # prev = cur lw t1, 4(t1) # load address store in node->next j iterate remove: beq t1, s1, remove_head # if cur == head lw t1, 4(t1) # load cur->next to t0 sw t1, 4(t3) # prev->next = cur->next j iterate remove_head: lw t1, 4(t1) # load address store in cur->next mv s1, t1 # head = head->next j iterate end_iterate: jr ra # ------ print_list ------ print_list: beq t1, x0, end lw a0, 0(t1) # load value store in node addi a7, x0, 1 # print number ecall lw t1, 4(t1) # load address store in node->next j print_list end: addi a0, x0, 10 # '\n' acsii number addi a7, x0, 11 # print char ecall jr ra exit_error: la a0, err_msg addi a7, x0 4 # print error message ecall addi a7, x0 10 # exit ecall # ------ assert_list ------ assert_list: la t0, answer assert_loop: beq t1, x0, end_aseert lw a0, 0(t1) # load value store in node lw a1, 0(t0) # load value store in answer[i] lw t1, 4(t1) # load address store in node->next addi t0, t0, 4 # i++ beq a0, a1 assert_loop la a0, assert_fail addi a7, x0 4 # print message ecall jr ra end_aseert: la a0, assert_success addi a7, x0 4 # print message ecall jr ra ``` :::warning :warning: Can you use fewer instructions? :notes: jserv ::: ## Interpret the assembly code ### malloc #### C code ```c struct ListNode *node = malloc(sizeof(struct ListNode)); ``` #### Assembly code ```c # ------ malloc ------ malloc: # allocates 8 bytes on the heap, returns pointer to start in a0 addi a0, s0, 8 # move heap limit to current + 8, 4 byte for integer, 4 byte for pointer addi a7, x0, 214 # brk systemcall ecall jr ra ``` register `s0` represent the origin heap limit address, register`a0` represent the new heap limit address. We want to allocate 8 bytes for ListNode, 4 bytes for `node->value` and 4 bytes for `node->next`, so we add register `s0` with 8 and store the result in register `a0`. Then we use `brk` systemcall to set new heap limit address to address stored in register `a0`. It is more convient to use `sbrk` systemcall to implement malloc, but Ripes only support `brk` systemcall. ### create_default_list #### C code ```c struct ListNode *head = NULL, *cur = head; int array[7] = {1, 2, 6, 3, 4, 5, 6} int length = 7; for (int i = 0; i < len; i++) { struct ListNode *node = malloc(sizeof(struct ListNode)); node->val = array[i]; node->next = NULL; if (!head) head = node; else { cur->next = node; cur = cur->next; } } ``` #### Assembly code ```c create_default_list: addi sp, sp, 8 li s2, 0 # i = 0 la t1, array # load array la t0, length # load length lw t0, 0(t0) bne s2, t0, create # if length == 0, return jr ra ``` First, we store list head address and return address in stack and load array in data section. ```c create: li s0, 0x0fff0000 # address of list head sw ra, 0(sp) # store return address sw s0, 4(sp) # store the address of list head jal ra, malloc # get memory for the next node bne a0, x0, exit_error lw s1, 0(t1) # load array[i] addi t1, t1, 4 # i++ sw s1, 0(s0) # node->value = array[i] sw x0, 4(s0) # node->next = NULL mv t2, s0 # prev = node addi s2, s2, 1 addi s0, s0, 8 beq s2, t0, ret # check end condition in for loop ``` Two instruction `sw ra, 0(sp)` and `sw s0, 4(sp)` store return address and the address of list head (`0x0fff0000`) in stack. This label is for `head == NULL`, because head is NULL when `i = 0`. Instruction `sw s1, 0(s0)` store arr[0] to `node->val`, instruction `sw x0, 4(s0)` set `node->next` to NULL and instuction `mv t2, s0 ` record previous node. Instruction `addi t1, t1, 4` means i++. ```c loop: jal ra, malloc # get memory for the next node bne a0, x0, exit_error lw s1, 0(t1) # load array[i] addi t1, t1, 4 # i++ sw s1, 0(s0) # node->value = array[i] sw x0, 4(s0) # node->next = NULL sw s0, 4(t2) # prev->next = node mv t2, s0 # prev = node addi s2, s2, 1 addi s0, s0, 8 bne s2, t0, loop # check end condition in for loop ``` Only one instruction in this label `loop` is different from label `create`, `sw s0, 4(t2)`, this instruction store `node` to `previous_node->next`. ### remove_elements #### C code ```c int target = 1; struct ListNode *cur = head, *prev = NULL; while (cur) { if (cur->val == target) { if (cur == head) { head = head->next; cur = head; continue; } else { prev->next = cur->next; } } prev = cur; cur = cur->next; } ``` #### Assembly code ```c iterate: beq t1, x0, end_iterate lw t2, 0(t1) # load value store in node beq t2, s0, remove mv t3, t1 # prev = cur lw t1, 4(t1) # load address store in node->next j iterate ``` Instruction `beq t2, s0` is to check if `node->val == target`. ```c remove: beq t1, s1, remove_head # if cur == head lw t1, 4(t1) # load cur->next to t0 sw t1, 4(t3) # prev->next = cur->next j iterate ``` Instruction `beq t1, s1` is to check if `current_node == head`. This label is to deal with remove node from list. Register `t1` store current node address and register `t3` store previous node address. ```c remove_head: lw t1, 4(t1) # load address store in cur->next mv s1, t1 # head = head->next j iterate ``` This label is to deal with remove head from list. Register `t1` store current node address, instruction `lw t1, 4(t1)` means we load `head->next` and instruction `mv s1, t1` means we store `head->next` to `head`. ### assert_list #### C code ```clike void assert_list(int *answer, ListNode *head) { ListNode *cur = head; int count = 0; while (cur) { if (cur->val != answer[count++]) { printf("assert fail"); return; } cur = cur->next; } printf("assert success"); } ``` #### Assembly code ```cpp assert_list: la t0, answer assert_loop: beq t1, x0, end_aseert lw a0, 0(t1) # load value store in node lw a1, 0(t0) # load value store in answer[i] lw t1, 4(t1) # load address store in node->next addi t0, t0, 4 # i++ beq a0, a1 assert_loop la a0, assert_fail addi a7, x0 4 # print message ecall jr ra end_aseert: la a0, assert_success addi a7, x0 4 # print message ecall jr ra ``` This function is for testing results automatically, we load answers which we has defined and value store in list node, then we assert whether these two values are equal or not. ## Result The original linked list is `1->2->6->3->4->5->6`. After removing target elements in linked list, the linked list would be `1->2->3->4->5`. ``` 1263456 12345 assert success Program exited with code: 0 ``` ## Reduce instruction usage ### create_default_list If we create list reversely(from last node to first node), we can just store `current_node` to `pervious_node->next`, instead of set `current_node->next = NULL`. Also, we don't need to care about this creating is for head or node. Therefore, we can save instruction `sw x0, 4(s0) # node->next = NULL` and merge label `create` to label `loop`. But, we should reverse the arrange of array in data section, because you create list reversely. #### assembly code after modify ```cpp create_default_list: addi sp, sp, 8 li s0, 0x0fff0000 # address of list head sw ra, 0(sp) # store return address sw s0, 4(sp) # store the address of list head li s2, 0 # i = 0 la t1, array # load array la t0, length # load length lw t0, 0(t0) bne s2, t0, loop jr ra loop: jal ra, malloc # get memory for the next node bne a0, x0, exit_error lw s1, 0(t1) # load array[i] addi t1, t1, 4 # i++ sw s1, 0(s0) # node->value = array[i] sw t2, 4(s0) # prev->next = node mv t2, s0 # prev = node addi s2, s2, 1 addi s0, s0, 8 bne s2, t0, loop # check end condition in for loop ret: addi s1, s0, -8 lw ra, 0(sp) # load return address addi sp, sp, 8 jr ra ``` ## Pipeline feature in Ripes `Instruction pipeline` is a technique for implementing instruction-level parallelism within a single processor. We partition an instruction to five stages. | Stage |Full name | | -------- | -------- | | IF | Instruciton Fetch | |ID | Instruction decode & read register| | EX |Execution or Address Calculation | | MEM | Data memory access (load data from memory) | | WB | write back (write data back to register) | Also, we have four `pipeline register` to store the data of pervious stage. Next stage would load data from `pipeline register` and store new data to next `pipeline register`. ![](https://i.imgur.com/c9oQBbq.png) ### Pipeline stage We take instruction `lw t0, 4(t1)` as example #### IF In this stage, load the instruction which address is PC, and store PC + 4 to IF/ID pipeline register. ![](https://i.imgur.com/IGjKOZ6.png) #### ID In this stage, decode the instruction loaded from IF/ID pipeline register. the instruction is `lw t0, 4(t1)`, so `R1` is register `t1`(x6) , `R2` in this instruction is useless, `imm` is 4 and register `t0` (x5) would store in ID/EX pipeline register. Reg 1 means the address store in register `t1` (x6), here is 0x00000020. ![](https://i.imgur.com/T5eGd54.png) #### EX In this stage, we pass immediate and the address store in register `t1` (x6) to ALU, ALU will execute add operation to calculate right memory address that we want to access value. Here is 0x00000020 + 4 (immediate) = 0x00000024. ![](https://i.imgur.com/IJbFz9s.png) #### MEM In this stage, we will access data store in memory address that ALU calculate in previous stage. ![](https://i.imgur.com/fUoVP2B.png) #### WB In last stage, we will write data which we access in pervious stage back to the target register. We can notice the below image, red line represent the target register number, and blue line represent the value that access from memory. ![](https://i.imgur.com/CVHaiPY.png) ## Pipeline Hazard The pipeline design might encounter three types of hazard ### 1. Structure hazards A required resource is busy, and instruction need to wait until the resource is ready. #### Solution * Load/store and Instruction fetch both require memory access Separating instruction memory and data memory or separating instruction cache and data cache. ### 2. Data hazard An instruction depends on completion of data access by a previous instruction. There are two scenarioes for data hazard #### RAW(Read After Write) There is an example of RAW, before the first instruction `add s0, t0, t1` write back data to register`x8`, the second instruction `sub t2, s0, t3` has read data from register `x8`. In this Image, `Ripe` use forwarding to avoid data hazard. ![](https://i.imgur.com/g2lucvG.png) #### Load-use There is an example of RAW, before the first instruction `lw s0, 20(t1)` access memory and write back data to register`s0`, the second instruction `sub t2, s0, t3` has read data from register `s0`. In this Image, `Ripe` use stall and forwarding to avoid data hazard. ![](https://i.imgur.com/P8VubLT.png) #### Solution there are three solution for data hazard: * Reodering code: reorder code to avoid use of load result in the next instruction. * Stall: add no operation instruction to wait data write back. * Forwarding: bypass the result to next instruction, this desgin requires extra connections in the datapath. ![](https://i.imgur.com/XvwrJFl.png) But, we still need a stall to avoid data hazard in load-use case. ![](https://i.imgur.com/zVgUc1E.png) ### 3. Control hazard Branch determines flow of control, and fetching next instruction depends on branch outcome. #### Solution Result of branch is known at the EX stage. * Stall: wait until the branch determined, then fetching next instruction. * Branch Prediction: * Correct prediction: no stall required. * Incorrect prediction: clean incorrect instruction and stall one cycle to wait previous instruction finish EX stage. ## Reference * https://moodle.ncku.edu.tw/pluginfile.php/345826/mod_resource/content/2/Ch4_Processor_single.pdf * https://moodle.ncku.edu.tw/pluginfile.php/345827/mod_resource/content/1/Ch4_Processor-pipe1.pdf