# Homework2: RISC-V Toolchain
###### tags: `2022 computer architecture`
Reference: [陳彥甫](https://hackmd.io/@qwe661234/H1J9kc6Ws) [Leetcode1234](https://leetcode.com/problems/remove-linked-list-elements)
## Original code - Remove Linked List Elements
* I choose the Problem **Remove Linked List Elements** since linked-list is an ubiquitous topic in computer science so I wonder how it works in assembly. Also, I find some mistakes in his either c code or assembly. Hence, I would like to correct it and try to speedup the execution with optimizations.
* c++ code rewritten from [陳彥甫](https://hackmd.io/@qwe661234/H1J9kc6Ws)
```cpp=
ListNode* removeElements(ListNode* head, int val) {
ListNode *cur = head, *prev = NULL;
while (cur) {
if (cur->val == val) {
if (cur == head) {
head = head->next;
cur = head;
continue;
} else {
prev->next = cur->next;
}
}
prev = cur; //this shouldn't be run if cur removing !
cur = cur->next;
}
return head;
}
```
The problem is:
If the given list is ```[1,2,2,1]``` and the target value is ```2```. The expected output is ```[1,1]```, but the prgram returns ```[1,2,1]```, as the first node points to third node when removing second one. But third node updates the ```next``` of the already-removed second node not that of the first node.
* Assembly
```
create_default_list:
addi sp, sp, 8
li s0, 0x0fff0000 # address of list head
sw ra, 0(sp) # store return address
sw s0, 4(sp) # store the address of list head
li s2, 0 # i = 0
la t1, array # load array
la t0, length # load length
lw t0, 0(t0)
bne s2, t0, loop
jr ra
loop:
jal ra, malloc # get memory for the next node
bne a0, x0, exit_error #!!!
...
# allocates 8 bytes on the heap, returns pointer to start in a0
malloc:
addi a0, s0, 8
addi a7, x0, 214 # brk systemcall
ecall
jr ra
```
The malloc returns the start pointer and because the linked-list exists in the heap, ```a0``` is never smaller than 0x0fff0000. Therefore, the code will jump to ```exit_error``` even if node is successfully created.
## Optimization
* I optimize the c++ program by seperating a while loop to check if it's the head, and if not then the following don't need to check any more:
```cpp=
ListNode* removeElements(ListNode* head, int val) {
while(head && head->val == val)
head = head->next;
if(head) {
ListNode *p = head, *q = head->next;
while(q) {
if(q->val == val)
p->next = q->next;
else
p = p->next;
q = q->next;
}
}
return head;
}
```
* The improved assembly code:
```
# ------ remove_elements ------
remove_elements:
la s0, target
lw s0, 0(s0) # s0 = target
head_while:
beq t1, x0, end_iterate # head == null
lw t2, 0(t1)
lw t3, 4(t1) # pre-load head.next
bne t2, s0, if_head # head->val!=val
mv t1, t3
j head_while
if_head:
beq t1, x0, end_iterate
mv t3, t1 # p = head
lw t4, 4(t1) # q = head->next
while_q:
beq t4, x0, end_iterate
lw t2, 0(t4) # load q.val
lw t5, 4(t4) # pre-load q.next
lw t6, 4(t3) # pre-load p.next
mv t4, t5 # q = q.next
beq t2, s0, remove
mv t3, t6 # p = p.next
j while_q
remove:
sw t5, 4(t3) # p.next = q.next
j while_q
end_iterate:
mv s1, t1 # push head back
jr ra
```
### Using RISC-V GCC

#### Different optimization result:
::: spoiler -O0 assembly code
```
00010184 <removeElements>:
10184: fd010113 addi sp,sp,-48
10188: 02812623 sw s0,44(sp)
1018c: 03010413 addi s0,sp,48
10190: fca42e23 sw a0,-36(s0)
10194: fcb42c23 sw a1,-40(s0)
10198: 0100006f j 101a8 <removeElements+0x24>
1019c: fdc42783 lw a5,-36(s0)
101a0: 0047a783 lw a5,4(a5)
101a4: fcf42e23 sw a5,-36(s0)
101a8: fdc42783 lw a5,-36(s0)
101ac: 00078a63 beqz a5,101c0 <removeElements+0x3c>
101b0: fdc42783 lw a5,-36(s0)
101b4: 0007a783 lw a5,0(a5)
101b8: fd842703 lw a4,-40(s0)
101bc: fef700e3 beq a4,a5,1019c <removeElements+0x18>
101c0: fdc42783 lw a5,-36(s0)
101c4: 06078063 beqz a5,10224 <removeElements+0xa0>
101c8: fdc42783 lw a5,-36(s0)
101cc: fef42623 sw a5,-20(s0)
101d0: fdc42783 lw a5,-36(s0)
101d4: 0047a783 lw a5,4(a5)
101d8: fef42423 sw a5,-24(s0)
101dc: 0400006f j 1021c <removeElements+0x98>
101e0: fe842783 lw a5,-24(s0)
101e4: 0007a783 lw a5,0(a5)
101e8: fd842703 lw a4,-40(s0)
101ec: 00f71c63 bne a4,a5,10204 <removeElements+0x80>
101f0: fe842783 lw a5,-24(s0)
101f4: 0047a703 lw a4,4(a5)
101f8: fec42783 lw a5,-20(s0)
101fc: 00e7a223 sw a4,4(a5)
10200: 0100006f j 10210 <removeElements+0x8c>
10204: fec42783 lw a5,-20(s0)
10208: 0047a783 lw a5,4(a5)
1020c: fef42623 sw a5,-20(s0)
10210: fe842783 lw a5,-24(s0)
10214: 0047a783 lw a5,4(a5)
10218: fef42423 sw a5,-24(s0)
1021c: fe842783 lw a5,-24(s0)
10220: fc0790e3 bnez a5,101e0 <removeElements+0x5c>
10224: fdc42783 lw a5,-36(s0)
10228: 00078513 mv a0,a5
1022c: 02c12403 lw s0,44(sp)
10230: 03010113 addi sp,sp,48
10234: 00008067 ret
00010238 <main>:
10238: fc010113 addi sp,sp,-64
1023c: 02112e23 sw ra,60(sp)
10240: 02812c23 sw s0,56(sp)
10244: 04010413 addi s0,sp,64
10248: fe042623 sw zero,-20(s0)
1024c: fec42783 lw a5,-20(s0)
10250: fef42423 sw a5,-24(s0)
10254: 000217b7 lui a5,0x21
10258: 7f478793 addi a5,a5,2036 # 217f4 <__clzsi2+0x8c>
1025c: 0007a803 lw a6,0(a5)
10260: 0047a503 lw a0,4(a5)
10264: 0087a583 lw a1,8(a5)
10268: 00c7a603 lw a2,12(a5)
1026c: 0107a683 lw a3,16(a5)
10270: 0147a703 lw a4,20(a5)
10274: 0187a783 lw a5,24(a5)
10278: fd042023 sw a6,-64(s0)
1027c: fca42223 sw a0,-60(s0)
10280: fcb42423 sw a1,-56(s0)
10284: fcc42623 sw a2,-52(s0)
10288: fcd42823 sw a3,-48(s0)
1028c: fce42a23 sw a4,-44(s0)
10290: fcf42c23 sw a5,-40(s0)
10294: 00700793 li a5,7
10298: fef42023 sw a5,-32(s0)
1029c: fe042223 sw zero,-28(s0)
102a0: 0700006f j 10310 <main+0xd8>
102a4: 00800513 li a0,8
102a8: 1d0000ef jal ra,10478 <malloc>
102ac: 00050793 mv a5,a0
102b0: fcf42e23 sw a5,-36(s0)
102b4: fe442783 lw a5,-28(s0)
102b8: 00279793 slli a5,a5,0x2
102bc: ff078793 addi a5,a5,-16
102c0: 008787b3 add a5,a5,s0
102c4: fd07a703 lw a4,-48(a5)
102c8: fdc42783 lw a5,-36(s0)
102cc: 00e7a023 sw a4,0(a5)
102d0: fdc42783 lw a5,-36(s0)
102d4: 0007a223 sw zero,4(a5)
102d8: fec42783 lw a5,-20(s0)
102dc: 00079863 bnez a5,102ec <main+0xb4>
102e0: fdc42783 lw a5,-36(s0)
102e4: fef42623 sw a5,-20(s0)
102e8: 01c0006f j 10304 <main+0xcc>
102ec: fe842783 lw a5,-24(s0)
102f0: fdc42703 lw a4,-36(s0)
102f4: 00e7a223 sw a4,4(a5)
102f8: fe842783 lw a5,-24(s0)
102fc: 0047a783 lw a5,4(a5)
10300: fef42423 sw a5,-24(s0)
10304: fe442783 lw a5,-28(s0)
10308: 00178793 addi a5,a5,1
1030c: fef42223 sw a5,-28(s0)
10310: fe442703 lw a4,-28(s0)
10314: fe042783 lw a5,-32(s0)
10318: f8f746e3 blt a4,a5,102a4 <main+0x6c>
1031c: 00600593 li a1,6
10320: fec42503 lw a0,-20(s0)
10324: e61ff0ef jal ra,10184 <removeElements>
10328: fea42623 sw a0,-20(s0)
1032c: 0280006f j 10354 <main+0x11c>
10330: fec42783 lw a5,-20(s0)
10334: 0007a783 lw a5,0(a5)
10338: 00078593 mv a1,a5
1033c: 000217b7 lui a5,0x21
10340: 7f078513 addi a0,a5,2032 # 217f0 <__clzsi2+0x88>
10344: 1f9000ef jal ra,10d3c <printf>
10348: fec42783 lw a5,-20(s0)
1034c: 0047a783 lw a5,4(a5)
10350: fef42623 sw a5,-20(s0)
10354: fec42783 lw a5,-20(s0)
10358: fc078ce3 beqz a5,10330 <main+0xf8>
1035c: 00000793 li a5,0
10360: 00078513 mv a0,a5
10364: 03c12083 lw ra,60(sp)
10368: 03812403 lw s0,56(sp)
1036c: 04010113 addi sp,sp,64
10370: 00008067 ret
```
:::
::: spoiler -O1 assembly code
```
00010184 <removeElements>:
10184: 00050c63 beqz a0,1019c <removeElements+0x18>
10188: 00052783 lw a5,0(a0)
1018c: 02b79c63 bne a5,a1,101c4 <removeElements+0x40>
10190: 00452503 lw a0,4(a0)
10194: fe051ae3 bnez a0,10188 <removeElements+0x4>
10198: 00008067 ret
1019c: 00008067 ret
101a0: 0047a703 lw a4,4(a5)
101a4: 00e6a223 sw a4,4(a3)
101a8: 0047a783 lw a5,4(a5)
101ac: 00078a63 beqz a5,101c0 <removeElements+0x3c>
101b0: 0007a703 lw a4,0(a5)
101b4: feb706e3 beq a4,a1,101a0 <removeElements+0x1c>
101b8: 0046a683 lw a3,4(a3)
101bc: fedff06f j 101a8 <removeElements+0x24>
101c0: 00008067 ret
101c4: 00452783 lw a5,4(a0)
101c8: 00050693 mv a3,a0
101cc: fe0792e3 bnez a5,101b0 <removeElements+0x2c>
101d0: fc9ff06f j 10198 <removeElements+0x14>
000101d4 <main>:
101d4: fc010113 addi sp,sp,-64
101d8: 02112e23 sw ra,60(sp)
101dc: 02812c23 sw s0,56(sp)
101e0: 02912a23 sw s1,52(sp)
101e4: 03212823 sw s2,48(sp)
101e8: 03312623 sw s3,44(sp)
101ec: 000217b7 lui a5,0x21
101f0: 73c78793 addi a5,a5,1852 # 2173c <__clzsi2+0x8c>
101f4: 0007a803 lw a6,0(a5)
101f8: 0047a503 lw a0,4(a5)
101fc: 0087a583 lw a1,8(a5)
10200: 00c7a603 lw a2,12(a5)
10204: 0107a683 lw a3,16(a5)
10208: 0147a703 lw a4,20(a5)
1020c: 0187a783 lw a5,24(a5)
10210: 01012223 sw a6,4(sp)
10214: 00a12423 sw a0,8(sp)
10218: 00b12623 sw a1,12(sp)
1021c: 00c12823 sw a2,16(sp)
10220: 00d12a23 sw a3,20(sp)
10224: 00e12c23 sw a4,24(sp)
10228: 00f12e23 sw a5,28(sp)
1022c: 00410413 addi s0,sp,4
10230: 02010993 addi s3,sp,32
10234: 00000913 li s2,0
10238: 00000493 li s1,0
1023c: 0100006f j 1024c <main+0x78>
10240: 00050493 mv s1,a0
10244: 00440413 addi s0,s0,4
10248: 03340463 beq s0,s3,10270 <main+0x9c>
1024c: 00800513 li a0,8
10250: 170000ef jal ra,103c0 <malloc>
10254: 00042783 lw a5,0(s0)
10258: 00f52023 sw a5,0(a0)
1025c: 00052223 sw zero,4(a0)
10260: fe0480e3 beqz s1,10240 <main+0x6c>
10264: 00a92223 sw a0,4(s2)
10268: 00050913 mv s2,a0
1026c: fd9ff06f j 10244 <main+0x70>
10270: 00600593 li a1,6
10274: 00048513 mv a0,s1
10278: f0dff0ef jal ra,10184 <removeElements>
1027c: 02051063 bnez a0,1029c <main+0xc8>
10280: 000214b7 lui s1,0x21
10284: 00000413 li s0,0
10288: 00042583 lw a1,0(s0)
1028c: 73848513 addi a0,s1,1848 # 21738 <__clzsi2+0x88>
10290: 1f5000ef jal ra,10c84 <printf>
10294: 00442783 lw a5,4(s0)
10298: fe0786e3 beqz a5,10284 <main+0xb0>
1029c: 00000513 li a0,0
102a0: 03c12083 lw ra,60(sp)
102a4: 03812403 lw s0,56(sp)
102a8: 03412483 lw s1,52(sp)
102ac: 03012903 lw s2,48(sp)
102b0: 02c12983 lw s3,44(sp)
102b4: 04010113 addi sp,sp,64
102b8: 00008067 ret
```
:::
::: spoiler -O2 assembly code
```
000100c4 <main>:
100c4: 000117b7 lui a5,0x11
100c8: 56c78793 addi a5,a5,1388 # 1156c <__errno+0x8>
100cc: 0007a803 lw a6,0(a5)
100d0: 0047a503 lw a0,4(a5)
100d4: 0087a583 lw a1,8(a5)
100d8: 00c7a603 lw a2,12(a5)
100dc: 0107a683 lw a3,16(a5)
100e0: 0147a703 lw a4,20(a5)
100e4: 0187a783 lw a5,24(a5)
100e8: fc010113 addi sp,sp,-64
100ec: 02812c23 sw s0,56(sp)
100f0: 02912a23 sw s1,52(sp)
100f4: 03212823 sw s2,48(sp)
100f8: 03312623 sw s3,44(sp)
100fc: 02112e23 sw ra,60(sp)
10100: 01012223 sw a6,4(sp)
10104: 00a12423 sw a0,8(sp)
10108: 00b12623 sw a1,12(sp)
1010c: 00c12823 sw a2,16(sp)
10110: 00d12a23 sw a3,20(sp)
10114: 00e12c23 sw a4,24(sp)
10118: 00f12e23 sw a5,28(sp)
1011c: 00410413 addi s0,sp,4
10120: 02010993 addi s3,sp,32
10124: 00000913 li s2,0
10128: 00000493 li s1,0
1012c: 0140006f j 10140 <main+0x7c>
10130: 00a92223 sw a0,4(s2)
10134: 00440413 addi s0,s0,4
10138: 00050913 mv s2,a0
1013c: 02898463 beq s3,s0,10164 <main+0xa0>
10140: 00800513 li a0,8
10144: 27c000ef jal ra,103c0 <malloc>
10148: 00042783 lw a5,0(s0)
1014c: 00052223 sw zero,4(a0)
10150: 00f52023 sw a5,0(a0)
10154: fc049ee3 bnez s1,10130 <main+0x6c>
10158: 00440413 addi s0,s0,4
1015c: 00050493 mv s1,a0
10160: fe8990e3 bne s3,s0,10140 <main+0x7c>
10164: 00600593 li a1,6
10168: 00048513 mv a0,s1
1016c: 0f0000ef jal ra,1025c <removeElements>
10170: 00051663 bnez a0,1017c <main+0xb8>
10174: 00002783 lw a5,0(zero) # 0 <exit-0x10094>
10178: 00100073 ebreak
1017c: 03c12083 lw ra,60(sp)
10180: 03812403 lw s0,56(sp)
10184: 03412483 lw s1,52(sp)
10188: 03012903 lw s2,48(sp)
1018c: 02c12983 lw s3,44(sp)
10190: 00000513 li a0,0
10194: 04010113 addi sp,sp,64
10198: 00008067 ret
0001025c <removeElements>:
1025c: 00051863 bnez a0,1026c <removeElements+0x10>
10260: 0580006f j 102b8 <removeElements+0x5c>
10264: 00452503 lw a0,4(a0)
10268: 04050463 beqz a0,102b0 <removeElements+0x54>
1026c: 00052783 lw a5,0(a0)
10270: feb78ae3 beq a5,a1,10264 <removeElements+0x8>
10274: 00452783 lw a5,4(a0)
10278: 00050693 mv a3,a0
1027c: 00079a63 bnez a5,10290 <removeElements+0x34>
10280: 02c0006f j 102ac <removeElements+0x50>
10284: 0047a783 lw a5,4(a5)
10288: 0046a683 lw a3,4(a3)
1028c: 00078e63 beqz a5,102a8 <removeElements+0x4c>
10290: 0007a703 lw a4,0(a5)
10294: feb718e3 bne a4,a1,10284 <removeElements+0x28>
10298: 0047a703 lw a4,4(a5)
1029c: 00e6a223 sw a4,4(a3)
102a0: 0047a783 lw a5,4(a5)
102a4: fe0796e3 bnez a5,10290 <removeElements+0x34>
102a8: 00008067 ret
102ac: 00008067 ret
102b0: 00000513 li a0,0
102b4: 00008067 ret
102b8: 00008067 ret
```
:::
::: spoiler -O3 assembly code
```
000100c4 <main>:
100c4: 000117b7 lui a5,0x11
100c8: 5ac78793 addi a5,a5,1452 # 115ac <__errno+0x8>
100cc: 0007a803 lw a6,0(a5)
100d0: 0047a503 lw a0,4(a5)
100d4: 0087a583 lw a1,8(a5)
100d8: 00c7a603 lw a2,12(a5)
100dc: 0107a683 lw a3,16(a5)
100e0: 0147a703 lw a4,20(a5)
100e4: 0187a783 lw a5,24(a5)
100e8: fc010113 addi sp,sp,-64
100ec: 02812c23 sw s0,56(sp)
100f0: 02912a23 sw s1,52(sp)
100f4: 03212823 sw s2,48(sp)
100f8: 03312623 sw s3,44(sp)
100fc: 02112e23 sw ra,60(sp)
10100: 01012223 sw a6,4(sp)
10104: 00a12423 sw a0,8(sp)
10108: 00b12623 sw a1,12(sp)
1010c: 00c12823 sw a2,16(sp)
10110: 00d12a23 sw a3,20(sp)
10114: 00e12c23 sw a4,24(sp)
10118: 00f12e23 sw a5,28(sp)
1011c: 00410493 addi s1,sp,4
10120: 02010993 addi s3,sp,32
10124: 00000913 li s2,0
10128: 00000413 li s0,0
1012c: 00800513 li a0,8
10130: 2d0000ef jal ra,10400 <malloc>
10134: 0004a783 lw a5,0(s1)
10138: 00052223 sw zero,4(a0)
1013c: 00f52023 sw a5,0(a0)
10140: 02040663 beqz s0,1016c <main+0xa8>
10144: 00a92223 sw a0,4(s2)
10148: 00448493 addi s1,s1,4
1014c: 02998663 beq s3,s1,10178 <main+0xb4>
10150: 00050913 mv s2,a0
10154: 00800513 li a0,8
10158: 2a8000ef jal ra,10400 <malloc>
1015c: 0004a783 lw a5,0(s1)
10160: 00052223 sw zero,4(a0)
10164: 00f52023 sw a5,0(a0)
10168: fc041ee3 bnez s0,10144 <main+0x80>
1016c: 00448493 addi s1,s1,4
10170: 00050413 mv s0,a0
10174: fb349ce3 bne s1,s3,1012c <main+0x68>
10178: 00600713 li a4,6
1017c: 00c0006f j 10188 <main+0xc4>
10180: 00442403 lw s0,4(s0)
10184: 04040c63 beqz s0,101dc <main+0x118>
10188: 00042783 lw a5,0(s0)
1018c: fee78ae3 beq a5,a4,10180 <main+0xbc>
10190: 00442783 lw a5,4(s0)
10194: 00600693 li a3,6
10198: 00078c63 beqz a5,101b0 <main+0xec>
1019c: 0007a703 lw a4,0(a5)
101a0: 0047a783 lw a5,4(a5)
101a4: 02d70663 beq a4,a3,101d0 <main+0x10c>
101a8: 00442403 lw s0,4(s0)
101ac: fe0798e3 bnez a5,1019c <main+0xd8>
101b0: 03c12083 lw ra,60(sp)
101b4: 03812403 lw s0,56(sp)
101b8: 03412483 lw s1,52(sp)
101bc: 03012903 lw s2,48(sp)
101c0: 02c12983 lw s3,44(sp)
101c4: 00000513 li a0,0
101c8: 04010113 addi sp,sp,64
101cc: 00008067 ret
101d0: 00f42223 sw a5,4(s0)
101d4: fc0794e3 bnez a5,1019c <main+0xd8>
101d8: fd9ff06f j 101b0 <main+0xec>
101dc: 00002783 lw a5,0(zero) # 0 <exit-0x10094>
101e0: 00100073 ebreak
000102a4 <removeElements>:
102a4: 00051863 bnez a0,102b4 <removeElements+0x10>
102a8: 0500006f j 102f8 <removeElements+0x54>
102ac: 00452503 lw a0,4(a0)
102b0: 04050063 beqz a0,102f0 <removeElements+0x4c>
102b4: 00052783 lw a5,0(a0)
102b8: feb78ae3 beq a5,a1,102ac <removeElements+0x8>
102bc: 00452783 lw a5,4(a0)
102c0: 00050693 mv a3,a0
102c4: 02078463 beqz a5,102ec <removeElements+0x48>
102c8: 0007a703 lw a4,0(a5)
102cc: 0047a783 lw a5,4(a5)
102d0: 00b70863 beq a4,a1,102e0 <removeElements+0x3c>
102d4: 0046a683 lw a3,4(a3)
102d8: fe0798e3 bnez a5,102c8 <removeElements+0x24>
102dc: 00008067 ret
102e0: 00f6a223 sw a5,4(a3)
102e4: fe0792e3 bnez a5,102c8 <removeElements+0x24>
102e8: 00008067 ret
102ec: 00008067 ret
102f0: 00000513 li a0,0
102f4: 00008067 ret
102f8: 00008067 ret
```
:::
::: spoiler -Os assembly code
```
000100c4 <main>:
100c4: fd010113 addi sp,sp,-48
100c8: 000115b7 lui a1,0x11
100cc: 01c00613 li a2,28
100d0: 6bc58593 addi a1,a1,1724 # 116bc <__errno+0x8>
100d4: 00410513 addi a0,sp,4
100d8: 02812423 sw s0,40(sp)
100dc: 02912223 sw s1,36(sp)
100e0: 03212023 sw s2,32(sp)
100e4: 02112623 sw ra,44(sp)
100e8: 00410413 addi s0,sp,4
100ec: 219000ef jal ra,10b04 <memcpy>
100f0: 00000913 li s2,0
100f4: 00000493 li s1,0
100f8: 00800513 li a0,8
100fc: 270000ef jal ra,1036c <malloc>
10100: 00042783 lw a5,0(s0)
10104: 00052223 sw zero,4(a0)
10108: 00f52023 sw a5,0(a0)
1010c: 02048863 beqz s1,1013c <main+0x78>
10110: 00a92223 sw a0,4(s2)
10114: 00050913 mv s2,a0
10118: 00440413 addi s0,s0,4
1011c: 02010793 addi a5,sp,32
10120: fc879ce3 bne a5,s0,100f8 <main+0x34>
10124: 00600593 li a1,6
10128: 00048513 mv a0,s1
1012c: 0f4000ef jal ra,10220 <removeElements>
10130: 00051a63 bnez a0,10144 <main+0x80>
10134: 00002783 lw a5,0(zero) # 0 <exit-0x10094>
10138: 00100073 ebreak
1013c: 00050493 mv s1,a0
10140: fd9ff06f j 10118 <main+0x54>
10144: 02c12083 lw ra,44(sp)
10148: 02812403 lw s0,40(sp)
1014c: 02412483 lw s1,36(sp)
10150: 02012903 lw s2,32(sp)
10154: 00000513 li a0,0
10158: 03010113 addi sp,sp,48
1015c: 00008067 ret
00010220 <removeElements>:
10220: 04050263 beqz a0,10264 <removeElements+0x44>
10224: 00052703 lw a4,0(a0)
10228: 00452783 lw a5,4(a0)
1022c: 00b70663 beq a4,a1,10238 <removeElements+0x18>
10230: 00050713 mv a4,a0
10234: 0200006f j 10254 <removeElements+0x34>
10238: 00078513 mv a0,a5
1023c: fe5ff06f j 10220 <removeElements>
10240: 0007a683 lw a3,0(a5)
10244: 00b69c63 bne a3,a1,1025c <removeElements+0x3c>
10248: 0047a683 lw a3,4(a5)
1024c: 00d72223 sw a3,4(a4)
10250: 0047a783 lw a5,4(a5)
10254: fe0796e3 bnez a5,10240 <removeElements+0x20>
10258: 00008067 ret
1025c: 00472703 lw a4,4(a4)
10260: ff1ff06f j 10250 <removeElements+0x30>
10264: 00008067 ret
```
:::
::: spoiler -Ofast assembly code
```
000100c4 <main>:
100c4: 000117b7 lui a5,0x11
100c8: 5ac78793 addi a5,a5,1452 # 115ac <__errno+0x8>
100cc: 0007a803 lw a6,0(a5)
100d0: 0047a503 lw a0,4(a5)
100d4: 0087a583 lw a1,8(a5)
100d8: 00c7a603 lw a2,12(a5)
100dc: 0107a683 lw a3,16(a5)
100e0: 0147a703 lw a4,20(a5)
100e4: 0187a783 lw a5,24(a5)
100e8: fc010113 addi sp,sp,-64
100ec: 02812c23 sw s0,56(sp)
100f0: 02912a23 sw s1,52(sp)
100f4: 03212823 sw s2,48(sp)
100f8: 03312623 sw s3,44(sp)
100fc: 02112e23 sw ra,60(sp)
10100: 01012223 sw a6,4(sp)
10104: 00a12423 sw a0,8(sp)
10108: 00b12623 sw a1,12(sp)
1010c: 00c12823 sw a2,16(sp)
10110: 00d12a23 sw a3,20(sp)
10114: 00e12c23 sw a4,24(sp)
10118: 00f12e23 sw a5,28(sp)
1011c: 00410493 addi s1,sp,4
10120: 02010993 addi s3,sp,32
10124: 00000913 li s2,0
10128: 00000413 li s0,0
1012c: 00800513 li a0,8
10130: 2d0000ef jal ra,10400 <malloc>
10134: 0004a783 lw a5,0(s1)
10138: 00052223 sw zero,4(a0)
1013c: 00f52023 sw a5,0(a0)
10140: 02040663 beqz s0,1016c <main+0xa8>
10144: 00a92223 sw a0,4(s2)
10148: 00448493 addi s1,s1,4
1014c: 02998663 beq s3,s1,10178 <main+0xb4>
10150: 00050913 mv s2,a0
10154: 00800513 li a0,8
10158: 2a8000ef jal ra,10400 <malloc>
1015c: 0004a783 lw a5,0(s1)
10160: 00052223 sw zero,4(a0)
10164: 00f52023 sw a5,0(a0)
10168: fc041ee3 bnez s0,10144 <main+0x80>
1016c: 00448493 addi s1,s1,4
10170: 00050413 mv s0,a0
10174: fb349ce3 bne s1,s3,1012c <main+0x68>
10178: 00600713 li a4,6
1017c: 00c0006f j 10188 <main+0xc4>
10180: 00442403 lw s0,4(s0)
10184: 04040c63 beqz s0,101dc <main+0x118>
10188: 00042783 lw a5,0(s0)
1018c: fee78ae3 beq a5,a4,10180 <main+0xbc>
10190: 00442783 lw a5,4(s0)
10194: 00600693 li a3,6
10198: 00078c63 beqz a5,101b0 <main+0xec>
1019c: 0007a703 lw a4,0(a5)
101a0: 0047a783 lw a5,4(a5)
101a4: 02d70663 beq a4,a3,101d0 <main+0x10c>
101a8: 00442403 lw s0,4(s0)
101ac: fe0798e3 bnez a5,1019c <main+0xd8>
101b0: 03c12083 lw ra,60(sp)
101b4: 03812403 lw s0,56(sp)
101b8: 03412483 lw s1,52(sp)
101bc: 03012903 lw s2,48(sp)
101c0: 02c12983 lw s3,44(sp)
101c4: 00000513 li a0,0
101c8: 04010113 addi sp,sp,64
101cc: 00008067 ret
101d0: 00f42223 sw a5,4(s0)
101d4: fc0794e3 bnez a5,1019c <main+0xd8>
101d8: fd9ff06f j 101b0 <main+0xec>
101dc: 00002783 lw a5,0(zero) # 0 <exit-0x10094>
101e0: 00100073 ebreak
000102a4 <removeElements>:
102a4: 00051863 bnez a0,102b4 <removeElements+0x10>
102a8: 0500006f j 102f8 <removeElements+0x54>
102ac: 00452503 lw a0,4(a0)
102b0: 04050063 beqz a0,102f0 <removeElements+0x4c>
102b4: 00052783 lw a5,0(a0)
102b8: feb78ae3 beq a5,a1,102ac <removeElements+0x8>
102bc: 00452783 lw a5,4(a0)
102c0: 00050693 mv a3,a0
102c4: 02078463 beqz a5,102ec <removeElements+0x48>
102c8: 0007a703 lw a4,0(a5)
102cc: 0047a783 lw a5,4(a5)
102d0: 00b70863 beq a4,a1,102e0 <removeElements+0x3c>
102d4: 0046a683 lw a3,4(a3)
102d8: fe0798e3 bnez a5,102c8 <removeElements+0x24>
102dc: 00008067 ret
102e0: 00f6a223 sw a5,4(a3)
102e4: fe0792e3 bnez a5,102c8 <removeElements+0x24>
102e8: 00008067 ret
102ec: 00008067 ret
102f0: 00000513 li a0,0
102f4: 00008067 ret
102f8: 00008067 ret
```
:::
#### Handwritten code:
```
.data
length:
.word 7
array:
.word 1, 2, 6, 3, 4, 5, 6
target:
.word 6
answer:
.word 5, 4, 3, 2, 1
err_msg:
.string "malloc fail"
assert_fail:
.string "assert fail"
assert_success:
.string "assert success"
.text
main:
jal create_default_list
mv t1, s1 # store list head to t1
jal print_list
mv t1, s1 # store list head to t1
jal remove_elements
mv t1, s1 # store list head to t1
jal print_list
mv t1, s1 # store list head to t1
jal assert_list
exit:
addi a7, x0, 10 # exit
ecall
# ------ create_default_list ------
create_default_list:
addi sp, sp, 8
li s0, 0x0fff0000 # address of list head
sw ra, 0(sp) # store return address
sw s0, 4(sp) # store the address of list head
li s2, 0 # i = 0
la t1, array # load array
la t0, length # load length
lw t0, 0(t0)
bne s2, t0, loop
jr ra
loop:
addi a0, s0, 8 # malloc: move heap limit to current + 8, 4 byte for integer, 4 byte for pointer
addi a7, x0, 214 # brk systemcall
ecall
beq a0, x0, exit_error
lw s1, 0(t1) # load array[i]
addi t1, t1, 4 # i++
sw s1, 0(s0) # node->value = array[i]
sw t2, 4(s0) # prev->next = node
mv t2, s0 # prev = node
addi s2, s2, 1
addi s0, s0, 8
bne s2, t0, loop # check end condition in for loop
ret:
addi s1, s0, -8
lw ra, 0(sp) # load return address
addi sp, sp, 8
jr ra
# ------ remove_elements ------
remove_elements:
la s0, target
lw s0, 0(s0) # s0 = target
head_while:
beq t1, x0, end_iterate # head == null
lw t2, 0(t1)
lw t3, 4(t1) # pre-load head.next
bne t2, s0, if_head # head->val!=val
mv t1, t3
j head_while
if_head:
beq t1, x0, end_iterate
mv t3, t1 # p = head
lw t4, 4(t1) # q = head->next
while_q:
beq t4, x0, end_iterate
lw t2, 0(t4) # load q.val
lw t5, 4(t4) # pre-load q.next
lw t6, 4(t3) # pre-load p.next
mv t4, t5 # q = q.next
beq t2, s0, remove
mv t3, t6 # p = p.next
j while_q
remove:
sw t5, 4(t3) # p.next = q.next
j while_q
end_iterate:
mv s1, t1 # push head back
jr ra
# ------ print_list ------
print_list:
lw a0, 0(t1) # load value store in node
lw t1, 4(t1) # load address store in node->next
addi a7, x0, 1 # print number
ecall
bne t1, x0, print_list
addi a0, x0, 10 # '\n' acsii number
addi a7, x0, 11 # print char
ecall
jr ra
exit_error:
la a0, err_msg
addi a7, x0, 4 # print error message
ecall
addi a7, x0, 10 # exit
ecall
# ------ assert_list ------
assert_list:
la t0, answer
assert_loop:
beq t1, x0, end_aseert
lw a0, 0(t1) # load value store in node
lw a1, 0(t0) # load value store in answer[i]
lw t1, 4(t1) # load address store in node->next
addi t0, t0, 4 # i++
beq a0, a1, assert_loop
la a0, assert_fail
addi a7, x0, 4 # print message
ecall
jr ra
end_aseert:
la a0, assert_success
addi a7, x0, 4 # print message
ecall
jr ra
```
## Analysis
#### Size of Assembly code:
| Subject | -O0 | -O1 | -O2 | -O3 | -Os | -Ofast |
|:-------:|:-----:|:-----:|:-----:|:----- | ----- |:------ |
| text | 75596 | 75404 | 75428 | 75552 | 75340 | 75552 |
| data | 2816 | 2816 | 2816 | 2816 | 2816 | 2816 |
| bss | 812 | 812 | 812 | 812 | 812 | 812 |
| dec | 79224 | 79032 | 79056 | 79180 | 78968 | 79180 |
| hex | 13578 | 134b8 | 134d0 | 1354c | 13478 | 1354c |
#### Observation:
| Subject | -O0 | -O1 | -O2 | -O3 | -Os | -Ofast |
|:---------------:|:----- |:----- |:----- |:----- |:----- |:------ |
| LOC | 123 | 77 | 125 | 125 | 104 | 141 |
| Register S | s0 | s0-s3 | s0-s3 | s0-s3 | s0-s2 | s0-s3 |
| Register T | - | - | - | - | - | - |
| Register A | a0-5 | a0-a5 | a0-a6 | a0-a6 | a0-a5 | a0-a6 |
| Other Regs | sp,ra | sp,ra | sp,ra | sp,ra | sp,ra | sp,ra |
| CSR cycle count | 5708 | 5489 | 5494 | 5494 | 5545 | 5494 |
We could observe that in Os and Ofast optimization choices, it uses more registers for preloading data; although the size of the code(**LOC**) increase tremendously, the number of **CSR** cycle has decreased.
#### Improvement
 
By reducing unnecessary checking and tweaking assert branch, without change of functionality, we decrease cycles by up to 14% and improve IPC by 10%.
## Reference
[Homework 2](https://hackmd.io/@sysprog/2022-arch-homework2)
[CSR counter](https://five-embeddev.com/riscv-isa-manual/latest/counters.html)
[Lab2 source codes](https://github.com/abnormal749/Computer_Arch2022/tree/main/Lab2)