# Homework2: RISC-V Toolchain ###### tags: `2022 computer architecture` Reference: [陳彥甫](https://hackmd.io/@qwe661234/H1J9kc6Ws) [Leetcode1234](https://leetcode.com/problems/remove-linked-list-elements) ## Original code - Remove Linked List Elements * I choose the Problem **Remove Linked List Elements** since linked-list is an ubiquitous topic in computer science so I wonder how it works in assembly. Also, I find some mistakes in his either c code or assembly. Hence, I would like to correct it and try to speedup the execution with optimizations. * c++ code rewritten from [陳彥甫](https://hackmd.io/@qwe661234/H1J9kc6Ws) ```cpp= ListNode* removeElements(ListNode* head, int val) { ListNode *cur = head, *prev = NULL; while (cur) { if (cur->val == val) { if (cur == head) { head = head->next; cur = head; continue; } else { prev->next = cur->next; } } prev = cur; //this shouldn't be run if cur removing ! cur = cur->next; } return head; } ``` The problem is: If the given list is ```[1,2,2,1]``` and the target value is ```2```. The expected output is ```[1,1]```, but the prgram returns ```[1,2,1]```, as the first node points to third node when removing second one. But third node updates the ```next``` of the already-removed second node not that of the first node. * Assembly ``` create_default_list: addi sp, sp, 8 li s0, 0x0fff0000 # address of list head sw ra, 0(sp) # store return address sw s0, 4(sp) # store the address of list head li s2, 0 # i = 0 la t1, array # load array la t0, length # load length lw t0, 0(t0) bne s2, t0, loop jr ra loop: jal ra, malloc # get memory for the next node bne a0, x0, exit_error #!!! ... # allocates 8 bytes on the heap, returns pointer to start in a0 malloc: addi a0, s0, 8 addi a7, x0, 214 # brk systemcall ecall jr ra ``` The malloc returns the start pointer and because the linked-list exists in the heap, ```a0``` is never smaller than 0x0fff0000. Therefore, the code will jump to ```exit_error``` even if node is successfully created. ## Optimization * I optimize the c++ program by seperating a while loop to check if it's the head, and if not then the following don't need to check any more: ```cpp= ListNode* removeElements(ListNode* head, int val) { while(head && head->val == val) head = head->next; if(head) { ListNode *p = head, *q = head->next; while(q) { if(q->val == val) p->next = q->next; else p = p->next; q = q->next; } } return head; } ``` * The improved assembly code: ``` # ------ remove_elements ------ remove_elements: la s0, target lw s0, 0(s0) # s0 = target head_while: beq t1, x0, end_iterate # head == null lw t2, 0(t1) lw t3, 4(t1) # pre-load head.next bne t2, s0, if_head # head->val!=val mv t1, t3 j head_while if_head: beq t1, x0, end_iterate mv t3, t1 # p = head lw t4, 4(t1) # q = head->next while_q: beq t4, x0, end_iterate lw t2, 0(t4) # load q.val lw t5, 4(t4) # pre-load q.next lw t6, 4(t3) # pre-load p.next mv t4, t5 # q = q.next beq t2, s0, remove mv t3, t6 # p = p.next j while_q remove: sw t5, 4(t3) # p.next = q.next j while_q end_iterate: mv s1, t1 # push head back jr ra ``` ### Using RISC-V GCC ![](https://i.imgur.com/aH2AHvq.png) #### Different optimization result: ::: spoiler -O0 assembly code ``` 00010184 <removeElements>: 10184: fd010113 addi sp,sp,-48 10188: 02812623 sw s0,44(sp) 1018c: 03010413 addi s0,sp,48 10190: fca42e23 sw a0,-36(s0) 10194: fcb42c23 sw a1,-40(s0) 10198: 0100006f j 101a8 <removeElements+0x24> 1019c: fdc42783 lw a5,-36(s0) 101a0: 0047a783 lw a5,4(a5) 101a4: fcf42e23 sw a5,-36(s0) 101a8: fdc42783 lw a5,-36(s0) 101ac: 00078a63 beqz a5,101c0 <removeElements+0x3c> 101b0: fdc42783 lw a5,-36(s0) 101b4: 0007a783 lw a5,0(a5) 101b8: fd842703 lw a4,-40(s0) 101bc: fef700e3 beq a4,a5,1019c <removeElements+0x18> 101c0: fdc42783 lw a5,-36(s0) 101c4: 06078063 beqz a5,10224 <removeElements+0xa0> 101c8: fdc42783 lw a5,-36(s0) 101cc: fef42623 sw a5,-20(s0) 101d0: fdc42783 lw a5,-36(s0) 101d4: 0047a783 lw a5,4(a5) 101d8: fef42423 sw a5,-24(s0) 101dc: 0400006f j 1021c <removeElements+0x98> 101e0: fe842783 lw a5,-24(s0) 101e4: 0007a783 lw a5,0(a5) 101e8: fd842703 lw a4,-40(s0) 101ec: 00f71c63 bne a4,a5,10204 <removeElements+0x80> 101f0: fe842783 lw a5,-24(s0) 101f4: 0047a703 lw a4,4(a5) 101f8: fec42783 lw a5,-20(s0) 101fc: 00e7a223 sw a4,4(a5) 10200: 0100006f j 10210 <removeElements+0x8c> 10204: fec42783 lw a5,-20(s0) 10208: 0047a783 lw a5,4(a5) 1020c: fef42623 sw a5,-20(s0) 10210: fe842783 lw a5,-24(s0) 10214: 0047a783 lw a5,4(a5) 10218: fef42423 sw a5,-24(s0) 1021c: fe842783 lw a5,-24(s0) 10220: fc0790e3 bnez a5,101e0 <removeElements+0x5c> 10224: fdc42783 lw a5,-36(s0) 10228: 00078513 mv a0,a5 1022c: 02c12403 lw s0,44(sp) 10230: 03010113 addi sp,sp,48 10234: 00008067 ret 00010238 <main>: 10238: fc010113 addi sp,sp,-64 1023c: 02112e23 sw ra,60(sp) 10240: 02812c23 sw s0,56(sp) 10244: 04010413 addi s0,sp,64 10248: fe042623 sw zero,-20(s0) 1024c: fec42783 lw a5,-20(s0) 10250: fef42423 sw a5,-24(s0) 10254: 000217b7 lui a5,0x21 10258: 7f478793 addi a5,a5,2036 # 217f4 <__clzsi2+0x8c> 1025c: 0007a803 lw a6,0(a5) 10260: 0047a503 lw a0,4(a5) 10264: 0087a583 lw a1,8(a5) 10268: 00c7a603 lw a2,12(a5) 1026c: 0107a683 lw a3,16(a5) 10270: 0147a703 lw a4,20(a5) 10274: 0187a783 lw a5,24(a5) 10278: fd042023 sw a6,-64(s0) 1027c: fca42223 sw a0,-60(s0) 10280: fcb42423 sw a1,-56(s0) 10284: fcc42623 sw a2,-52(s0) 10288: fcd42823 sw a3,-48(s0) 1028c: fce42a23 sw a4,-44(s0) 10290: fcf42c23 sw a5,-40(s0) 10294: 00700793 li a5,7 10298: fef42023 sw a5,-32(s0) 1029c: fe042223 sw zero,-28(s0) 102a0: 0700006f j 10310 <main+0xd8> 102a4: 00800513 li a0,8 102a8: 1d0000ef jal ra,10478 <malloc> 102ac: 00050793 mv a5,a0 102b0: fcf42e23 sw a5,-36(s0) 102b4: fe442783 lw a5,-28(s0) 102b8: 00279793 slli a5,a5,0x2 102bc: ff078793 addi a5,a5,-16 102c0: 008787b3 add a5,a5,s0 102c4: fd07a703 lw a4,-48(a5) 102c8: fdc42783 lw a5,-36(s0) 102cc: 00e7a023 sw a4,0(a5) 102d0: fdc42783 lw a5,-36(s0) 102d4: 0007a223 sw zero,4(a5) 102d8: fec42783 lw a5,-20(s0) 102dc: 00079863 bnez a5,102ec <main+0xb4> 102e0: fdc42783 lw a5,-36(s0) 102e4: fef42623 sw a5,-20(s0) 102e8: 01c0006f j 10304 <main+0xcc> 102ec: fe842783 lw a5,-24(s0) 102f0: fdc42703 lw a4,-36(s0) 102f4: 00e7a223 sw a4,4(a5) 102f8: fe842783 lw a5,-24(s0) 102fc: 0047a783 lw a5,4(a5) 10300: fef42423 sw a5,-24(s0) 10304: fe442783 lw a5,-28(s0) 10308: 00178793 addi a5,a5,1 1030c: fef42223 sw a5,-28(s0) 10310: fe442703 lw a4,-28(s0) 10314: fe042783 lw a5,-32(s0) 10318: f8f746e3 blt a4,a5,102a4 <main+0x6c> 1031c: 00600593 li a1,6 10320: fec42503 lw a0,-20(s0) 10324: e61ff0ef jal ra,10184 <removeElements> 10328: fea42623 sw a0,-20(s0) 1032c: 0280006f j 10354 <main+0x11c> 10330: fec42783 lw a5,-20(s0) 10334: 0007a783 lw a5,0(a5) 10338: 00078593 mv a1,a5 1033c: 000217b7 lui a5,0x21 10340: 7f078513 addi a0,a5,2032 # 217f0 <__clzsi2+0x88> 10344: 1f9000ef jal ra,10d3c <printf> 10348: fec42783 lw a5,-20(s0) 1034c: 0047a783 lw a5,4(a5) 10350: fef42623 sw a5,-20(s0) 10354: fec42783 lw a5,-20(s0) 10358: fc078ce3 beqz a5,10330 <main+0xf8> 1035c: 00000793 li a5,0 10360: 00078513 mv a0,a5 10364: 03c12083 lw ra,60(sp) 10368: 03812403 lw s0,56(sp) 1036c: 04010113 addi sp,sp,64 10370: 00008067 ret ``` ::: ::: spoiler -O1 assembly code ``` 00010184 <removeElements>: 10184: 00050c63 beqz a0,1019c <removeElements+0x18> 10188: 00052783 lw a5,0(a0) 1018c: 02b79c63 bne a5,a1,101c4 <removeElements+0x40> 10190: 00452503 lw a0,4(a0) 10194: fe051ae3 bnez a0,10188 <removeElements+0x4> 10198: 00008067 ret 1019c: 00008067 ret 101a0: 0047a703 lw a4,4(a5) 101a4: 00e6a223 sw a4,4(a3) 101a8: 0047a783 lw a5,4(a5) 101ac: 00078a63 beqz a5,101c0 <removeElements+0x3c> 101b0: 0007a703 lw a4,0(a5) 101b4: feb706e3 beq a4,a1,101a0 <removeElements+0x1c> 101b8: 0046a683 lw a3,4(a3) 101bc: fedff06f j 101a8 <removeElements+0x24> 101c0: 00008067 ret 101c4: 00452783 lw a5,4(a0) 101c8: 00050693 mv a3,a0 101cc: fe0792e3 bnez a5,101b0 <removeElements+0x2c> 101d0: fc9ff06f j 10198 <removeElements+0x14> 000101d4 <main>: 101d4: fc010113 addi sp,sp,-64 101d8: 02112e23 sw ra,60(sp) 101dc: 02812c23 sw s0,56(sp) 101e0: 02912a23 sw s1,52(sp) 101e4: 03212823 sw s2,48(sp) 101e8: 03312623 sw s3,44(sp) 101ec: 000217b7 lui a5,0x21 101f0: 73c78793 addi a5,a5,1852 # 2173c <__clzsi2+0x8c> 101f4: 0007a803 lw a6,0(a5) 101f8: 0047a503 lw a0,4(a5) 101fc: 0087a583 lw a1,8(a5) 10200: 00c7a603 lw a2,12(a5) 10204: 0107a683 lw a3,16(a5) 10208: 0147a703 lw a4,20(a5) 1020c: 0187a783 lw a5,24(a5) 10210: 01012223 sw a6,4(sp) 10214: 00a12423 sw a0,8(sp) 10218: 00b12623 sw a1,12(sp) 1021c: 00c12823 sw a2,16(sp) 10220: 00d12a23 sw a3,20(sp) 10224: 00e12c23 sw a4,24(sp) 10228: 00f12e23 sw a5,28(sp) 1022c: 00410413 addi s0,sp,4 10230: 02010993 addi s3,sp,32 10234: 00000913 li s2,0 10238: 00000493 li s1,0 1023c: 0100006f j 1024c <main+0x78> 10240: 00050493 mv s1,a0 10244: 00440413 addi s0,s0,4 10248: 03340463 beq s0,s3,10270 <main+0x9c> 1024c: 00800513 li a0,8 10250: 170000ef jal ra,103c0 <malloc> 10254: 00042783 lw a5,0(s0) 10258: 00f52023 sw a5,0(a0) 1025c: 00052223 sw zero,4(a0) 10260: fe0480e3 beqz s1,10240 <main+0x6c> 10264: 00a92223 sw a0,4(s2) 10268: 00050913 mv s2,a0 1026c: fd9ff06f j 10244 <main+0x70> 10270: 00600593 li a1,6 10274: 00048513 mv a0,s1 10278: f0dff0ef jal ra,10184 <removeElements> 1027c: 02051063 bnez a0,1029c <main+0xc8> 10280: 000214b7 lui s1,0x21 10284: 00000413 li s0,0 10288: 00042583 lw a1,0(s0) 1028c: 73848513 addi a0,s1,1848 # 21738 <__clzsi2+0x88> 10290: 1f5000ef jal ra,10c84 <printf> 10294: 00442783 lw a5,4(s0) 10298: fe0786e3 beqz a5,10284 <main+0xb0> 1029c: 00000513 li a0,0 102a0: 03c12083 lw ra,60(sp) 102a4: 03812403 lw s0,56(sp) 102a8: 03412483 lw s1,52(sp) 102ac: 03012903 lw s2,48(sp) 102b0: 02c12983 lw s3,44(sp) 102b4: 04010113 addi sp,sp,64 102b8: 00008067 ret ``` ::: ::: spoiler -O2 assembly code ``` 000100c4 <main>: 100c4: 000117b7 lui a5,0x11 100c8: 56c78793 addi a5,a5,1388 # 1156c <__errno+0x8> 100cc: 0007a803 lw a6,0(a5) 100d0: 0047a503 lw a0,4(a5) 100d4: 0087a583 lw a1,8(a5) 100d8: 00c7a603 lw a2,12(a5) 100dc: 0107a683 lw a3,16(a5) 100e0: 0147a703 lw a4,20(a5) 100e4: 0187a783 lw a5,24(a5) 100e8: fc010113 addi sp,sp,-64 100ec: 02812c23 sw s0,56(sp) 100f0: 02912a23 sw s1,52(sp) 100f4: 03212823 sw s2,48(sp) 100f8: 03312623 sw s3,44(sp) 100fc: 02112e23 sw ra,60(sp) 10100: 01012223 sw a6,4(sp) 10104: 00a12423 sw a0,8(sp) 10108: 00b12623 sw a1,12(sp) 1010c: 00c12823 sw a2,16(sp) 10110: 00d12a23 sw a3,20(sp) 10114: 00e12c23 sw a4,24(sp) 10118: 00f12e23 sw a5,28(sp) 1011c: 00410413 addi s0,sp,4 10120: 02010993 addi s3,sp,32 10124: 00000913 li s2,0 10128: 00000493 li s1,0 1012c: 0140006f j 10140 <main+0x7c> 10130: 00a92223 sw a0,4(s2) 10134: 00440413 addi s0,s0,4 10138: 00050913 mv s2,a0 1013c: 02898463 beq s3,s0,10164 <main+0xa0> 10140: 00800513 li a0,8 10144: 27c000ef jal ra,103c0 <malloc> 10148: 00042783 lw a5,0(s0) 1014c: 00052223 sw zero,4(a0) 10150: 00f52023 sw a5,0(a0) 10154: fc049ee3 bnez s1,10130 <main+0x6c> 10158: 00440413 addi s0,s0,4 1015c: 00050493 mv s1,a0 10160: fe8990e3 bne s3,s0,10140 <main+0x7c> 10164: 00600593 li a1,6 10168: 00048513 mv a0,s1 1016c: 0f0000ef jal ra,1025c <removeElements> 10170: 00051663 bnez a0,1017c <main+0xb8> 10174: 00002783 lw a5,0(zero) # 0 <exit-0x10094> 10178: 00100073 ebreak 1017c: 03c12083 lw ra,60(sp) 10180: 03812403 lw s0,56(sp) 10184: 03412483 lw s1,52(sp) 10188: 03012903 lw s2,48(sp) 1018c: 02c12983 lw s3,44(sp) 10190: 00000513 li a0,0 10194: 04010113 addi sp,sp,64 10198: 00008067 ret 0001025c <removeElements>: 1025c: 00051863 bnez a0,1026c <removeElements+0x10> 10260: 0580006f j 102b8 <removeElements+0x5c> 10264: 00452503 lw a0,4(a0) 10268: 04050463 beqz a0,102b0 <removeElements+0x54> 1026c: 00052783 lw a5,0(a0) 10270: feb78ae3 beq a5,a1,10264 <removeElements+0x8> 10274: 00452783 lw a5,4(a0) 10278: 00050693 mv a3,a0 1027c: 00079a63 bnez a5,10290 <removeElements+0x34> 10280: 02c0006f j 102ac <removeElements+0x50> 10284: 0047a783 lw a5,4(a5) 10288: 0046a683 lw a3,4(a3) 1028c: 00078e63 beqz a5,102a8 <removeElements+0x4c> 10290: 0007a703 lw a4,0(a5) 10294: feb718e3 bne a4,a1,10284 <removeElements+0x28> 10298: 0047a703 lw a4,4(a5) 1029c: 00e6a223 sw a4,4(a3) 102a0: 0047a783 lw a5,4(a5) 102a4: fe0796e3 bnez a5,10290 <removeElements+0x34> 102a8: 00008067 ret 102ac: 00008067 ret 102b0: 00000513 li a0,0 102b4: 00008067 ret 102b8: 00008067 ret ``` ::: ::: spoiler -O3 assembly code ``` 000100c4 <main>: 100c4: 000117b7 lui a5,0x11 100c8: 5ac78793 addi a5,a5,1452 # 115ac <__errno+0x8> 100cc: 0007a803 lw a6,0(a5) 100d0: 0047a503 lw a0,4(a5) 100d4: 0087a583 lw a1,8(a5) 100d8: 00c7a603 lw a2,12(a5) 100dc: 0107a683 lw a3,16(a5) 100e0: 0147a703 lw a4,20(a5) 100e4: 0187a783 lw a5,24(a5) 100e8: fc010113 addi sp,sp,-64 100ec: 02812c23 sw s0,56(sp) 100f0: 02912a23 sw s1,52(sp) 100f4: 03212823 sw s2,48(sp) 100f8: 03312623 sw s3,44(sp) 100fc: 02112e23 sw ra,60(sp) 10100: 01012223 sw a6,4(sp) 10104: 00a12423 sw a0,8(sp) 10108: 00b12623 sw a1,12(sp) 1010c: 00c12823 sw a2,16(sp) 10110: 00d12a23 sw a3,20(sp) 10114: 00e12c23 sw a4,24(sp) 10118: 00f12e23 sw a5,28(sp) 1011c: 00410493 addi s1,sp,4 10120: 02010993 addi s3,sp,32 10124: 00000913 li s2,0 10128: 00000413 li s0,0 1012c: 00800513 li a0,8 10130: 2d0000ef jal ra,10400 <malloc> 10134: 0004a783 lw a5,0(s1) 10138: 00052223 sw zero,4(a0) 1013c: 00f52023 sw a5,0(a0) 10140: 02040663 beqz s0,1016c <main+0xa8> 10144: 00a92223 sw a0,4(s2) 10148: 00448493 addi s1,s1,4 1014c: 02998663 beq s3,s1,10178 <main+0xb4> 10150: 00050913 mv s2,a0 10154: 00800513 li a0,8 10158: 2a8000ef jal ra,10400 <malloc> 1015c: 0004a783 lw a5,0(s1) 10160: 00052223 sw zero,4(a0) 10164: 00f52023 sw a5,0(a0) 10168: fc041ee3 bnez s0,10144 <main+0x80> 1016c: 00448493 addi s1,s1,4 10170: 00050413 mv s0,a0 10174: fb349ce3 bne s1,s3,1012c <main+0x68> 10178: 00600713 li a4,6 1017c: 00c0006f j 10188 <main+0xc4> 10180: 00442403 lw s0,4(s0) 10184: 04040c63 beqz s0,101dc <main+0x118> 10188: 00042783 lw a5,0(s0) 1018c: fee78ae3 beq a5,a4,10180 <main+0xbc> 10190: 00442783 lw a5,4(s0) 10194: 00600693 li a3,6 10198: 00078c63 beqz a5,101b0 <main+0xec> 1019c: 0007a703 lw a4,0(a5) 101a0: 0047a783 lw a5,4(a5) 101a4: 02d70663 beq a4,a3,101d0 <main+0x10c> 101a8: 00442403 lw s0,4(s0) 101ac: fe0798e3 bnez a5,1019c <main+0xd8> 101b0: 03c12083 lw ra,60(sp) 101b4: 03812403 lw s0,56(sp) 101b8: 03412483 lw s1,52(sp) 101bc: 03012903 lw s2,48(sp) 101c0: 02c12983 lw s3,44(sp) 101c4: 00000513 li a0,0 101c8: 04010113 addi sp,sp,64 101cc: 00008067 ret 101d0: 00f42223 sw a5,4(s0) 101d4: fc0794e3 bnez a5,1019c <main+0xd8> 101d8: fd9ff06f j 101b0 <main+0xec> 101dc: 00002783 lw a5,0(zero) # 0 <exit-0x10094> 101e0: 00100073 ebreak 000102a4 <removeElements>: 102a4: 00051863 bnez a0,102b4 <removeElements+0x10> 102a8: 0500006f j 102f8 <removeElements+0x54> 102ac: 00452503 lw a0,4(a0) 102b0: 04050063 beqz a0,102f0 <removeElements+0x4c> 102b4: 00052783 lw a5,0(a0) 102b8: feb78ae3 beq a5,a1,102ac <removeElements+0x8> 102bc: 00452783 lw a5,4(a0) 102c0: 00050693 mv a3,a0 102c4: 02078463 beqz a5,102ec <removeElements+0x48> 102c8: 0007a703 lw a4,0(a5) 102cc: 0047a783 lw a5,4(a5) 102d0: 00b70863 beq a4,a1,102e0 <removeElements+0x3c> 102d4: 0046a683 lw a3,4(a3) 102d8: fe0798e3 bnez a5,102c8 <removeElements+0x24> 102dc: 00008067 ret 102e0: 00f6a223 sw a5,4(a3) 102e4: fe0792e3 bnez a5,102c8 <removeElements+0x24> 102e8: 00008067 ret 102ec: 00008067 ret 102f0: 00000513 li a0,0 102f4: 00008067 ret 102f8: 00008067 ret ``` ::: ::: spoiler -Os assembly code ``` 000100c4 <main>: 100c4: fd010113 addi sp,sp,-48 100c8: 000115b7 lui a1,0x11 100cc: 01c00613 li a2,28 100d0: 6bc58593 addi a1,a1,1724 # 116bc <__errno+0x8> 100d4: 00410513 addi a0,sp,4 100d8: 02812423 sw s0,40(sp) 100dc: 02912223 sw s1,36(sp) 100e0: 03212023 sw s2,32(sp) 100e4: 02112623 sw ra,44(sp) 100e8: 00410413 addi s0,sp,4 100ec: 219000ef jal ra,10b04 <memcpy> 100f0: 00000913 li s2,0 100f4: 00000493 li s1,0 100f8: 00800513 li a0,8 100fc: 270000ef jal ra,1036c <malloc> 10100: 00042783 lw a5,0(s0) 10104: 00052223 sw zero,4(a0) 10108: 00f52023 sw a5,0(a0) 1010c: 02048863 beqz s1,1013c <main+0x78> 10110: 00a92223 sw a0,4(s2) 10114: 00050913 mv s2,a0 10118: 00440413 addi s0,s0,4 1011c: 02010793 addi a5,sp,32 10120: fc879ce3 bne a5,s0,100f8 <main+0x34> 10124: 00600593 li a1,6 10128: 00048513 mv a0,s1 1012c: 0f4000ef jal ra,10220 <removeElements> 10130: 00051a63 bnez a0,10144 <main+0x80> 10134: 00002783 lw a5,0(zero) # 0 <exit-0x10094> 10138: 00100073 ebreak 1013c: 00050493 mv s1,a0 10140: fd9ff06f j 10118 <main+0x54> 10144: 02c12083 lw ra,44(sp) 10148: 02812403 lw s0,40(sp) 1014c: 02412483 lw s1,36(sp) 10150: 02012903 lw s2,32(sp) 10154: 00000513 li a0,0 10158: 03010113 addi sp,sp,48 1015c: 00008067 ret 00010220 <removeElements>: 10220: 04050263 beqz a0,10264 <removeElements+0x44> 10224: 00052703 lw a4,0(a0) 10228: 00452783 lw a5,4(a0) 1022c: 00b70663 beq a4,a1,10238 <removeElements+0x18> 10230: 00050713 mv a4,a0 10234: 0200006f j 10254 <removeElements+0x34> 10238: 00078513 mv a0,a5 1023c: fe5ff06f j 10220 <removeElements> 10240: 0007a683 lw a3,0(a5) 10244: 00b69c63 bne a3,a1,1025c <removeElements+0x3c> 10248: 0047a683 lw a3,4(a5) 1024c: 00d72223 sw a3,4(a4) 10250: 0047a783 lw a5,4(a5) 10254: fe0796e3 bnez a5,10240 <removeElements+0x20> 10258: 00008067 ret 1025c: 00472703 lw a4,4(a4) 10260: ff1ff06f j 10250 <removeElements+0x30> 10264: 00008067 ret ``` ::: ::: spoiler -Ofast assembly code ``` 000100c4 <main>: 100c4: 000117b7 lui a5,0x11 100c8: 5ac78793 addi a5,a5,1452 # 115ac <__errno+0x8> 100cc: 0007a803 lw a6,0(a5) 100d0: 0047a503 lw a0,4(a5) 100d4: 0087a583 lw a1,8(a5) 100d8: 00c7a603 lw a2,12(a5) 100dc: 0107a683 lw a3,16(a5) 100e0: 0147a703 lw a4,20(a5) 100e4: 0187a783 lw a5,24(a5) 100e8: fc010113 addi sp,sp,-64 100ec: 02812c23 sw s0,56(sp) 100f0: 02912a23 sw s1,52(sp) 100f4: 03212823 sw s2,48(sp) 100f8: 03312623 sw s3,44(sp) 100fc: 02112e23 sw ra,60(sp) 10100: 01012223 sw a6,4(sp) 10104: 00a12423 sw a0,8(sp) 10108: 00b12623 sw a1,12(sp) 1010c: 00c12823 sw a2,16(sp) 10110: 00d12a23 sw a3,20(sp) 10114: 00e12c23 sw a4,24(sp) 10118: 00f12e23 sw a5,28(sp) 1011c: 00410493 addi s1,sp,4 10120: 02010993 addi s3,sp,32 10124: 00000913 li s2,0 10128: 00000413 li s0,0 1012c: 00800513 li a0,8 10130: 2d0000ef jal ra,10400 <malloc> 10134: 0004a783 lw a5,0(s1) 10138: 00052223 sw zero,4(a0) 1013c: 00f52023 sw a5,0(a0) 10140: 02040663 beqz s0,1016c <main+0xa8> 10144: 00a92223 sw a0,4(s2) 10148: 00448493 addi s1,s1,4 1014c: 02998663 beq s3,s1,10178 <main+0xb4> 10150: 00050913 mv s2,a0 10154: 00800513 li a0,8 10158: 2a8000ef jal ra,10400 <malloc> 1015c: 0004a783 lw a5,0(s1) 10160: 00052223 sw zero,4(a0) 10164: 00f52023 sw a5,0(a0) 10168: fc041ee3 bnez s0,10144 <main+0x80> 1016c: 00448493 addi s1,s1,4 10170: 00050413 mv s0,a0 10174: fb349ce3 bne s1,s3,1012c <main+0x68> 10178: 00600713 li a4,6 1017c: 00c0006f j 10188 <main+0xc4> 10180: 00442403 lw s0,4(s0) 10184: 04040c63 beqz s0,101dc <main+0x118> 10188: 00042783 lw a5,0(s0) 1018c: fee78ae3 beq a5,a4,10180 <main+0xbc> 10190: 00442783 lw a5,4(s0) 10194: 00600693 li a3,6 10198: 00078c63 beqz a5,101b0 <main+0xec> 1019c: 0007a703 lw a4,0(a5) 101a0: 0047a783 lw a5,4(a5) 101a4: 02d70663 beq a4,a3,101d0 <main+0x10c> 101a8: 00442403 lw s0,4(s0) 101ac: fe0798e3 bnez a5,1019c <main+0xd8> 101b0: 03c12083 lw ra,60(sp) 101b4: 03812403 lw s0,56(sp) 101b8: 03412483 lw s1,52(sp) 101bc: 03012903 lw s2,48(sp) 101c0: 02c12983 lw s3,44(sp) 101c4: 00000513 li a0,0 101c8: 04010113 addi sp,sp,64 101cc: 00008067 ret 101d0: 00f42223 sw a5,4(s0) 101d4: fc0794e3 bnez a5,1019c <main+0xd8> 101d8: fd9ff06f j 101b0 <main+0xec> 101dc: 00002783 lw a5,0(zero) # 0 <exit-0x10094> 101e0: 00100073 ebreak 000102a4 <removeElements>: 102a4: 00051863 bnez a0,102b4 <removeElements+0x10> 102a8: 0500006f j 102f8 <removeElements+0x54> 102ac: 00452503 lw a0,4(a0) 102b0: 04050063 beqz a0,102f0 <removeElements+0x4c> 102b4: 00052783 lw a5,0(a0) 102b8: feb78ae3 beq a5,a1,102ac <removeElements+0x8> 102bc: 00452783 lw a5,4(a0) 102c0: 00050693 mv a3,a0 102c4: 02078463 beqz a5,102ec <removeElements+0x48> 102c8: 0007a703 lw a4,0(a5) 102cc: 0047a783 lw a5,4(a5) 102d0: 00b70863 beq a4,a1,102e0 <removeElements+0x3c> 102d4: 0046a683 lw a3,4(a3) 102d8: fe0798e3 bnez a5,102c8 <removeElements+0x24> 102dc: 00008067 ret 102e0: 00f6a223 sw a5,4(a3) 102e4: fe0792e3 bnez a5,102c8 <removeElements+0x24> 102e8: 00008067 ret 102ec: 00008067 ret 102f0: 00000513 li a0,0 102f4: 00008067 ret 102f8: 00008067 ret ``` ::: #### Handwritten code: ``` .data length: .word 7 array: .word 1, 2, 6, 3, 4, 5, 6 target: .word 6 answer: .word 5, 4, 3, 2, 1 err_msg: .string "malloc fail" assert_fail: .string "assert fail" assert_success: .string "assert success" .text main: jal create_default_list mv t1, s1 # store list head to t1 jal print_list mv t1, s1 # store list head to t1 jal remove_elements mv t1, s1 # store list head to t1 jal print_list mv t1, s1 # store list head to t1 jal assert_list exit: addi a7, x0, 10 # exit ecall # ------ create_default_list ------ create_default_list: addi sp, sp, 8 li s0, 0x0fff0000 # address of list head sw ra, 0(sp) # store return address sw s0, 4(sp) # store the address of list head li s2, 0 # i = 0 la t1, array # load array la t0, length # load length lw t0, 0(t0) bne s2, t0, loop jr ra loop: addi a0, s0, 8 # malloc: move heap limit to current + 8, 4 byte for integer, 4 byte for pointer addi a7, x0, 214 # brk systemcall ecall beq a0, x0, exit_error lw s1, 0(t1) # load array[i] addi t1, t1, 4 # i++ sw s1, 0(s0) # node->value = array[i] sw t2, 4(s0) # prev->next = node mv t2, s0 # prev = node addi s2, s2, 1 addi s0, s0, 8 bne s2, t0, loop # check end condition in for loop ret: addi s1, s0, -8 lw ra, 0(sp) # load return address addi sp, sp, 8 jr ra # ------ remove_elements ------ remove_elements: la s0, target lw s0, 0(s0) # s0 = target head_while: beq t1, x0, end_iterate # head == null lw t2, 0(t1) lw t3, 4(t1) # pre-load head.next bne t2, s0, if_head # head->val!=val mv t1, t3 j head_while if_head: beq t1, x0, end_iterate mv t3, t1 # p = head lw t4, 4(t1) # q = head->next while_q: beq t4, x0, end_iterate lw t2, 0(t4) # load q.val lw t5, 4(t4) # pre-load q.next lw t6, 4(t3) # pre-load p.next mv t4, t5 # q = q.next beq t2, s0, remove mv t3, t6 # p = p.next j while_q remove: sw t5, 4(t3) # p.next = q.next j while_q end_iterate: mv s1, t1 # push head back jr ra # ------ print_list ------ print_list: lw a0, 0(t1) # load value store in node lw t1, 4(t1) # load address store in node->next addi a7, x0, 1 # print number ecall bne t1, x0, print_list addi a0, x0, 10 # '\n' acsii number addi a7, x0, 11 # print char ecall jr ra exit_error: la a0, err_msg addi a7, x0, 4 # print error message ecall addi a7, x0, 10 # exit ecall # ------ assert_list ------ assert_list: la t0, answer assert_loop: beq t1, x0, end_aseert lw a0, 0(t1) # load value store in node lw a1, 0(t0) # load value store in answer[i] lw t1, 4(t1) # load address store in node->next addi t0, t0, 4 # i++ beq a0, a1, assert_loop la a0, assert_fail addi a7, x0, 4 # print message ecall jr ra end_aseert: la a0, assert_success addi a7, x0, 4 # print message ecall jr ra ``` ## Analysis #### Size of Assembly code: | Subject | -O0 | -O1 | -O2 | -O3 | -Os | -Ofast | |:-------:|:-----:|:-----:|:-----:|:----- | ----- |:------ | | text | 75596 | 75404 | 75428 | 75552 | 75340 | 75552 | | data | 2816 | 2816 | 2816 | 2816 | 2816 | 2816 | | bss | 812 | 812 | 812 | 812 | 812 | 812 | | dec | 79224 | 79032 | 79056 | 79180 | 78968 | 79180 | | hex | 13578 | 134b8 | 134d0 | 1354c | 13478 | 1354c | #### Observation: | Subject | -O0 | -O1 | -O2 | -O3 | -Os | -Ofast | |:---------------:|:----- |:----- |:----- |:----- |:----- |:------ | | LOC | 123 | 77 | 125 | 125 | 104 | 141 | | Register S | s0 | s0-s3 | s0-s3 | s0-s3 | s0-s2 | s0-s3 | | Register T | - | - | - | - | - | - | | Register A | a0-5 | a0-a5 | a0-a6 | a0-a6 | a0-a5 | a0-a6 | | Other Regs | sp,ra | sp,ra | sp,ra | sp,ra | sp,ra | sp,ra | | CSR cycle count | 5708 | 5489 | 5494 | 5494 | 5545 | 5494 | We could observe that in Os and Ofast optimization choices, it uses more registers for preloading data; although the size of the code(**LOC**) increase tremendously, the number of **CSR** cycle has decreased. #### Improvement ![](https://i.imgur.com/uXqDXjQ.png =290x) ![](https://i.imgur.com/wsblu4M.png =300x) By reducing unnecessary checking and tweaking assert branch, without change of functionality, we decrease cycles by up to 14% and improve IPC by 10%. ## Reference [Homework 2](https://hackmd.io/@sysprog/2022-arch-homework2) [CSR counter](https://five-embeddev.com/riscv-isa-manual/latest/counters.html) [Lab2 source codes](https://github.com/abnormal749/Computer_Arch2022/tree/main/Lab2)