kdnvt
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    1
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    # Assignment1: RISC-V Assembly and Instruction Pipeline contributed by < [`kdnvt`](https://github.com/kdnvt/Palindrome-linked-list) > :information_source: There are two branches in the [GitHub repository](https://github.com/kdnvt/Palindrome-linked-list). While `main` is the straightforward version, `unrolling` use the technique of [loop unrolling](https://en.wikipedia.org/wiki/Loop_unrolling) to optimize the code. ## Palindrome Linked List ([LeetCode 234](https://leetcode.com/problems/palindrome-linked-list/)) > Description: Given the head of a singly linked list, return true if it is a [palindrome](https://en.wikipedia.org/wiki/Palindrome) or false otherwise. > > Constraints: The number of nodes in the list is in the range [1, 105]. 0 <= Node.val <= 9 > >Follow up: Could you do it in $O(n)$ time and $O(1)$ space? ## Solution ### Idea for problem solving Given the condition of $O(1)$ space, I think the modification of original linked list is inevitable. My basic idea is to reverse the linked list node one by one until middle node, and compare the first half list, which has already been reversed, with the second half list. If two linked list are the same, the original linked list is palindrome linked list. ### C program #### First version ```c bool isPalindrome(struct ListNode* head) { struct ListNode *tmp,*cur = head, *next = head->next; int length= 0; for(;cur!=NULL;cur = cur->next,length++); if(length<2) return true; cur = head; for(int i=0;i<length/2 - 1;i++){ tmp = next->next; next->next = cur; cur = next; next = tmp; } if (length%2 ){ next = next->next; } for(int i=0;i<length/2 - 1;i++){ if(cur->val != next->val) break; cur = cur->next; next = next->next; } if(cur->val == next->val) return true; return false; } ``` Because getting the length of linked list directly is impossible, my first intuition is to traverse the whole linked list, get the length of it, and calculate how many nodes should I reverse. After that, I can reverse the corresponding number of nodes, and compare the lists. #### Second version ```c bool isPalindrome(struct ListNode* head) { struct ListNode *tmp,*cur = head, *next = head->next,*tail = cur; if(!head->next) return true; while(tail!=NULL) { tail = tail->next; if (!tail) { cur = cur->next; break; } tail = tail->next; if (!tail) break; tmp = next->next; next->next = cur; cur = next; next = tmp; } while(next->next) { if(cur->val != next->val) break; cur = cur->next; next = next->next; } if(cur->val == next->val) return true; return false; } ``` After the first version, I was thinking of whether there is a better way to solve this problem. It occurs to me that one of solutions to [remove middle node of linked list](https://leetcode.com/problems/middle-of-the-linked-list/) uses the fast-slow pointers technique. #### Third version ```c bool isPalindrome(struct ListNode* head) { struct ListNode *tmp, *prev = NULL,*cur = head, *tail = cur; if (!head->next) return true; while(tail && tail->next) { tail = tail->next->next; tmp = cur->next; cur->next = prev; prev = cur; cur = tmp; } if (tail) // length is odd number cur = cur->next; while (cur->next) { if(cur->val != prev->val) break; cur = cur->next; prev = prev->next; } if (cur->val == prev->val) return true; return false; } ``` Same thought as second version, but a more concise one. #### Fourth version ```c while (cur) { if (cur->val != prev->val) return false; cur = cur->next; prev = prev->next; } return true; ``` By simplifying the comparing while loop, it could be much more consice. ### Assembly ```c .data list : .word 1,0,2,0,2,0,1,0 list2: .word 1,0,2,0,3,0,1,0 list3: .word 1,0,2,0,3,0,2,0,1,0 list4: .word 1,0,2,0,3,0,3,0,1,0 list5: .word 1,0 .text main: la s4, list # s4 stores the head of the target list # the process to link the list. # list5 don't need this section. addi t0, s4, 8 sw t0, 4(s4) addi t0, s4, 16 sw t0, 12(s4) addi t0, s4, 24 sw t0, 20(s4) # addi t0, s4, 32 # for length 5 list # sw t0, 28(s4) # for length 5 list # end of linking section add a0, s4, zero jal ra, isPalindrome j print isPalindrome: addi sp,sp,-16 # push s0~s4 sw s0,0(sp) sw s1,4(sp) sw s2,8(sp) sw s3,12(sp) add s0, zero, zero # s0 prev = NULL; add s1, a0, zero # s1 cur = head; add s2, a0, zero # s2 = tail = head; lw t0, 4(a0) # t0 = head->next; beq t0,zero, true loop1: beq s2, zero, end1 # while(tail) lw t0, 4(s2) # t0 = tail->next; beq t0, zero, end1 # while(tail->next) lw t1, 4(t0) # t1 = tail->next->next; add s2, t1, zero # tail = tail->next->next; lw s3, 4(s1) # s3 = tmp = cur->next; sw s0, 4(s1) # cur->next = prev; add s0, s1, zero # prev = cur; add s1, s3, zero # cur = tmp; j loop1 end1: beq s2,zero,loop2 lw s1, 4(s1) # cur = cur->next loop2: lw t0, 4(s1) # t0 = cur->next; beq t0,zero,end2 # while(cur->next) lw t0, 0(s0) # t0 = prev->val; lw t1, 0(s1) # t1 = cur->val; bne t0,t1, end2 # if(cur->val != prev->val) lw s0, 4(s0) # prev = prev->next; lw s1, 4(s1) # cur = cur->next; j loop2 # end2: lw t0, 0(s1) lw t1, 0(s0) beq t0,t1, true #if(cur->val == prev->val) return true add a0,zero,zero #return false j pop true: addi a0,zero,1 #return true pop: lw s0,0(sp) lw s1,4(sp) lw s2,8(sp) lw s3,12(sp) addi sp,sp,16 # pop s0~s4 jr ra print: li a7,1 ecall exit: ``` There are 5 test data in this program: * `list` : Palindrome list with even number length. * `list2` : Non-palindrome list with even number length. * `list3` : Palindrome list with odd number length. * `list4` : Non-palindrome list with odd number length. * `list5` : Single node list.(palindrome) #### experiment results The 5-stage in-order processor with hazard detection/elimination and forwarding gets correct results on each of test data. However, when using the processor without hazard detection/elimination, the simulator says "Unknown system call in register 'a7':0 ". It obvious that there exists hazard around the "ecall" instruction. Compare the pipeline diagram of two version : With hazard detection/elimination: | Instruction | | | | | | | | | |:------------- | --- | --- | --- | --- | --- | --- | --- | --- | | addi x17 x0 1 | IF | ID | EX | MEM | WB | | | | | ecall | | IF | ID | EX | - | - | MEM | WB | Without hazard detection/elimination: | Instruction | | | | | | | |:------------- | --- | --- | --- | --- | --- | --- | | addi x17 x0 1 | IF | ID | EX | MEM | WB | | | ecall | | IF | ID | EX | MEM | WB | Finally, I add three nop operations before ecall, so the processor without hazard detection can use the ecall. (two nop aren't enough) | Instruction | | | | | | | | | | |:------------- | --- | --- | --- | --- | --- | --- | --- | --- | --- | | addi x17 x0 1 | IF | ID | EX | MEM | WB | | | | | | addi x0 x0 0 | | IF | ID | EX | MEM | WB | | | | | addi x0 x0 0 | | | IF | ID | EX | MEM | WB | | | | addi x0 x0 0 | | | | IF | ID | EX | MEM | WB | | | ecall | | | | | IF | ID | EX | MEM | WB | After I fixed the ecall, nevertheless, the processor without hazard detection still got the wrong result. It seems like there are other hazards. #### Potential data hazard When comparing two pipeline diagrams, I can find many hazards which are eliminated by the standard processor. Standard: | Instruction | | | | | | | | | ------------- | --- | --- | --- | --- | --- | --- | --- | | lw x5 4 x10 | IF | ID | EX | MEM | WB | | | | beq x5 x0 104 | IF | ID | - | EX | MEM | WB | | | beq x18 x0 40 | | IF | - | ID | EX | MEM | WB | without hazard detection: | Instruction | | | | | | | | | ------------- | --- | --- | --- | --- | --- | --- | --- | | lw x5 4 x10 | IF | ID | EX | MEM | WB | | | | beq x5 x0 104 | | IF | ID | EX | MEM | WB | | | beq x18 x0 40 | | | IF | ID | EX | MEM | WB | As we can see, there is a RAW(Read After Write) data hazard on x5 register. ![](https://i.imgur.com/DT42F0O.png) This diagram is the state before stalling. `beq x5 x0 104 <true>` is in the ID stage, while `lw x5 4 x10` is in the EXE stage. ![](https://i.imgur.com/yVTiH6z.png) This diagram presents the state when stalling happens. The `lw` instruction read the `x5` register, the value is 0x10000008 on the diagram, which is different from the previous value 0x10000018 `beq` just read from `x5`. ![](https://i.imgur.com/EqHZbBu.png) This diagram show the next state. In this state, `lw` is in the WB stage, which means it will write back the new value to register `x5`. As we can see, the `Wr_En` signal in Registers file is green, which means it is enabled. Aside from writing back, it is also "forwarding" the new value to `beq` in EXE stage. The diagram below shows the forwarding path. `beq` in EXE stage compares the two value from registers `x5`, `x0`, and decides whether branches or not. Two diagrams below shows the data path of register value. The first diagram shows the data path of `x5` from forwarding `lw` instruction, while the second diagram shows the data path of `x0`. `beq` then compares the two value at the Branch unit which is shown at the bottom part of diagram. In this example, `x5` is 0x10000008 and `x0` is 0x00000000, so the branch isn't taken. As we can see, the `Branch_taken` signal is red. ![](https://i.imgur.com/avEnJLe.png) This diagram shows `x5` data path. ![](https://i.imgur.com/2nM78aY.png) This diagram shows `x0` data path. ![](https://i.imgur.com/00RiGtn.png) This diagram shows another operation `beq` do in EXE stage, calculating the branch address. The upper value 0x00000050 is the instruction address of `beq`, while the below value 0x00000068 is the immediate field specified in instruction, which is the same to 104(decimal) in the `beq x5 x0 104`. `beq` then adds the two value and generate the target address 0x000000b8. In this example, the branch isn't taken. ## Assembly optimization ### Use temporary registers Because `isPalindrome` doesn't call any other function, it is feasible to replace all callee-saved register with caller-save register, so we don't need to push them on the stack everytime the function being called. ```c isPalindrome: add t4, zero, zero # t4 prev = NULL; add t5, a0, zero # t5 cur = head; add t2, a0, zero # t2 = tail = head; lw t0, 4(a0) # t0 = head->next; beq t0,zero, true loop1: beq t2, zero, end1 # while(tail) lw t0, 4(t2) # t0 = tail->next; beq t0, zero, end1 # while(tail->next) lw t1, 4(t0) # t1 = tail->next->next; add t2, t1, zero # tail = tail->next->next; lw t3, 4(t5) # t3 = tmp = cur->next; sw t4, 4(t5) # cur->next = prev; add t4, t5, zero # prev = cur; add t5, t3, zero # cur = tmp; j loop1 end1: beq t2,zero,loop2 lw t5, 4(t5) # cur = cur->next loop2: lw t0, 4(t5) # t0 = cur->next; beq t0,zero,end2 # while(cur->next) lw t0, 0(t4) # t0 = prev->val; lw t1, 0(t5) # t1 = cur->val; bne t0,t1, end2 # if(cur->val != prev->val) lw t4, 4(t4) # prev = prev->next; lw t5, 4(t5) # cur = cur->next; j loop2 # end2: lw t0, 0(t5) lw t1, 0(t4) beq t0,t1, true #if(cur->val == prev->val) return true add a0,zero,zero #return false j pop true: addi a0,zero,1 #return true pop: jr ra ``` ### Avoid redundant branch In the comparing while loop of `isPalindrome`, there are several different branch. By referencing the fourth edition of my C program, I obviate the redundant branches and make it more concise. ```c isPalindrome: add t4, zero, zero # t4 prev = NULL; add t5, a0, zero # t5 cur = head; add t2, a0, zero # t2 = tail = head; lw t0, 4(a0) # t0 = head->next; beq t0, zero, end2 loop1: beq t2, zero, end1 # while(tail) lw t0, 4(t2) # t0 = tail->next; beq t0, zero, end1 # while(tail->next) lw t1, 4(t0) # t1 = tail->next->next; add t2, t1, zero # tail = tail->next->next; lw t3, 4(t5) # t3 = tmp = cur->next; sw t4, 4(t5) # cur->next = prev; add t4, t5, zero # prev = cur; add t5, t3, zero # cur = tmp; j loop1 end1: add a0,zero,zero # if two list are different # the loop will jump to end # and the a0 will be 0. beq t2,zero,loop2 lw t5, 4(t5) # cur = cur->next loop2: beq t5,zero,end2 # while(cur) lw t0, 0(t4) # t0 = prev->val; lw t1, 0(t5) # t1 = cur->val; bne t0,t1, fail # if(cur->val != prev->val) lw t4, 4(t4) # prev = prev->next; lw t5, 4(t5) # cur = cur->next; j loop2 # end2: addi a0,zero,1 # if loop terminates normally(cur == NULL). # this line will be executed. fail: jr ra ``` ### Loop unrolling I think this one is interesting, because it doesn't reduce the number of instructions in the instruction memory. On the contrary, it increases the number. However, except for the disadvantage on the code size and readability, it may actually improve the performance of the program by reducing branch and utilizing pipelining. The reason I want to use loop unrolling on this program is that Ripes always takes next instruction as predicted one, which means the loop will take the wrong branch most of the time. Therefore, it is a good opportunity to use loop unrolling, because it reduce the number of branch. In order to demonstrate the impact of loop unrolling, I have to build a long linked list. I introduce some new function shown as below: ```c ################ generate ################ generate: # a0 number addi sp,sp,-8 sw s0,0(sp) sw ra,4(sp) add s0,zero,a0 slli s0,s0,3 # 8bytes per node, 2^3 = 8, s0 is the size of whole momery add a0,s0,zero jal ra, sbrk la t0,heap_base lw t0, 0(t0) add a0,t0,zero addi t0,t0,4 # t0 is the address of "next" of each node add t1, t0,s0 # t1 is the boundry gen_loop: bgt t0,t1,gen_end addi t2,t0,4 sw t2,0(t0) # node->next pointing to next node sw zero,-4(t0) # node->val = 0 addi t0,t0,8 # next node j gen_loop gen_end: sw zero,-8(t0) # make final node pointing to null lw s0,0(sp) lw ra,4(sp) addi sp,sp,8 jr ra ################ sbrk ################ sbrk: # a0 incre_size la t0,heap_end lw t1,0(t0) add a0,t1,a0 sw a0,0(t0) add t1,a0,zero li a7,214 ecall # Todo : consider brk fail add a0,t1,zero jr ra ``` Ripes doesn't support `sbrk` system call, so I use `brk` to implement a similar function. The function `generate` uses sbrk to require a block of memory, and use that to generate an arbitrary length linked list. The length is assigned from the first parameter passed in this function in register `a0`. I plan to experiment three cases: 1. without unrolling 2. unrolling all loops two times 3. unrolling all loops four times All loops includes loop1, loop2, gen_loop. Takes loop2 for example. Unrolling 2 times: ```c loop2: beq t5,zero,end2 # while(cur) lw t0, 0(t4) # t0 = prev->val; lw t1, 0(t5) # t1 = cur->val; bne t0,t1, fail # if(cur->val != prev->val) lw t4, 4(t4) # prev = prev->next; lw t5, 4(t5) # cur = cur->next; beq t5,zero,end2 # while(cur) lw t0, 0(t4) # t0 = prev->val; lw t1, 0(t5) # t1 = cur->val; bne t0,t1, fail # if(cur->val != prev->val) lw t4, 4(t4) # prev = prev->next; lw t5, 4(t5) # cur = cur->next; j loop2 # ``` Unrolling 4 times: ```c loop2: beq t5,zero,end2 # while(cur) lw t0, 0(t4) # t0 = prev->val; lw t1, 0(t5) # t1 = cur->val; bne t0,t1, fail # if(cur->val != prev->val) lw t4, 4(t4) # prev = prev->next; lw t5, 4(t5) # cur = cur->next; beq t5,zero,end2 # while(cur) lw t0, 0(t4) # t0 = prev->val; lw t1, 0(t5) # t1 = cur->val; bne t0,t1, fail # if(cur->val != prev->val) lw t4, 4(t4) # prev = prev->next; lw t5, 4(t5) # cur = cur->next; beq t5,zero,end2 # while(cur) lw t0, 0(t4) # t0 = prev->val; lw t1, 0(t5) # t1 = cur->val; bne t0,t1, fail # if(cur->val != prev->val) lw t4, 4(t4) # prev = prev->next; lw t5, 4(t5) # cur = cur->next; beq t5,zero,end2 # while(cur) lw t0, 0(t4) # t0 = prev->val; lw t1, 0(t5) # t1 = cur->val; bne t0,t1, fail # if(cur->val != prev->val) lw t4, 4(t4) # prev = prev->next; lw t5, 4(t5) # cur = cur->next; j loop2 # ``` I try to pass a linked list with 100 nodes to `isPalindrome`, and compare the difference between cases. Here's the results: Without unrolling: ![](https://i.imgur.com/aQbn7Uw.png) Unrolling two times: ![](https://i.imgur.com/kuJyVAZ.png) Unrolling four times: ![](https://i.imgur.com/4M4TJAS.png) As we can see, for a linked list with 100 nodes, the number of cycles drops from 2098 to 1680 with four times unrolling. ### Obviate data hazards by reordering instructions There are several data hazards in this program, which leads to stalling of pipeline and reduces the performance. Here, I tried to reorder instructions and avoid the hazards. I didn't deal with all of hazards but only those in the loops, because the process runs loops most of the time. ```c loop1: lw t0, 4(t2) # t0 = tail->next; beq t2, zero, end1 # while(tail) beq t0, zero, end1 # while(tail->next) lw t1, 4(t0) # t1 = tail->next->next; lw t3, 4(t5) # t3 = tmp = cur->next; add t2, t1, zero # tail = tail->next->next; sw t4, 4(t5) # cur->next = prev; add t4, t5, zero # prev = cur; add t5, t3, zero # cur = tmp; j loop1 end1: ``` ```c loop2: beq t5,zero,end2 # while(cur) lw t0, 0(t4) # t0 = prev->val; lw t1, 0(t5) # t1 = cur->val; lw t4, 4(t4) # prev = prev->next; lw t5, 4(t5) # cur = cur->next; bne t0,t1, fail # if(cur->val != prev->val) j loop2 # end2: ``` This figure below shows the pipeline diagram of loop2 without reordering: ![](https://i.imgur.com/iExHPUn.png) This figure below shows the pipeline diagram of loop2 with reordering: ![](https://i.imgur.com/75476cO.png) As we can see, the one stalling in each loop is eliminated. I combine instructions reordering with loop unrolling, and measures the performance on linked list with 100 nodes. Without unrolling: ![](https://i.imgur.com/Lflig3a.png) Unrolling with two times: ![](https://i.imgur.com/TKPoHfl.png) Unrolling with four times: ![](https://i.imgur.com/lUkGOfM.png) ### Eliminate redundant branch instructions by strip mining like technique :::warning There are some updates in the program. I remove the `gen_loop` in `generate` function, and uses a new function `connect` instead. The function connects the nodes in a continuous memory block. The new loop `con_loop` in `connect` have nearly the same effect as original gen_loop. **For the following examples, I will use con_loop instead of gen_loop.** To see the newest version of code, please view the following link: https://github.com/kdnvt/Palindrome-linked-list It contains two different branch: `main` and `unrolling`. The main is the standard version, while `unrolling` use the loop unrolling technique. ::: There are some branch instrucions in unrolling con_loop: ```c con_loop: bge t0,a1,con_end addi t1,t0,4 # t1 = address of next node sw t1,0(t0) # node->next pointing to next node addi t0,t0,8 # next node bge t0,a1,con_end addi t1,t0,4 # t1 = address of next node sw t1,0(t0) # node->next pointing to next node addi t0,t0,8 # next node bge t0,a1,con_end addi t1,t0,4 # t1 = address of next node sw t1,0(t0) # node->next pointing to next node addi t0,t0,8 # next node bge t0,a1,con_end addi t1,t0,4 # t1 = address of next node sw t1,0(t0) # node->next pointing to next node addi t0,t0,8 # next node j con_loop con_end: ``` When I unroll the loop four times, the number of `bge` instruction increase correspondingly. I think these instrucions is redundant and want to eliminate increased instructions and make each iteration body have only one `bge` instruction. The solution I want to use is similar to strip mining in compilers for vector processors, which splits original loop into two new loops. One runs iterations without unrolling, while another runs iterations with unrolling. Let n be number of iterations, and k be the times of loop unrolling. First execute n%k iterations in original loop, and then execute remaining iterations in k unrolling loop. Process can determine whether to take the branch or not on the head of loop body, and obviate the checking for the rest of body, since the the remaining iterations must be multiple of k. The optimized code is shown as below: ```c andi t2,a1,3 # t2 = number of nodes % 4 (loop unrolling times) addi t0,a0,4 # t1 = &first_node->next slli a1,a1,3 # a1 = size of list add a1,a1,t0 # a1 = boundry con_loop_normal: beq t2,zero,con_loop addi t1,t0,4 # t1 = address of next node sw t1,0(t0) # node->next pointing to next node addi t2,t2,-1 addi t0,t0,8 # next node j con_loop_normal con_loop: bge t0,a1,con_end addi t1,t0,4 # t1 = address of next node sw t1,0(t0) # node->next pointing to next node addi t0,t0,8 # next node addi t1,t0,4 # t1 = address of next node sw t1,0(t0) # node->next pointing to next node addi t0,t0,8 # next node addi t1,t0,4 # t1 = address of next node sw t1,0(t0) # node->next pointing to next node addi t0,t0,8 # next node addi t1,t0,4 # t1 = address of next node sw t1,0(t0) # node->next pointing to next node addi t0,t0,8 # next node j con_loop con_end: ``` I measured the number of cycles and instructions retired with linked list with 500 nodes. Without unrolling: ![](https://i.imgur.com/9PWzLXk.png) With unrolling four times: ![](https://i.imgur.com/w1Itwe7.png) With unrolling four times and strip mining on con_loop: ![](https://i.imgur.com/5x8FOFd.png) It is interesting that unrolling with strip mining got fewer cycles and instrucions retired but a higher CPI than unrolling without strip mining, which is a good example to demonstrate that higher CPI doesn't always means better performance. After optimize the `con_loop`, I try to use the same technique to optimize loop1 and loop2. Unfortunately, since we don't know the exact length of linked list at the beginning, it is infeasible to apply strip mining on loop1. However, if we count the number of iterations of loop1, we can optimize the loop2. Nonetheless, the effect of this optimization is not very well, because we need to add extra instrucions in loop1 and loop2 to count. Theoretically, the effect would be more obvious with much longer list or much larger number of unrolling. The optimized code is shown as below: ```c li t6, 0 # t6 = 0; used to count the number of loop # ... loop1: lw t0, 4(t2) # t0 = tail->next; beq t2, zero, end1 # while(tail) beq t0, zero, end1 # while(tail->next) lw t1, 4(t0) # t1 = tail->next->next; lw t3, 4(t5) # t3 = tmp = cur->next; add t2, t1, zero # tail = tail->next->next; sw t4, 4(t5) # cur->next = prev; add t4, t5, zero # prev = cur; add t5, t3, zero # cur = tmp; # ... # ... # unrolling loops ... # ... # ... lw t0, 4(t2) # t0 = tail->next; beq t2, zero, end1 # while(tail) beq t0, zero, end1 # while(tail->next) lw t1, 4(t0) # t1 = tail->next->next; lw t3, 4(t5) # t3 = tmp = cur->next; add t2, t1, zero # tail = tail->next->next; sw t4, 4(t5) # cur->next = prev; add t4, t5, zero # prev = cur; add t5, t3, zero # cur = tmp; addi t6,t6,1 # t6++; j loop1 end1: add a0,zero,zero # if two list are different # the loop will jump to end # and the a0 will be 0. beq t2,zero,loop2 lw t5, 4(t5) # cur = cur->next loop2: beq t6,zero,ori_loop2 # while there are unrolling iterations lw t0, 0(t4) # t0 = prev->val; lw t1, 0(t5) # t1 = cur->val; lw t4, 4(t4) # prev = prev->next; lw t5, 4(t5) # cur = cur->next; bne t0,t1, fail # if(cur->val != prev->val) lw t0, 0(t4) # t0 = prev->val; lw t1, 0(t5) # t1 = cur->val; lw t4, 4(t4) # prev = prev->next; lw t5, 4(t5) # cur = cur->next; bne t0,t1, fail # if(cur->val != prev->val) lw t0, 0(t4) # t0 = prev->val; lw t1, 0(t5) # t1 = cur->val; lw t4, 4(t4) # prev = prev->next; lw t5, 4(t5) # cur = cur->next; bne t0,t1, fail # if(cur->val != prev->val) lw t0, 0(t4) # t0 = prev->val; lw t1, 0(t5) # t1 = cur->val; lw t4, 4(t4) # prev = prev->next; lw t5, 4(t5) # cur = cur->next; bne t0,t1, fail # if(cur->val != prev->val) addi t6,t6,-1 j loop2 ori_loop2: beq t5,zero,end2 # while(cur) lw t0, 0(t4) # t0 = prev->val; lw t1, 0(t5) # t1 = cur->val; lw t4, 4(t4) # prev = prev->next; lw t5, 4(t5) # cur = cur->next; bne t0,t1, fail # if(cur->val != prev->val) j ori_loop2 end2: ``` This optimization with 500 nodes list is shown as below: ![](https://i.imgur.com/zozlcGq.png)

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully