# 2019q3 Homework4 (quiz4) contributed by < `colinyoyo26` > ## 測驗 `1` 考慮一個單向 linked list: ```c typedef struct __list { int data; struct __list *next; } list; ``` 在不存在環狀結構的狀況下,以下函式能夠對 linked list 元素從小到大排序: ```c list *sort(list *start) { if (!start || !start->next) return start; list *left = start; list *right = left->next; LL0; left = sort(left); right = sort(right); for (list *merge = NULL; left || right; ) { if (!right || (left && left->data < right->data)) { if (!merge) { LL1; } else { LL2; merge = merge->next; } LL3; } else { if (!merge) { LL4; } else { LL5; merge = merge->next; } LL6; } } return start; } ``` 請補完程式碼。 :::success 延伸問題: 1. 解釋上述程式運作原理; 2. 指出程式改進空間,特別是考慮到 [Optimizing merge sort](https://en.wikipedia.org/wiki/Merge_sort#Optimizing_merge_sort); 3. 將上述 singly-linked list 擴充為 circular doubly-linked list 並重新實作對應的 `sort`; 4. 依循 Linux 核心 [include/linux/list.h](https://github.com/torvalds/linux/blob/master/include/linux/list.h) 程式碼的方式,改寫上述排序程式; 5. 嘗試將原本遞迴的程式改寫為 iterative 版本; 6. 將 [2019q1 第 3 週測驗題 (下)](https://hackmd.io/@sysprog/SkrVSKiU4) 提及的 XOR linked list 列入考量,也實作對應的排序程式,並比較效能和記憶體開銷; 7. 比照 [List Sort](https://arxiv.org/pdf/1310.7890.pdf) 的手法來分析前述程式 (含擴充版本) 的效能表現; ::: ==解題== - 一開始這邊的實作比較特別一點,每次 divide 的點為第一個元素,所以可以知道 LL0 為 `left->next = NULL` ```c list *left = start; list *right = left->next; LL0; ``` - 再來只有 merge 第一個元素時會進入 `if (!merge)` ,可以看出 LL1 為 `start = merge = left` - start 指向 merge 後第一個元素 - 接者不是第一個的情況只要把尾端元素的 `next` 指下一個元素就好 `merge->next = left - 最後 left 被選到所以要往後移`left = left->nex` - 右邊和左邊同理 ```cpp if (!right || (left && left->data < right->data)) { if (!merge) { LL1; } else { LL2; merge = merge->next; } LL3; } ``` ### 程式改進空間 - 如同解題時提到的「每次 divide 的點為第一個元素」所以執行的情況其實會像 quick-sort 的 worst case 一樣( pivot 都選到極值),時間複雜度為 $O(n^2)$ - `merge sort` 就是從中間切才有 $O(nlogn)$ 的穩定效能,但是這個實作沒做到這點 - 其時間開銷寫成遞迴式: $T(n) = T(n - 1) + n$ - 以下簡單圖例說明 - [ ] divide 過程 ![](https://i.imgur.com/eKlEXrl.jpg) ![](https://i.imgur.com/EC0fcrM.jpg) ![](https://i.imgur.com/ffwkprx.jpg) ![](https://i.imgur.com/ywSGUjB.jpg) - [ ] merge 過程 ![](https://i.imgur.com/1GzV3Pp.jpg) ![](https://i.imgur.com/BCIYWvZ.jpg) ![](https://i.imgur.com/xbwwsnu.jpg) :::warning 改用 GraphViz 重新製作上方 div/merge 圖例; GraphViz 的描述內容可內嵌於 HackMD 頁面,請善用; :notes: jserv ::: - 用內嵌語法會有如下警告訊息跑出來,不知道是不是 bug :::warning TypeError: Y.renderString(...).then(...).catch(...).finally is not a function ::: > 太好了,表示你可以提交錯誤回報: https://github.com/hackmdio/hackmd-io-issues/issues > 之後 HackMD 改版就會有你的名字 - 為了驗證我的推論所以做了以下實驗 (擷取部份 code) - merge.c ```c int main(int argc, char** argv){ size_t size = atoi(argv[1]); list head; struct timespec start, end; head.next = NULL; for (size_t i = 0; i < size; i++){ list *item = malloc(sizeof(list)); item->data = random(); item->next = head.next; head.next = item; } clock_gettime(CLOCK_REALTIME, &start); head.next = sort(head.next); clock_gettime(CLOCK_REALTIME, &end); printf("%lu %lu\n", size, end.tv_nsec - start.tv_nsec ); return 0; } ``` - test.sh ```shell #!/bin/bash rm *.txt gcc -o m m.c for i in $(seq 1 50 2000) do ./m $i >> out.txt done gnuplot ./plot.gp ``` - 結果 - 從數據及圖可以看出的確是 $O(n^2)$ 趨勢 ![](https://i.imgur.com/aoRNBD2.png) - 部份實驗數據 |elments | time (ns) |-----|- | 1 | 119 | | 101 | 28327 | | 201 | 114818 | | 301 | 221432 | | 401 | 386574 | | 501 | 598338 | | 601 | 862473 | | 701 | 1165719 | | 801 | 1680010 | | 901 | 1893432 | - 為了更嚴謹一些,重新改寫了從中間 divide 的版本,看出效能確實有很大的差異 ![](https://i.imgur.com/gqRg5ol.png) - 主要是多了這個迴圈,把 list 平均分到 left right ```c for (size_t i = 0; start; i++){ tem = start; start = start->next; if (i & 1){ tem->next = left.next; left.next = tem; } else { tem->next = right.next; right.next = tem; } } ``` - [Optimizing merge sort](https://en.wikipedia.org/wiki/Merge_sort#Optimizing_merge_sort) 提到因為 memory hierarchy 的關係,所以用 Cache-aware versions of the merge sort algorithm 減少 cache 資料搬動的次數,其中一個作法為分割到剛好滿足 cache 大小時就用 `in-place sorting algorithm` 排序 :::warning [skip list](https://en.wikipedia.org/wiki/Skip_list) 是種變形,之前修課的學生對此進行研究: [共筆](https://hackmd.io/@Vy_ZuZdnSo-Wv4jSWgNmLw/B1OZFDL1H) :notes: jserv ::: - 試著在 size <= 64 時使用 insertion sort 且為了善用 spatial locality 把 divide 的迴圈改成以下形式 - 差在 if 的條件,一開始沒注意到這點,所以實驗結果一直不如預期 - 雖然是 linked list 但是一開始插入節點時是循序的差入,每個節點的 virtual address 只相差 32 bytes - 待更新 ```cpp for(size_t i = 0; start; i++){ tem = start; start = start->next; if (i < size / 2){ tem->next = left.next; left.next = tem; } else { tem->next = right.next; right.next = tem; } } ``` - 實驗結果 ![](https://i.imgur.com/KbWQB7z.png) - insertion sort ```cpp list *insert_sort(list *start){ list **prev, **cur = &start, *tem; while(*cur){ cur = &(*cur)->next; prev = &start; while(*cur && *prev != *cur){ if((*prev)->data > (*cur)->data){ tem = *cur; *cur = (*cur)->next; tem->next = *prev; *prev = tem; prev = &start; continue; } prev = &(*prev)->next; } } return start; } ``` ### circular doubly-linked list - 實作了邏輯相同的 doublly linked 版本 - 因為 `list` 的操作會一直用到,所以寫成 function - `replace` function 內一開始忘記把左右 node 只到新的 head 找了很久 - sort.c ```c void sort(list *start) { /* empty or singular */ if (start->next == start->prev) return; list left, right; init_list(&left); init_list(&right); insert_tail(&left, pop_item(start->next)); replace(start, &right); sort(&right); init_list(start); for (; !is_empty(&left) || !is_empty(&right); ) { if (is_empty(&right) || (!is_empty(&left) && first_data(&left) < first_data(&right))) insert_tail(start, pop_item(left.next)); else insert_tail(start, pop_item(right.next)); } } ``` - list.h ```cpp typedef struct __list { int data; struct __list *next; struct __list *prev; } list; static inline void init_list(list* head){ head->next = head; head->prev = head; } /* insert to tail of linked list */ static inline void insert_tail(list* head, list* target){ list *prev = head->prev; target->next = head; target->prev = prev; prev->next = target; head->prev = target; } /* pop the certain item and return it */ static inline list* pop_item(list* item){ list *next = item->next, *prev = item->prev; next->prev = prev; prev->next = next; return item; } static inline bool is_empty(list* head){ return head->next == head; } /* return data of first entry*/ static inline int first_data(list* head){ return head->next->data; } /* replace the head of list */ static inline void replace(list* origin, list* new){ list *next = origin->next, *prev = origin->prev; *new = *origin; next->prev = new; prev->next = new; } ``` ### iterative version - 相同邏輯的 iterative版本,差在是從左邊 merge 時間複雜度也是 O($n^2$) ```c list *sort_iter(list *start) { if (!start || !start->next) return start; list *left = start; list *right = left->next; list *tem; left->next = NULL; do { tem = right->next; right->next = NULL; for (list *merge = NULL; left || right; ) { if (!right || (left && left->data < right->data)) { if (!merge) { start = merge = left; } else { merge->next = left; merge = merge->next; } left = left->next; } else { if (!merge) { start = merge = right; } else { merge->next = right; merge = merge->next; } right = right->next; } } left = start; right = tem; } while(tem); return start; } ``` ### XOR linked list - sort.c ```c void sort(xlist* list) { if (!list || list->head == list->tail) return ; xlist left, right; xlist_init(&left); xlist_insert(&left, xlist_popfront(list)); right = *list; sort(&right); xlist_init(list); for (; left.head || right.head; ) { if (!right.head || (left.head && first_data(&left) < first_data(&right))) xlist_insert(list, xlist_popfront(&left)); else xlist_insert(list, xlist_popfront(&right)); } } ``` - xlist.h ```c typedef struct xor_node { int data; uint64_t xptr; } xnode; typedef struct xor_list { struct xor_node *head, *tail; } xlist; static inline void xlist_init(xlist* list){ list->head = list->tail = NULL; } /* use to trace the list */ static inline void xlist_next(xnode** cur, xnode** prev){ xnode* tem = *cur; *cur = (xnode*) ((uint64_t) (*cur)->xptr ^ (uint64_t) *prev); *prev = tem; } /* pop and retrun first item */ static inline xnode* xlist_popfront(xlist* list){ xnode *tem, *next = (xnode *) (list->head->xptr ^ (uint64_t) NULL); /* originally, next->xptr == list.head ^ next->next */ if(next) next->xptr = next->xptr ^ (uint64_t) list->head ^ (uint64_t) NULL; tem = list->head; list->head = next; return tem; } /* retrun data in first entry */ static inline int first_data(xlist* list){ return ((xnode *)((uint64_t) list->head ^ (uint64_t) NULL))->data; } /* insert tail */ static inline void xlist_insert(xlist* list, xnode* node){ node->xptr = (uint64_t) list->tail ^ (uint64_t) NULL; if(!list->head){ list->tail = node; list->head = node; return; } list->tail->xptr = list->tail->xptr ^ (uint64_t) NULL ^ (uint64_t) node; list->tail = node; } ``` - 比較和 doubly linked list 記憶體開銷 (size 100) - 用 valgrind 觀測 heap 使用率 - 看出用 doubly-linked list 每個 block 多了 8 bytes 也就是一個指標所佔的大小 - [ ] doubly-linked list ```shell ==5409== HEAP SUMMARY: ==5409== in use at exit: 2,400 bytes in 100 blocks ==5409== total heap usage: 101 allocs, 1 frees, 3,424 bytes allocated ``` - [ ] xor linked list ```shell ==5435== HEAP SUMMARY: ==5435== in use at exit: 1,600 bytes in 100 blocks ==5435== total heap usage: 101 allocs, 1 frees, 2,624 bytes allocated ``` ### 效能分析 - 對結果不太意外, xor 的實做為了節省開銷需要更大的計算量,而 doubly linked list 原本就要比 sigle linked list 還需要更多的操作 ![](https://i.imgur.com/Lrw9LvK.png) :::danger 修正用語: (注意連字號) * `singly-linked list (orig)`: 原本給定的程式碼 * `singly-linked list (fixed)`: 嘗試修正效能表現的程式碼 * `doubly-linked list`: 雙向鏈結串列 ::: - 已更正 ## 參考資料 - [xor liked list](https://en.wikipedia.org/wiki/XOR_linked_list) - [Merge sort](https://en.wikipedia.org/wiki/Merge_sort#Optimizing_merge_sort)