# 2019q3 Homework4 (quiz4)
contributed by < `colinyoyo26` >
## 測驗 `1`
考慮一個單向 linked list:
```c
typedef struct __list {
    int data;
    struct __list *next;
} list;
```
在不存在環狀結構的狀況下,以下函式能夠對 linked list 元素從小到大排序:
```c
list *sort(list *start) {
    if (!start || !start->next)
        return start;
    list *left = start;
    list *right = left->next;
    LL0;
    left = sort(left);
    right = sort(right);
    for (list *merge = NULL; left || right; ) {
        if (!right || (left && left->data < right->data)) {
            if (!merge) {
                LL1;
            } else {
                LL2;
                merge = merge->next;
            }
            LL3;
        } else {
            if (!merge) {
                LL4;
            } else {
                LL5;
                merge = merge->next;
            }
            LL6;
        }
    }
    return start;
}
```
請補完程式碼。
:::success
延伸問題:
1. 解釋上述程式運作原理;
2. 指出程式改進空間,特別是考慮到 [Optimizing merge sort](https://en.wikipedia.org/wiki/Merge_sort#Optimizing_merge_sort);
3. 將上述 singly-linked list 擴充為 circular doubly-linked list 並重新實作對應的 `sort`;
4. 依循 Linux 核心 [include/linux/list.h](https://github.com/torvalds/linux/blob/master/include/linux/list.h) 程式碼的方式,改寫上述排序程式;
5. 嘗試將原本遞迴的程式改寫為 iterative 版本;
6. 將 [2019q1 第 3 週測驗題 (下)](https://hackmd.io/@sysprog/SkrVSKiU4) 提及的 XOR linked list 列入考量,也實作對應的排序程式,並比較效能和記憶體開銷;
7. 比照 [List Sort](https://arxiv.org/pdf/1310.7890.pdf) 的手法來分析前述程式 (含擴充版本) 的效能表現;
:::
==解題==
- 一開始這邊的實作比較特別一點,每次 divide 的點為第一個元素,所以可以知道 LL0 為 `left->next = NULL`
```c
list *left = start;
list *right = left->next;
LL0;
```
- 再來只有 merge 第一個元素時會進入 `if (!merge)` ,可以看出 LL1 為 `start = merge = left`
    - start 指向 merge 後第一個元素
- 接者不是第一個的情況只要把尾端元素的 `next` 指下一個元素就好 `merge->next = left
- 最後 left 被選到所以要往後移`left = left->nex`
- 右邊和左邊同理
```cpp
if (!right || (left && left->data < right->data)) {
    if (!merge) {
        LL1;
    } else {
        LL2;
        merge = merge->next;
    }
    LL3;
} 
```
### 程式改進空間
- 如同解題時提到的「每次 divide 的點為第一個元素」所以執行的情況其實會像 quick-sort 的 worst case 一樣( pivot 都選到極值),時間複雜度為 $O(n^2)$
    - `merge sort` 就是從中間切才有 $O(nlogn)$ 的穩定效能,但是這個實作沒做到這點
    - 其時間開銷寫成遞迴式: $T(n) = T(n - 1) + n$
    - 以下簡單圖例說明
- [ ] divide 過程




- [ ] merge 過程



:::warning
改用 GraphViz 重新製作上方 div/merge 圖例;
GraphViz 的描述內容可內嵌於 HackMD 頁面,請善用;
:notes: jserv
:::
- 用內嵌語法會有如下警告訊息跑出來,不知道是不是 bug
:::warning
TypeError: Y.renderString(...).then(...).catch(...).finally is not a function
:::
> 太好了,表示你可以提交錯誤回報: https://github.com/hackmdio/hackmd-io-issues/issues
> 之後 HackMD 改版就會有你的名字
- 為了驗證我的推論所以做了以下實驗 (擷取部份 code)
- merge.c
```c
int main(int argc, char** argv){
    size_t size = atoi(argv[1]);
    list head;
    struct timespec start, end;
    head.next = NULL;
    for (size_t i = 0; i < size; i++){
        list *item = malloc(sizeof(list));
        item->data = random();
        item->next = head.next;
        head.next = item;
    }
    clock_gettime(CLOCK_REALTIME, &start);
    head.next = sort(head.next);
    clock_gettime(CLOCK_REALTIME, &end);
    printf("%lu %lu\n", size, end.tv_nsec - start.tv_nsec );
    return 0;
}
```
- test.sh
```shell
#!/bin/bash
rm *.txt
gcc  -o m m.c
for i in $(seq 1 50 2000)
do
    ./m $i >> out.txt
done
gnuplot ./plot.gp
```
- 結果
    - 從數據及圖可以看出的確是 $O(n^2)$ 趨勢

- 部份實驗數據
|elments | time (ns)
|-----|-
| 1   | 119     |
| 101 | 28327   |
| 201 | 114818  |
| 301 | 221432  |
| 401 | 386574  |
| 501 | 598338  |
| 601 | 862473  |
| 701 | 1165719 |
| 801 | 1680010 |
| 901 | 1893432 |
 
- 為了更嚴謹一些,重新改寫了從中間 divide 的版本,看出效能確實有很大的差異

- 主要是多了這個迴圈,把 list 平均分到 left right
```c
    for (size_t i = 0; start; i++){
        tem = start;
        start = start->next;
        if (i & 1){
            tem->next = left.next;
            left.next = tem;
        } else {
            tem->next = right.next;
            right.next = tem;
        }
    }
```
- [Optimizing merge sort](https://en.wikipedia.org/wiki/Merge_sort#Optimizing_merge_sort) 提到因為 memory hierarchy 的關係,所以用 Cache-aware versions of the merge sort algorithm 減少 cache 資料搬動的次數,其中一個作法為分割到剛好滿足 cache 大小時就用 `in-place sorting algorithm` 排序
:::warning
[skip list](https://en.wikipedia.org/wiki/Skip_list) 是種變形,之前修課的學生對此進行研究: [共筆](https://hackmd.io/@Vy_ZuZdnSo-Wv4jSWgNmLw/B1OZFDL1H)
:notes: jserv
:::
- 試著在 size <= 64 時使用 insertion sort 且為了善用 spatial locality 把 divide 的迴圈改成以下形式
    - 差在 if 的條件,一開始沒注意到這點,所以實驗結果一直不如預期 
    - 雖然是 linked list 但是一開始插入節點時是循序的差入,每個節點的 virtual address 只相差 32 bytes
    - 待更新
```cpp
    for(size_t i = 0; start; i++){
        tem = start;
        start = start->next;
        if (i < size / 2){
            tem->next = left.next;
            left.next = tem;
        } else {
            tem->next = right.next;
            right.next = tem;
        }
    }
```
- 實驗結果

- insertion sort
```cpp
list *insert_sort(list *start){
    list **prev, **cur = &start, *tem;
    while(*cur){
        cur = &(*cur)->next;
        prev = &start;
        while(*cur && *prev != *cur){
            if((*prev)->data > (*cur)->data){
                tem = *cur;
                *cur = (*cur)->next;
                tem->next = *prev;
                *prev = tem;
                prev = &start;
                continue;
            }
            prev = &(*prev)->next;
        }
    }
    return start;
}
```
### circular doubly-linked list
- 實作了邏輯相同的 doublly linked 版本 
    - 因為 `list` 的操作會一直用到,所以寫成 function 
    - `replace` function 內一開始忘記把左右 node 只到新的 head 找了很久
- sort.c
```c
void sort(list *start) {
    /* empty or singular */
    if (start->next == start->prev)
        return;
    list left, right;
    init_list(&left);
    init_list(&right);  
    insert_tail(&left, pop_item(start->next));
    replace(start, &right);
    sort(&right);
    init_list(start);
    
    for (; !is_empty(&left) || !is_empty(&right); ) {
        if (is_empty(&right) || (!is_empty(&left) && first_data(&left) < first_data(&right))) 
            insert_tail(start, pop_item(left.next));
        else 
            insert_tail(start, pop_item(right.next));       
    }
}
```
- list.h
```cpp
typedef struct __list {
    int data;
    struct __list *next;
    struct __list *prev;
} list;
static inline void init_list(list* head){
    head->next = head;
    head->prev = head;
}
/* insert to tail of linked list */
static inline void insert_tail(list* head, list* target){
    list *prev = head->prev;
    target->next = head;
    target->prev = prev;
    prev->next = target;
    head->prev = target;
}
/* pop the certain item and return it */
static inline list* pop_item(list* item){
    list *next = item->next,
        *prev = item->prev;
    
    next->prev = prev;
    prev->next = next;
    return item;
}
static inline bool is_empty(list* head){
    return head->next == head;
}
/* return data of first entry*/
static inline int first_data(list* head){
    return head->next->data;
}
/* replace the head of list */
static inline void replace(list* origin, list* new){
    list *next = origin->next,
        *prev = origin->prev;
    *new = *origin;
    next->prev = new;
    prev->next = new;
}
```
### iterative version
- 相同邏輯的 iterative版本,差在是從左邊 merge 時間複雜度也是 O($n^2$)
```c
list *sort_iter(list *start) {
    if (!start || !start->next)
        return start;
    list *left = start;
    list *right = left->next;
    list *tem;
    left->next = NULL;
    do {
        tem = right->next;
        right->next = NULL;
        for (list *merge = NULL; left || right; ) {
            if (!right || (left && left->data < right->data)) {
                if (!merge) {
                    start = merge = left;
                } else {
                    merge->next = left;
                    merge = merge->next;
                }
                left = left->next;
            } else {
                if (!merge) {
                    start = merge = right;
                } else {
                    merge->next = right;
                    merge = merge->next;
                }
                right = right->next;
            }
        }
        left = start;
        right = tem;
    } while(tem);
    return start;
}
```
###  XOR linked list
- sort.c
```c
void sort(xlist* list) {
    if (!list || list->head == list->tail)
        return ;
    xlist left, right;
    xlist_init(&left);
    xlist_insert(&left, xlist_popfront(list));
    right = *list;
    sort(&right);
    xlist_init(list);
    
    for (; left.head || right.head; ) {
        if (!right.head || (left.head && first_data(&left) < first_data(&right)))
            xlist_insert(list, xlist_popfront(&left));
        else 
            xlist_insert(list, xlist_popfront(&right));
    }
}
```
- xlist.h
```c
typedef struct xor_node {
    int data;
    uint64_t xptr;
} xnode; 
typedef struct xor_list {
    struct xor_node *head, *tail;
} xlist;
static inline void xlist_init(xlist* list){
    list->head = list->tail = NULL;
}
/* use to trace the list */
static inline void xlist_next(xnode** cur, xnode** prev){
    xnode* tem = *cur;
    *cur = (xnode*) ((uint64_t) (*cur)->xptr ^ (uint64_t) *prev);
    *prev = tem;
}
/* pop and retrun first item */
static inline xnode* xlist_popfront(xlist* list){
    xnode *tem, *next = (xnode *) (list->head->xptr ^ (uint64_t) NULL);
    /* originally, next->xptr == list.head ^ next->next */ 
    if(next)
        next->xptr = next->xptr ^ (uint64_t) list->head ^ (uint64_t) NULL;
    tem = list->head; 
    list->head = next;
    return tem;
}
/* retrun data in first entry */
static inline int first_data(xlist* list){
    return ((xnode *)((uint64_t) list->head ^ (uint64_t) NULL))->data; 
}
/* insert tail */
static inline void xlist_insert(xlist* list, xnode* node){
    node->xptr = (uint64_t) list->tail ^ (uint64_t) NULL;
    if(!list->head){
        list->tail = node;
        list->head = node;
        return;
    }
    list->tail->xptr = list->tail->xptr ^ (uint64_t) NULL ^ (uint64_t) node;
    list->tail = node;
}   
```
- 比較和 doubly linked list 記憶體開銷 (size 100) 
    - 用 valgrind 觀測 heap 使用率
    - 看出用 doubly-linked list 每個 block 多了 8 bytes 也就是一個指標所佔的大小
- [ ] doubly-linked list
```shell
==5409== HEAP SUMMARY:
==5409==   in use at exit: 2,400 bytes in 100 blocks
==5409==   total heap usage: 101 allocs, 1 frees, 3,424 bytes allocated
```
- [ ] xor linked list
```shell
==5435== HEAP SUMMARY:
==5435==     in use at exit: 1,600 bytes in 100 blocks
==5435==   total heap usage: 101 allocs, 1 frees, 2,624 bytes allocated
```
### 效能分析
- 對結果不太意外, xor 的實做為了節省開銷需要更大的計算量,而 doubly linked list 原本就要比 sigle linked list 還需要更多的操作

:::danger
修正用語: (注意連字號)
* `singly-linked list (orig)`: 原本給定的程式碼
* `singly-linked list (fixed)`: 嘗試修正效能表現的程式碼
* `doubly-linked list`: 雙向鏈結串列
:::
- 已更正
## 參考資料
- [xor liked list](https://en.wikipedia.org/wiki/XOR_linked_list)
- [Merge sort](https://en.wikipedia.org/wiki/Merge_sort#Optimizing_merge_sort)