# 2019q3 Homework4 (quiz4)
contributed by < `colinyoyo26` >
## 測驗 `1`
考慮一個單向 linked list:
```c
typedef struct __list {
int data;
struct __list *next;
} list;
```
在不存在環狀結構的狀況下,以下函式能夠對 linked list 元素從小到大排序:
```c
list *sort(list *start) {
if (!start || !start->next)
return start;
list *left = start;
list *right = left->next;
LL0;
left = sort(left);
right = sort(right);
for (list *merge = NULL; left || right; ) {
if (!right || (left && left->data < right->data)) {
if (!merge) {
LL1;
} else {
LL2;
merge = merge->next;
}
LL3;
} else {
if (!merge) {
LL4;
} else {
LL5;
merge = merge->next;
}
LL6;
}
}
return start;
}
```
請補完程式碼。
:::success
延伸問題:
1. 解釋上述程式運作原理;
2. 指出程式改進空間,特別是考慮到 [Optimizing merge sort](https://en.wikipedia.org/wiki/Merge_sort#Optimizing_merge_sort);
3. 將上述 singly-linked list 擴充為 circular doubly-linked list 並重新實作對應的 `sort`;
4. 依循 Linux 核心 [include/linux/list.h](https://github.com/torvalds/linux/blob/master/include/linux/list.h) 程式碼的方式,改寫上述排序程式;
5. 嘗試將原本遞迴的程式改寫為 iterative 版本;
6. 將 [2019q1 第 3 週測驗題 (下)](https://hackmd.io/@sysprog/SkrVSKiU4) 提及的 XOR linked list 列入考量,也實作對應的排序程式,並比較效能和記憶體開銷;
7. 比照 [List Sort](https://arxiv.org/pdf/1310.7890.pdf) 的手法來分析前述程式 (含擴充版本) 的效能表現;
:::
==解題==
- 一開始這邊的實作比較特別一點,每次 divide 的點為第一個元素,所以可以知道 LL0 為 `left->next = NULL`
```c
list *left = start;
list *right = left->next;
LL0;
```
- 再來只有 merge 第一個元素時會進入 `if (!merge)` ,可以看出 LL1 為 `start = merge = left`
- start 指向 merge 後第一個元素
- 接者不是第一個的情況只要把尾端元素的 `next` 指下一個元素就好 `merge->next = left
- 最後 left 被選到所以要往後移`left = left->nex`
- 右邊和左邊同理
```cpp
if (!right || (left && left->data < right->data)) {
if (!merge) {
LL1;
} else {
LL2;
merge = merge->next;
}
LL3;
}
```
### 程式改進空間
- 如同解題時提到的「每次 divide 的點為第一個元素」所以執行的情況其實會像 quick-sort 的 worst case 一樣( pivot 都選到極值),時間複雜度為 $O(n^2)$
- `merge sort` 就是從中間切才有 $O(nlogn)$ 的穩定效能,但是這個實作沒做到這點
- 其時間開銷寫成遞迴式: $T(n) = T(n - 1) + n$
- 以下簡單圖例說明
- [ ] divide 過程




- [ ] merge 過程



:::warning
改用 GraphViz 重新製作上方 div/merge 圖例;
GraphViz 的描述內容可內嵌於 HackMD 頁面,請善用;
:notes: jserv
:::
- 用內嵌語法會有如下警告訊息跑出來,不知道是不是 bug
:::warning
TypeError: Y.renderString(...).then(...).catch(...).finally is not a function
:::
> 太好了,表示你可以提交錯誤回報: https://github.com/hackmdio/hackmd-io-issues/issues
> 之後 HackMD 改版就會有你的名字
- 為了驗證我的推論所以做了以下實驗 (擷取部份 code)
- merge.c
```c
int main(int argc, char** argv){
size_t size = atoi(argv[1]);
list head;
struct timespec start, end;
head.next = NULL;
for (size_t i = 0; i < size; i++){
list *item = malloc(sizeof(list));
item->data = random();
item->next = head.next;
head.next = item;
}
clock_gettime(CLOCK_REALTIME, &start);
head.next = sort(head.next);
clock_gettime(CLOCK_REALTIME, &end);
printf("%lu %lu\n", size, end.tv_nsec - start.tv_nsec );
return 0;
}
```
- test.sh
```shell
#!/bin/bash
rm *.txt
gcc -o m m.c
for i in $(seq 1 50 2000)
do
./m $i >> out.txt
done
gnuplot ./plot.gp
```
- 結果
- 從數據及圖可以看出的確是 $O(n^2)$ 趨勢

- 部份實驗數據
|elments | time (ns)
|-----|-
| 1 | 119 |
| 101 | 28327 |
| 201 | 114818 |
| 301 | 221432 |
| 401 | 386574 |
| 501 | 598338 |
| 601 | 862473 |
| 701 | 1165719 |
| 801 | 1680010 |
| 901 | 1893432 |
- 為了更嚴謹一些,重新改寫了從中間 divide 的版本,看出效能確實有很大的差異

- 主要是多了這個迴圈,把 list 平均分到 left right
```c
for (size_t i = 0; start; i++){
tem = start;
start = start->next;
if (i & 1){
tem->next = left.next;
left.next = tem;
} else {
tem->next = right.next;
right.next = tem;
}
}
```
- [Optimizing merge sort](https://en.wikipedia.org/wiki/Merge_sort#Optimizing_merge_sort) 提到因為 memory hierarchy 的關係,所以用 Cache-aware versions of the merge sort algorithm 減少 cache 資料搬動的次數,其中一個作法為分割到剛好滿足 cache 大小時就用 `in-place sorting algorithm` 排序
:::warning
[skip list](https://en.wikipedia.org/wiki/Skip_list) 是種變形,之前修課的學生對此進行研究: [共筆](https://hackmd.io/@Vy_ZuZdnSo-Wv4jSWgNmLw/B1OZFDL1H)
:notes: jserv
:::
- 試著在 size <= 64 時使用 insertion sort 且為了善用 spatial locality 把 divide 的迴圈改成以下形式
- 差在 if 的條件,一開始沒注意到這點,所以實驗結果一直不如預期
- 雖然是 linked list 但是一開始插入節點時是循序的差入,每個節點的 virtual address 只相差 32 bytes
- 待更新
```cpp
for(size_t i = 0; start; i++){
tem = start;
start = start->next;
if (i < size / 2){
tem->next = left.next;
left.next = tem;
} else {
tem->next = right.next;
right.next = tem;
}
}
```
- 實驗結果

- insertion sort
```cpp
list *insert_sort(list *start){
list **prev, **cur = &start, *tem;
while(*cur){
cur = &(*cur)->next;
prev = &start;
while(*cur && *prev != *cur){
if((*prev)->data > (*cur)->data){
tem = *cur;
*cur = (*cur)->next;
tem->next = *prev;
*prev = tem;
prev = &start;
continue;
}
prev = &(*prev)->next;
}
}
return start;
}
```
### circular doubly-linked list
- 實作了邏輯相同的 doublly linked 版本
- 因為 `list` 的操作會一直用到,所以寫成 function
- `replace` function 內一開始忘記把左右 node 只到新的 head 找了很久
- sort.c
```c
void sort(list *start) {
/* empty or singular */
if (start->next == start->prev)
return;
list left, right;
init_list(&left);
init_list(&right);
insert_tail(&left, pop_item(start->next));
replace(start, &right);
sort(&right);
init_list(start);
for (; !is_empty(&left) || !is_empty(&right); ) {
if (is_empty(&right) || (!is_empty(&left) && first_data(&left) < first_data(&right)))
insert_tail(start, pop_item(left.next));
else
insert_tail(start, pop_item(right.next));
}
}
```
- list.h
```cpp
typedef struct __list {
int data;
struct __list *next;
struct __list *prev;
} list;
static inline void init_list(list* head){
head->next = head;
head->prev = head;
}
/* insert to tail of linked list */
static inline void insert_tail(list* head, list* target){
list *prev = head->prev;
target->next = head;
target->prev = prev;
prev->next = target;
head->prev = target;
}
/* pop the certain item and return it */
static inline list* pop_item(list* item){
list *next = item->next,
*prev = item->prev;
next->prev = prev;
prev->next = next;
return item;
}
static inline bool is_empty(list* head){
return head->next == head;
}
/* return data of first entry*/
static inline int first_data(list* head){
return head->next->data;
}
/* replace the head of list */
static inline void replace(list* origin, list* new){
list *next = origin->next,
*prev = origin->prev;
*new = *origin;
next->prev = new;
prev->next = new;
}
```
### iterative version
- 相同邏輯的 iterative版本,差在是從左邊 merge 時間複雜度也是 O($n^2$)
```c
list *sort_iter(list *start) {
if (!start || !start->next)
return start;
list *left = start;
list *right = left->next;
list *tem;
left->next = NULL;
do {
tem = right->next;
right->next = NULL;
for (list *merge = NULL; left || right; ) {
if (!right || (left && left->data < right->data)) {
if (!merge) {
start = merge = left;
} else {
merge->next = left;
merge = merge->next;
}
left = left->next;
} else {
if (!merge) {
start = merge = right;
} else {
merge->next = right;
merge = merge->next;
}
right = right->next;
}
}
left = start;
right = tem;
} while(tem);
return start;
}
```
### XOR linked list
- sort.c
```c
void sort(xlist* list) {
if (!list || list->head == list->tail)
return ;
xlist left, right;
xlist_init(&left);
xlist_insert(&left, xlist_popfront(list));
right = *list;
sort(&right);
xlist_init(list);
for (; left.head || right.head; ) {
if (!right.head || (left.head && first_data(&left) < first_data(&right)))
xlist_insert(list, xlist_popfront(&left));
else
xlist_insert(list, xlist_popfront(&right));
}
}
```
- xlist.h
```c
typedef struct xor_node {
int data;
uint64_t xptr;
} xnode;
typedef struct xor_list {
struct xor_node *head, *tail;
} xlist;
static inline void xlist_init(xlist* list){
list->head = list->tail = NULL;
}
/* use to trace the list */
static inline void xlist_next(xnode** cur, xnode** prev){
xnode* tem = *cur;
*cur = (xnode*) ((uint64_t) (*cur)->xptr ^ (uint64_t) *prev);
*prev = tem;
}
/* pop and retrun first item */
static inline xnode* xlist_popfront(xlist* list){
xnode *tem, *next = (xnode *) (list->head->xptr ^ (uint64_t) NULL);
/* originally, next->xptr == list.head ^ next->next */
if(next)
next->xptr = next->xptr ^ (uint64_t) list->head ^ (uint64_t) NULL;
tem = list->head;
list->head = next;
return tem;
}
/* retrun data in first entry */
static inline int first_data(xlist* list){
return ((xnode *)((uint64_t) list->head ^ (uint64_t) NULL))->data;
}
/* insert tail */
static inline void xlist_insert(xlist* list, xnode* node){
node->xptr = (uint64_t) list->tail ^ (uint64_t) NULL;
if(!list->head){
list->tail = node;
list->head = node;
return;
}
list->tail->xptr = list->tail->xptr ^ (uint64_t) NULL ^ (uint64_t) node;
list->tail = node;
}
```
- 比較和 doubly linked list 記憶體開銷 (size 100)
- 用 valgrind 觀測 heap 使用率
- 看出用 doubly-linked list 每個 block 多了 8 bytes 也就是一個指標所佔的大小
- [ ] doubly-linked list
```shell
==5409== HEAP SUMMARY:
==5409== in use at exit: 2,400 bytes in 100 blocks
==5409== total heap usage: 101 allocs, 1 frees, 3,424 bytes allocated
```
- [ ] xor linked list
```shell
==5435== HEAP SUMMARY:
==5435== in use at exit: 1,600 bytes in 100 blocks
==5435== total heap usage: 101 allocs, 1 frees, 2,624 bytes allocated
```
### 效能分析
- 對結果不太意外, xor 的實做為了節省開銷需要更大的計算量,而 doubly linked list 原本就要比 sigle linked list 還需要更多的操作

:::danger
修正用語: (注意連字號)
* `singly-linked list (orig)`: 原本給定的程式碼
* `singly-linked list (fixed)`: 嘗試修正效能表現的程式碼
* `doubly-linked list`: 雙向鏈結串列
:::
- 已更正
## 參考資料
- [xor liked list](https://en.wikipedia.org/wiki/XOR_linked_list)
- [Merge sort](https://en.wikipedia.org/wiki/Merge_sort#Optimizing_merge_sort)