探討 [Functional Programming in C](https://hackmd.io/s/r1SgsdF3X)

# 探討 [Functional Programming in C](https://hackmd.io/s/r1SgsdF3X) contributed by < `JEFF1216`, `brian208579`, `happyincent` > ## 目標設定 1. 解釋給定程式碼的運作 2. 透過修改和實驗，理解 functional programming (FP) 風格簡單來說，其實和數學的函數有一些關係，亦即在我們的輸入和輸出都不會被干擾的情況下，輸入同一個函式會出現相同的規律，例如：5 輸入一個 f(x) 的函式得到的數值是 10，那麼輸入一個 15 的數值就要得到 30，而不是其他數值。而且不會受到 [side effect](https://en.wikipedia.org/wiki/Side_effect_(computer_science)) 的影響，若我們過程中想要改變我們的輸入是不行的，除非我們使用 copy (call by value) 的方式，但設想一個問題，如果今天我們要將一個內含 `26` 個元素的陣列其中一個元素作更改，可行辦法有複製一整個陣列後，然後再做更改。但這種作法會增加記憶體空間和降低效率，但若是透過一些資料結構，例如 tree 或 linked list 等等....結構，而且 functional programming 是一旦執行了就算有比它執行更快的函式出現它依舊會跑完不受影響。結論來說所謂的 functional programming 就是將副函式設計成純函式並且相互呼叫組裝起來其優點是應用程式或者運行環境（Runtime）可以對純函數的運算結果進行快取，運算加快速度。給定程式碼中關於 functional programming 最重要的精髓在於以下: ```clike /* integer linked list type */ typedef struct __int_list const *const int_list_t; static int_list_t const Nil = NULL; /* empty list */ ``` 這是一個 const 結構指標且該指標的位址和裡面的內容無法透過區域變數去改變，猶如我在上述提到的 functional programming 是在不干擾 input output 的情況下帶入函式，所以我們在做延伸題目是不可以更改這一行的程式碼的! 再來，是關於 callback 的函式指標的理解，這在我們的 review 作業中有看過類似的: ```clike= void (* show_car_attributes(int color, void (*func)(int)) )(int) ``` 這是一個函式指標，而 `MakeListCallback` 就是拿來接收 reverse_toarray 的這個函式，並且將 `lst` 和 `res` 傳給 reverse 做下一個動作，打個比方:`make_list(arr_size, arr, Nil, reverse_toarray, res);` 會在這行程式碼去呼叫 `make_list` 副函式直到執行完透過 `cb(lst,res)` 將做好的 linked list 的頭位址回傳給 reverse，所以整體的函式執行順序是: **main --> make_list --> reverse_toarray --> reverse --> list2array --> void_map_array** 再來是建立 linked list 的部分: 這個是從陣列中的尾端建立起，亦即 `5` 是尾巴而 `99` 是頭，而在建立 linked list 過程中，lst 會將下一個結構的 next 儲存起來，並且將位址和數值複寫進下一個結構中，所以在 make_list 的順序是: ### lst ```graphviz digraph foo { rankdir=LR; node [shape=record]; a [label="{ <data> 99 | <ref> }", width=1.2, color=red] b [label="{ <data> 95 | <ref> }"]; c [label="{ <data> 90 | <ref> }"]; d [label="..." shape=box]; e [label="{ <data> 5 | <ref> }",width=1.2]; a:ref:c -> b:ref [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false, arrowsize=1.2]; b:ref:c -> c:ref [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false]; c:ref:c -> d:ref[arrowhead=dot, arrowtail=vee, dir=both, tailclip=false]; d->e:ref[arrowhead=dot, arrowtail=vee, dir=both, tailclip=false]; } ``` ### list ```graphviz digraph foo { rankdir=LR; node [shape=record]; a [label="{ <data> 99 | <ref> }", width=1.2, color=red] b [label="{ <data> 95 | <ref> }"]; c [label="..." shape=box]; d [label="{ <data> 5 | <ref> }"]; e [label="{ NULL }"]; a:ref:c -> b:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=1.2]; b:ref:c -> c:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=1.2]; c:ref:c -> d:data [arrowsize=1.2]; d:ref:c -> e [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=1.2]; } ``` reverse 將頭 (99) 的 next 指向一開始的 rlist (Null)，再透過 rlist 儲存新的 (99) 的位址，再依序將新的 (95) 的 next 指向新的 (99)，如圖: ### rlist: ```graphviz digraph foo { rankdir=LR; node [shape=record]; a [label="{ <data> 99 | <ref> }", width=1.2, color=red] b [label="{ NULL }"]; a:ref:c -> b:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=1.2]; } ``` ```graphviz digraph foo { rankdir=LR; node [shape=record]; a [label="{ <data> 95 | <ref> }", width=1.2, color=red] b [label="{ <data> 99 | <ref> }"]; c [label="{ NULL }"]; a:ref:c -> b:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=1.2]; b:ref:c -> c:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=1.2]; } ``` 直到將全部的 link-list 做完反轉後才結束，如圖: ### rlist: ```graphviz digraph foo { rankdir=LR; node [shape=record]; a [label="{ <data> 5 | <ref> }", width=1.2, color=red] b [label="{ <data> 10 | <ref> }"]; c [label="..." shape=box]; d [label="{ <data> 99 | <ref> }"]; e [label="{ NULL }"]; a:ref:c -> b:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=1.2]; b:ref:c -> c:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=1.2]; c:ref:c -> d:data [arrowsize=1.2]; d:ref:c -> e [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=1.2]; } ``` --- 最後的 linked list 再傳回 `list2array` 並將最後的結果印出來。本程式的技巧除了透過陣列來建立一個 linked list，過程中也利用 `if` 的判斷式來進行遞迴並且傳給下一個副函式，例如在 `reverse` 副函式中，其終止的條件是，當 list 的位址為 NULL (也就是到最後一個 5 時) 會停止遞迴並將其中的值傳給下一個副函式: ```clike void reverse(int_list_t list, int_list_t rlist, /* reversed list */ ReversedListCallback const cb, CPS_Result res) { if (Nil == list) { cb(rlist, res); return; } reverse(list->next, &(node_t) K1, cb, res); } ``` 注意到這裡的 CPS: > Continuation Passing Style (CPS): Programming style characterized by functions with an additional continuation argument, which call their continuation instead of returning to their caller (continuations are simply functions in C). ### 比照 FP，改寫上述程式碼，實作出 quick sort，同樣不得配置 heap memory (也就是不得用 malloc/free) #### 作法 1 > [source code](https://github.com/happyincent/Functional-Programming-in-C/blob/master/sort_arr.c) [reference](http://alrightchiu.github.io/SecondRound/comparison-sort-quick-sortkuai-su-pai-xu-fa.html) * 直接對 main 中的 arr 做 quick sort，因此 make_list 的 input 即為 sorted 過的 array * 再將 make_list 的 callback function 改為 list2array 即可達成不配置 heap memory 的要求 ```c ... /* quick sort functions */ void swap(int *a, int *b); int partition(int *arr, int front, int end); void quicksort(int *arr, int front, int end); int main(int argc, char *argv[]) { int32_t arr[] = {99, 95, 90, 85, 80, 20, 75, 15, 10, 5}; uint32_t const arr_size = ARRAY_SIZE(arr); /* quick sort input arr */ quicksort(arr, 0, arr_size - 1); int32_t res[arr_size]; /* call make_list and pass list2array as the "continuation" */ make_list(arr_size, arr, Nil, list2array, res); /* print out the quicksorted array */ void_map_array(print_val, arr_size, res); printf("\n"); return 0; } /* construct a linked list from an array */ void make_list(uint32_t const arr_size, int32_t const arr[], int_list_t lst, MakeListCallback const cb, CPS_Result res) { // cb == list2array if (!arr_size) { cb(lst, res); return; } make_list(arr_size - 1, arr, &(node_t){.val = arr[arr_size - 1], .next = lst}, cb, res); } ... ``` #### 作法 2 > [source code](https://github.com/happyincent/Functional-Programming-in-C/blob/master/quicksort_without_const.c) [reference](https://www.tutorialcup.com/linked-list/quick-sort-sIngly-linked-list.htm) * 將 typedef int_list_t 的 const 拿掉，才能對指標本身、指標指向的物件進行更動 * make_list 的 call back 函式 quicksort_toarray 直接做 list2array * 以 quicksort 取得 sorted 後的 list head ```c /* integer linked list type */ typedef struct __int_list * int_list_t; // typedef struct __int_list const *const int_list_t; ... /* quick sort functions */ int_list_t getTail(int_list_t cur); int_list_t partition(int_list_t head, int_list_t end, int_list_t *newhead, int_list_t *newtail); int_list_t quicksort(int_list_t head, int_list_t tail); /* sort a list and then store it in an array */ void quicksort_toarray(int_list_t list, CPS_Result res) { list2array( quicksort(list, getTail(list)), res ); } ... int main(int argc, char *argv[]) { int32_t arr[] = {99, 95, 90, 85, 80, 20, 75, 15, 10, 5}; uint32_t const arr_size = ARRAY_SIZE(arr); int32_t res[arr_size]; /* call make_list and pass quicksort_toarray as the "continuation" */ make_list(arr_size, arr, Nil, quicksort_toarray, res); /* print out the quicksortd array */ void_map_array(print_val, arr_size, res); printf("\n"); return 0; } ... int_list_t getTail(int_list_t cur) { if (cur == Nil || cur->next == Nil) { return cur; } return getTail(cur->next); } ... ``` ### 作法三 > [source code](https://github.com/happyincent/Functional-Programming-in-C/blob/master/mergesort_wihout_const.c) 利用 mergesort 我們減少了程式碼中的 side effect 並且將每個副函式幾乎變成了 pure function。也就是說，每個函式我們都只用了區域變數，並讓其中的值只受到該區域的副函式影響，直到結束後傳遞結果為止。 #### mergesort mergesort 簡單來說就是將一個 linked list 拆成一個又一個獨立的結構再去作排列和重組。如圖： ![](https://i.imgur.com/mZWAlpF.png) 在 mergesort 中最重要的莫過於要怎麼實現 "**拆鏈表**" 這件事。 ```clike= void partition(int_list_t head, int_list_t slow, int_list_t fast, int_list_t *front, int_list_t *back) { if (head == NULL || head->next == NULL) { *front = head; *back = NULL; return; } if (fast == NULL) { *front = head; *back = slow->next; slow->next = NULL; return; } else { if (fast->next == NULL) partition(head, slow, fast->next, front, back); else partition(head, slow->next, fast->next->next, front, back); } } int_list_t mergeLists(int_list_t a, int_list_t b) { int_list_t mergedList = NULL; if (a == NULL) { return b; } else if (b == NULL) { return a; } if (a->val <= b->val) { mergedList = a; mergedList->next = mergeLists(a->next, b); } else { mergedList = b; mergedList->next = mergeLists(a, b->next); } return mergedList; } void mergesort(int_list_t *source) { int_list_t head = *source; int_list_t a = NULL; int_list_t b = NULL; if (head == NULL || head->next == NULL) { return; } partition(head, head, head->next, &a, &b); mergesort(&a); mergesort(&b); *source = mergeLists(a, b); } ``` 我們用兩個結構指標： fast 和 slow，當 fast 不為 NULL 時，fast 往前讀取位址，若下一個fast 依舊不為 NULL，則 slow 和 fast 兩個指標繼續往前，直到 fast 指向 NULL 為止。如圖 ### list ```graphviz digraph foo { rankdir=LR; node [shape=record]; a [label="{ <data> 99 | <ref> }", width=1.2, color=red] b [label="{ <data> 95 | <ref> }", width=1.2, color=red]; c [label="{ <data> 85 | <ref> }", width=1.2, color=red]; d [label="{ <data> 80 | <ref> }", width=1.2, color=red]; e [label="{ <data>. .. | <ref> }", width=1.2, color=red]; f [label="{ NULL }", width=1.2, color=red]; g [label="{ <data> slow }", width=1.2, color=black]; h [label="{ <data> fast }", width=1.2, color=black]; a:ref:c -> b:ref [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false, arrowsize=1.2]; b:ref:c -> c:ref [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false, arrowsize=1.2]; c:ref:c -> d:ref [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false, arrowsize=1.2]; d:ref:c -> e:ref [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false, arrowsize=1.2]; e:ref:c -> f [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false, arrowsize=1.2]; a:ref:c -> g:data [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false, arrowsize=1.2]; b:ref:c -> h:data [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false, arrowsize=1.2]; } ``` 若 fast 不為 NULL 則變成： ```graphviz digraph foo { rankdir=LR; node [shape=record]; a [label="{ <data> 99 | <ref> }", width=1.2, color=red] b [label="{ <data> 95 | <ref> }", width=1.2, color=red]; c [label="{ <data> 85 | <ref> }", width=1.2, color=red]; d [label="{ <data> 80 | <ref> }", width=1.2, color=red]; e [label="{ <data>. .. | <ref> }", width=1.2, color=red]; f [label="{ NULL }", width=1.2, color=red]; g [label="{ <data> slow }", width=1.2, color=black]; h [label="{ <data> fast }", width=1.2, color=black]; a:ref:c -> b:ref [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false, arrowsize=1.2]; b:ref:c -> c:ref [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false, arrowsize=1.2]; c:ref:c -> d:ref [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false, arrowsize=1.2]; d:ref:c -> e:ref [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false, arrowsize=1.2]; e:ref:c -> f [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false, arrowsize=1.2]; a:ref:c -> g:data [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false, arrowsize=1.2]; c:ref:c -> h:data [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false, arrowsize=1.2]; } ``` 若 fast 仍不為 NULL 的話則 fast 和 slow 各前進一格。 ```graphviz digraph foo { rankdir=LR; node [shape=record]; a [label="{ <data> 99 | <ref> }", width=1.2, color=red] b [label="{ <data> 95 | <ref> }", width=1.2, color=red]; c [label="{ <data> 85 | <ref> }", width=1.2, color=red]; d [label="{ <data> 80 | <ref> }", width=1.2, color=red]; e [label="{ <data>. .. | <ref> }", width=1.2, color=red]; f [label="{ NULL }", width=1.2, color=red]; g [label="{ <data> slow }", width=1.2, color=black]; h [label="{ <data> fast }", width=1.2, color=black]; a:ref:c -> b:ref [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false, arrowsize=1.2]; b:ref:c -> c:ref [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false, arrowsize=1.2]; c:ref:c -> d:ref [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false, arrowsize=1.2]; d:ref:c -> e:ref [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false, arrowsize=1.2]; e:ref:c -> f [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false, arrowsize=1.2]; b:ref:c -> g:data [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false, arrowsize=1.2]; d:ref:c -> h:data [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false, arrowsize=1.2]; } ``` 直到 fast 碰到接地處（NULL）為止，則該 list 就會被分為兩個和其他，以本題來說就是 99, 90, 95, 6, 85, 20, 75, 15 一群，(10, 5) 為一群不斷地這樣分下去。直到變成一個獨立的 link-list 後再去連接起來。本次的程式執行順序為： **main -->make_list --> mergesort --> partition --> partition --> ........ -->mergesort -->mergelist --> mergesort --> list2array --> void_map_array** :::warning 目前還沒想到怎麼以 FP 風格實作對 linked list 做 quick sort 的方法 to-do list: * https://stackoverflow.com/questions/4574279/sorting-in-functional-programming-languages * https://stackoverflow.com/questions/7717691/why-is-the-minimalist-example-haskell-quicksort-not-a-true-quicksort * http://learnyouahaskell.com/recursion ::: ### Non-intrusive makelist 11/29更新 [source code](https://github.com/happyincent/Functional-Programming-in-C/blob/master/mergesort.c) ```clike #include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <stddef.h> #define container_of(ptr, type, member) \ ((type *) ((char *) (ptr) -offsetof(type, member))) #define list_entry(node, type, member) container_of(node, type, member) typedef struct __list * list_t; typedef struct __list { list_t next; } node_t; typedef struct __ele { int32_t const val; list_t const list; } ele_t; static const ele_t Nil = {.val = 0, .list = &(node_t){ .next = NULL }}; typedef void *const CPS_Result; typedef void (*MakeListCallback)(ele_t *e, CPS_Result res); void make_list(uint32_t const arr_size, int32_t const array[], ele_t *e, MakeListCallback const cb, CPS_Result res); void mergesort_toarray(ele_t *e, CPS_Result res); /* Merge sort */ void mergesort(ele_t **source); void partition(ele_t *head, ele_t **front, ele_t **back); ele_t *mergeLists(ele_t *a, ele_t *b); typedef void (*VoidMappable)(int32_t const val); void void_map_array(VoidMappable const cb, uint32_t const size, int32_t const *const arr); void list2array(ele_t *e, CPS_Result res); static void print_val(int32_t const val) { printf("%d ", val); } #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof(arr[0])) int main(int argc, char *argv[]) { int32_t arr[] = {99, 95, 90, 85, 80, 20, 75, 15, 10, 5}; uint32_t const arr_size = ARRAY_SIZE(arr); int32_t res[arr_size]; make_list(arr_size, arr, (ele_t *)(&Nil), mergesort_toarray, res); void_map_array(print_val, arr_size, res); printf("\n"); return 0; } void make_list(uint32_t const arr_size, int32_t const arr[], ele_t *e, MakeListCallback const cb, CPS_Result res) { if (!arr_size) { cb(e, res); return; } make_list(arr_size - 1, arr, &(ele_t){ .val = arr[arr_size - 1], .list = &(node_t){ .next = (list_t)(&(e->list)) } }, cb, res); } void list2array(ele_t *e, CPS_Result res) { if (e->list->next == NULL) { return; } int32_t *array = res; array[0] = e->val; list2array(list_entry(e->list->next, ele_t, list), array + 1); } void mergesort_toarray(ele_t *e, CPS_Result res) { if (e->list->next != NULL) mergesort(&e); list2array(e, res); } void void_map_array(VoidMappable const cb, uint32_t const size, int32_t const *const arr) { if (!size) return; cb(arr[0]); void_map_array(cb, size - 1, arr + 1); } /* Merge sort */ void mergesort(ele_t **source) { ele_t *head = *source; ele_t *a = NULL; ele_t *b = NULL; if (head == NULL || head->list->next == (list_t)(&(Nil.list))) return; partition(head, &a, &b); mergesort(&a); mergesort(&b); *source = mergeLists(a, b); } void partition(ele_t *head, ele_t **front, ele_t **back) { ele_t *fast; ele_t *slow; if (head == NULL || head->list->next == (list_t)(&(Nil.list))) { *front = head; *back = NULL; } else { slow = head; fast = list_entry(head->list->next, ele_t, list); while(fast->list->next != NULL) { fast = list_entry(fast->list->next, ele_t, list); if (fast->list->next != NULL) { slow = list_entry(slow->list->next, ele_t, list); fast = list_entry(fast->list->next, ele_t, list); } } *front = head; *back = list_entry(slow->list->next, ele_t, list); slow->list->next = (list_t)(&(Nil.list)); } } ele_t *mergeLists(ele_t *a, ele_t *b) { ele_t *mergedList = NULL; ele_t *tmp; if (a->list->next == NULL) { return b; } else if (b->list->next == NULL) { return a; } if (a->val <= b->val) { mergedList = a; tmp = mergeLists(list_entry(a->list->next, ele_t, list), b); mergedList->list->next = (list_t)(&(tmp->list)); } else { mergedList = b; tmp = mergeLists(a, list_entry(b->list->next, ele_t, list)); mergedList->list->next = (list_t)(&(tmp->list)); } return mergedList; } ``` 編譯並測試: ```shell $ make gcc -O0 -g -Wall -Werror -std=gnu99 -c -o mergesort.o mergesort.c gcc mergesort.o -o mergesort $ ./mergesort 5 10 15 20 75 80 85 90 95 99 ``` 12/1更新，程式碼解釋：本次關於 functional programming 的作業，最後實做我們是採用 non-intrusive 的 linked list 來實現，從程式碼中 ```clike typedef struct __list * list_t; typedef struct __list { list_t next; } node_t; typedef struct __ele { int32_t const val; list_t const list; } ele_t; ``` 我們將`val `和`list` 的值宣告為常量，亦即無法對其值進行改變，然後將`next `宣告為一個指標的指標，如果是透過圖形來解釋的話，會是像這樣，如圖： ```graphviz digraph foo { rankdir=LR; node [shape=record]; a [label="{ <data> val | <ref> list }", width=1.2, color=red] b [label="{ <data> val | <ref> list }", width=1.2, color=red]; g [label="{ <data> next }", width=1.2, color=black]; g:ref:c -> b:ref [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=0.5]; a:ref:c -> g:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=0.5]; } ``` ### Nil 而在程式實現中，我們必須設計一個 Nil 結構參數，其目的是為了在各個函式之間呼叫，我們是以`ele_t `或是`list_t `的結構型態進行回傳資料和遞迴，所以我們無法直接利用 NULL 來作為函式中斷遞迴的判斷，因此我們令`Nil `為`static const ele_t Nil = {.val = 0, .list = &(node_t){ .next = NULL }}; ` 意思是當我們指向 Nil 的時候就是該位址指向 Null 的意思。如圖： ```graphviz digraph foo { rankdir=LR; node [shape=record]; a [label="{ <data> val | <ref> list }", width=1.2, color=red] b [label="{ <data> Nil(0) | <ref> list }", width=1.2, color=red]; c [label="{ <data> NULL }", width=1.2, color=black]; g [label="{ <data> next }", width=1.2, color=black]; g:ref:c -> b:ref [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=0.5]; a:ref:c -> g:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=0.5]; b:ref:c -> c:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=0.5]; } ``` ### container_of (ptr, type, member) 再來是關於`#define container_of(ptr, type, member) ` 會使用在 linux kernel 中的這個定義式是因為我們想要根據一個結構體變數中的一個成員變數的指標來獲取指向整個結構體變量的指標，打個比方，若今天有一個結構體為： ```clike struct demo_struct { type1 member1; type2 member2; type3 member3; type4 member4; }; ``` 假設結構體變數 demo 在實際記憶體內的位置如下圖所示： ``` demo +-------------+ 0xA000 | member1 | +-------------+ 0xA004 | member2 | +-------------+ 0xA010 | member3 | +-------------+ 0xA018 | member4 | +-------------+ ``` 若今天我們執行程式碼：`container_of(memp, struct demo_struct, type3) `會使得出來的位址為該結構得初始位址，也就是 0xA000，而在作 mergesort 的時候，我們的 fast 和 slow 指標必須是指向該 linked-list 結構的初始位址才可以對它作 dereference ，否則就會出現程式記憶體區段錯誤（core-dumped )，因次我們才會定義`list_entry `為: ```clike #define list_entry(node, type, member) container_of(node, type, member) ``` 例如在`partition `中我們要對將其拆開時，便會使用到` fast = list_entry(head->list->next, ele_t, list);` ### mergeLists 最後是關於 mergeLists 的寫法，其實和之前的寫法不會相差太多，只是在於進行遞迴時，需要用到`tmp`來當作暫存器儲存結構的位址，例如，在下列程式碼中： ```clike if (a->list->next == NULL) { return b; } else if (b->list->next == NULL) { return a; } if (a->val <= b->val) { mergedList = a; tmp = mergeLists(list_entry(a->list->next, ele_t, list), b); mergedList->list->next = (list_t)(&(tmp->list)); } else { mergedList = b; tmp = mergeLists(a, list_entry(b->list->next, ele_t, list));//回傳（a,Nil)給mergeLists mergedList->list->next = (list_t)(&(tmp->list)); } ``` 我們假設 a 是 95 的結構體，b 是 90 的結構體，在比較完大小後，在`tmp = mergeLists(a, list_entry(b->list->next, ele_t, list)); `中會進行遞迴也就是將 a 結構體指標和 Nil 回傳給 mergeLists 函式，經由 if 判斷式後回傳 a 結構體指標給 tmp 最後將 90 的 next ，也就是 b->list->next 由原本的指向 Nil 改成指向 95 的結構體，以此遞迴下去完成 non-intrusive linked-list 的重組。最後再將該鏈表印出來完成此次 FP 風格的實現。 gdb下的觀察結果： ```shell breakpoint 4, partition (head=0x7ffffffedee0, front=0x7ffffffedde0, back=0x7ffffffedde8) at mergesort.c:124 124 if (head == NULL || head->list->next == (list_t)(&(Nil.list))) { (gdb) 129 slow = head; (gdb) 130 fast = list_entry(head->list->next, ele_t, list); (gdb) 132 while(fast->list->next != NULL) { (gdb) p *head $1 = {val = 99, list = 0x7ffffffeded8} (gdb) p *fast $2 = {val = 95, list = 0x7ffffffedf48} (gdb) p *(head->list->next) $3 = {next = 0x7ffffffedf48} (gdb) p *(head->list) $4 = {next = 0x7ffffffedf58} (gdb) p &(head->list>next) No symbol "next" in current context. (gdb) p &(head->list->next) $5 = (list_t *) 0x7ffffffeded8 (gdb) p (head->list) $6 = (const list_t) 0x7ffffffeded8 (gdb) p (head->list->next) $7 = (list_t) 0x7ffffffedf58 (gdb) p &(fast->list) $8 = (const list_t *) 0x7ffffffedf58 ``` container_of 和 offsetof 的範例： ```clike #include <stdio.h> #include <stddef.h> //#define offsetof(type, member) (size_t)&(((type*)0)->member) 在stddef.h就有定義了 #define container_of(ptr, type, member) \ __extension__({ \ const __typeof__(((type *) 0)->member) *__pmember = (ptr); \ (type *) ((char *) __pmember - offsetof(type, member)); \ }) //offset表示去取該結構指標（例如：type）的結構偏移量即該變量之記憶體大小（會自動補齊） typedef struct T { int member_bit :9; char member_char; long member_long; } TYPE; int main() { printf("%zu\n", sizeof(TYPE)); // 16, 8 (armv7l) //因為member_bit為9個位元為了對齊會補成16個位元 printf("%zu\n", offsetof(TYPE, member_char)); // 2, 2 (armv7l) //char為2位元 printf("%zu\n", offsetof(TYPE, member_long)); // 8, 4 (armv7l) // printf("%zu\n", offsetof(TYPE, member_bit)); // error TYPE *t; printf("%p, ", t); printf("%p, ", &t->member_char); printf("%p\n", (container_of(&(t->member_char), TYPE, member_char))); // 0x7ffff9572f00, 0x7ffff9572f08, 0x7ffff9572f00 (x86-64) // 0x7ea6b20c, 0x7ea6b210, 0x7ea6b20c (armv7l) } ``` --- ## 測試這樣的 integer list 和 non-intrusive linked list 的效能差異 ### 1 . 了解 intrusive 和 non-intrusive non-intrusive data collection ( 被儲存的資料不知道任何 data collection 的資訊 )，以 linkde-list 為例，裡面包含了我們所要存的項目 ( Item )，還有儲存資料的空間 ( node )，而我們知道，如果要在 linked-list 中刪除某一項目 ( Item )時，我們必須走訪 linked-list 內全部的資料，逕而找尋到要刪除的項目( Item )，可想而知時間複雜度為 O(n)，若我們要降低時間複雜度的話，可改用 intrusive 的方式，他是將 Item 和 Node 合在一起，而此時要刪除某一項目時他的時間複雜度則降低為 O(1)，對於資料 Item 來說，使用 intrusive data collection 存 Item 的話，Item 可以直接找到它在 data collection 內的位置，接著可以直接操作 data collection，反觀 non-intrusive data collection 需要多一個步驟找到 Item 在 data collection 內的 container ( 容器 )，才能對 container 動作。而這個動作會造成一些額外的成本 ( 設計、維護、時間、空間之間的取捨 )。問題 | intrusive | non-intrusive --- | --- | --- **記憶體管理** | **外部** | **內部，通過分配器** **插入/刪除的時間** | **較快** | **較慢** **記憶體位置** | **較好** | **較差** **可否以值方式保存不可複製和不可移動的對象**|**可以**|**不可以** **記憶體佔用**|**最少**|**比最少多** **須修改值的定義以利插入**|**是**|**否** **容器可否複製**|**否**|**是** **不使用容器也能破壞容器的不變式**|**較容易**|**較難** **從值計算迭代器**|**較難**|**較容易** ### 關於integer list 和 non-intrusive linked list 的效能差異首先我們要先了解其實所謂的 intrusive inked-list 就是指 integer inked-list 它會在結構裡儲存數值（資料）然後配上指向下一個數值的位址。這樣讀取資料的鏈表我們就稱為 intrusive inked-list。再來是關於 non-intrusive linked list 這裡不確定我的認知對不對，但以我所查詢的資料中 non-intrusive linked list 是將兩個數值的位址儲存在該結構中，也就是說，在結構裡面都是儲存位址而非數值，而且沒有定義儲存資料的型態，例如： ```clike struct llhead { struct llhead *prev, *next; }; ``` 底下我們利用了兩個例子加上 clock 函式來計算一個有十個元素的陣列進行 linked-list 的建立會花多久時間，這兩個例子我們是採用環狀的 linked-list 來做測試，測試結果如下： ```shell $ ./non-intrusive 0 1 2 3 4 5 6 7 8 9 Time = 0.000082 ``` 可以看到以 non-intrusive 來建立 linked list 所花的時間是 0.082ms。再看一般的 integer list 花多少時間，測試如下： ```shell $ ./intrusive 0 1 2 3 4 5 6 7 8 9 Time = 0.000064 ``` 由上可以發現 integer list 花的時間較少，而我們利用 gnuplot 將創建不同個數目的 node 所需的時間繪成圖形做比較，可以更清楚的發現 intrusive 的效能是比較好的。 ![](https://i.imgur.com/KAJGIYt.png) ## 參考資料 * [intrusive/non-intrusive data collection](https://medium.com/fcamels-notes/intrusive-non-intrusive-data-collection-2b937f3c0c0b) * [Intrusive and non-intrusive comparison](http://boost.ez2learn.com/doc/html/intrusive/intrusive_vs_nontrusive.html) * [Intrusive lists](https://stackoverflow.com/questions/3361145/intrusive-lists#) * [Intrusive Linked Lists](http://faculty.ycp.edu/~dhovemey/spring2011/cs350/lecture/lecture5.html) * [Non-Intrusive Linked List](https://avdongre.wordpress.com/2015/04/13/non-intrusive-linked-list/)