Functional Programming in C

# Functional Programming in C 考慮到以下採用 [Functional Programming](https://en.wikipedia.org/wiki/Functional_programming) 風格開發的 singly-linked list 程式: ```Clike #include <stdint.h> #include <stdio.h> #include <stdlib.h> /* integer linked list type */ typedef struct __int_list const *const int_list_t; static int_list_t const Nil = NULL; /* empty list */ /* singly linked list element */ typedef struct __int_list { int_list_t next; int32_t const val; } node_t; /* A pointer to where we are about to store the result of computation */ typedef void *const CPS_Result; /* prototypes for the continuations */ typedef void (*MakeListCallback)(int_list_t list, CPS_Result res); void make_list(uint32_t const arr_size, int32_t const array[], int_list_t lst, MakeListCallback const cb, CPS_Result res); typedef void (*ReversedListCallback)(int_list_t rlist, CPS_Result res); void reverse(int_list_t list, int_list_t rlist, ReversedListCallback const cb, CPS_Result res); typedef void (*VoidMappable)(int32_t const val); void void_map_array(VoidMappable const cb, uint32_t const size, int32_t const *const arr); void list2array(int_list_t list, CPS_Result res); /* reverse a list and then store it in an array */ void reverse_toarray(int_list_t list, CPS_Result res) { reverse(list, Nil, list2array, res); } static void print_val(int32_t const val) { printf("%d ", val); } #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof(arr[0])) int main(int argc, char *argv[]) { int32_t arr[] = {99, 95, 90, 85, 80, 20, 75, 15, 10, 5}; uint32_t const arr_size = ARRAY_SIZE(arr); int32_t res[arr_size]; /* call make_list and pass reverse_toarray as the "continuation" */ make_list(arr_size, arr, Nil, reverse_toarray, res); /* print out the reversed array */ void_map_array(print_val, arr_size, res); printf("\n"); return 0; } /* construct a linked list from an array */ void make_list(uint32_t const arr_size, int32_t const arr[], int_list_t lst, MakeListCallback const cb, CPS_Result res) { if (!arr_size) { cb(lst, res); return; } make_list(arr_size - 1, arr, &(node_t){.val = arr[arr_size - 1], .next = lst}, cb, res); } /* transform a linked list into an array */ void list2array(int_list_t list, CPS_Result res) { if (Nil == list) return; int32_t *array = res; array[0] = list->val; list2array(list->next, array + 1); } void reverse(int_list_t list, int_list_t rlist, /* reversed list */ ReversedListCallback const cb, CPS_Result res) { if (Nil == list) { cb(rlist, res); return; } reverse(list->next, &(node_t) K1, cb, res); } /* iterate over an array and performs action cb on each element */ void void_map_array(VoidMappable const cb, uint32_t const size, int32_t const *const arr) { if (!size) return; cb(arr[0]); void_map_array(cb, K2, K3); } ``` 程式輸出為 `5 10 15 75 20 80 85 90 95 99`，也就是將輸入的數值順序予以反轉，過程中不需要 malloc/free，請補完上述程式碼。 K1 = ? * `(a)` {.val = list->val, .next = list} * `(b)` {.val = list->val, .next = rlist} * `(c)` {.val = rlist->val, .next = list} * `(d)` {.val = rlist->val, .next = rlist} K2 = ? * `(a)` size - 1 * `(b)` size * `(c)` size + 1 K3 = ? * `(a)` arr - 1 * `(b)` arr * `(c)` arr + 1 :::success 延伸題目: 1. 解釋上述程式碼的運作，並且透過修改和實驗，理解 functional programming (FP) 風格; 2. 比照 FP，改寫上述程式碼，實作出 quick sort，同樣不得配置 heap memory (也就是不得用 malloc/free); 3. 測試這樣的 integer list 和 non-intrusive linked list 的效能差異; ::: #### 1. 解釋上述程式碼的運作，並且透過修改和實驗，理解 functional programming (FP) 風格首先必須先瞭解一下程式運作的邏輯，先看一下 reverse() 這個 function。他會將傳進去的 singly linked list 反轉後將結果轉成 array 存在 main() 中先宣告好的 res 中。而他的過程是透過巧妙的遞迴實作完成，下面以圖示說明： * 第一次呼叫時將原始的 lst 傳進去，紅色部分為指標處。 ### lst: ```graphviz digraph foo { rankdir=LR; node [shape=record]; a [label="{ <data> 99 | <ref> }", width=1.2, color=blue] b [label="{ <data> 95 | <ref> }"]; c [label="{ <data> 90 | <ref> }"]; d [label="..." shape=box]; e [label="{ <data> 5 | <ref> }",width=1.2]; a:ref:c -> b:ref [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false, arrowsize=1.2]; b:ref:c -> c:ref [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false]; c:ref:c -> d:ref [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false]; d->e :ref [arrowhead=dot, arrowtail=vee, dir=both, tailclip=false]; } ``` * 進入遞迴第一次時將 list->next 放在第一個參數傳了進去，而比較特別的地方是的第二個參數的部分直接將一個 node_t 型態的 linked list 初始化為 val = 目前紅色指標處的 value，next = NULL (rlist 一開始為 NULL)，也就是下一層遞迴裡的 rlist。 ### list: ```graphviz digraph foo { rankdir=LR; node [shape=record]; a [label="{ <data> 95 | <ref> }", width=1.2, color=red] b [label="{ <data> 90 | <ref> }"]; c [label="{ <data> 85 | <ref> }"]; d [label="..." shape=box]; e [label="{ <data> 5 | <ref> }",width=1.2]; a:ref:c -> b:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=1.2]; b:ref:c -> c:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false]; c:ref:c -> d [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false]; d->e } ``` ### rlist: ```graphviz digraph foo { rankdir=LR; node [shape=record]; a [label="{ <data> 99 | <ref> }", width=1.2, color=red] b [label="{ NULL }"]; a:ref:c -> b:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=1.2]; } ``` * 進第二層遞迴時又將 list->next 傳了進去，第二項參數的 rlist 的 val 初始化為 list->val ，也就是 95，next 再接回去原來傳進來的 rlist，就會變成下圖： ### list: ```graphviz digraph foo { rankdir=LR; node [shape=record]; a [label="{ <data> 90 | <ref> }", width=1.2, color=red] b [label="{ <data> 85 | <ref> }"]; c [label="{ <data> 80 | <ref> }"]; d [label="..." shape=box]; e [label="{ <data> 5 | <ref> }",width=1.2]; a:ref:c -> b:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=1.2]; b:ref:c -> c:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false]; c:ref:c -> d [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false]; d->e } ``` ### rlist: ```graphviz digraph foo { rankdir=LR; node [shape=record]; a [label="{ <data> 95 | <ref> }", width=1.2, color=red] b [label="{ <data> 99 | <ref> }"]; c [label="{ NULL }"]; a:ref:c -> b:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=1.2]; b:ref:c -> c:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=1.2]; } ``` * 一直到最後 rlist 就會是 list 的反轉了： ### rlist: ```graphviz digraph foo { rankdir=LR; node [shape=record]; a [label="{ <data> 5 | <ref> }", width=1.2, color=red] b [label="{ <data> 10 | <ref> }"]; c [label="..." shape=box]; d [label="{ <data> 99 | <ref> }"]; e [label="{ NULL }"]; a:ref:c -> b:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=1.2]; b:ref:c -> c:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=1.2]; c:ref:c -> d:data [arrowsize=1.2]; d:ref:c -> e [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, arrowsize=1.2]; } ``` * 終止條件成立時會呼叫一開始傳進去的 list2array() 並將 rlist 轉成 array 存在 res 中，完成 function。 * 故第一題的 K1 就是 ‘{.val = list->val, .next = rlist}’ ，而 void_map_array 是將 array 內的 data 印出來，K2 為遞迴終止條件的變數，故應為 size-1，K3 等同於 arr[i++] 的功能，因此為 arr+1。 * 一開始在看這份 code 時覺得很不習慣，所有的事情都從遞迴以及傳遞參數之間完成，每個功能都拆開來撰寫，上網搜尋了一下 [Functional Programming](https://en.wikipedia.org/wiki/Functional_programming) 各類相關資料，發現大致上有幾個特點： * 一個運算用的 function 大多只包含條件式以及遞迴呼叫。 * [higher-order functions](https://en.wikipedia.org/wiki/Higher-order_function) 主要有兩個特點： * function 可當作參數傳入 function 之中。 * function 可以 return function. * [lazy evaluation](https://en.wikipedia.org/wiki/Lazy_evaluation) ：當變數被定義時不會立即 evaluate，而是等待要使用時才 evaluate 並儲存已計算的結果，所以不會重複 evaluate 同一個定義。 * 不支援 flow Controls，像是迴圈以及 If-Else 條件判斷式，只使用 function 及 function 呼叫。 [Side Effect](https://en.wikipedia.org/wiki/Side_effect_(computer_science)) :::warning 關於 FP，可對照看之前報告 [MapReduce with POSIX Thread](https://hackmd.io/s/Hkb-lXkyg)，注意裡頭引用的論文 :notes: jserv ::: #### 2. 比照 FP，改寫上述程式碼，實作出 quick sort，同樣不得配置 heap memory (也就是不得用 malloc/free) 回憶一下 [quicksort](http://notepad.yehyeh.net/Content/Algorithm/Sort/Quick/Quick.php) 演算法的流程： * 選定數列中的一個值當作 pivot ，而選擇的方法可以是隨機選，也可以是挑出陣列的第一項，當然也可以使用 [median of medians](https://en.wikipedia.org/wiki/Median_of_medians) 確保每次選的 pivot 都是接近中位數的數，以降低 quicksort 遞迴次數。 * 從數列的頭到尾對每個數比較，若大於 pivot 則將其置於右方，剩下的則置於 pivot 左方。 * 對左右兩邊的數列分別都再做上述步驟(也就是遞迴)，一直到數列個數為 0 或 1 時結束遞迴。開始實作演算法的部份，由於 functional progamming 的程式碼有一個很大的特色是：當一個變數建立出來後就不可以在更動他的值，在 C語言中也就是 const 型態的變數，這樣的限制也讓程式碼寫起來相當的不習慣。我們先嘗試寫了一個 partition 的function: ```clike __list_strct partition( int_list_t cur, int_list_t llist, int_list_t pivot, int_list_t rlist) { if(Nil == cur) { __list_strct r = {.left = llist, .right = rlist}; return r; } /* Put the numbers which are larger than pivot into right. */ if(cur->val > pivot->val) { return partition(cur->next, llist, pivot, &(node_t){.val = cur->val, .next = rlist}); } return partition(cur->next, &(node_t){.val = cur->val, .next = llist}, pivot, rlist); } ``` 回傳值的型態是一個包含兩個 list 指標的 struct，目的是要呼叫 partition 這個 function 後可以拿到兩個分完堆的 list。 ```clike typedef struct list_strct { int_list_t left; int_list_t right; } __list_strct; ``` 然而這樣的作法出現了問題，儘管我們在 partition() 內測試了 function 內容都確定沒寫錯，但當 function return 後做下一步的執行就會出現難以預料的結果或是 segmentation fault，連多一個 printf 都會讓結果截然不同，發現我們犯了一個錯誤。先觀察一下這行程式碼： ```clike /* integer linked list type */ typedef struct __int_list const *const int_list_t; ``` 他的意思是定義了一個 int_list_t 的宣告方式，他可以宣告一個本身是 const 的指標並且指向一個 const 的變數。也就是說如果我們將他當作一個 function 的宣告，則這個 function 就可以被視為是一個 const 的變數，也就是回傳的結果必須固定不變，然而如果每次迴傳都式不同的變數時就會發生錯誤。這裡也讓我們想起 functional programming 的初衷，因此繼續修改程式碼。 :::danger 補完程式碼！ :notes: jserv ::: 11/12更新 ### 關於functional programming的程式設計風格簡單來說，其實和數學的函式有一些關係，亦即在我們的輸入和輸出都不會被干擾的情況下，輸入同一個函式會出現相同的規律，例如：5輸入一個f(x)的函式得到的數值是10，那麼輸入一個15的數值就要得到30，而不是其他數值。而且不會受到side effect的影響，若我們過程中想要改變我們的輸入是不行的，除非我們使用copy(call by value)的方式，但設想一個問題，如果今天我們要將一個內含26個元素的陣列其中一個元素作更改，可行辦法有複製一整個陣列後，然後再做更改。但這種作法會增加記憶體空間和降低效率，但若是透過一些資料結構，例如：tree或是link list等等....結構而且functional programming是一旦執行了就算有比它執行更快的函式出現它依舊會跑完不受影響。 :::info 搭配閱讀: [C 語言返回堆疊空間的結構體](http://www.zenlife.tk/return-struct-without-malloc.md) 需要釐清大量 callback 之後的記憶體配置方式 :notes: jserv ::: ### 程式邏輯