2021q1 Homework1 (quiz1)

# 2021q1 Homework1 (quiz1) contributed by < `secminhr` > > [題目](https://hackmd.io/@sysprog/linux2021-quiz1) ## 問題 ### 1. LLL = ? 函式： ```cpp static inline void list_concat(node_t **left, node_t *right) { while (*left) LLL; *left = right; } ``` 用法： ```cpp node_t *result = NULL; list_concat(&result, left); CCC; *list = result; ``` 從函式的名稱跟使用我們就可以知道這個 `list_concat` 的用處是把兩個 linked list 串接在一起。 <s>作答的選項： - `(a)` `left = left->next` - `(b)` `left = (*left)->next` - `(c)` `left = &((*left)->next)` - `(d)` `*left = (*left)->next` `(a)` 肯定不對因為 `node_t *` 沒有 `next` 這個成員。 `(b)` 也不對因為 `left` 的型態是 `node_t **` 而 `next` 的型態是 `node_t *`</s> <s>剩下 `(c)` 和 `(d)` ，看看哪個能達成目標。</s> :::danger 選擇題只是為了授課教師批改的便利，你不該拘泥在選項，否則就跟高中生無異。相反地，你該思考：若這題要你憑空撰寫程式碼，你會怎麼做？ :notes: jserv ::: :::info 下方將直接構造 LLL ::: 假設 `list_concat` 的輸入 **left** ```graphviz digraph LLLleft { rankdir=LR a[shape=record, label="<h>Data1|<n>next"] b[shape=record, label="<h>Data2|<n>next"] a:n -> b:h NULL[shape=box] b:n -> NULL result[shape=box] result->a:h left[shape=box] left->result } ``` **right** ```graphviz digraph LLLright { rankdir=LR a[shape=record, label="<h>Data3|<n>next"] NULL[shape=box] a:n -> NULL right[shape=box] right -> a:h } ``` 根據 `list_concat` 的串接部份 ```c *left = right; ``` 為了能讓這行把兩個 list 串接起來，在這行執行之前，left 的圖應為： ```graphviz digraph LLLleft { rankdir=LR a[shape=record, label="<h>Data1|<n>next"] b[shape=record, label="<h>Data2|<n>next"] a:n -> b:h NULL[shape=box] b:n -> NULL result[shape=box] result->a:h left[shape=box] left->b:n } ``` 注意到這也符合 `while (*left)` 的停止情況可以推斷是 `while` 把原本的圖變成現在的樣子把整個過程畫出來就是 1. ```graphviz digraph LLLleft { rankdir=LR a[shape=record, label="<h>Data1|<n>next"] b[shape=record, label="<h>Data2|<n>next"] a:n -> b:h NULL[shape=box] b:n -> NULL result[shape=box] result->a:h left[shape=box] left->result } ``` 2. ```graphviz digraph LLLleft { rankdir=LR a[shape=record, label="<h>Data1|<n>next"] b[shape=record, label="<h>Data2|<n>next"] a:n -> b:h NULL[shape=box] b:n -> NULL result[shape=box] result->a:h left[shape=box] left->a:n } ``` 3. ```graphviz digraph LLLleft { rankdir=LR a[shape=record, label="<h>Data1|<n>next"] b[shape=record, label="<h>Data2|<n>next"] a:n -> b:h NULL[shape=box] b:n -> NULL result[shape=box] result->a:h left[shape=box] left->b:n } ``` 於是 LLL 便能被構造出來： ``` left 應指向： left 指標指向的指標指向的物件的成員 next ``` 也就是 ```c left = &((*left)->next) ``` 答案為 `(c)` ### 2. AAA = ? AAA 的位置在 `quicksort` 的中段出現： ```c node_t *pivot = *list; int value = pivot->value; node_t *p = pivot->next; pivot->next = NULL; node_t *left = NULL, *right = NULL; while (p) { node_t *n = p; p = p->next; list_add_node_t(n->value > value ? AAA : BBB, n); } ``` 看一下 `list_add_node_t` ，可以看出來 `list_add_node_t` 的作用是把一個 node 加在一個 list 的頭部。 ```c static inline void list_add_node_t(node_t **list, node_t *node_t) { node_t->next = *list; *list = node_t; } ``` 我們知道 quicksort 的作法就是 1. 選出一個 pivot 2. 把所有元素跟 pivot 的值相比，比 pivot 小的放一邊，比 pivot 大的放另一邊 3. 兩邊分別 sort 4. 以小、 pivot 、大的順序連在一起因為看到了一個比較，表示這裡在做的是第二步。而 left 和 right 則代表兩邊。問題是：大的應該放在 right 還是 left ？我們需要看結合的順序來解決。因為題目說會依據遞增順序，所以第一個結合的肯定要放比 pivot 小的。根據 `quicksort` 的最後幾行： ```c node_t *result = NULL; list_concat(&result, left); CCC; *list = result; ``` 知道小的放在 left ，大的放在 right 因此， ``` AAA = &right BBB = &left ``` 答案為 `(e)` ### 3. BBB = ? 說明如上，答案為 `(c)` ### 4. CCC = ? CCC 出現在 `quicksort` 的最後： ```c node_t *result = NULL; list_concat(&result, left); CCC; *list = result; ``` 因為要從小到大，所以 CCC 的位置應放： ```c list_concat(&result, pivot); list_concat(&result, right); ``` 答案為 `(b)` ## 運作原理 ### 測試部份 #### list_is_ordered ```c static bool list_is_ordered(node_t *list) { bool first = true; int value; while (list) { if (first) { value = list->value; first = false; } else { if (list->value < value) return false; value = list->value; } list = list->next; } return true; } ``` 這個函式將回傳 `list` 是否依照由小到大的順序。先看 `while` 部份 ```cpp while (list) { ... list = list->next; } ``` `while` 迴圈在 `list` 不為 NULL 時，會嘗試走訪整個串列。看看 `while` 的其他部份 ```c if (first) { value = list->value; first = false; } else { if (list->value < value) return false; value = list->value; } ``` 可以知道當 `first` 為真的時候（也就是 `list` 指向串列頭部時），會設定 `value` 的值為 `list->value` 。在 `first` 不為真的時候，會比較當前的 `list->value` 和 `value` ，如果 `list->value` 比 `value` 小則馬上回傳 `false` 。透過這個行為跟下面的 `value = list->Value` 更新可以判斷， `value` 放的值會是前一個節點的 `value` 。然後是放在 `while` 迴圈外的 `return true;` 此時已經走訪完整個串列，如果沒有因為中間的 `return false;` 而離開，就代表整個串列是由小到大的順序。 #### list_display ```c static void list_display(node_t *list) { printf("%s IN ORDER : ", list_is_ordered(list) ? " " : "NOT"); while (list) { printf("%d ", list->value); list = list->next; } printf("\n"); } ``` 這個函式的目的就是把整個串列印出來，並且第一行會說明是否有按照由小到大的順序。首先 ```c printf("%s IN ORDER : ", list_is_ordered(list) ? " " : "NOT"); ``` 用了 `list_is_ordered` 來判斷，如果有按照順序就印出 IN ORDER : ，沒有的話就印出 NOT IN ORDER : 。接下來的 `while` 是在走訪串列並把每個元素的 value 印出來。 ```c while (list) { printf("%d ", list->value); list = list->next; } ``` 最後 `printf("\n");` 換行。 #### main main 裡面有使用一些題目中沒出現的函式，我們可以用上下文跟語意來推斷函式的功用。 ```c int main(int argc, char **argv) { size_t count = 20; node_t *list = NULL; while (count--) list = list_make_node_t(list, random() % 1024); list_display(list); quicksort(&list); list_display(list); if (!list_is_ordered(list)) return EXIT_FAILURE; list_free(&list); return EXIT_SUCCESS; } ``` 首先是亂數生成 list 的部分： ```c size_t count = 20; node_t *list = NULL; while (count--) list = list_make_node_t(list, random() % 1024); ``` 這會生成一個有二十個元素的串列。接下來這三行，會把經過 `quicksort` 之前和之後的 `list` 印出來。 ```c list_display(list); quicksort(&list); list_display(list); ``` 最後，如果沒有按照順序就回傳 `EXIT_FAILURE` ，反之回收 list 並回傳 `EXIT_SUCCESS` 。 `EXIT_FAILURE` 跟 `EXIT_SUCCESS` 是定義在 stdlib 裡面。這裡似乎有 [memory leak 的情況](https://hackmd.io/0_I7IwOFQHW_CMOqwLkSdA?both#Memory-leak-%E5%95%8F%E9%A1%8C)（沒有按照順序的時候）。 ```c if (!list_is_ordered(list)) return EXIT_FAILURE; list_free(&list); return EXIT_SUCCESS; ``` ### 排序部分 #### [list_concat](https://hackmd.io/0_I7IwOFQHW_CMOqwLkSdA#1-LLL--) #### list_add_node_t 意圖很明顯是把 `node_t` 加在頭部 ```c static inline void list_add_node_t(node_t **list, node_t *node_t) { node_t->next = *list; *list = node_t; } ``` 參數： ```graphviz digraph list_add_node_t_1 { rankdir=LR a[shape=record, label="<h>Data1|<n>"] b[shape=record, label="<h>Data2|"] a:n -> b:h h[shape=box, label=""] h -> a:h list[shape=box] list -> h node_t[shape=box] new[shape=record, label="<h>New data|<n>"] node_t -> new:h } ``` `node_t->next = *list;` ： ```graphviz digraph { rankdir=LR a[shape=record, label="<h>Data1|<n>"] b[shape=record, label="<h>Data2|"] a:n -> b:h h[shape=box, label=""] h -> a:h list[shape=box] list -> h node_t[shape=box] new[shape=record, label="<h>New data|<n>"] node_t -> new:h new:n -> a:h } ``` `*list = node_t;` ```graphviz digraph { rankdir=LR a[shape=record, label="<h>Data1|<n>"] b[shape=record, label="<h>Data2|"] a:n -> b:h h[shape=box, label=""] list[shape=box] list -> h node_t[shape=box] new[shape=record, label="<h>New data|<n>"] node_t -> new:h new:n -> a:h h -> new:h } ``` #### quicksort ```c void quicksort(node_t **list) { if (!*list) return; node_t *pivot = *list; int value = pivot->value; node_t *p = pivot->next; pivot->next = NULL; node_t *left = NULL, *right = NULL; while (p) { node_t *n = p; p = p->next; list_add_node_t(n->value > value ? AAA : BBB, n); } quicksort(&left); quicksort(&right); node_t *result = NULL; list_concat(&result, left); CCC; *list = result; } ``` 1. base case `*list` 為 NULL 的時候當然不用排。 ```c if (!*list) return; ``` 2. pivot 這裡的 pivot 選擇的是最頭部。 ```c node_t *pivot = *list; ``` 3. setup 為了接下來把 `*list` 分成 `left` 和 `right` 做準備。 ```c int value = pivot->value; node_t *p = pivot->next; pivot->next = NULL; node_t *left = NULL, *right = NULL; ``` - `value` 將作為放在 `left` 或者 `right` 的判斷基準 - `p` 將作為走訪 `*list` 的變數 4. partition ```c while (p) { node_t *n = p; p = p->next; list_add_node_t(n->value > value ? AAA : BBB, n); } ``` 這裡的 `p = p->next` 設定走訪節點的指標，`list_add_node_t` 的說明請看[第二、三題](https://hackmd.io/0_I7IwOFQHW_CMOqwLkSdA#2-AAA--)。結束之後，`*list` 將被拆成三個部分：`left` 、 `right` 和 `pivot` 。 5. sort & concat ```c quicksort(&left); quicksort(&right); node_t *result = NULL; list_concat(&result, left); CCC; *list = result; ``` 將 `left` 和 `right` 都 sort 過後，將 `left` 、 `pivot` 和 `right` 連接在一起，最後修改 `*list` 為排序好的結果。 [`list_concat` 的說明](https://hackmd.io/0_I7IwOFQHW_CMOqwLkSdA#1-LLL--) [CCC 部分的說明](https://hackmd.io/0_I7IwOFQHW_CMOqwLkSdA?both#4-CCC--) ## Random 問題 ### 補全要讓程式能夠跑起來，我們必須給出 [`list_make_node_t`](https://github.com/secminhr/quiz1/blob/master/main.c#L19) 和 [`list_free`](https://github.com/secminhr/quiz1/blob/master/main.c#L6) 。 ### 問題多跑幾次就能發現，產出的 `list` 是一樣的。（甚至跟範例也一樣） ``` NOT IN ORDER : 1016 84 706 124 326 483 763 498 939 186 205 809 236 74 255 81 115 105 966 359 IN ORDER : 74 81 84 105 115 124 186 205 236 255 326 359 483 498 706 763 809 939 966 1016 ``` 看看 `random` 的說明（節錄關於 `random` 的部份）： ``` RANDOM(3) Linux Programmer's Manual RANDOM(3) NAME random, srandom, initstate, setstate - random number generator SYNOPSIS #include <stdlib.h> long int random(void); void srandom(unsigned int seed); ...... DESCRIPTION The random() function uses a nonlinear additive feedback random number generator employing a default ta‐ ble of size 31 long integers to return successive pseudo-random numbers in the range from 0 to RAND_MAX. The period of this random number generator is very large, approximately 16 * ((2^31) - 1). The srandom() function sets its argument as the seed for a new sequence of pseudo-random integers to be returned by random(). These sequences are repeatable by calling srandom() with the same seed value. If no seed value is provided, the random() function is automatically seeded with a value of 1. ...... ``` 其中有說到這是偽隨機，並且種子由 `srandom` 提供。如果沒有給定，將使用 1 作為種子。所以這裡的目標變成我們要在每次執行時給出不同的種子。常見的作法之一是利用 `time` 函式： ``` TIME(2) Linux Programmer's Manual TIME(2) NAME time - get time in seconds SYNOPSIS #include <time.h> time_t time(time_t *tloc); DESCRIPTION time() returns the time as the number of seconds since the Epoch, 1970-01-01 00:00:00 +0000 (UTC). If tloc is non-NULL, the return value is also stored in the memory pointed to by tloc. RETURN VALUE On success, the value of time in seconds since the Epoch is returned. On error, ((time_t) -1) is returned, and errno is set appropriately. ``` 在 `main` 裡面加上： ```c srandom(time(NULL)); ``` 因為時間是一直變動的，所以 `time` 應該能給出不一樣的值。但是當快速多次執行的時候，因為 `time` 給出的數字是秒數，又會出現相同的狀況。 ```shell $ ./main NOT IN ORDER : 551 592 812 815 206 174 16 118 603 73 208 502 655 543 829 80 299 955 65 443 IN ORDER : 16 65 73 80 118 174 206 208 299 443 502 543 551 592 603 655 812 815 829 955 $ ./main NOT IN ORDER : 551 592 812 815 206 174 16 118 603 73 208 502 655 543 829 80 299 955 65 443 IN ORDER : 16 65 73 80 118 174 206 208 299 443 502 543 551 592 603 655 812 815 829 955 $ ./main NOT IN ORDER : 551 592 812 815 206 174 16 118 603 73 208 502 655 543 829 80 299 955 65 443 IN ORDER : 16 65 73 80 118 174 206 208 299 443 502 543 551 592 603 655 812 815 829 955 ``` 只要在一秒內多次執行就能重現這個狀況。我們可以借用 ASLR 的特性來解決這個問題。 ASLR (Address Space Layout Randomization) 會讓程式被載入到隨機的位置。可以利用這點來每次提供不同的種子。 ```c srandom((unsigned int) main); ``` ## Memory leak 問題當 `main` 在要退出的時候，似乎有 memory leak 的情況： ```c if (!list_is_ordered(list)) return EXIT_FAILURE; list_free(&list); return EXIT_SUCCESS; ``` 當沒有排序好的時候 `list` 將不會被回收，`main` 將直接回傳。我們使用 valgrind 來看看是否真的存在 memory leak 的情況。排序正確時： ``` ==12901== Memcheck, a memory error detector ==12901== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==12901== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info ==12901== Command: ./main ==12901== NOT IN ORDER : 1022 873 741 965 408 371 15 940 497 844 982 679 94 307 955 1002 350 95 1022 494 IN ORDER : 15 94 95 307 350 371 408 494 497 679 741 844 873 940 955 965 982 1002 1022 1022 ==12901== ==12901== HEAP SUMMARY: ==12901== in use at exit: 0 bytes in 0 blocks ==12901== total heap usage: 21 allocs, 21 frees, 1,344 bytes allocated ==12901== ==12901== All heap blocks were freed -- no leaks are possible ==12901== ==12901== For lists of detected and suppressed errors, rerun with: -s ==12901== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) ``` 無排序時： ``` ==12954== Memcheck, a memory error detector ==12954== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==12954== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info ==12954== Command: ./main ==12954== NOT IN ORDER : 1022 873 741 965 408 371 15 940 497 844 982 679 94 307 955 1002 350 95 1022 494 NOT IN ORDER : 1022 873 741 965 408 371 15 940 497 844 982 679 94 307 955 1002 350 95 1022 494 ==12954== ==12954== HEAP SUMMARY: ==12954== in use at exit: 320 bytes in 20 blocks ==12954== total heap usage: 21 allocs, 1 frees, 1,344 bytes allocated ==12954== ==12954== LEAK SUMMARY: ==12954== definitely lost: 16 bytes in 1 blocks ==12954== indirectly lost: 304 bytes in 19 blocks ==12954== possibly lost: 0 bytes in 0 blocks ==12954== still reachable: 0 bytes in 0 blocks ==12954== suppressed: 0 bytes in 0 blocks ==12954== Rerun with --leak-check=full to see details of leaked memory ==12954== ==12954== For lists of detected and suppressed errors, rerun with: -s ==12954== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) ``` 可以發現在沒有排序的情況下確實有 memory leak 的情況。於是在 `list_is_ordered` 為 `false` 的時候我們也加入 `list_free` ```c if (!list_is_ordered(list)) { list_free(&list); return EXIT_FAILURE; } list_free(&list); return EXIT_SUCCESS; ``` 再用 valgrind 看看 ``` ==13042== Memcheck, a memory error detector ==13042== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==13042== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info ==13042== Command: ./main ==13042== NOT IN ORDER : 1022 873 741 965 408 371 15 940 497 844 982 679 94 307 955 1002 350 95 1022 494 NOT IN ORDER : 1022 873 741 965 408 371 15 940 497 844 982 679 94 307 955 1002 350 95 1022 494 ==13042== ==13042== HEAP SUMMARY: ==13042== in use at exit: 0 bytes in 0 blocks ==13042== total heap usage: 21 allocs, 21 frees, 1,344 bytes allocated ==13042== ==13042== All heap blocks were freed -- no leaks are possible ==13042== ==13042== For lists of detected and suppressed errors, rerun with: -s ==13042== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) ``` 可以發現已經沒有剛才的情況了。最後，移除重複。 ```c int return_val = list_is_ordered(list) ? EXIT_SUCCESS : EXIT_FAILURE; list_free(&list); return return_val; ``` --- ## non-recursive sort :::warning TODO: 描述你的策略和考量 :notes: jserv ::: ```c= void quicksort(node_t **list) { #define MAX_LEVELS 1000 node_t **list_arr[MAX_LEVELS] = { list }; node_t *l_arr[MAX_LEVELS], *r_arr[MAX_LEVELS], *piv_arr[MAX_LEVELS]; bool left_sorted[MAX_LEVELS], right_sorted[MAX_LEVELS]; int i = 0; while (i >= 0) { if (*list_arr[i]) { if (left_sorted[i]) { if (right_sorted[i]) { goto CONCAT; } else { goto SORT_RIGHT; } } PAIRTITION: { node_t *left = NULL, *right = NULL, *pivot = *list_arr[i]; int value = pivot->value; node_t *p = pivot->next; pivot->next = NULL; while (p) { node_t *n = p; p = p->next; list_add_node_t(n->value > value ? &right : &left, n); } l_arr[i] = left; r_arr[i] = right; piv_arr[i] = pivot; } SORT_LEFT: { left_sorted[i] = true; left_sorted[i+1] = false; right_sorted[i+1] = false; list_arr[i+1] = &l_arr[i]; i++; continue; } SORT_RIGHT: { right_sorted[i] = true; left_sorted[i+1] = false; right_sorted[i+1] = false; list_arr[i+1] = &r_arr[i]; i++; continue; } CONCAT: { node_t *result = NULL; list_concat(&result, l_arr[i]); list_concat(&result, piv_arr[i]); list_concat(&result, r_arr[i]); *list_arr[i] = result; i--; } } else { i--; } } *list = *list_arr[0]; } ``` 很大部份的邏輯跟原來的 quicksort 是一樣的，所以只說明取代遞迴的部份。在使用遞迴的時候，函式的 stack 和跳躍都是自動處理的。為了作到非遞迴，我們只好自己處理。 ### 模仿 stack 可以發現非遞迴版本多了許多的陣列。這些陣列是為了模仿本來在函式呼叫所使用的 stack ，以此為用途的陣列包含 `list_arr` `l_arr` `r_arr` `piv_arr` 。另外，控制陣列讀取寫入的變數 `i` 則是扮演類似 bp 的角色。在原本函式呼叫的地方，我們將 `i` 加一（例如上面的 37 行、 45 行）相對的在退出函式的地方，將 `i` 減一。（例如上面的 54 行、 57 行） ### 模仿跳躍另外一個要克服的就是跳躍。原本使用函式呼叫的時候， stack 會有函式結束後要跳躍到哪個位置的資訊。在這份作法裡，我們使用布林陣列來紀錄是否曾經到達過某處，（例如 33 行、 41 行）並以此為依據進行跳躍。（ 10~16 行）