2019q1 Homework1 (list)

# 2019q1 Homework1 (list) contributed by <`bauuuu1021`> ## [作業要求](https://hackmd.io/s/S12jCWKHN) ### merge sort * 從 [linux-list](https://github.com/sysprog21/linux-list) 學習裡頭的技巧，包含裡頭 unit test 的設計，透過給定的 linked list 操作，實作出 merge sort * 附上你的修改過程，也需要讓 `qtest` 得以支援 * 可將 [dict](https://hackmd.io/s/B1Bq5Jat7) 裡頭的測試資料拿來作效能分析的輸入 * 思考提升排序效率的方法，需要做平均和最壞狀況的分析 * 平均效能分析 (用 `random_shuffle_array()` 所產生的數列) * 實驗結果圖形如下(fig1) ![](https://i.imgur.com/f43EsdP.png) * 特例效能分析 * 根據過去在演算法所學，這三種 sorting algorithm 之最佳及最差時間複雜度如下 | 排序名稱 | 最佳 | 平均 | 最差 | | ---------- | -------- | -------- | --| | insert sort| $O(n)$ | $O(n^2)$ |$O(n^2)$ | | quick sort | $O(nlgn)$ | $O(nlgn)$ | $O(n^2)$ | | merge sort | $O(nlgn)$ | $O(nlgn)$ | $O(nlgn)$ | * 而當 input data 為 1,2,3,... (sequential data)時，恰為 insert sort 之 best case 及 quick sort 之 worst case * 新增 macro `sequential_array()`，並利用 `#if` 及 `#elif` 搭配 `#define select` 即可選擇使用哪種 input data： ```C #if select random_shuffle_array(values, (uint16_t) ARRAY_SIZE(values)); #elif 1-select sequential_array(values, (uint16_t) ARRAY_SIZE(values)); #endif ``` * 實驗結果圖形如下(fig2) ![](https://i.imgur.com/1zkqIVe.png) * 結論： * 令人意外的是在 sequential input 的情況下， best case of insertion sort 雖然為 $O(n)$ ，但還是比 avg. case of merge sort ($O(nlgn)$) 高不少 (參照 fig2) * insertion sort 在不同型態 input 之比較 ![](https://i.imgur.com/8XJReGa.png) 可發現 sequential input 時所需時間明顯下降 * quick sort 在不同型態 input 之比較 ![](https://i.imgur.com/KUhTQX1.png) 可發現在 sequential input 時所需時間大幅提昇 * merge sort 在不同型態 input 之比較 ![](https://i.imgur.com/d5tdDzm.png) 可發現兩種資料型態所需的時間差不多 * 以下引述 jserv 老師在社團的發文 >在 Linux 核心中，建議採用 heap sort 或 merge sort，有個重要的考量是，許多要排序的原始序列，已是幾乎排序好的 (例如個別 process 佔用 CPU 的比例，原本已排序好，只是新增新的 process 後，又需要重排序)，或是期望順序剛好相反，這對於 quick sort 都是很大的衝擊。 ### 自我檢查事項 * 為何 Linux 採用 macro 來實作 linked list？一般的 function call 有何成本？ * macro 是在編譯前就展開成定義的內容，所以在執行時可以省去跳躍到函式的時間。但缺點為如果多次呼叫會佔用較多的記憶體空間(code section)。 * function 則是不展開函式定義的內容，待執行時才跳到 function 處執行，優點是不論被呼叫多少次都只有一份實體，缺點為每次執行都需要額外跳躍，增加執行所需時間。 * 除了 macro 及 function ，我也發現在 list.h 中大量的使用 inline * 根據 C 語言規格書 >Making a function an inline function suggests that calls to the function be as fast as possible. The extent to which such suggestions are effective is implementation-defined. (6.7.4 Function specifiers) :::info 在 header 中使用 `inline` 時，常搭配 `static` 出現，這樣的 `static inline` 函式搭配編譯器最佳化，則有 macro 的效果，並得以進行參數形態檢查 :notes: jserv ::: * 使用 **inline** 這個 keyword 可以要求 compiler 執行最佳化 (i.e. inline expansion) ，但 compiler 可以決定是否採納。 * 注意在 C89 上不支援！ * Linux 應用 linked list 在哪些場合？舉三個案例並附上對應程式碼，需要解說，以及揣摩對應的考量 * GNU extension 的 [typeof](https://gcc.gnu.org/onlinedocs/gcc/Typeof.html) 有何作用？在程式碼中扮演什麼角色？ * typeof(x) 會回傳 x 的資料型態 * 在 [typeof 在 kernel 的使用](http://deltamaster.is-programmer.com/posts/37253.html) 有淺顯易懂的例子解說 ```C= int a; typeof(a) b; // equals int b; typeof(&a) c; // equals int *c; ``` * 以下程式碼擷取自 [include/linux/typecheck.h](https://github.com/torvalds/linux/blob/master/include/linux/typecheck.h) ```C=5 /* * Check at compile time that something is of a particular type. * Always evaluates to 1 so you may use it easily in comparisons. */ #define typecheck(type,x) \ ({ type __dummy; \ typeof(x) __dummy2; \ (void)(&__dummy == &__dummy2); \ 1; \ }) /* * Check at compile time that 'function' is a certain type, or is a pointer * to that type (needs to use typedef for the function type.) */ #define typecheck_fn(type,function) \ ({ typeof(type) __tmp = function; \ (void)__tmp; \ }) ``` * 呼叫 typecheck(type,x) 時，如果 x 的資料型態與 type 不相符的話就會跳出 warning 訊息 * 根據 C 語言規格書，使用 `==` 或 `!=` 必須滿足下列四種條件之一 >— both operands have arithmetic type — both operands are pointers to qualified or unqualified versions of compatible types — one operand is a pointer to an object or incomplete type and the other is a pointer to a qualified or unqualified version of void — one operand is a pointer and the other is a null pointer constant (6.5.9 Equality operators)    其中對於 **compatible type** 定義如下 > Two types have compatible type if their types are the same > (6.2.7 Compatible type and composite type) * 因此當 x 的資料型態與 type 不相符時，例如： ```C int x; typecheck(float, x); ``` * 執行結果 ```shell warning: comparison of distinct pointer types lacks a cast (void)(&__dummy == &__dummy2); ``` * 這是因為 `_dummy` 及 `_dummy2` 不是一樣的 data type，所以此指向這兩者的 pointer 不是 compatible type * 在 typecheck_fn() 中，如果 function 的 return type 與參數 type 不相符的話一樣會跳出 warning * 例如： ```C ... int add (int a, int b) { return a+b; } int main () { typecheck_fn(char(*)(), add); } ``` * 會有以下執行結果 ```shell warning: initialization from incompatible pointer type [-Wincompatible-pointer-types] typecheck_fn(char(*)(), add); ``` * 這是因為在 macro `typecheck_fn` 中將 add() 的 return value **(int)** 指派給 **(char)** * 解釋以下巨集的原理 ```Clike #define container_of(ptr, type, member) \ __extension__({ \ const __typeof__(((type *) 0)->member) *__pmember = (ptr); \ (type *) ((char *) __pmember - offsetof(type, member)); \ }) ``` * 巨集中 `offsetof()` 的定義如下 ``` #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER) ``` * 根據 [container_of 巨集](http://adrianhuang.blogspot.tw/2010/01/linux-kernel-containerof.html) 中的解釋，`offsetof()` 可以算出某 struct member 與 &struct 相對位址，例如： ```C= #include <stdio.h> #include <stddef.h> typedef struct { char name[16]; int age; char id[12]; } idCard; int main () { idCard myData = {"bauuuu1021",22,"f74041145"}; printf("offset of id %ld\n", offsetof(idCard,id)); //output 20 } ``` * 因此 pointer to the member - offset of the memeber in the struct ，就可以得到 struct 的位址 ```C=17 idCard myData = {"bauuuu1021",22,"f74041145"}; printf("addr. of id \t%p\n", &myData.id); printf("offset of id \t%ld\n", offsetof(idCard, id)); printf("(addr. of id)-offset\t%p\n", container_of(&(myData.id), idCard, id)); printf("addr. of struct \t%p\n", &myData); ``` 可以得到以下輸出 ```shell addr. of id 0x7fff1c42a064 offset of id 20 (addr. of id)-offset 0x7fff1c42a050 addr. of struct 0x7fff1c42a050 ``` 得到 struct address 後可以使用它進行其他操作 ```C=22 printf("name is %s\n", container_of(&(myData.id), idCard, id)->name); // output bauuuu1021 ``` * 除了你熟悉的 add 和 delete 操作，`list.h` 還定義一系列操作，為什麼呢？這些有什麼益處？ * 將一些常用操作定義為 macro 或 function，可以避免每次需要這些操作都要重新實作，同時也提高易讀性 * `LIST_POISONING` 這樣的設計有何意義？ * list.h 中的 list_del() : ```C=129 /* ... * LIST_POISONING can be enabled during build-time to provoke an invalid memory * access when the memory behind the next/prev pointer is used after a list_del. * This only works on systems which prohibit access to the predefined memory * addresses. */ ... #ifdef LIST_POISONING node->prev = (struct list_head *) (0x00100100); node->next = (struct list_head *) (0x00200200); #endif ``` * 對特定 node 呼叫 list_del() 時，已經將此 node 從 list 中移除，因此存取 `node->prev` 及 `node->next` 是非法的行為 * 因此執行 list_del() 時可以將 `node->prev` 及 `node->next` 指向不可存取的位址，之後在沒有賦予新值的情況下存取到 `node->prev` 或 `node->next` 就會被視為非法操作而發出 exception * linked list 採用環狀是基於哪些考量？ * 可從兩個面向來考量這樣的設計: * singly or doubly * doubly linked list 雖然會花費較多的儲存空間，但相較於他在實作上的便利性，空間的差別顯的微不足道。 * 例如：刪除一個以 pointer 指向的節點時，使用 singly 必須從頭開始 trace 到該節點，才能進行刪除的調整，費時 O(n)；而使用 doubly 可以在 O(1) 內完成 * circular or non-circular * 雖然多數的狀況其實是使用 non-circular，但其實 circular 也沒有多使用其他資源，只是將尾節點指向頭 * circular doubly-linked list 擁有四種不同情況的所有優點，且沒有顯著的空間差別，因此 linux 採用這種設計 * 什麼情境會需要對 linked list 做排序呢？舉出真實世界的應用，最好在作業系統核心出現 * 什麼情境會需要需要找到 [第 k 大/小元素](https://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array/) 呢？又，你打算如何實作？ * * `list_for_each_safe` 和 `list_for_each` 的差異在哪？"safe" 在執行時期的影響為何？ * for_each 風格的開發方式對程式開發者的影響為何？ * Perl 和 Python 中都有 `foreach` 這種可針對整個 array(list) 操作的語法 * 例如(Perl)： ```perl @array = (1,2,3,4,5); foreach my $i (@array) { $i = $i*$i; } print "@array\n"; #output 1 4 9 16 25 ``` * 程式註解裡頭大量存在 `@` 符號，這有何意義？你能否應用在後續的程式開發呢？ >原本根據提示以為 linux 也是利用 Doxygen 來產生文件，但是安裝並執行後卻沒有找到預期的結果文件。後來參考 [0xff07](https://hackmd.io/s/Hy-Fdpy84#%E6%96%87%E4%BB%B6%E7%94%9F%E6%88%90) 同學的共筆，發現原本了理解有誤。[name=bauuuu1021][color=red] * 在 [How to write kernel documentation](https://www.kernel.org/doc/html/v4.11/doc-guide/sphinx.html#) 有以下說明 >The Linux kernel uses **Sphinx** to generate pretty documentation from reStructuredText files under Documentation. To build the documentation in HTML or PDF formats, use make htmldocs or make pdfdocs. The generated documentation is placed in Documentation/output. * 所以 linux kernel 使用的是 Sphinx 而非 Doxygen * [Sphinx quick-start](https://matplotlib.org/sampledoc/_images/basic_screenshot.png) * `tests/` 目錄底下的 unit test 的作用為何？就軟體工程來說的精神為何？ * 檢驗 list.h 中的 macro 是否運作正確，如果運作失敗或結果有誤會在 assert() 被擋下來 * 根據維基百科對 unit test 的定義： >In computer programming, unit testing is a software testing method by which individual units of source code, sets of one or more computer program modules together with associated control data, usage procedures, and operating procedures, are tested to **determine whether they are fit for use** * 所以 unit test 是為了確保這段程式碼是正確並可達到預期效用的，如果開發到一個階段沒有經過測試並修改錯誤，可能造成往後開發成果不如預期 * `tests/` 目錄的 unit test 可如何持續精進和改善呢？ ## 參考資料 * [2018 spring HW](https://hackmd.io/s/BJIK447of) * [Github link](https://github.com/bauuuu1021/linux-list) * [gnuplot 教學](https://hackmd.io/c/rJKbX1pFZ/https%3A%2F%2Fhackmd.io%2Fs%2FSkwp-alOg) * [C standard manual](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf) * [Why are all the linked lists circular in the Linux Kernel?](https://www.quora.com/Why-are-all-the-linked-lists-circular-in-the-Linux-Kernel) * [2019 Spring Homework 1 作業區](https://hackmd.io/s/SJ4kPZYS4) ###### tags: `bauuuu1021`, `sysprog`,`2019spring`