2018q1 Homework 3 (list)

# 2018q1 Homework 3 (list) contributed by <`piggyking1421`> ## 生日悖論 * **如何透過近似的運算式，估算上述機率計算呢？附上有效的 LaTeX 算式** 首先想到的方式是利用之前看過的 [Poisson distribution](https://www.umass.edu/wsp/resources/poisson/)，Poisson distribution 適用的條件為下： **(1) The event is something that can be counted in whole numbers** **(2) Occurrences are independent, so that one occurrence neither diminishes nor increases the chance of another** **(3) The average frequency of occurrence for the time period in question is know** **(4) It is possible to count how many events have occurred** 而 Poisson distribution 的公式為： $P(k)= e^{-\lambda} \cdot \dfrac{\lambda^k}{k!}$ $\lambda$為單位時間內事件的平均發生次數 $k$ 為隨機變數我們關注的是有兩人同一天生日這個 event，符合條件 1;且兩兩同一天生日的機率是獨立的，符合條件 2;單位時間內事件的平均發生機率 $\lambda$ 可以由人數排列組合和一年有365天求得，符合條件 3;有多少個兩人同一天生日是可數的，符合條件 4 。我們要求的是至少有兩人同一天生日的機率，也就是 $P'=1-P(0)$ ，k=0 代表的是沒有人同一天生日 $P(k=0)$ 而$\lambda$則是一年365天內兩人生日相同的機率，可以利用排列組合乘上兩個人生日在其中一天的機率，也就是 $\lambda=C^{人數}_2 \cdot \dfrac{1}{365}$ 若問題是要求幾個人在同一間房間中，有兩個人生日是同一天的機率會大於50%，即為 $P=1-P(0)>50\%$ 將人數=23人代入可知 $1-e^{-0.6932} \cdot \dfrac{0.6932^0}{0!} = 0.500026409 > 50\%$ 可知當房間內有23人時，有兩人生日相同的機率會大於50% 在[Birthday problem](https://en.wikipedia.org/wiki/Birthday_problem)中發現有其他近似的方法 * **[Birthday problem](https://en.wikipedia.org/wiki/Birthday_problem) 對資訊安全的影響在哪些層面？** 在[Birthday attack](https://en.wikipedia.org/wiki/Birthday_attack)中提到 Birthday attack是一種密碼攻擊的術語，就是來自於機率中的生日悖論。一般通訊會透過 hash function 進行加密，Birthday attack就是用在找 hash function 的 collision 。從生日悖論中可以看到從機率上看來，hash function的值域只要太小，就可以有很高的機率被猜出來。就像是23人遠小於365天，卻有50%的概率有兩個人生日相同一樣。 * **像是 [Random Number Generator](http://stattrek.com/statistics/random-number-generator.aspx) 這樣的產生器，是否能產生「夠亂」的亂數呢？** 不知道什麼是所謂「夠亂」的亂數，應該是要看用在什麼用途上，現在的電腦的亂數產生器都是偽亂數產生器(pseudorandom number generator)，透過某些演算法和種子的值來做到不可預測性，用在一般生活上應該都算是「夠亂」，但如果是用在密碼學上就要考慮到被破譯的可能性，有加密也要能解密，也許在計算功能越來越強大的電腦中，只要是可重現的隨機數，那就都不夠亂。在[Frequently-Asked Questions](http://stattrek.com/statistics/random-number-generator.aspx#faq)中便有提到 **How "random" is Stat Trek's Random Number Generator?** 網站中自己就有寫說這個演算法可用在統計學的應用上，但不應該被用在密碼學的加密上。 ## Linux 風格的 linked list * **為何 Linux 採用 macro 來實作 linked list？一般的 function call 有何成本？** macro 是在經過前置處理器時處理 `#define` 的常數、字串、函式，會在編譯前便將定義的macro替換到程式碼中，減少編譯時為了執行 function call的時間。由於function call 為了要執行function中的程式，會做stack儲存位址並進行跳躍，在編譯上會較花時間，但相較於macro，function call在記憶體中只會有一段 function 的程式碼，每次進行 function call 使用跳躍去相同的一段記憶體，減少的空間使用，是一種空間和時間的取捨。另外，function 在編譯時會做 type checking ，macro則是作為一段敘述被替換到程式碼中，因此不會 check 參數的 type ，在使用上較難debug。因此看來macro似乎適合用在一些簡短且常出現的函式上(時間成本>空間成本)，較長且較少出現的比較適合使用function call (空間成本>時間成本，且 debug 困難) 在 [Macro vs Function in C](https://stackoverflow.com/questions/9104568/macro-vs-function-in-c) 也看到一些macro使用起來比較不直觀的例子，不過我想可能只是因為我不熟悉macro，在常用的人眼中看來可能是很直觀的? 之後可能要再去翻翻 macro 的使用方式 * **Linux 應用 linked list 在哪些場合？舉三個案例並附上對應程式碼，需要解說，以及揣摩對應的考量** 待補 * **GNU extension 的 typeof 有何作用？在程式碼中扮演什麼角色？** [typeof](https://gcc.gnu.org/onlinedocs/gcc/Typeof.html) 中有詳細的解釋，裡面提到 `typeof` 可以用 expression 和 type 做為 argument ，會return argument 的 type 為那一種。我想是因為像上面說的 macro 在編譯時因為被前置處理器以敘述的方式替代進去 code ，而不會進行type check，在 macro 中要處理參數的 type 可以藉由 `typeof` 達到此效果。在[sysprog21/linux-list](https://github.com/sysprog21/linux-list)中的 include/list.h 中，394行～399行 ```clike=394 #ifdef __LIST_HAVE_TYPEOF #define list_for_each_entry(entry, head, member) \ for (entry = list_entry((head)->next, __typeof__(*entry), member); \ &entry->member != (head); \ entry = list_entry(entry->member.next, __typeof__(*entry), member)) #endif ``` 和426~431行 ```clike=426 #define list_for_each_entry_safe(entry, safe, head, member) \ for (entry = list_entry((head)->next, __typeof__(*entry), member), \ safe = list_entry(entry->member.next, __typeof__(*entry), member); \ &entry->member != (head); entry = safe, \ safe = list_entry(safe->member.next, __typeof__(*entry), member)) ``` 是使用了 typeof 的版本 * **解釋以下巨集的原理** ```Clike= #define container_of(ptr, type, member) \ __extension__({ \ const __typeof__(((type *) 0)->member) *__pmember = (ptr); \ (type *) ((char *) __pmember - offsetof(type, member)); \ }) ``` `__extension__` : 在編譯時若使用`-pedantic`這種檢查ANSI/ISO C標準的編譯參數，若有 GNU C extensions 的code， compiler 就會發出 warning 的訊息，使用`__extension__`就可以讓他不發出 warning 的訊息。這個 macro 是利用已知某個 struct 中的某個 member 的位址，以此計算出整個 struct 的起始位址值。 ptr:指向 member 的位址，也就是傳入的已知位址值 type:member所在的 struct type member:已知位址值的 member 名稱第一行： `const __typeof__(((type *) 0)->member) *__pmember = (ptr);` 使用了`typeof`去宣告和 member 同 type 的指標`__pmember`，並指向 ptr (也就是傳入的已知位址值)。第二行： `(type *) ((char *) __pmember - offsetof(type, member));` 使用了另外一個 macro `offsetof` ，作用是算出 struct 起始未值和該 member 的位址距離， `offsetof` 的定義如下 `#define offsetof(type, member) (size_t)&(((type*)0)->member)` 詳細可看 [OFFSETOF](http://man7.org/linux/man-pages/man3/offsetof.3.html)。將剛剛得到的 __pmember 轉型為 char (因為char是1byte，而 offsetof 回傳的單位為 byte)再減去 `offsetof` 獲得的 member 位址差值，即可獲得開頭的位置，並再轉型回指向 type 的指標。 * **除了你熟悉的 add 和 delete 操作，`list.h` 還定義一系列操作，為什麼呢？這些有什麼益處？** 在github的[list.h](https://github.com/torvalds/linux/blob/master/include/linux/list.h)寫了 ``` /* * Simple doubly linked list implementation. * * Some of the internal functions ("__xxx") are useful when * manipulating whole lists rather than single entries, as * sometimes we already know the next/prev entries and we can * generate better code by using them directly rather than * using the generic single-entry routines. */ ``` 也就是說list.h中有針對各種使用情況而有各種不同的function，因此在使用時可以看情況選擇適合的，減少冗餘的部份，提高效能或是達到不同的效果。像是如果要合併list有 `static inline void list_splice_tail(struct list_head *list,struct list_head *head)` 另外還有 `list_splice_tail`、`list_splice_init`、`list_splice_tail_init`、`list_cut_position`各種針對不同情況的function。 * **`LIST_POISONING` 這樣的設計有何意義**？ include/list.h 中寫了 ```=130 * LIST_POISONING can be enabled during build-time to provoke an invalid memory * access when the memory behind the next/prev pointer is used after a list_del. * This only works on systems which prohibit access to the predefined memory * addresses. ``` LIST_POISONING 是用來避免重新使用被刪除掉的記憶體時使用到原本應該要被刪除的next/prev。而在[List poisoning](https://en.wikipedia.org/wiki/List_poisoning)中寫到，是來自於攻擊者利用無效的email address攻擊電子郵件。 * **linked list 採用環狀是基於哪些考量？** 環狀 linked-list 由於尾連著頭，擁有著很高的靈活性，在設計function上也少了要操作第一個和最後一個的問題。 * **`list_for_each_safe` 和 `list_for_each` 的差異在哪？"safe" 在執行時期的影響為何？** ```clike=370 /** * list_for_each - iterate over list nodes * @node: list_head pointer used as iterator * @head: pointer to the head of the list * * The nodes and the head of the list must must be kept unmodified while * iterating through it. Any modifications to the the list will cause undefined * behavior. */ #define list_for_each(node, head) \ for (node = (head)->next; node != (head); node = node->next) ``` ```clike=401 /** * list_for_each_safe - iterate over list nodes and allow deletes * @node: list_head pointer used as iterator * @safe: list_head pointer used to store info for next entry in list * @head: pointer to the head of the list * * The current node (iterator) is allowed to be removed from the list. Any * other modifications to the the list will cause undefined behavior. */ #define list_for_each_safe(node, safe, head) \ for (node = (head)->next, safe = node->next; node != (head); \ node = safe, safe = node->next) ``` 可看到 safe 的版本差別在於多了一個 n，去把 pos->next 存下，並在要回到 pos->next 時把n的值放回 pos，如此可在 iterate 的過程中能夠刪除節點，即使pos被del掉了仍舊能繼續安全的往下一個節點走。 * **for_each 風格的開發方式對程式開發者的影響為何？** * 提示：對照其他程式語言，如 Perl 和 Python 待補 * **程式註解裡頭大量存在 `@` 符號，這有何意義？你能否應用在後續的程式開發呢？** * 提示: 對照看 Doxygen Doxygen 是程式的文檔產生器, 是能提取程式的內容自動產生程式說明文件的系統.在程式碼的註解中利用特殊的格式，便能自動產生說明文件。 [Doxygen](https://www.stack.nl/~dimitri/doxygen/manual/commands.html#cmdat)中寫到 `All commands in the documentation start with a backslash (\) or an at-sign (@). If you prefer you can replace all commands starting with a backslash below by their counterparts that start with an at-sign.` 也就是說在程式註解中打 `@`是為了能用Doxygen直接產生說明文件的格式。 * **`tests/` 目錄底下的 unit test 的作用為何？就軟體工程來說的精神為何？** `tests/` 中的檔案是用來個別對某部份的功能進行測試，對debug來說能夠更精確簡單。若是一次debug整個大程式常常不知道錯誤是出在哪裡，而能分成一小塊一小塊去debug，也就是模組化，把程式變得單純、好觀察，降低debug的難度。 * **`tests/` 目錄的 unit test 可如何持續精進和改善呢？** ## 針對 linked list 的排序演算法 * **對 linked list 進行排序的應用場合為何？** * **考慮到 linked list 在某些場合幾乎都是幾乎排序完成，這樣貿然套用 quick sort 會有什麼問題？如何確保在平均和最差的時間複雜度呢？** * **能否找出 Linux 核心原始程式碼裡頭，對 linked list 排序的案例？** * **透過 [skiplist](https://en.wikipedia.org/wiki/Skip_list) 這樣的變形，能否加速排序？** * **研讀 [How to find the kth largest element in an unsorted array of length n in O(n)?](https://stackoverflow.com/questions/251781/how-to-find-the-kth-largest-element-in-an-unsorted-array-of-length-n-in-on)，比照上方連結，是否能設計出針對 linked list 有效找出第 k 大 (或小) 元素的演算法？** * **[linux-list](https://github.com/sysprog21/linux-list) 裡頭 `examples` 裡頭有兩種排序實作（insertion 和 quick sort)，請簡述其原理** ## Reference [Poisson distribution](https://www.umass.edu/wsp/resources/poisson/) [阿夢的程式設計天地](https://dotblogs.com.tw/ace_dream/2016/01/16/function_macro) [Macro vs Function in C](https://stackoverflow.com/questions/9104568/macro-vs-function-in-c) [Linux的container_of 與 offsetof巨集](https://myao0730.blogspot.tw/2016/09/linuxcontainerof-offsetof.html) [請益一段程式碼offsetof/container_of](https://www.ptt.cc/bbs/C_and_CPP/M.1462016161.A.1F8.html) [Why does the Linux kernel use circular doubly linked lists to store the lists of processes?](https://stackoverflow.com/questions/46089965/why-does-the-linux-kernel-use-circular-doubly-linked-lists-to-store-the-lists-of) [Doxygen 程式文件產生器與簡易筆記](https://blog.longwin.com.tw/2011/04/doxygen-document-generator-2011/) [模組化老哏細談](https://www.ithome.com.tw/voice/88488)