2018q1 Homework3 (list)

2018q1 Homework3 (list) === contributed by <`raygoah`> ## 作業說明 * [list 作業說明](https://hackmd.io/s/HkxZbJzif) ## 生日悖論 #### 如何透過近似的運算式，估算上述機率計算呢？附上有效的 LaTeX 算式 * 假設每個出生日期的機率，在一年 365 天中是平均分布的 * $n$ 表示總共的人數 * $P(n)$ 為這 $n$ 個人，每個人的生日都是不同天的機率: $P(n)=\dfrac{C_{n}^{365} \times\ n!}{365^n}=\dfrac{365! \times\ n!}{365^n \times\ n! \times\ (365-n)!}=\dfrac{365!}{365^n \times\ (365-n)!}$ * $P'(n)$ 表示這 $n$ 個人中，有人生日重複的機率: $P'(n)=1-P(n)=1-\dfrac{365!}{365^n \times\ (365-n)!}$ * 計算 $P'(22)$: $P'(22)=1-\dfrac{365!}{365^{22} \times\ 343!}=1-\dfrac{364 \times\ 363 \times\ 362 \times.... \times\ 344}{365^{21}}= 1 - 0.493=0.507$ 機率大於 $50％$ #### Birthday problem 對資訊安全的影響在哪些層面？ * 閱讀 [生日攻擊](http://wiki.mbalib.com/zh-tw/%E7%94%9F%E6%97%A5%E6%94%BB%E5%87%BB) 介紹 * 從 hash function 中的 collision，藉由枚舉找到能產生相同 hash value 的訊息，即可用來偽造傳遞的訊息或是合約 * 若是 hash function 中使用的 hash value 其值域不夠大，bits 數不夠多，便可以藉由生日攻擊，找到一個 y ，使得 y 和原本的 x 在 hash funcion 中會得到相同的值，h(x)=h(y) * 網路上搜尋到的資料中，大多以簽署電子合約時，利用修改合約中不容易被發現有更動過的地方，且更改後合約經過 hash function 產生出的 hash value 會相同，進而欺騙或是造成傷害 * 而關於生日攻擊的防禦辦法，可藉由將產生出的 hash value bits 數增加，減少 collision 的發生，讓經由生日攻擊找到 collision 的困難增加，需要更多更久的計算才能找到一個 collision，來達到防禦的作用 #### 像是 Random Number Generator 這樣的產生器，是否能產生「夠亂」的亂數呢？ * [Random Number Generator](http://stattrek.com/statistics/random-number-generator.aspx) * 在 Frequently-Asked Questions 中有提到： >Q: How "random" is Stat Trek's Random Number Generator?[color=blue] >>A: Although no computer algorithm can produce numbers that are truly random, Stat Trek's Random Number Generator produces numbers that are nearly random. Stat Trek's Random Number Generator can be used for most statistical applications (like randomly assigning subjects to treatments in a statistical experiment). However, it should not be used to generate numbers for cryptography.[color=red] * 大部分的電腦產生亂數都不是真正完全隨機的亂數，而是很接近隨機的一種偽隨機性的偽亂數 * 首先先輸入 seed 值 800，觀察得到的 Random Number Table，接著輸入其他 seed 值多次後，再回來輸入一開始的 seed 值 800，可以發現到 Random Number Table 中所有數字是相同的且順序也是一樣的 * 可以知道利用偽亂數所製作的亂數產生器，在 seed 值輸入相同的情況下，產生出來的『亂數』順序和值都會是一樣的，因此亂數產生器並不是真的完全隨機產生亂數，但是在 seed 值總是輸入不同的情況下，會造成產生的亂數永遠會是不同的錯覺 * 參考： * [你的程式夠亂嗎](https://www.ithome.com.tw/voice/110007) * [偽亂數](https://zh.wikipedia.org/wiki/%E4%BC%AA%E9%9A%8F%E6%9C%BA%E6%80%A7) ## Linux 風格的 linked list #### 為何 Linux 採用 macro 來實作 linked list？一般的 function call 有何成本？ * 先理解一下什麼是 marco * [marco 維基百科](https://zh.wikipedia.org/wiki/%E5%B7%A8%E9%9B%86) * [高等 C 語言 -- 巨集 (Macro)](http://ccckmit.wikidot.com/cp:macro) * [第 8 章 C 前置處理](http://140.127.40.1/~jwu/c/cpgch8.htm) * 兩者的差異在閱讀 [stackoverflow](https://stackoverflow.com/questions/4990362/what-is-the-difference-between-a-macro-and-a-function-in-c) 上的問答後，有比較瞭解了，如下： * marco 會被前置處理器所處理，在前置處理的過程中，code 中有 marco 的地方，會被代換成對應的文字或算式 * 使用 function call 的話，需要經由 compiler 編譯成組語，並且將 function 的參數以及回傳值放入 stack 中，比起使用 macro 會需要較多的時間 * 使用 marco 可以減少 call stack 所需的時間 * 但 marco 在 debug 時會比較困難 * 這邊有一個比較特別值得注意的地方，在 C99 以及 C++ 中其實有一個 inline function，inline function 在內嵌程式碼這點和 macro 相同，減少了 call stack 所需的時間，使得執行速度變快，閱讀關於 inline function 的說明 * [程序員筆記](http://ascii-iicsa.blogspot.tw/2010/08/inline-functionmarco.html) * [Inline Functions In C](http://www.greenend.org.uk/rjk/tech/inline.html) #### Linux 應用 linked list 在哪些場合？舉三個案例並附上對應程式碼，需要解說，以及揣摩對應的考量 1. Memeory Management * 在作業系統的記憶體管理中，有很多部份會使用到 linked-list，因此由這個方向從 linux 在 [github 上開源的程式碼](https://github.com/torvalds/linux)中找尋對應的部份 * [memerory management 介紹](http://www.csie.ntnu.edu.tw/~swanky/os/chap6.htm) * [linux/mm/page_alloc.c](https://github.com/torvalds/linux/blob/master/mm/page_alloc.c) * 看到程式碼中註解提到 ``` Manages the free list, the system allocates free pages here. ``` * 在 page_alloc.c 中管理 free list 以及分配系統中的 free pages * 利用 `list_add_tail` 將 page 加到 linked list 的最後一個，最後回傳新 pages 在 linked list 上被分配的編號，程式碼如下： ```clike= /* * Obtain a specified number of elements from the buddy allocator, all under * a single hold of the lock, for efficiency. Add them to the supplied list. * Returns the number of new pages which were placed at *list. */ static int rmqueue_bulk(struct zone *zone, unsigned int order, unsigned long count, struct list_head *list, int migratetype) { int i, alloced = 0; spin_lock(&zone->lock); for (i = 0; i < count; ++i) { struct page *page = __rmqueue(zone, order, migratetype); if (unlikely(page == NULL)) break; if (unlikely(check_pcp_refill(page))) continue; /* * Split buddy pages returned by expand() are received here in * physical page order. The page is added to the tail of * caller's list. From the callers perspective, the linked list * is ordered by page number under some conditions. This is * useful for IO devices that can forward direction from the * head, thus also in the physical page order. This is useful * for IO devices that can merge IO requests if the physical * pages are ordered properly. */ list_add_tail(&page->lru, list); alloced++; if (is_migrate_cma(get_pcppage_migratetype(page))) __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, -(1 << order)); } /* * i pages were removed from the buddy list even if some leak due * to check_pcp_refill failing so adjust NR_FREE_PAGES based * on i. Do not confuse with 'alloced' which is the number of * pages added to the pcp list. */ __mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order)); spin_unlock(&zone->lock); return alloced; } ``` 2. 排程管理 * [Linux 的行程管理](http://sp1.wikidot.com/linuxprocessmanagement) * [linux/kernel/sched/sched.h](https://github.com/torvalds/linux/blob/master/kernel/sched/wait.c) * 在 sched.h 中宣告了實作 real time scheduler 相關的 struct，其中排程所需要用的 queue，便是經由 linked list 實作而成，任務經由這個 queue 進行排程，程式碼如下所示： ```clike= /* * This is the priority-queue data structure of the RT scheduling class: */ struct rt_prio_array { DECLARE_BITMAP(bitmap, MAX_RT_PRIO+1); /* include 1 bit for delimiter */ struct list_head queue[MAX_RT_PRIO]; }; ``` 3. File System * [Linux 的檔案系統簡介](http://ccckmit.wikidot.com/lk:file) * [linux/include/linux/fs.h](https://github.com/torvalds/linux/blob/master/include/linux/fs.h#L5) * 程式碼中使用了大量的 linked list #### GNU extension 的 typeof 有何作用？在程式碼中扮演什麼角色？ * 參考 [typeof operator in C](https://stackoverflow.com/questions/12081502/typeof-operator-in-c) * 回傳給入變數的 type，若 typeof(x)，便回傳 x 的 type，如下： ```clike int x; /* Plain old int variable. */ typeof(x) y; /* Same type as x. Plain old int variable. */ ``` #### 解釋以下巨集的原理 ``` #define container_of(ptr, type, member) \ __extension__({ \ const __typeof__(((type *) 0)->member) *__pmember = (ptr); \ (type *) ((char *) __pmember - offsetof(type, member)); \ }) ``` * 定義一個指標將其指向 member 的地址，再來利用 typeof 取得這個 member 所在的 struct type，產生 __pmember 這個指標，並且將 __pmember 指向一開始傳入的 ptr，而 ptr 為 struct 中 member 的地址 * 接著把 __pmember 的值減掉 offsetof(type, member)，也就是減掉 member 在 struct 中的 offset * 減掉後得到的結果就是 member 所在的 struct 其記憶體起始位置 #### 除了你熟悉的 add 和 delete 操作，list.h 還定義一系列操作，為什麼呢？這些有什麼益處？ * 從 [linux/include/linux/list.h](https://github.com/torvalds/linux/blob/master/include/linux/list.h) 中可以看到最開頭中註解的說明： ``` /* * Simple doubly linked list implementation. * * Some of the internal functions ("__xxx") are useful when * manipulating whole lists rather than single entries, as * sometimes we already know the next/prev entries and we can * generate better code by using them directly rather than * using the generic single-entry routines. */ ``` * 提到實作多個後方有 __xxx 的 function 原因，在某些時候已經知道下一個或是前一個 node 時，使用這些 finctiom 可以更方便有效率的操作 linked list #### LIST_POISONING 這樣的設計有何意義？ * 參考 [linux-like-list/list.h](https://github.com/ecsv/linux-like-list/blob/master/list.h) 中所寫 ```clike /** * list_del() - Remove a list node from the list * @node: pointer to the node * * The node is only removed from the list. Neither the memory of the removed * node nor the memory of the entry containing the node is free'd. The node * has to be handled like an uninitialized node. Accessing the next or prev * pointer of the node is not safe. * * Unlinked, initialized nodes are also uninitialized after list_del. * * LIST_POISONING can be enabled during build-time to provoke an invalid memory * access when the memory behind the next/prev pointer is used after a list_del. * This only works on systems which prohibit access to the predefined memory * addresses. */ static __inline__ void list_del(struct list_head *node) { struct list_head *next = node->next; struct list_head *prev = node->prev; next->prev = prev; prev->next = next; #ifdef LIST_POISONING node->prev = (struct list_head *)(0x00100100); node->next = (struct list_head *)(0x00200200); #endif } ``` * 註解部份有提到，使用 list_del() 所刪除的 node，只是先從 linked list 中被移除，但其 node 所在的記憶體區段仍未被釋放，因此不該操作此 node 的 prev 或是 next pointer ，因為此 node 已經從 linked list 中被刪除，該被視為 uninitialized * 而 LIST_POISONING 便是針對這個問題而產生的，將被 list_del() 所刪除的 node 其 prev/next pointer 指向 LIST_POISONING 所給定的記憶體位置，若是操作了已經被 list_del() 所刪除的 node 中的 prev/next pointer，就會變成存取到 LIST_POISONING 的記憶體地址，防止存取到非法的記憶體地址 #### linked list 採用環狀是基於哪些考量？ * 使用環狀的 linked list 只需由 last node 的 next pointer 便可以找到開頭的 node * 更改 last node 時可以節省許多步驟，只要更改 last 這個指標，便可以順利改成新的 last node #### list_for_each_safe 和 list_for_each 的差異在哪？“safe” 在執行時期的影響為何？ * 比較兩者的差異 * `list_for_each` : ```clike #define list_for_each(pos, head) \ for (pos = (head)->next; pos != (head); pos = pos->next) ``` * `list_for_each_safe` : ```clike #define list_for_each_safe(pos, n, head) \ for (pos = (head)->next, n = pos->next; pos != (head); \ pos = n, n = pos->next) ``` * 可以看到 list_for_each_safe 多了變數 n 用來除存 pos 目前所指向 node 的 next node * safe 版本可以避免在利用 list_del() 刪除了 pos 所指向的指標後，進入下一次迴圈時會因為 pos->next 已經被改為 LIST_POISONING 所產生的問題，藉由預先保存 pos->next 於 n，便可以在進入下一次 for 迴圈時，得到正確的 node，而不被 list_del() 所影響 #### for_each 風格的開發方式對程式開發者的影響為何？ * for_each 風格的開發方式可以很方便的走訪某資料結構的全部資料，不像使用 for 需要自行設定範圍以及條件，相比起來程式碼較簡潔，開發過程中使用也比較輕鬆簡單 #### 程式註解裡頭大量存在 @ 符號，這有何意義？你能否應用在後續的程式開發呢？ * 參考 [Doxygen 介紹](https://zh.wikipedia.org/wiki/Doxygen) & [Doxygen 程式文件產生器與簡易筆記](https://blog.longwin.com.tw/2011/04/doxygen-document-generator-2011/) * Doxygen 是一套程式文件的產生器, 會抓取程式的內容、註解來自動產生程式說明文件的系統 * 依照 Doxygen 的格式寫作，可以自動產生程式說明文件，修改程式碼後有新版本的程式，也可以自動產生對應的新說明文件 * 其中 @ 為 Doxygen 規定的註解開頭 #### tests/ 目錄底下的 unit test 的作用為何？就軟體工程來說的精神為何？ * 在 test/ 目錄底下有許多小型的程式，而這些程式的作用是用來測試每個對應的 function 功能是否正常 * 先確認每個小地方能夠正確執行後，再將程式合併測試時，便可以藉此剪枝排除可能性，方便 debug 以及減少所需時間 * 這也是模組化設計的好處，之前大三室友每天都歌頌著模組化的好處，現在才有比較深的體會，不然之前覺得模組化要分開寫很麻煩，在大型的專案開發中，模組化的開發可以省下許多心力，同時程式維護起來也會比較輕鬆 * 參考：[先寫單元測試的12個好處](http://blog.turn.tw/?p=2821) #### tests/ 目錄的 unit test 可如何持續精進和改善呢？ * 避免在一個 unit test 中，同時用到多個在專案中自行定義的 function，否則測試結果顯示為錯誤的話，無法判定是否為想要測試的那個 function 造成的問題，應該要讓操縱變因保持為一個，或是在確認其他 function 能正常運作後才使用該 function ## 針對 linked list 的排序演算法