2018q1 Homework3 (list)

contributed by < TingL7 >

生日悖論

如何透過近似的運算式，估算上述機率計算呢？附上有效的 LaTeX 算式

生日悖論是指 23 人以上，出現兩個人生日同一天的機率高於 50 %

參考 Birthday problem
假設
- 所有人的生日都是隨機分佈
- 一年 365 天
P(n) : n 人中至少兩人生日是同一天
$n 人中生日都不同天 = \frac{365}{365} \cdot \frac{364}{365} \cdot \frac{363}{365} \cdot . . . \frac{366 - n}{365} = \frac{365!}{365^{n} \cdot (365 - n)!}$
$P (n) = 1 - n 人中生日都不同天 = 1 - \frac{365!}{365^{n} \cdot (365 - n)!}$
$P (23) = 1 - \frac{365!}{365^{23} \cdot 342!} \approx 1 - 0.49270276567 = 0.50729723432$

參考資料

HackMD – LaTeX 語法與示範

Birthday problem 對資訊安全的影響在哪些層面？

主要的影響在於 hash function 的應用
hash function 的碰撞，利用窮舉法製作出具有相同 hash value 的檔案或合同，取代真正的檔案以進行欺騙
- 有一長度 n 的字串 t1 透過 hash function 得到 f(t1)
- 要找到字串 t2 ，使得 f(t1) = f(t2) ，可以透過大量製造內容不同但意義相同（如，增減標點或空白）的字串
- 從生日悖論中，可以知道只要有 2^n/2 個字串就會有很大的機率發生碰撞
- 因此，只要製造出 2^n/2 個字串就有機會找到具有相同雜湊值的字串
- 應對方法：增加字串長度
參考Birthday attack 、生日攻击是什么，有什么用？、 hash碰撞，数字签名，生日攻击

像是 Random Number Generator 這樣的產生器，是否能產生「夠亂」的亂數呢？

產生器產生的亂數，是透過 seed 來產生亂數表，因此 seed 相同的話，得到的亂數表都是相同的。
如果每次的亂數種子都設不一樣就可以得到不同的亂數表，進而達到看起來像亂數的效果。
- 像是 c 語言中常用的 srand(time(NULL)); ，以時間作為亂數種子，使得每次亂數種子都不同，得到不同亂數表，進而得到「亂」數
- 而 rand() 測試幾次，會發現每次得到的亂數是相同的，因為他的種子每次都相同

Linux 風格的 linked list

為何 Linux 採用 macro 來實作 linked list？一般的 function call 有何成本？

參考 What is the difference between a macro and a function in C? 中提到，The basic difference is that function is compiled and macro is preprocessed.
macro 在前置處理的過程中，就完成程式碼的替換。
function 在被編譯後執行，每次呼叫時都需要時間進行 stack 的 pop 、 push，比較耗費時間。
因此，採用 macro 來實作 linked list 可以避免過多的 function call 時間成本

Linux 應用 linked list 在哪些場合？舉三個案例並附上對應程式碼，需要解說，以及揣摩對應的考量

GNU extension 的 typeof 有何作用？在程式碼中扮演什麼角色？

參考 GCC typeof在kernel中的使用、 Referring to a Type with typeof
int a;typeof（a) b; 宣告與變數 a 相同 type 的變數 b
可以達到 c++ 中「多型」的效果

可以根據函式傳入不同的 type 而產生不同的行為
e.g.

#define max(a,b)          \
  ({ typeof (a) _a = (a); \
     typeof (b) _b = (b); \
     _a > _b ? _a : _b; })

解釋以下巨集的原理

#define container_of(ptr, type, member)                            \
    __extension__({                                                \
        const __typeof__(((type *) 0)->member) *__pmember = (ptr); \
        (type *) ((char *) __pmember - offsetof(type, member));    \
    })

在 Alternate Keywords 中，提到編譯參數 -traditional 、 -ansi 、 -std=c99 會使一些關鍵字失效，如 typeof ，在GNU C 中，在字的前後加上 __ 可以解決關鍵字失效的問題
定義 container_of(ptr, type, member) 以 __extension__() 取代。
const __typeof__(((type *) 0)->member) *__pmember = (ptr);
- (type *) 0 : 型態轉成 type * 的 0
- ((type *) 0)->member : type * 的 0 指向 member
  - 由 -> 可以猜測 type 是 structure
- __typeof__(((type *) 0)->member) : type 中 member 的型態
  - 0 可以替換成其他數字，只是為了知道 member 的 type 而使用
- const __typeof__(((type *) 0)->member) * : 型態為 const type 中 member 的型態指標
- const __typeof__(((type *) 0)->member) *__pmember : 宣告變數__pmember，型態為 const type 中 member 的型態指標
- const __typeof__(((type *) 0)->member) *__pmember = (ptr); ：宣告變數__pmember，型態為 const type 中 member 的型態指標，指向 ptr 指的位置
(type *) ((char *) __pmember - offsetof(type, member));
- (char *) __pmember : 型態轉換成 char * 的 __pmember
- offsetof(type, member) : type 中 member 的位址大小
  - #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER) 計算出 struct 和其 member 的距離
- ((char *) __pmember - offsetof(type, member)) : 指向 ptr 也是 struct typr 中的 member的位址減去 struct 開頭與 member 的距離，也就是 struct 開頭的位址
- (type *) ((char *) __pmember - offsetof(type, member)); ：以(type *) 儲存 type 開頭的位址

除了你熟悉的 add 和 delete 操作，list.h 還定義一系列操作，為什麼呢？這些有什麼益處？

除了一般的 add 和 delete ，還定義這些操作
- list_add_tail
- list_del_init : 在刪除節點之前，先將其初始化，這樣可以避免錯誤的操作
- list_empty
- list_is_singular : 是否只有一個節點
- list_splice ：把 list 的一些節點插入到另一個 list 開頭
- list_splice_tail
- list_splice_init
- list_cut_position : 將 list 開頭移到另一個 list
- list_move : 把指定的節點移到 list 開頭
- list_move_tail
由於這是環狀 doublely-linked list ，會有針對開頭或尾端的操作，這樣能夠更方便的使用，也能提升效能

LIST_POISONING 這樣的設計有何意義？

list poisoning 是指 mail list 出現無效的郵件地址導致資源的消耗
LIST_POISONING 被定義在 linux-list/include/list.h中





























/**
 * list_del() - Remove a list node from the list
 * @node: pointer to the node
 *
 * The node is only removed from the list. Neither the memory of the removed
 * node nor the memory of the entry containing the node is free'd. The node
 * has to be handled like an uninitialized node. Accessing the next or prev
 * pointer of the node is not safe.
 *
 * Unlinked, initialized nodes are also uninitialized after list_del.
 *
 * LIST_POISONING can be enabled during build-time to provoke an invalid memory                
 * access when the memory behind the next/prev pointer is used after a list_del.
 * This only works on systems which prohibit access to the predefined memory
 * addresses.
 */
static inline void list_del(struct list_head *node)
{
    struct list_head *next = node->next;
    struct list_head *prev = node->prev;

    next->prev = prev;
    prev->next = next;

#ifdef LIST_POISONING
    node->prev = (struct list_head *) (0x00100100);
    node->next = (struct list_head *) (0x00200200);
#endif
}

第 130 行的註解有提到， LIST_POISONING 的設計是因為刪除節點時，只有刪除 list 中的節點，並沒有真正的釋放記憶體，因此在重新使用這塊記憶體時，可能會去使用到不該用的 next / prev

linked list 採用環狀是基於哪些考量？

在使用上效能比較好，例如要使用到 linked list 尾端，直接由 head 就可以找到，不必再使用 loop 去找到 node->next = NULL;

list_for_each_safe 和 list_for_each 的差異在哪？“safe” 在執行時期的影響為何？

linux-list/include/list.h中












/**
 * list_for_each_safe - iterate over list nodes and allow deletes
 * @node: list_head pointer used as iterator
 * @safe: list_head pointer used to store info for next entry in list
 * @head: pointer to the head of the list
 *
 * The current node (iterator) is allowed to be removed from the list. Any
 * other modifications to the the list will cause undefined behavior.
 */
#define list_for_each_safe(node, safe, head)                     \
    for (node = (head)->next, safe = node->next; node != (head); \
         node = safe, safe = node->next)











/**
 * list_for_each - iterate over list nodes
 * @node: list_head pointer used as iterator
 * @head: pointer to the head of the list
 *
 * The nodes and the head of the list must must be kept unmodified while
 * iterating through it. Any modifications to the the list will cause undefined
 * behavior.
 */
#define list_for_each(node, head) \
    for (node = (head)->next; node != (head); node = node->next)

從程式碼來看最大的差異在於 safe 版本多了一個變數 safe 儲存了當下節點的 next
多了一個 safe 的好處在於，可以對當下節點進行刪除的動作，也不會影響整個迴圈的進行，因為 safe 會事先儲存下一個節點的位址，不會使 list 斷掉

for_each 風格的開發方式對程式開發者的影響為何？

與 c 語言中 for 的寫法相比， foreach 簡潔許多，可讀性也較高
foreach 可以簡單的走訪過整個資料結構
Perl foreach 用法
```
foreach my $i (@array){;}
```
可以發現，參數僅有控制變數 i 跟要操作的 array ，而在 c 的 for 的寫法中可以自訂變數初始動作、條件、每次內容做完後的動作，彈性比較大，但相對的寫法就比較複雜

程式註解裡頭大量存在 @ 符號，這有何意義？你能否應用在後續的程式開發呢？

Doxygen 是產生參考文件的工具，會抓取程式內容與註解來自動產生說明文件。因此，只要按照 Doxygen 的格式撰寫註解，就可以不用另外寫說明文件，可以由 Doxygen 自動產生
Doxygen 說明文件中提到，@ 符號表示註解的開頭，而 \ 對於 Doxygen 來說，與 @ 意義相同
@ 會用在參數的開頭，或是一些關鍵字的開頭 (如 @struct)

tests/ 目錄底下的 unit test 的作用為何？就軟體工程來說的精神為何？

tests/ 目錄底下有許多測試 function 的檔案，可以先確認每個 function 的功能正確無誤，再應用到程式中，使得 debug 更容易
就軟體工程來說，希望將開發流程、大程式切割成許多小部份，可以實做一些就測試分析一些，在維護、找錯、團隊合作上都會進行的更容易、更順利

tests/ 目錄的 unit test 可如何持續精進和改善呢？

針對 linked list 的排序演算法

對 linked list 進行排序的應用場合為何？

linked list 的優點在於插入、刪除、共用節點，而且需要的記憶體空間是動態的
需要以不同的優先權依據重新排列
排程的管理
- 需要以優先順序排列（優先權依據可能是必須馬上做的或者比較重要的）
- 有新的事項進來要插入容易
- 事項的數量不固定

考慮到 linked list 在某些場合幾乎都是幾乎排序完成，這樣貿然套用 quick sort 會有什麼問題？如何確保在平均和最差的時間複雜度呢？

參考快速排序(Quick Sort) – 寫點科普，請給指教。
quick sort 不適用於幾乎排序好的資料
- quick sort 在使用時，會選擇最後一個當分割的 pivot ，如果是幾乎排序好的每次在挑 pivot 時，容易選到最大值，以此做分割會造成沒有分割成兩組數的效果
- 因此，對於幾乎排序完成的資料，在使用 quick sort 的時間複雜度通常是 worst case O(n²)
可能的解法： Middle-of-Three 方法
- 由資料的最左值、最右值、中間值三個中挑出中間值作為 pivot
- 可以避免取到最大值造成分割沒有效果的情況，甚至接近 best case ，因為是幾乎排序好的資料，容易選到中間值，分割出數量相等的兩組數

能否找出 Linux 核心原始程式碼裡頭，對 linked list 排序的案例？

利用搜尋工具，搜尋關鍵字 list_sort
/lib/list_sort.c
- 以 merge sort 來實作 list sort
/drivers/gpu/drm/drm_modes.c
- mode 的 list 需要以 favorability 的順序排序，因此用到 list sort

透過 skiplist 這樣的變形，能否加速排序？

skip list 是一種針對 sorted linked list 的資料結構，透過不同 level 做出不同的 linked list 達到加速搜尋的效果
e.g. 搜尋 7
1. 從 level 4 比對， 1 <= 7 ， null 跳 level 3
2. level 3 比對 4 <= 7 , 6 <= 7, null 跳 level 2
3. level 2 比對 9 >= 7 跳 level 1
4. level 1 比對 7 == 7
比對 5 次，比直接查詢（7 次）還要快
對於 insertion sort ，這樣子的資料結構，能夠加速，因為每次插入時都需要搜尋插入的位置，這時候 skip list 就派上用場了
對於其他的排序法就沒有明顯的幫助

研讀 How to find the kth largest element in an unsorted array of length n in O(n)?，比照上方連結，是否能設計出針對 linked list 有效找出第 k 大 (或小) 元素的演算法？

參考連結中 hoder 的回答，試著設計找出第 k 小元素的 pseudocode:

QUICKSELECT(list, k)
    建立 l1, l2 兩個空的 linled list
    r = 從 list 中隨機挑出一個節點中的 val

    foreach (node, head)
        if noed->val < r
        then add(node, l1)
        if noed->val > r
        then add(node, l2)

    if k <= len(l1)
        return QUICKSORT(l1, k)
    else if k > len(list)-len(l2)
        return QUICKSORT(l2, k - (len(list) - len(l2)))
    else
        return r

linux-list 裡頭 examples 裡頭有兩種排序實作（insertion 和 quick sort)，請簡述其原理

insertion : 分成兩部份 list_insert_sorted 及 list_insertsort

原理：將未排序好的 list 中所有節點，一個個插入到已排序的 list 中適合的位置

list_insertsort

static void list_insertsort(struct list_head *head)
{
    struct list_head list_unsorted;
    struct listitem *item = NULL, *is = NULL;

    INIT_LIST_HEAD(&list_unsorted);
    list_splice_init(head, &list_unsorted);

    list_for_each_entry_safe (item, is, &list_unsorted, list) {
        list_del(&item->list);
        list_insert_sorted(item, head);
    }
}

這個函式主要做兩件事

將 list 移到另一個 list (list_unsorted)
先將節點從 list 中刪除，再將節點插入排序好的 list 中

list_insert_sorted

static void list_insert_sorted(struct listitem *entry, struct list_head *head)
{
    struct listitem *item = NULL;

    if (list_empty(head)) {
        list_add(&entry->list, head);
        return;
    }   

    list_for_each_entry (item, head, list) {
        if (cmpint(&entry->i, &item->i) < 0) {
            list_add_tail(&entry->list, &item->list);
            return;
        }   
    }   

    list_add_tail(&entry->list, head);
}

要將節點插入排序好的 list 中，分 3 種情況

第一個：當 list 是空的話， item 直接放進 list 中
中間：找到比 item 小的節點,將 item 放到它後面
最後一個： list 中最小的元素

quick sort

原理：設定一個 pivot ，將 list 以 pivot 為刀切割成兩部份，一部分都比 pivot 大，一部分比 pivot 小，在將切割出來的部份遞迴做 quick sort 直到切出來的部份為空

list_qsort





























static void list_qsort(struct list_head *head)
{
    struct list_head list_less, list_greater;
    struct listitem *pivot;
    struct listitem *item = NULL, *is = NULL;

    if (list_empty(head) || list_is_singular(head))
        return;

    INIT_LIST_HEAD(&list_less);
    INIT_LIST_HEAD(&list_greater);

    pivot = list_first_entry(head, struct listitem, list);
    list_del(&pivot->list);

    list_for_each_entry_safe (item, is, head, list) {
        if (cmpint(&item->i, &pivot->i) < 0)
            list_move_tail(&item->list, &list_less);
        else
            list_move(&item->list, &list_greater);
    }

    list_qsort(&list_less);
    list_qsort(&list_greater);

    list_add(&pivot->list, head);
    list_splice(&list_less, head);
    list_splice_tail(&list_greater, head);
}

由第 13 行可以知道， pivot 挑選第一個 entry
第 26~28 行是將切割過的 list 重新收集、排好