Linux Linked-List Queue

--- tags: Linux 核心專題 --- # Linux Linked-List Queue 實作與排序優化 contributed by < [Tonr01](https://github.com/Tonr01/lab0-c) > [Linux 核心設計 (Linux Kernel Internals)](http://wiki.csie.ncku.edu.tw/linux/schedule) ## 開發環境 ``` gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0 架構： x86_64 CPU 作業模式： 32-bit, 64-bit Address sizes: 39 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 16 On-line CPU(s) list: 0-15 供應商識別號： GenuineIntel Model name: 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz CPU 家族： 6 型號： 141 每核心執行緒數： 2 每通訊端核心數： 8 Socket(s): 1 製程： 1 CPU max MHz: 4600.0000 CPU min MHz: 800.0000 BogoMIPS: 4608.00 ``` ### Virtual machine 建立 * 建立虛擬機 ![image](https://hackmd.io/_uploads/SyL5s7Rlel.png) * 設定硬碟大小 ![image](https://hackmd.io/_uploads/BJwY6XRegl.png) * 設定存儲裝置，並選擇下載好的 Ubuntu 版本 ![image](https://hackmd.io/_uploads/S10MnXAllx.png) ### Ubuntu磁區分割 [Ubuntu Linux 磁碟分割和目錄的類型和定義說明](https://www.dell.com/support/kbdoc/zh-tw/000131456/ubuntu-linux-%E7%A3%81%E7%A2%9F%E5%88%86%E5%89%B2-%E5%92%8C-%E7%9B%AE%E9%8C%84-%E7%9A%84-%E9%A1%9E%E5%9E%8B-%E5%92%8C-%E5%AE%9A%E7%BE%A9-%E8%AA%AA%E6%98%8E) ![image](https://hackmd.io/_uploads/SyWjGEAxel.png) :::warning 因為是在虛擬機上分割磁區，不會有 **EFI** 跟 **Bios** ，所以都須自己分割出來 ::: ### 安裝必要檔案 ## 作業要求 * **q_new**: 建立新的「空」佇列; * **q_free**: 釋放佇列所佔用的記憶體; * **q_insert_head**: 在佇列開頭 (head) 插入 (insert) 給定的新節點 (以 LIFO 準則); * **q_insert_tail**: 在佇列尾端 (tail) 插入 (insert) 給定的新節點 (以 FIFO 準則); * **q_remove_head**: 在佇列開頭 (head) 移去 (remove) 給定的節點; * **q_remove_tail**: 在佇列尾端 (tail) 移去 (remove) 給定的節點; * **q_release_element**: 釋放特定節點的記憶體空間; * **q_size**: 計算目前佇列的節點總量; * **q_delete_mid**: 移走佇列中間的節點， * 詳見 [LeetCode 2095. Delete the Middle Node of a Linked List](https://leetcode.com/problems/delete-the-middle-node-of-a-linked-list/) * **q_delete_dup**: 在已經排序的狀況，移走佇列中具備重複內容的節點 * 詳見 [LeetCode 82. Remove Duplicates from Sorted List II](https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/) * **q_swap**: 交換佇列中鄰近的節點 * 詳見 [LeetCode 24. Swap Nodes in Pairs](https://leetcode.com/problems/swap-nodes-in-pairs/) * **q_reverse**: 以反向順序重新排列鏈結串列，該函式不該配置或釋放任何鏈結串列節點，換言之，它只能重排已經存在的節點; * **q_reverseK**: 類似 q_reverse，但指定每 k 個節點為一組，進行反向順序排列 * 詳見 [LeetCode 25. Reverse Nodes in k-Group](https://leetcode.com/problems/reverse-nodes-in-k-group/) * **q_sort**: 以遞增順序來排序鏈結串列的節點 * 參閱 [Linked List Sort](https://npes87184.github.io/2015-09-12-linkedListSort/) 得知實作手法; * **q_merge**: 合併所有已排序的佇列，並合而為一個新的已排序佇列 * 詳見 [LeetCode 23. Merge k Sorted Lists](https://leetcode.com/problems/merge-k-sorted-lists/) * **q_descend**: * 參閱 [LeetCode 2487. Remove Nodes From Linked List](https://leetcode.com/problems/remove-nodes-from-linked-list/) ## queue.c 開發過程 ### q_new #### 思路一開始最簡單的想法是，配置記憶體給head，並檢查是否配置成功。 #### 程式碼 :::danger 出現了 `Segmentation fault occurred. You dereferenced a NULL or invalid pointermake` ，再看完程式碼的 comment `Notice: sometimes, Cppcheck would find the potential NULL pointer bugs` 與巨集後，發現我忽略了 pointer 的指向。 ```c struct list_head *q_new() { struct list_head *head = (struct list_head *) malloc(sizeof(struct list_head)); if (!head) { free(head); return NULL; } return head; } ``` ::: :::success 於是加入 `INIT_LIST_HEAD(head);` ，將 head 的 prev 與 next 都指向自己。 ```diff struct list_head *q_new() { struct list_head *head = (struct list_head *) malloc(sizeof(struct list_head)); if (!head) { free(head); return NULL; } + INIT_LIST_HEAD(head); return head; } ``` ::: ### q_free #### 思路一開始先檢查 list 是否存在，若不存在則無須進行 `q_free` ，再看完 `list.h` 定義的巨集後，發現 `list_for_each_entry_safe` 適合用來走訪每個 `entry` ，再使用 `q_release_element(entry);` 來釋放該 `entry` 的 `entry->value` 及 `entry->list` ，最後將 head 也釋放掉，釋放整個 list。 #### 程式碼 ```c void q_free(struct list_head *l) { if (!l) return; element_t *entry, *safe; list_for_each_entry_safe (entry, safe, l, list) q_release_element(entry); free(l); } ``` ### q_insert_head #### 思路首先先檢查 list 是否存在，再配置記憶體給新的 `entry`，並檢查 `entry` 與 `entry->value` 的空間是否配置成功，將節點的value值用`strdup(s);`將s的值複製給它，最後將`new_element`用`list_add`加入到list開頭的位置上。 #### 程式碼 :::danger 有出現 `Test of malloc failure` 錯誤訊息，才發現 `entry` 裡的 `entry->value` 的空間也可能配置失敗，於是加入配置記憶體的檢查。 ```c bool q_insert_head(struct list_head *head, char *s) { if (!head) return false; element_t *new_element = malloc(sizeof(element_t)); if (!new_element) return false; new_element->value = strdup(s); list_add(&new_element->list, head); return true; } ``` ::: :::success 修改後。 ```diff bool q_insert_head(struct list_head *head, char *s) { if (!head) return false; element_t *new_element = malloc(sizeof(element_t)); if (!new_element) return false; new_element->value = strdup(s); + if (!new_element->value) { + free(new_element); + return false; + } list_add(&new_element->list, head); return true; } ``` ::: ### q_insert_tail #### 思路大致上與 `q_insert_head` 思路一樣，首先先檢查 list 是否存在，再配置記憶體給新的 `entry`，並檢查 `entry` 與 `entry->value` 的空間是否配置成功，將 `entry->value` 用 `strdup(s);` 將s的值複製給它，最後將 `new_element` 用 `list_add_tail` 加入到list最後的位置上。 #### 程式碼 ```c bool q_insert_head(struct list_head *head, char *s) { if (!head) return false; element_t *new_element = malloc(sizeof(element_t)); if (!new_element) return false; new_element->value = strdup(s); if (!new_element->value) { free(new_element); return false; } list_add_tail(&new_element->list, head); return true; } ``` ### q_remove_head #### 思路一開始先檢查 list 使否存在與 list 是否為空，再使用 `list_first_entry` 找出第一個 `entry` 的位置，再使用 `list_del_init` 將此 `entry` remove ，並將此 `entry` 的 prev 與 next 都指向自己，最後將 `entry->value` 用 `strncpy` 複製到 sp 裡。 #### 程式碼 ```c element_t *q_remove_head(struct list_head *head, char *sp, size_t bufsize) { if (!head || list_empty(head)) return NULL; element_t *target = list_first_entry(head, element_t, list); list_del_init(&target->list); if (sp) { strncpy(sp, target->value, bufsize - 1); sp[bufsize - 1] = '\0'; } return target; } ``` ### q_remove_tail #### 思路大致上與 `q_remove_head` 思路一樣，一開始先檢查 list 使否存在與 list 是否為空，再使用 `list_last_entry` 找出最後一個 `entry` 的位置，再使用 `list_del_init` 將此 `entry` remove ，並將此 `entry` 的 prev 與 next 都指向自己，最後將 `entry->value` 用 `strncpy` 複製到 sp 裡。 #### 程式碼 ```c element_t *q_remove_tail(struct list_head *head, char *sp, size_t bufsize) { if (!head || list_empty(head)) return NULL; element_t *target = list_last_entry(head, element_t, list); list_del_init(&target->list); if (sp) { strncpy(sp, target->value, bufsize - 1); sp[bufsize - 1] = '\0'; } return target; } ``` ### q_size #### 思路一開始先檢查 list 使否存在與 list 是否為空，再宣告一個節點用 `list_for_each` 去走訪 list ，並紀錄 list 大小。 #### 程式碼 ```c int q_size(struct list_head *head) { if (!head || list_empty(head)) return 0; int size = 0; struct list_head *li; list_for_each (li, head) size++; return size; } ``` ### q_delete_mid #### 思路我的想法是用 two-pointer 的方式實作，首先宣告兩個 pointer `pre` 與 `nex` 分別代表 list 的頭跟尾的位置，這兩個 pointer 會一直往中心移動，會有兩種 case * Case 1：Size of list 為奇數 (5個節點)，此時當 `pre` 與 `nex` 在同個位置的節點即為中心點。 ![](https://i.imgur.com/vS6ukn6.png) * Case 2：Size of list 為偶數 (6個節點)，此時當 `pre` 與 `nex` 交錯時， `pre` 即為中心點。 ![](https://i.imgur.com/yzc8iYZ.png) 用 `list_del` 將該節點從 list 中分離出來，再用 `list_entry` 取得 `entry` 位置，最後釋放該 `entry` 。 #### 程式碼 ```c bool q_delete_mid(struct list_head *head) { if (!head || list_empty(head)) return false; struct list_head *pre = head->prev; struct list_head *nex = head->next; // Find the middle of list do { pre = pre->prev; nex = nex->next; } while (pre != nex && pre != nex->next); list_del(pre); element_t *target = list_entry(pre, element_t, list); q_release_element(target); return true; } ``` ### q_delete_dup #### 思路先用 `list_for_each_entry_safe` 去走訪每個元素，分別用 `target` 與 `temp` 去紀錄第一個元素與下一個元素，再檢查它們是否相同，若相同，則釋放掉 `target`，用一個二元變數 `dup` 去紀錄是否有重複，若 `dup == true` ，則 `target` 也為重複的元素，再將其釋放。在實作時出現 `Segmentation fault occurred. You dereferenced a NULL or invalid pointer` ，發現是因為在比較時， `temp` 存取去到 `head` ，所以加入了 `target->list.next != head` 去使 `temp` 不會去存取到 `head` 。 #### 程式碼 ```c bool q_delete_dup(struct list_head *head) { if (!head || list_empty(head)) return false; bool dup = false; element_t *target, *temp; list_for_each_entry_safe (target, temp, head, list) { if (target->list.next != head && !strcmp(target->value, temp->value)) { dup = true; list_del(&target->list); q_release_element(target); } else if (dup) { dup = false; list_del(&target->list); q_release_element(target); } } return true; } ``` ### q_reverse #### 思路想到的方法為逐一將 list 走訪，每走訪一個 `entry` 就將該 `entry` 從 list 中分離出來，再加入到 list 的第一個位置。 #### 程式碼 ```c void q_reverse(struct list_head *head) { element_t *entry, *safe; list_for_each_entry_safe (entry, safe, head, list) list_move(&entry->list, head); } ``` ### q_reversek #### 思路作法跟 `q_reverse` 類似，但不像 `q_reverse` 總是將節點插入第一個位置，這裡使用一個 pointer `insert` 指向一組 k 個節點的第一個插入位置，再使用 `count` 去紀錄走訪的節點數，當走訪的節點數恰 k 個時，將 pointer `insert` 指向下一組 k 個節點的第一個插入位置，也就是上一組的最後一個節點(目前節點的 prev)。 #### 程式碼 ```c void q_reverseK(struct list_head *head, int k) { if (!head || list_empty(head)) return; struct list_head *node = NULL, *safe = NULL, *insert = head; int count = 0; list_for_each_safe(node, safe, head) { if (k > count) list_move(node, insert); else { count = 0; insert = node->prev; } count++; } } ``` ### q_swap #### 思路 `q_swap` 其實就是 `q_reverseK(2)` ，於是用`q_reverseK` 去實作。 #### 程式碼 ```c void q_swap(struct list_head *head) { if (!head || list_empty(head)) return; q_reverseK(head, 2); } ``` ### q_descend #### 思路從 list 最尾端開始往前做，設 `target` 為最後一個元素、 `prev` 為 `target` 的前一個元素、 `size` 紀錄 list 中剩餘元素個數。 * 若後面的元素比前一個元素大，則將前一個元素移出 list。 * 繼續往前找，直到找到比現在的 `target` 還大的元素，再將 `target` 指向此元素。 * 重複以上步驟直到到 `head` 為止。 #### 程式碼 :::danger 在跑測試時，在執行 `q_free` 後尚有未被刪除的元素，發現因為將元素移出 list 後，並未將其釋放而造成錯誤。 ```c int q_descend(struct list_head *head) { if (!head || list_empty(head)) return 0; int size = q_size(head); struct list_head *target = head->prev, *prev = target->prev; while (target->prev !=head) { element_t *t = list_entry(target, element_t, list); element_t *p = list_entry(prev, element_t, list); if (strcmp(p->value, t->value) < 0) { list_del_init(&p->list); prev = target->prev; size--; } else { target = prev; prev = target->prev; } } return size; } ``` ::: :::success 修改後 ```diff int q_descend(struct list_head *head) { if (!head || list_empty(head)) return 0; int size = q_size(head); struct list_head *target = head->prev, *prev = target->prev; while (target->prev !=head) { element_t *t = list_entry(target, element_t, list); element_t *p = list_entry(prev, element_t, list); if (strcmp(p->value, t->value) < 0) { - list_del_init(&p->list); + list_del(&p->list); + q_release_element(p); prev = target->prev; size--; } else { target = prev; prev = target->prev; } } return size; } ``` > 使用 `diff` 或 `git diff` 產生程式碼差異的列表 > :notes: jserv ::: ### q_sort #### 思路我選擇採用 merge sort，因為 merge sort 為 stable sorting algorithm ，而其最壞的時間複雜度只有 $O(nlogn)$ ，這裡參考了 [Linked List Sort](https://npes87184.github.io/2015-09-12-linkedListSort/) 的實作方法，將 merge sort 分成三個部份，分別是： * `mergelist` * 主要是做 **merge** 的部份，採用非遞迴的方式將兩個 list 以遞增的形式串接在一起。 * `mergesort` * 主要是做 **split** 的部份，使用兩個指標，分別是 `fast` 與 `slow` ，採用 [Floyd Cycle Detection Algorithm](https://medium.com/@orionssl/%E6%8E%A2%E7%B4%A2-floyd-cycle-detection-algorithm-934cdd05beb9) ，當較快的指標到達終點時，較慢的指標會恰好在中心位置，並以遞迴的方式將 list 拆解，再呼叫 `mergelist` 將拆解後的 list 排序並重新組合成一條 list。 * `q_sort` * 因為採用 [Floyd Cycle Detection Algorithm](https://medium.com/@orionssl/%E6%8E%A2%E7%B4%A2-floyd-cycle-detection-algorithmm-934cdd05beb9) 的方式，所以原本的環狀佇列結構會導致找不到中心位置，於是先將原本的環狀 doubly-linked list 轉換成 singly-linked list，等做完 merge sort 後，再將 singly-linked list 轉回環狀 doubly-linked list。 :::warning 注意連字號的位置，書寫為 "doubly-linked list" :notes: jserv 👌 (以修改) ::: #### 程式碼 * `mergelist` ```c struct list_head *mergelist(struct list_head *l1, struct list_head *l2) { struct list_head temp; struct list_head *t = &temp; while (l1 && l2) { element_t *ele1 = list_entry(l1, element_t, list); element_t *ele2 = list_entry(l2, element_t, list); if (strcmp(ele1->value, ele2->value) < 0) { t->next = l1; t = t->next; l1 = l1->next; } else { t->next = l2; t = t->next; l2 = l2->next; } } // Connect the remain list if (l1) t->next = l1; if (l2) t->next = l2; return temp.next; } ``` * `mergesort` ```c struct list_head *mergesort(struct list_head *head) { if (!head->next) return head; struct list_head *fast = head->next; struct list_head *slow = head; // split list while (fast && fast->next) { slow = slow->next; fast = fast->next->next; } // Fast is the middle of list fast = slow->next; slow->next = NULL; struct list_head *left = mergesort(head); struct list_head *right = mergesort(fast); // Merge sorted left and sorted right return mergelist(left, right); } ``` * `q_sort` ```c void q_sort(struct list_head *head) { if (!head || list_empty(head)) return; // Turn the list into Singly linked list head->prev->next = NULL; head->next = mergesort(head->next); // Turn the list into doubly circular linked list struct list_head *temp = head, *next = head->next; while (next) { next->prev = temp; temp = next; next = next->next; } temp->next = head; head->prev = temp; ``` ### q_merge #### 思路一開始有兩種想法，第一種是使用 `mergelist` 迭代的將兩條 list 合併成一條，但因為 `mergelist` 實作的關係，輸入需要是沒有 `head` 的兩條 list ，所以在迭代前都需要先對 list `head` 做處理，而在合併後還須將 list 變回 doubly-linked list，以上步驟會讓程式碼變得較為冗長。於是採用第二種方法，第二種為先將所有 list 串接在一起，再對這條 list 做 `q_sort` ，這樣一來就無須對 `head` 做變更，整體程式碼可以較為簡短。 `cur` 表示第一條 queue， `temp` 為指向下一條佇列的指標，迭代的將後面的佇列連接到第一條佇列後面，最後再對這條佇列做 `q_sort` ，最後回傳佇列長度。 #### 程式碼 ```c int q_merge(struct list_head *head) { if (!head || list_empty(head)) return 0; queue_contex_t *cur = list_entry(head->next, queue_contex_t, chain); for (struct list_head *temp = head->next->next; temp != head; temp = temp->next) { queue_contex_t *t = list_entry(temp, queue_contex_t, chain); list_splice_init(t->q, cur->q); } q_sort(cur->q); return q_size(cur->q); } ``` ## 引入 list_sort.c 比較自己的 sort.c 效能差異 ### 引入`list_sort` 檔案及修改首先引入 `list_sort.h` ，刪除不必要的 `header` 。 ```c /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_LIST_SORT_H #define _LINUX_LIST_SORT_H #include <stdint.h> #include "list.h" ``` `list_sort.c` 使到兩個巨集 `likely` 與 `unlikely`，定義於`<linux/compiler.h>`，將其定義加入 `list_sort.h` 中，在 `merge_final` 函式中宣告了 `u8 count = 0;` ，將其改成 `uint_8 count = 0;` 。 ```c # define likely(x) __builtin_expect(!!(x), 1) # define unlikely(x) __builtin_expect(!!(x), 0) ``` 最後是將 `list_sort.o` 寫入 `makefile` 裡。 ```c OBJS := qtest.o report.o console.o harness.o queue.o list_sort.o\ random.o dudect/constant.o dudect/fixture.o dudect/ttest.o \ shannon_entropy.o \ linenoise.o web.o ``` 在 `qtest` 中先複製一份 `do_sort` 給 `list_sort` 使用，在 `list_sort.h` 中有定義函式，於是需要自己實作，這個函式是用來比較字串大小，回傳值為 `int` 。 ```c typedef int __attribute__((nonnull(2,3))) (*list_cmp_func_t)(void *, const struct list_head *, const struct list_head *); ``` 所以在 `qtest` 中加入 `cmp` 函式實作。 ```c int cmp(void *priv, const struct list_head *list1, const struct list_head *list2) { element_t *list1_entry = list_entry(list1, element_t, list); element_t *list2_entry = list_entry(list2, element_t, list); return strcmp(list1_entry->value, list2_entry->value) ; } ``` 在 `qtest` 中的命令 `mysort` 與 `list_sort` 都能正常執行。 ### 效能比較這裡用 `qtest` 中的 `time` 進行時間的比較，有三種測資，分別用命令 `RAND` 加入100000、300000、500000筆隨機資料給 queue 。 #### 測試指令加入自己寫的 `trace-mysort.cmd` 與 `trace-list_sort.cmd` 到測試資料中， `./qtest -v 3 -f traces/trace-list_sort.cmd` 會執行以下命令。 ```c option fail 0 option malloc 0 new ih RAND 100000/300000/500000 time mysort/list_sort free ``` #### my_sort |測試次數|100000筆|300000筆|500000筆| |-|-|-|-| |1|0.042|0.196|0.407| |2|0.041|0.194|0.419| |3|0.042|0.200|0.430| |average|0.0417|0.1967|0.4187| #### list_sort |測試次數|100000筆|300000筆|500000筆| |-|-|-|-| |1|0.041|0.170|0.368| |2|0.043|0.170|0.388| |3|0.039|0.171|0.364| |average|0.041|0.1703|0.3733| #### 效能差距 ||100000筆|300000筆|500000筆| |-|-|-|-| |效能差距|2%|14%|11%| 因為測試資料都是隨機生成的，加上測試次數不多，所以以 Perf 來分析效能。 :::warning 測試方式不足以反映出上述實作的特性。 :notes: jserv 👌 (以修改) ::: ### 以 [Linux 效能分析工具: Perf](http://wiki.csie.ncku.edu.tw/embedded/perf-tutorial) 來分析效能 #### 環境安裝 ``` $ cat "/boot/config-`uname -r`" | grep "PERF_EVENT" ``` 用此指令確認是否安裝 perf。 ``` $ uname -r 5.19.0-43-generic ``` 確認 kernel 版本。 ``` $ sudo apt-get install linux-tools-5.19.0-43-generic linux-cloud-tools-5.19.0-43-generic ``` 安裝 perf。 ``` $ cat /proc/sys/kernel/perf_event_paranoid ``` 確認 perf 權限。 * 2 : 不允許任何量測。但部份用來查看或分析已存在的紀錄的指令仍可使用，如 perf ls、perf report、perf timechart、 perf trace。 * 1 : 不允許 CPU events data。但可以使用 perf stat、perf record 並取得 Kernel profiling data。 * 0 : 不允許 raw tracepoint access。但可以使用 perf stat、perf record 並取得 CPU events data。 * -1: 權限全開。 ``` $ sudo sh -c " echo -1 > /proc/sys/kernel/perf_event_paranoid" ``` 將權限全開 ``` $ sudo sh -c " echo 0 > /proc/sys/kernel/kptr_restrict" ``` 最後如果要檢測 cache miss event，需要先取消 kernel pointer 的禁用。 #### 測試指令使用`./qtest -v 3 -f traces/trace-list_sort.cmd` 會執行以下命令，這裡以500000筆隨機資料做測試。 ```c option fail 0 option malloc 0 new ih RAND 500000 time mysort/list_sort free ``` 執行 `perf stat --repeat 5 -e cache-misses,cache-references,instructions,cycles ./qtest -f traces/trace-mysort.cmd` ，會執行5次上述命令，再去觀察其效能。 #### my_sort ``` Performance counter stats for './qtest -f traces/trace-mysort.cmd' (5 runs): 16,049,539 cache-misses # 48.666 % of all cache refs ( +- 0.18% ) 32,954,397 cache-references ( +- 0.04% ) 2,203,807,317 instructions # 0.74 insn per cycle ( +- 0.00% ) 2,958,082,501 cycles ( +- 0.37% ) 0.65370 +- 0.00448 seconds time elapsed ( +- 0.69% ) ``` #### list_sort ``` Performance counter stats for './qtest -f traces/trace-list_sort.cmd' (5 runs): 14,022,127 cache-misses # 54.505 % of all cache refs ( +- 0.43% ) 25,688,333 cache-references ( +- 0.10% ) 2,251,606,244 instructions # 0.81 insn per cycle ( +- 0.00% ) 2,867,416,957 cycles ( +- 0.90% ) 0.61738 +- 0.00504 seconds time elapsed ( +- 0.82% ) ``` #### 效能 |event data|my_sort|list_sort| |-|-|-| |cache-misses|16,049,539|14,022,127| |cache-references|32,954,397|25,688,333| |instructions|2,203,807,317|2,251,606,244| |cycles|2,958,082,501|2,867,416,95| #### 花費時間 |測試次數|my_sort 花費時間|list_sort 花費時間| |-|-|-| |1|0.357|0.398| |2|0.361|0.412| |3|0.362|0.407| |4|0.364|0.411| |5|0.365|0.402| |average|0.3618|0.406| 可以看出 `list_sort` 的 cache-misses 比 `my_sort` 少很多， `list_sort` 所需的 instructions 與 cycles 都比 `my_sort` 少， `my_sort` 只能達到 `list_sort` 的 89% 效能 (0.3618/0.406 = 0.89) 。 ## 在 qtest 提供新的命令並實作 shuffle ### 利用 [Fisher–Yates shuffle](https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle) 演算法來實作洗牌（shuffle） 1. 先用 `q_size` 取得佇列的大小 len。 2. 隨機從 0 ~ (len - 1) 中抽出一個數字 random，抽到的數字為節點的位置，再將該節點加到佇列最後位置。 3. 隨著 len 大小變小，直到所有的節點都已經被抽到過，shuffle 就結束。 ### shuffle 流程 | len | random | queue | result | |-|-|-|-| |0–5|5|A B C D E ~~F~~|F| |0–4|3|A B C ~~D~~ E|F D| |0–3|1|A ~~B~~ C E|F D B| |0–2|1|A ~~C~~ E|F D B C| |0–1|0|~~A~~ E|F D B C A| |0–0|0|~~E~~|F D B C A E| ### shuffle 實作 `len` 為佇列的大小，`random` 為隨機數，也同時代表被抽到的節點位置，將 `node` 移動到第 random 個節點位置，再將該節點移動到佇列最後的位置，當每個節點都做過一輪就結束。 ```c void shuffle(struct list_head *head) { if (!head || list_empty(head)) return; int len = q_size(head); while (len) { int random = rand() % len; struct list_head *node = head->next; while (random) { node = node->next; random--; } list_move_tail(node, head); len--; } } ``` 將 `shuffle` 加入到 `qtest` 裡面，這裡的 `do_shuffle` 是參考 `do_reverse` 的實作，因為這兩個函式都是用來改變佇列的位置。 ```c bool do_shuffle(int argc, char *argv[]) { if (argc != 1) { report(1, "%s takes no arguments", argv[0]); return false; } if (!current || !current->q) report(3, "Warning: Calling shuffle on null queue"); error_check(); set_noallocate_mode(true); if (current && exception_setup(true)) shuffle(current->q); exception_cancel(); set_noallocate_mode(false); q_show(3); return !error_check(); } ``` ### 執行結果 ``` # do 5 times shuffle l = [] l = [A] l = [A B] l = [A B C] l = [A B C D] l = [A B C D E] l = [A B C D E F] # shuffle l = [E F D C B A] l = [B A F E C D] l = [E F B C A D] l = [C F D E A B] l = [B D F A E C] l = NULL Freeing queue ``` ## 研讀論文[〈Dude, is my code constant time?〉](https://eprint.iacr.org/2016/1123.pdf)，解釋本程式的 “simulation” 模式 ### 研讀論文[〈Dude, is my code constant time?〉](https://eprint.iacr.org/2016/1123.pdf) #### Introduction 這篇論文主要闡述提供了一個新的工具 dudect ，可以不限定任何平台去檢測部份的程式碼是否達到 constant time ，提供 leakage detection 的方法。 #### Leakage detection 這個方法會先去測量兩種不同的輸入的執行時間，並用統計學去判定這兩種執行時間的差異性，其中一種輸入為固定的數值，另一種為隨機數值。在進行統計分析前會先裁掉極值，這些極值可能是測量的誤差或是程式執行被中斷導致。 #### Statistical test 這裡採用 [Welch's t-test](https://en.wikipedia.org/wiki/Welch%27s_t-test) ，為 [Student's t-test](https://en.wikipedia.org/wiki/Student%27s_t-distribution) 的改編版， Welch's t-test 處理兩個樣本間不同的標準差與樣本大小會比較可靠，這個方法會去計算兩個樣本的均值是否相等，並計算出 t 值。其中 $\bar{X}$ 為樣本均值， $s_{\bar{X}}$ 為標準差。 ![](https://i.imgur.com/00ekpCY.png) #### Pre-processing 這裡提到的預處理指的是在進行統計分析前裁剪掉極值的動作，而裁剪的方法為裁剪掉大於 $p$ 百分比的測量值。這部份也就是 simulation 的缺陷，因為沒有將極值裁剪掉，導致影響到測量的速度。 ### 解釋 Simulation 模式首先在 `trace-17-complexity.cmd` 會先執行 `option simulation 1` ，先開啟 simulation mode ，再呼叫 `it`, `ih`, `rh`, `rt` 四個函式。以呼叫 `ih` 為例，在 `qtest` 中會執行 `bool do_ih` 函式。 ```c if (simulation) { if (argc != 1) { report(1, "%s does not need arguments in simulation mode", argv[0]); return false; } bool ok = is_insert_head_const(); if (!ok) { report(1, "ERROR: Probably not constant time or wrong implementation"); return false; } report(1, "Probably constant time"); return ok; } ``` 可以看到決定執行時間是否為 constant time 且程式正確執行，由函式 `is_insert_head_const` 決定，此函式在 `fixture.h` 定義。 ```c bool is_##op##_const(void) { return test_const(#op, DUT(op)); } ``` 在 `fixture.c` 中 `is_##op##_const` 會回傳函式 `test_const` 的結果 `result` ， `result` 為函式 `doit` 的回傳值。 ```c result = doit(mode); ``` 其中 `result` 由兩個函式的回傳值做 AND 而來， `measure()`, `report()`。 ```c static bool doit(int mode) { ... bool ret = measure(before_ticks, after_ticks, input_data, mode); ... ret &= report(); ... return ret; } ``` 首先為函式`measure` ，主要對四種模式 `it`, `ih`, `rh`, `rt` 進行執行正確性的判定，判定依據為執行完該函式，其改變後的佇列大小是否正確。 ```c if (before_size != after_size - 1) return false; ``` 再來是函式 `report` ，主要是用來判定執行時間是否為 constant time 。 ```c /* Definitely not constant time */ if (max_t > t_threshold_bananas) return false; /* Probably not constant time. */ if (max_t > t_threshold_moderate) return false; /* For the moment, maybe constant time. */ return true; ``` ### 解決 Simulation 實作的缺陷首先將參數 $p$ 百分比 `percentiles` 加入到宣告中，但因為作者有多宣告了一種結構，而 Simulation 實作是將宣告直接寫在函式中，於是這裡將 `percentiles` 在 `t_context_t` 結構中宣告。 ```c /* 作者宣告 */ typedef struct { int64_t *ticks; int64_t *exec_times; uint8_t *input_data; uint8_t *classes; dudect_config_t *config; ttest_ctx_t *ttest_ctxs[DUDECT_TESTS]; int64_t *percentiles; <------ } dudect_ctx_t; ``` ```diff typedef struct { double mean[2]; double m2[2]; double n[2]; + int64_t *percentiles; } t_context_t; ``` 再將相關程式引入到專案中，詳細修改在 [commit](https://github.com/sysprog21/lab0-c/commit/01fea1727467d6843ac984652a6e00f35e50d5e2) 中。 ### 測試結果加入了 `percentiles` 的實作後，測試都能在 constant time 內完成。 ``` --- Trace Points +++ TESTING trace trace-17-complexity: # Test if time complexity of q_insert_tail, q_insert_head, q_remove_tail, and q_remove_head is constant Probably constant time Probably constant time Probably constant time Probably constant time --- trace-17-complexity 5/5 --- TOTAL 5/5 ```