# 2020q3 Homework1 (lab0) contributed by < `blueskyson` > ###### tags: `linux2020` ## 環境 在 VirtualBox 中安裝 Ubuntu 20.04.1 ``` $ uname -a Linux vm1 5.4.0-47-generic #51-Ubuntu SMP Fri Sep 4 19:50:52 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux $ gcc --version gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0 ``` ## 實作 queue.h ### queue_t - 在此結構加入 `list_ele_t *tail` ,使 `bool q_insert_tail()` 時間複雜度 $O(1)$ - 加入 `unsigned size` 儲存佇列長度,使 `int q_size(queue_t *q)` 時間複雜度 $O(1)$ ```cpp= typedef struct { list_ele_t *head; list_ele_t *tail; unsigned size; } queue_t; ``` ## 實作 queue.c ### queue_t *q_new() - 第 4 行先確認記憶體是否分配成功,若否就直接回傳 `NULL` - 接下來初始化 `head` 以及 `tail` 為 `NULL` ```cpp= queue_t *q_new() { queue_t *q = malloc(sizeof(queue_t)); if (!q) return NULL; q->head = NULL; q->tail = NULL; q->size = 0; return q; } ``` ### void q_free(queue_t *q) - 第 3 行先確認傳入的佇列是否為 NULL,若為 NULL 則直接跳出函式 - 接下來,當 `q->head` 不是 NULL 時,先用指標 `*del` 指向 `q->head` ,再把 `q->head` 指向 `q->head->next` ,然後釋放 `*del` - 最後釋放 `q` ```cpp= void q_free(queue_t *q) { if(!q) return; free(q); while (q->head) { list_ele_t *del = q->head; q->head = q->head->next; free(del->value); free(del); } free(q); } ``` ### bool q_insert_head(queue_t *q, char *s) - 第 3 行確認傳入的佇列是否為 NULL,若為 NULL 則直接跳出函式 - 第 6 行確認新元素 `newh` 記憶體是否配置成功 - 第 10 行確認 `*val` 的記憶體是否配置成功,若沒有配置成功,要記得把先前配置的 `newh` 釋放 - 然後使用 `memcpy()` 將傳入的字串複製進到 `val` 並讓 `newh->value` 指向 `val` ,這裡要注意記得要預留 `'\0'` 的位置 - 最後把原來的 `q->head` 接在 `newh` 後面,並把 `newh` 設為新的 `q->head` ```cpp= bool q_insert_head(queue_t *q, char *s) { if (!q) return false; list_ele_t *newh = malloc(sizeof(list_ele_t)); if (!newh) return false; unsigned len = strlen(s) + 1; char *val = malloc(sizeof(char) * len); if (!val) { free(newh); return false; } memcpy(val, s, len); newh->value = val; newh->next = q->head; q->head = newh; if (q->size == 0) q->tail = newh; (q->size)++; return true; } ``` ### bool q_insert_tail(queue_t *q, char *s) - 前面的動作跟 `q_insert_head()` 差不多,接下來先判斷佇列 `q` 否為空,若為空,則把 `q->head` 及 `q->tail` 設為 `newt` ;若不為空,則把 `newt` 接在原本的 `q->tail` 後面,並將其設為 `q->tail` ```cpp= bool q_insert_tail(queue_t *q, char *s) { if (q == NULL) return false; list_ele_t *newt = malloc(sizeof(list_ele_t)); if (!newt) return false; unsigned len = strlen(s) + 1; char *val = malloc(sizeof(char) * len); if (!val) { free(newt); return false; } memcpy(val, s, len); newt->value = val; newt->next = NULL; if (q->size == 0) { q->head = newt; q->tail = newt; } else { q->tail->next = newt; q->tail = newt; } (q->size)++; return true; } ``` ### bool q_remove_head(queue_t *q, char *sp, size_t bufsize) - 首先確認 `q` 是否為 `NULL` 以及佇列是否為空,然後確認佇列長度是否為 1 ,如果是,就把 `q->tail` 指向 `NULL` - 將原本的 `q->head` 指向 `q->head->next` - 接下來把字串複製到 `sp` ,如果字串長度大於 `bufsize` ,則只複製 `bufsize` 個字元,並且把最後一個字元改為 `\0` - 最後將目標元素刪除,`q->size` 減 1 ```cpp= bool q_remove_head(queue_t *q, char *sp, size_t bufsize) { if (!q || !(q->head)) return false; if (q->size == 1) q->tail = NULL; list_ele_t *del; del = q->head; q->head = q->head->next; if (sp) { if (strlen(del->value) > bufsize) { memcpy(sp, del->value, bufsize); sp[bufsize - 1] = '\0'; } else { memcpy(sp, del->value, sizeof(char) * (strlen(del->value) + 1)); } } free(del->value); free(del); (q->size)--; return true; } ``` ### int q_size(queue_t *q) ```cpp= int q_size(queue_t *q) { if (!q) return 0; return q->size; } ``` ### void q_reverse(queue_t *q) - `*prev` `*curr` `*next` 代表相鄰的三個節點。每當 `curr` 非 NULL,先將 `next` 指向 `curr->next` ,然後將 `curr->next` 指向 `prev` ,把三個節點往前平移,持續前述動作直到 `curr` 為 NULL,達到反轉 linked list 的效果。圖示如下: ```graphviz digraph g1{ rankdir=LR; node [shape=record]; 1 [label = "{<data>1 |<ref>}"]; 2 [label = "{<data>2 |<ref>}"]; 3 [label = "{<data>3 |<ref>}"]; 4 [label = "{<data>4 |<ref>}"]; 5 [label = "{<data>... |<ref>}"]; 1:ref -> 2:data [color=black]; 2:ref -> 3:data [color=black]; 3:ref -> 4:data [color=black]; 4:ref -> 5:data [color=black]; prev->1; next->3; curr->2; 2:ref -> 1:data [color=red] } ``` ```graphviz digraph g2{ rankdir=LR; node [shape=record]; 1 [label = "{<data>1 |<ref>}"]; 2 [label = "{<data>2 |<ref>}"]; 3 [label = "{<data>3 |<ref>}"]; 4 [label = "{<data>4 |<ref>}"]; 5 [label = "{<data>... |<ref>}"]; 2:ref -> 1:data [color=black]; 3:ref -> 4:data [color=black]; 4:ref -> 5:data [color=black]; prev->2; next->4; curr->3; 3:ref -> 2:data [color=red] } ``` ```cpp= void q_reverse(queue_t *q) { if (!q || !(q->head)) return; list_ele_t *prev, *curr, *next; prev = q->head; q->tail = q->head; curr = prev->next; while (curr) { next = curr->next; curr->next = prev; prev = curr; curr = next; } q->head = prev; q->tail->next = NULL; } ``` ### void q_sort(queue_t *q) - 原本很隨便的寫了 bubble sort 交差了事,然而在 trace-15 和 trace-16 會發生 TLE,所以搜尋了其他適合 linked list 的排序法,參考 [Merge Sort for Linked Lists](https://www.geeksforgeeks.org/merge-sort-for-linked-list/) 實作 merge sort - 由於我需要直接修改整個 linked list 的連接順序,所以需要使用 `q->head` 指標的指標,也就是 `&(q->head)` 操作其儲存的頭節點位址,而非單純修改節點的連接順序 - 首先,宣告指標 `*fast` `*slow` ,來找出佇列的中間節點, `fast` 每次會移動 2 個單位,`slow` 每次會移動 1 個單位,所以當 `fast` 指向 `NULL` 時, `slow` 正好指向中間的節點 - 找到中間的節點之後,把佇列切成兩個分割,左分割的頭是 `*headref` 和 `slow` ,右分割的頭是 `slow->next`,接下來再對其進行分割直到每個分割剩下 1 個或 0 個節點。 - 最後透過 `strcmp()` 比較字串長度及大小,把左分割及右分割的所有元素由小到大組合在一起。 ```cpp= void sort_recur(list_ele_t **headref) { list_ele_t *head = *headref; if (head == NULL || head->next == NULL) return; /* find the middle node */ list_ele_t *fast = head->next, *slow = head; while (fast != NULL) { fast = fast->next; if (fast != NULL) { fast = fast->next; slow = slow->next; } } /* make partitions */ list_ele_t *right_head = slow->next; slow->next = NULL; sort_recur(&head); sort_recur(&right_head); list_ele_t *head_tmp; /* find out the smallest node in 2 partitions * and set it as the head of combined partition */ if (strcmp(head->value, right_head->value) <= 0) { *headref = head; head = head->next; } else { *headref = right_head; right_head = right_head->next; } head_tmp = *headref; /* compare values in 2 partitions and * reorder them in combined partition */ while (true) { if (head == NULL) { head_tmp->next = right_head; break; } else if (right_head == NULL) { head_tmp->next = head; break; } if (strcmp(head->value, right_head->value) <= 0) { head_tmp->next = head; head = head->next; } else { head_tmp->next = right_head; right_head = right_head->next; } head_tmp = head_tmp->next; } } void q_sort(queue_t *q) { if (!q || q->size < 2) return; sort_recur(&(q->head)); } ``` ## Valgrind 排除 qtest 實作的記憶體錯誤 首先,使用 valgrind leak-check 工具針對每個 trace 檢查是否有 memory leak ### 測試 trace-01-ops.cmd ```bash $ valgrind -q --leak-check=full ./qtest -f traces/trace-01-ops.cmd cmd> # Test of insert_head and remove_head cmd> option fail 0 cmd> option malloc 0 cmd> new q = [] cmd> ih gerbil q = [gerbil] cmd> ih bear q = [bear gerbil] cmd> ih dolphin q = [dolphin bear gerbil] cmd> rh dolphin Removed dolphin from queue q = [bear gerbil] cmd> rh bear Removed bear from queue q = [gerbil] cmd> rh gerbil Removed gerbil from queue q = [] cmd> Freeing queue ==8226== 8 bytes in 1 blocks are still reachable in loss record 1 of 3 ==8226== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==8226== by 0x4A4D50E: strdup (strdup.c:42) ==8226== by 0x10FEF9: linenoiseHistoryAdd (linenoise.c:1236) ==8226== by 0x110A8C: linenoiseHistoryLoad (linenoise.c:1325) ==8226== by 0x10C214: main (qtest.c:769) ==8226== ==8226== 160 bytes in 1 blocks are still reachable in loss record 2 of 3 ==8226== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==8226== by 0x10FEB9: linenoiseHistoryAdd (linenoise.c:1224) ==8226== by 0x110A8C: linenoiseHistoryLoad (linenoise.c:1325) ==8226== by 0x10C214: main (qtest.c:769) ==8226== ==8226== 321 bytes in 19 blocks are still reachable in loss record 3 of 3 ==8226== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==8226== by 0x4A4D50E: strdup (strdup.c:42) ==8226== by 0x10FE6D: linenoiseHistoryAdd (linenoise.c:1236) ==8226== by 0x110A8C: linenoiseHistoryLoad (linenoise.c:1325) ==8226== by 0x10C214: main (qtest.c:769) ==8226== ``` 從提示訊息可以得知 qtest.c 第 769 行出現問題,觀察程式碼: ` linenoiseHistoryLoad(HISTORY_FILE); /* Load the history at startup */ ` 得知 linenoiseHistoryLoad 函式有問題,接下來觀察 linenoise.c 第 1325 行: `linenoiseHistoryAdd(buf);` 觀察 linenoise.c 第 1236 行: `linecopy = strdup(line);` 最後追查到 `strdup()` 的原始碼,然而此函式好歹也包含在 C 的函式庫中,應該不會有問題,其原始碼在 [https://code.woboq.org/userspace/glibc/string/strdup.c.html](https://code.woboq.org/userspace/glibc/string/strdup.c.html) 擷取 `__strdup()` 觀察其實作: ```cpp= /* Duplicate S, returning an identical malloc'd string. */ char * __strdup (const char *s) { size_t len = strlen (s) + 1; void *new = malloc (len); if (new == NULL) return NULL; return (char *) memcpy (new, s, len); } ``` 觀察函式的內容,確認此函式沒有 memory leak 的疑慮。那問題應該是出在 linenoise.c 第 1236 行: `linecopy = strdup(line);` 我初步猜測變數 linecopy 可能忘記被清理掉了,觀察前後程式碼,於是我在第 1244 行後面補上 `free(linecopy);` 函式,在函式回傳前釋放 `linecopy` ,然後再度編譯執行: ``` $ make CC linenoise.o LD qtest $ valgrind -q --leak-check=full ./qtest -f traces/trace-01-ops.cmd ==9257== Invalid read of size 1 ==9257== at 0x483FED4: strcmp (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==9257== by 0x10FE61: linenoiseHistoryAdd (linenoise.c:1231) ==9257== by 0x110A94: linenoiseHistoryLoad (linenoise.c:1326) ==9257== by 0x10C214: main (qtest.c:769) ==9257== Address 0x4ba1cf0 is 0 bytes inside a block of size 8 free'd ==9257== at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==9257== by 0x10FF26: linenoiseHistoryAdd (linenoise.c:1246) ==9257== by 0x110A94: linenoiseHistoryLoad . . . ``` 結果噴出更多記憶體錯誤,於是我把程式碼復原,並且再度思考,發現 ` linenoise.c` 有個全域變數: ```cpp static char **history = NULL; ``` 我從此變數被調用的形式猜測此為儲存 qtest 紀錄的東西,也許是 qtest 結束之後忘記將將其釋放造成 memory leak,所以我尋找可以釋放此變數的函式,發現以下函式可以達到目的: ```cpp= static void freeHistory(void) { if (history) { int j; for (j = 0; j < history_len; j++) free(history[j]); free(history); } } ``` 我很開心地在 qtest.c 倒數第三行呼叫 `freeHistory()` ,然而編譯時發生了錯誤: ``` $ make CC qtest.o qtest.c: In function ‘main’: qtest.c:782:5: error: implicit declaration of function ‘freeHistory’ [-Werror=implicit-function-declaration] 782 | freeHistory(); | ^~~~~~~~~~~ cc1: all warnings being treated as errors make: *** [Makefile:50: qtest.o] Error 1 ``` 似乎是 linenoise.h 沒有宣告這個函式造成這個錯誤,於是我在 linenoise.h 補上宣告,然後編譯: ``` $ make CC qtest.o In file included from console.h:5, from qtest.c:34: linenoise.h:73:13: error: ‘freeHistory’ used but never defined [-Werror] 73 | static void freeHistory(void); | ^~~~~~~~~~~ cc1: all warnings being treated as errors make: *** [Makefile:50: qtest.o] Error 1 ``` 經過網路爬文,在 [stackoverflow](https://stackoverflow.com/questions/5526461/gcc-warning-function-used-but-not-defined) 發現問題的原因,是因為 static 前綴的函式只能被實作的 .c 檔看到,其他 .c 檔無法呼叫,所以不能在 qtest.c 呼叫它!最後我只好自幹一個跟 `freeHistory()` 功能一樣,但是沒有 static 前綴的函式: ```cpp= void deleteHistory(void) { if (history) { int j; for (j = 0; j < history_len; j++) free(history[j]); free(history); } } ``` 在 linenoise.h 宣告 `void deleteHistory(void);` 、並且在 qtest.c 倒數第三行呼叫 `deleteHistory();` 之後, trace-01-ops.cmd 終於通過記憶體測試: ``` $ make CC qtest.o CC console.o CC dudect/fixture.o CC linenoise.o LD qtest $ valgrind -q --leak-check=full ./qtest -f traces/trace-01-ops.cmd cmd> # Test of insert_head and remove_head cmd> option fail 0 cmd> option malloc 0 cmd> new q = [] cmd> ih gerbil q = [gerbil] cmd> ih bear q = [bear gerbil] cmd> ih dolphin q = [dolphin bear gerbil] cmd> rh dolphin Removed dolphin from queue q = [bear gerbil] cmd> rh bear Removed bear from queue q = [gerbil] cmd> rh gerbil Removed gerbil from queue q = [] cmd> Freeing queue ``` ### 測式其他 trace 接下來用類似的方法依序測試其他 trace: ``` $ valgrind -q --leak-check=full ./qtest -f traces/trace-02-ops.cmd $ valgrind -q --leak-check=full ./qtest -f traces/trace-03-ops.cmd $ valgrind -q --leak-check=full ./qtest -f traces/trace-04-ops.cmd . . . ``` 都沒有再出現記憶體問題,然而在測試 trace-12 時會發生 malloc error、測試 trace-13 時會發生 TLE ,不確定是不是 valgrind 造成的。於是乎我想到使用 Makefile 自帶的 make valgrind: ``` $ make valgrind # Explicitly disable sanitizer(s) make clean SANITIZER=0 qtest make[1]: Entering directory '/home/vm1/lab0-c' rm -f qtest.o report.o console.o harness.o queue.o random.o dudect/constant.o dudect/fixture.o dudect/ttest.o linenoise.o .qtest.o.d .report.o.d .console.o.d .harness.o.d .queue.o.d .random.o.d .dudect/constant.o.d .dudect/fixture.o.d .dudect/ttest.o.d .linenoise.o.d *~ qtest /tmp/qtest.* rm -rf .dudect rm -rf *.dSYM (cd traces; rm -f *~) CC qtest.o CC report.o CC console.o CC harness.o CC queue.o CC random.o CC dudect/constant.o CC dudect/fixture.o CC dudect/ttest.o CC linenoise.o LD qtest make[1]: Leaving directory '/home/vm1/lab0-c' cp qtest /tmp/qtest.Bv7bLm chmod u+x /tmp/qtest.Bv7bLm sed -i "s/alarm/isnan/g" /tmp/qtest.Bv7bLm scripts/driver.py -p /tmp/qtest.Bv7bLm --valgrind --- Trace Points +++ TESTING trace trace-01-ops: # Test of insert_head and remove_head --- trace-01-ops 6/6 +++ TESTING trace trace-02-ops: # Test of insert_head, insert_tail, and remove_head --- trace-02-ops 6/6 +++ TESTING trace trace-03-ops: # Test of insert_head, insert_tail, reverse, and remove_head --- trace-03-ops 6/6 +++ TESTING trace trace-04-ops: # Test of insert_head, insert_tail, size, and sort --- trace-04-ops 6/6 +++ TESTING trace trace-05-ops: # Test of insert_head, insert_tail, remove_head, reverse, size, and sort --- trace-05-ops 5/5 +++ TESTING trace trace-06-string: # Test of truncated strings --- trace-06-string 6/6 +++ TESTING trace trace-07-robust: # Test operations on NULL queue --- trace-07-robust 6/6 +++ TESTING trace trace-08-robust: # Test operations on empty queue --- trace-08-robust 6/6 +++ TESTING trace trace-09-robust: # Test remove_head with NULL argument --- trace-09-robust 6/6 +++ TESTING trace trace-10-malloc: # Test of malloc failure on new --- trace-10-malloc 6/6 +++ TESTING trace trace-11-malloc: # Test of malloc failure on insert_head --- trace-11-malloc 6/6 +++ TESTING trace trace-12-malloc: # Test of malloc failure on insert_tail --- trace-12-malloc 6/6 +++ TESTING trace trace-13-perf: # Test performance of insert_tail --- trace-13-perf 6/6 +++ TESTING trace trace-14-perf: # Test performance of size --- trace-14-perf 6/6 +++ TESTING trace trace-15-perf: # Test performance of insert_tail, size, reverse, and sort --- trace-15-perf 6/6 +++ TESTING trace trace-16-perf: # Test performance of sort with random and descending orders # 10000: all correct sorting algorithms are expected pass # 50000: sorting algorithms with O(n^2) time complexity are expected failed # 100000: sorting algorithms with O(nlogn) time complexity are expected pass --- trace-16-perf 6/6 +++ TESTING trace trace-17-complexity: # Test if q_insert_tail and q_size is constant time complexity Probably constant time Probably constant time --- trace-17-complexity 5/5 --- TOTAL 100/100 Test with specific case by running command: scripts/driver.py -p /tmp/qtest.Bv7bLm --valgrind -t <tid> ``` 完美通過測試 ## Massif 視覺化 ### 實驗 1 實驗方法為: 比較 qtest.c 中有無呼叫 `deleteHistory()` 的圖案差別,以 trace-01 為例: ``` $ valgrind --tool=massif ./qtest -f traces/trace-01-ops.cmd $ massif-visualizer massif.out.* ``` 下圖是有呼叫 `deleteHistory()` 的結果,可以看到最後的 Total Memory Heap Consumption 有歸零,代表記憶體有確實歸還 ![](https://i.imgur.com/HakghEW.jpg) 下圖是無呼叫 `deleteHistory()` 的結果,可以看到最後的 Total Memory Heap Consumption 尚未歸零,然後程式直接結束,代表記憶體有缺失 ![](https://i.imgur.com/zVDmizE.jpg) ### 實驗 2 實驗方法為: 比較 qtest.c 中有無呼叫 deleteHistory() 的圖案差別,以 make test 為例: ``` $ valgrind --tool=massif make test $ massif-visualizer massif.out.* ``` ![](https://i.imgur.com/lyoNjdK.jpg) ![](https://i.imgur.com/IvWk0Op.jpg) 兩張圖的 Total Memory Heap Consumption 都長的差不多 ## 研讀 Dudect ### a) Variable-time implementation 首先研讀論文,但是我實在有看沒有懂,就算丟去翻譯成中文,一樣看的不是很懂。就我所理解的,作者第一個實驗測試 Memory comparison ,分成兩組測資,第一組使用均勻分布的 16-byte 字串,命名為 “random class”;另一組測資為完全相同的字串,命名為 “fix class” 。然後觀察這兩種 class 跟一個字串 s 做比較所耗費的 cpu cycle ,並畫出累積分布函數 (cdf) : ![](https://i.imgur.com/nBVBBei.png) 由上圖可以發現紅色所代表的 class 耗用的 cycle 數比較低、藍色較高,可見他們的分佈完全不同,接下來作者對實驗做不同的前處理,然後做多次的測試並比較兩種 class 分布的 t-statistic 的絕對值。當這個值大於 4.5 時,代表這兩種 class 的分布是不一樣的: ![](https://i.imgur.com/287ViOF.png) 很明顯,每次實驗的 |t| statistic 都大於 4.5, “random class”、“fix class” 的 Memory comparison 為不同的分布,他們耗費的 cpu cycle 差太多,出現 timing leakage 接下來,作者對實驗方法做了小幅修改,它降低評估工具的能力,並且讓評估工具不知道字串 s (我無法理解這句 "the evaluator does not know s" 的意思,是指不知道 s 儲存的值,還是連 s 的存在都不知道,還是別的情況...) ,然後將 fixed class 的值變成 0 ,再度進行實驗: ![](https://i.imgur.com/9sqF5Zv.png) ![](https://i.imgur.com/LTaDXTs.png) 經實驗後發現: 兩者耗費 cycle 數的分布變靠近了,然而 |t| statistic 在經過一些前處理之後,仍然無法保持在 4.5 以下,所以兩者分布仍有顯著差異 ### b) Constant-time implementation 接下來作者設計了一個預期為 constant time 的函式,這個函式會透過邏輯運算比較字串,並且不會在偵測到不同字元時提早結束 ![](https://i.imgur.com/dEgk8KE.png) ![](https://i.imgur.com/BB9krEc.png) 由 |t| statistics 小於 4.5 可以得知兩種 class 的分布是一樣的,無 timing leakage,由此可以推估此函式的複雜度為 $O(1)$ 。 接下來作者以類似的方法分析 AES128 、 Curve25519 on an ARM7TDMI 的 timing leakage ,就不多做贅述了。 ### 總結 作者使用 fix class 來測試 corner-case 的執行情況、 random class 測試一般情況,觀察是否有 timing leakage。有時候測資太小,難以觀察 timing leakage ,此時將測試次數增加,增加到百萬倍以上觀察 timing leakage 是否被拉開,若無,則可以得知該程式有很大的可能為 constant time。 然而我在想,如果 corner-case 並不影響程式執行的 cycle ,也就是 corner-case 無法造成 timing leakage,也許這個方法測試 constant time 會失準? 我的理解可能有錯,歡迎各位糾正 ### 解釋 dudect 實作 首先我觀察 qtest.c 的 `do_insert_tail()` ,當變數 `simulation` 為 `TRUE` 時,裡面有一個測量時間複雜度的函式 `is_insert_tail_const()` 就會被呼叫,進而啟動量測的程序,該函式在 dudect/fixture.c;在 `is_insert_tail_const()` 裡,會多次呼叫 `result = doit(0);` 如果 `doit()` 判斷 insert_tail 為 constant time ,會回傳 `TRUE` ,然後停止測試。 接下來觀察 `doit()` 的實作: ```cpp= static bool doit(int mode) { int64_t *before_ticks = calloc(number_measurements + 1, sizeof(int64_t)); int64_t *after_ticks = calloc(number_measurements + 1, sizeof(int64_t)); int64_t *exec_times = calloc(number_measurements, sizeof(int64_t)); uint8_t *classes = calloc(number_measurements, sizeof(uint8_t)); uint8_t *input_data = calloc(number_measurements * chunk_size, sizeof(uint8_t)); if (!before_ticks || !after_ticks || !exec_times || !classes || !input_data) { die(); } prepare_inputs(input_data, classes); measure(before_ticks, after_ticks, input_data, mode); differentiate(exec_times, before_ticks, after_ticks); update_statistics(exec_times, classes); bool ret = report(); free(before_ticks); free(after_ticks); free(exec_times); free(classes); free(input_data); return ret; } ``` - 第 15 行 `prepare_inputs(input_data, classes);` 會產生測資跟 classes 的資訊,每一筆測資會對應一種 class (fix 或 random)。 - 第 17 行 `measure(before_ticks, after_ticks, input_data, mode);` 是用來測量 cpu cycle 的,其實作的關鍵部分如下: ```cpp for (size_t i = drop_size; i < number_measurements - drop_size; i++) { char *s = get_random_string(); dut_new(); dut_insert_head( get_random_string(), *(uint16_t *) (input_data + i * chunk_size) % 10000); before_ticks[i] = cpucycles(); dut_insert_tail(s, 1); after_ticks[i] = cpucycles(); dut_free(); } ``` 其中 dut_* 看似為函式,其實都是 macro ,其目的應該是為了盡可能減少呼叫函式時的堆疊切換,以提高 cycle 量測的精準度,以 `dut_insert_tail(s, 1);` 為例: ```cpp #define dut_insert_tail(s,n) do { int j = n; while (j--) q_insert_tail(q, s); } while (0) ``` 我們可以看到 `dut_insert_tail(s, 1);` 被 `before_ticks[i] = cpucycles();` 和 `after_ticks[i] = cpucycles();` 夾起來,這就是用來儲存 cpu cycle 的實作方法 再細看 `cpucycle()` 的原始碼: ```cpp #include <stdint.h> // http://www.intel.com/content/www/us/en/embedded/training/ia-32-ia-64-benchmark-code-execution-paper.html static inline int64_t cpucycles(void) { #if defined(__i386__) || defined(__x86_64__) unsigned int hi, lo; __asm__ volatile("rdtsc\n\t" : "=a"(lo), "=d"(hi)); return ((int64_t) lo) | (((int64_t) hi) << 32); #else #error Unsupported Architecture #endif } ``` 上面那個 Intel 的網址如右: [Code Execution Times: IA-32/IA-64 Instruction Set Architecture](http://www.intel.com/content/www/us/en/embedded/training/ia-32-ia-64-benchmark-code-execution-paper.html),參考 Intel 白皮書,我得知: `"rdtsc"` 的全名為 Read Time-Stamp Counter and Processor ID IA assembly instruction ,是用來讀取 cpu cycle 的指令 `"=a"(lo)` 的 `=` 表示 write-only,`a` 表示暫存器 EAX,`(lo)` 則是指存放到變數 `lo` `"=d"(hi)` 的 `=` 表示 write-only,`d` 表示暫存器 EDX,`(hi)` 則是指存放到變數 `hi` 最後將 `lo` 和 `high` 放入 `int64_t` 回傳 - 終於,我們知道了量測 cpu cycle 的過程,接下來回到 `doit()` 的第 18 行 `differentiate(exec_times, before_ticks, after_ticks);` 這個函式很單純,就是計算 `after_ticks` 和 `before_ticks` 的差值: ```cpp= static void differentiate(int64_t *exec_times, int64_t *before_ticks, int64_t *after_ticks) { for (size_t i = 0; i < number_measurements; i++) { exec_times[i] = after_ticks[i] - before_ticks[i]; } } ``` - 再來 `doit()` 第 19 行 `update_statistics(exec_times, classes);` 用來更新 t test 的資料,這些資料會透過 `t_push()` 存放至 `t_ctx *t` : ```cpp void t_push(t_ctx *ctx, double x, uint8_t class) { assert(class == 0 || class == 1); ctx->n[class]++; /* Welford method for computing online variance * in a numerically stable way. */ double delta = x - ctx->mean[class]; ctx->mean[class] = ctx->mean[class] + delta / ctx->n[class]; ctx->m2[class] = ctx->m2[class] + delta * (x - ctx->mean[class]); } ``` - 最後,來到了 `doit()` 第 20 行 `bool ret = report();` 此函式會把 [t-test](https://en.wikipedia.org/wiki/Student%27s_t-test) 的一些參數計算出來: ```cpp= static bool report(void) { double max_t = fabs(t_compute(t)); double number_traces_max_t = t->n[0] + t->n[1]; double max_tau = max_t / sqrt(number_traces_max_t); printf("\033[A\033[2K"); printf("meas: %7.2lf M, ", (number_traces_max_t / 1e6)); if (number_traces_max_t < enough_measurements) { printf("not enough measurements (%.0f still to go).\n", enough_measurements - number_traces_max_t); return false; } /* * max_t: the t statistic value * max_tau: a t value normalized by sqrt(number of measurements). * this way we can compare max_tau taken with different * number of measurements. This is sort of "distance * between distributions", independent of number of * measurements. * (5/tau)^2: how many measurements we would need to barely * detect the leak, if present. "barely detect the * leak" = have a t value greater than 5. */ printf("max t: %+7.2f, max tau: %.2e, (5/tau)^2: %.2e.\n", max_t, max_tau, (double) (5 * 5) / (double) (max_tau * max_tau)); if (max_t > t_threshold_bananas) { return false; } else if (max_t > t_threshold_moderate) { return false; } else { /* max_t < t_threshold_moderate */ return true; } } ``` 如果測資不夠多,會直接回傳 `FALSE` ;如果測資夠多,接下來看有沒有 t 值超過 `t_threshold_moderate` (預設為 10) ,如果都沒超過就回傳 `TRUE` ,代表 `do_insert_tail()` 非常有可能為 $O(1)$