2020q1 Homework1 (lab0)

# 2020q1 Homework1 (lab0) contributed by < `ambersun1234` > ## 環境 ```shell $ uname -a Linux 4.15.0-74-generic #83~16.04.1-Ubuntu SMP Wed Dec 18 04:56:23 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux $ gcc --version gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609 ``` ## 作業說明 + [作業說明](https://hackmd.io/@sysprog/linux2020-lab0?fbclid=IwAR1CxFMIqxbzNRJaOyJggRwmEJeJMIGtkr1yrZ05y1qSHnU4F-O883hGikE#-%E4%BD%9C%E6%A5%AD%E8%A6%81%E6%B1%82-WIP) + [C Programming Lab](http://www.cs.cmu.edu/~213/labs/cprogramminglab.pdf) ## 作業要求 + ==詳細閱讀 [C Programming Lab](http://www.cs.cmu.edu/~213/labs/cprogramminglab.pdf)== ，依據指示著手修改 `queue.[ch]` 和連帶的檔案，測試後用 Git 管理各項修改，記得也該實作 `q_sort` 函式。 + 修改排序所用的比較函式，變更為 [natural sort](https://github.com/sourcefrog/natsort)，在 "simulation" 也該做對應的修改，得以反映出 [natural sort](https://github.com/sourcefrog/natsort) 的使用。 + 除了修改程式，也要編輯「[作業區](https://hackmd.io/@sysprog/linux2020-homework1)」，增添開發紀錄和 GitHub 連結，除了提及你如何逐步達到自動評分程式的要求外，共筆也要涵蓋以下: + 運用 Valgrind 排除 `qtest` 實作的記憶體錯誤，並透過 Massif 視覺化 "simulation" 過程中的記憶體使用量，需要設計對應的實驗 + 研讀論文 [Dude, is my code constant time?](https://eprint.iacr.org/2016/1123.pdf)，解釋本程式的 "simulation" 模式是如何透過以實驗而非理論分析，達到驗證時間複雜度，需要解釋 [Student's t-distribution](https://en.wikipedia.org/wiki/Student%27s_t-distribution) 及程式實作的原理; + 解釋 [select](http://man7.org/linux/man-pages/man2/select.2.html) 系統呼叫在本程式的使用方式，並分析 [console.c](https://github.com/sysprog21/lab0-c/blob/master/console.c) 的實作，說明其中運用 CS:APP [RIO 套件](http://csapp.cs.cmu.edu/2e/ch10-preview.pdf) 的原理和考量點。可對照參考 [CS:APP 第 10 章重點提示](https://hackmd.io/@sysprog/H1TtmVTTz) :arrow_forward: 為避免「舉燭」，請比照 `qtest` 實作，撰寫獨立的終端機命令解譯器 (可建立新的 GitHub repository) + 挑戰題 + 參照 [antirez/linenoise](https://github.com/antirez/linenoise)，強化 `qtest` 命令列功能，使其支援單行或多行編輯、歷史命令處理、命令自動補完 (例如輸入 `h` 之後按下 Tab 按鍵，自動接續補上 `elp`，成為完整的 `help` 命令)。除了整合程式碼外，應當說明 [antirez/linenoise](https://github.com/antirez/linenoise) 的運作原理，注意到 [termios](http://man7.org/linux/man-pages/man3/termios.3.html) 的運用 + 思考如何預設開啟 AddressSanitizer，卻又能滿足效能測試所需 (可能需要兩種以上的 build target) + 指出現有程式的缺陷 (提示: 和 [RIO 套件](http://csapp.cs.cmu.edu/2e/ch10-preview.pdf) 有關)，嘗試強化並提交 pull request ## 實驗紀錄 + 一開始編譯會遇到錯誤: ```shell harness.h:66:0: error: "strdup" redefined [-Werror] #define strdup test_strdup ^ ``` + 於是我簡單的看了一下，發現值得注意的是，在 `harness.h` 中有寫到 ```cpp #define malloc test_malloc #define free test_free #define strdup test_strdup ``` + 意思就是說，該程式將這些 function 改成自己的版本，這個大概也就可以解釋為什麼我嘗試要 malloc newh->value 的時候會有錯誤 + 更深入的閱讀程式碼之後我發現到，`test_malloc` 實際上是用一個更大的空間來儲存記憶體相關的東西，而且是使用 `雙向 linked list` 儲存 + 以下這個 struct 是用來儲存記憶體相關的東西 ```cpp typedef struct BELE { struct BELE *next; struct BELE *prev; size_t payload_size; size_t magic_header; /* Marker to see if block seems legitimate */ unsigned char payload[0]; /* Also place magic number at tail of every block */ } block_ele_t; ``` 在 test_malloc 裡面有一句是這麼寫的 `block_ele_t *new_block = malloc(size + sizeof(block_ele_t) + sizeof(size_t));` 可以發現到除了 block_ele_t 之外還有 size 跟 sizeof(size_t)，其中 size 就是在 malloc 時的參數，另外一個是跟所謂的 `magic number` 有相關 :::info magic number: 程式設計中將一個數字寫死，用以表達特殊意義其中像是 0xdeadbeef 這種的被稱作為 [hexspeak](https://en.wikipedia.org/wiki/Hexspeak) ::: + 在 block_ele_t 中 magic_header 用以儲存此 struct 的記憶體狀態，回到真正 malloc 的地方，最後一個 `sizeof(size_t)` 即是要儲存 magic_footer 的 ```cpp /* Value at start of every allocated block */ #define MAGICHEADER 0xdeadbeef /* Value when deallocate block */ #define MAGICFREE 0xffffffff /* Value at end of every block */ #define MAGICFOOTER 0xbeefdead ``` + `test_malloc` 的回傳值是定義成 `void *p = (void *) &new_block->payload;` 剛開始我很不明白為什麼是 payload ，而且他的大小還是0，後來有查到一個叫做 <font color=red>struct hack</font> 的東西 + cf. `struct hack` + 使用這個方法可以使 struct 達到動態記憶體分配 + 在 struct 的最後新增一個大小為0的陣列 + 然後在 malloc 的時候要求空間除了原本 struct 之外再加上額外需求空間 > [reference struct hack](http://frankchang0125.blogspot.com/2013/01/c-struct-hack-structure-with-variable.html) + 以 struct hack 的方法即可以分配記憶體給 list_ele_t + 其中還有 `static size_t allocated_count = 0;` 用以紀錄目前 allocated 的記憶體數量 ## queue.c 實做 ### `q_sort` + 我打算以 [merge_sort](https://npes87184.github.io/2015-09-12-linkedListSort/) 搭配 [natural_sort](https://github.com/sourcefrog/natsort) 作為 compare function + [natural_sort](https://github.com/sourcefrog/natsort) 的實做會使排序的結果更直覺 ```c list_ele_t *merge(list_ele_t *q1, list_ele_t *q2) { if (!q1) return q2; if (!q2) return q1; if (strcmp(q1->value, q2->value) < 0) { q1->next = merge(q1->next, q2); return q1; } else { q2->next = merge(q1, q2->next); return q2; } } list_ele_t *mergeSort(list_ele_t *q) { if (!q || !q->next) { return q; } list_ele_t *q1 = q->next, *q2 = q; while (q1 && q1->next) { q2 = q2->next; q1 = q1->next->next; } q1 = q2->next; q2->next = NULL; list_ele_t *t1 = mergeSort(q); list_ele_t *t2 = mergeSort(q1); return merge(t1, t2); } ``` + 以上的 sort 實做在 `./traces/trace-15-perf.cmd` 會錯誤，跑了一下發現，由於排序數量過於龐大會導致 stack overflow，於是把遞迴的實做改成迴圈的方式 ```shell cmd> ih dolphin 1000000 q = [dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin ... ] cmd> it gerbil 1000000 q = [dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin ... ] cmd> size 1000 Queue size = 2000000 q = [dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin dolphin ... ] cmd> reverse q = [gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil gerbil ... ] cmd> sort Segmentation fault (core dumped) ``` ```cpp list_ele_t *merge(list_ele_t *q1, list_ele_t *q2) { list_ele_t *thead = NULL, *tcur = NULL; if (strcmp(q1->value, q2->value) < 0) { thead = q1; q1 = q1->next; } else { thead = q2; q2 = q2->next; } tcur = thead; while (q1 && q2) { if (strcmp(q1->value, q2->value) < 0) { tcur->next = q1; q1 = q1->next; } else { tcur->next = q2; q2 = q2->next; } tcur = tcur->next; } if (q1) { tcur->next = q1; } if (q2) { tcur->next = q2; } return thead; } ``` + 改成迴圈之後就可以通過測試了 + 我將 natual sort 的實做引入之後並跑測試後發現它會跳 `ERROR: Not sorted in ascending order`，在 `do_sort` 函式中加入 simulation 的判斷後就沒有錯誤了(如下) ```c bool ok = true; if (q) { for (list_ele_t *e = q->head; e && --cnt; e = e->next) { /* Ensure each element in ascending order */ /* FIXME: add an option to specify sorting order */ if (simulation) { if (strnatcmp(e->value, e->next->value) > 0) { report(1, "ERROR: Not sorted in natural order"); ok = false; break; } } else { if (strcasecmp(e->value, e->next->value) > 0) { report(1, "ERROR: Not sorted in ascending order"); ok = false; break; } } } } ``` ``` cmd> new cmd> new q = [] cmd> ih rfc2086.txt cmd> ih rfc2086.txt q = [rfc2086.txt] cmd> ih rfc1.txt cmd> ih rfc1.txt q = [rfc1.txt rfc2086.txt] cmd> ih rfc822.txt cmd> ih rfc822.txt q = [rfc822.txt rfc1.txt rfc2086.txt] cmd> sort cmd> sort ERROR: Not sorted in ascending order q = [rfc1.txt rfc822.txt rfc2086.txt] cmd> option simulation 1 cmd> option simulation 1 cmd> sort cmd> sort q = [rfc1.txt rfc822.txt rfc2086.txt] cmd> ``` ### `q_new` + 需注意的是 malloc 有可能會失敗 ```c queue_t *q = malloc(sizeof(queue_t)); /* What if malloc returned NULL? */ if (q == NULL) { return NULL; } else { q->head = NULL; q->q_size = 0; q->last = q->head; return q; } ``` ### `q_free` + 需判斷 list 是否為空值以及也需要 free(current->value)，因為該 value 是經由 `malloc` 新增的 + 最後再將 `queue_t` free 掉即可 ```c void q_free(queue_t *q) { if (q != NULL) { list_ele_t *current = q->head, *next; while (current != NULL) { next = current->next; free(current->value); free(current); current = next; } free(q); } } ``` ### `q_insert_head` + 檢查 list 是否為空 + `malloc` 一個新的 element 空間 + 拷貝字串是由 [strdup](https://linux.die.net/man/3/strdup) 所進行的，該函式會自行 `malloc` 字串大小空間，並將其字串拷貝進新的記憶體空間，以下節錄自 [man page](https://linux.die.net/man/3/strdup) > The strdup() function returns a pointer to a new string which is a duplicate of the string s. Memory for the new string is obtained with malloc(3), and can be freed with free(3). > > The strndup() function is similar, but only copies at most n bytes. If s is longer than n, only n bytes are copied, and a terminating null byte ('\0') is added. + 新增元素必須擺 list 開頭 ```c bool q_insert_head(queue_t *q, char *s) { if (q == NULL) { return false; } list_ele_t *newh; /* TODO: What should you do if the q is NULL? */ newh = malloc(sizeof(list_ele_t)); /* Don't forget to allocate space for the string and copy it */ /* What if either call to malloc returns NULL? */ if (newh == NULL) { return false; } else { newh->value = strdup(s); if (newh->value == NULL) { free(newh); return false; } newh->next = q->head; // newh->next will definitely be head if (q->head == NULL) { // queue empty q->last = newh; } q->head = newh; q->q_size++; return true; } } ``` ### `q_insert_tail` + 基本同 [q_insert_head](#q_insert_head)，只是新增元素是加在 list 尾端 ```c bool q_insert_tail(queue_t *q, char *s) { if (q == NULL) { return false; } list_ele_t *newh; newh = malloc(sizeof(list_ele_t)); if (newh == NULL) { return false; } else { newh->value = strdup(s); if (newh->value == NULL) { free(newh); return false; } newh->next = NULL; if (q->last == NULL) { q->head = q->last = newh; } else { q->last->next = newh; q->last = newh; } q->q_size++; return true; } } ``` ### `q_remove_head` + 將被刪除元素之字串複製至 sp 並加上 `null-terminated string('\0')`，需使用 [strncpy](https://linux.die.net/man/3/strncpy)，避免 [buffer overflow](http://www.cis.syr.edu/~wedu/seed/Book/book_sample_buffer.pdf) + 同時也要更新 list 的開頭、結尾以及大小 + 釋放記憶體需要兩次，一個是：元素 value(由 [strdup](https://linux.die.net/man/3/strdup) `malloc`)，另一個是元素本身 ```c bool q_remove_head(queue_t *q, char *sp, size_t bufsize) { /* You need to fix up this code. */ if (q == NULL || q->head == NULL) { return false; } else { if (sp != NULL) { strncpy(sp, q->head->value, bufsize - 1); sp[bufsize - 1] = '\0'; } list_ele_t *temp = q->head; q->head = q->head->next; q->q_size--; free(temp->value); free(temp); return true; } } ``` ### `q_size` + 透過增加 `queue_t` 的 member 即可完成 $O(1)$ 的取值 ```c int q_size(queue_t *q) { return q == NULL ? 0 : q->q_size; } ``` ### `q_reverse` + 檢查 list 是否為空 + 簡單的進行前後對調即可完成 list 反轉 ```c void q_reverse(queue_t *q) { /* You need to write the code for this function */ if (q == NULL) { return; } list_ele_t *previous = NULL, *current = q->head, *next = q->head, *temp; temp = q->head; q->head = q->last; q->last = temp; while (next != NULL) { // store next = current->next; // reverse current->next = previous; // move on previous = current; current = next; } } ``` ## Valgrind + 執行 `make valgrind` 並未發現任何相關問題 + ### Massif 記憶體使用量 + 為了實驗需要，我將 qtest.c 做了一點修改；將 rh 改寫為可重複執行N次 + 參照了[AndybnACT](https://hackmd.io/@AndybnA/lab0-c?fbclid=IwAR2tbsPQQDULUYRga2ppCJeS0Q56LkJR7BwPcIq-V-W3-ZQC979TFYHmkhY#Massif) 的方法，我做了如下的實驗 + `$ valgrind --tool=massif --stacks=yes --time-unit=i ./qtest` <hr> 1. 寫入 50 筆隨機資料，進行排序以及反轉，最後再釋放 + ![](https://i.imgur.com/K9NHTuq.png) 2. 寫入 1000 筆隨機資料，進行排序以及反轉，最後再釋放 + ![](https://i.imgur.com/6wSoq9E.png) + 在 50 筆的實驗中可以明顯看到在新增完資料後是有一段稍微平的曲線，而可以證明排序以及反轉是沒有使用太多記憶體 + 相反的在 1000 筆的測試中基本上就有點難看出排序以及反轉的記憶體使用量，較為明顯的只有新增以及刪除而已 ## select 系統呼叫 + 根據 [man select](http://man7.org/linux/man-pages/man2/select.2.html) 的結果可以簡略的得知它是 `synchronous I/O multiplexing`，並且定義於 `POSIX.1-2000, POSIX.1-2008` > select() and pselect() allow a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become "ready" for some class of I/O operation (e.g., input possible). A file descriptor is considered ready if it is possible to perform a corresponding I/O operation (e.g., read(2), or a sufficiently small write(2)) without blocking. + 簡言之就是 `select` & `pselect` 允許程式監聽多個檔案描述符等待到其狀態變為 ready + 其中有三項操作是有些許不同的 + `select` 使用 `struct timeval` 作為 timeout 的依據 ```c <sys/time.h> struct timeval { long tv_sec; /* seconds */ long tv_usec; /* microseconds */ }; ``` + `select` 有可能會更新參數 `timeout` + `select` 並沒有所謂 `sigmask` + 在操作 sets 上面，他有提供4個基本的 macro 操作 + `FD_ZERO()`: 清除 set + `FD_SET()`: 新增 fd 至 set + `FD_CLR()`: 從 set 中移除指定 fd + `FD_ISSET()`: 查詢 fd 是否存在於 set 中 + 在程式碼中簡單搜尋，[lab0-c/console.c](https://github.com/ambersun1234/lab0-c/blob/master/console.c#L557) 中即找尋到 `select` 系統呼叫的蹤影 ```c /* * Handle command processing in program that uses select as main control loop. * Like select, but checks whether command input either present in internal * buffer * or readable from command input. If so, that command is executed. * Same return as select. Command input file removed from readfds * * nfds should be set to the maximum file descriptor for network sockets. * If nfds == 0, this indicates that there is no pending network activity */ int cmd_select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout) { char *cmdline; int infd; fd_set local_readset; while (!block_flag && read_ready()) { cmdline = readline(); interpret_cmd(cmdline); prompt_flag = true; } if (cmd_done()) return 0; if (!block_flag) { /* Process any commands in input buffer */ if (!readfds) readfds = &local_readset; /* Add input fd to readset for select */ infd = buf_stack->fd; FD_SET(infd, readfds); if (infd == STDIN_FILENO && prompt_flag) { printf("%s", prompt); fflush(stdout); prompt_flag = true; } if (infd >= nfds) nfds = infd + 1; } if (nfds == 0) return 0; int result = select(nfds, readfds, writefds, exceptfds, timeout); if (result <= 0) return result; infd = buf_stack->fd; if (readfds && FD_ISSET(infd, readfds)) { /* Commandline input available */ FD_CLR(infd, readfds); result--; cmdline = readline(); if (cmdline) interpret_cmd(cmdline); } return result; } ``` + `readline` 定義於 [lab0-c/console.c](https://github.com/ambersun1234/lab0-c/blob/master/console.c#L477) ```c /* Read command from input file. * When hit EOF, close that file and return NULL */ static char *readline() { int cnt; char c; char *lptr = linebuf; if (!buf_stack) return NULL; for (cnt = 0; cnt < RIO_BUFSIZE - 2; cnt++) { if (buf_stack->cnt <= 0) { /* Need to read from input file */ buf_stack->cnt = read(buf_stack->fd, buf_stack->buf, RIO_BUFSIZE); buf_stack->bufptr = buf_stack->buf; if (buf_stack->cnt <= 0) { /* Encountered EOF */ pop_file(); if (cnt > 0) { /* Last line of file did not terminate with newline. */ /* Terminate line & return it */ *lptr++ = '\n'; *lptr++ = '\0'; if (echo) { report_noreturn(1, prompt); report_noreturn(1, linebuf); } return linebuf; } return NULL; } } /* Have text in buffer */ c = *buf_stack->bufptr++; *lptr++ = c; buf_stack->cnt--; if (c == '\n') break; } if (c != '\n') { /* Hit buffer limit. Artificially terminate line */ *lptr++ = '\n'; } *lptr++ = '\0'; if (echo) { report_noreturn(1, prompt); report_noreturn(1, linebuf); } return linebuf; } ``` + 其中幾個比較重要的是 + ==STDIN_FILENO== + 定義於 `<unistd.h>` + `STDIN_FILENO` 是 `file discriptor` 型態是 `int` + c.f. `stdin` 的型態是 `FILE *` + `STDIN_FILENO` 屬於 `unbuffered i/o` + `stdin` 屬於 `buffered i/o` > [STDIN_FILENO與stdin的區別與作用](https://www.twblogs.net/a/5b862a8f2b71775d1cd4bc69) > [对stdin，stdout 和STDOUT_FILENO，STDIN_FILENO的学习](https://blog.csdn.net/u012349696/article/details/51125106) > [Chapter 5. Standard I/O Library](https://notes.shichao.io/apue/ch5/) + ==read 系統呼叫== > read() attempts to read up to count bytes from file descriptor fd into the buffer starting at buf. > On files that support seeking, the read operation commences at the current file offset, and the file offset is incremented by the number of bytes read. If the current file offset is at or past the end of file, no bytes are read, and read() returns zero. > > If count is zero, read() may detect the errors described below. In the absence of any errors, or if read() does not check for errors, a read() with a count of 0 returns zero and has no other effects. + [man read](https://linux.die.net/man/2/read) 從 file descriptor 讀取特定大小的資料至 buffer 中 + 回到最初的，select 的用處是 `允許程式監聽多個檔案描述符等待到其狀態變為 ready`，我上網找了一下發現 file 這種東西基本上都是處於 ready 狀態的。可參考 [when a file descriptor is ready to read?](https://stackoverflow.com/questions/54100744/when-a-file-descriptor-is-ready-to-read) 以及 [man page](http://man7.org/linux/man-pages/man2/select.2.html)(以下節錄至 man page) > A file descriptor is considered ready if it is possible to perform a corresponding I/O operation (e.g., read(2), or a sufficiently small write(2)) without blocking. + 所以 `select` 的用意即是等待檔案狀態是 ready 的 + ### 運用 [CS:APP RIO](https://scs.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=f107c2ce-79d5-4529-baeb-2bb495d8c11a) 的原理以及考量點 + 在 UNIX 的世界觀，所有東西都是 **檔案** + 針對檔案，unix 有提供基本的系統呼叫，如: `read`、`write` + 而系統呼叫是有成本的; 課程中有提到，如果我將 `read` & `write` 設定成1個 byte，以一個小程式來說，這樣的成本是不會太高 + 但如果說，我將上述設定搭配網路通訊程式，這樣高成本的呼叫會給系統帶來不小的負擔 + **buffered I/O** 就是為了解決以上問題，先將資料暫存在一個特定的區域，等到完全讀完之後在進行 `read`、`write` 系統呼叫，會大幅度的降低系統呼叫的成本 + 在一般的檔案來說，上述 system call 是沒有什麼問題的，但考量到 unix 作業系統將所有東西視為是檔案，連同資料夾(directory)也是一種特殊的檔案、device 以及 socket 通通都是檔案 + 但是在 socket 這種檔案，`read` 以及 `write` 就會有一點難用了 + 在 [CS:APP RIO](https://scs.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=f107c2ce-79d5-4529-baeb-2bb495d8c11a) 裡面提到，`read` & `write` 保證可以讀取或寫入至少1個 byte，但是這個在網路傳輸方面就有疑慮了。如果我要直接使用作業系統提供的系統呼叫進行網路實做，這樣就會不方便，而且還要確保資料傳遞是沒問題的 + 於是為了要確保資料傳遞的方便性以及撰寫行考量，`RIO(Robust I/O)` 即是相對應的解法 <hr> + 在 [console.c](https://github.com/ambersun1234/lab0-c/blob/master/console.c#L478) 中，有使用到 RIO 的影子，readline 的實做就是一個 RIO 的概念，透過寫入 buffer 直到遇到 `\n`，並且在之後透過讀取 buffer 進行後續操作 :::info 你可以「設計實驗」來理解程式，而非總是用「查詢」這樣被動的學習方式。 :notes: jserv ::: ## 研讀論文 [Dude, is my code constant time?](https://eprint.iacr.org/2016/1123.pdf) + 該論文實做的方法如下 + approach: timing leakage detection 1. class definition: 2 diff input class 2. cycle counters: can be used to accurately acquire execution time measurements 3. environmental condition, to minimize effect of environment, class are randomly choosen + apply post-processing 1. cropping: crop/discard the measurement due to os interupt 2. high-order processing: it may be beneficial to apply high-order processing + apply statistical test 1. try to disprove null hypothesis 2. t-test 3. non-parametrix test + ### 解釋本程式的 "simulation" 模式是如何透過以實驗而非理論分析，達到驗證時間複雜度 + 實驗複雜度的程式是定義在 [dudect/fixture.c](https://github.com/ambersun1234/lab0-c/blob/master/dudect/fixture.c#L120) 裡面 ```=c static bool doit(int mode) { int64_t *before_ticks = calloc(number_measurements + 1, sizeof(int64_t)); int64_t *after_ticks = calloc(number_measurements + 1, sizeof(int64_t)); int64_t *exec_times = calloc(number_measurements, sizeof(int64_t)); uint8_t *classes = calloc(number_measurements, sizeof(uint8_t)); uint8_t *input_data = calloc(number_measurements * chunk_size, sizeof(uint8_t)); if (!before_ticks || !after_ticks || !exec_times || !classes || !input_data) { die(); } prepare_inputs(input_data, classes); measure(before_ticks, after_ticks, input_data, mode); differentiate(exec_times, before_ticks, after_ticks); update_statistics(exec_times, classes); bool ret = report(); free(before_ticks); free(after_ticks); free(exec_times); free(classes); free(input_data); return ret; } ``` + 其中 measure 的方法相對當純，在 insert_tail 的前後分別取得當前 cpu cycle，最後在計算執行前後的 cpu cycle 是否有差異，即可辨別此方法是否屬於 constant time ```c void measure(int64_t *before_ticks, int64_t *after_ticks, uint8_t *input_data, int mode) { assert(mode == test_insert_tail || mode == test_size); if (mode == test_insert_tail) { for (size_t i = drop_size; i < number_measurements - drop_size; i++) { char *s = get_random_string(); dut_new(); dut_insert_head( get_random_string(), *(uint16_t *) (input_data + i * chunk_size) % 10000); before_ticks[i] = cpucycles(); dut_insert_tail(s, 1); after_ticks[i] = cpucycles(); dut_free(); } } else { for (size_t i = drop_size; i < number_measurements - drop_size; i++) { dut_new(); dut_insert_head( get_random_string(), *(uint16_t *) (input_data + i * chunk_size) % 10000); before_ticks[i] = cpucycles(); dut_size(1); after_ticks[i] = cpucycles(); dut_free(); } } } ``` + ### [Student's t-distribution](https://en.wikipedia.org/wiki/Student%27s_t-distribution) 及程式實作的原理 + 實做的部份是在 [dudect/ttest.c](https://github.com/ambersun1234/lab0-c/blob/master/dudect/ttest.c) ###### tags: `linux2020`