2018q3 Homework2(lab0)

# 2018q3 Homework2(lab0) contributed by <`posutsai`> > 實驗環境 > Ubuntu 16.04.5 LTS > Architecture: x86_64 > CPU op-mode(s): 32-bit, 64-bit > Byte Order: Little Endian > CPU(s): 8 > On-line CPU(s) list: 0-7 > Thread(s) per core: 2 > Core(s) per socket: 4 > Socket(s): 1 > NUMA node(s): 1 > Vendor ID: GenuineIntel > CPU family: 6 > Model: 60 > Model name: Intel(R) Core(TM) i7-4770 CPU @ > 3.40GHz > Stepping: 3 > CPU MHz: 3454.851 > CPU max MHz: 3900.0000 > CPU min MHz: 800.0000 > L1d cache: 32K > L1i cache: 32K > L2 cache: 256K > L3 cache: 8192K ## [queue.h](https://github.com/posutsai/lab0-c/blob/master/queue.h) 資料結構的設計 ```C= /* Queue structure */ typedef struct { list_ele_t *head; list_ele_t *tail; int size; } queue_t; ``` * tail pointer 在實作 [q_insert_tail](https://github.com/posutsai/lab0-c/blob/master/queue.c#L96) function 發現，因為必須要在 O(1) 的時間複雜度完成勢必得要有一個 pointer 指向 list 的尾端，而不是每一次都從 head 一路走到底。當 list 內只有單一節點時，會讓 head 和 tail 都指向這唯一節點。 * size 有了這樣的一個 member 可以讓 [q_size](https://github.com/posutsai/lab0-c/blob/master/queue.c#L150) 在常數時間內回傳 list 內的 node 總數，而不是每次都計算一次。在實作 queue.c 當需要存取節點總數時，必須以 `q->size` 的型式，而不是呼叫 `q_size()` 函式，原因是在實作 `q_size` 函式時，若 q 是 NULL pointer 依然回傳 0 ，實作時因為這點困擾很久。不懂的點是為何要設定 `q_size` 是這樣的輸出型式，以物件導向的觀點， size 這一個 member 應該是 private 必須透過 getter 去存取，如果是我設計的話，可能會讓 q 是 NULL pointer 的情況回傳 -1。 ## queue.c 內部函數實作 * Overview 這次關於 linked list 的實作，並不是我第一次寫這系列的操作，不過從題目上的設計就可以了解到，我以前忽略的很多細節，甚至都已經假設輸入指標必定是有效的，沒做任何的 assertion 甚至考量到程式因此崩潰的狀況，我想這就是為什麼一個好的測試，對於工業級的產品來說那麼重要了。除此之外，這份作業內的 `q_test` 這支程式也讓我深深了解為何之前提到過的 [google-test](https://github.com/google/googletest) 和 [google-mock](https://github.com/google/googlemock) 會應運而生了 * [queue_new](https://github.com/posutsai/lab0-c/blob/master/queue.c#L25) 如我在上節所述，我在實作這個 `q_new` 時才意識到 `malloc` 可能會回傳 NULL pointer，從 `malloc` 的 [man page](http://man7.org/linux/man-pages/man3/malloc.3.html) 可以看到，在此摘錄幾點使用 `malloc` 來做 dynamic allocation 需要注意的事項: - 經由 `malloc` 而來的記憶體並未初始化，要注意 linux kernel 內的 over commit 機制， kernel 允許 process 能申請使用的記憶體空間不等於實際可以拿來分配使用的空間量，而當多個 process 同時使用太多記憶體空間時會導致 kernel OOM killer 把 process 結束掉。至於在這次作業內怎麼解決可能會有 overcommit 的問題，因為必須使用到那段記憶體，作業系統才會決定是否允許使用，在 [harness.c test_malloc](https://github.com/posutsai/lab0-c/blob/master/harness.c#L140) 函數內，在一開始就會先以 memset 初始化一個常數。 - 當 size 參數為 0 時，不是回傳一個 NULL pointer 就是一個可以被丟進 `free` 的指標。以往在做 free 時很怕會釋放到 NULL pointer 所以會多做一個 if 做判斷，在看了 man page 之後才知道這是沒必要的動作，但同一記憶體位置並不能被二次釋放。 - 當 malloc 過程出現錯誤時會回傳 NULL。 - 關於 malloc 這樣 memory allocation 的實作，我在 [tensorflow](https://github.com/tensorflow/tensorflow/blob/c023f46956f8a867d0dc77f1ee742564a3622e68/tensorflow/core/platform/windows/port.cc#L73) 內部透過 macro 去開啟不同的實作方式，他採用的是 [jemalloc](https://github.com/jemalloc/jemalloc) 與之相對應的還有 google 開發的 tcmalloc ，以及預設採用 ptmalloc ，三者最主要的差異在於對 garbage collection 的處理不同導致相對於 ptmalloc 另外兩種模型的碎片化問題不會那麼嚴重。而底層跟作業系統申請 memory 則是使用 [mmap](http://man7.org/linux/man-pages/man2/mmap.2.html)。 * [q_insert_head](https://github.com/posutsai/lab0-c/blob/master/queue.c#L63) 在實作上，撇除 q 本身是 NULL 以及 malloc 失敗回傳 NULL 的狀況下，可以把 q_insert_head 分為兩種狀況: - 若 list 內沒有任何節點: - 把 s 以 strdup 複製一份 assign 進新節點 - head 和 tail 指標都要指向插入之唯一節點 - 把單一節點的 next 指標指向 NULL - 若 list 內已存在節點: - 把新節點的 next 指向原先的 `q->head` - 把 `q->head` 指向新節點 - 把 s 以 strdup 複製一份 assign 進新節點 ```C= bool q_insert_head(queue_t *q, char *s) { if (q == NULL) return false; list_ele_t *newh; newh = malloc(sizeof(list_ele_t)); if (newh == NULL) { return false; } // 在為新節點配置記憶體時，採用的是 strdup ， // 所以在 remove 時記得要 free newh->value = strdup(s); if (q->head == NULL) { q->head = newh; q->tail = newh; newh->next = NULL; } else { newh->next = q->head; q->head = newh; } q->size++; return true; } ``` * [q_insert_tail](https://github.com/posutsai/lab0-c/blob/master/queue.c#L96) 與 q_insert_head 類似一樣以 strdup 為新節點配置記憶體及分為兩種狀況考慮，不過註解內有要求實作上必須要在常數的時間複雜度內完成，這也就是為什麼想要在 `queue_t` 內多一個 member 的原因: - list 內部目前沒有任何節點: - 把所要求的字串 assign 進新節點 - head 和 tail 都指向新節點 - 新節點的 next 設為 NULL - list 已存在節點: - 把 s 以 strdup 複製一份 assign 進新節點 - 把原先 tail 指向的節點之 next 指向新節點 - 更新 tail 指向新節點 ```C= bool q_insert_tail(queue_t *q, char *s) { if (q == NULL) return false; list_ele_t *newt = malloc(sizeof(list_ele_t)); if (newt == NULL) { return false; } newt->next = NULL; newt->value = strdup(s); if (q->tail == NULL) { q->head = newt; q->tail = newt; } else { q->tail->next = newt; q->tail = newt; } q->size++; return true; } ``` * [q_free](https://github.com/posutsai/lab0-c/blob/master/queue.c#L38) 若 q 為有效指標則分為兩種狀況處理: - list 內沒有任何節點，則針對 q 這個指標做 free - 若 list 內還有節點，則遍歷整個 list ， free 各個節點，但在 free 的同時也會失去跟下一個節點的連結，所以要先用 tmp 存起來。 ```C= void q_free(queue_t *q) { if (q != NULL) { if (q->size == 0) { free(q); return; } for (list_ele_t *cur = q->head; cur != NULL;) { list_ele_t *tmp = cur->next; free(cur); cur = tmp; } /* Free queue structure */ free(q); } ``` * [q_remove_head](https://github.com/posutsai/lab0-c/blob/master/queue.c#L127) 移除 head 的條件是 q 必須為有效指標且 size > 0，在實作移除節點時，必須注意移除節點後可能會分為兩個狀況，沒有節點或是只剩一個或是多個節點: - 先分析有剩餘節點的狀況 - 先用一個 tmp pointer 儲存原始 head 作為後來的 free 以及字串處理使用 - 更新 q->head 並調整 size - 題目要求要複製 bufsize - 1 長度的字串到 sp ，並在字串最後補上節數字元 '\0'，我這裡使用的是 memcpy 而非 strdup，[memcpy](http://www.cplusplus.com/reference/cstring/memcpy/) 的第三個參數，表示的是要拷貝的長度，所以我們必須先計算這樣的長度代表幾個 char 並在適當位置補上結束字元。 - 最後在針對 tmp 做釋放。 - 若是在移除 head 之後，list 內沒有任何節點 - 重複上述所有步驟， `q->head = q->head->next` 並不會受到影響因為會被更新為 NULL。 - 由於現在 list 內沒有任一結點，所以我們也要把 tail 設為 NULL ，在這個地方卡了很久。 ```C= bool q_remove_head(queue_t *q, char *sp, size_t bufsize) { if (q == NULL || q->size == 0) return false; list_ele_t *tmp = q->head; q->head = q->head->next; int copy_len = bufsize / sizeof(char) - 1; if (sp != NULL) { memcpy(sp, tmp->value, copy_len); sp[copy_len] = '\0'; } free(tmp); q->size--; if (q->size == 0) q->tail = NULL; return true; } ``` * [q_reverse](https://github.com/posutsai/lab0-c/blob/master/queue.c#L167) ## 測試環境的分析 * q_reverse 內對 malloc 的偵測機制一直對於 q_reverse 內如何偵測實作細節是否有用到 malloc 感到好奇，從 [q_test.c](https://github.com/posutsai/lab0-c/blob/master/qtest.c) 可以看到 [do_reverse](https://github.com/posutsai/lab0-c/blob/master/qtest.c#L359) 函數內呼叫了兩次 [set_noallocate_mode](https://github.com/posutsai/lab0-c/blob/master/harness.c#L211) 去調整是否可以呼叫 malloc 的狀態，重點在於使用 macro 把 malloc 轉為 [test_malloc](https://github.com/posutsai/lab0-c/blob/master/harness.h#L58)，所以我們看似使用預設的 malloc 但其實是呼叫 test_malloc 。我原先以為他是想辦法在我們呼叫 glibc malloc 內寫一層類似 middle ware 的結構，若只是以 preprocess 的方式做偵測的話，可以用 mmap, brk 和 sbrk 這類的系統呼叫替代而不被偵測出來，事實上在 glibc 內也是以演算法去決定使用哪個系統呼叫，比較好的做法是使用 GNU 提供的 [malloc hook](http://www.gnu.org/software/libc/manual/html_node/Hooks-for-Malloc.html#Hooks-for-Malloc) 在 do_reverse 先註冊會讓他跳出一個 exception，這樣的做法可以更有效的偵測是否使用 malloc。 jemalloc 和 tcmalloc 也都是以這樣的手法在不影響原始程式碼的情形下改變 memory management model，基於上述理由，我試著以 glibc hook 實作 malloc detection。 ```C= // modify harness.[c|h] to add following code #include <malloc.h> // use two function pointer to store original malloc and free hook void *(original_malloc_hook) (size_t size, const void *caller); void (original_free_hook) (void *ptr, const void *caller); void hook_init() { original_malloc_hook = __malloc_hook; original_free_hook = __free_hook; // test_malloc prototype 改為 void *test_malloc (size_t, const void *) __malloc_hook = test_malloc; // 同理，把 test_free 改為 void test_free(void *, const void *) __free_hook = test_free; } void *test_malloc(size_t size, const void *caller) { __malloc_hook = original_malloc_hook; __free_hook = original_free_hook; .... // test_malloc 內原始程式碼 __malloc_hook = test_malloc; __free_hook = test_free; } void test_free(void *ptr, const void *caller) { __malloc_hook = original_malloc_hook; __free_hook = original_free_hook; .... // test_free 內原始程式碼 __malloc_hook = test_malloc; __free_hook = test_free; } // ----------------------------------------------------- // qtest.c int main() { // command parser ... hook_init(); queue_init(); .... } ``` 以 make test 進行編譯時，發現會 free 產生問題，進一步發現在 test_free 裡大量使用 [MAGICHEADER](https://github.com/posutsai/lab0-c/blob/master/harness.c#L19) 、 [MAGICFOOTER](https://github.com/posutsai/lab0-c/blob/master/harness.c#L23) 和 [MAGICFREE](https://github.com/posutsai/lab0-c/blob/master/harness.c#L21) 這三個 macro ，似乎在原版的 test_malloc 和 test_free 內並不只是單純的以 preprocessor 替換，猜測可能是為了追蹤尚有多少空間位被釋放，採取間接的方式釋放記憶體。 harness.c 內有自己實作一個 doubly linked list 去把每一個 allocated chunk 彼此相連，並以 [block_ele_t](https://github.com/posutsai/lab0-c/blob/master/harness.c#L42) 這樣的資料結構儲存相關的 metadata ，在 test_malloc 內可以[看到](https://github.com/posutsai/lab0-c/blob/master/harness.c#L138)在每個 block 的尾端放入 MAGICFOOTER 的值，目的在於辨認當想要 free 時，是否位於[合法空間](https://github.com/posutsai/lab0-c/blob/master/harness.c#L163)，並以 [find_header](https://github.com/posutsai/lab0-c/blob/master/harness.c#L77) 找出這個 block 的起始指標，並把 header 到 footer 之間的記憶體通通 assign 1，最後在把整個 block 釋放掉，在整個對 doubly linked list 的新增和刪除結點上，對於一些步驟並不了解存在的意義: * 不懂為何需要 header 和 footer 都 assign MAGICFREE，且 pay_load 內通通設為 0x55，在 test_malloc 內可以理解，但在 test_free 就不懂為何要這樣做了。 * 從程式碼可以看出 MAGICHEAD 和 MAGICFOOTER 是作為辨認合法性的用途，但這個做法應該很難保證記憶體內容不會是我們設定的值 0xdeadbeef 或是 0xbeefdead，感覺是個不太好的作法。 * qtest 內類似 error handling 的機制