2017q1 Homework1 (phonebook)

# 2017q1 Homework1 (phonebook) contributed by <`petermouse`> ### Reviewed by <`vtim9907`> - 用 BST 優化是很好的嘗試，不過可以嘗試針對 cache line 的特性來實做 BST，可以先參考 [ cache 原理和實驗 ](https://hackmd.io/s/S14L26-_l)了解 cache line 特性。 - hash function 的部份，嘗試測試搜尋多筆資料很好，但是就如你說的，測出來的時間包含到了其他東西，所以不夠準確，可以考慮修改更精確。 - 可以嘗試比較多種 hash function 之間的效能差異。 - 可以嘗試實做新增/刪除資料的功能，以貼近現實需求，更符合實際情況。 - 尚未回答本作業要求回答的問題。 ### Review by <`henry0929016816`> * 是擅長發現問題的朋友呢！發現到 pHead 沒有初始化 lastName 有可能會出錯 * 如果 perf top 無法觀看到 append 跟 findName 的差異可以嘗試使用 [gprof](https://sourceware.org/binutils/docs/gprof/) 觀察函式執行效能的差異 * 使用 shuf 產生了隨機的值去檢測 bst 跟 djb2 findName 是個不錯的方式，但是這只能看到執行時間的差異，建議可以將建表資料的分佈方式用點陣圖表示，讓大家知道 bst 跟 djb2 資料分佈的差異導致搜尋速度的落差。 ### Review by <`tina0405`> * 在hashtable上可以多嘗試各種不同的方法。 * binarytree上我覺得能直接看出findname()效能的進步，對應時間複雜度，如果能解釋數據更好。 ## 開發環境 * OS: Ubuntu 16.04 LTS * Memory: 8 GB * CPU: * Name: * Intel(R) Core(TM) i5-4200H CPU @ 2.80GHz * Cache: * L1d Cache： 32K * L1i Cache： 32K * L2 Cache： 256K * L3 Cache： 3072K L1 Cache 資訊 ( `$ sudo dmidecode` 查詢硬體資訊 ) ``` Handle 0x000A, DMI type 7, 19 bytes Cache Information Socket Designation: CPU Internal L1 Configuration: Enabled, Not Socketed, Level 1 Operational Mode: Write Back Location: Internal Installed Size: 128 kB Maximum Size: 128 kB Supported SRAM Types: Unknown Installed SRAM Type: Unknown Speed: Unknown Error Correction Type: Single-bit ECC System Type: Other Associativity: 8-way Set-associative ``` 128 kB 是因為 2 ( cores ) * 2 ( D-Cache + I-Cache ) * 32 kB 得來 ## phonebook ### 程式碼觀察仔細觀察 `main.c`，`pHead` 是 linked list 的 header， `pHead->pNext` 才會是第一筆資料的位置，不過在程式碼中 ```clike= /* in main.c */ e = pHead; /* some statements */ findName(input, e); ``` ```clike= /* in phonebook_orig.c */ entry *findName(char lastName[], entry *pHead) { while (pHead != NULL) { if (strcasecmp(lastName, pHead->lastName) == 0) return pHead; pHead = pHead->pNext; } return NULL; } ``` 發現到 `main.c` 執行到 `findName()` 這個函式時，竟然先比較`pHead->lastName` 才接著比較 `pHead->pNext->lastName`，可是 `pHead` 並沒有初始化 `lastName` 這個 member，不過也不會造成錯誤，實際將 `pHead->lastName` printf 出來，會發現沒有輸出任何文字。接著我在[這篇文章](http://stackoverflow.com/questions/8029584/why-does-malloc-initialize-the-values-to-0-in-gcc)看到了可能的解釋： > When you call malloc(), one of two things will happen: > 1. It recycles memory that was previous allocated and freed from the same process. > 2. It requests new page(s) from the operating system. 特別是第二點 > Memory coming from the OS will be zeroed for security reasons > When the OS gives you memory, it could have been freed from a different process. So that memory could contain sensitive information such as a password. So to prevent you reading such data, the OS will zero it before it gives it to you 也就是說，在執行 `malloc()` 且由作業系統給一個新的分頁時，作業系統為了考量到安全問題，會先把分頁給清空，所以當我們存取 `pHead->lastName` 時，馬上就遇到了`\0`，所以實際上 `pHead->lastName` 就是個空字串這篇文章也另外提到了: > I note that the C standard says nothing about this. This is strictly an OS behavior. So this zeroing may or may not be present on systems where security is not a concern. 所以是剛好作業系統幫助了我們初始化，可能換了作業系統就不適用，不該仰賴作業系統幫助我們做初始化0的動作，或是改用 `calloc()` ### 原始版本先 `make cache-test` 測試看看 cache miss rate ``` Performance counter stats for './phonebook_orig' (100 runs): 1,026,076 cache-misses # 85.825 % of all cache refs ( +- 0.30% ) 1,195,548 cache-references ( +- 0.18% ) 262,104,722 instructions # 1.39 insn per cycle ( +- 0.02% ) 188,310,587 cycles ( +- 0.27% ) 0.056928599 seconds time elapsed ( +- 0.54% ) ``` 執行 `calculate` 並查看 `output.txt` 平均 100 次的執行時間 (還沒有 opt 的版本所以 opt 時間為 orig 的時間) ``` append() 0.037610 0.037610 findName() 0.005019 0.005019 ``` 參考[示範共筆](https://hackmd.io/s/BJRMImUPg#)，也試著用 `perf record` 紀錄熱點，結果得不到有用的資訊，推測是程式執行太快，樣本數不夠，或者是我搞錯什麼了試著調整 `/proc/sys/kernel/perf_event_max_sample_rate` 內的數值，結果也失敗了 ( Fsync 命令執行失敗 ) ### 改良版本(1): 改變資料結構原始版本 cache miss 極高，是因為在當 cache miss 發生時，移入 block 的資料量極少( 1筆，struct size (136 Byte) 遠大於 block size (64 Byte)，需 3 個 block 存放 struct )，cache 中存放整個 structure 但大部份的 member 未使用因此將其他 member 以另一個 structure 表示， `entry` 僅儲存該 struct pointer，必要時再分配記憶體儲存其他資訊 ```clike= typedef struct __PHONE_BOOK_DATA { char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; } data; typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; struct __PHONE_BOOK_ENTRY *pNext; data *info; } entry; ``` 改進過後的 cache miss rate : 從原本的 85.825 % 降為 ==43.462 %== ``` Performance counter stats for './phonebook_opt' (100 runs): 122,800 cache-misses # 43.462 % of all cache refs ( +- 0.49% ) 282,548 cache-references ( +- 0.60% ) 244,622,819 instructions # 1.94 insn per cycle ( +- 0.02% ) 126,199,311 cycles ( +- 0.50% ) 0.038089531 seconds time elapsed ( +- 0.53% ) ``` 產生的長條圖 ![perfomance comparision: original against optimized(reduze struct size)](https://i.imgur.com/WeZQKQw.png) ### 改良版本(2): 使用 binary search tree 參考 [Sorted Linked List to Balanced BST](http://www.geeksforgeeks.org/sorted-linked-list-to-balanced-bst/) ，其建樹方法就像 tree traversal 中以 LVR 走訪 (反過來說， LVR 的走訪可以得到原本已排序的 linked list )，利用遞迴從原本的 linked list 建立 binary search tree 建立 binary searh tree 需要給 head 與總數為 input，最後回傳 root 節點 ```clike= /* binary search tree implementation */ node *buildBST(entry **head, int n) { if (n <= 0) return NULL; node *left = buildBST(head, n/2); node *root = (node *) malloc(sizeof(node)); root->pbEntry = *head; root->left = left; *head = (*head)->pNext; root->right = buildBST(head, n - n/2 - 1); return root; } ``` 以 binary search tree 搜尋 entry ```clike= entry *findNameByBST(char lastName[], node *root) { if (root == NULL) return NULL; int result; if ( (result = strcasecmp(lastName, root->pbEntry->lastName)) == 0 ) return root->pbEntry; else if (result < 0) return findNameByBST(lastName, root->left); else return findNameByBST(lastName, root->right); return NULL; } ``` 與原始版本再做一次比較： ![perfomance comparision: original against optimized(reduze struct size + BST)](https://i.imgur.com/RYlcx1Y.png) 可以發現 `append()` 時間增加，因為除了建立 linked list，還額外建立了 binary search tree ，增加了一些時間與空間的成本，但比起建立 linked list 的時間來得少，`findName()` 時間極快，時間複雜度從原本的 O(n) 變成 O(lg n) ### 改良版本(3): 使用 hash function hash function 可以讓我們資料分群，找尋資料只要計算完 hash value 從特定的群體開始找尋，節省 linked list 從頭找起的麻煩不過有好多個 hash functions 啊！這裡嘗試使用 djb2 看看 ```clike= unsigned long djb2_hash(char *str) { unsigned long hash = 5381; int c; while ((c = *str++)) hash = ((hash << 5) + hash) + c; return hash % HASH_TABLE_SIZE; } ``` 為了建立 hash table (內有多個 linked list) 時不要每次都從找尋 linked list 尾端才加入資料，改在linked list 最前面加入，並指定為 head ```clike= entry *insertFront(char lastName[],entry* e) { entry *pHead = (entry *) malloc(sizeof(entry)); strcpy(e->lastName, lastName); pHead->pNext = e; return pHead; } ``` 測試時使用的 hash table 大小為 50021，再與其他方式一起比較 ![performance comparision:original,reduced size,reduced size + BST, and reduced size + hash(djb2)](https://i.imgur.com/Xxh5zC6.png) 結果 djb2 `append()` 的時間又花更久了，因為多計算了 hash value，但是 `findName()` 與 BST 有相近的表現 ### 改變測資原本都是以 `zyxel` 來測試 `append()` 與 `findName()` 的效能，但是僅僅在接近35萬筆的資料中尋找一筆顯然不符合真實情況，也不太公平可以用方便快速的工具 `shuf` 來產生隨機的測資，像是 `$ shuf -n 5000 dictionary/words.txt > random.txt` 底下是1000筆人名測試的結果：不過沒有到很準確，因為連同檔案處理的時間也一起算進去了 ![](https://i.imgur.com/uW376VW.png) 本來看不出 BST 與 djb2 的差異，現在可以發現到 `findName()` djb2 與 BST 差了將近 2 倍 ! 可以說 djb2 在 50021 的 table size 平均表現比 BST 好 ### hash 演算法、大小、衝突、時間的分析 (待續) ## 參考資料 [示範用的 phonebook 作業](https://hackmd.io/s/BJRMImUPg#) [2016q3 自己的共筆](https://hackmd.io/s/SJEbssma) [Linux 效能分析工具: Perf](http://wiki.csie.ncku.edu.tw/embedded/perf-tutorial) [使用 gnuplot 科學作圖](http://www.phy.fju.edu.tw/~ph116j/gnuplot_tutorial.pdf) [Sorted Linked List to Balanced BST](http://www.geeksforgeeks.org/sorted-linked-list-to-balanced-bst/) [Skip List](https://puremonkey2010.blogspot.tw/2013/04/skip-list.html) [Hash functions](http://www.cse.yorku.ca/~oz/hash.html) [hash function 觀念和實務](https://hackmd.io/s/HJln3jU_e#) [prime number list](http://compoasso.free.fr/primelistweb/page/prime/liste_online_en.php)