2017q1 Homework1 (phonebook)

# 2017q1 Homework1 (phonebook) contributed by < `hunng` > ### Reviewed by divazone * 在github沒看到在此篇做修改後的code * 請回答數大就是美的問題 * 在binary tree的部分可以試著根據此電話的名字分布來試著去減少部分的分支來加速搜索時間 ### Reviewed by stanleytazi + 在 github 沒看到有 commit 的紀錄 + 雖然cache-miss rate下降，但其實 cache ref&cache miss次數是上升的 + 在 hash table 新增一個 pointer 指向最近存放的node 這個做法，如果沒有特別一定要照這個順序，直接一直把新的 node 接在最靠近table那個就好，這樣應該就不用新增一個 pointer ### [github](https:/github.com/hunng/phonebook) 切換 branch 觀看 code 修改內容 # 事前準備 ## 開發環境 * OS: Ubuntu 16.04.2 LTS * Architecture: x86_64 * CPU op-mode(s): 32-bit, 64-bit * CPU(s): 4 * CPU Model name: Intel(R) Core(TM) i5-4200U CPU @ 1.60 GHz * L1d cache: 32 K * L1i cache: 32 K * L2 cache: 256 K * L3 cache: 3072 K * MemTotal: 8056592 KB ## Perf 首先利用以下指令確認目前的 Kernel config 有沒有啟用 Perf ``` ＄ cat "/boot/config-`uname -r`" | grep "PERF_EVENT" ``` 得到如下，確認已開啟 ``` CONFIG_PERF_EVENTS_INTEL_UNCORE=y CONFIG_HAVE_PERF_EVENTS=y CONFIG_PERF_EVENTS=y CONFIG_HAVE_PERF_EVENTS_NMI=y ``` 嘗試 perf_top_example.c後，結果如下 ![](https://i.imgur.com/8e6PtJH.png) * 基本指令 * pref list: 用於查詢 pref 在此環境下能觸發的 event * pref top: 與 Linux 內建 top 類似 * pref stat: 用於觀察特定程式效能 * pref record: 針對函式級別進行 event 統計利用以前寫的程式測試上述指令 ``` ＄ perf stat --repeat 5 -e cache-misses,cache-references,instructions,cycles ./a.out ``` ``` Performance counter stats for './a.out' (5 runs): 4,324 cache-misses # 34.303 % of all cache refs ( +- 19.10% ) 12,606 cache-references ( +- 2.57% ) 1,812,893 instructions # 1.21 insns per cycle ( +- 0.24% ) 1,496,746 cycles ( +- 4.83% ) 0.004125766 seconds time elapsed ( +- 68.22% ) ``` ``` ＄ perf record -e branch-misses:u,branch-instructions:u ./a.out ＄ perf report ``` ![](https://i.imgur.com/ENzMvD3.png) ![](https://i.imgur.com/Kq46yJj.png) --- # 作業內容 github 內切換 branch 看每個版本的 code ## 原始版本 * 執行: `$ ./phonebook_orig ` * entry 大小及 append() findName() 各自的時間如下 ``` size of entry : 136 bytes execution time of append() : 0.047660 sec execution time of findName() : 0.005363 sec ``` * cache-test: `$ make cache-test` * Cache-misses 高達 87.98 % ``` Performance counter stats for './phonebook_orig' (100 runs): 951,440 cache-misses # 87.982 % of all cache refs ( +- 0.37% ) 1,081,409 cache-references ( +- 0.32% ) 260,904,065 instructions # 1.50 insns per cycle ( +- 0.02% ) 173,663,392 cycles ( +- 0.21% ) 0.070041016 seconds time elapsed ( +- 0.38% ) ``` ## 減少資料結構大小 ### 既然沒使用過除了 last name 其他資料，何不直接不用 ```C= typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` * 執行時間 ``` size of entry : 24 bytes execution time of append() : 0.042229 sec execution time of findName() : 0.002306 sec ``` * cache-misses * 87.98 % -> 40.95 % ``` Performance counter stats for './phonebook_opt' (100 runs): 88,053 cache-misses # 40.945 % of all cache refs ( +- 0.59% ) 215,051 cache-references ( +- 0.61% ) 240,778,363 instructions # 2.03 insns per cycle ( +- 0.02% ) 118,601,828 cycles ( +- 0.25% ) 0.047711455 seconds time elapsed ( +- 0.37% ) ``` ### 但考慮到不能有功能方面的減損(只有名字就不叫電話簿了) idea from [here](https://hackmd.io/s/BJRMImUPg#%E5%84%AA%E5%8C%96%E7%89%88%E6%9C%AC-1-%E6%B8%9B%E5%B0%91%E8%B3%87%E6%96%99%E7%B5%90%E6%A7%8B%E5%A4%A7%E5%B0%8F) 建立新的 struct (__PHONE_BOOK_ENTRY_DETAIL) 用於儲存所有資料並在原有 struct 新增一個指標指向它 ```C= typedef struct __PHONE_BOOK_ENTRY_DETAIL { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pNext; } entry_detail; typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; entry_detail *pDetail; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` * 執行時間 ``` size of entry : 32 bytes execution time of append() : 0.046090 sec execution time of findName() : 0.002385 sec ``` * cache-misses * 40.95 % (24 bytes) -> 28.990 (32 bytes) * 24 -> 32 bytes 之間，雖然使用較大 entry 但 cache-misses 卻較小 ``` Performance counter stats for './phonebook_opt' (100 runs): 113,338 cache-misses # 28.990 % of all cache refs ( +- 0.77% ) 390,960 cache-references ( +- 0.80% ) 243,989,894 instructions # 1.93 insns per cycle ( +- 0.02% ) 126,349,603 cycles ( +- 0.54% ) 0.051253389 seconds time elapsed ( +- 0.75% ) ``` * plot ![](https://i.imgur.com/rgBUoKU.png) ## 使用 Hash Function hash function 使用 BKDR-Hash ```c= unsigned int hashit(char lastName[]) { unsigned int seed = 131; unsigned int value = 0; while (*lastName) { value = value * seed + (*lastName++); } return value % TABLE_SIZE; } ``` * 將所有數據放入 hash table 中， table 中有指向 entry 的 array * 將重複索引值的 entry 存放以 linked list 存放，但目前實作下的 append() 時間過長，需在修改 * 調整 mod 值（第一層 hash table 可存放的量），雖然讓整體下降，也仍然花了近2倍的時間 * 推測因為每次在存放資料時都需要重頭找起，所以當 mod 值低的時候，每個欄位放的總數眾多，每次存放要找到底便是負擔 * 而在調整 mod 值甚至近資料總數時，則因過大的 struct 導致過多的 cache-misses (71%) ，速度無法繼續向上提昇 * 想到的改進方向： * 因為在 append() 的時間過長 * 假如在 table 中每個欄位多存放一個 entry *，指向此欄位最新增加的 entry，新資料近來時能直接利用此指標進行操作使用的 table ```c typedef struct __HASH_TABLE { entry *store[TABLE_SIZE]; } table; ``` ![](https://i.imgur.com/ZZKVHG5.png) ### 調整後 ```c typedef struct __HASH_TABLE { entry *store[TABLE_SIZE][2]; } table; ``` TABLE_SIZE = 256 並用以下 enum 來增加 code 的閱讀性 ``` enum entryStore { FIRST, LAST }; ``` * cache-misses * 與調整前有明顯降低 * 略高於 phonebook_opt ``` Performance counter stats for './phonebook_hash' (100 runs): 94,607 cache-misses # 31.851 % of all cache refs ( +- 1.01% ) 297,029 cache-references ( +- 0.87% ) 237,542,616 instructions # 1.71 insns per cycle ( +- 0.02% ) 139,061,959 cycles ( +- 0.26% ) 0.057001722 seconds time elapsed ( +- 0.35% ) ``` * plot * 雖然 phonebook_hash 的 append() 時間較長 * 但降低了查詢時間 * 也就表示 phonebook 若是單次的 append() 後反覆查詢，就可以看出此版本的優勢 ![](https://i.imgur.com/hrAn7C4.png) ### TABLE_SIZE 與效能的關係 * 藉由 TABLE_SIZE 來設定此 hash table 中儲存的第一層資料個數 * 總 table 大小為 TABLE_SIZE * 8 (entry *) * 2 ``` $ ./phonebook_hash size of entry * : 8 bytes size of table : 32768 bytes ``` * append() * 經過 2048 前並沒有多大的增幅，大多在 0.051 ～ 0.052間 * 而在 4096 時就有明顯的增加 * 4096*8*2 = 65536 超過 32K 的 cache ![](https://i.imgur.com/gYvs7NX.png) * findName() 會隨著 TABLE_SIZE 上升而下降 * 同樣的 hash function % mod 下 * 在 table 不大時，搜尋的 entry 數會大略兩倍，下降幅度大概是 * 1/2 * 在 table 有一定大小時，因為搜尋經過的 entry 沒有差多少，下降的幅度沒有那麼大 ![](https://i.imgur.com/mg4YVqn.png) * cache * cache-references 數隨著 TABLE_SIZE 增大而增大 * cache-misses 數實際則相差不多，大多都在 7~9 萬不等 | TABLE_SIZE | cache-misses | cache-references | |------------|--------------|----------------| | 32 | 98173 | 180160 | | 64 | 93515 | 204223 | | 128 | 87883 | 237809 | | 256 | 83402 | 261340 | | 512 | 78871 | 293602 | | 1024 | 79128 | 395887 | | 2048 | 89420 | 458105 | | 4096 | 102688 | 542540 | 其中一次實驗，但多次實驗下來發現 cache-misses 都在 7～9萬浮動，甚至在 2048 及 4096 曾跑出 100000 ~ 120000 >以前只注意到 cache-misses rate，但這次發現原來很多 cache-misses rate 有詐，有時其實是因為分母太大 >[name=hunng] ## 使用 Binary Search Tree （最後使用了 trie） ### binary search tree 中的 node * 目前只有儲存當下的 char 值，若未來需要其他資料則需加上另一個指標指向詳細資料，如下 ```c typedef struct __BINARY_SEARCH_NODE { /* if needed more information add a entry_detail */ char str[MAX_LAST_NAME_SIZE]; struct __BINARY_SEARCH_NODE *pLeft, *pRight; } node; ``` ### 一個能讓 `char []` 互相比較，並能產生大小值 * 而在之前的程式碼中其實就有用過 `strcasecmp()` * 若使用 word.txt (排序過得資料)經由這個方法得到的數字，排序會得到一長串 linked-list (與原先 orig 一樣)，但又因為增加了其他的 pointer，整體會浪費了許多空間節錄自 `$ man strcasecmp` ``` The strcasecmp() function performs a byte-by-byte comparison of the strings s1 and s2, ignoring the case of the characters. It returns an integer less than, equal to, or greater than zero if s1 is found, respectively, to be less than, to match, or be greater than s2. ``` * 那假如使用字母排列 * 依照 rank 數比較字母，同字往左邊，不同字往右邊 * 雖然能產生叫漂亮的 binary tree，但會產生前面遇過得 append() 時間很長的問題，每個 append() 都相當於在從頭找起 * 所以選擇了依照第一個字母排列 * 雖然降低了整體 binary tree 的複雜度（第一位同字後都只有單向） * 但能大大提昇整體的效能 * 同第一位同字的則依照前面 hash 時使用的方法，新增一個 pointer 指向最底層加速 append() * 這種資料結構比較像 Trie 修改了使用的 node ```c typedef struct __TRIE_NODE { /* if needed more information add a entry_detail */ char str[MAX_LAST_NAME_SIZE]; struct __TRIE_NODE *pLeft[2], *pRight; } node; ``` 示意圖 ![示意圖](https://i.imgur.com/wM3Rfcu.png) * cache-misses ``` Performance counter stats for './phonebook_bst' (100 runs): 83,285 cache-misses # 54.120 % of all cache refs ( +- 0.76% ) 153,889 cache-references ( +- 0.63% ) 327,152,663 instructions # 1.78 insns per cycle ( +- 0.01% ) 183,479,557 cycles ( +- 0.13% ) 0.072624966 seconds time elapsed ( +- 0.30% ) ``` * plot ![](https://i.imgur.com/FgiCbcL.png) * 可以的修改方向 * 因為當初是為了要做 binary search tree，所以連接了兩個不同字母開頭的 * 但假如利用了 array 儲存開頭（像 hash table)，會省去不少從頭找起的時間，例如：當查找 z開頭的字時，可以不用經由開頭 a 找到 b，在從 b 找到 c 這種方式，若能直接從存定的 array 中的值利用，會快很多 ## 本例選用的 dataset (也就是輸入電話簿的姓名) 是否存在問題？ * 有代表性嗎？跟真實英文姓氏差異大嗎？ * word.txt 為類似 `aaaa` `bbbb` 這種無意義字詞最作為開頭和一些英文單字，例如`afternight` `afternoon` * 撇開那些無義單詞還比較像是一本字典的索引 * 資料難道「數大就是美」嗎？ * 如果我們要建立符合真實電話簿情境的輸入，該怎麼作？ * 輸入順序不該是按照順序的