# 2017q1 Homework1 (phonebook) contributed by < `twzjwang` > 作業說明: [B01: phonebook](https://hackmd.io/s/rJYD4UPKe) github: [twzjwang/phonebook](https://github.com/twzjwang/phonebook) ### Reviewed by `zhanyangch` * 修改 runtime.gp 的指令順序可以使數字不被蓋住 # 開發環境 - Ubuntu 14.04.5 LTS Linux 4.4.0-62-generic >> 這個 distribution 太舊,之後的實驗可能會有問題,請升級到 Ubuntu 16.04 之後 [name="jserv"] - cpu version: Intel(R) Core(TM) i5-3337U CPU @ 1.80GHz Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 3072K - memory size: 4GiB # 開發前 - 安裝Ubuntu系統 - 安裝開發工具 - vim - git - build-essential - linux-tools-common - linux-tools-generic - cppcheck - astyle - colordiff - gnuplot - ... - github設定 # 未優化版本 - 清空 cache - `$ echo 1 | sudo tee /proc/sys/vm/drop_caches` - 執行 phonebook_orig - `$ ./phonebook_orig` - 結果 ``` size of entry : 136 bytes execution time of append() : 0.108467 sec execution time of findName() : 0.006022 sec ``` - cache-test `$ make cache-test` - ``` Performance counter stats for './phonebook_orig' (100 runs): 2,080,686 cache-misses # 96.031 % of all cache refs ( +- 0.16% ) 2,166,682 cache-references ( +- 0.18% ) 262,557,158 instructions # 1.41 insns per cycle ( +- 0.02% ) 185,753,200 cycles ( +- 0.12% ) 0.075103558 seconds time elapsed ( +- 0.94% ) ``` # 實驗一 - 用 binary search tree >> 不要濫用「優化」(optimize) 這詞,在你有具體改進之前,都只是實驗(experiment) [name="jserv"] ## 方法 1: 使用strcmp >> 避免用 version 1, 2, 3, ... 等寫法,明確標注你的嘗試途徑,尤其在 Git commit messages 更該如此,原始程式碼和技術報告都是寫給人看的! [name="jserv"] - 用`strcmp`判斷,left < parent, right > parent - append 時間太久 => 直接 ctrl+c ```C int strcmp(const char *s1, const char *s2) { while(*s1 && (*s1 == *s2)) s1++, s2++; return *(const unsigned char*) s1 - *(const unsigned char*) s2; } ``` >> 注意一致的 coding style,應該寫作 `char *s1`,而非 `char *s1`,然後是 `*s1 == *s2` (有空白) 而非 `*s1==*s2` [name="jserv"] - 改用`all-names.txt` find `zora`測試 - 結果 - findName 時間大幅縮短(優化前 20%) - append 時間太久(優化前的 90 倍) - cache miss 約為 4.795% ``` Performance counter stats for './phonebook_orig' (100 runs): 69,435 cache-misses # 77.038 % of all cache refs ( +- 1.23% ) 90,131 cache-references ( +- 1.26% ) 12,058,151 instructions # 1.13 insns per cycle ( +- 0.03% ) 10,699,046 cycles ( +- 1.63% ) 0.004715377 seconds time elapsed ( +- 1.86% ) Performance counter stats for './phonebook_opt_bst' (100 runs): 97,979 cache-misses # 4.795 % of all cache refs ( +- 2.68% ) 2,043,183 cache-references ( +- 1.37% ) 971,374,950 instructions # 2.01 insns per cycle ( +- 0.00% ) 482,156,625 cycles ( +- 1.04% ) 0.198216919 seconds time elapsed ( +- 1.31% ) ``` >上面這行是...?[name=課程助教][color=red] >已刪除 [name=twzjwang] ![](https://i.imgur.com/rFDGhkX.png) ## 方法 2: 改用自訂義 entry_cmp - 因為輸入檔字母按順序(a-z) => 產生 skewed BST - 改用自訂 cmp => 先比長度 ```C int entry_cmp(char a[], char b[]) { if(strlen(a) == strlen(b)) return strcmp(a, b); return strlen(a) - strlen(b); } ``` >> 沒必要多次呼叫 `strlen`,請避免 [name="jserv"] ```C int entry_cmp(char a[], char b[]) { int i = strlen(a) - strlen(b); return i ? i : strcmp(a, b); } ``` - 結果 - append 時間縮短為 version 1 的 0.47% - findName 時間縮短為優化前 4% - cache miss 約為 5.148% ``` Performance counter stats for './phonebook_orig' (100 runs): 74,670 cache-misses # 78.183 % of all cache refs ( +- 0.70% ) 95,508 cache-references ( +- 1.00% ) 12,061,787 instructions # 1.02 insns per cycle ( +- 0.02% ) 11,865,179 cycles ( +- 1.95% ) 0.005664565 seconds time elapsed ( +- 3.60% ) Performance counter stats for './phonebook_opt_bst' (100 runs): 134,457 cache-misses # 5.148 % of all cache refs ( +- 2.42% ) 2,611,651 cache-references ( +- 0.88% ) 449,076,938 instructions # 1.88 insns per cycle ( +- 0.15% ) 238,755,941 cycles ( +- 1.50% ) 0.098018759 seconds time elapsed ( +- 1.83% ) ``` ![](https://i.imgur.com/IXviDpN.png) - 問題 - 能否再縮短 append 時間? ## 方法 3: 嘗試修改 entry_cmp - 再修改用 cmp => 先比長度,再從結尾比大小(與 strcmp相反) ```C int entry_cmp(char a[], char b[]) { if (strlen(a) == strlen(b)) { int l = strlen(a); for (int i = l - 1; i >= 0; i--) { if (*(a + i) != *(b + i)) return *(a + i) - *(b + i); } } return strlen(a) - strlen(b); } ``` >> 縮減 `int i` 和 `int l` 的 scope。 >> `strlen` 多次呼叫,可以簡化 [name="jserv"] ```C int entry_cmp(char a[], char b[]) { int m = strlen(a); int n = m - strlen(b); if (!n) { while (m--) { if (*(a + m) != *(b + m)) return *(a + m) - *(b + m); } } return n; } ``` - 結果 - append 時間縮短為 version 2 的 0.14% - findName 時間縮短 < 優化前 1% - cache miss 約為 44.641% - 推測: - 過程中採用 binary search tree 結構存資料 插入和搜尋中存取 cache 次數與 singly-linked list 不同 (cache-references 和 cache-misses 次數明顯不同) - 已知 cache max size: 32 KB cache block size: 64 B entry size: 144 B 8-way set associative - orig - append : n * O(1) 16750筆資料 * 每筆 144 B = 2,412,000 B 2,412,000 B / 64 B = 37688 block - findName : O(n) 16750筆資料 * 每筆 144 B = 2,412,000 B 2,412,000 B / 64 B = 37,688 block - 37,688 + 37,688 = 75,376 次 - opt_bst - append : O(nlogn) 16750 * log(16750) * 144 B = 33,844,878 B 33,844,878 B / 64 B = 528,826 block - findName :O(n) log(16750) * 144 B = 2,021 B 2,021 B / 64 B = 32 block - 528,826 + 32 = 528,858 次 - 參考 [cache 原理和實驗](https://hackmd.io/s/S14L26-_l) >> 如何解釋 cache miss 的變化? [name=jserv] ``` Performance counter stats for './phonebook_orig' (100 runs): 61,477 cache-misses # 74.388 % of all cache refs ( +- 1.54% ) 82,643 cache-references ( +- 1.34% ) 12,059,048 instructions # 1.26 insns per cycle ( +- 0.03% ) 9,593,850 cycles ( +- 1.84% ) 0.004231672 seconds time elapsed ( +- 2.34% ) Performance counter stats for './phonebook_opt_bst' (100 runs): 200,986 cache-misses # 44.641 % of all cache refs ( +- 1.99% ) 450,229 cache-references ( +- 1.23% ) 55,910,271 instructions # 0.95 insns per cycle ( +- 0.25% ) 58,829,146 cycles ( +- 1.87% ) 0.025019251 seconds time elapsed ( +- 2.06% ) ``` ![](https://i.imgur.com/mfSomHh.png) 更改字串 cmp 方法比較 (用`all-names.txt` find `zora`) 版本|方法 1 | 方法 2 | 方法 3 -|----------|-----------|------------ append|0.189335|0.089585|0.013539 findName|0.000022|0.000007|0.000001 - 改回`words.txt` find `zyxel` - ![](https://i.imgur.com/t5x0jmT.png) - 問題 - 除了改變 cmp 規則,有沒有辦法減少插入、搜尋的深度? - balance BST ## 方法 4: 將 single-linked list 轉換成 Balanced BST - 因為輸入字母已按順序排列 >> 可用 `sort -R` 建立新的輸入資料集 [name="jserv"] - 參考 [Sorted Linked List to Balanced BST](http://www.geeksforgeeks.org/sorted-linked-list-to-balanced-bst/) Method 1 - 將 orig 的 singly_linked list 資料轉換成 balanced BST ```C //search by bst entry *findName(char lastName[], entry *pHead) { entry *temp = pHead; int cmp; while (temp) { cmp = strcmp(lastName, temp->lastName); if (cmp == 0) return temp; else if (cmp < 0) temp = temp->pLeft; else temp = temp->pRight; } return NULL; } //orig append entry *append(char lastName[], entry *e) { /* allocate memory for the new entry and put lastName */ e->pNext = (entry *) malloc(sizeof(entry)); e = e->pNext; strcpy(e->lastName, lastName); e->pNext = NULL; e->pRight = NULL; e->pLeft = NULL; return e; } entry *new_balance_bst(entry *root, int num) { if(!num) return NULL; entry *bst_root = NULL; bst_root = balance_bst(bst_root, num, root); free(root); return bst_root; } entry *balance_bst(entry *e, int num, entry *root) { static entry *bst_build_cur = NULL; if(!num) return NULL; if(!bst_build_cur) bst_build_cur = root; if(!e){ e = (entry *) malloc(sizeof(entry)); e->pNext = NULL; e->pRight = NULL; e->pLeft = NULL; } int mid = (num +1)/2; e->pLeft = balance_bst(e->pLeft, mid - 1, root); strcpy(e->lastName, bst_build_cur->lastName); bst_build_cur = bst_build_cur->pNext; e->pRight = balance_bst(e->pRight, num - mid, root); return e; } void free_bst(entry *e) { if (e) { free_bst(e->pRight); free_bst(e->pLeft); free(e); } } ``` >> 上述的 `findName` 可改寫為以下: >> ```C >> { >> entry *temp = pHead; >> while (temp) { >> if (!strcmp(lastName, temp->lastName)) >> return temp; >> temp = (cmp < 0) ? temp->pLeft : temp->pRight; >> } >> return NULL; >> } >> ``` >> [name="jserv"] - 結果 - append 時間降為優化前1.69倍 - findName 時間 < 優化前0.0178% - cache-miss 約為 94.463% ``` Performance counter stats for './phonebook_opt_bst' (100 runs): 2,431,096 cache-misses # 94.463 % of all cache refs ( +- 0.10% ) 2,573,593 cache-references ( +- 0.20% ) 481,113,045 instructions # 1.34 insns per cycle ( +- 0.01% ) 358,634,304 cycles ( +- 0.29% ) 0.142062161 seconds time elapsed ( +- 0.62% ) ``` ![](https://i.imgur.com/k4XQ9gH.png) - 問題 - 降低 cache-misses ? - 直接建成 BST (不經過 singly-linked list )? - 邊讀檔邊建 => 不知資料筆數 => 先讀過一次存下資料數 ## 方法 5: 嘗試省略 singly-linked list ,直接建 Balanced BST - 修改 versoin 4 先讀檔取得資料數量 - 直接讀檔建 balance BST ```C while (fgets(line, sizeof(line), fp)) { while (line[i] != '\0') i++; count_node++; } ``` - 結果 - append 時間降為優化前 1.44 倍 - findName 時間 < 優化前 0.0169% - cache-miss 約為 92.262% ``` Performance counter stats for './phonebook_opt_bst' (100 runs): 1,425,612 cache-misses # 92.262 % of all cache refs ( +- 0.04% ) 1,545,179 cache-references ( +- 0.11% ) 434,912,471 instructions # 1.38 insns per cycle ( +- 0.00% ) 314,507,544 cycles ( +- 0.10% ) 0.124844305 seconds time elapsed ( +- 0.54% )! ``` ![](https://i.imgur.com/2DtIu9z.png) - 問題 - 其他方法? ## binary search tree 小結 Algo.|優化前(singly-linked list)|優化後(balanced binary search tree) -|-----|----- space|O(n)|O(n) insert|O(1)|O(n) (讀資料數) + O(log n) (插入) search|O(n)|O(log n) # 實驗二 - 縮減 entry 結構 - 程式中依據 `lastName` 插入及搜尋 - 將其他資訊定義在新的結構 `entry_info` 需要時再配置記憶體 ```C /* original version */ typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` 改成以下 ```C typedef struct __PHONE_BOOK_ENTRY_INFO { char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; } entry_info; typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; entry_info *pInfo; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` - 結果 - append 時間為修改前 82% - findName 時間為修改前 51% - cache-miss 約為 68.577% - cache miss 降低原因: size of entry 變小 ``` Performance counter stats for './phonebook_orig' (100 runs): 2,042,217 cache-misses # 95.007 % of all cache refs ( +- 0.12% ) 2,149,550 cache-references ( +- 0.16% ) 262,784,639 instructions # 1.43 insns per cycle ( +- 0.02% ) 183,943,466 cycles ( +- 0.37% ) 0.073711683 seconds time elapsed ( +- 0.90% ) Performance counter stats for './phonebook_opt_struct' (100 runs): 378,199 cache-misses # 68.577 % of all cache refs ( +- 0.20% ) 551,497 cache-references ( +- 1.13% ) 245,480,764 instructions # 1.71 insns per cycle ( +- 0.02% ) 143,394,756 cycles ( +- 1.12% ) 0.056445606 seconds time elapsed ( +- 1.34% ) ``` ![](https://i.imgur.com/gURlVZX.png) # 實驗三 - 結合實驗一(BST)與實驗二(縮減 entry 結構) - 將前兩個方法結合 - 結果 - (優化前) original: - 使用 single linked list 儲存資料 - append 時紀錄最後一個 node 插入linked list 時間不會太多 ( n*O(1) ) - findName 最糟的情況下必須從頭找到尾 ( O(n) ) - (實驗一) opt bst: - 使用BST雖然 append 時間稍微增加 ( n * O(log n) ) - 但 findName 時間大幅減少 ( O(log n) ) - (實驗二) opt struct: - 簡化 entry 結構降低 cache misses rate - append 和 findName 時間因此減少 - (實驗三) optimized - 使用 size 較小的資料結構讓 BST 插入、搜尋時間再縮短 ``` Performance counter stats for './phonebook_orig' (100 runs): 1,987,476 cache-misses # 95.591 % of all cache refs ( +- 0.67% ) 2,079,154 cache-references ( +- 0.70% ) 262,484,896 instructions # 1.43 insns per cycle ( +- 0.02% ) 184,098,351 cycles ( +- 0.58% ) 0.084555741 seconds time elapsed ( +- 2.78% ) Performance counter stats for './phonebook_opt' (100 runs): 469,681 cache-misses # 78.364 % of all cache refs ( +- 0.35% ) 599,356 cache-references ( +- 1.20% ) 314,544,912 instructions # 1.49 insns per cycle ( +- 0.00% ) 211,290,940 cycles ( +- 0.75% ) 0.094401048 seconds time elapsed ( +- 2.10% ) Performance counter stats for './phonebook_opt_bst' (100 runs): 1,387,546 cache-misses # 90.434 % of all cache refs ( +- 0.14% ) 1,534,318 cache-references ( +- 0.47% ) 434,348,761 instructions # 1.37 insns per cycle ( +- 0.00% ) 317,930,025 cycles ( +- 0.59% ) 0.137107756 seconds time elapsed ( +- 1.86% ) Performance counter stats for './phonebook_opt_struct' (100 runs): 374,466 cache-misses # 71.274 % of all cache refs ( +- 0.25% ) 525,392 cache-references ( +- 0.86% ) 245,306,198 instructions # 1.78 insns per cycle ( +- 0.02% ) 137,518,542 cycles ( +- 0.87% ) 0.057796514 seconds time elapsed ( +- 2.39% ) ``` ![](https://i.imgur.com/HB2T2d2.png) # changing git commit message - 之前沒有注意 commit message 內容及格式 Author email也打錯... 經過提醒後嘗試修改 commit message - `git rebase -i HEAD~5` 修改前五次 commit message - 將要修改那行的 `pick` 改成 `edit` - `git commit --amend` Author 資訊錯誤也可在此更改 `git commit --amend --author="twzjwang <twzjwang@gmail.com>"` - 開啟編輯,修改後儲存 - `git rebase --continue` - 重複以上三步驟 - `git push --force-with-lease` # 本例選用的 dataset 是否存在問題? - 有代表性嗎?跟真實英文姓氏差異大嗎? - 許多字不是名詞,甚至是連續而無意義 - 資料難道「數大就是美」嗎? - 除了資料"量"外,"質"也很重要 - 如果內容大部分是無意義的資料,要從中篩選出有參考價值的資料 - 如果我們要建立符合真實電話簿情境的輸入,該怎麼作? - 從已整理好的開放資料取得更常見的姓氏 - 要考慮姓名有可能重複的狀況 - 除了根據"姓名"搜尋資料外,根據 電話號碼 搜尋也是常見的方式,要重新設計演算法 - e.g. [whoscall](https://whoscall.com/zh-TW/) # 參考資料 - [Linux 查看系統及硬體資訊](https://www.phpini.com/linux/linux-check-system-hardware-information) - [Sorted Linked List to Balanced BST](http://www.geeksforgeeks.org/sorted-linked-list-to-balanced-bst/) - [jkrvivian HackMD筆記](https://hackmd.io/s/SyOQgOQa#) - [vic85821 HackMD筆記](https://hackmd.io/s/r1B6WwQT#) - [How to Write a Git Commit Message](https://chris.beams.io/posts/git-commit/) - [changing-git-commit-message-after-push](http://stackoverflow.com/questions/8981194/changing-git-commit-message-after-push-given-that-no-one-pulled-from-remote) - [修改 Git commits 的作者資訊](https://yulun.me/2014/git-tips-change-author-and-email-in-previous-commits/) - [記憶體系統-快取記憶體](http://systw.net/note/af/sblog/more.php?id=252) ###### tags: `twzjwang`