# 2017q1 Homework1(phonebook) contributed by < [refleex](https://github.com/refleex) > ### Reviewed by `king1224` * phonebook_opt_hash 的 findName 執行時間是 0.0000 秒,你認為是不正常的嗎? 如果將要查找的 lastName 改變之後,時間還會是 0.0000 秒嗎 * 我試著將查找目標改為 " aaaa ",卻出現了錯誤,append 是不是需要做修正 * 可以詳細閱讀電話簿的輸入順序,main.c 中呼叫 append 和 findName 的時機,以及你是如何處理 append 和 findName 的,就會發現為什麼 findName 在這邊是 0 秒,為什麼不能查找別筆資料 >main.c 中的 input array 即為查找目標 >dictionary 資料夾中的 word.txt 為預設電話簿 >[name=洪正皇][color=#e847bd] ## **開發環境** 作業系統 : ubuntu 16.04 LTS (64-bit) Architecture: x86_64 CPU 作業模式: 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 每核心執行緒數:2 每通訊端核心數:2 Socket(s): 1 NUMA 節點: 1 供應商識別號: GenuineIntel CPU 家族: 6 型號: 60 Model name: Intel(R) Core(TM) i5-4210M CPU @ 2.60GHz 製程: 3 CPU MHz: 1694.671 CPU max MHz: 3200.0000 CPU min MHz: 800.0000 BogoMIPS: 5187.92 虛擬: VT-x L1d 快取: 32K L1i 快取: 32K L2 快取: 256K L3 快取: 3072K NUMA node0 CPU(s): 0-3 ## ~~進度~~ - 2017/02/22 >> 不用特別標時間,HackMD 會自動記錄更新,你只要及時整理進度和所見所聞即可 [name="jserv"] --- ## **前置作業** 先安裝並熟悉 perf 的使用 ``` $ sudo apt-get install linux-tools-common $ sudo apt-get install linux-tools-3.16.0-50-generic linux-cloud-tools-3.16.0-50-generic ``` 但因為 kernel 版本不同,因此照出現的警示安裝相符的版本 ``` $ sudo apt-get install linux-tools-4.4.0-63-generic $ sudo apt-get install linux-cloud-tools-4.4.0-63-generic ``` 接著就照 [Perf](https://hackmd.io/s/B11109rdg) 的說明繼續練習,先查看權限 ``` $ cat /proc/sys/kernel/perf_event_paranoid ``` 結果為 1 ``` 1 ``` 但在我要解除 kernel pointer 限制時 ``` $ sudo sh -c " echo 0 > /proc/sys/kernel/kptr_restrict" ``` 查看權限的結果仍為 1 執行 example.c ``` $ ./example ``` 得到 pid: 26710 ``` $ perf top -p 26710 ``` 輸入上列指令,卻只得到 perf top 的參數列表 ``` Usage: perf top [<options>] -a, --all-cpus system-wide collection from all CPUs -b, --branch-any sample any taken branches -c, --count <n> event period to sample -C, --cpu <cpu> list of cpus to monitor -d, --delay <n> number of seconds to delay between refreshes -D, --dump-symtab dump the symbol table used for profiling -E, --entries <n> display this many functions -e, --event <event> event selector. use 'perf list' to list available even -f, --count-filter <n> only display functions with more events than this -F, --freq <n> profile at this frequency -g enables call-graph recording and display -i, --no-inherit child tasks do not inherit counters -j, --branch-filter <branch filter mask> branch stack filter modes -K, --hide_kernel_symbols hide kernel symbols -k, --vmlinux <file> vmlinux pathname -M, --disassembler-style <disassembler style> Specify disassembler style (e.g. -M intel for intel syntax) . . . ``` 上述部份試了很多種方法,結果不是沒變,不然就是 ``` bash: syntax error near unexpected token `newline' ``` 不過沒關係,大致重複來回做7個小時後,再參考一些網路上的文章,已經稍微對 perf 有些概念,所以正式開始這次的作業-->優化 --- ## **Original** 先用 perf stat 檢查原始版本的 cash miss ``` $sudo perf stat --repeat 100 -e cache-misses,cache-references,instructions,cycles ./phonebook_orig ``` 結果如下 ``` Performance counter stats for './phonebook_orig' (100 runs): 969,283 cache-misses # 85.291 % of all cache refs ( +- 0.19% ) 1,136,447 cache-references ( +- 0.13% ) 261,184,322 instructions # 1.47 insns per cycle ( +- 0.02% ) 177,351,808 cycles ( +- 0.14% ) 0.056668601 seconds time elapsed ( +- 0.15% ) ``` cache-miss 85.291% ``` size of entry : 136 bytes execution time of append() : 0.059108 sec execution time of findName() : 0.005920 sec ``` --- ## **可能的效能改進方向** - 使用 binary search tree 或其他資料結構改寫 - 藉由 hash function 來加速查詢,請詳閱 hash function 觀念和實務 - 既然 first name, last name, address 都是合法的英文 (可假設成立),使用字串壓縮的演算法,降低資料表示的成本 --- ## **優化 1 (struct)** 打開 ```phonebook_opt.h``` ``` clike= typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pNext; } entry; entry *findName(char lastName[], entry *pHead); entry *append(char lastName[], entry *e); ``` 打開 ```phonebook_orig.c``` ``` clike= entry *findName(char lastName[], entry *pHead) { while (pHead != NULL) { if (strcasecmp(lastName, pHead->lastName) == 0) return pHead; pHead = pHead->pNext; } return NULL; } entry *append(char lastName[], entry *e) { /* allocate memory for the new entry and put lastName */ e->pNext = (entry *) malloc(sizeof(entry)); e = e->pNext; strcpy(e->lastName, lastName); e->pNext = NULL; return e; } ``` 可以發現在```phonebook_opt.h``` struct 所選宣告變數中 append() 和 findName() 只用到了 lastName 因此把不會用到的變數 用另外一個struct宣告 ```clike= typedef struct __PHONE_BOOK_ENTRY1 { char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; } entry1; typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; struct entry1 *detail; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` 然後```$ make plot``` ``` size of entry : 32 bytes execution time of append() : 0.038840 sec execution time of findName() : 0.005018 sec ``` ``` Performance counter stats for './phonebook_opt' (100 runs): 971,060 cache-misses # 85.131 % of all cache refs ( +- 0.17% ) 1,140,662 cache-references ( +- 0.12% ) 261,089,459 instructions # 1.47 insns per cycle ( +- 0.02% ) 177,377,228 cycles ( +- 0.13% ) 0.056722433 seconds time elapsed ( +- 0.16% ) ``` 85.131%,嗯。 ![modify struct](https://i.imgur.com/lTRc2bp.png) However, 我可憐的電腦,cache-miss 僅優化了 0.16% -- 不死心重複嘗試後,把原本的 phonebook 資料夾整個刪掉,重頭開始做。 ```clike= #define OPT 1 typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; struct __PHONE_BOOK_ENTRY *pNext; struct entry1 *detail; } entry; typedef struct __PHONE_BOOK_ENTRY1 { char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; } entry1; entry *findName(char lastName[], entry *pHead); entry *append(char lastName[], entry *e); #endif ``` ```clike= entry *findName(char lastName[], entry *pHead) { while (pHead != NULL) { if (strcasecmp(lastName, pHead->lastName) == 0) return pHead; pHead = pHead->pNext; } return NULL; } entry *append(char lastName[], entry *e) { /* allocate memory for the new entry and put lastName */ e->pNext = (entry *) malloc(sizeof(entry)); e = e->pNext; strcpy(e->lastName, lastName); e->pNext = NULL; return e; } ``` 然後```$ make plot``` ``` size of entry : 32 bytes execution time of append() : 0.034790 sec execution time of findName() : 0.002003 sec ``` ``` Performance counter stats for './phonebook_opt' (100 runs): 142,859 cache-misses # 47.150 % of all cache refs ( +- 0.50% ) 302,990 cache-references ( +- 0.60% ) 243,965,824 instructions # 1.93 insns per cycle ( +- 0.02% ) 126,343,413 cycles ( +- 0.26% ) 0.040863594 seconds time elapsed ( +- 0.34% ) ``` 47.15% 總算回歸正常數值 ![newruntime](https://i.imgur.com/lbZXSPD.png) --- ## 優化 2 ( hash function ) 看完 [hash function 觀念和實務](https://hackmd.io/s/HJln3jU_e) ,再上網找了一些關於hash function的文章 了解到 hash function 其實就是把一堆資料分門別類,再根據索引去搜尋。 所以如果把電話簿裡的名字以某種規則分類,若要找特定名字,便可在較小的範圍內搜尋,以減少搜索時間 所謂某種規則,便是 hash function ,雜湊函數。 上網找了一個 BKDR-hash function 便開始著手修改程式碼。 ```clike= unsigned int hash(char lastName[]) { int hash_number = 0; int h = 53; for (int i = 0; lastName[i] != '\0'; i++) hash_number = (hash_number * h + lastName[i]) % TABLE; return hash_number; } ``` hash function 本身很好寫,難在 main 函式的更改,尤其 allocate memory 時極常出錯 ``` size of entry : 32 bytes execution time of append() : 0.073824 sec execution time of findName() : 0.000000 sec *** Error in `./phonebook_opt_hash': munmap_chunk(): invalid pointer: 0x00007ffd6f3b87a0 *** ``` 上網找了一下關於 munmap_chunk(): invalid pointer 的問題,找到一篇提到:the program will crash if you pass a pointer to free() that has not been obtained through malloc(). 修改一下程式碼後(包括 Makefile, calculate.c, runtime.gp等等) ``` size of entry : 32 bytes execution time of append() : 0.056479 sec execution time of findName() : 0.000000 sec ``` ``` Performance counter stats for './phonebook_orig' (100 runs): 570,732 cache-misses # 59.378 % of all cache refs ( +- 0.33% ) 961,182 cache-references ( +- 0.31% ) 260,905,474 instructions # 1.47 insns per cycle ( +- 0.02% ) 177,699,696 cycles ( +- 0.42% ) 0.057268308 seconds time elapsed ( +- 0.52% ) ``` ``` Performance counter stats for './phonebook_opt' (100 runs): 139,321 cache-misses # 46.176 % of all cache refs ( +- 0.56% ) 301,716 cache-references ( +- 0.53% ) 243,828,644 instructions # 1.94 insns per cycle ( +- 0.02% ) 125,408,510 cycles ( +- 0.29% ) 0.040634235 seconds time elapsed ( +- 0.38% ) ``` ``` Performance counter stats for './phonebook_opt_hash' (100 runs): 742,301 cache-misses # 59.684 % of all cache refs ( +- 0.76% ) 1,243,727 cache-references ( +- 0.24% ) 336,095,343 instructions # 1.22 insns per cycle ( +- 0.01% ) 275,331,193 cycles ( +- 0.30% ) 0.089550008 seconds time elapsed ( +- 0.36% ) ``` Origin 的 cash-miss 只有 59% ,正常不應該那麼低,所以我下載原版跑了一次,跟之前一樣約略是 85% ~ 86%。 應該是因為我在更改 main.c 時不小心更動某一部份(還不確定再哪),而 origin 有近乎和 hash 版本一樣的 cash-miss ![](https://i.imgur.com/DaaCAGM.png) findName 一直是0.0000秒,原本以為他是沒有進到 while迴圈而直接回傳 NULL ,可是試驗之後發現也不是,這邊一直想不透 --- ## **參考資料** - [Perf -- Linux下的系统性能调优工具](https://www.ibm.com/developerworks/cn/linux/l-cn-perf1/) - https://perf.wiki.kernel.org/index.php/Tutorial - http://groangao.pixnet.net/blog/post/24474489-%5Bc,c%2B%2B%5D-typedef-struct-%E7%94%A8%E6%B3%95%E8%AA%AA%E6%98%8E - [hash table](http://algs4.cs.princeton.edu/34hash/) - https://zh.wikipedia.org/wiki/Strcpy - [ifdef](https://msdn.microsoft.com/zh-tw/library/2a1b21sf.aspx) - [munmap_chunk(): invalid pointer](http://stackoverflow.com/questions/32118545/munmap-chunk-invalid-pointer)