2016q3 Homework1 (phonebook)

# 2016q3 Homework1 (phonebook) contributed by <`bobsonlin`> ## 開發環境 * OS： Ubuntu 16.04.1 LTS * Memory: 12201MB * CPU: * Name: * 4-core Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz * Cache: * L1d Cache: 32KB * L1i Cache: 32KB * L2 Cache: 256KB * L3 Cache: 3072KB * Frequency: * CPU max MHz: 3200 * CPU min MHz: 800 查詢方式： * 點左下方的menu -> system tool -> System Profiler and Benchmark * 由`lscpu`指令可得知CPU的最大和最小的工作頻率 ## 案例分析： Phonebook ### 未優化版本 `$ make run` ``` size of entry : 136 bytes execution time of append() : 0.110389 sec execution time of findName() : 0.007590 sec ``` `$ perf stat --repeat 5 -e cycles,instructions,cache-references,cache-misses,branch-instructions,branch-misses ./phonebook_orig` ``` Performance counter stats for ’./phonebook_orig’ (5 runs): 2,10730710 cycles ( +- 1.15% ) [65.49%] 2,6152,9155 instructions # 1.24 insns per cycle ( +- 0.18% ) [82.75%] 226,7098 cache-references ( +- 0.32% ) [82.75%] 208,8783 cache-misses # 92.135 % of all cache refs ( +- 1.51% ) [82.75%] 4660,8227 branch-instructions ( +- 1.01% ) [85.59%] 99,5033 branch-misses # 2.13% of all branches ( +- 1.13% ) [83.47%] 0.079568228 seconds time elapsed ( +- 13.06% ) ``` #### 用 perf 找尋熱點： `$ ./phonebook_orig & sudo perf top -p $!` ![](https://i.imgur.com/lHHBOM9.png) 由以上的結果可知，在phonebook的程式中，消耗CPU週期最多的是findName()， ### 優化版本 * 修改struct的結構 `$ make run` ``` size of entry : 32 bytes execution time of append() : 0.095677 sec execution time of findName() : 0.004692 sec ``` `$ perf stat --repeat 5 -e cycles,instructions,cache-references,cache- misses,branch-instructions,branch-misses ./phonebook_orig` ``` Performance counter stats for ’./phonebook_orig’ (5 runs): 1,4712,4605 cycles ( +- 0.87% ) [65.26%] 2,4898,6819 instructions # 1.69 insns per cycle ( +- 0.47% ) [82.80%] 69,3457 cache-references ( +- 3.35% ) [83.83%] 38,4917 cache-misses # 55.507 % of all cache refs ( +- 2.35% ) [83.87%] 4366,7742 branch-instructions ( +- 0.53% ) [83.87%] 99,4378 branch-misses # 2.28% of all branches ( +- 2.30% ) [84.65%] 0.060416386 seconds time elapsed ( +- 19.98% ) ``` * plot ![](https://i.imgur.com/1iOjxSw.png) * 分析結果 * Hash Function 加速查詢 * Hash Table 建立： ```C== hashTable *createHashTable(int tableSize){ int i; hashTable *tb; tb = (hashTable*)malloc(sizeof(hashTable)); tb->list = (entry **)malloc(sizeof(entry*)*tableSize); for(i = 0;i < tableSize;i++){ tb->list[i] = NULL; } return tb; } ``` * hash function： ```C== unsigned int hash(char *str) { unsigned int hash_value = 0; unsigned int seed = 131; while(*str) hash_value = hash_value * seed + (*str++); return (hash_value % TABLE_SIZE); } ``` * findName 修改： ```C== entry *findName_HashTable(char lastName[], hashTable *tb) { entry *e; unsigned int index = hash(lastName); for(e = tb->list[index]; e != NULL; e = e->pNext){ if (strcasecmp(lastName, e->lastName) == 0) return e; } return NULL; } ``` * append 修改 ```C== void append_HashTable(char lastName[], hashTable *tb) { unsigned int index = hash(lastName); entry *e; e = (entry*)malloc(sizeof(entry)); strcpy(e->lastName, lastName); e->pNext = tb->list[index]; tb->list[index] = e; } ``` * performace ``` Performance counter stats for './phonebook_opt_HASH' (100 runs): 32,5297 cache-misses # 53.204 % of all cache refs ( +- 0.43% ) 61,1409 cache-references ( +- 0.37% ) 2,3601,3893 instructions # 1.53 insns per cycle ( +- 0.02% ) 1,5460,6436 cycles ( +- 0.36% ) 0.062424603 seconds time elapsed ( +- 3.01% ) ``` * plot ![](https://i.imgur.com/wIOjTzd.png) * 修改讀檔方式，降低append()的執行時間 * 解說：參考[c14006078同學的HackMD](https://hackmd.io/MYNgnCDsCsYEwFoAmInQQFngMwQQwCM5dI88BmAU3LnIEYlsAGIA)，覺得他這個修改很棒，既然從fgets()得來的lastname不影響比對，可以把對字串的處理移至findname()再進行。 * plot ![](https://i.imgur.com/kKeBESb.png) ## 工作進度 * [9/25] 灌好OS，熟悉Github、HackMD、Perf、gnuplot的操作 * [9/28] 實現Hash function * Now 希望能在gnuplot產生3個資料的直方圖，並仔細的分析結果，也希望可以對hash function做深入的研究。 ## 遇到的問題 * cache-references 在網路上查(https://www.ibm.com/developerworks/cn/linux/l-cn-perf1/)是指cache的命中次數，我以為命中次數要愈高愈好，但在成大資工Wiki的Perf介紹中(http://wiki.csie.ncku.edu.tw/embedded/perf-tutorial)卻好像指cache-references降低是件好事？ >> 請回去讀書，計算機組織的課本寫得很清楚。人之所缺乏專業，往往就是活在「腦補」的世界中。 [name=jserv] >> 我google了有關cache的資料，做了以下的整理: >> * 由[參考1](http://nas.takming.edu.tw/pluto/architecture/ch2_6.pdf)得知，cache讀取命中稱為cache hit，讀取錯失稱為cache miss，又若將terminal中的cache-misses次數除以cache-references次數，剛好與terminal中顯示的百分比相同，由此推論cache references應不是cache的命中次數，而是CPU向cache要求資料的總次數。 >> * 由[參考1](http://nas.takming.edu.tw/pluto/architecture/ch2_6.pdf)的儲存系統階層可以得知，向cache取得資料時機為要找的資料不在CPU暫存器時，才會向cache找資料。反向推論，若cache-references次數較少，代表CPU需要的資料可能大多已存在暫存器中。 >> * 綜合上方兩者推論，成大資工WiKi的[Perf介紹](http://wiki.csie.ncku.edu.tw/embedded/perf-tutorial)中，利用存取的局部性修改程式，大幅降低cache-references次數，亦降低cache-misses次數，代表CPU所需要的資料大多存在CPU暫存器中，使得執行時間縮短為原本的三分之一。 * 當我執行`make run`時，發現append()的執行時間為findName()的15倍，但是進行perf找尋熱點時，findName()卻為熱點，明顯大於append()。想了解執行時間與CPU消耗週期的差異。 * 當我在執行`$ git push`時，發生permission denied，經過同學的幫忙，才發現我當時是從老師給的github網址直接clone下來。可以由`git remote -v`來得知本機的專案從哪裡clone下來，以及將會被push到何處，若欲修改，可先`$ git remote #代號`再`$ git remote add #代號`，最後`$ git push --set-upstream #代號 master` ## 待完成項目 ## 參考資料 1. [計算機組織與結構pdf - 德明科技大學資訊科技系](http://nas.takming.edu.tw/pluto/architecture/ch2_6.pdf) 2. [Linux 效能分析工具: Perf](http://wiki.csie.ncku.edu.tw/embedded/perf-tutorial) 3. [yanzijun同學的HackMD](https://hackmd.io/s/H1N0jwvp) 4. [c14006078同學的HackMD](https://hackmd.io/MYNgnCDsCsYEwFoAmInQQFngMwQQwCM5dI88BmAU3LnIEYlsAGIA) ###### tags: `bobsonlin`