2017q1 Homework1 (phonebook)

# 2017q1 Homework1 (phonebook) contributed by < `henry0929016816` > ### Reviewed by `illusion030` * 可以嘗試使用不同的 hash function 來實作並比較 * 沒有回答作業要求中的問題 ### Reviewed by `petermouse` * 需考量不同 hash table 大小對搜尋造成的影響 * 可以考慮使用樹狀結構 ( binary search tree , red–black tree 等) 來實作，並與其他方法一同分析比較 * Git commit [b9e5c2d](https://github.com/henry0929016816/phonebook/commit/b9e5c2dad108e0956d91d4121fe14653a15172e9) 標題不需指出檔名，標題首字需大寫 * 要考慮到 `malloc()` 失敗的情形 ### Reviewed by `SeanCCC` * 為何make cache-test需樣sudo權限? ## 安裝 ubuntu 16.04 [acer windows8.1 安裝雙系統](https://hackmd.io/BwNgDAhgnJILQCYDMATMcAsBGBAjOuSAZgKxxLZ7BYoxIDGQA===#) ## 開發環境 ``` os: ubuntu 16.04 LTS Architecture: x86_64 CPU 作業模式： 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 每核心執行緒數：2 每通訊端核心數：2 Socket(s): 1 NUMA 節點： 1 供應商識別號： GenuineIntel CPU 家族： 6 型號： 60 Model name: Intel(R) Core(TM) i5-4210M CPU @ 2.60GHz 製程： 3 CPU MHz： 799.963 CPU max MHz: 3200.0000 CPU min MHz: 800.0000 BogoMIPS: 5188.12 虛擬： VT-x L1d 快取： 32K L1i 快取： 32K L2 快取： 256K L3 快取： 3072K NUMA node0 CPU(s): 0-3 ``` 利用 ==lscpu== 指令取得電腦內部的資訊 >請列出硬體相關資訊，如：cache, memory >[color=red][name=課程助教] >抱歉給的資訊太少，已補上相關資訊[name=henry0929016816] ## 預備知識 - [ ]perf 的使用方式 - [ ]gnuplot 畫圖 - [ ]cache ## 效能分析-優化前 * 執行 phonebook_orig ``` ./phonebook_orig ``` 得到 findName() 跟 append() 所需的執行時間 ``` size of entry : 136 bytes execution time of append() : 0.040664 sec execution time of findName() : 0.005231 sec ``` * 找出 cache-misses 率 ``` sudo make cache-test ``` 發現 cache-misses 率高達==84.546%== ``` Performance counter stats for './phonebook_orig' (100 runs): 970,646 cache-misses # 84.546 % of all cache refs ( +- 0.34% ) 1,148,065 cache-references ( +- 0.27% ) 262,101,748 instructions # 1.44 insn per cycle ( +- 0.02% ) 181,954,743 cycles ( +- 0.18% ) 0.063477558 seconds time elapsed ( +- 0.65% ) ``` ## 優化方式 ### 方法一: 改變 struct 結構 * #### 結構太大造成的瓶頸認真檢視 phonebook_orig.c 的程式碼 ```c= entry *findName(char lastName[], entry *pHead) { while (pHead != NULL) { if (strcasecmp(lastName, pHead->lastName) == 0) return pHead; pHead = pHead->pNext; } return NULL; } entry *append(char lastName[], entry *e) { /* allocate memory for the new entry and put lastName */ e->pNext = (entry *) malloc(sizeof(entry)); e = e->pNext; strcpy(e->lastName, lastName); e->pNext = NULL; return e; } ``` 將會發現 append() 跟 findName() 函式使用到的資料只有 __PHONE_BOOK_ENTRY 裡的 lastName，其他的資訊幾乎都沒有用到，卻佔了一大部分的空間。我的電腦 leve 1 cache 有32 kbyte，而 __PHONE_BOOK_ENTRY 結構的大小為 136 bytes，==( 32 * 1024 ) / ( 136 ) = 240.941==。只考慮當下只有這支程式在跑的情況下，將所有的 __PHONE_BOOK_ENTRY 結構都丟進 findName() 裡去執行，我的level 1 cache 也只能儲存 240.941 個 entry，查找的資料一但過多，勢必會發生頻繁的 cache miss。 * #### 突破瓶頸修改 phonebook_opt.h ```c= typedef struct __PHONE_BOOK_ENTRY { // char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pNext; } entry_detail; //只存放lastName,讓struct的大小降低 typedef struct __LAST_NAME_ENTRY{ char lastName[MAX_LAST_NAME_SIZE]; entry_detail *detail; struct __LAST_NAME_ENTRY *pNext; }entry; ``` 新增一個資料結構 __LAST_NAME_ENTRY 用來儲存 lastname ，至於其他的資訊，等到有需要再 ==malloc== entry_detail 去儲存。使得需要儲存的結構降為 32 byte ，照著剛才的算法 ==( 32 * 1024 ) / ( 32 ) = 1024==，按照同樣的假設條件，只有這支程式再跑，我們可以發現比剛才還要多的 entry，期許能降低 cache-miss。 * #### 實驗結果 * 利用==sudo make cache-test== 觀看兩支程式的cache miss * phonebook_orig ``` Performance counter stats for './phonebook_orig' (100 runs): 628,561 cache-misses # 74.442 % of all cache refs ( +- 2.65% ) 844,360 cache-references ( +- 1.67% ) 262,444,912 instructions # 1.21 insn per cycle ( +- 0.02% ) 216,675,317 cycles ( +- 1.36% ) 0.135132961 seconds time elapsed ( +- 1.88% ) ``` * phonebook_opt ``` Performance counter stats for './phonebook_opt' (100 runs): 112,042 cache-misses # 47.039 % of all cache refs ( +- 0.61% ) 238,189 cache-references ( +- 2.55% ) 244,613,756 instructions # 1.39 insn per cycle ( +- 0.02% ) 175,806,100 cycles ( +- 1.67% ) 0.107832504 seconds time elapsed ( +- 2.03% ) ``` 發現 cache-miss 的確是有降低的 * #### 附圖說明 ![](https://i.imgur.com/fl66SIJ.png) cache-miss 率的降低，影響了執行時間，可推測 cache-miss 較低的程式，執行速度比較快。 * 由於電腦不是高效能模式，執行速度有點慢，透過指令調整為高效能模式 * ==cpupower frequency-set --governor performance== * 得到新的執行時間圖 ![](https://i.imgur.com/ILTAbeZ.png) ### 方法二：建立 hash function * #### 問題：建立的資料表，無法快速查到資料以下為原本的 append 函式 ```c= entry *append(char lastName[], entry *e) { /* allocate memory for the new entry and put lastName */ e->pNext = (entry *) malloc(sizeof(entry)); e = e->pNext; strcpy(e->lastName, lastName); e->pNext = NULL; return e; } ``` 建立的資料表為 singly-linked list，因此每次搜尋資料都必須從 linked list 的頭開始往下尋找，如果想要查的資料不幸位在 linked list 末端，那將會浪費許多時間在走訪無關的節點。 * #### 想法既然建立的資料表，無法快速查到想要的資訊，那我們就重新建表，使得資料以特定的原則==分門別類==，查詢資料時，只要知道資料在哪個類別，就能以較快的方式尋得資料。 * #### 分類的方式 : hash function 參考[hash table 之 bkdrhash算法解析及擴展](http://blog.csdn.net/wanglx_/article/details/40400693)，我們將字串依照數學運算，求得值，當作是資料的==類別==，兩個字串如果擁有相同的值，則用 linked list 串接在一起，方便搜尋資料時，只要求得字串的值，就能知道該往哪個類別搜尋資料，不用在從頭到尾辛辛苦苦找資料了。 * #### 實作網路上有很多 hash function 的實作方式，我選擇使用 bkdrhash 。以下為實作函式。 `phonebook_hash.c` ```c= unsigned int bkdr_hash(char* key) { char* str = key; unsigned int seed=31; unsigned int hash=0; while(*str!='\0') { hash=hash * seed +(*str++); } /*PRIME_NUMBER=4793*/ hash%=PRIME_NUMBER; return hash; } ``` 假設將字串 abc 丟進此函式，參照 ascii code a 為 97，得到這個資料我們可以計算，==(97~a~ * 31^2^ + 98~b~ * 31^2^ + 99~c~ * 31^0^) % 4793 = 20.10==，得到的值則當作分類用的 index。我們可以期許透過這個方式，加速 findName() 的執行時間 * #### 實際結果 - cache miss降低透過hash function 我們得到了比第一個優化方式還要少的 cache miss。以下為兩者的 cache miss 資料改變 struct 結構 ``` Performance counter stats for './phonebook_opt' (100 runs): 127,322 cache-misses # 45.177 % of all cache refs ( +- 0.90% ) 281,831 cache-references ( +- 0.89% ) 244,515,172 instructions # 1.95 insn per cycle ( +- 0.02% ) 125,228,080 cycles ( +- 0.30% ) 0.043359649 seconds time elapsed ( +- 1.24% ) ``` hash function ``` Performance counter stats for './phonebook_hash' (100 runs): 187,364 cache-misses # 36.710 % of all cache refs ( +- 1.23% ) 510,394 cache-references ( +- 0.56% ) 256,299,139 instructions # 1.53 insn per cycle ( +- 0.02% ) 167,345,086 cycles ( +- 0.29% ) 0.057692378 seconds time elapsed ( +- 1.06% ) ``` * #### 附圖說明 ![](https://i.imgur.com/APU9xFV.png) append() 時間的增加，是因為建構資料時多了計算 hash function 的時間，然而這犧牲的時間卻換取了 findName() 極高的效能提升。 ## 參考資料 * [ktvexe 的 hackmd](https://hackmd.io/MYZgnCwGbALAtGArCADPWAjAhgRnpgOwAmY8UAbMJqsLgBxb2ZA=#) * [jkrvivian 的 hackmd](https://hackmd.io/s/SyOQgOQa#) * [perf 原理和實務](https://hackmd.io/s/B11109rdg#) * [Programming Small](https://hackmd.io/KYYwzAjAHATAJgVgLQEMQDMxICwIcJATgHZhikowE4YA2AI2xnqgSA==#) * [cache 原理和實驗](https://hackmd.io/s/S14L26-_l#) * [Linux 系統下如何查看 CPU 型號、核心數量、頻率和溫度](https://magiclen.org/linux-view-cpu/) * [高效能電腦的程序優化](http://linux.vbird.org/linux_enterprise/cputune.php) * [釋放與清除 cache memory](http://gwokae.mewggle.com/wordpress/2010/06/%E9%87%8B%E6%94%BE%E8%88%87%E6%B8%85%E9%99%A4-linux%E8%A8%98%E6%86%B6%E9%AB%94%E4%B8%AD%E7%9A%84cache-memory/) * [hash table 之 bkdrhash算法解析及擴展 ](http://blog.csdn.net/wanglx_/article/details/40400693) * [gnuplot 設定長條圖並標記數值](http://gmd20.blog.163.com/blog/static/1684392320146163444900/)