2018q1 Homework1 (phonebook)

--- tags: linyunwen --- # 2018q1 Homework1 (phonebook) ###### tags: `linyunwen` contributed by <`LinYunWen`> - [D01: phonebook](https://hackmd.io/s/SykZ8IMOf) ### Reviewed by `nashi5566` * 後面提到名單已經排序的部份，之後有沒有打算將資料打散之後再嘗試呢？ * 已經修改過的新程式請趕快 push 上去並附上簡要的 commit message > < > 請問一下你可以看到的是 master branch 所以沒有看到 commit > 你可以切換 branch 來看到我實作的程式碼，謝謝你 > [name=yw] ### Reviewed by `raygoah` * 目前嘗試的各版本 code ，是以建立不同 branch 的方式，共存在 github 上，應該將所有 branch 合併為一個，也符合作業要求中只有一個 main 檔的要求 * 在 commit message 中有寫到做了什麼以及為什麼要怎麼做，算寫的滿清楚的，但建議可以再加入得到的結果以及實作後造成的影響 ### Reviewed by `afcidk` * 實做後記得要把 **TODO** 刪掉 * 在`feat/search_table` 裡的 `main.c`，發現第74行用到的`temp`變數在第54行就已經宣告了，這樣會不會增加日後更動程式時出現誤用變數的風險？ * 在 hash_table 的實做裡面，你提到 hash table size 是 4500。但在 `BKDRhash()` 裡面，實際上會得的返回值區間是 0～4095。是否可以比較 *使用位元運算子，浪費空間* 和 *使用 mod 運算子，省一些空間* 的效能差異？ * 在 [Summary](https://hackmd.io/s/B1X1lB8uz#Summary) 建立的表格讓之前的實驗結果一目了然，但是有些實驗並非著重在 cache miss rate 上，是否可以將 cache reference 一併列出？ :::info ## Envorinment * Architecture: x86_64 * CPU 作業模式： 32-bit, 64-bit * CPU(s): 4 * 每核心執行緒數：2 * CPU MHz： 2793.668 * CPU max MHz: 3400.0000 * CPU min MHz: 800.0000 * L1d 快取： 32K * L1i 快取： 32K * L2 快取： 256K * L3 快取： 3072K <br> > - **[在 Linux 下查詢硬體資訊的工具指令](https://blog.gtwang.org/linux/linux-hardware-information-command/)** > - <參考 : ktvexe [2017q1 Homework1 (phonebook)](https://hackmd.io/w5PPxJUwS6G32fbLDBhLiw?view)> ::: ## Setting > perf top > add "sudo" to make authorization > ```shell > $ cat /proc/sys/kernel/perf_event_paranoid > $ 3 > ``` > set perf_event_paranoid = 1 (default) with > ```sudo sh -c 'echo 1 > /proc/sys/kernel/perf_event_paranoid'``` > [superuser solution](https://superuser.com/questions/980632/run-perf-without-root-rights) >> Jeff Vander Stoep posted the patch on July 27. It adds a another value that can be set for the sysctl parameter (i.e. kernel.perf\_event\_paranoid=3) that restricts perf\_event\_open() to processes with the CAP\_SYS\_ADMIN capability. Currently, perf\_event\_paranoid is set to 2 by default, which disallows access to some perf features (raw tracepoint access, CPU event access, and kernel profiling) to processes without the proper capabilities; the patch does not change the default. He also submitted another [patch](https://lwn.net/Articles/696236/)that would allow configuring the kernel to make 3 be the default perf\_event\_paranoid value. >> [lwn.net](https://lwn.net/Articles/696216/) >> 終端機等文字訊息請直接將文字貼上，避免使用圖片 >> [color=red][name=vvn] >> 感謝你，我之後會注意的，不過其實這似乎是個 error ，所以我才直接將結果貼上，我會再想想怎樣呈現比較好 >> [name=yw] - reference: [CSIE wiki - perf](http://wiki.csie.ncku.edu.tw/embedded/perf-tutorial#%E5%AE%89%E8%A3%9D)、[perf 原理和實務](https://hackmd.io/s/B11109rdg) ## Original - execution time ```shell size of entry : 136 bytes execution time of append() : 0.060688 sec execution time of findName() : 0.005451 sec ``` - stats ```shell Performance counter stats for './phonebook_orig' (100 runs): 964,631 cache-misses # 79.394 % of all cache refs ( +- 0.35% ) 1,214,991 cache-references ( +- 0.38% ) 264,776,714 instructions # 1.25 insn per cycle ( +- 0.02% ) 212,053,930 cycles ( +- 0.43% ) 0.065508261 seconds time elapsed ( +- 0.77% ) ``` ![](https://i.imgur.com/15wxlCY.png) - findName ```c= /* original version */ entry *findName(char lastName[], entry *pHead) { while (pHead != NULL) { if (strcasecmp(lastName, pHead->lastName) == 0) return pHead; pHead = pHead->pNext; } return NULL; } ``` - append ```c= entry *append(char lastName[], entry *e) { /* allocate memory for the new entry and put lastName */ e->pNext = (entry *) malloc(sizeof(entry)); e = e->pNext; strcpy(e->lastName, lastName); e->pNext = NULL; return e; } ``` - 其他測試資料 - all-names.txt (16751筆) ```c++= fp = fopen("./dictionary/all-names.txt", "r"); int counter = 0; double total_time = 0.0; i = 0; while (fgets(input, sizeof(input), fp)) { while (input[i] != '\0') i++; input[i - 1] = '\0'; i = 0; /* compute the execution time */ clock_gettime(CLOCK_REALTIME, &start); // TODO: findName() clock_gettime(CLOCK_REALTIME, &end); cpu_time2 = diff_in_second(start, end); counter++; total_time += cpu_time2; } printf("execution time of findName() : %lf sec\n", total_time/counter); fclose(fp); ``` ```shell size of entry : 136 bytes execution time of append() : 0.043385 sec execution time of findName() : 0.003817 sec Performance counter stats for './phonebook_orig': 2,703,006,518 cache-references 2,349,397,909 cache-misses # 86.918 % of all cache refs 52.409814206 seconds time elapsed ``` ## Idea - Goal: 加速 ```findName``` ，要提昇 ```findName``` 的速度，要馬尋找較快速，要馬降低 cache miss ，因為當 CPU 有 cache miss ，就必須跟其他記憶體存取資訊，使得搜尋速度便慢，故主要實現加速的方式有以下兩個大方向 - **使用較快的演算法** 1. hash table 2. search table - **降低 cache miss** 1. 建立新的儲存結構 ## Implement ### 建立新的儲存結構 1. reduce the size of __PHONE_BOOK_ENTRY > branch "feat/reduce-size" - 比較接近使用者體驗與經驗狀況來看，我們在電話簿中搜尋時，主要以人名來做搜尋，當搜尋到對應的人物時，才會給予使用者點擊顯示此聯絡人更多的資訊。而這份程式碼 ```findName``` 亦是以 ```entry``` 中的 ```lastName``` 來做主要搜尋，因此可以發現 ```entry``` 並不需要這大量的空間存放全部的資訊。 - 創建一個新的 ```struct __INFORMATION``` 來存放其他的資料，而原本的 ```entry``` 中存放 ```lastName```, ```firstName```, 以及一個指向其他 ```information``` 的指標。 > <參考 : ktvexe [2017q1 Homework1 (phonebook)](https://hackmd.io/w5PPxJUwS6G32fbLDBhLiw?view)> - entry ```c=9 // phonebook_opt.h typedef struct __INFOMATION { char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; } info_entry; typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; info_entry *infomation; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` - execution time ```shell size of entry : 48 bytes execution time of append() : 0.040588 sec execution time of findName() : 0.002486 sec ``` - stats ```shell Performance counter stats for './phonebook_opt' (100 runs): 196,161 cache-misses # 41.025 % of all cache refs ( +- 0.63% ) 478,149 cache-references ( +- 0.63% ) 248,685,554 instructions # 1.54 insn per cycle ( +- 0.02% ) 161,622,141 cycles ( +- 0.47% ) 0.050508480 seconds time elapsed ( +- 0.61% ) ``` 在這邊可以看到原本的 136 bytes 以降低為 46 bytes，因此 cache miss rate 也大幅下降為 41 ％，所以 ```findName``` 的時間跟著下降， ```append``` 也是，因為所處理的資料量較少，整體速度都跟著變快 ![](https://i.imgur.com/MkG4PYw.png) - 測試其他資料利用 ./dictionary/all-names.txt 來做其他名字的搜尋，而已全部 16751 筆資料取平均得到以下的結果，可以看到搜尋速度跟 cache miss rate 也有明顯的下降 - all-names.txt (16751 筆) ```shell size of entry : 48 bytes counter: 16751 execution time of append() : 0.047555 sec execution time of findName() : 0.001276 sec Performance counter stats for './phonebook_opt': 573,715,877 cache-references 181,525,064 cache-misses # 31.640 % of all cache refs 21.444725248 seconds time elapsed ``` ### 更新演算法 1. Search table > brance "feat/search-table" 因為觀察到輸入檔 ```dictionary/words.txt``` 都是照著字母排列，如果說今天想要尋找的字串從第一個字就不相同，那就不應該繼續比較其他字，而是尋找下一行，但是這樣做也是增加多餘的尋找，如果建立一份每一個字母的第一筆資料是在哪一個位置再開始尋找，即可減少搜尋的次數 - execute time ```shell size of entry : 136 bytes execution time of append() : 0.057229 sec execution time of findName() : 0.000019 sec ``` - stats ```shell Performance counter stats for './phonebook_opt' (100 runs): 208,331 cache-misses # 49.684 % of all cache refs ( +- 1.06% ) 419,311 cache-references ( +- 2.06% ) 205,218,950 instructions # 1.12 insn per cycle ( +- 0.02% ) 183,614,971 cycles ( +- 0.86% ) 0.058406260 seconds time elapsed ( +- 1.19% ) ``` 在這邊可以看到，雖然資料結構的大小並沒有改變，仍然為 136 bytes ，但是 ```findName``` 搜尋時間和 cache miss rate 都有明顯的下降，推測原因就是建立 table 後減少了不必要的查詢，相對的也不會過度的載入不需要的資料，造成大量的 cache miss ![](https://i.imgur.com/vMFdmUC.png) - reduce size and search table ``` Performance counter stats for './phonebook_opt' (100 runs): 115,498 cache-misses # 47.900 % of all cache refs ( +- 1.01% ) 241,123 cache-references ( +- 1.09% ) 190,222,390 instructions # 1.42 insn per cycle ( +- 0.02% ) 133,871,200 cycles ( +- 0.48% ) 0.041614991 seconds time elapsed ( +- 0.55% ) ``` ![](https://i.imgur.com/te83Imq.png) - 測試其他測試值 - all-names.txt (16751筆) 的平均值 ``` size of entry : 48 bytes execution time of append() : 0.063503 sec execution time of findName() : 0.000050 sec Performance counter stats for './phonebook_opt': 494,896 cache-misses # 12.532 % of all cache refs 3,949,064 cache-references 0.932224762 seconds time elapsed ``` - 問題探討 1. 這個是針對資料集是按照字母排序才能使用，若是不規則的排序，這方法會是大錯特錯，如果真的要使用應該要配合先將資料做預處理的動作 2. 因為目前建立的 search table 只跟每筆資料的第一個字元有關，因此若是有一組資料集，每筆資料的字首皆相同，那這個搜尋方式的行為，會跟普通的線性搜尋一樣，而更進一步的方式是可以建立成跟字母相關類似字元樹的樹狀結構 table 可以更完全的做排除不必要搜尋 2. Hash table > branch "feat/hash-table" 跟 search table 有點類似，一樣是預先建立一個 table 但是這個表的建立方式是利用 hash 的方式來進行，而這個部份我是參考[各种字符串Hash函数比较](https://www.byvoid.com/zhs/blog/string-hash-compare)，然後選用 BKDRHash 方式，因為起來效果是最好的，並採用 seed = 131 ，比較困難的點是，考量到測試資料量有30多萬比，照理來說 hash table 應該要1萬以上，會比較妥當，但是我目前 hash 速度非常之慢，造成 ```append``` 反而上升兩倍，因此先將 hash table size 為 4500 ```c++= // 參考網路實作 BKDHash int BKDRHash(char *str) { int seed = 131; int hash_value = 0; while (*str) { hash_value = hash_value * seed + (*str++); } return hash_value & 0x00000FFF; } ``` ```c++= // 模仿 append() 實作 hash table 的 linear append hash_entry* hash_linear_append(char lastName[], entry *value, hash_entry *block) { strcpy(value->lastName, lastName); block->next = (hash_entry*) malloc(sizeof(hash_entry)); block = block->next; strcpy(block->lastName, lastName); block->value = value; block->next = NULL; return block; } ``` ```c++= entry *findName(char lastName[], hash_entry **table) { int value = BKDRHash(lastName); hash_entry *temp = table[value]; do { if (strcmp(temp->lastName, lastName) == 0) { return temp->value; } temp = temp->next; } while (temp != NULL); return NULL; } ``` 其資料結構大小並沒有變化，但是同樣的 cache miss rate 及 ```findName``` 亦大幅下降 ```shell Performance counter stats for './phonebook_opt' (100 runs): 400,557 cache-misses # 38.316 % of all cache refs ( +- 1.87% ) 1,045,407 cache-references ( +- 1.80% ) 339,438,608 instructions # 1.05 insn per cycle ( +- 0.02% ) 322,594,446 cycles ( +- 1.51% ) 0.101027885 seconds time elapsed ( +- 1.65% ) ``` - stats ![](https://i.imgur.com/s8QOGIM.png) - reduce and hash ```shell Performance counter stats for './phonebook_opt' (100 runs): 285,057 cache-misses # 33.311 % of all cache refs ( +- 2.50% ) 855,755 cache-references ( +- 1.73% ) 332,727,023 instructions # 1.32 insn per cycle ( +- 0.03% ) 251,489,583 cycles ( +- 1.51% ) 0.077913353 seconds time elapsed ( +- 1.69% ) ``` ![](https://i.imgur.com/H5QpYqT.png) - 測試其他資料 - all-names.txt (16751 筆) ```shell size of entry : 48 bytes execution time of append() : 0.092830 sec execution time of findName() : 0.000000 sec Performance counter stats for './phonebook_opt': 374,618 cache-misses # 38.882 % of all cache refs 963,477 cache-references 0.108437187 seconds time elapsed ``` ### Summary - 在表中描寫各項狀況的 ```append```、```findName```、cache miss rate 的時間和比率，在降低資料結構的大小以及套用快速的搜尋演算法後，跟原始線性搜尋相比，減少搜尋次數，可以顯著的降低 cache miss rate，但是相對的 ```append``` 可能因為建立特殊的資料結構而使時間上升，而一結果整體下來，降低資料結構大小，並配合使用 hash table searching 演算法效果最好 - 但是我自己對自己所實作的結果有所疑慮，因為就測試結果來看有點過於樂觀，但我目前卻觀察不到有什麼奇怪的問題點，所以只能先以此保守說明 | | original | reduce size | search table | hash table | search table (reduce size) | hash table (reduce size) | | -------- | -------- | -------- | ------- | -------- | ------ | ------- | | append | 0.050533 | 0.041713 | 0.057229 | 0.09508 | 0.039643 | 0.073392 | | findName | 0.00531 | 0.003453 | 0.000019 | 0.000002 | 0.000015 | 0.000001 | | cache miss rate | 79.894 % | 41.025 % | 49.684 % | 38.316 % | 47.900 % | 33.311 % | ## 問題回答 1. 本例選用的 dataset (也就是輸入電話簿的姓名) 是有代表性嗎？跟真實英文姓氏差異大嗎？ - 如果觀察一下測試用的 dataset (words.txt)，就可以明顯的發現，雖然資料數多，但是很多筆資料只是無意義的英文字母字串，而非一組英文字，甚或一組人名，而且整筆資料已被字母排序過。 2. 資料難道「數大就是美」嗎？ - 很明顯是錯誤的觀念。很多時候雖然我們需要大量的資料來做測試，以確保程式的正確性或是效能，但是更重要的是資料的質，因為有貼近實際發生狀況下的資料才能有效的測試出程式較可能出現的錯誤，猶如 "garbage in, garbage out" 3. 如果我們要建立符合真實電話簿情境的輸入，該怎麼作？ - 現在網路發達，已有許多建立好的資料，可以在網路上搜尋 English name，e.g. [Sample English Names](https://sites.google.com/site/eikenzwijn/my-english-friends/sample-of-english-names)、[English Baby Names](http://www.babynames.org.uk/english-baby-names.htm) ## [讀後感想 <因為自動飲料機而延畢的那一年>](https://hackmd.io/eP51LY7PSnSSLEOi8QF-Zg) ## Reference - [2017 年春季作業說明](https://hackmd.io/s/Bk-1zqIte) - [效能分析工具 Perf](http://mropengate.blogspot.tw/2016/04/perf.html)