2018q1 Homework1 (phonebook)

2018q1 Homework1 (phonebook) === contributed by < `poorpoorQQ` > ### Reviewed by `LinYunWen` * 中文和英數字間應該多留一個空白 * 問題成因我認為有另一個原因是，因為整體資料量有超過 35 萬筆，在搜尋時， cache 往往要不斷載入資料，但是空間上可以很明顯看到 L1 也只有 32K ，因此能存放的數量相比之下非常小，所以才會產生極高的 cache miss rate，故降低 entry size 可以減少 cache miss rate，這部分你可以多做些說明 * 複製 terminal 上的文字在貼上時，也可以使用 ```ctrl+shift+V``` ，這樣間距就不會這麼大 * 對結果數據可以在做多一點分析，說明為什麼為這樣，或是觀察到甚麼，另外可以嘗試不同的測試資料做搜尋，以及對這幾種方法的 cache miss rate 也做出一個表來比較 ### Reviewed by `AliennCheng` * 可以嘗試不同資料輸入，多方比較算法效率和輸入是否有關 * 釋放空間的問題來自原始碼本身，[原出處已更新](https://www.facebook.com/groups/system.software2018/permalink/375496642916504/) * 若已完成 TODO 所要求之事項，可將 TODO 移除 * AVL Tree 的部分，程式結束前沒有進行空間清除的工作，可能導致 memory leak ## 系統狀態 ``` Distributor ID: Ubuntu Description: Ubuntu 17.10 Release: 17.10 Codename: artful Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 Thread(s) per core: 2 Core(s) per socket: 2 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 42 Model name: Intel(R) Core(TM) i3-2310M CPU @ 2.10GHz Stepping: 7 CPU MHz: 2093.315 CPU max MHz: 2100.0000 CPU min MHz: 800.0000 BogoMIPS: 4186.63 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 3072K ``` ## 研究目標 1. 減少cache-miss 2. 降低搜尋時間(findName) 3. 降低讀資料時間(append) ## 問題成因 cache裡面沒有想要的資料，造成需要向主記憶體搬資料與效率下降。當CPU在向主記憶體搬資料時，會預期目標資料附近的其他資料在未來有較高的機會使用到，因此會一併搬到cache裡。因cache空間極為有限，若一次搬運的資料沒有充分的運用，造成CPU頻繁地向主記憶體索取資料，cache-miss rate上升，整體效能下降。 ## 解決方向眼前主要目標，減少搬運次數進而減少搜尋時間。減少搬運次數可以從兩方面著手：精準鎖定特定搜索區域、增加一次可以搬運的數量。 ## 前人資料回顧增加一次可以搬運的數量：只保留結構指標與lastName，增加cache能夠容納的資料筆數。精準鎖定特定搜索區域：BST、HASH table 1. 只保留結構指標與lastName 參考[`ryanwang`](https://hackmd.io/s/Sk-qaWiYl)的做法，先用註解的方式拿掉其他資料，看看執行效果如何。為單純比較此方法的效果，findname與append函式解與_orig版本相同。 --- * 結構大小比較 ``` _orig size of entry : 136 bytes execution time of append() : 0.088583 sec execution time of findName() : 0.007309 sec _opt size of entry : 24 bytes execution time of append() : 0.072469 sec execution time of findName() : 0.003780 sec ``` --- * cache-test比較 ``` Performance counter stats for './phonebook_orig' (100 runs): 1,456,386 cache-misses # 68.524 % of all cache refs ( +- 0.05% ) 2,125,369 cache-references ( +- 0.09% ) 285,089,301 instructions # 1.24 insn per cycle ( +- 0.02% ) 229,770,943 cycles ( +- 0.08% ) Performance counter stats for './phonebook_opt' (100 runs): 92,041 cache-misses # 26.275 % of all cache refs ( +- 0.14% ) 350,297 cache-references ( +- 0.25% ) 253,528,730 instructions # 1.62 insn per cycle ( +- 0.02% ) 156,277,248 cycles ( +- 0.12% ) ``` --- * 執行時間比較 ![smallStrucTime](https://i.imgur.com/oiGiXeb.png) ## 提出想法既然字典資料是已排序的又僅限於a到z的英文字母，因此向多建立一個26的元素的指標陣列去著手： 1. 將字典中每個字母的第一個字串的結構指標存入指標陣列中，搭配BST使用。搜尋時直接由該字母的指標當作root開始搜尋 ## 實作首先實作AVL樹，由於沒有學過此種資料結構，先參考[`演算法技術手冊, 2/e`](https://www.tenlong.com.tw/products/9789864762637)裡面針對AVL樹的說明。之後搜尋關鍵字"AVL tree C impelement"，參考[`這裡`](https://rosettacode.org/wiki/AVL_tree/C)的範例程式並實作之。實作成果如下： LinkList ``` size of entry : 136 bytes execution time of append() : 0.082714 sec execution time of findName() : 0.007317 sec Performance counter stats for './phonebook_orig' (100 runs): 1,472,928 cache-misses # 68.955 % of all cache refs ( +- 0.04% ) 2,136,073 cache-references ( +- 0.07% ) 285,313,585 instructions # 1.24 insn per cycle ( +- 0.02% ) 229,768,835 cycles ( +- 0.06% ) 0.112038957 seconds time elapsed ( +- 0.24% ) ``` AVL tree ``` execution time of append() : 0.416098 sec execution time of findName() : 0.000000 sec execution time of append() : 0.412337 sec execution time of findName() : 0.000001 sec Performance counter stats for './phonebook_opt' (100 runs): 97,834 cache-misses # 18.158 % of all cache refs ( +- 0.17% ) 538,805 cache-references ( +- 0.82% ) 1,252,267,068 instructions # 1.43 insn per cycle ( +- 0.00% ) 873,624,462 cycles ( +- 0.08% ) 0.420647749 seconds time elapsed ( +- 0.10% ) ``` * 執行時間比較 ![AVL tree](https://i.imgur.com/9P764D7.png) ## 結果比較： | | Orig | 縮小entry | AVL樹 | | -------- | -------- | -------- | -------- | | append time | 0.082714 s | 0.072469 s | 0.412337 s | | findfile time | 0.007317 s | 0.003780 s | 0.000001 s | | cache miss rate | 68.955 % | 26.275 % | 18.158 % | 我的電腦的cache miss rate不管在何種形況下，好像都比其他同學小。不知道會不會是因為這顆CPU已經是九年前的產品，所以設計跟行為跟現在的CPU不太一樣。 ## 程式碼修改理解運作原理之後，實作上並無太大的困難，修改的地方大約為三部分。 1. entry結構的修改 ```clike= typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; PhoneBookData *data; int height; struct __PHONE_BOOK_ENTRY *pNext[2]; } entry; ``` 參考[`ryanwang`](https://hackmd.io/s/Sk-qaWiYl)的做法，將沒用到的資料移至其他結構，並由*data指向。 int height用以儲存節點高度。指標陣列 *pNext[]用以指向子樹。 2. main中初始化第一筆entry的部分 ```clike= #ifdef OPT entry *pHead = NULL, *e = NULL; #else /* build the entry */ entry *pHead, *e; pHead = (entry *) malloc(sizeof(entry)); printf("size of entry : %lu bytes\n", sizeof(entry)); e = pHead; e->pNext = NULL; #endif ``` 說明：因AVL樹在實作上採用遞迴來建立節點，所以配置第一個節點在遞迴函式內實行，所以不須先初始化第一個entry。 ```clike= #ifdef OPT insert(&pHead, line); //printf("InOne:%s\n", line); #else e = append(line, e); #endif ``` 說明：insert函式不須回傳值，故不使用原來的append()。 ```clike= #ifdef OPT #else if (pHead->pNext) free(pHead->pNext); free(pHead); #endif return 0; ``` 說明：因為結構不同，原本的釋放功能會有問題(munmap_chunk() invalid pointer)，原因尚未釐清，目前只查到常發生在釋放空間的時候。AVL版本的釋放空間功能尚未完成。 3. 存取NULL指標造成segmentation fault ```clike= if (n->pNext[0] == NULL) return -(n->pNext[1]->height); else if (n->pNext[1] == NULL) return n->pNext[0]->height; else return n->pNext[0]->height - n->pNext[1]->height; ``` 這是花費最多時間DEBUG的部分。參考網站宣稱程式是可執行的，但是經過繁瑣的DEBUG之後，發現有存取到NULL都要另外處理。這裡只列出其中一部分。