2018q1 Homework1 (phonebook)

# 2018q1 Homework1 (phonebook) ###### tags:`shaimejump520` contributed by < `shaimejump520` > ### Reviewed by `bauuuu1021` * 修正小小筆誤 * 提出大家都沒想過的問題並實際做實驗驗證蠻不錯的，但因此沒實作到 Hash function 及 BST 有點可惜 --- ## 開發環境 * OS: ubuntu 16.04 LTS * Architecture: x86_64 * CPU op-mode(s): 32-bit, 64-bit * Byte Order: Little Endian * CPU(s): 4 * On-line CPU(s) list: 0-3 * Thread(s) per core: 2 * Core(s) per socket: 2 * Socket(s): 1 * NUMA node(s): 1 * Vendor ID: GenuineIntel * CPU family: 6 * Model: 60 * Model name: Intel(R) Core(TM) i5-4200H CPU @ 2.80GHz * Stepping: 3 * CPU MHz: 2793.471 * CPU max MHz: 3400.0000 * CPU min MHz: 800.0000 * BogoMIPS: 5586.94 * Virtualization: VT-x * L1d cache: 32K * L1i cache: 32K * L2 cache: 256K * L3 cache: 3072K * NUMA node0 CPU(s): 0-3 ## 開始前準備 * Linux 其實已經使用 ubuntu 好一陣子了，但都沒有認真使用過 vim 看了wiki上的[GNU/Linux 開發工具共筆](https://hackmd.io/c/rJKbX1pFZ)的[終端機，Vim 設定](https://hackmd.io/c/rJKbX1pFZ/https%3A%2F%2Fhackmd.io%2Fs%2FHJv9naEwl) 篇，才發現vim除了簡單的修改外觀、縮排、行數等等之外，還有那麼多可以玩的東西修改vimrc將實用的操作 mapping 到特定按鍵後，再裝了[Minimalist Vim Plugin Manager](https://github.com/junegunn/vim-plug)，vim 使用起來方便了許多，其中最實用的應該是 NERDTree吧，切換檔案速度快上了多，也可以更簡單明瞭的管理檔案 * 剩下的之後再補充吧... ## 基本操作 * 清空快取 ``` $ echo 1 | sudo tee /proc/sys/vm/drop_caches ``` 當 /proc/sys/vm/drop_caches 的值被設定為1時，是表示要求 Linux 釋放沒在使用的一般性快取（pagecache），而設為2時，則代表要求釋放 dentry 與 inode 所使用到的快取，若設為3則是釋放所有的快取（也就是包含1與2的狀況） * 執行執行一次會分別顯示 append() 及 findName() 的執行時間 ``` $ sudo ./phonebook_orig size of entry : 136 bytes execution time of append() : 0.044309 sec execution time of findName() : 0.005483 sec ``` 使用 makefile 中定義的 cache-test 會執行100次，並計算平均的 cache-misses 等等資訊 ``` $ sudo make cache-test . . . Performance counter stats for './phonebook_orig' (100 runs): 1,261,006 cache-misses # 86.324 % of all cache refs ( +- 0.24% ) 1,460,781 cache-references ( +- 0.18% ) 263,966,596 instructions # 1.28 insn per cycle ( +- 0.02% ) 205,881,476 cycles ( +- 0.14% ) 0.063045433 seconds time elapsed ( +- 0.99% ) ``` 如上，此次的 cache miss 機率高達 86.324% 使用 gnuplot 將資料圖像化以便比較 ![](https://i.imgur.com/L2WHTcL.png) ## 問題分析 * 提示可改進方向 * 更改 entry 的 size 節省空間，應可使 cache miss 下降 * Hash function，有效加快搜尋速度至 O(1) * Binary Serch Tree，也應能加快搜尋速度至 O(logn) ### 優化一：從Structure著手 [Github link](https://github.com/shaimejump520/phonebook/commit/7e9ccc6ef307e30a55bccfc1524630a2b3e70929) * entry 中無用的資料結構太多，造成記憶體負擔 ```clike= typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` 實際上在 findName() 中僅需搜尋 lastName，並不需傲用到其他的資料，然而該 structure 定義之下足足使用的 136 bytes 的記憶體空間，這在老師的影片中與前賢們的筆記裡都有提到： > 優化前版本的 Structure size = 136 bytes ，而我的 layer1 cache 也只有 32k bytes 頂多也只能放 32 * 1024 / 136 = 240.94 筆資料,在查找將近 35 萬筆資料時，無可避免的會發生大量 cache-miss，縮小體積後的 struct size = 32 bytes，能放 1024 筆資料，應該能有效減少 cache-miss。[name=Ryan Wang] [2017q1 Homework1 (phonebook) ](https://hackmd.io/iUY7i3c3Ty-GXXQs70Krmw?both) 各位先輩們的解決方法都大同小異，另外新定義一個 structure - detailEntry 將原本搜尋時使用不到的資料定義在其中，而原本的 entry 只存放 lastName 與指向 detailEntry 的 pointer 及 linked-list 所需的 pNext pointer ，如此一來將 entey size 降低到 32 bytes ，應會使 cache-miss rate 下降。 ```clike= typedef struct __PHONE_BOOK_DETAIL_ENTRY { char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; } detailEntry; typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; detailEntry *detail; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` 執行結果如下： ![](https://i.imgur.com/4Steqjd.png) cache-miss rate 也下降到 39.735% > 構想中的效能改進有達到，然而以上都是重複先賢們的實驗而已，在此過程中我特別注意到，大家都可以明瞭的說原始 structure 設計 size 為 136 bytes，而我卻對此疑惑不已，這也是我進行以下探討與實驗的緣由 [name=shaimejump520] ### 問題討論 -- Structure Size * 背景知識 * size of char = 1 bytes * size pointer = 4 or 8 bytes ( pointer 中儲存的是 memory address，所以在不同位元的電腦中，其 size 應該不相同，我的開發環境是 64 bits，所以應該為 8 bytes) 小測試： ```clike= int main() { char *ptr; printf("%d\n", sizeof(ptr)); return 0; } ``` 輸出結果：8 * Size 問題：原始結構中，若將各資料量大小相加，結果應為 131 bytes，然而 sizeof(entry) 的輸出值卻是 136，我為此感到蠻納悶的，後來發現有 Alignment 這件事，因為事關 memory access 的效能問題，所以 compiler 傾向將資料對齊，此舉能提升資料存取的效率 ```clike= typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` 與 131 最接近的 8 的倍數正好是 136，所以我猜測 compiler 會自己將資料對齊至 8 bytes 也就是 64 bits * 驗證 -- 猜測錯誤，並非總是對齊 8 bytes 為了證實上面的猜測，我做了個小測試： ```clike= typedef struct { char a; char b; } s; int main() { printf("%d\n", sizeof(s)); return 0; } ``` 輸出結果是：2 所以我原先的猜測是錯誤的，並非總是對齊 8 bytes * 經過多次實驗，發現會根據排列順序對下方的宣告型態做對齊 ```clike= typedef struct { char a; //對齊b,所以佔 4bytes int b; } s; typedef struct { char a; //對齊b,所以佔 4bytes int b; //a與b總共佔 8bytes，正好與 pointer 對齊 int c; //對齊ptr,所以佔 8bytes int *p; } t; int main() { printf("%d ", sizeof(s)); printf("%d\n", sizeof(t)); return 0; } ``` 輸出結果為：8 24 圖形解說：（空白對齊所需空置space） structure s : | a | | | | | - | - | - | - | | b | b | b | b | structure t : | a | | | | b | b | b | b | | - | - | - | - | - | - | - | - | | c | c | c | c | | | | | | p | p | c | p | p | p | p | p | >[color=red]表格（3,3) 應為 p 而不是 c [name=bauuuu1021] * 有趣的是，不同的排列順序其 size 可能不同 ```clike= typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; struct __PHONE_BOOK_ENTRY *pNext; //將pNext移動至此宣告 char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; } entry; ``` 輸出：size of entry : 144 bytes（與原先的 136 bytes 不同） Q：為什麼 compiler 不將資料順序 reorder 再做對齊，以取得最小的空間使用呢？ A：因為 compiler 無法確認 struct 是何種儲存結構 (for example: a foreign library, a file on disc, network data, CPU page tables, ...)，compiler 並不知道這些結構的編碼（是以哪種方式儲存的），如過貿然將資料重排，可能產生與原本不相同的結構，造成資料的不一致。舉例來說，ZIP file的儲存結構跟其壓縮方式有關，若任意的將資料 reorder 將導致資料遺失或者是錯誤。[參考資料](https://stackoverflow.com/questions/9486364/why-cant-c-compilers-rearrange-struct-members-to-eliminate-alignment-padding) * 結論 --- 適當的排列 structure 結構也能減少程式執行時的空間浪費