2016q3 Homework1 (phonebook)

# 2016q3 Homework1 (phonebook) contributed by <`abba123`> >>請依照格式填寫標題和"contributed by"[name=課程助教] ### Reviewed by HuangWenChen * 底下 Create_hashtable 地方程式碼亂掉須修改 * github 上建議不用將編譯後執行產生的檔案.txt push 上去 ## 開發環境 OS : ubuntu 16.04.1 LTS 這個作業主要是要研究如何降低 cache miss 現在就讓我們來看看電腦的 cache 長什麼樣子吧 ``` $ lscpu L1d 快取： 32K L1i 快取： 32K L2 快取： 256K L3 快取： 6144K ``` 我只擷取了 cache 的部份簡單的來說越上層的 cache 存取的速度越快 L1>L3 --- 為了讓我們 coding 更舒服我們來修改一下 vim 裡面的設定順便複習一下 vim 的操作吧 [鳥哥的 linux 私房菜](http://linux.vbird.org/linux_basic/0310vi.php) $vim ~/.vimrc 打上自己要的設定之後下次在用 vim 就會生效啦 ``` set hlsearch set backspace=2 set autoindent set nu syntax on ``` ## 案例分析 Phonebook 目的：分析電話簿搜尋程式，探討 cache miss 對於整體效能有顯著影響在分析前當然要跑一次啦先把我們的檔案編譯 $ make phonebook_orig 用 perf 來做分析 $ perf stat -r 10 -e cache-misses,cache-references,L1-dcache-load-misses,L1-dcache-store-misses,L1-dcache-prefetch-misses,L1-icache-load-misses ./phonebook_orig ``` Performance counter stats for './phonebook_orig' (10 runs): 1,216,859 cache-misses # 85.464 % of all cache refs ( +- 1.08% ) 1,423,831 cache-references ( +- 0.77% ) 2,599,531 L1-dcache-load-misses ( +- 0.21% ) <not supported> L1-dcache-store-misses <not supported> L1-dcache-prefetch-misses 33,887 L1-icache-load-misses ( +- 10.02% ) 0.062661439 seconds time elapsed ( +- 6.23% ) ``` >無法查看 L1-dcache-store-misses L1-dcache-prefetch-misses ? >>「程式輸出的文字訊息」請勿使用圖片表示，請改由文字貼上[name=課程助教] 這邊是把程式做10次來取一個平均可以發現尚未優話的版本 cache miss 的機率很高這樣表示在 cache 裡面幾乎找不到我們所需要的資料必須往下到 memory 裡面去找花費相當大的時間經過閱讀 [Programming Small](https://hackmd.io/s/S1rbwmZ6)之後有了大概的方向來減少 cache miss 的發生 * 簡化 Struct 增加 cache 的存放數 * 實做 Hash function 降低 findName() 執行速度 * 構想中... ### 簡化 Struct 先來看一下我們原本儲存資料的結構 ```c= typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` 根據 findName() ，其實我們只需要 lastName 這項資訊這表示我們存了太多不會用到的資訊我們把除了 lastName 的其餘資訊搬到別的空間做儲存並加入一個指標做連結這樣可以大幅簡少 struct 的大小 ```c= typedef struct __PHONE_BOOK_DETAIL { char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_DETAIL *pNext; } detail; typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; detail *book_detail; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` 我們來執行一次優化版本吧 $ make phonebook_opt $ perf stat -r 10 -e cache-misses,cache-references,L1-dcache-load-misses,L1-dcache-store-misses,L1-dcache-prefetch-misses,L1-icache-load-misses ./phonebook_opt ``` Performance counter stats for './phonebook_opt' (10 runs): 167,123 cache-misses # 43.034 % of all cache refs ( +- 2.28% ) 388,348 cache-references ( +- 1.65% ) 1,279,994 L1-dcache-load-misses ( +- 0.18% ) <not supported> L1-dcache-store-misses <not supported> L1-dcache-prefetch-misses 24,808 L1-icache-load-misses ( +- 10.19% ) 0.040618060 seconds time elapsed ( +- 8.60% ) ``` 我們可以看到，經由我們的優化，改變 struct，cache miss 大幅下降 ``` size of entry : 136 bytes execution time of append() : 0.037243 sec execution time of findName() : 0.005767 sec ``` ``` size of entry : 32 bytes execution time of append() : 0.028887 sec execution time of findName() : 0.001861 sec ``` entry 大幅縮減，可以放進cache的entry更多，所以cache miss 的機率下降，也導致我們的執行速度上升 Gnuplot ```shell $ make plot ``` ![](https://i.imgur.com/hJ0wStW.png) 不論是 append() findName() 執行時間都下降了呢！ --- ### Hash Function 接下來為了減少 findname() 搜尋的時間我們來實做一下 hash table 我們所採用的是 [djb2](http://www.cse.yorku.ca/~oz/hash.html) 這個 hash function ```c= int hash(char *str) { unsigned long hash=5381; while(*str != '\0') hash = (hash<<5)+ *str++; return hash%42737; } ``` 接下來是建立一個 hash table ```c= typedef struct hash_table { entry **table; } hash_table; ``` ```c= hash_table *Create_hashtable(hash_table *t,int size) { int i=0; t->table=(entry **)malloc(sizeof(entry)*size); for(i=0; i<size; i++) { t->table[i]=NULL; } return t; } ``` 去修改一下 find() append() 我們直接把經過 hash 轉換的字串，直接丟到相對應的 table 裡面 ```c= entry *append(char lastName[], hash_table *e) { /* allocate memory for the new entry and put lastName */ int x=hash(lastName); entry *node; node = (entry *) malloc(sizeof(entry)); node->pNext=e->table[x]; e->table[x]=node; strcpy(node->lastName, lastName); return 0; } ``` 先找到相對應的 table，再去裡面一個一個尋找 lastName 是否相等 ``` entry *findName(char lastName[], hash_table *pHead) { int x=hash(lastName); entry *e; e=pHead->table[x]; while (e != NULL) { if(strcasecmp(e->lastName,lastName)==0) return e; e = e->pNext; } return NULL; } ``` 來比對一下我們優化的結果吧 ``` Performance counter stats for './phonebook_opt_hash' (100 runs): 103,901 cache-misses # 30.510 % of all cache refs ( +- 0.93% ) 340,545 cache-references ( +- 0.23% ) 237,127,330 instructions # 1.62 insns per cycle ( +- 0.02% ) 146,186,435 cycles ( +- 0.38% ) 0.041308486 seconds time elapsed ( +- 0.40% ) ``` 經過加入 hashtable，可以發現 cache miss 又再次降低而且findName()幾乎不用時間了但 append() 的時間卻往上升了一點點 ![](https://i.imgur.com/vgQhmHe.jpg) 雖然hash的時間上多了一點，但由於phonebook最主要的功能是在查詢所以假如是指 append 一次，findName 1000 次的話，差距就會很顯著 ![](https://i.imgur.com/SEemONq.png) ## FUTURE WORK * 字串壓縮 * binary search tree