# 2016q3 Homework 1 (phonebook) contributed by <`vic85821`> ### Reviewed by `jkrvivian` * 優化方法一(使用union),已改變程式行為,因此無法和其他優化方法比較。 * 沒有評析binary search tree搜尋演算法對findName()的影響 * 已提出使用hash function優化搜尋速度,還未實作,也還未比較不同hash function間的效能 * 沒有採用不同的data set實驗,因原本使用的資料與現實差異很大 * [commit 2b10484](https://github.com/vic85821/phonebook/commit/2b10484241bc2033b40f7a0e433072ddabf25966)的commit訊息很模糊,並不知道具體做了甚麼修改 ## 開發環境 * Ubuntu 16.04.1 LTS >不小心就把windows洗掉了...[name=vic85821] * Linux version 4.4.0-38-generic * CPU : Intel(R) Core(TM) i5-3230M CPU @ 2.60GHz * MEM : 8GB * L1d cache : 32K * L1i cache : 32K * L2 cache : 256K * L3 cache : 3072K > 一併列出硬體規格,像是 CPU processor, cache memory 等資訊,這樣日後實驗重現才有依據 [name=jserv] 因為我的硬體設備比較特別,linux抓不到我的無線網卡,因此要額外安裝驅動程式 * `sudo lshw -C network` 查詢網卡產品名稱 : **MT7630e** * 到 [網站](https://github.com/neurobin/MT7630E) 抓取驅動程式 * `cd`到該檔案所在的目錄 ```shell $ sudo chmod +x install $ sudo ./install ``` ### perf `$ perf list` ``` List of pre-defined events (to be used in -e): branch-instructions OR branches [Hardware event] branch-misses [Hardware event] bus-cycles [Hardware event] cache-misses [Hardware event] cache-references [Hardware event] cpu-cycles OR cycles [Hardware event] instructions [Hardware event] ref-cycles [Hardware event] stalled-cycles-frontend OR idle-cycles-frontend [Hardware event] alignment-faults [Software event] bpf-output [Software event] context-switches OR cs [Software event] cpu-clock [Software event] cpu-migrations OR migrations [Software event] dummy [Software event] emulation-faults [Software event] major-faults [Software event] minor-faults [Software event] page-faults OR faults [Software event] ``` **範例效能評析測試** ![](https://i.imgur.com/V1iKAyz.png) >第一次使用perf,還不是很熟悉 試了很多次 ` perf top -p $pid `才成功!! [name=vic8521] [time=Sat, Sep 24, 2016 5:17 PM] >>「程式輸出的文字訊息」請勿使用圖片表示,請改由文字貼上[name=課程助教] ### gnuplot >之前大二在程式語言,曾經對matrix multiplication,以不同的方法執行(暴力解, 施特拉森演算法, Concurrency),並對效能進行分析 [color=#dd75c5] ![](https://i.imgur.com/CHY8PrS.png) ### astyle 用來檢查coding style是否符合 `astyle --style=kr --indent=spaces=4 --indent-switches --suffix=none *.[ch]` * indent=spaces=4 縮排=4個空白 ## 案例分析 : phonebook ### 作業說明 更改程式碼以降低cache-miss的比例 ### 猜想 - [ ] 更改資料結構,減少不常使用的儲存空間 - [ ] 更改搜尋的方法 ### 未優化版本(origin) **struct** ```c typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` * 資料結構冗長 --- **findName** ```c entry *findName(char lastname[], entry *pHead) { while (pHead != NULL) { if (strcasecmp(lastname, pHead->lastName) == 0) return pHead; pHead = pHead->pNext; } return NULL; } ``` * 方法效率低(如果不是,一直往下找直到null) --- **append** ```c entry *append(char lastName[], entry *e) { /* allocate memory for the new entry and put lastName */ e->pNext = (entry *) malloc(sizeof(entry)); e = e->pNext; strcpy(e->lastName, lastName); e->pNext = NULL; return e; } ``` --- ### 執行結果 `make cache-test` ``` size of entry : 136 bytes execution time of append() : 0.043322 sec execution time of findName() : 0.005194 sec ``` ``` Performance counter stats for './phonebook_orig' (100 runs): 2,037,607 cache-misses # 94.069 % of all cache refs ( +- 0.38% ) 2,166,081 cache-references ( +- 0.32% ) 261,123,688 instructions # 1.36 insns per cycle ( +- 0.02% ) 191,871,256 cycles ( +- 0.85% ) 0.070512191 seconds time elapsed ( +- 1.71% ) ``` --- ### 分析 main.c的一段程式碼 ```c e = pHead; /* the givn last name to find */ char input[MAX_LAST_NAME_SIZE] = "zyxel"; e = pHead; assert(findName(input, e) && "Did you implement findName() in " IMPL "?"); assert(0 == strcmp(findName(input, e)->lastName, "zyxel")); ``` 可以發現都是去找"zyxel",並且findname()都是根據lastname去搜尋 - 資料結構的精減化 --- ### 優化版本 **1.資料結構優化** * 版本1 ```c typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; union { char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; }; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` 執行結果 `make cache-test` ``` size of entry : 40 bytes execution time of append() : 0.072287 sec execution time of findName() : 0.003676 sec ``` ``` Performance counter stats for './phonebook_opt' (100 runs): 389,500 cache-misses # 66.630 % of all cache refs ( +- 0.33% ) 584,572 cache-references ( +- 1.12% ) 243,952,419 instructions # 1.72 insns per cycle ( +- 0.02% ) 141,447,587 cycles ( +- 0.83% ) 0.050986039 seconds time elapsed ( +- 1.70% ) ``` cache-miss 原本 `94.069%` -> 優化後 `66.630%` >老師上課也有說過,單純的簡化資料結構,也可以達到最佳化!!是真的!! [color=#dd75c5] * note : union會自動對齊內部包含的變數中,size最大的空間(但**對齊會有個最小值。e.g. 4/8byte**,仍有可能造成空間浪費) >> "example" 的簡寫是 [e.g.](https://en.wiktionary.org/wiki/e.g.),不是 "ex",後者是拿來描述前女友、前東家等詞彙 >> 這段程式碼已經改變預期行為了,作最佳化不能改變正確性。 [name=jserv] --- * 版本2 ```c typedef struct _PHONE_BOOK_DETAIL { char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; } detail; typedef struct _PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; detail* detailDate; struct _PHONE_BOOK_ENTRY *pNext; } entry; ``` 執行結果 `perf stat -r 100 -e cache-misses,cache-references,L1-dcache-load-misses,L1-dcache-store-misses,L1-dcache-prefetch-misses,L1-icache-load-misses ./phonebook_opt ` ``` size of entry : 24 bytes execution time of append() : 0.070298 sec execution time of findName() : 0.003760 sec ``` ``` Performance counter stats for './phonebook_opt' (100 runs): 252,673 cache-misses # 67.435 % of all cache refs ( +- 0.44% ) (64.12%) 374,694 cache-references ( +- 0.82% ) (64.44%) 713,048 L1-dcache-load-misses ( +- 3.34% ) (65.51%) 240,697 L1-dcache-store-misses ( +- 1.05% ) (70.62%) 522,278 L1-dcache-prefetch-misses ( +- 3.61% ) (73.19%) 87,401 L1-icache-load-misses ( +- 3.72% ) (67.82%) 0.046339100 seconds time elapsed ( +- 1.24% ) ``` ![](https://i.imgur.com/SZNNRUp.png) > 備註:這部份都尚未修改findname() append(),而是使用phonebook_orig.c 裡的function [color=#9707bf] --- **2.更改搜尋方法** * 版本1 binary search tree ``` +-----+ |node | +----+ +----+ +-------+-----+ +----+ |node| |node| | | |node| +---> +-------+ +--------+ +--v-+ +-v--+ +----+ | +----> | | +---> |node| |node| +--v--+ | | +-----------+ +----+ |node | +--v-+ +--v-+ | | +-----+ |node| |node| +-v--+ +-v--+ +----+ +----+ |node| |node| +----+ +----+ ``` 對資料結構稍做更動,增加 **bst** ```c #define OPT 1 #define BST 1 typedef struct _PHONE_BOOK_DETAIL { char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; } detail; typedef struct _PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; detail *detailDate; struct _PHONE_BOOK_ENTRY *pNext; } entry; typedef struct _PHONE_BOOK_BST { struct _PHONE_BOOK_ENTRY *data; struct _PHONE_BOOK_BST *left; struct _PHONE_BOOK_BST *right; } bst; bst *findName(char lastname[], bst *root); entry *append(char lastName[], entry *e); bst *build_bst(entry** pHead,int num); #endif ``` 建立binary search tree雖然findname()時間變快許多,但是一開始還要額外花時間在建樹 ```c //build binary search bst *build_bst(entry** pHead,int num) { if(num == 0) return NULL; bst *root,*left; //recursive build left leaf left = build_bst(pHead, num>>1); // build root(center node) root = (bst *) malloc(sizeof(bst)); root -> data = (entry *)malloc(sizeof(entry)); strcpy((root->data)->lastName, (*pHead)->lastName); root->left = left; // build right left entry *tmp = *pHead; *pHead = (*pHead)->pNext; free(tmp); root->right = build_bst(pHead, num-(num>>1)-1); return root; } ``` 執行結果 `perf stat -r 100 -e cache-misses,cache-references,L1-dcache-load-misses,L1-dcache-store-misses,L1-dcache-prefetch-misses,L1-icache-load-misses ./phonebook_opt` ```shell size of entry : 32 bytes execution time of append() : 0.080784 sec # 包含建樹,可以看出用了許多時間 execution time of findName() : 0.000001 sec ``` ``` Performance counter stats for './phonebook_opt' (100 runs): 555,542 cache-misses # 85.642 % of all cache refs ( +- 0.51% ) (65.22%) 648,682 cache-references ( +- 0.47% ) (66.16%) 1,059,861 L1-dcache-load-misses ( +- 0.47% ) (67.19%) 772,671 L1-dcache-store-misses ( +- 0.35% ) (68.97%) 314,794 L1-dcache-prefetch-misses ( +- 0.66% ) (68.96%) 89,415 L1-icache-load-misses ( +- 1.78% ) (66.29%) 0.100239647 seconds time elapsed ( +- 2.05% ) ``` ![](https://i.imgur.com/MwcupNB.png) * ==cache miss (85.642%)== 沒有下降的很明顯,可能原因為 * struct bst 有額外的`bst *left,*right`,在建樹的時候使用遞迴,造成大量cache miss --- ## Makefile 之前很少用到makefile,如果有也只是簡單打幾行compile的指令,第一次看到如同這次的Makefile ``` CC ?= gcc CFLAGS_common ?= -Wall -std=gnu99 CFLAGS_orig = -O0 CFLAGS_opt = -O0 EXEC = phonebook_orig phonebook_opt all: $(EXEC) SRCS_common = main.c phonebook_orig: $(SRCS_common) phonebook_orig.c phonebook_orig.h $(CC) $(CFLAGS_common) $(CFLAGS_orig) \ -DIMPL="\"$@.h\"" -o $@ \ $(SRCS_common) $@.c phonebook_opt: $(SRCS_common) phonebook_opt.c phonebook_opt.h $(CC) $(CFLAGS_common) $(CFLAGS_opt) \ -DIMPL="\"$@.h\"" -o $@ \ $(SRCS_common) $@.c run: $(EXEC) echo 3 | sudo tee /proc/sys/vm/drop_caches watch -d -t "./phonebook_orig && echo 3 | sudo tee /proc/sys/vm/drop_caches" cache-test: $(EXEC) perf stat --repeat 100 \ -e cache-misses,cache-references,instructions,cycles \ ./phonebook_orig perf stat --repeat 100 \ -e cache-misses,cache-references,instructions,cycles \ ./phonebook_opt output.txt: cache-test calculate ./calculate plot: output.txt gnuplot scripts/runtime.gp calculate: calculate.c $(CC) $(CFLAGS_common) $^ -o $@ .PHONY: clean clean: $(RM) $(EXEC) *.o perf.* \ calculate orig.txt opt.txt output.txt runtime.png ``` ## 待完成的項目 - [ ] 用hash table 做findNmae()的最佳化 - [ ] 學會GDB!!!! - [ ] 熟悉 Makefile的語法 ## 參考資料 * [louielu HackMD](https://hackmd.io/CYJgZgHA7AnADAFgLQGMCmaCGSEGYCsyMARjPkgIwhS4gT4VwogJA===) * [楊靜妃學姐 Hackpad](https://embedded2015.hackpad.com/note-hw2-phonebook-3mUOSk5J9cu)