<style> h2.part{color:#0099B0;} h3.part{color:#D92424;} h4.part{color:#005BB0;} h5.part{color:#FD6F0A;} h6.part{color:#4400B0;} </style> # 2016q3 Homework1 (phonebook) contributed by <`f5120125`> ###### tags: `sysprog` >>請依照指定格式填寫標題和"contributed by" [name=課程助教] ### Reviewed by `GreenYo0413` - 使用二元搜尋數來當作資料結構是個不錯的想法,但產生分枝的方法可以再詳述,像是使用strcmp還是自己定義的hash function。 >>已新增二元搜尋樹和雜湊函數[name=f5120125] ### 開發環境 #### Ubuntu 14.04 LTS ``` $ lscpu ``` - CPU: Intel® Core™ i5 CPU 650 @ 3.20GHz × 4 - Mem: 8 GiB - Cache: L1d cache: 32 KB L1i cache: 32 KB L2 cache: 256 KB L3 cache: 4096 KB ### 前置作業 #### Github ``` $ git branch <branch name> $ git checkout <branch name> $ git add . $ git commit -m "my commit" $ git pull $ git push origin <branch name> ``` #### Gnuplot ``` reset set ylabel 'time(sec)' set style fill solid set title 'perfomance comparison' set term png size 1024,512 enhanced font 'Verdana,11' set output 'runtime.png' plot [:][:0.08]'output.txt' using 2:xtic(1) with histogram title 'original', \ '' using ($0-0.15):($2+0.001):2 with labels title ' ', \ '' using 3:xtic(1) with histogram title 'small-struct' , \ ``` #### Perf 參考[[1]](http://wiki.csie.ncku.edu.tw/embedded/perf-tutorial)和[[2]](http://mropengate.blogspot.tw/2016/04/perf.html)的文章所列出幾個常用指令學習perf ### ==列出所有可以觸發perf採樣點的EVENT:== ``` $ perf list ``` - 採樣點類型 1. [Hardware event] 2. [Software event] 3. [Hardware cache event] 4. [Kernel PMU event] ### ==提供指定程序執行的概括情況== - 更詳細的說明 ``` $ perf stat --help ``` - 以phonebook_orig為例: ``` $ perf stat ./phonebook_orig ``` - 取得概略資訊: ``` pid: 6196 size of entry : 136 bytes execution time of append() : 0.057027 sec execution time of findName() : 0.009792 sec Performance counter stats for './phonebook_orig': 89.716798 task-clock (msec) # 0.994 CPUs utilized 18 context-switches # 0.201 K/sec 0 cpu-migrations # 0.000 K/sec 12,357 page-faults # 0.138 M/sec 291,019,350 cycles # 3.244 GHz (82.23%) 167,571,318 stalled-cycles-frontend # 57.58% frontend cycles idle (82.19%) 98,915,774 stalled-cycles-backend # 33.99% backend cycles idle (64.47%) 272,060,112 instructions # 0.93 insns per cycle # 0.62 stalled cycles per insn (82.27%) 46,888,250 branches # 522.625 M/sec (86.55%) 1,274,939 branch-misses # 2.72% of all branches (85.12%) 0.090283053 seconds time elapsed ``` ### ==察看整體系統負載:== ``` $ sudo perf top ``` - 更詳細的說明 ``` $ perf top --help ``` ### ==小結== 往後可以根據==觸發採樣點的EVENT類型==, 去得知需要的資訊 - 例如, 若想知道cache miss在整體系統的狀況 ``` $ perf top -e cache-misses ``` ## 案例分析 - Phonebook ### 目標 - **減少Cache miss** - **提升效能** #### ►根據[提示](https://hackmd.io/s/S1rbwmZ6)可以減少==Cache miss==或==提升效能==的修改方向: - [x] 改用較小的結構 - [x] 改用二元搜尋 - [x] 使用雜湊函數 ### 1. 未優化版本 #### ►原本的結構 ==**(共計 136 bytes)**== - 16+16+16+10+10+16+16+16+2+5=123 compiler做alignment後有128-byte 最後加上pointer的size(在64-bit中為8-byte):128+8=136 ```clike= #define MAX_LAST_NAME_SIZE 16 #define OPT 1 typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pNext; }detail; ``` ### 2. 優化版本 - ==使用較小的結構== #### ►新增結構 ==**(只有32 bytes)**== ```clike= typedef struct __LAST_NAME{ char lastName[ MAX_NAME_SIZE ]; struct __LAST_NAME *pNext; struct __PHONE_BOOK_ENTRY *dNext; }entry; ``` #### ►執行時間 ``` size of entry : 32 bytes execution time of append() : 0.034547 sec execution time of findName() : 0.003680 sec ``` - [ ] 尚未釐清 - 16+16+16+10+10+16+16+16+2+5=123 compiler做alignment後有128-byte 最後加上pointer的size(在64-bit中為8-byte):128+8=136 在Makefile中已經將最佳化關掉,為什麼還是會有alignment的最佳化 上網查到關閉alignment的方法為使用關鍵字:__attribute__((packed)); ### 3. 優化版本 - ==使用二元搜尋== #### ►採用二元搜尋樹的原因 | Binary Search Tree | Average | Worst | | ---------- | -------------- | ------| | **space** | $\Theta(n)$ | $O(n)$| | **search** | $\Theta(log~n)$| $O(n)$| | **insert** | $\Theta(log~n)$| $O(n)$| | **delete** | $\Theta(log~n)$| $O(n)$| - 由上表可得知再==一般情況==下==搜尋==資料可以在 ==$\Theta(log~n)$== #### ►如何建立二元搜尋樹 - [x] Top-down appraoch ==$O(nlog~n)$== - 由root開始一筆一筆insert資料來建立二元搜尋樹 - [x] [Bottom-up approach](http://www.geeksforgeeks.org/sorted-linked-list-to-balanced-bst/) ==$O(n)$== - 由於Input為 **sorted list**. 藉由已知的資料集, 我們可從leaf開始建立二元搜尋樹 #### ►二元搜尋樹的實作 ##### Tree node ```clike= typedef struct _PHONE_BOOK_BST { struct __LAST_NAME *data; struct _PHONE_BOOK_BST *left; struct _PHONE_BOOK_BST *right; }BST; ``` ##### Search function - 假設str1為欲在樹中搜尋的字串, str2為目前探訪節點的字串 - 利用[strcasecmp](https://linux.die.net/man/3/strcasecmp)或者[strncasecmp]() **```strcasecmp(const char *str1, const char *str2)```** **```strncasecmp(const char *str1, const char *str2, size_t len)```** - 皆是 ==**case insensitive**== - 若 str1 和 str2 相同, 回傳值==0 - 若 str1 > str2, 回傳值>0 (往 right child 探訪) - 若 str1 < str2, 回傳值<0 (往 left child 探訪) ##### Build tree ###### ♦如何建立*balanced* tree structure - 當root為資料集中間數, 樹的結構能夠較balanced, 所以以n筆sorted list來看, 我們將中間數放在root, 高度預期為 $\Theta(log~n)$ ###### ♦從sorted list中的何處開始建立subtree到完成整棵binary search tree - 下列為從資料集中挑出的已排序資料 ```graphviz digraph linkedList { rankdir=LR; node [shape=record]; edge [tailclip=false]; a [label="{ <data> aaron | <ref> }"] b [label="{ <data> abdulkarim | <ref> }"]; c [label="{ <data> burton | <ref> }"]; d [label="{ <data> cob | <ref> }"]; e [label="{ <data> desmund | <ref> }"]; f [label="{ <data> dru | <ref> }"]; g [label="{ <data> elijoh | <ref> }"]; NULL [shape=box]; a:ref:c -> b:data [arrowhead=vee, arrowtail=dot, dir=both]; b:ref:c -> c:data [arrowhead=vee, arrowtail=dot, dir=both]; c:ref:c -> d:data [arrowhead=vee, arrowtail=dot, dir=both]; d:ref:c -> e:data [arrowhead=vee, arrowtail=dot, dir=both]; e:ref:c -> f:data [arrowhead=vee, arrowtail=dot, dir=both]; f:ref:c -> g:data [arrowhead=vee, arrowtail=dot, dir=both]; g:ref:c -> NULL [arrowhead=vee, arrowtail=dot, dir=both]; } ``` - 預期建立的binary search tree ```graphviz digraph G { nodesep=0.8; ranksep=1; {node[style=invis,label=""]; __root_cob__; } {node[style=invis, label="", width=.1]; __dru_children__; __abdulkarim_children__; } {rank=same; abdulkarim; dru; __root_cob__} {rank=same; aaron; burton; __abdulkarim_children__} {rank=same; desmund; elijah; __dru_children__} cob -> abdulkarim; cob -> dru; abdulkarim -> aaron; abdulkarim -> burton; dru -> desmund; dru -> elijah; {edge[style=invis]; //Distantiate nodes cob -> __root_cob__; abdulkarim -> __root_cob__ -> dru; //Force ordering between children dru -> __dru_children__; desmund -> __dru_children__ -> elijah; abdulkarim -> __abdulkarim_children__; aaron -> __abdulkarim_children__ -> burton; } } ``` ##### Algorithm ```clike= BST *sortedListToBSTRecur(entry** node, int len) { /* base step: if len is zero, then return NULL */ //build left subtree recursively BST *left = sortedListToBSTRecur(node, len>>1); /* allocate root here and set its attributes */ /* Move the node to the next in this recursion step. When this step returns, the previous recursion step will get the next node */ *node = (*node)->pNext; //build right node root->right = sortedListToBSTRecur(node, len-(len>>1)-1); return root; } ``` #### ►分析演算法 ##### step 1 - 遞迴建立左節點, 且因sorted list從第一筆開始, 其資料為最小, 要將其放入正確位置則必須深入binary tree的高度, 因此長度會在每次遞迴時除以2,直到碰到NULL ##### step 2 - allocate節點並放入資料 ##### step 3 - 因為已經處理完目前的資料, 要將指標移向sorted list的下一筆資料 ##### step 4 - 遞迴建立右節點 ##### Time Complexity - 由於每個節點探訪一次並馬上allocate空間並放入資料, 其時間複雜度為 ==$O(n)$== #### ►執行時間 ``` size of entry : 32 bytes size of tree node : 24 bytes execution time of append() : 0.071578 sec execution time of findName() : 0.000019 sec ``` ### 4. 優化版本 - ==使用雜湊函數== - 如何[實作hash table](https://www.tutorialspoint.com/data_structures_algorithms/hash_data_structure.htm) - 定義TABLE_SIZE - 實作hash_function去取得key值 - 實作hash_append建立hash table - 實作hash_find_name去找尋給定的last name - HashTable ```clike= typedef struct _HASH_TABLE_{ int len; HashItem *next; }HashTable; ``` - HashItem ```clike= typedef struct _HASH_ITEM_{ char lastName[ MAX_LAST_NAME_SIZE ]; int key; struct _HASH_ITEM_ *next; }HashItem; ``` - hash_append 我將新增的item插入slut前端, 每個slut有自己的linked-list. 由於file是由小排序到大讀取, 因此linked-list會是由大到小 - 估計append的時間和建立linked-list差不多 - 由於每個slut有自己的linked-list. 若資料分佈均勻, findName時間可以下降 - 執行時間 ``` size of hash item : 32 bytes execution time of append() : 0.049486 sec execution time of findName() : 0.000014 sec ``` ### 5. ==runtime==的比較圖 ![image alt](http://i.imgur.com/U8OKQZH.png)