2017q3 Homework1 (phonebook)

--- tags: 研究所 --- # 2017q3 Homework1 (phonebook) contributed by <twngbm> ## 系統環境 ``` Distributor ID: Ubuntu Description: Ubuntu 16.04.3 LTS Release: 16.04 Codename: xenial L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 8192K ``` ## 原始碼閱讀-opt.txt輸出問題 + 一開始用make plot的時候，畫出來的兩張圖長的一樣 + 資料夾裏面只產生一個orig.txt檔，預期中的opt.txt並沒有產生 + 進入main裏面看到 ```c=9 #ifdef OPT #define OUT_FILE "opt.txt" #else #define OUT_FILE "orig.txt" #endif ``` + 但是上面有沒有看到 OPT的define，而且phonebook.h也沒有被include進來 + 看到main 裏面的IMPL ```c=7 #include IMPL ``` + 再看到MakeFile ```c=24 phonebook_opt: $(SRCS_common) phonebook_opt.c phonebook_opt.h clean $(CC) $(CFLAGS_common) $(CFLAGS_opt) \ -DIMPL="\"$@.h\"" -o $@ \ $(SRCS_common) $@.c ``` + 根據網路上的資料，-DIMPL會把 main.c裏面的 IMPL替換成後面給定的值，因此上面的指令會變成 ``` cc -Wall -std=gnu99 -O0 \ -DIMPL="\"phonebook_opt.h\"" -o phonebook_opt \ main.c phonebook_opt.c ``` + 最後我們進入phonebook_opt.h裏面，我們可以發現 ```c=8 //#define OPT 1 ``` + 將其取消註解即可正常輸出opt.txt和正常plot ## Orig輸出 ``` size of entry : 136 bytes execution time of append() : 0.055486 sec execution time of findName() : 0.010031 sec ``` * 用gnuplot繪圖後(修改scripts/runtime.gp) ![](https://i.imgur.com/0r4f3Fc.png) * 接著用 ``` $ make cache-test Performance counter stats for './phonebook_orig' (100 runs): 464,1253 cache-misses # 93.015 % of all cache refs ( +- 0.14% ) 498,9790 cache-references ( +- 0.32% ) 2,6225,0077 instructions # 1.18 insn per cycle ( +- 0.02% ) 2,2253,6808 cycles ( +- 0.66% ) 0.076652806 seconds time elapsed ( +- 0.77% ) ``` 可以看到高達93.015%的cache misses ## 優化1-減少struct大小 + 看到phonebook_orig.h ```c= typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` :::info * 這裡定義了entry是一個linked list 的結構，每一個節點裡面都存有上面列出的所有資料(last name,first name,email...)。每一個節點的大小是136bytes。 * 我們一個list的結構大小是136bytes，我電腦的L1 cache 是32K，因此我頂多只能放32x1024/（136X8）=30.12，30個entry在cache裏面，但是我們的基數有35萬個，因此cache-miss一定會很常發生。（引用自[programming small](https://hackmd.io/s/SkfffA-jZ)） ::: * 嘗試將不需要處理的資訊移出原本的資料結構，並用一指標儲存 * 因為我們的搜尋是用lastname來去搜尋的，因此我把lastname獨立出來做一個結構，再用指標指向詳細資料的結構 ```c= typedef struct __PHONE_BOOK_DATA { char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; } pDATA; typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; struct __PHONE_BOOK_DATA *pDATA; struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` + 執行後我們發現entry的大小變成32bytes，因此可預期cache的內可存放的entry會變多。 ``` Performance counter stats for './phonebook_orig' (100 runs): 459,1314 cache-misses # 93.554 % of all cache refs ( +- 0.15% ) 490,7672 cache-references ( +- 0.17% ) 2,6084,8591 instructions # 1.26 insns per cycle ( +- 0.02% ) 2,0721,3607 cycles ( +- 0.39% ) 0.061480693 seconds time elapsed ( +- 0.82% ) Performance counter stats for './phonebook_opt' (100 runs): 151,5075 cache-misses # 67.984 % of all cache refs ( +- 0.40% ) 222,8564 cache-references ( +- 0.31% ) 2,4404,7516 instructions # 1.73 insns per cycle ( +- 0.02% ) 1,4083,7182 cycles ( +- 0.62% ) 0.039998826 seconds time elapsed ( +- 0.79% ) ``` 可以發現cashe-miss的情況減少了27% 從圖片中也可以發現append和find執行時間有所減少 ![](https://i.imgur.com/0Lhn0mn.png) ## 優化2-hash table * 希望能利用hash的方法來解決linked list的循序搜循的時間複雜度為O(n)的情況，利用hash表來解決這個問題 + 本例利用BKDR hash,seed 選131,bucket大小選擇是4093 1. 首先先引入hash function ```c=7 unsigned long hash(char *str) { unsigned int seed = 131; // 31 131 1313 13131 131313 etc.. unsigned int hash = 0; while (*str) { hash = hash*seed + (*str++); } hash &= 0x7FFFFFFF; return (hash %= SIZE); } ``` :::info + 這邊我們定義SIZE的大小為4093，SIZE的意義為Hash table的大小，同時也是hash bucket的數量 ::: 2. 修改main.c ```c=52 entry *hash_table_index[SIZE]; for (i = 0; i < SIZE; i++) { hash_table_index[i] = ( entry *) malloc(sizeof(entry)); } ``` :::info + 在這邊我建構 an array of pointer to struct entry + entry是利用優化1中，減小的data struct 來實作 + 建構完array之後，我就可以利用hash functin return 回來的key值，直接存取在array中的記憶體位址中的變數 + 此處中array所存放的變數為pointer to struct entry，因此我只要利用 hash_table_index[key]->lastName即可取出lastName資料 ::: 3. 建構append()和findName() + append() ```c=29 void *append(char lastName[], entry *hash_table_index[]) { int key = hash(lastName); entry *temp = hash_table_index[key]; hash_table_index[key] = (entry *) malloc(sizeof(entry)); strcpy(hash_table_index[key]->lastName,lastName); hash_table_index[key]->pNext = (temp == NULL) ? NULL : temp; return NULL; } ``` :::info + 為了解決collision的情況，因此我使用linked list來儲存同一個key的所有資料 + struet entry 中保留 struct ____PHONE_BOOK_ENTRY *pNext__ ::: :::warning + 原來的作法是，進入linked list之後，一一比較每個*pNext的值直到list的末端（pNext==NULL） + 耗時過長因此更改作法 ::: :::info + 後來更改作法為，直接在linked list 的頭，插入新的資料節點 + 將hash_table_indes[key]的pointer指向新的節點，將新的節點的pNext指向後方原來存在的linked list + 35行判斷此新增的資料結構是否為新的節點或是在bucket中已有其他資料結構，來判斷是否指向NULL或是原有的linked list ::: + findName() ```c=17 int findName(char lastName[], entry *hash_table_index[]) { unsigned long key = hash(lastName); entry *now = hash_table_index[key]; while (now->pNext != NULL) { if (strcasecmp(lastName, now->lastName) == 0) return 1; now = now->pNext; } return 0; } ``` :::info + 透過hash function算出key值後，透過array的存取來進入linked list中的第一個節點，並判斷字串是否相同，不相同則進入第二個節點 + 如果有找到則回傳1 ::: 5. 結果 + 首先先比較執行時間（原版,結構優化,Hash） ![](https://i.imgur.com/OMp7aBm.png) :::info 關於append time 1. 優化後的資料結構，在進行append的時候，因為資料結構較小，因此fetch進cache時，可以減少cache miss的機會，因此執行時間較快 2. hash雖然是使用優化後的資料結構，但是因為必須進行hash運算，所以整體的append時間約莫跟原版的一樣，但是還是比原版略快一點（某些情況下可以差到5%左右的時間）關於findName time 1. 優化後的資料結構，因為size較小，所以cache miss較少，固執行速度比原版快 2. 經過hash之後，search 的average time變成Θ(1)，雖然有linked list的Θ(n)存在，但是整體的搜尋時間已經變成線性的了，根據實測，每個bucket中的數量大概在80~100之間 3. 建構hash_table的缺點是，需要額外得記憶體空間來儲存hash table，本例使用hash table 內的儲存內容為pointer 因此額外的記憶體空間為8*SIZE bytes。 ::: + 再來比較cache miss ``` Performance counter stats for './phonebook_orig' (100 runs): 234,1706 cache-misses # 90.686 % of all cache refs ( +- 0.15% ) 258,2200 cache-references ( +- 0.21% ) 2,2097,8273 instructions # 1.26 insns per cycle ( +- 0.02% ) 1,7572,3132 cycles ( +- 0.55% ) 0.049907710 seconds time elapsed ( +- 1.03% ) Performance counter stats for './phonebook_opt_small_structure' (100 runs): 85,3654 cache-misses # 72.777 % of all cache refs ( +- 0.38% ) 117,2979 cache-references ( +- 0.37% ) 2,0419,5625 instructions # 1.64 insns per cycle ( +- 0.02% ) 1,2425,1403 cycles ( +- 0.44% ) 0.034413972 seconds time elapsed ( +- 0.56% ) Performance counter stats for './phonebook_opt_hash' (100 runs): 54,8297 cache-misses # 68.200 % of all cache refs ( +- 0.37% ) 80,3950 cache-references ( +- 1.07% ) 2,4281,2531 instructions # 1.66 insns per cycle ( +- 0.02% ) 1,4605,8299 cycles ( +- 0.41% ) 0.040509766 seconds time elapsed ( +- 0.50% ) ``` :::info hash中整體的cache miss影響在於bucket 數量的影響，實測過當bucket=32時，cache misses rate比bucket=4096時高了5%左右 ::: ## QA :::warning 本例選用的 dataset (也就是輸入電話簿的姓名)有代表性嗎？跟真實英文姓氏差異大嗎？ ::: :::info 本例子選用的資料，與真實英文姓名相差頗大。但是作為phonebook的dataset，必須考慮有人可能在新增資料時，就不注重名字的正確性，有些人可能只是想要儲存一個自己能夠看懂的資料進去。 ::: :::warning 資料難道「數大就是美」嗎？ ::: :::info 如果資料庫的大小越大，那麼我們就需要更符合資料庫結構的演算法來幫助我們儲存資料，甚至必須考慮，針對不同的search key去建立不同的資料結構，來符合快速的搜尋目的。 ::: :::warning 如果我們要建立符合真實電話簿情境的輸入，該怎麼作？ ::: :::info 1. 如果是像本案例，由一排序過的檔案進行輸入，則可以使用hash來進行append 2. 如果是隨機輸入，則可以使用TST來進行append，並利用system idle time來進行balance。 ::: ### Reference + https://hackpad.com/ep/pad/static/xWkbsmaWVEn + https://hackmd.io/s/rJCs_ZVa#優化版本2-–-hash-function + http://blog.techbridge.cc/2017/01/21/simple-hash-table-intro/ + https://www.byvoid.com/zht/blog/string-hash-compare

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.