2017q1 Homework1 (phonebook)

# 2017q1 Homework1 (phonebook) contributed by <`F74021200`> >中英文字間請以空白隔開 >進度落後太多請快快追上! >[name=課程助教][color=red] >好的! > [name=F74021200] ## 開發環境電腦規格： ``` tinin@tinin:~$ lscpu Architecture: x86_64 CPU 作業模式： 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 每核心執行緒數：2 每通訊端核心數：2 Socket(s): 1 NUMA 節點： 1 供應商識別號： GenuineIntel CPU 家族： 6 型號： 58 Model name: Intel(R) Core(TM) i5-3230M CPU @ 2.60GHz 製程： 9 CPU MHz： 2994.030 CPU max MHz: 3200.0000 CPU min MHz: 1200.0000 BogoMIPS: 5188.16 虛擬： VT-x L1d 快取： 32K L1i 快取： 32K L2 快取： 256K L3 快取： 3072K NUMA node0 CPU(s): 0-3 ``` linux kernel: ``` tinin@tinin:~$ uname -a Linux tinin 4.8.0-39-generic #42~16.04.1-Ubuntu SMP Mon Feb 20 15:06:07 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux ``` ## Perf top ``` $./phonebook_orig & sudo perf top -p $! ``` ``` 39.16% libc-2.23.so [.] __strcasecmp_l_avx 31.70% phonebook_orig [.] findName 7.24% [kernel] [k] release_pages 5.56% phonebook_orig [.] strcasecmp@plt 5.45% [kernel] [k] unmap_page_range 3.60% [kernel] [k] nmi 1.83% libc-2.23.so [.] __strcasecmp_avx 1.82% [kernel] [k] common_perm_cond 1.82% [kernel] [k] mem_cgroup_update_lru_size 1.81% [kernel] [k] kmem_cache_free 0.01% [kernel] [k] native_iret 0.00% [kernel] [k] native_write_msr ``` 消耗 CPU 周期最多的部份為函式 findName() ## Phonebook 效能 * ### cache miss ``` $ ./phonebook_orig ``` ``` size of entry : 136 bytes execution time of append() : 0.048713 sec execution time of findName() : 0.005059 sec ``` ``` sudo sh -c " echo 0 > /proc/sys/kernel/kptr_restrict" ``` 執行 ``` perf stat --repeat 100 -e cache-misses,cache-references,instructions,cycles ./phonebook_orig ``` ``` Performance counter stats for './phonebook_orig' (100 runs): 2,096,128 cache-misses # 94.483 % of all cache refs ( +- 0.20% ) 2,218,535 cache-references ( +- 0.16% ) 262,467,890 instructions # 1.32 insn per cycle ( +- 0.02% ) 199,293,309 cycles ( +- 0.57% ) 0.065489355 seconds time elapsed ( +- 0.69% ) ``` 由上圖知， cache miss 高達 94.800% 檢視程式與資料結構發現，在搜尋 phonebook 時，僅使用到 lastName ，因此，透過縮小 struct，使能加入 cache 的資料 struct 增加，從而降低 cache miss 發生的機率。 ```click= typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; /* char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; */ struct __PHONE_BOOK_ENTRY *pNext; } entry; ``` 修改後 ``` $ ./phonebook_opt ``` ``` size of entry : 24 bytes execution time of append() : 0.044088 sec execution time of findName() : 0.002203 sec ``` 執行時間縮短了！ ``` perf stat --repeat 100 -e cache-misses,cache-references,instructions,cycles ./phonebook_opt ``` ``` Performance counter stats for './phonebook_opt' (100 runs): 253,406 cache-misses # 71.468 % of all cache refs ( +- 0.26% ) 354,574 cache-references ( +- 0.93% ) 241,179,130 instructions # 1.80 insn per cycle ( +- 0.02% ) 133,748,026 cycles ( +- 0.67% ) 0.043806218 seconds time elapsed ( +- 0.69% ) ``` 由上圖知， cache miss 發生次數下降了 ![](https://i.imgur.com/ymnXGMh.png) * ### Hash function 使用 hash function 增加搜尋速度 In phonebook_opt.h ```click= struct phonebook { Entry *Data[HASH_TABLE_SIZE]; func_t findName; read_t readFile; }; ``` In phonebook_opt.c ```click= static Entry *findName_impl(char lastName[], Phonebook **self) { Entry *e; e = ((*self)->Data)[BKDRHash(lastName) % HASH_TABLE_SIZE]; while (e) { if (strcasecmp(lastName, e->lastName) == 0) return e; e = e->pNext; } return NULL; } static void readFile_impl(FILE *fp, Phonebook **self) { char line[MAX_LAST_NAME_SIZE]; int i = 0; Entry *e; while (fgets(line, sizeof(line), fp)) { while (line[i] != '\0') i++; line[i - 1] = '\0'; i = 0; e = (*self)->Data[BKDRHash(line) % HASH_TABLE_SIZE]; while(e->pNext) e = e->pNext; e = append(line, e); } } Entry *append(char lastName[], Entry *e) { e->pNext = (Entry *) malloc(sizeof(Entry)); e = e->pNext; strcpy(e->lastName, lastName); e->pNext = NULL; return e; } int init_phonebook (Phonebook **self) { int i; if (NULL == ((*self) = malloc(sizeof(Phonebook)))) return -1; for (i = 0; i < HASH_TABLE_SIZE; ++i) { if (NULL == (((*self)->Data)[i] = (Entry *) malloc(sizeof(Entry)))) return -1; ((*self)->Data)[i]->pNext = NULL; } (*self)->findName = findName_impl; (*self)->readFile = readFile_impl; return 0; } unsigned int BKDRHash(char *str) { unsigned int seed = 131; unsigned int hash = 0; while (*str) { hash = hash * seed + (*str++); } return hash; } ``` 修改後 ``` $ ./phonebook_opt ``` ``` execution time of append() : 30.423652 sec execution time of findName() : 0.000126 sec ``` ``` perf stat --repeat 100 \ -e cache-misses,cache-references,instructions,cycles \ ./phonebook_opt ``` ``` Performance counter stats for './phonebook_opt' (100 runs): 294,807,802 cache-misses # 65.948 % of all cache refs ( +- 0.68% ) 447,031,194 cache-references ( +- 0.07% ) 3,394,511,333 instructions # 0.04 insn per cycle ( +- 0.01% ) 80,193,329,008 cycles ( +- 0.57% ) 27.472831615 seconds time elapsed ( +- 0.50% ) ``` 搜尋時間確實下降了（0.005059 sec->0.000126 sec）， cache-miss 也下降（94.483%->65.948 %），但讀入時間增加太多(0.044088 sec->30.423652 sec)，應該是因為在讀入每筆資料時，都須先計算其 hash 值，在計算 hash 值時，除了額外計算的時間，還有呼叫函式的時間，另外，當找到 hash 值後，若所在的陣列已有其他資料，又必須經過一串 linked list 到達最後一個後，才能將新資料加入；因此，讀入所有資料的時間才會如此可觀。以下嘗試兩種方法: * (1)將 hash function 展開，減少呼叫函式所花費的時間。 ```click= static Entry *findName_impl(char lastName[], Phonebook **self) { Entry *e; unsigned int seed = 131; unsigned int hash = 0; char *str; str = lastName; while (*str) { hash = hash * seed + (*str++); } e = ((*self)->Data)[hash % HASH_TABLE_SIZE]; while (e) { if (strcasecmp(lastName, e->lastName) == 0) return e; e = e->pNext; } return NULL; } static void readFile_impl(FILE *fp, Phonebook **self) { char line[MAX_LAST_NAME_SIZE], *str; int i = 0; unsigned int seed = 131; unsigned int hash = 0; Entry *e; while (fgets(line, sizeof(line), fp)) { while (line[i] != '\0') i++; line[i - 1] = '\0'; i = 0; str = line; hash = 0; while(*str) { hash = hash * seed + (*str++); } e = (*self)->Data[hash % HASH_TABLE_SIZE]; while(e->pNext) e = e->pNext; e = append(line, e); } } ``` ``` $ make run ``` (上為原本的 hash 版本，下為修改後的) ``` execution time of append() : 33.082860 sec execution time of findName() : 0.000200 sec execution time of append() : 33.490506 sec execution time of findName() : 0.000144 sec ``` ``` $ perf stat --repeat 100 \ -e cache-misses,cache-references,instructions,cycles \ ./phonebook_opt_1 ``` ``` Performance counter stats for './phonebook_opt_1' (100 runs): 332,570,144 cache-misses # 73.024 % of all cache refs ( +- 0.75% ) 455,426,921 cache-references ( +- 0.16% ) 3,394,695,626 instructions # 0.04 insn per cycle ( +- 0.02% ) 88,919,125,151 cycles ( +- 0.70% ) 29.833768554 seconds time elapsed ( +- 0.51% ) ``` 結果，並沒有獲得太大的改善。 ![](https://i.imgur.com/SXRrMd3.png) * (2)加入一筆資料時，直接放在第一個說明：因為原本的情況是，在加入一筆資料時，會將其放在所屬 hash set 裡 linked-list 的最後一個；現在，改將其直接加入到第一個位置。 ```click= static void readFile_impl(FILE *fp, Phonebook **self) { char line[MAX_LAST_NAME_SIZE], *str; int i = 0; unsigned int seed = 131; unsigned int hash = 0; Entry *e, *e1; while (fgets(line, sizeof(line), fp)) { while (line[i] != '\0') i++; line[i - 1] = '\0'; i = 0; str = line; hash = 0; while(*str) { hash = hash * seed + (*str++); } e = (*self)->Data[hash % HASH_TABLE_SIZE]; e1 = e->pNext; e->pNext = NULL; e = append(line, e); e->pNext = e1; } } ``` ``` $ make run ``` ``` execution time of append() : 31.395222 sec execution time of findName() : 0.000202 sec execution time of append() : 0.092144 sec execution time of findName() : 0.000000 sec ``` ``` $ perf stat --repeat 100 \ -e cache-misses,cache-references,instructions,cycles \ ./phonebook_opt_1 ``` ``` Performance counter stats for './phonebook_opt_1' (100 runs): 260,094 cache-misses # 59.930 % of all cache refs ( +- 0.20% ) 433,995 cache-references ( +- 1.34% ) 231,826,919 instructions # 1.43 insn per cycle ( +- 0.02% ) 161,765,903 cycles ( +- 1.40% ) 0.057799116 seconds time elapsed ( +- 1.44% ) ``` 輸入時所花時間大幅下降，由此可知，在原版本的輸入中，於 linked-list 的搜尋花了很多時間(上為原本的 hash 版本，下為修改後的)；另外，修改後，搜尋字串"zyxel"的時間降為0.000000 sec ，在檢查 words.txt 後，發現"zyxel"為所在 hash set 的第一個結點，可能因此使得搜尋時間非常短。 ![](https://i.imgur.com/X6NfEgR.png) * ### binary search tree 使用不同的資料結構觀察效能的改變，這次用 binary search tree 。 In phonebook_bst.h ```click= struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; struct __PHONE_BOOK_ENTRY *left; struct __PHONE_BOOK_ENTRY *right; }; struct phonebook { Entry *Data; func_t findName; read_t readFile; }; ``` In phonebook_bst.c ```click= static Entry *findName_impl(char lastName[], Phonebook **self) { Entry * p; int cmp; p = (*self)->Data; while (1) { cmp = strcmp (lastName, p->lastName); if (cmp > 0) { if (p->right)p = p->right; else { printf ("No such name!\n"); return NULL; } } else if (cmp < 0) { if (p->left)p = p->left; else { printf ("No such name!\n"); return NULL; } } else { return p; } } } static void readFile_impl(FILE *fp, Phonebook **self) { int i = 0; char line[MAX_LAST_NAME_SIZE]; while (fgets(line, sizeof(line), fp)) { while (line[i] != '\0') i++; line[i - 1] = '\0'; i = 0; insert (line, (*self)->Data); } } Entry *insert(char lastName[], Entry *Data) { Entry *p; int cmp; p = Data; while (1) { cmp = strcmp (lastName, p->lastName); if (cmp > 0) { if (p->right)p = p->right; else { p->right = (Entry *) malloc (sizeof(Entry)); p = p->right; strcpy (p->lastName, lastName); p->left = NULL; p->right = NULL; return p; } } else { if (p->left)p = p->left; else { p->left = (Entry *) malloc (sizeof(Entry)); p = p->left; strcpy (p->lastName, lastName); p->left = NULL; p->right = NULL; return p; } } } } int init_phonebook (Phonebook **self) { if (NULL == (*self = malloc(sizeof(Phonebook)))) return -1; if (NULL == ((*self)->Data = (Entry *) malloc(sizeof(Entry)))) return -1; strcpy((*self)->Data->lastName, "n"); (*self)->Data->left = NULL; (*self)->Data->right = NULL; (*self)->findName = findName_impl; (*self)->readFile = readFile_impl; return 0; } ``` 在觀察 words.txt 後，我發現資料的前幾個字母是按照 abcdefg ... 的方式排列，為了不使 tree 中的結點都集中在某一邊，我先將根結點預設為 n 。 ``` $ ./phonebook_bst ``` ``` execution time of append() : 195.034462 sec execution time of findName() : 0.000834 sec ``` 與 orig 相比，搜尋時間確實減少了，但讀入資料的時間卻太長；與 hash function 版本相比，除了搜尋時間較長外，讀入時間亦長的誇張，由 words.txt 中的資料來看，應是所產生的 tree 太過類似 skewed tree 所導致的。不過， cache-miss 卻下降許多（94.483%->41.123 %），如下： ``` Performance counter stats for './phonebook_bst' (100 runs): 1,908,252,577 cache-misses # 41.123 % of all cache refs ( +- 1.27% ) 4,640,306,720 cache-references ( +- 1.10% ) 1,167,592,842,608 instructions # 2.26 insn per cycle ( +- 0.22% ) 516,320,047,658 cycles ( +- 0.47% ) 163.680049478 seconds time elapsed ( +- 0.54% ) ``` 問題: Q1.:本例選用的dataset(也就是輸入電話簿的姓名)是否存在問題? A1.: 首先，須確認dataset 中資料有多少比例是目前真實存在的英文姓氏，因為在電話簿中，不可能出現非姓氏的字，若出現，則此資料中的人也不存在，這筆資料會被直接捨棄，因此，當此 dataset 中出現非姓氏詞時，就表示此 dataset 有問題；所以，要回答這問題需要對此 dataset 中的資料與真實姓氏的資料比對，依比對結果回答。 Q2.:有代表性嗎?跟真實英文姓氏差異大嗎? A2.: 關於是否有代表性，需要考慮許多因素，就此電話簿所用年代而言，就必須考慮那些姓氏出現在所用年代，有一 [網站](http://www.ancestry.com/learn/facts) 能查詢到英文姓氏在不同年代於美國的分布以及其起源，舉個例，依記載，在1843到1999年，紀錄了一位名叫 Elka Aaaa 的人，確實有人的姓氏是 aaaa；所以，需要知道更多此電話簿的用途資訊，才能知道其是否有代表性；關於第二個問題，可能需要先更明確的定義'所要的真實英文姓氏'的定義，才能有後續的動作，就如上一個問題中一樣，所謂'真實英文姓氏'也有年代問題，不同年代的英文姓氏組成不同，須先設定好年代才能比較差異性。 Q3.:資料難道「數大便是美」嗎? A3.: 就統計學來說，於母群體中，隨機抽樣的數量越多，資料分析出的結果越趨近於母群體，因此，'數大'，只代表取樣的數量大，第一，沒限制須隨機抽樣，這很可能導致資料出現極大偏差；第二，沒限制所取樣本是在母群體中，這會使得資料的代表性降低；因此，資料並非'數大'便是美，所謂 garbage in, garbage out ，即使'數大'的垃圾，所產亦是垃圾。 Q4.:如果我們要建立符合真實電話簿情境的輸入，該怎麼作? A4.: 首先，要先確認此情境的設定，例如:年代、地區，若情況允許，當然是直接取得該地的人口姓氏，若無法取得，但時間允許的話，就在此地區隨機分配調查人員，依照人群流動時的疏密程度比例配置調查員的數量，如此能盡可能將母群體中的資料都蒐集到，使取樣資料越接近母群體的實際資料；若無法做實地調查，則無法知道各姓氏所佔比例，那只好從現在仍存在的所有姓氏中，隨機取樣做出電話簿了。 :::danger 這種等級的回覆，跟文組學生有什麼差別？理工人要拿出數據、數學模型，還有推理出來！ --jserv ::: >是!我趕緊修改! >[name=F74021200] 參考資料： [perf 原理和實務 ](https://hackmd.io/s/B11109rdg) [Programming Small](https://hackmd.io/s/HkO2ZpIYe) [ gnuplot 語法解說和示範](https://hackmd.io/s/Skwp-alOg) [ Makefile 語法和示範](https://hackmd.io/s/SySTMXPvl) [hash function 觀念和實務](https://hackmd.io/s/HJln3jU_e) [哈西表](http://blog.csdn.net/wanglx_/article/details/40400693) [哈西函數](http://blog.csdn.net/wanglx_/article/details/40300363) [哈西函式 in C](https://www.byvoid.com/zhs/blog/string-hash-compare) [你所不知道的 C 語言：物件導向程式設計篇](https://hackmd.io/s/HJLyQaQMl#) [ How to Write a Git Commit Message](https://chris.beams.io/posts/git-commit/) [Hsiang的共筆](https://hackmd.io/GYBgRgJghgjDCsBaATAFgMwE5GtSA7IlJstjABwTDCbDwzACmAbEA===#) ###### tags: `by F74021200`