2017q3 Homework1 (phonebook)

# 2017q3 Homework1 (phonebook) contributed by <`tina0405`> ## 開發環境 * 指令： lscpu ~~~ Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 Thread(s) per core: 2 Core(s) per socket: 2 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 58 Model name: Intel(R) Core(TM) i5-3230M CPU @ 2.60GHz Stepping: 9 CPU MHz: 1200.024 CPU max MHz: 3200.0000 CPU min MHz: 1200.0000 BogoMIPS: 5188.73 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 3072K NUMA node0 CPU(s): 0-3 ~~~ * 如果只想關心 cache 可以利用指令： lscpu|grep cache ~~~ tina@tina-X550VB:~$ lscpu|grep cache L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 3072K ~~~ * 先來看原版的 performance Performance counter stats for './phonebook_orig' (10 runs): ~~~ 209,5636 cache-misses # 95.620 % of all cache refs ( +- 0.22% ) 219,1640 cache-references ( +- 0.32% ) 2,6172,7225 instructions # 1.39 insn per cycle ( +- 0.05% ) 1,8822,9850 cycles ( +- 0.45% ) 0.062704964 seconds time elapsed ( +- 1.86% ) ~~~ ## 修改 struct * 先將 entry 裡的結構減小，大小愈小就能容納更多的資料，利用這點減少 cache miss ~~~ c= typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; struct __PHONE_BOOK_EXCEPT *info; struct __PHONE_BOOK_ENTRY *pNext; } entry; typedef struct __PHONE_BOOK_EXCEPT { char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; } except; ~~~ Performance counter stats for './phonebook_opt' (100 runs): ~~~ 38,8268 cache-misses # 72.970 % of all cache refs ( +- 0.07% ) 53,2092 cache-references ( +- 0.29% ) 2,4459,9781 instructions # 1.80 insn per cycle ( +- 0.03% ) 1,3614,4418 cycles ( +- 0.32% ) 0.044347565 seconds time elapsed ( +- 0.53% ) ~~~ * cache-misses 的確由95.620% -> 72.970% * 跑 100 次所花費的平均時間，分別在 append() findName() ![](https://i.imgur.com/LUWdt5h.png) * 本來想要再減小 entry 裡的結構，但做了以下不良示範： * 把 *pNext 也放進 struct __PHONE_BOOK_EXCEPT * 先將結構改成： ~~~ c= typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; struct __PHONE_BOOK_EXCEPT *info; } entry; typedef struct __PHONE_BOOK_EXCEPT { char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; struct __PHONE_BOOK_ENTRY *pNext; } except; ~~~ * 把 entry 的大小調成 24byte ~~~ size of entry : 24 bytes ~~~ * 跑了 100 次的 cache-miss ~~~ Performance counter stats for './phonebook_opt' (100 runs): 334,2205 cache-misses # 96.572 % of all cache refs ( +- 0.06% ) 346,0849 cache-references ( +- 0.08% ) 4,4028,8022 instructions # 1.29 insn per cycle ( +- 0.01% ) 3,4027,0221 cycles ( +- 0.22% ) 0.111962579 seconds time elapsed ( +- 0.28% ) ~~~ * 所花時間也比原本更高很多 ![](https://i.imgur.com/UzJPVYA.png) * 跟原版差別在於，每次要都資料進去都必須重新 malloc 空間，要指到 pHead 時就得透過 info ，連 info 的空間也要重新malloc ，況且 word.txt 裡有 349900 筆資料都要 malloc ~~~c= //phonebook_opt.c entry11 *append(char lastName[], entry *e) { e->info = (except*)malloc(sizeof(except)); e->info->pNext = (entry *)malloc(sizeof(entry)); e = e->info->pNext; strcpy(e->lastName, lastName); e->info = (except*)malloc(sizeof(except)); e->info->pNext = NULL; return e; } entry *findName(char lastName[], entry *pHead) { while (pHead != NULL) { if (strcasecmp(lastName, pHead->lastName) == 0) return pHead; pHead = pHead->info->pNext; } return NULL; } ~~~ * 在減小 entry 的同時，必須考量 malloc 所付出的代價 * 待做事項 * 解釋 malloc * 利用工具去知道 malloc 平均所花的時間 * 找尋減少 malloc 的方法 ## 修改資料結構 * 嘗試 Hash fuction * 參考[Hash function](https://en.wikipedia.org/wiki/Hash_function) ~~~ A hash function is any function that can be used to map data of arbitrary size to data of fixed size. ~~~ * 而我們應該要有一把 Key 去透過 hash function，來找尋資料的 index ~~~ index=h(Key) ~~~ * 這邊應該發生了一點問題，假如兩筆資料同時想放入同個 index 空間這就是所謂的 Collision。 ~~~ 雖說應該儘量避免 Collision ，但如果真的發生，我的想法是利用 Chaining： e.g.當 A 和 B 都選到 1 ，那就在 1 的地址上，建 linked list： |0| |1|->|A|->|B| |2| .... ~~~ * 放置資料的修改 ~~~c= //append() while(strcmp(hashTable[pos]->lastName, "\000")) { hashTable[pos]->pNext = (slot_unit *) malloc(sizeof(slot_unit)); hashTable[pos]=hashTable[pos]->pNext; } strcpy(hashTable[pos]->lastName, lastName); hashTable[pos]->pNext = (slot_unit *) malloc(sizeof(slot_unit)); hashTable[pos]=hashTable[pos]->pNext; ~~~ * 讀取資料的修改 * 這邊因為放置資料已經建好結構，我們只要一一比對，不對就換下一筆 pNext ~~~c= slot_unit *findName(char lastName[],slot_unit *hashpos) { while (hashpos != NULL) { if (strcasecmp(lastName, hashpos->lastName) == 0) return hashpos; hashpos = hashpos->pNext; } return NULL; } ~~~ * string 轉 number 和 hash function的部份，我參考 [nekoneko 的 HackMD](https://hackmd.io/s/rJCs_ZVa#) * 但有強制轉型成 int ~~~c= unsigned int stringToInt(char *key) { int number = 0; while (*key) number += (int)*key++; return number; } ~~~ * 更改資料結構後 cache-miss 下降,變成 62.923 % ~~~ Performance counter stats for './phonebook_opt' (100 runs): 29,3966 cache-misses # 62.923 % of all cache refs ( +- 0.10% ) 46,7183 cache-references ( +- 0.42% ) 2,2751,4591 instructions # 1.71 insn per cycle ( +- 0.02% ) 1,3284,8254 cycles ( +- 0.16% ) 0.044613902 seconds time elapsed ( +- 0.52% ) ~~~ * 在 findName() 的效能上如預期的因為分堆變好很多，但在 append() 則差不多，因為其實擺放資料時，都還是先 malloc 一個空間再放置。 ![](https://i.imgur.com/j80i9Sz.png) * 這裡其實一開始就給 hashTable 一個空間，突然覺得不應該再 malloc() ，於是就使用 realloc（）把空間分配下去。 ~~~c= hashTable[a] = (slot_unit*)realloc(hashTable[a],sizeof(slot_unit)); //另外想測試如果我把需要 malloc（）的空間一次宣告完，在 //append 函式中 //都用 realloc() 會不會省去一些時間 slot_unit *space[350000]; void *append(char lastName[],unsigned int pos) { while(strcmp(hashTable[pos]->lastName, "\000")) { hashTable[pos]->pNext =(slot_unit*)realloc(space[i++],sizeof(slot_unit)); hashTable[pos]=hashTable[pos]->pNext; } strcpy(hashTable[pos]->lastName, lastName); hashTable[pos]->pNext =(slot_unit*)realloc(space[i++],sizeof(slot_unit)); hashTable[pos]=hashTable[pos]->pNext; } ~~~ 發現 cache-miss 下降 5% 左右 Performance counter stats for './phonebook_opt' (100 runs): ~~~ 31,7071 cache-misses # 57.314 % of all cache refs ( +- 0.14% ) 55,3219 cache-references ( +- 1.28% ) 2,4047,8635 instructions # 1.60 insn per cycle ( +- 0.02% ) 1,5054,1818 cycles ( +- 0.72% ) 0.049331016 seconds time elapsed ( +- 0.81% ) ~~~ * 但實際 findname 減少的時間其實只差一點點，但 append 甚至還變多。 ![](https://i.imgur.com/hkMwQzH.png) >>到這裡已經有點不太理解，因為我一開始預期的是宣告固定矩陣大小 realloc 取代一直 malloc 的過程，append() 時間應該要減少，而 cache-miss 應該要差不多，因為資料存放的大小不變 * 待做事項： * 嘗試探討 hashTable 的最適合的大小 * 減少 append() * 去了解資料的分佈在 hashtable 的情形 ## 作業要求 * 在 GitHub 上 fork [phonebook](https://github.com/sysprog21/phonebook/) (若你在 2017 年 9 月 22 日之前已經 fork [phonebook](https://github.com/sysprog21/phonebook/)，那應該重新 fork 過)，然後適度修改 `phonebook_opt.c` 和 `phonebook_opt.h` 兩個檔案，使得這兩者執行時期的 cache miss 降低。請用 perf 驗證，而且改進的過程中，不能有功能方面的減損。 * `phonebook_orig.[ch]` 不需要修改，我們關注的是 `phonebook_opt.[ch]`，當然要修改 `main.c` 也是允許的 * `findName()` 的時間必須原本的時間更短 * `append()` 的時間可以比原始版本稍久，但不應該增加太多，請參照下方共筆選讀的方法，著手縮減 `append()` 的時間開銷 * `main.c` 應該只有一份，不要建立新的 `main()`，善用 Makefile 定義對應的 `CFLAGS` * 過程中也會需要改動到 `Makefile`，請詳閱 [Makefile 語法和示範](https://hackmd.io/s/SySTMXPvl) * 在提交程式前，務必詳閱 [如何寫好 Git Commit Message](https://blog.louie.lu/2017/03/21/%E5%A6%82%E4%BD%95%E5%AF%AB%E4%B8%80%E5%80%8B-git-commit-message/) * 在執行程式(`phonebook_orig` 和 `phonebook_opt`) 前，先清空 cache: ``` $ echo 1 | sudo tee /proc/sys/vm/drop_caches ``` * 詳閱 [cache 原理和實驗](https://hackmd.io/s/S14L26-_l)，作業中也該提及你的認知並描述對應的實驗方法 * 除了修改程式，也要編輯「[作業區](https://hackmd.io/s/HyxQTaZj-)」，增添開發紀錄和 GitHub 連結 * 至少要列出效能分析，以及充份說明你如何改善效能 * 以 [gnuplot](http://www.gnuplot.info/) 繪製效能分析圖表 * 除了降低 `findName()` 的 cache miss 與執行成本，`append()` 也該想辦法縮減時間 * 建立 phone book 時，既然 lastName 是索引值，可以優先建立搜尋表格，其他資料可稍後再補上 * 回答以下問題：本例選用的 dataset (也就是輸入電話簿的姓名) 是否存在問題？ * 有代表性嗎？跟真實英文姓氏差異大嗎？ * 資料難道「數大就是美」嗎？ * 如果我們要建立符合真實電話簿情境的輸入，該怎麼作？ * 詳細閱讀 [示範用的 Phonebook 作業](https://hackmd.io/s/SklSEW6G-)，嘗試在自己的環境中重現裡頭的實驗，比照其中的寫作格式，第一段要提及「開發環境」，並且充分比較不同資料結構的影響 * 引入多項效能改進並在 HackMD 詳細論述： (詳情參照下方「見賢思齊」提及的共筆) * 使用 binary search tree 或其他資料結構改寫 * 藉由 hash function 來加速查詢，請詳閱 [hash function 觀念和實務](https://hackmd.io/s/HJln3jU_e) * 既然 first name, last name, address 都是合法的英文 (可假設成立)，使用[字串壓縮的演算法](http://stackoverflow.com/questions/1138345/best-compression-algorithm-for-short-text-strings)，降低資料表示的成本 * 截止日期： * Oct 8, 2017 (含) 之前 * 越早在 GitHub 上有動態、越早接受 code review，評分越高 ## 挑戰題 * 支援 [fuzzy search](http://www.informit.com/articles/article.aspx?p=1848528)，允許搜尋 lastName 時，不用精準打出正確的寫法 * 比方說電話簿有一筆資料是 `McDonald`，但若使用者輸入 `MacDonald` 或 `McDonalds`，也一併檢索出來 * 延伸閱讀: [Levenshtein Distance (編輯距離)](https://charles620016.hackpad.com/ep/pad/static/Japi4qFyAzt) * 改善電話簿的效能分析，透過大量的重複執行，找出 95% 信賴區間，而且允許動態新增資料 (較符合現實) 和查詢

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.