# 2017q1 Homework1(phonebook)
contributed by < [refleex](https://github.com/refleex) >
### Reviewed by `king1224`
* phonebook_opt_hash 的 findName 執行時間是 0.0000 秒,你認為是不正常的嗎? 如果將要查找的 lastName 改變之後,時間還會是 0.0000 秒嗎
* 我試著將查找目標改為 " aaaa ",卻出現了錯誤,append 是不是需要做修正
* 可以詳細閱讀電話簿的輸入順序,main.c 中呼叫 append 和 findName 的時機,以及你是如何處理 append 和 findName 的,就會發現為什麼 findName 在這邊是 0 秒,為什麼不能查找別筆資料
>main.c 中的 input array 即為查找目標
>dictionary 資料夾中的 word.txt 為預設電話簿
>[name=洪正皇][color=#e847bd]
## **開發環境**
作業系統 : ubuntu 16.04 LTS (64-bit)
Architecture: x86_64
CPU 作業模式: 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
每核心執行緒數:2
每通訊端核心數:2
Socket(s): 1
NUMA 節點: 1
供應商識別號: GenuineIntel
CPU 家族: 6
型號: 60
Model name: Intel(R) Core(TM) i5-4210M CPU @ 2.60GHz
製程: 3
CPU MHz: 1694.671
CPU max MHz: 3200.0000
CPU min MHz: 800.0000
BogoMIPS: 5187.92
虛擬: VT-x
L1d 快取: 32K
L1i 快取: 32K
L2 快取: 256K
L3 快取: 3072K
NUMA node0 CPU(s): 0-3
## ~~進度~~
- 2017/02/22
>> 不用特別標時間,HackMD 會自動記錄更新,你只要及時整理進度和所見所聞即可 [name="jserv"]
---
## **前置作業**
先安裝並熟悉 perf 的使用
```
$ sudo apt-get install linux-tools-common
$ sudo apt-get install linux-tools-3.16.0-50-generic linux-cloud-tools-3.16.0-50-generic
```
但因為 kernel 版本不同,因此照出現的警示安裝相符的版本
```
$ sudo apt-get install linux-tools-4.4.0-63-generic
$ sudo apt-get install linux-cloud-tools-4.4.0-63-generic
```
接著就照 [Perf](https://hackmd.io/s/B11109rdg) 的說明繼續練習,先查看權限
```
$ cat /proc/sys/kernel/perf_event_paranoid
```
結果為 1
```
1
```
但在我要解除 kernel pointer 限制時
```
$ sudo sh -c " echo 0 > /proc/sys/kernel/kptr_restrict"
```
查看權限的結果仍為 1
執行 example.c
```
$ ./example
```
得到 pid: 26710
```
$ perf top -p 26710
```
輸入上列指令,卻只得到 perf top 的參數列表
```
Usage: perf top [<options>]
-a, --all-cpus system-wide collection from all CPUs
-b, --branch-any sample any taken branches
-c, --count <n> event period to sample
-C, --cpu <cpu> list of cpus to monitor
-d, --delay <n> number of seconds to delay between refreshes
-D, --dump-symtab dump the symbol table used for profiling
-E, --entries <n> display this many functions
-e, --event <event> event selector. use 'perf list' to list available even
-f, --count-filter <n>
only display functions with more events than this
-F, --freq <n> profile at this frequency
-g enables call-graph recording and display
-i, --no-inherit child tasks do not inherit counters
-j, --branch-filter <branch filter mask>
branch stack filter modes
-K, --hide_kernel_symbols
hide kernel symbols
-k, --vmlinux <file> vmlinux pathname
-M, --disassembler-style <disassembler style>
Specify disassembler style (e.g. -M intel for intel syntax)
.
.
.
```
上述部份試了很多種方法,結果不是沒變,不然就是
```
bash: syntax error near unexpected token `newline'
```
不過沒關係,大致重複來回做7個小時後,再參考一些網路上的文章,已經稍微對 perf 有些概念,所以正式開始這次的作業-->優化
---
## **Original**
先用 perf stat 檢查原始版本的 cash miss
```
$sudo perf stat --repeat 100 -e cache-misses,cache-references,instructions,cycles ./phonebook_orig
```
結果如下
```
Performance counter stats for './phonebook_orig' (100 runs):
969,283 cache-misses # 85.291 % of all cache refs ( +- 0.19% )
1,136,447 cache-references ( +- 0.13% )
261,184,322 instructions # 1.47 insns per cycle ( +- 0.02% )
177,351,808 cycles ( +- 0.14% )
0.056668601 seconds time elapsed ( +- 0.15% )
```
cache-miss 85.291%
```
size of entry : 136 bytes
execution time of append() : 0.059108 sec
execution time of findName() : 0.005920 sec
```
---
## **可能的效能改進方向**
- 使用 binary search tree 或其他資料結構改寫
- 藉由 hash function 來加速查詢,請詳閱 hash function 觀念和實務
- 既然 first name, last name, address 都是合法的英文 (可假設成立),使用字串壓縮的演算法,降低資料表示的成本
---
## **優化 1 (struct)**
打開 ```phonebook_opt.h```
``` clike=
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
entry *findName(char lastName[], entry *pHead);
entry *append(char lastName[], entry *e);
```
打開 ```phonebook_orig.c```
``` clike=
entry *findName(char lastName[], entry *pHead)
{
while (pHead != NULL) {
if (strcasecmp(lastName, pHead->lastName) == 0)
return pHead;
pHead = pHead->pNext;
}
return NULL;
}
entry *append(char lastName[], entry *e)
{
/* allocate memory for the new entry and put lastName */
e->pNext = (entry *) malloc(sizeof(entry));
e = e->pNext;
strcpy(e->lastName, lastName);
e->pNext = NULL;
return e;
}
```
可以發現在```phonebook_opt.h``` struct 所選宣告變數中 append() 和 findName() 只用到了 lastName
因此把不會用到的變數 用另外一個struct宣告
```clike=
typedef struct __PHONE_BOOK_ENTRY1 {
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
} entry1;
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
struct entry1 *detail;
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
```
然後```$ make plot```
```
size of entry : 32 bytes
execution time of append() : 0.038840 sec
execution time of findName() : 0.005018 sec
```
```
Performance counter stats for './phonebook_opt' (100 runs):
971,060 cache-misses # 85.131 % of all cache refs ( +- 0.17% )
1,140,662 cache-references ( +- 0.12% )
261,089,459 instructions # 1.47 insns per cycle ( +- 0.02% )
177,377,228 cycles ( +- 0.13% )
0.056722433 seconds time elapsed ( +- 0.16% )
```
85.131%,嗯。
![modify struct](https://i.imgur.com/lTRc2bp.png)
However, 我可憐的電腦,cache-miss 僅優化了 0.16%
--
不死心重複嘗試後,把原本的 phonebook 資料夾整個刪掉,重頭開始做。
```clike=
#define OPT 1
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
struct __PHONE_BOOK_ENTRY *pNext;
struct entry1 *detail;
} entry;
typedef struct __PHONE_BOOK_ENTRY1 {
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
} entry1;
entry *findName(char lastName[], entry *pHead);
entry *append(char lastName[], entry *e);
#endif
```
```clike=
entry *findName(char lastName[], entry *pHead)
{
while (pHead != NULL) {
if (strcasecmp(lastName, pHead->lastName) == 0)
return pHead;
pHead = pHead->pNext;
}
return NULL;
}
entry *append(char lastName[], entry *e)
{
/* allocate memory for the new entry and put lastName */
e->pNext = (entry *) malloc(sizeof(entry));
e = e->pNext;
strcpy(e->lastName, lastName);
e->pNext = NULL;
return e;
}
```
然後```$ make plot```
```
size of entry : 32 bytes
execution time of append() : 0.034790 sec
execution time of findName() : 0.002003 sec
```
```
Performance counter stats for './phonebook_opt' (100 runs):
142,859 cache-misses # 47.150 % of all cache refs ( +- 0.50% )
302,990 cache-references ( +- 0.60% )
243,965,824 instructions # 1.93 insns per cycle ( +- 0.02% )
126,343,413 cycles ( +- 0.26% )
0.040863594 seconds time elapsed ( +- 0.34% )
```
47.15% 總算回歸正常數值
![newruntime](https://i.imgur.com/lbZXSPD.png)
---
## 優化 2 ( hash function )
看完 [hash function 觀念和實務](https://hackmd.io/s/HJln3jU_e) ,再上網找了一些關於hash function的文章
了解到 hash function 其實就是把一堆資料分門別類,再根據索引去搜尋。
所以如果把電話簿裡的名字以某種規則分類,若要找特定名字,便可在較小的範圍內搜尋,以減少搜索時間
所謂某種規則,便是 hash function ,雜湊函數。
上網找了一個 BKDR-hash function 便開始著手修改程式碼。
```clike=
unsigned int hash(char lastName[])
{
int hash_number = 0;
int h = 53;
for (int i = 0; lastName[i] != '\0'; i++)
hash_number = (hash_number * h + lastName[i]) % TABLE;
return hash_number;
}
```
hash function 本身很好寫,難在 main 函式的更改,尤其 allocate memory 時極常出錯
```
size of entry : 32 bytes
execution time of append() : 0.073824 sec
execution time of findName() : 0.000000 sec
*** Error in `./phonebook_opt_hash': munmap_chunk(): invalid pointer: 0x00007ffd6f3b87a0 ***
```
上網找了一下關於 munmap_chunk(): invalid pointer 的問題,找到一篇提到:the program will crash if you pass a pointer to free() that has not been obtained through malloc().
修改一下程式碼後(包括 Makefile, calculate.c, runtime.gp等等)
```
size of entry : 32 bytes
execution time of append() : 0.056479 sec
execution time of findName() : 0.000000 sec
```
```
Performance counter stats for './phonebook_orig' (100 runs):
570,732 cache-misses # 59.378 % of all cache refs ( +- 0.33% )
961,182 cache-references ( +- 0.31% )
260,905,474 instructions # 1.47 insns per cycle ( +- 0.02% )
177,699,696 cycles ( +- 0.42% )
0.057268308 seconds time elapsed ( +- 0.52% )
```
```
Performance counter stats for './phonebook_opt' (100 runs):
139,321 cache-misses # 46.176 % of all cache refs ( +- 0.56% )
301,716 cache-references ( +- 0.53% )
243,828,644 instructions # 1.94 insns per cycle ( +- 0.02% )
125,408,510 cycles ( +- 0.29% )
0.040634235 seconds time elapsed ( +- 0.38% )
```
```
Performance counter stats for './phonebook_opt_hash' (100 runs):
742,301 cache-misses # 59.684 % of all cache refs ( +- 0.76% )
1,243,727 cache-references ( +- 0.24% )
336,095,343 instructions # 1.22 insns per cycle ( +- 0.01% )
275,331,193 cycles ( +- 0.30% )
0.089550008 seconds time elapsed ( +- 0.36% )
```
Origin 的 cash-miss 只有 59% ,正常不應該那麼低,所以我下載原版跑了一次,跟之前一樣約略是 85% ~ 86%。
應該是因為我在更改 main.c 時不小心更動某一部份(還不確定再哪),而 origin 有近乎和 hash 版本一樣的 cash-miss
![](https://i.imgur.com/DaaCAGM.png)
findName 一直是0.0000秒,原本以為他是沒有進到 while迴圈而直接回傳 NULL ,可是試驗之後發現也不是,這邊一直想不透
---
## **參考資料**
- [Perf -- Linux下的系统性能调优工具](https://www.ibm.com/developerworks/cn/linux/l-cn-perf1/)
- https://perf.wiki.kernel.org/index.php/Tutorial
- http://groangao.pixnet.net/blog/post/24474489-%5Bc,c%2B%2B%5D-typedef-struct-%E7%94%A8%E6%B3%95%E8%AA%AA%E6%98%8E
- [hash table](http://algs4.cs.princeton.edu/34hash/)
- https://zh.wikipedia.org/wiki/Strcpy
- [ifdef](https://msdn.microsoft.com/zh-tw/library/2a1b21sf.aspx)
- [munmap_chunk(): invalid pointer](http://stackoverflow.com/questions/32118545/munmap-chunk-invalid-pointer)