# HW1-Phonebook
###### tags: `Miyavi-Chen`
## 開發環境
### Reviewed by `natetang`
* 尚未回答作業所提出的問題
* Commit message 放圖片那裏應該要跟你實際做優化的 message 不同。EX: [Use hash function to optimize functions](https://github.com/Miyavi-Chen/phonebook/commit/6100b29a3b729b60fbc52f89a04ddf641cd6d6d8)
* 方法都有列出來了,就只差實作,加油
### Reviewed by `cdq7`
* Commit message [Reduce entry to reduce time](https://github.com/Miyavi-Chen/phonebook/commit/486b21634548303cc7ee9a09a88da4427317b319) seems a little vague.
* The cache-miss using hash function shows a significant improvement compared to our peers. What is the key reason?
* Commit message [Add phonebook_opt_hash](https://github.com/sysprog21/phonebook/commit/4b6e58f6ab983601f471bf1ca77f9e9dd3dea0b0) has a spelling error. "for decrising cache miss. Do you mean "decreasing"?
* Two commit messages, [Add phonebook_opt_hash](https://github.com/sysprog21/phonebook/commit/4b6e58f6ab983601f471bf1ca77f9e9dd3dea0b0) and [Add phonebook_opt_hash for optimization](https://github.com/sysprog21/phonebook/commit/dcc165a9a509ace961064a4c86daa041e9a2ddb2) need to be more explicit.
cpu:
```clike=
$ lscpu
Architecture: x86_64
CPU 作業模式: 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
每核心執行緒數:2
每通訊端核心數:4
Socket(s): 1
NUMA 節點: 1
供應商識別號: GenuineIntel
CPU 家族: 6
型號: 58
Model name: Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz
製程: 9
CPU MHz: 1824.829
CPU max MHz: 3700.0000
CPU min MHz: 1600.0000
BogoMIPS: 6584.46
虛擬: VT-x
L1d 快取: 32K
L1i 快取: 32K
L2 快取: 256K
L3 快取: 8192K
NUMA node0 CPU(s): 0-7
```
kernel:
```
$ uname -a
Linux miyavi-desktop 4.8.0-39-generic #42~16.04.1-Ubuntu SMP Mon Feb 20 15:06:07 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
```
## Requirement
1. cache miss 降低
2. findName() 的時間必須原本的時間更短
3. 回答以下問題:本例選用的 dataset (也就是輸入電話簿的姓名) 是否存在問題?
* 有代表性嗎?跟真實英文姓氏差異大嗎?
* 資料難道「數大就是美」嗎?
* 如果我們要建立符合真實電話簿情境的輸入,該怎麼作?
## 課前準備
### perf
* 看支援的event $ perf list
* 目前系統熱點 $ perf top
* 監測cache miss的話 Root 須權限變更 . 由下面指令確認目前權限
```clike=
$ cat /proc/sys/kernel/perf_event_paranoid
```
* 由下面指令可更改權限
```clike=
$ sudo su
echo 0 > /proc/sys/kernel/kptr_restrict
echo 0 > /proc/sys/kernel/perf_event_paranoid
```
* Find hot point through perf
```clike=
./phonebook_orig & sudo perf top -e cycles:ppp -p $!
```
* 觀看特定event及sample 取樣頻率設定(5000)
```clike=
$ perf top -e cache-misses -c 5000
```
* 分析特定檔案events(重複跑5次). :u是user space,:k is kernel space
```clike=
$ perf stat --repeat 5 -e cache-misses:u,cache-references,instructions,cycles ./perf_stat_cache_miss
```
* 觀測function level 改用 $ perf record 及$ perf report
```clike=
$ perf record -F 5000 -e cache-misses ./phonebook_orig && perf report
$ perf report
```
### gnuplot
* 繪圖 $ make plot
* 檢圖 $ eog [file name]
* .gp file 可設定圖片輸出格式
* 只修改圖可用 $ gnuplot scripts/runtime.gp
**參考資料**
[gnuplot 語法解說和示範](https://hackmd.io/s/Skwp-alOg)
### 程式碼格式化工具: astyle
```
$ astyle --style=kr --indent=spaces=4 --indent-switches --suffix=none *.[ch]
```
* `--style=kr` 用K&R的風格進行編程
* `--indent=spaces=4` 四個空格
* `--indent-switches` 關於Case的縮排
- Indent ‘switch’ blocks, so that the inner 'case XXX:'headers are indented in relation to the switch block.
* `--suffix=none` 不保存原始文件
* `*.[ch]` 所有的.c或.h檔
[Astyle official doc](http://astyle.sourceforge.net/astyle.html#_Quick_Start)
## 未優化版本
* 執行: `$ ./phonebook_orig`
* append() 及 findName() 時間如下
```
size of entry : 136 bytes
execution time of append() : 0.069621 sec
execution time of findName() : 0.005714 sec
```
* cache test: `$ make cache-test`
* cache-misses 高達 92%
```
Performance counter stats for './phonebook_orig' (100 runs):
2,110,653 cache-misses # 92.831 % of all cache refs ( +- 0.15% )
2,273,646 cache-references ( +- 0.19% )
262,132,700 instructions # 1.33 insn per cycle ( +- 0.02% )
196,377,615 cycles ( +- 0.38% )
0.061147295 seconds time elapsed ( +- 1.48% )
```
## Phonebook 開發紀錄
* 先試著重現前人成果
### 優化版本 1 - 減少資料結構大小
* 理解:因為find 只用到lastname,故建立新的struct把詳細資料搬移到entry_detail,在 entry 裡放一個pointer 指向entry_detail .
```c=
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
struct __PHONE_BOOK_ENTRY *pNext;
} entry_detail;
typedef struct __LAST_NAME_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
entry_detail *detail;
struct __LAST_NAME_ENTRY *pNext;
} entry;
```
* 執行時間:
```
size of entry : 32 bytes
execution time of append() : 0.057417 sec
execution time of findName() : 0.003356 sec
```
* cache-misses: 縮小了許多 (92% to 62%)
* append/findName 時間也降低許多
```
Performance counter stats for './phonebook_opt' (100 runs):
375,561 cache-misses # 62.049 % of all cache refs ( +- 0.58% )
605,269 cache-references ( +- 0.95% )
244,491,770 instructions # 1.83 insn per cycle ( +- 0.02% )
133,387,297 cycles ( +- 0.36% )
0.042542070 seconds time elapsed ( +- 1.67% )
```
* plot

## 優化版本 2 - 使用 Hash Function 儲存資料
參考了 [laochanlam](https://github.com/laochanlam/phonebook/blob/master/phonebook_opt_hash.h) [ktvexe ](https://github.com/ktvexe/phonebook-1/blob/master/phonebook_opt.h) 後採用比較看得懂的laochanlam作法.所以選用 BKDR-Hash 進行優化,下面是 BKDR-Hash 的程式碼片段:
[BKDR-Hash 參考](http://www.jianshu.com/p/0be67bf8887e)
* seed 應為質數減少collision
* 跟0x7FFFFFFF 做and確保hash value仍在unsigned int 內
* hash table size 應選用大的質數避免collision
```c=
unsigned int BKDRHash(char *str){
unsigned int seed = 131;
unsigned int hash = 0;
while (*str){
hash = hash * seed + (*str++);
}
return (hash & 0x7FFFFFFF)% 42737 ;
}
```
* 執行時間:
* append()平均時間從0.061933->0.026846 大幅下降
* findName()平均時間從0.020846->0.000003 s大幅下降

* cache-misses 下降到51.79%
```
Performance counter stats for './phonebook_opt_hash' (100 runs):
325,024 cache-misses # 51.790 % of all cache refs ( +- 0.14% )
627,575 cache-references ( +- 0.47% )
241,368,273 instructions # 1.56 insn per cycle ( +- 0.02% )
154,418,301 cycles ( +- 0.38% )
0.050084548 seconds time elapsed ( +- 1.92% )
```
## 優化版本 3 - BST
TBD...進度落後中,之後再補
## 優化版本 4 - trie and array
TBD...進度落後中,之後再補
## 優化版本 5 - 利用 memory pool
TBD...進度落後中,之後再補
## 進階版本 - Fuzzy Search
TBD...進度落後中,之後再補