# 2017q1 Homework1 (phonebook)
contributed by <`zmke`>
### Reviewed by `ryanwang522`
* 參考 [Cache](https://stevenschmatz.gitbooks.io/eecs-370/content/Lecture_Notes/lecture_17.html) 的說明,Cache 是為了保留最可能被存取的資料(Most likely to be referenced),避免較耗時的記憶體操作。
* 可以利用 Hash function 來增進 `findName()` 的效率。
* 這裡實作的 BST 缺乏對 Cache line 大小的配合,可以透過更改 BST node 結構以符合 Cache line,參考 [關於 CPU Cache](http://cenalulu.github.io/linux/all-about-cpu-cache/)。
## 開發環境
OS: Ubuntu 16.04 LTS
Architecture: x86_64
CPU 作業模式: 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
Model name: Intel(R) Core(TM) i5-4200H CPU @ 2.80GHz
L1d 快取: 32K
L1i 快取: 32K
L2 快取: 256K
L3 快取: 3072K
## cache
* ~~cache 是為了讓資料存取的速度適應 CPU 的處理速度~~
>> 這句話不精確,請參照 eecs-370 的說明 [Cache](https://stevenschmatz.gitbooks.io/eecs-370/content/Lecture_Notes/lecture_17.html),搭配短片 [What is cache memory](https://www.youtube.com/watch?v=roeZs-eL-lw),再來回顧你原本的論述 [name="jserv"]
* static random access memory (SRAM) 速度快、成本高
* 電腦的 main memory 主要是 DRAM ,跟不上 CPU 存取的速度, cache 採用的是 SRAM
* CPU 在 cache 找不到需要的資料時,會到 main memory 去找,但 CPU 和 main memory 的速度差了好幾個數量級
* Von Neumann bottleneck : main memory 的速度跟不上 CPU 的速度, 造成 CPU performance 的限制
* 察看記憶體資訊`$ dmidecode --type 17 | more`
```
Memory Device
Array Handle: 0x0042
Error Information Handle: Not Provided
Total Width: 64 bits
Data Width: 64 bits
Size: 4096 MB
Form Factor: SODIMM
Set: None
Locator: ChannelA-DIMM0
Bank Locator: BANK 0
Type: DDR3
Type Detail: Synchronous
Speed: 1600 MHz
Manufacturer: Transcend
Serial Number: 0009FB43
Asset Tag: 9876543210
Part Number: TS512MSK64W6H
Rank: 1
Configured Clock Speed: 1600 MHz
```
以我的電腦為例, CPU 時脈是 2.8GHz 但 main memory 時脈只有 1600 MHz
* function of the cache
* The cache will hold data that we think is most likely to be referenced.
* Minimize the average memory access latency
* Maximize the number of references that are serviced by the cache
* N-Way Set Associative : 避免 Fully Associative 和 Direct Mapped 的缺陷,將 cache 分成 N 個 set
* cache miss: 在 cache 找不到需要的資料,需要到 main memory 尋找
* 檢查電腦的 cache : `$sudo dmidecode -t cache | more`
```
Cache Information
Socket Designation: CPU Internal L1
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Back
Location: Internal
Installed Size: 128 kB
Maximum Size: 128 kB
Supported SRAM Types:
Unknown
Installed SRAM Type: Unknown
Speed: Unknown
Error Correction Type: Single-bit ECC
System Type: Other
Associativity: 8-way Set-associative
```
## 原始版本
>中英文字間請已空白隔開
>[name=課程助教][color=red]
* 先清空 cache:`$ echo 1 | sudo tee /proc/sys/vm/drop_caches`
* 執行:`$ ./phonebook_orig`
```
size of entry : 136 bytes
execution time of append() : 0.066602 sec
execution time of findName() : 0.005839 sec
```
* 測試 cache miss rate: `$ make cache-test`
* 原始版本 cache miss rate 85%
```
Performance counter stats for './phonebook_orig' (100 runs):
1,025,260 cache-misses # 85.415 % of all cache refs ( +- 0.35% )
1,200,333 cache-references ( +- 0.29% )
262,101,582 instructions # 1.40 insn per cycle ( +- 0.02% )
187,299,573 cycles ( +- 0.14% )
0.057128980 seconds time elapsed ( +- 1.18% )
```
* 利用 perf 找熱點
`$ ./phonebook_orig & sudo perf top -p $!`
* findName 佔19.16%, append 佔3.32%,但 append 實際執行時間卻較長
```
30.72% libc-2.23.so [.] __strcasecmp_l_avx
19.16% phonebook_orig [.] findName
6.63% [kernel] [k] clear_page_c_e
6.52% phonebook_orig [.] main
5.67% libc-2.23.so [.] _IO_fgets
5.30% libc-2.23.so [.] _IO_getline_info
3.71% libc-2.23.so [.] _int_malloc
3.32% phonebook_orig [.] append
2.54% [kernel] [.] native_irq_return_iret
2.46% [kernel] [k] unmap_page_range
2.46% [kernel] [k] free_pcppages_bulk
1.66% [kernel] [k] release_pages
1.65% phonebook_orig [.] strcasecmp@plt
1.35% libc-2.23.so [.] __memcpy_sse2
0.93% libc-2.23.so [.] memchr
0.91% [kernel] [k] do_page_fault
0.86% [kernel] [k] __do_page_fault
0.83% [unknown] [k] 0x00007f29e9ab0e02
0.83% [unknown] [k] 0x00007f29e9ab23c8
0.83% [kernel] [k] page_remove_rmap
0.82% [kernel] [k] __find_get_block
0.82% [kernel] [k] free_hot_cold_page
0.00% [kernel] [k] native_write_msr
```
## method1 - 減少 entry 大小
* findName 的時候都只用 lastName 去找,原本的結構有很多資料,如果將 entry 縮小,就有更多 entry 可以被放進 cache ,預期能降低 cache miss rate
``` ==
typedef struct __ENTRY_DETAIL {
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
} entryDetail;
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
entryDetail *detail;
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
```
* 把~~沒有用到~~ lastName 以外的資料移到 entryDetail ,原本的結構內剩下 lastName 和 entryDetail 的指標, append() 和 findName() 和原始版本相同
`$ ./phonebook_opt`
>> 不是「沒有用到」,請思考用語的精確性 [name="jserv"]
```
size of entry : 32 bytes
execution time of append() : 0.066095 sec
execution time of findName() : 0.002160 sec
```
`$ make plot`
![](https://i.imgur.com/0KMhkXn.png)
* size of entry 從 136 bytes 縮小到 32 bytes , append() 和 findName() 的時間都降低了
`$ make cache-test`
```
Performance counter stats for './phonebook_opt' (100 runs):
145,400 cache-misses # 46.061 % of all cache refs ( +- 0.34% )
315,669 cache-references ( +- 0.56% )
244,493,450 instructions # 1.90 insn per cycle ( +- 0.02% )
128,986,123 cycles ( +- 0.31% )
0.039869771 seconds time elapsed ( +- 0.62% )
```
* cache miss rate 從 84% 降低到 46%
## method2 - Binary Search Tree
* 原本的結構是 singly linked list ,搜尋時是 liner search ,將結構改成 Binary Search Tree 預期能降低 findName 的時間
* 因為字典檔內的 lastName 是已經排序過的字串,直接用來建 BST 的話會使 BST 變成 list , findName 無法跳脫 liner search ,因此嘗試建出左右平衡的 BST
* reference
* [Sorted Linked List to Balanced BST](http://www.geeksforgeeks.org/sorted-linked-list-to-balanced-bst/)
* [Convert Sorted List to Binary Search Tree](https://siddontang.gitbooks.io/leetcode-solution/content/tree/convert_sorted_listarray_to_binary_search_tree.html)
```==
bst *list_to_BST(entry *pHead)
{
int listLen = 0;
entry *cur = pHead;
while(cur) {
listLen++;
cur = cur->pNext;
}
return build(&pHead, listLen);
}
bst *build(entry **pHead, int listLen)
{
if(listLen <= 0)
return NULL;
bst *left = build(pHead, listLen/2);
bst *root = (bst *) malloc(sizeof(bst));
strcpy(root->lastName, (*pHead)->lastName);
root->left = left;
*pHead = (*pHead)->pNext;
root->right = build(pHead, listLen-listLen/2-1);
return root;
}
```
* 維持原始版本的 entry structure ,使用二元搜尋樹進行實驗
`$ ./phonebook_bst`
```
size of entry : 136 bytes
execution time of append() : 0.066551 sec
execution time of findName() : 0.000000 sec
```
`$ make plot`
![](https://i.imgur.com/2FZIfk8.png)
發現 findName() 的時間明顯降低,但 append() 較前兩個情形稍微上升
`$ make cache-test`
```
Performance counter stats for './phonebook_bst' (100 runs):
770,479 cache-misses # 80.189 % of all cache refs ( +- 0.34% )
960,834 cache-references ( +- 1.09% )
359,999,587 instructions # 1.21 insn per cycle ( +- 0.01% )
297,319,379 cycles ( +- 1.15% )
0.092091102 seconds time elapsed ( +- 1.28% )
```
與原始版本相比, cache miss rate 有些微下降(85.41% -> 80.19%)
* 使用 method1 縮小過後的 entry structure 進行實驗
`$ ./phonebook_bst`
```
size of entry : 32 bytes
execution time of append() : 0.065951 sec
execution time of findName() : 0.000000 sec
```
`$ make plot`
![](https://i.imgur.com/IBK3BgF.png)
append() 的時間較前兩個情況下降了
`$ make cache-test`
```
Performance counter stats for './phonebook_bst' (100 runs):
302,300 cache-misses # 65.713 % of all cache refs ( +- 0.19% )
460,034 cache-references ( +- 0.57% )
342,481,904 instructions # 1.59 insn per cycle ( +- 0.01% )
214,799,756 cycles ( +- 0.32% )
0.065648659 seconds time elapsed ( +- 0.39% )
```
同時使用 bst 和較小的 entry , cache miss rate 較只用 bst 明顯下降
* 依實驗結果來看,目前讓 findName() 進步最多的是將 linked list 轉成 binary search tree 有效節省 findName() 所需的時間,將 entry 縮小除了能夠降低 cache miss rate 之外,連 cache reference 的次數也跟著降低了
## memory leak
* valgrind :可以用來檢查程式記憶體
`$ valgrind --leak-check=yes ./phonebook_orig`
* 修正 main.c 最後的 release memory 的部份
```
while(pHead) {
entry *tmp = pHead;
pHead = pHead->pNext;
free(tmp);
}
```
valgrind 結果
```
==9712== All heap blocks were freed -- no leaks are possible
```
* 為 binary search tree 增加 releaseTree() ,確保記憶體完整被釋放
```
void releaseTree(bst *root)
{
if(root == NULL) return;
releaseTree(root->left);
releaseTree(root->right);
free(root);
}
```
valfrind 結果
```
==9716== All heap blocks were freed -- no leaks are possible
```
## 本例選用的 dataset (也就是輸入電話簿的姓名) 是否存在問題?
* 有代表性嗎?跟真實英文姓氏差異大嗎?
* 裡面有部份是沒意義的單字
* 資料難道「數大就是美」嗎?
* 並不完全正確,像是 AlphaGo 就是靠大量的對戰資料來進步,但資料的質量也很重要,在做訊號分析的時候常常需要 filter 來過濾,我想資料也是一樣,至少不能和現實相差太大
* 如果我們要建立符合真實電話簿情境的輸入,該怎麼作?
* 要去找找有沒有真實姓名統計資料,把這些名字做成字典檔
* [可以產生名字的網站](http://www.generatedata.com/#t1)
* [English Baby Names](http://babynames.net/all/english)
* 如果會爬蟲的技術,可以從這些網站把資料整理下來
## Reference
[Cache](https://stevenschmatz.gitbooks.io/eecs-370/content/Lecture_Notes/lecture_17.html)
[von Neumann bottleneck](http://whatis.techtarget.com/definition/von-Neumann-bottleneck)
[What is cache memory – Gary explains](http://www.androidauthority.com/what-is-cache-memory-gary-explains-681747/)
[Binary Tree](http://www.csie.ntnu.edu.tw/~u91029/BinaryTree.html)
[C Binary Tree with an Example C Code](http://www.thegeekstuff.com/2013/02/c-binary-tree")
[ktvexe 筆記](https://hackmd.io/MYZgnCwGbALAtGArCADPWAjAhgRnpgOwAmY8UAbMJqsLgBxb2ZA=)
[twzjwang 筆記](https://hackmd.io/s/HJsOjpYKl)
[檢查程式記憶體的小工具-valgrind](http://daydreamer.idv.tw/rewrite.php/read-18.html)