owned this note
owned this note
Published
Linked with GitHub
# 2018q1 Homework1(Phonebook)
### Reviewed by `Willemchen`
* 共筆第二行加入 contributed by < `...` > 方便辨識
* 比較的部份可以使用 gnuplot 來畫圖比較四個改進的方案,參考 [gnuplot 語法解說和示範](https://hackmd.io/s/Skwp-alOg#)
### Reviewed by `chasingfar`
* AVL Tree 的 append() 時間爆表、 findName() 時間無法計測,可嘗試改變 `main.c` 的實驗方式,如增加 findName() 的次數並改用 `all-names.txt` 為字典檔以平衡兩個時間。
---
## 開發環境
ubuntu 14.04 LTS
Linux version 4.4.0-116-generic
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 60
Stepping: 3
CPU MHz: 1402.031
BogoMIPS: 4789.05
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 6144K
NUMA node0 CPU(s): 0-7
---
## perf 測試
安裝完perf後開始照著教學試試看他的功能,計算 pi 的結果如下:
99.81% test [.] compute_pi_baseline
0.03% [kernel] [k] native_write_msr_safe
0.03% [kernel] [k] read_tsc
0.03% [kernel] [k] acct_account_cputime
0.03% [kernel] [k] apic_timer_interrupt
0.03% [unknown] [k] 0x0000000000400663
0.03% [unknown] [k] 0x000000000040066d
0.03% [kernel] [k] update_vsyscall
0.00% [kernel] [k] do_nanosleep
0.00% [kernel] [k] native_apic_msr_write
> 可以明顯發現熱點出現在測試程式中的 compute_pi_baseline
>
---
## 開始實作
### 改進一:使用AVL Tree實作
* 首先由於 Linked list 的搜尋時間為O(N),可以針對此點改善,因此嘗試使用AVL tree作為資料結構。
原先的 append() 須注意到若兩子樹高度差超過1時需要 rotation,且有四種可能分別為RR,RL,LL,LR。(此部份參考至[AVL Tree insersion](https://www.geeksforgeeks.org/avl-tree-set-1-insertion/))
```C=
entry *append(char lastName[], entry *node)
{
/* 1. Perform the normal BST appendion */
if (node == NULL)
return(newNode(lastName));
if (strcasecmp(lastName, node->lastName) < 0)
node->left = append(lastName, node->left);
else if (strcasecmp(lastName, node->lastName) > 0)
node->right = append(lastName, node->right);
else // Equal lastNames are not allowed in BST
return node;
/* 2. Update height of this ancestor node */
node->height = 1 + max(height(node->left),
height(node->right));
/* 3. Get the balance factor of this ancestor
node to check whether this node became
unbalanced */
int balance = getBalance(node);
// If this node becomes unbalanced, then
// there are 4 cases
// Left Left Case
if (balance > 1 && strcasecmp(lastName, node->left->lastName) < 0)
return rightRotate(node);
// Right Right Case
if (balance < -1 && strcasecmp(lastName, node->right->lastName) > 0)
return leftRotate(node);
// Left Right Case
if (balance > 1 && strcasecmp(lastName, node->left->lastName) > 0)
{
node->left = leftRotate(node->left);
return rightRotate(node);
}
// Right Left Case
if (balance < -1 && strcasecmp(lastName, node->right->lastName) < 0)
{
node->right = rightRotate(node->right);
return leftRotate(node);
}
/* return the (unchanged) node pointer */
return node;
}
// A utility function to right rotate subtree rooted with y
entry *rightRotate(entry *y)
{
entry *x = y->left;
entry *T2 = x->right;
// Perform rotation
x->right = y;
y->left = T2;
// Update heights
y->height = max(height(y->left), height(y->right))+1;
x->height = max(height(x->left), height(x->right))+1;
// Return new root
return x;
}
// A utility function to left rotate subtree rooted with x
entry *leftRotate(entry *x)
{
entry *y = x->right;
entry *T2 = y->left;
// Perform rotation
y->left = x;
x->right = T2;
// Update heights
x->height = max(height(x->left), height(x->right))+1;
y->height = max(height(y->left), height(y->right))+1;
// Return new root
return y;
}
```
* 如此一來可以建出一高度為 Log(n) 之樹使搜尋時間大大降低。FindName() 可固定在 log(n) 次以內找到結果。
```C=
entry *findName(char lastName[], entry *pHead)
{
while (pHead != NULL) {
if (strcasecmp(lastName, pHead->lastName) == 0)
return pHead;
else if(strcasecmp(lastName, pHead->lastName) < 0)
pHead = pHead->left;
else
pHead = pHead->right;
}
return NULL;
}
```
* 然而由於在建樹過程中需要不斷 ratate,因此發現 append() 的效率極低。
![](https://i.imgur.com/FaWdHJt.png)
>append()的時間爆表了!!約需要0.23秒左右,但findName()的效能有明顯改善。 [name=張文瑋]
* orig 以及 opt 分別結果如下
Performance counter stats for './phonebook_orig' (100 runs):
954,011 cache-misses # 84.558 % of all cache refs ( +- 0.58% )
1,128,236 cache-references ( +- 0.63% )
262,495,551 instructions # 1.44 insns per cycle ( +- 0.02% )
181,872,464 cycles ( +- 0.14% )
0.056621351 seconds time elapsed ( +- 0.71% )
---
Performance counter stats for './phonebook_opt' (100 runs):
193,835 cache-misses # 49.631 % of all cache refs ( +- 1.09% )
390,553 cache-references ( +- 1.57% )
1,613,938,262 instructions # 2.05 insns per cycle ( +- 0.00% )
786,476,487 cycles ( +- 0.23% )
0.239435931 seconds time elapsed ( +- 0.26% )
> 從 cache-misses 及 instructions 也可看出這個方法的優缺點,不過這個方法在data亂序的狀態也適用。[name=張文瑋]
>
---
### 改進二:Sorted Linked List to Balanced BST
>此部份參考至[Ryan Wang大師](https://hackmd.io/s/Sk-qaWiYl#) [name=張文瑋]
>
由於老師給的資料為照大小順序排好的,因此在建樹過程只要遞迴將串列 n/2 位置當作子樹的root,如此將會很有效率的建構成 balanced BST。
圖示:
Input: Linked List 1->2->3
Output: A Balanced BST
2
/ \
1 3
Input: Linked LIst 1->2->3->4->5->6->7
Output: A Balanced BST
4
/ \
2 6
/ \ / \
1 3 4 7
* 新增一新的 structure 來操作 balanced BST。
```clike=
typedef struct __BST_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
struct __BST_ENTRY *left;
struct __BST_ENTRY *right;
} bst_entry;
```
* 遞迴呼叫不斷從 n/2 將 root 往上拉。
```clike=
bst_entry* sortedListToBSTRecur(entry **head_ref, int n)
{
/* Base Case */
if (n <= 0)
return NULL;
/* Recursively construct the left subtree */
bst_entry *left = sortedListToBSTRecur(head_ref, n/2);
/* Allocate memory for root, and link the above constructed left
subtree with root */
bst_entry *root = newNode((*head_ref)->lastName);
root->left = left;
/* Change head pointer of Linked List for parent recursive calls */
*head_ref = (*head_ref)->pNext;
/* Recursively construct the right subtree and link it with root
The number of nodes in right subtree is total nodes - nodes in
left subtree - 1 (for root) which is n-n/2-1*/
root->right = sortedListToBSTRecur(head_ref, n-n/2-1);
return root;
}
```
* 可以發現 append() 的時間明顯降了下來,不過由於是額外處理原先的 Linked List 因此需要的時間還是相較於原來的多。
![](https://i.imgur.com/lpN6LGp.png)
Performance counter stats for './phonebook_opt' (100 runs):
761,606 cache-misses # 83.628 % of all cache refs ( +- 0.56% )
910,709 cache-references ( +- 0.52% )
341,273,575 instructions # 1.14 insns per cycle ( +- 0.02% )
299,807,238 cycles ( +- 1.11% )
0.092792705 seconds time elapsed ( +- 1.09% )
>由於新增了一個 144 bytes 的 entry ,因此 cache-misses rate 又上升了,故也可以從而得知每種方法都有其利與弊,必須參照目標從而決定實作方式。 [name=張文瑋]
---
### 改進三:縮小 structure 內容
希望藉由改善 cache miss rate 來達到效能提升,有鑑於看了許多人的共筆大家都有將 struct 的部份優化,因此我也來試試。
```clike=
typedef struct __PHONE_BOOK_OTHER_ENTRY {
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
} other_entry;
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
struct __PHONE_BOOK_ENTRY *pNext;
other_entry *others;
} entry;
typedef struct __BST_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
struct __BST_ENTRY *left;
struct __BST_ENTRY *right;
other_entry *others;
} bst_entry;
```
* Linked List 所存使用的 stuct 下降為 32bytes, BST 所使用的 struct 則下降到 40 bytes, miss rate 下降到58%, append() 的執行時間也略為下降。
Performance counter stats for './phonebook_opt' (100 runs):
195,756 cache-misses # 58.002 % of all cache refs ( +- 1.01% )
337,497 cache-references ( +- 1.60% )
301,526,643 instructions # 1.72 insns per cycle ( +- 0.02% )
175,602,613 cycles ( +- 0.62% )
0.054685540 seconds time elapsed ( +- 0.63% )
![](https://i.imgur.com/xsPI8Kg.png)
---
### 比較
|| orig| AVL | sorted linked list to balanced BST |BST 優化 struct|
|--------| -------- | -------- | -------- | ---- |
|miss rate(%) | 84.558 | 49.631 | 83.628 |58.002|
|append() 平均執行時間(s)|0.037817|0.231546|0.078296|0.052502 |
|findName() 平均執行時間(s)|0.005247|~0.000000|~0.000000|~0.000000|
---
## 問題討論
* dataset 有代表性嗎?跟真實英文姓氏差異大嗎?
* 其實個人認為差異不會太大,若是資料皆可有大小順序權值那麼運用 tree 的資料結構儲存結果作用將會非常方便。
* 資料難道「數大就是美」嗎?
* 古人云:樹大必有枯枝,當資料量大時必需要考慮到一些無用的數據,否則只會拖慢執行速度。
* 如果我們要建立符合真實電話簿情境的輸入,該怎麼作?
* 真實電話簿可能會有重名的問題,需要額外給予權值。
---
## 參考連結
[AVL Tree insersion](https://www.geeksforgeeks.org/avl-tree-set-1-insertion/)
[Ryan Wang 2017q1 phonebook](https://hackmd.io/s/Sk-qaWiYl#)
[Sorted Linked List to Balanced BST](https://www.geeksforgeeks.org/sorted-linked-list-to-balanced-bst/)
[如何寫一個 Git Commit Message](https://blog.louie.lu/2017/03/21/%E5%A6%82%E4%BD%95%E5%AF%AB%E4%B8%80%E5%80%8B-git-commit-message/#rules02)
[perf 原理和實務](https://hackmd.io/s/B11109rdg#)