owned this note
owned this note
Published
Linked with GitHub
# 2017q1 Homework1(phonebook)
contributed by < `potinglee` >
### Reviewed by `yachiyang01`
-git commit message需用命令語氣,內文與標題需一個空白行
### Reviewed by `harryoooooooooo`
* C/C++標準中,free(), delete等函式不須再另外判斷是否為NULL,傳入NULL並不會發生錯誤
>> 注意一致的格式 [name="jserv"]
>進度嚴重落後,請趕工![name=課程助教][color=red]
## 開發環境
* OS: Ubuntu 16.04.2 LTS
* L1d cache: 32K
* L1i cache: 32K
* L2 cache: 256K
* L3 cache: 6144K
* Architecture: x86_64
* CPU 作業模式: 32-bit, 64-bit
* Byte Order: Little Endian
* CPU(s): 4
* Model name: Intel(R) Core(TM) i5-6300HQ CPU @ 2.30GHz
* **memory**
description: System memory
physical id: 0
size: 7872MiB
## 未優化版本
* 執行: `$ ./phonebook_orig`
* append() 及 findName() 時間如下
```
size of entry : 136 bytes
execution time of append() : 0.083434 sec
execution time of findName() : 0.005348 sec
```
* catch test: `$ make cache-test`
* cache-misses 為 92%
```
Performance counter stats for './phonebook_orig' (100 runs):
457,4257 cache-misses # 92.288 % of all cache refs ( +- 0.18% )
495,6511 cache-references ( +- 0.10% )
2,6246,7397 instructions # 1.50 insn per cycle ( +- 0.02% )
1,7458,4735 cycles ( +- 0.19% )
0.066569148 seconds time elapsed ( +- 0.26% )
```
## 優化版本1 - 減少資料結構大小
原本的資料結構為:
```c=
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
```
因為只搜尋 lastName,所以把其他不會用到的成員移除。結果如下:
```c=
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
```
* 執行: `$ ./phonebook_opt`
* append() 及 findName() 時間如下
```
size of entry : 24 bytes
execution time of append() : 0.039963 sec
execution time of findName() : 0.001991 sec
```
* catch test: `$ make cache-test`
* cache-misses 減少為 63%
```
Performance counter stats for './phonebook_opt' (100 runs):
102,7372 cache-misses # 63.629 % of all cache refs ( +- 0.76% )
161,4624 cache-references ( +- 0.13% )
2,4126,7291 instructions # 2.16 insn per cycle ( +- 0.02% )
1,1166,5508 cycles ( +- 0.19% )
0.041329077 seconds time elapsed ( +- 0.30% )
```
* 和未優化版本的比較圖:

## 優化版本2 - 使用二元搜尋樹
將原本的用 linked list 建立的資料,改用二元搜尋樹建立。
原本建立 linked list 的程式:
```c=
entry *append(char lastName[], entry *e)
{
/* allocate memory for the new entry and put lastName */
e->pNext = (entry *) malloc(sizeof(entry));
e = e->pNext;
strcpy(e->lastName, lastName);
e->pNext = NULL;
return e;
}
```
建立二元搜尋樹的程式:
```c=
entry *append(char lastName[], entry *e)
{
if(e == NULL) {
strcpy(e->lastName, lastName);
return e;
}
else {
//create a new node
entry* newnode = (entry *) malloc(sizeof(entry));
strcpy(newnode->lastName, lastName);
newnode->pRight = newnode->pLeft = NULL;
entry *tmp = e;
entry *prev = NULL;
while(tmp != NULL) {
prev = tmp;
if(strcasecmp(lastName, tmp->lastName) > 0) tmp = tmp->pRight;
else if(strcasecmp(lastName, tmp->lastName) < 0) tmp = tmp->pLeft;
}
if(prev->pRight == tmp) prev->pRight = newnode;
else prev->pLeft = newnode;
return newnode;
}
}
```
搜尋二元搜尋樹的 findName():
```c=
entry *findName(char lastName[], entry *pHead)
{
if(pHead == NULL) {
/* Element is not found */
return NULL;
}
while(pHead != NULL) {
if(strcasecmp(lastName, pHead->lastName) == 0) {
return pHead;
}
else {
if(strcasecmp(lastName, pHead->lastName) > 0) {
pHead = pHead->pRight;
}
else {
pHead = pHead->pLeft;
}
}
}
return pHead;
}
```
* 執行: `$ ./phonebook_opt`
* append() 及 findName() 時間如下
```
size of entry : 32 bytes
execution time of append() : 0.309874 sec
execution time of findName() : 0.003658 sec
```
* catch test: `$ make cache-test`
* cache-misses 減少為 68%,但因資料結構較大的緣故,cache-miss較優化版本1增加 5%。
```
Performance counter stats for './phonebook_opt' (100 runs):
160,2956 cache-misses # 68.397 % of all cache refs ( +- 0.50% )
234,3613 cache-references ( +- 0.11% )
3,2665,4220 instructions # 2.38 insn per cycle ( +- 0.01% )
1,3712,2147 cycles ( +- 0.16% )
0.050254760 seconds time elapsed ( +- 0.32% )
```
* 和未優化版本的比較圖:

雖然使用了二元搜尋樹,但 findNmae()的時間卻沒減少,反而增加。
推測是由於每次加新資料都從 tree 的尾端,因此會形成這種狀況:
```graphviz
digraph bst {
aaaa -> aaaaa -> aaaaaa -> aaaaaaa [label = pRight];
aaaa -> NULL [label = pLeft];
aaaaa -> NULL [label = pLeft];
aaaaaa -> NULL [label = pLeft];
aaaaaaa -> NULL [label = pLeft];
}
```
每一個 node 雖有兩個指標,實際上卻只用到一個(另一個指向 NULL),因此整個二元搜尋樹等同於普通的 linked list,而無法發揮二元搜尋樹的優勢。
>> 請及早調整輸入的 data set [name="jserv"]
## 優化版本3 - 用樹旋轉改進以上的問題
(working)
>> 可以先將 words.txt 透過 -R 打亂排序
>> [name=課程助教][color=red]