# 2016q3 Homework1 (phonebook)
contributed by <`a530788140`>
### Reviewed by `Louie Lu`
* git log is too simplify for reading
* calcuate.c 33:40 寫的很奇怪。
* 沒有推上 Hashtable 的 code.
* coding style
* consist if else block (you got two style, i prefer this one)
```c
if (expr) {
stmt
} else {
stmt
}
```
## 實作二元搜尋樹
* `append`
```c
entry *append(char lastName[], entry *root)
{
/* allocate memory for the new entry and put lastName */
if (strcasecmp("\0", root->lastName) == 0) {
strcpy(root->lastName, lastName);
} else {
entry *tmp, *current, *parent;
tmp = (entry *) malloc(sizeof(entry));
strcpy(tmp->lastName, lastName);
tmp->lNext = NULL;
tmp->rNext = NULL;
current = root;
while (current != NULL) {
parent = current;
if (hash(lastName) > hash(current->lastName))
current = current->rNext;
else
current = current->lNext;
}
if (hash(lastName) > hash(parent->lastName))
parent->rNext = tmp;
else
parent->lNext = tmp;
}
return root;
}
```
>> 請保持原始程式碼一致的縮排 (4 個空白,而非 tab),注意看作業要求中 astyle 的設定方式 [name=jserv]
>感謝老師提醒,已修改[name=儂偉]
* `findName`
```c
entry *findName(char lastname[], entry *root)
{
while (root != NULL) {
if (strcasecmp(lastname, root->lastName) == 0) {
return root;
} else {
if (hash(lastname) > hash(root->lastName)) {
root = root->rNext;
} else
root = root->lNext;
}
}
return NULL;
}
```
---
### 發現問題
>程式碼停在main 43行無法編譯[name=儂偉][time=Sat, Sep 24, 2016 3:30 PM]
### 使用GDB
```
Temporary breakpoint 1, main (argc=1, argv=0x7fffffffdee8) at main.c:25
25 {
(gdb) n
27 int i = 0;
(gdb) n
33 fp = fopen(DICT_FILE, "r");
(gdb) n
34 if (fp == NULL) {
(gdb) n
41 pHead = (entry *) malloc(sizeof(entry));
(gdb) n
42 printf("start");
(gdb) n
43 printf("size of entry : %lu bytes\n", sizeof(entry));
(gdb) n
startsize of entry : 32 bytes
44 printf("end");
(gdb) n
45 e = pHead;
(gdb) n
48 e->rNext = e->lNext = NULL;
(gdb)
```
>>請避免用圖片來呈現文字,請多用文字方塊,這樣才能方便複製和搜尋[name=jserv]
>>發現44行後無法執行的問題,待解決[name=儂偉][time=Sat, Sep 24, 2016 3:32 PM]
>已改善,但目前還找不出問題[name=儂偉][time=Sat, Sep 24, 2016 4:51 PM]
>發現降低字典檔數量就可以執行,還在思考原因[name=儂偉][time=Sat, Sep 24, 2016 5:05 PM]
```
startsize of entry : 32 bytes
endexecution time of append() : 257.225241 sec
execution time of findName() : 0.003507 sec
```
>發現原因,由於append用字串長度,字串長度太接近,樹的深度太深[name=儂偉][time=Sat, Sep 24, 2016 5:13 PM]
---
## 降低字典檔數量為5千

用二元樹會讓append的時間大幅增加,由於每次都要從頭開始判斷,也有可能是因為判斷大小的不夠均勻,導致樹歪掉,用數學來看log~2~50000~=15,照理說應該不會差到超過100倍,還在思考。
### 使用hash(djb2)處理字串 改用5W字典檔
```c
unsigned long hash(char* str)
{
unsigned long hash = 5381;
int c;
while ((c = *str++) != 0)
hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
return hash;
}
```

發現 `findName` 的速度快了5倍,但append時間還是太長,應該可以再進一步優化。
#### 未優化
```
startsize of entry : 136 bytes
endexecution time of append() : 0.008545 sec
execution time of findName() : 0.000757 sec
Performance counter stats for './phonebook_orig' (100 runs):
298,739 cache-misses # 90.767 refs ( +- 0.55% )
329,128 cache-references ( +- 0.45% )
38,030,533 instructions # 0.91 insns per ( +- 0.04% )
41,899,470 cycles ( +- 2.70% )
0.013604583 seconds time elapsed ( +- 2.61% )
```
#### 優化後
```
Performance counter stats for './phonebook_opt' (100 runs):
2,874,890 cache-misses # 3.110 % of all cache refs ( +- 3.77% )
92,449,220 cache-references ( +- 0.22% )
51,082,295,945 instructions # 1.94 insns per cycle ( +- 0.00% )
26,326,936,041 cycles ( +- 0.80% )
9.239935448 seconds time elapsed ( +- 0.89% )
```
發現優化後cache-misses差了30倍
> 對照不同資料結構對 phonebook [效能影響的實驗](https://embedded2015.hackpad.com/note-hw2-phonebook-3mUOSk5J9cu) by [Jingfei](https://github.com/jingfei/phonebook). [name=jserv]
### 優化hash方式來處理字串
```
unsigned long hash(char *key)
{
unsigned int hashVal = 0;
while (*key != '\0') {
hashVal = (hashVal << 5) + *key++;
}
return hashVal % 42737;
}
```
### 大量減少append時間和findname時間
```
size of entry : 32 bytes
execution time of append() : 0.832717 sec
execution time of findName() : 0.000000996 sec
```

### 但cache-misses增加了
```
Performance counter stats for './phonebook_opt' (100 runs):
5,932,784 cache-misses # 36.397 % of all cache refs( +- 0.26% )
16,300,312 cache-references ( +- 0.07% )
3,345,379,637 instructions # 1.23 insns per cycle ( +- 0.00% )
2,711,734,441 cycles ( +- 0.13% )
0.829867464 seconds time elapsed ( +- 0.99% )
```
---
## 實作HashTable
* findName
```c
entry *findName(char lastname[], Hashtable *ht)
{
entry *tmp;
unsigned int index = hash2(lastname,ht);
for(tmp = ht->list[index]; tmp != NULL ; tmp = tmp->pNext)
if(strcasecmp(tmp->lastName,lastname) == 0)
return tmp;
printf("%12s can not be find! \n",lastname);
return NULL;
}
```
*append
```c
Hashtable *append(char lastName[], Hashtable *ht)
{
unsigned int index = hash2(lastName, ht);
entry *new_entry;
new_entry = (entry*) malloc(sizeof(entry));
if (new_entry == NULL) return NULL;
strcpy(new_entry->lastName,lastName);
new_entry->pNext = ht->list[index];
ht->list[index] = new_entry;
return ht;
}
```
### 遇到問題
```
Hashtable size = 42737
size of entry : 24 bytes
execution time of append() : 0.090755 sec
execution time of findName() : 0.000000 sec
```
>findname一直都是0秒 待解決[name=儂偉]
### 使用GDB觀察findname時間
```
Breakpoint 1, main (argc=1, argv=0x7fffffffdee8) at main.c:100
100 clock_gettime(CLOCK_REALTIME, &start);
(gdb) p start
$1 = {tv_sec = 1474797779, tv_nsec = 612819655}
(gdb) n
101 findName(input, e);
(gdb) n
102 clock_gettime(CLOCK_REALTIME, &end);
(gdb) p end
$2 = {tv_sec = 1474797779, tv_nsec = 698358342}
(gdb)
```
計算出findname時間差約為85538687ns
