# 2017q1 Homework1 (phonebook)
contributed by < `yih6208` >
>請詳細閱讀作業要求(每份作業共筆的標題固定為 2017q1 Homework1 (phonebook),其中 “phonebook” 更換為對應的作業名稱,注意:後者是小寫),這是課程最低要求[name=課程助教][color=red]
>>已更改 謝謝助教提醒[name=yih6208][color=blue]
### Reviewed by <`vtim9907`>
- hash function 優化的部份,append 的時間不增反減的地方很特殊,可是下面的回答應該是跟 findName 比較有關係,還是看不出來如何作到的?
- 可以做更多的優化嘗試,甚至做到讓 append 和 findName 的時間都大幅縮短。
- 可以更貼近現實,除了現有的新增和查詢,可以嘗試做出刪除的功能,並進一步比較各結構刪除的效能。
- 尚未回答本作業需要回答的問題。
## Develop Environment
### CPU's information
Used Command `$ lscpu`
```shell
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 60
Model name: Intel(R) Core(TM) i5-4210H CPU @ 2.90GHz
Stepping: 3
CPU MHz: 3399.853
CPU max MHz: 3500.0000
CPU min MHz: 800.0000
BogoMIPS: 5786.63
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 3072K
NUMA node0 CPU(s): 0-3
```
## First Step - Move unimportant variables out
I first think about how does the cache size effects on the miss rate.After watch Jserv's video ,I realize that I have to decrease the size of the PHONE_BOOK_ENTRY struct ,so I put the unimportant information in PHONE_BOOK_ENTRY struct out into a new struct exclude the "Lastname" which we are mainly concerned. After this change I successfully decrease the miss rate around 43%.
```C=
//The unimportant information
struct PhoneBookInformation{
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
};
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
struct PhoneBookInformation *PhoneInf; //use this pointer to store the others' information
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
```
### Optimize Result
#### Missing Rate
`$ echo 1 | sudo tee /proc/sys/vm/drop_caches`
`$ make cache-test`
```shell=
Performance counter stats for './phonebook_orig' (100 runs):
1,175,295 cache-misses # 94.250 % of all cache refs ( +- 0.22% )
1,246,994 cache-references ( +- 0.19% )
261,872,163 instructions # 1.35 insn per cycle ( +- 0.02% )
194,577,270 cycles ( +- 0.26% )
0.057577720 seconds time elapsed ( +- 0.40% )
performance counter stats for './phonebook_opt' (100 runs):
143,402 cache-misses # 43.720 % of all cache refs ( +- 0.39% )
328,000 cache-references ( +- 0.27% )
206,811,777 instructions # 1.63 insn per cycle ( +- 0.02% )
127,065,175 cycles ( +- 0.12% )
0.037148397 seconds time elapsed ( +- 0.14% )
```
#### The Excution time plot

### First Conclude
As you can see the above plot , after changing the struct's instructure by moving the unimportant variable to another struct ,the searching time decreased obviously. It's only need 40% time to find the name out compared to the oringinal one .
## Second Step - Use Hash Search
After reading others' notes and the collaborative writing last year , I'm trying to use the hash table to increase my searching efficiency. As I read from the tutorial on the others' website. I found out that the Hash search's time complexity in best situation is O(1), so I should be able to decrease my searching time to 0.
```C
/*The Variable to store the Hash Table*/
/*I initialize the hash table before it's used*/
#define MAX_HASH_TABLES 174991
entry HashTable[MAX_HASH_TABLES];
```
### BKDR Hashing
After reading some website's hashing function performance analyzing, I finally choose the BKDRHash. It's collide chance is minium and its speed is the fastest one.
http://www.cnblogs.com/tibetanmastiff/p/hash.html
```C=
unsigned int BKDRHash(char *str)
{
unsigned int hash = 0;
while (*str) {
hash = hash * MAX_HASH_TABLES + (*str++);
}
hash &= 0x7FFFFFFF;
hash %= MAX_HASH_TABLES;
return hash;
}
```
### Hashing Optimize Result
#### Missing Rate
```shell=
Performance counter stats for './phonebook_orig' (100 runs):
1,146,719 cache-misses # 91.710 % of all cache refs ( +- 0.13% )
1,250,378 cache-references ( +- 0.27% )
260,580,570 instructions # 1.24 insns per cycle ( +- 0.02% )
209,346,095 cycles ( +- 0.31% )
0.056890723 seconds time elapsed ( +- 0.35% )
Performance counter stats for './phonebook_opt' (100 runs):
143,252 cache-misses # 35.434 % of all cache refs ( +- 2.18% )
404,283 cache-references ( +- 0.24% )
154,915,863 instructions # 1.29 insns per cycle ( +- 0.03% )
120,175,037 cycles ( +- 0.88% )
0.032652262 seconds time elapsed ( +- 0.94% )
```
#### The Excution time plot

>> 使用 hash function 後 append() 時間不增反減?
>> [name=課程助教][color=red]
>>> 因為當我建出hash table後,相較於原本將所有資料串接的模式,從頭開始搜尋的node數變少,也因此縮短了時間。