# phonebook ## 學習目標 1.學習perf工具 2.學習gunplot工具 3.學習git工具 ## 環境配置 ```shell Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 Thread(s) per core: 2 Core(s) per socket: 2 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 61 Model name: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz Stepping: 4 CPU MHz: 2200.945 CPU max MHz: 2700.0000 CPU min MHz: 500.0000 BogoMIPS: 4393.70 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 3072K ``` ## 改進目標 降低cache miss以及提升效能,可行的方向有 1.降低搬移的資料量 2.使用binary search tree建立資料表提升查詢效能 3.hash function 來提升查詢效能 ## 原始效能分析 ```shell 33.83% phonebook_orig [.] findName 28.97% libc-2.23.so [.] __strcasecmp_l_avx 5.98% [kernel] [k] clear_page_c_e 3.25% [kernel] [k] _raw_spin_lock 3.24% phonebook_orig [.] main 3.09% [kernel] [k] free_pcppages_bulk 2.89% libc-2.23.so [.] _IO_fgets 2.29% libc-2.23.so [.] __strcasecmp_avx 2.04% libc-2.23.so [.] _int_malloc 1.57% [kernel] [k] alloc_pages_vma 1.32% libc-2.23.so [.] _IO_getline_info 1.18% phonebook_orig [.] append ``` ```shell 345,0778 cache-misses # 89.510 % of all cache refs ( +- 0.15% ) 385,5173 cache-references ( +- 0.24% ) 2,6173,7428 instructions # 1.06 insns per cycle ( +- 0.02% ) 2,4631,4013 cycles ( +- 0.71% ) ``` cache miss大約為90% ### 效能改進(struct) 為了減少記憶體的搬移,先嘗試降低entry的大小,將除了搜尋引所之外的lastName移到別的struct ```shell= typedef struct __PHONE_BOOK_ENTRY { char lastName[MAX_LAST_NAME_SIZE]; struct __PHONE_BOOK_DETAIL *pDetail; struct __PHONE_BOOK_ENTRY *pNext; } entry; typedef struct __PHONE_BOOK_DETAIL { char firstName[16]; char email[16]; char phone[10]; char cell[10]; char addr1[16]; char addr2[16]; char city[16]; char state[2]; char zip[5]; } detail; ``` cache miss降低到69%左右 ```shell 124,5225 cache-misses # 68.849 % of all cache refs ( +- 0.22% ) 180,8641 cache-references ( +- 0.35% ) 2,4419,7204 instructions # 1.43 insns per cycle ( +- 0.02% ) 1,7110,3445 cycles ( +- 0.68% ) ``` ## 效能改進(BST) 加入binary search tree的結構,同時也加入hash value ```shell typedef struct __BST { long hash_v; struct __BST *left; struct __BST *right; struct __PHONE_BOOK_ENTRY *data; } node; ``` 在append中加入了簡單的tree的insert,比較hash value的值,依序插入node ```shell= node *currunt=head; if (head == NULL) { head = new_node; return NULL; } for (;;) { if (new_node->hash_v > currunt->hash_v) { if (currunt->right == NULL) { currunt->right = new_node; return NULL; } else { currunt = currunt->right; } } else { if (currunt->left == NULL) { currunt->left = new_node; return NULL; } else { currunt = currunt->left; } } } ``` 這裡在一開始我遇到一個問題,就是hash function的選擇 一開始我在stackoverflow上找到的djb2 hash function by Dan Bernstein ```shell= unsigned long hash(unsigned char *str) { unsigned long hash = 5381; int c; while (c = *str++) hash = ((hash << 5) + hash) + c; /* hash * 33 + c */ return hash; } ``` 使用這個hash function發現等了很久程式都還沒跑完 於是在append function中把值印出來 跑了五分鐘字母開頭為a的名字都還沒跑完 我猜測應該是因為測資是有排序過,所以hash value可能讓我的tree相當的傾斜 造成每次append一筆資料都必須造訪過非常多的node,造成append的速度相當緩慢 再來選擇其他hash function後就沒有問題了 hash function使用Jenkins hash function 參考資料: [Jenkins hash function](https://en.wikipedia.org/wiki/Jenkins_hash_function) ```shell= uint32_t hash(char * s) { uint32_t hash = 0; for(; *s; ++s) { hash += *s; hash += (hash << 10); hash ^= (hash >> 6); } hash += (hash << 3); hash ^= (hash >> 11); hash += (hash << 15); return hash; } ``` ### 改善後效能 ```shell root@XXX:/Desktop/phonebook# ./phonebook_opt size of entry : 32 bytes execution time of append() : 0.378361 sec execution time of findName() : 0.000000 sec ``` 平均的append花費時間大概是0.37 大約是0.069的5.36倍 ![](https://i.imgur.com/8qAAiQb.png) 但是在findName部分的時間就幾乎降到0 我想電話簿建立一次多花一些時間但是在搜尋上獲得的好處應該是可以彌補的 ## 未來展望 在原始的append部分我發現了每次都會呼叫malloc 如果能宣告一個夠長的陣列來儲存這些資料,當陣列滿時再realloc就可以再提升效能