1.學習perf工具
2.學習gunplot工具
3.學習git工具
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 61
Model name: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
Stepping: 4
CPU MHz: 2200.945
CPU max MHz: 2700.0000
CPU min MHz: 500.0000
BogoMIPS: 4393.70
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 3072K
降低cache miss以及提升效能,可行的方向有
1.降低搬移的資料量
2.使用binary search tree建立資料表提升查詢效能
3.hash function 來提升查詢效能
33.83% phonebook_orig [.] findName
28.97% libc-2.23.so [.] __strcasecmp_l_avx
5.98% [kernel] [k] clear_page_c_e
3.25% [kernel] [k] _raw_spin_lock
3.24% phonebook_orig [.] main
3.09% [kernel] [k] free_pcppages_bulk
2.89% libc-2.23.so [.] _IO_fgets
2.29% libc-2.23.so [.] __strcasecmp_avx
2.04% libc-2.23.so [.] _int_malloc
1.57% [kernel] [k] alloc_pages_vma
1.32% libc-2.23.so [.] _IO_getline_info
1.18% phonebook_orig [.] append
345,0778 cache-misses # 89.510 % of all cache refs ( +- 0.15% )
385,5173 cache-references ( +- 0.24% )
2,6173,7428 instructions # 1.06 insns per cycle ( +- 0.02% )
2,4631,4013 cycles ( +- 0.71% )
cache miss大約為90%
為了減少記憶體的搬移,先嘗試降低entry的大小,將除了搜尋引所之外的lastName移到別的struct
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
struct __PHONE_BOOK_DETAIL *pDetail;
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
typedef struct __PHONE_BOOK_DETAIL {
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
} detail;
cache miss降低到69%左右
124,5225 cache-misses # 68.849 % of all cache refs ( +- 0.22% )
180,8641 cache-references ( +- 0.35% )
2,4419,7204 instructions # 1.43 insns per cycle ( +- 0.02% )
1,7110,3445 cycles ( +- 0.68% )
加入binary search tree的結構,同時也加入hash value
typedef struct __BST {
long hash_v;
struct __BST *left;
struct __BST *right;
struct __PHONE_BOOK_ENTRY *data;
} node;
在append中加入了簡單的tree的insert,比較hash value的值,依序插入node
node *currunt=head;
if (head == NULL) {
head = new_node;
return NULL;
}
for (;;) {
if (new_node->hash_v > currunt->hash_v) {
if (currunt->right == NULL) {
currunt->right = new_node;
return NULL;
} else {
currunt = currunt->right;
}
} else {
if (currunt->left == NULL) {
currunt->left = new_node;
return NULL;
} else {
currunt = currunt->left;
}
}
}
這裡在一開始我遇到一個問題,就是hash function的選擇
一開始我在stackoverflow上找到的djb2 hash function by Dan Bernstein
unsigned long hash(unsigned char *str)
{
unsigned long hash = 5381;
int c;
while (c = *str++)
hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
return hash;
}
使用這個hash function發現等了很久程式都還沒跑完
於是在append function中把值印出來
跑了五分鐘字母開頭為a的名字都還沒跑完
我猜測應該是因為測資是有排序過,所以hash value可能讓我的tree相當的傾斜
造成每次append一筆資料都必須造訪過非常多的node,造成append的速度相當緩慢
再來選擇其他hash function後就沒有問題了
hash function使用Jenkins hash function
參考資料:
Jenkins hash function
uint32_t hash(char * s)
{
uint32_t hash = 0;
for(; *s; ++s) {
hash += *s;
hash += (hash << 10);
hash ^= (hash >> 6);
}
hash += (hash << 3);
hash ^= (hash >> 11);
hash += (hash << 15);
return hash;
}
root@XXX:/Desktop/phonebook# ./phonebook_opt
size of entry : 32 bytes
execution time of append() : 0.378361 sec
execution time of findName() : 0.000000 sec
平均的append花費時間大概是0.37 大約是0.069的5.36倍
但是在findName部分的時間就幾乎降到0
我想電話簿建立一次多花一些時間但是在搜尋上獲得的好處應該是可以彌補的
在原始的append部分我發現了每次都會呼叫malloc
如果能宣告一個夠長的陣列來儲存這些資料,當陣列滿時再realloc就可以再提升效能