HaoTse
>jkrvivian
>$ sudo apt-get update
$ sudo apt-get install build-essential
$ sudo apt-get install linux-tools-common linux-tools-generic
$ sudo apt-get install astyle colordiff gnuplot
$uname -a
Linux tse-Aspire-V5-573G 4.4.0-38-generic
$ lscpu | grep cache
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 3072K
$ ./phonebook_orig
size of entry : 136 bytes
execution time of append() : 0.047622 sec
execution time of findName() : 0.005108 sec
一個entry(PhoneBook size)為 136bytes,L1 cache的size為 32KB = 32 * 1024bytes,所以(32 * 1024) / (136 * 8) = 30.12,只能存30筆左右。
$ make cache-test
Performance counter stats for './phonebook_orig' (100 runs):
96,4439 cache-misses # 87.936 % of all cache refs ( +- 0.43% )
109,6754 cache-references ( +- 0.28% )
2,6120,5049 instructions # 1.50 insns per cycle ( +- 0.02% )
1,7371,1341 cycles ( +- 0.39% )
0.070385042 seconds time elapsed ( +- 0.57% )
$ ./phonebook_orig & sudo perf top -p $!
使用perf尋找熱點鄭皓澤
23.06% libc-2.23.so [.] __strcasecmp_l_avx
22.35% phonebook_orig [.] findName
8.66% phonebook_orig [.] main
5.41% libc-2.23.so [.] _int_malloc
5.21% [kernel] [k] clear_page_c_e
4.01% phonebook_orig [.] append
3.96% libc-2.23.so [.] __memcpy_sse2
3.65% libc-2.23.so [.] _IO_fgets
3.20% [unknown] [k] 0x00007ff9db93e02e
2.73% [kernel] [k] release_pages
2.17% libc-2.23.so [.] memchr
1.40% libc-2.23.so [.] __strcpy_sse2_unaligned
1.40% libc-2.23.so [.] _IO_getline_info
1.37% [kernel] [k] uncharge_list
1.36% [kernel] [k] page_remove_rmap
1.28% [kernel] [k] alloc_pages_vma
0.94% [kernel] [k] lru_cache_add_active_or_unevictable
0.81% [kernel] [k] __rmqueue.isra.79
0.71% [kernel] [k] get_page_from_freelist
0.71% libc-2.23.so [.] _IO_getline
0.71% [kernel] [k] pagevec_lru_move_fn
0.71% libc-2.23.so [.] malloc
0.70% libc-2.23.so [.] __strcasecmp_avx
0.70% phonebook_orig [.] strcasecmp@plt
0.70% [kernel] [k] __pagevec_lru_add_fn
0.69% [kernel] [k] free_hot_cold_page
0.68% [kernel] [k] unmap_page_range
0.67% [kernel] [k] free_pcppages_bulk
0.05% [kernel] [k] __perf_event_enable
0.00% [kernel] [k] nmi_handle
0.00% [kernel] [k] native_write_msr_safe
使用 perf 的結果中的第2行我們可以看出 findName() 佔了 22.35%,第4行 append() 則佔了4.01%,findName()為熱點,著手改善findName()的速度。
前面提到我的電腦的L1 cache最多只能放30個entry,但總共有35萬個單字,因此一定會發生頻繁的cache miss。
根據題目要求「有沒有找到 last name」,我們只需要搜尋last name,因此其他資訊是可以忽略不看的。所以在這裡我就另外宣告了一個struct儲存last name,將他取代原本的entry,另外將原本的entry改為detail。
typedef struct __LAST_NAME_ENTRY{
char lastName[MAX_LAST_NAME_SIZE];
detail *detail;
struct __LAST_NAME_ENTRY *pNext;
} entry;
$ ./phonebook_opt
size of entry : 32 bytes
execution time of append() : 0.053768 sec
execution time of findName() : 0.002357 sec
entry的大小很明顯的從 136bytes下降至 32bytes,cache就可以放入(32 * 1024) / (32 * 8) = 128筆資料,大大的增加cache hit的機會。
findName()的執行時間也明顯下降了。
$ make cache-test
Performance counter stats for './phonebook_opt' (100 runs):
14,9402 cache-misses # 28.386 % of all cache refs ( +- 1.18% )
52,6332 cache-references ( +- 1.64% )
2,4425,4756 instructions # 1.49 insns per cycle ( +- 0.02% )
1,6354,4468 cycles ( +- 1.61% )
0.071050083 seconds time elapsed ( +- 1.84% )
cache miss 的機率明顯的從 87.936%下降至 23.386%。
$ ./phonebook_opt & sudo perf top -p $!
28.64% libc-2.23.so [.] __strcasecmp_l_avx
20.16% phonebook_opt [.] main
8.76% phonebook_opt [.] findName
8.33% libc-2.23.so [.] malloc
7.41% libc-2.23.so [.] __memcpy_sse2
4.36% libc-2.23.so [.] _IO_getline_info
2.83% libc-2.23.so [.] _IO_fgets
2.76% libc-2.23.so [.] _int_malloc
2.60% libc-2.23.so [.] memchr
2.59% phonebook_opt [.] append
2.50% libc-2.23.so [.] __strcasecmp_avx
1.43% [kernel] [.] native_irq_return_iret
1.34% libc-2.23.so [.] __strcpy_sse2_unaligned
1.29% [unknown] [k] 0x00007f9a8187d02e
1.26% [kernel] [k] do_exit
1.25% [kernel] [k] unmap_page_range
1.24% [kernel] [k] free_pages_prepare
1.24% [kernel] [k] free_pcppages_bulk
0.01% [kernel] [k] native_sched_clock
0.00% [kernel] [k] native_write_msr_safe
findName()所佔的效能也從 22.35%降至 8.76%。
從方法一改變資料結構後已經可以發現findName()明顯的效能進步,不過考慮到單字量的大小,就想到可以從hash function著手,利用last name當作key,對hash table做搜尋一定比每次都要從頭搜尋還要快。
根據參考資料看出BKDRHash的平均表現是最好的,因此在這裡選擇使用它。
// BKDR Hash Function
unsigned int BKDRHash(char *str)
{
unsigned int seed = 131; // 31 131 1313 13131 131313 etc..
unsigned int hash = 0;
while (*str)
{
hash = hash * seed + (*str++);
}
//HASH_TABLE_SIZE define in phonebook_hash.h
return (hash % HASH_TABLE_SIZE);
}
並先將table size設為42737,宣告方式變為
entry *pHead[HASH_TABLE_SIZE], *e[HASH_TABLE_SIZE];
這裡出現了小bug,花了不少時間學習使用gdb。鄭皓澤
$ ./phonebook_hash
size of entry : 32 bytes
execution time of append() : 0.079218 sec053768
execution time of findName() : 0.000000 sec
findName()的時間直接降至顯示0秒,但append()卻上升了0.02545秒。
$ make cache-test
Performance counter stats for './phonebook_hash' (100 runs):
105,9511 cache-misses # 64.586 % of all cache refs ( +- 0.85% )
164,0460 cache-references ( +- 0.31% )
2,9978,3164 instructions # 1.13 insns per cycle ( +- 0.02% )
2,6638,1339 cycles ( +- 0.45% )
0.111340353 seconds time elapsed ( +- 0.52% )
cache miss 的機率從 23.386%上升至 64.586%。
HaoTse
sysprog21
phonebook
hash function