contributed by <jeff60907
>
請加快進度!!
課程助教
refleex
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 58
Model name: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
Stepping: 9
CPU MHz: 1605.039
CPU max MHz: 3900.0000
CPU min MHz: 1600.0000
BogoMIPS: 6807.01
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7
$ astyle --style=kr --indent=spaces=4 --indent-switches --suffix=none *.[ch]
$ make run
$ make cache-test
size of entry : 136 bytes
execution time of append() : 0.065398 sec
execution time of findName() : 0.006174 sec
Performance counter stats for './phonebook_orig' (100 runs):
199,1215 cache-misses # 95.216 % of all cache refs ( +- 0.53% )
209,1256 cache-references ( +- 0.52% )
2,6079,3262 instructions # 1.40 insns per cycle ( +- 0.02% )
1,8677,7025 cycles ( +- 0.47% )
0.063488818 seconds time elapsed ( +- 2.66% )
最初版本 cache-misses 高達 95.216%,代表大部份的存取資料都是 miss的
接著使用 perf
找熱點 了解 程式中哪個資源佔最多
$ ./phonebook_orig & sudo perf top -p $!
14.85% phonebook_orig [.] findName
perf 出來 findName 為 phonebook中的 熱點,但是在上面 append()花的時間卻比findName()多
參考0140454同學共筆發現有 memory leak 問題,上學期做作業時沒有想過有這個問題,
當程式沒有將記憶體正確釋放將可能導致 memory leak,使用 valgrind
觀察
$ sudo apt-get install valgrind
$ valgrind --leak-check=full ./phonebook_orig
==26473== HEAP SUMMARY:
==26473== in use at exit: 47,586,264 bytes in 349,899 blocks
==26473== total heap usage: 349,906 allocs, 7 frees, 47,596,856 bytes allocated
==26473==
==26473== 816,000 bytes in 6,000 blocks are possibly lost in loss record 1 of 3
==26473== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==26473== by 0x400D5C: append (in /home/jeff60907/spring/phonebook/phonebook_orig)
==26473== by 0x400ACB: main (in /home/jeff60907/spring/phonebook/phonebook_orig)
==26473==
==26473== 46,770,264 (136 direct, 46,770,128 indirect) bytes in 1 blocks are definitely lost in loss record 3 of 3
==26473== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==26473== by 0x400D5C: append (in /home/jeff60907/spring/phonebook/phonebook_orig)
==26473== by 0x400ACB: main (in /home/jeff60907/spring/phonebook/phonebook_orig)
==26473==
==26473== LEAK SUMMARY:
==26473== definitely lost: 136 bytes in 1 blocks
==26473== indirectly lost: 46,770,128 bytes in 343,898 blocks
==26473== possibly lost: 816,000 bytes in 6,000 blocks
==26473== still reachable: 0 bytes in 0 blocks
==26473== suppressed: 0 bytes in 0 blocks
==26473==
==26473== For counts of detected and suppressed errors, rerun with: -v
==26473== Use --track-origins=yes to see where uninitialised values come from
==26473== ERROR SUMMARY: 8 errors from 8 contexts (suppressed: 0 from 0)
可以看到
Heap summary
總共程式使用了47,586,264 bytes in 349,899 blocks
Leak summary
中definitely lost
代表確定有memory leak
問題必需要解決,而indirectly or possibly
是被懷疑的需要再針對程式中分析。
if (pHead->pNext) free(pHead->pNext);
把 free(pHead->pNext)一個一個 free
while(pHead->pNext) {
entry *next = pHead->pNext;
pHead->pNext = next->pNext;
free(next);
}
==26705== LEAK SUMMARY:
==26705== definitely lost: 136 bytes in 1 blocks
==26705== indirectly lost: 0 bytes in 0 blocks
==26705== possibly lost: 0 bytes in 0 blocks
==26705== still reachable: 0 bytes in 0 blocks
==26705== suppressed: 0 bytes in 0 blocks
free(pHead) 移除後發現,有
LEAK SUMMARY
問題,但是其他懷疑的都OK了
補回 free(pHead),解決memory leak
問題
==26738== All heap blocks were freed -- no leaks are possible
find()的時候,只尋找 lastName,但是整個結構包含了其他的資料,把用不到的資料成員自己存放在entry_detail,縮小搜尋的結構。
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
struct __PHONE_BOOK_ENTRY *pNext;
} entry_detail;
typedef struct _LASTNAME_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
entry_detail *detail;
struct _LASTNAME_ENTRY *pNext;
} entry;
size of entry : 32 bytes
execution time of append() : 0.028760 sec
execution time of findName() : 0.002783 sec
size 從 136 bytes 降為 32 bytes
Performance counter stats for './phonebook_opt' (100 runs):
35,6617 cache-misses # 74.485 % of all cache refs ( +- 0.19% )
47,8774 cache-references ( +- 0.52% )
2,4427,3902 instructions # 1.82 insns per cycle ( +- 0.02% )
1,3421,9984 cycles ( +- 0.44% )
0.040133182 seconds time elapsed ( +- 2.18% )
單純更改資料結構 cache-miss 從 95% 降到75% ,改善一點結構就可以讓 cache 搬運成本降低。
使用 BKDR-Hash 進行優化
hash table size = 1024
unsigned int BKDRHash(char *str)
{
unsigned int hash = 0;
while (*str) {
hash = hash * 131 + (*str++);
}
return (hash & 0x7FFFFFFF);
}
size of entry : 32 bytes
execution time of append() : 0.036309 sec
execution time of findName() : 0.000034 sec
Performance counter stats for './phonebook_hash' (100 runs):
32,2706 cache-misses # 50.147 % of all cache refs ( +- 0.31% )
64,3514 cache-references ( +- 0.62% )
2,2868,9908 instructions # 1.69 insns per cycle ( +- 0.02% )
1,3552,0183 cycles ( +- 0.62% )
0.045378226 seconds time elapsed ( +- 2.95% )
append()時間上升蠻多的,是因為建立 hash table 需要額外的時間,但是建立完後只要找到hash value 後需要搜尋的範圍就縮小很多了,findName()效能就大幅得上升。
而且 cahce-miss 也從 74% 變成 50%
讓 hash table 碰撞機率下降是否可以提高效能?
hash table size = 2048
Performance counter stats for './phonebook_hash' (100 runs):
32,6460 cache-misses # 46.424 % of all cache refs ( +- 0.22% )
70,3208 cache-references ( +- 0.73% )
2,2888,2470 instructions # 1.68 insns per cycle ( +- 0.02% )
1,3649,5806 cycles ( +- 0.50% )
0.046204817 seconds time elapsed ( +- 2.65% )
hash table size = 8192
Performance counter stats for './phonebook_hash' (100 runs):
33,9796 cache-misses # 39.912 % of all cache refs ( +- 0.10% )
85,1367 cache-references ( +- 0.16% )
2,2994,7215 instructions # 1.66 insns per cycle ( +- 0.02% )
1,3835,4609 cycles ( +- 0.11% )
0.052061784 seconds time elapsed ( +- 2.74% )
hash table size = 32768
Performance counter stats for './phonebook_hash' (100 runs):
46,1318 cache-misses # 40.176 % of all cache refs ( +- 0.81% )
114,8230 cache-references ( +- 0.19% )
2,3405,9699 instructions # 1.49 insns per cycle ( +- 0.02% )
1,5712,5488 cycles ( +- 0.50% )
0.058200217 seconds time elapsed ( +- 2.32% )
提高 hash table size 可以提高執行速率但是一定程度後就不再改變,另外也可以降低 cache-miss 但是 size 高於某一個值 cache-miss 反而開始上升,並不是愈高愈好。
可以嘗試 "質數" hash table size 實驗
參考閱讀: StackOverflow - Why should hash functions use a prime number modulus?
課程助教
建立 pChild[27]
紀錄字母索引表
typedef struct _LASTNAME_ENTRY {
char ch;
entry_detail *detail;
struct _LASTNAME_ENTRY *pChild[27];
} entry;
size of entry : 232 bytes
execution time of append() : 0.192304 sec
execution time of findName() : 0.000000 sec
Performance counter stats for './phonebook_trie' (100 runs):
565,2061 cache-misses # 92.578 % of all cache refs ( +- 0.05% )
610,5195 cache-references ( +- 0.04% )
16,1597,4402 instructions # 1.53 insns per cycle ( +- 0.01% )
10,5371,1599 cycles ( +- 0.07% )
0.332670737 seconds time elapsed ( +- 0.83% )
cache-misses 暴增 和 append 時間也比未優化更久,因為再建立樹結構的時候就花費很多時間成本, 但是 findName()執行時間就變成是最快的了
memory pool 為事先建立一大塊的記憶體先放著,可以取代傳統的 malloc,因為每一次呼叫 malloc 都會有系統成本存在,之後 append() 需要記憶體空間就可以從 pool 存取即可。
memory pool 的大小
trie 2 * 1024 * 1024
others 512 * 1024
trie 需要的記憶體空間大很多,給太小就會造成記憶體錯誤,可以驗證 trie 上述優點是速度快速,缺點是耗費記憶體
使用了memory pool 大幅地改善 append()的效能,trie 的時間減少了一半多,表示 trie 建立索引表時,呼叫 malloc時間成本很高。
jeff60907