Original version 把實做交錯在 main.c 與 phonebook_orig.c,導致後續要加入優化版本較不方便,目前有兩項解決辦法:
gcc -D
,決定該用哪個版本,也可定義在標頭檔。這方法會在 main.c 放一堆巨集判斷式,看起來非常凌亂。void *init();
void *findName(char [], void *);
void *append(char [], void *);
void memory_free(void *);
Perf (Performance Event)
Cache misses 並不是所有種類的 cache miss 次數的總和,而是最後一層 cache 的 miss 次數。我認為這是優化版本其 cache-references 次數會降低的原因。透過更聰明的程式編排,能在更上層的 cache (smaller but faster) 裡找到需要的資料,所以參照 Last Level Cach 的次數與在此發生 miss 的次數也就跟著降低。
References:
Original version 的動態記憶體釋放應該有誤:
if(pHead->pNext) free(pHead->pNext);
free(pHead);
__builtin___clear_cache(char *begin, char *end);
This function is used to flush the processor's instruction cache for the region of memory between begin inclusive and end exclusive.
Gnuplot command for element distribution
> set title 'element distribution'
> set xrange[0 : 65535]
> set terminal png
> set output 'xxx.png'
> plot "xxx.txt" with points pointtype 0 notitle
size of entry : 136 bytes
execution time of append() : 0.035750 sec
execution time of findName() : 0.003906 sec
Performance counter stats for './phonebook_orig' (100 runs):
2,010,723 cache-misses #87.332% of all cache refs ( +- 0.13% )
2,302,381 cache-references ( +- 0.07% )
387,959,515 instructions #1.51 insns per cycle ( +- 0.01% )
256,717,122 cycles ( +- 0.13% )
0.070331387 seconds time elapsed ( +- 1.31% )
以下使用了三種雜湊函式,且分別對原始版本進行效能比較。雜湊表大小為 216 (65536) ,碰撞(collision)的情形以鍊結串列方式處理。
於 Makefile 內編譯選項定義使用雜湊函式符號,全域函數指標 uint32_t (*hash_func)(unsigned char *)
會指向對應函式。
以下為個雜湊函式優化情況:
Fowler–Noll–Vo(FNV)
FNV hashes are designed to be fast while maintaining a low collision rate. The high dispersion of the FNV hashes makes them well suited for hashing nearly identical strings such as URLs, hostnames, filenames, text, IP addresses, etc.
size of entry : 32 bytes
execution time of append() : 0.052116 sec
execution time of findName() : 0.000000 sec
Performance counter stats for './phonebook_opt' (100 runs):
820,913 cache-misses #58.291% of all cache refs ( +- 0.16% )
1,408,308 cache-references ( +- 0.12% )
280,872,352 instructions #1.10 insns per cycle ( +- 0.02% )
255,427,037 cycles ( +- 0.28% )
0.043733445 seconds time elapsed ( +- 1.57% )
DJB2
size of entry : 32 bytes
execution time of append() : 0.048434 sec
execution time of findName() : 0.000000 sec
Performance counter stats for './phonebook_opt' (100 runs):
779,938 cache-misses #57.892% of all cache refs ( +- 0.12% )
1,347,239 cache-references ( +- 0.09% )
291,625,741 instructions #1.28 insns per cycle ( +- 0.02% )
227,187,432 cycles ( +- 0.43% )
0.076748609 seconds time elapsed ( +- 2.21% )
SDBM
size of entry : 32 bytes
execution time of append() : 0.041675 sec
execution time of findName() : 0.000000 sec
Performance counter stats for './phonebook_opt' (100 runs):
862,401 cache-misses #64.358% of all cache refs ( +- 0.13% )
1,340,008 cache-references ( +- 0.09% )
294,506,718 instructions #1.18 insns per cycle ( +- 0.02% )
248,938,359 cycles ( +- 0.22% )
0.067740143 seconds time elapsed ( +- 1.24% )
Source of this article: https://hackmd.io/s/BJBBVaqT
Reference: