Phonebook

  • Original version 把實做交錯在 main.c 與 phonebook_orig.c,導致後續要加入優化版本較不方便,目前有兩項解決辦法:

    1. 在 main.c 裡將各自相關的部份使用巨集判斷式隔離,在編譯指令加入定義gcc -D,決定該用哪個版本,也可定義在標頭檔。這方法會在 main.c 放一堆巨集判斷式,看起來非常凌亂。
    2. main.c 裡使用一致介面,定義實做在對應 .c 檔內:
      ​​void *init();
      ​​void *findName(char [], void *);
      ​​void *append(char [], void *);
      ​​void memory_free(void *);
      
      這個方式會需要對 original version 稍微進行修改。為了拿掉 entry *e 這個指標,在 append 時選擇對頭端進行,鍊結方式變成反向,而搜尋字串則是等效改成搜尋 "aaaaaaugh"。
  • Perf (Performance Event)

    • cache-misses: Usually this indicates Last Level Cache misses.
    • cache-references: Usually this indicates Last Level Cache accesses but this may vary depending on your CPU.

    Cache misses 並不是所有種類的 cache miss 次數的總和,而是最後一層 cache 的 miss 次數。我認為這是優化版本其 cache-references 次數會降低的原因。透過更聰明的程式編排,能在更上層的 cache (smaller but faster) 裡找到需要的資料,所以參照 Last Level Cach 的次數與在此發生 miss 的次數也就跟著降低。

    References:

  • Original version 的動態記憶體釋放應該有誤:

    if(pHead->pNext) free(pHead->pNext);free(pHead);
    
  • __builtin___clear_cache(char *begin, char *end);
    This function is used to flush the processor's instruction cache for the region of memory between begin inclusive and end exclusive.

  • Gnuplot command for element distribution

    ​> set title 'element distribution'
    ​> set xrange[0 : 65535]
    ​> set terminal png
    ​> set output 'xxx.png'
    ​> plot "xxx.txt" with points pointtype 0 notitle
    

Original Version

size of entry : 136 bytes
execution time of append() : 0.035750 sec
execution time of findName() : 0.003906 sec
 Performance counter stats for './phonebook_orig' (100 runs):

     2,010,723	cache-misses		#87.332% of all cache refs	( +-  0.13% )
     2,302,381	cache-references                             		( +-  0.07% )
   387,959,515	instructions    	#1.51  insns per cycle		( +-  0.01% )
   256,717,122	cycles							( +-  0.13% )

0.070331387 seconds time elapsed ( +-  1.31% )

Optimized Version

Hash

以下使用了三種雜湊函式,且分別對原始版本進行效能比較。雜湊表大小為 216 (65536) ,碰撞(collision)的情形以鍊結串列方式處理。

於 Makefile 內編譯選項定義使用雜湊函式符號,全域函數指標 uint32_t (*hash_func)(unsigned char *)會指向對應函式。

以下為個雜湊函式優化情況:

  • Fowler–Noll–Vo(FNV)
    FNV hashes are designed to be fast while maintaining a low collision rate. The high dispersion of the FNV hashes makes them well suited for hashing nearly identical strings such as URLs, hostnames, filenames, text, IP addresses, etc.

    ​size of entry : 32 bytes
    ​execution time of append() : 0.052116 sec
    ​execution time of findName() : 0.000000 sec
    
    ​Performance counter stats for './phonebook_opt' (100 runs):
    ​    820,913	cache-misses		#58.291% of all cache refs	( +-  0.16% )
    ​  1,408,308	cache-references					( +-  0.12% )
    ​280,872,352	instructions		#1.10  insns per cycle		( +-  0.02% )
    ​255,427,037	cycles							( +-  0.28% )
    ​
    ​0.043733445 seconds time elapsed ( +-  1.57% )
    


  • DJB2

    ​size of entry : 32 bytes
    ​execution time of append() : 0.048434 sec
    ​execution time of findName() : 0.000000 sec
    
    ​Performance counter stats for './phonebook_opt' (100 runs):
    ​    779,938	cache-misses		#57.892% of all cache refs	( +-  0.12% )
    ​  1,347,239	cache-references					( +-  0.09% )
    ​291,625,741	instructions		#1.28  insns per cycle		( +-  0.02% )
    ​227,187,432	cycles							( +-  0.43% )
    ​
    ​0.076748609 seconds time elapsed ( +-  2.21% )
    


  • SDBM

    ​size of entry : 32 bytes
    ​execution time of append() : 0.041675 sec
    ​execution time of findName() : 0.000000 sec
    
    ​Performance counter stats for './phonebook_opt' (100 runs):
    ​    862,401	cache-misses		#64.358% of all cache refs	( +-  0.13% )
    ​  1,340,008	cache-references					( +-  0.09% )
    ​294,506,718	instructions		#1.18  insns per cycle		( +-  0.02% )
    ​248,938,359	cycles							( +-  0.22% )
    ​
    ​0.067740143 seconds time elapsed ( +-  1.24% )
    


Source of this article: https://hackmd.io/s/BJBBVaqT

Reference: