Phonebook

Original version 把實做交錯在 main.c 與 phonebook_orig.c，導致後續要加入優化版本較不方便，目前有兩項解決辦法：
1. 在 main.c 裡將各自相關的部份使用巨集判斷式隔離，在編譯指令加入定義gcc -D，決定該用哪個版本，也可定義在標頭檔。這方法會在 main.c 放一堆巨集判斷式，看起來非常凌亂。
2. main.c 裡使用一致介面，定義實做在對應 .c 檔內：
```
void *init();
void *findName(char [], void *);
void *append(char [], void *);
void memory_free(void *);
```
  這個方式會需要對 original version 稍微進行修改。為了拿掉 entry *e 這個指標，在 append 時選擇對頭端進行，鍊結方式變成反向，而搜尋字串則是等效改成搜尋 "aaaaaaugh"。
Perf (Performance Event)
- cache-misses: Usually this indicates Last Level Cache misses.
- cache-references: Usually this indicates Last Level Cache accesses but this may vary depending on your CPU.
Cache misses 並不是所有種類的 cache miss 次數的總和，而是最後一層 cache 的 miss 次數。我認為這是優化版本其 cache-references 次數會降低的原因。透過更聰明的程式編排，能在更上層的 cache (smaller but faster) 裡找到需要的資料，所以參照 Last Level Cach 的次數與在此發生 miss 的次數也就跟著降低。

References:
- http://web.eece.maine.edu/~vweaver/projects/perf_events/perf_event_open.html
- http://wiki.csie.ncku.edu.tw/embedded/perf-tutorial

Original version 的動態記憶體釋放應該有誤：

if(pHead->pNext) free(pHead->pNext);
free(pHead);

__builtin___clear_cache(char *begin, char *end);
This function is used to flush the processor's instruction cache for the region of memory between begin inclusive and end exclusive.

Gnuplot command for element distribution

> set title 'element distribution'
> set xrange[0 : 65535]
> set terminal png
> set output 'xxx.png'
> plot "xxx.txt" with points pointtype 0 notitle

Original Version

size of entry : 136 bytes
execution time of append() : 0.035750 sec
execution time of findName() : 0.003906 sec

 Performance counter stats for './phonebook_orig' (100 runs):

     2,010,723	cache-misses		#87.332% of all cache refs	( +-  0.13% )
     2,302,381	cache-references                             		( +-  0.07% )
   387,959,515	instructions    	#1.51  insns per cycle		( +-  0.01% )
   256,717,122	cycles							( +-  0.13% )

0.070331387 seconds time elapsed ( +-  1.31% )

Optimized Version

Hash

以下使用了三種雜湊函式，且分別對原始版本進行效能比較。雜湊表大小為 2¹⁶ (65536) ，碰撞(collision)的情形以鍊結串列方式處理。

於 Makefile 內編譯選項定義使用雜湊函式符號，全域函數指標 uint32_t (*hash_func)(unsigned char *)會指向對應函式。

以下為個雜湊函式優化情況：

Fowler–Noll–Vo(FNV)
FNV hashes are designed to be fast while maintaining a low collision rate. The high dispersion of the FNV hashes makes them well suited for hashing nearly identical strings such as URLs, hostnames, filenames, text, IP addresses, etc.

size of entry : 32 bytes
execution time of append() : 0.052116 sec
execution time of findName() : 0.000000 sec

Performance counter stats for './phonebook_opt' (100 runs):
    820,913	cache-misses		#58.291% of all cache refs	( +-  0.16% )
  1,408,308	cache-references					( +-  0.12% )
280,872,352	instructions		#1.10  insns per cycle		( +-  0.02% )
255,427,037	cycles							( +-  0.28% )

0.043733445 seconds time elapsed ( +-  1.57% )

DJB2

size of entry : 32 bytes
execution time of append() : 0.048434 sec
execution time of findName() : 0.000000 sec

Performance counter stats for './phonebook_opt' (100 runs):
    779,938	cache-misses		#57.892% of all cache refs	( +-  0.12% )
  1,347,239	cache-references					( +-  0.09% )
291,625,741	instructions		#1.28  insns per cycle		( +-  0.02% )
227,187,432	cycles							( +-  0.43% )

0.076748609 seconds time elapsed ( +-  2.21% )

SDBM

size of entry : 32 bytes
execution time of append() : 0.041675 sec
execution time of findName() : 0.000000 sec

Performance counter stats for './phonebook_opt' (100 runs):
    862,401	cache-misses		#64.358% of all cache refs	( +-  0.13% )
  1,340,008	cache-references					( +-  0.09% )
294,506,718	instructions		#1.18  insns per cycle		( +-  0.02% )
248,938,359	cycles							( +-  0.22% )

0.067740143 seconds time elapsed ( +-  1.24% )

Source of this article: https://hackmd.io/s/BJBBVaqT

Reference:

FNV_1a_16: http://isthe.com/chongo/tech/comp/fnv/
DJB2 & SDBM: http://www.cse.yorku.ca/~oz/hash.html