2017q1 Homework1 (phonebook)

contributed by <heathcliffYang>

Reviewed by `hunng`

利用多個 hash function 來探討其差異
- 但能簡單介紹一下每個的內容，並在最後圖表中附上所有內容，能更好的看出優劣
利用改變不同查詢的值，讓實驗能更符合實際情況
- 因應輸出的值須修改一下 y 軸單位

開發環境

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 58
Model name:            Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz
Stepping:              9
CPU MHz:               1200.000
CPU max MHz:           3100.0000
CPU min MHz:           1200.0000
BogoMIPS:              4988.88
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K![](https://i.imgur.com/qUGUX4K.png)

L2 cache:              256K
L3 cache:              3072K
NUMA node0 CPU(s):     0-3

cache

Handle 0x0001, DMI type 7, 19 bytes
Cache Information
	Socket Designation: Unknown
	Configuration: Enabled, Not Socketed, Level 1
	Operational Mode: Write Back
	Location: Internal
	Installed Size: 32 kB
	Maximum Size: 32 kB
	Supported SRAM Types:
		Asynchronous
	Installed SRAM Type: Asynchronous
	Speed: Unknown
	Error Correction Type: Single-bit ECC
	System Type: Other
	Associativity: 16-way Set-associative

原版

Performance counter stats for './phonebook_orig' (100 runs):

     2,037,678      cache-misses              #   95.827 % of all cache refs      ( +-  0.65% )
     2,126,412      cache-references                                              ( +-  0.65% )
   260,791,029      instructions              #    1.44  insns per cycle          ( +-  0.02% )
   180,682,430      cycles                                                        ( +-  0.27% )

   0.069068516 seconds time elapsed                                          ( +-  1.85% )

其他資料結構與搜尋方法

hash function 版本的優化

將資料檢索與詳細資料分開，以 entry 的大小簡化為基本，繼續之前沒能實作的 hash function 的方法。
TABLE_SIZE 的值會影響到 append() 所花的時間與 cache miss
不同的 hash function
Hash function 的 distribution 的實驗，計算最小可容忍的 table size

修正 hash function 優化版本中 append() 的失誤：沒有將 list 正確 link 起來；之所以此次找的字串 "zyxel" 無法檢測出 append() 不正確是因為這個字串是很後面才被 append 的，所以他的資料不會被覆蓋掉，也就找的到。
因此，當寫完優化版本之後，為確認其正確性，還是取 dataset 中相距較遠的樣本來檢測比較好。

關於 append() ，一開始的直覺是：從尾端新增，但這樣 append() 所花的時間高達 0.2 sec ，不過其實沒這個必要，因為可以從前端 insert ，雖然時間還是因為多一些判斷而偏高，但仍因為 zyxel 在後面的原因，所以 findName() 幾乎為最佳。

 Performance counter stats for './phonebook_opt' (100 runs):

           412,103      cache-misses              #   42.583 % of all cache refs      ( +-  0.76% )
           967,771      cache-references                                              ( +-  0.26% )
       233,042,376      instructions              #    1.41  insns per cycle          ( +-  0.02% )
       165,053,962      cycles                                                        ( +-  0.34% )

       0.058836808 seconds time elapsed                                          ( +-  1.34% )

cache miss 下降顯著，但仍有改進空間。

Hash table size 的影響探討 - based on BKDR hash function

append() 所花時間
cache-misses (%)

不同 hash function 與 distibution 的觀察

什麼是 the number of layers (Y軸)
課程助教
我喜歡他的圖亮谷
To 課程助教：因為想知道 word 被分配到各個 key 的情形楊惟晶

TABLE_SIZE 皆為 30000

BKDR
cache-misses 52.208%
SDBM
cache-misses 59.723%
RS
cache-misses 57.723%
JS
49.795%
PJW
53.365%
ELF
50.140%
DJB
50.343%
AP
51.845%

Dataset 的討論

/dictionary/words.txt -> 349900 個 words
- opt : hash function BKDR version, TABLE_SIZE 為 20000
1. 為於資料 50% 位置的 "lonely"
  - cache-misses 的方面： orig : 94.337%, opt : 36.001%
    orig 的搜尋時間易受 target 位置改變影響；對 opt 來說，只要分佈得夠平均，尋找資料的時間都很短，變化不大。
/dictionary/all-names.txt -> 16750 個 names
- opt : hash function BKDR version，TABLE_SIZE 為 20000
- Distribution of all-names.txt
  
  50% : erika, 25% : ford, 75% : catherine
1. ford
  - cache-misses 的方面： orig : 69.954%, opt : 43.364%
2. erika
  - cache-misses 的方面： orig : 73.360%, opt : 43.553%
3. catherine
  - cache-misses 的方面： orig : 72.149%, opt : 48.895%
因為 dataset 太小，append() 都沒有花很多時間；而就 findName()來說，只要 target 越前面，對 orig 版本越有利，而 opt 則是穩定的少。

關於 cache-misses

此次嘗試只改變 opt 的搜尋方法，而不改變資料的結構
opt : BKDR + TABLE_SIZE 為 20000

append() + findName()

Performance counter stats for './phonebook_opt' (100 runs):

  1,163,149      cache-misses              #   64.363 % of all cache refs      ( +-  0.16% )
  1,807,180      cache-references                                              ( +-  0.12% )
255,560,409      instructions              #    1.18  insns per cycle          ( +-  0.12% )
216,628,926      cycles                                                        ( +-  0.53% )

0.103203648 seconds time elapsed                                          ( +-  1.99% )

只有append()
Performance counter stats for './phonebook_opt' (100 runs):

  1,159,457      cache-misses              #   63.565 % of all cache refs      ( +-  0.19% )
  1,824,048      cache-references                                              ( +-  0.12% )
254,958,193      instructions              #    1.19  insns per cycle          ( +-  0.08% )
215,133,335      cycles                                                        ( +-  0.42% )

0.101813265 seconds time elapsed                                          ( +-  2.16% )

Performance counter stats for './phonebook_orig' (100 runs):

       908,096      cache-misses              #   93.467 % of all cache refs      ( +-  0.05% )
       971,572      cache-references                                              ( +-  0.10% )
   201,187,670      instructions              #    1.50  insns per cycle          ( +-  0.03% )
   133,950,753      cycles                                                        ( +-  0.20% )

   0.052471375 seconds time elapsed                                          ( +-  2.70% )

去除掉其他動作(檔案寫入等等)，讓 1 & 2 的情況差異只多了一個 findName()。

append() 與 findName() 的 cache-misses 觀察 - opt：

cache-misses 只少了 3692 次（只佔約0.32%）影響很小

append() 各種因素的比較

在同樣的資料結構下：
兩種不同的 appenf() 方法，從 orig -> opt 增加了 27.68% 的 cache-misses，可見 hash function 的資料儲存有一定的效能犧牲
opt 的不同 TABLE_SIZE 的影響（資料結構未改變）
10000 : 987,147
20000 : 1,159,475
30000 : 1,270,889
在 array 之間的移動有相當的影響。
OPT 版本的資料結構改變的影響（TABLE_SIZE = 20000）：
1,159,475 -> 35,541

資料一次從記憶體下層轉移到記體上層的單位定作 block。

跟一開始的檔案系統設定的 block size = 4096 bytes 有不一樣嗎?
楊惟晶

Fuzzy Search - Soundex

這學期希望自己著重在實際把演算法變成程式碼，而不是找到可以用的複製貼上，寫程式能力太弱了需要練習。
從構思到 debug 完居然花了一整天。

/dictionary/all-names.txt , 找的字 = zuzana (最後一個)
append() 的時間多了整整 46%，而從 distribution 的狀況來看，可以隱約看出端倪，分佈的跨度非常大，極度不平均，這是因為英文上有些"聽起來很像"的名字，所以依照 soundex 會得到一樣的編碼。
Example:
find : abby
Similar answer:
abby, 100
abbie, 100
abbi, 100
abbey, 100
abbe, 100
TABLE_SIZE 的變化。
- 20000
  Performance counter stats for './phonebook_opt' (100 runs):
```
  41,700      cache-misses              #   57.339 % of all cache refs      ( +-  0.65% )
  72,724      cache-references                                              ( +-  2.26% )
```
  38,520,996 instructions # 1.68 insns per cycle ( ± 0.01% )
  22,983,744 cycles ( ± 1.21% )
  
  0.008330655 seconds time elapsed ( ± 1.23% )
- 3000
  Performance counter stats for './phonebook_opt' (100 runs):
```
  25,337      cache-misses              #   54.181 % of all cache refs      ( +-  0.83% )
  46,763      cache-references                                              ( +-  2.82% )
```
  21,227,533 instructions # 1.41 insns per cycle ( ± 0.01% )
  15,037,622 cycles ( ± 1.18% )
  
  0.005637389 seconds time elapsed ( ± 1.41% )
  
  table size 愈小，出現愈高的峰值，某些"聽起來"可能的字愈多的名字，就可能會找到不省人事。