2017q1 Homework1 (phonebook)

contributed by <rayleigh0407>

Reviewed by `yachiyang01`

測試一中削減資料大小的方法，可以嘗試將一些不需要放在struct中的資料獨立出來。
Commit Message內文應說明是什麼、為什麼而不是如何做。

Reviewed by `illusion030`

commit message 寫得不清楚
沒有當 malloc 失敗時的防呆

Reviewed by `SeanCCC`

沒有解釋hash table大小怎麼決定的

開發環境

**Desktop**
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 94
Model name:            Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Stepping:              3
CPU MHz:               800.000
CPU max MHz:           4200.0000
CPU min MHz:           800.0000
BogoMIPS:              8016.72
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7
**Laptop**
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 69
Model name:            Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz
Stepping:              1
CPU MHz:               1694.531
CPU max MHz:           2700.0000
CPU min MHz:           800.0000
BogoMIPS:              4788.93
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              3072K
NUMA node0 CPU(s):     0-3

前置作業

Dual system

系統方面,我使用 ubuntu 16.04.1 LTS
因為半年前換電腦時就先灌好雙系統了
所以直接沿用

rayleigh@rayleigh-System-Product-Name:~$lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 16.04.1 LTS
Release:	16.04
Codename:	xenial

Vim 編輯器

大一的時候還覺得很難用
不過現在學了一些基本修改及安裝插件
直接變成 coding 神器阿XD
感謝老師及助教提供的資料

這裡有個小工具能夠改變語法顏色

    Plugin 'octol/vim-cpp-enhanced-highlight'

可以修改一些基本語法的顏色和字體

    hi Character cterm=bold ctermfg=DarkRed ctermbg=NONE
    hi SpecialChar cterm=bold ctermfg=DarkGreen ctermbg=NONE
    hi Function cterm=bold ctermfg=DarkBlue

參考資料：
syntax highlight plugin
vim syntax

Perf

linux 相當強大的效能統計軟體
稍微看一下使用的說明

   $ perf --help
 usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]

 The most commonly used perf commands are:
   ...
   list            List all symbolic event types
   ...
   stat            Run a command and gather performance counter statistics
   ...
   top             System profiling tool.
   ...

這次作業主要是使用 stat

$ perf stat --help
NAME
       perf-stat - Run a command and gather performance counter statistics

SYNOPSIS
       perf stat [-e <EVENT> | --event=EVENT] [-a] <command>
       perf stat [-e <EVENT> | --event=EVENT] [-a] — <command> [<options>]
    .
    .
   *-e, --event=
           Select the PMU event. 
    -p, --pid=<pid>
           stat events on existing process id (comma separated list)
    -a, --all-cpus
           system-wide collection from all CPUs
   *-d, --detailed
           print more detailed statistics, can be specified up to 3 times
                 -d:          detailed events, L1 and LLC data cache
              -d -d:     more detailed events, dTLB and iTLB events
           -d -d -d:     very detailed events, adding prefetch events
   *-r, --repeat=<n>
           repeat command and print average + stddev (max: 100). 0 means forever.
    -o file, --output file
           Print the output into the designated file.
    --append
           Append to the output file designated with the -o option. Ignored if -o is not specified.
    .
    .

藉由 perf list 可以得知有哪些 PMU event

參考資料
perf 原理和實務
 Perf – Linux下的系统性能調優工具

gnuplot

請在中英文字間加上空白區隔
課程助教
好的, 謝謝助教

參考資料
gnuplot 語法解說和示範

cache

實作前,不得不先了解 cache 這東西
cache 算是一個 cpu 專屬的記憶體
通常取資料時會先從 cache 裡找
這是為了提昇整體運算的效能（畢竟 ram 現在還是慢 cpu 很多）

藉由下列指令,可以得知使用者目前電腦 cache 的資訊

$ lscpu
$ sudo lshw -C memory
$ x86info -c(須先安裝x86info)
$ sudo dmidecode -t cache

或是開啟 /proc/cpuinfo 檔案,也可以看到該電腦 cpu 的資訊

$ cat /proc/cpuinfo | grep cache
cache size	: 3072 KB
cache_alignment	: 64
cache size	: 3072 KB
cache_alignment	: 64
cache size	: 3072 KB
cache_alignment	: 64
cache size	: 3072 KB
cache_alignment	: 64
(有幾個核心就會顯示幾次)

參考資料
cache原理和實驗
 關於CPU Cache – 程序猿需要知道的那些事
 Stackoverflow and stackexchange

開發過程

1. 原始版

先執行原始程式碼,來看看他的效能如何

$ perf stat --repeat 100 -e cache-misses,cache-references,instructions,cycles ./phonebook_orig

Performance counter stats for './phonebook_orig' (100 runs):

      126,1316      cache-misses              #   90.600 % of all cache refs      ( +-  0.23% )
      139,2178      cache-references                                              ( +-  0.16% )
   2,6076,3251      instructions              #    1.45  insns per cycle          ( +-  0.02% )
   1,7966,5430      cycles                                                        ( +-  0.11% )

   0.068607172 seconds time elapsed                                          ( +-  0.43% )

cache miss 高達90%
目前猜測原因有兩個：

資料量大使用量卻少
還有一個,正常的phonebook可能本身就有給予多種資料
但在原始碼裡卻只用lastName找資料
其他資料浪費的空間
也可能是導致cache-miss過高的原因
資料儲存方式
看一下原始碼,資料是以linked list直接串接成一列
要從這麼大的資料鏈(我猜是word.txt : 約350000筆)裡找一筆相符的
可能就是導致cache-miss過高的原因

2. 測試一: 削減資料大小

先從1開始下手
把 entry 裡除了 lastName 以外的資料另外創一個type
名為 otherData
並在 entry 裡宣告一個指向 otherData的pointer
有需要在分配記憶體給它儲存資料

就直接把兩個完整的 structure 貼上來吧!
課程助教











...
typedef struct __PHONE_BOOK_OTHER_DATA {
    //put data expect lastName
} otherData;

typedef struct __PHONE_BOOK_ENTRY {
    ...
    otherData *data;
    ...
} entry
...

測試結果：
test1_Pic
Cache-miss:

 Performance counter stats for './phonebook_opt' (100 runs):

           12,4501      cache-misses              #   28.264 % of all cache refs      ( +-  0.59% )
           44,0500      cache-references                                              ( +-  0.46% )
       2,4398,0290      instructions              #    1.94  insns per cycle          ( +-  0.02% )
       1,2562,6360      cycles                                                        ( +-  0.19% )

       0.048758786 seconds time elapsed                                          ( +-  0.30% )

結果意外地理想,大幅降低 cache-miss ,搜尋及插入資料的時間也變少了。
但是這只算是治標不治本
假設每筆資料都要填入其他相關資訊(e.g. 姓氏)
問題就會回到原點
這邊我讓每筆資料新增一塊空的 otherData





phonebook_opt.c:
     ...
     e->data = (otherData *) malloc(sizeof(otherData));                                         
     strcpy(e->lastName, lastName);
     ...

測試結果：

Performance counter stats for './phonebook_opt' (100 runs):

          151,8995      cache-misses              #   94.176 % of all cache refs      ( +-  0.07% )
          161,2927      cache-references                                              ( +-  0.09% )
       3,4121,7546      instructions              #    1.46  insns per cycle          ( +-  0.02% )
       2,3358,1072      cycles                                                        ( +-  0.13% )

       0.090065262 seconds time elapsed                                          ( +-  0.20% )

可以看到 cache-miss 又增加了
還比原始版本還高
執行時間也比原本的長
可見此方法不怎麼可行
opt

3. 測試二：嘗試使用Hash table

第二次,我使用 Hash table 來儲存所有資料
建立一個 entry type 的 Hash table, 大小大概是350000
利用每個 lastName 所對應的 hash value
對 table size 取餘數來當作索引值




    ...
    hash_index = hash_function(entry->lastName);
    put entry into table[hash_index];
    ...

但不見得每筆資料的 hash value 都是獨立的
因此當遇到 hash collision (兩個變數的 hash value 相同)時
就以 linked list 的方式串接（如下圖所示）

然而 Hash function 有很多種
這邊就來測試以下六種 Hash function

DJB2

測試結果：
DJB2-result
cache-misses:

 Performance counter stats for './phonebook_hash_opt' (100 runs):

      254,7001      cache-misses              #   58.420 % of all cache refs      ( +-  0.22% )
      435,9795      cache-references                                              ( +-  0.24% )
   2,2879,0814      instructions              #    1.18  insns per cycle          ( +-  0.04% )
   1,9360,8219      cycles                                                        ( +-  0.34% )

   0.053086737 seconds time elapsed                                          ( +-  0.33% )

BKDR

測試結果：
BKDR-result
cache-misses:

 Performance counter stats for './phonebook_hash_opt' (100 runs):

      255,0878      cache-misses              #   59.273 % of all cache refs      ( +-  0.30% )
      430,3633      cache-references                                              ( +-  0.26% )
   2,3726,8911      instructions              #    1.20  insns per cycle          ( +-  0.04% )
   1,9828,2803      cycles                                                        ( +-  0.36% )

   0.054561474 seconds time elapsed                                          ( +-  0.35% )

AP
測試結果：
APHash
cache-misses:

 Performance counter stats for './phonebook_hash_opt' (100 runs):

      251,9352      cache-misses              #   58.323 % of all cache refs      ( +-  0.29% )
      431,9651      cache-references                                              ( +-  0.37% )
   2,5446,4326      instructions              #    1.26  insns per cycle          ( +-  0.04% )
   2,0218,2336      cycles                                                        ( +-  0.43% )

   0.055191749 seconds time elapsed                                          ( +-  0.39% )

JS
測試結果:
JSHash
cache-misses:

 Performance counter stats for './phonebook_hash_opt' (100 runs):

      247,0955      cache-misses              #   56.622 % of all cache refs      ( +-  0.23% )
      436,3923      cache-references                                              ( +-  0.21% )
   2,3718,4584      instructions              #    1.21  insns per cycle          ( +-  0.04% )
   1,9667,2118      cycles                                                        ( +-  0.28% )

   0.054124701 seconds time elapsed                                          ( +-  0.32% )

RS
測試結果:
RSHash
cache-misses:

  Performance counter stats for './phonebook_hash_opt' (100 runs):

      215,5988      cache-misses              #   52.947 % of all cache refs      ( +-  0.26% )
      407,1983      cache-references                                              ( +-  0.75% )
   2,3803,1769      instructions              #    1.27  insns per cycle          ( +-  0.04% )
   1,8734,7593      cycles                                                        ( +-  0.28% )

   0.049808275 seconds time elapsed                                          ( +-  0.37% )

SDB
測試結果:
SDBHash
cache-misses:

 Performance counter stats for './phonebook_hash_opt' (100 runs):

      242,4593      cache-misses              #   57.131 % of all cache refs      ( +-  0.26% )
      424,3887      cache-references                                              ( +-  0.25% )
   2,3704,8982      instructions              #    1.22  insns per cycle          ( +-  0.04% )
   1,9386,2498      cycles                                                        ( +-  0.27% )

   0.053005389 seconds time elapsed                                          ( +-  0.28% )

比較圖

time:
cache-miss:

問題討論：本例選用的 dataset (也就是輸入電話簿的姓名) 是否存在問題？

有代表性嗎？跟真實英文姓氏差異大嗎？
就現實情況來講差異還蠻大的, 許多名字根本沒有母音（我不知道如何發音）, 或是只含一種字母（ aaaaaa ）,例如下面這些名字是從 word.txt 裡面擷取出來的:
```
bbbb
bbbbb
bbbbbb
bbbbbbb
bbbbbbbb
bbbbe
bbbbut
bbbppsll
bbcc
```
利用join指令去比對 words.txt 及 all-names.txt 兩個檔案
發現共有的名字有 14536 個, 重疊的比例大概一半( 7110 個名字已重複)
不過如果這是一個人的別名或一個電話號碼的標籤, 就還說的過去

再加上 words.txt 幾乎沒有重複使用的名字, 雖然測資只有 35 萬, 但重複的比例也太低, 這裡有個網站名叫 name statistics, 統計了美國最多人取的名字, 像 Smith 的比例足足有 1.006%, 意思是 3 億多的美國人中, 有百萬多個名為 Smith, 對比到 words.txt, 可見相似率相對的低
name statistics
資料難道「數大就是美」嗎？
就如同剛剛提到的, words.txt 雖然有35萬資料, 卻沒有涵蓋住資料量只有 16000 的 all-names.txt, 可見資料不僅要大, 還要有意義, 才能算是好的資料, 如何才是有意義的資料, 會在下個問題來探討
如果我們要建立符合真實電話簿情境的輸入，該怎麼作？
或許可以沿用 words.txt, 將其定位為"電話簿的別名", 並在同一列中加入其他屬性的資料, 像是地址, 信箱, 或是現代常用的 facebook, line ID 等等, 增加資料的可信度,

參考資料：

同學您好，助教這邊沒有您的個人資料，請正確填寫課程表單課程助教
助教抱歉,已填寫戴子祐