2018q1 Homework1 (phonebook)

contributed by <HexRabbit>

Reviewed by `nashi5566`

似乎沒有看到輔助說明使用 memory pool 之後的時間變化的比較圖或數據
想知道後面提到 malloc() 造成影響與開發環境不同造成優化效果的程度之間有什麼關聯性 (因為列點的方式看起來似乎是有相關聯的)
作業的標題似乎時間標錯了

Reviewed by `LinYunWen`

同學你的標題錯誤囉~XDDD
commit message 多為使用 git commit 做較多文字的說明，尤其是在寫入較大量的變更時
另外有一個 commit message title 叫 Add more bugs (0ed7141) ?
可以多做一些數據上的呈現，比如減少 struct 體積以及 memory pool 那邊，並且分析跟 cache miss rate 上的比較

開發環境

                   -`                 
                  .o+`                 HexRabbit
                 `ooo/                 OS: Arch Linux 
                `+oooo:                Kernel: x86_64 Linux 4.13.3-1-zen
               `+oooooo:               Virtualization: VT-x
               -+oooooo+:              Intel Core i5-7200U @ 4x 3.1GHz
             `/:-:++oooo+:             L1d cache: 32K
            `/++++/+++++++:            L1i cache: 32K
           `/++++++++++++++:           L2 cache: 256K
          `/+++ooooooooooooo/`         L3 cache: 3072K
         ./ooosssso++osssssso+`        GPU: intel
        .oossssso-````/ossssss+`       RAM: 7862MiB
       -osssssso.      :ssssssso.      
      :osssssss/        osssso+++.     
     /ossssssss/        +ssssooo/-      
   `/ossssso+/:-        -:/+osssso+-   
  `+sso+:-`                 `.-/+oso:  
 `++:.                           `-/+/
 .`                                 `/

優化策略

縮小 struct 體積

額外使用另一個 struct 並只存入 lastName 及指向原 struct 的指針，縮小單筆資料的體積，可使得呼叫 findName() 時讀入快取的資料量增加，繼而增進 cache hit 的成功率。不過由於需要另外建立 struct，在 append 時會耗費更多時間。

使用 Hash Table

參考 Programming Small，使用一個較大的質數^[1] 42737 作為 HashTable，並以 djb2 作為 hash function，在資料平均分佈的情況下，每個 key 對應的串列長度為(350000/42737 = 8.19)，使得每次存取時不必遍歷整個串列，大幅提昇效能。

請補上參考連結

為甚麼選定 42737?
課程助教

已修正，感謝提醒
HexRabbit

效能差異

Performance counter stats for './phonebook_orig' (100 runs):                                                                                      

         9,286,328      cache-misses              #   95.053 % of all cache refs      ( +-  0.25% ) 
         9,769,588      cache-references                                              ( +-  0.44% )
       386,432,150      instructions              #    1.41  insn per cycle           ( +-  0.01% )
       273,745,051      cycles                                                        ( +-  0.88% )
       
       0.088832038 seconds time elapsed                                               ( +-  0.88% )

 Performance counter stats for './phonebook_opt' (100 runs):                                                                                       
 
        10,331,405      cache-misses              #   78.574 % of all cache refs      ( +-  0.59% )
        13,148,653      cache-references                                              ( +-  0.66% )
       448,060,714      instructions              #    1.02  insn per cycle           ( +-  0.02% )
       438,339,399      cycles                                                        ( +-  1.70% )
       
       0.157374408 seconds time elapsed                                               ( +-  1.51% )

可以明顯觀察到 cache miss 大幅下降了27%，但總消耗時間上升約77%(大量的free操作於圖表中沒有表現出來)

數據跟圖示擠在一起，請重新繪製
課程助教

搞定
HexRabbit

findname()消耗的時間大幅度降低，
不過在搜尋單字量為 1 時只能說是完全沒有效能提昇，

但如果將單字量增加到 8 個並且有一個單字輸入錯誤的情況下，便可以看出 optimazed 的版本在 findName() 上耗費的時間幾乎沒有改變，而原始版本多了接近三倍時間，可以看出透過犧牲些許效能是有機會在大量查詢人名時得到補償的。

利用 memory pool 提昇速度

在實做了上述幾個優化後，發現優化版本在 append() 上所花費的時間增長了約1.6倍，觀察後發現問題出在 malloc() 和 hash function 上。

嘗試實做 memory pool，透過一次性分配大量記憶體空間來減少 malloc() 呼叫進而縮短執行時間。

Performance counter stats for './phonebook_opt' (100 runs):

            86,484      cache-misses:u            #   13.048 % of all cache refs      ( +- 10.91% )
           662,839      cache-references:u                                            ( +-  2.50% )
       178,112,559      instructions:u            #    1.63  insn per cycle           ( +-  0.03% )
       109,034,011      cycles:u                                                      ( +-  0.96% )

       0.051151509 seconds time elapsed                                          ( +-  0.81% )

效能上有大幅度的提昇，執行時間縮短 40%，
cache-reference 和 cache-misses 都大幅度下降，cache-miss rate 降低至 13.048%

實做時遇到的問題

由於每個人的開發環境不同，優化的效果可能出現差距，例如在我的配備下優化過的append()消耗的時間較原本多出了66%，而在朋友的電腦上實做卻只多出45%
- 發現是malloc()消耗相當多時間 –> 利用memory pool解決
make plot時必須刪除前一次輸出的 opt.txt 才能得出正確結果，否則會使用到上次的數據
- 修改Makefile

能否請你說明是甚麼問題呢? 我不太理解XD
課程助教

知乎/Hash时取模一定要模质数吗？ ↩︎