2016q3 Homework1 (phonebook)

contributed by <jayfeng0225>

tags: `jayfeng0225`

系統安裝與設定

系統 : Ubuntu 16.04 LTS
配備 :

安裝相關開發工具

$ sudo apt-get update
$ sudo apt-get install build-essential
$ sudo apt-get install linux-tools-common linux-tools-generic
$ sudo apt-get install astyle colordiff gnuplot`

HIME 中文輸入法使用起來比較接近新注音

設定教學 : 小克's 部落格

$ sudo apt-get install hime

Git設定

參考下列網址即可完成設定
GitHub的設定引指
 學習Git
Git 版本控制系統

git push

$ git commit -m "comment"
$ git push -u origin master

git branch

$ git branch <new_branch_name>   建立本地 local branch
$ git branch -m <old_name> <new_name>    改名字 (如果有同名會失敗，改用 -M 可以強制覆蓋)
$ git branch    列出目前有那些 branch 以及目前在那個 branch
$ git checkout <branch_name>    切換 branch (注意到如果你有檔案修改了卻還沒 commit，會不能切換 branch，解法稍後會談)
$ git checkout -b <new_branch_name> (<from_branch_name>) 本地建立 branch 並立即 checkout 切換過去
$ git branch -d <branch_name> 刪除 local branch

Gnuplot Script 分析

參考網站 : Gnuplot (Part I)

runtime.gp

reset                                                                                           
set ylabel 'time(sec)'
set style fill solid
set title 'perfomance comparison'
set term png enhanced font 'Verdana,10'
set output 'runtime.png'

plot [:][:0.100]'output.txt' using 2:xtic(1) with histogram title 'original', \
'' using ($0-0.1):($2+0.002):2 with labels title ' ', \
'' using 3:xtic(1) with histogram title 'struct optimized'  , \ 
'' using ($0+0.05):($3+0.002):3 with labels title ' ', \
'' using 4:xtic(1) with histogram title 'hash function'  , \ 
'' using ($0+0.3):($4+0.002):4 with labels title ' ', \

output.txt

append() 0.067780 0.053012 0.084497                                           
findName() 0.006792 0.003539 0.000000

xtic(1) : x軸使用第一項也就是append()與findName()
using 2 , 3 and 4代表後面三項數字
($0 - 0.1) : 表示以$0(append and findName)往左0.1個單位的位置
($2+0.002) : 表示將數值標記在以$2(opt time)的長條往上0.002單位的位置

案例分析

根據案例分析的內容來探討改進的方法以及造成cache miss的原因

使用perf來看cache miss的比率，原來的架構大約有67%的cache-miss

$ perf stat -r 10 -e cache-misses,cache-references,L1-dcache-load-misses,L1-dcache-store-misses,L1-dcache-prefetch-misses,L1-icache-load-misses ./phonebook_orig

Performance counter stats for './phonebook_orig' (10 runs):
         1,337,063      cache-misses              #   66.914 % of all cache refs      (63.90%)
         1,930,629      cache-references                                              (64.94%)
         2,663,069      L1-dcache-load-misses                                         (68.04%)
           895,559      L1-dcache-store-misses                                        (70.16%)
         2,627,222      L1-dcache-prefetch-misses                                     (69.13%)
           111,859      L1-icache-load-misses                                         (65.85%)
       0.109293886 seconds time elapsed                                          ( +- 15.40% )

再來查看電腦的資訊

$ lscpu | grep cache
L1d 快取：          32K
L1i 快取：          32K
L2 快取：           256K
L3 快取：           3072K

效能改進方向

改寫 struct __PHONE_BOOK_ENTRY 的成員，搬動到新的結構中
使用 hash function 來加速查詢
使用 binary search tree 改寫演算法

Optimization

首先，根據程式要求，只要找到lasyname就算符合結果









entry *findName(char lastname[], entry *pHead)
{
    while (pHead != NULL) {
        if (strcasecmp(lastname, pHead->lastName) == 0)
            return pHead;
        pHead = pHead->pNext;
    }
    return NULL;
}

針對L1-cache來調整struct大小做優化

原先的struct有136byte的大小
所以可以改變struct的架構來降低原來的cahce miss比率
新的struct只留下search的結果lastname





#define MAX_LAST_NAME_SIZE 16
typedef struct __PHONE_BOOK_ENTRY {
    char lastName[MAX_LAST_NAME_SIZE];
    struct __PHONE_BOOK_ENTRY *pNext;
 } entry;

L1 cache有32Kbyte，每筆entry有136bytes。
32*1024 / 136 = 240.9，每次大約能夠存240筆資料在cache
而改善後struct的大小為24byte
32*1024/24 = 1365.33，能夠存高達1365筆資料在cache，因此效能能夠大大提升

改善後的效能結果 :

可以看到append與findName的時間都減了，而cache miss的部份

Performance counter stats for './phonebook_orig' (100 runs) :
         1,420,611      cache-misses              #   72.836 % of all cache refs    
         1,916,229      cache-references                                            
       272,770,687      instructions              #    1.35  insns per cycle        
       194,818,050      cycles                                                      
       0.097316509 seconds time elapsed                                          ( +-  1.46% )

 Performance counter stats for './phonebook_opt' (100 runs):
            90,950      cache-misses              #   29.957 % of all cache refs    
           300,289      cache-references                                            
       244,047,904      instructions              #    1.72  insns per cycle        
       142,529,076      cycles                                                      
       0.068250922 seconds time elapsed                                          ( +-  1.45% )

可以看到cache-misses由72.8%下降到29.9%！

使用hash function來減少cache-misses的發生

參考案例分析 : phonebook的方法，我採用的是djb2 hash演算法






unsigned int hash(char *key){
    unsigned int hashVal = 0;
    while(*key != '\0')
        hashVal = (hashVal << 5) + *key++;
    return hashVal % 42737;
}

我並沒有針對儲存hash的資料結構做優化，而是直接採用陣列式的struct entry
所以最後實作出來的append效率很差，比上原始版本高上不少

會再針對append的部分做優化 JayFeng















entry *findName(char lastname[], entry **e)
{   
    int hashindex;
    hashindex = hash(lastname);
    entry *pHead;
    pHead = e[hashindex];
    /* TODO: implement */ 
    while (pHead != NULL) {
        if (strcasecmp(lastname, pHead->lastName) == 0){
            return pHead;
        }
        pHead = pHead->pNext;
    }
    return NULL;
}

雖然append時間增加，但findName的時間幾乎是小於0.000001，比之前的版本好上很多

 Performance counter stats for './phonebook_hash' (100 runs):
           352,837      cache-misses              #   28.865 % of all cache refs    
         1,218,066      cache-references                                            
       247,999,356      instructions              #    1.36  insns per cycle        
       188,666,811      cycles                                                      
       0.096232347 seconds time elapsed                                          ( +-  1.62% )

這邊可以看到cache-misses也下降了很多，比起僅修改struct大小在好一點 ( 29.9% -> 28.8% )