contributed by <jayfeng0225
>
jayfeng0225
系統 : Ubuntu 16.04 LTS
配備 :
安裝相關開發工具
$ sudo apt-get update
$ sudo apt-get install build-essential
$ sudo apt-get install linux-tools-common linux-tools-generic
$ sudo apt-get install astyle colordiff gnuplot`
設定教學 : 小克's 部落格
$ sudo apt-get install hime
參考下列網址即可完成設定
GitHub的設定引指
學習Git
Git 版本控制系統
$ git commit -m "comment"
$ git push -u origin master
$ git branch <new_branch_name> 建立本地 local branch
$ git branch -m <old_name> <new_name> 改名字 (如果有同名會失敗,改用 -M 可以強制覆蓋)
$ git branch 列出目前有那些 branch 以及目前在那個 branch
$ git checkout <branch_name> 切換 branch (注意到如果你有檔案修改了卻還沒 commit,會不能切換 branch,解法稍後會談)
$ git checkout -b <new_branch_name> (<from_branch_name>) 本地建立 branch 並立即 checkout 切換過去
$ git branch -d <branch_name> 刪除 local branch
參考網站 : Gnuplot (Part I)
reset
set ylabel 'time(sec)'
set style fill solid
set title 'perfomance comparison'
set term png enhanced font 'Verdana,10'
set output 'runtime.png'
plot [:][:0.100]'output.txt' using 2:xtic(1) with histogram title 'original', \
'' using ($0-0.1):($2+0.002):2 with labels title ' ', \
'' using 3:xtic(1) with histogram title 'struct optimized' , \
'' using ($0+0.05):($3+0.002):3 with labels title ' ', \
'' using 4:xtic(1) with histogram title 'hash function' , \
'' using ($0+0.3):($4+0.002):4 with labels title ' ', \
append() 0.067780 0.053012 0.084497
findName() 0.006792 0.003539 0.000000
根據案例分析的內容來探討改進的方法以及造成cache miss的原因
使用perf來看cache miss的比率,原來的架構大約有67%的cache-miss
$ perf stat -r 10 -e cache-misses,cache-references,L1-dcache-load-misses,L1-dcache-store-misses,L1-dcache-prefetch-misses,L1-icache-load-misses ./phonebook_orig
Performance counter stats for './phonebook_orig' (10 runs):
1,337,063 cache-misses # 66.914 % of all cache refs (63.90%)
1,930,629 cache-references (64.94%)
2,663,069 L1-dcache-load-misses (68.04%)
895,559 L1-dcache-store-misses (70.16%)
2,627,222 L1-dcache-prefetch-misses (69.13%)
111,859 L1-icache-load-misses (65.85%)
0.109293886 seconds time elapsed ( +- 15.40% )
再來查看電腦的資訊
$ lscpu | grep cache
L1d 快取: 32K
L1i 快取: 32K
L2 快取: 256K
L3 快取: 3072K
首先,根據程式要求,只要找到lasyname就算符合結果
entry *findName(char lastname[], entry *pHead)
{
while (pHead != NULL) {
if (strcasecmp(lastname, pHead->lastName) == 0)
return pHead;
pHead = pHead->pNext;
}
return NULL;
}
原先的struct有136byte的大小
所以可以改變struct的架構來降低原來的cahce miss比率
新的struct只留下search的結果lastname
#define MAX_LAST_NAME_SIZE 16
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
L1 cache有32Kbyte,每筆entry有136bytes
。
32*1024 / 136 = 240.9,每次大約能夠存240筆資料在cache
而改善後struct的大小為24byte
32*1024/24 = 1365.33,能夠存高達1365筆資料在cache,因此效能能夠大大提升
改善後的效能結果 :
可以看到append與findName的時間都減了,而cache miss的部份
Performance counter stats for './phonebook_orig' (100 runs) :
1,420,611 cache-misses # 72.836 % of all cache refs
1,916,229 cache-references
272,770,687 instructions # 1.35 insns per cycle
194,818,050 cycles
0.097316509 seconds time elapsed ( +- 1.46% )
Performance counter stats for './phonebook_opt' (100 runs):
90,950 cache-misses # 29.957 % of all cache refs
300,289 cache-references
244,047,904 instructions # 1.72 insns per cycle
142,529,076 cycles
0.068250922 seconds time elapsed ( +- 1.45% )
可以看到cache-misses由72.8%下降到29.9%!
參考案例分析 : phonebook的方法,我採用的是djb2 hash演算法
unsigned int hash(char *key){
unsigned int hashVal = 0;
while(*key != '\0')
hashVal = (hashVal << 5) + *key++;
return hashVal % 42737;
}
我並沒有針對儲存hash的資料結構做優化,而是直接採用陣列式的struct entry
所以最後實作出來的append效率很差,比上原始版本高上不少
會再針對append的部分做優化 JayFeng
entry *findName(char lastname[], entry **e)
{
int hashindex;
hashindex = hash(lastname);
entry *pHead;
pHead = e[hashindex];
/* TODO: implement */
while (pHead != NULL) {
if (strcasecmp(lastname, pHead->lastName) == 0){
return pHead;
}
pHead = pHead->pNext;
}
return NULL;
}
雖然append時間增加,但findName的時間幾乎是小於0.000001,比之前的版本好上很多
Performance counter stats for './phonebook_hash' (100 runs):
352,837 cache-misses # 28.865 % of all cache refs
1,218,066 cache-references
247,999,356 instructions # 1.36 insns per cycle
188,666,811 cycles
0.096232347 seconds time elapsed ( +- 1.62% )
這邊可以看到cache-misses也下降了很多,比起僅修改struct大小在好一點 ( 29.9% -> 28.8% )