contributed by <bobsonlin
>
查詢方式:
lscpu
指令可得知CPU的最大和最小的工作頻率$ make run
size of entry : 136 bytes
execution time of append() : 0.110389 sec
execution time of findName() : 0.007590 sec
$ perf stat --repeat 5 -e cycles,instructions,cache-references,cache-misses,branch-instructions,branch-misses ./phonebook_orig
Performance counter stats for ’./phonebook_orig’ (5 runs):
2,10730710 cycles ( +- 1.15% ) [65.49%]
2,6152,9155 instructions # 1.24 insns per cycle ( +- 0.18% ) [82.75%]
226,7098 cache-references ( +- 0.32% ) [82.75%]
208,8783 cache-misses # 92.135 % of all cache refs ( +- 1.51% ) [82.75%]
4660,8227 branch-instructions ( +- 1.01% ) [85.59%]
99,5033 branch-misses # 2.13% of all branches ( +- 1.13% ) [83.47%]
0.079568228 seconds time elapsed ( +- 13.06% )
$ ./phonebook_orig & sudo perf top -p $!
由以上的結果可知,在phonebook的程式中,消耗CPU週期最多的是findName(),
修改struct的結構
$ make run
size of entry : 32 bytes
execution time of append() : 0.095677 sec
execution time of findName() : 0.004692 sec
$ perf stat --repeat 5 -e cycles,instructions,cache-references,cache- misses,branch-instructions,branch-misses ./phonebook_orig
Performance counter stats for ’./phonebook_orig’ (5 runs):
1,4712,4605 cycles ( +- 0.87% ) [65.26%]
2,4898,6819 instructions # 1.69 insns per cycle ( +- 0.47% ) [82.80%]
69,3457 cache-references ( +- 3.35% ) [83.83%]
38,4917 cache-misses # 55.507 % of all cache refs ( +- 2.35% ) [83.87%]
4366,7742 branch-instructions ( +- 0.53% ) [83.87%]
99,4378 branch-misses # 2.28% of all branches ( +- 2.30% ) [84.65%]
0.060416386 seconds time elapsed ( +- 19.98% )
Hash Function 加速查詢
hashTable *createHashTable(int tableSize){
int i;
hashTable *tb;
tb = (hashTable*)malloc(sizeof(hashTable));
tb->list = (entry **)malloc(sizeof(entry*)*tableSize);
for(i = 0;i < tableSize;i++){
tb->list[i] = NULL;
}
return tb;
}
unsigned int hash(char *str)
{
unsigned int hash_value = 0;
unsigned int seed = 131;
while(*str)
hash_value = hash_value * seed + (*str++);
return (hash_value % TABLE_SIZE);
}
entry *findName_HashTable(char lastName[], hashTable *tb)
{
entry *e;
unsigned int index = hash(lastName);
for(e = tb->list[index]; e != NULL; e = e->pNext){
if (strcasecmp(lastName, e->lastName) == 0)
return e;
}
return NULL;
}
void append_HashTable(char lastName[], hashTable *tb)
{
unsigned int index = hash(lastName);
entry *e;
e = (entry*)malloc(sizeof(entry));
strcpy(e->lastName, lastName);
e->pNext = tb->list[index];
tb->list[index] = e;
}
Performance counter stats for './phonebook_opt_HASH' (100 runs):
32,5297 cache-misses # 53.204 % of all cache refs ( +- 0.43% )
61,1409 cache-references ( +- 0.37% )
2,3601,3893 instructions # 1.53 insns per cycle ( +- 0.02% )
1,5460,6436 cycles ( +- 0.36% )
0.062424603 seconds time elapsed ( +- 3.01% )
修改讀檔方式,降低append()的執行時間
請回去讀書,計算機組織的課本寫得很清楚。人之所缺乏專業,往往就是活在「腦補」的世界中。 jserv
我google了有關cache的資料,做了以下的整理:
- 由參考1得知,cache讀取命中稱為cache hit,讀取錯失稱為cache miss,又若將terminal中的cache-misses次數除以cache-references次數,剛好與terminal中顯示的百分比相同,由此推論cache references應不是cache的命中次數,而是CPU向cache要求資料的總次數。
- 由參考1的儲存系統階層可以得知,向cache取得資料時機為要找的資料不在CPU暫存器時,才會向cache找資料。反向推論,若cache-references次數較少,代表CPU需要的資料可能大多已存在暫存器中。
- 綜合上方兩者推論,成大資工WiKi的Perf介紹中,利用存取的局部性修改程式,大幅降低cache-references次數,亦降低cache-misses次數,代表CPU所需要的資料大多存在CPU暫存器中,使得執行時間縮短為原本的三分之一。
make run
時,發現append()的執行時間為findName()的15倍,但是進行perf找尋熱點時,findName()卻為熱點,明顯大於append()。想了解執行時間與CPU消耗週期的差異。$ git push
時,發生permission denied,經過同學的幫忙,才發現我當時是從老師給的github網址直接clone下來。可以由git remote -v
來得知本機的專案從哪裡clone下來,以及將會被push到何處,若欲修改,可先$ git remote #代號
再$ git remote add #代號
,最後$ git push --set-upstream #代號 master
bobsonlin