contributed by < refleex >
king1224
main.c 中的 input array 即為查找目標
dictionary 資料夾中的 word.txt 為預設電話簿
洪正皇
作業系統 : ubuntu 16.04 LTS (64-bit)
Architecture: x86_64
CPU 作業模式: 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
每核心執行緒數:2
每通訊端核心數:2
Socket(s): 1
NUMA 節點: 1
供應商識別號: GenuineIntel
CPU 家族: 6
型號: 60
Model name: Intel® Core™ i5-4210M CPU @ 2.60GHz
製程: 3
CPU MHz: 1694.671
CPU max MHz: 3200.0000
CPU min MHz: 800.0000
BogoMIPS: 5187.92
虛擬: VT-x
L1d 快取: 32K
L1i 快取: 32K
L2 快取: 256K
L3 快取: 3072K
NUMA node0 CPU(s): 0-3
不用特別標時間,HackMD 會自動記錄更新,你只要及時整理進度和所見所聞即可 "jserv"
先安裝並熟悉 perf 的使用
$ sudo apt-get install linux-tools-common
$ sudo apt-get install linux-tools-3.16.0-50-generic linux-cloud-tools-3.16.0-50-generic
但因為 kernel 版本不同,因此照出現的警示安裝相符的版本
$ sudo apt-get install linux-tools-4.4.0-63-generic
$ sudo apt-get install linux-cloud-tools-4.4.0-63-generic
接著就照 Perf 的說明繼續練習,先查看權限
$ cat /proc/sys/kernel/perf_event_paranoid
結果為 1
1
但在我要解除 kernel pointer 限制時
$ sudo sh -c " echo 0 > /proc/sys/kernel/kptr_restrict"
查看權限的結果仍為 1
執行 example.c
$ ./example
得到 pid: 26710
$ perf top -p 26710
輸入上列指令,卻只得到 perf top 的參數列表
Usage: perf top [<options>]
-a, --all-cpus system-wide collection from all CPUs
-b, --branch-any sample any taken branches
-c, --count <n> event period to sample
-C, --cpu <cpu> list of cpus to monitor
-d, --delay <n> number of seconds to delay between refreshes
-D, --dump-symtab dump the symbol table used for profiling
-E, --entries <n> display this many functions
-e, --event <event> event selector. use 'perf list' to list available even
-f, --count-filter <n>
only display functions with more events than this
-F, --freq <n> profile at this frequency
-g enables call-graph recording and display
-i, --no-inherit child tasks do not inherit counters
-j, --branch-filter <branch filter mask>
branch stack filter modes
-K, --hide_kernel_symbols
hide kernel symbols
-k, --vmlinux <file> vmlinux pathname
-M, --disassembler-style <disassembler style>
Specify disassembler style (e.g. -M intel for intel syntax)
.
.
.
上述部份試了很多種方法,結果不是沒變,不然就是
bash: syntax error near unexpected token `newline'
不過沒關係,大致重複來回做7個小時後,再參考一些網路上的文章,已經稍微對 perf 有些概念,所以正式開始這次的作業–>優化
先用 perf stat 檢查原始版本的 cash miss
$sudo perf stat --repeat 100 -e cache-misses,cache-references,instructions,cycles ./phonebook_orig
結果如下
Performance counter stats for './phonebook_orig' (100 runs):
969,283 cache-misses # 85.291 % of all cache refs ( +- 0.19% )
1,136,447 cache-references ( +- 0.13% )
261,184,322 instructions # 1.47 insns per cycle ( +- 0.02% )
177,351,808 cycles ( +- 0.14% )
0.056668601 seconds time elapsed ( +- 0.15% )
cache-miss 85.291%
size of entry : 136 bytes
execution time of append() : 0.059108 sec
execution time of findName() : 0.005920 sec
打開 phonebook_opt.h
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
entry *findName(char lastName[], entry *pHead);
entry *append(char lastName[], entry *e);
打開 phonebook_orig.c
entry *findName(char lastName[], entry *pHead)
{
while (pHead != NULL) {
if (strcasecmp(lastName, pHead->lastName) == 0)
return pHead;
pHead = pHead->pNext;
}
return NULL;
}
entry *append(char lastName[], entry *e)
{
/* allocate memory for the new entry and put lastName */
e->pNext = (entry *) malloc(sizeof(entry));
e = e->pNext;
strcpy(e->lastName, lastName);
e->pNext = NULL;
return e;
}
可以發現在phonebook_opt.h
struct 所選宣告變數中 append() 和 findName() 只用到了 lastName
因此把不會用到的變數 用另外一個struct宣告
typedef struct __PHONE_BOOK_ENTRY1 {
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
} entry1;
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
struct entry1 *detail;
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
然後$ make plot
size of entry : 32 bytes
execution time of append() : 0.038840 sec
execution time of findName() : 0.005018 sec
Performance counter stats for './phonebook_opt' (100 runs):
971,060 cache-misses # 85.131 % of all cache refs ( +- 0.17% )
1,140,662 cache-references ( +- 0.12% )
261,089,459 instructions # 1.47 insns per cycle ( +- 0.02% )
177,377,228 cycles ( +- 0.13% )
0.056722433 seconds time elapsed ( +- 0.16% )
85.131%,嗯。
However, 我可憐的電腦,cache-miss 僅優化了 0.16%
–
不死心重複嘗試後,把原本的 phonebook 資料夾整個刪掉,重頭開始做。
#define OPT 1
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
struct __PHONE_BOOK_ENTRY *pNext;
struct entry1 *detail;
} entry;
typedef struct __PHONE_BOOK_ENTRY1 {
char firstName[16];
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
} entry1;
entry *findName(char lastName[], entry *pHead);
entry *append(char lastName[], entry *e);
#endif
entry *findName(char lastName[], entry *pHead)
{
while (pHead != NULL) {
if (strcasecmp(lastName, pHead->lastName) == 0)
return pHead;
pHead = pHead->pNext;
}
return NULL;
}
entry *append(char lastName[], entry *e)
{
/* allocate memory for the new entry and put lastName */
e->pNext = (entry *) malloc(sizeof(entry));
e = e->pNext;
strcpy(e->lastName, lastName);
e->pNext = NULL;
return e;
}
然後$ make plot
size of entry : 32 bytes
execution time of append() : 0.034790 sec
execution time of findName() : 0.002003 sec
Performance counter stats for './phonebook_opt' (100 runs):
142,859 cache-misses # 47.150 % of all cache refs ( +- 0.50% )
302,990 cache-references ( +- 0.60% )
243,965,824 instructions # 1.93 insns per cycle ( +- 0.02% )
126,343,413 cycles ( +- 0.26% )
0.040863594 seconds time elapsed ( +- 0.34% )
47.15% 總算回歸正常數值
看完 hash function 觀念和實務 ,再上網找了一些關於hash function的文章
了解到 hash function 其實就是把一堆資料分門別類,再根據索引去搜尋。
所以如果把電話簿裡的名字以某種規則分類,若要找特定名字,便可在較小的範圍內搜尋,以減少搜索時間
所謂某種規則,便是 hash function ,雜湊函數。
上網找了一個 BKDR-hash function 便開始著手修改程式碼。
unsigned int hash(char lastName[])
{
int hash_number = 0;
int h = 53;
for (int i = 0; lastName[i] != '\0'; i++)
hash_number = (hash_number * h + lastName[i]) % TABLE;
return hash_number;
}
hash function 本身很好寫,難在 main 函式的更改,尤其 allocate memory 時極常出錯
size of entry : 32 bytes
execution time of append() : 0.073824 sec
execution time of findName() : 0.000000 sec
*** Error in `./phonebook_opt_hash': munmap_chunk(): invalid pointer: 0x00007ffd6f3b87a0 ***
上網找了一下關於 munmap_chunk(): invalid pointer 的問題,找到一篇提到:the program will crash if you pass a pointer to free() that has not been obtained through malloc().
修改一下程式碼後(包括 Makefile, calculate.c, runtime.gp等等)
size of entry : 32 bytes
execution time of append() : 0.056479 sec
execution time of findName() : 0.000000 sec
Performance counter stats for './phonebook_orig' (100 runs):
570,732 cache-misses # 59.378 % of all cache refs ( +- 0.33% )
961,182 cache-references ( +- 0.31% )
260,905,474 instructions # 1.47 insns per cycle ( +- 0.02% )
177,699,696 cycles ( +- 0.42% )
0.057268308 seconds time elapsed ( +- 0.52% )
Performance counter stats for './phonebook_opt' (100 runs):
139,321 cache-misses # 46.176 % of all cache refs ( +- 0.56% )
301,716 cache-references ( +- 0.53% )
243,828,644 instructions # 1.94 insns per cycle ( +- 0.02% )
125,408,510 cycles ( +- 0.29% )
0.040634235 seconds time elapsed ( +- 0.38% )
Performance counter stats for './phonebook_opt_hash' (100 runs):
742,301 cache-misses # 59.684 % of all cache refs ( +- 0.76% )
1,243,727 cache-references ( +- 0.24% )
336,095,343 instructions # 1.22 insns per cycle ( +- 0.01% )
275,331,193 cycles ( +- 0.30% )
0.089550008 seconds time elapsed ( +- 0.36% )
Origin 的 cash-miss 只有 59% ,正常不應該那麼低,所以我下載原版跑了一次,跟之前一樣約略是 85% ~ 86%。
應該是因為我在更改 main.c 時不小心更動某一部份(還不確定再哪),而 origin 有近乎和 hash 版本一樣的 cash-miss
findName 一直是0.0000秒,原本以為他是沒有進到 while迴圈而直接回傳 NULL ,可是試驗之後發現也不是,這邊一直想不透