linyunwen
contributed by <LinYunWen
>
nashi5566
請問一下你可以看到的是 master branch 所以沒有看到 commit
你可以切換 branch 來看到我實作的程式碼,謝謝你
yw
raygoah
afcidk
feat/search_table
裡的 main.c
,發現第74行用到的temp
變數在第54行就已經宣告了,這樣會不會增加日後更動程式時出現誤用變數的風險?BKDRhash()
裡面,實際上會得的返回值區間是 0~4095。是否可以比較 使用位元運算子,浪費空間 和 使用 mod 運算子,省一些空間 的效能差異?
- 在 Linux 下查詢硬體資訊的工具指令
- <參考 : ktvexe
2017q1 Homework1 (phonebook)>
perf top
add "sudo" to make authorization
$ cat /proc/sys/kernel/perf_event_paranoid $ 3
set perf_event_paranoid = 1 (default) with
sudo sh -c 'echo 1 > /proc/sys/kernel/perf_event_paranoid'
superuser solutionJeff Vander Stoep posted the patch on July 27. It adds a another value that can be set for the sysctl parameter (i.e. kernel.perf_event_paranoid=3) that restricts perf_event_open() to processes with the CAP_SYS_ADMIN capability. Currently, perf_event_paranoid is set to 2 by default, which disallows access to some perf features (raw tracepoint access, CPU event access, and kernel profiling) to processes without the proper capabilities; the patch does not change the default. He also submitted another patchthat would allow configuring the kernel to make 3 be the default perf_event_paranoid value.
lwn.net
終端機等文字訊息請直接將文字貼上,避免使用圖片
vvn
感謝你,我之後會注意的,不過其實這似乎是個 error , 所以我才直接將結果貼上,我會再想想怎樣呈現比較好
yw
size of entry : 136 bytes
execution time of append() : 0.060688 sec
execution time of findName() : 0.005451 sec
Performance counter stats for './phonebook_orig' (100 runs):
964,631 cache-misses # 79.394 % of all cache refs ( +- 0.35% )
1,214,991 cache-references ( +- 0.38% )
264,776,714 instructions # 1.25 insn per cycle ( +- 0.02% )
212,053,930 cycles ( +- 0.43% )
0.065508261 seconds time elapsed ( +- 0.77% )
/* original version */
entry *findName(char lastName[], entry *pHead)
{
while (pHead != NULL) {
if (strcasecmp(lastName, pHead->lastName) == 0)
return pHead;
pHead = pHead->pNext;
}
return NULL;
}
entry *append(char lastName[], entry *e)
{
/* allocate memory for the new entry and put lastName */
e->pNext = (entry *) malloc(sizeof(entry));
e = e->pNext;
strcpy(e->lastName, lastName);
e->pNext = NULL;
return e;
}
fp = fopen("./dictionary/all-names.txt", "r");
int counter = 0;
double total_time = 0.0;
i = 0;
while (fgets(input, sizeof(input), fp)) {
while (input[i] != '\0')
i++;
input[i - 1] = '\0';
i = 0;
/* compute the execution time */
clock_gettime(CLOCK_REALTIME, &start);
// TODO: findName()
clock_gettime(CLOCK_REALTIME, &end);
cpu_time2 = diff_in_second(start, end);
counter++;
total_time += cpu_time2;
}
printf("execution time of findName() : %lf sec\n", total_time/counter);
fclose(fp);
size of entry : 136 bytes
execution time of append() : 0.043385 sec
execution time of findName() : 0.003817 sec
Performance counter stats for './phonebook_orig':
2,703,006,518 cache-references
2,349,397,909 cache-misses # 86.918 % of all cache refs
52.409814206 seconds time elapsed
findName
,要提昇 findName
的速度,要馬尋找較快速,要馬降低 cache miss ,因為當 CPU 有 cache miss ,就必須跟其他記憶體存取資訊,使得搜尋速度便慢,故主要實現加速的方式有以下兩個大方向
reduce the size of __PHONE_BOOK_ENTRY
branch "feat/reduce-size"
findName
亦是以 entry
中的 lastName
來做主要搜尋,因此可以發現 entry
並不需要這大量的空間存放全部的資訊。struct __INFORMATION
來存放其他的資料,而原本的 entry
中存放 lastName
, firstName
, 以及一個指向其他 information
的指標。<參考 : ktvexe
2017q1 Homework1 (phonebook)>
// phonebook_opt.h
typedef struct __INFOMATION {
char email[16];
char phone[10];
char cell[10];
char addr1[16];
char addr2[16];
char city[16];
char state[2];
char zip[5];
} info_entry;
typedef struct __PHONE_BOOK_ENTRY {
char lastName[MAX_LAST_NAME_SIZE];
char firstName[16];
info_entry *infomation;
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
size of entry : 48 bytes
execution time of append() : 0.040588 sec
execution time of findName() : 0.002486 sec
Performance counter stats for './phonebook_opt' (100 runs):
196,161 cache-misses # 41.025 % of all cache refs ( +- 0.63% )
478,149 cache-references ( +- 0.63% )
248,685,554 instructions # 1.54 insn per cycle ( +- 0.02% )
161,622,141 cycles ( +- 0.47% )
0.050508480 seconds time elapsed ( +- 0.61% )
在這邊可以看到原本的 136 bytes 以降低為 46 bytes,因此 cache miss rate 也大幅下降為 41 %,所以 findName
的時間跟著下降, append
也是,因為所處理的資料量較少,整體速度都跟著變快
size of entry : 48 bytes
counter: 16751
execution time of append() : 0.047555 sec
execution time of findName() : 0.001276 sec
Performance counter stats for './phonebook_opt':
573,715,877 cache-references
181,525,064 cache-misses # 31.640 % of all cache refs
21.444725248 seconds time elapsed
Search table
brance "feat/search-table"
因為觀察到輸入檔 dictionary/words.txt
都是照著字母排列,如果說今天想要尋找的字串從第一個字就不相同,那就不應該繼續比較其他字,而是尋找下一行,但是這樣做也是增加多餘的尋找,如果建立一份每一個字母的第一筆資料是在哪一個位置再開始尋找,即可減少搜尋的次數
size of entry : 136 bytes
execution time of append() : 0.057229 sec
execution time of findName() : 0.000019 sec
Performance counter stats for './phonebook_opt' (100 runs):
208,331 cache-misses # 49.684 % of all cache refs ( +- 1.06% )
419,311 cache-references ( +- 2.06% )
205,218,950 instructions # 1.12 insn per cycle ( +- 0.02% )
183,614,971 cycles ( +- 0.86% )
0.058406260 seconds time elapsed ( +- 1.19% )
在這邊可以看到,雖然資料結構的大小並沒有改變,仍然為 136 bytes ,但是 findName
搜尋時間和 cache miss rate 都有明顯的下降,推測原因就是建立 table 後減少了不必要的查詢,相對的也不會過度的載入不需要的資料,造成大量的 cache miss
Performance counter stats for './phonebook_opt' (100 runs):
115,498 cache-misses # 47.900 % of all cache refs ( +- 1.01% )
241,123 cache-references ( +- 1.09% )
190,222,390 instructions # 1.42 insn per cycle ( +- 0.02% )
133,871,200 cycles ( +- 0.48% )
0.041614991 seconds time elapsed ( +- 0.55% )
測試其他測試值
size of entry : 48 bytes
execution time of append() : 0.063503 sec
execution time of findName() : 0.000050 sec
Performance counter stats for './phonebook_opt':
494,896 cache-misses # 12.532 % of all cache refs
3,949,064 cache-references
0.932224762 seconds time elapsed
問題探討
Hash table
branch "feat/hash-table"
跟 search table 有點類似,一樣是預先建立一個 table 但是這個表的建立方式是利用 hash 的方式來進行,而這個部份我是參考各种字符串Hash函数比较,然後選用 BKDRHash 方式,因為起來效果是最好的,並採用 seed = 131 ,比較困難的點是,考量到測試資料量有30多萬比,照理來說 hash table 應該要1萬以上,會比較妥當,但是我目前 hash 速度非常之慢,造成 append
反而上升兩倍,因此先將 hash table size 為 4500
// 參考網路實作 BKDHash
int BKDRHash(char *str) {
int seed = 131;
int hash_value = 0;
while (*str) {
hash_value = hash_value * seed + (*str++);
}
return hash_value & 0x00000FFF;
}
// 模仿 append() 實作 hash table 的 linear append
hash_entry* hash_linear_append(char lastName[], entry *value, hash_entry *block)
{
strcpy(value->lastName, lastName);
block->next = (hash_entry*) malloc(sizeof(hash_entry));
block = block->next;
strcpy(block->lastName, lastName);
block->value = value;
block->next = NULL;
return block;
}
entry *findName(char lastName[], hash_entry **table)
{
int value = BKDRHash(lastName);
hash_entry *temp = table[value];
do {
if (strcmp(temp->lastName, lastName) == 0) {
return temp->value;
}
temp = temp->next;
} while (temp != NULL);
return NULL;
}
其資料結構大小並沒有變化,但是同樣的 cache miss rate 及 findName
亦大幅下降
Performance counter stats for './phonebook_opt' (100 runs):
400,557 cache-misses # 38.316 % of all cache refs ( +- 1.87% )
1,045,407 cache-references ( +- 1.80% )
339,438,608 instructions # 1.05 insn per cycle ( +- 0.02% )
322,594,446 cycles ( +- 1.51% )
0.101027885 seconds time elapsed ( +- 1.65% )
stats
reduce and hash
Performance counter stats for './phonebook_opt' (100 runs):
285,057 cache-misses # 33.311 % of all cache refs ( +- 2.50% )
855,755 cache-references ( +- 1.73% )
332,727,023 instructions # 1.32 insn per cycle ( +- 0.03% )
251,489,583 cycles ( +- 1.51% )
0.077913353 seconds time elapsed ( +- 1.69% )
測試其他資料
size of entry : 48 bytes
execution time of append() : 0.092830 sec
execution time of findName() : 0.000000 sec
Performance counter stats for './phonebook_opt':
374,618 cache-misses # 38.882 % of all cache refs
963,477 cache-references
0.108437187 seconds time elapsed
append
、findName
、cache miss rate 的時間和比率,在降低資料結構的大小以及套用快速的搜尋演算法後,跟原始線性搜尋相比,減少搜尋次數,可以顯著的降低 cache miss rate,但是相對的 append
可能因為建立特殊的資料結構而使時間上升,而一結果整體下來,降低資料結構大小,並配合使用 hash table searching 演算法效果最好original | reduce size | search table | hash table | search table (reduce size) | hash table (reduce size) | |
---|---|---|---|---|---|---|
append | 0.050533 | 0.041713 | 0.057229 | 0.09508 | 0.039643 | 0.073392 |
findName | 0.00531 | 0.003453 | 0.000019 | 0.000002 | 0.000015 | 0.000001 |
cache miss rate | 79.894 % | 41.025 % | 49.684 % | 38.316 % | 47.900 % | 33.311 % |