homework都是在ubuntu上作業,
對於從windows轉到linux的user,
其中一個面臨的問題就是輸入法,
示範安裝嘸蝦米輸入法
–->>> this command lshw (list hardware) can get more detailed hardware information.
下面只是一部分資訊
參考同學 GitHub 設定指引
astyle的參數,對source code編排的影響,請參考(http://astyle.sourceforge.net/astyle.html#_indent-switches)
$ astyle --style=kr --indent=spaces=4 --indent-switches --suffix=none *.[ch]
以下不像蒙地卡羅法,那是什麼方法,為什麼可以求出PI?
#include <stdio.h>
#include <unistd.h>
double compute_pi_baseline(size_t N) {
double pi = 0.0;
double dt = 1.0 / N;
for (size_t i = 0; i < N; i++) {
double x = (double) i / N;
pi += dt / (1.0 + x * x);
}
return pi * 4.0;
}
int main() {
printf("pid: %d\n", getpid());
sleep(10);
compute_pi_baseline(50000000);
return 0;
}
蒙地卡羅法裡有一個橢圓的相關方程式,(xx + yy < 1) 這是什麼?
$ sudo su <--- 切到root
$ echo -1 > /proc/sys/kernel/perf_event_paranoid --- 開啟可以取得所有event data的權限
$ sudo sh -c " echo 0 > /proc/sys/kernel/kptr_restrict" --- 如果要檢測 cache miss event ,需要先取消 kernel pointer 的禁用。
Ex:待測程式需印出PID,並延遲幾秒執行,讓另一個terminal可以執行perf top -p $pid(填上待測程式的PID),抓取相關的event data.
6 /* original version */
7 typedef struct __PHONE_BOOK_ENTRY {
8 char lastName[MAX_LAST_NAME_SIZE];
9 char firstName[16];
10 char email[16];
11 char phone[10];
12 char cell[10];
13 char addr1[16];
14 char addr2[16];
15 char city[16];
16 char state[2];
17 char zip[5];
18 struct __PHONE_BOOK_ENTRY *pNext;
19 } entry;
$ sudo su
$ echo -1 > /proc/sys/kernel/perf_event_paranoid
$ sudo sh -c " echo 0 > /proc/sys/kernel/kptr_restrict"
$ sudo perf stat -r 10 -e cache-misses,cache-references,L1-dcache-load-misses,L1-dcache-store-misses,L1-dcache-prefetch-misses,L1-icache-load-misses,instructions,cycles ./phonebook_orig
size of entry : 136 bytes
execution time of append() : 0.088558 sec
execution time of findName() : 0.017283 sec
Performance counter stats for './phonebook_orig' (10 runs):
827,166 cache-misses # 29.994 % of all cache refs ( +- 0.95% ) [44.38%]
2,757,744 cache-references ( +- 0.77% ) [31.05%]
2,838,813 L1-dcache-load-misses ( +- 1.31% ) [28.96%]
5,272,629 L1-dcache-store-misses ( +- 1.29% ) [28.35%]
0 L1-dcache-prefetch-misses
67,225 L1-icache-load-misses ( +- 3.70% ) [29.64%]
259,001,566 instructions # 0.79 insns per cycle ( +- 0.62% ) [44.15%]
328,689,512 cycles ( +- 0.40% ) [42.98%]
0.146228190 seconds time elapsed ( +- 0.43% )
27 typedef struct __PHONE_BOOK_ENTRY {
28 char lastName[MAX_LAST_NAME_SIZE];
29 #ifdef DATA_STRUCTURE_OPTIMIZATION_SMALLER_SIZE
30 releated *related_information;
31 #else
32 char firstName[16];
33 char email[16];
34 char phone[10];
35 char cell[10];
36 char addr1[16];
37 char addr2[16];
38 char city[16];
39 char state[2];
40 char zip[5];
41 #endif
42 struct __PHONE_BOOK_ENTRY *pNext;
43 } entry;
$ sudo su
$ echo -1 > /proc/sys/kernel/perf_event_paranoid
$ sudo sh -c " echo 0 > /proc/sys/kernel/kptr_restrict"
$ sudo perf stat -r 10 -e cache-misses,cache-references,L1-dcache-load-misses,L1-dcache-store-misses,L1-dcache-prefetch-misses,L1-icache-load-misses,instructions,cycles ./phonebook_opt
size of entry : 32 bytes
execution time of append() : 0.064288 sec
execution time of findName() : 0.005590 sec
Performance counter stats for './phonebook_opt' (10 runs):
70,125 cache-misses # 8.085 % of all cache refs ( +- 0.76% ) [42.66%]
867,400 cache-references ( +- 7.07% ) [29.23%]
634,781 L1-dcache-load-misses ( +- 2.59% ) [31.64%]
1,630,373 L1-dcache-store-misses ( +- 2.77% ) [33.57%]
0 L1-dcache-prefetch-misses
40,509 L1-icache-load-misses ( +- 6.46% ) [31.99%]
256,956,863 instructions # 1.32 insns per cycle ( +- 0.86% ) [45.52%]
193,948,741 cycles ( +- 0.54% ) [43.05%]
0.084300406 seconds time elapsed ( +- 0.54% )
46 typedef struct __LEVEL1_COMPARE {
47 char first_char_of_name;
48 entry *link;
49 } level1_compare;
$ sudo su
$ echo -1 > /proc/sys/kernel/perf_event_paranoid
$ sudo sh -c " echo 0 > /proc/sys/kernel/kptr_restrict"
$ sudo perf stat -r 10 -e cache-misses,cache-references,L1-dcache-load-misses,L1-dcache-store-misses,L1-dcache-prefetch-misses,L1-icache-load-misses,instructions,cycles ./phonebook_opt_2
size of entry : 32 bytes
execution time of append() : 18.591619 sec
execution time of findName() : 0.000032 sec
Performance counter stats for './phonebook_opt_2' (10 runs):
199,603 cache-misses # 0.008 % of all cache refs ( +- 6.19% ) [42.85%]
2,489,462,022 cache-references ( +- 0.02% ) [28.60%]
5,701,701,395 L1-dcache-load-misses ( +- 0.30% ) [28.59%]
10,203,165 L1-dcache-store-misses ( +- 3.69% ) [28.58%]
0 L1-dcache-prefetch-misses
1,635,607 L1-icache-load-misses ( +- 1.90% ) [28.58%]
23,329,940,160 instructions # 0.53 insns per cycle ( +- 0.02% ) [42.86%]
44,357,537,174 cycles ( +- 0.12% ) [42.84%]
18.643948613 seconds time elapsed ( +- 0.13% )
46 typedef struct __LEVEL1_COMPARE {
47 char first_char_of_name;
48 entry *the_last_node;
49 entry *link;
50 } level1_compare;
$ sudo su
$ echo -1 > /proc/sys/kernel/perf_event_paranoid
$ sudo sh -c " echo 0 > /proc/sys/kernel/kptr_restrict"
$ sudo perf stat -r 10 -e cache-misses,cache-references,L1-dcache-load-misses,L1-dcache-store-misses,L1-dcache-prefetch-misses,L1-icache-load-misses,instructions,cycles ./phonebook_opt_3
size of entry : 32 bytes
execution time of append() : 0.070129 sec
execution time of findName() : 0.000032 sec
Performance counter stats for './phonebook_opt_3' (10 runs):
31,586 cache-misses # 5.733 % of all cache refs ( +- 3.19% ) [41.21%]
550,933 cache-references ( +- 1.64% ) [37.54%]
425,916 L1-dcache-load-misses ( +- 0.95% ) [34.90%]
1,906,506 L1-dcache-store-misses ( +- 0.92% ) [32.54%]
0 L1-dcache-prefetch-misses
63,040 L1-icache-load-misses ( +- 8.00% ) [27.99%]
206,689,110 instructions # 1.22 insns per cycle ( +- 0.38% ) [38.99%]
168,967,583 cycles ( +- 0.69% ) [34.78%]
0.073860286 seconds time elapsed ( +- 0.73% )
original | 優化1 | 優化2 | 優化3 | |
execution time of append() | 0.088558 sec | 0.064288 sec | 18.591619 sec | 0.070129 sec |
execution time of findName() | 0.017283 sec | 0.005590 sec | 0.000032 sec | 0.000032 sec |
cache-misses | 827,166 | 70,125 | 199,603 | 31,586 |
cache-references | 2,757,744 | 867,400 | 2,489,462,022 | 550,933 |
instructions | 259,001,566 | 256,956,863 | 23,329,940,160 | 206,689,110 |
cycles | 328,689,512 | 193,948,741 | 44,357,537,174 | 168,967,583 |
append() cache-misses | 39.36% | |||
findName() cache-misses | 48.96% | 6.89% |
如何取得較小的function misses ? [youchihwang]Fri, Oct 7, 2016 5:10 AM
gnuplot,如何設定長方圖的寬度呢?[youchihwang]Fri, Oct 7, 2016 5:11 AM
original vs 優化1
1:structure減小,cache可以放進更多需要儲存的資料,pde因此減少cache-misses,但為什麼可以減少cache-references.
2:cache-misses減小了,run time也就減少。
優化1 vs 優化2
加了bucket link list, 將name的第一個字母做為index, 分別儲存在不同的link list。
1:append的作法,是以要加入的name的第一個字母做為index,再到對應的link list,依序搜尋直到最後一個node後,再加入name,因此runtime會較長
2:findname的作法,以要搜尋的name的第一個字母為index,再到相對應的link list裡搜尋,所以搜尋的時間會比從頭到尾的搜尋來得快。
優化2 vs 優化3
優化2的append時間太長的原因是以name的第一個字母做為index,再到相對應的link list去搜尋,直到最後一個node再搜入,為了加速執行時間,所以在bucket link list的index structure加入了指向最後一個node的member, 因此在插入name時,即可直接跳到最後一個node並插入其後。