--- tags:sysprog2018 --- # dict contributed by <`plusline`> --- ### perf [perf_stat_cache_miss.c](https://hackmd.io/s/B11109rdg#) ``` c static char array[10000][10000]; int main (void){ int i, j; for (i = 0; i < 10000; i++) for (j = 0; j < 10000; j++) array[j][i]++; return 0; } ``` ``` perf stat --repeat 5 -e cache-misses,cache-references,instructions,cycles ./perf_stat_cache_miss ``` ``` Performance counter stats for './perf_stat_cache_miss' (50 runs): 0 cache-misses # 0.000 % of all cache refs 0 cache-references 2,397,836,865 instructions # 1.10 insn per cycle ( +- 0.00% ) 2,177,686,992 cycles ( +- 0.57% ) 0.649085746 seconds time elapsed ( +- 0.57% ) ``` 一樣的程式碼,在我的電腦上卻沒有出現 cache miss 。是因為 cache 比較大,還是因為 branch prediction 做的比較好? :::info 關掉編譯器最佳化 (加上 `-O0` 編譯選項),然後再測試 :notes: jserv ::: :::success ``` root@plusline-System-Product-Name:~/plusline# g++ -O0 -o perf_test perf_stat_cache_miss.c root@plusline-System-Product-Name:~/plusline# perf stat --repeat 5 -e cache-misses,cache-references,instructions,cycles ./perf_test 11111 Performance counter stats for './perf_test' (5 runs): 0 cache-misses # 0.000 % of all cache refs 0 cache-references 2,497,839,142 instructions # 1.16 insn per cycle ( +- 0.01% ) 2,152,559,818 cycles ( +- 2.10% ) 0.634402145 seconds time elapsed ( +- 1.85% ) ``` 還是一樣 ::: 要減少cashe miss 主要分成兩種方法 Compiler-controlled prefetch 和 Compiler optimizations 。[Reducing Cache Miss Rate](http://ece-research.unm.edu/jimp/611/slides/chap5_3.html) ``` c= static char array[10000][10000]; int main (void){ int i, j; for (i = 0; i < 10000; i++) for (j = 0; j < 10000; j++) array[i][j]++; return 0; } ``` [perf_stat_cache_miss.c](https://hackmd.io/s/B11109rdg#) 改進 cache-misses 的方法就是屬於 Compiler optimizations 的 Loop interchange,因為陣列在記憶體中的放置方式,導致改進後的版本可以連續的存取鄰近的空間,進而降低 cache miss rate 。 ---- ### Ternary search tree [TST introduction](https://www.youtube.com/watch?v=xv4oRyqSKiw) summary of feature - Hashing needs to check entire key but TST works only for string. - Search miss and search hit may only involve few character. - TST support sort operation while hashing may not. - Faster than hashing and flexible than binary search tree. ### Bloom Filter [Bloom Filters](https://www.youtube.com/watch?v=bEmBh1HtYrw) 沒有 false negative,只有 false positive,但機率很小。 ### 練習 比較 test_ref.c 和 test_cpy.c 會發現 test_ref.c 少一輸出 runtime 的程式碼。加上後出現 segmentation fault 不知道怎解決,但仍能輸出 bench_ref.txt。 ```c if (argc == 2 && strcmp(argv[1], "--bench") == 0) { int stat = bench_test(root, BENCH_TEST_FILE, LMAX); tst_free_all(root); return stat; } FILE *output; output = fopen("ref.txt", "a"); if (output != NULL) { fprintf(output, "%.6f\n", t2 - t1); fclose(output); } else printf("open file error\n"); ``` 在利用產生的 bench_cpy.txt 和 bench_ref.txt 放在一起產生散布圖。 修改 `runtime3.gp`後面幾行 ``` set xtics 20000 plot [:247614][:]'bench_cpy.txt' using 1:2 with points title 'cpy',\ 'bench_ref.txt' using 1:2 with points title 'ref',\ ``` set xtics 20000 改x軸刻度為20000 ploot [:247614][:] 一共有247614筆資料,且讓 y 軸自動調整。 再加上 bench_ref.txt 的資料。 runtime of prefix search ![](https://i.imgur.com/SSplx9Z.png) 在 Makefile 加入 plot ``` plot: gnuplot scripts/runtime3.gp ``` 後來發現被修改後的 `runtime3.gp` 其實就是 `runtimept.gp` ``` git log --oneline -5 =>最近五次push git reset --hard HEAD =>還原最近一次版本 ``` test_ref.c 被我越改越爛,只好去學怎恢復版本,之前都是直接砍掉再 git clone