---
tags:sysprog2018
---
# dict
contributed by <`plusline`>
---
### perf
[perf_stat_cache_miss.c](https://hackmd.io/s/B11109rdg#)
``` c
static char array[10000][10000];
int main (void){
int i, j;
for (i = 0; i < 10000; i++)
for (j = 0; j < 10000; j++)
array[j][i]++;
return 0;
}
```
```
perf stat --repeat 5 -e cache-misses,cache-references,instructions,cycles ./perf_stat_cache_miss
```
```
Performance counter stats for './perf_stat_cache_miss' (50 runs):
0 cache-misses # 0.000 % of all cache refs
0 cache-references
2,397,836,865 instructions # 1.10 insn per cycle ( +- 0.00% )
2,177,686,992 cycles ( +- 0.57% )
0.649085746 seconds time elapsed ( +- 0.57% )
```
一樣的程式碼,在我的電腦上卻沒有出現 cache miss 。是因為 cache 比較大,還是因為 branch prediction 做的比較好?
:::info
關掉編譯器最佳化 (加上 `-O0` 編譯選項),然後再測試
:notes: jserv
:::
:::success
```
root@plusline-System-Product-Name:~/plusline# g++ -O0 -o perf_test perf_stat_cache_miss.c
root@plusline-System-Product-Name:~/plusline# perf stat --repeat 5 -e cache-misses,cache-references,instructions,cycles ./perf_test
11111
Performance counter stats for './perf_test' (5 runs):
0 cache-misses # 0.000 % of all cache refs
0 cache-references
2,497,839,142 instructions # 1.16 insn per cycle ( +- 0.01% )
2,152,559,818 cycles ( +- 2.10% )
0.634402145 seconds time elapsed ( +- 1.85% )
```
還是一樣
:::
要減少cashe miss 主要分成兩種方法 Compiler-controlled prefetch 和 Compiler optimizations 。[Reducing Cache Miss Rate](http://ece-research.unm.edu/jimp/611/slides/chap5_3.html)
``` c=
static char array[10000][10000];
int main (void){
int i, j;
for (i = 0; i < 10000; i++)
for (j = 0; j < 10000; j++)
array[i][j]++;
return 0;
}
```
[perf_stat_cache_miss.c](https://hackmd.io/s/B11109rdg#) 改進 cache-misses 的方法就是屬於 Compiler optimizations 的 Loop interchange,因為陣列在記憶體中的放置方式,導致改進後的版本可以連續的存取鄰近的空間,進而降低 cache miss rate 。
----
### Ternary search tree
[TST introduction](https://www.youtube.com/watch?v=xv4oRyqSKiw)
summary of feature
- Hashing needs to check entire key but TST works only for string.
- Search miss and search hit may only involve few character.
- TST support sort operation while hashing may not.
- Faster than hashing and flexible than binary search tree.
### Bloom Filter
[Bloom Filters](https://www.youtube.com/watch?v=bEmBh1HtYrw)
沒有 false negative,只有 false positive,但機率很小。
### 練習
比較 test_ref.c 和 test_cpy.c 會發現 test_ref.c 少一輸出 runtime 的程式碼。加上後出現 segmentation fault 不知道怎解決,但仍能輸出 bench_ref.txt。
```c
if (argc == 2 && strcmp(argv[1], "--bench") == 0) {
int stat = bench_test(root, BENCH_TEST_FILE, LMAX);
tst_free_all(root);
return stat;
}
FILE *output;
output = fopen("ref.txt", "a");
if (output != NULL) {
fprintf(output, "%.6f\n", t2 - t1);
fclose(output);
} else
printf("open file error\n");
```
在利用產生的 bench_cpy.txt 和 bench_ref.txt 放在一起產生散布圖。
修改 `runtime3.gp`後面幾行
```
set xtics 20000
plot [:247614][:]'bench_cpy.txt' using 1:2 with points title 'cpy',\
'bench_ref.txt' using 1:2 with points title 'ref',\
```
set xtics 20000 改x軸刻度為20000
ploot [:247614][:] 一共有247614筆資料,且讓 y 軸自動調整。
再加上 bench_ref.txt 的資料。
runtime of prefix search
![](https://i.imgur.com/SSplx9Z.png)
在 Makefile 加入 plot
```
plot:
gnuplot scripts/runtime3.gp
```
後來發現被修改後的 `runtime3.gp` 其實就是 `runtimept.gp`
```
git log --oneline -5 =>最近五次push
git reset --hard HEAD =>還原最近一次版本
```
test_ref.c 被我越改越爛,只好去學怎恢復版本,之前都是直接砍掉再 git clone