owned this note
owned this note
Published
Linked with GitHub
# 2016q3 Homework1 (clz)
contributed by <`yenWu`>
###### tags: `yenWu` `clz` `sysprog21`
[Github](https://github.com/yenWu/clz-tests)
# 建立測試檔案和四個版本的clz
我直接用了老師提供的四個版本,但發現 recursive 的版本沒有終止條件所以我就自己把iteration改成recursive
>> 作業要求會變更,晚點你重新留意 [name=jserv]
>> 收到 [name=彥寬]
在 Makefile裡我本來打算參照老師的 `#include IMPL`,而且我偷懶直接 include .c 檔,但發現我的 code 裡有一些問題,使用 gdb 時卻不能跳到 clz的function裡,原因可能有很多種可能( include的gdb不會放斷點進去),但目前我沒法驗證,所以我還是使用我平常的方式把它拆成 .o 和一個 main 這就能跳進去了,當然就是我寫錯了Q
測試檔案我很簡單的用了 for迴圈 跑 32個 assert,但由於我是用 for 所以就算是有問題也無法 print 出現在跑到哪一個參數。
## 時間浮動
>>I usually use my notebook with Intel Pentium CPU. However, I change to desktop computer, and the time shake is mystery disappear. I will fix it at the end of the work.[name=Yen-Kuan Wu]
目前我並沒有使用任何 clear_cache 指令,時間浮動非常的大
`$perf stat --repeat 5 -e cache-misses,cache-references,L1-icache-load-misses,L1-icache-load,L1-dcache-load-misses ./clz_recursive`
第一次結果
```
execution time of recursive 0.006160 sec
execution time of recursive 0.004613 sec
execution time of recursive 0.013008 sec
execution time of recursive 0.011653 sec
execution time of recursive 0.006010 sec
Performance counter stats for './clz_recursive' (5 runs):
16,633 cache-misses # 40.578 % of all cache refs ( +- 32.30% )
40,990 cache-references ( +- 8.19% )
0.010242438 seconds time elapsed ( +- 16.13% )
```
第二次結果
```
execution time of recursive 0.006384 sec
execution time of recursive 0.006204 sec
execution time of recursive 0.003486 sec
execution time of recursive 0.003474 sec
execution time of recursive 0.003501 sec
Performance counter stats for './clz_recursive' (5 runs):
5,933 cache-misses # 13.441 % of all cache refs ( +- 23.98% )
44,140 cache-references ( +- 2.36% )
0.005923746 seconds time elapsed ( +- 13.63% )
```
注意到了 cache miss 差非常多,而且上面的時間跳動非常大,所以我們來第二個實驗
```
execution time of recursive 0.009706 sec
execution time of recursive 0.010373 sec
execution time of recursive 0.007648 sec
execution time of recursive 0.004535 sec
execution time of recursive 0.011464 sec
Performance counter stats for './clz_recursive' (5 runs):
22,825 L1-icache-load-misses # 0.45% of all L1-icache hits ( +- 31.92% ) [59.96%]
5,113,605 L1-icache-load ( +- 3.40% ) [77.90%]
3,054 L1-dcache-load-misses ( +- 38.53% ) [63.57%]
0.011087284 seconds time elapsed ( +- 12.20% )
```
```
sudo perf stat --repeat 5 -e L1-icache-load-misses,L1-icache-load,L1-dcache-load-misses ./clz_recursive
execution time of recursive 0.004225 sec
execution time of recursive 0.007665 sec
execution time of recursive 0.007012 sec
execution time of recursive 0.003590 sec
execution time of recursive 0.004015 sec
Performance counter stats for './clz_recursive' (5 runs):
19,026 L1-icache-load-misses # 0.39% of all L1-icache hits ( +- 15.38% ) [61.59%]
4,873,784 L1-icache-load ( +- 0.27% )
7,210 L1-dcache-load-misses ( +- 48.44% ) [41.87%]
0.006931752 seconds time elapsed ( +- 13.37% )
```
讓人疑惑的是 L1 cache的數值居然出現這種相法的反例,當然我測試過非常多次,我的推斷搞不好跟 cache 沒關係
目前打算再去把 perf 讀過一變仔細思考到底怎麼找比較好
## Harley's alogrithm
I print the binary format to understand it.
* First, make all 0's be 1's after leading 1's.
```clike
/* Propagate leftmost 1-bit to the right */
x = x | (x >> 1);
x = x | (x >> 2);
x = x | (x >> 4);
x = x | (x >> 8);
x = x | (x >> 16);
```
>>we can understand it with a key 'accumulate'. Logic right shift 1 will accumulate to logic right shift 2, and so on.
>>1000 >> 1 | 1000 = 1100
>>1100 >> 2 | 1100 = 1111
>> [name=Yen-Kuan Wu]
Now, we transfer all of number with the same clz to the same num.
Let's keep going at the tail`return Table[ x >> 26]`.It make sense because of we should map each `clz` value to uniqne table element.**But we can see that element in the table is not uniqne.** We should fix it QQ!!!
>> we don't need to understand it because we should prove this alogrithm is correct. We can print `x >> 26` to check uniqune. If true ,we can fill the table. [name=Yen-Kuan Wu]
`$ sort -n -k 2 clz_tmp.txt`
32 0
31 1
16 3
30 5
3 6
15 8
29 12
10 13
2 14
12 18
14 19
21 20
19 22
28 25
25 27
9 29
1 30
17 32
4 34
11 38
13 40
22 41
20 42
26 44
18 47
5 48
23 51
27 53
6 55
24 57
7 58
8 60
0 62
This alogrithm is 1-1 mapping.
>> When I fill this table, I notice that teacher is pretty nice for us. HaHa~ [name=Yen-Kuan Wu]
* Second,
>> I will survey this alg latter.
# Gnuplot
tutorial in Chinese: https://dl.dropboxusercontent.com/u/1091156/yong/text/gnuplot%20introduction.pdf
```
plot {<ranges>}
{<function> | {"<datafile>" {datafile-modifiers}}}
{axes <axes>} {<title-spec>} {with <style>}
{, {definitions,} <function> ...}
```
* `plot` defaultly use first and second column to draw a x-y picture.