# 2016q3 Homework1(clz)
contributed by <`TempoJiJi`>
###### tags: `TempoJiJi` `clz` `sysprog21`
## 預期目標
1. 閱讀 [重新理解數值](/s/SkKZBXZT) 裡頭關於 count leading zero ([clz](https://en.wikipedia.org/wiki/Find_first_set#CLZ)) 的描述,設計實驗,比較 32-bit 數值對以下實做的效能差異:
* recursive version
* iteration version
* binary search technique
* byte-shift version
* Harley's algorithm
2. 比照 phonebook 和 compute-pi,設計一套 benchmark suite,得以針對所有的 32-bit 數值進行個別 clz 實做效能的分析,並透過 gnuplot 製圖
## 開發環境
* Ubuntu 16.04 LTS
* L1d cache: 32K
* L1i cache: 32K
* L2 cache: 256K
* L3 cache: 3072K
* Architecture: x86_64
* CPU op-mode(s): 32-bit, 64-bit
* Byte Order: Little Endian
* CPU(s): 4
* Model name: Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
---
## 測試每種version的正確性
### recursive version
```clike=
uint8_t clz_recursive(uint32_t x, int shift_len)
{
if(shift_len == 0)
return 0;
/* shift upper half down, rest is filled up with 0s */
uint16_t upper = (x >> shift_len);
// mask upper half away
uint16_t lower = ( x & (0xFFFF >> (16 - shift_len)) );
return upper ? clz_recursive(upper, shift_len >> 1) :
shift_len + clz_recursive(lower, shift_len >> 1);
}
```
多了一個參數記錄每次要shift多少位,每次進入lower的遞回就加shift_len
### Harley’s algorithm
```clike=
int clz_harley(unsigned x) {
char u = 100;
static char table[64] =
{32,31, u,16, u,30, 3, u, 15, u, u, u,29,10, 2, u,
u, u,12,14,21, u,19, u, u,28, u,25, u, 9, 1, u,
17, u, 4, u, u, u,11, u, 13,22,20, u,26, u, u,18,
5, u, u,23, u,27, u, 6, u,24, 7, u, 8, u, 0, u};
x = x | (x >> 1); // Propagate leftmost
x = x | (x >> 2); // 1-bit to the right.
x = x | (x >> 4);
x = x | (x >> 8);
x = x | (x >> 16);
x = (x << 3) - x; // Multiply by 7.
x = (x << 8) - x; // Multiply by 255.
x = (x << 8) - x; // Again.
x = (x << 8) - x; // Again.
return table[x >> 26];
}
```
其實有點不理解這個演算法的,可能還需要再研究一下...
參考資料:[P. 99-106. Number of leading zeros algorithms. - Hacker's Delight](http://www.hackersdelight.org/hdcodetxt/nlz.c.txt)
```clike=
for(int i=1;i<UINT_MAX;i++){
if( __builtin_clz(i) != clz_recursive(i,16))
printf("%d\n",i);
if( __builtin_clz(i) != clz_iteration(i))
printf("%d\n",i);
if( __builtin_clz(i) != clz_binary_search(i))
printf("%d\n",i);
if( __builtin_clz(i) != clz_byte_shift(i))
printf("%d\n",i);
if( __builtin_clz(i) != clz_harley(i))
printf("%d\n",i);
}
```
測試完後確認所有演算法都正確
---
以下爲各個版本的結果圖:
範圍:[100000:100000000]
1. iteration
![](https://i.imgur.com/lJh3ZNk.png)
2. recursion
![](https://i.imgur.com/bxSfTAX.png)
3. harley
![](https://i.imgur.com/oR1ffKP.png)
4. byte_shift
![](https://i.imgur.com/Mg80gL6.png)
5. binary_search
![](https://i.imgur.com/9IScPkC.png)
所有比較圖:
範圍:10000000~30000000 ,每次加100
![](https://i.imgur.com/Rd7IQyt.png)
數據跳動蠻嚴重的
嘗試讓數據跳動不要那麼大,把所有軟體都關掉,在測上面那張圖的時候有開着google chrome
![](https://i.imgur.com/xLdUgWM.png)
跟上面也差太多了...
這次嘗試開着chrome然後跑着youtube的影片測測看:
![](https://i.imgur.com/qOZsJJE.png)
可以很明顯的看到兩張圖的差別...雖然知道開着遊覽器會影響效能,但沒想到差那麼多...下次要測試的時候還是把能關的軟體都關掉吧,讓CPU空閒一點...