owned this note
owned this note
Published
Linked with GitHub
# 2016q3Homework1(CLZ)
###### tags: `tundergod` `hw1` `2016q3`
contributed by <`tundergod`>
[**作業內容**](https://hackmd.io/s/B1LHefQ6)
## 作業要求
* 閱讀 重新理解數值 裡頭關於 count leading zero (clz) 的描述,設計實驗,比較 32-bit 數值對以下實做的效能差異:
* recursive version
* iteration version
* binary search technique
* byte-shift version
* Harley’s algorithm
### 測試方式
* 走訪全部的 32-bit 數值,用上述演算法帶入計算 clz 值,先驗證正確性,如果演算法不正確,試圖改正
* 比照 phonebook 和 compute-pi,設計一套 benchmark suite,得以針對所有的 32-bit 數值進行個別 clz 實做效能的分析,並透過 gnuplot 製圖
* 要附上個別數值實際耗費時間,不能只列出平均值
* 落點分析圖,類似 tcp-anaysis (with-code)
* 為了避免編譯器最佳化的影響,務必指定編譯參數 -O0 來抑制最佳化
## CLZ測試
* 本次測試基於[wikipeida](https://en.wikipedia.org/wiki/Find_first_set#CLZ)上給的五個clz算法
* clz 1:一個一個bit去位移,理論上是最慢的版本
```
int clz1()
{
n=0;
if (x == 0) return 32;
for (n = 0; ((x & 0x80000000) == 0); n++, x <<= 1);
return n;
}
```
* clz 2:clk1的進階版,以每次4bit的方法去位移
```
int clz2()
{
n=0;
if (x == 0) return 32;
for (n = 0; ((x & 0xF0000000) == 0); n += 4, x <<= 4);
n += (int)clz_table_4bit[x >> (32-4)];
return n;
}
```
* clz3:通過binary search的原理,第一次查看16bit,第二次8bit以此類推的做下去
```
int clz3()
{
if (x == 0) return(32);
n = 0;
if (x & 0x0000FFFF) {n = n +16; x = x <<16;}
if (x & 0x00FFFFFF) {n = n + 8; x = x << 8;}
if (x & 0x0FFFFFFF) {n = n + 4; x = x << 4;}
if (x & 0x3FFFFFFF) {n = n + 2; x = x << 2;}
if (x & 0x7FFFFFFF) {n = n + 1;}
return n;
}
```
* clz4:利用一個lookup table的幫助算法的增強
```
int clz4()
{
n=0;
if ((x & 0xFFFF0000) == 0) {n = 16; x <<= 16;} else {n = 0;}
if ((x & 0xFF000000) == 0) {n += 8; x <<= 8;}
if ((x & 0xF0000000) == 0) {n += 4; x <<= 4;}
n += (int)clz_table_4bit[x >> (32-4)];
return n;
}
```
* clz5:The fastest practical approach to simulate clz uses a precomputed 64KB lookup table
```
int clz5()
{
n=0;
if ((x & 0xFFFF0000) == 0)
return (int)clz5_table_16bit[x] + 16;
else
return (int)clz5_table_16bit[x >> 16];
}
```
* clz5提到的16bit loopup table:
```
static uint8_t clz5_table_16bit[65536] =
{16,15,14,14,13,13,13,13,12,12,12,12,..........
..................................0,0,0,0,0,0};
```
* 測試環境:
* ```for i in `seq 160000 100000 20000000`; ```
* 測試結果:
* 當數值爲0x00000001時(ans = 31):
![](https://i.imgur.com/ZR2rQvd.png)
* 當數值爲0x00000080時(ans = 24):
![](https://i.imgur.com/1OJmF0f.png)
* 當數值爲0x00008000時(ans = 16):
![](https://i.imgur.com/xzCzuFX.png)
* 當數值爲0x00800000時(ans = 8):
![](https://i.imgur.com/HE7LxjC.png)
* 當數值爲0x80000000時(ans = 0):
![](https://i.imgur.com/DsqdCHe.png)
>可以看到無論clz所求值爲多少,執行時間幾乎不變.
而意外的是最簡單的單位元位移算法竟然是最快的[name=lim wen sheng]
## 撰寫作業要求程式
### Recursion Version
```
uint8_t clz(uint32_t x,int half)
{
if(x == 0) return 32;
//end recursive after fifth
if(half == 0) return 0;
/* shift upper half down, rest is filled up with 0s */
uint16_t upper = (x >> half);
// mask upper half away
uint16_t lower = (x & (0x0000FFFF >> (16 - half)));
//if upper have value move it to lower else ans+16 exe recursive for another half
return upper ? clz(upper,half>>1) : (half) + clz(lower,half>>1);
}
```
### Iteration Version
```
uint8_t clz(uint32_t x)
{
int n = 0;
if (x == 0) return 32;
//every iterative left shift 1 bit if val = 0, ans++
for (n = 0; ((x & 0x80000000) == 0); n++, x <<= 1);
return n;
}
```
### Binary Search Technique
```
uint8_t clz(uint32_t x)
{
int n = 0;
if (x == 0) return(32);
//32 -> 16
if (x & 0x0000FFFF) {n = n +16; x = x << 16;}
//16 -> 8
if (x & 0x00FFFFFF) {n = n + 8; x = x << 8;}
//8 -> 4
if (x & 0x0FFFFFFF) {n = n + 4; x = x << 4;}
//4 -> 2
if (x & 0x3FFFFFFF) {n = n + 2; x = x << 2;}
//2 -> 1
if (x & 0x7FFFFFFF) {n = n + 1;}
return n;
}
```
### Byte-Shift Version
* 單純用byte-shift做不出來...
>參考學長筆記看到的版本都長這樣但是不知道爲什麼這是byte-shift[name=lim wen sheng]
```
unsigned clz(uint32_t x)
{
if (x == 0) return 32;
int n = 1;
if ((x >> 16) == 0) { n += 16; x <<= 16; }
if ((x >> 24) == 0) { n += 8; x <<= 8; }
if ((x >> 28) == 0) { n += 4; x <<= 4; }
if ((x >> 30) == 0) { n += 2; x <<= 2; }
n = n - (x >> 31);
return n;
}
```
### Harley's Algorithm
* refernces:
* [Eight-point algorithm](https://en.wikipedia.org/wiki/Eight-point_algorithm)
* [Some tricks useful for the Secure Hash Algorithm(Hacker's Delight)](http://www.hackersdelight.org/corres.txt)
>在上述網站可以看到6種類似Harley's Algorithm的做法,原理還在研究中...[name=lim wen sheng]
```
#include <stdio.h>
#include <stdint.h>
uint8_t clz(uint32_t x)
{
static unsigned char table [48] = {
32, 6, 5, 0, 4, 12, 0, 20,
15, 3, 11, 0, 0, 18, 25, 31,
8, 14, 2, 0, 10, 0, 0, 0,
0, 0, 0, 21, 0, 0, 19, 26,
7, 0, 13, 0, 16, 1, 22, 27,
9, 0, 17, 23, 28, 24, 29, 30
};
x = x | (x >> 1); // Propagate leftmost
x = x | (x >> 2); // 1-bit to the right.
x = x | (x >> 4);
x = x | (x >> 8);
x = x & ~(x >> 16);
x = x * 0x3EF5D037;
return table[x >> 26];
}
```
### 圖表
* recursive version
![](https://i.imgur.com/vWQXEw1.png)
* iteration version
![](https://i.imgur.com/pQc3qjr.png)
* binary search technique
![](https://i.imgur.com/qt3nHcU.png)
* byte-shift version
![](https://i.imgur.com/Puv1lqh.png)
* Harley’s algorithm
![](https://i.imgur.com/ap0SCHP.png)
* Compare All:
![](https://i.imgur.com/UzUoZOr.png)