owned this note
owned this note
Published
Linked with GitHub
# [clz](https://hackmd.io/s/B1LHefQ6)
[github](https://github.com/diana0651/clz-tests) contributed by <`Diana Ho`>
###### tags: `d0651` `sys`
## 案例分析
### 作業要求
- [ ]閱讀[重新理解數值](https://hackmd.io/s/SkKZBXZT)裡頭關於 [count leading zero (clz)](https://en.wikipedia.org/wiki/Find_first_set#CLZ) 的描述
- [ ]設計實驗,比較 32-bit 數值對以下實做的效能差異:
- [ ]recursive version
- [ ]iteration version
- [ ]binary search technique
- [ ]byte-shift version
- [ ]Harley’s algorithm
* [ ]找至少 3 個案例,說明 clz 的應用場合
* 示範: [A Fast Hi Precision Fixed Point Divide](http://me.henri.net/fp-div.html)
* 提示:透過 [Google Books](https://books.google.com/) 可以找到一些專門探討 optimization 的書籍,裡頭會有 clz 的應用
### 開發環境
Ubuntu 14.04 LTS
## CLZ(Count Leading Zero)演算法
* 找到 32 位元整數中的 leading zero,最高位元為 1,最低位元為 32
## 驗證程式碼1: recursive version
### 原版
從一半(也就是16-bit)嘗試shift right,檢查shift之後的值是否為零,若為零,則shift right的bit數再減少一半,直至不為零;不為零則繼續利用shift right後的值
```clike=
uint8_t clz(uint32_t x)
{
/* shift upper half down, rest is filled up with 0s */
uint16_t upper = (x >> 16);
// mask upper half away
uint16_t lower = (x & 0xFFFF);
return upper ? clz(upper) : 16 + clz(lower);
}
```
#### 實驗結果
core dump
- 程式碼中 16 應該要隨著進入不同層而變動
- 16 -> 8 -> 4 -> 2 -> 1
- 0xFFFF 的值要跟著位移
- (0xFFFF >> (16 - shift_len))
### 改正版
原版缺少遞迴終止條件與位移量 shift-bit 錯誤,所以加入一個參數記錄每次要 shift 多少位,每次進入 lower 的遞回就加 shift_len
> >[參考概念](https://hackmd.io/IwEwHATAzAxgLAUwLRQggDEuA2AnLpAQwDNCCB2XdSYAIxG0ITCA?both)
>
> >[參考實作](https://hackmd.io/JzCGwMwEyg2BaADAJnPALARkwdngIwGZ10NNkBTdADlFlmAsKA==?both)
```clike=
uint8_t clz_recursive(uint32_t x, int shift_len)
{
if(shift_len == 0)
return 0;
/* shift upper half down, rest is filled up with 0s */
uint16_t upper = (x >> shift_len);
// mask upper half away
uint16_t lower = ( x & (0xFFFF >> (16 - shift_len)) );
return upper ? clz_recursive(upper, shift_len >> 1) :
shift_len + clz_recursive(lower, shift_len >> 1);
}
```
## 驗證程式碼2: iteration version
檢查 leading zero 是否在前 16 bit,若是則擠掉後 16 bit 繼續比。接著依序比 8、4、2...
```clike=
int clz(uint32_t x) {
int n = 32, c = 16;
do {
uint32_t y = x >> c;
if (y) { n -= c; x = y; }
c >>= 1;
} while (c);
return (n - x);
}
```
## 驗證程式碼3: binary search technique
先看 leading zero 可能若在哪個範圍,再進去把前面不需要判斷的 bit 都左移擠掉,再縮小範圍去看。
```clike=
int clz(uint32_t x) {
if (x == 0) return 32;
int n = 0;
if (x <= 0x0000FFFF) { n += 16; x <<= 16; }
if (x <= 0x00FFFFFF) { n += 8; x <<= 8; }
if (x <= 0x0FFFFFFF) { n += 4; x <<= 4; }
if (x <= 0x3FFFFFFF) { n += 2; x <<= 2; }
if (x <= 0x7FFFFFFF) { n += 1; x <<= 1; }
return n;
}
```
## 驗證程式碼4: byte-shift version
先判斷 x 的 leading zero 是在哪個範圍,若為後半部,則將前半部 bit 左移擠掉繼續判斷。
`n = n - (x >> 31)` 為判斷最後 1 bit 用。
```clike=
int clz(uint32_t x) {
if (x == 0) return 32;
int n = 1;
if ((x >> 16) == 0) { n += 16; x <<= 16; }
if ((x >> 24) == 0) { n += 8; x <<= 8; }
if ((x >> 28) == 0) { n += 4; x <<= 4; }
if ((x >> 30) == 0) { n += 2; x <<= 2; }
n = n - (x >> 31);
return n;
}
```
## 驗證程式碼5: Harley’s algorithm
### 原版
這個演算法是拿來算log2,也就是算二進位數字中尾端有幾個0,本來以為算出ctz之後,在用31減掉就是clz,想了一下錯了,要先對原本的數字做reverse之後,再算ctz
```clike=
uint8_t clz(uint32_t x)
{
static prog_uint8_t const Table[] = {
0xFF, 0, 0xFF, 15, 0xFF, 1, 28, 0xFF,
16, 0xFF, 0xFF, 0xFF, 2, 21, 29, 0xFF,
0xFF, 0xFF, 19, 17, 10, 0xFF, 12, 0xFF,
0xFF, 3, 0xFF, 6, 0xFF, 22, 30, 0xFF,
14, 0xFF, 27, 0xFF, 0xFF, 0xFF, 20, 0xFF,
18, 9, 11, 0xFF, 5, 0xFF, 0xFF, 13,
26, 0xFF, 0xFF, 8, 0xFF, 4, 0xFF, 25,
0xFF, 7, 24, 0xFF, 23, 0xFF, 31, 0xFF,
};
/* Propagate leftmost 1-bit to the right */
x = x | (x >> 1);
x = x | (x >> 2);
x = x | (x >> 4);
x = x | (x >> 8);
x = x | (x >> 16);
/* x = x * 0x6EB14F9 */
x = (x << 3) - x; /* Multiply by 7. */
x = (x << 8) - x; /* Multiply by 255. */
x = (x << 8) - x; /* Again. */
x = (x << 8) - x; /* Again. */
return pgm_read_byte(&Table[x >> 26]);
}
```
這個 table 是用來判斷 CTZ (最後面有幾個0)
### 改正版
> > [參考實作](http://www.hackersdelight.org/hdcodetxt/nlz.c.txt)
```clike=
const char table[64] =
{32,31, u,16, u,30, 3, u, 15, u, u, u,29,10, 2, u,
u, u,12,14,21, u,19, u, u,28, u,25, u, 9, 1, u,
17, u, 4, u, u, u,11, u, 13,22,20, u,26, u, u,18,
5, u, u,23, u,27, u, 6, u,24, 7, u, 8, u, 0, u};
```
---
u 在 table 中無用,可另外給定任意數字
## 設計實驗
### 結果驗證
- 使用 Assertion function 做比對驗證
- [Assertion 用法](http://ptolemy.eecs.berkeley.edu/~johnr/tutorials/assertions.html)
- 在驗證程式中使用 assertion function 從 1 開始跑到 UNIT_MAX,比較 gcc 提供的內建函式 `__builtin_clz()` 和各個 clz 版本的結果是否相同。
- [built-in function in C](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html)
```clike=
```
### 效能分析
>>[參考實作](https://hackmd.io/CYJgrAHADApgZsAtARksxAWEA2EiBGIMA7IgIbDBRnEJnLbJA===#時間量測方法)
![](https://i.imgur.com/5g3mYeV.png)
### RDTSC 量測
[Linux time measurement](http://stackoverflow.com/questions/12392278/measure-time-in-linux-time-vs-clock-vs-getrusage-vs-clock-gettime-vs-gettimeof)
>>[參考實作](https://hackmd.io/GYJghsCsDMDGCmBaekQA5EBZrxIs0ADHtAOyYCM0mI8AbCKbEA==?view#rdtsc)
---
<style>
h2.part {color: red;}
h3.part {color: green;}
h4.part {color: blue;}
h5.part {color: black;}
</style>