---
tags: sysprog2017
---
# 2017q3 Homework1 (clz)
contributed by <`Yuessiah`, `HexRabbit`[^1]>
###### 本文件多處地方採用超連結來補完報告,建議點開來看
## 簡介
clz 為計算 2 進位數從 MSB 往右遇到的第一個 $1$ 之前所有 $0$ 的數量
## 開發環境
```
eeeeeeeeeeeeeeeee
eeeeeeeeeeeeeeeeeeeeeee
eeeee eeeeeeeeeeee eeeee OS: elementary os elementary OS 0.3.2 freya
eeee eeeee eee eeee Kernel: x86_64 Linux 4.4.0-93-generic
eeee eeee eee eeee Shell: bash 4.3.11
eee eee eee eee Virtualization: VT-x
eee eee eee eee CPU: Intel Core i7-7700 CPU @ 4.2GHz
ee eee eeee eeee CPU(s): 8
ee eee eeeee eeeeee
ee eee eeeee eeeee ee RAM: 1691MiB / 15934MiB
eee eeee eeeeee eeeee eee L1d cache: 32K
eee eeeeeeeeee eeeeee eee L1i cache: 32K
eeeeeeeeeeeeeeeeeeeeeeee eeeee L2 cache: 256K
eeeeeeee eeeeeeeeeeee eeee L3 cache: 8192K
eeeee eeeee
eeeeeee eeeeeee BogoMIPS: 7200.65
eeeeeeeeeeeeeeeee
```
## 實作方式
本份作業[^2]探討了幾個演算法,底下做些簡單的描述。
### [Binary search](https://en.wikipedia.org/wiki/Binary_search_algorithm)
#### iterative
```C
int clz(uint32_t x) {
int n = 32, c = 16;
do {
uint32_t y = x >> c;
if (y) { n -= c; x = y; }
c >>= 1;
} while (c);
return (n - x);
}
```
假設一個數字 ```00001010100011011100001010100101```
```x = 00001010100011011100001010100101```, $c = 16$, $n = 32$
```....________________________________```
```x = 00000000000000000000101010001101```, $c = 8$, $n = 16$
```...._______________^________________```
```x = 00000000000000000000000000001010```, $c = 4$, $n = 8$
```...._______________^_______^________```
```x = 00000000000000000000000000001010```, $c = 2$, $n = 8$
```...._______________^_______^___^____```
```x = 00000000000000000000000000000010```, $c = 1$, $n = 6$
```...._______________^_______^___^_^__```
```x = 00000000000000000000000000000001```, $c = 0$, $n = 5$
```...._______________^_______^___^_^^_```
$x = 1$, 最後```return (n - x);``` i.e. clz = 4.
#### bit mask
作業說明上直接叫他 binary search 挺怪的,因為其他兩份 code 也是 binary search 啊 😕
於是本文件決定改成 bit mask 了
```C
int clz(uint32_t x) {
if (x == 0) return 32;
int n = 0;
if (x <= 0x0000FFFF) { n += 16; x <<= 16; }
if (x <= 0x00FFFFFF) { n += 8; x <<= 8; }
if (x <= 0x0FFFFFFF) { n += 4; x <<= 4; }
if (x <= 0x3FFFFFFF) { n += 2; x <<= 2; }
if (x <= 0x7FFFFFFF) { n += 1; x <<= 1; }
return n;
}
```
利用 mask 概念判定第一個 $1$ 在哪出現。
以```0x0FFFFFFF```為例,
由前面的操作,第一個 $1$ 的位置一定會落在```0x00FFFFFF ~ 0xFFFFFFFF```之間
從這個範圍剖一半,若發現 $1$ 在右側代表 leading zeros 一定多,所以加上合理零的數量 4,並且做適當的位移;
若在左側則 leading zeros 較少,不做任何動作
#### byte shift
```C
int clz(uint32_t x) {
if (x == 0) return 32;
int n = 1;
if ((x >> 16) == 0) { n += 16; x <<= 16; }
if ((x >> 24) == 0) { n += 8; x <<= 8; }
if ((x >> 28) == 0) { n += 4; x <<= 4; }
if ((x >> 30) == 0) { n += 2; x <<= 2; }
n = n - (x >> 31);
return n;
}
```
只有 if 判斷跟 bit mask 寫法不同,但邏輯上是一樣的。
#### recursive
```ktvexe```同學寫的[^3]有點看不懂,我還是自己寫一份好了
```C
int clz(uint32_t x, int div=16, int n=32) {
if (div == 0) return n;
if (x < (1<<div-1)) return clz(x, div/2, n);
else return clz(x>>div, div/2, n-div);
}
```
做法跟 iterative 的版本差不多。
---
### [Hash table](https://en.wikipedia.org/wiki/Hash_table)
#### [De Bruijn sequence](https://en.wikipedia.org/wiki/De_Bruijn_sequence)
```C
const int tab32[32] = {
0 , 1 , 28, 2 , 29, 14, 24, 3,
30, 22, 20, 15, 25, 17, 4 , 8,
31, 27, 13, 23, 21, 19, 16, 7,
26, 12, 18, 6 , 11, 5 , 10, 9
};
int clz(uint32_t value) {
value |= value >> 1;
value |= value >> 2;
value |= value >> 4;
value |= value >> 8;
value |= value >> 16;
return tab32[((uint32_t)((value - (value >> 1))*0x077CB531U)) >> 27];
}
```
先將 value 變為除了第一個 $1$ ,其餘的 bit 都設為 $0$
接著用```0x077CB531U```這個 De Bruijn sequence 乘上 value,在將 table 映射到的值回傳。
#### Harley's algorithm
```C
uint8_t clz(uint32_t x)
{
static prog_uint8_t const Table[] = {
0xFF, 0, 0xFF, 15, 0xFF, 1, 28, 0xFF,
16, 0xFF, 0xFF, 0xFF, 2, 21, 29, 0xFF,
0xFF, 0xFF, 19, 17, 10, 0xFF, 12, 0xFF,
0xFF, 3, 0xFF, 6, 0xFF, 22, 30, 0xFF,
14, 0xFF, 27, 0xFF, 0xFF, 0xFF, 20, 0xFF,
18, 9, 11, 0xFF, 5, 0xFF, 0xFF, 13,
26, 0xFF, 0xFF, 8, 0xFF, 4, 0xFF, 25,
0xFF, 7, 24, 0xFF, 23, 0xFF, 31, 0xFF,
};
/* Propagate leftmost 1-bit to the right */
x = x | (x >> 1);
x = x | (x >> 2);
x = x | (x >> 4);
x = x | (x >> 8);
x = x | (x >> 16);
/* x = x * 0x6EB14F9 */
x = (x << 3) - x; /* Multiply by 7. */
x = (x << 8) - x; /* Multiply by 255. */
x = (x << 8) - x; /* Again. */
x = (x << 8) - x; /* Again. */
return pgm_read_byte(&Table[x >> 26]);
}
```
## 效能差異
![](https://i.imgur.com/NBtpd7k.png)
[^1]: 本共筆是與`HexRabbit`[共同研究](https://hackmd.io/IwVgbAxgZgJgDATgLQgOwCN1ICwGYpxIICGcxR6q2CUApgEzEz1xA===?view)得以撰寫而成
[^2]: [Hackmd/ C03: clz](https://hackmd.io/IYExBYQVgRmBacBOARjRKDMBjeTMxLxQBmAbJiJksEqiUA==?view)
[^3]: [Hackmd/ ktvexe 2016q3 clz](https://hackmd.io/s/BJBZn6Q6)