--- tags: sysprog2017 --- # 2017q3 Homework1 (clz) contributed by <`Yuessiah`, `HexRabbit`[^1]> ###### 本文件多處地方採用超連結來補完報告,建議點開來看 ## 簡介 clz 為計算 2 進位數從 MSB 往右遇到的第一個 $1$ 之前所有 $0$ 的數量 ## 開發環境 ``` eeeeeeeeeeeeeeeee eeeeeeeeeeeeeeeeeeeeeee eeeee eeeeeeeeeeee eeeee OS: elementary os elementary OS 0.3.2 freya eeee eeeee eee eeee Kernel: x86_64 Linux 4.4.0-93-generic eeee eeee eee eeee Shell: bash 4.3.11 eee eee eee eee Virtualization: VT-x eee eee eee eee CPU: Intel Core i7-7700 CPU @ 4.2GHz ee eee eeee eeee CPU(s): 8 ee eee eeeee eeeeee ee eee eeeee eeeee ee RAM: 1691MiB / 15934MiB eee eeee eeeeee eeeee eee L1d cache: 32K eee eeeeeeeeee eeeeee eee L1i cache: 32K eeeeeeeeeeeeeeeeeeeeeeee eeeee L2 cache: 256K eeeeeeee eeeeeeeeeeee eeee L3 cache: 8192K eeeee eeeee eeeeeee eeeeeee BogoMIPS: 7200.65 eeeeeeeeeeeeeeeee ``` ## 實作方式 本份作業[^2]探討了幾個演算法,底下做些簡單的描述。 ### [Binary search](https://en.wikipedia.org/wiki/Binary_search_algorithm) #### iterative ```C int clz(uint32_t x) { int n = 32, c = 16; do { uint32_t y = x >> c; if (y) { n -= c; x = y; } c >>= 1; } while (c); return (n - x); } ``` 假設一個數字 ```00001010100011011100001010100101``` ```x = 00001010100011011100001010100101```, $c = 16$, $n = 32$ ```....________________________________``` ```x = 00000000000000000000101010001101```, $c = 8$, $n = 16$ ```...._______________^________________``` ```x = 00000000000000000000000000001010```, $c = 4$, $n = 8$ ```...._______________^_______^________``` ```x = 00000000000000000000000000001010```, $c = 2$, $n = 8$ ```...._______________^_______^___^____``` ```x = 00000000000000000000000000000010```, $c = 1$, $n = 6$ ```...._______________^_______^___^_^__``` ```x = 00000000000000000000000000000001```, $c = 0$, $n = 5$ ```...._______________^_______^___^_^^_``` $x = 1$, 最後```return (n - x);``` i.e. clz = 4. #### bit mask 作業說明上直接叫他 binary search 挺怪的,因為其他兩份 code 也是 binary search 啊 😕 於是本文件決定改成 bit mask 了 ```C int clz(uint32_t x) { if (x == 0) return 32; int n = 0; if (x <= 0x0000FFFF) { n += 16; x <<= 16; } if (x <= 0x00FFFFFF) { n += 8; x <<= 8; } if (x <= 0x0FFFFFFF) { n += 4; x <<= 4; } if (x <= 0x3FFFFFFF) { n += 2; x <<= 2; } if (x <= 0x7FFFFFFF) { n += 1; x <<= 1; } return n; } ``` 利用 mask 概念判定第一個 $1$ 在哪出現。 以```0x0FFFFFFF```為例, 由前面的操作,第一個 $1$ 的位置一定會落在```0x00FFFFFF ~ 0xFFFFFFFF```之間 從這個範圍剖一半,若發現 $1$ 在右側代表 leading zeros 一定多,所以加上合理零的數量 4,並且做適當的位移; 若在左側則 leading zeros 較少,不做任何動作 #### byte shift ```C int clz(uint32_t x) { if (x == 0) return 32; int n = 1; if ((x >> 16) == 0) { n += 16; x <<= 16; } if ((x >> 24) == 0) { n += 8; x <<= 8; } if ((x >> 28) == 0) { n += 4; x <<= 4; } if ((x >> 30) == 0) { n += 2; x <<= 2; } n = n - (x >> 31); return n; } ``` 只有 if 判斷跟 bit mask 寫法不同,但邏輯上是一樣的。 #### recursive ```ktvexe```同學寫的[^3]有點看不懂,我還是自己寫一份好了 ```C int clz(uint32_t x, int div=16, int n=32) { if (div == 0) return n; if (x < (1<<div-1)) return clz(x, div/2, n); else return clz(x>>div, div/2, n-div); } ``` 做法跟 iterative 的版本差不多。 --- ### [Hash table](https://en.wikipedia.org/wiki/Hash_table) #### [De Bruijn sequence](https://en.wikipedia.org/wiki/De_Bruijn_sequence) ```C const int tab32[32] = { 0 , 1 , 28, 2 , 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4 , 8, 31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6 , 11, 5 , 10, 9 }; int clz(uint32_t value) { value |= value >> 1; value |= value >> 2; value |= value >> 4; value |= value >> 8; value |= value >> 16; return tab32[((uint32_t)((value - (value >> 1))*0x077CB531U)) >> 27]; } ``` 先將 value 變為除了第一個 $1$ ,其餘的 bit 都設為 $0$ 接著用```0x077CB531U```這個 De Bruijn sequence 乘上 value,在將 table 映射到的值回傳。 #### Harley's algorithm ```C uint8_t clz(uint32_t x) { static prog_uint8_t const Table[] = { 0xFF, 0, 0xFF, 15, 0xFF, 1, 28, 0xFF, 16, 0xFF, 0xFF, 0xFF, 2, 21, 29, 0xFF, 0xFF, 0xFF, 19, 17, 10, 0xFF, 12, 0xFF, 0xFF, 3, 0xFF, 6, 0xFF, 22, 30, 0xFF, 14, 0xFF, 27, 0xFF, 0xFF, 0xFF, 20, 0xFF, 18, 9, 11, 0xFF, 5, 0xFF, 0xFF, 13, 26, 0xFF, 0xFF, 8, 0xFF, 4, 0xFF, 25, 0xFF, 7, 24, 0xFF, 23, 0xFF, 31, 0xFF, }; /* Propagate leftmost 1-bit to the right */ x = x | (x >> 1); x = x | (x >> 2); x = x | (x >> 4); x = x | (x >> 8); x = x | (x >> 16); /* x = x * 0x6EB14F9 */ x = (x << 3) - x; /* Multiply by 7. */ x = (x << 8) - x; /* Multiply by 255. */ x = (x << 8) - x; /* Again. */ x = (x << 8) - x; /* Again. */ return pgm_read_byte(&Table[x >> 26]); } ``` ## 效能差異 ![](https://i.imgur.com/NBtpd7k.png) [^1]: 本共筆是與`HexRabbit`[共同研究](https://hackmd.io/IwVgbAxgZgJgDATgLQgOwCN1ICwGYpxIICGcxR6q2CUApgEzEz1xA===?view)得以撰寫而成 [^2]: [Hackmd/ C03: clz](https://hackmd.io/IYExBYQVgRmBacBOARjRKDMBjeTMxLxQBmAbJiJksEqiUA==?view) [^3]: [Hackmd/ ktvexe 2016q3 clz](https://hackmd.io/s/BJBZn6Q6)