[clz](https://hackmd.io/s/B1LHefQ6)

# [clz](https://hackmd.io/s/B1LHefQ6) [github](https://github.com/diana0651/clz-tests) contributed by <`Diana Ho`> ###### tags: `d0651` `sys` ## 案例分析 ### 作業要求 - [ ]閱讀[重新理解數值](https://hackmd.io/s/SkKZBXZT)裡頭關於 [count leading zero (clz)](https://en.wikipedia.org/wiki/Find_first_set#CLZ) 的描述 - [ ]設計實驗，比較 32-bit 數值對以下實做的效能差異： - [ ]recursive version - [ ]iteration version - [ ]binary search technique - [ ]byte-shift version - [ ]Harley’s algorithm * [ ]找至少 3 個案例，說明 clz 的應用場合 * 示範: [A Fast Hi Precision Fixed Point Divide](http://me.henri.net/fp-div.html) * 提示：透過 [Google Books](https://books.google.com/) 可以找到一些專門探討 optimization 的書籍，裡頭會有 clz 的應用 ### 開發環境 Ubuntu 14.04 LTS ## CLZ（Count Leading Zero）演算法 * 找到 32 位元整數中的 leading zero，最高位元為 1，最低位元為 32 ## 驗證程式碼1: recursive version ### 原版從一半(也就是16-bit)嘗試shift right，檢查shift之後的值是否為零，若為零，則shift right的bit數再減少一半，直至不為零；不為零則繼續利用shift right後的值 ```clike= uint8_t clz(uint32_t x) { /* shift upper half down, rest is filled up with 0s */ uint16_t upper = (x >> 16); // mask upper half away uint16_t lower = (x & 0xFFFF); return upper ? clz(upper) : 16 + clz(lower); } ``` #### 實驗結果 core dump - 程式碼中 16 應該要隨著進入不同層而變動 - 16 -> 8 -> 4 -> 2 -> 1 - 0xFFFF 的值要跟著位移 - (0xFFFF >> (16 - shift_len)) ### 改正版原版缺少遞迴終止條件與位移量 shift-bit 錯誤，所以加入一個參數記錄每次要 shift 多少位，每次進入 lower 的遞回就加 shift_len > >[參考概念](https://hackmd.io/IwEwHATAzAxgLAUwLRQggDEuA2AnLpAQwDNCCB2XdSYAIxG0ITCA?both) > > >[參考實作](https://hackmd.io/JzCGwMwEyg2BaADAJnPALARkwdngIwGZ10NNkBTdADlFlmAsKA==?both) ```clike= uint8_t clz_recursive(uint32_t x, int shift_len) { if(shift_len == 0) return 0; /* shift upper half down, rest is filled up with 0s */ uint16_t upper = (x >> shift_len); // mask upper half away uint16_t lower = ( x & (0xFFFF >> (16 - shift_len)) ); return upper ? clz_recursive(upper, shift_len >> 1) : shift_len + clz_recursive(lower, shift_len >> 1); } ``` ## 驗證程式碼2: iteration version 檢查 leading zero 是否在前 16 bit，若是則擠掉後 16 bit 繼續比。接著依序比 8、4、2... ```clike= int clz(uint32_t x) { int n = 32, c = 16; do { uint32_t y = x >> c; if (y) { n -= c; x = y; } c >>= 1; } while (c); return (n - x); } ``` ## 驗證程式碼3: binary search technique 先看 leading zero 可能若在哪個範圍，再進去把前面不需要判斷的 bit 都左移擠掉，再縮小範圍去看。 ```clike= int clz(uint32_t x) { if (x == 0) return 32; int n = 0; if (x <= 0x0000FFFF) { n += 16; x <<= 16; } if (x <= 0x00FFFFFF) { n += 8; x <<= 8; } if (x <= 0x0FFFFFFF) { n += 4; x <<= 4; } if (x <= 0x3FFFFFFF) { n += 2; x <<= 2; } if (x <= 0x7FFFFFFF) { n += 1; x <<= 1; } return n; } ``` ## 驗證程式碼4: byte-shift version 先判斷 x 的 leading zero 是在哪個範圍，若為後半部，則將前半部 bit 左移擠掉繼續判斷。 `n = n - (x >> 31)` 為判斷最後 1 bit 用。 ```clike= int clz(uint32_t x) { if (x == 0) return 32; int n = 1; if ((x >> 16) == 0) { n += 16; x <<= 16; } if ((x >> 24) == 0) { n += 8; x <<= 8; } if ((x >> 28) == 0) { n += 4; x <<= 4; } if ((x >> 30) == 0) { n += 2; x <<= 2; } n = n - (x >> 31); return n; } ``` ## 驗證程式碼5: Harley’s algorithm ### 原版這個演算法是拿來算log2，也就是算二進位數字中尾端有幾個0，本來以為算出ctz之後，在用31減掉就是clz，想了一下錯了，要先對原本的數字做reverse之後，再算ctz ```clike= uint8_t clz(uint32_t x) { static prog_uint8_t const Table[] = { 0xFF, 0, 0xFF, 15, 0xFF, 1, 28, 0xFF, 16, 0xFF, 0xFF, 0xFF, 2, 21, 29, 0xFF, 0xFF, 0xFF, 19, 17, 10, 0xFF, 12, 0xFF, 0xFF, 3, 0xFF, 6, 0xFF, 22, 30, 0xFF, 14, 0xFF, 27, 0xFF, 0xFF, 0xFF, 20, 0xFF, 18, 9, 11, 0xFF, 5, 0xFF, 0xFF, 13, 26, 0xFF, 0xFF, 8, 0xFF, 4, 0xFF, 25, 0xFF, 7, 24, 0xFF, 23, 0xFF, 31, 0xFF, }; /* Propagate leftmost 1-bit to the right */ x = x | (x >> 1); x = x | (x >> 2); x = x | (x >> 4); x = x | (x >> 8); x = x | (x >> 16); /* x = x * 0x6EB14F9 */ x = (x << 3) - x; /* Multiply by 7. */ x = (x << 8) - x; /* Multiply by 255. */ x = (x << 8) - x; /* Again. */ x = (x << 8) - x; /* Again. */ return pgm_read_byte(&Table[x >> 26]); } ``` 這個 table 是用來判斷 CTZ (最後面有幾個0) ### 改正版 > > [參考實作](http://www.hackersdelight.org/hdcodetxt/nlz.c.txt) ```clike= const char table[64] = {32,31, u,16, u,30, 3, u, 15, u, u, u,29,10, 2, u, u, u,12,14,21, u,19, u, u,28, u,25, u, 9, 1, u, 17, u, 4, u, u, u,11, u, 13,22,20, u,26, u, u,18, 5, u, u,23, u,27, u, 6, u,24, 7, u, 8, u, 0, u}; ``` --- u 在 table 中無用，可另外給定任意數字 ## 設計實驗 ### 結果驗證 - 使用 Assertion function 做比對驗證 - [Assertion 用法](http://ptolemy.eecs.berkeley.edu/~johnr/tutorials/assertions.html) - 在驗證程式中使用 assertion function 從 1 開始跑到 UNIT_MAX，比較 gcc 提供的內建函式 `__builtin_clz()` 和各個 clz 版本的結果是否相同。 - [built-in function in C](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html) ```clike= ``` ### 效能分析 >>[參考實作](https://hackmd.io/CYJgrAHADApgZsAtARksxAWEA2EiBGIMA7IgIbDBRnEJnLbJA===#時間量測方法) ![](https://i.imgur.com/5g3mYeV.png) ### RDTSC 量測 [Linux time measurement](http://stackoverflow.com/questions/12392278/measure-time-in-linux-time-vs-clock-vs-getrusage-vs-clock-gettime-vs-gettimeof) >>[參考實作](https://hackmd.io/GYJghsCsDMDGCmBaekQA5EBZrxIs0ADHtAOyYCM0mI8AbCKbEA==?view#rdtsc) --- <style> h2.part {color: red;} h3.part {color: green;} h4.part {color: blue;} h5.part {color: black;} </style>