2016q3 Homework1 (clz)

contributed by <carolc0708>

作業要求

[ ]閱讀重新理解數值裡頭關於 count leading zero (clz) 的描述
設計實驗，比較 32-bit 數值對以下實做的效能差異：
- [ ]recursive version
- [ ]iteration version
- [ ]binary search technique
- [ ]byte-shift version
- [ ]Harley’s algorithm
[ ]找至少 3 個案例，說明 clz 的應用場合
示範: A Fast Hi Precision Fixed Point Divide
提示：透過 Google Books 可以找到一些專門探討 optimization 的書籍，裡頭會有 clz 的應用

CLZ（Count Leading Zero）演算法

找到 32 位元整數中的 leading zero，最高位元為 1，最低位元為 32
修改作業說明中所提供的參考程式碼:

recursive version

原版:

uint8_t clz(uint32_t x)
{
	/* shift upper half down, rest is filled up with 0s */
	uint16_t upper = (x >> 16); 
	// mask upper half away
	uint16_t lower = (x & 0xFFFF);
	return upper ? clz(upper) : 16 + clz(lower);
}

修正版:

uint8_t clz_binary_recursive(uint32_t x, int shift)
{
    if(x == 0)
        return 32;
    if(shift == 0)
        return 0;
    if(x <= (0xFFFFFFFF >> shift))
        return shift + clz_binary_recursive(x << shift, shift / 2);
    else
        return clz_binary_recursive(x, shift / 2);
}

參考多位同學實作，發現 recursive 若是一個個 bit 去計算，還可以寫成更簡單的方式:

uint8_t clz_recursive(uint32_t x)
{
    return x ? clz_recursive(x>>1) -1 : 32;
}

iteration version

int clz(uint32_t x) {
    int n = 32, c = 16;
    do {
        uint32_t y = x >> c;
        if (y) { n -= c; x = y; }
        c >>= 1;
    } while (c);
    return (n - x);
}

檢查 leading zero 是否在前 16 bit，若是則擠掉後 16 bit 繼續比。接著依序比 8、4、2…

參考 CheHsuan 同學的筆記， iteration version 若是一個個 bit 下去看的話，也可以寫成另一種方式:

uint8_t clz_iteration(uint32_t x)
{
    int n = 0;
    while(x)
    {
        x >>= 1;
        n += 1;
    }
    return 32-n;
}

Harley’s algorithm

uint8_t clz(uint32_t x)
{
    static prog_uint8_t const Table[] = {
      0xFF, 0, 0xFF, 15, 0xFF, 1, 28, 0xFF,
      16, 0xFF, 0xFF, 0xFF, 2, 21, 29, 0xFF,
      0xFF, 0xFF, 19, 17, 10, 0xFF, 12, 0xFF,
      0xFF, 3, 0xFF, 6, 0xFF, 22, 30, 0xFF,
      14, 0xFF, 27, 0xFF, 0xFF, 0xFF, 20, 0xFF,
      18, 9, 11, 0xFF, 5, 0xFF, 0xFF, 13,
      26, 0xFF, 0xFF, 8, 0xFF, 4, 0xFF, 25,
      0xFF, 7, 24, 0xFF, 23, 0xFF, 31, 0xFF,
    };

    /* Propagate leftmost 1-bit to the right */
    x = x | (x >> 1);
    x = x | (x >> 2);
    x = x | (x >> 4);
    x = x | (x >> 8);
    x = x | (x >> 16);
 
    /* x = x * 0x6EB14F9 */
    x = (x << 3) - x;   /* Multiply by 7. */
    x = (x << 8) - x;   /* Multiply by 255. */
    x = (x << 8) - x;   /* Again. */
    x = (x << 8) - x;   /* Again. */

    return pgm_read_byte(&Table[x >> 26]);
}

修正版:
將 table 改為以下，u 可以為任意值。

   const char table[64] =
     {32,31, u,16, u,30, 3, u,15, u, u, u,29,10, 2, u,
       u, u,12,14,21, u,19, u,u,28, u,25, u, 9, 1, u,
      17, u, 4, u, u, u,11, u,13,22,20, u,26, u, u,18,
       5, u, u,23, u,27, u, 6,u,24, 7, u, 8, u, 0, u};

binary search technique

int clz(uint32_t x) {
    if (x == 0) return 32;
    int n = 0;
    if (x <= 0x0000FFFF) { n += 16; x <<= 16; }
    if (x <= 0x00FFFFFF) { n += 8; x <<= 8; }
    if (x <= 0x0FFFFFFF) { n += 4; x <<= 4; }
    if (x <= 0x3FFFFFFF) { n += 2; x <<= 2; }
    if (x <= 0x7FFFFFFF) { n += 1; x <<= 1; }
    return n;
}

先看 leading zero 可能若在哪個範圍，再進去把前面不需要判斷的 bit 都左移擠掉，再縮小範圍去看。

byte-shift version

int clz(uint32_t x) {
    if (x == 0) return 32;
    int n = 1;
    if ((x >> 16) == 0) { n += 16; x <<= 16; }
    if ((x >> 24) == 0) { n += 8; x <<= 8; }
    if ((x >> 28) == 0) { n += 4; x <<= 4; }
    if ((x >> 30) == 0) { n += 2; x <<= 2; }
    n = n - (x >> 31);
    return n;
}

先判斷 x 的 leading zero 是在哪個範圍，若為後半部，則將前半部 bit 左移擠掉繼續判斷。
n = n - (x >> 31) 為判斷最後 1 bit 用。

測試結果

結果驗證

寫 verify.c 做程式驗證。
使用 Assertion function 做比對驗證
參考 Assertion 用法，在驗證程式中使用 assertion function 比較 __builtin_clz() 和各個 clz 版本的結果是否相同。

Assertions in C

In C, assertions are implemented with the standard assert macro. The argument to assert must be true when the macro is executed, otherwise the program aborts and prints an error message. For example, the assertion
assert( size <= LIMIT );
will abort the program and print an error message like this:
Assertion violation: file tripe.c, line 34: size <= LIMIT
if size is greater than LIMIT.
使用 __builtin_clz(x) 來驗證計算是否正確
剛開始一直出現錯誤，把值一個個印出來才發現 __builtin_clz(0) 是 31，但應該是 32 才正確。

查看 built-in function in C:

Built-in Function: int __builtin_clz (unsigned int x)

Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined.

發現 __builtin_clz(0) 的結果是 undefined。
為避免造成錯誤就先不使用，僅檢查 x =1~UINT_MAX 間所有數值結果。

效能分析

CLZ 使用案例

參考 Hacker's Delight這本書當中所提及之應用:

Devision Algorithms(p.184)
Integer Square Root, Newtons' method(p.281)
Searching a word for a consecutive string
(for disk block allocation algorithm)