Try   HackMD

2016q3 Homework1(clz)

Reviewed by shouchengH

  • 可以用一張圖, 或各 function 的結果, 來顯示成果
  • recursive version中 , if (i == 4) return !upper; , 可以利用取的size來取代4
  • clz 呼叫clz_helper會使效能降低
  • 如果將uint16_t upper 的定義, 分別定成uint8_t ,可節省空間
  • Binary Serch 可以將function head 秀出, 可以清楚了解程個函式
  • 自己的文件上可以加上 contributed by <XXX>

目標

閱讀 重新理解數值 裡頭關於 count leading zero (clz) 的描述,設計實驗,比較 32-bit 數值對以下實做的效能差異:

  • recursive version
  • iteration version
  • binary search technique
  • byte-shift version
  • Harley’s algorithm

recursive version

原本的沒有設終止條件

uint8_t clz(uint32_t x) { /* shift upper half down, rest is filled up with 0s */ uint16_t upper = (x >> 16); // mask upper half away uint16_t lower = (x & 0xFFFF); return upper ? clz(upper) : 16 + clz(lower); }

修正後

static uint32_t offset[] = { 16, 8, 4, 2, 1 }; static uint32_t mask[] = { 0xFFFF, 0x00FF, 0x000F, 0x0003, 0x0001 }; static int clz_helper(uint32_t x, int i) { /* shift upper half down, rest is filled up with 0s */ uint16_t upper = (x >> offset[i]); // mask upper half away uint16_t lower = (x & mask[i]); if (i == 4) return !upper; return upper ? clz_helper(upper, i + 1) : offset[i] + clz_helper(lower, i + 1); } int clz(uint32_t x) { if (!x) return 32; clz_helper(x, 0); }

Harley’s algorithm

uint8_t clz(uint32_t x) { static prog_uint8_t const Table[] = { 0xFF, 0, 0xFF, 15, 0xFF, 1, 28, 0xFF, 16, 0xFF, 0xFF, 0xFF, 2, 21, 29, 0xFF, 0xFF, 0xFF, 19, 17, 10, 0xFF, 12, 0xFF, 0xFF, 3, 0xFF, 6, 0xFF, 22, 30, 0xFF, 14, 0xFF, 27, 0xFF, 0xFF, 0xFF, 20, 0xFF, 18, 9, 11, 0xFF, 5, 0xFF, 0xFF, 13, 26, 0xFF, 0xFF, 8, 0xFF, 4, 0xFF, 25, 0xFF, 7, 24, 0xFF, 23, 0xFF, 31, 0xFF, }; /* Propagate leftmost 1-bit to the right */ x = x | (x >> 1); x = x | (x >> 2); x = x | (x >> 4); x = x | (x >> 8); x = x | (x >> 16); /* x = x * 0x6EB14F9 */ x = (x << 3) - x; /* Multiply by 7. */ x = (x << 8) - x; /* Multiply by 255. */ x = (x << 8) - x; /* Again. */ x = (x << 8) - x; /* Again. */ return pgm_read_byte(&Table[x >> 26]); }

Binary Serch

{
    if (x == 0) return 32;
    int n = 0;
    if (x <= 0x0000FFFF) { n += 16; x <<= 16; }
    if (x <= 0x00FFFFFF) { n += 8; x <<= 8; }
    if (x <= 0x0FFFFFFF) { n += 4; x <<= 4; }
    if (x <= 0x3FFFFFFF) { n += 2; x <<= 2; }
    if (x <= 0x7FFFFFFF) { n += 1; x <<= 1; }
    return n;
}

參考資料

勃興筆記