2017q3 Homework1 (clz)

# 2017q3 Homework1 (clz) contributed by<`zhanyangch`> ### Reviewed by `st9007a` - Iteration , Bit Shift , Binary 版本的程式碼應一併解釋 - 建議在做測試前可以先使用現有程式碼對各版本 clz 演算法做驗證 - 展現出分布圖後，請對數據做分析 - 應用場合可以列出相關程式碼 - 作業要求用 C99 或 C11 的 `_Generic` 改寫程式碼，建議可以嘗試 ## harley algorithm * 首先會將第一次出現 1 以後的 bit 都填為 1，例如計算0xF1 00000000000000000000000011110001 的結果如下： 00000000000000000000000011111111 做完這個動作後，數值只剩下 33種形式(以 32bit 為例)，前面有 m 個 0 之後有 n 個 1，m+n=32 ``` clike x = x | (x >> 1); x = x | (x >> 2); x = x | (x >> 4); x = x | (x >> 8); x = x | (x >> 16); ``` * 只剩下 33 個值之後，卻仍然佔據 32bit，思考如何可以快速判定這 33 個值的 clz，如果可以找到合適 hash function，可以在不碰撞的情況下 mapping 到較小的數值範圍，並建立 table 對應 clz 的結果。如何找到此種 hash function ，有比較快的方式嗎？需要再研究 ```clike static uint8_t const Table[] ={ 32,31, 0,16, 0,30, 3, 0,15, 0, 0, 0,29,10, 2, 0, 0, 0,12,14,21, 0,19, 0, 0,28, 0,25, 0, 9, 1, 0, 17, 0, 4, 0, 0, 0,11, 0,13,22,20, 0,26, 0, 0,18, 5, 0, 0,23, 0,27, 0, 6,0,24, 7, 0, 8, 0, 0, 0 }; /* x = x * 0x6EB14F9 */ x = (x << 3) - x; /* Multiply by 7. */ x = (x << 8) - x; /* Multiply by 255. */ x = (x << 8) - x; /* Again. */ x = (x << 8) - x; /* Again. */ return Table[(x >> 26)]; ``` ## recursive * 每次將 bit 數目減半，利用 magic[] 減少遞迴次數，當遞迴到 c=3 時 upper、lower 皆為 2bit magic[] 列舉出所有可能的值，查表即可得 clz ```clike static const int mask[]={0,8,12,14}; static const int magic[]={2,1,0,0}; unsigned clz2(uint32_t x,int c) { if (!x && !c) return 32; uint32_t upper = (x >> (16>>c)); uint32_t lower = (x & (0xFFFF>>mask[c])); if (c == 3) return upper ? magic[upper] : 2 + magic[lower]; return upper ? clz2(upper, c + 1) : (16 >> (c)) + clz2(lower, c + 1); } ``` ## 觀察時間分佈 * 執行環境 ``` $ lscpu Architecture: x86_64 CPU 作業模式： 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 每核心執行緒數：2 每通訊端核心數：2 Socket(s): 1 NUMA 節點： 1 供應商識別號： GenuineIntel CPU 家族： 6 型號： 42 Model name: Intel(R) Core(TM) i7-2640M CPU @ 2.80GHz 製程： 7 CPU MHz： 855.421 CPU max MHz: 3500.0000 CPU min MHz: 800.0000 BogoMIPS: 5587.06 虛擬： VT-x L1d 快取： 32K L1i 快取： 32K L2 快取： 256K L3 快取： 4096K NUMA node0 CPU(s): 0-3 ``` ```shell $ make $ make run $ make plot ``` ![](https://i.imgur.com/NlSf8HW.png) ## 應用 ### Exponential-Golomb coding 在 H.264 使用的編碼 Exponential-Golomb coding(指數哥倫布編碼）中，解碼時先計算 clz 決定該值編碼的長度 * 0 階指數哥倫布編碼的方法 1. 將要編碼的數以二進位表達並 +1 2. bit 數為 y，則將 y-1 個 0 加入在此數前面 0 ⇒ 1 ⇒ 1 1 ⇒ 10 ⇒ 010 2 ⇒ 11 ⇒ 011 3 ⇒ 100 ⇒ 00100 4 ⇒ 101 ⇒ 00101 5 ⇒ 110 ⇒ 00110 6 ⇒ 111 ⇒ 00111 7 ⇒ 1000 ⇒ 0001000 8 ⇒ 1001 ⇒ 0001001 若我們在前面找到 M 個 0 則代表此數總長度為 2*M+1 * [範例程式](https://stackoverflow.com/questions/2363500/does-anyone-have-an-easy-solution-to-parsing-exp-golomb-codes-using-c) 改為一次讀取 1byte ，將其 leadingZeroBits 改為課堂上使用的 clz 計算 ```clike int32_t getBitByPos(unsigned char *buffer, int32_t pos) { return (buffer[pos/8] >> (8 - pos%8) & 0x01); } int32_t getByteByPos(unsigned char *buffer, int32_t pos) { return (buffer[pos/8] >> (8 - pos%8)); } uint32_t decodeGolomb(unsigned char *byteStream, uint32_t *index) { uint32_t leadingZeroBits = -1; uint32_t codeNum = 0; uint32_t pos = *index; if (byteStream == NULL || pos == 0 ) { printf("Invalid input\n"); return 0; } /*for (int32_t b = 0; !b; leadingZeroBits++) b = getBitByPos(byteStream, pos++);*/ leadingZeroBits = clz(getByteByPos(byteStream,pos)) + 1; pos += leadingZeroBits; for (int32_t b = leadingZeroBits; b > 0; b--) codeNum = codeNum | (getBitByPos(byteStream, pos++) << (b - 1)); *index = pos; return ((1 << leadingZeroBits) - 1 + codeNum); } ``` ## 參考資料 [Exponential-Golomb coding:wikipedia](https://en.wikipedia.org/wiki/Exponential-Golomb_coding)