Bit Twiddling Hacks 解析 (三)

Bit Twiddling Hacks 解析 (三)

「2 為底的整數 log」跟「count tailing zeroes」某些程度來說是一樣的問題。因為：

v & (-v) 取出最低位的 1
對取出來的結果取 2 為底的整數 log

2 為底的整數 log

原文：Find the log base 2 of an integer with the MSB N set in O(N) operations (the obvious way)

問題：給一個無號整數 v，找出

⌊ \log_{2} v ⌋

。最直覺的方法就是迴圈寫：





unsigned int res = 0;
while (v >>= 1)
{
    res++;
}

方法一：剛好為二的冪次時

原文：Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup

這個思考方式是這樣：如果可以構造出一個 1-1 onto 的函數

f

，使得：

f : {2^{0}, 2^{1} \dots 2^{31}} ⟶ {0, 1, \dots 31}

那就可以建一個大小是 32 的表，先用

f (n)

把 2 的冪次對應到整數中，再去查對應的數值就可以了：

\log (n) = table [f (n)]

剩下的問題就是這個

f

要怎麼造出來。這邊有一個神奇的造法是這樣：

uint32_t f (int n) {
    return (uint32_t)(v * 0x077CB531U) >> 27;
}

static const int table[32] = 
{
  0,  9,  1,  10, 13, 21, 2,  29, 
  11, 14, 16, 18, 22, 25, 3,  30,
  8,  12, 20, 28, 15, 17, 24,  7, 
  19, 27, 23, 6,  26, 5,  4,  31
};

res = table[f(n)];

或是直接：








static const int table[32] = 
{
  0,  9,  1,  10, 13, 21, 2,  29, 
  11, 14, 16, 18, 22, 25, 3,  30,
  8,  12, 20, 28, 15, 17, 24,  7, 
  19, 27, 23, 6,  26, 5,  4,  31
};
res = table[(uint32_t)(n * 0x077CB531U) >> 27];

0x077CB531U 有一個特別的性質：把它做 (0x077CB531U << i) ，其中

i = 0 \dots 31

，並且取出最高的 5 個位元時，剛好結果會 0 ~ 31 恰出現一次。這種性質數字叫做 De Bruijn Series。既然 n 已經是 2 的冪次，所以 (n * 0x077CB531U) 其實就是在做右左移。

方法二(之一)：不是二的冪次

那就把他變成 2 的冪次：

static const int table[32] = 
{
31, 0,  9,  1,  10, 13, 21, 2,  29, 
    11, 14, 16, 18, 22, 25, 3,  30,
    8,  12, 20, 28, 15, 17, 24,  7, 
    19, 27, 23, 6,  26, 5,  4
};

v |= v >> 1; 
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;

v += 1;
r = table[(uint32_t)(v * 0x077CB531U) >> 27];

加入那個 9 ~ 13 行的那堆 OR 之後，就會把這個東西變成：

v = 000. . . \underset{本 來 最 高 位 的 1 ↑}{000000001} 111. . .111

所以再 +1 之後，就會變成：

\begin{aligned} v & = 000. . . \underset{本 來 最 高 位 的 1 ↑}{000000001} 111. . .111 \\ v + 1 & = 000. . . \underset{本 來 最 高 位 的 1 ↑}{000000010} 000. . .000 \end{aligned}

然後問題就回到之前的方法。只是表格要重新調整一下(因為結果會比原先多左移一位)。

方法二(之二)：不是二的冪次

原文：Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup

既然剛剛是「先把東西變成一排 1 再 +1」，那就忍不住想要問：有沒有辦法做直接把一排 1 對應到 0 ~ 31 的整數。也就是構造下面這樣的 1-1 onto mapping：

g : {0 b 1, 0 b 11, 0 b 111, 0 b 1111, \dots, 0 b \underset{32 個 1}{\underset{⏟}{111. . .11}}} \to {0, 1, \dots, 31}

然後就可以不用做 v + 1，直接查 table[g(n)] 就好？答案是可以，換一個 magic number 就可以了：

static const int table[32] = 
{
  0 , 9 , 1 , 10, 13, 21, 2 , 29, 
  11, 14, 16, 18, 22, 25, 3 , 30,
  8 , 12, 20, 28, 15, 17, 24, 7 , 
  19, 27, 23, 6 , 26, 5 , 4 , 31
};

v |= v >> 1; 
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;

r = table[(uint32_t)(v * 0x07C4ACDDU) >> 27];

這個數字為什麼可以做到這件事？我想到的解釋大概如下：對於任意有：

v = 0 xFF . . . FF

形式的數字 v，直接把他跟 B := 0x07C4ACDDU 相乘，實際上是在做：

\begin{array}{r} (B \cdot \underset{會 是 二 的 冪 次}{\underset{⏟}{(v + 1)}} - B) >> 27 \end{array}

這時候要做一件不能隨便做的事：問能不能把上面那個東西展開成這樣：

((v * B) >> 27) \overset{?}{\to} (B * (v + 1) >> 27) - (B >> 27)

然後因為 B >> 27 是 0，所以上面的東西似乎有機會變成：

((v * B) >> 27) \overset{?}{\to} (B * (v + 1) >> 27)

箭頭右邊會回到「剛好是二的冪次」的做法。那問題就是：這樣可不可以？一般來說這種做法並不保證成立。但只要枚舉所有可能的 v ，然後說它都成立就好。所以就暴力枚舉看看：

#include <stdio.h>
#define B 0x07C4ACDDU
int main ()
{
    unsigned v = 1;
    for (int i = 1; i <= 32; i++) {
        unsigned res1 = (v * B) >> 27;
        unsigned res2 = (B * (v + 1) >> 27);
        printf("|%3d|%3d|%3d|%3d|\n", i, res1, res2, (unsigned)(res2 - res1));
        v = (v << 1) | 1;
    }

發現兩者剛好相差一個位置：

|  1|  0|  1|  1|
|  2|  2|  3|  1|
|  3|  6|  7|  1|
|  4| 14| 15|  1|
|  5| 30| 31|  1|
|  6| 29| 30|  1|
|  7| 27| 28|  1|
|  8| 23| 24|  1|
|  9| 16| 17|  1|
| 10|  1|  2|  1|
| 11|  3|  4|  1|
| 12|  8|  9|  1|
| 13| 17| 18|  1|
| 14|  4|  5|  1|
| 15|  9| 10|  1|
| 16| 20| 21|  1|
| 17| 10| 11|  1|
| 18| 21| 22|  1|
| 19| 11| 12|  1|
| 20| 24| 25|  1|
| 21| 18| 19|  1|
| 22|  5|  6|  1|
| 23| 12| 13|  1|
| 24| 26| 27|  1|
| 25| 22| 23|  1|
| 26| 13| 14|  1|
| 27| 28| 29|  1|
| 28| 25| 26|  1|
| 29| 19| 20|  1|
| 30|  7|  8|  1|
| 31| 15| 16|  1|
| 32| 31|  0|-31|

所以會發現：其實可以。只是在建表的時候要重新調整一下位置。

方法三：利用浮點數

概念是「把整數轉浮點數，然後取出 exponent」。不過這個前提是要知道是使用 IEEE 754。另外還有 byte ordering 要考慮，所以實際上會複雜一點：







int r;
union { unsigned int u[2]; double d; } t; // temp

t.u[__FLOAT_WORD_ORDER==LITTLE_ENDIAN] = 0x43300000;
t.u[__FLOAT_WORD_ORDER!=LITTLE_ENDIAN] = v;
t.d -= 4503599627370496.0;
r = (t.u[__FLOAT_WORD_ORDER==LITTLE_ENDIAN] >> 20) - 0x3FF;

Count Tailing Zeroes (ctz)

問題：給定一個整數 v ，問這個數從最低位開始，連續出現了多少 0？

最直覺的做法就是一直位移，直到出現非 0 位元為止：









int c;  // output: c will count v's trailing zero bits,
        // so if v is 1101000 (base 2), then c will be 3
if (v) {
    v = (v ^ (v - 1)) >> 1;  // Set v's trailing 0s to 1s and zero rest
    for (c = 0; v; c++)
        v >>= 1;
} else {
  c = 31;
}

方法一：使用二元搜尋



























unsigned int v;     // 32-bit word input to count zero bits on right
unsigned int c;     // c will be the number of zero bits on the right,
                    // so if v is 1101000 (base 2), then c will be 3
// NOTE: if 0 == v, then c = 31.
if (v & 0x1)  {
    // special case for odd v (assumed to happen half of the time)
    c = 0;
} else {
    c = 1;
    if ((v & 0xffff) == 0) {  
        v >>= 16;  
        c += 16;
    }
    if ((v & 0xff) == 0) {  
        v >>= 8;  
        c += 8;
    }
    if ((v & 0xf) == 0) {  
        v >>= 4;
        c += 4;
    }
    if ((v & 0x3) == 0) {  
        v >>= 2;
        c += 2;
    }
    c -= v & 0x1;
}

這個做法用二元搜尋去找第一個 1 的位置：先看有沒有在最低 16 位元？

沒的話就把前 16 位元砍掉，然侯知道結果至少是 16，接下來的問題就變成「最高位的 16 位元中，第一個 1 出現現在哪」
反之，如果他在前 16 位元，那問題就變成「最低位的 16 位元中，第一個 1 出現在哪？」

以此遞迴下去。

方法一：還是二元搜尋

原文：Count the consecutive zero bits (trailing) on the right in parallel

可以想成上面的改良版，只是適用扣的：









unsigned int v;      // 32-bit word input to count zero bits on right
unsigned int c = 32; // c will be the number of zero bits on the right
v = (v & -signed(v));
if (v) c--;
if (v & 0x0000FFFF) c -= 16;
if (v & 0x00FF00FF) c -= 8;
if (v & 0x0F0F0F0F) c -= 4;
if (v & 0x33333333) c -= 2;
if (v & 0x55555555) c -= 1;

或是用另外一個角度思考：結果一定是 0 ~ 32 其中一個數。但是是不是 32 可以由 v 是否為 0 判斷。如果不是 0，那麼 c 不可能是 32，所以就扣 1。

排除了 n = 32 的狀況後，接著看結果是 0 ~ 31 的狀況。假定結果是 n，那麼因為

0 \leq n \leq 31

，所以 n 會是個 5 位元的數字，也就是有以下的形式：

n = b_{4} \cdot 16 + b_{3} \cdot 8 + b_{2} \cdot 4 + b_{1} \cdot 2 + b_{0}

其中

b_{i}

是 0 或 1。第 5 ~ 9 就是在問

b_{4} \dots b_{0}

是 0 還是 1

方法三：使用浮點數

原文：Count the consecutive zero bits (trailing) on the right by casting to a float

關鍵是 IEEE-754 中的 exponent 可以用 0x7f800000 去找出來。把無號數 v 最低位的位元取出來(v & -v) 並轉型成浮點數之後，把對應的 exponent 找出來並且反推回數值：




unsigned int v;            // find the number of trailing zeros in v
int r;                     // the result goes here
float f = (float)(v & -v); // cast the least significant bit in v to a float
r = (*(uint32_t *)&f >> 23) - 0x7f;

Interleave bits

問題：讓兩個數字的位元交錯。比如 x 的二進位形式是 ABCDEFGH，而 y 的二進位形式是 abcdefgh，那麼求出的結果 r 的二進位形式為 AaBbCcDdEeFfGgHh。

大致上的思路是這樣：

解法一：暴力查表

一樣是窮舉一個 byte 中所有可能的數值。假定某個 byte b 的二進位形式是 b = ABCDFEGH，那麼 table[b] 將會查到一個二進位形式是 0A0B0C0D0E0F0G0H 的數值 (也就是在所有位元間加入 0 把他們隔開的結果)。

比如說：221 的二進位是 11011101，那麼 table[221] 就會查到「每個位元左邊都插入 1 個 0」的結果，也就是 01 01 00 01 01 01 00 01。窮舉出這個表：


































static const unsigned short MortonTable256[256] = {
  0x0000, 0x0001, 0x0004, 0x0005, 0x0010, 0x0011, 0x0014, 0x0015, 
  0x0040, 0x0041, 0x0044, 0x0045, 0x0050, 0x0051, 0x0054, 0x0055, 
  0x0100, 0x0101, 0x0104, 0x0105, 0x0110, 0x0111, 0x0114, 0x0115, 
  0x0140, 0x0141, 0x0144, 0x0145, 0x0150, 0x0151, 0x0154, 0x0155, 
  0x0400, 0x0401, 0x0404, 0x0405, 0x0410, 0x0411, 0x0414, 0x0415, 
  0x0440, 0x0441, 0x0444, 0x0445, 0x0450, 0x0451, 0x0454, 0x0455, 
  0x0500, 0x0501, 0x0504, 0x0505, 0x0510, 0x0511, 0x0514, 0x0515, 
  0x0540, 0x0541, 0x0544, 0x0545, 0x0550, 0x0551, 0x0554, 0x0555, 
  0x1000, 0x1001, 0x1004, 0x1005, 0x1010, 0x1011, 0x1014, 0x1015, 
  0x1040, 0x1041, 0x1044, 0x1045, 0x1050, 0x1051, 0x1054, 0x1055, 
  0x1100, 0x1101, 0x1104, 0x1105, 0x1110, 0x1111, 0x1114, 0x1115, 
  0x1140, 0x1141, 0x1144, 0x1145, 0x1150, 0x1151, 0x1154, 0x1155, 
  0x1400, 0x1401, 0x1404, 0x1405, 0x1410, 0x1411, 0x1414, 0x1415, 
  0x1440, 0x1441, 0x1444, 0x1445, 0x1450, 0x1451, 0x1454, 0x1455, 
  0x1500, 0x1501, 0x1504, 0x1505, 0x1510, 0x1511, 0x1514, 0x1515, 
  0x1540, 0x1541, 0x1544, 0x1545, 0x1550, 0x1551, 0x1554, 0x1555, 
  0x4000, 0x4001, 0x4004, 0x4005, 0x4010, 0x4011, 0x4014, 0x4015, 
  0x4040, 0x4041, 0x4044, 0x4045, 0x4050, 0x4051, 0x4054, 0x4055, 
  0x4100, 0x4101, 0x4104, 0x4105, 0x4110, 0x4111, 0x4114, 0x4115, 
  0x4140, 0x4141, 0x4144, 0x4145, 0x4150, 0x4151, 0x4154, 0x4155, 
  0x4400, 0x4401, 0x4404, 0x4405, 0x4410, 0x4411, 0x4414, 0x4415, 
  0x4440, 0x4441, 0x4444, 0x4445, 0x4450, 0x4451, 0x4454, 0x4455, 
  0x4500, 0x4501, 0x4504, 0x4505, 0x4510, 0x4511, 0x4514, 0x4515, 
  0x4540, 0x4541, 0x4544, 0x4545, 0x4550, 0x4551, 0x4554, 0x4555, 
  0x5000, 0x5001, 0x5004, 0x5005, 0x5010, 0x5011, 0x5014, 0x5015, 
  0x5040, 0x5041, 0x5044, 0x5045, 0x5050, 0x5051, 0x5054, 0x5055, 
  0x5100, 0x5101, 0x5104, 0x5105, 0x5110, 0x5111, 0x5114, 0x5115, 
  0x5140, 0x5141, 0x5144, 0x5145, 0x5150, 0x5151, 0x5154, 0x5155, 
  0x5400, 0x5401, 0x5404, 0x5405, 0x5410, 0x5411, 0x5414, 0x5415, 
  0x5440, 0x5441, 0x5444, 0x5445, 0x5450, 0x5451, 0x5454, 0x5455, 
  0x5500, 0x5501, 0x5504, 0x5505, 0x5510, 0x5511, 0x5514, 0x5515, 
  0x5540, 0x5541, 0x5544, 0x5545, 0x5550, 0x5551, 0x5554, 0x5555
};

在這樣的狀況下，對於一個 byte 來說，只要做：


table[b1] | (table[b2] << 1);

就可以得出位元交叉混合的結果了。有了一個 byte 的做法後，可以把它推廣到交錯地合併 2 個 short ，組合出一個 int 的做法：







unsigned short x; // Interleave bits of x and y, so that all of the
unsigned short y; // bits of x are in the even positions and y in the odd;
unsigned int z;   // z gets the resulting 32-bit Morton Number.
z = table[y >> 8]   << 17 |
    table[x >> 8]   << 16 |
    table[y & 0xFF] <<  1 |
    table[x & 0xFF];

解法二：分治法

這個做法仿照「逆轉所有位元」的方法，用下面這組 mask：

B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF, ...}

主要的思路是先把位元打散，以 8 位元來說，假定位元表示是 ABCDEFGH ，而「打散」的結果就是 0A0B0C0D0F0G0H，而「打散」的做法是使用分治法。以 8 位元為例，就是不斷地交換不同間隔的位元：

      00000000ABCDEFGH
    0000ABCD    0000EFGH
  00AB  00CD    00EF  00GH
 0A 0B  0C 0D  0E 0F  0G 0H

把同樣的想法推廣到「交叉合併 2 個 16 位元整數」的話，就會得到文章中的 code：















static const unsigned int B[] = {0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF}
static const unsigned int S[] = {1, 2, 4, 8};
unsigned int x; // Interleave lower 16 bits of x and y, so the bits of x
unsigned int y; // are in the even positions and bits from y in the odd;
unsigned int z; // z gets the resulting 32-bit Morton Number.
                // x and y must initially be less than 65536.
x = (x | (x << S[3])) & B[3];
x = (x | (x << S[2])) & B[2];
x = (x | (x << S[1])) & B[1];
x = (x | (x << S[0])) & B[0];
y = (y | (y << S[3])) & B[3];
y = (y | (y << S[2])) & B[2];
y = (y | (y << S[1])) & B[1];
y = (y | (y << S[0])) & B[0];
z = x | (y << 1);

跟 8 位元的想法幾乎一樣，只是把位元寬度變長，而跟著調整那組 mask 跟位移的長度。

跟前面位元反轉用的分治方式是一樣的。這邊舉合併兩個 byte 為例。假設 b1 的二進位表示法是 ABCDEFGH，而 b2 的二進位表示法是 abcdefgh，那每次

    ABCDEFGHabcdefgh
   ABCDabcd  EFGHefgh
 ABab  CDcd  EFef  GHgh
Aa Bb Cc Dd  Ee Ff Gg Hh