研讀 Bit Twiddling Hacks

--- tags: c, bitwise operation --- # 研讀 Bit Twiddling Hacks contributed by < `hsuedw` > ## [Compute the sign of an integer](https://graphics.stanford.edu/~seander/bithacks.html#CopyIntegerSign) 有兩個型別為 `int` 的變數 `v` 與 `sign` 。想要偵測 `v` 是否小於 `0` 。 ### 若 `v` 小於 `0` ，則 `sign` 被指派為 `-1` 。否則， `sign` 被指派為 `0` 。 * 第一種實作方法 :::spoiler {state=open} detect_sign_1.c ```c= int main() { int v = -7, sign; sign = -(v < 0); v = 9; sign = -(v < 0); return 0; } ``` ::: 若 `v` 的值小於 `0` ，則運算式 `v < 0` 的值為 `1` 。結果就是 `-1` 被指派給 `sign` 。若 `v` 的值大於 `0` ，則運算式 `v < 0` 的值為 `0` 。結果就是 `0` 被指派給 `sign` 。接下來，觀察 `detect_sign_1.c` 的 x86 組合語言 ```assembly= 0000000000001129 <main>: 1129: f3 0f 1e fa endbr64 112d: 55 push %rbp 112e: 48 89 e5 mov %rsp,%rbp 1131: c7 45 f8 f9 ff ff ff movl $0xfffffff9,-0x8(%rbp) 1138: 8b 45 f8 mov -0x8(%rbp),%eax 113b: c1 e8 1f shr $0x1f,%eax 113e: 0f b6 c0 movzbl %al,%eax 1141: f7 d8 neg %eax 1143: 89 45 fc mov %eax,-0x4(%rbp) 1146: c7 45 f8 09 00 00 00 movl $0x9,-0x8(%rbp) 114d: 8b 45 f8 mov -0x8(%rbp),%eax 1150: c1 e8 1f shr $0x1f,%eax 1153: 0f b6 c0 movzbl %al,%eax 1156: f7 d8 neg %eax 1158: 89 45 fc mov %eax,-0x4(%rbp) 115b: b8 00 00 00 00 mov $0x0,%eax 1160: 5d pop %rbp 1161: c3 retq 1162: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) 1169: 00 00 00 116c: 0f 1f 40 00 nopl 0x0(%rax) ``` 組合語言的第 5 行到第 10 行對應到 C 語言的第 3 行到第 4 行。從這段組合語言得到了啟發，我們可以把程式碼以下列想法改寫 1. 假設每個 byte 有 8 個 bit，且一個 `int` 型別的 object 在記憶體中佔用 4 個 byte。 2. 由於現今電腦對有號整數大多採用二補數 (two's complement) 編碼。所以，若變數 `v` 內的數值小於 `0` ，則其二補數編碼的 most significant bit 必為 `1` (如組合語言第 5 行的 `$0xfffffff9`)。反之，為 `0` (如組合語言第 11 行的 `$0x9`)。 3. 對變數 `v` 內的數值做 31 個 bit 的邏輯右移。若變數 `v` 內的數值小於 `0` ，則做完邏輯右移後，變數 `v` 內的數值變為 `1` 。否則，為 `0` 。 4. ==在 C 語言規範中，對小於 0 的有號整數做右移 (bitwise right shift) 是 implementation-defined。== > C99 Standard, 6.5.7 Bitwise shift operators > The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2E2. ==If E1 has a signed type and a negative value, the resulting value is implementation-defined.== 對小於 0 的有號數而言， `>>` 運算子是邏輯右移，還是算術右移？==因為使用 GCC 編譯程式碼。所以，對有號整數而言， `>>` 運算子是算術右移==。 > [Re: right shift of signed type](https://gcc.gnu.org/legacy-ml/gcc/2000-04/msg00152.html) > An arithmetic left shift is equivalent to a logical left shift > (as far as GCC is concerned). > > For right shifts, if the type is signed, then we use an arithmetic shift; > if the type is unsigned then we use a logical. 我不知道原作者是否觀察了組合語言後，得到了第二種實作方法的靈感。 * 第二種實作方法 :::spoiler {state=open} detect_sign_2.c ```c #define CHAR_BIT (8) int main() { int v = -7; int sign = -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1)); v = 9; sign = -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1)); return 0; } ``` ::: 事實上，我們可以不需要大費周章地把變數 `v` 轉為 `unsigned int` ，做完邏輯右移後，再轉回 `int` ，然後再加上負號。我們可以利用下面幾個技巧簡化程式碼。 1. ==用 GCC 編譯的程式碼中，對有號整數而言 `>>` 運算子的行為是算術右移==。也就是在右移的時候，把 most significant bit 填入左邊空出來的 bit。 2. 在二補數編碼中， most significant bit 為 sign bit 。對正整數而言， sign bit 為 `0` 。對負整數而言， sign bit 為 `1` 。所以，==在使用 GCC 的前提下==，可以把 `detect_sign_2.c` 的程式碼簡化如下。 :::spoiler {state=open} detect_sign_3.c ```c #define CHAR_BIT (8) int main() { int v = -7; int sign = v >> (sizeof(int) * CHAR_BIT - 1); v = 9; sign = v >> (sizeof(int) * CHAR_BIT - 1); return 0; } ``` ::: 也就是直接對變數 `v` 內的數值 (`int` 型別的數值為有號整數) 做算術右移 31 個 bit 。若變數 `v` 內的數值小於 0 ，算術右移 31 個 bit 後，變數 `v` 內的數值為 `0xffffffff` ，這就是 `-1` 的二補數二進位編碼。若變數 `v` 內的數值大於或等於 0 ，算術右移 31 個 bit 後，變數 `v` 內的數值為 `0x00000000` ，這就是 `0` 的二補數二進位編碼。 > 延伸閱讀: > * Computer Systems: A Programmer’s Perspective, 3rd ed, Ch 2 Representing and Manipulating Information > * [解讀計算機編碼](https://hackmd.io/@sysprog/binary-representation) 要注意的是， ==`detect_sign_3.c` 不具 portability== 。因為對有號整數做 bitwise right shift 的行為取決於 compiler 的實做，不是 C 語言的規範。 ### 若 `v` 小於 `0` ，則 `sign` 被指派為 `-1` 。否則， `sign` 被指派為 `1` 。 :::spoiler {state=open} detect_sign_4.c ```c #define CHAR_BIT (8) int main() { int v = -7; int sign = +1 | (v >> (sizeof(int) * CHAR_BIT - 1)); v = 9; sign = +1 | (v >> (sizeof(int) * CHAR_BIT - 1)); return 0; } ``` ::: 延續 `detect_sign_3.c` 的實作，對 `(v >> (sizeof(int) * CHAR_BIT - 1))` 做 bitwise or ，可以實現當 `v` 小於 `0` ，則 `sign` 被指派為 `-1` 。否則， `sign` 被指派為 `1`。如先前所述，這種寫法不具 portability。 ### 若 `v` 小於 `0` ，則 `sign` 被指派為 `-1` 。若 `v` 等於 `0` ，`sign` 被指派為 `0` 。若 `v` 大於 `0` ，`sign` 被指派為 `1` 。 :::spoiler {state=open} detect_sign_5.c ```c #define CHAR_BIT (8) int main() { int v = -7; int sign = (v != 0) | -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1)); v = 9; sign = (v != 0) | -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1)); v = 0; sign = (v != 0) | -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1)); return 0; } ``` ::: :::spoiler {state=open} detect_sign_6.c ```c #define CHAR_BIT (8) int main() { int v = -7; int sign = (v != 0) | (v >> (sizeof(int) * CHAR_BIT - 1)); v = 9; sign = (v != 0) | (v >> (sizeof(int) * CHAR_BIT - 1)); v = 0; sign = (v != 0) | (v >> (sizeof(int) * CHAR_BIT - 1)); return 0; } ``` ::: `detect_sign_5.c` 雖然比較繁瑣，但具 protibility 。 `detect_sign_6.c` 則相反。原作者還提供第三種實作方式。 :::spoiler {state=open} detect_sign_7.c ```c int main() { int v = -7; int sign = (v > 0) - (v < 0); v = 9; sign = (v > 0) - (v < 0); v = 0; sign = (v > 0) - (v < 0); return 0; } ``` ::: ### 若 `v` 為非 `0` ，則 `sign` 被指派為 `1` 。否則， `sign` 被指派為 `0` 。 :::spoiler {state=open} detect_sign_8.c ```c #define CHAR_BIT (8) int main() { int v = -7; int sign = 1 ^ ((unsigned int)v >> (sizeof(int) * CHAR_BIT - 1)); v = 9; sign = 1 ^ ((unsigned int)v >> (sizeof(int) * CHAR_BIT - 1)); v = 0; sign = 1 ^ ((unsigned int)v >> (sizeof(int) * CHAR_BIT - 1)); return 0; } ``` ::: 由先前的討論可得知， * 若變數 `v` 的數值小於 `0` ，則 `((unsigned int)v >> (sizeof(int) * CHAR_BIT - 1))` 運算式的計算結果為 `0x00000001` 。再與 `1` 做 `^` (bitwise xor) ，最後結果為 `0x00000000` 。 * 若變數 `v` 的數值為非 `0` ，則 `((unsigned int)v >> (sizeof(int) * CHAR_BIT - 1))` 運算式的計算結果為 `0x00000000` 。再與 `1` 做 `^` (bitwise xor) ，最後結果為 `0x00000001` 。 ## [Detect if two integers have opposite signs](https://graphics.stanford.edu/~seander/bithacks.html#DetectOppositeSigns) ```c #include <stdbool.h> int main() { int x = 0, y = 0; bool f = ((x ^ y) < 0); x = 1, y = 2; f = ((x ^ y) < 0); x = -1, y = -2; f = ((x ^ y) < 0); x = 1, y = -2; f = ((x ^ y) < 0); return 0; } ``` 由於現今電腦對有號整數大多採用二補數 (two's complement) 編碼。所以，在有號整數的二進位表示法中，其 most significant bit 為 sign bit。這裡借用 Computer Systems: A Programmer's Perspective 第三版，第二章中對二補數編碼的定義來說明 > Computer Systems: A Programmer's Perspective, 2.2.3 Two's-Complement Encodings > PRINCIPLE: Definition of two's-complement encoding > For vector $\vec{x} = [x_{w-1}, x_{w-2}, \cdots, x_{0}]$ > $$ > B2T_{w}(\vec{x}) \dot{=} -x_{w-1}2^{w-1} + \sum\limits_{i=0}^{w-2} x_{i}2^{i} \tag{2.3} > $$ 由公式 (2.3) 可以得知對小於 0 的有號整數而言，其二進位表示法中的 most signifcant bit 必為 `1`。否則，為 `0` 。 * 若兩有號整數 (`x` 與 `y`)，一為負整數，另一為非負整數，則此兩有號整數的 most significant bit 必分別為 `1` 與 `0` 。 `x ^ y` 運算結果的 most siginificant bit 必為 `1` 。所以， `x ^ y` 運算結果必為負整數。 * 若兩有號整數 (`x` 與 `y`)，同時大於等於 0 或同時小於 0 ，則此兩有號整數的 most significant bit 必同時分別為 `0` 或 `1` 。 `x ^ y` 運算結果的 most siginificant bit 必為 `0` 。所以， `x ^ y` 運算結果必為非負整數。 ## [Compute the minimum (min) or maximum (max) of two integers without branching](https://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax) ### 找最小值 #### 第一種方法 :::spoiler {state=open} min1.c ```c= int my_min(int x, int y) { return y ^ ((x ^ y) & -(x < y)); } int main() { int r = my_min(3, 5); r = my_min(9, 2); r = my_min(8, 8); return 0; } ``` ::: 整支程式的重點當然就是第 3 行中 `y ^ ((x ^ y) & -(x < y))` 這個運算式。把這個運算式拆解成幾個部分來看。 * `-(x < y)` 這個運算式扮演條件判斷的的角色。 * 如果 `x < y` 成立，則整個 `-(x < y)` 的結果是 `-1` 。由於電腦採用二補數表示有號整數，所以 `-1` 的二進位表示法為 `0xffffffff` (假設 `int` 型別 object 佔用 4 個 byte)。所以可以對整個運算式 `y ^ ((x ^ y) & -(x < y))` 做出下列改寫: `y ^ ((x ^ y) & -(x < y))` $\Rightarrow$ `y ^ ((x ^ y) & 0xffffffff)` $\Rightarrow$ `y ^ (x ^ y)` $\Rightarrow$ `x` * 如果 `x < y` 不成立 (也就是 `x` 大於 `y` )，則整個 `-(x < y)` 的結果是 `0` 。 `0` 的二進位表示法為 `0` 所以可以對整個運算式 `y ^ ((x ^ y) & -(x < y))` 做出下列改寫: `y ^ ((x ^ y) & -(x < y))` $\Rightarrow$ `y ^ ((x ^ y) & 0)` $\Rightarrow$ `y ^ 0` $\Rightarrow$ `y` #### 第二種方法 :::spoiler {state=open} min2.c ```c= #include <limits.h> int my_min(int x, int y) { return y + ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); } int main() { int r = my_min(3, 5); r = my_min(9, 2); r = my_min(8, 8); r = my_min(INT_MIN, INT_MIN); r = my_min(INT_MAX, INT_MAX); r = my_min(INT_MAX, INT_MIN); r = my_min(INT_MIN, INT_MAX); return 0; } ``` ::: 假設 `int` 型別的 object 在記憶體中佔用 4 個 byte ，每個 byte 有 8 個 bit。所以 `(x - y) >> (sizeof(int) * CHAR_BIT - 1)` 運算式會對 `(x - y)` 做 31 個 bit 的 bitwise right shift 31。在==使用 GCC 編譯程式碼==的前提下。運算子 `>>` 對負整數而言是算術右移，對非負整數而言是邏輯右移。 1. 如果 `x - y` 為負整數 (也就是 `x` 小於 `y` ) ，則運算式 `(x - y) >> (sizeof(int) * CHAR_BIT - 1)` 的結果就是 `0xffffffff` 。那麼，我們可以對整個 `y + ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1)))` 運算式做下列改寫: `y + ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1)))` $\Rightarrow$ `y + ((x - y) & ((x - y) >> 31))` $\Rightarrow$ `y + ((x - y) & 0xffffffff)` $\Rightarrow$ `y + (x - y)` $\Rightarrow$ `x` 2. 如果 `x - y` 為正整數 (也就是 `x` 大於 `y` ) ，則運算式 `(x - y) >> (sizeof(int) * CHAR_BIT - 1)` 的結果就是 `0` 。那麼，我們可以對整個 `y + ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1)))` 運算式做下列改寫: $\Rightarrow$ `y + ((x - y) & ((x - y) >> 31))` $\Rightarrow$ `y + ((x - y) & 0)` $\Rightarrow$ `y + 0` $\Rightarrow$ `y` 然而第二種方法在第 15 行與第 16 行無法計算出正確答案。這兩行計算出來的答案都是 `INT_MAX` 而不是我們所期待的 `INT_MIN` 。這是 overflow 所造成的問題。所以==第二種方法的使用前提是 `x - y` 不能發生 overflow== 。 ### 找最大值 #### 第一種方法 ```c= int my_max(int x, int y) { return x ^ ((x ^ y) & -(x < y)); } int main() { int r = my_max(3, 5); r = my_max(9, 2); r = my_max(8, 8); return 0; } ``` * 運作原理 `min1.c` 相似。 * 其實，第 3 行中的運算式也可以寫成這樣 `y ^ ((x ^ y) & -(y < x))` 。 #### 第二種方法 ```c= #include <limits.h> int my_max(int x, int y) { return x - ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); } int main() { int r = my_max(3, 5); r = my_max(9, 2); r = my_max(8, 8); r = my_max(INT_MIN, INT_MIN); r = my_max(INT_MAX, INT_MAX); r = my_max(INT_MAX, INT_MIN); r = my_max(INT_MIN, INT_MAX); return 0; } ``` * 運作原理 `min2.c` 相似。 * 然而第二種方法在第 15 行與第 16 行無法計算出正確答案。這兩行計算出來的答案都是 `INT_MIN` 而不是我們所期待的 `INT_MAX` 。這是 overflow 所造成的問題。所以==第二種方法的使用前提是 `x - y` 不能發生 overflow== 。 ## [Determining if an integer is a power of 2](https://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2) ```c #include <stdbool.h> bool is_power_of_2(unsigned int v) { return v && !(v & (v - 1)); } int main() { bool f = is_power_of_2(100); f = is_power_of_2(1024); f = is_power_of_2(0); return 0; } ``` * 若變數 `v` 的值是 2 的冪 (power of 2)，則它的二進位編碼中必定只有一個 bit 為 `1` 。假設變數 `v` 的數值為 `1024` ，它的二進位編碼為 10000000000$_{2}$ ，則運算式 `v - 1` 的結果為 01111111111$_{2}$ 。所以，運算式 `v & (v - 1)` 的結果就是 10000000000$_{2}$ & 01111111111$_{2}$ = 0 。 * 所以，==只要變數 `v` 的值是 2 的冪，運算式 `v & (v - 1)` 的結果就是 0== 。 * 有一個例外狀況需要處理。當變數 `v` 的值為 `0` 時， `v & (v - 1)` 的結果是 0 。但是， 0 並不是 2 的冪。所以把運算式改寫為 `v && !(v & (v - 1))` 。這樣就可以擋掉這個例外狀況。 ## Sign extending ### [Sign extending from a constant bit-width](https://graphics.stanford.edu/~seander/bithacks.html#FixedSignExtend) 假設有一個以二補數表示的有號數 `x` ，其二進位表示法由 b 個 bit 表示。然後，我們想要把 `x` 轉換成由 `int` 型別 (一般為 32 個 bit)的 object 表示。則左邊多出來的 bit 皆填 sign bit 。 ```c #define BIT_WIDTH (5) int main() { int x; // convert this from using BIT_WIDTH bits to a full int int r; // resulting sign extended number goes here struct { signed int x:BIT_WIDTH; } s; x = -3; r = s.x = x; x = 5; r = s.x = x; return 0; } ``` ``` (gdb) l 1 2 #define BIT_WIDTH (5) 3 int main() 4 { 5 int x; // convert this from using BIT_WIDTH bits to a full int 6 int r; // resulting sign extended number goes here 7 struct { 8 signed int x:BIT_WIDTH; 9 } s; 10 (gdb) 11 x = -3; 12 r = s.x = x; 13 14 x = 5; 15 r = s.x = x; 16 17 return 0; 18 } (gdb) b 11 Breakpoint 1 at 0x1131: file test.c, line 11. (gdb) r Starting program: /home/edward/workspace/linux-2022/gdb/test Breakpoint 1, main () at test.c:11 11 x = -3; (gdb) n 12 r = s.x = x; (gdb) x/4bx &r 0x7fffffffe2dc: 0x00 0x00 0x00 0x00 (gdb) n 14 x = 5; (gdb) x/4bx &r 0x7fffffffe2dc: 0xfd 0xff 0xff 0xff (gdb) n 15 r = s.x = x; (gdb) n 17 return 0; (gdb) x/4bx &r 0x7fffffffe2dc: 0x05 0x00 0x00 0x00 (gdb) ``` ### [Sign extending from a variable bit-width](https://graphics.stanford.edu/~seander/bithacks.html#VariableSignExtendj) 有時候我們想要把以 b 個 bit 表示的有號整數用更多 bit 表示，但我們又事先不知道 b 的確切數字。也就是說， `b` 是個變數。 ```c= int main() { unsigned b = 5; // number of bits representing the number in x int x = -3; // sign extend this b-bit number to r int r; // resulting sign-extended number int const m = 1U << (b - 1); // mask can be pre-computed if b is fixed x = x & ((1U << b) - 1); // (Skip this if bits in x above position b are already zero.) r = (x ^ m) - m; x = 3; x = x & ((1U << b) - 1); r = (x ^ m) - m; return 0; } ``` 以上面這個程式為例。變數 `b` (無號正整數)的數值為 `5` 。也就是用 5 個 bit 表示有號數。第 6 行的變數 `m` 的值為 `0x00000010` 。第 8 行。運算式 `(1U << b)` 的結果為 `0x00000020` 。接著，運算式 `(1U << b) - 1` 的結果為 `0x0000001f` 。所以，整個運算式 `x & ((1U << b) - 1)` 的結果為 `0x0000001d` 。並把這個值指派給變數 `x` 。 * 也就是說，運算式 `(1U << b) - 1` 造出了一個 mask 。這個 mask 保留了變數 `x` 的 bit 0 ~ 4 ，其餘的 bit 全都清為 0 。第 9 行。運算式 `x ^ m` 的結果為 `0x0000000d` 。所以，整個運算式 `(x ^ m) - m` 的結果為負整數 `0xfffffffd` ，並把這個值指派給變數 `r`。 * 也就是說，運算式 `x ^ m` 檢測變數 `x` 中的 sign bit (就是本範例中的 bit 4 )。 * 在這個範例中，以 5 個 bit 表示有號整數。所以有號整數的範圍為 -16 ~ 15 。 * 如果這 5 個 bit 表示一個負整數，則 sign bit (即 bit 4 ) 為 `1` 。那麼，運算式 `x ^ m` 相當於加 16 。則運算式 `(x ^ m) - m` 則是再減掉 `-16` ，就會回到原本的負整數。 * 如果這 5 個 bit 表示一個非負整數，則 sign bit (即 bit 4 ) 為 `0` 。那麼，運算式 `x ^ m` 相當於減 16 。則運算式 `(x ^ m) - m` 則是再減掉 `-16` ，就會回到原本的非負整數。 ### [Sign extending from a variable bit-width in 3 operations](https://graphics.stanford.edu/~seander/bithacks.html#VariableSignExtendRisky) ```c= #include <stdio.h> #include <limits.h> int main() { unsigned b = 2; // number of bits representing the number in x int x = 2; // sign extend this b-bit number to r int r; // resulting sign-extended number #define M(B) (1U << ((sizeof(x) * CHAR_BIT) - B)) // CHAR_BIT=bits/byte static int const multipliers[] = { 0, M(1), M(2), M(3), M(4), M(5), M(6), M(7), M(8), M(9), M(10), M(11), M(12), M(13), M(14), M(15), M(16), M(17), M(18), M(19), M(20), M(21), M(22), M(23), M(24), M(25), M(26), M(27), M(28), M(29), M(30), M(31), M(32) }; // (add more if using more than 64 bits) static int const divisors[] = { 1, ~M(1), M(2), M(3), M(4), M(5), M(6), M(7), M(8), M(9), M(10), M(11), M(12), M(13), M(14), M(15), M(16), M(17), M(18), M(19), M(20), M(21), M(22), M(23), M(24), M(25), M(26), M(27), M(28), M(29), M(30), M(31), M(32) }; // (add more for 64 bits) #undef M r = (x * multipliers[b]) / divisors[b]; printf("%d\n", r); return 0; } ``` 假設 1 個 byte 有 8 個 bit 。 `int` 與 `unsigned int` 都是佔用 4 個 byte 的記憶體空間。而我們要表示的有號整數佔用 b 個 bit 。這個作法的核心想法就是 `(x << B) >> B` 。其中， `B = 32 - b` 。在沒有發生 overflow 的前提下，運算式 `(x << B) >> B` 的結果與 `x` 所表示的數值相同。 #### 什麼時候會發生 overflow ? 如程式中的第 6~7 行所示，當變數 `b` 與 `x` 的數值皆為 `2` 時， $2 << 30 = 2 \times 2^{30} = 2^{31} > 2^{31} - 1$ 。會超過 `int` 型別所能表示的最大值 ($2^{31} - 1$)，所以會發生 overflow。把上面的程式編譯後，並執行。可得下列結果。 ``` $ gcc test.c -fsanitize=undefined -o test $ ./test test.c:27:12: runtime error: signed integer overflow: 2 * 1073741824 cannot be represented in type 'int' -2 ``` 原本預期程式會輸出 `2` 。但是因為 overflow ，得到的結果卻是 `-2` 。由這個範例中的討論可以知道，只要 $log_{2}x + (32 - b) \ge 31$ 就會發生 overflow 。在程式碼中，第 20 行的 `~M(1)` 跟 `multipliers` 與 `divisors` 這兩個陣列的其他元素看起來不大一樣。如果只用 1 個 bit 表示有號整數，那麼可以表示的數值範圍為 -1 ~ 0 。也就是當變數 `b` 與 `x` 皆為 `1` 時， $log_{2}x + (32 - b) = 31 \ge 31$ 。會發生 overflow 。無法套用 `(x << B) >> B` 這個想法。 ```c #include <stdio.h> #include <limits.h> int main() { unsigned b = 1; // number of bits representing the number in x int x = 1; // sign extend this b-bit number to r int r; // resulting sign-extended number #define M(B) (1U << ((sizeof(x) * CHAR_BIT) - B)) // CHAR_BIT=bits/byte int multiplier = M(b), divisor = M(b); r = (x * multiplier) / divisor; printf("%d\n", r); return 0; } ``` 編譯並執行後，可得下列結果。原本預期程式會輸出 `-1` 。但是因為 overflow ，得到的結果卻是 `1` 。 ``` $ gcc test.c -fsanitize=undefined -o test $ ./test 1 ``` 將 `divisor = M(b)` 改為 `divisor = ~M(b)` 。因為 `1 * M(1)` $\Rightarrow$ `1 << 31` $\Rightarrow$ `-2147483648` ，又 `~M(1)` $\Rightarrow$ `2147483647` 。所以 `-2147483648 / 2147483647` $\Rightarrow$ `-1` 。 ```c #include <stdio.h> #include <limits.h> int main() { unsigned b = 1; // number of bits representing the number in x int x = 1; // sign extend this b-bit number to r int r; // resulting sign-extended number #define M(B) (1U << ((sizeof(x) * CHAR_BIT) - B)) // CHAR_BIT=bits/byte int multiplier = M(b), divisor = ~M(b); r = (x * multiplier) / divisor; printf("%d\n", r); return 0; } ``` 編譯並執行後，就可以得到預期的結果。 ``` $ gcc test.c -fsanitize=undefined -o test $ ./test -1 ``` ## [Conditionally set or clear bits without branching](https://graphics.stanford.edu/~seander/bithacks.html#ConditionalSetOrClearBitsWithoutBranching) 不用 C 語言的 `if ... else ...` 語句 (即 without branch 或 branchless ) 做到下面這支程式的效果。也就是當變數 `f` 為 `1` 時，會按照變數 `m` 的 set bits 設定變數 `w` 中相對應的位元。否則 (即變數 `f` 為 `0` ) ，會按照變數 `m` 的 set bits 清除變數 `w` 中相對應的位元。變數 `f` 的值只會是 `0` 或 `1` 。 ```c #include <stdbool.h> int main() { bool f; // conditional flag unsigned int m; // the bit mask unsigned int w; // the word to modify if (f) w |= m; else w &= ~m; return 0; } ``` ### 方法一: ```c #include <stdbool.h> int main() { bool f = 1; // conditional flag unsigned int m = 0xabcdef12; // the bit mask unsigned int w = 0x12345678; // the word to modify w ^= (-f ^ w) & m; // w = w ^ ((-f ^ w) & m); return 0; } ``` 藉由觀察組合語言可以推得運算式 `w ^= (-f ^ w) & m` 等價於運算式 `w = w ^ ((-f ^ w) & m)` 。 ``` $ gcc test.c -O0 -g -o test $ objdump -d test | less 0000000000001129 <main>: 1129: f3 0f 1e fa endbr64 112d: 55 push %rbp 112e: 48 89 e5 mov %rsp,%rbp 1131: c7 45 f4 01 00 00 00 movl $0x1,-0xc(%rbp) 1138: c7 45 f8 12 ef cd ab movl $0xabcdef12,-0x8(%rbp) 113f: c7 45 fc 78 56 34 12 movl $0x12345678,-0x4(%rbp) 1146: 8b 45 f4 mov -0xc(%rbp),%eax 1149: f7 d8 neg %eax 114b: 33 45 fc xor -0x4(%rbp),%eax 114e: 23 45 f8 and -0x8(%rbp),%eax 1151: 31 45 fc xor %eax,-0x4(%rbp) 1154: b8 00 00 00 00 mov $0x0,%eax 1159: 5d pop %rbp 115a: c3 retq 115b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) ``` * 假設 1 個 byte 有 8 個 bit 。 1 個 `unsigned int` 佔用 4 個 byte 。 * $A \oplus B = (A \cdot \lnot B) + (\lnot A \cdot B)$ #### 變數 `f` 的數值為 `1` `w ^ ((-f ^ w) & m)` $\Rightarrow$ `w ^ ((-1 ^ w) & m)` $\Rightarrow$ `w ^ ((0xffffffff ^ w) & m)` $\Rightarrow$ `w ^ (w & m)` $\Rightarrow$ `(w & ~(w & m)) | (~w & (w & m))` $\Rightarrow$ `(w & (~w | ~m)) | (~w & (w & m))` $\Rightarrow$ `((w & ~w) | ( w & ~m)) | ((~w & w) & m)` $\Rightarrow$ `0 | ( w & ~m) | 0 & m` $\Rightarrow$ `( w & ~m) | 0` $\Rightarrow$ `w & ~m` | w | m | ~m | w & m | w ^ (w & m) | w & ~m | | --- | --- | ---- | ----- | ----------- | ------ | | 0 | 0 | 1 | 0 | 0 | 0 | | 0 | 1 | 0 | 0 | 0 | 0 | | 1 | 0 | 1 | 0 | 1 | 1 | | 1 | 1 | 0 | 1 | 0 | 0 | #### 變數 `f` 的數值為 `0` `w ^ ((-f ^ w) & m)` $\Rightarrow$ `w ^ ((0 ^ w) & m)` $\Rightarrow$ `w ^ (~w & m)` $\Rightarrow$ `(w & ~(~w & m)) | (~w & (~w & m))` $\Rightarrow$ `(w & (w | ~m)) | ((~w & ~w) & m)` $\Rightarrow$ `w | (~w & m)` $\Rightarrow$ `(w | ~w) & (w | m)` $\Rightarrow$ `1 & (w | m)` $\Rightarrow$ `w | m` | w | m | ~w | ~m | w \| ~m | w & (w \| ~m) | ~w & m | w ^ (~w & m) | w \| m | | --- | --- | ---- | ---- | ------- | ------------- | ------ | ------------ | ------ | | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | ### 方法二: ```c #include <stdbool.h> int main() { bool f; // conditional flag unsigned int m; // the bit mask unsigned int w; // the word to modify w = (w & ~m) | (-f & m); return 0; } ``` #### 變數 `f` 的數值為 `1` `(w & ~m) | (-f & m)` $\Rightarrow$ `(w & ~m) | (-1 & m)` $\Rightarrow$ `(w & ~m) | (0xffffffff & m)` $\Rightarrow$ `(w & ~m) | m` $\Rightarrow$ `(w | m) & (~m | m)` $\Rightarrow$ `(w | m) & 1` $\Rightarrow$ `w | m` #### 變數 `f` 的數值為 `0` `(w & ~m) | (-f & m)` $\Rightarrow$ `(w & ~m) | (0 & m)` $\Rightarrow$ `(w & ~m) | 0` $\Rightarrow$ `w & ~m` ## [Conditionally negate a value without branching](https://graphics.stanford.edu/~seander/bithacks.html#ConditionalNegate) ### 方法一在不使用分支的情況下，當 flag ( `fNegate` ) 的值為 `false` 時，取負數。 ```c #include <stdbool.h> int main() { bool fDontNegate = 0; // Flag indicating we should not negate v. int v = 1234; // Input value to negate if fDontNegate is false. int r = 5678; // result = fDontNegate ? v : -v; r = (fDontNegate ^ (fDontNegate - 1)) * v; return 0; } ``` 假設 `int` 佔用 4 個 byte ，每個 byte 有 8 個 bit 。 #### 當 fDontNegate 的值為 `0` 時 `r = (0 ^ (0 - 1)) * v;` $\Rightarrow$ `r = (0 ^ (-1)) * v;` $\Rightarrow$ `r = (0x00000000 ^ 0xffffffff) * v;` $\Rightarrow$ `r = (0xffffffff) * v;` $\Rightarrow$ `r = (-1) * v;` $\Rightarrow$ `r = -v;` 用 `gdb` 追蹤程式碼。看完下面組合語言程式碼的追蹤，可以得知上面的數學推導與電腦機器指令的執行過程是一致的。對照一下上面的 C 語言程式碼，可以得知 `r = (fDontNegate ^ (fDontNegate - 1)) * v;` 這個 statement 對應到下面組合語言的第 14 ~ 18 行。 ```assemble= $ gcc test.c -O0 -g -o test $ gdb -q test Reading symbols from test... (gdb) disass main Dump of assembler code for function main: 0x0000000000001129 <+0>: endbr64 0x000000000000112d <+4>: push %rbp 0x000000000000112e <+5>: mov %rsp,%rbp 0x0000000000001131 <+8>: movb $0x0,-0x9(%rbp) 0x0000000000001135 <+12>: movl $0x4d2,-0x8(%rbp) 0x000000000000113c <+19>: movl $0x162e,-0x4(%rbp) 0x0000000000001143 <+26>: movzbl -0x9(%rbp),%eax 0x0000000000001147 <+30>: movzbl -0x9(%rbp),%edx 0x000000000000114b <+34>: sub $0x1,%edx 0x000000000000114e <+37>: xor %eax,%edx 0x0000000000001150 <+39>: mov -0x8(%rbp),%eax 0x0000000000001153 <+42>: imul %edx,%eax 0x0000000000001156 <+45>: mov %eax,-0x4(%rbp) 0x0000000000001159 <+48>: mov $0x0,%eax 0x000000000000115e <+53>: pop %rbp 0x000000000000115f <+54>: retq End of assembler dump. ``` 下面是用 `gdb` 追蹤組合語言的部分過程。 * 當執行完第 14 行 (`sub $0x1,%edx`) 後， CPU 暫存器 rdx 的內容由 `0` 轉為 `0xffffffff` 。 ```gdb (gdb) info register rdx rdx 0x0 0 (gdb) ni 9 r = (fDontNegate ^ (fDontNegate - 1)) * v; (gdb) info register rdx rdx 0xffffffff 4294967295 ``` * 當執行完第 15 行 (`xor %eax,%edx`) 後， CPU 暫存器 rdx 的內容維持 `0xffffffff`。 ```gdb (gdb) info register rax rax 0x0 0 (gdb) ni 9 r = (fDontNegate ^ (fDontNegate - 1)) * v; (gdb) info register rdx rdx 0xffffffff 4294967295 ```gdb * 當執行完第 16 行 (mov -0x8(%rbp),%eax) 與第 17 行 (`imul %edx,%eax`) 後， CPU 暫存器 rax 的內容變為 `0xfffffb2e` ，也就是 $-1234_{10}$ 。 ```gdb (gdb) info register rax rax 0x0 0 (gdb) ni 0x0000555555555153 9 r = (fDontNegate ^ (fDontNegate - 1)) * v; (gdb) info register rax rax 0x4d2 1234 (gdb) ni 0x0000555555555156 9 r = (fDontNegate ^ (fDontNegate - 1)) * v; (gdb) info register rax rax 0xfffffb2e 4294966062 ``` * 當執行完第 18 行 (mov %eax,-0x4(%rbp)) 後，將計算結果 ($-1234_{10}$) 由 CPU 暫存器複製到記憶體位址 0x7fffffffe2cc (就是變數 `r`)中。因為 Intel CPU 架構採 little endian (least significant byte 放在低位址)，所以正確的資料為 0xfffffb2e ，也就是 $-1234_{10}$ 。 ```gdb (gdb) info register rbp rbp 0x7fffffffe2d0 0x7fffffffe2d0 (gdb) x/4bx 0x7fffffffe2cc 0x7fffffffe2cc: 0x2e 0x16 0x00 0x00 (gdb) ni 10 return 0; (gdb) x/4bx 0x7fffffffe2cc 0x7fffffffe2cc: 0x2e 0xfb 0xff 0xff (gdb) ``` #### 當 fDontNegate 的值為 `1` 時 `r = (1 ^ (1 - 1)) * v;` $\Rightarrow$ `r = (1 ^ (0)) * v;` $\Rightarrow$ `r = (0x00000001 ^ (0x00000000)) * v;` $\Rightarrow$ `r = (0x00000001) * v;` $\Rightarrow$ `r = 1 * v;` $\Rightarrow$ `r = v;` 把 C code 中，變數 `fDontNegate` 的初值改為 `1` 。下面是用 gdb 追蹤組合語言的部分過程。 ```assemble= $ gcc test.c -O0 -g -o test $ gdb -q test Reading symbols from test... (gdb) disass main Dump of assembler code for function main: 0x0000000000001129 <+0>: endbr64 0x000000000000112d <+4>: push %rbp 0x000000000000112e <+5>: mov %rsp,%rbp 0x0000000000001131 <+8>: movb $0x1,-0x9(%rbp) 0x0000000000001135 <+12>: movl $0x4d2,-0x8(%rbp) 0x000000000000113c <+19>: movl $0x162e,-0x4(%rbp) 0x0000000000001143 <+26>: movzbl -0x9(%rbp),%eax 0x0000000000001147 <+30>: movzbl -0x9(%rbp),%edx 0x000000000000114b <+34>: sub $0x1,%edx 0x000000000000114e <+37>: xor %eax,%edx 0x0000000000001150 <+39>: mov -0x8(%rbp),%eax 0x0000000000001153 <+42>: imul %edx,%eax 0x0000000000001156 <+45>: mov %eax,-0x4(%rbp) 0x0000000000001159 <+48>: mov $0x0,%eax 0x000000000000115e <+53>: pop %rbp 0x000000000000115f <+54>: retq End of assembler dump. ``` 一樣把注意力集中在第 14 ~ 18 行。在程式執行到第 14 行時，變數 `fDontNegate` ， `v` 與 `r` 的初值會被分別寫到 `0x7fffffffe2d7` (1 byte) ， `0x7fffffffe2d8` (4 bytes) 與 `0x7fffffffe2dc` (4 bytes) 這三個記憶體位址。要注意的是， Intel CPU 採用 little endian 對記憶體讀寫資料。 ```gdb (gdb) info register rbp rbp 0x7fffffffe2e0 0x7fffffffe2e0 (gdb) x/4xb 0x7fffffffe2d7 0x7fffffffe2d7: 0x01 0x00 0x00 0x00 (gdb) x/4xb 0x7fffffffe2d8 0x7fffffffe2d8: 0xd2 0x04 0x00 0x00 (gdb) x/4xb 0x7fffffffe2dc 0x7fffffffe2dc: 0x2e 0x16 0x00 0x00 ``` 接著執行組合語言第 14 行 (`sub $0x1,%edx`)。因為在第 13 行已經把變數 `fDontNegate` (記憶體位址 `0x7fffffffe2d7` ) 的值複製給了 CPU 暫存器 `rdx` ，所執行完第 14 行後，暫存器 `rdx` 的值為 `0` 。 ```gdb (gdb) info register rdx rdx 0x1 1 (gdb) ni 9 r = (fDontNegate ^ (fDontNegate - 1)) * v; (gdb) info register rdx rdx 0x0 0 ``` 當執行完第 15 行 (xor %eax,%edx) 後， CPU 暫存器 rdx 的內容變為 `1`。因為暫存器 eax 在第 12 行已被設定為變數 `fDontNegate` 的值，也就是 `1` 。 ```gdb (gdb) ni 9 r = (fDontNegate ^ (fDontNegate - 1)) * v; (gdb) info register rdx rdx 0x1 1 ``` 當執行完第 16 行 ( `mov -0x8(%rbp),%eax` ) 與第 17 行 ( `imul %edx,%eax` ) 後， CPU 暫存器 rax 的內容變為 `0x4d2` ，也就是 `1234` 。 ```gdb (gdb) info register rax rax 0x1 1 (gdb) ni 9 r = (fDontNegate ^ (fDontNegate - 1)) * v; (gdb) ni 0x0000555555555153 9 r = (fDontNegate ^ (fDontNegate - 1)) * v; (gdb) info register rax rax 0x4d2 1234 ``` 當執行完第 18 行 ( `mov %eax,-0x4(%rbp)` ) 後，將計算結果 ( $1234_{10}$ ) 由 CPU 暫存器複製到記憶體位址 `0x7fffffffe2dc` (就是變數 r)中。 ```gdb (gdb) x/4dx 0x7fffffffe2dc 0x7fffffffe2dc : 0x2e 0x16 0x00 0x00 (gdb) ni 0x0000555555555156 9 r = (fDontNegate ^ (fDontNegate - 1)) * v; (gdb) ni 10 return 0; (gdb) x/4dx 0x7fffffffe2dc 0x7fffffffe2dc: 0xd2 0x04 0x00 0x00 ``` ### 方法二在不使用分支的情況下，當 flag ( `fNegate` ) 的值為 `true` 時，取負數。 ```c #include <stdbool.h> // If you need to negate only when a flag is true, then use this: int main() { bool fNegate = 1; // Flag indicating if we should negate v. int v = 1234; // Input value to negate if fNegate is true. int r = 5678; // result = fNegate ? -v : v; r = (v ^ -fNegate) + fNegate; return 0; } ``` 假設 `int` 佔用 4 個 byte ，每個 byte 有 8 個 bit 。 #### 當 fDontNegate 的值為 `1` 時 `r = (v ^ -fNegate) + fNegate;` $\Rightarrow$ `r = (v ^ -1) + 1;` $\Rightarrow$ `r = (v ^ 0xffffffff) + 1;` $\Rightarrow$ `r = ~v + 1;` $\Rightarrow$ `r = -v;` 接著用 GDB 追蹤程式碼。 ```gdb= $ gcc test.c -O0 -g -o test $ gdb -q test Reading symbols from test... (gdb) disass main Dump of assembler code for function main: 0x0000000000001129 <+0>: endbr64 0x000000000000112d <+4>: push %rbp 0x000000000000112e <+5>: mov %rsp,%rbp 0x0000000000001131 <+8>: movb $0x1,-0x9(%rbp) 0x0000000000001135 <+12>: movl $0x4d2,-0x8(%rbp) 0x000000000000113c <+19>: movl $0x162e,-0x4(%rbp) 0x0000000000001143 <+26>: movzbl -0x9(%rbp),%eax 0x0000000000001147 <+30>: neg %eax 0x0000000000001149 <+32>: xor -0x8(%rbp),%eax 0x000000000000114c <+35>: mov %eax,%edx 0x000000000000114e <+37>: movzbl -0x9(%rbp),%eax 0x0000000000001152 <+41>: add %edx,%eax 0x0000000000001154 <+43>: mov %eax,-0x4(%rbp) 0x0000000000001157 <+46>: mov $0x0,%eax 0x000000000000115c <+51>: pop %rbp 0x000000000000115d <+52>: retq End of assembler dump. ``` 對照 C code 之後，可以很容易地理解到，組合語言中的第 9 到 11 行就是將變數 `fNegate` 、 `v` 以及 `r` 分別初始化為 `1` 、 `1234` 以及 `5678` 。所以，這三個變數的記憶體位址分別為 `0x7fffffffe247` (`-0x9(%rbp)`) 、 `0x7fffffffe248` (`-0x8(%rbp)`) 以及 `0x7fffffffe24c` (`-0x4(%rbp)`)。 ```gdb (gdb) info register rbp rbp 0x7fffffffe250 0x7fffffffe250 (gdb) x/1xb 0x7fffffffe247 0x7fffffffe247: 0x01 (gdb) x/4xb 0x7fffffffe248 0x7fffffffe248: 0xd2 0x04 0x00 0x00 (gdb) x/4xb 0x7fffffffe24c 0x7fffffffe24c: 0x2e 0x16 0x00 0x00 ``` 組合語言第 12 與 13 行將變數 `fNegate` 的值 (也就是記憶體位址 `0x7fffffffe247` 中的值) 複製到暫存器 `eax` 中。然後再對暫存器 `eax` 取負值。 ```gdb (gdb) info register rax rax 0x555555555129 93824992235817 (gdb) ni 0x0000555555555147 10 r = (v ^ -fNegate) + fNegate; (gdb) info register rax rax 0x1 1 (gdb) ni 10 r = (v ^ -fNegate) + fNegate; (gdb) info register rax rax 0xffffffff 4294967295 ``` 組合語言第 14 行將變數 `v` 的值 ( `0x04d2` ) 與 `0xffffffff` 也就是 ( `-1` ) 進行 xor 運算。得到的結果為 `~v` ( `0xfb2d` ) 。 ```gdb (gdb) ni 0x000055555555514c 10 r = (v ^ -fNegate) + fNegate; (gdb) info register rax rax 0xfffffb2d 4294966061 ``` 組合語言第 15 到 17 行將 `0xfb2d` 複製到暫存器 edx ，將變數 `fNegate` 的值 (`1`) 複製到暫存器 eax ，將暫存器 edx 中的值與暫存器 eax 中的值相加 ( `0xfb2d + 1` ) 後的和存入暫存器 eax 中。 ```gdb (gdb) info register rdx rdx 0x7fffffffe358 140737488347992 (gdb) ni 10 r = (v ^ -fNegate) + fNegate; (gdb) info register rdx rdx 0xfffffb2d 4294966061 (gdb) info register rax rax 0xfffffb2d 4294966061 (gdb) ni 10 r = (v ^ -fNegate) + fNegate; (gdb) info register rax rax 0x1 1 (gdb) ni 0x0000555555555154 10 r = (v ^ -fNegate) + fNegate; (gdb) info register rax rax 0xfffffb2e 4294966062 ``` 最後，組合語言第 18 行，將最後的結果 `0xfb2e` ( `-1234` ) 複製到變數 `r` 中。 ```gdb (gdb) x/4bx 0x7fffffffe24c 0x7fffffffe24c: 0x2e 0x16 0x00 0x00 (gdb) ni 11 return 0; (gdb) x/4bx 0x7fffffffe24c 0x7fffffffe24c: 0x2e 0xfb 0xff 0xff ``` #### 當 fDontNegate 的值為 `0` 時 `r = (v ^ -fNegate) + fNegate;` $\Rightarrow$ `r = (v ^ 0) + 0;` $\Rightarrow$ `r = v + 0;` $\Rightarrow$ `r = v;` 在 C code 中，將變數 `fNegate` 的初值改為 `0` 。接著用 GDB 追蹤程式碼。 ```gdb= $ gcc test.c -O0 -g -o test $ gdb -q test Reading symbols from test... (gdb) disass main Dump of assembler code for function main: 0x0000000000001129 <+0>: endbr64 0x000000000000112d <+4>: push %rbp 0x000000000000112e <+5>: mov %rsp,%rbp 0x0000000000001131 <+8>: movb $0x0,-0x9(%rbp) 0x0000000000001135 <+12>: movl $0x4d2,-0x8(%rbp) 0x000000000000113c <+19>: movl $0x162e,-0x4(%rbp) 0x0000000000001143 <+26>: movzbl -0x9(%rbp),%eax 0x0000000000001147 <+30>: neg %eax 0x0000000000001149 <+32>: xor -0x8(%rbp),%eax 0x000000000000114c <+35>: mov %eax,%edx 0x000000000000114e <+37>: movzbl -0x9(%rbp),%eax 0x0000000000001152 <+41>: add %edx,%eax 0x0000000000001154 <+43>: mov %eax,-0x4(%rbp) 0x0000000000001157 <+46>: mov $0x0,%eax 0x000000000000115c <+51>: pop %rbp 0x000000000000115d <+52>: retq End of assembler dump. ``` 和上一節一樣，組合語言中的第 9 到 11 行就是將變數 `fNegate` 、 `v` 以及 `r` 分別初始化為 `0` 、 `1234` 以及 `5678` 。所以，這三個變數的記憶體位址分別為 `0x7fffffffe247` (`-0x9(%rbp)`) 、 `0x7fffffffe248` (`-0x8(%rbp)`) 以及 `0x7fffffffe24c` (`-0x4(%rbp)`)。 ```gdb (gdb) x/1xb 0x7fffffffe247 0x7fffffffe247: 0x00 (gdb) x/4xb 0x7fffffffe248 0x7fffffffe248: 0xd2 0x04 0x00 0x00 (gdb) x/4xb 0x7fffffffe24c 0x7fffffffe24c: 0x2e 0x16 0x00 0x00 ``` 組合語言第 12 與 13 行將變數 fNegate 的值 (也就是記憶體位址 0x7fffffffe247 中的值) 複製到暫存器 eax 中。然後再對暫存器 eax 取負值。 ```gdb (gdb) info register rax rax 0x555555555129 93824992235817 (gdb) ni 0x0000555555555147 10 r = (v ^ -fNegate) + fNegate; (gdb) info register rax rax 0x0 0 (gdb) ni 10 r = (v ^ -fNegate) + fNegate; (gdb) info register rax rax 0x0 0 ``` 組合語言第 14 行將變數 v 的值 ( `0x04d2` ) 與 `0` 進行 xor 運算。得到的結果為 ~v ( `0x04d2` ) 。 ```gdb (gdb) ni 0x000055555555514c 10 r = (v ^ -fNegate) + fNegate; (gdb) info register rax rax 0x4d2 1234 ``` 組合語言第 15 到 17 行將 `0x04d2` 複製到暫存器 edx ，將變數 `fNegate` 的值 ( `0` ) 複製到暫存器 eax ，將暫存器 edx 中的值與暫存器 eax 中的值相加 ( `0x04d2 + 0` ) 後的和存入暫存器 eax 中。 ```gdb (gdb) info register rdx rdx 0x7fffffffe358 140737488347992 (gdb) ni 10 r = (v ^ -fNegate) + fNegate; (gdb) info register rdx rdx 0x4d2 1234 (gdb) info register rax rax 0x4d2 1234 (gdb) ni 10 r = (v ^ -fNegate) + fNegate; (gdb) info register rax rax 0x0 0 (gdb) ni 0x0000555555555154 10 r = (v ^ -fNegate) + fNegate; (gdb) info register rax rax 0x4d2 1234 ``` 最後，組合語言第 18 行，將最後的結果 `0xfb2e` ( `-1234` ) 複製到變數 `r` 中。 ```gdb (gdb) x/4bx 0x7fffffffe24c 0x7fffffffe24c: 0x2e 0x16 0x00 0x00 (gdb) ni 11 return 0; (gdb) x/4bx 0x7fffffffe24c 0x7fffffffe24c: 0xd2 0x04 0x00 0x00 ``` ## [Merge bits from two values according to a mask](https://graphics.stanford.edu/~seander/bithacks.html#MaskedMerge) ```c int main() { unsigned int a; // value to merge in non-masked bits unsigned int b; // value to merge in masked bits unsigned int mask; // 1 where bits from b should be selected; 0 where from a. unsigned int r; // result of (a & ~mask) | (b & mask) goes here r = a ^ ((a ^ b) & mask); return 0; } ``` 這段程式碼中定義了 4 個 `unsigned int` 變數 `a` 、 `b` 、 `mask` 與 `r` 。若變數 `mask` 的 bit i 為 `1` 則變數 `r` 的 bit i 會與變數 `b` 的 bit i 相同。若變數 `mask` 的 bit i 為 `0` 則變數 `r` 的 bit i 會與變數 `a` 的 bit i 相同。以下標 $_{i}$ 表示某個變數的 bit i 。 ### 當 $mask_{i}$ 為 `1` 時 $r_{i} = a_{i} \oplus ((a_{i} \oplus b_{i}) \land mask_{i})$ $\Rightarrow r_{i} = a_{i} \oplus ((a_{i} \oplus b_{i}) \land 1)$ $\Rightarrow r_{i} = a_{i} \oplus (a_{i} \oplus b_{i})$ $\Rightarrow r_{i} = b_{i}$ 所以，當變數 `mask` 的 bit i 為 `1` 則變數 `r` 的 bit i 會與變數 `b` 的 bit i 相同。 ### 當 $mask_{i}$ 為 `0` 時 $r_{i} = a_{i} \oplus ((a_{i} \oplus b_{i}) \land mask_{i})$ $\Rightarrow r_{i} = a_{i} \oplus ((a_{i} \oplus b_{i}) \land 0)$ $\Rightarrow r_{i} = a_{i} \oplus 0$ $\Rightarrow r_{i} = a_{i}$ 所以，當變數 `mask` 的 bit i 為 `0` 則變數 `r` 的 bit i 會與變數 `a` 的 bit i 相同。 ## Counting bits set ### [Counting bits set (naive way)](https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive) ```c int main() { unsigned int v; // count the number of bits set in v unsigned int c; // c accumulates the total bits set in v for (c = 0; v; v >>= 1) c += v & 1; return 0; } ``` 這支程式應該不需要多做解釋。 ### [Counting bits set by lookup table](https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable) #### 方法一 ```c int main() { static const unsigned char BitsSetTable256[256] = { #define B2(n) n, n+1, n+1, n+2 #define B4(n) B2(n), B2(n+1), B2(n+1), B2(n+2) #define B6(n) B4(n), B4(n+1), B4(n+1), B4(n+2) B6(0), B6(1), B6(1), B6(2) }; unsigned int v; // count the number of bits set in 32-bit value v unsigned int c; // c is the total bits set in v // Option 1: c = BitsSetTable256[v & 0xff] + BitsSetTable256[(v >> 8) & 0xff] + BitsSetTable256[(v >> 16) & 0xff] + BitsSetTable256[v >> 24]; return 0; } ``` #### 方法二 ```c int main() { static const unsigned char BitsSetTable256[256] = { #define B2(n) n, n+1, n+1, n+2 #define B4(n) B2(n), B2(n+1), B2(n+1), B2(n+2) #define B6(n) B4(n), B4(n+1), B4(n+1), B4(n+2) B6(0), B6(1), B6(1), B6(2) }; unsigned int v; // count the number of bits set in 32-bit value v unsigned int c; // c is the total bits set in v // Option 2: unsigned char * p = (unsigned char *) &v; c = BitsSetTable256[p[0]] + BitsSetTable256[p[1]] + BitsSetTable256[p[2]] + BitsSetTable256[p[3]]; return 0; } ``` #### 另一種產生 table 的方法 ```c // To initially generate the table algorithmically: BitsSetTable256[0] = 0; for (int i = 0; i < 256; i++) { BitsSetTable256[i] = (i & 1) + BitsSetTable256[i / 2]; } ``` ### [Counting bits set, Brian Kernighan's way](https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan) ### [Counting bits set in 14, 24, or 32-bit words using 64-bit instructions](https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSet64) ### [Counting bits set, in parallel](https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel) ### [Count bits set (rank) from the most-significant bit upto a given position](https://graphics.stanford.edu/~seander/bithacks.html#CountBitsFromMSBToPos) ### [Select the bit position (from the most-significant bit) with the given count (rank)](https://graphics.stanford.edu/~seander/bithacks.html#SelectPosFromMSBRank) ## [Swapping values with XOR](https://graphics.stanford.edu/~seander/bithacks.html#SwappingValuesXOR) ```c= int main() { int x = 12, y = 77; x = x ^ y; y = x ^ y; x = x ^ y; return 0; } ``` 或者可以跟原作者一樣定義 macro 。 ```c #define SWAP(a, b) (((a) ^= (b)), ((b) ^= (a)), ((a) ^= (b))) ``` 接下來研究如何用 bitwise xor 運算子 (`^`) 交換兩個型別為 `int` 的變數的值。 | x | y | x ^ y | | --- | --- | ------ | | 0 | 0 | 0 | | 0 | 1 | 1 | | 1 | 0 | 1 | | 1 | 1 | 0 | 由 xor 的真值表可以看出==運算式 `x ^ x` 的結果是 0== 。所以，第 5 行可以改寫如下: ```c=5 y = (12 ^ 77) ^ 77; ``` 所以，當第 5 行執行完畢後，變數 `y` 的值就是變數 `x` 一開始的值，也就是 `12` 。接著看第 6 行。這時，變數 `x` 停留在第 4 行執行結束後的狀態，也就是 `x ^ y`。且此時變數 `y` 的值已經是變數 `x` 原本一開始的數值。所以可以把第 6 行改寫如下: ```c=6 x = (12 ^ 77) ^ 12; ``` 所以，第 6 行執行完畢後，變數 `x` 的數值就會是變數 `y` 原本一開始的數值，也就是 `77` 。 ### 有條件交換兩個變數的數值 ```c= #include <stdio.h> int main() { int x = 12, y = 77, cond = 0, t; t = x ^ y; y = y ^ (t & -cond); x = x ^ (t & -cond); return 0; } ``` * 假設整數型別 ( `int` ) 變數在記憶體中佔用 4 個 byte ，且每個 byte 有 8 個 bit 。 * 若變數 `cond` 的數值為 `0` ，則運算式 `t & -cond` 的結果就是 `0` ，導致第 6 行與第 7 行可寫如下。 ```c=6 y = 77 ^ 0; x = 12 ^ 0; ``` * 若變數 `cond` 的數值為 `1` ，則運算式 `-cond` 的結果就是 `-1` 。而 `-1` 的二補數編碼就是 `0xffffffff` 。那麼，運算式 `t & 0xffffffff` 的結果還是變數 `t` 的數值。所以，第 6 行與第 7 行可寫如下。 ```c=6 y = 77 ^(12 ^ 77) x = 12 ^(12 ^ 77) ``` 所以，當變數 `cond` 為 `0` 時，變數 `x` 與 `y` 的數值維持不變。當變數 `cond` 為 `1` 時，變數 `x` 與 `y` 的數值互換。 * NOTE: 這部分的內容參考 [看條件交換兩個數](https://hackmd.io/@0xff07/ORAORAORAORA#%E7%9C%8B%E6%A2%9D%E4%BB%B6%E4%BA%A4%E6%8F%9B%E5%85%A9%E5%80%8B%E6%95%B8) 的敘述。 ## 參考資料 * [Bit Twiddling Hacks](https://graphics.stanford.edu/~seander/bithacks.html) * [Bit Twiddling Hacks 解析 (一)](https://hackmd.io/@0xff07/ORAORAORAORA) * [Bit Twiddling Hacks 解析 (二)](https://hackmd.io/@0xff07/MUDAMUDAMUDA) * [Bit Twiddling Hacks 解析 (三)](https://hackmd.io/@0xff07/WRYYYYYYYYYY) * Computer Systems: A Programmer's Perspective, Third Edition * [解讀計算機編碼](https://hackmd.io/@sysprog/binary-representation) * [CS107 Guide to x86-64 - Stanford University](https://web.stanford.edu/class/archive/cs/cs107/cs107.1222/guide/x86-64.html)