
---
tags: DYKC
---

# Bit fields in C

> Compiled by: [jserv](http://wiki.csie.ncku.edu.tw/User/jserv)

Consider the following C program. What does it print?

```c
#include <stdbool.h>
#include <stdio.h>

bool is_one(int i) { return i == 1; }

int main()
{
    struct { signed int a : 1; } obj = { .a = 1 };
    puts(is_one(obj.a) ? "one" : "not one");
    return 0;
}
```

* [C: Bit Fields](https://www.tutorialspoint.com/cprogramming/c_bit_fields.htm)

:::warning
**C99 standard (§6.7.2.1)**
> A bit-field shall have a type that is a qualified or unqualified version of _Bool, signed int, unsigned int, or some other implementation-defined type.
:::

In `struct { signed int a : 1; } obj = { .a = 1 };`, an `int` would normally occupy 4 bytes (32 bits) on mainstream platforms, but the bit-field declaration says that only one of those bits is used to store the value. Because `a` is a **signed integer**, under [two's complement](https://hackmd.io/@sysprog/binary-representation) a 1-bit field can only represent `-1` and `0`, so the program prints `not one`. That the 1-bit pattern `1` reads back as `-1` is decided by the compiler: storing the out-of-range value `1` into this field is an implementation-defined conversion. The term implementation-defined, which also appears in the passage of the C standard quoted above, means that the behavior is chosen by the implementation (usually the compiler), which is obliged to document its choice. Likewise, if a bit-field is declared as plain `int` without saying `signed` or `unsigned`, whether the bit-field is signed or unsigned is decided by the compiler.

:::success
Follow-up exercise: find source code in the Linux kernel that uses bit fields and explain what it is used for.
:::

### Linux kernel: `BUILD_BUG_ON_ZERO()`

In the Linux kernel source, the header [linux/build_bug.h](https://elixir.bootlin.com/linux/latest/source/include/linux/build_bug.h#L16) has contained the following implementation:

```c
/*
 * Force a compilation error if condition is true, but also produce a
 * result (of value 0 and type size_t), so the expression can be used
 * e.g. in a structure initializer (or where-ever else comma expressions
 * aren't permitted).
 */
#define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:(-!!(e)); }))
```

The macro forces a compilation error when a condition holds. `e` is a condition; when the compiler evaluates it as true, the build fails, so the developer is told about the problem early, at compile time (the struct declaration is processed during compilation).

So what does `int:(-!!(e))` mean?

* `!!(e)`: applies logical `NOT` to `e` twice, which guarantees that the result is 0 or 1
* `-!!(e)`: multiplies `!!(e)` by `-1`, so the result is `0` or `-1`

Therefore, when `e` is true (the condition that must not hold), `!!(e)` is 1 and `-!!(e)` is -1, which declares a structure containing an `int` bit-field whose width is **-1 bit**. Conversely, when `e` is false, the structure contains an `int` bit-field of width 0.

A bit-field of width `-1` is clearly illegal (the width must be a nonnegative integer constant expression). What about a bit-field of width 0? First, a zero-width bit-field must be unnamed. Second, a zero-width bit-field occupies no storage itself, but it forces the next bit-field in the structure to be aligned to the boundary of the next unit. According to the C99 standard (§6.7.2.1):

* The expression that specifies the width of a bit-field shall be an integer constant expression with a nonnegative value that does not exceed the width of an object of the type that would be specified were the colon and expression omitted. If the value is zero, the declaration shall have no declarator.
* A bit-field declaration with no declarator, but only a colon and a width, indicates an unnamed bit-field. As a special case, a bit-field structure member with a width of 0 indicates that no further bit-field is to be packed into the unit in which the previous bit-field, if any, was placed.
* (108) An unnamed bit-field structure member is useful for padding to conform to externally imposed layouts.
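
For example (code from [What is zero-width bit field](https://stackoverflow.com/questions/13802728/what-is-zero-width-bit-field)):

```c
#include <stdio.h>

struct foo {
    int a : 3;
    int b : 2;
    int : 0; /* Force alignment to next boundary */
    int c : 4;
    int d : 3;
};

int main()
{
    int i = 0xFFFF;
    /* Deliberately flawed: struct foo may be larger than a single int */
    struct foo *f = (struct foo *) &i;
    printf("a=%d\nb=%d\nc=%d\nd=%d\n", f->a, f->b, f->c, f->d);
    return 0;
}
```

The expected output is:

```
a=-1
b=-1
c=-4
d=0
```

That is, a = 111 (decimal -1), b = 11 (decimal -1), c = 1100 (decimal -4), d = 000 (decimal 0); without the alignment it would be a = 111, b = 11, c = 1111, d = 111. In practice, however, the value of `d` differs from run to run. Thinking again about how the padding works reveals the flaw in the code above. The alignment caused by the zero-width bit-field looks like this (a sketch assuming 32-bit `int` and the layout typically chosen by GCC on little-endian machines):

```
i = 0xFFFF -> 0000 0000 0000 0000 1111 1111 1111 1111

first unit (the storage of i):
  0000 0000 0000 0000 1111 1111 1111 1111
                                   b baaa
second unit (padding ends here; c and d start here, outside i):
  ???? ???? ???? ???? ???? ???? ???? ????
                                 ddd cccc
  |←----------- int, 32 bits ----------→|
```

According to the C11 specification, the following situations place members of a structure in separate memory locations:

* a bit-field member and an adjacent non-bit-field member
* two bit-field members, one of which is declared inside a nested structure declaration while the other is not
* two bit-field members separated by a zero-length bit-field declaration
* two bit-field members separated by a non-bit-field member declaration

Because an `int` zero-width bit-field is declared here, the rule above puts `c` and `d` in a different memory location from `a` and `b`, regardless of the type and storage size of `c` and `d`. `c` and `d` therefore refer to memory other than that of `a` and `b`, and not necessarily to the `0xFFFF` stored in `i`. Since `i` provides only one `int` of storage while `struct foo` spans two units, reading `f->c` and `f->d` accesses memory beyond `i`, which is why `c` and `d` do not come out as expected. According to the C11 standard (§3.14, paragraph 3):

> A bit-field and an adjacent non-bit-field member are in separate memory locations. The same applies to two bit-fields, if one is declared inside a nested structure declaration and the other is not, or if the two are separated by a zero-length bit-field declaration, or if they are separated by a non-bit-field member declaration. It is not safe to concurrently update two non-atomic bit-fields in the same structure if all members declared between them are also (non-zero-length) bit-fields, no matter what the sizes of those intervening bit-fields happen to be.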
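
With gcc, `c` may consistently stay at `-4` or `0`, but other compilers may produce different results; only `a` and `b` are certain.

```shell
$ gcc -o out zero_width.c
$ ./out
a=-1
b=-1
c=0
d=0
```

When compiled with clang, `c` may take a value other than `0`, and the result may change from run to run:

```shell
$ clang -o out zero_width.c
$ ./out
a=-1
b=-1
c=1286839792
d=723631888
$ ./out
a=-1
b=-1
c=1860218352
d=-400441584
```

C11 provides [`_Static_assert`](https://en.cppreference.com/w/c/language/_Static_assert), which achieves the effect of `BUILD_BUG_ON_ZERO` described above without such a detour. The Linux kernel, however, has to take into account how well the compilers it supports implement the language standard; for a long time it targeted a pre-C11 dialect and only switched to `-std=gnu11` in v5.18.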