Exploring the Zero-Width Bit Field Problem, Part I

# Exploring the Zero-Width Bit Field Problem, Part I

[Exploring the Zero-Width Bit Field Problem, Part II](/baH8lSMJTh2q89U3hIYsag)

## Preface

I have recently been auditing Prof. 黃敬群 (jserv)'s [Linux kernel design course](http://wiki.csie.ncku.edu.tw/linux/schedule) at National Cheng Kung University. The first week includes material on [bit fields](https://hackmd.io/s/SJ8y82ZYQ), which gives this example ([source](https://stackoverflow.com/questions/13802728/what-is-zero-width-bit-field)):

```cpp
#include <stdio.h>

struct foo {
    int a : 3;
    int b : 2;
    int : 0; /* Force alignment to next boundary */
    int c : 4;
    int d : 3;
};

int main() {
    int i = 0xFFFF;
    struct foo *f = (struct foo *)&i;
    printf("a=%d\nb=%d\nc=%d\nd=%d\n", f->a, f->b, f->c, f->d);
    return 0;
}
```

Running the program prints:

```shell
a=-1
b=-1
c=-4
d=0
```

Yet d comes out different on every run! The material's conclusion says:

> Because an int zero-width bit-field is declared here, the alignment boundary is 32 bits. c and d are already beyond the range of 0xFFFF and refer to some unknown place in the program, which is why d gives a different answer every time.

That raises a question: **if both c and d lie beyond the range of 0xFFFF, why is d the only one whose value changes?**

I could not work it out, so I asked in [Jserv與他愉快的小夥伴](https://www.facebook.com/groups/system.software2019/search/?query=gdb&epa=SEARCH_BOX). The professor instead invited me to trace the program with gdb and run the experiment myself. So I decided to face myself honestly!

## Goals

1. Understand how a zero-width bit field works
2. Use gdb to trace what the program is doing
3. Investigate why only d changes in the program above

## Environment

- OS: Ubuntu 16.04.5 LTS
- Compiler: gcc 8.1

## Topic 1: zero-width bit field

A bit field lets a struct member use only a few bits of its declared type. For example:

```cpp
#include <stdio.h>

struct test {
    int a;
    int b;
    int c;
};

struct foo {
    int a : 2;
    int b : 3;
    int c : 3;
};

struct doo {
    int a : 2;
    int b : 3;
    int : 0;
    int c : 3;
};

int main() {
    /* sizeof yields a size_t, so print it with %zu */
    printf("size of struct test: %zu\n", sizeof(struct test));
    printf("size of struct foo: %zu\n", sizeof(struct foo));
    printf("size of struct doo: %zu\n", sizeof(struct doo));
    return 0;
}

// output
// size of struct test: 12
// size of struct foo: 4
// size of struct doo: 8
```

Both struct test and struct foo declare three int members, yet struct test is 12 bytes while struct foo is only 4 bytes. The members a, b, and c of struct foo use just 2, 3, and 3 bits respectively (8 bits in total), so they are packed together, but the struct still occupies the space of one whole int.

![](https://i.imgur.com/GbdYNMl.png)

struct doo is declared almost identically to struct foo, with one extra ```int :0;```, and its size becomes 8 bytes. That is because ```int :0;``` (called a zero-width bit field) acts as padding: it forces the next bit field to start at the next allocation unit. The layout of struct doo is shown below:

![](https://i.imgur.com/Mn4Pys5.png)

## Topic 2: assembly and memory layout

Solving this problem requires some assembly language and an understanding of memory addresses.

```cpp
#include <stdio.h>

int main() {
    int i = 10;
    printf("%d\n", i);
    return 0;
}
```

Disassembling the program above gives:

```shell
$ objdump -d -M intel a.out
# output
0000000000400502 <main>:
  400502:   55                      push   rbp
  400503:   48 89 e5                mov    rbp,rsp
  400506:   48 83 ec 10             sub    rsp,0x10
  40050a:   c7 45 fc 0a 00 00 00    mov    DWORD PTR [rbp-0x4],0xa
  400511:   8b 45 fc                mov    eax,DWORD PTR [rbp-0x4]
  400514:   89 c6                   mov    esi,eax
  400516:   bf b4 05 40 00          mov    edi,0x4005b4
  40051b:   b8 00 00 00 00          mov    eax,0x0
  400520:   e8 db fe ff ff          call   400400 <printf@plt>
  400525:   b8 00 00 00 00          mov    eax,0x0
  40052a:   c9                      leave
  40052b:   c3                      ret
  40052c:   0f 1f 40 00             nop    DWORD PTR [rax+0x0]
```

Here we only care about the first four instructions:

```shell
  400502:   55                      push   rbp
  400503:   48 89 e5                mov    rbp,rsp
  400506:   48 83 ec 10             sub    rsp,0x10
  40050a:   c7 45 fc 0a 00 00 00    mov    DWORD PTR [rbp-0x4],0xa
```

> rbp is the base pointer, which points to the base of the current stack frame, and rsp is the stack pointer,
which points to the top of the current stack frame

For an illustrated walkthrough of these four instructions, see [here](https://github.com/holbertonschool/Hack-The-Virtual-Memory/tree/master/04.%20The%20Stack%2C%20registers%20and%20assembly%20code).

In short, when a function runs, the space between the addresses held in the rbp and rsp registers is set aside for its local variables. Here [rbp-0x4] is the location of variable i, and ```mov DWORD PTR [rbp-0x4],0xa``` stores 10 there.

## Experiment

Applying the concepts above to the program used in this experiment:

```cpp
// zero_width_bit_field.c
#include <stdio.h>

struct foo {
    int a : 3;
    int b : 2;
    int : 0; /* Force alignment to next boundary */
    int c : 4;
    int d : 3;
};

int main() {
    int i = 0xFFFF;
    struct foo *f = (struct foo *)&i;
    printf("a=%d\nb=%d\nc=%d\nd=%d\n", f->a, f->b, f->c, f->d);
    return 0;
}
```

the memory for struct foo should be laid out like this:

![](https://i.imgur.com/KYlyhjY.png)

Compile the program and run it under gdb:

```shell
$ gcc -Wall -g zero_width_bit_field.c -o zero_width_bit_field
$ gdb -q zero_width_bit_field
```

Set a breakpoint at main, run, and disassemble:

```shell
(gdb) b main
(gdb) r
(gdb) disassemble
```

This gives:

```
   0x0000000000400572 <+0>:     push   %rbp
   0x0000000000400573 <+1>:     mov    %rsp,%rbp
   0x0000000000400576 <+4>:     sub    $0x20,%rsp
=> 0x000000000040057a <+8>:     mov    %fs:0x28,%rax
   0x0000000000400583 <+17>:    mov    %rax,-0x8(%rbp)
   0x0000000000400587 <+21>:    xor    %eax,%eax
   0x0000000000400589 <+23>:    movl   $0xffff,-0x14(%rbp)
   0x0000000000400590 <+30>:    lea    -0x14(%rbp),%rax
   0x0000000000400594 <+34>:    mov    %rax,-0x10(%rbp)
   ...
```

The key values are rbp, rsp, rbp-0x10, and rbp-0x14, so let's print them:

```shell
(gdb) p $rbp
$1 = (void *) 0x7fffffffd180
(gdb) p $rsp
$2 = (void *) 0x7fffffffd160
(gdb) p $rbp-16
$3 = (void *) 0x7fffffffd170
(gdb) p $rbp-20
$4 = (void *) 0x7fffffffd16c
```

So ```rbp-0x14 (0x7fffffffd16c)``` is the address of variable i (because 0xffff is assigned to that location). Then whose address is ```rbp-0x10 (0x7fffffffd170)```? Let's make a guess:

```shell
(gdb) p &f
$5 = (struct foo **) 0x7fffffffd170
```

So ```rbp-0x10 (0x7fffffffd170)``` is the address of the pointer f. Continue until just before line 16 executes:

```shell
(gdb) n
16	    struct foo *f = (struct foo *)&i;
(gdb) p i
$2 = 65535
(gdb) p f
$3 = (struct foo *) 0x7fffffffd260
```

At this point f has not been assigned yet, so it holds a meaningless address. Step past the next statement:

```cpp
struct foo *f = (struct foo *)&i;
```

```shell
(gdb) p f
$4 = (struct foo *)
(gdb) x/t 0x7fffffffd16c // value of i
0x7fffffffd16c:	00000000000000001111111111111111
(gdb) x/t 0x7fffffffd170 // value of f (low 32 bits)
0x7fffffffd170:	11111111111111111101000101101100
```

So memory is now laid out as follows:

![](https://i.imgur.com/jVQqqgC.png)

From this layout, ```f->a``` and ```f->b``` live at ```0x7fffffffd16c```, while ```f->c``` and ```f->d``` live at ```0x7fffffffd170```, which is the address of the pointer f itself.

```
(gdb) p &(f->a)
$5 = (int *) 0x7fffffffd16c
(gdb) p &(f->b)
$6 = (int *) 0x7fffffffd16c
(gdb) p &(f->c)
$7 = (int *) 0x7fffffffd170
(gdb) p &(f->d)
$8 = (int *) 0x7fffffffd170
```

Reading each word from its least significant bit, a and b take the low 3 and next 2 bits of i, while c and d take the low 4 and next 3 bits of the value stored in f. So a = 111 (-1), b = 11 (-1), c = 1100 (-4), and d = 110 (-2):

```shell
(gdb) n
a: -1
b: -1
c: -4
d: -2
```

In other words, ==the values of c and d are determined by the address of variable i!== To confirm this, modify the original program slightly:

```cpp
int i = 0xFFFF;
printf("&i = %p\n", &i);
struct foo *f = (struct foo *)&i;
printf("a: %d\nb: %d\nc: %d\nd: %d\n", f->a, f->b, f->c, f->d);
```

Results of several runs:

```shell
&i = 0x7ffec2fb04fc
a: -1
b: -1
c: -4
d: -1
&i = 0x7ffd906b4e8c
a: -1
b: -1
c: -4
d: 0
&i = 0x7ffd674e848c
a: -1
b: -1
c: -4
d: 0
&i = 0x7ffd3c5def4c
a: -1
b: -1
c: -4
d: -4
&i = 0x7fff8ec9e8cc
a: -1
b: -1
c: -4
d: -4
&i = 0x7ffc5521ce8c
a: -1
b: -1
c: -4
d: 0
&i = 0x7fff164d922c
a: -1
b: -1
c: -4
d: 2
```

Every address of i ends in 0xc (1100), so c is always -4! Meanwhile d follows the next hex digit, 0xf (1111), 0x8 (1000), 0xc (1100), 0x2 (0010), 0x4 (0100), and so on, taking its low three bits: 111 (-1), 000 (0),
100 (-4), 010 (2)!

:::warning
Please reread [Bit fields](https://en.cppreference.com/w/c/language/bit_field), paying attention to its description of alignment, and think it over together with [EXP11-C. Do not make assumptions regarding the layout of structures with bit-fields](https://wiki.sei.cmu.edu/confluence/display/c/EXP11-C.+Do+not+make+assumptions+regarding+the+layout+of+structures+with+bit-fields) from the [SEI CERT C Coding Standard](https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard).
:notes: jserv
:::

## Conclusion and discussion

Why does c stay the same while d keeps changing? **Because both depend on the address of variable i!** As for why the low 4 bits of i's address always happen to be the same... that is another question.

What looked like a tiny question ended up requiring bit fields, assembly, and memory layout, plus good use of tools such as gdb. Along the way I found that many concepts I thought I knew were actually fuzzy, and I had to sort them out before I could take the next step. You really do have to face yourself honestly...

## References

- [Signed Binary Numbers](https://www.electronics-tutorials.ws/binary/signed-binary-numbers.html)
- [Debugging with GDB （入門篇）](http://www.study-area.org/goldencat/debug.htm)
- [64-Bit ELF V2 ABI Specification bit field](http://openpowerfoundation.org/wp-content/uploads/resources/leabi/content/ch02s01s02s04.html)
- [Hack the virtual memory, chapter 4: the stack, registers and assembly code](https://github.com/holbertonschool/Hack-The-Virtual-Memory/tree/master/04.%20The%20Stack%2C%20registers%20and%20assembly%20code#hack-the-virtual-memory-chapter-4-the-stack-registers-and-assembly-code)
- [gdb x command](http://visualgdb.com/gdbreference/commands/x)

###### tags: `knowThyself` `linux` `c` `bitField` `assembly`