The Preprocessor

---
title: The Preprocessor
tags: C Language Notes
---

## The Preprocessor

### Make good use of the preprocessor when writing object-oriented programs
* Use [Stringification/Stringizing](https://gcc.gnu.org/onlinedocs/cpp/Stringizing.html) ```#``` and [concatenation](https://gcc.gnu.org/onlinedocs/cpp/Concatenation.html) ```##``` in macros
    ```c=
    /* COMMAND(quit) expands to { "quit", quit_command } */
    #define COMMAND(NAME) { #NAME, NAME ## _command }

    /* Stub handlers so the example links */
    void quit_command(void) {}
    void help_command(void) {}

    struct command {
        char *name;
        void (*function)(void);
    };

    int main() {
        struct command commands[] = {
            COMMAND (quit),
            COMMAND (help),
        };
        (void) commands;
        return 0;
    }
    ```
* Generate code with macros (using [a ray tracer](https://github.com/embedded2016/raytracing) as an example)
    ```c=
    #define DECLARE_OBJECT(name) \
        struct __##name##_node; \
        typedef struct __##name##_node *name##_node; \
        struct __##name##_node { \
            name element; \
            name##_node next; \
        }; \
        void append_##name(const name *X, name##_node *list); \
        void delete_##name##_list(name##_node *list);

    DECLARE_OBJECT(light)
    ```
    expands to
    ```c=
    struct __light_node;
    typedef struct __light_node *light_node;
    struct __light_node {
        light element;
        light_node next;
    };
    void append_light(const light *X, light_node *list);
    void delete_light_list(light_node *list);
    ```

### _Generic [C11]
* Used to emulate generic programming. In essence it is a selection construct similar to switch, except that the choice is made at compile time, based on the type of the controlling expression.
* Example
    ```c=
    #include <stdio.h>
    #include <math.h>

    /* Pick the cube-root function that matches the argument type */
    #define cbrt(X) _Generic((X), long double: cbrtl, \
                                  float: cbrtf, \
                                  default: cbrt \
                            )(X)

    int main(void){
        double x = 8.0;
        const float y = 3.375;
        printf("cbrt(8.0) = %f\n", cbrt(x));
        printf("cbrtf(3.375) = %f\n", cbrt(y));
        return 0;
    }
    ```

### A careless ARRAY_SIZE macro
* Consider the following program
    ```c=
    #include <stdio.h>
    #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof(arr[0]))

    int main(void) {
        int a[10];
        int *a_ptr = a;
        /* sizeof yields size_t, so print with %zu */
        printf("%zu\n", ARRAY_SIZE(a));
        printf("%zu\n", ARRAY_SIZE(a_ptr));
        return 0;
    }
    ```
    This prints ```10``` and ```2```. The second result is not what was intended: a pointer no longer carries the array length, so the macro computes ```sizeof(int *) / sizeof(int)```, which is 8 / 4 on a typical 64-bit target. Declaring the second variable with typeof keeps the full array type ```int[10]```, so the macro works again:
    ```c=
    #include <stdio.h>
    #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof(arr[0]))

    int main(void) {
        int a[10];
        typeof(a) b; /* b has type int[10], not int * */
        printf("%zu\n", ARRAY_SIZE(a));
        printf("%zu\n", ARRAY_SIZE(b));
        return 0;
    }
    ```

### Application: String switch in C
* Checking the cases below one by one with an if/else chain of strcmp calls wastes time, so a switch-like construct is used instead. The ```0x20202020``` mask sets bit 5 of every byte, which lowercases ASCII letters and makes the match case-insensitive. The constants place the first character in the lowest byte, so the comparison assumes a little-endian machine.
* [String switch in C](https://tia.mat.br/posts/2012/08/09/string_switch_in_c.html)
    ```c=
    #include <stdint.h>
    #include <string.h>

    /* Assumed definition of the branch hint (GCC/Clang builtin) */
    #define UNLIKELY(x) __builtin_expect(!!(x), 0)

    /* A simple hash: turns ".jpg", ".png", ".htm", ... into integers */
    #define STRING_SWITCH_L(s) \
        switch (*((int32_t *)(s)) | 0x20202020)
    #define MULTICHAR_CONSTANT(a,b,c,d) \
        ((int32_t)((a) | (b) << 8 | (c) << 16 | (d) << 24))
    /* Used below but missing from the listing; assumed to apply the
       same lowercase mask as STRING_SWITCH_L */
    #define MULTICHAR_CONSTANT_L(a,b,c,d) \
        (MULTICHAR_CONSTANT(a,b,c,d) | 0x20202020)

    enum {
        EXT_JPG = MULTICHAR_CONSTANT_L('.','j','p','g'),
        EXT_PNG = MULTICHAR_CONSTANT_L('.','p','n','g'),
        EXT_HTM = MULTICHAR_CONSTANT_L('.','h','t','m'),
        EXT_CSS = MULTICHAR_CONSTANT_L('.','c','s','s'),
        EXT_TXT = MULTICHAR_CONSTANT_L('.','t','x','t'),
        EXT_JS  = MULTICHAR_CONSTANT_L('.','j','s',0),
    } lwan_mime_ext_t;

    const char* lwan_determine_mime_type_for_file_name(char *file_name)
    {
        char *last_dot = strrchr(file_name, '.');
        if (UNLIKELY(!last_dot))
            goto fallback;

        STRING_SWITCH_L(last_dot) {
        case EXT_CSS: return "text/css";
        case EXT_HTM: return "text/html";
        case EXT_JPG: return "image/jpeg";
        case EXT_JS:  return "application/javascript";
        case EXT_PNG: return "image/png";
        case EXT_TXT: return "text/plain";
        }

    fallback:
        return "application/octet-stream";
    }
    ```

### A "static" linked list
* Execution-time difference between a function call and a macro
    ![](https://i.imgur.com/1ShsG6z.png)
* This is why Linux implements some heavily used facilities, such as its linked list data structure, with macros
* First, the compound literal introduced in C99
    * Example 1
        ```c=
        int *p = (int []){2, 4, 6};
        ```
        is equivalent to
        ```c=
        int arr[] = {2, 4, 6};
        int *p = arr;
        ```
    * Example 2
        ```c=
        struct Point {
            int x, y;
        };

        void func(struct Point p); /* assumed to be defined elsewhere */

        int main() {
            func((struct Point){2, 3});
            return 0;
        }
        ```
* Static linked list implementation: the ```cons``` macro on line 3 declares each node with a compound literal
    ```c=
    #include <stdio.h>

    #define cons(x, y) ((struct llist[]){{x, y}})

    struct llist {
        int val;
        struct llist *next;
    };

    int main() {
        struct llist *list = cons(9, cons(5, cons(4, cons(7, NULL))));
        struct llist *p = list;
        for (; p; p = p->next)
            printf("%d", p->val);
        printf("\n");
        return 0;
    }
    ```
* Declaring the linked list statically with a macro lets the compiler lay out and initialize every node, with no runtime allocation such as ```malloc```

### The BUILD_BUG_ON_ZERO macro
* First, bit-fields:
    * In the struct declared on line 5 of the example below, ```signed int a : 1``` means that a is stored in a single bit. Printing a gives -1: a 1-bit signed field can hold only 0 and -1 in two's complement, so storing 1 is an out-of-range conversion whose result is implementation-defined, and the compiler decides that the bit pattern 1 reads back as -1.
        ```c=
        #include <stdbool.h>
        #include <stdio.h>
        bool is_one(int i) { return i == 1; }
        int main() {
            struct { signed int a : 1; } obj = { .a = 1 };
            puts(is_one(obj.a) ? "one" : "not one");
            return 0;
        }
        ```
    * Bit-field alignment:
        * Memory layout of bit-fields inside a struct
            ```c=
            struct S {
                unsigned b1 : 5;
                unsigned : 27; // unused bits fill the first unsigned int (assuming 32 bits)
                unsigned b2 : 6;
            };
            ```
            has the same effect as
            ```c=
            struct S {
                unsigned b1 : 5;
                unsigned : 0; // Force alignment to next unsigned int
                unsigned b2 : 6;
            };
            ```
        * The struct's own alignment still follows the platform's usual rule, that is, the smallest unit the processor reads at a time. Because that unit is normally counted in bytes, bit-fields can usually be packed back to back without any alignment computation.
            ```c=
            #include <stdio.h>
            typedef struct foo {
                int a : 5;
                int b : 4;
                int c : 8;
                int d : 15; /* 5 + 4 + 8 + 15 = 32 bits so far */
                //int e : 1;   uncomment to exceed 32 bits
            }f;
            int main() {
                printf("%d\n", (int)sizeof(f));
                return 0;
            }
            ```
        * Example: ```f``` points at a single ```int``` initialized to 0xFFFF, but ```struct foo``` actually needs 8 bytes. Printing c and d therefore reads memory beyond ```i```, and there is no telling whose values appear. Only a and b are reliable: a = 111 in binary (decimal -1), b = 11 in binary (decimal -1).
            ```c=
            #include <stdio.h>
            struct foo {
                int a : 3;
                int b : 2;
                int : 0; /* Force alignment to next boundary */
                int c : 4;
                int d : 3;
            };
            int main() {
                int i = 0xFFFF;
                struct foo *f = (struct foo *) &i;
                printf("a=%d\nb=%d\nc=%d\nd=%d\n", f->a, f->b, f->c, f->d);
                return 0;
            }
            ```
            Diagram:
            ```
            i = 1111 1111 1111 1111

            Actual memory layout (little endian):
            1111 1111 1111 1111   xxxx xxxx xxxx xxxx
                         b baaa              ddd cccc
            ```
* BUILD_BUG_ON_ZERO()
    * In linux/build_bug.h it is implemented as
        ```c=
        #define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:(-!!(e)); }))
        ```
    * It forces a compile-time error when the expression e is non-zero. When e is zero the code compiles, and with GCC the expression itself evaluates to 0.
    * How the code works
        * !!(e): applies logical NOT to e twice, so the result is 1 whenever e is non-zero and stays 0 when e is 0
        * -!!(e): multiplies that result by (-1), so the result is -1 whenever e is non-zero and stays 0 when e is 0
        * struct { int:-!!(e) }: declares a structure that uses the bit-field trick. A bit-field that occupies -1 bits cannot exist, so compilation fails. A bit-field of 0 bits, on the other hand, is allowed.
    * [Reference 1](http://frankchang0125.blogspot.com/2012/10/linux-kernel-buildbugonzero.html)
    * [Reference 2](https://stackoverflow.com/questions/9229601/what-is-in-c-code)

### max, min macros (using MAX as the example)
* The earliest MAX:
    ```c=
    #define max(x, y) (x) > (y) ? (x) : (y)
    ```
    This suffers from double evaluation.
* Double evaluation: the following code runs printf three times instead of the expected two
    ```c=
    #include <stdio.h>

    #define max(a, b) (a > b ? a : b)

    void doOneTime() { printf("called doOneTime!\n"); }
    int f1() { doOneTime(); return 0; }
    int f2() { doOneTime(); return 1; }

    int main(void) {
        /* Expands to (f1() > f2() ? f1() : f2()), so f2() runs twice */
        int result = max(f1(), f2());
        (void) result;
        return 0;
    }
    ```
* MAX1 version: uses typeof (inside a GNU statement expression) to solve double evaluation by first storing the macro arguments a and b in local variables
    ```c=
    #define max(a, b) ({ \
        typeof (a) _a = (a); \
        typeof (b) _b = (b); \
        _a > _b ? _a : _b; \
    })
    ```
* MAX2 version: checks that a and b have the same type before comparing them
    ```c=
    #define max(x, y) ({ \
        typeof(x) _x = (x); \
        typeof(y) _y = (y); \
        (void) (&_x == &_y); \
        _x > _y ? _x : _y; })
    ```
    Line 4, ```(void) (&_x == &_y);```, compares the pointer types of ```_x``` and ```_y```. If they differ, the compiler emits ```warning: comparison of distinct pointer types lacks a cast```
* MAX3 version (fixes name collisions)
    * An unexpected input such as ```max(x , _x)``` gives the wrong result: the argument ```_x``` is captured by the macro's own local variable ```_x```, so ```_y``` is initialized from the copy of ```x``` rather than from the caller's ```_x```
    * Macro expansion rules
        * When the preprocessor replaces a function-like macro invocation, if a reference to a formal parameter is not an operand of the ```#``` or ```##``` operator, the preprocessor first fully expands the actual argument and then substitutes the expanded argument
        * After the whole invocation has been replaced, if the resulting replacement list still contains other macro references, and the referenced macro is not one that is currently being expanded (otherwise infinite recursion could occur), the preprocessor keeps replacing until no replaceable macro reference remains
    * Expansion rules for ```#``` and ```##``` inside macros
        * Example 1
            ```c=
            #define STR(X) #X
            #define XSTR(X) STR(X)
            #define CONFIG 4
            const char example1[] = STR(CONFIG);  // "CONFIG"
            const char example2[] = XSTR(CONFIG); // "4"
            ```
        * Example 2
            ```c=
            #include <stdio.h>
            #define ___PASTE(a , b) a##b
            #define __PASTE(a , b) ___PASTE(a, b)
            #define ___TOSTR(a) #a
            #define __TOSTR(a) ___TOSTR(a)
            // Swap __PASTE and ___PASTE to experiment
            #define _PRINT(a, b, c, d) ({ \
                printf("%s\n", __TOSTR(__PASTE(a, __PASTE(b, __PASTE(c, d))))); \
            })
            const enum word{ a, b, c, d }w;
            int main() {
                _PRINT(a, b, c, d);
                return 0;
            }
            ```
            [Reference](https://zh-blog.logan.tw/2019/10/13/a-few-findings-on-c-macro/)
    * MAX3 code
        ```c=
        #define __max(t1, t2, max1, max2, x, y) ({ \
            t1 max1 = (x); \
            t2 max2 = (y); \
            (void) (&max1 == &max2); \
            max1 > max2 ? max1 : max2; })
        #define max(x, y) \
            __max(typeof(x), typeof(y), \
                  __UNIQUE_ID(max1_), __UNIQUE_ID(max2_), \
                  x, y)
        ```
    * Start with ```__UNIQUE_ID```
        ```c=
        #define __UNIQUE_ID(prefix) \
            __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__)
        ```
    * Next is the ```__COUNTER__``` inside it: one of the predefined macros. It is a counter that starts at 0 and increases by 1 each time it is expanded ([reference](https://zhuanlan.zhihu.com/p/64479211)), so it can produce serial numbers that never collide
    * Finally, expanding MAX3 with ```gcc -E``` shows the generated names ```__UNIQUE_ID_max1_0``` and ```__UNIQUE_ID_max2_1```
        ```c=
        int x = 1, _x = 2;
        return ({ typeof(x) __UNIQUE_ID_max1_0 = (x);
                  typeof(_x) __UNIQUE_ID_max2_1 = (_x);
                  (void) (&__UNIQUE_ID_max1_0 == &__UNIQUE_ID_max2_1);
                  __UNIQUE_ID_max1_0 > __UNIQUE_ID_max2_1 ? __UNIQUE_ID_max1_0 : __UNIQUE_ID_max2_1;
        });
        ```
* MAX4 version
    * Analysis of ```__is_constexpr```
        ```c=
        #include <stdio.h>
        #define Def 10
        #define __is_constexpr(x) \
            (sizeof(int) == sizeof(*(8 ? ((void *) ((long) (x) *0l)) : (int *) 8)))
        enum test { Enum };
        int main() {
            int Val = 10;
            const int Const_val = 10;
            int a = __is_constexpr(Val);
            int b = __is_constexpr(Const_val);
            int c = __is_constexpr(10);
            int d = __is_constexpr(Def);
            int e = __is_constexpr(Enum);
            printf("a:%d b:%d c:%d d:%d e:%d\n", a, b, c, d, e);
            return 0;
        }
        ```
    * Here the C ```?:``` operator is not acting as an if-else. The macro relies on the rule that determines the type of the result:
        > If both the second and third operands are pointers or one is a null pointer constant and the other is a pointer, the result type is a pointer to a type qualified with all the type qualifiers of the types referenced by both operands. Furthermore, if both operands are pointers to compatible types or to differently qualified versions of compatible types, the result type is a pointer to an appropriately qualified version of the composite type; if one operand is a null pointer constant, the result has the type of the other operand; otherwise, one operand is a pointer to void or a qualified version of void, in which case the result type is a pointer to an appropriately qualified version of void.
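    * ```(int *) 0``` is a null pointer
    * ```(void *) 0``` and ```0``` are null pointer constants
    * Test: ```printf("%d\n", (int)sizeof(*(0 ? (void *) 0 : (int *) 0)));``` Because ```(void *) 0``` is a null pointer constant, the conditional expression has type ```int *```, so this prints ```sizeof(int)```.
    * In ```__is_constexpr```, ```(void *) ((long) (x) * 0l)``` is a null pointer constant only when x is an integer constant expression. Otherwise it is an ordinary ```void *```, the result type becomes ```void *```, and ```sizeof(void)``` (1 as a GNU extension) no longer equals ```sizeof(int)```.

### Differences between inline functions and macros
![](https://i.imgur.com/POFLH0j.png)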