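# 2018q3 Homework3 (review)

### hex2
- [ ] Work out what the following code does:
```C
void hex2(unsigned int x) {
    do {
        char c = "0123456789abcdef" [x & 0xf];
        printf("char %c for %d\n", c, x);
        x >>= 4;
        printf("char %c for %d\n", c, x);
    } while (x);
}
```
:::success
Follow-up: find code in the [glibc](https://www.gnu.org/software/libc/) source tree with a similar purpose and style, and discuss its implementation technique.
:::

---

#### Thoughts and reasoning
```C
#include <stddef.h>
#include <stdio.h>

struct buffer {
    unsigned char *data;
    size_t length;
};

/* Print LABEL followed by every byte of BUFFER as two hex digits. */
static void print_hex (const char *label, struct buffer buffer)
{
    printf (" %s ", label);
    unsigned char *p = buffer.data;
    unsigned char *end = p + buffer.length;
    while (p < end) {
        printf ("%02X", *p & 0xFF);
        ++p;
    }
    putchar ('\n');
}
```
The code above is excerpted from [glibc](https://github.com/lattera/glibc/blob/895ef79e04a953cac1493863bcae29ad85657ee1/resolv/tst-ns_name.c#L181). Comparing the two:

* print_hex (glibc)
    * The glibc version prints a buffer of arbitrary length.
    * It relies on the `%02X` conversion specifier that printf already provides to produce the hexadecimal characters, and it emits 8 bits (two hex digits) per call.
* hex2 (exam)
    * Although the parameter is declared as `unsigned int`, the do-while loop does not depend on the value being exactly 4 bytes wide. Even so, glibc takes a pointer plus a length, which makes it the more flexible of the two.
    * It prints 4 bits at a time and uses the predefined `"0123456789abcdef" [x & 0xf]` to convert the actual value into its hex representation. This line looks puzzling at first, but it is simply subscripting a string literal, a technique I had not used before. Here there is no need to worry about `[x & 0xf]` leaving the range 0~15, because `&` already bounds the result. In other situations, however, we do have to check that the index stays within the range we allow.
    * Going one step further, I tried `printf("%d", "abc"[4])`, which indexes past the end of the array (`"abc"` has valid indices 0 through 3, counting the terminating `'\0'`). It did not raise a segmentation fault but printed an unexpected value. That does not make the access legal: reading past the end of an array is undefined behavior in C. It merely did not fault because memory protection is applied per page (for example read-only versus read/write), and the byte after the literal happened to lie in a readable page. From this point of view the glibc version, which carries an explicit length, is safer.

---

### Portable Bit Fields in packetC
- [ ] Consider the following code:
```C=
#include <stdio.h>
#include <stdint.h>
struct test {
    unsigned int x : 5;
    unsigned int y : 5;
    unsigned int z;
};
int main() {
    struct test t;
    printf("Offset of z in struct test is %ld\n",
           (uintptr_t) &t.z - (uintptr_t) &t);
    return 0;
}
```
Running it on GNU/Linux x86_64 produces the following output:
```
Offset of z in struct test is 4
```
If lines 10 and 11 are replaced with the following:
```C=10
printf("Address of t.x is %p", &t.x);
```
what happens?

---

#### Thoughts and reasoning
* A bit-field specifies how many bits of a struct are reserved for a given member. In the code above, x and y each take 5 bits and share a single `unsigned int` storage unit. The offset of z is 4 bytes because of alignment: to make memory accesses convenient for the CPU, the compiler places z on a 4-byte boundary. The conclusion is that, on this platform, the offset of z is 4 as long as the bit-fields before it total no more than 32 bits.
```C
// The members before z occupy 32 bits
struct test32 {
    unsigned int x : 16;
    unsigned int y : 16;
    unsigned int z;
};
// Offset of z in struct test32 is 4

// The members before z occupy 33 bits
struct test33 {
    unsigned int x : 16;
    unsigned int y : 17;
    unsigned int z;
};
// Offset of z in struct test33 is 8
```
* Substituting the code from the question into the program does not produce a runnable binary. Compilation fails with `error: cannot take address of bit-field`, which is not what I expected. Looking further into the C standard, section 6.5.3.2 gives the constraint on the `&` operator: it cannot be applied to a bit-field.
> The operand of the unary & operator shall be either a function designator, the result of a [] or unary * operator, or an lvalue that designates an object that is **not a bit-field** and is not declared with the register storage-class specifier.
* The standard also restricts the type of a bit-field to _Bool, signed int, unsigned int, or another type the implementation chooses to accept, so the code below is rejected.
> A bit-field shall have a type that is a qualified or unqualified version of _Bool, signed int, unsigned int, or some other implementation-defined type.
```C
struct test {
    float x : 16;
    unsigned int y : 16;
    unsigned int z;
};
// error: bit-field 'x' has invalid type
```
* The layout around bit-fields can also be adjusted with a compiler directive. `#pragma pack(n)` sets the maximum alignment, in bytes, used for the members that follow; the accepted values of n are 1, 2, 4, 8 and 16. With it, the code below produces a different result.
```C
#include <stdio.h>
#include <stdint.h>
#pragma pack(1)
struct test {
    unsigned int x : 5;
    unsigned int z;
};
int main() {
    struct test t;
    printf("Offset of z in struct test is %ld\n",
           (uintptr_t) &t.z - (uintptr_t) &t);
    return 0;
}
// output: Offset of z in struct test is 1
```

---

### Signal System call
- [ ] The [pointer notes](https://hackmd.io/s/HyBPr9WGl) mention the prototype of the signal system call:
```C
void (*signal(int sig, void (*handler)(int))) (int);
```
How should it be parsed? Hint: see the manpage [signal(2)](http://man7.org/linux/man-pages/man2/signal.2.html)
:::success
Follow-up: explain what signal(2) does, and find a real-world use of it on GitHub.
:::

#### Thoughts and reasoning
* Start with the function prototype. The snippet below, taken from the manpage, shows that signal returns a function pointer and takes an int plus another function pointer as arguments.
```C
#include <signal.h>
typedef void (*sighandler_t)(int);
sighandler_t signal(int signum, sighandler_t handler);
```
Reading further in the manpage, the signal system call registers a handler for signum. The handler can be a user-defined function, or it can be SIG_IGN or SIG_DFL: SIG_IGN ignores the signal and SIG_DFL restores the system default action. SIGKILL and SIGSTOP are exceptions and can be neither caught nor ignored. The return value of signal is the previous handler for that signal (or SIG_ERR on failure). A likely use case is a short window during which we need our own handling for a particular signal, after which the original handling is restored. The snippet below shows this usage.
```C
#include <signal.h>
#include <unistd.h>

void my_signal_handler(int signum) {
    (void) signum;
    /* write() is async-signal-safe; printf() is not */
    write(STDOUT_FILENO, "I got a signal\n", 15);
}

int main() {
    void (*old_handler)(int);
    old_handler = signal(SIGINT, my_signal_handler);
    // do some stuff here ....
    signal(SIGINT, old_handler);
    return 0;
}
```
The notes further down the manpage point out, however, that the sigaction system call is now the more common and the recommended choice. The behavior of signal varies across UNIX versions, and, for example, its effect in a multithreaded process is unspecified.
* On GitHub, what I found was the [sigaction](https://github.com/illumos/illumos-gate/blob/master/usr/src/lib/libc/port/threads/sigaction.c) implementation inside [illumos-gate](https://github.com/illumos/illumos-gate). My impression is that, just as the manpage advises, there are not many projects that actually call signal. What I saw instead were many projects implementing their own sigaction, using signal.h only for the signal number definitions.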
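
Since the manpage recommends sigaction, the same "install a handler for a while, then restore the old one" pattern can be sketched with it. This is only a minimal illustration, not code from the manpage or from illumos-gate: the names `got_sigint`, `on_sigint`, `install_sigint`, `restore_sigint` and `demo_temporary_handler` are hypothetical, and the handler only sets a flag instead of printing so that it stays async-signal-safe.

```C
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Flag set by the handler (hypothetical name). sig_atomic_t is the type
   that can be written safely from inside a signal handler. */
static volatile sig_atomic_t got_sigint = 0;

/* Hypothetical handler: record that SIGINT arrived and do nothing else. */
static void on_sigint(int signum)
{
    (void) signum;
    got_sigint = 1;
}

/* Install on_sigint for SIGINT and save the previous disposition in *old.
   Returns 0 on success and -1 on failure, like sigaction itself. */
static int install_sigint(struct sigaction *old)
{
    struct sigaction sa;

    assert(old != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);   /* block no extra signals in the handler */
    sa.sa_flags = 0;
    return sigaction(SIGINT, &sa, old);
}

/* Put back whatever disposition was active before install_sigint. */
static int restore_sigint(const struct sigaction *old)
{
    return sigaction(SIGINT, old, NULL);
}

/* Hypothetical demo: handle SIGINT ourselves for a short window, deliver
   one SIGINT to this process, then restore the old handling.
   Returns 1 if the handler ran, 0 if it did not, -1 on error. */
int demo_temporary_handler(void)
{
    struct sigaction old;
    int seen;

    if (install_sigint(&old) != 0)
        return -1;
    raise(SIGINT);              /* the handler runs before raise returns */
    seen = got_sigint;
    if (restore_sigint(&old) != 0)
        return -1;
    return seen;
}
```

Calling `demo_temporary_handler()` from `main` exercises the whole window. Unlike signal, sigaction hands back the complete previous `struct sigaction` (handler, mask and flags), so the restore step puts back exactly what was there before.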