2018q3 Homework6

2018q3 Homework6 === contributed by <`brad84622`, `amikai`, `AlecJY`> # 題目 10 ## solution 題目給定一段 atoi 的程式碼，並要求補完補完如下，於下方說明 ```clike= #include<stdio.h> #include <stdbool.h> #include<string.h> #include <stdint.h> #define ULONG_MAX ((uint64_t)(~0UL)) /* 0xFFFFFFFF */ #define LONG_MAX (ULONG_MAX >> 1U) /* 0x7FFFFFFF */ #define LONG_MIN (~LONG_MAX) /* 0x80000000 */ static inline bool is_space(char c) { return ((c == ' ') || (c == '\t')); } int64_t my_atoi(const char *str) { const char *s = str; char c; uint64_t acc,cutlim,cutoff; int neg = 0, any,base; base = 10; // Skip white space and pick up leading +/- sign if any. do { c = *s; s++; } while (is_space(c)); if (c == '-') { neg = 1; c = *s; s++; } else if (c == '+') { c = *s; s++; } else { // No sign character } any = 0; cutoff = (neg != 0) ? LONG_MIN : LONG_MAX; cutlim=(cutoff%base); acc=0; cutoff = (LONG_MAX/base); while ((c >= '0') && (c <= '9')) { c=c-'0'; if((acc>cutoff)||((acc==cutoff)&&(c>cutlim))){ any=-1; break; } else{ acc=acc*10; acc=acc+c; c=*s; s++; } } if (any < 0) { acc = (neg != 0) ? LONG_MIN : LONG_MAX; } else if (neg != 0) { acc = ~acc + 1; } else { // There is no overflow and no leading '-' exists } return acc; } ``` 這段程式碼是將輸入的字串轉成整數型態，正負號由neg處理，故比較皆用正值，若overflow則給定any=-1並跳出迴圈，後續則會回傳最大(最小)值。 - 主要是填寫 while 區塊並對函數型態(`int 改成 int64_t`)進行修改 ```clike= while ((c >= '0') && (c <= '9')) { c=c-'0'; if((acc>cutoff)||((acc==cutoff)&&(c>cutlim))){ any=-1; break; } else{ acc=acc*10; acc=acc+c; c=*s; s++; } } ``` - `if((acc>cutoff)...)`:因cutoff是最大值除以十，故當acc>cutoff時acc再乘以十必然會overflow;同理因cutlim是最大值mod10，當c>cutlim時acc乘以十加上c必然overflow。 - 考量到 64/32bit 皆要能運行，以64bit為主對overflow的部份進行修改 ```clike cutoff = (cutoff/base); //在32bit底下負數除法會出錯 cutoff = (LONG_MAX/base); //改為正數避免此問題 ``` # 題目 10 ## solution ```clike= struct s { int field1; char field2; char field3; }; int main() { return (int) &((struct s *) 0)->field3; } ``` 在 gcc 搭配 64-bit GNU/Linux 編譯和執行,返回值會是什麼呢?請搭配 C 語言規格和編譯器最佳化來解釋原因,並且在 Linux 核心程式碼舉出類似用法。先看 C 語言規格的兩句話: ``` An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant ``` 所以 `null pointer` 在規格裡的定義就是 `void *` ``` Thus, &*E is equivalent to E (even if E is a null pointer), and &(E1[E2]) to ((E1)+(E2)). It is always true that if E is a function designator or an lvalue that is a valid operand of the unary & operator, *&E is a function designator or an lvalue equal to E. If *P is an lvalue and T is the name of an object pointer type, *(T)P is an lvalue that has a type compatible with that to which T points. ``` C 語言規格定義了 `&*E` 就是 `E` ( 就算是 null pointer 也是如此), 有趣的是那 `&(*E).m` 是什麼呢? C 語言規格裡並沒有提到尷尬的是 `&((struct s *) 0)->field3` 可以替換成這樣 `&(*(struct s *) 0).field3` 也就是 `&(*E).m` 這種形態, 所以 C99 規格沒有提及到此種情況其實這也是 linux kernel 在 `offsetof` 的實做方式參考 - [Does &((struct name *)NULL -> b) cause undefined behaviour in C11?](https://stackoverflow.com/questions/26906621/does-struct-name-null-b-cause-undefined-behaviour-in-c11) 使用 gcc 編譯之後(使用O0), 使用反組譯來看看 ``` Dump of assembler code for function main: 0x00000000000005fa <+0>: push %rbp 0x00000000000005fb <+1>: mov %rsp,%rbp 0x00000000000005fe <+4>: mov $0x5,%eax 0x0000000000000603 <+9>: pop %rbp 0x0000000000000604 <+10>: retq ``` gcc 實際的行為根本就沒有發生 dereference, 而是在編譯時期就已經把地址算好了 ## offsetof linux kernel 裡的 offsetof ```c= // include/linux/stddef.h #ifdef __compiler_offsetof #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER) #else #define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER) #endif // include/linux/compiler_types.h #define __compiler_offsetof(a, b) __builtin_offsetof(a, b) ``` 可以看到實做方式, 如果編譯器有定義就使用編譯器 `__builtin_offsetof` 沒有就使用 `((size_t)&((TYPE *)0)->MEMBER)` 在我的實做我會採用這種方式 ```c= #define offsetof(TYPE, MEMBER) ({ \ TYPE __dummy; \ (size_t)((char *)&__dummy.MEMBER - (char *)&__dummy);}) ``` # 題目 11 ## solution ``` C= #include <math.h> #include <stdbool.h> #include <stdint.h> void ieee754_b32_encode(float x, char out[4]) { bool sign = x < 0; uint8_t exponent; uint32_t fraction; if (isinf(x)) { exponent = 0xFF; fraction = 0; } else if (isnan(x)) { exponent = 0xFF; fraction = 0x7FFFFF; } else { if (sign) x = -x; int e = 0; fraction = frexp(x, &e) * ((uint32_t) 2 << 23); if (e <= 126) { fraction = 0; exponent = 0; } else { exponent = e + 126; if (e + 126 > 0xFF) { exponent = 0xFF; fraction = 0; } } } out[0] = ((sign << 7) & 0x80) | ((exponent >> 1) & 0x7F); out[1] = ((exponent << 7) & 0x80) | ((fraction >> 16) & 0x7F); out[2] = (fraction >> 8) & 0xFF; out[3] = fraction & 0xFF; } ``` 題目給了這段程式碼，說明這是將 IEEE 754 單精度浮點數轉換成特定 32-bit 表示法，並且要求寫出將轉換出的結果轉換回 IEEE 754 單精度浮點數的程式碼，還要指出原始程式碼的疏失。首先先看一下原來的程式碼流程，這支程式碼輸入一個浮點數後會先確認浮點數的正負號，確認後會檢查浮點數是不是 infinity 或是 not a number ，如果是的話會直接做特殊處理。之後會將浮點數取絕對值，然後放進 `frexp()` 計算， `frexp()` 會回傳一個 0.5 ~ 1 之間的浮點數，以及整數 e ，使得 $原始值 = e^2 \times 回傳值$ ，之後將回傳值乘以 $2^{23}$ 轉成整數後的最低 23 bits 就是 mantissa，而 e 這個指數部分因為回傳值是 0.5 ~ 1 之間所以跟一般 IEEE 754 表示得到的指數部分多 1 ，接下來在第 20 行的比對我覺得有很大的問題， e 超過126的數字只有一點點，所以會導致大多數的浮點數最後輸出結果都一樣，達不到轉換表示法的目的，所以覺得這邊可能是題目要求指出的程式碼疏失之一，這邊的判斷式改成 `e <= -126` 會比較合理，雖然還是有點問題。第 24 行將 e 加上 126 算出 exponent ，並且判斷數字是否為無限大。最後一步是將剛剛算出來的 sign 、 exponent 、 fraction 依照跟 IEEE 754 一樣的規則，第一個 char array 的第一個 bit 放 sign ，後面接著放 exponent ，多的 bit 放進 array 的下一個元素，再將 faction 忽略最高位的那個 bit 後依照一樣的方式放進去。再來就來看一下放進一些比較特別的數字所發生的事情，放入無限大和 NaN 由於一開始有做特殊處理，所以不會有問題，只是所有的 NaN 都被處理成 0x7FFFFF ，會遺失一些訊息。再來是放入 denormalized number ， denormalized number 放進 `frexp()` 之後產生的 e 會是 -126 ，所以後面有檢察如果是的話將 exponent 和 fraction 設定為 0 ，但是在一般的 IEEE 754 裡面如果把 exponent 和 mantissa 都設為 0 代表的數字是 0 ，所以這邊把 fraction 改成其他數字會比較好。然後是放入 0 ， `frexp()` 有提到如果放入 0 的話 e 和回傳值都會是 0 ，這在這段程式碼沒有特別處理，導致後面算出來的結果會讓 0 跟 0.5 的表示值一模一樣，所以 0 也需要特別判斷。為了解決這些問題，將程式碼修改成以下這樣 ``` C= #include <math.h> #include <stdbool.h> #include <stdint.h> void ieee754_b32_encode(float x, char out[4]) { bool sign = x < 0; uint8_t exponent; uint32_t fraction; if (isinf(x)) { exponent = 0xFF; fraction = 0; } else if (isnan(x)) { exponent = 0xFF; fraction = 0x7FFFFF; } else if (x == 0) { exponent = 0; fraction = 0; } else { if (sign) x = -x; int e = 0; fraction = frexp(x, &e) * ((uint32_t) 2 << 23); if (e <= -126) { fraction = 1; exponent = 0; } else { exponent = e + 126; if (e + 126 > 0xFF) { exponent = 0xFF; fraction = 0; } } } out[0] = ((sign << 7) & 0x80) | ((exponent >> 1) & 0x7F); out[1] = ((exponent << 7) & 0x80) | ((fraction >> 16) & 0x7F); out[2] = (fraction >> 8) & 0xFF; out[3] = fraction & 0xFF; } ``` 接下來是將編碼過的數字轉換回浮點數的轉換程式，因為實際上格式跟普通的 IEEE 754 沒什麼差別，所以可以直接把每個 byte 都塞進 float 裡面就轉換回來了。 ``` C= #include <stdint.h> float ieee754_b32_decode(char in[4]) { float out; uint32_t *pout = (uint32_t *) &out; *pout = ((uint32_t) in[3]) & 0xFF; *pout |= ((uint32_t) in[2] & 0xFF) << 8; *pout |= ((uint32_t) in[1] & 0xFF) << 16; *pout |= (uint32_t) in[0] << 24; return out; } ```