課程簡介和注意須知

--- tags: sysprog --- # 課程簡介和注意須知 ## 作業要求 * 從 [課程簡介和注意須知](https://docs.google.com/presentation/d/1Wo22s5aYuuyry97-z2MMmUmdxhjD0wKTdhxIW5oqjvo/edit?usp=sharing), [第 1 週測驗題](https://hackmd.io/@sysprog/HksKVpUIr) 選出你感興趣的 3 個題組進行作答，並且回答附帶的「延伸問題」 * 應當包含 [課程簡介和注意須知](https://docs.google.com/presentation/d/1Wo22s5aYuuyry97-z2MMmUmdxhjD0wKTdhxIW5oqjvo/edit?usp=sharing) 的 Page 9 到 Page 16 裡頭的 C 語言程式設計議題 * 比照 [課前測驗參考解答: Q1](https://hackmd.io/s/ByzoiggIb) 和 [對 Linked List 進行氣泡排序](https://hackmd.io/s/BklGm1MhZ) 的模式來撰寫共筆，需要詳細分析自己的想法、參閱的材料 (包含 C 語言規格書的章節)，以及==進行相關實驗== * 閱讀 [第一週課程進度](http://wiki.csie.ncku.edu.tw/sysprog/schedule) 所有材料，記錄所見所聞並提問，若有不能理解的部分，請標註出來。善用 HackMD 的語法 `:::info` 和 `:::` 標註你的提問 :::info 像是這樣標註提問 ::: * 將你的共筆加到 [2019q3 Homework1 (作業區)](https://hackmd.io/@sysprog/HJvjIvDIr) * 截止日期： * Sep 25, 2019 (含) 之前 ## Question 1 ### 問題描述根據 [課程簡介和注意須知](https://docs.google.com/presentation/d/1Wo22s5aYuuyry97-z2MMmUmdxhjD0wKTdhxIW5oqjvo/edit#slide=id.p) P.13 所描述 : 以下範例1是合法C99/C11程式，但不是合法C++或C89程式 ```clike for (int i = 0; ;) { int i = 1; return i; } ``` 同時，以下範例2是合法C程式，也是合法C++程式 ```clike for (int i = 0; ;) {{ int i = 1; return i; }} ``` 但為什麼會這樣，兩者差異在哪裡? ### 解釋 #### C89 與 C99 的差異首先，試著以 C89 和 C99 的gcc編譯器試著編譯範例1 ```shell gcc -std=c99 ex1.c # C99，順利編譯成功 ``` ```shell gcc -std=c89 ex1.c # C89 ex1.c: In function 'main': ex1.c:3:1: error: 'for' loop initial declarations are only allowed in C99 or C11 mode for (int i = 0; ;) { ^ ex1.c:3:1: note: use option -std=c99, -std=gnu99, -std=c11 or -std=gnu11 to compile your code ``` 很好，編譯器直接了當的告訴我們 : C89 不支援 'for' loop initial declearations，但 C99 和 C11 可以。所以在 C89 就需要改成這樣 : ```clike int i; for (i = 0; ;) { int i = 1; return i; } ``` #### C 和 C++ 的差異再者，試著編譯看看範例1在 C++ 中到底哪裡不合法了 ```shell gcc ex1.c # 以 C 編譯沒問題 ``` ```shell gcc ex1.cpp # 以 C++ 編譯會出錯 ex1.cpp: In function 'int main()': ex1.cpp:6:10: error: redeclaration of 'int i' int i = 1; ^ ex1.cpp:5:14: note: 'int i' previously declared here for (int i = 0; ;) { ^ ``` 編譯器告訴我問題在於 redeclaration of 'int i' 。但為什麼在C裡面不算重複定義而在C\++裡就算呢? 根據 [DR446](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2244.htm#dr_466) 所說，原因是出在 C 和 C++ 的 for loop control declaration scope 不一樣，在 C 裡面 for 的「宣告」和「body」是兩個不同的變數可視範圍 ( scope )，`for (int i = 0; ;)` 和`{ ... }` ，而在 C++ 裡，for的「宣告」和「body」則共用同一個 scope，`for (int i = 0; ;){ ... }`，所以在此例，C++ 裡面，不能再宣告另一個`i`。同時，裡面也有提到 : for loop initial declaration 最早是 C99 加入的，而之所以會加入就是為了**變得跟 C++ 一樣** ( 所以 C++ 最一開始也是把「宣告」和「body」看成2個不同的 scope )，但 C++ 最終卻採用 ISO/IEC 14882:1998 的規則，這才導致了 C 和 C++ 的分道揚鑣很顯然的，在 for loop 規範上 C++ 比 C 還嚴格，而之所以會有這種規範，主要是考慮到這種變數可視範圍的特性可能會造成隱微難查的BUG，比如文中有此範例 : ```clike static inline int f (void) { for (int i = 0; ; ) { long i = 1; // valid C, invalid C++ // ... return i; // (perhaps unexpectedly) returns 1 in C } } ``` 可以看到，在 C 裡面`i`會回傳1，但這可能不是開發者所期望的行為。規格書作者作者認為，這種隱性的 scope 規則會導致潛在的 BUG ，實在不是什麼好現象。因此他也在 6.2.1 Scopes of identifiers 中**建議**撰寫時比照C++的規範 : >*Names declared in clause-1 of the for statement are local to the for statement and shall not be redeclared in a subsequent condition of that statement nor in the outermost block of the controlled statement.* :::warning 這一段有疑慮 : DR466 中是如此描述，但我自己在C99規格書的 6.2.1 Scopes of identifiers 卻沒有找到此段文字。 ::: 至於範例2，則是跟上述建議不同的反向操作 : 把 C++ 的行為改得跟 C 一樣。只是把變數作用範圍用==Block==寫的清清楚楚。在 C 和 C++ 裡，Block是指由左右大括號`{`、`}`所包起來的空間，Block 定義了變數的作用範圍，且 Block 可以是巢狀式的。比如以下範例 : ```clike #include<stdio.h> int main() { { int x = 1, y = 2; { printf("x = %d, y = %d\n", x, y); int y = 3; printf("x = %d, y = %d\n", x, y); } } return 0; } ``` 執行結果將是 : ``` x = 1, y = 2 x = 1, y = 3 ``` 由此可以觀察到 Block 的幾個規則 : 1. 內圈的 Block 可以「看見」外圈 Block 定義的變數 2. 當內外圈 Block 的變數名稱衝突，那麼該外層變數的可視性 (visibility) 將在內層變數宣告的同時被終止。也因此，原本的範例2，可以將其視為兩個不同的scope A、B ```clike= for (int i = 0; ;) // A { // A { // B int i = 1; // B return i; // B } // A } // A ``` 如此，`i`就不會有 re-declaration 的問題了，因為line 1的`i`在line 4、5 是不可視狀態 ### 參考資料 - [DR466](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2244.htm#dr_466) - [變數可視範圍（Scope）](https://openhome.cc/Gossip/CppGossip/Scope.html) - [Introduction to Programming Languages/Scoping with Blocks](https://en.wikibooks.org/wiki/Introduction_to_Programming_Languages/Scoping_with_Blocks) - [Scope rules in C](https://www.geeksforgeeks.org/scope-rules-in-c/) <br> <br> ## Question 2 ### 問題描述根據 [課程簡介和注意須知](https://docs.google.com/presentation/d/1Wo22s5aYuuyry97-z2MMmUmdxhjD0wKTdhxIW5oqjvo/edit#slide=id.p) P.15 所描述 : 以下範例是一個取絕對值的 funciton abs()，但為什麼這個 function 可行? 又`abs(-2147483648)`結果會是什麼? ```clike= #include <stdint.h> int32_t abs(int32_t x) { int32_t mask = (x >> 31); return (x + mask) ^ mask; } ``` ### abs(x) 的預期行為 C 語言的 signed integer 是採用二的補數編碼。而在二補數裡，一數 $x$ 與相反數 $-x$ 的編碼關係為，$-x= \neg(x-1)$。因此，我們必須設計一個 function ，使其為 : $$ abs(x)= \begin{cases} x & x\ \geq\ 0\\ \neg(x-1) & x\ <\ 0 \end{cases} $$ 因此，上述範例，我們也必須分成`x`為正數和 `x`為負數兩種情況探討其行為。 #### x為正數根據二補數的規律我們可以知道，若一數`x`為正，則MSB為0 ; 若`x`為負，則MSB為1。所以當`x`為正，範例line 3的`mask`將為 0 : ```clike int32_t mask = (x >> 31); ``` 而也因此，line 4的回傳值將會是原封不動的`x` ```clike return (x + 0) ^ 0; ``` #### x為負數但當`x`是負的，事情就有點複雜了。我們知道最終line 4的回傳值必須是 $\neg(x-1)$，為此，變數`mask`必須是 -1，才能達成我們的需求 : ```clike return (x + (-1)) ^ -1; ``` 因為根據 XOR 的真值表 : | Bit A | Bit B | A XOR B| |-------|-------|--------| |0 |0 |0 | |1 |0 |1 | |0 |1 |1 | |1 |1 |0 | 我們可以看到，當 B 為 0 時， A XOR B = A ; 但是當 B 為 1 時，A XOR B = $\neg$A，所以我們可以用 XOR 來做位元反轉 ( 0、1的反轉 ) 而當我們想讓一個`int32_t`的變數每一個 bit 都反轉，那這個值就必須是 `111 ..(28).. 1`，也就是有號數的 `-1`。所以當`mask`為`-1`，我們就可以得到 $\neg(x-1)$。看裡來很完美。現在唯一的問題是 : `mask`真的是 -1 嗎 ? 很難說 ! ### Shift Operator 我們重新回來看 line 3 : ```clike int32_t mask = (x >> 31); ``` 根據 [C99 Standard](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf) 6.5.7 Bitwise shift operators 描述 >The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / $2^{E2}$. If E1 has a signed type and a negative value, the resulting value is implementation-defined. 從上可知，當 `x` 是正的，`>>`時 MSB 毫無疑問會補 0，如此才能達到「除以$2^{E2}$後的商」的效果。但是當`x` 是負的，則`>>`的實際行為則是 implementation-defined ，由編譯器去決定要怎麼實做。雖然按照常理， MSB 應該會補 1，以達到 `(-x)/2` 的結果。若是如此則`mask`會是`-1`，範例也將能正確算出 abs(x)。但有沒有可能編譯器實際上就是補 0，所以`mask`最終變成`0x00 ... 01`，也就是 1 ? 畢竟規格書沒明確規範編譯器一定要怎麼做。 #### gcc 根據 [Re: right shift of signed type - GCC](https://gcc.gnu.org/ml/gcc/2000-04/msg00152.html) 所描述 : >For right shifts, if the type is signed, then we use an arithmetic shift; if the type is unsigned then we use a logical. 而所謂 Arithmetic shift 為 : ![arithmetic_shift](https://upload.wikimedia.org/wikipedia/commons/thumb/3/37/Rotate_right_arithmetically.svg/300px-Rotate_right_arithmetically.svg.png) - Right shift 時，空出的 bit 填入原本的 MSB - Left shift 時，空出的 bit 填 0 與之相對，Logical shift 為 : ![logical_shift](https://upload.wikimedia.org/wikipedia/commons/thumb/6/64/Rotate_right_logically.svg/300px-Rotate_right_logically.svg.png) - Right 和 Left shift，空出的 bit 都填 0 因此我們可以知道 : 至少在 gcc 下，right shift 在負數情況下是會補1，該範例可以正確得到絕對值。 ### abs(-2147483648) 再來，我們執行看看`abs(-2147483648)`到底會產出什麼結果 ```clike #include <stdint.h> #include <stdio.h> int32_t abs(int32_t x) { int32_t mask = (x >> 31); return (x + mask) ^ mask; } int main() { printf("%d\n", abs(-2147483648)); } ``` ```shell gcc -o ex1 ex1.c && ./ex1 -2147483648 ``` 看起來 `abs(x)` 似乎是失靈了，為什麼會這樣呢? #### 從原理解讀二補數之前雖然說對二補數編碼做 $\neg(x-1)$ 運算可以翻轉十進位的正負號表示，但嚴格來講，這個運算是「取得 $x$ 的反元素」因為二補數是一個阿貝爾群，因此他的加法運算必須遵守 5 個規則 : 1. 封閉性: 若 a 和 b 是集合 G 中的元素，於是 `(a + b)` 也是集合 G 中的元素; 2. 結合律: `(a + b) + c = a + (b + c)` 3. 存在單位元素 0，使得 `a + 0 = 0 + a = a` 4. 每個元素都有反元素 (或稱「逆元」)，也就是說：對於任意 a，存在 b，使得 `a + b = 0` 5. 交換律: `a + b = b + a` 再者，我們得了解 -2147483648 代表的是 int32_t 可以表示的最小值，即`1000 ..(27).. 0`，對於這樣的數，其反元素會等於自己。因為根據群的上述規則 4，「元素 + 反元素 = 0」。那麼以4位元舉例，4位元能表示的最小值是`1000`，則 `1000 + ? = 0000`，我們可以知道，反元素`?`也同樣是`1000`，因為`1000 + 1000 = [1]0000`，如此才符合定義。這對 32 位元也是同理，`1000 ..(27).. 0`的反元素也會是`1000 ..(27).. 0` ### 結論以上 C 語言函式要能夠稱為「對於輸入參數 `int32_t x` 取絕對值」，需要有兩個前提 : 1. 使用的編譯器，對於 $x<0$ 時， $x$ << $n$ 的執行結果要等同於 $x/2^n$ 的商 2. `x` 的範圍要限定在 -2147483647 ~ 2147483647 ### 參考資料 - [解讀計算機編碼](https://hackmd.io/@sysprog/rylUqXLsm) - [你所不知道的C語言：數值系統篇](https://hackmd.io/@sysprog/BkRKhQGae?type=view) <br> <br> ## Question 3 ### 問題描述根據 [課程簡介和注意須知](https://docs.google.com/presentation/d/1Wo22s5aYuuyry97-z2MMmUmdxhjD0wKTdhxIW5oqjvo/edit#slide=id.p) P.11 所描述 : 以下範例能否編譯並執行? 若可，為什麼呢? ```clike #include <stdio.h> int main() { return (*******puts)("Hello"); } ``` ### 解釋首先，我們直接對此程式做執行編譯 : ``` gcc -o ex1 ex1.c ./ex1 Hello ``` 可以看到，這個範例確實是合法的 C 語言程式，而且其行為似乎與原始的 puts() 一樣。再來我們試著以 gdb 了解 `puts` 和 `*puts` 的詳細資訊 : ```gdb gdb ex1 (gdb) print puts $1 = {<text variable, no debug info>} 0x7ffff7a7e5f0 <puts> (gdb) print *puts $2 = {<text variable, no debug info>} 0x7ffff7a7e5f0 <puts> ``` 可以看到，`puts` 和 `*puts`沒有任何差別，所指向的地址都是0x7ffff7a7e5f0。再看看 [C99](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf) 規格書對此的解釋 : - [6.3.2.1] >A function designator is an expression that has function type. Except when it is the operand of the sizeof operator or the unary & operator, a **function designator** with type "function returning type" **is converted to** an expression that has type "pointer to function returning type". - [6.5.3.2] >The unary * operator denotes indirection. **If the operand points to a function, the result is a function designator** 因為 function name 本身就是一個 function designator，所以 `puts` 是 function designator，而根據 [6.3.2.1] 我們知道 : 當要對其進行 `*` 操作時， function designator 會自動轉換成 function pointer，變成 dereference of function pointer 而因為它變成 function pointer，再根據 [6.5.3.2] 得知 : dereference of function pointer 最後會變成 function designator ``` puts // function designator *puts // *(function designator) -> *(function pointer) -> function designator ``` 因為 `*puts == puts`，同理`*(*puts) == *(puts) == puts`。所以我們得到結論 : 不管對函式加了幾個`*`，其操作最終都會回到原點。 ### 參考資料 1. [c語法問題-結構成員指到函式記憶體位址](http://www.programmer-club.com.tw/ShowSameTitleN/c/39038.html) 2. [Lvalues and rvalues](https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.0/com.ibm.xlc131.aix.doc/language_ref/lvalue.html)