2018q3 Homework1

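# 2018q3 Homework1
contributed by < [p61402](https://github.com/p61402) >

###### tags: `sysprog`

## Pointers

## Understanding Declarations

```C
(*(void(*)())0)();
```

In this code, `(void(*)())0` converts `0x0` into a pointer to "a function that takes no arguments and returns nothing". If we write `ptr` for `(void(*)())0`, then `(*ptr)()` calls that function.

```C
typedef void (*funcptr)();
(* (funcptr) 0)();
```

This code is equivalent to `(*(void(*)())0)()`, and both will normally cause a [segmentation fault](https://en.wikipedia.org/wiki/Segmentation_fault). According to Wikipedia, the cause is "Attempting to access a nonexistent memory address (outside process's address space)". To keep a process from accessing memory outside its own address space, the operating system provides a [Memory Protection](https://en.wikipedia.org/wiki/Memory_protection) mechanism.

## Revisiting the C Language Specification

- C99 [3.14] defines an **object** as: region of data storage in the execution environment, the contents of which can represent values.
- In pointer operations, **&** means "take the address" and is read "address of".
- C99 [6.2.5]: a pointer type may be derived from a function type, an object type, or an incomplete type, and provides a reference to an entity of the referenced type.

## void *

`void` is an incomplete type. C99 [6.2.5] states that an incomplete type describes an object but lacks the information needed to determine its size. `void *` is a pointer to this incomplete type, so what is it actually for? An answer to [this Stack Overflow question](https://stackoverflow.com/questions/11626786/what-does-void-mean-and-how-to-use-it) uses `qsort` as an example.

```C
void qsort(void *base, size_t nmemb, size_t size, int (*compar)(const void *, const void *));
```

The first parameter, `void *base`, accepts a pointer to any object type, for example a pointer to `int`, `double`, or `char`. This lets a single function work generically across element types, which is similar in effect to overloading.

:::warning
:question: Question: According to the manual page for `malloc`, its return value is a `void *` pointing to the allocated memory.
```C
void *malloc(size_t size);
```
The instructor's notes say that `void *` exists so that users cast explicitly and thereby avoid undefined behavior. However, when reading other people's code I often see `malloc` used without an explicit cast, and the program still runs correctly. So when calling `malloc` in C, does it make any difference at run time whether implicit or explicit conversion is used? If not, which is better? Or does each have its own pros and cons?
```C
int *arr = malloc(sizeof(int) * length);          // implicit conversion
int *arr = (int *) malloc(sizeof(int) * length);  // explicit conversion
```
:::

So I looked it up in the C specification:

- C11 [6.3.2.3] A pointer to void may be converted to or from a pointer to any object type. A pointer to any object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.
- C11 [6.5.16.1] In simple assignment (=), the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand.

From these two points we know that in C, under simple assignment (=), a `void *` can be implicitly converted to another object pointer type. Whether anything can go wrong during that conversion is not mentioned. I then found [this heavily discussed Stack Overflow question](https://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc) on whether the result of `malloc` should be cast explicitly. Opinions differ, and the main arguments are:

1. In C the explicit cast is unnecessary, because `void *` is converted automatically and safely to other pointer types; the cast only clutters the code and makes it harder to read.
2. In C++, however, the explicit cast is required, otherwise compilation fails as shown below, so casting `void *` explicitly makes the code more portable.

```C
error: invalid conversion from ‘void*’ to ‘int*’ [-fpermissive]
 int *arr = malloc(sizeof(int) * 10);
```

I personally do not support the second argument. C and C++ are by now fundamentally different languages with very different characteristics, and mixing the two is not a good idea. Moreover, C++ has `new` for allocating memory, which is more capable and more convenient than `malloc`, so `new` is the better choice in C++ code. [Wikipedia](https://en.wikipedia.org/wiki/C_dynamic_memory_allocation#Type_safety) also discusses the pros and cons of both approaches. Putting this together, I think that in C, since `void *` is converted implicitly under simple assignment (=) and the result is the same with or without the cast, there is no need to cast the result of `malloc`.

## A Pointer to a pointer

A pointer to a pointer is often used to let a function modify a pointer that belongs to its caller, so that memory allocated inside the function stays reachable after the function returns. In my experience I often combine a pointer to a pointer with `malloc` to create a two-dimensional array. In the code below, `arr` points to a block holding 3 elements of type `int *`, and each of those three pointers then points to a block holding 4 elements of type `int`, which yields a 3\*4 array.

```C
int i;
int **arr = malloc(3 * sizeof(int *));
for (i = 0; i < 3; i++)
    arr[i] = malloc(4 * sizeof(int));
```

:::info
Have you considered whether free() needs two levels of loops here? What happens if someone only calls `free(arr);`? And how could this be improved?
Hint: memory leak
:notes: jserv
:::

> To free(), work in reverse: first release the memory that was allocated later, and only then release the block that points to it, as in the code below.
> ```C
> for (i = 0; i < 3; i++)
>     free(arr[i]);
> free(arr);
> ```
> For the case where only `free(arr)` is called, I ran a small test. I set the first element of the two-dimensional array, `arr[0][0]`, to 5, declared a pointer to `int` named `a` that points to `&arr[0][0]`, and tried the different ways of releasing memory to see whether a memory leak results.
> ```C
> int i = 0;
> int *a;
> int **arr = malloc(3 * sizeof(int*));
> for (i = 0; i < 3; i++)
>     arr[i] = malloc(4 * sizeof(int));
>
> arr[0][0] = 5;
> a = &arr[0][0];
>
> printf("address of a: %p\n", (void *) &a);
> printf("value of a: %d\n", *a);
>
> for (i = 0; i < 3; i++)
>     free(arr[i]);
> free(arr);
> printf("after free...\n");
>
> printf("address of a: %p\n", (void *) &a);
> /* a now dangles: reading *a after free() is undefined behavior */
> printf("value of a: %d\n", *a);
> ```
> The output is:
> ```
> address of a: 0x7ffd0cfc1638
> value of a: 5
> after free...
> address of a: 0x7ffd0cfc1638
> value of a: 0
> ```
> The printed address is unchanged (it is `&a`, the address of the pointer variable itself), but the value read through `a` has become 0, which suggests the memory was released. Strictly speaking, reading `*a` after `free()` is undefined behavior, so this is an observation from this run rather than a guarantee.
> If the release step is changed to call only `free(arr)`, the result is different:
> ```
> address of a: 0x7ffcbea12d88
> value of a: 5
> after free...
> address of a: 0x7ffcbea12d88
> value of a: 5
> ```
> The printed address is again unchanged, but the value is still 5. Calling only `free(arr)` releases just the block that `arr` points to; the blocks pointed to at the next level are not released. Over time such unused blocks keep occupying memory, which is a memory leak, and the program may eventually run out of memory and crash.

:::success
There is a tool called [cppcheck](http://cppcheck.sourceforge.net/) that performs static analysis of code and can also point out the memory leak above
```shell
$ sudo apt install cppcheck
```
Further reading: [How to dynamically allocate a 2D array in C?](https://www.geeksforgeeks.org/dynamically-allocate-2d-array-c/)
:notes: jserv
:::

## Pointers vs. Arrays

```C
#include <stdio.h>

int main(void) {
    int x[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    printf("%d %d %d %d\n", x[4], *(x + 4), *(4 + x), 4[x]);
    return 0;
}
```

All four forms in the code above give the same result. Array subscripting is just syntactic sugar: accessing an array is simply operating on a contiguous region of memory, and this holds for two-dimensional arrays as well.

## Function Pointer

A function pointer is, as the name suggests, a pointer to a function. Taking the declaration of `qsort` again as an example, the pointer `compar` can point to any function whose parameters and return type match its own, which gives the generic behavior mentioned earlier. With `qsort`, for instance, `compar` can point to a comparison based on ascending order or on descending order.

```C
void qsort(void *base, size_t nmemb, size_t size, int (*compar)(const void *, const void *));
```

:::info
`*compar` is not a pointer type, whereas `compar` is; take care to use precise expressions
:notes: jserv
:::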