
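# What You Didn't Know About C: Pointers
contributed by <`pjchiou`>

---
## Brain teaser
This one took me a long time to work through. I later went looking for an explanation and found that the classic *The C Programming Language* devotes an entire section to it.

![](https://i.imgur.com/AeeqWnP.jpg)

[Section 5.12 Complicated Declarations](http://www.dipmat.univpm.it/~demeio/public/the_c_programming_language_2.pdf)

---
## The mystery of void*
The shared course notes contain this passage:
- Some hardware architectures, such as ARM, require extra alignment. On ARMv5 and earlier, a pointer used to access a 32-bit integer (uint32_t) must be aligned to a 32-bit boundary (otherwise dereferencing it triggers an exception).
- The following program

```C
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    int a = 0x12345678;
    void *p = &a;
    /* Read the int one byte at a time, lowest address first.
     * unsigned char avoids sign extension for bytes >= 0x80. */
    for (int i = 0; i <= 3; i++)
        printf("%x\n", *((unsigned char *) p + i));
    return 0;
}
```

Running it on little-endian x86_64 prints
:::success
78
56
34
12
:::

:::warning
:question: The output matches what I expected. So on Intel x86_64, a value can be read from an arbitrary address through a void * (no exception is raised), but doing so may cost a lot of performance. Is that a fair reading?
:::

:::danger
Good question. It could take several weeks to answer properly (it is one of the classic topics of this semester's course). See [Data alignment and caches](https://danluu.com/3c-conflict/) and [Data alignment for speed: myth or reality?](https://lemire.me/blog/2012/05/31/data-alignment-for-speed-myth-or-reality/), then design similar experiments to verify how alignment affects data access.
:notes: jserv
:::

---
## There is no "double pointer", only a pointer to a pointer
Keep in mind that C only has call by value. (In class the instructor even said, "There's no real need to stress this, because it is the only kind there is.") Some C++ books I read earlier went as far as listing three kinds:
- call by value
- call by address (there is no such thing...)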
- call by reference

:::info
Reading flawed material is what tied me in knots here. Just remember that **C only has call by value** and everything else follows naturally.
:::

The notes include this short program:
~~~C=1
int B = 2;
void func(int **p) { *p = &B; }
int main() {
    int A = 1, C = 3;
    int *ptrA = &A;
    func(&ptrA);
    printf("%d\n", *ptrA);
    return 0;
}
~~~
Drawn as diagrams, it looks like this:
:::info
- Up to and including line 5, the data looks like this:

```graphviz
digraph structs {
    node[shape=record]
    {rank=same; structa,structb,structc}
    structptr [label="ptrA|<ptr> &A"];
    structb [label=" B|2"];
    structa [label="<A> A|1"];
    structc [label="<C> C|3"];

    structptr:ptr -> structa:A:nw
}
```
- At line 6, the &ptrA passed to func is an rvalue (shown as &ptrA(temp) in the figure below).

```graphviz
digraph structs {
    node[shape=record]
    {rank=same; structa,structb,structc}
    structadptr [label="&ptrA(temp)|<adptr> &ptrA"]
    structptr [label="<name_ptr> ptrA|<ptr> &A"];
    structb [label=" B|2"];
    structa [label="<A> A|1"];
    structc [label="<C> C|3"];

    structptr:ptr -> structa:A:nw
    structadptr:adptr -> structptr:name_ptr:nw
}
```
- On entry to func, the value of the &ptrA just received is copied into a new automatic variable p, so at that moment the data should look like this:

```graphviz
digraph structs {
    node[shape=record]
    {rank=same; structa,structb,structc}
    structp [label="<p> p(in func)| &ptrA"]
    structadptr [label="&ptrA(temp)|<adptr> &ptrA"]
    structptr [label="<name_ptr> ptrA|<ptr> &A"];
    structb [label=" B|2"];
    structa [label="<A> A|1"];
    structc [label="<C> C|3"];

    structptr:ptr -> structa:A:nw
    structadptr:adptr -> structptr:name_ptr:nw
    structp:p -> structptr:name_ptr:nw
}
```
- func has a single statement, which replaces the value that p points to (that is, the contents of ptrA) with &B.

```graphviz
digraph structs {
    node[shape=record]
    {rank=same; structa,structb,structc}
    structp [label="<p> p(in func)| &ptrA"]
    structadptr [label="&ptrA(temp)|<adptr> &ptrA"]
    structptr [label="<name_ptr> ptrA|<ptr> &B"];
    structb [label="<B> B|2"];
    structa [label="<A> A|1"];
    structc [label="<C> C|3"];

    structptr:ptr -> structb:B:nw
    structadptr:adptr -> structptr:name_ptr:nw
    structp:p -> structptr:name_ptr:nw
}
```
:::

To check my understanding, I modified the program from the notes slightly so that the address of ptrA is held in an lvalue, then looked at what happens. The program:
```C=1
#include <stdio.h>
#include <stdlib.h>
int B = 2;
void func(int **p) {
    /* %p expects a void *, so cast explicitly */
    printf("p=%p stored in %p\n", (void *) p, (void *) &p);
    *p = &B;
}
int main() {
    int A = 1, C = 3;
    int *ptrA = &A;
    int **ptrptrA = &ptrA;
    printf("ptrptrA=%p stored in %p\n", (void *) ptrptrA, (void *) &ptrptrA);
    func(ptrptrA);
    printf("%d\n", *ptrA);
    return 0;
}
```
p and ptrptrA should hold the same value **but be stored at different addresses**. The output:
:::success
ptrptrA=0x7ffc7b947328 stored in 0x7ffc7b947330
p=0x7ffc7b947328 stored in 0x7ffc7b947308
2
:::

Next I applied the same reasoning to the example the instructor gave in the livestream.
```C=1
int B = 2;
void func(int *p) { p = &B; }
int main() {
    int A = 1, C = 3;
    int *ptrA = &A;
    func(ptrA);
    printf("%d\n", *ptrA);
    return 0;
}
```
Once again: there is only ever call by value. What a function receives is always **another automatic variable holding the same value**, so being clear about the parameter's type matters.
:::info
- Up to and including line 5, the data looks like this:

```graphviz
digraph structs {
    node[shape=record]
    {rank=same; structa,structb,structc}
    structptr [label="ptrA|<ptr> &A"];
    structb [label=" B|2"];
    structa [label="<A> A|1"];
    structc [label="<C> C|3"];

    structptr:ptr -> structa:A:nw
}
```
- At line 6, immediately after the call enters func, the data becomes:

```graphviz
digraph structs {
    node[shape=record]
    {rank=same; structa,structb,structc}
    structp [label="<p> p|&A"]
    structptr [label="<name_ptr> ptrA|<ptr> &A"];
    structb [label=" B|2"];
    structa [label="<A> A|1"];
    structc [label="<C> C|3"];

    structptr:ptr -> structa:A:nw
    structp:p -> structa:A:nw
}
```
- The operation inside func **changes the value of p to &B**:

```graphviz
digraph structs {
    node[shape=record]
    {rank=same; structa,structb,structc}
    structp [label="<p> p|&B"]
    structptr [label="<name_ptr> ptrA|<ptr> &A"];
    structb [label="<B> B|2"];
    structa [label="<A> A|1"];
    structc [label="<C> C|3"];

    structptr:ptr -> structa:A:nw
    structp:p -> structb:B:nw
}
```
:::
The figures make it easy to see that ptrA in main is, of course, unchanged.

---
## Pointers vs. Arrays
The notes open with:
- array vs. pointer
    - in declaration
        - extern, e.g. extern char x[]; => cannot be rewritten in pointer form
        - definition/statement, e.g. char x[10] => cannot be rewritten in pointer form
        - parameter of function, e.g. func(char x[]) => can be rewritten in pointer form => func(char *x)
    - in expression
        - array and pointer are interchangeable

:::warning
:question: At this point I could not fully work out what this was saying, so I ran a small experiment with GDB.
(P.S. Having only learned C/C++, I have no idea how a tool like GDB gets built...)
:::

Consider the following program:
```C=1
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int a[8], *b;
    b = malloc(sizeof(int) * 8);

    for (int i = 0; i < 8; i++) {
        a[i] = i;
        b[i] = i;
    }
    printf("%p\n%p\n", (void *) a, (void *) b);
    free(b);
    return 0;
}
```
Besides giving me practice in using GDB to verify what the notes say, this program is mainly for examining how the variables a and b are alike and how they differ. I set a breakpoint at line 13 in GDB and then inspected the two variables.

|GDB command (y stands for the variable name)|a|b|
|:-:|:-|:-|
|whatis y|`int [8]`|`int *`|
|whatis &y[0]|`int *`|`int *`|
|whatis y+1|`int *`|`int *`|
|whatis &y|`int (*)[8]`|`int **`|
|whatis &y+1|`int (*)[8]`|`int **`|
|p y|`{0,1,2,3,4,5,6,7}`|`0x602010`|
|p &y[0]|`(int *) 0x7fffffffdc20`|`(int *) 0x602010`|
|p y+1|`(int *) 0x7fffffffdc24`|`(int *) 0x602014`|
|p &y|`(int (*)[8]) 0x7fffffffdc20`|`(int **) 0x7fffffffdc18`|
|p &y+1|`(int (*)[8]) 0x7fffffffdc40`|`(int **) 0x7fffffffdc20`|
|x/8 y|`0x7fffffffdc20: 0 1 2 3`<br>`0x7fffffffdc30: 4 5 6 7`|`0x602010: 0 1 2 3`<br>`0x602020: 4 5 6 7`|
|x/8 &y|`0x7fffffffdc20: 0 1 2 3`<br>`0x7fffffffdc30: 4 5 6 7`|`0x7fffffffdc18: 6299664 0 0 1`<br>`0x7fffffffdc28: 2 3 4 5`|

:::info
**These results lead to a few conclusions, and also raise more questions...**
- Illustrated
    - a is not a pointer. It names a block of storage whose size is fixed at compile time, **so a++ is not allowed**. However, a+1 is a pointer of type (int *) whose value is &a[1].
    - b is a pointer, so like any pointer it supports b++.

```graphviz
digraph structs {
    node[shape=record]
    ptra [label="<ptra> &a"]
    ptra0 [label="<ptra0> &a[0]"]
    structa [label="{<a> a|{<a0> a[0]|a[1]|a[2]|a[3]|a[4]|a[5]|a[6]|a[7]}}"];

    ptra:ptra -> structa:a:nw
    ptra0:ptra0 -> structa:a0:sw
}
```
```graphviz
digraph structs {
    node[shape=record]
    bptr [label="<bptr> b"]
    structb [label="<b> b[0]|b[1]|b[2]|b[3]|b[4]|b[5]|b[6]|b[7]"]

    bptr:bptr -> structb:b:nw
}
```
- Observed behavior
    1. b behaves pretty much as I imagined.
    2. a and b have different types, yet a+1 and b+1 have the same type. My attempt to make sense of this fact: **for the + operator, int [8] and int \* relate the way int+double relates to double+double**. The array operand is first converted to int *, just as the int operand is first converted to double.
    3.
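       For any pointer `ptr`, the address `ptr+1` is the address in `ptr` advanced by `sizeof(*ptr)` bytes, that is, by the size of the type the pointer points to, **not the size of the pointer itself**. For example, on my system an int takes 4 bytes and a pointer takes 8 bytes, so `b+1` is the address in `b` plus 4 bytes (`sizeof(int)`), while `&b+1` is the address `&b` plus 8 bytes (`sizeof(int *)`).
:::

After these experiments I reread [Section 5.3 Pointers and Arrays](http://www.dipmat.univpm.it/~demeio/public/the_c_programming_language_2.pdf) from the beginning. It explains:
- There is one difference between an array name and a pointer that must be kept in mind. A pointer is a variable, so pa=a and pa++ are legal. But an array name is not a variable; constructions like a=pa and a++ are illegal.

Only now do I understand what this passage is saying. Before, I had read it without really taking it in.

:::success
Think about it from another angle: array subscripting (the `[]` operator) lets us access `a[1]`, so is `(a + 1)[1]` allowed? If so, which index a[?] does it correspond to?
:notes: jserv

Ans: Yes. (a+1) is an (int *). Let int *ptr = a+1; then (a+1)[1] is the same as ptr[1], which is *(ptr+1), which in turn equals *(a+2), that is, a[2].
:::

Next I checked: **if a is passed into a function as a parameter, what does it become?**
```C=1
#include <stdio.h>
#include <stdlib.h>
void func(int c[8], int d[], int *e)
{
    printf("%p\n%p\n%p\n", (void *) c, (void *) d, (void *) e);
}
int main()
{
    int a[8];
    for (int i = 0; i < 8; i++)
        a[i] = i;
    func(a, a, a);
    printf("%p\n", (void *) a);
    return 0;
}
```

|GDB command (y stands for the variable name)|c|d|e|
|:-:|:-|:-|:-|
|whatis y|`int *`|`int *`|`int *`|

:::info
Seeing this, I think the experiment has its answer, and I also understand what Linus Torvalds was angry about. **However the array is written in the parameter list, the function receives the same thing.** It behaves exactly like b in the first experiment.
:::

---
## Strings revisited
- C provides some syntactic sugar for initializing arrays, which makes char *p = "hello world" and char p[] = "hello world" look alike, yet their underlying behavior is very different.

Again I used GDB to see where exactly they differ. Consider the following program:
```C=1
#include <stdio.h>
#include <stdlib.h>
int main()
{
    char a[] = "Hello world";   /* array initialized with a copy of the literal */
    char *b = "Hello world";    /* pointer to the string literal itself */
    char *c;                    /* pointer to heap memory, filled by hand */
    c = malloc(sizeof(char) * 12);
    c[0] = 'H'; c[1] = 'e'; c[2] = 'l'; c[3] = 'l';
    c[4] = 'o'; c[5] = ' '; c[6] = 'w'; c[7] = 'o';
    c[8] = 'r'; c[9] = 'l'; c[10] = 'd'; c[11] = '\0';
    printf("%s\n%s\n%s\n", a, b, c);
    free(c);
    return 0;
}
```
The results:

|GDB command (y stands for the variable name)|a|b|c|
|:-:|:-|:-|:-|
|whatis y|`char [12]`|`char *`|`char *`|
|whatis &y|`char (*)[12]`|`char **`|`char **`|
|p y|`"Hello world"`|`0x400794 "Hello world"`|`0x602010 "Hello world"`|
|p &y|`(char (*)[12]) 0x7fffffffdc3c`|`(char **) 0x7fffffffdc28`|`(char **) 0x7fffffffdc30`|

:::warning
:question:
- a behaves just as in the earlier experiment: it is an array, not a pointer.
- b and c appear to behave the same way. Both are pointers, so they should behave like b in the earlier experiment. That gave me an idea: **shouldn't I also call free(b);?**
- After I added free(b); the program crashed!! b and c look exactly the same, so why does this happen?

> The malloc man page says:
> The free() function frees the memory space pointed to by ptr, which *must have been returned by a previous call to malloc(), calloc(), or realloc().* Otherwise, or if free(ptr) has already been called before, undefined behavior occurs.
>
> Calling free() on memory that was not dynamically allocated by the malloc family of functions triggers UB. [name=Yichung279]
:::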