# Self Referential structure imcomplete type 問題
###### tags: `linux2020` `C` `Q&A`
Ref: [寫點科普請多指教 - C 語言:鏈結串列(Linked List)的建立與刪除](https://kopu.chat/2017/06/02/c-%E8%AA%9E%E8%A8%80%EF%BC%9A%E9%8F%88%E7%B5%90%E4%B8%B2%E5%88%97linked-list%E7%9A%84%E5%BB%BA%E7%AB%8B%E8%88%87%E5%88%AA%E9%99%A4/)
# OUTLINE
[TOC]
## 問題 (程式未重構前 - 問題描述不精準,程式冗長)
大家好,我想分享一下我在練習 linked list 的 structure 宣告時遇到的問題,與 C99 規格書的資料查詢記錄,完整的程式碼放在文章的最後。
### 狀況描述
如果我們像底下這樣宣告一個 structure
```cpp
typedef struct {
int age;
struct TWICE *next;
}TWICE;
```
會出現 Error message:
```shell
$ gcc -g twice_ll.c -o ll
twice_ll.c: In function ‘main’:
twice_ll.c:35:17: warning: assignment from incompatible pointer type [-Wincompatible-pointer-types]
current = current->next;
^
twice_ll.c:44:17: warning: assignment from incompatible pointer type [-Wincompatible-pointer-types]
current = current->next;
^
```
出現了 ```-Wincompatible-pointer-types```
推測是兩邊的 pointer 資料型態不一致。
### GDB 實驗
如果我們使用 gdb-ptype command 來檢查資料型態的話:
```gdb
Breakpoint 1, main () at twice_ll.c:35
35 current = current->next;
(gdb) n
36 current->age = age;
(gdb) ptype current
type = struct {
int age;
struct TWICE *next;
} *
(gdb)
type = struct {
int age;
struct TWICE *next;
} *
(gdb) ptype current->next
type = struct TWICE {
<incomplete type>
} *
```
:::danger
錯誤解釋!對 forward declaration 不熟!
敘述有點問題
找資料
>[name=unknowntpo] [time=Fri, May 1, 2020 4:40 PM]
我們發現 ```current->next``` 是個 incomplete type, 因為當我們使用 typedef 一個 structure 時,只有在整個 structure 宣告完成後,也就是 ```}``` 出現後才會完成定義。
:::
如下:
```cpp
typedef struct TWICE{
int age;
struct TWICE *next;
}twice_mem;
```
附上 C99 對於 incomplete type 的解釋:
### C99 - Incomplete type
>Types are partitioned into object types (types that fully describe objects), function types (types that describe functions), and ==incomplete types (types that describe objects but lack information needed to determine their sizes==).
(Page 45).
### 完整程式碼
```cpp=1
/*
* @Title: Twice ll.c
* @Ref: https://kopu.chat/2017/06/02/c-%E8%AA%9E%E8%A8%80%EF%BC%9A%E9%8F$
* 寫點科普,請多指教 -
* C 語言:鏈結串列(Linked List)的建立與刪除
*/
#include <stdio.h>
#include <stdlib.h>
typedef struct TWICE{
int age;
struct TWICE *next; // must use "struct TWICE"
}twice_mem;
int main()
{
int count, i, age;
printf("specify how many nodes:");
scanf("%d", &count); // specify how many nodes
printf("input first node value:");
scanf("%d", &age);
// create new node
twice_mem *head = malloc(sizeof(twice_mem));
head->age = age;
head->next = NULL;
// create 'count' amount of node
twice_mem *current = head;
for (i = 1; i < count; i++){
printf("next node:\n");
scanf("%d", &age);
current->next = malloc(sizeof(twice_mem));
current = current->next;
current->age = age;
current->next = NULL;
}
// print all node
printf("Age of TWICE:\n");
current = head; // point current to head again
while(current != NULL) {
printf("%d\n", current->age); // print current node val.
current = current->next;
}
return 0;
}
```
## 老師回答
你需要整理最小可重現 (reproduce) 的程式碼,否則編譯器的警告訊息和程式碼列表無法對照 (下方的完整程式碼又太冗長)
在本案例中,最大的問題是 張辰謙 搞錯
```struct TWICE``` 和 ```typedef struct { ... } TWICE```
的意義 —— 兩者在 C 語言有顯著差異。
前者是結構體,
而 ```struct TWICE *``` 是 ```incomplete type```,這是允許的,可在需要存取到所指向的物件時,再去定義其內部結構 (forward declaration)。
```typedef struct { ... } TWICE``` 是透過 typedef 所宣告的 derived types,這兩者在執行時期的空間佔用不同。你可用 ```sizeof``` 去觀察
```cpp
typedef struct {
int age;
struct TWICE *next;
}TWICE;
```
> [name=jserv 老師]
>補充
### [forward declaration 搭配指標的技巧](https://hackmd.io/@sysprog/c-pointer?type=view#forward-declaration-%E6%90%AD%E9%85%8D%E6%8C%87%E6%A8%99%E7%9A%84%E6%8A%80%E5%B7%A7)
在 `oltk.c` 中只有宣告 (declaration),
```cpp
struct oltk; // 宣告 (incomplete type, void)
struct oltk_button;
typedef void oltk_button_cb_click(struct oltk_button *button, void *data);
typedef void oltk_button_cb_draw(struct oltk_button *button,
struct oltk_rectangle *rect, void *data);
struct oltk_button *oltk_button_add(struct oltk *oltk,
int x, int y,
int width, int height);
```
`struct oltk` 和 `struct oltk_button` 沒有具體的定義 (definition) 或實做 (implementation),僅有宣告 (declaration)。
在 `oltk.c` 中才有具體定義 `oltk` 這個 structure 的實作
```cpp
struct oltk {
struct gr *gr;
struct oltk_button **zbuttons;
...
struct oltk_rectangle invalid_rect;
};
```
---
## TODO
* 縮減程式碼
* 搜尋 reproduce
* 縮減完整程式碼到 reproduce code
* 搜尋 incomplete type 與 derived type 定義
* 理解 ```struct TWICE``` 與 ```typedef struct { ... } TWICE``` 差異
* 設計實驗
* 在 gdb 使用 ```sizeof``` 觀察 ```incomplete type``` 與 ```derived type``` 在執行期間佔用的記憶體大小
### Derived type
in **C99 6.2.5 - Types** :
>20:
>Any number of **derived types** can be constructed from the **object**, **function**, and **incomplete types**, as follows:
(只節錄 structure type 的部分)
>— **An Array type**...
>— **A structure type** describes a sequentially allocated nonempty set of member objects (and, in certain circumstances, an incomplete array), each of which has an optionally specified name and possibly distinct type. — A union type describes an overlapping nonempty set of member objects, each of which has an optionally specified name and possibly distinct type.
>— **A function type**...
>22:
>An **array type** of unknown size is an incomplete type. It is completed, for an identifier of that type, by specifying the size in a later declaration (with internal or external linkage). A **structure or union type of unknown content** (as described in 6.7.2.3) is an **incomplete type**. ==It is completed, for all declarations of that type, by **declaring the same structure or union tag with its defining content** later in the same scope.==
>可能是 而 struct TWICE * 是 incomplete type 這句話的解答
### Incomplete type
in **C99 6.2.5 Types**:
>1:
>Types are partitioned into object types (types that fully describe objects), function types (types that describe functions), and ==incomplete types (types that describe objects but lack information needed to determine their sizes==).
### 重構程式遇到的問題
#### Segmentation fault
```cpp=1
#include <stdio.h>
#include <stdlib.h>
/* incomplete type */
typedef struct TWICE {
int age;
struct TWICE *next; // must use "struct TWICE"
}twice_mem;
int main()
{
twice_mem *head;
head->age = 20;
head->next = NULL;
}
```
in gdb
```gdb
(gdb) b 12
Breakpoint 1 at 0x5fe: file tw_debug.c, line 12.
(gdb) r
Starting program: /home/ubuntu/linux2020/unknowntpo/lab0-c/debug
Breakpoint 1, main () at tw_debug.c:12
12 head->age = 20;
(gdb) info local
head = 0x0
(gdb)
```
* 思考後發現 ```head``` 這個 pointer to structure 要先 dynamically allocate memory 才能使用
* in line 11, 只是宣告一個 ```twice_mem``` 形態的指標,一開始沒有指向任何地方
```cpp
(gdb) info local
head = 0x0
```
# 程式重構後
## Code:
```cpp=1
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <stddef.h>
typedef struct {
int age;
struct TWICE *next; // next is an incomplete type
}twice_mem;
int main()
{
twice_mem *head = malloc(sizeof(twice_mem));
}
```
## GDB Debug
```gdb
(gdb) ptype head
type = struct {
int age;
struct TWICE *next;
} *
(gdb) ptype head->next
type = struct TWICE {
<incomplete type>
} *
```
##### 為什麼 ptype pointer to structure 會有 ```*```
Ans:
* head 是個指向 structure 的 pointer 所以 在 ```}``` 後會有 ```*```
```gdb
(gdb) ptype head
type = struct {
int age;
struct TWICE *next;
} *
```
* ```*head``` 因為已經 dereference ```head``` 這個 pointer ,所以沒有 ```*```
```gdb
(gdb) ptype *head
type = struct {
int age;
struct TWICE *next;
}
```
#### 執行期佔用空間大小觀察
1. 觀察 ```typedef struct { ... } twice_mem``` 這個透過 typedef 所宣告的 derived types 執行期間佔用的空間
```cpp
(gdb) p sizeof(head)
$4 = 8
(gdb) p sizeof(*head)
$5 = 16
```
:::warning
Question:
```head```這個 pointer type 的 object 與 ```*head``` 這個 object 佔用的大小為什麼不同呢?
:::
:::warning
Answer:
pointer type 物件 (包含 pointer to incomplete type) 的空間即為一個 word 長度
(因為實驗環境為 64-bit,word 長度為 8 byte),所以 head 這個 pointer type 物件大小就為 8 bytes.
```*head``` 的意義是把指向 ```twice_mem``` 這個 structure 的 指標 dereference,
也就是實際上 ```*head```就代表著 ```twice_mem``` 這個結構體,
因為在 64 位元的電腦上,資料會以 8-byte chunks 為單位讀取與寫入,所以 compiler 會幫你做 padding,
所以 ```twice_mem``` 這個結構體的大小就是
4 bytes (int) + 4 bytes (padding) + 8 bytes (pointer) = 16 bytes
這也是為什麼 ```sizeof(*head)``` 是 16 bytes
參考資料:
[stack overflow - sizeof a linked list node](https://stackoverflow.com/questions/16768733/size-of-a-linked-list-node)
>[name=unknowntpo]
:::
2. 觀察 ```struct TWICE *``` (line 8) 這個 incomplete type 執行期間佔用的空間
```cpp
(gdb) p sizeof(head->next)
$6 = 8
```
:::warning
Question: 該如何解釋他為什麼是 8bytes 呢?
:::
>因為它是一個指向 structure 的指標, 指標大小為一個 word 的長度,在 64-bit system 就是 8 bytes.
>[name=unknowntpo]
----
##### C99 對於 object 這個名詞的解釋
>**C99 - 3.14 Object**
>1:
> object region of data storage in the execution environment, the contents of which can represent values
NOTE When referenced, an object may be interpreted as having a particular type; see 6.3.2.1.
:::info
# word 與 WORD, DWORD, QWORD 的區別
"Word size" refers to the number of bits processed by a computer's CPU in one go (these days, typically 32 bits or 64 bits). Data bus size, instruction size, address size are usually multiples of the word size.
Just to confuse matters, for backwards compatibility, Microsoft Windows API defines a WORD as being 16 bits, a DWORD as 32 bits and a QWORD as 64 bits, regardless of the processor.
翻譯:
word size: cpu 一次可以處理的資料量
* 32 bits cpu
* word size == 32 bits
* 64 bits cpu
* word size == 64 bits
Ref: [stack overflow](https://stackoverflow.com/questions/19821103/what-does-it-mean-by-word-size-in-computer)
:::