# K&R textbook reading: strcpy function (p.105)
###### tags: `K&R` `C` `linux2020` `Q&A`
## My question
* Why can the most concise version of the code on p.105 of K&R omit the check against the NUL character ```'\0'```?
[FB question link](https://www.facebook.com/groups/system.software2020/permalink/335045387429667/)
-----
Code:
```clike=1
/* copy t to s ; pointer version 1 */
void strcpy(char *s, char *t) {
    while ((*s++ = *t++) != '\0')
        ;   /* empty body: the condition does all the work */
}
```
-------
## Attempted analysis
* Analyze the behavior of the postfix increment on line 3, compared with the prefix increment.
>The following is excerpted from ISO/IEC 9899:
>#### 6.5.2.4 Postfix increment and decrement operators (p.75)
>**Semantics:**
>The result of the postfix ```++``` operator is the value of the operand. After the result is obtained, the value of the operand is incremented.
>That is, the value 1 of the appropriate type is added to it.
>See the discussions of additive operators and compound assignment for information on constraints, types, and conversions and the effects of operations on pointers. The side effect of updating the stored value of the operand shall occur between the previous and the next sequence point.
>Forward references: additive operators (6.5.6), compound assignment (6.5.16.2).
------
>The following is the explanation from *The C Programming Language* [name=unknowntpo]
#### 2.8 Increment and Decrement Operators
>The unusual aspect is that ```++``` and ```--``` may be used either as prefix operators (before the variable, as in ```++n```), or postfix (after the variable: ```n++```). In both cases, the effect is to increment ```n```. But the expression ```++n``` increments n before its value is used, while ```n++``` increments ```n``` after its value has been used.
[name=Kernighan, Brian W.. C Programming Language (p. 46). Pearson Education. Kindle Edition. ]
#### Searching StackOverflow:
[How does “while(*s++ = *t++)” copy a string?](https://stackoverflow.com/questions/810129/how-does-whiles-t-copy-a-string)
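>Note that just because it has higher precedence doesn't mean it happens first. ==Postfix increment specifically happens after the value has been used==, which is why ```*s = *t``` happens before ```s++```.
> [name=Chris Lutz]
Result:
The assignment stores the old value of `*t` into the location that `s` originally pointed to; conceptually, `*s = *t` happens before `s++` and `t++`. (Strictly speaking, C99 only guarantees that the increments are completed by the next sequence point, as discussed further below.)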
--------------
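## Interpretation
According to the C standard, the destination (dst) and source (src) ranges of ```strcpy``` must not overlap; if they do, the behavior is undefined. Here I ignore the case where src and dst overlap.
My confusion came from always reading ```=``` as "equals" and thinking of it as mathematical equality, but here ```=``` is assignment (it stores a value).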
```cpp
*s++ = *t++
```
can be broken down into
```cpp
*s = *t; s++; t++;
```
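When the assignment `*s = *t` completes, the expression itself has a value: the value that was stored, i.e. ```*t```. The interesting part is that the null terminator of a C string is simply 0; you usually see it written as ```'\0'```, but it is the same value written differently. And a `while` statement treats 0 as false (every nonzero value, including -1, is true). So ```while ((*s++ = *t++) != '\0')``` and ```while (*s++ = *t++)``` test exactly the same condition, and in both cases the terminating ```'\0'``` has already been copied when the loop stops.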
>TODO: add the relevant ISO/IEC 9899 references [name=unknowntpo][time=Feb 19 2020]
>The code is as follows:
```clike=1
/* copy t to s ; pointer version 1 */
void strcpy(char *s, char *t) {
    while (*s++ = *t++)
        ;   /* empty body: the condition does all the work */
}
```
### Observing with gdb (how *dst and *src change)
First, prepare a test program (file name `test.c`):
```cpp
char *mystrcpy(char *dst, char *src) {
while (*dst++ = *src++)
;
}
int main() {
char *src = "hello";
char dst[16] = {'0'};
mystrcpy(dst, src);
return 0;
}
```
Compile:
```shell
$ gcc -o test -g test.c
```
Then run it under GDB:
```cpp
gdb -q test
Reading symbols from test...done.
(gdb) list
1 char *mystrcpy(char *dst, char *src) {
2 while (*dst++ = *src++)
3 ;
4 }
5 int main() {
6 char *src = "hello";
7 char dst[16] = {'0'};
8 mystrcpy(dst, src);
9 return 0;
10 }
```
Line 2 is clearly the part we care about, so set a breakpoint there and run the program:
```cpp
(gdb) break 2
Breakpoint 1 at 0x676: file test.c, line 2.
(gdb) run
Starting program: /tmp/test
Breakpoint 1, mystrcpy (dst=0x7fffffffe3a0 "0", src=0x555555554794 "hello") at test.c:2
2 while (*dst++ = *src++)
```
The breakpoint is hit. Note that the statement `while (*dst++ = *src++)` is *about to* execute. Let's verify it step by step:
```cpp
(gdb) print dst
$1 = 0x7fffffffe3a0 "0"
```
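This prints the value of the pointer variable `dst` as passed into `mystrcpy`, namely `0x7fffffffe3a0`. The memory it points to holds the string `"0"`, i.e. the character `'0'` (ASCII 48) followed by zero bytes, as expected from `char dst[16] = {'0'};` on line 7 of the source.
Next, evaluate `(*dst++ = *src++)` with GDB's `print` command (note that `print` really executes the expression, side effects included):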
```cpp
(gdb) print (*dst++ = *src++)
$2 = 104 'h'
```
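The expression above evaluates to 104, the ASCII code of the character `h`. For the `while (...)` statement, 104 is nonzero, which counts as true, so the loop condition holds and the loop continues.
Because `(*dst++ = *src++)` contains postfix increments (`dst++` and `src++`), which, like the compound assignment `dst += 1`, update the stored value of each pointer, let's use GDB to watch how the value of `dst` changes: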
```cpp
(gdb) print dst
$3 = 0x7fffffffe3a1 ""
```
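Compared with before, `0x7fffffffe3a1` is indeed exactly 1 greater than `0x7fffffffe3a0`. Viewing memory as linear, the pointer has moved from one byte to the adjacent byte: incrementing a pointer advances it by `sizeof(*dst)` bytes, which is 1 for `char`. GDB shows `""` because `dst[1]` is still a zero byte.
Since the `while` condition held, let's evaluate `(*dst++ = *src++)` again: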
```cpp
(gdb) print (*dst++ = *src++)
$4 = 101 'e'
(gdb) print dst
$5 = 0x7fffffffe3a2 ""
```
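This is exactly the string-copy effect: first the character `h` was copied, now `e`, and so on through `o`, with the `dst` pointer advancing one byte each time. The final evaluation copies the terminating `'\0'` from `src`; that assignment evaluates to 0, so the `while` condition becomes false and the loop ends, after the terminator has already been written to `dst`.
### Inspecting the assembly code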
```shell
gcc test.c -S
cat test.s
```
Output (excerpt covering the `main` function):
```asm
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $48, %rsp
movq %fs:40, %rax
movq %rax, -8(%rbp)
xorl %eax, %eax
leaq .LC0(%rip), %rax
movq %rax, -40(%rbp)
movq $0, -32(%rbp)
movq $0, -24(%rbp)
movb $48, -32(%rbp)
movq -40(%rbp), %rdx
leaq -32(%rbp), %rax
movq %rdx, %rsi
movq %rax, %rdi
call mystrcpy
movl $0, %eax
movq -8(%rbp), %rcx
xorq %fs:40, %rcx
je .L5
call __stack_chk_fail@PLT
.L5:
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1:
.size main, .-main
.ident "GCC: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0"
.section .note.GNU-stack,"",@progbits
```
## CERN Computer Security's notes on strcpy vulnerabilities
[Common vulnerabilities guide for C programmers](https://security.web.cern.ch/security/recommendations/en/codetools/c.shtml)
### When is a postfix operator evaluated?
C99 explains this in:
#### 6.5.2.4 Postfix increment and decrement operators
:::info
The **side effect** of updating the stored value of the operand shall occur between the previous and the next **sequence point**.
:::
Terminology:
* sequence point:
* A sequence point defines any point in a computer program's execution at which it is guaranteed that all side effects of previous evaluations will have been performed, and no side effects from subsequent evaluations have yet been performed.
* Examples:
* between evaluation of the left and right operands of the ```&&``` (logical AND), ```||``` (logical OR) (as part of short-circuit evaluation), and comma operators.
* For example, in the expression ```*p++ != 0 && *q++ != 0```, all side effects of the sub-expression ```*p++ != 0``` are completed before any attempt to access ```q```.
------
:::warning
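Just keep the hyperlink to the original Facebook discussion thread; there's no need to assume people will quietly delete their posts. Don't present text messages as images.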
:notes: jserv
:::
>Fixed [name=unknowntpo] [time= 9:09 March 5 2020]