
Does strncpy automatically append \0?

tags: linux2020 lab0-c Q&A

From the Linux man page:

Warning: If there is no null byte among the first n bytes of src, the string placed
in dest will not be null-terminated.

Preface:

I have noticed that many classmates, when writing q_insert_head() for lab0-c, use
strncpy to copy the string that will be stored in head->value:

 // copy the string
strncpy(str, s, len);
str[len] = '\0';
newh->value = str;

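But if we pass len+1 instead of len to strncpy, perhaps we would not need to append \0 to str by hand.
To test this guess, I ran the following experiment,
covering two cases:

  1. Pass len as the argument n of strncpy
  2. Pass len+1 as the argument n of strncpy

Following the simple implementation of strncpy shown in the man page,
I wrote a function mystrncpy():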

char *mystrncpy(char *dest, const char *src, size_t n)
{
    size_t i;

    for (i = 0; i < n && src[i] != '\0'; i++)
        dest[i] = src[i];
    for (; i < n; i++)
        dest[i] = '\0';
    return dest;
}

We can see that the first for-loop ends when i == n or src[i] == '\0'.
Experiment design:

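If the argument n of strncpy is set to len

We fill the memory that dest points to with 'A'.
The reason for also setting the 2 bytes beyond dest to 'A' is that,
for some reason, the memory returned by malloc() was all 0x00.
To keep that from affecting the result, the 2 bytes after the memory dest points to are set to 'A' as well.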

(gdb) x/8xb dest
0x555555756260:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
(gdb) p dest
$2 = 0x555555756260 ""
(gdb) set {char}0x555555756260 = 'A'
(gdb) set {char}0x555555756261 = 'A'
(gdb) set {char}0x555555756262 = 'A'
(gdb) set {char}0x555555756263 = 'A'
(gdb) set {char}0x555555756264 = 'A'
(gdb) set {char}0x555555756265 = 'A'
(gdb) set {char}0x555555756266 = 'A'
(gdb) set {char}0x555555756267 = 'A'
(gdb) set {char}0x555555756268 = 'A'
(gdb)
(gdb) x/8xb dest
0x555555756260:	0x41	0x41	0x41	0x41	0x41	0x41	0x41	0x41

Check when the loop stops
n == 5

8	    for(i= 0; i < n && src[i] != '\0'; i++) #iter 1
(gdb) p i
$10 = 0
(gdb) p n
$11 = 5
(gdb) n
9	        dest[i] = src[i];
(gdb) n
8	    for(i= 0; i < n && src[i] != '\0'; i++)
(gdb) p i
$12 = 0
(gdb) n
9	        dest[i] = src[i];
(gdb) p i
$13 = 1
(gdb) n
8	    for(i= 0; i < n && src[i] != '\0'; i++)
(gdb) n
9	        dest[i] = src[i];
(gdb) p i
$14 = 2
(gdb) p dest
$15 = 0x555555756260 "heAAAAAA"
(gdb) n
8	    for(i= 0; i < n && src[i] != '\0'; i++)
(gdb) n
9	        dest[i] = src[i];
(gdb) p i
$16 = 3
(gdb) p dest
$17 = 0x555555756260 "helAAAAA"
(gdb) n
8	    for(i= 0; i < n && src[i] != '\0'; i++)
(gdb) n
9	        dest[i] = src[i];
(gdb) p i
$18 = 4
(gdb) p dest
$19 = 0x555555756260 "hellAAAA"
(gdb) n
8	    for(i= 0; i < n && src[i] != '\0'; i++)
(gdb) p i
$20 = 4
(gdb) n
12	    return dest;
(gdb) p i
$21 = 5
(gdb) p dest
$22 = 0x555555756260 "helloAAA" # no 
(gdb) print dest
$23 = 0x555555756260 "helloAAA"

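Because the number of characters to copy is n == 5, the for loop exits right after copying o, and dest is returned.

Conclusion:
If the argument n of strncpy is set to len, so that the \0 of src is never reached, strncpy does not append \0 for you, which leads to errors.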
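
The first case can also be reproduced outside gdb. The following is a minimal sketch, not the code used in the session above: the helper name copy_into_filled and the caller-supplied buffer are assumptions made for illustration. The buffer is pre-filled with 'A', as in the gdb session, so a missing terminator is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill buf with 'A', then copy at most n bytes of
 * src into it with strncpy. Returns buf so the result can be inspected. */
char *copy_into_filled(char *buf, size_t bufsize, const char *src, size_t n)
{
    assert(n <= bufsize);      /* never let strncpy write past buf */
    memset(buf, 'A', bufsize); /* sentinel bytes, independent of malloc */
    strncpy(buf, src, n);
    return buf;
}
```

With `char buf[8]` and `copy_into_filled(buf, sizeof buf, "hello", 5)`, the eight bytes are `helloAAA`, matching the gdb output: byte 5 is still 'A', so reading buf as a C string would run past the copied characters.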

Misunderstanding of the man page

If the argument n of strncpy is set to len+1

Set the length argument of mystrncpy to len + 1

mystrncpy(dest, src, len+1); // len+1 means we want to copy `\0` from src

Starting program: /home/ubuntu/course_jserv/week1/strxxx/strncpy

Breakpoint 2, str_copy (s=0x5555555547e4 "hello") at strncpy.c:35
35	    str = malloc(sizeof(len+1));
(gdb) n
36	    str = mystrncpy(str, s, len+1);
(gdb) p str
$9 = 0x555555756260 ""

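Check the values stored in the memory that str points to.
(Note that malloc(sizeof(len+1)) on line 35 above requests the size of the expression's type, not len+1 bytes; malloc(len + 1) was presumably intended.)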

(gdb) x/8xb str
0x555555756260:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00

They are all 0.

murmur: isn't malloc supposed to leave the allocated block uninitialized?
unknowntpo, Mon, Apr 20, 2020 7:18 AM

Set all 6 bytes that dest points to to 'A' to observe whether strncpy appends \0 automatically

Breakpoint 1, mystrncpy (dest=0x555555756260 "", src=0x5555555547e4 "hello", n=6) at debug.c:8
8	    for(i= 0; i < n && src[i] != '\0'; i++)
(gdb)
9	        dest[i] = src[i];
(gdb) set {char}0x555555756260 = 'A'
(gdb) set {char}0x555555756261 = 'A'
(gdb) set {char}0x555555756262 = 'A'
(gdb) set {char}0x555555756263 = 'A'
(gdb) set {char}0x555555756264 = 'A'
(gdb) set {char}0x555555756265 = 'A'
(gdb) p dest
$2 = 0x555555756260 "AAAAAA"

Next, find the value of i at which execution moves on to the second for-loop and \0 is written

(gdb) n
8	    for(i= 0; i < n && src[i] != '\0'; i++)
(gdb) n
9	        dest[i] = src[i];
(gdb) p i
$3 = 1
(gdb) n
8	    for(i= 0; i < n && src[i] != '\0'; i++)
(gdb) n
9	        dest[i] = src[i];
(gdb) n
8	    for(i= 0; i < n && src[i] != '\0'; i++)
(gdb) n
9	        dest[i] = src[i];
(gdb)
8	    for(i= 0; i < n && src[i] != '\0'; i++)
(gdb) p i
$4 = 3
(gdb) n
9	        dest[i] = src[i];
(gdb) n
8	    for(i= 0; i < n && src[i] != '\0'; i++)
(gdb) p i
$5 = 4
(gdb) p dest
$6 = 0x555555756260 "helloA"
(gdb) n
10	    for( ; i < n; i++) # jump to for loop to set '\0'
(gdb) p i
$7 = 5
(gdb) p src[i]
$8 = 0 '\000' # means we reach the '\0' in src
(gdb) p dest
$9 = 0x555555756260 "helloA"

We find that at i == 5, because src[i] == '\0', the termination condition of the first loop is met and execution moves on to the next loop,

(gdb) n
11	        dest[i] = '\0'; # add '\0' to dest[5]
(gdb) p dest
$10 = 0x555555756260 "helloA"
(gdb) p i
$11 = 5
(gdb) n
10	    for( ; i < n; i++)
(gdb) p dest
$12 = 0x555555756260 "hello" # means it's null terminated
(gdb) x/6xb dest # check the memory of dest
0x555555756260:	0x68	0x65	0x6c	0x6c	0x6f	0x00

Conclusion:
If the count passed to strncpy is len + 1,
strncpy writes the \0 for you.

From the Linux man page STRCPY(3):

If the length of src is less than n, strncpy() writes additional null bytes to dest to ensure that a total of n bytes are written.
This should be read as:
if src is shorter than n, strncpy() writes as many extra \0 bytes to dest as needed so that n bytes in total are written to dest.
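
The padding rule quoted above can be checked by counting how many null bytes end up in the first n bytes of the destination. This is a minimal sketch under stated assumptions: the helper name nul_bytes_written and the 'A' pre-fill are illustrative choices, not part of the original experiment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: pre-fill buf with 'A', run strncpy with the given n,
 * and return how many of the first n bytes of buf are '\0' afterwards. */
size_t nul_bytes_written(char *buf, size_t bufsize, const char *src, size_t n)
{
    assert(n <= bufsize);
    memset(buf, 'A', bufsize);
    strncpy(buf, src, n);

    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (buf[i] == '\0')
            count++;
    return count;
}
```

For src "hello" and an 8-byte buffer, n = 5 gives 0 null bytes, n = 6 gives 1 (the terminator), and n = 8 gives 3, since every byte from index 5 up to n - 1 is padded.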

So the original

 // copy the string
    strncpy(str, s, len);
    str[len] = '\0';
    newh->value = str;

can be changed to

// copy the string
    strncpy(str, s, len + 1);
    newh->value = str;
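
Put together with the allocation, the revised copy might look like the sketch below. The helper name copy_string is hypothetical and is not taken from lab0-c; the sketch assumes the buffer is allocated with exactly len + 1 bytes, so that n = len + 1 never exceeds the destination size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of s, or NULL if malloc fails.
 * The caller owns the result and must free() it. */
char *copy_string(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    char *str = malloc(len + 1); /* len characters plus the terminator */
    if (!str)
        return NULL;
    /* n = len + 1 is greater than strlen(s), so strncpy writes the '\0' */
    strncpy(str, s, len + 1);
    return str;
}
```

Since the length is already known at this point, memcpy(str, s, len + 1) would produce the same bytes.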

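(To be completed) Analysis of the FB Q&A:

I think the size_t n parameter of strncpy is meant to limit the size of dest, not to describe the length of src.
The note's "len+1 means we want to copy \0 from src" is a misunderstanding: strncpy copies at most n characters, and if fewer than n are copied, it fills in null bytes for you. See the man page:
"If the length of src is less than n, strncpy() writes additional null bytes to dest to ensure that a total of n bytes are written."
The text says "writes additional", not "copies from src".
Consider whether your original description still holds when the actual length of src is greater than n.
Reference
c-faq: Why does strncpy not always place a '\0' terminator in the destination string?
In practice, these are more common:
Approach 1:
dst = strdup(src); // the return value still needs to be checked
Approach 2:
size = strlen(src) + 1;
dst = malloc(size); // or use calloc()
strncpy(dst, src, size);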
陳東怡
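
The reply's point that n describes the destination can be made concrete with a minimal sketch. The helper name bounded_copy is hypothetical; it assumes the caller knows the destination size and accepts truncation when src is longer than the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into a buffer of dstsize bytes, truncating
 * if needed, and always leave dst null-terminated. */
char *bounded_copy(char *dst, size_t dstsize, const char *src)
{
    assert(dstsize > 0);
    strncpy(dst, src, dstsize - 1); /* at most dstsize - 1 bytes from src */
    dst[dstsize - 1] = '\0';        /* needed when src has dstsize - 1 or more chars */
    return dst;
}
```

With a 4-byte buffer and src "hello", the result is "hel": strncpy alone would have stopped after three characters without writing a terminator, which is the case the reply asks the author to consider.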

https://github.com/keith-packard/picolibc/blob/master/newlib/libc/string/strncpy

張辰謙, please read the man page side by side with the PicoLibc/Newlib implementation of strncpy to work out where your reading comprehension went wrong.
jserv (instructor)

The free(str) problem

In lab0-c q_insert_head():
after finishing the string copy and before returning, I tried to free str, a block I had allocated with malloc myself. Why does make test then print garbage and report errors?

$ make check
  CC	queue.o
  LD	qtest
./qtest -v 3 -f traces/trace-eg.cmd
cmd>
cmd> # Demonstration of queue testing framework
cmd> # Use help command to see list of commands and options
cmd> # Initial queue is NULL.
cmd> show
q = NULL
cmd> # Create empty queue
cmd> new
q = []
cmd> # Fill it with some values.  First at the head
cmd> ih dolphin
q = [UUUUUUUU����]
cmd> ih bear
q = [UUUUU���� �ޭCBV]
cmd> ih gerbil
q = [UUUUUUU���� �ޭCBV �ޭCBV]
cmd> # Now at the tail
cmd> it meerkat
Insertion of meerkat failed
q = [UUUUUUU���� �ޭCBV �ޭCBV]
cmd> it bear
Insertion of bear failed
q = [UUUUUUU���� �ޭCBV �ޭCBV]
cmd> # Reverse it
cmd> reverse
q = [UUUUUUU���� �ޭCBV �ޭCBV]
cmd> # See how long it is
cmd> size
Queue size = 3
q = [UUUUUUU���� �ޭCBV �ޭCBV]
cmd> # Delete queue.  Goes back to a NULL queue.
cmd> free
ERROR: Attempted to free unallocated block.  Address = 0x564243addef0
ERROR: Attempted to free unallocated or corrupted block.  Address = 0x564243addef0
ERROR: Corruption detected in block with address 0x564243addef0 when attempting to free it
ERROR: Attempted to free unallocated block.  Address = 0x564243addeb0
ERROR: Attempted to free unallocated block.  Address = 0x564243addeb0
ERROR: Attempted to free unallocated or corrupted block.  Address = 0x564243addeb0
ERROR: Corruption detected in block with address 0x564243addeb0 when attempting to free it
ERROR: Time limit exceeded.  Either you are in an infinite loop, or your code is too inefficient
q = NULL
ERROR: Freed queue, but 1 blocks are still allocated
cmd> # Exit program
cmd> quit
Freeing queue
ERROR: Freed queue, but 1 blocks are still allocated
Makefile:42: recipe for target 'check' failed
make: *** [check] Error 1

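Solution

Reply from the FB Q&A

On line 91 of q_insert_head(), after newh->value = str, newh->value and str point to the same place, so free(str) leaves
newh->value pointing at memory that has already been freed.
Also, on line 85 of q_insert_head, there is no need for str[len] = '\0'; after strncpy runs, because strncpy fills in '\0' for you.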
張佳鴻

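My interpretation

The memory block allocated dynamically to hold a copy of s was originally pointed to by str. After newh->value = str, both newh->value and str point to it, and it is now stored in the linked list, so this block must not be freed here.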
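
The ownership rule described above can be sketched as follows. The node type and the function names push_front and free_list are simplified, hypothetical stand-ins; the real lab0-c structures and signatures differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical node type; not the actual lab0-c definition. */
typedef struct node {
    char *value;
    struct node *next;
} node_t;

/* Push a copy of s in front of *head. Returns 1 on success, 0 on failure. */
int push_front(node_t **head, const char *s)
{
    assert(head != NULL && s != NULL);
    node_t *newh = malloc(sizeof(*newh));
    if (!newh)
        return 0;
    size_t len = strlen(s);
    char *str = malloc(len + 1);
    if (!str) {
        free(newh); /* nothing else refers to newh yet */
        return 0;
    }
    strncpy(str, s, len + 1);
    newh->value = str; /* the node now owns the block: do NOT free(str) here */
    newh->next = *head;
    *head = newh;
    return 1;
}

/* The string is released only when its node is destroyed. */
void free_list(node_t *head)
{
    while (head) {
        node_t *next = head->next;
        free(head->value);
        free(head);
        head = next;
    }
}
```

Calling free(str) inside push_front would leave newh->value dangling, and a later free of head->value would release the same block a second time, which is consistent with the "Attempted to free unallocated block" errors in the log.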

Instructor's addendum

Thanks to 張佳鴻 for the reply. Note the explanation in the Linux Programmer's Manual: "If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated." / "If the length of src is less than n, strncpy() writes additional null bytes to dest to ensure that a total of n bytes are written."

What I looked up

In the Linux Programmer's Manual:
"The stpncpy() and strncpy() functions copy at most len characters from
src into dst. If src is less than len characters long, the remainder of
dst is filled with `\0' characters. Otherwise, dst is not terminated.
The source and destination strings should not overlap, as the behavior is
undefined."