2017q3 Homework2 (prefix-search)

contributed by <jackyhobingo>

Reviewed by `kevin550029`

一開始測試程式滿詳盡的, 比較長的文字敘述稍微排版一下會增加可讀性
修正REF功能的部分請再補上程式碼並解釋做法

測試 prefix-search 程式的功能

測試 search 功能

測試 search words matching prefix 的功能，依照，輸入 T 搜尋到 1024 筆資料

$ ./test_cpy 

ternary_tree, loaded 259112 words in 0.080956 sec
...
choice: s
find words matching prefix (at least 1 char): T
  T - searched prefix in 0.000217 sec

suggest[0] : Tàrrega,
suggest[1] : Tàu,
...
suggest[1023] : Tasquillo,

再輸入 B ，也搜尋到 1024 筆資料

find words matching prefix (at least 1 char): B
  B - searched prefix in 0.001191 sec

suggest[0] : Bábolna,
suggest[1] : Bácsalmás,
... 
suggest[1023] : Balcón

輸入一個開頭比較少見的 X ，獲得 175 筆資料，目前都是正常情況

find words matching prefix (at least 1 char): X
  X - searched prefix in 0.000042 sec

suggest[0] : Xàtiva,
suggest[1] : Xánthi,
...
suggest[174] : Xylotymbou,

輸入 A ，發現程式記憶體區段錯誤

find words matching prefix (at least 1 char): A
程式記憶體區段錯誤 (core dumped)

輸入 Aa 看是不是 prefix 有 A 都有問題，結果程式執行正常

find words matching prefix (at least 1 char): Aa
  Aa - searched prefix in 0.000007 sec
suggest[0] : Aabenraa,
suggest[1] : Aach,  
...
suggest[24] : Aasiaat,

經過以上的測試，小小的結論是，searched prefix 會輸出最多 1024 筆資料，而目前在 prefix = A 時會發生錯誤。

測試 add, find, delete 的功能

找個字串來做測試，找找看 "jacky" 有沒有在裡面

choice: f
find word in tree: jacky
  jacky not found.

沒找到，add jacky to the tris

choice: a
enter word to add: jacky
  jacky - inserted in 0.000002 sec. (259113 words in tree)

添加後，測試是否可以找到

choice: f
find word in tree: jacky
  found jacky in 0.000001 sec.

測試刪除功能，測試刪去的資料是否真的找不到了

choice: d
enter word to del: jacky
  deleting jacky
  deleted jacky in 0.000004 sec
...
choice: f
find word in tree: jacky
  jacky not found.

目前功能都正常，想到之前測試的有些資料最後有逗號，想要知道 find 功能對其的支援為何，輸入在 search prefix T 時的資料 Tasquillo,

choice: f
find word in tree: Tasquillo,
  found Tasquillo, in 0.000002 sec.

可以找到有逗號的，那沒有逗號是否可以搜尋到

choice: f
find word in tree: Tasquillo 
  Tasquillo not found.

輸入 "Tasquillo" 沒辦法搜尋到 "Tasquillo,"

觀察原始資料

原始資料為，city, country，發現有些原始資料城市是超過兩個單字的長度， e.g. New York, Mexico City，但在searched prefix 時會發現輸入 New 並不會出現 New York，也就是說程式在讀取資料時，並沒有考慮到多單字組成的資料。

Shanghai, China
Buenos Aires, Argentina
Mumbai, India
Mexico City, Distrito Federal, Mexico
Karachi, Pakistan
İstanbul, Turkey
Delhi, India
Manila, Philippines
Moscow, Russia
Dhaka, Bangladesh
Seoul, South Korea
São Paulo, Brazil
Lagos, Nigeria
Jakarta, Indonesia
Tokyo, Japan
Zhumadian, China
New York, New York, United States
Taipei, Taiwan
...

尋找 search "A" 發生 Segfault 的原因

使用 gdb debug

$ gdb test_cpy
...
(gdb) run
...
choice: s
find words matching prefix (at least 1 char): A

Program received signal SIGSEGV, Segmentation fault.
0x0000000000401d81 in tst_suggest (p=0x8061c0, c=65 'A', nchr=1, a=0x7fffffffbb10, n=0x7fffffffbad0, max=1024)
    at tst.c:330
330	        a[(*n)++] = (char *) p->eqkid;

發現程式執行到 tst.c:330 行時會 crash
















void tst_suggest(const tst_node *p, 
                 const char c,
                 const size_t nchr,
                 char **a,
                 int *n, 
                 const int max)
{
    if (!p || *n == max)
       return;
    tst_suggest(p->lokid, c, nchr, a, n, max);
    if (p->key)
        tst_suggest(p->eqkid, c, nchr, a, n, max);
    else if (*(((char *) p->eqkid) + nchr - 1) == c)
        a[(*n)++] = (char *) p->eqkid;
    tst_suggest(p->hikid, c, nchr, a, n, max);
}

程式發生 crash 的條件是在第 N 層時執行 326 行時 *n = 1023 ，遞迴自己進入第 N+1 層，再一次執行 326 行時 p->lokid = NULL，所以進入 test_suggest 第 N+2 層會在 324 行滿足條件 return，第 N+1 層繼續往下進行條件剛好進入到 330 行 (*n)++，此時 331 行 *n = 1024，所以進入第 N+2 層 test_suggest 也會在 324 行滿足條件 retrun，第 N+1 層也在此時結束，回到第 N 層進行到 330 行，程式已經不符合原先期待，會對 a[1024] 做存取，並且隨著程式持續執行， *n 持續增加，持續存取到不正確的記憶體。

Fix function tst_suggest to prevent segfault

對應的 Git commit 在哪？
標題沒下好，請改善。連智商平均只有 30 的記者都知道標題的重要，你作為更優秀的理工人，當然要做更好
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
jserv

會增加 *n 的也只有第 330 行，只需要在 329 行的進入條件多增加 *n != max 即可修復此 bug
else if ( *(((char *) p->eqkid) + nchr - 1) == c && *n != max)
















void tst_suggest(const tst_node *p, 
                 const char c,
                 const size_t nchr,
                 char **a,
                 int *n, 
                 const int max)
{
    if (!p || *n == max)
       return;
    tst_suggest(p->lokid, c, nchr, a, n, max);
    if (p->key)
        tst_suggest(p->eqkid, c, nchr, a, n, max);
    else if (*(((char *) p->eqkid) + nchr - 1) == c && *n != max)
        a[(*n)++] = (char *) p->eqkid;
    tst_suggest(p->hikid, c, nchr, a, n, max);
}

修正 test_ref.c 使程式可以執行 REF 的功能

參考 st9007a 共筆

將 test_ref.c 的 CPY 都替換為 REF，並且將輸入的 word 都用 strdup 去讓每次輸入的 word 都有不同的記憶體位址。

參考資料

st9007a 共筆