Try   HackMD

2020q1 Homework1 (lab0)

contributed by < IepIweidieng >

開發環境準備

切碟 - 2020-02-27 17:00

我切 E。

切出 32 GiB。

載裝 Ubuntu - 2020-02-27 17:20

遭遇問題


404 - Not Found

載點 dead 了,換個。
我選元智。

404 - Not Found

每個載點點點,發現全都 dead。

問題的解決與探討

到 ftp 頁面裡面看了看,發現沒有 18.04.3 這個版本,不過有 18.04.4。

http://ftp.ubuntu-tw.org/mirror/ubuntu-releases/

Name               Last Modified          Size  Type
[...]
18.04/             2020-Feb-12 22:32:24    -    Directory
18.04.4/           2020-Feb-12 22:32:24    -    Directory

觀察 Last Modified 這個欄位,可以發現是 2 月初才剛更新。
可能是因為這個原因,剛才的下載點列表還沒更新,所以裡面的連結才會失效。

http://ftp.ubuntu-tw.org/mirror/ubuntu-releases/18.04/

Name                                          Last Modified           Size   Type
Parent Directory/                                                     -      Directory
[...]
SHA1SUMS                                      2020-Feb-12 22:32:05    0.1K   application/octet-stream
SHA1SUMS.gpg                                  2020-Feb-12 22:32:41    0.9K   application/octet-stream
SHA256SUMS                                    2020-Feb-12 22:32:24    0.2K   application/octet-stream
SHA256SUMS.gpg                                2020-Feb-12 22:32:44    0.9K   application/octet-stream
ubuntu-18.04.4-desktop-amd64.iso              2020-Feb-04 02:40:24    1.9G   application/x-iso9660-image
ubuntu-18.04.4-desktop-amd64.iso.torrent      2020-Feb-12 21:41:56   79.5K   application/x-bittorrent
ubuntu-18.04.4-desktop-amd64.iso.zsync        2020-Feb-12 21:41:52    3.9M   application/octet-stream
ubuntu-18.04.4-desktop-amd64.list             2020-Feb-04 02:40:25    7.8K   application/octet-stream
[後略...]

開啟 WSL (Windows Subsystem for Linux),檢查版本 - 2020-02-27 17:40

因為離下載完成還要很久,先用 WSL (WSL 1) 開始本次作業。
安裝 WSL 的說明:https://docs.microsoft.com/zh-tw/windows/wsl/install-win10

我先前就已準備好 WSL 環境,所以將其直接開啟。

看看它的版本。

$ uname -a
Linux IID-PC-acer-N17Q3 4.4.0-18362-Microsoft #476-Microsoft Fri Nov 01 16:53:00 PST 2019 x86_64 x86_64 x86_64 GNU/Linux

查看我電腦的規格。

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  1
Core(s) per socket:  2
Socket(s):           1
Vendor ID:           AuthenticAMD
CPU family:          21
Model:               112
Model name:          AMD A9-9420 RADEON R5, 5 COMPUTE CORES 2C+3G
Stepping:            0
CPU MHz:             3000.000
CPU max MHz:         3000.0000
BogoMIPS:            6000.00
Virtualization:      AMD-V
Hypervisor vendor:   Windows Subsystem for Linux
Virtualization type: container
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave osxsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm perfctr_core perfctr_nb bpext ptsc mwaitx fsgsbase bmi1 avx2 smep bmi2

設定開發工具 - 2020-02-27 17:50

$ sudo aptitude install build-essential git-core valgrind cppcheck clang-format aspell lordiff valgrind
$ cppcheck --version
Cppcheck 1.82

太舊,不要了。

$ sudo aptitude remove cppcheck
$ sudo snap install cppcheck
Interacting with snapd is not yet supported on Windows Subsystem for Linux.
This command has been left available for documentation purposes only.

看來需要自行編譯 cppcheck

自行編譯及安裝 cppcheck - 2020-02-27 18:03

$ cd ~
$ git clone git://github.com/danmar/cppcheck.git
$ cd cppcheck/
$ ls
AUTHORS             cmake                     gui              requirements.txt
CMakeLists.txt      console_common.pri        htmlreport       rules
COPYING             cppcheck-errors.rng       lib              runastyle
Cppcheck.xcodeproj  cppcheck.cbp              man              runastyle.bat
Makefile            cppcheck.cppcheck         naming.json      samples
addons              cppcheck.sln              oss-fuzz         snap
appveyor.yml        createrelease             philosophy.md    test
benchmarks.txt      cve-test-suite            platforms        tools
build-pcre.txt      democlient                pylintrc_travis  webreport.sh
build.bat           doxyfile                  readme.md        win_installer
cfg                 externals                 readme.txt
cli                 generate_coverage_report  readmeja.md
$ vim readme.md

看了說明後,我決定使用 cmake 編譯。

$ mkdir build/
$ cd build/
$ cmake .. -DHAVE_RULES=ON -DUSE_MATCHCOMPILER=ON
$ cmake --build .

編好了可以安裝了。

$ sudo checkinstall
**********************************************************************

 Done. The new package has been installed and saved to

 /home/iid/cppcheck/build/build_20200227-1_amd64.deb

 You can remove it from your system anytime using:

      dpkg -r build

**********************************************************************

終於裝好了,那我們可以開始了。

queue 開發過程

演算法構想 - 2020-02-27 23:08

O(1)q_insert_tail() & q_size()

如果我們只知道 list 的開頭,那麼 q_insert_tail() 需要找出 list 的 tail 才能完成;q_size() 則需要從頭 travel 到 tail 才行。

但如果把 traveling 完的結果與位置記起來,只要這個結果本身不改變,這個位置不移動,就不必再 traveling 到這裡。

我們可以試著把 traveling 的結果 (對應 tail) 與位置 (對應 size) 記起來。

我們可以在一開始 travel 整個 list,記下結果。
不過,通常整個 list 在一開始都會是空的,所以沒有 tail,而 size 為 0。這時如果去 travel,一開始就結束了。

然而,我們想要處理的是會改變的 list。
為了讓記起來的結果保持正確,需要在結果改變時,更新我們之前記住的東西。

可能需要更新 tail 的場合:

  • Tail 消失了
    • List 被 freed
      • q_free()
        • 不需要特別處理
    • Tail 被 removed
      • q_remove_head(),且 head 為 tail
        • 需要從前往後 travel 找新的 tail
  • Tail appears 或被 appended 了
    • List 被 new-ed
      • q_new() (視 list root 為 tail)
        • 將 list root 設為 tail
    • 新 node 被 inserted 在 tail 後
      • q_insert_tail()
      • q_insert_head(),且 list 為空
        • 將新 inserted 的 node 設為新的 tail
  • Tail 往前移動了
    • q_reverse()
    • q_sort()

可能需要更新 size 的場合:

  • Node 變多了
    • Node 被 inserted
      • q_insert_tail()
      • q_insert_head()
        • 加大 size
  • Node 變少了,或是變不見了
    • List 被 freed
      • q_free()
        • 不需要特別處理
    • Node 被 removed
      • q_remove_head()
        • 減小 size

當 list 被 free,它的存在便不成立,它的頭尾、它的長度,都失去意義,不再需要我們更新從前的記憶。

Tail 消失了 需要進行 traveling 來更新 tail,通常這會造成 O(n) 的時間複雜度。
但是只有 q_remove_head() 在 head 為 tail 時需要 traveling 來更新 tail,而 head 為 tail 時,list 長度為 1;需要 travel 的 node 數量固定了,因此時間複雜度降為 O(1)

而 size 不需要位置資訊,只要在 insert 和 remove 時跟著更新就行,size 取得操作的時間複雜度可為 O(1)

實作 - 2020-02-28 18:14

首先先把 q_new()q_free() 寫好。

q_new()

Declaration:

queue_t *q_new();

這是不正確的。這是 C 不是 C++。

不像 C++,C 函數宣告的 parameter list 留空的話,會允許 pass 任意 type 與數量的 arguments 進去。
所以在 C 下,我這樣呼叫也是可以的:

q_new("" L" 4" " 2" L" ", &06[L"L"],
    &(struct Struct {
        union Union {
            int long unsigned long (((i)));
            char (((byte))[sizeof(const long unsigned int long volatile)]);
        } *noinu;
    } const){&(union Union){.i = (9ULL),},});

這種函數宣告在 C 中叫做 K&R-style function declaration,是在 C 的 function prototype 出現前的函數宣告語法。
C++ 沒有 K&R-style function declaration,而是將空的 parameter list 視為沒有參數,改用 int func(...) 表示接受任意 type 與數量的 arguments 的函數

比較正確的作法是宣告成:

queue_t *q_new(void);

這是 C 與 C++ 都接受的語法。

註:2020-03-03 23:18

繼續研究後發現,C2x 不再接受 K&R-style function declaration。所以函數定義中的 int func()int func(void) 會是相等的,都不接受任何參數,變得與 C++ 相似。

但在函數宣告中,int func() 仍然不限制參數的 type 與數量,所以在 q_new() 的定義之前,或在沒有定義 q_new() 的檔案中,還是可以使用剛才的方法呼叫 q_new()

There are other issues within the field of function syntax, namely — declarations (that are not definitions) may omit parameter information completely, and []
The first, the property to provide declarations without prototypes, has also been marked obsolescent in the C standard, but still is relatively widely used. [] Therefore, we will not attempt to remove this specific feature here, this would need much more preparation and
discussion.
Remove support for function definitions with identifier lists proposal for C2x

相關資料:

1.1. Integrated changes:
[]
N2432. Remove support for function definitions with identifier lists

Definition:

/*
 * Create empty queue.
 * Return NULL if could not allocate space.
 */
queue_t *q_new()
{
    queue_t *q = malloc(sizeof(queue_t));
    if (q)
        q->head = NULL;
    return q;
}

q_free()

Declaration:

void q_free(queue_t *q);

Definition:

/* Free all storage used by queue */
void q_free(queue_t *q)
{
    if (!q)
        return;

    /* Free queue elements */
    for (list_ele_t *k = q->head; k;) {
        list_ele_t *const next = k->next;
        free(k->value);
        free(k);
        k = next;
    }
    /* Free queue structure */
    free(q);
}

不應該用 q_remove_head(),因為會不必要地設定 q->head 以及檢查要不要 copy k->value

接下來要考慮 insert_tail()size()O(1) 時間實作。

延續之前的討論,可以嘗試把 tail 和 size 記在 queue_t 中。

diff --git a/queue.h b/queue.h
index 3b48875..c67dd87 100644
--- a/queue.h
+++ b/queue.h
@@ -25,10 +25,8 @@ typedef struct ELE {
 /* Queue structure */
 typedef struct {
     list_ele_t *head; /* Linked list of elements */
-    /* TODO: You will need to add more fields to this structure
-     *        to efficiently implement q_size and q_insert_tail.
-     */
-    /* TODO: Remove the above comment when you are about to implement. */
+    list_ele_t *tail; /* The tail element of the list */
+    size_t size;      /* The size of the list */
 } queue_t;

 /* Operations on queue */

q_new() - revised

不過,通常整個 list 在一開始都會是空的,所以沒有 tail,而 size 為 0。這時如果去 travel,一開始就結束了。

list root 設為 tail

要這麼做的話,tail 的 type 得是 list_ele_t ** 才能指向 list root (queue_t::head)。
但我將 tail 的 type 設為了 list_ele_t *,所以不可行,因此將其直接設為 NULL

diff --git a/queue.c b/queue.c
index 48fef4e..f18cf9c 100644
--- a/queue.c
+++ b/queue.c
@@ -12,8 +12,11 @@
 queue_t *q_new()
 {
     queue_t *q = malloc(sizeof(queue_t));
-    if (q)
+    if (q) {
         q->head = NULL;
+        q->tail = NULL;
+        q->size = 0;
+    }
     return q;
 }

q_size()

Declaration:

int q_size(queue_t *q);

Definition:

/*
 * Return number of elements in queue.
 * Return 0 if q is NULL or empty
 */
int q_size(queue_t *q)
{
    return (q) ? q->size : 0;
}

我習慣把 ?: 的條件用 () 括起來。

如果 q 是空的話,q->size 會是 0,回傳 q->size 即可,不須額外處理。

q_insert_head() & q_insert_tail()

Declarations:

bool q_insert_head(queue_t *q, char *s);
bool q_insert_tail(queue_t *q, char *s);

Parameter s 被宣告為 char *,這是不好的。
因為這兩個函數都不會修改 s 指向的字串的內容(註解有提到要用複製的),所以應該宣告成 const char *

const char *char * 在 C 與 C++ 中都可以 implicit convert 為 const char * 型別,所以如果不需要修改 pass 進來的字串內容的話,完全可以宣告為 const char *。這可以提示其他看到這個函數宣告的寫程式者說:「你可以相信我不會改你給我的字串!」

另外,string literal 在 C 中的型別是 char [N],但在 C++ 中卻是 const char [N],所以如果有人使用:

q_insert_head(q, "aaq");

呼叫這個函數的話,這人用 C++ 編譯器編譯時,如果編譯器沒有相關的 extension,就會編譯失敗。

所以應該改成:

bool q_insert_head(queue_t *q, const char *s);
bool q_insert_tail(queue_t *q, const char *s);

Definitions:

我自己定義了函數 ele_alloc() 來新增 node。
注意我用 const char *s

/*
 * Return `NULL` if could not allocate space.
 * Return non-`NULL` if successful.
 * Argument `s` points to the string to be stored and should not be `NULL`.
 * Note: `newh->next` will not be initialized.
 */
static list_ele_t *ele_alloc(const char *s)
{
    /* Don't forget to allocate space for the string and copy it */
    list_ele_t *const newh = malloc(sizeof(list_ele_t));
    if (newh) {
        const size_t len = strlen(s) + 1;
        newh->value = malloc(len);
        if (newh->value) {
            memcpy(newh->value, s, len);
        } else {
            /* What if either call to malloc returns NULL? */
            free(newh);
            return NULL;
        }
    }
    return newh;
}

用了 strlen() 後就不需要 strcpy()strncpy() 了,應該使用 memcpy()。因為 strcpy() 以及 strncpy() 會去算字串長度,可是已經用 strlen() 算過了,會浪費時間。

由於 C Programming Lab 有提到:

For functions that provide strings as arguments, you must create and store a copy of the string by calling malloc to allocate space (remember to include space for the terminating character) and then copying from
the source to the newly allocated space.

所以我使用 malloc() + memcpy() 而非 strdup() 來複製字串。
另外,strdup() 並非 C 與 C++ 的標準函數,而是由 POSIX 標準規範的函數。

註:2020-03-04 01:27

strdup()strndup() 是 C2x 的標準函數了。

相關資料:

1.1. Integrated changes:
[]
N2353. the strdup and strndup functions

q_insert_head() 要注意 tail 可能會 appear。

/*
 * Attempt to insert element at head of queue.
 * Return true if successful.
 * Return false if q is NULL or could not allocate space.
 * Argument s points to the string to be stored.
 * The function must explicitly allocate space and copy the string into it.
 */
bool q_insert_head(queue_t *q, char *s)
{
    list_ele_t *newh;
    if (!q)
        return false;

    newh = ele_alloc(s);
    if (newh) {
        newh->next = q->head;
        q->head = newh;
        if (!q->tail)  // The tail will appear
            q->tail = newh;
        ++q->size;
        return true;
    }
    return false;
}

q_insert_tail() 則要注意 tail 會被 appended。
另外 q 為空時,head 會 appears。

注意沒有 head 時也不會有 tail,所以這裡用 else if (q->tail) 而不是單純的 if (q->tail) 以免去不必要的判斷。

/*
 * Attempt to insert element at tail of queue.
 * Return true if successful.
 * Return false if q is NULL or could not allocate space.
 * Argument s points to the string to be stored.
 * The function must explicitly allocate space and copy the string into it.
 */
bool q_insert_tail(queue_t *q, char *s)
{
    list_ele_t *newh;
    if (!q)
        return false;

    newh = ele_alloc(s);
    if (newh) {
        newh->next = NULL;
        if (!q->head)  // The head will appear
            q->head = newh;
        else if (q->tail)  // The original tail will be appended
            q->tail->next = newh;
        q->tail = newh;
        ++q->size;
        return true;
    }
    return false;
}

q_remove_head()

Declaration:

bool q_remove_head(queue_t *q, char *sp, size_t bufsize);

要注意如果 head 剛好是 tail 時,tail 會消失。

Definition:

/*
 * Attempt to remove element from head of queue.
 * Return true if successful.
 * Return false if queue is NULL or empty.
 * If sp is non-NULL and an element is removed, copy the removed string to *sp
 * (up to a maximum of bufsize-1 characters, plus a null terminator.)
 * The space used by the list element and the string should be freed.
 */
bool q_remove_head(queue_t *q, char *sp, size_t bufsize)
{
    list_ele_t *node;
    if (!q || !q->head)
        return false;

    node = q->head;
    if (sp) {
        const size_t len = strnlen(node->value, bufsize);
        memcpy(sp, node->value, len);
        if (len == bufsize && len > 0)
            sp[len - 1] = '\0';
    }
    free(node->value);

    q->head = q->head->next;
    if (node == q->tail)  // The tail will disappear
        q->tail = NULL;
    free(node);

    --q->size;
    return true;
}

strnlen() 是有步數限制的 strlen()
呼叫 strnlen(str, len) 時,如果找到 str[len - 1] 時都沒發現 '\0' 結尾,strnlen() 就會回傳 len。這可以防止 buffer overrun。
為了確保 sp'\0' 結尾,我在 strnlen() 傳回最大值時,就將 sp 強制加上 '\0' 結尾。

註:2020-03-04

查看了 man page 發現,strncpy() 在複製後會用 '\0' 填滿剩下的空間,所以理論上可以直接 sp[bufsize - 1] = '\0' 而不改變結果。
C11 的 strncpy_s() 在複製後只會包含 1 個 '\0',而不會用 '\0' 填滿剩下的空間,相較下比較有效率。

q_reverse()

Declaration:

void q_reverse(queue_t *q);

Definition:

/*
 * Reverse elements in queue
 * No effect if q is NULL or empty
 * This function should not allocate or free any list elements
 * (e.g., by calling q_insert_head, q_insert_tail, or q_remove_head).
 * It should rearrange the existing ones.
 */
void q_reverse(queue_t *q)
{
    list_ele_t *prev;

    if (!q || !q->head)
        return;

    /* Re-assign `k->next` */
    prev = NULL;
    for (list_ele_t *k = q->head; k;) {
        list_ele_t *const next = k->next;
        k->next = prev;
        prev = k;
        k = next;
    }
    /* `prev` is now the original tail */

    /* Swap `head` and `tail` */
    q->tail = q->head;
    q->head = prev;
}

q_sort() 算法構想

我使用了 recursion 版本的 merge sort 來實作 q_sort()
構想是:與 q1 測驗題中的 insertion sort 相似,但是讓 right 先走到 list 的中間,再去左右分別遞迴下去,這樣就是 merge sort。

在撰寫 q_sort() 的途中,發現了幾個可能值得探討之處:

  • 因為 list 長度 < 2 時,什麼都不作,不會遞迴,所以可以確定 leftright 被拿去遞迴時應當都不會是 NULL
    • 因為 list 長度 >= 2 時,將 list 對半分,兩邊都還會有東西。
    • 但我的函數的設計是要自己 pass list 長度進去。
      如果手動給的長度超過實際長度 2 倍的話,right 走到一半前會變 NULL
      但這時有 left 遞迴下去就足夠了,之後也不必合併 leftright 的 lists,所以可以直接回傳 left 遞迴下去的結果。
    • 因為可以確保在開始合併 leftright 的列表時,它們都不會是 NULL,所以可以在迴圈前就設定好 merge list 的開頭,不用在迴圈裡一直檢查 !merge
  • 我的函數在手動 pass 的 list 長度 < 4 時,會強制變成 insertion sort。
    • 後來推導後發現,對於長度小於 4 的 list,insertion sort 與 merge sort 的過程沒有分別。

Declaration:

void q_sort(queue_t *q);

Definitions:

我寫了另外的函數用以遞迴。

注意不要相信傳進來的 len,要讓函數在 len 給錯時還能正確完成操作。

/*
 * Sort `len` elements start with `head` in ascending order using merge sort
 * No effect if `head` is `NULL` or its member `next` is `NULL`.
 * The function reduces to insertion sort when `len` < 4
 */
static list_ele_t *ele_sort(list_ele_t *head, size_t len)
{
    list_ele_t *left;
    list_ele_t *right;
    list_ele_t *merge;

    if (!head || !head->next)
        return head;

    left = right = head;
    for (size_t k = 1, n = len / 2; k < n && right; ++k)
        right = right->next;
    if (right) {
        list_ele_t *const next = right->next;
        right->next = NULL;
        right = next;
    }

    left = ele_sort(left, len / 2);
    if (!right)  // `right` is `NULL` => `left` is the head of the sorted list
        return left;
    right = ele_sort(right, (len + 1) / 2);

    /* Now `left` and `right` are both the head of a non-`NULL` sorted list */
    {
        /* Set up the beginning of the merged list */
        list_ele_t **const phead =
            (strcmp(left->value, right->value) < 0) ? &left : &right;
        head = *phead;
        merge = *phead = (*phead)->next;
    }

    while (left || right) {
        if (!right || (left && strcmp(left->value, right->value) < 0)) {
            /* `left` should be linked from `merge` */
            merge->next = left;
        } else {
            /* `right` should be linked from `merge` */
            merge->next = right;
        }
        merge = merge->next;
    }

    return head;
}
/*
 * Sort elements of queue in ascending order
 * No effect if q is NULL or empty. In addition, if q has only one
 * element, do nothing.
 */
void q_sort(queue_t *q)
{
    if (!q || !q->head)
        return;

    ele_sort(q->head, q->size);
}

開始驗證 - 2020-02-28 22:05

$ make test
scripts/driver.py -c
---     Trace           Points
+++ TESTING trace trace-01-ops:
# Test of insert_head and remove_head
ERROR: Removed value dolphinXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX[後略...] != expected value dolphin
ERROR: Removed value bearXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX[後略...] != expected value bear
ERROR: Removed value gerbilXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX[後略...] != expected value gerbil
---     trace-01-ops    0/6
+++ TESTING trace trace-02-ops:
# Test of insert_head, insert_tail, and remove_head
ERROR: Removed value dolphinXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX[後略...] != expected value dolphin
ERROR: Removed value bearXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX[後略...] != expected value bear
ERROR: Removed value gerbilXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX[後略...] != expected value gerbil
ERROR: Removed value meerkatXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX[後略...] != expected value meerkat
ERROR: Removed value bearXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX[後略...] != expected value bear
Error limit exceeded.  Stopping command execution
---     trace-02-ops    0/6
+++ TESTING trace trace-03-ops:
# Test of insert_head, insert_tail, reverse, and remove_head
ERROR: Removed value squirrelXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX[後略...] != expected value squirrel
ERROR: Removed value gerbilXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX[後略...] != expected value gerbil
ERROR: Removed value bearXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX[後略...] != expected value bear
ERROR: Removed value meerkatXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX[後略...] != expected value meerkat
ERROR: Removed value gerbilXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX[後略...] != expected value gerbil
Error limit exceeded.  Stopping command execution
---     trace-03-ops    0/6
+++ TESTING trace trace-04-ops:
# Test of insert_head, insert_tail, size, and sort
Segmentation fault occurred.  You dereferenced a NULL or invalid pointer
---     trace-04-ops    0/6
+++ TESTING trace trace-05-ops:
# Test of insert_head, insert_tail, remove_head, reverse, size, and sort
ERROR: Removed value dolphinXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX[後略...] != expected value dolphin
Segmentation fault occurred.  You dereferenced a NULL or invalid pointer
---     trace-05-ops    0/5
+++ TESTING trace trace-06-string:
# Test of truncated strings
ERROR: Removed value aardvark_bear_dolphin_gerbil_jaguarXXX[後略...] != expected value aardvark_bear_dolphin_gerbil_jaguar
ERROR: Removed value meerkat_panda_squirrel_vulture_wolfXXX[後略...] != expected value meerkat_panda_squirrel_vulture_wolf
ERROR: Removed value aardvark_bear_dolphin_gerbil_jaguarXXX[後略...] != expected value aardvark_bear_dolphin_gerbil_jaguar
ERROR: Removed value meerkat_panda_squirrel_vulture_wolfXXX[後略...] != expected value meerkat_panda_squirrel_vulture_wolf
---     trace-06-string 0/6
+++ TESTING trace trace-07-robust:
# Test operations on NULL queue
---     trace-07-robust 6/6
+++ TESTING trace trace-08-robust:
# Test operations on empty queue
---     trace-08-robust 6/6
+++ TESTING trace trace-09-robust:
# Test remove_head with NULL argument
---     trace-09-robust 6/6
+++ TESTING trace trace-10-malloc:
# Test of malloc failure on new
---     trace-10-malloc 6/6
+++ TESTING trace trace-11-malloc:
# Test of malloc failure on insert_head
---     trace-11-malloc 6/6
+++ TESTING trace trace-12-malloc:
# Test of malloc failure on insert_tail
---     trace-12-malloc 6/6
+++ TESTING trace trace-13-perf:
# Test performance of insert_tail
---     trace-13-perf   6/6
+++ TESTING trace trace-14-perf:
# Test performance of size
---     trace-14-perf   6/6
+++ TESTING trace trace-15-perf:
# Test performance of insert_tail, size, reverse, and sort
Segmentation fault occurred.  You dereferenced a NULL or invalid pointer
---     trace-15-perf   0/6
+++ TESTING trace trace-16-perf:
# Test performance of sort with random and descending orders
# 10000: all correct sorting algorithms are expected pass
# 50000: sorting algorithms with O(n^2) time complexity are expected failed
# 100000: sorting algorithms with O(nlogn) time complexity are expected pass
Segmentation fault occurred.  You dereferenced a NULL or invalid pointer
---     trace-16-perf   0/6
+++ TESTING trace trace-17-complexity:
# Test if q_insert_tail and q_size is constant time complexity
Probably constant time
Probably constant time
---     trace-17-complexity     5/5
---     TOTAL           53/100

問題探討

q_remove()q_sort() 不正確。

q_remove() 複製字串時少複製了 '\0'

diff --git a/queue.c b/queue.c
index 2226509..35a38c1 100644
--- a/queue.c
+++ b/queue.c
@@ -128,11 +128,11 @@ bool q_remove_head(queue_t *q, char *sp, size_t bufsize)
         return false;

     node = q->head;
-    if (sp) {
-        const size_t len = strnlen(node->value, bufsize);
-        memcpy(sp, node->value, len);
-        if (len == bufsize && len > 0)
-            sp[len - 1] = '\0';
+    if (sp && bufsize) {
+        const size_t len = strnlen(node->value, bufsize - 1);
+        memcpy(sp, node->value, len + 1);
+        if (len + 1 == bufsize)
+            sp[len] = '\0';
     }
     free(node->value);
     

注意如果 bufsize0 時沒有東西可 copy。

q_sort() 則有幾個問題:

  • 在從 right 遞迴下去時,手動給的 len 可能會 overflow 變成 0 (size_t 是 unsigned type)。
  • 負責 merge 兩個 lists 的 pointer 一開始設定錯地方,跳到了合併的 list 的 head 的 next 上,而這個 head 才剛設定好,它的 next 還是合併前的 next,如果其 list 長度是 1 的話就是 NULL
  • 因為忘了改變兩個 lists 的未處理部分的開頭的 pointer,造成其中一個的 next 指向自己,造成無限迴圈。
@@ -209,24 +209,26 @@ static list_ele_t *ele_sort(list_ele_t *head, size_t len)
     left = ele_sort(left, len / 2);
     if (!right)  // `right` is `NULL` => `left` is the head of the sorted list
         return left;
-    right = ele_sort(right, (len + 1) / 2);
+    right = ele_sort(right, len / 2 + len % 2);

     /* Now `left` and `right` are both the head of a non-`NULL` sorted list */
     {
         /* Set up the beginning of the merged list */
         list_ele_t **const phead =
             (strcmp(left->value, right->value) < 0) ? &left : &right;
-        head = *phead;
-        merge = *phead = (*phead)->next;
+        merge = head = *phead;
+        *phead = (*phead)->next;
     }

     while (left || right) {
         if (!right || (left && strcmp(left->value, right->value) < 0)) {
             /* `left` should be linked from `merge` */
             merge->next = left;
+            left = left->next;
         } else {
             /* `right` should be linked from `merge` */
             merge->next = right;
+            right = right->next;
         }
         merge = merge->next;
     }

但解決了這兩個問題後,q_sort() 還是有錯誤,原因不確定,用 valgrind 找找看。

$ sudo aptitude install libc6-dbg
libc6-dbg is already installed at the requested version (2.27-3ubuntu1)
$ valgrind -q --leak-check=full ./qtest
==181== error calling PR_SET_PTRACER, vgdb might block
cmd> new
cmd> new
q = []
cmd> it RAND 16
cmd> it RAND 16
q = [tblipgjq hhutscu yblakru xjsjz wlwagfii ofwirvrl umjrxhrfm kbbwx zwhwos lhkpxzb ykamabi hrhitzevx gdkhb ahpblakjy hvpbvtyul zkfgmhpbi]
cmd> sort
cmd> sort
==181== Invalid read of size 8
==181==    at 0x109CD2: do_sort (qtest.c:561)
==181==    by 0x10B9D2: interpret_cmda (console.c:220)
==181==    by 0x10BF46: interpret_cmd (console.c:243)
==181==    by 0x10C6B5: cmd_select (console.c:607)
==181==    by 0x10C75C: run_console (console.c:628)
==181==    by 0x10AE81: main (qtest.c:772)
==181==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==181==
Segmentation fault occurred.  You dereferenced a NULL or invalid pointer
==181== 5 bytes in 1 blocks are still reachable in loss record 1 of 30
[後略...]

qtestecho option 沒關,所以會將使用者輸入再印一次)

$ vim qtest.c
:561
if (strcasecmp(e->value, e->next->value) > 0) {

e->next 很可疑。

    bool ok = true;
    if (q) {
        for (list_ele_t *e = q->head; e && --cnt; e = e->next) {
            /* Ensure each element in ascending order */
            /* FIXME: add an option to specify sorting order */
            if (strcasecmp(e->value, e->next->value) > 0) {
                report(1, "ERROR: Not sorted in ascending order");
                ok = false;
                break;
            }
        }
    }

找到問題了,因為 e->nexte 到達 list tail 時已經是 NULL 了,再 ->value 下去就會爆。

只要在 e->nextNULL 時停止迴圈就可以了。

diff --git a/qtest.c b/qtest.c
index 86d68fb..6ec8d68 100644
--- a/qtest.c
+++ b/qtest.c
@@ -554,8 +554,8 @@ bool do_sort(int argc, char *argv[])
     set_noallocate_mode(false);

     bool ok = true;
-    if (q) {
-        for (list_ele_t *e = q->head; e && --cnt; e = e->next) {
+    if (q && q->head) {
+        for (list_ele_t *e = q->head; e->next && --cnt; e = e->next) {
             /* Ensure each element in ascending order */
             /* FIXME: add an option to specify sorting order */
             if (strcasecmp(e->value, e->next->value) > 0) {

注意只要比較 size - 1 次就好;在 size 為 1 時則沒東西可比較。

註 - 2020-03-02 17:36:

重看了這裡的 code 發現,for 迴圈中的 --cnt 會讓迴圈最多執行 size - 1 步,這就足以防止 e 為 tail 時 e->nextNULL 的相關問題了。
所以這裡會 crash 是因為 q 本身出了問題。(見下)

再次驗證 - 2020-02-28 23:42

再試試看。

$ make test
scripts/driver.py -c
---     Trace           Points
[...]
+++ TESTING trace trace-04-ops:
# Test of insert_head, insert_tail, size, and sort
ERROR: Removed value dolphin != expected value bear
---     trace-04-ops    0/6
+++ TESTING trace trace-05-ops:
# Test of insert_head, insert_tail, remove_head, reverse, size, and sort
ERROR: Removed value gerbil != expected value bear
ERROR: Removed value meerkat != expected value bear
ERROR: Removal from queue failed (1 failures total)
ERROR: Removal from queue failed (2 failures total)
ERROR: Removal from queue failed (3 failures total)
Error limit exceeded.  Stopping command execution
---     trace-05-ops    0/5
[...]
+++ TESTING trace trace-15-perf:
# Test performance of insert_tail, size, reverse, and sort
ERROR: Freed queue, but 3999998 blocks are still allocated
---     trace-15-perf   0/6
+++ TESTING trace trace-16-perf:
# Test performance of sort with random and descending orders
# 10000: all correct sorting algorithms are expected pass
# 50000: sorting algorithms with O(n^2) time complexity are expected failed
# 100000: sorting algorithms with O(nlogn) time complexity are expected pass
ERROR: Freed queue, but 19998 blocks are still allocated
ERROR: Freed queue, but 119996 blocks are still allocated
ERROR: Freed queue, but 319994 blocks are still allocated
---     trace-16-perf   0/6
+++ TESTING trace trace-17-complexity:
# Test if q_insert_tail and q_size is constant time complexity
Probably constant time
Probably constant time
---     trace-17-complexity     5/5
---     TOTAL           77/100

看來這次是我的問題了。

因為想不到原因,所以從頭看了這份報告,然後看到這段:

可能需要更新 tail 的場合:
[]

  • Tail 往前移動了
    • q_reverse()
    • q_sort()

我想起來了,我還沒有設定 sort 完的 list 的 head 與 tail。

diff --git a/queue.c b/queue.c
index cbe1628..b25556a 100644
--- a/queue.c
+++ b/queue.c
@@ -183,19 +183,24 @@ void q_reverse(queue_t *q)
     q->head = prev;
 }

+typedef struct {
+    list_ele_t *head;
+    list_ele_t *tail;
+} list_span_t;
+
 /*
  * Sort `len` elements start with `head` in ascending order using merge sort
  * No effect if `head` is `NULL` or its member `next` is `NULL`.
  * The function reduces to insertion sort when `len` < 4
  */
-static list_ele_t *ele_sort(list_ele_t *head, size_t len)
+static list_span_t ele_sort(list_ele_t *head, size_t len)
 {
     list_ele_t *left;
     list_ele_t *right;
     list_ele_t *merge;

     if (!head || !head->next)
-        return head;
+        return (list_span_t){head, head};

     left = right = head;
     for (size_t k = 1, n = len / 2; k < n && right; ++k)
@@ -206,10 +211,13 @@ static list_ele_t *ele_sort(list_ele_t *head, size_t len)
         right = next;
     }

-    left = ele_sort(left, len / 2);
-    if (!right)  // `right` is `NULL` => `left` is the head of the sorted list
-        return left;
-    right = ele_sort(right, len / 2 + len % 2);
+    {
+        const list_span_t left_res = ele_sort(left, len / 2);
+        left = left_res.head;
+        if (!right)  // `left` is the head of the sorted list
+            return left_res;
+    }
+    right = ele_sort(right, len / 2 + len % 2).head;

     /* Now `left` and `right` are both the head of a non-`NULL` sorted list */
     {
@@ -233,7 +241,7 @@ static list_ele_t *ele_sort(list_ele_t *head, size_t len)
         merge = merge->next;
     }

-    return head;
+    return (list_span_t){head, merge};
 }

 /*
@@ -243,8 +251,11 @@ static list_ele_t *ele_sort(list_ele_t *head, size_t len)
  */
 void q_sort(queue_t *q)
 {
-    if (!q || !q->head)
+    list_span_t span;
+    if (!q || !q->head || !q->head->next)
         return;

-    ele_sort(q->head, q->size);
+    span = ele_sort(q->head, q->size);
+    q->head = span.head;
+    q->tail = span.tail;
 }

我讓 ele_sort() 一次回傳兩個值,分別是 list sort 完後的 head 與 tail。

/*
 * Sort elements of queue in ascending order
 * No effect if q is NULL or empty. In addition, if q has only one
 * element, do nothing.
 */

注意在 q 的 size 為 1 時,do nothing,所以這時我就不呼叫 ele_sort() 也不設定 head 與 tail 了。

第三次驗證 - 2020-02-29 00:00

$ make test
scripts/driver.py -c
---     Trace           Points
[...]
+++ TESTING trace trace-04-ops:
# Test of insert_head, insert_tail, size, and sort
---     trace-04-ops    6/6
+++ TESTING trace trace-05-ops:
# Test of insert_head, insert_tail, remove_head, reverse, size, and sort
---     trace-05-ops    5/5
[...]
+++ TESTING trace trace-15-perf:
# Test performance of insert_tail, size, reverse, and sort
---     trace-15-perf   6/6
+++ TESTING trace trace-16-perf:
# Test performance of sort with random and descending orders
# 10000: all correct sorting algorithms are expected pass
# 50000: sorting algorithms with O(n^2) time complexity are expected failed
# 100000: sorting algorithms with O(nlogn) time complexity are expected pass
---     trace-16-perf   6/6
+++ TESTING trace trace-17-complexity:
# Test if q_insert_tail and q_size is constant time complexity
Probably constant time
Probably constant time
---     trace-17-complexity     5/5
---     TOTAL           100/100

Natural sort 實作過程

實作解析 - 2020-02-29 15:22

我選擇參考 Ruby 版本的 natural comparison 實作,重寫 C 的版本。
https://github.com/sourcefrog/natsort/blob/master/natcmp.rb

先看看 Ruby 版本是怎麼做的?

class String

# 'Natural order' comparison of two strings
def String.natcmp(str1, str2, caseInsensitive=false)

作者將這個 method 直接 implement 進 Ruby 的內建 class String 裡。
提供了不區分大小寫的選項;不給時則分大小寫。

注意這不是 instance method (要寫成 def self.natcmp(str2, caseInsensitive=false)),所以只能用 String.natcmp "sth" "else" 的形式呼叫,不能用 "sth".natcmp "else" 的形式來呼叫。

	str1, str2 = str1.dup, str2.dup

複製字串,因為後面會去除空白字元,以及可能要轉小寫。(Ruby 可以省略 method 呼叫的 ()

這種 assignment 的語法叫 destructing assignment。
C++17 有個類似的語法叫 structured binding declaration

auto [buf1, buf2] = std::make_tuple(strdup(argv[1]), strdup(argv[2]));

接下來,

	compareExpression = /^(\D*)(\d*)(.*)$/

這是用以分開字串的 regular expression。
Natural comparison 要分成「非數字」、「數字」、以及「餘下部分」作比較。
這個 regular expression 中的三個 groups 就分別對應這三個部分。

	if caseInsensitive
		str1.downcase!
		str2.downcase!
	end

如果不分大小寫,轉小寫。(! 結尾的 Ruby method 通常表示「破壞性的」,會直接變更原本的物件,所以前面用 .dup 複製,避免改到原本的物件)

	# Remove all whitespace
	str1.gsub!(/\s*/, '')
	str2.gsub!(/\s*/, '')

去除空白字元。

	while (str1.length > 0) or (str2.length > 0) do

注意要不斷地比較餘下部分(一開始是整個字串)。

		# Extract non-digits, digits and rest of string
		str1 =~ compareExpression
		chars1, num1, str1 = $1.dup, $2.dup, $3.dup

		str2 =~ compareExpression
		chars2, num2, str2 = $1.dup, $2.dup, $3.dup

str1/str2 分別用 regular expression match,matched 的 3 個 groups 分別 assign 給代表開頭非數字部分的 chars1/chars2、代表中間連續數字部分的num1/num2、以及代表餘下字串的 str1/str2

發現問題

這裡使用 .dup 的原因,可能是因為作者認為進行 matching (=~) 時,Ruby 會直接改變 matching group $1/$2/$3/ 所 reference 的字串物件的內容,而不是將這些 variables reference 到新 allocate 出來的字串。

用類似 C 的 pseudo code 表示作者認為的樣子:

void *$3 = str_new("");  // Initial value of `$3`
str_cpy($3, str_new("/* sth */"));  // `str1 =~ compareExpression`
void *str1 = str_dup($3);  // `str1 = $3.dup`
str_cpy($3, str_new("/* sth else */"));  // `str2 =~ compareExpression`
void *str2 = str_dup($3);  // `str2 = $3.dup`

不過用 Ruby 2.6.5 測試,Ruby 的做法是將這些 variables $1/$2/$3/ reference 到新 allocate 出來的字串,而不會改變原先字串的內容,所以這裡並不需要使用 .dup

用類似 C 的 pseudo code 表示實際執行的樣子:

void *$3 = NULL;  // Initial value of `$3`
$3 = str_new("/* sth */");  // `str1 =~ compareExpression`
void *str1 = str_dup($3);  // `str1 = $3.dup`
$3 = str_new("/* sth else */");  // `str2 =~ compareExpression`
void *str2 = str_dup($3);  // `str2 = $3.dup`

可以看到兩個 .dup (這裡用 str_dup() 表示對應的 C 函數) 都是不必要的。

註:2020-03-04 01:59

節錄 Ruby 對 Pre-defined variables 的說明:

$1, $2
Contains the subpattern from the corresponding set of parentheses in the last successful pattern matched, [], or nil if the last pattern match failed. (Mnemonic: like \digit.) These variables are all read-only.
Pre-defined variables

$1/$2/$3/ 可以 nil,而不是一直都是字串,所以應該是改變 reference 的字串,而不是改變原字串內容。

These variables are all read-only

這又是怎麼回事?

實際測試一下:

irb(main):001:0> "asdf" =~ /(\w{2})/
=> 0
irb(main):002:0> $1
=> "as"
irb(main):003:0> $1[0] = "i"; $1
=> "as"

直接從 $1 改其所 reference 的字串內容會失敗。

irb(main):004:0> a = $1
=> "as"
irb(main):005:0> a.equal? $1
=> false
irb(main):006:0> b = $1
=> "as"
irb(main):007:0> b.equal? a
=> false
irb(main):008:0> b[0] = "i"; b
=> "is"
irb(main):009:0> a
=> "as"
irb(main):010:0> $1
=> "as"

發現嘗試 reference 到 $1 所指的字串時,會 reference 到複製出來的字串。所以無法去更改 $1 所 reference 的字串內容。

繼續剛才的 code。

		# Compare the non-digits
		case (chars1 <=> chars2)

Ruby 的 <=> 是比較大小的意思,大於時會給 1,相等時給 0,小於時給 -1
有點像是 strcmp()但 C/C++ 標準沒有規定 strcmp() 的值必須是 -10、或 1 之一
C++20 也有 <=>,不過strcmp() 一樣沒有限制值域

			when 0 # Non-digits are the same, compare the digits...
				# If either number begins with a zero, then compare alphabetically,
				# otherwise compare numerically
				if (num1[0] != 48) and (num2[0] != 48)
					num1, num2 = num1.to_i, num2.to_i
				end

				case (num1 <=> num2)
					when -1 then return -1
					when 1 then return 1
				end

48 是字元 '0' 的 ASCII 值;在 Ruby 中沒有「字元」,只有「字串」,不能寫 '0',這在 Ruby 中是字串。比較好的寫法是 "0".ord

先比較非數字部分,一樣的話 ((chars1 <=> chars2) == 0) 再比較數字部分,如果數字部分也一樣的話,就給下一圈迴圈比較。

如果兩個字串的數字部分都不以 '0' 開頭,就轉成數字比較;但如果有一個的數字部分以 '0' 開頭,就使用字串比較。

			when -1 then return -1
			when 1 then return 1
		end # case

一旦遇到非數字部分不一樣時就 return

	end # while

	# Strings are naturally equal
	return 0
end

迴圈裡都沒 return 就沒不同。

end # class String

比較方法的選擇 - 2020-02-29 17:14

前面的 qtest.cdo_sort() 處有一段。

            /* FIXME: add an option to specify sorting order */

所以要使用 option 來指定 comparison 的方法。

然後是作業要求:

修改排序所用的比較函式,變更為 natural sort,在 “simulation” 也該做對應的修改,得以反映出 natural sort 的使用。

rg (ripgrep) 稍微找了一下,還暫時看不出 在 “simulation” 也該做對應的修改 的意思,可能是在 simulation mode 下要給預設的 comparison 方法。

構想與實作 - 2020-03-01 20:30

使用 add_param() 增加 comparison method 的參數 (int),然後在 do_sort() 時根據這個參數取得對應的 comparison function 進行 sort。

不應該從 queue.o 存取 qtest.o 裡的全域變數,因為這樣是不太好的設計,會造成 queue.o 的 link dependency。
所以 q_sort() 以及 ele_sort() 應該要增加指定 comparison function 用的 parameter,避免依賴 qtest.o 中的全域變數。

首先增加 compare.hcompare.c

compare.h:

#ifndef LAB0_COMPARE_H
#define LAB0_COMPARE_H

#include <string.h>  /* strcmp */
#include <strings.h> /* strcasecmp */

typedef int (*cmp_func_t)(const char *s1, const char *s2);

cmp_func_t cmp_get_func(int index);
const char *cmp_get_func_str(int index);

/* Name of the comparison functions */
#define CMP_FUNC_NAME_0 strcasecmp
#define CMP_FUNC_NAME_1 strcmp
#define CMP_FUNC_NAME_2 negstrcasecmp
#define CMP_FUNC_NAME_3 negstrcmp

/* Function name of the comparison functions */
#define CMP_FUNC_NAME(index) CMP_FUNC_NAME_##index
#define CMP_FUNC_STR(index) CMP_STR(CMP_FUNC_NAME(index))
/* Enumeration name of the comparison functions */
#define CMP_ENUM_NAME(index) CMP_CAT(k_, CMP_FUNC_NAME_##index)

/* Concatenate with expansion */
#define CMP_CAT(x, ...) CMP_CAT_NOEXPAND(x, __VA_ARGS__)
#define CMP_CAT_NOEXPAND(x, ...) x##__VA_ARGS__

/* Stringify with expansion */
#define CMP_STR(...) CMP_STR_NOEXPAND(__VA_ARGS__)
#define CMP_STR_NOEXPAND(...) #__VA_ARGS__

/* Define macro `CMP_EXPAND_FMT(n)` first and then invoke `CMP_EXPAND()` */

#define CMP_EXPAND() \
    CMP_EXPAND_FMT(0) CMP_EXPAND_FMT(1) CMP_EXPAND_FMT(2) CMP_EXPAND_FMT(3)

#define CMP_EXPAND_FMT(n) CMP_ENUM_NAME(n) = n,
typedef enum { CMP_EXPAND() } cmp_idx_t;
#undef CMP_EXPAND_FMT

int negstrcasecmp(const char *s1, const char *s2);
int negstrcmp(const char *s1, const char *s2);

#endif /* LAB0_COMPARE_H */

CMP_FUNC_NAME(index) 以及 CMP_FUNC_STR(index) 是用來從指定的 index 取得函數名稱的 macros。
其中 CMP_FUNC_NAME() 取得的是 function designator,可用於呼叫;CMP_FUNC_STR() 取得的是字串,可用於顯示。
cmp_func_t cmp_get_func(int index) 以及 const char *cmp_get_func_str(int index) 則分別是 macros CMP_FUNC_NAME(index)CMP_FUNC_STR(index) 所對應的 C 函數版本。

CMP_ENUM_NAME(index) 則是用來取得 index 所對應的 enum cmp_idx_t 的元素的名稱。

CMP_EXPAND() 要配合 #define CMP_EXPAND_FMT(n) 使用,使用上就像是 foreach 的感覺。

如果想要新增比較函數 ninecmp(),對應的 index 為 9,只要 #define CMP_FUNC_NAME_9 ninecmp,然後在 #define CMP_EXPAND() 後面加上 CMP_EXPAND_FMT(9) 就行了。我認為很方便。

compare.c:

#include "compare.h"

#define CMP_EXPAND_FMT(n)  \
    case CMP_ENUM_NAME(n): \
        return CMP_FUNC_NAME(n);
cmp_func_t cmp_get_func(int index)
{
    switch (index) {
    default:
        CMP_EXPAND()
    }
}
#undef CMP_EXPAND_FMT

#define CMP_EXPAND_FMT(n)  \
    case CMP_ENUM_NAME(n): \
        return CMP_FUNC_STR(n);
const char *cmp_get_func_str(int index)
{
    switch (index) {
    default:
        CMP_EXPAND()
    }
}
#undef CMP_EXPAND_FMT

int negstrcasecmp(const char *s1, const char *s2)
{
    return -strcasecmp(s1, s2);
}

int negstrcmp(const char *s1, const char *s2)
{
    return -strcmp(s1, s2);
}

negstrcasecmp() 以及 negstrcmp() 是我增加的字串比較函數,作用是在 sort 的時候可以由大排到小。

接著把 compare.o 加進 Makefile

diff --git a/Makefile b/Makefile
index 172a41f..34ed55e 100644
--- a/Makefile
+++ b/Makefile
@@ -25,7 +25,7 @@ $(GIT_HOOKS):
        @scripts/install-git-hooks
        @echo

-OBJS := qtest.o report.o console.o harness.o queue.o \
+OBJS := qtest.o report.o console.o harness.o queue.o compare.o \
         random.o dudect/constant.o dudect/fixture.o dudect/ttest.o
 deps := $(OBJS:%.o=.%.o.d)

然後把 q_sort() 加上用來指定比較函數的 parameter。

diff --git a/queue.c b/queue.c
index b25556a..26e5737 100644
--- a/queue.c
+++ b/queue.c
@@ -2,6 +2,7 @@
 #include <stdlib.h>
 #include <string.h>

+#include "compare.h"
 #include "harness.h"
 #include "queue.h"

@@ -193,7 +194,7 @@ typedef struct {
  * No effect if `head` is `NULL` or its member `next` is `NULL`.
  * The function reduces to insertion sort when `len` < 4
  */
-static list_span_t ele_sort(list_ele_t *head, size_t len)
+static list_span_t ele_sort(list_ele_t *head, size_t len, cmp_func_t cmp)
 {
     list_ele_t *left;
     list_ele_t *right;
@@ -212,24 +213,24 @@ static list_span_t ele_sort(list_ele_t *head, size_t len)
     }

     {
-        const list_span_t left_res = ele_sort(left, len / 2);
+        const list_span_t left_res = ele_sort(left, len / 2, cmp);
         left = left_res.head;
         if (!right)  // `left` is the head of the sorted list
             return left_res;
     }
-    right = ele_sort(right, len / 2 + len % 2).head;
+    right = ele_sort(right, len / 2 + len % 2, cmp).head;

     /* Now `left` and `right` are both the head of a non-`NULL` sorted list */
     {
         /* Set up the beginning of the merged list */
         list_ele_t **const phead =
-            (strcmp(left->value, right->value) < 0) ? &left : &right;
+            (cmp(left->value, right->value) < 0) ? &left : &right;
         merge = head = *phead;
         *phead = (*phead)->next;
     }

     while (left || right) {
-        if (!right || (left && strcmp(left->value, right->value) < 0)) {
+        if (!right || (left && cmp(left->value, right->value) < 0)) {
             /* `left` should be linked from `merge` */
             merge->next = left;
             left = left->next;
@@ -246,16 +247,17 @@ static list_span_t ele_sort(list_ele_t *head, size_t len)

 /*
  * Sort elements of queue in ascending order
- * No effect if q is NULL or empty. In addition, if q has only one
+ * No effect if `q` is `NULL` or empty. In addition, if `q` has only one
  * element, do nothing.
+ * Argument `cmp` should not be `NULL`.
  */
-void q_sort(queue_t *q)
+void q_sort(queue_t *q, cmp_func_t cmp)
 {
     list_span_t span;
     if (!q || !q->head || !q->head->next)
         return;

-    span = ele_sort(q->head, q->size);
+    span = ele_sort(q->head, q->size, cmp);
     q->head = span.head;
     q->tail = span.tail;
 }
diff --git a/queue.h b/queue.h
index c67dd87..c0d3bcb 100644
--- a/queue.h
+++ b/queue.h
@@ -11,6 +11,8 @@
 #include <stdbool.h>
 #include <stddef.h>

+#include "compare.h"
+
 /* Data structure declarations */

 /* Linked list element (You shouldn't need to change this) */
@@ -91,6 +93,6 @@ void q_reverse(queue_t *q);
  * No effect if q is NULL or empty. In addition, if q has only one
  * element, do nothing.
  */
-void q_sort(queue_t *q);
+void q_sort(queue_t *q, cmp_func_t cmp);

 #endif /* LAB0_QUEUE_H */

qtest.c 中增加全域變數 static int cmp_func_idx = 0;,然後用 add_param() 增加 compare 選項,並寫個 setter compare_setter() 來告訴使用者,比較函數換成了哪一個。

接著在 do_sort() 裡取得比較函數,用以傳給 q_sort() 以及驗證 sort 的結果。

diff --git a/qtest.c b/qtest.c
index 6ec8d68..c4dc7f8 100644
--- a/qtest.c
+++ b/qtest.c
@@ -6,7 +6,6 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
-#include <strings.h> /* strcasecmp */
 #include <sys/stat.h>
 #include <sys/wait.h>
 #include <time.h>
@@ -31,6 +30,8 @@
  */
 #include "queue.h"

+#include "compare.h" /* comparison functions */
+
 #include "console.h"
 #include "report.h"

@@ -62,6 +63,9 @@ static int string_length = MAXSTRING;
 #define MAX_RANDSTR_LEN 10
 static const char charset[] = "abcdefghijklmnopqrstuvwxyz";

+/* Comparison method */
+static int cmp_func_idx = 0;
+
 /* Forward declarations */
 static bool show_queue(int vlevel);
 static bool do_new(int argc, char *argv[]);
@@ -77,6 +81,11 @@ static bool do_show(int argc, char *argv[]);

 static void queue_init();

+static void compare_setter(int oldval)
+{
+    printf("Will use %s for sorting.\n", cmp_get_func_str(cmp_func_idx));
+}
+
 static void console_init()
 {
     add_cmd("new", do_new, "                | Create new queue");
@@ -104,6 +113,11 @@ static void console_init()
               NULL);
     add_param("fail", &fail_limit,
               "Number of times allow queue operations to return false", NULL);
+#define CMP_EXPAND_FMT(n) " " #n ": " CMP_FUNC_STR(n)
+    add_param("compare", &cmp_func_idx,
+              "Comparison function to be used (default: 0)" CMP_EXPAND(),
+              compare_setter);
+#undef CMP_EXPAND_FMT
 }

 static bool do_new(int argc, char *argv[])
@@ -547,9 +561,10 @@ bool do_sort(int argc, char *argv[])
         report(3, "Warning: Calling sort on single node");
     error_check();

+    const cmp_func_t cmp = cmp_get_func(cmp_func_idx);
     set_noallocate_mode(true);
     if (exception_setup(true))
-        q_sort(q);
+        q_sort(q, cmp);
     exception_cancel();
     set_noallocate_mode(false);

@@ -557,8 +572,7 @@ bool do_sort(int argc, char *argv[])
     if (q && q->head) {
         for (list_ele_t *e = q->head; e->next && --cnt; e = e->next) {
             /* Ensure each element in ascending order */
-            /* FIXME: add an option to specify sorting order */
-            if (strcasecmp(e->value, e->next->value) > 0) {
+            if (cmp(e->value, e->next->value) > 0) {
                 report(1, "ERROR: Not sorted in ascending order");
                 ok = false;
                 break;

測試 - 2020-03-01 22:25

$ ./qtest
cmd> help
cmd> help
Commands:
        #        ...            | Display comment
        free                    | Delete queue
        help                    | Show documentation
        ih       str [n]        | Insert string str at head of queue n times. Generate random string(s) if str equals RAND. (default: n == 1)
        it       str [n]        | Insert string str at tail of queue n times. Generate random string(s) if str equals RAND. (default: n == 1)
        log      file           | Copy output to file
        new                     | Create new queue
        option   [name val]     | Display or set options
        quit                    | Exit program
        reverse                 | Reverse queue
        rh       [str]          | Remove from head of queue.  Optionally compare to expected value str
        rhq                     | Remove from head of queue without reporting value.
        show                    | Show queue contents
        size     [n]            | Compute queue size n times (default: n == 1)
        sort                    | Sort queue in ascending order
        source   file           | Read commands from source file
        time     cmd arg ...    | Time command execution
Options:
        compare 0       Comparison function to be used (default: 0) 0: strcasecmp 1: strcmp 2: negstrcasecmp 3: negstrcmp
        echo    1       Do/don't echo commands
        error   5       Number of errors until exit
        fail    30      Number of times allow queue operations to return false
        length  1024    Maximum length of displayed string
        malloc  0       Malloc failure probability percent
        simulation      0       Start/Stop simulation mode
        verbose 4       Verbosity level
cmd> new
cmd> new
q = []
cmd> it RAND 16
cmd> it RAND 16
q = [cwxvawh qenuidu fqcrxv yxwugtmlc jxqosdvq oprjot iywxl hdkyr vcdqwrhqc jakuahtmi tnpks veftxnj jwvwsvf tywgsm mdnbop xeaerjtt]
cmd> sort
cmd> sort
q = [cwxvawh fqcrxv hdkyr iywxl jakuahtmi jwvwsvf jxqosdvq mdnbop oprjot qenuidu tnpks tywgsm vcdqwrhqc veftxnj xeaerjtt yxwugtmlc]
cmd> option compare 2
cmd> option compare 2
Will use negstrcasecmp for sorting.
cmd> sort
cmd> sort
q = [yxwugtmlc xeaerjtt veftxnj vcdqwrhqc tywgsm tnpks qenuidu oprjot mdnbop jxqosdvq jwvwsvf jakuahtmi iywxl hdkyr fqcrxv cwxvawh]
cmd>

Function never used 的問題

$ git commit -a
compare.c:27:0: style: The function 'negstrcasecmp' is never used. [unusedFunction]

^
compare.c:32:0: style: The function 'negstrcmp' is never used. [unusedFunction]

^

Fail to pass static analysis.

真奇怪,我剛剛使用的函數是沒有被使用的函數?
如果函數沒有被使用到,那我剛才使用的是什麼?
如果我剛才使用的是這個函數,為何會未被使用?

cppcheck 可能不接受這種只用在 switch 內的用法,所以以為沒有使用這些函數。
看來不能用 switch,得真的建表了。

compare.c:

#define CMP_EXPAND_FMT(n) [CMP_ENUM_NAME(n)] = CMP_FUNC_NAME(n),
static const cmp_func_t funcs[] = {CMP_EXPAND()};
#undef CMP_EXPAND_FMT

#define CMP_EXPAND_FMT(n) [CMP_ENUM_NAME(n)] = CMP_FUNC_STR(n),
static const char *const func_strs[] = {CMP_EXPAND()};
#undef CMP_EXPAND_FMT

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof(arr[0]))

cmp_func_t cmp_get_func(int index)
{
    if (index >= 0 && index < ARRAY_SIZE(funcs))
        return funcs[index];
    return funcs[CMP_ENUM_NAME(0)];
}

const char *cmp_get_func_str(int index)
{
    if (index >= 0 && index < ARRAY_SIZE(func_strs))
        return func_strs[index];
    return func_strs[CMP_ENUM_NAME(0)];
}
$ git commit -a
[master f6b25e9] Add an option to specify sorting order
 6 files changed, 111 insertions(+), 14 deletions(-)
 create mode 100644 compare.c
 create mode 100644 compare.h

Commit message 的問題

$ git commit
Rename elements of enum `cmp_idx_t` to `k_<func>`                          [line 1]
 - Possible misspelled word(s): enum

How to Write a Git Commit Message: https://chris.beams.io/posts/git-commit/
Proceed with commit? [e/n/?]

C 的 keyword enum 居然被認為是錯字。雖然它本來就是。

用全稱取代:

Rename elements of enumeration type `cmp_idx_t` to `k_<func>`              [line 1]
 - Possible misspelled word(s): cmp

How to Write a Git Commit Message: https://chris.beams.io/posts/git-commit/
Proceed with commit? [e/n/?]

連常用的簡寫 cmp => compare 也不行。

可能是我寫得太 specific 了。
寫得更 generic 一點。

[master d71ea32] Refine the name of comparison method enumerations
 1 file changed, 7 insertions(+), 7 deletions(-)

比較函數的實作 - 2020-03-02 00:14

compare.c:

#include <ctype.h>
#include <stdbool.h>
/* Natural comparison */

/* For `strspn()` */

#define WHITESPACES " \f\n\r\t\v"
#define DECDIGITS "0123456789"

static const char digit_space[] = DECDIGITS WHITESPACES;
static const char space[] = WHITESPACES;

用來呼叫 strspn() 用的字串。

這裡用 macros 是為了避免重複兩次同樣的字串,以免之後需要修改時需要修改兩個地方。

先看主函數實作:

/*
 * The basic implement of `natstrcmp()` and `natstrcasecmp()`.
 */
static int natstrcmp_base(const char *s1, const char *s2, bool ignore_case)
{
    const char *p1 = s1;
    const char *p2 = s2;

    while (*p1 || *p2) {
        int diff;

        /* Skip spaces */
        p1 += strspn(p1, space);
        p2 += strspn(p2, space);

        if (isdigit(*p1) || isdigit(*p2)) {
            diff = natstrcmp_digit(&p1, &p2); /* Compare digits */
        } else {
            const char c1 = (ignore_case) ? tolower(*p1) : *p1;
            const char c2 = (ignore_case) ? tolower(*p2) : *p2;
            diff = c1 - c2;
        }

        if (diff || !*p1 || !*p2)
            return diff;

        ++p1;
        ++p2;
    }
    return 0;
}

從頭開始對 2 個字串 s1s2 的字元進行比對。
先跳過連續空白,檢查有沒有遇到數字部分,有的話交給數字部分處理,沒有則直接比較。
如果有不同,或是遇到字串結尾,就回傳結果,否則比對下個字元。

數字處理的部分:

/*
 * The digit comparison part of natural comparison.
 * Both `!isspace(**ps1) && !isspace(**ps2)`
 *    and `isdigit(**ps1) || isdigit(**ps2)` should hold.
 */
static int natstrcmp_digit(const char **ps1, const char **ps2)
{
    const char *p1;
    const char *p2;
    int diff;     // The difference of the first mismatch characters
    bool numcmp;  // Perform number comparison or string comparison

    if (!isdigit(**ps1)) {
        *ps2 += strspn(*ps2, digit_space);
        return -1;
    }
    if (!isdigit(**ps2)) {
        *ps1 += strspn(*ps1, digit_space);
        return 1;
    }

    p1 = *ps1;
    p2 = *ps2;
    *ps1 += strspn(*ps1, digit_space);
    *ps2 += strspn(*ps2, digit_space);

    diff = 0;
    numcmp = !(**ps1 == '0' || **ps2 == '0');
    for (;;) {
        for (;;) {
            p1 += strspn(p1, space);
            p2 += strspn(p2, space);

            if (isdigit(p1) && isdigit(p2)) {
                if (!diff) {
                    diff = *p1 - *p2;
                    if (diff)
                        break;
                }
            } else {
                /* strcmp/numcmp: Returns on end of either numbers */
                if (!diff)
                    diff = isdigit(p1) - isdigit(p2);
                return diff;
            }

            ++p1;
            ++p2;
        }
        /* strcmp: Returns on first mismatch */
        if (!numcmp)
            return diff;
    }
}

先檢查是不是 2 個字串 *ps1*ps2 都以數字開頭。不是的話,代表有一個是數字,另一個不是,可以直接回傳結果。
我在回傳結果前把指向 指向數字字元的指標 的指標 ps1/ps2 指向的指標 *ps1/*ps2 (也就是 natstrcmp_base() 中的 p1/p2) 移到數字部分的結束。這個動作其實不是必要的,因為回傳結果非 0natstrcmp_base() 會跟著 return,並不會用到這個 由參數 ps1/ps2 指向的 指標 *ps1/*ps2 的值。

確定 2 個字串都以數字字元開頭後,將 *ps1*ps2 移到數字部分的結束。與前面的不同,這一步是需要的,因為可能因數字部分相等而回傳 0

如果 2 個字串中有 1 個以 0 開頭,就使用字串比較,否則當作整數比較。
整數字串比較特別,而處理單純字串比較簡單,所以先討論整數比較,再討論字串比較好。

首先,不可以轉成數字,因為會 overflow。

在做整數比較時,我們會看哪個數字長,一樣長時則從開頭看哪個數字大。
但是不應該看完長度再回去一一比較,因為中途可能有些空白,所以數字的長度,必須走過一遍才能明白。
因此我採用的策略是,在比較的過程中,記住第一個不一樣的數字,然後讓我們繼續看下去。
如果先看到其中一個的尾巴,就表示它比較短;否則一樣長,需要看數字大小。
而因為已經記住了第一個不一樣的數字,而十進制數字愈靠近開頭的數位,權重愈大,所以第一個不一樣的數字決定了哪個數大。

而做字串比較時,只要看到第一個不一樣的數字,就可以回傳結果。

因此,整數比較與字串比較都有「尋找第一個不同點」的共同點。
所以我將比較迴圈寫成兩層的無限迴圈,找到第一個不同點時就退出內層迴圈,在外層迴圈中根據是字串比較還是整數比較,判斷要不要繼續比較。

而當遇到非數字字元時,數字字元就比較結束了。這時整數比較可以判斷哪個整數比較長,字串比較可以判斷哪個字串比較短。
因此,整數比較與字串比較還有「尋找字元長度」的共同點。
不同的是,整數比較還要考慮是不是找到了第一個不同點,而字串比較則不用。

因為整數比較與字串比較的這些共同點,所以我把它們寫成同一個迴圈。

最後是實際提供的函數。

額外加上 neg*() 版本。

int natstrcasecmp(const char *s1, const char *s2)
{
    return natstrcmp_base(s1, s2, true);
}

int natstrcmp(const char *s1, const char *s2)
{
    return natstrcmp_base(s1, s2, false);
}

int negnatstrcasecmp(const char *s1, const char *s2)
{
    return -natstrcasecmp(s1, s2);
}

int negnatstrcmp(const char *s1, const char *s2)
{
    return -natstrcmp(s1, s2);
}

初步驗證 - 2020-03-02 08:50

接著將這些函數宣告入 compare.h

diff --git a/compare.h b/compare.h
index fa09be9..663bee6 100644
--- a/compare.h
+++ b/compare.h
@@ -14,6 +14,10 @@ const char *cmp_get_func_str(int index);
 #define CMP_FUNC_NAME_1 strcmp
 #define CMP_FUNC_NAME_2 negstrcasecmp
 #define CMP_FUNC_NAME_3 negstrcmp
+#define CMP_FUNC_NAME_4 natstrcasecmp
+#define CMP_FUNC_NAME_5 natstrcmp
+#define CMP_FUNC_NAME_6 negnatstrcasecmp
+#define CMP_FUNC_NAME_7 negnatstrcmp

 /* Function name of the comparison functions */
 #define CMP_FUNC_NAME(index) CMP_FUNC_NAME_##index
@@ -31,8 +35,12 @@ const char *cmp_get_func_str(int index);

 /* Define macro `CMP_EXPAND_FMT(n)` first and then invoke `CMP_EXPAND()` */

-#define CMP_EXPAND() \
-    CMP_EXPAND_FMT(0) CMP_EXPAND_FMT(1) CMP_EXPAND_FMT(2) CMP_EXPAND_FMT(3)
+#define CMP_EXPAND()  \
+    CMP_EXPAND_FMT(0) \
+    CMP_EXPAND_FMT(1) \
+    CMP_EXPAND_FMT(2) \
+    CMP_EXPAND_FMT(3) \
+    CMP_EXPAND_FMT(4) CMP_EXPAND_FMT(5) CMP_EXPAND_FMT(6) CMP_EXPAND_FMT(7)

 #define CMP_EXPAND_FMT(n) CMP_ENUM_NAME(n) = n,
 typedef enum { CMP_EXPAND() } cmp_idx_t;
@@ -40,5 +48,9 @@ typedef enum { CMP_EXPAND() } cmp_idx_t;

 int negstrcasecmp(const char *s1, const char *s2);
 int negstrcmp(const char *s1, const char *s2);
+int natstrcasecmp(const char *s1, const char *s2);
+int natstrcmp(const char *s1, const char *s2);
+int negnatstrcasecmp(const char *s1, const char *s2);
+int negnatstrcmp(const char *s1, const char *s2);

 #endif /* LAB0_COMPARE_H */

然後直接 make
qtest 中就可以使用這些函數了。

而且完全不需要碰 qtest.c 的 code。

$ ./qtest
cmd> help
cmd> help
Commands:
[...]
        time     cmd arg ...    | Time command execution
Options:
        compare 0       Comparison function to be used (default: 0) 0: strcasecmp 1: strcmp 2: negstrcasecmp 3: negstrcmp 4: natstrcasecmp 5: natstrcmp 6: negnatstrcasecmp 7: negnatstrcmp
        echo    1       Do/don't echo commands
[...]
cmd>

sourcefrog/natsortREADME.md 中的範例測試:

x2-g8 < x2-y7 < x2-y08 < x8-y8

cmd> new
cmd> new
q = []
cmd> ih x2-g8
cmd> ih x2-g8
q = [x2-g8]
cmd> ih x2-y7
cmd> ih x2-y7
q = [x2-y7 x2-g8]
cmd> ih x2-y08
cmd> ih x2-y08
q = [x2-y08 x2-y7 x2-g8]
cmd> ih     ih x8-y8
cmd> ih x8-y8
q = [x8-y8 x2-y08 x2-y7 x2-g8]
cmd> sort
cmd> sort
q = [x2-g8 x2-y08 x2-y7 x8-y8]
cmd> option compare 4
cmd> option compare 4
Will use natstrcasecmp for sorting.
cmd> sort
cmd> sort
Segmentation fault occurred.  You dereferenced a NULL or invalid pointer
Aborted (core dumped)

發生什麼事?

valgrind 檢查發現不小心直接把指標當作 isdigit() 的參數。

diff --git a/compare.c b/compare.c
index 8a541e3..6fe39ab 100644
--- a/compare.c
+++ b/compare.c
@@ -80,7 +80,7 @@ static int natstrcmp_digit(const char **ps1, const char **ps2)
             p1 += strspn(p1, space);
             p2 += strspn(p2, space);

-            if (isdigit(p1) && isdigit(p2)) {
+            if (isdigit(*p1) && isdigit(*p2)) {
                 if (!diff) {
                     diff = *p1 - *p2;
                     if (diff)
@@ -89,7 +89,7 @@ static int natstrcmp_digit(const char **ps1, const char **ps2)
             } else {
                 /* strcmp/numcmp: Returns on end of either numbers */
                 if (!diff)
-                    diff = isdigit(p1) - isdigit(p2);
+                    diff = isdigit(*p1) - isdigit(*p2);
                 return diff;
             }

cppcheck 都沒提出警告!

如果是 C++ 編譯器就會在編譯時直接擋下來:

$ g++ compare.c
compare.c: In function ‘int natstrcmp_digit(const char**, const char**)’:
compare.c:83:27: error: invalid conversion from ‘const char*’ to ‘int’ [-fpermissive]
             if (isdigit(p1) && isdigit(p2)) {
                           ^
In file included from compare.c:3:0:
/usr/include/ctype.h:111:1: note:   initializing argument 1 of ‘int isdigit(int)’
 __exctype (isdigit);
 ^
compare.c:83:42: error: invalid conversion from ‘const char*’ to ‘int’ [-fpermissive]
             if (isdigit(p1) && isdigit(p2)) {
                                          ^
In file included from compare.c:3:0:
/usr/include/ctype.h:111:1: note:   initializing argument 1 of ‘int isdigit(int)’
 __exctype (isdigit);
 ^
compare.c:92:38: error: invalid conversion from ‘const char*’ to ‘int’ [-fpermissive]
                     diff = isdigit(p1) - isdigit(p2);
                                      ^
In file included from compare.c:3:0:
/usr/include/ctype.h:111:1: note:   initializing argument 1 of ‘int isdigit(int)’
 __exctype (isdigit);
 ^
compare.c:92:52: error: invalid conversion from ‘const char*’ to ‘int’ [-fpermissive]
                     diff = isdigit(p1) - isdigit(p2);
                                                    ^
In file included from compare.c:3:0:
/usr/include/ctype.h:111:1: note:   initializing argument 1 of ‘int isdigit(int)’
 __exctype (isdigit);

找找看原因。

編譯器版本?

$ gcc --version
gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

寫個 test.c

#include <ctype.h>
isdigit("42")

gcc test.c -E

[前略...]
# 327 "/usr/include/ctype.h" 3 4

# 2 "test.c" 2
((*__ctype_b_loc ())[(int) ((
# 2 "test.c"
"42"
# 2 "test.c" 3 4
))] & (unsigned short int) _ISdigit)

原來 isdigit()gcc 上是 macro,有強制轉型,所以不會提出警告。

g++ test.c -E

[前略...]
# 327 "/usr/include/ctype.h" 3 4
}
# 2 "test.c" 2

# 2 "test.c"
isdigit("42")

g++ 上則是 function。
因為有 prototype 的 function 會對 arguments 做型別檢查,所以可以發現參數型別不合的問題。

如果把 gcc preprocessor 的輸出拿給 g++ 當作 C++ 程式編譯呢?
test.c

#include <ctype.h>

int main(void)
{
    isdigit("42");
}
$ gcc test.c -E | g++ -xc++ -
test.c: In function ‘int main()’:
test.c:5:5: error: cast from ‘const char*’ to ‘int’ loses precision [-fpermissive]
     isdigit("42");

還是會被 C++ 編譯器擋。

在 C++ 中進行 pointer 與整數型別的互相強制轉型時,會使用 reinterpret_cast<>() 的規則,而 reinterpret_cast<>() 在將 pointer 轉換成整數型別時,要求這個整數型別的值域要足夠完整表示 pointer 的值。
而在 LP64 (可看成 long & pointer 64-bit) data model 上,int 是 32-bit,而 pointer 是 64-bit。
在這個例子中,int 的值域不足以完整表示 const char * 的值,會被 C++ 編譯器視為轉型失敗,終而報錯。

另外,如果用:
test.c

#include <ctype.h>

int main(void)
{
    isdigit('4');
}
$ gcc test.c -E | g++ -xc++ -

則會編譯成功。

註 - 2020-03-04

在 C++ 下之所以不使用 macro,應該是因為 C++ 在 header <locale> 中有定義相同名稱的 function template

template <class charT>
  bool isspace (charT c, const locale& loc);

而這需要透過編譯器執行 overload resolution 才能確定實際的操作。
除非使用利用了 C99/C++11 的 variadic macro 功能的 narg C macro 技巧與 C11 的 _Generic,不然沒辦法實作對應的 macro。
而且它在 namespace std 下,可以用 std::isspace 表示,這不是 C macro 所能正確處理的語法。

修復問題後再度測試:
x2-g8 < x2-y7 < x2-y08 < x8-y8

cmd> new
cmd> new
q = []
cmd> option compare 4
cmd> option compare 4
Will use natstrcasecmp for sorting.
cmd> ih x2-g8
cmd> ih x2-g8
q = [x2-g8]
cmd> ih x2-y7
cmd> ih x2-y7
q = [x2-y7 x2-g8]
cmd> ih x2-y08
cmd> ih x2-y08
q = [x2-y08 x2-y7 x2-g8]
cmd> ih x8-y8
cmd> ih x8-y8
q = [x8-y8 x2-y08 x2-y7 x2-g8]
cmd> sort
cmd> sort
q = [x2-g8 x2-y08 x2-y7 x8-y8]
cmd> option compare 1
cmd> option compare 1
Will use strcmp for sorting.
cmd> sort
cmd> sort
q = [x2-g8 x2-y08 x2-y7 x8-y8]

怎麼不正確?

剛才的 Ruby 實作是正確的嗎?

擷取 README.md 中的另一個範例:
1.001 < 1.002 < 1.010 < 1.02 < 1.1 < 1.3

irb(main):047:0> String.natcmp("1.010", "1.02")
=> 1

結果前面 Ruby 的實作也是不正確的。
原因在於對 Ruby string index 會得到 Ruby string 而非 ASCII 碼數字:"010"[0] == "0",所以前面的 Ruby 實作會一律轉成數字再比較。

重新閱讀正式規格 - 2020-03-02 13:03

Sorting

Strings are sorted as usual, except that decimal integer substrings are compared on their numeric value. For example,

a < a0 < a1 < a1a < a1b < a2 < a10 < a20

Strings can contain several number parts:

x2-g8 < x2-y7 < x2-y08 < x8-y8

in which case numeric fields are separated by nonnumeric characters.

到目前為止的敘述大致符合之前的實作。

Leading spaces are ignored. This works very well for IP addresses from log files, for example.

開始不符合了。只有 leading spaces 需要忽略,但在剛才的實作中,將全部的 spaces 都忽略了。

Leading zeros are not ignored, which tends to give more reasonable results on decimal fractions.

  1.001 < 1.002 < 1.010 < 1.02 < 1.1 < 1.3

這是符合的。
正確的實作是當數字部分以 '0' 開頭時,將其看作小數點後的數字來比較,而這恰巧等同於將其當作字串比較。
而 Ruby 版本的實作則是因為存在邏輯問題,而導致結果不正確。

Some applications may wish to change this by modifying the test that calls isspace.

Performance is linear: each character of the string is scanned at most once, and only as many characters as necessary to decide are considered.

問題來了。

原範例:
1.001 < 1.002 < 1.010 < 1.02 < 1.1 < 1.3
x2-g8 < x2-y7 < x2-y08 < x8-y8

擷取其中部分:
1.02 < 1.1
x2-y7 < x2-y08

去除相同的非數字部分:
02 < 1
7 < 08

不影響結果的數字代換:
02 < 1
1 < 02

移項:
02 < 1
02 > 1

在只有規格所述的規則下,這兩個範例根本是互相衝突的!

而根據實際 sort 結果:
https://github.com/sourcefrog/natsort/blob/master/sorted-words
x2-y08 < x2-y7,這等同於 02 < 1,才是正確的。

所以這個 README.md 內的範例是有誤的。

既然知道只有數字前的空白字元需要忽略,那就可以來修改並優化實作了。

diff --git a/compare.c b/compare.c
index 6fe39ab..2a2c0bd 100644
--- a/compare.c
+++ b/compare.c
@@ -41,16 +41,12 @@ int negstrcmp(const char *s1, const char *s2)

 /* For `strspn()` */

-#define WHITESPACES " \f\n\r\t\v"
-#define DECDIGITS "0123456789"
-
-static const char digit_space[] = DECDIGITS WHITESPACES;
-static const char space[] = WHITESPACES;
+static const char digit[] = "0123456789";
+static const char space[] = " \f\n\r\t\v";

 /*
  * The digit comparison part of natural comparison.
- * Both `!isspace(**ps1) && !isspace(**ps2)`
- *    and `isdigit(**ps1) || isdigit(**ps2)` should hold.
+ * `isdigit(**ps1) || isdigit(**ps2)` should hold.
  */
 static int natstrcmp_digit(const char **ps1, const char **ps2)
 {
@@ -59,27 +55,27 @@ static int natstrcmp_digit(const char **ps1, const char **ps2)
     int diff;     // The difference of the first mismatch characters
     bool numcmp;  // Perform number comparison or string comparison

+    *ps1 += strspn(*ps1, space);
+    *ps2 += strspn(*ps2, space);
+
     if (!isdigit(**ps1)) {
-        *ps2 += strspn(*ps2, digit_space);
+        *ps2 += strspn(*ps2, digit);
         return -1;
     }
     if (!isdigit(**ps2)) {
-        *ps1 += strspn(*ps1, digit_space);
+        *ps1 += strspn(*ps1, digit);
         return 1;
     }

     p1 = *ps1;
     p2 = *ps2;
-    *ps1 += strspn(*ps1, digit_space);
-    *ps2 += strspn(*ps2, digit_space);
+    *ps1 += strspn(*ps1, digit);
+    *ps2 += strspn(*ps2, digit);

     diff = 0;
     numcmp = !(**ps1 == '0' || **ps2 == '0');
     for (;;) {
         for (;;) {
-            p1 += strspn(p1, space);
-            p2 += strspn(p2, space);
-
             if (isdigit(*p1) && isdigit(*p2)) {
                 if (!diff) {
                     diff = *p1 - *p2;
@@ -113,10 +109,6 @@ static int natstrcmp_base(const char *s1, const char *s2, bool ignore_case)
     while (*p1 || *p2) {
         int diff;

-        /* Skip spaces */
-        p1 += strspn(p1, space);
-        p2 += strspn(p2, space);
-
         if (isdigit(*p1) || isdigit(*p2)) {
             diff = natstrcmp_digit(&p1, &p2); /* Compare digits */
         } else {

嘗試自動測試 - 2020-03-02 18:00

因為手動輸入測試資料太麻煩了,所以我想研究 scripts/driver.py 看看自動測試是怎麼實作的。

$ vim scripts/driver.py

一打開就看到可疑的 dictionary。

    traceDict = {
        1: "trace-01-ops",
        2: "trace-02-ops",
        3: "trace-03-ops",
        4: "trace-04-ops",
        5: "trace-05-ops",
        6: "trace-06-string",
        7: "trace-07-robust",
        8: "trace-08-robust",
        9: "trace-09-robust",
        10: "trace-10-malloc",
        11: "trace-11-malloc",
        12: "trace-12-malloc",
        13: "trace-13-perf",
        14: "trace-14-perf",
        15: "trace-15-perf",
        16: "trace-16-perf",
        17: "trace-17-complexity"
    }

這些 values 的名稱與 traces/ 底下的檔案的主檔名一樣。

*

vim 向下搜尋某 word 的 hotkey)

發現 runTrace()run() 2 個 methods,其中 run() 會呼叫 runTrace()

看看有沒有可疑的關鍵字。
分數是怎麼給的?查查可疑的 maxScores

只有這裡有。

            maxval = self.maxScores[t]

真相只有一個。

        for t in tidList:
            tname = self.traceDict[t]
            if self.verbLevel > 0:
                print("+++ TESTING trace %s:" % tname)
            ok = self.runTrace(t)
            maxval = self.maxScores[t]
            tval = maxval if ok else 0
            if tval < maxval:
                self.printInColor("---\t%s\t%d/%d" % (tname, tval, maxval), self.RED)
            else:
                self.printInColor("---\t%s\t%d/%d" % (tname, tval, maxval), self.GREEN)
            score += tval
            maxscore += maxval
            scoreDict[t] = tval

可以看到與 maxval 直接相關的 variables 有 tvalmaxscore
score 是不同 tval 的累加,maxscore 則是不同 maxval 的累加。
tval 是看 ok 的值,非 maxval0。這說明了為什麼每個項目都會拿到滿分或 0 分,而不是對多少給多少。
ok 則是從 self.runTrace(t) 回傳的。

看看這個 classrunTrace() method。

    def runTrace(self, tid):
        if not tid in self.traceDict:
            self.printInColor("ERROR: No trace with id %d" % tid, self.RED)
            return False
        fname = "%s/%s.cmd" % (self.traceDirectory, self.traceDict[tid])
        vname = "%d" % self.verbLevel
        clist = self.command + ["-v", vname, "-f", fname]
        
        try:
            retcode = subprocess.call(clist)
        except Exception as e:
            self.printInColor("Call of '%s' failed: %s" % (" ".join(clist), e), self.RED)
            return False
        return retcode == 0

可以看到這個 method 會生成類似 execv() 的陣列參數的指令,讓 clist reference 這個陣列物件,然後使用 subprocess.call(clist) 執行這個指令。
其中 self.command"./qtest"

參數用法 qtest 有附。

$ ./qtest -h
Usage: ./qtest [-h] [-f IFILE][-v VLEVEL][-l LFILE]
        -h         Print this information
        -f IFILE   Read commands from IFILE
        -v VLEVEL  Set verbosity level
        -l LFILE   Echo results to LFILE

這段 code 使用 exception handling 進行呼叫失敗時的處理——告知使用者,並且回傳 False 代表本題不通過。
正常情況下不會 raise exception,所以 retcode = subprocess.call(clist) 會成功執行,回傳 qtest 的回傳值。

-f 參數會讓 qtest 將指定檔案,也就是 "%s/%s.cmd" % (self.traceDirectory, self.traceDict[tid]),當作輸入。
而當 qtest 發生錯誤退出時,會回傳非 0 值。因此用 return retcode == 0 判斷 qtest 有沒有成功執行。

所以判斷操作成不成功是由 qtest 決定的。

什麼時候 qtest 會回傳非 0 值?

qtest.cmain() 的最後幾行:

    bool ok = true;
    ok = ok && run_console(infile_name);
    ok = ok && finish_cmd();

    return ok ? 0 : 1;
}

console.c

bool run_console(char *infile_name)
{
    if (!push_file(infile_name)) {
        report(1, "ERROR: Could not open source file '%s'", infile_name);
        return false;
    }

    while (!cmd_done())
        cmd_select(0, NULL, NULL, NULL, NULL);
    return err_cnt == 0;
}

err_cnt 是個全域變數,會被 record_error() 增加,record_error() 會在 interpret_cmda()next_cmd->operation(argc, argv) 回傳 false 時被呼叫。

next_cmd->operation 是由 add_cmd() 加進來的 callback function。

所以可以尋找有哪些與 list 操作相關的 callback functions。如果這些函數回傳 false,那麼會導致 qtest 回傳非 0 值。

其中我們有興趣的是可以驗證 queue 的內容的 command。而 rh 可以讓我們加上預期要 remove 的 node 所儲存的字串,不符時回傳 false,可用於驗證 queue 的內容。

可以來寫測試的 script 了。

以下測試資料來自 https://github.com/sourcefrog/natsort

由於 qtest 的 parsing 方法的關係,參數不能有空白,所以我修改了 parse_args 增加雙引號 "" 語法,以接受參數中的空白。

diff --git a/console.c b/console.c
index 3a754ab..25b66ac 100644
--- a/console.c
+++ b/console.c
@@ -163,11 +163,12 @@ static char **parse_args(char *line, int *argcp)
     char *src = line;
     char *dst = buf;
     bool skipping = true;
+    bool quote = false;

     int c;
     int argc = 0;
     while ((c = *src++) != '\0') {
-        if (isspace(c)) {
+        if (!quote && isspace(c)) {
             if (!skipping) {
                 /* Hit end of word */
                 *dst++ = '\0';
@@ -179,9 +180,14 @@ static char **parse_args(char *line, int *argcp)
                 argc++;
                 skipping = false;
             }
-            *dst++ = c;
+            if (c == '\"')
+                quote = !quote;
+            else
+                *dst++ = c;
         }
     }
+    if (quote)
+        report(1, "Warning: Unbalanced quote '%s'", line);

     /* Now assemble into array of strings */
     char **argv = calloc_or_fail(argc, sizeof(char *), "parse_args");

quote 成立表示在引號中。在引號中時把空白字元當成一般字元處理。

遇到引號時,切換 quote,然後忽略引號,不要寫到參數中。

測試:

$ ./qtest
cmd> option echo 0
cmd> option echo 0
cmd> "new"
q = []
cmd> "it" "tes"t" "space
q = [test space]
cmd> rh"" "test space"
Removed test space from queue
q = []
cmd> """"""""quit""""""""
Freeing queue
$ ./qtest
cmd> "new
cmd> "new
Warning: Unbalanced quote '"new
'
Unknown command 'new
'
cmd>

再次驗證 - 2020-03-02 21:05

traces/trace-18-natsort.cmd
取自 test-dates 以及 sorted-dates

# Test of sort with natural sort - Date
option fail 0
option malloc 0
option compare 4
new
ih "2000-1-10"
ih "2000-1-2"
ih "1999-12-25"
ih "2000-3-23"
ih "1999-3-3"
sort
rh "1999-3-3"
rh "1999-12-25"
rh "2000-1-2"
rh "2000-1-10"
rh "2000-3-23"

traces/trace-19-natsort.cmd
取自 test-fractions 以及 sorted-fractions

# Test of sort with natural sort - Fractions
option fail 0
option malloc 0
option compare 4
new
ih "Fractional release numbers"
ih "1.011.02"
ih "1.010.12"
ih "1.009.02"
ih "1.009.20"
ih "1.009.10"
ih "1.002.08"
ih "1.002.03"
ih "1.002.01"
sort
rh "1.002.01"
rh "1.002.03"
rh "1.002.08"
rh "1.009.02"
rh "1.009.10"
rh "1.009.20"
rh "1.010.12"
rh "1.011.02"
rh "Fractional release numbers"

traces/trace-20-natsort.cmd
取自 test-words 以及 sorted-words

# Test of sort with natural sort - Words
option fail 0
option malloc 0
option compare 4
new
ih "fred"
ih "pic2"
ih "pic100a"
ih "pic120"
ih "pic121"
ih "jane"
ih "tom"
ih "pic02a"
ih "pic3"
ih "pic4"
ih "1-20"
ih "pic100"
ih "pic02000"
ih "10-20"
ih "1-02"
ih "1-2"
ih "x2-y7"
ih "x8-y8"
ih "x2-y08"
ih "x2-g8"
ih "pic01"
ih "pic02"
ih "pic 6"
ih "pic   7"
ih "pic 5"
ih "pic05"
ih "pic 5 "
ih "pic 5 something"
ih "pic 4 else"
sort
rh "1-02"
rh "1-2"
rh "1-20"
rh "10-20"
rh "fred"
rh "jane"
rh "pic01"
rh "pic02"
rh "pic02a"
rh "pic02000"
rh "pic05"
rh "pic2"
rh "pic3"
rh "pic4"
rh "pic 4 else"
rh "pic 5"
rh "pic 5 "
rh "pic 5 something"
rh "pic 6"
rh "pic   7"
rh "pic100"
rh "pic100a"
rh "pic120"
rh "pic121"
rh "tom"
rh "x2-g8"
rh "x2-y08"
rh "x2-y7"
rh "x8-y8"
$ ./qtest -f traces/trace-18-natsort.cmd
cmd> # Test of sort with natural sort - Date
cmd> option fail 0
cmd> option malloc 0
cmd> option compare 4
Will use natstrcasecmp for sorting.
cmd> new
q = []
cmd> ih "2000-1-10"
q = [2000-1-10]
cmd> ih "2000-1-2"
q = [2000-1-2 2000-1-10]
cmd> ih "1999-12-25"
q = [1999-12-25 2000-1-2 2000-1-10]
cmd> ih "2000-3-23"
q = [2000-3-23 1999-12-25 2000-1-2 2000-1-10]
cmd> ih "1999-3-3"
q = [1999-3-3 2000-3-23 1999-12-25 2000-1-2 2000-1-10]
cmd> sort
q = [1999-12-25 1999-3-3 2000-1-10 2000-1-2 2000-3-23]
cmd> rh "1999-3-3"
Removed 1999-12-25 from queue
ERROR: Removed value 1999-12-25 != expected value 1999-3-3
q = [1999-3-3 2000-1-10 2000-1-2 2000-3-23]
cmd> rh "1999-12-25"
Removed 1999-3-3 from queue
ERROR: Removed value 1999-3-3 != expected value 1999-12-25
q = [2000-1-10 2000-1-2 2000-3-23]
cmd> rh "2000-1-2"
Removed 2000-1-10 from queue
ERROR: Removed value 2000-1-10 != expected value 2000-1-2
q = [2000-1-2 2000-3-23]
cmd> rh "2000-1-10"
Removed 2000-1-2 from queue
ERROR: Removed value 2000-1-2 != expected value 2000-1-10
q = [2000-3-23]
cmd> rh "2000-3-23"
Removed 2000-3-23 from queue
q = []

發現怎麼和 strcmp() 的結果一樣?

寫個簡單的測試程式,test.c

#include <stdio.h>

#include "compare.h"

int main(int argc, char *argv[])
{
    int res = natstrcmp(argv[1], argv[2]);

    if (res < -1)
        res = -1;
    else if (res > 1)
        res = 1;

    printf("\"%s\" %c \"%s\"\n", argv[1], "<=>"[res + 1], argv[2]);
}

compare.o 一起編譯。

gcc -o test test.c compare.o -g -Wall -Wextra -pedantic
$ ./test "22" "3"
"22" < "3"

只好出動 gdb

$ gdb -q ./test
Reading symbols from ./test...done.
(gdb) b natstrcmp_base
Breakpoint 1 at 0xa31: file compare.c, line 105.
(gdb) r 22 3
Starting program: /home/iid/Linux-Kernel-Design-2020/lab0-c/test 22 3

Breakpoint 1, natstrcmp_base (s1=0x7ffffffee23e "22", s2=0x7ffffffee241 "3", ignore_case=false) at compare.c:105
105     {
(gdb) n
109         while (*p1 || *p2) {
(gdb)
112             if (isdigit(*p1) || isdigit(*p2)) {
(gdb)
113                 diff = natstrcmp_digit(&p1, &p2); /* Compare digits */
(gdb) s
natstrcmp_digit (ps2=<synthetic pointer>, ps1=<synthetic pointer>) at compare.c:58
58          *ps1 += strspn(*ps1, space);
(gdb) p *ps1
$1 = 0x7ffffffee23e "22"
(gdb) n
59          *ps2 += strspn(*ps2, space);
(gdb) p *ps1
$2 = 0x7ffffffee23e "22"
(gdb) n
61          if (!isdigit(**ps1)) {
(gdb)
65          if (!isdigit(**ps2)) {
(gdb)
72          *ps1 += strspn(*ps1, digit);
(gdb)
73          *ps2 += strspn(*ps2, digit);
(gdb)
76          numcmp = !(**ps1 == '0' || **ps2 == '0');
(gdb)

找到了 1 個問題。

仔細檢查後,發現有 3 個問題:

  • 不小心把指向數字部分結尾的 ps1/ps2 當作數字部分的開頭。
  • 發現第一個不同點時執行 break; 後沒有移到下一個字元,造成同一個字元比兩次。
  • 進行整數比較時,長度的不同要比先前找到的字元不同還優先,但是寫的時候寫反了,導致變成字串比較。
diff --git a/compare.c b/compare.c
index 2a2c0bd..84a5c0e 100644
--- a/compare.c
+++ b/compare.c
@@ -73,7 +73,7 @@ static int natstrcmp_digit(const char **ps1, const char **ps2)
     *ps2 += strspn(*ps2, digit);

     diff = 0;
-    numcmp = !(**ps1 == '0' || **ps2 == '0');
+    numcmp = !(*p1 == '0' || *p2 == '0');
     for (;;) {
         for (;;) {
             if (isdigit(*p1) && isdigit(*p2)) {
@@ -84,8 +84,9 @@ static int natstrcmp_digit(const char **ps1, const char **ps2)
                 }
             } else {
                 /* strcmp/numcmp: Returns on end of either numbers */
-                if (!diff)
-                    diff = isdigit(*p1) - isdigit(*p2);
+                const int diff_len = isdigit(*p1) - isdigit(*p2);
+                if (diff_len || !numcmp)
+                    diff = diff_len;
                 return diff;
             }

@@ -95,6 +96,8 @@ static int natstrcmp_digit(const char **ps1, const char **ps2)
         /* strcmp: Returns on first mismatch */
         if (!numcmp)
             return diff;
+        ++p1;
+        ++p2;
     }
 }
$ make compare.o
  CC    compare.o
$ gcc -o test test.c compare.o -Wall -Wextra -pedantic -g
test.c: In function ‘main’:
test.c:5:14: warning: unused parameter ‘argc’ [-Wunused-parameter]
 int main(int argc, char *argv[])
              ^~~~
$ ./test "22" "3"
"22" > "3"
$ make
LD    qtest
$ ./qtest -f traces/trace-18-natsort.cmd
cmd> # Test of sort with natural sort - Date
[...]
q = [1999-3-3 2000-3-23 1999-12-25 2000-1-2 2000-1-10]
cmd> sort
q = [1999-3-3 1999-12-25 2000-1-2 2000-1-10 2000-3-23]
cmd> rh "1999-3-3"
Removed 1999-3-3 from queue
q = [1999-12-25 2000-1-2 2000-1-10 2000-3-23]
cmd> rh "1999-12-25"
Removed 1999-12-25 from queue
q = [2000-1-2 2000-1-10 2000-3-23]
cmd> rh "2000-1-2"
Removed 2000-1-2 from queue
q = [2000-1-10 2000-3-23]
cmd> rh "2000-1-10"
Removed 2000-1-10 from queue
q = [2000-3-23]
cmd> rh "2000-3-23"
Removed 2000-3-23 from queue
q = []
Freeing queue

看起來 OK 了。

$ ./qtest  -f traces/trace-19-natsort.cmd
cmd> # Test of sort with natural sort - Fractions
[...]
q = [1.002.01 1.002.03 1.002.08 1.009.10 1.009.20 1.009.02 1.010.12 1.011.02 Fractional release numbers]
cmd> sort
q = [Fractional release numbers 1.002.01 1.002.03 1.002.08 1.009.02 1.009.10 1.009.20 1.010.12 1.011.02]
cmd> rh "1.002.01"
Removed Fractional release numbers from queue
ERROR: Removed value Fractional release numbers != expected value 1.002.01
q = [1.002.01 1.002.03 1.002.08 1.009.02 1.009.10 1.009.20 1.010.12 1.011.02]
cmd> rh "1.002.03"
Removed 1.002.01 from queue
ERROR: Removed value 1.002.01 != expected value 1.002.03
q = [1.002.03 1.002.08 1.009.02 1.009.10 1.009.20 1.010.12 1.011.02]
cmd> rh "1.002.08"
Removed 1.002.03 from queue
ERROR: Removed value 1.002.03 != expected value 1.002.08
q = [1.002.08 1.009.02 1.009.10 1.009.20 1.010.12 1.011.02]
cmd> rh "1.009.02"
Removed 1.002.08 from queue
ERROR: Removed value 1.002.08 != expected value 1.009.02
q = [1.009.02 1.009.10 1.009.20 1.010.12 1.011.02]
cmd> rh "1.009.10"
Removed 1.009.02 from queue
ERROR: Removed value 1.009.02 != expected value 1.009.10
q = [1.009.10 1.009.20 1.010.12 1.011.02]
Error limit exceeded.  Stopping command execution
cmd> rh "1.009.20"
cmd> rh "1.010.12"
cmd> rh "1.011.02"
cmd> rh "Fractional release numbers"
diff --git a/compare.c b/compare.c
index 0aa7d28..0279caa 100644
--- a/compare.c
+++ b/compare.c
@@ -60,11 +60,11 @@ static int natstrcmp_digit(const char **ps1, const char **ps2)

     if (!isdigit(**ps1)) {
         *ps2 += strspn(*ps2, digit);
-        return -1;
+        return 1;
     }
     if (!isdigit(**ps2)) {
         *ps1 += strspn(*ps1, digit);
-        return 1;
+        return -1;
     }

     p1 = *ps1;

寫反了。這是非數字部分的結束,所以開頭非數字者大。

$ ./qtest -f traces/trace-19-natsort.cmd
[...]
q = [1.002.01 1.002.03 1.002.08 1.009.10 1.009.20 1.009.02 1.010.12 1.011.02 Fractional release numbers]
cmd> sort
q = [1.002.01 1.002.03 1.002.08 1.009.02 1.009.10 1.009.20 1.010.12 1.011.02 Fractional release numbers]
cmd> rh "1.002.01"
Removed 1.002.01 from queue
q = [1.002.03 1.002.08 1.009.02 1.009.10 1.009.20 1.010.12 1.011.02 Fractional release numbers]
cmd> rh "1.002.03"
Removed 1.002.03 from queue
q = [1.002.08 1.009.02 1.009.10 1.009.20 1.010.12 1.011.02 Fractional release numbers]
cmd> rh "1.002.08"
Removed 1.002.08 from queue
q = [1.009.02 1.009.10 1.009.20 1.010.12 1.011.02 Fractional release numbers]
cmd> rh "1.009.02"
Removed 1.009.02 from queue
q = [1.009.10 1.009.20 1.010.12 1.011.02 Fractional release numbers]
cmd> rh "1.009.10"
Removed 1.009.10 from queue
q = [1.009.20 1.010.12 1.011.02 Fractional release numbers]
cmd> rh "1.009.20"
Removed 1.009.20 from queue
q = [1.010.12 1.011.02 Fractional release numbers]
cmd> rh "1.010.12"
Removed 1.010.12 from queue
q = [1.011.02 Fractional release numbers]
cmd> rh "1.011.02"
Removed 1.011.02 from queue
q = [Fractional release numbers]
cmd> rh "Fractional release numbers"
Removed Fractional release numbers from queue
q = []
Freeing queue

OK。

$ ./qtest -f traces/trace-20-natsort.cmd
cmd> # Test of sort with natural sort - Words
[...]
q = [pic 4 else pic 5 something pic 5  pic05 pic 5 pic   7 pic 6 pic02 pic01 x2-g8 x2-y08 x8-y8 x2-y7 1-2 1-02 10-20 pic02000 pic100 1-20 pic4 pic3 pic02a tom jane pic121 pic120 pic100a pic2 fred]
cmd> sort
q = [1-02 1-2 1-20 10-20 fred jane pic01 pic02a pic02 pic02000 pic05 pic2 pic3 pic4 pic 4 else pic 5 pic 5  pic 5 something pic 6 pic   7 pic100a pic100 pic120 pic121 tom x2-g8 x2-y08 x2-y7 x8-y8]
[...]
q = [pic02a pic02 pic02000 pic05 pic2 pic3 pic4 pic 4 else pic 5 pic 5  pic 5 something pic 6 pic   7 pic100a pic100 pic120 pic121 tom x2-g8 x2-y08 x2-y7 x8-y8]
cmd> rh "pic02"
Removed pic02a from queue
ERROR: Removed value pic02a != expected value pic02
q = [pic02 pic02000 pic05 pic2 pic3 pic4 pic 4 else pic 5 pic 5  pic 5 something pic 6 pic   7 pic100a pic100 pic120 pic121 tom x2-g8 x2-y08 x2-y7 x8-y8]
cmd> rh "pic02a"
Removed pic02 from queue
ERROR: Removed value pic02 != expected value pic02a
q = [pic02000 pic05 pic2 pic3 pic4 pic 4 else pic 5 pic 5  pic 5 something pic 6 pic   7 pic100a pic100 pic120 pic121 tom x2-g8 x2-y08 x2-y7 x8-y8]
[...]
q = [pic100a pic100 pic120 pic121 tom x2-g8 x2-y08 x2-y7 x8-y8]
cmd> rh "pic100"
Removed pic100a from queue
ERROR: Removed value pic100a != expected value pic100
q = [pic100 pic120 pic121 tom x2-g8 x2-y08 x2-y7 x8-y8]
cmd> rh "pic100a"
Removed pic100 from queue
ERROR: Removed value pic100 != expected value pic100a
q = [pic120 pic121 tom x2-g8 x2-y08 x2-y7 x8-y8]

發現在 natstrcmp_digit() 數字相同回傳 0 時,會漏掉一個字元的比較。

diff --git a/compare.c b/compare.c
index 0279caa..a4b7b6b 100644
--- a/compare.c
+++ b/compare.c
@@ -110,11 +110,12 @@ static int natstrcmp_base(const char *s1, const char *s2, bool ignore_case)
     const char *p2 = s2;

     while (*p1 || *p2) {
-        int diff;
+        int diff = 0;

         if (isdigit(*p1) || isdigit(*p2)) {
             diff = natstrcmp_digit(&p1, &p2); /* Compare digits */
-        } else {
+        }
+        if (!diff) {
             const char c1 = (ignore_case) ? tolower(*p1) : *p1;
             const char c2 = (ignore_case) ? tolower(*p2) : *p2;
             diff = c1 - c2;
$ ./qtest -f traces/trace-20-natsort.cmd >/dev/null; echo $?
0

終於通過 natural comparison 的測試了。

加入自動評分 - 2020-03-02 22:05

試著放到 scripts/driver.py 中。

diff --git a/scripts/driver.py b/scripts/driver.py
index f0abc2c..bc33b60 100755
--- a/scripts/driver.py
+++ b/scripts/driver.py
@@ -35,7 +35,10 @@ class Tracer:
         14: "trace-14-perf",
         15: "trace-15-perf",
         16: "trace-16-perf",
-        17: "trace-17-complexity"
+        17: "trace-17-complexity",
+        18: "trace-18-natsort",
+        19: "trace-19-natsort",
+        20: "trace-20-natsort",
     }

     traceProbs = {
@@ -55,10 +58,13 @@ class Tracer:
         14: "Trace-14",
         15: "Trace-15",
         16: "Trace-16",
-        17: "Trace-17"
+        17: "Trace-17",
+        18: "Trace-18",
+        19: "Trace-19",
+        20: "Trace-20",
     }

-    maxScores = [0, 6, 6, 6, 6, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5]
+    maxScores = [0, 6, 6, 6, 6, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 4, 4, 5]

     RED = '\033[91m'
     GREEN = '\033[92m'
$ make test
scripts/driver.py -c
---     Trace           Points
[...]
+++ TESTING trace trace-16-perf:
# Test performance of sort with random and descending orders
# 10000: all correct sorting algorithms are expected pass
# 50000: sorting algorithms with O(n^2) time complexity are expected failed
# 100000: sorting algorithms with O(nlogn) time complexity are expected pass
---     trace-16-perf   6/6
+++ TESTING trace trace-17-complexity:
# Test if q_insert_tail and q_size is constant time complexity
Probably constant time
Probably constant time
---     trace-17-complexity     5/5
+++ TESTING trace trace-18-natsort:
# Test of sort with natural sort - Date
Will use natstrcasecmp for sorting.
---     trace-18-natsort        4/4
+++ TESTING trace trace-19-natsort:
# Test of sort with natural sort - Fractions
Will use natstrcasecmp for sorting.
---     trace-19-natsort        4/4
+++ TESTING trace trace-20-natsort:
# Test of sort with natural sort - Words
Will use natstrcasecmp for sorting.
---     trace-20-natsort        5/5
---     TOTAL           113/113

可以 commit 了。

$ git commit -a
[master e1517c2] Make `make test` test natural sort
 4 files changed, 113 insertions(+), 3 deletions(-)
 create mode 100644 traces/trace-18-natsort.cmd
 create mode 100644 traces/trace-19-natsort.cmd
 create mode 100644 traces/trace-20-natsort.cmd

dudect 實作探討

  • dudect?
    • Detect?
    • Deduce?
    • Dude, constant time?

“simulation” 初探 - 2020-02-29 17:14

作業要求中有個:

修改排序所用的比較函式,變更為 natural sort,在 “simulation” 也該做對應的修改,得以反映出 natural sort 的使用。

“simulation” 是什麼意思?
rg (ripgrep) 找找看:

$ rg simulation
console.c
20:bool simulation = false;
102:    add_param("simulation", (int *) &simulation, "Start/Stop simulation mode",

console.h
9:extern bool simulation;

qtest.c
254:    if (simulation) {
256:            report(1, "%s does not need arguments in simulation mode", argv[0]);
473:    if (simulation) {
475:            report(1, "%s does not need arguments in simulation mode", argv[0]);

dudect/constant.c
27:/* Implement the necessary queue interface to simulation */

traces/trace-17-complexity.cmd
2:option simulation 1
5:option simulation 0

發現在 qtest.c 下的 do_insert_tail()do_size()simulation 成立時,會去呼叫 is_*_const() 函數。

$ rg 'is_\w+_const'
qtest.c
259:        bool ok = is_insert_tail_const();
478:        bool ok = is_size_const();

dudect/fixture.c
156:bool is_insert_tail_const(void)
177:bool is_size_const(void)

dudect/fixture.h
8:bool is_insert_tail_const(void);
9:bool is_size_const(void);

進入 dudect/fixture.c,發現 is_*_const() 函數都會去重複呼叫 doit(),而 doit() 呼叫的 measure() 名字看來很可疑。

$ rg -w measure
dudect/constant.h
37:void measure(int64_t *before_ticks,

dudect/constant.c
55:void measure(int64_t *before_ticks,

dudect/fixture.c
136:    measure(before_ticks, after_ticks, input_data, mode);

進入 dudect/constant.c,發現裡面有一堆 dut_*() 函數。

$ rg 'dut_'
dudect/constant.c
64:            dut_new();
65:            dut_insert_head(
69:            dut_insert_tail(s, 1);
71:            dut_free();
75:            dut_new();
76:            dut_insert_head(
80:            dut_size(1);
82:            dut_free();

dudect/constant.h
5:#define dut_new()    \
10:#define dut_size(n)                                \
16:#define dut_insert_head(s, n)    \
23:#define dut_insert_tail(s, n)    \
30:#define dut_free() \

發現是 #define 開頭的,原來是 function-like macros。

進入 dudect/constant.h

發現潛ㄑㄧㄢˊ,陽平聲在問題

#define dut_new()    \
    {                \
        q = q_new(); \
    }

#define dut_size(n)                                \
    do {                                           \
        for (int __iter = 0; __iter < n; ++__iter) \
            q_size(q);                             \
    } while (0);

#define dut_insert_head(s, n)    \
    do {                         \
        int j = n;               \
        while (j--)              \
            q_insert_head(q, s); \
    } while (0);

#define dut_insert_tail(s, n)    \
    do {                         \
        int j = n;               \
        while (j--)              \
            q_insert_tail(q, s); \
    } while (0);

#define dut_free() \
    {              \
        q_free(q); \
    }

問題探討

既然已經用 do {} while (0) 包起來了,就不應該有後面的 ; 了。
因為當寫程式者以為它是函數時,會用如 dut_size(42); 的方法使用這個 macro,而這樣就會造成多出的 ;

請送 pull request 過來。
:notes: jserv

註 - 2020-03-12 17:40
已送:https://github.com/sysprog21/lab0-c/pull/39

如果用如

if (cond)
    dut_size(42);
else
    dut_free();

的方法使用的話,會展開成:

if (cond)
    do {                                           \
        for (int __iter = 0; __iter < 42; ++__iter) \
            q_size(q);                             \
    } while (0);;
else
    {              \
        q_free(q); \
    };

(行尾的 \ 是為排版而加,實際上展開後應已併為單行而不存在)
寫程式者自己加上的 ; 已經在 if block 外面,後面的 else 不會對應前面的 if (cond),而是更外層的 if;如果這已經是最外層的 if 的話,就會產生語法錯誤。
而底下的 else 後的寫程式者自己加上的 ;,也不屬於 else block。

而 macros dut_new()dut_free() 完全可以寫成 expression 的形式,可以完美避免這種 statement 才會遇到的問題:

#define dut_new()           \
    (                       \
        (void)(q = q_new()) \
    )

#define dut_free()        \
    (                     \
        (void)(q_free(q)) \
    )

加上 (void),是避免有人使用例如 q_size dut_new() (會展開成 q_size (q = q_new())) 這種看似不合 C/C++ 語法而令人困惑的方法呼叫函數。

而將轉成 void 的 expression 用 () 括起,是避免有人使用例如 dut_new()->head++ (會展開成 (void)(q = q_new())->head++) 的方式繞過轉型成 void 後不能操作的限制,以免這人改用 ++dut_new()->head (會展開成 ++(void)(q = q_new())->head) 時發生編譯錯誤而抱怨。

論文 Dude, is my code constant time? 內容整理