2019q1 Homework1 (lab0)

contributed by < Naetw >

說明

作業要求

實做 FIFO & LIFO queue。

主要需求
- q_new - 新增空的 queue
- q_free - 清除整個 queue
- q_insert_head - 以 LIFO 方式新增元素
- q_insert_tail - 以 FIFO 方式新增元素
- q_remove_head - 移除 queue 的開頭元素
- q_size - 計算 queue 中有多少元素
- q_reverse - 反轉 queue 中元素的順序
注意
- 錯誤處理
- 透過參數傳遞字串進函式時，需要使用 malloc 要一塊新的空間並複製字串（注意 null byte）
- q_insert_tail, q_size 要是
  $O (1)$

實作流程

`q_size`

由於需要

O (1)

時間內完成，在 queue.h 裡新增成員 size 並在新增、移除操作時進行維護。呼叫此函式時，便可以直接回傳大小。




int q_size(queue_t *q)
{
    return (q == NULL) ? 0 : q->size;
}

`q_new`

須確認 malloc 是否成功。













queue_t *q_new()
{
    queue_t *q = (queue_t *) malloc(sizeof(queue_t));

    if (q != NULL) {
        q->head = NULL;
        // added when implementing q_insert_tail
        q->indirect_tail = &q->head;
        // end
        q->size = 0;
    }
    return q;
}

`q_free`

清除 queue 中所有元素及相關內容，要記得節點空間也需釋放。














void q_free(queue_t *q)
{
    if (q != NULL) {
        list_ele_t *next;

        while (q->head != NULL) {
            next = q->head->next;
            free(q->head->value);
            free(q->head);
            q->head = next;
        }
        free(q);
    }
}

`q_insert_head`

基本上就根據註解去實做，須注意的是若節點本身成功，但要是字串的空間分配失敗的話，除了需要回傳 false 外，也要記得將剛剛分配出來的節點空間做釋放以避免記憶體洩漏問題（memory leak）。































bool q_insert_head(queue_t *q, char *s)
{
    if (q != NULL) {
        list_ele_t *newh = (list_ele_t *) malloc(sizeof(list_ele_t));
        size_t length;

        if (newh != NULL) {
            length = strlen(s) + 1;
            newh->value = (char *) malloc(length * sizeof(char));
            if (newh->value != NULL) {
                memcpy(newh->value, s, length);
                newh->next = q->head;
                q->head = newh;
                // added when implementing q_insert_tail
                if (q->size == 0) {
                    q->indirect_tail = &newh->next;
                }
                // end
                ++q->size;
                return true;
            } else {
                free(newh);
                return false;
            }
        } else {
            return false;
        }
    } else {
        return false;
    }
}

改寫為更簡潔的形式:

bool q_insert_head(queue_t *q, char *s)
{
    if (!q)
        return false;
    list_ele_t *newh = (list_ele_t *) malloc(sizeof(list_ele_t));
    if (!newh) 
        return false;
    size_t length = strlen(s) + 1;
    ...
}

下方亦同，注意 scope
:notes: jserv

已改寫，可見下方
naetw

`q_remove_head`

同樣根據註解去實做。


























bool q_remove_head(queue_t *q, char *sp, size_t bufsize)
{
    if (q != NULL && q->size != 0) {
        list_ele_t *ptr;

        ptr = q->head;
        q->head = q->head->next;
        // added when implementing q_insert_tail
        if (--q->size == 0) {
            q->indirect_tail = &q->head;
        }
        // end

        if (sp != NULL) {
            memcpy(sp, ptr->value, bufsize - 1);
            sp[bufsize - 1] = '\0';
        }
        free(ptr->value);
        free(ptr);

        return true;
    } else {
        return false;
    }
}

Valgrind

在實做完比較重要的新增、移除的函式後除了跑單元測試外，也試著跑了 Valgrind 來檢查有沒有記憶體相關問題，透過 driver.py 跑的話會有比較雜亂的分析結果（參雜 Python 腳本本身的執行分析），於是改成單單使用 qtest 來做分析。在測試第 14 筆關於效能的分析時，遇到的問題：

執行 Valgrind 加入 -q 或 --quiet，只輸出錯誤訊息
:notes: jserv

==18229== Memcheck, a memory error detector
==18229== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==18229== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==18229== Command: ./qtest -v 1 -f ./traces/trace-14-perf.cmd
==18229==
# Test performance of size
ERROR: Time limit exceeded.  Either you are in an infinite loop, or your code is too inefficient
ERROR: Insertion of dolphin failed (1 failures total)
...

但是單純執行指令時不會發生問題：

» ./qtest -v 1 -f ./traces/trace-14-perf.cmd
# Test performance of size

產生問題的原因是 Valgrind 分析時會插入自己的指令來獲得資料^[1]，因此超過了 qtest 本身預期的時間限制。搜尋了一下原始碼，在 qtest.c:493 有設置 SIGALRM，並在 harness.c:53 設定了時間限制，將其數值提高便能正常使用 valgrind 測試記憶體的問題。

請針對 Valgrind 支援，提交 pull request。
另外，Valgrind 具備 suppression files 的特徵，可參見 Valgrind Suppression File Howto
:notes: jserv

`q_insert_tail`

由於必須在

O (1)

時間內完成，多加一個能夠代表 tail 的成員在 queue 的結構中，來紀錄當前狀態的尾端節點位址。

單純使用 pointer to list_ele_t 的話，在插入的函式裡就必須有以下程式碼來處理只會出現一次的情境：





if (q->size == 0) {
    q->head = new_node;
} else {
    q->tail->next = new_node;
}

上述情境如同他的條件一樣，只會出現在 queue 為空的，插入第一個元素的時候，這時想起之前看過的 Linus 介紹「有品味」的程式碼的影片，於是將代表 tail 的成員型別改成 pointer to pointer to list_ele_t，並在 queue 的建構函式中將 &q->head 指派給它：

q->indirect_tail = &q->head;

如此一來程式碼的設計可以這樣變化：

原本的程式碼

if (q->size == 0) {
    q->head = new_node;
} else {
    q->tail->next = new_node;
}
q->tail = new_node;

改寫後

*(q->indirect_tail) = new_node;
q->indirect_tail = &new_node->next;

變得更為簡潔，且不失可讀性。

其餘部分基本上跟 insert_head 差不多。需注意的部分是要在 q_remove_head 加入相對應的程式碼來處理 queue 變成空的情況，也就是重新設定代表 tail 的成員。



























bool q_insert_tail(queue_t *q, char *s)
{
    if (q != NULL) {
        list_ele_t *new_node = (list_ele_t *) malloc(sizeof(list_ele_t));
        size_t length;

        if (new_node != NULL) {
            length = strlen(s) + 1;
            new_node->value = (char *) malloc(length * sizeof(char));
            if (new_node->value != NULL) {
                memcpy(new_node->value, s, length);
                new_node->next = NULL;
                *(q->indirect_tail) = new_node;
                q->indirect_tail = &new_node->next;
                ++q->size;
                return true;
            } else {
                free(new_node);
                return false;
            }
        } else {
            return false;
        }
    } else {
        return false;
    }
}

`q_reverse`

基本上只需要額外兩個變數來紀錄前一個與後一個節點即可，當前節點使用 head 紀錄。當 queue 的大小 <= 1 的時候不需要做 reverse。

















void q_reverse(queue_t *q)
{
    if (q != NULL && q->size > 1) {
        list_ele_t *prev = q->head, *next;

        q->indirect_tail = &prev->next;
        q->head = q->head->next;
        *q->indirect_tail = NULL;
        while (q->head != NULL) {
            next = q->head->next;
            q->head->next = prev;
            prev = q->head;
            q->head = next;
        }
        q->head = prev;
    }
}

到此通過 Unit Test。接著研究是否能夠改善程式碼品質與研究 qtest 的實作。

改善程式碼品質

Invalid read

先前跑 valgrind 只注意有沒有記憶體洩漏問題，重新測了一次發現有 invalid read 的問題，研究了一下發現是以下程式碼出現問題：




if (sp != NULL) {
    memcpy(sp, ptr->value, bufsize - 1);
    sp[bufsize - 1] = '\0';
}

memcpy 會完全複製 bufsize - 1 bytes 到 sp 裡，但是 ptr->value 的空間長度可能遠小於 bufsize - 1，先前誤解了註解意思，以為要直接複製 bufsize - 1 bytes，實際上它只是最大值而已。將 memcpy 替換成 strncpy 這樣一來在遇到 NULL byte 時複製行為便會直接停止。

抽出相同邏輯的程式碼

上面的 q_insert_head 以及 q_insert_tail 程式碼邏輯相似，將其抽離出來放到另一個函式 q_insert_prologue：























list_ele_t *q_insert_prologue(queue_t *q, char *s)
{
    if (q != NULL) {
        list_ele_t *new_node = (list_ele_t *) malloc(sizeof(list_ele_t));
        size_t length;

        if (new_node != NULL) {
            length = strlen(s) + 1;  // +1 for NULL byte
            new_node->value = (char *) malloc(length * sizeof(char));
            if (new_node->value != NULL) {
                memcpy(new_node->value, s, length);
                return new_node;
            } else {
                free(new_node);
                return NULL;
            }
        } else {
            return NULL;
        }
    } else {
        return NULL;
    }
}

並在 q_insert_head 以及 q_insert_tail 中呼叫：





























bool q_insert_head(queue_t *q, char *s)
{
    list_ele_t *new_node;
    if ((new_node = q_insert_prologue(q, s)) != NULL) {
        new_node->next = q->head;
        q->head = new_node;
        if (q->size == 0) {
            q->indirect_tail = &new_node->next;
        }
        ++q->size;
        return true;
    } else {
        return false;
    }
}

bool q_insert_tail(queue_t *q, char *s)
{
    list_ele_t *new_node;
    if ((new_node = q_insert_prologue(q, s)) != NULL) {
        new_node->next = NULL;
        *(q->indirect_tail) = new_node;
        q->indirect_tail = &new_node->next;
        ++q->size;
        return true;
    } else {
        return false;
    }
}

使用 macro 簡化程式碼

上面 Jserv 提到使用 if (expr) return NULL; 來讓程式碼更加簡潔。由於個人私心覺得那樣蠻不好看的，比較習慣把主邏輯放在一起，不想要有 if (expr) return NULL; 之類的邏輯穿插其中，所以先前設計時才使用相反的判斷，把需要提早結束的邏輯都丟進 else 裡。不過既然要改，那就改成更好看的形式吧！之前曾經稍稍研究過 TensorFlow 裡的 XLA 編譯器，閱讀原始碼時看到不少 macro 的使用，覺得十分漂亮（把不是很好看的 if 弄得像是單純的一條陳述式），所以就來試試：





#define Q_RETURN_IF_NULL(ptr, retval) \
    do {                              \
        if (ptr == NULL)              \
            return retval;            \
    } while (0)

如此一來上方提到的 q_insert_prologue：

原本的程式碼：























list_ele_t *q_insert_prologue(queue_t *q, char *s)
{
    if (q != NULL) {
        list_ele_t *new_node = (list_ele_t *) malloc(sizeof(list_ele_t));
        size_t length;

        if (new_node != NULL) {
            length = strlen(s) + 1;  // +1 for NULL byte
            new_node->value = (char *) malloc(length * sizeof(char));
            if (new_node->value != NULL) {
                memcpy(new_node->value, s, length);
                return new_node;
            } else {
                free(new_node);
                return NULL;
            }
        } else {
            return NULL;
        }
    } else {
        return NULL;
    }
}

改寫後：



















list_ele_t *q_insert_prologue(queue_t *q, char *s)
{
    Q_RETURN_IF_NULL(q, NULL);

    list_ele_t *new_node = (list_ele_t *) malloc(sizeof(list_ele_t));
    size_t length;

    Q_RETURN_IF_NULL(new_node, NULL);

    length = strlen(s) + 1;  // +1 for NULL byte
    new_node->value = (char *) malloc(length * sizeof(char));
    if (new_node->value != NULL) {
        memcpy(new_node->value, s, length);
        return new_node;
    } else {
        free(new_node);
        return NULL;
    }
}

而 q_insert_head 與 q_insert_tail 也有地方能夠善用此巨集，以 q_insert_tail 為例：

原本的程式碼：













bool q_insert_tail(queue_t *q, char *s)
{
    list_ele_t *new_node;
    if ((new_node = q_insert_prologue(q, s)) != NULL) {
        new_node->next = NULL;
        *(q->indirect_tail) = new_node;
        q->indirect_tail = &new_node->next;
        ++q->size;
        return true;
    } else {
        return false;
    }
}

改寫後：











bool q_insert_tail(queue_t *q, char *s)
{
    list_ele_t *new_node;

    Q_RETURN_IF_NULL((new_node = q_insert_prologue(q, s)), false);
    new_node->next = NULL;
    *(q->indirect_tail) = new_node;
    q->indirect_tail = &new_node->next;
    ++q->size;
    return true;
}

抽掉了不少內縮的邏輯，更清楚了！

tail 命名

由於改用 pointer to pointer to list_ele_t，想想該成員直接稱作 tail 有點不夠精準，改成 indirect_tail。如此一來，要獲得最後一個節點的話看到 indirect_tail 就可以很直覺的對它做 dereference。

修改腳本以支援 Valgrind

Pull Request

在先前使用 Valgrind 時，Jserv 說我可以提交支援 Valgrind 測試的 PR。著手後遇到的問題有：

要把測試的功能放在哪？
- 額外的 shell script
- 修改 driver.py
Patch 的方法有很多種，要用哪一種？
- 像上面提到的去修改 time_limit 數值 / 讓 alarm 消失
  - 使用類似 Debug Macro，並透過編譯參數來產生不同版本的執行檔
- 由於執行檔是動態連結，可以將造成問題的 alarm 函式符號替換成同樣長度的 isnan（同樣長度才不會破壞執行檔本身結構），這樣一來執行後在做函式解析時就會拿到 isnan 的位址，讓原本的 alarm 無效化

修改 driver.py

看了一下 driver.py 發現並不難改，且想寫的 shell script 功能基本上就跟 driver.py 一樣，所以最後選擇直接修改 driver.py。

這邊沒什麼特別的，基本上就是在原先要執行的指令前面加上 valgrind，所以多新增了兩個成員 command & useValgrind：

command
- 用來表示主要指令
- 一般模式就是 qtest
useValgrind
- 一個旗標用來標示 --valgrind 選項有沒有被傳入

Patch 方法與修改 Makefile

介紹

由於 alarm 在程式中出現的地方有些分散，因此後來決定用最簡單的方法，直接去修改執行檔的符號表（Symbol Table）。

來由

能夠這樣做的原因是：現行只要系統支援主流工具（GCC, Clang）預設都是使用動態連結（dynamic linking）。

在過去，連結器會將所有目的檔（object file）全部連結成單一的執行檔，如此一來要執行的話只要把它載入記憶體就能順利執行，不需其他人的幫助，這種連結法稱為靜態連結（static linking）。

但是靜態連結存在一些缺點：

記憶體與硬碟空間

假設一個情境，A, B, C 程式都用到了 D 程式的內容，那在靜態連結的方式下，A, B, C 的執行檔都含有 D 程式的內容，若三者都要執行的話，那麼在記憶體上就會擁有 3 份同樣的內容，十分浪費記憶體空間。此外也很浪費硬碟空間。

程式開發與發佈

假設某個發佈的程式 A 使用到了某個函式庫，而該函式庫更新了，那就必須重新進行連結再做發佈。這很麻煩，因為每個小改動都需要重新連結。

而動態連結的想法基本上就是把連結過程延遲到程式要執行時再做。以上面的情境來說：執行程式 A 時，系統將 A 載入以及其所需要的 D，之後開始進行連結，也就是一些符號解析、位址重定等等，最後執行 A。若此時程式 B 需要執行，那系統只需要再將 B 載入到記憶體並直接跟已經在記憶體中的 B 進行連結便可以執行。

但是這樣造成每次程式在載入時都要重新進行連結造成效能上的損失，於是又出現了新的技術 - 延遲繫結（lazy binding）。

延遲繫結

其概念是在函式第一次被呼叫時再去做繫結，畢竟有些函式可能從開始到結束都不會被呼叫到，這樣的設計能夠省下不必要的定址動作。

在程式中會有一張表稱作 GOT (Global Offset Table)，裡面有一個函式指標陣列，儲存別的函式庫中函式的位置。實作上，ELF 會先透過 PLT (Procedure Linkage Table) 去做跳轉，可以透過以下指令觀察：


objdump -d ./qtest | less

找到 main 中第二個 call 指令：















0000000000402682 <main>:
  402682:       55                      push   rbp
  402683:       48 89 e5                mov    rbp,rsp
  402686:       48 81 ec 30 02 00 00    sub    rsp,0x230
  ...
  4026e5:       48 8b 00                mov    rax,QWORD PTR [rax]
  4026e8:       48 89 c7                mov    rdi,rax
  4026eb:       e8 3e ff ff ff          call   40262e <usage>
  4026f0:       e9 99 00 00 00          jmp    40278e <main+0x10c>
  4026f5:       48 8b 0d cc 7e 00 00    mov    rcx,QWORD PTR [rip+0x7ecc]        # 40a5c8 <optarg@@GLIBC_2.2.5>
  4026fc:       48 8d 85 e0 fe ff ff    lea    rax,[rbp-0x120]
  402703:       ba 00 01 00 00          mov    edx,0x100
  402708:       48 89 ce                mov    rsi,rcx
  40270b:       48 89 c7                mov    rdi,rax
  40270e:       e8 2d e9 ff ff          call   401040 <strncpy@plt>

可以看到 call 401040 <strncpy@plt>，401040 其實就是 0x401040 也就是 strncpy 在 plt 中的位置，再回到最上面找到：




0000000000401040 <strncpy@plt>:
  401040:       ff 25 da 8f 00 00       jmp    QWORD PTR [rip+0x8fda]        # 40a020 <strncpy@GLIBC_2.2.5>
  401046:       68 01 00 00 00          push   0x1
  40104b:       e9 d0 ff ff ff          jmp    401020 <.plt>

該位置的第一個指令便是跳去 GOT（嚴格上 GOT 分成兩部分 .got, .got.plt，這邊指的是 .got.plt），程式初始化的過程中各個函式在自己的 .got.plt 位置會被填上一段位於 plt 的程式碼位址，以 gdb 觀察：


(gdb) x/gx 0x40a020
0x40a020 <strncpy@got.plt>:     0x0000000000401046

0x0000000000401046 就是上面 strncpy@plt 的第二條指令，第二條指令做的事是放上該函式的索引，並跳到 PLT0 的位址，準備好參數後接著就跳 dl_runtime_resolve 去做解析，最後拿到 strncpy 在 shared library 中的記憶體位址。解析完成後便會將獲得的函式位址填入 .got.plt 的位址，這樣一來下次呼叫相同函式時便能夠直接跳去執行，不需解析。下方是呼叫過 strncpy 後的記憶體狀態：


















(gdb) x/gx 0x40a020
0x40a020 <strncpy@got.plt>:     0x00007ffff7e9b240
(gdb) x/2i 0x00007ffff7e9b240
   0x7ffff7e9b240 <__strncpy_sse2_unaligned>:   endbr64
   0x7ffff7e9b244 <__strncpy_sse2_unaligned+4>: mov    r8,rdx
(gdb) info pro
probes   proc     program
(gdb) info proc mappings
process 4304
Mapped address spaces:

          Start Addr           End Addr       Size     Offset objfile
            ...
            0x40b000           0x40d000     0x2000        0x0 [heap]
      0x7ffff7dfb000     0x7ffff7e1d000    0x22000        0x0 /usr/lib64/libc-2.28.so
      0x7ffff7e1d000     0x7ffff7f6a000   0x14d000    0x22000 /usr/lib64/libc-2.28.so
      0x7ffff7f6a000     0x7ffff7fb6000    0x4c000   0x16f000 /usr/lib64/libc-2.28.so
      ...

strncpy@got.plt 的位置已經被填入 shared library 的記憶體位址，仔細比較可以看到 0x00007ffff7e9b240 位在 libc.so 映射到的記憶體中的其中一個區段內。

了解了延遲繫結的大致流程（更多細節可參考延伸閱讀），接著來看介紹提到的修改符號表是如何影響函式解析，並在最後達到將 alarm 無效化的效果。

這裡需要再扣回上方提到的 dl_resolve_runtime 函式，需要解析特定函式就必須擁有該函式的名字，ELF 處理動態連結的區段都紀錄在 .dynamic section 中，可以用以下指令觀察：




























» readelf -d qtest

Dynamic section at offset 0x8e20 contains 24 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x401000
 0x000000000000000d (FINI)               0x4054d8
 0x0000000000000019 (INIT_ARRAY)         0x409e10
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x409e18
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x400308
 0x0000000000000005 (STRTAB)             0x400738
 0x0000000000000006 (SYMTAB)             0x400330
 0x000000000000000a (STRSZ)              333 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000015 (DEBUG)              0x0
 0x0000000000000003 (PLTGOT)             0x40a000
 0x0000000000000002 (PLTRELSZ)           912 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x400980
 0x0000000000000007 (RELA)               0x400920
 0x0000000000000008 (RELASZ)             96 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffe (VERNEED)            0x4008e0
 0x000000006fffffff (VERNEEDNUM)         1
 0x000000006ffffff0 (VERSYM)             0x400886
 0x0000000000000000 (NULL)               0x0

其中 SYMTAB 包含各個可能使用到的函式所需資訊，其中一個成員 st_name 紀錄著函式名稱在 STRTAB 中的位置（其餘結構可參考 elf.h）：









typedef struct
{
  Elf64_Word    st_name;		/* Symbol name (string tbl index) */
  unsigned char st_info;		/* Symbol type and binding */
  unsigned char st_other;		/* Symbol visibility */
  Elf64_Section st_shndx;		/* Section index */
  Elf64_Addr    st_value;		/* Symbol value */
  Elf64_Xword   st_size;		/* Symbol size */
} Elf64_Sym;

以 strncpy 為例，並使用 gdb 觀察：



















(gdb) x/10gx 0x400330
0x400330:       0x0000000000000000      0x0000000000000000
0x400340:       0x0000000000000000      0x0000001200000118
0x400350:       0x0000000000000000      0x0000000000000000
0x400360:       0x0000001200000033      0x0000000000000000
0x400370:       0x0000000000000000      0x0000001200000012
(gdb) x/10s 0x400738
0x400738:       ""
0x400739:       "libc.so.6"
0x400743:       "fflush"
0x40074a:       "strcpy"
0x400751:       "exit"
0x400756:       "sprintf"
0x40075e:       "fopen"
0x400764:       "signal"
0x40076b:       "strncpy"
0x400773:       "select"
(gdb) p/x 0x40076b - 0x400738
$4 = 0x33

因此，若將 alarm patch 成別的函式名稱便可以將 alarm 無效化。注意：由於其他函式的 offset 已經清楚紀錄在 st_name 了，需要以相同長度的字串去做 patch 才行。以前打 CTF 常用的是 isnan，沒有什麼副作用且長度相同。

來看看 patch 的效果：

(gdb) x/40gx 0x400330
0x400330:       0x0000000000000000      0x0000000000000000
0x400340:       0x0000000000000000      0x0000001200000118
0x400350:       0x0000000000000000      0x0000000000000000
0x400360:       0x0000001200000033      0x0000000000000000
0x400370:       0x0000000000000000      0x0000001200000012
...
0x400450:       0x000000120000005f      0x0000000000000000
0x400460:       0x0000000000000000      0x00000012000000d0
(gdb) x/30s 0x400738
0x400738:       ""
0x400739:       "libc.so.6"
...
0x400764:       "signal"
0x40076b:       "strncpy"
0x400773:       "select"
0x40077a:       "realloc"
...
0x4007e1:       "malloc"
0x4007e8:       "siglongjmp"
0x4007f3:       "__ctype_b_loc"
0x400801:       "optarg"
0x400808:       "isnan"
0x40080e:       "fwrite"
(gdb) p/x 0x400808 - 0x400738
$2 = 0xd0

在 0x400808 的位置原本是 alarm 已經被置換成 isnan 如此一來在進行解析時，便得到 isnan 函式的位址。

結果

有了修改後的 driver.py 後，最後只需要在 Makefile 中加入新的 target - valgrind 來做到複製新的一份執行檔，並利用 sed 來將 alarm 字串做置換，接著就可以使用 driver.py 進行 valgrind 分析。

valgrind_existence:
       @which valgrind 2>&1 > /dev/null

valgrind: qtest valgrind_existence
       $(eval patched_file := $(shell mktemp /tmp/qtest.XXXXXX))
       cp qtest $(patched_file)
       chmod u+x $(patched_file)
       sed -i "s/alarm/isnan/g" $(patched_file)
       scripts/driver.py -p $(patched_file) --valgrind

後記

原本後來想想覺得 pr 有點無用，所以跑去 FB 討論區做回覆，發現想的不夠全面，紀錄一下：
- 由於 13-15 測資跑真的很久（會誤以為壞掉），若單純測試功能性的話其實可以再對 driver.py 做修改，讓它的 tidList 只取前 12 個測資，來加速開發流程。註：能這樣做的原因是 13-15 並沒有什麼特殊操作，對於整體的功能性沒有影響，那三個主要測試的是性能（e.g., 是否在
  $O (1)$ 完成特定操作）。
- 以此次作業特性來說只測試 1-12 的話其實不需要上 patch，但也只是剛好這些測資與程式本身複雜度相對簡單，若程式複雜度增加，整個 pr 做到的支援還是有用的。
- dude, is my code constant time?
前面提到說 alarm 的位置分散，所以沒想要直接改原始碼。現在想想也可以參照 qtest 對 malloc, free 的 hook，寫一個特定的 macro 透過編譯參數傳入，裡面內容就是將 alarm 置換之類的。總之 patch 方法很多 XD

延伸閱讀與參考

程式設計師的自我修養
Angelboy 相關教學簡報
- Execution
- Return to dl_resolve
  - 此份比較詳盡，畢竟是要去做 exploit 當然要深入核心 XD
ELF 的 lazy binding

Valgrind Suppression File Howto

裡頭提到 Valgrind 容易有誤報的情況（看起來特別是在大型程式中，文中是用來分析 wxGTK），於是提出如何透過 suppression file 這個設定來讓分析的輸出乾淨一些。

valgrind 也用一陣子，其實沒遇過誤報的狀況，每次下去解都能夠發現真的有問題。不過倒是想起寒假參與專案 WasmVM 時，由於該專案有使用 SkyPat 來做測試，而 SkyPat 本身有一個小小的記憶體洩漏問題，導致一開始寫腳本測試 WasmVM 本身的記憶體問題時一直以為問題沒修掉。當時使用簡單的暴力法來過濾該錯誤，之後應該可以利用這個設定來排除錯誤回報（或是等 Skymizer 的大大們有空修掉該錯誤 XD，WasmVM 專案的維護者本身就是 Skymizer 的一員他有提到說同事已經知道有問題只是近期太忙）。

Suppression File

Valgrind 可以透過參數 --suppressions=<filename> 來讓預設工具 Memcheck 忽略特定例子。

可以透過參數 --gen-suppressions=all 來產生 suppression file 所需的格式。

以 qtest 為例：

» valgrind -q --leak-check=full --show-leak-kinds=all --gen-suppressions=all \
           ./qtest -v 1 -f ./traces/trace-04-ops.cmd
# Test of insert_head, insert_tail, and size
ERROR: Freed queue, but 2 blocks are still allocated
==7444== 112 bytes in 2 blocks are still reachable in loss record 1 of 1
==7444==    at 0x483880B: malloc (vg_replace_malloc.c:309)
==7444==    by 0x404CDB: test_malloc (harness.c:131)
==7444==    by 0x40512D: q_insert_prologue (queue.c:55)
==7444==    by 0x405254: q_insert_tail (queue.c:110)
==7444==    by 0x401A3C: do_insert_tail (qtest.c:227)
==7444==    by 0x403B75: interpret_cmda (console.c:218)
==7444==    by 0x403C07: interpret_cmd (console.c:239)
==7444==    by 0x40477A: cmd_select (console.c:572)
==7444==    by 0x404AFC: run_console (console.c:642)
==7444==    by 0x402821: main (qtest.c:573)
==7444==
{
   <insert_a_suppression_name_here>
   Memcheck:Leak
   match-leak-kinds: reachable
   fun:malloc
   fun:test_malloc
   fun:q_insert_prologue
   fun:q_insert_tail
   fun:do_insert_tail
   fun:interpret_cmda
   fun:interpret_cmd
   fun:cmd_select
   fun:run_console
   fun:main
}

將上方大括號的內容放進 suppression file，並在執行指令時將此檔案傳入 Memcheck 就不會回報此類型的錯誤。

» valgrind -q --leak-check=full --show-leak-kinds=all \
           --suppressions=./test.supp \
           ./qtest -v 1 -f ./traces/trace-04-ops.cmd
# Test of insert_head, insert_tail, and size
ERROR: Freed queue, but 2 blocks are still allocated

當錯誤過多時，手動將大括號內容抽出會很麻煩，文中作者有對此寫了個簡易的腳本可以參考。

研究 qtest

看起來已經有同學做了詳盡的報告 - 0xff07。

[TODO] 閱讀之後看有沒有其他可以寫的。

Flexible Array Members

上述同學的報告中出現了 Variable-length array，之前閱讀「你所不知道的 C 語言：指標篇」時也有對此做研究

Flexible Array Members 同樣也是 C99 引進的特性。在介紹以前，假設有個情境是需要紀錄會員名字的結構，最簡單的方法可能是：




struct User {
    uint32_t id;
    char name[20];
};

但並非每個人都會剛好用滿 20 個字元，會造成浪費，於是第二種方法出現：




struct User {
    uint32_t id;
    char *name;
};

但是這樣會需要多一次 malloc 呼叫，且記憶體分佈可能會有破碎的情形。

Flexible array members 這個新特性可以一次解決上述兩種方法的缺點，它具有以下限制：

在 struct 內宣告一個 incomplete array type (e.g., char name[]，size of name is flexible)。
這個成員（flexible array member）必須要放在最後。
除了 flexible array member 外，struct 必須擁有至少一個成員。

以上面的問題來舉例 flexible array member 常見用法：








char input[40];
ssize_t len;
struct User *p;

len = read(0, input, 40); // assume that this read() succeeds
p = (struct User *)malloc(sizeof(struct User) + len);
// ...
free(p);

在這情況下，上面的 p 相當於宣告為（在某些情況下，這個等效關係會不成立，詳見下方設計問題）：


struct User { uint32_t id; char name[len]; } *p;

如此一來，避免了分配額外空間外也防止了記憶體破碎的問題。

設計問題（以 C11 為準）

首先看一個有趣的問題，擁有 flexible array member 的 struct 的大小會是多少？




struct User {
    uint32_t id;
    char name[];
};

根據 C11 規格書 [1]，struct 的大小計算是將 flexible array member 視為不存在，不過有個例外是：根據其他成員組成，編譯器可能會做 padding，而這個 padding 是*能夠*跟 flexible array member 的空間重疊的。也就是說 sizeof(User) >= offsetof(struct User, name)，且上面提到的等效宣告也就可能會失效（兩種方式中 name 在 struct 中的 offset 可能會不同）。因此在存取 flexible array member 時注意不能夠直接使用 sizeof，需十分注意。