2019q1 Homework1 (linux-list)

contributed by < 0xff07 >

開發環境

$ uname -a
Linux OxB16BOOB5 4.18.0-15-generic #16-Ubuntu SMP Thu Feb 7 10:56:39 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               70
Model name:          Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
Stepping:            1
CPU MHz:             850.328
CPU max MHz:         3700.0000
CPU min MHz:         800.0000
BogoMIPS:            4988.84
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            6144K
L4 cache:            131072K
NUMA node0 CPU(s):   0-7

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.10
DISTRIB_CODENAME=cosmic
DISTRIB_DESCRIPTION="Ubuntu 18.10"

`container_of()`












#ifndef container_of
#ifdef __LIST_HAVE_TYPEOF
#define container_of(ptr, type, member)                            \
    __extension__({                                                \
        const __typeof__(((type *) 0)->member) *__pmember = (ptr); \
        (type *) ((char *) __pmember - offsetof(type, member));    \
    })
#else
#define container_of(ptr, type, member) \
    ((type *) ((char *) (ptr) -offsetof(type, member)))
#endif
#endif

該程式原理類似 qtest 中，使用位址反推 HEADER / FOOTER 的機制。關於該巨集各部份的說明如下：

`offsetof()`

首先發現一個 offsetof 巨集。查詢 man offsetof 後可以發現關於該巨集的說明：

DESCRIPTION
       The macro offsetof() returns the offset of the field member from the start of the structure type.

       This  macro  is useful because the sizes of the fields that compose a structure can vary across implementations, and compilers may insert different
       numbers of padding bytes between fields.  Consequently, an element's offset is not necessarily given by the sum of the sizes of the  previous  ele‐
       ments.

       A compiler error will result if member is not aligned to a byte boundary (i.e., it is a bit field).

並且提供範例程式：

































EXAMPLE
       On a Linux/i386 system, when compiled using the default gcc(1) options, the program below produces the following output:

           $ ./a.out
           offsets: i=0; c=4; d=8 a=16
           sizeof(struct s)=16

   Program source

       #include <stddef.h>
       #include <stdio.h>
       #include <stdlib.h>

       int
       main(void)
       {
           struct s {
               int i;
               char c;
               double d;
               char a[];
           };

           /* Output is compiler dependent */

           printf("offsets: i=%zd; c=%zd; d=%zd a=%zd\n",
                   offsetof(struct s, i), offsetof(struct s, c),
                   offsetof(struct s, d), offsetof(struct s, a));
           printf("sizeof(struct s)=%zd\n", sizeof(struct s));

           exit(EXIT_SUCCESS);
       }

由 man 的說明可知該巨集的功能為：給定結構體，以及成員的名稱，offsetof 將給出該成員之位址，與結構體起始位址的差異

`({...})`

container_of 這個巨集至乍看之下沒有回傳值，但可發現它是一個被小括號包住的復合敘述。這是一個 GCC 的延伸功能。參閱 6.1 Statements and Declarations in Expressions 部份內容可以知道：小括號包住形成複合敘述(coumpound statement)的大括號，在 GCC 中是一個合法的 expression ，該 expression 中最後一個敘述的值，視為該復合敘述計算之後的值。這個延伸功能的其中一個好處是 增加巨集的安全。舉例而言，考慮以下找出兩個 int 中最大值的巨集：

#define maxint(a,b) ((a) > (b) ? (a) : (b))

假定想要先將 a 增加 1 ，並找出該結果與 b 較大者，可能會寫出以下程式：

maxint(++a, b);

但該程式將會被展開成：

((++a) > (b) ? (++a) : (b));

因此在這個巨集當中 a 將會增加兩次，與原先預期不同。但如果使用該延伸功能，將該巨集改寫成：

#define maxint(a,b) \
     ({int _a = (a), _b = (b); _a > _b ? _a : _b; })

a 與 b 中的 expression 都只會被計算一次。而大括號形成的複合敘述，可以使 _a _b 變數的可視範圍限制在大括號中。

整體功能

從巨集後者的：






...
#else
#define container_of(ptr, type, member) \
    ((type *) ((char *) (ptr) -offsetof(type, member)))
#endif
...

或許較好理解。該巨集將 member 所在位址，扣除該成員在結構體起始位址的差異，反推回結構體的起始位址。而前面的：








...
#ifdef __LIST_HAVE_TYPEOF
#define container_of(ptr, type, member)                            \
    __extension__({                                                \
        const __typeof__(((type *) 0)->member) *__pmember = (ptr); \
        (type *) ((char *) __pmember - offsetof(type, member));    \
    })
...

可注意到復合敘述當中的第二行 (type *) ((char *) __pmember - offsetof(type, member)); 與前述的 ((type *) ((char *) (ptr) -offsetof(type, member))) 幾乎相同，

LIST_POISONING

首先查到這個技巧能夠用來除錯。在程式發生未預期的結束時，可以使用 core dump 進行除錯。將刪除的節點設置成某些特定的記憶體位址，若該位址在 core dump 中觀察到被寫入，就可推測問題出在某個 list_head 結構的存取中。

另外查到一個 "memory poisoning"。在 kernel 的文件中，Kernel Self-Protection 的章節提到：

Memory poisoning

When releasing memory, it is best to poison the contents (clear stack on syscall return, wipe heap memory on a free), to avoid reuse attacks that rely on the old contents of memory. This frustrates many uninitialized variable attacks, stack content exposures, heap content exposures, and use-after-free attacks.

所以推測有一部份可能是基於安全考量。

調閱 git log，得知 Linux 核心開發者對 list.h 進行哪些相關修改
:notes: jserv

`list_head` 的應用

`struct task_struct`

第一個想到的應用是用於作業系統排程相關。因此第一個想到的就是去查 process control block 相關的結構。因此查到 sched.h 中的 struct task_struct 結構。在這個結構中紀錄許多與行程有關的資料，如 pid, signal handler 等等。當中有許多資訊使用 list_head 儲存。舉例來說：








	/*
	 * 'ptraced' is the list of tasks this task is using ptrace() on.
	 *
	 * This includes both natural children and PTRACE_ATTACH targets.
	 * 'ptrace_entry' is this task's link on the p->parent->ptraced list.
	 */
    struct list_head        ptraced;
    struct list_head        ptrace_entry;

ptraced 紀錄該程序正在使用 ptrace 觀察的程序們。而當中也有與 perf 相關的資料結構，仍然是用 list_head 建立：





#ifdef CONFIG_PERF_EVENTS
	struct perf_event_context	*perf_event_ctxp[perf_nr_task_contexts];
	struct mutex			perf_event_mutex;
	struct list_head		perf_event_list;
#endif

巨集的考量

可發現 list_head 的實作方式，易於擴充，但又可以將所有擴展後的新資料結構，使用同一套巨集進行操作。舉例來說，在 bash 中執行 locate list.h ，試著尋找一些 Linux 核心中使用的相關結構：

...
/usr/src/linux-headers-4.18.0-11/arch/ia64/include/asm/native/patchlist.h
/usr/src/linux-headers-4.18.0-11/include/linux/klist.h
/usr/src/linux-headers-4.18.0-11/include/linux/list.h
/usr/src/linux-headers-4.18.0-11/include/linux/llist.h
/usr/src/linux-headers-4.18.0-11/include/linux/plist.h
/usr/src/linux-headers-4.18.0-11/include/linux/quicklist.h
/usr/src/linux-headers-4.18.0-11/include/linux/rcu_segcblist.h
/usr/src/linux-headers-4.18.0-11/include/linux/rculist.h
/usr/src/linux-headers-4.18.0-11/include/linux/scatterlist.h
/usr/src/linux-headers-4.18.0-11/include/linux/ceph/pagelist.h
/usr/src/linux-headers-4.18.0-11/include/linux/netfilter/ipset/ip_set_list.h
/usr/src/linux-headers-4.18.0-11/include/uapi/linux/netfilter/ipset/ip_set_list.h

...

以 klist.h 及 plist.h 為例，當中資料結構宣告如下：






sstruct klist {
    spinlock_t      k_lock;
    struct list_head    k_list;
    void            (*get)(struct klist_node *);
    void            (*put)(struct klist_node *);
} __attribute__ ((aligned (sizeof(void *))));









struct plist_head {
    struct list_head node_list;
};

struct plist_node {
    int         prio;
    struct list_head    prio_list;
    struct list_head    node_list;
};

文件生成

文件當中註解的符號，原理類似 Doxygen 由特定註解格式自動生成說明文件。但該格式的規範與 Doxygen 不同。在核心的文件中有提到：

The Linux kernel uses Sphinx to generate pretty documentation from reStructuredText files under Documentation. To build the documentation in HTML or PDF formats, use make htmldocs or make pdfdocs. The generated documentation is placed in Documentation/output.

可知該文件生成是使用 sphinx，而非 Doxygen。試著依照指示安裝相依套件：

$ ./scripts/sphinx-pre-install

將會出現需要生成文件所需安裝的相關套件(如指示的連結中說明所提到的)，複製貼上執行即可。若相依套件均已安裝，則會顯示：

Detected OS: Ubuntu 18.04.1 LTS.

All optional dependenties are met.
Needed package dependencies are met.

安裝完之後，可以考慮安裝文件中指定的佈景主題。但這並非必要(我反而覺得 sphinx 預設主題產生的網頁搜尋功能比較好用)：

$ pip install sphinx_rtd_theme

接著在核心原始碼的資料夾中執行

$ make htmldocs

待執行完成後，將可以在 Documentation/output 之下，找到文件的 html 輸出結果。

如需搜尋，可以點進 Search.html 中，該網頁提供關鍵字搜尋功能。舉例來說，若搜尋 list_add_tail，則可以找到相關條目：

加入 qtest

這方面的程式放在 lab0 裡面 linux-list-test 這個 branch 中。

因為 struct list_head 僅是作為使用者撰寫的 linked list 中的其中一個成員，所以這邊就使用一個包含 struct list_head 的結構體來測試。這邊參考 /private/common.h 中的測試結構體，定義:




struct listitem {
    uint16_t i;
    struct list_head list;
};

因為 list.h 中關鍵的部份是 struct list_head 中的指標操作, 幫物件配置記憶體並不是 list.h 的工作。所以測試的重點在於有沒有正確連接各個指標。

接著， qtest 中使用一個全域的 linked list q 作為待測試的資料結構。這邊也仿照其中的作法，設置一個全域變數 p 作為待測試的結構：




/* list being tested */
struct listitem p = {.i = 0, .list = {0, 0}};
/* number of elements in list */
size_t pcnt = 0;

因為 list.h 中並沒有包含如 q_new 之類幫物件配置記憶體空間的操作，但有包含初始化物件的操作，所以這邊不像 qtest 本來的測試宣告一個準備指向 linked list 的指標，而是直接宣告一個 struct listitem。而相較於 queue_t 之前要先 q_new，在測試 listitem 之前要先將其中的 listitem.list 使用 INIT_LIST_HEAD 來初始化。

然後宣告一個 listitem 的陣列作為記憶體池:




#define POOL_NUM 100
static struct listitem pool_listitem[POOL_NUM];
static bool pool_valid[POOL_NUM];
static bool list_init = false;

再這當中，pool_listitem 陣列是真正存放節點的記憶體空間，pool_valid 表示該對應元素有沒有被使用，如果有的話，就會被設成 true; 而如果要執行類似 free 整個串列連結的操作，只要把所有 pool_valid 設成 false ，並且重新執行 LIST_INIT_HEAD 就好。

觀察：測試 wrapper 的設計

首先觀察一下 qtest 當中的測試函數是怎麼運作的。可以觀察到裡面的函數大多遵循下面這種模式：























bool do_<operation name> (int argc, char *argv[])
{
    /* check if argc and argv are correct, q NULL, etc */
    if (argc != 1) {
        report(1, "%s takes no arguments", argv[0]);
        return false;
    }
    if (q == NULL) {
        /* ommit */
    }
    ...
    error_check();
    
    /* risky test that may invoke SIGSEGV and other exceptions   */
    /* is wrapped inside if (exception_setup(true)){}            */
    if (exception_setup(true)) {
    
    }
    exception_cancel();
    show_queue(3);
    return !error_check();
}

幾乎每一個測試函數，最後都會進行 show_queue。因此也先實作對應版本的 show_queue。

show_list

首先實作 show_queue 的對應版本 show_list。首先看一下 show_queue 的程式：









































static bool show_queue(int vlevel)
{
    bool ok = true;
    if (verblevel < vlevel)
        return true;
    int cnt = 0;
    if (q == NULL) {
        report(vlevel, "q = NULL");
        return true;
    }
    report_noreturn(vlevel, "q = [");
    list_ele_t *e = q->head;
    if (exception_setup(true)) {
        while (ok && e && cnt < qcnt) {
            if (cnt < big_queue_size)
                report_noreturn(vlevel, cnt == 0 ? "%s" : " %s", e->value);
            e = e->next;
            cnt++;
            ok = ok && !error_check();
        }
    }
    exception_cancel();
    if (!ok) {
        report(vlevel, " ... ]");
        return false;
    }
    if (e == NULL) {
        if (cnt <= big_queue_size)
            report(vlevel, "]");
        else
            report(vlevel, " ... ]");
    } else {
        report(vlevel, " ... ]");
        report(
            vlevel,
            "ERROR:  Either list has cycle, or queue has more than %d elements",
            qcnt);
        ok = false;
    }
    return ok;
}

可以發現在遍歷整個 q 時，是用維護好的 qcnt 進行便歷，而不是直接使用 q 的指標操作作為依據。這樣可以檢測出比如「插入通通成功，但刪除通通什麼都沒做」的狀況。如果僅透過指標操作，仍然可以順利遍歷整個 list，也不會發生非法記憶體存取，但這樣並不表示結果正確。而透過維護 qcnt ，遍歷結束之後立刻會發現 cnt 數字對不起來（或是說遍歷完之後發現沒有回到頭等等）。

這邊仿照上述測試，實作 show_list 以及 do_show_list









































static bool show_list(int vlevel)
{
    int cnt = 0;
    bool ok = true;
    if (list_init == false) {
        report(vlevel, "p = NULL.\n");
        return true;
    }
    struct listitem *cur_node =
        container_of(p.list.next, struct listitem, list);
    struct list_head *iter = p.list.next;
    report_noreturn(vlevel, "p = [ ");
    if (exception_setup(true)) {
        while (ok && iter != &(p.list) && cnt < pcnt) {
            if (cnt < big_queue_size)
                report_noreturn(vlevel, cnt ? "%d " : "%d ", cur_node->i);
            cnt++;
            iter = iter->next;
            cur_node = container_of(iter, struct listitem, list);
        }
    }
    exception_cancel();
    if (!ok) {
        report(vlevel, "... ]");
        return false;
    }
    if (iter == &(p.list)) {
        if (cnt <= big_queue_size)
            report(vlevel, "]");
        else
            report(vlevel, " ... ]");
    } else {
        report(vlevel, " ... ]");
        report(vlevel,
               "ERROR : length of list doesn't match the number of insert and "
               "delete operations.");
        ok = false;
    }
    return ok;
}

這邊的邏輯大致與 show_queue 相同，但不一樣之處是：在 queue.h 系列的操作中，會時時刻刻檢查動態配置的記憶體是否正確(即 error_check() 函數)，但這邊因為已經開好陣列了，所以不進行檢查。








bool do_list_show(int argc, char *argv[])
{
    if (argc != 1) {
        report(1, "%s takes no arguments", argv[0]);
        return false;
    }
    return show_list(0);
}

do_init_list_head()

接著寫 INIT_LIST_HEAD 的測試：





























bool do_list_init_head(int argc, char *argv[])
{
    if (argc != 1) {
        report(3, "%s takes exactly 1 argument.");
        return false;
    }
    if (list_init == true) {
        report(3, "Warning : list is not empty");
        for (int i = 0; i < POOL_NUM; i++)
            pool_valid[i] = 0;
    }
    bool ok = true;
    if (exception_setup(true)) {
        INIT_LIST_HEAD(&(p.list));
        if (p.list.next != &(p.list)) {
            report(3, "next field is not properly linked.");
            ok = false;
        }
        if (p.list.prev != &(p.list)) {
            report(3, "prev field is not properly linked.");
            ok = false;
        }
    }
    exception_cancel();
    if (ok)
        list_init = true;
    return ok;
}

這邊是用 list_init 作為有沒有初始化的依據，並且在執行結束之後，驗證是否頭尾都連到自己。

do_list_add

這邊的邏輯類似 q_insert 。雖然理論上數字範圍與重複次數應該要是任意的，但在無論限制多寬，本質上還是從 pool_listitem 中的某個元素連接到裡面另外一個元素。所以這邊先規定插入數字 i 時，必須使用 pool_listitem[i] 的記憶體，而範圍只能在 0 ~ POOL_NUM - 1，且一個數字只能插入一次。





























































bool do_list_add(int argc, char *argv[])
{
    int insert;
    bool ok = true;
    if (argc != 2) {
        report(1, "%s takes 2 arguments.");
        return false;
    }
    if (!get_int(argv[1], &insert)) {
        report(1, "'%s' is not a valid number.", argv[1]);
        return false;
    }
    if (insert >= POOL_NUM || insert < 0) {
        report(1, "'%s' out of range. Should between 0 and %d.", argv[1],
               POOL_NUM - 1);
        return false;
    }
    if (pool_valid[insert] == true) {
        report(1, "'%s' is already in list.", argv[1]);
        return false;
    }
    if (false == list_init) {
        report(1, "Calling list_add on null list");
        return false;
    }
    error_check();
    if (exception_setup(true)) {
        struct list_head *original_next = p.list.next;
        pool_listitem[insert].i = insert;
        struct list_head *node = &(pool_listitem[insert].list);
        struct list_head *head = &(p.list);
        list_add(node, head);
        if ((p.list.next) != node) {
            report(1, "ERROR : New node isn't correctly insert to head.");
            ok = false;
        }
        if ((node->prev) != head) {
            report(1, "ERROR : Head isn't correctly connected to new node.");
            ok = false;
        }
        if ((original_next->prev) != node) {
            report(1,
                   "ERROR : The second node before insertion isn't correctly "
                   "connected to newly added node.");
            ok = false;
        }
        if ((node->next) != original_next) {
            report(1,
                   "ERROR : new node isn't correctly connected to second node "
                   "before insertion.");
            ok = false;
        }
        if (ok) {
            pcnt++;
            pool_valid[insert] = true;
        }
    }
    exception_cancel();
    show_list(3);
    return ok;
}

大致上來說，架構是檢測輸入, 並在連接之後，對相應的節點做出位置檢查。