Try   HackMD

contributed by < scottxxxabc >

作業說明
GitHub

開發環境

(使用 VirtualBox)

注意書寫方式: VirtualBox (中間沒有空格)

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
jserv

$ uname -r
5.13.0-30-generic

$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          1
On-line CPU(s) list:             0
Thread(s) per core:              1
Core(s) per socket:              1
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           140
Model name:                      11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
Stepping:                        1
CPU MHz:                         2995.202
BogoMIPS:                        5990.40
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       48 KiB
L1i cache:                       32 KiB
L2 cache:                        1.3 MiB
L3 cache:                        12 MiB

開發環境設定

安裝必要的開發工具套件

$ sudo apt install build-essential git-core valgrind
$ sudo apt install cppcheck clang-format aspell colordiff

CppCheck版本

$ cppcheck --version
Cppcheck 1.90

queue.c 的實作

q_new()

struct list_head *q_new()
{
    struct list_head *head = malloc(sizeof(struct list_head));
    if (!head)
        return NULL
    head->next = head;
    head->prev = head;
    return head;
}
  • malloc 一個大小為 struct list_head 的記憶體,並將head 指向這個記憶體空間。
  • 由於 list 尚未有加入元素,因此將 prev 以及 next 指標指向 head






G



head

head

prev

next



head:e->head:e





head:w->head:w






q_free()

void q_free(struct list_head *head)
{
    if (head == NULL)
        return;
    head->prev->next = NULL;
    head->prev = NULL;
    struct list_head *tmp, *l = head->next;
    while (l != NULL) {
        tmp = l->next;
        element_t *e = container_of(l, element_t, list);
        q_release_element(e);
        l = tmp;
    }
    free(head);
}
  • 檢查list是否為空
  • 將 circular doubly-linked list 首尾互相連結的 link 打斷。
    • 指標 l 指向要刪除的節點,刪除 ll->next 指標也會遺失,因此用指標 tmp 指向 l 的下一個節點。
    • container_of(l, element_t, list) 取得包含 lelement_t 節點。
    • 使用 q_release_element() 釋放占用的記憶體。
    • 迴圈重複上述步驟直到 l 指向的節點為 NULL,也就是list尾端。
  • 最後釋放head占用的記憶體。

注意書寫方式: doubly-linked list (留意連字號和空白字元)

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
jserv

增加下方程式碼避免輸入 list 為空

if (head == NULL)
        return;

可改寫為:

if (!head)
    return;

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
jserv

q_insert_head()

bool q_insert_head(struct list_head *head, char *s)
{
    if (!head)
        return false;
    element_t *new = malloc(sizeof(element_t));

    if (!new)
        return false;

    new->value = strdup(s);
    if (new->value == NULL) {
        free(new);
        return false;
    }
    struct list_head *tmp = head->next;

    tmp->prev = &(new->list);
    (new->list).next = tmp;

    head->next = &(new->list);
    (new->list).prev = head;
    // cppcheck-suppress memleak
    return true;
}
  • 檢查 head 是否為空以及新加入的元素 new 是否成功 malloc
  • 使用 strdup 配置記憶體並複製字串,並檢查是否成功。
  • 將新元素插入 list 前端。

CppCheck error : Memory leak

new 指向的新配置的記憶體未在結束時釋放,導致 new 指向的該記憶體在 q_inserthead() 結束後無法存取。

然而我們仍能利用container_of()及 該記憶體空間包含的 list 節點位址來重新取得記憶體空間的位址

當需要使用到該記憶體時,可以使用

element_t *tmp = container_of(node, element_t, list);

加入註解 // cppcheck-suppress memleak 忽略 CppCheck 判斷 new 指標未釋放的 memory leak error。

你應該說明為何抑制 Cppcheck 的 memory leak 錯誤是正確的操作。

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
jserv


q_insert_tail()

bool q_insert_tail(struct list_head *head, char *s)
{
    if (!head)
        return false;
    element_t *new = malloc(sizeof(element_t));

    if (!new)
        return false;

    new->value = strdup(s);
    if (new->value == NULL) {
        free(new);
        return false;
    }
    struct list_head *tmp = head->prev;

    tmp->next = &(new->list);
    (new->list).prev = tmp;

    head->prev = &(new->list);
    (new->list).next = head;
    // cppcheck-suppress memleak
    return true;
}
  • 做法與 q_insert_head() 相似,將最後一個步驟改為插入尾端。

q_remove_head()

element_t *q_remove_head(struct list_head *head, char *sp, size_t bufsize)
{
    if (q_size(head) == 0)
        return NULL;

    struct list_head *r = head->next;

    head->next = r->next;
    r->next->prev = head;

    element_t *tmp = container_of(r, element_t, list);
    size_t str_size = sizeof(sp) <= bufsize - 1 ? sizeof(sp) : bufsize - 1;
    memset(sp, '\0', bufsize);
    memcpy(sp, tmp->value, str_size);

    return tmp;
}
  • 檢查 list 是否為空。
  • 將 list 第一個元素移出。
  • 使用 memcpybufsize 長的的記憶體空間複製至 sp
  • sp 最後一個字元設為 \0

** 將if (head == NULL || head->next == NULL)的判斷修改為q_size(head) == 0
以及將

size_t str_size = sizeof(sp) <= bufsize - 1 ? sizeof(sp) : bufsize - 1;
memset(sp, '\0', bufsize);
memcpy(sp, tmp->value, str_size);

改成

if(sp){
    memcpy(sp, tmp->value, bufsize);
    sp[bufsize - 1] = '\0';
}

q_remove_tail()

element_t *q_remove_tail(struct list_head *head, char *sp, size_t bufsize)
{
    if (head == NULL || head->next == NULL)
        return NULL;

    struct list_head *r = head->prev;

    r->prev->next = head;
    head->prev = r->prev;

    element_t *tmp = container_of(r, element_t, list);
    size_t str_size = sizeof(sp) <= bufsize - 1 ? sizeof(sp) : bufsize - 1;
    memset(sp, '\0', bufsize);
    memcpy(sp, tmp->value, str_size);

    return tmp;
}
  • 做法與 q_remove_head() 相似,將第二個步驟改為尾端元素。

q_size()

int q_size(struct list_head *head)
{
    if (head == NULL || head->next == NULL)
        return 0;

    int count = 0;
    struct list_head **pptr = &head->next;

    for (; *pptr != head; pptr = &(*pptr)->next, count++)
        ;

    return count;
}
  • 走訪 list 的每一個節點並計算長度。

利用額外空間存放 list 長度可使q_size()的呼叫為 constant time。


q_delete_mid()

bool q_delete_mid(struct list_head *head)
{
    if (head == NULL || head->next == NULL)
        return false;

    struct list_head *slow = head->next, *fast = head->next->next;

    for (; fast != head && fast->next != head;
         slow = slow->next, fast = fast->next->next)
        ;

    slow->prev->next = slow->next;
    slow->next->prev = slow->prev;
    element_t *tmp = container_of(slow, element_t, list);

    q_release_element(tmp);

    return true;
}
  • 檢查 list 是否為空。
  • 初始化 slow 以及 fast 指向 list 第一及第二個節點。
    • slow每次前進一個節點。
    • fast每次前進兩個節點。
  • fast 前進到 list 的尾端,也就是list 的最後一個節點或 head 時,slow 會指向中間的節點。
  • 呼叫 q_release_element() 刪除 slow 指向的節點。

q_delete_dup()

bool q_delete_dup(struct list_head *head)
{
    if (q_size(head) < 1)
        return false;

    struct list_head *tmp = head->next;
    while (tmp != head) {
        while (tmp->next != head &&
               strcmp(container_of(tmp, element_t, list)->value,
                      container_of(tmp->next, element_t, list)->value) == 0) {
            struct list_head *tmp2 = tmp->next;
            list_del(tmp->next);
            q_release_element(container_of(tmp2, element_t, list));
        }
        tmp = tmp->next;
    }
    return true;
}
  • 檢查 list 是否為空。
  • tmp 指向第一個節點。
    • 刪除與tmp值相同的元素
      • tmptmp->next 值相同,移出並刪除 tmp->next
      • 重複以上步驟直到 tmptmp->next 值不同
    • tmp 指向 tmp->next
    • 重複直到 tmp 指向陣列尾端。

q_swap()

void q_swap(struct list_head *head)
{
    if (q_size(head) <= 1)
        return;

    struct list_head *node1 = head->next, *node2 = node1->next;
    while (node1 != head && node2 != head) {
        node1->prev->next = node2;
        node2->prev = node1->prev;

        node2->next->prev = node1;
        node1->next = node2->next;

        node1->prev = node2;
        node2->next = node1;

        node1 = node1->next;
        if (node1 == head)
            break;
        node2 = node1->next;
    }
    return;
}
  • 檢查 list 是否為空或只有1個節點。
  • 交換 node1node2 指向的節點。

q_reverse()

void q_reverse(struct list_head *head)
{
    if (q_size(head) <= 1)
        return;

    struct list_head *ptr = head, *prev_node = ptr->prev,
                     *next_node = ptr->next;

    do {
        ptr->prev = next_node;
        ptr->next = prev_node;

        prev_node = ptr;
        ptr = ptr->prev;
        next_node = ptr->next;

    } while (ptr != head);
}
  • 檢查 list 是否為空或只有1個節點。
  • ptr 指向現在的節點,prev_node 指向 ptr 的前一個節點,next_node 指向 ptr 的後一個節點。
    • 交換ptr->prevptr->next 指向的位址。
    • ptrprev_nodenext_node 往下一個節點前進。
    • 重複直到 ptr 指向 list 尾端。

q_sort()

q_sort()q_sort(), merge(), mergesort() 三個函式組成。

q_sort()

將環狀結構拆開,呼叫 merge_sort() 排序後將 list 重新鏈接成 circular

void q_sort(struct list_head *head)
{
    if (q_size(head) <= 1)
        return;
    struct list_head *tmp = head->next;

    head->prev->next = NULL;
    head->next->prev = NULL;
    head->prev = NULL;
    head->next = NULL;

    struct list_head *newhead = merge_sort(tmp);

    head->next = newhead;
    newhead->prev = head;

    struct list_head **pptr = &head;
    for (; (*pptr)->next != NULL; pptr = &(*pptr)->next)
        ;

    head->prev = *pptr;
    (*pptr)->next = head;
}

merge_sort()

排序輸入的 doubly-linked list

struct list_head *merge_sort(struct list_head *pptr2first)
{
    if (pptr2first == NULL || pptr2first->next == NULL)
        return pptr2first;
    struct list_head *slow = pptr2first, *fast = slow->next;

    for (; fast != NULL && fast->next != NULL;
         slow = slow->next, fast = fast->next->next)
        ;

    struct list_head *tmp = slow->next;
    tmp->prev = NULL;

    slow->next = NULL;

    struct list_head *l1 = merge_sort(pptr2first);
    struct list_head *l2 = merge_sort(tmp);

    return merge(l1, l2);
}

merge()

將兩個 doubly-linked list merge成一個。

struct list_head *merge(struct list_head *list1, struct list_head *list2)
{
    char *v1, *v2;
    if (!list1 || !list2)
        return (list2 == NULL) ? list1 : list2;

    struct list_head *newhead = NULL, *tail = NULL, **node = NULL;

    v1 = container_of(list1, element_t, list)->value;
    v2 = container_of(list2, element_t, list)->value;
    node = (strcmp(v1, v2) <= 0) ? &list1 : &list2;
    newhead = tail = *node;
    *node = (*node)->next;

    // cppcheck-suppress knownConditionTrueFalse
    while (list1 && list2) {
        v1 = container_of(list1, element_t, list)->value;
        v2 = container_of(list2, element_t, list)->value;
        node = (*v1 <= *v2) ? &list1 : &list2;

        (*node)->prev = tail;
        tail->next = *node;
        tail = tail->next;

        *node = (*node)->next;
    }

    // cppcheck-suppress knownConditionTrueFalse
    node = (list2 == NULL) ? &list1 : &list2;
    (*node)->prev = tail;
    tail->next = *node;

    return newhead;
}

採用遞迴的方式實作,不使用原本的環狀結構以及head節點,避免多餘的指標數值指派以及空間配置。







G



ptr

ptr



n1

1

NULL

next



ptr->n1





n2

2

prev

next



n1:next->n2





n2:s->n1:s





n3

3

prev

NULL



n2:next->n3





n3:s->n2:s





  • q_sort()

    • 檢查 list 是否為空或只有1個節點。
    • head 移出 list ,將環狀結構打斷。
    • 呼叫 merge_sort() 排序 list。
    • head 加入 list 前端,重新連結首尾。
  • merge_sort()

    • 檢查 list 是否為空或只有1個節點。
    • 使用與 delete_dup() 相同的方式找到中間的節點。
    • 從中央節點將 list 分為左右兩個子 list。
    • 以兩個子 list 為輸入遞迴呼叫自己,排序兩個子 list。
    • 呼叫 merge() 將排序好的兩個 list 合併。
  • merge()

    • list1list2 分別指向兩個 doubly-linked list
    • newhead 指向新 doubly-linked list 的第一個節點,tail 指向 list 尾端。
    • 指標的指標 node 指向下一個要接在 tail 後的節點。






G



tail

tail



n1

#

prev

NULL



tail->n1





list1

list1



n2

1

NULL

next



list1->n2





list2

list2



n3

2

NULL

next



list2->n3





pptr

node



pptr->list1





n4

3

prev

next



n2:next->n4







  • *node 指向的節點連接到tail後。






G



tail

tail



n1

#

prev

next



tail->n1





list1

list1



n2

1

prev

next



list1->n2





list2

list2



n3

2

NULL

next



list2->n3





pptr

node



pptr->list1





n1:next->n2





n2:prev->n1





n4

3

prev

next



n2:next->n4






    • *node 指向下一個節點。






G



tail

tail



n2

1

prev

next



tail->n2:n





list1

list1



n4

3

prev

next



list1->n4





list2

list2



n3

2

NULL

next



list2->n3





pptr

node



pptr->list1





n1

#

prev

next



n2:prev->n1:next






    • 重複直到其中一個 list 為空。
  • 將非空的 list 接在 tail 後。

CppCheck error : Condition always true/false

merge() 中的 while (list1 && list2) 迴圈中透過指標的指標操作 list1list2,但CppCheck仍偵測出 Condition always true,推測是由於函式起始時if空指標的判斷

if (!list1 || !list2)
    return (list2 == NULL) ? list1 : list2;

導致 CppCheck 認為 list1, list2 非空以及無法偵測出指標的指標的間接操作造成。

下方的程式碼

node = (list2 == NULL) ? &list1 : &list2;

也會因相同原因造成Condition always false

加入註解 // cppcheck-suppress knownConditionTrueFalse 可忽略此問題。

shuffle 命令的實作

console_init() 加入程式碼

ADD_COMMAND(shuffle, "                | Shuffle queue contents");

執行 make,發現錯誤訊息

$ make
  CC	qtest.o
In file included from qtest.c:36:
qtest.c: In function ‘console_init’:
console.h:46:45: error: ‘do_shuffle’ undeclared (first use in this function)
   46 | #define ADD_COMMAND(cmd, msg) add_cmd(#cmd, do_##cmd, msg)
      |                                             ^~~
qtest.c:793:5: note: in expansion of macro ‘ADD_COMMAND’
  793 |     ADD_COMMAND(shuffle, "                | Shuffle queue contents");
      |     ^~~~~~~~~~~
console.h:46:45: note: each undeclared identifier is reported only once for each function it appears in
   46 | #define ADD_COMMAND(cmd, msg) add_cmd(#cmd, do_##cmd, msg)
      |                                             ^~~
qtest.c:793:5: note: in expansion of macro ‘ADD_COMMAND’
  793 |     ADD_COMMAND(shuffle, "                | Shuffle queue contents");
      |     ^~~~~~~~~~~
make: *** [Makefile:50: qtest.o] Error 1

第 5 行顯示 ADD_COMMAND(cmd, msg) 巨集內的 do_##cmd 函式未宣告,於是加入

static bool do_shuffle() { return true; }

執行 qtest,輸入 helpshuffle 出現於第 20 行。

$ ./qtest cmd> help Commands: # ... | Display comment dedup | Delete all nodes that have duplicate string dm | Delete middle node in queue free | Delete queue help | Show documentation ih str [n] | Insert string str at head of queue n times. Generate random string(s) if str equals RAND. (default: n == 1) it str [n] | Insert string str at tail of queue n times. Generate random string(s) if str equals RAND. (default: n == 1) log file | Copy output to file new | Create new queue option [name val] | Display or set options quit | Exit program reverse | Reverse queue rh [str] | Remove from head of queue. Optionally compare to expected value str rhq | Remove from head of queue without reporting value. rt [str] | Remove from tail of queue. Optionally compare to expected value str show | Show queue contents shuffle | Shuffle queue contents size [n] | Compute queue size n times (default: n == 1) sort | Sort queue in ascending order source file | Read commands from source file swap | Swap every two adjacent nodes in queue time cmd arg ... | Time command execution Options: echo 1 Do/don't echo commands error 5 Number of errors until exit fail 30 Number of times allow queue operations to return false length 1024 Maximum length of displayed string malloc 0 Malloc failure probability percent simulation 0 Start/Stop simulation mode verbose 4 Verbosity level

do_shuffle()

static bool do_shuffle()
{
    if (!l_meta.l || !l_meta.size) {
        show_queue(0);
        return false;
    }
    srand(time(NULL));

    int i;
    struct list_head *tail = l_meta.l->prev;

    for (i = lcnt; i > 1; i--) {
        int num = rand() % i;
        int j;
        struct list_head *ptr = l_meta.l->next;
        for (j = 0; j < num; j++)
            ptr = ptr->next;
        struct list_head *tmp = tail->prev;
        if (ptr != tail) {
            if (ptr->next == tail) {
                ptr->prev->next = tail;
                tail->prev = ptr->prev;

                tail->next->prev = ptr;
                ptr->next = tail->next;

                ptr->prev = tail;
                tail->next = ptr;
            } else {
                list_del_init(tail);
                list_add(tail, ptr);
                list_del_init(ptr);
                list_add(ptr, tmp);
            }
        }
        if (tmp != l_meta.l)
            tail = tmp;
    }
    show_queue(0);
    return true;
}
  • 檢查 list 是否為空。
  • tail 指向 list 最後一個節點。

根據 Fisher–Yates shuffle 演算法實作,步驟如下

  • 在第一個節點到 tail 節點之間隨機選一節點 ptr
  • ptrtail 互換。
    ptr 的下一節點為 tail ,交換節點之方法可參考q_swap()
    ptr 的下一節點不為 tail ,:
    • tail 移出 list 。






G



head

head



n1

#

prev

next



head->n1





tail

tail



n5

4

prev

next



tail->n5





ptr

ptr



n2

1

prev

next



ptr->n2





tmp

tmp



n4

3

prev

next



tmp->n4





n1:next->n2:prev






n3

2

prev

next



n2:next->n3:prev






n3:next->n4:prev






n4:next->n5:prev






n5:next->n1:s












G



head

head



n1

#

prev

next



head->n1





tail

tail



n5

4

prev

next



tail->n5





ptr

ptr



n2

1

prev

next



ptr->n2





tmp

tmp



n4

3

prev

next



tmp->n4





n1:next->n2:prev






n3

2

prev

next



n2:next->n3:prev






n3:next->n4:prev






n4:next->n1:prev






    • 加入到 ptr






G



head

head



n1

#

prev

next



head->n1





tail

tail



n5

4

prev

next



tail->n5





ptr

ptr



n2

1

prev

next



ptr->n2





tmp

tmp



n4

3

prev

next



tmp->n4





n1:next->n2:prev






n2:next->n5:prev






n3

2

prev

next



n3:next->n4:prev






n4:next->n1:prev






n5:next->n3:s






    • ptr 移出 list 。






G



head

head



n1

#

prev

next



head->n1





tail

tail



n5

4

prev

next



tail->n5





ptr

ptr



n2

1

prev

next



ptr->n2





tmp

tmp



n4

3

prev

next



tmp->n4





n1:next->n5:prev






n3

2

prev

next



n3:next->n4:prev






n4:s->n1:prev






n5:next->n3:prev






    • ptr 加入 tmp 後。






G



head

head



n1

#

prev

next



head->n1





ptr

ptr



n2

1

prev

next



ptr->n2





tmp

tmp



n4

3

prev

next



tmp->n4





tail

tail



n5

4

prev

next



tail->n5





n1:next->n5:prev






n2:next->n1:prev






n3

2

prev

next



n3:next->n4:prev






n4:next->n2:prev






n5:next->n3:prev






** 將 tail 指向 tmp 指向的節點。

引入list_sort.c

研讀list_sort.c

__attribute__((nonnull(2,3)))
void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp)
  • 參數 privlist_sort() 中並未使用,只是將 priv 傳給 cmp,若 cmp 也不使用則可傳入 NULL
  • 參數 head 為要排序的 list。
  • cmp 為比較元素大小用的 function pointer, cmplist_sort.h 的定義如下 :
typedef int __attribute__((nonnull(2,3))) (*list_cmp_func_t)(void *,const struct list_head *, const struct list_head *);

__attribute__((nonnull(2,3)))表示第二及第三個參數的值不可能為 NULL

list_sort()

list_sort() 嘗試將待合併的 list 維持

2:1 的長度,若已存在 2 個
2k
大小的 sublist ,後面的元素總量又達到
2k
時,就會合併 2 個
2k
大小的 sublist。
如此作法只要在 L1 Cache 能夠容納
32k
數量的元素時就能避免 thrashing,也能夠減少
0.2n
次的比較。

struct list_head *list = head->next, *pending = NULL;
	size_t count = 0;	/* Count of pending */

	if (list == head->prev)	/* Zero or one elements */
		return;

	/* Convert to a null-terminated singly-linked list. */
	head->prev->next = NULL;

首先先將 list 尾端連向 head 的連結打斷,使其變成單向的 list,*pending 指向一個 list,其中每個元素皆為等待被 merge 的 sublist,並巧妙利用原 list 的 prev 指標作為指向下一個節點的指標。

do {
		size_t bits;
		struct list_head **tail = &pending;

		/* Find the least-significant clear bit in count */
		for (bits = count; bits & 1; bits >>= 1)
			tail = &(*tail)->prev;
		/* Do the indicated merge */
		if (likely(bits)) {
			struct list_head *a = *tail, *b = a->prev;

			a = merge(priv, cmp, b, a);
			/* Install the merged result in place of the inputs */
			a->prev = b->prev;
			*tail = a;
		}

		/* Move one element from input list to pending */
		list->prev = pending;
		pending = list;
		list = list->next;
		pending->next = NULL;
		count++;
	} while (list);

初始狀態







G



head

head



n0

#

prev

next



head->n0





pending

pending



NULL

NULL



pending->NULL





tail

tail



tail->pending





list

list



n1

1

prev

next



list->n1:n





n0:next->n1:prev






n2

2

prev

next



n1:next->n2:prev






n3

3

prev

next



n2:next->n3:prev






n4

4

prev

next



n3:next->n4:prev






n4:next->NULL





count =

010bits =
02
,經過 for 迴圈之後,bits 為 0 ,因此不進行 merge,將 list 節點加入 pending







G


cluster_0

sublist 1



head

head



n0

#

prev

next



head->n0





pending

pending



n1

1

prev

next



pending->n1





tail

tail



tail->pending





list

list



n2

2

prev

next



list->n2:n





n0:next->n1:prev





NULL

NULL



n1:prev->NULL





n1:next->NULL





n2:prev->n1





n3

3

prev

next



n2:next->n3:prev






n4

4

prev

next



n3:next->n4:prev






n4:next->NULL





count =

110bits =
012
,經過 for 迴圈之後,bits 為 0 ,因此不進行 merge,將 list 節點加入 pending,建立新的 sublist。







G


cluster_0

sublist 1


cluster_1

sublist 2



head

head



n0

#

prev

next



head->n0





pending

pending



n2

2

prev

NULL



pending->n2





tail

tail



tail->pending





list

list



n3

3

prev

next



list->n3





n1

1

prev

NULL



n0:next->n1:w





NULL

NULL



n1:prev->NULL





n2:prev->n1





n3:prev->n2





n4

4

prev

NULL



n3:next->n4:prev






count =

210bits =
102
,經過 for 迴圈之後,bits
102
,因此將*tail指向的 sublist 與前一個 sublist merge







G


cluster_1

sublist 1



head

head



n0

#

prev

next



head->n0





pending

pending



n1

1

NULL

next



pending->n1





tail

tail



tail->pending





list

list



n3

3

prev

next



list->n3





n0:next->n1





n2

2

prev

NULL



n1:next->n2





n3:prev->n2





n4

4

prev

NULL



n3:next->n4:prev






再把 list 節點加入 pending,建立新的 sublist。







G


cluster_1

sublist 2


cluster_0

sublist 1



head

head



n0

#

prev

next



head->n0





pending

pending



n3

3

prev

NULL



pending->n3





tail

tail



tail->pending





list

list



n4

4

prev

NULL



list->n4





n1

1

NULL

next



n0:next->n1





n2

2

prev

NULL



n1:next->n2





n3:prev->n1





n4:prev->n3:w





重複以上步驟至 list 沒有剩餘節點,

list = pending;
	pending = pending->prev;
	for (;;) {
		struct list_head *next = pending->prev;

		if (!next)
			break;
		list = merge(priv, cmp, pending, list);
		pending = next;
	}
	/* The final merge, rebuilding prev links */
	merge_final(priv, cmp, head, pending, list);

參考 SmallHanley以及 arthurchang09 的方式,在 qtest.c 中加入 linuxsort 命令來執行。

static int list_node_cmp(void *priv,
                         const struct list_head *a,
                         const struct list_head *b)
{
    return strcmp(list_entry(a, element_t, list)->value,
                  list_entry(b, element_t, list)->value);
}

static bool do_linuxsort(){
    if (!l_meta.l || !l_meta.size) {
        show_queue(0);
        return false;
    }
    list_sort(NULL, l_meta.l, list_node_cmp);
    show_queue(0);
    return true;
}

接著使用 qtest.ctime 命令來計算排序一百萬個節點所花費的時間,如下表。

節點個數 我的sort linux sort 我的sort / linux sort
500000 0.887(s) 0.435(s) 0.49
1000000 1.816(s) 0.971(s) 0.53
2000000 4.098(s) 2.171(s) 0.53

使用 Valgrind 的 Cachegrind 工具分析我的實作,以一百萬個節點為例。

valgrind --tool=cachegrind --branch-sim=yes ./qtest -f traces/my_sort.cmd
cg_annotate --auto=yes cachegrind.out.5966 
--------------------------------------------------------------------------------
I1 cache:         32768 B, 64 B, 8-way associative
D1 cache:         49152 B, 64 B, 12-way associative
LL cache:         12582912 B, 64 B, 12-way associative
Command:          ./qtest -f traces/my_sort.cmd
Data file:        cachegrind.out.5966

--------------------------------------------------------------------------------
Ir            I1mr  ILmr  Dr          D1mr       DLmr       Dw          D1mw      DLmw      Bc          Bcm        Bi         Bim 
--------------------------------------------------------------------------------
2,836,569,531 1,750 1,720 719,430,080 57,810,495 27,214,285 331,698,599 4,719,563 4,347,615 394,236,008 15,439,132 39,671,200 337  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir          I1mr ILmr Dr          D1mr       DLmr      Dw         D1mw      DLmw      Bc         Bcm       Bi         Bim  file:function
--------------------------------------------------------------------------------
400,040,751    4    4 100,371,223 14,478,851 4,528,482 82,696,976     8,144        32 39,862,271 1,561,607          0   0  /home/scott/Documents/lab0-c/queue.c:merge
 97,920,684    1    1  35,017,853 16,569,945 4,935,556  6,999,993   263,755   242,938 22,951,423   771,021          0   0  /home/scott/Documents/lab0-c/queue.c:merge_sort
  4,000,025    2    2   1,000,006  1,000,001   999,674         11         4         4  1,000,002         7          0   0  /home/scott/Documents/lab0-c/queue.c:q_sort

Cachegrind 工具分析 linux list_sort,同樣排序一百萬個節點。

cg_annotate --auto=yes cachegrind.out.6025 
--------------------------------------------------------------------------------
I1 cache:         32768 B, 64 B, 8-way associative
D1 cache:         49152 B, 64 B, 12-way associative
LL cache:         12582912 B, 64 B, 12-way associative
Command:          ./qtest -f traces/linux_sort.cmd
Data file:        cachegrind.out.6025

--------------------------------------------------------------------------------
Ir            I1mr  ILmr  Dr          D1mr       DLmr       Dw          D1mw      DLmw      Bc          Bcm        Bi         Bim 
--------------------------------------------------------------------------------
2,706,564,927 1,743 1,713 653,863,237 39,702,748 20,293,696 312,061,904 5,453,266 5,097,384 363,826,443 24,564,592 57,370,502 338  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir          I1mr ILmr Dr         D1mr       DLmr      Dw         D1mw      DLmw      Bc         Bcm        Bi         Bim  file:function
--------------------------------------------------------------------------------
248,131,972    1    1 27,686,766  3,100,861 1,045,376 43,373,556         0         0 36,373,570 10,023,940 17,686,786   1  /home/scott/Documents/lab0-c/list_sort.c:merge
149,494,248    1    1 56,060,343 12,968,773 4,829,301 18,686,781         0         0          0          0          0   0  /home/scott/Documents/lab0-c/qtest.c:list_node_cmp
 42,024,281    8    8  6,999,966    507,586   500,184  8,999,980 1,005,613   992,750  6,000,011  1,406,627    999,995   1  /home/scott/Documents/lab0-c/list_sort.c:list_sort

Valgrind cg-manual
Cachegrind gathers the following statistics (abbreviations used for each statistic is given in parentheses):

  • I cache reads (Ir, which equals the number of instructions executed), I1 cache read misses (I1mr) and LL cache instruction read misses (ILmr).
  • D cache reads (Dr, which equals the number of memory reads), D1 cache read misses (D1mr), and LL cache data read misses (DLmr).
  • D cache writes (Dw, which equals the number of memory writes), D1 cache write misses (D1mw), and LL cache data write misses (DLmw).
  • Conditional branches executed (Bc) and conditional branches mispredicted (Bcm).
  • Indirect branches executed (Bi) and indirect branches mispredicted (Bim).
L1 read miss LL read miss Bcm
my mergesort 17,318,797 10,463,712 2,332,635
list_sort 16,577,220 6,374,861 11,430,567

根據兩次的測試結果可以發現,我的實作在 Cache miss 方面比 list_sort 多,但在 Branch mispredicted 則比較少。Valgrind cg-manual 提到

On a modern machine, an L1 miss will typically cost around 10 cycles, an LL miss can cost as much as 200 cycles, and a mispredicted branch costs in the region of 10 to 30 cycles. Detailed cache and branch profiling can be very useful for understanding how your program interacts with the machine and thus how to make it faster.

可見 LL miss 的懲罰比起分支預測錯誤還要重很多。對於 list_sort 而言,減少 Cache miss 顯然是效益較高的。

仔細查看我的實作程式碼的部分可以發現在 merge_sort() 以快慢節點找出中間位置的時候發生了許多 miss :

Ir          I1mr ILmr Dr         D1mr       DLmr      Dw         D1mw    DLmw    Bc         Bcm       Bi Bim 
------------------------------------------------------
 47,787,842    0    0  9,884,992  6,045,985 2,126,317          0       0       0 18,951,425   652,723  0   0      for (; fast != NULL && fast->next != NULL;
 18,132,866    0    0 18,132,866 10,517,420 2,809,211          0       0       0          0         0  0   0           slow = slow->next, fast = fast->next->next)

此方法必須要走訪每一個節點,由於資料量高達一百萬個,也不可能將全部資料搬進 Cache,因此每一次調用 next 指標都很有可能造成 Cache miss。
merge() 函式中的 *node = (*node)->next;

Ir          I1mr ILmr Dr         D1mr       DLmr      Dw         D1mw    DLmw    Bc         Bcm       Bi Bim 
------------------------------------------------------
35,348,494    0    0 17,674,247  2,944,096   948,749 17,674,247       0       0          0         0  0   0          *node = (*node)->next;

另一個造成嚴重 Cache miss 的則是比較字串的部分,以下為 merge()字串比較的部分

Ir          I1mr ILmr Dr         D1mr       DLmr      Dw         D1mw    DLmw    Bc         Bcm       Bi Bim 
------------------------------------------------------
143,393,974    0    0 35,348,494 11,523,119 3,579,693 17,674,247       0       0          0         0  0   0          node = (strcmp(v1, v2) <= 0) ? &list1 : &list2;

list_sort 則是傳入 list_sort() 參數的 cmp function。

Ir          I1mr ILmr Dr         D1mr       DLmr      Dw         D1mw    DLmw    Bc         Bcm       Bi Bim 
------------------------------------------------------
149,494,248    1    1 56,060,343 12,968,773 4,829,301 18,686,781         0         0          0          0          0   0  /home/scott/Documents/lab0-c/qtest.c:list_node_cmp

參考資料

你所不知道的 C 語言: linked list 和非連續記憶體