2020q1 Homework1 (lab0)

contributed by < ire33164 >

tags: `Linux Kernel`

作業說明

這是 linux 核心設計課程的第一次作業，主要用來加深對 singly linked list 的概念與實做

實做內容

queue.h

為了符合 C Programming Lab 中所提到q_size 和 q_insert_tail 的時間複雜度須為

O (1)

Two of the functions: q_insert_tail and q_size will require some effort on your part to meet the required performance standards. Naive implementations would require O(n) steps for a queue with n elements. We require that your implementations operate in time O(1)

因此在 struct queue_t 新增了兩個 field list_ele_t *tail, int size

/* Queue structure */
typedef struct {
    list_ele_t *head, *tail; /* Linked list of elements */
    int size;
} queue_t;

tail : 型別為 list_ele_t *, 用於紀錄 queue 最尾端的元素
size : 型別為 int, 用於紀錄 queue 上目前串接的元素數量

queue.c

q_new

創建新的 queue 並初始化 :

確認 malloc 是否成功
確認成功後再將 size 設為 0 (空佇列
head, tail 皆指向 NULL













queue_t *q_new()
{
    queue_t *q = malloc(sizeof(queue_t));
    /* Return NULL if malloc failed */
    if (!q) {
        return NULL;
    }
    /* Initialize queue */
    q->head = NULL;
    q->tail = NULL;
    q->size = 0;
    return q;
}

q_free

釋放 queue 中的 space

確認 q 是否為 NULL
利用迴圈逐個釋放 element space (list_ele_t 和 value)
釋放 queue structure


















void q_free(queue_t *q)
{
    /* No effect if q is NULL */
    if (!q) {
        return;
    }
    /* Free all elements space */
    for (list_ele_t *cur = q->head, *prev = NULL; cur; ) {
        prev = cur;
        cur = cur->next;
        /* Free value space */
        free(prev->value);
        /* Free list_ele structure */
        free(prev);
    }
    /* Free queue structure */
    free(q);
}

q_insert_head

插入 element 到 queue 的 head

確認 q 是否為 NULL
Allocate list_ele_t 的空間給 new element 並確認是否成功
Allocate strlen(s) + 1 給 new element 中的 value 並確認是否成功
將 new element 的 next 指向 old head
new element 成為 head
- 若 queue 為空, 則將 tail 和 head 都指向 new element
利用 strncpy 將 s 複製到 newh->value 中
將 size + 1

關於 (3) 為什麼要取 strlen(s) + 1 ？
在 C 語言中，字串被以連續記憶體的方式儲存下來
並在字串的最後存下中止符號 '\0'
代表字串的的結尾
因此假設 string 長度為 k 則需要 k + 1 個字元空間才能儲存

再看看 strlen 的描述

The C library function size_t strlen(const char *str) computes the length of the string str up to, but not including the terminating null character.
From tutorialspoint

因此需要 allocate strlen(s) + 1 的空間





























bool q_insert_head(queue_t *q, char *s)
{
    list_ele_t *newh;
    /* Return false if q is NULL */
    if (!q) {
        return false;
    }
    newh = malloc(sizeof(list_ele_t));
    /* Return false if malloc failed */
    if (!newh) {
        return false;
    }
    /* Return false if malloc failed */
    newh->value = malloc(strlen(s) + 1);
    if (!newh->value) {
        /* Free allocated space */
        free(newh);
        return false;
    }
    newh->next = q->head;
    strncpy(newh->value, s, strlen(s) + 1);
    if (q->size == 0) {
        q->head = q->tail = newh;
    } else {
        q->head = newh;
    }
    (q->size)++;
    return true;
}

q_insert_tail

插入 element 到 queue 的 tail

確認 q 是否為 NULL
Allocate list_ele_t 的空間給 new element 並確認是否成功
Allocate strlen(s) + 1 給 new element 中的 value 並確認是否成功
將 new element 的 next 指向 NULL
利用 strncpy 將 s 複製到 newh->value 中
將 old tail 的 next 指向 new element 使其成為 tail
- 若 queue 為空, 則將 tail 和 head 都指向 new element
將 size + 1






























bool q_insert_tail(queue_t *q, char *s)
{
    /* Return false if q is NULL */
    if (!q) {
        return false;
    }
    list_ele_t *newt;
    newt = malloc(sizeof(list_ele_t));
    /* Return false if malloc failed */
    if (!newt) {
        return false;
    }
    newt->value = malloc(strlen(s) + 1);
    /* Return false if malloc failed */
    if (!newt->value) {
        /* Free allocated space */
        free(newt);
        return false;
    } 
    newt->next = NULL;
    strncpy(newt->value, s, strlen(s) + 1);
    if (q->size == 0) {
        q->tail = q->head = newt;
    } else {
        q->tail->next = newt;
        q->tail = newt;
    }
    (q->size)++;
    return true;
}

q_remove_head

移除 queue 的 head, 若 sp 不為 NULL, 則複製長度最大為 bufsize 的字串到 sp 中

確認 q 是否為 NULL 或 empty
若 sp 不為 NULL, 則將 q->head->value 複製到 sp 中, 並將 sp 的最後一個 char 設為 '\0'
將 head 指向 head->next 改變 queue 的 head
將 size - 1
若移除 head 後 queue 為空, 則把 tail 也指向 NULL
釋放 element space (list_ele_t 和 value)

strncpy 的描述

The C library function char *strncpy(char *dest, const char *src, size_t n) copies up to n characters from the string pointed to, by src to dest. In a case where the length of src is less than that of n, the remainder of dest will be padded with null bytes.
From tutorialspoint

當中提到若 n 大於 src 的字串長度時
其在 dest 上的差額會自動補上 null bytes
因此使用 strncpy 複製字串時可以直接給定最大長度
又因 strncpy 並不像 strcpy 會在複製後自動補上 '\0'
所以就必須預留最後一個 byte 給 '\0'
因此複製的最大長度為 bufsize - 1























bool q_remove_head(queue_t *q, char *sp, size_t bufsize)
{
    /* Return false if q is NULL or empty */
    if (!q || q->size == 0) {
        return false;
    }
    /* Copy string to sp if sp is non-NULL */
    if (sp) {
        /* Copy the string that its length is smaller than bufsize */
        strncpy(sp, q->head->value, bufsize - 1);
        sp[bufsize - 1] = '\0';
    }
    list_ele_t *tmp = q->head;
    q->head = q->head->next;
    (q->size)--;
    /* Set tail to NULL if queue is empty */
    q->tail = q->size == 0 ? q->head : q->tail;
    /* Free value space */
    free(tmp->value);
    /* Free list_ele structure */
    free(tmp);
    return true;
}

q_size

回傳 queue 的 size

若 q 為 NULL 則回傳 0
回傳 size








int q_size(queue_t *q)
{
    /* Return 0 if q is NULL */
    if (!q) {
        return 0;
    }
    return q->size;
}

q_reverse

反轉 queue 的排序

確認 q 是否為 NULL 或 empty
每個 element 的 next 指向前一個 element
swap queue 中的 head 和 tail
完成反轉

















void q_reverse(queue_t *q)
{
    /* No effect if q is NULL or empty */
    if (!q || q->size == 0) {
        return;
    }
    for (list_ele_t *prev = NULL, *cur = q->head; cur; ) {
        list_ele_t *tmp = cur;
        cur = cur->next;
        tmp->next = prev;
        prev = tmp;
    }
    list_ele_t *tmp;
    tmp = q->head;
    q->head = q->tail;
    q->tail = tmp;
}

q_sort

對 queue 做依據 natural sort 排序，參考測驗 1 使用 divide and conquer 的方式做排序























































list_ele_t *divide_and_conquer(list_ele_t *head) {
    if (!head || !head->next) {
        return head;
    }
    list_ele_t *left = head;
    list_ele_t *right = head->next;
    left->next = NULL;

    left = divide_and_conquer(left);
    right = divide_and_conquer(right);

    for (list_ele_t *merge = NULL; left || right; ) {
        if (!right || (left && strcmp(left->value, right->value) < 0)) {
            if (!merge) {
                head = merge = left;
            } else {
                merge->next = left;
                merge = merge->next;
            }
            left = left->next;
        } else {
            if (!merge) {
                head = merge = right;
            } else {
                merge->next = right;
                merge = merge->next;
            }
            right = right->next;
        }
    }
    return head;
}

/*
 * Sort elements of queue in ascending order
 * No effect if q is NULL or empty. In addition, if q has only one
 * element, do nothing.
 */
void q_sort(queue_t *q)
{
    /* No effect if q is NULL or empty */
    if (!q || q->size == 0) {
        return;
    }
    /* Do nothing if q has only one */
    if (q->size == 1) {
        return;
    }
    q->head = divide_and_conquer(q->head);
    list_ele_t *tmp = q->head;
    while (tmp->next) {
        tmp = tmp->next;
    }
    q->tail = tmp;
}

除了測驗一的演算法外，值得注意的是做完排序後須重新指派 head 和 tail。
執行 $ make test 進行測試 :

+++ TESTING trace trace-15-perf:
# Test performance of insert_tail, size, reverse, and sort
---     trace-15-perf   0/6
+++ TESTING trace trace-16-perf:
# Test performance of sort with random and descending orders
# 10000: all correct sorting algorithms are expected pass
# 50000: sorting algorithms with O(n^2) time complexity are expected failed
# 100000: sorting algorithms with O(nlogn) time complexity are expected pass
ERROR: Time limit exceeded.  Either you are in an infinite loop, or your code is too inefficient
---     trace-16-perf   0/6
+++ TESTING trace trace-17-complexity:
# Test if q_insert_tail and q_size is constant time complexity
Probably constant time
Probably constant time
---     trace-17-complexity     5/5
---     TOTAL           88/100

可以看到剩下 trace-15-perf 和 trace-16-perf 未通過。

Time limit exceed

其中 trace-16-perf 顯示 Time limit exceed，經思考過後發現測驗一的演算法相當於將一個 left 插入已排序好的 right 中，概念類似 insertion sort，時間複雜度為

O (N^{2})

，若將 queue 從中間切一半分成 left 和 right 再進行運算，概念類似 merge sort，時間複雜度則可從原本的

O (N^{2})

降為

O (N l o g (N))

利用 left 走一步 right 走兩步的技巧，當 right 或 right->next 為 NULL 時，left 就剛好在 queue 的中間位置

改良後 :





















list_ele_t *divide_and_conquer(list_ele_t *head) {
    if (!head || !head->next) {
        return head;
    }
    list_ele_t *left = head;
    list_ele_t *right = head->next;

    /* left : 1, right : 2 */
    while (right && right->next) {
        left = left->next;
        right = right->next->next;
    }
    right = left->next;
    left->next = NULL;
    left = head;

    left = divide_and_conquer(left);
    right = divide_and_conquer(right);
    
...

執行 $ make test :

通過測驗！！

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

不過我很好奇為什麼改完 trace-16-perf 後 trace-15-perf 也跟著修好了，於是我用原本的 code 利用 Valgrind 再測試一次 trace-15-perf 來分析記憶體的使用情形。

執行 $ scripts/driver.py -p /tmp/qtest.pSix6W --valgrind -t 15 :

+++ TESTING trace trace-15-perf:
# Test performance of insert_tail, size, reverse, and sort
==19349== Stack overflow in thread #1: can't grow stack to 0x1ffe801000
==19349== Stack overflow in thread #1: can't grow stack to 0x1ffe801000
==19349== Can't extend stack to 0x1ffe8010a8 during signal delivery for thread 1:
==19349==   no stack segment

Stack overflow

使用 Valgrind 測試後, 發現裡面存在著 stack overflow 的問題，所以猜測是原本的演算法遞迴呼叫太多次導致 stack 無法容納如此多的 return 的 address，在改良成 merge sort 後遞迴次數變少，不只時間複雜度下降, stack overflow 的問題也跟著解決

Natural sort - 3/1 更新

撰寫的過程中忘記要使用 natural sort 來代替原本的 strcmp，因此補上進度

由於要使用 strnatcmp 因此先至 natsort 下載 strnatcmp.[ch] 加入專案中，接著將 strnatcmp.c 匯編至 Makefile 中進行編譯並且連結 :

...

OBJS := qtest.o report.o console.o harness.o queue.o strnatcmp.o \
        random.o dudect/constant.o dudect/fixture.o dudect/ttest.o
...

將 queue.c 中 strcmp 換成 strnatcmp 並匯入 strnatcmp.h，發現執行 make test 時 trace-perf-15 失敗，於是參考 merge sort iteration 對原本的程式碼稍做修改:

...

while (left && right) {
    if (strnatcmp(left->value, right->value) < 0) {
        if (!merge) {
            merge = head = left;
        } else {
            merge->next = left;
            merge = merge->next;
        }
        left = left->next;
    } else {
        if (!merge) {
            merge = head = right;
        } else {
            merge->next = right;
            merge = merge->next;
        }
        right = right->next;
    }
}

merge->next = left ? left : right ? right : NULL;

...

即可跑過測資

測試 sort 是否成功

cmd> ih h31
q = [h31 h4 h h h1 h11 h21 k t]
cmd> sort
cmd> sort
ERROR: Not sorted in ascending order
q = [h h h1 h4 h11 h21 h31 k t]

出現了 ERROR: Not sorted in ascending order
於是回頭 trace qtest 中的 function do_sort
發現

/* FIXME: add an option to specify sorting order */
if (strcasecmp(e->value, e->next->value) > 0) {
    report(1, "ERROR: Not sorted in ascending order");
    ok = false;
    break;
}

Replace merge sort with pointer to pointer

2021/1/31 更新

因為之前寫的 code 太醜，於是把整個 do_sort 重寫:

Divid_and_conquer

list_ele_t *divide_and_conquer(list_ele_t *head)
{         
    if (!head || !head->next) { 
        return head;
    }     
    list_ele_t *left = head;
    list_ele_t *right = head->next;
          
    split(head, &left, &right);
    left = divide_and_conquer(left);
    right = divide_and_conquer(right);
    return merge(left, right);
}

透過 function split 將 queue 從中間切開，這邊採用 Fast & Slow Pointer 來找出中間的值:

/* split from the middle of queue */
void split(list_ele_t *head, list_ele_t **q1, list_ele_t **q2)
{                                  
    list_ele_t *slow = head;       
    list_ele_t *fast = head->next; 
    while (fast && fast->next) {   
        slow = slow->next;         
        fast = fast->next->next;   
    }                              
    *q1 = head;                    
    *q2 = slow->next;              
    slow->next = NULL;             
    return;                        
}

merge

list_ele_t *merge(list_ele_t *q1, list_ele_t *q2)
{                                  
    list_ele_t *head = NULL;       
    list_ele_t **tmp = &head;      
    while (q1 && q2) {             
        if (strnatcmp(q1->value, q2->value) < 0) {
            *tmp = q1;             
            q1 = q1->next;         
            tmp = &((*tmp)->next); 
        } else {                   
            *tmp = q2;             
            q2 = q2->next;         
            tmp = &((*tmp)->next); 
        }                          
    }                              
    /* add the last value which is the maximum value of the queue */
    *tmp = q1 ? q1 : q2;           
    return head;                   
}

比較前面的凌亂寫法，這邊採用 pointer-to-pointer 的技巧，使整個程式碼簡潔許多，解釋一下 pointer to pointer 的概念:
- tmp 是一個指向 head 的 pointer
- 移動 node 時僅須改變 tmp 指向的位置也就是 *tmp 的值
- 這麼一來不只 head 的位址不會一直改變，整個程式碼也精簡很多

test

目前跑測資依舊是 trace-15-perf fail，用 valgrind 分析記憶體使用:

$ scripts/driver.py -p ./qtest -t 15 --valgrind
---	Trace		Points
+++ TESTING trace trace-15-perf:
# Test performance of insert_tail, size, reverse, and sort
ERROR: Time limit exceeded.  Either you are in an infinite loop, or your code is too inefficient
ERROR: Insertion of dolphin failed (1 failures total)
ERROR: Time limit exceeded.  Either you are in an infinite loop, or your code is too inefficient
ERROR: Insertion of gerbil failed (2 failures total)
ERROR: Time limit exceeded.  Either you are in an infinite loop, or your code is too inefficient

由於 Time limit exceeded 是發生在採用 natural sort 後，猜測應該是 strnatcmp 帶來對效能的衝擊，使用 perf 測試 strnatcmp 與 strcmp 的差別:

$ perf record scripts/driver.py -p ./qtest -t 15
$ perf report

strnatcmp 直接佔了 43.13% 的 overhead

使用原本的 strcmp 後因為少了 strnatcmp 的 overhead，trace-15 也可以輕鬆通過

除了 strnatcmp 外，split 本身也佔了整體 20% ~ 29% 的 overhead，因此可能要從優化 split 下手。

利用 perf 找出 split bottleneck:

$ perf annotate

可以看出大部分的 overhead 都是在 slow-fast pointer 中產生

TODO: 尋找優化方法

透過 Massif 視覺化 simulation 的記憶體使用量

$ valgrind --tool=massif ./qtest

得到 unknown option: --show-leak-kinds=all 的結果，