2019q1 Homework1 (list)

contributed by < Shengyuu >

自我檢查事項

為何 Linux 採用 macro 來實作 linked list ? 一般的 function call 有何成本？
- macro 是什麼
  - macro 會在程式被編譯之前執行，將使用者定義的名詞進行轉換
```
#include<stdio.h>
#define A 10

int main()
{
    printf("%d",A);
    return 0;
}
```
  經過編譯之後可以發現 A 直接被代換掉了
  gcc -E macro.c
  Image Not Showing Possible Reasons
  The image file may be corrupted
  The server hosting the image is unavailable
  The image path is incorrect
  The image format is not supported
  Learn More →
- macro 要注意的事
  - 參考Macros vs Functions
- macro v.s function
  - macro 會直接將程式碼做文字替換，會增加指令空間的大小，但可以省略掉呼叫 function call 時對 stack 的操作，當 function call 重複較多次時，用 macro 代替可以節省許多時間
之後補上反組譯分析及效能分析
Linux 應用 linked list 在哪些場合？
GNU extension 的 typeof 有何作用
- typeof 可以回傳變數的型態，因為 macro 中無法給定變數型態，所以在 macro 中 typeof 被大量使用以確保型態正確
增加程式碼範例

解釋以下巨集的原理

觀察題目要求的巨集 container_of 可以發現當中用了另一個巨集 "offset" ，它被定義在 <linux/stddef.h> 中，可以用來計算某一個 struct 結構的成員相較於該結構起始位置的偏移量

#define offsetof(TYPE, MEMBER)  ((size_t)&((TYPE *)0)->MEMBER)

此段程式碼把數值 0 強制轉換成 TYPE 指標類型，拿取該 struct 指定成員後，對此成員取址，最後回傳一個型態為 size_t 的數。

驗證：















#include<stdlib.h>
#define offsetof(TYPE, MEMBER)  ((size_t)&((TYPE *)0)->MEMBER)

struct test{
    int a;
    int b;
    int c;
};

int main(){
    struct test *t;
    size_t size1,size2;
    size1 = offsetof(struct test, b);
    size2 = offsetof(struct test, c);
}

(gdb) p sizeof(*t)
$1 = 12
(gdb) p sizeof(int)
$2 = 4
(gdb) p size1
$3 = 4
(gdb) p size2
$4 = 8
(gdb)

了解巨集 offset 後，我們可以進一步來看題目所要求的巨集 container_of ，它被定義在 <linux/lernel.h> 中，這個巨集的作用是可以得到某個 struct 的起始點，只要我們知道該結構任意一個成員的地址就可以用 containner_of 算出該集夠的起始位置





#define container_of(ptr, type, member)                            \
__extension__({                                                \
    const __typeof__(((type *) 0)->member) *__pmember = (ptr); \
    (type *) ((char *) __pmember - offsetof(type, member));    \
})

此段程式碼前半部先宣告一個型態和 member 相同的指標 __pmember 並取得 member 得所在位址 ptr ，第二部份將 member 所在的位址減去 offset 後即可得到該結構的初始位址

** 驗證：**




















#include<stdlib.h>
#define offsetof(TYPE, MEMBER)  ((size_t)&((TYPE *)0)->MEMBER)
#define container_of(ptr, type, member)                            \
__extension__({                                                \
    const __typeof__(((type *) 0)->member) *__pmember = (ptr); \
    (type *) ((char *) __pmember - offsetof(type, member));    \
})
struct test{
    int a;
    int b;
    int c;
};

int main(){
    struct test *ptr, t;
    size_t size1,size2;
    size1 = offsetof(struct test, b);
    size2 = offsetof(struct test, c);
    ptr = container_of(&(t.b), struct test, b);
}

(gdb) p ptr
$1 = (struct test *) 0x7fffffffde0c
(gdb) p &t
$2 = (struct test *) 0x7fffffffde0c

由 gdb 輸出結果可看出變數 t 的位址和用巨集 containner_of 取得的位址一樣

補充： container_of 還有一個值得探討的地方是它將參數 __parameter 強制轉換成 char* 型態再和 offset 做減法，原因是指標在做運算時會依據所指向型態的大小而有所不同，而 char 型態的大小剛好本來就是1，所以將指標強制轉換成 char pointer 再跟 size_t 型態的變數做運算就不會產生問題
驗證：
























#include<stdlib.h>
#define offsetof(TYPE, MEMBER)  ((size_t)&((TYPE *)0)->MEMBER)
#define container_of(ptr, type, member)                            \
__extension__({                                                \
    const __typeof__(((type *) 0)->member) *__pmember = (ptr); \
    (type *) ((char *) __pmember - offsetof(type, member));    \
})
struct test{
    int a;
    int b;
    int c;
};

int main(){
    struct test t, *ptr1, *ptr2;
    size_t size1;
    char *a1;
    int *a2;
    a1 = (char *) (&(t.b));
    a2 = (int * ) (&(t.b));
    size1 = offsetof(struct test, b);
    ptr1 = (struct test *) (a1 - size1);
    ptr2 = (struct test *) (a2 - size1);
}

(gdb) p a1
$5 = 0x7fffffffde10 ""
(gdb) p a2
$6 = (int *) 0x7fffffffde10
(gdb) p size1
$7 = 4
(gdb) p ptr1
$8 = (struct test *) 0x7fffffffde0c
(gdb) p ptr2
$9 = (struct test *) 0x7fffffffde00
(gdb) 
(gdb) p sizeof(int)
$10 = 4

a1 和 a2 是用不同型態的指標存取 &(t.b)，由 gdb 的結果可看出他們的數值原本是一樣的，但同樣對 size1 做減法後的 ptr1、ptr2 的值卻有所差異。從 $9 可以看出 ptr2 和原本的 a2 差了16，而16剛好是 sie1*4 的大小，驗證了 int 的大小是4個 Byte (~~依不同編譯器版本會不同~~）

參考 Jinyo的隨便寫寫

不是「依不同編譯器版本會不同」，而是 ABI (Application Binary Interface) 有關，參見 64-bit data models

儘量讀第一手材料。

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

jserv

除了你熟悉的 add 和 delete 操作，list.h 還定義一系列操作，為什麼呢？這些有什麼益處？

很多時候我們要操作得對象是整個 list 而不只是一個 entry ，如果多在標頭檔內多定義一些操作整個 list 的 function ，在實作的時候就可以使程式碼更簡潔、更有效率。在 list.h 標頭檔中最前面的註解其實也有寫到

/*
 * Simple doubly linked list implementation.
 *
 * Some of the internal functions ("__xxx") are useful when
 * manipulating whole lists rather than single entries, as
 * sometimes we already know the next/prev entries and we can
 * generate better code by using them directly rather than
 * using the generic single-entry routines.
 */

LIST_POISONING 這樣的設計有何意義？

先來找找哪裡用到了poison

static inline void list_del(struct list_head *entry)
{
    __list_del_entry(entry);
    entry->next = LIST_POISON1;
    entry->prev = LIST_POISON2;
}

觀察上面的程式碼可以發現 list 再做刪除的時候會把刪除的節點內的 next、prev 分別指向兩個 LIST_POISON，而 LIST_POISIN 是定義在一個 poison.h 的標頭檔中，所以我們再去看看 poison.h

/*
 * Architectures might want to move the poison pointer offset
 * into some well-recognized area such as 0xdead000000000000,
 * that is also not mappable by user-space exploits:
 */
#ifdef CONFIG_ILLEGAL_POINTER_VALUE
# define POISON_POINTER_DELTA _AC(CONFIG_ILLEGAL_POINTER_VALUE, UL)
#else
# define POISON_POINTER_DELTA 0
#endif

/*
 * These are non-NULL pointers that will result in page faults
 * under normal circumstances, used to verify that nobody uses
 * non-initialized list entries.
 */
#define LIST_POISON1  ((void *) 0x100 + POISON_POINTER_DELTA)
#define LIST_POISON2  ((void *) 0x200 + POISON_POINTER_DELTA)

這段程式碼的註解告訴我們 LIST_POISON1、LIST_POISON2 是兩個會造成 page faults 的 non-NULL pointers，因此可推論出 list_del 將 prev、next 指向 LIST_POISON 是防止已被刪除的節點被違規存取的狀況

至於為什麼不直接將 next、prev 指向 NULL 呢？
可以參考 what is the meaning of 0xdead000000000000
留言中有提到一句話："Using 0xdead000000000000 as the base value just makes it easier to distinguish an explicitly poisoned value from one that was initialized with zero or got overwritten with zeroes. "
可見使用 poison 是為了在 debug 時可以更快速、清楚的知道問題出在哪邊

linked list 採用環狀是基於哪些考量？
- 對比於非環狀的 list 需要一個指標指向 list 的開頭，當操作 list 時還要多考慮操作的 node 是否為 head 的特殊狀況，環狀 linked list 則不需要一個特別的指標紀錄 list head 的位置，這使得操作 list 的 function 更為簡潔，對於每個 function 也可以減少一個判斷是否為 head 的 if case
- 在需要多次重複尋訪 list 的狀況下，環狀 list 也能發揮其特點增加執行的效率，若為非環狀的 list 每次尋訪時都需要一個 if case 判斷 node->next 是否為 NULL，而環狀的 list 因為每個 node 都可以當作 head，尋訪時可以避免這個判斷。在作業系統中常常把正在執行的多個程式放在一個 list 中，尋訪到哪個程式就執行哪個程式，在這過程中就需要大量的重複尋訪
參考Circular Linked List
什麼情境會需要對 linked list 做排序呢？舉出真實世界的應用，最好在作業系統核心出現
什麼情境會需要需要找到第 k 大/小元素呢？又，你打算如何實作？

list_for_each_safe 和 list_for_each 的差異在哪？“safe” 在執行時期的影響為何？

若我們在 for 迴圈可能改變到 list 我們就需要用到有 safe 的版本，有 safe 的版本會在我們動作之前先紀錄 list 的下一個 element ，防止我們在操作到時不小心將 element 刪除或改變，倒置錯誤結果

/**
 * list_for_each_safe - iterate over a list safe against removal of list entry
 * @pos:	the &struct list_head to use as a loop cursor.
 * @n:		another &struct list_head to use as temporary storage
 * @head:	the head for your list.
 */    
#define list_for_each_safe(pos, n, head) \
    for (pos = (head)->next, n = pos->next; pos != (head); \
        pos = n, n = pos->next)

變數 n 在 pos 初始話的時候就先把 pos->next 紀錄起來，防止進入 for 迴圈之後 pos 被改變導致 pos->next 消失或錯誤

for_each 風格的開發方式對程式開發者的影響為何？
- 提示：對照其他程式語言，如 Perl 和 Python
程式註解裡頭大量存在 @ 符號，這有何意義？你能否應用在後續的程式開發呢？
- 在函式的註解中使用 @ 符號加在 parameter 前面，讓使用者可以一眼就分辨出註解中哪些指的是參數
tests/ 目錄底下的 unit test 的作用為何？就軟體工程來說的精神為何？
tests/ 目錄的 unit test 可如何持續精進和改善呢？

Merge Sort

參考lineagech同學 git hub 上的程式碼改寫
















void list_split(struct list_head *head,
                struct list_head *left,
                struct list_head *right)
{
    struct list_head *a = head->next;
    struct list_head *b = head->next;
    while(b != head && b->next != head)
    {
        a = a->next;
        b = b->next->next;
    }

    list_cut_position(left, head, a->prev);
    list_cut_position(right, head, (b == head ? head->prev : b));

}

函式 list_split 中用了你所不知道的C語言: linked list 和非連續記憶體操作的龜兔賽跑演算法，這個演算法的好處能夠讓我們用一個 while 迴圈就得到 mid 的位址，如果沒有用這樣的演算法，可能就得先用一個迴圈得到 list 的總長度，在用另一個迴圈取得 list 中間的位址

函式中的 list_cut_position 是定義在 list.h 內的一個 function，它的作用是擷取一個 list 的一段給另一個 list

/**
 * list_cut_position() - Move beginning of a list to another list
 * @head_to: pointer to the head of the list which receives nodes
 * @head_from: pointer to the head of the list
 * @node: pointer to the node in which defines the cutting point
 *
 * All entries from the beginning of the list @head_from to (including) the
 * @node is moved to @head_from.
 *
 * @head_to is replaced when @head_from is not empty. @node must be a real
 * list node from @head_from or the behavior is undefined.
 */
static inline void list_cut_position(struct list_head *head_to,
                                     struct list_head *head_from,
                                     struct list_head *node)
{
    struct list_head *head_from_first = head_from->next;

    if (list_empty(head_from))
        return;

    if (head_from == node) {
        INIT_LIST_HEAD(head_to);
        return;
    }

    head_from->next = node->next;
    head_from->next->prev = head_from;

    head_to->prev = node;
    node->next = head_to;
    head_to->next = head_from_first;
    head_to->next->prev = head_to;
}

函式中從擷取 head_from 的 node 的位置開始到最後一個 node 給 head_to

main function 的邏輯並不難，大概就是產生一個 random 的 array ，再用 for 迴圈將 array 裡的資料一個一個放入新建立的 testlist 當中， list 創建完成後分別拿去 quick_sort 和我們自己寫的 merge_sort 排序，最後在依依筆對結果

random_shuffle_array(values, (uint16_t) ARRAY_SIZE(values));

產生 random array 的函式定義在 common.h 中
[Todo] 解析此標頭檔

2019q1 Homework1 (list)

自我檢查事項

為何 Linux 採用 macro 來實作 linked list ? 一般的 function call 有何成本？

macro 是什麼

macro 要注意的事

macro v.s function

Linux 應用 linked list 在哪些場合？

GNU extension 的 typeof 有何作用

解釋以下巨集的原理

除了你熟悉的 add 和 delete 操作，list.h 還定義一系列操作，為什麼呢？這些有什麼益處？

LIST_POISONING 這樣的設計有何意義？

linked list 採用環狀是基於哪些考量？

什麼情境會需要對 linked list 做排序呢？舉出真實世界的應用，最好在作業系統核心出現

什麼情境會需要需要找到 第 k 大/小元素 呢？又，你打算如何實作？

list_for_each_safe 和 list_for_each 的差異在哪？“safe” 在執行時期的影響為何？

for_each 風格的開發方式對程式開發者的影響為何？

程式註解裡頭大量存在 @ 符號，這有何意義？你能否應用在後續的程式開發呢？

tests/ 目錄底下的 unit test 的作用為何？就軟體工程來說的精神為何？

tests/ 目錄的 unit test 可如何持續精進和改善呢？

Merge Sort

tags: Linux 核心設計 2019

Read more

Compiler Bug

紅黑樹

CASLab x ITRI Virtual Platform Demo

VLSI FINAL

什麼情境會需要需要找到第 k 大/小元素呢？又，你打算如何實作？

tags: `Linux 核心設計 2019`