2019q1 Homework1 (list)

contributed by < johnnylord >

tags: `sysprog2019`

為何 Linux 採用 macro 來實作 linked list？一般的 function call 有何成本？

Linux kernel 中充斥著大量對於 linked list 的操作和資料結構。使用 macro 來實做 linked list，最大的考量點應該是效能的問題。一般的 function call 需要額外的成本來維護其 stack frame，例如參數的傳遞，返回地址，新舊的 stack pointer 等等。

以下實驗 function call 和 macro 在組合語言中的差別

Function based

#include <stdio.h>

int add(int a, int b)
{
    return a + b;
}

int main(void) {
    int a = 10, b  = 100;

    printf("%d\n", add(a, b));
    return 0;
}

$ gdb -q func
Reading symbols from func...done.
(gdb) b *main
Breakpoint 1 at 0x65e: file func.c, line 8.
(gdb) b *add
Breakpoint 2 at 0x64a: file func.c, line 4.
(gdb) r
Starting program: /home/johnnylord/Desktop/linux-list/func 

Breakpoint 1, main () at func.c:8
8	int main(void) {
(gdb) disassemble 
Dump of assembler code for function main:
=> 0x000055555555465e <+0>:	push   %rbp
   0x000055555555465f <+1>:	mov    %rsp,%rbp
   0x0000555555554662 <+4>:	sub    $0x10,%rsp
   0x0000555555554666 <+8>:	movl   $0xa,-0x8(%rbp)
   0x000055555555466d <+15>:	movl   $0x64,-0x4(%rbp)
   0x0000555555554674 <+22>:	mov    -0x4(%rbp),%edx
   0x0000555555554677 <+25>:	mov    -0x8(%rbp),%eax
   0x000055555555467a <+28>:	mov    %edx,%esi
   0x000055555555467c <+30>:	mov    %eax,%edi
   0x000055555555467e <+32>:	callq  0x55555555464a <add>
   0x0000555555554683 <+37>:	mov    %eax,%esi
   0x0000555555554685 <+39>:	lea    0x98(%rip),%rdi        # 0x555555554724
   0x000055555555468c <+46>:	mov    $0x0,%eax
   0x0000555555554691 <+51>:	callq  0x555555554520 <printf@plt>
   0x0000555555554696 <+56>:	mov    $0x0,%eax
   0x000055555555469b <+61>:	leaveq 
   0x000055555555469c <+62>:	retq   
End of assembler dump.
(gdb) c
Continuing.

Breakpoint 2, add (a=0, b=0) at func.c:4
4	{
(gdb) disassemble 
Dump of assembler code for function add:
=> 0x000055555555464a <+0>:	push   %rbp
   0x000055555555464b <+1>:	mov    %rsp,%rbp
   0x000055555555464e <+4>:	mov    %edi,-0x4(%rbp)
   0x0000555555554651 <+7>:	mov    %esi,-0x8(%rbp)
   0x0000555555554654 <+10>:	mov    -0x4(%rbp),%edx
   0x0000555555554657 <+13>:	mov    -0x8(%rbp),%eax
   0x000055555555465a <+16>:	add    %edx,%eax
   0x000055555555465c <+18>:	pop    %rbp
   0x000055555555465d <+19>:	retq   
End of assembler dump.

Macro based

#include <stdio.h>

#define ADD(a, b) a + b

int main()
{
    int a = 10, b = 100;

    printf("%d\n", ADD(a, b));
    return 0;
}

$ gdb -q macro
Reading symbols from macro...done.
(gdb) b *main
Breakpoint 1 at 0x64a: file macro.c, line 6.
(gdb) r
Starting program: /home/johnnylord/Desktop/linux-list/macro 

Breakpoint 1, main () at macro.c:6
6	{
(gdb) disassemble 
Dump of assembler code for function main:
=> 0x000055555555464a <+0>:	push   %rbp
   0x000055555555464b <+1>:	mov    %rsp,%rbp
   0x000055555555464e <+4>:	sub    $0x10,%rsp
   0x0000555555554652 <+8>:	movl   $0xa,-0x8(%rbp)
   0x0000555555554659 <+15>:	movl   $0x64,-0x4(%rbp)
   0x0000555555554660 <+22>:	mov    -0x8(%rbp),%edx
   0x0000555555554663 <+25>:	mov    -0x4(%rbp),%eax
   0x0000555555554666 <+28>:	add    %edx,%eax
   0x0000555555554668 <+30>:	mov    %eax,%esi
   0x000055555555466a <+32>:	lea    0xa3(%rip),%rdi        # 0x555555554714
   0x0000555555554671 <+39>:	mov    $0x0,%eax
   0x0000555555554676 <+44>:	callq  0x555555554520 <printf@plt>
   0x000055555555467b <+49>:	mov    $0x0,%eax
   0x0000555555554680 <+54>:	leaveq 
   0x0000555555554681 <+55>:	retq   
End of assembler dump.

其實觀看上面的組合語言，可以發現其實 main 的部份都差不多，差別就在 add function 的組合語言，每次呼叫 add 就得在額外做這些指令，分配新的 stack frame，和參數的傳遞。

=> 0x000055555555464a <+0>:	push   %rbp
   0x000055555555464b <+1>:	mov    %rsp,%rbp
   0x000055555555464e <+4>:	mov    %edi,-0x4(%rbp)
   0x0000555555554651 <+7>:	mov    %esi,-0x8(%rbp)

透過上面的觀察發現一個很有趣的點，傳遞到 add 的參數，其空間不一定會在當前的 stack frame 中。不過若是 add 裡面有再做其他宣告或操作，則 compiler 會更新 $rsp 的值，使得傳遞的參數在當前的 stack frame 中。

Linux 應用 linked list 在哪些場合？舉三個案例並附上對應程式碼，需要解說，以及揣摩對應的考量

virtual memory area

include/linux/mm_types.h

struct vm_area_struct {
	/* The first cache line has the info for VMA tree walking. */

	unsigned long vm_start;		/* Our start address within vm_mm. */
	unsigned long vm_end;		/* The first byte after our end address
					   within vm_mm. */

	/* linked list of VM areas per task, sorted by address */
	struct vm_area_struct *vm_next, *vm_prev;

	struct rb_node vm_rb;

	/* ... */
}

每個 process (struct task_struct) 都會有其自己的 memory descriptor(mm field in task_struct)。而 mm (struct mm_struct) 紀錄著所有屬於這個 process 的 VMA。VMA 全名為 virtual memory area，照其字面翻譯就是紀錄著這個 process 的 virtual memory 中的某些區域，如 data segment, text segment 等等。

fs/proc/task_mmu.c

static void
show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)
{
	struct mm_struct *mm = vma->vm_mm;
	struct file *file = vma->vm_file;
	vm_flags_t flags = vma->vm_flags;
	unsigned long ino = 0;
	unsigned long long pgoff = 0;
	unsigned long start, end;
	dev_t dev = 0;
	const char *name = NULL;

	if (file) {
		struct inode *inode = file_inode(vma->vm_file);
		dev = inode->i_sb->s_dev;
		ino = inode->i_ino;
		pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT;
	}

	start = vma->vm_start;
	end = vma->vm_end;
	show_vma_header_prefix(m, start, end, flags, pgoff, dev, ino);

	/* ... */
}

我們可以在 /proc/[pid]/maps 中觀看每個 process 的 memory map，此檔案中的每一行紀錄就對應著相對應的 VMA。

Mounted file system
Linux 允許我們並行的使用多個 file system，但是我們需要能夠追蹤當前系統中有多少個被掛載的裝置或 file system，所以在 kernel 中有個變數 super_blocks，會以 linked list 的方式串連所有個當前掛載的 file system (struct super_block)。

fs/super.c

static LIST_HEAD(super_blocks);

include/linux/fs.h

struct super_block {
	struct list_head	s_list;		/* Keep this first */
	dev_t			s_dev;		/* search index; _not_ kdev_t */
	unsigned char		s_blocksize_bits;
	unsigned long		s_blocksize;
	loff_t			s_maxbytes;	/* Max file size */
	struct file_system_type	*s_type;
    /* ... */
}

Inodes in file system
每個 file system 都有其需要管理的資料，inode 是在其 file system 底下的物件，如一般檔案，目錄，連結等等。雖然他是有 linked list 的方式做管理，但是在 VFS 中，有相關的 cache，如 inode cache, 和 dentry cache，做存取的加速。

include/linux/fs.h

struct super_block {
    /* ... */
    struct list_head	s_inodes;
    /* ... */
}

GNU extension 的 typeof 有何作用？在程式碼中扮演什麼角色？

GNU C 上說若要使用 typeof 的功能在 ISO C99 上，必須改為 __typeof__。
typeof 的用處讓開發者可以容易撰寫更通用的程式碼，不用 case by case 的撰寫相對應的程式碼，但是做相同的事情。

typeof 的參數為一個 expression 或是 type，傳化為此 expression 或 type 的 type。

以下是一個小測試

__typeof__(int) a = 10; // int a = 10;
__typeof__(&a) b = &a; // int *b = &a;

解釋以下巨集的原理





#define container_of(ptr, type, member)                            \
    __extension__({                                                \
        const __typeof__(((type *) 0)->member) *__pmember = (ptr); \
        (type *) ((char *) __pmember - offsetof(type, member));    \
    })

上術用到了很多 GCC extension。__extension__ 是用來告訴 Compiler 說我將會使用一些非原本 C 有的語法，忽略其警告。而 ({ ... }) 是 statement in expression 的語法，他是一個 expression，且回傳最後一項 statement 的值。

上述第四行，會計算出某個 type(通常是 struct 型別) 的物件的起始位置，藉由從已知此物件中的某個 member，減去到包含此 member 的物件的起始位置的位址的位移。

看巨集 offsetof 是如何算出偏移量。

#define offsetof(TYPE, MEMBER)	((size_t)&((TYPE *)0)->MEMBER)

(TYPE *)0 會指向 0x00 的位置。由於我們將指標做了轉型，所以其後接續 ->MEMBER 的操作是系統認得的，得到 MEMBER 的位址。由於我們的起始點是 0x00，所以 MEMBER 的位址，就相當於 MEMBER 到此 TYPE 起始位置的偏移量。

以下是使用上面兩個巨集的實驗

#include <stdio.h>

#define offsetof(TYPE, MEMBER) ((size_t) & ((TYPE *) 0)->MEMBER)

struct list_head {
    struct list_head *next, *prev;
};

struct A {
    char c[24];
    struct list_head list;
};

int main()
{
    printf("Offset from list elment's address to struct A's address: %d\n",
           (int) offsetof(struct A, list));
    return 0;
}

$ gcc -gdwarf -g3 -o offset offset.c
$ gdb offset

(gdb) break main
Breakpoint 1 at 0x64e: file test.c, line 16.
(gdb) r
Starting program: /home/johnnylord/Kernel/linux-list/examples/offset 

Breakpoint 1, main () at offset.c:14
16	    printf("Offset from list elment's address to struct A's address: %d\n",
(gdb) n
Offset from list elment's address to struct A's address: 24
17	    return 0;
(gdb) macro expand offsetof(struct A, list)
expands to: ((size_t)&((struct A*)0)->list)
(gdb) p (struct A*)0
$1 = (struct A *) 0x0
(gdb) p &((struct A*)0)->list
$2 = (struct list_head **) 0x18

`LIST_POISONING` 這樣的設計有何意義？

/**
 * list_del() - Remove a list node from the list
 * @node: pointer to the node
 *
 * The node is only removed from the list. Neither the memory of the removed
 * node nor the memory of the entry containing the node is free'd. The node
 * has to be handled like an uninitialized node. Accessing the next or prev
 * pointer of the node is not safe.
 *
 * Unlinked, initialized nodes are also uninitialized after list_del.
 *
 * LIST_POISONING can be enabled during build-time to provoke an invalid memory
 * access when the memory behind the next/prev pointer is used after a list_del.
 * This only works on systems which prohibit access to the predefined memory
 * addresses.
 */
static inline void list_del(struct list_head *node)
{
    struct list_head *next = node->next;
    struct list_head *prev = node->prev;

    next->prev = prev;
    prev->next = next;

#ifdef LIST_POISONING
    node->prev = (struct list_head *) (0x00100100);
    node->next = (struct list_head *) (0x00200200);
#endif
}

LIST_POISONING 的存在是為了除錯用的。當我們從 linked list 中刪除(delete)一個 node，此 node 只是脫離原本的 linked list（從上述程式碼可以得知並沒有做 free 的動作）並且將其指向 LIST_POISONING 這個位址。被刪除的 node 可能之後會被合併到其他的 linked list 或用作其他用途。
但是當開發者有意的想要完全刪除這個 node，但是忘記做 free 的動作時，這個 node 所指到的位置 LIST_POISONING 便發會了作用。因為當我們下次對這個 node 做操作時，如 node->next 或 node->pre，我們會觸發一個通知給系統，因為我們存取到系統預定義的除錯位址。

Linked list 採用環狀是基於哪些考量？

一個重要的事實是 linux 中 linked list 的開頭指標並不包含 linked list 中所串連的 node 該有的資料負載，意思是開頭的資料和 nodes 的資料並無相關，linked list 的操作只關心這些 nodes，開頭指標只是一個基準點。因此 linux 不會刪除或修改開頭的資料，這使得開頭指標成為程式邏輯判斷的可靠指示。

基於上述的事實，linux 採用環狀 linked list 時將得到大大的好處，它可以不用考慮 edge case，操作 linked list 並不因為操作位置的不同而有不同的動作。

static inline void list_del(struct list_head *node)
{
    struct list_head *next = node->next;
    struct list_head *prev = node->prev;

    next->prev = prev;
    prev->next = next;

#ifdef LIST_POISONING
    node->prev = (struct list_head *) (0x00100100);
    node->next = (struct list_head *) (0x00200200);
#endif
}

可以看到 list_del 並不會因為傳進去的 head 的位置不同，而有不同的操作。

`list_for_each_safe` 和 `list_for_each` 的差異在哪？“safe” 在執行時期的影響為何？

這兩個巨集都是用來走訪 linked list 用的，但是delete 這項操作只能在 list_for_each_safe 中.

/**
 * list_for_each_safe - iterate over list nodes and allow deletes
 * @node: list_head pointer used as iterator
 * @safe: list_head pointer used to store info for next entry in list
 * @head: pointer to the head of the list
 *
 * The current node (iterator) is allowed to be removed from the list. Any
 * other modifications to the the list will cause undefined behavior.
 */
#define list_for_each_safe(node, safe, head)                     \
    for (node = (head)->next, safe = node->next; node != (head); \
         node = safe, safe = node->next)

由於 list_for_each_safe 會預先儲存當前 node 的下一個 node(safe node)，所以即使在當回的迴圈中將 node 才 linked list 中移除，也可以繼續走訪到下一個節點（safe node)

for_each 風格的開發方式對程式開發者的影響為何？

它讓程式開發者能夠更專注的在程式邏輯上，隱藏了繁瑣的細節。這不僅讓程式的維護更容易，程式碼也更簡潔明瞭。

以下就是一個簡單的例子，用 list_for_each_safe 代替複雜的 for 迴圈設定

#define list_for_each_safe(node, safe, head)                     \
    for (node = (head)->next, safe = node->next; node != (head); \
         node = safe, safe = node->next)

程式註解裡頭大量存在 @ 符號，這有何意義？你能否應用在後續的程式開發呢？

大型多人的程式專案，程式碼需要有相對應的說明文件，例如函式的描述，參數的型態，回傳值等等。這些說明文件若能在開發過程中，透過特殊的註解語法，讓產生說明文件和程式開發同步進行會減去很多工作。
第三方軟題 Doxygen 就是一個專為程式專案產生說明文件的軟體，只要我們在程式碼中用 Doxygen 指定的特殊註解語法，便可以讓 Doxygen 為我們產生說明文件。例如 @param 用於描述函數或方法的參數信息。

`tests/` 目錄底下的 unit test 的作用為何？就軟體工程來說的精神為何？

tests/ 下的文件用於測試函式庫中提供的函數。例如，list_add.c用於測試 list.h 中的list_add() 函數。

軟體工程很重要的精神，其中一個就是希望盡可能減少錯誤，提供高質量的軟體。因此，測試是軟件工程中必不可少的一部分。