2019q1 Homework1 (list)

contributed by < ztex >

`typeof` (GNU extension)

[quote]

If you are writing a header file that must work when included in ISO C programs, write __typeof__ instead of typeof. See Alternate Keywords.

A typeof construct can be used anywhere a typedef name can be used. For example, you can use it in a declaration, in a cast, or inside of sizeof or typeof.

大意：
如果是ISO C programs 請用 __typeof__ 而不要用 typeof
詳情請見: Alternate Keywords

下面教了一些好用的用法：

Some more examples of the use of typeof:

This declares y with the type of what x points to.


typeof (*x) y;

This declares y as an array of such values.


typeof (*x) y[4];

This declares y as an array of pointers to characters:


typeof (typeof (char *)[4]) y;

It is equivalent to the following traditional C declaration:


char *y[4];

To see the meaning of the declaration using typeof, and why it might be a useful way to write, rewrite it with these macros:


#define pointer(T)  typeof(T *)
#define array(T, N) typeof(T [N])

Now the declaration can be rewritten this way:


array (pointer (char), 4) y;

Thus, array (pointer (char), 4) is the type of arrays of 4 pointers to char.

大意大概是… 想宣告一個跟*x一樣型態的變數y嗎？這樣搞就行了

我以前真的不知道可以這樣用, 讀document真的好重要喔…
Ztex

`offsetof`

http://man7.org/linux/man-pages/man3/offsetof.3.html

DESCRIPTION
The macro offsetof() returns the offset of the field member from the
start of the structure type.

 This macro is useful because the sizes of the fields that compose a
 structure can vary across implementations, and compilers may insert
 different numbers of padding bytes between fields.  Consequently, an
 element's offset is not necessarily given by the sum of the sizes of
 the previous elements.

 A compiler error will result if member is not aligned to a byte
 boundary (i.e., it is a bit field).

RETURN VALUE
offsetof() returns the offset of the given member within the given
type, in units of bytes.

如果member 不是 a bytpe boundary, compiler error
Ztex

`extension`

GCC uses the __extension__ attribute when using the -ansi flag to avoid warnings in headers with GCC extensions. This is mostly used in glibc with function declartions using long long

用來告訴gcc 不用warnings了(使用-ansi flag 情況下)
Ztex

`container_of`





#define container_of(ptr, type, member)                            \
    __extension__({                                                \
        const __typeof__(((type *) 0)->member) *__pmember = (ptr); \
        (type *) ((char *) __pmember - offsetof(type, member));    \
    })

透過上面GNU教我們的技巧宣告一個((type *) 0)->member 型態的 pointer __pmember, 並且把 ptr assign 給這個pointer
透過offsetof 計算出這個member對於目標address的偏移量
再把__pmember 減掉這個偏移量就是目標address了

macro vs. function

https://www.geeksforgeeks.org/macros-vs-functions/

MACRO|FUNCTION
Macro is Preprocessed|Function is Compiled
No Type Checking is done in Macro|Type Checking is Done in Function
Using Macro increases the code length|Using Function keeps the code length unaffected
Use of macro can lead to side effect at later stages|Functions do not lead to any side effect in any case
Speed of Execution using Macro is Faster|Speed of Execution using Function is Slower
Before Compilation, macro name is replaced by macro value|During function call, transfer of control takes place
Macros are useful when small code is repeated many times|Functions are useful when large code is to be written
Macro does not check any Compile-Time Errors|Function checks Compile-Time Errors

linked list applications in Linux kernel

`include/linux/fs.h` (`block_device`)

struct block_device {
	dev_t			bd_dev;  /* not a kdev_t - it's a search key */
	int			bd_openers;
	struct inode *		bd_inode;	/* will die */
	struct super_block *	bd_super;
	struct mutex		bd_mutex;	/* open/close mutex */
	void *			bd_claiming;
	void *			bd_holder;
	int			bd_holders;
	bool			bd_write_holder;
#ifdef CONFIG_SYSFS
	struct list_head	bd_holder_disks;
#endif
	struct block_device *	bd_contains;
	unsigned		bd_block_size;
	u8			bd_partno;
	struct hd_struct *	bd_part;
	/* number of times partitions within this device have been opened. */
	unsigned		bd_part_count;
	int			bd_invalidated;
	struct gendisk *	bd_disk;
	struct request_queue *  bd_queue;
	struct backing_dev_info *bd_bdi;
	struct list_head	bd_list;
	/*
	 * Private data.  You must have bd_claim'ed the block_device
	 * to use this.  NOTE:  bd_claim allows an owner to claim
	 * the same device multiple times, the owner must take special
	 * care to not mess up bd_private for that case.
	 */
	unsigned long		bd_private;

	/* The counter of freeze processes */
	int			bd_fsfreeze_count;
	/* Mutex for freeze */
	struct mutex		bd_fsfreeze_mutex;
} __randomize_layout;

struct list_head bd_list:
https://elixir.bootlin.com/linux/v4.15/source/drivers/usb/gadget/udc/bdc/bdc.h#L301

/*
 * Each endpoint has a bdl(buffer descriptor list), bdl consists of 1 or more bd
 * table's chained to each other through a chain bd, every table has equal
 * number of bds. the software uses bdi(bd index) to refer to particular bd in
 * the list.
 */
struct bd_list {
	/* Array of bd table pointers*/
	struct bd_table **bd_table_array;
	/* How many tables chained to each other */
	int num_tabs;
	/* Max_bdi = num_tabs * num_bds_table - 1 */
	int max_bdi;
	/* current enq bdi from sw point of view */
	int eqp_bdi;
	/* current deq bdi from sw point of view */
	int hwd_bdi;
	/* numbers of bds per table */
	int num_bds_table;
};

block_device:
Experient:
Block Device Drivers
揣摩:
block_device 是一個描述塊裝備的結構, 他可以用來表示一個磁區或一個分區, 透過一系列的API進行I/O(上面的實驗)
而每個endpoint 都有一個 bd_list (buffer descriptor list)(結構如上) 含有一個以上的 bd_table, 軟體藉此找到特定的bd.
我覺得他的設計考量可能考慮到一個特定的block_device在進行操作時需要對其他block_device進行I/O, 透過這個設計可以輕易traverse.

我對於這部分揣摩十分沒有把握
Ztex

除了「舉燭」，試著將核心程式碼抽離出來，轉寫獨立的程式來測試，一如本例的示範。
只是「看了再幻想」，終究不符合科學精神
:notes: jserv

Copy that. Working on it.
Ztex

`include/linux/fs.h` (`inode`)

https://zh.wikipedia.org/wiki/Inode
https://blog.csdn.net/yf210yf/article/details/12138921

The inode (index node) is a data structure in a Unix-style file system that describes a file-system object such as a file or a directory. Each inode stores the attributes and disk block location(s) of the object's data.[1] File-system object attributes may include metadata (times of last change,[2] access, modification), as well as owner and permission data.

大意是在Unix-style 的檔案系統中的一種資料結構, 代表一個檔案或者目錄

/*
 * Keep mostly read-only and often accessed (especially for
 * the RCU path lookup and 'stat' data) fields at the beginning
 * of the 'struct inode'
 */
struct inode {
	umode_t			i_mode;
	unsigned short		i_opflags;
	kuid_t			i_uid;
	kgid_t			i_gid;
	unsigned int		i_flags;

#ifdef CONFIG_FS_POSIX_ACL
	struct posix_acl	*i_acl;
	struct posix_acl	*i_default_acl;
#endif

	const struct inode_operations	*i_op;
	struct super_block	*i_sb;
	struct address_space	*i_mapping;

#ifdef CONFIG_SECURITY
	void			*i_security;
#endif

	/* Stat data, not accessed from path walking */
	unsigned long		i_ino;
	/*
	 * Filesystems may only read i_nlink directly.  They shall use the
	 * following functions for modification:
	 *
	 *    (set|clear|inc|drop)_nlink
	 *    inode_(inc|dec)_link_count
	 */
	union {
		const unsigned int i_nlink;
		unsigned int __i_nlink;
	};
	dev_t			i_rdev;
	loff_t			i_size;
	struct timespec		i_atime;
	struct timespec		i_mtime;
	struct timespec		i_ctime;
	spinlock_t		i_lock;	/* i_blocks, i_bytes, maybe i_size */
	unsigned short          i_bytes;
	unsigned int		i_blkbits;
	enum rw_hint		i_write_hint;
	blkcnt_t		i_blocks;

#ifdef __NEED_I_SIZE_ORDERED
	seqcount_t		i_size_seqcount;
#endif

	/* Misc */
	unsigned long		i_state;
	struct rw_semaphore	i_rwsem;

	unsigned long		dirtied_when;	/* jiffies of first dirtying */
	unsigned long		dirtied_time_when;

	struct hlist_node	i_hash;
	struct list_head	i_io_list;	/* backing dev IO list */
#ifdef CONFIG_CGROUP_WRITEBACK
	struct bdi_writeback	*i_wb;		/* the associated cgroup wb */

	/* foreign inode detection, see wbc_detach_inode() */
	int			i_wb_frn_winner;
	u16			i_wb_frn_avg_time;
	u16			i_wb_frn_history;
#endif
	struct list_head	i_lru;		/* inode LRU list */
	struct list_head	i_sb_list;
	struct list_head	i_wb_list;	/* backing dev writeback list */
	union {
		struct hlist_head	i_dentry;
		struct rcu_head		i_rcu;
	};
	u64			i_version;
	atomic_t		i_count;
	atomic_t		i_dio_count;
	atomic_t		i_writecount;
#ifdef CONFIG_IMA
	atomic_t		i_readcount; /* struct files open RO */
#endif
	const struct file_operations	*i_fop;	/* former ->i_op->default_file_ops */
	struct file_lock_context	*i_flctx;
	struct address_space	i_data;
	struct list_head	i_devices;
	union {
		struct pipe_inode_info	*i_pipe;
		struct block_device	*i_bdev;
		struct cdev		*i_cdev;
		char			*i_link;
		unsigned		i_dir_seq;
	};

	__u32			i_generation;

#ifdef CONFIG_FSNOTIFY
	__u32			i_fsnotify_mask; /* all events this inode cares about */
	struct fsnotify_mark_connector __rcu	*i_fsnotify_marks;
#endif

#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
	struct fscrypt_info	*i_crypt_info;
#endif

	void			*i_private; /* fs or device private pointer */
} __randomize_layout;

甚麼是inode
POSIX inode description
The POSIX standard mandates file-system behavior that is strongly influenced by traditional UNIX file systems. An inode is denoted by the phrase "file serial number", defined as a per-file system unique identifier for a file.[8] That file serial number, together with the device ID of the device containing the file, uniquely identify the file within the whole system[9].

Within a POSIX system, a file has the following attributes[9] which may be retrieved by the stat system call:

Device ID (this identifies the device containing the file; that is, the scope of uniqueness of the serial number).
*File serial numbers.
The file mode which determines the file type and how the file's owner, its group, and others can access the file.
A link count telling how many hard links point to the inode.
The User ID of the file's owner.
The Group ID of the file.
The device ID of the file if it is a device file.
The size of the file in bytes.
Timestamps telling when the inode itself was last modified (ctime, inode change time), the file content last modified (mtime, modification time), and last accessed (atime, access time).
The preferred I/O block size.
The number of blocks allocated to this file.

POSIX 規檔案系統行為, inode 的定義是 pre-file system unique identifier
在 POSIX 系統中一個檔案有上述屬性

list_head 的應用

struct list_head i_list;
struct list_head i_sb_list;
struct list_head i_dentry;

揣摩: 在文件系統中所有inode 都在linked list裡面, i_list 是使用中狀態的linked list, i_sb_list是super block linked list, i_dentry 記錄了目錄上的連結.
當新增inode，會將i_list加在所有的已使用的inode linked list，然後加到super block，最後加到 i_dentry.
使用 list_head 的考量大概是因為新增刪除inodes 比較有效率.

`include/linux/sched.h` (`task_struct`)

struct task_struct {
...
/*
* Children/sibling form the list of natural children:
*/
	struct list_head		children;
	struct list_head		sibling;
...
}

太長了, 我就不整個寫出了.
Ztex

揣摩: task_struct 是linux kernel中描述process 的 structure.
children 代表子行程, sibling 代表… 兄弟姊妹行程?
總之出現新的行程就會有個task_struct 新增進linked list.

關於這個部分我之前實作過一個小小的linux kernel module, 可以traverse一個process的所有childs, siblings, parents.
https://github.com/tony2037/simple-pstree

以下是其中一個function, children_to_end.

static void children_to_end(void *payload, struct task_struct *entry, int space){
    struct list_head *ptr;
    put_in_result(payload, entry->comm, task_pid_nr(entry), space);
    printk("Process name: %s", entry->comm);
    if(!list_empty(&entry->children)){
	printk("Have children");
        list_for_each(ptr, &(entry->children)){
            entry = list_entry(ptr, struct task_struct, sibling);
	    if(task_pid_nr(entry) == 0)
                return;
	    children_to_end(payload, entry, space+1);
    } 
    }
    return;
}

利用Linux庫的 list_empty, list_for_each, recursively traverse all children.

tags: 實作

Unit test

在linux-list repo裡的unit tests, 使用定義在assert.h的function assert實作unit test.
unit test的好處之一就是在於以比人工測試更有效率地對單一模組進行測試.
比如我在實作merge_sort時, 必須先實作一個function合併兩個已排序的lists, (sortedqueue_splice). 此時為了驗證這個function的正確性, 我在/tests中寫了一個merge_sort.c來驗證這個funcion的正確性

#include <assert.h>
#include <stdio.h>

#include "queue.h"

queue_t *a;
queue_t *b;

int main(void)
{
    a = q_new();
    b = q_new();
    q_insert_tail(a, "a");
    q_insert_tail(a, "b");
    q_insert_tail(a, "c");
    q_insert_tail(b, "d");
    q_insert_tail(b, "e");
    q_insert_tail(b, "f");
    q_insert_tail(b, "g");
    queue_t *r;
    r = sortedqueue_splice(a, b);
    list_ele_t *ptr;
    ptr = r->head;
    int should_be = (int) *"a";
    while (ptr != NULL) {
        assert((int) (ptr->value[0]) == should_be);
        printf("%s\n", ptr->value);
        ptr = ptr->next;
        should_be++;
    }
}

總之就是兩條已排序的queue合併起來看一看結果正不正確. (q_new … 等已經通過測試)

Makefile

TESTS = \
        merge_sort
$(TESTS): $(TESTS).c report.c console.c harness.c queue.o
        $(CC) $(CFLAGS) -I./ -o $@ $< report.c console.c harness.c queue.o

Merge sort

https://en.wikipedia.org/wiki/Merge_sort
了解merge sort的運作原理, 並且實作 project qtest 支援的merge sort.
為了達成實作了三個functions: sortedqueue_splice, merge_sort, q_split_half
























/*
 * sortedlist_splice - given two sorted queue, merge them together.
 * @a: pointer to the first sorted queue
 * @b: pointer to the second sorted queue
 *
 * Return: pointer to queue that complete merging of two queues.
*/
queue_t *sortedqueue_splice(queue_t *a, queue_t *b);

/*
 * merge_sort - given a queue, sort it with merge sortint algorithm
 * @q: pointer to queue to be sorted
 * Return: A sorted queue
*/
queue_t *merge_sort(queue_t *q);

/*
 * q_split_half - split the given queue into two queues at the middle
 * @q: pointer to queue to be splited
 * @a: pointer to the front queue
 * @b: pointer to the later queue
 * Return: void
*/
void q_split_half(queue_t *q, queue_t *a, queue_t *b);

並且學習在 project linux-list 裡面的unit test, 在 tests directory 中新增三個對應的測試: merge_sort.c, q_split_half.c, sortedqueue_splice.c.
其中使用 assert函式確定測資通過測試.
eg. merge_sort.c:


































#include <assert.h>
#include <stdio.h>

#include "queue.h"


int main(void)
{
    queue_t *q, *sorted_q;
    q = q_new();
    q_insert_tail(q, "a");
    q_insert_tail(q, "g");
    q_insert_tail(q, "c");
    q_insert_tail(q, "b");
    q_insert_tail(q, "e");
    q_insert_tail(q, "f");
    q_insert_tail(q, "d");
    list_ele_t *ptr;
    ptr = q->head;
    printf("q: ===============\n");
    while (ptr != NULL) {
        printf("%s\n", ptr->value);
        ptr = ptr->next;
    }
    sorted_q = merge_sort(q);
    int should_be = (int) *"a";
    ptr = sorted_q->head;
    printf("merge sort: ===============\n");
    while (ptr != NULL) {
        printf("%s\n", ptr->value);
        assert(ptr->value[0] == should_be++);
        ptr = ptr->next;
    }
}

並且在Makefile 中新增對應的rules

...
TESTS = \
	sortedqueue_splice \
	q_split_half \
	merge_sort

TESTS := $(addprefix tests/,$(TESTS))

...
$(TESTS): report.c console.c harness.c queue.o
	$(CC) $(CFLAGS) -I./ -o $@.test $@.c report.c console.c harness.c queue.o
...

因為方便就把所有dependencies 全部當作參數.
應該有更好的方法, 但還在思考

看個 merge_sort



















/*
 * Using merge sorting algorithm to sort the given queue
*/
queue_t *merge_sort(queue_t *q)
{
    queue_t *ptr;
    ptr = q;
    if (q == NULL)
        return NULL;
    if (q->size == 0 || q->size == 1)
        return q;
    queue_t *a, *b;
    a = q_new();
    b = q_new();
    q_split_half(ptr, a, b);
    a = merge_sort(a);
    b = merge_sort(b);
    return sortedqueue_splice(a, b);
}

這個演算法主要分三個部分

如果長度 == 1 or 0
- return
切成一半 (為了不需要判斷奇偶數, 當queue的長度為奇數時, 前面那段(a)會比較短)
sort (a) & sort(b)
合併a, b
演算法的邏輯不難, 但是把程式寫好超難. (Easy to make it done, hard to make it perfect)

其中我的程式就存在memory leak issue
以下是使用valgrind 分析的結果 (這東西真D好用)










==3300== LEAK SUMMARY:
==3300==    definitely lost: 0 bytes in 0 blocks
==3300==    indirectly lost: 0 bytes in 0 blocks
==3300==      possibly lost: 0 bytes in 0 blocks
==3300==    still reachable: 3,094 bytes in 61 blocks
==3300==         suppressed: 0 bytes in 0 blocks
==3300== Rerun with --leak-check=full to see details of leaked memory
==3300==
==3300== For counts of detected and suppressed errors, rerun with: -v
==3300== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

這代表這隻糟糕的程式如果遇到長度很長的資料, memory leak issue 就會凸顯我的無能.

接下來想試試看使用phone_book的測資測試一下

以下是通過unit test的結果:

q: ===============
a
g
c
b
e
f
d
merge sort: ===============
a
b
c
d
e
f
g

2019q1 Homework1 (list)

typeof (GNU extension)

offsetof

__extension__

container_of