Try   HackMD

2019q1 Homework1 (list)

contributed by < ztex >

typeof (GNU extension)

Referring to a Type with typeof

[quote]

If you are writing a header file that must work when included in ISO C programs, write __typeof__ instead of typeof. See Alternate Keywords.

A typeof construct can be used anywhere a typedef name can be used. For example, you can use it in a declaration, in a cast, or inside of sizeof or typeof.

大意:
如果是ISO C programs 請用 __typeof__ 而不要用 typeof
詳情請見: Alternate Keywords

下面教了一些好用的用法:

Some more examples of the use of typeof:

This declares y with the type of what x points to.
typeof (*x) y;
This declares y as an array of such values.
typeof (*x) y[4];
This declares y as an array of pointers to characters:
typeof (typeof (char *)[4]) y;
It is equivalent to the following traditional C declaration:
char *y[4];
To see the meaning of the declaration using typeof, and why it might be a useful way to write, rewrite it with these macros:
#define pointer(T) typeof(T *) #define array(T, N) typeof(T [N])
Now the declaration can be rewritten this way:
array (pointer (char), 4) y;
Thus, array (pointer (char), 4) is the type of arrays of 4 pointers to char.

大意大概是 想宣告一個跟*x一樣型態的變數y嗎?這樣搞就行了

我以前真的不知道可以這樣用, 讀document真的好重要喔
Ztex

offsetof

http://man7.org/linux/man-pages/man3/offsetof.3.html

  • DESCRIPTION
    The macro offsetof() returns the offset of the field member from the
    start of the structure type.

    ​​​​​​ This macro is useful because the sizes of the fields that compose a
    ​​​​​​ structure can vary across implementations, and compilers may insert
    ​​​​​​ different numbers of padding bytes between fields.  Consequently, an
    ​​​​​​ element's offset is not necessarily given by the sum of the sizes of
    ​​​​​​ the previous elements.
    
    ​​​​​​ A compiler error will result if member is not aligned to a byte
    ​​​​​​ boundary (i.e., it is a bit field).
    
  • RETURN VALUE
    offsetof() returns the offset of the given member within the given
    type, in units of bytes.

如果member 不是 a bytpe boundary, compiler error
Ztex

__extension__

GCC uses the __extension__ attribute when using the -ansi flag to avoid warnings in headers with GCC extensions. This is mostly used in glibc with function declartions using long long

用來告訴gcc 不用warnings了(使用-ansi flag 情況下)
Ztex

container_of

#define container_of(ptr, type, member) \ __extension__({ \ const __typeof__(((type *) 0)->member) *__pmember = (ptr); \ (type *) ((char *) __pmember - offsetof(type, member)); \ })
  • 透過上面GNU教我們的技巧宣告一個((type *) 0)->member 型態的 pointer __pmember, 並且把 ptr assign 給這個pointer
  • 透過offsetof 計算出這個member對於目標address的偏移量
  • 再把__pmember 減掉這個偏移量就是目標address了

macro vs. function

https://www.geeksforgeeks.org/macros-vs-functions/

MACRO|FUNCTION
Macro is Preprocessed|Function is Compiled
No Type Checking is done in Macro|Type Checking is Done in Function
Using Macro increases the code length|Using Function keeps the code length unaffected
Use of macro can lead to side effect at later stages|Functions do not lead to any side effect in any case
Speed of Execution using Macro is Faster|Speed of Execution using Function is Slower
Before Compilation, macro name is replaced by macro value|During function call, transfer of control takes place
Macros are useful when small code is repeated many times|Functions are useful when large code is to be written
Macro does not check any Compile-Time Errors|Function checks Compile-Time Errors

linked list applications in Linux kernel

include/linux/fs.h (block_device)

struct block_device {
	dev_t			bd_dev;  /* not a kdev_t - it's a search key */
	int			bd_openers;
	struct inode *		bd_inode;	/* will die */
	struct super_block *	bd_super;
	struct mutex		bd_mutex;	/* open/close mutex */
	void *			bd_claiming;
	void *			bd_holder;
	int			bd_holders;
	bool			bd_write_holder;
#ifdef CONFIG_SYSFS
	struct list_head	bd_holder_disks;
#endif
	struct block_device *	bd_contains;
	unsigned		bd_block_size;
	u8			bd_partno;
	struct hd_struct *	bd_part;
	/* number of times partitions within this device have been opened. */
	unsigned		bd_part_count;
	int			bd_invalidated;
	struct gendisk *	bd_disk;
	struct request_queue *  bd_queue;
	struct backing_dev_info *bd_bdi;
	struct list_head	bd_list;
	/*
	 * Private data.  You must have bd_claim'ed the block_device
	 * to use this.  NOTE:  bd_claim allows an owner to claim
	 * the same device multiple times, the owner must take special
	 * care to not mess up bd_private for that case.
	 */
	unsigned long		bd_private;

	/* The counter of freeze processes */
	int			bd_fsfreeze_count;
	/* Mutex for freeze */
	struct mutex		bd_fsfreeze_mutex;
} __randomize_layout;

struct list_head bd_list:
https://elixir.bootlin.com/linux/v4.15/source/drivers/usb/gadget/udc/bdc/bdc.h#L301

/*
 * Each endpoint has a bdl(buffer descriptor list), bdl consists of 1 or more bd
 * table's chained to each other through a chain bd, every table has equal
 * number of bds. the software uses bdi(bd index) to refer to particular bd in
 * the list.
 */
struct bd_list {
	/* Array of bd table pointers*/
	struct bd_table **bd_table_array;
	/* How many tables chained to each other */
	int num_tabs;
	/* Max_bdi = num_tabs * num_bds_table - 1 */
	int max_bdi;
	/* current enq bdi from sw point of view */
	int eqp_bdi;
	/* current deq bdi from sw point of view */
	int hwd_bdi;
	/* numbers of bds per table */
	int num_bds_table;
};

block_device:
Experient:
Block Device Drivers
揣摩:
block_device 是一個描述塊裝備的結構, 他可以用來表示一個磁區或一個分區, 透過一系列的API進行I/O(上面的實驗)
而每個endpoint 都有一個 bd_list (buffer descriptor list)(結構如上) 含有一個以上的 bd_table, 軟體藉此找到特定的bd.
我覺得他的設計考量可能考慮到一個特定的block_device在進行操作時需要對其他block_device進行I/O, 透過這個設計可以輕易traverse.

我對於這部分揣摩十分沒有把握
Ztex

除了「舉燭」,試著將核心程式碼抽離出來,轉寫獨立的程式來測試,一如本例的示範。
只是「看了再幻想」,終究不符合科學精神

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
jserv

Copy that. Working on it.
Ztex

include/linux/fs.h (inode)

https://zh.wikipedia.org/wiki/Inode
https://blog.csdn.net/yf210yf/article/details/12138921

The inode (index node) is a data structure in a Unix-style file system that describes a file-system object such as a file or a directory. Each inode stores the attributes and disk block location(s) of the object's data.[1] File-system object attributes may include metadata (times of last change,[2] access, modification), as well as owner and permission data.

大意是在Unix-style 的檔案系統中的一種資料結構, 代表一個檔案或者目錄

/*
 * Keep mostly read-only and often accessed (especially for
 * the RCU path lookup and 'stat' data) fields at the beginning
 * of the 'struct inode'
 */
struct inode {
	umode_t			i_mode;
	unsigned short		i_opflags;
	kuid_t			i_uid;
	kgid_t			i_gid;
	unsigned int		i_flags;

#ifdef CONFIG_FS_POSIX_ACL
	struct posix_acl	*i_acl;
	struct posix_acl	*i_default_acl;
#endif

	const struct inode_operations	*i_op;
	struct super_block	*i_sb;
	struct address_space	*i_mapping;

#ifdef CONFIG_SECURITY
	void			*i_security;
#endif

	/* Stat data, not accessed from path walking */
	unsigned long		i_ino;
	/*
	 * Filesystems may only read i_nlink directly.  They shall use the
	 * following functions for modification:
	 *
	 *    (set|clear|inc|drop)_nlink
	 *    inode_(inc|dec)_link_count
	 */
	union {
		const unsigned int i_nlink;
		unsigned int __i_nlink;
	};
	dev_t			i_rdev;
	loff_t			i_size;
	struct timespec		i_atime;
	struct timespec		i_mtime;
	struct timespec		i_ctime;
	spinlock_t		i_lock;	/* i_blocks, i_bytes, maybe i_size */
	unsigned short          i_bytes;
	unsigned int		i_blkbits;
	enum rw_hint		i_write_hint;
	blkcnt_t		i_blocks;

#ifdef __NEED_I_SIZE_ORDERED
	seqcount_t		i_size_seqcount;
#endif

	/* Misc */
	unsigned long		i_state;
	struct rw_semaphore	i_rwsem;

	unsigned long		dirtied_when;	/* jiffies of first dirtying */
	unsigned long		dirtied_time_when;

	struct hlist_node	i_hash;
	struct list_head	i_io_list;	/* backing dev IO list */
#ifdef CONFIG_CGROUP_WRITEBACK
	struct bdi_writeback	*i_wb;		/* the associated cgroup wb */

	/* foreign inode detection, see wbc_detach_inode() */
	int			i_wb_frn_winner;
	u16			i_wb_frn_avg_time;
	u16			i_wb_frn_history;
#endif
	struct list_head	i_lru;		/* inode LRU list */
	struct list_head	i_sb_list;
	struct list_head	i_wb_list;	/* backing dev writeback list */
	union {
		struct hlist_head	i_dentry;
		struct rcu_head		i_rcu;
	};
	u64			i_version;
	atomic_t		i_count;
	atomic_t		i_dio_count;
	atomic_t		i_writecount;
#ifdef CONFIG_IMA
	atomic_t		i_readcount; /* struct files open RO */
#endif
	const struct file_operations	*i_fop;	/* former ->i_op->default_file_ops */
	struct file_lock_context	*i_flctx;
	struct address_space	i_data;
	struct list_head	i_devices;
	union {
		struct pipe_inode_info	*i_pipe;
		struct block_device	*i_bdev;
		struct cdev		*i_cdev;
		char			*i_link;
		unsigned		i_dir_seq;
	};

	__u32			i_generation;

#ifdef CONFIG_FSNOTIFY
	__u32			i_fsnotify_mask; /* all events this inode cares about */
	struct fsnotify_mark_connector __rcu	*i_fsnotify_marks;
#endif

#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
	struct fscrypt_info	*i_crypt_info;
#endif

	void			*i_private; /* fs or device private pointer */
} __randomize_layout;

甚麼是inode
POSIX inode description
The POSIX standard mandates file-system behavior that is strongly influenced by traditional UNIX file systems. An inode is denoted by the phrase "file serial number", defined as a per-file system unique identifier for a file.[8] That file serial number, together with the device ID of the device containing the file, uniquely identify the file within the whole system[9].

Within a POSIX system, a file has the following attributes[9] which may be retrieved by the stat system call:

  • Device ID (this identifies the device containing the file; that is, the scope of uniqueness of the serial number).
    *File serial numbers.
  • The file mode which determines the file type and how the file's owner, its group, and others can access the file.
  • A link count telling how many hard links point to the inode.
  • The User ID of the file's owner.
  • The Group ID of the file.
  • The device ID of the file if it is a device file.
  • The size of the file in bytes.
  • Timestamps telling when the inode itself was last modified (ctime, inode change time), the file content last modified (mtime, modification time), and last accessed (atime, access time).
  • The preferred I/O block size.
  • The number of blocks allocated to this file.

POSIX 規檔案系統行為, inode 的定義是 pre-file system unique identifier
在 POSIX 系統中 一個檔案有上述屬性

list_head 的應用

struct list_head i_list;
struct list_head i_sb_list;
struct list_head i_dentry;

揣摩: 在文件系統中所有inode 都在linked list裡面, i_list 是使用中狀態的linked list, i_sb_list是super block linked list, i_dentry 記錄了目錄上的連結.
當新增inode,會將i_list加在所有的已使用的inode linked list,然後加到super block,最後加到 i_dentry.
使用 list_head 的考量大概是因為新增刪除inodes 比較有效率.

include/linux/sched.h (task_struct)

struct task_struct {
...
/*
* Children/sibling form the list of natural children:
*/
	struct list_head		children;
	struct list_head		sibling;
...
}

太長了, 我就不整個寫出了.
Ztex

揣摩: task_struct 是linux kernel中描述process 的 structure.
children 代表子行程, sibling 代表 兄弟姊妹行程?
總之出現新的行程就會有個task_struct 新增進linked list.

關於這個部分我之前實作過一個小小的linux kernel module, 可以traverse一個process的所有childs, siblings, parents.
https://github.com/tony2037/simple-pstree

以下是其中一個function, children_to_end.

static void children_to_end(void *payload, struct task_struct *entry, int space){
    struct list_head *ptr;
    put_in_result(payload, entry->comm, task_pid_nr(entry), space);
    printk("Process name: %s", entry->comm);
    if(!list_empty(&entry->children)){
	printk("Have children");
        list_for_each(ptr, &(entry->children)){
            entry = list_entry(ptr, struct task_struct, sibling);
	    if(task_pid_nr(entry) == 0)
                return;
	    children_to_end(payload, entry, space+1);
    } 
    }
    return;
}

利用Linux庫的 list_empty, list_for_each, recursively traverse all children.


tags: 實作

Unit test

linux-list repo裡的unit tests, 使用定義在assert.h的function assert實作unit test.
unit test的好處之一就是在於以比人工測試更有效率地對單一模組進行測試.
比如我在實作merge_sort時, 必須先實作一個function合併兩個已排序的lists, (sortedqueue_splice). 此時為了驗證這個function的正確性, 我在/tests中寫了一個merge_sort.c來驗證這個funcion的正確性

#include <assert.h>
#include <stdio.h>

#include "queue.h"

queue_t *a;
queue_t *b;

int main(void)
{
    a = q_new();
    b = q_new();
    q_insert_tail(a, "a");
    q_insert_tail(a, "b");
    q_insert_tail(a, "c");
    q_insert_tail(b, "d");
    q_insert_tail(b, "e");
    q_insert_tail(b, "f");
    q_insert_tail(b, "g");
    queue_t *r;
    r = sortedqueue_splice(a, b);
    list_ele_t *ptr;
    ptr = r->head;
    int should_be = (int) *"a";
    while (ptr != NULL) {
        assert((int) (ptr->value[0]) == should_be);
        printf("%s\n", ptr->value);
        ptr = ptr->next;
        should_be++;
    }
}

總之就是兩條已排序的queue合併起來看一看結果正不正確. (q_new 等已經通過測試)

Makefile

TESTS = \
        merge_sort
$(TESTS): $(TESTS).c report.c console.c harness.c queue.o
        $(CC) $(CFLAGS) -I./ -o $@ $< report.c console.c harness.c queue.o

Merge sort

https://en.wikipedia.org/wiki/Merge_sort
了解merge sort的運作原理, 並且實作 project qtest 支援的merge sort.
為了達成實作了三個functions: sortedqueue_splice, merge_sort, q_split_half

/* * sortedlist_splice - given two sorted queue, merge them together. * @a: pointer to the first sorted queue * @b: pointer to the second sorted queue * * Return: pointer to queue that complete merging of two queues. */ queue_t *sortedqueue_splice(queue_t *a, queue_t *b); /* * merge_sort - given a queue, sort it with merge sortint algorithm * @q: pointer to queue to be sorted * Return: A sorted queue */ queue_t *merge_sort(queue_t *q); /* * q_split_half - split the given queue into two queues at the middle * @q: pointer to queue to be splited * @a: pointer to the front queue * @b: pointer to the later queue * Return: void */ void q_split_half(queue_t *q, queue_t *a, queue_t *b);

並且學習在 project linux-list 裡面的unit test, 在 tests directory 中新增三個對應的測試: merge_sort.c, q_split_half.c, sortedqueue_splice.c.
其中使用 assert函式確定測資通過測試.
eg. merge_sort.c:

#include <assert.h> #include <stdio.h> #include "queue.h" int main(void) { queue_t *q, *sorted_q; q = q_new(); q_insert_tail(q, "a"); q_insert_tail(q, "g"); q_insert_tail(q, "c"); q_insert_tail(q, "b"); q_insert_tail(q, "e"); q_insert_tail(q, "f"); q_insert_tail(q, "d"); list_ele_t *ptr; ptr = q->head; printf("q: ===============\n"); while (ptr != NULL) { printf("%s\n", ptr->value); ptr = ptr->next; } sorted_q = merge_sort(q); int should_be = (int) *"a"; ptr = sorted_q->head; printf("merge sort: ===============\n"); while (ptr != NULL) { printf("%s\n", ptr->value); assert(ptr->value[0] == should_be++); ptr = ptr->next; } }

並且在Makefile 中新增對應的rules

...
TESTS = \
	sortedqueue_splice \
	q_split_half \
	merge_sort

TESTS := $(addprefix tests/,$(TESTS))

...
$(TESTS): report.c console.c harness.c queue.o
	$(CC) $(CFLAGS) -I./ -o $@.test $@.c report.c console.c harness.c queue.o
...

因為方便就把所有dependencies 全部當作參數.
應該有更好的方法, 但還在思考

看個 merge_sort

/* * Using merge sorting algorithm to sort the given queue */ queue_t *merge_sort(queue_t *q) { queue_t *ptr; ptr = q; if (q == NULL) return NULL; if (q->size == 0 || q->size == 1) return q; queue_t *a, *b; a = q_new(); b = q_new(); q_split_half(ptr, a, b); a = merge_sort(a); b = merge_sort(b); return sortedqueue_splice(a, b); }

這個演算法主要分三個部分

  • 如果長度 == 1 or 0
    • return
  • 切成一半 (為了不需要判斷奇偶數, 當queue的長度為奇數時, 前面那段(a)會比較短)
  • sort (a) & sort(b)
  • 合併a, b
    演算法的邏輯不難, 但是把程式寫好超難. (Easy to make it done, hard to make it perfect)

其中我的程式就存在memory leak issue
以下是使用valgrind 分析的結果 (這東西真D好用)

==3300== LEAK SUMMARY: ==3300== definitely lost: 0 bytes in 0 blocks ==3300== indirectly lost: 0 bytes in 0 blocks ==3300== possibly lost: 0 bytes in 0 blocks ==3300== still reachable: 3,094 bytes in 61 blocks ==3300== suppressed: 0 bytes in 0 blocks ==3300== Rerun with --leak-check=full to see details of leaked memory ==3300== ==3300== For counts of detected and suppressed errors, rerun with: -v ==3300== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

這代表這隻糟糕的程式如果遇到長度很長的資料, memory leak issue 就會凸顯我的無能.

接下來想試試看使用phone_book的測資測試一下

以下是通過unit test的結果:

q: ===============
a
g
c
b
e
f
d
merge sort: ===============
a
b
c
d
e
f
g