2019q1 Homework1(list)

contributed by < wang0630 >

Q1: 為何 Linux 採用 macro 來實作 linked list？一般的 function call 有何成本？

Instead of using a function, using a macro has several advantages, including:

The program is slightly faster. Function calls often requires additional care, such as storing the old context in registers for later return, copying arguments and return values of a function, etc.
Note. this overhead can be dealed with inline keyword in C99 standard.
Macros accept most of types as their arguments. However, functions often only take particular types, making the latter is less generic than the former.

Q2: Linux 應用 linked list 在哪些場合？舉三個案例並附上對應程式碼，需要解說，以及揣摩對應的考量

A interesting thing pointed out by https://notes.shichao.io/lkd/ch6/
We tend to define a node like this:

struct mynode {
	int mydata;
	// some data
	struct mynode* prev;
	struct mynode* next;
}

but linux kernel defines linked list as follow:

struct mynode {
	int mydata;
	// some data
	struct list_head head;
}

struct list_head {
	struct list_head* prev;
	struct list_head* next;
}

linux version of linked list wraps all elements to one nice and simple data structure.

Usage:

task_struct
A iconic usage of linked list is task_struct which represents a task or a process. linux uses linked list to handle parent/child/sibling releationship among processes.
It is represneted as followed:

struct task_struct {
	// ...... 
	struct list_head children;
	struct list_head sibling;
	// ......
}

The relationship can be described as:
ex. process a has process b,c as children





a.children.next = &b.sibling;
b.sibling.next = &c.sibling;
c.sibling.next = &a.children;

// and prev field is linked backward

To iterate through the whole list of example above, we use the macro defined in linux/list.h

struct list_head *iterator, *starting = &a.children;
list_for_each(iterator, starting) {
	struct list_head* children_of_process_a = list_entry(iterator, struct task_struct, sibling);
	// do something with childern_of_process_a
}

Scheduler in Linux
Linux uses Completely Fair Scheduler as its scheduling algorithm. Instead of using strict rules to define the time quantum for each task, CFS assigns propotion of CPU time to each task. This propotion is based on the nice value assigned to each task:

static const int prio_to_weight[40] = {
 /* -20 */     88761,     71755,     56483,     46273,     36291,
 /* -15 */     29154,     23254,     18705,     14949,     11916,
 /* -10 */      9548,      7620,      6100,      4904,      3906,
 /*  -5 */      3121,      2501,      1991,      1586,      1277,
 /*   0 */      1024,       820,       655,       526,       423,
 /*   5 */       335,       272,       215,       172,       137,
 /*  10 */       110,        87,        70,        56,        45,
 /*  15 */        36,        29,        23,        18,        15,
};

How does CFS determine a nice value to a task?
A nice value is defined by the vruntime field in the struct sched_entity:

struct sched_entity {
	struct load_weight	load;		/* for load-balancing */
	struct rb_node		run_node;
	struct list_head	group_node;
	unsigned int		on_rq;

	u64			exec_start;
	u64			sum_exec_runtime;
	u64			vruntime;
	
	// addtional properties....
}

According to Operating system concept, ninth edition, sched_entity is the basic unit in linux CFS algorithm, and it is used to track how much time a task has been running by storing the virtual run time in vruntime field.

The virtual run time is associated to with a decay factor based on the priority of a task. lower-priority tasks have higher decay factor; hence, it's vruntime will grow and drop more rapidly than that of higher-priority tasks. For example, a normal task with nice value zero(default) has run for 200ms, and its vruntime will also be 200ms. However, if a lower-priority task runs for 200ms, its vruntime will be more than 200ms and vice versa.

For linked list usage, CFS keeps every running tasks in a red-black tree whose key is based on the value of vruntime. To select the next task to run, CFS simply chooses the task with the smallest vruntime, which will be the leftmost task in the RB-tree; Even more, since traversing through the whole RB tree takes complexity of O(logn), linux keeps a reference of the task with the smallest vruntime value in the field called rb_leftmost

linux keeps sched_entity inside a queue called cfs_rq, and use struct sched_entity __pick_next_entity to pick next entity to run:

static struct sched_entity *__pick_next_entity(struct cfs_rq *cfs_rq)
{
		// rb_leftmost is the cache mentioned above
    struct rb_node *left = cfs_rq->rb_leftmost;

    if (!left) // no task
        return NULL;
		
		// rb_entry is the macro container_of
		// return the pointer to the rb_leftmost
		// run_node is the rb tree linkage inside struct sched_entity
    return rb_entry(left, struct sched_entity, run_node);
}

Enqueueing a new task by using __enqueue_entity

static void __enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
    struct rb_node **link = &cfs_rq->tasks_timeline.rb_node;
    struct rb_node *parent = NULL;
    struct sched_entity *entry;
    // macro of se->vruntime - cfs_rq->runtime
    s64 key = entity_key(cfs_rq, se);
    // is the new node the leftmost node?
    int leftmost = 1;

    /*
     * Find the right place in the rbtree:
     */
    while (*link) {
        parent = *link;
        // get the pointer
        entry = rb_entry(parent, struct sched_entity, run_node);
        /*
         * We dont care about collisions. Nodes with
         * the same key stay together.
         */
        if (key < entity_key(cfs_rq, entry)) {
            link = &parent->rb_left;
        } else {
            link = &parent->rb_right;
            /*
             * if traversing ever goes to the right once,
             * it will never be the leftmost node
             * so the leftmost flag is cleared
             */
            leftmost = 0;
        }
    }

    /*
     * Maintain a cache of leftmost tree entries (it is frequently
     * used):
     */
    if (leftmost)
        cfs_rq->rb_leftmost = &se->run_node;

    rb_link_node(&se->run_node, parent, link);
    rb_insert_color(&se->run_node, &cfs_rq->tasks_timeline);
}