Chaper 6 Kernel Data Structures

# Chaper 6 Kernel Data Structures * Linked lists * Queues * Maps * Binary trees # The Linux Kernel's implementation ![](https://i.imgur.com/ENd9uI8.png) WRITE_ONCE ``` #define WRITE_ONCE(x, val) \ 295 ({ \ 296 union { typeof(x) __val; char __c[1]; } __u = \ 297 { .__val = (__force typeof(x)) (val) }; \ 298 __write_once_size(&(x), __u.__c, sizeof(x)); \ 299 __u.__val; \ 300 }) ``` https://stackoverflow.com/questions/34988277/write-once-in-linux-kernel-lists Memory barrier https://blog.csdn.net/zhangxiao93/article/details/42966279 volatile的實際應用: container_of and typeof在linux kernel的運用 ``` 840 #define container_of(ptr, type, member) ({ \ 841 const typeof( ((type *)0)->member ) *__mptr = (ptr); \ 842 (type *)( (char *)__mptr - offsetof(type,member) );}) ``` https://stackoverflow.com/questions/13723422/why-this-0-in-type0-member-in-c http://deltamaster.is-programmer.com/posts/37253.html ``` /** * container_of - cast a member of a structure out to the containing structure * @ptr: the pointer to the member. * @type: the type of the container struct this is embedded in. * @member: the name of the member within the struct. * */ #define container_of(ptr, type, member) ({ \ const typeof( ((type *)0)->member ) *__mptr = (ptr); \ (type *)( (char *)__mptr - offsetof(type,member) );}) ``` 比如內核的task_struct數據結構中有一個member是sched_entity類型的se，這個member常常被調度器使用來決定進程的調度順序，那麼如果要根據這個se來獲取包含它的task_struct，就可以使用container_of(p, task_struct, se)來實現（假設p是指向這個sched_entity的指針）。原理是先產生一個指針指向member，然後將這個指針減去member在這個struct中的偏移量，指針自然就指向了包含該member的對象了 # 重點: * always reuse existing kernel infrastructure * don’t reinvent the wheel. https://hengxiuxu.blogspot.com/2015/09/system-call-linux-kernel-ubuntu-1204lts.html # Chapter 6 Kernel Data Structures ## Linked Lists The **Linux kernel’s linked list** is fundamentally a **circular doubly linked list**. ### The Linux Kernel’s Implementation ``` struct fox { unsigned long tail_length; /* length in centimeters of tail */ unsigned long weight; /* weight in kilograms */ bool is_fantastic; /* is this fox fantastic? */ struct fox *next; /* next fox in linked list */ struct fox *prev; /* previous fox in linked list */ }; ``` Instead of turning the structure into a linked list, the Linux approach is to embed a linked list node in the structure! #### The Linked List Structure ``` struct list_head { struct list_head *next struct list_head *prev; }; struct fox { unsigned long tail_length; /* length in centimeters of tail */ unsigned long weight; /* weight in kilograms */ bool is_fantastic; /* is this fox fantastic? */ struct list_head list; /* list of all fox structures */ }; ``` ``` #define container_of(ptr, type, member) ({ \ const typeof( ((type *)0)->member ) *__mptr = (ptr); \ (type *)( (char *)__mptr - offsetof(type,member) );}) #define list_entry(ptr, type, member) \ container_of(ptr, type, member) ``` #### Defining a Linked List runtime: ``` struct fox *red_fox; red_fox = kmalloc(sizeof(*red_fox), GFP_KERNEL); red_fox->tail_length = 40; red_fox->weight = 6; red_fox->is_fantastic = false; INIT_LIST_HEAD(&red_fox->list); ``` compile time: ``` struct fox red_fox = { .tail_length = 40, .weight = 6, .list = LIST_HEAD_INIT(red_fox.list), }; ``` ##### List Heads ``` static LIST_HEAD(fox_list); ``` This defines and initializes a list_head named fox_list. ### Manipulating Linked Lists #### Adding a Node to a Linked List ``` list_add(struct list_head *new, struct list_head *head) list_add_tail(struct list_head *new, struct list_head *head) ex: list_add(&f->list, &fox_list); ``` #### Deleting a Node from a Linked List ``` list_del(struct list_head *entry) ``` For example, to delete the fox node we previous added to fox_list: ``` list_del(&f->list); ``` #### Moving and Splicing Linked List Nodes ``` list_move(struct list_head *list, struct list_head *head) list_move_tail(struct list_head *list, struct list_head *head) list_empty(struct list_head *head) list_splice(struct list_head *list, struct list_head *head) list_splice_init(struct list_head *list, struct list_head *head) ``` ### Traversing Linked Lists #### The Basic Approach ``` struct list_head *p; struct fox *f; list_for_each(p, &fox_list) { /* f points to the structure in which the list is embedded */ f = list_entry(p, struct fox, list); } ``` #### The Usable Approach ``` list_for_each_entry(pos, head, member) struct fox *f; list_for_each_entry(f, &fox_list, list) { /* on each iteration, ‘f’ points to the next fox structure ... */ } ``` #### Iterating Through a List Backward ``` list_for_each_entry_reverse(pos, head, member) ``` #### Iterating While Removing ``` list_for_each_entry_safe(pos, next, head, member) ``` ``` void inotify_inode_is_dead(struct inode *inode) { struct inotify_watch *watch, *next; mutex_lock(&inode->inotify_mutex); list_for_each_entry_safe(watch, next, &inode->inotify_watches, i_list) { struct inotify_handle *ih = watch->ih; mutex_lock(&ih->mutex); inotify_remove_watch_locked(ih, watch); /* deletes watch */ mutex_unlock(&ih->mutex); } mutex_unlock(&inode->inotify_mutex); } ``` ``` list_for_each_entry_safe_reverse(pos, n, head, member) ``` **You May Still Need Locking!** The **list_for_each_entry_safe()** protect you only from removals from the list within the body of the loop. If there is a chance of concurrent removals from other code—or any other form of concurrent list manipulation—you need to properly lock access to the list. ## Queues ![](https://i.imgur.com/1K6gRfr.png) The Linux kernel’s generic queue implementation is called kfifo . ### kfifo Linux’s kfifo works like most other queue abstractions, * providing two primary operations: enqueue (unfortunately named in) and dequeue (out). The kfifo object maintains two offsets into the queue: an in offset and an out offset. * The in offset is the location in the queue to which the next enqueue will occur. * The out offset is the location in the queue from which the next dequeue will occur. * The out offset is always less than or equal to the in offset. * The enqueue (in) operation copies data into the queue, starting at the in offset.When complete, the in offset is incremented by the amount of data enqueued. * The dequeue (out) operation copies data out of the queue, starting from the out offset.When complete, the out offset is incremented by the amount of data enqueued. * When the out offset is equal to the in offset, the queue is empty: * When the in offset is equal to the length of the queue, no more data can be enqueued until the queue is reset. ### Creating a Queue ``` /** * kfifo_alloc - dynamically allocates a new fifo buffer * @fifo: pointer to the fifo * @size: the number of elements in the fifo, this must be a power of 2 * @gfp_mask: get_free_pages mask, passed to kmalloc() * * This macro dynamically allocates a new fifo buffer. * * The numer of elements will be rounded-up to a power of 2. * The fifo will be release with kfifo_free(). * Return 0 if no error, otherwise an error code. */ int kfifo_alloc(struct kfifo *fifo, unsigned int size, gfp_t gfp_mask); ``` If you want to allocate the buffer yourself, you can: ``` void kfifo_init(struct kfifo *fifo, void *buffer, unsigned int size); ``` ### Enqueuing Data ``` unsigned int kfifo_in(struct kfifo *fifo, const void *from, unsigned int len); ``` ### Dequeuing Data ``` unsigned int kfifo_out(struct kfifo *fifo, void *to, unsigned int len); unsigned int kfifo_out_peek(struct kfifo *fifo, void *to, unsigned int len, unsigned offset); ``` ### Obtaining the Size of a Queue ``` static inline unsigned int kfifo_size(struct kfifo *fifo); static inline unsigned int kfifo_len(struct kfifo *fifo); static inline unsigned int kfifo_avail(struct kfifo *fifo); static inline int kfifo_is_empty(struct kfifo *fifo); static inline int kfifo_is_full(struct kfifo *fifo); ``` ### Resetting and Destroying the Queue ``` static inline void kfifo_reset(struct kfifo *fifo); void kfifo_free(struct kfifo *fifo); ``` ### Example Queue Usage na ## Maps A map, also known as an associative array, is a collection of unique keys, where each key is associated with a specific value. Maps support at least three operations: * Add (key, value) * Remove (key) * value = Lookup (key) The idr data structure is used for mapping user-space UIDs, such as inotify watch descriptors or POSIX timer IDs, to their associated kernel data structure, such as the inotify_watch or k_itimer structures, respectively. Following the Linux kernel’s scheme of obfuscated, confusing names, this map is called idr. ### Initializing an idr ``` void idr_init(struct idr *idp); ``` ### Allocating a New UID Once you have an idr set up, you can allocate a new UID, which is a two-step process. The first function, to resize the backing tree ``` int idr_pre_get(struct idr *idp, gfp_t gfp_mask); ``` The second function, to actually obtain a new UID and add it to the idr, is ``` int idr_get_new(struct idr *idp, void *ptr, int *id); ``` Let’s look at a full example: ``` int id; do { if (!idr_pre_get(&idr_huh, GFP_KERNEL)) return -ENOSPC; ret = idr_get_new(&idr_huh, ptr, &id); } while (ret == -EAGAIN); ``` ### Looking Up a UID ``` void *idr_find(struct idr *idp, int id); ``` Usage is simple: ``` struct my_struct *ptr = idr_find(&idr_huh, id); if (!ptr) return -EINVAL; /* error */ ``` ### Removing a UID ``` void idr_remove(struct idr *idp, int id); ``` ### Destroying an idr A successful call to idr_destroy() deallocates only unused memory associated with the idr pointed at by idp. It does not free any memory currently in use by allocated UIDs. ``` void idr_destroy(struct idr *idp); ``` To force the removal of all UIDs, you can call idr_remove_all(): ``` void idr_remove_all(struct idr *idp); ``` You would call idr_remove_all() on the idr pointed at by idp before calling idr_destroy(), ensuring that all idr memory was freed ## Binary Trees ![](https://i.imgur.com/vuR4Ykl.png) ### Binary Search Trees * The left subtree of the root contains only nodes with values less than the root. * The right subtree of the root contains only nodes with values greater than the root. * All subtrees are also binary search trees ![](https://i.imgur.com/Dkc39cT.png) ### Self-Balancing Binary Search Trees ![](https://i.imgur.com/eNrfqnM.png) #### Red-Black Trees na ##### rbtrees The Linux implementation of red-black trees is called rbtrees. To create a new tree, we allocate a new rb_root and initialize it to the special value RB_ROOT: ``` struct rb_root root = RB_ROOT; ``` Individual nodes in an rbtree are represented by the ***rb_node*** structure. The rbtree implementation does not provide search and insert routines. Search and insert require each user to do so manually, using provided rbtree helper functions but their own comparison operators. Search Example: ``` struct page * rb_search_page_cache(struct inode *inode, unsigned long offset) { struct rb_node *n = inode->i_rb_page_cache.rb_node; while (n) { struct page *page = rb_entry(n, struct page, rb_page_cache); if (offset < page->offset) n = n->rb_left; else if (offset > page->offset) n = n->rb_right; else return page; } return NULL; } ``` Insert Example: ``` struct page * rb_insert_page_cache(struct inode *inode, unsigned long offset, struct rb_node *node) { struct rb_node **p = &inode->i_rb_page_cache.rb_node; struct rb_node *parent = NULL; struct page *page; while (*p) { parent = *p; page = rb_entry(parent, struct page, rb_page_cache); if (offset < page->offset) p = &(*p)->rb_left; else if (offset > page->offset) p = &(*p)->rb_right; else return page; } rb_link_node(node, parent, p); rb_insert_color(node, &inode->i_rb_page_cache); return NULL; } ``` The function returns NULL if the page was added to the page cache and the address of an existing page structure if the page is already in the cache. ### What Data Structure to Use, When na ## Algorithmic Complexity na ### Algorithms na ### Big-O Notation ### Big Theta Notation ### Time Complexity ## Conclusion