# 2021q1 Homework2 (quiz2) contributed by < `AmyLin0210` > ###### tags: `2021q1 Linux` > [2021q1 第二周測驗題](https://hackmd.io/@sysprog/linux2021-quiz2) --- ## 測驗一 ### 解釋上述程式碼運作原理 ```cpp= #define container_of(ptr, type, member) \ __extension__({ \ const __typeof__(((type *) 0)->member) *__pmember = (ptr); \ (type *) ((char *) __pmember - offsetof(type, member)); \ }) ``` 在 container_of 中包含以下部份: - `__extension__` 的功用是在使用 GCC extension `__typeof__` 時,不會跳出警告。 - `((type *)0)->member` 將 0 這個位置強制轉型為 `type *` - 取出當中的 member,並使用 `__typeof__` 得到 member 的型態,產生一個 pointer 給該型別的 `__pmember` - 將 `ptr` assign 給 `__pmember` - `offsetof(type, member)` 可以算出 `member` 在 `type` 這個 struct 中的位移量。 - 將絕對位置 `(char*)__pmember` 減去 `offsetof(type, member)` ,可以得到 struct 的起始位置。 - 最後 `(type*)` 再將起始位置轉行為 pointer to `type` 在這當中讓我最不解的是 `((type *)0)->member` 這個操作, 參考 [toastcheng](https://hackmd.io/@toastcheng/w2-quiz2?fbclid=IwAR2crB0c4FZwLpkjQ2sVXEWasg8IXXz3rn3sE5DopzAbBpA90iuV65-pzOo) 同學的作業與實驗,自己也試了一下程式碼: 定義出了一個 struct ,如下: ``` struct test { int num; }; ``` - typeof-null.c ```cpp= #include <stdio.h> int main () { const __typeof__(((struct test *)NULL)->num) a = 1; printf("a = %d\n", a); // output: a = 1 return 0; } ``` - typeof-2048.c ```cpp= #include <stdio.h> int main () { const __typeof__(((struct test *)2048)->num) a = 1; printf("a = %d\n", a); // output: a = 1 return 0; } ``` - typeof-char.c ```cpp= #include <stdio.h> int main () { char i; const __typeof__(((struct test *)i)->num) a = 1; printf("a = %d\n", a); // output: a = 1 return 0; } ``` 使用 gcc 產生組合語言 ``` $ gcc typeof-null.c typeof-char.c typeof-null.c -S -O0 ``` 使用 diff 來比較產生的組合語言 ``` $ diff typeof-char.s typeof-null.s < .file "typeof-char.c" --- > .file "typeof-null.c" ``` ``` $ diff typeof-2048.s typeof-null.s < .file "typeof-2048.c" --- > .file "typeof-null.c" ``` 發現除了 file name 以外其餘都相同,這個讓我感到意外。 `typeof-2048` 和 `typeof-null` 相同,我想可以解釋為都是對兩個位置強制轉型,所以會產生兩個同樣的組合語言。 但原先我預期 `typeof-char` 會有些不同的行為,因為已經宣告出一個 char 的變數 `i`,所以組合語言中會多出對 `i` 做處理的部份,然而結果與我想像中的不同。 如同 [toastcheng](https://hackmd.io/@toastcheng/w2-quiz2?fbclid=IwAR2crB0c4FZwLpkjQ2sVXEWasg8IXXz3rn3sE5DopzAbBpA90iuV65-pzOo) 同學的猜測,我想在 typeof 時並不會對當中的指標或參數做 dereference。 但除此之外,強制轉型如果在沒有取當中的數值做處理的情況下,是不是也不會做 dereference? 我測試了一下以下程式碼,當中的指標 `i` 皆指向 NULL,只有型態不同。 char.c ```cpp= #include <stdio.h> int main () { char *i = NULL; const __typeof__(((struct test *)i)->num) a = 1; printf("a = %d\n", a); // output: a = 1 return 0; } ``` int.c ```cpp= #include <stdio.h> int main () { int *i = NULL; const __typeof__(((struct test *)i)->num) a = 1; printf("a = %d\n", a); // output: a = 1 return 0; } ``` 上述的程式碼在轉換成組合語言後使用 diff 做比較, 發現不管原參數的型態是否不同,結果仍然相同。 ``` $ gcc int.c char.c -S -O0 $ diff int.s char.s < .file "int.c" --- > .file "char.c" ``` ---- 在這裡使用了 [Designated Initializers](https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html#:~:targetText=6.29%20Designated%20Initializers,array%20or%20structure%20being%20initialized.&targetText=To%20initialize%20a%20range%20of,This%20is%20a%20GNU%20extension.),使得 `head` 內的 `prev` 與 `next` 都被初始化為自己本身的位置。 ```cpp struct list_head { struct list_head *prev, *next; }; #define LIST_HEAD(head) struct list_head head = {&(head), &(head)} ``` #### INIT_LIST_HEAD 初始化 `head` ```cpp static inline void INIT_LIST_HEAD(struct list_head *head) { head->next = head; head->prev = head; } ``` ```graphviz digraph INIT_LIST_HEAD { nodesep=0.5 rankdir="TB" node[shape=record] head [ label = "<p> prev | <n> next", xlabel = "head" style="filled" fillcolor="lightblue" ] head:p -> head head:n -> head } ``` #### list_add_tail 由於這是一個 circuler doubly linked list ,所以使用 `head->prev` 找到 linked list 的尾部,再將 `node` 接上。 ```cpp static inline void list_add_tail(struct list_head *node, struct list_head *head) { struct list_head *prev = head->prev; prev->next = node; node->next = head; node->prev = prev; head->prev = node; } ``` ```graphviz digraph list_add_tail { nodesep=0.5 rankdir="LR" node[shape=record] prev [ label = "<p> prev | <n> next" xlabel = "prev" style="filled" fillcolor="lightblue" ] _node [ label = "<p> prev | <n> next", xlabel = "node" style="filled" fillcolor="lightblue" ] head [ label = "<p> prev | <n> next", xlabel = "head" style="filled" fillcolor="lightblue" ] prev:n->_node _node:n->head _node:p->prev head:p->_node } ``` #### list_del 將 `node` 移出這個 linked list ```cpp static inline void list_del(struct list_head *node) { struct list_head *next = node->next, *prev = node->prev; next->prev = prev; prev->next = next; } ``` ```graphviz digraph list_add_tail { nodesep=0.5 rankdir="LR" node[shape=record] prev [ label = "<p> prev | <n> next" xlabel = "prev" style="filled" fillcolor="lightblue" ] _node [ label = "<p> prev | <n> next", xlabel = "node" style="filled" fillcolor="lightblue" ] next [ label = "<p> prev | <n> next", xlabel = "next" style="filled" fillcolor="lightblue" ] prev:n->next next:p->prev } ``` #### list_empty 檢查 `head` 是否指回 `head` 自己,如果沒有指向其他 node ,便表示這個 linked list 為空。 ```cpp static inline int list_empty(const struct list_head *head) { return (head->next == head); } ``` ```graphviz digraph list_add_tail { nodesep=0.5 rankdir="LR" node[shape=record] head [ label = "<p> prev | <n> next" xlabel = "head" style="filled" fillcolor="lightblue" ] head:n->head } ``` #### list_is_singular 檢查是否 `prev` 與 `next` 都指向同一個位置,若是相同便表示只有一個 node ```cpp static inline int list_is_singular(const struct list_head *head) { return (!list_empty(head) && head->prev == head->next); } ``` ```graphviz digraph list_add_tail { nodesep=0.5 rankdir="LR" node[shape=record] head [ label = "<p> prev | <n> next" xlabel = "head" style="filled" fillcolor="lightblue" ] head:n->tmp head:p->tmp } ``` #### list_splice_tail 將 `list` 整串 linked list 連接至 `head` linked list 的最後面,形成一個新的環狀結構 ```cpp static inline void list_splice_tail(struct list_head *list, struct list_head *head) { struct list_head *head_last = head->prev; struct list_head *list_first = list->next, *list_last = list->prev; if (list_empty(list)) return; head->prev = list_last; list_last->next = head; list_first->prev = head_last; head_last->next = list_first; } ``` ```graphviz digraph list_add_tail { nodesep=0.5 rankdir="LR" node[shape=record] head [ label = "<p> prev | <n> next" xlabel = "head" style="filled" fillcolor="lightblue" ] head_tmp [ label = "<p> prev | <n> next" xlabel = "head_tmp" ] head_last [ label = "<p> prev | <n> next" xlabel = "head_last" style="filled" fillcolor="lightblue" ] list_first [ label = "<p> prev | <n> next" xlabel = "list_first" style="filled" fillcolor="lightblue" ] list_tmp [ label = "<p> prev | <n> next" xlabel = "list_tmp" ] list_last [ label = "<p> prev | <n> next" xlabel = "list_last" style="filled" fillcolor="lightblue" ] list [ label = "<p> prev | <n> next" xlabel = "list" style="filled" fillcolor="lightblue" ] head:p->list_last list_last:n->head list_first:p->head_last head_last:n->list_first head_last:p->head_tmp list:n->list_first list:p->list_last list_first:n->list_tmp list_tmp:n->list_last list_tmp:p->list_first head:n->head_tmp head_tmp:p->head head_tmp:n->head_last } ``` #### list_cut_position 將 `head_from` 後面到 `node` 前面的 linked list 裁下,連接至 `head_to` 後 ```cpp static inline void list_cut_position(struct list_head *head_to, struct list_head *head_from, struct list_head *node) { struct list_head *head_from_first = head_from->next; if (list_empty(head_from)) return; if (head_from == node) { INIT_LIST_HEAD(head_to); return; } head_from->next = node->next; head_from->next->prev = head_from; head_to->prev = node; node->next = head_to; head_to->next = head_from_first; head_to->next->prev = head_to; } ``` 未切割 ```graphviz digraph list_add_tail { nodesep=0.5 rankdir="LR" node[shape=record] head_from [ label = "<p> prev | <n> next" xlabel = "h0 | head_from" style="filled" fillcolor="lightblue" ] head_from_first [ label = "<p> prev | <n> next" xlabel = "h1 | head_from_first" style="filled" fillcolor="lightblue" ] _node [ label = "<p> prev | <n> next" xlabel = "h2 | node" style="filled" fillcolor="pink" ] h3 [ label = "<p> prev | <n> next" xlabel = "h3" style="filled" fillcolor="lightblue" ] h4 [ label = "<p> prev | <n> next" xlabel = "h4" style="filled" fillcolor="lightblue" ] head_to [ label = "<p> prev | <n> next" xlabel = "head_to" style="filled" fillcolor="lightblue" ] head_from:n->head_from_first head_from_first:n->_node _node:n->h3 h3:n->h4 h4:n->head_from h4:p->h3 h3:p->_node _node:p->head_from_first head_from_first:p->head_from head_from:p->h4 } ``` 切割後 ```graphviz digraph list_add_tail { nodesep=0.5 rankdir="LR" node[shape=record] head_from [ label = "<p> prev | <n> next" xlabel = "h0 | head_from" style="filled" fillcolor="lightblue" ] h3 [ label = "<p> prev | <n> next" xlabel = "h3" style="filled" fillcolor="lightblue" ] h4 [ label = "<p> prev | <n> next" xlabel = "h4" style="filled" fillcolor="lightblue" ] head_to [ label = "<p> prev | <n> next" xlabel = "head_to" style="filled" fillcolor="lightblue" ] head_from_first [ label = "<p> prev | <n> next" xlabel = "h1 | head_from_first" style="filled" fillcolor="lightblue" ] _node [ label = "<p> prev | <n> next" xlabel = "h2 | node" style="filled" fillcolor="pink" ] head_to:p->_node _node:n->head_to head_to:n->head_from_first head_from_first:p->head_to head_from_first:n->_node _node:p->head_from_first head_from:n->h3 head_from:p->h4 h3:p->head_from h3:n->h4 h4:p->h3 h4:n->head_from } ``` #### get_middle 利用快慢兩個指標來走訪節點,快指標一次走兩步,慢指標一次走一步。 當快指標走訪回最開頭時,慢指標會剛好走到中間。 假設這裡有 N 個節點,慢指標會在第 $\lceil N \rceil$ 個位置 ```cpp static list_ele_t *get_middle(struct list_head *list) { struct list_head *fast = list->next, *slow; list_for_each (slow, list) { if (fast->next == list || fast->next->next == list) break; fast = fast->next->next; } return list_entry(slow, list_ele_t, list); } ``` #### list_merge_sort 這段程式碼因為所有功能都有用函式包起, 乍看之下很像 merge sort 的 pseudocode。 ```cpp void list_merge_sort(queue_t *q) { if (list_is_singular(&q->list)) return; queue_t left; struct list_head sorted; INIT_LIST_HEAD(&left.list); list_cut_position(&left.list, &q->list, &get_middle(&q->list)->list); list_merge_sort(&left); list_merge_sort(q); list_merge(&left.list, &q->list, &sorted); INIT_LIST_HEAD(&q->list); list_splice_tail(&sorted, &q->list); } ``` 運作邏輯如下: - 確認是否只剩一個節點,如果只剩一個節點直接 return - 尋找整個 list 的中心位置並把他切為兩半 - 對兩個 list 分別再進行 merge sort - 完成後將兩邊 merge 起,這個步驟可以用下面圖解: - 分別有 `left` , `q ( right )` , 與 `sorted` 三段 ```graphviz digraph list_add_tail { nodesep=0.3 rankdir="TD" node[shape=record] left [ label = " {<t> tail | <h> head} | size | <l> list " xlabel = "left head" style="filled" fillcolor="lightblue" ] L1 [ label = "<p> prev | <n> next" xlabel = "L1" style="filled" fillcolor="lightblue" ] L2 [ label = "<p> prev | <n> next" xlabel = "L2" style="filled" fillcolor="lightblue" ] right [ label = " {<t> tail | <h> head} | size | <l> list " xlabel = "q head" style="filled" fillcolor="pink" ] R1 [ label = "<p> prev | <n> next" xlabel = "R1" style="filled" fillcolor="pink" ] R2 [ label = "<p> prev | <n> next" xlabel = "R2" style="filled" fillcolor="pink" ] sorted [ label = "prev | next" xlabel = "sorted head" ] left:l->L1 L1:n->L2 L2:n->left:l L2:p->L1 L1:p->left:l left:l->L2 right:l->R1 R1:n->R2 R2:n->right:l R2:p->R1 R1:p->right:l right:l->R2 } ``` - 將他們排序並連接起,接到 sorted 後方 ```graphviz digraph list_add_tail { nodesep=1 rankdir="TD" node[shape=record] sorted [ label = "<p>prev |<n> next" xlabel = "sorted head" ] left [ label = "<p> prev | <n> next" xlabel = "left head" style="filled" fillcolor="lightblue" ] L1 [ label = "<p> prev | <n> next" xlabel = "L1" style="filled" fillcolor="lightblue" ] R1 [ label = "<p> prev | <n> next" xlabel = "R1" style="filled" fillcolor="pink" ] R2 [ label = "<p> prev | <n> next" xlabel = "R2" style="filled" fillcolor="pink" ] L2 [ label = "<p> prev | <n> next" xlabel = "L2" style="filled" fillcolor="lightblue" ] right [ label = "<p> prev | <n> next" xlabel = "q head" style="filled" fillcolor="pink" ] sorted:n->L1 L1:n->R1 R1:n->R2 R2:n->L2 L2:n->sorted L2:p->R2 R2:p->R1 R1:p->L1 L1:p->sorted sorted:p->L2 } ``` - 把 q 初始化 ```graphviz digraph list_add_tail { nodesep=1 rankdir="TD" node[shape=record] q [ label = "<p>prev |<n> next" xlabel = "q list" ] q:p->q q:n->q } ``` - 將排序好的 list 接到 q 的後面 ```graphviz digraph list_add_tail { nodesep=1 rankdir="TD" node[shape=record] q [ label = "<p> prev | <n> next" xlabel = "q head" style="filled" fillcolor="pink" ] sorted [ label = "<p>prev |<n> next" xlabel = "sorted head" ] L1 [ label = "<p> prev | <n> next" xlabel = "L1" style="filled" fillcolor="lightblue" ] R1 [ label = "<p> prev | <n> next" xlabel = "R1" style="filled" fillcolor="pink" ] R2 [ label = "<p> prev | <n> next" xlabel = "R2" style="filled" fillcolor="pink" ] L2 [ label = "<p> prev | <n> next" xlabel = "L2" style="filled" fillcolor="lightblue" ] q:n->L1 L1:n->R1 R1:n->R2 R2:n->L2 L2:n->q L2:p->R2 R2:p->R1 R1:p->L1 L1:p->q q:p->L2 } ``` #### list_merge 將兩條 linked list 排序好並連接起 ```cpp static void list_merge(struct list_head *lhs, struct list_head *rhs, struct list_head *head) { INIT_LIST_HEAD(head); if (list_empty(lhs)) { list_splice_tail(lhs, head); return; } if (list_empty(rhs)) { list_splice_tail(rhs, head); return; } while (!list_empty(lhs) && !list_empty(rhs)) { char *lv = list_entry(lhs->next, list_ele_t, list)->value; char *rv = list_entry(rhs->next, list_ele_t, list)->value; struct list_head *tmp = strcmp(lv, rv) <= 0 ? lhs->next : rhs->next; list_del(tmp); list_add_tail(tmp, head); } list_splice_tail(list_empty(lhs) ? rhs : lhs, head); } ``` ### 指出改進空間並著手實作 在資料型態的部分, ```cpp struct list_head { struct list_head *prev, *next; }; typedef struct __element { char *value; struct __element *next; struct list_head list; } list_ele_t; typedef struct { list_ele_t *head; /* Linked list of elements */ list_ele_t *tail; size_t size; struct list_head list; } queue_t; ``` 由於 `list_head` 就擁有了 `tail` 、 `next` 相關的資訊,所以我把上述的資料型態給縮減為以下: ```cpp struct list_head { struct list_head *prev, *next; }; typedef struct __element { char *value; struct list_head list; } list_ele_t; typedef struct { list_ele_t *head; size_t size; struct list_head list; } queue_t; ``` 也有針對這個改動修改其他對應的程式碼。 ### 研讀 Linux 核心的 lib/list_sort.c 原始程式碼 [lib/list_sort.c](https://github.com/torvalds/linux/blob/master/lib/list_sort.c) `__attribute__` 是一種檢查工具,主要是 gcc 在編譯時期會去查看。 在這裡選定的 target 是 `nonnull`,表示在 `cmp_func` 這個 function pointer 中的第二及第三個參數指標不得為 NULL。 ```cpp typedef int __attribute__((nonnull(2,3))) (*cmp_func)(void *, struct list_head const *, struct list_head const *); ``` #### 參考資料 [Declaring Attributes of Functions](https://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Function-Attributes.html) ## 測驗二 ### 解釋程式碼運作原理 考慮函式 func 接受一個 16 位元無號整數 N,並回傳小於或等於 N 的 power-of-2 ```cpp uint16_t func(uint16_t N) { /* change all right side bits to 1 */ N |= N >> 1; N |= N >> 2; N |= N >> 4; N |= N >> 8; return (N + 1) >> 1; } ``` 由於以上函式的目標為回傳小於或等於 N 的 power-of-2,看到 `return (N + 1) >> 1`,可以推測最後 N 的值會是 $2^{n+1} - 1$ 假設現在 func 的 input 為 `1 << 15`,以下為執行順序: ``` 1000 0000 0000 0000 // N 1100 0000 0000 0000 // N |= N >> 1 1111 0000 0000 0000 // N |= N >> 2 1111 1111 0000 0000 // N |= N >> 4 1111 1111 1111 1111 // N |= N >> 8 ``` 在回傳值為 $2^{15}$ 時,會發現有 overflow 的問題,應改為 `return (N >> 1) + 1` 來保證結果正確性。 ### 在 [The Linux Kernel API](https://www.kernel.org/doc/html/latest/core-api/kernel-api.html) 頁面搜尋 “power of 2”,可見 is_power_of_2,查閱 Linux 核心原始程式碼,舉例說明其作用和考量,特別是 round up/down power of 2 在 [linux/tools/include/linux/log2.h](https://github.com/torvalds/linux/blob/fcadab740480e0e0e9fa9bd272acd409884d431a/tools/include/linux/log2.h) 內可以找到 `is_power_of_2` 的原始程式碼。 `__attribute__((const))` 是個 GNU compiler extension,他的目標為確保這個函式沒有讀取或修改任何的 global memory。 ```cpp static inline __attribute__((const)) bool is_power_of_2(unsigned long n) { return (n != 0 && ((n & (n - 1)) == 0)); } ``` 如果 `n` 為 2 的冪次 ( $2^N$ ),從 bit 的角度來看,就只有第 N 個 bit 為 1 ,其餘皆為零。 而 `n-1` 則會變成從第 N-1 個及前方的 bit 為 1 ,其餘皆為零 `n` 與 `n-1` 做 `&` 操作後會變成 0 。 ```graphviz digraph power_of_2 { nodesep=1 rankdir="TB" node[shape=record] n [ label = " ...| 0 | 0 | 1 | 0 | 0 | 0 " xlabel = "n" ] } ``` ```graphviz digraph power_of_2 { nodesep=1 rankdir="TB" node[shape=record] "n-1" [ label = " ...| 0 | 0 | 0 | 1 | 1 | 1 " xlabel = "n-1" ] } ``` 好處是 Bitwise Operation 在硬體實做上比較有效率。 同樣在 [linux/tools/include/linux/log2.h](https://github.com/torvalds/linux/blob/fcadab740480e0e0e9fa9bd272acd409884d431a/tools/include/linux/log2.h) 內,可以找到 `__roundup_pow_of_two` 以及 `__rounddown_pow_of_two` 。 `UL` 代表 unsigned long int ( C99 標準, 6.4.4.1 )。 `fls_long` 代表的是找到第一個為一的 bit 在哪邊, - __roundup__pow_of_two - 假設我們預期此函數的回傳值為 $2^N$ ,那他的參數 `n` 數值範圍應該要為 $2^{N-1} + 1$ ~ $2^N$ - fls_long(n) 中,若 n 的範圍為 $2^{N-1}$ ~ $2^{N}-1$ ,會回傳 N - 因此 `fls_long(n - 1)` - __rounddown__pow_of_two - 假設我們預期此函數的回傳值為 $2^{N-1}$ ,那他的參數 `n` 數值範圍應該要為 $2^{N-2}$ ~ $2^{N-1} - 1$ - fls_long(n) 中,若 n 的範圍為 $2^{N-1}$ ~ $2^{N}-1$ ,會回傳 N - 因此 `fls_long(n) - 1` ```cpp /* * round up to nearest power of two */ static inline __attribute__((const)) unsigned long __roundup_pow_of_two(unsigned long n) { return 1UL << fls_long(n - 1); } ``` ```cpp /* * round down to nearest power of two */ static inline __attribute__((const)) unsigned long __rounddown_pow_of_two(unsigned long n) { return 1UL << (fls_long(n) - 1); } ``` 在 [linux/tools/include/linux/bitops.h](https://github.com/torvalds/linux/blob/fcadab740480e0e0e9fa9bd272acd409884d431a/tools/include/linux/bitops.h) 中可以找到 `fls_long`。 在這邊將 `unsigned long l` 分成 32 位元與 64 位元的兩種狀況去處理 ```cpp static inline unsigned fls_long(unsigned long l) { if (sizeof(l) == 4) return fls(l); return fls64(l); } ``` 在 [linux/include/asm-generic/bitops/fls.h](https://github.com/torvalds/linux/blob/master/include/asm-generic/bitops/fls.h) 內可以找到 `fls` 的程式原始碼。 `fls` 的函式目標為找到 `x` 內第一個唯一的 bit。 ```cpp /** * fls - find last (most-significant) bit set * @x: the word to search * * This is defined the same way as ffs. * Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32. */ static __always_inline int fls(unsigned int x) { int r = 32; if (!x) return 0; if (!(x & 0xffff0000u)) { x <<= 16; r -= 16; } if (!(x & 0xff000000u)) { x <<= 8; r -= 8; } if (!(x & 0xf0000000u)) { x <<= 4; r -= 4; } if (!(x & 0xc0000000u)) { x <<= 2; r -= 2; } if (!(x & 0x80000000u)) { x <<= 1; r -= 1; } return r; } ``` 以上函式透過 bit mask 的方式一次次過濾是否前方有 1,進而算出第一個為 1 的 bit 在哪裡。 以下用 8 個 bit 來示範函式流程 ``` x: 0000 0110 mask: 1111 0000 r: 8 - 4 = 4 ``` ``` x: 0110 0000 mask: 1100 0000 r: 4 ``` ``` x: 0110 0000 mask: 1000 0000 r: 4 - 1 = 3 ``` 在這裡我看到了 Linux kernel 的優雅操作,以上的函式都是用比較貼近硬體語言的方式操作,例如位移或者是 `&` `|` 運算子,比起減法更有效率。 :::info **Reference** - [\_\_attribute__((const)) function attribute](https://www.keil.com/support/man/docs/armcc/armcc_chr1359124974244.htm) ::: ## 測驗三 ### 解釋程式碼運作原理 ```cpp #include <stdint.h> void bitcpy(void *_dest, /* Address of the buffer to write to */ size_t _write, /* Bit offset to start writing to */ const void *_src, /* Address of the buffer to read from */ size_t _read, /* Bit offset to start reading from */ size_t count) { size_t read_lhs = _read & 7; size_t read_rhs = 8 - read_lhs; const uint8_t *source = (const uint8_t *) _src + (_read / 8); size_t write_lhs = _write & 7; size_t write_rhs = 8 - write_lhs; uint8_t *dest = (uint8_t *) _dest + (_write / 8); static const uint8_t read_mask[] = { 0x00, /* == 0 00000000b */ 0x80, /* == 1 10000000b */ 0xC0, /* == 2 11000000b */ 0xE0, /* == 3 11100000b */ 0xF0, /* == 4 11110000b */ 0xF8, /* == 5 11111000b */ 0xFC, /* == 6 11111100b */ 0xFE, /* == 7 11111110b */ 0xFF /* == 8 11111111b */ }; static const uint8_t write_mask[] = { 0xFF, /* == 0 11111111b */ 0x7F, /* == 1 01111111b */ 0x3F, /* == 2 00111111b */ 0x1F, /* == 3 00011111b */ 0x0F, /* == 4 00001111b */ 0x07, /* == 5 00000111b */ 0x03, /* == 6 00000011b */ 0x01, /* == 7 00000001b */ 0x00 /* == 8 00000000b */ }; while (count > 0) { uint8_t data = *source++; size_t bitsize = (count > 8) ? 8 : count; if (read_lhs > 0) { data <<= read_lhs; if (bitsize > read_rhs) data |= (*source >> read_rhs); } if (bitsize < 8) data &= read_mask[bitsize]; uint8_t original = *dest; uint8_t mask = read_mask[write_lhs]; if (bitsize > write_rhs) { /* Cross multiple bytes */ *dest++ = (original & mask) | (data >> write_lhs); original = *dest & write_mask[bitsize - write_rhs]; *dest = original | (data << write_rhs); } else { // Since write_lhs + bitsize is never >= 8, no out-of-bound access. mask |= write_mask[write_lhs + bitsize]; *dest++ = (original & mask) | (data >> write_lhs); } count -= bitsize; } } ``` 現在假設 > _read = 2 > _write = 3 > count = 9 首先會先將 `_read` 、 `_write` 分作 lhs 與 rhs ```cpp size_t read_lhs = _read & 7; // 2 size_t read_rhs = 8 - read_lhs; // 6 ``` 找到由哪個 byte 開始讀取 bit 數值,示意圖中粉色方格為 source 。 ``` const uint8_t *source = (const uint8_t *) _src + (_read / 8); ``` ```graphviz digraph { graph [rankdir = TB] node [shape=plaintext, fontsize=18]; "" -> "bit" -> "byte" [fontcolor=black, color=white]; node [shape=none, fontcolor=black, fontsize=14, width=4.75, fixedsize=true]; byte0 [ shape=none label=< <table border="0" cellborder="1" cellspacing="0" cellpadding="17"> <tr> <td port="port0" bgcolor="#ffcce5"><font color="red">7</font></td> <td port="port1" bgcolor="#ffcce5"><font color="red">6</font></td> <td port="port2" bgcolor="#ffcce5"><font color="red">5</font></td> <td port="port3" bgcolor="#ffcce5"><font color="red">4</font></td> <td port="port4" bgcolor="#ffcce5"><font color="red">3</font></td> <td port="port5" bgcolor="#ffcce5"><font color="red">2</font></td> <td port="port6" bgcolor="#ffcce5"><font color="red">1</font></td> <td port="port7" bgcolor="#ffcce5"><font color="red">0</font></td> </tr> </table>> ]; byte1 [ shape=none label=< <table border="0" cellborder="1" cellspacing="0" cellpadding="17"> <tr> <td port="port0" bgcolor="#f5e342">7</td> <td port="port1" bgcolor="#f5e342">6</td> <td port="port2" bgcolor="#f5e342">5</td> <td port="port3" bgcolor="#f5e342">4</td> <td port="port4" bgcolor="#f5e342">3</td> <td port="port5" bgcolor="#f5e342">2</td> <td port="port6" bgcolor="#f5e342">1</td> <td port="port7" bgcolor="#f5e342">0</td> </tr> </table>> ]; t0 [shape=record label="<head> 0 | 1 | <s>2 | 3 | 4 | 5 | 6 | 7", color="white", fontcolor="black"]; t1 [shape=record label="<head> 8 | 9 | <s>10 | 11 | 12 | 13 | 14 | 15", color="white", fontcolor="black"]; "bitsize" -> byte0:port2 [color=black]; "bitsize" -> byte1:port2 [color=black]; "bitsize" -> byte0:port6 [color=none]; t0:head -> byte0:port0 [color=none]; t1:head -> byte1:port0 [color=none]; { rank=same; "bit"; byte0; byte1; } { rank=same; "byte"; t0; t1; } } ``` 將 `source` 中要讀取的片段複製到 `data` 內。 ```cpp uint8_t data = *source++; size_t bitsize = (count > 8) ? 8 : count; if (read_lhs > 0) { data <<= read_lhs; if (bitsize > read_rhs) data |= (*source >> read_rhs); } ``` ```graphviz digraph { graph [rankdir = TB] node [shape=plaintext, fontsize=18]; node [shape=none, fontcolor=black, fontsize=14, width=4.75, fixedsize=true]; byte0 [ shape=none label=<<table border="0" cellborder="1" cellspacing="0" cellpadding="12"> <tr> <td bgcolor="#ffcce5"><font color="red">5</font></td> <td bgcolor="#ffcce5"><font color="red">4</font></td> <td bgcolor="#ffcce5"><font color="red">3</font></td> <td bgcolor="#ffcce5"><font color="red">2</font></td> <td bgcolor="#ffcce5"><font color="red">1</font></td> <td bgcolor="#ffcce5"><font color="red">0</font></td> <td bgcolor="#f5e342">7</td> <td bgcolor="#f5e342">6</td> </tr> </table>> ]; } ``` 再將 data 的內容移動至 `dest` ```cpp uint8_t original = *dest; uint8_t mask = read_mask[write_lhs]; if (bitsize > write_rhs) { /* Cross multiple bytes */ *dest++ = (original & mask) | (data >> write_lhs); original = *dest & write_mask[bitsize - write_rhs]; *dest = original | (data << write_rhs); } else { // Since write_lhs + bitsize is never >= 8, no out-of-bound access. mask |= write_mask[write_lhs + bitsize]; *dest++ = (original & mask) | (data >> write_lhs); } ``` ```graphviz digraph { graph [rankdir = TB] node [shape=plaintext, fontsize=18]; "" -> "bit" -> "byte" [fontcolor=black, color=white]; node [shape=none, fontcolor=black, fontsize=14, width=4.75, fixedsize=true]; byte0 [ shape=none label=< <table border="0" cellborder="1" cellspacing="0" cellpadding="17"> <tr> <td port="port0">7</td> <td port="port1">6</td> <td port="port2">5</td> <td port="port3" bgcolor="#ffcce5"><font color="red">5</font></td> <td port="port4" bgcolor="#ffcce5"><font color="red">4</font></td> <td port="port5" bgcolor="#ffcce5"><font color="red">3</font></td> <td port="port6" bgcolor="#ffcce5"><font color="red">2</font></td> <td port="port7" bgcolor="#ffcce5"><font color="red">1</font></td> </tr> </table>> ]; byte1 [ shape=none label=< <table border="0" cellborder="1" cellspacing="0" cellpadding="17"> <tr> <td port="port0" bgcolor="#ffcce5"><font color="red">0</font></td> <td port="port1" bgcolor="#f5e342">7</td> <td port="port2" bgcolor="#f5e342">6</td> <td port="port3">4</td> <td port="port4">3</td> <td port="port5">2</td> <td port="port6">1</td> <td port="port7">0</td> </tr> </table>> ]; t0 [shape=record label="<head> 0 | 1 | <s>2 | 3 | 4 | 5 | 6 | 7", color="white", fontcolor="black"]; t1 [shape=record label="<head> 8 | 9 | <s>10 | 11 | 12 | 13 | 14 | 15", color="white", fontcolor="black"]; "bitsize" -> byte0:port3 [color=black]; "bitsize" -> byte1:port3 [color=black]; "bitsize" -> byte0:port6 [color=none]; t0:head -> byte0:port0 [color=none]; t1:head -> byte1:port0 [color=none]; { rank=same; "bit"; byte0; byte1; } { rank=same; "byte"; t0; t1; } } ``` ### 嘗試重寫為同樣功能但效率更高的實作 目前的程式碼我認為有改進空間的部份有 - 使用 bitwise 的位移取代除法。 ```diff= size_t read_lhs = _read & 7; size_t read_rhs = 8 - read_lhs; -const uint8_t *source = (const uint8_t *) _src + (_read / 8); +const uint8_t *source = (const uint8_t *) _src + (_read >> 3); size_t write_lhs = _write & 7; size_t write_rhs = 8 - write_lhs; -uint8_t *dest = (uint8_t *) _dest + (_write / 8); +uint8_t *dest = (uint8_t *) _dest + (_write >> 3); ``` - 若需要做跨 byte 儲存資料,每次都會多出兩次的位移與 mask 的,若 count 一大,運算成本也會增加。 ```diff= { size_t read_lhs = _read & 7; size_t read_rhs = 8 - read_lhs; const uint8_t *source = (const uint8_t *) _src + (_read >> 3); size_t write_lhs = _write & 7; size_t write_rhs = 8 - write_lhs; uint8_t *dest = (uint8_t *) _dest + (_write >> 3); static const uint8_t read_mask[] = { 0x00, /* == 0 00000000b */ 0x80, /* == 1 10000000b */ 0xC0, /* == 2 11000000b */ 0xE0, /* == 3 11100000b */ 0xF0, /* == 4 11110000b */ 0xF8, /* == 5 11111000b */ 0xFC, /* == 6 11111100b */ 0xFE, /* == 7 11111110b */ 0xFF /* == 8 11111111b */ }; static const uint8_t write_mask[] = { 0xFF, /* == 0 11111111b */ 0x7F, /* == 1 01111111b */ 0x3F, /* == 2 00111111b */ 0x1F, /* == 3 00011111b */ 0x0F, /* == 4 00001111b */ 0x07, /* == 5 00000111b */ 0x03, /* == 6 00000011b */ 0x01, /* == 7 00000001b */ 0x00 /* == 8 00000000b */ }; + if (count & 7UL) { + uint8_t data = *source++; + size_t bitsize = count & 7UL; + data <<= read_lhs; + data &= read_mask[bitsize]; + uint8_t original = *dest; + uint8_t mask = read_mask[write_lhs]; + if (bitsize > write_rhs) { + *dest++ = (original & mask) | (data >> write_lhs); + original = *dest & write_mask[bitsize - write_rhs]; + *dest = original | (data << write_rhs); + } else { + mask |= write_mask[write_lhs + bitsize]; + *dest++ = (original & mask) | (data >> write_lhs); + } + count -= bitsize; + } while (count > 0) { uint8_t data = *source++; size_t bitsize = (count > 8) ? 8 : count; - if (read_lhs > 0) { - data <<= read_lhs; - if (bitsize > read_rhs) - data |= (*source >> read_rhs); - } - - if (bitsize < 8) - data &= read_mask[bitsize]; data &= read_mask[bitsize]; uint8_t original = *dest; uint8_t mask = read_mask[write_lhs]; if (bitsize > write_rhs) { /* Cross multiple bytes */ *dest++ = (original & mask) | (data >> write_lhs); original = *dest & write_mask[bitsize - write_rhs]; *dest = original | (data << write_rhs); } else { // Since write_lhs + bitsize is never >= 8, no out-of-bound access. mask |= write_mask[write_lhs + bitsize]; *dest++ = (original & mask) | (data >> write_lhs); } count -= bitsize; } } ``` 在進入 while 迴圈之前,我把 src 對齊,確認在讀資料時,一定會從某個 byte 的第一個 bit 開始讀,節省 mask 的次數。 由圖片中可以看到,有將 src 對齊的執行時間比起原版還要更短。 ![](https://i.imgur.com/i44ynBN.png) ## 測驗四 ### 解釋程式碼運作原理 在這裡由測試程式 `test.c` 流程進行介紹 **test.c** ```cpp CSTR_BUFFER(a); ``` **cstr.h** ```cpp= #pragma once #include <stddef.h> #include <stdint.h> #include <stdlib.h> enum { CSTR_PERMANENT = 1, CSTR_INTERNING = 2, CSTR_ONSTACK = 4, }; #define CSTR_INTERNING_SIZE (32) #define CSTR_STACK_SIZE (128) typedef struct __cstr_data { char *cstr; uint32_t hash_size; uint16_t type; uint16_t ref; } * cstring; typedef struct __cstr_buffer { cstring str; } cstr_buffer[1]; #define CSTR_S(s) ((s)->str) #define CSTR_BUFFER(var) \ char var##_cstring[CSTR_STACK_SIZE] = {0}; \ struct __cstr_data var##_cstr_data = {var##_cstring, 0, CSTR_ONSTACK, 0}; \ cstr_buffer var; \ var->str = &var##_cstr_data; ``` 第 1 行的 `#pragma once` 在 C 語言內並不是一個標準 (non-standard) ,但是有廣泛的被編譯器支援。目標與常見的 `#ifndef...` 類似,希望只被 include 一次。 第 29 行的 [## operator](https://www.ibm.com/docs/en/zos/2.3.0?topic=mdd-operator) 被拿來連接兩個 token ,用以生成變數名稱 到目前為止可以看到我們生成了一個 `a` 的 `str_buffer` 與其相關的變數。 ---- **test.c** ```cpp cstr_cat(a, "Hello "); ``` **cstr.c** ```cpp= cstring cstr_cat(cstr_buffer sb, const char *str) { cstring s = sb->str; if (s->type == CSTR_ONSTACK) { int i = s->hash_size; while (i < CSTR_STACK_SIZE - 1) { s->cstr[i] = *str; if (*str == 0) return s; ++s->hash_size; ++str; ++i; } s->cstr[i] = 0; } cstring tmp = s; sb->str = cstr_cat2(tmp->cstr, str); cstr_release(tmp); return sb->str; } ``` 現在有一個叫做 `a` 的 `cstr_buffer` 與 `"Hello "` 這個字串被傳入 `cstr_cat` 中。 在這裡 `s` 內的資料經由上一步初始化為 ```cpp { // char a_cstring[CSTR_STACK_SIZE] = {0} char* cstr = a_cstring; uint32_t hash_size = 0; uint16_t type = CSTR_ONSTACK; uint16_t ref = 0; } ``` 如果字串除存的資料還在 stack 上面的話會有以下步驟: - 在第 5 行中的 `i` 表示現在 `s` 的 `cstr_buffer` 內的字串長度 - 在以下兩種條件都符合的情況下將字串依序放入 `cstr` 尾端,與前字串連接起。 - `cstr` 內的字串長度小於 `CSTR_STACK_SIZE - 1` - `str` 還沒走到字串尾端 ( '\0' ) - 如果 `cstr` 的大小足以放下 `str`,回傳 `s` - 如果不夠,將 `\0` 放入 `cstr`末端,進入與 `type != CSTR_ONSTACK` 相同的處理步驟 當 `type != CSTR_ONSTACK` 或者字串長度已經超過 `CSTR_STACK_SIZE` 時,呼叫 `cstr_cat2` ,傳入 `cstr` 與 `str` **cstr_cat2** ```cpp= static cstring cstr_cat2(const char *a, const char *b) { size_t sa = strlen(a), sb = strlen(b); if (sa + sb < CSTR_INTERNING_SIZE) { char tmp[CSTR_INTERNING_SIZE]; memcpy(tmp, a, sa); memcpy(tmp + sa, b, sb); tmp[sa + sb] = 0; return cstr_interning(tmp, sa + sb, hash_blob(tmp, sa + sb)); } cstring p = xalloc(sizeof(struct __cstr_data) + sa + sb + 1); if (!p) return NULL; char *ptr = (char *) (p + 1); p->cstr = ptr; p->type = 0; p->ref = 1; memcpy(ptr, a, sa); memcpy(ptr + sa, b, sb); ptr[sa + sb] = 0; p->hash_size = 0; return p; } ``` 若字串的長度大於 `CSTR_STACK_SIZE` 但小於 `CSTR_INTERNING_SIZE` 時,會進入 `cstr__interning` 的函式。 (但在測驗程式碼中 `CSTR_STACK_SIZE` 為 128, `CSTR_INTERNING_SIZE` 為 32,這樣好像就永遠執行不到這段程式碼 @@) 傳入 `cstr__interning` 的參數為: ``` tmp : 連接起的字串 sa+sb : 字串長度 hash_blob(tmp, sa + sb) : 此字串的雜湊值 ``` **hash_blob** ```cpp= static inline uint32_t hash_blob(const char *buffer, size_t len) { const uint8_t *ptr = (const uint8_t *) buffer; size_t h = len; size_t step = (len >> 5) + 1; for (size_t i = len; i >= step; i -= step) h = h ^ ((h << 5) + (h >> 2) + ptr[i - 1]); return h == 0 ? 1 : h; } ``` **cstr_interning** ```cpp= static cstring cstr_interning(const char *cstr, size_t sz, uint32_t hash) { cstring ret; CSTR_LOCK(); ret = interning(&__cstr_ctx, cstr, sz, hash); if (!ret) { expand(&__cstr_ctx); ret = interning(&__cstr_ctx, cstr, sz, hash); } ++__cstr_ctx.total; CSTR_UNLOCK(); return ret; } ``` 在 `cstr_interning` 中第 4 行 `CSTR_LOCK` 與第 11 行的 `CSTR_UNLOCK` 使得在 interning 的過程中可以保證執行正確。 ```cpp= #define CSTR_LOCK() \ ({ \ while (__sync_lock_test_and_set(&(__cstr_ctx.lock), 1)) { \ } \ }) #define CSTR_UNLOCK() ({ __sync_lock_release(&(__cstr_ctx.lock)); }) ``` 在這裡使用的是 gcc 提供的 [Built-in functions for atomic memory access](https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html) 來做 LOCK。 ```cpp struct __cstr_interning { int lock; int index; unsigned size; unsigned total; struct __cstr_node **hash; struct __cstr_pool *pool; }; static struct __cstr_interning __cstr_ctx; ``` **interning** ```cpp= static cstring interning(struct __cstr_interning *si, const char *cstr, size_t sz, uint32_t hash) { if (!si->hash) return NULL; int index = (int) (hash & (si->size - 1)); struct __cstr_node *n = si->hash[index]; while (n) { if (n->str.hash_size == hash) { if (!strcmp(n->str.cstr, cstr)) return &n->str; } n = n->next; } // 80% (4/5) threshold if (si->total * 5 >= si->size * 4) return NULL; if (!si->pool) { si->pool = xalloc(sizeof(struct __cstr_pool)); si->index = 0; } n = &si->pool->node[si->index++]; memcpy(n->buffer, cstr, sz); n->buffer[sz] = 0; cstring cs = &n->str; cs->cstr = n->buffer; cs->hash_size = hash; cs->type = CSTR_INTERNING; cs->ref = 0; n->next = si->hash[index]; si->hash[index] = n; return cs; } ``` 如果 `si` 內的 `hash` 為空,代表第一次使用這個空間,回傳 `NULL`,使用 `expand` 創造空間後再次進入。 第 9 行的 `index` 為 `hash` 與 `si->size` 取餘數的結果,接著進入 while 迴圈,目標為在那層 hash 內尋找是否有相同的字串,若有找的則回傳已經儲存的字串,若沒找到就在 pool 內創立新的 node 在第 19 行的地方,確認裡頭物件是否超過雜湊表的 4/5 ,若超過就回傳 `NULL` ,再使用 `expand` 擴增空間重新進入。 第 35 行,將新的 node 連接至 `si->hash[index]` 的開頭 **expand** ```cpp= #define HASH_START_SIZE 16 /* must be power of 2 */ static void *xalloc(size_t n) { void *m = malloc(n); if (!m) exit(-1); return m; } static void expand(struct __cstr_interning *si) { unsigned new_size = si->size * 2; if (new_size < HASH_START_SIZE) new_size = HASH_START_SIZE; struct __cstr_node **new_hash = xalloc(sizeof(struct __cstr_node *) * new_size); memset(new_hash, 0, sizeof(struct __cstr_node *) * new_size); for (unsigned i = 0; i < si->size; ++i) { struct __cstr_node *node = si->hash[i]; while (node) { struct __cstr_node *tmp = node->next; insert_node(new_hash, new_size, node); node = tmp; } } free(si->hash); si->hash = new_hash; si->size = new_size; } ``` 在 `expand` 宣告新的 `struct __cstr_node` 的指標的指標,並給予 `struct __cstr_node *` 原兩倍大小的空間 ( 若是內部還不曾有過東西,大小則為 `HASH_START_SIZE` ) :::info 參考資料 - [The ## operator](https://www.ibm.com/docs/en/zos/2.3.0?topic=mdd-operator) - [C 語言:typedef 的用法](https://magicjackting.pixnet.net/blog/post/65865174) :::