# 2021q1 Homework2 (quiz2)
contributed by < `AmyLin0210` >
###### tags: `2021q1 Linux`
> [2021q1 第二周測驗題](https://hackmd.io/@sysprog/linux2021-quiz2)
---
## 測驗一
### 解釋上述程式碼運作原理
```cpp=
#define container_of(ptr, type, member) \
__extension__({ \
const __typeof__(((type *) 0)->member) *__pmember = (ptr); \
(type *) ((char *) __pmember - offsetof(type, member)); \
})
```
在 container_of 中包含以下部份:
- `__extension__` 的功用是在使用 GCC extension `__typeof__` 時,不會跳出警告。
- `((type *)0)->member` 將 0 這個位置強制轉型為 `type *`
- 取出當中的 member,並使用 `__typeof__` 得到 member 的型態,產生一個 pointer 給該型別的 `__pmember`
- 將 `ptr` assign 給 `__pmember`
- `offsetof(type, member)` 可以算出 `member` 在 `type` 這個 struct 中的位移量。
- 將絕對位置 `(char*)__pmember` 減去 `offsetof(type, member)` ,可以得到 struct 的起始位置。
- 最後 `(type*)` 再將起始位置轉行為 pointer to `type`
在這當中讓我最不解的是 `((type *)0)->member` 這個操作,
參考 [toastcheng](https://hackmd.io/@toastcheng/w2-quiz2?fbclid=IwAR2crB0c4FZwLpkjQ2sVXEWasg8IXXz3rn3sE5DopzAbBpA90iuV65-pzOo) 同學的作業與實驗,自己也試了一下程式碼:
定義出了一個 struct ,如下:
```
struct test {
int num;
};
```
- typeof-null.c
```cpp=
#include <stdio.h>
int main () {
const __typeof__(((struct test *)NULL)->num) a = 1;
printf("a = %d\n", a); // output: a = 1
return 0;
}
```
- typeof-2048.c
```cpp=
#include <stdio.h>
int main () {
const __typeof__(((struct test *)2048)->num) a = 1;
printf("a = %d\n", a); // output: a = 1
return 0;
}
```
- typeof-char.c
```cpp=
#include <stdio.h>
int main () {
char i;
const __typeof__(((struct test *)i)->num) a = 1;
printf("a = %d\n", a); // output: a = 1
return 0;
}
```
使用 gcc 產生組合語言
```
$ gcc typeof-null.c typeof-char.c typeof-null.c -S -O0
```
使用 diff 來比較產生的組合語言
```
$ diff typeof-char.s typeof-null.s
< .file "typeof-char.c"
---
> .file "typeof-null.c"
```
```
$ diff typeof-2048.s typeof-null.s
< .file "typeof-2048.c"
---
> .file "typeof-null.c"
```
發現除了 file name 以外其餘都相同,這個讓我感到意外。
`typeof-2048` 和 `typeof-null` 相同,我想可以解釋為都是對兩個位置強制轉型,所以會產生兩個同樣的組合語言。
但原先我預期 `typeof-char` 會有些不同的行為,因為已經宣告出一個 char 的變數 `i`,所以組合語言中會多出對 `i` 做處理的部份,然而結果與我想像中的不同。
如同 [toastcheng](https://hackmd.io/@toastcheng/w2-quiz2?fbclid=IwAR2crB0c4FZwLpkjQ2sVXEWasg8IXXz3rn3sE5DopzAbBpA90iuV65-pzOo) 同學的猜測,我想在 typeof 時並不會對當中的指標或參數做 dereference。
但除此之外,強制轉型如果在沒有取當中的數值做處理的情況下,是不是也不會做 dereference?
我測試了一下以下程式碼,當中的指標 `i` 皆指向 NULL,只有型態不同。
char.c
```cpp=
#include <stdio.h>
int main () {
char *i = NULL;
const __typeof__(((struct test *)i)->num) a = 1;
printf("a = %d\n", a); // output: a = 1
return 0;
}
```
int.c
```cpp=
#include <stdio.h>
int main () {
int *i = NULL;
const __typeof__(((struct test *)i)->num) a = 1;
printf("a = %d\n", a); // output: a = 1
return 0;
}
```
上述的程式碼在轉換成組合語言後使用 diff 做比較,
發現不管原參數的型態是否不同,結果仍然相同。
```
$ gcc int.c char.c -S -O0
$ diff int.s char.s
< .file "int.c"
---
> .file "char.c"
```
----
在這裡使用了 [Designated Initializers](https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html#:~:targetText=6.29%20Designated%20Initializers,array%20or%20structure%20being%20initialized.&targetText=To%20initialize%20a%20range%20of,This%20is%20a%20GNU%20extension.),使得 `head` 內的 `prev` 與 `next` 都被初始化為自己本身的位置。
```cpp
struct list_head {
struct list_head *prev, *next;
};
#define LIST_HEAD(head) struct list_head head = {&(head), &(head)}
```
#### INIT_LIST_HEAD
初始化 `head`
```cpp
static inline void INIT_LIST_HEAD(struct list_head *head)
{
head->next = head; head->prev = head;
}
```
```graphviz
digraph INIT_LIST_HEAD {
nodesep=0.5
rankdir="TB"
node[shape=record]
head [
label = "<p> prev | <n> next",
xlabel = "head"
style="filled"
fillcolor="lightblue"
]
head:p -> head
head:n -> head
}
```
#### list_add_tail
由於這是一個 circuler doubly linked list ,所以使用 `head->prev` 找到 linked list 的尾部,再將 `node` 接上。
```cpp
static inline void list_add_tail(struct list_head *node, struct list_head *head)
{
struct list_head *prev = head->prev;
prev->next = node;
node->next = head;
node->prev = prev;
head->prev = node;
}
```
```graphviz
digraph list_add_tail {
nodesep=0.5
rankdir="LR"
node[shape=record]
prev [
label = "<p> prev | <n> next"
xlabel = "prev"
style="filled"
fillcolor="lightblue"
]
_node [
label = "<p> prev | <n> next",
xlabel = "node"
style="filled"
fillcolor="lightblue"
]
head [
label = "<p> prev | <n> next",
xlabel = "head"
style="filled"
fillcolor="lightblue"
]
prev:n->_node
_node:n->head
_node:p->prev
head:p->_node
}
```
#### list_del
將 `node` 移出這個 linked list
```cpp
static inline void list_del(struct list_head *node)
{
struct list_head *next = node->next, *prev = node->prev;
next->prev = prev; prev->next = next;
}
```
```graphviz
digraph list_add_tail {
nodesep=0.5
rankdir="LR"
node[shape=record]
prev [
label = "<p> prev | <n> next"
xlabel = "prev"
style="filled"
fillcolor="lightblue"
]
_node [
label = "<p> prev | <n> next",
xlabel = "node"
style="filled"
fillcolor="lightblue"
]
next [
label = "<p> prev | <n> next",
xlabel = "next"
style="filled"
fillcolor="lightblue"
]
prev:n->next
next:p->prev
}
```
#### list_empty
檢查 `head` 是否指回 `head` 自己,如果沒有指向其他 node ,便表示這個 linked list 為空。
```cpp
static inline int list_empty(const struct list_head *head)
{
return (head->next == head);
}
```
```graphviz
digraph list_add_tail {
nodesep=0.5
rankdir="LR"
node[shape=record]
head [
label = "<p> prev | <n> next"
xlabel = "head"
style="filled"
fillcolor="lightblue"
]
head:n->head
}
```
#### list_is_singular
檢查是否 `prev` 與 `next` 都指向同一個位置,若是相同便表示只有一個 node
```cpp
static inline int list_is_singular(const struct list_head *head)
{
return (!list_empty(head) && head->prev == head->next);
}
```
```graphviz
digraph list_add_tail {
nodesep=0.5
rankdir="LR"
node[shape=record]
head [
label = "<p> prev | <n> next"
xlabel = "head"
style="filled"
fillcolor="lightblue"
]
head:n->tmp
head:p->tmp
}
```
#### list_splice_tail
將 `list` 整串 linked list 連接至 `head` linked list 的最後面,形成一個新的環狀結構
```cpp
static inline void list_splice_tail(struct list_head *list,
struct list_head *head)
{
struct list_head *head_last = head->prev;
struct list_head *list_first = list->next, *list_last = list->prev;
if (list_empty(list))
return;
head->prev = list_last;
list_last->next = head;
list_first->prev = head_last;
head_last->next = list_first;
}
```
```graphviz
digraph list_add_tail {
nodesep=0.5
rankdir="LR"
node[shape=record]
head [
label = "<p> prev | <n> next"
xlabel = "head"
style="filled"
fillcolor="lightblue"
]
head_tmp [
label = "<p> prev | <n> next"
xlabel = "head_tmp"
]
head_last [
label = "<p> prev | <n> next"
xlabel = "head_last"
style="filled"
fillcolor="lightblue"
]
list_first [
label = "<p> prev | <n> next"
xlabel = "list_first"
style="filled"
fillcolor="lightblue"
]
list_tmp [
label = "<p> prev | <n> next"
xlabel = "list_tmp"
]
list_last [
label = "<p> prev | <n> next"
xlabel = "list_last"
style="filled"
fillcolor="lightblue"
]
list [
label = "<p> prev | <n> next"
xlabel = "list"
style="filled"
fillcolor="lightblue"
]
head:p->list_last
list_last:n->head
list_first:p->head_last
head_last:n->list_first
head_last:p->head_tmp
list:n->list_first
list:p->list_last
list_first:n->list_tmp
list_tmp:n->list_last
list_tmp:p->list_first
head:n->head_tmp
head_tmp:p->head
head_tmp:n->head_last
}
```
#### list_cut_position
將 `head_from` 後面到 `node` 前面的 linked list 裁下,連接至 `head_to` 後
```cpp
static inline void list_cut_position(struct list_head *head_to,
struct list_head *head_from,
struct list_head *node)
{
struct list_head *head_from_first = head_from->next;
if (list_empty(head_from))
return;
if (head_from == node) {
INIT_LIST_HEAD(head_to);
return;
}
head_from->next = node->next;
head_from->next->prev = head_from;
head_to->prev = node;
node->next = head_to;
head_to->next = head_from_first;
head_to->next->prev = head_to;
}
```
未切割
```graphviz
digraph list_add_tail {
nodesep=0.5
rankdir="LR"
node[shape=record]
head_from [
label = "<p> prev | <n> next"
xlabel = "h0 | head_from"
style="filled"
fillcolor="lightblue"
]
head_from_first [
label = "<p> prev | <n> next"
xlabel = "h1 | head_from_first"
style="filled"
fillcolor="lightblue"
]
_node [
label = "<p> prev | <n> next"
xlabel = "h2 | node"
style="filled"
fillcolor="pink"
]
h3 [
label = "<p> prev | <n> next"
xlabel = "h3"
style="filled"
fillcolor="lightblue"
]
h4 [
label = "<p> prev | <n> next"
xlabel = "h4"
style="filled"
fillcolor="lightblue"
]
head_to [
label = "<p> prev | <n> next"
xlabel = "head_to"
style="filled"
fillcolor="lightblue"
]
head_from:n->head_from_first
head_from_first:n->_node
_node:n->h3
h3:n->h4
h4:n->head_from
h4:p->h3
h3:p->_node
_node:p->head_from_first
head_from_first:p->head_from
head_from:p->h4
}
```
切割後
```graphviz
digraph list_add_tail {
nodesep=0.5
rankdir="LR"
node[shape=record]
head_from [
label = "<p> prev | <n> next"
xlabel = "h0 | head_from"
style="filled"
fillcolor="lightblue"
]
h3 [
label = "<p> prev | <n> next"
xlabel = "h3"
style="filled"
fillcolor="lightblue"
]
h4 [
label = "<p> prev | <n> next"
xlabel = "h4"
style="filled"
fillcolor="lightblue"
]
head_to [
label = "<p> prev | <n> next"
xlabel = "head_to"
style="filled"
fillcolor="lightblue"
]
head_from_first [
label = "<p> prev | <n> next"
xlabel = "h1 | head_from_first"
style="filled"
fillcolor="lightblue"
]
_node [
label = "<p> prev | <n> next"
xlabel = "h2 | node"
style="filled"
fillcolor="pink"
]
head_to:p->_node
_node:n->head_to
head_to:n->head_from_first
head_from_first:p->head_to
head_from_first:n->_node
_node:p->head_from_first
head_from:n->h3
head_from:p->h4
h3:p->head_from
h3:n->h4
h4:p->h3
h4:n->head_from
}
```
#### get_middle
利用快慢兩個指標來走訪節點,快指標一次走兩步,慢指標一次走一步。
當快指標走訪回最開頭時,慢指標會剛好走到中間。
假設這裡有 N 個節點,慢指標會在第 $\lceil N \rceil$ 個位置
```cpp
static list_ele_t *get_middle(struct list_head *list)
{
struct list_head *fast = list->next, *slow;
list_for_each (slow, list) {
if (fast->next == list || fast->next->next == list)
break;
fast = fast->next->next;
}
return list_entry(slow, list_ele_t, list);
}
```
#### list_merge_sort
這段程式碼因為所有功能都有用函式包起,
乍看之下很像 merge sort 的 pseudocode。
```cpp
void list_merge_sort(queue_t *q)
{
if (list_is_singular(&q->list))
return;
queue_t left;
struct list_head sorted;
INIT_LIST_HEAD(&left.list);
list_cut_position(&left.list, &q->list, &get_middle(&q->list)->list);
list_merge_sort(&left);
list_merge_sort(q);
list_merge(&left.list, &q->list, &sorted);
INIT_LIST_HEAD(&q->list);
list_splice_tail(&sorted, &q->list);
}
```
運作邏輯如下:
- 確認是否只剩一個節點,如果只剩一個節點直接 return
- 尋找整個 list 的中心位置並把他切為兩半
- 對兩個 list 分別再進行 merge sort
- 完成後將兩邊 merge 起,這個步驟可以用下面圖解:
- 分別有 `left` , `q ( right )` , 與 `sorted` 三段
```graphviz
digraph list_add_tail {
nodesep=0.3
rankdir="TD"
node[shape=record]
left [
label = "
{<t> tail | <h> head} |
size |
<l> list
"
xlabel = "left head"
style="filled"
fillcolor="lightblue"
]
L1 [
label = "<p> prev | <n> next"
xlabel = "L1"
style="filled"
fillcolor="lightblue"
]
L2 [
label = "<p> prev | <n> next"
xlabel = "L2"
style="filled"
fillcolor="lightblue"
]
right [
label = "
{<t> tail | <h> head} |
size |
<l> list
"
xlabel = "q head"
style="filled"
fillcolor="pink"
]
R1 [
label = "<p> prev | <n> next"
xlabel = "R1"
style="filled"
fillcolor="pink"
]
R2 [
label = "<p> prev | <n> next"
xlabel = "R2"
style="filled"
fillcolor="pink"
]
sorted [
label = "prev | next"
xlabel = "sorted head"
]
left:l->L1
L1:n->L2
L2:n->left:l
L2:p->L1
L1:p->left:l
left:l->L2
right:l->R1
R1:n->R2
R2:n->right:l
R2:p->R1
R1:p->right:l
right:l->R2
}
```
- 將他們排序並連接起,接到 sorted 後方
```graphviz
digraph list_add_tail {
nodesep=1
rankdir="TD"
node[shape=record]
sorted [
label = "<p>prev |<n> next"
xlabel = "sorted head"
]
left [
label = "<p> prev | <n> next"
xlabel = "left head"
style="filled"
fillcolor="lightblue"
]
L1 [
label = "<p> prev | <n> next"
xlabel = "L1"
style="filled"
fillcolor="lightblue"
]
R1 [
label = "<p> prev | <n> next"
xlabel = "R1"
style="filled"
fillcolor="pink"
]
R2 [
label = "<p> prev | <n> next"
xlabel = "R2"
style="filled"
fillcolor="pink"
]
L2 [
label = "<p> prev | <n> next"
xlabel = "L2"
style="filled"
fillcolor="lightblue"
]
right [
label = "<p> prev | <n> next"
xlabel = "q head"
style="filled"
fillcolor="pink"
]
sorted:n->L1
L1:n->R1
R1:n->R2
R2:n->L2
L2:n->sorted
L2:p->R2
R2:p->R1
R1:p->L1
L1:p->sorted
sorted:p->L2
}
```
- 把 q 初始化
```graphviz
digraph list_add_tail {
nodesep=1
rankdir="TD"
node[shape=record]
q [
label = "<p>prev |<n> next"
xlabel = "q list"
]
q:p->q
q:n->q
}
```
- 將排序好的 list 接到 q 的後面
```graphviz
digraph list_add_tail {
nodesep=1
rankdir="TD"
node[shape=record]
q [
label = "<p> prev | <n> next"
xlabel = "q head"
style="filled"
fillcolor="pink"
]
sorted [
label = "<p>prev |<n> next"
xlabel = "sorted head"
]
L1 [
label = "<p> prev | <n> next"
xlabel = "L1"
style="filled"
fillcolor="lightblue"
]
R1 [
label = "<p> prev | <n> next"
xlabel = "R1"
style="filled"
fillcolor="pink"
]
R2 [
label = "<p> prev | <n> next"
xlabel = "R2"
style="filled"
fillcolor="pink"
]
L2 [
label = "<p> prev | <n> next"
xlabel = "L2"
style="filled"
fillcolor="lightblue"
]
q:n->L1
L1:n->R1
R1:n->R2
R2:n->L2
L2:n->q
L2:p->R2
R2:p->R1
R1:p->L1
L1:p->q
q:p->L2
}
```
#### list_merge
將兩條 linked list 排序好並連接起
```cpp
static void list_merge(struct list_head *lhs,
struct list_head *rhs,
struct list_head *head)
{
INIT_LIST_HEAD(head);
if (list_empty(lhs)) {
list_splice_tail(lhs, head);
return;
}
if (list_empty(rhs)) {
list_splice_tail(rhs, head);
return;
}
while (!list_empty(lhs) && !list_empty(rhs)) {
char *lv = list_entry(lhs->next, list_ele_t, list)->value;
char *rv = list_entry(rhs->next, list_ele_t, list)->value;
struct list_head *tmp = strcmp(lv, rv) <= 0 ? lhs->next : rhs->next;
list_del(tmp);
list_add_tail(tmp, head);
}
list_splice_tail(list_empty(lhs) ? rhs : lhs, head);
}
```
### 指出改進空間並著手實作
在資料型態的部分,
```cpp
struct list_head {
struct list_head *prev, *next;
};
typedef struct __element {
char *value;
struct __element *next;
struct list_head list;
} list_ele_t;
typedef struct {
list_ele_t *head; /* Linked list of elements */
list_ele_t *tail;
size_t size;
struct list_head list;
} queue_t;
```
由於 `list_head` 就擁有了 `tail` 、 `next` 相關的資訊,所以我把上述的資料型態給縮減為以下:
```cpp
struct list_head {
struct list_head *prev, *next;
};
typedef struct __element {
char *value;
struct list_head list;
} list_ele_t;
typedef struct {
list_ele_t *head;
size_t size;
struct list_head list;
} queue_t;
```
也有針對這個改動修改其他對應的程式碼。
### 研讀 Linux 核心的 lib/list_sort.c 原始程式碼
[lib/list_sort.c](https://github.com/torvalds/linux/blob/master/lib/list_sort.c)
`__attribute__` 是一種檢查工具,主要是 gcc 在編譯時期會去查看。
在這裡選定的 target 是 `nonnull`,表示在 `cmp_func` 這個 function pointer 中的第二及第三個參數指標不得為 NULL。
```cpp
typedef int __attribute__((nonnull(2,3))) (*cmp_func)(void *,
struct list_head const *, struct list_head const *);
```
#### 參考資料
[Declaring Attributes of Functions](https://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Function-Attributes.html)
## 測驗二
### 解釋程式碼運作原理
考慮函式 func 接受一個 16 位元無號整數 N,並回傳小於或等於 N 的 power-of-2
```cpp
uint16_t func(uint16_t N) {
/* change all right side bits to 1 */
N |= N >> 1;
N |= N >> 2;
N |= N >> 4;
N |= N >> 8;
return (N + 1) >> 1;
}
```
由於以上函式的目標為回傳小於或等於 N 的 power-of-2,看到 `return (N + 1) >> 1`,可以推測最後 N 的值會是 $2^{n+1} - 1$
假設現在 func 的 input 為 `1 << 15`,以下為執行順序:
```
1000 0000 0000 0000 // N
1100 0000 0000 0000 // N |= N >> 1
1111 0000 0000 0000 // N |= N >> 2
1111 1111 0000 0000 // N |= N >> 4
1111 1111 1111 1111 // N |= N >> 8
```
在回傳值為 $2^{15}$ 時,會發現有 overflow 的問題,應改為
`return (N >> 1) + 1` 來保證結果正確性。
### 在 [The Linux Kernel API](https://www.kernel.org/doc/html/latest/core-api/kernel-api.html) 頁面搜尋 “power of 2”,可見 is_power_of_2,查閱 Linux 核心原始程式碼,舉例說明其作用和考量,特別是 round up/down power of 2
在 [linux/tools/include/linux/log2.h](https://github.com/torvalds/linux/blob/fcadab740480e0e0e9fa9bd272acd409884d431a/tools/include/linux/log2.h) 內可以找到 `is_power_of_2` 的原始程式碼。
`__attribute__((const))` 是個 GNU compiler extension,他的目標為確保這個函式沒有讀取或修改任何的 global memory。
```cpp
static inline __attribute__((const))
bool is_power_of_2(unsigned long n)
{
return (n != 0 && ((n & (n - 1)) == 0));
}
```
如果 `n` 為 2 的冪次 ( $2^N$ ),從 bit 的角度來看,就只有第 N 個 bit 為 1 ,其餘皆為零。
而 `n-1` 則會變成從第 N-1 個及前方的 bit 為 1 ,其餘皆為零
`n` 與 `n-1` 做 `&` 操作後會變成 0 。
```graphviz
digraph power_of_2 {
nodesep=1
rankdir="TB"
node[shape=record]
n [
label = "
...| 0 | 0 | 1 | 0 | 0 | 0
"
xlabel = "n"
]
}
```
```graphviz
digraph power_of_2 {
nodesep=1
rankdir="TB"
node[shape=record]
"n-1" [
label = "
...| 0 | 0 | 0 | 1 | 1 | 1
"
xlabel = "n-1"
]
}
```
好處是 Bitwise Operation 在硬體實做上比較有效率。
同樣在 [linux/tools/include/linux/log2.h](https://github.com/torvalds/linux/blob/fcadab740480e0e0e9fa9bd272acd409884d431a/tools/include/linux/log2.h) 內,可以找到 `__roundup_pow_of_two` 以及 `__rounddown_pow_of_two` 。
`UL` 代表 unsigned long int ( C99 標準, 6.4.4.1 )。
`fls_long` 代表的是找到第一個為一的 bit 在哪邊,
- __roundup__pow_of_two
- 假設我們預期此函數的回傳值為 $2^N$ ,那他的參數 `n` 數值範圍應該要為 $2^{N-1} + 1$ ~ $2^N$
- fls_long(n) 中,若 n 的範圍為 $2^{N-1}$ ~ $2^{N}-1$ ,會回傳 N
- 因此 `fls_long(n - 1)`
- __rounddown__pow_of_two
- 假設我們預期此函數的回傳值為 $2^{N-1}$ ,那他的參數 `n` 數值範圍應該要為 $2^{N-2}$ ~ $2^{N-1} - 1$
- fls_long(n) 中,若 n 的範圍為 $2^{N-1}$ ~ $2^{N}-1$ ,會回傳 N
- 因此 `fls_long(n) - 1`
```cpp
/*
* round up to nearest power of two
*/
static inline __attribute__((const))
unsigned long __roundup_pow_of_two(unsigned long n)
{
return 1UL << fls_long(n - 1);
}
```
```cpp
/*
* round down to nearest power of two
*/
static inline __attribute__((const))
unsigned long __rounddown_pow_of_two(unsigned long n)
{
return 1UL << (fls_long(n) - 1);
}
```
在 [linux/tools/include/linux/bitops.h](https://github.com/torvalds/linux/blob/fcadab740480e0e0e9fa9bd272acd409884d431a/tools/include/linux/bitops.h) 中可以找到 `fls_long`。
在這邊將 `unsigned long l` 分成 32 位元與 64 位元的兩種狀況去處理
```cpp
static inline unsigned fls_long(unsigned long l)
{
if (sizeof(l) == 4)
return fls(l);
return fls64(l);
}
```
在 [linux/include/asm-generic/bitops/fls.h](https://github.com/torvalds/linux/blob/master/include/asm-generic/bitops/fls.h) 內可以找到 `fls` 的程式原始碼。
`fls` 的函式目標為找到 `x` 內第一個唯一的 bit。
```cpp
/**
* fls - find last (most-significant) bit set
* @x: the word to search
*
* This is defined the same way as ffs.
* Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32.
*/
static __always_inline int fls(unsigned int x)
{
int r = 32;
if (!x)
return 0;
if (!(x & 0xffff0000u)) {
x <<= 16;
r -= 16;
}
if (!(x & 0xff000000u)) {
x <<= 8;
r -= 8;
}
if (!(x & 0xf0000000u)) {
x <<= 4;
r -= 4;
}
if (!(x & 0xc0000000u)) {
x <<= 2;
r -= 2;
}
if (!(x & 0x80000000u)) {
x <<= 1;
r -= 1;
}
return r;
}
```
以上函式透過 bit mask 的方式一次次過濾是否前方有 1,進而算出第一個為 1 的 bit 在哪裡。
以下用 8 個 bit 來示範函式流程
```
x: 0000 0110
mask: 1111 0000
r: 8 - 4 = 4
```
```
x: 0110 0000
mask: 1100 0000
r: 4
```
```
x: 0110 0000
mask: 1000 0000
r: 4 - 1 = 3
```
在這裡我看到了 Linux kernel 的優雅操作,以上的函式都是用比較貼近硬體語言的方式操作,例如位移或者是 `&` `|` 運算子,比起減法更有效率。
:::info
**Reference**
- [\_\_attribute__((const)) function attribute](https://www.keil.com/support/man/docs/armcc/armcc_chr1359124974244.htm)
:::
## 測驗三
### 解釋程式碼運作原理
```cpp
#include <stdint.h>
void bitcpy(void *_dest, /* Address of the buffer to write to */
size_t _write, /* Bit offset to start writing to */
const void *_src, /* Address of the buffer to read from */
size_t _read, /* Bit offset to start reading from */
size_t count)
{
size_t read_lhs = _read & 7;
size_t read_rhs = 8 - read_lhs;
const uint8_t *source = (const uint8_t *) _src + (_read / 8);
size_t write_lhs = _write & 7;
size_t write_rhs = 8 - write_lhs;
uint8_t *dest = (uint8_t *) _dest + (_write / 8);
static const uint8_t read_mask[] = {
0x00, /* == 0 00000000b */
0x80, /* == 1 10000000b */
0xC0, /* == 2 11000000b */
0xE0, /* == 3 11100000b */
0xF0, /* == 4 11110000b */
0xF8, /* == 5 11111000b */
0xFC, /* == 6 11111100b */
0xFE, /* == 7 11111110b */
0xFF /* == 8 11111111b */
};
static const uint8_t write_mask[] = {
0xFF, /* == 0 11111111b */
0x7F, /* == 1 01111111b */
0x3F, /* == 2 00111111b */
0x1F, /* == 3 00011111b */
0x0F, /* == 4 00001111b */
0x07, /* == 5 00000111b */
0x03, /* == 6 00000011b */
0x01, /* == 7 00000001b */
0x00 /* == 8 00000000b */
};
while (count > 0) {
uint8_t data = *source++;
size_t bitsize = (count > 8) ? 8 : count;
if (read_lhs > 0) {
data <<= read_lhs;
if (bitsize > read_rhs)
data |= (*source >> read_rhs);
}
if (bitsize < 8)
data &= read_mask[bitsize];
uint8_t original = *dest;
uint8_t mask = read_mask[write_lhs];
if (bitsize > write_rhs) {
/* Cross multiple bytes */
*dest++ = (original & mask) | (data >> write_lhs);
original = *dest & write_mask[bitsize - write_rhs];
*dest = original | (data << write_rhs);
} else {
// Since write_lhs + bitsize is never >= 8, no out-of-bound access.
mask |= write_mask[write_lhs + bitsize];
*dest++ = (original & mask) | (data >> write_lhs);
}
count -= bitsize;
}
}
```
現在假設
> _read = 2
> _write = 3
> count = 9
首先會先將 `_read` 、 `_write` 分作 lhs 與 rhs
```cpp
size_t read_lhs = _read & 7; // 2
size_t read_rhs = 8 - read_lhs; // 6
```
找到由哪個 byte 開始讀取 bit 數值,示意圖中粉色方格為 source 。
```
const uint8_t *source = (const uint8_t *) _src + (_read / 8);
```
```graphviz
digraph {
graph [rankdir = TB]
node [shape=plaintext, fontsize=18];
"" -> "bit" -> "byte" [fontcolor=black, color=white];
node [shape=none, fontcolor=black, fontsize=14, width=4.75, fixedsize=true];
byte0 [
shape=none
label=<
<table border="0" cellborder="1" cellspacing="0" cellpadding="17">
<tr>
<td port="port0" bgcolor="#ffcce5"><font color="red">7</font></td>
<td port="port1" bgcolor="#ffcce5"><font color="red">6</font></td>
<td port="port2" bgcolor="#ffcce5"><font color="red">5</font></td>
<td port="port3" bgcolor="#ffcce5"><font color="red">4</font></td>
<td port="port4" bgcolor="#ffcce5"><font color="red">3</font></td>
<td port="port5" bgcolor="#ffcce5"><font color="red">2</font></td>
<td port="port6" bgcolor="#ffcce5"><font color="red">1</font></td>
<td port="port7" bgcolor="#ffcce5"><font color="red">0</font></td>
</tr>
</table>>
];
byte1 [
shape=none
label=<
<table border="0" cellborder="1" cellspacing="0" cellpadding="17">
<tr>
<td port="port0" bgcolor="#f5e342">7</td>
<td port="port1" bgcolor="#f5e342">6</td>
<td port="port2" bgcolor="#f5e342">5</td>
<td port="port3" bgcolor="#f5e342">4</td>
<td port="port4" bgcolor="#f5e342">3</td>
<td port="port5" bgcolor="#f5e342">2</td>
<td port="port6" bgcolor="#f5e342">1</td>
<td port="port7" bgcolor="#f5e342">0</td>
</tr>
</table>>
];
t0 [shape=record label="<head> 0 | 1 | <s>2 | 3 | 4 | 5 | 6 | 7", color="white", fontcolor="black"];
t1 [shape=record label="<head> 8 | 9 | <s>10 | 11 | 12 | 13 | 14 | 15", color="white", fontcolor="black"];
"bitsize" -> byte0:port2 [color=black];
"bitsize" -> byte1:port2 [color=black];
"bitsize" -> byte0:port6 [color=none];
t0:head -> byte0:port0 [color=none];
t1:head -> byte1:port0 [color=none];
{ rank=same; "bit"; byte0; byte1; }
{ rank=same; "byte"; t0; t1; }
}
```
將 `source` 中要讀取的片段複製到 `data` 內。
```cpp
uint8_t data = *source++;
size_t bitsize = (count > 8) ? 8 : count;
if (read_lhs > 0) {
data <<= read_lhs;
if (bitsize > read_rhs)
data |= (*source >> read_rhs);
}
```
```graphviz
digraph {
graph [rankdir = TB]
node [shape=plaintext, fontsize=18];
node [shape=none, fontcolor=black, fontsize=14, width=4.75, fixedsize=true];
byte0 [
shape=none
label=<<table border="0" cellborder="1" cellspacing="0" cellpadding="12">
<tr>
<td bgcolor="#ffcce5"><font color="red">5</font></td>
<td bgcolor="#ffcce5"><font color="red">4</font></td>
<td bgcolor="#ffcce5"><font color="red">3</font></td>
<td bgcolor="#ffcce5"><font color="red">2</font></td>
<td bgcolor="#ffcce5"><font color="red">1</font></td>
<td bgcolor="#ffcce5"><font color="red">0</font></td>
<td bgcolor="#f5e342">7</td>
<td bgcolor="#f5e342">6</td>
</tr>
</table>>
];
}
```
再將 data 的內容移動至 `dest`
```cpp
uint8_t original = *dest;
uint8_t mask = read_mask[write_lhs];
if (bitsize > write_rhs) {
/* Cross multiple bytes */
*dest++ = (original & mask) | (data >> write_lhs);
original = *dest & write_mask[bitsize - write_rhs];
*dest = original | (data << write_rhs);
} else {
// Since write_lhs + bitsize is never >= 8, no out-of-bound access.
mask |= write_mask[write_lhs + bitsize];
*dest++ = (original & mask) | (data >> write_lhs);
}
```
```graphviz
digraph {
graph [rankdir = TB]
node [shape=plaintext, fontsize=18];
"" -> "bit" -> "byte" [fontcolor=black, color=white];
node [shape=none, fontcolor=black, fontsize=14, width=4.75, fixedsize=true];
byte0 [
shape=none
label=<
<table border="0" cellborder="1" cellspacing="0" cellpadding="17">
<tr>
<td port="port0">7</td>
<td port="port1">6</td>
<td port="port2">5</td>
<td port="port3" bgcolor="#ffcce5"><font color="red">5</font></td>
<td port="port4" bgcolor="#ffcce5"><font color="red">4</font></td>
<td port="port5" bgcolor="#ffcce5"><font color="red">3</font></td>
<td port="port6" bgcolor="#ffcce5"><font color="red">2</font></td>
<td port="port7" bgcolor="#ffcce5"><font color="red">1</font></td>
</tr>
</table>>
];
byte1 [
shape=none
label=<
<table border="0" cellborder="1" cellspacing="0" cellpadding="17">
<tr>
<td port="port0" bgcolor="#ffcce5"><font color="red">0</font></td>
<td port="port1" bgcolor="#f5e342">7</td>
<td port="port2" bgcolor="#f5e342">6</td>
<td port="port3">4</td>
<td port="port4">3</td>
<td port="port5">2</td>
<td port="port6">1</td>
<td port="port7">0</td>
</tr>
</table>>
];
t0 [shape=record label="<head> 0 | 1 | <s>2 | 3 | 4 | 5 | 6 | 7", color="white", fontcolor="black"];
t1 [shape=record label="<head> 8 | 9 | <s>10 | 11 | 12 | 13 | 14 | 15", color="white", fontcolor="black"];
"bitsize" -> byte0:port3 [color=black];
"bitsize" -> byte1:port3 [color=black];
"bitsize" -> byte0:port6 [color=none];
t0:head -> byte0:port0 [color=none];
t1:head -> byte1:port0 [color=none];
{ rank=same; "bit"; byte0; byte1; }
{ rank=same; "byte"; t0; t1; }
}
```
### 嘗試重寫為同樣功能但效率更高的實作
目前的程式碼我認為有改進空間的部份有
- 使用 bitwise 的位移取代除法。
```diff=
size_t read_lhs = _read & 7;
size_t read_rhs = 8 - read_lhs;
-const uint8_t *source = (const uint8_t *) _src + (_read / 8);
+const uint8_t *source = (const uint8_t *) _src + (_read >> 3);
size_t write_lhs = _write & 7;
size_t write_rhs = 8 - write_lhs;
-uint8_t *dest = (uint8_t *) _dest + (_write / 8);
+uint8_t *dest = (uint8_t *) _dest + (_write >> 3);
```
- 若需要做跨 byte 儲存資料,每次都會多出兩次的位移與 mask 的,若 count 一大,運算成本也會增加。
```diff=
{
size_t read_lhs = _read & 7;
size_t read_rhs = 8 - read_lhs;
const uint8_t *source = (const uint8_t *) _src + (_read >> 3);
size_t write_lhs = _write & 7;
size_t write_rhs = 8 - write_lhs;
uint8_t *dest = (uint8_t *) _dest + (_write >> 3);
static const uint8_t read_mask[] = {
0x00, /* == 0 00000000b */
0x80, /* == 1 10000000b */
0xC0, /* == 2 11000000b */
0xE0, /* == 3 11100000b */
0xF0, /* == 4 11110000b */
0xF8, /* == 5 11111000b */
0xFC, /* == 6 11111100b */
0xFE, /* == 7 11111110b */
0xFF /* == 8 11111111b */
};
static const uint8_t write_mask[] = {
0xFF, /* == 0 11111111b */
0x7F, /* == 1 01111111b */
0x3F, /* == 2 00111111b */
0x1F, /* == 3 00011111b */
0x0F, /* == 4 00001111b */
0x07, /* == 5 00000111b */
0x03, /* == 6 00000011b */
0x01, /* == 7 00000001b */
0x00 /* == 8 00000000b */
};
+ if (count & 7UL) {
+ uint8_t data = *source++;
+ size_t bitsize = count & 7UL;
+ data <<= read_lhs;
+ data &= read_mask[bitsize];
+ uint8_t original = *dest;
+ uint8_t mask = read_mask[write_lhs];
+ if (bitsize > write_rhs) {
+ *dest++ = (original & mask) | (data >> write_lhs);
+ original = *dest & write_mask[bitsize - write_rhs];
+ *dest = original | (data << write_rhs);
+ } else {
+ mask |= write_mask[write_lhs + bitsize];
+ *dest++ = (original & mask) | (data >> write_lhs);
+ }
+ count -= bitsize;
+ }
while (count > 0) {
uint8_t data = *source++;
size_t bitsize = (count > 8) ? 8 : count;
- if (read_lhs > 0) {
- data <<= read_lhs;
- if (bitsize > read_rhs)
- data |= (*source >> read_rhs);
- }
-
- if (bitsize < 8)
- data &= read_mask[bitsize];
data &= read_mask[bitsize];
uint8_t original = *dest;
uint8_t mask = read_mask[write_lhs];
if (bitsize > write_rhs) {
/* Cross multiple bytes */
*dest++ = (original & mask) | (data >> write_lhs);
original = *dest & write_mask[bitsize - write_rhs];
*dest = original | (data << write_rhs);
} else {
// Since write_lhs + bitsize is never >= 8, no out-of-bound access.
mask |= write_mask[write_lhs + bitsize];
*dest++ = (original & mask) | (data >> write_lhs);
}
count -= bitsize;
}
}
```
在進入 while 迴圈之前,我把 src 對齊,確認在讀資料時,一定會從某個 byte 的第一個 bit 開始讀,節省 mask 的次數。
由圖片中可以看到,有將 src 對齊的執行時間比起原版還要更短。

## 測驗四
### 解釋程式碼運作原理
在這裡由測試程式 `test.c` 流程進行介紹
**test.c**
```cpp
CSTR_BUFFER(a);
```
**cstr.h**
```cpp=
#pragma once
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
enum {
CSTR_PERMANENT = 1,
CSTR_INTERNING = 2,
CSTR_ONSTACK = 4,
};
#define CSTR_INTERNING_SIZE (32)
#define CSTR_STACK_SIZE (128)
typedef struct __cstr_data {
char *cstr;
uint32_t hash_size;
uint16_t type;
uint16_t ref;
} * cstring;
typedef struct __cstr_buffer {
cstring str;
} cstr_buffer[1];
#define CSTR_S(s) ((s)->str)
#define CSTR_BUFFER(var) \
char var##_cstring[CSTR_STACK_SIZE] = {0}; \
struct __cstr_data var##_cstr_data = {var##_cstring, 0, CSTR_ONSTACK, 0}; \
cstr_buffer var; \
var->str = &var##_cstr_data;
```
第 1 行的 `#pragma once` 在 C 語言內並不是一個標準 (non-standard) ,但是有廣泛的被編譯器支援。目標與常見的 `#ifndef...` 類似,希望只被 include 一次。
第 29 行的 [## operator](https://www.ibm.com/docs/en/zos/2.3.0?topic=mdd-operator) 被拿來連接兩個 token ,用以生成變數名稱
到目前為止可以看到我們生成了一個 `a` 的 `str_buffer` 與其相關的變數。
----
**test.c**
```cpp
cstr_cat(a, "Hello ");
```
**cstr.c**
```cpp=
cstring cstr_cat(cstr_buffer sb, const char *str)
{
cstring s = sb->str;
if (s->type == CSTR_ONSTACK) {
int i = s->hash_size;
while (i < CSTR_STACK_SIZE - 1) {
s->cstr[i] = *str;
if (*str == 0)
return s;
++s->hash_size;
++str;
++i;
}
s->cstr[i] = 0;
}
cstring tmp = s;
sb->str = cstr_cat2(tmp->cstr, str);
cstr_release(tmp);
return sb->str;
}
```
現在有一個叫做 `a` 的 `cstr_buffer` 與 `"Hello "` 這個字串被傳入 `cstr_cat` 中。
在這裡 `s` 內的資料經由上一步初始化為
```cpp
{
// char a_cstring[CSTR_STACK_SIZE] = {0}
char* cstr = a_cstring;
uint32_t hash_size = 0;
uint16_t type = CSTR_ONSTACK;
uint16_t ref = 0;
}
```
如果字串除存的資料還在 stack 上面的話會有以下步驟:
- 在第 5 行中的 `i` 表示現在 `s` 的 `cstr_buffer` 內的字串長度
- 在以下兩種條件都符合的情況下將字串依序放入 `cstr` 尾端,與前字串連接起。
- `cstr` 內的字串長度小於 `CSTR_STACK_SIZE - 1`
- `str` 還沒走到字串尾端 ( '\0' )
- 如果 `cstr` 的大小足以放下 `str`,回傳 `s`
- 如果不夠,將 `\0` 放入 `cstr`末端,進入與 `type != CSTR_ONSTACK` 相同的處理步驟
當 `type != CSTR_ONSTACK` 或者字串長度已經超過 `CSTR_STACK_SIZE` 時,呼叫 `cstr_cat2` ,傳入 `cstr` 與 `str`
**cstr_cat2**
```cpp=
static cstring cstr_cat2(const char *a, const char *b)
{
size_t sa = strlen(a), sb = strlen(b);
if (sa + sb < CSTR_INTERNING_SIZE) {
char tmp[CSTR_INTERNING_SIZE];
memcpy(tmp, a, sa);
memcpy(tmp + sa, b, sb);
tmp[sa + sb] = 0;
return cstr_interning(tmp, sa + sb, hash_blob(tmp, sa + sb));
}
cstring p = xalloc(sizeof(struct __cstr_data) + sa + sb + 1);
if (!p)
return NULL;
char *ptr = (char *) (p + 1);
p->cstr = ptr;
p->type = 0;
p->ref = 1;
memcpy(ptr, a, sa);
memcpy(ptr + sa, b, sb);
ptr[sa + sb] = 0;
p->hash_size = 0;
return p;
}
```
若字串的長度大於 `CSTR_STACK_SIZE` 但小於 `CSTR_INTERNING_SIZE` 時,會進入 `cstr__interning` 的函式。
(但在測驗程式碼中 `CSTR_STACK_SIZE` 為 128, `CSTR_INTERNING_SIZE` 為 32,這樣好像就永遠執行不到這段程式碼 @@)
傳入 `cstr__interning` 的參數為:
```
tmp : 連接起的字串
sa+sb : 字串長度
hash_blob(tmp, sa + sb) : 此字串的雜湊值
```
**hash_blob**
```cpp=
static inline uint32_t hash_blob(const char *buffer, size_t len)
{
const uint8_t *ptr = (const uint8_t *) buffer;
size_t h = len;
size_t step = (len >> 5) + 1;
for (size_t i = len; i >= step; i -= step)
h = h ^ ((h << 5) + (h >> 2) + ptr[i - 1]);
return h == 0 ? 1 : h;
}
```
**cstr_interning**
```cpp=
static cstring cstr_interning(const char *cstr, size_t sz, uint32_t hash)
{
cstring ret;
CSTR_LOCK();
ret = interning(&__cstr_ctx, cstr, sz, hash);
if (!ret) {
expand(&__cstr_ctx);
ret = interning(&__cstr_ctx, cstr, sz, hash);
}
++__cstr_ctx.total;
CSTR_UNLOCK();
return ret;
}
```
在 `cstr_interning` 中第 4 行 `CSTR_LOCK` 與第 11 行的 `CSTR_UNLOCK` 使得在 interning 的過程中可以保證執行正確。
```cpp=
#define CSTR_LOCK() \
({ \
while (__sync_lock_test_and_set(&(__cstr_ctx.lock), 1)) { \
} \
})
#define CSTR_UNLOCK() ({ __sync_lock_release(&(__cstr_ctx.lock)); })
```
在這裡使用的是 gcc 提供的 [Built-in functions for atomic memory access](https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html) 來做 LOCK。
```cpp
struct __cstr_interning {
int lock;
int index;
unsigned size;
unsigned total;
struct __cstr_node **hash;
struct __cstr_pool *pool;
};
static struct __cstr_interning __cstr_ctx;
```
**interning**
```cpp=
static cstring interning(struct __cstr_interning *si,
const char *cstr,
size_t sz,
uint32_t hash)
{
if (!si->hash)
return NULL;
int index = (int) (hash & (si->size - 1));
struct __cstr_node *n = si->hash[index];
while (n) {
if (n->str.hash_size == hash) {
if (!strcmp(n->str.cstr, cstr))
return &n->str;
}
n = n->next;
}
// 80% (4/5) threshold
if (si->total * 5 >= si->size * 4)
return NULL;
if (!si->pool) {
si->pool = xalloc(sizeof(struct __cstr_pool));
si->index = 0;
}
n = &si->pool->node[si->index++];
memcpy(n->buffer, cstr, sz);
n->buffer[sz] = 0;
cstring cs = &n->str;
cs->cstr = n->buffer;
cs->hash_size = hash;
cs->type = CSTR_INTERNING;
cs->ref = 0;
n->next = si->hash[index];
si->hash[index] = n;
return cs;
}
```
如果 `si` 內的 `hash` 為空,代表第一次使用這個空間,回傳 `NULL`,使用 `expand` 創造空間後再次進入。
第 9 行的 `index` 為 `hash` 與 `si->size` 取餘數的結果,接著進入 while 迴圈,目標為在那層 hash 內尋找是否有相同的字串,若有找的則回傳已經儲存的字串,若沒找到就在 pool 內創立新的 node
在第 19 行的地方,確認裡頭物件是否超過雜湊表的 4/5 ,若超過就回傳 `NULL` ,再使用 `expand` 擴增空間重新進入。
第 35 行,將新的 node 連接至 `si->hash[index]` 的開頭
**expand**
```cpp=
#define HASH_START_SIZE 16 /* must be power of 2 */
static void *xalloc(size_t n)
{
void *m = malloc(n);
if (!m)
exit(-1);
return m;
}
static void expand(struct __cstr_interning *si)
{
unsigned new_size = si->size * 2;
if (new_size < HASH_START_SIZE)
new_size = HASH_START_SIZE;
struct __cstr_node **new_hash =
xalloc(sizeof(struct __cstr_node *) * new_size);
memset(new_hash, 0, sizeof(struct __cstr_node *) * new_size);
for (unsigned i = 0; i < si->size; ++i) {
struct __cstr_node *node = si->hash[i];
while (node) {
struct __cstr_node *tmp = node->next;
insert_node(new_hash, new_size, node);
node = tmp;
}
}
free(si->hash);
si->hash = new_hash;
si->size = new_size;
}
```
在 `expand` 宣告新的 `struct __cstr_node` 的指標的指標,並給予 `struct __cstr_node *` 原兩倍大小的空間 ( 若是內部還不曾有過東西,大小則為 `HASH_START_SIZE` )
:::info
參考資料
- [The ## operator](https://www.ibm.com/docs/en/zos/2.3.0?topic=mdd-operator)
- [C 語言:typedef 的用法](https://magicjackting.pixnet.net/blog/post/65865174)
:::