# 2021q1 Homework2 (quiz2)
contributed by < [Julian-Chu](https://github.com/Julian-Chu) >
###### tags: `linux2021`
> [GitHub](https://github.com/Julian-Chu/linux2021-quiz2)
## Quiz 1: a simplified implementation of include/linux/list.h
#### struct and initialization
```cpp
struct list_head {
struct list_head *prev, *next;
};
/**
* LIST_HEAD - Declare list head and initialize it
* @head: name of the new object
*/
#define LIST_HEAD(head) struct list_head head = {&(head), &(head)}
/**
* INIT_LIST_HEAD() - Initialize empty list head
* @head: pointer to list head
*/
static inline void INIT_LIST_HEAD(struct list_head *head)
{
head->next = head; head->prev = head;
}
```
Starting from the struct, we can see the list is implemented as a doubly-linked list. During initialization, both prev and next point back to head itself (not to NULL). head is a dummy node that can be used to check the state of the list: if only head exists, the list is empty.
``` graphviz
digraph structs {
rankdir=LR
node[shape=record]
head [label="{<prev>prev|<head>head|<next>next}"]
head:next:e -> head:prev:w[arrowhead=none]
}
```
> Note that the code only uses `INIT_LIST_HEAD()`; `LIST_HEAD()` appears to be the macro counterpart of the same operation, but it is never used.
todo: test the difference between the macro and the function call.
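A minimal sketch of the difference, assuming the declarations above (the `demo_dynamic_init` function is hypothetical): `LIST_HEAD()` declares and initializes a new head in one statement, while `INIT_LIST_HEAD()` works on a head that already exists, e.g. one obtained from `malloc()`.
```c
#include <stdlib.h>

LIST_HEAD(static_list); /* declares `static_list` and initializes it in one statement */

static void demo_dynamic_init(void)
{
    /* LIST_HEAD() cannot be used here: the storage already exists,
     * so the function-style initializer is required. */
    struct list_head *runtime_head = malloc(sizeof(*runtime_head));
    if (!runtime_head)
        return;
    INIT_LIST_HEAD(runtime_head);
    /* ... use the list ... */
    free(runtime_head);
}
```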
#### list_add_tail
Insert a node at the tail of the list. Since this is a doubly-linked list with head and tail connected, `head->prev` is the tail. Link `tail->next` and `head->prev` to `new_tail`, then point `new_tail`'s `next` to `head` and its `prev` to `tail`, completing the insertion at the tail.
```cpp
static inline void list_add_tail(struct list_head *node, struct list_head *head)
{
struct list_head *prev = head->prev;
prev->next = node;
node->next = head;
node->prev = prev;
head->prev = node;
}
```
``` graphviz
digraph structs {
rankdir=LR
node[shape=record]
head [label="{<prev>prev|<listnode>head|<next>next}"]
tail [label="{<prev>prev|<listnode>tail|<next>next}"]
head:next:s -> tail:prev:s
tail:next:n -> head:prev:n
}
```
``` graphviz
digraph structs {
rankdir=LR
node[shape=record]
head [label="{<prev>prev|<listnode>head|<next>next}"]
tail [label="{<prev>prev|<listnode>tail|<next>next}"]
newtail [label="{<prev>prev|<listnode>new_tail|<next>next}"]
head:next:s -> tail:prev:s
tail:next:s -> newtail:prev:s
newtail:next:n -> head:prev:n
}
```
#### list_del
```cpp
static inline void list_del(struct list_head *node)
{
struct list_head *next = node->next, *prev = node->prev;
next->prev = prev; prev->next = next;
}
```
Remove a specific node. Because this is a doubly-linked list, there is no need to start from head to find the node's predecessor; we can obtain the prev and next nodes directly and link them to each other.
``` graphviz
digraph structs {
rankdir=LR
node[shape=record]
head [label="{<prev>prev|<listnode>head|<next>next}"]
tail [label="{<prev>prev|<listnode>node|<next>next}"]
newtail [label="{<prev>prev|<listnode>tail|<next>next}"]
head:next:s -> tail:prev:s
tail:next:s -> newtail:prev:s
newtail:next:n -> head:prev:n
}
```
``` graphviz
digraph structs {
rankdir=LR
node[shape=record]
head [label="{<prev>prev|<listnode>head|<next>next}"]
tail [label="{<prev>prev|<listnode>tail|<next>next}"]
head:next:s -> tail:prev:s
tail:next:n -> head:prev:n
}
```
#### list_empty and list_is_singular
```cpp
static inline int list_empty(const struct list_head *head)
{
return (head->next == head);
}
static inline int list_is_singular(const struct list_head *head)
{
return (!list_empty(head) && head->prev == head->next);
}
```
`list_empty` and `list_is_singular` both rely on the dummy head for a quick check, avoiding the need to handle a NULL head.
If only the dummy head exists, the list is empty. If both sides of the dummy head point to the same node, the list contains exactly one node.
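A small usage sketch of both predicates, assuming the declarations above (the `demo_empty_and_singular` function is hypothetical):
```c
#include <assert.h>

static void demo_empty_and_singular(void)
{
    LIST_HEAD(q);             /* only the dummy head exists */
    assert(list_empty(&q));

    struct list_head node;
    list_add_tail(&node, &q); /* now q.prev == q.next == &node */
    assert(list_is_singular(&q));
    assert(!list_empty(&q));
}
```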
#### list_splice_tail
```cpp
static inline void list_splice_tail(struct list_head *list,
struct list_head *head)
{
struct list_head *head_last = head->prev;
struct list_head *list_first = list->next, *list_last = list->prev;
if (list_empty(list))
return;
head->prev = list_last;
list_last->next = head;
list_first->prev = head_last;
head_last->next = list_first;
}
```
Splice `list` onto the tail of `head`. Note that `list` is also a dummy head, so the spliced nodes do not include `list` itself: the first spliced node is `list->next` and the last is `list->prev`.
#### list_cut_position
```cpp
static inline void list_cut_position(struct list_head *head_to,
struct list_head *head_from,
struct list_head *node)
{
struct list_head *head_from_first = head_from->next;
if (list_empty(head_from))
return;
if (head_from == node) {
INIT_LIST_HEAD(head_to);
return;
}
head_from->next = node->next;
head_from->next->prev = head_from;
head_to->prev = node;
node->next = head_to;
head_to->next = head_from_first;
head_to->next->prev = head_to;
}
```
`list_cut_position` cuts a sub-list out of the `head_from` list (the sub-list ends at `node`), and `head_to` becomes the dummy head of the new list.
``` graphviz
digraph structs {
rankdir=LR
node[shape=record]
head [label="{<prev>prev|<listnode>head_from|<next>next}"]
first [label="{<prev>prev|<listnode>head_from_first|<next>next}"]
cutnode [label="{<prev>prev|<listnode>node|<next>next}"]
tail [label="{<prev>prev|<listnode>tail|<next>next}"]
head:next:s -> first:prev:s
first:next:s -> cutnode:prev:s
cutnode:next:s -> tail:prev:s
tail:next:n -> head:prev:n
}
```
Cut the list apart:
``` graphviz
digraph structs {
rankdir=LR
node[shape=record]
head [label="{<prev>prev|<listnode>head_from|<next>next}"]
first [label="{<prev>prev|<listnode>head_from_first|<next>next}"]
cutnode [label="{<prev>prev|<listnode>node|<next>next}"]
tail [label="{<prev>prev|<listnode>tail|<next>next}"]
head:next:s -> tail:prev:s
first:next:n -> cutnode:prev:n
tail:next:n -> head:prev:n
}
```
Link with head_to (the dummy head) to form the new list:
``` graphviz
digraph structs {
rankdir=LR
node[shape=record]
head_to [label="{<prev>prev|<listnode>head_to|<next>next}"]
first [label="{<prev>prev|<listnode>head_from_first|<next>next}"]
cutnode [label="{<prev>prev|<listnode>node|<next>next}"]
head_to:next:s->first:prev:s
first:next:s -> cutnode:prev:s
cutnode:next:n->head_to:prev:n
}
```
#### list_entry, list_first_entry
```cpp
#define list_entry(node, type, member) container_of(node, type, member)
#define list_first_entry(head, type, member) \
list_entry((head)->next, type, member)
```
Both macros recover the enclosing structure from a node member. A structure's members and padding form a contiguous block of memory, so given the address of one member we can work back to the start of the structure, or to the addresses of its other members. Here the address of the entry is computed from the node's embedded list_head member (for example, the list_ele_t implemented below).
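This address arithmetic is what `container_of` does; a common definition looks like the sketch below (the quiz code may use a GCC-extension variant with extra type checking):
```c
#include <stddef.h>

/* Given a pointer to `member` inside an instance of `type`, subtract the
 * member's offset to recover the address of the enclosing structure. */
#define container_of(ptr, type, member) \
    ((type *) ((char *) (ptr) - offsetof(type, member)))
```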
#### list_for_each
```cpp
#define list_for_each(node, head) \
for (node = (head)->next; node != (head); node = node->next)
```
This macro traverses the list, packaged in the form of a [foreach](https://en.wikipedia.org/wiki/Foreach_loop) loop.
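A minimal usage sketch (the `list_length` helper is hypothetical) that walks a list and counts its nodes:
```c
static int list_length(struct list_head *head)
{
    struct list_head *node;
    int len = 0;
    list_for_each (node, head)
        len++;
    return len;
}
```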
### merge sort
The implementation in this quiz is quite similar to lab0. queue_t is designed to track the head, the tail, and whether the list is empty, but for a doubly-linked list the dummy head already provides the same information, so queue_t can be removed.
### Removing queue_t
- remove the queue_t-related functions
- remove the next pointer from list_ele_t
```c
typedef struct __element {
char *value;
struct list_head list;
} list_ele_t;
```
The resulting changes are small. The only notable one is freeing: the list_ele_t has to be recovered with list_entry before it can be freed.
```c
static void list_free(list_ele_t *head)
{
struct list_head *cur;
list_ele_t *element;
list_for_each(cur, &head->list){
element = list_entry(cur, list_ele_t, list);
free(element->value);
free(element);
element = NULL;
}
free(head->value);
free(head);
head = NULL;
}
```
[實作程式碼](https://github.com/Julian-Chu/linux2021-quiz2/blob/main/question1/mergesort.c)
Checking with valgrind still shows an invalid read; debugging in progress.
```c
==46691== Invalid read of size 8
==46691== at 0x10985D: list_free (mergesort.c:85)
==46691== by 0x109A84: main (mergesort.c:142)
==46691== Address 0x56bb710 is 16 bytes inside a block of size 24 free'd
==46691== at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==46691== by 0x109850: list_free (mergesort.c:88)
==46691== by 0x109A84: main (mergesort.c:142)
==46691== Block was alloc'd at
==46691== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==46691== by 0x1098B6: list_insert (mergesort.c:100)
==46691== by 0x109A05: main (mergesort.c:136)
==46691==
==46691==
==46691== HEAP SUMMARY:
==46691== in use at exit: 0 bytes in 0 blocks
==46691== total heap usage: 187,658 allocs, 187,658 frees, 4,530,484 bytes allocated
==46691==
==46691== All heap blocks were freed -- no leaks are possible
==46691==
==46691== For lists of detected and suppressed errors, rerun with: -s
==46691== ERROR SUMMARY: 93827 errors from 1 contexts (suppressed: 0 from 0)
```
Found another macro, list_for_each_safe, which can be used when removing elements during traversal.
[GitHub: include/linux/list.h](https://github.com/torvalds/linux/blob/master/include/linux/list.h#L597)
[stackoverflow: Why do we need list_for_each_safe() in for deleting nodes in kernel linked list?](https://stackoverflow.com/questions/9207850/why-do-we-need-list-for-each-safe-in-for-deleting-nodes-in-kernel-linked-list)
```cpp
#define list_for_each_safe(pos, n, head) \
for (pos = (head)->next, n = pos->next; pos != (head); \
pos = n, n = pos->next)
static void list_free(list_ele_t *head)
{
struct list_head *cur, *tmp;
list_ele_t *element;
list_for_each_safe(cur, tmp, &head->list){
element = list_entry(cur, list_ele_t, list);
list_del(cur);
free(element->value);
free(element);
element = NULL;
}
free(head->value);
free(head);
head = NULL;
}
```
valgrind no longer reports any errors.
:::warning
todo:
- differences among the various list_for_each macros and their use cases
- EXPORT_SYMBOL
- Linux kernel: merge sort with a different split ratio, optimized for the cache
- optimizing merge sort: cache-aware
:::
---
## Quiz 2: take a 16-bit unsigned integer $N$ and return the power of 2 less than or equal to $N$
Reasoning: the bit patterns of $N$ are
```
0000000000000000
0000000000000001
000000000000001X
00000000000001XX
0000000000001XXX
000000000001XXXX
...
1XXXXXXXXXXXXXXX
```
Note that the `|` operations in the process cannot set the X bits above to 0:
```cpp
N |= N >> 1;
N |= N >> 2;
N |= N >> 4;
N |= N >> 8;
```
The only way to turn the X bits into zeros is to use the carry:
```cpp
return (N + 1) >> 1;
```
So we need to make sure all the X bits in the patterns above become 1, then add 1 to trigger the carry and shift the result right by one bit to obtain the answer:
```cpp
0001XX -> 000111 -> +1 -> 001000 -> >>1 -> 000100
```
Where do the shifts by 1, 2, 4, and 8 bits come from? Consider the extreme case 2^15^ below: the number of `1` bits doubles after each step, so shifting right by 1, 2, 4, and 8 bits in turn is exactly enough to set all 16 bits to `1`.
```cpp
1000000000000000 | 0100000000000000 = 1100000000000000
1100000000000000 | 0011000000000000 = 1111000000000000
1111000000000000 | 0000111100000000 = 1111111100000000
1111111100000000 | 0000000011111111 = 1111111111111111
```
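Putting the steps together, a minimal sketch of the whole function (the name `largest_pow2_le` is mine, not from the quiz):
```c
#include <stdint.h>

/* Return the largest power of 2 that is <= N (0 when N is 0). */
static uint16_t largest_pow2_le(uint16_t N)
{
    N |= N >> 1;
    N |= N >> 2;
    N |= N >> 4;
    N |= N >> 8; /* now every bit below the original MSB of N is 1 */
    /* N is promoted to int here, so N + 1 does not wrap even when N == 0xFFFF */
    return (N + 1) >> 1;
}
```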
#### `__roundup_pow_of_two`
One use of `__roundup_pow_of_two` found in the Linux source code:
[GitHub](https://github.com/torvalds/linux/blob/197c61cb176a40f5877c3caf8249722e77b7d989/drivers/net/ipa/gsi_trans.c#L153)
```c=
/* Don't let allocations cross a power-of-two boundary */
size = __roundup_pow_of_two(size);
total_size = (count + max_alloc - 1) * size;
/* The allocator will give us a power-of-2 number of pages. But we
* can't guarantee that, so request it. That way we won't waste any
* memory that would be available beyond the required space.
*
* Note that gsi_trans_pool_exit_dma() assumes the total allocated
* size is exactly (count * size).
*/
total_size = get_order(total_size) << PAGE_SHIFT;
virt = dma_alloc_coherent(dev, total_size, &addr, GFP_KERNEL);
if (!virt)
return -ENOMEM;
```
I do not yet understand the whole picture of IPA, but the comment is reminiscent of how jemalloc handles small objects, which also requires rounding up to a power of 2.
> Small objects are managed in groups by page runs. Each run maintains a frontier and free list to track which regions are in use. Allocation requests that are no more than half the quantum (8 or 16, depending on architecture) are rounded up to the nearest power of two that is at least sizeof(double)
[jemalloc man](https://linux.die.net/man/3/jemalloc)
:::warning
todo: the behaviour of unsigned int overflow and right bit shift
:::
---
## Quiz 3: bitcpy
LSB: least significant bit
MSB: most significant bit
RHS: right hand side
LHS: left hand side
### compile error
```c
In file included from bitcpy_test.c:4:
bitcpy.h:6:6: error: old-style parameter declarations in prototyped function definition
6 | void bitcpy(void *dest, size_t write, const void *_src, size_t read, size_t count)
```
The error message does not match the actual problem: after checking, the function declaration in bitcpy.h was missing a semicolon.
### Execution flow
```cpp=
void bitcpy(void *_dest, /* Address of the buffer to write to */
size_t _write, /* Bit offset to start writing to */
const void *_src, /* Address of the buffer to read from */
size_t _read, /* Bit offset to start reading from */
size_t count) {
size_t read_lhs = _read & 7;
size_t read_rhs = 8 - read_lhs;
const uint8_t *source = (const uint8_t *)_src + (_read / 8);
size_t write_lhs = _write & 7;
size_t write_rhs = 8 - write_lhs;
uint8_t *dest = (uint8_t *)_dest + (_write / 8);
```

- `void *_dest` and `const void *_src`: using `void *` (a generic pointer) allows any memory address to be passed in; they are then cast to `uint8_t *` for the subsequent operations.
- `uint8_t` is an unsigned 8-bit integer, equivalent to a byte; since C itself has no byte type, `uint8_t` is used instead.
- The computations of read_lhs / read_rhs / source and of write_lhs / write_rhs / dest follow the same logic: given _src/_dest and the bit offset, compute the byte address to copy from/to. Note that the offset is measured in bits, so (offset / 8) gives the number of whole bytes to advance; a worked example follows.
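A quick worked example (the offset value 19 is hypothetical) of how a bit offset splits into a byte address and an in-byte offset:
```c
/* _read = 19:
 *   _read / 8 = 2  ->  source = (const uint8_t *) _src + 2
 *   _read & 7 = 3  ->  read_lhs = 3, read_rhs = 5
 * i.e. copying starts after skipping the top 3 bits of the third byte of _src. */
```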
The loop then processes the number of bits to copy (count) until count reaches 0.
```c=
while (count > 0) {
// read the byte currently pointed to by source, then advance source to the next byte
uint8_t data = *source++;
// copy at most 8 bits in this iteration
size_t bitsize = (count > 8) ? 8 : count;
// the read offset is not byte-aligned (read_lhs > 0)
if (read_lhs > 0) {
// shift the bits to be copied left so they align with the MSB of the byte
data <<= read_lhs;
// the data spans two bytes, so bits from the next byte are needed
if (bitsize > read_rhs)
// shift the next byte right to align it with the low end of the byte
// and OR it in to assemble the full byte to copy
data |= (*source >> read_rhs);
}
// for the last (partial) byte, keep only the remaining bits via the mask
if (bitsize < 8)
data &= read_mask[bitsize];
uint8_t original = *dest;
// use the mask with AND to keep the bits of dest that must not be overwritten and zero the part that will be
uint8_t mask = read_mask[write_lhs];
if (bitsize > write_rhs) {
// copy data into its position in dest, then advance dest to the next byte
*dest++ = (original & mask) | (data >> write_lhs);
// again keep the bits of dest that must not be overwritten and zero the part that will be
original = *dest & write_mask[bitsize - write_rhs];
*dest = original | (data << write_rhs);
} else {
// Since write_lhs + bitsize is never >= 8, no out-of-bound access.
mask |= write_mask[write_lhs + bitsize];
*dest++ = (original & mask) | (data >> write_lhs);
}
count -= bitsize;
}
```
Lines 7-14:

Lines 20-33:

A more detailed illustration:
Lines 20-25 (upper figure)
Lines 26-28 (lower figure)

### Mask operations on bytes
Since the bit manipulation here always operates a byte at a time, knowing how to change bit states with a mask and AND/OR/XOR operations is essential.
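A few small illustrations (values chosen arbitrarily) of the mask idioms used in bitcpy:
```c
uint8_t b = 0x5A;      /* 0101 1010 */
b |= 0x01;             /* set bit 0    -> 0101 1011 */
b &= (uint8_t) ~0x10;  /* clear bit 4  -> 0100 1011 */
b ^= 0x80;             /* toggle bit 7 -> 1100 1011 */
b = (b & 0xF0) | 0x05; /* keep the high nibble, overwrite the low one -> 1100 0101 */
```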
### Planning a more efficient implementation
- ~~During the copy, only the first and last bytes may need special handling; the bytes in between can be copied directly, so memcpy can be used for the whole contiguous region~~ (only applies when write_lhs == read_lhs)
- Byte-aligned processing is the common case: give up on the case where write_lhs and read_lhs (the bit position within the first byte) do not match, and for write_lhs == read_lhs copy the contiguous memory or do the bit operations without crossing byte boundaries (to be implemented and tested; see the sketch below)
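A rough, untested sketch of the fully byte-aligned fast path (both offsets on a byte boundary), assuming the same MSB-first bit order as the original bitcpy; the function name `bitcpy_aligned` is mine, not part of the quiz code:
```c
#include <stdint.h>
#include <string.h>

/* Fast path when both the read and write offsets are multiples of 8:
 * whole bytes are copied with memcpy, and only the trailing partial
 * byte needs masking. */
static void bitcpy_aligned(uint8_t *dest, const uint8_t *src, size_t count)
{
    size_t whole_bytes = count / 8;
    size_t tail_bits = count % 8;

    memcpy(dest, src, whole_bytes);
    if (tail_bits) {
        uint8_t keep = 0xFF >> tail_bits; /* low bits of dest to preserve */
        dest[whole_bytes] = (dest[whole_bytes] & keep) |
                            (src[whole_bytes] & (uint8_t) ~keep);
    }
}
```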
:::warning
todo:
- byte addressing
- bitwise operation faster than arithmetic operations, why and what kind of arithmetic operations can be replaced?
- read the code of bitmap in linux
:::
---
## Quiz 4: string interning
### preprocessor
\# [Stringizing](https://gcc.gnu.org/onlinedocs/cpp/Stringizing.html)
\## [Concatenation](https://gcc.gnu.org/onlinedocs/cpp/Concatenation.html)
\#pragma once [wiki](https://en.wikipedia.org/wiki/Pragma_once)
### Data structures
```c
enum {
CSTR_PERMANENT = 1,
CSTR_INTERNING = 2,
CSTR_ONSTACK = 4,
};
#define CSTR_INTERNING_SIZE (32) // threshold for deciding whether a string is long
#define CSTR_STACK_SIZE (128) // buffer size
typedef struct __cstr_data {
char *cstr; // string contents
uint32_t hash_size; // hash value
uint16_t type; // state of this cstr
uint16_t ref; // reference counter: how many references to this cstr exist
} * cstring;
typedef struct __cstr_buffer {
cstring str;
} cstr_buffer[1];
```
todo: to be verified. I am confused about ref: I originally expected it to be the cstr's reference counter, but after reading `cstr_grab` I suspect line 10 below can never be reached (it requires s->type == 0 and s->ref != 0).
```c=
cstring cstr_grab(cstring s)
{
if (s->type & (CSTR_PERMANENT | CSTR_INTERNING))
return s;
if (s->type == CSTR_ONSTACK)
return cstr_clone(s->cstr, s->hash_size);
if (s->ref == 0)
s->type = CSTR_PERMANENT;
else
__sync_add_and_fetch(&s->ref, 1);
return s;
}
```
See lines 5-7 below: when type is 0, it is immediately set to CSTR_PERMANENT.
```c=
#define CSTR_LITERAL(var, cstr) \
static cstring var = NULL; \
if (!var) { \
cstring tmp = cstr_clone("" cstr, (sizeof(cstr) / sizeof(char)) - 1); \
if (tmp->type == 0) { \
tmp->type = CSTR_PERMANENT; \
tmp->ref = 0; \
}
```
### The enum representing the cstr type
```graphviz
digraph structs {
rankdir=LR
node[shape=record]
cstr [label="cstr type"]
permanent [label="{permanent|string longer than the interning size limit; a standalone cstr is created and not put into the pool}"]
onstack [label="{<prev>on stack|kept in a cstr_buffer for convenient string concatenation}"]
interning [label="{<prev>interning|string put into the pool and reused}"]
cstr -> permanent:w
cstr -> interning:w
cstr -> onstack:w
}
```
### Initialization logic of a cstring and the corresponding type, as shown below
```flow
st=>start: str
e1=>end: CSTR_ONSTACK
e_permanent=>end: CSTR_PERMANENT
e_interning=>end: CSTR_INTERNING
cond_concate=>condition: need string concatenation?
cond_string_size=>condition: >= interning size?
op_buf=>operation: CSTR_BUFFER
op_literal=>operation: CSTR_LITERAL
st->cond_concate
cond_concate(no)->op_literal
op_literal->cond_string_size
cond_concate(yes)->op_buf
op_buf->e1
cond_string_size(yes)->e_permanent
cond_string_size(no)->e_interning
```
### `CSTR_BUFFER` creates a cstr_buffer for convenient string concatenation
```c
#define CSTR_BUFFER(var) \
char var##_cstring[CSTR_STACK_SIZE] = {0}; \
struct __cstr_data var##_cstr_data = {var##_cstring, 0, CSTR_ONSTACK, 0}; \
cstr_buffer var; \
var->str = &var##_cstr_data;
```
### `CSTR_LITERAL` creates a cstring from a given string literal; the string length decides whether it is interned
```c
#define CSTR_LITERAL(var, cstr) \
static cstring var = NULL; \
if (!var) { \
cstring tmp = cstr_clone("" cstr, (sizeof(cstr) / sizeof(char)) - 1); \
if (tmp->type == 0) { \
tmp->type = CSTR_PERMANENT; \
tmp->ref = 0; \
} \
if (!__sync_bool_compare_and_swap(&var, NULL, tmp)) { \
if (tmp->type == CSTR_PERMANENT) \
free(tmp); \
} \
}
```
The following checks whether var has already been initialized; if it has, the tmp created by another thread must be freed:
```c
if (!__sync_bool_compare_and_swap(&var, NULL, tmp)) {
if (tmp->type == CSTR_PERMANENT)
free(tmp);
}
```
This relates to extended question 2, the multithreading issue: several threads may all observe var == NULL and enter the code block. Although cstr_interning, called inside cstr_clone, takes CSTR_LOCK, that lock does not help for overly long strings (permanent cstrings), so duplicate permanent cstrings get created. The duplicates are freed by the code block above, but they still incur extra allocation and deallocation overhead. This can be handled with [double-checked locking](https://en.wikipedia.org/wiki/Double-checked_locking), as sketched after the following excerpt.
```c
if (!var) {
cstring tmp = cstr_clone("" cstr, (sizeof(cstr) / sizeof(char)) - 1);
if (tmp->type == 0) {
tmp->type = CSTR_PERMANENT;
tmp->ref = 0;
}
...
}
```
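A rough sketch (not the original macro) of the double-checked locking idea. The lock here is a hypothetical dedicated spinlock, kept separate from the interning lock so that `cstr_clone()` can still acquire that one internally; the second NULL check stops late-arriving threads from cloning the literal again:
```c
static int cstr_literal_lock = 0; /* hypothetical, distinct from the interning lock */

#define CSTR_LITERAL(var, cstr)                                           \
    static cstring var = NULL;                                            \
    if (!var) {                                                           \
        while (__sync_lock_test_and_set(&cstr_literal_lock, 1))           \
            ; /* spin until the lock is acquired */                       \
        if (!var) { /* second check: another thread may have set var */   \
            cstring tmp =                                                 \
                cstr_clone("" cstr, (sizeof(cstr) / sizeof(char)) - 1);   \
            if (tmp->type == 0) {                                         \
                tmp->type = CSTR_PERMANENT;                               \
                tmp->ref = 0;                                             \
            }                                                             \
            var = tmp;                                                    \
        }                                                                 \
        __sync_lock_release(&cstr_literal_lock);                          \
    }
```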
:::warning
todo: design a way to verify extended question 2
:::
### __cstr_interning design
The design of a hash table with singly linked lists (separate chaining) handles hash collisions, and a rehashing mechanism based on the ratio of nodes to table size (see `expand`) keeps the chains from growing too long.

```c
struct __cstr_interning {
int lock; // lock
int index; // next index to use in the pool
unsigned size; // size of the hash table
unsigned total; // number of times interning has been performed
struct __cstr_node **hash; // the hash table buckets
struct __cstr_pool *pool; // where the nodes are stored
};
```
:::warning
Question:
- total is the number of times interning has been performed; why use total rather than the pool index as the criterion for expansion?
:::
`expand` handles the rehashing:
```c
static void expand(struct __cstr_interning *si)
{
unsigned new_size = si->size * 2;
if (new_size < HASH_START_SIZE)
new_size = HASH_START_SIZE;
struct __cstr_node **new_hash =
xalloc(sizeof(struct __cstr_node *) * new_size);
memset(new_hash, 0, sizeof(struct __cstr_node *) * new_size);
for (unsigned i = 0; i < si->size; ++i) {
struct __cstr_node *node = si->hash[i];
while (node) {
struct __cstr_node *tmp = node->next;
insert_node(new_hash, new_size, node);
node = tmp;
}
}
free(si->hash);
si->hash = new_hash;
si->size = new_size;
}
```
`interning` looks up the corresponding node in the hash table; if it does not exist, the node at the current pool index is used for the new string.
```c
static cstring interning(struct __cstr_interning *si,
const char *cstr,
size_t sz,
uint32_t hash)
{
if (!si->hash)
return NULL;
int index = (int) (hash & (si->size - 1));
struct __cstr_node *n = si->hash[index];
while (n) {
if (n->str.hash_size == hash) {
if (!strcmp(n->str.cstr, cstr))
return &n->str;
}
n = n->next;
}
// 80% (4/5) threshold
if (si->total * 5 >= si->size * 4)
return NULL;
if (!si->pool) {
si->pool = xalloc(sizeof(struct __cstr_pool));
si->index = 0;
}
n = &si->pool->node[si->index++];
memcpy(n->buffer, cstr, sz);
n->buffer[sz] = 0;
cstring cs = &n->str;
cs->cstr = n->buffer;
cs->hash_size = hash;
cs->type = CSTR_INTERNING;
cs->ref = 0;
n->next = si->hash[index];
si->hash[index] = n;
return cs;
}
```
### `hash_blob` computes the hash value, sampling the string according to its length
```c
static inline uint32_t hash_blob(const char *buffer, size_t len)
{
const uint8_t *ptr = (const uint8_t *) buffer;
size_t h = len;
size_t step = (len >> 5) + 1;
for (size_t i = len; i >= step; i -= step)
h = h ^ ((h << 5) + (h >> 2) + ptr[i - 1]);
return h == 0 ? 1 : h;
}
```
Short strings hash every character. For long strings the collision probability is lower, so hashing every step-th character is sufficient, which improves efficiency.
```c
size_t h = len;
size_t step = (len >> 5) + 1;
```
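A quick illustration (lengths chosen arbitrarily) of how step scales with the string length:
```c
size_t len1 = 16,  step1 = (len1 >> 5) + 1; /* step = 1: hash every character     */
size_t len2 = 100, step2 = (len2 >> 5) + 1; /* step = 4: hash every 4th character */
```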
todo: find out where this hash algorithm comes from
```c
h = h ^ ((h << 5) + (h >> 2) + ptr[i - 1]);
```
> This is a variant of the [Bernstein hash djb2](https://api.riot-os.org/group__sys__hashes__djb2.html), deliberately written poorly; note the difference in hashing behavior between the two.
> :notes: jserv
### Possible improvements
- The hash table can currently grow without bound, yet many strings may only be used once or twice. Implementing an LRU or LFU cache to release rarely used strings, while bounding the hash table size, would reduce memory consumption. (Note: I forgot to account for the pool size limit; this needs more thought.)
- handle the case where the number of nodes exceeds `INTERNING_POOL_SIZE`
:::info
- advantage: `cstr` is the first member of the struct
- string comparison using the hash (space-time trade-off)
- reference counting design for memory release
- Linux also has GC: Page Frame Reclamation
- cost of string operations in a web server: h2o/qrintf
:::