2021q1 Homework2 (quiz2)

contributed by < AmyLin0210 >

tags: `2021q1 Linux`

2021q1 第二周測驗題

測驗一

解釋上述程式碼運作原理





#define container_of(ptr, type, member)                            \
    __extension__({                                                \
        const __typeof__(((type *) 0)->member) *__pmember = (ptr); \
        (type *) ((char *) __pmember - offsetof(type, member));    \
    })

在 container_of 中包含以下部份：

__extension__ 的功用是在使用 GCC extension __typeof__ 時，不會跳出警告。
((type *)0)->member 將 0 這個位置強制轉型為 type *
取出當中的 member，並使用 __typeof__ 得到 member 的型態，產生一個 pointer 給該型別的 __pmember
將 ptr assign 給 __pmember
offsetof(type, member) 可以算出 member 在 type 這個 struct 中的位移量。
將絕對位置 (char*)__pmember 減去 offsetof(type, member) ，可以得到 struct 的起始位置。
最後 (type*) 再將起始位置轉行為 pointer to type

在這當中讓我最不解的是 ((type *)0)->member 這個操作，
參考 toastcheng 同學的作業與實驗，自己也試了一下程式碼:

定義出了一個 struct ，如下：

struct test {
    int num;
};

typeof-null.c







#include <stdio.h>

int main () {
    const __typeof__(((struct test *)NULL)->num) a = 1;
    printf("a = %d\n", a); // output: a = 1
    return 0;
}

typeof-2048.c







#include <stdio.h>

int main () {
    const __typeof__(((struct test *)2048)->num) a = 1;
    printf("a = %d\n", a); // output: a = 1
    return 0;
}

typeof-char.c








#include <stdio.h>

int main () {
    char i;
    const __typeof__(((struct test *)i)->num) a = 1;
    printf("a = %d\n", a); // output: a = 1
    return 0;
}

使用 gcc 產生組合語言

$ gcc typeof-null.c typeof-char.c typeof-null.c -S -O0

使用 diff 來比較產生的組合語言

$ diff typeof-char.s typeof-null.s
<       .file   "typeof-char.c"
---
>       .file   "typeof-null.c"

$ diff typeof-2048.s typeof-null.s
<       .file   "typeof-2048.c"
---
>       .file   "typeof-null.c"

發現除了 file name 以外其餘都相同，這個讓我感到意外。

typeof-2048 和 typeof-null 相同，我想可以解釋為都是對兩個位置強制轉型，所以會產生兩個同樣的組合語言。
但原先我預期 typeof-char 會有些不同的行為，因為已經宣告出一個 char 的變數 i，所以組合語言中會多出對 i 做處理的部份，然而結果與我想像中的不同。

如同 toastcheng 同學的猜測，我想在 typeof 時並不會對當中的指標或參數做 dereference。
但除此之外，強制轉型如果在沒有取當中的數值做處理的情況下，是不是也不會做 dereference？

我測試了一下以下程式碼，當中的指標 i 皆指向 NULL，只有型態不同。

char.c








#include <stdio.h>

int main () {
    char *i = NULL;
    const __typeof__(((struct test *)i)->num) a = 1;
    printf("a = %d\n", a);  // output: a = 1
    return 0;
}

int.c








#include <stdio.h>

int main () {
    int *i = NULL;
    const __typeof__(((struct test *)i)->num) a = 1;
    printf("a = %d\n", a);  // output: a = 1
    return 0;
}

上述的程式碼在轉換成組合語言後使用 diff 做比較，
發現不管原參數的型態是否不同，結果仍然相同。

$ gcc int.c char.c -S -O0
$ diff int.s char.s
<       .file   "int.c"
---
>       .file   "char.c"

在這裡使用了 Designated Initializers，使得 head 內的 prev 與 next 都被初始化為自己本身的位置。

struct list_head {
    struct list_head *prev, *next;
};

#define LIST_HEAD(head) struct list_head head = {&(head), &(head)}

INIT_LIST_HEAD

初始化 head

static inline void INIT_LIST_HEAD(struct list_head *head)
{
    head->next = head; head->prev = head;
}

list_add_tail

由於這是一個 circuler doubly linked list ，所以使用 head->prev 找到 linked list 的尾部，再將 node 接上。

static inline void list_add_tail(struct list_head *node, struct list_head *head)
{
    struct list_head *prev = head->prev;

    prev->next = node;
    node->next = head;
    node->prev = prev;
    head->prev = node;
}

list_del

將 node 移出這個 linked list

static inline void list_del(struct list_head *node)
{
    struct list_head *next = node->next, *prev = node->prev;
    next->prev = prev; prev->next = next;
}

list_empty

檢查 head 是否指回 head 自己，如果沒有指向其他 node ，便表示這個 linked list 為空。

static inline int list_empty(const struct list_head *head)
{
    return (head->next == head);
}

list_is_singular

檢查是否 prev 與 next 都指向同一個位置，若是相同便表示只有一個 node

static inline int list_is_singular(const struct list_head *head)
{
    return (!list_empty(head) && head->prev == head->next);
}

list_splice_tail

將 list 整串 linked list 連接至 head linked list 的最後面，形成一個新的環狀結構

static inline void list_splice_tail(struct list_head *list,
                                    struct list_head *head)
{
    struct list_head *head_last = head->prev;
    struct list_head *list_first = list->next, *list_last = list->prev;

    if (list_empty(list))
        return;

    head->prev = list_last;
    list_last->next = head;

    list_first->prev = head_last;
    head_last->next = list_first;
}

list_cut_position

將 head_from 後面到 node 前面的 linked list 裁下，連接至 head_to 後

static inline void list_cut_position(struct list_head *head_to,
                                     struct list_head *head_from,
                                     struct list_head *node)
{
    struct list_head *head_from_first = head_from->next;

    if (list_empty(head_from))
        return;

    if (head_from == node) {
        INIT_LIST_HEAD(head_to);
        return;
    }

    head_from->next = node->next;
    head_from->next->prev = head_from;

    head_to->prev = node;
    node->next = head_to;
    head_to->next = head_from_first;
    head_to->next->prev = head_to;
}

未切割

切割後

get_middle

利用快慢兩個指標來走訪節點，快指標一次走兩步，慢指標一次走一步。
當快指標走訪回最開頭時，慢指標會剛好走到中間。

假設這裡有 N 個節點，慢指標會在第

⌈ N ⌉

個位置

static list_ele_t *get_middle(struct list_head *list)
{
    struct list_head *fast = list->next, *slow;
    list_for_each (slow, list) {
        if (fast->next == list || fast->next->next == list)
            break;
        fast = fast->next->next;
    }
    return list_entry(slow, list_ele_t, list);
}

list_merge_sort

這段程式碼因為所有功能都有用函式包起，
乍看之下很像 merge sort 的 pseudocode。

void list_merge_sort(queue_t *q)
{
    if (list_is_singular(&q->list))
        return;

    queue_t left;
    struct list_head sorted;
    INIT_LIST_HEAD(&left.list);
    list_cut_position(&left.list, &q->list, &get_middle(&q->list)->list);
    list_merge_sort(&left);
    list_merge_sort(q);
    list_merge(&left.list, &q->list, &sorted);
    INIT_LIST_HEAD(&q->list);
    list_splice_tail(&sorted, &q->list);
}

運作邏輯如下：

確認是否只剩一個節點，如果只剩一個節點直接 return
尋找整個 list 的中心位置並把他切為兩半
對兩個 list 分別再進行 merge sort

完成後將兩邊 merge 起，這個步驟可以用下面圖解:

分別有 left , q ( right ) , 與 sorted 三段

將他們排序並連接起，接到 sorted 後方

把 q 初始化

將排序好的 list 接到 q 的後面

list_merge

將兩條 linked list 排序好並連接起

static void list_merge(struct list_head *lhs,
                       struct list_head *rhs,
                       struct list_head *head)
{
    INIT_LIST_HEAD(head);
    if (list_empty(lhs)) {
        list_splice_tail(lhs, head);
        return;
    }
    if (list_empty(rhs)) {
        list_splice_tail(rhs, head);
        return;
    }

    while (!list_empty(lhs) && !list_empty(rhs)) {
        char *lv = list_entry(lhs->next, list_ele_t, list)->value;
        char *rv = list_entry(rhs->next, list_ele_t, list)->value;
        struct list_head *tmp = strcmp(lv, rv) <= 0 ? lhs->next : rhs->next;
        list_del(tmp);
        list_add_tail(tmp, head);
    }
    list_splice_tail(list_empty(lhs) ? rhs : lhs, head);
}

指出改進空間並著手實作

在資料型態的部分，

struct list_head {
    struct list_head *prev, *next;
};

typedef struct __element {
    char *value;
    struct __element *next;
    struct list_head list;
} list_ele_t;

typedef struct {
    list_ele_t *head; /* Linked list of elements */
    list_ele_t *tail;
    size_t size;
    struct list_head list;
} queue_t;

由於 list_head 就擁有了 tail 、 next 相關的資訊，所以我把上述的資料型態給縮減為以下：

struct list_head {
    struct list_head *prev, *next;
};

typedef struct __element {
    char *value;
    struct list_head list;
} list_ele_t;

typedef struct {
    list_ele_t *head;
    size_t size;
    struct list_head list;
} queue_t;

也有針對這個改動修改其他對應的程式碼。

研讀 Linux 核心的 lib/list_sort.c 原始程式碼

lib/list_sort.c

__attribute__ 是一種檢查工具，主要是 gcc 在編譯時期會去查看。
在這裡選定的 target 是 nonnull，表示在 cmp_func 這個 function pointer 中的第二及第三個參數指標不得為 NULL。

typedef int __attribute__((nonnull(2,3))) (*cmp_func)(void *,
		struct list_head const *, struct list_head const *);

參考資料

Declaring Attributes of Functions

測驗二

解釋程式碼運作原理

考慮函式 func 接受一個 16 位元無號整數 N，並回傳小於或等於 N 的 power-of-2

uint16_t func(uint16_t N) {
    /* change all right side bits to 1 */
    N |= N >> 1;
    N |= N >> 2;
    N |= N >> 4;
    N |= N >> 8;

    return (N + 1) >> 1;
}

由於以上函式的目標為回傳小於或等於 N 的 power-of-2，看到 return (N + 1) >> 1，可以推測最後 N 的值會是

2^{n + 1} - 1

假設現在 func 的 input 為 1 << 15，以下為執行順序：

1000 0000 0000 0000 // N
1100 0000 0000 0000 // N |= N >> 1
1111 0000 0000 0000 // N |= N >> 2
1111 1111 0000 0000 // N |= N >> 4
1111 1111 1111 1111 // N |= N >> 8

在回傳值為

2^{15}

時，會發現有 overflow 的問題，應改為
return (N >> 1) + 1 來保證結果正確性。

在 The Linux Kernel API 頁面搜尋 “power of 2”，可見 is_power_of_2，查閱 Linux 核心原始程式碼，舉例說明其作用和考量，特別是 round up/down power of 2

在 linux/tools/include/linux/log2.h 內可以找到 is_power_of_2 的原始程式碼。

__attribute__((const)) 是個 GNU compiler extension，他的目標為確保這個函式沒有讀取或修改任何的 global memory。

static inline __attribute__((const))
bool is_power_of_2(unsigned long n)
{
	return (n != 0 && ((n & (n - 1)) == 0));
}

如果 n 為 2 的冪次 (

2^{N}

)，從 bit 的角度來看，就只有第 N 個 bit 為 1 ，其餘皆為零。
而 n-1 則會變成從第 N-1 個及前方的 bit 為 1 ，其餘皆為零
n 與 n-1 做 & 操作後會變成 0 。

好處是 Bitwise Operation 在硬體實做上比較有效率。

同樣在 linux/tools/include/linux/log2.h 內，可以找到 __roundup_pow_of_two 以及 __rounddown_pow_of_two 。

UL 代表 unsigned long int ( C99 標準, 6.4.4.1 )。
fls_long 代表的是找到第一個為一的 bit 在哪邊，

__roundup__pow_of_two
- 假設我們預期此函數的回傳值為
  $2^{N}$ ，那他的參數 n 數值範圍應該要為
  $2^{N - 1} + 1$ ~
  $2^{N}$
- fls_long(n) 中，若 n 的範圍為
  $2^{N - 1}$ ~
  $2^{N} - 1$ ，會回傳 N
- 因此 fls_long(n - 1)
__rounddown__pow_of_two
- 假設我們預期此函數的回傳值為
  $2^{N - 1}$ ，那他的參數 n 數值範圍應該要為
  $2^{N - 2}$ ~
  $2^{N - 1} - 1$
- fls_long(n) 中，若 n 的範圍為
  $2^{N - 1}$ ~
  $2^{N} - 1$ ，會回傳 N
- 因此 fls_long(n) - 1

/*
 * round up to nearest power of two
 */
static inline __attribute__((const))
unsigned long __roundup_pow_of_two(unsigned long n)
{
	return 1UL << fls_long(n - 1);
}

/*
 * round down to nearest power of two
 */
static inline __attribute__((const))
unsigned long __rounddown_pow_of_two(unsigned long n)
{
	return 1UL << (fls_long(n) - 1);
}

在 linux/tools/include/linux/bitops.h 中可以找到 fls_long。

在這邊將 unsigned long l 分成 32 位元與 64 位元的兩種狀況去處理

static inline unsigned fls_long(unsigned long l)
{
	if (sizeof(l) == 4)
		return fls(l);
	return fls64(l);
}

在 linux/include/asm-generic/bitops/fls.h 內可以找到 fls 的程式原始碼。

fls 的函式目標為找到 x 內第一個唯一的 bit。

/**
 * fls - find last (most-significant) bit set
 * @x: the word to search
 *
 * This is defined the same way as ffs.
 * Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32.
 */

static __always_inline int fls(unsigned int x)
{
	int r = 32;

	if (!x)
		return 0;
	if (!(x & 0xffff0000u)) {
		x <<= 16;
		r -= 16;
	}
	if (!(x & 0xff000000u)) {
		x <<= 8;
		r -= 8;
	}
	if (!(x & 0xf0000000u)) {
		x <<= 4;
		r -= 4;
	}
	if (!(x & 0xc0000000u)) {
		x <<= 2;
		r -= 2;
	}
	if (!(x & 0x80000000u)) {
		x <<= 1;
		r -= 1;
	}
	return r;
}

以上函式透過 bit mask 的方式一次次過濾是否前方有 1，進而算出第一個為 1 的 bit 在哪裡。

以下用 8 個 bit 來示範函式流程

x:    0000 0110
mask: 1111 0000
r:    8 - 4 = 4

x:    0110 0000
mask: 1100 0000
r:    4

x:    0110 0000
mask: 1000 0000
r:    4 - 1 = 3

在這裡我看到了 Linux kernel 的優雅操作，以上的函式都是用比較貼近硬體語言的方式操作，例如位移或者是 & | 運算子，比起減法更有效率。

Reference

__attribute__((const)) function attribute

測驗三

解釋程式碼運作原理

#include <stdint.h>

void bitcpy(void *_dest,      /* Address of the buffer to write to */
            size_t _write,    /* Bit offset to start writing to */
            const void *_src, /* Address of the buffer to read from */
            size_t _read,     /* Bit offset to start reading from */
            size_t count)
{
    size_t read_lhs = _read & 7;
    size_t read_rhs = 8 - read_lhs;
    const uint8_t *source = (const uint8_t *) _src + (_read / 8);
    size_t write_lhs = _write & 7;
    size_t write_rhs = 8 - write_lhs;
    uint8_t *dest = (uint8_t *) _dest + (_write / 8);

    static const uint8_t read_mask[] = {
        0x00, /*    == 0    00000000b   */
        0x80, /*    == 1    10000000b   */
        0xC0, /*    == 2    11000000b   */
        0xE0, /*    == 3    11100000b   */
        0xF0, /*    == 4    11110000b   */
        0xF8, /*    == 5    11111000b   */
        0xFC, /*    == 6    11111100b   */
        0xFE, /*    == 7    11111110b   */
        0xFF  /*    == 8    11111111b   */
    };

    static const uint8_t write_mask[] = {
        0xFF, /*    == 0    11111111b   */
        0x7F, /*    == 1    01111111b   */
        0x3F, /*    == 2    00111111b   */
        0x1F, /*    == 3    00011111b   */
        0x0F, /*    == 4    00001111b   */
        0x07, /*    == 5    00000111b   */
        0x03, /*    == 6    00000011b   */
        0x01, /*    == 7    00000001b   */
        0x00  /*    == 8    00000000b   */
    };

    while (count > 0) {
        uint8_t data = *source++;
        size_t bitsize = (count > 8) ? 8 : count;
        if (read_lhs > 0) {
            data <<= read_lhs;
            if (bitsize > read_rhs)
                data |= (*source >> read_rhs);
        }

        if (bitsize < 8)
            data &= read_mask[bitsize];

        uint8_t original = *dest;
        uint8_t mask = read_mask[write_lhs];
        if (bitsize > write_rhs) {
            /* Cross multiple bytes */
            *dest++ = (original & mask) | (data >> write_lhs);
            original = *dest & write_mask[bitsize - write_rhs];
            *dest = original | (data << write_rhs);
        } else {
            // Since write_lhs + bitsize is never >= 8, no out-of-bound access.
            mask |= write_mask[write_lhs + bitsize];
            *dest++ = (original & mask) | (data >> write_lhs);
        }

        count -= bitsize;
    }
}

現在假設

_read = 2
_write = 3
count = 9

首先會先將 _read 、 _write 分作 lhs 與 rhs

size_t read_lhs = _read & 7;    // 2
size_t read_rhs = 8 - read_lhs; // 6

找到由哪個 byte 開始讀取 bit 數值，示意圖中粉色方格為 source 。

const uint8_t *source = (const uint8_t *) _src + (_read / 8);

將 source 中要讀取的片段複製到 data 內。

uint8_t data = *source++;
size_t bitsize = (count > 8) ? 8 : count;
if (read_lhs > 0) {
    data <<= read_lhs;
    if (bitsize > read_rhs)
        data |= (*source >> read_rhs);
}

再將 data 的內容移動至 dest

uint8_t original = *dest;
uint8_t mask = read_mask[write_lhs];
if (bitsize > write_rhs) {
    /* Cross multiple bytes */
    *dest++ = (original & mask) | (data >> write_lhs);
    original = *dest & write_mask[bitsize - write_rhs];
    *dest = original | (data << write_rhs);
} else {
    // Since write_lhs + bitsize is never >= 8, no out-of-bound access.
    mask |= write_mask[write_lhs + bitsize];
    *dest++ = (original & mask) | (data >> write_lhs);
}

嘗試重寫為同樣功能但效率更高的實作

目前的程式碼我認為有改進空間的部份有

使用 bitwise 的位移取代除法。








size_t read_lhs = _read & 7;
size_t read_rhs = 8 - read_lhs;
-const uint8_t *source = (const uint8_t *) _src + (_read / 8);
+const uint8_t *source = (const uint8_t *) _src + (_read >> 3);
size_t write_lhs = _write & 7;
size_t write_rhs = 8 - write_lhs;
-uint8_t *dest = (uint8_t *) _dest + (_write / 8);
+uint8_t *dest = (uint8_t *) _dest + (_write >> 3);

若需要做跨 byte 儲存資料，每次都會多出兩次的位移與 mask 的，若 count 一大，運算成本也會增加。





















































































{
    size_t read_lhs = _read & 7;
    size_t read_rhs = 8 - read_lhs;
    const uint8_t *source = (const uint8_t *) _src + (_read >> 3);
    size_t write_lhs = _write & 7;
    size_t write_rhs = 8 - write_lhs;
    uint8_t *dest = (uint8_t *) _dest + (_write >> 3);

    static const uint8_t read_mask[] = {
        0x00, /*    == 0    00000000b   */
        0x80, /*    == 1    10000000b   */
        0xC0, /*    == 2    11000000b   */
        0xE0, /*    == 3    11100000b   */
        0xF0, /*    == 4    11110000b   */
        0xF8, /*    == 5    11111000b   */
        0xFC, /*    == 6    11111100b   */
        0xFE, /*    == 7    11111110b   */
        0xFF  /*    == 8    11111111b   */
    };

    static const uint8_t write_mask[] = {
        0xFF, /*    == 0    11111111b   */
        0x7F, /*    == 1    01111111b   */
        0x3F, /*    == 2    00111111b   */
        0x1F, /*    == 3    00011111b   */
        0x0F, /*    == 4    00001111b   */
        0x07, /*    == 5    00000111b   */
        0x03, /*    == 6    00000011b   */
        0x01, /*    == 7    00000001b   */
        0x00  /*    == 8    00000000b   */
    };

+   if (count & 7UL) {
+       uint8_t data = *source++;
+       size_t bitsize = count & 7UL;

+       data <<= read_lhs;
+       data &= read_mask[bitsize];

+       uint8_t original = *dest;
+       uint8_t mask = read_mask[write_lhs];

+       if (bitsize > write_rhs) {
+           *dest++ = (original & mask) | (data >> write_lhs);
+           original = *dest & write_mask[bitsize - write_rhs];
+           *dest = original | (data << write_rhs);
+       } else {
+           mask |= write_mask[write_lhs + bitsize];
+           *dest++ = (original & mask) | (data >> write_lhs);
+       }
+       count -= bitsize;
+   }

    while (count > 0) {
        uint8_t data = *source++;
        size_t bitsize = (count > 8) ? 8 : count;
        
-       if (read_lhs > 0) {
-           data <<= read_lhs;
-           if (bitsize > read_rhs)
-               data |= (*source >> read_rhs);
-       }
-
-       if (bitsize < 8)
-           data &= read_mask[bitsize];

        data &= read_mask[bitsize];

        uint8_t original = *dest;
        uint8_t mask = read_mask[write_lhs];

        if (bitsize > write_rhs) {
            /* Cross multiple bytes */
            *dest++ = (original & mask) | (data >> write_lhs);
            original = *dest & write_mask[bitsize - write_rhs];
            *dest = original | (data << write_rhs);
        } else {
            // Since write_lhs + bitsize is never >= 8, no out-of-bound access.
            mask |= write_mask[write_lhs + bitsize];
            *dest++ = (original & mask) | (data >> write_lhs);
        }

        count -= bitsize;
    }
}

在進入 while 迴圈之前，我把 src 對齊，確認在讀資料時，一定會從某個 byte 的第一個 bit 開始讀，節省 mask 的次數。

由圖片中可以看到，有將 src 對齊的執行時間比起原版還要更短。

測驗四

解釋程式碼運作原理

在這裡由測試程式 test.c 流程進行介紹

test.c

CSTR_BUFFER(a);

cstr.h
































#pragma once
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum {
    CSTR_PERMANENT = 1,
    CSTR_INTERNING = 2,
    CSTR_ONSTACK = 4,
};

#define CSTR_INTERNING_SIZE (32)
#define CSTR_STACK_SIZE (128)

typedef struct __cstr_data {
    char *cstr;
    uint32_t hash_size;
    uint16_t type;
    uint16_t ref;
} * cstring;

typedef struct __cstr_buffer {
    cstring str;
} cstr_buffer[1];

#define CSTR_S(s) ((s)->str)

#define CSTR_BUFFER(var)                                                      \
    char var##_cstring[CSTR_STACK_SIZE] = {0};                                \
    struct __cstr_data var##_cstr_data = {var##_cstring, 0, CSTR_ONSTACK, 0}; \
    cstr_buffer var;                                                          \
    var->str = &var##_cstr_data;

第 1 行的 #pragma once 在 C 語言內並不是一個標準 (non-standard) ，但是有廣泛的被編譯器支援。目標與常見的 #ifndef... 類似，希望只被 include 一次。

第 29 行的 ## operator 被拿來連接兩個 token ，用以生成變數名稱

到目前為止可以看到我們生成了一個 a 的 str_buffer 與其相關的變數。

test.c

cstr_cat(a, "Hello ");

cstr.c




















cstring cstr_cat(cstr_buffer sb, const char *str)
{
    cstring s = sb->str;
    if (s->type == CSTR_ONSTACK) {
        int i = s->hash_size;
        while (i < CSTR_STACK_SIZE - 1) {
            s->cstr[i] = *str;
            if (*str == 0)
                return s;
            ++s->hash_size;
            ++str;
            ++i;
        }
        s->cstr[i] = 0;
    }
    cstring tmp = s;
    sb->str = cstr_cat2(tmp->cstr, str);
    cstr_release(tmp);
    return sb->str;
}

現在有一個叫做 a 的 cstr_buffer 與 "Hello " 這個字串被傳入 cstr_cat 中。

在這裡 s 內的資料經由上一步初始化為

{
    // char a_cstring[CSTR_STACK_SIZE] = {0} 
    char* cstr = a_cstring; 
    uint32_t hash_size = 0;
    uint16_t type = CSTR_ONSTACK;
    uint16_t ref = 0;
}

如果字串除存的資料還在 stack 上面的話會有以下步驟：

在第 5 行中的 i 表示現在 s 的 cstr_buffer 內的字串長度
在以下兩種條件都符合的情況下將字串依序放入 cstr 尾端，與前字串連接起。
- cstr 內的字串長度小於 CSTR_STACK_SIZE - 1
- str 還沒走到字串尾端 ( '\0' )
如果 cstr 的大小足以放下 str，回傳 s
如果不夠，將 \0 放入 cstr末端，進入與 type != CSTR_ONSTACK 相同的處理步驟

當 type != CSTR_ONSTACK 或者字串長度已經超過 CSTR_STACK_SIZE 時，呼叫 cstr_cat2 ，傳入 cstr 與 str

cstr_cat2
























static cstring cstr_cat2(const char *a, const char *b)
{
    size_t sa = strlen(a), sb = strlen(b);
    if (sa + sb < CSTR_INTERNING_SIZE) {
        char tmp[CSTR_INTERNING_SIZE];
        memcpy(tmp, a, sa);
        memcpy(tmp + sa, b, sb);
        tmp[sa + sb] = 0;
        return cstr_interning(tmp, sa + sb, hash_blob(tmp, sa + sb));
    }
    cstring p = xalloc(sizeof(struct __cstr_data) + sa + sb + 1);
    if (!p)
        return NULL;

    char *ptr = (char *) (p + 1);
    p->cstr = ptr;
    p->type = 0;
    p->ref = 1;
    memcpy(ptr, a, sa);
    memcpy(ptr + sa, b, sb);
    ptr[sa + sb] = 0;
    p->hash_size = 0;
    return p;
}

若字串的長度大於 CSTR_STACK_SIZE 但小於 CSTR_INTERNING_SIZE 時，會進入 cstr__interning 的函式。
(但在測驗程式碼中 CSTR_STACK_SIZE 為 128， CSTR_INTERNING_SIZE 為 32，這樣好像就永遠執行不到這段程式碼 @@)

傳入 cstr__interning 的參數為:

tmp : 連接起的字串
sa+sb : 字串長度
hash_blob(tmp, sa + sb) : 此字串的雜湊值

hash_blob









static inline uint32_t hash_blob(const char *buffer, size_t len)
{
    const uint8_t *ptr = (const uint8_t *) buffer;
    size_t h = len;
    size_t step = (len >> 5) + 1;
    for (size_t i = len; i >= step; i -= step)
        h = h ^ ((h << 5) + (h >> 2) + ptr[i - 1]);
    return h == 0 ? 1 : h;
}

cstr_interning













static cstring cstr_interning(const char *cstr, size_t sz, uint32_t hash)
{
    cstring ret;
    CSTR_LOCK();
    ret = interning(&__cstr_ctx, cstr, sz, hash);
    if (!ret) {
        expand(&__cstr_ctx);
        ret = interning(&__cstr_ctx, cstr, sz, hash);
    }
    ++__cstr_ctx.total;
    CSTR_UNLOCK();
    return ret;
}

在 cstr_interning 中第 4 行 CSTR_LOCK 與第 11 行的 CSTR_UNLOCK 使得在 interning 的過程中可以保證執行正確。







#define CSTR_LOCK()                                               \
    ({                                                            \
        while (__sync_lock_test_and_set(&(__cstr_ctx.lock), 1)) { \
        }                                                         \
    })
#define CSTR_UNLOCK() ({ __sync_lock_release(&(__cstr_ctx.lock)); })

在這裡使用的是 gcc 提供的 Built-in functions for atomic memory access 來做 LOCK。

struct __cstr_interning {
    int lock;
    int index;
    unsigned size;
    unsigned total;
    struct __cstr_node **hash;
    struct __cstr_pool *pool;
};

static struct __cstr_interning __cstr_ctx;

interning







































static cstring interning(struct __cstr_interning *si,
                         const char *cstr,
                         size_t sz,
                         uint32_t hash)
{
    if (!si->hash)
        return NULL;

    int index = (int) (hash & (si->size - 1));
    struct __cstr_node *n = si->hash[index];
    while (n) {
        if (n->str.hash_size == hash) {
            if (!strcmp(n->str.cstr, cstr))
                return &n->str;
        }
        n = n->next;
    }
    // 80% (4/5) threshold
    if (si->total * 5 >= si->size * 4)
        return NULL;
    if (!si->pool) {
        si->pool = xalloc(sizeof(struct __cstr_pool));
        si->index = 0;
    }
    n = &si->pool->node[si->index++];
    memcpy(n->buffer, cstr, sz);
    n->buffer[sz] = 0;

    cstring cs = &n->str;
    cs->cstr = n->buffer;
    cs->hash_size = hash;
    cs->type = CSTR_INTERNING;
    cs->ref = 0;

    n->next = si->hash[index];
    si->hash[index] = n;

    return cs;
}

如果 si 內的 hash 為空，代表第一次使用這個空間，回傳 NULL，使用 expand 創造空間後再次進入。

第 9 行的 index 為 hash 與 si->size 取餘數的結果，接著進入 while 迴圈，目標為在那層 hash 內尋找是否有相同的字串，若有找的則回傳已經儲存的字串，若沒找到就在 pool 內創立新的 node

在第 19 行的地方，確認裡頭物件是否超過雜湊表的 4/5 ，若超過就回傳 NULL ，再使用 expand 擴增空間重新進入。

第 35 行，將新的 node 連接至 si->hash[index] 的開頭

expand

































#define HASH_START_SIZE 16 /* must be power of 2 */

static void *xalloc(size_t n)
{
    void *m = malloc(n);
    if (!m)
        exit(-1);
    return m;
}

static void expand(struct __cstr_interning *si)
{
    unsigned new_size = si->size * 2;
    if (new_size < HASH_START_SIZE)
        new_size = HASH_START_SIZE;

    struct __cstr_node **new_hash =
        xalloc(sizeof(struct __cstr_node *) * new_size);
    memset(new_hash, 0, sizeof(struct __cstr_node *) * new_size);

    for (unsigned i = 0; i < si->size; ++i) {
        struct __cstr_node *node = si->hash[i];
        while (node) {
            struct __cstr_node *tmp = node->next;
            insert_node(new_hash, new_size, node);
            node = tmp;
        }
    }

    free(si->hash);
    si->hash = new_hash;
    si->size = new_size;
}

在 expand 宣告新的 struct __cstr_node 的指標的指標，並給予 struct __cstr_node * 原兩倍大小的空間 ( 若是內部還不曾有過東西，大小則為 HASH_START_SIZE )

參考資料

2021q1 Homework2 (quiz2)

tags: 2021q1 Linux

測驗一

解釋上述程式碼運作原理

INIT_LIST_HEAD

list_add_tail

list_del

list_empty

list_is_singular

list_splice_tail

list_cut_position

get_middle

list_merge_sort

list_merge

指出改進空間並著手實作

研讀 Linux 核心的 lib/list_sort.c 原始程式碼

參考資料

測驗二

解釋程式碼運作原理

在 The Linux Kernel API 頁面搜尋 “power of 2”，可見 is_power_of_2，查閱 Linux 核心原始程式碼，舉例說明其作用和考量，特別是 round up/down power of 2

測驗三

解釋程式碼運作原理

嘗試重寫為同樣功能但效率更高的實作

測驗四

解釋程式碼運作原理

Read more

Lockless Multithreaded Logger 實作

2022q1 Homework3 (quiz3)

2022q1 Homework2 (quiz2)

2022q1 Homework1 (lab0)

tags: `2021q1 Linux`