2021q1 Homework3 (quiz3)

contributed by < toastcheng >

tags: `linux2021`

進度

1. 程式碼運作原理
2. 改進方案
3. 字串複製 (CoW) 的函式實作
4. 分析字串處理最佳化的空間效益
5. 整合 string interning
6. Linux 中的 SSO、CoW 案例

Q1: Small String Optimization

1. 程式碼運作原理

typedef union {
    char data[16];

    struct {
        uint8_t filler[15],
            space_left : 4,
            is_ptr : 1, 
            is_large_string : 1, 
            flag2 : 1, 
            flag3 : 1;
    };

    struct {
        char *ptr;
        size_t size : 54,
            capacity : 6;
    };
} xs;

透過 xs 使用 union 的方式可以看出 xs 總共佔用 16 bytes (128 bits) 的記憶體空間，因為不同情境可能為 char data[16] 或另外兩種 struct。

`xs_data` 回傳字串的開頭

static inline char *xs_data(const xs *x)
{
    if (!xs_is_ptr(x))
        return (char *) x->data;

    if (xs_is_large_string(x))
        return (char *) (x->ptr + /* OFF */ 4);
    return (char *) x->ptr;
}

從程式的分支邏輯對應 xs 的三種 layout，可以看出依照不同情境，字串的開頭可能分別在：

short string: x->data 指向的位置
字串長度小於 16，直接存放在 stack。
large string: x->ptr 後 4 個 byte 的位置
字串長度大於 LARGE_STRING_LEN (256)，會利用前 x->ptr 指向位置的前 4 個 byte 存放 reference count，因此真正存放資料的位置在 x->ptr + 4。與 xs_allocate_data 的邏輯相呼應，其依照字串長度做不同記憶體配置。
middle string: x->ptr 指向的位置
若是大於或等於 16，並且小於 LARGE_STRING_LEN，則一樣分配在 heap，並由 x->ptr 指向其位址，與 large string 差別在於不另外紀錄 reference count。

有趣的是 x->data 及 x->ptr 應是在相同的位置，這個部分應該可以改善，可見改進方案。

`xs_size` 回傳字串的長度

static inline size_t xs_size(const xs *x)
{
    return xs_is_ptr(x) ? x->size : 15 - x->space_left;
}

從這個函式可以看出字串的長度如何儲存在 xs，如果是 short string，其內容完全儲存在 stack，可以由 space_left 知道還有多少空間，因此將 15 - x->space_left 就能得到大小；而 medium string、large string 因為是動態配置，所以其大小另外存在 x->size 中。

`xs_capacity`

static inline size_t xs_capacity(const xs *x)
{
    return xs_is_ptr(x) ? ((size_t) 1 << x->capacity) - 1 : 15;
}

如果為 short string，容量固定為 15，而 medium string 及 large string 則透過 capacity 紀錄，而 capacity 則是在 xs_new 中決定的，會分配一個比字串長度還大的 power of 2 為其值。

`xs_*_refcnt`

static inline void xs_set_refcnt(const xs *x, int val)
{
    *((int *) ((size_t) x->ptr)) = val;
}

static inline void xs_inc_refcnt(const xs *x)
{
    if (xs_is_large_string(x))
        ++(*(int *) ((size_t) x->ptr));
}

static inline int xs_dec_refcnt(const xs *x)
{
    if (!xs_is_large_string(x))
        return 0;
    return --(*(int *) ((size_t) x->ptr));
}

static inline int xs_get_refcnt(const xs *x)
{
    if (!xs_is_large_string(x))
        return 0;
    return *(int *) ((size_t) x->ptr);
}

對長字串的 reference count 做操作的函式，可以發現唯獨 xs_set_refcnt 並沒有確保字串為長字串，可見改進方案。

`ilog2` 取對數

static inline int ilog2(uint32_t n) { return /* LLL */  32 - __builtin_clz(n) - 1; }

透過 __builtin_clz 的使用，找到最高位非 0 bit 的位置（從右算起），根據對數的定義該位置 - 1 便是 n 的對數的整數部分。

`xs_allocate_data` 配置記憶體

static void xs_allocate_data(xs *x, size_t len, bool reallocate)
{
    /* Medium string */
    if (len < LARGE_STRING_LEN) {
        x->ptr = reallocate ? realloc(x->ptr, (size_t) 1 << x->capacity)
                            : malloc((size_t) 1 << x->capacity);
        return;
    }

    /* Large string */
    x->is_large_string = 1;

    /* The extra 4 bytes are used to store the reference count */
    x->ptr = reallocate ? realloc(x->ptr, (size_t)(1 << x->capacity) + 4)
                        : malloc((size_t)(1 << x->capacity) + 4);

    xs_set_refcnt(x, 1);
}

xs_allocate_data 針對 medium、large string 做記憶體配置，在此判斷如果長度大於 LARGE_STRING_LEN 就要多配置 4 個 bytes 以供紀錄 reference count。

`xs_new`

xs *xs_new(xs *x, const void *p)
{
    *x = xs_literal_empty();
    size_t len = strlen(p) + 1;
    if (len > /* NNN */ 16) {
        x->capacity = ilog2(len) + 1;
        x->size = len - 1;
        x->is_ptr = true;
        xs_allocate_data(x, x->size, 0);
        memcpy(xs_data(x), p, len);
    } else {
        memcpy(x->data, p, len);
        x->space_left = 15 - (len - 1);
    }
    return x;
}

xs_new 初始化一個 xs，針對是否是 short string 決定是否要利用 xs_allocate_data 動態配置記憶體，在 xs_allocate_data 中會再針對 medium string 或 large string 做不同的配置。

`xs_grow`

xs *xs_grow(xs *x, size_t len)
{
    char buf[16];

    if (len <= xs_capacity(x))
        return x;

    /* Backup first */
    if (!xs_is_ptr(x))
        memcpy(buf, x->data, 16);

    x->is_ptr = true;
    x->capacity = ilog2(len) + 1;

    if (xs_is_ptr(x)) {
        xs_allocate_data(x, len, 1);
    } else {
        xs_allocate_data(x, len, 0);
        memcpy(xs_data(x), buf, 16);
    }
    return x;
}

增加 xs 所配置的記憶體空間至 len，並更新 capacity，並且處理字串種類的變化，但只涵蓋從 short string 成長為 medium string 的情況，仔細觀察也會發現若是成長為 large string，並沒有相對應的處理，改進可見改進方案。

`xs_cow_lazy_copy`

static bool xs_cow_lazy_copy(xs *x, char **data)
{
    if (xs_get_refcnt(x) <= 1)
        return false;

    /* Lazy copy */
    xs_dec_refcnt(x);
    xs_allocate_data(x, x->size, 0);

    if (data) {
        memcpy(xs_data(x), *data, x->size);

        /* Update the newly allocated pointer */
        *data = xs_data(x);
    }
    return true;
}

這個函式意義上應是將字串 data 的內容複製到 x，先考慮 reference count > 1 的情況：

相同內容的長字串因為共用同一段記憶體空間，因此當要將 data 的內容更改為 x，勢必無法繼續共用原本的記憶體空間，因此在更改之前呼叫 xs_dec_refcnt（x 不再引用到這個記憶體空間）
呼叫 xs_allocate_data 重新分配記憶體空間
利用 memcpy 進行複製

而如果 reference count <= 1，
在最初的檢查 x 會直接回傳 false 並且完全沒有執行到 memcpy，亦即沒有真的進行複製。對於這樣的行為思考許久，為什麼不必複製？

搜尋使用的情境發現呼叫 xs_cow_lazy_copy 的方式都是對相同的內容進行複製，共有兩處。

xs_concat：

char *data = xs_data(string);
xs_cow_lazy_copy(string, &data);

以及 xs_trim:

char *dataptr = xs_data(x), *orig = dataptr;
if (xs_cow_lazy_copy(x, &dataptr))
    orig = dataptr;

或許這兩處使用 xs_cow_lazy_copy 是因為它們將要改動原本的內容，因此先複製到別的位置。但 xs_cow_lazy_copy 較符合預期的行為應該還是要複製 reference count <= 1 的內容。

`xs_concat`

xs *xs_concat(xs *string, const xs *prefix, const xs *suffix)
{
    size_t pres = xs_size(prefix), sufs = xs_size(suffix),
           size = xs_size(string), capacity = xs_capacity(string);

    char *pre = xs_data(prefix), *suf = xs_data(suffix),
         *data = xs_data(string);

    xs_cow_lazy_copy(string, &data);

    if (size + pres + sufs <= capacity) {
        memmove(data + pres, data, size);
        memcpy(data, pre, pres);
        memcpy(data + pres + size, suf, sufs + 1);

        if (xs_is_ptr(string))
            string->size = size + pres + sufs;
        else
            string->space_left = 15 - (size + pres + sufs);
    } else {
        xs tmps = xs_literal_empty();
        xs_grow(&tmps, size + pres + sufs);
        char *tmpdata = xs_data(&tmps);
        memcpy(tmpdata + pres, data, size);
        memcpy(tmpdata, pre, pres);
        memcpy(tmpdata + pres + size, suf, sufs + 1);
        xs_free(string);
        *string = tmps;
        string->size = size + pres + sufs;
    }
    return string;
}

回傳一個 prefix + string + suffix 的字串。

`xs_trim`

xs *xs_trim(xs *x, const char *trimset)
{
    if (!trimset[0])
        return x;

    char *dataptr = xs_data(x), *orig = dataptr;

    if (xs_cow_lazy_copy(x, &dataptr))
        orig = dataptr;

    /* similar to strspn/strpbrk but it operates on binary data */
    uint8_t mask[32] = {0};

#define check_bit(byte) (/* CCC */ mask[(uint8_t) byte / 8] & 1 << (uint8_t) byte % 8)
#define set_bit(byte) (/* SSS */ mask[(uint8_t) byte / 8] |= 1 << (uint8_t) byte % 8)
    size_t i, slen = xs_size(x), trimlen = strlen(trimset);

    for (i = 0; i < trimlen; i++)
        set_bit(trimset[i]);
    for (i = 0; i < slen; i++)
        if (!check_bit(dataptr[i]))
            break;
    for (; slen > 0; slen--)
        if (!check_bit(dataptr[slen - 1]))
            break;
    dataptr += i;
    slen -= i;

    /* reserved space as a buffer on the heap.
     * Do not reallocate immediately. Instead, reuse it as possible.
     * Do not shrink to in place if < 16 bytes.
     */
    memmove(orig, dataptr, slen);
    /* do not dirty memory unless it is needed */
    if (orig[slen])
        orig[slen] = 0;

    if (xs_is_ptr(x))
        x->size = slen;
    else
        x->space_left = 15 - slen;
    return x;
#undef check_bit
#undef set_bit
}

將 x 開頭及結尾含有 trimset 內的字元的部分截去。

2. 改進方案

union 中多餘的 data type

程式碼中可以看到 xs 所代表的字串有分為短中長三種，但這三種並不是一一對應 xs 的三種排列，短字串對應的是：

struct {
    uint8_t filler[15],
        space_left : 4,
        is_ptr : 1, 
        is_large_string : 1, 
        flag2 : 1, 
        flag3 : 1;
};

長字串則是：

struct {
        char *ptr;
        size_t size : 54,
            capacity : 6;
    };

可以發現 char data[16]; 是多餘的，其功能可以由 filler 代替，因為它們指向相同的記憶體位址，雖然 filler 的大小為 15 個 bytes，但當字串長度包含 \0 為 16 時，space_left 為 0、後面 4 個 flag 也因為是 short string 而為 0，因此 filler 後的最後一個 byte 也會是 \0。

但這樣設計要成立需要確保編譯器沒有為了避免 unaligned memory access 做 padding。

根據 ISO/IEC 9899:201x 6.7.2.1 p.113

If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit

以 1 word = 4 byte 來說，filler 佔據了 3 個 word 又 3 bytes，最後一個 word 恰好可以讓剩下的 bit fields 補齊，因此沒有 padding 發生。

struct {
    uint8_t filler[15],
        space_left : 4,
        is_ptr : 1, 
        is_large_string : 1, 
        flag2 : 1, 
        flag3 : 1;
};

做一個實驗來看看每個 bitfield 的位置是否不必加額外的 pack 也不會有多餘的 padding：

typedef struct {
    uint8_t filler[15],
        space_left : 4,
        is_ptr : 1, is_large_string : 1, flag2 : 1, flag3 : 1;
} s_t;

int main() {
    s_t s;
    printf("filler: %p\n", &s.filler);
    printf("space_left: %p\n", s.space_left);
    printf("is_ptr: %p\n", &s.is_ptr);
    printf("is_large_string: %p\n", &s.is_large_string);
    printf("flag2: %p\n", &s.flag2);
    printf("flag3: %p\n", &s.flag3);
}

但很顯然 bit field 無法被 address of：

padding_test.c:17:42: error: cannot take address of bit-field 'space_left'
   17 |     printf("space_left: %p\n", ((char *) &s.space_left));
      |                                          ^

因為 bit field 的單位是 bit，比指標指向的單位 (byte) 還要細。

在 ISO/IEC 9899:201x 6.7.2.1 p.112 提到：

The unary & (address-of) operator cannot be applied to a bit-field object; thus, there are no pointers to
or arrays of bit-field objects.

所以改由觀察整個 struct 所佔據的記憶體空間，使用 sizeof 來看是否大於 16：

int main() {
    printf("size: %lu\n", sizeof(s_t));
}

預期輸出：
size: 16

結果顯示確實沒有額外的 padding。

參考資料
Unaligned Memory Accesses

`xs_set_refcnt` checking

xs_set_refcnt 並沒有防止使用者對 short string、middle string 指派 reference count。

static inline void xs_set_refcnt(const xs *x, int val)
{
    if (xs_is_large_string(x))
        *((int *) ((size_t) x->ptr)) = val;
}

`xs_allocate_data`

考慮將已分配為 large string 重新分配成 medium string 的長度，應把 is_large_string 這個 flag 設為 0。

static void xs_allocate_data(xs *x, size_t len, bool reallocate)
{
    /* Medium string */
    if (len < LARGE_STRING_LEN) {
        x->ptr = reallocate ? realloc(x->ptr, (size_t) 1 << x->capacity)
                            : malloc((size_t) 1 << x->capacity);
        x->is_large_string = 0;
        return;
    }

`xs_grow` 邏輯錯誤

xs_grow 有許多明顯的問題：

在 short string 成長為 medium/large string 的情況，先更改了 is_ptr 才在後續判斷原本是否為 short string （利用 xs_is_ptr）
沒有考慮增長為 large string 的情況

用一個簡單的測試就能看到程式的問題：

int main(int argc, char *argv[])
{
    xs *string = xs_new(&xs_literal_empty(), "12345");
    printf("type should be short (0): %d, capcity: %ld\n", xs_type(string), xs_capacity(string));
    xs_grow(string, 16);
    printf("type should be medium (1): %d, capcity: %ld\n", xs_type(string), xs_capacity(string));
    xs_grow(string, 256);
    printf("type should be large (2): %d, capcity: %ld\n", xs_type(string), xs_capacity(string));

    return 0;
}

預期輸出：

type should be short (0): 0, capcity: 15
Segmentation fault

修改程式碼如下：

xs *xs_grow(xs *x, size_t len)
{
    // 將成長前的 is_ptr 保存起來
    bool is_ptr = x->is_ptr;
    x->is_ptr = true;

    if (len < LARGE_STRING_LEN) {
        if (is_ptr) {
            xs_allocate_data(x, len, 1);
        } else {
            xs_allocate_data(x, len, 0);
            memcpy(xs_data(x), buf, 16);
        }
    } else {
        // 增長為 large string 的情況
        x->is_large_string = true;
        if (is_ptr) {
            xs_allocate_data(x, len, 1);
        } else {
            xs_allocate_data(x, len, 0);
            memcpy(xs_data(x) + 4, buf, 16);
        }
    }
    return x;
}

預期輸出：

type should be short (0): 0, capcity: 15
type should be medium (1): 1, capcity: 31
type should be large (2): 2, capcity: 511

3. 字串複製 (CoW) 的函式實作

參照 5. 整合 string interning 的部分，將 xs_cow_lazy_copy 改寫，主要需要考量若 x 為 large string，那複製 data 到 x 時，必須一併處理 reference count，在複製前，應將 reference count 減一，然後再為 x 配置新的記憶體空間。

static void xs_cow_lazy_copy(xs *x, char **data)
{
    if (!x->is_large_string) {
        memcpy(xs_data(x), *data, strlen(*data));
        return;
    } else if (xs_get_refcnt(x) > 1) {
        /* Lazy copy */
        xs_dec_refcnt(x);
        xs_interning_nontrack(x, *data, x->size, 0);
    }

    if (data) {
        memcpy(xs_data(x), *data, x->size);
    }
    /* Update the newly allocated pointer */
    *data = xs_data(x);
    return;
}

考量到在 string interning 中的實作，會對比字串的 hash 是否跟 pool 中存在的 string 相同，若有則進一步比對，若確定與現存的 string 重複，則使用同一段記憶體空間。

但在 xs_cow_lazy_copy 的使用情境不同，即便字串相同，但可能即將被 concat 或 trim 等方式修改內容，所以會希望它被存在不同的記憶體空間，即使內容重複，因此實作 xs_interning_nontrack，將記憶體配置，而不透過 string interning 紀錄其值，也不會計算 hash。

static void xs_interning_nontrack(xs *x, const void *p, size_t len, bool reallocate)
{
    x->ptr = reallocate ? realloc(x->ptr, (size_t) 1 << x->capacity + 4)
                        : malloc((size_t) 1 << x->capacity + 4);
    memcpy(xs_data(x), p, len);
    xs_set_refcnt(x, 1);
}

而在 concat 及 trim 完成對字串的更動之後，加上這一個操作，將該記憶體位置加入 string interning 的紀錄。

if (string->is_large_string)
    add_interning_address(string->ptr);

add_interning_address 的實作如下，跟 add_interning 雷同，差別在於它直接接受一個位於 heap 的字串位址，不必再分配記憶體，而 add_interning 則是需要自行配置記憶體，再將該記憶體位址回傳給呼叫的函式，該函式需要負責將位址指派到 xs_data(xs)。

void add_interning_address(char *data)
{
    XS_LOCK();
    int len = strlen(data);
    uint32_t hash = hash_blob(data, len);

    if (!__intern_ctx.pool) {
        __intern_ctx.pool = malloc(sizeof(struct __xs_pool));
    }

    struct xs_node **n = &__intern_ctx.pool->node[hash];
    while (*n) {
        if (strcmp((*n)->data + 4, data) == 0) {
            ++(*(int *) ((size_t) (*n)->data));
            XS_UNLOCK();
            return;
        }
        n = &((*n)->next);
    }
    *n = malloc(sizeof(struct xs_node));
    (*n)->data = data;
    XS_UNLOCK();
    return;
}

4. 分析字串處理最佳化的空間效益

應考慮到 locality of reference，善用 GDB 追蹤和分析，確認字串的確符合預期地配置於 stack 或 heap;

// TODO 補完分析和實驗

`xs` 空間效率

針對 1000 個 short+medium 的字串做初始化：
xs

    KB
46.84^                                                                       #
     |                                                                     @@#
     |                                                                 @@::@@#
     |                                                              @@@@ : @@#
     |                                                           @@@@@ @ : @@#
     |                                                         :@@ @@@ @ : @@#
     |                                                     :::@:@@ @@@ @ : @@#
     |                                                  @@@: :@:@@ @@@ @ : @@#
     |                                              @@@@@@ : :@:@@ @@@ @ : @@#
     |                                            @@@ @@@@ : :@:@@ @@@ @ : @@#
     |                                         @@@@ @ @@@@ : :@:@@ @@@ @ : @@#
     |                                      :@@@@ @ @ @@@@ : :@:@@ @@@ @ : @@#
     |                                    @@:@ @@ @ @ @@@@ : :@:@@ @@@ @ : @@#
     |                                @@::@ :@ @@ @ @ @@@@ : :@:@@ @@@ @ : @@#
     |                             @@:@ : @ :@ @@ @ @ @@@@ : :@:@@ @@@ @ : @@#
     |                          @@@@@:@ : @ :@ @@ @ @ @@@@ : :@:@@ @@@ @ : @@#
     |                       @@@@@ @@:@ : @ :@ @@ @ @ @@@@ : :@:@@ @@@ @ : @@#
     |  ::                 @@@@@@@ @@:@ : @ :@ @@ @ @ @@@@ : :@:@@ @@@ @ : @@#
     |  :                ::@ @@@@@ @@:@ : @ :@ @@ @ @ @@@@ : :@:@@ @@@ @ : @@#
     |  :                ::@ @@@@@ @@:@ : @ :@ @@ @ @ @@@@ : :@:@@ @@@ @ : @@#
   0 +----------------------------------------------------------------------->ki
     0                                                                   486.2

std::string

    KB
77.54^                                                             #          
     |                                                             #:::::::@::
     |                                                             #:::::::@::
     |                                                             #:::::::@::
     |                                                             #:::::::@::
     |                                                             #:::::::@::
     |                                                             #:::::::@::
     |                                                             #:::::::@::
     |                                                             #:::::::@::
     |                                                             #:::::::@::
     |                                                             #:::::::@::
     |                                                             #:::::::@::
     |                                                             #:::::::@::
     |                                                             #:::::::@::
     |                                                             #:::::::@::
     |                                                             #:::::::@::
     |                                                             #:::::::@::
     |                                                             #:::::::@::
     |                                                             #:::::::@::
     |@                                                            #:::::::@::
   0 +----------------------------------------------------------------------->Mi
     0                                                                   2.430

5. 整合 string interning

嘗試用簡易的方式來追蹤程式中的 large string，加入 intern.[ch] 來為 xs 擴充。

大部分的 struct 為 cstr 的簡化：

struct xs_node {
    char *data;
    struct xs_node *next;
};

struct __xs_interning {
    int lock;
    struct __xs_pool *pool;
};

struct __xs_pool {
    struct xs_node *node[INTERNING_POOL_SIZE];
};

static struct __xs_interning __intern_ctx;

實作 add_interning，該函式主要會被 xs_new 呼叫，若是使用者新增了一個 large string，便會透過此函式分配記憶體，並計算 hash，放進 pool 中。

struct xs_node *add_interning(const char *str)
{
    XS_LOCK();
    int len = strlen(str);
    uint32_t hash = hash_blob(str, len);

    if (!__intern_ctx.pool) {
        __intern_ctx.pool = malloc(sizeof(struct __xs_pool));
    }

    struct xs_node **n = &__intern_ctx.pool->node[hash];
    while (*n) {
        if (strcmp((*n)->data + 4, str) == 0) {
            ++(*(int *) ((size_t) (*n)->data));
            XS_UNLOCK();
            return *n;
        }
        n = &((*n)->next);
    }
    *n = malloc(sizeof(struct xs_node));
    (*n)->data = malloc(sizeof(char) * len + 4);
    memcpy(((*n)->data + 4), str, len);
    XS_UNLOCK();
    return (*n);
}

修改過後的 xs_new，直接負責處理不同長度字串的邏輯，而 xs_allocate_data 也簡化為處理 middle string 的記憶體分配：

xs *xs_new(xs *x, const void *p)
{
    *x = xs_literal_empty();
    size_t len = strlen(p) + 1;
    if (len > LARGE_STRING_LEN) {
        x->capacity = ilog2(len) + 1;
        x->size = len - 1;
        x->is_ptr = true;
        x->is_large_string = true;
        xs_interning(x, p, x->size, 0);
    } else if (len > /* NNN */ MIDDLE_STRING_LEN) {
        x->capacity = ilog2(len) + 1;
        x->size = len - 1;
        x->is_ptr = true;
        xs_allocate_data(x, x->size, 0);
        memcpy(xs_data(x), p, len);
    } else {
        memcpy(x->data, p, len);
        x->space_left = 15 - (len - 1);
    }
    return x;
}

large string 的記憶體分配以及 string interning 紀錄則改由 xs_interning 來負責：

static void xs_interning(xs *x, const void *p, size_t len, bool reallocate)
{
    // the allocation and set refcnt is handled at interning
    struct xs_node *n = add_interning(p);
    x->ptr = n->data;
}

作為測試，新增三個長度超過 LARGE_STRING_LEN 的相同字串，並用 xs_data （效果同 string->ptr + 4）並用 %p 觀察是否有共用同一段記憶體位址。

#include "../xs.h"
int main(int argc, char *argv[])
{
    xs *string1 = xs_new(&xs_literal_empty(), "1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890");
    xs *string2 = xs_new(&xs_literal_empty(), "1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890");
    xs *string3 = xs_new(&xs_literal_empty(), "1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890");

    printf("string1: %p\n", xs_data(string1));
    printf("string2: %p\n", xs_data(string2));
    printf("string3: %p\n", xs_data(string3));
}

預期輸出:

string1: 0x55a42c2332d4
string2: 0x55a42c2332d4
string3: 0x55a42c2332d4

詳細實作可見 GitHub（TODO: 優化實作）。

6. Linux 中的 SSO、CoW 案例

CoW 的設計應該會有兩個情境：

不修改內容時，不複製使用共享的資料
預期會修改內容時，複製資料至新分配的記憶體空間

觀察 sk_buff，為代表 socket buffer 的 struct，有對應的函式 skb_copy、skb_clone：

`skb_copy`

/**
 *	skb_copy	-	create private copy of an sk_buff
 *	@skb: buffer to copy
 *	@gfp_mask: allocation priority
 *
 *	Make a copy of both an &sk_buff and its data. This is used when the
 *	caller wishes to modify the data and needs a private copy of the
 *	data to alter. Returns %NULL on failure or the pointer to the buffer
 *	on success. The returned buffer has a reference count of 1.
 *
 *	As by-product this function converts non-linear &sk_buff to linear
 *	one, so that &sk_buff becomes completely private and caller is allowed
 *	to modify all the data of returned buffer. This means that this
 *	function is not recommended for use in circumstances when only
 *	header is going to be modified. Use pskb_copy() instead.
 */

struct sk_buff *skb_copy(const struct sk_buff *skb, gfp_t gfp_mask)
{
	int headerlen = skb_headroom(skb);
	unsigned int size = skb_end_offset(skb) + skb->data_len;
	struct sk_buff *n = __alloc_skb(size, gfp_mask,
					skb_alloc_rx_flag(skb), NUMA_NO_NODE);

	if (!n)
		return NULL;

	/* Set the data pointer */
	skb_reserve(n, headerlen);
	/* Set the tail pointer and length */
	skb_put(n, skb->len);

	BUG_ON(skb_copy_bits(skb, -headerlen, n->head, headerlen + skb->len));

	skb_copy_header(n, skb);
	return n;
}

註解提到：

Make a copy of both an &sk_buff and its data. This is used when the caller wishes to modify the data and needs a private copy of the data to alter.

很明顯地這是 CoW 的情境，在預期會修改內容的情況複製，並將 reference count 設為 1。

`skb_clone`

/**
 *	skb_clone	-	duplicate an sk_buff
 *	@skb: buffer to clone
 *	@gfp_mask: allocation priority
 *
 *	Duplicate an &sk_buff. The new one is not owned by a socket. Both
 *	copies share the same packet data but not structure. The new
 *	buffer has a reference count of 1. If the allocation fails the
 *	function returns %NULL otherwise the new buffer is returned.
 *
 *	If this function is called from an interrupt gfp_mask() must be
 *	%GFP_ATOMIC.
 */

struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
{
	struct sk_buff_fclones *fclones = container_of(skb,
						       struct sk_buff_fclones,
						       skb1);
	struct sk_buff *n;

	if (skb_orphan_frags(skb, gfp_mask))
		return NULL;

	if (skb->fclone == SKB_FCLONE_ORIG &&
	    refcount_read(&fclones->fclone_ref) == 1) {
		n = &fclones->skb2;
		refcount_set(&fclones->fclone_ref, 2);
	} else {
		if (skb_pfmemalloc(skb))
			gfp_mask |= __GFP_MEMALLOC;

		n = kmem_cache_alloc(skbuff_head_cache, gfp_mask);
		if (!n)
			return NULL;

		n->fclone = SKB_FCLONE_UNAVAILABLE;
	}

	return __skb_clone(n, skb);
}

註解提到：

The new one is not owned by a socket. Both copies share the same packet data but not structure. The new buffer has a reference count of 1.

透過 skb_clone 複製的 socket buffer 會跟原本的 sk_buffer 共享 packet data (unsigned char *data)，並將 reference count 設為 1，可見 __skb_clone，在 sk_buff 中 reference count 紀錄在 refcount_t users 中：

static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
{
    // ...
    refcount_set(&n->users, 1);
    // ...
}

這和純粹的共享資料有點不同，可以注意到 skb_clone 依然將 reference count 設為 1，照 string interning 的邏輯，如果要重複使用相同的字串，應該要將 reference count 加一。

那增加 reference count 的情境應該是本來預期的共享資料，因此用 refcount_inc\(&[a-zA-Z]+->users\) 搜尋，但並沒有特定的函式負責管理，而是每個函式在接收到 sk_buff 的指標副本後負責增減 reference count，各自管理。

`skb_unshare`

/*
 *	Copy shared buffers into a new sk_buff. We effectively do COW on
 *	packets to handle cases where we have a local reader and forward
 *	and a couple of other messy ones. The normal one is tcpdumping
 *	a packet thats being forwarded.
 */

/**
 *	skb_unshare - make a copy of a shared buffer
 *	@skb: buffer to check
 *	@pri: priority for memory allocation
 *
 *	If the socket buffer is a clone then this function creates a new
 *	copy of the data, drops a reference count on the old copy and returns
 *	the new copy with the reference count at 1. If the buffer is not a clone
 *	the original buffer is returned. When called with a spinlock held or
 *	from interrupt state @pri must be %GFP_ATOMIC
 *
 *	%NULL is returned on a memory allocation failure.
 */
static inline struct sk_buff *skb_unshare(struct sk_buff *skb,
					  gfp_t pri)
{
	might_sleep_if(gfpflags_allow_blocking(pri));
	if (skb_cloned(skb)) {
		struct sk_buff *nskb = skb_copy(skb, pri);

		/* Free our shared copy */
		if (likely(nskb))
			consume_skb(skb);
		else
			kfree_skb(skb);
		skb = nskb;
	}
	return skb;
}

另外還有 skb_unshare 這樣的函式，先用 skb_cloned 來判斷拿到的 skb 是否為透過 skb_clone 得到的，如果是就重新呼叫 skb_copy 斷開兩者共享的 packet data。

2021q1 Homework3 (quiz3)

tags: linux2021

進度

Q1: Small String Optimization

1. 程式碼運作原理

xs_data 回傳字串的開頭

xs_size 回傳字串的長度

xs_capacity

xs_*_refcnt

ilog2 取對數

xs_allocate_data 配置記憶體

xs_new

xs_grow

xs_cow_lazy_copy

xs_concat

xs_trim

2. 改進方案

union 中多餘的 data type

xs_set_refcnt checking

xs_allocate_data

xs_grow 邏輯錯誤