2021q1 Homework3 (quiz3)

contributed by < YoLinTsai >

tags: `linux2021-hw`

題目

延伸問題:

解釋上述程式碼運作原理，並提出改進方案;
以上述程式碼為基礎，提供字串複製 (應考慮到 CoW) 的函式實作;
設計實驗，確認上述字串處理最佳化 (針對空間效率) 帶來的效益，應考慮到 locality of reference，善用 GDB 追蹤和分析，確認字串的確符合預期地配置於 stack 或 heap;
嘗試將 quiz2 提到的 string interning 機制整合，提出對空間高度最佳化的字串處理工具函式庫
在 Linux 核心原始程式碼中，找出類似手法的 SSO 或 CoW 案例並解說;

先來看最重要的這段 union 結構，我的理解是， union 是共享一塊記憶體，根據 assign 進來的變數改變記憶體內容，利用這樣的特性，即便我們宣告了數種不同的 struct 結構，透過巧妙的記憶體空間安排，我們可以有效的共用一些記憶體的內容。

typedef union {
    /* allow strings up to 15 bytes to stay on the stack
     * use the last byte as a null terminator and to store flags
     * much like fbstring:
     * https://github.com/facebook/folly/blob/master/folly/docs/FBString.md
     */
    char data[16];

    struct {
        uint8_t filler[15],
            /* how many free bytes in this stack allocated string
             * same idea as fbstring
             */
            space_left : 4,
            /* if it is on heap, set to 1 */
            is_ptr : 1, is_large_string : 1, flag2 : 1, flag3 : 1;
    };

    /* heap allocated */
    struct {
        char *ptr;
        /* supports strings up to 2^MAX_STR_LEN_BITS - 1 bytes */
        size_t size : MAX_STR_LEN_BITS,
                      /* capacity is always a power of 2 (unsigned)-1 */
                      capacity : 6;
        /* the last 4 bits are important flags */
    };
} xs;

以下是記憶體安排的示意圖，特別注意這樣巧妙的安排，可以使不同的方式都能安全的調用 flag ，回顧 fbstring 中的特性，比方說我們把 filler 填滿，最後一個字元剛好是0x00， flag 的意義仍然正確。

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

xs_new
















xs *xs_new(xs *x, const void *p)
{
    *x = xs_literal_empty();
    size_t len = strlen(p) + 1;
    if (len > 16) { //NNN
        x->capacity = ilog2(len) + 1;
        x->size = len - 1;
        x->is_ptr = true;
        xs_allocate_data(x, x->size, 0);
        memcpy(xs_data(x), p, len);
    } else {
        memcpy(x->data, p, len);
        x->space_left = 15 - (len - 1);
    }
    return x;
}

xs_new是 xs 剛宣告時會被呼叫來 allocate 記憶體的 function，如下段所示






#define xs_tmp(x)                                                   \
    ((void) ((struct {                                              \
         _Static_assert(sizeof(x) <= MAX_STR_LEN, "it is too big"); \
         int dummy;                                                 \
     }){1}),                                                        \
     xs_new(&xs_literal_empty(), x))

NNN 為16的理由是當字串長度超過16時， xs 才會用 large_string 的 policy，接著來看ilog2：

static inline int ilog2(uint32_t n) { return 32 - __builtin_clz(n) - 1; } //LLL

ilog2 是為了找最接近但不出過 n 的 2^k ，這裡用clz來找 32 bits integer 中從右邊數過來第一個 1 之前有幾個 0，接著2^(ilog2(len) + 1)即是 large_string 應該要 allocate 的記憶體空間。

xs_trim

如同 quiz3 中的說明， xs_trim 的用途是將字串前後指定的字元刪除，我們先看判斷和刪除的部份






















    uint8_t mask[32] = {0};

#define check_bit(byte) (mask[(uint8_t) byte / 8] & 1 << (uint8_t) byte % 8) //CCC
#define set_bit(byte) (mask[(uint8_t) byte / 8] |= 1 << (uint8_t) byte % 8) //SSS
    size_t i, slen = xs_size(x), trimlen = strlen(trimset);

    for (i = 0; i < trimlen; i++)
        set_bit(trimset[i]);
    for (i = 0; i < slen; i++)
        if (!check_bit(dataptr[i]))
            break;
    for (; slen > 0; slen--)
        if (!check_bit(dataptr[slen - 1]))
            break;
    dataptr += i;
    slen -= i;

    /* reserved space as a buffer on the heap.
     * Do not reallocate immediately. Instead, reuse it as possible.
     * Do not shrink to in place if < 16 bytes.
     */
    memmove(orig, dataptr, slen);

這裡的手法是做兩次 iteration ，一次從頭一次從尾，判斷好 trim 完的邊界後用 memmove 修剪完成。
值得注意的是這裡用來判斷是否為需要刪除字元的手法相當細緻，在 ascii 的編碼下，字元的值可能是 0~255，如此一來我們可以用一個 256 bits 的 mask 去表達哪幾個字元是要被刪除的。
我們假設指定刪除字元為 x:
set_bit(x): mask[ascii(x)] = 1
check_bit(x): return mask[ascii(x)] == 1
而 CCC 和 SSS 的 bit operation 正符合上述的邏輯。

xs_allocate_data

這段是當 xs_new 判斷字串長度超過 16 時，便會改成用 medium/large string 的 struct 去儲存，同時也會用 xs_allocate_data 去 malloc 記憶體，同時我們可以發現，當字串長度超過 LARGE_STRING_LEN (256) 時，會多申請 4 個 bits 用來紀錄 reference count。



















static void xs_allocate_data(xs *x, size_t len, bool reallocate)
{
    /* Medium string */
    if (len < LARGE_STRING_LEN) {
        x->ptr = reallocate ? realloc(x->ptr, (size_t) 1 << x->capacity)
                            : malloc((size_t) 1 << x->capacity);
        return;
    }

    /* Large string */
    x->is_large_string = 1;

    /* The extra 4 bytes are used to store the reference count */
    x->ptr = reallocate ? realloc(x->ptr, (size_t)(1 << x->capacity) + 4)
                        : malloc((size_t)(1 << x->capacity) + 4);

    xs_set_refcnt(x, 1);
}

xs_copy

這裡同時會解釋 CoW 和 Reference Count


























void *xs_copy(xs *dst, xs *src)
{
    xs_free(dst);
    //*dst = xs_literal_empty();
    size_t len = xs_size(src);

    if(len > 16){
        dst->capacity = src->capacity;
        dst->size = len;
        dst->is_ptr = true;
        if(len > LARGE_STRING_LEN) {
            dst->ptr = src->ptr;
            dst->is_large_string = true;
            xs_inc_refcnt(src);
        } else {
            xs_allocate_data(dst, dst->size, 0);
            memcpy(xs_data(dst), xs_data(src), len);
        }
    }else{
        dst->is_ptr = false;
        memcpy(dst->data, xs_data(src), len);
        dst->space_left = 15 - len;
    }
}

首先 reference count 在的意義是：有幾個 pointer 指著這塊記憶體，在 sso 當中的意含就是有幾個 Large String 的 ptr 指向同一塊記憶體。
那為什麼會發上述情形呢？因為 Copy on Write 的設計導致會有上述狀況， CoW 的使用情境是這樣：當我想 copy 一塊記憶體的時候，如果我接著沒有馬上要用，那我在 copy 的當下就去 allocate 似乎是個有點浪費空間和時間的作法
取而代之的，我們可以偷偷的把 pointer 指向被複製的 address ，一旦發現我真的要改他的時候，而且有複數個 pointer 指向該記憶體，再去申請一塊新的記憶體。
如上面那段 code ，我們先判斷是否為長字串，接著判斷是否為 large_string ，如果為真就用 CoW ，並調整 reference count (refcnt)。

gdb檢驗





    xs string = *xs_tmp(longString);
    xs copyString;
    xs_copy(&copyString, &string);
    xs prefix = *xs_tmp("((("), suffix = *xs_tmp(")))");
    xs_concat(&copyString, &prefix, &suffix);

vmmap

0x000055555555a000 0x000055555557b000 rw-p	[heap]
0x00007ffffffde000 0x00007ffffffff000 rw-p	[stack]

print string





{
    ptr = 0x55555555a2a0 "\002",
    size = 0x10e,
    capacity = 0x9
}

print copyString





 {
    ptr = 0x55555555a2a0 "\002",
    size = 0x10e,
    capacity = 0x9
 }

print copyString (after trim)





 {
    ptr = 0x55555555a8c0 "\001",
    size = 0x114,
    capacity = 0x9
 }

這邊檢查了 ptr 有正確指向 heap 位置，同時檢測 CoW 前後是否有確實 malloc 新的記憶體並將 ptr 指向新的位置，此外一併檢查了 refcnt 的正確性。

設計實驗

我們實做的 xs_cpy w CoW:

 Performance counter stats for './bench 10000 0':

            24,279      cache-misses              #   44.099 % of all cache refs    
            55,056      cache-references                                            

       0.002458456 seconds time elapsed

       0.002419000 seconds user
       0.000000000 seconds sys

對比 strncpy 每次 copy 都去 malloc 一塊

 Performance counter stats for './bench 10000 1':

            65,038      cache-misses              #   55.951 % of all cache refs    
           116,242      cache-references                                            

       0.001360883 seconds time elapsed

       0.001375000 seconds user
       0.000000000 seconds sys

可以發現 CoW 的手法會讓 cache-misses 的數量下降許多，其實這也符合 CoW 設計的手法，同時我們也可以發現 CoW 所花的時間是比較長的。(待確認原因)
massif