2020q1 Homework3 (quiz3)

contributed by < ambersun1234 >

開發環境

$ uname -a
Linux station 5.4.72-microsoft-standard-WSL2 #1 SMP Wed Oct 28 23:40:43 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

$ gcc -v
gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)

解釋程式運作原理




























typedef union {
    /* allow strings up to 15 bytes to stay on the stack
     * use the last byte as a null terminator and to store flags
     * much like fbstring:
     * https://github.com/facebook/folly/blob/master/folly/docs/FBString.md
     */
    char data[16];

    struct {
        uint8_t filler[15],
            /* how many free bytes in this stack allocated string
             * same idea as fbstring
             */
            space_left : 4,
            /* if it is on heap, set to 1 */
            is_ptr : 1, flag1 : 1, flag2 : 1, flag3 : 1;
    };

    /* heap allocated */
    struct {
        char *ptr;
        /* supports strings up to 2^54 - 1 bytes */
        size_t size : 54,
            /* capacity is always a power of 2 (unsigned)-1 */
            capacity : 6;
        /* the last 4 bits are important flags */
    };
} xs;

上述 union 大小總共 16 bytes，其中 data[16] 與另外兩個 struct 共用這個 16 bytes;
根據講解，長度

\leq 15

的字串會存放在 stack，長度

\geq 16

的字串會放在 heap，但是我們可以很明顯地觀察到 data 總共擁有 16 bytes，多的那個 1 個 bytes 則是為了要讓 space_left, is_ptr, flag1, flag2, flag3 用的，而其中 filler[15] 是為了要對齊 data 前 15 bytes 所用，避免更改到資料部分
注意到 union 中 bit field 的用法，根據 C 語言規格書 §6.7.2.1 - 8

A member of a structure or union may have any object type other than a variably modified type.105) In addition, a member may be declared to consist of a specified number of bits (including a sign bit, if any). Such a member is called a bit-field; 106) its width is preceded by a colon.

bit field 可以定義該欄位需要多少 bits，在空間利用上，這是一個很好的技巧

static inline char *xs_data(const xs *x)
{
    if (!xs_is_ptr(x))
        return (char *) x->data;

    if (xs_is_large_string(x))
        return (char *) (x->ptr + OFF);
    return (char *) x->ptr;
}

這個函式主要在取得對應的指標，分別有放在短字串的 x->data、中字串的 x->ptr 以及長字串的 x->ptr + OFF
根據註解 /* The extra 4 bytes are used to store the reference count */ 我們可以得知 OFF 為 4 bytes

/* lowerbound (floor log2) */
static inline int ilog2(uint32_t n) { return LLL; }

取出小於等於 n 的 power of 2，考慮

13_{(10)} = 0000 . . . . . 1101_{(2)}

距離 13 最近的 power of 2 為 16(

0000 . . . . 0001 0000_{(2)}

) 在位置 4 上面
因為 power of 2 在二進位的表示上，都只會有一個 bit
13 的二進位上，只有4個 bit 有值，而使用 clz 我們可以很輕易的取得前方有多少位元空著的，透過簡單的減法運算可以得到 4(32 - clz(n))
又因為這個函式是要取 lowerbound ，所以必須在 - 1

也可以用 ctz 進行改寫
變成

/* lowerbound (floor log2) */
static inline int ilog2(uint32_t n) { return __builtin_ctz(n) - 1; }

xs *xs_new(xs *x, const void *p)
{
    *x = xs_literal_empty();
    size_t len = strlen(p) + 1;
    if (len > NNN) {
        x->capacity = ilog2(len) + 1;
        x->size = len - 1;
        x->is_ptr = true;
        xs_allocate_data(x, x->size, 0);
        memcpy(xs_data(x), p, len);
    } else {
        memcpy(x->data, p, len);
        x->space_left = 15 - (len - 1);
    }
    return x;
}

NNN 為 16，因為大於 16 就必定需要用到 heap 進行存取

#define check_bit(byte) (CCC)
#define set_bit(byte) (SSS)
    size_t i, slen = xs_size(x), trimlen = strlen(trimset);

    for (i = 0; i < trimlen; i++)
        set_bit(trimset[i]);
    for (i = 0; i < slen; i++)
        if (!check_bit(dataptr[i]))
            break;
    for (; slen > 0; slen--)
        if (!check_bit(dataptr[slen - 1]))
            break;
    dataptr += i;
    slen -= i;

    /* reserved space as a buffer on the heap.
     * Do not reallocate immediately. Instead, reuse it as possible.
     * Do not shrink to in place if < 16 bytes.
     */
    memmove(orig, dataptr, slen);
    /* do not dirty memory unless it is needed */
    if (orig[slen])
        orig[slen] = 0;

    if (xs_is_ptr(x))
        x->size = slen;
    else
        x->space_left = 15 - slen;
    return x;
#undef check_bit
#undef set_bit

該 function 為 trim，也就是去除特定字元，所以不難猜測到 check_bit 是用於檢查是否有一樣的字元，而 set_bit 則是將要去除字元寫入 mask
值得注意的是，解答給了一個我覺得很神奇的解法

mask[(uint8_t) byte / 8] & 1 << (uint8_t) byte % 8
mask[(uint8_t) byte / 8] |= 1 << (uint8_t) byte % 8

根據 ASCII 的定義，它包含了 128 個字元 (可用 8 個 bits 表示)
解答採用的方式是，用一個 map 去對應，前 5 個 bits 是 index，後 3 個 bits 是 value
在 value 的方面，也不是簡單的儲存 bits，而是把它當作 shift amount
因為，可想而知，如果直接採用後 3 bits，一定會遇到 000 的情況，變成再比對的時候，條件會一直成立，即使沒有把該字元寫入 trimset (因為一開始就把 mask 初始化為 0)
也就是說，value 的值會從 2 (1 << 1) ~ 128 (1 << 7)

字串複製 (應考慮到 CoW) 的實作

// parameter: src, des
static xs *xs_cow_copy(xs *des, xs *src) {
    if (!xs_is_ptr(src)) {
        *des = *src;
    } else if (xs_is_large_string(src)) {
        des->capacity = src->capacity;
        des->is_ptr   = src->is_ptr;
        des->size     = src->size;
        xs_inc_refcnt(src);
        des->ptr = xs_data(src);
    } else {
        des->capacity = src->capacity;
        des->is_ptr   = src->is_ptr;
        des->size     = src->size;
        size_t len = xs_size(src) + 1;
        xs_allocate_data(des, len, 0);
        memcpy(xs_data(des), xs_data(src), len);
    }
}

輸出如下

test cow copy on stack
[(((foobarbar)))] : 15
[(((foobarbar)))] : 15
[0x7fff06372490] : [0x7fff06372430]

test cow copy on heap
[abcdefghijklmnopqrstuv] : 22
[abcdefghijklmnopqrstuv] : 22
[0x5573b42c56e0] : [0x5573b42c56b0]

test cow copy on heap
[cHbC2VnYvV3O0MMwC1GEVfWQUHozZUkicxkxEgP2ySifc7P5edWzuyPiOoyxYAmVs5ExtHMDlPUYveflnJqF3b9X9qWtQmrsHuUOrnGfLLyrRM23t5SsnGTv7ArtseIG4N4NBSm632jQJTdu11lcGJQZj5mFnQE0Zcc78J9cazKYLF0AYe4bLiUTXJKsMKhxcvqsIVBQvpedDViocbVgcatIRsqnFUtywx5DLlHjcBNpMNXyAf47gXa4NEctfc7UioLx] : 260
[cHbC2VnYvV3O0MMwC1GEVfWQUHozZUkicxkxEgP2ySifc7P5edWzuyPiOoyxYAmVs5ExtHMDlPUYveflnJqF3b9X9qWtQmrsHuUOrnGfLLyrRM23t5SsnGTv7ArtseIG4N4NBSm632jQJTdu11lcGJQZj5mFnQE0Zcc78J9cazKYLF0AYe4bLiUTXJKsMKhxcvqsIVBQvpedDViocbVgcatIRsqnFUtywx5DLlHjcBNpMNXyAf47gXa4NEctfc7UioLx] : 260
[0x5573b42c5714] : [0x5573b42c5714]
reference count: 2

實作 Copy on Write - CoW 時要注意長字串還有 reference count 需要修正。

設計實驗，確認上述字串處理最佳化 (針對空間效率) 帶來的效益，應考慮到 locality of reference

以時間效率來說，根據 Stack vs Heap Memory Allocation 中指出，stack memory allocation 中的操作由編譯器事先規劃，而且記憶體區段是連續的，也因此相較於 heap memory allocation 來說，存取速度會比 heap 還要快
空間效率的部分。C++ std::string 只能使用一半的大小，詳見以下實驗

參照 Exploring std::string 做了一個實驗(實作程式碼: linux2021q1 quiz3/stdstringTest.cpp)

#include <iostream>
#include <string>
#include <cstdlib>

void *operator new(std::size_t n) {
    std::cout << "Allocating " << n << " bytes";
    return malloc(n);
}

void operator delete(void *p) throw() {
    free(p);
}

int main(int argc, const char *argv[]) {
    std::cout << "sizeof(std::string): " << sizeof(std::string) << std::endl;
    for (int i = 0; i < 32; i++) {
        std::cout << i << ": " << std::string(i, '=') << std::endl;
    }

    return 0;
}

得到的測試結果如下

sizeof(std::string): 32
0:
1: =
2: ==
3: ===
4: ====
5: =====
6: ======
7: =======
8: ========
9: =========
10: ==========
11: ===========
12: ============
13: =============
14: ==============
15: ===============
16: Allocating 17 bytes================
17: Allocating 18 bytes=================
18: Allocating 19 bytes==================
19: Allocating 20 bytes===================
20: Allocating 21 bytes====================
21: Allocating 22 bytes=====================
22: Allocating 23 bytes======================
23: Allocating 24 bytes=======================
24: Allocating 25 bytes========================
25: Allocating 26 bytes=========================
26: Allocating 27 bytes==========================
27: Allocating 28 bytes===========================
28: Allocating 29 bytes============================
29: Allocating 30 bytes=============================
30: Allocating 31 bytes==============================
31: Allocating 32 bytes===============================

由上可得知，std::string 僅能使用一半的空間

考慮空間效率實驗，詳見 linux2021q1 quiz3/spaceTest.cpp

$↑$ 針對字串大小為
$1 \to 300$ 的 massif 輸出

$↑$ 針對字串大小為
$1 \to 500$ 的 massif 輸出

綠色的部分是 xs 本身的佔用空間，黃色的部分是 std::string，青色的部分是 xs heap 的部分

根據

1 \to 300

的輸出 (上圖) 可以得知，xs 的空間隨著字串大小越來越大，直到 43.0KB(綠色部分)，後來就換成青色的部分，也就是 heap (malloc) 的部分

接著考慮

1 \to 500

的輸出 (下圖)，可以發現，黃色的部分 (std::string) 相比青色 (xs heap)，增長的更快，也就是說，xs 的實作相比 std::string 來說確實可以減少記憶體使用量

tags: `linux2021`

Jim Huang

2021/03/21 20:28:02

青色

青色的英語是 cyan，例如 https://wingolog.org/archives/2008/05/05/catching-memory-leaks-with-valgrinds-massif 這篇也用 cyan 來描述指定範圍 (Edited)

2021/03/21 20:30:19

ambersun1234

不要打錯自己的 GitHub 帳號名稱 (Edited)

2020q1 Homework3 (quiz3)

開發環境

解釋程式運作原理

字串複製 (應考慮到 CoW) 的實作

設計實驗，確認上述字串處理最佳化 (針對空間效率) 帶來的效益，應考慮到 locality of reference

tags: linux2021

Read more

2021q1 Homework1 (quiz1)

2021q1 Homework4 (quiz4)

2020q1 Homework (期末專題)

C 語言自學經典

tags: `linux2021`