2020q1 Homework2 (quiz2)

contributed by < OscarShiang >

tags: `linux2020`

測試環境

$ cat /etc/os-release 
NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

$ cat /proc/version
Linux version 5.3.0-40-generic (buildd@lcy01-amd64-024) 
(gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) 
#32~18.04.1-Ubuntu SMP Mon Feb 3 14:05:59 UTC 2020

作業要求

重新回答第 2 周測驗題
解釋程式碼運作原理，並提出改進方案，可善用 GDB 追蹤和分析;
以上述程式碼為基礎，提供字串複製 (應考慮到 CoW) 和類似 strtok 功能的函式實作;
設計實驗，確認上述字串處理最佳化 (針對空間效率) 帶來的效益，應考慮到 locality of reference;
在 Linux 核心原始程式碼中，找出類似手法的 SSO 或 CoW 案例並解說;

測驗題分析

本題所在的位置是在 xs.c:xs_new 函式中

xs *xs_new(xs *x, const void *p)
{
    *x = xs_literal_empty();
    size_t len = strlen(p) + 1;
    if (len > AAA) {
        x->capacity = ilog2(len) + 1;
        x->size = len - 1;
        x->is_ptr = true;
        x->ptr = malloc((size_t) 1 << x->capacity);
        memcpy(x->ptr, p, len);
    } else {
        memcpy(x->data, p, len);
        x->space_left = 15 - (len - 1);
    }
    return x;
}

這個函式的用途是初始化字串結構 xs 的相關設定，觀察 xs 的結構可以發現

typedef union {
    /* allow strings up to 15 bytes to stay on the stack
     * use the last byte as a null terminator and to store flags
     */
    char data[16];

    struct {
        uint8_t filler[15],
            /* how many free bytes in this stack allocated string
             * same idea as fbstring
             */
            space_left : 4,
            /* if it is on heap, set to 1 */
            is_ptr : 1, flag1 : 1, flag2 : 1, flag3 : 1;
    };

    /* heap allocated */
    struct {
        char *ptr;
        /* supports strings up to 2^54 - 1 bytes */
        size_t size : 54,
            /* capacity is always a power of 2 (unsigned)-1 */
            capacity : 6;
        /* the last 4 bits are important flags */
    };
} xs;

可以看到 len 的大小就是 p 指標所指向的字元陣列包含空字元的大小，而在 xs_new:11 的地方就是將字串複製到 data 中

但是因為 data 的總長度只有 16 個 byte ，所以可以判斷 xs_new:5 的用意是判斷該字串 p 是否有超出 data 可以儲存的長度，所以正確答案就是：

AAA = 16

這題是在 xs.c:xs_trim 的位置，因為在這個函式中 mask 的用途是將 trimset 中的字元記錄在 mask 中

觀察 xs_trim 中的 macro

#define check_bit(byte) (mask[(uint8_t) byte / 8] CCC 1 << (uint8_t) byte % 8)
#define set_bit(byte) (mask[(uint8_t) byte / 8] |= 1 << (uint8_t) byte % 8)

可以發現 set_bit 可以把字元的數值，記錄在 mask 的其中一個位元中，所以考慮所有字元都可以存入的需求，也就是 ACSII 所規範的 255 的字元都可以利用 mask 記錄下來的情況， BBB 應該為 32 ，也就是

32 \times 8 = 256

個 bit 才能符合要求

BBB = 32

根據 xs_trim 中對於 check_bit 的用途

    for (i = 0; i < trimlen; i++)
        set_bit(trimset[i]);
    for (i = 0; i < slen; i++)
        if (!check_bit(dataptr[i]))
            break;
    for (; slen > 0; slen--)
        if (!check_bit(dataptr[slen - 1]))
            break;
    dataptr += i;
    slen -= i;

也就是確認該字元是否是要修剪的字元，所以應該算出來的數值應該是

非 0 : 該字元是 trimset 中所出現的字元
0 : 該字元不是 trimset 中所出現的字元

而因為該字元所對應到 mask 的位置就是 1 << (uint8_t) byte % 8) 的位置，所以答案就是：

CCC = &

解釋程式運作原理

xs_tmp

根據 macro 的定義， xs_tmp 的用途就是回傳一個已經透過 xs_new 初始化的字串

#define xs_tmp(x)                                          \
    ((void) ((struct {                                     \
         _Static_assert(sizeof(x) <= 16, "it is too big"); \
         int dummy;                                        \
     }){1}),                                               \
     xs_new(&xs_literal_empty(), "" x))

因為在 main 函式中的呼叫是不是存進 pointer 中，而在 xs_tmp 最後所呼叫的 *xs_new 的回傳型態是卻是 pointer ，所以需要透過 derefernce 取得

思考上述程式碼列表的 16 這個數值，解釋其作用，考慮到 cache，提出調整方案
jserv

好的，正在研讀 Cache 原理和實際影響，並思考改進的空間
Oscar[color= orange]

xs_size

在 xs 的結構中，如果使用 stack 來儲存字串的話就會利用 space_left 來記錄剩餘的 byte 數，所以如果在 xstring 還是使用 stack 來記錄資料就可以利用 16 - space_left 取得字串長度，但是如果 stack 剛好用完，也就是 '\0' 位於第 16 個 byte 上也不會發生問題，因為 '\0' 在 ASCII 中對應到的就是數值上的 0 所以即使因為 '\0' 位在第 16 個 byte 而覆蓋到 space_left 的資料， xs_size 還是能正確地計算出字串的長度

改進 xs_concat

在 xs_concat 實作中如果遇到連接的結果超出 string 的 stack 所能管理的長度時，會透過宣告一個新的字串 tmp 來連接字串，最後將原本的字串 string 釋放掉，換成 tmp 的字串

但是因為我們可以透過 xs_grow 調整字串的容量，所以就不需要另外宣告新的字串來處理連接，所以我透過 xs_grow 調整 string 的長度接著將 data 指標更新，最後再將字串複製到 string 裡面

xs *xs_concat(xs *string, const xs *prefix, const xs *suffix)
{
    size_t pres = xs_size(prefix), sufs = xs_size(suffix),
           size = xs_size(string), capacity = xs_capacity(string);

    char *pre = xs_data(prefix), *suf = xs_data(suffix),
         *data = xs_data(string);

    if (size + pres + sufs <= capacity) {
        memmove(data + pres, data, size);
        memcpy(data, pre, pres);
        memcpy(data + pres + size, suf, sufs + 1);
        string->space_left = 15 - (size + pres + sufs);
    } else {
-       xs tmps = xs_literal_empty();
-       xs_grow(&tmps, size + pres + sufs);
-       char *tmpdata = xs_data(&tmps);
-       memcpy(tmpdata + pres, data, size);
-       memcpy(tmpdata, pre, pres);
-       memcpy(tmpdata + pres + size, suf, sufs + 1);
-       xs_free(string);
-       *string = tmps;
+       xs_grow(string, size + pres + sufs);
+       data = xs_data(string);
+       memmove(data + pres, data, size);
+       memcpy(data, pre, pres);
+       memcpy(data + pres + size, suf, sufs + 1);
+       string->size = size + pres + sufs;
    }
    return string;
}

為了確定修改過後的版本能夠減少執行時間，我利用開機參數 isolcpus 將 CPU0 排除在開機程序外，接著透過 taskset 將程序綁定在 CPU0 上，並重複測量

10^{4}

次，以 95% 的區間進行平均

而實驗結果發現修改過後的執行時間較原先的版本短，如下圖所示

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

因為省去 malloc 新空間與釋放 string 的程序，所以執行的效率較原先的版本好

使用 Valgrind 檢查記憶體使用

利用 valgrind 檢查 memory leak 時發現，在字串使用 heap 來儲存時，會因為在 main 函式中沒有被釋放而導致 definitely lost ，所以必須在 main 函式使用完 xs 後強制進行 xs_free ，如果該 xs 是使用 heap 來儲存的話才能將 malloc 的空間歸還給記憶體

為了能在程式結束時自動執行 xs_free 函式釋放透過 heap 存取的字串，我在 xs 變數宣告的時候加入 gcc attribute __cleanup__ 綁定 xs_free 函式，讓其在程式結束時會被呼叫，釋放記憶體空間

#define autofree __attribute__((cleanup(xs_free)))
autofree xs string;

設計字串複製與 strtok 函式

因為需要考慮到 Copy-on-Write (CoW) 的議題，所以將字串複製分成兩種情況考慮

字串長度小於 16 ，直接將字串複製到 stack
字串長度大於 16 ，透過 CoW 將 char *ptr 指向來源字串

xs *xs_copy(xs *dst, xs *src)
{
    if (xs_is_ptr(dst))
        *dst = *xs_free(dst);
    if (xs_size(src) <= 16) {
        memcpy(dst, src, xs_size(src));
        dst->is_ptr = 0;
        dst->space_left = src->space_left;
    }
    else {
        dst->ptr = src->ptr;
        dst->is_ptr = 1;
        dst->size = xs_size(src);
        dst->flag1 = 1; // indicate that the string is just a reference
    }
    return dst;
}

顧及 CoW，其他操作字串的函式也需修改

針對因為改寫來源字串而需要複製字串的函式定義新的函式 xs_dup 來簡化流程

static xs *xs_dup(xs *dst) 
{
    int len = xs_size(dst);
    char *srcptr = xs_data(dst);

    *dst = *xs_newempty(dst);
    char *dataptr = xs_data(dst);

    memcpy(dataptr, srcptr, len);
    return dst;
}

因為如果 dst 是指標的話，我們可以先從 xs_data 函式取得 dst 指標對應的字串位置，並利用 xs_size 函式取得字串長度，當取得字串位置與字串長度之後，將 dst 字串重置，並將字串複製到 dst 中，最後回傳 dst 位置，完成複製

接著建構 xs_is_ref 函式讓我們可以知道該字串是否為 reference ，並將會操作到字串內容的函式： xs_concat 與 xs_trim 函式加入以下操作，在操作字串為參考的情況下複製字串以供操作

if (xs_is_ref(string))
    *string = *xs_dup(string);

接著著手建構 xs_tok 函式

根據 Linux Programmer's Manual 所描述的 strtok ：

The strtok() function is used to isolate sequential tokens in a null-terminated string, str.
These tokens are separated in the string by at least one of the characters in sep.
The first time that strtok() is called, str should be specified; subsequent calls, wishing to obtain further tokens from the same string, should pass a null pointer instead.
The separator string, sep, must be supplied each time, and may change
between calls.

因為在 xs_tok 中也要使用在 xs_trim 所使用到的 mask 的技巧，所以將 #undef 的巨集移動到 xs_tok 的結尾處

利用 for 迴圈判斷所在字元是否是 sep 集合中的其中一種，如果是的話，就將該字元改為 \0 ，用指標的指標更新 main 函式內的字元指標 last 後，回傳字串開頭的指標完成 xs_tok

char *xs_tok(xs *str, const char *sep, char **last)
{
    char *dataptr;
    if (str) {
        if (xs_is_ref(str))
            *str = *xs_dup(str);
        dataptr = xs_data(str);
    } else
        dataptr = *last;
    if (!*dataptr)
        return NULL;

    uint8_t mask[32] = {0};
    size_t i, len = strlen(dataptr), seplen = strlen(sep);
    for (i = 0; i < seplen; i++)
        set_bit(sep[i]);

    for (*last = dataptr; **last; ++(*last)) {
        if (check_bit(**last)) {
            **last = '\0';
            ++(*last);
            break;
        }
    }
    return dataptr;
#undef check_bit
#undef set_bit
}

參考 strtok_r 來改寫，避免用到 static char *dataptr
jserv

已參考 strtok_r 重新實作 xs_tok
Oscar[color= orange]

而在 xs_tok 的實作中因為其也會改動到字串的資料，所以在開頭需要加上判斷其為字串參考的操作，避免複寫字串原先的資料

2020q1 Homework2 (quiz2)

tags: linux2020

測試環境

作業要求

測驗題分析

解釋程式運作原理

xs_tmp

xs_size

改進 xs_concat

使用 Valgrind 檢查記憶體使用

設計字串複製與 strtok 函式

參考資料

Read more

Banana Pi R4 - Debricking Guide

Interesting Facts of Pthread

OS HW1: Benchmark Your Computer Black Box

程式設計(二) Lab 3 講解

tags: `linux2020`