2021q1 Homework3 (quiz3)

# 2021q1 Homework3 (quiz3) contributed by < `hankluo6` > > [第 3 週測驗題](https://hackmd.io/@sysprog/linux2021-quiz3) ## 測驗 `1` #### `OFF = ?` - [x] `(d) 4` #### `LLL = ?` - [x] `(b) 32 - __builtin_clz(n) - 1` #### `NNN = ?` - [x] `(a) 16` #### `CCC = ?` - [x] `(a) mask[(uint8_t) byte / 8] << (uint8_t) byte % 8` #### `SSS = ?` - [x] `(d) mask[(uint8_t) byte / 8] &= 1 << (uint8_t) byte % 8` ### 運作原理 ```c typedef union { /* allow strings up to 15 bytes to stay on the stack * use the last byte as a null terminator and to store flags * much like fbstring: * https://github.com/facebook/folly/blob/master/folly/docs/FBString.md */ char data[16]; struct { uint8_t filler[15], /* how many free bytes in this stack allocated string * same idea as fbstring */ space_left : 4, /* if it is on heap, set to 1 */ is_ptr : 1, is_large_string : 1, flag2 : 1, flag3 : 1; }; /* heap allocated */ struct { char *ptr; /* supports strings up to 2^MAX_STR_LEN_BITS - 1 bytes */ size_t size : MAX_STR_LEN_BITS, /* capacity is always a power of 2 (unsigned)-1 */ capacity : 6; /* the last 4 bits are important flags */ }; } xs; ``` `xs` 用 16 bytes 儲存 string，當字串長度小於 15 時，將字串放在 stack 中，否則放在 heap 中。 * stack: * `filler` 欄位儲存字串 * `space_left` 存放 15 - 現在字串長度，因為當字串長度為 15 bytes 時(不包括結尾字元 `'\0'`)，此時最後一個字元應要為 0x00，而使用 15 - 字串長度剛好為 0，再加上 `is_ptr` 及 `is_large_string` 皆會被設定成 0，所以能正確的處理字串。 * `is_ptr` 紀錄是否為使用 heap * `is_large_string` 紀錄 string 是否過長 * `flag2`, `flag3` 未使用 * heap: * `ptr` 為真正存字串的位置，當字串長度大於等於 `LARGE_STRING_LEN` 時，`ptr` 指向的記憶體的前 4 個 byte 會用來紀錄 reference count * `size` 當前字串長度 * `capacity` 當前字串容量 ```c /* Memory leaks happen if the string is too long but it is still useful for * short strings. */ #define xs_tmp(x) \ ((void) ((struct { \ _Static_assert(sizeof(x) <= MAX_STR_LEN, "it is too big"); \ int dummy; \ }){1}), \ xs_new(&xs_literal_empty(), x)) ``` 輸入一個字串，並建立為 `xs` 字串，透過 `_Static_assert` 檢查字串長度是否符合。 :::info 這邊用到幾個 C 語言的技巧： 1. [`Comma Operator`](https://en.wikipedia.org/wiki/Comma_operator) 執行第一個 expression ，與第二個 expression，並回傳第二個的結果 2. `_Static_assert` 檢查長度是否正確，因 `_Static_assert` 不能放置於 expression 內，故需要以 structure 包裝 3. [C 規格書](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf)內提到宣告一個沒有 member 的 struct 為未定義行為 (但 [GNU Extension](https://gcc.gnu.org/onlinedocs/gcc/Empty-Structures.html) 有特別允許此行為)，所以需要宣告一個 `dummy` 成員： >"If the struct-declaration-list does not contain any named members, either directly or via an anonymous structure or anonymous union, the behavior is undefined." ::: ```c #define xs_literal_empty() \ (xs) { .space_left = 15 } ``` 回傳空的 `xs` 物件，用來初始化，此為 [compound literal](https://gcc.gnu.org/onlinedocs/gcc/Compound-Literals.html#:~:targetText=6.28%20Compound%20Literals,compound%20literal%20is%20an%20lvalue.) 的技巧。 ```c xs *xs_new(xs *x, const void *p) { *x = xs_literal_empty(); size_t len = strlen(p) + 1; if (len > 16) { x->capacity = ilog2(len) + 1; x->size = len - 1; x->is_ptr = true; xs_allocate_data(x, x->size, 0); memcpy(xs_data(x), p, len); } else { memcpy(x->data, p, len); x->space_left = 15 - (len - 1); } return x; } ``` 依照字串長度 `len` 決定要放在 heap 上或 stack 上。在 heap 上時，透過 `xs_allocate_data` 分配新的空間，在複製資料到 `x->data` 中；在 stack 上則直接將資料複製到 `x->data` 中，最後在配置相關的參數。 ```c /* lowerbound (floor log2) */ static inline int ilog2(uint32_t n) { return 32 - __builtin_clz(n) - 1; } ``` 回傳 $\lfloor log_2(n)\rfloor$ 的值，因為是 floor 運算，所以要減 1。 ```c static void xs_allocate_data(xs *x, size_t len, bool reallocate) { /* Medium string */ if (len < LARGE_STRING_LEN) { x->ptr = reallocate ? realloc(x->ptr, (size_t) 1 << x->capacity) : malloc((size_t) 1 << x->capacity); return; } /* Large string */ x->is_large_string = 1; /* The extra 4 bytes are used to store the reference count */ x->ptr = reallocate ? realloc(x->ptr, (size_t)(1 << x->capacity) + 4) : malloc((size_t)(1 << x->capacity) + 4); xs_set_refcnt(x, 1); } ``` 在 heap 分配空間也分成兩種情況，一種為字串長度小於 `LARGE_STRING_LEN` 的字串，每個字串都在 heap 中分配一次空間；當長度大於等於 `LARGE_STRING_LEN` 時，則會有需要以同個記憶體空間代表相同字串的情形，故需要一個 reference count 紀錄當前參照的次數。`xs_set_refcnt` 就是將 reference count 初始化為 1，另外，因字串過長時要紀錄 reference count，此紀錄會放在 `x->ptr` 的前 4 個 byte，所以分配空間時要多分配 4 個 byte 的空間。 ```c static bool xs_cow_lazy_copy(xs *x, char **data) { if (xs_get_refcnt(x) <= 1) return false; /* Lazy copy */ xs_dec_refcnt(x); xs_allocate_data(x, x->size, 0); if (data) { memcpy(xs_data(x), *data, x->size); /* Update the newly allocated pointer */ *data = xs_data(x); } return true; } ``` 此函式用來實現 CoW 機制，當 `xs` 內的資料改變時，需要額外分配新的空間出來，並將 reference count 減 1。 ```c xs *xs_trim(xs *x, const char *trimset) { if (!trimset[0]) return x; char *dataptr = xs_data(x), *orig = dataptr; if (xs_cow_lazy_copy(x, &dataptr)) orig = dataptr; /* similar to strspn/strpbrk but it operates on binary data */ uint8_t mask[32] = {0}; #define check_bit(byte) (mask[(uint8_t) byte / 8] & 1 << (uint8_t) byte % 8) #define set_bit(byte) (mask[(uint8_t) byte / 8] |= 1 << (uint8_t) byte % 8) size_t i, slen = xs_size(x), trimlen = strlen(trimset); for (i = 0; i < trimlen; i++) set_bit(trimset[i]); for (i = 0; i < slen; i++) if (!check_bit(dataptr[i])) break; for (; slen > 0; slen--) if (!check_bit(dataptr[slen - 1])) break; dataptr += i; slen -= i; /* reserved space as a buffer on the heap. * Do not reallocate immediately. Instead, reuse it as possible. * Do not shrink to in place if < 16 bytes. */ memmove(orig, dataptr, slen); /* do not dirty memory unless it is needed */ if (orig[slen]) orig[slen] = 0; if (xs_is_ptr(x)) x->size = slen; else x->space_left = 15 - slen; return x; #undef check_bit #undef set_bit } ``` 對 `xs` 字串 trim，首先透過 `xs_cow_lazy_copy` 建立新的字串，`mask` 紀錄要 trim 的字串內的 bit，透過 `set_bit` 將設置 bit，`check_bit` 檢查 bit。每個字元的 8 個 bit 中，前五個 bit 決定 `mask` 的 index，後三個 bit 決定要設置在該 index 中的哪個位置。第二個與第三個 `for` 迴圈分別從開頭集結尾掃描字串，如果字元出現在 `trim` 內則繼續掃描，直到找到無法被修剪的字元。利用 `memmove` 將字串移動到新的位置，並設置 `size`。 ```c xs *xs_concat(xs *string, const xs *prefix, const xs *suffix) { size_t pres = xs_size(prefix), sufs = xs_size(suffix), size = xs_size(string), capacity = xs_capacity(string); char *pre = xs_data(prefix), *suf = xs_data(suffix), *data = xs_data(string); xs_cow_lazy_copy(string, &data); if (size + pres + sufs <= capacity) { memmove(data + pres, data, size); memcpy(data, pre, pres); memcpy(data + pres + size, suf, sufs + 1); if (xs_is_ptr(string)) string->size = size + pres + sufs; else string->space_left = 15 - (size + pres + sufs); } else { xs tmps = xs_literal_empty(); xs_grow(&tmps, size + pres + sufs); char *tmpdata = xs_data(&tmps); memcpy(tmpdata + pres, data, size); memcpy(tmpdata, pre, pres); memcpy(tmpdata + pres + size, suf, sufs + 1); xs_free(string); *string = tmps; string->size = size + pres + sufs; } return string; } ``` 對 xs 字串 concatenation，一樣先透過 `xs_cow_lazy_copy` 複製一個新的 string，接著檢查串接完的大小是否超過 `capacity`，如果沒有的話則移動及複製記憶體內容即可；如果超過的話則透過 `xs_grow` 重新分配空間，再複製記憶體上去。最後在設定 `size` 大小即可。 ```c /* grow up to specified size */ xs *xs_grow(xs *x, size_t len) { char buf[16]; if (len <= xs_capacity(x)) return x; /* Backup first */ if (!xs_is_ptr(x)) memcpy(buf, x->data, 16); x->is_ptr = true; x->capacity = ilog2(len) + 1; if (xs_is_ptr(x)) { xs_allocate_data(x, len, 1); } else { xs_allocate_data(x, len, 0); memcpy(xs_data(x), buf, 16); } return x; } ``` 為 `x` 分配足夠的記憶體空間，先複製原本的資料到 `buf` 內，新增空間後，在將 `bug` 資料複製回去。 ### 字串複製 ```c xs *xs_copy(xs *x, xs *y) { xs_free(x); if (xs_is_large_string(y)) { x->is_ptr = 1; x->capacity = y->capacity; x->size = y->size; x->ptr = y->ptr; x->is_large_string = 1; xs_inc_refcnt(x); } else { xs_new(x, xs_data(y)); } return x; } ``` 如果為長字串 (`> LARGE_STRING_LEN`) 的話，則以相同的記憶體空間存放，並將 reference count 增加，否則使用新的 stack 或 heap 空間複製 data。 ### 空間分析透過 massif 分析 `xs` 與 C++ 內的 `std::string` 空間使用差異，測試資料為 [dict/cities.txt](https://github.com/sysprog21/dict/blob/master/cities.txt)，共 93827 筆資料。 * `union xs` * 測試程式碼： ```c int main(int argc, char *argv[]) { FILE *fp = fopen("cities.txt", "r"); if (!fp) { perror("failed to open cities.txt"); exit(EXIT_FAILURE); } xs strings[93827]; /* cities Length */ char buf[256] = {0}; int i = 0; while (fgets(buf, 256, fp)) { strings[i++] = *xs_tmp(buf); } fclose(fp); return 0; } ``` * 結果： ![](https://i.imgur.com/ighouhU.png) 一開始使用量急遽上升到 1.5 MB 左右是因為在 stack 中宣告 93827 個型態為 `xs` 的陣列，產生 `sizeof(xs) * 93827 = 1501232 MiB` 的空間。其後記憶體空間隨著字串建立在 heap 上時增加(圖中綠色部份)，最後總共記憶體使用量約為 4.5MB。 * `std::string` * 測試程式碼： ```cpp int main(int argc, char *argv[]) { std::ifstream fp("cities.txt"); std::string strings[93827]; /* cities Length */ int i = 0; while (getline(fp, strings[i++])) ; fp.close(); return 0; } ``` * 結果： ![](https://i.imgur.com/JD7ln08.png) 開頭上升一樣為建立在 stack 上的 string 陣列造成的，大小為 `sizeof(string) * 93827 = 3002464 MiB`，為使用 `xs` 的兩倍。而隨著字串產生，放置在 heap 的資料也會漸增，總共約為 2.3 MiB，可以發現比使用 `xs` 時在 heap 放置的資料大小 3.1 MiB 還要來得小。最後記憶體下降為 std::string 的 deconstructor 釋放空間造成的。可以發現使用 `std::string` 雖然在 stack 上的空間分配較多，但在 heap 上分配的空間卻比 `union xs` 還要低，這與預期的結果不同。可以對 `std::string` 內部 new 與 `xs` 內部 malloc 分析，在我的電腦上 `std::string` 預設的 capacity 為 15，當重新分配長度大於 15 的字串時，會加倍其 capacity；而 `xs` 預設的 capacity 為 $1 << \lfloor \log_2(len) \rfloor$，而我們輸入的資料中，長度分佈如圖： ![](https://i.imgur.com/N9xp7G1.png) 分析 `std::string` 與 `xs` 分別會配置多少空間： ![](https://i.imgur.com/BgtUbcc.png) ![](https://i.imgur.com/TlC7kvC.png) 可以看到 `xs` 分配的空間是以 2 的冪成長，但 `std::string` 並不是，而是會依據字串長度決定字串的 capacity，所以分配在 heap 上的空間比 `xs` 還要更少。再深入探討 `std::string` 的機制，會發現原因在於目前使用的 `std::getline` 會對 `std::string` 分配足夠大小的空間，也就是分配的空間會與字串長度相同，所以可以看到 `std::string` 在長度 30 以上的字串數量與輸入資料的趨勢相同。所以理論上 `xs` 與 `std::string` 在 heap 上的空間應該相似 (`xs` 為 $2^i$，`std::string` 為 $15 \times 2^j$)，但在 stack 空間的比較上，`xs` 比 `std::string` 還要更節省空間 (`xs` 為 16 bytes，`std::string` 為 32 bytes)。且如果已經知道字串為 immutable ，並不會在之後改變時，可以利用建立字串 (`xs_new`) 時直接分配足夠的空間，便能更利用空間。 ###### tags: `linux2021`