2021q1 quiz3 - SSO/CoW

# 2021q1 quiz3 - SSO/CoW contributed by < `yellow951321` > ###### tags: `linux2021` ## source [quiz3](https://hackmd.io/@sysprog/ryCRU-Hmu) ## struct 仿造 [folly:fbstring](https://github.com/facebook/folly/blob/master/folly/docs/FBString.md) 的目標， `xs union` 為了更高效能的短字串處理，定義了 16 個位元組，應用和實作方法如下： > 字串長度小於或等於 15個位元組，則放在 stack 。 > 字串長度大於 16 個位元組則放在 heap （使用 malloc 系統呼叫配置所需記憶體） > 放在 heap 的字串，會針對大於 `LARGE_STRING_LEN` 的字串重複使用，同時將字串前 4 byte 用於 reference counting。 reference counting 的用意是為了要記錄是否還有指標在使用這個字串，假如在釋放記憶體時發現超過兩個以上的指標使用字串時。不需要釋放 heap 中的記憶體，只要更改 reference counting 即可。 ```cpp typedef union { /* allow strings up to 15 bytes to stay on the stack * use the last byte as a null terminator and to store flags * much like fbstring: * https://github.com/facebook/folly/blob/master/folly/docs/FBString.md */ char data[16]; struct { uint8_t filler[15], /* how many free bytes in this stack allocated string * same idea as fbstring */ space_left : 4, /* if it is on heap, set to 1 */ is_ptr : 1, is_large_string : 1, flag2 : 1, flag3 : 1; }; /* heap allocated */ struct { char *ptr; /* supports strings up to 2^54 - 1 bytes */ size_t size : 54, /* capacity is always a power of 2 (unsigned)-1 */ capacity : 6; /* the last 4 bits are important flags */ }; } xs; ``` 前半段為子串長度小於 16 時所使用的 struct ，後半為資料放在 heap 時的 struct。 1. `filler` ：當字串小於 16 時，字串的儲存位置。 2. `space_left` ：儲存了短字串中，剩餘的空間。計算公式為`15 - sizeof(filler)` 3. `is_ptr` ：紀錄資料是否被儲存在， heap 中。 4. `is_large_string` ：紀錄資料是否長度大於 `LARGE_STRING_LEN` ``` #define MAX_STR_LEN_BITS (54) #define MAX_STR_LEN ((1UL << MAX_STR_LEN_BITS) - 1) #define LARGE_STRING_LEN 256 ``` 1. `MAX_STR_LEN_BITS (54)` ：定義了可用於用於儲存資料的最大 bits 數量。 2. `MAX_STR_LEN` ：根據 `MAX_STR_LEN_BITS` 定義出的可以儲存的字串最大長度，在這次的實作中為 $2^{53} - 1$ 3. `LARGE_STRING_LEN` ：定義了判定為 `LARGE_STRING` 的最小字串長度。用於當資料儲存在 heap 時跟 reference counting 有關的一些操作。此外，根據 [RZHuangJef](https://hackmd.io/@Forward1105/2021q1Hw_quiz3) 提到出假說。由於我們是用 unino 去宣告字串。我們必須保證當作為短字串時的 flags 不會和長字串中的各個 bit-fields影響。需要確認確認宣告時的 bit-fields 和最後實作出的 bit-fields 的順序是和預期的一樣的。在 IOS/IEC 9899 中，針對 structure 以及 union 的定義 * ISO/IEC 9899:1999 [6.7.2.1] Structure and union specifiers > An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified. C99 規格書有提到，連續的 bit-fields 在記憶體上也會以相鄰的方式連接。但是連接的順序（由低記憶體位置到高記憶體位置或者反之）是 implementation-defined 。因此 [RZHuangJef](https://hackmd.io/@Forward1105/2021q1Hw_quiz3) 設置了一個實驗去檢驗實際上的記憶體順序為何。我試著重現這一段實驗，結果如下。 ### test.c ``` c= #include <stdio.h> #include <stdint.h> #include <stddef.h> struct { uint32_t a : 4, b : 8, c : 20; } ts; void print_bits(uint8_t byte) { for (int8_t i = 7; i >= 0; i--) printf("%d", byte & (1 << i) ? 1 : 0); } void print_layout(const void *ptr, size_t len) { uint8_t *p = (uint8_t *)ptr; for (size_t i = 0; i < len; i++, p++) { printf("%p: ", p); print_bits(*p); printf("\n"); } } int main() { printf("Default:\n"); print_layout(&ts, sizeof(ts)); // set ts.a ts.a = 0xF; printf("\nts.a is set\n"); print_layout(&ts, sizeof(ts)); // set ts.b ts.b = 0xFF; printf("\nts.a & ts.b are set\n"); print_layout(&ts, sizeof(ts)); return 0; } ``` 實驗環境以及編譯過程 ``` shell $ gcc --version | head -n 1 gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 $ lscpu | head -n 3 Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian $ gcc -std=c99 test.c -o test ``` 這次實驗中所宣告的 struct 有三個 bit-fields a b c ，長度分別為 4 8 20 。實驗目的是要確認宣告的順序以及實際上實做出的順序的差異，值結果如下。 ```shell= Default: 0x562efb93a014: 00000000 0x562efb93a015: 00000000 0x562efb93a016: 00000000 0x562efb93a017: 00000000 ts.a is set 0x562efb93a014: 00001111 0x562efb93a015: 00000000 0x562efb93a016: 00000000 0x562efb93a017: 00000000 ts.a & ts.b are set 0x562efb93a014: 11111111 0x562efb93a015: 00001111 0x562efb93a016: 00000000 0x562efb93a017: 00000000 ``` 通過實驗，我們可以確定。連續的 bit-fields 是從低記憶體位置開始往高記憶體位置排序。總之，這張圖表可以表示出各種情況下 `xs` 的記憶體配置情況 ``` _______________________________________________________________________ |_____________________________data[16]__________________________________| |____________________________filler[15]____________________|_sl_|_|_|_|_| |______________ptr_______________|_________size________|capacity|\flags/ sl stands for space_left ``` ## functions ### xs_is_ptr ```c= static inline bool xs_is_ptr(const xs *x) { return x->is_ptr; } ``` 回傳 xs 是否儲存在 heap 上。 ### xs_is_large_string ``` c= static inline bool xs_is_large_string(const xs *x) { return x->is_large_string; } ``` 回傳 xs 是否為 large string 。 ### xs_size ```c= static inline size_t xs_size(const xs *x) { return xs_is_ptr(x) ? x->size : 15 - x->space_left; } ``` 以 space_left 計算出 xs 目前的短字串長度。 ### xs_data ```c= static inline char *xs_data(const xs *x) { if (!xs_is_ptr(x)) return (char *) x->data; if (xs_is_large_string(x)) return (char *) (x->ptr + 4); return (char *) x->ptr; } ``` 回傳 xs 目前儲存的字串值，短字串直接回傳 data 即可。自串長度大於 15 但不符合 large string 的字串則直接回傳 ptr 中儲存的字串。而 large string 則需要額外 4 bytes 的 offset 為 reference counting 所用，詳細則在 `static void xs_allocate_data(xs *x, size_t len, bool reallocate)` 中有提到（參考 [gyes00205](https://hackmd.io/@gyes00205/2021q1_quiz3) 的共筆） ### xs_capacity ```c= static inline size_t xs_capacity(const xs *x) { return xs_is_ptr(x) ? ((size_t) 1 << x->capacity) - 1 : 15; } ``` 計算字串的最大長度。字串長度小於 16 時為 on stack 字串，最大長度皆為 15。中長字串則需要藉由 capacity 來計算出可容納的最長字串大小。 capacity 儲存的是可以用來儲存字串的 bits 數，因此需要用 $2^{capacity} - 1$ 去計算出真正的最大字串長度。 ### xs_set_refcnt ```c= static inline void xs_set_refcnt(const xs *x, int val) { *((int *) ((size_t) x->ptr)) = val; } ``` 當 xs 字串為 latge string 時， ptr 的前 4 byte 被使用在 reference counting 上。根據 c99 規格書，對於 size_t 的定義為 unsigned interger。 * ISO/IEC 9899:1999 [7.17] Common definitions <stddef.h> > which is the unsigned integer type of the result of the sizeof operator 以及 interger 和 unsigned interger 轉型有關的敘述 * ISO/IEC 9899:1999 [6.3.1.1] Boolean, characters, and integers > An object or expression with an integer type whose integer conversion rank is less than or equal to the rank of int and unsigned int. 可以確定 size_t 的長度為 4 bytes ，因此 `(size_t) x->ptr)` 的確只會拿取前 4 bytes 的數值取出。並且在沒有超出 signed interger 的數值範圍的前提下，可以安全強制轉型成 unsigned intergear 。 ### xs_inc_refcnt ```c= static inline void xs_inc_refcnt(const xs *x) { if (xs_is_large_string(x)) ++(*(int *) ((size_t) x->ptr)); } ``` 當 xs string 為 large string 時，對 reference counting 增加 1 。和 `xs_set_refcnt` 時有一樣的問題，假如造成 signed interger overflow 時，數值會出錯。 ### xs_dec_refcnt ```c= static inline int xs_dec_refcnt(const xs *x) { if (!xs_is_large_string(x)) return 0; return --(*(int *) ((size_t) x->ptr)); } ``` 做和 `static inline void xs_inc_refcnt(const xs *x)` 相對的事情，減少 reference counting 的數值。 ### xs_get_refcnt ```c= static inline int xs_get_refcnt(const xs *x) { if (!xs_is_large_string(x)) return 0; return *(int *) ((size_t) x->ptr); } ``` 回傳 xs 的 reference counting ### xs_literal_empty ```c= #define xs_literal_empty() \ (xs) { .space_left = 15 } ``` 用於初始化出一個字串長度為 0 的 xs string。 ### ilog2 ```c= static inline int ilog2(uint32_t n) { return 32 - __builtin_clz(n) - 1; } ``` 取 $log_2 n$ 的 lower bound 。用 interger 的 bit 去減內建的 `__builtin_clz()` 計算出 leading zero 數後，再減去 1 (lower bound) 即可得到目標值。以 8 bits 數為例 ``` 00111111 ``` count leading zero 的值為 2， $2^{8-2-1}$ 即為 `00100000` ，因此 $8-2-1$ ，即為我們所要求的 ilog2 的值。 ### xs_allocate_data ```c= static void xs_allocate_data(xs *x, size_t len, bool reallocate) { /* Medium string */ if (len < LARGE_STRING_LEN) { x->ptr = reallocate ? realloc(x->ptr, (size_t) 1 << x->capacity) : malloc((size_t) 1 << x->capacity); return; } /* Large string */ x->is_large_string = 1; /* The extra 4 bytes are used to store the reference count */ x->ptr = reallocate ? realloc(x->ptr, (size_t)(1 << x->capacity) + 4) : malloc((size_t)(1 << x->capacity) + 4); xs_set_refcnt(x, 1); } ``` 初始化中長字串的 xs 的字串內容。並且假如為 large string 則同時將 reference counting 設為 1 。 ### xs_new ```c= xs *xs_new(xs *x, const void *p) { *x = xs_literal_empty(); size_t len = strlen(p) + 1; if (len > 16) { x->capacity = ilog2(len) + 1; x->size = len - 1; x->is_ptr = true; xs_allocate_data(x, x->size, 0); memcpy(xs_data(x), p, len); } else { memcpy(x->data, p, len); x->space_left = 15 - (len - 1); } return x; } ``` 建立一個新的 xs 字串，當字串長度大於短字串的最大長度時。則需要額外配置記憶體空間。要注意的是 `size_t len = strlen(p) + 1;` 有多加上 1 補上 null terminator 所占用的空間。因此下一行的 `if (len > 16)` 是大於 16 而非原本 xs filler 的長度 15 。 ### xs_tmp ```c= #define xs_tmp(x) \ ((void) ((struct { \ _Static_assert(sizeof(x) <= MAX_STR_LEN, "it is too big"); \ int dummy; \ }){1}), \ xs_new(&xs_literal_empty(), x)) ``` 回傳一個新的 xs 字串副本，並且利用 `_Static_assert` 檢查 string x 的自串長度有沒有超過 MAX_STR_LEN。 ### xs_grow ```c= /* grow up to specified size */ xs *xs_grow(xs *x, size_t len) { char buf[16]; if (len <= xs_capacity(x)) return x; /* Backup first */ if (!xs_is_ptr(x)) memcpy(buf, x->data, 16); x->is_ptr = true; x->capacity = ilog2(len) + 1; if (xs_is_ptr(x)) { xs_allocate_data(x, len, 1); } else { xs_allocate_data(x, len, 0); memcpy(xs_data(x), buf, 16); } return x; } ``` 增加 xs 字串的最大字串長度，假如字串長度足夠則直接回傳。假如自串長度不夠時，需要檢查為長字串或者短字串。但在第 13 行， `x->is_ptr = true;` 。卻直接接 x 設定為長字串。會造成 x 原本假如是短字串時，無法執行到 19、20 行。造成短字串的字串內容遺失。因此，根據 [gyes00205](https://hackmd.io/@gyes00205/2021q1_quiz3) ，以及 [ccs100203](https://hackmd.io/@cccccs100203/linux2021-quiz3) 討論出的方法將 `x->is_ptr = true;` 移至 if 下方。 ```c= xs *xs_grow(xs *x, size_t len) { char buf[16]; if (len <= xs_capacity(x)) return x; /* Backup first */ if (!xs_is_ptr(x)) memcpy(buf, x->data, 16); x->capacity = ilog2(len) + 1; if (xs_is_ptr(x)) { xs_allocate_data(x, len, 1); } else { xs_allocate_data(x, len, 0); memcpy(xs_data(x), buf, 16); } x->is_ptr = true; return x; } ``` ### xs_newempty ```c= static inline xs *xs_newempty(xs *x) { *x = xs_literal_empty(); return x; } ``` 將 x 設為一個初始化的 xs 短字串。 ### xs_free ``` static inline xs *xs_free(xs *x) { if (xs_is_ptr(x) && xs_dec_refcnt(x) <= 0) free(x->ptr); return xs_newempty(x); } ``` 初始化 x 的字串內容。假如為長字串時，先檢查 refrence counting 是否大於 0 ，假如大於 0 則代表仍然有其他的 xs 正在使用這個字串，因此不能釋放記憶體。假如為 0 則釋放記憶體空間。 ### xs_cow_lazy_copy ```c= static bool xs_cow_lazy_copy(xs *x, char **data) { if (xs_get_refcnt(x) <= 1) return false; /* Lazy copy */ xs_dec_refcnt(x); xs_allocate_data(x, x->size, 0); if (data) { memcpy(xs_data(x), *data, x->size); /* Update the newly allocated pointer */ *data = xs_data(x); } return true; } ``` 為 reference counting 大於 1 的 `xs->ptr` 建立一個新的記憶體空間並且將字串值複製過去。 ```graphviz digraph { rankdir=LR block2 [shape="record", label="{ptr}"]; block1 [shape="record", label="{xs}"]; block [shape="record", label="{other}"]; block -> block2; block1 -> block2; } ``` ```graphviz digraph { rankdir=LR block3 [shape="record", label="{data}"] block2 [shape="record", label="{ptr}"]; block1 [shape="record", label="{xs}"]; block [shape="record", label="{other}"]; block -> block2; block1 -> block3; } ``` ### xs_concat ```c= xs *xs_concat(xs *string, const xs *prefix, const xs *suffix) { size_t pres = xs_size(prefix), sufs = xs_size(suffix), size = xs_size(string), capacity = xs_capacity(string); char *pre = xs_data(prefix), *suf = xs_data(suffix), *data = xs_data(string); xs_cow_lazy_copy(string, &data); if (size + pres + sufs <= capacity) { memmove(data + pres, data, size); memcpy(data, pre, pres); memcpy(data + pres + size, suf, sufs + 1); if (xs_is_ptr(string)) string->size = size + pres + sufs; else string->space_left = 15 - (size + pres + sufs); } else { xs tmps = xs_literal_empty(); xs_grow(&tmps, size + pres + sufs); char *tmpdata = xs_data(&tmps); memcpy(tmpdata + pres, data, size); memcpy(tmpdata, pre, pres); memcpy(tmpdata + pres + size, suf, sufs + 1); xs_free(string); *string = tmps; string->size = size + pres + sufs; } return string; } ``` 按照順序連接三個字串 prefix, string 和 suffix。假如字串 string 的最大字串長度能夠容納三個字串的總長度。則將字串內容複製到 string 上回傳即可。假如字串 string 的最大自串長度不夠，則利用 xs_grow 增加 string 的最大長度後再複製到 string 上。 ### xs_trim ```c= xs *xs_trim(xs *x, const char *trimset) { if (!trimset[0]) return x; char *dataptr = xs_data(x), *orig = dataptr; if (xs_cow_lazy_copy(x, &dataptr)) orig = dataptr; /* similar to strspn/strpbrk but it operates on binary data */ uint8_t mask[32] = {0}; #define check_bit(byte) (mask[(uint8_t) byte / 8] & 1 << (uint8_t) byte % 8) #define set_bit(byte) (mask[(uint8_t) byte / 8] |= 1 << (uint8_t) byte % 8) size_t i, slen = xs_size(x), trimlen = strlen(trimset); for (i = 0; i < trimlen; i++) set_bit(trimset[i]); for (i = 0; i < slen; i++) if (!check_bit(dataptr[i])) break; for (; slen > 0; slen--) if (!check_bit(dataptr[slen - 1])) break; dataptr += i; slen -= i; /* reserved space as a buffer on the heap. * Do not reallocate immediately. Instead, reuse it as possible. * Do not shrink to in place if < 16 bytes. */ memmove(orig, dataptr, slen); /* do not dirty memory unless it is needed */ if (orig[slen]) orig[slen] = 0; if (xs_is_ptr(x)) x->size = slen; else x->space_left = 15 - slen; return x; #undef check_bit #undef set_bit } ``` 用於根據 trimset 所給的字元來切割 xs string 的字串內容。首先利用 `xs_cow_lazy_copy` 來建立一個新的字串，用於切割字串。由於函式支援自訂欲剪除的字元，假如想排除的字元太多，逐一比對會造成效能低落。因此在這邊我們利用查表的方式來確認是否移除字元 (參考[課堂問答](https://hackmd.io/@sysprog/rk9fG7zG_)。程式碼中，我們宣告一個長度為 32 的 uint8_t 型態的 mask 陣列。由於 $log_x{32}$ 為 5 ，並且 uint8_t 的大小為 8 bits ，且 $log_x{8}$ 為 3 。我們可以把一個欲查詢的字元分成前 5 bits 和後 3bits 。分別對應到 mask 的 index 以及 value 上。這樣即可利用 set_bit 以及 check_bit 兩個 macro 上快速查詢當前字元是否屬於 trimset 。舉例來說 `A` 的 ascii 碼為 '0x41' 。則 index 值為 8 中的第 2 個字元，也就是將 `mask[8]` 的第 2 個字元設為 1 。而 check_bit 只需要快速查表即可知道是否屬於 trimset。