2021q1 Homework3 (quiz3)

# 2021q1 Homework3 (quiz3) contributed by < `tiffany6022` > ###### tags: `linux2021` > [作業說明](https://hackmd.io/@sysprog/linux2021-quiz3) - [x] 解釋上述程式碼運作原理，並提出改進方案; - [x] 以上述程式碼為基礎，提供字串複製 (應考慮到 CoW) 的函式實作; - [ ] 設計實驗，確認上述字串處理最佳化 (針對空間效率) 帶來的效益，應考慮到 locality of reference，善用 GDB 追蹤和分析，確認字串的確符合預期地配置於 stack 或 heap; - [ ] 嘗試將 quiz2 提到的 string interning 機制整合，提出對空間高度最佳化的字串處理工具函式庫 - [ ] 在 Linux 核心原始程式碼中，找出類似手法的 SSO 或 CoW 案例並解說; ## 解釋程式運作原理 ### 解釋 xs 架構 ```cpp typedef union { char data[16]; struct { uint8_t filler[15], space_left : 4, is_ptr : 1, is_large_string : 1, flag2 : 1, flag3 : 1; }; struct { char *ptr; size_t size : MAX_STR_LEN_BITS, capacity : 6; }; } xs; ``` ```graphviz digraph structs { node[shape=record] xs [label="{data (16bytes)|{{{||||||||||||||}|filters (15bytes)}|{|<middle>}}|{{{|||||||}|ptr (8bytes)}|{{|||||||}|{ &emsp;&emsp; size (54bits)&emsp;&emsp;|<bottom>}}} }"] struct1 [label="{{|||||}|capacity (6bits)}|{{|||}|<four>(4bits)}"] struct2 [label="{{|||}|space_left (4bits)}|{{|||}|<four>&ensp;&ensp;&emsp;(4bits)&emsp;&ensp;&ensp;}"] bits4 [label="{|&emsp;&emsp;&emsp;is_ptr (1bit)&emsp;&emsp;&emsp;}|{|is_large_string (1bit)}|{|&emsp;&emsp;&emsp;flag2 (1bit)&emsp;&emsp;&emsp;}|{|&emsp;&emsp;&emsp;flag3 (1bit)&emsp;&emsp;&emsp;}"] xs:bottom -> struct1 xs:middle -> struct2 struct1:four -> bits4 struct2:four -> bits4 } ``` ##### `union` 用法 union 中的每個成員共用同一塊記憶體位置，並由最大的一組決定所佔的記憶體大小，因此這裡的 xs 佔的記憶體大小為 16 bytes * C99 [6.7.2.1] ***Structure and union specifiers*** > **The size of a union is sufficient to contain the largest of its members.** The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bitfield, then to the unit in which it resides), and vice versa. ##### 成員分別代表 * **短字串** ( $\le 15$ bytes ) * 放在 stack * 將字串存入 `data` * 計算長度的方式是透過 `space_left` 紀錄「最大短字串長度減去現在長度」，因為若字串剛好 15 bytes 第 16 個 byte 是結束字元時，剛好 space_left 要記錄 0000 * **中長字串** ( $\ge 16$ bytes ) * 放在 heap (透過 malloc 系統呼叫配置所需字串大小) * 將字串存入 `ptr` * `is_ptr` 紀錄是否為中長字串 * `size` 紀錄字串長度，最大字串長度為 $2^{54}-1$ * `capacity` 紀錄從 heap 配置的空間大小，單位為 $2^{capacity}$，故用 6 個位元即可涵蓋 size 長度 * **長字串** ( $\ge 256$ bytes ) * 需要實作 CoW (copy on write)，所以透過 `is_large_string` 紀錄，若為長字串則設為 1 ### 解釋 xs_tmp 程式碼 ```cpp int main(int argc, char *argv[]) { xs string = *xs_tmp("\n foobarbar \n\n\n"); ... } ``` ```cpp= #define xs_tmp(x) \ ((void) ((struct { \ _Static_assert(sizeof(x) <= MAX_STR_LEN, "it is too big"); \ int dummy; \ }){1}), \ xs_new(&xs_literal_empty(), x)) ``` ##### 主要作用確保輸入的字串小於 MAX_STR_LEN 再呼叫 xs_new，若超過則停止程式並顯示錯誤訊息 ##### 第 3 行 `_Static_assert` 的作用是當 x 長度大於 MAX_STR_LEN 時，編譯會失敗並出現 "it is too big" 錯誤訊息 :::warning 改寫為更精簡的程式碼 :notes: jserv ::: ##### 第 5 行 `{1}` 的作用是 struct 的寫法，可以將變數 dummy 的值設定為 `1` ##### 第 2 行 `(void)` 的作用是變數 dummy 存在的理由是為了讓 _Static_assert 可以在 struct 中存在，所以後面並不會用到 dummy 這個變數，所以在編譯時會出現 warning，若在前面加上 `(void)` 就可以讓 warning 消失參考：[How to suppress “unused parameter” warnings in C?](https://stackoverflow.com/questions/3599160/how-to-suppress-unused-parameter-warnings-in-c) ##### main 程式中呼叫 xs_tmp 的 `*` 令人不解的是 `*` 是要 dereference 一個看似 tuple 但 C 語言沒有 tuple 的東西，怎麼樣想都應該只有 dereference 後面那個 xs_new 回傳的值，但前面的 struct 就消失了? 直到發現 [Sequence point](https://en.wikipedia.org/wiki/Sequence_point)，才知道是由括號中的逗號分開成兩個表達式，先執行前面的，再執行後面的，最後只對後面的也就是剩下的 dereference 雖然看起來是很特別的寫法，但其實在常用的 while 和 for 迴圈都可見參考：[what-does-the-comma-operator-do](https://stackoverflow.com/questions/52550/what-does-the-comma-operator-do) * C99 [6.5.17] ***Comma operator*** > The left operand of a comma operator is evaluated as a void expression; there is a sequence point after its evaluation. **Then the right operand is evaluated; the result has its type and value.** If an attempt is made to modify the result of a comma operator or to access it after the next sequence point, the behavior is undefined. ### 解釋 xs_new 程式碼 ```cpp #define xs_tmp(x) ... xs_new(&xs_literal_empty(), x)) ``` ```cpp= xs *xs_new(xs *x, const void *p) { *x = xs_literal_empty(); size_t len = strlen(p) + 1; if (len > NNN) { /* NNN = 16 */ x->capacity = ilog2(len) + 1; x->size = len - 1; x->is_ptr = true; xs_allocate_data(x, x->size, 0); memcpy(xs_data(x), p, len); } else { memcpy(x->data, p, len); x->space_left = 15 - (len - 1); } return x; } ``` ##### 主要作用判斷字串是屬於短、中長的哪種字串，分別儲存到記憶體中的 stack 、 heap，並設定相對應的一些值 ##### xs_literal_empty() ```cpp= #define xs_literal_empty() \ (xs) { .space_left = 15 } ``` 初始化 space_left，也就是 stack 剩餘的空間，因為 space_left 只佔 4bits，且 15 的 2 進位是 1111，所以這裡的意思是 stack 目前是空的，剩餘空間是滿的 ##### NNN = 16 字串長度 $\ge 16$ 屬於中長字串，因為前一行 len 有先加上結束字元的大小 (+1)，因此這裡應填入 $\gt 16$ ##### ilog2(len) ```cpp /* lowerbound (floor log2) */ static inline int ilog2(uint32_t n) { return LLL; } /* LLL = 32 - __builtin_clz(n) - 1 */ ``` capacity 紀錄從 heap 配置的空間大小，單位為 $2^{capacity}$，因此要將長度取 $log_2$ ##### LLL = 32 - __builtin_clz(n) - 1 根據以下計算可推得取 $log_2$ 的方式 > $n = 2 = 2^1 = ...00000010_2 (24+6) \rightarrow 32-30-1=1$ > $n = 4 = 2^2 = ...00000100_2 (24+5) \rightarrow 32-29-1=2$ > $n = 8 = 2^3 = ...00001000_2 (24+4) \rightarrow 32-28-1=3$ ##### xs_allocate_data(x, x->size, 0) ```cpp static void xs_allocate_data(xs *x, size_t len, bool reallocate) { /* Medium string */ if (len < LARGE_STRING_LEN) { x->ptr = reallocate ? realloc(x->ptr, (size_t) 1 << x->capacity) : malloc((size_t) 1 << x->capacity); return; } /* Large string */ x->is_large_string = 1; /* The extra 4 bytes are used to store the reference count */ x->ptr = reallocate ? realloc(x->ptr, (size_t)(1 << x->capacity) + 4) : malloc((size_t)(1 << x->capacity) + 4); xs_set_refcnt(x, 1); } ``` 配置記憶體給字串。若字串為中字串，配置的大小為 $2^{capacity}$，若字串為長字串，則配置的大小還要加 4，用來儲存 reference counter ##### xs_set_refcnt(x, 1) 與其他跟 reference count 相關的函式 ```cpp= static inline void xs_set_refcnt(const xs *x, int val) { *((int *) ((size_t) x->ptr)) = val; } static inline void xs_inc_refcnt(const xs *x) { if (xs_is_large_string(x)) ++(*(int *) ((size_t) x->ptr)); } static inline int xs_dec_refcnt(const xs *x) { if (!xs_is_large_string(x)) return 0; return --(*(int *) ((size_t) x->ptr)); } static inline int xs_get_refcnt(const xs *x) { if (!xs_is_large_string(x)) return 0; return *(int *) ((size_t) x->ptr); } ``` 透過 reference counter 紀錄目前有多少個 pointer 指向這個 x->ptr * xs_set_refcnt: 將儲存在 x->ptr 前面 4 個 bit 的 reference counter 設為 val * xs_inc_refcnt: 當複製字串時，並未真正分配出新的記憶體位置給他，而是讓他的指標指向字串的位置，並將字串的 reference counter 加 1，紀錄增加一個指標指向當前字串 * xs_dec_refcnt: 當複製的字串需要改寫時，並不能直接改，因為複製的字串與原字串共用相同的記憶體位置，此時需將字串獨立複製出來，也就是額外分配記憶體空間給他，並把 reference counter 減 1，紀錄減少一個指標指向目前字串 * xs_get_refcnt: 回傳目前 reference counter 的值 ##### xs_data(x) ```cpp static inline char *xs_data(const xs *x) { if (!xs_is_ptr(x)) return (char *) x->data; if (xs_is_large_string(x)) return (char *) (x->ptr + OFF); /* OFF = 4 */ return (char *) x->ptr; } ``` 依照短中長字串，回傳正確的字串儲存位置 ##### OFF = 4 真正長字串儲存的位置在 x->ptr + 4 的地方，因為前面 4 個 bit 是用來儲存 reference counter ### 解釋 xs_trim 程式碼 ```cpp int main(int argc, char *argv[]) { xs string = *xs_tmp("\n foobarbar \n\n\n"); xs_trim(&string, "\n "); ... } ``` #### xs_trim -- 檢測是否需要獨立複製一份 ```cpp xs *xs_trim(xs *x, const char *trimset) { if (!trimset[0]) return x; char *dataptr = xs_data(x), *orig = dataptr; if (xs_cow_lazy_copy(x, &dataptr)) orig = dataptr; ... ``` ##### xs_cow_lazy_copy(x, &dataptr) ```cpp= static bool xs_cow_lazy_copy(xs *x, char **data) { if (xs_get_refcnt(x) <= 1) return false; /* Lazy copy */ xs_dec_refcnt(x); xs_allocate_data(x, x->size, 0); if (data) { memcpy(xs_data(x), *data, x->size); /* Update the newly allocated pointer */ *data = xs_data(x); } return true; } ``` 當 xs_get_refcnt(x) > 1 時執行 Copy on Write Copy on Write (CoW) 是指進行字串複製操作時，倘若字串本身沒有修改，就會共享記憶體空間，從而減少記憶體佔用和省去複製的開銷。換言之，真正的「複製」操作僅發生在字串內容發生變更。 data 是指向指向 data (x->data 或 x->ptr) 的指標，這個函式的目的是將指向指向 data 的指標改成指向指向用來存字串新分配的記憶體位置， xs_get_refcnt(x) <= 1 得知 x 需要是長字串並且 reference counter 大於 1，才會進行 lazy copy (copy on write) xs_dec_refcnt(x) 將 reference counter 減 1，因為要將 x 獨立複製出來，所以指向 x 的指標就少一個 xs_allocate_data(x, x->size, 0) 重新配置字串 `x->ptr` 的記憶體空間 memcpy(xsdata(x), \*data, x->size) 把 data 的內容複製到新分配的 x->ptr \*data = xs_data(x) 將 data 指向新複製出來的位置 #### xs_trim -- 檢測哪些字元需要修剪 ```cpp xs *xs_trim(xs *x, const char *trimset) { ... /* similar to strspn/strpbrk but it operates on binary data */ uint8_t mask[32] = {0}; #define check_bit(byte) (CCC) /* CCC = mask[(uint8_t) byte / 8] & 1 << (uint8_t) byte % 8 */ #define set_bit(byte) (SSS) /* SSS = mask[(uint8_t) byte / 8] |= 1 << (uint8_t) byte % 8 */ size_t i, slen = xs_size(x), trimlen = strlen(trimset); for (i = 0; i < trimlen; i++) set_bit(trimset[i]); for (i = 0; i < slen; i++) if (!check_bit(dataptr[i])) break; for (; slen > 0; slen--) if (!check_bit(dataptr[slen - 1])) break; ... ``` ##### SSS = mask[(uint8_t) byte / 8] |= 1 << (uint8_t) byte % 8 <br> CCC = mask[(uint8_t) byte / 8] & 1 << (uint8_t) byte % 8 mask 佔 32 bytes，並且初始化為 0 set_bit(byte) 是用來將特定位置的 bit 設為 1，因此使用 `|=` 將等號右邊的 bitfield 值為 1 的部分，對應到 mask 相同位置的地方設為 1 check_bit(byte) 是用來測試特定位置的 bit 是否為 1，因此使用 `&`，測試的位置即為等號右邊的 bitfield 值為 1 的部分至於 `byte / 8` 和 `byte % 8` 是什麼意思呢，以下用字元 A (ASCII 碼為 65，二進位為 0100 0001) 舉例 * `/8` 也等於 `>>3`，`%8` 也等於 `&7` * 65 / 8 = 0100 0001 >> 3 = 0000 1000 (8) * 65 \% 8 = 0100 0001 \& 0000 0111 = 0000 0001 (1) * 1 << 65 \% 8 = 0000 0001 << 1 = 0000 0010 (2) * 執行 set_bit(byte) * mask[8] = 0000 0000 | 0000 0010 = 0000 0010 * 將所有需要修剪的字元都執行 set_bit，設定 mask 裡的值 * mask 的 index 可以是 0 ~ 31，而每個 index 對應的 8 個 bits 中只有 1 個 1 (8種可能)，所以 mask 可以表示 32 * 8 = 256 種組合 * 執行 check_bit(byte) * check `A` $\Rightarrow$ return `非0` * mask[8] = 0000 0010 * 0000 0010 & 0000 0010 = 0000 0010 * check `B` (ascii 碼為 66) $\Rightarrow$ return `0` * mask[8] = 0000 0010 * 0000 0010 & 0000 0100 = 0000 0000 * 透過先前 set_bit 記錄了需要修剪的字元，分別從字串的前後檢查是否為特定字元，若不是則停止檢查因此此修剪方式是只修剪字串前後出現的特定字元, eg. ` foobarbar \n\n` 修剪 `\n ` 變 `foobarbar` #### xs_trim -- 設定修剪後的字串 ```cpp xs *xs_trim(xs *x, const char *trimset) { ... dataptr += i; slen -= i; /* reserved space as a buffer on the heap. * Do not reallocate immediately. Instead, reuse it as possible. * Do not shrink to in place if < 16 bytes. */ memmove(orig, dataptr, slen); /* do not dirty memory unless it is needed */ if (orig[slen]) orig[slen] = 0; if (xs_is_ptr(x)) x->size = slen; else x->space_left = 15 - slen; return x; #undef check_bit #undef set_bit } ``` 透過先前 for 迴圈的 i，重新設定字串的起始指標 dataptr 以及字串長度 slen，再透過 memmove 將新字串複製到 orig，並且重新設定 x 的 size ### 解釋 xs_concat 程式碼 ```cpp int main(int argc, char *argv[]) { xs string = *xs_tmp("\n foobarbar \n\n\n"); xs_trim(&string, "\n "); ... xs prefix = *xs_tmp("((("), suffix = *xs_tmp(")))"); xs_concat(&string, &prefix, &suffix); ... } ``` ```cpp= xs *xs_concat(xs *string, const xs *prefix, const xs *suffix) { size_t pres = xs_size(prefix), sufs = xs_size(suffix), size = xs_size(string), capacity = xs_capacity(string); char *pre = xs_data(prefix), *suf = xs_data(suffix), *data = xs_data(string); xs_cow_lazy_copy(string, &data); if (size + pres + sufs <= capacity) { memmove(data + pres, data, size); memcpy(data, pre, pres); memcpy(data + pres + size, suf, sufs + 1); if (xs_is_ptr(string)) string->size = size + pres + sufs; else string->space_left = 15 - (size + pres + sufs); } else { xs tmps = xs_literal_empty(); xs_grow(&tmps, size + pres + sufs); char *tmpdata = xs_data(&tmps); memcpy(tmpdata + pres, data, size); memcpy(tmpdata, pre, pres); memcpy(tmpdata + pres + size, suf, sufs + 1); xs_free(string); *string = tmps; string->size = size + pres + sufs; } return string; } ``` 需要考慮以下兩種情形 * size + pres + sufs <= capacity 新的字串長度小於等於已配置的記憶體空間 * 只需將三個字串的記憶體移動或複製到對應位置即可 * 並重新記錄字串長度 * size + pres + sufs > capacity 新的字串長度大於已配置的記憶體空間 * 需要分配更多記憶體空間 * 再將三個字串的記憶體移動或複製到對應位置 * 釋放舊的字串所佔的記憶體空間 * 並重新記錄字串長度 memmove v.s. memcpy * 主要差別是 memmove 的 dest 和 src 記憶體重疊沒關係 * memmove * void *memmove(void *dest, const void *src, size_t n); * The memory areas **may overlap**: copying takes place as though the bytes in src are first copied into a temporary array that does not overlap src or dest, and the bytes are then copied from the temporary array to dest. * memcpy * void *memcpy(void *dest, const void *src, size_t n); * The memory areas **must not overlap**. * 第 12 行用 memmove 是因為此處是將 data 往後移 pres 個位置 (前面要放 pre)，因此若 pres 小於 size 時，src 和 dest 將發生重疊 ##### xs_grow(&tmps, size + pres + sufs) ```cpp= /* grow up to specified size */ xs *xs_grow(xs *x, size_t len) { char buf[16]; if (len <= xs_capacity(x)) return x; /* Backup first */ if (!xs_is_ptr(x)) memcpy(buf, x->data, 16); x->is_ptr = true; x->capacity = ilog2(len) + 1; if (xs_is_ptr(x)) { xs_allocate_data(x, len, 1); } else { xs_allocate_data(x, len, 0); memcpy(xs_data(x), buf, 16); } return x; } ``` * 短字串 * xs_capacity(x) 回傳 15，所以若新的字串長度大於 15，則需變更為中長字串，並用動態的方式配置記憶體 * 因為改成中長字串，所以需將原本 x->data 的值備份，以免寫入 x->is_ptr 和 x->capacity 時覆蓋到原本 x->data，等配置完新的記憶體空間時，再將備份寫入 x->ptr * 中長字串 * xs_capacity(x) 回傳 $2^{capacity}-1$，所以若新的字串長度大於原先配置的記憶體大小，將擴大記憶體空間重新配置程式碼問題：第 13 行將 x->is_ptr 設為 true 導致第 16 行的判斷式永遠進入 if 的陳述句，但這裡應該是先前為短字串的 x 要進入 else 的陳述句解決方法：在設定 x->is_ptr 之前先用一個變數儲存之前是否為短字串，並將 if 的判斷式改為這個變數 ```diff /* grow up to specified size */ xs *xs_grow(xs *x, size_t len) { ... + bool is_short = false; - if (!xs_is_ptr(x)) + if (!xs_is_ptr(x)) { memcpy(buf, x->data, 16); + is_short = true; + } x->is_ptr = true; x->capacity = ilog2(len) + 1; - if (xs_is_ptr(x)) { + if (!is_short) { ... ``` ##### xs_free(string) ```cpp= static inline xs *xs_free(xs *x) { if (xs_is_ptr(x) && xs_dec_refcnt(x) <= 0) free(x->ptr); return xs_newempty(x); } ``` 滿足以下兩個條件則需要釋放 x->ptr 的記憶體空間 * 中長字串，因為 x->ptr 是透過動態配置而得到的記憶體空間 * 長字串的 reference counter -1 $\le$ 0，因為要確保目前沒有副本是透過指向 x->ptr 這個位置而得的 ##### xs_newempty(x) ```cpp static inline xs *xs_newempty(xs *x) { *x = xs_literal_empty(); return x; } ``` 釋放記憶體空間後，初始化一個新的 x 並回傳回去 ### 執行 main 程式碼 ```cpp int main(int argc, char *argv[]) { xs string = *xs_tmp("\n foobarbar \n\n\n"); xs_trim(&string, "\n "); printf("[%s] : %2zu\n", xs_data(&string), xs_size(&string)); xs prefix = *xs_tmp("((("), suffix = *xs_tmp(")))"); xs_concat(&string, &prefix, &suffix); printf("[%s] : %2zu\n", xs_data(&string), xs_size(&string)); return 0; } ``` 透過以上的程式碼解釋，main 函式就看得懂啦 ٩(ˊᗜˋ )و，其執行結果為 ``` [foobarbar] : 9 [(((foobarbar)))] : 15 ``` ## 字串複製的函式實作 :::warning TODO: 思考如何改寫程式碼，使得 thread-safe? :notes: jserv ::: 短字串與中字串需要獨立複製一份出來，不同於長字串直接指向原字串的實作方式，因此短中字串需要執行 xs_new，而長字串在回傳前要將 reference counter 加 `1`，為了記錄目前有多少的指標在複製時指向他 ```cpp static inline xs *xs_cow_copy(xs *x) { if (!xs_is_large_string(x)) return xs_new(&xs_literal_empty(), xs_data(x)); xs_inc_refcnt(x); return x; } ``` ```cpp int main(int argc, char *argv[]) { xs short_str = *xs_tmp("foobarbar"); if (!xs_is_ptr(&short_str)) printf("\n-----------------------This is short string.-----------------------\n"); printf("[%s] : %2zu\n", xs_data(&short_str), xs_size(&short_str)); printf(" string is at address: %p\n", (void*)&short_str); printf("copy string is at address: %p\n", (void*)xs_cow_copy(&short_str)); xs medium_str = *xs_tmp("foobarbarbarbarbar"); if (xs_is_ptr(&medium_str) && !xs_is_large_string(&medium_str)) printf("\n-----------------------This is medium string.-----------------------\n"); printf("[%s] : %2zu\n", xs_data(&medium_str), xs_size(&medium_str)); printf(" string is at address: %p\n", (void*)&medium_str); printf("copy string is at address: %p\n", (void*)xs_cow_copy(&medium_str)); xs long_str = *xs_tmp("foobarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarb if (xs_is_large_string(&long_str)) printf("\n-----------------------This is long string.-----------------------\n"); printf("[%s] : %2zu\n", xs_data(&long_str), xs_size(&long_str)); printf(" string is at address: %p\n", (void*)&long_str); printf("copy string is at address: %p\n", (void*)xs_cow_copy(&long_str)); return 0; } ``` ```cpp -----------------------This is short string.----------------------- [foobarbar] : 9 string is at address: 0x7ffe00e4ecb0 copy string is at address: 0x7ffe00e4ec70 -----------------------This is medium string.----------------------- [foobarbarbarbarbar] : 18 string is at address: 0x7ffe00e4ecd0 copy string is at address: 0x7ffe00e4ec70 -----------------------This is long string.----------------------- [foobarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbar] : 258 string is at address: 0x7ffe00e4ecf0 copy string is at address: 0x7ffe00e4ecf0 ```