# Linux2021 quiz5A - SSO contributed by < [`andy78644`](https://github.com/andy78644) > ###### tags: `2021 linux` --- ## 程式運作原理 ### struct ```cpp= #define STRUCT_BODY(type) \ struct { \ size_t size : 54, on_heap : 1, capacity : 6, flag1 : 1, flag2 : 1, \ flag3 : 1; \ type *ptr; \ } ``` ```cpp= #define v(t, s, name, ...) \ (void) ((struct { \ _Static_assert(s <= 8, "it is too big"); \ int dummy; \ }){1}); \ union { \ STRUCT_BODY(t); \ struct { \ size_t filler; \ t buf[NEXT_POWER_OF_2(s)]; \ }; \ } name __attribute__((cleanup(vec_free))) = {.buf = {__VA_ARGS__}}; \ name.size = sizeof((__typeof__(name.buf[0])[]){0, __VA_ARGS__}) / \ sizeof(name.buf[0]) - \ 1; \ name.capacity = sizeof(name.buf) / sizeof(name.buf[0]) ``` 可以在 `main` 中宣告的 `v(float, 3, vec1);` 看到 `t` 應為 type ,而 s 應為 vector 大小 :::info `_Static_assert` 是在 C11 所定義,主要是用來在編譯時期找出錯誤,而其用法則在 [_Static_assert](https://en.wikichip.org/wiki/c/static_assertion) 可看到,第一個參數等於 0 時則會發出錯誤訊息也就是第二個參數的內容 > _Static_assert takes an integer constant expression. If the constant expression compares unequal to 0, the declaration has no effect. Otherwise it is assumed to produce a constraint violation and the implementation is required to produce a diagnostic message. The diagnostic message must include all the text of from the string-literal (with the exception of any characters that are not in the basic source character set). ::: 第一個 struct 裡面,利用 `_Static_assert` 來判斷,當在這邊這是用來檢視一開始宣告的 vector 大小是否有超過 8 接下來會建立一個 `name` 的 union ,分別是 `struct_body` 和一個 struct 的結構,其中 `struct_body` 是用 heap 來給較大的 vector,而 struct 是使用 stack 來給較小的 vector,而初始在前面有設定要小於 8 個,因此初始設定會將 data 放入後面的 struct 的結構,而 `NEXT_POWER_OF_2(s)` 則會使 buf 的大小會是大於等於 s 的 2 的冪 `{.buf = {__VA_ARGS__}}` 用來對 name 的 buf 進行初始化就是將 `__VA_ARGS` 放入 buf ,該用法為 [designated initializers](https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html) ,用來初始化特定的資料體中特定 field(或是 array index), 可用於 array, struct, union, 而其優點在 [你所不知道的C語言:技巧篇](https://hackmd.io/@sysprog/c-trick) 有詳細的描述 :::info `__VA_ARGS__` 為 Variadic Macros 在 gcc 中文件有提到是用來給 function 接受不定參數 > A macro can be declared to accept a variable number of arguments much as a function can 下面分別根據 GNU 裡面的範例做測試 ```clike= #define test1(format, ...) printf(format, __VA_ARGS__) #define test2(format, args...) printf(format, args) int main(){ test1("%s\n", "test"); test2("%s\n", "test"); } ``` > test test 這部分是針對如果不定參數不能為零,會產生錯誤 ```clike= #define test1(format, ...) printf(format, __VA_ARGS__) #define test2(format, args...) printf(format, args) int main(){ test1("ddd"); } ``` > test.c: In function ‘main’: test.c:3:54: error: expected expression before ‘)’ token #define test1(format, ...) printf(format, __VA_ARGS__) From : [Variadic Macros](https://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html) ::: `(__typeof__(name.buf[0])` 可以透過 `__typeof__` 來取的 buf 所存取的 type 類型, `(__typeof__(name.buf[0])[]){0, __VA_ARGS__}` 主要是用來防止指標的傳入導致 size 的計算錯誤。 :::info [Jserv's blog: 藝術與核心](http://blog.linux.org.tw/~jserv/archives/001888.html) (**該文章連結失效**) [Linux Kernel: ARRAY_SIZE()](http://frankchang0125.blogspot.com/2012/10/linux-kernel-arraysize.html) 在文章內有提到如果是直接使用 `(sizeof(arr) / sizeof(arr[0]))` 在使用指標與使用陣列傳入可能導致不同的結果,因為前面的 `sizeof(arr)` 會變成指標大小而不是整個 array 的大小,而使用上面所用到的方式即可避免傳入指標所導致的錯誤 在 [linux/kernel.h](https://github.com/torvalds/linux/blob/master/include/linux/kernel.h) 中所用來檢測 array_size 大小的方式為 ```cpp= #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr)) ``` 如果傳入為指標則會產生編譯錯誤 :::warning TODO 測試傳入指標後結果 ::: :::info `__attribute__()` 是用來將變數,函式等等加入一些屬性在下面為 gcc 內對其的定義 [Specifying Attributes of Variables](https://gcc.gnu.org/onlinedocs/gcc/Variable-Attributes.html) > The keyword __attribute__ allows you to specify special properties of variables, function parameters, or structure, union, and, in C++, class members. This __attribute__ keyword is followed by an attribute specification enclosed in double parentheses. 這邊主要將 `__attribute__` 分為下面幾種 > Some attributes are currently defined generically for variables. Other attributes are defined for variables on particular target systems. Other attributes are available for functions (see Function Attributes), labels (see Label Attributes), enumerators (see Enumerator Attributes), statements (see Statement Attributes), and for types (see Type Attributes). [Declaring Attributes of Functions](https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html#Function-Attributes) 在 gcc 的文件中有提到利用 `function attributes` 去幫助編譯器最佳化以及檢查一些錯誤 > In GNU C and C++, you can use function attributes to specify certain function properties that may help the compiler optimize calls or check code more carefully for correctness. >The cleanup attribute runs a function when the variable goes out of scope. This attribute can only be applied to auto function scope variables; it may not be applied to parameters or variables with static storage duration. The function must take one parameter, a pointer to a type compatible with the variable. The return value of the function (if any) is ignored. If -fexceptions is enabled, then cleanup_function is run during the stack unwinding that happens during the processing of the exception. Note that the cleanup attribute does not allow the exception to be caught, only to perform an action. It is undefined what happens if cleanup_function does not return normally. 以下是針對 `__attribute__((cleanup()))` 做的實驗,分別在 function 內進行 malloc ,然後在宣告時分別使用與未使用 `__attribute__((cleanup()))` 可以看到在未使用的實驗中,function 結束後並未執行 `free_stack` 去將配置的空間 free 掉,利用 `valgrind` 即可看到配置的一個 `int` 的空間在結束後仍然存在 ```shell= $ gcc -o test -g test.c $ valgrind -q --leak-check=full ./test ``` ```clike= #include <stdlib.h> #define autofree \ __attribute__((cleanup(free_stack))) __attribute__ ((always_inline)) inline void free_stack(void *ptr) { free(*(void **) ptr); } int main(void) { autofree int *i = malloc(sizeof (int)); *i = 1; return *i; } ``` > 沒有未 free 的空間 ```clike= #include <stdlib.h> inline void free_stack(void *ptr) { free(*(void **) ptr); } int main(void) { int *i = malloc(sizeof (int)); *i = 1; return *i; } ``` > \==120440== 4 bytes in 1 blocks are definitely lost in loss record 1 of 1 \==120440== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) \==120440== by 0x10915E: main (test2.c:8) \==120440== ::: ### ```cpp #define NEXT_POWER_OF_2(s) \ !(s & (s - 1)) ? s : (size_t) 1 << (64 - __builtin_clzl(s)) ``` ### vec_size and vec_capacity ```cpp= #define vec_size(v) v.size #define vec_capacity(v) \ (v.on_heap ? (size_t) 1 << v.capacity : sizeof(v.buf) / sizeof(v.buf[0])) #define vec_elemsize(v) sizeof(v.buf[0]) ``` size 的大小為 vector 內實際儲存的資料數量 capacity 為整個 vector 的空間大小 `vec_capacity` 內會先判斷該 vector 是存放在 stack 還是 heap 上,如在 heap 上,該大小會記錄在 `struct_body` 的 `capacity` 上,由於該數字為記錄 `capacity` 的冪,因此使用 `1 << v.capacity` 來獲得 capacity 的大小,如果是存放在 stack 可直接利用 `v.buf` 去計算空間大小。 ### push, pop and reserve ```cpp= #define vec_push_back(v, e) \ __vec_push_back(&v, &(__typeof__(v.buf[0])[]){e}, vec_elemsize(v), \ vec_capacity(v)) static NON_NULL void __vec_push_back(void *restrict vec, void *restrict e, size_t elemsize, size_t capacity) { union { STRUCT_BODY(char); struct { size_t filler; char buf[]; }; } *v = vec; if (v->on_heap) { if (v->size == capacity) v->ptr = realloc(v->ptr, elemsize * (size_t) 1 << ++v->capacity); memcpy(&v->ptr[v->size++ * elemsize], e, elemsize); } else { if (v->size == capacity) { void *tmp = malloc(elemsize * (size_t) 1 << (v->capacity = capacity + 1)); memcpy(tmp, v->buf, elemsize * v->size); v->ptr = tmp; v->on_heap = 1; memcpy(&v->ptr[v->size++ * elemsize], e, elemsize); } else memcpy(&v->buf[v->size++ * elemsize], e, elemsize); } } ``` `push_back` 會先確認是否在 heap 上,接下來會分別確認 vecotr 的空間是否不夠,如果是在 heap 上會將整個空間大小變成兩倍,由於 `realloc` 在將空間改變後會將原本的內容移到新的空間上,因此不需重新再複製原本內容,只需將新的內容放入 vector , 也就是在 21 行的位置將新的資料放入 `v_size + 1` 的位置,而如果不是在 heap 上如果空間已滿,則會直接將資料放到 heap 上而不是繼續使用原本的 stack 的空間,而新設的空間大小則為原本 `capacity` 當作 2 的冪 :::info [realloc man page](https://linux.die.net/man/3/realloc) 內所提到 realloc 將原本內容複製到新的空間上並回傳一個新的 pointer 指向所配置新的空間 > The realloc() function changes the size of the memory block pointed to by ptr to size bytes. **The contents will be unchanged in the range from the start of the region up to the minimum of the old and new sizes.** If the new size is larger than the old size, the added memory will not be initialized. If ptr is NULL, then the call is equivalent to malloc(size), for all values of size; if size is equal to zero, and ptr is not NULL, then the call is equivalent to free(ptr). Unless ptr is NULL, it must have been returned by an earlier call to malloc(), calloc() or realloc(). If the area pointed to was moved, a free(ptr) is done. > **The realloc() function returns a pointer to the newly allocated memory**, which is suitably aligned for any built-in type and may be different from ptr, or NULL if the request fails. If size was equal to 0, either NULL or a pointer suitable to be passed to free() is returned. If realloc() fails, the original block is left untouched; it is not freed or moved. ::: :::info C99 的 [6.7.3.1 Formal definition of restrict](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf) 有提到關於 restrict 的用法,下面範例的用法與 `vec_push` 的用法相同避免到兩個指標存取相同區域,在這邊可以避免原本的 vector 和新傳入的 element 是同一個東西 > The function parameter declarations in the following example assert that, during each execution of the function, if an object is accessed through one of the pointer parameters, then it is not also accessed through the other. ```clike= void f(int n, int * restrict p, int * restrict q) { while (n-- > 0) *p++ = *q++; } ``` ::: ```cpp= #define vec_pop_back(v) (void) (v.size -= 1) ``` 透過直接將 `v.size` 減 1 代表減少所儲存的資料,不過由於只更改 `v.size` 而未更改 capacity 有可能使 vector 空間過大而所儲存的資料其實很少而沒有最佳使用資源數量導致效能的降低 ```cpp= #define vec_reserve(v, n) __vec_reserve(&v, n, vec_elemsize(v), vec_capacity(v)) static NON_NULL void __vec_reserve(void *vec, size_t n, size_t elemsize, size_t capacity) { union { STRUCT_BODY(void); struct { size_t filler; char buf[]; }; } *v = vec; if (n > capacity) { if (v->on_heap) { v->ptr = realloc(v->ptr, elemsize * (size_t) 1 << (v->capacity = ilog2(n))); } else { void *tmp = malloc(elemsize * (size_t) 1 << (v->capacity = ilog2(n))); memcpy(tmp, v->buf, elemsize * v->size); v->ptr = tmp; v->on_heap = 1; } } } ``` __vec_reserve 為 linux 命名方式代表內部實作,而外部 vec_reserve(v, n) 進行操作 該函式是先為 vector 預留 capacity ,因此會看 capacity 的大小是否本來就有足夠空間、如未足夠則需配置新的空間,而新的空間則會類似 `push_back` 的處理方式分別有在 heap 上和不在 heap 上 ### data ```cpp= #define vec_data(v) (v.on_heap ? v.ptr : v.buf) /* always contiguous buffer */ #define vec_pos(v, n) vec_data(v)[n] /* lvalue */ ``` 取得 vector 某位置上的資料,需先知道是 heap 上還是 bug 內在去取得資料 ### free ```cpp= /* This function attribute specifies function parameters that are not supposed * to be null pointers. This enables the compiler to generate a warning on * encountering such a parameter. */ #define NON_NULL __attribute__((nonnull)) static NON_NULL void vec_free(void *p) { STRUCT_BODY(void) *s = p; if (s->on_heap) free(s->ptr); } ``` 如果是在 heap 上要將其上面 malloc 的空間 free [NON_NULL](https://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Function-Attributes.html) 提到 NON_NULL 可以在編譯時期檢查 function 內的參數是否是空指標,並且編譯器可以透過傳入非空指標進行優化 > **The nonnull attribute specifies that some function parameters should be non-null pointers** > If the compiler determines that a null pointer is passed in an argument slot marked as non-null, and the -Wnonnull option is enabled, a warning is issued. **The compiler may also choose to make optimizations based on the knowledge that certain function arguments will not be null.** > If no argument index list is given to the nonnull attribute, all pointer arguments are marked as non-null --- ## 效能分析 ```cpp #include <stdio.h> #include "vector.h" int main() { v(int, 3, vec1); int i=0; for(;i<1000;i++){ vec_push_back(vec1, 100); } return 0; } ==11274== HEAP SUMMARY: ==11274== in use at exit: 0 bytes in 0 blocks ==11274== total heap usage: 6 allocs, 6 frees, 8,064 bytes allocated -------------------------------------------------------------------------------- n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B) -------------------------------------------------------------------------------- 0 0 0 0 0 0 1 125,614 136 128 8 0 2 128,201 264 256 8 0 3 131,113 520 512 8 0 4 136,873 1,032 1,024 8 0 5 148,329 2,056 2,048 8 0 6 171,177 4,104 4,096 8 0 7 214,622 4,104 4,096 8 0 99.81% (4,096B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc. ->99.81% (4,096B) 0x109434: __vec_push_back (vector.h:111) | ->99.81% (4,096B) 0x10970A: main (test.c:9) | ->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%) ``` 首先可以看到 vector 在結束後確實如上面所使用的 `__attribute__((cleanup()))` 去對所有 heap 內 malloc 的空間將其 free,因此結束時沒有未 free 的空間 接著在記憶體的消耗中可以看到從 stack 轉入 heap 後並不是2被大小而這個可以從 `__vec_push_back` 中發現當空間在 stack 中遇到 size = capacity 時會先將空間移入 heap,然後 capacity 則會變為 $2^{capacity + 1}$ ,在我們這個測試程式中,原本的 capacity 為 4 , size 為 3,當 push_back 一次後 capacity 等於 size ,此時即會將 capacity 變為 $2^{4+1}$ 也就是 32 ```cpp= if (v->size == capacity) { void *tmp = malloc(elemsize * (size_t) 1 << (v->capacity = capacity + 1)); memcpy(tmp, v->buf, elemsize * v->size); v->ptr = tmp; v->on_heap = 1; memcpy(&v->ptr[v->size++ * elemsize], e, elemsize); ``` ```cpp= vec_capacity(vec1) = 4 vec_capacity(vec1) = 32 ... vec_capacity(vec1) = 64 ... vec_capacity(vec1) = 128 ... ``` ## Growth factor ### fbvector 在 [FBVector](https://github.com/facebook/folly/blob/master/folly/docs/FBVector.md) 有提到使用 power of 2 會因為前面所使用的記憶體空間加起來小於新的空間導致記憶體無法被重複使用,但如果使用小於 2 的倍數,即可以重複使用,在該文件中也有提到 gcc 實作的 growth factor 是使用 2 ,因此如果利用 gcc 去編譯 STL::vector 則會使記憶體的利用率無法那麼高 > The initial HP implementation by Stepanov used a growth factor of 2; i.e., whenever you'd push_back into a vector without there being room, it would double the current capacity. This was not a good choice: **it can be mathematically proven that a growth factor of 2 is rigorously the worst possible because it never allows the vector to reuse any of its previously-allocated memory.** Despite other compilers reducing the growth factor to 1.5, gcc has staunchly maintained its factor of 2. This makes std::vector cache- unfriendly and memory manager unfriendly. > If we choose k = 2 we know that every element in the series will be strictly larger than the sum of all previous ones because of the remarkable equality: `1 + 2^1 + 2^2 + 2^3... + 2^n = 2^{(n+1)} - 1` > **This means that any new chunk requested will be larger than all previously used chunks combined, so the vector must crawl forward in memory; it can't move back to its deallocated chunks.** But any number smaller than 2 guarantees that you'll at some point be able to reuse the previous chunks. For instance, **choosing 1.5 as the factor allows memory reuse after 4 reallocations;** 1.45 allows memory reuse after 3 reallocations; and 1.3 allows reuse after only 2 reallocations. ### factor 1.5 ```cpp= #define FACTOR 1.5 #define CHUNK_SIZE 4 static inline float ilog_factor(float n) /* log1.5(n) = log2(n)/log2(1.5)*/ { return ceilf(log2f(n)/log2f(FACTOR)); } ``` 要改進該實作,可以看到這邊為了要計算 $log_{1.5}(n)$ 因此計算了兩次以 2 為底的 log ,可以發現後面的 `log2f(FACTOR)` 實際應為常數,由於我們所計算出的 capacity 不需要非常的精確,因此我們可以將 `#define FACTOR` 進行下列更改 ```cpp= #define FACTOR 0.58496250072 ``` 也就是將 $log_{2}(1.5)$ 先計算出當作常數$$ log_2 1.5 \approx 0.58496250072 $$ ### vec_capacity 修改 ```cpp= #define vec_capacity(v) \ (v.on_heap ? (size_t) (CHUNK_SIZE * pow(1.5, ilog_factor(v.capacity)) : sizeof(v.buf) / sizeof(v.buf[0])) ``` ### __vec_reserve 修改 由於新的 capacity 的算法為 `CHUNK_SIZE*pow(1.5,(v->capacity)` ,因此如果是直接使用 `ilog_factor(n)` 當作冪,會變成多乘以 'CHUNK_SIZE' ,空間上會變得過大,由於 $1.5^3$ = 3.375 以及 $1.5^2$ = 2.25,我選擇將冪扣掉 2 以避免空間過大或過小 ```cpp= if (n > capacity) { if (v->on_heap) { v->ptr = realloc(v->ptr, elemsize * (size_t)(CHUNK_SIZE*pow(1.5,(v->capacity = ilog_factor(n) -2 )))); } else { void *tmp = malloc(elemsize * (size_t)(CHUNK_SIZE*pow(1.5,(v->capacity = ilog_factor(n) - 2)))); memcpy(tmp, v->buf, elemsize * v->size); v->ptr = tmp; v->on_heap = 1; } } ``` ### __vec_pushback 修改 ```cpp= if (v->on_heap) { if (v->size == capacity) v->ptr = realloc(v->ptr, elemsize * (size_t)(CHUNK_SIZE * pow(1.5, ++v->capacity))); memcpy(&v->ptr[v->size++ * elemsize], e, elemsize); } else { if (v->size == capacity) { void *tmp = malloc(elemsize * (size_t)(CHUNK_SIZE * pow(1.5, (v->capacity = capacity)))); memcpy(tmp, v->buf, elemsize * v->size); v->ptr = tmp; v->on_heap = 1; memcpy(&v->ptr[v->size++ * elemsize], e, elemsize); } else memcpy(&v->buf[v->size++ * elemsize], e, elemsize); } ``` ### 效能分析 由於每次空間大小的 growth rate 是 1.5,所以同樣使用相同空間大小所需分配的次數有可能較多 ```cpp= ==11178== HEAP SUMMARY: ==11178== in use at exit: 0 bytes in 0 blocks ==11178== total heap usage: 11 allocs, 11 frees, 13,824 bytes allocated -------------------------------------------------------------------------------- n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B) -------------------------------------------------------------------------------- 0 0 0 0 0 0 1 151,261 88 80 8 0 2 155,062 136 120 16 0 3 157,494 200 180 20 0 4 161,041 280 272 8 0 5 166,372 424 408 16 0 6 174,156 632 612 20 0 7 185,731 936 920 16 0 8 203,104 1,400 1,380 20 0 9 228,951 2,088 2,072 16 0 99.23% (2,072B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc. ->99.23% (2,072B) 0x1095EC: __vec_push_back (vector_factor.h:119) | ->99.23% (2,072B) 0x109953: main (test.c:11) | ->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%) -------------------------------------------------------------------------------- n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B) -------------------------------------------------------------------------------- 10 267,732 3,128 3,112 16 0 11 325,914 4,680 4,668 12 0 12 375,299 4,680 4,668 12 0 99.74% (4,668B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc. ->99.74% (4,668B) 0x1095EC: __vec_push_back (vector_factor.h:119) | ->99.74% (4,668B) 0x109953: main (test.c:11) | ->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%) ``` :::warning TODO 檢測實際記憶體配置是否有重複使用 實作 insert earase :::