# 2021q1 Homework5 (quiz5) contributed by < [`linD026`](https://github.com/linD026) > ###### tags: `linux2021` > [2021 年第 5 週隨堂測驗題目](https://hackmd.io/@sysprog/linux2021-quiz5) --- ## 0 進度 - [x] 測驗 1 - [x] 1 解釋程式碼 - [x] 2 限制與改進 - [x] 3 vector - insert 和 erase - [ ] 測驗 2 - [ ] 1 解釋程式碼 - [x] 2 限制與改進 - [x] EV3 子計畫 Coroutines in C - yield 和 restart --- ## 測驗 1 ## 1 解釋程式碼以及效能分析 ### 解釋程式碼 * `STRUCT_BODY` * 此 marco 是 on_heap 狀態下,儲存的資料結構。 * 結構組成是 8 bytes ( 64 bits ) + pointer ( 8 bytes ) 。 * 其中前者利用 54 bits 紀錄記憶體使用量 -- size * 1 bit 來判斷是否為動態記憶體配置 -- on_heap * 和用 6 bits 紀錄分配的記憶體大小的指數 -- capacity * 其他三個與 on_heap 相同性質的 flag -- flag1, flag2, flag3 * 隨後是指向其資料的指標 -- ptr ```cpp #define STRUCT_BODY(type) \ struct { \ size_t size : 54, on_heap : 1, capacity : 6, flag1 : 1, flag2 : 1, \ flag3 : 1; \ type *ptr; \ } ``` * `NEXT_POWER_OF_2` * 利用 $2^n$ 在二進制只會有一個 bit 為 1 的性質判斷。 * 譬如 $s = 4$,$4_{10} = 0100_2$,!($0100_2$ & $0011_2$) 為 true ,結果為 s 。 * 而當判斷式為否時,則利用 `__builtin_clzl` 找 `the number of leading 0-bits in x`後,在與 64 相減得到下一個 $2^n$ 。 * 會是與 64 相減的是因為 `__builtin_clzl` 回傳的是 `unsigned long` 型態。而一般 `uint64_t` 為 `typedef unsigned long uint64_t` 。 >**Built-in Function: `int builtin_clz (unsigned int x)`** Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined. > **Built-in Function: `int builtin_clzl (unsigned long)`** Similar to `__builtin_clz`, except the argument type is unsigned long. ```cpp #define NEXT_POWER_OF_2(s) \ !(s & (s - 1)) ? s : (size_t) 1 << (64 - __builtin_clzl(s)) ``` * `nonnull attribute` ```cpp #define NON_NULL __attribute__((nonnull)) ``` > `nonnull (arg-index, ...)` > The nonnull attribute specifies that some function parameters should be non-null pointers. > * `v` * 此為 vector 的宣告。 * `t` 為儲存資料的型態,`s` 為初始化時儲存資料的數量, `name` 此 vector 結構的名稱。 * 最後 `...` 為不定變數。 * 第一個 struct 是先以 `_Static_assert` 判斷 `s` 是否過大。 * union 則是除存於 heap 和 stack 的資料狀態。 ```graphviz digraph main { rankdir = TD node [shape = record] heap_memory [label = "{heap}|{8 bytes|<0>}|{8 bytes|ptr}"] stack_memory [label = "{stack}|{8 bytes|{filler}}|{8 bytes|buf}"] heap_detail [label = "{{{54 bits|size}|{1 bit|on_heap}|{6 bits|capacity}|{1 bit|flag1}|{1 bit|flag2}|{1 bit|flag3}}}"] heap_memory:0 -> heap_detail } ``` ```cpp #define v(t, s, name, ...) \ (void) ((struct { \ _Static_assert(s <= 8, "it is too big"); \ int dummy; \ }){1}); \ union { \ STRUCT_BODY(t); \ struct { \ size_t filler; \ t buf[NEXT_POWER_OF_2(s)]; \ }; \ } name __attribute__((cleanup(vec_free))) = {.buf = {__VA_ARGS__}}; \ name.size = sizeof((__typeof__(name.buf[0])[]){0, __VA_ARGS__}) / \ sizeof(name.buf[0]) - \ 1; \ name.capacity = sizeof(name.buf) / sizeof(name.buf[0]) ``` ### cleanup > `cleanup (cleanup_function)` > The cleanup attribute runs a function when the variable goes out of scope. This attribute can only be applied to auto function scope variables; **it may not be applied to parameters or variables with static storage duration.** The function must take one parameter, a pointer to a type compatible with the variable. The return value of the function (if any) is ignored. > > If `-fexceptions` is enabled, then `cleanup_function` is run during the stack unwinding that happens during the processing of the exception. Note that the cleanup attribute does not allow the exception to be caught, only to perform an action. It is undefined what happens if `cleanup_function` does not return normally. gcc 提供了 cleanup attribute ,讓當變數超出 scope 時,自動跑 cleanup 裡的函式。 須注意這不能應用在 parameters or variables 是靜態儲存的情況。 ### Variadic Macros > This kind of macro is called variadic. **When the macro is invoked, all the tokens in its argument list after the last named argument (this macro has none), including any commas, become the variable argument**. This sequence of tokens replaces the identifier `__VA_ARGS__` in the macro body wherever it appears. 例子: ```cpp #define eprintf(...) fprintf (stderr, __VA_ARGS__) eprintf ("%s:%d: ", input_file, lineno) → fprintf (stderr, "%s:%d: ", input_file, lineno) ``` #### At least one argument 問題 - standard C > ... if you left the variable argument empty, you would have gotten a syntax error, because there would have been an extra comma after the format string. :::warning This formulation looks more descriptive, but historically it was less flexible: you had to supply at least one argument after the format string. In `standard C`, you could not omit the comma separating the named argument from the variable arguments. (Note that this restriction **has been lifted in `C++2a`**, and **never existed in `GNU C`**; see below.) ::: #### 解決方法 1. First, in GNU CPP, and in C++ beginning in C++2a, you are allowed to leave the variable argument out entirely: ```cpp eprintf ("success!\n") → fprintf(stderr, "success!\n", ); ``` 2. Second, C++2a introduces the `__VA_OPT__` function macro. This macro may only appear in the definition of a variadic macro. If the variable argument has any tokens, then a `__VA_OPT__` invocation expands to its argument; but if the variable argument does not have any tokens, the `__VA_OPT__` expands to nothing: ```cpp #define eprintf(format, ...) \ fprintf (stderr, format __VA_OPT__(,) __VA_ARGS__) ``` > `__VA_OPT__` is also available in GNU C and GNU C++. 3. Historically, GNU CPP has also had another extension to handle the trailing comma: the ‘##’ token paste operator has a special meaning when placed between a comma and a variable argument. Despite the introduction of `__VA_OPT__`, this extension remains supported in GNU CPP, for backward compatibility. If you write ```cpp #define eprintf(format, ...) fprintf (stderr, format, ##__VA_ARGS__) ``` :::info **Reference** * [你所不知道的C語言:技巧篇 - Smart Pointer](https://hackmd.io/@sysprog/c-trick?type=view#Smart-Pointer) * [gcc gnu - 3.6 Variadic Macros](https://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html) * [gcc gnu - 6.34.1 Common Variable Attributes](https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html) * [gcc gnu - 6.59 Other Built-in Functions Provided by GCC](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html) * [stackoverflow - C++ preprocessor `__VA_ARGS__` number of arguments](https://stackoverflow.com/questions/2124339/c-preprocessor-va-args-number-of-arguments) ::: ### 效能分析 #### 初步測試 - stack to heap 先對 vector 進行 push back ,測量對於記憶體的分配是否符合預期。 可以看到記憶體分配從 stack 到 heap 的轉換並沒有達到預想的原始大小 2 倍的分配。 ```cpp // analysis v(int, 3, vec_int, 1, 2, 3); int i = 3; for (;i < 1000;i++) vec_push_back(vec_int, i); -------------------------------------------------------------------------------- n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B) -------------------------------------------------------------------------------- 0 0 0 0 0 0 1 146,056 136 128 8 0 2 148,671 264 256 8 0 3 151,615 520 512 8 0 4 157,439 1,032 1,024 8 0 5 169,023 2,056 2,048 8 0 6 192,127 4,104 4,096 8 0 7 235,969 4,104 4,096 8 0 99.81% (4,096B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc. ->99.81% (4,096B) 0x109474: __vec_push_back (self_vector.c:121) | ->99.81% (4,096B) 0x10A627: main (self_vector.c:211) | ->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%) ``` 在進一步顯示 capacity 的大小在從 stack 轉換至 heap 時,是如何改變的: ``` capacity(vec1)=4 capacity(vec1)=32 ... capacity(vec1)=64 ... capacity(vec1)=128 ... ``` 可以看出這是因為在 `__vec_push_back` 函式中,對於 stack to heap 的操作, capacity 是以 capacity(v) + 1 狀態更新。 因此次狀況會是: $initail capacity = 2;$ capacity = $2^{capacity} + 1;$ $capacity = 4 + 1 = 5 = 2^{5} = 32;$ ```cpp if (v->size == capacity) { void *tmp = malloc(elemsize * (size_t) 1 << (v->capacity = capacity + 1)); memcpy(tmp, v->buf, elemsize * v->size); v->ptr = tmp; v->on_heap = 1; memcpy(&v->ptr[v->size++ * elemsize], e, elemsize); } ``` 雖然不影響操作結果,但對於記憶體增長的曲線較不符合預期,因此如果不想遽增可以改成: ```cpp malloc(elemsize * (size_t) 1 << (v->capacity = ilog2(capacity) + 1)); ``` #### CPU 使用率 ```shell Performance counter stats for './m_self' (10000 runs): 719 cache-misses # 1.740 % of all cache refs ( +- 1.08% ) 4,1345 cache-references ( +- 0.05% ) 76,6800 instructions # 0.97 insn per cycle ( +- 0.02% ) 78,7666 cycles ( +- 0.06% ) 0.000313471 +- 0.000000765 seconds time elapsed ( +- 0.24% ) ``` ## 2 限制與改進 ### FBVector - reuse previously-allocated memory 在 [FBvector.md](https://github.com/facebook/folly/blob/master/folly/docs/FBVector.md#memory-handling) 中提及對於記憶體增長的倍數是 power of 2 的弊端,是因為先前的配置過得記憶體大小必定**小於**增長過後的大小,因此這不會使 vector 重新使用之前的記憶體。 導致最後必定會無法使用 deallocated chunks 。 > The initial HP implementation by Stepanov used a growth factor of 2; i.e., whenever you'd push_back into a vector without there being room, it would double the current capacity. This was not a good choice: **it can be mathematically proven that a growth factor of 2 is rigorously the worst possible because it never allows the vector to reuse any of its previously-allocated memory.** > `1 + 2^^1 + 2^^2 + 2^^3... + 2^^n = 2^^(n+1) - 1` **This means that any new chunk requested will be larger than all previously used chunks combined**, so the vector must crawl forward in memory; it can't move back to its deallocated chunks. 而在後續也有提及任何小於 2 的數字可以保正在某個 reallocation 次數之後,可以重新使用原先記憶體。 > **But any number smaller than 2 guarantees that you'll at some point be able to reuse the previous chunks.** For instance, choosing 1.5 as the factor allows memory reuse after 4 reallocations; 1.45 allows memory reuse after 3 reallocations; and 1.3 allows reuse after only 2 reallocations. 當然這是在**假設 allocator 會儘量使用先前記憶體**的情形,但這不影響我們避免最糟的選擇。 > Of course, the above makes a number of simplifying assumptions about how the memory allocator works, but definitely you don't want to choose the theoretically absolute worst growth factor. fbvector uses a growth factor of 1.5. That does not impede good performance at small sizes because of the way fbvector cooperates with jemalloc (below). ### factor - 1.5 更改 ```cpp #define FACTOR 1.5 #define CHUNK_SIZE 4 #define vec_capacity(v) \ ((v.on_heap) ? (size_t) (CHUNK_SIZE * pow(FACTOR, v.capacity)) : sizeof(v.buf) / sizeof(v.buf[0])) static inline float ilog_factor(float n) /* log1.5(n) = log2(n)/log2(1.5)*/ { return ceilf(log2f(n)/log2f(FACTOR)); } ``` * `__vec_push_back` ```cpp ... if (v->on_heap) { if (v->size == capacity) v->ptr = realloc(v->ptr, elemsize * CHUNK_SIZE * pow(FACTOR, (++v->capacity))); memcpy(&v->ptr[v->size++ * elemsize], e, elemsize); } else { if (v->size == capacity) { v->capacity = 1; void *tmp = malloc(elemsize * (size_t)(CHUNK_SIZE * pow(FACTOR, (v->capacity)))); memcpy(tmp, v->buf, elemsize * v->size); v->ptr = tmp; v->on_heap = 1; memcpy(&v->ptr[v->size++ * elemsize], e, elemsize); } else memcpy(&v->buf[v->size++ * elemsize], e, elemsize); } ... ``` * `__vec_reserve` ```cpp ... if (n > capacity) { float new_cap = ilog_factor(n/CHUNK_SIZE); new_cap = (new_cap < 1) ? 1 : new_cap; if (v->on_heap) { v->ptr = realloc(v->ptr, elemsize * (size_t)(CHUNK_SIZE * pow(FACTOR, v->capacity = new_cap))); } else { void *tmp = malloc(elemsize * (size_t)(CHUNK_SIZE * pow(FACTOR, v->capacity = new_cap))); memcpy(tmp, v->buf, elemsize * v->size); v->ptr = tmp; v->on_heap = 1; } } } ... ``` ```cpp v(int, 3, vec_int, 1, 2, 3); int i = 3; for (;i < 10;i++) { vec_push_back(vec_int, i); printf("capacity(vec1)=%zu, %d\n", vec_capacity(vec_int), vec_int.capacity); } vec_reserve(vec_int, 15); printf("capacity(vec1)=%zu, %d\n", vec_capacity(vec_int), vec_int.capacity); ----------------------- capacity(vec1)=4, 4 capacity(vec1)=6, 1 capacity(vec1)=6, 1 capacity(vec1)=9, 2 capacity(vec1)=9, 2 capacity(vec1)=9, 2 capacity(vec1)=13, 3 capacity(vec1)=20, 4 ``` #### reallocate address 測試 從下方結果顯示出實際情況並不完全如 FBVector 所假設。 這原因可以從[此看得出原因](https://hackmd.io/@linD026/linux2021-quiz5#realloc)。 ```cpp void reuse_test(void) { v(uint64_t, 3, test, 1, 2, 3); printf("%10s %10s %10s %10s\n", "i", "capacity", "realsize", "address"); printf("%10d %10d %10ld %10p\n", 0, test.capacity, vec_capacity(test), &vec_pos(test, 1)); for (int i = 1;i <= 20;i++) { //vec_reserve(test, vec_capacity(test) * 2); vec_reserve(test, (size_t)(CHUNK_SIZE * pow(FACTOR, i))); printf("%10d %10d %10ld %10p\n", i, test.capacity, vec_capacity(test), &vec_pos(test, 1)); } } ``` ```shell $ make test ./m_vector i capacity realsize address 0 4 4 0x7fff6c164e20 1 1 6 0x55889e4696b8 2 2 9 0x55889e4696b8 3 3 13 0x55889e4696b8 4 4 20 0x55889e4696b8 5 5 30 0x55889e4696b8 6 6 45 0x55889e4696b8 7 7 68 0x55889e4696b8 8 8 102 0x55889e4696b8 9 9 153 0x55889e4696b8 10 10 230 0x55889e4696b8 11 11 345 0x55889e4696b8 12 12 518 0x55889e4696b8 13 13 778 0x55889e4696b8 14 14 1167 0x55889e4696b8 15 15 1751 0x55889e4696b8 16 16 2627 0x55889e4696b8 17 17 3941 0x55889e4696b8 18 18 5911 0x55889e4696b8 19 19 8867 0x55889e4696b8 20 20 13301 0x55889e4696b8 21 21 19951 0x7f2ab4d96018 22 22 29927 0x7f2ab4d5b018 23 23 44890 0x7f2ab4d5b018 24 24 67336 0x7f2ab4cd7018 25 25 101004 0x7f2ab4cd7018 26 26 151507 0x7f2ab4baf018 27 27 227260 0x7f2ab4baf018 28 28 340890 0x7f2ab4915018 29 30 767004 0x7f2ab433a018 30 30 767004 0x7f2ab433a018 ------------------------------------------------------------------------------- $ make self cc -o m_self self_vector.c -g -lm $ make test_self ./m_self i capacity realsize address 0 4 4 0x7ffe86ce1c00 1 3 8 0x560605a6a6b8 2 4 16 0x560605a6a6b8 3 5 32 0x560605a6a6b8 4 6 64 0x560605a6a6b8 5 7 128 0x560605a6a6b8 6 8 256 0x560605a6a6b8 7 9 512 0x560605a6a6b8 8 10 1024 0x560605a6a6b8 9 11 2048 0x560605a6a6b8 10 12 4096 0x560605a6a6b8 11 13 8192 0x560605a6a6b8 12 14 16384 0x560605a6a6b8 13 15 32768 0x7f9dbbd78018 14 16 65536 0x7f9dbbcf7018 15 17 131072 0x7f9dbbbf6018 16 18 262144 0x7f9dbb9f5018 17 19 524288 0x7f9dbb5f4018 18 20 1048576 0x7f9dbadf3018 19 21 2097152 0x7f9db9df2018 20 22 4194304 0x7f9db7df1018 21 23 8388608 0x7f9db3df0018 22 24 16777216 0x7f9dabdef018 23 25 33554432 0x7f9d9bdee018 24 26 67108864 0x7f9d7bded018 25 27 134217728 0x7f9d3bdec018 26 28 268435456 0x7f9cbbdeb018 27 29 536870912 0x7f9bbbdea018 28 30 1073741824 0x7f99bbde9018 29 31 2147483648 0x7f95bbde8018 30 32 4294967296 0x7f8dbbde7018 ``` ### FBVector - autiumatically detect - memory chunk 如果在細部探討 allocator 對於 memory 的操作,可以看到 vector 對於記憶體分配: $realsize(currentsize) = chuncksize \times factor^{\lceil log_{factor}^{current size}\rceil}$ 從上述式子得知,記憶體是以倍數增長,因此會有多餘的記憶體。 > As discussed above, `std::vector` also needs to (re)allocate in quanta. The next quantum is usually defined in terms of the current size times the infamous growth constant. **Because of this setup, `std::vector` has some slack memory at the end much like an allocated block has some slack memory at the end.** 因此 `fbvector` 正是藉由 jemalloc 用以自動檢測記憶體大小並且 `reallocation` 的方式避免。 > It doesn't take a rocket surgeon to figure out that an allocator- aware `std::vector` would be a marriage made in heaven: the vector could directly request blocks of "perfect" size from the allocator so there would be virtually no slack in the allocator. Also, the entire growth strategy could be adjusted to work perfectly with allocator's own block growth strategy. **That's exactly what `fbvector` does - it automatically detects the use of jemalloc and adjusts its reallocation strategy accordingly.** 以下為 folly 對此操作的程式碼: #### `goodMallocSize()` ```cpp inline size_t goodMallocSize(size_t minSize) noexcept { if (minSize == 0) { return 0; } if (!canNallocx()) { // No nallocx - no smarts return minSize; } // nallocx returns 0 if minSize can't succeed, but 0 is not actually // a goodMallocSize if you want minSize auto rv = nallocx(minSize, 0); return rv ? rv : minSize; } ``` #### `jemallocMinInPlaceExpandable` ```cpp // We always request "good" sizes for allocation, so jemalloc can // never grow in place small blocks; they're already occupied to the // brim. Blocks larger than or equal to 4096 bytes can in fact be // expanded in place, and this constant reflects that. static const size_t jemallocMinInPlaceExpandable = 4096; ``` 大小為 4096 的原因: > Virtually all modern allocators allocate memory in fixed-size quanta that are chosen to minimize management overhead while at the same time offering good coverage at low slack. **For example, an allocator may choose blocks of doubling size (32, 64, 128, <t_co>, ...) up to 4096**, and then blocks of size multiples of a page up until 1MB, and then 512KB increments and so on. #### `computePushBackCapacity()` - factor 1.5 當` capacity()` 大於 4096 後,以 1.5 倍成長: > return (capacity() * 3 + 1) / 2; ```cpp size_type computePushBackCapacity() const { if (capacity() == 0) { return std::max(64 / sizeof(T), size_type(1)); } if (capacity() < folly::jemallocMinInPlaceExpandable / sizeof(T)) { return capacity() * 2; } if (capacity() > 4096 * 32 / sizeof(T)) { return capacity() * 2; } return (capacity() * 3 + 1) / 2; } ``` #### `xallocx()` - shrink `jemalloc` 提供的 `xallocx` : > The xallocx() function resizes the allocation at ptr in place to be at least size bytes, and returns the real size of the allocation. If extra is non-zero, an attempt is made to resize the allocation to be at least (size + extra) bytes, though inability to allocate the extra byte(s) will not by itself result in failure to resize. Behavior is undefined if size is 0, or if (size + extra > SIZE_T_MAX). #### emplace_back_aux() > folly::FBVector 的 `push_back` 實作也是此函式 利用 `jemalloc` 裡的 `xallocx` reallocate vector , 如果失敗了則使用 `allocate => memory copy => deallocate` cycle 。 ```cpp size_type byte_sz = folly::goodMallocSize(computePushBackCapacity() * sizeof(T)); if (usingStdAllocator && usingJEMalloc() && ((impl_.z_ - impl_.b_) * sizeof(T) >= folly::jemallocMinInPlaceExpandable)) { // Try to reserve in place. // Ask xallocx to allocate in place at least size()+1 and at most sz // space. // xallocx will allocate as much as possible within that range, which // is the best possible outcome: if sz space is available, take it all, // otherwise take as much as possible. If nothing is available, then // fail. // In this fashion, we never relocate if there is a possibility of // expanding in place, and we never reallocate by less than the desired // amount unless we cannot expand further. Hence we will not reallocate // sub-optimally twice in a row (modulo the blocking memory being freed). size_type lower = folly::goodMallocSize(sizeof(T) + size() * sizeof(T)); size_type upper = byte_sz; size_type extra = upper - lower; void* p = impl_.b_; size_t actual; if ((actual = xallocx(p, lower, extra, 0)) >= lower) { impl_.z_ = impl_.b_ + actual / sizeof(T); M_construct(impl_.e_, std::forward<Args>(args)...); ++impl_.e_; return; } } // reallocation failed ... ``` :::warning **延伸閱讀** * [stackoverflow - push_back vs emplace_back](https://stackoverflow.com/questions/4303513/push-back-vs-emplace-back) > For `emplace_back` constructor `A (int x_arg)` will be called. And for `push_back` `A (int x_arg)` is called first and move `A (A &&rhs)` is called afterwards. * [Proposed Wording for Placement Insert](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2642.pdf) ::: ### 限制 #### `realloc()` 關於 clib-base 的 realloc ,我們並不能如期知道裡面對於記憶體操作的實作機制是 in-place 或 alloc-memcpy-dealloc 。這也是為何先前 address 測試時,不符合預期的原因, #### FBVector - in place reallocation > But wait, there's more. Many memory allocators do not support in- place reallocation, although most of them could. **This comes from the now notorious design of `realloc()` to opaquely perform either in-place reallocation or an allocate-memcpy-deallocate cycle.** Such lack of control subsequently forced all clib-based allocator designs to avoid in-place reallocation, and that includes C++'s `new` and `std::allocator`. This is a major loss of efficiency because an in-place reallocation, being very cheap, may mean a much less aggressive growth strategy. In turn that means less slack memory and faster reallocations. 而關於此解, FBVector 是利用了自己的 allocator jemalloc 中的 xallocx 。 #### folly::Malloc - `smartRealloc()` ```cpp /** * This function tries to reallocate a buffer of which only the first * currentSize bytes are used. The problem with using realloc is that * if currentSize is relatively small _and_ if realloc decides it * needs to move the memory chunk to a new buffer, then realloc ends * up copying data that is not used. It's generally not a win to try * to hook in to realloc() behavior to avoid copies - at least in * jemalloc, realloc() almost always ends up doing a copy, because * there is little fragmentation / slack space to take advantage of. */ FOLLY_MALLOC_CHECKED_MALLOC FOLLY_NOINLINE inline void* smartRealloc( void* p, const size_t currentSize, const size_t currentCapacity, const size_t newCapacity) { assert(p); assert(currentSize <= currentCapacity && currentCapacity < newCapacity); auto const slack = currentCapacity - currentSize; if (slack * 2 > currentSize) { // Too much slack, malloc-copy-free cycle: auto const result = checkedMalloc(newCapacity); std::memcpy(result, p, currentSize); free(p); return result; } // If there's not too much slack, we realloc in hope of coalescing return checkedRealloc(p, newCapacity); } /////////////////////////////////////////////////////////////////////////////// inline void* checkedRealloc(void* ptr, size_t size) { void* p = realloc(ptr, size); if (!p) { throw_exception<std::bad_alloc>(); } return p; } ``` #### capacity - $factor^{capactiy}$ 計算 此 vector 是以 6 bits 儲存 capacity ,因此其能所表達的數值範圍就會比較小。 * [power of 1.5 儲存空間小於 power of 2](https://hackmd.io/@linD026/linux2021-quiz5#reallocate-address-%E6%B8%AC%E8%A9%A6) * FBVector 是直接儲存數值,而**非儲存指數**。因此在計算 $1.5^n$ 倍數時,可以以 `(capacity() * 3 + 1) / 2;` 方式增長。 ### ceilf 改進 `ceilf` 是求數值的 upper bound integer ,而我以用 `modff` 直接去的浮點數的整數型態來節省成本。 ```cpp #define FACTOR 1.5 #define CHUNK_SIZE 4 static inline float ilog_factor(float n) /* log1.5(n) = log2(n)/log2(1.5)*/ { return ceilf(log2f(n) / log2f(FACTOR)); } static inline float modff_factor(float n) { float integer; modff((log2f(n) / log2f(FACTOR)), &integer); return integer + 1; } ``` ![](https://imgur.com/DnysxLG.png) ![](https://imgur.com/le4O59z.png) :::info **function** [folly::Malloc - oodMallocSize](https://github.com/facebook/folly/blob/bd600cd4e88f664f285489c76b6ad835d8367cd2/folly/memory/Malloc.h#L191) [folly::Malloc - jemallocMinInPlaceExpandable](https://github.com/facebook/folly/blob/bd600cd4e88f664f285489c76b6ad835d8367cd2/folly/memory/Malloc.h#L211) [folly::Malloc - smartRealloc](https://github.com/facebook/folly/blob/bd600cd4e88f664f285489c76b6ad835d8367cd2/folly/memory/Malloc.h#L259) [folly::FBVector - in place](https://github.com/facebook/folly/blob/c5651a004dcbb996f0d033d202fade8c12aeb123/folly/FBVector.h#L968) [folly::FBVector - computePushBackCapacity](https://github.com/facebook/folly/blob/c5651a004dcbb996f0d033d202fade8c12aeb123/folly/FBVector.h#L1151) [folly::FBVector - emplace_back_aux](https://github.com/facebook/folly/blob/c5651a004dcbb996f0d033d202fade8c12aeb123/folly/FBVector.h#L1165) [folly::FBVector - shrink_to_fit](https://github.com/facebook/folly/blob/c5651a004dcbb996f0d033d202fade8c12aeb123/folly/FBVector.h#L962) **log2f** [Fast log2(float x) implementation C++](https://stackoverflow.com/questions/9411823/fast-log2float-x-implementation-c) [Fastest implementation of log2(int) and log2(float)](https://stackoverflow.com/questions/15967240/fastest-implementation-of-log2int-and-log2float) ::: ## 3 vector - insert 和 erase ### insert 原先是打算使用 `stdarg.h` 的不定變數來存取資料,但在思考如何從 `arg_list` 型態的 `args` 變數利用 `va_arg` 函式提取資料時發現,我不知道型態是什麼。 起初我是打算利用已知資料大小來設個 `char type_[elemsize]`; 或是 `typedef struct {char dummy; char tmp[elemsize - 1];} type_;` 來儲存資料。但兩者型態都不能夠運用在 `va_arg` 或宣告函式當中。 > 此為 `struct` error ,另一個 `char` 型態忘了紀錄,但我記得是不能用在 `va_arg` 函式上的錯誤。 ```cpp vector.c: In function ‘__vec_insert’: vector.c:121:24: error: variable-sized object may not be initialized 121 | struct type_ e = va_arg(args, struct type_); | ``` 後來查了一下,發現網路上對此的相關的說明甚少,目前比較與此問題較相關為此篇 stackoverflow 文章: [stackoverflow - va_list with unknown type names - only the byte size is known!](https://stackoverflow.com/questions/3213350/va-list-with-unknown-type-names-only-the-byte-size-is-known) 但仍舊沒有也解決方案,因此最後我改變儲存形式以 array 型態傳入。 在設計 insert 函式時有須注意: * 不定變數的傳遞 * 不定變數的個數對記憶體大小的影響,如 `insert` 過後是否在 `stack` 或 `heap` * 記憶體讀取是以 `char` ( 1 byte ) 為單位乘以 `elemsize` ```cpp #define VAR_ARG_COUNT(v, ...)\ (sizeof((__typeof__(v.buf[0])[]){0, __VA_ARGS__}) /sizeof(v.buf[0]) - 1) #define vec_insert(v, pos, ...)\ __vec_insert(&v, pos, vec_elemsize(v), vec_capacity(v), VAR_ARG_COUNT(v, __VA_ARGS__), &(__typeof__(v.buf[0])[VAR_ARG_COUNT(v, __VA_ARGS__)]){__VA_ARGS__}) ``` ```cpp NON_NULL void __vec_insert(void *restrict vec, size_t pos, size_t elemsize, size_t capacity, size_t arg_count, ...) { union { STRUCT_BODY(char); struct { size_t filler; char buf[]; }; } *v = vec; if (pos >= 0 && pos <= v->size) { va_list args; va_start(args, arg_count); char *e_list = (char *)va_arg(args, void *restrict); va_end(args); while (v->size + arg_count >= capacity) { if (!v->on_heap) { v->capacity = 1; void *tmp = malloc(elemsize * (size_t)(CHUNK_SIZE * pow(FACTOR, (v->capacity)))); memcpy(tmp, v->buf, elemsize * v->size); v->ptr = tmp; v->on_heap = 1; } else { v->ptr = realloc(v->ptr, elemsize * CHUNK_SIZE * pow(FACTOR, (++v->capacity))); } capacity = (size_t) (CHUNK_SIZE * pow(FACTOR, v->capacity)); } char *ptr = NULL; if (v->on_heap) { if (v->size + arg_count >= capacity) v->ptr = realloc(v->ptr, elemsize * CHUNK_SIZE * pow(FACTOR, (++v->capacity))); ptr = v->ptr; } else ptr = v->buf; size_t after_pos = pos; char *after_pos_ptr = (char *) &ptr[after_pos * elemsize]; void *after_pos_buf = malloc(elemsize * (size_t)(v->size - pos)); memcpy(after_pos_buf, after_pos_ptr, elemsize * (size_t)(v->size - pos)); for (int i = 0;i < arg_count;i++) { memcpy(&ptr[after_pos++ * elemsize], &e_list[i * elemsize], elemsize); } memcpy(&ptr[after_pos * elemsize], after_pos_buf, elemsize * ((size_t)v->size - pos)); free(after_pos_buf); v->size = v->size + arg_count; } } ``` ```cpp v(float, 4, test, 1.0, 2.0, 3.0, 4); vec_insert(test, 3, 5, 6, 7, 8, 9, 10, 11, 12); display(test); ------------------------------------------------------------------------------- $ make test ./m_vector 1.00 2.00 3.00 5.00 6.00 7.00 8.00 9.00 10.00 11.00 12.00 4.00 heap ``` 可以從程式碼得知此操作的成本極高,有多個 memory copy 與 allocate 操作,但目前還沒找到比較好的實作手段。 比較可能實行的方法大概是改變儲存結構,使其為 linked list 的方式。 ### erase 利用不定變數達到 function overloading 的部份概念(因為是利用 parameter 數量來判斷,沒辦法達成不同型態但同一個數量的 function overloading )。 當傳入數值為 1 個時, erase 那個位置即可,而若輸入兩個變數則會是範圍 erase 。 ```cpp #define vec_erase(v, ...)\ __vec_erase(&v, vec_elemsize(v), vec_capacity(v), VAR_ARG_COUNT(v, __VA_ARGS__), __VA_ARGS__); ``` ```cpp NON_NULL void __vec_erase(void *restrict vec, size_t elemsize, size_t capacity, size_t arg_count, ...) { union { STRUCT_BODY(char); struct { size_t filler; char buf[]; }; } *v = vec; va_list args; va_start(args, arg_count); char *ptr = NULL; if (v->on_heap) ptr = v->ptr; else ptr = v->buf; if (arg_count > 1) { size_t start = va_arg(args, size_t); size_t end = va_arg(args, size_t); if (start >= 0 && end <= v->size && v->size - (end - start) > 0) { memmove(&ptr[elemsize * start], &ptr[elemsize * (end)], elemsize * ((size_t)(v->size - (end - start)))); v->size = v->size - (end - start); } } else { size_t pos = va_arg(args, size_t); if (pos >= 0 && pos <= v->size && v->size > 0) { memmove(&ptr[elemsize * pos], &ptr[elemsize * (pos + 1)], elemsize * ((size_t)(v->size - pos))); v->size--; } } va_end(args); } ``` ```cpp v(float, 4, test, 1.0, 2.0, 3.0, 4); vec_insert(test, 3, 5, 6, 7, 8, 9, 10, 11, 12); display(test); vec_erase(test, 0, 4); display(test); vec_erase(test, 7); display(test); vec_erase(test, 11); display(test); ------------------------------------------------------------------------------- $ make test ./m_vector 1.00 2.00 3.00 5.00 6.00 7.00 8.00 9.00 10.00 11.00 12.00 4.00 heap 6.00 7.00 8.00 9.00 10.00 11.00 12.00 4.00 heap 6.00 7.00 8.00 9.00 10.00 11.00 12.00 heap 6.00 7.00 8.00 9.00 10.00 11.00 12.00 heap ``` :::info **Reference** * [cppreference.com - std::vector<T,Allocator>::insert](https://en.cppreference.com/w/cpp/container/vector/insert) * [stackoverflow - va_list with unknown type names - only the byte size is known!](https://stackoverflow.com/questions/3213350/va-list-with-unknown-type-names-only-the-byte-size-is-known) * [cppreference.com - std::vector<T,Allocator>::erase](https://en.cppreference.com/w/cpp/container/vector/erase) ::: --- ## 測驗 2 ## 1 解釋程式碼 ### coroutine * struct ```cpp struct cr { void *label; int status; void *local; /* private local storage */ }; ``` * label ```cpp /* Helper macros to generate unique labels */ #define __cr_line3(name, line) _cr_##name##line #define __cr_line2(name, line) __cr_line3(name, line) #define __cr_line(name) __cr_line2(name, __LINE__) struct cr { void *label; int status; void *local; /* private local storage */ }; #define cr_label(o, stat) \ do { \ (o)->status = (stat); \ __cr_line(label) : (o)->label = &&__cr_line(label); \ } while (0) ``` * action from label ```cpp #define cr_begin(o) \ do { \ if ((o)->status == CR_FINISHED) \ return; \ if ((o)->label) \ goto *(o)->label; \ } while (0) #define cr_end(o) cr_label(o, CR_FINISHED) #define cr_wait(o, cond) \ do { \ cr_label(o, CR_BLOCKED); \ if (!(cond)) \ return; \ } while (0) #define cr_exit(o, stat) \ do { \ cr_label(o, stat); \ return; \ } while (0) ``` ### system call ```cpp #define cr_sys(o, call) \ cr_wait(o, (errno = 0) || !(((call) == -1) && \ (errno == EAGAIN || errno == EWOULDBLOCK || \ errno == EINPROGRESS || errno == EINTR))) ``` ### queue ```cpp #define cr_queue(T, size) \ struct { \ T buf[size]; \ size_t r, w; \ } #define cr_queue_init() \ { \ .r = 0, .w = 0 \ } #define cr_queue_len(q) (sizeof((q)->buf) / sizeof((q)->buf[0])) #define cr_queue_cap(q) ((q)->w - (q)->r) #define cr_queue_empty(q) ((q)->w == (q)->r) #define cr_queue_full(q) (cr_queue_cap(q) == cr_queue_len(q)) #define cr_queue_push(q, el) \ (!cr_queue_full(q) && ((q)->buf[(q)->w++ % cr_queue_len(q)] = (el), 1)) #define cr_queue_pop(q) \ (cr_queue_empty(q) ? NULL : &(q)->buf[(q)->r++ % cr_queue_len(q)]) ``` ### base network function ```cpp static int nonblock(int fd) { int flags = fcntl(fd, F_GETFL, 0); if (flags == -1) return -1; return fcntl(fd, F_SETFL, flags | O_NONBLOCK); } ``` * main func ```cpp char *host = argv[1]; int port = atoi(argv[2]); int fd = socket(AF_INET, SOCK_STREAM, 0); ``` ```cpp struct sockaddr_in addr = { .sin_family = AF_INET, .sin_addr = { .s_addr = inet_addr(host), }, .sin_port = htons(port), }; connect(fd, (struct sockaddr *) &addr, sizeof(struct sockaddr_in)); ``` ```cpp select(fd + 1, &fds, NULL, NULL, NULL); ``` ### AP function ```cpp static void socket_read_loop(struct cr *o, int fd) { static uint8_t b; static int r; cr_begin(o); for (;;) { cr_sys(o, r = recv(fd, &b, 1, 0)); if (r == 0) cr_exit(o, 1); cr_sys(o, write(STDOUT_FILENO, &b, 1)); } cr_end(o); } ``` ```cpp static void socket_write_loop(struct cr *o, int fd, byte_queue_t *in) { static uint8_t *b; cr_begin(o); for (;;) { cr_wait(o, !cr_queue_empty(in)); b = cr_queue_pop(in); cr_sys(o, send(fd, b, 1, 0)); } cr_end(o); } ``` ```cpp static void stdin_loop(struct cr *o, byte_queue_t *out) { static uint8_t b; static int r; cr_begin(o); for (;;) { cr_sys(o, r = read(STDIN_FILENO, &b, 1)); if (r == 0) { cr_wait(o, cr_queue_empty(out)); cr_exit(o, 1); } cr_wait(o, !cr_queue_full(out)); cr_queue_push(out, b); } cr_end(o); } ``` ## 2 限制與改進 ### Coroutine in C 原先對於 coroutine 相關操作如 `struct cr` 結構等,都是需要使用者自己加入至 function 當中。 這很容易會因為太多參數需要顧慮,造成有可能出錯。 因此我以用 [EV3 - Coroutines in C](https://in4lio.github.io/ev3dev-c/group__coro.html#ga0fc518855e8bfbbd08507c5168b5179f) 的形式改進。使得在撰寫函式時,不需要顧忌 coroutine 的相關參數等。 而且,在 [EV3 - Coroutines in C](https://in4lio.github.io/ev3dev-c/group__coro.html#ga0fc518855e8bfbbd08507c5168b5179f) 裡,他的函式宣告: ```cpp #define CORO_DEFINE(name) int coro_##name( co_t *co_p ) ``` **無法自設函式參數**,而在這之上我利用不定變數做出能夠改進。 ```cpp #define cr_function(func_name, ...) \ void cr_##func_name(struct cr *o, __VA_ARGS__) #define cr_call(func_name, o, ...) cr_##func_name(&(o), __VA_ARGS__) ``` :::info **完整程式碼** * [GitHub - linD026 - `coroutine.h`](https://github.com/linD026/linux2021_homework_quiz5_network_connect/blob/main/coroutine.h) ::: #### test coroutine ```cpp #include <stdio.h> #include "coroutine.h" // void test(struct cr*o, int state, int par1) cr_function(test, int state, int par1) { cr_begin(); //cr_wait(cond, final) cr_wait((state % 2 == 1), cr_restart()); printf("0 %d %d\n", state, par1); //cr_end(final) cr_end(cr_restart()); } cr_function(test1, int state, int par1, int par2) { cr_begin(); cr_wait((state % 2 == 0), cr_restart()); printf("1 %d %d %d\n", state, par1, par2); cr_end(cr_restart()); } int main(void) { struct cr cor1 = cr_init(); struct cr cor2 = cr_init(); for (int i = 1;i <= 10;i++) { //test(cor1, i, i) cr_call(test, cor1, i, i); cr_call(test1, cor2, i, i, i); } } ``` ```shell % make test ./test_coroutine 0 1 1 1 2 2 2 0 3 3 1 4 4 4 0 5 5 1 6 6 6 0 7 7 1 8 8 8 0 9 9 1 10 10 10 ``` :::info **Reference** * [EV3 - Coroutines in C](https://in4lio.github.io/ev3dev-c/group__coro.html#ga0fc518855e8bfbbd08507c5168b5179f) ::: ## EV3 子計畫 Coroutines in C - yield 和 restart ```cpp #define cr_restart(final) \ do { \ final; \ (o)->label = NULL; \ (o)->status = CR_BLOCKED; \ return; \ } while (0) #define cr_yield(final) \ do { \ cr_label(CR_BLOCKED); \ final; \ return; \ } while (0) ``` :::info **Reference** * [EV3 - Coroutines in C](https://in4lio.github.io/ev3dev-c/group__coro.html#ga0fc518855e8bfbbd08507c5168b5179f) :::