contributed by < linD026
>
linux2021
STRUCT_BODY
NEXT_POWER_OF_2
__builtin_clzl
找 the number of leading 0-bits in x
後,在與 64 相減得到下一個 。__builtin_clzl
回傳的是 unsigned long
型態。而一般 uint64_t
為 typedef unsigned long uint64_t
。
Built-in Function:
int builtin_clz (unsigned int x)
Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined.
Built-in Function:int builtin_clzl (unsigned long)
Similar to__builtin_clz
, except the argument type is unsigned long.
nonnull attribute
nonnull (arg-index, ...)
The nonnull attribute specifies that some function parameters should be non-null pointers.
v
t
為儲存資料的型態,s
為初始化時儲存資料的數量, name
此 vector 結構的名稱。...
為不定變數。_Static_assert
判斷 s
是否過大。
cleanup (cleanup_function)
The cleanup attribute runs a function when the variable goes out of scope. This attribute can only be applied to auto function scope variables; it may not be applied to parameters or variables with static storage duration. The function must take one parameter, a pointer to a type compatible with the variable. The return value of the function (if any) is ignored.If
-fexceptions
is enabled, thencleanup_function
is run during the stack unwinding that happens during the processing of the exception. Note that the cleanup attribute does not allow the exception to be caught, only to perform an action. It is undefined what happens ifcleanup_function
does not return normally.
gcc 提供了 cleanup attribute ,讓當變數超出 scope 時,自動跑 cleanup 裡的函式。
須注意這不能應用在 parameters or variables 是靜態儲存的情況。
This kind of macro is called variadic. When the macro is invoked, all the tokens in its argument list after the last named argument (this macro has none), including any commas, become the variable argument. This sequence of tokens replaces the identifier
__VA_ARGS__
in the macro body wherever it appears.
例子:
… if you left the variable argument empty, you would have gotten a syntax error, because there would have been an extra comma after the format string.
This formulation looks more descriptive, but historically it was less flexible: you had to supply at least one argument after the format string. In standard C
, you could not omit the comma separating the named argument from the variable arguments.
(Note that this restriction has been lifted in C++2a
, and never existed in GNU C
; see below.)
__VA_OPT__
function macro. This macro may only appear in the definition of a variadic macro. If the variable argument has any tokens, then a __VA_OPT__
invocation expands to its argument; but if the variable argument does not have any tokens, the __VA_OPT__
expands to nothing:
__VA_OPT__
is also available in GNU C and GNU C++.
__VA_OPT__
, this extension remains supported in GNU CPP, for backward compatibility. If you write
Reference
先對 vector 進行 push back ,測量對於記憶體的分配是否符合預期。
可以看到記憶體分配從 stack 到 heap 的轉換並沒有達到預想的原始大小 2 倍的分配。
在進一步顯示 capacity 的大小在從 stack 轉換至 heap 時,是如何改變的:
可以看出這是因為在 __vec_push_back
函式中,對於 stack to heap 的操作, capacity 是以 capacity(v) + 1 狀態更新。
因此次狀況會是:
capacity =
雖然不影響操作結果,但對於記憶體增長的曲線較不符合預期,因此如果不想遽增可以改成:
在 FBvector.md 中提及對於記憶體增長的倍數是 power of 2 的弊端,是因為先前的配置過得記憶體大小必定小於增長過後的大小,因此這不會使 vector 重新使用之前的記憶體。
導致最後必定會無法使用 deallocated chunks 。
The initial HP implementation by Stepanov used a growth factor of 2; i.e., whenever you'd push_back into a vector without there being room, it would double the current capacity. This was not a good choice: it can be mathematically proven that a growth factor of 2 is rigorously the worst possible because it never allows the vector to reuse any of its previously-allocated memory.
1 + 2^^1 + 2^^2 + 2^^3... + 2^^n = 2^^(n+1) - 1
This means that any new chunk requested will be larger than all previously used chunks combined, so the vector must crawl forward in memory; it can't move back to its deallocated chunks.
而在後續也有提及任何小於 2 的數字可以保正在某個 reallocation 次數之後,可以重新使用原先記憶體。
But any number smaller than 2 guarantees that you'll at some point be able to reuse the previous chunks. For instance, choosing 1.5 as the factor allows memory reuse after 4 reallocations; 1.45 allows memory reuse after 3 reallocations; and 1.3 allows reuse after only 2 reallocations.
當然這是在假設 allocator 會儘量使用先前記憶體的情形,但這不影響我們避免最糟的選擇。
Of course, the above makes a number of simplifying assumptions about how the memory allocator works, but definitely you don't want to choose the theoretically absolute worst growth factor. fbvector uses a growth factor of 1.5. That does not impede good performance at small sizes because of the way fbvector cooperates with jemalloc (below).
__vec_push_back
__vec_reserve
從下方結果顯示出實際情況並不完全如 FBVector 所假設。
這原因可以從此看得出原因。
如果在細部探討 allocator 對於 memory 的操作,可以看到 vector 對於記憶體分配:
從上述式子得知,記憶體是以倍數增長,因此會有多餘的記憶體。
As discussed above,
std::vector
also needs to (re)allocate in quanta. The next quantum is usually defined in terms of the current size times the infamous growth constant. Because of this setup,std::vector
has some slack memory at the end much like an allocated block has some slack memory at the end.
因此 fbvector
正是藉由 jemalloc 用以自動檢測記憶體大小並且 reallocation
的方式避免。
It doesn't take a rocket surgeon to figure out that an allocator- aware
std::vector
would be a marriage made in heaven: the vector could directly request blocks of "perfect" size from the allocator so there would be virtually no slack in the allocator. Also, the entire growth strategy could be adjusted to work perfectly with allocator's own block growth strategy. That's exactly whatfbvector
does - it automatically detects the use of jemalloc and adjusts its reallocation strategy accordingly.
以下為 folly 對此操作的程式碼:
goodMallocSize()
jemallocMinInPlaceExpandable
大小為 4096 的原因:
Virtually all modern allocators allocate memory in fixed-size quanta that are chosen to minimize management overhead while at the same time offering good coverage at low slack. For example, an allocator may choose blocks of doubling size (32, 64, 128, <t_co>, …) up to 4096, and then blocks of size multiples of a page up until 1MB, and then 512KB increments and so on.
computePushBackCapacity()
- factor 1.5當 capacity()
大於 4096 後,以 1.5 倍成長:
return (capacity() * 3 + 1) / 2;
xallocx()
- shrinkjemalloc
提供的 xallocx
:
The xallocx() function resizes the allocation at ptr in place to be at least size bytes, and returns the real size of the allocation. If extra is non-zero, an attempt is made to resize the allocation to be at least (size + extra) bytes, though inability to allocate the extra byte(s) will not by itself result in failure to resize. Behavior is undefined if size is 0, or if (size + extra > SIZE_T_MAX).
folly::FBVector 的
push_back
實作也是此函式
利用 jemalloc
裡的 xallocx
reallocate vector ,
如果失敗了則使用 allocate => memory copy => deallocate
cycle 。
延伸閱讀
For
emplace_back
constructorA (int x_arg)
will be called. And forpush_back
A (int x_arg)
is called first and moveA (A &&rhs)
is called afterwards.
realloc()
關於 clib-base 的 realloc ,我們並不能如期知道裡面對於記憶體操作的實作機制是 in-place 或 alloc-memcpy-dealloc 。這也是為何先前 address 測試時,不符合預期的原因,
But wait, there's more. Many memory allocators do not support in- place reallocation, although most of them could. This comes from the now notorious design of
realloc()
to opaquely perform either in-place reallocation or an allocate-memcpy-deallocate cycle. Such lack of control subsequently forced all clib-based allocator designs to avoid in-place reallocation, and that includes C++'snew
andstd::allocator
. This is a major loss of efficiency because an in-place reallocation, being very cheap, may mean a much less aggressive growth strategy. In turn that means less slack memory and faster reallocations.
而關於此解, FBVector 是利用了自己的 allocator jemalloc 中的 xallocx 。
smartRealloc()
此 vector 是以 6 bits 儲存 capacity ,因此其能所表達的數值範圍就會比較小。
(capacity() * 3 + 1) / 2;
方式增長。ceilf
是求數值的 upper bound integer ,而我以用 modff
直接去的浮點數的整數型態來節省成本。
function
folly::Malloc - oodMallocSize
folly::Malloc - jemallocMinInPlaceExpandable
folly::Malloc - smartRealloc
folly::FBVector - in place
folly::FBVector - computePushBackCapacity
folly::FBVector - emplace_back_aux
folly::FBVector - shrink_to_fit
log2f
Fast log2(float x) implementation C++
Fastest implementation of log2(int) and log2(float)
原先是打算使用 stdarg.h
的不定變數來存取資料,但在思考如何從 arg_list
型態的 args
變數利用 va_arg
函式提取資料時發現,我不知道型態是什麼。
起初我是打算利用已知資料大小來設個 char type_[elemsize]
; 或是 typedef struct {char dummy; char tmp[elemsize - 1];} type_;
來儲存資料。但兩者型態都不能夠運用在 va_arg
或宣告函式當中。
此為
struct
error ,另一個char
型態忘了紀錄,但我記得是不能用在va_arg
函式上的錯誤。
後來查了一下,發現網路上對此的相關的說明甚少,目前比較與此問題較相關為此篇 stackoverflow 文章:
stackoverflow - va_list with unknown type names - only the byte size is known!
但仍舊沒有也解決方案,因此最後我改變儲存形式以 array 型態傳入。
在設計 insert 函式時有須注意:
insert
過後是否在 stack
或 heap
char
( 1 byte ) 為單位乘以 elemsize
可以從程式碼得知此操作的成本極高,有多個 memory copy 與 allocate 操作,但目前還沒找到比較好的實作手段。
比較可能實行的方法大概是改變儲存結構,使其為 linked list 的方式。
利用不定變數達到 function overloading 的部份概念(因為是利用 parameter 數量來判斷,沒辦法達成不同型態但同一個數量的 function overloading )。
當傳入數值為 1 個時, erase 那個位置即可,而若輸入兩個變數則會是範圍 erase 。
Reference
原先對於 coroutine 相關操作如 struct cr
結構等,都是需要使用者自己加入至 function 當中。
這很容易會因為太多參數需要顧慮,造成有可能出錯。
因此我以用 EV3 - Coroutines in C 的形式改進。使得在撰寫函式時,不需要顧忌 coroutine 的相關參數等。
而且,在 EV3 - Coroutines in C 裡,他的函式宣告:
無法自設函式參數,而在這之上我利用不定變數做出能夠改進。
Reference
Reference