Dynamic Array Growth Factor

# Dynamic Array Growth Factor ## Amortized Analysis ### Increase by a constant factor 2 先以上課講到的兩倍作為例子 | ith | Array | Cost (append + copy) | |---|---|---| | 0 | \[ ] | 0 | | 1 | \[a] | 1 | | 2 | \[a, b] | 1 + 1 = 2 | | 3 | \[a, b, c, \_] | 1 + 2 = 3 | | 4 | \[a, b, c, d] | 1 | | 5 | \[a, b, c, d, e, \_, \_, \_] | 1 + 4 = 5 | 觀察一下即可發現$c_i$的關係式 $$c_i=\begin{cases}i,&\text{if }i-1=2^k.\\1,&\text{otherwise.}\end{cases}$$ 接著要算出total cost，把cost部分分成兩部分看，即可列出下面式子 $$\begin{aligned} \sum_{i=1}^nc_i& =\sum_{i=0}^{\lfloor\log_2\left(n{-}1\right)\rfloor }2^i+\sum_{i=1}^n1 \\ &=2^{\lfloor\log_2\left(n{-}1\right)\rfloor+1}-1+n \\&=2\cdot2^{\lfloor\log_2\left(n{-}1\right)\rfloor}-1+n\\ &\leq 3n \end{aligned}$$ 所以可得出n個append operation的total cost最多為$3n$，, 而對於每個單一append操作的amortized cost最多為3，即O(1) 討論完2倍，接著來探討看看如果每次申請新的空間是以加k大小或k倍的情況 ### Increase by a constant k $\begin{aligned}\text{Total cost} &= n +(1)+(1+k)+(1+2k)+(1+3k) +\space ...\space+(1+(\lceil\frac{n-1}{k}\rceil-1)k) \\&=n + \lceil\frac{n-1}{k}\rceil+\frac{\lceil\frac{n-1}{k}\rceil(\lceil\frac{n-1}{k}\rceil-1)}{2}k\\&=O(n^2) \end{aligned}$ 所以對於每個單一append操作的amortized cost為$O(n)$ ### Increase by a constant factor k $$\begin{aligned} \sum_{i=1}^nc_i& =\sum_{i=0}^{\lfloor\log_k\left(n{-}1\right)\rfloor }k^i+\sum_{i=1}^n1 \\ &=\frac{k^{\lfloor\log_k\left(n{-}1\right)\rfloor+1}-1}{k-1}+n \\ &=\frac{k\cdot k^{\lfloor\log_k\left(n{-}1\right)\rfloor}-1+nk-n}{k-1}\\ &\leq\frac{nk-1+nk-n}{k-1}\\ &\leq\frac{2nk-1}{k-1}\\ &=O(n) \end{aligned}$$ 所以對於每個單一append操作的amortized cost為$O(1)$ ### 結論 & 問題從上面的推算可得出和Increase by a constant k相比，Increase by a constant factor k 時會比較好，但接著又衍生一個問題，那constant factor有沒有什麼比較好的選擇? 從時間複雜度來看，做n次append操作的複雜度皆為O(n)，不論k多大對於每個單一append操作的amortized cost皆為O(1)，而k只要不要差太多彼此的常數可能只會差一些些 ## Constant factor choice 從memory角度來看，我們會希望在經過N次resize後，可以重複利用前N-1次resize釋放的空間 **以2倍來說:** 1. 先假設現在dynamic array已經被分配16bytes(4個int)的memory 2. 接著要放入第5個int，則申請32bytes的memory，然後釋放16bytes memory，此時有16bytes的holes在memory 3. 接著要放入第9個int，則申請64bytes的memory，然後釋放32bytes memory，此時有48bytes的holes在memory 4. 接著要放入第17個int，則申請128bytes的memory，然後釋放64bytes memory，此時有112bytes的holes在memory **以1.5倍來說:** 1. 先假設現在dynamic array已經被分配16bytes(4個int)的memory 2. 接著要放入第5個int，則申請24bytes的memory，然後釋放16bytes memory，此時有16bytes的holes在memory 3. 接著要放入第7個int，則申請36bytes的memory，然後釋放24bytes memory，此時有40bytes的holes在memory 4. 接著要放入第10個int，則申請52bytes的memory，然後釋放36bytes memory，此時有76bytes的holes在memory 5. 接著要放入第14個int，則申請76bytes的memory，然後釋放52bytes memory，可以發現**76小於等於前面的hole大小76**，所以申請的空間可以來自前面的hole，即**可以重複利用先前釋放空間** 註:這邊申請的bytes是依據現在dynamic array的size決定，而非bytes，否則以int來說36+36/2會等於54，但54bytes的空間無法剛好全部放入int型態的資料註:這邊一開始分配的空間可以想成是C++ vector的reserve()，然後放入4個int 稍微觀察一下可以發現當倍數>=2時，新申請的空間都會大於之前的hole總和，即無法利用先前釋放的空間既然這樣那我們應該可以得出合理倍數應該是1<k<2，那為什麼有些情況是使用2的倍數? 這邊以linux系統下g++的C\++為例，使用2的倍數是因為Linux的Buddy system，而Buddy system要求memory的分配必須是2的冪次，所以在linux系統下會是兩倍先撇除2倍，那接著，我們已經把範圍縮到1<k<2，而這範圍的時間複雜度也不會相差太多，那接下來有沒有辦法從1<k<2中找出最佳解? ### 最佳解主要目標是讓可以reuse的空間越大越好這邊先假設constant factor是$x$ 一開始分配的memory大小為$1$，接著新分配的大小為$x$，接著$x^2$，然後要分配$x^3$的空間時會希望他可以重複利用先前$1$和$x$釋放的空間然後我們可以得到 $1 + x = x^3$ $x \approx 1.32$ 前面的先不重複利用，而接著分配$x^4$的新空間 $1 + x + x^2 = x^4$ $x \approx 1.47$ 歸納一下 $1+x+\cdots+x^{n-2}=x^n$ $\frac{x^{n-1}-1}{x-1} = x^n$ $x^{n-1}-1 = x^{n+1}-x^n$ 當n很大時，1可忽略 $x^{n-1} = x^{n+1}-x^n$ $x^2-x-1=0$ $x \approx 1.618$ 我們即可得出最佳倍數應該為$1.618$，而$1.618$同時也是黃金比例那為什麼有些Compiler在處理dynamic array時不是以$1.618$這個倍數進行擴張? 首先$1.618$是經由一直先不重複利用前面釋放的memory為前提且N為無限大進行計算，所以實際上正常來說不會使用$1.618$這個倍數，否則永遠都無法用到之前釋放的memory，就實際上來講倍數通常會建議稍微小於這個數字。以Windows的MSVC為例，C++的dynamic array的擴張倍數會是$1.5$倍，計算$1.5$這個數字相對$1.618$可以較容易計算(例如經由bit shift操作計算)，而$1.5$也和$1.618$較為接近 ![image](https://hackmd.io/_uploads/rJ_XRy2-R.png =300x290) 不同programming language和Compiler，constant factor的範圍都在$1< k\leq2$，從表格可以看得出來Python和Go比較獨樹一格(# 這邊提一下，C\++如果是Windows但是是使用MinGW編譯的話，constant factor會和linux的gcc或g++一樣是2 ## C++ ### Linux 作業系統:`Ubuntu 22.04 LTS` gcc版本:`11.4.0` g++版本:`11.4.0` C++版本:`11` 以下code如果是使用linux的話應該可以在`/usr/include/c++/<version>/`找到題外話:GNU的coding style看起來好詭異(# `stl_vector.h` ```c= void push_back (const value_type &__x) { if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage) { _GLIBCXX_ASAN_ANNOTATE_GROW (1); _Alloc_traits::construct (this->_M_impl, this->_M_impl._M_finish, __x); ++this->_M_impl._M_finish; _GLIBCXX_ASAN_ANNOTATE_GREW (1); } else _M_realloc_insert (end (), __x); } ``` 可以發現滿的時候如果要`push_back`會調用`_M_realloc_insert (end (), __x)` `vector.tcc` `_M_realloc_insert` ```c= #if __cplusplus >= 201103L template<typename _Tp, typename _Alloc> template<typename... _Args> void vector<_Tp, _Alloc>:: _M_realloc_insert(iterator __position, _Args&&... __args) #else template<typename _Tp, typename _Alloc> void vector<_Tp, _Alloc>:: _M_realloc_insert(iterator __position, const _Tp& __x) #endif { const size_type __len = _M_check_len(size_type(1), "vector::_M_realloc_insert"); pointer __old_start = this->_M_impl._M_start; pointer __old_finish = this->_M_impl._M_finish; const size_type __elems_before = __position - begin(); pointer __new_start(this->_M_allocate(__len)); pointer __new_finish(__new_start); __try { // The order of the three operations is dictated by the C++11 // case, where the moves could alter a new element belonging // to the existing vector. This is an issue only for callers // taking the element by lvalue ref (see last bullet of C++11 // [res.on.arguments]). _Alloc_traits::construct(this->_M_impl, __new_start + __elems_before, #if __cplusplus >= 201103L std::forward<_Args>(__args)...); #else __x); #endif __new_finish = pointer(); #if __cplusplus >= 201103L if _GLIBCXX17_CONSTEXPR (_S_use_relocate()) { __new_finish = _S_relocate(__old_start, __position.base(), __new_start, _M_get_Tp_allocator()); ++__new_finish; __new_finish = _S_relocate(__position.base(), __old_finish, __new_finish, _M_get_Tp_allocator()); } else #endif { __new_finish = std::__uninitialized_move_if_noexcept_a (__old_start, __position.base(), __new_start, _M_get_Tp_allocator()); ++__new_finish; __new_finish = std::__uninitialized_move_if_noexcept_a (__position.base(), __old_finish, __new_finish, _M_get_Tp_allocator()); } } __catch(...) { if (!__new_finish) _Alloc_traits::destroy(this->_M_impl, __new_start + __elems_before); else std::_Destroy(__new_start, __new_finish, _M_get_Tp_allocator()); _M_deallocate(__new_start, __len); __throw_exception_again; } #if __cplusplus >= 201103L if _GLIBCXX17_CONSTEXPR (!_S_use_relocate()) #endif std::_Destroy(__old_start, __old_finish, _M_get_Tp_allocator()); _GLIBCXX_ASAN_ANNOTATE_REINIT; _M_deallocate(__old_start, this->_M_impl._M_end_of_storage - __old_start); this->_M_impl._M_start = __new_start; this->_M_impl._M_finish = __new_finish; this->_M_impl._M_end_of_storage = __new_start + __len; } ``` 簡單來說就是會申請一塊新的__len大小的memory，然後將原本資料複製新的memory中，再將原本資料的memory deallocate 可以發現跟size有關的是14行的 `const size_type __len = _M_check_len(size_type(1), "vector::_M_realloc_insert")` 所以繼續trace看`_M_check_len`如何實現 `vector.tcc` `_M_check_len` ```c= size_type _M_check_len (size_type __n, const char *__s) const { if (max_size () - size () < __n) __throw_length_error (__N (__s)); const size_type __len = size () + (std::max) (size (), __n); return (__len < size () || __len > max_size ()) ? max_size () : __len; } ``` 基本上都會回傳兩倍size()的數值，其他檢查則是避免vector size超過max_size()，max_size()即為vector能存放的最多element個數 ### MSVC 可以在Github看到MSVC對vector的實現:[Github網址](https://github.com/microsoft/STL/blob/a62109595b6d89e08172fdf4beb75a2670fe0cc9/stl/inc/vector) `STL/stl/inc/vector` [872行](https://github.com/microsoft/STL/blob/a62109595b6d89e08172fdf4beb75a2670fe0cc9/stl/inc/vector#L872C1-L879C6) ```c= _CONSTEXPR20 void push_back(const _Ty& _Val) { // insert element at end, provide strong guarantee _Emplace_one_at_back(_Val); } _CONSTEXPR20 void push_back(_Ty&& _Val) { // insert by moving into element at end, provide strong guarantee _Emplace_one_at_back(_STD move(_Val)); } ``` 可以發現如果要`push_back`會調用`_Emplace_one_at_back()` `STL/stl/inc/vector` [775行](https://github.com/microsoft/STL/blob/a62109595b6d89e08172fdf4beb75a2670fe0cc9/stl/inc/vector#L775C1-L786C6) ```c= template <class... _Valty> _CONSTEXPR20 _Ty& _Emplace_one_at_back(_Valty&&... _Val) { // insert by perfectly forwarding into element at end, provide strong guarantee auto& _My_data = _Mypair._Myval2; pointer& _Mylast = _My_data._Mylast; if (_Mylast != _My_data._Myend) { return _Emplace_back_with_unused_capacity(_STD forward<_Valty>(_Val)...); } return *_Emplace_reallocate(_Mylast, _STD forward<_Valty>(_Val)...); } ``` 可以發現滿的時候如果要`push_back`會調用`_Emplace_reallocate()` `STL/stl/inc/vector` [811行](https://github.com/microsoft/STL/blob/a62109595b6d89e08172fdf4beb75a2670fe0cc9/stl/inc/vector#L811C1-L858C6) ```c= template <class... _Valty> _CONSTEXPR20 pointer _Emplace_reallocate(const pointer _Whereptr, _Valty&&... _Val) { // reallocate and insert by perfectly forwarding _Val at _Whereptr _Alty& _Al = _Getal(); auto& _My_data = _Mypair._Myval2; pointer& _Myfirst = _My_data._Myfirst; pointer& _Mylast = _My_data._Mylast; _STL_INTERNAL_CHECK(_Mylast == _My_data._Myend); // check that we have no unused capacity const auto _Whereoff = static_cast<size_type>(_Whereptr - _Myfirst); const auto _Oldsize = static_cast<size_type>(_Mylast - _Myfirst); if (_Oldsize == max_size()) { _Xlength(); } const size_type _Newsize = _Oldsize + 1; const size_type _Newcapacity = _Calculate_growth(_Newsize); const pointer _Newvec = _Al.allocate(_Newcapacity); const pointer _Constructed_last = _Newvec + _Whereoff + 1; pointer _Constructed_first = _Constructed_last; _TRY_BEGIN _Alty_traits::construct(_Al, _Unfancy(_Newvec + _Whereoff), _STD forward<_Valty>(_Val)...); _Constructed_first = _Newvec + _Whereoff; if (_Whereptr == _Mylast) { // at back, provide strong guarantee if constexpr (is_nothrow_move_constructible_v<_Ty> || !is_copy_constructible_v<_Ty>) { _Uninitialized_move(_Myfirst, _Mylast, _Newvec, _Al); } else { _Uninitialized_copy(_Myfirst, _Mylast, _Newvec, _Al); } } else { // provide basic guarantee _Uninitialized_move(_Myfirst, _Whereptr, _Newvec, _Al); _Constructed_first = _Newvec; _Uninitialized_move(_Whereptr, _Mylast, _Newvec + _Whereoff + 1, _Al); } _CATCH_ALL _Destroy_range(_Constructed_first, _Constructed_last, _Al); _Al.deallocate(_Newvec, _Newcapacity); _RERAISE; _CATCH_END _Change_array(_Newvec, _Newsize, _Newcapacity); return _Newvec + _Whereoff; } ``` 可以發現跟size有關的是19行的`_Calculate_growth()` `STL/stl/inc/vector` [1961行](https://github.com/microsoft/STL/blob/a62109595b6d89e08172fdf4beb75a2670fe0cc9/stl/inc/vector#L1961C1-L1977C6) ```c= _CONSTEXPR20 size_type _Calculate_growth(const size_type _Newsize) const { // given _Oldcapacity and _Newsize, calculate geometric growth const size_type _Oldcapacity = capacity(); const auto _Max = max_size(); if (_Oldcapacity > _Max - _Oldcapacity / 2) { return _Max; // geometric growth would overflow } const size_type _Geometric = _Oldcapacity + _Oldcapacity / 2; if (_Geometric < _Newsize) { return _Newsize; // geometric growth would be insufficient } return _Geometric; // geometric growth is sufficient } ``` 基本上都會回傳capacity() + capacity()/2的數值，如果這個數值小於_Newsize(_Oldsize + 1)，則回傳_Newsize(當_Oldsize為1時就有可能發生，畢竟1+1/2=1)，其他檢查則是避免vector size超過max_size()，max_size()即為vector能存放的最多element個數 ### 結論可以根據以上trace過程，得出使用gcc或g++時，vector要push_back時若超出現有capacity，則會將capacity\*2，即2倍。而使用MSVC時vector要push_back時若超出現有capacity，則會將capacity變為capacity+capacity/2，即1.5倍 ## Reference [Dynamic array](https://en.wikipedia.org/wiki/Dynamic_array) [Optimal memory reallocation and the golden ratio](https://archive.li/Z2R8w#selection-83.0-83.48) [What is the ideal growth rate for a dynamically allocated array?](https://stackoverflow.com/questions/1100311/what-is-the-ideal-growth-rate-for-a-dynamically-allocated-array) [Reasoning behind specifically doubling the array size upon reallocation](https://cs.stackexchange.com/questions/149713/reasoning-behind-specifically-doubling-the-array-size-upon-reallocation) [Amortized Analysis of Dynamic Arrays](https://www.cs.utexas.edu/~slaberge/docs/topics/amortized/dynamic_arrays/)