Lecture 03 - Unknown Array Size Problem

# Lecture 03 - Unknown Array Size Problem > 課程內容 : 中興大學資工系范耀中教授 (112 學年度第 2 學期) > > 其他科目內容請參見 [[Here]](https://hackmd.io/@ChenZE/By4SOO6Jyl) :::info **Problem scription** Array 在使用前都需要事先宣告長度，然而常會遇到下列問題 : (1) 長度宣告太小，不夠用 (2) 長度宣告太大，佔據記憶體空間 ::: 此處考慮的資料型態以 **string** 為主，考慮`push()`與`pop()`兩種函數 ## 1. Linked list implementation 最簡單的方式，以 Data structure 中的 ==Linked list== 實作，當長度不夠時再**動態增加 Node** 此外 Size 過大時也可以**動態刪減 Node**來避免佔據過多的記憶體空間 ![image](https://hackmd.io/_uploads/BJfo4BalC.png) ## 2. Resizing arrays implementation > **Problem :** > > Requiring client to provide capacity **does not implement API** ### 2.1 Method 1 : Increase/Decrease size of array by 1 > 當 array 大小不夠/過大時，每次**新創**一個array且大小**增加/減少 1** > 再把舊的 array **複製**到新的 array 中 Assume that an array `s = [a, b, c, d, ...]`with size N * if insert $N = 1$ items, complexity : $1$ * if insert $N = 2$ items, complexity : $(1+1)+(2)$ * if insert $N = 3$ items, complexity : $(1+1+1)+(2+4)$ The first parentheses impleies that **push items** into array The second parentheses impleies that **access array** (讀取+搬移) ![Algorithm Lecture 3 Unknown Array Size Problem 2023-9 2](https://hackmd.io/_uploads/Syf4RIplR.jpg) 此方法的缺點在於==成本太過高昂==，因為每次新增都需要==複製整個 array== ，會浪費太多時間進行**讀取與搬移**(假設 size = N) * complexity of push : $N$ * complexity of access : $2+4+6+8+...+2(N-1)$ * Total : $N+ (2+4+6+...+2(N-1)) \sim N^2$ --- ### 2.2 Method 2-1 : How to grow array ? 第二種方法 Resizing array 的方法，此處先考慮擴增 Array 的方式 : 當 Array size 不夠的時候就將大小**擴增兩倍** Consider that the array size is N : * complexity of $N = 1$ items : $(1)$ * complexity of $N = 2$ items : $(1+1)+(2)$ * complexity of $N = 3$ items : $(1+1+1)+(2+4)$ * complexity of $N = 4$ items : $(1+1+1+1)+(2+4)$ * complexity of $N = 5$ items : $(1+1+1+1+1)+(2+4+8)$ ![Algorithm Lecture 3 Unknown Array Size Problem 2023-13 2](https://hackmd.io/_uploads/SyBr0LTeR.jpg) That is, array access to insert is $N=2^i$ (N個items的搬移次數) 此方法在**N = 第2, 4, 8, 16, ...次**時，需要==擴大 array 並搬移資料== $\therefore T(N)=N+(2+4+8+...+N) \sim 3N$ :::info :bulb:**Complementary** $\begin{aligned} 2+4+8+...+N&=2^1+2^2+2^4+...+2^i \\ & = \frac{2 \cdot (2^i-1)}{2-1} \\ & = 2 \cdot 2^i \\ & = 2N \end{aligned}$ ::: --- ### 2.3 Method 2-2 : How to shrink array ? **Shrink method ~ 1 :** 當 Array size 滿 :arrow_right: 空間增兩倍 (上述方式) 當 Array size **半滿** :arrow_right: 空間==縮一半== <font color="ff0000">**但是**</font>，這個縮減方式在 ==Array size = 臨界值時會有問題==，此時 Array size會不斷的**增倍**(創建大Array+複製)與**縮減**(創建小Array+複製) ![image](https://hackmd.io/_uploads/SkERBLalA.png) **Shrink method ~ 1 :** 當 Array size 滿 :arrow_right: 空間增兩倍 (上述方式) 當 Array size **$\frac{1}{4}$滿** :arrow_right: 空間==縮一半== ![截圖 2024-04-17 晚上10.15.48](https://hackmd.io/_uploads/BksURIpxA.png) 缺點 : Array 的空間不是全滿就是$\frac{1}{4}$滿，會==浪費 3 倍的空間== ## 3. Memory analysis ### 3.1 Typical memory usage for primitive and arrays **Primitive types** | type | bytes | | :-: | :-: | | boolean | 1 | | char | 2 | | int | 4 | | float | 4 | | long | 8 | | double | 8 | **Array overhead** : 24 bytes 宣告一個空的或長度為 0 的 Array 至少需要的空間大小 | type | bytes (size = N) | | :-: | :-: | | char[ ] | 2N + 24 | | int[ ] | 4N + 24 | | double[ ] | 8N + 24 | --- ### 3.2 Typical memory usage for object in Java **Object overhead (16 bytes)** : 新增一個沒有任何變數的物件所需要的大小 **Reference (8 bytes)** : 物件指向另一個物件所需的指標大小 **Padding** : 任何物件的記憶體大小都需要是 8 的倍數，如果不足 8 的倍數就必須做 padding (填充) :::info Reference : 儲存其他變量、物件的記憶體位置 ::: > Example 1. > A Date object uses 32 bytes of memory ![image](https://hackmd.io/_uploads/B16ckdpeA.png) > Example 2. > A virgin(原始) String of length N uses approximated 2N bytes of memory ![image](https://hackmd.io/_uploads/SyYixdTxA.png) 在上述String物件中，變數value本身不是儲存 Array 的內容，而是儲存 Array 的記憶體位址(i.e., 指標)。因此對變數value而言，所佔的記憶體大小為 : * 8 bytes (指向Array的位址的大小/指標變數的大小) * 2N + 24 bytes ( Array 內容的大小) --- ### 3.3 Typical memory usage summary **Shallow memory usage** : Don't count referenced objects (物件指出去的空間不算) **Deep memory usage** : If array entry or instance variable is a reference, count memory for referenced object (物件指出去的空間要算) --- ### 3.4 Memory profiler(分析) ![image](https://hackmd.io/_uploads/B1iLHO6lR.png) * Size of Date (物件內容可看上面) * object : $16$ bytes * 3 個 int : $3 \times 4=12$ bytes * padding : 4 * total : $16+12+4=32$ bytes * Size of s (shallow)(物件內容可看上面) * object : $16$ bytes * 3 個 int : $3 \times 4=12$ bytes * value (指標)的大小 : $8$ bytes * padding : $4$ bytes * total : $16+12+8+4=40$ * Size of s (deep) * object : $16$ bytes * 3 個 int : $3 \times 4=12$ bytes * array (指標)的大小 : $8$ bytes * array (內容)的大小 : $2 \times 12+24=48$ bytes * size = 12 * element is character (2 bytes) * overhead (24 bytes) * padding : $4$ bytes * total : $16+12+8+48+4=88$ bytes ## 4. Which one is better ? ### 4.1 Resizing array * approximated $8N$ bytes when full * approximated $32N$ bytes when one-quarter full ![image](https://hackmd.io/_uploads/Hy4vAdaeC.png) --- ### 4.2 Linked list(implement by stack) * $40$ bytes for one node * approximated $40N$ bytes for N items ![image](https://hackmd.io/_uploads/r1AWJYaxR.png) :::info :bulb:變數 Next : 連結下一個node所需的指標 ::: --- ### 4.3 Which is better ? * Array * 所需空間較小 * 速度較快(?) * Linked list * 插入/刪除元素較方便 <font color="ff0000">**結論 : 挑選適合的情況去實作，不要亂用**</font>