# Data Structure ###### tags: `MyNTUST` {%hackmd @CA-Lee/MyNTUST_banner %} [TOC] Program, Algorithm & Recursion === Datatype --- - char - store in ASCII code - int - float - double Program --- - a good program is - correct - easy to read - easy to debug - easy to modify Algorithm --- - definition - 解決問題的步驟 - input - output - at least one - clear and unambiguous - finite - 有限步驟內可以解決問題 - program don't need - effective - anyone can understand by pencil and paper - algorithm vs program - algorithm is just paperwork - often devide into modules (modularization) - easy to maintain Function --- - why seperate into functions - easy to test - easy to update - 更好分工 - recursion Recursive --- - Direct recursion - A call A in A - indirect recursion - A call B, B call A - tail recursion - A call A in the last line - (often) easy to read - but more time-comsuming - Ackerman's function $$ A(m,n) = \begin{cases} n+1, & \text{if } m=0 \\ A(m-1,1), & \text{if } n=0 \\ A(m-1,A(m,n-1)), & \text{otherwise} \end{cases} $$ eg. $$ \begin{align} A(1,2)=A(0,A(1,1))=A(0,3)=4\\ A(1,1)=A(0,A(1,0))=A(0,2)=3\\ A(1,0)=A(0,1)=2 \\ A(0,1)=2 \end{align} $$ Performance Analysis === Space Complexity --- > $S(P) = c + S_p$ - fixed part - instructions (指令、程式碼) - constants (常數) - simple variable (一些基本的變數) - variable part - recursion stack space (遞迴時佔的空間) - 因為要先儲存還沒執行完的資料(程式) - referenced variables - 一般都針對 $S_p$ 討論 Time Complexity --- > $T(P) = c + T_p$ - 分成 compile time 和 run (execution) time,一般討論 run time - compile time - 編譯的時間,基本上是 constant - run time - 程式執行所耗的時間,可能會因輸入而改變 - measuement - 實際測量程式執行所耗的時間 - analysis - 計算程式走了幾步 (有幾行指令) Expressing Complexity --- - we usually use some terminoloies, which is meaningful but inexact ### Big-oh > $f(n) = O(g(n))$ > (f of n is big oh of g of n) - 程式複雜度上界 - 程式 f 輸入 n 之後執行的時間為 $f(n)$ - 找到數學式 $g(n)$ 使得 $f(n)$ 在 $n \ge n_0$ 時永遠小於等於 $g(n) \times c$ 倍 - $f(n) \le cg(n) \text{ for all } n, n \ge n_0$ - eg. $3n + 2 = O(n)$ - $3n +2 \le 4n \text{ for all } n \ge 2$ - $3n + 2 = O(n^2)$ is also correct, but not informative - $g(n)$ should be as small as possible - Fantastic names (~~潮爆的~~讀法) | | | | -------- | ----------- | | $O(1)$ | constant | | $O(n)$ | linear | | $O(n^2)$ | quadratic | | $O(n^3)$ | cubic | | $O(2^n)$ | exponential | - $O(1) < O(\log n) < O(n) < O(n \log n)$ $< O(n^2) < O(2^n) < O(n^3) < O(n^c)$ $< O(2^n) < O(3^n) < O(c^n) < O(n!)$ $< O(n^n) < O(n^{c^n})$ ### Omega > $f(n) = \Omega(g(n))$ > (f of n is omega of g of n) - 程式複雜度下界 - $n \ge n_0$ 後 $f(n)$ 必大於 $cg(n)$ - $f(n) \ge cg(n) \text{ for all } n, n \ge n_0$ - $g(n)$ also should be as large as possible ### Theta > $f(n) = \Theta(g(n))$ > (f of n is theta of g of n) - $g(n)$ 同時是 $O()$ 同時也是 $\Omega()$ - 比 $O()$ 和 $\Omega()$ 更精確 - $\Theta(g(n)) = O(g(n)) = \Omega(g(n))$ ### Little-Oh > $f(n) = o(g(n)),\space 0 \le f(n) \lt cg(n)$ - for **all** positive constant $c$, exist some $n_0$ such $f(n) < cg(n)$ for all $n > n_0$ - e.g. - $2n = o(n^2)$ - $2n^2 \neq o(n^2)$ ### Little-omega > $f(n) = \omega(g(n)),\space 0 \le cg(n) \lt f(n)$ - for **all** positive constant $c$, exist some $n_0$ such $cg(n) < f(n)$ for all $n > n_0$ - e.g. - $\frac{1}{2}n^2 = \omega(n)$ - $\frac{1}{2}n^2 \neq \omega(n^2)$ ### Implications - $f(n) = \Theta(g(n)) \iff g(n)=\Theta(f(n))$ - $f(n) = O(g(n)) \iff g(n) = \Omega(f(n))$ - $f(n) = o(g(n)) \iff g(n) = \omega(f(n))$ Master Method --- > a "cookbook" for solving complexity of some recursive functions ### Theorem - $T(n) = aT(\frac{n}{b}) + f(n)$ - $a \ge 1, b \gt 1, f(n) \text{ is a positive function}$ 1. $f(n) = O(n^{\log_b a-\epsilon})$ for some constant $\epsilon > 0$ - $T(n) = \Theta(n^{\log_b a})$ 2. $f(n) = \Theta(n^{\log_b a})$ - $T(n) = \Theta(n^{\log_b a} \log_2 n)$ 3. $f(n) = \Omega(n^{\log_b a+\epsilon})$ for some constant $\epsilon > 0$, and $af(\frac{n}{b}) \le cf(n)$ for some constant $c < 1$ and $n > n_0$ - $T(n) = \Theta(f(n))$ ### examples - e.g. $T(n) = 9T(\frac{n}{3}) + n$ - $a=9,b=3,f(n)=n,n^{\log_ba}=n^{\log_3 9}=n^2$ - $f(n) = n = O(n^{\log_b a-\epsilon}) = O(n^{\log_3 9-1})$ - by case 1, $T(n) = \Theta(n^{\log_b a}) = \Theta(n^2)$ - e.g. $T(n) = T(\frac{2n}{3}) + 1$ - e.g. $T(n) = 3T(\frac{n}{4}) + n \log_2 n$ Array === - array is a set of pair of <index, value> - in C/C++, index start from 0 - 通常儲存在連續的記憶體中 - a[3] 的位址等於 a[0] 的位址 + 3 單位 - 在初始化時可以設 default value 2D array --- - 二維,可以和數學的 matrix 對應 - $A_{i,j} = A[i-1][j-1]$ - Row-major - Store in sequence $A_{1,1},A_{1,2},...,A_{1,n},A_{2,1},...,A_{2,n},...,A_{m,n}$ - a[i][j]的位址 = A[0][0]的位址 + i*n 個單位 + j 個單位 - Column-major - Store in sequence $A_{1,1},A_{2,1},...,A_{m,1},A_{1,2},...,A_{m,2},...,A_{m,n}$ - a[i][j]的位址 = A[0][0]的位址 + j*m 個單位 + i 個單位 Triangular Matrix --- > 有約一半的值是 0,存起來很浪費空間,所以不存 - lower-triangular - row-major - $L(A_{i,j}) = l_s + [\frac{(1+(i-1)) \times (i-1)}{2} + (j-1)] \times d$ - column-major - 全部減後面 - $L(A_{i,j}) = l_s + [\frac{(1+m)\times m}{2} - \frac{(1+(m-j))(m-j)}{2} - (m-i+1)] \times d$ - upper-triangular - 概念同上 Stack === Concept --- - ordered - 插入或刪除要依照規定的順序 - 只能插入/刪除 top 的資料 - 桶子 - stack 可以說是一個桶子 - 插入資料 - 往桶子裡丟,最後丟的會在最上面 - 取出/移除資料 - 從桶子裡拿,只能從最上面的開始拿 - LIFO - last in first out Implementation --- - push(data) - 放東西進去 - 放之前程式要檢查 stack 是否已經放滿 - else `stack overflow` - pop() - 取出資料 - 程式要檢查 stack 還有沒有資料 - else `stack underflow` - peek() - 偷看 - 看一下 top 的資料 - use array - 宣告一個空間 - 從 0 開始放,一個指標紀錄 top 位置 Stack Permutation --- - 給一組 elements 和一個 stack - 將 elements 依序放入 stack,但中途可以 pop - 所以會得到新的順序 - 該順序稱為 stack-sortable permutation - e.g. ABC - ACB: push (A), pop (A), push (B), push (C), pop (C), pop (B) - Catalan number ([wiki](https://en.wikipedia.org/wiki/Catalan_number)) - how many stack-sortable permutations - $$\frac{1}{n+1}C^{2n}_{n}$$ Prefix, Infix & Postfix === - Infix (中序表示法) - `運算元1` + `運算子` + `運算元2` - e.g. `A + B` - Postfix (後序表示法) - e.g. `AB X` - infix to postfix 1. 將所有括號寫出來 2. 用運算子取代對應的右括號 3. 移除左括號 - Prefix (前序表示法) - e.g. `* AB` - infix to prefix 1. 將所有括號寫出來 2. 用運算子取代對應的左括號 3. 移除右括號 Red-Black Tree === > 紅黑樹 - definition - nodes have color red or black - leaf nodes doesn't store data - root is always black - leaves are always black - red nodes always have two black children - from any node to its leaf must pass same amount of black nodes - self balanced - longest path $\le$ 2 $\times$ shortest path - 因為紅色都有兩個小孩 - 不會連續太多黑色,因為走過的黑色要一樣多 Insertion --- 1. if one black has two children, change its color 2. if needed, do rotation 3. insert node as red 4. if needed, do rotation 5. check root is black Red-Black Tree VS AVL Tree --- - $O(\log n)$ for searching, insertion, deletion - AVL tree is more rigidly balanced - AVL tree is slower when insertion & deletion, bus faster in searching Contributors === - [calee](https://calee.tw) - [alston](https://github.com/Alston-Jan) - [issues](https://github.com/jp05451)