# Data Structure
###### tags: `MyNTUST`
{%hackmd @CA-Lee/MyNTUST_banner %}
[TOC]
Program, Algorithm & Recursion
===
Datatype
---
- char
- store in ASCII code
- int
- float
- double
Program
---
- a good program is
- correct
- easy to read
- easy to debug
- easy to modify
Algorithm
---
- definition
- 解決問題的步驟
- input
- output
- at least one
- clear and unambiguous
- finite
- 有限步驟內可以解決問題
- program don't need
- effective
- anyone can understand by pencil and paper
- algorithm vs program
- algorithm is just paperwork
- often devide into modules (modularization)
- easy to maintain
Function
---
- why seperate into functions
- easy to test
- easy to update
- 更好分工
- recursion
Recursive
---
- Direct recursion
- A call A in A
- indirect recursion
- A call B, B call A
- tail recursion
- A call A in the last line
- (often) easy to read
- but more time-comsuming
- Ackerman's function
$$
A(m,n) =
\begin{cases}
n+1, & \text{if } m=0 \\
A(m-1,1), & \text{if } n=0 \\
A(m-1,A(m,n-1)), & \text{otherwise}
\end{cases}
$$
eg.
$$
\begin{align}
A(1,2)=A(0,A(1,1))=A(0,3)=4\\
A(1,1)=A(0,A(1,0))=A(0,2)=3\\
A(1,0)=A(0,1)=2 \\
A(0,1)=2
\end{align}
$$
Performance Analysis
===
Space Complexity
---
> $S(P) = c + S_p$
- fixed part
- instructions (指令、程式碼)
- constants (常數)
- simple variable (一些基本的變數)
- variable part
- recursion stack space (遞迴時佔的空間)
- 因為要先儲存還沒執行完的資料(程式)
- referenced variables
- 一般都針對 $S_p$ 討論
Time Complexity
---
> $T(P) = c + T_p$
- 分成 compile time 和 run (execution) time,一般討論 run time
- compile time
- 編譯的時間,基本上是 constant
- run time
- 程式執行所耗的時間,可能會因輸入而改變
- measuement
- 實際測量程式執行所耗的時間
- analysis
- 計算程式走了幾步 (有幾行指令)
Expressing Complexity
---
- we usually use some terminoloies, which is meaningful but inexact
### Big-oh
> $f(n) = O(g(n))$
> (f of n is big oh of g of n)
- 程式複雜度上界
- 程式 f 輸入 n 之後執行的時間為 $f(n)$
- 找到數學式 $g(n)$ 使得 $f(n)$ 在 $n \ge n_0$ 時永遠小於等於 $g(n) \times c$ 倍
- $f(n) \le cg(n) \text{ for all } n, n \ge n_0$
- eg. $3n + 2 = O(n)$
- $3n +2 \le 4n \text{ for all } n \ge 2$
- $3n + 2 = O(n^2)$ is also correct, but not informative
- $g(n)$ should be as small as possible
- Fantastic names (~~潮爆的~~讀法)
| | |
| -------- | ----------- |
| $O(1)$ | constant |
| $O(n)$ | linear |
| $O(n^2)$ | quadratic |
| $O(n^3)$ | cubic |
| $O(2^n)$ | exponential |
- $O(1) < O(\log n) < O(n) < O(n \log n)$
$< O(n^2) < O(2^n) < O(n^3) < O(n^c)$
$< O(2^n) < O(3^n) < O(c^n) < O(n!)$
$< O(n^n) < O(n^{c^n})$
### Omega
> $f(n) = \Omega(g(n))$
> (f of n is omega of g of n)
- 程式複雜度下界
- $n \ge n_0$ 後 $f(n)$ 必大於 $cg(n)$
- $f(n) \ge cg(n) \text{ for all } n, n \ge n_0$
- $g(n)$ also should be as large as possible
### Theta
> $f(n) = \Theta(g(n))$
> (f of n is theta of g of n)
- $g(n)$ 同時是 $O()$ 同時也是 $\Omega()$
- 比 $O()$ 和 $\Omega()$ 更精確
- $\Theta(g(n)) = O(g(n)) = \Omega(g(n))$
### Little-Oh
> $f(n) = o(g(n)),\space 0 \le f(n) \lt cg(n)$
- for **all** positive constant $c$, exist some $n_0$ such $f(n) < cg(n)$ for all $n > n_0$
- e.g.
- $2n = o(n^2)$
- $2n^2 \neq o(n^2)$
### Little-omega
> $f(n) = \omega(g(n)),\space 0 \le cg(n) \lt f(n)$
- for **all** positive constant $c$, exist some $n_0$ such $cg(n) < f(n)$ for all $n > n_0$
- e.g.
- $\frac{1}{2}n^2 = \omega(n)$
- $\frac{1}{2}n^2 \neq \omega(n^2)$
### Implications
- $f(n) = \Theta(g(n)) \iff g(n)=\Theta(f(n))$
- $f(n) = O(g(n)) \iff g(n) = \Omega(f(n))$
- $f(n) = o(g(n)) \iff g(n) = \omega(f(n))$
Master Method
---
> a "cookbook" for solving complexity of some recursive functions
### Theorem
- $T(n) = aT(\frac{n}{b}) + f(n)$
- $a \ge 1, b \gt 1, f(n) \text{ is a positive function}$
1. $f(n) = O(n^{\log_b a-\epsilon})$ for some constant $\epsilon > 0$
- $T(n) = \Theta(n^{\log_b a})$
2. $f(n) = \Theta(n^{\log_b a})$
- $T(n) = \Theta(n^{\log_b a} \log_2 n)$
3. $f(n) = \Omega(n^{\log_b a+\epsilon})$ for some constant $\epsilon > 0$, and $af(\frac{n}{b}) \le cf(n)$ for some constant $c < 1$ and $n > n_0$
- $T(n) = \Theta(f(n))$
### examples
- e.g. $T(n) = 9T(\frac{n}{3}) + n$
- $a=9,b=3,f(n)=n,n^{\log_ba}=n^{\log_3 9}=n^2$
- $f(n) = n = O(n^{\log_b a-\epsilon}) = O(n^{\log_3 9-1})$
- by case 1, $T(n) = \Theta(n^{\log_b a}) = \Theta(n^2)$
- e.g. $T(n) = T(\frac{2n}{3}) + 1$
- e.g. $T(n) = 3T(\frac{n}{4}) + n \log_2 n$
Array
===
- array is a set of pair of <index, value>
- in C/C++, index start from 0
- 通常儲存在連續的記憶體中
- a[3] 的位址等於 a[0] 的位址 + 3 單位
- 在初始化時可以設 default value
2D array
---
- 二維,可以和數學的 matrix 對應
- $A_{i,j} = A[i-1][j-1]$
- Row-major
- Store in sequence $A_{1,1},A_{1,2},...,A_{1,n},A_{2,1},...,A_{2,n},...,A_{m,n}$
- a[i][j]的位址 = A[0][0]的位址 + i*n 個單位 + j 個單位
- Column-major
- Store in sequence $A_{1,1},A_{2,1},...,A_{m,1},A_{1,2},...,A_{m,2},...,A_{m,n}$
- a[i][j]的位址 = A[0][0]的位址 + j*m 個單位 + i 個單位
Triangular Matrix
---
> 有約一半的值是 0,存起來很浪費空間,所以不存
- lower-triangular
- row-major
- $L(A_{i,j}) = l_s + [\frac{(1+(i-1)) \times (i-1)}{2} + (j-1)] \times d$
- column-major
- 全部減後面
- $L(A_{i,j}) = l_s + [\frac{(1+m)\times m}{2} - \frac{(1+(m-j))(m-j)}{2} - (m-i+1)] \times d$
- upper-triangular
- 概念同上
Stack
===
Concept
---
- ordered
- 插入或刪除要依照規定的順序
- 只能插入/刪除 top 的資料
- 桶子
- stack 可以說是一個桶子
- 插入資料
- 往桶子裡丟,最後丟的會在最上面
- 取出/移除資料
- 從桶子裡拿,只能從最上面的開始拿
- LIFO
- last in first out
Implementation
---
- push(data)
- 放東西進去
- 放之前程式要檢查 stack 是否已經放滿
- else `stack overflow`
- pop()
- 取出資料
- 程式要檢查 stack 還有沒有資料
- else `stack underflow`
- peek()
- 偷看
- 看一下 top 的資料
- use array
- 宣告一個空間
- 從 0 開始放,一個指標紀錄 top 位置
Stack Permutation
---
- 給一組 elements 和一個 stack
- 將 elements 依序放入 stack,但中途可以 pop
- 所以會得到新的順序
- 該順序稱為 stack-sortable permutation
- e.g. ABC
- ACB: push (A), pop (A), push (B), push (C), pop (C), pop (B)
- Catalan number ([wiki](https://en.wikipedia.org/wiki/Catalan_number))
- how many stack-sortable permutations
- $$\frac{1}{n+1}C^{2n}_{n}$$
Prefix, Infix & Postfix
===
- Infix (中序表示法)
- `運算元1` + `運算子` + `運算元2`
- e.g. `A + B`
- Postfix (後序表示法)
- e.g. `AB X`
- infix to postfix
1. 將所有括號寫出來
2. 用運算子取代對應的右括號
3. 移除左括號
- Prefix (前序表示法)
- e.g. `* AB`
- infix to prefix
1. 將所有括號寫出來
2. 用運算子取代對應的左括號
3. 移除右括號
Red-Black Tree
===
> 紅黑樹
- definition
- nodes have color red or black
- leaf nodes doesn't store data
- root is always black
- leaves are always black
- red nodes always have two black children
- from any node to its leaf must pass same amount of black nodes
- self balanced
- longest path $\le$ 2 $\times$ shortest path
- 因為紅色都有兩個小孩
- 不會連續太多黑色,因為走過的黑色要一樣多
Insertion
---
1. if one black has two children, change its color
2. if needed, do rotation
3. insert node as red
4. if needed, do rotation
5. check root is black
Red-Black Tree VS AVL Tree
---
- $O(\log n)$ for searching, insertion, deletion
- AVL tree is more rigidly balanced
- AVL tree is slower when insertion & deletion, bus faster in searching
Contributors
===
- [calee](https://calee.tw)
- [alston](https://github.com/Alston-Jan)
- [issues](https://github.com/jp05451)