# Data Structures and Algorithms (DSA)
[TOC]
<br>

Supplementary articles:
- [用JAVA學資料結構與演算法](https://hackmd.io/@Aquamay/H1nxBOLcO/https%3A%2F%2Fhackmd.io%2F%40Aquamay%2Frk1C8ni5d)
- [演算法&資料結構學習筆記](https://medium.com/@amber.fragments/演算法-學習筆記-0-前言-8be08a00600)

---
## Part 1. Algorithm
### 1.1 Good Pseudo Code should be ...
- **modularized**
- dependent on **speaker/listener**
- usually **no formal definition**

Here is an example :
```c++=
// SELECTION-SORT(A)
for i=1 to A.length
    m = GET-MIN-INDEX(A, i, A.length)
    SWAP(A[i], A[m])
return A // which has been sorted in place

// GET-MIN-INDEX(A,i,r)
m = i // store current min. index
for i=i+1 to r // update if i-th element smaller
    if A[m] > A[i]
        m = i
return m
```
In the example above, there is no need to spell out the details of SWAP one by one.

---
### 1.2 Criteria of Algorithm
- **input**
- **definiteness**
  clear instructions (humans can understand them)
- **effectiveness**
  feasible instructions (computers can carry them out)
- **finiteness**
  completable instructions (the run terminates)
- **output**
  output (**correctness**) : needs proving with respect to requirements

---
### 1.3 How to Write a Proof?
- #### Claim
    - **Correctness of GET-MIN-INDEX**
    Upon exiting GET-MIN-INDEX(A),
    &nbsp;&nbsp;&nbsp;&nbsp; $A[m] = \underset{1 \leq j \leq n}{\min} A[j]$&nbsp;&nbsp;&nbsp;&nbsp; with n = A.length
- #### Proof of Loop Invariant => correctness claim of algorithm
    - **invariant within GET-MIN-INDEX**
    Upon finishing the loop with i = k, denote m by $m_k$
    &nbsp;&nbsp;&nbsp;&nbsp; $A[m_k] \le A[j] \quad \text{for} \quad j = 1,2,...,k$
    (loop) invariant : property that the algorithm maintains
    - **Mathematical Induction**
        - Base
          when i = 2, invariant true because...
        - Assume
          invariant true for i = t-1; when i = t, ...
        - Proof

---
## Part 2.
Data Structure
Good Algorithm Needs Proper Data Structure
### 2.1 Data Structure Needs Accessing Algorithms
A good data structure needs methods for getting data out and putting data in.
- #### GET
    - GET-BY-INDEX : for arrays
    - GET-NEXT : for sequential access
    - GET(key) : for search

    generally assumed to **read without deleting**
- #### INSERT
    - INSERT-BY-INDEX : for arrays
    - INSERT-AFTER : for sequential access
    - INSERT(key, data)

    generally assumed to **add without overriding**

rule of thumb : often-<span style="color:red">GET</span> $\iff$ <span style="color:blue">place</span> "nearby" (frequently used items are placed close by)

---
### 2.2 Data Structure Needs Maintenance Algorithms
- #### CONSTRUCT
    - usually possible with multiple <span style="color:red">INSERT</span>
    - sometimes possible to be <span style="color:green">faster</span> than that
- #### REMOVE
    - often viewed as deleting <span style="color:blue">after GET</span>
    - $\sim$ <span style="color:blueviolet">UN</span><span style="color:red">INSERT</span> : often harder than <span style="color:red">INSERT</span>
- #### UPDATE
    - usually possible with <span style="color:blueviolet">REMOVE</span> + <span style="color:red">INSERT</span>
    - can be viewed as <span style="color:red">INSERT</span> <span style="color:orange">with overriding</span>

hidden cost of data structure : maintenance effort (esp.
<span style="color:blueviolet">REMOVE</span> & <span style="color:orange">UPDATE</span>)

---
### 2.3 INSERT of Ordered Array
#### SWAP Version
```c++=
// INSERT(A, data)
n = A.length
A[n+1] = data // put in the back
for i=n downto 1
    if A[i+1] < A[i]
        SWAP(A[i], A[i+1]) // cut in by consecutive swaps
    else
        return
```
#### Direct Cut-in Version
```c++=
// INSERT(A, data)
key = data
i = A.length
while i > 0 and A[i] > key
    A[i+1] = A[i]
    i = i - 1
A[i+1] = key
return
```
INSERT : <span style="color:red">cut in</span> from the <span style="color:blue">back</span>

---
### 2.4 CONSTRUCT of Ordered Array
#### SELECTION-SORT
```c++=
// SELECTION-SORT(A)
for i=1 to A.length
    m = GET-MIN-INDEX(A,i,A.length)
    SWAP(A[i],A[m])
return A

// GET-MIN-INDEX(A,l,r)
m = l // store current min. index
for i=l+1 to r // update if i-th element smaller
    if A[m] > A[i]
        m = i
return m
```
#### INSERTION-SORT
```c++=
// INSERTION-SORT(A)
for i=1 to A.length
    INSERT(A,i)
return A

// INSERT(A,m)
key = A[m]
i = m-1
while i > 0 and A[i] > key
    A[i+1] = A[i]
    i = i - 1
A[i+1] = key
```
INSERTION-SORT : CONSTRUCT with multiple <span style="color:red">INSERT</span>

---
### 2.5 REMOVE and UPDATE of Ordered Array
#### REMOVE
```c++=
// REMOVE(A,m)
i = m + 1
while i <= A.length
    A[i-1] = A[i] // fill in
    i = i + 1
A.length = A.length - 1
```
#### UPDATE
```c++=
// UPDATE(A,m,data)
key = data
i = m
if A[i] > key // new key smaller : cut in toward the front
    i = i - 1
    while i > 0 and A[i] > key
        A[i+1] = A[i]
        i = i - 1
    A[i+1] = key
else // new key larger : cut in toward the back
    i = i + 1
    while i <= A.length and A[i] < key
        A[i-1] = A[i]
        i = i + 1
    A[i-1] = key
```
ordered array : more <span style="color:blueviolet">maintenance efforts</span> $\implies$ faster GET?
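As a sanity check, the direct cut-in INSERT above can be sketched in executable C++ on a `std::vector` (the function name `ordered_insert` is ours, not from the notes; `push_back` plays the role of growing `A.length` by one):

```cpp
#include <vector>

// Cut-in INSERT for an ordered vector: open a slot at the back,
// shift larger elements right, then drop the key into the hole.
void ordered_insert(std::vector<int>& a, int key) {
    a.push_back(key);                       // grow by one slot
    int i = static_cast<int>(a.size()) - 2; // last "old" element
    while (i >= 0 && a[i] > key) {          // shift larger elements right
        a[i + 1] = a[i];
        --i;
    }
    a[i + 1] = key;                         // cut in at the opened slot
}
```

The `while` loop is exactly the "content moving" cost that Part 3 later motivates linked lists to avoid.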
---
### 2.6 Sequential Search Algorithm for Any Array
<span style="color:blue">SEQ-SEARCH</span> : structurally similar to <span style="color:red">GET-MIN-INDEX</span>

Review of GET-MIN-INDEX :
```c++=
// GET-MIN-INDEX(A,l,r)
m = l // store current min. index
for i=l+1 to r // update if i-th element smaller
    if A[m] > A[i]
        m = i
return m
```
```c++=
// SEQ-SEARCH(A,key,l,r)
for i = l to r // return when found
    if A[i] equals key
        return i
return NIL
```
---
### 2.7 Ordered Array : Sequential Search with Shortcut
ordered : possibly <span style="color:blueviolet">easier</span> to declare <span style="color:orange">NIL</span>
```c++=
// SEQ-SEARCH-SHORTCUT(A,key,l,r)
for i = l to r // return when found
    if A[i] equals key
        return i
    else if A[i] > key
        return NIL
return NIL
```
---
### 2.8 Ordered Array : Binary Search Algorithm
```c++=
// BIN-SEARCH(A,key,l,r)
while l <= r
    m = floor((l+r)/2)
    if A[m] equals key
        return m
    else if A[m] > key
        r = m - 1 // cut out end
    else if A[m] < key
        l = m + 1 // cut out begin
return NIL
```
<span style="color:red">BIN-SEARCH</span> : multiple shortcuts by <span style="color:red">quickly checking the middle</span>

#### Binary Search in Open Source
From java.util.Arrays (Java) :
```c++=
private static int binarySearch(int[] a, int key) {
    int low = 0;
    int high = a.length - 1;

    while (low <= high) {
        int mid = (low + high) >>> 1;
        int midVal = a[mid];

        if (midVal < key)
            low = mid + 1;
        else if (midVal > key)
            high = mid - 1;
        else
            return mid; // key found
    }
    return -(low + 1); // key not found
}
```
---
### 2.9 Why Data Structures and Algorithms?
good <span style="color:blueviolet">program</span> : proper use of <span style="color:blueviolet">resources</span>
- #### Space Resources
    - memory
    - disk(s)
    - transmission bandwidth

    usually cared about by <span style="color:red">data structure</span>
- #### Computation Resources
    - CPU(s)
    - GPU(s)
    - computation power

    usually cared about by <span style="color:blue">algorithm</span>
- Other Resources
    - manpower
    - budget

    usually cared about by <span style="color:green">management</span>

<span style="color:red">data structures</span> and <span style="color:blue">algorithms</span> : for writing good <span style="color:blueviolet">programs</span>

---
### 2.10 Proper Use : Trade-off Different Factors
- faster GET $\iff$ slower INSERT and/or maintenance
- more space $\iff$ faster computation
- harder to implement/debug $\iff$ faster computation

a <span style="color:blueviolet">good program</span> needs understanding of trade-offs

---
### 2.11 Summary of Data Structure
- definition of data structure
organize data with <span style="color:orange">access/maintenance</span> algorithms
- ordered array as data structure
<span style="color:orange">insert by cut-in, remove by fill-in</span>
- GET (search) in ordered array
<span style="color:orange">binary search</span> using <span style="color:orange">order</span> for shortcuts
- why data structures and algorithms
study <span style="color:orange">trade-off</span> to move from coding to programming

---
### 2.12 Appendix
:question: Tips for generating test data
1. Think about which cases are most error-prone, e.g. extreme/boundary cases; try to cover all situations
2. Think about how to generate large-scale test data

<br>

---
## Part 3. Linked List
### 3.1 Issue of Ordered Array for Polynomial Computation
Essentially, <span style="color:blue">INSERT</span> in an ordered array causes a lot of <span style="color:blueviolet">content moving</span> when cutting in.
(ordered array : less flexible for <span style="color:blue">insertion & removal</span>)

Use a linked list for flexible insertion : no <span style="color:blueviolet">content moving</span>, just <span style="color:orange">changing pointers</span>

<span style="color:red">overhead of next</span> $\implies$ flexible <span style="color:blue">INSERT-AFTER</span>

---
### 3.2 Algorithms (Operations) for Linked List
- GET-DATA(clue) $\implies$ GET-DATA(node)
return data[clue] &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; return node.data
- GET-NEXT(clue) $\implies$ GET-NEXT(node)
return <span style="color:red">next[clue]</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; return <span style="color:red">node.next</span>

```c++=
// INSERT-AFTER(node, data)
newNode = NODE(data, node.next)
node.next = newNode

// REMOVE-AFTER(node)
oldNode = node.next
node.next = oldNode.next
return oldNode
```
<span style="color:red">linked</span> list : flexibility to allocate NODE <span style="color:red">anywhere</span>

---
### 3.3 Linked List : Ordered vs. Unordered
#### Ordered Linked List
- output polynomial <span style="color:red">orderly</span>
- INSERT : (sequential) <span style="color:red">search</span> before insertion
- UPDATE : <span style="color:red">remove then insert</span>

#### Unordered Linked List
- <span style="color:blue">general purposes</span>
- INSERT : usually just <span style="color:blue">head insertion</span>
- UPDATE : <span style="color:blue">simply change</span> the node

<span style="color:red">ordered</span> vs. <span style="color:blue">unordered</span> : trade-off between <span style="color:red">(orderly) output</span> & <span style="color:blue">maintenance</span> time

---
### 3.4 REMOVE-HERE for Linked List
For example: node@4.next = node@2.next.
But here is the problem : we do not know where node@4 is unless we do a sequential search.

Result : <span style="color:blueviolet">REMOVE-HERE</span> is hard for a <span style="color:orange">singly</span> linked list
<br>

Then comes the solution for REMOVE-HERE :

### 3.5 Doubly Linked List : More Flexible REMOVE-HERE
```c++=
// REMOVE-HERE(node)
node.prev.next = node.next
node.next.prev = node.prev
return node
```
<span style="color:blue">overhead of prev</span> in doubly linked list $\implies$ flexible REMOVE-HERE

---
### 3.6 INSERT for Doubly Linked List
```c++=
// INSERT-AFTER(node, data)
newNode = NODE(data, node.next, node) // (data, next, prev)
node.next = newNode
newNode.next.prev = newNode

// INSERT-BEFORE(node, data)
newNode = NODE(data, node, node.prev)
node.prev = newNode
newNode.prev.next = newNode
```
not just space overhead, but also <span style="color:blue">time overhead</span> for <span style="color:blue">doubly linked list</span>

---
### 3.7 Sparse Vector : Dense Array versus Linked List
$[1,0,0,0,0,3,0,4] \in \mathbb{R}^8$
- #### Dense Array Representation

| index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| ----- | --- | --- | --- | --- | --- | --- | --- | --- |
| value | 1 | 0 | 0 | 0 | 0 | 3 | 0 | 4 |

- #### Linked List Representation
store **non-zero** values with an ordered linked list of (index, value)
$(1,1) \to (6,3) \to (8,4)$

storing only non-zeros : time/space efficient if many zeros

---
### 3.8 Merging Sparse Vectors
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$[1,0,0,0,0,3,0,4] + [0,0,0,0,0,6,9,0]$
$(1,1) \to (6,3) \to (8,4) + (6,6) \to (7,9)$
```c++=
// SUM(A,B)
L = empty list, ca = A.head, cb = B.head
while ca != NIL and cb != NIL
    if ca.index equals cb.index
        if (ca.value+cb.value) != 0
            L.INSERT-TAIL(NODE(ca.index, ca.value+cb.value, NIL))
        ca = ca.next, cb = cb.next
    else if ca.index > cb.index
        L.INSERT-TAIL(NODE(cb.index, cb.value, NIL))
        cb = cb.next
    else
        L.INSERT-TAIL(NODE(ca.index, ca.value, NIL))
        ca = ca.next
[L.INSERT-TAIL the remaining nodes of the non-NIL cursor]
return L
```
#### Real-World Use of Sparse Vector : LIBSVM
```c++=
// svm.cpp
double Kernel::dot(const svm_node *px, const svm_node *py)
{
    double sum = 0;
    while (px->index != -1 && py->index != -1)
    {
        if (px->index == py->index)
        {
            sum += px->value * py->value;
            ++px;
            ++py;
        }
        else
        {
            if (px->index > py->index)
                ++py;
            else
                ++px;
        }
    }
    return sum;
}
```
---
### 3.9 Summary of Linked List
- motivation of linked list
overhead of <span style="color:orange">next</span> for flexibility
- doubly linked list
overhead of <span style="color:orange">prev</span> for more flexibility
- linked list for sparse vector
storing non-zeros for time/space efficiency
- linked list in job interviews
often used to test basic computational thinking

<br>

---
## Part 4. Analysis Tools
### 4.1 Properties of Good Programs
#### Good Program
- meet requirements, correctness : basic
- clear usage document (external), readability (internal), etc.

#### Proper Resource Usage
- efficient use of computation resources (CPU, GPU, etc.)
$\to$ <span style="color:blue">time complexity</span>
- efficient use of storage resources (memory, disk, etc.)
$\to$ <span style="color:blue">space complexity</span>

---
### 4.2 Space Complexity of GET-MIN-INDEX
```c++=
// GET-MIN-INDEX(A,i,r)
m = i // store current min. index
for i=i+1 to r // update if i-th element smaller
    if A[m] > A[i]
        m = i
return m
```
- array A : pointer size <span style="color:blue">$d_1$</span> (not counting the actual input elements)
- integer m : size <span style="color:blue">$d_2$</span>
- integer i : size <span style="color:blue">$d_2$</span>

total space <span style="color:blue">$d_1 + 2d_2$ (constant)</span> within algorithm execution : **not dependent on n = A.length**

---
### 4.3 Space Complexity of GET-MIN-INDEX-WASTE
```c++=
// GET-MIN-INDEX-WASTE(A)
B = COPY(A,1,A.length)
INSERTION-SORT(B)
return B[1] // return the minimum
```
- array A : pointer size <span style="color:blue">$d_1$</span> (not counting the actual input elements)
- array B : pointer size <span style="color:blue">$d_1$</span> and n = A.length integers <span style="color:blue">$n \cdot d_2$</span> (this grows the most with the amount of data)

total space <span style="color:blue">$2d_1 + d_2n$ (linear) : dependent on n</span>

---
### 4.4 Time Complexity of Insertion Sort
```c++=
// INSERTION-SORT(A)
for m = 2 to A.length
    key = A[m]
    // Insert A[m] into the sorted sequence A[1...m-1]
    i = m - 1
    while i > 0 and A[i] > key
        A[i+1] = A[i]
        i = i - 1
    A[i+1] = key
```
total time $T(n)$ = $\color{blue}{d_1}n + \color{blue}{d_2}(n - 1) + \color{blue}{d_4}(n - 1) + \color{blue}{d_5}\sum\limits_{m=2}^{n}\color{red}{t_m} + \color{blue}{d_6}\sum\limits_{m=2}^{n}(\color{red}{t_m} - 1) + \color{blue}{d_7}\sum\limits_{m=2}^{n}(\color{red}{t_m} - 1) + \color{blue}{d_8}(n - 1)$

cost <span style="color:blue">$d_x$ depends on machine type</span> : total $T(n)$ depends on $n$ and <span style="color:red">$t_m$, the number of **while** checks</span>

---
### 4.5 Time Complexity Analysis in Reality
- #### Common Focus
worst-case time complexity
    - meaningful in practice (waiting time)
    - similar to average-case when near-worst-case happens often enough
- #### 
Common Language
'rough' time needed w.r.t. n (input size)
    - care about larger n
    - leading (bigger) term more important
    - insensitive to constants

---
### 4.6 'Rough' Notation
- care about larger $n$
- leading term more important
- insensitive to constants

In the following case :
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $\underbrace{cn^2 + bn + a}_\text{f(n)} = \underbrace{\Theta(n^2)}_\text{g(n)}$
for <span style="color:red">positive</span> $f(n)$ and $g(n)$ [when n $\ge$ 1]
<br>

#### Representing 'Rough' by Asymptotic Behavior
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $\underbrace{cn^2 + bn + a}_\text{f(n)} = \underbrace{\Theta(n^2)}_\text{g(n)}$
- growth of $bn + a$ slower than $g(n) = n^2$ : removed by dividing by $g(n)$ for large $n$
- asymptotically, the two functions only differ by a constant $c > 0$
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $\lim\limits_{n\to\infty}\frac{f(n)}{g(n)} = c$
<br>

In conclusion, for <span style="color:red">positive</span> $f(n)$ and $g(n)$, &nbsp; $f(n) = \Theta(g(n))\ \text{ if } \ \lim_{n\to\infty}\frac{f(n)}{g(n)} = c > 0$
<br>

**big-$\Theta$ : roughly the same**
- Back to the criteria above, the definition meets them :
    - care about larger $n$ : yes, $n \to \infty$
    - leading term more important : yes, $n + \sqrt{n} + \log{n} = \Theta(n)$
    - insensitive to constants : yes, for example $1126n = \Theta(n)$
- meaning : $f(n)$ grows roughly the same as $g(n)$
- "$=\Theta(\cdot)$" is actually "$\in$"
<br>

| | $\sqrt{n}$ | $0.1126n$ | $n$ | $112.6n$ | $n^{1.1}$ | $e^n$ | $n+\sqrt{n}$ |
| ------------ | ---------- | --------- | --- | -------- | --------- | ----- | ----------- |
| $\Theta(n)?$ | N | Y | Y | Y | N | N | Y |

Asymptotic notation is the most used 'language' for time/space complexity

---
### 4.7 Big-O Notation
Take binary search as an example, whose best case is $O(1)$ and worst case is $\Theta(\log n)$ :
for 'any' binary search task, the time needed is roughly no more than $\log n$
$\to$ and usually we don't need the ~~best case~~
<br>
<img
src='https://cstaleem.com/wp-content/uploads/2023/03/Theta-Notation-%CE%98-notation.png'><br><br>

**Big-O Notation**
$f(n)$ grows slower than or similar to $g(n)$ : ("$\le$")
One side of the $\Theta(\cdot)$ definition :
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $f(n) = O(g(n))$ iff there exist positive $c_2$, $n_0$ such that $f(n) \le c_2 \cdot g(n)$ for all $n \ge n_0$

Order (from small to large) :
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $\log_2 n < n < n\log_2 n < n^2 < n^3 < 2^n < n!$

---
### 4.8 Three Big Asymptotic Notations
- $f(n)$ grows slower than or similar to $g(n)$ : ("$\le$")
$f(n) = O(g(n))$ iff there exist $c$, $n_0$ such that $f(n) \le c \cdot g(n)$ for all $n \ge n_0$
- $f(n)$ grows faster than or similar to $g(n)$ : ("$\ge$")
$f(n) = \Omega(g(n))$ iff there exist $c$, $n_0$ such that $f(n) \ge c \cdot g(n)$ for all $n \ge n_0$
- $f(n)$ grows similar to $g(n)$ : ("$\approx$")
$f(n) = \Theta(g(n))$ iff $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$

usually only Big-O is needed

---
### 4.9 Transitivity of Big-O
#### Theorem
if $f(n) = O(g(n))$ $[f \le g]$ and $g(n) = O(h(n))$ $[g \le h]$, then $f(n) = O(h(n))$ $[f \le h]$
#### Proof
- When $n \ge n_1$,&nbsp; $f(n) \le c_1g(n)$
- When $n \ge n_2$,&nbsp; $g(n) \le c_2h(n)$
- So, when $n \ge max(n_1, n_2)$,&nbsp; $f(n) \le c_1c_2h(n)$

Similar for $\Omega$ and $\Theta$

---
### 4.10 Closure of Big-O
#### Theorem
if $f(n) = O(h(n))$ and $g(n) = O(h(n))$, then $f(n) + g(n) = O(h(n))$
#### Proof
- When $n \ge n_1$, &nbsp; $f(n) \le c_1h(n)$
- When $n \ge n_2$, &nbsp; $g(n) \le c_2h(n)$
- So, when $n \ge max(n_1, n_2)$, &nbsp; $f(n) + g(n) \le (c_1 + c_2)h(n)$

Similar for $\Omega$ and $\Theta$

---
### 4.11 Summary of Analysis Tools
- motivation
quantify <span style="color:orange">time/space complexity</span> to measure efficiency
- cases of complexity analysis
often focus on <span style="color:orange">worst-case</span> with 'rough' notations
- asymptotic notation
<span style="color:orange">rough comparison of functions for large n</span>
- usage of
asymptotic notation
describe $f(n)$ (time, space) by a simpler $g(n)$
- operations on asymptotic complexity
analyze pseudo code piece by piece
- practical complexity
how much more resource is needed?

<br>

---
## Part 5. Stack
#### Last-In-First-Out (LIFO)
Stack is a restricted data structure, but an important one for computer science.
All insertions and deletions of entries are made at one end, which is called the **top** of the stack.
### 5.1 Function Calls Using the Stack
- when calling a function
<img src='https://miro.medium.com/v2/resize:fit:1100/1*nMAYYF1N04SjoRfdRmp4NA.gif'>
- when executing a function
<img src='https://miro.medium.com/v2/resize:fit:1100/1*Kk8KFEJ_mJmrv-hQN21gGg.gif'>

---
### 5.2 The Three Stack Operations
#### PUSH
```c++=
PUSH(S, data) // INSERT usually named PUSH
// put data onto top of S
```
#### PEEK
```c++=
PEEK(S) // GET usually named PEEK
// return top element of S
```
#### POP
```c++=
POP(S) // REMOVE usually named POP
// remove and return top element of S
```
Sometimes there are other utility functions like SIZE() or ISEMPTY()

---
### 5.3 Stacks Implemented on Array
#### PUSH
```c++=
// PUSH(S, data)
S.top = S.top + 1
S.arr[S.top] = data
```
#### POP
```c++=
// POP(S)
S.top = S.top - 1
return S.arr[S.top + 1]
```
Usually a consecutive array with S.top at the 'tail' of the array, for O(1) operations.
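The array-backed stack above can be sketched in a few lines of C++; `std::vector` supplies the consecutive array, and its back plays the role of `S.top` (the `Stack` struct and method names are ours):

```cpp
#include <vector>

// Minimal array-backed stack: the top sits at the tail of the
// array, so PUSH/POP/PEEK are all O(1).
struct Stack {
    std::vector<int> arr;
    void push(int data) { arr.push_back(data); }   // PUSH
    int  peek() const   { return arr.back(); }     // PEEK (GET)
    int  pop() {                                   // POP
        int x = arr.back();
        arr.pop_back();
        return x;
    }
    bool empty() const  { return arr.empty(); }    // ISEMPTY
};
```

In practice `std::stack` from `<stack>` provides exactly this interface over a growable container.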
---
### 5.4 Stack Solution to Parentheses Balancing
#### Condition
Any ')' should match the last unmatched '(' (LIFO)
#### Method
'(' : PUSH,&nbsp; ')' : POP
#### Parentheses Balancing Algorithm
```c++=
S = empty stack
for each c in input
    if c is '('
        PUSH(S, c)
    else // c is ')'
        if not ISEMPTY(S)
            POP(S)
        else
            return FALSE
return ISEMPTY(S)
```
---
### 5.5 Stack Solution to Infix-Postfix Translation
Computers do not natively understand "multiplication/division before addition/subtraction", so expressions are translated into postfix form.
Supplementary article : [Link](https://www.geeksforgeeks.org/convert-infix-expression-to-postfix-expression/?ref=lbp)
#### Postfix
postfix evaluation with operand stack S
```c++=
S = empty stack
for each token in input
    if token is a number
        PUSH(S, token)
    else if token is an operator
        // note the order of a and b : it matters for - and /
        b = POP(S)
        a = POP(S)
        PUSH(S, token(a, b)) // apply operator to (a, b)
return POP(S)
```
#### Infix to Postfix
Infix to Postfix with operator stack S2
```c++=
S2 = empty stack
for each token in input
    if token is a number
        output token
    else if token is an operator
        while not ISEMPTY(S2) and PEEK(S2) is of higher/same precedence
            output POP(S2)
        PUSH(S2, token)
while not ISEMPTY(S2) // flush the remaining operators
    output POP(S2)
```
Combining the two algorithms (or stacks S1, S2) gives a simple calculator

---
### 5.6 Summary of Stack
- motivation
temporary storage with <span style='color:orange'>LIFO</span>
- implementation
<span style="color:orange">$O(1)$ push/pop from tail of array</span>
- application : postfix evaluation
stack as temporary storage of partial results
- application : expression parsing
stack as temporary storage of waiting operators

<br>

---
## Part 6. Queue
#### First-In-First-Out (FIFO)
Queue is a restricted data structure, but an important one for computer science.
### 6.1 Packet Queue in Networking
<br>
<img src='https://i.stack.imgur.com/LUpE6.gif'><br><br>

Multiple queues with different priorities enable weighted fair queuing.
In real-world use, designs range from a simple/single queue to complicated/multiple queues.
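The parentheses-balancing algorithm of 5.4 above can be sketched as executable C++. Since the stack would only ever hold '(' symbols, a depth counter is enough to stand in for it (the function name `balanced` is ours):

```cpp
#include <string>

// Balanced iff no POP ever hits an empty stack and the stack
// ends empty; `depth` is the size of the implicit '(' stack.
bool balanced(const std::string& s) {
    int depth = 0;
    for (char c : s) {
        if (c == '(') {
            ++depth;                       // PUSH
        } else if (c == ')') {
            if (depth == 0) return false;  // POP on empty stack
            --depth;                       // POP
        }
    }
    return depth == 0;                     // ISEMPTY at the end
}
```

An explicit stack becomes necessary once several bracket kinds `()[]{}` must match each other, because then the popped symbol has to be compared against the closer.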
---
### 6.2 Queue Implemented on Circular Array
<br>
<img src='https://marxtudor.com/wp-content/uploads/2015/04/circular-queue-explained-1.jpg'>
<br><br>

#### ENQUEUE
```c++=
// ENQUEUE(Q, data)
Q.arr[Q.tail] = data
if Q.tail == Q.length
    Q.tail = 1
else
    Q.tail = Q.tail + 1
```
#### DEQUEUE
```c++=
// DEQUEUE(Q)
x = Q.arr[Q.head]
if Q.head == Q.length
    Q.head = 1
else
    Q.head = Q.head + 1
return x
```
---
### 6.3 General Maze Algorithm
#### Condition
- <span style='color:blue'>start from some location</span>
- <span style='color:red'>storage of future candidates</span>
- <span style='color:blueviolet'>record of visited</span>
#### Code
```c++=
// MAZE-OUT(M)
Candidate = {}
all Visited[i,j] = false
Candidate.INSERT(M.begin_i, M.begin_j)
// get out of M starting from (M.begin_i, M.begin_j)
while Candidate not empty
    (i, j) = Candidate.REMOVE()
    if Visited[i,j] is false
        Visited[i,j] = true
        for each non-Visited (k,l) neighbor of (i,j)
            if (k,l) is exit
                return SUCCEED
            else
                Candidate.INSERT(k,l)
return NO-WAY-OUT
```
Walk to the <span style='color:limegreen'>exit</span> by <span style='color:limegreen'>random</span> removal from <span style='color:red'>Candidate</span>.
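The circular-array ENQUEUE/DEQUEUE of 6.2 can be sketched in C++ with 0-based indexing; a minimal sketch with names of our own choosing, keeping one slot empty so that `head == tail` unambiguously means "empty":

```cpp
#include <vector>

// Circular-array queue: head/tail wrap around with modulo
// arithmetic, so both operations are O(1).
struct Queue {
    std::vector<int> arr;
    std::size_t head = 0, tail = 0;
    explicit Queue(std::size_t capacity) : arr(capacity + 1) {}
    bool empty() const { return head == tail; }
    void enqueue(int data) {             // assumes queue not full
        arr[tail] = data;
        tail = (tail + 1) % arr.size();  // wrap around
    }
    int dequeue() {                      // assumes queue not empty
        int x = arr[head];
        head = (head + 1) % arr.size();  // wrap around
        return x;
    }
};
```

The pseudo code's `if Q.tail == Q.length ... else ...` branches and the modulo expression are two spellings of the same wrap-around.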
---
### 6.4 Maze Algorithm by Stack
```c++=
// MAZE-OUT-STACK(M)
Candidate = {}
all Visited[i,j] = false
Candidate.PUSH(M.begin_i, M.begin_j)
// get out of M starting from (M.begin_i, M.begin_j)
while Candidate not empty
    (i,j) = Candidate.POP()
    if Visited[i,j] is false
        Visited[i,j] = true
        for each non-Visited (k,l) neighbor of (i,j)
            if (k,l) is exit
                return SUCCEED
            else
                Candidate.PUSH(k,l)
return NO-WAY-OUT
```
LIFO of <span style='color:limegreen'>Stack</span> $\implies$ <span style='color:limegreen'>later neighbor</span> first $\implies$ <span style='color:limegreen'>last path</span> out

<span style='color:limegreen'>Stack</span> : last path out by **D**epth-**F**irst **S**earch

---
### 6.5 Maze Algorithm by Queue
```c++=
// MAZE-OUT-QUEUE(M)
Candidate = {}
all Visited[i,j] = false
Candidate.ENQUEUE(M.begin_i, M.begin_j)
// get out of M starting from (M.begin_i, M.begin_j)
while Candidate not empty
    (i,j) = Candidate.DEQUEUE()
    if Visited[i,j] is false
        Visited[i,j] = true
        for each non-Visited (k,l) neighbor of (i,j)
            if (k,l) is exit
                return SUCCEED
            else
                Candidate.ENQUEUE(k,l)
return NO-WAY-OUT
```
FIFO of <span style='color:limegreen'>Queue</span> $\implies$ <span style='color:limegreen'>nearby neighbors</span> first $\implies$ <span style='color:limegreen'>shortest unit-step path</span> out

<span style='color:limegreen'>Queue</span> : shortest unit-step path out by **B**readth-**F**irst **S**earch

---
### 6.6 Deques (double-ended queues)
**Deque = Stack + Queue + push_front**
- actions : [constant-time] push_back (like push and enqueue), pop_back (like pop), pop_front (like dequeue), push_front
- application : job scheduling

Can be implemented with a circular array or a doubly linked list.

---
### 6.7 Summary of Queue
- application : maze solving
different data structures cause different algorithm behaviour
- <span style="color:orange">stack + queue = deque</span>
union of both

<br>

---
## Part 7.
Tree
### 7.1 Nature of Data Structures

| data structure | nature |
| ----------------- | ---------------------------- |
| array | indexed access |
| linked list | sequential access |
| stack/queue/deque | restricted (boundary) access |
| tree | hierarchical access |

---
### 7.2 Formal Definition of (Rooted) Tree
$T = (\color{blue}{root};$ &nbsp; $\color{red}{T_1, T_2, \dots, T_n})$
- recursive definition
- <span style='color:blue'>T.root for starting tree access</span> (like L.head)
- <span style='color:red'>disjoint sub-trees $(T_1, \dots, T_n)$</span>
- recursion termination : <span style='color:limegreen'>$T_{LEAF}$ with no sub-trees</span>
<br>

- #### Height of Tree
height = max node depth + 1
usually want <span style='color:limegreen'>small height</span> to access efficiently from the <span style='color:blue'>root</span>
<br>
<img src='https://media.geeksforgeeks.org/wp-content/uploads/20221107231243/heightexample2-660x399.JPG' style='width:400px'><br>
- #### Node of Tree
node degree = # sub-trees
    - internal nodes : degree > 0
    - external nodes : degree = 0, <span style='color:limegreen'>leaves</span>

    <span style='color:limegreen'>leaves</span> are sometimes called the <span style='color:limegreen'>breadth</span> of the tree
<br><img src='https://media.geeksforgeeks.org/wp-content/uploads/20221107231626/depthexample1-660x468.JPG' style='width:400px'><br>
- #### Family Relatives of Tree
<br><img src='https://hyosup0513.github.io/public/images/tree5.PNG' style="width:400px"><br>

---
### 7.3 Basic Algorithms (Operations) for Tree
$T = (\color{blue}{root};$ &nbsp; $\color{red}{T_1, T_2, \dots, T_n})$
- #### GET-DATA
    - linked list
    ```c++=
    // GET-DATA(L.node)
    return L.node.data
    ```
    - tree
    ```c++=
    // GET-DATA(T.node)
    return T.node.data
    ```
- #### GET-NEXT
    - linked list
    ```c++=
    // GET-NEXT(L.node)
    return L.node.next
    ```
    - tree
    ```c++=
    // GET-SUBTREE(T.node, index)
    return T.node.subtree.GET(index)
    ```
- #### <span style="color:red">INSERT-AFTER / INSERT-CHILD</span>
    - linked list
    ```c++=
    // INSERT-AFTER(L.node, data)
    newNode = NODE(data, L.node.next)
    L.node.next = newNode
    ```
    - tree
    ```c++=
    // INSERT-CHILD(T.node, data)
    newNode = NODE(data, []) // no child
    T.node.subtree.INSERT(newNode)
    ```

In general, a tree is an extension of a linked list.

---
### 7.4 Representing Trees with Arrays and Linked List
#### Representing Trees with Arrays
<br><img src='https://i.stack.imgur.com/7MkVp.png'><br>

Can also have a parent link (like a doubly linked list)
<br>

#### Representing Trees with Linked List
Called <span style="color:red">left-child</span> <span style="color:limegreen">right-sibling</span> (LCRS)
<br><img src='https://examradar.com/wp-content/uploads/2016/10/Figure-5.2.7.-Linked-representation-for-the-binary-tree.png?x59255'><br><br>
<img src='https://opendsa-server.cs.vt.edu/ODSA/Books/Everything/html/_images/ChildLst.png'><br><br>
<img src='https://opendsa-server.cs.vt.edu/ODSA/Books/Everything/html/_images/GenLkFx.png'><br>
<hr>

### 7.5 C&RT Algorithm for Decision Tree
```
function DecisionTree(data) {
    if cannot branch anymore
        return best color
    else
        1. learn branching criteria to cut the plane
        2. split data to 2 parts
        3. build sub-tree from each part
        4. return (branching criteria; sub-trees)
}
```
<span style="color:orange">C&RT : 'divide-and-conquer'</span> (based on <span style="color:blueviolet">selected components</span> of $CART^{TM}$ of California Statistical Software)

---
### 7.6 Summary of Tree
- intuition
<span style="color:orange">hierarchical access from the root of the tree</span>
- terminologies
family relatives useful for describing node relations
- implementations
<span style="color:orange">more complicated links than array/linked list</span>
- application : decision tree
<span style="color:orange">divide-and-conquer model in machine learning</span>

<br>

---
## Part 8.
Binary Tree
- binary tree : rooted tree with node degree $\le$ 2 & <span style="color:green">left</span>/<span style="color:red">right</span> difference
<br><img src='https://media.geeksforgeeks.org/wp-content/uploads/20200219144238/General-Tree-vs-Binary-Tree.png'><br>
- <span style="color:red">extended</span> binary tree : <span style="color:red">empty external nodes</span> added so that every <span style="color:blueviolet">degree = 2</span> & all original nodes <span style="color:blue">blue</span>
<br><img src='https://easytechnotes.com/wp-content/uploads/2023/06/30_20231.png'><br>

Supplementary link : [Link](https://hackmd.io/@vRN1CwEsTLyHOsG4mC0d4Q/SJJ6ofDn4)

---
### 8.1 Outputting Infix Expression from Expression Tree
Expression tree : a (binary) expression represented by a (binary) tree. (the semantics of the computation expressed as a tree)
#### want : 3 * 4 - ( 5 + 6 ) * 7 + 8 * 9
```c++=
// OUTPUT-INFIX(T)
if IS-LEAF(T)
    print T.data
else
    if (T.data higher precedence than T.left.data)
        print '('
    OUTPUT-INFIX(T.left)
    if (T.data higher precedence than T.left.data)
        print ')'
    print T.data
    if (T.data higher precedence than T.right.data)
        print '('
    OUTPUT-INFIX(T.right)
    if (T.data higher precedence than T.right.data)
        print ')'
```
OUTPUT-INFIX is a variant of INORDER-TRAVERSAL
<hr>

### 8.2 From OUTPUT-INFIX to INORDER-TRAVERSAL
Process the left side first, then the middle, then the right side.
#### OUTPUT-INFIX(T)
```c++=
if IS-LEAF(T)
    print T.data
else
    if ...
    OUTPUT-INFIX(T.left)
    if ...
    print T.data
    if ...
    OUTPUT-INFIX(T.right)
    if ...
```
#### INORDER-TRAVERSAL(T)
```c++=
if T is not NIL
    INORDER-TRAVERSAL(T.left)
    action with T.data
    INORDER-TRAVERSAL(T.right)
```
<hr>

### 8.3 POSTORDER-TRAVERSAL & PREORDER-TRAVERSAL
<img src='https://leetcode.com/articles/Figures/145_transverse.png'><br>

#### POSTORDER-TRAVERSAL(T)
```c++=
if T is not NIL
    POSTORDER-TRAVERSAL(T.left)
    POSTORDER-TRAVERSAL(T.right)
    action with T.data
```
#### PREORDER-TRAVERSAL(T)
```c++=
if T is not NIL
    action with T.data
    PREORDER-TRAVERSAL(T.left)
    PREORDER-TRAVERSAL(T.right)
```
#### INORDER-TRAVERSAL(T)
```c++=
if T is not NIL
    INORDER-TRAVERSAL(T.left)
    action with T.data
    INORDER-TRAVERSAL(T.right)
```
#### Sorting
<img src='https://d18l82el6cdm1i.cloudfront.net/uploads/jxTW76g4MR-traversal.gif'>
<hr>

### 8.4 Full Tree and Complete Tree
#### Relation between Nodes and Height :
&nbsp; &nbsp; $h \le n \le 2^h - 1 \iff \lg(n+1) \le h \le n$ &nbsp; (note : h is the height, n the number of nodes)

<span style="color:red">**Note : the complexity of tree algorithms is often tied to the height**</span>
#### Kinds of Trees :
[Full Tree vs. Complete Tree comparison](http://alrightchiu.github.io/SecondRound/binary-tree-introjian-jie.html)<br>
<img src='https://i.stack.imgur.com/n2CFS.png'><br>
<hr>

### 8.5 Summary of Binary Tree
- terminologies
<span style="color:orange">binary tree = tree with left/right sub-trees</span>
- binary tree traversals
templates for (recursive) tree algorithm design
- full and complete binary trees
can be implemented in an array <span style="color:orange">without wasting space</span>

<br>
<hr>

## Part 9. Priority Queue
Priority Queue : an 'extension' of the queue for more realistic use, Max-Priority-Out.
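Before building one from scratch, Max-Priority-Out can be seen in action with the C++ standard library's `std::priority_queue` of (key, data) pairs; the `Job` alias and `next_job` helper below are ours:

```cpp
#include <queue>
#include <string>
#include <utility>

// (priority, todo item): pairs compare by first component, so the
// largest key sits at the top -- Max-Priority-Out.
using Job = std::pair<int, std::string>;

std::string next_job(std::priority_queue<Job>& pq) {
    Job top = pq.top();   // node with the largest key
    pq.pop();             // remove it
    return top.second;    // hand back its data
}
```

Internally `std::priority_queue` is backed by exactly the array max heap that Sections 9.4 and 9.5 develop.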
### 9.1 Priority Queue with Binary Tree
- #### previously

| | data |
| ---- | ----- |
| <span style="color:green">left</span> | <span style="color:red">right</span> |

- #### next

| key | data |
| --- | ---- |
| <span style="color:green">left</span> | <span style="color:red">right</span> |

<br>

key : priority (the <span style="color:blueviolet">larger</span> the better)
data : item in the todo list (only the key is shown below, for simplicity)

Goal : get the node with the <span style="color:blueviolet">largest</span> key fast (& remove it)
<hr>

### 9.2 Max Tree
faster access to the <span style="color:blueviolet">largest key</span> if it is put at T.root
max-root (binary) tree : <span style="color:blueviolet">largest key @ root</span>

The problem with the above : hard to <span style="color:blueviolet">remove the largest</span> fast if we cannot easily locate the <span style="color:red">second largest</span>

Solution : keep the second/third priorities (the larger keys) near the top; when the root is removed, the children on the next level are compared and the larger one moves up.

max (binary) tree : <span style="color:blueviolet">largest key @ root</span> <span style="color:red">for every subtree</span>
<hr>

### 9.3 Simple Removal for Max Tree
```c++=
// REMOVE-LARGEST(max tree T)
result = T.data
cursor = T.root // the hole to be filled
while (cursor not NIL)
    larger = COMPARE(cursor.left, cursor.right) // the larger child, NIL if none
    if (larger not NIL)
        cursor.key = larger.key
        cursor.data = larger.data
    cursor = larger
return result
```
REMOVE-LARGEST for a height-h max tree takes O(h) time.
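The promote-the-larger-child loop can be written out in executable C++ on an array-encoded complete max tree (node i has children 2i+1 and 2i+2 in 0-based indexing). To keep the array consecutive, this sketch of ours first moves the tail element into the root hole, the trick the next sections build REMOVE-LARGEST on:

```cpp
#include <utility>
#include <vector>

// Remove and return the largest key of an array-encoded max tree:
// copy the tail over the root, then sift down -- O(height) swaps.
int remove_largest(std::vector<int>& heap) {
    int result = heap[0];          // largest key is at the root
    heap[0] = heap.back();         // tail fills the hole
    heap.pop_back();
    std::size_t cur = 0, n = heap.size();
    while (true) {
        std::size_t l = 2 * cur + 1, r = l + 1, largest = cur;
        if (l < n && heap[l] > heap[largest]) largest = l;
        if (r < n && heap[r] > heap[largest]) largest = r;
        if (largest == cur) break; // max-tree property restored
        std::swap(heap[cur], heap[largest]);
        cur = largest;             // continue with the larger child
    }
    return result;
}
```

Because the tree is complete, its height is O(log n), so each removal costs O(log n).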
<hr>

### 9.4 Max Heap : "Shortest" Max Tree
**max heap = max tree + complete binary tree**

### 9.5 REMOVE-LARGEST for Max Heap
Since it is not easy to maintain the max-tree property first and then the complete-tree shape, we maintain the complete-tree shape first.
```c++=
// REMOVE-LARGEST(max heap T)
result = T.data
remove T.tail & copy it to T.root // tail = the last (highest-numbered) node
cursor = T.root
while (cursor != NIL)
    largest = LARGEST(cursor, cursor.left, cursor.right)
    if (largest == cursor)
        break
    else
        SWAP(cursor, largest)
        cursor = largest
return result
```
takes roughly $O(\log n)$ time

<hr>

### 9.6 INSERT for Max Heap
```c++=
// INSERT(max heap T, key, data)
insert (key, data) to T.tail
cursor = T.tail
while (cursor != NIL)
    larger = LARGER(cursor, cursor.parent)
    if (larger == cursor)
        break
    else
        SWAP(cursor, larger)
        cursor = larger
```

<br>

<hr>

## Part 10. Heap
#### Heap $\iff$ Partially Ordered Array
unordered array $\to$ heap
selection sort $\to$ heap sort

### 10.1 Selection Sort v.s. Heap Sort
#### Selection Sort
```c++=
// SELECTION-SORT(A)
for i=1 to A.length
    m = GET-MIN-INDEX(A, i, A.length)
    SWAP(A[i], A[m])
return A // which has been sorted in place

// GET-MIN-INDEX(A,i,r)
m = i // store current min. index
for i=i+1 to r // update if i-th element smaller
    if A[m] > A[i]
        m = i
return m
```
time : $O(n) \cdot O(n) = O(n^2)$ <span style="color:orange">without any preprocessing</span>

#### Heap Sort
1. Complete tree : all nodes are packed to the left
2.
Maximum heap : a level must be full before the next level is filled (heap shape), and every parent's key is larger than its children's keys

```c++=
// HEAP-SORT(A)
for i = A.length downto 1
    (key, data) = REMOVE-LARGEST(A, 1, i) // heap shrinks by one each round
    copy (key, data) to A[i] // original A[i] is already in A[1]
return A // which has been sorted in place
```
time : $O(n) \cdot O(\log n) = O(n \log n)$ <span style="color:orange">after building the heap</span>

Heap Sort : a faster selection sort with the help of the heap data structure; the heap also serves Priority Queues, and this in-place version only works on contiguous storage.

<hr>

### 10.2 Huffman Tree
<img src='https://www.researchgate.net/publication/323883375/figure/fig3/AS:606308860444673@1521566705849/Huffman-coding-the-final-tree-of-the-code.png'><br>

#### Rule
- each <span style="color:red">leaf node</span> stores a symbol (e.g. 'B', 'R')
- <span style="color:blue">path</span> to a <span style="color:red">leaf node</span> $\equiv$ <span style="color:blue">code</span> of the symbol
- <span style="color:blue">compress</span> a symbol string under some assumptions

**The idea behind lossless compression (zip, inside jpeg)**

<hr>

### 10.3 Huffman Tree Construction
- **compute the frequency of each symbol**
  record how many times each symbol occurs
- **remove the two smallest-frequency nodes; build a sub-tree & insert it back**
- **repeat the step (till done)**<br>

<img src='https://www.researchgate.net/publication/3337841/figure/fig3/AS:394708393185337@1471117222874/Example-of-the-Huffman-tree-and-its-three-possible-encodings-a-Illustration-example.png'><br>

<hr>

### 10.4 DSA Takeaway
- binary tree is useful $\to$ huffman tree
- heap is useful $\to$ 'remove two smallest-freq' when constructing the huffman tree

<hr>

### 10.5 Summary of Heap
- motivation
  <span style="color:orange">possibly efficient priority queue by max tree</span>
- max heap
  <span style="color:orange">max + complete binary tree for $O(\log n)$ removal / insertion</span>
- heap sort
  <span style="color:orange">max heap in array to speed up selection sort</span>
- application : huffman tree building
  <span style="color:orange">priority queue to build a huffman tree
for compression</span>

<br><hr>

## Part 11. Binary Search Tree
Trees are more flexible in memory than arrays; this part compares the binary search tree with binary search.

#### Design wishes
- More flexible insertion than an array : (binary) tree
- Similar algorithmic structure to BIN-SEARCH

#### Key idea
root $\sim$ middle
- Ordered array : [ left, middle ) < middle < (middle, right]
- <span style="color:orange">Binary search tree</span> : left sub-tree < root < right sub-tree <span style="color:orange">for every tree, sub-tree, sub-sub-tree, $\dots$</span> (we only ever need to look at one side)
- During insertion, a node smaller than its parent goes to the left and a node larger than its parent goes to the right $\implies$ over the whole tree, a node's side is not decided by comparing it with the root alone

#### Requirements (of Binary Search)
- The list must be <span style="color:orange">ordered</span>
- Rapid random access is required, so we <span style="color:orange">cannot</span> use a linked list

<span style="color:orange">binary search tree</span> : binary search with a <span style="color:orange">tree</span> rather than an ordered array

### 11.1 Binary Search Tree
Replace binary search's middle with T.root (a pseudo-midpoint)

#### Binary Search
```c++=
// BIN-SEARCH(A,key)
left = 1, right = A.length
while left <= right
    mid = floor((left+right)/2)
    if A[mid] = key
        return mid
    else if A[mid] > key
        right = mid - 1 // cut out end
    else if A[mid] < key
        left = mid + 1 // cut out begin
return FAIL
```
complexity : $O(\log n)$

#### Binary Search Tree
```c++=
// BST-SEARCH(T, key)
while T != NIL
    mid = T // root
    if mid.key = key
        return mid
    else if mid.key < key
        T = T.right
    else if mid.key > key
        T = T.left
return FAIL
```
complexity : $O(h)$

<hr>

### 11.2 In-Order Traversal on BST
```c++=
// INORDER-TRAVERSAL(T)
if T is NIL
    return
else
    INORDER-TRAVERSAL(T.left)
    action with T.data
    INORDER-TRAVERSAL(T.right)
```

<hr>

### 11.3 Linked List v.s. Contiguous Array implementation
#### Linked-list implementation
- easy to insert and delete
- easy to merge sort
- inefficient for binary search
#### Contiguous-array implementation
- hard to insert and delete
- easy to binary search (also suits e.g. Quick Sort, Heap Sort)

<hr>

### 11.4 Summary of Binary Search Tree
- easy to insert and delete data
  <span style="color:orange">just add or remove tree nodes</span>
- easy to sort
  <span style="color:orange">in-order traversal</span>
- easy to search
  <span style="color:orange">search directly on the tree structure, $O(\log n)$ complexity</span>

<br><hr>

## Part 12. Sort
### 12.1 Selection Sort : Review and Refinements
#### Idea
Linearly select the minimum from the "unsorted" part; put it at the end of the "sorted" part

#### Implementations
- common implementation : swap the minimum with a[i] in the i-th iteration
- rotate implementation : rotate the minimum down to a[i] in the i-th iteration
- linked-list implementation : insert the minimum as the i-th element

#### About
- space $O(1)$ <span style="color:red">in-place</span>
- time $O(n^2)$ and <span style="color:red">$\Theta(n^2)$</span>
- rotate/linked-list : <span style="color:red">stable</span> by selecting the minimum with the smallest index
  (stable means same-valued elements keep their index order)
- common implementation : unstable

<hr>

### 12.2 Heap Sort : Review and Refinements
#### Idea
Selection sort with a max heap in the original array rather than an unordered pile

#### About
- space $O(1)$ <span style="color:red">in-place</span>
- time $O(n \log n)$
    - parent vs. two children ("three contend for one") : $2$ comparisons per level
    - tree depth : $\log_2 n$
    - executed $n$ times
    - complexity = $2 \cdot \log_2 n \cdot n \approx 2n\log_2 n$
- <span style="color:red">not stable</span>
- usually preferred over selection sort (faster)
- smaller variance in performance

#### Sorting
<img src='https://d18l82el6cdm1i.cloudfront.net/uploads/uv9rgMfetq-heapsort-example.gif'>

<hr>

### 12.3 Bubble Sort : Review and Refinements
#### Idea
Swap disordered neighbors repeatedly

#### About
- space $O(1)$ <span style="color:red">in-place</span>
- time $O(n^2)$
    - each pass makes $(n-1)$ comparisons
    - at least 1 pass; at most n passes $\implies$ on average $(n+1)/2$
    - complexity : $(n-1) \cdot (n+1)/2 \approx O(n^2)$
- stable
- <span style="color:red">adaptive</span> : can stop early
- a deprecated choice except in very specific applications with only a few disordered neighbors, or when swapping neighbors is cheap (old tape days)

<hr>

### 12.4 Insertion Sort : Review and Refinements
#### Idea
Insert a card from the unsorted pile into its place in the sorted pile

#### Implementations
- naive implementation : sequentially search the sorted pile from the front
  <span style="color:red">$O(n)$ time per search, $O(n)$ per insert</span>
- backwise implementation : sequentially search the sorted pile from the back
  <span style="color:red">$O(n)$ time per search, $O(n)$ per insert</span>
- binary-search implementation : binary search the sorted pile
  <span style="color:red">$O(\log n)$ time per search, $O(n)$ per insert</span>
- linked-list implementation : same as naive but on linked lists
  <span style="color:red">$O(n)$ time per search, $O(1)$ per insert</span>

#### About
- space $O(1)$
- time $O(n^2)$
- stable
- backwise implementation <span style="color:red">adaptive</span>
  (adaptive means the work shrinks when the input is already nearly sorted)
- usually preferred over selection sort (adaptive)

<hr>

### 12.5 Tree Sort : Review and Refinements
#### Idea
Replace the heap with a BST; an in-order traversal outputs the sorted result

#### About
- space $O(n)$ for the tree nodes
- time $O(n \cdot h)$ : worst $O(n^2)$ (unbalanced tree), average $O(n\log n)$, carefully balanced BST $O(n\log n)$
- unstable
- suitable for stream data and incremental sorting

<hr>

### 12.6 Quick Sort : Introduction
#### Idea
1. Pick a pivot element from the sequence
2. Swap the pivot with the first element of the sequence
3. Partition the remaining elements into "less than the pivot" and "greater than or equal to the pivot"
4. Move the pivot back to the end of the "less than" segment
5.
Recurse on the two segments, repeating steps 1–4 until the whole sequence is sorted

- #### Tree Sort Revisited
```c++=
// make a[0] the root of a BST
for i = 1, ..., n-1 do
    if a[i] < a[0]
        insert a[i] to the left subtree of the BST
    else // a[i] >= a[0]
        insert a[i] to the right subtree of the BST
    end if
end for
// in-order traversal of left subtree, then root, then right subtree
```
- #### Quick Sort
```c++=
// name a[0] the pivot
for i = 1, ..., n-1 do
    if a[i] < a[0]
        put a[i] to the left pile of the pivot
    else // a[i] >= a[0]
        put a[i] to the right pile of the pivot
    end if
end for
// output quick-sorted left; output a[0]; output quick-sorted right
```

#### Implementations
- naive implementation : pick the first element of the pile as the pivot
- random implementation : pick a random element of the pile as the pivot
- median-of-3 implementation : pick median(front, middle, back) as the pivot

Quick Sort can be implemented on either a contiguous array or a linked list.

<br>
<img src='https://wat-images.s3.ap-south-1.amazonaws.com/images/course/ci6ldqnqthum/Quick_Sort_0.png'><br>

---

### 12.7 Quick Sort : Review and Refinement
#### Idea
Simulate tree sort without building the tree

#### About
- space : worst $O(n)$, average $O(\log n)$ <span style="color:red">on stack calls</span>
- time : worst $O(n^2)$, average $O(n \log n)$
- not stable (larger performance variance than Merge Sort)
- usually the best choice for large data (when stability is not required); can be mixed with other sorts for small data

<hr>

### 12.8 Merge Sort : Introduction
#### Idea
1. Split the sequence into two subsequences (equal length or differing by one)
2. Sort the two subsequences (recursive calls)
3.
Merge the two sorted subsequences

#### Implementations
- Bottom-up implementation:

  | data | | | | | | | | sorted |
  | ---- | --- | --- | --- | --- | --- | --- | --- | --------------- |
  | 6 | 5 | 4 | 7 | 8 | 3 | 1 | 2 | (size-1 sorted) |
  | 5 | 6 | 4 | 7 | 3 | 8 | 1 | 2 | (size-2 sorted) |
  | 4 | 5 | 6 | 7 | 1 | 2 | 3 | 8 | (size-4 sorted) |
  | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | (size-8 sorted) |

  - $O(\log n)$ loops; the i-th loop combines size-$2^i$ arrays $O(\frac{n}{2^i})$ times
  - Combining size-$l$ arrays takes $O(l)$ time but needs <span style="color:red">$O(l)$ space!</span>
  - Thus, bottom-up Merge Sort takes <span style="color:red">$O(n \log n)$</span> time
- Top-down implementation:
  MergeSort(arr, left, right) = combine(MergeSort(arr, left, mid), MergeSort(arr, mid+1, right));
  - divide and conquer, $O(\log n)$ levels of recursive calls
- Analysis Diagram
  <img src='https://www.andrew.cmu.edu/course/15-121/lectures/Sorting%20Algorithms/pix/mergesort_complexity.bmp'><br>

<hr>

### 12.9 Merge Sort : Review and Refinements
#### Idea
Combine sorted parts repeatedly to get everything sorted

#### About
- time $O(n \log n)$ in both implementations
- usually stable (if carefully implemented); parallelizes well (small piles can be merged independently)
- popular in <span style="color:red">external sort</span>
- needs extra space to store the smaller subsequences

---

### 12.10 Time Complexity and Space Complexity of Each Sort
<img src='https://edurev.gumlet.io/ApplicationImages/Temp/a5d8f710-434d-4b40-8833-75a5b661f2eb_lg.jpeg?w=576&dpr=1.0'>
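The top-down merge sort of 12.8–12.9 can be sketched in a few lines of Python (a minimal sketch, not the notes' pseudocode; it uses the same input as the bottom-up table above). Taking from the left half on ties is the detail that makes the sort stable:

```python
def merge(left, right):
    """Merge two sorted lists; preferring `left` on ties keeps the sort stable."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]  # append the leftover tail

def merge_sort(a):
    if len(a) <= 1:          # size-0/1 lists are already sorted
        return a
    mid = len(a) // 2        # split into two halves (length differs by at most one)
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))

print(merge_sort([6, 5, 4, 7, 8, 3, 1, 2]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Each of the $O(\log n)$ recursion levels does $O(n)$ total merging work, giving the $O(n \log n)$ time bound; the temporary `out` lists are the extra space noted in 12.9.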