Fundamentals Of Data Structures

# Fundamentals Of Data Structures [交大開放式課程](http://ocw.nctu.edu.tw/course_detail-v.php?bgid=9&gid=0&nid=412) [Ctutor](http://pythontutor.com/c.html#mode=edit) ## Chapter 1. Basic concepts ### Overview: System Life Cycle #### Requirement 系統的需求 #### Analysis * Bottom-up * Top-down #### Design * Data objects: abstract data types * Operations: specification(function) & design of algorithms(設計演算法達到 function 的目的) #### Refinement & Coding * Choose representations for data objects * Write algorithms for each operation on data obejcts #### Verification * Correctness proofs: selecting proved algorithms * Testing: correctness & efficiency * Error removal: well-document ### Evaluative judgments about programs * Meet the original specification? * Work correctly? * Document? * Use functions to create logical units? * Code readable? * User storage efficiently? * Running time acceptable?(as fast as possible) ### Data Abstraction * Predefined & user defined data type ``` Struct student{ char last_name; int student_id; char grade; } ``` * Data type: objects & operations (加減乘除等等) * Reoresentation: char 1 byte, int 4 byte * Abstract Data Type (ADT): data type specification(object & operation) is separated from representation. * ADT is implementation-independent(與實作無關，ADT 僅定義如何做，但沒有實作細節。p.20 Abstract data type NaturalNumber) ### Algorithm Specification #### Algorithm criteria * Input * Output * Definiteness * Finiteness * Effectiveness 演算法是用來解決問題的方法，演算法中的 input and output 表示了你的問題。我們會希望我們的演算法能夠有效率且能夠在有限的步驟內，解決問題。最簡單的演算法，排序。 input: 一串數值。 output: 一串數值，但需依照大小順序排好。 **program doesn't have to be finite(e.g. OS scheduling)** ### Example 1: Selection Sort * From those integers that are currently unsorted, find the smallest and place it next in the sorted list. ```csharp= // 虛擬程式碼 for(i=0; i < n; i++){ Examine list[i] to list[n-1] and suppose that smallest integer is list[min] Interchange list[i] & list[min] } ``` ```csharp= void sort(int list[], int n) { for (i=0; i < n-1; i++){ min = i; for(j = i+1; j < n; j++){ if(list[j] < list[min]) min = j; } SWAP(list[i], list[min], temp) } } } ``` ### Example of Binary Search Search 最暴力的方法：一個一個去確認搜尋。在執行 Binary Search 前需要先進行排序。圖中 * 為二分點不一定找得到輸入的數值 ![Binary Search](https://i.imgur.com/4ejevyt.png) ```csharp= // pseudo code While (there are moree integers to check){ middle = (left + right) / 2; if(searchnum < list[middle]) right = middle -1 ; else if(searchnum == list[middle]) return middle; else left = middle+1; } ``` ```csharp= int compare(int x, int y) /* return -1 for less than, 0 for equal */ int binsearch(int list[], int searchno, int left, int right) { while (left <= right){ middle = (left + right) / 2; switch (COMPARE(list[middle], searchno)){ case -1: left = middle +1; break; case 0: return middle; case 1: right = middle -1 } } } ``` **不要為了學語言而學語言，學會解決問題的方法。** ### Example 3: Selection Problem * Selection problem: select the kth largest among N number * Solutions: ``` * Approach 1 * read N numbers into an array * sort the array in decreasing order * return the element in position k * Approach 2 * read k elements into an array * sort them in decreasing order * for each remaining elements, read one by one * ignored if it is smaller than the kth element * otherwise, place in correct place and bumping one out of array ``` * Which is better? * More efficient algorithm? ### Recursive Algorithms * Direct recursion: functions that call themselves * Indirect recursion: Functions that call other functions that invoke calling function again * C(n, m) = n!/[m!(n-m)!] -> C(n, m) = C(n-1, m) + C(n-1, m-1) * Boundary condition for recursion (停止條件 Boundary) **不要讓演算法進入無限迴圈** ### Recursive Factorial * n! = n * (n-1) -> fact(n) = n * fact(n-1) * boundary -> 0! = 1 ```csharp= int fact(int n){ if( n == 0 ) // Boundary condition return (1) else return (n*fact(n-1)) } ``` ### Recursive Multiplication * a * b = a * (b-1) +a * a * 1 = a ```csharp= int mult(int a, int b) { if(b == 1) return (a); else return ( mult(a, b-1) + a ) } ``` ### Recursive Summation * sum(1, n) = sum(1, n-1) + n * sum(1, 1) = 1 ```csharp= int sum(int n){ if(n == 1) return (1); else return ( sum(n-1) + n ) } ``` 也可以直接使用 for 迴圈寫完 ### Recursive binary search ```csharp= int binsearch(int list[], int searchno, int left, int right) { if(left <= right){ middle = (left + right) / 2; switch(COMPARE(list[middle], searchno)){ case -1: retrun binsearch(list, searchno, middle + 1, right) case 0: return middle; case 1: return binsearch(list, searchno, left, middle - 1) } } return -1; } ``` ### Recursive Permutations * Permutation of { a, b, c } ``` (a, b, c), (a, c, b) (b, a, c), (b, c, a) (c, a, b), (c, b, a) ``` * Recursion? * a + Perm({b, c}) -> {a, b, c} and {a, c, b} * b + Perm({a, c}) -> {b, a, c} and {b, c, a} * c + Perm({a, b}) -> {c, a, b} and {c, b, a} ![Perm Function](https://i.imgur.com/2NGAjOe.png) ```csharp= void perm(char *;list, int i , int n){ if(i == n){ for(j = 0; j <=n; j++) cout<<list[j]; }else{ for(j = i; j <= n; j++){ SWAP(list[i], list[j], temp); perm(list, i+1, n) SWAP(list[i], list[j], temp) } } } ``` Q. 字元排序問題如果不用 recursive 要怎麼解決？ for loop？ ### Performance Evaluation * Performance analysis: machine independent * Performance measurement: machine dependent ### Performance Analysis * Complexity theory * Space complexity: amount of memory * Time complexity: amount of computer time ### Space Complexity ![Space Complexity](https://i.imgur.com/j7NhWFa.png) ### Time Complexity ![Time Complexity](https://i.imgur.com/lQng23w.jpg) ![Time Complexity 1](https://i.imgur.com/xSbGxYw.jpg) * Worst case * Best case * Average case ![](https://i.imgur.com/gg7flgS.png) ### Asymptotic Notation - Big "oh" ![Big "oh"](https://i.imgur.com/ILwyfKi.png) ### Asymptotic Notation - Omega ![Omega](https://i.imgur.com/liKbk1T.png) The most important is most lower bound. :::info 如何計算 recursive 的時間複雜度。 ![](https://i.imgur.com/cSnTuOh.jpg) ![](https://i.imgur.com/vV0vHu4.jpg) ![](https://i.imgur.com/ReASqSx.jpg) ![](https://i.imgur.com/ttRm8lZ.jpg) ![](https://i.imgur.com/U9i51yr.jpg) ![](https://i.imgur.com/DpOLudE.png) ![](https://i.imgur.com/4biv4iU.jpg) ![](https://i.imgur.com/pxSX4zR.png) https://github.com/illuminate/support/blob/master/Arr.php#L207 10 ans. D ::: ```csharp= Int a = 0 Int* c = &a *c = 1 Int b = 10 c = &b *c = 3 ``` ## Array Array: a set of index and value * data structure For each index, there is a value associated with that index Array ADT ![](https://i.imgur.com/nDJuGiG.jpg) :::info ![](https://i.imgur.com/RfrIBq5.jpg) ![](https://i.imgur.com/RgoR4CW.jpg) ::: https://youtu.be/cgaPEsMJkSc?t=3132 ![](https://i.imgur.com/aTAS6kQ.png) https://blog.gtwang.org/programming/memory-layout-of-c-program/ ``` 001080 000000 040300 600000 000000 050000 000000 ``` array[7][6] | - | - | | - | - | | 0 | 765 | | 1 | 021 | | 2 | 048 | | 3 | 214 | | 4 | 233 | | 5 | 515 | ``` 12345 06789 00abc 000de 0000f ``` index of e = 13 index of any (x,y) where x >= y in n by n matrix = ((n+1) + n-i +1) * i / 2 + (j - i) --- https://docs.google.com/presentation/d/1PfNCvxJV_pOozArub_N-99-fXwu7yNw_Lt3EjlVKLiU/edit#slide=id.p ## Array int list[5]: 存放5個int的array int* plist[5]: 存放list[5]內5個int的pointer的array 陣列內value存放的記憶體空間為連續 *plist = list[0]: base address = alpha list[1]: address = alpha + sizeof(int) list[2]: address = alpha + 2*sizeof(int) ... --- ### loser me 10歲的james薪水就有35000 QQ ![](https://i.imgur.com/YpwaPi7.png) --- ### polynomial * purpose：do the basic oparation like addition or substraction #### 表示方法1: x^4^+10x^3^+3x^2^+1 = [1, 0, 3, 10, 1] | 0 (constant) | 1 | 2 | 3 | 4 | | - | - | - | - | - | | 1 | 0 | 3 | 10 | 1 | Example: a = 3x^20^+2x^5^+4 = [4, 0, 0, 0, 0, 2, 0, ... , 0, 3] b = x^4^+10x^3^+3x^2^+1 = [1, 0, 3, 10, 1] result = [] 1. compare a.count and b.count: a>b -> case 1 result[20] = a[20] a.remove(index: 20) (到a[5]前皆同) 沒有2，因為我懶得打惹 #### 表示方法3: 若多項式為 2x^1000^+1 = [1, 0, ... , 0, 2] 則Array多數空間被浪費 -> 用一個global array存放所有多項式 A(x) = x^1000^+1 B(x) = x^4^+10x^3^+3x^2^+1 | | A Start | A Finish | B Start | | | B Finish | free | | - | - | - | - | - | - | - | - | | coef | 2 | 1 | 1 | 10 | 3 | 1 | | | exp | 1000 | 0 | 4 | 3 | 2 | 0 | | | index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | * free: 允許加入其他多項式 or 存放相加後結果 * 須清楚各多項式的起始及終點位置所對應的index值 Example: 1. Compare A-start and B-start's exp: A-start>B-start -> case 1 copy A-start to free free++ #### complexity = O(n+m) * 若free超過了一開始宣告的maxTerm: * 回收(destruture)已做過運算且不會再使用的polynomial * 缺點: release後的空間不連續 * 須改寫程式碼以避免問題，大家自己回家想一波，期中考會考 😈 --- ### Sparse Matrix * 矩陣內大部分element為0，使用2-dimension array紀錄hen浪費空間 * 使用1-dimension array紀錄非0的element的<row, column, value> * 排序先row再column ### 矩陣轉置 #### dumby method * row與column值互換 * 互換後需再重新檢查並排序array，調整時間成本高 * O(column * term) #### smart method * 先統計transport後各個row有幾個element，紀錄位置後再進行transport * O(row * column) ### 矩陣相乘 * result(i,j) = sigma(a(i,k)*b(j,k)) ### 矩陣存放 * 先存row再存column(row major) --- ### String的存放 #### 英文 | H | E | L | L | O | | W | O | R | L | D | \0 | | - | - | - | - | - | - | - | - | - | - | - | - | #### 中文 * 中文的字串處理(自己google) * 資料檢索 ### 字串的運算 * string match * 字串串接 * tring copy * string insert * trim * print ### String match * 查詢P字串有無在T字串內出現 #### dumby method * 從P1/T1開始比對，若不吻合則shift一格，比對P1/T2直到吻合或回報-1 * O(mn) #### KMP * 建構可快速移動的failure array，再由此加速字串比對 * O(m+n) ### KMP #### first case * 從P1/T1開始比對，若在P4/T4處不吻合，則直接shift至比對P1/T4 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | | - | - | - | - | - | - | - | - | - | | A | G | C | **C** | T | A | T | C | A | | A | G | C | **G** | G | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | | - | - | - | - | - | - | - | - | - | | A | G | C | C | T | A | T | C | A | | | | | A | G | C | G | G | #### second case * 當字串不吻合時，發生不吻合的字元前(已比對過的字元)若有與字首吻合的字元，則不需再做比對 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | | - | - | - | - | - | - | - | - | - | - | - | - | - | | A | G | C | C | T | **A** | **G** | C | T | C | A | T | T | | **A** | G | C | C | T | **A** | **C** | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | | - | - | - | - | - | - | - | - | - | - | - | - | - | | A | G | C | C | T | **A** | G | C | T | C | A | T | T | | | | | | | **A** | G | C | C | T | A | C | #### third case * second case在非單一字元的情況下同樣適用(failure array) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | | A | G | C | C | G | **A** | **G** | **G** | T | C | A | T | T | A | G | T | A | A | | **A** | **G** | C | C | G | **A** | **G** | **C** | A | G | G | C | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | | A | G | C | C | G | **A** | **G** | G | T | C | A | T | T | A | G | T | A | A | | | | | | | **A** | **G** | C | C | G | A | G | C | A | G | G | C | ### failure function ![](https://i.imgur.com/5WINRzM.png) #### example | q | 1 | 2 | 3 | 4 | 5 | 6 | 7 | | - | - | - | - | - | - | - | - | | p | a | b | a | b | a | c | a | | ㄇ | 0 | 0 | 1 | 2 | 3 | 0 | 1 | ## Stack and Queue ### Template * 當data type改變時，對function本身沒有影響 ### Stack * last-in-first-out(LIFO) * top * abstract data type(ADT) * top * finite order list * Boolean IsFull() * void Push/Add(const KeyType & item) * Boolean IsEmpty() * KeyType* Delete/Pop(KeyType &) * 可用array inplement(但array不是唯一) #### 迷宮老鼠 ![](https://i.imgur.com/l07waBm.png) * 有幾種走法(8個方位) * 用stack存放行走路徑[row][column][dir]，若無法前進則pop回前一步 * 圍牆用1表示 * maze記錄曾經走過的path * move決定行走方位 * mark記錄有無走過 * 存不存在一條路徑到達終點 * 是否為最短路徑 * 出口位置未知 * 出口數量不只1個 #### Operate * Postfix evaluation ![](https://i.imgur.com/Uf3if3r.png) * infix轉postfix * 先括弧、移項後刪除 * */的priority大於+- ![](https://i.imgur.com/tVAVcgn.png) * priority table ![](https://i.imgur.com/QmIt69C.png) ### Queue * first-in-first-out(FIFO) * job schedule * 排隊即queue * Priority Queue(插隊仔) * abstract data type(ADT) * front / rear * 其他跟stack差不多 * 因資料delete後不進行搬移，故queue size(IsFull())可能造成誤判 * data movement * 修改procedure * 環狀queue(保留一個空間去做判斷) * front = rear: empty * rear + 1 = front: full * delete front + 1 ## Linked List ### Array * 連續的記憶體空間 * 有大小限制 * 有空間浪費問題 * 當資料做增減時，可能產生誤判 ### Linked List * 有序 * element中存放資料以及pointer(存放下一筆資料的記憶體位置) * insert資料時，assign插入資料的記憶體位置給上一筆資料的pointer，並將插入資料的pointer指向下一筆資料的記憶體位置。 * 刪除資料時，需先依序搜尋資料是否存在以及存放位置，再對pointer重新做assign，搜尋成本高。(element可刪除可不刪) * double linked list * 可從頭尾同時搜尋，降低搜尋成本。 * 可避免pointer鏈結斷掉造成找不到下一筆資料的情況。 ### Implement Linked List * Composite Classes * class Node: data, next pointer * class List: first node * 優點：class Node可共用 * 缺點：自己想 * Nested Classes * class List: class Node * class Node: data, next pointer * 缺點：共用需要再複製一份 * 優點：自己想 ![](https://i.imgur.com/zmtzTQz.png) ### insert ### delete ### list iterator * 一個 class 存放 scan, number of element, first, isEmpty...(因重複使用率高) ### invert a list * 反轉 linked list 所有 pointer * first重新assignment ![](https://i.imgur.com/u9P70pY.png) ### Concatenating * if p,link = null {p -> link = b.first} else {p.link = p} ### list desctrutor * delete first ### circular list * 需註明first與last `linked list也可用來表示polynomial以及stack and queue` ### polynomial * 相加 * 指數部分相同->係數相加 * 不同-> copy較大者係數 * 複製後pointer往後走 * 直到兩者pointer都為null為止 ![](https://i.imgur.com/wMsTXAG.png) ### free pool * delete掉的node丟進free pool裡，不先destructor。要創建新的node時可從free pool中取用(first)再修改值，可減少系統請求記憶體空間的程序。 * 環狀linked list -> 可將polynomial一次return進free pool ### equivalence relations * symmetric * reflexive * transitive ![](https://i.imgur.com/zsjXcl0.png) * phase1 - 讀完所有的pair，process未處理的pair(seq array w/ directive equivalence linked list) * phase2 - output結果(output array: output過的index轉成true) (使用stack尋找indirect的equivalence number(transitive)) * input順序不同會造成output的set內容順序不同，但element相同 ### Sparse Matrix 每個row與column都會對應到一個環狀的linked list ![](https://i.imgur.com/ZphBndY.png) * 藍色代表是4 by 4的matrix * header node僅有一藍四黃(左與上為同一組head node) * 紅線為column構成的linked list * 黑線為row構成的linked list * 數字的左上/右上代表pointer，指向對應的row及column * 如果要做transport，需調換row/column及改變pointer ![](https://i.imgur.com/mMsQtuS.png) ### Doubly linked list * 有llink及rlink 2個pointer * header node 的 rlink 指向linked list的first，llink指向linked list的last * insert(修改4個pointer)/delete(修改2個pointer) ## Tree * a collection of nodes * a distinguished node root * zero or more nonempty (sub)trees * level: height of tree(長得像大ray哥的老師在他家祖譜排level 32) * degree: number of pointing out * parent: 最原始的祖先 ### 表示法 #### linked list表示法 ![](https://i.imgur.com/OFBuD5K.png) * ( A ( B ( E ( K , L ) , F ) , C ( G ) , D ( H ( M ) , I , J ) ) ) * 如果n個node就件n筆資料(n個pointer)，很浪費memory(?) * 整理成2元樹，每個node只記錄left child及right child #### binary tree ![](https://i.imgur.com/ZmSQeZo.png) * child只有2個(right/left) * 允許zero node * 在binary tree裡，如果level是i，最多的node數為2^i-1^ * maximum number of pointer (B): B+1=n **(the properties of binary tree沒聲音)** * full binary tree: has maximum number of node * complete tree: 照次序排列的樹 * skill tree: child偏向依邊的歪斜樹 #### Array表示法 * 次序編號: index -> 丟入相對應的value * left child: even value = 2i right child: odd value = 2i+1 i=1: parent node #### 優缺點 array: 可快速查找，但skill tree容易遇見空間的浪費 linked list: 查找困難 ![](https://i.imgur.com/PSq9g57.png) ### binary tree traversals * 快速得知tree的結構，以便比較及複製 * **LVR(inorder), LRV(postorder), VLR(preorder)**, VRL, RVL, RLV * L: 往左查訪 * R: 往右查訪 * V: 印出資料 * recursive function or using a stack/queue #### inorder * recursive * inorder(currentNode -> left child); * cout << currentNode -> data; * inorder(currentNode -> right child); * stack * 往左走訪 * 有child -> 加入stack * 沒child -> pop out the last element in stack and visit the right child #### preorder * recursive * cout << currentNode -> data; * inorder(currentNode -> left child); * inorder(currentNode -> right child); #### postorder * recursive * inorder(currentNode -> left child); * inorder(currentNode -> right child); * cout << currentNode -> data; #### level order * 照index順序 ![](https://i.imgur.com/QyZ4eqK.png) * only inoder can check two trees equal ### Propositional calculus expression #### boolean tree(供三小) ![](https://i.imgur.com/IkzMWDU.png) ![](https://i.imgur.com/zfrZE9f.png) * use postorder -> evaluate expression value * the last element is the root ### Threaded Binary Trees（ㄅ重要） * 讓 tree 裡沒有被使用的node 被充分利用 -> 使用 inorder 讓空的 node 指向他上一個或下一個 node * 可快速查找 inorder 順序 ![](https://i.imgur.com/BepaEB9.png) * 實線：原本的 tree * 虛線：thread binary tree，每個 element 都指向他前後的 2 個 element #### 缺點 * 多增加了 2 個欄位空間 * 增加了 insert 和 delete 的難度 ### Huffman code * 目的在於用最少的 code text 進行編碼 * 根據 message 的使用頻率來決定表達方式 * 例如出現頻率若最高，就用最少的單位(minimum code size)表示，使解碼的速度加快 ![](https://i.imgur.com/1Df7ClW.png) * M1 = 000 M2 = 001 ...依此類推 #### huffman function * 建立方法 * bottom-up * 先選頻率最小的 2 個 message 並 merge 出現頻率 * 將 merge 後的頻率加入比較，再選出最小的 2 個 message 並 merge ![](https://i.imgur.com/UvVAOzN.png) * 程式 * 宣告一個 binary tree 的 priority queue * 選最小的 2 個 element(最前面 2 個) merge * delete 已被 merge 的資料 * insert merge 後的資料(priority queue) * 呼叫自己 * huffman code 並不唯一 ![](https://i.imgur.com/zqSkzEB.png) ![](https://i.imgur.com/P5H7caN.png) ### Priority Queue(heap) * 為 complete binary tree * 分 maximum 與 minimun，差在 return 值以及 root * maximum/minimum heap 只在乎 tree/subtree 的 root 是 maximum/minimum，其餘 leaves 的排列隨便 ![](https://i.imgur.com/8EXPN68.png) #### insert * bottom-up * insert element 到最一個位置 * 若 leaf 比 root 大則進行交換，最多交換 logn次 #### delete * top-down * 將最上面的 element output * 把最後一個 element 放到 root 的位置 * 若 leaf 比 root 大則進行交換，最多交換 logn次 #### heap sort * 建立 minimum/maximum heap tree * insert n 個 element * delete(pop out) n 個 element ### Binary Search Tree * delete max/min: O(logn) * delete arbitary: O(n) * search: O(n) --- * element of left-subtree < root < element of right-subtree #### search * 若所蒐數值比 root 小 -> 查找左邊，比 root 大 -> 查找右邊，直到找到 null 為止 * 找第 n 大的數字 -> sorting(maximum heap tree) #### size (???-36:30) #### Insert * 跟 search 差不多 #### delete * 刪除指定 element * 只有左/右子樹 -> 直接拉上去 * 同時有左右子樹 -> 拉左子樹裡最大的 element 或右子樹裡最小的 element #### AVL tree * 將 binary search tree 調整成左右 level 相差不大於 1 ### selection tree * 每個 leaf node 分別是一個 queue * 可輕鬆做排序 #### winner tree ??? - (52:00) * 大量數字排序，memory 無法開另一個空間存放時 #### loser tree ### Forest * a set of tree #### forest traversal ![](https://i.imgur.com/1vOHkuj.png) #### disjoint set union * 兩種做法的優缺點取決於 element 被查找的頻率 ![](https://i.imgur.com/M3XYg8K.png) #### weighting rule * root i 的 leaf 若比 root j 少，則 union 時把 j 當成 root * 猶豫的ppt https://www.csie.ntu.edu.tw/~hsinmu/courses/_media/dsa_17spring/disjoint_set.pdf ## Graph * vertice(V) & edge(E) * 一比劃問題 * 電路分析 * 最短路徑問題 * 專案規劃 * social science(fb) ### 一筆劃問題 * 七條橋(經典) * Euler's graph -> 每個點的 degree 僅能為偶數 ### 最短路徑問題 * 地圖 ### graph 種類 * complete gragh -> 每個點都和其他點有連接 * directed graph -> 有向的 ![](https://i.imgur.com/XKxIdtQ.png) * multigragh -> 當 edge 有其他記憶條件因素時 * DAG -> 有向無環圖 * connected component -> 沒有人是局外人 * strongly -> 每個點都有相互連接(左) or isolated component(右) ![](https://i.imgur.com/G68MFus.png) ### ADT * 使用矩陣表示點和點之間是否有相連 * 在 undirected graph 下為 symmetry ![](https://i.imgur.com/y2HKf6D.png) * 若不是 complete graph，使用 matrix 較浪費空間，可改用 linked list 表示 ![](https://i.imgur.com/hWYyCDX.png) * 用 one-dimension array(有點複雜) * 紅色(起始點) -> 與 component 有連接的資料的放置位置 * 缺點 -> 刪除資料麻煩(需要一直 shift) ![](https://i.imgur.com/EZ1mM1G.png) * adjacent list * 記載 component 的 out-degree * inverse adjacent list * 記載 component 的 in-degree * multilists * 以 edge 為出發點 * 記錄跟該 edge 有連接的 node 及其 value * N 的個數為 number of edge ![](https://i.imgur.com/Usubg3k.png) ### weight edge * 光復路的故事 ### traversal * DFS * depth preorder traversal * 向一個方向走到不能再走後，退回上一個component 繼續走訪 * 如果表示方法為 list，複雜度 = O(e)；若為 matrix，複雜度為O(n^2) * BFS * level by level * 先走訪一個component 所有連接的 component * 走過的 mark 為 1 (boolean array) * 皆不唯一 ### Spanning tree * sub-graph * minimum cost spanning tree -> 能用最少的 edge 連接所有 vertice * 不唯一 * 有 3 個 greedy 的演算法 #### kruskal * 以 edge 為主 * 從權重最小的 edge 開始選擇 * 不能有 cycle (how to detect loop) * 使用 priority queue #### print * 以 vertice 為主 * 決定一個起點，選擇與起點連接權重最小的 edge * 再往外擴張 #### sollin * 一次決定多個起點，選擇起點們連接的最小權重邊 * 結合 kruskal 及 print ### Biconnected components * articulation point ->在 graph 中若拿掉會使 graph 形成 2 個獨立的 connected components 的 vertice * 在 graph 中沒有 articulation point 則為 bioconnected component ![](https://i.imgur.com/D0v63Nv.png) * 若不被任一 subgraph 包含的話則為 maximum biconnected component * 可用 DFS 查找，查找完將 graph 中原有的 edge 補完(back edge) ![](https://i.imgur.com/wjknCQs.png) * low -> 以 root 為出發點，DFS 時可以拜訪到對最小數值 (48:00) 為何右邊的最小拜訪值不是 1 ？ 567 為何一樣？ #### find articulation point in programming * make the DFS spanning tree -> decide DFN(越靠近 root, DFN 值越小) * 補上 back edge * 於自己本身的 DFN, child node 的 DFN, 以及 back edge 連接的 node 的 DFN 中取最小值(?)(bottom up) * root 一定為 articulation point(若選擇非 articulation point 做 root?)，若 u 不為 root ，則與 u 連接的 child node 1 皆無法不通過 u 拜訪 chide node 2，則 u 為一個 articulation point ### Shortest Path * Dijkstra * 從起點開始選擇最小權重邊延伸，直到所有點都被拜訪過為止 * 選定起始點加入陣列，與起始點相連的點更新權重，其餘為無限大 * 將權重較小的點加進陣列裡(已走訪過)，並做為新的起始點，更新與之相連的點的權重 * 重複以上動作 * 全部點都被走訪完畢後，得知起點至所有點的最短路徑 * 要知道 path 則需另外紀錄 * 為 greedy 演算法 * 無法處理有負權重的情況 * Bellman-Ford * 當只允許走一條路徑時，列出到每個點的最小 cost * 開放走 2 條路徑，經由上一次的結果更新 cost * 不斷更新 cost 直到走訪 edge - 1 次 * 若第 edge 次仍可以更新權重，代表 graph 中有負迴圈 ### All-Pairs * a^k^[i][j]: 從 i 走到 j ，最少經過 k 個頂點 = a^k-1^[i][k] + a^k-1^[k][j] #### Transitive closure * 做完 all-pair 後，大於1的在 transitive closure 中皆為1，表示可連通， 0 表示無路徑可連通 ### Activity-on-Vertex (AOV) Network * 每個 vertex 代表一個 activity，有先後順序關係。若先決條件的 activity 沒發生，則該 activity 不會發生 * in-degree 為 0 的 output 出來(不拘順序)，output 後對有與 output 連接的 vertex 更新 in-degree * order 並不唯一 ![](https://i.imgur.com/K7VcKii.png) ![](https://i.imgur.com/HkHhDPe.png) ### Activity-on-Edge (AOE) Network * activity 在 edge 上，vertex 只代表 activity 的開始即結束 * edge 可同時執行 * vertex 紀錄最早完成時間 * 利用 compage order 依序走訪 ![](https://i.imgur.com/SRxX1KD.png) ![](https://i.imgur.com/LaouOIj.png) ### critical path * 在 graph 中不能有 delay 的空間 * 先計算每個 activity 的時間 * 在 in-degree 不為 1 的 vertex 前建立 vertex'，edge 為 ?，order 為前置作業的 maximum 值 * 從左往右推 -> 完成需花費最短時間 ![](https://i.imgur.com/aXljYUL.png) * 從右往左推(選 minimum) -> 最多可 delay 時間 ![](https://i.imgur.com/0MSRZs1.png) * critical path: 從右往左及從左往右推的時間相減為 0 的即為 critical path * critical path 是否一定存在？ ## Sorting * data base: records(key) * get / set * scan / match * return data * ex: 電話簿 * binary search * 須先做排序 * 可將 search 複雜度從 O(n) 降至 O(logn) * internal sort: 要排序的資料皆可放入 memory 中 * insert * n-1 筆 data 已排序，將第 n 筆 data 插入已排序 data 中適當位置 * recursive * 將 unsolve 的資料加入已排序資料最後項(A(j))，若A(j)<A(j-1)則做swap，j- - * selection * 從未排序資料中挑選最小的資料加入已排序資料 * 最小的資料與第一筆資料 swap 後，將第一筆資料加入已排序資料 * bubble * 兩兩做比較後如有需要則 swap * merge * 合併 2 個已經排序完成的資料 * recursive * divide and conquer * 先將資料切成 n 組最小的 set 後，每 2 組做排序合併，合併時需新開一個 array 存放合併後的結果 * complexity: O(nlogn) * quick * comparison base 中最快的 * recursive * divide and conquer * 在資料中隨機選一筆資料作為分段點，將資料分成比分段點小及分段點大 2 堆，各自再選擇分段點，重複動作 * sampling: 為避免隨機選擇的數字過於偏大或偏小而失去 quick sort 的優勢，可隨機選擇多筆資料，以中位數的資料作為分段點 * 不適合資料筆數少的排序 * complexity: O(nlogn) * heap * 建立一個 minimum heap tree，delete root 後重新做調整(O(logn))，共做 n 次 delete(O(n)) * 須建立額外的空間存放 delete 的值 * 若使用 maximum heap tree 則不需額外建立 array 存放 delete 的 element * 建立 heap: top-down or bottom-up(O(logn)) * bucket * 根據資料特性建立 bucket，並將資料放入對應的 bucket 中 * implement in queue * radix * approach from bucket sort * most significant bit(MSB): radix-exchange sort，根據最重要的位元進行排序 ![](https://i.imgur.com/RhgZ2X8.png) * least significant bit(LSB): straight-radix sort，根據最小的位元進行排序，recursive 至最大位元後，從最小的 bucket 依序從最底端 output 出來(queue) ![](https://i.imgur.com/4SvpMBe.png) * external: data 太大無法放進 memory(目前較少使用，未來趨勢) * merge * 假設 memory 只能放進 750 個 record，而資料有 4500 筆 * 將 data 做 partition，分段處理 ![](https://i.imgur.com/nFPd1zG.png) ### weighted external path lenght * element 的個數及被 merge 的次數乘積的加總 * 可用 huffman-code 解決 ### Proof Lower Bound of Sorting * 在 comparison sorting algorism 中，最小複雜度為O(nlogn) ![](https://i.imgur.com/gLmoahb.png) ## Hashing * hash key(bucket) * hash funciton * static v.s. dynamic * static - hash function 不會隨資料的比數或狀態改變 * dynamic - hash function 會隨資料的比數或狀態調整 * identifier density: n(所有資料)/T(可能的key) * loading density: hash table 被佔用的情況。n(所有資料)/sb(每個bucket所能容納資料數 * bucket) * collision: 2 筆資料都要進同一 bucket * overflow: 資料要進已滿的 bucket * cardinality: 資料筆數 * degree: 欄位數 ### symbol * directory base * 可作新增、搜尋、刪除 ### Hash Function * 目標 * 發生 collision 的情況越少越好 * 每個 bucket 的容納資料筆數平均 * 常見形式 * Mid-Square * 將資料轉成二進位 * 取中間數(不限定幾位) * 將取出數開平方 * Division * 取mode * Fold * 將資料轉成二進位 * 將轉換後的資料分為好幾個區間，分別去做運算(23:44?) * 對折 * solve overflow * open new bucket * linear: 找附近的空 bucket * quadratic: 找遠一點的空 bucket，若 bucket key 為 i，則一次跳 i 平方格查找 * rehash: 共有 m 個 hash function，再 hash 一次 * chaining * 使用 linked list 串接 * 搜索長度：所有可能走訪的 linked list/總資料筆數 ### Dynamic Hashing * 用資料分布的狀態來決定 hash * 將 key 轉換成二進位表示法 * 轉換成 binary search tree * sstime(travers tree's time，與樹的高度有關) * 樹的歪斜問題 * extendible hash(?) * 將 tree 建一個目錄(避免 travers 的時間) * directoryless hash(?) * 沒有 directory * overflow -> linked list? ![](https://i.imgur.com/h2qpiLv.png) ## search structure ### AVL Tree * binary tree 受 input 的影響，可能不 balance * high balance - 左右樹高度差(balance factory)只能為 1, 0, -1 * 在 insert 的過程中只要有 node 違反 BF 值則必須做調整 * 依據 insert 位置，可分為 LL, LR, RR, RL #### 調整(只能意會) * LL * RR * LR(31:00) * RL * 將 insert 後產生衝突的 C 往上拉 * 其餘4顆子樹照 binary search tree 順序排放 #### Complexity ![](https://i.imgur.com/oKwuXUF.png) * 建 AVL tree 的 cost? ### Optimal Binary Search Tree * 在最壞的狀況下有幾次的 comparison(與 tree 的高度有關) * 當每個 node 都有 priority 時，與搜索的頻率有關(與 path 長度有關) * internal path - 搜索到 node 的路徑 * external path - fail node, node 並不存在 binary search tree 中 * cost: internal path * frequency + external path * freqiency * 資料為 static set * if the data is dynamic set? #### Determine(?) * Tij - optimal binary search tree, 包含 a(i+1)-a(j) 個 words * Cij - Tij 的 cost * Cij = cost(L) + cost (R) + weight(L) + weight(R) * Rij - Tij 的 root * Wij - weight, 所有 priority 的加總(static) ##### 若 tree 的 subtree 為 optimal，則 tree 本身有高機率為 optimal * 假設 a(k) 為 Tij 的 root(Rij) 則左子樹包含 a(i+1)-a(k-1)，右子樹包含 a(k+1)-a(j) * Cij = Pk + cost(L) + cost(R) + weight(L) + weight(R)，cost(L) 與 cost (R) 再繼續 recursive ![](https://i.imgur.com/laAvWnA.png) ![](https://i.imgur.com/NVbanc0.png) * 橫軸代表編號，縱軸代表 word 個數 ### M-Way Search * 最多有 M 個 subtree(M 個 pointer) * if x=K -> search complete if x=/=K -> 尋找 x 的坐落區間 ### B-Tree * 是一個 m-way search tree * 若 m = n，則叫做 2-3-...-(n-1) tree * root 最少要 2 個 pointer * 除了 root 外，每個 node 最少要有 m/2 個 children * 所有的 failure node 都在同一個 level * 可以解決 binary tree 當 data 過多時樹高過高的問題 * 缺點：在每個 level 比較時耗時較長，故在比較時會使用 binary search #### insert * 尋找 word 是否已存在，若不存在則判斷適當插入位置 * 當 leaf node還有空間 -> 直接插入 overflow -> create node(split) 新 node 包含衝突的資料內的最大值 middle 值拉到 parent ![](https://i.imgur.com/7xH4w2t.png) ![](https://i.imgur.com/3iwz5EP.png) #### delete * 尋找數字是否存在 * 刪除後符合 B-tree 特性 -> 直接刪除 * 刪除後不符合：隔壁有 node 可以提 -> rotate (隔壁 node 提到 parent，parent 送一個 node 給原本的 pointer) 隔壁沒有 node 可以提，parent 也沒 node 可以送 -> combine ![](https://i.imgur.com/2QEwcFX.png) ### B+ tree * 可一次找一堆值 * 將 leaf node 用 linked list 串接 * balance * 可把想搜索的範圍鎖定在某 subtree 內 * search 到 leaf node 後透過 linked list 做 range query ![](https://i.imgur.com/Hm3hcwS.png) ### R tree(不在課程範圍內) * 可處理多維空間(ex:經緯度)

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.