Data Structure

--- title: Data Structure tags: programming, note description: hackmd practice also --- Data Structure === 筆者：林佑庭 > [time=Sun, June 21, 2020 2:24 AM] [TOC] ### Exception #### Try，Throw，Catch Declaration ```cpp= int i; try{ if(i<0){ throw "something wrong..";//throw丟出例外 } } catch(const char* message){//若是catch接收到throw丟出的東西，做例外處理 cout<<message<<endl; } ``` ## Sort and Search ### Selection sort ```cpp= void selection_sort(int *a, const int n//array length){ for(int i = 0; i < n; i++){ int j = i; for(int k = i+1; k<n; k++){ if(a[k]<a[j]) j = k;//find the smallest element in the array } swap(a[i], a[j]); } } ``` ### Binary Search ```cpp= int Binary_search(int *a, const int x, const int n){//n is the length of the array, x is the destination selection_sort(a, n);//sorting int left = 0, right = n-1; while(left<=right){ int middle = (left + right)/2; //floor function if(x<a[middle]) left = middle; else if(x>a[middle]) right = middle; else return middle; } return -1;//not found } ``` ## Recursion ### Binary Search in recursive form: ```cpp= int binary_sort_re(int *a, const int x, const int left, const int right){ if(left<=right){ int middle = (left + right)/2; if(x>a[middle]) return binary_sort_re(a, x, middle, right); else if(x<a[middle]) return binary_sort_re(a, x, left, middle); else return middle; } else return -1;//not found } ``` ### Permutation: Ex. (a, b, c) -> (b, c, a) ```cpp= void permutation(char *a, const int k, const int m){ if(k == m){ for(int i = 0; i<=m; i++) cout<<a[i]; cout<<endl; } else{ for(int i = k; i<=m; i++){ swap(a[k], a[i]); permutation(a, k+1, m); swap(a[k], a[i]); } } } ``` ## The Standard Template Library To use the STL elements in our code, we should include: ``` #include <algorithm> ``` ## Arrays ## Stack (Last in First Out List) push : add an element into a stack. pop : Get and delete an element from a stack. an example: frame of function call. | | previous frame pointer | | ---------------------- | ---------------------- | | | return address | | | local variables | | previous frame pointer | previous frame pointer | | return address | return address | frame pointer 指向frame 中最上層的位址 Adding elements into a stack ```cpp= template <class T> void Stack<T>::Push(const T& x){ if(top == capacity-1){ ChangeSize1D(stack, capacity 2*capacity); capacity = capacity * 3;//change capacity 倍增法 } stack[++top]; } ``` removing elements into a stack: ```cpp= template <class T> void Stack<T>::pop(){ if(IsEmpty()) throw "Stack is empty."; x = stack[top--].~T();//destructor for only one element } ``` ## Queue (First In First Out List) push : Add an element into a queue(rear) pop : Get and delete an element from a queue(front). 與stack不同的是queue需要兩個索引用pointer:front, rear for example: | D | <- rear | | --- | -------- | | C | | | B | | | A | <- front | | front | rear | Q0 Q1 Q2 Q3 | comments | | ----- | ---- | ----------- | --------- | | -1 | -1 | | empty | | -1 | 0 | J1 | | | -1 | 1 | J1 J2 | | | 0 | 1 | == J2 | remove J1 | | 1 | 1 | | empty | problem: when we removing elements from a queue, we usually change the front pointer = pointer + 1; Then we may encounter a problem, that is the capacity may seem like full, but the actually size is far smaller than the capacity. If front != -1; when full -> move the elements from front to -1; Solution: regarding an array as a circular queue. front: the address of the first elements -1. modulo capacity rear: the address of the last element. modulo capacity next_rear = rear+1 (mod capacity) In a circular queue if front = rear, then the queue is empty. If front size=capacity - 1, and we push another element into the queue, front will equal to rear, then we can not determine it is empty or full. So we will waste an element. In a circular queue if front + 1(mod capacity) == front, ehn the circular queue is full. ### A Mazing Problem ```cpp= struct offsets{ int a;//row int b;//column }; offsets move[8]; ``` relations of movements and dir. | Name | Dir | Move[dir].a | Move[dir].b | | ---- | --- | ----------- | ----------- | | N | 0 | -1 | 0 | | NE | 1 | -1 | 1 | | E | 2 | 0 | 1 | | SE | 3 | 1 | 1 | | S | 4 | 1 | 0 | | SW | 5 | 1 | -1 | | W | 6 | 0 | -1 | | NW | 7 | -1 | -1 | ==Using a Stack to keep pass history== The maximal size of the stack: m*p(the size of the maze size) ### precedence rule (ASMD and parentheses) #### infix each operator comes in-between the operands e.g. 2+3 *~正常的數學表達式~ #### Postfix each operator comes after the operands 2 3+ (2+3) -> 以加取代最右邊的括號 -> 2 3+ transform postfix into infix: | Token | 0 1 2 stacks elelments | Top | | ----- |:----------------------------------- |:--- | | 6 | 6 | 0 | | 2 | 6 2 | 1 | | / | 3 (pop two elements, operate 6/2) | 0 | | 3 | 3 3 | 1 | | - | 0 (pop two elements, operate 3-3) | 0 | | 4 | 0 4 | 1 | | 2 | 0 4 2 | 2 | | * | 0 8 (pop two elements, operate 4*2) | 1 | | + | 8 | 0 | *~理論上最後一位僅有一個元素，除非postfix的形式寫錯~ transform infix into postfix: postfix 和 infix 中，operands 的相對順序不變，僅有operator改變位置: Example 1: a + b * c, *>+ | Token | 0 1 2 stacks elelments | Top | output | | ----- |:---------------------- |:--- |:--------- | | a | | -1 | a | | + | + | 0 | a | | b | + | 0 | a b | | * | + * | 1 | a b c | | c | + * | 1 | a b c | | <eos> | | -1 | a b c * + | Example 2: a * b + c, *>+ | Token | 0 1 2 stacks elelments | Top | output | | ----- |:----------------------------------------- |:--- |:--------- | | a | | -1 | a | | + | * | 0 | a | | b | * | 0 | a b | | * | + (since *>+, 當+遇到*的時候將 * pop出來) | 1 | a b * | | c | + | 1 | a b * c | | <eos> | | -1 | a b * c + | Example 3: | Token | 0 1 2 stacks elelments | Top | output | | ----- |:---------------------------------------------------------------- |:--- |:------------------- | | a | | -1 | a | | *~1~ | *~1~ | 0 | a | | ( | *~1~ ( | 1 | a | | b | *~1~ ( | 1 | a b | | + | *~1~ ( + | 2 | a b | | c | *~1~ ( + | 2 | a b c | | ) | *~1~ (match parentheses, pop the elements in (), and () itself) | 0 | a b c + | | *~2~ | *~2~ (\*~1~ = \*~2~, then pop *~1~) | 0 | a b c + *~1~ | | d | *~2~ | 0 | a b * c *~1~ d | | <eos> | | -1 | a b * c *~1~ d *~2~ | ``` Rules: 1. Operators are taken out if the in-stack precedence is numerically less or equal to the imcoming operator's precedence. i.e. isp(y)<=isp(x) 2. ( has the lowest in-stack precedence and highest incoming precedence. ``` #### Prefix each operator comes before the operands +2 3 ==Postfix and Prefix: no parentheses, no precedence== ### Linked List #### Array: 優點: 記憶體為連續，方便的讀取缺點: 擦去以及搬運麻煩 #### Linked list: (內容: data, node* next/prev) 優點: insertion 以及 deletion 簡單(只需修改node的next/prev pointer) 缺點: 讀取elements麻煩(用for去跑), 無法知道前一個node的位置(singular linked list)->產生雙向linked list #### 以linked list 實作 circular queue: pointer: begin, last. n nodes. Free pool(an avaliable linked list): 減少使用new這個function, 將原本要delete掉的記憶體空間轉往free pool, 並且在需要新增node時優先將free pool的記憶體空間取出使用 ## Trees Tree: A finite set of one or more nodes s.t. - a distinguished node r (root) - zero or more nonempty (sub)trees T~1~, T~2~, T~3~, ..., T~k~ each of whose roots are connected by a directed edge from r. *~there's no cycle in trees(root exist)~ - The height/depth is the maximum level of any node in the tree. - The degree of a node is the number of subtrees of the node. - The node with degree 0 is the leaf of the tree. - Children that has the same parents are called siblings. - The ancestors a node n are the nodes on the path from the root to n. ### Representation of trees. #### List representation: - the root comes first, followed by a list of sub-trees. - T = (root(T~1~, T~2~, T~3~, ... , T~n~)) for example: ``` (A(B(E(K, L), F), C(G), D(H(M), I, J))) ``` ```graphviz graph graphname { A -- B; A -- C; A -- D; B -- E; B -- F; E -- K; E -- L; C -- G; D -- H; H -- M; D -- I; D -- J; } ``` Possible Node Structure for a tree of degree k If T is a k-ary tree(a tree of degree k) with n nodes, then n(k - 1) + 1 = nk - (n - 1) of the nk child fields are NULL, n>=1. #### Left Child-Right Sibling Representation - Each node has two links (or pointers) - Each node only has one left modt child and one closest siblings. #### Degree Two Tree Representation rotate the right-sibling in a left child-right sbling tree clockwise by 45 degrees. -> Binary tree. ### Binary Tree - A rooted tree that the most degree is 2. (empty is a tree? both or both not) - Skewd Binary tree : all node (except leaf) has 1 degree. - Complete binary tree: properties: 1. the maximum number of nodes at level n is 2^n-1^ 2. the maximum number of nodes in a binary tree of depth k is 2^k^ - 1 3. for any nonempty binary tree, T, if n~0~ is the number of leaf nodes and n~2~ the number of modes of degree 2, then n~0~ = n~2~+1 ``` <pf> n = n0 + n1 +n2 B = n - 1, B = 2n2 + n1, n0 + n1 + n2 = 2n2 + n1 + 1 n0 = n2 +1 ``` #### Full BT For depth k FULL BT it has 2^k^ -1 nodes. #### Complete BT A BT with n nodes and depth k is called complete iff. its nodes correspond to the nodes numbered from 1 to n in the full BT of depth k.(由右到左依序刪去一些level k的nodes即為complete BT) ###Binaty Tree 實作 example 1: a complete BT -> ==Sequential Representation== (Heap 可以使用) ```graphviz graph graphname { A -- B; A -- C; B -- D; B -- E; C -- F; C -- G; D -- H; D -- I; } ``` The index*2 is the node of left branch, index*2 + 1 is the node of right branch. Similarly, index/2 is the index of parent of the node. | index | node | | ----- | ---- | | 1 | A | | 2 | B | | 3 | C | | 4 | D | | 5 | E | | 6 | F | | 7 | G | | 8 | H | | 9 | I | *~use~ ~this~ ~representation~ ~in~ ~skewd~ ~BT~ ~would~ ~waste~ ~space~ ~much~ Node representation ```graphviz graph graphname{ data -- leftchild; data -- rightchild; } ``` ### Binary Tree Traversals - L, V, R stand for moving left, visiting the node, and moving right. - There are six possible combinations of traversals: LVR, LRV, VLR, VRL, RVL, RLV - adopt conventional, we'll have 3 traversals left: - LVR, LRV, VLR - inorder, postorder, preorder #### Arithmetic Expression Using BT 以queue實作!!!!!! ==inorder traversal== A / B * C * D + E ==preorder traversal== \+ \* \* / A B C D E (the prefix type) ==postorder traversal== A B / C * D * E + (the postfix type) ==level order traversal== \+ \* E D / C A B ```graphviz graph graphname { ADD -- M1; ADD -- E; M1 -- M2; M1 -- D; M2 -- DIV; M2 -- C; DIV -- A; DIV -- B; } ``` #### Propositional Calculus Expression performed formula evaluation: 1. traverse its tree in postorder. 2. Each node has four fields: leftChild, first_data, second_data, rightChild first: the information that stored in the node. second: the value of the subtree rooted by the node. Priority Queues: | Representation | Insertion | Deletion | |:--------------------- | ---------- |:---------- | | Unordered linked list | O(1) | O(n) | | Unordered array | O(1) | O(n) | | Sorted linked list | O(n) | O(1) | | Sorted array | O(n) | O(1) | | Heap | O(log~2~n) | O(log~2~n) | ### Max(Min) Heap - Heaps are frequently used to implement priority queues. The complexity is O(log n) - A max tree is a tree in which the key value in each node is no smaller than the key values in its children. ==A max heap is a max tree and also a complete binary tree.== #### Insert an element into a max heap ```graphviz graph graphname { 20 -- 15; 15 -- 14; 15 -- 10; 20 -- 2; 2 -- 21; } ``` insert 21 s.t. this tree is still a complete BT, but 21 >2 ```graphviz graph graphname { 20 -- 15; 15 -- 14; 15 -- 10; 20 -- 21; 21 -- 2; } ``` swap 21 and 2, but 21>20 ```graphviz graph graphname { 21 -- 15; 15 -- 14; 15 -- 10; 21 -- 20; 20 -- 2; } ``` swap 20 and 21, then this tree becomes a max tree, finish insert. #### Delete an element from a max heap ```graphviz graph graphname { 10 -- 15; 15 -- 14; 10 -- 2; } ``` swap 10 and 20, then delete 20 s.t. this tree is still a complete BT. ```graphviz graph graphname { 15 -- 10; 10 -- 14; 15 -- 2; } ``` 15 is the biggest in 15, 10, 2, so swap 10 and 15. ```graphviz graph graphname { 15 -- 14; 14 -- 10; 15 -- 2; } ``` 14 is bigger than 10, swap them to make sure this tree a max tree. Finish deletion. ### Huffman Tree 1. Scan the text to calculate the number of appearences of each character 2. Build the Huffman tree example: {(A:2), (B:3), (C:5), (D:7), (E:9), (F:13)} ==keep merge the least two root, form a new tree which has root = r1 + r2== {(A, B:5), (C:5), (D:7), (E:9), (F:13)} {(A, B, C:10), (D:7), (E:9), (F:13)} {(A, B, C, 10), (D, E:16), (F:13)} {(A, B, C, F:23), (D, E:16)} {(A, B, C, D, E, F:39)} origin length: 39 * 8 = 312 bits compressed length: 2\*4 + 3\*4 + 5\*3 + 7\*2 + 9\*2 + 13\*2 = 93 bits size of header 越高level的leaf node被使用到的次數越多，在編碼時候使用到的編碼長度越短，壓縮效率越高 ![](https://i.imgur.com/MQN1rjM.png) ![](https://i.imgur.com/2Gl1tZJ.png) (reference: http://puremonkey2010.blogspot.com/2011/02/huffman-code.html) ### Binary Search Tree - Every elements has a unique key. - The keys in a nonempty left subtree(right subtree) are smaller(larger) than the key in the root of subtree. - The left and right subtrees are also binary search trees. wrong example: not a binary search tree ```graphviz graph graphname { 20 -- 15; 20 -- 25; 15 -- 14; 15 -- 10; 25 -- 22; } ``` Good example: ```graphviz graph graphname { 30 -- 5; 30 -- 40; 5 -- 2; 60 -- 70; 70 -- 65; 70 -- 80; } ``` 1. If the root is null, then this is an empty tree -> no search needed. 2. If the root is not null, compare the x with the key of the root. - If x is equal to the root, then it's done. - If x is less than the key of the root, then no elements in the right subtree have key value x. We only need to search the left tree. - If x is larger than the key of the root, we only need to search the right subtree. #### Search Binary Search Tree by Rank - We need a additional field *leftSize*. - *leftSize* is the number of the elements in the left subtree of a node +1. - A height h tree can be searched by key as well as by rank in O(h) time. (h = log~2~ n, where n is the number of nodes) #### Insert a Node into a Binary Search Tree For example: we want to insert 80 into this tree ```graphviz graph graphname { 30 -- 5; 30 -- 40; 5 -- 2; } ``` 1. we operate binary search, x = 80 Then we found that the search terminate at the elements 40 since the right subtree of 40 is null. ```graphviz graph graphname { 30 -- 5; 30 -- 40; 5 -- 2; 40 -- 80; } ``` 2. Then we insert 80 into the left subtree of 40. #### Delete a Node into a Binary Search Tree example1: we want to delete the element 40 of this tree ```graphviz graph graphname { 30 -- 5; 30 -- 40; 5 -- 2; 40 -- 80; } ``` We just connect 80 as the right subtree of 30 ```graphviz graph graphname { 30 -- 5; 5 -- 2; 30 -- 80; } ``` another example: we want to remove the element B ```graphviz graph graphname { A -- T1; A -- B; B -- T2; B -- T3; } ``` Then we have to find the biggest element in T2 and the smallest element in T3. If remove one of the element can reduce the height of the tree, then we tend to use that element to replace B. ### AVL Trees 1. An empty tree is a height balanced tree 2. If T is a nonempty binary tree with T~L~ and T~R~ as its left and right subtrees respectively, then T is height-balanced iff. - T~L~ and T~R~ are height-balanced - BF(T) = |H~L~ - H~R~| smaller than 1 for example: ```graphviz graph graphname { MAR -- MAY; MAY -- NOV; } ``` where the BF(MAR) is -2 error detected. So we use RR operation: ```graphviz graph graphname { MAY -- MAR; MAY -- NOV; } ``` Insert AUG and MAY: ```graphviz graph graphname { MAY -- MAR; MAY -- NOV; MAR -- AUG; AUG -- APR; } ``` Both BF(MAY) and BF(MAR) >1, we deal with the smaller BF one MAR to APR: LL operation ```graphviz graph graphname { MAY -- AUG; MAY -- NOV; AUG -- APR; AUG -- MAR; } ``` Then we insert January: ```graphviz graph graphname { MAY -- AUG; MAY -- NOV; AUG -- APR; AUG -- MAR; MAR -- JAN; } ``` BF(MAY)>1 MAY to JAN : LR operation we replace MAY with MAR ```graphviz graph graphname { MAR -- AUG; MAR -- MAY; AUG -- APR; AUG -- JAN; MAY -- NOV; } ``` Then we insert DEC and JULY and FEB: ```graphviz graph graphname { MAR -- AUG; MAR -- MAY; AUG -- APR; AUG -- JAN; MAY -- NOV; JAN -- DEC; JAN -- JULY; DEC -- FEB; } ``` where AUG is the nearest node that has bigger than 2 BF value. AUG to DEC, we use RL operation. ```graphviz graph graphname { MAR -- DEC; MAR -- MAY; DEC -- AUG; DEC -- JAN; MAY -- NOV; JAN -- FEB; JAN -- JULY; AUG -- APR; } ``` then we insert JUNE, we use LR operation.... repeatly. Finally we'll get this: ```graphviz graph graphname { JAN -- DEC; JAN -- MAR; DEC -- AUG; DEC -- FEB; AUG -- APR; MAR -- JULY; MAR -- NOV; JULY -- JUNE; NOV -- MAY; NOV -- OCT; OCT -- SEP; } ``` #### Rebalancing Rotation of Binary Search Tree 1. LL: new node Y is inserted in the subtree of the left subtree of A 2. LR: Y is inserted in the right subtree of A.. etc, totally we'll have four rotation types. *僅需要走兩步來確定* - LL Rotation insert a new node at BL s.t. BF(a)>1 ```graphviz graph graphname { A -- B; A -- AR; B -- BL; B -- BR; } ``` Then we use LL operation to get this: ```graphviz graph graphname { B -- BL; B -- A; A -- AR; A -- BR; } ``` RR rotate Similarly as LL. - LR Rotation We insert a new node at CL s.t. BF(A)>1 ```graphviz graph graphname { A -- B; A -- AR; B -- BL; B -- C; C -- CL; C -- CR; } ``` Then we need to use LR operation: we replace A with C, A connected to the right side of C, CL replaced C, CR connected to the left side of A. ```graphviz graph graphname { C -- B; C -- A; B -- BL; B -- CL; A -- CR; A -- AR; } ``` LR rotate Similarly as RL. ``` The worst case of binary search tree is O(n)(like linear linked list) The worst case of AVL tree is O(log n)(like linear linked list) ``` ### Comparison of Various Structures *~all of them are sorted~* | Operation | Sequential List | Linked List | AVL Tree | | ------------------- |:--------------- | ----------- | -------- | | Search for x | O(log n) | O(n) | O(log n) | | Search the kth item | O(1) | O(k) | O(log k) | | Delete x | O(n) | O(1) | O(log n) | | Delete the kth item | O(n - k) | O(k) | O(log n) | | insert x | O(n) | O(1)^2^ | O(log n) | | Output in order | O(n) | O(n) | O(n) | ## Graph ### Euler's Graph - Degree of a vertex: the nmber of edges incident to it If all the degree of the vertices are even, then no matter which vertex you start, we can go through every edge exactly once. ### Definition of Graph A graph G = (V, E) V is a finite, nonempty set of vertices, also denotes V(G) E is set of pairs of vertices called edges, also denotes E(G). Adjacent: if (v, w) ∈ E, then w is adjacent to v, so as v to w. graph can be undirected or directed In undirected graph, (u, v) and (v, u) represent the same edge. In undirected graph, <u, v> and <v, u> represent u->v, v->u, different edges. Restriction: 1. no edge from a vertex to it self (self edge or self loop, except mention especially) 2. no multiple occurences of the same edge (but a multigraph can have) Definition: - Path: a sequence of vertices w~1~, w~2~, ..., w~n~ where(w~i~, w~i+1~) ∈ E - Length of a path: number of edges on the path. - Simple path: a path where all vertices are distinct except the first and last. - Cycle in a path: a path s.t. w~1~ = w~n~ - Acyclic graph(DAG): a directed graph with no cycle. Graph: - Connected: There exists path from every vertex to every vertex in undirected graph. - Strong Connected: There exists path from every vertex to every vertex in undirected graph. - Complete Graph: a graph in which there is an edge between every pair of vertices. undirected graph: n(n-1)/2 edges directed graph: n(n-1) edges #### Subgraph If G' is a graph is E(G') ∈ E(G) and V(G') ∈ V(G), then G' is a subgraph of G Connected Component:H is a maximal connected subgraph. (maximal: no other subgraph is both connected and contains H) Strongly Connected Component: H is a maximal strongly connected subgraph. - in-degree: the number of edges use the vertex as head - out-degree: the number of edges use the vertex as tail #### Matrix Representation space: O(V^2^), good for dense Undirected graph: symmetric matrix Time complexity: O(n^2^) (number of edges) #### Adjacency List Representation space O(V+E), good for sparse Time complexity: O(e) (number of edges) ### Graph Operation 1. Depth-first Search (DFS) starting from vertex v , process v& then recursively traverse all vertices adjacent to v. marked visited vertices. example: DFS from 0: 0 1(stack :2) 3(stack :2 4) 7(stack :2 4) 4(stack :2 5 6) 5(stack :2 6 ) 2(stack :6) 6 ```graphviz graph graphname { 0 -- 1; 0 -- 2; 1 -- 3; 1 -- 4; 2 -- 5; 2 -- 6; 3 -- 7; 4 -- 7; 5 -- 7; 6 -- 7; } ``` 2. Breadth-first Search (BFS) Using Queue ### Minimal Cost of Spanning Tree Spanning tree: Any tree consisting solely of edges in G and including all vertices in G called a spanning tree. A spanning tree is a minimal(the fewest number of edges) connected subgraph Any connected graph with n vertices would have at least n-1 edges. Any connected graph have n-1 edges is tree Spanning tree has n-1 edges. Cost : the sum of the costs(weights) of the edges in the spanning tree. #### Fruskal's Algorithm Select the edges for includion in T in non-decreasing of their cost. An edge is added to T if it does not form a cycle with the edges that are already in T. Time complexity: O(e loge) #### Prim's Algorithm The set of select edges forms a tree at all times. 1. Choose any vertex as the node of the tree 2. A least-cost edge(u, v) is added to T that form a new tree, repeat until there are n-1 edges. Time complexity: O(n^2^) #### Sollin's Algorithm ==select multiple edges at each stage== At the beginning, there are n spanning tree(each vertex construct one). ### Find the shortest path Single source/All destination #### All-pairs Shortest Paths #### Floyd- A^-1^[i][j]: the length from i to j A^K^[i][j] = min{A^k-1^[i][j], A^k-1^[i][k] + A^k-1^[k][j]} Time Complexity: O(n^3^)