Coding Note－Data Structure

# Coding Note－Data Structure 此篇為針對資料結構之筆記整理，內容參考至Horowitx.E(2003), Fundamentals of data structure in C, 13th printing、AP325書籍及大學開放式課程。除統整學習內容及概念外，也整理常見問題的討論及剖析，希望透過此筆記綜整所學並精進程式能力！ Author: Vincent Chuang 莊閔翔 ## Arrays and Structures ### Strings 定義：String是一個字符的陣列，以表示文字資料。字串可包含字母、數字、符號和空格等字符，通常用來處理和操作文字資料。字串S可寫為$S=s_0, s_1, s_2..., s_{n-1}$，$s_i$即為其中的字母，組成一有限大小的字串。接下來的討論中，以C++的STL提供的std::string類型處理字串，包含了各種方法操作字串，如拼接、查找、替換和分割，如下列示範： ``` #include <iostream> #include <string> using namespace std; int main() { string str1 = "Hello"; string str2("World"); // Combining strings string combined = str1 + ", " + str2; // length(size) of string int length = str1.length(); //print out each character for(int i=0; i<length; i++){ cout<<str[i]<<" "; } //append and erase str1.append(" C++"); //append at the end string str4 = "Hello, World!"; str4.erase(5, 7); // Erase ", World" // Insert string inserted = str2; //World inserted.insert(3, "ABC"); //WorABCld, "ld" will be pushed to the back //Compare if (str1.compare(str2) == 0) cout << "Equal" << endl; else cout << "Different strings" << endl; //Substring string s = "abcdefghijk"; s.substr(2,5); //指的是從索引2開始的5個字元的子字串，即cdefg s.find('c'); //(注意'')回傳2 s.find("fgh") //回傳5 s.find('c',5) //從索引5開始往後找，回傳INTMAX(找不到) //Split String return 0; } ``` #### Pattern Matching 方法:利用基本Bruteforce方法在給定的字串中，搜尋符合條件的特定字串 Code: ``` #include <iostream> #include <string> using namespace std; int find(string &str, string &pat) { int ls = str.length() - 1; int lp = pat.length() - 1; int endmatch = lp; for (int start = 0; endmatch <= ls; endmatch++, start++) { if (str[endmatch] == pat[lp]) { int j = 0, i = start; while (j < lp && str[i] == pat[j]) { i++; j++; } if (j == lp) return start; } } return -1; } int main() { string str = "ababbaabaa"; string pat = "aab"; int ans = find(str, pat); if (ans != -1) cout << "Found at index: " << ans << endl; else cout << "Not found." << endl; return 0; } ``` 以上述例子而言，"aab"此字串即可在str的第五個位置開始找到。 #### Knuth-Morris-Pratt algorithm(KMP) KMP為在一個字串中搜尋特定目標字串的演算法，最大的優化即為Fail function－透過 LPS（Longest Prefix Suffix）陣列避免不必要的反覆運算，大幅提升字串搜尋效率。以下概述其原理及應用方法 1. Longest Prefix Suffix(LPS) 定義：最長可匹配前綴與後綴的長度以目標序列CABCAA為例，可得下列表格: |位置| 0 | 1 | 2 | 3 | 4 | 5 | |--|--|--|--|--|--|--| |字元|C| A| B| C| A| A | |LPS| 0| 0 | 0 | 1 | 2 |0 | 以讀取至位置3為例，有前綴序列'C'及後綴可匹配的序列'C'，且為最大長度，因此LPS值為1。同理，當讀取至位置4時，有最長的匹配序列'CA'，因此LPS值為2。 LPS Code範例： ``` #include <bits/stdc++.h> using namespace std; // 計算 LPS (Longest Prefix Suffix) 陣列 vector<int> computeLPS(const string &pattern) { int m = pattern.length(); vector<int> lps(m, 0); // LPS 陣列初始化為 0 int len = 0; // 當前最長匹配前綴長度 int i = 1; while (i < m) { if (pattern[i] == pattern[len]) { len++; lps[i] = len; i++; } else { if (len != 0) { len = lps[len - 1]; // 回溯到之前的匹配位置 } else { lps[i] = 0; i++; } } } return lps; } ``` 2.搜尋符合字串 ``` // KMP 搜尋演算法 vector<int> KMPSearch(const string &text, const string &pattern) { int n = text.length(); int m = pattern.length(); vector<int> lps = computeLPS(pattern); vector<int> result; // 存儲所有匹配的起始索引 int i = 0, j = 0; // i 遍歷 text, j 遍歷 pattern while (i < n) { if (text[i] == pattern[j]) { i++, j++; } if (j == m) { // 完整匹配 pattern result.push_back(i - j); j = lps[j - 1]; // 繼續查找下一個匹配 } else if (i < n && text[i] != pattern[j]) { if (j != 0) { j = lps[j - 1]; // 部分匹配，跳過已匹配部分 } else { i++; // 完全沒有匹配，移動 i } } } return result; } int main() { string text = "ABABDABACDABABCABAB"; string pattern = "ABABCABAB"; vector<int> matches = KMPSearch(text, pattern); if (matches.empty()) { cout << "Pattern not found" << endl; } else { cout << "Pattern found at positions: "; for (int pos : matches) cout << pos << " "; cout << endl; } return 0; } ``` 上述兩函數確保 KMP 演算法的時間複雜度為 O(n + m)，比傳統的暴力法BruteForce O(nm) 更高效，適合用於較複雜題目，以提升效率。 ### Matrix - Inverse Matrix #### LU分解參考線性代數書籍後，以LU分解的方法來決定所求矩陣的反矩陣。如此一來，除了不用將整個計算分為決定行列式值及高斯消去法兩個部分，能大幅降低時間複雜度。以下是關於我採用的LU分解方法討論以及時間複雜度分析。對於一個矩陣A而言，其LU分解是將它分解成一個下三角矩陣 L 與上三角矩陣 U 的乘積。利用$det⁡(AB)=det⁡(A)×det⁡(B)$的性質，即可利用矩陣L及U的行列式以確定A是否為可逆矩陣，再進到反矩陣的運算，時間複雜度即可降至$O(n^3)$。 Pseudo code ``` PROCEDURE LU_Inverse(A, n) L = I_n //identity matrix U = O_n //zero matrix FOR k FROM 0 TO n-1 DO FOR i FROM k TO n-1 DO sum = 0 FOR j FROM 0 TO k-1 DO sum = sum + L[i][j] * U[j][k] END FOR L[i][k] = A[i][k] - sum END FOR FOR j FROM k TO n-1 DO sum = 0 FOR i FROM 0 TO k-1 DO sum = sum + L[k][i] * U[i][j] END FOR IF L[k][k] == 0 THEN RETURN "singular irreversable" U[k][j] = (A[k][j] - sum) / L[k][k] END FOR END FOR Y = O_n //Zero matrix in the order of n X = O_n // LY = I FOR i FROM 0 TO n-1 DO Y[i][i] = 1 FOR j FROM 0 TO i-1 DO sum = 0 FOR k FROM 0 TO j-1 DO sum = sum + L[i][k] * Y[k][j] END FOR Y[i][j] = -sum / L[i][i] END FOR END FOR // UX = Y FOR i FROM n-1 TO 0 DO FOR j FROM 0 TO n-1 DO sum = 0 FOR k FROM i+1 TO n-1 DO sum = sum + U[i][k] * X[k][j] END FOR X[i][j] = (Y[i][j] - sum) / U[i][i] END FOR END FOR RETURN X END PROCEDURE ``` ## Stacks AND Queues 此段為針對兩種在圖論以及運算中常使用的資料結構，stack還有queue的基礎運作邏輯以及其功能簡介，最後也將提及兩者特性，並會在下篇關於BFS以及DFS中，更深入地討論兩者的應用。 ### Stacks 定義：在一個有序的陣列中，插入和刪除操作都是在陣列頂部(top)進行的，即為後進先出（LIFO，Last-in-First-Out）的邏輯，如下圖。 ![image](https://hackmd.io/_uploads/B17kwJkwC.png) Ref: Prohramiz, stack data structure 利用C++ STL創建一個stack資料結構，即可看出其LIFO的特性，如下： ``` #include <iostream> #include <stack> using namespace std; int main() { stack<int> stack; stack.push(1); //Adding elements into stack(top) stack.push(2); stack.push(3); stack.pop(); //Removing elements from the stack(top) stack.pop(); while (!stack.empty()) { cout << stack.top() <<" ";　//checking the stack is empty or not stack.pop(); } } ``` 運行後的結果即為１，因為其上方的元素皆被移除掉，更多的功能包括複製stack、確認是否已滿或是確認其大小等。 ### Queues 定義：一個有序列表，其中所有插入的操作都在一端(back)進行，而所有刪除操作都在另一端(front)進行，即為FIFO(First-in-First-Out)的邏輯。 ![image](https://hackmd.io/_uploads/ryhNhJyDR.png) Ref: Bestprog, C++ queue general concepts. Way to implement the queue 同理，利用c++ stl創建一個queue的資料結構來演示其FIFO的特性 ``` #include <iostream> using namespace std; int main() { queue<int>Queue; Queue.push(1); //Enqueue, inseting elements Queue.push(2); Queue.push(3); Queue.pop(); //Dequeue, removing elements while(!Queue.empty()) { cout << Queue.front() << "\n"; Queue.pop(); } return 0; } ``` Ouput即先後為2以及3，除已經移除掉的第一個元素外，其餘依照其插入進queue的順序列印出。 #### Circular queue 簡介: 與Queue同樣遵守FIFO規則，不過在邏輯上將尾端與前端連接，形成環狀結構。這樣可以有效地重複利用已釋放的空間，避免 Queue 在陣列中因空間無法移動的浪費。 code: ``` #include <iostream> using namespace std; class CircularQueue { int *arr; int front, rear, size, capacity; CircularQueue(int k) { capacity = k; arr = new int[k]; front = -1; rear = -1; size = 0; } bool enqueue(int value) { if ((rear + 1) % capacity == front) { cout << "Queue is Full\n"; return false; } if (front == -1) front = 0; rear = (rear + 1) % capacity; arr[rear] = value; size++; return true; } bool dequeue() { if (front == -1) { cout << "Queue is Empty\n"; return false; } if (front == rear) { front = rear = -1; // Queue becomes empty } else { front = (front + 1) % capacity; } size--; return true; } void display() { if (front == -1) { cout << "Queue is Empty\n"; return; } int i = front; while (true) { cout << arr[i] << " "; if (i == rear) break; i = (i + 1) % capacity; } cout << "\n"; } }; int main() { CircularQueue cq(5); //capacity set as 5 cq.enqueue(10); cq.enqueue(20); cq.enqueue(30); cq.enqueue(40); cq.display(); cq.dequeue(); cq.dequeue(); cq.display(); //comply FIFO rule cq.enqueue(50); cq.enqueue(60); cq.display(); // 30 40 50 60 cq.enqueue(70); //display full since the size is 5 } ``` ## Graph Theory ### DFS(Depth First Search) 簡介: 顧名思義DFS演算法從起點之鄰居開始搜索，搜索到沒有鄰居為止才結束，因此DFS較常以遞迴方式呈現其巡訪過程。演算法架構(參考至ap325 P.230) ``` 初始化所有node vi, 其中visit[v]={false}; DFS(s){ visit[s]=true; for each v in adj[s] do if(!visit[v]) then v.p = s; DFS(v); endif endfor return; } ``` 寶藏問題:(Consider邊之權重) ``` #include <iostream> #include <vector> #include <algorithm> using namespace std; const int N = 1000; // Adjust the size as needed vector<int> adj[N]; int w[N]; bool visit[N] = {false}; vector<int> parent[N]; // Declare parent vector for each node int dfs(int s) { int weight = w[s]; visit[s] = true; // Set s as visited for (int u : adj[s]) { if (!visit[u]) { parent[u].push_back(s); // Record the parent of u weight += dfs(u); } } return weight; // Visit all of the nodes } int main() { int n, m; cin >> n >> m; int total = 0; for (int i = 0; i < n; i++) { cin >> w[i]; total += w[i]; } for (int i = 0; i < m; i++) { int u, v; cin >> u >> v; adj[v].push_back(u); // v's adjacent = u adj[u].push_back(v); } int sum = 0; for (int v = 0; v < n; v++) { if (!visit[v]) sum = max(sum, dfs(v)); } cout << sum; return 0; } ``` ### BFS(Breadth First Search) 前情提要: 握手原理有n個人握手，每人均握手x次，握手總次數S為nx/2。圖論描述：在一無向圖中G(V,E),deg(V)的總和為2E 演算法架構(參考ap325 p.227) ``` 初始化for all v, visit[v] = false; v.d為v到初始點s的距離、v.p為v在BFS tree的parent; BFS(G,s){ queue<int> Q; Q.push(s), s.d = 0, visist[s] = true, s.p = -1; while(!Q.empty()) v = Q.front(); Q.pop() //移出(FIFO) for each u in adj(v) if(!visit[u]) visit[u] = true; u.d = v.d +1; u.p = v; Q.push(u); endif endfor endwhile } ``` BFS example: ``` #include <bits/stdc++.h> using namespace std; #define N 10010 vector<vector<int>> adj(N); // Using adjacency list vector<bool> visit(N, false); vector<int> d(N, -1); // Initialize distances with -1 void bfs(int s) { int cnt = 0, total_d = 0; queue<int> Q; Q.push(s); visit[s] = true; d[s] = 0; while (!Q.empty()) { int v = Q.front(); Q.pop(); for (int i : adj[v]) { if (!visit[i]) { d[i] = d[v] + 1; visit[i] = true; Q.push(i); cnt++; total_d += d[i]; // Summing the distances } } } cout << cnt << " " << total_d << "\n"; } int main() { int n, m, s; cin >> n >> m >> s; for (int i = 0; i < m; i++) { int a, b; cin >> a >> b; adj[a].push_back(b); } bfs(s); return 0; } ``` ### 拓樸排序(Topological Sorting) 簡介:適用於 DAG（有向無環圖），幫圖上的節點做線性排序的演算法。排序的條件為對於一DAG圖中，可將所有節點排成一序列，使得所有有向邊滿足起點排在終點前。觀念 1. DAG圖中的入度對於一節點v而言，入度的定義為通往v邊數的總和。 2. 走訪順序關係 Kahn's Algorithm方法－依序刪除節點以入度為0的點開始排序，並將走訪過的點刪除。因為拓樸排序試用於DAG，無環於圖中，因此刪除節點後，必有入度為0的點，再以此類推排序即可滿足條件。 Code: ``` vector<int> in_degree(1005, 0); vector<int> topo_order; void topological_sort(int n) { queue<int> q; for (int i = 0; i < n; i++) { if (in_degree[i] == 0) q.push(i); //考慮入度為0的點 } while (!q.empty()) { int node = q.front(); q.pop(); //將走訪過的節點刪除 topo_order.push_back(node); for (int neighbor : adj[node]) { if (--in_degree[neighbor] == 0) q.push(neighbor); } } } ``` ### 強連通分量（SCC, Strongly Connected Components）定義：在一個強連通分量(SCC)中，圖中任兩點均有大於一種路徑可抵達，即不論從任意節點開始，都可以走完全部的點。 Ref:[ https://hackmd.io/@erichung0906/HkjZBH_IK](https://) #### Kosaraju Algorithm 運算邏輯及簡介: 以dfn, up 及stk來判斷是否符合SCC條件。dfn定義為dfs遍歷所走過的順序、low代表節點 u 或其子樹可回溯到的最小 dfn 值（代表是否能往上繞）、Stk來儲存要加入SCC的節點並依序push in。當 low[u] == dfn[u] 時，表示從節點 u 開始，無法再往上繞回更前面的節點，這就是一個 SCC 的 root，即可開始滿足環繞所有 SCC 的路徑起點，可以從 stack 中彈出所有屬於這個 SCC 的節點。 Code: ``` #include <bits/stdc++.h> using namespace std; const int N = 1e4 + 5; vector<int> graph[N]; int dfn[N], low[N], timestamp = 0; bool in_stack[N]; stack<int> stk; int scc_count = 0; void tarjan(int u) { dfn[u] = low[u] = ++timestamp; stk.push(u); in_stack[u] = true; for (int v : graph[u]) { if (!dfn[v]) { tarjan(v); low[u] = min(low[u], low[v]); } else if (in_stack[v]) { low[u] = min(low[u], dfn[v]); } } if (dfn[u] == low[u]) { cout << "SCC " << ++scc_count << ": "; while (true) { int x = stk.top(); stk.pop(); in_stack[x] = false; cout << x << " "; if (x == u) break; } cout << "\n"; } } int main() { int n, m; cin >> n >> m; // 節點數與邊數 while (m--) { int u, v; cin >> u >> v; graph[u].push_back(v); } for (int i = 0; i < n; ++i) if (!dfn[i]) tarjan(i); return 0; } ``` #### Tarjan Algorithm 運算邏輯及簡介: 運用兩次DFS，分別進行順向與反向的遍歷。因為一個SCC不管是正向圖或是反向圖都不會影響其性質，所以正反dfs都可達到的點群即為SCC。在第一個dfs時，要找到最上游(圖左邊)的節點編號，依序往下游(圖右邊)排列，Stack的Top也就是Root，這樣才能在第二個dfs時，從下游(正向圖上游)開始建立SCC，讓在之後建立SCC時能把更下游的節點(已被建立為SCC的節點)篩掉。 Code: ``` #include <bits/stdc++.h> using namespace std; const int N = 1e4 + 5; vector<int> graph[N], rev_graph[N]; bool visited[N]; stack<int> stk; // record the order of visited nodes void dfs1(int u) { visited[u] = true; for (int v : graph[u]) { if (!visited[v]) dfs1(v); } stk.push(u); } void dfs2(int u) { visited[u] = true; cout << u << " "; for (int v : rev_graph[u]) { if (!visited[v]) dfs2(v); } } int main() { int n, m; cin >> n >> m; // nodes and edges for (int i = 0; i < m; ++i) { int u, v; cin >> u >> v; graph[u].push_back(v); // original rev_graph[v].push_back(u); // reverse graph } // Step 1: 正向 DFS 排序 memset(visited, false, sizeof(visited)); for (int i = 0; i < n; ++i) if (!visited[i]) dfs1(i); // Step 2: 反向 DFS 根據 stack 順序找 SCC memset(visited, false, sizeof(visited)); int scc_count = 0; while (!stk.empty()) { int u = stk.top(); stk.pop(); if (!visited[u]) { cout << "SCC " << ++scc_count << ": "; dfs2(u); cout << "\n"; } } return 0; } ``` ## C++ STL 簡介:C++ 標準模板庫（Standard Template Library, STL）是泛型類別和函數，可在程式中提供常用數據結構和算法。 ### Container ex: vector, list, set, map, queue, stack , set, deque #### vector Basic usages: ``` vector<int> v(size, initial_value); v.push_back(); //add element v.at(index); //access element v.pop_back(); //remove last elemnt v.size(); //number of element v.capcity(); //capacity of the container v.clear(); //remove all elements v.empty(); //check empty or not(in bool) ``` ### Algorithm ex: sort, find, count, for_each STL的algorithm中，最常見的即是以function形式存在(詳細sort的應用可見[https://hackmd.io/V3chzIpITnO0lyZu2TqUdg](https://)中排序演算法的部分) ### Iterator(疊代器) 簡介：Iterator 是 STL 中用來遍歷容器（如 vector、list、set）元素的工具，類似於指標，可以指向元素，並透過運算子如 *, ++, -- 來進行存取與移動。