Week 2: Data Structures and Thought method

:+1: [2020 競技程設教材 HackMD Book](https://hackmd.io/@nckuacm/ryLIV6BYI) :+1: 2019 Week 2: Data Structures and Thought method = 接下來的章節，我們採用 **C++** 作為主要使用的程式語言，原因是相對於其他主流的高階程式語言(如: Python, JavaScript...)，C++ 作為一個編譯語言有著相對快的執行速度，並且在一些細節操作上可以更加自由。不過在部分複雜度較低，或是時間限制較寬鬆的題目中採用 Python 解題也可以提升解題速度，這部分就交給讀者自行探索。 # Basic programming 在撰寫程式時常常會需要估算資料大小，以選擇最適合使用的資料型別，就算是身經百戰的選手也可能因沒有小心估算數值的範圍而導致 WA，以下給出一些 C++ 中的常用型別以及適合存放的資料大小(好用的估計值): `int x`: $|x| < 2 \times 10^9$ `long long x`: $|x| < 9 \times 10^{18}$ `float x`: 整數+小數部分共 6 位精確度 - 整數: 最大超過 $8 \times 10^{6}$ 後不精確 - 小數: 小數點下精度最大約 $6$ 位 `double x`: 整數+小數部分共 15 位精確度 - 整數: 最大超過 $4 \times 10^{15}$ 後不精確 - 小數: 小數點下精度最大約 $15$ 位 `double` 與 `float` 能提供的數值精確度是取決於儲存的數值的**整數+小數部分的位數**，也就是說在使用浮點數時必須在數字大小和小數精度中做抉擇，若是同時存放過大且對小數精確度要求極高的數值時，很有可能會出現誤差。 > 讀者如果對浮點數的實作以及精確值有興趣， > 可以自行翻閱 [IEEE 754](https://zh.wikipedia.org/wiki/IEEE_754) 浮點數規範在第 15 週會討論到如何解決因精度不夠而產生的問題 #### 範例 [CODEFORCES 1106C Lunar New Year and Number Division](https://codeforces.com/contest/1106/problem/C) ```cpp #include <bits/stdc++.h> using namespace std; int const maxn = 3e5 + 10; int n, a[maxn]; int main() { cin >> n; for(int i = 0; i < n; i++) cin >> a[i]; sort(a, a+n); long long ans = 0, tmp; for(int i = 0; i < n/2; i++) tmp = (a[i] + a[n-i-1]), ans += tmp*tmp; cout << ans << endl; return 0; } ``` 學習重點： * 最多會有 $3\times 10^5$ 個數字，每個數字最大為 $10^4$，估計 `ans` 的最大值 $(2\times 10^4)^2\times (3\times 10^5)/2 = 6\times 10^{13}$，會超過 `int` 的範圍，所以改用 `long long` #### 範例 [CODEFORCES 1111A Superhero Transformation](https://codeforces.com/contest/1111/problem/A)： ```cpp #include <bits/stdc++.h> using namespace std; bool vowel['z'+1] = {}; // vowel := 是否為母音 string S, T; int main() { vowel['a'] = vowel['e'] = vowel['i'] = vowel['o'] = vowel['u'] = true; cin >> S >> T; bool ok = true; if(S.size() != T.size()) ok = false; for(int i = 0; i < S.size(); i++) if(vowel[S[i]] != vowel[T[i]]) ok = false; cout << (ok? "Yes" : "No") << endl; return 0; } ``` 學習重點： * a 的 ascii 是 97，所以在 C++ `'a'` 與 `97` 是相同的 * `bool vowel['z'+1] = {}`：後面使用一個空大括號，可以初始化陣列值為`0`。 * `vowel[S[i]] != vowel[T[i]]`：使用 vowel 陣列判斷字母是否為母音。 * 三元運算子(`?:`)使用恰當能使程式碼變得乾淨許多 #### 範例 [UVa OJ 232 Crossword Answers](https://uva.onlinejudge.org/external/2/232.pdf)：先將在 $(r, c)$ 上的編號 `flag[r][c]` 求出來接著直接分別由上到下由左至右將 word 輸出 ```cpp #include<cstdio> int const maxn = 20; int r, c, flag[maxn][maxn]; char puz[maxn][maxn]; bool across_head(int i, int j) { return !j || puz[i][j-1] == '*'; } bool down_head(int i, int j) { return !i || puz[i-1][j] == '*'; } void across_print() { for(int i = 0; i < r; i++) for(int j = 0; j < c; j++) { if(across_head(i, j) && puz[i][j] != '*') printf("\n%3d.", flag[i][j]); if(puz[i][j] != '*') putchar(puz[i][j]); } } void down_print() { for(int i = 0; i < r; i++) for(int j = 0; j < c; j++) if(down_head(i, j) && puz[i][j] != '*') { printf("\n%3d.", flag[i][j]); for(int k = i; puz[k][j] != '*' && k < r; k++) putchar(puz[k][j]); } } bool flag_position(int i, int j) { return (across_head(i, j) || down_head(i, j)) && puz[i][j] != '*'; } int main() { int kase = 0; while(scanf("%d%d", &r, &c) && r) { for(int i = 0; i < r; i++) scanf("%s", puz[i]); //preprocess int n = 0; for(int i = 0; i < r; i++) for(int j = 0; j < c; j++) if(flag_position(i, j)) flag[i][j] = ++n; else flag[i][j] = 0; //print answer if(kase) putchar('\n'); printf("puzzle #%d:\n", ++kase); printf("Across"); across_print(); putchar('\n'); printf("Down"); down_print(); putchar('\n'); } return 0; } ``` 學習重點： - 複習一點 index 的用法 - 條件判斷頗長，給它取個名字然後呼叫它，可增加可讀性 - 別總是一氣呵成，分幾次做通常(?)會比較好做 # Basic STL 介紹 STL 全名 Standard Template Library 由容器 (containers)、迭代器 (Iterators)、演算法 (Algorithms)、函式 (Functions) 4 種元件組成。接下來的幾週課程會陸續介紹幾個常用的 STL 裡的容器及函式絕大部分 STL 的東西只要涉及區間的操作，==區間表示一律為左閉右開==[^start_at_zero] 推薦的參考網站： [cplusplus.com](http://www.cplusplus.com)、[C++ reference](https://en.cppreference.com) ## vector vector 跟一般在使用的陣列很像，最大的特色就是他的儲存空間是動態增減的 ### `std::vector` ```cpp #include <vector> using std::vector; ``` #### 宣告： `vector<T> v`: `v` 為空的 vector，且各元素型別為 `T` `vector<T> v(size_type a)`: `v` 長度為 `a` `vector<T> v(size_type a, const T& b)`: `v` 會被 `a` 個 `b` 填滿 #### 函式： `v.size()`: `v` 目前的長度 `v.push_back(T a)`: 在 `v` 的結尾加一個 `a` `v.pop_back()`: 刪除 `v` 的最末項[^pop_back] `v.empty()`: 是否 `v` 為空 ```cpp int a; do { cin >> a; v.push_back(a); } while (a); cout << v.size() << '\n'; v.pop_back(); v.pop_back(); cout << v.size() << '\n'; cout << (v.empty()? "YES" : "NOT EMPTY"); ``` ``` > 5 2 8 9 1 2 0 < 7 < 5 < NOT EMPTY ``` `v[i]`: `v` 的第 `i` 項 `v.clear()`: 清空 `v` `v.resize(size_type a)`: 將 `v` 的長度條至 `a` ```cpp v.resize(4); v.resize(8, 100); v.resize(12); for (int i = 0; i < v.size(); i++) cout << v[i] << ' '; cout << '\n'; v.clear(); cout << "size = " << v.size(); ``` ``` < 5 2 8 9 100 100 100 100 0 0 0 0 < size = 0 ``` `v.begin()`: v 的起始地址 `v.end()`: v 的末端地址 + 1 (由於左閉右開) `v.assign(InputIterator l, InputIterator r)`: 將指標 `l` 到 `r` 的內容覆蓋至 `v` `v.assign(size_type n, const value_type& val)`: 覆蓋 `n` 個 `val ` 至 `v` ```cpp v.assign(7, 100); // 7 ints with a value of 100 cout << "#1: " << v.size() << '\n'; vector<int>::iterator it = v.begin() + 1; v.assign(it, v.end()-1); cout << "#2: " << v.size(); ``` ``` < #1: 7 < #2: 5 ``` ## string 字串 (string)，顧名思義是很多個字它們串在一起例如： "stringisastring"，就是一堆字 's', 't', 'r', 'i', 'n', 'g', 'i', ..., 'g' 串在一起。字串可像序列一樣表示，例如長度為 $n$ 的字串 $A$ 為： $A_1, A_2, ... , A_{n-1}, A_n$ 字串的問題與序列問題的區別在於字串通常探討的是字元前後**連續**的特性。 ### `std::string` string 的用法很像 `vector<char>` 但多了一些優化，以及功能、且 **`vector<char>` 有的 `string` 都有** ```cpp #include <string> using std::string; ``` 若 `t` 是 `string` 或 [C style string](http://archive.oreilly.com/oreillyschool/courses/cplusplus1/cplusplus106.html) 則： #### 運算： `s = t`: 會將 `t` 複製給 `s` ` s += t`: 在 `s` 的尾端接上 `t` #### 函式： `cin >> s`: 輸入字串至 `s` `cout << s`: 輸出 `s` `getline(cin, s, char c)`: 輸入字串至 `s`，直到讀到 `c`。`c` 預設為 `'\n'` ```cpp getline(cin, s, '\n'); t = " and"; // t is a string now s += t; s += " carry on"; cout << s; ``` ``` > keep calm < keep calm and carry on ``` `s == t`: 是否 `s` 跟 `t` 相同 `s.c_str()`: 回傳 `s` 的 C style string ```cpp char cstr[100]; strcpy(cstr, s.c_str()); cstr[8] = 'l'; cstr[9] = '\0'; cout << cstr << '\n'; cout << (cstr == s? "YES" : "NO"); ``` ``` < keep call < NO ``` #### 範例 [UVa 101 The Blocks Problem](https://uva.onlinejudge.org/external/1/101.pdf)： >挺複雜的題目將 4 道指令拆解成兩種指令的合成，也就是 `return_above` 以及 `move_to` - `return_above(a)`: 將特定 block `a` 上方的 blocks 全數返回到原來的位置。 - `move_to(a, b)`: 將 block `a` 及其上所有 blocks 移動到 block `b` 所在的位置最上方。宣告 `vector<int> pile[maxn];` 以 `pile[p]` 紀錄位置 `p` 上的所有 blocks (從 0 開始由下而上) ```cpp #include<iostream> #include<string> #include<vector> using namespace std; const int maxn = 26; int n, pos[maxn]; // pos := position vector<int> pile[maxn]; int height(int obj) { int i = 0, p = pos[obj]; for(; pile[p][i] != obj; i++); return i + 1; } void return_above(int obj) { int p = pos[obj], h = height(obj); for(int i = h; i < pile[p].size(); i++) { int w = pile[p][i]; // w := wood pile[w].push_back(w); pos[w] = w; } pile[p].resize(h); } void move_to(int a, int b) { int ap = pos[a], bp = pos[b]; int ah = height(a); for(int i = ah-1; i < pile[ap].size(); i++) { int w = pile[ap][i]; // w := wood pile[bp].push_back(w); pos[w] = bp; } pile[ap].resize(ah-1); } int main() { cin >> n; for(int i = 0; i < n; i++) { pos[i] = i; pile[i].push_back(i); } int a, b; string c1, c2; while(cin >> c1 && c1 != "quit") { cin >> a >> c2 >> b; if(pos[a] == pos[b]) continue; if(c1 == "move") return_above(a); if(c2 == "onto") return_above(b); move_to(a, b); } // output for(int i = 0; i < n; i++) { cout << i << ":"; for(int j = 0; j < pile[i].size(); j++) cout << " " << pile[i][j]; cout << endl; } return 0; } ``` 學習重點： - 長度(或高度) 與最後一位 index 之間的關係 (尤其從 0 開始數) - 化繁為簡，將重複性高的功能抽象(函式化)出來 ## sort 將一堆元素由"*給定規則*"排成一[順序](https://en.wikipedia.org/wiki/Order_theory)，稱為排序 (sort)。例如： `5, 6, 9, 8, 2` 這五個元素由小到大排為 `2, 5, 6, 8, 9` `a, bc, aa, ay` 這四個元素由字典順序排為 `a, aa, ay, bc` ### `std::sort` STL 的 `sort` 裡有複雜的優化，**預設**將容器元素由==小排至大== ```cpp #include <algorithm> using std::sort; ``` 假設有如下資料結構： ```cpp struct T { int val, num; }; vector<T> v; ``` 若想依 `num` 對 `v` 做排序，需寫**比較**函數比較函數是在定義"**小於**"**運算**： ```cpp bool cmp(const T &a, const T &b) { return a.num < b.num; } ``` > 或將小於運算子 (`operator<`) 重載也能達到一樣的效果將 `cmp` 做為參數： ```cpp sort(v.begin(), v.end(), cmp); ``` 當然也可以直接把匿名函數寫進去： ```cpp sort(v.begin(), v.end(), [](T a, T b) { return a.num < b.num }); ``` 順帶一提，若把小於行為內部定義為"大於"： ```cpp [](T a, T b) { return a.num > b.num } ``` 則排序為從大至小。 #### 範例 [CODEFORCES 1114B Yet Another Array Partitioning Task](https://codeforces.com/contest/1114/problem/B)： > 乍看之下有夠難欸仔細觀察發現， $k$ 個切出來的 subarray 中最大的 $m$ 個元素，收集起來恰好就是原 array 中前 $k\times m$ 大的元素所以將它們加起來就能得到所有 beauty 的總和 ```cpp #include<bits/stdc++.h> using namespace std; int const maxn = 2e5 + 100; int n, m, k, a[maxn]; vector<int> idx; int main() { scanf("%d%d%d", &n, &m, &k); for(int i = 0; i < n; i++) { scanf("%d", &a[i]); idx.push_back(i); } long long sum = 0; sort(idx.begin(), idx.end(), [&](int x, int y) { return a[x] > a[y]; }); for(int i = 0; i < m*k; i++) sum += a[idx[i]]; printf("%lld\n", sum); : . ``` 因為要求輸出間隔的 index 位置，所以我們需要原本的 index 位置將 `idx` 再排序一遍，就能穩穩輸出來了 :+1: ```cpp : . sort(idx.begin(), idx.begin() + m*k); for(int i = m-1; i < m*(k-1); i+=m) printf("%d ", idx[i]+1); putchar('\n'); return 0; } ``` 學習重點： - 凡事先試試 sort，想想有什麼好處，通常狀況只會變好 - `sort` 複雜度 $O(N\log_2 N)$，大部份競賽中並不會比 $O(N)$ 壞很多 - `for` 迴圈中大於 $1$ 以上的累加 # 演算法的效率 >演算法的設計，得建立在資料結構之上，並評估**時間**與**空間**上的效率題目給定輸入規模通常很大，2 倍、3 倍、甚至 10 倍的常數倍優化其實不是競賽時考慮的要點。我們所設計的演算法必須根據輸入規模 $N$ 而定。 * Big $O$ 表示法 $f(N) = O(g(N)) \iff \exists N_0,c>0. \forall N>N_0.|f(N)|\leq c\cdot |g(N)|$ 意思是說在 $N$ 足夠大的時候，已經存在正數 $c$ 使得 $c\cdot |g(N)|$ 大於 $|f(N)|$ 或是說 $g(N)$ 趨近無窮的成長速度不比 $f(N)$ 來得慢例如: $f(x)=x^2 + x + 1$ 在 $x$ 很大的時候，主要影響整個函數值的大小是平方項，這時可以說 $f(N) = O(N^2)$[^1] 假設輸入規模為 $N$，常見的複雜度有： $O(1)\leq O(\log_2N)\leq O(N)\leq O(N\log_2N)\leq O(N^k)\leq O(k^N)\leq O(N!)\leq O(N^N)$ > $k$ 為不受輸入規模影響的常數 ## 合理的複雜度記憶體空間的用量在各個比賽中規範都不相同但有些基本的考慮因素，例如遞迴深度，使用的變數多寡，程式碼長度[^sloc] 而普遍上題目都會以秒為單位去做時間限制 (e.g. 3 秒、10 秒) 但通常我們會直接考慮輸入資料的規模與計算出來的複雜度有個傳統(?)的範圍：$10^7$ 左右 > 這只是大約的估計範圍假設輸入規模為 $N$，演算法的複雜度為 $O(N^2\log_2 N)$ 那麼需要滿足 $N^2\log N \leq 10^7$ 具體一點，上面如果 $N=10^5$，那你就得重新設計演算法了。 # 設計演算法的思維這邊探討幾個常見去設計一個演算法的切入點 ## 考慮最大連續和問題給定一個長度為 $N$ 的整數數列 $A_1, A_2, ... , A_N$，要求找到 $1 \leq i \leq j \leq N$，使得 $A_i+A_{i+1}+...+A_j$ 儘量大。 > 注意數列中可能會有負數 ## 枚舉所謂枚舉，通俗點說就是**數出**部份給定的集合中元素。下面直接給出程式來解決最大連續和問題： ```cpp int best = A[1]; //與其用無限小，不如這樣初始化更不易出錯 for (int i = 1; i <= N; i++) { for (int j = i; j <= N; j++) { int sum = 0; for (int k = i; k <= j; k++) sum += A[k]; best = max(best, sum); } } ``` 通常我們假設簡單的 $1$ 道指令花費 $1$ 單位的時間 > 例如算數比較宣告判斷等 > 而使用的函式需要看內部做了什麼才能估算時間粗估一下可知道上面演算法的複雜度為 $O(N^3)$ #### 練習： [GCJ Kickstart Round G 2018 A Product Triplets](https://code.google.com/codejam/contest/5374486/dashboard#s=p0): Small dataset ## 動態規劃部份朋友可能知道可以令 $S_i = A_1 + A_2 + ... A_i$ 而 $A_i+A_{i+1}+...+A_j = S_j - S_{i-1}$ 這樣子有了 $S_i$ 就可將連續和的計算從 $O(N)$ 降為 $O(1)$ 構造 $S_i$ 非常的直覺： ```cpp S[0] = 0; for (int i = 1; i < N; i++) S[i] = S[i-1] + A[i]; ``` 從邊界遞推地紀錄所有問題的解，且一個項用到前一項的**最佳**結果，就是動態規劃的精神。而程式可改進為： ```cpp for (int i = 1; i <= N; i++) for (int j = i; j <= N; j++) best = max(best, S[j] - S[i-1]); ``` 複雜度降為 $O(N^2)$。 #### 練習： [CODEFORCES 1033C Permutation Game](https://codeforces.com/problemset/problem/1033/C) 第 9 週將繼續深入討論動態規劃 ## 貪心法籠統的講，貪心法就是：每次做一個在**當下看起來**最佳的決策，進而漸漸求出全局最佳解 > 這種短視近利的心態，居然也是個不錯的思維( 貪心法是動態規劃的特例，上面動態規劃恰好可以用貪心的角度去看，即每次 $S_i$ 所遞推拿的項 $A_i$，正好不用任何考慮，疊上 $S_{i-1}$ 它就是解。 #### 練習： [ZEROJUDGE d652 貪婪之糊](https://zerojudge.tw/ShowProblem?problemid=d652) ## 分治法分治 (divide & conquer) 簡稱 D&C，就是將一個大的問題，**分成**幾個互相*獨立*的子問題，然後再將子問題分成子子問題[^2]，一直重複分割的動作，直到最小的問題足夠和別的最小問題**合併求解**出父問題。將數列切一半，問左半的以及右半的最大連續和多少，以及問包含切開的那道分水嶺的最大連續和為多少，選出三者中最大值，它就是整個數列(原問題)的最大連續和： ```cpp int maxsum(int l, int r) { // 此為左閉右開區間 [l, r) if (r-l == 1) return A[l]; int m = (r+l)/2, sum, centre = A[m]; sum = 0; for (int i = m ; i < r; i++) centre = max(centre, sum += A[i]); sum = centre; for (int i = m-1; i >= l; i--) centre = max(centre, sum += A[i]); return max(centre, max(maxsum(l, m), maxsum(m, r))); } ``` 要驗證分治法的正確性，只需考慮子問題[^3]們解完後(假設已拿到解)，再合併為父問題看是否解完即可，並考慮最小的孫子問題到的邊界是否正確。 [第六週教材](https://hackmd.io/s/rkIHhGZ_4#Merge-sort)的 Merge sort 也是一個不錯的分治法例子 [^sloc]: 有時會想試試不靠儲存空間，使用大量的判定輸出，此時程式碼長度的估算也得考量 [^1]: 注意這邊的符號($=$)與數學上的用法"[等價](https://en.wikipedia.org/wiki/Equivalence_relation)"意義不同。 [^2]: 子子問題就是指從子問題直接分割出來的更小子問題 [^3]: 子問題而非子子問題也非子子..子問題 [^pop_back]: 若 v 為空，會發生無法預期的結果 [^start_at_zero]: [Why numbering should start at zero](https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html)