Intro. to Algorithm HW3

# Intro. to Algorithm HW3 ###### tags: `Intro. to Algorithm` - [name=曾文鼎 0716023] ## 1 以下說明皆為 1-based 。定義二維數列 $a_{(i,j)}$ 如下，其中 $i\in\left[1,n\right], j\in\left[1,\sqrt{n}\right]$ 。 $$a_{(i,j)}=\begin{equation}\begin{cases} 1 &\text{if}~i=0~\text{or}~s_i=j \\ a_{(i-1, j)}+1 &\text{otherwise} \end{cases}\end{equation}$$ 定義 $\Phi$ 為 potential function。 $$\Phi(s_i)=\begin{equation}\begin{cases} 0 & \text{if}~i=0 \\ \sum_{x=1}^{\sqrt{n}}a_{(i,x)} & \text{else} \end{cases}\end{equation}$$ 定義 $c_i$ 為尋找 $s_i$ 的predecessor 實際花費的成本。 $$c_i = \begin{equation}\begin{cases} i-p &\text{where}~s_p~\text{is the predecessor of}~s_i \\ i-1 &\text{if}~s_i~\text{has no predecessor} \end{cases}\end{equation}$$ 因為 $$\begin{align} \widehat{c_i} &= c_i + \left[\Phi(s_i)-\Phi(s_{i-1})\right] \\ &= c_i + \left[\sqrt{n} - c_i\right] \\ &= \sqrt{n} \end{align}$$ 故有 $$\begin{align} && \sum_{i=1}^{n}c_i + \Phi(s_n)-\Phi(s_0) = \sum_{i=1}^{n}(c_i + \Phi(s_n)-\Phi(s_0)) = \sum_{i=1}^{n}\widehat{c_i} &= n\left(\sqrt{n}\right) = n^{1.5} \\ &\implies & \sum_{i=1}^{n}c_i + \Phi(s_n) + \Phi(s_0) &= n^{1.5} \\ &\implies & \sum_{i=1}^{n}c_i + \Phi(s_n) &= n^{1.5} \\ &\implies & \sum_{i=1}^{n}c_i &< n^{1.5} \end{align}$$ 故此程式的複雜度為 $O\left(n^{1.5}\right)$。 ## 2 ### a 下表列出操作各種資料結構的均攤複雜度。 || Array | Binary heap | Fibonacci heap | |---|--- | --- | --- | | Insert | $O(1)$ | $O(\log n)$| $O(1)$ | | Decrease-Key | $O(1)$ | $O(\log n)$| $O(1)$ | | Extract-Min | $O(n)$ | $O(\log n)$| $O(\log n)$ | 定義 $T(G)$ 為找出 $G$ 的最小生成樹的時間成本。整個程式的複雜度為 $$\begin{align} T(G) &= O(E)\cdot\text{Insert}+O(V)\cdot\text{Extract-min} + O(E)\cdot\text{Decrease-Key} \\ &= O(n\log n)(\text{Insert}+\text{Decrease-Key}) + O(n)(\text{Extract-min}) \end{align}$$ * 若使用 Array ，複雜度為 $T(G)=O\left(n\log n+n^2\right) = O\left(n^2\right)$ 。 * 若使用 Binary heap ，複雜度為 $T(G)=O\left(n\log^2 n+n\log n\right) = O\left(n\log^2 n\right)$ 。 * 若使用 Fabonacci heap ，複雜度為 $T(G)=O\left(n\log n+n\log n\right) = O\left(n\log n\right)$ 。因此使用 Fabonacci Heap 最優。 ### b 對於一連通圖 $G=(V,E)$ ， Binary heap 較 Fabonacci heap 佳若且唯若 $$\begin{align} && T_{~\text{Binary heap}}(G) &\leqslant T_{~\text{Fabonacci heap}}(G) \\ &\implies & O(E\log n)+O(n\log n) &\leqslant O(E)+O(n\log n) \end{align}$$ 解不等式，得知若且唯若圖 $G=(V,E)$ 的邊的數量少於等於 $O(n)$ 時， $G \in \mathcal{H}$ 。亦即 $$G=(V,E)\in\mathcal{H} \Longleftrightarrow E\leqslant O(n)$$ ## 3 存在。 | 原圖 | DFS Tree | BFS Tree | | --- | --- | --- | | ![](https://i.imgur.com/5kDQOQj.png =x160) | ![](https://i.imgur.com/YcfnhBG.png =x160) | ![](https://i.imgur.com/phTkrpg.png =x160) | ## 4 ### a 把第 9 行的 $n-1$ 改成 $n$ 。 ### b 一連通圖 $G=(V,E)$ 能夠藉由壞掉的 Algorithm 2 算出正確的最小生成樹，若且唯若 $$\begin{align} && \text{dis}[i][j] &= \min\left(\text{dis}[i][k]+\text{dis}[k][j], \text{dis}[i][j]\right) \\ & \Longleftrightarrow & \text{dis}[i][j] &\leq \text{dis}[i][k]+\text{dis}[k][j] && k=n, \forall i\forall j \in[1,n] \end{align}$$ 亦即節點 $V_n$ 並不在任意兩其他節點的最短路徑上，此時 $V_n$ 為最小生成樹的樹葉。故 $$G=(V,E)\in\mathcal{H} \Longleftrightarrow V_n\in \text{Leaf}(G)$$ ## 5 對於公差為零的等差數列，可以預先透過求眾數演算法得知，因此以下僅討論公差非零的情況。 #### 步驟 1. 給定序列 $s$ ，隨意選一個 $s_i$ ，期望 $s_i \in s'$ （$s'$ 表示 $s$ 的最長等差子序列）。若存在 $|s'|\geqslant |s/2|$ ，顯然這樣的期望有至少 $0.5$ 的機率實現。 2. 對於給定的 $s_i$ ，往後依序選取 $s_{i+1}, s_{i+2}, ...$ ，期望找到最小的正整數 $x$ 使得 $s_{i+x}\in s'$ ，藉以找到公差 $d=s_{i+x}-s_{i}$ 。若存在 $|s'|\geqslant |s/2|$，則 $s_{i+x}$ 為最小的 $x$ 使得 $s_{i+x}\in s'$ 的機率顯然至少為 $0.5^x$ 。 3. 對於給定的 $d$，在 $O(n)$ 的時間內確認這樣的 $s'(|s'|\geqslant|s|/2)$ 是否存在。若存在 $|s'|\geqslant |s/2|$ 只要執行步驟 1. 達 $10$ 次，就能有至少 $1-0.5^{10} > 0.999$ 的機率猜到一個 $s_i\in s'$ 。對於每個步驟 1. 選取的 $s_i$ ，只要執行步驟 2. 達 $10$ 次，就有 $$\sum_{x=1}^{10} 0.5^x > 0.999$$ 的機率找到公差 $d$ 。綜上所述，若存在 $|s'|\geqslant |s/2|$ ，至少有 $0.999^2>0.99$ 的機率找到公差 $d$ ，並且耗時 $O(1)$。檢查公差 $d$ 耗時 $O(n)$ ，故此演算法的複雜度為 $O(n)$ 。此演算法有以下結論： * 若序列 $s$ 存在等差子序列 $s'(|s'|\geqslant|s|/2)$，此演算法有至少 $0.99$ 的機率回傳 true 。 * 若序列 $s$ 不存在等差子序列 $s'(|s'|\geqslant|s|/2)$，此演算法總是回傳 false 。 #### Psuedo code ``` /** * Queries if there exists an arithmetic subsequence of seq * whose length is not less than n/2 and common diffrerence * is d. The correct rate is 100% and the time complexity * is O(n). */ function judge(seq, d) { // Generates a hash set containing all the elements in s. let hs = new HashSet; foreach (e in seq) { insert e into hs; } for (i=1; i<=|seq|; i++) { if ((seq[i] - d) is in hs) continue; // O(1) let e = seq[i]; let counter = 0; // Note this while loop runs n times OVERALL // regardless of the outer for loop. Hence // the complexity is O(n). while ((counter < |seq|/2) and (e is in hs)) { e += d; counter++; } if (counter >= |seq|/2) return true; } return false; } /** * Queries if there exists an arithmetic subsequence of * s whose length is not less than n/2. If such subsequence * exists, this function returns true with probability at * least 99%; if not exists, this function always returns * false. The time complexity is O(n). */ function judge(s) { let m = mode of s; // O(n) if (count of m in s >= |s|/2) { // O(n) // Exists an arithmetic subsequence whose common // difference is 0. return true; } // Common difference must not be 0. for (i=1; i<=10; i++) { // O(1) let a = random integer in [1, |s|-10] for (j=1; j<=10; j++) { // O(1) // Assumed that index is always in the bound. // Since n is big. let d = s[a+j] - s[a]; if (judge(s, d)) { // O(n) return true; } } } return false; } ``` #### C++ code 以下給一個跑得動的 C++ (hackmd.io/@Hyperbola/algo-hw3)。 ```cpp #include <bits/stdc++.h> using namespace std; const int MAX_NUM = 50000; const int MAX_DIFF = 100; bool judge(vector<int>& s, int d) { int n = s.size(); // Generates arith hashset containing all the elements in s unordered_set<int> ss(s.begin(), s.end()); for (int i = 0; i < n; i++) { if (ss.count(s[i] - d) /* O(1) */) continue; int e = s[i]; int counter = 0; while (2 * counter < n and ss.count(e)) { e += d; counter++; } if (2 * counter >= n) return true; } return false; } bool judge(vector<int>& s) { int n = s.size(); assert(n >= 20); const int C = 10; for (int i = 0; i < C; i++) { // O(1) int arith = rand() % n - 10; for (int j = 0; j < C; j++) { // O(1) int d = s[arith + j] - s[arith]; if (judge(s, d)) { // O(n) return true; } } } return false; } vector<int> positive_dataset(int n) { vector<int> arith, non_arith; int d = rand() % MAX_DIFF; if (d == 0) d = 1; int a0 = rand() % MAX_NUM; for (int i = 0; 2 * i < n; i++) arith.push_back(a0 + i * d); while (arith.size() + non_arith.size() < n) { non_arith.push_back(rand() % MAX_NUM); } vector<int> sf; for (int i = 0; i < n; i++) { sf.push_back(i < arith.size() ? 1 : 0); } random_shuffle(sf.begin(), sf.end()); vector<int> v; for (int i = 0; i < n; i++) { if (sf.back() == 1) v.push_back(arith.back()), arith.pop_back(); else v.push_back(non_arith.back()), non_arith.pop_back(); sf.pop_back(); } reverse(v.begin(), v.end()); return v; } vector<int> negative_dataset(int n) { vector<int> ret; for (int i = 0; i < n; i++) ret.push_back(rand() % MAX_NUM); return ret; } int main() { cout << "A postitive dataset or negative (P/N): "; string reply; cin >> reply; char r = toupper(reply[0]); if (r != 'P' and r != 'N') exit(1); cout << "Size of dataset (recommended 100000, at least 20): "; int n; cin >> n; if (n < 20) exit(1); srand(time(0)); vector<int> dataset = r == 'P' ? positive_dataset(n) : negative_dataset(n); cout << (judge(dataset) ? "True" : "False") << endl; return 0; } ``` ## 6 #### 作法定義 * $O(D)$ 表示光碟 $D$ 的圓心。 * $C(P,r)$ 表示以 $P$ 為中心、半徑為 $r$ 的圓。 * $M(B)$ 表示二分圖 $B$ 的 Maximum vertex cover 的數量。以下給出 $O(n^3)$ 的解法。 1. 枚舉光碟對。對於每對光碟對 $(D_1, D_2)$ ，若 $l=\overline{O(D_1)O(D_2)} \leqslant 2$ ，令 $I=C(O(D_1),l)\cap C(O(D_2),l)\neq\phi$ 。此步驟複雜度 $O\left(n^2\right)$ 。 2. 對於一個給定的 $I$，過 $\overline{O(D_1)O(D_2)}$ 將 $I$ 分成兩半 $I_1,I_2$ 。掃描平面上所有光碟的中心，取得兩集合 $H_1, H_2$ ，其中 $H_i=\{D:O(D)\in I_i\}$ 。然後令 $H=H_1\cup H_2$ 。此步驟複雜度 $O(n)$ 。 3. 若 $|H| \geqslant 2k$ ，表示必定存在 $k$ 個光碟其兩兩相交，此演算法回傳 true 並立即結束。 4. 對 $I$ 內所有未相交的光碟對建邊，將會得到二分圖 $B=(V,E)$ 。這是因為 $I_1$ 內的任意光碟對 $(D_1,D_2)$ 必有 $\overline{O(D_1)O(D_2)}\leqslant l\leqslant 2$ 。 $I_2$ 同理。此步驟複雜度 $O(\left|H\right|^2)=O\left(k^2\right)=O\left(\log^2 n\right)$ 。 5. 計算 $M(B)$，表示存在 $M(B)$ 個光碟兩兩相交。若 $M(B)\geqslant k$ ，此演算法回傳 true 並立即結束。此步驟可以透過 Ford-Fulkerson 計算 Maximum flow 在 $O(EV)\leqslant O(\log^2 n)$ 內完成。 6. 枚舉完所有光碟對後，回傳 false ，演算法結束。注意步驟 3. 在 $|H| \geqslant 2k$ 時結束演算法。這是因為對於任意二分圖 $B=(V,E)$ 恆有 $M(B) \geqslant |V|/2$ 。此演算法的複雜度為 $$O(n^2)\left[O(n)+O(\log^2n)+O(\log^2n)\right] = O(n^3)$$ #### Pseudo code ``` /** * Queries if there exists k disks within diskset such * that they mutually intersect. The time complexity is * O(n^3). */ function judge(diskset, k) { foreach (diskpair(d1, d2) in diskset) { // O(n^2) let p1 = center_of(d1), p2 = center_of(d2); let l = dist(p1, p2); if (l > 2) continue; let i = intersect_of( circle(p1, l), circle(p2, l) ); let h = {}; foreach (disk in diskset) { if (center_of(disk) is in i) { insert disk into h } } if (|h| >= 2*k) return true let b = empty graph; foreach (diskpair(u1, u2) in h) { if (dist(center_of(u1), center_of(u2)) > 2) { insert vertex(u1) into b insert vertex(u2) into b insert edge(u1, u2) into b } } let m = maximum_vertex_cover_of(b); if (m >= 2*k) return true; } return false; } ```