Lecture 02 - Analysis of Algorithms

# Lecture 02 - Analysis of Algorithms > 課程內容 : 中興大學資工系范耀中教授 (112 學年度第 2 學期) > > 其他科目內容請參見 [[Here]](https://hackmd.io/@ChenZE/By4SOO6Jyl) ## 1. Example problem : 3-Sum problem ### 1.1 Problem scription :question: > **Question** > > Given N **distinct** integers, how many triples sum to exactly zero ? For example, an 8 elements array, `arr = [30, -40, -20, -10, 40, 0, 10, 5]` | Numbers | a[i] | a[j] | a[k] | sum | | :-: | :-: | :-: | :-: | :-: | | 1 | 30 | -40 | 10 | 0 | | 2 | 30 | -20 | -10 | 0 | | 3 | -40 | 40 | 0 | 0 | | 4 | -10 | 0 | 10 | 0 | --- ### 1.2 Brute-force algorithm (暴力解) ```java= // Java code for 3-Sum problem public class TreeSum { public static int count(int[] a) { int N = a.length; int count = 0; for (int i=0; i < N; i++) for (int j=i+1; j<N; j++) for (int k=j+1; k<N; k++) if (a[i]+a[j]+a[k] == 0) count++; return count; } public static void main(String[] args) { In in = new Ina(args[0]); int[] a = in.readAllInts(); StdOut.printIn(coint(a)); } } ``` **<font color="FF0000">:warning: 任何專業的程式碼都不該出現三3個以上的巢狀迴圈 (垃圾程式碼)</font>** ## 2. Order-of growth classifications ### 2.1 How long does the following algorithm take ? 考慮以下程式碼，當輸入長度為 N 時的指令次數 ```java= int count = 0; for (int i=0; i<N; i++) for (int j=i+1; j<N; j++) if (a[i]+a[j]==0) count++; ``` * **variavle decalration : $N+2$ times** * `count = 0` : 1 times * `int i = 0` : 1 times * `int j = i + 1` : N times, for 1 to N * **assignment : $N+2$ times (why:question:)** * less than compare : $\cfrac{1}{2} (N+1)(N+2)$ times * `i < N` : $N+1$ times, where 0 to N * `j < N` : $N+(N-1)+...+1=\cfrac{N(N+1)}{2}$ * Total : $(N+1)+\cfrac{N(N+1)}{2}=\cfrac{1}{2}(N+1)(N+2)$ * equal to compare : $\cfrac{1}{2}N(N-1)$ times * `a[i][j] == 0` : $(N-1)+...+1=\cfrac{N(N-1)}{2}$ * array access : $N(N-1)$ times * `a[i]` : $(N-1)+...+1=\cfrac{N(N-1)}{2}$ * `a[j]` : $(N-1)+...+1=\cfrac{N(N-1)}{2}$ * Total : $N(N-1)$ 大部分的效能分析太冗長，只要專注在前兩項(粗體字部分)即可以`if`敘述來說，總共執行了$(N-1)+N+...+1=\cfrac{N(N-1)}{2}$次，可視為==從 N 個數字中挑出兩個==，即$(^n_2)$ --- ### 2.2 Common order of growth classification > **Definition** > > If $T(N)\sim c\cdot g(N)$ for some constant $c>0$, then the order of growth of $T(N)$ is $g(N)$ > 以3-sum problem來說，問題複雜度是 $N^3$ ，意義在於==從 N 個元素中任選三個== 常用的演算法複雜度如下 : ==$1$、$\log N$、$N$、$N\log N$==、$N^2$、$N^3$、$2^N$ 其中前 4 個是**可以接受**的範圍，==$N^2$、$N^3$ 不該出現在專業程式中==，$2^N$僅限在測試中 > **Example for $2^N$** > > $S=\{A, B, C, D, E\}$，挑選**任意長度**的子集的和以編碼角度來看，子集$\{B\}=(0, 1, 0, 0, 0)$，子集$\{C\}=(0, 1, 1, 0, 0)$，所以共有 $2^5$ 個子集 ## 3. Doubling hypothesis ### 3.1 About software performance understanding * 黑箱測試(black box testing) : 看不到程式碼內部的邏輯(Ex: API) * 白相測試(white box testing) : 看得到程式碼內部的邏輯 ![image](https://hackmd.io/_uploads/SkAZxWTgC.png) --- ### 3.2 Data analysis 標準方式畫圖，將時間複雜度依照不同輸入長度畫圖，**觀察圖形** ![image](https://hackmd.io/_uploads/S1aDlZalC.png) --- ### 3.3 Double hypothesis ==將input size放大兩倍，觀察程式執行的效能變化== Supposed that complexity : $T(N)=a \times N^b =T(N)$ , thus we have $T_1(N)=a\times N^b$ and $T_2(N)=a \times (2N)^b$ Then, $\cfrac{T_2(N)}{T_1(N)}=\cfrac{a \times (2N)^b}{a \times (N)^b}=2^b$ 由此可求出 $b$ 之值，代回後可再求得 $a$ 之值，但 ==$\log N$ 規模可能不適用== ## 4. Theory of algorithm 已知演算法的複雜度為 : $N^3 \rightarrow N^2\cdot \log N \rightarrow N^2 \rightarrow N \cdot \log N \rightarrow N \rightarrow \log N \rightarrow 1$ 如何知道==針對某問題的演算法已經是最好的 ?== ### 4.1 Types of analysis * Best case * **最簡單**的 input * 提供所有的input一個**目標解** * Worst case * **最難的** input * 提供所有的input一個**保證解** * Average case * 針對**隨機** input :::info 此處主要聚焦在**Worst case**的討論 ::: --- ### 4.2 Theory of algorithm 演算法理論的目標有二個 : 1. 建立問題的**困難度** 2. 發展**最佳**的演算法在此條件下衍生兩個邊界值 : ==Upper bound==，表示**最差 input** 之下所執行的狀況與結果；==Lower bound==，表示**最簡單 input** 之下所執行的狀況與結果。 <font color="F0000">**注意**</font>，其實 ==Lower bound 本身**不包含**演算法的概念==，他指的是**任何人來解問題時都必須付出的成本**。例如 : > **Find zero elements in array whose size = N** 此問題的 Lower bound = $N$，因為不論是誰來解決此問題，都必須將**整個 array 看過一次**才可以執行後續的步驟演算法開發的方式有二個方向 : 1. 將**Upper bound下壓** 2. 將**Lower bound上抬** 當 Upper bound = Lower bound ==(同一規模)== 時，就可以說此演算法是最好的。但實際上將 Lower bound 上抬的過程比較**困難**，需要**嚴謹的數學證明**，所以一般都是==開發新的演算法==將 Upper bound**下壓**，而 Upper bound 下壓的過程就是 optimize