Ch01 - Efficiency, Analysis, and Order

--- tags: Algorithm --- <style> font { color: red; font-weight: bold; } </style> # Ch01 - Efficiency, Analysis, and Order ## Algorithms **Algorithm** - Solving problems by the approach or methodology - 純概念、和實作無關 - 必須是**有效率**的方法 $\Rightarrow$ 在有限的時間和空間內完成 ### Definitions - **Problem** - A **question** to which we seek an answer - Exp: **sort a list of numbers** - **Instance** - Each **specific assignment of values to the parameters** is called an instance of the problem - Exp: **sort $S = [1, 3, 2, 5, 4]$** - **Solution** - The **answer** to the question asked by the problem in that **instance** - Exp: **$[1, 2, 3, 4, 5]$** - **Algorithm** - The step-by-step **procedure** that is used to solve the **problem** is called algorithm ### Example of Algorithms Writing algorithms in English (or other natural language) has two drawbacks: - difficult to write and understand - difficult to implement in Computer Language We prefer to write algorithms in Programming-Language - Like **Pseudo-codes** #### Sequential Search - **Problem**: Is the key $x$ in the array $S$ of $n$ keys? - **Input** (parameters): positive integer $n$, array of keys $S$ indexed from $1$ to $n$, and a key $x$ - **Output**: $location$, the location of $x$ in $S$ ($0$ if $x$ not in $S$). ```c void seqsearch (int n, const keytype S[], keytype x, index& location) { location = 1; while (location <= n && S[location] != x) location++; if (location > n) location = 0; } ``` #### Add Array Members - **Problem**: All the numbers in the array $S$ of $n$ numbers. - **Inputs**: positive integer $n$, array of numbers $S$ indexed from $1$ to $n$ - **Output**: $sum$, the sum of the numbers in $S$ ```c number sum (int n, const number S[]) { index i; number result; result = 0; for (i = 1; i <= n; i++) result = result + S[i]; return result; } ``` #### Exchange Sort - **Problem**: Sort $n$ keys in non-decreasing order (小到大) - **Inputs**: positive integer $n$, array of keys $S$ indexed from $1$ to $n$. - **Outputs**: the array $S$ containing the keys in non-decreasing order. ```c void exchangesort (int n, keytype S[]) { index i, j; for (i = 1; i <= n; i++) for (j = i + 1; j <= n; j++) if (S[j] < S[i]) exchange S[i] and S[j]; } ``` 和其他 $N^2$ 的排序演算法比較 - **Exchange Sort** - **Procedure**: - 進行 $n$ 次 loop (i: 1~n) - 在第 i 次 loop 中，進行另一個 loop (j: i+1~n) - 不斷比較 S[i] 和 S[j]，若 S[i] > S[j] 就交換 - **Analysis** - Comparison: $n^2$ - Exchange: $n^2$ - **Selection Sort** - **Procedure**: - 進行 $n$ 次 loop (i: 1~n) - 在第 i 次 loop 中，進行另一個 loop (j: i+1~n)，並宣告一個變數 min = i - 不斷比較 S[j] 和 S[min]，若 S[j] 就更新 min = j - 交換 S[i] 和 S[min] - **Analysis** - Comparison: $n^2$ - Exchange: $n$ #### Matrix Multiplication $$ A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \ \text{ and }\ B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} $$ their product $C = A \times B$ is given by $$ c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} $$ In general, if we have two $n \times n$ matrices $A$ and $B$, their product $C$ is given by $$ c_{ij} = \sum_{k=1}^n a_{ik} b_{kj} \ \ \ \text{ for } i \le i,j \le n. $$ - **Problem**: Determine the product of two $n \times n$ matrices. - **Inputs**: a positive integer $n$, two-dimensional arrays of numbers $A$ and $B$, each of which has both its rows and columns indexed from $1$ to $n$ - **Outputs**: a two-dimensional array of numbers $C$, which has both its rows and columns indexed from $1$ to $n$, containing the product of $A$ and $B$. ```c void matrixmult (int n, const number A[][], const number B[][], number C[][]) { index i, j, k; for (i = 1; i <= n; i++) for (j = 1; j <= n; j++) { C[i][j] = 0; for (k = 1; k <= n; k++) C[i][j] += A[i][k] * B[k][j]; } } ``` ## Importance of Developing Efficient Algorithms ### Binary Search - **Problem**: Determine whether $x$ is in the **sorted** array $S$ of $n$ keys. - **Input**: positive integer $n$, sorted array of keys $S$ indexed from $1$ to $n$, a key $x$ - **Output**: $location$, the location od $x$ in $S$ ($0$ if $x$ not in $S$). ```c void binsearch (int n, const keytype S[], keytype x, index& location) { index low, high, mid; low = 1, high = n; location = 0; while (low <= high && location == 0) { mid = ⌊ (low + high) / 2 ⌋; if (x == S[mid]) location = mid; else if (x < S[mid]) high = mid - 1; else low = mid + 1; } } ``` #### Comparison Result Suppose that $x$ not in an array of size $n$ (即搜尋問題中最差的狀況), then... - Sequential Search requires $n$ comparisons - Binary Search requires $(\lg n) + 1$ comparisons. (if $n$ is a power of 2). | Array Size | #Comparison by Seq-Search | #Comparison by Bin-Search | | -------------:| -------------------------:| -------------------------:| | 128 | 128 | 8 | | 1,024 | 1,024 | 11 | | 1,048,576 | 1,048,576 | 21 | | 4,294,967,296 | 4,294,967,296 | 33 | :::info - $\lg n = \log_2 n$ ::: ### The Fibonacci Sequence - **Problem**: Determine the $n$th term in the Fibonacci sequence. - **Input**: a non-negative integer $n$. - **Output**: $fib$, $n$th term in the Fibonacci sequence. **Recursive Approach** ```c int fib (int n) { if (n <= 1) return n; else return f(n - 1) + f(n - 2); } ``` - **Efficiency of Recursive Approach** - The **recursion** tree of `fib(5)` ![](https://i.imgur.com/pnx4Zaa.png) - $T(n)$: the number of terms in the recursion tree for $n$. - If the number of terms more than doubled every time $n$ increased by 2, we have the following: $$ \begin{split} T(n) >& 2 \times T(n-2) \\ >& 2 \times 2 \times T(n-4) \\ &\ \vdots \\ >& 2^{n/2} \times T(0) \end{split} $$ - A kind of **divide-and-conquer** approach **Iterative Approach** ```c int fib (int n) { index i; int f[0 .. n]; f[0] = 0; if (n > 0) { f[1] = 1; for (i = 2; i <= n; i++) f[i] = f[i-1] + f[i-2]; } return f[n]; } ``` - Recursive Approach can be implemented without an array $\Rightarrow$ use 2 variable to store `f(i-1)` and `f(i-2)` - But using the array is more clearly ## Analysis of Algorithm - **Time Complexity Analysis** - Measure must be <font>independent</font> of... - Hardware - OS - The programming language - Complier - The programmer - **Basic Operation** - **Some group of instructions** such that **the total work** done by the algorithm is roughly **proportional** to **the number of times this group of instructions** is done. - **Input size problem** - We analyze the algorithm's efficiency by determining **the number of times some basic operation** is done **as a function of the size of input** - In some algorithms, the input sizes are measured by 2 numbers - The number of arcs and nodes in a graph - Sometimes we must be cautious about calling a parameter the input size ### Every-case Time Complexity: $T(n)$ - Can be performed when the basic operation is always done the same number of times for every instance of size n. - 適用於複雜度只和 input size 有關，和 input 內容本身無關的演算法 :::info **Exp**: sum of an array - **Basic operation**: the addition of an item in the array - **Input size**: $n$, the number of items in the array - 不論 array 的內容為何，總是執行 $n$ 次加法 $$ T(n) = n $$ ::: ### Worst-case Time Complexity: *W(n)* The *maximum* number of times the algorithm will do its basic operation for an input size of $n$. :::info **Exp**: Sequential Search - **Basic Operation**: the comparison of an item in the array with $x$ - **Input size**: $n$, the number if items in the array - The basic operation is done **at most** $n$ times - $x$ is the last item in the array - $x$ is not in the array $$ W(n) = n $$ ::: ### Average-case Time Complexity: *A(n)* The *average* of the number of times the algorithm will do its basic operation for an input size of n. #### **Average-case Time Complexity(Sequential Search)** - **Basic operation**: the comparison of an item in the array with $x$. - **Input size**: $n$, the number of items in the array. - **Analysis**: - For $1 \le k \le n$, the probability that $x$ is in the kth array slot is $1/n$. - If $x$ is in the $k$th array slot, the number of times the basic operation is done to locate $x$ (and therefore, to exit the loop) is $k$. - This means that *the average time complexity* is given by: $$ A(n) = \sum_{k=1}^{n}{(k\times\frac{1}{n})} = \frac{1}{n} \times \sum_{k=1}^{n}{k} = \frac{1}{n} \times \frac{n(n+1)}{2} = \frac{n+1}{2} $$ - Case when $x$ may not be in the array: - We assign some probability $p$ to the event that $x$ is in the array. - If $x$ is in the array, the probability that $x$ is in the $k$th slot is $p/n$, and the probability that $x$ is not in the array is $1 - p$. - *The average time complexity* is given by: $$ \begin{split} A(n) &= \sum_{k=1}^{n}{(k\times\frac{p}{n})}+n(1-p)\\ &= \frac{p}{n}\times\frac{n(n+1)}{2}+n(1-p)\\ &= n(1-\frac{p}{2})+\frac{p}{2} \end{split} $$ ### Best-Case Time Complexity: *B(n)* The *minimum* number of times the algorithm will do its basic operation for an input size of n ## [Order](https://medium.com/@yunyubee/%E6%BC%94%E7%AE%97%E6%B3%95-big-o-and-time-complexity-65f2dfafe9d1) :::info - ***Big-O ($O$):*** 演算法時間函式的**上限(Upper bound)** 即**最糟情況**下的執行次數 ***(Worst-case)*** - ***Omega ($\Omega$):*** 演算法時間函式的**下限(Lower bound)** ***(Best-case)*** - ***Theta ($\Theta$):*** 演算法時間函式的**上限與下限** - <font>常見排序： $1 \lt \log{n} \lt n \lt n\log{n} < n^2 < n^3 < 2^n < n!$ </font> ::: ![](https://i.imgur.com/afNRulb.png) ### Types of Order [Types of order examples](https://hackmd.io/@coding-guy/BJ6UOD3Ri) - $g(n)$: your algorithm - $f(n)$: the complexity - **Big O Order (Upper Bound)** **Definition:** $$ g(n) \le c f(n) $$ - **$\Omega$ Omega Order (Lower Bound)** **Definition:** $$ g(n) \ge c f(n) $$ - **$\Theta$ Theta Order (Exact Order)** **Definition:** $$ \Theta(f(n))=O(f(n)) \cap \Omega(f(n)) \\ cf(n) \le g(n) \le df(n) $$ - **Small o Order (Strong Upper Bound)** **Definition:** - For a given complexity function $f(n)$, $o(f(n))$ is the set of all complexity functions $g(n)$ satisfying the following: For every positive real constant *c* there exists a nonnegative integer $N$ such that, for all $n \ge N$, $$ g(n) \le c \times f(n). $$ :::info If $g(n) \in o(f(n))$, we say that $g(n)$ is small o of $f(n)$. Recall that “big O” means there must be some real positive constant c for which the bound holds. This definition says that the bound must hold for every real positive constant c. ==**Because the bound holds for every positive c, it holds for arbitrarily small c.**== (因為邊界對每個正 c 都成立，所以它對任意小的 c 也成立。) For example, if g(n) ∈ o(f(n)), there is an N such that, for n > N, $$ g(n) \le 0.00001 \times f(n). $$ We see that $g(n)$ becomes insignificant relative to $f(n)$ as n becomes large. For the purposes of analysis, if $g(n)$ is in $o(f(n))$, then $g(n)$ is eventually much better than functions such as $f(n)$. The following examples illustrate this. ::: ![](https://i.imgur.com/pZVQlfS.png) :::info **考題** Show that $n$ is in $O(5n)$, but not in $o(5n)$ **Sol**: $$ \text{for } c > 5, n \in O(5n) \\ \text{for } c = \frac{1}{6}, n \notin o(5n) $$ ::: > 推薦閱讀: [台大計概開放課程](https://youtu.be/AlscQi4C-Vo?t=1719)