--- # System prepended metadata title: 演算法_comparision sort tags: [大二, 資工] --- # 演算法_comparision sort ### In-place sorting algorithm > A sorting algorithm is in-place if the numbers are rearranged within the array A, **with at most a constant number of them sorted outside** the array at any time. * In-place: Insertion sort, Quicksort, Heapsort * Not in-place: Merge sort, Counting sort, Radix sort ### worst-case running time > It is an upper bound on the running time. > The worst case occurs fair often > The average case is often as bad as the worst case ### Stable > keys with same value appear in same order in output as they did in input * Stable: Insertion sort, Merge sort, Counting sort, Radix sort * Unstable: Heapsort, Quicksort ### Sorting in Linear Time > violates the ground rules θ(nlogn) --- # Insertion sort * Sorted in place * Loop invariant * At the start of each iteration of the for loop of line 1-8, the subarray A[1,..., j – 1] consists of the elements originally in A[1,..., j – 1] but in sorted order ## pseudocode ```!= Insertion-sort(A) for j⭠2 to length[A] do key⭠A[j] i⭠j-1 while i>0 and A[i]>key do A[i+1]⭠A[i] i⭠i-1 A[i]⭠key ``` ![image](https://hackmd.io/_uploads/Byh-0ge-A.png =60%x) --- # Merge sort * NOT in-place * T(n) = θ(nlogn) ## pseudocode ```!= MERGE(A, p, q, r) n1 ⭠ q – p + 1 n2 ⭠ r – q create array L[1,..., n1 + 1] and R[1,..., n2 + 1] for i ⭠ 1 to n1 do L[i] ⭠ A[p + i – 1] for j ⭠ 1 to n2 do R[j] ⭠ A[q + j] L[n1 + 1] ⭠ ∞ //比較最後一個元素時會用到 R[n2 + 1] ⭠ ∞ i ⭠ 1 j ⭠ 1 for k ⭠ p to r do if L[i] <= R[j] then A[k] ⭠ L[i] i ⭠ i + 1 else A[k] ⭠ R[j] j ⭠ j + 1 MERGE-SORT(A, p, r) if p < r then q ⭠ [(p + r)/2] //取下高斯 MERGE-SORT(A, p, q) MERGE-SORT(A, q + 1, r) MERGE(A, p, q, r) ``` ![image](https://hackmd.io/_uploads/HJs74WgWR.png =80%x) --- # Asymptotic notation * g(n) is an asymptotic **upper bound** for f(n) O(g(n)) = { f(n) : ∃ positive constants c, n0 s.t. ∀ n ≥ n0, **0 ≤ f(n) ≤ cg(n)**} o(g(n)) = { f(n) : ∀ constants c>0, ∃ constants n0>0 s.t. ∀ n ≥ n0, 0 ≤ f(n) ==<== cg(n)} * g(n) is an asymptotic **lower bound** for f(n) Ω(g(n)) = { f(n) : ∃ positive constants c, n0 s.t. ∀ n ≥ n0, **0 ≤ cg(n) ≤ f(n)**} ω(g(n)) = { f(n) : ∀ constants c>0, ∃ constants n0>0 s.t. ∀ n ≥ n0, 0 ≤ cg(n) ==<== f(n)} * g(n) is an asymptotic **tight bound** for f(n) Θ(g(n)) = { f(n) : ∃ positive constants c1, c2, n0 s.t. ∀ n ≥ n0, 0 ≤ c1g(n) ≤ f(n) ≤ c2g(n)} ## Theorem :::success f(n) = Θ(g(n)) if and only if f(n) = O(g(n)) and f(n) = Ω(g(n)) ::: (⇒): f(n) = Θ(g(n)) imply ∃ positive constants c1 , c2 , n0 s.t. ∀ n ≥ n0 , 0 ≤ c1g(n) ≤ f(n) ≤ c2g(n), from 0 ≤ c1g(n) ≤ f(n) , we get f(n) = Ω(g(n)), from f (n) ≤ c2g(n) , we get f (n) = O(g(n)). (⇐): f(n) = O(g(n)) imply ∃ positive constants c1, n1 s.t. ∀ n ≥ n1 , f (n) ≤ c1g(n), f(n) = Ω(g(n)) imply ∃ positive constants c2, n2 s.t. ∀ n ≥ n2 , 0 ≤ c2g(n) ≤ f(n), therefore, let n0 = max{n1, n2}, ∃ positive constants c1, c2 , n0 s.t. ∀ n ≥ n0 , 0 ≤ c2g(n) ≤ f(n) ≤ c1g(n), we get that f(n) = Θ(g(n)). ## Properties #### Transitivity ![image](https://hackmd.io/_uploads/HkW0UGx-R.png =70%x)![image](https://hackmd.io/_uploads/Syu8vzl-R.png =25%x) #### Reflexivity ![image](https://hackmd.io/_uploads/H1LJPfgZ0.png =20%x)![image](https://hackmd.io/_uploads/HJvePMlZ0.png =20%x) #### Symmetry ![image](https://hackmd.io/_uploads/ry8hwMgWA.png =60%x) #### Transpose symmetry ![image](https://hackmd.io/_uploads/SJiCwGgZA.png =60%x) ## Exmaple 1.![217A9031-B47C-4760-B724-45F17759325D](https://hackmd.io/_uploads/HJgmIGxbC.jpg) 2.![F04A871A-FE82-4114-A246-92B331A85BC2](https://hackmd.io/_uploads/ByUJ7XlZ0.jpg) 3.![image](https://hackmd.io/_uploads/SyYnQ7gbC.png) --- # Recurrences ## Substitution method 1. Guess the solution 2. Upper bound、Lower bound ![3A364978-8BC9-4B47-AD08-9E929DE529BF](https://hackmd.io/_uploads/BJ-8qRlbA.jpg) ## Recurrence trees ![image](https://hackmd.io/_uploads/SJudB1--A.png) ## Master method > 取最大的，差$lg^{k}n$平手變$lg^{k+1}n$ :::info T(n) = aT(n/b) + f(n), Where a≥1, b>1, and f(n)>0. ::: * Case 1：$f(n) < O(n^{log_ba})$ Solution：$T(n) = O(n^{log_ba})$ * ==Case 2：$f(n) = O(n^{log_ba}lg^{k}n),where\ k ≥ 0$== 平手 ==Solution：$T(n) = O(n^{log_ba}lg^{k+1}n)$== * Case 3：$f(n) > O(n^{log_ba})$ Solution：$T(n) = O(f(n))$ --- # Heapsort * Heap A is a nearly complete **binary tree** * Height of node = # of edges on a longest simple path from the **node down to a leaf** * Height of heap = height of root = Θ(lgn) * Maximum numbers of elements in a heap of height h = $2^{h+1}$ − 1 * property * Max-heap: for all nodes i, excluding the root, A[PARENT(i)] ≥ A[i]. * Min-heap: for all nodes i, excluding the root, A[PARENT(i)] ≤ A[i]. * A heap can be stores as an array A: * Root of trees is A[1] * Parent of A[𝑖] = A[⌊𝑖/2⌋] * Left child of A[𝑖] = A[2𝑖] * Right child of A[𝑖] = A[2𝑖+1] ## MAX-HEAPIFY > 先找自己、左右的child誰最大，最後把最大的換上來 * Time: O(lg n) ```!= MAX-HEAPIFY(A, i, n) l ⭠ LEFT(i) r ⭠ RIGHT(i) if l ≤ n and A[l] > A[i] then largest ⭠ l else largest ⭠ i if r ≤ n and A[r] > A[largest] then largest ⭠ r if largest ≠ i then exchange A[i]⭤ A[largest] MAX-HEAPIFY(A, largest, n) ``` ![image](https://hackmd.io/_uploads/HkWnsyWWC.png) ## BUILD-MAX-HEAP > 從最小的子樹走上去 * **Time: O(n)** //Heapify takes a different time for each node * Loop invariant * At start of every iteration of for loop, each node i + 1, i + 2,..., n is root of a max-heap ```!= BUILD-MAX-HEAP(A, n) for i ⭠ ⌊n/2⌋ downto 1 do MAX-HEAPIFY(A, i, n) ``` ![image](https://hackmd.io/_uploads/r1iqTk-bR.png) ## HEAPSORT > 跟最小的node換，再HEAPIFY * Time: O(nlgn) ```!= BUILD-MAX-HEAP(A, n) for i ⭠ n downto 2 do exchange A[1] ⭤ A[i] MAX-HEAPIFY(A, 1, i – 1) ``` ![image](https://hackmd.io/_uploads/B1xNZe-bC.png) ## Implementation of Priority Queue ### Max Priority Queues * INSERT(S, x): inserts element x into set S * 插入新的node，**設為–∞**，再進行HEAP-INCREASE-KEY * O(lg n) * MAXIMUM(S): returns elements of S with largest key * return Root * Time: Θ(1) * EXTRACT-MAX(S): removes and returns element of S with largest key * 把root跟最小node交換，pop largest key，再MAX-HEAPIFY * Time: O(lg n) * INCREASE-KEY(S, x, k): increases value of element x’s key to k. Assume k ≥ x’s current key value * 跟parent比，若key較大就交換 * Time: O(lg n) ```!= INSERT(A, key, n) A[n + 1] ⭠ –∞ INCREASE-KEY(A, n + 1, key) MAXIMUM(A) return A[1] EXTRACT-MAX(A, n) if n < 1 then error “heap underflow” max ⭠ A[1] A[1] ⭠ A[n] MAX-HEAPIFY(A, 1, n – 1) return max INCREASE-KEY(A, i, key) if key < A[i] then error “new key is smaller than current key” A[i] ⭠ key while i > 1 and A[PARENT(i)] < A[i] do exchange A[i] ⭤ A[PARENT(i)] i ⭠ PARENT(i) ``` --- # Quicksort > A[j]小於pivot, i = i+1, A[i]⭤ A[j] * Sorts in place * Loop invariant * Time (Quicksort) * Worst-case running time: Θ($n^2$) * Expected running time: Θ(nlg n), constants hidden in are small * Time (PARTITION): Θ(n) * For loop iterates exactly r−p times, each iteration takes constant time. * The section outside the for loop also takes constant time. * As r−p represents the size of the subarray, PARTITION takes time proportional to the size of the subarray it's called on ```!= QUICKSORT(A, p, r) if p < r then q ← PARTITION(A, p, r) QUICKSORT(A, p, q-1) QUICKSORT(A, q+1, r) PARTITION(A, p, r) x ← A[r] i ← p – 1 for j ← p to r – 1 //A[r] is pivot do if A[j] ≤ x // 小於pivot, 換去前面 then i ← i +1 exchange A[i]⭤ A[j] exchange A[i+1]⭤ A[r] return i + 1 ``` ![image](https://hackmd.io/_uploads/H1xKXk---R.png =50%x)![image](https://hackmd.io/_uploads/BkhDyW--A.png =45%x) ### Performance #### Worst case 𝑇(𝑛) = 𝑇(𝑛 − 1) + 𝑇(0) + 𝑂(𝑛) = **𝑂($𝑛^2$)** * Have 0 elements in one subarray and n-1 elements in the other subarray * When input is sorted in either **ascending order** or **descending order**. * **Insertion sort** runs in O(n) time in this case #### Best case T(n) = 2T(n/2) + Θ(n) = **Θ(nlgn)** * Each subarray has ≤ n/2 elements #### Balanced case T(n) ≤ T(9n/10) + T(n/10) + Θ(n) = **O(nlgn)** * Imagine that PARTITION always produces a 9-to-1 split ## Randomized Quicksort * an **already-sorted array** causes worst-case behavior in non-randomized QUICKSORT, but not in RANDOMIZED QUICKSORT * Why analyze the expected running time of a randomized algorithm? * because it's more representative of **typical time costs**. * Additionally, considering randomness in computation **prevents adversarial input generation** compared to analyzing all possible inputs ```!= RANDOMIZED-PARTITION(A, p, r) i ← RANDOM(p, r) exchange A[r]⭤ A[i] return PARTITION(A, p, r) RANDOMIZED-QUICKSORT(A, p, r) if p < r then q ← RANDOMIZED-PARTITION(A, p, r) RANDOMIZED-QUICKSORT(A, p, q-1) RANDOMIZED-QUICKSORT(A, q+1, r) ``` ### Performance #### ==Worst case== 必考 ![image](https://hackmd.io/_uploads/BJzzwr-Z0.png =60%x) ![image](https://hackmd.io/_uploads/S16b9BZ-0.png =35%x) ![image](https://hackmd.io/_uploads/SkCGqHWZR.png =50%x) ![image](https://hackmd.io/_uploads/HyiN5HZbC.png =90%x) * When the (n/4)th smallest element is selected as the pivot using an O(n) time algorithm * recursion expression is T(n) = T(n/4) + T(3n/4) + cn. * upon solving the recursion, determine that the time complexity is θ(nlogn) * Prove that the running time of QUICKSORT is O(n^2) when the array A is sorted in non-increasing order * the partitioning routine produces one region with n - 1 elements and one region with 1 element * This results in a recursion tree with a height of O(log n), which gives a worst-case running time of O(n^2) #### Average-case --- # Decision tree * Abstraction of any comparison sort * counting only comparisons * Each internal node is labeled by indices of array elements from their original positions ![91C0485D-DC86-4A68-B7D3-D747175EBFD9](https://hackmd.io/_uploads/rJci98Wb0.jpg) * **Big-O = Decision tree 的樹高 (Root到Leaf最長的Path) = O(n lg n)** * the smallest possible depth of a leaf in a decision tree for a comparison sort * n-1 for an input array of length n * 對一個 leaf 而言，best case 為去比較sort好的數列，只要比較 n-1 次就可以判斷大小順序完成 sorting，其depth 為 n-1(假設 root depth 為 0) ## Theorem > Any decision tree that sorts n elements has height Ω(nlgn) > **Heapsort** and **merge sort** are asymptotically optimal comparison sort * **≧ n!** leaves on the decision tree * because every permutation appears at least once * Lemma: Any binary tree of height h has ≦ $2^h$ leaves * l = # of leaves, h = height, Then l ≦ $2^h$ #### ==Proof== 必考 :::info l ≧ n! By lemma, n! ≦ l ≦ $2^h$ or $2^h$ ≧ n! Take logs: h ≧ lg(n!) Use Stirling’s approximation: n! ＞ $(n/e)^n$ (by equation (3.16)) h ≧ $lg(n/e)^n$ = n lg(n/e) = n lg n － n lg e = Ω( n lg n) ::: ---