changed 2 years ago
Published Linked with GitHub

演算法 - 荊宇泰 (2022 Fall)

tags: NYCU-2022-Fall

Class info.

課程資訊

One midterm exam, one final exam (70% of final score).

At most 3 programming assignments, some homework (reading assignments), and some quizs.

Date

9/16

課程介紹,期中要多努力,期末難

9/23

In Computer Science, computer algorithms defined under the Random Access Machine (RAM) model. RAM, Random Access Machine, there are finite number of instructions, \(+, -, *, ...\).
These instructions are basic enough (assembly language of X86).

在隨機存取圖靈機上,多了一個特殊的指針磁帶,大小是對數空間,字母是二進位單字(0和1)。圖靈機有一個特殊的狀態(state),當指到這個狀態而指針磁帶的數字(二進位)是’p’時,圖靈機會將工作磁帶上面的指針移動到輸入的第p個符號。
這特性讓圖靈機可以直接讀取輸入的特定字母,而不需要花時間去處理整條輸入。這對使用少於線性時間的複雜度類來說,是必要的(因爲處理整個輸入的時間是線性時間)

Correctness of algorithm doesn’t imply the correctness of the program.

Calculate the number of operations required and present the time required as a function of the input size.

Worst case, each time you have a key which is smallest among the previous sorted list, and thus you have to compare all.
Generally follow the worst case convention.

Θ-Notation, asymptotic tight bound

Given a function \(g(n), Θ(g(n))\) is the set of functions, \(Θ(g(n)) = \{f (n) \ | \ ∃ \ positive \ constants \ c_1, c_2, and \ n_0 \ s.t. \ 0 ≤ c_1 · g(n) ≤ f (n) ≤ c_2 · g(n), ∀n > n_0\}\)

9/30

\(\theta(n^2) \ \newcommand*\xor{\oplus} \ \rm{O}(n^2) \rightarrow \theta(n^2)\)
兩集合取交集

Insertion Sort
stable
Selection Sort
stable(老師說的"當=時就交換就能簡單變成stable",但查到的都unstable)

Binary Search

// g++ cpp-binary-search.cpp -o a.out -std=c++11
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int binary_search(const vector<int> &data, int key) {
    int low = 0;
    int high = data.size()-1;
    while (low <= high) {
        int mid = int((low + high) / 2);
        if (key == data[mid])
            return mid;
        else if (key > data[mid])
            low = mid + 1;
        else
            high = mid - 1;
    }
    return -1;
}

int main() {
    vector<int> data = {1, 9, 2, 7, 4, 10, 3, 8, 5, 6};
    int key = 7;
    
    sort(data.begin(), data.end());
    
    for (auto &i : data)
        cout << i << " ";
    cout << "\n";

    int ret = binary_search(data, key);
    if (ret == -1)
        cout << "找不到\n";
    else
        cout << "找到索引值" << ret << "\n";
}

Merge Sort
stable
Divide and Conquer
Divide and Conquer

Homework assignment: Prove the master theorm in the book
(check between simplified master theorem) \(LaTex \rightarrow Overleaf\)


Quick Sort
unstable
only Divide
Quick Sort Pseudocode

QUICKSORT(A, p, r)
    if p < r
        q = PARTITION(A, p, r)
        QUICKSORT(A, p, q - 1)//老師投影片寫q而非q-1
        QUICKSORT(A, q + 1, r)
PARTITION(A, p, r)
    x = A[r]
    i = p - 1
    for j = p to r - 1
        if A[j] <= x
            i = i + 1
            exchange A[i] with A[j]
    exchange A[i+1] with A[r]
    return i + 1

Quick Sort provement
為了避免recursion太多導致負擔太大
可以設定當subproblem的長度小於x時使用insertion sort
以及為了減少worst case的發生 (T(n)=T(n-1)+n) => O(n^2))
使用Randomized Quick Sort隨機切分成兩個subproblem
例如T(n)=T(n/4)+T(3n/4)+n => O(nlogn)
但這方法仍可能有worst case

Analysis of Quick Sort
用Random Variable

10/7

Heap
Complete binary tree:
Full binary tree + leaf全靠左

Heap Sort
unstable

Counting Sort
stable

Radix Sort
stable
針對一堆整數,從最小位數到最小位數做Counting Sort
為啥這樣有效?
prove by induction:
induction on digit(位數)
假設後面i位都排好,考慮i+1位時,若兩筆資料大小不一樣則排序完成,否則兩筆資料一樣,則根據Counting Sort是stable的特性,後面i位數也都會是排好的。

Largest Gap in an array \(O(N)\)
非max-gap,關於min-gap/max-gap在Other Problem

# A python 3 program to find largest gap between
# two elements in an array.

# function to solve the given problem
def solve(a, n):

	min1 = a[0]
	max1 = a[0]

	# finding maximum and minimum of an array
	for i in range ( n):
	
		if (a[i] > max1):
			max1 = a[i]
		if (a[i] < min1):
			min1 = a[i]
	
	return abs(min1 - max1)

# Driver code
if __name__ == "__main__":

	arr = [ -1, 2, 3, 4, -10 ]
	size = len(arr)
	print("Largest gap is : " ,solve(arr, size))

# This code is contributed by chitranayal

Homework assignment: Leetcode K closet pair in \(\rm{O}(nlogn)\)
Using C/C++ as language

10/14

Selection Problem


若要找第i大的數字,使用sorting來解的話會保證O(nlogn),為何是big-O而非big-omega?因為這個問題可能有非sorting的解法。

找最大、最小數是theta(n)因為必定要跟n-1個數字比較
但是同時找最大最小只需要3ceiling(n/2)而非2n-2
不過老師說他不知道怎做的不是重點

Homework assignment: Prove un-used nodes always more than used nodes

CLRS answer

  • Tree
    • Binary search tree
    • High balanced tree
      • AVL tree
      • Red-Black tree
      • 2-3 tree or 2-3-4 tree

期中考11月第一周 (11/4)

考題範圍:

10/21

  • 2-3-4 tree vs red black tree
    One disadvantage for using 2-3-4 tree is that there will have lots of nodes which are unused.
    Waste memory, comparing with red balck tree.

  • Concatenable queue
    This data structure can execute concatenate() and split() in O(logn)


  • Advanced Design and analysis techniques
    • Dynamic Programming
    • Greedy Approach
    • Amortized Analysis

Development of the dynamic programming algorithm is broken into
a sequence of 4 steps.

  1. Characterize the structure of an optimal solution.
  2. Recursively define the value of an optimal solution.
  3. Compute the value of an optimal solution in the bottom-up fashion.
  4. Construct an optimal solution from computed information.

Random Variable

The hiring problem
機率、期望值

Other Problem

Max-gap/Min-gap
把所有資料normalization((k-min)/(max-min))後
放入一個個bucket中
根據鴿籠原理必有一個bucket是空的
而Max-gap必定橫跨這個空bucket
接著用O(n)把所有bucket中最大最小的值找出來
再把最大跟最小兩兩配對算距離(最多n個)即可找到Max-gap

花花酱 LeetCode 164. Maximum Gap

11/11

Greedy Algorithm

  • Huffman Code

Amortized Analysis

Homework 03 Table Expansion and Contraction proof

原文書 17.4 Dynamic tables

some info.
some powerpoint

11/18

Union and Find

  • Homework assignment

Binomial Heap / Fibonacci Heap


期末整理

Greedy algorithm

  • Dynamic programming: A sequence of choices, carefully make choice.
  • Greedy approach: Make choice that looks the best at the moment, and it leads to optimal solution.

Activity-Selection problem

  • Goal: To select a maximum-size set of mutually compactable activities
  • Given a set \(S = \{1, 2, . . . , n\}\) of \(n\) proposed activities that wish to use a resource.
  • Each activitiy \(i\) has a start time \(s_i\) and a finish time \(f_i\), \(s_i < f_i\), \(([s_i, f_i))\).

Dynamic Programming approach

  • Define \(\mathcal{S}_{i, j} = \{ a_k \in \mathcal{S} \ | \ f_i \lt s_k \lt f_k \le s_j \}\)

    • \(\mathcal{S}_{i, j}\) a subset of activities in \(\mathcal{S}\), which start after \(a_i\) finishes, and finish before \(a_j\) starts
    • \(\mathcal{S}_{i, j}\) consists of all activities
  • Add activities \(a_0\) and \(a_{n + 1}\), \(f_0\) = 0, \(s_{n+1} = \infty\)

    • Suppose that activities are sorted. \(f_0 \le f_1 \le ... \le f_n \le f_{n + 1}\), then \(\mathcal{S}_{i, j} = \emptyset\) when \(i \ge j\).
    • Thus we are looking for a maximum-size subset of mutually compatible activities
  • The substructure of the optimal solution

    • For a non-empty subproblem \(\mathcal{S}_{i, j}\)
    • Suppose a solution to \(\mathcal{S}_{i, j}\) includes \(a_k\), \(f_i \le s_k \lt f_k \le s_j\)
    • Using \(a_k\) generate 2 subproblems \(\mathcal{S}_{i, k}\) and \(\mathcal{S}_{k, j}\)
    • So we'll change the solution for \(\mathcal{S}_{i, j}\) to both \(\mathcal{S}_{i, k}\) and \(\mathcal{S}_{k, j} + 1\) (for \(a_k\))
    • Furthermore solutions to \(\mathcal{S}_{i, k}\) and \(\mathcal{S}_{k, j}\) must be optimal
Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Theorem

Consider a nonempty subproblem \(\mathcal{S}_{i, j}\) with earliest finish time \(f_m = \min \{ f_k \ | \ a_k \in \mathcal{S}_{i, j}\}\). Then

  1. Activity \(a_m\) is used in some maximum-size subset of mutually compatible activities of \(\mathcal{S}_{i, j}\)
  2. The subproblem \(\mathcal{S}_{i, m}\) is empty, such that choosing \(a_m\) leaves the subproblem \(\mathcal{S}_{m, j}\) as the only one that may not be empty.
Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Greedy approach

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Knapsack problem

  • 0-1 (whole or nothing) knapscak can not be solved using greedy approach.
  • fractional knapsack (can take a fraction), compute the “value per pound”, \(\frac{v_i}{w_i}\), greedy approach calculates the optimal solution.
Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Huffman Code

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

  • Cost of the tree: \(B(T) = \sum_{c \in C} f(c)d_{T}(c)\)
    • \(d_T(c)\) is the depth of the leaf representating \(c\)
    • \(f(c)\) is the frequency of osccuring of \(c\)

Prefix Code

  • No codeword is a prefix of some other codewords.
  • It can be shown that in optimal data compression, code is prefix code.

Proof of the Greedy Choice Property

Proof of the Optimal substructure property

Amortized Analysis

  • Time required to perform a sequence of n operations to a data structure.
  • In the sequence of operations, some cost a lot, some do not.
  • Amortized cost is to calculate the total cost for a sequence of \(n\) operations, then take the average by dividing \(n\).

Aggregate Method

Determine an upper bound \(T(n)\) for \(n\) operations. Amortized cost is \(\frac{T(n)}{n}\).

Accounting Method

  • Determine an amrotized cost for each operation.
  • Some operations are over charged, “pre-paid cost”.
  • Prepaid credit should cover the cost for operations that are less charged.

Potential Method

  • The credit is maintained as “Potential Energy” associated with the data structure.
  • The data structure should always have positive potential.

  • Start with an initial data structure \(D_0\)
  • \(n\) operations are performed
  • \(c_i\) is the actual cost for operation \(i\).
  • \(D_i\) is the data structure after applying operation \(i\) to \(D_{i−1}\).
  • A potential function \(\Phi\) maps each data structure \(D_i\) to a real number \(\Phi(D_i)\).
  • Amortized cost \(\hat{c}_i\) of the \(i\)th operation, which \(\hat{c}_i = c_i + \Phi(D_i) - \Phi(D_{i-1})\)

  • The total amortized cost of \(n\) operations is
    \[ \begin{aligned} \sum_{i=1}^n \hat{c}_i & =\sum_{i=1}^n\left(c_i+\Phi\left(D_i\right)-\Phi\left(D_{i-1}\right)\right) \\ & =\sum_{n=1}^n c_i+\Phi\left(D_n\right)-\Phi\left(D_0\right) \end{aligned} \]
  • if \(\Phi(D_n) \gt \Phi(D_0)\) then total amortized cost \(\sum_{i=1}^n \hat{c}_i\) is an upper bound.
  • if \(\Phi(D_i) \lt \Phi(D_0) \forall i\) then we guarantee that the prepaid can cover the later.
  • if \(\Phi(D_0) = 0\), we only need to argue \(\Phi(D_i) \ge 0\)

Stack operation

Aggregate method

  1. Assume that we start with an empty stack, a better bound can be derived.
  2. At most \(O(n)\) PUSHES, the number of POPS and MULTIPOPS can only less than the number of PUSHES
  3. So the total cost is \(O(n)\), amortized cost is \(\frac{O(n)}{n} = O(1)\).

Accounting Method

  1. Give the amortized cost, PUSH=2, POP=0, MULTIPOP=0.
  2. Each operation will be paid for 1 dollar.
    So I prepay 2 dollars to the person move plates to the stack.
  3. The person who moves the plate out of the stack takes that dollar.
  4. Every one gets his one dollar, amortized cost is \(O(1)\), so the total cost \(< 2n\).

Potential Method

  1. Define \(\Phi\) on a stack, "the number of objects in the stack", \(\Phi(D_i) \ge 0\).
  2. Start with an empty \(\Phi(D_0) = 0\), \(\Phi\) is an upper bound on the actual cost.
  3. Now the amortized cost for various stack operations, assume that the \(i\) operation on a stack containing \(s\) objects

Incrementing a Binary Counter

Aggregate method

  • Implement a \(k\)-bit binary counter.

Accounting Method

  • Charge amortized cost 2 dollars to change from 0 to 1.
  • One of the two dollars is for changing from 0 to 1 and the other is for changing back from 1 to 0 (to reset).
  • So the amortized cost for incrementing operation takes at most 2 dollars.

Potential Method

  • Q: \(\Phi(D_0) = 0 \ ?\)
    • A: Since the potential function represent the number of 1, such that \(\Phi(D_0)=0\)
  • Q: \(\Phi(D_i) \ge 0, \forall i \ ?\)
    • A: Based on the definition of potential function

Dynamic Table

  • Store objects into a table, but we don’t know the number of objects will be inserted.
  • When the table is full, we need to expand the table, which the loading factor \(\alpha(T) = 1\).
  • When there are many free slots, we need to contract the table.
  • The loading factor \(\alpha(T)\) of the table, number of items in the table divided by the table size.

Table Expansion, while inserting ith item

  • Definition size[T]: size of the table, num[T]: number of items in the table.
  • if num[T] = size[T], we allocate a new table of size = 2size[T], copy every item to the new table, then insert the ith item into the new table, otherwise we just insert the ith item.
  • Cost
    • if expansion, size[T] +1
    • if no expansion, 1
Aggregate Method

Accounting Method
  • 1 dollar for the insertion, 1 prepaid dollar for moving itself in the next expansion, the last prepaid dollars is to help the other one (one of the “seniors”) in the next expansion.

Potential Method

Table Expansion

  • \(\Phi(T) = 2 \cdot num[T] - size[T]\), and \(\Phi(T) = 0\) after expansion.
  • Table is always at least half-full, so \(\Phi(T)\) always greater than 0.

If the ith operation trigger expansion, the \(size_i = 2 \cdot size_{i-1}\)

and \(size_{i-1} = num_{i-1} = num_{i} - 1\), which implies that \(size_i = 2 \cdot (num_i - 1)\)

\(\begin{aligned} \hat{c}_i & =c_i+\Phi_i-\Phi_{i-1} \\ & =\text { num }_i+\left(2 \cdot \text { num }_i-\text { size }_i\right)-\left(2 \cdot \text { num }_{i-1}-\text { size }_{i-1}\right) \\ & =\text { num }_i+\left(2 \cdot \text { num }_i-\left(2 \text { num }_i-2\right)\right)-\left(2\left(\text { num }_i - 1 \right)-\left(\text { num }_i-1\right)\right) \\ & =\text { num }_i+2-\left(\text { num }_i-1\right) \\ & =3 .\end{aligned}\)

Table Expansion and Contraction

  • If we contract the table when the number of objects is less than half of the table size, loading factor is no less than 1/2.
  • Contraction: halve the size of the table when a deletion causes the loading factor below 1/4.
  • Loading factor: \(\alpha[T] = \frac{num[T]}{size[T]}\)

Insertion Deletion
Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

MergeableHeaps

Properties of binomial trees (provement)

  • There are \(2^k\) nodes.

  • The height of the tree is \(k\)

  • There are exactly \(\begin{pmatrix} k \\ i \end{pmatrix}\) (binomial coefficient) nodes at depth \(i\) for \(i = 0,1, ..., k\)

  • The root has degree \(k\), which is greater than that of any other node; moreover if the children of the root are numbered from left to right by \(k − 1, k − 2, . . . , 0\), child \(i\) is the root of the subtree \(B_i\).

  • The maximum degree of any node in an \(n\)-node binomial tree is \(\log n\)
    \(\Rightarrow\) Proof Immediate from Properties 1 and 4 of previous lemma.

Binomial Heap

  • The root of a min-heap-ordered tree contains the smallest key.
  • An n-node binomial heap H consists of at most \(\lfloor \log n \rfloor + 1\) binomial trees.
  • A node x has a pointers p[x] to its parent, child[x] to its leftmost child, and sibling[x] to the sibling of x immediately to its right, a field degree[x], which is the number of children of x.

Fibonacci heap

YT Course

  • Inserting a node

  • Uniting two Fibonacci heaps

  • Extracting the minimum node

  • Decreasing a key

  • Fib-Heap-Delete amortized time is \(O(\log n)\)

To prove the \(\log n\) upper bound becomes not trival

Union and Find

CLRS Notes

Prove the Time Bound (Amortized Analysis)

  • Potential Function \(\Phi_q(x)\), assigned to each node x in the disjoint-set forest after q operations.
  • If x is a tree root after the qth operation or if rank[x] = 0, then \(\Phi_q(x) = \alpha(n) \cdot\) rank[x].

Graph

Sorting Network

A comparator, a comparison is done in constant time.

  • Our goal in this chapter: Design Sorter[n]: a sorter (a sorting network) that sorts n numbers.

Zero-One Principle

Bitonic Sorting Network

Merging Network

NP

Select Topics Note

  • Decision problem
  • TSP
  • Hamilton cycle
  • 3-SAT
  • Vertex Cover
  • CLQUE

Reference

原文書電子檔

原文書解答

筆記 1

筆記 2

Select a repo