Try   HackMD

Greedy Algorithm

Concept

  • builds up a solution in small steps, choosing a decision at each step myopically to optimize some underlying criterion
  • Intuitive and fast
  • Usually not optimal
  • prove optimality
    1. The greedy algorithm stays ahead.
    2. An exchange argument.

Interval Scheduling Problem

  • Given: Set of requests {1, 2, , n}, ith request corresponds to an interval with start time s(i) and finish time f(i)
    • interval i:
      [s(i),f(i))
  • Find a compatible subset of requests of maximum size
    • compatible: 不重疊
    • maximum: optimal
  • Repeat
    • Use a simple rule to select a first request i1
    • Once i1 is selected, reject all requests incompatible with i1.
  • Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →
  • Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →
    • A is a compatible set of requests because of line 5
  • prove optimality
    • greedy algorithm stays ahead
    • 假設有最佳解O,證明|A| = |O| (不一定要完全相同,只要cardinality一樣就好)
    • First lemma:
      Image Not Showing Possible Reasons
      • The image file may be corrupted
      • The server hosting the image is unavailable
      • The image path is incorrect
      • The image format is not supported
      Learn More →
      • 還不知道k跟m誰大誰小
      • proof by induciton:
        • Basis step: for r = 1, f(i1) ≤ f(j1)
          • 因為i1必是所有人中結束時間最早的,而j1只是從O這個subset排序出來結束時間最早的
        • 假設r-1時成立: f(ir-1) ≤ f(jr-1) (1)
          • 因為O is compatible
            f(jr-1) ≤ s(jr) (2)
          • (1) + (2)
            f(ir-1) ≤ s(jr)
            jr跟ir-1不重疊
            jr
            R after ir-1 is selected in line 5
          • According to line 3,f(ir) ≤ f(jr)
        • jr跟ir都在R裡(因為都跟ir-1 compatible),但greedy最後選擇ir,所以f(ir)<=f(jr)
      • Image Not Showing Possible Reasons
        • The image file may be corrupted
        • The server hosting the image is unavailable
        • The image path is incorrect
        • The image format is not supported
        Learn More →
      • small conclusion: greedy結果的每個finish time都小於O的finish time
    • The greedy algorithm returns an optimal set A
      • Proof by contradiction
        • 假設A不是最佳解
          |O|= m > k = |A|
        • 因為f(ik)≤f(jk)而且|O| > |A|
          O裡至少有jk+1
          • Image Not Showing Possible Reasons
            • The image file may be corrupted
            • The server hosting the image is unavailable
            • The image path is incorrect
            • The image format is not supported
            Learn More →
        • f(ik) ≤ f(jk) ≤ f(jk+1)
          jk+1跟ik compatible
          jk+1
          R
        • greedy只有在R為空時才停止,但是卻在ik就停止了,矛盾
  • Running time
    • O(n2):

      • Initialization:
        • O(n log n): sort R in ascending order of f(i) (根據finish time排序)
      • line 5:
        • 第一次檢查(n-1)個點跟i是否compatible,第二次(n-2)總共O(n2)
    • O(n log n):

      • Initialization:
        • O(n log n): sort R in ascending order of f(i) (根據finish time排序)
        • O(n): construct S, S[i] = s(i)
          • 記錄所有人的start time
      • Lines 3 and 5:
        • O(n): 只需掃過R一次
        • 檢查compatible: 檢查f(i)是否小於s(j)
        • 只檢查下一個(j=i+1)是否跟i compatible,其餘的交給後面檢查

Interval partitioning

  • multiple resources

  • use as few resources as possible

  • a.k.a. the interval coloring problem: one resource = one color

  • Given: Set of requests {1, 2, , n}, ith request corresponds an interval with start time s(i) and finish time f(i)

    • interval i:
      [s(i),f(i))
  • Goal: Partition these requests into a minimum number of compatible subsets, each corresponds to one resource

    • 用最少資源跑完所有工作
  • depth: a set of intervals is the maximum number that pass over any single point on the time-line.

    • 任意時刻需要的最大資源數(定義上不等於總共所需資源數)
  • Lemma: 在interval partitioning的問題下,總共所需資源至少為depth (lower bound)

    • proof:
    • 在產生depth的那個時間點,有d個intervalI1, , Id都經過該時間點
    • 這d個interval都需要resources,所以至少需要d個這d個interval都需要resources
  • Use only d resources (depth) to schedule all requests

    • prove the optimality
      1. 找到一個所有解都必須符合的bound
      2. 證明該演算法總是可以達到bound的結果
  • Greedy Algorithm

    • Image Not Showing Possible Reasons
      • The image file may be corrupted
      • The server hosting the image is unavailable
      • The image path is incorrect
      • The image format is not supported
      Learn More →
      1. 把interval根據starting time排序
      2. 對1到n中第j個interval:
      3.     
        把所有跟j不compatible的label排除
      4.     
        如果能從{1, 2, , d}中找到沒被排除的label
      5.         
        把label assign給j
      6.     
        否則不label
    • Lines 3~5 implementation: 找到compatible with j的resource並label
      • 紀錄f(labeli): labeli的最後一個interval的finish time
      • check compatibility: f(labeli) ≤ s(Ij)
  • 找出depth:

    • Image Not Showing Possible Reasons
      • The image file may be corrupted
      • The server hosting the image is unavailable
      • The image path is incorrect
      • The image format is not supported
      Learn More →
    1. 把finish time跟start time一起排序
      • 若有finish time跟start time剛好切在同個時間點,則finish time優先
    2. 從頭掃到尾,resource遇到start time表示使用資源就加1,遇到finish time表示釋放資源就減1,紀錄resource的max值即為depth
  • Optimality

    • The greedy algorithm assigns every interval a label, and no two overlapping intervals receive the same label.
      • proof:
        1. 所有interval都會得到label(line 6不會發生)
          • 假設interval Ij跟前面的t個interval overlap
            t+1個interval會重疊於s(j)
          • t+1 ≤ depth
            t ≤ depth - 1
            至少還剩一個label可以assign給Ij
        2. 兩個overlap的interval不會得到相同的label
          • 假設Ii跟Ij overlap,i < j
          • 當for迴圈執行到Ij時,Ii在line 3會被exclude(因為兩個overlap)
          • 所以不可能Ii跟Ij有相同label
      • 因為greedy只需要depth數量的label,符合lower bound
        Optimal!
  • Interval graph (Intersection graph)

    • 每個interval對應一個node,若兩個interval overlap,則用edge連接兩個node
    • 相鄰的兩個點表示其overlap,必須assign不同顏色(resource),可轉換成coloring問題

Scheduling to Minimize Lateness

  • Given: A single resource is available starting at time s. A set of requests {1, 2, , n}, request i requires a contiguous interval of length ti and has a deadline di
    • 所需作業時間及期限
  • Goal: Schedule all requests without overlapping so as to minimize the maximum lateness
    • Lateness: li = max{0, f(i) – di}.
    • 遲交時間最小化
  • Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →
    1. 所需時間短的不一定緊急
    2. 餘裕最小(越緊急的越先做): 若所需時間長,可能使其他工作遲交的時間太長
  • Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →
    • no idle time
      • Image Not Showing Possible Reasons
        • The image file may be corrupted
        • The server hosting the image is unavailable
        • The image path is incorrect
        • The image format is not supported
        Learn More →
      • optimal的解如果中間有idle time,則idle time可被壓縮掉,不影響optimality
  • Exchange argument: Gradually transform an optimal solution to the one found by the greedy algorithm without hurting its quality.
    • 對一個optimal solution,把其中不符合greedy rule的部分換成greedy的形式,仍能保持optimality
  • inversion (倒轉): An inversion in schedule S is a pair of requests i and j such that s(i) < s(j) but dj < di
    • 比較早開始但是比較晚才要交
  • Lemma 1: All schedules without inversions and without idle time have the same maximum lateness.
    • proof:
      • Image Not Showing Possible Reasons
        • The image file may be corrupted
        • The server hosting the image is unavailable
        • The image path is incorrect
        • The image format is not supported
        Learn More →
      • 如果兩種schedule沒有晚做但是要早交的情況,且中間沒有idle time,則這兩種schedule皆以deadline順序做排程,且只會在deadline相同的request上有不同的排序
      • 因為deadline時間相同,且所有相同deadline的request所需時間總和相同,所以lateness不會因schedule的順序改變
  • Lemma 2: There is an optimal schedule with no inversions and no idle time
    • Image Not Showing Possible Reasons
      • The image file may be corrupted
      • The server hosting the image is unavailable
      • The image path is incorrect
      • The image format is not supported
      Learn More →
      1. 如果有inversion發生在job i 跟 j,假設j被排在i之後,且dj < di
      2. 把 i 跟 j 互換可以消除inversion
      3. 因為dj < di,所以把j換到i前面不可能增加lateness (甚至可能減少lateness),可維持optimality
      • 即使不相鄰也能有相同結論
  • Theorem: The greedy schedule S is optimal.
    • proof by contradiction:
      • 假設O 為有inversion的optimal solution
      • 假設O沒有idle time (idle time可被壓縮)
      • 若O沒有inversion,則S = O (因為greedy的結果即為沒有inversion沒有idle time,又根據Lemma 1,可推得S 跟 O 有相同maximun lateness)
      • 若O有inversion,則當我們把inversion pair互換時,maximun lateness又可能變更小,與optimal的前提矛盾

Shortest Paths

  • Given:

    • Directed graph G = (V, E)
    • Length le = length of edge e = (u, v)
      E
      • 成本(距離、時間) ≥ 0
    • Source s
  • Goal:

    • 從s到V中其他點的最短路徑Pv
      • Length of path: path經過的length總和
    • 檢查所有從S裡的node u到不在S裡的node v的最小距離d(u)+le,把距離最小的node v放進S,更新d(v)
  • Correctness

    • 一旦node u被放進S,d(u)即為最短距離,不會再因放入新的node改變
    • Proof by induction on |S|:
      • Basis step: S只有s在裡面,d(s)=0
      • Inductive step:
        • 假設|S| = k ≥ 1時敘述為真

        • 令下一個要被放進S的node為v,(u, v)為從s到v的最短路徑Pv的最後一段edge,Pv = Pu + (u, v)

        • 根據假設,因u已經在S內,Pu為s到u的最短路徑

        • 考慮其他從s到v路徑的可能路徑P,此路徑必會跨越S與不是S的點,另x為交界處S內的點,y為交界處S外的點

          • 如果沒有y,這輪應該選擇從x出發而非u
        • 因為line 4會選擇d'(v) = (d(u)+le)最短的v點放入S,所以P不可能比Pv還短

          • d'(v)就已經比d(y)小了,且le''=(y,v) ≥ 0
          • 所以新的距離d(v)也是最短的距離
          • Dijkstra重要設定: le ≥ 0
  • Implementation

    • 所有node放在min priority queue裡,每輪更新看得到的node的cost,每次從min priority queue中deque最小的node放進S

Recap Heaps: Priority Queues

  • Priority Queue
    • Each element has a priority (key)
    • Only the element with highest (or lowest) priority can be deleted
      • Max priority queue, or min priority queue
    • An element with arbitrary priority can be inserted into the queue at any time
  • Heap
    • A max heap is
      • 爸媽的key比小孩大的tree
        • root的key最大 – A complete binary tree
    • Implementation
    • Insertion into a Max Heap
      • 把新node先放在尾端,如果新node的key比他爸媽的key大,把新node跟他爸媽交換(Bubble up)
      • Time complexity: O(lg n)
    • Deletion from a Max Heap
      • 把root拿走(key最大),把最尾端的node放到root,如果這個node比他的子孫小,把他跟他子孫交換(trickle down)
      • Time complexity: O(lg n)
    • Max Heapify
      • 從下面往上修正
        • 數學歸納法,假設小孩已經是heap
      • Time complexity: O(nlg n)

Minimum Spanning Trees

  • Given: Undirected graph G = (V, E)

  • cost ce for each edge e = (u, v)

    V

  • Goal:

    • 找到一個edges的子集 T ⊆ E 使得:
      • 連結graph中所有點
      • 總成本最小
  • (V, T)會是什麼?

    1. (V, T)一定要連起來
    2. (V, T)沒有cycle
      • 如果有cycle,拿掉cycle中某條edge,cycle中的點仍然保持連結,且cost減少
    • (V, T)是tree
  • Greedy Algorithms

    • Kruskal's algorithm

      1. T = {}
      2. 把edge的cost從低到高排列
      3. 依序把edge e放進T,但如果e放進T會使T出現cycle,則捨棄
    • Prim's algorithm: (c.f. Dijkstra’s algorithm)

      1. 從任意點出發(因為最後T一定會連接所有點)
      2. 每次把從T看出去cost最小的edge放入T
    • Reverse-delete algorithm: (reverse of Kruskal’s algorithm)

      1. T = E
      2. 把edge的cost從高到低排列
      3. 依序把edge e從T拿掉,但如果e從T拿掉會破壞連結性(出現孤島),則不拿掉e
    • Cut Property

      • Optimality of Kruskal’s & Prim’s algorithms
      • 假設所有cost ce 都不一樣
        • 從所有點中劃分出一個subset,subset交界的edge中cost最小的edge e是safe edge,所有MST都應該包含e
        • Wrong Pf: Exchange argument
          • T為一個任意的spanning tree而且沒有包含e,希望證明T不是MST
          • T為spanning tree,T一定至少有一條edge通過subset交界
          • 因為e是交界上cost最小的edge,ce < cf
          • 所以T – {f} + {e}有更小的cost
            • 但是T – {f} + {e}還是spanning tree嗎?
            • Spanning tree: connected & acyclic
            • maybe not connected
        • 需要滿足spanning tree的條件
        • Modified Pf: Exchange argument
          • T為一個任意的spanning tree而且沒有包含e,希望證明T不是MST

          • T為spanning tree,T一定至少有一條edge e'通過subset交界,e'=(v',w'),v'

            S and w'
            V–S

          • T' = T – {e'} + {e}是spanning tree

            • (V, T') is connected
              • 因為T為spanning tree,所以S內所有點都相通,S外所有點也相通,交界處的e'被拿掉只有使內外斷絕連結,所以可以繞路改由e相通
            • (V, T') must be acyclic
              • 因為T為spanning tree,所以S內S外都沒有cycle,只有可能在交界處增加e時會產生cycle,但是因為我們同時拿掉了e',所以不會有cycle
            • ce < ce',所以T'的cost比T小
    • Cycle Property

        • Pf: Exchange argument! (Similar to Cut Property)
  • Implementing MST Algorithms

    • Dijkstra’s Algorithm
    • Prim’s Algorithm
        • 只要看cost最小的tree edge就好,不用看從s出發的最小cost
    • Kruskal’s Algorithm
      • 如果ei的兩個端點已經屬於同個tree,就捨棄ei
    • The Union-Find Data Structure
      • Union-find data structure represents disjoint sets
        • Disjoint sets: elements are disjoint

        • 每個set有一個代表

        • Operations:

          • MakeUnionFind(S): 建立一個set
          • Find(u): 找u屬於哪個set,return該set的代表
          • Union(A, B): 合併 A 跟 B,重新選擇新的代表
        • Implementation: disjoint-set forest

          • 代表為root,連結由小孩指向父母
          • Union: 把比較小的樹並到比較大的樹裡(union by rank)
          • Find: 找到之後直接把他直接指到root(path compression)
        • Running time: union by rank + path compression

          • amortized running time per operation is O(α(n)), α(n) < 5
            • amortized running time: 平均分攤的running time
        • Implementing Kruskal’s Algorithm

            • 每個subtree用disjoint set表示,一開始有n個set
            • Line 1: 排序(用heap)
              • O(mlg m)
            • Line 4: 只要find(u)不等於find(v),就把ej放進T
              • 每條edge有兩端,find兩次
              • 2m次
            • Line 5: 把ej並到T裡
              • tree最多n-1條edge
            • Comparison sort + simple disjoint set: O(m log n)
            • Linear sort + union-find: O(m α(m, n))