Try   HackMD
tags: ADA 4.2 Selection Problem

ADA 4.2: Selection Problem 選擇問題 (求第k名)

Textbook Chapter 9.3 - Selection in worst case linear time

What is Selection Problem?

取一有

n 個 integers 的
arrayA

求輸出
arrayA
裡第
k
大的數字

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Solution 1: Sorting

藉由將陣列

A 重新由小排到大,要第
k
大就直接取第
k
位置的值

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
但因為用到排序(sort),所以其時間複雜度落在
O(nlogn)

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
n>>k
, sorting 的時間複雜度就會越差越多

如何將時間複雜度再往下調呢?

Solution 2: Divide-and-Conquer

想法

  • 設定一個基準點(
    pivot
    ),將原本的陣列依基準點拆成兩半
  • k|X>|
    ,則代表
    k
    落在
    X>
    裡的第
    kth
    位置上
  • k>|X>|
    ,則代表
    k
    落在
    X<
    裡的第
    (k|X>|)th
    位置上

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
我們會希望基準點(
pivot
)會盡量在中間,因為如果基準點一邊個數太大,若
k
剛好都落在較大的一邊,則最差情況你要拆
n
遍才能找到你要的數值

如何找尋中位數?

  • 尋找中位數也是另一個 Selection Problem,只是
    k
    為中位數

Step 1: Five Men Per Group

  • 將 n 個個數的 array 每5個一組,而會分成總共
    n5
    組,每組因為數量為 constant (5個),強制sorting所花費的時間為 constant

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Step 2: A Median Per Group

  • 每組取其中間值(Median)

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
上圖為取中間值(黃點)而所做 sorting 動作,箭頭代表序列由小變大

Step 3: Median of Medians

  • n5
    組的所有中間值(medians)在做排序取得中間值(median)

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
上圖將所有medians在做排序在取其中間值(紅點)

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
在 medians 取得 median 時,若 medians 個數為偶數,通常會取較小的那個值當作 MoM
Reference: Chapter 9.3 p220.第三點

Step 4: Partition via MoM

  • 在取得MoM後,將此數作為切割點(pivot),在
    MoM
    的左邊為小於
    MoM
    元素的集合
    X<
    ,右邊則為大於
    MoM
    元素的集合
    X>

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Step 5: Recursion

接著我們就要判斷

k 位於哪個地方,依照上圖會有三種情況

  1. k
    落於
    X>
    裡面(其 index 一樣落在
    X>
    的第
    kth
    位置上)
  2. k
    落於
    MoM
    上(index 應為
    |X>|+1
  3. k
    落於
    X<
    裡面(其 index 會落在
    X<
    的第
    (k|X>|1)th
    位置上)

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
這裡的recursion有牽涉兩個地方

  1. 找尋
    MoM
  2. k
    落於
    X>
    或者
    X<
    ,就要再做一次遞迴拆解一次(直到序列夠小可以直接找到
    k
    值)

Divide-and-Conquer for Selection Problem

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

依序拆解各個時間複雜度

  • base case:
    Θ(1)
  • divide groups with size 5:
    Θ(1)
  • median from group i:
    Θ(n)
  • MoM:
    T(n5)
  • case divide:
    Θ(n)
  • case 1:
    T(|X>|)
  • case 2:
    Θ(1)
  • case 3:
    T(|X<|)

我們希望

|X>|
|X<|
的數量盡量不要差距太大(不然 recursive 的次數就越多)
在做 recursion 時可以每次去掉一些元素
k
|X>|
|X<|
的情況每次可剔除
n5÷2×3=310n

下回要 recursion 的數量就變為
710n

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Time Complexity

  • T(n)
    = time for running
    Selection(X,k)
    with
    |X|=n

T(n)={Θ(1)if n=1T(n5)+max(T(|X>|),T(|X<|))+Θ(n)if n>1

becomes

T(n)={Θ(1)if n=1T(9n10)+Θ(n)if n>1

applying Master Method case 3

T(n)=Θ(n)

applying Sustitution Method

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →