# COMP 3711 – Design and Analysis of Algorithms II

2022 Fall Semester – Written Assignment 2

Distributed: September 30, 2022 Due: October 17, 2022, 23:59

Name: Kwong Cheuk Lam SID: 20776617 Email: clkwongaa@connect.ust.hk

**Implementation of Pseudocode of Q4:** https://colab.research.google.com/drive/1vRyqTKPrD7hffAV_XY62y9hnDeZ6JqJR?usp=sharing

**Online version:** https://hackmd.io/@j1Jsew7VSb-GQ9oE3SBz8g/Syg2E_fXi

**Collaborators:**
1. Lam Ho Wai (hwlamah@connect.ust.hk)

### Question 1

We define the following random variables:

$X_{i,j}= \left\{ \begin{array}{rcl} 1 & \mbox{if} & A[i] > A[j] \\ 0 & & \mbox{otherwise} \end{array} \right.$, for any $i,j \in [1, n]$

$X = \sum_{1\leq i < j \leq n}{X_{i,j}}$

Then, the expected number of inversions

$= E(X) = E(\sum_{1\leq i < j \leq n}{X_{i,j}})$

$=\sum_{1\leq i < j \leq n}{E(X_{i,j})}$ (linearity of expectation)

$=\sum_{1\leq i < j \leq n}\text{Pr}[X_{i,j}=1]\cdot 1$

The number of ordered ways to choose $i$ and $j$ is $P^n_2$. The number of those with $A[i] > A[j]$ is $\frac{P^n_2}{2}$, since for any pair of orderings $(i, j)$ and $(j, i)$, either $A[i]>A[j]$ or $A[j]>A[i]$ (the elements of A are distinct). Hence

$\text{Pr}[X_{i,j}=1] = \frac{P^n_2/2}{P^n_2} = \frac{1}{2}$

The number of ways to choose $i$ and $j$ such that $1\leq i < j \leq n$ is $C^n_2$. Then,

$E(X)=\sum_{1\leq i < j \leq n}\text{Pr}[X_{i,j}=1] = \sum_{1\leq i < j \leq n}{\frac{1}{2}} = C^n_2\times \frac{1}{2} = \frac{n!}{4(n-2)!} = \frac{n(n-1)}{4}$

### Question 2

We prove by induction on m that the set `S` returned by `RANDOM-SAMPLE(m, n)` is a subset of [1, n] of size m drawn uniformly at random, i.e. $Pr[k \in S] = \frac{m}{n}$ for all $k\in[1,n]$, where m ≤ n.

**Base Case:** m = 1, `RANDOM-SAMPLE(1, n)`.

Since m ≠ 0, we execute `S ← Random-Sample(0, n−1)`, which returns S = ∅. Then we execute `i ← Random(1, n)`, so that $Pr[i=k] = 1/n$ for all $k\in [1,n]$. Since S = ∅, $i\notin S$, so $S = S \cup \{i\} = \{i\}$. Therefore,

$Pr[k\in S] = Pr[i=k] = 1/n = m/n$

***Base case is true.***

**Induction:** Assume that `RANDOM-SAMPLE(m−1, n′)` returns a subset of [1, n′] of size m−1 drawn uniformly at random for any n′ ≥ m−1, i.e. $Pr[k\in S] = \frac{m-1}{n'}$ for all $k\in[1,n']$.

Consider `RANDOM-SAMPLE(m, n)`. Since m ≠ 0, we execute `S ← Random-Sample(m−1, n−1)`; by the induction hypothesis, $Pr[k\in S] = \frac{m-1}{n-1}$ for all $k\in[1,n-1]$. Then we execute `i ← Random(1, n)`, so that $Pr[i=k] = 1/n$ for all $k\in [1,n]$, and we check whether $i\in S$:

**If $i\in S$,** we execute `S = S ∪ {n}`, and S becomes a set with

$Pr[k\in S] = \frac{m-1}{n-1}$ for all $k\in[1,n-1]$ (the probability that an original element is in S does not change)

$Pr[n \in S] = Pr[i\in S] = \frac{m-1}{n}$

**If $i\notin S$,** we execute `S = S ∪ {i}`, and S becomes a set with

$Pr[k \notin S \cap i=k] = (1-\frac{m-1}{n-1})\times\frac{1}{n} = \frac{n-m}{n(n-1)}$, for all $k\in[1,n-1]$ (the probability that k is not in the recursively sampled S and is then drawn by the `Random` function)

$Pr[n \in S] = Pr[i=n] = 1/n$ (the probability of drawing n)

The two cases are disjoint, so summing their contributions by the law of total probability,

$Pr[k\in S] = \left\{ \begin{array}{rl} \frac{m-1}{n-1} + \frac{n-m}{n(n-1)} = \frac{m}{n} && \forall k\in[1,n-1] \\ \frac{m-1}{n} + \frac{1}{n} = \frac{m}{n} && \text{if } k=n \end{array} \right.$

i.e. $Pr[k\in S] = \frac{m}{n}$ for all $k\in[1,n]$: `RANDOM-SAMPLE(m, n)` returns a subset of [1, n] of size m drawn uniformly at random, which completes the proof. **The algorithm works correctly.**
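As a sanity check, here is a minimal runnable Python sketch of the pseudocode above (the name `random_sample` is my own, and `random.randint(1, n)` stands in for the assignment's `Random(1, n)`, which is assumed to return a uniform integer in [1, n]):

```python
import random

def random_sample(m, n):
    """Return a subset of {1, ..., n} of size m, drawn uniformly at random."""
    if m == 0:
        return set()
    s = random_sample(m - 1, n - 1)   # S ← Random-Sample(m−1, n−1)
    i = random.randint(1, n)          # i ← Random(1, n), uniform on [1, n]
    if i in s:
        s.add(n)                      # i was already sampled: add n instead
    else:
        s.add(i)
    return s
```

Running it repeatedly and counting how often each $k$ appears in the output should give empirical frequencies close to $m/n$.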
### Question 3

#### a

Index of the first element in level k $=2^k$. Index of the first element in level (k+1) $=2^{k+1}$. Then the indices of the elements of A in level k form the range $[2^k, 2^{k+1}-1]$, and the number of elements in level k $=2^{k+1} - 2^k = 2^k$.

Since k = L − h, **the indices of the elements of A with height h** are $[2^{L-h}, 2^{L-h+1}-1]$, and **the number of elements with height h** is $2^{L-h}$.

#### b

First we compare A[i] with A[2\*i] and A[2\*i+1].

**If A[i] < A[2\*i] and A[i] < A[2\*i+1]** (call this condition **(\*)**), i.e. A[i] is smaller than both of its children, then by the definition of a binary min-heap, the subtree rooted at A[i] is already a binary min-heap, and nothing has to be done. Note that A[2\*i] ≠ A[2\*i+1], since the elements are distinct.

**If (\*) is not satisfied and A[2\*i] < A[2\*i+1]**, we have to bubble A[i] down the left subtree. We swap A[i] with A[2\*i]; the right subtree remains a binary min-heap, and the new root is smaller than every other element in the subtree. We then focus on the left subtree and recursively repeat the whole process on it until either (\*) is satisfied or the node reaches level L.

**If (\*) is not satisfied and A[2\*i] > A[2\*i+1]**, we similarly bubble A[i] down the right subtree. We swap A[i] with A[2\*i+1]; the left subtree remains a binary min-heap, and the new root is smaller than every other element in the subtree. We then focus on the right subtree and recursively repeat the whole process until either (\*) is satisfied or the node reaches level L.

This can be done by the following pseudocode (a child index that falls outside the array is treated as holding +∞, so a node with one or no children is handled correctly):

```c
// current level of the original parent node
level <- l(i)
// keep looping until the original parent node is smaller than both of its children
// while it is bubbling down (any subtree of a binary min-heap is also a binary min-heap)
// or the bottom level is reached
while level < L and (A[i] > A[2*i] or A[i] > A[2*i+1]) do
    if A[2*i] < A[2*i+1] do
        swap A[2*i] and A[i]    // bubble down
        i <- 2*i                // update the index of the original parent node
    else do
        swap A[2*i+1] and A[i]  // bubble down
        i <- 2*i+1              // update the index of the original parent node
    level <- l(i)               // update the level of the original parent node
```

`l(i)` gives the level of A[i], where `log(i)` is the base-10 logarithm of `i`:

```c
def l(i):
    return ⌊log(i)/log(2)⌋
```

For the runtime,
1. the loop repeats at most `L - l(i)` times due to the condition `level < L`, since `l(i)` is guaranteed to increase by 1 on every iteration (bubbling down one level);
2. `l(i)` can be implemented in constant time;
3. the other comparisons, variable assignments and swaps also take constant time.

Runtime $= (L-l(i))c$, for some constant c, $= hc = \mathcal{O}(h)$.

Therefore, the runtime is bounded by the height of A[i].
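Here is a minimal runnable Python sketch of the bubble-down above (the name `bubble_down` is my own; the heap is stored 1-based in `A[1..n]` with a dummy `A[0]`, and instead of tracking levels it stops when the node has no children, which is equivalent):

```python
def bubble_down(A, i, n):
    """Sift A[i] down until it is smaller than both of its children.
    The heap occupies A[1..n] (1-based; A[0] is a dummy)."""
    while 2 * i <= n:                       # stop once A[i] has no children
        child = 2 * i                       # left child
        if child + 1 <= n and A[child + 1] < A[child]:
            child += 1                      # the right child is the smaller one
        if A[i] < A[child]:                 # (*) holds: the subtree is a min-heap
            break
        A[i], A[child] = A[child], A[i]     # bubble down one level
        i = child
```

Selecting the smaller child before comparing mirrors the two non-(\*) cases in the pseudocode, and bounds-checking the right child plays the role of the +∞ convention.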
#### c

Define `min-heap(A, currentNode, L)`, which turns the subtree rooted at A[currentNode] into a binary min-heap, where `A` is the array, `currentNode` is the index of the node whose subtree we want to turn into a binary min-heap, and `L` is the height of the tree. To turn the whole array A into a binary min-heap, we call `min-heap(A, 1, ⌊log(n)/log(2)⌋)`.

**Base Case:** currentNode is at the bottom (its level equals L). The node by itself is a binary min-heap, so nothing has to be done.

**Recursive Case:** We first run `min-heap` on the two children of A[currentNode], so that the subtrees rooted at the children of A[currentNode] are already binary min-heaps. The problem is then reduced to the one in (b), and we use the same logic to turn the subtree rooted at A[currentNode] into a binary min-heap.

**Pseudocode:**

```c
def min-heap(A, currentNode, L):
    // get the level of the current node
    level <- l(currentNode)
    // do nothing at the lowest level
    if (level = L)
        return
    // recursively call the function for the left subtree
    min-heap(A, 2*currentNode, L)
    // recursively call the function for the right subtree
    min-heap(A, 2*currentNode+1, L)
    // i is the index keeping track of the original parent node
    i <- currentNode
    // same as (b)
    while level < L and (A[i] > A[2*i] or A[i] > A[2*i+1]) do
        if A[2*i] < A[2*i+1] do
            swap A[2*i] and A[i]
            i <- 2*i
        else do
            swap A[2*i+1] and A[i]
            i <- 2*i+1
        level <- l(i)
```

`l(i)` is the same helper as in (b).

For the runtime, define T(n) as the runtime of `min-heap` on a subtree with n nodes.
1. `min-heap(A, 2*currentNode, L)` and `min-heap(A, 2*currentNode+1, L)` take 2T(n/2) time in total.
2. The while loop takes $\mathcal{O}(h)$, where h is the height of A[currentNode], as proven in (b).
3. The other comparisons and variable assignments take constant time.
4. T(1) = 1.

$T(n) = 2T(n/2) + ch$, where h is the height of the current root.

Unrolling the recurrence, the subproblems at recursion depth $j$ number $2^j$ and each has a root of height $L-j$, so

$T(n) = 2^iT(n/2^i) + c\sum_{j=0}^{i-1}{2^j(L-j)}$

When $n/2^i = 1$, i.e. $i = \log_2n = L$ (for a full tree):

$T(n) = n\,T(1) + c\sum_{j=0}^{L-1}{2^j(L-j)} = n + c\,2^L\sum_{h=1}^{L}{\frac{h}{2^h}} \leq n + 2c\cdot2^L$ (since $\sum_{h\geq1}{h/2^h} = 2$)

$T(n) = \mathcal{O}(n)$
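A minimal Python sketch of `min-heap` (the name `min_heap` is my own), reusing `bubble_down` from the sketch after (b); it takes the heap size `n` instead of the height `L` and treats a node with no children (`2*i > n`) as the base case, which is equivalent:

```python
def min_heap(A, i, n):
    """Recursively turn the subtree rooted at A[i] into a binary min-heap
    (heap stored 1-based in A[1..n]; requires bubble_down from part (b))."""
    if 2 * i > n:                 # base case: A[i] is a leaf, already a min-heap
        return
    min_heap(A, 2 * i, n)         # heapify the left subtree
    min_heap(A, 2 * i + 1, n)     # heapify the right subtree (no-op if absent)
    bubble_down(A, i, n)          # same as (b): sift the root down

# Example: A[0] is a dummy so the heap occupies A[1..5]
A = [None, 5, 3, 8, 1, 2]
min_heap(A, 1, 5)
print(A[1:])                      # [1, 2, 8, 3, 5], a valid min-heap
```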
### Question 4

We design a greedy algorithm `time(intervals)` to achieve the task, where `intervals` is the input array of n intervals $[s_i, e_i]$, and $s_i$ and $e_i$ denote the start time and end time of the i-th element of `intervals`, $1 \leq i \leq n$.

#### Description

Consider the intervals in increasing order of finishing time; a time instance (denoted a **timestamp**) is added at an interval's finishing time whenever that interval has not yet been hit by any previously added timestamp.

#### Proof of Correctness

We first prove that the algorithm is legitimate, i.e. the set of timestamps indeed hits all the intervals. Then we prove that the algorithm is optimal, i.e. the set of timestamps has minimum size.

**1. Proof of legitimacy:**

We first sort the intervals in increasing order of finishing time. Consider the timestamp $e_1$: it is in fact the latest possible first timestamp, because we must hit $I_1$ within $[s_1, e_1]$. Then, as we go down the list, we ignore every interval that has already been hit by this timestamp, i.e. those with $s_i \leq e_1 \leq e_i$, or simply those with $s_i \leq e_1$, since the list is sorted in increasing order of finishing time and thus $e_1 \leq e_i$ for $1 < i \leq n$. When we reach the first $I_k$ with $s_k > e_1$, we add a new timestamp at $e_k$, since we have no choice but to add one; we then ignore the subsequent intervals with $s_i \leq e_k$ ($k < i \leq n$) and repeat the process. Hence each interval in the list is either ignored because it is already hit, or a new timestamp is added that hits it.

**2. Proof of optimality:**

Assume that the solution G of the above greedy algorithm is different from O, an optimal solution, and let I be the list of intervals sorted in ascending order of finishing time (G and O are both listed in increasing order):

$G = \{t_1, t_2, ..., t_m\}$

$O = \{o_1, o_2, ..., o_l\}$

$I = \{I_1, I_2, ..., I_n \} = \{[s_1,e_1], [s_2, e_2], ... ,[s_n, e_n] \}$

Choose the largest $r$ such that $t_i = o_i$ for $0 \leq i\leq r < m$ and $t_{r+1} \neq o_{r+1}$. By the definition of our greedy algorithm, $\exists k \in [1,n] \text{ s.t. } e_k = t_{r+1}$. Then $I_k$ is the interval on which the greedy algorithm and the optimal solution disagree about which timestamp hits it.

**If r ≠ 0,** note that $o_{r} = t_{r} < s_k$. Then, in order to hit $I_{k}$, O must contain a timestamp $o_j \in [s_k, e_k]$ with $j \geq r+1$ (every earlier timestamp is at most $t_r < s_k$); hence $o_{r+1} \leq o_j \leq e_k = t_{r+1}$.

![](https://i.imgur.com/0JfLlZj.png)

(The position of $o_{r+1}$ in the figure is arbitrary.)

**If r = 0,** in order to hit $I_{1}$, $s_1 \leq o_{1} \leq e_1 = t_1$.

![](https://i.imgur.com/EKtb2QN.png)

(The position of $o_1$ in the figure is arbitrary.)

Either way, $o_{r+1} \leq t_{r+1}$. Then, create O* from O by "pushing" $o_{r+1}$ forward in time to $t_{r+1}$. Note that O* is still a legal and optimal solution because:
1. The number of timestamps does not change.
2. All intervals before $I_k$ have already been hit by timestamps in $\{t_1, t_2, ..., t_r\}$, or equivalently in $\{o_1, o_2, ..., o_r\}$.
3. The interval $I_k$, and the intervals after $I_k$ that were originally hit by $o_{r+1}$, are still hit by $t_{r+1}$, as $s_j \leq o_{r+1} \leq e_k = t_{r+1} \leq e_j$ for $k\leq j\leq n$.
4. The intervals after $I_k$ that were originally not hit by $o_{r+1}$ must be hit by some other timestamps in $\{o_{r+2}, o_{r+3}, ..., o_{l}\}$.

Then O* is a legal and optimal solution such that, writing $O^* = \{o_1^*, o_2^*, ..., o_l^* \}$ and choosing the largest $r'$ with $t_i = o^*_i$ for $i\leq r'$ and $t_{r'+1} \neq o^*_{r'+1}$, we have $r'> r$. We can repeat the process until $r' = m$, i.e. every element of G equals the corresponding element of O*. Therefore, there exists a legal and optimal solution $O^*$ that is the same as the greedy solution. ***This completes the proof.***

#### Pseudocode

Pseudocode of `time(intervals)`. We add a dummy interval [∞, ∞] at the end of the sorted intervals so that the last timestamp is also added to the resulting array (the loop therefore runs up to index n+1):

```c
def time(intervals):
    sort intervals by finishing times so that e₁ < e₂ < ... < eₙ
    add [∞, ∞] to the end of intervals  // becomes interval n+1
    resultTimestamp <- []
    last <- e₁
    for i <- 2 to n+1 do
        if sᵢ > last do
            add last to resultTimestamp
            last <- eᵢ
    return resultTimestamp
```

#### Runtime analysis

Note that
1. sorting takes $n\log n$ time (we may use merge sort);
2. the for loop runs exactly n times (including the dummy interval);
3. the other variable assignments, list appends and comparisons take constant time.

Runtime $=n\log n + nc + c'$, for some constants c, c', $=O(n\log(n))$.
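The Colab notebook linked at the top contains the implementation of this pseudocode; for reference, here is a minimal self-contained Python sketch (the name `time_stamps` is my own; it appends each timestamp eagerly instead of using the dummy interval, which is equivalent):

```python
def time_stamps(intervals):
    """Return a minimum set of timestamps hitting every interval (s, e)."""
    result = []
    last = None                                            # most recently placed timestamp
    for s, e in sorted(intervals, key=lambda iv: iv[1]):   # increasing finishing time
        if last is None or s > last:                       # interval not hit by `last`
            last = e                                       # timestamp at its finishing time
            result.append(last)
    return result

print(time_stamps([(2, 5), (1, 3), (6, 8)]))               # [3, 8]
```

In the example, the timestamp 3 hits both (1, 3) and (2, 5), so only (6, 8) forces a second timestamp.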
### Question 5

#### a

$A_{\sigma (j)} = A_{\sigma (j-1)} + t_{\sigma (j)} = A_{\sigma (j-2)} + t_{\sigma (j-1)} + t_{\sigma (j)} = ... = \sum_{i=1}^j{t_{\sigma (i)}}$

#### b

Total completion time

$=\sum_{i=1}^n{A_{\sigma (i)}} = \sum_{i=1}^n{\sum_{j=1}^i{t_{\sigma (j)}}} = \sum_{i=1}^n{(n-i+1)t_{\sigma (i)}}$

$=nt_{\sigma (1)} + (n-1)t_{\sigma (2)} + ... + 2t_{\sigma (n-1)} + t_{\sigma (n)}$

We prove by induction on n that the total completion time of $A_{σ(1)},A_{σ(2)},...,A_{σ(n)}$ is minimized if $t_{σ(1)} < t_{σ(2)} < ···< t_{σ(n)}$, i.e. this permutation of the jobs is the optimal one $O$.

**Base Case:** When n = 2, the total completion time is $2t_{\sigma (1)} + t_{\sigma (2)}$.

Assume, for the sake of contradiction, that $t_{\sigma (1)} > t_{\sigma (2)}$ and the total completion time is minimized ($O$). Then the computer could run the jobs in the order $A_{\sigma (2)}$, then $A_{\sigma (1)}$, and the new total completion time would be

$2t_{\sigma (2)} + t_{\sigma (1)} < t_{\sigma (2)} + t_{\sigma (1)} + t_{\sigma (1)} = 2t_{\sigma (1)} + t_{\sigma (2)}$,

which was supposed to be optimal. Hence a contradiction arises, and the total completion time is minimized ($O$) only if $t_{\sigma (1)} < t_{\sigma (2)}$. ***Base case is true.***

**Induction:** Assume that the total completion time of the n−1 jobs $A_{σ(1)},A_{σ(2)},...,A_{σ(n-1)}$ is minimized if $t_{σ(1)} < t_{σ(2)} < ···< t_{σ(n-1)}$.

Consider the case of n jobs. From the induction assumption, the total completion time of the jobs $A_{σ(2)},A_{σ(3)},...,A_{σ(n)}$ is minimized if $t_{σ(2)} < ···< t_{σ(n)}$. Assume, for the sake of contradiction, that the total completion time is minimized while $\exists b \in [2, n] \text{ s.t. } t_{\sigma (1)} > t_{\sigma (b)}$. (Note that $t_{\sigma (1)} \neq t_{\sigma (b)}$ by definition.) Then the total completion time of $O$ is

$\sum_{i=1}^n{A_{\sigma (i)}} = \sum_{i=1}^n{(n-i+1)t_{\sigma (i)}} = nt_{\sigma (1)} + \sum_{i=2}^{b-1}{(n-i+1)t_{\sigma (i)}} + (n-b+1)t_{\sigma (b)} + \sum_{i=b+1}^{n}{(n-i+1)t_{\sigma (i)}}$

The computer can then swap the tasks σ(1) and σ(b), and the total completion time becomes

$nt_{\sigma (b)} + \sum_{i=2}^{b-1}{(n-i+1)t_{\sigma (i)}} + (n-b+1)t_{\sigma (1)} + \sum_{i=b+1}^{n}{(n-i+1)t_{\sigma (i)}}$

$=\sum_{i=1}^n{A_{\sigma (i)}} +t_{\sigma (1)} + bt_{\sigma (b)} - t_{\sigma (b)} - bt_{\sigma (1)}$

$=\sum_{i=1}^n{A_{\sigma (i)}} +(1-b)(t_{\sigma (1)} - t_{\sigma (b)})$

$<\sum_{i=1}^n{A_{\sigma (i)}}$, as $1-b<0$ and $t_{\sigma (1)} - t_{\sigma (b)} > 0$,

which was supposed to be the optimal $O$. Hence a contradiction arises. Therefore the total completion time is minimized ($O$) only if $\forall b \in [2, n],\ t_{\sigma (1)} < t_{\sigma (b)}$ and $t_{σ(2)} < ···< t_{σ(n)}$, i.e. $t_{σ(1)} < t_{σ(2)} < ···< t_{σ(n)}$. **This completes the proof.**
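As a sanity check (my own addition, not part of the assignment), a short Python brute force over all orderings of a small job set confirms that the sorted, shortest-job-first order minimizes the total completion time:

```python
from itertools import permutations

def total_completion_time(ts):
    """Sum of completion times A_σ(i) = Σ_{j≤i} t_σ(j) for jobs run in this order."""
    total = elapsed = 0
    for t in ts:
        elapsed += t      # completion time of the current job
        total += elapsed
    return total

jobs = (5, 1, 4, 2)
best = min(permutations(jobs), key=total_completion_time)
assert best == tuple(sorted(jobs))  # shortest-job-first is optimal
```

With distinct job times the optimum is unique, so the minimizing permutation is exactly the sorted one.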