Advanced Algorithms and Data Structures Practice Exam
===
## Question 1
### (1)

True, purely conceptual analysis.
### (2)

True, by Shannon's entropy bound the splay tree is optimal for an arbitrary sequence of search operations
### (3)

False, scaling does not affect which cut is the minimum cut - arbitrary additions and deletions do.
### (4)

100
Ford-Fulkerson: 200 augmenting paths (flow 1 each time)
Edmonds-Karp: 2 augmenting paths, as we consider the number of edges, not the weights, when picking the shortest path
### (5)

14, achieved at x = 3, y = 1.
## Question 2
#### (1)

$f(n) \leq c_1 \cdot g(n)$ for all $n \geq c_2$, for some $c_1 > 0$
$\log n \leq c_1 \cdot n^\epsilon$ for all $n \geq c_2$, for some $c_1 > 0$
Cases:
$\epsilon \geq 1$ holds for $c_1=c_2=1$
$\epsilon < 1$: we set $c_1=\frac{1}{\epsilon}$ and consider the function $z(n) = \frac{n^\epsilon}{\epsilon \ln n}$. Taking the derivative of $z(n)$ we have:
$z'(n) = \frac{\epsilon n^{\epsilon -1} \cdot (\epsilon \ln n - 1)}{(\epsilon \ln n)^2}$
When $n \geq e^{\frac{1}{\epsilon}}$, $z'(n) \geq 0$, and $z(e^{\frac{1}{\epsilon}}) = e > 1$
Therefore $n^\epsilon \geq \epsilon \ln n$, i.e. $\ln n \leq \frac{1}{\epsilon} \cdot n^\epsilon$, for all $n \geq c_2 = e^{\frac{1}{\epsilon}}$
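As a numeric sanity check (not part of the proof; the sampling stride and cutoff are arbitrary), we can verify $n^\epsilon \geq \epsilon \ln n$ for sampled $n \geq e^{1/\epsilon}$:

```python
import math

def holds(eps, n_max=10**6):
    """Check n**eps >= eps * ln(n) for sampled n >= e**(1/eps)."""
    n0 = math.ceil(math.e ** (1.0 / eps))
    return all(n**eps >= eps * math.log(n) for n in range(n0, n_max, 997))

print(holds(0.5), holds(0.1))  # True True
```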
#### (2)

```
removeDuplicates(S):
    d <- hash table with perfect hashing
    S' <- empty set
    for element in S:
        if d does not contain element:
            add element to S'
            add element to d
    return S'
```
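A runnable version of the pseudocode above (a sketch; Python's built-in `set` stands in for the perfect-hashing table `d`, giving O(1) expected membership tests):

```python
def remove_duplicates(S):
    """Return the elements of S in first-occurrence order, without duplicates.
    O(n) expected time: each membership test and insert is O(1) expected."""
    seen = set()   # plays the role of the perfect-hashing table d
    result = []    # plays the role of S'
    for element in S:
        if element not in seen:
            seen.add(element)
            result.append(element)
    return result

print(remove_duplicates([3, 1, 3, 2, 1]))  # [3, 1, 2]
```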
#### (3)

Define T as a data structure that acts as a counter:
- T contains a hash table M, whose hash function is drawn uniformly at random from the universal family H; this gives perfect hashing with O(1) expected operations.
- Each bucket has a secondary hash function, chosen the same way.
- Each inner hash bucket stores an additional value: a counter of the values with that key present, i.e. a dictionary {key: value} where the key is an inserted integer and the value is the number of insertions with that key.
- We also store auxiliary variables k and n: k is incremented by 1 whenever a new distinct element is added, and n by 1 on every insertion.
- The size of the hash table is dynamic: it is at most 2k at any time. When resizing needs to occur (k = capacity) we set capacity = 2k.
- To amortize the insertion cost to O(1), by the prepaid argument we pay 2 units on each insertion to cover copying/rehashing.

For example, with 1 as the first inserted value:
```
T = {
    M = {1: 1},
    n = 1,
    k = 1,
}
```
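A minimal Python sketch of T (class and method names are illustrative; Python's `dict` already resizes by doubling, which supplies the amortized O(1) insertion the prepaid argument accounts for):

```python
class CounterDS:
    """Counts insertions per key; k = distinct keys, n = total insertions."""

    def __init__(self):
        self.M = {}  # key -> number of insertions with that key
        self.k = 0   # incremented when a new distinct element is added
        self.n = 0   # incremented on every insertion

    def insert(self, key):
        if key not in self.M:
            self.M[key] = 0
            self.k += 1
        self.M[key] += 1
        self.n += 1

    def count(self, key):
        return self.M.get(key, 0)

t = CounterDS()
t.insert(1)
print(t.M, t.n, t.k)  # {1: 1} 1 1
```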
## Question 3
#### (1)

Algorithm:
1. Find uSplit via the LCA of Q = [A, B], i.e. of succ(A) and pred(B).
2. Then run the following:
```
if uSplit in Q: report uSplit
for each node u other than uSplit on the path to succ(A):
    if u in Q: report u
    if u.x >= succ(A):
        if height(u) % 2 == 0: report all points in u.right.ytree whose y-coords are in [a2, b2]
        else: report all points in the union of u.right.left.ytree and u.right.right.ytree whose y-coords are in [a2, b2]
for each node u other than uSplit on the path to pred(B):
    if u in Q: report u
    if u.x <= pred(B):
        if height(u) % 2 == 0: report all points in u.left.ytree whose y-coords are in [a2, b2]
        else: report all points in the union of u.left.left.ytree and u.left.right.ytree whose y-coords are in [a2, b2]
```
Note that this relies on the following: if the node u is a leaf, then it either matches the initial check against Q or it does not. Otherwise it is at level 2, 3 or above, in which case we look at most two levels below it: if the level in question stores a y-tree (even level) we take the whole y-tree, and otherwise we construct the union of the y-trees one level further down.
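The per-node reporting step ("report all points whose y-coords are in [a2, b2]") can be sketched as a binary search over a y-sorted array (assuming each y-tree is materialised as a list sorted by y; function and variable names are illustrative):

```python
import bisect

def report_y_range(ytree, a2, b2):
    """ytree: a list of (x, y) points sorted by y. Returns all points with
    a2 <= y <= b2. With the y-keys precomputed once per tree the query is
    O(log n + k); they are rebuilt here only for brevity."""
    ys = [p[1] for p in ytree]
    lo = bisect.bisect_left(ys, a2)    # first point with y >= a2
    hi = bisect.bisect_right(ys, b2)   # one past the last point with y <= b2
    return ytree[lo:hi]

pts = sorted([(5, 1), (2, 3), (7, 4), (1, 8)], key=lambda p: p[1])
print(report_y_range(pts, 2, 5))  # [(2, 3), (7, 4)]
```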
#### (2)

At each level of the x-tree, exactly n points are stored across that level's y-trees. In the traditional 2D range tree there are log n levels of y-trees, hence O(n log n) space. However, in an L-space-saving range tree only every L-th level stores y-trees, meaning h/L + 1 levels store a y-tree, for (h/L) * n points in total.
Therefore the space is bounded by $O(\frac{\log n}{L} \cdot n)$
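A quick numeric check of the space count (function name is illustrative; h is taken as $\lceil \log_2 n \rceil$):

```python
import math

def ytree_storage(n, L):
    """Points stored across all y-trees when only every L-th x-level keeps one.
    Each storing level holds all n points; there are h // L + 1 such levels."""
    h = math.ceil(math.log2(n))
    return (h // L + 1) * n

# L = 1 recovers the classic n * (log n + 1); larger L shrinks storage.
print(ytree_storage(1024, 1), ytree_storage(1024, 2))  # 11264 6144
```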
#### (3)

Likewise, the extension follows: in order to access the y-subset of points once we have found an x-subtree within the range [A, B], we need to descend to the next level that is a multiple of l.
```
1. Find uSplit via the LCA of Q = [A, B], i.e. of succ(A) and pred(B).
2. Then run the following:
if uSplit in Q: report uSplit
for each node u other than uSplit on the path to succ(A):
    if u in Q: report u
    if u.x >= succ(A):
        if height(u) % l == 0:
            report all points in u.right.ytree whose y-coords are in [a2, b2]
        else:
            report all points in construct_subset(u.right)
for each node u other than uSplit on the path to pred(B):
    if u in Q: report u
    if u.x <= pred(B):
        if height(u) % l == 0:
            report all points in u.left.ytree whose y-coords are in [a2, b2]
        else:
            report all points in construct_subset(u.left)

construct_subset(u):
    if not u: return nothing
    if height(u) % l == 0: return points in u.ytree that are in [a2, b2]
    else: return construct_subset(u.left) + construct_subset(u.right)
```
Complexity:
We find $u_{split}$ in $O(\log n)$ time.
The two paths $L_a, L_b$ are bounded in length by $O(\log n)$.
At each node on these paths we either perform an $O(1)$ operation or descend at most $l$ levels and collect the y-trees there to report the points.
Each y-tree query takes $O(\log n + k_i)$ worst-case time to report its points; in addition these results need to be merged, proportional to $l$ in the worst case, giving $O(l \cdot \log n)$ work per node excluding output. The total cost is therefore
(finding $u_{split}$) + (traversing $L_a, L_b$) $\times$ (work at each node) + output size:
$O(\log n + \log n \cdot (l \cdot \log n) + k)$
$= O(l \cdot \log^2 n + k)$
#### (4)

$|G| \geq |g_i(q)|$, since $G$ cannot contain more than the union of all distinct points.
Therefore, summing over the $L$ buckets, the total is at most $L \cdot |G|$.
#### (5)

## Question 4
#### (1)

Reduce the flow by 1 along an augmenting path containing e: an O(n+m) modified BFS from the endpoints of e back to the source and forward to the sink.
From the source, run a single BFS to see if any augmentation is possible; if so, augment along that path and return the updated max-flow, again in O(n+m).
Since only a single augmenting path had its flow reduced by 1, at most one additional path can be added (or none): the edge out of the source and the edge into the sink each had their flow reduced by only 1, so a single BFS that starts with that source edge and ends with that sink edge suffices.
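The single-BFS step can be sketched as follows (a sketch over an adjacency-dict residual graph; the representation and function name are assumptions):

```python
from collections import deque

def bfs_augmenting_path(residual, s, t):
    """residual[u][v] = remaining capacity on edge u -> v.
    Returns an s-t path as a vertex list, or None if no augmenting
    path exists. One BFS: O(n + m) time."""
    parent = {s: None}
    q = deque([s])
    while q:
        u = q.popleft()
        if u == t:                      # reconstruct the path backwards
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v, cap in residual.get(u, {}).items():
            if cap > 0 and v not in parent:
                parent[v] = u
                q.append(v)
    return None

print(bfs_augmenting_path({'s': {'a': 1}, 'a': {'t': 1}}, 's', 't'))
# ['s', 'a', 't']
```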
#### (2)

#### (3)

Define:
- $m$: number of sets in the optimal solution
- $U$: universe
- $n$: number of points in $U$
- $\alpha$: fraction of the maximum current contribution that each greedy pick is guaranteed to cover
The number of elements we still have to cover after the first set is picked is:
$n_1 \leq n - \frac{\alpha \cdot n}{m}$
Now we are left with $n_1$ elements to cover. At least one of the remaining sets $S_i$ must contain at least $\frac{n_1}{m}$ of them, since otherwise the optimum solution would have to contain more than $m$ sets. In general:
$n_{i+1} \leq n_i(1-\frac{\alpha}{m}) \leq n(1-\frac{\alpha}{m})^{i+1}$
We would like to determine the number of stages after which the greedy algorithm has covered all elements of $U$; this corresponds to the maximum number of sets the greedy algorithm has to pick.
$n(1-\frac{\alpha}{m})^k < 1$
Using $1 - x \leq e^{-x}$ with $x = \frac{\alpha}{m}$:
$n(1-\frac{\alpha}{m})^k \leq n \cdot e^{-\frac{k\alpha}{m}}$
so it suffices that $e^{-\frac{k\alpha}{m}} < \frac{1}{n}$
$\frac{k\alpha}{m} > \ln n$
$k > \frac{m \ln n}{\alpha}$
Therefore the greedy algorithm picks $O(\frac{m \ln n}{\alpha})$ sets, i.e. an approximation ratio of $O(\frac{\ln n}{\alpha})$
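The greedy algorithm being analysed, instantiated with $\alpha = 1$ (always pick a set with maximum current contribution), can be sketched as:

```python
def greedy_set_cover(universe, sets):
    """Repeatedly pick the set covering the most still-uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(sets, key=lambda s: len(uncovered & s))
        if not uncovered & best:
            raise ValueError("universe not coverable by the given sets")
        chosen.append(best)
        uncovered -= best
    return chosen

cover = greedy_set_cover({1, 2, 3, 4, 5},
                         [{1, 2, 3}, {3, 4}, {4, 5}, {1}])
print(len(cover))  # 2
```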
## Question 5
#### (1)

Define the flow of each edge $f_e \in \{0,1\}$.
The shortest-path problem can then be written as:
minimize $\sum_e{w_e \cdot f_e}$ where
$f_e=0$ if edge $e$ is not on the path
$f_e=1$ if edge $e$ is on the path
Hence, this problem is equivalent to finding an s-t flow of value 1 that minimizes the above.
#### (2)

minimize $\sum_e{w_e f_e}$ subject to:
$\sum_{e \text{ out of } s} f_e - \sum_{e \text{ into } s} f_e = 1$
$\sum_{e \text{ out of } v} f_e - \sum_{e \text{ into } v} f_e = 0$ for every vertex $v \neq s, t$
$\sum_{e \text{ into } t} f_e - \sum_{e \text{ out of } t} f_e = 1$
$f_e \geq 0$ for every edge $e$
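On a tiny graph we can check the formulation: an optimal basic solution of this LP is integral, i.e. the 0/1 indicator vector of some s-t path, so minimizing the objective over path indicator vectors matches the LP optimum here (the graph, weights, and names are illustrative):

```python
# Edge weights w_e of a small directed graph.
edges = {('s', 'a'): 1, ('a', 't'): 2, ('s', 't'): 4}

def paths(u, t, seen):
    """Yield every simple u-t path as a list of edges."""
    if u == t:
        yield []
        return
    for (x, v) in edges:
        if x == u and v not in seen:
            for rest in paths(v, t, seen | {v}):
                yield [(x, v)] + rest

def objective(path):
    """sum(w_e * f_e) where f_e = 1 exactly on the path's edges."""
    return sum(edges[e] for e in path)

best = min(paths('s', 't', {'s'}), key=objective)
print(best, objective(best))  # [('s', 'a'), ('a', 't')] 3
```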
#### (3)

maximize $d_t - d_s$ subject to:
$d_v - d_u \leq w_{(u,v)}$ for every edge $(u,v)$
## Question 6
#### (1)

Consider a graph with three nodes, {Source, A, Sink}
Where there are the following edges and capacities
Source -> A = 1
A -> Sink = 1
The max-flow is 1.
If we increase the capacity of Source -> A, the flow remains 1 as A -> Sink still has capacity 1.
If we increase the capacity of A -> Sink, the flow remains 1 as Source -> A still has capacity 1.
#### (2)

We can augment the graph by adding an additional vertex $v'$ for every vertex $v$. This new vertex can be placed to receive either all flow into $v$ or, alternatively, all flow out of $v$.
In the "out" variant, the edge $v \to v'$ has capacity $k_v$ and $v'$ takes over all of $v$'s outgoing edges.
This changes $n, m$ into $2n, m + n$.
Therefore the running time of Edmonds-Karp becomes $O(2n \cdot (m+n)^2) = O(n(m+n)^2)$.
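The transformation can be sketched as follows (function name and vertex-tagging scheme are illustrative):

```python
def split_vertices(vertices, edges, vcap):
    """Vertex-splitting reduction for vertex capacities k_v.
    Each v becomes (v, 'in') -> (v, 'out') with capacity vcap[v]; an
    original edge (u, v, c) becomes ((u, 'out'), (v, 'in'), c).
    The output graph has 2n vertices and m + n edges."""
    new_edges = [((v, 'in'), (v, 'out'), vcap[v]) for v in vertices]
    new_edges += [((u, 'out'), (v, 'in'), c) for u, v, c in edges]
    return new_edges

g = split_vertices(['s', 'a', 't'],
                   [('s', 'a', 1), ('a', 't', 1)],
                   {'s': float('inf'), 'a': 2, 't': float('inf')})
print(len(g))  # 5 edges: m + n = 2 + 3
```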
#### (3)
