Farid-internship
=======
# Eighth meeting notes
Take $w(x,y) = 1$ iff $x$ and $y$ are in the same class.
Then
$$ \textrm{DEND-PUR} = \mathbb{E}_{x,y \,|\, w(x,y) = 1}\left(\frac{1}{|LCA(x,y)|} \sum_{z \in LCA(x,y) \,:\, w(x,z) = 1} 1\right) $$
$$ \textrm{DEND-PUR} = \mathbb{E}_{x,y \,|\, w(x,y) = 1}\left(\frac{w(x,y)}{|LCA(x,y)|} \sum_{z \in LCA(x,y)} w(x,z)\right) $$
$$ \textrm{DEND-PUR} = \frac{1}{ \sum_{u<v} w(u,v) } \sum_{x<y} \frac{w(x,y)}{|LCA(x,y)|} \sum_{z \in LCA(x,y)} w(x,z) $$
Here $z$ ranges over the leaves of the subtree rooted at $LCA(x,y)$, and $|LCA(x,y)|$ denotes the number of those leaves.
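A minimal sketch of computing this quantity on a toy dendrogram (the nested-pair tree encoding and the two-class labeling are illustrative assumptions, not notation from these notes):

```python
from itertools import combinations

def leaves(tree):
    """Set of leaf indices under a nested-pair tree."""
    if isinstance(tree, int):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)

def lca_leaves(tree, x, y):
    """Leaves of the subtree rooted at LCA(x, y)."""
    left, right = tree
    for child in (left, right):
        cl = leaves(child)
        if x in cl and y in cl:
            return lca_leaves(child, x, y)
    return leaves(tree)  # x and y split here, so `tree` is their LCA

def dend_pur(tree, labels):
    """Average over same-class pairs (x, y) of the fraction of
    same-class leaves under LCA(x, y)."""
    pairs = [(x, y) for x, y in combinations(sorted(leaves(tree)), 2)
             if labels[x] == labels[y]]
    total = sum(
        sum(labels[z] == labels[x] for z in lca_leaves(tree, x, y))
        / len(lca_leaves(tree, x, y))
        for x, y in pairs)
    return total / len(pairs)

labels = {0: 0, 1: 0, 2: 1, 3: 1}            # two classes: {0,1} and {2,3}
print(dend_pur(((0, 2), (1, 3)), labels))    # 0.5 -- classes mixed at the root
print(dend_pur(((0, 1), (2, 3)), labels))    # 1.0 -- each class a pure subtree
```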
# Seventh meeting notes
Suppose there are four clusters, in order $C,A,B,D$, with sizes $c,a,b,d$ respectively; write $\bar{w}(X,Y)$ for the average pairwise similarity, so that $w(X,Y) = |X||Y| \, \bar{w}(X,Y)$. Suppose we merge $(A,B)$ first and then $(C,AB)$.
We use the following bound on the variation in score:
$$ \Delta S \geq \Delta(\textrm{MAX-UPPER}) - \Delta(\textrm{MIN-MERGE}) $$
$$ \Delta(\textrm{MIN-MERGE}) \leq \min( w(AB,D)c, ((a+b)c-w(AB,C))d) \leq (a+b)cd \min(1-\bar{w}(AB,C), \bar{w}(AB,D))$$
$$ \Delta(\textrm{MAX-UPPER}) \geq w(A,B)(c+d)+w(AB,C)d$$
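As a sanity check on the second inequality in the MIN-MERGE bound, a short random test (expanding $w(X,Y) = |X||Y|\,\bar{w}(X,Y)$ actually shows the two sides are equal, so the bound is tight):

```python
import random

# Check: min(w(AB,D)*c, ((a+b)*c - w(AB,C))*d)
#     <= (a+b)*c*d * min(1 - wbar(AB,C), wbar(AB,D)),
# with w(AB,C) = (a+b)*c*wbar(AB,C) and w(AB,D) = (a+b)*d*wbar(AB,D).
random.seed(0)
for _ in range(10_000):
    a, b, c, d = (random.randint(1, 20) for _ in range(4))
    wbar_ABC, wbar_ABD = random.random(), random.random()
    w_ABC = (a + b) * c * wbar_ABC
    w_ABD = (a + b) * d * wbar_ABD
    lhs = min(w_ABD * c, ((a + b) * c - w_ABC) * d)
    rhs = (a + b) * c * d * min(1 - wbar_ABC, wbar_ABD)
    assert lhs <= rhs + 1e-9
print("MIN-MERGE bound holds on all sampled instances")
```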
----
We want to show that $\Delta(\textrm{MAX-MERGE} + \textrm{MIN-MERGE}) \geq 2 \sum_{\textrm{MIN-MERGE}} (\textrm{MAX}\,w - \textrm{MIN}\,w)$.
We have:
$$\Delta(\textrm{MAX-MERGE}+\textrm{MIN-MERGE}) \geq w(A,B)(c+d) + w(AB,C)d$$
and,
$$\sum_{\textrm{MIN-MERGE}} (\textrm{MAX}\,w - \textrm{MIN}\,w) \leq w(D,AB)c - w(C,AB)d$$
We finally want to show the following:
$$ ab(c+d) \bar{w}(A,B) + (a+b)cd\bar{w}(AB,C) \geq 2(a+b)cd\left(\bar{w}(AB,D) - \bar{w}(AB,C) \right)$$
$$ ab(c+d) \bar{w}(A,B) \geq (a+b)cd\left(2\bar{w}(AB,D) - 3\bar{w}(AB,C) \right)$$
---
Going back to threshold weights, we have,
$$\Delta(M_{1,1}+M_{1,0}) \geq w(AB,C)d$$
$$\Delta S_{-1} \geq w(A,B)(c+d)$$
$$\Delta \Psi \geq (w(C)+w(D))(a+b) $$
$$\Delta M_{0,1} \leq \min(w(AB,D)c, (c(a+b)-w(AB,C))d)$$
And so,
$$\Delta(M_{1,1}+M_{1,0} + 2\Psi + S_{-1}-2M_{0,1}) \geq \\
\Delta(2\Psi + S_{-1} - M_{0,1}) \geq \\
ab(c+d)\bar{w}(A,B) + 2(w(C)+w(D))(a+b) - (a+b)cd \min( \bar{w}(AB,D), 1 - \bar{w}(AB,C))$$
We also know that,
$$ w(C) = \bar{w}(C) {c \choose 2} \geq \bar{w}(AB,D) {c \choose 2}$$
And thus,
$$ 2(w(C) + w(D)) \geq cd \min(\bar{w}(AB,D), 1-\bar{w}(AB,C)) $$
# Sixth meeting notes
In the case of four clusters $(C,A,B,D)$, we have the following:
$$ avg(A,B) = \frac{\Delta M_{1,1} + \Delta M_{1,0}}{|A| |B| ( |C| + |D|)}$$
$$ \frac{1}{|D|} avg(A,C) + \frac{1}{|C|}avg(B,D) = \frac{\Delta M_{1,1} + \Delta M_{0,1}}{|A||B||C||D|} $$
Thus, using the following inequality (which holds since average-linkage merged $(A,B)$, so $avg(A,B) \geq avg(A,C)$ and $avg(A,B) \geq avg(B,D)$):
$$ \left(\frac{1}{|D|} + \frac{1}{|C|}\right) avg(A,B) \geq \frac{1}{|D|} avg(A,C) + \frac{1}{|C|}avg(B,D) $$
$$ \left(\frac{1}{|D|} + \frac{1}{|C|}\right) \frac{\Delta M_{1,1} + \Delta M_{1,0}}{|A||B|(|C|+|D|)} \geq \frac{\Delta M_{1,1} + \Delta M_{0,1}}{|A||B||C||D|} $$
$$ \Delta M_{1,1} + \Delta M_{1,0} \geq \Delta M_{1,1} + \Delta M_{0,1}, $$
i.e., $\Delta M_{1,0} \geq \Delta M_{0,1}$.
Imagine we lose a $(0,1)$-triple $(x,y,z)$ by merging $(x,y)$. This means that if there are points $t < y$ such that $w(t,y) = 1$, then we also lose (or have already lost) the triple $(t,y,z)$.
We are trying to map $(x,y,z)$ onto $(pred(y),y,z')$.
Fix $y$ and let $L_q(y) = \{r < y \mid w(r,y) = q\}$ and $R_q(y) = \{r > y \mid w(r,y) = q\}$ be the sets of points to the left/right of $y$ with similarity $q$. Assume w.l.o.g. that $|L_1(y)| \ge |R_1(y)|$.
For every $z \in R_1(y)$ and $x \in L_0(y)$ we would like to find $x' \in L_1(y)$ with $x < x' < y$, and $z' \in R_1(y) \cup L_1(y)$.
For $x'$ pick any element of $L_1(y)$ and for $z'$ pick anything: $|L_1| \times (|L_1| + |R_1|) \geq |L_0| \times |R_1|$ choices.
# Fifth meeting notes
Let's consider $(i < j < k)$ triples. There are three different types of interesting triples: $(1,0)$, $(0,1)$, $(1,1)$. So the MAX-upper bound is just given as:
$$\Psi_{1,0} + \Psi_{0,1} + \Psi_{1,1}.$$
Consider the execution of the average-linkage clustering algorithm. For every $(i,j,k)$ triple consider the first time a pair in this triple gets merged. Let's use notation:
* $M_{1,0}$ is the number of $(1,0)$ or $(0,1)$ triples where the $1$-pair gets merged first
* $M_{0,1}$ is the number of such triples where the $0$-pair gets merged first
* $M_{1,1}$ is the number of merged $(1,1)$ triples
Our score is $M_{1,0} + M_{1,1}$ and the potential drop is $M_{1,0} + M_{0,1} + M_{1,1}$. So we would like to show that at every iteration the cumulative number of merges satisfies:
$$M_{1,0} + M_{1,1} \ge \frac 34 \left(M_{1,0} + M_{0,1} + M_{1,1}\right)$$
$$\frac14\left(M_{1,0} + M_{1,1}\right) \ge \frac34 M_{0,1}$$
$$M_{1,0} + M_{1,1} \ge 3 M_{0,1}$$
$$M_{1,0} + M_{1,1} - 3 M_{0,1} \ge 0$$
In the 4-point example we have:
* First iteration: $M_{1,0} = 0$, $M_{1,1} = 2, M_{0,1} = 0$
* Second iteration: $M_{1,0} = 1$, $M_{1,1} = 2, M_{0,1} = 1$
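The following sketch reproduces these counts, assuming the 4-point example is the path $1-2-3-4$ with $w(i,i+1) = 1$ and all other pairs $0$ (a guess consistent with the numbers above), and breaking average-linkage ties so that $(2,3)$ merges first:

```python
from itertools import combinations

# Guessed instance: path 1-2-3-4 with w(i, i+1) = 1, all other pairs 0.
n = 4
w = {frozenset(p): 0 for p in combinations(range(1, n + 1), 2)}
for i in range(1, n):
    w[frozenset((i, i + 1))] = 1

def avg(A, B):
    return sum(w[frozenset((x, y))] for x in A for y in B) / (len(A) * len(B))

# cluster list ordered so that ties break toward merging (2,3) first
clusters = [{2}, {3}, {1}, {4}]
merged = set()                   # point pairs already inside one cluster
M = {"1,0": 0, "0,1": 0, "1,1": 0}

while len(clusters) > 1:
    A, B = max(combinations(clusters, 2), key=lambda ab: avg(*ab))
    new_pairs = {frozenset((x, y)) for x in A for y in B}
    for i, j, k in combinations(range(1, n + 1), 3):
        tpairs = {frozenset((i, j)), frozenset((j, k)), frozenset((i, k))}
        if merged & tpairs or not (new_pairs & tpairs):
            continue             # triple already resolved, or untouched
        t = (w[frozenset((i, j))], w[frozenset((j, k))])
        if t == (1, 1):
            M["1,1"] += 1
        elif t in ((1, 0), (0, 1)):
            one_pair = frozenset((i, j) if t == (1, 0) else (j, k))
            M["1,0" if one_pair in new_pairs else "0,1"] += 1
    merged |= new_pairs
    clusters = [c for c in clusters if c is not A and c is not B] + [A | B]
    print(sorted(map(sorted, clusters)), M)
# prints M_{1,1} = 2 after the first merge and
# M_{1,0} = 1, M_{0,1} = 1, M_{1,1} = 2 after the second, as above.
```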
Consider the final step $(C,(A,B))$ where we merge $(A,B)$ and the rest is $C$ on the left. Let's express the changes in score and potential in terms of $M_{0,1}, M_{1,0}, M_{1,1}$:
Under the hypothesis,
$$\Delta S_{\textrm{merging } AC} \leq \Delta S_{\textrm{merging } AB}$$
We have,
$$\Delta M_{0,1} + \Delta M_{1,1} \leq \Delta M_{1,0} + \Delta M_{1,1}$$
# Fourth meeting notes
We would like to show something like this:
$$\sum_{i = 1}^t \left(\Delta C_i + \frac34 \Delta \Psi^{MAX}_i\right) \ge 0$$
We consider merging $A$ and $B$ where the rest is just $C$ on the left so that we have $(C,(A,B))$.
$$\Delta C = w_{AB}|C|$$
$$\Delta \Psi^{SUM} = - (w_{AB}|C| + w_{AC}|B|)$$
$$\Delta \Psi^{MAX} = - (w_{AB}|C| + w_{AC}|B| - \Psi_{1,1}(C,A,B))$$
Suppose we want a proof by induction going from $t$ to $t + 1$.
We have:
$$\sum_{i = 1}^t \left(\Delta C_i + \frac34 \Delta \Psi^{MAX}_i\right) \ge 0$$
We want to show:
$$\sum_{i = 1}^{t + 1} \left(\Delta C_i + \frac34 \Delta \Psi^{MAX}_i\right) \ge 0$$
$$\Delta C_{t + 1} = w_{AB}|C|$$
$$\Delta \Psi_{t + 1}^{MAX} = - (w_{AB}|C| + w_{AC}|B| - \Psi_{1,1}(C,A,B))$$
Equivalently, using $|B| w_{AC} = \Psi_{1,0} + \Psi_{1,1}$ and $|C| w_{AB} = \Psi_{0,1} + \Psi_{1,1}$ (derived below):
$$\Delta \Psi_{t + 1}^{MAX} = - (\Psi_{1,0} + \Psi_{0,1} + \Psi_{1,1})$$
We want:
$$w_{AB}|C| \ge \frac34 (w_{AB}|C| + w_{AC}|B| - \Psi_{1,1}(C,A,B))$$
$$\frac14 w_{AB}|C| \ge \frac34 w_{AC}|B| - \frac34 \Psi_{1,1}(C,A,B)$$
$$\frac14 \bar w_{AB} \ge \frac34 \bar w_{AC} - \frac{3}{4 |A||B||C|} \Psi_{1,1}(C,A,B)$$
Since average-linkage merged $(A,B)$, we have $\bar w_{AB} \ge \bar w_{AC}$, hence $\frac34 \bar w_{AC} - \frac14 \bar w_{AB} \le \frac12 \bar w_{AC}$ and it suffices to show:
$$0 \ge \frac 12 \bar w_{AC} - \frac{3}{4} \bar \Psi_{1,1}(C,A,B)$$
$$0 \ge \frac 12 (\bar \Psi_{1,0} + \bar \Psi_{1,1}) - \frac{3}{4} \bar \Psi_{1,1}(C,A,B)$$
$$0 \ge \frac 12 \bar \Psi_{1,0}(C,A,B) - \frac{1}{4} \bar \Psi_{1,1}(C,A,B)$$
We have $\bar w_{AC} \ge \bar \Psi_{1,1}(C,A,B)$ so we just need to show:
$$0 \ge - \frac{1}{4} \bar \Psi_{1,1}(C,A,B)$$
Counting triples $(c,a,b)$ with $c \in C$, $a \in A$, $b \in B$ by their $1$-pairs:
$$ |B| w_{AC} = \Psi_{1,0} + \Psi_{1,1}$$
$$ |C| w_{AB} = \Psi_{0,1} + \Psi_{1,1}$$
So,
$$ \bar w_{AC} = \bar \Psi_{1,0} + \bar \Psi_{1,1}$$
$$ \bar w_{AB} = \bar \Psi_{0,1} + \bar \Psi_{1,1}$$
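These counting identities are easy to verify by brute force; a sketch over random $0/1$ weights (the cluster sizes and weights are arbitrary test data):

```python
import random
from itertools import product

# Check |B| w_AC = Psi_{1,0} + Psi_{1,1} and |C| w_AB = Psi_{0,1} + Psi_{1,1},
# where Psi_{p,q}(C,A,B) counts triples (c,a,b) with w(c,a) = p, w(a,b) = q.
random.seed(1)
for _ in range(100):
    nC, nA, nB = (random.randint(1, 5) for _ in range(3))
    wCA = {(c, a): random.randint(0, 1) for c in range(nC) for a in range(nA)}
    wAB = {(a, b): random.randint(0, 1) for a in range(nA) for b in range(nB)}
    psi = {(p, q): 0 for p in (0, 1) for q in (0, 1)}
    for c, a, b in product(range(nC), range(nA), range(nB)):
        psi[wCA[c, a], wAB[a, b]] += 1
    assert nB * sum(wCA.values()) == psi[1, 0] + psi[1, 1]
    assert nC * sum(wAB.values()) == psi[0, 1] + psi[1, 1]
print("counting identities hold")
```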
Is it true that $\bar w_{AC} \ge \bar \Psi_{1,1}(C,A,B)$? Yes: by the first identity above, $\bar w_{AC} = \bar \Psi_{1,0} + \bar \Psi_{1,1} \ge \bar \Psi_{1,1}$.
In the 4-point example in the second step we have:
$|A| = 2, |B| = 1, |C| = 1$
$\bar w_{A,C} = \frac12$
# Third meeting notes
Let's study MAX-potential:
$$MAX = \sum_{i < j < k} \max(w_{ij}, w_{jk})$$
Let $w$ be the threshold weight at distance $\theta$, i.e. $w_{ij} = 1$ iff $d(i,j) \le \theta$.
Let's try to characterize the MAX-potential in this case. Let's split the points $x_i$ into groups which are at distance $> \theta$ from each other. Suppose there are two such groups $G_1, G_2$.
Let's evaluate the MAX-potential.
$$MAX(G_1 \cup G_2) = MAX(G_1) + MAX(G_2) + C_1 |G_2| + C_2 |G_1|,$$
where $C_i$ is the number of pairs in group $i$ at distance at most $\theta$.
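A quick check of this decomposition, assuming the points lie on a line with $G_1$ entirely to the left of $G_2$ and the two groups more than $\theta$ apart:

```python
import random
from itertools import combinations

def max_potential(pts, theta):
    """MAX = sum over ordered triples i<j<k of max(w_ij, w_jk),
    with threshold weights w_ij = 1 iff |x_i - x_j| <= theta."""
    w = lambda p, q: 1 if abs(p - q) <= theta else 0
    return sum(max(w(pts[i], pts[j]), w(pts[j], pts[k]))
               for i, j, k in combinations(range(len(pts)), 3))

theta = 1.0
random.seed(0)
for _ in range(100):
    # two groups on the line, separated by much more than theta
    G1 = sorted(random.uniform(0, 3) for _ in range(random.randint(2, 6)))
    G2 = sorted(random.uniform(10, 13) for _ in range(random.randint(2, 6)))
    C1 = sum(abs(p - q) <= theta for p, q in combinations(G1, 2))
    C2 = sum(abs(p - q) <= theta for p, q in combinations(G2, 2))
    lhs = max_potential(G1 + G2, theta)
    rhs = max_potential(G1, theta) + max_potential(G2, theta) \
          + C1 * len(G2) + C2 * len(G1)
    assert lhs == rhs
print("decomposition verified on random two-group instances")
```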
## 1/2-approximation analysis
Suppose we are merging $(A,B)$ and all of the rest is on the left, called $C$. So we have $(C, (A, B))$.
We want to show that $\Delta C + 1/2 \Delta \Psi \ge 0$.
We have:
$$\Delta C = w_{AB}|C|$$
$$\Delta \Psi_{SUM} = - (w_{AB}|C| + w_{AC}|B|)$$
$$\Delta \Psi_{MAX} = - (w_{AB}|C| + w_{AC}|B| - \Psi_{1,1}(C,A,B))$$
Overall in the 4-point example we have:
$$\Psi: 4 \rightarrow 2 \rightarrow 0 \rightarrow 0$$
$$C: 0 \rightarrow 2 \rightarrow 3 \rightarrow 3$$
$$\Psi_{1,1}: 2 \rightarrow 0 \rightarrow 0 \rightarrow 0$$
$$\Psi_{1,0}: 2 \rightarrow 2 \rightarrow 0 \rightarrow 0$$
# Second meeting notes
CA objective is to maximize:
$$\sum_{i < j} d(i,j) |LCA(i,j)|$$
Let's introduce an indicator $I_{k}(i,j)$ for the event that $k$ is in the subtree of $LCA(i,j)$.
Then we have:
$$CA = \sum_{i < j} d(i,j)\left(2 + \sum_{k \neq i,j} I_k(i,j)\right)$$
Let's introduce $CA' = CA - 2 \sum_{i < j} d(i,j)$. Then for a triangle $(i,j,k)$ we have
$$OPT_{CA'} = \sum_{i < j} d(i,j) - \min_{i < j} d(i,j)$$
$$OPT_{MW} = 1 - \min_{i < j} d(i,j)$$
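For three points there are only three trees, one per choice of first-merged pair, so these two formulas can be confirmed by enumeration (the distances below are arbitrary test values):

```python
# For 3 points the pair merged first has |LCA| = 2 and the other two
# pairs have |LCA| = 3. Check OPT_CA' = sum(d) - min(d), OPT_MW = 1 - min(d).
d = {frozenset((0, 1)): 0.2, frozenset((0, 2)): 0.5, frozenset((1, 2)): 0.9}

def ca_prime(first):
    # CA' = sum_{i<j} d(i,j) * (|LCA(i,j)| - 2)
    return sum(dist for pair, dist in d.items() if pair != first)

def mw(first):
    # MW = sum_{i<j} (1 - d(i,j)) * (3 - |LCA(i,j)|): only `first` contributes
    return 1 - d[first]

opt_ca = max(ca_prime(f) for f in d)
opt_mw = max(mw(f) for f in d)
assert abs(opt_ca - (sum(d.values()) - min(d.values()))) < 1e-9
assert abs(opt_mw - (1 - min(d.values()))) < 1e-9
```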
$$ALG_{MW} = 1-d(i^*,j^*) \ge (1/3 + \delta) OPT_{MW} = (1/3 + \delta) (1 - \min_{i < j} d(i,j))$$
This gives us:
$$-d(i^*,j^*) \ge (1/3 + \delta) (1 - \min_{i < j} d(i,j)) - 1$$
$$ALG_{CA'} = \sum_{i < j} d(i,j) - d(i^*,j^*)\ge (2/3 + \epsilon) OPT_{CA'} = (2/3 + \epsilon)(\sum_{i < j} d(i,j) - \min_{i < j} d(i,j))?$$
Plugging in the lower bound on $-d(i^*,j^*)$ from above we have:
$$ALG_{CA'} \ge \sum_{i < j} d(i,j) + (1/3 + \delta) (1 - \min_{i < j} d(i,j)) - 1$$
Using different notation:
$$x < y < z \le 1$$
$$ALG_{CA'} = x + z, OPT_{CA'} = y + z, ALG_{MW} = 1- y, OPT_{MW} = 1 - x$$
$$ALG_{MW} \ge (1/3 + \delta) OPT_{MW}$$
$$ALG_{MW} = 1 - y \ge (1/3 + \delta)(1 - x)$$
I am worried that $y = z$ and $x = 0$.
With $x = 0$ this becomes:
$$1 - y \ge 1/3 + \delta$$
# First meeting notes
MW objective is to maximize:
$$\sum_{i,j} \rho(i,j) (n - |LCA(i,j)|)$$
Cohen-Addad et al. objective is to maximize:
$$\sum_{i,j} d(i,j) |LCA(i,j)|$$
The straightforward conversion is $\rho(i,j) = 1 - d(i,j)$ after normalizing so that $d(i,j) \in [0,1]$.
**Counterexample:** Take triangle with sides at distances $(x, y, 1)$ where $x \le y \le 1$ and hence similarities $(1 - x, 1 - y, 0)$.
The optimal MW objective is $1 - x$ (merge the closest pair first).
The optimal Cohen-Addad objective is $2 x + 3 (1 + y)$.
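A small enumeration over the three possible trees confirms both values (sketch; the concrete $x, y$ with $x \le y \le 1$ are test values):

```python
x, y = 0.2, 0.5                      # triangle sides (x, y, 1) with x <= y <= 1
sides = {"x": x, "y": y, "z": 1.0}   # z is the unit side; rho = 1 - d

def mw_obj(first):
    # MW = sum rho * (3 - |LCA|); only the first-merged pair has |LCA| = 2
    return 1 - sides[first]

def ca_obj(first):
    # CA = sum d * |LCA| = 2 * d_first + 3 * (the other two sides)
    return 2 * sides[first] + 3 * sum(d for s, d in sides.items() if s != first)

assert max(mw_obj(f) for f in sides) == 1 - x                    # merge x first
assert abs(max(ca_obj(f) for f in sides) - (2*x + 3*(1 + y))) < 1e-9
# ratio if the y-pair is merged first instead:
print(ca_obj("y") / max(ca_obj(f) for f in sides))               # ~0.939
```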
Suppose the MW algorithm merges the $y$-pair first and gets $1 - y = \alpha (1 - x)$, so that $y = 1 - \alpha(1-x)$. Then what can we say about the CA-objective?
It gets $2 y + 3(1 + x)$ while the optimum is $2x + 3(1 + y)$, so the ratio is:
$$\frac{2 y + 3x + 3}{2x + 3y + 3} = 1 - \frac{y - x}{2x + 3y + 3}$$
Let's plug in the fact that $(1/3 + \alpha)(1 - x) = 1 - y$ or $y = 1 - (1/3 + \alpha)(1 - x)$. So we have:
$$1 - \frac{1 - (1/3 + \alpha)(1 - x) - x}{2x + 3 + 3 - (1 + 3\alpha)(1 - x)}$$
Redoing this with the parametrization $y = 1 - \alpha(1-x)$ (now writing $\alpha$ for the whole approximation factor):
$$\frac{2 y + 3x + 3}{2x + 3y + 3} = \frac{2(1-\alpha(1-x)) + 3x + 3}{2x + 3(1-\alpha(1-x))+3}$$
$$\frac{5-2\alpha+(3+2\alpha)x}{6-3\alpha+(2+3\alpha)x} = 1 - \frac{(1-\alpha)(1-x)}{6-3\alpha+(2+3\alpha)x}$$
For $\alpha = 0$: $\frac{5}{6} \le 1-\frac{1-x}{6+2x} \le 1$.
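A symbolic check of this simplification (a sketch using sympy):

```python
from sympy import symbols, simplify

x, a = symbols("x alpha")
y = 1 - a * (1 - x)                  # parametrization alpha*(1 - x) = 1 - y
ratio = (2 * y + 3 * x + 3) / (2 * x + 3 * y + 3)
closed = 1 - (1 - a) * (1 - x) / (6 - 3 * a + (2 + 3 * a) * x)
print(simplify(ratio - closed))      # prints 0
```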

**Claim:** $\frac{1}{3}+\alpha$-approx for MW $\Rightarrow$ $\frac{2}{3} + \frac{1}{2}\alpha$-approx for C-A.
We have $(1/3 + \alpha)(1 - x) = 1 - y$ so $y = 1 - (1/3 + \alpha)(1 - x)$.
Taking $\alpha = 1/3$ (i.e., a $\frac23$-approximation for MW):
$$ y = 1 - 2\frac{1-x}{3} = \frac{2x+1}{3}$$
Let's evaluate CA-objective:
$$\frac{2 y + 3x + 3}{2x + 3y + 3} = \frac{3x+3+\frac{2}{3}(2x+1)}{2x + (2x+1) + 3}$$
$$= \frac{(3+\frac{4}{3})x+\frac{11}{3}}{4x+4} = \frac{13x+11}{12(x+1)}$$

Triangle inequality: $x + y \ge 1$!
$$ MW(T) \ge \alpha (1-x) $$
$$ CA(T) \ge (1-\alpha) (3(x+y) -x + 3 ) \ge (1-\alpha)(6-x) $$
$$ MW(T) = n\sum \rho(i,j) - \sum |LCA(i,j)| \rho(i,j)$$