# 2020 DS sample problem

\- Amelia Watson
---
## Chen's
1.
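(a) A kernel simplifies computing the inner product between two transformed points: it evaluates the inner product in the feature space directly from the original points, without computing the transformation explicitly.
(b) As the dimension grows, the points become sparse in the space and pairwise distances become nearly indistinguishable, so distances and clustering become less meaningful.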
(c) $P(A|B) = \frac{P(A, B)}{P(B)} \Rightarrow P(A, B) = P(A|B)P(B) = P(B|A)P(A)$
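
To make 1(a) concrete, here is a minimal sketch using the degree-2 polynomial kernel $K(x, y) = (x \cdot y)^2$ on 2-D points. The kernel choice, the feature map `phi`, and the sample points are assumptions for illustration and are not taken from the exam.

```python
import math


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel (assumed example)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x, y):
    """Same inner product, computed from the original points only."""
    return dot(x, y) ** 2


x, y = (1.0, 2.0), (3.0, 4.0)      # hypothetical sample points
explicit = dot(phi(x), phi(y))     # transform first, then take the inner product
implicit = poly_kernel(x, y)       # kernel trick: no transformation needed
print(explicit, implicit)
```

Both values agree up to floating-point rounding, but the kernel version never builds the higher-dimensional vectors.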
2. DB
(a) blabla
(b) A, B, E, F, {BC}, {FJ}, {DFJ}, {EFJ}
3. Assume there are three clusters in the space, and the first two are far away from the third. If one initial center lies between the first two clusters and the other two lie in the third cluster, the result will not be optimal: the first two clusters are merged into one and the third is split.
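
The scenario in problem 3 can be reproduced with a small 1-D sketch, assuming the question is about k-means (Lloyd's algorithm). The data points and the two initializations below are made up for illustration.

```python
def kmeans_1d(points, centers, iters=20):
    """Lloyd's algorithm on 1-D data; returns final centers and the SSE."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Keep the old center if a group ends up empty.
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, sse


# Hypothetical data: two nearby clusters and one far away.
points = [0.0, 1.0, 4.0, 5.0, 20.0, 21.0]

# Bad start: one center between the first two clusters, two in the third.
bad_centers, bad_sse = kmeans_1d(points, [2.5, 20.0, 21.0])
# Good start: one center in each cluster.
good_centers, good_sse = kmeans_1d(points, [0.0, 4.0, 20.0])

print(bad_centers, bad_sse)
print(good_centers, good_sse)
```

With the bad start the algorithm stops immediately at a local optimum whose sum of squared errors is much larger than the one reached from the good start.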
4. $P(Y|X) = 3/5 * 2/6 * 4/6 * 4/7 = 96/...$ <- larger, so predict Y
$P(N|X) = 2/5 * 4/6 * 2/6 * 3/7 = 48/...$
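
The comparison in problem 4 can be checked with exact fractions. This is a minimal sketch: the factors are copied from the two lines above, and reading them as a prior times per-feature likelihoods is an assumption, since the data table is not reproduced here.

```python
from fractions import Fraction

# Factors copied from the answer; their interpretation is assumed.
score_yes = Fraction(3, 5) * Fraction(2, 6) * Fraction(4, 6) * Fraction(4, 7)
score_no = Fraction(2, 5) * Fraction(4, 6) * Fraction(2, 6) * Fraction(3, 7)

# Both products share the same denominator, so comparing them is enough.
prediction = "Y" if score_yes > score_no else "N"
print(score_yes, score_no, prediction)
```

Using `Fraction` avoids rounding, so the two scores can be compared exactly.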
5. Compute it yourself.
6. 
(1) Merge A & H
(2) Merge B & G
(3) Merge {B, G} & E
(4) Merge {A, H} & F
(5) Merge {B, E, G} & D
---
## Lo's
1.
(a) A statistic is a function of a random sample drawn from a random variable $X$ with distribution $f(x, \theta)$ that does not depend on the unknown parameter $\theta$.
(b) A statement about the population based on sample information.
(c) An accurate statistic used to approximate a population parameter: the expectation of the estimator equals the parameter being estimated, $E[\hat{\theta}] = \theta$.
2.
(a) $\frac{1}{n}\sum\limits_{i=1}^{n} X_{i}^{k}$
(b) Even if the next point is worse than the current local optimum, there is still a chance that it replaces the current point.
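
The idea in 2(b) can be sketched as an acceptance rule, assuming the question refers to simulated annealing with the Metropolis criterion. The function names and the `temperature` parameter are illustrative and not taken from the exam.

```python
import math
import random


def acceptance_probability(delta, temperature):
    """Chance of moving to a candidate whose cost is higher by delta (minimization)."""
    if delta <= 0:
        return 1.0  # better or equal candidates are always accepted
    return math.exp(-delta / temperature)  # worse ones are accepted sometimes


def accept(delta, temperature, rng=random):
    """Draw a uniform number and compare it with the acceptance probability."""
    return rng.random() < acceptance_probability(delta, temperature)


print(acceptance_probability(1.0, 10.0))  # high temperature: worse moves are likely
print(acceptance_probability(1.0, 0.1))   # low temperature: worse moves are rare
```

Because worse candidates keep a nonzero acceptance probability, the search can leave a local optimum, and lowering the temperature gradually makes it more conservative.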
3. Since $\frac{14}{15} > 0.9$, the last dimension can be dropped.
4. $-1.64 \leq \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}} \leq 1.64$
$\Rightarrow \frac{565.2}{90}-1.64\sqrt{\frac{10}{90}} \leq \mu \leq \frac{565.2}{90}+1.64\sqrt{\frac{10}{90}}$
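
The interval in problem 4 can be evaluated numerically. This is a minimal sketch that reads the quantities off the expression above (sample sum 565.2, $n = 90$, $\sigma^2 = 10$, critical value 1.64); the variable names are illustrative.

```python
import math

sample_sum = 565.2  # numerator of the sample mean in the answer
n = 90              # sample size
variance = 10.0     # sigma^2, read off from sqrt(10/90)
z = 1.64            # critical value used in the answer

mean = sample_sum / n
half_width = z * math.sqrt(variance / n)
lower, upper = mean - half_width, mean + half_width
print(f"{lower:.3f} <= mu <= {upper:.3f}")
```

The sample mean is 6.28, so the interval is roughly 5.73 to 6.83.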
5. F2 and F4 are consistent with the class.
6. $Y_6 = 36-30+10.5+\epsilon_6 = 16.5+\epsilon_6$, $Y_7 = 33+2\epsilon_6-27+10+\epsilon_7 = 16+2\epsilon_6+\epsilon_7$