# 2021FALL ML HW5

### 1. Estimate Rate Parameter of Exponential Mixture Model (2%)

This is a toy example of the Expectation-Maximization (EM) algorithm, used to estimate the rate parameters of a mixture of two exponential distributions.

### Derive the update rules of the EM algorithm

Given the dataset $\boldsymbol{X}=\{x_{1}, x_{2}, x_{3}\} = \{0, 2, 4\} \subset [0,\infty)$, we would like to cluster the points into 2 clusters. We apply the Expectation-Maximization (EM) algorithm to find the maximum likelihood estimate of the parameters $\theta$. That is to say, we want to estimate the rate parameters and prior probabilities $\theta=\{(\pi_{k},\tau_{k})\}_{k=1}^{2}$ that maximize the likelihood

$$p(\boldsymbol{X};\theta)=\prod_{i=1}^{3}p(x_{i};\theta)=\prod_{i=1}^{3}\sum_{k=1}^{2}\pi_{k}f_{\tau_{k}}(x_{i})$$

where $\pi_1+\pi_2=1$, $k$ is an index over the exponential distributions, $i$ is an index over the data points, and $f_{\tau}$ is the probability density function of the exponential distribution:

$$f_{\tau}(x)=\begin{cases}\dfrac{1}{\tau}e^{-\frac{x}{\tau}}, & x \ge 0\\ 0, & \text{otherwise}\end{cases}$$

#### (a) Given the current parameter estimates $\theta^{[t]}=\{(\tau_{k}^{[t]},\pi_{k}^{[t]})\}_{k=1}^{2}$, where $t$ is an index over the EM iterations, write down the Expectation step (E-step). That is to say, derive the posterior probability of the latent variables, $\mathbb{P}[z_{i} = k \mid x_i;\theta^{[t]}]$, and the expectation of the log-likelihood function, $Q(\theta|\theta^{[t]})$.

#### (b) Given the current parameters $\theta^{[t]}$, write down the Maximization step (M-step). That is to say, derive the estimated rate parameter $\tau_{k}^{[t+1]}$ and the prior probability $\pi_{k}^{[t+1]}$.

### Go through the first iteration of the EM algorithm

Let $\boldsymbol{X}=\{x_{1}, x_{2}, x_{3}\} = \{0, 2, 4\}$ be our three data points, each presumed to have been generated from one of the two exponential distributions. In addition, let's initialize the rate parameters and the priors over the two exponential distributions to values that keep the calculation easy:

$\tau_{1}^{[0]} = 1,\ \tau_{2}^{[0]} = 2$

$\pi_{1}^{[0]} = \pi_{2}^{[0]} = 0.5$

#### \(c\) Please go through the first iteration of the EM algorithm and either write down your derivation or provide your code of computation in the report (a computational sketch appears after Problem 2 below). In addition, fill in a table of your answers as follows.

| $i$ | 1 | 2 | 3 |
| --- | --- | --- | --- |
| $x_{i}$ | 0 | 2 | 4 |
| $p(x_i,z_i=1;\theta^{[t]})$ | | | |
| $p(x_i,z_i=2;\theta^{[t]})$ | | | |
| $\sum_{j=1}^{2}p(x_i,z_i=j;\theta^{[t]})$ | | | |
| $\delta_{i1}^{[t]}=\mathbb{P}[z_{i} = 1 \mid x_i;\theta^{[t]}]$ | | | |
| $\delta_{i2}^{[t]}=\mathbb{P}[z_{i} = 2 \mid x_i;\theta^{[t]}]$ | | | |

#### \(d\) Please derive the estimated parameters ${\tau}_{1}^{[1]},{\tau}_{2}^{[1]},{\pi}_{1}^{[1]},{\pi}_{2}^{[1]}$ using the table from \(c\), and either write down your derivation or provide your code of computation in the report.

### 2. Principal Component Analysis (1%)

Given $10$ samples in 3D: $(1,2,3)$, $(4,8,5)$, $(3,12,9)$, $(1,8,5)$, $(5,14,2)$, $(7,4,1)$, $(9,8,9)$, $(3,8,1)$, $(11,5,6)$, $(10,11,7)$

#### (a) What are the principal axes? Please write down your derivation or provide your code of computation in the report.

#### (b) Please compute the principal components for each sample and either write down your derivation or provide your code of computation in the report.

#### \(c\) What is the average reconstruction error if we reduce the dimension to 2D? Here the reconstruction error is defined as the squared loss. Please write down your derivation or provide your code of computation in the report.
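For Problem 1(c)–(d), the following is a minimal NumPy sketch of the first EM iteration under the assignment's density $f_{\tau}(x)=\frac{1}{\tau}e^{-x/\tau}$. The variable names (`joint`, `delta`, `tau_next`, etc.) are our own; the printed arrays correspond to the rows of the table in \(c\) and the updated parameters in \(d\). It illustrates one way to do the computation, not a required solution.

```python
import numpy as np

# First EM iteration for the two-component exponential mixture in Problem 1.
# Density as defined in the assignment: f_tau(x) = (1/tau) * exp(-x/tau).
x = np.array([0.0, 2.0, 4.0])     # data points x_1, x_2, x_3
tau = np.array([1.0, 2.0])        # tau_1^[0], tau_2^[0]
pi = np.array([0.5, 0.5])         # pi_1^[0], pi_2^[0]

# E-step: joints p(x_i, z_i=k; theta^[0]) = pi_k * f_{tau_k}(x_i), shape (3, 2)
joint = pi / tau * np.exp(-x[:, None] / tau)
marginal = joint.sum(axis=1, keepdims=True)   # sum_j p(x_i, z_i=j; theta^[0])
delta = joint / marginal                      # posteriors delta_ik

# M-step: responsibility-weighted mean and average responsibility
tau_next = (delta * x[:, None]).sum(axis=0) / delta.sum(axis=0)   # tau_k^[1]
pi_next = delta.mean(axis=0)                                      # pi_k^[1]

print("joint p(x_i, z_i=k):\n", joint)
print("posteriors delta_ik:\n", delta)
print("tau^[1] =", tau_next, " pi^[1] =", pi_next)
```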
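For Problem 2, here is a minimal sketch assuming the $\frac{1}{n}$ covariance convention and a per-sample squared $\ell_2$ loss for the reconstruction error; the assignment does not pin down either convention, so adjust (e.g., to $\frac{1}{n-1}$) if your course uses a different one. The principal axes are unique only up to sign.

```python
import numpy as np

# PCA sketch for Problem 2: principal axes, components, and the average
# reconstruction error after projecting onto the top-2 axes.
X = np.array([(1, 2, 3), (4, 8, 5), (3, 12, 9), (1, 8, 5), (5, 14, 2),
              (7, 4, 1), (9, 8, 9), (3, 8, 1), (11, 5, 6), (10, 11, 7)],
             dtype=float)

Xc = X - X.mean(axis=0)                 # center the data
cov = Xc.T @ Xc / X.shape[0]            # covariance (1/n convention assumed)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]       # reorder to descending variance
axes = eigvecs[:, order]                # (a) principal axes as columns
components = Xc @ axes                  # (b) principal components per sample

# (c) keep the top-2 axes, reconstruct, and average the squared loss
X_hat = components[:, :2] @ axes[:, :2].T + X.mean(axis=0)
avg_recon_error = np.mean(np.sum((X - X_hat) ** 2, axis=1))

print("principal axes (columns):\n", axes)
print("average reconstruction error:", avg_recon_error)
```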
### 3. Constrained Mahalanobis Distance Minimization Problem (1%)

#### (a) Let $A \in R^{m \times n}$. Show that $AA^T$ and $A^TA$ are both symmetric and positive semi-definite, and that they share the same non-zero eigenvalues.

#### (b) Let $\Sigma \in R^{m \times m}$ be a symmetric positive semi-definite matrix and $\mu \in R^m$. Please construct a set of points $x_1,\dots,x_n \in R^m$ such that

$$\frac{1}{n}\sum_{i=1}^n (x_i - \mu) (x_i - \mu)^T = \Sigma, \qquad \frac{1}{n}\sum_{i=1}^n x_i = \mu$$

#### \(c\) Let $1 \leq k \leq m$. Solve the following optimization problem (and justify your answer with a proof):

$$\begin{aligned}
\text{minimize} \quad & \operatorname{Trace}(\Phi^T \Sigma \Phi) \\
\text{subject to} \quad & \Phi^T \Phi = I_k \\
\text{variables} \quad & \Phi \in R^{m \times k}
\end{aligned}$$
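As a numerical sanity check for Problem 3 (not a substitute for the proofs the problem asks for), here is a short sketch. `Sigma`, `mu`, `m`, and `k` below are arbitrary illustrative choices; part (b) uses one possible construction built from the eigendecomposition of $\Sigma$, and part (c) compares the candidate minimizer against random feasible points.

```python
import numpy as np

# Numerical checks for Problem 3; Sigma is a randomly generated symmetric
# PSD matrix (as in part (a)), and mu is a random mean vector.
rng = np.random.default_rng(0)
m, k = 4, 2
A = rng.standard_normal((m, m))
Sigma = A @ A.T
mu = rng.standard_normal(m)

# (b) one possible construction: with Sigma = U diag(lam) U^T, the n = 2m
# points mu +/- sqrt(m * lam_j) u_j have sample mean mu and covariance Sigma.
lam, U = np.linalg.eigh(Sigma)
V = U * np.sqrt(np.clip(m * lam, 0, None))   # column j = sqrt(m*lam_j) u_j
X = np.vstack([mu + V.T, mu - V.T])          # shape (2m, m)
assert np.allclose(X.mean(axis=0), mu)
assert np.allclose((X - mu).T @ (X - mu) / len(X), Sigma)

# (c) check: the minimum of Trace(Phi^T Sigma Phi) over orthonormal Phi
# should equal the sum of the k smallest eigenvalues of Sigma, attained by
# the corresponding eigenvectors (eigh returns ascending eigenvalues).
Phi_star = U[:, :k]
print("sum of k smallest eigenvalues:", lam[:k].sum())
print("objective at Phi*:", np.trace(Phi_star.T @ Sigma @ Phi_star))
Q, _ = np.linalg.qr(rng.standard_normal((m, k)))   # a random feasible Phi
print("objective at a random feasible Phi:", np.trace(Q.T @ Sigma @ Q))
```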