---
title: 'ML2021FALL HW2'
disqus: hackmd
---

## HW2 - Handwritten Assignment

### 1.
Consider a generative classification model for $K$ classes defined by prior class probabilities $p(C_k) = \pi_k$ and general class-conditional densities $p(x|C_k)$, where $x$ is the input feature vector. (Note that $\pi_1+\dots+\pi_K=1$.) Suppose we are given a training data set $\{x_n, {\bf t}_n\}$ where $n = 1,\dots,N$, and ${\bf t}_n$ is a binary target vector of length $K$ that uses the 1-of-$K$ coding scheme, so that it has components $t_{nk}= 1$ if pattern $n$ is from class $C_k$, and $t_{nj}= 0$ for $j \neq k$. Assuming that the data points are drawn independently from this model, show that the maximum-likelihood solution for the prior probabilities is given by $$\pi_k =\frac{N_k}{N},$$ where $N_k$ is the number of data points assigned to class $C_k$.

### 2.
Show that $$\frac{\partial \log(\det{\bf \Sigma})}{\partial \sigma_{ij}} = {\bf e}_j\,{\bf \Sigma}^{-1}\,{\bf e}_i^T,$$ where ${\bf \Sigma} \in {\mathbb R}^{m\times m}$ is a (non-singular) covariance matrix and ${\bf e}_j$ is a row vector (e.g. ${\bf e}_3=[0,0,1,0,\dots,0]$).

Hint: ![](https://i.imgur.com/gq5ylwQ.jpg)

### 3.
Consider the classification model of $\bf problem \space 1$ and the result of $\bf problem \space 2$, and now suppose that the class-conditional densities are given by Gaussian distributions with a shared covariance matrix, so that $$p(x|C_k)= \mathcal{N}(x|{\bf \mu}_k,{\bf \Sigma}).$$ Show that the maximum-likelihood solution for the mean of the Gaussian distribution for class $C_k$ is given by $${\bf \mu}_k =\frac{1}{N_k}\sum_{n=1}^N t_{nk}x_n,$$ which represents the mean of those feature vectors assigned to class $C_k$.
Similarly, show that the maximum-likelihood solution for the shared covariance matrix is given by $${\bf\Sigma}=\sum_{k=1}^K\frac{N_k}{N}{\bf S}_k,$$ where $${\bf S}_k = \frac{1}{N_k}\sum_{n=1}^N t_{nk}({\bf x}_n-{\bf \mu}_k)({\bf x}_n-{\bf \mu}_k)^T.$$ Thus $\bf \Sigma$ is given by a weighted average of the covariances of the data associated with each class, in which the weighting coefficients are given by the prior probabilities of the classes.
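As a sanity check on these results, the sketch below (a NumPy illustration, not part of the assignment; the toy data, class labels, and the index pair `i, j` are all made up for demonstration) verifies the log-det derivative of problem 2 by finite differences and evaluates the closed-form MLE expressions of problems 1 and 3 directly from one-hot targets:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Problem 2: finite-difference check of d log det(Sigma) / d sigma_ij ---
m = 4
A = rng.standard_normal((m, m))
Sigma = A @ A.T + m * np.eye(m)          # symmetric positive definite
inv = np.linalg.inv(Sigma)
i, j, eps = 1, 2, 1e-6
E = np.zeros((m, m))
E[i, j] = 1.0                            # perturb only entry (i, j)
num = (np.linalg.slogdet(Sigma + eps * E)[1]
       - np.linalg.slogdet(Sigma - eps * E)[1]) / (2 * eps)
# e_j Sigma^{-1} e_i^T picks out entry (j, i) of the inverse
assert abs(num - inv[j, i]) < 1e-6

# --- Problems 1 & 3: MLE formulas evaluated from one-hot targets ---
N, K, d = 500, 3, 2
labels = rng.integers(0, K, size=N)
T = np.eye(K)[labels]                    # one-hot targets t_nk, shape (N, K)
X = rng.standard_normal((N, d)) + labels[:, None]   # toy feature vectors

Nk = T.sum(axis=0)                       # class counts N_k
pi = Nk / N                              # pi_k = N_k / N
mu = (T.T @ X) / Nk[:, None]             # mu_k = (1/N_k) sum_n t_nk x_n
# shared covariance: weighted average of the per-class scatter matrices S_k
Sigma_hat = sum(
    (Nk[k] / N) * ((X - mu[k]).T * T[:, k]) @ (X - mu[k]) / Nk[k]
    for k in range(K)
)
```

The finite-difference quotient matches $({\bf \Sigma}^{-1})_{ji}$, which is exactly what the row-vector product ${\bf e}_j{\bf \Sigma}^{-1}{\bf e}_i^T$ extracts, and the estimates `pi`, `mu`, and `Sigma_hat` are term-by-term transcriptions of the three formulas above.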