# VL Machine Learning: Unsupervised Techniques - Example problems
###### tags: `Exam`
**Given a probability distribution (or density function) and a function, compute the expected value of the function.**
**Example 1:** A random variable $Y$ has the following density function:
$$f(y) = \left\{\begin{array}{ll}\frac{3y^2(4-y)}{64} & \textrm{for } 0 \leq y \leq 4\\
0 & \textrm{elsewhere} \end{array}\right.$$
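The function whose expected value should be computed is not specified in these notes; as an illustration, take $g(Y)=Y$, i.e. compute the mean of $Y$:
$$E[Y]=\int_{0}^{4}y\,\frac{3y^2(4-y)}{64}\,dy=\frac{1}{64}\int_{0}^{4}\left(12y^3-3y^4\right)dy=\frac{1}{64}\left[3y^4-\frac{3y^5}{5}\right]_{0}^{4}=\frac{768-614.4}{64}=2.4$$
For any other function $g$, replace $y$ by $g(y)$ in the integrand: $E[g(Y)]=\int_{0}^{4}g(y)f(y)\,dy$.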
---
**Given two probability distributions, compute their KL divergence.**
**Example 1:** Two **discrete** probability distributions P(x) and Q(x)
| x | 0 | 1 | 2 |
|:--------:|:--------:|:---------:|:--------:|
| $$P(x)$$ | $$9/25$$ | $$12/25$$ | $$4/25$$ |
| $$Q(x)$$ | $$1/3$$ | $$1/3$$ | $$1/3$$ |
$$D_{KL}(P||Q) = \sum P(x)\ln\left(\frac{P(x)}{Q(x)}\right) = \frac{9}{25}\ln\left(\frac{9/25}{1/3}\right)+\frac{12}{25}\ln\left(\frac{12/25}{1/3}\right)+\frac{4}{25}\ln\left(\frac{4/25}{1/3}\right) = 0.0853$$
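A minimal sketch in Python (using NumPy) to verify the result; it assumes both distributions assign non-zero probability to every outcome:
```python
import numpy as np

def kl_divergence(p, q):
    """KL divergence D_KL(P||Q) of two discrete distributions, in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

P = [9/25, 12/25, 4/25]
Q = [1/3, 1/3, 1/3]
print(kl_divergence(P, Q))  # ~0.0853
print(kl_divergence(Q, P))  # different value: KL divergence is not symmetric
```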
---
**Given a probability distribution, compute its entropy.**
Given a discrete random variable $X$, with possible outcomes $x_{1}, ..., x_{n}$, which occur with probability ${P} (x_{1}),..., {P} (x_{n})$, the entropy of $X$ is formally defined as:
$$H(X)=-\sum_{i=1}^{n}P(x_i)\log_b(P(x_i))$$
Base $b=2$ gives the unit of bits (or "shannons"), while base $e$ gives the "natural unit" nat, and base $10$ gives a unit called "dit", "ban", or "hartley".
**Example 1:** Consider a fair die with 6 sides. When rolling the die, each side has a probability of $\frac{1}{6}$ of coming up. The entropy of the probability distribution (here in base $2$) therefore is:
$$H(X)=- \sum_{i=1}^{6}\frac{1}{6}\log_{2}\frac{1}{6}=- \sum_{i=1}^{6}\frac{1}{6}\,(-2.585)=\log_{2}6 \approx 2.585 \textrm{ bits}$$
In base $e$ the same distribution gives $\ln 6 \approx 1.792$ nats, and in base $10$ it gives $\log_{10}6 \approx 0.778$ hartleys; changing the base only rescales the entropy by a constant factor.
**Example 2:** Now consider an unfair die, where one side comes up with a probability of $0.5$ and the other 5 sides with a probability of $0.1$ each.
$$H(X)=- \left(0.5 \log_{2}(0.5) + \sum_{i=1}^{5}0.1\log_{2}(0.1) \right)=$$$$=- \left[ 0.5 \cdot (-1) + 5 \cdot 0.1 \cdot (-3.3219)\right] \approx 2.161 \textrm{ bits}$$
As expected, this is lower than the entropy of the fair die, since the outcome of the unfair die is less uncertain.
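The same computations as a short Python sketch (using NumPy):
```python
import numpy as np

def entropy(p, base=2):
    """Shannon entropy of a discrete distribution given as a probability vector."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p)) / np.log(base)

print(entropy([1/6] * 6))             # fair die:   ~2.585 bits
print(entropy([0.5] + [0.1] * 5))     # unfair die: ~2.161 bits
print(entropy([1/6] * 6, base=np.e))  # fair die:   ~1.792 nats
```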
---
**Given two vectors, compute their outer product.**
**Example 1:** Two vectors **u** and **v**
$$\bf{u} = \left[\begin{array}{c}
1 \\
2 \\
3 \\
4 \\
\end{array}\right]$$
$$\bf{v} = \left[\begin{array}{c}
5 \\
4 \\
3 \\
2 \\
1 \\
\end{array}\right]$$
$$\bf{u} \otimes \bf{v} = \bf{u} \bf{v}^{\textrm{T}} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \\ \end{bmatrix} \begin{bmatrix} 5 & 4 & 3 & 2 & 1 \\ \end{bmatrix} =
\left[\begin{array}{ccccc}
5 & 4 & 3 & 2 & 1 \\
10 & 8 & 6 & 4 & 2 \\
15 & 12 & 9 & 6 & 3 \\
20 & 16 & 12 & 8 & 4 \\
\end{array}\right]$$
**Example 2:** Two vectors **u** and **v**
$$\bf{u} = \left[\begin{array}{c}
u_1 \\
u_2 \\
u_3 \\
\end{array}\right]$$
$$\bf{v} = \left[\begin{array}{c}
v_1 \\
v_2 \\
v_3 \\
v_4 \\
\end{array}\right]$$
$$\bf{u} \otimes \bf{v} = \bf{u} \bf{v}^{\textrm{T}} = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix} \begin{bmatrix} v_1 & v_2 & v_3 & v_4 \\ \end{bmatrix} =
\left[\begin{array}{cccc}
u_1v_1 & u_1v_2 & u_1v_3 & u_1v_4 \\
u_2v_1 & u_2v_2 & u_2v_3 & u_2v_4 \\
u_3v_1 & u_3v_2 & u_3v_3 & u_3v_4 \\
\end{array}\right]$$
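In Python (using NumPy), the outer product is available directly, as this brief sketch shows:
```python
import numpy as np

u = np.array([1, 2, 3, 4])
v = np.array([5, 4, 3, 2, 1])

# np.outer computes u v^T; the result has shape (len(u), len(v))
print(np.outer(u, v))

# equivalently, via broadcasting a column vector against a row vector
print(u[:, None] * v[None, :])
```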
---
**Given a factor loading matrix U, signal vector y and noise vector epsilon, compute the observation vector x.**
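Assuming the standard linear generative model of factor analysis, the observation is the linear mixture of the signals plus additive noise:
$$\bf{x} = \bf{U}\bf{y} + \bf{\epsilon}$$
A minimal numeric sketch (all values made up for illustration):
```python
import numpy as np

U = np.array([[1.0, 0.5],
              [0.2, 1.0],
              [0.3, 0.3]])         # factor loading matrix (3 observed dims, 2 factors)
y = np.array([2.0, -1.0])          # signal (factor) vector
eps = np.array([0.1, -0.2, 0.05])  # noise vector

x = U @ y + eps                    # observation vector
print(x)                           # [1.6, -0.8, 0.35]
```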

---
**Given data matrix X and least squares solution matrix A, compute the projection matrix (of the data onto factors) Y.**
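Assuming the factor model $X \approx UY$ with loading matrix $U$, the least squares solution matrix is the (Moore-Penrose) pseudoinverse $A = U^{+} = (U^{\textrm{T}}U)^{-1}U^{\textrm{T}}$, and the projection of the data onto the factors is then simply
$$Y = AX$$
A short sketch under that assumption (random made-up data, noise-free for clarity):
```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(size=(5, 2))         # loading matrix: 5 observed dims, 2 factors
Y_true = rng.normal(size=(2, 100))  # true factors for 100 samples
X = U @ Y_true                      # noise-free data matrix

A = np.linalg.pinv(U)               # least squares solution matrix
Y = A @ X                           # projection of the data onto the factors
print(np.allclose(Y, Y_true))       # True in this noise-free setting
```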

---
**Compute the posterior distribution via Bayes’ theorem (from conditional and marginals).**
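For a hypothesis $\theta$ and observed data $x$, Bayes' theorem gives the posterior in terms of the conditional (likelihood), the prior, and the marginal:
$$P(\theta|x)=\frac{P(x|\theta)P(\theta)}{P(x)}, \qquad P(x)=\sum_{\theta'}P(x|\theta')P(\theta')$$
**Example (illustrative numbers):** A test for a disease has prior $P(D)=0.01$, sensitivity $P(+|D)=0.99$, and false-positive rate $P(+|\neg D)=0.05$. Then:
$$P(D|+)=\frac{0.99 \cdot 0.01}{0.99 \cdot 0.01 + 0.05 \cdot 0.99}=\frac{0.0099}{0.0594}\approx 0.167$$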
---
**Given probabilities (or a distribution) for observing data, compute the likelihood or log-likelihood.**
{%pdf https://www.medicine.mcgill.ca/epidemiology/hanley/bios601/Likelihood/Likelihood.pdf %}
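In short: for i.i.d. samples $x_1, \dots, x_n$ from a model with parameter $\theta$, the likelihood and log-likelihood are
$$L(\theta)=\prod_{i=1}^{n}p(x_i|\theta), \qquad \ell(\theta)=\ln L(\theta)=\sum_{i=1}^{n}\ln p(x_i|\theta)$$
**Example (illustrative):** Observing $7$ heads in $10$ independent coin flips with head probability $p$ gives (dropping the constant binomial coefficient, which does not affect the maximum):
$$L(p)=p^{7}(1-p)^{3}, \qquad \ell(p)=7\ln p+3\ln(1-p)$$
Setting $\ell'(p)=\frac{7}{p}-\frac{3}{1-p}=0$ yields the maximum likelihood estimate $\hat{p}=0.7$.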
---
**Given a distribution (or a set of data samples), compute mean, median, mode, and variance.**
**Example 1:** Consider an exponential random variable (exponential distribution). Its density is defined as follows:
$$f(x) = \left\{\begin{array}{ll} \lambda e^{-\lambda x} & \textrm{for } x \geq 0\\
0 & \textrm{elsewhere} \end{array}\right.$$
* **Mean:** The mean of a distribution is its expected value. It is defined as:
$$E[X]=\int_{-\infty}^{\infty}xp(x)dx$$
As $f(x)=0 \ \forall \ x < 0$, the integral can start at $0$.
$$E[X]=\int_{0}^{\infty}x \lambda e^{-\lambda x}dx=$$$$=-xe^{-\lambda x} \bigg|_{0}^{\infty}-\int_{0}^{\infty}-e^{-\lambda x}dx=$$$$=(-0+0)-\frac{1}{\lambda}e^{-\lambda x} \bigg|_{0}^{\infty}=\frac{1}{\lambda}$$
* **Median:** The median $m$ is the value that a random variable has an equal chance of being above or below. Therefore, the following must hold true: $P(X<m)=0.5$
$$P(X<m)=\int_{0}^{m}\lambda e^{-\lambda x}dx=1-e^{-\lambda m}=0.5$$
To obtain the median, we solve the above equation for $m$:
$$m=\frac{-\ln(0.5)}{\lambda}=\frac{\ln 2}{\lambda}$$
* **Mode:** The mode is the value that appears most often in a distribution. For a continuous distribution, it is the point at which the probability density function attains its maximum value.
To find the maximum, we calculate the derivative of the density and set it to $0$.
$$\frac{d}{dx}\lambda e^{-\lambda x}=-\lambda^2 e^{-\lambda x}=0$$
Unfortunately, an exponential function never becomes $0$. Therefore, a different kind of reasoning is needed: the exponential density is non-zero for all $x \geq 0$ and decreases monotonically as $x$ increases, so the maximum value occurs at $x=0$. Therefore, the mode of the exponential distribution is $0$.
* **Variance:** The variance of a continuous random variable is defined as follows:
$$Var(X)=\int_{\mathbb{R}}(x-\mu)^2p(x)dx$$
where $\mu=E[X]$ is the expected value as defined above.
Alternatively, the variance can be described as:
$$Var(X)=E[X^2]-E[X]^2$$
As we already know $E[X]$, we only need to calculate $E[X^2]$.
Using integration by parts and making use of the expected value already calculated, we get:
$$E[X^2]=\int_{0}^{\infty}\lambda x^2e^{-\lambda x}dx=$$$$=\left[-x^2e^{-\lambda x}\right]\bigg|_{0}^{\infty}+\int_{0}^{\infty}2xe^{-\lambda x}dx=$$$$=0+\frac{2}{\lambda}E[X]=\frac{2}{\lambda^2}$$
With that, the variance adds up to be:
$$Var(X)=E[X^2]-E[X]^2=\frac{2}{\lambda^2}-\left(\frac{1}{\lambda} \right)^2=\frac{1}{\lambda^2}$$
For further details on the variance, check [the Wikipedia article on the variance](https://en.wikipedia.org/wiki/Variance).
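A quick numerical check of these results in Python (using NumPy), sampling from an exponential distribution with an arbitrarily chosen rate $\lambda = 2$:
```python
import numpy as np

lam = 2.0                                               # rate parameter (arbitrary choice)
rng = np.random.default_rng(0)
samples = rng.exponential(scale=1/lam, size=1_000_000)  # NumPy uses scale = 1/lambda

print(samples.mean())      # ~1/lambda     = 0.5
print(np.median(samples))  # ~ln(2)/lambda ~ 0.3466
print(samples.var())       # ~1/lambda**2  = 0.25
```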