# VL Machine Learning: Unsupervised Techniques - Example problems
###### tags: `Exam`
**Given a probability distribution (or density function) and a function, compute the expected value of the function.**
**Example 1:** A random variable $Y$ has the following density function:
$$f(y) = \left\{\begin{array}{ll}\frac{3y^2(4-y)}{64} & \textrm{for } 0 \leq y \leq 4\\
0 & \textrm{elsewhere} \end{array}\right.$$
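The function whose expected value should be computed is not specified in these notes; as an illustration, take $g(Y)=Y$, i.e. compute the mean of $Y$:
$$E[Y]=\int_{0}^{4}y\,\frac{3y^2(4-y)}{64}\,dy=\frac{1}{64}\int_{0}^{4}\left(12y^3-3y^4\right)dy=\frac{1}{64}\left[3y^4-\frac{3y^5}{5}\right]_{0}^{4}=\frac{768-614.4}{64}=2.4$$
For any other function $g$, replace $y$ by $g(y)$ in the integrand: $E[g(Y)]=\int_{0}^{4}g(y)f(y)\,dy$.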
---
**Given two probability distributions, compute their KL divergence.**
**Example 1:** Two **discrete** probability distributions P(x) and Q(x)
| x | 0 | 1 | 2 |
|:--------:|:--------:|:---------:|:--------:|
| $$P(x)$$ | $$9/25$$ | $$12/25$$ | $$4/25$$ |
| $$Q(x)$$ | $$1/3$$ | $$1/3$$ | $$1/3$$ |
$$D_{KL}(P||Q) = \sum P(x)\ln\left(\frac{P(x)}{Q(x)}\right) = \frac{9}{25}\ln\left(\frac{9/25}{1/3}\right)+\frac{12}{25}\ln\left(\frac{12/25}{1/3}\right)+\frac{4}{25}\ln\left(\frac{4/25}{1/3}\right) = 0.0853$$
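A minimal sketch in Python (using NumPy) to verify the result; it assumes both distributions assign non-zero probability to every outcome:
```python
import numpy as np

def kl_divergence(p, q):
    """KL divergence D_KL(P||Q) of two discrete distributions, in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

P = [9/25, 12/25, 4/25]
Q = [1/3, 1/3, 1/3]
print(kl_divergence(P, Q))  # ~0.0853
print(kl_divergence(Q, P))  # different value: KL divergence is not symmetric
```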
---
**Given a probability distribution, compute its entropy.**
Given a discrete random variable $X$, with possible outcomes $x_{1}, ..., x_{n}$, which occur with probability ${P} (x_{1}),..., {P} (x_{n})$, the entropy of $X$ is formally defined as:
$$H(X)=-\sum_{i=1}^{n}P(x_i)\log_b(P(x_i))$$
Base $b=2$ gives the unit of bits (or "shannons"), while base $e$ gives the "natural unit" nat, and base $10$ gives a unit called "dit", "ban", or "hartley".
**Example 1:** Consider a fair die with 6 sides. When rolling the die, each side has a probability of $\frac{1}{6}$ of coming up. The entropy of the probability distribution (here in base $2$) therefore is:
$$H(X)=- \sum_{i=1}^{6}\frac{1}{6}\log_{2}\frac{1}{6}=- \sum_{i=1}^{6}\frac{1}{6}\,(-2.585)=\log_{2}6 \approx 2.585 \textrm{ bits}$$
In base $e$ the same distribution gives $\ln 6 \approx 1.792$ nats, and in base $10$ it gives $\log_{10}6 \approx 0.778$ hartleys; changing the base only rescales the entropy by a constant factor.
**Example 2:** Now consider an unfair die, where one side comes up with a probability of $0.5$ and the other 5 sides with a probability of $0.1$ each.
$$H(X)=- \left(0.5 \log_{2}(0.5) + \sum_{i=1}^{5}0.1\log_{2}(0.1) \right)=$$$$=- \left[ 0.5 \cdot (-1) + 5 \cdot 0.1 \cdot (-3.3219)\right] \approx 2.161 \textrm{ bits}$$
As expected, this is lower than the entropy of the fair die, since the outcome of the unfair die is less uncertain.
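The same computations as a short Python sketch (using NumPy):
```python
import numpy as np

def entropy(p, base=2):
    """Shannon entropy of a discrete distribution given as a probability vector."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p)) / np.log(base)

print(entropy([1/6] * 6))             # fair die:   ~2.585 bits
print(entropy([0.5] + [0.1] * 5))     # unfair die: ~2.161 bits
print(entropy([1/6] * 6, base=np.e))  # fair die:   ~1.792 nats
```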
---
**Given two vectors, compute their outer product.**
**Example 1:** Two vectors **u** and **v**
$$\bf{u} = \left[\begin{array}{c}
1 \\
2 \\
3 \\
4 \\
\end{array}\right]$$
$$\bf{v} = \left[\begin{array}{c}
5 \\
4 \\
3 \\
2 \\
1 \\
\end{array}\right]$$
$$\bf{u} \otimes \bf{v} = \bf{u} \bf{v}^{\textrm{T}} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \\ \end{bmatrix} \begin{bmatrix} 5 & 4 & 3 & 2 & 1 \\ \end{bmatrix} =
\left[\begin{array}{ccccc}
5 & 4 & 3 & 2 & 1 \\
10 & 8 & 6 & 4 & 2 \\
15 & 12 & 9 & 6 & 3 \\
20 & 16 & 12 & 8 & 4 \\
\end{array}\right]$$
**Example 2:** Two vectors **u** and **v**
$$\bf{u} = \left[\begin{array}{c}
u_1 \\
u_2 \\
u_3 \\
\end{array}\right]$$
$$\bf{v} = \left[\begin{array}{c}
v_1 \\
v_2 \\
v_3 \\
v_4 \\
\end{array}\right]$$
$$\bf{u} \otimes \bf{v} = \bf{u} \bf{v}^{\textrm{T}} = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix} \begin{bmatrix} v_1 & v_2 & v_3 & v_4 \\ \end{bmatrix} =
\left[\begin{array}{cccc}
u_1v_1 & u_1v_2 & u_1v_3 & u_1v_4 \\
u_2v_1 & u_2v_2 & u_2v_3 & u_2v_4 \\
u_3v_1 & u_3v_2 & u_3v_3 & u_3v_4 \\
\end{array}\right]$$
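In Python (using NumPy), the outer product is available directly, as this brief sketch shows:
```python
import numpy as np

u = np.array([1, 2, 3, 4])
v = np.array([5, 4, 3, 2, 1])

# np.outer computes u v^T; the result has shape (len(u), len(v))
print(np.outer(u, v))

# equivalently, via broadcasting a column vector against a row vector
print(u[:, None] * v[None, :])
```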
---
**Given a factor loading matrix U, signal vector y and noise vector epsilon, compute the observation vector x.**
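Assuming the standard linear generative model of factor analysis, the observation is the linear mixture of the signals plus additive noise:
$$\bf{x} = \bf{U}\bf{y} + \bf{\epsilon}$$
A minimal numeric sketch (all values made up for illustration):
```python
import numpy as np

U = np.array([[1.0, 0.5],
              [0.2, 1.0],
              [0.3, 0.3]])         # factor loading matrix (3 observed dims, 2 factors)
y = np.array([2.0, -1.0])          # signal (factor) vector
eps = np.array([0.1, -0.2, 0.05])  # noise vector

x = U @ y + eps                    # observation vector
print(x)                           # [1.6, -0.8, 0.35]
```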

---
**Given data matrix X and least squares solution matrix A, compute the projection matrix (of the data onto factors) Y.**
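Assuming the factor model $X \approx UY$ with loading matrix $U$, the least squares solution matrix is the (Moore-Penrose) pseudoinverse $A = U^{+} = (U^{\textrm{T}}U)^{-1}U^{\textrm{T}}$, and the projection of the data onto the factors is then simply
$$Y = AX$$
A short sketch under that assumption (random made-up data, noise-free for clarity):
```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(size=(5, 2))         # loading matrix: 5 observed dims, 2 factors
Y_true = rng.normal(size=(2, 100))  # true factors for 100 samples
X = U @ Y_true                      # noise-free data matrix

A = np.linalg.pinv(U)               # least squares solution matrix
Y = A @ X                           # projection of the data onto the factors
print(np.allclose(Y, Y_true))       # True in this noise-free setting
```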

---
**Compute the posterior distribution via Bayes’ theorem (from conditional and marginals).**
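For a hypothesis $\theta$ and observed data $x$, Bayes' theorem gives the posterior in terms of the conditional (likelihood), the prior, and the marginal:
$$P(\theta|x)=\frac{P(x|\theta)P(\theta)}{P(x)}, \qquad P(x)=\sum_{\theta'}P(x|\theta')P(\theta')$$
**Example (illustrative numbers):** A test for a disease has prior $P(D)=0.01$, sensitivity $P(+|D)=0.99$, and false-positive rate $P(+|\neg D)=0.05$. Then:
$$P(D|+)=\frac{0.99 \cdot 0.01}{0.99 \cdot 0.01 + 0.05 \cdot 0.99}=\frac{0.0099}{0.0594}\approx 0.167$$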
---
**Given probabilities (or a distribution) for observing data, compute the likelihood or log-likelihood.**
{%pdf https://www.medicine.mcgill.ca/epidemiology/hanley/bios601/Likelihood/Likelihood.pdf %}
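In short: for i.i.d. samples $x_1, \dots, x_n$ from a model with parameter $\theta$, the likelihood and log-likelihood are
$$L(\theta)=\prod_{i=1}^{n}p(x_i|\theta), \qquad \ell(\theta)=\ln L(\theta)=\sum_{i=1}^{n}\ln p(x_i|\theta)$$
**Example (illustrative):** Observing $7$ heads in $10$ independent coin flips with head probability $p$ gives (dropping the constant binomial coefficient, which does not affect the maximum):
$$L(p)=p^{7}(1-p)^{3}, \qquad \ell(p)=7\ln p+3\ln(1-p)$$
Setting $\ell'(p)=\frac{7}{p}-\frac{3}{1-p}=0$ yields the maximum likelihood estimate $\hat{p}=0.7$.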
---
**Given a distribution (or a set of data samples), compute mean, median, mode, and variance.**
**Example 1:** Consider an exponential random variable (exponential distribution). Its density is defined as follows:
$$f(x) = \left\{\begin{array}{ll} \lambda e^{-\lambda x} & \textrm{for } x \geq 0\\
0 & \textrm{elsewhere} \end{array}\right.$$
* **Mean:** The mean of a distribution is its expected value. It is defined as:
$$E[X]=\int_{-\infty}^{\infty}xp(x)dx$$
As $f(x)=0 \ \forall \ x < 0$, the integral can start at $0$.
$$E[X]=\int_{0}^{\infty}x \lambda e^{-\lambda x}dx=$$$$=-xe^{-\lambda x} \bigg|_{0}^{\infty}-\int_{0}^{\infty}-e^{-\lambda x}dx=$$$$=(-0+0)-\frac{1}{\lambda}e^{-\lambda x} \bigg|_{0}^{\infty}=\frac{1}{\lambda}$$
* **Median:** The median $m$ is the value that a random variable has an equal chance of being above or below. Therefore, the following must hold true: $P(X<m)=0.5$
$$P(X<m)=\int_{0}^{m}\lambda e^{-\lambda x}dx=1-e^{-\lambda m}=0.5$$
To obtain the median, we solve the above equation for $m$:
$$m=\frac{-\ln(0.5)}{\lambda}=\frac{\ln 2}{\lambda}$$
* **Mode:** The mode is the value that appears most often in a distribution. For a continuous distribution, it is the point at which the probability density function attains its maximum value.
To find the maximum, we calculate the derivative of the density and set it to $0$.
$$\frac{d}{dx}\lambda e^{-\lambda x}=-\lambda^2 e^{-\lambda x}=0$$
Unfortunately, an exponential function never becomes $0$. Therefore, a different kind of reasoning is needed: the exponential density is non-zero for all $x \geq 0$ and decreases monotonically as $x$ increases, so the maximum value occurs at $x=0$. Therefore, the mode of the exponential distribution is $0$.
* **Variance:** The variance of a continuous random variable is defined as follows:
$$Var(X)=\int_{\mathbb{R}}(x-\mu)^2p(x)dx$$
where $\mu=E[X]$ is the expected value as defined above.
Alternatively, the variance can be described as:
$$Var(X)=E[X^2]-E[X]^2$$
As we already know $E[X]$, we only need to calculate $E[X^2]$.
Using integration by parts and making use of the expected value already calculated, we get:
$$E[X^2]=\int_{0}^{\infty}\lambda x^2e^{-\lambda x}dx=$$$$=\left[-x^2e^{-\lambda x}\right]\bigg|_{0}^{\infty}+\int_{0}^{\infty}2xe^{-\lambda x}dx=$$$$=0+\frac{2}{\lambda}E[X]=\frac{2}{\lambda^2}$$
With that, the variance adds up to be:
$$Var(X)=E[X^2]-E[X]^2=\frac{2}{\lambda^2}-\left(\frac{1}{\lambda} \right)^2=\frac{1}{\lambda^2}$$
For further details on the variance, check [the Wikipedia article on the variance](https://en.wikipedia.org/wiki/Variance).
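A quick numerical check of these results in Python (using NumPy), sampling from an exponential distribution with an arbitrarily chosen rate $\lambda = 2$:
```python
import numpy as np

lam = 2.0                                               # rate parameter (arbitrary choice)
rng = np.random.default_rng(0)
samples = rng.exponential(scale=1/lam, size=1_000_000)  # NumPy uses scale = 1/lambda

print(samples.mean())      # ~1/lambda     = 0.5
print(np.median(samples))  # ~ln(2)/lambda ~ 0.3466
print(samples.var())       # ~1/lambda**2  = 0.25
```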