# MLFALL 2019 HW4 Sol
### 1. Principal Component Analysis (1%)
Given $10$ samples in 3D:
$(1,2,3)$, $(4,8,5)$, $(3,12,9)$, $(1,8,5)$, $(5,14,2)$, $(7,4,1)$, $(9,8,9)$, $(3,8,1)$, $(11,5,6)$, $(10,11,7)$
#### (a) What are the principal axes?
#### Sol:
$\mu = \begin{bmatrix}5.4 & 8 & 4.8 \end{bmatrix}$
$\Sigma = \begin{bmatrix}12.04 & 0.5 & 3.28 \\ 0.5 & 12.2 & 2.9 \\ 3.28 & 2.9 & 8.16\end{bmatrix} = Q\Lambda Q^T$
$Q = \begin{bmatrix}-0.6165 & -0.6781 & 0.3998 \\ -0.5888 & 0.7343 & 0.3375 \\ -0.5225 & -0.0272 & -0.8521\end{bmatrix}\quad \Lambda = \begin{bmatrix}15.2974 & 0 & 0 \\ 0 & 11.6305 & 0 \\ 0 & 0 & 5.472\end{bmatrix}$
1st = $\begin{bmatrix}-0.6165 & -0.5888 & -0.5225\end{bmatrix}^T$
2nd = $\begin{bmatrix}-0.6781 & 0.7343 & -0.0272\end{bmatrix}^T$
3rd = $\begin{bmatrix}0.3998 & 0.3375 & -0.8521\end{bmatrix}^T$
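The numbers above can be reproduced numerically. The following is a minimal NumPy sketch (a verification aid, not part of the original solution); the variable names `X`, `mu`, `Q`, `Lambda` are chosen here for illustration, and `np.linalg.eigh` may return eigenvectors with flipped signs relative to the table above.

```python
import numpy as np

# The ten 3-D samples from the problem statement.
X = np.array([
    [1, 2, 3], [4, 8, 5], [3, 12, 9], [1, 8, 5], [5, 14, 2],
    [7, 4, 1], [9, 8, 9], [3, 8, 1], [11, 5, 6], [10, 11, 7],
], dtype=float)

mu = X.mean(axis=0)                      # [5.4, 8.0, 4.8]
Sigma = (X - mu).T @ (X - mu) / len(X)   # population covariance (divide by n)

# Eigendecomposition of the symmetric covariance matrix.
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]        # sort eigenvalues in descending order
Lambda, Q = eigvals[order], eigvecs[:, order]

print(Lambda)   # approximately [15.2974, 11.6305, 5.472]
print(Q)        # columns are the principal axes (up to sign)
```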
#### (b) Please compute the principal components for each sample.
$Principal\ Component = \begin{bmatrix}-0.6165 & -0.6781 & 0.3998 \\ -0.5888 & 0.7343 & 0.3375 \\ -0.5225 & -0.0272 & -0.8521\end{bmatrix}^T(x_i-\mu)$
$1.\ \begin{bmatrix}7.1865 & 1.3732 & 2.2510\end{bmatrix}$
$2.\ \begin{bmatrix}0.7587 & -0.9439 & 0.7302\end{bmatrix}$
$3.\ \begin{bmatrix}-3.0703& -4.4505 & 3.1883\end{bmatrix}$
$4.\ \begin{bmatrix}2.6084 & -2.9785 & 1.9297\end{bmatrix}$
$5.\ \begin{bmatrix}-1.8229 & -4.7540 & -4.2515\end{bmatrix}$
$6.\ \begin{bmatrix}3.3545 & 3.9189 & -2.5275\end{bmatrix}$
$7.\ \begin{bmatrix}-4.4146 & 2.5560 & 2.1395\end{bmatrix}$
$8.\ \begin{bmatrix}3.4656 & -1.7313 & -2.2784\end{bmatrix}$
$9.\ \begin{bmatrix}-2.3135 & 6.0337 & -0.2038\end{bmatrix}$
$10.\ \begin{bmatrix}-5.7524 & 0.9764 & -0.9773\end{bmatrix}$
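Reusing `X`, `mu`, and `Q` from the sketch under (a), the ten component vectors can be reproduced in one line (again only a check; individual columns may come out with flipped signs):

```python
# One row of principal components per sample: z_i = Q^T (x_i - mu).
Z = (X - mu) @ Q
print(np.round(Z, 4))   # matches the ten vectors listed above, up to sign
```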
#### (c) What is the average reconstruction error if we reduce the dimension to 2D? Here the reconstruction error is defined as the squared loss.
Take the principal components from part (b) and zero out the third dimension (i.e., set the third coordinate to 0).
$W = \begin{bmatrix}-0.6165 & -0.6781 & 0.3998 \\ -0.5888 & 0.7343 & 0.3375 \\ -0.5225 & -0.0272 & -0.8521\end{bmatrix}^T$
$x_1' = [7.1865, 1.3732, 0]$
$Reconstruction\ r_i = x_i'W + \mu$
$$[ 1.90009072\quad 2.75992709\quad 1.08178971]\\
[ 4.29198496\quad 8.24651657\quad 4.37774211]\\
[ 4.27485905\quad 13.0763358\quad 6.28310968]\\
[ 1.77163801\quad 8.65147726\quad 3.35553912]\\
[ 3.29997625\quad 12.5647067\quad 5.62297154]\\
[ 5.98934216\quad 3.14672348\quad 3.15384320]\\
[ 9.85550052\quad 8.72228056\quad 7.17681721]\\
[ 2.08893199\quad 7.23080501\quad 2.94160433]\\
[10.9184895\quad 4.93118246\quad 6.17370944]\\
[ 9.60918683\quad 10.6700449\quad 7.83287366]$$
$average\ error = \frac{1}{10} \sum_{i=1}^{10} \|x_i - r_i\|^2$
$e = \begin{bmatrix}5.0671832 & 0.53323052 & 10.16525752 & 3.72409942 & 18.07607017 & 6.38855061 & 4.57756584 & 5.19153323 & 0.04155478 & 0.95528383\end{bmatrix}$
$average\ error = 5.472$
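Continuing the same sketch (reusing `X`, `mu`, `Q`, `Z` from above), a short check that the average squared reconstruction error after dropping the third component equals the discarded eigenvalue $5.472$:

```python
# Keep only the first two principal components and map back to 3-D.
Z2 = Z.copy()
Z2[:, 2] = 0                     # discard the third (smallest-variance) component
R = Z2 @ Q.T + mu                # reconstructed samples r_i
errors = np.sum((X - R) ** 2, axis=1)
print(errors.mean())             # approximately 5.472 = discarded eigenvalue
```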
### 2. Constrained Mahalanobis Distance Minimization Problem (1%)
#### (a) Let $A \in R^{m \times n}$, show that $AA^T$ and $A^TA$ are both symmetric and positive semi-definite, and share the same non-zero eigenvalues.
#### Sol:
Symmetric:
$1.\ (AA^T)^T = (A^T)^TA^T = AA^T \quad 2.\ (A^TA)^T = A^T(A^T)^T = A^TA$
Positive Semi-definite:
$1.\ \forall v \in R^m:\ v^T(AA^T)v = (A^Tv)^T(A^Tv) = \|A^Tv\|^2 \geq 0 \\ 2.\ \forall v \in R^n:\ v^T(A^TA)v = (Av)^T(Av) = \|Av\|^2 \geq 0$
Same non-zero eigenvalues:
Let $v$ be a nonzero eigenvector of $A^TA$ with eigenvalue $\lambda\neq0.$ This means $(A^TA)v = \lambda v$
Multiplying both sides on the left by $A$ yields $(AA^T)(Av) = \lambda(Av)$.
Since $A^TAv = \lambda v \neq 0$, we have $Av \neq 0$, so $Av$ is an eigenvector of $AA^T$ with eigenvalue $\lambda$. The same argument with the roles of $A$ and $A^T$ exchanged shows that every non-zero eigenvalue of $AA^T$ is also an eigenvalue of $A^TA$.
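The three claims can be sanity-checked numerically. The sketch below uses an arbitrary random $3 \times 5$ matrix purely as a hypothetical example; it is an illustration, not part of the proof:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))          # hypothetical A with m = 3, n = 5

AAt, AtA = A @ A.T, A.T @ A
print(np.allclose(AAt, AAt.T), np.allclose(AtA, AtA.T))   # both symmetric

# Eigenvalues of a symmetric PSD matrix are real and non-negative,
# and the two products share the same non-zero eigenvalues.
ev1 = np.sort(np.linalg.eigvalsh(AAt))[::-1]   # 3 eigenvalues
ev2 = np.sort(np.linalg.eigvalsh(AtA))[::-1]   # 5 eigenvalues
print(ev1)        # all >= 0
print(ev2)        # first 3 match ev1; the remaining 2 are ~0
```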
#### (b) Let $\Sigma \in R^{m \times m}$ be a symmetric positive semi-definite matrix, $\mu \in R^m$. Please construct a set of points $x_1,...,x_n \in R^m$ such that
$$\frac{1}{n}\sum_{i=1}^n (x_i - \mu) (x_i - \mu)^T = \Sigma, ~~ \frac{1}{n}\sum_{i=1}^n x_i = \mu$$
#### Sol:
$WLOG$, let $\mu = 0$ (for a general $\mu$, simply add $\mu$ to every point constructed below). Since $\Sigma$ is a symmetric positive semi-definite matrix,
$$\Sigma = UDU^T = \sum_{i=1}^{m} d_iu_iu_i^T, \quad d_i \geq 0$$
Let $n = 2m$ and construct the set of points $x_1, ..., x_m, ..., x_{2m}$ as
$x_i = \sqrt{m\,d_i}\,u_i \ \forall\ 1\leq i \leq m \quad x_{m+i} = -\sqrt{m\,d_i}\,u_i \ \forall\ 1\leq i \leq m$. Then,
$$\frac{1}{n}\sum_{i=1}^{n}x_i = 0 = \mu \\ \frac{1}{n}\sum_{i=1}^{n}x_ix_i^T = \frac{1}{2m}\cdot 2\sum_{i=1}^{m} m\,d_iu_iu_i^T = \sum_{i=1}^{m}d_iu_iu_i^T = UDU^T = \Sigma$$
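A small Python sketch of this construction (with the $\sqrt{m\,d_i}$ scaling) applied to a hypothetical $\Sigma$ and $\mu$; the function name `points_matching` and the example matrices are assumptions for illustration only:

```python
import numpy as np

def points_matching(Sigma, mu):
    """Return n = 2m points whose sample mean is mu and whose
    covariance (divide-by-n convention) is Sigma, following the
    eigendecomposition construction above."""
    m = len(mu)
    d, U = np.linalg.eigh(Sigma)        # Sigma = U diag(d) U^T with d >= 0
    offsets = np.sqrt(m * d) * U        # column i is sqrt(m * d_i) * u_i
    return np.vstack([mu + offsets.T, mu - offsets.T])

# Hypothetical example with m = 3.
Sigma = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.5]])
mu = np.array([1.0, -2.0, 0.5])
X = points_matching(Sigma, mu)
print(np.allclose(X.mean(axis=0), mu))                       # True
print(np.allclose((X - mu).T @ (X - mu) / len(X), Sigma))    # True
```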
#### (c) Let $1 \leq k \leq m$, solve the following optimization problem (and justify with proof):
$$minimize \quad Trace(\Phi^T \Sigma \Phi) \\ subject \ to \quad \Phi^T \Phi = I_k \\ variables \quad \Phi \in R^{m \times k}$$
#### Sol:
$$Trace(\Phi^T\Sigma\Phi) = Trace(\Sigma\Phi\Phi^T), \qquad \frac{\partial\ Trace(\Sigma\Phi\Phi^T)}{\partial \Phi} = 2\Sigma\Phi$$
Introduce a symmetric matrix of Lagrange multipliers $\Lambda_k \in R^{k \times k}$ for the constraint $\Phi^T\Phi = I_k$ and form the Lagrangian
$$\mathcal{L}(\Phi, \Lambda_k) = Trace(\Sigma\Phi\Phi^T) - Trace\big(\Lambda_k(\Phi^T\Phi - I_k)\big)$$
Setting the derivative with respect to $\Phi$ to zero,
$$\frac{\partial \mathcal{L}}{\partial \Phi} = 2\Sigma\Phi - 2\Phi\Lambda_k = 0 \quad\Longrightarrow\quad \Sigma \Phi = \Phi \Lambda_k$$
so at a stationary point the columns of $\Phi$ span an invariant subspace of $\Sigma$; we may take each column to be an eigenvector of $\Sigma$, and the objective then equals the sum of the corresponding eigenvalues. To minimize it, choose the columns of $\Phi$ to be the eigenvectors associated with the $k$ smallest eigenvalues:
$$\Phi = [u_{m-k+1}, ..., u_m]$$
Where $U = [u_1, ..., u_m]$ is the orthogonal matrix of eigenvectors (of $\Sigma$), $\Lambda = diag(\lambda_1, ..., \lambda_m)$ is the diagonal matrix of the associated eigenvalues arranged in non-ascending order $\lambda_1 \geq \lambda_2 \geq ... \geq \lambda_m \geq 0$. The minimum value is $Trace(\Phi^T\Sigma\Phi) = \lambda_{m-k+1} + ... + \lambda_m$.
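The result can be spot-checked numerically: for a random symmetric PSD $\Sigma$, the eigenvector choice attains $\lambda_{m-k+1} + ... + \lambda_m$, and no other feasible $\Phi$ does better. The sketch below is only an illustration; the random $\Sigma$ and the choice $m = 5$, $k = 2$ are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 5, 2
B = rng.standard_normal((m, m))
Sigma = B @ B.T                         # hypothetical symmetric PSD matrix

lam, U = np.linalg.eigh(Sigma)          # eigenvalues in ascending order
Phi_star = U[:, :k]                     # eigenvectors of the k smallest eigenvalues
best = np.trace(Phi_star.T @ Sigma @ Phi_star)
print(np.isclose(best, lam[:k].sum()))  # optimum = sum of the k smallest eigenvalues

# Random feasible Phi (orthonormal columns via QR) never beats the optimum.
for _ in range(1000):
    Phi, _ = np.linalg.qr(rng.standard_normal((m, k)))
    assert np.trace(Phi.T @ Sigma @ Phi) >= best - 1e-9
```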
### 3. Multiclass AdaBoost (1%)
Let $X$ be the input space, $F$ be a collection of multiclass classifiers that map from $X$ to $[1,K]$, where $K$ denotes the number of classes. Let $\{(x_i,{\hat y}_i)\}_{i=1}^n$ be the training data set, where $x_i \in R^m$ and ${\hat y}_i \in [1,K]$. Given $T \in N$, suppose we want to find functions
$$g_T^k(x) = \sum_{t=1}^T \alpha_t^k f_t(x), ~~ k \in [ 1,K]$$
where $f_t \in F$ and $\alpha_t^k \in R$ for all $t \in [ 1,T]$, $k \in [1,K]$, by which the aggregated classifier $h: X \rightarrow [1,K ]$ is defined as
$$h(x) = \underset{1 \leq k \leq K}{\mbox{argmax}} ~ g_T^k(x)
$$
Please apply gradient boosting to show how the functions $f_t$ and coefficients $\alpha_t^k$ are computed with an aim to minimize the following loss function
$$L(g_T^1,...,g_T^K) = \sum_{i=1}^n \exp\left(\frac{1}{K-1}\sum_{k \neq {\hat y}_i} g_T^{k}(x_i) - g_T^{{\hat y}_i}(x_i) \right)
$$
#### Sol:
$Let\ g^k_0(x) = 0,\ \forall k = 1,...,K$
For $t = 1$ to $T$:
$\qquad$Write the loss after adding a candidate weak learner $f_t$ with step sizes $\alpha_t^1,...,\alpha_t^K$:
$$L = \sum_{i=1}^n \exp\left(\frac{1}{K-1}\sum_{k\neq \hat y_i}\big(g_{t-1}^k(x_i) + \alpha_t^kf_t(x_i)\big) - \big(g_{t-1}^{\hat y_i}(x_i) + \alpha_t^{\hat y_i}f_t(x_i)\big)\right)$$
$\qquad$Choose $f_t \in F$ as the steepest-descent direction in function space, i.e. the classifier for which $\left.\frac{\partial L}{\partial \alpha_t^k}\right|_{\alpha_t=0}$ is most negative (the one permitting the largest decrease of $L$).
$\qquad$For $k = 1$ to $K$: differentiating $L$ with respect to $\alpha_t^k$, sample $i$ contributes with coefficient $\frac{1}{K-1}$ if $\hat y_i \neq k$ and $-1$ if $\hat y_i = k$:
$$\frac{\partial L}{\partial \alpha_t^k} = \sum_{i=1}^n c_i^k\, f_t(x_i)\exp\left(\frac{1}{K-1}\sum_{k'\neq \hat y_i}\big(g_{t-1}^{k'}(x_i) + \alpha_t^{k'}f_t(x_i)\big) - \big(g_{t-1}^{\hat y_i}(x_i) + \alpha_t^{\hat y_i}f_t(x_i)\big)\right),\quad c_i^k = \begin{cases}\frac{1}{K-1}, & \hat y_i \neq k \\ -1, & \hat y_i = k\end{cases}$$
$\qquad$Find $\alpha_t^1,...,\alpha_t^K$ such that $\frac{\partial L}{\partial \alpha_t^k} = 0$ for all $k$, and update $g_t^k = g_{t-1}^k + \alpha_t^k f_t$.
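A rough Python sketch of one boosting round under this loss is given below. It illustrates only the inner $\alpha$-update: the weak learner's outputs `f_out` are taken as given (a full implementation would also search over candidate $f_t \in F$), labels are 0-indexed, and a numerical minimization via `scipy.optimize.minimize` stands in for solving $\partial L / \partial \alpha_t^k = 0$ in closed form. All names and the toy data are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def multiclass_exp_loss(g, y, K):
    """L = sum_i exp( (1/(K-1)) * sum_{k != y_i} g_i^k - g_i^{y_i} )."""
    idx = np.arange(len(y))
    others = g.sum(axis=1) - g[idx, y]          # sum over k != y_i
    return np.exp(others / (K - 1) - g[idx, y]).sum()

def boosting_step(g, y, f_out, K):
    """Given current scores g (n x K), 0-indexed labels y, and the weak
    learner's outputs f_out (n,), find alpha in R^K minimizing
    L(g^k + alpha^k * f_t) by numerical line search."""
    def objective(alpha):
        return multiclass_exp_loss(g + f_out[:, None] * alpha[None, :], y, K)
    return minimize(objective, np.zeros(K), method="Nelder-Mead").x

# Hypothetical toy run: n = 6 samples, K = 3 classes, g_0 = 0.
rng = np.random.default_rng(0)
n, K = 6, 3
y = rng.integers(0, K, size=n)                          # labels in {0, ..., K-1}
g = np.zeros((n, K))
f_out = rng.integers(1, K + 1, size=n).astype(float)    # weak-learner labels in [1, K]
alpha = boosting_step(g, y, f_out, K)
g = g + f_out[:, None] * alpha[None, :]                 # g_1^k = g_0^k + alpha_1^k f_1
print(alpha, multiclass_exp_loss(g, y, K))
```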