# MLFALL 2019 HW4 Sol
### 1. Principal Component Analysis (1%)
Given $10$ samples in 3D:
$(1,2,3)$, $(4,8,5)$, $(3,12,9)$, $(1,8,5)$, $(5,14,2)$, $(7,4,1)$, $(9,8,9)$, $(3,8,1)$, $(11,5,6)$, $(10,11,7)$
#### (a) What are the principal axes?
#### Sol:
$\mu = \begin{bmatrix}5.4 & 8 & 4.8 \end{bmatrix}$
$\Sigma = \begin{bmatrix}12.04 & 0.5 & 3.28 \\ 0.5 & 12.2 & 2.9 \\ 3.28 & 2.9 & 8.16\end{bmatrix} = Q\Lambda Q^T$
$Q = \begin{bmatrix}-0.6165 & -0.6781 & 0.3998 \\ -0.5888 & 0.7343 & 0.3375 \\ -0.5225 & -0.0272 & -0.8521\end{bmatrix}\quad \Lambda = \begin{bmatrix}15.2974 & 0 & 0 \\ 0 & 11.6305 & 0 \\ 0 & 0 & 5.472\end{bmatrix}$
1st = $\begin{bmatrix}-0.6165 & -0.5888 & -0.5225\end{bmatrix}^T$
2nd = $\begin{bmatrix}-0.6781 & 0.7343 & -0.0272\end{bmatrix}^T$
3rd = $\begin{bmatrix}0.3998 & 0.3375 & -0.8521\end{bmatrix}^T$
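The numbers above can be reproduced numerically. The following is a minimal NumPy sketch (a verification aid, not part of the original solution); the variable names `X`, `mu`, `Q`, `Lambda` are chosen here for illustration, and `np.linalg.eigh` may return eigenvectors with flipped signs relative to the table above.

```python
import numpy as np

# The ten 3-D samples from the problem statement.
X = np.array([
    [1, 2, 3], [4, 8, 5], [3, 12, 9], [1, 8, 5], [5, 14, 2],
    [7, 4, 1], [9, 8, 9], [3, 8, 1], [11, 5, 6], [10, 11, 7],
], dtype=float)

mu = X.mean(axis=0)                      # [5.4, 8.0, 4.8]
Sigma = (X - mu).T @ (X - mu) / len(X)   # population covariance (divide by n)

# Eigendecomposition of the symmetric covariance matrix.
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]        # sort eigenvalues in descending order
Lambda, Q = eigvals[order], eigvecs[:, order]

print(Lambda)   # approximately [15.2974, 11.6305, 5.472]
print(Q)        # columns are the principal axes (up to sign)
```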
#### (b) Please compute the principal components for each sample.
$Principal\ Component = \begin{bmatrix}-0.6165 & -0.6781 & 0.3998 \\ -0.5888 & 0.7343 & 0.3375 \\ -0.5225 & -0.0272 & -0.8521\end{bmatrix}^T(x_i-\mu)$
$1.\ \begin{bmatrix}7.1865 & 1.3732 & 2.2510\end{bmatrix}$
$2.\ \begin{bmatrix}0.7587 & -0.9439 & 0.7302\end{bmatrix}$
$3.\ \begin{bmatrix}-3.0703& -4.4505 & 3.1883\end{bmatrix}$
$4.\ \begin{bmatrix}2.6084 & -2.9785 & 1.9297\end{bmatrix}$
$5.\ \begin{bmatrix}-1.8229 & -4.7540 & -4.2515\end{bmatrix}$
$6.\ \begin{bmatrix}3.3545 & 3.9189 & -2.5275\end{bmatrix}$
$7.\ \begin{bmatrix}-4.4146 & 2.5560 & 2.1395\end{bmatrix}$
$8.\ \begin{bmatrix}3.4656 & -1.7313 & -2.2784\end{bmatrix}$
$9.\ \begin{bmatrix}-2.3135 & 6.0337 & -0.2038\end{bmatrix}$
$10.\ \begin{bmatrix}-5.7524 & 0.9764 & -0.9773\end{bmatrix}$
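Reusing `X`, `mu`, and `Q` from the sketch under (a), the ten component vectors can be reproduced in one line (again only a check; individual columns may come out with flipped signs):

```python
# One row of principal components per sample: z_i = Q^T (x_i - mu).
Z = (X - mu) @ Q
print(np.round(Z, 4))   # matches the ten vectors listed above, up to sign
```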
#### (c) What is the average reconstruction error if we reduce the dimension to 2D? Here the reconstruction error is defined as the squared loss.
Take the principal components from part (b) and zero out the third dimension (i.e., set the third coordinate to 0).
$W = \begin{bmatrix}-0.6165 & -0.6781 & 0.3998 \\ -0.5888 & 0.7343 & 0.3375 \\ -0.5225 & -0.0272 & -0.8521\end{bmatrix}^T$
$x_1' = [7.1865, 1.3732, 0]$
$Reconstruction\ r_i = x_i'W + \mu$
$$[ 1.90009072\quad 2.75992709\quad 1.08178971]\\
[ 4.29198496\quad 8.24651657\quad 4.37774211]\\
[ 4.27485905\quad 13.0763358\quad 6.28310968]\\
[ 1.77163801\quad 8.65147726\quad 3.35553912]\\
[ 3.29997625\quad 12.5647067\quad 5.62297154]\\
[ 5.98934216\quad 3.14672348\quad 3.15384320]\\
[ 9.85550052\quad 8.72228056\quad 7.17681721]\\
[ 2.08893199\quad 7.23080501\quad 2.94160433]\\
[10.9184895\quad 4.93118246\quad 6.17370944]\\
[ 9.60918683\quad 10.6700449\quad 7.83287366]$$
$average\ error = \frac{1}{10} \sum_{i=1}^{10} \|x_i - r_i\|^2$
$e = \begin{bmatrix}5.0671832 & 0.53323052 & 10.16525752 & 3.72409942 & 18.07607017 & 6.38855061 & 4.57756584 & 5.19153323 & 0.04155478 & 0.95528383\end{bmatrix}$
$average\ error = 5.472$
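Continuing the same sketch (reusing `X`, `mu`, `Q`, `Z` from above), a short check that the average squared reconstruction error after dropping the third component equals the discarded eigenvalue $5.472$:

```python
# Keep only the first two principal components and map back to 3-D.
Z2 = Z.copy()
Z2[:, 2] = 0                     # discard the third (smallest-variance) component
R = Z2 @ Q.T + mu                # reconstructed samples r_i
errors = np.sum((X - R) ** 2, axis=1)
print(errors.mean())             # approximately 5.472 = discarded eigenvalue
```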
### 2. Constrained Mahalanobis Distance Minimization Problem (1%)
#### (a) Let $A \in R^{m \times n}$, show that $AA^T$ and $A^TA$ are both symmetric and positive semi-definite, and share the same non-zero eigenvalues.
#### Sol:
Symmetric:
$1.\ (AA^T)^T = (A^T)^TA^T = AA^T \quad 2.\ (A^TA)^T = A^T(A^T)^T = A^TA$
Positive Semi-definite:
$1.\ \forall v \in R^m:\ v^T(AA^T)v = (A^Tv)^T(A^Tv) = \|A^Tv\|^2 \geq 0 \\ 2.\ \forall v \in R^n:\ v^T(A^TA)v = (Av)^T(Av) = \|Av\|^2 \geq 0$
Same non-zero eigenvalues:
Let $v$ be a nonzero eigenvector of $A^TA$ with eigenvalue $\lambda\neq0.$ This means $(A^TA)v = \lambda v$
Multiplying both sides on the left by $A$ yields $(AA^T)(Av) = \lambda(Av)$.
Since $A^TAv = \lambda v \neq 0$, we have $Av \neq 0$, so $Av$ is an eigenvector of $AA^T$ with eigenvalue $\lambda$. The same argument with the roles of $A$ and $A^T$ exchanged shows that every non-zero eigenvalue of $AA^T$ is also an eigenvalue of $A^TA$.
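The three claims can be sanity-checked numerically. The sketch below uses an arbitrary random $3 \times 5$ matrix purely as a hypothetical example; it is an illustration, not part of the proof:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))          # hypothetical A with m = 3, n = 5

AAt, AtA = A @ A.T, A.T @ A
print(np.allclose(AAt, AAt.T), np.allclose(AtA, AtA.T))   # both symmetric

# Eigenvalues of a symmetric PSD matrix are real and non-negative,
# and the two products share the same non-zero eigenvalues.
ev1 = np.sort(np.linalg.eigvalsh(AAt))[::-1]   # 3 eigenvalues
ev2 = np.sort(np.linalg.eigvalsh(AtA))[::-1]   # 5 eigenvalues
print(ev1)        # all >= 0
print(ev2)        # first 3 match ev1; the remaining 2 are ~0
```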
#### (b) Let $\Sigma \in R^{m \times m}$ be a symmetric positive semi-definite matrix, $\mu \in R^m$. Please construct a set of points $x_1,...,x_n \in R^m$ such that
$$\frac{1}{n}\sum_{i=1}^n (x_i - \mu) (x_i - \mu)^T = \Sigma, ~~ \frac{1}{n}\sum_{i=1}^n x_i = \mu$$
#### Sol:
$WLOG$, let $\mu = 0$ (for a general $\mu$, simply add $\mu$ to every point constructed below). Since $\Sigma$ is a symmetric positive semi-definite matrix,
$$\Sigma = UDU^T = \sum_{i=1}^{m} d_iu_iu_i^T, \quad d_i \geq 0$$
Let $n = 2m$ and construct the set of points $x_1, ..., x_m, ..., x_{2m}$ as
$x_i = \sqrt{m\,d_i}\,u_i \ \forall\ 1\leq i \leq m \quad x_{m+i} = -\sqrt{m\,d_i}\,u_i \ \forall\ 1\leq i \leq m$. Then,
$$\frac{1}{n}\sum_{i=1}^{n}x_i = 0 = \mu \\ \frac{1}{n}\sum_{i=1}^{n}x_ix_i^T = \frac{1}{2m}\cdot 2\sum_{i=1}^{m} m\,d_iu_iu_i^T = \sum_{i=1}^{m}d_iu_iu_i^T = UDU^T = \Sigma$$
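A small Python sketch of this construction (with the $\sqrt{m\,d_i}$ scaling) applied to a hypothetical $\Sigma$ and $\mu$; the function name `points_matching` and the example matrices are assumptions for illustration only:

```python
import numpy as np

def points_matching(Sigma, mu):
    """Return n = 2m points whose sample mean is mu and whose
    covariance (divide-by-n convention) is Sigma, following the
    eigendecomposition construction above."""
    m = len(mu)
    d, U = np.linalg.eigh(Sigma)        # Sigma = U diag(d) U^T with d >= 0
    offsets = np.sqrt(m * d) * U        # column i is sqrt(m * d_i) * u_i
    return np.vstack([mu + offsets.T, mu - offsets.T])

# Hypothetical example with m = 3.
Sigma = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.5]])
mu = np.array([1.0, -2.0, 0.5])
X = points_matching(Sigma, mu)
print(np.allclose(X.mean(axis=0), mu))                       # True
print(np.allclose((X - mu).T @ (X - mu) / len(X), Sigma))    # True
```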
#### (c) Let $1 \leq k \leq m$, solve the following optimization problem (and justify with proof):
$$minimize \quad Trace(\Phi^T \Sigma \Phi) \\ subject \ to \quad \Phi^T \Phi = I_k \\ variables \quad \Phi \in R^{m \times k}$$
#### Sol:
$$Trace(\Phi^T\Sigma\Phi) = Trace(\Sigma\Phi\Phi^T), \qquad \frac{\partial\ Trace(\Sigma\Phi\Phi^T)}{\partial \Phi} = 2\Sigma\Phi$$
Introduce a symmetric matrix of Lagrange multipliers $\Lambda_k \in R^{k \times k}$ for the constraint $\Phi^T\Phi = I_k$ and form the Lagrangian
$$\mathcal{L}(\Phi, \Lambda_k) = Trace(\Sigma\Phi\Phi^T) - Trace\big(\Lambda_k(\Phi^T\Phi - I_k)\big)$$
Setting the derivative with respect to $\Phi$ to zero,
$$\frac{\partial \mathcal{L}}{\partial \Phi} = 2\Sigma\Phi - 2\Phi\Lambda_k = 0 \quad\Longrightarrow\quad \Sigma \Phi = \Phi \Lambda_k$$
so at a stationary point the columns of $\Phi$ span an invariant subspace of $\Sigma$; we may take each column to be an eigenvector of $\Sigma$, and the objective then equals the sum of the corresponding eigenvalues. To minimize it, choose the columns of $\Phi$ to be the eigenvectors associated with the $k$ smallest eigenvalues:
$$\Phi = [u_{m-k+1}, ..., u_m]$$
Where $U = [u_1, ..., u_m]$ is the orthogonal matrix of eigenvectors (of $\Sigma$), $\Lambda = diag(\lambda_1, ..., \lambda_m)$ is the diagonal matrix of the associated eigenvalues arranged in non-ascending order $\lambda_1 \geq \lambda_2 \geq ... \geq \lambda_m \geq 0$. The minimum value is $Trace(\Phi^T\Sigma\Phi) = \lambda_{m-k+1} + ... + \lambda_m$.
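The result can be spot-checked numerically: for a random symmetric PSD $\Sigma$, the eigenvector choice attains $\lambda_{m-k+1} + ... + \lambda_m$, and no other feasible $\Phi$ does better. The sketch below is only an illustration; the random $\Sigma$ and the choice $m = 5$, $k = 2$ are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 5, 2
B = rng.standard_normal((m, m))
Sigma = B @ B.T                         # hypothetical symmetric PSD matrix

lam, U = np.linalg.eigh(Sigma)          # eigenvalues in ascending order
Phi_star = U[:, :k]                     # eigenvectors of the k smallest eigenvalues
best = np.trace(Phi_star.T @ Sigma @ Phi_star)
print(np.isclose(best, lam[:k].sum()))  # optimum = sum of the k smallest eigenvalues

# Random feasible Phi (orthonormal columns via QR) never beats the optimum.
for _ in range(1000):
    Phi, _ = np.linalg.qr(rng.standard_normal((m, k)))
    assert np.trace(Phi.T @ Sigma @ Phi) >= best - 1e-9
```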
### 3. Multiclass AdaBoost (1%)
Let $X$ be the input space, $F$ be a collection of multiclass classifiers that map from $X$ to $[1,K]$, where $K$ denotes the number of classes. Let $\{(x_i,{\hat y}_i)\}_{i=1}^n$ be the training data set, where $x_i \in R^m$ and ${\hat y}_i \in [1,K]$. Given $T \in N$, suppose we want to find functions
$$g_T^k(x) = \sum_{t=1}^T \alpha_t^k f_t(x), ~~ k \in [ 1,K]$$
where $f_t \in F$ and $\alpha_t^k \in R$ for all $t \in [ 1,T]$, $k \in [1,K]$, by which the aggregated classifier $h: X \rightarrow [1,K ]$ is defined as
$$h(x) = \underset{1 \leq k \leq K}{\mbox{argmax}} ~ g_T^k(x)
$$
Please apply gradient boosting to show how the functions $f_t$ and coefficients $\alpha_t^k$ are computed with an aim to minimize the following loss function
$$L(g_T^1,...,g_T^K) = \sum_{i=1}^n \exp\left(\frac{1}{K-1}\sum_{k \neq {\hat y}_i} g_T^{k}(x_i) - g_T^{{\hat y}_i}(x_i) \right)
$$
#### Sol:
$Let\ g^k_0(x) = 0,\ \forall k = 1,...,K$
For $t = 1$ to $T$:
$\qquad$Write the loss after adding a candidate weak learner $f_t$ with step sizes $\alpha_t^1,...,\alpha_t^K$:
$$L = \sum_{i=1}^n \exp\left(\frac{1}{K-1}\sum_{k\neq \hat y_i}\big(g_{t-1}^k(x_i) + \alpha_t^kf_t(x_i)\big) - \big(g_{t-1}^{\hat y_i}(x_i) + \alpha_t^{\hat y_i}f_t(x_i)\big)\right)$$
$\qquad$Choose $f_t \in F$ as the steepest-descent direction in function space, i.e. the classifier for which $\left.\frac{\partial L}{\partial \alpha_t^k}\right|_{\alpha_t=0}$ is most negative (the one permitting the largest decrease of $L$).
$\qquad$For $k = 1$ to $K$: differentiating $L$ with respect to $\alpha_t^k$, sample $i$ contributes with coefficient $\frac{1}{K-1}$ if $\hat y_i \neq k$ and $-1$ if $\hat y_i = k$:
$$\frac{\partial L}{\partial \alpha_t^k} = \sum_{i=1}^n c_i^k\, f_t(x_i)\exp\left(\frac{1}{K-1}\sum_{k'\neq \hat y_i}\big(g_{t-1}^{k'}(x_i) + \alpha_t^{k'}f_t(x_i)\big) - \big(g_{t-1}^{\hat y_i}(x_i) + \alpha_t^{\hat y_i}f_t(x_i)\big)\right),\quad c_i^k = \begin{cases}\frac{1}{K-1}, & \hat y_i \neq k \\ -1, & \hat y_i = k\end{cases}$$
$\qquad$Find $\alpha_t^1,...,\alpha_t^K$ such that $\frac{\partial L}{\partial \alpha_t^k} = 0$ for all $k$, and update $g_t^k = g_{t-1}^k + \alpha_t^k f_t$.
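A rough Python sketch of one boosting round under this loss is given below. It illustrates only the inner $\alpha$-update: the weak learner's outputs `f_out` are taken as given (a full implementation would also search over candidate $f_t \in F$), labels are 0-indexed, and a numerical minimization via `scipy.optimize.minimize` stands in for solving $\partial L / \partial \alpha_t^k = 0$ in closed form. All names and the toy data are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def multiclass_exp_loss(g, y, K):
    """L = sum_i exp( (1/(K-1)) * sum_{k != y_i} g_i^k - g_i^{y_i} )."""
    idx = np.arange(len(y))
    others = g.sum(axis=1) - g[idx, y]          # sum over k != y_i
    return np.exp(others / (K - 1) - g[idx, y]).sum()

def boosting_step(g, y, f_out, K):
    """Given current scores g (n x K), 0-indexed labels y, and the weak
    learner's outputs f_out (n,), find alpha in R^K minimizing
    L(g^k + alpha^k * f_t) by numerical line search."""
    def objective(alpha):
        return multiclass_exp_loss(g + f_out[:, None] * alpha[None, :], y, K)
    return minimize(objective, np.zeros(K), method="Nelder-Mead").x

# Hypothetical toy run: n = 6 samples, K = 3 classes, g_0 = 0.
rng = np.random.default_rng(0)
n, K = 6, 3
y = rng.integers(0, K, size=n)                          # labels in {0, ..., K-1}
g = np.zeros((n, K))
f_out = rng.integers(1, K + 1, size=n).astype(float)    # weak-learner labels in [1, K]
alpha = boosting_step(g, y, f_out, K)
g = g + f_out[:, None] * alpha[None, :]                 # g_1^k = g_0^k + alpha_1^k f_1
print(alpha, multiclass_exp_loss(g, y, K))
```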