---
title: 'ML2019FALL HW1'
disqus: hackmd
---
## HW1 - Handwritten Assignment
### 1. Closed-Form Linear Regression Solution
In the lecture, we've learnt how to solve the linear regression problem via gradient descent. Here you will derive the closed-form solution for such problems.
In the following questions, unless otherwise specified, we denote $S = \{({\bf x}_i, y_i)\}_{i=1}^N$ as a dataset of $N$ input-output pairs, where ${\bf x}_i \in {\mathbb R}^k$ denotes the vectorial input and $y_i \in {\mathbb R}$ denotes the corresponding scalar output.
#### 1-(a)
Let's begin with a specific dataset
$$S = \{(x_i, y_i)\}_{i=1}^5 = \{(1, 1.2), (2, 2.4), (3, 3.5), (4, 4.1), (5, 5.6)\}$$
Please find the linear regression model $({\bf w}, b) \in {\mathbb R} \times {\mathbb R}$ that minimizes the sum of squares loss
$$L_{ssq}({\bf w}, b) = \frac{1}{2 \times 5}\sum_{i=1}^5 (y_i- ({\bf w}^T {\bf x}_i+b))^2$$
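As a quick sanity check for 1-(a), the sketch below fits the five points numerically with NumPy; the closed-form minimizer you derive should match what `np.linalg.lstsq` returns (roughly $w \approx 1.05$, $b \approx 0.21$).
```python
import numpy as np

# The five (x, y) pairs from the problem statement.
x = np.array([1., 2., 3., 4., 5.])
y = np.array([1.2, 2.4, 3.5, 4.1, 5.6])

# Append a constant column so the bias b is fitted together with w.
X = np.column_stack([x, np.ones_like(x)])

# np.linalg.lstsq minimizes ||X theta - y||^2, which has the same
# minimizer as the (1 / (2*5))-scaled loss above.
w, b = np.linalg.lstsq(X, y, rcond=None)[0]
print(w, b)
```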
#### 1-(b)
Please find the linear regression model $({\bf w}, b) \in {\mathbb R}^k \times {\mathbb R}$ that minimizes the sum of squares loss
$$L_{ssq}({\bf w}, b) = \frac{1}{2N}\sum_{i=1}^N (y_i-({\bf w}^T{\bf x}_i+b))^2$$
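A minimal NumPy sketch of the general closed form, assuming the bias $b$ is absorbed into ${\bf w}$ via an appended constant feature; it is meant only for checking your derivation numerically.
```python
import numpy as np

def fit_linear(X, y):
    """Closed-form least-squares fit of f(x) = w^T x + b.

    X is the (N, k) matrix whose rows are the x_i, and y is the (N,)
    target vector.  The bias is absorbed by appending a constant column,
    so the normal equations are solved for the augmented weight vector.
    """
    N = X.shape[0]
    X_aug = np.hstack([X, np.ones((N, 1))])             # rows [x_i^T, 1]
    theta = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
    return theta[:-1], theta[-1]                        # (w, b)
```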
#### 1-\(c\)
A key motivation for regularization is to avoid overfitting. A common choice is to add an $L^2$-regularization term to the original loss function
$$L_{reg}({\bf w}, b) = \frac{1}{2N}\sum_{i=1}^N (y_i-({\bf w}^T {\bf x}_i+b))^2 + \frac{\lambda}{2} \|{\bf w}\|^{2}$$
where $\lambda \geq 0$ and for ${\bf w} = [w_1 ~ w_2 ~ ... ~w_k]^T$, one denotes $\|{\bf w}\|^2 = w_1^2 + ... + w_k^2$.
Please find the linear regression model $({\bf w}, b)$ that minimizes the aforementioned regularized sum of squares loss.
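A hedged sketch of the regularized closed form, assuming (as the loss above implies) that the bias $b$ is not penalized; centering the data removes $b$ from the stationarity condition for ${\bf w}$.
```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form minimizer of the L2-regularized loss, with b unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean            # centering eliminates b
    N, k = X.shape
    # Stationarity in w:  (Xc^T Xc / N + lam * I) w = Xc^T yc / N
    w = np.linalg.solve(Xc.T @ Xc / N + lam * np.eye(k), Xc.T @ yc / N)
    b = y_mean - w @ x_mean                    # recover b from dL/db = 0
    return w, b
```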
---
### 2. Noise and Regularization
Consider the linear model $f_{{\bf w},b}: {\mathbb R}^k \rightarrow {\mathbb R}$, where ${\bf w} \in {\mathbb R}^k$ and $b \in {\mathbb R}$, defined as
$$f_{{\bf w},b}({\bf x}) = {\bf w}^T {\bf x} + b$$
Given dataset $S = \{({\bf x}_i,y_i)\}_{i=1}^N$, if the inputs ${\bf x}_i \in {\mathbb R}^k$ are contaminated with input noise ${{\bf \eta}_i} \in {\mathbb R}^k$, we may consider the expected sum-of-squares loss in the presence of input noise as
$${\tilde L}_{ssq}({\bf w},b) = {\mathbb E}\left[ \frac{1}{2N}\sum_{i=1}^{N}(f_{{\bf w},b}({\bf x}_i + {\bf \eta}_i)-y_i)^2 \right]$$
where the expectation is taken over the randomness of input noises ${\bf \eta}_1,...,{\bf \eta}_N$.
Now assume the input noises ${\bf \eta}_i = [\eta_{i,1} ~ \eta_{i,2} ~ ... ~ \eta_{i,k}]^T$ are random vectors with zero mean ${\mathbb E}[\eta_{i,j}] = 0$, and the covariance between components is given by
$${\mathbb E}[\eta_{i,j}\eta_{i',j'}] = \delta_{i,i'}\delta_{j,j'} \sigma^2$$
where $\delta_{i,i'} = \left\{\begin{array}{ll}
1 & \mbox{, if} ~ i = i'\\
0 & \mbox{, otherwise.}
\end{array}\right.$ denotes the Kronecker delta.
Please show that
$${\tilde L}_{ssq}({\bf w},b) = \frac{1}{2N}\sum_{i=1}^{N}(f_{{\bf w},b}({\bf x}_i)-y_i)^2 + \frac{\sigma^2}{2}\|{\bf w}\|^2$$
That is, minimizing the expected sum-of-squares loss in the presence of input noise is equivalent to minimizing the noise-free sum-of-squares loss with the addition of an $L^2$-regularization term on the weights.
- Hint: $\|{\bf x}\|^2 = {\bf x}^T{\bf x} = Trace({\bf x}{\bf x}^T)$.
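This is not a proof, but a quick Monte Carlo check of the identity on synthetic data; the $N$, $k$, $\sigma$, ${\bf w}$, $b$ below are arbitrary choices made only for the check.
```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary synthetic data and model, used only to check the identity.
N, k, sigma = 500, 3, 0.5
X, y = rng.normal(size=(N, k)), rng.normal(size=N)
w, b = rng.normal(size=k), 0.7

def ssq(Xin):
    """(1 / 2N) * sum_i (w^T x_i + b - y_i)^2"""
    return np.mean((Xin @ w + b - y) ** 2) / 2

# Average the noisy-input loss over many independent noise draws.
noisy = np.mean([ssq(X + sigma * rng.normal(size=X.shape))
                 for _ in range(5000)])
print(noisy, ssq(X) + sigma**2 / 2 * np.sum(w**2))  # should nearly agree
```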
---
### 3. Kaggle Hacker
In the lecture, we've learnt the importance of validation. It is said that fine-tuning your model based on the Kaggle public leaderboard always causes a "disaster" on the private test dataset.
Let's not debate whether it actually leads to disastrous results. The fact is that most students don't even know how to "overfit" the public leaderboard other than by submitting over and over again.
In this problem, you'll see how to take advantage of the public leaderboard in the hw1 Kaggle competition. (In theory XD)
> ## **Warning**
> ![](https://i.imgur.com/FtmP42R.jpg)
Suppose you have trained $K+1$ models $g_0, g_1, \cdots, g_K$, and in particular $g_0({\bf x}) = 0$ is the zero function.
Assume the testing dataset is $\{({\bf x}_i, y_i)\}_{i=1}^N$, where you only know ${\bf x}_i$ while $y_i$ is hidden. Nevertheless, you are allowed to observe the sum-of-squares testing error
$$e_k = \frac{1}{N}\sum_{i=1}^N (g_k({\bf x}_i)-y_i)^2, ~~ k = 0, 1, \cdots, K$$
Of course, you also know $s_k = \frac{1}{N}\sum_{i=1}^N(g_k({\bf x}_i))^2$.
#### 3-(a)
Please express $\sum_{i=1}^N g_k({\bf x}_i)y_i$ in terms of $N, e_0, e_1,\cdots,e_K, s_1,\cdots,s_K$. Prove your answer.
- Hint: $e_0=\frac{1}{N}\sum_{i=1}^{N}y_i^2$
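Expanding the square in $e_k$ and using the hint suggests a candidate of the form $\frac{N}{2}(s_k + e_0 - e_k)$. The sketch below, with a purely synthetic test set and random predictions standing in for the $g_k({\bf x}_i)$, only checks that candidate numerically; it is a verification aid, not the written proof the problem asks for.
```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins: y is the hidden label vector, and row k-1 of G
# plays the role of the predictions g_k(x_i) on the test set.
N, K = 200, 4
y = rng.normal(size=N)
G = rng.normal(size=(K, N))

e0 = np.mean(y ** 2)                  # e_0, since g_0 is the zero function
e = np.mean((G - y) ** 2, axis=1)     # e_1, ..., e_K
s = np.mean(G ** 2, axis=1)           # s_1, ..., s_K

candidate = N * (s + e0 - e) / 2      # candidate for sum_i g_k(x_i) y_i
print(np.allclose(candidate, G @ y))  # expect True
```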
#### 3-(b)
For the given $K + 1$ models in the previous problem, explain how to solve
$$\min_{\alpha_1, \cdots, \alpha_K} L_{test}\left(\sum_{k=1}^{K} \alpha_k g_k\right) = \min_{\alpha_1, \cdots, \alpha_K} \frac{1}{N} \sum_{i=1}^N\left( \sum_{k=1}^{K} \alpha_k g_k({\bf x}_i) - y_i\right)^2$$
and obtain the optimal weights $\alpha_1, \cdots, \alpha_K$.
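A sketch of one possible approach, assuming the inner products $\sum_i g_k({\bf x}_i) y_i$ are recovered as in 3-(a); note that the Gram matrix of the predictions is fully observable, because the ${\bf x}_i$ (and hence the $g_k({\bf x}_i)$) are known.
```python
import numpy as np

def optimal_blend(G, e0, e):
    """Weights alpha minimizing (1/N) * || G^T alpha - y ||^2.

    G    : (K, N) matrix of predictions g_k(x_i), computable since x_i is known.
    e0, e: observed leaderboard errors e_0 and (e_1, ..., e_K).
    Assumes sum_i g_k(x_i) y_i = N * (s_k + e_0 - e_k) / 2 from 3-(a).
    """
    K, N = G.shape
    s = np.mean(G ** 2, axis=1)
    Gy = N * (s + e0 - e) / 2              # recovered sum_i g_k(x_i) y_i
    # Normal equations: (G G^T) alpha = G y
    return np.linalg.lstsq(G @ G.T, Gy, rcond=None)[0]
```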