# Machine Learning 02: Linear Regression
###### tags: `ML model`, `Linear Regression`
- **Formula 1: Linear regression**
<font size=5>**$$\hat y = θ_0 + θ_1x_1 + θ_2x_2 + \cdots + θ_nx_n$$**</font>
Our goal is to estimate all the parameters $θ$ by minimizing the MSE below, where $Y_i$ is the true value and $\hat Y_i$ the prediction.
<font size=5>**$\begin{align}
Loss = MSE & = \frac{1}{n}\sum\limits_{i=1}^{n} (Y_i - \hat Y_i)^2 \\
& = \frac{1}{n}\sum\limits_{i=1}^{n} (Y_i - (θ_0 + θ_1x_{i,1} + θ_2x_{i,2} + \cdots + θ_nx_{i,n}))^2 \\
& = \frac{1}{n}\sum\limits_{i=1}^{n} (Y_i - (mx_i + c))^2 \quad\text{(single-feature case: } \hat Y_i = mx_i + c\text{)}
\end{align}$**</font>
Taking the partial derivative of this loss function with respect to $m$ gives:
<font size=5>**$\begin{align}
D_m = \frac{\partial MSE}{\partial m} & = \frac{-2}{n}\sum\limits_{i=1}^{n} x_i(Y_i - (mx_i + c)) \\
& = \frac{-2}{n}\sum\limits_{i=1}^{n} x_i(Y_i - \hat Y_i)
\end{align}$**</font>
Similarly, taking the partial derivative with respect to $c$ gives:
<font size=5>**$\begin{align}
D_c = \frac{\partial MSE}{\partial c} = \frac{-2}{n}\sum\limits_{i=1}^{n} (Y_i - \hat Y_i)
\end{align}$**</font>
The parameters $m$ and $c$ can now be updated with:
<font size=5>$$m = m - lr * D_m$$</font>
<font size=5>$$c = c - lr * D_c$$</font>
<br>
Turning this into a model:
```python=
# X, Y: training data as 1-D NumPy arrays (assumed already defined)
m = 0          # slope
c = 0          # intercept
lr = 0.0001    # learning rate
epochs = 1000
n = float(len(X))  # number of samples

# Performing gradient descent
for i in range(epochs):
    Y_pred = m * X + c                     # current predictions
    D_m = (-2/n) * sum(X * (Y - Y_pred))   # ∂MSE/∂m
    D_c = (-2/n) * sum(Y - Y_pred)         # ∂MSE/∂c
    m = m - lr * D_m                       # update m
    c = c - lr * D_c                       # update c
```
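A minimal usage sketch of the loop above, assuming synthetic 1-D data (the data-generating values 3 and 4 are made up for illustration):
```python=
import numpy as np

# Hypothetical training data: Y ≈ 3*X + 4 plus noise
np.random.seed(0)
X = np.random.rand(200)
Y = 3 * X + 4 + 0.1 * np.random.randn(200)

m, c, lr, epochs, n = 0.0, 0.0, 0.01, 10_000, float(len(X))
for _ in range(epochs):
    Y_pred = m * X + c
    m -= lr * (-2 / n) * np.sum(X * (Y - Y_pred))
    c -= lr * (-2 / n) * np.sum(Y - Y_pred)

print(m, c)                              # should end up close to 3 and 4
print(np.mean((Y - (m * X + c)) ** 2))   # final training MSE
```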
---
- **Formula 2: Vectorized form of linear regression**
Formula 1 can also be written in vectorized form:
<font size=5>**$$\hat y = h_θ(x) = θ\cdot x$$**</font>
where $θ$ is the parameter **vector** and $x$ is the feature **vector**.
<br>
We then minimize the MSE:
<font size=5>**$\begin{align}
Loss = MSE & = \frac{1}{n}\sum\limits_{i=1}^{n} (Y_i - \hat Y_i)^2 \\
& = \frac{1}{n}\sum\limits_{i=1}^{n} (Y_i - x_i\cdotθ)^2 \\
& = \frac{1}{n}\|Y - Xθ\|^2
\end{align}$**</font>
---
**Note: $x_i\cdotθ$ is a vector dot product, while $Xθ$ is a matrix-vector product.**
**Here**
<font size=5>$\hat y_i = x_i \cdot θ = \sum\limits_{j=1}^{m}a_{i,j}θ_j + θ_0\;, \qquad \hat y = Xθ$</font>
<font size=5>$x_i = \begin{bmatrix}
a_{i,1} \\ a_{i,2} \\ \vdots \\ a_{i,m} \\ 1
\end{bmatrix}$ , $θ = \begin{bmatrix}
θ_1 \\ θ_2 \\ \vdots \\ θ_m \\ θ_0
\end{bmatrix}$</font>
$X = \begin{bmatrix}
x_1^T \\ x_2^T \\ \vdots \\ x_n^T
\end{bmatrix} =
\begin{pmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,m} & 1 \\
a_{2,1} & a_{2,2} & \cdots & a_{2,m} & 1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
a_{n,1} & a_{n,2} & \cdots & a_{n,m} & 1 \\
\end{pmatrix}$
where $n$ is the number of samples, $m$ is the number of features, and the final column of 1s carries the bias term $θ_0$.
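A small NumPy sketch of this note (the numbers are made up): build $X$ by appending a column of 1s, and check that the per-sample dot products $x_i\cdotθ$ match the single matrix product $Xθ$:
```python=
import numpy as np

np.random.seed(0)
A = np.random.rand(5, 3)                 # 5 samples, 3 features (hypothetical)
X = np.c_[A, np.ones((5, 1))]            # append a column of 1s for the bias term
theta = np.random.randn(4)               # [θ_1, θ_2, θ_3, θ_0]

y_hat_rows = np.array([x_i.dot(theta) for x_i in X])   # x_i · θ, one sample at a time
y_hat_mat = X.dot(theta)                                # Xθ in a single matrix product
print(np.allclose(y_hat_rows, y_hat_mat))               # True
```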
---
<br>
Taking the partial derivative with respect to $θ$:
<font size=5>**$\begin{align}
D_θ = \frac{\partial MSE}{\partial θ}
& = \frac{1}{n}\frac{\partial}{\partial θ}(Y - Xθ)^T(Y - Xθ)\\
& = \frac{2}{n}(-X)^T(Y - Xθ) \\
& = \frac{2}{n}X^T(Xθ - Y)
\end{align}$**</font>
Finally, $θ$ is updated with:
<font size=5>$θ = θ - lr * D_θ$</font>
Turning this into a model:
```python=
import numpy as np

# X: feature matrix with a bias column of 1s appended; y: target column vector
lr = 0.1
epochs = 1000
m = len(X)                       # number of samples
theta = np.random.randn(2, 1)    # random initialization (here: 1 feature + bias)

for _ in range(epochs):
    gradients = 2/m * X.T.dot(X.dot(theta) - y)   # D_θ
    theta -= lr * gradients                       # θ = θ - lr * D_θ
```
Implemented with sklearn:
```python=
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(data_prepared, data_labels)
```
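As a quick follow-up to the snippet above (same `lin_reg` and `data_prepared` objects): the fitted intercept and coefficients correspond to $θ_0$ and $[θ_1, \dots, θ_n]$, and `predict` applies the learned model:
```python=
print(lin_reg.intercept_, lin_reg.coef_)      # θ_0 and [θ_1, ..., θ_n]
predictions = lin_reg.predict(data_prepared)  # apply the fitted model to prepared features
```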
---
# Polynomial Regression
Add powers of each feature as new features, then train a linear model on this expanded feature set.
```python=
# Add polynomial features (degree-2 combinations of the original features)
from sklearn.preprocessing import PolynomialFeatures
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)
lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)
lin_reg.intercept_, lin_reg.coef_
```
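One detail worth a quick sketch (assuming `X` has a single feature, as in a typical curve-fitting example; `X_new` is hypothetical): new inputs must pass through the **same fitted** `poly_features` transformer before calling `predict`, since the model was trained on the expanded features:
```python=
import numpy as np

X_new = np.linspace(-3, 3, 100).reshape(-1, 1)            # hypothetical new inputs
y_new = lin_reg.predict(poly_features.transform(X_new))   # expand, then predict
```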
---
# Ridge Regression
A regularized version of linear regression: **during training**, an extra regularization term is added to the loss function, namely the sum of squared parameters (the squared $\ell_2$ norm).
<font size=5>**$\begin{align}
Loss & = \frac{1}{n}\sum\limits_{i=1}^{n} (Y_i - \hat Y_i)^2 + \frac{\alpha}{2}\|θ\|_2^2 \\
& = \frac{1}{n}\sum\limits_{i=1}^{n} (Y_i - \hat Y_i)^2 + \frac{\alpha}{2} \sum\limits_{i=1}^{n} θ_i^2
\end{align}$**</font>
Here $\alpha$ is a hyperparameter: increasing $\alpha$ gives flatter predictions, which reduces the model's variance but increases its bias.
*(figure: linear model on the left, polynomial model on the right)*
Implemented with sklearn:
```python=
from sklearn.linear_model import Ridge
# solvers: cholesky (closed form), svd (singular value decomposition), lsqr (least squares)
ridge_reg = Ridge(alpha=1, solver="cholesky", random_state=42)
ridge_reg.fit(X_poly, y)
```
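Ridge regression can also be trained with (stochastic) gradient descent instead of a closed-form solver; a minimal sketch, assuming the same `X_poly`/`y` as above and using sklearn's `SGDRegressor` with an $\ell_2$ penalty (the hyperparameter values here are illustrative):
```python=
from sklearn.linear_model import SGDRegressor

# penalty="l2" adds a squared-norm regularization term, playing the same role as Ridge's alpha
sgd_reg = SGDRegressor(penalty="l2", alpha=0.1, max_iter=1000, random_state=42)
sgd_reg.fit(X_poly, y.ravel())   # SGDRegressor expects a 1-D target
```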
---
# Lasso Regression
Lasso regression instead adds an $\ell_1$ term to the loss function. As a result, Lasso tends to drive the weights of the least important features to zero, i.e. it performs **feature selection** and produces a **sparse model**.
<font size=5>**$\begin{align}
Loss & = \frac{1}{n}\sum\limits_{i=1}^{n} (Y_i - \hat Y_i)^2 + \alpha\|θ\|_1 \\
& = \frac{1}{n}\sum\limits_{i=1}^{n} (Y_i - \hat Y_i)^2 + \alpha \sum\limits_{i=1}^{n} |θ_i|
\end{align}$**</font>
*(figure: linear model on the left, polynomial model on the right)*
Implemented with sklearn:
```python=
from sklearn.linear_model import Lasso
lasso_reg = Lasso(alpha=0.1)
lasso_reg.fit(X_poly, y)
```
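To see the feature-selection behaviour described above, the learned weights can be inspected directly (continuing the snippet above); with a large enough `alpha`, unimportant features end up with coefficients of exactly zero:
```python=
print(lasso_reg.coef_)   # weights of the least important features are driven to exactly 0
```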
---
# Elastic Net
A middle ground between Ridge and Lasso regression, mixing both penalties with a ratio $r$: when $r=0$, Elastic Net is equivalent to Ridge regression; when $r=1$, it is equivalent to Lasso regression.
<font size=5>**$\begin{align}
Loss & = \frac{1}{n}\sum\limits_{i=1}^{n} (Y_i - \hat Y_i)^2 + r\alpha\|θ\|_1 + \frac{1-r}{2}\alpha\|θ\|_2^2 \\
& = \frac{1}{n}\sum\limits_{i=1}^{n} (Y_i - \hat Y_i)^2 + r\alpha \sum\limits_{i=1}^{n} |θ_i| + \frac{1-r}{2}\alpha \sum\limits_{i=1}^{n} θ_i^2
\end{align}$**</font>
Implemented with sklearn (`l1_ratio` corresponds to the mix ratio $r$):
```python=
from sklearn.linear_model import ElasticNet
elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5, random_state=42)
elastic_net.fit(X_poly, y)
```
---
# Logistic Regression
Pass the output of a linear regression through the sigmoid function and use the result to classify the data.
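Concretely, the linear output $θ\cdot x$ is squashed into a probability with the sigmoid $\sigma$, and the class is decided by thresholding at 0.5:
<font size=5>**$$\hat p = \sigma(θ\cdot x) = \frac{1}{1 + e^{-θ\cdot x}}\;, \qquad \hat y = \begin{cases} 1 & \text{if } \hat p \ge 0.5 \\ 0 & \text{if } \hat p < 0.5 \end{cases}$$**</font>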
```python=
from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression(solver="lbfgs", random_state=42)
log_reg.fit(X, y)
```
---
## References
1. *Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow*, Aurélien Géron
2. [机器学习第一课 | 一文读懂线性回归的数学原理](https://zhuanlan.zhihu.com/p/71725190)