# Machine Learning 02: Linear Regression

###### tags: `ML model`, `Linear Regression`

- **Formula 1: linear regression**

<font size=5>**$$\hat y = θ_0 + θ_1x_1 + θ_2x_2 + \cdots + θ_nx_n$$**</font>

Our goal is to find all of the parameters $θ$ by minimizing the MSE below. For a single feature the model reduces to $\hat Y = mX + c$, which is the case used in the derivation and code that follow.

<font size=5>**$\begin{align} Loss = MSE & = \frac{1}{n}\sum\limits_{i=1}^{n} (Y_i - \hat Y_i)^2 \\ & = \frac{1}{n}\sum\limits_{i=1}^{n} (Y_i - (mX_i + c))^2 \end{align}$**</font>

Taking the partial derivative of this loss function with respect to $m$ gives:

<font size=5>**$\begin{align} D_m = \frac{\partial MSE}{\partial m} & = \frac{-2}{n}\sum\limits_{i=1}^{n} X_i(Y_i - (mX_i + c)) \\ & = \frac{-2}{n}\sum\limits_{i=1}^{n} X_i(Y_i - \hat Y_i) \end{align}$**</font>

Likewise, the partial derivative with respect to $c$ is:

<font size=5>**$\begin{align} D_c = \frac{\partial MSE}{\partial c} = \frac{-2}{n}\sum\limits_{i=1}^{n} (Y_i - \hat Y_i) \end{align}$**</font>

We can now update $m$ and $c$ with (where $lr$ is the learning rate):

<font size=5>$$m = m - lr * D_m$$</font>

<font size=5>$$c = c - lr * D_c$$</font>

<br>

Turned into a model (`X` and `Y` are the training data as NumPy arrays):

```python=
m = 0                  # slope
c = 0                  # intercept
lr = 0.0001            # learning rate
epochs = 1000
n = float(len(X))      # number of elements in X

# Performing gradient descent
for i in range(epochs):
    Y_pred = m*X + c                          # current predictions
    D_m = (-2/n) * sum(X * (Y - Y_pred))      # derivative w.r.t. m
    D_c = (-2/n) * sum(Y - Y_pred)            # derivative w.r.t. c
    m = m - lr * D_m                          # update m
    c = c - lr * D_c                          # update c
```

---

- **Formula 2: vectorized linear regression**

Formula 1 can also be written in vectorized form:

<font size=5>**$$\hat y = h_θ(x) = θ\cdot x$$**</font>

where $θ$ is the parameter **vector** and $x$ is the feature **vector**.

<br>

We then minimize the MSE over the $m$ training samples:

<font size=5>**$\begin{align} Loss = MSE & = \frac{1}{m}\sum\limits_{i=1}^{m} (\hat Y_i - Y_i)^2 \\ & = \frac{1}{m}\sum\limits_{i=1}^{m} (x_i\cdotθ - Y_i)^2 \\ & = \frac{1}{m}\|Xθ - Y\|^2 \end{align}$**</font>

---

**Note: $x_i\cdotθ$ is a vector dot product, while $Xθ$ is the matrix-vector product that computes every prediction at once.**

**Here**

<font size=5>$\hat Y_i = x_i \cdot θ$ , and stacking all samples gives $\hat Y = Xθ$</font>

<font size=5>$x_i = \begin{bmatrix} a_{i, 1} \\ a_{i, 2} \\ \vdots \\ a_{i, n} \\ 1 \end{bmatrix}$ , $θ = \begin{bmatrix} θ_1 \\ θ_2 \\ \vdots \\ θ_n \\ θ_0 \end{bmatrix}$</font>

$X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_m^T \end{bmatrix} = \begin{pmatrix} a_{1, 1} & a_{1, 2} & \cdots & a_{1, n} & 1 \\ a_{2, 1} & a_{2, 2} & \cdots & a_{2, n} & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{m, 1} & a_{m, 2} & \cdots & a_{m, n} & 1 \\ \end{pmatrix}$

Each row of $X$ is one training sample; the trailing column of 1s pairs with the bias term $θ_0$.

---

<br>

Taking the partial derivative with respect to $θ$ (the inner derivative of $Xθ - Y$ contributes the factor $X^T$):

<font size=5>**$\begin{align} D_θ = \frac{\partial MSE}{\partial θ} & = \frac{2}{m}\left(\frac{\partial (Xθ - Y)}{\partial θ}\right)^T(Xθ - Y) \\ & = \frac{2}{m}X^T(Xθ - Y) \end{align}$**</font>

Finally, $θ$ is updated with:

<font size=5>$θ = θ - lr * D_θ$</font>

Turned into a model (`X` already contains the bias column of 1s and `y` is the label vector):

```python=
import numpy as np

lr = 0.1
epoch = 1000
m = 100                           # number of training samples
theta = np.random.randn(2, 1)     # random initialization (bias + one feature)

for _ in range(epoch):
    gradients = 2/m * X.T.dot(X.dot(theta) - y)   # D_theta from the formula above
    theta -= lr * gradients
```

Implemented with sklearn:

```python=
from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(data_prepared, data_labels)
```

---

# Polynomial Regression

Add powers of each feature as new features, then train a linear model on this extended set of features.

```python=
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# add polynomial features
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)

lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)
lin_reg.intercept_, lin_reg.coef_
```

---

# Ridge Regression

A regularized version of linear regression: during **training**, an extra regularization term, the sum of the squared parameters (an $\ell_2$ penalty), is added to the loss function.

<font size=5>**$\begin{align} Loss & = \frac{1}{n}\sum\limits_{i=1}^{n} (\hat Y_i - Y_i)^2 + \frac{\alpha}{2}\|θ\|_2^2 \\ & = \frac{1}{n}\sum\limits_{i=1}^{n} (\hat Y_i - Y_i)^2 + \frac{\alpha}{2} \sum\limits_{i=1}^{n} θ_i^2 \end{align}$**</font>

Here $\alpha$ is a hyperparameter: increasing $\alpha$ produces flatter predictions, which reduces the model's variance but increases its bias (a short gradient-descent sketch with this penalty follows).
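To make the effect of the penalty concrete, here is a minimal sketch (not from the book) that adds the $\ell_2$ term to the vectorized gradient-descent loop from Formula 2. It assumes the same `X` (with the bias column) and `y` as above, and for brevity it also penalizes the bias weight, which libraries such as sklearn normally leave unregularized.

```python=
import numpy as np

alpha = 1.0    # regularization strength (illustrative value)
lr = 0.1
epoch = 1000
m = 100        # number of training samples
theta = np.random.randn(2, 1)

for _ in range(epoch):
    mse_grad = 2/m * X.T.dot(X.dot(theta) - y)   # gradient of the MSE term
    l2_grad = alpha * theta                      # gradient of (alpha/2) * ||theta||^2
    theta -= lr * (mse_grad + l2_grad)
```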
The figure below shows a linear model on the left and a polynomial model on the right.

![](https://i.imgur.com/DnmlpA9.png)

Implemented with sklearn:

```python=
from sklearn.linear_model import Ridge

# solvers: "cholesky" (closed form), "svd" (singular value decomposition), "lsqr" (least squares)
ridge_reg = Ridge(alpha=1, solver="cholesky", random_state=42)
ridge_reg.fit(X_poly, y)
```

---

# Lasso Regression

Lasso regression instead adds an $\ell_1$ penalty to the loss function, so it tends to drive the weights of the least important features to zero. In other words it performs **feature selection** and outputs a **sparse model**.

<font size=5>**$\begin{align} Loss & = \frac{1}{n}\sum\limits_{i=1}^{n} (\hat Y_i - Y_i)^2 + \alpha \|θ\|_1 \\ & = \frac{1}{n}\sum\limits_{i=1}^{n} (\hat Y_i - Y_i)^2 + \alpha \sum\limits_{i=1}^{n} |θ_i| \end{align}$**</font>

The figure below shows a linear model on the left and a polynomial model on the right.

![](https://i.imgur.com/QFcQIVc.png)

Implemented with sklearn:

```python=
from sklearn.linear_model import Lasso

lasso_reg = Lasso(alpha=0.1)
lasso_reg.fit(X_poly, y)
```

---

# Elastic Net

Elastic Net sits between Ridge and Lasso regression: when $r = 0$ it is equivalent to Ridge regression, and when $r = 1$ it is equivalent to Lasso regression.

<font size=5>**$\begin{align} Loss & = \frac{1}{n}\sum\limits_{i=1}^{n} (\hat Y_i - Y_i)^2 + r\alpha \|θ\|_1 + \frac{1-r}{2} \alpha \|θ\|_2^2 \\ & = \frac{1}{n}\sum\limits_{i=1}^{n} (\hat Y_i - Y_i)^2 + r\alpha \sum\limits_{i=1}^{n} |θ_i| + \frac{1-r}{2} \alpha\sum\limits_{i=1}^{n} θ_i^2 \end{align}$**</font>

Implemented with sklearn (`l1_ratio` corresponds to $r$):

```python=
from sklearn.linear_model import ElasticNet

elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5, random_state=42)
elastic_net.fit(X_poly, y)
```

---

# Logistic Regression

Logistic regression passes the output of a linear model through the sigmoid function to classify the data (a small sketch of this sigmoid step is given after the reference list).

```python=
from sklearn.linear_model import LogisticRegression

log_reg = LogisticRegression(solver="lbfgs", random_state=42)
log_reg.fit(X, y)
```

---

## References

1. *Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow*, Aurélien Géron
2. [机器学习第一课 | 一文读懂线性回归的数学原理](https://zhuanlan.zhihu.com/p/71725190)
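As promised in the Logistic Regression section, here is a minimal sketch of the sigmoid step. It assumes `log_reg` is the fitted model from that section and `X` its training features; the `sigmoid` helper below is illustrative, not part of sklearn.

```python=
import numpy as np

def sigmoid(t):
    # maps the linear score to a probability in (0, 1)
    return 1 / (1 + np.exp(-t))

scores = log_reg.intercept_ + X.dot(log_reg.coef_.T)  # linear part: theta_0 + x . theta
probs = sigmoid(scores)                               # estimated P(y = 1 | x)
preds = (probs >= 0.5).astype(int)                    # classify with a 0.5 threshold

# these should agree with log_reg.predict_proba(X)[:, 1] and log_reg.predict(X)
```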