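###### tags: `machine learning`|`python`

# Machine Learning - Polynomial Regression

## Introduction

* A regression analysis that models a polynomial relationship between <font color="#3355FF">**one**</font> dependent variable ($Y$) and <font color="#3355FF">**one or more**</font> independent variables ($X$)
    * One independent variable --> univariate polynomial regression
    * Several independent variables --> multivariate polynomial regression
* In single-variable regression, when the dependent variable ($Y$) and the independent variable ($X$) have a <font color="#008000">nonlinear relationship</font>, univariate polynomial regression can be used

> Purpose:
> * Explain past patterns in the data
> * Use the independent variable ($X$) to predict likely future values of the dependent variable ($Y$)

---

Equation: $y = b_0 + b_1x_1 + b_2x_1^2 + \dots + b_nx_1^n$ (for $n = 2$ the curve is a parabola)

![](https://i.imgur.com/T6bwK2M.png)

Why "Linear"?
The model is a linear combination of its terms: each term is multiplied by a coefficient and the products are summed. It is therefore linear in the coefficients $b_0, \dots, b_n$, even though it is nonlinear in $x_1$.

---

## Code Walkthrough

```python=
from sklearn.preprocessing import PolynomialFeatures
```
> * Import the `PolynomialFeatures` class from `sklearn.preprocessing`
> It provides methods for generating polynomial features

```python=
poly_reg = PolynomialFeatures(degree = 2)
```
> * `degree = 2` sets the highest power to 2

```python=
x_poly = poly_reg.fit_transform(x)
```
> * Use `fit_transform()` of `PolynomialFeatures` to fit the transformer to the data and transform it
> ![](https://i.imgur.com/yxjqerh.png)

```python=
x_grid = np.arange(x.min(), x.max(), 0.1)
```
> * Sample `x` with a step of 0.1 (the original step is 1)![](https://i.imgur.com/6Jz4OH6.png)
> More points make the plotted curve smoother

```python=
new_x = 6.5
new_x = np.array(new_x).reshape(-1, 1)  # sklearn expects a 2D array
lin_reg.predict(new_x)
# poly_reg is already fitted, so transform() is enough here
lin_reg2.predict(poly_reg.transform(new_x))
```
> * Suppose an applicant has a position level of 6.5. Predicting the salary with simple linear regression and with polynomial regression gives the results below
> ![](https://i.imgur.com/SFaShp9.png)
> This shows that with the simple linear regression model, the company would pay this person far more than it should

---

## Practice

Machine Learning - Assignment 10
![](https://i.imgur.com/ZxpI6BM.png)

```python=
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv("Position_Salaries.csv")
x = dataset.iloc[:, 1:2].values  # position level, kept 2D for sklearn
y = dataset.iloc[:, 2].values    # salary

# Preprocessing steps skipped for this dataset:
# - Missing Data
# - Categorical Data
# - Splitting the Dataset into the Training set and Test set
#     1. With only 10 rows, the training set would be too small and the model error large
#     2. Every level should be analyzed to see the full relationship with salary
# - Feature Scaling
#     Not needed: ordinary least-squares predictions (LinearRegression)
#     do not change when the features are rescaled

# Simple Linear Regression
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(x, y)

# Graph of Simple Linear Regression
plt.scatter(x, y, color = 'red')
plt.plot(x, lin_reg.predict(x), color = 'blue')
plt.title("Truth or Bluff (Simple Linear Regression)")
plt.xlabel("Position Level")
plt.ylabel("Salary")
plt.show()

# Polynomial Regression
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 5)
x_poly = poly_reg.fit_transform(x)
lin_reg2 = LinearRegression()
lin_reg2.fit(x_poly, y)

# Graph of Polynomial Regression
plt.scatter(x, y, color = 'red')
plt.plot(x, lin_reg2.predict(x_poly), color = 'blue')
plt.title("Truth or Bluff (Polynomial Regression)")
plt.xlabel("Position Level")
plt.ylabel("Salary")
plt.show()

# Smoother curve: evaluate the model on a finer grid (step 0.1)
x_grid = np.arange(x.min(), x.max(), 0.1)
x_grid = x_grid.reshape(-1, 1)
plt.scatter(x, y, color = 'red')
plt.plot(x_grid, lin_reg2.predict(poly_reg.transform(x_grid)), color = 'blue')
plt.title("Truth or Bluff (Polynomial Regression)")
plt.xlabel("Position Level")
plt.ylabel("Salary")
plt.show()

# Predict the salary for position level 6.5 with both models
new_x = 6.5
new_x = np.array(new_x).reshape(-1, 1)
print(lin_reg.predict(new_x))
print(lin_reg2.predict(poly_reg.transform(new_x)))
```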
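
The point under Why "Linear"? can be checked on a small example that does not need `Position_Salaries.csv`. The sketch below is only an illustration: the data is synthetic (generated from $y = 1 + 2x + 3x^2$ with no noise), and the names `x_demo`, `y_demo`, `poly_demo`, `model_demo` and `x_new` are made up for this example and are not part of the assignment. It shows the columns that `PolynomialFeatures` generates and that an ordinary `LinearRegression` on those columns recovers the polynomial coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumption for illustration): y = 1 + 2x + 3x^2, no noise
x_demo = np.arange(1, 11, dtype=float).reshape(-1, 1)
y_demo = 1 + 2 * x_demo.ravel() + 3 * x_demo.ravel() ** 2

# Expand x into the columns [1, x, x^2]
poly_demo = PolynomialFeatures(degree=2)
x_demo_poly = poly_demo.fit_transform(x_demo)

# An ordinary linear regression on the expanded columns
model_demo = LinearRegression()
model_demo.fit(x_demo_poly, y_demo)

# Predict at a new point, reusing the already fitted transformer
x_new = np.array([[6.5]])
y_new = model_demo.predict(poly_demo.transform(x_new))

print(x_demo_poly[:3])       # first rows of the expanded features
print(model_demo.intercept_) # constant term of the fitted model
print(model_demo.coef_)      # one coefficient per expanded column
print(y_new)                 # prediction at x = 6.5
```

Because the bias column of ones is redundant with the intercept that `LinearRegression` fits itself, the constant term shows up in `intercept_`, while the last two entries of `coef_` correspond to $x$ and $x^2$. The nonlinearity lives entirely in the feature expansion; the fitting step is the same linear regression used for the straight-line model.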