# 2020CCU_Machine_Learning_hw1
###### author:`405420052 資工四 陳奕瑋`
## Use the linear model y = 2x + ε with zero-mean Gaussian noise ε ∼ N(0, 1) to generate 20 data points with (equal spacing) x ∈ [−3, 3]. :question:</br></br>
### ( a ) Perform linear regression. 20 data points are split into 15 training samples and 5 testing samples (75% for training and 25% for testing).</br></br> Show the **fitting plots** of the **training error**, **cross-validation errors** for both leave-one-out and five-fold, and **testing errors**.
---
#### Procedure
1. Generate `x` and `y` as specified in the problem
```python=
import math
import numpy as np
import matplotlib.pyplot as plt

def generate():
    # zero-mean Gaussian noise ε ~ N(0, 1), one sample per data point
    epsilon = np.random.normal( 0 , 1 , 20 )
    # 20 equally spaced points in [-3, 3]
    x = np.linspace( -3 , 3 , 20 )
    y = 2 * x + epsilon
    return x , y
```
- `epsilon` is the `zero-mean Gaussian noise` `ε ∼ N(0, 1)` from the problem statement
- `x` consists of `20` equally spaced data points in `[ -3 , 3 ]`
2. Split the data into `25%` testing data and `75%` training data
```python=
def splitdata( a ):
    percent = 0.25
    sp = math.floor( percent * a.size )   # index separating test from train
    test = a[ : sp ]                      # first 25% of the points
    train = a[ sp : a.size ]              # remaining 75%
    return train , test
```
- the `testing data` takes the first `25%` of the points
- the `training data` takes the remaining points
3. Fit the training data to obtain W<sub>linear</sub>
```python=
def linear_compute( x , y ):
    # design matrix: one row [ x_i , 1 ] per sample
    x = np.column_stack((x,np.ones(x.shape[0])))
    y = y[:,np.newaxis]
    pin = np.linalg.pinv(x)   # pseudo-inverse
    result = pin @ y          # W_linear = pinv(X) @ y
    return result
```
- append a `column` of `1` to the right of `x`
- compute the `pseudo-inverse` of `x` with `np.linalg.pinv`
- multiply the `pseudo-inverse` by `y` (a sanity check follows below)
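For reference, a minimal sanity-check sketch (assuming `numpy` is imported as `np` above; the `_chk` names are introduced only for this illustration): the pseudo-inverse solution is the least-squares solution, so it should agree with `np.linalg.lstsq` on the same design matrix.
```python=
# Sanity-check sketch: the pseudo-inverse solution agrees with np.linalg.lstsq.
x_chk = np.linspace( -3 , 3 , 20 )
y_chk = 2 * x_chk + np.random.normal( 0 , 1 , 20 )

X_chk = np.column_stack(( x_chk , np.ones( x_chk.shape[0] ) ))   # same design matrix as linear_compute
w_pinv = np.linalg.pinv( X_chk ) @ y_chk                         # [ slope , intercept ]
w_lstsq , *_ = np.linalg.lstsq( X_chk , y_chk , rcond = None )

print( np.allclose( w_pinv , w_lstsq ) )                         # expected : True
```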
4. Compute the `Training Error` and `Testing Error`
```python=
def error_func( pred , test ):
    num = np.prod ( pred.shape)                        # number of samples
    error = np.linalg.norm( pred - test ) **2 / num    # mean squared error
    return error
```
- plug W<sub>linear</sub> together with `x_train` or `x_test` into the fitting function to get `y_pred`, then pass it to this function together with `y_train` or `y_test`
- subtract the two, take the `2-norm`, square it, and divide by the number of samples
- the result is the `training error` or `testing error` (see the equivalence check below)
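As a quick check (a sketch with made-up numbers), this is exactly the mean squared error, so it matches `np.mean((pred - test) ** 2)`:
```python=
# Sketch: error_func is the mean squared error.
pred_demo = np.array([ 1.0 , 2.0 , 3.0 ])
test_demo = np.array([ 1.5 , 2.0 , 2.0 ])

mse_norm = np.linalg.norm( pred_demo - test_demo ) ** 2 / pred_demo.size   # as in error_func
mse_mean = np.mean(( pred_demo - test_demo ) ** 2 )                        # equivalent form

print( mse_norm , mse_mean )   # both 0.4166...
```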
5. Compute the `cross-validation error` for both `five-fold` and `leave-one-out`
```python=
def kfold_split( a , begin = 15 , percent = 0.25 ):
    arr = np.array([])
    k = math.floor(a.size * percent)        # size of one fold
    test = a[ begin : begin + k ]           # the current fold is the test set
    if( begin ):
        arr = a[ : begin ]                  # points before the fold
    arr2 = a[begin + k : ]                  # points after the fold
    train = np.append(arr,arr2)             # everything outside the fold is training data
    return train , test

def kfold( x , y , k ):
    begin = 0
    terr = 0
    size = x.size
    for _ in range( k ):
        x_train,x_test = kfold_split( x , begin , 1/k )
        y_train,y_test = kfold_split( y , begin , 1/k )
        wlin = linear_compute( x_train , y_train )
        pred = wlin[0] * x_test + wlin[1]
        err = error_func( pred , y_test )
        begin += math.floor( size * 1/k )   # move to the next fold
        terr += err
    return terr/k                           # average error over the k folds
```
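To make the fold layout concrete, here is a small sketch (assuming the helpers above are in scope) of which points `kfold_split` holds out for five-fold CV; with `k = 20` each fold shrinks to a single point, which gives leave-one-out.
```python=
# Sketch: fold boundaries produced by kfold_split on 20 points with k = 5.
a_demo = np.arange( 20 )
for begin in ( 0 , 4 , 8 , 12 , 16 ):         # the five offsets used by kfold
    train_demo , test_demo = kfold_split( a_demo , begin , 1/5 )
    print( begin , test_demo )                # e.g. 0 [0 1 2 3], 4 [4 5 6 7], ...
```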
#### ( a ) Code
```python=
x , y = generate()
x_train,x_test = splitdata( x )
y_train,y_test = splitdata( y )
wlin = linear_compute(x_train,y_train)
print("-----Part A-----")
print("-----Training Error-----")
pred = wlin[0] * x_train + wlin[1]
print("Training Error : " + str(error_func( pred, y_train )))
print("-----Leave-one-out-----")
print("Cross Validation Error : " + str(kfold(x,y,20)))
print("-----Five-Fold-----")
print("Cross Validation Error : " + str(kfold(x,y,5)))
print("-----Testing Error-----")
test_pred = wlin[0] * x_test + wlin[1]
print("Testing Error : " + str(error_func( test_pred, y_test )))
plt.xlabel("X")
plt.ylabel("Y")
plt.scatter( x_train , y_train, label = "TrainData")
plt.scatter( x_test , y_test, label= "TestData" )
plt.title("Part A : Fitting Plot")
y_pred = wlin[0] * x + wlin[1]
y_pred = y_pred.reshape(20,1)
plt.plot( x , y_pred , label="Fitting Function")
plt.legend()
plt.show()
```
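Since the data come from y = 2x + ε, a quick check (a sketch using the `wlin` computed above) is that the fitted slope should land near 2 and the intercept near 0:
```python=
# Quick check (sketch): the fitted parameters should be close to the true model y = 2x.
slope , intercept = wlin.flatten()
print( "slope     :" , slope )        # expected to be close to 2
print( "intercept :" , intercept )    # expected to be close to 0
```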
#### Results

| Training Error | Testing Error | Cross-Validation-Error( leave-one-out ) | Cross-Validation-Error( five-fold )|
| ---- | ---- | ---- | ---- |
| 1.3197551637541514 | 3.3874342169640492 | 1.9077633587725926| 1.8076079235549933 |
#### Discussion
Q : Does the way the data are split into training and testing sets affect the error?
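One way to probe this is to shuffle the points before calling `splitdata`, instead of always holding out the first 25% (the five most negative `x` values). A minimal sketch, assuming the helpers defined above (the names `idx`, `x_sh`, `x_tr`, etc. are introduced only for this illustration):
```python=
# Sketch: random split instead of the fixed "first 25% as test" split.
idx = np.random.permutation( x.size )     # shuffle the 20 indices
x_sh , y_sh = x[idx] , y[idx]

x_tr , x_te = splitdata( x_sh )
y_tr , y_te = splitdata( y_sh )

w = linear_compute( x_tr , y_tr )
print( "Testing Error (random split) :" , error_func( w[0] * x_te + w[1] , y_te ) )
```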
### ( b ) Perform polynomial regression with degree 5, 10 and 14, respectively.</br> For each case, show the fitting plots of the training error, cross-validation errors (both leave-one-out and five-fold) and testing errors.
---
#### Procedure
This part differs from ( a ) in the fitting function and the design matrix `X`; only the parts that differ from the previous part are listed here.
```python=
def poly_cal( x , n ):
    # Build the N x (n+1) design matrix X for a degree-n polynomial.
    # Columns are ordered from the highest power down to x^0, so the fitted
    # coefficients can be passed directly to np.poly1d.
    x3 = np.ones( x.shape[0] )                # x^0 column
    for i in range( 1 , n + 1 ):
        x3 = np.column_stack(( x**i , x3 ))   # prepend the x^i column
    return x3
```
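The `poly_compute` and the degree-aware `kfold` used below are not listed, since only the changed parts are shown. A plausible minimal sketch of both, assuming `poly_compute` mirrors `linear_compute` with the polynomial design matrix (the author's actual versions may differ):
```python=
# Sketch (assumption): pseudo-inverse solution on the degree-n design matrix,
# analogous to linear_compute in part (a).
def poly_compute( x , y ):
    y = y[ : , np.newaxis ]
    return np.linalg.pinv( x ) @ y            # coefficients, highest power first

# Sketch (assumption): kfold extended with a degree argument so each fold is
# fitted with polynomial regression instead of the linear model.
def kfold( x , y , k , n ):
    begin , terr = 0 , 0
    for _ in range( k ):
        x_train , x_test = kfold_split( x , begin , 1/k )
        y_train , y_test = kfold_split( y , begin , 1/k )
        wlin = poly_compute( poly_cal( x_train , n ) , y_train )
        pred = np.poly1d( wlin.flatten() )( x_test )
        terr += error_func( pred , y_test )
        begin += math.floor( x.size * 1/k )
    return terr / k
```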
#### ( b ) Code
```python=
def poly_regression(x,y,n):
    x_train,x_test = splitdata( x )   # preserve x_train for plot
    y_train,y_test = splitdata( y )
    x_train_2 = poly_cal(x_train,n)
    wlin = poly_compute(x_train_2,y_train)
    print("-----Degree %d polynomial regression-----" %n)
    print("-----Training Error-----")
    p = np.poly1d(wlin.flatten())
    test_pred = p(x_test)
    y_pred = p(x)
    pred = p(x_train)
    print("Training Error : " + str(error_func( pred, y_train )))
    print("-----Leave-one-out-----")
    print("Cross Validation Error : " + str(kfold(x,y,20,n)))
    print("-----Five-Fold-----")
    print("Cross Validation Error : " + str(kfold(x,y,5,n)))
    print("-----Testing Error-----")
    print("Testing Error : " + str(error_func( test_pred, y_test )))
    plt.xlabel("X")
    plt.ylabel("Y")
    plt.scatter( x_train , y_train, label = "TrainData")
    plt.scatter( x_test , y_test, label= "TestData" )
    plt.title("Part B : Degree %d Fitting Plot" %n)
    plt.plot( x , y_pred , label="Fitting Function")
    plt.legend()
    plt.show()
    return

print("-----Part B-----")
x , y = generate()
poly_regression(x,y,5)
poly_regression(x,y,10)
poly_regression(x,y,14)
```
- the `shape` of `X` is `N*(d+1)` (`d = 5, 10, 14`; `N` is the total number of samples)
- row 1, from left to right, is x<sub>1</sub><sup>d</sup> , x<sub>1</sub><sup>d-1</sup> , .... , x<sub>1</sub><sup>0</sup>
- row 2, from left to right, is x<sub>2</sub><sup>d</sup> , x<sub>2</sub><sup>d-1</sup> , .... , x<sub>2</sub><sup>0</sup>, and so on...
- the final `x3` is the desired `X` (a small ordering check follows below)
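A tiny check of this ordering (a sketch, using the `poly_cal` above on two hand-picked points):
```python=
# Sketch: verify the column ordering of poly_cal on a tiny example.
x_small = np.array([ 2.0 , 3.0 ])
print( poly_cal( x_small , 2 ) )
# expected :
# [[ 4. 2. 1.]    -> x^2 , x^1 , x^0 for x = 2
#  [ 9. 3. 1.]]   -> x^2 , x^1 , x^0 for x = 3
```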
#### Results



| Degree | Training Error | Testing Error | Cross-Validation-Error( leave-one-out ) | Cross-Validation-Error( five-fold )|
|----| ---- | ---- | ---- | ---- |
| 5 | 0.6944558047821588 | 2464.8382865417984 | 1.9578348576088629| 81.18661465708635 |
| 10 | 0.11479188699682578 | 264262952.736448 | 305.6685006156316| 1851569.011690172 |
| 14 | 6.06127400457753e-19 | 217823218344795.53 | 8596.99446149074| 16041214609.370647 |
#### Discussion
Compared with `A`, the dimension of the generated `X` is much larger. The table above shows that as the degree increases the training error keeps shrinking, and the fitted regression follows the data points more and more closely, while the testing and cross-validation errors grow dramatically.
### ( c ) Generate data using **y = sin(2πx) + ε** with the noise ε ∼ N(0, 0.04) and (equal spacing) x ∈ [0, 1].</br></br> Show the fitting plots of the training error, cross-validation errors for both leave-one-out and five-fold, and testing errors via polynomial regression with degree **5**, **10** and **14**.
---
This part differs from ( b ) only in the data-generating condition: **y = sin(2πx) + ε** with noise ε ∼ N(0, 0.04) and equally spaced x ∈ [0, 1]. The differing part is shown below:
```python=
def generate():
    # NOTE: np.random.normal takes the standard deviation as its scale argument;
    # if N(0, 0.04) specifies the variance, the scale here would be sqrt(0.04) = 0.2.
    epsilon = np.random.normal( 0 , 0.04 , 20 )
    x = np.linspace( 0 , 1 , 20 )             # 20 equally spaced points in [0, 1]
    y = np.sin( 2 * x * np.pi ) + epsilon
    return x , y
```
#### Results



| Degree | Training Error | Testing Error | Cross-Validation-Error( leave-one-out ) | Cross-Validation-Error( five-fold )|
|----| ---- | ---- | ---- | ---- |
| 5 | 0.0002208547075204422 | 0.1727247519463676 | 0.0006647943292551545| 0.012829142575124467 |
| 10 | 0.00015291210060351483 | 7508.125578429443 | 0.04077242002862919| 512.9872962501589 |
| 14 | 1.587772072270109e-07 | 44518516510.71311 | 9.387818005589647| 261087699.30254954 |
#### Discussion
Here, too, the training error decreases as the degree increases, while the testing and cross-validation errors again grow rapidly at the higher degrees.
### ( d ) Consider the model in ( b ) with degree **14** via varying the number of training data points m, say, m = 60, 160, 320.</br>Show the five-fold cross-validation errors, testing error and the fitting plots with 75% for training and 25% for testing.
---
This part differs from ( b ) in the number of training data points; here `m = 60 , 160 , 320` are evaluated separately.
#### ( d ) Code
```python=
def poly_regression(point):
    n = 14
    epsilon = np.random.normal( 0 , 1 , point )
    x = np.linspace( -3 , 3 , point )
    y = 2 * x + epsilon
    x_train,x_test = splitdata( x )
    y_train,y_test = splitdata( y )
    x_train_2 = poly_cal(x_train,n)
    wlin = poly_compute(x_train_2,y_train)
    p = np.poly1d(wlin.flatten())
    test_pred = p(x_test)
    y_pred = p(x)
    '''print("-----Training Error-----")
    pred = fit_func( wlin , x_train_2 )
    print("Training Error : " + str(error_func( pred, y_train )))
    print("-----Leave-one-out-----")
    print("Cross Validation Error : " + str(kfold(x,y,20,n)))'''
    print("----- %d Training Data-----" %point)
    print("-----Five-Fold-----")
    print("Cross Validation Error : " + str(kfold(x,y,5,n)))
    print("-----Testing Error-----")
    print("Testing Error : " + str(error_func( test_pred, y_test )))
    plt.xlabel("X")
    plt.ylabel("Y")
    plt.scatter( x_train , y_train, label = "TrainData")
    plt.scatter( x_test , y_test, label= "TestData" )
    plt.title("Part D : Degree 14 Fitting Plot with %d data points" %point)
    plt.plot( x , y_pred , label="Fitting Function")
    plt.legend()
    plt.show()
    return

print("-----Part D-----")
poly_regression(60)
poly_regression(160)
poly_regression(320)
```
#### Results



| Data points | Testing Error | Cross-Validation-Error( five-fold )|
|----| ---- | ---- |
| 60 | 278719956.54025924 | 197398876.5216198|
| 160 | 7351466493.240173 | 11407955.479479197|
| 320 | 4622907809.147924 | 235222027.51230615|
#### Discussion
The testing error remains extremely large and, overall, increases as the number of data points grows.
### ( e ) Consider again the model in (b) with degree 14 via **regularization**:<br/><br/>Compare the results derived by setting λ = 0, 0.001/m , 1/m, 1000/m , where m = 20 is the number of data points (with x = 0, 1/(m−1) , 2/(m−1), . . . , 1). Show the five-fold cross-validation errors, testing errors and the fitting plots with regularization using the following equation:</br>
W<sub>reg</sub> = ( X<sup>T</sup>X + λI )<sup>-1</sup>X<sup>T</sup>y
---
The biggest difference from the previous parts is how `W` is computed; four different values of `λ` are compared. The differing parts are shown below:
```python=
def poly_compute( x , y , lam ):
    # closed-form regularized (ridge) solution: W = (X^T X + λI)^-1 X^T y
    wlin = np.linalg.inv( x.T @ x + lam * np.identity( x.shape[1] ) ) @ x.T @ y
    return wlin
```
- this computes `W` with the regularized closed-form solution (derived just below)
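For reference, the regularized objective this solves and its closed-form minimizer (a standard derivation, matching the code above):
```latex
% Ridge (regularized least squares) objective and its minimizer.
E(\mathbf{w}) = \lVert X\mathbf{w} - \mathbf{y} \rVert^{2} + \lambda \lVert \mathbf{w} \rVert^{2}
\;\Longrightarrow\;
\nabla_{\mathbf{w}} E = 2X^{\top}(X\mathbf{w} - \mathbf{y}) + 2\lambda\mathbf{w} = 0
\;\Longrightarrow\;
\mathbf{w}_{\mathrm{reg}} = (X^{\top}X + \lambda I)^{-1}X^{\top}\mathbf{y}
```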
#### ( e ) Code
```python=
def poly_regression(x , y , lam):
    n = 14
    m = 20
    x_train , x_test = splitdata( x )
    y_train , y_test = splitdata( y )
    x_train_2 = poly_cal( x_train , n )
    wlin = poly_compute( x_train_2, y_train , lam )
    print("-----λ = %f Degree %d polynomial regression-----" %( lam , n ))
    p = np.poly1d(wlin.flatten())
    test_pred = p(x_test)
    y_pred = p(x)
    '''print("-----Training Error-----")
    pred = fit_func( wlin , x_train_2 )
    print("Training Error : " + str(error_func( pred, y_train )))
    print("-----Leave-one-out-----")
    print("Cross Validation Error : " + str(kfold(x,y,20,n,lam)))'''
    print("-----Five-Fold-----")
    print("Cross Validation Error : " + str( kfold ( x , y , 5 , n , lam )))
    print("-----Testing Error-----")
    print("Testing Error : " + str(error_func( test_pred , y_test )))
    print("")
    plt.xlabel("X")
    plt.ylabel("Y")
    plt.scatter( x_train , y_train, label = "TrainData")
    plt.scatter( x_test , y_test, label= "TestData" )
    plt.title("Part E : λ = %f Degree %d Fitting Plot" %( lam , n ))
    y_pred = poly_cal( x , n )
    y_pred = y_pred @ wlin
    y_pred = y_pred.reshape( m , 1 )
    plt.plot( x , y_pred , label="Fitting Function")
    plt.legend()
    plt.show()
    return

print("-----Part E-----")
m = 20
x , y = generate( m )
poly_regression( x , y , 0 )          # lambda : 0
poly_regression( x , y , 0.001 / m )  # lambda : 0.001/m
poly_regression( x , y , 1/m )        # lambda : 1/m
poly_regression( x , y , 1000/m )     # lambda : 1000/m
```
#### Results




`m = 20`
| λ | Testing Error | Cross-Validation-Error( five-fold )|
|----| ---- | ---- |
| 0 | 37523459.38431137 | 1372213865.0117557|
| 0.001/m | 8.185531781011145 | 20.43477279690964|
| 1/m | 0.9067399188977785 | 1.621398010236738|
| 1000/m | 0.8900933833961424 | 1.4656922795408458|
#### Discussion
As `λ` increases, the fitting function follows the `training data` less closely, while the `testing error` and `cross-validation error` decrease accordingly.