## Linear Regression
----
Examples:
$f$(drug name) ⮕ dosage
$f$(5090) ⮕ compute power
---
### Pokémon CP Prediction
$f$() ⮕ CP after evolution
$\{x_{cp}, x_s, x_{hp}\}$ ⮕ $y$
---
### Step 1
----
Set up the __model (a set of functions)__
__Define the model__
Assume $y = b + w \cdot x_{cp}$ $\Rightarrow$ This is a __Linear Model__
__w__ and __b__ are parameters $\Rightarrow$ they can take any value
$f_1$: $y = 10.0 + 9.0 \cdot x_{cp}$
$f_2$: $y = -3.1 + 8.9 \cdot x_{cp}$
$f_3$: $y = -0.8 - 0.3 \cdot x_{cp}$
----
### Linear Model
Any linear model can be written in the general form $y = b + \sum\limits_i w_i x_i$
$x_i$: an attribute of the input $\Rightarrow$ __feature__
$w_i$: weight, $b$: bias
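
As a quick sketch (the feature values and weights here are made up for illustration), the general form is just a dot product plus a bias:
```python=
import numpy as np

# Illustrative only: three features (x_cp, x_s, x_hp) with made-up weights
x = np.array([176.0, 1.0, 35.0])  # feature vector
w = np.array([9.0, 0.5, 0.1])     # one weight per feature
b = 10.0                          # bias

y = b + np.dot(w, x)  # y = b + sum_i(w_i * x_i)
print(y)              # 1598.0
```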
---
### Step 2
----
### Review
A set of Functions (Model)
$\downarrow$
Goodness of $f()$
$\uparrow$
Training Data
Step 2 is to define a measure for the goodness of $f()$
__Define the objective function and train__
----
### Training Data
How do we denote the inputs and outputs?
function input: $x^n$
function output: $\hat{y}^n$
$\hat{y}$ denotes the correct output
----
### Loss function _L_
A function that measures the difference between the predicted output and the actual output.
Given $m$ data points, the loss function is:
$L(f) = L(w,b) = \frac{1}{m}\sum\limits_{n = 1}^{m}\left(\hat{y}^n - (b + w \cdot x^{n}_{cp})\right)^2$
$\hat{y}^n$ is the actual output
$b + w \cdot x^{n}_{cp}$ is the predicted output
For each data point, take the squared error between the two; summing these over all the data (and averaging) gives the value of the loss function.
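
A minimal sketch of this loss in code (assuming `x_data` holds the $x^{n}_{cp}$ values and `y_data` the corresponding $\hat{y}^n$):
```python=
def loss(w, b, x_data, y_data):
    """Mean squared error between the predictions b + w*x and the true outputs."""
    m = len(x_data)
    return sum((y_data[n] - (b + w * x_data[n]))**2 for n in range(m)) / m
```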
----
### Training Data Example (Row[2] & Row[14])
```csvpreview=
name,species,cp,hp,weight,height,power_up_stardust,power_up_candy,attack_weak,attack_weak_type,attack_weak_value,attack_strong,attack_strong_type,attack_strong_value,cp_new,hp_new,weight_new,height_new,power_up_stardust_new,power_up_candy_new,attack_weak_new,attack_weak_type_new,attack_weak_value_new,attack_strong_new,attack_strong_type_new,attack_strong_value_new,notes,
Pidgey20,Pidgey,176,35,1.81,0.29,1000,1,Tackle,Normal,12,Air Cutter,Flying,30,327,54,3.46,1.07,1000,1,Steel Wing,Steel,15,Twister,Dragon,25,
Pidgey21,Pidgey,97,30,1.95,0.31,600,1,Tackle,Normal,12,Twister,Dragon,25,181,44,1.79,1.15,600,1,Steel Wing,Steel,15,Aerial Ace,Flying,30,
Pidgey22,Pidgey,74,24,0.78,0.26,600,1,Tackle,Normal,12,Twister,Dragon,25,141,38,30,0.95,600,1,Steel Wing,Steel,15,Twister,Dragon,25,
Pidgey23,Pidgey,127,32,2.33,0.34,800,1,Tackle,Normal,12,Air Cutter,Flying,30,241,49,3.4,1.23,800,1,Steel Wing,Steel,15,Air Cutter,Flying,30,
Pidgey24,Pidgey,78,26,2.11,0.32,600,1,Tackle,Normal,12,Twister,Dragon,25,146,39,3.16,1.17,600,1,Steel Wing,Steel,15,Aerial Ace,Flying,30,
Pidgey25,Pidgey,240,41,2.53,0.33,1600,2,Tackle,Normal,12,Aerial Ace,Flying,30,448,64,8.21,1.21,1600,2,Wing Attack,Flying,9,Air Cutter,Flying,30,
Pidgey26,Pidgey,276,48,0.87,0.24,2200,2,Tackle,Normal,12,Aerial Ace,Flying,30,530,74,30,0.86,2200,2,Steel Wing,Steel,15,Aerial Ace,Flying,30,
Pidgey27,Pidgey,207,41,1.19,0.26,1600,2,Quick Attack,Normal,10,Air Cutter,Flying,30,393,64,30,0.96,1600,2,Steel Wing,Steel,15,Air Cutter,Flying,30,
Pidgey28,Pidgey,176,35,1.68,0.31,1300,2,Tackle,Normal,12,Air Cutter,Flying,30,335,55,30,1.13,1300,2,Steel Wing,Steel,15,Aerial Ace,Flying,30,
Pidgey29,Pidgey,316,47,1.71,0.27,2500,2,Quick Attack,Normal,10,Aerial Ace,Flying,30,594,74,5.71,0.99,2500,2,Wing Attack,Flying,9,Twister,Dragon,25,
Pidgey30,Pidgey,305,53,2.04,0.32,2200,2,Quick Attack,Normal,10,Twister,Dragon,25,567,79,1.95,1.17,2200,2,Steel Wing,Steel,15,Aerial Ace,Flying,30,
Pidgey31,Pidgey,304,46,1.99,0.31,2500,2,Tackle,Normal,12,Air Cutter,Flying,30,579,73,3.88,1.12,2500,2,Steel Wing,Steel,15,Twister,Dragon,25,
Pidgey32,Pidgey,242,43,2.09,0.34,1900,2,Quick Attack,Normal,10,Twister,Dragon,25,459,67,30,1.26,1900,2,Steel Wing,Steel,15,Air Cutter,Flying,30,
Pidgey33,Pidgey,250,42,0.95,0.26,1900,2,Quick Attack,Normal,10,Twister,Dragon,25,471,66,30,0.95,1900,2,Wing Attack,Flying,9,Twister,Dragon,25,
Pidgey34,Pidgey,226,40,2.04,0.31,1600,2,Tackle,Normal,12,Aerial Ace,Flying,30,428,63,4.76,1.12,1600,2,Steel Wing,Steel,15,Aerial Ace,Flying,30,
Pidgey35,Pidgey,220,43,2.13,0.3,1600,2,Tackle,Normal,12,Aerial Ace,Flying,30,418,66,7.41,1.1,1600,2,Steel Wing,Steel,15,Twister,Dragon,25,
Pidgey36,Pidgey,185,40,1.63,0.29,1300,2,Tackle,Normal,12,Twister,Dragon,25,354,62,0.53,1.07,1300,2,Steel Wing,Steel,15,Aerial Ace,Flying,30,
Pidgey37,Pidgey,42,20,2.11,0.33,400,1,Quick Attack,Normal,10,Aerial Ace,Flying,30,80,30,1.02,1.21,400,1,Wing Attack,Flying,9,Aerial Ace,Flying,30,
Pidgey38,Pidgey,344,56,2.57,0.35,3000,3,Tackle,Normal,12,Aerial Ace,Flying,30,647,84,4.16,1.29,3000,3,Steel Wing,Steel,15,Twister,Dragon,25,
Pidgey39,Pidgey,108,31,1.19,0.25,800,1,Tackle,Normal,12,Air Cutter,Flying,30,205,47,0.54,0.91,800,1,Wing Attack,Flying,9,Air Cutter,Flying,30,
```
----
### Generating a Scatter Plot
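A minimal plotting sketch, assuming the training CSV above is saved locally (the filename here is hypothetical), reading the cp and cp_new columns named in the slide title:
```python=
import csv
import matplotlib.pyplot as plt

x_data, y_data = [], []
with open('train.csv', 'r') as file:  # hypothetical path to the training CSV
    reader = csv.reader(file)
    next(reader)                      # skip the header row
    for row in reader:
        x_data.append(float(row[2]))   # cp before evolution
        y_data.append(float(row[14]))  # cp_new after evolution

plt.scatter(x_data, y_data)
plt.xlabel('cp')
plt.ylabel('cp_new')
plt.show()
```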

----
### Implementation (reusing the data above)
```python=
import numpy as np
import matplotlib.pyplot as plt

# x_data, y_data: the cp / cp_new columns loaded from the training CSV
x = np.arange(-200, -100, 1)  # bias candidates
y = np.arange(-5, 5, 0.1)     # weight candidates
Z = np.zeros((len(x), len(y)))
for i in range(len(x)):
    for j in range(len(y)):
        b = x[i]
        w = y[j]
        for n in range(len(x_data)):
            Z[i][j] = Z[i][j] + (y_data[n] - b - w * x_data[n])**2  # loss function
        Z[i][j] = Z[i][j] / len(x_data)  # average loss
plt.imshow(Z, extent=[-5, 5, -200, -100], aspect='auto', origin='lower')
plt.colorbar(label='Loss')
plt.xlabel('Weight')
plt.ylabel('Bias')
plt.title('Loss Function')
plt.show()
```
----
$L(w,b)$
The more purple the region, the smaller $L(w,b)$ $\Rightarrow$ better parameters

---
### Step 3
----
__Find the best function__
A set of Functions (Model)
$\downarrow$
Goodness of $f()$ $\rightarrow$ Best Function
$\uparrow$
Training Data
----
$f^* = \arg\min\limits_f L(f)$
$w^*, b^* = \arg\min\limits_{w,b} L(w,b)$
$L(w,b) = \sum\limits_{n = 1}^{10}\left(\hat{y}^n - (b + w \cdot x^{n}_{cp})\right)^2$
$w^*, b^* = \arg\min\limits_{w,b} \sum\limits_{n = 1}^{10}\left(\hat{y}^n - (b + w \cdot x^{n}_{cp})\right)^2$
----
$w^*, b^* = \arg\min\limits_{w,b} \sum\limits_{n = 1}^{10}\left(\hat{y}^n - (b + w \cdot x^{n}_{cp})\right)^2$
This yields the function with the lowest $L(w,b)$ on the training data
$\Downarrow$
__Best function__
---
### Gradient Descent
----
Use this optimization algorithm to find a local minimum of the function
$w^*, b^* = \arg\min\limits_{w,b} \sum\limits_{n = 1}^{10}\left(\hat{y}^n - (b + w \cdot x^{n}_{cp})\right)^2$
----
### Simplification
Consider a loss $L(w)$ with only __one__ parameter
$w^* = \arg\min\limits_w L(w)$

----
### The Inefficient Approach
Exhaustively enumerate $w$ and pick the value that minimizes $L(w)$
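
A sketch of this brute-force search (reusing the `loss` helper sketched earlier, with $b$ fixed at 0 for simplicity):
```python=
import numpy as np

# Evaluate the loss at every w on a fine grid and keep the best one
candidates = np.arange(-5, 5, 0.001)
best_w = min(candidates, key=lambda w: loss(w, 0.0, x_data, y_data))
```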
----
### The Efficient Approach
Pick a random starting point $w^0$
Compute $\left.\frac{dL}{dw}\right|_{w=w^0}$

----
Not comfortable with derivatives? Just think of it as the slope:
if slope < 0 $\rightarrow$ increase $w$
if slope > 0 $\rightarrow$ decrease $w$

----
### Updating $w$
$w^1 = w^0 - \eta\left.\frac{dL}{dw}\right|_{w=w^0}$
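
A one-parameter sketch of this update rule (the derivative is for the squared-error loss with $b$ omitted; the starting point and $\eta$ are hand-picked for illustration):
```python=
w = 0.0      # starting point w^0
eta = 1e-6   # learning rate, hand-picked for this sketch
for step in range(1000):
    # dL/dw for L(w) = (1/m) * sum_n (y^n - w*x^n)^2
    grad = sum(-2 * x * (y - w * x) for x, y in zip(x_data, y_data)) / len(x_data)
    w = w - eta * grad  # w^{t+1} = w^t - eta * dL/dw
```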
----
### What Drives the Update of $w$
$\eta$ & $\frac{dL}{dw}$
$\eta$: learning rate (a fixed constant)
$\frac{dL}{dw}$: the derivative
----
### Learning Rate
The larger $\eta$ is, the larger each update step
A larger learning rate means faster learning
----
### Local Minimum
After many iterations, we may land at a local minimum
At that point the derivative is 0 $\Rightarrow$ the parameters can no longer update
$w^t = w^{t-1} - \eta \cdot 0 = w^{t-1}$

Because of this, gradient descent is only guaranteed to find a local minimum, not the global minimum
----
### Why Is That Not a Problem Here?
The loss of a linear model is convex, so its only local minimum is also the global minimum
----
Visualize:
https://uclaacm.github.io/gradient-descent-visualiser/#playground
---
### Gradient Descent with Two Parameters
----
$w^*, b^* = \arg\min\limits_{w,b} L(w,b)$
----
Pick random initial values $w^0, b^0$
Compute:
$\left.\frac{\partial L}{\partial w}\right|_{w=w^0, b=b^0}$
$\left.\frac{\partial L}{\partial b}\right|_{w=w^0, b=b^0}$
$\Downarrow$
$w^1 = w^0 - \eta\left.\frac{\partial L}{\partial w}\right|_{w=w^0, b=b^0}$
$b^1 = b^0 - \eta\left.\frac{\partial L}{\partial b}\right|_{w=w^0, b=b^0}$
----
### Implementation (same data as before)
```python=
b = -120  # initial b
w = -4    # initial w
lr = 1    # learning rate
iteration = 100000  # number of iterations
b_lr = 0.0  # accumulated squared gradient for b
w_lr = 0.0  # accumulated squared gradient for w
b_history = [b]
w_history = [w]
for i in range(iteration):
    b_grad = 0.0
    w_grad = 0.0
    for n in range(len(x_data)):
        b_grad = b_grad - 2.0 * (y_data[n] - b - w * x_data[n]) * 1.0
        w_grad = w_grad - 2.0 * (y_data[n] - b - w * x_data[n]) * x_data[n]
    b_lr = b_lr + b_grad**2
    w_lr = w_lr + w_grad**2
    # scale each step by the root of the accumulated squared gradients
    b = b - lr / np.sqrt(b_lr) * b_grad
    w = w - lr / np.sqrt(w_lr) * w_grad
    b_history.append(b)
    w_history.append(w)
```
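Note: dividing each step by the square root of the accumulated squared gradients (`b_lr`, `w_lr`) is the Adagrad adaptive learning-rate scheme; the effective step size shrinks over time, which is why a base `lr` as large as 1 still converges here.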
----

min $b$: -0.15537159439738932
min $w$: 1.8891815145998943
----
### Learning Rate Impact
A larger learning rate $\rightarrow$ faster learning
__But__ slightly less precise
What about lowering lr? $\rightarrow$ learning becomes slower

Why? $\because$ not enough iterations
$\Rightarrow$ raise the iteration count $\rightarrow$ takes more time
----

We can see that the values of $b^*, w^*$ are now more accurate
min $b$: -7.031236207033889
min $w$: 1.9181764491678417
---
### Back to Pokémon CP Prediction
----
Substitute the $b^*, w^*$ found by gradient descent into the original model
$b^*$ = -7.031236207033889
$w^*$ = 1.9181764491678417
Model: $y = b + w \cdot x_{cp}$
----
```python=
import matplotlib.pyplot as plt
import numpy as np
import csv

x_data = []
y_data = []
with open(r'C:\Users\Ryan\Documents\ML\Regression\Dataset-LR\test.csv', 'r') as file:
    reader = csv.reader(file)
    next(reader)  # skip the header row
    for row in reader:
        x_data.append(float(row[0]))
        y_data.append(float(row[1]))
b = -7.031236207033889
w = 1.9181764491678417
x = np.linspace(0, 700)  # cp range for the fitted line
y = b + w * x
plt.scatter(x_data, y_data)
plt.plot(x, y)
plt.show()
```
----

----
### Average error on training data
4.503145265728941
```python=
differences = [abs(y - (b + w * x)) for x, y in zip(x_data, y_data)]
average_difference = np.mean(differences)
print("Average error on training data:", average_difference)
```
----

----
### Testing Data
Training data is used to train the model
Testing data is used to evaluate the model
School analogy: the training data is the teacher, the test data is the exam
----
### Test Data Example (Row[2] & Row[14])
```csvpreview=
name,species,cp,hp,weight,height,power_up_stardust,power_up_candy,attack_weak,attack_weak_type,attack_weak_value,attack_strong,attack_strong_type,attack_strong_value,cp_new,hp_new,weight_new,height_new,power_up_stardust_new,power_up_candy_new,attack_weak_new,attack_weak_type_new,attack_weak_value_new,attack_strong_new,attack_strong_type_new,attack_strong_value_new,notes
Pidgey1,Pidgey,384,56,2.31,0.34,2500,2,Tackle,Normal,12,Aerial Ace,Flying,30,694,84,2.6,1.24,2500,2,Steel Wing,Steel,15,Air Cutter,Flying,30,
Pidgey2,Pidgey,366,54,1.67,0.29,2500,2,Quick Attack,Normal,10,Twister,Dragon,25,669,81,1.93,1.05,2500,2,Wing Attack,Flying,9,Air Cutter,Flying,30,
Pidgey3,Pidgey,353,55,1.94,0.3,3000,3,Quick Attack,Normal,10,Aerial Ace,Flying,30,659,83,3.51,1.11,3000,3,Wing Attack,Flying,9,Air Cutter,Flying,30,
Pidgey4,Pidgey,338,51,1.73,0.31,3000,3,Tackle,Normal,12,Air Cutter,Flying,30,640,79,30,1.12,3000,3,Steel Wing,Steel,15,Air Cutter,Flying,30,
Pidgey5,Pidgey,242,45,1.44,0.27,1900,2,Quick Attack,Normal,10,Air Cutter,Flying,30,457,69,1.42,0.98,1900,2,Wing Attack,Flying,9,Twister,Dragon,25,
Pidgey6,Pidgey,129,35,2.07,0.35,800,1,Quick Attack,Normal,10,Air Cutter,Flying,30,243,52,30,1.27,800,1,Wing Attack,Flying,9,Aerial Ace,Flying,30,
Pidgey7,Pidgey,10,10,0.92,0.25,200,1,Tackle,Normal,12,Air Cutter,Flying,30,15,13,30,0.9,200,1,Wing Attack,Flying,9,Air Cutter,Flying,30,
Pidgey8,Pidgey,25,14,2.72,0.37,200,1,Tackle,Normal,12,Twister,Dragon,25,47,21,2.63,1.35,200,1,Steel Wing,Steel,15,Air Cutter,Flying,30,
Pidgey9,Pidgey,24,13,2.07,0.32,200,1,Quick Attack,Normal,10,Twister,Dragon,25,47,21,3.27,1.16,200,1,Wing Attack,Flying,9,Twister,Dragon,25,
Pidgey10,Pidgey,161,35,1.45,0.31,1000,1,Tackle,Normal,12,Twister,Dragon,25,305,54,30,1.14,1000,1,Steel Wing,Steel,15,Aerial Ace,Flying,30,
Pidgey11,Pidgey,114,31,1.58,0.26,800,1,Tackle,Normal,12,Aerial Ace,Flying,30,213,47,4.65,0.96,800,1,Wing Attack,Flying,9,Aerial Ace,Flying,30,
Pidgey12,Pidgey,333,52,1.85,0.3,3000,3,Tackle,Normal,12,Air Cutter,Flying,30,633,80,2.54,1.1,3000,3,Steel Wing,Steel,15,Aerial Ace,Flying,30,
Pidgey13,Pidgey,132,33,1.63,0.28,800,1,Quick Attack,Normal,10,Aerial Ace,Flying,30,247,50,2.41,1.03,800,1,Wing Attack,Flying,9,Air Cutter,Flying,30,
Pidgey14,Pidgey,60,21,1.67,0.3,400,1,Quick Attack,Normal,10,Twister,Dragon,25,113,33,0.28,1.09,400,1,Steel Wing,Steel,15,Twister,Dragon,25,
Pidgey15,Pidgey,42,19,2.01,0.3,400,1,Quick Attack,Normal,10,Air Cutter,Flying,30,79,29,4.87,1.11,400,1,Wing Attack,Flying,9,Aerial Ace,Flying,30,
Pidgey16,Pidgey,91,29,2.68,0.35,600,1,Quick Attack,Normal,10,Twister,Dragon,25,173,44,5.42,1.3,600,1,Steel Wing,Steel,15,Air Cutter,Flying,30,
Pidgey17,Pidgey,139,34,1.76,0.31,1000,1,Tackle,Normal,12,Twister,Dragon,25,265,53,30,1.13,1000,1,Steel Wing,Steel,15,Aerial Ace,Flying,30,
Pidgey18,Pidgey,330,48,1.62,0.29,2500,2,Quick Attack,Normal,10,Air Cutter,Flying,30,624,75,0.02,1.08,2500,2,Wing Attack,Flying,9,Aerial Ace,Flying,30,
Pidgey19,Pidgey,328,48,1.62,0.3,2500,2,Quick Attack,Normal,10,Twister,Dragon,25,619,76,30,1.08,2500,2,Steel Wing,Steel,15,Aerial Ace,Flying,30,
```
----
### Average error on testing data
6.668680136084539

---
### A More Accurate Model?
----
### Quadratic Model
Model: $y = b + w_1 \cdot x_{cp} + w_2 \cdot (x_{cp})^2$
----
```python=
# Initialization (assumed here, mirroring the linear case; w2 starts at 0)
b, w1, w2 = -120, -4, 0.0
lr = 1
iteration = 100000
b_lr, w1_lr, w2_lr = 0.0, 0.0, 0.0
b_history, w1_history, w2_history = [b], [w1], [w2]
for i in range(iteration):
    b_grad = 0.0
    w1_grad = 0.0
    w2_grad = 0.0
    for n in range(len(x_data)):
        b_grad -= 2.0 * (y_data[n] - b - w1 * x_data[n] - w2 * x_data[n]**2) * 1.0
        w1_grad -= 2.0 * (y_data[n] - b - w1 * x_data[n] - w2 * x_data[n]**2) * x_data[n]
        w2_grad -= 2.0 * (y_data[n] - b - w1 * x_data[n] - w2 * x_data[n]**2) * x_data[n]**2
    b_lr += b_grad**2
    w1_lr += w1_grad**2
    w2_lr += w2_grad**2
    b -= lr / np.sqrt(b_lr) * b_grad
    w1 -= lr / np.sqrt(w1_lr) * w1_grad
    w2 -= lr / np.sqrt(w2_lr) * w2_grad
    b_history.append(b)
    w1_history.append(w1)
    w2_history.append(w2)
print('min b:', b_history[-1], 'min w1:', w1_history[-1], 'min w2:', w2_history[-1])
```
min $b$: -3.44768035432058
min $w_1$: 1.9399339047755344
min $w_2$: -0.00014155455728048015
----
### Average error on training data
2.776670843299643
```python=
differences = [abs(y - (b + w1 * x + w2 * x ** 2)) for x, y in zip(x_data, y_data)]
average_difference = np.mean(differences)
print("Average error on training data:", average_difference)
```

----
### How about testing data?
4.411362971079184

----
### What About Cubic (Quartic, Quintic) Models?
----
### Overfitting
With cubic, quartic, and quintic models, the testing error becomes large
This is overfitting
$\downarrow$
Good results on the training data
do not guarantee equally good results on the testing data
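
One quick way to see this (a sketch, assuming the train and test cp/cp_new columns are loaded as `x_data`/`y_data` and `x_test`/`y_test`; `np.polyfit` stands in for our gradient-descent fits):
```python=
import numpy as np

# Fit polynomials of increasing degree on the training data,
# then compare average error on training vs. testing data
for degree in range(1, 6):
    coeffs = np.polyfit(x_data, y_data, degree)
    train_err = np.mean(np.abs(np.array(y_data) - np.polyval(coeffs, x_data)))
    test_err = np.mean(np.abs(np.array(y_test) - np.polyval(coeffs, x_test)))
    print(degree, train_err, test_err)  # training error falls; testing error eventually rises
```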

----
### Conclusion
A more complex model is not necessarily better!
---
References
----
https://youtu.be/fegAeph9UaA
https://youtu.be/1UqCjFQiiy0
https://pse.is/5sfwgs
https://www.kaggle.com/datasets/mirajdeepbhandari/polynomial-regression
https://www.openintro.org/book/statdata/index.php?data=pokemon
{"description":"type: slide","title":"Linear Regression","contributors":"[{\"id\":\"d4f42d78-aa45-44e2-ade2-4d4e233d6166\",\"add\":26690,\"del\":11002,\"latestUpdatedAt\":null}]"}