March Themed Course, Lesson 1

## SCAICT March Course, Lesson 1

### Machine Learning Theory and Practice (I), aka Math Prep Class

Intro + Linear Regression

---

## Sign-in

![qrcode_docs.google.com](https://hackmd.io/_uploads/HJQRPNPCa.png)

---

### A self-introduction I wasn't sure I should include

<img style="float: left; height:300px; margin-left: 100px;" src="https://hackmd.io/_uploads/S14lQa9rA.jpg">
<ul style="margin-top: 30px">
<li>邱德原(De-Yuan, Chiu)</li>
<li>Teaching, Taichung First Senior High School Computer Research Club</li>
<li>SCAICT IT team</li>
<li>You can also call me ChiuChiuCircle</li>
<li style="font-size:35px;">https://chiudeyuan.github.io/</li>
</ul>

---

### Course schedule

| Date | Topic |
| ---- |:--------------:|
| 3/23 | Intro + Linear Regression |
| 3/30 | Deep Learning |
| 4/13 | Review |
| 4/20 | Support Vector Machines |
| 4/27 | Decision Trees |

---

## Artificial intelligence?

----

Origin: the 1956 Dartmouth Summer Conference

![12345](https://hackmd.io/_uploads/rJGgzrFU6.jpg =70%x)

----

AI at different stages

![download (3)](https://hackmd.io/_uploads/BJ_B3eqS6.png =50%x)

----

Approaches to implementing AI

![Screenshot 2024-03-16 152809](https://hackmd.io/_uploads/r1s4q6MAT.png)

----

### What do we need?

----

+ **Data!!!**
    * Good data sources
    * Data preprocessing
    * Data analysis
+ **Math!!!**
    * Calculus
    * Linear algebra
    * Probability
    * Statistics
+ **Programming language**
    * Python
+ **Frameworks & libraries**
    * Scikit-Learn
    * TensorFlow
    * OpenCV
    * NumPy
    * Matplotlib

---

## What is machine learning?
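As a tiny preview of "learning a mapping from inputs to outputs", here is a minimal sketch using NumPy (one of the libraries listed above): it fits a straight line to a handful of made-up (living area, price) points with `np.polyfit` and then uses that line to predict a new value. The data values are hypothetical toy numbers, not the course dataset.

```python
import numpy as np

# Hypothetical toy data: living area (x) and price (y); NOT the course dataset
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])

# Least-squares fit of a degree-1 polynomial: y ≈ slope * x + intercept
slope, intercept = np.polyfit(x, y, 1)

def predict(x_new):
    """Use the learned line to map an input x to a predicted y."""
    return slope * x_new + intercept

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print("prediction at x=6:", predict(6.0))
```

`np.polyfit(x, y, 1)` returns the coefficients with the highest power first, which is why the slope comes before the intercept.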
----

When we have data, we may want to make a "prediction"

![IMG_0731](https://hackmd.io/_uploads/S1JJSIq9p.jpg =60%x)

----

Plot the data

![IMG_0732](https://hackmd.io/_uploads/r1yySI99p.jpg =60%x)

----

So we need to find the equation of this line

![CS229 2](https://hackmd.io/_uploads/Sy_zduq9a.png =60%x)

----

### The goal of machine learning is to infer a method that correctly maps $X\mapsto Y$

---

## Learning method

----

### Hypothesis
The function the machine has to compute (optimize)

:::info
$$h_\theta(x)=\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_dx_d$$
:::

----

$$h_\theta(x)=\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_dx_d$$

>$\theta:$ weights
>$x_j:$ the $j$-th feature
>$d:$ dimension, i.e., the number of features
>Note: we always **set $x_0=1$**, so that $\theta_0$ acts as the intercept term

----

Example

![IMG_0731](https://hackmd.io/_uploads/S1JJSIq9p.jpg =40%x)

$$h_\theta(x)=\theta_0+\theta_1x_1$$

>$x_1:$ Living area

----

Example 2

![IMG_0740](https://hackmd.io/_uploads/ryxL8Bjc6.jpg =50%x)

$$h_\theta(x)=\theta_0+\theta_1x_1+\theta_2x_2$$

>$x_1:$ Living area
>$x_2:$ bedrooms

----

We can also write this as:

$$h(x)=\sum^d_{i=0}\theta_ix_i=\theta^Tx$$

>$\theta^T:$ the **transpose** of $\theta$

----

[Math] Transpose

![Matrix_transpose](https://hackmd.io/_uploads/r1yXOOJop.gif)

$$A_{ij}^T=A_{ji}$$

Simply put, it flips the matrix over its main diagonal (the 45° line from the top-left corner), so rows become columns

----

This equation writes the sum of the $\theta_ix_i$ terms as a matrix product:

$$h(x)=\sum^d_{i=0}\theta_ix_i=\theta^Tx=\begin{bmatrix}\theta_0\\\theta_1\\\vdots\\\theta_d\end{bmatrix}^T\begin{bmatrix}x_0\\x_1\\\vdots\\x_d\end{bmatrix}$$

----

[Math] Matrix multiplication

![b65bfc690ff74745ba682639c9b320e2~tplv-obj](https://hackmd.io/_uploads/ryXiWEXCa.jpg =60%x)

$$A^{d\times n}\cdot B^{n\times m}=C^{d\times m}$$

The number of columns of $A$ must equal the number of rows of $B$

----

### Learning steps

![CS229](https://hackmd.io/_uploads/r1yyHLccp.png =60%x)

----

### Learning algorithm

![1_ZCeOEBhvEVLmwCh7vr2RVA](https://hackmd.io/_uploads/Hy1_84R3p.png)

----

### Learning algorithm

![image](https://hackmd.io/_uploads/ryhytB7Rp.png)

---

## Implementation 1: Ordinary Least Squares (OLS)

----

### Cost Function

:::info
$$J(\theta)=\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})^2$$
:::

>$m:$ number of training examples

----

![upload_03abc117d903713e5de73727f58fef54](https://hackmd.io/_uploads/HJKeAS70p.jpg)

$$J(\theta)=\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})^2$$

----

Minimum of $J(\theta)$: the point where every partial derivative is 0 (since $J$ is a convex quadratic, this point is the minimum)
$$\frac{\partial}{\partial\theta_j}J(\theta)=0$$

----

### [Math] Differentiation

![Screenshot 2024-03-17 091611](https://hackmd.io/_uploads/Bkzh8Tm0p.png =40%x)
![Untitled](https://hackmd.io/_uploads/HJa3LTQ0a.png =40%x)

----

### [Math] Partial derivatives
A partial derivative is just an ordinary derivative taken with respect to one variable of a multivariable function, holding the other variables constant

$J(\theta)=\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})^2$

$h_\theta(x^{(i)})=\sum^d_{j=0}\theta_jx_j^{(i)}=\theta^Tx^{(i)}$

That is, we differentiate with respect to $\theta$, not $x$

----

### Solving the first-order conditions (FOC)
Assume a single feature $x$ (so only $\theta_0$ and $\theta_1$)

$\frac{\partial}{\partial\theta_0}J(\theta)=-\frac{2}{m}\sum^m_{i=1}(y^{(i)}-h_\theta(x^{(i)}))=0$

$\frac{\partial}{\partial\theta_1}J(\theta)=-\frac{2}{m}\sum^m_{i=1}(y^{(i)}-h_\theta(x^{(i)}))x^{(i)}=0$

These two equations are called the normal equations

----

### Geometric meaning

![a4faae5b443b445898bfee6d079948c2](https://hackmd.io/_uploads/ryWnzg4A6.png =60%x)

----

### Solving for $\theta_0$ and $\theta_1$

$\hat\theta_1=\frac{\sum^m_{i=1}(x^{(i)}-\bar{x})(y^{(i)}-\bar{y})}{\sum^m_{i=1}(x^{(i)}-\bar{x})^2}$

$\hat\theta_0=\bar{y}-\hat{\theta}_1\bar{x}$

>$\bar{x},\bar{y}:$ the means of $x$ and $y$ over the training set

----

### The hypothesis obtained from the normal equations

$\hat{h}_\theta(x)=\hat\theta_0+\hat\theta_1x$

----

### Hard to follow? One line does it!

![Screenshot 2024-03-17 224519](https://hackmd.io/_uploads/HJhGGFNAp.png)

---

## Let's code! Implementing OLS

----

### Environment

### Google Colab
https://colab.research.google.com/?hl=zh-tw

----

### OLS Code
https://colab.research.google.com/github/ChiuDeYuan/SCAICT_lecture/blob/main/0323/housing_price_OLS.ipynb

---

## Implementation 2: Gradient descent

----

### Define the cost function

$$J(\theta)=\frac{1}{2}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})^2$$

The factor $\frac{1}{2}$ only cancels the 2 produced by differentiation; it does not change where the minimum is

----

### Weight update rule

$$\theta:=\theta-\alpha\nabla J(\theta)$$

>$\alpha:$ learning rate
>$:=$ : assignment (set the left side to the value on the right)

----

### Gradient?
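To make the gradient concrete for the cost $J(\theta)=\frac{1}{2}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})^2$: each partial derivative works out to $\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}$. The minimal NumPy sketch below computes that vector and checks it against a central finite-difference approximation, which is the definition of a partial derivative in action: nudge one $\theta_j$ and hold the others fixed. The random data and the helper names (`cost`, `analytic_grad`, `numeric_grad`) are illustrative assumptions, not taken from the course notebooks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: m = 20 examples, d = 2 features, plus the x_0 = 1 column
m, d = 20, 2
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, d))])
y = rng.normal(size=m)
theta = rng.normal(size=d + 1)

def cost(theta):
    """J(theta) = 1/2 * sum_i (h_theta(x_i) - y_i)^2, with h_theta(x) = theta^T x."""
    residual = X @ theta - y
    return 0.5 * np.sum(residual ** 2)

def analytic_grad(theta):
    """dJ/dtheta_j = sum_i (h_theta(x_i) - y_i) * x_ij, for every j at once."""
    return X.T @ (X @ theta - y)

def numeric_grad(theta, eps=1e-6):
    """Central differences: nudge one theta_j up and down, keep the others fixed."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (cost(theta + step) - cost(theta - step)) / (2 * eps)
    return grad

g_a = analytic_grad(theta)
g_n = numeric_grad(theta)
print("analytic:", g_a)
print("numeric: ", g_n)
print("max abs difference:", np.max(np.abs(g_a - g_n)))
```

If the two vectors disagreed, the hand-derived formula (or its code) would be wrong; a finite-difference check like this is a common way to validate a derivation before training with it.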
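Take the gradient of $J(\theta)$

$$\nabla J(\theta)=\left(\frac{\partial J(\theta)}{\partial\theta_0},\frac{\partial J(\theta)}{\partial\theta_1},\dots,\frac{\partial J(\theta)}{\partial\theta_d}\right)$$

>$d:$ number of features

----

### Intuitively, this means
### "always take the steepest way downhill"

----

Geometric meaning

![assets_-LvBP1svpACTB1R1x_U4_-Lw5TMq46bZGMFbfk1mp_-Lw6lSr3sxuY0gzJBEB9_image](https://hackmd.io/_uploads/r1UK7cEA6.png)

----

### So, for each weight $\theta_j$, the update rule is

$$\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)$$

----

### Choosing the learning rate
The larger the learning rate, the larger each update step: too large and training fails to converge, too small and it is slow

![image](https://hackmd.io/_uploads/ByflleIC6.png)

----

Derivation (first assume there is only one training example)

$$\begin{aligned}\frac{\partial}{\partial\theta_j}J(\theta)
&=\frac{\partial}{\partial\theta_j}\frac{1}{2}(h_\theta(x)-y)^2\\
&=2\cdot\frac{1}{2}(h_\theta(x)-y)\cdot\frac{\partial}{\partial\theta_j}(h_\theta(x)-y)\\
&=(h_\theta(x)-y)\cdot\frac{\partial}{\partial\theta_j}\left(\sum^d_{i=0}\theta_ix_i-y\right)\\
&=(h_\theta(x)-y)x_j
\end{aligned}$$

----

Substituting into the update rule gives $\theta_j:=\theta_j+\alpha(y-h_\theta(x))x_j$ for one example; summing over all $m$ training examples, we finally get

$$\theta:=\theta+\alpha\sum^m_{i=1}(y^{(i)}-h_\theta(x^{(i)}))x^{(i)}$$

----

In words: "look at the whole training set, then take the steepest step"

$$\theta:=\theta+\alpha\sum^m_{i=1}(y^{(i)}-h_\theta(x^{(i)}))x^{(i)}$$

----

Plot $J(\theta)$ as a contour map

![IMG_0772](https://hackmd.io/_uploads/By2rVx80p.jpg =50%x)

Each weight update moves perpendicular to the contour lines

----

Let's look at this formula again

$$\theta:=\theta+\alpha\sum^m_{i=1}(y^{(i)}-h_\theta(x^{(i)}))x^{(i)}$$

----

Notice that the weights are updated only after computing over all $m$ examples

$$\theta:=\theta+\alpha\sum^m_{i=1}(y^{(i)}-h_\theta(x^{(i)}))x^{(i)}$$

So it becomes inefficient when the dataset is large

----

So instead, we update the weights after every single example

$$\theta:=\theta+\alpha(y^{(i)}-h_\theta(x^{(i)}))x^{(i)}$$

where $i$ is one sample drawn at random from the training set at each step

----

Updating once per pass over the whole dataset is called

#### batch gradient descent

----

Updating once per example is called

#### stochastic gradient descent

---

### SGD code
https://colab.research.google.com/github/ChiuDeYuan/SCAICT_lecture/blob/main/0323/housing_price_SGD.ipynb

---

### Ordinary least squares vs gradient descent

----

### Solving the normal equations is computationally expensive, so gradient descent is the better choice when there are many features

With many features, the closed-form solution requires inverting a $(d+1)\times(d+1)$ matrix, whose cost grows roughly with $d^3$

---

## What if we want to use multiple features?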
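A minimal sketch of fitting two features at once with the rules from this lesson, comparing batch gradient descent, stochastic gradient descent, and a closed-form solve. Assumptions, all hypothetical: the data are synthetic (generated from made-up weights, not the housing dataset); the two features are drawn on a standard-normal scale as a stand-in for standardized area and bedrooms, so one learning rate suits both; `alpha`, `iters`, and `epochs` are illustrative guesses. The closed form solves the matrix version of the normal equations, $X^TX\theta=X^Ty$, which generalizes the single-feature FOC shown earlier.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data (hypothetical): 2 standardized features, e.g. area and bedrooms
m = 100
features = rng.normal(size=(m, 2))
X = np.hstack([np.ones((m, 1)), features])  # prepend x_0 = 1
true_theta = np.array([3.0, 2.0, -1.0])     # made-up "ground truth" weights
y = X @ true_theta + rng.normal(scale=0.1, size=m)

def batch_gd(X, y, alpha=0.005, iters=500):
    """Batch GD on J = 1/2 * sum (h - y)^2: one update per pass over all m examples."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta += alpha * X.T @ (y - X @ theta)
    return theta

def sgd(X, y, alpha=0.01, epochs=50, seed=0):
    """SGD: one update after each randomly drawn example i."""
    rng_sgd = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(epochs * len(y)):
        i = rng_sgd.integers(len(y))
        theta += alpha * (y[i] - X[i] @ theta) * X[i]
    return theta

# Closed form: solve the normal equations X^T X theta = X^T y
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)
theta_bgd = batch_gd(X, y)
theta_sgd = sgd(X, y)

print("normal equations:", theta_ne)
print("batch GD:        ", theta_bgd)
print("SGD:             ", theta_sgd)
```

With the $\frac{1}{2}\sum$ cost the batch gradient grows with $m$, which is why `alpha` must be small here; SGD ends up jittering slightly around the closed-form answer because each step sees only one example.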
----

For example, say I want to use the two features area and bedrooms

![IMG_0740](https://hackmd.io/_uploads/ryxL8Bjc6.jpg =70%x)

----

Straight to the code

### MLR code
https://colab.research.google.com/github/ChiuDeYuan/SCAICT_lecture/blob/main/0323/housing_price_MLR.ipynb

---

## That's all
### Homework is posted on Discord

---

## Feedback form

![qrcode_docs.google.com (1)](https://hackmd.io/_uploads/Sy98_NwRT.png)