Machine Learning - Neural Networks

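## What Is a Neural Network
A neural network (NN) is a computational model inspired by the way biological nervous systems work. It is used to learn patterns and regularities in data. It passes signals through multiple layers of neurons and learns features from the data automatically. This makes it applicable to many tasks, such as regression, classification, image processing, and natural language processing.

### Basic Structure of a Neural Network
1. **Input Layer**
    - Receives the raw data, e.g. a single number, or a feature vector describing an image or a piece of text.
2. **Hidden Layers**
    - Apply nonlinear transformations to the signal to extract higher-level features of the data.
    - Each neuron computes a weighted sum of its inputs and applies an activation function such as `ReLU`, `Sigmoid`, or `Tanh`.
3. **Output Layer**
    - Produces the model's prediction: a numeric value for regression problems, probabilities for classification problems.

### How a Neuron Works
The output of each neuron can be written as:
$$
a = f\Big(\sum_{i} w_i x_i + b\Big)
$$
- `x_i`: input
- `w_i`: weight
- `b`: bias
- `f`: activation function
- `a`: neuron output

### Training Procedure
1. **Forward Pass**
    - Compute the output of each layer in turn to obtain the model's prediction `y_pred`.
2. **Loss Computation (Loss Function)**
    - Measure how far the prediction is from the true value, e.g. with the mean squared error (MSE):
      $$
      MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
      $$
3. **Backward Pass**
    - Use the chain rule to compute the gradient of the loss with respect to every weight.
4. **Weight Update (Gradient Descent)**
    - Move the weights and biases a small step against the gradient so the loss decreases.
5. **Repeat**
    - Keep training until the loss converges or a preset stopping condition is met.

### Example (Python)
A simple neural network that learns to predict (fit) an equation, here a noisy straight line.
```python=
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(5)

# Create the data
x_raw = np.linspace(-50, 50, 200).reshape(-1, 1)
x = (x_raw - np.mean(x_raw)) / np.std(x_raw)   # standardize: mean 0, std 1
y = 5*x + 1 + np.random.randn(200, 1) * 0.3    # target line plus Gaussian noise

# Network structure
layer_sizes = [1, 64, 64, 1]
lr = 0.01

# Initialize weights (He initialization) and biases (zeros)
weights = []
biases = []
for i in range(len(layer_sizes) - 1):
    w = np.random.randn(layer_sizes[i], layer_sizes[i+1]) * np.sqrt(2 / layer_sizes[i])
    b = np.zeros((1, layer_sizes[i+1]))
    weights.append(w)
    biases.append(b)

# Activation function: tanh
def tanh(x):
    return np.tanh(x)

def tanh_derivative(x):
    return 1 - np.tanh(x)**2

loss = 1
epoch = 0
max_epochs = 50000   # safety cap so the loop cannot run forever
while loss > 0.1 and epoch < max_epochs:
    # forward
    activations = [x]
    zs = []
    a = x
    for w, b in zip(weights[:-1], biases[:-1]):
        z = a.dot(w) + b
        zs.append(z)
        a = tanh(z)
        activations.append(a)
    z = activations[-1].dot(weights[-1]) + biases[-1]
    zs.append(z)
    y_pred = z
    loss = np.mean((y_pred - y)**2)

    # backward
    dy = 2 * (y_pred - y) / len(y)
    deltas = [dy]
    for i in reversed(range(len(layer_sizes) - 2)):
        dz = deltas[-1].dot(weights[i+1].T) * tanh_derivative(zs[i])
        deltas.append(dz)
    deltas.reverse()

    # gradient descent update
    for i in range(len(weights)):
        dw = activations[i].T.dot(deltas[i])
        db = np.sum(deltas[i], axis=0, keepdims=True)
        weights[i] -= lr * dw
        biases[i] -= lr * db

    # learning-rate decay: 1% every 100 epochs
    lr = 0.01 * (0.99 ** (epoch // 100))
    if epoch % 500 == 0:
        print(f"Epoch {epoch}, Loss={loss:.5f}")
    epoch += 1   # advance the epoch counter (drives the decay and the logging)

# Predict
a = x
for w, b in zip(weights[:-1], biases[:-1]):
    a = tanh(a.dot(w) + b)
y_pred = a.dot(weights[-1]) + biases[-1]

# Plot
plt.figure(figsize=(8,5))
plt.scatter(x_raw, y, s=10, label="True data", alpha=0.6)
plt.plot(x_raw, y_pred, color="red", linewidth=2, label="NN prediction")
plt.xlabel("x (original scale)")
plt.ylabel("y")
plt.legend()
plt.title("Neural Network Fit (x ∈ [-50, 50])")
plt.grid(alpha=0.3)
plt.tight_layout()
plt.savefig("nn_predict.png", dpi=300)
```
> Project link (includes the prediction result plot)
> https://github.com/Neox9487/nns-equation-predict

**Code Walkthrough**

1. Create the data
```python=
x_raw = np.linspace(-50, 50, 200).reshape(-1, 1)
x = (x_raw - np.mean(x_raw)) / np.std(x_raw)
y = 5*x + 1 + np.random.randn(200, 1) * 0.3
```
- `x_raw`: the raw input, 200 evenly spaced points from -50 to 50.
- `x`: the standardized input, with mean 0 and standard deviation 1.
    - Without this step, inputs as large as 50 would push many tanh units into their flat (saturated) region, where gradients are nearly 0.
    - Standardizing makes training more stable and reduces the risk of vanishing or exploding gradients.
- `y`: the target output (ground truth), the linear function `5*x + 1` of the standardized `x`.
- Gaussian noise `np.random.randn(...) * 0.3` is added to mimic the randomness of real data.

2. Define the network structure and learning rate
```python=
layer_sizes = [1, 64, 64, 1]
lr = 0.01
```
- `[1, 64, 64, 1]` means 1 input neuron (`x`) -> two hidden layers with 64 neurons each -> 1 output neuron (`y_pred`).
- `lr`: the initial learning rate.

3. Initialize the weights and biases
```python=
weights = []
biases = []
for i in range(len(layer_sizes) - 1):
    w = np.random.randn(layer_sizes[i], layer_sizes[i+1]) * np.sqrt(2 / layer_sizes[i])
    b = np.zeros((1, layer_sizes[i+1]))
    weights.append(w)
    biases.append(b)
```
- `weights`: one weight matrix per layer, with shape (neurons in the previous layer, neurons in the next layer).
- `biases`: one bias row vector per layer, initialized to 0.
- The weights use **He initialization**, which scales random values by `np.sqrt(2 / fan_in)`.
    - `fan_in` is the number of inputs to the layer.
    - The scaling keeps the variance of the activations roughly stable from layer to layer.
    - He initialization was designed for ReLU. For tanh, Xavier (Glorot) initialization, `np.sqrt(1 / fan_in)`, is the more common choice. For a network this shallow, either one is usually fine.

4. Define the activation function
```python=
def tanh(x):
    return np.tanh(x)

def tanh_derivative(x):
    return 1 - np.tanh(x)**2
```
- `tanh(x)`: the hyperbolic tangent, with output range (-1, 1).
- `tanh_derivative(x)`: the derivative of tanh, $1 - \tanh^2(x)$, used to compute gradients during backpropagation.

> Activation functions I commonly use: https://hackmd.io/86IM4PiAS46HpTUUp7uXdQ

5. Training loop
```python=
loss = 1
epoch = 0
max_epochs = 50000
while loss > 0.1 and epoch < max_epochs:
    ...
```
- Training repeats until the MSE falls to 0.1 or below.
- The noise alone contributes a variance of about 0.3² = 0.09, so a threshold much below 0.09 would likely never be reached.
- `max_epochs` is a safety cap in case the loss stalls above the threshold.

5-1. Forward Pass
```python=
activations = [x]
zs = []
a = x
for w, b in zip(weights[:-1], biases[:-1]):
    z = a.dot(w) + b
    zs.append(z)
    a = tanh(z)
    activations.append(a)
z = activations[-1].dot(weights[-1]) + biases[-1]
zs.append(z)
y_pred = z
```
- `z = a.dot(w) + b`: the linear combination (weighted sum plus bias).
- `a = tanh(z)`: the activation function, which adds nonlinearity.
- The last layer has no activation function because this is a regression problem, so the output must be able to take any real value.
- `activations`: stores each layer's output for use in backpropagation.
- `zs`: stores each layer's pre-activation value `z`.

5-2. Compute the Loss
```python=
loss = np.mean((y_pred - y)**2)
```
- The loss is the mean squared error $MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$.

5-3. Backward Pass
```python=
dy = 2 * (y_pred - y) / len(y)
deltas = [dy]
for i in reversed(range(len(layer_sizes) - 2)):
    dz = deltas[-1].dot(weights[i+1].T) * tanh_derivative(zs[i])
    deltas.append(dz)
deltas.reverse()
```
- `dy` is the derivative of the MSE with respect to `y_pred`. Because the output layer is linear, it is also the output layer's error term.
- Each layer's error term `delta` (the gradient of the loss with respect to that layer's `z`) is computed in turn.
- The error is propagated from the output layer back toward the input layer.
- By the chain rule, each hidden layer's `delta` is the next layer's `delta` times the transposed weights, multiplied by `tanh_derivative(z)`.

5-4. Update the weights, biases, and learning rate
```python=
for i in range(len(weights)):
    dw = activations[i].T.dot(deltas[i])
    db = np.sum(deltas[i], axis=0, keepdims=True)
    weights[i] -= lr * dw
    biases[i] -= lr * db

lr = 0.01 * (0.99 ** (epoch // 100))
if epoch % 500 == 0:
    print(f"Epoch {epoch}, Loss={loss:.5f}")
epoch += 1
```
- Compute each layer's weight gradient `dw` and bias gradient `db`.
- Update the weights and biases with gradient descent.
- Every 100 epochs the learning rate decays by 1%, which makes late-stage training more stable (optional!).
- `epoch += 1` advances the counter. Without it, `epoch` stays at 0: the learning rate never decays, and the log line prints on every iteration.

6. Predict (forward pass)
```python=
a = x
for w, b in zip(weights[:-1], biases[:-1]):
    a = tanh(a.dot(w) + b)
y_pred = a.dot(weights[-1]) + biases[-1]
```
- Make predictions with the trained weights.
- This is the same computation as the forward pass during training.

7. Plot
```python=
plt.figure(figsize=(8,5))
plt.scatter(x_raw, y, s=10, label="True data", alpha=0.6)
plt.plot(x_raw, y_pred, color="red", linewidth=2, label="NN prediction")
plt.xlabel("x (original scale)")
plt.ylabel("y")
plt.legend()
plt.title("Neural Network Fit (x ∈ [-50, 50])")
plt.grid(alpha=0.3)
plt.tight_layout()
plt.savefig("nn_predict.png", dpi=300)
```

### Additional Notes
#### 1. **Bias**
The bias lets a neuron produce a nonzero output even when its input is 0 (no feature signal). This lets the model learn shifts (offsets) in the data. For example:

![image](https://hackmd.io/_uploads/BkHbcsD0ee.png)

**Without a bias term:** $z = Wx$

When $x = 0$ (all inputs are 0), we always get $z = 0$. After the activation function, the output is fixed at $f(0)$: 0 for ReLU or tanh, 0.5 for sigmoid. In other words, the neuron's output at an all-zero input is fixed no matter how the weights are trained. This limits what the network can learn, especially functions that **do not pass through the origin (non-zero intercept)**.

**With a bias term:** $z = Wx + b$

Now even when $x = 0$, we still have $z = b$. The neuron can learn where its activation starts (shifting the activation curve) and so fit a wider range of data distributions.

**In short:** the bias lets the network learn a nonzero output even when the input is small or 0.

(Pitfall warning) I used to think the bias lets a network "train itself when there is no data". That is wrong!!!

#### 2. **Chain Rule**
The chain rule is the calculus rule for differentiating composite functions. It states that the derivative of a composite function is the product of the derivatives of its component functions, each evaluated at the corresponding point, like the links of a chain. For $y = f(g(x))$:
$$
\frac{dy}{dx} = f'\big(g(x)\big)\, g'(x)
$$
You start from the outermost function and differentiate layer by layer inward until you reach the innermost one. Backpropagation applies exactly this rule, one network layer at a time.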
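To see the chain rule in numbers, here is a minimal sketch. The functions `g`, `f`, `h` and the test points are arbitrary choices for illustration. It differentiates $h(x) = \tanh(3x + 1)$ with the chain rule, $h'(x) = \big(1 - \tanh^2(3x + 1)\big) \cdot 3$, and compares the result with a central finite-difference estimate.

```python
import numpy as np

# Composite function h(x) = f(g(x)): inner g(x) = 3x + 1, outer f(u) = tanh(u)
def g(x):
    return 3 * x + 1

def f(u):
    return np.tanh(u)

def h(x):
    return f(g(x))

def h_prime_chain_rule(x):
    # Chain rule: h'(x) = f'(g(x)) * g'(x), with f'(u) = 1 - tanh(u)^2 and g'(x) = 3
    return (1 - np.tanh(g(x)) ** 2) * 3

def h_prime_numeric(x, eps=1e-6):
    # Central finite difference, used as an independent check
    return (h(x + eps) - h(x - eps)) / (2 * eps)

for x0 in [-1.0, 0.0, 0.5]:
    print(x0, h_prime_chain_rule(x0), h_prime_numeric(x0))
```

The two columns should agree to many decimal places. The outer derivative is evaluated at the inner function's value $g(x)$, not at $x$ itself.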
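The backward pass of the example (steps 5-3 and 5-4) is the chain rule applied layer by layer. A common way to confirm hand-written gradients like these is a finite-difference gradient check. The sketch below is an illustration, not part of the original project:

- It reuses the same forward and backward formulas (tanh hidden layers, linear output, MSE loss).
- It runs on a hypothetical tiny network: `layer_sizes = [1, 4, 4, 1]`, random weights and biases, 10 samples.
- The helper names `forward`, `backprop`, and `numeric_grad` are made up for this example.
- It compares the analytic `dw`/`db` against central differences.

```python
import numpy as np

def tanh_derivative(z):
    return 1 - np.tanh(z) ** 2

def forward(x, weights, biases):
    # Same forward pass as the example: tanh on hidden layers, linear output layer
    activations = [x]
    zs = []
    a = x
    for w, b in zip(weights[:-1], biases[:-1]):
        z = a.dot(w) + b
        zs.append(z)
        a = np.tanh(z)
        activations.append(a)
    y_pred = a.dot(weights[-1]) + biases[-1]
    return y_pred, activations, zs

def mse(y_pred, y):
    return np.mean((y_pred - y) ** 2)

def backprop(x, y, weights, biases):
    # Same formulas as steps 5-3 and 5-4: output delta, then the chain rule backwards
    y_pred, activations, zs = forward(x, weights, biases)
    deltas = [2 * (y_pred - y) / len(y)]
    for i in reversed(range(len(weights) - 1)):
        deltas.append(deltas[-1].dot(weights[i + 1].T) * tanh_derivative(zs[i]))
    deltas.reverse()
    dws = [activations[i].T.dot(deltas[i]) for i in range(len(weights))]
    dbs = [np.sum(d, axis=0, keepdims=True) for d in deltas]
    return dws, dbs

def numeric_grad(param, loss_fn, eps=1e-6):
    # Central difference for each entry; param is perturbed in place, then restored
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        original = param[idx]
        param[idx] = original + eps
        loss_plus = loss_fn()
        param[idx] = original - eps
        loss_minus = loss_fn()
        param[idx] = original
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad

# Hypothetical tiny setup so the check runs quickly
rng = np.random.default_rng(0)
layer_sizes = [1, 4, 4, 1]
weights = [rng.standard_normal((m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.standard_normal((1, n)) for n in layer_sizes[1:]]
x = rng.standard_normal((10, 1))
y = 5 * x + 1

dws, dbs = backprop(x, y, weights, biases)
loss_fn = lambda: mse(forward(x, weights, biases)[0], y)

max_err = 0.0
for i in range(len(weights)):
    max_err = max(max_err, np.max(np.abs(dws[i] - numeric_grad(weights[i], loss_fn))))
    max_err = max(max_err, np.max(np.abs(dbs[i] - numeric_grad(biases[i], loss_fn))))
print(f"largest gap between backprop and finite differences: {max_err:.2e}")
```

The biases are random here rather than zero so that `db` is tested in a nontrivial state. A large gap would point to a mistake in the backward formulas, such as a missing `tanh_derivative` factor or a wrong transpose.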
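The bias argument from note 1 can be checked with a single linear neuron. This is a minimal sketch under simplifying assumptions:

- The neuron has no activation function.
- The target is a noise-free version of the example's line, $y = 5x + 1$, on 21 hypothetical points in $[-1, 1]$.
- It is fitted by closed-form least squares (`np.linalg.lstsq`) instead of gradient descent.

Because the points are symmetric around 0, the best no-bias fit is $z = 5x$, which is off by exactly 1 at every point. Its MSE therefore stays at about 1. With a bias, the fit recovers both the slope 5 and the intercept 1.

```python
import numpy as np

# Noise-free target (assumption for clean numbers): y = 5x + 1 on points symmetric around 0
x = np.linspace(-1, 1, 21).reshape(-1, 1)
y = 5 * x + 1

# Without bias: z = W x (the design matrix has only the x column)
w_nobias, *_ = np.linalg.lstsq(x, y, rcond=None)
pred_nobias = x @ w_nobias        # the output at x = 0 is forced to 0

# With bias: z = W x + b (a column of ones lets lstsq solve for b as well)
X_bias = np.hstack([x, np.ones_like(x)])
wb, *_ = np.linalg.lstsq(X_bias, y, rcond=None)
pred_bias = X_bias @ wb

mse_nobias = np.mean((pred_nobias - y) ** 2)
mse_bias = np.mean((pred_bias - y) ** 2)
print("no bias  : W =", w_nobias.ravel(), "MSE =", mse_nobias)
print("with bias: W, b =", wb.ravel(), "MSE =", mse_bias)
```

The slope alone cannot absorb a constant offset: the missing intercept of 1 shows up as an MSE of 1² = 1.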
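The neuron formula $a = f\big(\sum_i w_i x_i + b\big)$ from the start of this note is what the example code computes with `a.dot(w) + b`, just for all neurons and all samples at once. This sketch uses hypothetical random numbers (4 samples, 3 inputs, 2 neurons) and tanh as $f$. It checks that an explicit loop over the sum gives the same result as the matrix form.

```python
import numpy as np

# Hypothetical numbers for one layer: 3 inputs -> 2 neurons, a batch of 4 samples
rng = np.random.default_rng(1)
x = rng.standard_normal((4, 3))   # each row is one sample (x_1, x_2, x_3)
W = rng.standard_normal((3, 2))   # column j holds the weights w_i of neuron j
b = rng.standard_normal((1, 2))   # one bias per neuron

# Loop form: a = f(sum_i w_i x_i + b) for every sample n and every neuron j
a_loop = np.zeros((4, 2))
for n in range(4):
    for j in range(2):
        a_loop[n, j] = np.tanh(sum(W[i, j] * x[n, i] for i in range(3)) + b[0, j])

# Matrix form used in the example code: z = a.dot(w) + b, then the activation
a_matrix = np.tanh(x.dot(W) + b)

print(np.allclose(a_loop, a_matrix))  # True
```

Storing samples as rows and neurons as columns is why the example's weight matrices have shape (previous layer size, next layer size). It is also why the bias is a `(1, n)` row that broadcasts over the batch.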