# mid algorithms

```
class: algorithms
```

-------

### Vanishing and Exploding Gradients

Preface: I studied gradient descent in last semester's algorithms and data analysis course, and that is where I first encountered vanishing and exploding gradients. In short, their root causes are deep neural networks and backpropagation. A deep network stacks many nonlinear layers, so it can be viewed as a composite nonlinear multivariate function:

$$y = f_N(f_{N-1}(\dots f_2(f_1(x)) \dots))$$

Training seeks the best approximation $g(x)$, which should minimize $\text{Loss} = L(g(x), f(x))$; a simple choice is the squared loss:

$$\text{Loss} = \lVert g(x) - f(x) \rVert_2^2$$

The loss function looks roughly like the figure below:

![v2-d09ff67c849168b4db6db3201e8edef8_720w](https://hackmd.io/_uploads/ByWPQkRXa.png)

## Vanishing & Exploding Gradients

These gradient problems usually arise for two reasons: first, the network is deep; second, an unsuitable activation function is used.

### From the perspective of network depth

![v2-a49d6d008278e9b45a7c9db4c661319f_720w](https://hackmd.io/_uploads/H1Ml4JCQT.png)

The figure shows a four-layer fully connected network. The post-activation output of layer $i$ is $f_i(x)$, where $x$ denotes that layer's input, i.e. the output of layer $i-1$, and $f$ is the activation function. Then $f_{i+1} = f(w_{i+1} f_i + b_{i+1})$, or, dropping the bias for simplicity, $f_{i+1} = f(w_{i+1} f_i)$.

The BP (backpropagation) algorithm is based on gradient descent: parameters are adjusted along the negative gradient of the objective.

Gradient descent rule: the parameter update is $\theta = \theta - \alpha \cdot \nabla J(\theta)$, where $\theta$ is the parameter, $\alpha$ the learning rate, and $\nabla J(\theta)$ the gradient of the loss $J(\theta)$ with respect to $\theta$.

Update for the second hidden layer's weights: let the second hidden layer's weight matrix be $W^{(2)}$, the loss be $J$, and the layer's output be $a^{(2)}$; the gradient of the weights is $\frac{\partial J}{\partial W^{(2)}}$, and the update is $W^{(2)} = W^{(2)} - \alpha \cdot \frac{\partial J}{\partial W^{(2)}}$.

Chain rule: if $z = f(g(x))$, then $\frac{\partial z}{\partial x} = \frac{\partial f}{\partial g} \cdot \frac{\partial g}{\partial x}$.

By the chain rule, the gradient reaching an early layer is a product of one factor per later layer. As layers are added, if each factor is greater than 1 the computed gradient update grows exponentially, i.e. the gradient explodes.

> Demo code:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(a):
    # a is the sigmoid *output*, so sigmoid'(z) = a * (1 - a)
    return a * (1 - a)

def update_weights(weights, learning_rate, gradient):
    # Gradient descent step: theta = theta - alpha * grad
    return weights - learning_rate * gradient

def forward_propagation(inputs, weights):
    return sigmoid(np.dot(inputs, weights))

def backward_propagation(error, layer_input, layer_output, weights):
    # Chain rule: delta = upstream error * local sigmoid derivative
    delta = error * sigmoid_derivative(layer_output)
    # Gradient of the loss w.r.t. this layer's weight matrix
    weights_gradient = np.outer(layer_input, delta)
    # Error propagated back to the previous layer, through the weights
    previous_error = np.dot(weights, delta)
    return previous_error, weights_gradient

def train_neural_network(inputs, targets, learning_rate, epochs):
    input_size = len(inputs[0])
    hidden1_size = 4
    hidden2_size = 3
    output_size = len(targets[0])

    # Initialize weights randomly
    weights_input_hidden1 = np.random.rand(input_size, hidden1_size)
    weights_hidden1_hidden2 = np.random.rand(hidden1_size, hidden2_size)
    weights_hidden2_output = np.random.rand(hidden2_size, output_size)

    for epoch in range(epochs):
        for i in range(len(inputs)):
            # Forward propagation
            hidden1_output = forward_propagation(inputs[i], weights_input_hidden1)
            hidden2_output = forward_propagation(hidden1_output, weights_hidden1_hidden2)
            final_output = forward_propagation(hidden2_output, weights_hidden2_output)

            # Backward propagation (error = dLoss/d_output for squared loss)
            error = final_output - targets[i]
            error, grad_hidden2_output = backward_propagation(
                error, hidden2_output, final_output, weights_hidden2_output)
            error, grad_hidden1_hidden2 = backward_propagation(
                error, hidden1_output, hidden2_output, weights_hidden1_hidden2)
            _, grad_input_hidden1 = backward_propagation(
                error, inputs[i], hidden1_output, weights_input_hidden1)

            # Update weights
            weights_input_hidden1 = update_weights(weights_input_hidden1, learning_rate, grad_input_hidden1)
            weights_hidden1_hidden2 = update_weights(weights_hidden1_hidden2, learning_rate, grad_hidden1_hidden2)
            weights_hidden2_output = update_weights(weights_hidden2_output, learning_rate, grad_hidden2_output)

    return weights_input_hidden1, weights_hidden1_hidden2, weights_hidden2_output

# Example usage: learn XOR
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = np.array([[0], [1], [1], [0]])
learning_rate = 0.1
epochs = 10000

trained_weights = train_neural_network(inputs, targets, learning_rate, epochs)
print("Trained weights:")
print("Weights Input-Hidden1:\n", trained_weights[0])
print("Weights Hidden1-Hidden2:\n", trained_weights[1])
print("Weights Hidden2-Output:\n", trained_weights[2])
```

Conversely, if each factor in the product is less than 1, then as the number of layers grows the gradient update information decays exponentially, i.e. the gradient vanishes.
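Before the full training demo, here is a minimal sketch that isolates the multiplicative mechanism (the weight scales, depths, and the `gradient_scale` helper are illustrative assumptions, not from the original derivation). It evaluates the best case $\sigma'(z) = 0.25$, reached at $z = 0$, so real sigmoid networks shrink gradients at least this fast:

```python
# Per the chain rule, the gradient reaching the first layer of an n-layer
# sigmoid network is a product of n factors of the form w_i * sigmoid'(z_i).
# sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)) peaks at 0.25 (at z = 0), so
# we evaluate that best case to isolate the effect of the weight scale.
SIGMOID_PRIME_MAX = 0.25

def gradient_scale(w, depth):
    """Best-case magnitude of the backpropagated gradient after `depth` layers."""
    return (abs(w) * SIGMOID_PRIME_MAX) ** depth

for w in (0.5, 4.0, 8.0):       # illustrative weight scales
    for depth in (2, 10, 30):   # illustrative depths
        print(f"|w|={w:4.1f}, depth={depth:2d}: "
              f"gradient scaled by {gradient_scale(w, depth):.3e}")
```

With $|w| < 4$ the product decays toward zero (vanishing); with $|w| > 4$, in this best case, it can grow without bound (exploding).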
The following demo code trains a deeper sigmoid network and plots the magnitude of the gradient that reaches the first layer at each training step:

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(a):
    # a is the sigmoid *output*, so sigmoid'(z) = a * (1 - a)
    return a * (1 - a)

def forward_propagation(inputs, weights):
    return sigmoid(np.dot(inputs, weights))

def backward_propagation(error, layer_input, layer_output, weights):
    # Chain rule: delta = upstream error * local sigmoid derivative
    delta = error * sigmoid_derivative(layer_output)
    weights_gradient = np.outer(layer_input, delta)
    # Error propagated back to the previous layer, through the weights
    previous_error = np.dot(weights, delta)
    return previous_error, weights_gradient

def train_neural_network(inputs, targets, learning_rate, epochs, layer_sizes):
    # Initialize weights randomly
    weights = [np.random.rand(layer_sizes[i], layer_sizes[i + 1])
               for i in range(len(layer_sizes) - 1)]

    gradients_magnitude = []
    for epoch in range(epochs):
        for i in range(len(inputs)):
            # Forward propagation, keeping every layer's output
            layer_outputs = [inputs[i]]
            for j in range(len(layer_sizes) - 1):
                layer_outputs.append(forward_propagation(layer_outputs[j], weights[j]))

            # Backward propagation, from the output layer toward the input
            error = targets[i] - layer_outputs[-1]
            for j in range(len(layer_sizes) - 2, -1, -1):
                error, weights_gradient = backward_propagation(
                    error, layer_outputs[j], layer_outputs[j + 1], weights[j])
                weights[j] += learning_rate * weights_gradient

            # Record how large the gradient is by the time it reaches layer 1
            gradients_magnitude.append(np.linalg.norm(error))

    return gradients_magnitude

# Example usage
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = np.array([[0], [1], [1], [0]])
learning_rate = 0.1
epochs = 10000
layer_sizes = [2, 50, 50, 1]  # Number of neurons in each layer

gradients = train_neural_network(inputs, targets, learning_rate, epochs, layer_sizes)

# Plot gradient magnitude over training steps
plt.plot(gradients)
plt.title('Gradient Magnitude Over Training Steps')
plt.xlabel('Training Step')
plt.ylabel('Gradient Magnitude')
plt.show()
```

The curves below show weight-update speed. For a network with two hidden layers, hidden layer 1 (the one closer to the input) updates more slowly than hidden layer 2:

![v2-f6b9e851de6b876cb6f2cab65bd60b75_720w](https://hackmd.io/_uploads/Hk-tXeR7T.png)

With four hidden layers the effect is even more pronounced:

![v2-dffdfc852ee891e6f11ae068efa5737f_720w](https://hackmd.io/_uploads/rJUnXlAma.png)

## Solutions to Vanishing and Exploding Gradients

#### Pre-training and fine-tuning

This method comes from Hinton's 2006 paper, which proposed unsupervised layer-wise training. The basic idea is to train one layer of hidden nodes at a time: the output of the previous hidden layer serves as this layer's training input, and this layer's output in turn serves as the input of the next hidden layer. This is layer-wise pre-training. (https://paperswithcode.com/paper/reducing-the-dimensionality-of-data-with)

#### batch norm

Batchnorm is one of the important achievements in the development of deep learning. It speeds up network convergence and improves training stability, and in essence it addresses the gradient problems of backpropagation. The weights $w$ appear in every factor of the backpropagated gradient chain, so the magnitude of $w$ drives both vanishing and explosion. By normalizing each layer's output to a consistent mean and variance, batchnorm removes the amplifying or shrinking effect that $w$ would otherwise introduce (see the first sketch after the references).

#### Residual structure

The skip connection gives each block the form $y = x + F(x)$, so its local derivative $1 + F'(x)$ always contains an identity term and the backpropagated gradient cannot be multiplied down to zero (see the second sketch after the references).

![image](https://hackmd.io/_uploads/BJnOz4Cmp.png)

#### LSTM

The LSTM's gated cell state gives the gradient a path across many time steps that avoids repeated squashing activations, which mitigates vanishing gradients in recurrent networks.

![image](https://hackmd.io/_uploads/rJBCGV0XT.png)

## References

- 《新手村逃脫!初心者的python機器學習攻略》 (book)
- https://paperswithcode.com/paper/reducing-the-dimensionality-of-data-with (paper)
- ChatGPT (code drafting)
- https://zh.wikipedia.org/zh-tw/%E5%8F%8D%E5%90%91%E4%BC%A0%E6%92%AD%E7%AE%97%E6%B3%95 (explanation of the BP algorithm)
- https://www.cupoy.com/qa/club/ai_tw/0000016D6BA22D97000000016375706F795F72656C656173654B5741535354434C5542/0000017BAC14A4DE000000116375706F795F72656C656173655155455354 (explanation of vanishing and exploding gradients)
- https://zh.wikipedia.org/zh-tw/%E9%9D%9E%E7%B7%9A%E6%80%A7%E7%B3%BB%E7%B5%B1 (nonlinear systems)
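As referenced in the batch norm section above, here is a minimal sketch of the normalization idea (the `batch_norm` helper, batch shape, and weight scales are illustrative assumptions of mine, not any particular library's API): no matter how large the weight scale is, the normalized layer output keeps the same mean and variance, so the weight magnitude stops feeding into the gradient chain.

```python
import numpy as np

def batch_norm(z, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of pre-activations to zero mean / unit variance
    per feature, then apply the learnable scale (gamma) and shift (beta)."""
    z_hat = (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)
    return gamma * z_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4))          # a batch of 32 four-dimensional inputs
for w_scale in (0.1, 1.0, 100.0):     # illustrative weight scales
    z = x @ (w_scale * rng.normal(size=(4, 4)))
    # Per-feature std of the normalized output stays ~1 at every scale
    print(f"w_scale={w_scale:6.1f} -> output std {batch_norm(z).std(axis=0)}")
```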
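And as referenced in the residual structure section, here is a minimal sketch contrasting a plain chain of sigmoid blocks with a residual chain (toy scalar blocks of my own construction, not the actual ResNet architecture): the residual block's derivative $1 + w\,\sigma'(z)$ keeps an identity term, so the product across depth cannot collapse to zero the way the plain chain's does.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def plain_chain_grad(w, depth, x=0.5):
    """d(output)/d(input) of `depth` stacked blocks y = sigmoid(w * x)."""
    grad, a = 1.0, x
    for _ in range(depth):
        a = sigmoid(w * a)
        grad *= w * a * (1 - a)        # one chain-rule factor per block
    return grad

def residual_chain_grad(w, depth, x=0.5):
    """Same chain, but each block is y = x + sigmoid(w * x)."""
    grad, a = 1.0, x
    for _ in range(depth):
        s = sigmoid(w * a)
        grad *= 1.0 + w * s * (1 - s)  # identity term keeps the gradient alive
        a = a + s
    return grad

for depth in (5, 20, 50):              # illustrative depths
    print(f"depth={depth:2d}: plain {plain_chain_grad(0.5, depth):.3e}, "
          f"residual {residual_chain_grad(0.5, depth):.3e}")
```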