
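# **Machine Learning Key Points**

## 2. What is Machine Learning

:::success
Four categories, based on whether (and how much) human supervision is involved
:::

* Supervised learning
  Trains on a labeled dataset. The model learns the relationship between the input data and the corresponding output labels, then uses what it has learned to make predictions on new data.
* Unsupervised learning
  Trains on an unlabeled dataset. The goal is to discover hidden structure or patterns in the data.
* Semi-supervised learning
  Trains on a small amount of labeled data together with a large amount of unlabeled data to improve model performance.
* Reinforcement learning
  No labels are prepared in advance. Each time the machine takes an action, it is told whether the action was right or wrong (a reward or a penalty), so it learns from that feedback.

:::info
Two categories, based on whether the system can learn incrementally from a stream of incoming data
:::

* Batch learning (offline learning)
  Trains on a ==fixed dataset==. The model is trained one or more times on all available data; after training it is not updated again until a new batch of data becomes available and the model is retrained.
* Online learning
  The model is ==continuously updated== as new data arrives. Each time new data comes in, the model is updated immediately, without retraining on the entire dataset.

## 4. Neuron and Perceptron

Perceptron learning algorithm
1. Initialize the values (weights and bias)
2. Calculate the output value y
3. Adjust the weights
4. Repeat steps 2 to 3

## 5. Multi Layer Perceptron (MLP)

![image](https://hackmd.io/_uploads/HkD5vegOR.png =70%x)

An MLP is a feedforward neural network: data flows from the input layer through the hidden layers to the output layer. Each layer contains its own independent neurons, and every neuron is connected to all neurons in the next layer.

### Main purpose of activation functions: add non-linearity to the neural network

* Sigmoid: suitable for outputting ==probabilities==. Not suitable for networks with many layers because of the vanishing gradient problem: for very large or very small inputs the curve flattens out near 1 or 0, so its derivative approaches 0.
  ![image](https://hackmd.io/_uploads/rk7xjxxdA.png =50%x)
* Tanh: also suffers from the vanishing gradient problem.
  ![image](https://hackmd.io/_uploads/HkXujge_R.png =50%x)
* ReLU: outputs 0 for every x < 0, so the gradient there is 0 and the corresponding weights cannot be adjusted.
  ![image](https://hackmd.io/_uploads/H1wnjeeu0.png =50%x)
* Softmax: every output is a value between 0 and 1, and all outputs sum to 1, which makes it suitable for multi-class probability models.
  ![image](https://hackmd.io/_uploads/S1hz6glOC.png =50%x)

### Loss function: measures the deviation between the predicted and the actual values

* Mean Squared Error (MSE): the mean of the squared errors
* Mean Absolute Error (MAE): the mean of the absolute errors
* Mean Absolute Percentage Error (MAPE): the mean of the relative errors, expressed as a percentage
* Mean Squared Logarithmic Error (MSLE): the mean of the squared differences of the logarithms

## 6. Strategies for adjusting MLP weights

:::success
The main goal of a neural network is to find the weights (w*) that minimize the prediction error L(w*)
:::

1. Backpropagation: computes the gradient of the loss function with respect to each weight and adjusts the weights to minimize the loss. Problem when it is used with full-batch updates:
    * The gradient of the loss is computed over the entire training set before each single weight update, so convergence is very slow. SGD is used to speed this up.
2. Stochastic Gradient Descent (SGD): updates the weights after every single sample :arrow_right: processing one sample at a time cannot take advantage of vectorized computation, so training is still too slow
3. Mini-Batch Gradient Descent: splits the data into small batches and updates the weights once per batch, which is faster
4. Momentum: when the current gradient points in the same direction as the past updates, the step in that direction is strengthened
    ![image](https://hackmd.io/_uploads/HkTq7zMOR.png =70%x)

## 7. TensorFlow

Tensor: a multi-dimensional array whose elements all share a single data type

## 9. Build MLP Model

:::spoiler Code
```python
# coding=utf-8
# Load libraries
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import RMSprop

# Set parameters
batch_size = 128
num_classes = 10
epochs = 20

# Load MNIST: training set x_train, y_train (labels) and test set x_test, y_test
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Flatten each 28x28 image into a 784-dim vector, convert to float32 and normalize to [0, 1]
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert class labels to one-hot encoding
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)

# Model architecture
model = Sequential()
# First hidden layer (input_shape defines the 784-dim input layer)
model.add(Dense(512, activation='relu', input_shape=(784,)))
# Second hidden layer
model.add(Dense(512, activation='relu'))
# Output layer
model.add(Dense(num_classes, activation='softmax'))

# Print a summary of the model architecture
model.summary()

# Compile the model
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])

# Train the model (the test set is also used as validation data here)
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))

# Evaluate the model
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
```
:::

## 10. Model Evaluation Metrics

### Confusion Matrix (structure)

|             | Actual Yes          | Actual No           |
| ----------- |:------------------- | ------------------- |
| Predict Yes | True Positive (TP)  | False Positive (FP) |
| Predict No  | False Negative (FN) | True Negative (TN)  |

1. Accuracy = (TP+TN)/Total ==*Can be misleading when positive cases are rare, e.g. a rare disease*==
2. Precision = TP/(TP+FP) ==*Of all the cases predicted YES, how many are actually YES*==
3. Recall = TP/(TP+FN) *If every case, sick or not, is predicted as sick, recall reaches 100% even though the predictions are unreliable*
4. Specificity = TN/(FP+TN)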
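The four perceptron steps in section 4 can be sketched in NumPy as follows. This is a minimal illustration, not code from these notes: it assumes a step activation, zero initialization, and the classic update rule w ← w + lr·(y − ŷ)·x, and the function name `train_perceptron` and the AND-gate data are made up for the example.

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, epochs=20):
    """Perceptron learning algorithm with a step activation (illustrative sketch)."""
    # Step 1: initial values (weights and bias start at zero)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):                       # Step 4: repeat steps 2-3
        errors = 0
        for xi, target in zip(X, y):
            # Step 2: calculate the output value with a step function
            y_hat = 1 if xi @ w + b > 0 else 0
            # Step 3: adjust the weights in proportion to the error (target - y_hat)
            update = lr * (target - y_hat)
            w += update * xi
            b += update
            errors += int(update != 0)
        if errors == 0:                           # every sample is classified correctly
            break
    return w, b

# Logical AND is linearly separable, so the perceptron converges (here within a few epochs)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
preds = [1 if xi @ w + b > 0 else 0 for xi in X]
print(w, b, preds)  # [2. 1.] -2.0 [0, 0, 0, 1]
```

Because the weights start at zero, the learning rate only rescales w and b in this sketch; lr=1.0 keeps every intermediate value an exact integer.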
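To see the vanishing-gradient remarks on activation functions (section 5) in numbers, the sketch below evaluates each activation's derivative at a large negative, a zero, and a large positive input. It is plain NumPy, not a framework API; the derivative formulas are the standard ones, and the sample inputs are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # at most 0.25 (reached at x = 0)

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2    # at most 1.0 (reached at x = 0)

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(float)    # 0 for every x <= 0

def softmax(z):
    e = np.exp(z - np.max(z))       # subtract the max for numerical stability
    return e / e.sum()

x = np.array([-10.0, 0.0, 10.0])
print(sigmoid_grad(x))  # ~[4.5e-05, 0.25, 4.5e-05]: the gradient vanishes at both extremes
print(tanh_grad(x))     # ~[8.2e-09, 1.0, 8.2e-09]
print(relu_grad(x))     # [0. 0. 1.]: no gradient for negative inputs
p = softmax(np.array([2.0, 1.0, 0.1]))
print(p, p.sum())       # every entry lies in (0, 1) and the entries sum to 1
```

The sigmoid derivative never exceeds 0.25, so backpropagating through many sigmoid layers multiplies many small factors together, which is why the gradient shrinks in deep networks.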
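The four loss functions listed in section 5 translate directly into NumPy. This is a sketch with made-up sample values; note that MAPE divides by the actual value, so it is undefined when any actual value is 0, and this MSLE uses log(1 + y) so that zero values are allowed.

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)               # mean of squared errors

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))             # mean of absolute errors

def mape(y_true, y_pred):
    # mean of relative errors, as a percentage (undefined if any y_true is 0)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

def msle(y_true, y_pred):
    # mean of squared log differences; log1p(y) = log(1 + y), so values must be > -1
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)

y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 330.0])
print(mse(y_true, y_pred))   # (100 + 100 + 900) / 3 ≈ 366.67
print(mae(y_true, y_pred))   # (10 + 10 + 30) / 3 ≈ 16.67
print(mape(y_true, y_pred))  # (10% + 5% + 10%) / 3 ≈ 8.33
print(msle(y_true, y_pred))
```

MSE punishes the single 30-unit miss far more than MAE does, while MAPE and MSLE look at relative rather than absolute differences.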
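Items 3 and 4 of section 6 (mini-batch updates and momentum) can be combined in one short sketch. To keep it small, it assumes a linear model with MSE loss instead of a full MLP, so the gradient has a closed form rather than coming from backpropagation; `lr`, `beta`, `batch_size`, and the synthetic data are illustrative choices, not values from these notes.

```python
import numpy as np

def minibatch_sgd_momentum(X, y, lr=0.05, beta=0.9, batch_size=16, epochs=50, seed=0):
    """Fit y ≈ X @ w with mini-batch gradient descent plus momentum (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    velocity = np.zeros_like(w)            # decayed sum of past updates
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)         # reshuffle the samples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # gradient of the mean squared error on this mini-batch only
            grad = 2.0 / len(idx) * Xb.T @ (Xb @ w - yb)
            # momentum: steps that agree with past updates keep building up
            velocity = beta * velocity - lr * grad
            w = w + velocity
    return w

rng = np.random.default_rng(42)
X = rng.normal(size=(256, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=256)
w_hat = minibatch_sgd_momentum(X, y)
print(w_hat)  # close to [2.0, -1.0, 0.5]
```

Setting `beta=0.0` turns this back into plain mini-batch gradient descent, and `batch_size=1` turns it into per-sample SGD, which makes the three strategies easy to compare.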
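For the 784-512-512-10 architecture built in section 9, a NumPy forward pass and a parameter count look roughly like this. It is an illustrative sketch with small random weights, not the trained Keras model; each Dense layer is modeled as `x @ W + b`, so the parameter total should match what `model.summary()` reports for that architecture. The helper names `init_mlp` and `mlp_forward` are hypothetical.

```python
import numpy as np

def init_mlp(sizes, rng):
    """One (W, b) pair per Dense layer, e.g. sizes = [784, 512, 512, 10]."""
    return [(rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def mlp_forward(x, params):
    """Feedforward pass: ReLU on the hidden layers, softmax on the output layer."""
    for W, b in params[:-1]:
        x = np.maximum(0.0, x @ W + b)
    W, b = params[-1]
    z = x @ W + b
    e = np.exp(z - z.max(axis=1, keepdims=True))   # numerically stable softmax
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
params = init_mlp([784, 512, 512, 10], rng)
n_params = sum(W.size + b.size for W, b in params)
print(n_params)  # 669706 = (784*512 + 512) + (512*512 + 512) + (512*10 + 10)
probs = mlp_forward(rng.random((2, 784)), params)   # two fake flattened images
print(probs.shape, probs.sum(axis=1))               # (2, 10); each row sums to 1
```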
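The rare-disease caveats on accuracy and recall (section 10) can be checked with a small pure-Python sketch. The helper `binary_metrics` is hypothetical, labels use 1 = Yes and 0 = No, a metric whose denominator is 0 is reported as 0.0 by convention here, and the 2-in-100 patient data is invented for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts and the four metrics from section 10 (1 = Yes, 0 = No)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (fp + tn) if fp + tn else 0.0,
    }

# Rare disease: 2 sick patients out of 100
y_true = [1, 1] + [0] * 98
always_no = [0] * 100    # predicts "healthy" for everyone
always_yes = [1] * 100   # predicts "sick" for everyone
print(binary_metrics(y_true, always_no))   # accuracy 0.98, yet recall 0.0
print(binary_metrics(y_true, always_yes))  # recall 1.0, yet precision 0.02
```

Neither trivial classifier is useful, but each one looks good on a single metric, which is why accuracy, precision, and recall are read together.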
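## 11. Image Convolution

1. Kernel (filter): the kernel is a small matrix, usually 3x3, 5x5, or 7x7. Each value in the kernel is a weight that determines how much the corresponding image pixel contributes to the new pixel value.
2. Sliding window: place the kernel at the top-left corner of the image. The pixel values covered by the kernel are multiplied by the corresponding kernel values and summed to produce one new pixel value. This process repeats as the kernel slides across the whole image.
3. Convolution operation: for each kernel position, compute the sum of the products of the kernel values and the corresponding image pixel values. This sum is the new pixel value in the output image.
4. Output image: the result of the convolution is a new image (feature map), in which each pixel value is the result of applying the kernel at the corresponding position of the original image.

## 12. Convolutional Neural Networks

### Comparison of MLP and CNN

| Aspect                   | MLP                    | CNN                                                 |
| ------------------------ | ---------------------- |:--------------------------------------------------- |
| Layer structure          | Fully connected layers | Convolutional, pooling, and fully connected layers  |
| Input format             | 1-D vector             | 2-D matrix (image)                                  |
| Feature extraction       | Limited                | Strong; automatically extracts local features       |
| Number of parameters     | Large                  | Relatively small (parameter sharing)                |
| Computational efficiency | Relatively low         | High, especially for large-scale image data         |
| Typical applications     | Tabular data, 1-D data | Images, video, computer vision tasks                |

* Stride: how many pixels the small window (e.g. 3x3) moves at each step
* Depth: the number of filters (output channels) in a layer
* Pooling: keeps only the important feature values; common types are Max, Average, and Global pooling. A pooling layer learns no parameters (Param = 0)

## 14. Classic CNN Models

### LeNet

An early CNN architecture that mainly uses tanh as its activation function.

:::spoiler Code
```python
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.layers import Conv2D, AveragePooling2D
from keras.utils import to_categorical  # the old keras.utils.np_utils path is gone in newer Keras

# Build the LeNet model
def LeNet(input_shape, num_classes):
    model = Sequential()
    # First convolutional layer: 6 filters of 5x5x1 -> (5*5*1 + 1) * 6 = 156 parameters
    model.add(Conv2D(filters=6, kernel_size=(5, 5), strides=(1, 1),
                     input_shape=input_shape, activation='tanh', padding='same'))
    # Average pooling: output size = (28 - 4) / 2 + 1 = 13 -> 13x13x6
    model.add(AveragePooling2D(pool_size=(4, 4), strides=(2, 2)))
    # Second convolutional layer: 16 filters of 3x3x6 -> (3*3*6 + 1) * 16 = 55 * 16 = 880 parameters
    model.add(Conv2D(filters=16, kernel_size=(3, 3), strides=(1, 1),
                     activation='tanh', padding='same'))
    model.add(AveragePooling2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(Flatten())  # flatten to a 1-D vector
    model.add(Dense(120, activation='tanh'))
    model.add(Dense(84, activation='tanh'))
    model.add(Dense(num_classes, activation='softmax'))
    return model

# Input shape and number of classes
input_shape = (28, 28, 1)  # MNIST image size
num_classes = 10           # handwritten digits 0-9

# Build the model and print its summary
model = LeNet(input_shape, num_classes)
model.summary()

# Load MNIST, add a channel dimension, and normalize to [0, 1]
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1).astype('float32') / 255
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1).astype('float32') / 255

# Convert class vectors to one-hot (binary class) matrices
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)

# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model (20% of the training data is held out for validation)
model.fit(x_train, y_train, batch_size=128, epochs=12, validation_split=0.2, verbose=1)

# Evaluate the model
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

# Make predictions
predictions = model.predict(x_test)
print('Predicted probabilities for first test image:', predictions[0])
print('Predicted class for first test image:', predictions[0].argmax())
```
:::

### AlexNet

* Uses the ==ReLU== activation function
* Uses data augmentation and Dropout to prevent overfitting
* Uses mini-batch stochastic gradient descent (mini-batch SGD) to speed up training

Dropout: randomly drops (deactivates) a portion of the neurons during training

### VGGNet

* Increases network depth by stacking many convolutional layers. The deepest VGGNet model (VGG-19) has 19 weight layers (16 convolutional and 3 fully connected). The number of filters increases stage by stage, typically 64, 128, 256, 512
* Uses small 3x3 convolution kernels, which preserve more image detail while reducing the number of parameters
* ==Fixed pooling layers==: the same 2x2 max pooling with stride 2 follows every block of convolutional layers

### ResNet

* Residual block: each block only learns the residual between its input and its output, rather than the whole output
* Skip connection: the input is passed directly along a shortcut to a later layer and added to that layer's output, which keeps training stable even in very deep networks

## 18. Data Preprocessing

Feature Selection: use the `drop` method of a Pandas DataFrame to remove columns

```python
import pandas as pd

# Remove unnecessary columns (drop() defaults to rows, so name the columns explicitly)
df = pd.read_csv('data.csv')
df = df.drop(columns=['id', 'name'])
```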
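The four image-convolution steps in section 11 can be sketched in NumPy, assuming a single-channel image, no padding, and an adjustable stride. Like deep-learning frameworks, it does not flip the kernel (strictly speaking this is cross-correlation); the function name `conv2d`, the 5x5 image, and the vertical-edge kernel are arbitrary examples.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide `kernel` over a 2-D `image` (no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            window = image[r:r + kh, c:c + kw]    # pixels covered by the kernel
            out[i, j] = np.sum(window * kernel)   # multiply element-wise, then sum
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # pixel value = 5 * row + col
kernel = np.array([[1.0, 0.0, -1.0]] * 3)          # simple vertical-edge kernel
print(conv2d(image, kernel))            # 3x3 output, every value -6.0
print(conv2d(image, kernel, stride=2))  # 2x2 output: a larger stride shrinks the feature map
```

The output size follows (input − kernel) // stride + 1, which is the same formula used for the pooling sizes in the LeNet comments.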
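Pooling (section 12) can be sketched the same way. This sketch assumes non-overlapping 2x2 windows with stride 2, and the feature-map values are made up. The function only takes a maximum or a mean of each window, so there is nothing to learn, which is why pooling layers report Param = 0.

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Max or average pooling over a 2-D feature map; there are no weights to learn."""
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    reducer = np.max if mode == "max" else np.mean
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = reducer(x[r:r + size, c:c + size])
    return out

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 0],
                 [1, 8, 3, 4]], dtype=float)
print(pool2d(fmap, mode="max"))  # [[6. 4.] [8. 9.]]
print(pool2d(fmap, mode="avg"))  # [[3.75 2.25] [4.5 4.]]
print(fmap.max(), fmap.mean())   # global max / global average pooling: 9.0 3.625
```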
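The Dropout idea from the AlexNet notes, sketched in NumPy. This sketch uses the "inverted dropout" variant, which is also how the Keras `Dropout` layer behaves: surviving outputs are scaled by 1 / (1 − rate) during training, so nothing has to change at inference time. The function and argument names are illustrative.

```python
import numpy as np

def dropout(x, rate=0.5, training=True, rng=None):
    """Inverted dropout: zero a random fraction `rate` of the units while training."""
    if not training or rate == 0.0:
        return x                                  # inference: every neuron is used
    rng = rng if rng is not None else np.random.default_rng()
    keep = rng.random(x.shape) >= rate            # True for neurons that survive
    return x * keep / (1.0 - rate)                # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
h = np.ones(10_000)                      # stand-in for a layer's activations
out = dropout(h, rate=0.5, rng=rng)
print((out == 0).mean())                 # roughly 0.5 of the neurons are dropped
print(out.mean())                        # roughly 1.0, the same as h.mean()
print(dropout(h, training=False) is h)   # True: dropout is disabled at inference
```

Because a different random subset is dropped at every training step, no single neuron can rely on specific other neurons, which is how Dropout reduces overfitting.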
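VGGNet's point that small 3x3 kernels reduce the parameter count can be checked with simple arithmetic: two stacked 3x3 layers cover the same 5x5 region as one 5x5 layer. The helper `conv_params` is hypothetical and counts weights as k·k·C_in·C_out (plus C_out biases); C = 256 is an arbitrary example channel count. The same helper also cross-checks the parameter counts in the LeNet code comments.

```python
def conv_params(k, c_in, c_out, bias=True):
    """Trainable parameters of a k x k convolution layer (hypothetical helper)."""
    return k * k * c_in * c_out + (c_out if bias else 0)

# Cross-check the LeNet comments: first and second convolutional layers
print(conv_params(5, 1, 6), conv_params(3, 6, 16))  # 156 880

# Two stacked 3x3 layers see a 5x5 region, like one 5x5 layer, but need fewer weights
C = 256                                        # example channel count (arbitrary)
two_3x3 = 2 * conv_params(3, C, C, bias=False)
one_5x5 = conv_params(5, C, C, bias=False)
print(two_3x3, one_5x5)                        # 1179648 1638400
```

The ratio 18 : 25 holds for any channel count C, and the stacked version also gets an extra non-linearity between the two layers.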
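A NumPy sketch of the residual block idea from the ResNet notes, y = ReLU(F(x) + x). It assumes the residual branch F is two dense layers without biases; real ResNet blocks use convolutions and batch normalization, and the names `residual_block`, `W1`, and `W2` are hypothetical. Setting the second weight matrix to zero shows why the design helps: when F(x) = 0, the block reduces to ReLU(x), so the input flows through the shortcut unchanged and an identity mapping is easy to learn.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """y = ReLU(F(x) + x), where F is two dense layers and the skip connection adds x."""
    f = relu(x @ W1) @ W2        # residual branch F(x): what the block actually learns
    return relu(f + x)           # shortcut: add the unchanged input, then activate

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))      # a batch of 4 vectors with 8 features each
W1 = rng.normal(size=(8, 8))
W2 = np.zeros((8, 8))            # if the residual branch outputs 0 ...
print(np.allclose(residual_block(x, W1, W2), relu(x)))  # ... the block passes x through: True
```

The input and the branch output are added element-wise, so both must have the same shape; that is why this sketch keeps the feature size at 8 throughout.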