# CNN Notes

## Basic Concepts

![Screenshot_20240119_083253](https://hackmd.io/_uploads/S1LJErwF6.png)

CNNs are designed for image recognition. Each neuron does not need to see the whole image (no need to be fully connected).

![Screenshot_20240119_083736](https://hackmd.io/_uploads/H1hi4BvFa.png)

So the image is split into receptive fields, and each receptive field is connected to multiple neurons. Each neuron represents one filter whose weights are unknown before training and is used to match a feature in the image.

![Screenshot_20240119_084532](https://hackmd.io/_uploads/SJuELrDFa.png)

- Channels: for example RGB (3 channels)
- Kernel size: the size of the receptive field
- Stride: the step between adjacent receptive fields
- Padding: when a receptive field extends past the image border, the missing values may be filled with zeros

![Screenshot_20240119_085052](https://hackmd.io/_uploads/B1__vBwFa.png)

You can think of it as sliding various filters over the image and recording how well each position matches, which builds a feature map.

Here the first layer's input is a 6 * 6, 1-channel (grayscale) image and the filter is 3 * 3, 1 channel. Matching with the filter yields a 4 * 4 feature map (which can be viewed as a new image).

Note: with stride 1 and no padding, $\lfloor (6-3) \div \text{stride} \rfloor + 1 = 4$, so 4 * 4.

![Screenshot_20240119_085250](https://hackmd.io/_uploads/ry6J_HvYa.png)

If the first layer has 64 filters, the resulting feature map has 64 channels, so each filter in the second layer must also have 64 channels.

Note: with stride 1 and no padding, $\lfloor (4-3) \div \text{stride} \rfloor + 1 = 2$, so 2 * 2.

## Max Pooling

![Screenshot_20240119_092243](https://hackmd.io/_uploads/SkdlJ8vYT.png)

Keep only the maximum value in each 2 * 2 window and discard the rest (downsampling).

## Example: Training an MNIST Handwritten Digit Recognition Model

### Code

Reference: [GitHub](https://github.com/pyliaorachel/MNIST-pytorch-tensorflow-eager-interactive?tab=readme-ov-file)

Defining Network Architecture:

```python=
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    # Inherit from `nn.Module`, define `__init__` & `forward`
    def __init__(self):
        super(Net, self).__init__()
        # Define two 2D convolutional layers (1 -> 10 and 10 -> 20 channels)
        # with convolution kernels of size (5 x 5).
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        # Define a dropout layer
        self.conv2_drop = nn.Dropout2d()
        # Define a fully-connected layer (320 x 10)
        self.fc = nn.Linear(320, 10)

    def forward(self, x):
        # Input image size: 28 x 28,
        # input channel: 1,
        # batch size (training): 64

        # Input (64 x 1 x 28 x 28) -> Conv1 (64 x 10 x 24 x 24) -> Max Pooling (64 x 10 x 12 x 12) -> ReLU -> ...
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        # ... -> Conv2 (64 x 20 x 8 x 8) -> Dropout -> Max Pooling (64 x 20 x 4 x 4) -> ReLU -> ...
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        # ... -> Flatten (64 x 320) -> ...
        x = x.view(-1, 320)
        # ... -> FC (64 x 10) -> ...
        x = self.fc(x)
        # ... -> Log Softmax -> Output
        return F.log_softmax(x, dim=1)
```

### Explanation

```python=
self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
```

The first layer takes 1 input channel and outputs 10 channels (10 filters). The second layer takes 10 input channels and outputs 20 channels (20 filters). The receptive field of both layers is 5 * 5.

```python=
# inside forward()
x = F.relu(F.max_pool2d(self.conv1(x), 2))
```

The original image is 28 * 28 with 1 channel. After conv1 as defined above it becomes 24 * 24 with 10 channels. Max pooling over 2 * 2 windows then reduces it to 12 * 12 with 10 channels. Finally ReLU is applied as the activation function ($\max(0, x)$ clips negative values to 0).

Clipping negative values with ReLU can help remove noise ([source](https://bit.ly/3S6hQRR)):

![Screenshot_20240119_101002](https://hackmd.io/_uploads/SknZcIwYT.png)

```python=
# inside forward()
x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
```

The 12 * 12, 10-channel input becomes 8 * 8 with 20 channels after conv2 as defined above. It then passes through a Dropout2d layer, which during training randomly sets entire channels to 0. This helps prevent overfitting and keeps the model from relying too heavily on any single feature. Max pooling over 2 * 2 windows then reduces it to 4 * 4 with 20 channels. As before, ReLU is applied last as the activation function ($\max(0, x)$ clips negative values to 0).

```python=
# inside forward()
x = x.view(-1, 320)
x = self.fc(x)
return F.log_softmax(x, dim=1)
```

The final 4 * 4 * 20 = 320 values are flattened and fed into the linear fully connected layer defined above (320 inputs, 10 outputs). Softmax then maps the 10 outputs to values between 0 and 1 that sum to 1 (probabilities):

![v2-ba306ec4c7294444f9d23ef3bca42ad5_720w](https://hackmd.io/_uploads/rk5DWPPF6.png)

Note: the code uses log_softmax. Mathematically this is softmax followed by a log, but for efficiency it is actually computed with a different formula. The purpose is to avoid underflow or overflow, and it also makes the loss (nll_loss) easier to compute.

> [bloomhkdan.medium.com](https://bloomhkdan.medium.com/log-softmax-73596d7649ab): The log of the softmax function is often used instead of the softmax function itself because it has better numerical stability and is less prone to underflow or overflow errors.
> ![20200716140212637](https://hackmd.io/_uploads/B1gV8vwFa.png)

A more readable way to write it:

```python=
def forward(self, x):
    x = self.conv1(x)
    x = F.max_pool2d(x, 2)
    x = F.relu(x)

    x = self.conv2(x)
    x = self.conv2_drop(x)
    x = F.max_pool2d(x, 2)
    x = F.relu(x)

    x = x.view(-1, 320)  # or x.flatten(start_dim=1)
    x = self.fc(x)
    return F.log_softmax(x, dim=1)
```

### Test Records

Correct predictions out of 10000 test images:

- Default: 9859 / 10000
- batch_size_10: 9897 / 10000
- learning_rate_0.05: 9895 / 10000
- learning_rate_0.1: 9870 / 10000
- batch_size_10 & learning_rate_0.05: 9727 / 10000
- batch_size_10 & learning_rate_0.005: 9891 / 10000

My own modification (3 convolutional layers):

```python=
class Net(nn.Module):
    # Inherit from `nn.Module`, define `__init__` & `forward`
    def __init__(self):
        # Always call the init function of the parent class `nn.Module`
        # so that magics can be set up.
        super(Net, self).__init__()

        # Define the parameters in your network.
        # This is achieved by defining the shapes of the multiple layers in the network.

        # Define three 2D convolutional layers (1 -> 10, 10 -> 40, 40 -> 80 channels)
        # with convolution kernels of size 5, 3 and 2.
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 40, kernel_size=3)
        self.conv3 = nn.Conv2d(40, 80, kernel_size=2)

        # Define the dropout layers
        self.conv2_drop = nn.Dropout2d()
        self.conv3_drop = nn.Dropout2d()

        # Define a fully-connected layer (320 x 10);
        # after the third pooling the feature map is 80 x 2 x 2 = 320 values
        self.fc = nn.Linear(320, 10)

    def forward(self, x):
        # Define the network architecture.
        # This is achieved by defining how the network forward propagates your inputs
        # Input image size: 28 x 28, input channel: 1, batch size (training): 64
        x = self.conv1(x)
        x = F.max_pool2d(x, 2)
        x = F.relu(x)

        x = self.conv2(x)
        x = self.conv2_drop(x)
        x = F.max_pool2d(x, 2)
        x = F.relu(x)

        x = self.conv3(x)
        x = self.conv3_drop(x)
        x = F.max_pool2d(x, 2)
        x = F.relu(x)

        x = x.flatten(start_dim=1)
        x = self.fc(x)
        return F.log_softmax(x, dim=1)
```

Accuracy: 9875 / 10000
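
To see why `nn.Linear(320, 10)` fits both the original 2-layer network and the 3-layer variant, the shape arithmetic can be checked by hand. The following is a minimal pure-Python sketch, not code from the referenced repository. The helper names `conv_out` and `trace` are hypothetical. It assumes the PyTorch defaults used in these notes: convolutions with stride 1 and no padding, and max pooling with stride equal to its kernel size.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a sliding window (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace(size, layers):
    """Follow a square input through (kind, out_channels, kernel) layers.

    kind is "conv" (stride 1, no padding) or "pool" (stride = kernel).
    Returns the final (channels, side length).
    """
    channels = 1  # MNIST images are grayscale
    for kind, out_channels, kernel in layers:
        if kind == "conv":
            size = conv_out(size, kernel)
            channels = out_channels
        else:  # pooling keeps the channel count and shrinks the side
            size = conv_out(size, kernel, stride=kernel)
    return channels, size


# Layer lists mirror the two Net definitions in these notes.
two_layer = [("conv", 10, 5), ("pool", None, 2),
             ("conv", 20, 5), ("pool", None, 2)]
three_layer = [("conv", 10, 5), ("pool", None, 2),
               ("conv", 40, 3), ("pool", None, 2),
               ("conv", 80, 2), ("pool", None, 2)]

for name, layers in [("2-layer", two_layer), ("3-layer", three_layer)]:
    channels, side = trace(28, layers)
    print(name, channels, side, channels * side * side)
# prints:
# 2-layer 20 4 320
# 3-layer 80 2 320
```

Both variants end with 320 flattened values (20 * 4 * 4 and 80 * 2 * 2), which is why the fully connected layer did not need to change. The same `conv_out` helper reproduces the 6 * 6 to 4 * 4 example from the Basic Concepts section via `conv_out(6, 3)`.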