# CNN Notes

## Basic concepts

![Screenshot_20240119_083253](https://hackmd.io/_uploads/S1LJErwF6.png)

CNNs are designed primarily for image recognition.
A neuron does not need to see the whole image, so the layers do not need to be fully connected.

![Screenshot_20240119_083736](https://hackmd.io/_uploads/H1hi4BvFa.png)

The image is therefore split into receptive fields.
Each receptive field is connected to several neurons. Each neuron acts as a filter whose weights are learned, and it responds to one kind of feature in the image.

![Screenshot_20240119_084532](https://hackmd.io/_uploads/SJuELrDFa.png)

- Channels: the number of input planes, for example 3 for an RGB image
- Kernel size: the size of the receptive field
- Stride: the step between neighbouring receptive fields
- Padding: how positions beyond the image border are handled, commonly by filling them with zeros

![Screenshot_20240119_085052](https://hackmd.io/_uploads/B1__vBwFa.png)

Think of it as sliding each filter across the image, recording how well it matches at every position, and collecting the results into a feature map.
Here the first layer takes a 6 * 6, 1-channel (grayscale) image, and each filter is 3 * 3 with 1 channel.
Applying one filter yields a 4 * 4 feature map, which can be viewed as a new image.
Note: with stride 1 and no padding, $\lfloor (6-3) \div \text{stride} \rfloor + 1 = 4$, hence 4 * 4.

![Screenshot_20240119_085250](https://hackmd.io/_uploads/ry6J_HvYa.png)

Suppose the first layer has 64 filters.
The resulting feature map then has 64 channels, so every filter in the second layer must also have 64 channels.
Note: with stride 1 and no padding, $\lfloor (4-3) \div \text{stride} \rfloor + 1 = 2$, hence 2 * 2.

## Max Pooling

![Screenshot_20240119_092243](https://hackmd.io/_uploads/SkdlJ8vYT.png)

Each 2 * 2 block keeps only its maximum value and discards the rest, which downsamples the feature map.

## Example: training an MNIST handwritten-digit recognition model

### Code:

Reference: [GitHub](https://github.com/pyliaorachel/MNIST-pytorch-tensorflow-eager-interactive?tab=readme-ov-file)

Defining Network Architecture:

```python
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    # Inherit from `nn.Module`, define `__init__` & `forward`
    def __init__(self):
        super(Net, self).__init__()
        # Define two 2D convolutional layers (1 x 10, 10 x 20 each)
        # with convolution kernel of size (5 x 5).
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        # Define a dropout layer
        self.conv2_drop = nn.Dropout2d()
        # Define a fully-connected layer (320 x 10)
        self.fc = nn.Linear(320, 10)

    def forward(self, x):
        # Input image size: 28 x 28,
        # input channel: 1,
        # batch size (training): 64

        # Input (64 x 1 x 28 x 28) -> Conv1 (64 x 10 x 24 x 24) -> Max Pooling (64 x 10 x 12 x 12) -> ReLU -> ...
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        # ... -> Conv2 (64 x 20 x 8 x 8) -> Dropout -> Max Pooling (64 x 20 x 4 x 4) -> ReLU -> ...
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        # ... -> Flatten (64 x 320) -> ...
        x = x.view(-1, 320)
        # ... -> FC (64 x 10) -> ...
        x = self.fc(x)
        # ... -> Log Softmax -> Output
        return F.log_softmax(x, dim=1)
```

### Explanation:

```python
self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
```

The first layer takes 1 input channel and produces 10 output channels (10 filters).
The second layer takes 10 input channels and produces 20 output channels (20 filters).
Both layers use a 5 * 5 receptive field.

```python
# inside forward()
x = F.relu(F.max_pool2d(self.conv1(x), 2))
```

The input image is 28 * 28 with 1 channel. After conv1 as defined above it becomes 24 * 24 with 10 channels.
Max pooling over 2 * 2 blocks then reduces it to 12 * 12 with 10 channels.
Finally ReLU is applied as the activation function: $\max(0, x)$ clips negative values to zero.

According to the [source](https://bit.ly/3S6hQRR), clipping negative values with ReLU can help remove noise:
![Screenshot_20240119_101002](https://hackmd.io/_uploads/SknZcIwYT.png)

```python
# inside forward()
x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
```

After conv2 as defined above, the 12 * 12, 10-channel input becomes 8 * 8 with 20 channels.
It then passes through a Dropout2d layer, which randomly sets entire channels to 0 during training. This helps prevent overfitting by keeping the model from relying too heavily on any single feature.
Max pooling over 2 * 2 blocks then reduces it to 4 * 4 with 20 channels.
As before, ReLU is applied last as the activation function ($\max(0, x)$ clips negative values to zero).

```python
# inside forward()
x = x.view(-1, 320)
x = self.fc(x)
return F.log_softmax(x, dim=1)
```

The final 4 * 4 * 20 = 320 values are flattened and fed into the fully connected Linear layer defined above (320 inputs, 10 outputs).
A softmax then maps the 10 outputs to values between 0 and 1 that sum to 1, so they can be read as probabilities:
![v2-ba306ec4c7294444f9d23ef3bca42ad5_720w](https://hackmd.io/_uploads/rk5DWPPF6.png)

Note: the code uses log_softmax. Conceptually this is softmax followed by a log, but for efficiency it is computed with a different formula. It helps avoid underflow and overflow, and it is convenient for computing the loss (nll_loss).

> [bloomhkdan.medium.com](https://bloomhkdan.medium.com/log-softmax-73596d7649ab): The log of the softmax function is often used instead of the softmax function itself because it has better numerical stability and is less prone to underflow or overflow errors.
> ![20200716140212637](https://hackmd.io/_uploads/B1gV8vwFa.png)

A more readable way to write it:

```python
# inside forward()
x = self.conv1(x)
x = F.max_pool2d(x, 2)
x = F.relu(x)

x = self.conv2(x)
x = self.conv2_drop(x)
x = F.max_pool2d(x, 2)
x = F.relu(x)

x = x.view(-1, 320)  # or equivalently: x.flatten(start_dim=1)
x = self.fc(x)
return F.log_softmax(x, dim=1)
```

### Test log

- Default settings: 9859 / 10000
- batch_size = 10: 9897 / 10000
- learning_rate = 0.05: 9895 / 10000
- learning_rate = 0.1: 9870 / 10000
- batch_size = 10 & learning_rate = 0.05: 9727 / 10000
- batch_size = 10 & learning_rate = 0.005: 9891 / 10000

My own modification (3 convolutional layers):

```python
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    # Inherit from `nn.Module`, define `__init__` & `forward`
    def __init__(self):
        # Always call the init function of the parent class `nn.Module`
        # so that magics can be set up.
        super(Net, self).__init__()

        # Define the parameters in your network.
        # This is achieved by defining the shapes of the multiple layers in the network.

        # Define three 2D convolutional layers (1 x 10, 10 x 40, 40 x 80)
        # with convolution kernels of size (5 x 5), (3 x 3) and (2 x 2).
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 40, kernel_size=3)
        self.conv3 = nn.Conv2d(40, 80, kernel_size=2)

        # Define the dropout layers
        self.conv2_drop = nn.Dropout2d()
        self.conv3_drop = nn.Dropout2d()

        # Define a fully-connected layer (320 x 10); 80 channels * 2 * 2 = 320
        self.fc = nn.Linear(320, 10)

    def forward(self, x):
        # Define the network architecture.
        # This is achieved by defining how the network forward propagates your inputs

        # Input image size: 28 x 28, input channel: 1, batch size (training): 64
        x = self.conv1(x)        # 10 x 24 x 24
        x = F.max_pool2d(x, 2)   # 10 x 12 x 12
        x = F.relu(x)

        x = self.conv2(x)        # 40 x 10 x 10
        x = self.conv2_drop(x)
        x = F.max_pool2d(x, 2)   # 40 x 5 x 5
        x = F.relu(x)

        x = self.conv3(x)        # 80 x 4 x 4
        x = self.conv3_drop(x)
        x = F.max_pool2d(x, 2)   # 80 x 2 x 2
        x = F.relu(x)

        x = x.flatten(start_dim=1)  # 320 features per sample
        x = self.fc(x)
        return F.log_softmax(x, dim=1)
```

Accuracy: 9875 / 10000
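
The feature-map sizes quoted throughout these notes can be double-checked with a few lines of plain Python. This is only a minimal sketch and not part of the referenced repository: the helper names `conv_out` and `trace` are hypothetical, and it assumes square inputs, no padding, and rounding down when the window does not fit evenly (the behaviour I believe `nn.Conv2d` and `F.max_pool2d` have by default).

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Side length after sliding a kernel x kernel window over a size x size input."""
    return (size + 2 * padding - kernel) // stride + 1


def trace(size, layers):
    """Apply (kernel, stride) pairs in order and return every intermediate side length."""
    sizes = [size]
    for kernel, stride in layers:
        size = conv_out(size, kernel, stride)
        sizes.append(size)
    return sizes


# Toy example from the notes: 6 x 6 input, two 3 x 3 filters with stride 1.
toy = trace(6, [(3, 1), (3, 1)])

# Two-layer MNIST net: conv 5x5 -> pool 2x2 -> conv 5x5 -> pool 2x2.
two_layer = trace(28, [(5, 1), (2, 2), (5, 1), (2, 2)])

# Three-layer variant: conv 5x5 -> pool -> conv 3x3 -> pool -> conv 2x2 -> pool.
three_layer = trace(28, [(5, 1), (2, 2), (3, 1), (2, 2), (2, 1), (2, 2)])

# Flattened feature count = channels * height * width of the last feature map.
two_layer_features = 20 * two_layer[-1] ** 2
three_layer_features = 80 * three_layer[-1] ** 2

print(toy)                                        # [6, 4, 2]
print(two_layer)                                  # [28, 24, 12, 8, 4]
print(three_layer)                                # [28, 24, 12, 10, 5, 4, 2]
print(two_layer_features, three_layer_features)   # 320 320
```

Both architectures end with 320 flattened features, which is why `nn.Linear(320, 10)` works unchanged in the three-layer variant.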