# CNN Notes
## Basic Concepts

CNNs are designed mainly for image recognition.
Each neuron does not need to look at the entire image (no need to be fully connected).

So the image is split into individual receptive fields,
and each receptive field is connected to several neurons
(each neuron represents an unknown filter, learned to match a feature in the image).

Channels: e.g. RGB (3 channels) for a color image
kernel size: size of each receptive field
stride: spacing between adjacent receptive fields
padding: handles receptive fields that extend beyond the image boundary, typically by zero-padding
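A minimal PyTorch sketch of how these parameters show up in `nn.Conv2d` (the specific sizes here are just for illustration):
```python=
import torch
import torch.nn as nn

# 3-channel (RGB) input, 8 filters, 3 x 3 kernel, stride 2, zero-padding of 1
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, stride=2, padding=1)

x = torch.randn(1, 3, 32, 32)  # a batch of one 32 x 32 RGB image
print(conv(x).shape)           # torch.Size([1, 8, 16, 16])
# Output size per side: floor((32 - 3 + 2*1) / 2) + 1 = 16
```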

You can think of it as matching the image against a set of filters
and assembling the results into a feature map.
Here the first layer's input is a 6 * 6, 1-channel (grayscale) image,
and the filter is 3 * 3 with 1 channel.
Sliding the filter over the image yields a 4 * 4 feature map (which can be viewed as a new image).
Note: with stride = 1, $\lfloor {(6-3) \div \text{stride}} \rfloor + 1 = 4$, hence 4 * 4.

Suppose the first layer has 64 filters in total.
The resulting feature map will then have 64 channels,
so the filters entering the second layer must also have 64 channels.
Note: with stride = 1, $\lfloor {(4-3) \div \text{stride}} \rfloor + 1 = 2$, hence 2 * 2.
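A quick sketch to verify the shapes above in PyTorch (the second layer is also given 64 filters here purely as an assumption for the example):
```python=
import torch
import torch.nn as nn

x = torch.randn(1, 1, 6, 6)                # one 6 x 6 grayscale image
layer1 = nn.Conv2d(1, 64, kernel_size=3)   # 64 filters, each 3 x 3 with 1 channel
layer2 = nn.Conv2d(64, 64, kernel_size=3)  # second-layer filters must have 64 channels

fm1 = layer1(x)
fm2 = layer2(fm1)
print(fm1.shape)  # torch.Size([1, 64, 4, 4])
print(fm2.shape)  # torch.Size([1, 64, 2, 2])
```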
## Max Pooling

Within each 2 * 2 block, keep only the maximum value and discard the rest (downsampling).
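A minimal example of 2 * 2 max pooling with `F.max_pool2d` (the input values are made up for illustration):
```python=
import torch
import torch.nn.functional as F

x = torch.tensor([[1., 2., 5., 6.],
                  [3., 4., 7., 8.],
                  [9., 1., 2., 3.],
                  [4., 5., 6., 7.]]).view(1, 1, 4, 4)

print(F.max_pool2d(x, 2))
# tensor([[[[4., 8.],
#           [9., 7.]]]])  -> the maximum of each 2 x 2 block
```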
## Example: Training an MNIST Handwritten Digit Recognition Model
### Code:
Reference: [Github](https://github.com/pyliaorachel/MNIST-pytorch-tensorflow-eager-interactive?tab=readme-ov-file)
Defining Network Architecture:
```python=
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):  # Inherit from `nn.Module`, define `__init__` & `forward`
    def __init__(self):
        super(Net, self).__init__()
        # Define two 2D convolutional layers (1 x 10, 10 x 20 each)
        # with convolution kernel of size (5 x 5).
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        # Define a dropout layer
        self.conv2_drop = nn.Dropout2d()
        # Define a fully-connected layer (320 x 10)
        self.fc = nn.Linear(320, 10)

    def forward(self, x):
        # Input image size: 28 x 28,
        # input channel: 1,
        # batch size (training): 64
        # Input (64 x 1 x 28 x 28) -> Conv1 (64 x 10 x 24 x 24) -> Max Pooling (64 x 10 x 12 x 12) -> ReLU -> ...
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        # ... -> Conv2 (64 x 20 x 8 x 8) -> Dropout -> Max Pooling (64 x 20 x 4 x 4) -> ReLU -> ...
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        # ... -> Flatten (64 x 320) -> ...
        x = x.view(-1, 320)
        # ... -> FC (64 x 10) -> ...
        x = self.fc(x)
        # ... -> Log Softmax -> Output
        return F.log_softmax(x, dim=1)
```
### Explanation:
```python=
self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
```
The first layer is defined to take 1 input channel and produce 10 output channels (10 filters);
the second layer takes 10 input channels and produces 20 output channels (20 filters).
Both layers use a 5 * 5 receptive field (kernel).
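One way to see the "filters = output channels" bookkeeping is to inspect the weight shapes of the two layers (a small sketch based on the definitions above):
```python=
import torch.nn as nn

conv1 = nn.Conv2d(1, 10, kernel_size=5)
conv2 = nn.Conv2d(10, 20, kernel_size=5)

print(conv1.weight.shape)  # torch.Size([10, 1, 5, 5])  -> 10 filters, each 1 x 5 x 5
print(conv2.weight.shape)  # torch.Size([20, 10, 5, 5]) -> 20 filters, each 10 x 5 x 5
```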
```python=
# in the forward() function
x = F.relu(F.max_pool2d(self.conv1(x), 2))
```
The original image is 28 * 28 with 1 channel; after conv1 as defined above, it becomes 24 * 24 with 10 channels.
It then goes through 2 * 2 max pooling and becomes 12 * 12 with 10 channels.
Finally, ReLU is applied as the activation function ($\max(0, x)$, clipping negative values to zero).
Clipping negative values with ReLU helps remove noise ([source](https://bit.ly/3S6hQRR)).
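A quick shape trace of this line, assuming a training batch of 64 images as in the comments above:
```python=
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 10, kernel_size=5)

x = torch.randn(64, 1, 28, 28)  # a batch of 64 grayscale 28 x 28 images
x = conv1(x)
print(x.shape)                  # torch.Size([64, 10, 24, 24])
x = F.max_pool2d(x, 2)
print(x.shape)                  # torch.Size([64, 10, 12, 12])
x = F.relu(x)                   # shape unchanged, negative values clipped to 0
```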

```python=
# in the forward() function
F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
```
The 12 * 12, 10-channel feature map goes through conv2 as defined above and becomes 8 * 8 with 20 channels.
It then passes through a Dropout2d layer
(which randomly zeroes entire channels, an effective way to prevent overfitting by keeping the model from relying too heavily on any single feature).
Next, 2 * 2 max pooling reduces it to 4 * 4 with 20 channels.
Finally, ReLU is again applied as the activation function ($\max(0, x)$, clipping negative values).
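A small sketch of what `nn.Dropout2d` does during training (which channels get zeroed is random, so the printed values are just one possible outcome):
```python=
import torch
import torch.nn as nn

drop = nn.Dropout2d(p=0.5)
drop.train()                # dropout is only active in training mode

x = torch.ones(1, 4, 2, 2)  # 4 channels, each filled with ones
y = drop(x)
print(y[0, :, 0, 0])        # e.g. tensor([2., 0., 0., 2.]) -- whole channels zeroed,
                            # surviving channels scaled by 1 / (1 - p)
```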
```python=
# in the forward() function
x = x.view(-1, 320)
x = self.fc(x)
return F.log_softmax(x, dim=1)
```
The final 4 * 4 * 20 = 320 values are flattened,
then passed through the fully-connected linear layer defined above (320 inputs, 10 outputs).
Finally, softmax maps the outputs into the range 0~1 (probabilities).

Note: log_softmax is used here, which conceptually means applying softmax and then taking the log, but in practice a different formula is used for efficiency. It avoids underflow/overflow and makes the loss computation convenient (nll_loss).
> [bloomhkdan.medium.com](https://bloomhkdan.medium.com/log-softmax-73596d7649ab):The log of the softmax function is often used instead of the softmax function itself because it has better numerical stability and is less prone to underflow or overflow errors.
> 
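A quick numerical sketch of why `log_softmax` pairs naturally with `nll_loss` (the logits below are made up; `F.cross_entropy`, which combines the two steps, is shown only as a cross-check):
```python=
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, -1.0, 0.5]])  # raw scores for 3 classes
target = torch.tensor([0])                 # index of the true class

log_p = F.log_softmax(logits, dim=1)
# Same value as log(softmax(x)), but computed in a numerically stabler way
print(torch.allclose(log_p, torch.log(F.softmax(logits, dim=1))))  # True

# nll_loss on log-probabilities == cross_entropy on raw logits
print(torch.allclose(F.nll_loss(log_p, target),
                     F.cross_entropy(logits, target)))             # True
```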
A more explicit way to write the same forward pass:
```python=
# in the forward() function
x = self.conv1(x)
x = F.max_pool2d(x, 2)
x = F.relu(x)
x = self.conv2(x)
x = self.conv2_drop(x)
x = F.max_pool2d(x, 2)
x = F.relu(x)
x = x.view(-1, 320)  # or equivalently: x.flatten(start_dim=1)
x = self.fc(x)
return F.log_softmax(x, dim=1)
```
### Test results
Default: 9859 / 10000
batch_size_10: 9897 / 10000
learning_rate_0.05: 9895 / 10000
learning_rate_0.1: 9870 / 10000
batch_size_10 & learning_rate_0.05: 9727 / 10000
batch_size_10 & learning_rate_0.005: 9891 / 10000
My own attempt at modifying the network (3 convolutional layers):
```python=
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):  # Inherit from `nn.Module`, define `__init__` & `forward`
    def __init__(self):
        # Always call the init function of the parent class `nn.Module`
        # so that magics can be set up.
        super(Net, self).__init__()
        # Define the parameters in your network.
        # This is achieved by defining the shapes of the multiple layers in the network.
        # Define three 2D convolutional layers (1 -> 10, 10 -> 40, 40 -> 80)
        # with kernel sizes 5 x 5, 3 x 3, and 2 x 2 respectively.
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 40, kernel_size=3)
        self.conv3 = nn.Conv2d(40, 80, kernel_size=2)
        # Define dropout layers
        self.conv2_drop = nn.Dropout2d()
        self.conv3_drop = nn.Dropout2d()
        # Define a fully-connected layer (320 x 10); 80 channels * 2 * 2 = 320
        self.fc = nn.Linear(320, 10)

    def forward(self, x):
        # Define the network architecture.
        # This is achieved by defining how the network forward propagates your inputs.
        # Input image size: 28 x 28, input channel: 1, batch size (training): 64
        x = self.conv1(x)            # -> 10 x 24 x 24
        x = F.max_pool2d(x, 2)       # -> 10 x 12 x 12
        x = F.relu(x)
        x = self.conv2(x)            # -> 40 x 10 x 10
        x = self.conv2_drop(x)
        x = F.max_pool2d(x, 2)       # -> 40 x 5 x 5
        x = F.relu(x)
        x = self.conv3(x)            # -> 80 x 4 x 4
        x = self.conv3_drop(x)
        x = F.max_pool2d(x, 2)       # -> 80 x 2 x 2
        x = F.relu(x)
        x = x.flatten(start_dim=1)   # -> 320
        x = self.fc(x)               # -> 10
        return F.log_softmax(x, dim=1)
```
Accuracy: 9875 / 10000