# Data Science Homework 6 - R10922070
## Hyperparameters and model
- Model Structure
Trained for 70 epochs, selecting the model checkpoint from epoch 57.
```
Loading weights: model/gpu_57
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 48, 28, 28]           3,648
              ReLU-2           [-1, 48, 28, 28]               0
         MaxPool2d-3           [-1, 48, 14, 14]               0
       BatchNorm2d-4           [-1, 48, 14, 14]              96
           Dropout-5           [-1, 48, 14, 14]               0
            Conv2d-6          [-1, 108, 14, 14]         129,708
              ReLU-7          [-1, 108, 14, 14]               0
         MaxPool2d-8            [-1, 108, 7, 7]               0
       BatchNorm2d-9            [-1, 108, 7, 7]             216
          Dropout-10            [-1, 108, 7, 7]               0
           Conv2d-11            [-1, 192, 7, 7]         518,592
             ReLU-12            [-1, 192, 7, 7]               0
        MaxPool2d-13            [-1, 192, 3, 3]               0
      BatchNorm2d-14            [-1, 192, 3, 3]             384
          Dropout-15            [-1, 192, 3, 3]               0
           Linear-16                  [-1, 128]         221,312
             ReLU-17                  [-1, 128]               0
          Dropout-18                  [-1, 128]               0
           Linear-19                  [-1, 128]          16,512
             ReLU-20                  [-1, 128]               0
          Dropout-21                  [-1, 128]               0
           Linear-22                   [-1, 10]           1,290
================================================================
Total params: 891,758
----------------------------------------------------------------
```
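The summary pins down most of the architecture: the parameter counts imply 5×5 kernels with padding 2 and a 3-channel 28×28 input (e.g. 48 × (3 × 5 × 5) + 48 = 3,648). A minimal sketch under those inferences follows; the dropout probabilities are placeholders, not the actual values used.

```python
import torch.nn as nn

# Sketch of a model matching the summary above. Kernel size 5 with padding 2
# and the 3-channel input are inferred from the parameter counts; the dropout
# probabilities are placeholders chosen to increase with depth.
model = nn.Sequential(
    nn.Conv2d(3, 48, kernel_size=5, padding=2),    # 48*(3*5*5)+48   = 3,648
    nn.ReLU(),
    nn.MaxPool2d(2),                               # 28x28 -> 14x14
    nn.BatchNorm2d(48),
    nn.Dropout(0.1),
    nn.Conv2d(48, 108, kernel_size=5, padding=2),  # 108*(48*5*5)+108 = 129,708
    nn.ReLU(),
    nn.MaxPool2d(2),                               # 14x14 -> 7x7
    nn.BatchNorm2d(108),
    nn.Dropout(0.2),
    nn.Conv2d(108, 192, kernel_size=5, padding=2), # 192*(108*5*5)+192 = 518,592
    nn.ReLU(),
    nn.MaxPool2d(2),                               # 7x7 -> 3x3 (floor)
    nn.BatchNorm2d(192),
    nn.Dropout(0.3),
    nn.Flatten(),                                  # not shown in the summary;
                                                   # presumably done in forward()
    nn.Linear(192 * 3 * 3, 128),                   # 221,312
    nn.ReLU(),
    nn.Dropout(0.4),
    nn.Linear(128, 128),                           # 16,512
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(128, 10),                            # 1,290
)
```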
- Criterion
```nn.CrossEntropyLoss()```
- Optimizer
```optim.Adam(model.parameters(), lr=0.001)```
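For context, a minimal training-step sketch using this criterion and optimizer; `model` and `train_loader` are assumed to exist and are not part of the original listing.

```python
import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# One epoch over a hypothetical `train_loader` yielding (images, labels).
model.train()
for images, labels in train_loader:
    optimizer.zero_grad()
    logits = model(images)            # raw scores; CrossEntropyLoss applies log-softmax
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
```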
- Data preprocessing and augmentation
```img_tsrm = transforms.Lambda(lambda image: torch.div(image.float(), 255))```
```aug_tsrm = transforms.RandomRotation([-45, 45])```
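These two transforms might be chained as below; the composition order (scale to [0, 1] first, then rotate) is an assumption, not stated in the original.

```python
import torch
from torchvision import transforms

# Scale uint8 pixels to [0, 1], then rotate by a random angle in [-45, 45] degrees.
img_tsrm = transforms.Lambda(lambda image: torch.div(image.float(), 255))
aug_tsrm = transforms.RandomRotation([-45, 45])
train_tsrm = transforms.Compose([img_tsrm, aug_tsrm])
```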
## Design
- Convolution layer
This model has three convolution layers. The number of channels in each layer is chosen somewhat arbitrarily, each being a square number times 3: the first layer has 4 × 4 × 3 = 48 channels, the second 6 × 6 × 3 = 108, and the third 8 × 8 × 3 = 192.
- Order of layers
conv -> ReLU -> maxpool -> bn
I wanted the output of each convolution block to have a well-behaved distribution, so I put batch normalization at the end of the block. I chose max pooling over other pooling methods because it seems better at picking up edges, which is beneficial for recognizing digits. The relative order of ReLU and max pooling is irrelevant: ReLU is monotonic non-decreasing, so swapping the two layers doesn't change the output (a quick check is shown below).
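A short sketch verifying that claim on a random feature map (the tensor shape here is arbitrary):

```python
import torch
import torch.nn.functional as F

# Random feature map: batch of 4, 48 channels, 14x14 spatial size.
x = torch.randn(4, 48, 14, 14)

# ReLU is monotonic, so it commutes with the max taken over each pooling
# window: max(relu(x)) == relu(max(x)) element-wise.
a = F.max_pool2d(F.relu(x), kernel_size=2)
b = F.relu(F.max_pool2d(x, kernel_size=2))
print(torch.equal(a, b))  # True
```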
- Dense layer
There are two fully connected dense layers, each with 128 neurons. This is, again, a somewhat arbitrary choice.
- Dropout layer
After every block of layers there's a dropout layer to prevent overfitting. The dropout probability increases the deeper the layer sits in the model.
## Result
<p style="text-align: center">
<img width="550" src="https://i.imgur.com/dmH6M7i.png" />
</p>
<p style="text-align: center">
<img width="550" src="https://i.imgur.com/F9pTywT.png" />
</p>
From the confusion matrix, the model most often confuses 3 with 5, 4 with 9, and 5 with 6.
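A confusion matrix like the one above can be built up from the model's predictions; a minimal sketch, assuming hypothetical `model` and `test_loader` objects:

```python
import torch

# Accumulate a 10x10 confusion matrix: rows are true digits, columns predictions.
confusion = torch.zeros(10, 10, dtype=torch.long)
model.eval()
with torch.no_grad():
    for images, labels in test_loader:
        preds = model(images).argmax(dim=1)
        for t, p in zip(labels, preds):
            confusion[t, p] += 1
print(confusion)
```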
<p style="text-align: center">
<img width="550" src="https://i.imgur.com/ebt2Edd.png" />
</p>