# Data Science Homework 6 - R10922070
## Hyperparameters and model
- Model Structure
Trained for 70 epochs, selecting the model checkpoint from epoch 57.
```
Loading weights: model/gpu_57
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 48, 28, 28]           3,648
              ReLU-2           [-1, 48, 28, 28]               0
         MaxPool2d-3           [-1, 48, 14, 14]               0
       BatchNorm2d-4           [-1, 48, 14, 14]              96
           Dropout-5           [-1, 48, 14, 14]               0
            Conv2d-6          [-1, 108, 14, 14]         129,708
              ReLU-7          [-1, 108, 14, 14]               0
         MaxPool2d-8            [-1, 108, 7, 7]               0
       BatchNorm2d-9            [-1, 108, 7, 7]             216
          Dropout-10            [-1, 108, 7, 7]               0
           Conv2d-11            [-1, 192, 7, 7]         518,592
             ReLU-12            [-1, 192, 7, 7]               0
        MaxPool2d-13            [-1, 192, 3, 3]               0
      BatchNorm2d-14            [-1, 192, 3, 3]             384
          Dropout-15            [-1, 192, 3, 3]               0
           Linear-16                  [-1, 128]         221,312
             ReLU-17                  [-1, 128]               0
          Dropout-18                  [-1, 128]               0
           Linear-19                  [-1, 128]          16,512
             ReLU-20                  [-1, 128]               0
          Dropout-21                  [-1, 128]               0
           Linear-22                   [-1, 10]           1,290
================================================================
Total params: 891,758
----------------------------------------------------------------
```
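The summary pins down most of the architecture: the parameter counts imply 5×5 kernels with padding 2 and a 3-channel 28×28 input (e.g. 48 × (3 × 5 × 5) + 48 = 3,648). A minimal sketch under those inferences follows; the dropout probabilities are placeholders, not the actual values used.

```python
import torch.nn as nn

# Sketch of a model matching the summary above. Kernel size 5 with padding 2
# and the 3-channel input are inferred from the parameter counts; the dropout
# probabilities are placeholders chosen to increase with depth.
model = nn.Sequential(
    nn.Conv2d(3, 48, kernel_size=5, padding=2),    # 48*(3*5*5)+48   = 3,648
    nn.ReLU(),
    nn.MaxPool2d(2),                               # 28x28 -> 14x14
    nn.BatchNorm2d(48),
    nn.Dropout(0.1),
    nn.Conv2d(48, 108, kernel_size=5, padding=2),  # 108*(48*5*5)+108 = 129,708
    nn.ReLU(),
    nn.MaxPool2d(2),                               # 14x14 -> 7x7
    nn.BatchNorm2d(108),
    nn.Dropout(0.2),
    nn.Conv2d(108, 192, kernel_size=5, padding=2), # 192*(108*5*5)+192 = 518,592
    nn.ReLU(),
    nn.MaxPool2d(2),                               # 7x7 -> 3x3 (floor)
    nn.BatchNorm2d(192),
    nn.Dropout(0.3),
    nn.Flatten(),                                  # not shown in the summary;
                                                   # presumably done in forward()
    nn.Linear(192 * 3 * 3, 128),                   # 221,312
    nn.ReLU(),
    nn.Dropout(0.4),
    nn.Linear(128, 128),                           # 16,512
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(128, 10),                            # 1,290
)
```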
- Criterion
```nn.CrossEntropyLoss()```
- Optimizer
```optim.Adam(model.parameters(), lr=0.001)```
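For context, a minimal training-step sketch using this criterion and optimizer; `model` and `train_loader` are assumed to exist and are not part of the original listing.

```python
import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# One epoch over a hypothetical `train_loader` yielding (images, labels).
model.train()
for images, labels in train_loader:
    optimizer.zero_grad()
    logits = model(images)            # raw scores; CrossEntropyLoss applies log-softmax
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
```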
- Data preprocessing and augmentation
```img_tsrm = transforms.Lambda(lambda image: torch.div(image.float(), 255))```
```aug_tsrm = transforms.RandomRotation([-45, 45])```
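These two transforms might be chained as below; the composition order (scale to [0, 1] first, then rotate) is an assumption, not stated in the original.

```python
import torch
from torchvision import transforms

# Scale uint8 pixels to [0, 1], then rotate by a random angle in [-45, 45] degrees.
img_tsrm = transforms.Lambda(lambda image: torch.div(image.float(), 255))
aug_tsrm = transforms.RandomRotation([-45, 45])
train_tsrm = transforms.Compose([img_tsrm, aug_tsrm])
```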
## Design
- Convolution layer
This model has three convolution layers. The number of channels in each layer is chosen somewhat arbitrarily, each being a square number times 3: the first layer has 4 × 4 × 3 = 48 channels, the second 6 × 6 × 3 = 108, and the third 8 × 8 × 3 = 192.
- Order of layers
conv -> ReLU -> maxpool -> bn
I wanted the output of each convolution block to have a well-behaved distribution, so I put batch normalization at the end of the block. I chose max pooling over other pooling methods because it seems better at picking up edges, which is beneficial for recognizing digits. The relative order of ReLU and max pooling is irrelevant: ReLU is monotonic non-decreasing, so swapping the two layers doesn't change the output (a quick check is shown below).
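A short sketch verifying that claim on a random feature map (the tensor shape here is arbitrary):

```python
import torch
import torch.nn.functional as F

# Random feature map: batch of 4, 48 channels, 14x14 spatial size.
x = torch.randn(4, 48, 14, 14)

# ReLU is monotonic, so it commutes with the max taken over each pooling
# window: max(relu(x)) == relu(max(x)) element-wise.
a = F.max_pool2d(F.relu(x), kernel_size=2)
b = F.relu(F.max_pool2d(x, kernel_size=2))
print(torch.equal(a, b))  # True
```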
- Dense layer
There are two fully connected dense layers, each with 128 neurons. This is, again, a somewhat arbitrary choice.
- Dropout layer
After every block of layers there's a dropout layer to prevent overfitting. The dropout probability increases the deeper the layer sits in the model.
## Result
<p style="text-align: center">
<img width="550" src="https://i.imgur.com/dmH6M7i.png" />
</p>
<p style="text-align: center">
<img width="550" src="https://i.imgur.com/F9pTywT.png" />
</p>
From the confusion matrix, the model most often confuses 3 with 5, 4 with 9, and 5 with 6.
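A confusion matrix like the one above can be built up from the model's predictions; a minimal sketch, assuming hypothetical `model` and `test_loader` objects:

```python
import torch

# Accumulate a 10x10 confusion matrix: rows are true digits, columns predictions.
confusion = torch.zeros(10, 10, dtype=torch.long)
model.eval()
with torch.no_grad():
    for images, labels in test_loader:
        preds = model(images).argmax(dim=1)
        for t, p in zip(labels, preds):
            confusion[t, p] += 1
print(confusion)
```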
<p style="text-align: center">
<img width="550" src="https://i.imgur.com/ebt2Edd.png" />
</p>