Inception-V1 (GoogLeNet): Summary and Implementation
This post is divided into 2 sections: Summary and Implementation.
We are going to take an in-depth look at the Going Deeper with Convolutions paper, which introduces the Inception-V1/GoogLeNet architecture.
The implementation uses PyTorch as its framework. To see the full implementation, please refer to this repository.
Also, if you want to read other "Summary and Implementation" posts, feel free to check them out on my blog.
I) Summary
- The paper Going Deeper with Convolutions introduces the first version of the Inception model, called GoogLeNet.
- During ILSVRC-2014, it achieved 1st place in the classification task (top-5 test error = 6.67%).
- It has about 6.8 million parameters (without the auxiliary layers), which is roughly 9x fewer than AlexNet (ILSVRC-2012 winner) and 20x fewer than its competitor VGG-16.
- In most standard network architectures, there is no clear intuition for when to perform a max-pooling operation and when to use a convolutional operation. For example, in AlexNet some convolutional layers are immediately followed by a max-pooling layer, whereas in VGG-16 up to 3 convolutional layers are stacked in a row before a single max-pooling layer.
- Thus, the idea behind GoogLeNet is to use all of these operations at the same time: it applies kernels of different sizes (and a max-pooling operation) to the same input map in parallel and concatenates their results into a single output. This block is called an Inception module.
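To make the idea concrete, here is a minimal sketch of a "naive" Inception module, assuming PyTorch. `NaiveInception` and its branch widths are hypothetical and chosen only for illustration; unlike the implementation further down, it has no 1x1 reduction layers. Padding keeps every branch at the same spatial size, so the outputs can be concatenated along the channel dimension.

```python
import torch
import torch.nn as nn


class NaiveInception(nn.Module):
    """Hypothetical naive Inception module: parallel 1x1, 3x3, 5x5 convolutions
    and 3x3 max pooling over the same input, concatenated along channels."""

    def __init__(self, in_channels, c1, c3, c5):
        super().__init__()
        # Padding is chosen so every branch keeps the input's height and width.
        self.conv1 = nn.Conv2d(in_channels, c1, kernel_size=1)
        self.conv3 = nn.Conv2d(in_channels, c3, kernel_size=3, padding=1)
        self.conv5 = nn.Conv2d(in_channels, c5, kernel_size=5, padding=2)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        branches = [
            torch.relu(self.conv1(x)),
            torch.relu(self.conv3(x)),
            torch.relu(self.conv5(x)),
            self.pool(x),  # pooling keeps all in_channels
        ]
        return torch.cat(branches, dim=1)  # stack branch outputs along channels


x = torch.randn(1, 192, 28, 28)
block = NaiveInception(192, c1=64, c3=128, c5=32)
out = block(x)
print(out.shape)  # torch.Size([1, 416, 28, 28])
```

The output has 64 + 128 + 32 + 192 = 416 channels: the pooling branch alone carries all 192 input channels, so stacking naive modules makes the channel count (and the cost of the 3x3 and 5x5 convolutions) grow quickly. This is why the modules actually used in GoogLeNet insert 1x1 convolutions before the 3x3 and 5x5 convolutions and after the pooling.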
Here is its architecture:
- There are:
  - 9 Inception modules (red box).
  - Global average pooling, used instead of fully-connected layers before the classifier.
    - Only a single linear layer follows it, which makes it easy to adapt and fine-tune the network for other label sets.
  - 2 auxiliary softmax classifiers (green box), attached after Inception modules 4a and 4d.
    - Their role is to push the network toward its goal by making the intermediate features discriminative enough to classify the image on their own, which also strengthens the gradient signal that reaches the lower layers.
    - It turns out that softmax0 and softmax1 also have a regularization effect.
    - During training, their losses are added to the total loss with a discount weight of 0.3.
    - During inference, they are discarded.
    - Structure:
      - Average pooling layer with 5×5 filter size and stride 3, resulting in an output size of:
        - 4x4x512 for the 1st green box.
        - 4x4x528 for the 2nd green box.
      - 128 1x1 convolutions + ReLU.
      - Fully-connected layer with 1024 units + ReLU.
      - Dropout with a 70% drop ratio.
      - Linear layer (1000 classes) + Softmax.
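The way the auxiliary losses enter training can be sketched as follows, assuming PyTorch and integer class labels. `googlenet_loss` is a hypothetical helper, not part of the post's repository. `F.cross_entropy` applies log-softmax internally, so the "Softmax" layers above correspond to raw logits fed into the loss.

```python
import torch
import torch.nn.functional as F


def googlenet_loss(main_logits, aux1_logits, aux2_logits, targets, aux_weight=0.3):
    """Hypothetical helper: main loss plus both auxiliary losses,
    each auxiliary term discounted by aux_weight (0.3 in the paper)."""
    main_loss = F.cross_entropy(main_logits, targets)
    aux_loss = F.cross_entropy(aux1_logits, targets) + F.cross_entropy(aux2_logits, targets)
    return main_loss + aux_weight * aux_loss


# Usage: all-zero logits mean uniform predictions over 10 classes,
# so each cross-entropy term equals ln(10).
targets = torch.tensor([3, 7])
zeros = torch.zeros(2, 10)
loss = googlenet_loss(zeros, zeros, zeros, targets)
print(loss.item())  # about 1.6 * ln(10), i.e. roughly 3.684
```

At inference time only `main_logits` would be used (for example via `argmax`), matching the statement that the auxiliary classifiers are discarded.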
II) Implementation
1) Architecture build
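```python
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    # Assumed helper (called but not shown in this post): Conv2d -> BatchNorm2d -> ReLU.
    # BatchNorm is a modern addition; the original paper uses Conv2d -> ReLU only.
    def __init__(self, in_channels, out_channels, **kwargs):
        super(ConvBlock, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))


class InceptionAux(nn.Module):
    # Assumed auxiliary classifier (called but not shown in this post), following the
    # structure listed in the summary. Expects 14x14 feature maps (224x224 input images).
    def __init__(self, in_channels, num_classes):
        super(InceptionAux, self).__init__()
        self.pool = nn.AvgPool2d(kernel_size=5, stride=3)  # 14x14 -> 4x4
        self.conv = ConvBlock(in_channels, 128, kernel_size=1, stride=1, padding=0)
        self.fc1 = nn.Linear(128 * 4 * 4, 1024)
        self.relu = nn.ReLU(inplace=True)
        self.dropout = nn.Dropout(0.7)
        self.fc2 = nn.Linear(1024, num_classes)  # softmax is applied inside the loss

    def forward(self, x):
        x = self.conv(self.pool(x))
        x = torch.flatten(x, 1)
        x = self.dropout(self.relu(self.fc1(x)))
        return self.fc2(x)


class InceptionModule(nn.Module):
    def __init__(self, in_channels, f_1x1, f_3x3_r, f_3x3, f_5x5_r, f_5x5, f_pp):
        super(InceptionModule, self).__init__()
        # Branch 1: 1x1 convolution
        self.branch1 = nn.Sequential(
            ConvBlock(in_channels, f_1x1, kernel_size=1, stride=1, padding=0)
        )
        # Branch 2: 1x1 reduction, then 3x3 convolution
        self.branch2 = nn.Sequential(
            ConvBlock(in_channels, f_3x3_r, kernel_size=1, stride=1, padding=0),
            ConvBlock(f_3x3_r, f_3x3, kernel_size=3, stride=1, padding=1)
        )
        # Branch 3: 1x1 reduction, then 5x5 convolution
        self.branch3 = nn.Sequential(
            ConvBlock(in_channels, f_5x5_r, kernel_size=1, stride=1, padding=0),
            ConvBlock(f_5x5_r, f_5x5, kernel_size=5, stride=1, padding=2)
        )
        # Branch 4: 3x3 max pooling, then 1x1 projection ("pool proj")
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1, ceil_mode=True),
            ConvBlock(in_channels, f_pp, kernel_size=1, stride=1, padding=0)
        )

    def forward(self, x):
        branch1 = self.branch1(x)
        branch2 = self.branch2(x)
        branch3 = self.branch3(x)
        branch4 = self.branch4(x)
        # All branches keep the spatial size, so they are concatenated along channels
        return torch.cat([branch1, branch2, branch3, branch4], 1)


class GoogLeNet(nn.Module):
    def __init__(self, num_classes = 10):
        super(GoogLeNet, self).__init__()
        # Stem (spatial sizes given for a 224x224 input)
        self.conv1 = ConvBlock(3, 64, kernel_size=7, stride=2, padding=3)    # -> 112x112
        self.pool1 = nn.MaxPool2d(3, stride=2, padding=0, ceil_mode=True)    # -> 56x56
        self.conv2 = ConvBlock(64, 64, kernel_size=1, stride=1, padding=0)
        self.conv3 = ConvBlock(64, 192, kernel_size=3, stride=1, padding=1)
        self.pool3 = nn.MaxPool2d(3, stride=2, padding=0, ceil_mode=True)    # -> 28x28

        self.inception3A = InceptionModule(in_channels=192,
                                           f_1x1=64,
                                           f_3x3_r=96,
                                           f_3x3=128,
                                           f_5x5_r=16,
                                           f_5x5=32,
                                           f_pp=32)
        self.inception3B = InceptionModule(in_channels=256,
                                           f_1x1=128,
                                           f_3x3_r=128,
                                           f_3x3=192,
                                           f_5x5_r=32,
                                           f_5x5=96,
                                           f_pp=64)
        self.pool4 = nn.MaxPool2d(3, stride=2, padding=0, ceil_mode=True)    # -> 14x14

        self.inception4A = InceptionModule(in_channels=480,
                                           f_1x1=192,
                                           f_3x3_r=96,
                                           f_3x3=208,
                                           f_5x5_r=16,
                                           f_5x5=48,
                                           f_pp=64)
        self.inception4B = InceptionModule(in_channels=512,
                                           f_1x1=160,
                                           f_3x3_r=112,
                                           f_3x3=224,
                                           f_5x5_r=24,
                                           f_5x5=64,
                                           f_pp=64)
        self.inception4C = InceptionModule(in_channels=512,
                                           f_1x1=128,
                                           f_3x3_r=128,
                                           f_3x3=256,
                                           f_5x5_r=24,
                                           f_5x5=64,
                                           f_pp=64)
        self.inception4D = InceptionModule(in_channels=512,
                                           f_1x1=112,
                                           f_3x3_r=144,
                                           f_3x3=288,
                                           f_5x5_r=32,
                                           f_5x5=64,
                                           f_pp=64)
        self.inception4E = InceptionModule(in_channels=528,
                                           f_1x1=256,
                                           f_3x3_r=160,
                                           f_3x3=320,
                                           f_5x5_r=32,
                                           f_5x5=128,
                                           f_pp=128)
        self.pool5 = nn.MaxPool2d(3, stride=2, padding=0, ceil_mode=True)    # -> 7x7

        self.inception5A = InceptionModule(in_channels=832,
                                           f_1x1=256,
                                           f_3x3_r=160,
                                           f_3x3=320,
                                           f_5x5_r=32,
                                           f_5x5=128,
                                           f_pp=128)
        self.inception5B = InceptionModule(in_channels=832,
                                           f_1x1=384,
                                           f_3x3_r=192,
                                           f_3x3=384,
                                           f_5x5_r=48,
                                           f_5x5=128,
                                           f_pp=128)

        # Classifier: global average pooling + dropout (40%) + single linear layer
        self.pool6 = nn.AdaptiveAvgPool2d((1,1))
        self.dropout = nn.Dropout(0.4)
        self.fc = nn.Linear(1024, num_classes)

        # Auxiliary classifiers on the outputs of 4A (512 channels) and 4D (528 channels)
        self.aux4A = InceptionAux(512, num_classes)
        self.aux4D = InceptionAux(528, num_classes)

    def forward(self, x):
        x = self.conv1(x)
        x = self.pool1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.pool3(x)
        x = self.inception3A(x)
        x = self.inception3B(x)
        x = self.pool4(x)
        x = self.inception4A(x)
        aux1 = self.aux4A(x)
        x = self.inception4B(x)
        x = self.inception4C(x)
        x = self.inception4D(x)
        aux2 = self.aux4D(x)
        x = self.inception4E(x)
        x = self.pool5(x)
        x = self.inception5A(x)
        x = self.inception5B(x)
        x = self.pool6(x)
        x = torch.flatten(x,1)
        x = self.dropout(x)
        x = self.fc(x)
        # aux1 and aux2 only feed the training loss; ignore them at inference
        return x, aux1, aux2
```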
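As a sanity check of the hyperparameters above, here is a small pure-Python sketch (no PyTorch needed). `conv_out`, `pool_out` and `trace` are hypothetical helpers that reproduce PyTorch's output-size formula, including the `ceil_mode=True` rounding. The sketch also verifies that each Inception module's concatenated output (f_1x1 + f_3x3 + f_5x5 + f_pp) matches the next module's `in_channels`.

```python
import math


def conv_out(size, kernel, stride, padding):
    """Output size of a Conv2d along one spatial dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride, padding=0, ceil_mode=False):
    """Output size of a PyTorch pooling layer along one spatial dimension."""
    span = size + 2 * padding - kernel
    out = (math.ceil(span / stride) if ceil_mode else span // stride) + 1
    # PyTorch drops a last window that would start inside the right padding
    if ceil_mode and (out - 1) * stride >= size + padding:
        out -= 1
    return out


def trace(input_size):
    """Spatial size entering each stage of the GoogLeNet class above."""
    s = conv_out(input_size, kernel=7, stride=2, padding=3)  # conv1
    s = pool_out(s, 3, 2, ceil_mode=True)                    # pool1 (conv2/conv3 keep the size)
    stage3 = pool_out(s, 3, 2, ceil_mode=True)               # pool3 -> inception 3A, 3B
    stage4 = pool_out(stage3, 3, 2, ceil_mode=True)          # pool4 -> inception 4A..4E
    stage5 = pool_out(stage4, 3, 2, ceil_mode=True)          # pool5 -> inception 5A, 5B
    aux = pool_out(stage4, 5, 3)                             # auxiliary 5x5/3 average pooling
    return stage3, stage4, stage5, aux


# (name, in_channels, f_1x1, f_3x3_r, f_3x3, f_5x5_r, f_5x5, f_pp) from GoogLeNet.__init__
configs = [
    ("3A", 192, 64, 96, 128, 16, 32, 32),
    ("3B", 256, 128, 128, 192, 32, 96, 64),
    ("4A", 480, 192, 96, 208, 16, 48, 64),
    ("4B", 512, 160, 112, 224, 24, 64, 64),
    ("4C", 512, 128, 128, 256, 24, 64, 64),
    ("4D", 512, 112, 144, 288, 32, 64, 64),
    ("4E", 528, 256, 160, 320, 32, 128, 128),
    ("5A", 832, 256, 160, 320, 32, 128, 128),
    ("5B", 832, 384, 192, 384, 48, 128, 128),
]
out_channels = {}
for i, (name, c_in, f1, r3, f3, r5, f5, fpp) in enumerate(configs):
    out_channels[name] = f1 + f3 + f5 + fpp  # reductions do not reach the output
    if i + 1 < len(configs):
        # the concatenated output must match the next module's in_channels
        assert out_channels[name] == configs[i + 1][1], name

print(trace(224))  # (28, 14, 7, 4)
print(out_channels["4A"], out_channels["4D"], out_channels["5B"])  # 512 528 1024
print(trace(32)[1], trace(32)[3])  # 2 0
```

For 224x224 inputs, the trace reproduces the 14x14 maps at stage 4, the 4x4 auxiliary pooling output and the 1024 channels feeding `self.fc`. For raw 32x32 CIFAR-10 images, stage 4 is only 2x2, so the 5x5 auxiliary average pooling has no valid output and PyTorch would raise an error. Resizing the images (for example to 224x224) is one way around this; this is an assumption, since the preprocessing used for training is not shown here.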
2) Training on CIFAR-10
3) Evaluating model