---
tags: machine-learning
---

# VggNet: Summary and Implementation

<div style="text-align: center">
<img src="https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/vggnet/0.png?token=AMAXSKN5KOX7F6TA34Z4K226WMHKI">
</div>

>This post is divided into 2 sections: Summary and Implementation.
>
>We are going to have an in-depth review of the [Very Deep Convolutional Networks for Large-Scale Image Recognition][paper] paper, which introduces the VggNet architecture.
>
> The implementation uses PyTorch as its framework. To see the full implementation,
> please refer to this [repository].
>
> Also, if you want to read other "Summary and Implementation" posts, feel free to
> check them out on my [blog](https://ferdinandmom.engineer/deep-learning/).
>

# I) Summary

- The paper [Very Deep Convolutional Networks for Large-Scale Image Recognition][paper] introduces a family of ConvNets called VGGNet.
- During ILSVRC-2014, they achieved 2nd place in the classification task (top-5 test error = 7.32%).
- They demonstrated that depth is beneficial for classification accuracy.
- In spite of its large depth, the number of weights is not greater than the number of weights in a shallower net with larger conv. layer widths and receptive fields.
- VGGNet uses a smaller receptive field (3x3 with stride 1), in contrast to AlexNet (11x11 with stride 4) and ZFNet (7x7 with stride 2).

---

VGG architecture:

- Input size 224x224x3 (RGB image).
- Preprocessing done by subtracting the training set RGB mean.
- Filter size 3x3.
- Convolutional layers:
    - stride 1.
    - padding 1 (3x3 conv layers).
    - ReLU after each conv layer; one of the configs also uses LRN.
    - 5 max pooling layers, each placed after some (not all) of the conv layers:
        - 2x2 window.
        - stride = 2.
- Fully-connected layers:
    - 1st: 4096 (ReLU).
    - 2nd: 4096 (ReLU).
    - 3rd: 1000 (Softmax).

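
The summary's claim about weights can be checked with a little arithmetic. The sketch below compares a stack of three 3x3 conv layers with a single 7x7 conv layer, both of which cover a 7x7 receptive field. The helper names `conv_weights` and `stacked_receptive_field` and the channel width of 256 are illustrative assumptions, and biases are ignored for simplicity.

```python
def conv_weights(kernel_size: int, channels: int, num_layers: int = 1) -> int:
    """Weights (biases ignored) in a stack of conv layers with `channels` in and out."""
    return num_layers * kernel_size * kernel_size * channels * channels


def stacked_receptive_field(kernel_size: int, num_layers: int) -> int:
    """Effective receptive field of stride-1 conv layers stacked without pooling in between."""
    return 1 + num_layers * (kernel_size - 1)


channels = 256  # hypothetical channel width, chosen only for illustration

three_3x3 = conv_weights(3, channels, num_layers=3)  # 27 * channels^2
one_7x7 = conv_weights(7, channels)                  # 49 * channels^2

print("receptive field of three 3x3 convs:", stacked_receptive_field(3, 3))  # 7
print("weights, three 3x3 convs:", three_3x3)
print("weights, one 7x7 conv:   ", one_7x7)
```

For the same receptive field, the 3x3 stack needs 27/49 of the weights (roughly 55%) and inserts two extra non-linearities along the way.
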
VGGNet configurations:

- VGG-11
- VGG-11 (LRN)
- VGG-13
- VGG-16 (Conv1, i.e. with some 1x1 conv layers)
- VGG-16
- VGG-19

<div style="text-align: center">
<img src="https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/vggnet/1.png?token=AMAXSKPJWDG6HR7HYB72KCS6WMHKI">
</div>

<br>

We are going to focus on VGG-16. Here is its architecture:

<br>

<div style="text-align: center">
<img src="https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/vggnet/2.png?token=AMAXSKN3R2EKTD235XSBT2C6WMHKM">
</div>

<div style="text-align: center">
<img src="https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/vggnet/3.png?token=AMAXSKLO7B2XISQS7FH54C26WMHKK">
</div>

<div style="text-align: center">
<img src="https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/vggnet/4.png?token=AMAXSKN5R3RI33N2IPYS5426WMHKM">
</div>

# II) Implementation

We are going to implement VGG-16.

### 1) Architecture build

```python
from collections import OrderedDict

import torch.nn as nn


class Vgg16(nn.Module):
    def __init__(self):
        super(Vgg16, self).__init__()

        # CONV PART.
        self.features = nn.Sequential(OrderedDict([
            ('block1-conv1', nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)),
            ('block1-act1', nn.ReLU()),
            ('block1-conv2', nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1)),
            ('block1-act2', nn.ReLU()),
            ('pool1', nn.MaxPool2d(kernel_size=2, stride=2)),

            ('block2-conv1', nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1)),
            ('block2-act1', nn.ReLU()),
            ('block2-conv2', nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1)),
            ('block2-act2', nn.ReLU()),
            ('pool2', nn.MaxPool2d(kernel_size=2, stride=2)),

            ('block3-conv1', nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1)),
            ('block3-act1', nn.ReLU()),
            ('block3-conv2', nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)),
            ('block3-act2', nn.ReLU()),
            ('block3-conv3', nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)),
            ('block3-act3', nn.ReLU()),
            ('pool3', nn.MaxPool2d(kernel_size=2, stride=2)),

            ('block4-conv1', nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1)),
            ('block4-act1', nn.ReLU()),
            ('block4-conv2', nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1)),
            ('block4-act2', nn.ReLU()),
            ('block4-conv3', nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1)),
            ('block4-act3', nn.ReLU()),
            ('pool4', nn.MaxPool2d(kernel_size=2, stride=2)),

            ('block5-conv1', nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1)),
            ('block5-act1', nn.ReLU()),
            ('block5-conv2', nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1)),
            ('block5-act2', nn.ReLU()),
            ('block5-conv3', nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1)),
            ('block5-act3', nn.ReLU()),
            ('pool5', nn.MaxPool2d(kernel_size=2, stride=2))
        ]))

        # FC PART.
        self.classifier = nn.Sequential(OrderedDict([
            ('fc6', nn.Linear(512 * 7 * 7, 4096)),
            ('act6', nn.ReLU()),
            ('fc7', nn.Linear(4096, 4096)),
            ('act7', nn.ReLU()),
            ('fc8', nn.Linear(4096, 1000))
        ]))

    def forward(self, x):
        x = self.features(x)
        # Flatten (N, 512, 7, 7) into (N, 512 * 7 * 7) for the FC part.
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x
```

### 2) Evaluating

```python
import cv2
import matplotlib.pyplot as plt
import numpy as np
import torch

# `model`, `test_loader`, `class_names` and `custom_dataset` are assumed to be
# defined beforehand (see the repository). The loader yields one image per batch.
fig2 = plt.figure(figsize=(30, 10))

model.eval()
with torch.no_grad():
    for i, image in enumerate(test_loader):
        # Turn the raw scores of fc8 into class probabilities.
        probs = torch.nn.Softmax(dim=-1)(model(image))

        probability, class_idx = torch.max(probs, 1)
        class_name = class_names[class_idx.item()]

        fig2.add_subplot(1, 4, i + 1)
        plt.imshow(cv2.cvtColor(custom_dataset.imgsToDisplay[i], cv2.COLOR_BGR2RGB))
        plt.title("Class: " + class_name + ", probability: %.4f" % probability.item(), fontsize=13)
        plt.axis('off')

        # List the 5 most probable classes, from most to least likely.
        plt.text(0, 240, 'Top-5 predictions:')
        x, y = 10, 260

        for idx in np.argsort(probs.numpy())[0][-5::][::-1]:
            s = '- {}, probability: {:.4f}'.format(class_names[idx], probs[0, idx])
            plt.text(x, y, s=s, fontsize=10)
            y += 20
```

![evaluating-image]

[paper]: https://arxiv.org/pdf/1409.1556.pdf
[repository]: https://github.com/3outeille/Research-Paper-Summary/tree/master/src/architecture/vgg/pytorch
[evaluating-image]: https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/vggnet/5.png?token=AMAXSKKRZ6FERMROIFH5ROS6WMHKM
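
As a quick sanity check on the `Vgg16` class, the `512 * 7 * 7` input size of `fc6` and the total number of parameters can be derived without running PyTorch. This is a minimal sketch in plain Python: `CFG`, `trace_features` and `linear_params` are hypothetical names introduced here, and the layer plan is transcribed by hand from the class definition.

```python
# Layer plan of the conv part: an int is the number of output channels of a
# 3x3 conv (stride 1, padding 1), "M" is a 2x2 max pool with stride 2.
CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
       512, 512, 512, "M", 512, 512, 512, "M"]


def trace_features(cfg, in_channels=3, size=224):
    """Return (channels, spatial size, parameter count) after the conv part."""
    params = 0
    for item in cfg:
        if item == "M":
            size //= 2  # each 2x2 pool with stride 2 halves height and width
        else:
            # 3x3 conv with padding 1 keeps the spatial size unchanged.
            params += in_channels * item * 3 * 3 + item  # weights + biases
            in_channels = item
    return in_channels, size, params


def linear_params(sizes):
    """Parameter count (weights + biases) of a chain of fully-connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


channels, size, conv_params = trace_features(CFG)
flat = channels * size * size
fc_params = linear_params([flat, 4096, 4096, 1000])

print(channels, size, flat)     # 512 7 25088
print(conv_params + fc_params)  # 138357544
```

The spatial size goes 224, 112, 56, 28, 14, 7 across the five pooling layers, which is where the `7 * 7` comes from. Most of the roughly 138M parameters sit in the fully-connected part, not in the conv layers.
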