---
tags: machine-learning
---
# VGGNet: Summary and Implementation
<div style="text-align: center">
<img src="https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/vggnet/0.png?token=AMAXSKN5KOX7F6TA34Z4K226WMHKI">
</div>
>This post is divided into 2 sections: Summary and Implementation.
>
>We are going to have an in-depth review of the paper [Very Deep Convolutional Networks for Large-Scale Image Recognition][paper], which introduces the VGGNet architecture.
>
> The implementation uses PyTorch as its framework. To see the full implementation,
> please refer to this [repository].
>
> Also, if you want to read other "Summary and Implementation" posts, feel free to
> check them out on my [blog](https://ferdinandmom.engineer/deep-learning/).
>
# I) Summary
- The paper [Very Deep Convolutional Networks for Large-Scale Image Recognition][paper] introduces a family of ConvNets called VGGNet.
- During ILSVRC-2014, they achieved 2nd place in the classification task (top-5 test error = 7.32%).
- They demonstrated that depth is beneficial for classification accuracy.
- In spite of its large depth, the number of weights is not greater than in a shallower net with larger conv layer widths and receptive fields.
- VGGNet uses small 3x3 receptive fields (stride 1) throughout the whole net, whereas AlexNet (11x11 with stride 4) and ZFNet (7x7 with stride 2) use larger ones in their first conv layer.
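
The weight argument in the bullets above can be checked with a little arithmetic. The sketch below is illustrative and not taken from the repository: it assumes every layer has `C` input and `C` output channels, ignores biases, and uses the made-up helper names `stacked_receptive_field` and `conv_weights`. Under those assumptions, a stack of three 3x3 convolutions (stride 1) covers the same 7x7 region as a single 7x7 convolution while using fewer weights.

```python
def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    # Each stride-1 layer widens the field by (kernel_size - 1) pixels.
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(kernel_size, channels, num_layers=1):
    """Weights (biases ignored) of stacked convs with `channels` in and out."""
    return num_layers * kernel_size * kernel_size * channels * channels


C = 64  # hypothetical channel width, chosen only for illustration

rf_stack = stacked_receptive_field(kernel_size=3, num_layers=3)
w_stack = conv_weights(kernel_size=3, channels=C, num_layers=3)  # 27 * C^2
w_single = conv_weights(kernel_size=7, channels=C)               # 49 * C^2

print(rf_stack)           # 7
print(w_stack, w_single)  # 110592 200704
```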
---
VGG architecture:
- Input size 224x224x3 (RGB image).
- Preprocessing done by subtracting the training set mean RGB value from each pixel.
- Filter size 3x3.
- Convolutional layers:
    - stride 1.
    - padding 1 (3x3 conv layers).
    - ReLU activation, plus LRN in one of the configurations.
    - 5 max-pooling layers, each following some (not all) of the conv layers:
        - 2x2 window.
        - stride = 2.
- Fully-connected layers:
    - 1st: 4096 (ReLU).
    - 2nd: 4096 (ReLU).
    - 3rd: 1000 (Softmax).
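
To see where the 7x7 feature map in front of the first fully-connected layer comes from, the sizes listed above can be traced by hand. This is a minimal sketch using the standard output-size formula `floor((n + 2p - k) / s) + 1`; the function name `out_size` is hypothetical, and the 512 channels in the last line are those of the final conv block of VGG-16.

```python
def out_size(n, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer on an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1


size = 224
sizes = [size]
for _ in range(5):  # five conv blocks, each ending in a max-pooling layer
    # 3x3 conv, stride 1, padding 1: the spatial size is unchanged, so one
    # call stands in for every conv layer of the block.
    size = out_size(size, kernel=3, stride=1, padding=1)
    # 2x2 max pooling, stride 2: the spatial size is halved.
    size = out_size(size, kernel=2, stride=2, padding=0)
    sizes.append(size)

print(sizes)            # [224, 112, 56, 28, 14, 7]
print(512 * size ** 2)  # 25088 inputs to the first fully-connected layer
```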
VGGNet configurations (A to E in the paper):
- VGG-11 (A)
- VGG-11 with LRN (A-LRN)
- VGG-13 (B)
- VGG-16 with 1x1 conv layers (C)
- VGG-16 (D)
- VGG-19 (E)
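
The six configurations can be written down compactly as lists of layer specifications. The encoding below is a sketch and not the repository's code: the keys are the configuration letters used in the paper, integers are the output channels of a 3x3 conv layer, `"M"` is a max-pooling layer, `"LRN"` is a local response normalisation layer, and a `(1, channels)` tuple stands for a 1x1 conv layer. The layer lists follow Table 1 of the paper; the names `CONFIGS`, `NUM_FC_LAYERS` and `weight_layers` are assumptions made for this example.

```python
CONFIGS = {
    # VGG-11
    "A": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    # VGG-11 with LRN
    "A-LRN": [64, "LRN", "M", 128, "M", 256, 256, "M",
              512, 512, "M", 512, 512, "M"],
    # VGG-13
    "B": [64, 64, "M", 128, 128, "M", 256, 256, "M",
          512, 512, "M", 512, 512, "M"],
    # VGG-16 with 1x1 conv layers
    "C": [64, 64, "M", 128, 128, "M", 256, 256, (1, 256), "M",
          512, 512, (1, 512), "M", 512, 512, (1, 512), "M"],
    # VGG-16
    "D": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
          512, 512, 512, "M", 512, 512, 512, "M"],
    # VGG-19
    "E": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
          512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}

NUM_FC_LAYERS = 3  # every configuration ends with the same three FC layers


def weight_layers(config):
    """Number of layers with learnable weights (conv + fully-connected)."""
    convs = sum(1 for layer in config if layer not in ("M", "LRN"))
    return convs + NUM_FC_LAYERS


for name, config in CONFIGS.items():
    print(name, weight_layers(config))
# A 11
# A-LRN 11
# B 13
# C 16
# D 16
# E 19
```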
<div style="text-align: center">
<img src="https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/vggnet/1.png?token=AMAXSKPJWDG6HR7HYB72KCS6WMHKI">
</div>
<br>
We are going to focus on VGG-16. Here is its architecture:
<br>
<div style="text-align: center">
<img src="https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/vggnet/2.png?token=AMAXSKN3R2EKTD235XSBT2C6WMHKM">
</div>
<div style="text-align: center">
<img src="https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/vggnet/3.png?token=AMAXSKLO7B2XISQS7FH54C26WMHKK">
</div>
<div style="text-align: center">
<img src="https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/vggnet/4.png?token=AMAXSKN5R3RI33N2IPYS5426WMHKM">
</div>
# II) Implementation
We are going to implement VGG-16.
### 1) Architecture build
```python
from collections import OrderedDict

import torch.nn as nn


class Vgg16(nn.Module):
    """VGG-16: 13 conv layers followed by 3 fully-connected layers."""

    def __init__(self):
        super(Vgg16, self).__init__()

        # CONV PART.
        self.features = nn.Sequential(OrderedDict([
            ('block1-conv1', nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)),
            ('block1-act1', nn.ReLU()),
            ('block1-conv2', nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1)),
            ('block1-act2', nn.ReLU()),
            ('pool1', nn.MaxPool2d(kernel_size=2, stride=2)),
            ('block2-conv1', nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1)),
            ('block2-act1', nn.ReLU()),
            ('block2-conv2', nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1)),
            ('block2-act2', nn.ReLU()),
            ('pool2', nn.MaxPool2d(kernel_size=2, stride=2)),
            ('block3-conv1', nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1)),
            ('block3-act1', nn.ReLU()),
            ('block3-conv2', nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)),
            ('block3-act2', nn.ReLU()),
            ('block3-conv3', nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)),
            ('block3-act3', nn.ReLU()),
            ('pool3', nn.MaxPool2d(kernel_size=2, stride=2)),
            ('block4-conv1', nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1)),
            ('block4-act1', nn.ReLU()),
            ('block4-conv2', nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1)),
            ('block4-act2', nn.ReLU()),
            ('block4-conv3', nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1)),
            ('block4-act3', nn.ReLU()),
            ('pool4', nn.MaxPool2d(kernel_size=2, stride=2)),
            ('block5-conv1', nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1)),
            ('block5-act1', nn.ReLU()),
            ('block5-conv2', nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1)),
            ('block5-act2', nn.ReLU()),
            ('block5-conv3', nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1)),
            ('block5-act3', nn.ReLU()),
            ('pool5', nn.MaxPool2d(kernel_size=2, stride=2))
        ]))

        # FC PART.
        self.classifier = nn.Sequential(OrderedDict([
            ('fc6', nn.Linear(512 * 7 * 7, 4096)),
            ('act6', nn.ReLU()),
            ('fc7', nn.Linear(4096, 4096)),
            ('act7', nn.ReLU()),
            ('fc8', nn.Linear(4096, 1000))
        ]))

    def forward(self, x):
        x = self.features(x)       # (N, 512, 7, 7) for 224x224 inputs
        x = x.view(x.size(0), -1)  # flatten to (N, 512 * 7 * 7)
        x = self.classifier(x)     # raw class scores; softmax is applied at evaluation time
        return x
```
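As a sanity check on the layer sizes above, the number of learnable parameters can be counted without instantiating the model. This is a sketch with hypothetical helper names (`conv_params`, `linear_params`); it mirrors the channel and feature sizes used in `Vgg16` and includes biases. The total of roughly 138 million agrees with the figure the paper reports for VGG-16.

```python
def conv_params(in_ch, out_ch, kernel=3):
    """Weights plus biases of a kernel x kernel conv layer."""
    return kernel * kernel * in_ch * out_ch + out_ch


def linear_params(in_features, out_features):
    """Weights plus biases of a fully-connected layer."""
    return in_features * out_features + out_features


# (in_channels, out_channels) of the 13 conv layers, one row per block.
conv_layers = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]
# (in_features, out_features) of fc6, fc7 and fc8.
fc_layers = [(512 * 7 * 7, 4096), (4096, 4096), (4096, 1000)]

n_conv = sum(conv_params(i, o) for i, o in conv_layers)
n_fc = sum(linear_params(i, o) for i, o in fc_layers)

print(n_conv)         # 14714688
print(n_fc)           # 123642856
print(n_conv + n_fc)  # 138357544
```

Most of the parameters sit in the fully-connected layers, with `fc6` alone accounting for about 102.8 million of them.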
### 2) Evaluating
```python
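import cv2
import matplotlib.pyplot as plt
import numpy as np
import torch

# `model`, `test_loader`, `class_names` and `custom_dataset` are assumed to be
# defined beforehand; a batch size of 1 and 4 test images are assumed.
fig2 = plt.figure(figsize=(30, 10))

model.eval()
with torch.no_grad():
    for i, image in enumerate(test_loader):
        # Turn the raw scores into class probabilities.
        probs = torch.nn.Softmax(dim=-1)(model(image))
        probability, class_idx = torch.max(probs, 1)
        class_name = class_names[class_idx.item()]

        # Show the image (stored as BGR) with its top-1 prediction.
        fig2.add_subplot(1, 4, i + 1)
        plt.imshow(cv2.cvtColor(custom_dataset.imgsToDisplay[i], cv2.COLOR_BGR2RGB))
        plt.title("Class: " + class_name + ", probability: %.4f" % probability.item(), fontsize=13)
        plt.axis('off')

        # List the five most probable classes below the image.
        plt.text(0, 240, 'Top-5 Accuracy:')
        x, y = 10, 260
        for idx in np.argsort(probs.numpy())[0][-5::][::-1]:
            s = '- {}, probability: {:.4f}'.format(class_names[idx], probs[0, idx])
            plt.text(x, y, s=s, fontsize=10)
            y += 20
        print()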
```
![evaluating-image]
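
The top-5 listing in the evaluation loop boils down to a softmax followed by a sort. The following is a framework-free sketch of that step: the logits and class names are made up for illustration, and `softmax` and `top_k` are hypothetical helpers rather than functions from the repository. In PyTorch, `torch.topk` gives the same kind of ranking directly.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k=5):
    """Return the k (index, probability) pairs with the highest probability."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in order[:k]]


# Made-up example data, standing in for the model output and ImageNet labels.
class_names = ["cat", "dog", "car", "tree", "boat", "bird", "fish"]
logits = [2.0, 0.5, -1.0, 3.0, 0.0, 1.0, -0.5]

probs = softmax(logits)
for idx, p in top_k(probs, k=5):
    print("- {}, probability: {:.4f}".format(class_names[idx], p))
```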
[paper]:https://arxiv.org/pdf/1409.1556.pdf
[repository]: https://github.com/3outeille/Research-Paper-Summary/tree/master/src/architecture/vgg/pytorch
[evaluating-image]: https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/vggnet/5.png?token=AMAXSKKRZ6FERMROIFH5ROS6WMHKM