VggNet: Summary and Implementation

This post is divided into 2 sections: Summary and Implementation.

We are going to review in depth the paper Very Deep Convolutional Networks for Large-Scale Image Recognition, which introduces the VGGNet architecture.

The implementation uses PyTorch as the framework. To see the full implementation,
please refer to this repository.

Also, if you want to read other "Summary and Implementation" posts, feel free to
check them out on my blog.

I) Summary

The paper Very Deep Convolutional Networks for Large-Scale Image Recognition introduces a family of ConvNets called VGGNet.
At ILSVRC-2014, the authors took 2nd place in the classification task (top-5 test error = 7.32%).
They demonstrated that depth is beneficial for classification accuracy.
In spite of its large depth, the network has no more weights than a shallower net with wider conv. layers and larger receptive fields.
VGGNet uses small 3x3 filters with stride 1 throughout, in contrast to the first conv. layers of AlexNet (11x11 with stride 4) and ZFNet (7x7 with stride 2).
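The weight argument can be checked with a little arithmetic. The sketch below assumes, purely as an illustration, that every layer in a stack has C input and C output channels and that biases are ignored; the helper names are made up for this example. It compares three stacked 3x3 layers with a single 7x7 layer, which cover the same 7x7 receptive field.

```python
def stacked_conv_weights(kernel_size, num_layers, channels):
    """Weights (biases ignored) in a stack of conv layers with C input and C output channels."""
    return num_layers * (kernel_size ** 2) * channels ** 2


def receptive_field(kernel_size, num_layers):
    """Effective receptive field of stacked stride-1 convolutions."""
    return num_layers * (kernel_size - 1) + 1


channels = 256  # assumed channel count, only for illustration
three_3x3 = stacked_conv_weights(kernel_size=3, num_layers=3, channels=channels)
one_7x7 = stacked_conv_weights(kernel_size=7, num_layers=1, channels=channels)

print(receptive_field(3, 3), receptive_field(7, 1))  # 7 7
print(three_3x3, one_7x7)                            # 1769472 3211264
```

For the same receptive field, the stack of 3x3 layers needs 27C² weights instead of 49C², and it applies three non-linearities instead of one.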

VGG architecture:

Input size 224x224x3 (RGB image).
Preprocessing: subtract the training set's mean RGB value from each pixel.
Filter size 3x3.
Convolutional layers:
- stride 1.
- padding 1 (3x3 conv layers).
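- ReLU after every conv layer; one configuration also adds LRN (Local Response Normalization).
- 5 max pooling layers, each following a stack of conv layers (not every conv layer is followed by pooling).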
  - 2x2 window.
  - stride = 2.
Fully-connected layers:
- 1st: 4096 (ReLU).
- 2nd: 4096 (ReLU).
- 3rd: 1000 (Softmax), one output per ILSVRC class.
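The settings above fix the spatial size of the feature maps at every stage. As a minimal sketch (the helper name is chosen for this example), the standard output-size formula shows that the 3x3 convolutions with padding 1 keep the size unchanged while each of the 5 max pooling layers halves it.

```python
def conv_output_size(size, kernel_size, stride, padding):
    """Standard output-size formula for a convolution or pooling layer."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 224
sizes = [size]
for _ in range(5):  # 5 max pooling stages
    # A 3x3 conv with stride 1 and padding 1 keeps the spatial size unchanged.
    size = conv_output_size(size, kernel_size=3, stride=1, padding=1)
    # A 2x2 max pooling with stride 2 halves it.
    size = conv_output_size(size, kernel_size=2, stride=2, padding=0)
    sizes.append(size)

print(sizes)  # [224, 112, 56, 28, 14, 7]
```

With 512 channels in the last block (as in VGG-16), the first fully-connected layer therefore receives 512 * 7 * 7 = 25088 values.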

VGGNet configurations:

VGG-11
VGG-11 (LRN)
VGG-13
VGG-16 (1x1 conv)
VGG-16
VGG-19
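As a rough sketch, each configuration can be written as a list of conv output channels, with "M" marking a max pooling layer. The lists below follow the configuration table of the paper for the four plain 3x3 variants; the dictionary `CONFIGS` and the helper are names chosen for this example, and the LRN and 1x1 conv variants are left out.

```python
CONFIGS = {
    "VGG-11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG-13": [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG-16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
               512, 512, 512, "M", 512, 512, 512, "M"],
    "VGG-19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
               512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}


def count_weight_layers(config, num_fc_layers=3):
    """Number of layers with weights: conv layers plus the fully-connected layers."""
    conv_layers = sum(1 for item in config if item != "M")
    return conv_layers + num_fc_layers


for name, config in CONFIGS.items():
    # Prints the name, the number of weight layers and the number of pooling layers.
    print(name, count_weight_layers(config), config.count("M"))
```

The number in each name is the count of weight layers, and every variant keeps the same 5 max pooling layers.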

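We are going to focus on VGG-16. Its architecture is 13 conv layers arranged in 5 blocks (2, 2, 3, 3 and 3 layers with 64, 128, 256, 512 and 512 channels), each block followed by a max pooling layer, and then 3 fully-connected layers.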

II) Implementation

We are going to implement VGG-16.

1) Architecture build

from collections import OrderedDict

import torch.nn as nn


class Vgg16(nn.Module):
    
    def __init__(self):
        super(Vgg16, self).__init__()
        
        # CONV PART.
        self.features = nn.Sequential(OrderedDict([
            ('block1-conv1', nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)),
            ('block1-act1', nn.ReLU()),
            ('block1-conv2', nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1)),
            ('block1-act2', nn.ReLU()),
            
            ('pool1', nn.MaxPool2d(kernel_size=2, stride=2)),
            
            ('block2-conv1', nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1)),
            ('block2-act1', nn.ReLU()),
            ('block2-conv2', nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1)),
            ('block2-act2', nn.ReLU()),
            
            ('pool2', nn.MaxPool2d(kernel_size=2, stride=2)),
            
            ('block3-conv1', nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1)),
            ('block3-act1', nn.ReLU()),
            ('block3-conv2', nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)),
            ('block3-act2', nn.ReLU()),
            ('block3-conv3', nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)),
            ('block3-act3', nn.ReLU()),
            
            ('pool3', nn.MaxPool2d(kernel_size=2, stride=2)),
            
            ('block4-conv1', nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1)),
            ('block4-act1', nn.ReLU()),
            ('block4-conv2', nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1)),
            ('block4-act2', nn.ReLU()),
            ('block4-conv3', nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1)),
            ('block4-act3', nn.ReLU()),
            
            ('pool4', nn.MaxPool2d(kernel_size=2, stride=2)),
            
            ('block5-conv1', nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1)),
            ('block5-act1', nn.ReLU()),
            ('block5-conv2', nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1)),
            ('block5-act2', nn.ReLU()),
            ('block5-conv3', nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1)),
            ('block5-act3', nn.ReLU()),
            
            ('pool5', nn.MaxPool2d(kernel_size=2, stride=2))
        ]))
        
        # FC PART.
        
        self.classifier = nn.Sequential(OrderedDict([
            ('fc6', nn.Linear(512 * 7 * 7, 4096)),
            ('act6', nn.ReLU()),
            ('fc7', nn.Linear(4096, 4096)),
            ('act7', nn.ReLU()),
            ('fc8', nn.Linear(4096, 1000))
        ]))
        
        
    def forward(self, x):
        x = self.features(x)
        # Flatten (N, 512, 7, 7) into (N, 25088) for the fully-connected layers.
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x
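A quick sanity check on the architecture above is to count its learnable parameters by hand. The sketch below mirrors the layer sizes used in the class (the lists and helper names are just for this example) and does not need PyTorch. If the class is built as shown, `sum(p.numel() for p in Vgg16().parameters())` should give the same total.

```python
# (in_channels, out_channels) of the 13 conv layers, in order.
CONV_LAYERS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]
# (in_features, out_features) of the 3 fully-connected layers.
FC_LAYERS = [(512 * 7 * 7, 4096), (4096, 4096), (4096, 1000)]


def conv_params(in_channels, out_channels, kernel_size=3):
    """Weights plus biases of one conv layer."""
    return in_channels * out_channels * kernel_size ** 2 + out_channels


def fc_params(in_features, out_features):
    """Weights plus biases of one fully-connected layer."""
    return in_features * out_features + out_features


conv_total = sum(conv_params(i, o) for i, o in CONV_LAYERS)
fc_total = sum(fc_params(i, o) for i, o in FC_LAYERS)

print(conv_total)             # 14714688
print(fc_total)               # 123642856
print(conv_total + fc_total)  # 138357544
```

Roughly 89% of the parameters sit in the fully-connected layers, most of them in fc6.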

2) Evaluating

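import cv2
import matplotlib.pyplot as plt
import torch

# `model`, `test_loader`, `class_names` and `custom_dataset` come from the full implementation (see the repository).
fig2 = plt.figure(figsize=(30, 10))

model.eval()
with torch.no_grad():
    for i, image in enumerate(test_loader):
        # The model outputs raw scores (logits); softmax turns them into probabilities.
        probs = torch.softmax(model(image), dim=-1)

        # Top-1 prediction (the batch size is assumed to be 1).
        probability, class_idx = torch.max(probs, 1)
        class_name = class_names[class_idx.item()]

        fig2.add_subplot(1, 4, i + 1)
        plt.imshow(cv2.cvtColor(custom_dataset.imgsToDisplay[i], cv2.COLOR_BGR2RGB))
        plt.title("Class: " + class_name + ", probability: %.4f" % probability.item(), fontsize=13)
        plt.axis('off')

        # List the 5 most probable classes below the image.
        plt.text(0, 240, 'Top-5 predictions:')
        x, y = 10, 260

        top5_probs, top5_idx = torch.topk(probs[0], k=5)
        for p, idx in zip(top5_probs, top5_idx):
            s = '- {}, probability: {:.4f}'.format(class_names[idx.item()], p.item())
            plt.text(x, y, s=s, fontsize=10)
            y += 20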
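To make the post-processing step concrete, here is a small PyTorch-free sketch of what the softmax and the top-5 selection compute. The logits and class names are made-up values for illustration only, and the helper functions are not part of the implementation above.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k=5):
    """Return (index, probability) pairs of the k largest probabilities, highest first."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical scores for 6 classes.
class_names = ["cat", "dog", "car", "plane", "boat", "tree"]
logits = [2.0, 1.0, 0.1, -1.0, 3.0, 0.5]

probs = softmax(logits)
for idx, p in top_k(probs, k=5):
    print("- {}, probability: {:.4f}".format(class_names[idx], p))
```

Softmax does not change the ranking of the classes, so the top-5 classes are the same whether they are picked from the logits or from the probabilities.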
