
VGGNet: Summary and Implementation


This post is divided into 2 sections: Summary and Implementation.

We are going to do an in-depth review of the paper Very Deep Convolutional Networks for Large-Scale Image Recognition, which introduces the VGGNet architecture.

The implementation uses PyTorch as its framework. To see the full implementation,
please refer to this repository.

Also, if you want to read other "Summary and Implementation" posts, feel free to
check them out on my blog.

I) Summary

  • The paper Very Deep Convolutional Networks for Large-Scale Image Recognition introduces a family of ConvNets called VGGNet.
  • At ILSVRC-2014, the authors took 2nd place in the classification task (top-5 test error = 7.32%).
  • They demonstrated that depth is beneficial for classification accuracy.
  • In spite of the large depth, the number of weights is not greater than in a shallower net with wider conv layers and larger receptive fields.
  • VGGNet uses small receptive fields (3x3 with stride 1), in contrast to AlexNet (11x11 with stride 4) and ZFNet (7x7 with stride 2).
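The argument for small filters can be checked with a little arithmetic: a stack of three 3x3 conv layers (stride 1) sees a 7x7 region of the input, like a single 7x7 layer, but with C input and C output channels it needs 3 * (3^2 * C^2) = 27C^2 weights instead of 7^2 * C^2 = 49C^2. The sketch below is plain Python; the helper names are illustrative, biases are ignored, and C = 512 is just an example value.

```python
def stacked_receptive_field(num_layers: int, kernel: int = 3) -> int:
    """Receptive field of `num_layers` stacked stride-1 convs with the same kernel size."""
    rf = 1
    for _ in range(num_layers):
        rf += kernel - 1  # each extra layer widens the field by (kernel - 1)
    return rf


def conv_weights(kernel: int, channels: int, num_layers: int = 1) -> int:
    """Weights (biases ignored) of `num_layers` kernel x kernel convs with C in / C out channels."""
    return num_layers * kernel * kernel * channels * channels


C = 512  # illustrative channel count
print(stacked_receptive_field(2))        # 5
print(stacked_receptive_field(3))        # 7
print(conv_weights(3, C, num_layers=3))  # 7077888  (27 * C^2)
print(conv_weights(7, C))                # 12845056 (49 * C^2)
```

The stacked version also inserts two extra ReLUs between the three layers, which the paper points out makes the decision function more discriminative.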

VGG architecture:

  • Input size 224x224x3 (RGB image).
  • Preprocessing: subtracting the mean RGB value, computed on the training set, from each pixel.
  • Filters size 3x3.
  • Convolutional layers:
    • stride 1.
    • padding 1 (3x3 conv layers).
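    • ReLU after every conv layer; one configuration additionally uses Local Response Normalization (LRN).
    • Spatial pooling is done by 5 max-pooling layers, which follow some (not all) of the conv layers: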
      • 2x2 window.
      • stride = 2.
  • Fully-connected layers:
    • 1st: 4096 (ReLU).
    • 2nd: 4096 (ReLU).
    • 3rd: 1000 (Softmax).
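To see why the first FC layer receives 512 * 7 * 7 inputs, here is a minimal sketch that traces the spatial size through the network with the usual output-size formula floor((n + 2p - k) / s) + 1. The helper `conv_out` is hypothetical; only the layer hyperparameters (3x3 convs with stride 1 and padding 1, 2x2 pools with stride 2) come from the list above.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a conv or pooling layer (square input assumed)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224  # input is 224x224
sizes = []
for block in range(1, 6):
    # A 3x3 conv with stride 1 and padding 1 leaves the size unchanged,
    # no matter how many of them the block stacks.
    size = conv_out(size, kernel=3, stride=1, padding=1)
    # The 2x2 max pool with stride 2 halves it.
    size = conv_out(size, kernel=2, stride=2, padding=0)
    sizes.append(size)
    print(f"after pool{block}: {size}x{size}")

print("fc input features:", 512 * size * size)  # 25088
```

Because padding 1 preserves the size, only the five pooling layers shrink the feature map: 224 / 2^5 = 7.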

VGGNet configurations:

  • VGG-11
  • VGG-11 (LRN)
  • VGG-13
  • VGG-16 (with 1x1 conv layers)
  • VGG-16
  • VGG-19
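One compact way to describe these configurations is a list of conv output channels with "M" marking a max-pooling layer, similar in spirit to how torchvision builds its VGG models. The lists below are transcribed from the paper's configurations A, B, D and E (VGG-11, -13, -16 and -19); the LRN variant and the 1x1-conv VGG-16 are left out because a plain channel list cannot express their extra LRN layer or 1x1 kernels. Counting conv layers and adding the 3 FC layers recovers each model's depth.

```python
# Channel lists per configuration; "M" marks a 2x2 max-pooling layer.
CONFIGS = {
    "VGG-11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG-13": [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG-16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
               512, 512, 512, "M", 512, 512, 512, "M"],
    "VGG-19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
               512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}

NUM_FC_LAYERS = 3  # fc6, fc7 and fc8 are shared by every configuration


def weight_layers(cfg):
    """Depth = number of conv layers + the 3 fully-connected layers."""
    return sum(1 for v in cfg if v != "M") + NUM_FC_LAYERS


for name, cfg in CONFIGS.items():
    print(f"{name}: {weight_layers(cfg)} weight layers, {cfg.count('M')} max-pooling layers")
```

Every configuration keeps the same 5 pooling stages; only the number of 3x3 convs per stage grows.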

We are going to focus on VGG-16 (configuration D in the paper).



II) Implementation

We are going to implement VGG-16.

1) Building the architecture

from collections import OrderedDict

import torch.nn as nn


class Vgg16(nn.Module):
    
    def __init__(self):
        super(Vgg16, self).__init__()
        
        # CONV PART.
        self.features = nn.Sequential(OrderedDict([
            ('block1-conv1', nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)),
            ('block1-act1', nn.ReLU()),
            ('block1-conv2', nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1)),
            ('block1-act2', nn.ReLU()),
            
            ('pool1', nn.MaxPool2d(kernel_size=2, stride=2)),
            
            ('block2-conv1', nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1)),
            ('block2-act1', nn.ReLU()),
            ('block2-conv2', nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1)),
            ('block2-act2', nn.ReLU()),
            
            ('pool2', nn.MaxPool2d(kernel_size=2, stride=2)),
            
            ('block3-conv1', nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1)),
            ('block3-act1', nn.ReLU()),
            ('block3-conv2', nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)),
            ('block3-act2', nn.ReLU()),
            ('block3-conv3', nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)),
            ('block3-act3', nn.ReLU()),
            
            ('pool3', nn.MaxPool2d(kernel_size=2, stride=2)),
            
            ('block4-conv1', nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1)),
            ('block4-act1', nn.ReLU()),
            ('block4-conv2', nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1)),
            ('block4-act2', nn.ReLU()),
            ('block4-conv3', nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1)),
            ('block4-act3', nn.ReLU()),
            
            ('pool4', nn.MaxPool2d(kernel_size=2, stride=2)),
            
            ('block5-conv1', nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1)),
            ('block5-act1', nn.ReLU()),
            ('block5-conv2', nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1)),
            ('block5-act2', nn.ReLU()),
            ('block5-conv3', nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1)),
            ('block5-act3', nn.ReLU()),
            
            ('pool5', nn.MaxPool2d(kernel_size=2, stride=2))
        ]))
        
        # FC PART.
        
        self.classifier = nn.Sequential(OrderedDict([
            ('fc6', nn.Linear(512 * 7 * 7, 4096)),
            ('act6', nn.ReLU()),
            ('fc7', nn.Linear(4096, 4096)),
            ('act7', nn.ReLU()),
            ('fc8', nn.Linear(4096, 1000))
        ]))
        
        
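    def forward(self, x):
        x = self.features(x)       # (N, 512, 7, 7) for a 224x224 input
        x = x.view(x.size(0), -1)  # flatten to (N, 512 * 7 * 7) for the FC layers
        x = self.classifier(x)     # raw class scores (logits); softmax is applied at evaluation time
        return x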
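As a sanity check on the layer sizes in `Vgg16`, the parameter count can be computed by hand. The sketch below is plain Python (no PyTorch needed); the two lists simply mirror the `Conv2d` and `Linear` shapes of the class above, counting weights and biases. The total comes to roughly 138 million, in line with the figure the paper reports for VGG-16. With PyTorch available, `sum(p.numel() for p in Vgg16().parameters())` should give the same number.

```python
# (in_channels, out_channels) of the 13 3x3 conv layers, block by block.
CONV_LAYERS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]
# (in_features, out_features) of fc6, fc7 and fc8.
FC_LAYERS = [(512 * 7 * 7, 4096), (4096, 4096), (4096, 1000)]

# 3x3 kernel weights plus one bias per output channel.
conv_params = sum(cin * cout * 3 * 3 + cout for cin, cout in CONV_LAYERS)
# Weight matrix plus one bias per output feature.
fc_params = sum(fin * fout + fout for fin, fout in FC_LAYERS)

print(f"conv: {conv_params:,}")               # conv: 14,714,688
print(f"fc:   {fc_params:,}")                 # fc:   123,642,856
print(f"total: {conv_params + fc_params:,}")  # total: 138,357,544
```

About three quarters of all parameters sit in fc6 alone (25088 x 4096 weights), even though the conv layers do most of the computation.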

2) Evaluating

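import cv2
import matplotlib.pyplot as plt
import torch

# `model`, `test_loader`, `class_names` and `custom_dataset` come from the full
# implementation in the repository; test_loader is assumed to yield one image per
# batch, and 4 test images are displayed side by side.
fig2 = plt.figure(figsize=(30, 10))

model.eval()
with torch.no_grad():
    for i, image in enumerate(test_loader):
        # Turn the raw fc8 scores into class probabilities.
        probs = torch.softmax(model(image), dim=-1)

        # Most likely class for the single image in the batch.
        probability, class_idx = torch.max(probs, 1)
        class_name = class_names[class_idx.item()]

        fig2.add_subplot(1, 4, i + 1)
        # OpenCV stores images as BGR; matplotlib expects RGB.
        plt.imshow(cv2.cvtColor(custom_dataset.imgsToDisplay[i], cv2.COLOR_BGR2RGB))
        plt.title("Class: " + class_name + ", probability: %.4f" % probability.item(), fontsize=13)
        plt.axis('off')

        # List the 5 most probable classes below the image, highest first.
        plt.text(0, 240, 'Top-5 predictions:')
        x, y = 10, 260
        top5_probs, top5_idx = torch.topk(probs[0], 5)
        for p, idx in zip(top5_probs, top5_idx):
            s = '- {}, probability: {:.4f}'.format(class_names[idx.item()], p.item())
            plt.text(x, y, s=s, fontsize=10)
            y += 20

plt.show()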
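The loop above feeds `image` straight from `test_loader`, so the dataset has to apply the preprocessing described in the summary (224x224 RGB input, training-set mean RGB subtracted). The repository's exact transform is not shown here, so this is only a sketch under assumptions: the image is already resized to 224x224 and converted to RGB (OpenCV loads BGR), and the mean values are the ones commonly distributed with the original Caffe VGG weights. The function name `preprocess` is hypothetical.

```python
import numpy as np

# Assumed per-channel mean (R, G, B) of the ImageNet training set, as commonly
# distributed with the original Caffe VGG weights; the repository may differ.
VGG_MEAN_RGB = np.array([123.68, 116.779, 103.939], dtype=np.float32)


def preprocess(image_rgb: np.ndarray) -> np.ndarray:
    """Subtract the mean RGB value and reorder HxWx3 -> 1x3xHxW (batch of one)."""
    if image_rgb.shape != (224, 224, 3):
        raise ValueError(f"expected a 224x224x3 RGB image, got {image_rgb.shape}")
    centered = image_rgb.astype(np.float32) - VGG_MEAN_RGB  # broadcast over H and W
    return centered.transpose(2, 0, 1)[np.newaxis]           # channels first, add batch dim


img = np.full((224, 224, 3), 128, dtype=np.uint8)  # dummy mid-gray image
batch = preprocess(img)
print(batch.shape)  # (1, 3, 224, 224)
```

The resulting array can be wrapped with `torch.from_numpy(batch)` before being passed to the model.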
