ZFNet/DeconvNet: Summary and Implementation
This post is divided into 2 sections: Summary and Implementation.
We are going to do an in-depth review of the Visualizing and Understanding Convolutional Networks paper, which introduces ZFNet and uses a DeconvNet to visualize its layers.
The implementation uses PyTorch as its framework. To see the full implementation,
please refer to this repository.
Also, if you want to read other "Summary and Implementation" posts, feel free to
check them out on my blog.
I) Summary
DISCLAIMER:
- We will use the weights/biases of a pretrained model from this GitHub repository.
- We are only going to implement the 1st version of ZFNet.
- We will remove Local Response Normalization.
- The paper Visualizing and Understanding Convolutional Networks relies on a Deconvolutional Network (DeconvNet), introduced in earlier work by Zeiler et al., to visualize each layer.
- By visualizing each layer, we get more insight into what the model is learning and can therefore make adjustments to improve it.
- That's how ZFNet was created: a modified version of AlexNet whose architectural changes were guided by the visualization results.
ZFNet architecture:
- 5 Convolutional layers.
- 3 Fully connected layers.
- 3 overlapping max pooling layers.
- ReLU as activation function for the hidden layers.
- Softmax as activation function for the output layer.
- 60,000,000 trainable parameters.
- Cross-entropy as cost function.
- Mini-batch gradient descent with momentum as optimizer.
- Local Response Normalization (removing it seems to give better results).
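As a quick sanity check on the architecture, the spatial size after each layer follows floor((n + 2p - k) / s) + 1. The sketch below assumes a 224x224 input and reuses the kernel/stride/padding values from the implementation in Part II; it shows why the first fully connected layer takes 256 * 6 * 6 = 9216 inputs. `conv_out` and `trace_sizes` are hypothetical helper names.

```python
def conv_out(n: int, kernel: int, stride: int, padding: int) -> int:
    """Output size of a conv/pool layer along one spatial dimension (floor mode)."""
    return (n + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding), taken from the ZFNet implementation in this post.
LAYERS = [
    ("conv1", 7, 2, 1),
    ("pool1", 3, 2, 1),
    ("conv2", 5, 2, 0),
    ("pool2", 3, 2, 1),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]


def trace_sizes(n: int = 224) -> list[tuple[str, int]]:
    """Spatial size (height == width) after each layer for an n x n input."""
    sizes = []
    for name, k, s, p in LAYERS:
        n = conv_out(n, k, s, p)
        sizes.append((name, n))
    return sizes


if __name__ == "__main__":
    for name, n in trace_sizes():
        print(f"{name}: {n}x{n}")
    final = trace_sizes()[-1][1]
    print("fc6 inputs:", 256 * final * final)  # conv5 has 256 channels
```

For a 224x224 image this gives 110, 55, 26, 13, 13, 13, 13 and finally 6, hence `nn.Linear(9216, 4096)` for fc6.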
ZFNet differences:
- 1st version:
  - Conv1 filters: changed from (11x11, stride 4) to (7x7, stride 2).
  - ZFNet top-5 validation error rate: 16.5%.
  - AlexNet top-5 validation error rate: 18.2%.
- 2nd version:
  - Conv3 filters: 512 filters instead of 384.
  - Conv4 filters: 1024 filters instead of 384.
  - Conv5 filters: 512 filters instead of 256.
  - ZFNet top-5 validation error rate: 16.0%.
  - AlexNet top-5 validation error rate: 18.2%.
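The Conv1 change of the 1st version can be put into numbers. This is a small arithmetic sketch, assuming a 224x224 RGB input, 96 filters and padding 1 for both configurations (AlexNet implementations differ in their Conv1 padding, so the AlexNet output size is only indicative); `conv1_stats` is a hypothetical helper.

```python
def conv_out(n: int, kernel: int, stride: int, padding: int) -> int:
    """Output size of a convolution along one spatial dimension."""
    return (n + 2 * padding - kernel) // stride + 1


def conv1_stats(kernel: int, stride: int, padding: int = 1,
                n: int = 224, in_ch: int = 3, out_ch: int = 96) -> dict:
    """Weight count (biases excluded) and output size of a first conv layer."""
    return {
        "weights": kernel * kernel * in_ch * out_ch,
        "output": conv_out(n, kernel, stride, padding),
    }


alexnet_conv1 = conv1_stats(kernel=11, stride=4)
zfnet_conv1 = conv1_stats(kernel=7, stride=2)
print("AlexNet conv1:", alexnet_conv1)  # {'weights': 34848, 'output': 54}
print("ZFNet conv1:  ", zfnet_conv1)    # {'weights': 14112, 'output': 110}
```

The 7x7/stride-2 layer has fewer weights yet samples the input about twice as densely along each axis. The paper motivates the change with its visualizations, which showed aliasing artifacts in layer 2 caused by the large stride of 4 in layer 1.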
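To visualize a layer, we reconstruct an approximate version of the input image from that layer's activations.
To do so, we first feed an image to the main convnet, which records the location of the maximum within each pooling region (these locations are called switches).
To examine a single activation, all other activations in that layer are set to zero. The DeconvNet then runs the layers in reverse: each unpooling layer uses the switches to put the activations back at their original locations, the result is rectified with a ReLU, and the transposed filters of the corresponding convolutional layer map it to the layer below, until pixel space is reached.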
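The switch mechanism maps directly onto PyTorch's `nn.MaxPool2d(return_indices=True)` and `nn.MaxUnpool2d`, which the implementation in Part II relies on. Here is a toy sketch on a hand-written 4x4 map; a non-overlapping 2x2 pool is used only to keep the numbers readable (ZFNet's pools are overlapping 3x3 with stride 2).

```python
import torch
import torch.nn as nn

# A 1x1x4x4 "feature map" (batch, channel, height, width).
x = torch.tensor([[[[1., 5., 0., 2.],
                    [3., 0., 7., 1.],
                    [0., 4., 1., 0.],
                    [6., 2., 3., 8.]]]])

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

# switches holds, for every pooled value, the flat index of its max in the 4x4 map.
pooled, switches = pool(x)
# Unpooling writes each max back at its recorded location and zeros everywhere else.
restored = unpool(pooled, switches)

print(pooled.squeeze())    # tensor([[5., 7.], [6., 8.]])
print(switches.squeeze())  # tensor([[ 1,  6], [12, 15]])
print(restored.squeeze())
```

Only the positions of the maxima survive the round trip, which is why the DeconvNet reconstruction is an approximation of the input.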
II) Implementation
1) Architecture build
```python
from collections import OrderedDict

import torch.nn as nn


class ZFNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Forward convnet (1st version of ZFNet, without Local Response Normalization).
        # Pooling layers return the argmax indices ("switches") used by the DeconvNet.
        self.features = nn.Sequential(OrderedDict([
            ('conv1', nn.Conv2d(3, 96, kernel_size=7, stride=2, padding=1)),
            ('act1', nn.ReLU()),
            ('pool1', nn.MaxPool2d(kernel_size=3, stride=2, padding=1, return_indices=True)),
            ('conv2', nn.Conv2d(96, 256, kernel_size=5, stride=2, padding=0)),
            ('act2', nn.ReLU()),
            ('pool2', nn.MaxPool2d(kernel_size=3, stride=2, padding=1, return_indices=True)),
            ('conv3', nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1)),
            ('act3', nn.ReLU()),
            ('conv4', nn.Conv2d(384, 384, kernel_size=3, stride=1, padding=1)),
            ('act4', nn.ReLU()),
            ('conv5', nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1)),
            ('act5', nn.ReLU()),
            ('pool5', nn.MaxPool2d(kernel_size=3, stride=2, padding=0, return_indices=True))
        ]))
        # Output of each layer of self.features, filled in by forward().
        self.feature_outputs = [0] * len(self.features)
        # Switches of each pooling layer, keyed by its index in self.features.
        self.switch_indices = dict()
        # Spatial size of the input image, needed to invert conv1 exactly.
        self.sizes = dict()

        # For a 224x224 input, pool5 outputs 256 x 6 x 6 = 9216 values.
        self.classifier = nn.Sequential(OrderedDict([
            ('fc6', nn.Linear(9216, 4096)),
            ('act6', nn.ReLU()),
            ('fc7', nn.Linear(4096, 4096)),
            ('act7', nn.ReLU()),
            ('fc8', nn.Linear(4096, 1000))
        ]))

        # DeconvNet: mirrors self.features in reverse order (unpool -> ReLU -> transposed conv).
        # The transposed convolutions have no bias; for meaningful reconstructions their
        # weights must be copied from the matching forward convolutions.
        self.deconv_pool5 = nn.MaxUnpool2d(kernel_size=3, stride=2, padding=0)
        self.deconv_act5 = nn.ReLU()
        self.deconv_conv5 = nn.ConvTranspose2d(256, 384, kernel_size=3, stride=1, padding=1, bias=False)
        self.deconv_act4 = nn.ReLU()
        self.deconv_conv4 = nn.ConvTranspose2d(384, 384, kernel_size=3, stride=1, padding=1, bias=False)
        self.deconv_act3 = nn.ReLU()
        self.deconv_conv3 = nn.ConvTranspose2d(384, 256, kernel_size=3, stride=1, padding=1, bias=False)
        self.deconv_pool2 = nn.MaxUnpool2d(kernel_size=3, stride=2, padding=1)
        self.deconv_act2 = nn.ReLU()
        self.deconv_conv2 = nn.ConvTranspose2d(256, 96, kernel_size=5, stride=2, padding=0, bias=False)
        self.deconv_pool1 = nn.MaxUnpool2d(kernel_size=3, stride=2, padding=1)
        self.deconv_act1 = nn.ReLU()
        self.deconv_conv1 = nn.ConvTranspose2d(96, 3, kernel_size=7, stride=2, padding=1, bias=False)

    def forward(self, x):
        self.sizes['input'] = x.shape[-2:]
        for i, layer in enumerate(self.features):
            if isinstance(layer, nn.MaxPool2d):
                # Record the switches so the DeconvNet can undo this pooling.
                x, indices = layer(x)
                self.switch_indices[i] = indices
            else:
                x = layer(x)
            self.feature_outputs[i] = x
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

    def forward_deconv(self, x, layer):
        # x: pool5 feature maps, e.g. self.feature_outputs[12] with all but the
        # activations of interest set to zero. Must be called after forward().
        # `layer` is the number of deconv stages to run, starting from pool5:
        # layer=1 stops after undoing conv5, ..., layer=5 reaches pixel space.
        if layer < 1 or layer > 5:
            raise ValueError("ZFNet -> forward_deconv(): layer value should be between [1, 5]")

        x = self.deconv_pool5(x, self.switch_indices[12],
                              output_size=self.feature_outputs[11].shape[-2:])
        x = self.deconv_act5(x)
        x = self.deconv_conv5(x)
        if layer == 1:
            return x

        x = self.deconv_act4(x)
        x = self.deconv_conv4(x)
        if layer == 2:
            return x

        x = self.deconv_act3(x)
        x = self.deconv_conv3(x)
        if layer == 3:
            return x

        x = self.deconv_pool2(x, self.switch_indices[5],
                              output_size=self.feature_outputs[4].shape[-2:])
        x = self.deconv_act2(x)
        x = self.deconv_conv2(x)
        if layer == 4:
            return x

        x = self.deconv_pool1(x, self.switch_indices[2],
                              output_size=self.feature_outputs[1].shape[-2:])
        x = self.deconv_act1(x)
        # Stride-2 conv1 maps both 223x223 and 224x224 inputs to 110x110, so pass
        # the recorded input size to recover the original resolution.
        x = self.deconv_conv1(x, output_size=self.sizes['input'])
        return x
```
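The transposed convolutions are only meaningful if they reuse the filters of the forward convolutions, as the paper prescribes. PyTorch stores `Conv2d` weights as (out_channels, in_channels, kH, kW) and `ConvTranspose2d` weights as (in_channels, out_channels, kH, kW), so `conv5` (384 -> 256) and `deconv_conv5` (256 -> 384) share the weight shape (256, 384, 3, 3) and the tensor can be copied as-is. The sketch below is a standalone check, not the repository's weight-loading code: it ties one such pair and verifies that the transposed convolution is the adjoint of the (bias-free) forward one, i.e. <conv(x), y> = <x, conv_transpose(y)>.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Forward layer with ZFNet's conv5 shape, and its DeconvNet counterpart.
conv5 = nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1)
deconv_conv5 = nn.ConvTranspose2d(256, 384, kernel_size=3, stride=1, padding=1, bias=False)

# Both weights have shape (256, 384, 3, 3), so tying them is a plain copy.
with torch.no_grad():
    deconv_conv5.weight.copy_(conv5.weight)

x = torch.randn(1, 384, 13, 13)  # a signal in conv5's input space
y = torch.randn(1, 256, 13, 13)  # a signal in conv5's output space

with torch.no_grad():
    # Drop the bias so that the forward map is linear.
    lhs = (F.conv2d(x, conv5.weight, padding=1) * y).sum()
    rhs = (x * deconv_conv5(y)).sum()

print(lhs.item(), rhs.item())  # equal up to float32 rounding
```

The same copy applies to every pair (`conv1`/`deconv_conv1`, ..., `conv5`/`deconv_conv5`) in the `ZFNet` class above.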
2) Evaluating
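The error rates quoted in the summary are top-5 errors: a prediction counts as correct if the true label is among the five highest-scoring classes. A minimal sketch of that metric, assuming logits of shape (N, num_classes) such as those returned by `ZFNet.forward()` and integer class labels; `top_k_error` is a hypothetical helper, and the paper's exact evaluation protocol is not reproduced here.

```python
import torch


def top_k_error(logits: torch.Tensor, targets: torch.Tensor, k: int = 5) -> float:
    """Fraction of samples whose true class is not among the k highest scores."""
    topk = logits.topk(k, dim=1).indices              # (N, k) predicted classes
    hit = (topk == targets.unsqueeze(1)).any(dim=1)    # (N,) True if label is in top-k
    return 1.0 - hit.float().mean().item()


# Two samples, six classes, both labelled class 0.
logits = torch.tensor([[0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
                       [0.6, 0.5, 0.4, 0.3, 0.2, 0.1]])
targets = torch.tensor([0, 0])
print(top_k_error(logits, targets, k=5))  # 0.5: class 0 is ranked last for sample 1
```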

3) Visualization of each layer
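The visualization procedure from the summary (run an image forward, keep one activation, zero the rest, then unpool, rectify and apply the transposed filters) can be sketched on a single conv/pool block. Everything here is a hypothetical stand-in: an untrained 8-filter layer with ZFNet's conv1/pool1 hyperparameters, a random 64x64 image, and the helper name `visualize_top_activation`. With the full model, the same steps are chained through every layer down to pixel space.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical one-block convnet with ZFNet's conv1/pool1 hyperparameters.
conv = nn.Conv2d(3, 8, kernel_size=7, stride=2, padding=1)
relu = nn.ReLU()
pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1, return_indices=True)

# Matching DeconvNet layers; the transposed conv is tied to `conv`.
unpool = nn.MaxUnpool2d(kernel_size=3, stride=2, padding=1)
deconv = nn.ConvTranspose2d(8, 3, kernel_size=7, stride=2, padding=1, bias=False)
with torch.no_grad():
    deconv.weight.copy_(conv.weight)


def visualize_top_activation(image: torch.Tensor, channel: int) -> torch.Tensor:
    """Project the strongest activation of one feature map back to pixel space."""
    with torch.no_grad():
        act = relu(conv(image))
        pooled, switches = pool(act)

        # Keep only the single largest activation of the chosen feature map.
        mask = torch.zeros_like(pooled)
        mask[0, channel].view(-1)[pooled[0, channel].argmax()] = 1.0
        selected = pooled * mask

        # Unpool with the switches, rectify, then apply the transposed filters.
        x = unpool(selected, switches, output_size=act.shape[-2:])
        x = relu(x)
        return deconv(x, output_size=image.shape[-2:])


image = torch.randn(1, 3, 64, 64)
recon = visualize_top_activation(image, channel=0)
print(recon.shape)  # torch.Size([1, 3, 64, 64])
```

Because only one activation is kept, the reconstruction is nonzero only inside the 7x7 patch of the image that this activation "sees".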
