---
tags: machine-learning
---

# ZFNet/DeconvNet: Summary and Implementation

<div style="text-align: center">
<img src="https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/zfnet/0.png?token=AMAXSKJDGC47X5KCTVDXQLK6WMFP4">
</div>

> This post is divided into 2 sections: Summary and Implementation.
>
> We are going to have an in-depth review of [Visualizing and Understanding Convolutional Networks][paper], the paper which introduces the ZFNet and DeconvNet architectures.
>
> The implementation uses PyTorch as its framework. To see the full implementation, please refer to this [repository].
>
> Also, if you want to read other "Summary and Implementation" posts, feel free to check them out at my [blog](https://ferdinandmom.engineer/deep-learning/).

# I) Summary

**DISCLAIMER**:

- We will use the weights/biases of a pretrained model from this [github repository](https://github.com/osmr/imgclsmob/tree/master/pytorch).
- We are only going to implement the 1st version of ZFNet.
- We will remove Local Response Normalization.

---

- The paper [Visualizing and Understanding Convolutional Networks](https://cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf) introduces the notion of a **DeconvNet**, which enables us to **visualize each layer**.
- By **visualizing each layer**, we gain **more insight** into **what the model is learning**, and can thus make **adjustments** to **optimize it further**.
- That is how **ZFNet** was created: a **version of AlexNet fine-tuned based on visualization results**.

---

ZFNet architecture:

- **5 convolutional layers**.
- **3 fully connected layers**.
- **3 overlapping max pooling layers**.
- **ReLU** as activation function for the hidden layers.
- **Softmax** as activation function for the output layer.
- **60,000,000** trainable parameters.
- **Cross-entropy** as cost function.
- **Mini-batch gradient descent with Momentum** as optimizer.
- **Local Response Normalization** (removing it seems to give better results).

---

ZFNet differences from AlexNet:

- 1st version:
    - **Conv1** filters: changed from **(11x11, stride 4)** to **(7x7, stride 2)**.
    - With **ZFNet**, the top-5 validation error rate is **16.5%**.
    - With **AlexNet**, the top-5 validation error rate is **18.2%**.
- 2nd version:
    - **Conv3** filters: use of **512** filters instead of **384**.
    - **Conv4** filters: use of **1024** filters instead of **384**.
    - **Conv5** filters: use of **512** filters instead of **256**.
    - With **ZFNet**, the top-5 validation error rate is **16.0%**.
    - With **AlexNet**, the top-5 validation error rate is **18.2%**.

![legend]
![zfnet-model]

To visualize a layer, we reconstruct an approximate version of the picture that produced its activations. To do so, we first feed the convnet an image so that it records the location of the local max in each pooling region (these recorded locations are called *switches*). The switches are then used by the unpooling layers to map activations back to input space.

![deconvnet]

# II) Implementation

### 1) Architecture build

```python
from collections import OrderedDict

import torch.nn as nn


class ZFNet(nn.Module):
    def __init__(self):
        super(ZFNet, self).__init__()

        # CONV PART.
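        # Expected feature-map sizes for a 3x224x224 input (derived from the
        # hyper-parameters below):
        #   conv1 -> 96x110x110,  pool1 -> 96x55x55
        #   conv2 -> 256x26x26,   pool2 -> 256x13x13
        #   conv3 -> 384x13x13,   conv4 -> 384x13x13
        #   conv5 -> 256x13x13,   pool5 -> 256x6x6 (flattened: 9216, the fc6 input)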
        self.features = nn.Sequential(OrderedDict([
            ('conv1', nn.Conv2d(3, 96, kernel_size=7, stride=2, padding=1)),
            ('act1', nn.ReLU()),
            ('pool1', nn.MaxPool2d(kernel_size=3, stride=2, padding=1, return_indices=True)),
            ('conv2', nn.Conv2d(96, 256, kernel_size=5, stride=2, padding=0)),
            ('act2', nn.ReLU()),
            ('pool2', nn.MaxPool2d(kernel_size=3, stride=2, padding=1, return_indices=True)),
            ('conv3', nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1)),
            ('act3', nn.ReLU()),
            ('conv4', nn.Conv2d(384, 384, kernel_size=3, stride=1, padding=1)),
            ('act4', nn.ReLU()),
            ('conv5', nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1)),
            ('act5', nn.ReLU()),
            ('pool5', nn.MaxPool2d(kernel_size=3, stride=2, padding=0, return_indices=True))
        ]))

        self.feature_outputs = [0] * len(self.features)
        self.switch_indices = dict()
        self.sizes = dict()

        self.classifier = nn.Sequential(OrderedDict([
            ('fc6', nn.Linear(9216, 4096)),
            ('act6', nn.ReLU()),
            ('fc7', nn.Linear(4096, 4096)),
            ('act7', nn.ReLU()),
            ('fc8', nn.Linear(4096, 1000))
        ]))

        # DECONV PART.
        self.deconv_pool5 = nn.MaxUnpool2d(kernel_size=3, stride=2, padding=0)
        self.deconv_act5 = nn.ReLU()
        self.deconv_conv5 = nn.ConvTranspose2d(256, 384, kernel_size=3, stride=1, padding=1, bias=False)

        self.deconv_act4 = nn.ReLU()
        self.deconv_conv4 = nn.ConvTranspose2d(384, 384, kernel_size=3, stride=1, padding=1, bias=False)

        self.deconv_act3 = nn.ReLU()
        self.deconv_conv3 = nn.ConvTranspose2d(384, 256, kernel_size=3, stride=1, padding=1, bias=False)

        self.deconv_pool2 = nn.MaxUnpool2d(kernel_size=3, stride=2, padding=1)
        self.deconv_act2 = nn.ReLU()
        self.deconv_conv2 = nn.ConvTranspose2d(256, 96, kernel_size=5, stride=2, padding=0, bias=False)

        self.deconv_pool1 = nn.MaxUnpool2d(kernel_size=3, stride=2, padding=1)
        self.deconv_act1 = nn.ReLU()
        self.deconv_conv1 = nn.ConvTranspose2d(96, 3, kernel_size=7, stride=2, padding=1, bias=False)

    def forward(self, x):
        for i, layer in enumerate(self.features):
            if isinstance(layer, nn.MaxPool2d):
                x, indices = layer(x)
                self.feature_outputs[i] = x
                self.switch_indices[i] = indices
            else:
                x = layer(x)
                self.feature_outputs[i] = x

        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

    def forward_deconv(self, x, layer):
        if layer < 1 or layer > 5:
            raise Exception("ZFNet -> forward_deconv(): layer value should be between [1, 5]")

        x = self.deconv_pool5(x, self.switch_indices[12], output_size=self.feature_outputs[-2].shape[-2:])
        x = self.deconv_act5(x)
        x = self.deconv_conv5(x)
        if layer == 1:
            return x

        x = self.deconv_act4(x)
        x = self.deconv_conv4(x)
        if layer == 2:
            return x

        x = self.deconv_act3(x)
        x = self.deconv_conv3(x)
        if layer == 3:
            return x

        x = self.deconv_pool2(x, self.switch_indices[5], output_size=self.feature_outputs[4].shape[-2:])
        x = self.deconv_act2(x)
        x = self.deconv_conv2(x)
        if layer == 4:
            return x

        x = self.deconv_pool1(x, self.switch_indices[2], output_size=self.feature_outputs[1].shape[-2:])
        x = self.deconv_act1(x)
        x = self.deconv_conv1(x)
        if layer == 5:
            return x
```
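Before moving on, here is a minimal, self-contained sketch (it is not part of the original [repository]) of the switch mechanism described in the summary, followed by a hypothetical round trip through the `ZFNet` class above. The tensor sizes are arbitrary examples.

```python
import torch
import torch.nn as nn

# The pooling layer records where each local max came from ("switches"),
# and the unpooling layer uses those positions to map values back.
pool = nn.MaxPool2d(kernel_size=3, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=3, stride=2)

x = torch.randn(1, 1, 13, 13)
pooled, switches = pool(x)                                    # (1, 1, 6, 6)
approx = unpool(pooled, switches, output_size=x.shape[-2:])   # (1, 1, 13, 13)
# Each max is placed back at its recorded location; all other positions are 0.

# Hypothetical round trip through ZFNet: the forward pass records the
# switches, then forward_deconv projects the pool5 features back through
# every deconv stage (layer=5) toward pixel space.
model = ZFNet()
logits = model(torch.randn(1, 3, 224, 224))
recon = model.forward_deconv(model.feature_outputs[12], 5)    # (1, 3, 223, 223), roughly input-sized
```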
### 2) Evaluating

```python
import cv2
import matplotlib.pyplot as plt
import numpy as np
import torch

# `model`, `test_loader`, `class_names` and `custom_dataset` are defined in
# the full repository code.
fig2 = plt.figure(figsize=(30, 10))

model.eval()
with torch.no_grad():
    for i, image in enumerate(test_loader):
        probs = torch.nn.Softmax(dim=-1)(model(image))
        probability, class_idx = torch.max(probs, 1)
        class_name = class_names[class_idx]

        fig2.add_subplot(1, 4, i + 1)
        plt.imshow(cv2.cvtColor(custom_dataset.imgsToDisplay[i], cv2.COLOR_BGR2RGB))
        plt.title("Class: " + class_name + ", probability: %.4f" % probability, fontsize=13)
        plt.axis('off')
        plt.text(0, 240, 'Top-5 Accuracy:')

        x, y = 10, 260
        for idx in np.argsort(probs.numpy())[0][-5:][::-1]:
            s = '- {}, probability: {:.4f}'.format(class_names[idx], probs[0, idx])
            plt.text(x, y, s=s, fontsize=10)
            y += 20
        print()
```

![evaluating-image]

### 3) Visualization of each layer

```python
fig2 = plt.figure(figsize=(60, 60))

model.eval()
count = 0
with torch.no_grad():
    for i, image in enumerate(test_loader):
        probs = torch.nn.Softmax(dim=-1)(model(image))
        for j in range(1, 6):
            count += 1
            ax = fig2.add_subplot(4, 5, count)
            ax.set_title("Layer {}".format(j), fontsize=30)
            plt.axis('off')
            # Third channel (index 2) of the reconstruction.
            plt.imshow(model.forward_deconv(model.feature_outputs[12], j).detach().numpy()[0, 2, :])
```

![visualization-image]

[paper]: https://cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf
[repository]: https://github.com/3outeille/Research-Paper-Summary/tree/master/src/architecture/zfnet/pytorch
[legend]: https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/zfnet/1.png?token=AMAXSKI5B33YJBIM5RJ6PLK6WMFP6
[zfnet-model]: https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/zfnet/2.png?token=AMAXSKILCGGXLCQYJDYXVQ26WMFP6
[deconvnet]: https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/zfnet/3.png?token=AMAXSKJD4POFI23IMQUVHOK6WMFUW
[evaluating-image]: https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/zfnet/4.png?token=AMAXSKJ24LF5JVJYVKFGS7S6WMFQC
[visualization-image]: https://raw.githubusercontent.com/valoxe/image-storage-1/master/research-paper-summary/zfnet/5.png?token=AMAXSKKVQSFCIW43FTGO2Z26WMFQC