# Butikofer Kevin, Jaggi Charles-Lewis

## Practical work 11 - Understanding Deep Neural Networks

### 1. Understanding Convolutional neural networks by using filter activation statistics

The picture below shows, over the whole data set, the filter with the highest activation after the first layer:

![](https://i.imgur.com/CQXj4tn.png)

On every digit there are regions where some filter has a high activation:

- Filter 5 (yellow) responds to the parts lying under a part of the digit.
- Filter 3 (green) does the opposite of filter 5 and responds to the parts lying above a part of the digit.
- Filter 0 gives a good representation of the inside of a curve. It works well for curved digits such as 2, 0, 8 and 6.
- Filter 1 (blue) responds to the inner side of half-moon shaped curves, and filter 2 (green) does the opposite.
- Filter 7 (pink) responds to straight lines, as can be seen on 7, 1, 9 and 6.

All of these filters except filter 7 respond well on 0 because it is round.

The activation maps of L2 and L3 show that curves are turned into lines because of the dimension reduction after max pooling. The lines are thicker in L2 than in L3.

### 2. Activation maximization as a means for understanding a CNN model

#### Test different values of `tv_weight`

When we increase `tv_weight`, the lines become smaller and the images become more realistic.

#### Select the regularization parameter that gives the best images (more realistic)

`tv_weight` = 13 or 16

#### Show the images that maximize each one of the outputs of the network

##### Output 0

tv_weight = 13

![](https://i.imgur.com/I9JL0MT.png)

##### Output 1

tv_weight = 16

![](https://i.imgur.com/sHTFX42.png)

##### Output 2

tv_weight = 8

![](https://i.imgur.com/Yg8x8FW.png)

##### Output 3

tv_weight = 2

![](https://i.imgur.com/INzmlsb.png)

##### Output 4

tv_weight = 9

![](https://i.imgur.com/Mr2HN2N.png)

##### Output 5

tv_weight = 13

![](https://i.imgur.com/BLzOZ1a.png)

##### Output 6

tv_weight = 5

![](https://i.imgur.com/t17sDXY.png)

##### Output 7

tv_weight = 9

![](https://i.imgur.com/HAWT0ud.png)

##### Output 8

tv_weight = 16

![](https://i.imgur.com/gUGSDqB.png)

##### Output 9

tv_weight = 16

![](https://i.imgur.com/25WxWnV.png)

#### Try two classes with similar shape like 1 and 7 or 4 and 9

##### Classes 1 and 7

tv_weight = 9

![](https://i.imgur.com/3mDs8aE.png)

##### Classes 4 and 9

tv_weight = 8

![](https://i.imgur.com/hv8hBbQ.png)

#### Try two classes with very different shapes like 0 and 1 or 7 and 8

##### Classes (0 and 1) and (7 and 8)

We could not obtain realistic images because the shapes of the two classes are too different.

#### How activation maximization can be useful for understanding a deep neural network? Explain

Activation maximization synthesizes realistic-looking inputs that strongly activate a given neuron or output. Looking at these inputs helps us understand what the model has learned about the data and its features. Synthesizing such inputs for the neurons of a neural network can in turn help improve the quality of deep learning models.

### 3. Class Activation Maps

#### 1. Compute the activation map for all the images we give you.
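
As background for the maps shown in this section, here is a minimal sketch of a Grad-CAM style class activation map in plain Python. It assumes the feature maps of the last convolutional layer and the gradients of the class score with respect to them are already available. The function name `grad_cam`, the nested-list representation and the toy numbers are illustrative assumptions; this is not necessarily how the library used in the lab produced the vanilla and guided columns.

```python
def grad_cam(feature_maps, gradients):
    """Combine K feature maps (each H x W) into one class activation map.

    feature_maps, gradients: lists of K nested lists of shape H x W.
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    # One weight per channel: the spatial mean of that channel's gradient.
    weights = [
        sum(sum(row) for row in grad) / (height * width)
        for grad in gradients
    ]
    # Weighted sum of the channels, then ReLU to keep positive evidence only.
    cam = [
        [
            max(0.0, sum(w * fmap[i][j] for w, fmap in zip(weights, feature_maps)))
            for j in range(width)
        ]
        for i in range(height)
    ]
    # Rescale to [0, 1] so the map can be overlaid on the image as a heatmap.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy input: 2 channels of size 2x2 (made-up numbers, not taken from the lab).
feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
gradients = [
    [[1.0, 1.0], [1.0, 1.0]],      # channel 0 supports the class
    [[-1.0, -1.0], [-1.0, -1.0]],  # channel 1 speaks against the class
]
cam = grad_cam(feature_maps, gradients)
```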
| file | vanilla | guided |
| -------- | -------- | -------- |
| soccer3 | ![](https://i.imgur.com/XrY60jt.png) | ![](https://i.imgur.com/uoORjGu.png) |
| soccer2 | ![](https://i.imgur.com/z1ED7ik.png) | ![](https://i.imgur.com/Fd01yUK.png) |
| soccer1 | ![](https://i.imgur.com/2g72SCE.png) | ![](https://i.imgur.com/I1kutBx.png) |
| cat | ![](https://i.imgur.com/NlJfpsv.png) | ![](https://i.imgur.com/pjTGnvd.png) |
| cow | ![](https://i.imgur.com/5LKNmRu.png) | ![](https://i.imgur.com/RcirK7F.png) |
| mini-skirt | ![](https://i.imgur.com/kXZI0Lu.png) | ![](https://i.imgur.com/cxSl9jD.png) |

#### 2. Do the maps reflect the class of the pictures?

For soccer2, cat and cow, the maps reflect the class of the picture.

#### 3. Are there images in which the activation maps highlight zones that do not belong to the object? Which ones?

For soccer3, soccer1 and mini-skirt, the activation maps do not reflect the class of the picture.

#### 4. Go to the ImageNet site (www.image-net.org) and observe the images that were used to train the classifier on these specific classes (i.e., soccer ball). Do you think that all images have been labelled appropriately? Explain.

Yes, because a soccer ball appears in every image.

#### 5. Hypothesize about the behavior of the network observed in 3.

The parts of the image that are relevant for the network differ from what we would expect. In the mini-skirt images, for example, the hands of the person are always visible, so the network focuses on the hands.
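
One possible way to make the observation from question 3 measurable is to compute how much of an activation map falls inside the region of the object. The sketch below assumes a hand-drawn bounding box for the object; the function name `mass_inside_box`, the box format and the toy values are hypothetical and were not part of the lab.

```python
def mass_inside_box(cam, box):
    """Return the share of the total activation that falls inside `box`.

    cam: H x W nested list of non-negative activation values.
    box: (top, left, bottom, right), with bottom and right exclusive.
    """
    top, left, bottom, right = box
    total = sum(sum(row) for row in cam)
    if total == 0:
        return 0.0
    inside = sum(sum(row[left:right]) for row in cam[top:bottom])
    return inside / total


# Toy 3x3 map (made-up values): part of the activation sits in the top-left
# corner, while the hypothetical object box covers the bottom-right 2x2 region.
cam = [
    [0.5, 0.25, 0.0],
    [0.25, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]
object_box = (1, 1, 3, 3)
share = mass_inside_box(cam, object_box)
```

A low share would indicate that the map highlights zones outside the object, as seen for soccer3, soccer1 and mini-skirt.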