---
tags: hw3, conceptual
---

# HW3 Conceptual: CNNs

:::info
Due **Monday, March 6th, 2023 at 6:00 PM EST**
:::

Answer the following questions, showing your work where necessary. Please explain your answers and work.

:::info
We encourage the use of $\LaTeX$ to typeset your answers, as it makes it easier for you and us, though you are not required to do so. We provide a general $\LaTeX$ [template](https://www.overleaf.com/read/hjxsyzvfmgwp) that you may use for conceptual assignments.
:::

:::warning
Do **NOT** include your name anywhere within this submission. Points will be deducted if you do so.
:::

## Theme

![](http://cdn.shopify.com/s/files/1/0280/2270/2132/articles/fat-manatee-cute.jpg?v=1628345791)

*This manatee can spot the cat in your image. Can your CNN do the same?*

## Conceptual Questions

### 1. Consider the following three $23 \times 23$ images of the digit 3.

![](https://i.imgur.com/MF0qOYn.png)

- a. Which neural net is better suited to identifying the digit in each image: a convolutional neural net or a multilayer perceptron (a neural network with multiple fully connected layers and nonlinear layers)? Explain your reasoning. (2-3 sentences)
- b. Will a convolutional layer with standard max-pooling (e.g., $2 \times 2$ pooling) produce the same or different outputs for all of the images? Why or why not? How does this relate to translational invariance/equivariance? (Hint: remember that the image is $23 \times 23$.) (3-4 sentences)
- c. Let’s say you built a convolutional neural network with two layers to classify these images: a convolution layer and a fully connected (linear) layer. What are their respective roles in the network? (2-3 sentences)

### 2. Consider the image of a polar bear shown below.

![](https://i.imgur.com/jk8SUGf.jpg)

- a. What are some examples of features that earlier convolutional layers will extract from this image? What about later layers?
- b. The input image is converted into a matrix of size $13 \times 13$ and convolved with a filter of size $3 \times 3$ using a stride of 2. Assume that we are using VALID padding. Determine the size of the convolved matrix.

### 3. The following questions refer to CNNs in different dimensions.

- a. So far in this class, we’ve only explored 2D CNNs for image recognition and classification. However, 1D CNNs are also popular in many fields, with the network convolving linearly in only one direction. Give a scenario where a 1D CNN could be useful, and explain how the CNN can extract relevant features in a 1D setting. We’re looking for specific examples! (3-4 sentences)
    - https://www.tensorflow.org/api_docs/python/tf/nn/conv1d
- b. Suppose you want your computer to read Twitter data. Explain how you could leverage 1D CNNs to classify different emotions from input tweets. How would you train your model? What would your CNN kernel convolve over? How would you take into account variable tweet sizes? (3-4 sentences)

### 4. (Optional) Have feedback for this assignment? Found something confusing? We’d love to hear from you!

## Ethical Implications

In August of 2021, Apple introduced new features that scan iPhones and iCloud for images of child abuse. The model behind the image detection, neuralMatch, was trained using 200,000 images from the National Center for Missing & Exploited Children. Human reviewers would check any positive detections of child abuse imagery and alert law enforcement if confirmed.

Please listen to [this 10-minute podcast](https://open.spotify.com/episode/7ihBfIGI9hyJxLzIrRQjk9?si=p25LINH2Se6qpEwdleStMw&dl_branch=1) to learn about the contexts in which neuralMatch is being deployed, and skim [Apple’s technical summary](https://www.apple.com/child-safety/pdf/CSAM_Detection_Technical_Summary.pdf) of CSAM detection.

### 1. According to Apple’s technical summary, how are CNNs used in this specific application?

:::info
Hint: review the System Overview and Technology Overview: NeuralHash sections. (3-5 sentences in your own words)
:::

### 2. Drawing on the podcast and the technical summary, discuss one technology-driven method (implemented using technology/software) and one human-driven method (implemented manually by humans) that Apple is using to protect users’ privacy while identifying known CSAM images. (4-6 sentences)

### 3. As discussed in the podcast episode, Apple must balance its long-standing commitment to user privacy with increasing external pressures to act on broader sociotechnological issues like child safety. Do you think Apple should or should not deploy this set of features? What implementation measures or external factors (legal, technical, political, etc.) would cause you to change your mind? Please clearly state your position and be specific in your reasoning. (3-5 sentences)

## 2740-Only Questions

### 1. Prove for the discrete case that convolution is equivariant under translation. It’s fine to do this just for 1D convolution.

### 2. Suppose you have a CNN that begins by taking an input image of size $28 \times 28 \times 3$ and passing it through a convolution layer that convolves the image using 3 filters of dimensions $2 \times 2 \times 3$ with valid padding.

- a. How many learnable parameters does this convolution layer have?
- b. Suppose that you instead decided to use a fully connected layer to replicate the behavior of this convolutional layer. How many parameters would that fully connected layer have?
- c. Read about [cutout](https://arxiv.org/pdf/1708.04552.pdf).
    - i. What is cutout? Why is it useful?
    - ii. What are some similar methods? What makes them similar?
    - iii. What were the cutout sizes for CIFAR-10 and CIFAR-100? How did the researchers decide on their cutout size? Why do you think the cutout size differed for CIFAR-10 vs. CIFAR-100?
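
For 2740-only question 1, it can help to see translation equivariance on concrete numbers before writing the proof. The sketch below is only a numerical sanity check and does not replace the proof. The helper names `conv1d_full` and `shift_right`, the choice to model translation as prepending zeros to a finite signal, and the example values are all assumptions made for illustration; they are not part of the assignment.

```python
def conv1d_full(signal, kernel):
    """Full discrete 1D convolution: out[n] = sum over k of signal[k] * kernel[n - k]."""
    out = [0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, w in enumerate(kernel):
            out[i + j] += s * w
    return out


def shift_right(signal, steps):
    """Translate a finite signal by prepending `steps` zeros (hypothetical helper)."""
    return [0] * steps + list(signal)


signal = [1, 4, 2, 0, 3]  # arbitrary example values
kernel = [1, -1, 2]       # arbitrary example filter
steps = 3                 # arbitrary translation amount

# Translate the input, then convolve.
shift_then_conv = conv1d_full(shift_right(signal, steps), kernel)
# Convolve the input, then translate the output by the same amount.
conv_then_shift = shift_right(conv1d_full(signal, kernel), steps)

print(shift_then_conv == conv_then_shift)  # prints True
```

Agreement on one example shows what the property looks like; the proof still has to hold for every signal, kernel, and shift.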
<style> .alert { color: inherit } .markdown-body { font-family: Inter } h3 { font-size: 1.1em !important; font-weight: 500 !important; } </style>