---
tags: hw3, handout
---

# HW3 Programming: CNNs

:::info
Conceptual questions due **Monday, March 4th, 2024 at 6:00 PM EST**

Programming assignment due **Friday, March 8th, 2024 at 6:00 PM EST**
:::

## Theme

![](https://images.unsplash.com/photo-1560275619-4662e36fa65c?q=80&w=3300&auto=format&fit=crop&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D)

*The school's biggest fear, the shark, is still an existential threat. To protect themselves, the fish want to develop a model that can help them distinguish between snarky sharks and welcoming whales!*

*You want to help the fish, but you don't know what a shark or whale is! So build them a model that distinguishes between cats and dogs (we know what those are) as an example!*

## Assignment Overview

In this assignment, you will be building a **Convolutional Neural Network (CNN)** with pooling layers using the CIFAR dataset to learn to distinguish cats and dogs. *Please read this handout in its entirety before beginning the assignment.*

## Conceptual Questions

Please submit your answers to all conceptual questions as one pdf on Gradescope under HW3 Conceptual Questions: Convolutional Neural Networks. When submitting the pdf to Gradescope, be sure to select which pages match with which questions.

:::warning
LaTeX is recommended but not required. However, your solution **must** be typeset. No exceptions will be made. You will lose points if your submission doesn't follow this.
:::

> **2470 students only:** If you are in 2470, all conceptual questions (including non-2470 ones) should be written as one pdf and submitted to the "[CS2470] Hw3 Conceptual Questions: Convolution Neural Networks" assignment. Do not also submit to the CS1470 conceptual assignment.

:::success
You can find the conceptual questions [here](https://hackmd.io/pjFhnTR1QAKMqUdi6jbSmg).
:::

Note: these questions are due before the coding portion of the assignment.

# Getting started

## Stencil

Please click <ins>[here](https://classroom.github.com/a/gRSY9dHp)</ins> to get the stencil code. Reference this <ins>[guide](https://hackmd.io/gGOpcqoeTx-BOvLXQWRgQg)</ins> for more information about GitHub and GitHub Classroom.

:::danger
**Do not change the stencil except where specified**. While you are welcome to write your own helper functions, changing the stencil's method signatures or removing pre-defined functions could result in incompatibility with the autograder and a low grade.
:::

The stencil contains these files: `assignment.py`, `convolution.py`, `preprocess.py`, and `local_test.py`.

:::success
Run `./download.sh` to get the data. You may need to run `chmod +x download.sh` beforehand.
:::

## Environment

You will need to use the virtual environment that you made in Homework 0 to run code in this assignment (it relies on `numpy` and `tensorflow`), which you can activate with `conda activate csci1470`.

# Assignment Overview

Your task is a binary classification problem rather than a multi-class classification problem: we are doing CIFAR2, not CIFAR10. While the CIFAR10 dataset has 10 possible classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck), you will build a CNN that takes in an image and predicts whether it is a cat or a dog, hence CIFAR2. We limit this assignment to a binary classification problem so that you can train the model in a reasonable amount of time.

The assignment has three parts:
1. **Conceptual Questions:** Answer questions related to the assignment and class material on Gradescope.
   > **2470 students only**: If you are taking 2470, you must also answer the additional questions on Gradescope marked with CS2470.
2. **Model:** Build the model. Our stencil provides a model class with several methods and hyperparameters you need to use for your network.
3. **Convolution Function:** Fill out a function that performs the convolution operator.

See the Roadmap below for more information on parts 2 and 3.

:::info
You should include a brief README with your model's accuracy and any known bugs!
:::

This assignment will take longer to run than the previous assignment. If completed correctly, the model should train and test within 15 minutes on a department machine. While you will mainly be using TensorFlow functions, the second part of the assignment requires you to write your own convolution function, which is very computationally expensive. To counter this, we only require that you print the accuracy across the test set after finishing all training. On a department machine, training should take about 3 minutes and testing with your own convolution should take about 2 minutes.

# Roadmap

You will notice that the structure of the Model class is very similar to the Model class defined in your first assignment. We strongly suggest that you first complete the `Intro to TensorFlow Lab` before starting this assignment. The lab includes many explanations about the way a Model class is structured, what variables are, and how things work in TensorFlow. If you come into hours with questions about TensorFlow material that is covered in the lab, we will direct you to the lab.

Below is a brief outline of some things you should do. We expect you to fill in some of the missing gaps (review lecture slides to understand the pipeline) as this is your third assignment.

## Step 1. Preprocessing Data

* We have provided you with a function `unpickle(file)` in the preprocess file stencil, which unpickles an object and returns a dictionary. Do not edit it. We have also already extracted the inputs and labels from the dictionary in `get_data`, so you do not need to deal with the pickled file or the dictionary.
* You will want to limit the inputs and labels returned by `get_data` to those representing the first and second classes of your choice. For every image and its corresponding label, if the label is not of the first or second class, remove the image and label from your inputs and labels arrays. There are a few different ways to do this; you might find [`numpy.nonzero`](https://numpy.org/doc/1.18/reference/generated/numpy.nonzero.html) or [broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html) useful for finding only the indices of the labels which correspond to the first and second class.
* At this point, your inputs are still two dimensional. You will want to reshape your inputs into (-1, 3, 32, 32) using `tf.reshape(inputs, (-1, 3, 32, 32))` and then transpose them using `tf.transpose(inputs, perm=[0,2,3,1])` so that the final inputs you return have shape (num_examples, 32, 32, 3), where the width is 32, the height is 32, and the number of channels is 3.
* You now have inputs and labels of only two classes, but the labels are not yet binary. You will want to re-number the labels such that the cat class is 0 and the dog class is 1 (in general, it doesn't matter which is which; however, we require this specific mapping for our autograder).
You might find [`numpy.where`](https://numpy.org/doc/stable/reference/generated/numpy.where.html) useful for the renumbering process.
* After doing that, you will want to turn your labels into one-hot vectors, where the index with a 1 represents the class of the correct image. You can do this with the function `tf.one_hot`.
* This can be a bit confusing, so to make it clear: your labels should be of size (num_images, num_classes). For example, if your first class is cat and your second class is sushi, the corresponding label of the first image might be [0, 1], where a 1 in the second index means that it's sushi. (A sketch combining these steps appears at the end of this step.)

:::success
**Note:** You should normalize the input pixel values so that they range from 0 to 1 to avoid numerical overflow issues. This can be done by dividing each pixel value by 255.
:::

You're going to be calling `get_data` on both the training and testing data files in `assignment.py`. The testing and training data files to be read in are in the following format:

* `train`: A pickled object of 50,000 train images and labels. This includes images and labels of all 10 classes. After unpickling the file, the dictionary will have the following elements:
    * data -- a 50000x3072 numpy array of uint8s. Each row of the array stores a 32x32 color image. The first 1024 entries contain the red channel values, the next 1024 the green, and the final 1024 the blue. The image is stored in row-major order, so the first 32 entries of the array are the red channel values of the first row of the image.
    * labels -- a list of 50000 numbers in the range 0-9. The number at index `i` indicates the label of the `i`-th image in the array data.
* `test`: A pickled object of 10,000 test images and labels. This includes images and labels of all 10 classes. Unpickling the file gives a dictionary with the same keys as above.

:::danger
**⚠️WARNING⚠️:** In the `main` function in `assignment.py`, we give you `AUTOGRADER_TRAIN_FILE` and `AUTOGRADER_TEST_FILE` variables, which are the file paths that must be used for your code to work with the autograder. You might need to define separate filepaths to run the code locally (especially if you are on Windows). When you submit your code to Gradescope, you **MUST** call `get_data` using the autograder filepaths we have provided in the stencil (or filepaths identical to the ones we have provided).
:::

:::info
**Note:** If you download the dataset from online, the training data is actually divided into batches. We have done the job of repickling all of the batches into one single train file for your ease.
:::
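Putting the preprocessing steps together, here is a minimal sketch of what `get_data` might look like. It assumes the unpickled dictionary uses the standard CIFAR byte-string keys (`b"data"`, `b"labels"`) and that the class indices are passed in (cat is 3 and dog is 5 in CIFAR10); your stencil's actual signature and extraction code may differ:

```python
import numpy as np
import tensorflow as tf

def get_data(file_path, first_class=3, second_class=5):
    # Sketch only -- the stencil already extracts `inputs` and `labels`
    # from the unpickled dictionary for you.
    data_dict = unpickle(file_path)
    inputs = np.array(data_dict[b"data"])      # (num_images, 3072)
    labels = np.array(data_dict[b"labels"])    # values in 0-9

    # Keep only the two classes of interest (e.g., 3 = cat, 5 = dog).
    mask = (labels == first_class) | (labels == second_class)
    inputs, labels = inputs[mask], labels[mask]

    # Re-number labels: first class -> 0, second class -> 1.
    labels = np.where(labels == first_class, 0, 1)

    # Normalize, reshape to (-1, 3, 32, 32), transpose to (num_examples, 32, 32, 3).
    inputs = inputs.astype(np.float32) / 255.0
    inputs = tf.reshape(inputs, (-1, 3, 32, 32))
    inputs = tf.transpose(inputs, perm=[0, 2, 3, 1])

    return inputs, tf.one_hot(labels, depth=2)
```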
## Step 2. Create your model

:::warning
You will not receive credit if you use the `tf.keras`, `tf.layers`, and `tf.slim` libraries for anything but your optimizer (and Model inheriting from `tf.keras.Model` is okay too). You may use `tf.keras.optimizers`.
:::

* Again, you should initialize all hyperparameters within the constructor, even though this is not customary. This is still necessary for the autograder. Consider what's being learned in a CNN and initialize those as trainable parameters. In the last assignment, it was our weights and biases. This time around, you will still want weights and biases, but there are other things that are being learned!
* We recommend using an Adam Optimizer (`tf.keras.optimizers.Adam`) with a learning rate of 1e-3, but feel free to experiment with whatever produces the best results.
* Weight variables should be initialized from a normal distribution (`tf.random.truncated_normal`) with a standard deviation of 0.1.
* You may use any permutation and number of convolution, pooling, and feed forward layers, as long as you use at least one convolution layer with strides of `[1, 1, 1, 1]`, one pooling layer, and one fully connected layer.

:::success
**Note:** the Dense/Fully Connected Layers are like the linear layers created in the last assignment, with a weight and bias.
:::

* If you are having trouble getting started with the model architecture, we have provided an example below (a code sketch of the first block follows this list):
    * 1st Convolution Layer `[tf.nn.conv2d]` + Bias, Batch Normalization `[tf.nn.batch_normalization]`, ReLU `[tf.nn.relu]`, Max Pooling `[tf.nn.max_pool]`
        * Use `[tf.nn.bias_add]` to add the bias after your convolution operation.
        * For convolution, use 16 5x5 filters for each channel, 2x2 strides, and same padding. Example filter initialization: `[tf.Variable(tf.random.truncated_normal([5,5,3,16], stddev=0.1))]`
        * For pooling, use 3x3 kernels, 2x2 strides, and same padding.
    * 2nd Convolution Layer + Bias, Batch Normalization, ReLU, Max Pooling
        * For convolution, use 20 5x5 filters and same padding. The dimension of the strides is left to you.
        * For pooling, use 2x2 kernels and same padding. The dimension of the strides is left to you.
    * 3rd Convolution Layer + Bias, Batch Normalization, ReLU
        * For convolution, use 20 3x3 filters, 1x1 strides, and same padding.
    * Remember to reshape the output of the previous convolutional layer to make it compatible with the dense layers.
    * 1st Dense Layer + Bias, Dropout `[tf.nn.dropout]`
    * 2nd Dense Layer + Bias, Dropout
    * Final Dense Layer
* Fill out the call function using the trainable variables you've created. Your call function should return the logits. Note that in the lab, we mentioned using a `@tf.function` decorator to tell TF to run a function in graph execution. Do NOT do this for this assignment; we'll explain later why the forward pass has to be run in eager execution. The parameter `is_testing` will be used later; do not worry about it while implementing this part.
* Calculate the average softmax cross-entropy loss on the logits compared to the labels. We suggest using `tf.nn.softmax_cross_entropy_with_logits` and `tf.reduce_mean` to condense the loss to one value.
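As a rough illustration of the first block in the example architecture, here is a hedged sketch of how those `tf.nn` ops chain together inside `call`. The names `self.filter1` and `self.bias1` are made up for this sketch: `self.filter1` is assumed to be the `[5,5,3,16]` variable from the example initialization and `self.bias1` a 16-element bias variable. It also computes batch-norm statistics from the batch itself for simplicity; you may instead learn an offset and scale.

```python
import tensorflow as tf

def first_block(self, inputs):
    # inputs: (batch_size, 32, 32, 3)
    conv = tf.nn.conv2d(inputs, self.filter1, strides=[1, 2, 2, 1], padding="SAME")
    conv = tf.nn.bias_add(conv, self.bias1)

    # Batch normalization using the batch's own moments.
    mean, variance = tf.nn.moments(conv, axes=[0, 1, 2])
    conv = tf.nn.batch_normalization(conv, mean, variance,
                                     offset=None, scale=None,
                                     variance_epsilon=1e-5)

    conv = tf.nn.relu(conv)
    return tf.nn.max_pool(conv, ksize=[1, 3, 3, 1],
                          strides=[1, 2, 2, 1], padding="SAME")
```

The second and third blocks follow the same pattern with their own filters, strides, and kernel sizes.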
## Step 3. Train and test

* In the `main` function, you will want to get your train and test data, initialize your model, and train it for many epochs. We suggest training for 10 epochs. For the autograder, we will train it for at most 25 epochs (hard limit of 15 minutes). We have provided a train and test method for you to fill out. The train method will take in the model and do the forward and backward pass for a SINGLE epoch. Yes, this means that, unlike the first assignment, your `main` function will have a for loop that goes through the number of epochs, calling train each time.
* Even though this is technically part of preprocessing, you should shuffle your inputs and labels when TRAINING. Keep in mind that they have to be shuffled in the same order. We suggest creating a range of indices of length num_examples, then using `tf.random.shuffle`. Finally, you can use `tf.gather(train_inputs, indices)` to shuffle your inputs; you can do the same with your labels to ensure they are shuffled the same way. Alternatively, you can `zip` the inputs and labels before shuffling them to ensure they are shuffled in the same order.
* Make sure you've reshaped your inputs in preprocessing into shape (batch_size, width, height, in_channels) before calling model.call().
* When training, you might find it helpful to call `tf.image.random_flip_left_right` on your batch of image inputs to increase accuracy. Do not call this when testing.
* Call the model's forward pass and calculate the loss within the scope of `tf.GradientTape`. Then use the model's optimizer to apply the gradients to your model's trainable variables outside of the GradientTape (see the sketch after this list). If you're unsure about this part, please refer to the lab. This is analogous to the `gradient_descent` function in the first assignment, except that TensorFlow handles all of that for you!
* If you'd like, you can calculate the train accuracy to check that your model does not overfit the training set. If you get upwards of 80% accuracy on the training set but only 65% accuracy on the testing set, you might be overfitting.
* The test function will take in the same model, now with trained parameters, and return the accuracy given the test data and test labels. The test function will be very similar to the train function, except without the GradientTape.
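Here is a minimal, hypothetical sketch of one training epoch combining the shuffle and the GradientTape update. The attributes `model.batch_size`, `model.loss`, `model.optimizer`, and `model.loss_list` are assumptions about the stencil; adapt the names to your code:

```python
import tensorflow as tf

def train(model, train_inputs, train_labels):
    # Shuffle inputs and labels in the same order.
    indices = tf.random.shuffle(tf.range(train_inputs.shape[0]))
    inputs = tf.gather(train_inputs, indices)
    labels = tf.gather(train_labels, indices)

    for start in range(0, inputs.shape[0], model.batch_size):
        batch_inputs = inputs[start:start + model.batch_size]
        batch_labels = labels[start:start + model.batch_size]

        # Augmentation is applied during training only.
        batch_inputs = tf.image.random_flip_left_right(batch_inputs)

        with tf.GradientTape() as tape:
            logits = model.call(batch_inputs)          # forward pass inside the tape
            loss = model.loss(logits, batch_labels)    # average softmax cross-entropy

        # Backward pass: compute and apply gradients outside the tape's scope.
        gradients = tape.gradient(loss, model.trainable_variables)
        model.optimizer.apply_gradients(zip(gradients, model.trainable_variables))

        model.loss_list.append(loss)   # per-batch losses for visualize_loss (see below)
```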
## Step 4. Creating your own `conv2d`

Before starting this part of the assignment, you should ensure that you have an accuracy of **at least 70%** on the test set using only TensorFlow functions for the problem of classifying dogs and cats.

As a new addition to this assignment, you will be implementing your very own convolution function! For the sake of simple math calculations (less is more, no?), we'll require that our `conv2d` function only work with a stride of 1 (for both width and height). This is because the calculation for padding size changes as a result of the stride.

Do **NOT** change the parameters of the `conv2d` function we have provided. Even though the `conv2d` function takes in a strides argument, you should ALWAYS pass in [1, 1, 1, 1]. Leaving strides in as an argument was a conscious design choice: if you wanted to eventually make the `conv2d` function work for other kinds of strides in your own time, this would allow you to easily change it. A skeleton sketch follows this list.

* Your inputs will have 4 dimensions. If we are to use this `conv2d` function for the first layer, the inputs would be [batch_size, in_height, in_width, input_channels].
* You should ensure that the input's number of "in channels" is equivalent to the filters' number of "in channels". Make sure to add an assert statement or throw an error if the number of input in channels is not the same as the filter's in channels. You will lose points if you do not do this.
* When calculating how much padding to use for SAME padding, the padding is just `(filter_size - 1)/2` if you are using strides of 1. The calculation of padding differs if you increase your strides and is much more complex, so we won't be dealing with that. If you are interested, you may read about it [here](https://cs231n.github.io/convolutional-networks/). If the padding is not an integer, you can just round down using `math.floor`.
* You can use the hefty NumPy function `np.pad` to pad your input!
* After padding (if needed), you will want to go through the entire batch of images and perform the convolution operator on each image. There are two ways of going about this: you can continuously append multidimensional NumPy arrays to an output array, or you can create a NumPy array with the correct output dimensions and update each element in the output as you perform the convolution operator. We suggest the latter; it's conceptually easier to keep track of things this way.
* Your output dimension height is equal to `(in_height + 2*padY - filter_height) / strideY + 1` and your output dimension width is equal to `(in_width + 2*padX - filter_width) / strideX + 1`. Again, `strideX` and `strideY` will always be 1 for this assignment. Refer to the CNN slides if you'd like to understand this derivation.
* You will want to iterate over the entire height and width including padding, stopping when you cannot fit a filter over the rest of the padded input. For convolution with many input channels, you will want to perform the convolution per input channel and sum those dot products together.
* PLEASE CONVERT YOUR RESULT TO A TENSOR USING `tf.convert_to_tensor(your_array, dtype=tf.float32)`. Issues have occurred in the past without this step.
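To make the shape bookkeeping concrete, here is a rough skeleton of the loop structure described above. This is a sketch under the assignment's assumptions (strides of [1, 1, 1, 1], NHWC inputs), not the required implementation, and your stencil's exact signature may differ:

```python
import math
import numpy as np
import tensorflow as tf

def conv2d(inputs, filters, strides, padding):
    # Sketch only; strides is always [1, 1, 1, 1] for this assignment.
    inputs = np.asarray(inputs)
    filters = np.asarray(filters)

    num_examples, in_height, in_width, input_in_channels = inputs.shape
    filter_height, filter_width, filter_in_channels, filter_out_channels = filters.shape
    assert input_in_channels == filter_in_channels, "in channels must match"

    pad_y = pad_x = 0
    if padding == "SAME":
        pad_y = math.floor((filter_height - 1) / 2)
        pad_x = math.floor((filter_width - 1) / 2)
        inputs = np.pad(inputs, [(0, 0), (pad_y, pad_y), (pad_x, pad_x), (0, 0)])

    # With stride 1: out = in + 2*pad - filter + 1.
    out_height = in_height + 2 * pad_y - filter_height + 1
    out_width = in_width + 2 * pad_x - filter_width + 1
    output = np.zeros((num_examples, out_height, out_width, filter_out_channels))

    for n in range(num_examples):
        for y in range(out_height):
            for x in range(out_width):
                # Receptive field: (filter_height, filter_width, in_channels).
                patch = inputs[n, y:y + filter_height, x:x + filter_width, :]
                for k in range(filter_out_channels):
                    # Elementwise product summed over all input channels.
                    output[n, y, x, k] = np.sum(patch * filters[:, :, :, k])

    return tf.convert_to_tensor(output, dtype=tf.float32)
```

If this is too slow, the innermost loop can be collapsed into a single call such as `np.tensordot(patch, filters, axes=3)`, which produces all `filter_out_channels` values at once (see the tensordot hint below).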
## Step 5. Testing your own `conv2d`

* We have provided a few tests for you that compare the result of your very own `conv2d` and TensorFlow's `conv2d`. If you've implemented it correctly, the results should be very similar.
* The last super important part of this project is that you should call your `conv2d` function IN your model. TensorFlow cannot build a graph/differentiate with NumPy operators, so you should not add a `@tf.function` decorator.
* In your model, you should set `is_testing` to True when testing, then make sure that if `is_testing` is True, you use your own convolution rather than TensorFlow's `conv2d` on a SINGLE convolution layer. If you follow the architecture described above, we suggest adding an if statement before the third convolution layer (i.e., switch out the `conv2d` for your third convolution). This part will take the longest, and is why we say it might actually take up to 15 minutes on a local machine.

**Mandatory and Non-mandatory Hyperparameters**: You can train with any batch size, but you are limited to training for at most 25 epochs. However, your model must train using TensorFlow functions and test using your own convolution function without timing out on Gradescope. The parameters we suggest are training for up to 25 epochs using a batch size of 64.

:::info
Hint: If you are having difficulty running within the time frame, consider using matrix multiplication or [tensordot](https://www.tensorflow.org/api_docs/python/tf/tensordot) to replace one (or more) of your inner for loops.
:::

## Visualizing Results

We have written two methods for you to visualize your results. The created visuals will not be graded and are entirely for your benefit. You can use them to check out your doggos and kittens.

* We've provided the `visualize_results(image_inputs, logits, image_labels, first_label, second_label)` method for you to visualize your predictions against the true labels using matplotlib, a useful Python library for plotting graphs. This method is currently written with the image_labels having a shape of (num_images, num_classes). DO NOT EDIT THIS FUNCTION. You should call this function after training and testing, passing into `visualize_results` an input of 50 images, 50 probabilities, 50 labels, the first label name, and the second label name.
* Unlike the first assignment, you will need to pass in the strings of the first and second classes. A `visualize_results` method call might look like: `visualize_results(image_inputs, logits, image_labels, "cat", "dog")`.
* This should result in two visuals, one for correct predictions and one for incorrect predictions. You should do this after you are sure you have met the benchmark for test accuracy.
* We have also provided the `visualize_loss(losses)` method for you to visualize your loss per batch over time. Your model or your training function should have a list `loss_list` to which you can append batch losses during training. You should call this function after training and testing, passing in `loss_list`.

# Submission

## Requirements

:::warning
**Your model must complete training within 15 minutes AND under 25 epochs on Gradescope.**
:::

Our autograder will import your model and your preprocessing functions. We will feed the result of your `get_data` function called on a path to our data to your train method in order to return a fully trained model. After this, we will feed your trained model, alongside the TA pre-processed data, to our custom test function. This will just batch the testing data using YOUR batch size and run it through your model's `call` function. However, we will test that your model can test with any batch size, meaning that you should not hardcode `self.batch_size` in your `call` function. The logits which are returned will then be fed through an accuracy function. Additionally, we will test your `conv2d` function.

In order to ensure you don't lose points, you need to make sure that you: A) correctly return training inputs and labels from `get_data`, B) ensure that your model's `call` function returns logits from the inputs specified and does not break on different batch sizes when testing, and C) do not rely on any packages outside of tensorflow, numpy, matplotlib, or the python standard library.

In addition, remember to include a brief README with your model's accuracy and any known bugs.

### CS2470 Students

There are two extra requirements for CS2470 students.

1. Please complete the CS2470-only conceptual questions in addition to the coding assignment and the CS1470 conceptual questions.
2. You must receive an accuracy of at least 75% within 25 epochs of training your model. This means that you must choose an architecture/play around with hyperparameters to reach a higher accuracy.
    * Hint: Consider implementing cutout (as discussed in this [paper](https://arxiv.org/abs/1708.04552) from the conceptual questions) and/or playing around with the dropout rate.

## Grading

Code: You will be primarily graded on functionality. Your model should run within 15 minutes and 25 epochs on Gradescope and have an accuracy of at least 70% on the testing data (or 75% for CS2470 students).

Conceptual: You will be primarily graded on correctness (when applicable), thoughtfulness, and clarity.

:::warning
*You will not receive credit if you use the tf.keras, tf.layers, and tf.slim libraries for anything but your optimizer.*
:::

## Handing In

You should submit the assignment via Gradescope under the corresponding project assignment, either by zipping up your hw3 folder or through GitHub (recommended). To submit through GitHub, commit and push all changes to your repository to GitHub. You can do this by running the following three commands ([this](https://github.com/git-guides/#how-to-use-git) is a good resource for learning more about them):

1. `git add file1 file2 file3`
    * Alternatively, `git add -A` will stage all changed files for you.
2. `git commit -m "commit message"`
3. `git push`

After committing and pushing your changes to your repo (which you can check online if you're unsure whether it worked), you can now just upload the repo to Gradescope!
If you're testing out code on multiple branches, you have the option to pick whichever one you want.

![](https://i.imgur.com/fDc3PH9.jpg)

If you wish to submit via zip file:

1. Please make sure your python files are in "hw3/code"; this is very important for our autograder to work!
2. Make sure any data folders are not being uploaded, as they may be too big for the autograder to work.

:::warning
**IF YOU ARE IN 2470:** PLEASE REMEMBER TO ADD A BLANK FILE CALLED `2470student` IN THE `hw3/code` DIRECTORY. WE ARE USING THIS AS A FLAG TO GRADE 2470-SPECIFIC REQUIREMENTS; FAILURE TO DO SO MEANS LOSING POINTS ON THIS ASSIGNMENT.
:::

# Conclusion

Congrats on finishing your CNN homework; Baby Blueno is very appreciative of your help!! :tada: :baby: :octopus: :tada:

:::success
The Dumbo Octopus lives at a greater depth than any other octopus. As such, common octopus defense mechanisms, like changing color or ejecting ink, are absent in the Dumbo Octopus.
:::