Conceptual questions due Monday, March 4th, 2024 at 6:00 PM EST
Programming assignment due Friday, March 8th, 2024 at 6:00 PM EST
The school's biggest fear, the shark, is still an existential threat. To protect themselves, the fish want to develop a model that can help them distinguish between snarky sharks and welcoming whales!
You want to help the fish, but you don't know what a shark or whale is! So build them a model that distinguishes between cats and dogs (we know what those are) as an example!
In this assignment, you will be building a Convolutional Neural Network (CNN) with pooling layers using the CIFAR dataset to learn to distinguish cats and dogs. Please read this handout in its entirety before beginning the assignment.
Please submit your pdf with answers to all conceptual questions as one pdf on Gradescope under HW3 Conceptual Questions: Convolutional Neural Networks. When submitting the pdf to Gradescope, be sure to select in Gradescope which pages match with which questions.
LaTeX is recommended but not required. However, your solution must be typeset. No exceptions will be made. You will lose points if your submission doesn't follow this.
2470 students only: If you are in 2470, all conceptual questions (including non-2470 ones) should be written as one pdf and submitted to the "[CS2470] Hw3 Conceptual Questions: Convolution Neural Networks" assignment. Do not also submit to the CS1470 conceptual assignment.
You can find the conceptual questions here.
Note: these questions are due before the coding portion of the assignment.
Please click here to get the stencil code. Reference this guide for more information about GitHub and GitHub Classroom.
Do not change the stencil except where specified. While you are welcome to write your own helper functions, changing the stencil's method signatures or removing pre-defined functions could result in incompatibility with the autograder and result in a low grade.
The stencil should contain these files: assignment.py, convolution.py, preprocess.py, and local_test.py.
Run ./download.sh to get the data. You may need to run chmod +x download.sh beforehand.
You will need to use the virtual environment that you made in Homework 0 to run code in this assignment (because it relies on numpy and tensorflow), which you can activate by using conda activate csci1470.
Your task is a binary classification rather than a multi-class classification problem. We are doing CIFAR2, not CIFAR10. While the CIFAR10 dataset has 10 possible classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck), you will build a CNN that takes in an image and predicts whether it is a cat or a dog, hence CIFAR2. We limit this assignment to a binary classification problem so that you can train the model in a reasonable amount of time.
The assignment has three parts:
2470 students only: If you are taking 2470, you must also answer the additional questions on Gradescope marked with CS2470.
You should include a brief README with your model's accuracy and any known bugs!
This assignment should take longer to run than the previous assignment. If completed correctly, the model should train and test within 15 minutes on a department machine. While you will mainly be using TensorFlow functions, the second part of the assignment requires you to write your own convolution function, which is very computationally expensive. To counter this, we only require that you print the accuracy across the test set after finishing all training. On a department machine, training should take about 3 minutes and testing using your own convolution should take about 2 minutes.
You will notice that the structure of the Model class is very similar to the Model class defined in your first assignment. We strongly suggest that you first complete the Intro to TensorFlow Lab before starting this assignment. The lab includes many explanations about the way a Model class is structured, what variables are, and how things work in TensorFlow. If you come into hours with questions about TensorFlow-related material that is covered in the lab, we will direct you to the lab.
Below is a brief outline of some things you should do. We expect you to fill in some of the missing gaps (review lecture slides to understand the pipeline) as this is your third assignment.
We provide unpickle(file) in the preprocess file stencil, which unpickles an object and returns a dictionary. Do not edit it. We have also already extracted the inputs and labels from the dictionary in get_data, so you have no need to deal with the pickled file or the dictionary.
Limit the inputs and labels returned by get_data to those representing the first and second classes of your choice. For every image and its corresponding label, if the label is not of the first or second class, then remove the image and label from your inputs and labels arrays. There are a few different ways to do this; you might find numpy.nonzero or broadcasting useful for finding only the indices of your labels which correspond to the first and second class.
Reshape the inputs using tf.reshape(inputs, (-1, 3, 32, 32)) and then transpose them using tf.transpose(inputs, perm=[0,2,3,1]) so that the final inputs you return have shape (num_examples, 32, 32, 3), where the width is 32, height is 32, and number of channels is 3.
Renumber the labels of the two remaining classes (for example, 0 for the first class and 1 for the second); you might find numpy.where useful in the renumbering process.
Convert the renumbered labels to one-hot vectors with tf.one_hot.
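As a rough illustration of the filtering, renumbering, and one-hot steps above, here is a minimal NumPy sketch on toy data. The helper name filter_two_classes, the toy arrays, and the class ids 3 and 5 are assumptions made for this example and are not part of the stencil. np.eye stands in for tf.one_hot so that the sketch runs without TensorFlow.

```python
import numpy as np

def filter_two_classes(inputs, labels, first_class, second_class):
    """Keep only examples of two classes and renumber their labels to 0 and 1.

    Hypothetical helper for illustration; not part of the stencil.
    """
    labels = np.asarray(labels)
    # Boolean mask built by broadcasting, then the indices of the kept examples.
    keep = np.nonzero((labels == first_class) | (labels == second_class))[0]
    kept_inputs = inputs[keep]
    kept_labels = labels[keep]
    # Renumber: first_class -> 0, second_class -> 1.
    renumbered = np.where(kept_labels == first_class, 0, 1)
    # One-hot encode; np.eye(2)[k] is row k of the 2x2 identity matrix.
    one_hot = np.eye(2, dtype=np.float32)[renumbered]
    return kept_inputs, one_hot

# Toy data: 6 "images" of 4 pixels each, with labels from several classes.
toy_inputs = np.arange(24, dtype=np.float32).reshape(6, 4)
toy_labels = [3, 7, 5, 3, 0, 5]
kept_inputs, one_hot = filter_two_classes(toy_inputs, toy_labels, 3, 5)
print(kept_inputs.shape)  # (4, 4)
print(one_hot)
```

Indexing the inputs and the labels with the same index array is what keeps each image paired with its label.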
Note: You should normalize the input pixel values so that they range from 0 to 1 to avoid any numerical overflow issues. This can be done by dividing each pixel value by 255.
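To see what the normalization, reshape, and transpose do to the data layout, here is a small NumPy sketch on random toy rows. It assumes each raw row holds 3 * 32 * 32 = 3072 values stored channel by channel, as the reshape to (-1, 3, 32, 32) implies. np.reshape and np.transpose are used as stand-ins for tf.reshape and tf.transpose.

```python
import numpy as np

# Toy stand-in for raw rows: 2 images, each flattened to 3 * 32 * 32 = 3072 values.
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(2, 3072)).astype(np.float32)

# Scale pixel values from [0, 255] to [0, 1].
scaled = raw / 255.0

# Channel-first layout: (num_examples, channels, height, width).
channel_first = np.reshape(scaled, (-1, 3, 32, 32))

# Move the channel axis last: (num_examples, height, width, channels).
channel_last = np.transpose(channel_first, (0, 2, 3, 1))

print(channel_last.shape)  # (2, 32, 32, 3)
# The first 1024 values of a row form channel 0 of that image.
print(np.array_equal(channel_last[0, :, :, 0].ravel(), scaled[0, :1024]))  # True
```

Reshaping directly to (-1, 32, 32, 3) would give the right shape but would interleave values from different channels, which is why the transpose is needed.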
You're going to be calling get_data on both the training and testing data files in assignment.py. The testing and training data files to be read in are in the following format:
train: A pickled object of 50,000 train images and labels. This includes images and labels of all 10 classes. After unpickling the file, the dictionary will have the following elements:
The label at index i indicates the label of the i-th image in the array data.
test: A pickled object of 10,000 test images and labels. This includes images and labels of all 10 classes. Unpickling the file gives a dictionary with the same key values as above.
⚠️WARNING⚠️: In the main function in assignment.py, we give you AUTOGRADER_TRAIN_FILE and AUTOGRADER_TEST_FILE variables, which are the file paths that must be used for your code to work with the autograder. You might need to define separate filepaths to run the code locally (especially if you are on Windows). When you submit your code to Gradescope, you MUST call get_data using the autograder filepaths we have provided in the stencil (or filepaths identical to the ones we have provided).
Note: If you download the dataset from online, the training data is actually divided into batches. We have done the job of repickling all of the batches into one single train file for your ease.
You will not receive credit if you use the tf.keras, tf.layers, or tf.slim libraries for anything but your optimizer (having Model inherit from tf.keras.Model is OK too). You may use tf.keras.optimizers.
We suggest an Adam optimizer (tf.keras.optimizers.Adam) with a learning rate of 1e-3, but feel free to experiment with whatever produces the best results.
Initialize your weight variables from a truncated normal distribution (tf.random.truncated_normal) with a standard deviation of 0.1.
Your model must use at least one convolution layer with strides of [1, 1, 1, 1], one pooling layer, and one fully connected layer.
Note: the Dense/Fully Connected Layers are like the linear layers created in the last assignment with a weight and bias.
If you are having trouble getting started with model architecture, we have provided an example below:
Convolution Layer [tf.nn.conv2d] + Bias, Batch Normalization [tf.nn.batch_normalization], ReLU [tf.nn.relu], Max Pooling [tf.nn.max_pool]
Use [tf.nn.bias_add] to add the bias after your convolution operation.
Example filter variable: [tf.Variable(tf.random.truncated_normal([5,5,3,16], stddev=0.1))]
Dropout [tf.nn.dropout]
The argument is_testing will be used later; do not worry about it when implementing everything in this part.
Calculate the average softmax cross-entropy loss on the logits compared to the labels. We suggest using tf.nn.softmax_cross_entropy_with_logits and tf.reduce_mean to condense the loss to one value.
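For intuition about what the two suggested calls compute together, here is the same quantity written out in NumPy: a per-example softmax cross-entropy followed by a mean over the batch. The function name mean_softmax_cross_entropy and the toy logits are assumptions for this sketch. It is meant as a way to sanity-check values by hand, not as a replacement for the TensorFlow functions in your model.

```python
import numpy as np

def mean_softmax_cross_entropy(logits, one_hot_labels):
    """Average softmax cross-entropy over a batch, written out in NumPy."""
    logits = np.asarray(logits, dtype=np.float64)
    one_hot_labels = np.asarray(one_hot_labels, dtype=np.float64)
    # Subtract the row-wise max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Per-example loss is the negative log-probability of the true class.
    per_example = -(one_hot_labels * log_probs).sum(axis=1)
    return per_example.mean()

# Two examples, both with true class 0.
logits = [[0.0, 0.0], [2.0, 0.0]]
labels = [[1.0, 0.0], [1.0, 0.0]]
loss = mean_softmax_cross_entropy(logits, labels)
print(loss)
```

With equal logits the first example contributes log(2), and the confident, correct second example contributes a much smaller value, so the mean sits between the two.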
In the main function, you will want to get your train and test data, initialize your model, and train it for many epochs. We suggest training for 10 epochs. For the autograder, we will train it for at most 25 epochs (hard limit of 15 minutes). We have provided train and test methods for you to fill out. The train method will take in the model and do the forward and backward pass for a SINGLE epoch. Yes, this means that, unlike the first assignment, your main function will have a for loop that goes through the number of epochs, calling train each time.
Shuffle your inputs and labels during training. One way is to shuffle a range of indices with tf.random.shuffle. Finally you can use tf.gather(train_inputs, indices) to shuffle your inputs. You can do the same with your labels to ensure they are shuffled the same way. Alternatively, you can zip the inputs and labels before shuffling them to ensure they are shuffled in the same order.
Call tf.image.random_flip_left_right on your batch of image inputs to increase accuracy. Do not call this when testing.
Compute your gradients using tf.GradientTape. Then use the model's optimizer to apply the gradients to your model's trainable variables outside of the GradientTape. If you're unsure about this part, please refer to the lab. This is analogous to the gradient_descent function in the first assignment, except that TensorFlow handles all of that for you!
conv2d
Before starting this part of the assignment, you should ensure that you have an accuracy of at least 70% on the test set using only TensorFlow functions for the problem of classifying dogs and cats.
As a new addition to this assignment, you will be implementing your very own convolution function!
For the sake of simple math calculations (less is more, no?), we'll require that our conv2d function only works with a stride of 1 (for both width and height). This is because the calculation for padding size changes as a result of the stride.
Do NOT change the parameters of the conv2d function we have provided. Even though the conv2d function takes in a strides argument, you should ALWAYS pass in [1, 1, 1, 1]. Leaving in strides as an argument was a conscious design choice: if you wanted to eventually make the conv2d function work for other kinds of strides in your own time, this would allow you to easily change it.
To keep the output the same size as the input, the padding size is (filter_size - 1)/2 if you are using strides of 1. The calculation of padding differs if you increase your strides and is much more complex, so we won't be dealing with that. If you are interested, you may read about it here. If padding is not an integer, you can just round down using math.floor.
You can use np.pad to pad your input!
Your output dimension height is equal to (in_height + 2*padY - filter_height) / strideY + 1 and your output dimension width is equal to (in_width + 2*padX - filter_width) / strideX + 1. Again, strideX and strideY will always be 1 for this assignment. Refer to the CNN slides if you'd like to understand this derivation.
Convert your output to a tensor using tf.convert_to_tensor(your_array, dtype = tf.float32). Issues have occurred in the past without this step.
To test your conv2d, compare the result of your conv2d and TensorFlow's conv2d. If you've implemented it correctly, the results should be very similar.
Using your own conv2d function IN your model: TensorFlow cannot build a graph/differentiate with NumPy operators so you should not add a @tf.function decorator.
Set is_testing to True when testing, then make sure that if is_testing is True, you use your own convolution rather than TensorFlow's conv2d on a SINGLE convolution layer. If you follow the architecture described above, we suggest adding in an if statement before the third convolution layer (i.e., switch out the conv2d for your third convolution). This part will take the longest, and is why we say it might actually take up to 15 minutes on a local machine.
Mandatory and Non-mandatory Hyperparameters: You can train with any batch size but you are limited to training for at most 25 epochs. However, your model must train using TensorFlow functions and test using your own convolution function without timing out on Gradescope. Again, the parameters we suggest are training for 25 epochs using a batch size of 64.
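The padding and output-size formulas from this part can be checked on a toy input before writing any convolution loop. The sketch below is only an illustration: the helper names same_padding and output_size and the 5x5 filter size are assumptions for this example, not names from the stencil.

```python
import math
import numpy as np

def same_padding(filter_size):
    """Per-side padding for stride 1, rounded down as the handout suggests."""
    return math.floor((filter_size - 1) / 2)

def output_size(in_size, pad, filter_size, stride=1):
    """Output length along one axis: (in + 2*pad - filter) / stride + 1."""
    return (in_size + 2 * pad - filter_size) // stride + 1

# Toy batch: 1 image, 32x32, 3 channels (num_examples, height, width, channels).
inputs = np.ones((1, 32, 32, 3), dtype=np.float32)
filter_height, filter_width = 5, 5

pad_y = same_padding(filter_height)
pad_x = same_padding(filter_width)

# Pad only the height and width axes; batch and channel axes get no padding.
padded = np.pad(inputs, ((0, 0), (pad_y, pad_y), (pad_x, pad_x), (0, 0)), mode="constant")

out_height = output_size(32, pad_y, filter_height)
out_width = output_size(32, pad_x, filter_width)
print(padded.shape)           # (1, 36, 36, 3)
print(out_height, out_width)  # 32 32
```

With a padding of 0 the same formula gives 28 for a 32-pixel axis and a 5-pixel filter, which is a quick way to check the unpadded case too.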
Hint: If you are having difficulty running within the time frame, consider using matrix multiplication or tensordot to replace one (or more) of your inner for loops.
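As one possible reading of this hint, the sketch below compares explicit inner loops against np.tensordot for a single window of the input. It covers one output position only and is not a full conv2d; the patch and filter shapes are toy values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# One 5x5 window of a 3-channel input and a bank of 16 filters (toy sizes).
patch = rng.standard_normal((5, 5, 3))
filters = rng.standard_normal((5, 5, 3, 16))

# Explicit loops: one multiply-accumulate per (row, col, in_channel, out_channel).
looped = np.zeros(16)
for k in range(16):
    for i in range(5):
        for j in range(5):
            for c in range(3):
                looped[k] += patch[i, j, c] * filters[i, j, c, k]

# tensordot contracts the three shared axes (height, width, in_channels) at once.
vectorized = np.tensordot(patch, filters, axes=([0, 1, 2], [0, 1, 2]))

print(vectorized.shape)                 # (16,)
print(np.allclose(looped, vectorized))  # True
```

Replacing the four innermost loops with one call leaves only the loops over output positions (and examples) in Python, which is where most of the time savings would come from.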
We have written two methods for you to visualize your results. The created visuals will not be graded and are entirely for your benefit. You can use them to check out your doggos and kittens.
We have written the visualize_results(image_inputs, logits, image_labels, first_label, second_label) method for you to visualize your predictions against the true labels using matplotlib, a useful Python library for plotting graphs. This method is currently written with the image_labels having a shape of (num_images, num_classes). DO NOT EDIT THIS FUNCTION. You should call this function after training and testing, passing into visualize_results an input of 50 images, 50 probabilities, 50 labels, the first label name, and second label name.
An example visualize_results method call might look like: visualize_results(image_inputs, logits, image_labels, "cat", "dog").
We have also written the visualize_loss(losses) method for you to visualize your loss per batch over time. Your model or your training function should have a list loss_list to which you can append batch losses during training. You should call this function after training and testing, passing in loss_list.
Your model must complete training within 15 minutes AND under 25 epochs on Gradescope.
Our autograder will import your model and your preprocessing functions. We will call your get_data function on a path to our data and pass the result to your train method in order to return a fully trained model. After this, we will feed in your trained model, alongside the TA pre-processed data, to our custom test function. This will just batch the testing data using YOUR batch size and run it through your model's call function. However, we will test that your model can test with any batch size, meaning that you should not hardcode self.batch_size in your call function. The logits which are returned will then be fed through an accuracy function. Additionally, we will test your conv2d function. In order to ensure you don't lose points, you need to make sure that A) you correctly return training inputs and labels from get_data, B) your model's call function returns logits from the inputs specified and does not break on different batch sizes when testing, and C) your code does not rely on any packages outside of tensorflow, numpy, matplotlib, or the python standard library.
In addition, remember to include a brief README with your model's accuracy and any known bugs.
There are two extra requirements for CS2470 students.
Code: You will be primarily graded on functionality. Your model should run within 15 minutes and 25 epochs on Gradescope and have an accuracy of at least 70% on the testing data (or 75% for CS2470 students).
Conceptual: You will be primarily graded on correctness (when applicable), thoughtfulness, and clarity.
You will not receive credit if you use the tf.keras, tf.layers, and tf.slim libraries for anything but your optimizer.
You should submit the assignment via Gradescope under the corresponding project assignment by zipping up your hw3 folder or through GitHub (recommended). To submit through GitHub, commit and push all changes to your repository to GitHub. You can do this by running the following three commands (this is a good resource for learning more about them):
git add file1 file2 file3
Alternatively, git add -A will stage all changed files for you.
git commit -m "commit message"
git push
After committing and pushing your changes to your repo (which you can check online if you're unsure if it worked), you can now just upload the repo to Gradescope! If you’re testing out code on multiple branches, you have the option to pick whichever one you want.
If you wish to submit via zip file:
IF YOU ARE IN 2470: PLEASE REMEMBER TO ADD A BLANK FILE CALLED 2470student IN THE hw3/code DIRECTORY. WE ARE USING THIS AS A FLAG TO GRADE 2470 SPECIFIC REQUIREMENTS. FAILURE TO DO SO MEANS LOSING POINTS ON THIS ASSIGNMENT.
Congrats on finishing your CNN homework; Baby Blueno is very appreciative of your help!!
The Dumbo Octopus lives at a lower depth than any other octopus. As such, common octopi defense mechanisms, like changing colors or ink sacs, are absent on the Dumbo Octopus.