# HW2 Programming: CNNs
:::info
Assignment due **October 8th at 6 pm EST** on Gradescope
:::
## Theme
![catvdog](https://hackmd.io/_uploads/SJ_DOB5TR.png)
*The dogs' greatest enemy, an all-powerful cat, is making the dogs question their very existence!
You want to help the dogs, so build them a model that distinguishes between cats and dogs; only then will they be able to trust their own senses.*
## Assignment Overview
In this assignment, you will be building a **Multi-Layer Perceptron (MLP)** and **Convolutional Neural Network (CNN)** with pooling layers using the CIFAR dataset to learn to distinguish cats, dogs and deer (among other things). *Please read this handout in its entirety before beginning the assignment.*
## Getting started
### Stencil
Please click [here](https://classroom.github.com/a/mrex9l6r) to get the stencil code. Reference this [guide](https://hackmd.io/gGOpcqoeTx-BOvLXQWRgQg) for more information about GitHub and GitHub Classroom.
:::danger
**Do not change the stencil except where specified**. While you are welcome to write your own helper functions, changing the stencil's method signatures or removing pre-defined functions could result in incompatibility with the autograder and result in a low grade.
:::
The stencil should contain these files: `assignment.py`, `base_model.py`, `cnn.py`, `local_test.py`, `manual_convolution.py`, `mlp.py` and `preprocess.py`.
:::info
**Task 0.1 Download the data:** Click [here](https://cs.brown.edu/courses/csci1470/hw_data/hw2.zip) to download the data. When you unzip you'll find 2 files, `data/train` and `data/test`. These files contain all the data you'll need for this assignment.
:::
### Environment
You will need to use the virtual environment that you made in Homework 1 to run code in this assignment (because it relies on `numpy` and `tensorflow`), which you can activate by using `conda activate csci2470`.
## Assignment Details
Your task is a multi-class classification problem on the CIFAR10 dataset, which you can read about [here](https://www.cs.toronto.edu/~kriz/cifar.html). While the CIFAR10 dataset has 10 possible classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck), you will build a CNN to take in an image and correctly predict a subset of these classes. __You'll submit your model's predictions for the _dog-deer-cat_ subset.__
The assignment has two parts:
1. **Model:** Build the models. Our stencil provides a model class with several methods and hyperparameters you need to use for your network.
2. **Convolution Function:** Fill out a function that performs the convolution operator. See Roadmap below for more information on parts 1 and 2.
:::info
You should include a brief README with your model's accuracy and any known bugs!
:::
If completed correctly, the model should train and test within 15 minutes on a department machine. It takes about 5 minutes on our TAs' laptops. While you will mainly be using TensorFlow functions, the second part of the assignment requires you to write your own convolution function, which can be very computationally expensive. To counter this, we only require that you print the accuracy across the test set after finishing all training. On a department machine, training should take about 3 minutes and testing using your own convolution should take about 2 minutes.
# Roadmap
Below is a brief outline of some things you should do. We expect you to fill in some of the missing gaps (review lecture slides to understand the pipeline).
## Step 1. Preprocessing Data
:::danger
**⚠️WARNING⚠️:** __Please do not shuffle the data here__. You'll shuffle the data before training and testing. You should maintain the order of examples as they are loaded in or you will fail Autograder test 1.4.
:::
:::info
__Task 1.1 [preprocess.get_data pt 1]:__ Start filling in the get_data function in `preprocess.py`.
* We have provided you with a function `unpickle(file)` in the `preprocess.py` file stencil, which unpickles an object and returns a dictionary. Do not edit it. We have also already extracted the inputs and labels from the dictionary in `get_data` so you have no need to deal with the pickled file or the dictionary.
- You will want to limit the inputs and labels returned by `get_data` to those specified by the `classes` parameter. You will be expected to turn in predictions for the **cat (label index 3), deer (4), dog (5)** subset of the test set, so it's a good default to have in mind. For every image and its corresponding label, if the label is not in `classes`, then remove the image and label from your inputs and labels arrays. There are a few different ways to do this; you might find [`numpy.nonzero`](https://numpy.org/doc/1.18/reference/generated/numpy.nonzero.html) or [`broadcasting`](https://numpy.org/doc/stable/user/basics.broadcasting.html) useful for finding only the indices of your labels. A sketch follows this box.
:::
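Here's a minimal sketch of one way to do that filtering with NumPy. It assumes `inputs` is the (num_examples, 3072) array and `labels` is a 1-D array of class indices, both already extracted from the unpickled dictionary by the stencil; your variable names may differ.
```python
import numpy as np

# Keep only the examples whose label is in `classes`
labels = np.array(labels)
mask = np.isin(labels, classes)      # True wherever the label is one we keep
keep = np.nonzero(mask)[0]           # indices of the examples to keep
inputs = inputs[keep]
labels = labels[keep]
```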
:::info
__Task 1.2 [preprocess.get_data pt 2]:__ Continue filling in the get_data function in `preprocess.py`.
- At this point, your inputs are still two-dimensional. You will want to reshape your inputs so that the final inputs you return have shape (num_examples, 32, 32, 3), where the width is 32, height is 32, and number of channels is 3. `tf.reshape` and `tf.transpose` will be helpful here.
- You should normalize the input pixel values so that they range from 0 to 1 to avoid any numerical overflow issues. This can be done by dividing each pixel value by 255.
:::
:::danger
It is critical to first reshape your data to (num_examples, 3, 32, 32) then transpose to get (num_examples, 32, 32, 3). If you do not do it in this order the images will be scrambled!
:::
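As a concrete illustration of that ordering, here is a minimal sketch, assuming `inputs` is the filtered (num_examples, 3072) array:
```python
import tensorflow as tf

# Reshape channels-first, then transpose to channels-last; doing these in
# the other order scrambles the images, as warned above.
inputs = tf.reshape(inputs, (-1, 3, 32, 32))
inputs = tf.transpose(inputs, perm=[0, 2, 3, 1])   # -> (num_examples, 32, 32, 3)
inputs = tf.cast(inputs, tf.float32) / 255.0       # normalize pixels to [0, 1]
```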
:::info
__Task 1.3 [preprocess.get_data pt 3]:__ Finish the get_data function in `preprocess.py`.
- You will want to re-number the labels such that the lowest label -> 0 and the highest label -> `num_classes - 1`, filling in the intermediate values likewise. (In general it doesn't matter how this is done; however, we require this specific mapping for our autograder.) You might find [`numpy.where`](https://numpy.org/doc/stable/reference/generated/numpy.where.html) useful in the renumbering process.
- After doing that, you will want to turn your labels into one-hot vectors, where the index with a 1 represents the class of the correct image. You can do this with the function `tf.one_hot`.
- This can be a bit confusing, so we'll make it clear: your final labels should be of size (num_images, num_classes). For example, if you have 2 classes, cat and dog, the corresponding label of a dog image might be [0, 1], where a 1 in the second index means that it's a dog.
:::
:::success
Note: If you use `tf.one_hot`, you will need to shift your labels so that the class indices are between 0 and len(classes) - 1. For example, if you are using classes [3,6,8], change the labels to [0,1,2] respectively before using `tf.one_hot`. A sketch follows this box.
:::
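Here is a minimal sketch of the renumbering and one-hot steps, assuming `classes` holds distinct non-negative label indices (e.g. [3, 4, 5]):
```python
import numpy as np
import tensorflow as tf

# Map each original label to its position in the sorted classes list,
# e.g. 3 -> 0, 4 -> 1, 5 -> 2 for classes [3, 4, 5]
for new_label, old_label in enumerate(sorted(classes)):
    labels = np.where(labels == old_label, new_label, labels)
# One-hot encode: final shape is (num_examples, num_classes)
labels = tf.one_hot(labels, depth=len(classes))
```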
:::danger
**⚠️WARNING⚠️:** In the `main` function in `assignment.py`, we give you `AUTOGRADER_TRAIN_FILE` and `AUTOGRADER_TEST_FILE` variables, which are the file paths that must be used for it to work with the autograder. You might need to define separate filepaths to run the code locally (especially if you are on Windows). When you submit your code to Gradescope, you **MUST** call `get_data` using the autograder filepaths we have provided in the stencil (or filepaths identical to the ones we have provided).
:::
:::success
**Note:** If you download the dataset from online, the training data is actually divided into batches. We have done the job of repickling all of the batches into one single train file for your ease.
:::
:::success
**Note:** You're going to be calling `get_data` on both the training and testing data files in `assignment.py`. The testing and training data files to be read in are in the following format:
- `train`: A pickled object of 50,000 train images and labels. This includes images and labels of all 10 classes. After unpickling the file, the dictionary will have the following elements:
    - `data` -- a 50000x3072 numpy array of uint8s. Each row of the array stores a 32x32 color image. The first 1024 entries contain the red channel values, the next 1024 the green, and the final 1024 the blue. The image is stored in row-major order, so that the first 32 entries of the array are the red channel values of the first row of the image.
    - `labels` -- a list of 50000 numbers in the range 0-9. The number at index `i` indicates the label of the `i`-th image in the array data.
- `test`: A pickled object of 10,000 test images and labels. This includes images and labels of all 10 classes. Unpickling the file gives a dictionary with the same key values as above.
:::
:::info
__Task 1.4 [assignment.main pt 1]:__ Load in both your training and testing data using `get_data`. Print out the shapes, values, etc., and once you are happy, feel free to submit what you have so far to the autograder to check your score for the preprocessing tests.
:::
Throughout this assignment we recommend building `assignment.py` as you go so that you can test your implementations as you write them, not all at once. Now is a great time to start filling out `assignment.main` while testing your `get_data` at the same time.
## Step 2. Create your MLP model
Time to make your first deep learning model! Go to the `mlp.py` file and take a glance at the stencil. You'll notice that we have a constructor function (`__init__`) and a `call` function. In the constructor, we want to build up everything necessary to have a working deep learning model. In the `call` function, we want to fill out how the model should use its instance variables to convert an input to an output.
:::info
__Task 2.1 [mlp.MLP.__init__]:__ Finish filling out `MLP.__init__`
While creating your models, you're going to be working with Tensorflow's `keras` library! Take a glance at some documentation if you need help getting started!
- [Tensorflow Dense Layer](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense)
- [Tensorflow Reshape](https://www.tensorflow.org/api_docs/python/tf/reshape)
- You should initialize all hyperparameters within the constructor. We've given you some default values.
- Make instances of your model's Dense layers here too. Keep in mind what dimensions you need to get the right predictions, considering the shape of your labels.
- We also recommend starting with just one layer and then adding more intermediate layers once you get that running.
:::
:::info
__Task 2.2 [mlp.MLP.call]:__ Fill out `MLP.call`
- First, flatten your input images. You should end up with `num_inputs` vectors, one per image.
- Call your dense layers!
- We expect both the MLP and CNN to output logits (unnormalized likelihood units), with each image in the input matrix being given a logit for each possible class. In other words, we expect your models to have an output shape of `[batch_size, num_classes]`. A sketch of one possible MLP follows this box.
:::
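Here's a hedged sketch of what a minimal MLP might look like; the layer sizes and hyperparameter values are illustrative, and the stencil's actual constructor signature may differ:
```python
import tensorflow as tf
from base_model import CifarModel

class MLP(CifarModel):
    def __init__(self, num_classes=3):
        super().__init__()
        self.batch_size = 64                 # example hyperparameter value
        self.dense1 = tf.keras.layers.Dense(128, activation="relu")
        self.dense2 = tf.keras.layers.Dense(num_classes)   # no activation: logits

    def call(self, inputs):
        # Flatten each (32, 32, 3) image into a single 3072-length vector
        flattened = tf.reshape(inputs, (tf.shape(inputs)[0], -1))
        return self.dense2(self.dense1(flattened))
```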
:::info
__Task 2.3 [base_model.CifarModel.loss]:__ Given the logits and labels, compute and return the mean loss in `CifarModel.loss`
You might've noticed that the `MLP` inherits from the `CifarModel` class. Fill in the loss and accuracy functions in `base_model.py`
- Use the average softmax cross-entropy value on the logits compared to the labels as your loss. We suggest using `tf.nn.softmax_cross_entropy_with_logits` and `tf.reduce_mean` to __condense the loss to one value__.
:::
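A minimal sketch of the loss body described above, assuming the labels are already one-hot:
```python
import tensorflow as tf

def loss(self, logits, labels):
    # Per-example softmax cross-entropy, condensed to one scalar via the mean
    per_example = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
    return tf.reduce_mean(per_example)
```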
:::success
__Note:__ We use `softmax_cross_entropy_with_logits` since our output values are unnormalized. We could just as well have used `tf.keras.losses.CategoricalCrossentropy` with a softmax activation at the end of our forward pass. In short, there are often many ways to do effectively the same operations.
:::
:::info
__Task 2.4 [base_model.CifarModel.accuracy]:__ Given the logits and labels, compute then return the accuracy in `CifarModel.accuracy`
- To find your accuracy, first find, for each input image, the predicted most-likely class. You might find [`tf.argmax`](https://www.tensorflow.org/api_docs/python/tf/math/argmax) helpful. Then, find the fraction of predictions that match the true labels. You might find `tf.equal` and `tf.reduce_mean` useful for this task. A sketch follows this box.
:::
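A minimal sketch of that computation, assuming one-hot labels:
```python
import tensorflow as tf

def accuracy(self, logits, labels):
    # Predicted class is the argmax over logits; true class is the argmax
    # over the one-hot labels. Accuracy is the fraction that match.
    predictions = tf.argmax(logits, axis=1)
    correct = tf.equal(predictions, tf.argmax(labels, axis=1))
    return tf.reduce_mean(tf.cast(correct, tf.float32))
```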
Now, all that's left to do with your MLP is run it!
:::info
__Task 2.5 [assignment.main pt 2]:__ Initialize your MLP model in the main function of `assignment.py` to ensure nothing breaks. If you'd like, you can further sanity check your MLP by running a batch of data through the forward pass and confirming the output shape is what you expect.
You should also initialize your optimizer here. We recommend using an Adam Optimizer with a learning rate of 1e-3, but feel free to experiment with other optimizers.
:::
## Step 3. Train and test
In the `main` function, you will want to get your train and test data, initialize your model, and train it for many epochs. We suggest training for 10 epochs. We have provided for you a train and test method to fill out. The train method will take in the model and do the forward and backward pass for a SINGLE epoch. Iterate until either your test accuracy is sufficiently large or you have reached the max number of epochs (you can set this to whatever you'd like but we recommend keeping it below 25). For reference, we are able to reach good accuracy after no more than 10 epochs.
:::info
__Task 3.1 [train]:__ Go ahead and write the train function in `assignment.py`.
Even though this is technically part of preprocessing, you should shuffle your inputs and labels when TRAINING. Keep in mind that they have to be shuffled in the same order. You may find `tf.random.shuffle` and `tf.gather(train_inputs, indices)` of use; there is a sketch after this box.
- Make sure you've reshaped inputs in preprocessing into shape (batch_size, width, height, in_channels) before calling model.call(). When training, you might find it helpful to actually call `tf.image.random_flip_left_right` on your batch of image inputs to increase accuracy. Do not call this when testing.
- When training, call the model's forward pass and calculate the loss within the scope of tf.GradientTape. Then use the optimizer to apply the gradients to your model's trainable variables outside of the GradientTape. In the end it should look like this:
```python
with tf.GradientTape() as tape:
    logits = model.call(batch_inputs)
    loss = model.loss(logits, batch_labels)
# Computes the gradients of all trainable vars w.r.t. the loss
gradients = tape.gradient(loss, model.trainable_variables)
# Adjusts the trainable vars according to the optimizer update rule
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
```
:::
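And here's a hedged sketch of the shuffling and batching that wraps that snippet; `model.batch_size` is assumed to be one of your constructor's hyperparameters:
```python
import tensorflow as tf

# Shuffle inputs and labels in the same order by shuffling indices once
indices = tf.random.shuffle(tf.range(tf.shape(train_inputs)[0]))
shuffled_inputs = tf.gather(train_inputs, indices)
shuffled_labels = tf.gather(train_labels, indices)

for start in range(0, int(tf.shape(shuffled_inputs)[0]), model.batch_size):
    batch_inputs = shuffled_inputs[start : start + model.batch_size]
    batch_labels = shuffled_labels[start : start + model.batch_size]
    # Random flips are a training-only augmentation; skip this when testing
    batch_inputs = tf.image.random_flip_left_right(batch_inputs)
    # ...forward pass, loss, and gradient step as in the snippet above
```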
If you'd like, you can calculate the train accuracy to check that your model does not overfit the training set. If you get upwards of 80% accuracy on the training set but only 65% accuracy on the testing set, you might be overfitting.
:::info
__Task 3.2 [test]:__ Write the test function in `assignment.py`.
- The test function will take in the same model, now with trained parameters, and return the accuracy given the test data and test labels. The test function will be very similar to the train function except without the GradientTape.
:::
:::danger
**⚠️WARNING⚠️:**
When testing __you should NOT randomly flip images or do any extra preprocessing.__
:::
:::info
__Task 3.3 [assignment.main pt 3]:__ Now try training your MLP Model!
:::
:::info
__Task 3.4 [assignment.main pt 4]:__ Once you have confirmed that training the model doesn't break, add in testing so you can see how the model does when it counts!
:::
:::info
__Task 3.5 [assignment.main pt 5]:__ Finally, write a small bit of code to save your MLP predictions for the CAT-DEER-DOG (3-5) subset of the testing data as `predictions_mlp.npy`. You'll submit this file to the autograder for an accuracy check. Your accuracy will also be added to the _optional and ungraded_ leaderboard. A sketch follows this box.
:::
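A minimal sketch of the saving step, assuming `test_logits` holds your MLP's outputs on the cat-deer-dog subset (check the stencil's comments for the exact format expected):
```python
import numpy as np
import tensorflow as tf

# Save the predicted class index for each test image in the subset
predictions = tf.argmax(test_logits, axis=1)
np.save("predictions_mlp.npy", predictions.numpy())
```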
### Improving your MLP
You might notice that your MLP doesn't perform so well. While there's only so much you can do for an MLP, two things to try are activation layers and dropout.
You might've noticed that Dense layers take an `activation` argument. You can either pass an instance of the activation you want to use directly, or you can just pass in a string, like `"sigmoid"`, `"relu"`, or `"leaky_relu"`! Try passing these to your Dense layers, except the last one (remember, we want logits)!
Dropout can also help improve performance at testing time. Dropout (`tf.nn.dropout`, or the `tf.keras.layers.Dropout` layer) randomly sets entries in its input to 0 during training. This way, the model is forced to make a prediction without certain input features; if the model was overfitting on these individual features, then dropout works to prevent this.
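For illustration only, a hidden layer with an activation plus a dropout layer might look like this, assuming `num_classes` is defined; the sizes and rate are arbitrary:
```python
import tensorflow as tf

hidden = tf.keras.layers.Dense(128, activation="relu")
drop = tf.keras.layers.Dropout(0.3)           # zeroes ~30% of entries during training
output = tf.keras.layers.Dense(num_classes)   # last layer stays linear: logits
```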
## Step 4. Create your CNN model
Time for your second model! This time, we'll be making a convolutional model to get even better results.
You might find this useful: [Tensorflow Conv2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D)
:::info
__Task 4.1 [CNN.__init__]__:
Go fill out the `__init__` function for the `CNN`.
- Again, you should initialize all hyperparameters within the constructor.
- Additionally, make all of your Convolutional and Dense layers here!
- You may use any permutation and number of convolution, pooling, and dense layers, as long as you use at least one convolution layer with strides of `(1, 1)`, one pooling layer, and one fully connected layer.
:::
:::success
If you are having trouble getting started with model architecture, we have provided an example below:
- 1st Convolution Layer `[tf.keras.layers.Conv2D]` + Bias, Batch Normalization `[tf.nn.batch_normalization]`, ReLU `[tf.nn.relu]`, Max Pooling `[tf.nn.max_pool]`
- 2nd Convolution Layer + Bias, Batch Normalization, ReLU, Max Pooling
- 3rd Convolution Layer + Bias, Batch Normalization, ReLU
- Remember to reshape the output of the previous convolutional layer to make it compatible with the dense layers.
- 1st Dense Layer + Bias, Dropout `[tf.nn.dropout]`
- 2nd Dense Layer + Bias, Dropout
- Final Dense Layer
:::
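As a hedged sketch, the first block of that example could be built with Keras layers like so; the filter count and kernel size are illustrative, not required values:
```python
import tensorflow as tf

# In __init__: one convolution block from the example architecture
self.conv1 = tf.keras.layers.Conv2D(16, 3, strides=(1, 1), padding="same")
self.bn1 = tf.keras.layers.BatchNormalization()

# In call(): convolve, normalize, activate, then pool
out = tf.nn.max_pool(tf.nn.relu(self.bn1(self.conv1(inputs))),
                     ksize=2, strides=2, padding="SAME")
```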
:::info
__Task 4.2 [CNN.call]:__ Fill out the call function using the trainable variables you've created. Your call function should return the logits. The parameter `is_testing` will be used later; do not worry about it when implementing everything in this part.
:::
:::info
__Task 4.3 [Train and Test CNN]:__ Go ahead and train and test your CNN model as you did with your MLP model. You'll save and submit the predictions for the cat-deer-dog test subset as `predictions_cnn.npy`.
:::
## Step 5. Creating your own `conv2d`
:::warning
Before starting this part of the assignment, you should ensure that you have an accuracy of **at least 70%** on the test set using only TensorFlow functions for the problem of classifying cats, deer, dogs.
:::
:::success
You will be implementing your very own convolution function!
For the sake of simple math calculations (less is more, no?), we'll require that our `ManualConv2d` function **only works with a stride of 1** (for both width and height). This is because the calculation for padding size changes as a result of the stride.
:::
:::danger
Do **NOT** change the parameters of the `ManualConv2d` class we have provided. Even though `ManualConv2d` takes in a strides argument, you should ALWAYS pass in [1, 1, 1, 1]. Leaving strides in as an argument was a conscious design choice: if you ever wanted to make `ManualConv2d` work for other kinds of strides in your own time, this would allow you to easily change it.
:::
:::info
__Task [ManualConv2d]:__ Implement your very own Conv2d! Here are some specifics and hints for you:
- __[Inputs]__ Your inputs will have 4 dimensions. If we are to use this conv2d function for the first layer, the inputs would be [batch_size, in_height, in_width, input_channels].
- __[Filter Initialization]__ You should ensure that the input's number of "in channels" is equivalent to the filters' number of "in channels". It's good practice to add an assert statement or throw an error if the input's in-channel count does not match the filters'.
- __[Padding]__ When calculating how much padding to use for SAME padding, we want to have a total of `(filter_size - 1)` padding pixels for each padded dimension (width and height) if you are using strides of 1.
- For instance, if we have `filter_size=3`, then we would want to pad an image's width and height by 1 pixel on both sides (*i.e.* there's a 1-pixel wide border around the whole image).
- The calculation of padding differs if you increase your strides and is much more complex, so we won’t be dealing with that. If you are interested in calculating padding, you can read about it [here](https://cs231n.github.io/convolutional-networks/).
- __[Algorithm Hints]__ After padding (if needed), you will want to go through the entire batch of images and perform the convolution operation on each image. There are two ways of going about this: you can continuously append multidimensional NumPy arrays to an output array, or you can create a NumPy array with the correct output dimensions and update each element as you perform the convolution. We suggest the latter; it's conceptually easier to keep track of things this way.
- __[Algorithm Hints]__ You will want to iterate over the entire height and width including padding, stopping when you cannot fit a filter over the rest of the padded input. For convolution with many input channels, you will want to perform the convolution per input channel and sum those dot products together.
- __[Outputs]__ Your output dimension height is equal to `(in_height + total_padY - filter_height) / strideY + 1` and your output dimension width is equal to `(in_width + total_padX - filter_width) / strideX + 1`. Again, `strideX` and `strideY` will always be 1 for this assignment. Refer to the CNN slides if you'd like to understand this derivation.
- __[Outputs]__ PLEASE CONVERT YOUR RESULT TO A TENSOR USING `tf.convert_to_tensor(your_array, dtype = tf.float32)`. Issues have occurred in the past without this step.
- __[All Around]__ You can (and should) use TensorFlow functions (`tf.reduce_sum`, `tf.multiply`, etc.) to build your forward pass. This will help ensure you are able to train your model using this layer.
:::
:::danger
Writing `ManualConv2d` has given students grief in the past. We recommend taking your time here; jumping right into coding is likely to confuse you even more.
:::
:::warning
When padding an odd amount, left-right or top-bottom, you should allocate less padding to the left and top than to the right and bottom. For example, if you need 3 padding columns, add 1 column to the left and 2 to the right. This is a convention that __will__ be enforced by the autograder.
:::
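A small sketch of that padding arithmetic (stride 1, SAME padding), assuming `inputs` has shape (batch, in_height, in_width, in_ch):
```python
import numpy as np

total_pad_y = filter_height - 1      # e.g. filter_size=4 -> 3 padding rows
pad_top = total_pad_y // 2           # 3 // 2 = 1 (less on top)
pad_bottom = total_pad_y - pad_top   # 3 - 1 = 2 (more on bottom)
total_pad_x = filter_width - 1
pad_left = total_pad_x // 2
pad_right = total_pad_x - pad_left

# Pad only the spatial dimensions, not batch or channels
padded = np.pad(inputs, ((0, 0), (pad_top, pad_bottom),
                         (pad_left, pad_right), (0, 0)))
```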
:::success
Hopefully Helpful Hints:
1. In the past, many students have found success thinking about how to use 4 for loops to write ManualConv2d, then with some small tweaks you can easily get it down to 2 for loops.
2. Don't be scared to use broadcasting by expanding the dimensions of the input or filters so that your computation is easier. A sketch of this follows the box.
:::
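Following those hints, here is a hedged two-loop sketch of the inner computation, using the NumPy-output-array approach suggested above. It assumes `padded` is the padded array from the earlier sketch and `filters` has shape (filter_h, filter_w, in_ch, out_ch):
```python
import numpy as np
import tensorflow as tf

output = np.zeros((batch_size, out_height, out_width, out_channels), dtype=np.float32)
filters_np = np.array(filters)       # snapshot of the filter weights
for y in range(out_height):
    for x in range(out_width):
        # Slice a (batch, filter_h, filter_w, in_ch) window at this position
        patch = padded[:, y:y + filter_h, x:x + filter_w, :]
        # Broadcast against all filters at once, then sum over the filter
        # height, width, and input channels to get (batch, out_ch)
        output[:, y, x, :] = np.sum(patch[..., np.newaxis] * filters_np,
                                    axis=(1, 2, 3))
return tf.convert_to_tensor(output, dtype=tf.float32)
```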
## Step 6. Testing your own `conv2d`
:::info
__Optional Task [Testing your conv]:__ We have provided a few tests in `local_test.py` that compare the result of your very own `conv2d` against TensorFlow's `conv2d`. If you've implemented it correctly, the results should be very similar.
These tests are also similar to (but not quite the same as) the autograder tests.
:::
:::warning
In your model, you should set `is_testing` to True when testing, then make sure that if `is_testing` is True, you use your own convolution rather than TensorFlow's `conv2d` on a SINGLE convolution layer. This part will take the longest, and is why we say it might actually take up to 15 minutes on a local machine.
:::
## Visualizing Results
We have written two methods for you to visualize your results. The created visuals will not be graded and are entirely for your benefit. You can use them to check out your doggos and kittens.
- We've provided the `visualize_results(image_inputs, logits, image_labels, first_label, second_label)` method for you to visualize your predictions against the true labels using matplotlib, a useful Python library for plotting graphs. This method is currently written assuming `image_labels` has a shape of (num_images, num_classes). DO NOT EDIT THIS FUNCTION. You should call this function after training and testing, passing into `visualize_results` an input of 50 images, 50 probabilities, 50 labels, the first label name, and the second label name.
- Unlike the first assignment, you will need to pass in the strings of the first and second classes. A `visualize_results` method call might look like: `visualize_results(image_inputs, logits, image_labels, "cat", "dog")`.
- This should result in two visuals, one for correct predictions, and one for incorrect predictions. You should do this after you are sure you have met the benchmark for test accuracy.
- We have also provided the `visualize_loss(losses)` method for you to visualize your loss per batch over time. Your model or your training function should keep a list `loss_list` to which you can append batch losses during training. You should call this function after training and testing, passing in `loss_list`.
# Submission
## Requirements
:::warning
**You *must* use your own convolution instead of TensorFlow's for at least a single layer when `is_testing` is `True`.**
:::
Our autograder will import your model and your preprocessing functions. We will feed the result of your `get_data` function called on a path to our data and pass the result to your train method in order to return a fully trained model. After this, we will feed your trained model, alongside the TA pre-processed data, to our custom test function. This will batch the testing data using YOUR batch size and run it through your model's `call` function. However, we will test that your model can handle any batch size, meaning that you should not hardcode `self.batch_size` in your `call` function. The returned logits will then be fed through an accuracy function. We will also test your conv2d function. To ensure you don't lose points, make sure that you:
- A) correctly return training inputs and labels from `get_data`,
- B) ensure that your model's `call` function returns logits from the inputs specified and does not break on different batch sizes when testing, and
- C) do not rely on any packages outside of TensorFlow, NumPy, matplotlib, or the Python standard library.
In addition, remember to include a brief README with your model's accuracy and any known bugs.
## Grading
Code: You will be primarily graded on functionality. Your model should have an accuracy that is at least greater than 75% on the cat-deer-dog testing data using your CNN model and greater than 60% on the cat-deer-dog testing subset using your MLP model.
## Handing In
You should submit the assignment via Gradescope under the corresponding project assignment, either by dropping all your files into Gradescope or through GitHub. To submit through GitHub, commit and push all changes to your repository. You can do this by running the following three commands ([this](https://github.com/git-guides/#how-to-use-git) is a good resource for learning more about them):
1. `git add file1 file2 file3`
   - Alternatively, `git add -A` will stage all changed files for you.
2. `git commit -m "commit message"`
3. `git push`
After committing and pushing your changes to your repo (which you can check online if you're unsure if it worked), you can now just upload the repo to Gradescope! If you’re testing out code on multiple branches, you have the option to pick whichever one you want.
![](https://i.imgur.com/fDc3PH9.jpg)
If you wish to submit via zip file, make sure any data folders are not included, as they may be too big for the autograder to handle.
# Conclusion
Congrats on finishing your CNN homework; Baby Blueno is very appreciative of your help!! :tada: :baby: :octopus: :tada:
:::success
The Dumbo Octopus lives at a lower depth than any other octopus. As such, common octopus defense mechanisms, like changing colors or ink sacs, are absent in the Dumbo Octopus.
:::