# HW3 Programming: CNNs
:::info
Assignment due **March 14th at 10 pm EST** on Gradescope
:::
## Assignment Overview
In this assignment, you will be building a **Multi-Layer Perceptron (MLP)** and **Convolutional Neural Network (CNN)** with pooling layers using the CIFAR dataset to learn to distinguish cats and dogs (among other things). *Please read this handout in its entirety before beginning the assignment.*
## Getting started
### Theme

Still lost in space, Bruno has encountered two alien species: the Conniving Space Felines (Spacis Felis) and the Noble Galactic Canines (Galactus Canis). The dogs would like to help Bruno find the way home, while the cats would like to scrap Bruno's craft for parts! The trouble is, Bruno can't tell them apart! Your job is to create a model that helps Bruno correctly distinguish between the cats and dogs at least 70% of the time. Good luck!
*Editor's note: I'm sorry the dog looks like that, it's just what the chatbot wanted it to look like. Maybe AI should not have creative liberty...*
### Stencil
Please click [here](https://classroom.github.com/a/3dSJmyLG) to get the stencil code. Reference this [guide](https://hackmd.io/gGOpcqoeTx-BOvLXQWRgQg) for more information about GitHub and GitHub Classroom.
:::danger
**Do not change the stencil except where specified**. While you are welcome to write your own helper functions, changing the stencil's method signatures or removing pre-defined functions could result in incompatibility with the autograder and result in a low grade.
:::
The stencil should contain these files: `assignment.py`, `base_model.py`, `cnn.py`, `local_test.py`, `manual_convolution.py`, `mlp.py` and `preprocess.py`.
:::info
**Task 0.1 Download the data:** Click [here](https://cs.brown.edu/courses/csci1470/hw_data/hw2.zip) to download the data. When you unzip it, you'll find two files, `data/train` and `data/test`. These files contain all the data you'll need for this assignment.
:::
:::warning
**You should not submit the data to the autograder**. We keep a copy of the data on the autograder, so you don't need to upload it. To ensure you do not accidentally include the `data/` directory inside your git commit, consider creating a `.gitignore` file with `data/` inside it.
:::
### Environment
You will need to use the virtual environment that you made in Homework 1 to run code in this assignment (because it relies on `numpy` and `tensorflow`), which you can activate by using `conda activate csci1470`.
## Assignment Details
Your task is a multi-class classification problem on the CIFAR10 dataset, which you can read about [here](https://www.cs.toronto.edu/~kriz/cifar.html). While the CIFAR10 dataset has 10 possible classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck), you will build a CNN to take in an image and correctly predict among a subset of these classes. __You'll be graded on your model's predictions for the cat-dog subset.__
The assignment has two parts:
1. **Model:** Build the models. Our stencil provides a model class with several methods and hyperparameters you need to use for your network.
2. **Convolution Function:** Fill out a function that performs the convolution operator.
:::success
See Roadmap below for more information on parts 1 and 2.
:::
:::info
You should include a brief README with your model's accuracy and any known bugs!
:::
If completed correctly, the model should train and test within 15 minutes on a department machine (it takes about 5 minutes on our TAs' laptops). While you will mainly be using TensorFlow functions, the second part of the assignment requires you to write your own convolution function, which can be very computationally expensive. To counter this, we only require that you print the accuracy across the test set using your manual convolution. __You should train your model using the TensorFlow built-ins.__ On a department machine, training should take about 3 minutes and testing with your own convolution should take about 2 minutes.
# Roadmap
Below is a brief outline of some things you should do. We expect you to fill in some of the missing gaps (review lecture slides to understand the pipeline).
## Step 1. Preprocessing Data
:::danger
**⚠️WARNING⚠️:** __Please do not shuffle the data here__. You'll shuffle the data before training and testing. You should maintain the order of examples as they are loaded in or you will fail Autograder test 1.4.
:::
:::info
__Task 1.1 [preprocess.get_data pt 1]:__ Start filling in the `get_data` function in `preprocess.py`.
- We have provided you with a function `unpickle(file)` in the `preprocess.py` stencil, which unpickles an object and returns a dictionary. Do not edit it. We have also already extracted the inputs and labels from the dictionary in `get_data`, so you do not need to deal with the pickled file or the dictionary.
- You will want to limit the inputs and labels returned by `get_data` to those specified by the `classes` parameter. You will be expected to train and test on the **cat (label index 3), dog (5)** subset of the test set, so it's a good default to have in mind. For every image and its corresponding label, if the label is not in `classes`, remove the image and label from your inputs and labels arrays. There are a few different ways to do this; you might find [numpy.nonzero](https://numpy.org/doc/1.18/reference/generated/numpy.nonzero.html) useful for finding the indices of the labels you want to keep. (A sketch follows this note.)
:::
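As a hedged illustration of one way to do the filtering (assuming `inputs`, `labels`, and `classes` as described above; your stencil's variable names may differ):

```python
import numpy as np

# Keep only the examples whose label appears in `classes` (e.g. [3, 5]).
labels = np.array(labels)
keep = np.nonzero(np.isin(labels, classes))[0]  # indices of examples to keep
inputs = inputs[keep]
labels = labels[keep]
```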
:::info
__Task 1.2 [preprocess.get_data pt 2]:__ Continue filling in the `get_data` function in `preprocess.py`.
- At this point, your inputs are still two dimensional. You will want to reshape your inputs so that the final inputs you return have shape (num_examples, 32, 32, 3), where the width is 32, height is 32, and number of channels is 3. `tf.reshape` and `tf.transpose` will be helpful here (see the sketch below).
- You should normalize the input pixel values so that they range from 0 to 1 to avoid any numerical overflow issues. This can be done by dividing each pixel value by 255.
:::
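For reference, a minimal sketch of the reshape and normalization, assuming `inputs` is the (num_examples, 3072) array described in the data-format note later in this step:

```python
import tensorflow as tf

inputs = tf.reshape(inputs, (-1, 3, 32, 32))      # (N, channels, height, width)
inputs = tf.transpose(inputs, perm=[0, 2, 3, 1])  # -> (N, 32, 32, 3)
inputs = tf.cast(inputs, tf.float32) / 255.0      # pixel values now in [0, 1]
```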
:::info
__Task 1.3 [preprocess.get_data pt 3]:__ Finish the `get_data` function in `preprocess.py`.
- You will want to re-number the labels such that the lowest label -> 0 and the highest label -> `num_classes - 1`, filling in the intermediate values likewise. (In general, it doesn't matter how this is done; however, we require this specific mapping for our autograder.) You might find [`numpy.where`](https://numpy.org/doc/stable/reference/generated/numpy.where.html) useful in the renumbering process.
- After doing that, you will want to turn your labels into one-hot vectors, where the index with a 1 represents the class of the image. You can do this with the function `tf.one_hot`.
- This can be a bit confusing, so we'll make it clear: your final labels should be of size (num_images, num_classes). For example, if you have 2 classes, cat and dog, the corresponding label of a dog image might be [0, 1], where a 1 in the second index means that it's a dog.
:::
:::success
Note: If you use `tf.one_hot`, you will need to shift your labels so that the class indices run from 0 to `len(classes) - 1`. For example, if you are using classes [3,6,8], change the labels to [0,1,2] respectively before using `tf.one_hot` (see the sketch below).
:::
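A minimal sketch of the renumbering and one-hot steps, assuming `labels` is a 1-D NumPy array of original class indices and `classes` is the list passed to `get_data`:

```python
import numpy as np
import tensorflow as tf

# Remap e.g. classes [3, 5] to [0, 1]. Iterating in sorted order means a
# later old label can never collide with an already-assigned new label.
for new_label, old_label in enumerate(sorted(classes)):
    labels = np.where(labels == old_label, new_label, labels)

labels = tf.one_hot(labels, depth=len(classes))  # shape: (num_images, num_classes)
```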
:::danger
**⚠️WARNING⚠️:** In the `main` function in `assignment.py`, we give you `AUTOGRADER_TRAIN_FILE` and `AUTOGRADER_TEST_FILE` variables, which are the file paths that must be used for it to work with the autograder. You might need to define separate filepaths to run the code locally (especially if you are on Windows). When you submit your code to Gradescope, you **MUST** call `get_data` using the autograder filepaths we have provided in the stencil (or filepaths identical to the ones we have provided).
:::
:::success
**Note:** If you download the dataset from online, the training data is actually divided into batches. We have repickled all of the batches into one single train file for your convenience.
:::
:::success
**Note:** You're going to be calling `get_data` on both the training and testing data files in `assignment.py`. The testing and training data files to be read in are in the following format:
- `train`: A pickled object of 50,000 train images and labels. This includes images and labels of all 10 classes. After unpickling the file, the dictionary will have the following elements:
    - `data` -- a 50000x3072 numpy array of uint8s. Each row of the array stores a 32x32 color image. The first 1024 entries contain the red channel values, the next 1024 the green, and the final 1024 the blue. The image is stored in row-major order, so that the first 32 entries of the array are the red channel values of the first row of the image.
    - `labels` -- a list of 50000 numbers in the range 0-9. The number at index `i` indicates the label of the `i`-th image in the array `data`.
- `test`: A pickled object of 10,000 test images and labels. This includes images and labels of all 10 classes. Unpickling the file gives a dictionary with the same key values as above.
:::
:::info
__Task 1.4 [assignment.main pt 1]:__ Load in both your training and testing data using `get_data`. Print out the shapes, values, etc., and once you are happy, feel free to submit what you have so far to the autograder to check your score on the preprocessing tests.
:::
Throughout this assignment, we recommend building `assignment.py` as you go so that you can test your implementations as you write them, rather than all at once. Now is a great time to start filling out `assignment.main` while testing your `get_data` at the same time.
## Step 2. Create your MLP model
Time to start modelling with Tensorflow! Go to the `mlp.py` file and take a glance at the stencil. You'll notice that we have a constructor function (`__init__`) and a `call` function. In the constructor, we want to build up everything necessary to have a working deep learning model. In the `call` function, we want to fill out how the model should use its instance variables to convert an input to an output.
:::info
__Task 2.1 [mlp.MLP.__init__]:__ Finish filling out `MLP.__init__`.
While creating your models, you're going to be working with TensorFlow's `keras` library! Take a glance at some documentation if you need help getting started, though this should be familiar from Mini-Project 1:
- [Tensorflow Dense Layer](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense)
- [Tensorflow Reshape](https://www.tensorflow.org/api_docs/python/tf/reshape)
- You should initialize all hyperparameters within the constructor. We've given you some default values
- Make instances of your model's Dense layers here too. Keep in mind what dimensions you need to get the right predictions, considering the shape of your labels.
- We also recommend starting with just one layer and then adding more intermediate layers once you get that running.
- Your last layer should return a probability distribution over all classes, i.e. use a softmax activation.
:::
:::info
__Task 2.2 [mlp.MLP.call]:__ Fill out `MLP.call`
- First, flatten your input images. You should end up with `num_inputs` vectors.
- Call your dense layers!
- We expect both the MLP and CNN to output a distribution over all classes for each image in the input matrix. In other words, we expect your models to have an output shape of `[batch_size, num_classes]`. (A combined sketch of Tasks 2.1 and 2.2 follows the warning below.)
:::
:::danger
**Warning:** Be sure that your MLP **flattens the images in `call`**, since the MLP will receive image data as input (i.e., input_shape = (batch_size, 32, 32, 3) for this assignment). This may be counter-intuitive at first, but it's a very small design decision that makes our lives easier down the line.
:::
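To make the shape bookkeeping concrete, here is a minimal MLP sketch covering Tasks 2.1 and 2.2. The layer sizes and the constructor signature are illustrative assumptions, not the stencil's required values:

```python
import tensorflow as tf
from base_model import CifarModel

class MLP(CifarModel):
    def __init__(self, num_classes=2):
        super().__init__()
        self.batch_size = 64  # example hyperparameter; tune as you like
        self.dense1 = tf.keras.layers.Dense(128, activation="relu")
        # Softmax so the output is a probability distribution over classes.
        self.dense2 = tf.keras.layers.Dense(num_classes, activation="softmax")

    def call(self, inputs):
        # Flatten (batch_size, 32, 32, 3) -> (batch_size, 3072); -1 lets
        # any batch size through, which the autograder relies on.
        flat = tf.reshape(inputs, (-1, 32 * 32 * 3))
        return self.dense2(self.dense1(flat))
```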
:::info
__Task 2.3 [base_model.CifarModel.loss]:__ Given the logits and labels, compute and return the mean loss in `CifarModel.loss`
You might've noticed that the `MLP` inherits from the `CifarModel` class. Fill in the loss and accuracy functions in `base_model.py`
- Use the average cross-entropy value on the outputs (which should be a probability distribution like in HW2) compared to the labels as your loss. We suggest using `tf.keras.losses.CategoricalCrossentropy`
:::
:::success
If you use `tf.keras.losses.CategoricalCrossentropy`, you'll need to initialize it every time you run the loss function, then immediately call it.
If that style choice bothers you, it bothers us too. But TensorFlow doesn't have a functional CategoricalCrossentropy that takes in normalized outputs, and we thought it'd be too confusing to specify unnormalized outputs when we've talked about how much using softmax at the end helps. (A sketch follows this note.)
:::
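A sketch of the loss method under those constraints (this would live inside `base_model.CifarModel`; `logits` here are your model's softmax outputs):

```python
import tensorflow as tf

def loss(self, logits, labels):
    # Initialize the loss object, then immediately call it. The default
    # reduction averages the cross-entropy over the batch.
    loss_fn = tf.keras.losses.CategoricalCrossentropy()
    return loss_fn(labels, logits)
```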
:::info
__Task 2.4 [base_model.CifarModel.accuracy]:__ Given the logits and labels, compute then return the accuracy in `CifarModel.accuracy`
- To find your accuracy, first find, for each input image, the predicted most-likely class. You might find [`tf.argmax`](https://www.tensorflow.org/api_docs/python/tf/math/argmax) helpful. Then, compute the fraction of predictions that are correct. You might find `tf.equal` and `tf.reduce_mean` useful for this task. Also, if you find yourself needing to use `tf.cast()`, please make sure to set the `dtype` to be `tf.float32`. (A sketch follows this note.)
:::
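And a matching sketch of the accuracy method:

```python
import tensorflow as tf

def accuracy(self, logits, labels):
    predicted = tf.argmax(logits, axis=1)  # most likely class per image
    actual = tf.argmax(labels, axis=1)     # true class from the one-hot labels
    correct = tf.cast(tf.equal(predicted, actual), dtype=tf.float32)
    return tf.reduce_mean(correct)         # fraction of correct predictions
```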
Now, all that's left to do with your MLP is run it!
:::info
__Task 2.5 [assignment.main pt 2]:__ Initialize your MLP model in the main function of `assignment.py` to ensure nothing breaks. If you'd like, you can further sanity-check your MLP by running a batch of data through the forward pass and confirming the output shape is what you expect.
You should also initialize your optimizer here. We recommend using an Adam Optimizer with a learning rate of 1e-3, but feel free to experiment with other optimizers.
:::
## Step 3. Train and test
In the `main` function, you will want to get your train and test data, initialize your model, and train it for many epochs. We suggest training for 10 epochs. We have provided train and test methods for you to fill out. The train method will take in the model and do the forward and backward pass for a SINGLE epoch. Iterate until either your test accuracy is sufficiently high or you have reached the max number of epochs (you can set this to whatever you'd like, with a hard cap at 25). For reference, we are able to reach good accuracy in no more than 10 epochs.
:::info
__Task 3.1 [train]:__ Go ahead and write the train function in `assignment.py`.
Even though this is technically part of preprocessing, you should shuffle your inputs and labels when TRAINING. Keep in mind that they have to be shuffled in the same order. You may find `tf.random.shuffle` and `tf.gather(train_inputs, indices)` of use.
- Make sure you've reshaped inputs in preprocessing into shape (batch_size, width, height, in_channels) before calling `model.call()`. When training, you may find that calling `tf.image.random_flip_left_right` on your batch of image inputs increases accuracy. Do not call this when testing. (A sketch follows this note.)
:::
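A hedged sketch of the shuffling, augmentation, and gradient steps inside train. `optimizer` is assumed to be the Adam optimizer from Task 2.5 and in scope (a global or an extra parameter, depending on your stencil); variable names are illustrative:

```python
import tensorflow as tf

def train(model, train_inputs, train_labels):
    # Shuffle inputs and labels with the SAME permutation.
    indices = tf.random.shuffle(tf.range(tf.shape(train_inputs)[0]))
    train_inputs = tf.gather(train_inputs, indices)
    train_labels = tf.gather(train_labels, indices)

    for start in range(0, len(train_labels), model.batch_size):
        batch_inputs = train_inputs[start:start + model.batch_size]
        batch_labels = train_labels[start:start + model.batch_size]
        batch_inputs = tf.image.random_flip_left_right(batch_inputs)  # training only!
        with tf.GradientTape() as tape:
            logits = model.call(batch_inputs)
            loss = model.loss(logits, batch_labels)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
```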
If you'd like, you can calculate the train accuracy to check that your model does not overfit the training set. If you get upwards of 80% accuracy on the training set but only 65% accuracy on the testing set, you might be overfitting.
:::info
__Task 3.2 [test]:__ Write the test function in `assignment.py`.
- The test function will take in the same model, now with trained parameters, and return the accuracy given the test data and test labels. The test function will be very similar to the train function, except without the `GradientTape`. (A sketch follows the warning below.)
:::
:::danger
**⚠️WARNING⚠️:**
When testing __you should NOT randomly flip images or do any extra preprocessing.__
:::
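A matching sketch of test: the same batching as train, but no shuffling, no flipping, and no `GradientTape`:

```python
def test(model, test_inputs, test_labels):
    accuracies = []
    for start in range(0, len(test_labels), model.batch_size):
        batch_inputs = test_inputs[start:start + model.batch_size]
        batch_labels = test_labels[start:start + model.batch_size]
        logits = model.call(batch_inputs)  # for the CNN, pass is_testing=True (Step 6)
        accuracies.append(model.accuracy(logits, batch_labels))
    # If the last batch is smaller, a size-weighted average is slightly
    # more exact than this simple mean.
    return sum(accuracies) / len(accuracies)
```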
:::info
__Task 3.3 [assignment.main pt 3]:__ Now try training your MLP Model!
:::
:::info
__Task 3.4 [assignment.main pt 4]:__ Once you have confirmed that training the model doesn't break, add in testing so you can see how the model does when it counts! We are looking for > 60% accuracy with an MLP model, which you should be able to reach without much trouble and with a relatively small model.
:::
### Improving your MLP
You might notice that your MLP doesn't perform so well. While there's only so much you can do for an MLP, two things to try are activation functions and dropout.
You might've noticed that Dense layers take an `activation` argument. You can either pass an instance of the activation you want to use directly, or you can just pass in a string, like `"sigmoid"`, `"relu"`, or `"leaky_relu"`! Try passing these in to your Dense layers, except the last one (which should keep its softmax so your model still outputs a distribution)!
Dropout can also help improve performance at testing time. `tf.nn.dropout` is a function which, during training, sets random entries in its input to 0. This way, the model is forced to make a prediction without certain input features; if the model was overfitting on individual features, dropout works to prevent this. A short example follows.
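For instance, a hedged snippet (layer names are illustrative, not stencil requirements) using the keras layer form of dropout:

```python
import tensorflow as tf

# In __init__: a hidden Dense layer with a built-in activation, plus a
# Dropout layer (the layer form of tf.nn.dropout).
self.dense1 = tf.keras.layers.Dense(128, activation="relu")
self.dropout = tf.keras.layers.Dropout(rate=0.3)  # zeroes 30% of entries

# In call: Dropout only activates when called with training=True.
# out = self.dropout(self.dense1(flat), training=True)
```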
## Step 4. Create your CNN model
Time for your second model! This time, we'll be making a convolutional model to get even better results.
You might find this useful: [Tensorflow Conv2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D)
:::info
__Task 4.1 [CNN.__init__]__:
Go fill out the `__init__` function for the `CNN`.
- Again, you should initialize all hyperparameters within the constructor.
- Additionally, make all of your Convolutional and Dense layers here!
- You may use any permutation and number of convolution, pooling, and dense layers, as long as you use at least one convolution layer with strides of ``[1, 1, 1, 1]``, one pooling layer, and one fully connected layer.
:::
:::success
If you are having trouble getting started with model architecture, we have provided an example below (a keras-layer sketch follows this note):
- 1st Convolution Layer `[tf.keras.layers.Conv2D]` + Bias, Batch Normalization `[tf.nn.batch_normalization]`, ReLU `[tf.nn.relu]`, Max Pooling `[tf.nn.max_pool]`
- 2nd Convolution Layer + Bias, Batch Normalization, ReLU, Max Pooling
- 3rd Convolution Layer + Bias, Batch Normalization, ReLU
- Remember to reshape the output of the previous convolutional layer to make it compatible with the dense layers.
- 1st Dense Layer + Bias, Dropout `[tf.nn.dropout]`
- 2nd Dense Layer + Bias, Dropout
- Final Dense Layer
:::
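Here is a hedged `__init__` sketch loosely following that architecture using keras layers. Filter counts and sizes are illustrative, `num_classes` is an assumed constructor argument, and `tf.keras.layers.BatchNormalization` would be the layer form of the batch normalization mentioned above:

```python
import tensorflow as tf

# Inside CNN.__init__ (sketch):
self.conv1 = tf.keras.layers.Conv2D(16, 3, strides=(1, 1), padding="same", activation="relu")
self.pool1 = tf.keras.layers.MaxPool2D(pool_size=2)
self.conv2 = tf.keras.layers.Conv2D(32, 3, strides=(1, 1), padding="same", activation="relu")
self.pool2 = tf.keras.layers.MaxPool2D(pool_size=2)
self.flatten = tf.keras.layers.Flatten()  # reshape conv output for the dense layers
self.dense1 = tf.keras.layers.Dense(64, activation="relu")
self.out = tf.keras.layers.Dense(num_classes, activation="softmax")
```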
:::info
__Task 4.2 [CNN.call]:__ Fill out the call function using the trainable variables you've created. Your call function should return the logits. The parameter `is_testing` will be used later; don't worry about it when implementing this part.
:::
:::info
__Task 4.3 [Train and Test CNN]:__ Go ahead and train and test your CNN model as you did with your MLP model. You'll be tested on the cat-dog test subset once you indicate that your model is ready to be tested by submitting a FINAL.txt file.
:::
## Step 5. Creating your own `conv2d`
:::warning
Before starting this part of the assignment, you should ensure that you have an accuracy of **at least 70%** on the test set using only TensorFlow functions for the problem of classifying cats and dogs.
:::
:::success
You will be implementing your very own convolution function!
For the sake of simple math calculations (less is more, no?), we'll require that your `ManualConv2d` **only works with a stride of 1** (for both width and height). This is because the calculation for padding size changes as a result of the stride.
:::
:::danger
Do **NOT** change the parameters of the `ManualConv2d` class we have provided. Even though `ManualConv2d` takes in a strides argument, you should ALWAYS pass in [1, 1, 1, 1]. Leaving strides in as an argument was a conscious design choice: if you ever want to make `ManualConv2d` work for other strides in your own time, this allows you to change it easily.
:::
:::info
__Task [ManualConv2d]:__ Implement your very own Conv2d! Here are some specifics and hints for you:
- __[Inputs]__ Your inputs will have 4 dimensions. If we are to use this conv2d function for the first layer, the inputs would be [batch_size, in_height, in_width, input_channels].
- __[Filter Initialization]__ You should ensure that the input's number of "in channels" is equivalent to the filters' number of "in channels". It's good practice to add an assert statement or throw an error if the input's channel count does not match the filters'.
- __[Padding]__ When calculating how much padding to use for SAME padding, we want to have a total of `(filter_size - 1)` padding pixels for each padded dimension (width and height) if you are using strides of 1.
- For instance, if we have `filter_size=3`, then we would want to pad an image's width and height by 1 pixel on both sides (*i.e.* there's a 1-pixel wide border around the whole image).
- If you have an even filter size, the total padding is odd; see the warning below for the convention on distributing the extra pixel.
- The calculation of padding differs if you increase your strides and is much more complex, so we won’t be dealing with that. If you are interested in calculating padding, you can read about it [here](https://cs231n.github.io/convolutional-networks/).
- __[Algorithm Hints]__ After padding (if needed), you will want to go through the entire batch of images and perform the convolution operator on each image. There are two ways of going about this: you can continuously append multidimensional NumPy arrays to an output array, or you can create a NumPy array with the correct output dimensions and update each element as you perform the convolution operator. We suggest the latter; it's conceptually easier to keep track of things this way.
- __[Algorithm Hints]__ You will want to iterate over the entire height and width, including padding, stopping when you can no longer fit a filter over the remaining padded input. For convolution with many input channels, you will want to perform the convolution per input channel and sum those dot products together.
- __[Outputs]__ Your output height is equal to `(in_height + total_padY - filter_height) / strideY + 1` and your output width is equal to `(in_width + total_padX - filter_width) / strideX + 1`. Again, `strideX` and `strideY` will always be 1 for this assignment. For instance, with `in_height = 32`, `filter_height = 3`, and total padding of 2, the output height is (32 + 2 - 3) / 1 + 1 = 32, as expected for SAME padding. Refer to the CNN slides if you'd like to understand this derivation.
- __[Outputs]__ PLEASE CONVERT YOUR RESULT TO A TENSOR USING `tf.convert_to_tensor(your_array, dtype=tf.float32)`. Issues have occurred in the past without this step.
- __[All Around]__ You can (and should) use TensorFlow functions (`tf.reduce_sum`, `tf.multiply`, etc.) to build your forward pass. This will ensure you are able to train your model using this layer. (A sketch appears after the hints below.)
:::
:::danger
Writing ManualConv2d has given students grief in the past. We recommend taking your time here; jumping right into coding is likely to confuse you even more.
:::
:::warning
**IMPORTANT** When padding an odd amount, left-right or top-bottom, you should allocate less padding to the left and top than to the right and bottom. For example, if you need 3 padding columns, add 1 column to the left and 2 to the right. This is a convention that __will__ be enforced by the autograder.
:::
:::success
Hopefully Helpful Hints:
1. In the past, many students have found success first thinking about how to write ManualConv2d with 4 for-loops; with some small tweaks, you can then get it down to 2 for-loops.
2. Don't be scared to use broadcasting by expanding the dimensions of the input or filters so that your computation is easier.
:::
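Putting the hints together, here is a hedged NumPy sketch of the core logic. The standalone-function form and the names are illustrative; the stencil wraps logic like this inside the `ManualConv2d` class. Per the hint above, swapping the NumPy multiply/sum for `tf.multiply`/`tf.reduce_sum` would keep the layer trainable:

```python
import numpy as np
import tensorflow as tf

def manual_conv2d(inputs, filters, strides=(1, 1, 1, 1), padding="SAME"):
    # Assumes strides of 1 and SAME padding, per the assignment spec.
    num_examples, in_height, in_width, in_channels = inputs.shape
    filter_height, filter_width, filter_in, out_channels = filters.shape
    assert in_channels == filter_in, "input/filter channel mismatch"

    # (filter_size - 1) total padding pixels per dimension, with the
    # smaller share on the top/left (the autograder's convention).
    pad_top = (filter_height - 1) // 2
    pad_bottom = filter_height - 1 - pad_top
    pad_left = (filter_width - 1) // 2
    pad_right = filter_width - 1 - pad_left
    padded = np.pad(np.asarray(inputs),
                    ((0, 0), (pad_top, pad_bottom), (pad_left, pad_right), (0, 0)))

    # Output dims from the formula in the task above, with stride 1.
    out_height = in_height + pad_top + pad_bottom - filter_height + 1
    out_width = in_width + pad_left + pad_right - filter_width + 1
    output = np.zeros((num_examples, out_height, out_width, out_channels), dtype=np.float32)
    filters = np.asarray(filters)

    # Two loops over output positions; batch and channels are broadcast.
    for y in range(out_height):
        for x in range(out_width):
            # window: (batch, fh, fw, in_c). Add an out-channel axis so it
            # broadcasts against filters: (fh, fw, in_c, out_c).
            window = padded[:, y:y + filter_height, x:x + filter_width, :]
            products = window[..., np.newaxis] * filters[np.newaxis, ...]
            output[:, y, x, :] = products.sum(axis=(1, 2, 3))  # sum over h, w, in_c
    return tf.convert_to_tensor(output, dtype=tf.float32)
```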
## Step 6. Testing your own `conv2d`
:::info
__Task [Testing your conv]:__ We have provided a few tests in `local_test.py` that compare the result of your very own `conv2d` against TensorFlow's `conv2d`. If you've implemented it correctly, the results should be very similar.
These tests are a subset of the autograder tests, so if you are passing them locally you should be passing at least a couple of the autograder tests.
:::
:::warning
In your model, you should set `is_testing` to True when testing, then make sure that if `is_testing` is True, you use your own convolution rather than TensorFlow's `conv2d` on a SINGLE convolution layer. This part will take the longest, and is why we say it might actually take up to 15 minutes on a local machine.
:::
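The branch might look like this inside `CNN.call` (layer names are hypothetical; the point is the single `is_testing` branch on one convolution layer):

```python
def call(self, inputs, is_testing=False):
    out = self.pool1(self.conv1(inputs))
    if is_testing:
        # Your own convolution, for ONE layer only; strides stay [1, 1, 1, 1].
        out = self.manual_conv2(out)
    else:
        out = self.conv2(out)
    out = self.pool2(out)
    return self.out(self.dense1(self.flatten(out)))
```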
## Visualizing Results
We have written two methods for you to visualize your results. The created visuals will not be graded and are entirely for your benefit. You can use them to check out your doggos and kittens.
- We've provided the `visualize_results(image_inputs, logits, image_labels, first_label, second_label)` method for you to visualize your predictions against the true labels using matplotlib, a useful Python library for plotting graphs. This method is currently written with the image_labels having a shape of (num_images, num_classes). DO NOT EDIT THIS FUNCTION. You should call this function after training and testing, passing into `visualize_results` an input of 50 images, 50 probabilities, 50 labels, the first label name, and second label name.
- Unlike the first assignment, you will need to pass in the strings of the first and second classes. A `visualize_results` method call might look like: `visualize_results(image_inputs, logits, image_labels, "cat", "dog")`.
- This should result in two visuals, one for correct predictions, and one for incorrect predictions. You should do this after you are sure you have met the benchmark for test accuracy.
- We have also provided the `visualize_loss(losses)` method for you to visualize your loss per batch over time. Your model or your training function should keep a list `loss_list` to which you can append batch losses during training. You should call this function after training and testing, passing in `loss_list`.
# Submission
## Requirements
:::warning
**You *must* use your own convolution instead of TensorFlow's for at least a single layer when `is_testing` is `True`.**
:::
:::success
**Important:** As in BERAS, the autograder will not train and test your model until you __submit a blank FINAL.txt__ file. You should only submit this file once you are passing all non-accuracy related tests, unless you want the autograder to take longer than it needs to!
:::
Our autograder will import your model and your preprocessing functions. We will feed the result of your `get_data` function, called on a path to our data, to your train method in order to produce a fully trained model. After this, we will feed your trained model, alongside the TA pre-processed data, to our custom test function. This will batch the testing data using YOUR batch size and run it through your model's `call` function. However, we will test that your model can handle any batch size, meaning that you should not hardcode `self.batch_size` in your `call` function. The returned logits will then be fed through an accuracy function. Additionally, we will test your conv2d function. In order to ensure you don't lose points, make sure that you:
A) correctly return training inputs and labels from `get_data`,
B) ensure that your model's `call` function returns logits from the inputs specified and does not break on different batch sizes when testing, and
C) do not rely on any packages outside of tensorflow, numpy, matplotlib, or the Python standard library.
In addition, remember to include a brief README with your model's accuracy and any known bugs.
## Grading
Code: You will be primarily graded on functionality. Your model should reach an accuracy of at least 70% on the cat-dog testing subset with your CNN model and at least 60% with your MLP model.
## Handing In
You should submit the assignment via Gradescope under the corresponding assignment, either by dropping all your files into Gradescope or by submitting through GitHub. To submit through GitHub, commit and push all changes to your repository. You can do this by running the following three commands ([this](https://github.com/git-guides/#how-to-use-git) is a good resource for learning more about them):
1. `git add file1 file2 file3`
    - Alternatively, `git add -A` will stage all changed files for you.
2. `git commit -m "commit message"`
3. `git push`
After committing and pushing your changes to your repo (which you can check online if you're unsure if it worked), you can now just upload the repo to Gradescope! If you’re testing out code on multiple branches, you have the option to pick whichever one you want.

If you wish to submit via zip file, make sure any data folders are not included, as they may be too big for the autograder to handle.
# Conclusion
Congrats on finishing your CNN homework! Bruno was able to successfully reach out to the Space Canines who provided a path to Bruno's home world! Bruno is finally on track to make it back home by 05/01/2025, so long as nothing else goes wrong...