---
tags: hw2, handout
---

# HW2 Programming: Beras

:::info
Conceptual questions due **Friday, February 24th, 2023 at 6:00 PM EST**

Programming assignment due **Monday, February 27th, 2023 at 6:00 PM EST**
:::

This homework is intended to give you an introduction to building, training, and testing neural network models. You will not only be exposed to using Python packages to build a neural network from scratch, but also to the mathematical aspects of backpropagation and gradient descent. While in practical scenarios you won’t necessarily have to implement neural networks from scratch (as you will see in future labs and assignments), this assignment aims to give you a rudimentary idea of what goes on under the hood in packages such as TensorFlow and Keras. In this assignment, you will use the MNIST handwritten digits dataset to train a simple classification neural network using batch learning and evaluate your model.

## Theme

![](https://media4.giphy.com/media/Y4K9JjSigTV1FkgiNE/giphy.gif)
*TensorFlow is named after the 'flowing' motion of sea anemones (or so we'd like to believe)*

# Getting started

## Stencil

Please click <ins>[here](https://classroom.github.com/a/ynCa9L6d)</ins> to get the stencil code. Reference this <ins>[guide](https://hackmd.io/gGOpcqoeTx-BOvLXQWRgQg)</ins> for more information about GitHub and GitHub Classroom.

:::danger
**Do not change the stencil except where specified**. While you are welcome to write your own helper functions, changing the stencil's method signatures or removing pre-defined functions could result in incompatibility with the autograder and result in a low grade.
:::

## Environment

You will need to use the virtual environment that you made in Homework 0 to run code in this assignment (because it relies on numpy and tensorflow), which you can activate by using `conda activate <your environment name>`.

# Assignment Overview

In this assignment, you will be constructing a Keras mimic, Beras (haha funny name), and will make a sequential model specification that mimics the TensorFlow/Keras API. The Python notebook associated with this homework is meant for you to explore an example implementation so that you can build on it yourself! There are no TODOs for you to work on in the notebook; rather, the testing is done by running the main method of `assignment.py`.

Our stencil provides a model class with several methods and hyperparameters you need to use for your network. You will also answer conceptual questions related to the assignment and class material (**don’t forget to answer the 2470-only questions if you’re a 2470 student!**). You should include a brief README with your model's accuracy and any known bugs.

## Before Starting

This homework assignment is due two weeks from release. Labs 1-3 and HW1 provide great practice for this assignment. Specifically:

1. **Implementing Callable/Diffable Components:** The skills you need in order to do this can be found by working through *Lab 1* and *Homework 1*. This includes comfort with mathematical notation, matrix operations, and the logic behind call and gradient methods.
2. **Implementing Optimizers:** You can implement the `BasicOptimizer` class just by following the logic from the `gradient_descent` method in *Lab 1: Intro to ML*. The other optimizers (i.e. Adam, RMSProp) are covered in *Lab 2: Optimizers*.
3. **Using batch_step and GradientTape:** You can figure out how to use these to train your model based on the assignment instructions and your own implementations of them. With that said, they do mimic the Keras API. You’ll learn about all this in *Lab 3: Intro to Tensorflow*. If your lab is after the due date, that should be fine; just skim over the accompanying notebook associated with the lab.

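As a refresher on the gradient descent logic referenced in the list above, here is a minimal NumPy sketch on a toy objective. It is not the Lab 1 code or the stencil's `BasicOptimizer`; the function name `gradient_descent`, its parameters, and the quadratic objective are all assumptions made for illustration.

```python
import numpy as np

def gradient_descent(grad_fn, w0, learning_rate=0.1, steps=100):
    """Hypothetical helper: repeatedly step against the gradient, starting from w0."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        # Move a small distance in the direction that decreases the objective.
        w = w - learning_rate * grad_fn(w)
    return w

# Toy objective f(w) = ||w - target||^2, whose gradient is 2 * (w - target).
target = np.array([3.0, -1.0])
w_final = gradient_descent(lambda w: 2.0 * (w - target), w0=[0.0, 0.0])
print(w_final)  # ends up very close to target
```

The same update rule, applied to each trainable weight with the gradients from a tape, is what a basic optimizer step amounts to.
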
Feel free to start off by doing what you can and then add onto it as you learn more about deep learning and realize that the same concepts you learn in the class can actually be used here!

:::warning
**Note:** Historically, this assignment has been **challenging** for students. We have tried to make changes to improve the student experience, but the best thing you can do for yourself is to **start early**, **ask for help** when needed, and **read function headers!**

You will find two Jupyter notebooks in the stencil, Intro and Test. Intro contains some information on what the functions you will code will look like as well as some info on the structure of the assignment overall, and Test contains unit tests for a couple of the functions you will write. While neither is required, both will save you a lot of time if you take advantage of them!
:::

# Roadmap

For this assignment, we'll walk you through the pipeline of training a neural net, including the structure of the model class and the methods you will have to fill in.

<!--
### 1. Preprocessing the Data

Before training a network, you will need to clean your data. This includes retrieving, altering, and formatting the data into inputs for your network. For this assignment, you will be working with the **MNIST** dataset (images of digits 0-9). Your network should be training using **_only the training data_**, and then tested for accuracy on **_only the testing data_** (train/test split). You can use your accuracy on the test dataset after training has completed as a metric of how accurate your model is.

:::info
**Task 1:** Fill in the get_data_MNIST function in preprocess.py. Use numpy operations for this!
:::
-->

## 1. Preprocessing Data

For this assignment, you will be working with the **MNIST** dataset (images of digits 0-9). Before you can start building the Beras model, you need to load the training and testing data, flatten and normalize the input images, and finally convert all the data to `Beras.Tensor`.
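
The flatten-and-normalize step described above can be sketched on synthetic data as follows. This is only an illustration: the array of random "images" stands in for the data the stencil loads, and scaling pixel values into [0, 1] by dividing by 255 is an assumption about what "normalize" means here, so check the stencil's function header for the expected range and for the final conversion to `Beras.Tensor`.

```python
import numpy as np

# Synthetic stand-in for MNIST: 5 grayscale 28x28 images with integer
# pixel values in [0, 255]. (The real data is loaded by the stencil.)
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each image into a single row of 28 * 28 = 784 values.
flat = images.reshape(images.shape[0], -1)

# Assumed normalization: scale pixel values into [0, 1] as floats.
normalized = flat.astype(np.float32) / 255.0

print(normalized.shape)  # (5, 784)
```
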
Loading the data will be done for you in the stencil code, but you'll need to take it from there!

:::info
**Task 1:** Fill in the `load_and_preprocess_data` function in `code/preprocess.py`
:::

:::success
**Hint:** Numpy documentation is your friend! As you'll discover in this course, there is often a numpy method for what you need.
:::

## 2. One Hot Encoding

Before training or testing your model, you will need to “one-hot” encode your class labels so that the model can optimize towards predicting any desired class. Note that the class labels by themselves are simply categories and do not mean anything numerically. In the absence of one-hot encoding, your model might learn a spurious ordering between the class labels based on their numeric values (which are arbitrary).

:::success
**Motivation:** This is reflective of a common idea: *categorical data* does not have any pre-determined ordering, so assigning them numbers on a scale would implicitly create _ordering_ and _relations_ between different classes that we want to avoid.

**Example:** Let’s say there’s a data point $A$ which corresponds to label ‘$2$’ and a data point $B$ which corresponds to label ‘$7$’. **We don’t** want the model to somehow learn that $B$ has a higher weight than $A$ simply because, numerically speaking, $7 > 2$.
:::

To one-hot encode your class labels, you will have to convert each entry of your $1$-dimensional label vector into a vector of size `num_classes` (where `num_classes` is the total number of classes in your dataset). For the MNIST dataset, it looks something like the matrix below:

![](https://i.imgur.com/dHkYPIu.png)

For this assignment, the code for one-hot encoding your data can be found in `Beras/onehot.py`. Now, implement the following functions:

:::info
**Task 2:** Implement `fit()` in `Beras/onehot.py` and one-hot encode the training and testing labels in `assignment.py`.

Then, you should implement the `call()` function.
In this function, we pass a vector of all the actual labels in the training set and call `fit()` to populate the `uniq2oh` dictionary with unique labels and their corresponding one-hot encoding and then use it to return an array of one-hot encoded labels for each label in the training set.

Then, implement the `inverse()` function. In this function, we reverse the one-hot encoding back to the actual label, using the dictionary.
:::

:::success
**Hint:** You might want to look at the intro notebook for some inspiration :wink:.
:::

For example, if we have labels X and Y with one-hot encodings of [1,0] and [0,1], we’d want to create a dictionary as follows: {X: [1,0], Y: [0,1]}. As shown in the image above, for MNIST, you will have 10 labels, so your dictionary should have ten entries!

You may notice that some classes inherit from Callable or Diffable. More on this in the next section!

## 3. Core Abstractions

Consider the following abstract classes of modules. Be sure to play around with the Python notebook associated with this homework to get a good grip of the core abstraction modules defined for you in `Beras/core.py`! The HW2_Beras_intro notebook is exploratory in nature; it is **NOT** required and all of the code is given. However, it **will** provide you with lots of insights into understanding and using these class abstractions! Note that these modules are very similar to the TensorFlow/Keras API.

**Callable:** A function with a well-defined forward function. These are the ones you'll need to implement:

1. CategoricalAccuracy (found in `Beras/metrics.py`): Computes the accuracy of predicted probabilities against a list of ground-truth labels. As accuracy is not optimized for, there is no need to compute its gradient. Furthermore, categorical accuracy is piecewise discontinuous, so the gradient would technically be 0 or undefined.
2. OneHotEncoder (found in `Beras/onehot.py`): You can one-hot encode a class instance into a probability distribution to optimize for classifying into discrete options (as discussed in Step 2).

**Diffable**: A _callable_ which is also _differentiable_. We use these to represent _differentiable_ layers which we can compute _gradients_ for:

1. Dense (found in `Beras/layers.py`): A "linear layer" or "fully-connected layer".
    - **GIVEN** (aside from some weight initialization options)
2. LeakyReLU (found in `Beras/activations.py`): A leaky rectified-linear activation: when the input is negative, we lower its magnitude (more details in section 5).
3. MeanSquaredError (found in `Beras/losses.py`): Given the predicted and actual labels, compute the average of the squares of the difference between predicted and actual values. More details in section 7.
    - **GIVEN**
4. **[2470]** Softmax (found in `Beras/activations.py`): For a given vector of logits, convert it into a vector of probabilities that sum to 1. More details provided in section 5.

:::success
**Example:** Consider a **Dense** layer instance. Let _s_ represent the input size (source), let _d_ represent the output size (destination), and let _b_ represent the batch size. Then:

1. The _forward function_ is given by $D_\theta: \mathbb{R}^{b \times s} \rightarrow \mathbb{R}^{b \times d}$ with $D_\theta(x) = x \theta_w + \theta_b$, where $\theta_w$ represents the _weight term_, and $\theta_b$ represents the _bias term_.
2. The _input gradient function_ will return a transpose Jacobian $J_x^T D_\theta$, **which can be a batch**. Here, $J_x y$ is the _Jacobian matrix_ (matrix of partial derivatives) associating each output entry $y_i$ with each input entry $x_j$.

This means that:

1. `self.weights` would be a list containing the layer weights and bias (the terms $\theta_w, \theta_b$ from above), having dimensions $(s,d)$ and $(1,d)$, respectively. **Hint**: Recall how these sizes work when the computation is carried out.
2. The forward function $D_\theta$ would then just have to do the following:
    - Store the input $x$ of shape $(b,s)$
    - Compute, store, and return the output $D_\theta(x)$ of shape $(b,d)$.
    - Inputs and outputs are stored to help compute gradients.
3. Using input_gradient, compose_to_input takes an upstream gradient $\frac{\partial L}{\partial D_\theta}$ and computes a batch of $\frac{\partial L}{\partial x}$, the gradients of the loss with respect to the input (for _each entry_).
4. Using weight_gradient, compose_to_weights computes $\frac{\partial L}{\partial w}$, which is the batch-average gradient of the loss with respect to the weight.
    - **Note**: The gradient values should have the same dimensions as the weights $\theta_w, \theta_b$ (since they will be used to update the weights).
:::

**GradientTape**: This class will function exactly like `tf.GradientTape()` (**see Lab 3**). You can think of a GradientTape as a logger. Every time an operation is performed within the scope of a GradientTape, it records which operation occurred. Then, during backprop, we can compute the gradient for all of the operations by tracing back through the recorded operations. This allows us to differentiate our final output with respect to any intermediate step. When operations are computed outside the scope of GradientTape, they aren’t recorded, so your code will have no record of them and cannot compute the gradients. Of course, TensorFlow’s gradient tape implementation is a lot more complicated and involves constructing a graph.

:::info
**Task 3:** Implement the `gradient` method of the GradientTape in `Beras/core.py`, which returns a list of gradients corresponding to the list of trainable weights in the network. More details can be found in the stencil code.
:::

:::warning
**Note:** This is listed as Task 3, but you may find it easier to come back to the GradientTape implementation after you've completed other aspects of the assignment (since it isn't really required until the end, when you need to be able to train your model). This is also probably the most challenging part of the assignment! We didn't originally provide any unit tests in the Testing notebook, but you may find it helpful to use the following test as a sanity check to see if your function is working properly:

:::spoiler You can add a cell to the testing notebook and add the following:

```python
from Beras.losses import MeanSquaredError
from Beras.layers import Dense
import numpy as np
from Beras.core import GradientTape, Tensor

loss = MeanSquaredError()
dense = Dense(4, 1, "zero")
input = Tensor(np.array([[1,2,3,4]]))
with GradientTape() as tape:
    v = dense(input)
    l = loss(v, Tensor(np.array([[1]])))
grads = tape.gradient(l, dense.trainable_weights)

print("Basic test grads:", grads)
assert(np.all(grads[0] == np.array([[-2], [-4], [-6], [-8]])))
assert(np.all(grads[1] == np.array([-2])))

# This uses non-zero initialization, so make sure that you have implemented
# Weight initialization of the Dense layer before using this test!
np.random.seed(1337)
dense_1 = Dense(3, 2)
dense_2 = Dense(2, 3)
loss_2 = MeanSquaredError()
input = Tensor(np.array([[10,9,8]]))
with GradientTape() as tape_2:
    v = dense_2(dense_1(input))
    l = loss_2(v, Tensor(np.array([[1,2,3]])))

all_weights = dense_1.trainable_weights + dense_2.trainable_weights
grads = tape_2.gradient(l, all_weights)
print("Grads:", grads)
```

The first test, with a single dense layer, has an assert statement to ensure that you are returning the correct values, while the output for the second test should look something like the following:

```
Grads: [array([[-321.45094304, -839.53467465],
       [-289.30584874, -755.58120719],
       [-257.16075443, -671.62773972]]), array([[-32.1450943 , -83.95346747]]), array([[-197.91316634,  212.11095743,   22.83712901],
       [-879.886161  ,  943.00697367,  101.52974736]]), array([[ 29.29262428, -31.39400322,  -3.38006537]])]
```

:::

## 4. Layers

To avoid redundancy relating to HW1, the Dense class will be mostly provided and is located in `Beras/layers.py`. **The weight-initialization step will need to be finished**:

:::success
The following functions for the dense layer are provided for you (and are essentially the same as in Homework 1):

- **call():** Implements the forward pass and returns the outputs.
- **weight_gradients():** Calculates the gradients with respect to the weights and biases. This will be used to optimize the layer.
- **input_gradients():** Calculates the gradients with respect to the layer inputs. This will be used to propagate the gradient to previous layers.
:::

:::info
**Task 4** You should then implement the following:

- **_initialize_weight():** Initialize the dense layer's weight values. By default, initialize all the weights to zero (**usually a bad idea**). You are also required to allow for more sophisticated options by allowing for the following:
    - **Normal:** Passing `normal` causes the weights to be initialized with a unit normal distribution.
    - **Xavier Normal:** Passing `xavier` causes the weights to be initialized in the same way as `keras.GlorotNormal`.
    - **Kaiming He Normal:** Passing `kaiming` causes the weights to be initialized in the same way as `keras.HeNormal`.

:::warning
**Note:** The stencil also mentions Xavier/Kaiming Uniform initializations, but these are **not required for this assignment**.
:::

:::success
**Hint:** **Check out the Keras documentation [here](https://keras.io/api/layers/initializers/) for how these should be implemented!**
:::

:::success
**Hint:** You may find `np.random.normal` helpful while implementing these. The TODOs provide some justification for why these different initialization methods are necessary but for more detail, check out this website! Feel free to add more initializer options!
:::

## 5. Activation Functions

In this assignment, you will be implementing three major activation functions in `Beras/activations.py`: LeakyReLU, Sigmoid, and Softmax. Since ReLU is a special case of LeakyReLU, we have already provided you with the code for it.

:::info
**Task 5:** Sigmoid(): You should fill in the `call()` (computing the value Sigmoid(x)) and `input_gradients()` functions of this layer.
:::

:::info
**Task 6:** LeakyReLU(): You should fill in the `call()` (computing the value LeakyReLU(x)) and `input_gradients()` functions of this layer.
:::

:::warning
**Important:** Since these activation functions are per-element, you may also want to override the `compose_to_input` method with the following:

```python
def compose_to_input(self, J):
    return self.input_gradients()[0] * J
```

:::

:::info
**_2470 Only_ Task 1:** Fill in the `call()` and `input_gradients()` methods of Softmax.
:::

:::success
**Hint:** When computing the forward pass, _make sure you use stable softmax where you subtract max of all entries to prevent overflow/undefined issues._
:::

## 6. Filling in the Model

With these abstractions in mind, let’s create a pipeline for our sequential deep learning model.
You can find the `SequentialModel` class in `assignment.py` where you will initialize your neural network’s layers, parameters (weights and biases), and hyperparameters (optimizer, loss function, learning rate, accuracy function, etc.). The `SequentialModel` class inherits from the model class defined in `Beras/model.py`, where you’ll find many useful methods. This will also contain functions that fit the model to your data and evaluate the performance of your model:

:::success
**Given compile():** Initialize the model optimizer, loss function and accuracy function, which are fed in as arguments for the SequentialModel instance to use.
:::

:::success
**Given fit():** Trains your model to associate inputs with outputs. Training is repeated for each epoch, and the data is batched according to the batch size passed in as an argument. It also computes `batch_metrics`, `epoch_metrics`, and the aggregated `agg_metrics` that can be used to track the training progress of your model.
:::

:::info
**Task 7 evaluate():** Evaluate the performance of the final model using the metrics mentioned above during the testing phase. It's almost identical to the `fit()` function (think about what would change between training and testing).
:::

:::info
**Task 8 call():** Recall that a sequential model is a _stack_ of layers, where each layer has exactly one input vector and one output vector. You can find this function within the `SequentialModel` class in `assignment.py`.
:::

:::info
**Task 9 batch_step():** You will observe that `fit()` calls this function for _each batch_. You will first compute your model predictions for the input batch. In the training phase, you will need to compute gradients and update your weights according to the optimizer you are using. For backpropagation during training, you will use `GradientTape` from the core abstractions in `core.py` to record operations and intermediate values. You will then use the model's optimizer to apply the gradients to your model's trainable variables. Finally, compute and return the loss and accuracy for the batch.
You can find this function within the `SequentialModel` class in `assignment.py`.
:::

We *strongly encourage* you to check out keras.SequentialModel in the intro notebook (under **Exploring a possible modular implementation: TensorFlow/Keras**) and refer to **Lab 3** to get a feel for how we can work with gradient tapes in deep learning.

## 7. Loss Function

This is one of the most crucial aspects of model training. You can find your loss function in `Beras/losses.py`. To avoid redundancy, we have provided you with an MSE loss layer implementation.

:::info
**_2470 Only_ Task 2:** Fill in the `call()` and `input_gradients()` methods of CategoricalCrossentropy.
:::

:::warning
**Note:** Since there are a few valid ways to implement this, the autograder test for CategoricalCrossentropy only tests that your function executes and gives an output that *very roughly agrees* with our solutions. We will also be grading this manually by checking exactly what your output values are, so **make sure you test beyond the Gradescope test provided**.
:::

## 8. Optimizers

In the `Beras/optimizers.py` file, make sure to implement the optimization step for each of the different types of optimizers. **Lab 2 should help with this.** You have been provided with the following:

- BasicOptimizer: A simple optimizer strategy as seen in Lab 1.

:::info
**Tasks 10 - 12:** Implement the following in `Beras/optimizers.py`:

- RMSProp: Root mean square propagation.
- Adam: A common optimizer based on adaptive moment estimation.
:::

## 9. Accuracy Metrics

Finally, to evaluate the performance of your model, you need to use appropriate accuracy metrics. In this assignment, you will implement categorical accuracy in `Beras/metrics.py`:

:::info
**Task 13 call():** Return the categorical accuracy of your model given the predicted probabilities and true labels.
You should be returning the proportion of predicted labels equal to the true labels, where the predicted label for an image is the label corresponding to the _highest probability_.
:::

:::success
**Hint:** Refer to lecture slides for categorical accuracy math!
:::

## 10. Train and Test

Finally, using all the above primitives, you are required to build two models in `assignment.py`:

:::info
**GIVEN:** A *simplest* model in `get_simplest_model()` which is included for self-diagnostic purposes.
:::

:::info
**Task 14:** A simple model in `get_simple_model()` which exercises your implementations, has two layers, and uses MSE while also binding your predictions to the [0,1] space. **This one is provided for you by default, though you can change it if you'd like.** The autograder will evaluate the original one though!
:::

:::info
**2470-Only Task 3:** A slightly more complex model in `get_advanced_model()` which uses a custom crossentropy loss and binds your output space to a discrete probability distribution.
:::

For any hyperparameters you use (layer sizes, learning rate, batch size, etc), please hardcode these values in the `get_simple_model()` and `get_advanced_model()` functions. **Do not store them under the main handler**.

**Once everything is implemented, you can use `python assignment.py` to run your model and see loss/accuracy.**

## 11. Visualizing Results

We provided the `visualize_metrics` method for you to visualize how your loss and accuracy change after each batch using matplotlib. DO NOT EDIT THIS FUNCTION. You should call this function in your main method after you store the loss and accuracy per batch in an array, which would be passed into this function. This should plot line graphs where the horizontal axis is the $i$'th batch and the vertical axis is the loss/accuracy value of the batch. Calling this is OPTIONAL!

We've also provided the `visualize_images` method for you to visualize your predictions against the true labels with matplotlib.
This method is currently written with the labels having a shape of $(\# images, 1)$. DO NOT EDIT THIS FUNCTION. You should call this function with all your inputs and labels after training your model. The function will randomly pick 500 samples from your input and will plot 10 correct and 10 incorrect classifications to help you visually interpret your model’s predictions! You should do this last, after you have met the benchmark for test accuracy.

# Submission

## Requirements

For **CS1470 Students**:

- Complete and Submit HW2 Conceptual
- Implement Beras per specifications and make a SequentialModel in `assignment.py`
- Test the model inside of main
- Get test accuracy >=95% on MNIST with `get_simple_model()`
- The included notebooks are just for your reference.
- Include a brief README with your model's accuracy and any known bugs

For **CS2470 Students**, it is the same except you must also:

- Complete and Submit the 2470 portion of HW2 Conceptual
- Complete Softmax
- Implement Categorical Crossentropy
- Get test accuracy >=95% on MNIST with `get_advanced_model()`

## Grading

Your code will be primarily graded on functionality. Your model's accuracy on the testing data should be **at least the threshold listed in the requirements above**. For 1470, this can be achieved with the simple model parameterization provided. For 2470, you may need to experiment with hyperparameters or develop some custom components. Although you will not be graded on code style, you should not have an excessive number of print statements in your final submission.

**IMPORTANT!** Please use vectorized operations when possible and limit the number of for loops you use. While there is no strict time limit for running this assignment, it should typically take less than 3 minutes. The autograder will automatically time out after 10 minutes. You will not receive any credit for methods that use TensorFlow or Keras functions within them.

**Notebook:** The notebooks will not be graded.
Feel free to change them however you want!

:::danger
You will not receive any credit for functions that use TensorFlow, Keras, PyTorch, or Scikit-Learn functions within them. You must implement the **TODO** functions manually (you are allowed to use NumPy functions).
:::

## Handing In

You should submit the assignment via Gradescope under the corresponding project assignment by zipping up your repository folder or through GitHub (recommended).

To submit through GitHub, commit and push all changes to your repository to GitHub. You can do this by running the following three commands ([this](https://github.com/git-guides/#how-to-use-git) is a good resource for learning more about them):

1. `git add file1 file2 file3`
    - Alternatively, `git add -A` will stage all changed files for you.
2. `git commit -m "commit message"`
3. `git push`

After committing and pushing your changes to your repo (which you can check online if you're unsure if it worked), you can now just upload the repo to Gradescope! If you’re testing out code on multiple branches, you have the option to pick whichever one you want.

![](https://i.imgur.com/fDc3PH9.jpg)

If you wish to submit via zip file:

1. Please make sure your python files are in “hw2/code”. This is very important for our autograder to work!
2. Make sure any data folders are not being uploaded as they may be too big for the autograder to work.

::: warning
**IF YOU ARE IN 2470:** PLEASE REMEMBER TO ADD A BLANK FILE CALLED `2470student` IN THE `code` DIRECTORY. WE ARE USING THIS AS A FLAG TO GRADE 2470-SPECIFIC REQUIREMENTS. FAILURE TO DO SO MEANS LOSING POINTS ON THIS ASSIGNMENT.
:::

<style>
.alert { color: inherit }
.markdown-body { font-family: Inter }
/* Some really hacky CSS to hide bullet points
 * for spoilers in lists */
li:has(details) { list-style-type: none; margin-left: -1em }
li > details > summary { margin-left: 1em }
li > details > summary::-webkit-details-marker { margin-left: -1.05em }
</style>

# Conclusion

**Congratulations!** You just completed your third assignment of CSCI1470/2470! :tada: :clown_face: :tropical_fish: :tada:

::: success
The Clownfish and the Sea Anemone have a symbiotic relationship. Clownfish are immune to nematocysts, the sharp harpoon-like stingers on Sea Anemones, and use the anemones as shelter from predators. In return, clownfish clean anemones and provide fertilizer.
:::