
HW3 Beras

Assignment due October 29th at 6 pm EST on gradescope

Theme

The Philosophy of Magic

Indeed neural nets can often feel like magic, so we'd like to change that! Get your hands dirty with the nuts and bolts of deep learning to show everyone that it isn't magic after all!

Assignment Overview

In this assignment you will begin constructing a basic Keras mimic, 🐻 Beras 🐻.

Assignment Goals

  1. Implement a simple Multi Layer Perceptron (MLP) model that mimics the Tensorflow/Keras API.

    • Implement core classes and methods used for Auto Differentiation.
    • Implement a Dense Layer similar to keras'.
    • Implement basic preprocessing techniques for use on the MNIST Dataset.
    • Implement a basic objective (loss) function for regression such as MSE.
    • Implement basic regression accuracy metrics.
    • Learn optimal weight and bias parameters using gradient descent and backpropagation.
  2. Apply this model to predict digits using the MNIST Dataset

You should know: This is the longest assignment, so we highly recommend starting early and reading this document carefully as you implement each part.

Getting Started

Stencil

Please click here to get the stencil code. Reference this guide for more information about GitHub and GitHub classroom.

Do not change the stencil except where specified. You are welcome to write your own helper functions; however, changing the stencil's method signatures will break the autograder.

Environment

You will need to use the virtual environment that you made in Homework 1. You can activate the environment by using the command conda activate csci2470. If you have any issues running the stencil code, be sure that your conda environment contains at least the following packages:

  • python==3.11
  • numpy
  • tensorflow==2.15
  • pytest

In a Windows conda prompt or a Mac terminal, you can check whether a package is installed with:

conda list -n csci2470 <package_name>

On Unix systems, you can check whether a package is installed with:

conda list -n csci2470 | grep <package_name>

Be sure to read this handout in its entirety before moving onto implementing any part of the assignment!

Deep Learning Libraries

Deep learning is a very complicated and mathematically rich subject. However, when building models, all of these nuances can be abstracted away from the programmer through the use of deep learning libraries.

In this assignment you will be writing your own deep learning library, 🐻 Beras 🐻. You'll build everything you need to train a model on the MNIST dataset. MNIST contains 60k 28x28 black-and-white handwritten digits; your model's job will be to classify which digit is in each image.

Please keep in mind you are not allowed to use any Tensorflow, Keras, or PyTorch functions throughout HW3. The autograder will intentionally not execute if you import these libraries.

You are already familiar with Tensorflow from our first assignment. Now your job will be to build your own version of it: Beras.

Roadmap

Don't worry if these tasks seem daunting at first glance! We've included a lot more info down below on specific implementation details.

  1. Start with beras/core.py which will create some of the basic building blocks for the assignment. Specifics
  2. Move on to beras/layers.py to construct your own Dense layer. Specifics
  3. Now complete beras/activations.py. Specifics
  4. Continue with beras/losses.py to write CategoricalCrossEntropy. Specifics
  5. Next write CategoricalAccuracy in beras/metrics.py. Specifics
  6. Implement beras/onehot.py which will be used later in preprocessing. Specifics
  7. Fill in the optimizer classes in beras/optimizer.py. Specifics
  8. Write GradientTape in beras/gradient_tape.py. Specifics
  9. Construct the Model class in beras/model.py. Specifics

GradientTape is known to be tricky, so budget some extra time to implement it.

  10. Now you have finished the beras framework! Put it to use by implementing preprocessing.py to load and clean your data. Specifics
  11. Finally, write assignment.py to train a model on the MNIST Dataset! Specifics

Gaurav (one of your wonderful TAs) put together this nice graphic to visualize the roadmap and how it all fits together. It's helpful to refer to as you go through the assignment!

[Beras roadmap graphic] Thanks Gaurav!

1. beras/core.py

In this section we are going to prepare some abstract classes we will use for everything else we do in this assignment. This is a very important section since we will build everything else on top of this foundation.

Task 1.1 [Tensor]: We will begin by completing the construction of the Tensor class at the top of the file. Note that it subclasses the np.ndarray datatype; you can find out more about what that means here.

The only TODO is to pass the data to the a kwarg of np.asarray(a=???) in the __new__ method.

You should know: You'll notice the Tensor class is nothing more than a standard np.ndarray but with an additional trainable attribute.

In Tensorflow there is also tf.Variable which you may see throughout the course. tf.Variable is a subclass of tf.Tensor but with some additional bells and whistles for convenience. Both are fair game for use when working with Tensorflow in this course.
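If subclassing np.ndarray is unfamiliar, here is a minimal sketch of the pattern, assuming a constructor of the form Tensor(data, trainable); the exact arguments in the stencil may differ, so treat the names as illustrative.

import numpy as np

class Tensor(np.ndarray):
    def __new__(cls, data, trainable=True):
        obj = np.asarray(a=data).view(cls)   # the `a` kwarg from Task 1.1
        obj.trainable = trainable            # the extra attribute on top of ndarray
        return obj

    def __array_finalize__(self, obj):
        # Propagate the attribute through numpy views, slices, and ufunc results.
        if obj is not None:
            self.trainable = getattr(obj, "trainable", True)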

Task 1.2 [Callable]: There are no TODOs in Callable, but it is important to familiarize yourself with this class. Callable simply allows its subclasses to use self() and self.forward() interchangeably. More importantly, any class that subclasses Callable has a forward method that returns a Tensor, so we can and will use these subclasses when constructing layers and models later.

You should know: Keras and Tensorflow use call instead of forward as the method name for the forward pass of a layer. Pytorch and Beras use forward to make the distinction between __call__ and forward clear.

Task 1.3 [Weighted]: There are 4 methods in Weighted for you to fill out: trainable_variables, non_trainable_variables, trainable, trainable (setter). Each method has a description and return type (if needed) in the stencil code. Be sure to follow the typing exactly or it's unlikely to pass the autograder.

Note: If you need a refresher on Python attributes and properties, you can refer to this helpful guide.

Task 1.4 [Diffable.__call__]: There are no coding TODOs in Diffable.__call__, but it is critical that you spend some time familiarizing yourself with what it is doing. Understanding this method will help clear up later parts of the assignment.

You should know: Recall that in Python, generic_class_name() is equivalent to generic_class_name.__call__(). Note that Diffable implements the __call__ method and not the forward method.

When we subclass Diffable, for example with Dense, we will implement forward there. Then, when we use something like dense_layer(inputs) the gradients will be recorded using GradientTape as you see in Diffable.__call__. If you use dense_layer.forward(inputs) it will not record the gradients because forward won't handle the necessary logic.

Finally, you will see the methods compose_input_gradients and compose_weight_gradients. These methods will be critical when writing GradientTape.gradient. They are how you will compose the upstream gradient of a tensor with the input and weight gradients of a Diffable. You don't have to give these methods a close read, but it's important to come back and use them later on.

2. beras/layers.py

In this section we need to fill out the methods for Dense. We give you __init__ and weights; you should read through both of these one-liners to know what they are doing. Please don't change these, since the autograder relies on the naming conventions. Your tasks will be to implement the rest of the methods we need.

Task 1 [Dense.forward]: To begin, fill in the Dense.forward method. The parameter x represents our input. Remember from class that our Dense layer performs the following to get its output:

\[ f(\bf{x}) = \bf{x}\bf{W} + \bf{b} \]

Keep in mind that x has shape (num_samples, input_size).
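As a sanity check, the forward pass is just a batched affine transform. Here is a minimal sketch, assuming the stencil exposes the weight matrix and bias as self.w and self.b (check your stencil for the exact attribute names):

def forward(self, x):
    # x: (num_samples, input_size), self.w: (input_size, output_size),
    # self.b: (1, output_size); broadcasting handles the bias addition.
    return x @ self.w + self.b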

Task 2 [Dense.get_input_gradients]: Refer to the formula you wrote in Dense.forward to compute \(\frac{\partial f}{\partial x}\). Be sure to return the gradient Tensor as a list; this will come in handy when you write backpropagation in beras/gradient_tape.py later in this assignment.

Note: For each Diffable you can access the inputs of the forward method with self.inputs

Task 3 [Dense.get_weight_gradients]: Compute both \(\frac{\partial f}{\partial w}\) and \(\frac{\partial f}{\partial b}\) and return both Tensors in a list, like you did in Dense.get_input_gradients.
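Since \(f(x) = xW + b\), the local gradients are simple: \(\frac{\partial f}{\partial x}\) is built from \(W\), \(\frac{\partial f}{\partial W}\) is built from \(x\), and \(\frac{\partial f}{\partial b}\) is all ones. Below is a hedged sketch (assuming numpy is imported as np and the attributes are named self.w and self.b); the exact shapes the autograder expects depend on how compose_input_gradients and compose_weight_gradients consume these lists, so verify against the stencil's docstrings before copying anything:

def get_input_gradients(self):
    # df/dx for f(x) = xW + b comes from the weight matrix.
    return [self.w]

def get_weight_gradients(self):
    # One entry per weight Tensor, in the same order as the layer's weights.
    x = self.inputs[0]            # inputs saved by Diffable.__call__
    dw = x                        # df/dW is built from the layer input
    db = np.ones_like(self.b)     # df/db is all ones
    return [dw, db]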

Task 4 [Dense._initialize_weight]: Initialize the dense layer's weight values. By default, return 0 for all weights (usually a bad idea). You are also required to support the following more sophisticated options:

  • Normal: Passing normal causes the weights to be initialized with a unit normal distribution \(\mathcal{N}(0,1)\).
  • Xavier Normal: Passing xavier causes the weights to be initialized in the same way as keras.GlorotNormal.
  • Kaiming He Normal: Passing kaiming causes the weights to be initialized in the same way as keras.HeNormal.

Explicit definitions for each of these initializers can be found in the TensorFlow docs.

Note: _initialize_weight returns the weights and biases and does not set the weight attributes directly.
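The standard deviations behind the normal-based initializers are easy to compute from the layer's fan-in and fan-out. Here is a sketch under the assumption that the method receives the initializer name plus the (input_size, output_size) shape; note that Keras actually samples GlorotNormal and HeNormal from a truncated normal, so this plain-normal version is only an approximation and the signature is illustrative:

import numpy as np

def initialize_weight(initializer, input_size, output_size):
    # Hypothetical helper, not the stencil's exact signature.
    if initializer == "zero":
        w = np.zeros((input_size, output_size))
    elif initializer == "normal":
        w = np.random.normal(0.0, 1.0, (input_size, output_size))
    elif initializer == "xavier":
        # GlorotNormal: std = sqrt(2 / (fan_in + fan_out))
        std = np.sqrt(2.0 / (input_size + output_size))
        w = np.random.normal(0.0, std, (input_size, output_size))
    elif initializer == "kaiming":
        # HeNormal: std = sqrt(2 / fan_in)
        std = np.sqrt(2.0 / input_size)
        w = np.random.normal(0.0, std, (input_size, output_size))
    b = np.zeros((1, output_size))  # biases typically start at zero
    return w, b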

3. beras/activations.py

Here, we will implement a couple activation functions that we will use when constructing our model.

Task 3.1 [LeakyReLU]: Fill out the forward pass and input gradients computation for LeakyReLU. You'll notice these are the same methods we implemented in layers.py; this is by design.

Hint: LeakyReLU's gradient is not continuous, so when computing it, consider both the positive and negative cases.

Note: Though LeakyReLU is technically not differentiable at \(0\) exactly, we can just leave the gradient as \(0\) for any \(0\) input.
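A minimal sketch of both pieces, following the convention from the note above that the gradient at exactly 0 is left at 0; alpha here is the leak slope (the attribute name in the stencil may differ):

import numpy as np

def leaky_relu_forward(x, alpha=0.3):
    return np.where(x > 0, x, alpha * x)

def leaky_relu_input_gradient(x, alpha=0.3):
    # 1 for positive inputs, alpha for negative inputs, 0 at exactly 0.
    return np.where(x > 0, 1.0, np.where(x < 0, alpha, 0.0))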

Task 3.2 [Sigmoid]: Complete the forward and get_input_gradients methods for Sigmoid.

Task 3.3 [Softmax]: Write the forward pass and gradient computation w.r.t inputs for Softmax.

Hints: You should use stable softmax to prevent overflow and underflow issues. Details in the stencil.

Combining np.outer and np.fill_diagonal will significantly clean up the gradient computation.

When you first try to compute the gradient, it will become apparent that the input gradients are tricky. This Medium article has a fantastic derivation that will make your life a lot easier.
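For reference, the derivation arrives at a per-example Jacobian with diagonal entries \(s_i(1-s_i)\) and off-diagonal entries \(-s_i s_j\). Here is a sketch of the stable forward pass and that Jacobian; batching it to the shapes the stencil expects is left to you:

import numpy as np

def stable_softmax(x):
    # Subtracting the row-wise max prevents overflow in exp.
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

def softmax_jacobian(s):
    # s: softmax output for a single example, shape (n,).
    jac = -np.outer(s, s)               # off-diagonal: -s_i * s_j
    np.fill_diagonal(jac, s * (1 - s))  # diagonal: s_i * (1 - s_i)
    return jac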

4. beras/losses.py

In this section we need to construct our loss functions for the assignment: MeanSquaredError and CategoricalCrossEntropy. You should note that for most classification tasks we use CategoricalCrossEntropy by default, but for this assignment we will use both and compare the results.

Note: You'll notice we construct a Loss class that both MeanSquaredError and CategoricalCrossEntropy inherit from. This is just so that we don't have to specify that our loss functions don't have weights every time we create one.

Task 1 [MeanSquaredError.forward]: Implement the forward pass for MeanSquaredError. We want y_pred - y_true, and not the other way around. Don't forget that we expect to take in batches of examples at a time so we will need to take the mean over the batch as well as the mean for each individual example. In short, the output should be the mean of means.

Don't forget that Tensor is a subclass of np.ndarray, so we can use numpy methods!

You should know: In general, loss functions should return exactly 1 scalar value no matter how many examples are in the batch. We take the mean loss from the batch examples in most practical cases. We will see later in the course that we can use multiple measures of loss at one time to train a model, in which case we often take a weighted sum of each individual loss as our loss value to backpropagate on.
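A minimal sketch of the "mean of means"; because the per-example loss is itself a mean over features, a single np.mean over every entry gives the same number:

import numpy as np

def mse_forward(y_pred, y_true):
    # Mean over features per example, then mean over the batch;
    # equivalent to one mean over all entries.
    return np.mean((y_pred - y_true) ** 2)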

Task 2 [MeanSquaredError.get_input_gradients]: Just as we did for our dense layer, compute the gradient with respect to inputs in MeanSquaredError. It's important to remember that there are two inputs, y_pred and y_true. Since y_true comes from our dataset and is not dependent on our params, you should treat it like a constant vector. On the other hand, compute the gradient with respect to y_pred exactly as you did in Dense. Remember to return them both as a list!

Hint: If you aren't quite sure how to access your inputs, remember that MeanSquaredError is a Diffable!
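Differentiating the mean of squared differences gives a gradient proportional to (y_pred - y_true). Below is a hedged sketch where the normalization matches the single np.mean above and the constant y_true gets a zero gradient; double-check both choices against the stencil and its tests:

import numpy as np

def mse_input_gradients(y_pred, y_true):
    # d(mean((y_pred - y_true)^2)) / d(y_pred), treating y_true as a constant.
    d_pred = 2 * (y_pred - y_true) / y_pred.size
    d_true = np.zeros_like(y_true)
    return [d_pred, d_true]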

Task 3 [CategoricalCrossEntropy.forward]: Implement the forward pass of CategoricalCrossEntropy. Make sure to find the per-sample average of the CCE loss! You may run into trouble with values very close to 0 or 1; you may find np.clip of use.

Task 4 [CategoricalCrossEntropy.get_input_gradients]: Get input gradients for CategoricalCrossEntropy.
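A sketch of both methods, assuming y_true is one-hot encoded and clipping keeps log and division away from 0; the exact normalization may need adjusting to match the autograder:

import numpy as np

def cce_forward(y_pred, y_true, eps=1e-12):
    # Clip to avoid log(0), then average the per-sample cross-entropy.
    clipped = np.clip(y_pred, eps, 1 - eps)
    per_sample = -np.sum(y_true * np.log(clipped), axis=-1)
    return np.mean(per_sample)

def cce_input_gradients(y_pred, y_true, eps=1e-12):
    clipped = np.clip(y_pred, eps, 1 - eps)
    num_samples = y_true.shape[0]
    d_pred = -y_true / clipped / num_samples  # gradient of the batch-mean CCE
    d_true = np.zeros_like(y_true)            # labels are treated as constants
    return [d_pred, d_true]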

5. beras/metrics.py

There isn't much to do in this file; just implement the forward method for CategoricalAccuracy.

Task 1 [CategoricalAccuracy.forward]: Fill in the forward method. Note that our input probs represents the probability of each class as predicted by the model, and labels is a one-hot encoded vector representing the true class.

Hint: It may be helpful to also think of the labels as a probability distribution, where the probability of the true class is 1 and that of every other class is 0.

If the index of the max value in both vectors is the same, then our model has made the correct classification.
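In code, that observation is essentially one line; a minimal sketch over a batch:

import numpy as np

def categorical_accuracy(probs, labels):
    # Fraction of examples where the predicted class (argmax of probs)
    # matches the true class (argmax of the one-hot labels).
    return np.mean(np.argmax(probs, axis=-1) == np.argmax(labels, axis=-1))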

6. beras/onehot.py

onehot.py only contains the OneHotEncoder class, which is where you will code a one-hot encoder to use on the data when you preprocess later in the assignment. Recall that a one-hot encoder transforms a given value into a vector with all entries being 0 except one with a value of 1 (hence "one hot"). This is used often when we have multiple discrete classes, like the digits of the MNIST dataset.

Task 1 [OneHotEncoder.fit]: In HW2 you were able to use tf.one_hot; now you get to build it yourself! In OneHotEncoder.fit you will take in a 1d vector of labels, and you should construct a dictionary that maps each unique label to a one-hot vector. This method doesn't return anything.

Note: you should only associate a one-hot vector with labels present in labels!

Hint: np.unique and np.eye may be of use here.

Task 2 [OneHotEncoder.forward]: Fill in the OneHotEncoder.forward method to transform the given 1d array data into a one-hot-encoded version of the data. This method should return a 2d np.ndarray.

Task 3 [OneHotEncoder.inverse]: OneHotEncoder.inverse should be an exact inverse of OneHotEncoder.forward such that OneHotEncoder.inverse(OneHotEncoder.forward(data)) = data.
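A hedged sketch of the three methods written as standalone helpers; in the stencil, fit would store the mappings on the instance (e.g. something like self.label_to_vec, a hypothetical name) rather than return them:

import numpy as np

def fit(labels):
    uniq = np.unique(labels)            # distinct labels actually present
    eye = np.eye(len(uniq))             # one row of the identity per label
    label_to_vec = {label: eye[i] for i, label in enumerate(uniq)}
    index_to_label = {i: label for i, label in enumerate(uniq)}
    return label_to_vec, index_to_label

def forward(data, label_to_vec):
    return np.array([label_to_vec[label] for label in data])

def inverse(onehots, index_to_label):
    return np.array([index_to_label[int(np.argmax(row))] for row in onehots])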

7. beras/optimizer.py

In beras/optimizer.py there are 3 optimizers we'd like you to implement: BasicOptimizer, RMSProp, and Adam. In practice, Adam is tough to beat, so more often than not you will default to using it.

Each has an __init__ and apply_gradients. We give you the __init__ for each optimizer which contains all the hyperparams and variables you will need for each algorithm. Then in apply_gradients you will write the algorithm for each method to update the trainable_params according to the given grads. Both trainable_params and grads are lists with

\[\text{grad}[i] = \frac{\partial \mathcal{L}}{\partial \text{ trainable_params[i]}}\]

where \(\mathcal{L}\) is the Loss of the network.

Task 1 [BasicOptimizer]: Write the apply_gradients method for the BasicOptimizer.

For any given trainable_param, \(w[i]\), and learning_rate, \(r\), the optimization formula is given by \[w[i] = w[i] - \frac{\partial \mathcal{L}}{\partial w[i]}*r\]
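A minimal sketch of that update, assuming the weight Tensors can be updated in place so the model sees the change (check the stencil's convention for how updates are expected to propagate):

def apply_gradients(self, trainable_params, grads):
    # w[i] = w[i] - r * dL/dw[i]
    for param, grad in zip(trainable_params, grads):
        param -= self.learning_rate * grad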

Task 2 [RMSProp]: Write the apply_gradients method for RMSProp.

In RMSProp there are two new hyperparams, \(\beta\) and \(\epsilon\).

\(\beta\) is referred to as the decay rate and typically defaults to .9. This decay rate has the effect of lowering the learning rate as the model trains. Intuitively, as our loss decreases we are closer to a minimum and should take smaller steps towards optimization to ensure we don't optimize past the minimum.

\(\epsilon\) is a small constant to prevent division by 0.

In addition to our hyperparams there is another term, which we will call v, which acts as a moving average of the squared gradients for each param. We update this value in addition to the trainable_params every time we apply the gradients.

For any given trainable_param, \(w[i]\), learning_rate, \(r\) the update is defined by \[v[i] = \beta*v[i] + (1-\beta)*\left(\frac{\partial \mathcal{L}}{\partial w[i]}\right)^2\] \[w[i] = w[i] - \frac{r}{\sqrt{v[i]} + \epsilon}*\frac{\partial \mathcal{L}}{\partial w[i]}\]

Hint: In our stencil code, we provide v as a dictionary which maps a key to a float. Keep in mind that we only need to store a single v value for each weight!
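A hedged sketch, assuming the provided v dictionary is keyed per parameter (e.g. by index) and starts at 0, as the hint above describes:

import numpy as np

def apply_gradients(self, trainable_params, grads):
    for i, (param, grad) in enumerate(zip(trainable_params, grads)):
        # Moving average of the squared gradient, then the scaled step.
        self.v[i] = self.beta * self.v[i] + (1 - self.beta) * grad ** 2
        param -= self.learning_rate / (np.sqrt(self.v[i]) + self.epsilon) * grad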

Task 3 [Adam]: Write the apply_gradients method for Adam.

At its core, Adam is similar to RMSProp, but it has more smoothing terms and computes an additional momentum term to further balance the learning rate as we train. This momentum term has its own decay rate, \(\beta_1\). Additionally, Adam keeps track of the number of optimization steps performed to further tweak the effective learning rate.

Here is what an optimization step with Adam looks like for a trainable_param \(w[i]\) and learning_rate \(r\):

\[m[i] = m[i]*\beta_1 + (1-\beta_1)*\left(\frac{\partial \mathcal{L}}{\partial w[i]}\right)\] \[v[i] = v[i]*\beta_2 + (1-\beta_2)*\left(\frac{\partial \mathcal{L}}{\partial w[i]}\right)^2\] \[\hat{m} = m[i]/(1-\beta_1^t)\] \[\hat{v} = v[i]/(1-\beta_2^t)\] \[w[i] = w[i] - \frac{r*\hat{m}}{\sqrt{\hat{v}}+\epsilon}\]

Note: Don't forget to increment the time step \(t\) once each time apply_gradients is called!

Hint: Don't overcomplicate this section; it really is as simple as programming the algorithms as they are written.
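In that spirit, here is a hedged sketch that mirrors the formulas above, assuming self.m, self.v, and the step counter self.t come from the provided __init__ (your attribute names may differ):

import numpy as np

def apply_gradients(self, trainable_params, grads):
    self.t += 1
    for i, (param, grad) in enumerate(zip(trainable_params, grads)):
        self.m[i] = self.beta_1 * self.m[i] + (1 - self.beta_1) * grad
        self.v[i] = self.beta_2 * self.v[i] + (1 - self.beta_2) * grad ** 2
        m_hat = self.m[i] / (1 - self.beta_1 ** self.t)  # bias-corrected momentum
        v_hat = self.v[i] / (1 - self.beta_2 ** self.t)  # bias-corrected variance
        param -= self.learning_rate * m_hat / (np.sqrt(v_hat) + self.epsilon)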

8. beras/gradient_tape.py

In beras/gradient_tape.py you are going to implement your very own context manager, GradientTape, which is extremely similar to the one actually used in Keras. We give you __init__, __enter__, and __exit__; your job is to implement GradientTape.gradient.

Warning: This section has historically been difficult for students. It's helpful to carefully consider our hints and the conceptual ideas behind gradient before beginning your implementation. You should also brush up on Breadth-First Search to make the implementation easier.

If you get stuck, feel free to come back to this section later on.

Task 1 [GradientTape.gradient]: Implement the gradient method to compute the gradient of the loss with respect to each of the trainable params. The output of this method should be a list of gradients, one for each of the trainable params.

Hint: You'll utilize the compose_input_gradients and compose_weight_gradients methods from Diffable to collect the gradients at every step.

Keep in mind that you can call id(<tensor_variable>) to get the object id of a tensor object!

You should know: When you finish this section you will have written a highly generalized gradient method that could handle an arbitrary network. This method will also function almost exactly how Keras implements the gradient method. This is a very powerful method but it is just one way to implement autograd.

9. beras/model.py

In beras/model.py we are going to construct a general Model abstract class that we will use to define our SequentialModel. The SequentialModel simply calls all of its layers in order for the forward pass.

You should know: At first it may seem like all neural nets would be SequentialModels, but there are some architectures, like ResNets, that break the sequential assumption.

Task 1 [Model.weights]: Construct a list of all weights in the model and return it.

We give you Model.compile, which just sets the optimizer, loss and accuracy attributes in the model. In Keras, compile is a huge method that prepares these components to make them hyper-efficient. That implementation is highly technical and outside the scope of the course but feel free to look into it if you are interested.

Task 2 [Model.fit]: This method should train the model for the given number of epochs on the data x and y with batch size batch_size. Importantly, you want to make sure you record the metrics throughout training and print stats during training so that you can watch the metrics as the model trains.

You can use the print_stats and update_metric_dict functions provided. Note that neither of these methods return any values, print_stats prints out the values directly and update_metric_dict(super_dict, sub_dict) updates super_dict with the mean metrics from sub_dict.

Note: You do not need to call the model here, you should instead use self.batch_step(...) which all child classes of Model will implement.

Task 3 [Model.evaluate]: This method should look very similar to Model.fit except we need to ensure the model does not train on the testing data. Additionally, we will test on the entirety of the test set one time, so there is no need for the epochs parameter from Model.fit.

Task 4 [SequentialModel.forward]: This method passes the input through each layer in self.layers sequentially.

Task 5 [SequentialModel.batch_step]: This method makes a model prediction and computes the loss inside a GradientTape, just like you did in HW2. Be sure to use the training argument so that the model's weights are only adjusted when training is True. This method should return the loss and accuracy for the batch in a dictionary.

Note: you should have used the implicit __call__ function for the layers in the forward method to ensure that each layer gets tracked by the GradientTape (i.e. layer(x)). However, because we don't define how to calculate gradients for a SequentialModel, make sure to use the SequentialModel's forward method in batch_step.
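Putting those notes together, here is a hedged sketch of the shape batch_step can take. The attribute names for the compiled loss, accuracy metric, and trainable variables (self.loss, self.acc_metric, self.trainable_variables) are assumptions based on the compile description above, and the call signatures for tape.gradient and apply_gradients should match your own implementations:

def batch_step(self, x, y, training=True):
    with GradientTape() as tape:
        preds = self.forward(x)   # forward, not self(x), per the note above
        loss = self.loss(preds, y)
    if training:
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(self.trainable_variables, grads)
    acc = self.acc_metric(preds, y)
    return {"loss": loss, "acc": acc}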

10. preprocess.py

In this section you will fill out the load_and_preprocess_data() function that will load in, flatten, normalize and convert all the data into Tensors.

Task 1 [load_and_preprocess_data()]: We provide the code to load in the data, your job is to

  1. Normalize the values so that they are between 0 and 1

  2. Flatten the arrays such that they are of shape (number of examples, 28*28).

  3. Convert the arrays to Tensors and return the train inputs, train labels, test inputs and test labels in that order.

  4. You should NOT shuffle the data in this method or do any other transformations than what we describe in 1-3. Importantly, you should NOT return one_hot labels. You'll create those when training and testing.
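A minimal sketch of steps 1-3 for one split, assuming raw arrays of shape (num_examples, 28, 28) with pixel values in [0, 255] and that the Tensor class from beras/core.py accepts array data directly:

import numpy as np

def preprocess_split(inputs, labels):
    inputs = inputs.astype(np.float32) / 255.0            # 1. normalize to [0, 1]
    inputs = inputs.reshape(inputs.shape[0], 28 * 28)      # 2. flatten to (n, 784)
    return Tensor(inputs), Tensor(labels)                  # 3. convert to Tensors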

11. assignment.py

Here you put it all together! We won't autograde this file; it is just for you to train and test your model with different architectures. Try starting out with small, simple models and play around with different architectures, optimizers, hyperparams, etc. to find a configuration that achieves over 95% accuracy on the testing data.

Task 1 [Save predictions]: Once you have an architecture that works, use np.save("predictions.npy", arr: np.ndarray) to save your predictions for the test set. Please name the file predictions.npy when you submit to Gradescope or the autograder may not find it.

It might be helpful to change what's being returned by/done in batch_step and evaluate for this! Just make sure that batch_step only returns the metric dictionary when training=True, as that's what we test for.

Note: We have a number of safeguards in place to prevent folks from cheating, and the autograder will not tell you that your submission has been flagged. Instead, you'll get an email from Prof. Sun some time after you submit.

Submission

Requirements

You'll need to submit all the files associated with the assignment, and a "predictions.npy" containing your best model predictions.

Grading

Your code will be primarily graded on functionality, as determined by the Gradescope autograder.

You will not receive any credit for functions that use tensorflow, keras, torch, or scikit-learn functions within them. You must implement all functions manually using either vanilla Python or NumPy.

Handing In

You should submit the assignment via Gradescope under the corresponding project assignment through Github or by submitting all files individually.

To submit via Github, commit and push all of your changes to your repository to GitHub. You can do this by running the following commands.

git commit -am "commit message"
git push

For those of y'all who are already familiar with git: the -am flag to git commit is a pretty cool shortcut which adds all of our modified files and commits them with a commit message.

Note: We highly recommend committing your files to git and syncing with your GitHub repository often throughout the course of the assignment to ensure none of your hard work is lost!

Leaderboard

There is a leaderboard active for this assignment; it is purely for fun and is not graded.
