# HW3 Programming: BERAS
:::info
Assignment due **October 2nd, 2025 at 10 pm EST** on Gradescope!
:::
:::danger
__You should know:__ This assignment is statistically rated as the hardest and most time-consuming assignment of this course (30+ hours).
We **highly** recommend starting early and reading this entire document carefully before you implement any of the code!
:::
## Assignment Overview
### 🪨 Expedition Log: Bruno’s Deep-Earth Dilemma 🐻

*(Image generated by GPT-5)*
Deep Earth, 2025.09.18 — While mapping caverns beneath Providence, our mascot Bruno the Bear descends into an ancient basalt chamber etched with thousands of symbols. A seismic relay—older than the tunnels themselves—can ping the surface, but it only accepts digits 0–9.
There’s a catch: the walls are covered in 60,000 handwritten markings (yup, MNIST). To get a message back to Brown, Bruno must translate the carvings into clean numerical signals.
Your task: build BERAS — *(Bruno’s Earthbound Recognition & Analysis System)* — a neural framework that decodes the glyphs into digits so the relay can lock onto a safe exit route.
Your mission:
1. Implement BERAS from scratch (no off-the-shelf TensorFlow—these caves block Wi-Fi and convenience).
2. Train it to classify the cavern glyphs (MNIST) into 0–9.
3. Use your model’s predictions to drive the seismic relay and guide Bruno back to daylight at the Main Green.
**Field Notes: Headlamp battery is dwindling and the tunnel echoes are getting… echo-ier. Better get to work soon!**
___
### Assignment Goals
1. Implement a simple Multi Layer Perceptron (MLP) model that mimics the Tensorflow/Keras API.
- Implement core classes and methods used for **Auto Differentiation**.
- Implement a **Dense Layer** similar to `keras`.
- Implement basic **preprocessing techniques** for use on the **MNIST Dataset**.
- Implement basic objective (loss) functions such as **MSE** and **CategoricalCrossEntropy**.
- Implement basic classification **accuracy metrics**.
- **Learn** optimal weight and bias parameters using **gradient descent** and **backpropagation**.
2. Apply this model to predict digits using the MNIST Dataset
## Getting Started
### Stencil
<!--LINK THE REFERENCES-->
Please click [here](https://classroom.github.com/a/JbBpViYM) to get the stencil code. Reference this [guide](https://hackmd.io/gGOpcqoeTx-BOvLXQWRgQg) for more information about GitHub and GitHub classroom.
:::info
Make sure you clone this repository within the same parent folder as your virtual environment! Remember from assignment 1:
```python
csci1470-course/ ← Parent directory (you made this)
├── csci1470/ ← Virtual environment (from HW1)
├── HW3-BERAS-F25/ ← This repo (you cloned this)
└── ...
```
:::
:::danger
**Do not change the stencil except where specified.** You are welcome to write your own helper functions. However, changing the stencil's method signatures **will** break the autograder.
:::
### Environment
You will need to use the virtual environment that you made in Assignment 1.
:::info
**REMINDER ON ACTIVATING YOUR ENVIRONMENT**
1. Make sure you are in your cloned repository folder
- Your terminal prompt should end with the name of the repository
2. You can activate your environment by running the following command:
```bash
# Activate environment
source ../csci1470/bin/activate # macOS/Linux
..\csci1470\Scripts\activate # Windows
```
:::
:::warning
If you have any issues running the stencil code, be sure that your virtual environment contains at least the following packages by running `pip list` once your environment is activated:
- `python==3.11`
- `numpy`
- `tensorflow>=2.15` (version 2.15 or any newer release is fine)
- `pytest`
On Windows, you can check whether a specific package is installed with:
```bash
pip list | findstr <package_name>
```
On macOS/Linux, you can do the same with:
```bash
pip list | grep <package_name>
```
:::
## Deep Learning Libraries
Deep learning is a very complicated and mathematically rich subject. However, when building models, all of these nuances can be abstracted away from the programmer through the use of deep learning libraries.
In this assignment you will be writing your own Deep Learning library, 🐻 Beras 🐻. You'll build everything you need to train a model on the MNIST dataset. The MNIST dataset contains 60k 28x28 black-and-white handwritten digits; your model's job will be to classify which digit is in each image.
:::danger
Please keep in mind you are _not_ allowed to use ___any___ __Tensorflow, Keras, or PyTorch functions throughout HW3__ (other than your testing files). The autograder will intentionally not execute if you import these libraries.
:::
You are already familiar with **Tensorflow** from our first assignment. Now your job will be to build your own version of it: **BERAS**.
## Implementation Roadmap
### Before you Begin: Our Recommended Gameplan
**CORE IDEA: Read First, Code Second!** Before diving into any implementation, follow these steps to help make this assignment more digestible!
1. **Read this entire document from start to finish**
- This is a dense, long document with many different sections. You should take a minute to walk through the entire document and inspect each of the sections to familiarize yourself. You will likely be confused, but that is the whole point!
2. **Explore the provided stencil code**
- Go through the repository and inspect the file structure and breakdown. You will see that some files, like `core.py`, contain a lot of stencil code, but many are filled with TODOs for you!
3. **Study the [companion sheet](https://hackmd.io/@dlf25/Hy4d-mRFgg) thoroughly**
- This document is your primary reference for understanding the stencil code and implementation details. The companion sheet explains how the different constructs you are required to work with function together!
4. **Refer back to the [companion sheet](https://hackmd.io/@dlf25/Hy4d-mRFgg) frequently**
- Much of the stencil code's patterns and helper functions are explained in here. If you are confused about how the provided code works, refer back to the document!
### Implementation Tasks
Don't worry if these tasks seem daunting at first glance! We've included a lot more info down below on specific implementation details. The companion sheet is your manual for assembling your neural network framework.
1. Start with implementing **`preprocessing.py`** to load and clean your data, and **`beras/onehot.py`** to get to know the dimensions of the data better. [Specifics](#1-preprocesspy)
2. Now fill in part of **`beras/core.py`** which will create some of the basic building blocks for the assignment. [Specifics](#3-berascorepy)
- This is where the companion sheet comes in really handy!
3. Move on to **`beras/layers.py`** to construct your own `Dense` layer. [Specifics](#4-beraslayerspy)
4. Now complete **`beras/activations.py`** [Specifics](#5-berasactivationspy)
5. Continue with **`beras/losses.py`** to write **CategoricalCrossEntropy**. [Specifics](#6-beraslossespy)
6. Next write **CategoricalAccuracy** in **`beras/metrics.py`**. [Specifics](#7-berasmetricspy)
7. Fill in the optimizer classes in **`beras/optimizer.py`**. [Specifics](#8-berasoptimizerpy)
8. Write **GradientTape** in **`beras/gradient_tape.py`**. [Specifics](#9-berasgradient_tapepy)
:::danger
**GradientTape** is known to be tricky, so budget some extra time to implement it. **Refer to the [companion sheet](https://hackmd.io/@dlf25/Hy4d-mRFgg) for a detailed explanation about how GradientTape works!**
:::
9. Construct the **Model** class in **`beras/model.py`**. [Specifics](#10-berasmodelpy)
10. Finally, write **`assignment.py`** to train a model on the MNIST Dataset! [Specifics](#11-assignmentpy)
### Timeline Suggestion
:::success
__Note:__ You have 2 weeks to complete this assignment in full and we recommend splitting the tasks into 2 big sections.
__Week 1 (Sections 1-7):__ Build the foundations of BERAS
- Sections 1-5 will likely be the most code heavy, but primarily involve implementing functions we've covered in class. A strong conceptual understanding is your key to success on this assignment.
- Sections 6-7 are the lightest portion of the assignment and should flow smoothly if you understand the code from parts 1-5
- **Pro tip:** Keep the [companion sheet](https://hackmd.io/@dlf25/Hy4d-mRFgg) open while working - it explains most of the stencil code patterns you'll encounter
__Week 2 (Sections 8-11):__ Integration
- Section 8 (optimizers) is very easy; it's *almost* as simple as copying your code from Assignment 2 into `optimizer.py`
- Section 9 (GradientTape) is conceptually challenging and, on average, is **the hardest section for students**. This section will take about **25-30% of the total assignment time**.
- *TIP: The [companion sheet](https://hackmd.io/@dlf25/Hy4d-mRFgg) breaks down gradient tape!*
- Sections 10-11 involve piecing together your previous work, so if you've built solid foundations, these should be relatively straightforward
:::
Gaurav (a former TA) put together this nice graphic to visualize the roadmap and how it all fits together. It's helpful to refer to as you go through the assignment!

*Thanks Gaurav!*
:::success
**HERE IS THE BERAS COMPANION SHEET AGAIN: {%preview https://hackmd.io/@dlf25/Hy4d-mRFgg %}**
:::
:::warning
**[QUICK ASIDE: TESTING INCREMENTALLY]** You will notice the `run_tests.py` file and `tests/` directory in your cloned repository. These are helpful unit tests we have provided to you in order to help you ensure you are on track with the assignment.
Each section will detail which testing files are available to you and how to run them. For example, the tests for `beras/layers.py`, `beras/activations.py`, and `beras/losses.py` are **all** contained within the `tests/test_beras.py` file!
However, we have only provided you with the **minimal** set of tests, with some not fully implemented. You are responsible for implementing more tests as you see fit!
*Before you ask, we are not grading your tests. These are simply there for you since you are limited on the number of gradescope submissions you have!*
:::
## 1. `preprocess.py`
In this section you will fill out the `load_and_preprocess_data()` function, which will load, flatten, and normalize the data and convert it all into `Tensor`s.
:::info
__Task 1.1 [load_and_preprocess_data()]:__ We provide the code to load in the data, your job is to
1. Normalize the values so that they are between 0 and 1
2. Flatten the arrays such that they are of shape `(number of examples, 28*28)`.
3. Convert the arrays to `Tensor`s and return the train inputs, train labels, test inputs and test labels __in that order__.
4. You should NOT shuffle the data in this method or do any other transformations than what we describe in 1-3. Importantly, you should **NOT** return the labels one-hot encoded. You'll create those when training and testing.
:::
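As a rough illustration of steps 1-3, here is a minimal NumPy sketch for a single data split. The `Tensor` import path and constructor usage are assumptions about the stencil, and the helper name is made up for illustration.
```python
import numpy as np
from beras.core import Tensor  # assumed import path based on the stencil layout

def preprocess_split(inputs: np.ndarray, labels: np.ndarray):
    """Hypothetical helper: normalize and flatten one split of MNIST, then wrap in Tensors."""
    inputs = inputs.astype(np.float32) / 255.0           # pixel values 0-255 -> 0-1
    inputs = inputs.reshape(inputs.shape[0], 28 * 28)     # (number of examples, 784)
    return Tensor(inputs), Tensor(labels)                 # labels stay as plain digits (no one-hot)
```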
:::warning
__Task 1.2 [Testing]:__ You can now run the preprocess tests provided in the `tests/test_data.py` file to test your implementation. In order to run the tests, you should first make sure you are in the root directory for the assignment. Then, run the following command:
```bash
python tests/test_data.py --test=preprocess
```
This will run our prewritten tests for your implementation and print out the results in the terminal.
***Note:** These tests do not entirely guarantee your implementation is perfect, but if you pass them, you should be on the right track! **You are encouraged to write more tests in this file, but make sure they begin with `test_`!***
:::
:::danger
**You may run into errors because other portions of the code are not yet implemented. You can either wait until you reach those sections, or fill in the missing values with null/filler data.**
:::
## 2. beras/onehot.py
`onehot.py` only contains the `OneHotEncoder` class which is where you will code a one hot encoder to use on the data when you preprocess later in the assignment. Recall that a one hot encoder transforms a given value into a vector with all entries being 0 except one with a value of 1 (hence "one hot"). This is used often when we have multiple discrete classes, like digits for the MNIST dataset.
:::info
__Task 2.1 [OneHotEncoder.fit]:__ In HW2 you were able to use `tf.one_hot`; now you get to build it yourself! In `OneHotEncoder.fit` you will take in a 1d vector of labels, and you should construct a dictionary that maps each unique label to a one hot vector. This method doesn't return anything.
__Note__: you should only associate a one hot vector with labels that are actually present in `labels`!
:::
:::success
__Hint:__ `np.unique` and `np.eye` may be of use here.
:::
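As a quick illustration of the hint (toy data, not the stencil's exact API), `np.unique` and `np.eye` can build the label-to-one-hot dictionary in a couple of lines:
```python
import numpy as np

labels = np.array([3, 1, 3, 7])          # toy 1d label vector
uniques = np.unique(labels)              # array([1, 3, 7])
eye = np.eye(len(uniques))               # one identity row per unique label

label_to_onehot = {label: eye[i] for i, label in enumerate(uniques)}
# label_to_onehot[3] -> array([0., 1., 0.])
```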
:::info
__Task 2.2 [OneHotEncoder.forward]:__ Fill in the `OneHotEncoder.forward` method to transform the given 1d array `data` into a one-hot-encoded version of the data. This method should return a 2d `np.ndarray`.
:::
:::info
__Task 2.3 [OneHotEncoder.inverse]:__ `OneHotEncoder.inverse` should be an exact inverse of `OneHotEncoder.forward` such that `OneHotEncoder.inverse(OneHotEncoder.forward(data)) = data`.
:::
:::warning
__Task 2.4 [Testing]:__ You can now run the one-hot encoding tests provided in the `tests/test_data.py` file to test your implementation. Again, you should first make sure you are in the root directory for the assignment. Then, run the following command to run the test(s):
```bash
python tests/test_data.py --test=ohe
```
This will run our prewritten tests for your implementation and print out the results in the terminal.
***Note:** These tests do not entirely guarantee your implementation is perfect, but if you pass them, you should be on the right track! **You are encouraged to write more tests in this file, but make sure they begin with `test_`!***
:::
## 3. beras/core.py
In this section we are going to prepare some abstract classes we will use for everything else we do in this assignment. This is a very important section since we will build everything else on top of this foundation.
:::info
**Task 3.1 [Tensor]:** We will begin by completing the construction of the `Tensor` class at the top of the file. Note that it subclasses the `np.ndarray` datatype; you can find out more about what that means <ins>[here](https://numpy.org/doc/stable/user/basics.subclassing.html)</ins>.
**The only TODO is to pass in the data to the `a` kwarg in `np.asarray(a=???)` in the `__new__` method.**
:::
:::warning
__You should know:__ You'll notice the `Tensor` class is nothing more than a standard `np.ndarray` but with an additional `trainable` attribute.
In Tensorflow there is also `tf.Variable` which you may see throughout the course. `tf.Variable` is a subclass of `tf.Tensor` but with some additional bells and whistles for convenience. In particular, for weights and biases **you should use Variables and not Tensors**.
:::
:::info
**Task 3.2 [Callable]:** There are no TODOs in `Callable` but it is important to familiarize yourself with this class. `Callable` simply allows its subclasses to use `self()` and `self.forward()` interchangeably. More importantly, if a class subclasses `Callable` it **will** have a `forward` method that returns a `Tensor`. We **can and will** use these subclasses when **constructing layers and models** later.
:::
:::warning
__Fun Fact (You should know):__ __Keras__/__Tensorflow__ use `call` instead of `forward` as the method name for the forward pass of a layer. __Pytorch__ and __Beras__ use `forward` to make the distinction between `__call__` and `forward` clear.
:::
:::info
**Task 3.3 [Weighted]:** There are 4 methods in `Weighted` for you to fill out: `trainable_variables`, `non_trainable_variables`, `trainable`, `trainable (setter)`. Each method has a description and return type (if needed) in the stencil code. Be sure to follow the typing **exactly** or it's unlikely to pass the autograder.
**HINT: weights have a `trainable` attribute**
:::
:::success
__Note:__ If you need a refresher on Python attributes and properties, you can refer to <ins>[this](https://realpython.com/python-getter-setter/)</ins> helpful guide
:::
:::info
**Task 3.4 [Diffable.\_\_call__]** There are no coding TODOs in `Diffable.__call__` but it is **critical** that you spend some time to familiarize yourself with what it is doing. Understanding this method will help clear up later parts of the assignment.
:::
:::warning
__You should know:__ Recall that in Python, calling an object like `some_object()` is equivalent to `some_object.__call__()`. Note that Diffable implements the `__call__` method and __not__ the `forward` method.
When we subclass `Diffable`, for example with `Dense`, we __will__ implement `forward` there. Then, when we use something like `dense_layer(inputs)` __the gradients will be recorded using `GradientTape`__ as you see in `Diffable.__call__`. If you use `dense_layer.forward(inputs)` __it will not record the gradients__ because `forward` won't handle the necessary logic.
:::
:::warning
**`compose_input_gradients` and `compose_weight_gradients`**
These methods are responsible for composing the downstream gradients during backpropagation using the upstream gradient, `J`, and the local input and weight gradients of a `Diffable`. These methods are described in more detail in the companion sheet!
:::
## 4. beras/layers.py
:::warning
__[BERAS Testing] (this applies for section 4, 5, 6, and 7)__
We have provided you with a test file, `test_beras`, which contains a test suite for your Dense layer, activation functions, and loss functions. This test suite has been set up so you can progressively run the tests as you implement each of the components/functions by simply calling the tests by their name. You can run the first test after implementing the `Dense` layer.
Please note that we do not provide you with tests for all of the classes/functions. For example, we provide you with the test case for `LeakyReLU`, but do not provide the cases for `Sigmoid` or `Softmax`. You should be able to write your own tests for the remaining activation functions using the provided case as a template. We **highly recommend** you take the time to write these extra tests, and as many more as needed, before submitting to the autograder.
In order to run the tests, you need to run the command
```bash
python tests/test_beras.py --list # will list out the available tests
python tests/test_beras.py --test=<test name>
```
***Note:** These tests do not entirely guarantee your implementation is perfect, but if you pass them, you should be on the right track! **You are encouraged to write more tests in this file, but make sure they begin with `test_`!***
:::
In this section we need to fill out the methods for `Dense`. We give you `__init__` and `weights`; you should read through both of these one-liners to know what they are doing. Please don't change these, since the autograder relies on the naming conventions. Your task will be to implement the rest of the methods we need.
:::info
__Task 4.1 [Dense.forward]:__ To begin, fill in the `Dense.forward` method. The parameter `x` represents our input. Remember from class that our Dense layer performs the following to get its output:
$$
f(\bf{x}) = \bf{x}\bf{W} + \bf{b}
$$
Keep in mind that `x` has shape `(num_samples, input_size)`.
:::
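To make the shapes concrete, here is a tiny NumPy check of $f(\bf{x}) = \bf{x}\bf{W} + \bf{b}$; the layer sizes below are made up for illustration.
```python
import numpy as np

num_samples, input_size, output_size = 4, 784, 10
x = np.random.rand(num_samples, input_size)
W = np.random.rand(input_size, output_size)
b = np.zeros(output_size)

out = x @ W + b    # broadcasting adds b to every row
print(out.shape)   # (4, 10) -> (num_samples, output_size)
```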
:::info
__Task 4.2 [Dense.get_input_gradients]:__ Refer to the formula you wrote in `Dense.forward` to compute $\frac{\partial f}{\partial x}$. Be sure to return the gradient `Tensor` __as a list__, this will come in handy when you write back propagation in `beras/gradient_tape.py` later in this assignment.
:::
:::success
__Note:__ For each `Diffable` you can access the inputs of the forward method with `self.inputs`
:::
:::info
__Task 4.3 [Dense.get_weight_gradients]:__ Compute both $\frac{\partial f}{\partial w}$ and $\frac{\partial f}{\partial b}$ and return both `Tensor`s __in a list__, like you did in `Dense.get_input_gradients`.
HINT: The shape of your weight gradients should report the results for how each weight changes, for each example in the batch. How many dimensions should your matrix have?
:::
:::info
__Task 4.4 [Dense.\_initialize_weight]:__ Initialize the dense layer’s weight values. By default, return 0 for all weights (usually a bad idea). You are also required to allow for more sophisticated options by allowing for the following:
- **Normal:** Passing `normal` causes the weights to be initialized with a unit normal distribution $\mathcal{N}(0,1)$.
- **Xavier Normal:** Passing `xavier` causes the weights to be initialized in the same way as `keras.GlorotNormal`.
- **Kaiming He Normal:** Passing `kaiming` causes the weights to be initialized in the same way as `keras.HeNormal`.
Explicit definitions for each of these initializers can be found **[in the tensorflow docs](https://www.tensorflow.org/api_docs/python/tf/keras/initializers)**
:::
:::warning
**Notes:**
1. `_initialize_weight` __returns__ the weights and biases and does not set the weight attributes directly.
2. Your weights should be `Variable` **not** `Tensor`. This is so that we can use the `.assign` method in your optimizers.
:::
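If it helps, here is one way the initializer logic could look in plain NumPy. The standard deviations follow the Glorot/He formulas from the TensorFlow docs, though Keras draws from a *truncated* normal while this sketch uses an ordinary one; the `Variable` import path and the argument names are assumptions about the stencil.
```python
import numpy as np
from beras.core import Variable  # assumed import path based on the stencil

def initialize_weights(initializer: str, input_size: int, output_size: int):
    """Hypothetical sketch: returns (weights, biases) as Variables."""
    if initializer == "zero":
        w = np.zeros((input_size, output_size))
    elif initializer == "normal":
        w = np.random.normal(0.0, 1.0, (input_size, output_size))   # unit normal N(0, 1)
    elif initializer == "xavier":
        std = np.sqrt(2.0 / (input_size + output_size))             # GlorotNormal stddev
        w = np.random.normal(0.0, std, (input_size, output_size))
    elif initializer == "kaiming":
        std = np.sqrt(2.0 / input_size)                              # HeNormal stddev
        w = np.random.normal(0.0, std, (input_size, output_size))
    else:
        raise ValueError(f"Unknown initializer: {initializer}")
    b = np.zeros(output_size)                                        # biases start at 0
    return Variable(w), Variable(b)
```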
:::warning
__Task 4.5 [TESTING]:__ You can now run our first test, `test_dense_forward()`, to check that you set up your forward pass correctly. **We do not provide tests for all methods.** You can run the test as follows:
```bash
python tests/test_beras.py --test=dense
```
For the following sections, you simply replace `test=dense` with the actual test you want to run to test your implementation (pre-written or custom).
***Note:** These tests do not entirely guarantee your implementation is perfect, but if you pass them, you should be on the right track! **You are encouraged to write more tests in this file, but make sure they begin with `test_`!***
:::
## 5. beras/activations.py
Here, we will implement a couple of activation functions that we will use when constructing our model. This article has some helpful information about [activation functions](https://medium.com/analytics-vidhya/activation-functions-all-you-need-to-know-355a850d025e).
:::info
__Task 5.1 [LeakyReLU]:__ Fill out the forward pass and input gradients computation for `LeakyReLU`. You'll notice these are the same methods we implemented in `layers.py`; this is by design.
:::
:::success
__Hint:__ LeakyReLU is not differentiable everywhere, so when computing the gradient, consider the positive and negative cases separately.
Note: Though LeakyReLU is technically not differentiable at exactly $0$, we can just leave the gradient as $0$ for any $0$ input.
:::
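A minimal NumPy sketch of the idea, following the convention above of a $0$ gradient at exactly $0$. The function names are just for illustration; the stencil organizes this as `forward`/`get_input_gradients` methods.
```python
import numpy as np

def leaky_relu_forward(x: np.ndarray, alpha: float = 0.3) -> np.ndarray:
    # keep positive values, scale negative values by alpha
    return np.where(x > 0, x, alpha * x)

def leaky_relu_input_gradient(x: np.ndarray, alpha: float = 0.3) -> np.ndarray:
    # slope 1 for positive inputs, alpha for negative inputs, 0 at exactly 0
    return np.where(x > 0, 1.0, np.where(x < 0, alpha, 0.0))
```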
:::info
__Task 5.2 [Sigmoid]:__ Complete the `forward` and `get_input_gradients` methods for `Sigmoid`.
:::
:::info
__Task 5.3 [Softmax]:__ Write the forward pass and gradient computation w.r.t inputs for Softmax.
:::
:::success
__Hints:__
You should use stable softmax to prevent overflow and underflow issues. Details in the stencil.
Combining `np.outer` and `np.fill_diagonal` will significantly clean up the gradient computation.
When you first try to compute the gradient it will become apparent that the input gradients are tricky. This [medium article](https://medium.com/towards-data-science/derivative-of-the-softmax-function-and-the-categorical-cross-entropy-loss-ffceefc081d1) has a fantastic derivation that will make your life a lot easier.
:::
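To illustrate the hints, here is a sketch of a stable row-wise softmax and a per-sample Jacobian built with `np.outer` and `np.fill_diagonal`. These are standalone functions for illustration; the stencil wraps this logic in the `Softmax` class.
```python
import numpy as np

def stable_softmax(x: np.ndarray) -> np.ndarray:
    """Row-wise softmax; subtracting the row max prevents overflow in exp."""
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

def softmax_jacobian(s: np.ndarray) -> np.ndarray:
    """Jacobian for one softmax output vector s: diag(s) - s s^T."""
    jacobian = -np.outer(s, s)               # off-diagonal entries: -s_i * s_j
    np.fill_diagonal(jacobian, s * (1 - s))  # diagonal entries: s_i * (1 - s_i)
    return jacobian
```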
## 6. beras/losses.py
In this section we need to construct our loss functions for the assignment: `MeanSquaredError` and `CategoricalCrossEntropy`. You should note that for most classification tasks we use `CategoricalCrossEntropy` by default, but for this assignment we will use both and compare the results.
:::success
__Note:__ You'll notice we construct a `Loss` class that both `MeanSquaredError` and `CategoricalCrossEntropy` inherit from. This is just so that we don't have to specify that our loss functions don't have weights every time we create one.
:::
:::info
__Task 6.1 [MeanSquaredError.forward]:__ Implement the forward pass for `MeanSquaredError`. We want `(y_true - y_pred)**2`, and not the other way around. Don't forget that we expect to take in _batches_ of examples at a time so we will need to take the mean over the batch as well as the mean for each individual example. In short, the output should be the mean of means.
Don't forget that `Tensor` is a subclass of `np.ndarray`, so we can use numpy methods!
[Mean Squared Error](https://www.geeksforgeeks.org/maths/mean-squared-error/)
:::
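Concretely, "mean of means" just reduces over both the per-example axis and the batch axis. A quick NumPy illustration with toy values (not the stencil's exact signature):
```python
import numpy as np

y_true = np.array([[1.0, 0.0], [0.0, 1.0]])   # batch of 2 examples, 2 outputs each
y_pred = np.array([[0.9, 0.1], [0.2, 0.8]])

per_example = np.mean((y_true - y_pred) ** 2, axis=-1)  # mean over each example's outputs
loss = np.mean(per_example)                              # mean over the batch -> one scalar
# equivalent shortcut: np.mean((y_true - y_pred) ** 2)
```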
:::warning
__You should know:__ In general, loss functions should return **exactly 1 scalar value** no matter how many examples are in the batch. We take the mean loss from the batch examples in most practical cases. We will see later in the course that we can use multiple measures of loss at one time to train a model, in which case we often take a weighted sum of each individual loss as our loss value to backpropagate on.
:::
:::info
__Task 6.2 [MeanSquaredError.get_input_gradients]:__ Just as we did for our dense layer, compute the gradient with respect to inputs in MeanSquaredError. It's important to remember that there are two inputs, `y_pred` and `y_true`. Since `y_true` comes from our dataset and does not depend on our params, you should treat it like a constant vector. On the other hand, compute the gradient with respect to `y_pred` exactly as you did in `Dense`. Remember to return them both as a list!
:::
:::success
__Hint:__ If you aren't quite sure how to access your inputs, remember that `MeanSquaredError` is a `Diffable`!
:::
:::info
__Task 6.3 [CategoricalCrossEntropy.forward]:__ Implement the forward pass of `CategoricalCrossEntropy`. Make sure to find the per-sample average of the CCE loss! You may run into trouble with values very close to 0 or 1; you may find `np.clip` of use...
Here is some helpful reading on understanding categorical cross-entropy loss [Categorical Cross Entropy Loss](https://www.geeksforgeeks.org/deep-learning/categorical-cross-entropy-in-multi-class-classification/)
:::
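A minimal sketch of the computation described above, assuming `y_true` is one-hot encoded (a standalone function for illustration, not the stencil's class):
```python
import numpy as np

def categorical_cross_entropy(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    """Per-sample CCE, averaged over the batch."""
    eps = 1e-12
    y_pred = np.clip(y_pred, eps, 1.0 - eps)                # avoid log(0)
    per_sample = -np.sum(y_true * np.log(y_pred), axis=-1)  # one loss value per example
    return float(np.mean(per_sample))                        # one scalar for the batch
```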
:::info
__Task 6.4 [CategoricalCrossEntropy.get_input_gradients]:__ Get input gradients for `CategoricalCrossEntropy`.
:::
## 7. beras/metrics.py
There isn't much to do in this file; you just need to implement the forward method for `CategoricalAccuracy`.
:::info
__Task 7.1 [CategoricalAccuracy.forward]:__ Fill in the `forward` method. Note that our input `probs` represents the probability of each class as predicted by the model, and `labels` is a one hot encoded vector representing the true class. Run `test_beras.py` to test your activations, losses, and metrics!
:::
:::success
__Hint:__ It may be helpful to also think of the labels as a probability distribution, where the probability of the true class is 1 and that of all other classes is 0.
If the index of the max value in both vectors is the same, then our model has made the correct classification.
:::
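In NumPy, that argmax comparison is only a couple of lines; here is a standalone sketch (the stencil wraps this in `CategoricalAccuracy.forward`):
```python
import numpy as np

def categorical_accuracy(probs: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of samples where the predicted class matches the one-hot label."""
    predicted = np.argmax(probs, axis=-1)
    actual = np.argmax(labels, axis=-1)
    return float(np.mean(predicted == actual))
```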
:::warning
__Task 7.2 [Testing]:__ You should now be able to run all of the provided tests in `test_beras.py`. In order to run all of the tests, you should first make sure you are in the root directory for the assignment. Then run the following command
```bash
python run_tests.py --category beras # runs all tests in tests/test_beras.py
python tests/test_beras.py --all # this runs the same tests
```
*Note: These tests do not entirely guarantee your implementation is perfect, but if you pass them, you should be on the right track!*
:::
## 8. beras/optimizer.py
In `beras/optimizer.py` there are 3 optimizers we'd like you to implement: a `BasicOptimizer`, `RMSProp`, and `Adam`. In practice, `Adam` is tough to beat, so more often than not you will default to using `Adam`.
Each has an `__init__` and `apply_gradients`. We give you the `__init__` for each optimizer, which contains all the hyperparams and variables you will need for each algorithm. Then, in `apply_gradients`, you will write the algorithm for each method to update the `trainable_params` according to the given `grads`. Both `trainable_params` and `grads` are lists with
$$\text{grad}[i] = \frac{\partial \mathcal{L}}{\partial\, \text{trainable\_params}[i]}$$
where $\mathcal{L}$ is the Loss of the network.
:::success
**Hint:** In Assignment 2, you wrote all these optimization algorithms already. Feel free to reuse your code for these tasks.
:::
:::warning
__Warning:__ As in Assignment 2, make sure to use `param.assign` when updating the weights!
:::
:::info
__Task 8.1 [BasicOptimizer]:__ Write the `apply_gradients` method for the `BasicOptimizer`.
For any given `trainable_param`, $w[i]$, and `learning_rate`, $r$, the optimization formula is given by
$$w[i] = w[i] - \frac{\partial \mathcal{L}}{\partial w[i]}*r$$
:::
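As a sketch of how that update looks with `param.assign` (the function form and parameter names below are assumptions; the stencil stores `learning_rate` on the optimizer instance):
```python
def apply_gradients_basic(trainable_params, grads, learning_rate):
    """Hypothetical sketch of vanilla gradient descent using .assign."""
    for param, grad in zip(trainable_params, grads):
        param.assign(param - learning_rate * grad)  # w[i] = w[i] - r * dL/dw[i]
```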
:::info
__Task 8.2 [RMSProp]:__ Write the `apply_gradients` method for the `RMSProp`.
In `RMSProp` there are two new hyperparams, $\beta$ and $\epsilon$.
$\beta$ is referred to as the __decay rate__ and typically defaults to .9. This decay rate has the effect of _lowering the learning rate as the model trains_. Intuitively, as our loss decreases we are closer to a minimum and should take smaller steps towards optimization to ensure we don't optimize past the minimum.
$\epsilon$ is a small constant to prevent division by 0.
In addition to our hyperparams there is another term, which we will call __v__, that acts as the moving average of the squared gradients __for each param__. We update this value in addition to the `trainable_params` every time we apply the gradients.
For any given `trainable_param` $w[i]$ and `learning_rate` $r$, the update is defined by
$$v[i] = \beta*v[i] + (1-\beta)*\left(\frac{\partial \mathcal{L}}{\partial w[i]}\right)^2$$
$$w[i] = w[i] - \frac{r}{\sqrt{v[i]} + \epsilon}*\frac{\partial \mathcal{L}}{\partial w[i]}$$
**Hint**: In our stencil code, we provide **v** as a dictionary which maps a key to a float. Keep in mind that we only need to store a single **v** value for each weight!
:::
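A hedged sketch of one RMSProp step; here **v** is keyed by the parameter's index, which is an assumption about how the stencil's dictionary is keyed:
```python
import numpy as np

def apply_gradients_rmsprop(trainable_params, grads, v, learning_rate, beta=0.9, epsilon=1e-7):
    """Hypothetical sketch: v accumulates the moving average of squared gradients per param."""
    for i, (param, grad) in enumerate(zip(trainable_params, grads)):
        v[i] = beta * v.get(i, 0.0) + (1 - beta) * np.square(grad)
        param.assign(param - learning_rate * grad / (np.sqrt(v[i]) + epsilon))
```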
:::info
__Task 8.3 [Adam]:__ Write the `apply_gradients` method for `Adam`.
At its core, Adam is similar to `RMSProp`, but it has more smoothing terms and computes an additional _momentum_ term to further balance the learning rate as we train. This momentum term has its own decay rate, $\beta_1$. Additionally, `Adam` keeps track of the number of optimization steps performed to further tweak the effective learning rate.
Here is what an optimization step with `Adam` looks like for `trainable_param` $w[i]$ and `learning_rate` $r$.
$$m[i] = m[i]*\beta_1 + (1-\beta_1)*\left(\frac{\partial \mathcal{L}}{\partial w[i]}\right)$$
$$v[i] = v[i]*\beta_2 + (1-\beta_2)*\left(\frac{\partial \mathcal{L}}{\partial w[i]}\right)^2$$
$$\hat{m} = m[i]/(1-\beta_1^t)$$
$$\hat{v} = v[i]/(1-\beta_2^t)$$
$$w[i] = w[i] - \frac{r*\hat{m}}{\sqrt{\hat{v}}+\epsilon}$$
Note: Don't forget to __increment the time step once__ each time `apply_gradients` is called!
:::
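And a corresponding sketch for Adam; again, the dictionary keys, the default hyperparameters, and where the time step lives are assumptions about the stencil:
```python
import numpy as np

def apply_gradients_adam(trainable_params, grads, m, v, t, learning_rate,
                         beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """Hypothetical sketch of one Adam step; returns the updated time step."""
    t += 1  # increment the step count once per call
    for i, (param, grad) in enumerate(zip(trainable_params, grads)):
        m[i] = beta_1 * m.get(i, 0.0) + (1 - beta_1) * grad
        v[i] = beta_2 * v.get(i, 0.0) + (1 - beta_2) * np.square(grad)
        m_hat = m[i] / (1 - beta_1 ** t)   # bias correction for the momentum term
        v_hat = v[i] / (1 - beta_2 ** t)   # bias correction for the moving average
        param.assign(param - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon))
    return t
```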
:::success
__Hint:__ Don't overcomplicate this section; it really is as simple as programming the algorithms as they are written.
:::
## 9. beras/gradient_tape.py
In `beras/gradient_tape.py` you are going to implement your very own context manager, `GradientTape`, which is extremely similar to the one actually used in Keras. We give you `__init__`, `__enter__`, and `__exit__`; your job is to implement `GradientTape.gradient`.
:::danger
__Warning:__ This section has historically been difficult for students. It's helpful to carefully consider our hints and the conceptual ideas behind `gradient` __before__ beginning your implementation.
You should refer to the companion sheet to get a detailed explanation of GradientTape and the helper functions. This is available below:
{%preview https://hackmd.io/@dlf25/Hy4d-mRFgg %}
If you get stuck, feel free to come back to this section later on.
:::
:::info
__Task 9.1 [GradientTape.gradient]:__ Implement the `gradient` method to compute the gradient of the loss with respect to each of the trainable params. The output of this method should be a list of gradients, one for each of the trainable params.
:::
:::success
__Hint:__ Closely read through the `compose_input_gradients` and `compose_weight_gradients` methods from `Diffable` in `core.py`; you'll utilize these methods to collect the gradients at every step.
Keep in mind that you can call `id(<tensor_variable>)` to get the object id of a tensor object!
:::
:::warning
__You should know:__ When you finish this section you will have written a highly generalized gradient method that could handle an arbitrary network. This method will also function almost exactly how Keras implements the `gradient` method. This is a very powerful method but it is just one way to implement autograd.
:::
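To see the general idea independent of the stencil, here is a tiny, self-contained reverse-mode example: each operation records its output id, its input ids, and local-gradient rules on a tape, and the backward pass walks the tape in reverse, composing upstream gradients with local ones. The stencil's `GradientTape` is structured differently (it relies on `Diffable`'s compose helpers), but the flow of gradients is the same.
```python
import numpy as np

tape = []  # each entry: (output id, [(input id, local-gradient function), ...])

def mul(a, b):
    out = a * b
    tape.append((id(out), [(id(a), lambda up: up * b), (id(b), lambda up: up * a)]))
    return out

def add(a, b):
    out = a + b
    tape.append((id(out), [(id(a), lambda up: up), (id(b), lambda up: up)]))
    return out

# Forward pass: f = (x * w) + b, keeping a handle on the intermediate result
x, w, b = np.array(2.0), np.array(3.0), np.array(1.0)
h = mul(x, w)
f = add(h, b)

# Backward pass: walk the tape in reverse, accumulating upstream * local gradients
grads = {id(f): np.array(1.0)}  # df/df = 1
for out_id, input_rules in reversed(tape):
    upstream = grads.get(out_id)
    if upstream is None:
        continue
    for in_id, local_grad in input_rules:
        grads[in_id] = grads.get(in_id, 0.0) + local_grad(upstream)

print(grads[id(w)])  # 2.0, since df/dw = x
print(grads[id(x)])  # 3.0, since df/dx = w
```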
:::warning
__Task 9.2 [TESTING]:__ We have written some tests for you in `tests/test_gradient.py` to determine whether your gradient tape is working. You can run them as
```bash
python run_tests.py --category gradient # runs gradient tests
python tests/test_gradient.py --all # same output, alternate command
```
***Note:** These tests do not entirely guarantee your implementation is perfect, but if you pass them, you should be on the right track! **You are encouraged to write more tests in this file, but make sure they begin with `test_`!***
:::
## 10. beras/model.py
In `beras/model.py` we are going to construct a general `Model` abstract class that we will use to define our `SequentialModel`. The `SequentialModel` simply calls all of its layers in order for the forward pass.
:::warning
__You should know:__ At first it may seem like all neural nets would be `SequentialModel`s but there are some architectures like __ResNets__ that break the sequential assumption.
:::
:::info
__Task 10.1 [Model.weights]:__ Construct a list of all weights in the model and return it.
:::
:::success
We give you `Model.compile`, which just sets the optimizer, loss and accuracy attributes in the model. In Keras, compile is a huge method that prepares these components to make them hyper-efficient. That implementation is highly technical and outside the scope of the course but feel free to look into it if you are interested.
:::
:::info
__Task 10.2 [Model.fit]:__ This method should train the model for the given number of `epochs` on the data `x` and `y`, using the given `batch_size`. Importantly, you want to make sure you record the metrics throughout training and print stats out during training so that you can watch the metrics as the model trains.
You can use the `print_stats` and `update_metric_dict` functions provided. Note that neither of these methods return any values, `print_stats` prints out the values directly and `update_metric_dict(super_dict, sub_dict)` updates `super_dict` with the mean metrics from `sub_dict`.
__Note:__ You do __not__ need to call the model here, you should instead use `self.batch_step(...)` which all child classes of `Model` will implement.
:::
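As a rough outline of the control flow (the helper argument order and where `update_metric_dict`/`print_stats` live are assumptions based on this handout, not the stencil's exact signatures):
```python
def fit_sketch(model, x, y, epochs, batch_size):
    """Hypothetical outline of Model.fit: batch the data, call batch_step, track metrics."""
    for epoch in range(epochs):
        epoch_metrics = {}                     # aggregated metrics for this epoch
        num_batches = len(x) // batch_size
        for b in range(num_batches):
            x_batch = x[b * batch_size : (b + 1) * batch_size]
            y_batch = y[b * batch_size : (b + 1) * batch_size]
            batch_metrics = model.batch_step(x_batch, y_batch, training=True)
            update_metric_dict(epoch_metrics, batch_metrics)   # provided helper (per this handout)
            print_stats(batch_metrics, b, num_batches, epoch)  # provided helper; argument order assumed
```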
:::info
__Task 10.3 [Model.evaluate]:__ This method should look _very similar_ to `Model.fit` except we need to ensure the model does not train on the testing data. Additionally, we will test on the entirety of the test set one time, so there is no need for the epochs parameter from `Model.fit`.
:::
:::info
__Task 10.4 [SequentialModel.forward]:__ This method passes the input through each layer in `self.layers` sequentially.
:::
:::info
__Task 10.5 [SequentialModel.batch_step]:__ This method makes a model prediction and computes the loss __inside of GradientTape__, just like you did in HW2. Be sure to use the `training` argument so that the weights of the model are only updated when `training` is True. This method should return the _loss and accuracy_ for the batch in a dictionary.
**Note**: you should have used the implicit `__call__` function for the layers in the `forward` method to ensure that each layer gets tracked by the GradientTape (i.e. `layer(x)`). However, because we don't define how to calculate gradients for a SequentialModel itself, make sure to use the SequentialModel's `forward` method in `batch_step`.
:::
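A heavily hedged sketch of that pattern; every attribute name below (the compiled loss/accuracy attributes, the `tape.gradient` and `apply_gradients` argument orders) is an assumption, so check the stencil for the real names:
```python
def batch_step_sketch(self, x, y, training=True):
    """Hypothetical sketch of SequentialModel.batch_step."""
    with GradientTape() as tape:
        y_pred = self.forward(x)                 # use forward here, per the note above
        loss = self.compiled_loss(y_pred, y)     # attribute name assumed from compile
    if training:
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(self.trainable_variables, grads)  # argument order assumed
    acc = self.compiled_acc(y_pred, y)           # attribute name assumed from compile
    return {"loss": loss, "acc": acc}
```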
## 11. `assignment.py`
Here you put it all together!
:::info
__Task 11.1 [Create, Train and Test your model]:__ You'll find 4 mostly empty functions in `assignment.py`: `get_model`, `get_optimizer`, `get_loss_fn`, and `get_acc_fn`. You should fill these out to create your model however you'd like; you may have to play with different Dense layers, activations, optimizers, etc. to get better accuracies.
Once you have those 4 methods filled out, you can fill out the `__main__` block to train the model. The steps are outlined in the stencil.
You are looking for an accuracy >= 95% within 10 epochs consistently. **Once you are ready**, you can submit your code to the autograder with an additional `FINAL.txt` file which tells the autograder that you'd like to train and test for score on the autograder. __The autograder will use your `get_model`, `get_loss_fn` and `get_optimizer` to initialize your model on gradescope.__ This will take some time on gradescope so expect __~10 minutes__ waiting time. That said, if you are consistently able to get up to accuracy locally and are passing all other tests you should have no trouble on gradescope.
:::
:::success
__Hint:__ You likely don't need as large a network as you might expect; the right network parameters, activations, and optimizer should be able to get up to accuracy within 3-5 epochs with only a couple of layers. If you find that your training is taking a long time or that you need many large layers to get good accuracy, you can probably tweak the network in small ways to improve it more quickly.
If you are having trouble reaching accuracy, go to office hours and talk to the TAs about strategies for changing out your hyperparameters.
:::
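For orientation, here is the shape such a setup might take; every constructor argument and keyword below (layer sizes, `initializer`, `learning_rate`) is an assumption about the stencil's signatures, so adapt it and tune the hyperparameters yourself:
```python
from beras.activations import LeakyReLU, Softmax
from beras.layers import Dense
from beras.losses import CategoricalCrossEntropy
from beras.metrics import CategoricalAccuracy
from beras.model import SequentialModel
from beras.optimizer import Adam

def get_model():
    """One possible (hypothetical) architecture for MNIST."""
    return SequentialModel([
        Dense(784, 128, initializer="kaiming"),   # constructor arguments are assumptions
        LeakyReLU(),
        Dense(128, 10, initializer="kaiming"),
        Softmax(),
    ])

def get_optimizer():
    return Adam(learning_rate=0.001)              # keyword name and value are assumptions

def get_loss_fn():
    return CategoricalCrossEntropy()

def get_acc_fn():
    return CategoricalAccuracy()
```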
:::warning
__Task 11.2 [TESTING]:__ We have written some tests for you to determine whether your model and assignment are set up properly. You can run them as
```bash
python run_tests.py --category assignment # runs assignment/model tests
python tests/test_assignment.py --all # same output, alternate command
```
***Note:** These tests do not entirely guarantee your implementation is perfect, but if you pass them, you should be on the right track! **You are encouraged to write more tests in this file, but make sure they begin with `test_`!***
:::
:::warning
__Task 11.3 [TESTING]:__ If you have passed all of your previous tests and have implemented all of the code, you can run the command
```bash
python run_tests.py [--v] # runs all of the tests!
```
- `--v` prints out a more verbose output from the test runner
This will run all of the tests within the testing files that begin with the `test_` prefix.
:::
## Submission
:::danger
**[REMINDER]** After 9/25/2025, you are limited to **15 submissions** on Gradescope! Before then, you have unlimited submissions! Start early :)
:::
### Requirements
Once you've completed your model and are training locally, be sure to include a blank `final.txt` so that the autograder trains and tests your model for accuracy!
### Grading
Your code will be primarily graded on functionality, as determined by the Gradescope autograder.
:::warning
You will not receive any credit for functions that use `tensorflow`, `keras`, `torch`, or `scikit-learn` functions within them. You must implement all functions manually using either vanilla Python or NumPy. (This does not apply to the testing files!)
:::
### Handing In
You should submit the assignment via Gradescope under the corresponding project assignment through Github or by submitting all files individually.
To submit via Github, commit and push all of your changes to your repository to GitHub. You can do this by running the following commands.
```bash
git commit -am "commit message"
git push
```
For those of y'all who are already familiar with `git`: the `-am` flag to `git commit` is a pretty cool shortcut which adds all of our modified files and commits them with a commit message.
**Note:** We highly recommend committing your files to git and syncing with your GitHub repository **often** throughout the course of the assignment to ensure none of your hard work is **lost**!