---
tags: hw1, handout
---

# HW1 Programming: Beras Pt. 1

:::info
Conceptual questions due **Friday, February 9, 2024 at 6:00 PM EST**

Programming assignment due **Monday, February 12, 2024 at 6:00 PM EST**
:::

## Theme

![](https://daily.jstor.org/wp-content/uploads/2017/06/barracuda_fish_school_1050x700.jpg)

*Deep under the sea, a school of fish has started learning computer science and wants to make a deep learning model! Because they don't have `conda` or `pip`, they're making their own AI framework, and need your help!*

<sub>*P.S. the puns are intended.*</sub>

## Assignment Overview

In this assignment you will begin constructing a basic Keras mimic, 🐻 Beras 🐻 (haha, funny name).

### Assignment Goals

1. Implement a simple _linear regression_ model that mimics the Tensorflow/Keras API.
    - Implement a **fully-connected linear (dense) layer** with weights and biases
    - Implement a basic objective (loss) function for regression such as **MSE**
    - Implement basic regression **accuracy metrics**
    - **Learn** optimal weight and bias parameters using **gradient descent** and **backpropagation**
2. Apply this model to predict the progression of diabetes one year from baseline using [scikit-learn's diabetes dataset](https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset).

> **Note:** HW2 will use a lot of *your* code from HW1. Therefore, it is highly recommended to complete HW1 in its entirety in a timely fashion to prevent any delays in completing HW2.

## Getting Started

### Stencil

<!--LINK THE REFERENCES-->
Please click [here](https://classroom.github.com/a/wOLjfowh) to get the stencil code. Reference this [guide](https://hackmd.io/gGOpcqoeTx-BOvLXQWRgQg) for more information about GitHub and GitHub Classroom.

> **Warning:** **Do not change the stencil except where specified.** You are welcome to write your own helper functions; however, changing the stencil's method signatures or removing pre-defined functions may result in an incompatibility with our autograder and a low overall grade.

### Environment

You will need to use the virtual environment that you made in Homework 0. You can activate the environment with the command `conda activate csci1470`. If you have any issues running the stencil code, be sure that your conda environment contains at least the following packages:

- `python==3.10`
- `numpy`
- `tensorflow`
- `scikit-learn` (imported as `sklearn` in actual Python programs!)
- `pytest`

On a Windows conda prompt or Mac terminal, you can check whether a package is installed with:

```bash
conda list -n csci1470 <package_name>
```

On Unix systems, you can check whether a package is installed with:

```bash
conda list -n csci1470 | grep <package_name>
```

> **Note:** Be sure to read this handout in its **entirety before** moving on to implementing **any** part of the assignment!

## Deep Learning Libraries

Deep learning is a very complicated and mathematically rich subject. However, when building models, most of these nuances can be abstracted away from the programmer through the use of deep learning libraries. There are a couple of popular choices of deep learning library in industry and academia, each with its own opinions on how models should be structured and how the building blocks interact with each other. Two of the most popular libraries for Python are **Tensorflow** and **PyTorch**.

- TensorFlow is a linear algebra and auto-differentiation library.
  Keras is a submodule of Tensorflow which focuses on constructing deep learning objects and their interactions with one another to build complex systems. For example:
    - `tensorflow.keras.layers.Dense` is the typical fully-connected, linear layer with weights and bias
    - `tensorflow.keras.losses.MSE` is the differentiable mean-squared error loss layer
- PyTorch is also a linear algebra and auto-differentiation library. The `nn` (neural network) submodule is PyTorch's analogue to Keras and is the location of PyTorch's deep learning objects. For example:
    - `torch.nn.Linear` is the typical fully-connected, linear layer with weight and bias
    - `torch.nn.MSELoss` is the differentiable mean-squared error loss layer

In general, Tensorflow and Keras simplify life by providing you a lot of building blocks and forcing you to use them. PyTorch, on the other hand, may require you to build a couple more things by hand but also makes it easier to interact with lower-level processes. In this class, to keep things simple, we will **focus on using Tensorflow**. However, do not fret! You'll have an opportunity to take PyTorch out for a spin for your final project if you so choose, or consider looking into other Brown CS courses that use PyTorch such as Machine Learning (CSCI1420) or Computational Linguistics (CSCI1460).

Please keep in mind that you are *not* allowed to use *any* Tensorflow, Keras, or PyTorch functions throughout HW1 and HW2. We use these packages to verify functionality, and if you are found using these libraries for functionality you will be docked points.

## Roadmap

<!-- TODO: Finish writing this section -->

Don't worry if these tasks seem daunting at first glance! We've included a lot more info down below on specific implementation details.

1. Start with **`preprocess.py`** to prepare the diabetes dataset for testing [more info](#1-Preprocessing)
    * Split our data into training and testing sets
2. Write the call method for the `Dense` layer in **`layers.py`** [more info](#2-Dense-Layer)
    * Implement the forward pass and return the outputs
3. Work through the methods for Mean Squared Error (MSE) in **`losses.py`** [more info](#3-MSE)
4. Implement the gradient method in **`gradient_tape.py`** [more info](#4-Gradient-Tape)
    * Differentiate the loss with respect to your `Dense` layer's weights!
    * I.e., given some loss Tensor `target`, find the gradient for each weight in `sources`
5. Write a basic optimizer in **`optimizers.py`** [more info](#5-Basic-Optimizer)
    * Apply the gradients to their respective weights
6. Fill out **`model.py`** and **`assignment.py`** [more info](#6-Model-and-Assignment)
    * Make your model class
    * Fill out the hyperparameters for training your model
7. Train and test your model! [more info](#7-Training-and-Testing)
    * Run `assignment.py`, observe, and tune!
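To keep the big picture in mind while you work through these steps, here is a rough sketch of what a single training step looks like once all of the pieces exist. This is *not* the stencil's exact code: the import path, `model.weights`, and `optimizer.apply_gradients` are names assumed purely for illustration (they mirror the Keras-style API described above), so check the stencil and the `core.py` cheat sheet for the real signatures.

```python
# Rough sketch of one training step, assuming Beras mirrors the Keras-style names above.
# `model.weights` and `optimizer.apply_gradients` are illustrative assumptions; the
# stencil's actual attribute and method names may differ.
from beras.gradient_tape import GradientTape

def train_step(model, loss_fn, optimizer, x_batch, y_batch):
    with GradientTape() as tape:             # record every Diffable op that runs
        y_pred = model(x_batch)              # forward pass through the Dense layer (step 2)
        loss = loss_fn(y_pred, y_batch)      # MSE between predictions and labels (step 3)
    grads = tape.gradient(loss, model.weights)       # backpropagate the loss to each weight (step 4)
    optimizer.apply_gradients(model.weights, grads)  # nudge the weights downhill (step 5)
    return loss
```

In the actual assignment, the already-written `fit()` method in `model.py` plays roughly this role for you; your job is to supply the pieces it relies on.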
### 1. Preprocessing

If you look at the main function of `assignment.py`, you can see that we've imported our dataset:

* `from sklearn.datasets import load_diabetes`

In the [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html#sklearn.datasets.load_diabetes), we can see that the diabetes dataset looks like this:

> | Samples total  | 442                |
> | -------------- | ------------------ |
> | Dimensionality | 10                 |
> | Features       | real, -.2 < x < .2 |
> | Targets        | integer 25 - 346   |

We extract our inputs ("features") and outputs ("targets") with the following line in our main function:

```python
X, Y = load_diabetes(return_X_y=True)
```

We now have inputs `X` with shape `(442, 10)`, where each of the `442` samples has `10` different features, as well as outputs `Y` (or labels, or targets) with shape `(442,)`.

However, we can't just toss all of this data at our model! If we train on all `442` samples and then test on the same `442` samples, the model might just give memorized results: we have no clue whether it actually learned any statistical relationship between the input features and the output. This learning is just too shallow.

So, you need to split up `X, Y` into training and testing sets! This is called for you:

```python
train_inputs, test_inputs, train_labels, test_labels = preprocess_data(np.array(X), np.array(Y), 0.8)
```

If you go to `preprocess.py`, you'll see the stencil function:

```python
def preprocess_data(features: np.ndarray, labels: np.ndarray, split_percentage: float) -> tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
```

**Task:** The function takes in features, labels, and a split percentage. First, reshape `labels` to have shape `(num_samples, 1)`. Then, split `features` and `labels` into training and testing sets according to the split percentage. For example, if the split percentage is 0.8, the first 80% of the samples go to the training set and the last 20% go to the testing set.

The function should **return the arrays in the following order**: training features, testing features, training labels, testing labels.

Notice that we're working with two NumPy arrays here: feel free to take a look at some of the NumPy functions we linked in the stencil code and any other functions that might make your life a little easier (maybe getting the length?).

### 2. Dense Layer

Check out the `Dense` class in `beras/layers.py`! There's a decent amount to do here, so let's break it down:

#### Initialization

First, let's look at our `__init__` method. (This is the `Dense` class's constructor method, for those new to Python!) Notice how we initialize the `self.w` and `self.b` instance variables. What might these variables represent?

#### TODO

Here's what you have to fill in:

* `def weights(self) -> list[Tensor]:`
* `def call(self, x: Tensor) -> Tensor:`
* `def get_input_gradients(self) -> list[Tensor]:`
* `def get_weight_gradients(self) -> list[Tensor]:`
* `def _initialize_weight(initializer, input_size, output_size) -> tuple[Variable, Variable]:`

#### `weights`

**Task:** Just return the `Dense` layer's weights!

#### `call`

**Task:** Implement the forward pass of the `Dense` layer.

The parameter `x` represents our input. Remember from class that our Dense layer performs the following to get its output:

$$
f(\bf{x}) = \bf{W}\bf{x} + \bf{b}
$$

Keep in mind that `x` has shape `(num_samples, input_size)`, where `input_size=10`. What shape should our predicted output have? What shape do the labels already have?
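If the shapes feel slippery, here is a tiny plain-NumPy experiment (the sizes below are made up purely for illustration) showing how a batch of inputs, a weight matrix, and a bias vector broadcast together:

```python
import numpy as np

num_samples, input_size, output_size = 4, 10, 1
x = np.random.rand(num_samples, input_size)   # a batch of inputs, shape (4, 10)
W = np.zeros((input_size, output_size))       # weight matrix, shape (10, 1)
b = np.zeros(output_size)                     # one bias per output unit, shape (1,)

out = x @ W + b        # the bias broadcasts across the batch dimension
print(out.shape)       # (4, 1): one prediction per sample
```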
(It might be easier if you change the order of some things :) )

#### `get_input_gradients`

**Task:** Calculate the gradients of our Dense layer's forward pass with respect to our inputs, *i.e.* $\frac{\partial f}{\partial x}$.

The answer is quite intuitive, but if you need some convincing, it should fall out if you expand the matrix multiplication and take the partial derivative with respect to each entry of $x$!

#### `get_weight_gradients`

**Task:** Similar to `get_input_gradients`: calculate the gradients of our Dense layer's forward pass with respect to our weights ($\bf{W}, \bf{b}$), *i.e.* $\frac{\partial f}{\partial W}, \frac{\partial f}{\partial b}$.

The calculation for this is a little more involved. You'll probably notice that you need the input to calculate the gradient. However, if you take a glance at the class definition (`class Dense(Diffable):`), you can see that `Dense` subclasses `Diffable`. What instance variable might help us out here?

**Hint:** Check out [the `core.py` cheat sheet](https://hackmd.io/ZQvXlumeR2m0lEnGd8uHnA) for more details on the `Diffable` class.

#### `_initialize_weight`

**Task:** Initialize the dense layer's weight values. By default, initialize all the weights to zero (usually a bad idea). You are also required to support the following more sophisticated options.

Tip: Think carefully about the shapes of the weight and bias matrices before you implement this!

- **Normal:** Passing `normal` causes the weights to be initialized with a unit normal distribution. You may want to look for a NumPy function to help you with this!
- **Xavier Normal:** Passing `xavier` causes the weights to be initialized in the same way as [`keras.GlorotNormal`](https://www.tensorflow.org/api_docs/python/tf/keras/initializers/GlorotNormal).
- **Kaiming He Normal:** Passing `kaiming` causes the weights to be initialized in the same way as [`keras.HeNormal`](https://www.tensorflow.org/api_docs/python/tf/keras/initializers/HeNormal).

If you have never seen Xavier and Kaiming distributions, don't worry, that is totally understandable! They are just normal distributions with some modifications to the standard deviation based on the input size. We recommend taking a look at the Keras documentation linked above for each initializer.

Regardless of the weight initialization method, make sure to always initialize the bias $b$ to zeros, and only initialize the weights $W$ according to the selected method.

**Note:** The stencil also mentions `Xavier/Kaiming Uniform` initializations, but these are **not required for this assignment**.

### 3. MSE

Next up: `beras/losses.py`.

Tip: Notice that each class in `losses.py` inherits from the `Diffable` class. Once again, if you are confused about certain methods, you can check out the [core cheat sheet](https://hackmd.io/ZQvXlumeR2m0lEnGd8uHnA) to learn about `Diffable`!

Here's what you have to do:

* `call`
    * Calculate MSE
* `get_input_gradients`
    * Calculate gradients with respect to the inputs of the MSE function

Notice we don't have any `get_weight_gradients`. Why might this be?

#### `call`

**Task:** Given two Tensors `y_pred` and `y_true`, calculate the Mean Squared Error between the two Tensors.

**Hint:** Notice that a `Tensor` is just an `np.ndarray` with some extra add-ons, meaning we can use NumPy functions on them! Make sure to return a `Tensor` as your MSE value.

#### `get_input_gradients`

**Task:** Calculate gradients with respect to the inputs of MSE. What are the inputs? How can we access them?

**Hint:** Notice that `class MeanSquaredError(Diffable):` is also a `Diffable`.
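As a reminder, mean squared error is just the average of the squared differences between predictions and labels, $\frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2$. Here is a quick plain-NumPy check with made-up numbers that you can compare your `call` output against:

```python
import numpy as np

y_pred = np.array([[2.0], [4.0], [6.0]])
y_true = np.array([[1.0], [5.0], [7.0]])

# Average of the squared differences: (1 + 1 + 1) / 3 = 1.0
reference_mse = np.mean((y_pred - y_true) ** 2)
print(reference_mse)  # 1.0
```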
### 4. Gradient Tape

Now, we know how to take our inputs, make predictions from them with our Dense layer, calculate the loss between our predictions and the true labels for these samples, and calculate all the gradients along the way. Think back to the computation graph: if we're tuning our weights to minimize the loss, how much should each weight change by?

If you take a glance at `beras/gradient_tape.py`, you'll notice a couple of things. First, inside `__init__`, notice the `self.previous_layers` instance variable. Every `Diffable` layer (or operation) that runs while a `GradientTape` is watching records the id of its output, along with the operation that occurred, to `self.previous_layers`.

Next, we have `__enter__` and `__exit__`. Remember back to lab when we had something like:

```python
with tf.GradientTape() as tape:  # <- this "enters" the GradientTape!
    # do some diffable things
    # do some more diffable things

# do some non-diffable things  <- leaving the indentation block "exits" the GradientTape
```

So, here's what you have to do:

#### `def gradient(self, target: Tensor, sources: list[Tensor]) -> list[Tensor]:`

**Task:** Implement the `gradient` method of the `GradientTape` in `beras/gradient_tape.py`, which returns a list of gradients corresponding to the list of trainable weights in the network. Note that the `sources` parameter should be the list of trainable weights in the network.

**Note:** For HW2, you will be expected to expand upon this code to generalize to an arbitrary model. In HW1, however, we only need to backpropagate through the fixed architecture of a Dense layer followed by the MSE loss. That is to say, our architecture can be summarized as $\mathcal{L}_\text{MSE}(f(x), y_\text{true})$, and we can hard-code our gradient backpropagation through these set layers.

**Hints:** Here are some hints to help you get started:

* What are the trainable weights of our model?
* Given the MSE loss `Tensor` passed in as the `target` parameter, how can we access the MSE loss layer?
* Now that we have our MSE loss layer, what properties and/or methods from `Diffable` might help us out?
* Check out the [`core.py` cheat sheet](https://hackmd.io/ZQvXlumeR2m0lEnGd8uHnA) for a detailed rundown!
* You might find the `zip` function helpful for iterating through two lists in parallel! Don't feel like you have to use it, but it might be interesting to look at!

### 5. Basic Optimizer

**Task:** In the `beras/optimizers.py` file, implement the optimization step for each of the different types of optimizers. Refer back to **Lab 1** if you need a refresher.

- `BasicOptimizer`: A simple optimizer strategy as seen in Lab 1. You'll see more soon in **Lab 2** and **HW2**!
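As a refresher from Lab 1, plain gradient descent nudges each trainable weight $w$ a small step against its gradient, scaled by the learning rate $\eta$:

$$
w \leftarrow w - \eta \frac{\partial \mathcal{L}}{\partial w}
$$

If your `BasicOptimizer` follows the Lab 1 strategy, applying this single update to every (weight, gradient) pair it is given is all there is to it.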
### 6. Model and Assignment

This is the final bit! You've finished implementing all the different components necessary for a deep learning model, and now all that's left is to learn deeply :-). Most things have already been filled out for you at this point in both `model.py` and `assignment.py`.

Here's the full **TODO** list:

* `model.py`
    * `def evaluate(self, x, y) -> Tensor:`
* `assignment.py`
    * `SequentialModel`'s `def call(self, inputs: beras.Tensor) -> beras.Tensor:`

#### `model.py`

In `beras/model.py` you'll see the `Model` class. We've implemented most of what's important in a model.

First things first, notice the instance variable `self.layers`. This will come into play soon.

Next, take a look at `compile()`. This defines two new instance variables for the model which are essential for training.

Now, read through `fit()`. This method is how we "fit" our model's trainable weights based on predictions from the given inputs and the given labels.

**Task:** Fill out the `def evaluate(self, x, y) -> Tensor:` method. This method should be really similar to our `fit()` method, but with some parts cut out.

#### `assignment.py`

##### `SingleLayerModel`

At the top of `assignment.py`, you'll see the `SingleLayerModel` class.

**Task:** Fill out the `call` function of the `SingleLayerModel`!

##### `get_single_layer_model_components()`

If you scroll down, you'll see stencil code marking where you should initialize and compile your model!

**Task:** Initialize a `SingleLayerModel` with a single `Dense` layer! Hint: Given what we know about the inputs and outputs, what should the dimensions of the Dense layer be?

**Task:** Fill in the optimizer and loss function of your model! You should adjust the learning rate so your model learns reasonably quickly. Given the simplicity of the data, we recommend starting with at least `0.5`.

**Task:** Decide the number of epochs to train for.

### 7. Training and Testing

Congratulations! You've finished all of the coding for HW1! Now, all that's left is to see how well our model does.

**Task:** Go into your terminal, `cd` into the directory of the assignment, and run `assignment.py`! You can do this by calling

```bash
python3 assignment.py
```

Your model should already do pretty well with the components you set up in `get_single_layer_model_components`. However, it might not be quite good enough for submission yet! Go back to `get_single_layer_model_components` and try adjusting the learning rate and number of epochs.

**Task:** Tune your hyperparameters in `get_single_layer_model_components` to achieve the best performance.

### Some Additional Comments

* We have provided testing files (`test_assignment.py`, `test_beras.py`, `test_preprocess.py`) with some sanity checks that you can run to help identify where an issue might be occurring. While these sanity checks don't _guarantee_ that the code being tested is completely correct, it is useful to run them to see if you fail a basic test (in which case you know your code isn't quite right).
* In order to run the tests:
    1. Make sure you have the virtual environment activated
    2. `cd` into the `code` directory
    3. Uncomment the tests you want to run from the `main` method of the test file to be run
    4. Run `python3 <test-filename>` (e.g., if you want to run some of the tests in `test_beras.py`, run `python3 test_beras.py`). If a test fails, you will see an `AssertionError`.

## Submission

### Requirements

<!-- TODO: Finish writing this section -->

- Implement a deep learning framework, and use that framework to build a model that predicts the progression of diabetes over 1 year.
- Run and pass all sanity tests provided
    - **Note**: You will lose points on your final score if you don't pass all of these local tests.
- Achieve a validation loss below 5500 within 200 epochs <!--Our model with hyperparam tuning achieves 5136 with lr .9 and 200 epochs-->
- Include a brief `README.md` file containing your **model's accuracy** and any **known bugs** (🐛)
    - For compatibility with the autograder, please put the `README.md` file within the `code` directory (`code/README.md`)

### Grading

Your code will be primarily graded on functionality, as determined by the Gradescope autograder. Your model should have a validation loss **less than 5500**.
> **Warning:** You will not receive any credit for functions that use `tensorflow`, `keras`, `torch`, or `scikit-learn` functions within them. You must implement all functions manually using either vanilla Python or NumPy.

### Handing In

You should submit the assignment via Gradescope under the corresponding project assignment **through GitHub**. To submit via GitHub, commit and push all of your changes to your GitHub repository. You can do this by running the following commands:

```bash
git commit -am "commit message"
git push
```

For those of y'all who are already familiar with `git`: the `-am` flag to `git commit` is a pretty cool shortcut which stages all of your modified (already-tracked) files and commits them with a commit message.

> **Note:** We highly recommend committing your files to git and syncing with your GitHub repository **often** throughout the course of the assignment to ensure none of your hard work is **lost**!
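One related caveat, since this assignment also asks you to create a new `code/README.md`: `-am` only stages files that git already tracks, so a brand-new file needs an explicit `git add` before it will be included in your commit. For example (the commit message here is just a placeholder):

```bash
git add code/README.md        # stage the newly created file (only needed once)
git commit -am "finish HW1"   # stage modified tracked files and commit everything staged
git push                      # push to GitHub so Gradescope sees your latest code
```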