HW1 Programming: Beras Pt. 1

Conceptual questions due Friday, February 9, 2024 at 6:00 PM EST
Programming assignment due Monday, February 12, 2024 at 6:00 PM EST

Theme

Deep under the sea, a school of fish have started learning computer science, and want to make a deep learning model! Because they don't have conda or pip, they're making their own AI framework, and need your help!

P.S. the puns are intended.

Assignment Overview

In this assignment you will begin constructing a basic Keras mimic, 🐻 Beras 🐻 (haha funny name).

Assignment Goals

  1. Implement a simple linear regression model that mimics the Tensorflow/Keras API.
    • Implement a fully-connected linear (dense) layer with weights and biases
    • Implement a basic objective (loss) function for regression such as MSE
    • Implement basic regression accuracy metrics
    • Learn optimal weight and bias parameters using gradient descent and backpropagation
  2. Apply this model to predict the progression of diabetes one year from baseline using scikit-learn's diabetes dataset.

Note: HW2 will use a lot of your code from HW1. Therefore, it is highly recommended to complete HW1 in its entirety in a timely fashion to prevent any delays in completing HW2.

Getting Started

Stencil

Please click here to get the stencil code. Reference this guide for more information about GitHub and GitHub classroom.

Warning: Do not change the stencil except where specified. You are welcome to write your own helper functions; however, changing the stencil's method signatures or removing pre-defined functions may break compatibility with our autograder and result in a low overall grade.

Environment

You will need to use the virtual environment that you made in Homework 0. You can activate the environment by using the command conda activate csci1470. If you have any issues running the stencil code, be sure that your conda environment contains at least the following packages:

  • python==3.10
  • numpy
  • tensorflow
  • scikit-learn (imported as sklearn in Python code!)
  • pytest

In the Windows conda prompt or the Mac terminal, you can check whether a package is installed with:

conda list -n csci1470 <package_name>

On Unix systems, you can check whether a package is installed with:

conda list -n csci1470 | grep <package_name>

Note: Be sure to read this handout in its entirety before moving onto implementing any part of the assignment!

Deep Learning Libraries

Deep learning is a very complicated and mathematically rich subject. However, when building models, all of these nuances can be abstracted away from the programmer through the use of deep learning libraries.

There are a couple of popular choices for deep learning libraries in industry and academia, each with their own opinions on how models should be structured and how the building blocks interact with each other. Two of the most popular libraries for Python include Tensorflow and PyTorch.

  • TensorFlow is a linear algebra and auto-differentiation library. Keras is a submodule of Tensorflow which focuses on constructing deep learning objects and their interactions with one another to build complex systems. For example:
    • tensorflow.keras.layers.Dense is the typical fully-connected, linear layer with weights and bias
    • tensorflow.keras.losses.MSE is the differentiable mean-squared error loss layer
  • PyTorch is also a linear algebra and auto-differentiation library. The nn (neural network) submodule is PyTorch's analogue to Keras and is the location of PyTorch's deep learning objects. For example:
    • torch.nn.Linear is the typical fully-connected, linear layer with weight and bias
    • torch.nn.MSELoss is the differentiable mean-squared error loss layer

In general, TensorFlow and Keras simplify life by providing you with a lot of building blocks and forcing you to use them. PyTorch, on the other hand, may require you to build a couple more things by hand but also makes it easier to interact with lower-level processes.

In this class to keep things simple, we will focus on using Tensorflow, however, do not fret! You'll have an opportunity to take PyTorch out for a spin if you so choose for your final project, or consider looking into other Brown CS courses that use PyTorch such as Machine Learning (CSCI1420) or Computational Linguistics (CSCI1460).

Please keep in mind you are not allowed to use any TensorFlow, Keras, or PyTorch functions throughout HW1 and HW2. We use these packages only to verify functionality in our tests; if you are found using these libraries in your implementation, you will be docked points.

Roadmap

Don't worry if these tasks seem daunting at first glance! We've included a lot more info down below on specific implementation details.

  1. Start with preprocess.py to prepare the diabetes dataset for testing more info
    • Split our data into training and testing sets
  2. Write the call method for the Dense layer in layers.py more info
    • Implement the forward pass and return the outputs
  3. Work through the methods for Mean Squared Error (MSE) in losses.py more info
  4. Implement the gradient method in gradient_tape.py more info
    • Differentiate the loss to your Dense layer!
    • aka given some loss Tensor target, find all the gradients for each weight in sources
  5. Write a basic optimizer in optimizers.py more info
    • Apply the gradients to their respective weights
  6. Fill out model.py and assignment.py more info
    • Make your model class
    • Fill out the hyperparameters of training your model
  7. Train and Test your model! more info
    • Run assignment.py, observe, and tune!

1. Preprocessing

If you look at the main function of assignment.py, you can see that we've imported our dataset:

  • from sklearn.datasets import load_diabetes

In the documentation, we can see that the diabetes dataset looks like this:

  • Samples total: 442
  • Dimensionality: 10
  • Features: real, -0.2 < x < 0.2
  • Targets: integer, 25-346

We extract our inputs ("features") and outputs ("targets") with the following line in our main function:

X, Y = load_diabetes(return_X_y=True)

We now have inputs X with shape (442, 10), where each of the 442 samples has 10 different features, as well as outputs Y (or labels, or targets) with shape (442,).

However, we can't just toss all of this data at our model! If we train on all 442 samples and then test on all 442 samples, then the model might just give memorized results: we have no clue if it actually learned any statistical relationship between the input features and the output. This learning is just too shallow.

So, you need to split up X,Y into training and testing sets! This is called for you:

train_inputs, test_inputs, train_labels, test_labels = preprocess_data(np.array(X), np.array(Y), 0.8)

If you go to preprocess.py, you'll see the stencil function:

def preprocess_data(features: np.ndarray, labels: np.ndarray, split_percentage: float) -> tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:

Task: The function takes in features, labels, and a split percentage. First, you will need to reshape labels to have shape (num_samples, 1). Then, split features and labels into training and testing sets according to the split percentage.

For example, if the split percentage is 0.8, then the first 80% of values in the list will be for training, and the last 20% will be for testing.

The function should return the lists in the following order: Training features, testing features, training labels, testing labels

Notice that we're working with 2 NumPy arrays here: feel free to take a look at some of the NumPy functions we linked in the stencil code and any other functions that might make your life a little easier (maybe getting the length?).
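
If it helps to see the whole flow at once, here is a minimal sketch of one possible approach, assuming a simple slice with no shuffling (the body is illustrative, not the required implementation):

import numpy as np

def preprocess_data(features, labels, split_percentage):
    # Reshape labels from (num_samples,) to (num_samples, 1)
    labels = labels.reshape(-1, 1)
    # Index at which to cut both arrays into train/test portions
    split_index = int(len(features) * split_percentage)
    train_inputs, test_inputs = features[:split_index], features[split_index:]
    train_labels, test_labels = labels[:split_index], labels[split_index:]
    return train_inputs, test_inputs, train_labels, test_labels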

2. Dense Layer

Check out the Dense class in beras/layers.py! There's a decent amount to do here so let's break it down:

Initialization

First, let's look at our __init__ method. (This is the Dense class's constructor method for those new to Python!)

Notice how we initialize self.w and self.b instance variables. What might these variables represent?

TODO

Here's what you have to fill in:

  • def weights(self) -> list[Tensor]:
  • def call(self, x: Tensor) -> Tensor:
  • def get_input_gradients(self) -> list[Tensor]:
  • def get_weight_gradients(self) -> list[Tensor]:
  • def _initialize_weight(initializer, input_size, output_size) -> tuple[Variable, Variable]:

weights

Task: Just return the Dense layer's weights!

call

Task: Implement the forward pass of the Dense layer

The parameter x represents our input. Remember from class that our Dense layer performs the following to get its output:

$$f(x) = Wx + b$$

Keep in mind that x has shape (num_samples, input_size), where input_size = 10.

What shape should our predicted output have? What shape do the labels already have?

(It might be easier if you change the order of some things :) )
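
To make the shapes concrete, here is a small NumPy demo of the forward pass under the row-major convention hinted at above (the sizes and the bias shape are hypothetical; check the stencil for the exact shapes it expects):

import numpy as np

num_samples, input_size, output_size = 4, 10, 1   # hypothetical sizes
x = np.random.rand(num_samples, input_size)       # batch of inputs
w = np.zeros((input_size, output_size))           # weights
b = np.zeros((1, output_size))                    # bias, broadcast across the batch

outputs = x @ w + b    # shape (num_samples, output_size) -- matches the reshaped labels
print(outputs.shape)   # (4, 1)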

get_input_gradients

Task: Calculate the gradients of our Dense layer's forward pass with respect to our inputs, i.e. $\frac{\partial f}{\partial x}$.

The answer is quite intuitive, but if you need some convincing, it should come out if you expand the matrix multiplication and take the partial with respect to each entry of $x$!
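
If you want that expansion spelled out, here it is for a single sample $x \in \mathbb{R}^{1 \times d}$ under the $xW + b$ ordering assumed above:

$$f_j = \sum_i x_i W_{ij} + b_j \quad\Longrightarrow\quad \frac{\partial f_j}{\partial x_i} = W_{ij}$$

so the gradient with respect to the input is just the weight matrix (possibly transposed, depending on the convention the stencil expects).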

get_weight_gradients

Task: Similar to get_input_gradients: we need to calculate the gradients of our Dense layer's forward pass with respect to our weights ($W$, $b$), i.e. $\frac{\partial f}{\partial W}$ and $\frac{\partial f}{\partial b}$.

The calculation for this is a little more involved.

You'll probably notice that you need the input to calculate the gradient. However, if you take a glance at the class definition (class Dense(Diffable):), you can see that Dense subclasses Diffable. What instance variable might help us out here?

Hint: check out the core.py cheat sheet for more details on the Diffable class.

_initialize_weight

Task: Initialize the dense layer's weight values. By default, initialize all the weights to zero (usually a bad idea). You are also required to support the following more sophisticated initialization options:

Tip: Think carefully about the shapes of the weight and bias matrices before you implement this!

  • Normal: Passing normal causes the weights to be initialized with a unit normal distribution. You may want to look for a numpy function to help you with this!
  • Xavier Normal: Passing xavier causes the weights to be initialized in the same way as keras.GlorotNormal.
  • Kaiming He Normal: Passing kaiming causes the weights to be initialized in the same way as keras.HeNormal.

If you have never seen Xavier and Kaiming distributions, don't worry - that is totally understandable! They are just normal distributions with some modifications to the standard deviation based on the input size. We recommend taking a look at the Keras documentation linked above for each function.
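
For reference, the standard deviations come straight from the Keras docs: GlorotNormal uses stddev = sqrt(2 / (fan_in + fan_out)) and HeNormal uses stddev = sqrt(2 / fan_in) (both technically sample from a truncated normal; a plain normal is the usual approximation here). Below is a rough sketch of the weight portion only, with hypothetical option names:

import numpy as np

def sample_weights(initializer, input_size, output_size):
    if initializer == "normal":
        stddev = 1.0                                        # unit normal
    elif initializer == "xavier":
        stddev = np.sqrt(2.0 / (input_size + output_size))  # GlorotNormal-style
    elif initializer == "kaiming":
        stddev = np.sqrt(2.0 / input_size)                  # HeNormal-style
    else:
        return np.zeros((input_size, output_size))          # default: all zeros
    return np.random.normal(0.0, stddev, size=(input_size, output_size))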

Regardless of the weight initialization method, make sure to always initialize the bias $b$ to zeros, and only initialize the weights $W$ according to the selected method.

Note: The stencil also mentions Xavier/Kaiming Uniform initializations, but these are not required for this assignment.

3. MSE

Next up: beras/losses.py

Tip: Notice each class in losses inherits the Diffable class. Once again, if you are confused about certain methods, you can check out the core cheat sheet to learn about Diffable!

Here's what you have to do:

  • call
    • Calculate MSE
  • get_input_gradients
    • Calculate gradients with respect to the inputs to an MSE function

Notice we don't have any get_weight_gradients. Why might this be?

call

Task: Given two Tensors y_pred and y_true, calculate the Mean Squared Error between the two Tensors.

Hint: Notice that a Tensor is just a np.ndarray with some extra addons, meaning we can use NumPy functions on them!

Make sure to return a Tensor as your MSE value.
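
As a reminder, MSE is just the average of the squared differences, $\text{MSE}(y_{pred}, y_{true}) = \frac{1}{n}\sum_{i=1}^{n} (y_{pred,i} - y_{true,i})^2$, which is a one-liner in NumPy (the toy values below are hypothetical):

import numpy as np

y_pred = np.array([[2.0], [3.0]])        # hypothetical predictions
y_true = np.array([[1.0], [5.0]])        # hypothetical labels
mse = np.mean((y_pred - y_true) ** 2)    # (1 + 4) / 2 = 2.5
print(mse)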

get_input_gradients

Task: Calculate gradients with respect to the inputs to MSE.

What are the inputs? How can we access them?

Hint: notice that class MeanSquaredError(Diffable): is also a Diffable.
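
If you want to sanity-check your math, differentiating that average of squared differences with respect to each prediction gives

$$\frac{\partial \text{MSE}}{\partial y_{pred,i}} = \frac{2}{n}\,(y_{pred,i} - y_{true,i})$$

(how this gets packaged into the return value depends on the Diffable conventions described in the core.py cheat sheet).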

4. Gradient Tape

Now, we know how to take our inputs, make predictions from them with our Dense layer, calculate the loss between our predictions and the true labels for these samples, and calculate all the gradients along the way.

Think back to the computation graph. If we're tuning our weights to minimize the loss, how much should the weights change by?

If you take a glance at beras/gradient_tape.py, you'll notice a couple things.

First, inside __init__, notice the self.previous_layers instance variable. All the Diffable layers (or operations) that occur while a GradientTape is watching will record the id of their output, as well as the operation that occurred, to self.previous_layers.

Next, we have __enter__ and __exit__. Remember back to lab when we had something like:

with tf.GradientTape() as tape:   # <- this "enters" the GradientTape!
    # do some diffable things
    # do some more diffable things
# do some non-diffable things     # <- leaving the indentation block "exits" the GradientTape

So, here's what you have to do:

def gradient(self, target: Tensor, sources: list[Tensor]) -> list[Tensor]:

Task: implement the gradient method of the GradientTape in beras/gradient_tape.py, which returns a list of gradients corresponding to the list of trainable weights in the network. Note that the sources parameter should be the list of trainable weights in the network.

Note: for HW2, you will be expected to expand upon this code to be generalizable to an arbitrary model. However, in HW1, we can backpropagate through the set architecture of a Dense layer followed by the MSE loss. That is to say, our architecture can be summarized as $L_{MSE}(f(x), y_{true})$, and we can hard-code our gradient backpropagation through these set layers.
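
Concretely, for this fixed architecture the chain rule reads

$$\frac{\partial L}{\partial W} = \frac{\partial L}{\partial f}\cdot\frac{\partial f}{\partial W}, \qquad \frac{\partial L}{\partial b} = \frac{\partial L}{\partial f}\cdot\frac{\partial f}{\partial b}$$

where $\frac{\partial L}{\partial f}$ comes from the MSE layer's get_input_gradients() and $\frac{\partial f}{\partial W}$, $\frac{\partial f}{\partial b}$ come from the Dense layer's get_weight_gradients().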

Here are some hints to help you get started:

  • What are the trainable weights of our model?
  • Given the MSE loss Tensor passed in as the target parameter, how can we access the MSE loss layer?
  • Now that we have our MSE loss layer, what properties and/or methods from Diffable might help us out?
  • You might find the zip function to be helpful in iterating through two lists in parallel! Don't feel like you have to use it, but it might be interesting to look at!

5. Basic Optimizer

Task: In the beras/optimizers.py file make sure to implement the optimization for each of the different types of optimizers. Refer back to Lab 1 if you need a refresher.

  • BasicOptimizer: A simple optimizer strategy as seen in Lab 1.
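
As a refresher, and assuming BasicOptimizer is the plain gradient descent strategy from Lab 1, the update it applies to each trainable weight $w$ with learning rate $\eta$ is

$$w \leftarrow w - \eta\,\frac{\partial L}{\partial w}$$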

You'll see more soon in Lab 2 and HW2!

6. Model and Assignment

This is the final bit! You've finished implementing all the different components necessary for a deep learning model, and now all that's left is to learn deeply :-).

Most things have already been filled out for you at this point in both model.py and assignment.py.

Here's the full TODO list:

  • model.py
    • def evaluate(self, x, y) -> Tensor:
  • assignment.py
    • SingleLayerModel's def call(self, inputs: beras.Tensor) -> beras.Tensor:

model.py

In beras/model.py you'll see the Model class. We've implemented most of what's important in a model.

First things first, notice the instance variable self.layers. This will come into play soon.

Next, take a look at compile(). This defines 2 new instance variables for the model which are essential for training.

Now, read through fit(). This method is how we "fit" our model's trainable weights based on predictions from given inputs and the given labels.

Task: Fill out the def evaluate(self, x, y) -> Tensor: method. This method should be really similar to our fit() method, but with some parts cut out.

assignment.py

SingleLayerModel

At the top of assignment.py, you'll see the SingleLayerModel class.

Task: Fill out the call function of the SingleLayerModel!

get_single_layer_model_components()

If you scroll down, you'll see stencil code on where you should initialize and compile your model!

Task: Initialize a SingleLayerModel with a single Dense layer!
Hint: Given what we know about the inputs and outputs, what should the dimensions of the Dense layer be?

Task: Fill in the optimizer and loss function of your model! You should adjust the learning rate so your model learns reasonably quickly. Given the simplicity of the data, we recommend starting with a learning rate of at least 0.5.

Task: Decide the number of epochs to train for.

7. Training and Testing

Congratulations! You've finished all of the coding for HW1!

Now, all that's left is to see how well our model does.

Task: Go into your terminal, cd into the directory of the assignment and run assignment.py! You can do this by calling

python3 assignment.py

Your model should do pretty well with the components returned by get_single_layer_model_components.

However, your model might not be quite good enough for submission yet! Go back to your get_single_layer_model_components and try adjusting the learning rates and number of epochs.

Task: Tune your hyperparameters in get_single_layer_model_components to achieve the best performance.

Some Additional Comments

  • We have provided testing files (test_assignment.py, test_beras.py, test_preprocess.py) with some sanity checks that you can run to help identify where an issue might be occurring. While these sanity checks don't guarantee that the code being tested is completely correct, they are useful for catching a failing basic test (in which case you know your code isn't quite right).
    • In order to run the tests:
      1. Make sure you have the virtual environment activated
      2. cd into the code directory
      3. Uncomment the tests you want to run from the main method of the test file to be run
      4. Run python3 <test-filename> (ex: if you want to run some of the tests in test_beras.py, run python3 test_beras.py). If a test fails, you will see an AssertionError.

Submission

Requirements

  • Implement a deep learning framework and use it to build a model that predicts the progression of diabetes one year from baseline.
  • Run and pass all sanity tests provided
    • Note: You will lose points on your final score if you don't pass all of these local tests.
  • Achieve a validation loss below 5500 within 200 epochs
  • Include a brief README.md file containing your model's accuracy and any known bugs (🐛)
    • For compatibility with the autograder, please put the README.md file within the code directory (code/README.md)

Grading

Your code will be primarily graded on functionality, as determined by the Gradescope autograder. Your model should have a validation loss less than 5500.

Warning: You will not receive any credit for functions that use tensorflow, keras, torch, or scikit-learn functions within them. You must implement all functions manually using either vanilla Python or NumPy.

Handing In

You should submit the assignment via Gradescope under the corresponding assignment using the GitHub submission option. To submit via GitHub, commit and push all of your changes to your repository on GitHub. You can do this by running the following commands.

git commit -am "commit message"
git push

For those of y'all who are already familiar with git: the -am flag to git commit is a pretty cool shortcut that stages all of your modified (already-tracked) files and commits them with a commit message in one step.

Note: We highly recommend committing your files to git and syncing with your GitHub repository often throughout the course of the assignment to ensure none of your hard work is lost!