Conceptual questions due Friday, February 9, 2024 at 6:00 PM EST
Programming assignment due Monday, February 12, 2024 at 6:00 PM EST
Deep under the sea, a school of fish have started learning computer science, and want to make a deep learning model! Because they don't have `conda` or `pip`, they're making their own AI framework, and need your help!
P.S. the puns are intended.
In this assignment you will begin constructing a basic Keras mimic, 🐻 Beras 🐻 (haha funny name), and use it to model scikit-learn's diabetes dataset.

Note: HW2 will use a lot of your code from HW1. Therefore, it is highly recommended to complete HW1 in its entirety in a timely fashion to prevent any delays in completing HW2.
Please click here to get the stencil code. Reference this guide for more information about GitHub and GitHub classroom.
Warning: Do not change the stencil except where specified. You are welcome to write your own helper functions, however, changing the stencil's method signatures or removing pre-defined functions may result in an incompatibility with our autograder and result in a low overall grade.
You will need to use the virtual environment that you made in Homework 0. You can activate the environment by using the command `conda activate csci1470`. If you have any issues running the stencil code, be sure that your conda environment contains at least the following packages:

- `python==3.10`
- `numpy`
- `tensorflow`
- `scikit-learn` (called `sklearn` in actual Python programs!)
- `pytest`
On a Windows conda prompt or Mac terminal, as well as on other Unix systems, you can check whether a package is installed from the command line.
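For example, here is one plausible check, using `numpy` as a stand-in for whichever package you want to verify (any equivalent command is fine):

```bash
# List matching installed packages in the active conda environment:
conda list numpy

# On Unix-style shells, you can also filter the full package list:
conda list | grep numpy
```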
Note: Be sure to read this handout in its entirety before moving onto implementing any part of the assignment!
Deep learning is a very complicated and mathematically rich subject. However, when building models, all of these nuances can be abstracted away from the programmer through the use of deep learning libraries.
There are a couple of popular choices for deep learning libraries in industry and academia, each with their own opinions on how models should be structured and how the building blocks interact with each other. Two of the most popular libraries for Python include Tensorflow and PyTorch.
Tensorflow's Keras submodule provides many ready-made deep learning building blocks. For example:

- `tensorflow.keras.layers.Dense` is the typical fully-connected, linear layer with weights and bias
- `tensorflow.keras.losses.MSE` is the differentiable mean-squared error loss layer

PyTorch's `nn` (neural network) submodule is PyTorch's analogue to Keras and is the location of PyTorch's deep learning objects. For example:

- `torch.nn.Linear` is the typical fully-connected, linear layer with weight and bias
- `torch.nn.MSELoss` is the differentiable mean-squared error loss layer

In general, Tensorflow and Keras simplify life by providing you a lot of building blocks and forcing you to use them. PyTorch, on the other hand, may require you to build a couple more things by hand but also makes it easier to interact with lower-level processes.
In this class, to keep things simple, we will focus on using Tensorflow. However, do not fret! You'll have an opportunity to take PyTorch out for a spin if you so choose for your final project, or consider looking into other Brown CS courses that use PyTorch, such as Machine Learning (CSCI1420) or Computational Linguistics (CSCI1460).
Please keep in mind you are not allowed to use any Tensorflow, Keras, or PyTorch functions throughout HW1 and HW2. We use these packages to verify functionality, but if you are found using these libraries for functionality you will be docked points.
Don't worry if these tasks seem daunting at first glance! We've included a lot more info down below on specific implementation details.
Here's the roadmap:

1. Fill out `preprocess.py` to prepare the diabetes dataset for testing (more info below)
2. Implement the `Dense` layer in `layers.py` (more info below)
3. Implement the MSE loss in `losses.py` (more info below)
4. Implement the `GradientTape` in `gradient_tape.py` (more info below) so you can backpropagate through your `Dense` layer! Given a `target`, find all the gradients for each weight in `sources`
5. Implement the optimizers in `optimizers.py` (more info below)
6. Fill out `model.py` and `assignment.py` (more info below)
7. Run `assignment.py`, observe, and tune!

If you look at the main function of `assignment.py`, you can see that we've imported our dataset:
from sklearn.datasets import load_diabetes
In the documentation, we can see that the diabetes dataset looks like this:
- Samples total: 442
- Dimensionality: 10
- Features: real, -.2 < x < .2
- Targets: integer 25 - 346
We extract our inputs ("features") and outputs ("targets") with the following line in our main function:
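The stencil's exact line may differ slightly, but it is presumably something along these lines:

```python
# load_diabetes(return_X_y=True) returns the (features, targets) pair directly.
X, Y = load_diabetes(return_X_y=True)
```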
We now have inputs `X` with shape `(442, 10)`, where each of the 442 samples has 10 different features, as well as outputs `Y` (or labels, or targets) with shape `(442,)`.
However, we can't just toss all of this data at our model! If we train on all 442 samples and then test on all 442 samples, then the model might just give memorized results: we have no clue if it actually learned any statistical relationship between the input features and the output. This learning is just too shallow.
So, you need to split up `X, Y` into training and testing sets! This is called for you:
If you go to `preprocess.py`, you'll see the stencil function:
Task: The function takes in features, labels, and a split percentage. First, you will need to reshape `labels` to have shape `(num_samples, 1)`. Then, split `features` and `labels` into training and testing sets according to the split percentage.
For example, if the split percentage is 0.8, then the first 80% of values in the list will be for training, and the last 20% will be for testing.
The function should return the lists in the following order: Training features, testing features, training labels, testing labels
Notice that we're working with 2 NumPy arrays here: feel free to take a look at some of the NumPy functions we linked in the stencil code and any other functions that might make your life a little easier (maybe getting the length?).
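Putting those steps together, a minimal sketch might look like this (using the hypothetical `preprocess_data` name from above; the stencil's actual signature may differ):

```python
import numpy as np

def preprocess_data(features, labels, split_percentage=0.8):
    # Reshape labels from (num_samples,) to (num_samples, 1).
    labels = np.reshape(labels, (-1, 1))
    # Index where the training portion ends and the testing portion begins.
    split_index = int(len(features) * split_percentage)
    train_features, test_features = features[:split_index], features[split_index:]
    train_labels, test_labels = labels[:split_index], labels[split_index:]
    return train_features, test_features, train_labels, test_labels
```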
Check out the `Dense` class in `beras/layers.py`! There's a decent amount to do here, so let's break it down:
First, let's look at our `__init__` method. (This is the `Dense` class's constructor method, for those new to Python!) Notice how we initialize the `self.w` and `self.b` instance variables. What might these variables represent?
Here's what you have to fill in:
- `def weights(self) -> list[Tensor]:`
- `def call(self, x: Tensor) -> Tensor:`
- `def get_input_gradients(self) -> list[Tensor]:`
- `def get_weight_gradients(self) -> list[Tensor]:`
- `def _initialize_weight(initializer, input_size, output_size) -> tuple[Variable, Variable]:`
weights
Task: Just return the `Dense` layer's weights!
call
Task: Implement the forward pass of the `Dense` layer.
The parameter `x` represents our input. Remember from class that our Dense layer performs a matrix multiplication with the weights followed by adding the bias to get its output, i.e. `output = x @ w + b`. Keep in mind that `x` has shape `(num_samples, input_size)`, where `input_size = 10`.
What shape should our predicted output have? What shape do the labels already have?
(It might be easier if you change the order of some things :) )
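If it helps, here is a minimal sketch of what the forward pass boils down to (assuming `self.w` and `self.b` are the weight matrix and bias set up in `__init__`):

```python
def call(self, x):
    # (num_samples, input_size) @ (input_size, output_size) + (1, output_size)
    return x @ self.w + self.b
```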
get_input_gradients
Task: Calculate the gradients of our Dense layer's forward pass with respect to our inputs, i.e. ∂(xw + b)/∂x.

The answer is quite intuitive, but if you need some convincing, the answer should come out if you expand the matrix multiplication and take the partial with respect to each entry in x!
get_weight_gradients
Task: Similar to `get_input_gradients`: we need to calculate the gradients of our Dense layer's forward pass with respect to our weights (`w` and `b`), i.e. ∂(xw + b)/∂w and ∂(xw + b)/∂b.
The calculation for this is a little more involved.
You'll probably notice that you need the input to calculate the gradient. However, if you take a glance at the class definition (`class Dense(Diffable):`), you can see that `Dense` subclasses `Diffable`. What instance variable might help us out here?
Hint: check out the `core.py` cheat sheet for more details on the `Diffable` class.
_initialize_weight
Task: Initialize the dense layer's weight values. By default, initialize all the weights to zero (usually a bad idea). You are also required to support the following, more sophisticated options:
Tip: Think carefully about the shapes of the weight and bias matrices before you implement this!
- `normal` causes the weights to be initialized with a unit normal distribution. You may want to look for a numpy function to help you with this!
- `xavier` causes the weights to be initialized in the same way as `keras.GlorotNormal`.
- `kaiming` causes the weights to be initialized in the same way as `keras.HeNormal`.

If you have never seen Xavier and Kaiming distributions, don't worry - that is totally understandable! They are just normal distributions with some modifications to the standard deviation based on the input size. We recommend taking a look at the Keras documentation linked above for each function.
Regardless of the weight initialization method, make sure to always initialize the bias to zeros, and only initialize the weights according to the selected method.
Note: The stencil also mentions Xavier/Kaiming Uniform initializations, but these are not required for this assignment.
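To make the options concrete, here is a hedged sketch of one way this could look. The standard deviations follow Keras's GlorotNormal and HeNormal definitions; the string option names, shapes, and the `Variable` wrapper should match whatever your stencil actually expects.

```python
import numpy as np

def _initialize_weight(initializer, input_size, output_size):
    # Sketch only: option names and shapes are assumptions -- follow your stencil.
    if initializer == "zero":
        w = np.zeros((input_size, output_size))
    elif initializer == "normal":
        w = np.random.normal(0, 1, size=(input_size, output_size))
    elif initializer == "xavier":
        # GlorotNormal: stddev = sqrt(2 / (fan_in + fan_out))
        stddev = np.sqrt(2 / (input_size + output_size))
        w = np.random.normal(0, stddev, size=(input_size, output_size))
    elif initializer == "kaiming":
        # HeNormal: stddev = sqrt(2 / fan_in)
        stddev = np.sqrt(2 / input_size)
        w = np.random.normal(0, stddev, size=(input_size, output_size))
    else:
        raise ValueError(f"Unknown initializer: {initializer}")
    # The bias is always zeros, regardless of the chosen initializer.
    b = np.zeros((1, output_size))
    # Variable is Beras's trainable-tensor wrapper; import it from wherever
    # your stencil defines it (see the core.py cheat sheet).
    return Variable(w), Variable(b)
```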
Next up: beras/losses.py
Tip: Notice each class in losses inherits the Diffable class. Once again, if you are confused about certain methods, you can check out the core cheat sheet to learn about Diffable!
Here's what you have to do:

- `call`
- `get_input_gradients`

Notice we don't have any `get_weight_gradients`. Why might this be?
call
Task: Given two Tensors `y_pred` and `y_true`, calculate the Mean Squared Error between the two Tensors.
Hint: Notice that a `Tensor` is just a `np.ndarray` with some extra addons, meaning we can use NumPy functions on them! Make sure to return a `Tensor` as your MSE value.
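Since MSE is just the mean of the element-wise squared differences, the body can stay very short. A minimal sketch, assuming `Tensor` can wrap a NumPy result as the hint suggests:

```python
import numpy as np

def call(self, y_pred, y_true):
    # Mean of the squared differences, wrapped back into a Tensor.
    # Tensor comes from the Beras stencil (see the core.py cheat sheet).
    return Tensor(np.mean((y_pred - y_true) ** 2))
```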
get_input_gradients
Task: Calculate gradients with respect to the inputs to MSE.
What are the inputs? How can we access them?
**Hint:** notice that `class MeanSquaredError(Diffable):` is also a `Diffable`.
Now, we know how to take our inputs, make predictions from them with our Dense layer, calculate the loss between our predictions and the true labels for these samples, and calculate all the gradients along the way.
Think back to the computation graph. If we're tuning our weights to minimize the loss, how much should the weights change by?
If you take a glance at `beras/gradient_tape.py`, you'll notice a couple of things.
First, inside `__init__`, notice the `self.previous_layers` instance variable. All the `Diffable` layers, or operations, which occur while a `GradientTape` is watching will record the id of their output, as well as the operation that occurred, to `self.previous_layers`.
Next, we have `__enter__` and `__exit__`. Remember back to lab when we had something like:
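In case the pattern is hazy, it looked roughly like this, whether with `tf.GradientTape` in lab or with Beras's own `GradientTape` here (`model`, `loss_fn`, and the data names are placeholders):

```python
with GradientTape() as tape:
    # Everything in this block is "watched": each Diffable op records itself.
    y_pred = model(train_x)
    loss = loss_fn(y_pred, train_y)

# Outside the block, ask the tape for gradients of the loss w.r.t. the weights.
grads = tape.gradient(loss, model.weights)
```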
So, here's what you have to do:
def gradient(self, target: Tensor, sources: list[Tensor]) -> list[Tensor]:
Task: implement the `gradient` method of the `GradientTape` in `beras/gradient_tape.py`, which returns a list of gradients corresponding to the list of trainable weights in the network. Note that the `sources` parameter should be the list of trainable weights in the network.
Note: for HW2, you will be expected to expand upon this code to be generalizable to an arbitrary model. However, in HW1, we can backpropagate through the set architecture of a Dense layer followed by the MSE loss. That is to say, our architecture can be summarized as Dense → MSE, and we can hard-code our gradient backpropagation through these set layers.
Hints: here are some hints to help you get started:

- Given the `Tensor` passed in as the `target` parameter, how can we access the MSE loss layer?
- What instance variable from `Diffable` might help us out? Check out the `core.py` cheat sheet for a detailed rundown!
- You may find Python's `zip` function to be helpful in iterating through two lists in parallel! Don't feel like you have to use it, but it might be interesting to look at!

Task: In the `beras/optimizers.py` file, make sure to implement the optimization step for each of the different types of optimizers. Refer back to Lab 1 if you need a refresher.
- `BasicOptimizer`: A simple optimizer strategy as seen in Lab 1.

You'll see more soon in Lab 2 and HW2!
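For reference, here is a minimal sketch of what a basic, plain-gradient-descent optimizer boils down to. It assumes the weights behave like NumPy arrays that can be updated in place, and the method name `apply_gradients` is an assumption; use whatever the stencil defines.

```python
class BasicOptimizer:
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

    def apply_gradients(self, weights, grads):
        # Step each weight a small amount against its gradient.
        for w, g in zip(weights, grads):
            w -= self.learning_rate * g
```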
This is the final bit! You've finished implementing all the different components necessary to a deep learning model and now all that's left is to learn deeply :-).
Most things have already been filled out for you at this point in both `model.py` and `assignment.py`.
Here's the full TODO list:
- `model.py`: `def evaluate(self, x, y) -> Tensor:`
- `assignment.py`: `SingleLayerModel`'s `def call(self, inputs: beras.Tensor) -> beras.Tensor:`
model.py
In `beras/model.py` you'll see the `Model` class. We've implemented most of what's important in a model.
First things first, notice the instance variable `self.layers`. This will come into play soon.
Next, take a look at `compile()`. This defines 2 new instance variables for the model which are essential for training.
Now, read through `fit()`. This method is how we "fit" our model's trainable weights based on predictions from given inputs and the given labels.
Task: Fill out the `def evaluate(self, x, y) -> Tensor:` method. This method should be really similar to our `fit()` method, but with some parts cut out.
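Conceptually, evaluation is just a forward pass plus the loss, with no gradient tape and no weight updates. A rough sketch, assuming `compile()` stored the loss under an attribute like `self.compiled_loss` (an assumed name; check the stencil for the real one):

```python
def evaluate(self, x, y):
    # Forward pass through the model, then measure the loss against the labels.
    y_pred = self.call(x)
    return self.compiled_loss(y_pred, y)
```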
assignment.py
SingleLayerModel
At the top of `assignment.py`, you'll see the `SingleLayerModel` class.
Task: Fill out the `call` function of the `SingleLayerModel`!
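Since the model holds just one layer, its forward pass is about as short as it gets. A sketch, assuming the constructor stored that layer under something like `self.layer` (an assumed name; use whatever the stencil defines):

```python
def call(self, inputs):
    # A single-layer model's forward pass is just that one layer's forward pass.
    return self.layer(inputs)
```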
get_single_layer_model_components()
If you scroll down, you'll see stencil code on where you should initialize and compile your model!
Task: Initialize a `SingleLayerModel` with a single `Dense` layer!
Hint: Given what we know about the inputs and outputs, what should the dimensions of the Dense layer be?
Task: Fill in the optimizer and loss function of your model! You should adjust the learning rate so your model learns reasonably quickly! Given the simplicity of the data, we recommend starting with a learning rate of at least 0.5.
Task: Decide the number of epochs to train for.
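Putting the three tasks together, the function might end up shaped something like the sketch below. The constructor arguments, return values, and concrete numbers here are all assumptions to tune against your stencil, not the required answer.

```python
def get_single_layer_model_components():
    # One Dense layer mapping the 10 input features to a single target value.
    model = SingleLayerModel(Dense(10, 1, initializer="xavier"))
    model.compile(optimizer=BasicOptimizer(0.5), loss=MeanSquaredError())
    epochs = 10  # tune this along with the learning rate
    return model, epochs
```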
Congratulations! You've finished all of the coding for HW1!
Now, all that's left is to see how well our model does.
Task: Go into your terminal, `cd` into the directory of the assignment, and run `assignment.py`! You can do this by calling `python3 assignment.py`.
Your model should do pretty well based on the components returned from `get_single_layer_model_components`.
However, your model might not be quite good enough for submission yet! Go back to your `get_single_layer_model_components` and try adjusting the learning rates and number of epochs.
Task: Tune your hyperparameters in `get_single_layer_model_components` to achieve the best performance.
We have provided a few test files (`test_assignment.py`, `test_beras.py`, `test_preprocess.py`) with some sanity checks that you can run to help identify where an issue might be occurring. While these sanity checks don't guarantee that the code being tested is completely correct, it is useful to use them to see if you fail a basic test (in which case you know your code isn't quite right).

To run the tests:

- `cd` into the `code` directory
- Run the `main` method of the test file to be run with `python3 <test-filename>` (ex: if you want to run some of the tests in `test_beras.py`, run `python3 test_beras.py`). If a test fails, you will see an `AssertionError`.
Along with your code, please include a `README.md` file containing your model's accuracy and any known bugs (🐛). Place the `README.md` file within the `code` directory (`code/README.md`).
)Your code will be primarily graded on functionality, as determined by the Gradescope autograder. Your model should have a validation loss less than 5500.
Warning: You will not receive any credit for functions that use `tensorflow`, `keras`, `torch`, or `scikit-learn` functions within them. You must implement all functions manually using either vanilla Python or NumPy.
You should submit the assignment via Gradescope under the corresponding project assignment through Github. To submit via Github, commit and push all of your changes to your repository to GitHub. You can do this by running the following commands.
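Those commands are the usual commit-and-push pair, for example (with your own commit message):

```bash
# Note: -am only picks up files git is already tracking; `git add` any new files first.
git commit -am "<your commit message>"
git push
```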
For those of y'all who are already familiar with `git`: the `-am` flag to `git commit` is a pretty cool shortcut which adds all of our modified files and commits them with a commit message.
Note: We highly recommend committing your files to git and syncing with your GitHub repository often throughout the course of the assignment to ensure none of your hard work is lost!