---
tags: hw1, handout
---

# HW1 Programming: Linear Regression

:::info
Conceptual section due **Friday, February 10, 2023 at 6:00 PM EST**

Programming section due **Monday, February 13, 2023 at 6:00 PM EST**
:::

This homework is intended to give you an introduction to the deep learning optimization routine from the simplified perspective of linear regression. Specifically, you'll be mimicking functionality provided by PyTorch to fit a weight matrix and bias toward predicting diabetes progression based on patient characteristics.

## Theme

![](https://ichef.bbci.co.uk/news/800/cpsprodpb/F6E4/production/_107840236_raviolistarfish.jpg)

*This is a starfish, not a raviolo (according to Logan, it's the singular of ravioli).*

# Getting started

## Stencil

Please click <ins>[here](https://classroom.github.com/a/ABpmWb19)</ins> to get the stencil code. Reference this <ins>[guide](https://hackmd.io/gGOpcqoeTx-BOvLXQWRgQg)</ins> for more information about GitHub and GitHub Classroom.

:::danger
**Do not change the stencil except where specified**. While you are welcome to write your own helper functions, changing the stencil's method signatures or removing pre-defined functions could result in incompatibility with the autograder and a low grade.
:::

## Environment

You will need to use the virtual environment that you made in Homework 0 to run code in this assignment (because it relies on `numpy` and `torch`), which you can activate by using `conda activate csci1470`.

:::info
If you're running into issues importing `torch`, make sure PyTorch is installed by running `conda install pytorch torchvision -c pytorch` within the virtual environment.
:::

# Assignment overview

In this assignment, you'll expand upon your work in Lab 1 and create your own linear regression model. However, in contrast to your lab implementation, which was relatively inflexible and built around a single objective, you'll be imitating a PyTorch-style implementation.

Specifically, you'll use linear regression on [scikit-learn's diabetes dataset](https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset) with the goal of predicting disease progression one year after baseline. The data has 10 variables, such as age, sex, and cholesterol levels. It is discussed in detail in its associated [release paper](https://hastie.su.domains/Papers/LARS/LeastAngle_2002.pdf).

All of the code is contained within a single notebook, which should feel similar to the labs. You should not need additional files or data beyond what is provided in the notebook.

----

### **What are PyTorch and TensorFlow?**

PyTorch is one of several deep learning libraries that combine linear algebra, differentiation, and statistical utilities under one common framework, with one specific purpose: to build arbitrary deep learning systems for research and development applications. So... yeah, using it for linear regression is probably overkill, but the modular systems that the library implements will be necessary for the types of structures we will be building in this course!

Of note, we will actually be using TensorFlow/Keras for most of this course, but the Keras assumptions are a bit more complex to model from scratch. The intuitive connection between the two is as follows:

- TensorFlow is a linear algebra/auto-differentiation library. Keras, which is part of TF, is where the library keeps deep learning objects that can interact with one another to make complex systems. For example, a dense layer with a weight and bias is `tensorflow.keras.layers.Dense`, and a differentiable mean-squared-error loss is `tensorflow.keras.losses.MSE`.
- PyTorch is also a linear-algebra/autodiff library for deep learning, but its deep learning modules are contained in the ``nn`` ("neural network") module. The syntax looks somewhat similar, with a dense layer object being `torch.nn.Linear` and an MSE loss being `torch.nn.MSELoss`.
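To make the parallel concrete, here is a rough sketch of how each library exposes these two building blocks (purely illustrative and not part of the stencil; it assumes both libraries are installed, and the 10 input features match the diabetes data):

```python
import tensorflow as tf
import torch

# Keras keeps its deep learning objects under tf.keras
keras_dense = tf.keras.layers.Dense(units=1)    # dense layer (weights + bias); input size inferred on first call
keras_mse = tf.keras.losses.MeanSquaredError()  # differentiable MSE loss

# PyTorch keeps the equivalent objects under torch.nn
torch_dense = torch.nn.Linear(in_features=10, out_features=1)
torch_mse = torch.nn.MSELoss()
```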
In general, TF/Keras makes life easier by giving you a bunch of systems and forcing you to use them. In contrast, PyTorch requires you to make your own stuff more often, but makes it easier to interact with the lower-level processes. Since we only care about supporting a very simple routine for this assignment, the transparency of PyTorch will be much appreciated. In Homework 2, you'll get the chance to step up from this and build some of the systems that TensorFlow provides by default.

### Using PyTorch

In this assignment, you will see that using PyTorch to optimize arbitrary parameters of your model can be extremely easy. Specifically, you can get away with just defining the following:

- **Architecture:** Define what components your model will be made of (the model architecture). This may include linear layers, activation functions, or other things you haven't learned yet.
- **Forward Pass:** Define how to compute the prediction from a given input.
- **Loss:** Define the loss function you'd like to use (MSE, BCE, etc.).
    - This can also be considered part of the forward pass, to contrast against the backward pass.
- **Optimizer:** Pick an optimizer and give it the parameters you'd like to optimize (and the hyperparameters of your optimizer, such as your learning rate).

After you do that, you can just take subsets of your data and optimize the model to match them by doing the following:

- **Forward Pass**
    - Compute the prediction using your model.
    - Compute the loss of the prediction relative to your ground truth.
- **Backward Pass**
    - Call your loss's `backward()` method and let PyTorch compute the gradients of your Tensors.
- **Optimize**
    - Call your optimizer's `step` method to update your parameters in place using the computed gradients.
- **Repeat For All Batches, and Repeat Until Convergence (or Stop)**

So... how does that work? We've only defined how to do a forward pass, so how can the network just infer how to do a backward pass?

### Computational Graph

Let's start out with a simple assumption: the linear layer and the loss function of a neural network are objects equipped with forward functions such that $\texttt{linear.forward}(x) = \hat{y}$ and $\texttt{loss.forward}(y, \hat{y}) = \mathcal{L}$ for some instances `linear` and `loss` and an arbitrary ground-truth pair $(x, y)$.

![](https://i.imgur.com/UGqvZ3U.png)

By default, if we use the forward function as-is, we can compute $\hat{y}$ and $\mathcal{L}$ very naturally, but what then? We can't really do anything more, because we don't know how to go backwards. To solve this, we can wrap our forward function with a wrapper method (we'll call it `call`) that mimics the operations of `forward` but also fulfills two additional objectives:

- Inside the layer, store the inputs and outputs of the layer's forward pass for later use.
- Inside the output, store a pathway back to the layer, connecting the output with the layer that generated it.
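As a minimal sketch of that idea (the class and attribute names here are invented for illustration; the notebook's `Diffable` starter code is more complete), such a wrapper might look like:

```python
import numpy as np

class Tensor(np.ndarray):
    """An ndarray that can remember which layer produced it."""
    parent_layer = None  # pathway from an output back to its layer

def to_tensor(arr):
    # view a plain array as our graph-aware Tensor type
    return np.asarray(arr).view(Tensor)

class ToyLayer:
    def forward(self, x):
        raise NotImplementedError  # subclasses implement the actual math

    def call(self, x):
        out = to_tensor(self.forward(x))
        self.inputs, self.outputs = x, out  # objective 1: cache the inputs/outputs inside the layer
        out.parent_layer = self             # objective 2: the output points back to its layer
        return out

class Double(ToyLayer):
    def forward(self, x):
        return 2 * np.asarray(x)

layer = Double()
y = layer.call(np.array([1.0, 2.0]))
assert y.parent_layer is layer  # from the output, we can walk back to the layer
```

Because every output remembers its parent layer and every layer remembers its inputs, chaining `call`s strings these records together, which is exactly what the diagrams below illustrate.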
![](https://i.imgur.com/svq2r9m.png)

The first property on its own allows the layer to hold access to its inputs and outputs, which can then be used in tandem with other layer-internal structures (i.e. parameters, hyperparameters). One possible option, if the forward function is differentiable, is to compute the partial derivatives of the output with respect to various layer components!

![](https://i.imgur.com/otOvhS1.png)

Additionally, notice the following:

- The outputs now have pathways back to their parent layer.
- The layers now have pathways back to their inputs (and also their parameters).

**This means that we have constructed a graph!**

![](https://i.imgur.com/dSI5Fwx.png)

Using this structure, and assuming that an output loss was generated by a chain of layers like this, you can then implement `backward` such that `loss.backward()` computes the gradient of the loss with respect to any of the components that went into generating it.

The notebook includes a lot of starter code to show how these components are used in practice, so your assignment is to finish the implementation!

<!--
:::info
Note that the implicit graph should work correctly for tree-like computational graphs. Support for arbitrary DAGs is not expected.
:::
-->

:::warning
This assignment lays the foundation for Homework 2, which was found to be moderately difficult last semester. Thus, it's important to understand the concepts and implementation of this assignment. Read the notebook, attend lectures, and come to office hours!
:::

# Roadmap

This assignment is structured like a lab, so there is just one file. Read the notebook and fill in the TODOs! All of the TODOs are listed below for convenience.

- Data Preprocessing
    - Split the samples into training and testing sets
    - Reshape the Y subsets to have shape ``(num_samples, 1)``
- **`class MSELoss(Diffable)`**
    - `forward()` - Compute and return the MSE given predicted and actual labels
    - `input_gradients()` - Compute and return the gradients w.r.t. the inputs
    - `backward()` - Implement backpropagation through the MSE loss layer
- **`class Linear(Diffable)`**
    - `forward()` - Implement the forward pass and return the outputs
    - `weight_gradients()` - Compute and return the gradients w.r.t. the weights and bias
    - `_initialize_weight()` - Implement the default assumption: zero initialization for the bias, normal distribution for the weights
    - `backward()` - Implement backpropagation through the linear layer
- **`class SGD`**
    - `step()` - Implement stochastic gradient descent for each parameter
- **Performance**
    - Compare the test loss of the `ManualRegression` and `LinearRegression` models; they should be similar

:::info
Please use vectorized operations when possible and limit the number of for loops you use. While there is no strict time limit for running this assignment, it should typically take less than 3 minutes. The autograder will automatically time out after 10 minutes.
:::

# Submission

## Requirements

- Complete and submit the HW1 Conceptual Questions
- Implement the TODOs (listed above and in the notebook)
- Run the included sanity checks and verify that they match
- Achieve a validation loss below 4000
- Include a brief README with your model's accuracy and any known bugs

## Grading

Your code will be primarily graded on functionality. Your model should have a validation loss **less than 4000**. This can be achieved with the simple model parameterization provided.
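Assuming the validation loss in question is the mean squared error your `MSELoss` layer reports on the held-out split, a quick NumPy-only spot check (the variable names below are hypothetical and may not match the notebook's) looks like:

```python
import numpy as np

def mse(predicted, actual):
    # mean squared error over all held-out samples
    return np.mean((predicted - actual) ** 2)

# e.g., with hypothetical arrays of shape (num_samples, 1):
# val_loss = mse(test_predictions, Y_test)
# print(f"Validation MSE: {val_loss:.1f}")  # aim for below 4000
```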
Although you will not be graded on code style, please keep your code and outputs clean (e.g. you should not have an excessive number of print statements in your final submission). Additionally, please document your code and make it understandable if you wish to receive partial credit.

:::danger
You will not receive any credit for functions that use TensorFlow, Keras, PyTorch, or Scikit-Learn functions within them. You must implement the **TODO** functions manually (you are allowed to use NumPy functions).
:::

## Handing In

You should submit the assignment via Gradescope under the corresponding project assignment, either by zipping up your hw1 folder or through GitHub (recommended).

To submit through GitHub, commit and push all changes to your repository. You can do this by running the following three commands ([this](https://github.com/git-guides/#how-to-use-git) is a good resource for learning more about them):

1. `git add file1 file2 file3`
    - Alternatively, `git add -A` will stage all changed files for you.
2. `git commit -m "commit message"`
3. `git push`

After committing and pushing your changes to your repo (which you can check online if you're unsure whether it worked), you can now just upload the repo to Gradescope! If you're testing out code on multiple branches, you have the option to pick whichever one you want.

![](https://i.imgur.com/fDc3PH9.jpg)

If you wish to submit via zip file:

1. Please make sure your Python files are in "hw1/code"; this is very important for our autograder to work!
2. Make sure any data folders are not being uploaded, as they may be too big for the autograder to work.

:::warning
**IF YOU ARE IN 2470:** PLEASE REMEMBER TO ADD A BLANK FILE CALLED `2470student` IN THE hw1/code DIRECTORY. WE ARE USING THIS AS A FLAG TO GRADE 2470-SPECIFIC REQUIREMENTS; FAILURE TO DO SO MEANS LOSING POINTS ON THIS ASSIGNMENT.
:::

<style>
.alert { color: inherit }
.markdown-body { font-family: Inter }

/* Some really hacky CSS to hide bullet points
 * for spoilers in lists */
li:has(details) { list-style-type: none; margin-left: -1em }
li > details > summary { margin-left: 1em }
li > details > summary::-webkit-details-marker { margin-left: -1.05em }
</style>

# Conclusion

Congratulations! You just completed your second assignment of CSCI1470/2470! :tada: :star: :fish: :tada:

:::success
**[HINT]** Starfish digest their food by extruding their stomach out of their mouths to envelop their prey. This is also the optimal way to eat food from the Ratty.
:::