Brown Deep Learning (2470) F24
# HW3 Beras

:::info
Assignment due **October 29th at 6 pm EST** on Gradescope
:::

## Theme

![thephilosophyofmagic](https://hackmd.io/_uploads/BJvq2Mzkkl.png)

*Indeed, neural nets can often feel like magic, so we'd like to change that! Get your hands dirty with the nuts and bolts of deep learning to show everyone that it isn't magic after all!*

## Assignment Overview

In this assignment you will begin constructing a basic Keras mimic, 🐻 Beras 🐻.

### Assignment Goals

1. Implement a simple Multi-Layer Perceptron (MLP) model that mimics the Tensorflow/Keras API.
    - Implement core classes and methods used for **Auto Differentiation**.
    - Implement a **Dense Layer** similar to Keras'.
    - Implement basic **preprocessing techniques** for use on the **MNIST Dataset**.
    - Implement a basic objective (loss) function for regression such as **MSE**.
    - Implement basic regression **accuracy metrics**.
    - **Learn** optimal weight and bias parameters using **gradient descent** and **backpropagation**.
2. Apply this model to predict digits using the MNIST Dataset.

:::warning
__You should know:__
This is the longest assignment; we **highly** recommend starting early and reading this document carefully as you implement each part.
:::

## Getting Started

### Stencil

Please click [here](https://classroom.github.com/a/pauFFZ8q) to get the stencil code. Reference this [guide](https://hackmd.io/gGOpcqoeTx-BOvLXQWRgQg) for more information about GitHub and GitHub Classroom.

:::danger
**Do not change the stencil except where specified.** You are welcome to write your own helper functions; however, changing the stencil's method signatures **will** break the autograder.
:::

### Environment

You will need to use the virtual environment that you made in Homework 1. You can activate the environment with the command `conda activate csci2470`. If you have any issues running the stencil code, be sure that your conda environment contains at least the following packages:

- `python==3.11`
- `numpy`
- `tensorflow==2.15`
- `pytest`

On a Windows conda prompt or Mac terminal, you can check whether a package is installed with:

```bash
conda list -n csci2470 <package_name>
```

On Unix systems, you can check whether a package is installed with:

```bash
conda list -n csci2470 | grep <package_name>
```

:::danger
Be sure to read this handout in its **entirety before** moving on to implementing **any** part of the assignment!
:::

## Deep Learning Libraries

Deep learning is a very complicated and mathematically rich subject. However, when building models, all of these nuances can be abstracted away from the programmer through the use of deep learning libraries. In this assignment you will be writing your own deep learning library, 🐻 Beras 🐻. You'll build everything you need to train a model on the MNIST dataset. The MNIST data contains 60k 28x28 black-and-white handwritten digits; your model's job will be to classify which digit is in each image.

Please keep in mind you are _not_ allowed to use _any_ Tensorflow, Keras, or PyTorch functions throughout HW3. The autograder will intentionally not execute if you import these libraries.

You are already familiar with **Tensorflow** from our first assignment. Now your job will be to build your own version of it: Beras.

## Roadmap

Don't worry if these tasks seem daunting at first glance! We've included a lot more info down below on specific implementation details.

1. Start with **`beras/core.py`**, which will create some of the basic building blocks for the assignment. [Specifics](#1-berascorepy)
2. Move on to **`beras/layers.py`** to construct your own `Dense` layer. [Specifics](#2-beraslayerspy)
3. Now complete **`beras/activations.py`**. [Specifics](#3-activations)
4. Continue with **`beras/losses.py`** to write **MeanSquaredError** and **CategoricalCrossEntropy**. [Specifics](#4-beraslossespy)
5. Next write **CategoricalAccuracy** in **`beras/metrics.py`**. [Specifics](#5-berasmetricspy)
6. Implement **`beras/onehot.py`**, which will be used later in preprocessing. [Specifics](#6-berasonehotpy)
7. Fill in the optimizer classes in **`beras/optimizer.py`**. [Specifics](#7-berasoptimizerpy)
8. Write **GradientTape** in **`beras/gradient_tape.py`**. [Specifics](#8-berasgradient_tapepy)
9. Construct the **Model** class in **`beras/model.py`**. [Specifics](#9-berasmodelpy)

:::danger
**GradientTape** is known to be tricky, so budget some extra time to implement it.
:::

10. Now you have finished the Beras framework! Put it to use by implementing **`preprocessing.py`** to load and clean your data. [Specifics](#10-preprocessingpy)
11. Finally, write **`assignment.py`** to train a model on the MNIST Dataset! [Specifics](#11-assignmentpy)

Gaurav (one of your wonderful TAs) put together this nice graphic to visualize the roadmap and how it all fits together. It's helpful to refer to as you go through the assignment!

![BERAS GRAPH (1)](https://hackmd.io/_uploads/HJ9yRMzk1e.png)

*Thanks Gaurav!*

## 1. beras/core.py

In this section we are going to prepare some abstract classes we will use for everything else we do in this assignment. This is a very important section since we will build everything else on top of this foundation.

:::info
**Task 1.1 [Tensor]:** We will begin completing the construction of the `Tensor` class at the top of the file. Note that it subclasses the `np.ndarray` datatype; you can find out more about what that means <ins>[here](https://numpy.org/doc/stable/user/basics.subclassing.html)</ins>. The only TODO is to pass in the data to the `a` kwarg in `np.asarray(a=???)` in the `__new__` method.
:::

:::warning
__You should know:__
You'll notice the `Tensor` class is nothing more than a standard `np.ndarray` but with an additional `trainable` attribute. In Tensorflow there is also `tf.Variable`, which you may see throughout the course. `tf.Variable` is a subclass of `tf.Tensor` but with some additional bells and whistles for convenience. Both are fair game for use when working with Tensorflow in this course.
:::

:::info
**Task 1.2 [Callable]:** There are no TODOs in `Callable`, but it is important to familiarize yourself with this class. `Callable` simply allows its subclasses to use `self()` and `self.forward()` interchangeably. More importantly, if a class subclasses `Callable` it **will** have a `forward` method that returns a `Tensor`, and so we **can and will** use these subclasses when **constructing layers and models** later.
:::

:::warning
__You should know:__
__Keras__ and __Tensorflow__ use `call` instead of `forward` as the method name for the forward pass of a layer. __Pytorch__ and __Beras__ use `forward` to make the distinction between `__call__` and `forward` clear.
:::

:::info
**Task 1.3 [Weighted]:** There are 4 methods in `Weighted` for you to fill out: `trainable_variables`, `non_trainable_variables`, `trainable`, and `trainable (setter)`. Each method has a description and return type (if needed) in the stencil code. Be sure to follow the typing **exactly** or it's unlikely to pass the autograder.
:::

:::success
__Note:__
If you need a refresher on Python attributes and properties, you can refer to <ins>[this](https://realpython.com/python-getter-setter/)</ins> helpful guide.
:::

:::info
**Task 1.4 [Diffable.\_\_call__]:** There are no coding TODOs in `Diffable.__call__`, but it is **critical** that you spend some time familiarizing yourself with what it is doing. Understanding this method will help clear up later parts of the assignment.
:::

:::warning
__You should know:__
Recall that in Python, `generic_class_name()` is equal to `generic_class_name.__call__()`. Note that `Diffable` implements the `__call__` method and __not__ the `forward` method. When we subclass `Diffable`, for example with `Dense`, we __will__ implement `forward` there. Then, when we use something like `dense_layer(inputs)`, __the gradients will be recorded using `GradientTape`__ as you see in `Diffable.__call__`. If you use `dense_layer.forward(inputs)`, __it will not record the gradients__ because `forward` won't handle the necessary logic.
:::

Finally, you will see the methods `compose_input_gradients` and `compose_weight_gradients`. These methods will be critical when writing `GradientTape.gradient`. They are how you will compose the upstream gradient of a tensor with the input and weight gradients of a `Diffable`. You don't have to give these methods a close read now, but it's important to come back and use them later on.
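For Task 1.1, the linked NumPy guide boils down to the view-casting pattern below. This is a generic, minimal sketch of an `np.ndarray` subclass carrying an extra attribute; the class name `MyTensor` and its `trainable` default are illustrative and not the stencil's `Tensor`.

```python
import numpy as np

class MyTensor(np.ndarray):
    """Minimal np.ndarray subclass that carries an extra `trainable` flag (illustrative only)."""

    def __new__(cls, data, trainable=True):
        # View-cast the incoming data into an instance of this subclass.
        obj = np.asarray(a=data).view(cls)
        obj.trainable = trainable
        return obj

    def __array_finalize__(self, obj):
        # Called after view-casting and slicing; propagate the extra attribute.
        if obj is None:
            return
        self.trainable = getattr(obj, "trainable", True)

t = MyTensor([[1.0, 2.0], [3.0, 4.0]], trainable=False)
print(t.shape, t.trainable)  # (2, 2) False
```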
## 2. beras/layers.py

In this section we need to fill out the methods for `Dense`. We give you `__init__` and `weights`; you should read through both of these one-liners to know what they are doing. Please don't change these, since the autograder relies on the naming conventions. Your tasks will be to implement the rest of the methods we need.

:::info
__Task 1 [Dense.forward]:__ To begin, fill in the `Dense.forward` method. The parameter `x` represents our input. Remember from class that our Dense layer performs the following to get its output:

$$
f(\mathbf{x}) = \mathbf{x}\mathbf{W} + \mathbf{b}
$$

Keep in mind that `x` has shape `(num_samples, input_size)`.
:::

:::info
__Task 2 [Dense.get_input_gradients]:__ Refer to the formula you wrote in `Dense.forward` to compute $\frac{\partial f}{\partial x}$. Be sure to return the gradient `Tensor` __as a list__; this will come in handy when you write backpropagation in `beras/gradient_tape.py` later in this assignment.
:::

:::success
__Note:__
For each `Diffable` you can access the inputs of the forward method with `self.inputs`.
:::

:::info
__Task 3 [Dense.get_weight_gradients]:__ Compute both $\frac{\partial f}{\partial w}$ and $\frac{\partial f}{\partial b}$ and return both `Tensor`s __in a list__, like you did in `Dense.get_input_gradients`.
:::

:::info
__Task 4 [Dense.\_initialize_weight]:__ Initialize the dense layer's weight values. By default, return 0 for all weights (usually a bad idea). You are also required to allow for more sophisticated options by supporting the following:

- **Normal:** Passing `normal` causes the weights to be initialized with a unit normal distribution $\mathcal{N}(0,1)$.
- **Xavier Normal:** Passing `xavier` causes the weights to be initialized in the same way as `keras.GlorotNormal`.
- **Kaiming He Normal:** Passing `kaiming` causes the weights to be initialized in the same way as `keras.HeNormal`.

Explicit definitions for each of these initializers can be found **[in the tensorflow docs](https://www.tensorflow.org/api_docs/python/tf/keras/initializers)**.

Note: `_initialize_weight` __returns__ the weights and biases and does not set the weight attributes directly.
:::
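To make the shapes and initializer formulas concrete, here is a hedged, standalone NumPy sketch of a dense forward pass and the three initialization modes (plain functions, not the stencil's `Dense` class; note that Keras' `GlorotNormal` and `HeNormal` draw from *truncated* normals, which this sketch omits).

```python
import numpy as np

def dense_forward(x, W, b):
    # x: (num_samples, input_size), W: (input_size, output_size), b: (output_size,)
    return x @ W + b

def init_weights(input_size, output_size, mode="zero"):
    if mode == "normal":   # unit normal N(0, 1)
        return np.random.normal(0.0, 1.0, (input_size, output_size))
    if mode == "xavier":   # Glorot: std = sqrt(2 / (fan_in + fan_out)); Keras also truncates
        return np.random.normal(0.0, np.sqrt(2.0 / (input_size + output_size)), (input_size, output_size))
    if mode == "kaiming":  # He: std = sqrt(2 / fan_in); Keras also truncates
        return np.random.normal(0.0, np.sqrt(2.0 / input_size), (input_size, output_size))
    return np.zeros((input_size, output_size))  # default: all zeros

x = np.random.rand(4, 784)                # a batch of 4 flattened MNIST images
W = init_weights(784, 10, mode="xavier")
b = np.zeros(10)
print(dense_forward(x, W, b).shape)       # (4, 10)
```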
## 3. beras/activations.py

Here, we will implement a couple of activation functions that we will use when constructing our model.

:::info
__Task 3.1 [LeakyReLU]:__ Fill out the forward pass and input gradient computation for `LeakyReLU`. You'll notice these are the same methods we implemented in `layers.py`; this is by design.
:::

:::success
__Hint:__
The LeakyReLU derivative is not continuous, so when computing the gradient, consider the positive and negative cases separately.

Note: Though LeakyReLU is technically not differentiable at exactly $0$, we can just leave the gradient as $0$ for any $0$ input.
:::

:::info
__Task 3.2 [Sigmoid]:__ Complete the `forward` and `get_input_gradients` methods for `Sigmoid`.
:::

:::info
__Task 3.3 [Softmax]:__ Write the forward pass and gradient computation w.r.t. inputs for `Softmax`.
:::

:::success
__Hints:__
- You should use stable softmax to prevent overflow and underflow issues. Details are in the stencil.
- Combining `np.outer` and `np.fill_diagonal` will significantly clean up the gradient computation.
- When you first try to compute the gradient, it will become apparent that the input gradients are tricky. This [Medium article](https://towardsdatascience.com/derivative-of-the-softmax-function-and-the-categorical-cross-entropy-loss-ffceefc081d1) has a fantastic derivation that will make your life a lot easier.
:::
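As a reference for the stability hint, here is a minimal row-wise stable softmax sketch (a standalone function, not the stencil's `Softmax` class): subtracting each row's maximum before exponentiating leaves the output unchanged but prevents overflow.

```python
import numpy as np

def stable_softmax(z):
    # z: (batch_size, num_classes) logits
    shifted = z - np.max(z, axis=-1, keepdims=True)  # subtract each row's max
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

logits = np.array([[1000.0, 1001.0, 1002.0]])        # would overflow a naive softmax
print(stable_softmax(logits))                         # ~[[0.090, 0.245, 0.665]]
```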
## 4. beras/losses.py

In this section we need to construct our loss functions for the assignment, `MeanSquaredError` and `CategoricalCrossEntropy`. You should note that for most classification tasks we use `CategoricalCrossEntropy` by default, but for this assignment we will use both and compare the results.

:::success
__Note:__
You'll notice we construct a `Loss` class that both `MeanSquaredError` and `CategoricalCrossEntropy` inherit from. This is just so that we don't have to specify that our loss functions don't have weights every time we create one.
:::

:::info
__Task 1 [MeanSquaredError.forward]:__ Implement the forward pass for `MeanSquaredError`. We want `y_pred - y_true`, and not the other way around. Don't forget that we expect to take in _batches_ of examples at a time, so we will need to take the mean over the batch as well as the mean for each individual example. In short, the output should be the mean of means. Don't forget that `Tensor`s are a subclass of `np.ndarray`s, so we can use numpy methods!
:::

:::warning
__You should know:__
In general, loss functions should return exactly 1 scalar value no matter how many examples are in the batch. We take the mean loss over the batch examples in most practical cases. We will see later in the course that we can use multiple measures of loss at one time to train a model, in which case we often take a weighted sum of each individual loss as our loss value to backpropagate on.
:::

:::info
__Task 2 [MeanSquaredError.get_input_gradients]:__ Just as we did for our dense layer, compute the gradient with respect to the inputs of `MeanSquaredError`. It's important to remember that there are two inputs, `y_pred` and `y_true`. Since `y_true` comes from our dataset and does not depend on our params, you should treat it like a constant vector. On the other hand, compute the gradient with respect to `y_pred` exactly as you did in `Dense`. Remember to return them both as a list!
:::

:::success
__Hint:__
If you aren't quite sure how to access your inputs, remember that `MeanSquaredError` is a `Diffable`!
:::

:::info
__Task 3 [CategoricalCrossEntropy.forward]:__ Implement the forward pass of `CategoricalCrossEntropy`. Make sure to find the per-sample average of the CCE loss! You may run into trouble with values very close to 0 or 1; you may find `np.clip` of use...
:::

:::info
__Task 4 [CategoricalCrossEntropy.get_input_gradients]:__ Get input gradients for `CategoricalCrossEntropy`.
:::
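To illustrate the "mean of means" for MSE and the clipping hint for cross-entropy, here is a hedged standalone sketch (illustrative functions, not the stencil's loss classes; the `eps` value is an assumption).

```python
import numpy as np

def mse(y_pred, y_true):
    # Mean over each example's outputs, then mean over the batch: the "mean of means".
    return np.mean(np.mean((y_pred - y_true) ** 2, axis=-1))

def categorical_cross_entropy(y_pred, y_true, eps=1e-7):
    # Clip probabilities away from 0 and 1 so the log stays finite.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_sample = -np.sum(y_true * np.log(y_pred), axis=-1)
    return np.mean(per_sample)  # a single scalar for the whole batch

y_true = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
y_pred = np.array([[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]])
print(mse(y_pred, y_true), categorical_cross_entropy(y_pred, y_true))
```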
## 5. beras/metrics.py

There isn't much to do in this file; just implement the forward method for `CategoricalAccuracy`.

:::info
__Task 1 [CategoricalAccuracy.forward]:__ Fill in the `forward` method. Note that our input `probs` represents the probability of each class as predicted by the model, and `labels` is a one-hot encoded vector representing the true class.
:::

:::success
__Hint:__
It may be helpful to also think of the labels as a probability distribution, where the probability of the true class is 1 and that of all other classes is 0. If the index of the max value in both vectors is the same, then our model has made the correct classification.
:::

## 6. beras/onehot.py

`onehot.py` only contains the `OneHotEncoder` class, which is where you will code a one-hot encoder to use on the data when you preprocess later in the assignment. Recall that a one-hot encoder transforms a given value into a vector with all entries being 0 except one with a value of 1 (hence "one hot"). This is used often when we have multiple discrete classes, like the digits in the MNIST dataset.

:::info
__Task 1 [OneHotEncoder.fit]:__ In HW2 you were able to use `tf.one_hot`; now you get to build it yourself! In `OneHotEncoder.fit` you will take in a 1d vector of labels, and you should construct a dictionary that maps each unique label to a one-hot vector. This method doesn't return anything.

__Note__: you should only associate a one-hot vector with labels actually present in `labels`!
:::

:::success
__Hint:__
`np.unique` and `np.eye` may be of use here.
:::

:::info
__Task 2 [OneHotEncoder.forward]:__ Fill in the `OneHotEncoder.forward` method to transform the given 1d array `data` into a one-hot-encoded version of the data. This method should return a 2d `np.ndarray`.
:::

:::info
__Task 3 [OneHotEncoder.inverse]:__ `OneHotEncoder.inverse` should be an exact inverse of `OneHotEncoder.forward`, such that `OneHotEncoder.inverse(OneHotEncoder.forward(data)) = data`.
:::
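Both ideas fit in a few lines of NumPy. The sketch below is illustrative only (plain functions rather than the stencil's `CategoricalAccuracy` and `OneHotEncoder`), assuming integer class labels.

```python
import numpy as np

def categorical_accuracy(probs, labels):
    # A prediction is correct when the argmax of the predicted distribution
    # matches the argmax of the one-hot label.
    return np.mean(np.argmax(probs, axis=-1) == np.argmax(labels, axis=-1))

def one_hot(data):
    # Map each unique label to a distinct row of an identity matrix.
    uniques = np.unique(data)
    label_to_vec = {label: np.eye(len(uniques))[i] for i, label in enumerate(uniques)}
    return np.array([label_to_vec[d] for d in data])

labels = np.array([2, 0, 1, 2])
encoded = one_hot(labels)                                  # shape (4, 3)
probs = np.array([[0.1, 0.2, 0.7],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.5, 0.3],
                  [0.3, 0.3, 0.4]])
print(categorical_accuracy(probs, encoded))                # 1.0
decoded = np.unique(labels)[np.argmax(encoded, axis=-1)]   # inverse: back to [2, 0, 1, 2]
```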
## 7. beras/optimizer.py

In `beras/optimizer.py` there are 3 optimizers we'd like you to implement: `BasicOptimizer`, `RMSProp`, and `Adam`. In practice, `Adam` is tough to beat, so more often than not you will default to using it. Each optimizer has an `__init__` and an `apply_gradients` method. We give you the `__init__` for each optimizer, which contains all the hyperparams and variables you will need for each algorithm. Then in `apply_gradients` you will write the algorithm for each method to update the `trainable_params` according to the given `grads`. Both `trainable_params` and `grads` are lists with

$$
\text{grad}[i] = \frac{\partial \mathcal{L}}{\partial\,\text{trainable\_params}[i]}
$$

where $\mathcal{L}$ is the loss of the network.

:::info
__Task 1 [BasicOptimizer]:__ Write the `apply_gradients` method for the `BasicOptimizer`. For any given `trainable_param`, $w[i]$, and `learning_rate`, $r$, the optimization formula is given by

$$
w[i] = w[i] - \frac{\partial \mathcal{L}}{\partial w[i]}*r
$$
:::

:::info
__Task 2 [RMSProp]:__ Write the `apply_gradients` method for `RMSProp`. In `RMSProp` there are two new hyperparams, $\beta$ and $\epsilon$. $\beta$ is referred to as the __decay rate__ and typically defaults to 0.9. This decay rate has the effect of _lowering the learning rate as the model trains_. Intuitively, as our loss decreases we are closer to a minimum and should take smaller steps towards optimization to ensure we don't optimize past the minimum. $\epsilon$ is a small constant to prevent division by 0.

In addition to our hyperparams there is another term, which we will call __v__, that acts as a moving average of the squared gradients __for each param__. We update this value in addition to the `trainable_params` every time we apply the gradients. For any given `trainable_param`, $w[i]$, and `learning_rate`, $r$, the update is defined by

$$
v[i] = \beta*v[i] + (1-\beta)*\left(\frac{\partial \mathcal{L}}{\partial w[i]}\right)^2
$$

$$
w[i] = w[i] - \frac{r}{\sqrt{v[i]} + \epsilon}*\frac{\partial \mathcal{L}}{\partial w[i]}
$$

**Hint**: In our stencil code, we provide **v** as a dictionary which maps a key to a float. Keep in mind that we only need to store a single **v** value for each weight!
:::

:::info
__Task 3 [Adam]:__ Write the `apply_gradients` method for `Adam`. At its core, Adam is similar to `RMSProp`, but it has more smoothing terms and computes an additional _momentum_ term to further balance the learning rate as we train. This momentum term has its own decay rate, $\beta_1$. Additionally, `Adam` keeps track of the number of optimization steps performed to further tweak the effective learning rate. Here is what an optimization step with `Adam` looks like for `trainable_param`, $w[i]$, and `learning_rate`, $r$:

$$
m[i] = m[i]*\beta_1 + (1-\beta_1)*\left(\frac{\partial \mathcal{L}}{\partial w[i]}\right)
$$

$$
v[i] = v[i]*\beta_2 + (1-\beta_2)*\left(\frac{\partial \mathcal{L}}{\partial w[i]}\right)^2
$$

$$
\hat{m} = m[i]/(1-\beta_1^t)
$$

$$
\hat{v} = v[i]/(1-\beta_2^t)
$$

$$
w[i] = w[i] - \frac{r*\hat{m}}{\sqrt{\hat{v}}+\epsilon}
$$

Note: Don't forget to __increment the time step once__ each time `apply_gradients` is called!
:::

:::success
__Hint:__
Don't overcomplicate this section; it really is as simple as programming the algorithms as they are written.
:::
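The update rules above translate almost line-for-line into NumPy. Below is a hedged sketch of one RMSProp step and one Adam step for a single parameter (standalone functions, not the stencil's optimizer classes; the hyperparameter defaults shown are common choices, not necessarily the stencil's).

```python
import numpy as np

def rmsprop_step(w, grad, v, lr=0.001, beta=0.9, eps=1e-7):
    # v holds the running average of squared gradients for this parameter.
    v = beta * v + (1 - beta) * grad ** 2
    w = w - lr / (np.sqrt(v) + eps) * grad
    return w, v

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    # t is the step count (already incremented); m and v are per-parameter moments.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)  # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

grad = np.array([0.1, -0.2, 0.3])
w, v = rmsprop_step(np.ones(3), grad, v=np.zeros(3))
w2, m2, v2 = adam_step(np.ones(3), grad, m=np.zeros(3), v=np.zeros(3), t=1)
print(w, w2)
```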
## 8. beras/gradient_tape.py

In `beras/gradient_tape.py` you are going to implement your very own context manager, `GradientTape`, which is extremely similar to the one actually used in Keras. We give you `__init__`, `__enter__`, and `__exit__`; your job is to implement `GradientTape.gradient`.

:::danger
__Warning:__
This section has historically been difficult for students. It's helpful to carefully consider our hints and the conceptual ideas behind `gradient` __before__ beginning your implementation. You should also brush up on [Breadth First Search](https://www.geeksforgeeks.org/breadth-first-search-or-bfs-for-a-graph/) to make the implementation easier. If you get stuck, feel free to come back to this section later on.
:::

:::info
__Task 1 [GradientTape.gradient]:__ Implement the `gradient` method to compute the gradient of the loss with respect to each of the trainable params. The output of this method should be a list of gradients, one for each of the trainable params.
:::

:::success
__Hint:__
You'll utilize the `compose_input_gradients` and `compose_weight_gradients` methods from `Diffable` to collect the gradients at every step. Keep in mind that you can call `id(<tensor_variable>)` to get the object id of a tensor object!
:::

:::warning
__You should know:__
When you finish this section you will have written a highly generalized gradient method that could handle an arbitrary network. This method will also function almost exactly how Keras implements the `gradient` method. This is a very powerful method, but it is just one way to implement autograd.
:::

## 9. beras/model.py

In `beras/model.py` we are going to construct a general `Model` abstract class that we will use to define our `SequentialModel`. The `SequentialModel` simply calls all of its layers in order for the forward pass.

:::warning
__You should know:__
At first it may seem like all neural nets would be `SequentialModel`s, but there are some architectures, like __ResNets__, that break the sequential assumption.
:::

:::info
__Task 1 [Model.weights]:__ Construct a list of all weights in the model and return it.
:::

:::success
We give you `Model.compile`, which just sets the optimizer, loss, and accuracy attributes in the model. In Keras, compile is a huge method that prepares these components to make them hyper-efficient. That implementation is highly technical and outside the scope of the course, but feel free to look into it if you are interested.
:::

:::info
__Task 2 [Model.fit]:__ This method should train the model for the given number of `epochs` on the data `x` and `y`, using batches of size `batch_size`. Importantly, you want to make sure you record the metrics throughout training and print stats as you go so that you can watch the metrics as the model trains. You can use the `print_stats` and `update_metric_dict` functions provided. Note that neither of these functions returns any values; `print_stats` prints the values directly, and `update_metric_dict(super_dict, sub_dict)` updates `super_dict` with the mean metrics from `sub_dict`.

__Note:__ You do __not__ need to call the model here; you should instead use `self.batch_step(...)`, which all child classes of `Model` will implement.
:::

:::info
__Task 3 [Model.evaluate]:__ This method should look _very similar_ to `Model.fit`, except we need to ensure the model does not train on the testing data. Additionally, we test on the entirety of the test set one time, so there is no need for the `epochs` parameter from `Model.fit`.
:::

:::info
__Task 4 [SequentialModel.forward]:__ This method passes the input through each layer in `self.layers` sequentially.
:::

:::info
__Task 5 [SequentialModel.batch_step]:__ This method makes a model prediction and computes the loss __inside of `GradientTape`__, just like you did in HW2. Be sure to use the `training` argument to adjust the weights of the model only when `training` is True. This method should return the _loss and accuracy_ for the batch in a dictionary.

**Note**: you should have used the implicit `__call__` function for the layers in the forward method to ensure that each layer gets tracked by the `GradientTape` (i.e., `layer(x)`). However, because we don't define how to calculate gradients for a `SequentialModel`, make sure to use the `SequentialModel`'s `forward` method in `batch_step`.
:::
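As a shape-level illustration of the epoch/batch structure `Model.fit` needs, here is a hedged sketch; the `batch_step` argument is a stand-in callable and the aggregation only mimics what `update_metric_dict`/`print_stats` would do, so none of the names below are the stencil's.

```python
import numpy as np

def fit_sketch(x, y, epochs, batch_size, batch_step):
    # batch_step(bx, by, training) is assumed to return a dict of batch metrics.
    for epoch in range(epochs):
        batch_metrics = []
        for start in range(0, len(x), batch_size):
            bx, by = x[start:start + batch_size], y[start:start + batch_size]
            batch_metrics.append(batch_step(bx, by, training=True))
        # Aggregate the per-batch metrics (stand-in for update_metric_dict / print_stats).
        means = {k: float(np.mean([m[k] for m in batch_metrics])) for k in batch_metrics[0]}
        print(f"epoch {epoch}: {means}")

# Tiny usage example with a dummy batch_step:
x, y = np.random.rand(10, 4), np.random.rand(10, 1)
dummy_step = lambda bx, by, training: {"loss": float(np.mean((bx.sum(axis=1, keepdims=True) - by) ** 2))}
fit_sketch(x, y, epochs=2, batch_size=4, batch_step=dummy_step)
```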
## 10. `preprocess.py`

In this section you will fill out the `load_and_preprocess_data()` function, which will load in, flatten, normalize, and convert all the data into `Tensor`s.

:::info
__Task 1 [load_and_preprocess_data()]:__ We provide the code to load in the data; your job is to:

1. Normalize the values so that they are between 0 and 1.
2. Flatten the arrays such that they are of shape `(number of examples, 28*28)`.
3. Convert the arrays to `Tensor`s and return the train inputs, train labels, test inputs, and test labels __in that order__.
4. You should NOT shuffle the data in this method or apply any transformations other than what we describe in 1-3. Importantly, you should NOT return one-hot labels. You'll create those when training and testing.
:::

## 11. `assignment.py`

Here you put it all together! We won't autograde this file; it is just for you to train and test your model with different architectures. Try starting out with small, simple models and play around with different architectures, optimizers, hyperparams, etc. to find a configuration that achieves over 95% accuracy on the testing data.

:::info
__Task 1 [Save predictions]:__ Once you have an architecture that works, use `np.save("predictions.npy", arr)` (where `arr` is an `np.ndarray`) to save your predictions for the test set. Please name the file predictions.npy when you submit to Gradescope, or the autograder may not find it.

It might be helpful to change what's being returned by/done in `batch_step` and `evaluate` for this! Just make sure that `batch_step` **only returns the metric dictionary** when `training=True`, as that's what we test for.

*Note: We have a number of safeguards in place to prevent folks from cheating, and the autograder __will not__ tell you that your submission has been flagged. Instead, you'll get an email from Prof. Sun some time after you submit.*
:::

## Submission

### Requirements

You'll need to submit all the files associated with the assignment, and a "predictions.npy" containing your best model predictions.

### Grading

Your code will be primarily graded on functionality, as determined by the Gradescope autograder.

:::warning
You will not receive any credit for functions that use `tensorflow`, `keras`, `torch`, or `scikit-learn` functions within them. You must implement all functions manually using either vanilla Python or NumPy.
:::

### Handing In

You should submit the assignment via Gradescope under the corresponding assignment, either through GitHub or by submitting all files individually.

To submit via GitHub, commit and push all of your changes to your repository. You can do this by running the following commands.

```bash
git commit -am "commit message"
git push
```

For those of y'all who are already familiar with `git`: the `-am` flag to `git commit` is a pretty cool shortcut which adds all of our modified files and commits them with a commit message.

**Note:** We highly recommend committing your files to git and syncing with your GitHub repository **often** throughout the course of the assignment to ensure none of your hard work is **lost**!

### Leaderboard

There is a leaderboard active for this assignment; __this is purely for fun and is not graded.__
