# Deep Learning Assignment
Note: Every model should be implemented in `PyTorch`, and don't use any sort of AI assistance for writing the code; it's for your benefit, and it's okay if you aren't able to get the expected results.
Q1. Train a character-level RNN (hyperparameters are up to you) on [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)
* The model class should have a `generate` method which takes parameters like `context_window`, `length` and `prompt` as input
* `TinyStories` consists of multiple paragraphs; use some of them for training and some for testing (the split is up to you), and save the paragraphs you use in a text file.
* It should also have a `train` method, which takes parameters like `learning_rate`, `iterations`, `batch_size`, etc. as input
* Use two `nn.RNN` modules; the remaining hyperparameters are up to you
* Also implement a command-line version for it. I should be able to generate text using this command (a minimal interface sketch follows the example):
```bash
python3 filename.py \
--prompt "Once upon a time" \
--context-length 50 \
--num-chars 200
```
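For reference, here is a minimal sketch of the expected interface; the class name, argument names, and defaults are illustrative assumptions, not requirements. It shows the two stacked `nn.RNN` modules, a sampling `generate` method, and the argparse CLI; dataset loading, the `train` method, and checkpointing are omitted.
```python
import argparse
import torch
import torch.nn as nn

class CharRNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn1 = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.rnn2 = nn.RNN(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, idx):                    # idx: (batch, seq) of char ids
        x = self.embed(idx)
        x, _ = self.rnn1(x)
        x, _ = self.rnn2(x)
        return self.head(x)                    # (batch, seq, vocab_size)

    @torch.no_grad()
    def generate(self, prompt, context_window, length, stoi, itos):
        idx = torch.tensor([[stoi[c] for c in prompt]])
        for _ in range(length):
            logits = self(idx[:, -context_window:])         # crop to context
            probs = torch.softmax(logits[:, -1, :], dim=-1)
            idx = torch.cat([idx, torch.multinomial(probs, 1)], dim=1)
        return "".join(itos[i] for i in idx[0].tolist())

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--prompt", default="Once upon a time")
    parser.add_argument("--context-length", type=int, default=50)
    parser.add_argument("--num-chars", type=int, default=200)
    args = parser.parse_args()
    # Build stoi/itos from your training text, load trained weights, then call
    # model.generate(args.prompt, args.context_length, args.num_chars, stoi, itos)
```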
Q2. I hope you all are familiar with [attention](https://www.youtube.com/watch?v=nfs8NYg7yQM&t=47s), the good ol' equation: $$\mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$
* Now your task is to add a single-head `self-attention` layer between those two `nn.RNN` modules (keeping the other parameters the same); after training, compare the loss curves and mean log-likelihood (on both train and test sets) for the two models (a sketch of the attention layer follows this list).
* Take a sentence as input and show the self-attention matrix over its characters as a heatmap; implement this as a function (put both of the above tasks in a single notebook)
* Make a command-line inference script for this one as well, in the format above
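A minimal sketch of the single-head layer implementing the equation above (naming is my own; it also returns the attention matrix so the heatmap function can reuse it):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Single-head scaled dot-product self-attention (sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x):                       # x: (batch, seq, dim)
        Q, K, V = self.q(x), self.k(x), self.v(x)
        scores = Q @ K.transpose(-2, -1) / (x.size(-1) ** 0.5)
        attn = F.softmax(scores, dim=-1)        # (batch, seq, seq)
        return attn @ V, attn                   # plot attn[0] as the heatmap
```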
Q3. Implement a simple 3-layer neural network trained on the MNIST dataset, but this time write the complete backward pass yourself. The model class should have both `forward` and `backward` methods
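To make the expected interface concrete, here is a minimal autograd-free sketch, assuming ReLU hidden layers, a mean softmax cross-entropy loss, and a plain SGD update baked into `backward`; your layer sizes, activation, and update rule may differ.
```python
import torch

class ThreeLayerNet:
    def __init__(self, sizes=(784, 256, 128, 10)):
        # He initialization, suited to the ReLU layers
        self.W = [torch.randn(i, o) * (2.0 / i) ** 0.5
                  for i, o in zip(sizes[:-1], sizes[1:])]
        self.b = [torch.zeros(o) for o in sizes[1:]]

    def forward(self, x):                      # x: (batch, 784) floats
        self.cache = [x]                       # keep activations for backward
        for l, (W, b) in enumerate(zip(self.W, self.b)):
            x = x @ W + b
            if l < len(self.W) - 1:            # ReLU on hidden layers only
                x = torch.relu(x)
            self.cache.append(x)
        return x                               # logits, shape (batch, 10)

    def backward(self, logits, y, lr=0.1):
        # gradient of mean softmax cross-entropy w.r.t. the logits
        grad = torch.softmax(logits, dim=1)
        grad[torch.arange(len(y)), y] -= 1.0
        grad /= len(y)
        for l in reversed(range(len(self.W))):
            gW = self.cache[l].T @ grad        # dLoss/dW[l]
            gb = grad.sum(0)                   # dLoss/db[l]
            if l > 0:                          # back through W[l], then ReLU
                grad = (grad @ self.W[l].T) * (self.cache[l] > 0).float()
            self.W[l] -= lr * gW               # plain SGD step
            self.b[l] -= lr * gb
```
One training step is then `logits = net.forward(xb)` followed by `net.backward(logits, yb)`, with `xb` a batch of flattened MNIST images.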
Q4. Try to fit $f(x) = \sin(x)$ for $0 < x < 2\pi$ with a two-layer neural network using different activation functions (ReLU, GELU, SiLU, LeakyReLU, ELU) and plot the resulting function for each of them. Also measure the training and validation loss, where the validation set contains datapoints from $-\pi < x < 0$ and $2\pi < x < 3\pi$.
Show which of them works best and try to justify why.
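A minimal training-loop sketch looping over the listed activations (64 hidden units, Adam, MSE, 2000 steps are illustrative assumptions; plotting is left to you):
```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
x_train = torch.rand(1024, 1) * 2 * math.pi                      # 0 < x < 2*pi
x_val = torch.cat([torch.rand(512, 1) * math.pi - math.pi,       # -pi < x < 0
                   torch.rand(512, 1) * math.pi + 2 * math.pi])  # 2pi < x < 3pi
y_train, y_val = torch.sin(x_train), torch.sin(x_val)

for name, Act in [("ReLU", nn.ReLU), ("GELU", nn.GELU), ("SiLU", nn.SiLU),
                  ("LeakyReLU", nn.LeakyReLU), ("ELU", nn.ELU)]:
    net = nn.Sequential(nn.Linear(1, 64), Act(), nn.Linear(64, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(2000):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(x_train), y_train)
        loss.backward()
        opt.step()
    with torch.no_grad():
        val = nn.functional.mse_loss(net(x_val), y_val)
    print(f"{name}: train {loss.item():.4f}, val {val.item():.4f}")
```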
Q5. Train a CNN on the `CIFAR10` dataset and do the following:
* Show the activation maps and output distribution of the first two layers for a single input image
* Now do the same thing after adding `BatchNorm2d` before the activation and plot the results (in this case, plot the output distribution after the batchnorm layer)
* Also give a brief take on "Why does BatchNorm work?" **according to you, not ChatGPT**, after viewing the above results
* Now take a good-looking picture (according to you 😅) from every class of CIFAR10, apply the `SmoothGrad` technique, and plot the resulting saliency maps for those images (a sketch follows the links). You can refer to these:
* [Smooth Grad Article](https://christophm.github.io/interpretable-ml-book/pixel-attribution.html#smoothgrad)
* [Smooth Grad Paper](https://arxiv.org/abs/1706.03825)
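As a starting point, here is one way SmoothGrad could be sketched, following the paper: average the input gradient over several noisy copies of the image. The function name and defaults are my own assumptions.
```python
import torch

def smoothgrad(model, image, label, n_samples=50, sigma=0.15):
    """Average the saliency (input gradient) over noisy copies of `image`.
    `sigma` is the noise std as a fraction of the image's value range."""
    model.eval()
    std = sigma * (image.max() - image.min()).item()
    grads = torch.zeros_like(image)             # image: (C, H, W)
    for _ in range(n_samples):
        noisy = image + std * torch.randn_like(image)
        noisy = noisy.unsqueeze(0).requires_grad_(True)
        model(noisy)[0, label].backward()       # class score -> input gradient
        grads += noisy.grad.squeeze(0)
    # collapse channels, e.g. max absolute gradient -> (H, W) saliency map
    return (grads / n_samples).abs().max(dim=0).values
```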
Q6. Fine-tune MobileNetV2 to predict the age, gender and race of a person using the [UTKFace Dataset](https://susanqq.github.io/UTKFace/). Specifically, fully retrain/replace the dense layer (and the output layer, of course) and fine-tune the last 2-3 trainable convolutional layers. Points to keep in mind:
* You have to train one model that performs all 3 tasks, so your model will have 5 outputs for race (one per class), 1 output for gender (because there are only 2 genders 😘), and 1 output for age
* This kind of setup, where one model performs multiple tasks, is uncommon. You have to design your own loss function that combines the individual losses and balances them well (a sketch follows this list). That is the challenge!
* Your final model should roughly achieve an MAE of <=10 years for age prediction, >=80% gender accuracy, and >=50% race accuracy on the test set. Use an 80-20 train-test split
* Have fun playing with the model on images of your own choice; your final submission notebook should include the prediction stats for your own face and any senior of your choice!
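One possible shape for the multi-task heads and the combined loss, as a sketch: the number of unfrozen blocks, head sizes, and loss weights below are assumptions to be tuned, not requirements, and the `weights=` argument assumes a recent torchvision.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class FaceMultiTask(nn.Module):
    def __init__(self):
        super().__init__()
        base = models.mobilenet_v2(weights="IMAGENET1K_V1")
        self.features = base.features
        for p in self.features.parameters():        # freeze the backbone...
            p.requires_grad = False
        for p in self.features[-3:].parameters():   # ...except the last blocks
            p.requires_grad = True
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.age = nn.Linear(1280, 1)       # regression output
        self.gender = nn.Linear(1280, 1)    # single binary logit
        self.race = nn.Linear(1280, 5)      # 5-class logits

    def forward(self, x):
        h = self.pool(self.features(x)).flatten(1)  # (batch, 1280)
        return self.age(h), self.gender(h), self.race(h)

def combined_loss(age_p, gen_p, race_p, age, gen, race, w=(0.05, 1.0, 1.0)):
    # The weights rescale the losses to comparable magnitudes (the age L1
    # loss is in years, so it dominates unweighted); tune them on validation.
    l_age = F.l1_loss(age_p.squeeze(1), age.float())
    l_gen = F.binary_cross_entropy_with_logits(gen_p.squeeze(1), gen.float())
    l_race = F.cross_entropy(race_p, race)
    return w[0] * l_age + w[1] * l_gen + w[2] * l_race
```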