# Machine Learning Practice Problems
### CS181
### Spring 2023

# Midterm #1
On Midterms, you should expect equal emphasis on concept reasoning and on problems that involve derivations.
**Test Aid:** Midterms are closed-book and closed-internet. However, you will be allowed to bring with you one 8.5"x11" sheet of paper with information on the front and back as a test aid.
## Math Practice
The following is a ***non-exhaustive*** list of examples of problems that involve derivations -- that is, problems where you perform mathematical operations to arrive at some formula.
1. **(Regression)** Consider a one-dimensional regression problem with training data $\mathcal{D} =\{(x_1,y_1), \ldots, (x_N,y_N)\}$, $x_n, y_n\in \mathbb{R}$ . We want to fit a linear model with no bias term:
$$ f_w(x) = w\cdot x $$
- **(Non-Probabilistic)** Assume a loss function $\mathcal{L}(w) = \frac{1}{2} \sum_{n=1}^N (y_n - f_w(x_n))^2$. Solve for the optimal value of $w$ under $\mathcal{L}$. Write down the $\ell_2$ regularized loss function, solve for the optimal $w$ under $\ell_2$ regularized loss. Write down the $\ell_1$ regularized loss function.
***Variations:*** What if the loss function is:
- $\mathcal{L}(w) = \frac{1}{2} \sum_{n=1}^N (y_n - f_w(x_n))^3$
- $\mathcal{L}(w) = \frac{1}{N} \sum_{n=1}^N (y_n - f_w(x_n))$
- $\mathcal{L}(w) = \frac{1}{N} \sum_{n=1}^N e^{(y_n - f^2_w(x_n))}$
- Make up your own loss function!
- **(Probabilistic)** Suppose that we have a probabilistic model of the form
$$\hat{y} = wx + \epsilon$$
where $\epsilon \sim \mathcal{N}(0,\sigma^2)$.
- Write down the exact form of the likelihood for one observation.
- Write down the exact form of the joint likelihood for $\mathcal{D}$.
- Write down the joint log-likelihood for $\mathcal{D}$.
- According to the loss function in the (Non-probabilistic) problem above, if the loss for Model 1 is less than the loss for Model 2, what can we conclude about the log-likelihoods of Model 1 and Model 2?
- Write down maximum likelihood inference for this model as an optimization problem.
- Is maximizing the joint log-likelihood for this model equivalent to minimizing the loss in the (Non-Probabilistic) problem above?
- What is the MLE of the hyperparameter $\sigma^2$?
***Variations:*** What if the noise is:
- $\epsilon \sim \mathcal{N}(\mu, \sigma^2)$; if so, what is $\mu_\mathrm{MLE}$?
- $\epsilon$ is an exponential RV; if so, what is the MLE of the exponential pdf parameter?
- $\epsilon$ is a Laplace RV; if so, what is the MLE of the Laplace pdf parameter?
- Make up your own noise!
- **(Bayesian)** Now assume that $w$ is a random variable and that we have a prior on $w$ with known variance $s^2$:
$$w \sim \mathcal{N}(0,s^2)$$
Write down the form of the *posterior* distribution over $w$ -- take logs and drop terms that don't depend on the data $\mathcal{D}$ and prior parameters. Can you make the posterior look like a normal pdf by completing the square (assume you can look up how to complete the square) and doing algebraic simplifications?
Write down the form of the *posterior predictive* distribution for a new observation $(x, y)$ -- can you make the posterior predictive look like a normal pdf?
What is the effect of prior variance on the posterior variance and posterior predictive variance?
What is the effect of prior mean on the posterior mean and posterior predictive mean?
What is the effect of the number of data observations, $N$, on the posterior variance and posterior predictive variance?
What is the posterior mode estimate of $w$? Is it equivalent to the minimizer of one of the regularized loss functions in the (Non-probabilistic) problem above? (See the sanity-check sketch after this problem.)
What is the posterior mean estimate of $w$? Is it equivalent to the minimizer of one of the regularized loss functions in the (Non-probabilistic) problem above?
***Variations:*** What if the prior is:
- $w \sim \mathcal{N}(m,s^2)$
- $w$ is a Laplace RV; would the posterior be Normal?
- $w$ is an exponential RV; would the posterior be Normal?
- Make up your own prior!
**Optional Variations (for math complexity level beyond the Midterm):** Repeat all of the above for training data $\mathcal{D} =\{(\mathbf{x}_1,y_1), \ldots, (\mathbf{x}_N,y_N)\}$, where $\mathbf{x}_n \in \mathbb{R}^D$, your choice of feature map $\phi:\mathbb{R}^{D} \to \mathbb{R}^{D'}$, and function
$$ f_\mathbf{w}(\mathbf{x}) = \mathbf{w}^\top \phi(\mathbf{x}) $$
You'll need to use multivariate versions of the distributions in this problem.
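A minimal numerical sanity check for the (Non-Probabilistic) and (Bayesian) parts of this problem in the original 1D setting, assuming a made-up dataset and made-up values of $\sigma^2$ and $s^2$. It compares the closed-form minimizers you should arrive at (stated in the comments so you can check your own derivation) against a brute-force grid search; this is a study aid, not provided solution code.

```python
import numpy as np

# Hypothetical 1D dataset (x_n, y_n); any small dataset works for this check.
rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = 2.0 * x + rng.normal(scale=0.5, size=20)

sigma2, s2 = 0.5**2, 1.0**2   # assumed noise variance and prior variance
lam = sigma2 / s2             # ridge strength implied by the N(0, s^2) prior

# Closed forms from setting dL/dw = 0 (no bias term):
#   squared loss:         w* = sum(x*y) / sum(x^2)
#   l2-regularized loss:  w* = sum(x*y) / (sum(x^2) + lam)   (also the posterior mode)
w_ols   = np.sum(x * y) / np.sum(x**2)
w_ridge = np.sum(x * y) / (np.sum(x**2) + lam)

# Brute-force check: each should agree with the grid minimizer of its objective.
ws = np.linspace(w_ols - 1.0, w_ols + 1.0, 10001)
loss       = 0.5 * ((y[None, :] - ws[:, None] * x[None, :]) ** 2).sum(axis=1)
loss_ridge = loss + 0.5 * lam * ws**2
print(w_ols,   ws[np.argmin(loss)])        # should match to ~1e-3
print(w_ridge, ws[np.argmin(loss_ridge)])  # should match to ~1e-3
```

The same pattern (derive a closed form, then confirm it numerically) extends to the variations and to the multivariate optional variation, with the grid search replaced by, e.g., `scipy.optimize.minimize`.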
2. **(Probabilistic Classification)** Consider the dataset $\mathcal{D} = \{(-\pi, 1), (0, 0), (\pi, 1)\}$ for binary classification. For each of the following feature maps,
- $\boldsymbol{\phi}(x) = [1, x]^T$
- $\boldsymbol{\phi}(x) = [1, x, x^2]^T$
- $\boldsymbol{\phi}(x) = [1, x, x^4]^T$
- $\boldsymbol{\phi}(x) = [1, \cos x]^T$
answer the following questions:
- plot the dataset in *input* space, representing points in class 1 with $\times$ and points in class 0 with $\circ$. Draw the decision boundary that would perfectly classify this dataset.
- write down the joint log-likelihood of a logistic regression model with the given feature map.
- determine if a logistic regression model with the given feature map is capable of perfectly classifying $\mathcal{D}$.
- if so, find by trial-and-error or inspection parameters $\mathbf{w}$ for which the logistic regression model correctly classifies the data points (see the sketch after this problem). For this set of parameters $\mathbf{w}$:
- plot the ROC curve by hand and compute the AUC
- describe the effect of changing the classification threshold on your decision boundary
- derive the impact on log-odds for each parameter in $\mathbf{w}$
- are the parameters $\mathbf{w}$ you found above the MLE of $\mathbf{w}$? (*Hint:* check the gradient)
- are the parameters $\mathbf{w}$ you found above the global optimum of $\ell_2$-regularized logistic regression? (*Hint:* check the gradient)
***Variations:*** Repeat the above for your own choice of feature map $\phi$, and your own 1D binary classification dataset of three or four points.
**Optional Variations (for math complexity level beyond the Midterm):** Repeat the above for $\mathcal{D} = \{(-\pi, 0, 1), (-\pi, -1, 0), (\pi, 0, 2), (\pi, -1, 1)\}$ for three-class classification $y=0, 1, 2$. Try different rearrangements of these four points and make up your own feature map.
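To support the trial-and-error step in this problem, here is a short sketch that plugs candidate weight vectors into a logistic regression model for a few of the feature maps on this dataset. The weight vectors are illustrative guesses chosen by inspection, not MLEs, and are only meant to show how to verify a guess.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The dataset from the problem: inputs x and binary labels y.
X = np.array([-np.pi, 0.0, np.pi])
y = np.array([1, 0, 1])

# One candidate weight vector per feature map (guessed by inspection, not fitted).
candidates = {
    "[1, x]":      (lambda x: np.stack([np.ones_like(x), x], axis=1),         np.array([0.0, 1.0])),
    "[1, x, x^2]": (lambda x: np.stack([np.ones_like(x), x, x**2], axis=1),   np.array([-5.0, 0.0, 2.0])),
    "[1, cos x]":  (lambda x: np.stack([np.ones_like(x), np.cos(x)], axis=1), np.array([0.0, -5.0])),
}

for name, (phi, w) in candidates.items():
    p = sigmoid(phi(X) @ w)             # P(y = 1 | x) under logistic regression
    preds = (p > 0.5).astype(int)
    ok = np.all(preds == y)
    print(f"phi = {name}: p = {np.round(p, 3)} -> {'correct' if ok else 'misclassifies'}")

# Note: no weight vector can succeed for phi = [1, x], since the class-0 point
# lies between the two class-1 points and a single threshold in x cannot separate them.
```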
3. **(Neural Networks)** Consider the following 2-layer neural network, which takes in $\mathbf{x} \in \mathbb{R}^2$ and has two ReLU hidden units and a final sigmoid activation, $\sigma$. *There are no bias weights on the hidden units.*

For a binary classification problem with true labels $y \in \{0,1\}$, we will use the logistic regression loss function $$L = -(y\log(f_{\mathbf{W}}(\mathbf{x}))+(1-y)\log(1-f_{\mathbf{W}}(\mathbf{x}))).$$
- Suppose we update our network with stochastic gradient descent with batch-size 1, on a single data point $\mathbf{x} = [x_1, x_2]^T$.
- Calculate the gradient of the loss with respect to $v_1$.
- Calculate the gradient of the loss with respect to $w_{11}$. (A gradient-check sketch appears after this problem.)
- Consider the classification data points below. Prove or disprove that the neural network classifier above can perfectly classify these data points with the set of weights $w_{11}, w_{12}, w_{21}, w_{22} = \{1, 0, 0, 1\}$. What if additional hidden layers were added on top of the hidden layer of the existing network (while keeping the specified set of weights fixed)?

***Variations:*** Answer all the above with the following changes:
- The activation function is:
- linear
- exponential
- quadratic
- Make up your own activation function!
- The problem is a regression problem (make up a simple regression dataset with a small number of points)?
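A sketch for checking the backpropagation steps in this problem against finite differences. Since the network figure is not reproduced here, the architecture below is an assumption based on the text: two inputs, two ReLU hidden units with no bias weights (weights $w_{ij}$ from input $i$ to hidden unit $j$), output weights $v_1, v_2$, and a sigmoid output; the data point and weight values are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(W, v, x):
    z = W @ x                    # hidden pre-activations; W[j, i] = w_{ij} (input i -> hidden unit j)
    h = np.maximum(z, 0.0)       # ReLU hidden units (no bias weights)
    return sigmoid(v @ h), z, h  # sigmoid output f_W(x)

def loss(W, v, x, y):
    f, _, _ = forward(W, v, x)
    return -(y * np.log(f) + (1 - y) * np.log(1 - f))

# Made-up single data point and weights (the SGD batch-size-1 setting).
x, y = np.array([0.7, -1.2]), 1.0
W = np.array([[1.0, 0.5], [-0.3, 1.0]])
v = np.array([0.8, -0.4])

f, z, h = forward(W, v, x)
# Hand-derived chain rule:
#   dL/d(v.h)  = f - y
#   dL/dv_j    = (f - y) * h_j
#   dL/dw_{ij} = (f - y) * v_j * 1[z_j > 0] * x_i
grad_v = (f - y) * h
grad_W = (f - y) * np.outer(v * (z > 0), x)

# Finite-difference checks on dL/dv_1 and dL/dw_11.
eps = 1e-6
v_eps = v.copy(); v_eps[0] += eps
W_eps = W.copy(); W_eps[0, 0] += eps
print(grad_v[0],    (loss(W, v_eps, x, y) - loss(W, v, x, y)) / eps)
print(grad_W[0, 0], (loss(W_eps, v, x, y) - loss(W, v, x, y)) / eps)
```

Under the indexing assumption above, the weights $\{1, 0, 0, 1\}$ from the last part make the hidden layer compute $h = \mathrm{ReLU}(\mathbf{x})$ elementwise, which may help when reasoning about what decision boundaries the fixed network can realize.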
## Concept Practice
The following is a ***non-exhaustive*** list of examples of concept reasoning problems for the Midterm -- concept reasoning problems will not involve deriving math, but they will require that you can look at math in order to derive insights that answer questions about ML models and inform modeling choices. Recall that there are two types of questions on the Midterm: derivation problems (as in the Math Practice section above) and concept reasoning problems (as below).
Please note that while there is no one single correct conclusion or a single correct argument, ***this does not mean that all arguments are correct (logical)!*** Please treat conceptual reasoning questions as asking for a math ***[proof](https://heil.math.gatech.edu/handouts/proofs.pdf)*** (rather than derivation) and be as formal and rigorous as you can.
0. **Broader Impact Analysis**
- [(In-Class Exercise) Evaluating Generative AI Technology](https://deepnote.com/workspace/weiwei-pan-2902decb-902f-40cc-9fa6-af2e3f31f15b/project/Evaluating-AI-7190c749-a47d-405d-9c10-09dc53e07d96/notebook/Evaluating%20AI-ba5310d5e7334417875be505bd3ce903)
- [(In-Class Exercise) Case Study: Classification](https://hackmd.io/@onefishy/rkTKiTWao)
1. **Regression**
- [Intro to Machine Learning](https://docs.google.com/forms/d/e/1FAIpQLSeV5lC76dmWWTW0w740rkYe8F9GVNb4E5dwJu3bvAGDIhSQzg/viewform?usp=sf_link)
- [Linear Regression](https://docs.google.com/forms/d/e/1FAIpQLScZs9MGFWR8D7ond4yNjBqDbLR7nQcl1MqWsSLr5soK3NwgYA/viewform?usp=sf_link)
- [Multilinear & Polynomial Regression](https://docs.google.com/forms/d/e/1FAIpQLSf-Lqo3ODY01KltOTXoaRy9H5OTYRkb-SLez0OeC_nx6IpgTA/viewform?usp=sf_link)
- [(In-Class Exercise) Applying and Evaluating Linear Regression](https://deepnote.com/workspace/weiwei-pan-2902decb-902f-40cc-9fa6-af2e3f31f15b/project/Linear-and-Polynomial-Regression-e2781733-43fd-4788-aa5e-51dd337011dc/notebook/Linear%20and%20Polynomial%20Regression-b3f6f1abfd74401389cd068dd650e8b3)
- [Probabilistic Regression](https://docs.google.com/forms/d/e/1FAIpQLSfGWG7KX_uf0WY5R3J0jpqVUcuK-fAqAVQh0F_CH4_kh0BUAg/viewform?usp=sf_link)
- [Regression Trees](https://docs.google.com/forms/d/e/1FAIpQLSdoWUOvW6gXbYShdxHWCp4TC-BLwMtvc1W4YGzBolF1PKkMig/viewform?usp=sf_link)
- [KNN](https://docs.google.com/forms/d/e/1FAIpQLSd4uWwTrAD3DcmfOmyX3I05WSxAPLqXPP6bMqJtoQzml-AFsQ/viewform?usp=sf_link)
- [(In-Class Exercise) Probabilistic vs Non-Probabilistic Regression](https://colab.research.google.com/drive/1_ch9K3VYW0PEwt5ae_KbqRnasgVNjND2?usp=sharing)
- [Uncertainty and Confidence Intervals](https://docs.google.com/forms/d/e/1FAIpQLSdzgyPK5EIncWJsFsvtae4azgu3ILTmFy8zeri4zVrEOI8yww/viewform?usp=sf_link)
- [Bias, Variance and Generalization](https://docs.google.com/forms/d/e/1FAIpQLSd8Lqy2yU60_jQgVMdTprucIAS4eg7_Y3FoIc8wkx-v7LLmqg/viewform?usp=sf_link)
- [Variance Reduction](https://docs.google.com/forms/d/e/1FAIpQLSfDUYEIELn8Fnq6qjZgcAEqfbU6IXu0F9TPC_eH3fSNHZFV9Q/viewform?usp=sf_link)
- [(In-Class Exercise) Uncertainty and Variance](https://colab.research.google.com/drive/1ERDXQTebSrDKlLm6F6JwRuvG6gWd8hmA?usp=sharing)
2. **Classification**
- [Probabilistic Classification](https://docs.google.com/forms/d/e/1FAIpQLSdAKzqIyvtUTpI0XxUGd--umW-7nXbXIwx6JWlD4IOD7bBZig/viewform?usp=sf_link)
- [Gradient Descent](https://docs.google.com/forms/d/e/1FAIpQLSdVrZMtU6-4C9ml6O3x2geQMqXv8ACY8yoSxy_5JgZa9dQWig/viewform?usp=sf_link)
- [Evaluating Classifiers](https://docs.google.com/forms/d/e/1FAIpQLSdBE5V-_xHRbxAdoI564IY3T4OFyoFSBTeosq7y5PdSa0wX2w/viewform?usp=sf_link)
- [(In-Class Exercise) Evaluating and Interpreting Classifiers](https://colab.research.google.com/drive/1m01jseLkrOryWN5ZlV8D6L-Ez8HsSJp1?usp=sharing)
- [Non-Probabilistic Classifiers](https://docs.google.com/forms/d/e/1FAIpQLScU_kxRK5uNThepAtAf52Yhreg2FZMaiolSbV612bbssJr1Gg/viewform?usp=sf_link)
- [Hyper-parameter Selection & Variance Reduction](https://docs.google.com/forms/d/e/1FAIpQLScTtjpHg4svVZ6DETiyW9gaHghrTg63wSOyjWzrFkd-AAtEtQ/viewform?usp=sf_link)
- [More on Evaluating Classifiers](https://docs.google.com/forms/d/e/1FAIpQLSe6yDU-_h_hsxEdZeV6d8boYE5DuXzzFs7-jjave0Exh-AEBQ/viewform?usp=sf_link)
3. **Neural Network**
- [Neural Networks](https://docs.google.com/forms/d/e/1FAIpQLSd7Ynz47sC_1_EHdSp1IusBF2Ulo5hdTZt-S2xfNVQhYMLmCw/viewform?usp=sf_link)
- [Neural Network Optimization](https://docs.google.com/forms/d/e/1FAIpQLSd3z0gXS7ep0CTjmsTGUXf5rsbAW-gfuMwJtVcj7qB-fFAXAQ/viewform?usp=sf_link)
- [Interpreting Neural Networks](https://docs.google.com/forms/d/e/1FAIpQLSduOiRxej0nPsLk3Zc-0XPGqjiEzQN2OXnh7SHHFpa60eSvfQ/viewform?usp=sf_link)
- [Encoding and Transforming Data: Image & Text](https://docs.google.com/forms/d/e/1FAIpQLSc50cp6NMwjyBqoypLTTzEn16-boJu7TD0-Stee04wjg8mk4w/viewform?usp=sf_link)
- [Sources of Bias in Model Interpretation and Usage](https://docs.google.com/forms/d/e/1FAIpQLSddbPPwJSi0aiXa_yeBiffoJWMOaLbbgfm4exGjHICJDeyXIg/viewform?usp=sf_link)
4. **Bayesian Models**
- [Bayesian Models for Regression](https://docs.google.com/forms/d/e/1FAIpQLSeP7SUBIUl3J2hPyGtLNSnfs_Tu0lX3Yqk7_CPD9x5XK8NqIg/viewform?usp=sf_link)
- [Bayesian Linear Regression](https://docs.google.com/forms/d/e/1FAIpQLSeuf-pYY3f2BMeGfLDUuDTp5haCI37znil6Lh8UHYnlCdSWpQ/viewform?usp=sf_link)
- [Interpreting the Posterior Predictive](https://docs.google.com/forms/d/e/1FAIpQLSciFhnmbW1fuEoyBbkhZTpY6VhDX4ozDX95_pA7AteJLmXWRA/viewform?usp=sf_link)
- [Bayesian Logistic Regression](https://docs.google.com/forms/d/e/1FAIpQLSePX7d8PO6ZRylwBerAVa7ZklZPii4mHK18S0rvZGBgcLoapw/viewform?usp=sf_link)
## Coding and Case Study Practice for Midterm #1 Content
Note that the midterms will not have coding portions; however, you may nonetheless wish to practice the ML pipeline for regression and classification models on a number of real-life datasets (a minimal end-to-end sketch follows the pipeline outline below).
**The Machine Learning Pipeline**
- Data exploration
- Model building
- Model evaluation
- Model interpretation
- Model critique/revision
- Communicating results
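A minimal end-to-end sketch of this pipeline in Python with scikit-learn, assuming a hypothetical regression CSV `train.csv` with a numeric column named `target` (the file and column names are placeholders; substitute the actual files and targets from the competitions below). It compresses exploration, cleaning, and interpretation into a few crude lines and is only meant to show the overall shape of the workflow.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Data exploration/cleaning (very crude): keep numeric columns, fill missing values.
df = pd.read_csv("train.csv")                               # hypothetical file name
X = df.drop(columns=["target"]).select_dtypes(include=np.number).fillna(0)
y = df["target"]                                            # hypothetical target column

# Hold out a validation set so evaluation reflects generalization, not fit.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Model building and evaluation.
model = Ridge(alpha=1.0).fit(X_train, y_train)
print("validation MSE:", mean_squared_error(y_val, model.predict(X_val)))

# Model interpretation (crude): inspect coefficient magnitudes.
print(pd.Series(model.coef_, index=X.columns).sort_values())
```

Model critique/revision and communicating results are the iterative, write-up parts of the pipeline; the existing solution notebooks mentioned below are good models for both.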
The following are Kaggle competitions with clean datasets, established evaluation metrics and existing solution notebooks to learn from.
### Getting Started
Getting your feet wet doing Kaggle competitions:
0. **(Regression)** [Predicting Housing Prices](https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques)
1. **(Classification)** [Predicting Survival of Titanic Passengers](https://www.kaggle.com/competitions/titanic)
### Regression
Kaggle competitions featuring regression problems:
3. **(ML for Social Good)** [Predict Popularity of Pets on Shelter Websites](https://www.kaggle.com/competitions/petfinder-pawpularity-score/overview/description)
4. **(ML for Climate Change)** [Subseasonal Forecasting for Climate Change Adaptation](https://www.kaggle.com/competitions/widsdatathon2023/overview)
5. **(ML for Climate Change)** [Forecasting Building Energy Consumption](https://www.kaggle.com/competitions/widsdatathon2022/overview)
### Classification
Kaggle competitions featuring classification problems:
7. **(ML for Science Research)** [Identify Plant Species from Images](https://www.kaggle.com/competitions/herbarium-2022-fgvc9/overview/description)
8. **(ML for Conservation)** [Identify Bird Species Based on Sound](https://www.kaggle.com/competitions/birdclef-2022/overview/description)
9. **(ML for Conservation)** [Identify Individual Marine Mammals Based on Fin Characteristics](https://www.kaggle.com/competitions/happy-whale-and-dolphin/overview/description)
10. **(ML for Conservation)** [Count Number of Animals Caught in Camera Trap Images](https://www.kaggle.com/competitions/iwildcam2022-fgvc9/overview/description)
11. **(ML for Science Research)** [Identify Blood Clot Origin in Stroke Patients](https://www.kaggle.com/competitions/mayo-clinic-strip-ai)
### Analytics
Kaggle competitions featuring analytics problems -- where the deliverable isn't an "answer" or a "prediction" but an exposition of actionable insights extracted from data.
14. **(ML for Education)** [Understanding the Impact of COVID on Digital Learning Outcomes](https://www.kaggle.com/competitions/learnplatform-covid19-impact-on-digital-learning/overview/description)
15. **(ML for Resource Allocation)** [Analyzing Water Availability](https://www.kaggle.com/competitions/acea-water-prediction/overview)