# Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization
[toc]
[NOTES LINK](https://community.deeplearning.ai/t/course-2-lecture-notes/11866)

## Bias
Error on the training data.
## Variance
Error on the test data, i.e. how much performance drops from the training data to unseen data.
We want both low bias and low variance.
## Underfitting
Squared error is high even on the training data.
Accuracy is low for both train and test data.
* High bias (variance may or may not be high; underfitting itself is a bias problem)
## Overfitting
wrt training data, predicted and actual points fits perfectly.
But for new test points model wont satisfy.
Training Data Accuracy high
Test data Accuracy low
* Low bias
* High Variance


To reduce bias: use a bigger network (more hidden layers/units).
To reduce variance: get more training data (or add regularization).

## Regularization
Used to reduce variance.
Basically you are penalizing large values of w (and, less commonly, b).
Increasing lambda pushes the weights towards smaller values.


### Why regularization reduces variance
* With smaller weights, individual hidden-layer units contribute less, so the network moves towards behaving like a simpler NN.
* Also, a high lambda means small weights, so z is small and stays in the roughly linear region of the activation function (e.g. tanh); each unit then acts almost linearly, and a nearly linear model cannot fit complicated non-linear decision boundaries.
The Frobenius norm used in the L2 penalty is

$$\lVert W^{[l]} \rVert_F^2 = \sum_{i=1}^{n^{[l]}} \sum_{j=1}^{n^{[l-1]}} \left( w_{ij}^{[l]} \right)^2,$$

and it enters the cost as the term $\frac{\lambda}{2m} \sum_{l=1}^{L} \lVert W^{[l]} \rVert_F^2$.
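A minimal NumPy sketch of how this penalty enters the cost; the `parameters` dictionary layout (`"W1"`, `"b1"`, ...) and the helper name are assumptions for illustration, not the course's own code.

```python
import numpy as np

def l2_regularized_cost(cross_entropy_cost, parameters, lambd, m):
    """Add the L2 (Frobenius-norm) penalty to the unregularized cost.

    parameters: dict holding "W1", "b1", ..., "WL", "bL" (assumed layout)
    lambd:      the regularization strength lambda
    m:          number of training examples
    """
    L = len(parameters) // 2                                  # number of layers
    penalty = sum(np.sum(np.square(parameters["W" + str(l)]))
                  for l in range(1, L + 1))
    return cross_entropy_cost + (lambd / (2 * m)) * penalty

# In back-prop, the only change is an extra (lambd / m) * W[l] term in dW[l].
```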

### Dropout Regularization

Let's say that for each of these layers, we're going to, for each node, toss a coin and have a 0.5 chance of keeping each node and a 0.5 chance of removing each node. So, after the coin tosses, maybe we'll decide to eliminate those nodes; then what you do is actually remove all the outgoing links from that node as well. So you end up with a much smaller, really much diminished network.

Why does dropout work?
The network can't rely on any one feature, so it has to spread out the weights, which effectively shrinks them (a similar effect to L2 regularization).
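A minimal sketch of the inverted-dropout implementation for one layer's activations; `a3`, `d3`, and `keep_prob = 0.8` follow the lecture's example, while the activation values themselves are random placeholders.

```python
import numpy as np

keep_prob = 0.8                                # probability of keeping a unit
a3 = np.random.randn(50, 10)                   # placeholder activations of layer 3

d3 = np.random.rand(*a3.shape) < keep_prob     # boolean mask: True = keep, False = drop
a3 = a3 * d3                                   # zero out the dropped units
a3 = a3 / keep_prob                            # scale up so E[a3] stays unchanged

# At test time, no dropout is applied and no scaling is needed.
```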
### Other regularization methods
Data augmentation, i.e. cheaply getting more training data (a minimal flip example follows this list):
* flip the image horizontally
* randomly rotate and zoom the image
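A minimal sketch of the horizontal-flip augmentation in NumPy; the image shape is an arbitrary placeholder, and random rotation/zoom would typically come from an image library rather than plain NumPy.

```python
import numpy as np

image = np.random.rand(64, 64, 3)     # dummy RGB image (height, width, channels)

flipped = image[:, ::-1, :]           # flip horizontally (reverse the width axis)

# Each flipped/rotated/zoomed copy is added to the training set as a "new" example.
```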

## Normalizing Input

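The recipe is to subtract the mean and divide by the standard deviation, reusing the training-set statistics on the test set so both are scaled the same way. A minimal sketch, assuming the course convention of X with shape (n_features, m):

```python
import numpy as np

def normalize_inputs(X_train, X_test):
    """Zero-center and scale features using training-set statistics only."""
    mu = np.mean(X_train, axis=1, keepdims=True)
    sigma = np.std(X_train, axis=1, keepdims=True) + 1e-8   # avoid divide-by-zero
    return (X_train - mu) / sigma, (X_test - mu) / sigma
```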

## Vanishing/Exploding Gradients
In a very deep network the activations (and hence the gradients) scale roughly like the weights raised to the power of the depth:
* W[l] slightly > 1  // exploding: the signal grows exponentially with the number of layers
* W[l] slightly < 1  // vanishing: the signal shrinks exponentially with the number of layers
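A tiny numerical sketch of why this happens: if each layer scales the signal by a factor slightly above or below 1, the result after 50 layers explodes or vanishes (1.5 and 0.5 are exaggerated illustrative factors).

```python
L = 50                       # number of layers
x = 1.0                      # a scalar "activation" for illustration

print(1.5 ** L * x)          # factor > 1  -> explodes  (~6.4e8)
print(0.5 ** L * x)          # factor < 1  -> vanishes  (~8.9e-16)
```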




## Optimization Algo
### Mini-batch gradient descent
An epoch is a single pass through the whole training set.
Consider a training set of 5,000,000 examples: divide it into 1,000 mini-batches of 5,000 examples each, and take one gradient-descent step per mini-batch.
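A minimal sketch of building the mini-batches; the function name and `batch_size` default are assumptions, and X/Y follow the course convention of one example per column.

```python
import numpy as np

def make_mini_batches(X, Y, batch_size=5000, seed=0):
    """Shuffle the examples and split them into mini-batches of size batch_size."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]                                 # number of examples (columns)
    perm = rng.permutation(m)
    X, Y = X[:, perm], Y[:, perm]
    return [(X[:, k:k + batch_size], Y[:, k:k + batch_size])
            for k in range(0, m, batch_size)]

# One epoch = one pass over all mini-batches, with one gradient step per batch.
```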



### Exponentially weighted moving average
v_t = beta * v_{t-1} + (1 - beta) * theta_t, which averages roughly over the last 1/(1 - beta) days.
The larger beta is, the smoother the curve (more weight on the previous days' average than on the current day's temperature).

The weights on older values shrink geometrically, so it acts as an exponential decay function.

With beta = 0.9 we expect the green curve but actually get the purple one, because v is initialized to 0, which drags the average down over the first few days.
To fix this, bias correction is applied: use v_t / (1 - beta^t).
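A minimal sketch of the moving average v_t = beta * v_{t-1} + (1 - beta) * theta_t together with the bias correction v_t / (1 - beta^t); the temperature values are made up for illustration.

```python
beta = 0.9
temps = [2.0, 3.0, 4.0, 3.5, 5.0]               # made-up daily temperatures

v = 0.0
for t, theta in enumerate(temps, start=1):
    v = beta * v + (1 - beta) * theta           # raw average (biased low at first)
    v_corrected = v / (1 - beta ** t)           # bias correction fixes the early values
    print(t, round(v, 3), round(v_corrected, 3))
```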

### Momentum
Basic idea is to compute an exponentially weighted average of your gradients, and then use that gradient to update your weights instead.

In practice, people don't usually do bias correction because after just ten or so iterations the moving average has warmed up and is no longer a biased estimate.
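A minimal sketch of one momentum update for a single parameter matrix W; `dW` is the current mini-batch gradient, and the alpha/beta defaults are illustrative.

```python
def momentum_step(W, dW, v_dW, alpha=0.01, beta=0.9):
    """One momentum update: average the gradients, then step with the average."""
    v_dW = beta * v_dW + (1 - beta) * dW          # exponentially weighted avg of dW
    W = W - alpha * v_dW                          # update with the smoothed gradient
    return W, v_dW
```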

### RMSProp

In practice, a small epsilon (around 10^-8) is added to the denominator to avoid a divide-by-zero error.
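A minimal sketch of one RMSProp update for a single parameter matrix; the hyperparameter defaults are illustrative assumptions.

```python
import numpy as np

def rmsprop_step(W, dW, s_dW, alpha=0.001, beta2=0.999, epsilon=1e-8):
    """One RMSProp update: divide the gradient by the root of its running square."""
    s_dW = beta2 * s_dW + (1 - beta2) * dW ** 2      # moving avg of squared gradients
    W = W - alpha * dW / (np.sqrt(s_dW) + epsilon)   # damp directions with large gradients
    return W, s_dW
```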
### Adam optimization algo

Adam combines momentum and RMSProp. Tune alpha for the best result; beta1 = 0.9, beta2 = 0.999 and epsilon = 10^-8 are usually left at their default values.
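A minimal sketch of one Adam update, combining the momentum and RMSProp terms with bias correction; `t` is the iteration count starting at 1, and the defaults follow the values quoted above.

```python
import numpy as np

def adam_step(W, dW, v_dW, s_dW, t, alpha=0.001,
              beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam update for a single parameter matrix W."""
    v_dW = beta1 * v_dW + (1 - beta1) * dW          # momentum term
    s_dW = beta2 * s_dW + (1 - beta2) * dW ** 2     # RMSProp term
    v_hat = v_dW / (1 - beta1 ** t)                 # bias-corrected momentum
    s_hat = s_dW / (1 - beta2 ** t)                 # bias-corrected RMSProp
    W = W - alpha * v_hat / (np.sqrt(s_hat) + epsilon)
    return W, v_dW, s_dW
```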

### Learning Rate Decay
Slowly decrease alpha over the epochs, so that the steps get smaller as training approaches the minimum.
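A minimal sketch of the decay rule alpha = alpha_0 / (1 + decay_rate * epoch_num); alpha_0 = 0.2 and decay_rate = 1 are example values.

```python
alpha0 = 0.2
decay_rate = 1.0

for epoch_num in range(1, 5):
    alpha = alpha0 / (1 + decay_rate * epoch_num)   # 0.1, 0.0667, 0.05, 0.04
    print(epoch_num, round(alpha, 4))
```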


### Local Optima Problem


## Tuning Process
First preference: tune the learning rate alpha (red); then beta, the number of hidden units, and the mini-batch size (yellow); then the number of layers and the learning-rate decay (purple). (The colours refer to the lecture slide.)
