---
title: 'Tâm - Week 6'
tags: CoderSchool, Mariana
---

Week 6
===

## Table of Contents
[TOC]

## Monday

### Deep learning intro

Deep learning vs machine learning:
- deep learning can do feature extraction itself, whereas machine learning requires a separate feature-extraction step

Vanishing gradients: can appear when gradients are propagated back through a very deep neural network

The artificial neuron:

Sigmoid as activation function:
- drawback: its output is not zero-centered (not balanced around 0)
- when z is very big or very small, the speed of learning plateaus

Common activation functions:
- sigmoid
- hyperbolic tangent (tanh)
- ReLU

The activation function introduces non-linearity into the model

Cost function: the aim of the problem is to minimize the cost function

Loss optimization: find the weights that give the lowest loss

Stochastic Gradient Descent:
- random weight initialization, then iterate: compute gradients and update the weights (one epoch = one full pass over the training data)

Regularization: helps reduce overfitting in deep learning

Hyperparameter tuning:
- learning rate: too small is too slow, too big will diverge
- number of hidden layers
- batch size

Weight initialization:
- helps break the symmetry between neurons so the model can capture the complexity in the data

Problem of global/local minima:
- the weight initialization can determine whether we end up in a global or a local minimum
- saddle points are another issue

Regularization:
- keeps the weights from becoming too large

### Creating a Google Cloud VM instance

Follow [Anh Minh's instructions](https://hackmd.io/SEVZeQMJRa2JJMxd9y8PnQ)

Connect to the instance with:
```
gcloud compute ssh --zone=$ZONE jupyter@$INSTANCE_NAME -- -L 8080:localhost:8080
```
(the `-L 8080:localhost:8080` part forwards the remote Jupyter port to your local machine)

## Tuesday

### Neural network math

- the cross-entropy loss is the cost function of logistic regression
- What are the derivatives of the sigmoid, tanh, and ReLU functions?

## Wednesday

### Review of OOP

### DNN from scratch

## Thursday

### DNN review and problems

- splitting the data into mini-batches helps training go faster
- the more layers there are, the more the gradients can vanish or blow up -> the problem of gradients
- we can also change the way we apply the gradient updates (optimizers)
- regularization helps with overfitting

**Stochastic gradient descent**
- pick one mini-batch at a time for the forward and backward pass, so we update the weights more often
- small batch size -> faster iterations; large batch size -> more precise gradient estimates

**Data mismatch**
- the training loss is not good enough and we are still not overfitting

**Learning rate**
- we can start with a large learning rate and switch to smaller ones as we get close to the minimum

**Vanishing/exploding gradients**
- this problem prevents us from building very deep neural networks
- activation function saturation: the derivative gets very close to 0
- fan-in: the number of nodes in the previous layer
- LeakyReLU: a non-saturating activation function (solves the problem of dying neurons)
- ELU: another variant of LeakyReLU; slower to compute but with a faster convergence rate
- avoid using tanh or sigmoid as hidden-layer activations in deep networks

**Batch normalization**
- original inputs -> zero-centered -> normalized
- gradient clipping: clip the gradients during backpropagation to prevent them from exploding; commonly used in RNNs (see the Keras sketch below)
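A minimal Keras sketch of the ideas above, assuming a generic classifier on flattened 28x28 inputs with 10 classes (the layer sizes and hyperparameters are illustrative, not from the lecture): He initialization, a non-saturating activation (LeakyReLU), batch normalization, dropout, and gradient clipping via the optimizer's `clipnorm` argument.

```python
from tensorflow import keras

# Hypothetical shapes: flattened 28x28 images in, 10 classes out
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(128, kernel_initializer="he_normal"),  # break symmetry, suited to ReLU-family activations
    keras.layers.BatchNormalization(),                        # zero-center and normalize the layer inputs
    keras.layers.LeakyReLU(),                                 # non-saturating: helps avoid dying neurons
    keras.layers.Dropout(0.3),                                # regularization: don't rely on any single node
    keras.layers.Dense(64, kernel_initializer="he_normal"),
    keras.layers.BatchNormalization(),
    keras.layers.LeakyReLU(),
    keras.layers.Dense(10, activation="softmax"),
])

# clipnorm clips gradient norms during backpropagation to limit exploding gradients
optimizer = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, clipnorm=1.0)
model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()  # prints the layers and parameter counts
```

The `momentum` and `clipnorm` arguments correspond to the momentum-optimizer and gradient-clipping notes; swapping in `keras.optimizers.Adam()` is a common alternative.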
**Faster optimizers**
- (exponentially weighted) moving average: a higher beta gives a smoother line (beta sets roughly how many past values, e.g. days, are averaged: about 1/(1 - beta))
- momentum optimizer: update the parameters with a moving average of dW instead of dW itself -> smoother and faster updates
- RMSProp optimizer: scale each update by a moving average of the squared gradients
- Adam optimizer: combines momentum and RMSProp

**Learning rate decay**: use a smaller learning rate over time
- e.g. for ResNets, multiply the learning rate by 0.1 after epochs 30, 60, and 90

**Regularization**: avoid overfitting
- early stopping
- L1, L2, data augmentation, batch normalization, dropout
- L1, L2: add a term that is a function of the weights to the loss function
- L1: sum of the absolute values of the weights (L1 norm, Manhattan distance) -> encourages sparsity
- L2: sum of the squared weights (L2 norm, Euclidean distance) -> encourages small, simple weights
- feature selection: L1's sparsity effectively performs feature selection
- dropout: randomly drop certain nodes in some layers -> forces the layer not to rely on any single node
- data augmentation: create extra training examples by transforming the existing ones (e.g. flipping or cropping images)

### TensorFlow syntax

- `Sequential`: stack layers in order
- `summary()`: print the model's layers and parameter counts
- steps: number of batches per epoch (number of samples / batch size)

## Friday

### Build a Trie class (autocomplete)

(a minimal sketch is included at the end of this note)

### Build a Flask app from a template

:::info
**Find this document incomplete?** Leave a comment!
:::

###### tags: `CoderSchool` `Mariana` `MachineLearning`
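### Appendix: Trie sketch for the autocomplete exercise

Friday's Trie exercise is only listed as a heading above, so here is a minimal sketch of one possible implementation; the class and method names (`TrieNode`, `Trie`, `insert`, `autocomplete`) are my own choices, not necessarily the ones used in class.

```python
class TrieNode:
    """One trie node: children keyed by character, plus an end-of-word flag."""
    def __init__(self):
        self.children = {}
        self.is_word = False


class Trie:
    """Prefix tree supporting insert and prefix-based autocomplete."""
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for char in word:
            node = node.children.setdefault(char, TrieNode())
        node.is_word = True

    def autocomplete(self, prefix):
        """Return every stored word that starts with `prefix`."""
        node = self.root
        for char in prefix:
            if char not in node.children:
                return []  # no stored word has this prefix
            node = node.children[char]

        # depth-first search below the prefix node, collecting complete words
        results, stack = [], [(node, prefix)]
        while stack:
            current, word = stack.pop()
            if current.is_word:
                results.append(word)
            for char, child in current.children.items():
                stack.append((child, word + char))
        return results


if __name__ == "__main__":
    trie = Trie()
    for w in ["code", "coder", "coderschool", "cat"]:
        trie.insert(w)
    print(trie.autocomplete("cod"))  # ['code', 'coder', 'coderschool'] in some order
```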