---
title: 'Tâm - Week 6'
tags: CoderSchool, Mariana
---
Week 6
===
## Table of Contents
[TOC]
## Monday
### Deep learning intro
Deep learning vs machine learning:
- deep learning learns feature extraction itself, whereas classical machine learning requires feature extraction as a separate step
Vanishing gradients: when gradients are backpropagated through a very deep neural network they can shrink toward zero, so the early layers learn very slowly
The artificial neuron: a weighted sum of the inputs plus a bias, passed through an activation function
Sigmoid as an activation function:
- drawback: its output is not zero-centered (not balanced)
- when z is very big or very small, the gradient is near zero and learning plateaus (saturation)
Common activation functions:
- Sigmoid
- Hyperbolic tangent
- ReLU
The activation function helps us introduce non-linearity into our model
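A minimal NumPy sketch of these three activations (function names and layout are mine, not from the lecture):
```python
import numpy as np

def sigmoid(z):
    # squashes z into (0, 1); saturates for very large |z|
    return 1 / (1 + np.exp(-z))

def tanh(z):
    # zero-centered, squashes z into (-1, 1)
    return np.tanh(z)

def relu(z):
    # non-saturating for z > 0; cheap to compute
    return np.maximum(0, z)
```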
Cost function: the aim is to minimize the cost function over the training data
Loss optimization: find the weights that give the lowest loss
Stochastic Gradient Descent:
- initialize the weights randomly, then in each epoch iterate: compute the gradient of the loss and update the weights
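A toy sketch of that loop, assuming a single weight and a made-up quadratic loss; in a real network the gradient comes from backpropagation:
```python
import numpy as np

def grad_loss(w):
    # hypothetical gradient of the toy loss (w - 3)^2
    return 2 * (w - 3)

w = np.random.randn()          # random weight initialization
learning_rate = 0.1
for epoch in range(100):       # each iteration: compute gradient, update weights
    grad = grad_loss(w)
    w -= learning_rate * grad  # step in the opposite direction of the gradient
print(w)                       # converges toward the minimum at w = 3
```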
Regularization:
helps reduce overfitting in deep learning
Hyperparameter tuning:
- learning rate: too small is slow to converge, too big will diverge
- num of hidden layers
- batch size
Weight initialization:
- Can help to break the symmetry in the model so that the model can capture complexity in the data
Problem of global/local minimum
- the weight initialization can determine whether we end up at a global or a local minimum
- saddle points are another problem
Regularization
- keep the weights from getting too big
### Creating Google Cloud VM instance
Follow [Anh Minh's instructions](https://hackmd.io/SEVZeQMJRa2JJMxd9y8PnQ)
Connect to the instance with:
```
gcloud compute ssh --zone=$ZONE jupyter@$INSTANCE_NAME -- -L 8080:localhost:8080
```
## Tuesday
### Neural network math
- the cross-entropy loss function is the cost function of logistic regression
- What are the derivatives of the sigmoid, tanh, and ReLU functions? (worked out below)
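For reference, the binary cross-entropy cost over m examples and the activation derivatives (standard results, not copied from the slides):
$$
J = -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log\hat{y}^{(i)} + (1-y^{(i)})\log\big(1-\hat{y}^{(i)}\big)\Big]
$$
$$
\sigma'(z) = \sigma(z)\big(1-\sigma(z)\big),\qquad
\tanh'(z) = 1-\tanh^2(z),\qquad
\mathrm{ReLU}'(z) = \begin{cases}1 & z > 0\\ 0 & z < 0\end{cases}
$$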
## Wednesday
### Review of OOP
### DNN from scratch
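A rough sketch of the kind of forward pass a from-scratch DNN needs (layer sizes, parameter names, and the ReLU/sigmoid choice are illustrative assumptions):
```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(X, params):
    # one hidden layer with ReLU, sigmoid output for binary classification
    Z1 = X @ params["W1"] + params["b1"]
    A1 = np.maximum(0, Z1)                # ReLU
    Z2 = A1 @ params["W2"] + params["b2"]
    return sigmoid(Z2)

# small random initialization breaks the symmetry between units
rng = np.random.default_rng(0)
params = {
    "W1": rng.normal(scale=0.01, size=(4, 8)), "b1": np.zeros(8),
    "W2": rng.normal(scale=0.01, size=(8, 1)), "b2": np.zeros(1),
}
print(forward(rng.normal(size=(5, 4)), params).shape)  # (5, 1)
```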
## Thursday
### DNN review and problems
- splitting the data into mini-batches helps training go faster (more weight updates per pass)
- with more layers, gradients can vanish or become huge -> the vanishing/exploding gradient problem
- we can also change the way the weights are updated from the gradients (different optimizers)
- Regularization helps with overfitting
**Stochastic gradient descent**
- pick one mini-batch per step for the forward and backward pass, so the weights get updated more often per epoch
- small batch size -> faster iterations; large batch size -> more precise gradient estimates
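A sketch of how mini-batches might be drawn each epoch, assuming NumPy arrays; `update_weights` is a hypothetical stand-in for the forward/backward pass and weight update:
```python
import numpy as np

def minibatches(X, y, batch_size, rng):
    # shuffle once per epoch, then yield consecutive slices
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

# usage: one update per mini-batch -> many updates per epoch
# for X_batch, y_batch in minibatches(X_train, y_train, 32, np.random.default_rng(0)):
#     update_weights(X_batch, y_batch)   # hypothetical forward + backward + step
```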
**Data Mismatch**
- the training loss is not good enough yet, and we are still not overfitting
**Learning rate**
- start with a large learning rate and switch to a smaller one as we get close to the minimum
**Vanishing/Exploding gradient**
- this problem prevents us from training very deep neural networks
- activation function saturation: the derivative is very close to 0
- fan-in: the number of nodes in the previous layer
- LeakyReLU: a non-saturating activation function (solves the problem of dying neurons)
- ELU: a smooth variant of Leaky ReLU; slower to compute per step, but often converges in fewer iterations
- avoid using tanh or sigmoid in hidden layers (they saturate)
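A NumPy sketch of the two non-saturating variants (the alpha defaults shown are the common ones, not values from class):
```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    # small negative slope keeps a gradient flowing for z < 0, so neurons don't "die"
    return np.where(z > 0, z, alpha * z)

def elu(z, alpha=1.0):
    # smooth exponential curve for z < 0; pushes mean activations closer to zero
    return np.where(z > 0, z, alpha * (np.exp(z) - 1))
```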
**Batch normalization**
- original activations -> zero-centered -> normalized (then scaled and shifted with learned parameters)
- gradient clipping: clip gradients during backpropagation to prevent them from exploding; commonly used in RNNs
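For reference, the batch-norm computation on a mini-batch (γ and β are the learned scale/shift parameters, ε avoids division by zero):
$$
\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i,\qquad
\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}(x_i-\mu_B)^2,\qquad
\hat{x}_i = \frac{x_i-\mu_B}{\sqrt{\sigma_B^2+\epsilon}},\qquad
y_i = \gamma\hat{x}_i + \beta
$$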
**Faster optimizer**
- exponentially weighted moving average: a higher beta gives a smoother line (it averages over roughly 1/(1-beta) past points)
- momentum optimizer: update the parameters with the moving average of dW instead of dW itself -> smoother, faster updates
- RMSProp optimizer: scale each update by a moving average of the squared gradients
- Adam optimizer: combines momentum and RMSProp (update rules below)
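For reference, the standard update rules (β, β₂, ε are the usual hyperparameters; Adam's bias correction is omitted for brevity):
$$
\text{Momentum:}\quad v = \beta v + (1-\beta)\,dW,\qquad W = W - \alpha\, v
$$
$$
\text{RMSProp:}\quad s = \beta_2 s + (1-\beta_2)\,dW^2,\qquad W = W - \alpha\,\frac{dW}{\sqrt{s}+\epsilon}
$$
Adam keeps both the momentum average $v$ and the squared-gradient average $s$, and uses $v$ in place of $dW$ in the RMSProp-style update.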
**Learning rate decay**: smaller learning rate over time
- e.g. for ResNets, multiply the learning rate by 0.1 after epochs 30, 60, and 90
**Regularization**: avoid overfitting
- early stopping
- L1, L2, data augmentation, batch normalization, dropout
- L1, L2: add a penalty term that is a function of the weights to the loss function (formulas below)
- L1: sum of the absolute values of the weights (L1 norm, Manhattan distance) -> encourages sparsity
- L2: sum of the squares of the weights (L2 norm, Euclidean distance) -> encourages small, simple weights
- feature selection: the sparsity from L1 effectively performs feature selection
- Dropout: randomly drop certain nodes in some layers during training -> forces the layer not to rely on any one node
- data augmentation: create extra training examples by transforming existing ones (e.g. flips, rotations, crops for images)
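For reference, the L1 and L2 penalties added to the base loss $J$ (with $\lambda$ the regularization strength):
$$
J_{L1} = J + \lambda \sum_j |w_j|,\qquad
J_{L2} = J + \lambda \sum_j w_j^2
$$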
### TensorFlow syntax
- Sequential: build a model by stacking layers in order
- summary(): prints the layers, output shapes, and parameter counts
- steps per epoch: number of batches per epoch (number of samples / batch size)
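A hedged Keras sketch tying these together (layer sizes, `input_shape`, and the dropout/L2 values are illustrative, not from class):
```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(0.01),
                          input_shape=(20,)),
    tf.keras.layers.Dropout(0.5),          # randomly drops nodes during training
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()                            # layers, output shapes, parameter counts

# steps_per_epoch = number of samples // batch_size when training from a generator/dataset
# model.fit(train_dataset, epochs=10, steps_per_epoch=len(X_train) // 32)
```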
## Friday
### Build a Trie class (autocomplete)
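A minimal sketch of what such a Trie might look like (class and method names are my own):
```python
class TrieNode:
    def __init__(self):
        self.children = {}       # char -> TrieNode
        self.is_word = False     # marks the end of a complete word

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def autocomplete(self, prefix):
        # walk down to the prefix node, then collect every word below it
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        results = []
        self._collect(node, prefix, results)
        return results

    def _collect(self, node, prefix, results):
        if node.is_word:
            results.append(prefix)
        for ch, child in node.children.items():
            self._collect(child, prefix + ch, results)

# usage
trie = Trie()
for w in ["care", "car", "cat", "dog"]:
    trie.insert(w)
print(trie.autocomplete("ca"))   # ['car', 'care', 'cat'] (insertion order)
```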
### Build flask app from template
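A bare-bones sketch of the kind of app a Flask starter template produces (the route and template name are assumptions, not the actual template):
```python
from flask import Flask, render_template

app = Flask(__name__)

@app.route("/")
def index():
    # renders templates/index.html (assumed template file from the starter)
    return render_template("index.html")

if __name__ == "__main__":
    app.run(debug=True)
```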
:::info
**Find this document incomplete?** Leave a comment!
:::
###### tags: `CoderSchool` `Mariana` `MachineLearning`