# TensorFlow 2.0
### optimization
* most popular optimizer: Adam (sketch below)
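A minimal TF 2.x sketch of compiling a model with the Adam optimizer (layer sizes and the learning rate are arbitrary choices for illustration):

```python
import tensorflow as tf

# tiny binary classifier used only to show the optimizer setup
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # Adam = adaptive LR (RMSProp-style) + momentum
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
```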
### feature selection
* chi-square test (sketch below)
* regularized regression (e.g. Lasso)
* XGBoost (feature importance)
* Gini index
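A minimal scikit-learn sketch of chi-square feature selection (the feature matrix and labels are made up; chi2 requires non-negative features):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

X = np.random.randint(0, 10, size=(100, 8))   # hypothetical count-like features
y = np.random.randint(0, 2, size=100)         # hypothetical binary labels

selector = SelectKBest(score_func=chi2, k=4)  # keep the 4 features most dependent on y
X_selected = selector.fit_transform(X, y)
print(selector.scores_)                       # per-feature chi-square statistics
```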
### learning rate
* dynamic learning rate
* initially larger
* decayed over time
* AdaGrad
* RMSProp
* **Adam** = RMSProp + Momentum (speeds up convergence, helps escape local minima)
* warm-up: use a small learning rate over the first few steps (a small amount of data) before ramping up to the target rate (sketch below)
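A minimal sketch of a decaying learning rate plus a hand-rolled warm-up schedule in TF 2.x (step counts and rates are arbitrary; the warm-up class is an illustration, not a built-in Keras schedule):

```python
import tensorflow as tf

# start high, decay the rate every 1000 steps
decay = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01, decay_steps=1000, decay_rate=0.96)

class WarmUp(tf.keras.optimizers.schedules.LearningRateSchedule):
    """Linearly ramp the LR up for `warmup_steps`, then follow the wrapped schedule."""
    def __init__(self, schedule, warmup_steps=500):
        self.schedule, self.warmup_steps = schedule, warmup_steps
    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        warm = self.schedule(0) * step / self.warmup_steps
        return tf.where(step < self.warmup_steps, warm, self.schedule(step))

optimizer = tf.keras.optimizers.Adam(learning_rate=WarmUp(decay))
```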
### loss function
* binary cross-entropy: the further the predicted probability is from the true label, the larger the CE
* softmax
* MSE: sensitive to outliers / extreme values
* MAE
* focal loss: suitable for imbalanced data (sketch below)
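A minimal sketch of a binary focal loss (the alpha/gamma values are the commonly cited defaults, used here only for illustration):

```python
import tensorflow as tf

def binary_focal_loss(y_true, y_pred, gamma=2.0, alpha=0.25):
    """Down-weights easy examples so training focuses on hard, rare ones."""
    y_true = tf.cast(y_true, tf.float32)
    eps = tf.keras.backend.epsilon()
    y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    ce = -(y_true * tf.math.log(y_pred) + (1 - y_true) * tf.math.log(1 - y_pred))
    p_t = y_true * y_pred + (1 - y_true) * (1 - y_pred)      # probability of the true class
    alpha_t = y_true * alpha + (1 - y_true) * (1 - alpha)
    return tf.reduce_mean(alpha_t * tf.pow(1.0 - p_t, gamma) * ce)

# usage: model.compile(optimizer="adam", loss=binary_focal_loss)
```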
### k-fold
* split the data into k folds
* folds used for training: k-1
* folds used for testing: 1 (rotate so each fold is held out once)
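A minimal k-fold sketch with scikit-learn (X and y are made up):

```python
import numpy as np
from sklearn.model_selection import KFold

X, y = np.arange(50).reshape(25, 2), np.arange(25)
kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    # each iteration: k-1 folds (here 4) for training, 1 fold for testing
    print(fold, len(train_idx), len(test_idx))   # -> 20 train / 5 test samples
```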
### anomaly detection
* imbalanced dataset: normal vs. anomalous samples (class-weight sketch below)
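A minimal sketch of one way to handle the imbalance, up-weighting the rare anomaly class via Keras class weights (the data and the 99:1 ratio are made up):

```python
import numpy as np
import tensorflow as tf

X = np.random.randn(1000, 10).astype("float32")
y = (np.random.rand(1000) < 0.01).astype("int32")    # ~1% anomalies

model = tf.keras.Sequential([tf.keras.layers.Dense(16, activation="relu"),
                             tf.keras.layers.Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X, y, epochs=1, class_weight={0: 1.0, 1: 99.0})  # up-weight the rare anomaly class
```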
### model usage/training in real-world business
* advertisement, shopping: retrain the model every day
* facial recognition: train the full model up front, but only fine-tune the FC layers after deployment (sketch below)
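A minimal sketch of freezing a pretrained backbone so only the FC head gets adjusted later (MobileNetV2 and the layer sizes are arbitrary choices for illustration):

```python
import tensorflow as tf

backbone = tf.keras.applications.MobileNetV2(include_top=False, pooling="avg",
                                             input_shape=(160, 160, 3), weights="imagenet")
backbone.trainable = False                           # convolutional features stay fixed after deployment

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(128, activation="relu"),   # FC layers: the only part adjusted later
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```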
### overfitting vs. underfitting
1. overfitting
* data augmentation
* feature / dimensionality reduction: chi-square, PCA, t-SNE
* regularization (combined sketch after this list):
  * L1: pushes weights to exactly zero (sparse features)
  * L2: shrinks large weights toward zero
* dropout
* batch normalization
2. underfitting
* remove outliers
* add features / increase model capacity
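A minimal Keras sketch combining L1/L2 regularization, dropout, and batch normalization (layer sizes and penalty strengths are made up):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l1(1e-4)),  # L1: pushes weights to zero
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.5),                                              # randomly zero 50% of activations
    tf.keras.layers.Dense(64, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),  # L2: shrinks large weights
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```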
### collaborative filtering
* commonly used: cosine similarity
* adjusted cosine similarity (sketch below):
  * subtract each user's mean rating so different rating scales are comparable, e.g. a 5 from one user and a 10 from another can both mean "like"
* factorization machine
* works well on sparse datasets
* SVD, SVD++
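A minimal sketch of plain vs. adjusted cosine similarity on a made-up user-item rating matrix:

```python
import numpy as np

# rows = users, columns = items; made-up ratings on different personal scales
ratings = np.array([[5., 3., 4.],
                    [2., 1., 2.],
                    [4., 2., 5.]])

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(ratings[:, 0], ratings[:, 1]))      # plain cosine between item 0 and item 1

# adjusted cosine: subtract each user's mean rating first, so a high score from a
# generous rater and a lower score from a stingy rater both count as "like"
centered = ratings - ratings.mean(axis=1, keepdims=True)
print(cosine(centered[:, 0], centered[:, 1]))    # adjusted cosine
```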
### wide vs. deep model
#### wide
* essentially a (logistic) linear model
* massive expanded/crossed features: e.g. gender x age buckets => male & 0-10, male & 11-20, male & 21-30, ...
#### deep
* small number of dense features
* embeddings: similar words/items end up close together (higher similarity scores)
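A minimal wide & deep sketch with the Keras functional API (feature names and sizes are made up):

```python
import tensorflow as tf

wide_in = tf.keras.Input(shape=(1000,), name="crossed_features")   # wide: massive one-hot/crossed features
deep_in = tf.keras.Input(shape=(10,), name="dense_features")       # deep: small number of dense features

wide = tf.keras.layers.Dense(1)(wide_in)                           # essentially the linear/LR part
deep = tf.keras.layers.Dense(64, activation="relu")(deep_in)
deep = tf.keras.layers.Dense(32, activation="relu")(deep)
deep = tf.keras.layers.Dense(1)(deep)

out = tf.keras.layers.Activation("sigmoid")(tf.keras.layers.Add()([wide, deep]))
model = tf.keras.Model([wide_in, deep_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```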
### recommendation common problem
1. cold start
2. exploration vs. exploitation: bandit algorithms (Rakuten) (sketch below)
3. incorrect/poor recommendations: verify with A/B tests
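A minimal epsilon-greedy bandit sketch for the exploration vs. exploitation trade-off (the click rates are simulated):

```python
import numpy as np

true_ctr = np.array([0.02, 0.05, 0.01])          # hidden click rates of 3 candidate items
counts, rewards = np.zeros(3), np.zeros(3)
rng = np.random.default_rng(0)

for _ in range(10_000):
    if rng.random() < 0.1:                        # explore 10% of the time
        arm = rng.integers(3)
    else:                                         # exploit the best estimate so far
        arm = int(np.argmax(rewards / np.maximum(counts, 1)))
    counts[arm] += 1
    rewards[arm] += rng.random() < true_ctr[arm]  # simulated click

print(rewards / np.maximum(counts, 1))            # estimated CTR per item
```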
### recommendation
* validation: precision + recall (preferred over A/B tests alone)
* broaden similarity: an advertiser may want to buy 1000 hits, but an overly precise recommender surfaces too few candidates
* imbalance between hit (clicked) and non-hit items: negative sampling (sketch below)
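A minimal negative-sampling sketch (item ids, counts, and the 4:1 negative ratio are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
clicked = np.arange(50)                       # 50 positive interactions
not_clicked = np.arange(50, 10_050)           # 10,000 candidate negatives

# keep all positives, sample ~4 negatives per positive instead of using every negative
negatives = rng.choice(not_clicked, size=4 * len(clicked), replace=False)
X = np.concatenate([clicked, negatives])
y = np.concatenate([np.ones(len(clicked)), np.zeros(len(negatives))])
print(len(X), y.mean())                       # 250 samples, 20% positives
```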