# TensorFlow 2.0

### optimization
* most popular optimizer: Adam

### feature selection
* chi-square test (sketch below)
* stabilized regression
* XGBoost
* Gini index

### learning rate
* dynamic rate: start larger, then decay over time
* AdaGrad
* RMSProp
* **Adam** = RMSProp + momentum (speeds up training and helps avoid getting trapped in local minima)
* warm-up: begin with a small learning rate (or a small amount of data) and ramp it up over the first steps (sketch below)

### loss function
* binary cross entropy: the larger the gap between prediction and label, the larger the CE
* softmax
* MSE: sensitive to outliers (peak values)
* MAE
* focal loss: suited to imbalanced data (sketch below)

### k-fold
* split the data into k folds (sketch below)
* training data: k-1 folds
* testing data: 1 fold

### anomaly detection
* imbalanced dataset: normal vs. anomalous

### model usage/training in real-world business
* advertisement, shopping: retrain the model every day
* facial recognition: train the full model up front, then only fine-tune the FC layers after deployment (sketch below)

### overfitting vs. underfitting
1. overfitting
   * data augmentation
   * feature reduction: chi-square, PCA, t-SNE
2. underfitting
   * remove outliers
   * add features
* solutions (mainly against overfitting, sketch below):
  * regularization
    * L1: pushes weights to exactly zero (sparse weights)
    * L2: shrinks large weights without zeroing them
  * dropout
  * batch normalization

### collaborative filtering
* commonly used: cosine similarity
* adjusted cosine similarity: subtract each user's mean rating, so e.g. a 5 and a 10 from users with different rating scales can both mean "like" (sketch below)
* factorization machine
  * handles sparse datasets
* SVD, SVD++

### wide vs. deep model
#### wide
* essentially a specialized LR (logistic regression)
* massive (expanded) features: e.g. male => male x 0-10, male x 11-20, male x 21-30, ... (feature crosses)
#### deep
* small number of features
* word embedding: similar words end up with higher similarity scores

(a minimal wide & deep sketch appears below)

### recommendation: common problems
1. cold start
2. exploration vs. exploitation: bandit algorithms (Rakuten)
3. bad (misdirected) recommendations: A/B test

### recommendation
* validation: precision + recall (complementing A/B tests)
* broaden similarity: an advertising customer may want to buy 1000 hits, but an overly precise prediction system yields too few hits
* imbalance between hit and un-hit items: negative sampling (sketch below)
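A minimal sketch of chi-square feature selection with scikit-learn's `SelectKBest`; the toy matrix `X`, labels `y`, and `k=2` are made-up values for illustration only.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Toy data: 6 samples, 4 non-negative features (chi2 requires non-negative inputs).
X = np.array([
    [1, 0, 3, 2],
    [0, 1, 2, 8],
    [2, 0, 1, 1],
    [0, 3, 0, 9],
    [1, 0, 2, 2],
    [0, 2, 1, 7],
])
y = np.array([0, 1, 0, 1, 0, 1])

# Keep the 2 features with the highest chi-square score against the label.
selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)
print(selector.scores_)        # per-feature chi-square statistics
print(selector.get_support())  # boolean mask of the kept features
```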
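A sketch of Adam combined with a warm-up learning-rate schedule in tf.keras. `WarmUpThenDecay` is a hypothetical class (not a built-in schedule), and `peak_lr`, `warmup_steps`, and `decay_rate` are illustrative values.

```python
import tensorflow as tf

class WarmUpThenDecay(tf.keras.optimizers.schedules.LearningRateSchedule):
    """Linear warm-up to `peak_lr`, then inverse-time decay (illustrative values)."""
    def __init__(self, peak_lr=1e-3, warmup_steps=1000, decay_rate=0.01):
        super().__init__()
        self.peak_lr = peak_lr
        self.warmup_steps = warmup_steps
        self.decay_rate = decay_rate

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        # Ramp up linearly during warm-up, then decay over time.
        warmup = self.peak_lr * step / self.warmup_steps
        decayed = self.peak_lr / (1.0 + self.decay_rate * (step - self.warmup_steps))
        return tf.where(step < self.warmup_steps, warmup, decayed)

optimizer = tf.keras.optimizers.Adam(learning_rate=WarmUpThenDecay())
```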
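A possible from-scratch focal loss for binary labels, as one way to handle the imbalanced-data case noted under "loss function"; `binary_focal_loss` is a hypothetical helper (not a TF built-in), and `gamma=2.0`, `alpha=0.25` are the commonly cited defaults.

```python
import tensorflow as tf

def binary_focal_loss(y_true, y_pred, gamma=2.0, alpha=0.25):
    """Focal loss for binary labels: down-weights easy examples so the
    rare class contributes more to the gradient on imbalanced data."""
    y_true = tf.cast(y_true, y_pred.dtype)
    eps = tf.keras.backend.epsilon()
    y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    # p_t is the predicted probability assigned to the true class.
    p_t = y_true * y_pred + (1.0 - y_true) * (1.0 - y_pred)
    alpha_t = y_true * alpha + (1.0 - y_true) * (1.0 - alpha)
    return -tf.reduce_mean(alpha_t * tf.pow(1.0 - p_t, gamma) * tf.math.log(p_t))

# model.compile(optimizer="adam", loss=binary_focal_loss)  # drop-in for BCE
```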
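A small k-fold sketch with scikit-learn's `KFold`, showing the k-1 training folds vs. 1 test fold split; the arrays `X`, `y` and `n_splits=5` are placeholders.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)   # hypothetical features
y = np.arange(10) % 2              # hypothetical labels

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    # k-1 folds (here 8 samples) go to training, 1 fold (2 samples) to testing.
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    print(f"fold {fold}: train={len(train_idx)} test={len(test_idx)}")
```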
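One way to sketch the "only adjust the FC layers after deployment" idea: freeze a pretrained backbone and retrain just the dense head. `MobileNetV2`, the input size, and `new_faces_dataset` are stand-ins for illustration, not a prescribed setup.

```python
import tensorflow as tf

# Pretrained backbone whose weights stay frozen after deployment.
base = tf.keras.applications.MobileNetV2(include_top=False, pooling="avg",
                                         input_shape=(160, 160, 3))
base.trainable = False  # freeze the convolutional feature extractor

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(128, activation="relu"),   # FC layers: the only
    tf.keras.layers.Dense(10, activation="softmax")  # part that keeps training
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(new_faces_dataset, epochs=3)  # new_faces_dataset is hypothetical
```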
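A minimal Keras model showing where the overfitting remedies from the notes plug in (L1/L2 regularization, dropout, batch normalization); layer sizes and penalty strengths are arbitrary.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,),
                 kernel_regularizer=regularizers.l1(1e-4)),  # L1: sparse weights
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2: shrink large weights
    layers.BatchNormalization(),
    layers.Dropout(0.5),   # randomly zero 50% of activations during training
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```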
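A numpy sketch of adjusted cosine similarity between items: each user's ratings are centered by that user's mean, so users who rate on different scales become comparable. The rating matrix `R` is invented, and unrated entries are simply treated as zero after centering, which is a simplification.

```python
import numpy as np

# Hypothetical user x item rating matrix (0 = not rated).
R = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [1.0, 0.0, 0.0, 4.0],
])

mask = R > 0
# Subtract each user's mean rating over the items they actually rated.
user_mean = R.sum(axis=1) / mask.sum(axis=1)
R_centered = np.where(mask, R - user_mean[:, None], 0.0)

def adjusted_cosine(i, j):
    """Cosine similarity between item columns i and j after user-mean centering."""
    a, b = R_centered[:, i], R_centered[:, j]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0

print(adjusted_cosine(0, 3))  # similarity between item 0 and item 3
```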
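A rough wide & deep layout in the Keras functional API, assuming 100 expanded/crossed binary features for the wide (logistic-regression-like) part and a 1000-item vocabulary for the deep embedding part; all sizes are placeholders.

```python
import tensorflow as tf
from tensorflow.keras import layers

wide_in = tf.keras.Input(shape=(100,), name="wide_features")          # crossed / one-hot features
deep_in = tf.keras.Input(shape=(1,), dtype="int32", name="item_id")   # sparse id feature

# Wide branch: essentially a linear (logistic regression) model over expanded features.
wide_logit = layers.Dense(1)(wide_in)

# Deep branch: embedding followed by dense layers.
emb = layers.Flatten()(layers.Embedding(input_dim=1000, output_dim=16)(deep_in))
deep_logit = layers.Dense(1)(layers.Dense(32, activation="relu")(emb))

output = layers.Activation("sigmoid")(layers.Add()([wide_logit, deep_logit]))
model = tf.keras.Model([wide_in, deep_in], output)
model.compile(optimizer="adam", loss="binary_crossentropy")
```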
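A tiny sketch of negative sampling for the hit vs. un-hit imbalance: for each user, draw items they never interacted with to balance the few positives. `positives`, `num_items`, and `sample_negatives` are all hypothetical names.

```python
import numpy as np

# Hypothetical implicit-feedback data: (user, item) pairs the user actually hit.
num_items = 1000
positives = {(0, 12), (0, 87), (1, 5), (2, 87), (2, 300)}
rng = np.random.default_rng(0)

def sample_negatives(user, k):
    """Draw k items the user did NOT interact with, to balance the
    overwhelming number of un-hit items against the few hits."""
    negs = []
    while len(negs) < k:
        item = int(rng.integers(num_items))
        if (user, item) not in positives:
            negs.append(item)
    return negs

print(sample_negatives(0, k=4))  # e.g. 4 negative items for user 0
```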