---
title: ML
tags: Templates, Talk
description: View the slide with "Slide Mode".
---
# Trees
- Steps to build a tree
```
# grow a node: repeat until a stopping condition is met
while not (node is pure or max depth reached or too few samples):
    pick the split (feature, threshold) with the best impurity gain
    split the node and recurse on each child
```
What if a leaf ends up with no data points? This shouldn't happen if splits are chosen from the training data itself (each branch gets at least one sample).
Suppose we have this tree and these two examples: sample 1 with f1 = 3, sample 2 with f1 = 2.
```
              root
     f1 < 4  /    \  f1 > 4
         {1, 2}    { }
```
Labels: $y_1 = 0$, $y_2 = 1$.
Split score: $Gini_{split} = \sum_{branch} N_{branch} \cdot H_{branch}$, where $H = 1 - \sum_k p_k^2$ is the Gini impurity of a branch. In this example the left branch has $H = 1 - (0.5^2 + 0.5^2) = 0.5$, and the empty right branch contributes 0.
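A minimal sketch of this split score in code (the function names are mine, not from any library):
```python
import numpy as np

def gini_impurity(labels):
    """H = 1 - sum_k p_k^2 for one branch."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_split_score(branches):
    """Weighted sum of branch impurities; lower is better."""
    return sum(len(b) * gini_impurity(b) for b in branches if len(b) > 0)

# the toy example above: left branch has labels {0, 1}, right branch is empty
print(gini_split_score([np.array([0, 1]), np.array([])]))  # 2 * 0.5 = 1.0
```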
Regression
- sort all the data by f1
- take each pair of adjacent points and use their average (midpoint) as a candidate split threshold
- evaluate each candidate by the squared error (variance) of the resulting branches and keep the best one (see the sketch after this list)
- why the average and not the median? (for two points they are the same value; the midpoint is simply the conventional choice)
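A minimal sketch of this candidate-threshold search for one feature (`best_split_1d` is a hypothetical helper):
```python
import numpy as np

def best_split_1d(x, y):
    """Best threshold on a single feature for a regression tree.
    Candidates are midpoints of adjacent sorted values; score is total SSE."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_thr, best_sse = None, np.inf
    for i in range(len(x) - 1):
        if x[i] == x[i + 1]:
            continue  # no usable threshold between identical values
        thr = (x[i] + x[i + 1]) / 2.0  # midpoint of the adjacent pair
        left, right = y[: i + 1], y[i + 1:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_thr, best_sse = thr, sse
    return best_thr, best_sse

x = np.array([2.0, 3.0, 5.0, 7.0])
y = np.array([1.0, 1.2, 3.0, 3.1])
print(best_split_1d(x, y))  # threshold 4.0 separates the two clusters
```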
Time complexity: O(mn log m) per tree
m = number of samples, n = number of features (sorting each feature's values costs m log m)
## Random Forest
O(M x mn log m) for a forest of M trees (same per-tree cost as above)
Trees are trained on independent bootstrap samples, so they can be grown in parallel
`If regression`: average the trees' predictions; if classification: take a majority vote (see the sketch below)
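A minimal bagging sketch, assuming scikit-learn (`fit_forest` / `predict_forest` are my own names):
```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_forest(X, y, n_trees=10, max_features="sqrt", seed=0):
    """Bagging: each tree sees a bootstrap sample and random feature subsets."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample (with replacement)
        tree = DecisionTreeRegressor(max_features=max_features)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def predict_forest(trees, X):
    # regression: average the trees' predictions (classification would use majority vote)
    return np.mean([t.predict(X) for t in trees], axis=0)
```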
## Compare
### RF vs GBDT vs Adaboost vs Xgboost
- All are built on decision trees
- RF uses bagging: bootstrap samples + random feature subsets, with trees trained independently
- GBDT steps (see the sketch after these notes)
    - start with an initial guess (e.g. the mean of the targets for regression)
    - for t in epochs:
        - compute the current prediction and the residual (the negative gradient of the loss)
        - train a new tree (tree_t) to predict the residual
        - combine: new prediction = previous prediction + $\alpha$ * tree_t's prediction ($\alpha$ = learning rate)
`Loss functions of GBDT`: squared error for regression, log loss for classification; the residuals fitted at each step are the negative gradients of the loss
`Convergence condition`: e.g. stop when the validation loss stops improving or after a fixed number of trees
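A minimal gradient-boosting sketch with squared loss, assuming scikit-learn trees (`fit_gbdt` / `predict_gbdt` and the hyperparameters are my choices):
```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbdt(X, y, n_trees=100, alpha=0.1, max_depth=3):
    """Gradient boosting with squared loss: each tree fits the current residuals."""
    init = y.mean()                      # initial guess
    pred = np.full(len(y), init)
    trees = []
    for _ in range(n_trees):
        residual = y - pred              # negative gradient of the squared loss
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residual)
        pred += alpha * tree.predict(X)  # shrinkage / learning rate alpha
        trees.append(tree)
    return init, trees

def predict_gbdt(init, trees, X, alpha=0.1):
    return init + alpha * np.sum([t.predict(X) for t in trees], axis=0)
```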
# SVM
## Concepts
## Kernel Usage
- when to use linear / polynomial / RBF (the RBF kernel is the Gaussian kernel)
    - depends on the data pattern: roughly linearly separable data suits a linear kernel, otherwise try a non-linear one (a quick comparison sketch follows this section)
    - to see the pattern, project the data to a low-dimensional space and visualize it (PCA is linear, t-SNE is non-linear)
`kernel trick` - rarely used in practice these days
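A quick comparison sketch on a toy non-linear dataset, assuming scikit-learn (the dataset and settings are my choices):
```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ["linear", "poly", "rbf"]:
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))  # RBF usually wins on this non-linear pattern
```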
# Deep Learning
## Constraint to use Deep Learning
- Training data size?
    - with a pretrained model, a few hundred labeled examples can be enough
    - training from scratch typically needs on the order of 10K+ examples
- Computation resources?
    - inference on device vs. in the cloud
    - quantization: 8-bit instead of 32-bit cuts memory by 4x, allowing larger batches
    - student / teacher (knowledge distillation): the student has fewer parameters and learns from the teacher (see the sketch below)
    - training data pre-processing / compression
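A minimal knowledge-distillation loss sketch, assuming PyTorch (the function name, temperature `T`, and weight `lam` are my choices):
```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, lam=0.5):
    """Soft targets from the teacher (temperature T) + hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                      # rescale for the temperature
    hard = F.cross_entropy(student_logits, labels)
    return lam * soft + (1 - lam) * hard
```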
## Momentum
Accumulate past gradients (an exponentially weighted history) and use them to adjust the direction of the current update; this damps oscillations and speeds up movement along consistent directions.
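One common formulation ($g_t$ = current gradient, $\eta$ = learning rate, $\beta \approx 0.9$):

$$v_t = \beta\, v_{t-1} + g_t, \qquad w_t = w_{t-1} - \eta\, v_t$$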

## Adagrad
Parameters that are updated frequently (large accumulated squared gradients) get smaller steps; rarely updated parameters keep a larger effective step so they can still learn. Useful for sparse features.
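Accumulated squared gradients give a per-parameter step size ($\epsilon$ avoids division by zero):

$$G_t = G_{t-1} + g_t^2, \qquad w_t = w_{t-1} - \frac{\eta}{\sqrt{G_t} + \epsilon}\, g_t$$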

## RMSprop
Like Adagrad, but uses an exponentially decaying average of recent squared gradients instead of the full sum, so the effective learning rate does not shrink toward zero.
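Decaying average of squared gradients (typically $\rho \approx 0.9$):

$$E[g^2]_t = \rho\, E[g^2]_{t-1} + (1-\rho)\, g_t^2, \qquad w_t = w_{t-1} - \frac{\eta}{\sqrt{E[g^2]_t} + \epsilon}\, g_t$$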

## Adam
Combines momentum (first moment of the gradients) with RMSprop-style scaling (second moment), plus bias correction for the early steps.
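The standard update (defaults $\beta_1 = 0.9$, $\beta_2 = 0.999$):

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2$$
$$\hat m_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat v_t = \frac{v_t}{1-\beta_2^t}, \qquad w_t = w_{t-1} - \eta\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}$$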
## Sigmoid vs. ReLU
- sigmoid: when the prediction must be bounded in (0, 1), e.g. a probability; use softmax for multi-class outputs
- softmax: all outputs are non-negative and sum to 1
- ReLU: common in CV; advantage: preserves the magnitude of large positive inputs (no saturation); all negative inputs go to 0
(when negative scores / outputs must be preserved, use tanh, which is bounded in (-1, 1))
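A tiny numeric illustration of the three activations (the input values are arbitrary):
```python
import numpy as np

x = np.array([-3.0, -0.5, 0.0, 2.0, 10.0])
sigmoid = 1 / (1 + np.exp(-x))   # bounded in (0, 1), saturates for large |x|
relu    = np.maximum(0, x)       # negatives become 0, large positives pass through
tanh    = np.tanh(x)             # bounded in (-1, 1), keeps the sign
print(sigmoid, relu, tanh, sep="\n")
```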
## Batch Normalization
Standardizes each layer's activations over the mini-batch so outputs stay at a similar scale; this stabilizes training and also acts as a mild regularizer against overfitting.
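Per feature, over a mini-batch $B$ (with learnable scale $\gamma$ and shift $\beta$):

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma\, \hat{x}_i + \beta$$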
## SGD
Stochastic vs. mini-batch: the same idea in practice; take one or a few random samples to form a batch for each update. The shuffling / randomness makes updates cheap and noisy, which often helps learning.
GD: feeds the model all of the data for every update, which is expensive and usually unnecessary.
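A minimal mini-batch loop sketch (the helpers in the commented lines are hypothetical):
```python
import numpy as np

def minibatches(X, y, batch_size=32, seed=0):
    """Shuffle once per epoch, then yield consecutive slices as mini-batches."""
    idx = np.random.default_rng(seed).permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

# for X_batch, y_batch in minibatches(X, y):
#     grad = compute_gradient(model, X_batch, y_batch)   # hypothetical helper
#     model = apply_update(model, grad, lr=0.01)          # hypothetical helper
```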
## RNN / GRU / LSTM
Vanilla RNN: only one hidden state (no gates, no separate cell state)
GRU: fewer gates and parameters, so faster than LSTM; accuracy is usually close to LSTM
Encoder and decoder often share the same architecture but not the same parameters. The encoder represents the input sentence and produces a single vector at its last step.
The decoder uses that representation as its initial input; at each step it outputs scores over the vocabulary, takes the argmax (the position of the highest-scoring word), and feeds that word back as the input of the next decoder step (see the sketch below).
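A minimal greedy-decoding sketch (everything here, including `decoder_step`, is a hypothetical placeholder):
```python
import numpy as np

def greedy_decode(decoder_step, context, bos_id, eos_id, max_len=50):
    """Greedy decoding: emit the argmax at each step and feed it back as the next input.
    `decoder_step(prev_token, state)` returns (vocab_scores, new_state)."""
    token, state, output = bos_id, context, []
    for _ in range(max_len):
        scores, state = decoder_step(token, state)
        token = int(np.argmax(scores))    # position of the highest-scoring word
        if token == eos_id:
            break
        output.append(token)
    return output
```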
## Group Convolution
Split the input channels into groups and convolve each group independently, then concatenate the outputs; with $g$ groups the parameter count and FLOPs drop by roughly a factor of $g$ (depthwise convolution is the extreme case of one channel per group).
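A quick parameter-count comparison, assuming PyTorch:
```python
import torch.nn as nn

regular = nn.Conv2d(64, 64, kernel_size=3)            # 64*64*3*3 + 64 = 36,928 params
grouped = nn.Conv2d(64, 64, kernel_size=3, groups=4)  # 64*(64/4)*3*3 + 64 = 9,280 params
print(sum(p.numel() for p in regular.parameters()),
      sum(p.numel() for p in grouped.parameters()))
```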
## What can cause a model to fail to converge? Is a model that does not converge necessarily a failed model?
## VGG 3x3 Kernels: Core Advantage
Two stacked 3x3 convolutions cover the same receptive field as one 5x5 convolution but with fewer parameters ($2 \cdot 3^2 C^2 = 18C^2$ vs. $5^2 C^2 = 25C^2$ for $C$ channels) and an extra non-linearity in between.
## Attention D
During encoding, each step outputs a vector, but without attention only the last step's vector is used. With attention, the decoder uses all encoder steps: the decoder's hidden representation is compared (e.g. via dot product) with every encoder state to produce attention weights, and the weighted sum of the encoder states becomes the context for that decoding step (see the sketch below).
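A minimal dot-product attention sketch over the encoder states (the names are mine):
```python
import numpy as np

def dot_product_attention(decoder_hidden, encoder_states):
    """decoder_hidden: (d,), encoder_states: (T, d) -> context vector (d,)."""
    scores = encoder_states @ decoder_hidden          # one score per encoder step
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # softmax over encoder steps
    return weights @ encoder_states                   # weighted sum = context
```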
## Overfitting Prevention
- Dropout
    - active only during training (turned off at inference)
    - introduces noise, which reduces overfitting
    - breaks co-adaptation between units, so the network cannot rely on fixed feature combinations and must learn from each feature more independently
- Shrink the network (number of layers / layer width)
- Regularization (L1 / L2)
- Multi-task learning (e.g. BERT trains on masked-word prediction and next-sentence prediction; or predicting image and text jointly)
- Review the learning rate; reduce it if the loss / likelihood stops improving
- Early stopping: stop training if there is no improvement for several epochs (see the sketch after this list)
    - track the model's validation performance per epoch and keep the best checkpoint
- Batch Normalization
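A minimal early-stopping loop sketch (all helpers passed in are hypothetical):
```python
def train_with_early_stopping(model, train_step, evaluate, snapshot,
                              patience=5, max_epochs=100):
    """Stop when validation loss has not improved for `patience` epochs; keep the best."""
    best_loss, best_state, bad_epochs = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_step(model)                     # one epoch of training
        val_loss = evaluate(model)            # track performance per epoch
        if val_loss < best_loss:
            best_loss, best_state, bad_epochs = val_loss, snapshot(model), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break                         # no improvement for `patience` epochs
    return best_state
```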
## Vanishing / Exploding Gradients
- exploding: gradients overflow 32-bit floats (loss becomes NaN / Inf)
    - fix: gradient clipping (see the snippet below)
- vanishing: sigmoid saturates (gradient ≈ 0 for large |x|); ReLU gives 0 for all negative inputs, so those units receive no gradient and stop learning
- no single fix for vanishing gradients; `dropout` does not really address it; common mitigations are careful initialization, batch normalization, residual connections, and gated units (LSTM / GRU)
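A typical gradient-clipping snippet, assuming PyTorch (`model`, `loss`, `optimizer` come from the surrounding training loop):
```python
import torch

# clip the global gradient norm before the optimizer step
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```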