###### tags: `Paper` [Tree]

DeepGBM
===

[paper](https://dl.acm.org/doi/10.1145/3292500.3330858)
[ppt](https://docs.google.com/presentation/d/1iCZ_vqIDpV1B94rNYtOUEkNId-DFsqsib201uWXNQ8A/edit?usp=sharing)

#### GBDT can't do online learning. Why?

[Is it possible to update a model with new data without retraining the model from scratch?](https://github.com/dmlc/xgboost/issues/3055)

- The value of a leaf can easily be updated with new data, but the split structure cannot (see the `refit` sketch at the end of these notes).

#### GBDT fails to encode high-dimensional one-hot features, because the statistical information gain becomes very small on sparse features: the highly imbalanced partitions that sparse features produce gain almost nothing over not partitioning at all. Why?

[One-Hot Encoding is making your Tree-Based Ensembles worse, here's why?](https://towardsdatascience.com/one-hot-encoding-is-making-your-tree-based-ensembles-worse-heres-why-d64b282b5769)
[Do-we-need-to-apply-one-hot-encoding-to-use-xgboost?](https://www.quora.com/Do-we-need-to-apply-one-hot-encoding-to-use-xgboost)
[xgboost issue](https://github.com/szilard/benchm-ml/issues/1)

- One-hot encoding a categorical variable induces sparsity into the dataset, which is undesirable.
- From the splitting algorithm's point of view, all the dummy variables are independent. If the tree decides to split on one of the dummy variables, the gain in purity per split is very marginal. As a result, the tree is very unlikely to select any of the dummy variables close to the root (see the one-hot vs. native categorical sketch at the end of these notes).

### [multihot embedding](https://www.zhihu.com/question/322025862)

(see the `EmbeddingBag` sketch at the end of these notes)

#### GBDT

[CatBoost](https://zhuanlan.zhihu.com/p/102540344)
[xgb vs lightgbm vs catboost](https://mp.weixin.qq.com/s/TD3RbdDidCrcL45oWpxNmw)

#### AdamW

[adam vs SGD](https://medium.com/%E8%BB%9F%E9%AB%94%E4%B9%8B%E5%BF%83/deep-learning-%E7%82%BA%E4%BB%80%E9%BA%BCadam%E5%B8%B8%E5%B8%B8%E6%89%93%E4%B8%8D%E9%81%8Esgd-%E7%99%A5%E7%B5%90%E9%BB%9E%E8%88%87%E6%94%B9%E5%96%84%E6%96%B9%E6%A1%88-fd514176f805)
[optimization 李弘毅](http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML2020/Optimization.pdf)
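The key point behind AdamW is decoupled weight decay: the decay is applied directly to the weights instead of being folded into the gradient as an L2 penalty, where Adam's adaptive denominator would rescale it. A minimal NumPy sketch of one update step, assuming the standard hyperparameters; the function and variable names are mine, not from any of the linked posts:

```python
import numpy as np

def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update: Adam moments on the raw gradient, weight decay
    applied directly to the weights (decoupled). t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first moment
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v

# Plain Adam + L2 would instead fold the penalty into the gradient:
#   grad = grad + weight_decay * w
# so the decay term gets divided by sqrt(v_hat) like everything else.
```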
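Back to the online-learning point at the top: the split structure of a trained GBDT is frozen, but leaf values can be re-estimated cheaply. LightGBM exposes this as `Booster.refit`, which keeps every split and only re-fits the leaf outputs on new data. A minimal sketch with synthetic data; the parameter choices are assumptions, not from the paper:

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X_old, y_old = rng.normal(size=(1000, 10)), rng.normal(size=1000)
X_new, y_new = rng.normal(size=(200, 10)), rng.normal(size=200)

# Train on the old batch.
booster = lgb.train({"objective": "regression", "verbose": -1},
                    lgb.Dataset(X_old, label=y_old), num_boost_round=50)

# "Online" update: the tree structure (splits) stays fixed,
# only the leaf values are re-fit on the new batch.
updated = booster.refit(X_new, y_new, decay_rate=0.9)

# Changing the splits themselves would still require retraining from scratch.
```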
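For the one-hot discussion: the bullets above say each dummy column offers only a marginal gain, so the tree rarely splits on them near the root. LightGBM's native categorical handling sidesteps this by searching grouped (many-vs-many) partitions over the raw category column instead. A small sketch contrasting the two encodings on synthetic data; all names and parameters here are my own choices:

```python
import numpy as np
import pandas as pd
import lightgbm as lgb

rng = np.random.default_rng(0)
n, cardinality = 5000, 200
cat = rng.integers(0, cardinality, size=n)       # high-cardinality categorical
y = (cat % 7 == 0).astype(float) + rng.normal(scale=0.1, size=n)

params = {"objective": "regression", "verbose": -1}

# (a) One-hot encoding: 200 sparse dummy columns, each split gains little.
X_onehot = pd.get_dummies(pd.Series(cat).astype("category"),
                          prefix="cat", dtype=float)
lgb.train(params, lgb.Dataset(X_onehot, label=y), num_boost_round=50)

# (b) Native categorical handling: one column, LightGBM searches
#     many-vs-many partitions of the category values.
X_cat = pd.DataFrame({"cat": pd.Series(cat).astype("category")})
lgb.train(params, lgb.Dataset(X_cat, label=y, categorical_feature=["cat"]),
          num_boost_round=50)
```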
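For the multihot embedding link: the idea is to represent a feature that takes several category values at once (a set of tags, say) as the sum or mean of per-category embedding vectors, giving one dense vector per sample. A minimal PyTorch sketch using `nn.EmbeddingBag`; this is illustrative, not taken from the linked answer:

```python
import torch
import torch.nn as nn

num_categories, dim = 1000, 16
embed = nn.EmbeddingBag(num_categories, dim, mode="sum")

# Two samples with a variable number of active categories (multi-hot):
#   sample 0 -> {3, 42, 7}, sample 1 -> {42}
flat_indices = torch.tensor([3, 42, 7, 42])
offsets = torch.tensor([0, 3])          # where each sample starts in flat_indices

vectors = embed(flat_indices, offsets)  # shape (2, 16): one dense vector per sample
print(vectors.shape)
```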