G-Research Crypto Forecasting
===

## Discussion about the Leaderboard

According to [G-Research - Full overlap-exploiting ("leaky") solution](https://www.kaggle.com/julian3833/g-research-using-the-overlap-fully-lb-0-99/notebook) and [Watch out!: test LB period is contained in the train csv](https://www.kaggle.com/c/g-research-crypto-forecasting/discussion/285505), the current Public Leaderboard is no longer a useful reference, because the public test data is contained in the train data. The leaky approach simply submits, as the prediction, the train-data value whose timestamp is closest to the test-data timestamp.

This led to a proposal for a meaningful Leaderboard, [Proposal for a meaningful LB + LGBM [S]](https://www.kaggle.com/julian3833/proposal-for-a-meaningful-lb-strict-lgbm?scriptVersionId=80421622). The idea is to filter out the data from 2021-06-13 onward and train only on the earlier data. A submission built this way can be tagged Strict and compared against other Strict submissions.

## Data Analysis

[Tutorial to the G-Research Crypto Competition](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition) covers a basic introduction to the dataset, along with preprocessing and some data analysis.

* Explanation of the data features
    ![](https://i.imgur.com/213f41P.png)
* How the target is computed
    ![](https://i.imgur.com/q7DdQDK.png)

## Testing Models

* [Initial thoughts about this competition](https://www.kaggle.com/c/g-research-crypto-forecasting/discussion/284903) lists many methods and resources that may be worth trying

### LGBM (10 Estimators)

[G-Research Crypto - Starter LGBM Pipeline](https://www.kaggle.com/julian3833/g-research-starter-lgbm-pipeline?scriptVersionId=78916593)
* Trained directly with the lightgbm package
* Uses the original features `['Count', 'Open', 'High', 'Low', 'Close', 'Volume', 'VWAP']` plus `High - max(Open, Close)` and `min(Open, Close) - Low`

![](https://i.imgur.com/ZuHyCPL.png)

#### Comparison with XGBoost (500 n_estimators)

![](https://i.imgur.com/nFao08k.png)
```
xgb.XGBRegressor(
    n_estimators=500,
    max_depth=11,
    learning_rate=0.05,
    subsample=0.9,
    colsample_bytree=0.7,
    missing=-999,
    random_state=2020,
)
```
![](https://i.imgur.com/L4EHKcP.png)
```
model = LGBMRegressor(
    n_estimators=500,
    num_leaves=700,
    learning_rate=0.09
)
```
![](https://i.imgur.com/zORLDBH.png)

#### learning_rate
```
LGBMRegressor(
    boosting_type='gbdt',
    verbose = 0,
    num_leaves = 35,
    feature_fraction=0.8,
    bagging_fraction= 0.9,
    bagging_freq= 8,
    lambda_l1= 0.6,
    lambda_l2= 0)
```
* 0.005
    * MSE for Binance Coin (ID=0 ) = 2.962883146859823e-05
    * MSE for Bitcoin (ID=1 ) = 4.048963027057223e-06
    * MSE for Bitcoin Cash (ID=2 ) = 4.155931290802632e-05
    * MSE for Cardano (ID=3 ) = 2.0315270619180606e-05
    * MSE for Dogecoin (ID=4 ) = 6.533941637832912e-05
    * MSE for EOS.IO (ID=5 ) = 2.324051372967448e-05
    * MSE for Ethereum (ID=6 ) = 6.217189468552802e-06
    * MSE for Ethereum Classic (ID=7 ) = 8.02683104923585e-05
    * MSE for IOTA (ID=8 ) = 6.413945973605014e-05
    * MSE for Litecoin (ID=9 ) = 1.2607319199368734e-05
    * MSE for Maker (ID=10) = 3.580300485128171e-05
    * MSE for Monero (ID=11) = 4.4017987958844903e-05
    * MSE for Stellar (ID=12) = 2.7032595492229906e-05
    * MSE for TRON (ID=13) = 2.417496971443298e-05
* 0.01
    * MSE for Binance Coin (ID=0 ) = 2.9558127707388634e-05
    * MSE for Bitcoin (ID=1 ) = 4.044616867599669e-06
    * MSE for Bitcoin Cash (ID=2 ) = 4.1258924856684506e-05
    * MSE for Cardano (ID=3 ) = 2.0297703399596428e-05
    * MSE for Dogecoin (ID=4 ) = 6.493981823555335e-05
    * MSE for EOS.IO (ID=5 ) = 2.320488878540619e-05
    * MSE for Ethereum (ID=6 ) = 6.21070208407895e-06
    * MSE for Ethereum Classic (ID=7 ) = 8.010667884220628e-05
    * MSE for IOTA (ID=8 ) = 6.403149367841317e-05
    * MSE for Litecoin (ID=9 ) = 1.2595095902405634e-05
    * MSE for Maker (ID=10) = 3.569801300447905e-05
    * MSE for Monero (ID=11) = 4.387446691454699e-05
    * MSE for Stellar (ID=12) = 2.695110841483679e-05
    * MSE for TRON (ID=13) = 2.4136060755929524e-05
* 0.02
    * MSE for Binance Coin (ID=0 ) = 2.945982418971682e-05
    * MSE for Bitcoin (ID=1 ) = 4.038377332574062e-06
    * MSE for Bitcoin Cash (ID=2 ) = 4.0843502480379525e-05
    * MSE for Cardano (ID=3 ) = 2.027343680004845e-05
    * MSE for Dogecoin (ID=4 ) = 6.443150741735854e-05
    * MSE for EOS.IO (ID=5 ) = 2.3155416323402237e-05
    * MSE for Ethereum (ID=6 ) = 6.200562267488674e-06
    * MSE for Ethereum Classic (ID=7 ) = 7.988579115252567e-05
    * MSE for IOTA (ID=8 ) = 6.389634995746188e-05
    * MSE for Litecoin (ID=9 )
= 1.2576096277813133e-05
    * MSE for Maker (ID=10) = 3.554459006987387e-05
    * MSE for Monero (ID=11) = 4.3710256370286416e-05
    * MSE for Stellar (ID=12) = 2.6832070968198765e-05
    * MSE for TRON (ID=13) = 2.4079752016012746e-05
* 0.05
    * MSE for Binance Coin (ID=0 ) = 2.931587386870643e-05
    * MSE for Bitcoin (ID=1 ) = 4.028183071810968e-06
    * MSE for Bitcoin Cash (ID=2 ) = 4.008969099435855e-05
    * MSE for Cardano (ID=3 ) = 2.022307426797786e-05
    * MSE for Dogecoin (ID=4 ) = 6.354473105206926e-05
    * MSE for EOS.IO (ID=5 ) = 2.308092459741041e-05
    * MSE for Ethereum (ID=6 ) = 6.1787850124376106e-06
    * MSE for Ethereum Classic (ID=7 ) = 7.94460810548964e-05
    * MSE for IOTA (ID=8 ) = 6.367912596913918e-05
    * MSE for Litecoin (ID=9 ) = 1.2540921270235494e-05
    * MSE for Maker (ID=10) = 3.530410643635731e-05
    * MSE for Monero (ID=11) = 4.347455286169911e-05
    * MSE for Stellar (ID=12) = 2.6667722397895925e-05
    * MSE for TRON (ID=13) = 2.398156674119335e-05
* 0.1
    * MSE for Binance Coin (ID=0 ) = 2.918143436956739e-05
    * MSE for Bitcoin (ID=1 ) = 4.018086925690861e-06
    * MSE for Bitcoin Cash (ID=2 ) = 3.9102488104637e-05
    * MSE for Cardano (ID=3 ) = 2.0169718501550346e-05
    * MSE for Dogecoin (ID=4 ) = 6.275430076908439e-05
    * MSE for EOS.IO (ID=5 ) = 2.3012182828493816e-05
    * MSE for Ethereum (ID=6 ) = 6.160158033592445e-06
    * MSE for Ethereum Classic (ID=7 ) = 7.894583329154167e-05
    * MSE for IOTA (ID=8 ) = 6.34633917731525e-05
    * MSE for Litecoin (ID=9 ) = 1.2507730857841256e-05
    * MSE for Maker (ID=10) = 3.510624174720759e-05
    * MSE for Monero (ID=11) = 4.325322198684414e-05
    * MSE for Stellar (ID=12) = 2.653277921199377e-05
    * MSE for TRON (ID=13) = 2.3900588501640176e-05
* 0.15
    * MSE for Binance Coin (ID=0 ) = 2.906728791987851e-05
    * MSE for Bitcoin (ID=1 ) = 4.012009119027506e-06
    * MSE for Bitcoin Cash (ID=2 ) = 3.849959999492438e-05
    * MSE for Cardano (ID=3 ) = 2.0134913158302646e-05
    * MSE for Dogecoin (ID=4 ) = 6.219233531676569e-05
    * MSE for EOS.IO (ID=5 ) = 2.296040867148992e-05
    * MSE for Ethereum
(ID=6 ) = 6.148232699805026e-06
    * MSE for Ethereum Classic (ID=7 ) = 7.856982730610658e-05
    * MSE for IOTA (ID=8 ) = 6.330464962075509e-05
    * MSE for Litecoin (ID=9 ) = 1.2485867092920265e-05
    * MSE for Maker (ID=10) = 3.4992438320594135e-05
    * MSE for Monero (ID=11) = 4.312419857218168e-05
    * MSE for Stellar (ID=12) = 2.6447636524016766e-05
    * MSE for TRON (ID=13) = 2.3835317555694767e-05
* 0.2
    * MSE for Binance Coin (ID=0 ) = 2.8985606388263934e-05
    * MSE for Bitcoin (ID=1 ) = 4.007453782364536e-06
    * MSE for Bitcoin Cash (ID=2 ) = 3.8021043146212407e-05
    * MSE for Cardano (ID=3 ) = 2.0110089346564885e-05
    * MSE for Dogecoin (ID=4 ) = 6.17364561098782e-05
    * MSE for EOS.IO (ID=5 ) = 2.2915108461141584e-05
    * MSE for Ethereum (ID=6 ) = 6.138301529397249e-06
    * MSE for Ethereum Classic (ID=7 ) = 7.832177019826656e-05
    * MSE for IOTA (ID=8 ) = 6.315723308854394e-05
    * MSE for Litecoin (ID=9 ) = 1.2466114909651829e-05
    * MSE for Maker (ID=10) = 3.4892182564203884e-05
    * MSE for Monero (ID=11) = 4.301188737588808e-05
    * MSE for Stellar (ID=12) = 2.638958880421093e-05
    * MSE for TRON (ID=13) = 2.3801337578863542e-05
* 0.3
    * MSE for Binance Coin (ID=0 ) = 2.8855309357508666e-05
    * MSE for Bitcoin (ID=1 ) = 4.0004938209675396e-06
    * MSE for Bitcoin Cash (ID=2 ) = 3.723095193500236e-05
    * MSE for Cardano (ID=3 ) = 2.007565780917732e-05
    * MSE for Dogecoin (ID=4 ) = 6.110578003147326e-05
    * MSE for EOS.IO (ID=5 ) = 2.285565092052967e-05
    * MSE for Ethereum (ID=6 ) = 6.126424181061938e-06
    * MSE for Ethereum Classic (ID=7 ) = 7.780285530052265e-05
    * MSE for IOTA (ID=8 ) = 6.292324862087826e-05
    * MSE for Litecoin (ID=9 ) = 1.2440857409927055e-05
    * MSE for Maker (ID=10) = 3.476826965340627e-05
    * MSE for Monero (ID=11) = 4.288833109447802e-05
    * MSE for Stellar (ID=12) = 2.6311500287775883e-05
    * MSE for TRON (ID=13) = 2.3701437578863542e-05
* 0.5
    * MSE for Binance Coin (ID=0 ) = 2.867480524637792e-05
    * MSE for Bitcoin (ID=1 ) = 3.992312983444045e-06
    * MSE for Bitcoin Cash (ID=2 ) = 3.619527661126361e-05
    * MSE for Cardano (ID=3 ) = 2.0018321538758788e-05
    * MSE for Dogecoin (ID=4 ) = 6.0114898396464895e-05
    * MSE for EOS.IO (ID=5 ) = 2.2768479456694815e-05
    * MSE for Ethereum (ID=6 ) = 6.109885298130273e-06
    * MSE for Ethereum Classic (ID=7 ) = 7.691029259697072e-05
    * MSE for IOTA (ID=8 ) = 6.269456139448866e-05
    * MSE for Litecoin (ID=9 ) = 1.24079127353404e-05
    * MSE for Maker (ID=10) = 3.455629315944908e-05
    * MSE for Monero (ID=11) = 4.2709793262059034e-05
    * MSE for Stellar (ID=12) = 2.618985330804133e-05
    * MSE for TRON (ID=13) = 2.3651823887288394e-05
* 1.0
    * MSE for Binance Coin (ID=0 ) = 2.8421766066166344e-05
    * MSE for Bitcoin (ID=1 ) = 3.977910071380815e-06
    * MSE for Bitcoin Cash (ID=2 ) = 3.421758037005566e-05
    * MSE for Cardano (ID=3 ) = 1.9922244824241053e-05
    * MSE for Dogecoin (ID=4 ) = 5.864857444509723e-05
    * MSE for EOS.IO (ID=5 ) = 2.2651520558892847e-05
    * MSE for Ethereum (ID=6 ) = 6.0884637499918094e-06
    * MSE for Ethereum Classic (ID=7 ) = 7.555638184096643e-05
    * MSE for IOTA (ID=8 ) = 6.215193825729473e-05
    * MSE for Litecoin (ID=9 ) = 1.23445749259029e-05
    * MSE for Maker (ID=10) = 3.422455248376916e-05
    * MSE for Monero (ID=11) = 4.2417450147361976e-05
    * MSE for Stellar (ID=12) = 2.6038406653946086e-05
    * MSE for TRON (ID=13) = 2.352371412841004e-05
* 1.5
    * MSE for Binance Coin (ID=0 ) = 2.8297321550904047e-05
    * MSE for Bitcoin (ID=1 ) = 3.968379969103903e-06
    * MSE for Bitcoin Cash (ID=2 ) = 3.353898471378931e-05
    * MSE for Cardano (ID=3 ) = 1.9883203040775137e-05
    * MSE for Dogecoin (ID=4 ) = 5.778379560457684e-05
    * MSE for EOS.IO (ID=5 ) = 2.2590663206322334e-05
    * MSE for Ethereum (ID=6 ) = 6.076546035291101e-06
    * MSE for Ethereum Classic (ID=7 ) = 7.503772850492168e-05
    * MSE for IOTA (ID=8 ) = 6.203714433388084e-05
    * MSE for Litecoin (ID=9 ) = 1.2331563530592825e-05
    * MSE for Maker (ID=10) = 3.402739898817774e-05
    * MSE for Monero (ID=11) = 4.230139901103305e-05
    * MSE for Stellar (ID=12) = 2.5958566203286074e-05
    * MSE for TRON (ID=13) =
2.346579838766222e-05
* 2.0
    * MSE for Binance Coin (ID=0 ) = 2.8386586585278976e-05
    * MSE for Bitcoin (ID=1 ) = 3.9670081139917455e-06
    * MSE for Bitcoin Cash (ID=2 ) = 3.318593860663377e-05
    * MSE for Cardano (ID=3 ) = 1.991878127130314e-05
    * MSE for Dogecoin (ID=4 ) = 5.843289004108042e-05
    * MSE for EOS.IO (ID=5 ) = 2.2691837970777888e-05
    * MSE for Ethereum (ID=6 ) = 6.0759537368640395e-06
    * MSE for Ethereum Classic (ID=7 ) = 7.622727900890767e-05
    * MSE for IOTA (ID=8 ) = 6.234352358611115e-05
    * MSE for Litecoin (ID=9 ) = 1.2353996171819229e-05
    * MSE for Maker (ID=10) = 3.4081931853343297e-05
    * MSE for Monero (ID=11) = 4.248871420986206e-05
    * MSE for Stellar (ID=12) = 2.6105811556364908e-05
    * MSE for TRON (ID=13) = 2.3533956873151908e-05

#### max_depth
```
LGBMRegressor(
    boosting_type='gbdt',
    verbose = 0,
    learning_rate = 1.5,
    num_leaves = 35,
    feature_fraction=0.8,
    bagging_fraction= 0.9,
    bagging_freq= 8,
    lambda_l1= 0.6,
    lambda_l2= 0
)
```
* 10
    * MSE for Binance Coin (ID=0 ) = 2.830273994479173e-05
    * MSE for Bitcoin (ID=1 ) = 3.971531407926195e-06
    * MSE for Bitcoin Cash (ID=2 ) = 3.351836916154162e-05
    * MSE for Cardano (ID=3 ) = 1.9889381270555074e-05
    * MSE for Dogecoin (ID=4 ) = 5.782125676764511e-05
    * MSE for EOS.IO (ID=5 ) = 2.259487226312694e-05
    * MSE for Ethereum (ID=6 ) = 6.077882389590446e-06
    * MSE for Ethereum Classic (ID=7 ) = 7.502913718281185e-05
    * MSE for IOTA (ID=8 ) = 6.200538525089023e-05
    * MSE for Litecoin (ID=9 ) = 1.2328182755427446e-05
    * MSE for Maker (ID=10) = 3.40675190277553e-05
    * MSE for Monero (ID=11) = 4.233813125604648e-05
    * MSE for Stellar (ID=12) = 2.598367285905819e-05
    * MSE for TRON (ID=13) = 2.3484732577097547e-05
* 15
    * MSE for Binance Coin (ID=0 ) = 2.8268980011454977e-05
    * MSE for Bitcoin (ID=1 ) = 3.968379969103903e-06
    * MSE for Bitcoin Cash (ID=2 ) = 3.353898471378931e-05
    * MSE for Cardano (ID=3 ) = 1.9877147811320407e-05
    * MSE for Dogecoin (ID=4 ) = 5.778379560457684e-05
    * MSE for EOS.IO (ID=5 ) = 2.2590663206322334e-05
    * MSE for
Ethereum (ID=6 ) = 6.076546035291101e-06
    * MSE for Ethereum Classic (ID=7 ) = 7.503772850492168e-05
    * MSE for IOTA (ID=8 ) = 6.19910057115082e-05
    * MSE for Litecoin (ID=9 ) = 1.2331563530592825e-05
    * MSE for Maker (ID=10) = 3.402739898817774e-05
    * MSE for Monero (ID=11) = 4.230476642768924e-05
    * MSE for Stellar (ID=12) = 2.5958566203286074e-05
    * MSE for TRON (ID=13) = 2.346579838766222e-05
* 20
    * MSE for Binance Coin (ID=0 ) = 2.8297321550904047e-05
    * MSE for Bitcoin (ID=1 ) = 3.968379969103903e-06
    * MSE for Bitcoin Cash (ID=2 ) = 3.353898471378931e-05
    * MSE for Cardano (ID=3 ) = 1.9883203040775137e-05
    * MSE for Dogecoin (ID=4 ) = 5.778379560457684e-05
    * MSE for EOS.IO (ID=5 ) = 2.2590663206322334e-05
    * MSE for Ethereum (ID=6 ) = 6.076546035291101e-06
    * MSE for Ethereum Classic (ID=7 ) = 7.503772850492168e-05
    * MSE for IOTA (ID=8 ) = 6.203714433388084e-05
    * MSE for Litecoin (ID=9 ) = 1.2331563530592825e-05
    * MSE for Maker (ID=10) = 3.402739898817774e-05
    * MSE for Monero (ID=11) = 4.230139901103305e-05
    * MSE for Stellar (ID=12) = 2.5958566203286074e-05
    * MSE for TRON (ID=13) = 2.346579838766222e-05
* 30
    * MSE for Binance Coin (ID=0 ) = 2.8297321550904047e-05
    * MSE for Bitcoin (ID=1 ) = 3.968379969103903e-06
    * MSE for Bitcoin Cash (ID=2 ) = 3.353898471378931e-05
    * MSE for Cardano (ID=3 ) = 1.9883203040775137e-05
    * MSE for Dogecoin (ID=4 ) = 5.778379560457684e-05
    * MSE for EOS.IO (ID=5 ) = 2.2590663206322334e-05
    * MSE for Ethereum (ID=6 ) = 6.076546035291101e-06
    * MSE for Ethereum Classic (ID=7 ) = 7.503772850492168e-05
    * MSE for IOTA (ID=8 ) = 6.203714433388084e-05
    * MSE for Litecoin (ID=9 ) = 1.2331563530592825e-05
    * MSE for Maker (ID=10) = 3.402739898817774e-05
    * MSE for Monero (ID=11) = 4.230139901103305e-05
    * MSE for Stellar (ID=12) = 2.5958566203286074e-05
    * MSE for TRON (ID=13) = 2.346579838766222e-05

#### boosting_type
```
LGBMRegressor(
    verbose = 0,
    learning_rate = 1.5,
    num_leaves = 35,
    feature_fraction=0.8,
    bagging_fraction= 0.9,
    bagging_freq= 8,
    lambda_l1= 0.6,
    lambda_l2= 0
)
```
* 'gbdt', traditional Gradient Boosting Decision Tree
    * MSE for Binance Coin (ID=0 ) = 2.8297321550904047e-05
    * MSE for Bitcoin (ID=1 ) = 3.968379969103903e-06
    * MSE for Bitcoin Cash (ID=2 ) = 3.353898471378931e-05
    * MSE for Cardano (ID=3 ) = 1.9883203040775137e-05
    * MSE for Dogecoin (ID=4 ) = 5.778379560457684e-05
    * MSE for EOS.IO (ID=5 ) = 2.2590663206322334e-05
    * MSE for Ethereum (ID=6 ) = 6.076546035291101e-06
    * MSE for Ethereum Classic (ID=7 ) = 7.503772850492168e-05
    * MSE for IOTA (ID=8 ) = 6.203714433388084e-05
    * MSE for Litecoin (ID=9 ) = 1.2331563530592825e-05
    * MSE for Maker (ID=10) = 3.402739898817774e-05
    * MSE for Monero (ID=11) = 4.230139901103305e-05
    * MSE for Stellar (ID=12) = 2.5958566203286074e-05
    * MSE for TRON (ID=13) = 2.346579838766222e-05
* 'dart', Dropouts meet Multiple Additive Regression Trees
    * MSE for Binance Coin (ID=0 ) = 2.861552981800196e-05
    * MSE for Bitcoin (ID=1 ) = 3.986996394108109e-06
    * MSE for Bitcoin Cash (ID=2 ) = 3.5541403695078286e-05
    * MSE for Cardano (ID=3 ) = 1.9985109269998895e-05
    * MSE for Dogecoin (ID=4 ) = 5.9545485184461394e-05
    * MSE for EOS.IO (ID=5 ) = 2.276030912737726e-05
    * MSE for Ethereum (ID=6 ) = 6.1058968428090304e-06
    * MSE for Ethereum Classic (ID=7 ) = 7.653341862694617e-05
    * MSE for IOTA (ID=8 ) = 6.252573392591707e-05
    * MSE for Litecoin (ID=9 ) = 1.239507784146932e-05
    * MSE for Maker (ID=10) = 3.4427643536435954e-05
    * MSE for Monero (ID=11) = 4.260464158108526e-05
    * MSE for Stellar (ID=12) = 2.6148974278591002e-05
    * MSE for TRON (ID=13) = 2.3614055822540097e-05
* 'goss', Gradient-based One-Side Sampling
    * MSE for Binance Coin (ID=0 ) = 2.8630436222736632e-05
    * MSE for Bitcoin (ID=1 ) = 3.980782161577331e-06
    * MSE for Bitcoin Cash (ID=2 ) = 3.39647751030129e-05
    * MSE for Cardano (ID=3 ) = 2.00443793666652e-05
    * MSE for Dogecoin (ID=4 ) = 5.842851450820307e-05
    * MSE for EOS.IO (ID=5 ) = 2.2757154055606087e-05
    * MSE for Ethereum (ID=6 ) = 6.102483663518405e-06
    * MSE for Ethereum
Classic (ID=7 ) = 7.533078033198039e-05
    * MSE for IOTA (ID=8 ) = 6.283206579230877e-05
    * MSE for Litecoin (ID=9 ) = 1.241762571876302e-05
    * MSE for Maker (ID=10) = 3.475188349632839e-05
    * MSE for Monero (ID=11) = 4.295045366788878e-05
    * MSE for Stellar (ID=12) = 2.6244451428592598e-05
    * MSE for TRON (ID=13) = 2.3647323951908538e-05
* 'rf', Random Forest
    * MSE for Binance Coin (ID=0 ) = 2.9546223866259905e-05
    * MSE for Bitcoin (ID=1 ) = 4.043469669104509e-06
    * MSE for Bitcoin Cash (ID=2 ) = 4.1203844415454476e-05
    * MSE for Cardano (ID=3 ) = 2.0293813613583373e-05
    * MSE for Dogecoin (ID=4 ) = 6.476385996176355e-05
    * MSE for EOS.IO (ID=5 ) = 2.3194738859482323e-05
    * MSE for Ethereum (ID=6 ) = 6.20731163488941e-06
    * MSE for Ethereum Classic (ID=7 ) = 8.005680504219078e-05
    * MSE for IOTA (ID=8 ) = 6.397949375752386e-05
    * MSE for Litecoin (ID=9 ) = 1.2592088094528807e-05
    * MSE for Maker (ID=10) = 3.5653200493360276e-05
    * MSE for Monero (ID=11) = 4.3777101669941676e-05
    * MSE for Stellar (ID=12) = 2.6918108556044384e-05
    * MSE for TRON (ID=13) = 2.4131598086748096e-05

### LGBM (1000 Estimators)

[I purchased Bitcoin !](https://www.kaggle.com/shivansh002/i-purchased-bitcoin#Predict-&-submit)

![](https://i.imgur.com/5QJsMAh.png)

The differences from [G-Research Crypto - Starter LGBM Pipeline](https://www.kaggle.com/julian3833/g-research-starter-lgbm-pipeline?scriptVersionId=78916593) are:
* The higher-scoring notebook uses `LGBMRegressor(n_estimators=1500,num_leaves=700,learning_rate=0.09)`, while the lower-scoring one uses `LGBMRegressor(n_estimators=10)`
* The higher-scoring notebook uses the Strict dataset

### LGBM (5000 Estimators + Data Preprocess)

[[Crypto] Beginner's Try for simple LGBM (En/Jp)](https://www.kaggle.com/zeze1solo/crypto-beginner-s-try-for-simple-lgbm-en-jp/edit)

![](https://i.imgur.com/BzEreLm.png)

The author tried the following, in order:
1. Trained with Ridge; Public Score 0.007
2. Switched the model to lightgbm; Public Score 0.039
3. Replaced missing or invalid values (NaN, Inf, -Inf) with the mean; Public Score dropped to 0.028, so those rows were simply dropped afterwards
4. Added the weight from asset_details.csv as a training feature; Public Score dropped to 0.036
5. Ran into memory problems with the data, so each column's dtype was downcast to int8, int16, int32, int64, float16, float32, or float64 according to its minimum and maximum values
6. Split off an evaluation set (6:4); Public Score dropped to 0.018
7. Changed the model to `LGBMRegressor(N_estimators = 200)`; Public Score rose to 0.069
8. Built one model per cryptocurrency; Public Score rose to 0.1938
9. Added `upper_shadow = High - max(Close, Open)` and `lower_shadow = min(Close, Open) - Low`; Public Score fell to 0.1761
    ![](https://i.imgur.com/gbxGkw7.png)
10. Changed n_estimators from 200 to 500; Public Score rose to 0.2516
11. Tuned learning_rate and num_leaves (the original values are unknown); Public Score rose to 0.4278
12. Finally, just changing n_estimators from 500 to 5000, `LGBMRegressor(N_estimators=5000, num_leaves=300, learning_rate=0.09, random_seed=1234)`, raised the Public Score to 0.7005
    ![](https://i.imgur.com/HpcTh5z.png)

### LSTM

[Time Series Modeling 📈 - LSTM 🔥](https://www.kaggle.com/yamqwe/g-research-lstm-starter-notebook?scriptVersionId=81673007)

![](https://i.imgur.com/Se1y1Jm.png)

### XGBoost

[G-Research: XGBoost with GPU (Fit in 1min)](https://www.kaggle.com/yamqwe/g-research-xgboost-with-gpu-fit-in-1min/notebook)

![](https://i.imgur.com/L4EHKcP.png)

## How the Models Work

### XGBoost

[xgboost入門與實戰(原理篇)](https://codertw.com/%E7%A8%8B%E5%BC%8F%E8%AA%9E%E8%A8%80/635146/)

XGBoost is a perennial winner in Kaggle competitions. It works by combining many weak learners into one strong learner, and it adds a number of improvements over a plain GBDT (Gradient Boosting Decision Tree).

Like other GBDT implementations, XGBoost uses Classification and Regression Trees (CART), splitting nodes on various conditions.

![](https://i.imgur.com/cBEZjJP.png)

Finally the regression trees are combined: the prediction for a sample is the sum of the values of the leaves it reaches, one per tree.

![](https://i.imgur.com/ahcIkOY.png)

It differs from other GBDT implementations in the following ways:

1. Traditional GBDT uses CART as the base learner. xgboost also supports linear base learners, in which case it is equivalent to logistic regression (for classification) or linear regression (for regression) with L1 and L2 regularization terms. This is selected with the `booster` parameter (default `gbtree`): `gbtree` for tree-based models, `gblinear` for linear models.
2. Traditional GBDT uses only first-order derivative information during optimization, whereas xgboost takes a second-order Taylor expansion of the loss function and uses both the first and the second derivative. xgboost also supports custom loss functions, as long as the function has first and second derivatives.
3. xgboost adds a regularization term to the loss function to control model complexity. The term contains the number of leaves in the tree and the sum of squares (squared L2 norm) of the scores output at the leaves. From a bias-variance tradeoff perspective, the regularization term reduces model variance, making the learned model simpler and preventing overfitting; this is one of xgboost's advantages over traditional GBDT. The regularization has two parts, both aimed at preventing overfitting: pruning, which traditional GBDT also has, and L2 smoothing of the leaf outputs, which is new.
4.
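   Shrinkage and column subsampling, again to prevent overfitting
    * Shrinkage is similar to a learning rate: after each step of tree boosting the new tree is scaled by a factor η (a weight), which reduces the influence of each individual tree and leaves room for later trees to improve the model.
    * Column subsampling samples columns (features). It is said to be borrowed from random forests, it prevents overfitting better than traditional row subsampling (which is also supported), and it helps the parallel processing described below.
5. Split finding algorithms
    * Exact greedy algorithm: a greedy search for the optimal split point
    * Approximate algorithm: introduces the notion of candidate split points. A histogram algorithm first estimates the distribution of candidate split points, then the continuous feature values are mapped into buckets according to those candidates and summary statistics are aggregated.
    * Weighted Quantile Sketch: a distributed weighted histogram algorithm, designed for cases where the data cannot be loaded into memory at once or where the algorithm is inefficient in a distributed setting
    * A parallelizable approximate histogram algorithm. When a tree node is split, the gain of every candidate split point of every feature has to be computed, i.e. the greedy method enumerates all possible split points. When the data cannot be loaded into memory at once, or in a distributed setting, the greedy algorithm becomes very inefficient, so xgboost also proposes a parallelizable approximate histogram algorithm to generate candidate split points efficiently.
6. Handling of missing values. For samples with missing feature values, xgboost automatically learns which direction to send them at a split. This is the sparsity-aware algorithm (Sparsity-aware Split Finding).
7. Built-in Cross-Validation
    * XGBoost allows the user to run a cross-validation at each iteration of the boosting process, so it is easy to get the exact optimum number of boosting iterations in a single run. This is unlike GBM, where we have to run a grid search and only a limited number of values can be tested.
8. Continue on Existing Model
    * The user can start training an XGBoost model from the last iteration of a previous run. This can be a significant advantage in certain specific applications. The GBM implementation in sklearn also has this feature, so they are even on this point.
9. High Flexibility
    * XGBoost allows users to define custom optimization objectives and evaluation criteria. This adds a whole new dimension to the model, and there is no limit to what we can do.
10. Parallel processing: system design modules, block structure design, etc.
    * xgboost supports parallelism. The parallelism is not across trees: xgboost still has to finish one iteration before starting the next (the cost function of iteration t contains the predictions of the previous t-1 iterations). The parallelism is across features. The most time-consuming step in learning a decision tree is sorting the feature values (needed to find the best split point). xgboost sorts the data once before training and stores it in a block structure that is reused in every later iteration, which greatly reduces computation. The block structure also makes parallelism possible: when a node is split, the gain of every feature has to be computed and the feature with the largest gain is chosen, so the per-feature gain computations can run in multiple threads.

### LightGBM

LightGBM is more efficient than XGBoost. It adds Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB).

* Gradient-based One-Side Sampling (GOSS): reduces computation by keeping the instances with large gradients while randomly sampling the instances with small gradients. During boosted-tree training the objective fits the negative gradient (approximately the residual). A small gradient means the training error on that instance is already small, so learning further from it improves accuracy much less than learning from instances with large gradients.
    * Sort the samples in descending order of the absolute value of their gradients.
    * Keep the top n fraction of the samples as subset Z1.
    * From the remaining samples, randomly sample a fraction m of the data as subset Z2.
    * When computing the information gain, multiply the gradients of the sampled Z2 by (1-n)/m, so that the original data distribution is not changed. Here n and m are fractions of the full data set.
* Exclusive Feature Bundling (EFB): improves efficiency by bundling features to reduce the feature dimension (in effect a dimensionality-reduction technique). Bundled features are normally mutually exclusive (when one is non-zero the other is zero), so bundling them loses no information. If two features are not fully exclusive (both are non-zero in some rows), a metric called the conflict ratio measures how non-exclusive they are. When it is small, the two features can still be bundled without affecting the final accuracy.
    * Sort the features by their number of non-zero values
    * Compute the conflict ratio between features
    * Iterate over the features and try to merge them so that the conflict ratio is minimized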
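
The GOSS steps listed in this section can be sketched in a few lines of NumPy. This is only an illustrative sketch, not LightGBM's actual implementation: the function name `goss_sample` is hypothetical, the arguments `top_rate` and `other_rate` play the roles of n and m and are assumed to be fractions of the full data set with `other_rate > 0`, and samples are ranked by the absolute value of their gradient.

```python
import numpy as np


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Hypothetical helper: pick instances GOSS-style and return (indices, weights)."""
    gradients = np.asarray(gradients, dtype=float)
    n_samples = gradients.shape[0]
    rng = np.random.default_rng(seed)

    n_top = int(top_rate * n_samples)      # size of Z1
    n_other = int(other_rate * n_samples)  # size of Z2

    # Step 1: sort by absolute gradient, largest first.
    order = np.argsort(-np.abs(gradients))

    # Step 2: keep the large-gradient instances (Z1).
    top_idx = order[:n_top]

    # Step 3: randomly sample from the small-gradient remainder (Z2).
    rest_idx = order[n_top:]
    n_other = min(n_other, rest_idx.size)
    other_idx = rng.choice(rest_idx, size=n_other, replace=False)

    # Step 4: up-weight Z2 by (1 - n) / m so the total weight of the
    # small-gradient part roughly matches what it was before sampling.
    indices = np.concatenate([top_idx, other_idx])
    weights = np.ones(indices.size)
    weights[n_top:] = (1.0 - top_rate) / other_rate
    return indices, weights


if __name__ == "__main__":
    grads = np.random.default_rng(42).normal(size=1000)
    idx, w = goss_sample(grads, top_rate=0.2, other_rate=0.1)
    print(idx.size, w.sum())
```

With 1000 instances, `top_rate=0.2` and `other_rate=0.1`, the sampler keeps 300 instances, and the 100 sampled small-gradient instances each carry weight 8, standing in for the 800 instances they were drawn from.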