AutoGluon
===

###### tags: `ML / ensemble`
###### tags: `ML`, `AutoML`, `CPU+GPU`, `sklearn`, `XGBoost`, `NeuralNetwork+NN`

![](https://i.imgur.com/VxfOFtV.png =50%x)

<br>

[TOC]

<br>

## github

> [awslabs / autogluon](https://github.com/awslabs/autogluon)

- [AutoGluon Roadmap 2022](https://github.com/awslabs/autogluon/blob/master/ROADMAP.md)
- Documentation: https://auto.gluon.ai/stable/index.html
- [AutoGluon overview & example applications](https://towardsdatascience.com/autogluon-deep-learning-automl-5cdb4e2388ec)
- To install AutoGluon for Linux with GPU support, run the following command in a terminal, or refer to the installation wiki for a CPU-only installation:
  ```
  pip install --upgrade mxnet-cu100
  ```
- [Apache MXNet on AWS](https://aws.amazon.com/tw/mxnet/)

<br>

## Docker

> [autogluon/autogluon](https://hub.docker.com/r/autogluon/autogluon)

- Start Container and Notebook Server with GPU support
  ```
  $ docker pull autogluon/autogluon:0.5.2-cuda11.2-jupyter-ubuntu20.04-py3.8
  $ docker run --gpus all --shm-size=1G --rm -it -p 8888:8888 \
      autogluon/autogluon:0.5.2-cuda11.2-jupyter-ubuntu20.04-py3.8
  ```
- Start Container and Notebook Server with CPU-only support
  ```
  $ docker pull autogluon/autogluon:0.5.2-cpu-jupyter-ubuntu20.04-py3.8
  $ docker run --rm --shm-size=1G -it -p 8888:8888 \
      autogluon/autogluon:0.5.2-cpu-jupyter-ubuntu20.04-py3.8
  ```

<br>

## Doc

> [AutoGluon: AutoML for Text, Image, and Tabular Data](https://auto.gluon.ai/stable/index.html)

### [Installation](https://auto.gluon.ai/stable/install.html)

- To install only the tabular packages:
  - **Base (includes scikit-learn: RF+XT+KNN+LR)**
    `pip install autogluon.tabular`
  - **Base + CAT (catboost)**
    `pip install autogluon.tabular[catboost]`
    :warning: **`pip install autogluon.tabular`** does not include catboost
  - **Base + GBM (lightgbm)**
    `pip install autogluon.tabular[lightgbm]==0.5.2`
  - **Base + XGB (xgboost)**
    `pip install autogluon.tabular[xgboost]==0.5.2`
  - **Base + NN_MXNET (mxnet)**
    `pip install mxnet --upgrade` or `pip install mxnet_cu101 --upgrade`
  - **Base + NN_TORCH (torch)**
    `pip install torch`
  - **Base + FASTAI (fastai)**
    `pip install autogluon.tabular[fastai]`
  - **Base + AG_TEXT_NN**
    `pip install autogluon.text`
  - **Base + everything**
    `pip install autogluon.tabular[all]`
    - catboost
    - lightgbm
    - xgboost
    - fastai
    - torch
    - ...

    (Note: the full install does not include NN_MXNET; no MXNET log was seen.)
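The base install above can be smoke-tested with only the sklearn-backed models. A minimal sketch (the iris frame is just a stand-in dataset; any pandas DataFrame with a label column works):

```python
from sklearn.datasets import load_iris
from autogluon.tabular import TabularPredictor

# Small stand-in dataset: a DataFrame whose 'target' column is the label.
train_data = load_iris(as_frame=True).frame

# Restrict training to the sklearn-backed models shipped with the base install.
predictor = TabularPredictor(label='target', eval_metric='accuracy').fit(
    train_data,
    hyperparameters={'RF': {}, 'XT': {}, 'KNN': {}},
)
print(predictor.leaderboard())
```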
### [APIs](https://auto.gluon.ai/stable/api/index.html)

- [TabularDataset](https://auto.gluon.ai/0.1.0/api/autogluon.task.html#autogluon.tabular.TabularDataset)
- [TabularPredictor](https://auto.gluon.ai/0.1.0/api/autogluon.task.html#autogluon.tabular.TabularPredictor)
  - [TabularPredictor.fit](https://auto.gluon.ai/0.1.0/api/autogluon.task.html#autogluon.tabular.TabularPredictor.fit)
    - time_limit: int
    - hyperparameters: str or dict
      - Stable model options include: 'GBM' (LightGBM), 'CAT' (CatBoost), 'XGB' (XGBoost), 'RF' (random forest), 'XT' (extremely randomized trees), 'KNN' (k-nearest neighbors), 'LR' (linear regression), 'NN' (neural network with MXNet backend), 'FASTAI' (neural network with FastAI backend)
      - Experimental model options include: 'FASTTEXT' (FastText), 'AG_TEXT_NN' (Multimodal Text+Tabular model, GPU is required), 'TRANSF' (Tabular Transformer, GPU is recommended)
  - [TabularPredictor.predict](https://auto.gluon.ai/0.1.0/api/autogluon.task.html#autogluon.tabular.TabularPredictor.predict)
  - [TabularPredictor.evaluate](https://auto.gluon.ai/0.1.0/api/autogluon.task.html#autogluon.tabular.TabularPredictor.evaluate)
    - Returns evaluation scores
    - The input data must contain both X and y (the label column)
    - Test example
      ![](https://i.imgur.com/gEnaHKk.png)
  - [TabularPredictor.evaluate_predictions](https://auto.gluon.ai/0.1.0/api/autogluon.task.html#autogluon.tabular.TabularPredictor.evaluate_predictions)
    - Returns evaluation scores
    - Requires y_true and y_pred
    - Test example
      ![](https://i.imgur.com/Q0mHzzo.png)
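A small sketch contrasting the two evaluation calls above; it assumes `predictor`, `test_data`, and `label` from an earlier `fit`:

```python
# evaluate(): pass a table that still contains the label column.
scores = predictor.evaluate(test_data)

# evaluate_predictions(): pass ground truth and predictions you computed yourself.
y_true = test_data[label]
y_pred = predictor.predict(test_data.drop(columns=[label]))
scores_from_preds = predictor.evaluate_predictions(y_true=y_true, y_pred=y_pred)

print(scores)
print(scores_from_preds)
```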
<br>
<hr>
<br>

## Problem types

- [log] `TabularPredictor(...).fit(...)`
  > predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])

<br>
<hr>
<br>

## Algorithms

### tabular

- [[doc] autogluon.tabular.models](https://auto.gluon.ai/0.1.0/api/autogluon.tabular.models.html)
- [[doc] autogluon.tabular.TabularPredictor.fit](https://auto.gluon.ai/stable/api/autogluon.predictor.html#autogluon.tabular.TabularPredictor.fit)
  - **Stable models:** 'GBM' (LightGBM), 'CAT' (CatBoost), 'XGB' (XGBoost), 'RF' (random forest), 'XT' (extremely randomized trees), 'KNN' (k-nearest neighbors), 'LR' (linear regression), 'NN_MXNET' (neural network implemented in MXNet), 'NN_TORCH' (neural network implemented in PyTorch), 'FASTAI' (neural network with FastAI backend)
  - **Experimental models:** 'FASTTEXT' (FastText), 'AG_TEXT_NN' (Multimodal Text+Tabular model, GPU is required), 'TRANSF' (Tabular Transformer, GPU is recommended)
- **Summary**

  | Model (short) | Model (full)<br>import | Support GPU<br>(Default) | doc |
  | ------------- | ---------------------- | ------------------------ | --- |
  | GBM | **LightGBM**<br>`autogluon.tabular.models.lgb` | ✘ (default)<br>✔ (install) | [[lightgbm]](https://lightgbm.readthedocs.io/en/latest/Parameters.html) |
  | CAT | **CatBoost**<br>`autogluon.tabular.models.catboost` | ✔ | [[catboost]](https://catboost.ai/docs/concepts/parameter-tuning.html) |
  | XGB | **XGBoost**<br>`autogluon.tabular.models.xgboost` | ✔ | [[xgboost]](https://xgboost.readthedocs.io/en/latest/parameter.html) |
  | RF | **random forest**<br>`autogluon.tabular.models.rf` | <span style="color: red">✘</span> | [[sklearn]](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) |
  | XT | **extremely randomized trees**<br>`autogluon.tabular.models.xt` | <span style="color: red">✘</span> | [[sklearn]](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html) |
  | KNN | **k-nearest neighbors**<br>`autogluon.tabular.models.knn` | <span style="color: red">✘</span> | [[sklearn]](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html) |
  | LR | **linear regression**<br>`autogluon.tabular.models.lr` | <span style="color: red">✘</span> | [[sklearn]](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) |
  | NN_MXNET | neural network implemented in MXNet | ✔ | |
  | NN_TORCH | neural network implemented in PyTorch | ✔ | |
  | FASTAI | neural network with FastAI backend<br>`autogluon.tabular.models.fastainn` | ✔ | [[fast.ai]](https://docs.fast.ai/tabular.models.html) |
  | * FASTTEXT | **FastText** | - | |
  | * AG_TEXT_NN | Multimodal Text+Tabular model,<br>GPU is required | ✔ (no-CPU) | |
  | * TRANSF | **Tabular Transformer**<br>GPU is recommended | ✔ | |

  - <b style='color: red'>`*`</b> marks experimental models

<br>

- The default set of algorithms does not include the experimental models or NN_MXNET
- `pip install autogluon.tabular[all]` (full install) does not pull in the NN_MXNET (mxnet) package

<br>

- **Run results**: [kept on GitLab](http://10.78.26.44:30000/ai_maker_template/ml_sklearn/-/issues/19#note_73146)
  - **NN_MXNET**: poor results
  - **FASTTEXT**: could not be trained
    > No valid features to train FastText_2... Skipping this model.
  - **AG_TEXT_NN**: poor training/inference results (acc = 6%); presumably only intended for text data?

<br>

- [[official][doc] autogluon.tabular.models](https://auto.gluon.ai/stable/api/autogluon.tabular.models.html)

<br>

## Algorithm

> GPU options

### Usages

- ### global
  ```python=
  %%time
  predictor = TabularPredictor(
      label,
      eval_metric='accuracy',
      path=path
  ).fit(
      train_data,
      hyperparameters={
          'XGB': [{}]   # at least one base config is required (must not be an empty list)
          # or
          # 'XGB': {}
      },
      ag_args_fit={'num_gpus': 1}
  )
  ```
  - `'XGB': []`: fails with the error message "No base models to train on"
- ### local
  ```python=
  %%time
  predictor = TabularPredictor(
      label,
      eval_metric='accuracy',
      path=path
  ).fit(
      train_data,
      hyperparameters={
          'XGB': [
              {'ag_args_fit': {'num_gpus': 1}},    # train with GPU (first run)
              {'ag_args_fit': {'num_gpus': 0}},    # train with CPU (second run)
          ]
      }
  )
  ```
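When the same notebook may run on both GPU and CPU machines, one way to avoid the per-model failures listed in the warnings below is to gate `num_gpus` on what PyTorch reports. A sketch reusing `label`, `path`, and `train_data` from the examples above:

```python
import torch
from autogluon.tabular import TabularPredictor

# Request a GPU only if one is actually visible in this environment.
num_gpus = 1 if torch.cuda.is_available() else 0

predictor = TabularPredictor(
    label,
    eval_metric='accuracy',
    path=path
).fit(
    train_data,
    hyperparameters={'XGB': {}, 'GBM': {}},
    ag_args_fit={'num_gpus': num_gpus},
)
```

GBM and the Torch/FastAI networks fall back to CPU on their own, but XGB, CAT, TRANSF, and NN_MXNET do not, as the logs below show.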
:::warning
:warning: **Limitations in a GPU environment**
- NeuralNetTorch
  > Fitting model: NeuralNetTorch ...
  > TabularNeuralNetTorchModel not yet able to use more than 1 GPU. 'num_gpus' is set to >1, but we will be using only 1 GPU.
  > [![](https://i.imgur.com/VP16RPd.png)](https://i.imgur.com/VP16RPd.png)
:::

:::warning
:warning: **Enabling the GPU in a no-GPU environment produces error messages**
- ### XGBoost
  ```python
  predictor = TabularPredictor(
      label,
      eval_metric='accuracy',
      path=path
  ).fit(
      train_data,
      hyperparameters={
          'XGB': { }
      },
      ag_args_fit={'num_gpus': 1}
  )
  ```
  > Fitting model: XGBoost ...
  > Warning: Exception caused XGBoost to fail during training... Skipping this model.
  > [14:44:21] ../src/gbm/gbtree.cc:548: Check failed: common::AllVisibleGPUs() >= 1 (0 vs. 1) : No visible GPU is found for XGBoost.
  > ...
  > xgboost.core.XGBoostError: [14:44:21] ../src/gbm/gbtree.cc:548: Check failed: common::AllVisibleGPUs() >= 1 (0 vs. 1) : No visible GPU is found for XGBoost.

  [![](https://i.imgur.com/Sq7Ddq3.png)](https://i.imgur.com/Sq7Ddq3.png)
- ### TRANSF
  > RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
  > No base models to train on, skipping auxiliary stack level 2...

  [![](https://i.imgur.com/8EimDZb.png)](https://i.imgur.com/8EimDZb.png)
- ### CAT
  > catboost/cuda/cuda_lib/cuda_base.h:281: CUDA error 35: CUDA driver version is insufficient for CUDA runtime version
  >
  > _catboost.CatBoostError: catboost/cuda/cuda_lib/cuda_base.h:281: CUDA error 35: CUDA driver version is insufficient for CUDA runtime version
  > No base models to train on, skipping auxiliary stack level 2...
- ### NN_MXNET
  > MXNetError: GPU is not enabled
- ### GBM, NeuralNetTorch, FASTAI
  No error; these automatically fall back to the **CPU** version.
:::

### Articles

- [[kaggle] AutoGluon + RAPIDS (Top 1%)](https://www.kaggle.com/code/innixma/autogluon-rapids-top-1/notebook)
- [[github] How to use GPUs for tabular #1097](https://github.com/awslabs/autogluon/issues/1097)
- [[AutoGluon][doc] Tabular Prediction / FAQ](https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-faq.html?highlight=gpu#can-i-use-gpus-for-model-training)
- [[AutoGluon][doc] AutoMM for Text + Tabular - Quick Start](https://auto.gluon.ai/stable/tutorials/multimodal/multimodal_text_tabular.html?highlight=gpu)
- [[AutoGluon][doc] Training models with GPU support](https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-gpu.html)
  - CUDA toolkit is required for GPU training.

### Troubleshooting

- ### Fitting model: LightGBMXT
  ```
  Fitting model: LightGBMXT ...
  Training LightGBMXT with GPU, note that this may negatively impact model quality compared to CPU training.
  Warning: GPU mode might not be installed for LightGBM, GPU training raised an exception. Falling back to CPU training...Refer to LightGBM GPU documentation: https://github.com/Microsoft/LightGBM/tree/master/python-package#build-gpu-version
  One possible method is:
  pip uninstall lightgbm -y
  pip install lightgbm --install-option=--gpu
  ```

  ---

  ```
  pip uninstall lightgbm -y
  pip install lightgbm --install-option=--gpu
  ```
  - errors
    ```
    subprocess.CalledProcessError: Command '['cmake', '/tmp/pip-install-ekyiu2lo/lightgbm_356d35a9e6b84ff89efd84d256913b7b/compile', '-DUSE_GPU=ON']' returned non-zero exit status 1.
    ...
    Exception: Please install CMake and all required dependencies first
    The full version of error log was saved into /root/LightGBM_compilation.log
    ...
    ```
  - [When I run 'pip install lightgbm --install-option=--gpu', I got an error #1328](https://github.com/Microsoft/LightGBM/issues/1328)
    > refer to https://github.com/Microsoft/LightGBM/tree/master/python-package#build-gpu-version
    > you should install cmake, boost, opencl, and set the environment variables correctly.
  - [Build GPU Version](https://github.com/Microsoft/LightGBM/tree/master/python-package#build-gpu-version)
  - [Build CUDA Version](https://github.com/Microsoft/LightGBM/tree/master/python-package#build-cuda-version)
  - try
    - [How to install Boost on Ubuntu](https://stackoverflow.com/questions/12578499/)
      ```
      sudo apt-get install libboost-all-dev
      ```
    - [Install OpenCL Drivers On Ubuntu](https://support.zivid.com/en/v2.4/getting-started/software-installation/gpu/install-opencl-drivers-ubuntu.html)
      ```bash
      mkdir neo
      cd neo
      wget https://github.com/intel/compute-runtime/releases/download/19.07.12410/intel-gmmlib_18.4.1_amd64.deb
      wget https://github.com/intel/compute-runtime/releases/download/19.07.12410/intel-igc-core_18.50.1270_amd64.deb
      wget https://github.com/intel/compute-runtime/releases/download/19.07.12410/intel-igc-opencl_18.50.1270_amd64.deb
      wget https://github.com/intel/compute-runtime/releases/download/19.07.12410/intel-opencl_19.07.12410_amd64.deb
      wget https://github.com/intel/compute-runtime/releases/download/19.07.12410/intel-ocloc_19.07.12410_amd64.deb
      sudo dpkg -i *.deb

      # Check OpenCL driver
      /usr/bin/clinfo -l
      ```
      - error `$ dpkg -i intel-igc-opencl_18.50.1270_amd64.deb`
        ```
        dpkg: dependency problems prevent configuration of intel-igc-opencl:
         intel-igc-opencl depends on libtinfo5 (>= 6); however:
          Package libtinfo5 is not installed.
        dpkg: error processing package intel-igc-opencl (--install):
         dependency problems - leaving unconfigured
        Errors were encountered while processing:
         intel-igc-opencl
        ```

  ---

  ```
  pip uninstall lightgbm -y
  pip install lightgbm --install-option=--cuda
  ```
  ![](https://i.imgur.com/TMCdxC8.png)
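Before rebuilding LightGBM as above, it can be worth confirming whether the currently installed wheel supports GPU training at all. A minimal check, independent of AutoGluon, on a tiny random dataset:

```python
import numpy as np
import lightgbm as lgb

# Tiny random dataset just to exercise one boosting round.
X = np.random.rand(200, 5)
y = np.random.randint(0, 2, size=200)

try:
    lgb.train(
        {'objective': 'binary', 'device': 'gpu', 'verbose': -1},
        lgb.Dataset(X, label=y),
        num_boost_round=1,
    )
    print('LightGBM GPU training is available')
except lgb.basic.LightGBMError as err:
    print('No usable GPU build of LightGBM:', err)
```

If it fails, the installed wheel was built without GPU support and the rebuild steps above apply.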
<br>
<hr>
<br>

## Metrics

- ### [autogluon.tabular.TabularPredictor](https://auto.gluon.ai/stable/api/autogluon.predictor.html#module-0)
  - **Usage (example)**
    ```python
    TabularPredictor(
        ...
        eval_metric='accuracy'  # only one metric can be specified
    )
    ```
  - **Defaults**
    - binary classification: `accuracy`
    - multiclass classification: `accuracy`
    - regression: `root_mean_squared_error`
    - quantile: `pinball_loss`
  - **Classification**
    - `accuracy`
    - `balanced_accuracy`
    - `f1`
    - `f1_macro`
    - `f1_micro`
    - `f1_weighted`
    - `roc_auc`
    - `roc_auc_ovo_macro`
    - `average_precision`
    - `precision`
    - `precision_macro`
    - `precision_micro`
    - `precision_weighted`
    - `recall`
    - `recall_macro`
    - `recall_micro`
    - `recall_weighted`
    - `log_loss`
    - `pac_score`
  - **Regression**
    - `root_mean_squared_error`
    - `mean_squared_error`
    - `mean_absolute_error`
    - `median_absolute_error`
    - `mean_absolute_percentage_error`
    - `r2`
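If none of the metrics above fit, AutoGluon also accepts custom scorers as `eval_metric` (see tutorial 11, "Adding a Custom Metric", below). A sketch assuming the `make_scorer` helper from `autogluon.core.metrics`; the exact signature may differ between versions:

```python
from sklearn.metrics import matthews_corrcoef
from autogluon.core.metrics import make_scorer

# Wrap an sklearn metric so it can be passed as eval_metric (assumed helper; see the custom-metric tutorial).
mcc_scorer = make_scorer(
    name='mcc',
    score_func=matthews_corrcoef,
    optimum=1,
    greater_is_better=True,
)

predictor = TabularPredictor(label, eval_metric=mcc_scorer, path=path).fit(train_data)
```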
<br>
<hr>
<br>

## Models

### XGBoost

- Test: run XGBoost twice
  ```python
  %%time
  predictor = TabularPredictor(
      label,
      eval_metric='accuracy',
      path=path
  ).fit(
      train_data,
      hyperparameters={
          'XGB': [
              {'ag_args_fit': {'num_gpus': 1}},    # train with GPU (first run)
              {'ag_args_fit': {'num_gpus': 1}},    # train with GPU (second run)
          ]
      },
  )
  ```
  ```
  .
  ├── [   5]  __version__
  ├── [5.3K]  learner.pkl
  ├── [ 104]  models
  │   ├── [  48]  WeightedEnsemble_L2
  │   │   ├── [6.3K]  model.pkl
  │   │   └── [  59]  utils
  │   │       ├── [ 987]  model_template.pkl
  │   │       └── [ 578]  oof.pkl
  │   ├── [  50]  XGBoost            <---------------
  │   │   ├── [2.2K]  model.pkl
  │   │   └── [214K]  xgb.ubj
  │   ├── [  50]  XGBoost_2          <---------------
  │   │   ├── [2.2K]  model.pkl
  │   │   └── [214K]  xgb.ubj
  │   └── [2.3K]  trainer.pkl
  ├── [ 317]  predictor.pkl
  └── [  42]  utils
      ├── [  50]  attr
      │   ├── [  42]  XGBoost
      │   │   └── [ 440]  y_pred_proba_val.pkl
      │   └── [  42]  XGBoost_2
      │       └── [ 440]  y_pred_proba_val.pkl
      └── [  86]  data
          ├── [4.6K]  X.pkl
          ├── [1.7K]  X_val.pkl
          ├── [2.3K]  y.pkl
          └── [1.1K]  y_val.pkl

  10 directories, 17 files
  ```
- KNNRapidsModel [[GitHub](https://github.com/awslabs/autogluon/blob/master/tabular/src/autogluon/tabular/models/knn/knn_rapids_model.py)]
  - [Reference usage](https://github.com/h2oai/driverlessai-recipes/blob/master/models/algorithms/autogluon.py#L111)
  - Test code
    ```python
    from autogluon.tabular.models.knn.knn_rapids_model import KNNRapidsModel

    predictor = TabularPredictor(
        label,
        eval_metric='accuracy',
        path=path
    ).fit(
        train_data,
        hyperparameters={
            KNNRapidsModel: {}
        },
        ag_args_fit={'num_gpus': 1}
    )
    ```
  - error
    ```
    Fitting model: KNNRapidsModel ...
    Warning: Exception caused KNNRapidsModel to fail during training (ImportError)... Skipping this model.
    `import cuml` failed.
    Ensure that you have a GPU and CUDA installation, and then install RAPIDS. You will likely need to create a fresh conda environment based off of a RAPIDS install, and then install AutoGluon on it.
    RAPIDS is highly experimental within AutoGluon, and we recommend to only use RAPIDS if you are an advanced user / developer.
    Please refer to RAPIDS install instructions for more information: https://rapids.ai/start.html#get-rapids
    No base models to train on, skipping auxiliary stack level 2...
    ```
    ![](https://i.imgur.com/b9feDSq.png)
  - Environment setup
    ```
    conda create -n rapids-21.06 -c rapidsai -c nvidia -c conda-forge rapids=21.06 python=3.8 cudatoolkit=11.2
    conda activate rapids-21.06
    pip install --pre autogluon.tabular[all]
    ```
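To compare the two XGBoost runs and the `WeightedEnsemble_L2` saved in the directory tree further above, the leaderboard can be scored on held-out data. A sketch, assuming a `test_data` table that still contains the label column:

```python
# Rank XGBoost, XGBoost_2, and the weighted ensemble on held-out data.
lb = predictor.leaderboard(test_data)
print(lb[['model', 'score_test', 'score_val', 'pred_time_test', 'fit_time']])
```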
<br>
<hr>
<br>

## Doc

- Details
  > - Use cases
  >   - Using it in Kaggle competitions
  > - Preprocessing
  >   - How to handle data tables with text and categorical features?
  >   - How to handle data tables with image, text, numeric, and categorical features?
  >   - How to use the default feature engineering, and how to extend it?
  > - Training
  >   - How to use fit (and advanced usage)
  >   - How to build interpretable rule-based models?
  >   - How to accelerate training with GPUs
  >   - How to add a custom model (and advanced usage)?
  > - Prediction
  >   - How to predict multiple columns?
  > - Evaluation
  >   - How to add a custom metric?

### Intro

- [Tabular Prediction](https://auto.gluon.ai/stable/tutorials/tabular_prediction/index.html)
  - Claims that no data cleaning, feature engineering, hyperparameter optimization, or model selection is needed

### [1. Quick Start Using FIT](https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-quickstart.html)

> 5 min tutorial on fitting models with tabular datasets.
> A tutorial on using fit

- [TabularPredictor](https://auto.gluon.ai/0.1.0/api/autogluon.task.html#autogluon.tabular.TabularPredictor)
  - problem_type has four options
    - binary
    - multiclass
    - regression
    - None (inferred automatically from the label values)
- Preprocessing
  - Automatically handles missing values and rescales feature values
- Training
  - API
    ```python
    from autogluon.tabular import TabularPredictor
    predictor = TabularPredictor(label=<variable-name>).fit(train_data=<file-name>)
    ```
  - Automatically fits various model types, such as neural networks and tree ensembles
  - Automatically searches for the hyperparameters of each model that perform best on the validation set
  - `fit()` can run in parallel across multiple threads and machines (but does not support multiprocessing?)
  - time_limit (this is the total time limit)
    - `fit(train_data, time_limit=60)`
      Drawback of using a time limit:
      [![](https://i.imgur.com/ZWV77FT.png)](https://i.imgur.com/ZWV77FT.png)
      the later models never get a chance to run
- Inference
  - For classification, besides the predicted class, predicting class probabilities is also supported
  - [leaderboard](https://auto.gluon.ai/0.1.0/api/autogluon.task.html#autogluon.tabular.TabularPredictor.leaderboard)
    [![](https://i.imgur.com/knY7IL0.png)](https://i.imgur.com/knY7IL0.png)
    'score_test', 'pred_time_test', and 'pred_time_test_marginal'

### [2. In-depth FIT Tutorial](https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-indepth.html)

> In-depth tutorial on controlling various aspects of model fitting.
> An in-depth look at fit

- **Specifying model parameters**
  > - Configure hyperparameters of the NN (e.g., epochs) and of LightGBM
  > - Each hyperparameter has
  >   - a default value
  >   - and a range or list of candidates
  > - When HPO (hyperparameter optimization) is enabled, values are picked from that range or list (similar to the SmartML feature)
  ```python
  import autogluon.core as ag

  nn_options = {
      'num_epochs': 10,
      'learning_rate': ag.space.Real(1e-4, 1e-2, default=5e-4, log=True),
      'activation': ag.space.Categorical('relu', 'softrelu', 'tanh'),
      'dropout_prob': ag.space.Real(0.0, 0.5, default=0.1),
  }

  gbm_options = {  # example LightGBM search space (gbm_options was referenced but not defined in this note)
      'num_boost_round': 100,
      'num_leaves': ag.space.Int(lower=26, upper=66, default=36),
  }

  hyperparameters = {
      'GBM': gbm_options,
      'NN_TORCH': nn_options,
  }
  ```
- **Hyperparameter optimization**
  > - The number of HPO trials and the total run time can be configured (the exact stopping condition is still unclear to me)
  > - HPO also has an 'auto' option (on-going)
  ```python
  time_limit = 15*60  # train various models for up to ~15 minutes in total
  num_trials = 1000   # try at most 1000 different hyperparameter configurations for each type of model
  search_strategy = 'auto'  # tune hyperparameters using a random search routine with a local scheduler

  # HPO is not performed unless hyperparameter_tune_kwargs is specified
  hyperparameter_tune_kwargs = {
      'num_trials': num_trials,
      'scheduler': 'local',
      'searcher': search_strategy,
  }
  ```
- **Auxiliary metrics**
  - `evaluate(test_data, auxiliary_metrics=False)`
    - False: report only the specified eval_metric
    - True (default): report all applicable metrics
  - With all metrics:
    - accuracy: 0.8752175248234211
    - balanced_accuracy: 0.7985774242740231
    - mcc: 0.6384055943366135
    - roc_auc: 0.9292811684599376
    - f1: 0.7128386336866903
    - precision: 0.785158277114686
    - recall: 0.6527178602243313
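Putting the pieces of this tutorial section together, the option dictionaries and tuning settings defined above are passed to `fit()` roughly like this (a sketch reusing `label`, `path`, and `train_data` from earlier examples):

```python
predictor = TabularPredictor(label, eval_metric='accuracy', path=path).fit(
    train_data,
    time_limit=time_limit,
    hyperparameters=hyperparameters,                        # {'GBM': gbm_options, 'NN_TORCH': nn_options}
    hyperparameter_tune_kwargs=hyperparameter_tune_kwargs,  # HPO runs only when this is given
)
```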
### [3. Kaggle Tutorial](https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-kaggle.html)

> Using AutoGluon for Kaggle competitions with tabular data.
> How to use it in Kaggle competitions

### [4. Data Tables Containing Image, Text, and Tabular](https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-multimodal.html)

> Modeling data tables with image, text, numeric, and categorical features.
> How to handle data tables with image, text, numeric, and categorical features?

### [5. Data Tables Containing Text](https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-multimodal-text-others.html)

> Modeling data tables with text and numeric/categorical features.
> How to handle data tables with text features?

### [6. Interpretable rule-based modeling](https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-interpretability.html)

> Fitting interpretable models to data table for understanding data and predictions.
> How to build interpretable rule-based models

### [7. Training models with GPU support](https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-gpu.html)

> How to train models with GPU support.
> How to accelerate training with GPUs?

### [8. Multi-Label Prediction](https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-multilabel.html)

> How to predict multiple columns in a data table.
> How to predict multiple columns?

### [9. Adding a Custom Model](https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-custom-model.html)

> How to add a custom model to AutoGluon.
> How to add a custom model?

### [10. Adding a Custom Model (Advanced)](https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-custom-model-advanced.html)

> How to add a custom model to AutoGluon (Advanced).
> How to add a custom model (advanced)?

### [11. Adding a Custom Metric](https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-custom-metric.html)

> How to add a custom metric to AutoGluon.
> How to add a custom metric?

### [12. Feature Engineering](https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-feature-engineering.html)

> AutoGluon’s default feature engineering and how to extend it.
> How to use the default feature engineering, and how to extend it?

### [FAQ](https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-faq.html)

> Frequently asked questions about AutoGluon-Tabular.

### [Functionality Reference Implementation](https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-custom-model.html#functionality-reference-implementation)

<br>
<hr>
<br>

## conda experiment

```
conda create -n autogluon python=3.8
```

<br>
<hr>
<br>

## Inference

- Trained with GPU, but running inference on CPU
  - entrypoint
    /autogluon/tabular/predictor/predictor.py", line 1382, in predict
  - error
    RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
  - snapshot
    [![](https://i.imgur.com/ViUfBPc.png)](https://i.imgur.com/ViUfBPc.png)

<br>
<hr>
<br>

## References

- [AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data](https://arxiv.org/abs/2003.06505)
  > Linked from: [Model ensembling with stacking/bagging](https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-indepth.html#model-ensembling-with-stacking-bagging)
  - [PDF](https://arxiv.org/pdf/2003.06505.pdf)
- [藉助 NVIDIA GPU 和 RAPIDS 提升 AutoML 技術水準的速度達 10 倍之多](https://blogs.nvidia.com.tw/2022/03/29/advancing-the-state-of-the-art-in-automl-now-10x-faster-with-nvidia-gpus-and-rapids/) (Advancing the State of the Art in AutoML, Now 10x Faster with NVIDIA GPUs and RAPIDS)
  - AutoGluon is an AutoML framework whose algorithms differ from auto-sklearn's
  - How did AutoGluon beat 99% of human data science teams in a Kaggle prediction competition with three lines of code and no expert knowledge?
  - AutoGluon-Tabular, an AutoGluon API, trains highly accurate machine learning models on raw tabular datasets (e.g., CSV files) with only a few lines of Python.
  - AutoGluon is faster, more robust, and more accurate than TPOT, H2O, AutoWEKA, auto-sklearn, and Google AutoML Tables.
  - AutoGluon's claim: for typical AutoML, hyperparameter tuning helps relatively little once models are ensembled.
- [AutoGluon | 用三行代码战胜 90% 的模型](https://cloud.tencent.com/developer/article/1827841) (beat 90% of models with three lines of code)
- [AutoGluon: Deep Learning AutoML](https://towardsdatascience.com/autogluon-deep-learning-automl-5cdb4e2388ec#065f)
  - Adult Income Classification
- [[Day 22] 機器學習模型技巧 Stacking](https://ithelp.ithome.com.tw/articles/10250317) (ML model technique: stacking)
  ![](https://i.imgur.com/B9azygw.png)

<br>
<hr>
<br>

### iris test

- case 1
  ```python
  predictor = TabularPredictor(label=label, path=save_path).fit(train_data)
  ```
  ```
  ...
  Fitting model: LightGBMLarge ... (stuck for more than 1 hour)
  ```
- case 2: total training runtimes

  | Fitting model   | Training runtime   |
  | --------------- | ------------------ |
  | LightGBMLarge   | 4385.04s (73m 05s) |
  | XGBoost         | 1287.35s (21m 27s) |
  | LightGBM        | 1176.56s (19m 36s) |
  | LightGBMXT      | 768.25s (12m 48s)  |
  | NeuralNetTorch  | 327.37s (5m 27s)   |
  | NeuralNetFastAI | 211.36s (3m 31s)   |

<br>
<hr>
<br>

## :warning: Troubleshooting

- [[HackMD] NVIDIA / AutoGluon - Troubleshooting](https://hackmd.io/_d8x1bwnR1O9FDpDUtiq0g)
  - [Installation failure messages](https://hackmd.io/_d8x1bwnR1O9FDpDUtiq0g#%E5%AE%89%E8%A3%9D%E5%A4%B1%E6%95%97%E8%A8%8A%E6%81%AF)
  - [ml-sklearn:v1](https://hackmd.io/_d8x1bwnR1O9FDpDUtiq0g#ml-sklearnv1)
  - [ubuntu20.04-python3.8:latest](https://hackmd.io/_d8x1bwnR1O9FDpDUtiq0g#ubuntu2004-python38latest)
  - [notebook / RAPIDS-22.04](https://hackmd.io/_d8x1bwnR1O9FDpDUtiq0g#notebook--RAPIDS-2204)
  - [autogluon-core dependencies](https://hackmd.io/_d8x1bwnR1O9FDpDUtiq0g#autogluon-core-%E7%9B%B8%E4%BE%9D%E6%80%A7)