
[ASUS] mlflow
===
###### tags: `ML / sklearn`
###### tags: `ML`, `sklearn`, `python`, `mlflow`

[TOC]

<hr>

## [MLflow Model Registry](https://mlflow.org/docs/1.24.0/model-registry.html#)
### [Fetching an MLflow Model from the Model Registry](https://mlflow.org/docs/1.24.0/model-registry.html#fetching-an-mlflow-model-from-the-model-registry)
```python
import mlflow.sklearn

model_name = "sk-learn-random-forest-reg-model"
model_version = 1

# Load a specific registered version through the models:/ URI scheme.
model = mlflow.sklearn.load_model(
    model_uri=f"models:/{model_name}/{model_version}"
)

# `data` stands for your own feature matrix; it is not defined in this snippet.
model.predict(data)
```

## [MLflow Tracking](https://mlflow.org/docs/latest/tracking.html)
### [Backend Stores](https://mlflow.org/docs/latest/tracking.html#id13)
Use `--backend-store-uri` to configure the type of backend store. You specify:

## [Official] Python API
### [mlflow](https://www.mlflow.org/docs/latest/python_api/mlflow.html#module-mlflow)
```python
import mlflow

mlflow.start_run()
mlflow.log_param("my", "param")
mlflow.log_metric("score", 100)
mlflow.end_run()
```
### [mlflow.sklearn](https://www.mlflow.org/docs/latest/python_api/mlflow.sklearn.html)
### [mlflow.xgboost](https://www.mlflow.org/docs/latest/python_api/mlflow.xgboost.html)
- [xgboost release](https://github.com/dmlc/xgboost/releases)
    - 1.6.1
    - 1.5.2
### [mlflow.lightgbm](https://www.mlflow.org/docs/latest/python_api/mlflow.lightgbm.html)

### [MLflow Tracking / Logging Data to Runs](https://www.mlflow.org/docs/latest/tracking.html#logging-data-to-runs)
> - set_experiment()
> - start_run()
> - end_run()
> - log_param()
> - ...

- ### Records
    - Within the same source, everything is logged to the same run record until `end_run()` is called.

- ### Inferred behavior of the source field:
    - The filename from `__file__` is used as the source.
        - In a notebook cell this is usually `ipykernel_launcher.py`.
    - Different kernels also produce different memory addresses or hash codes, so even with the same source (`ipykernel_launcher.py`) they are treated as different records.

- ### Setting the experiment name
    `mlflow.set_experiment(os.environ["MLFLOW_EXPERIMENT_NAME"])`
    - This call can be omitted.
    - `log_xxx()` automatically picks up the `MLFLOW_EXPERIMENT_NAME` environment variable.
    - i.e.
      when the environment variables are all set, the calls below are enough to produce a record:
      ```
      import mlflow

      # Any log_xxx() call works here; log_param() is used as a stand-in.
      mlflow.log_param("my", "param")
      ```
      ![](https://i.imgur.com/PRXTxS8.png)
      ![](https://i.imgur.com/WAaYvtD.png)

- ### [mlflow.log_artifact()](https://www.mlflow.org/docs/latest/python_api/mlflow.html#mlflow.log_artifact) vs [mlflow.log_artifacts()](https://www.mlflow.org/docs/latest/python_api/mlflow.html#mlflow.log_artifacts)
    - ### doc
        - mlflow.log_artifact()
            > logs a local file or directory as an artifact, optionally taking an artifact_path to place it in within the run’s artifact URI. Run artifacts can be organized into directories, so you can place the artifact in a directory this way.
        - mlflow.log_artifacts()
            > logs all the files in a given directory as artifacts, again taking an optional artifact_path.
    - ### log_artifacts()
        > Copies all files and subdirectories under the source directory into the target directory.
        - `mlflow.log_artifacts("data", artifact_path="states")`
        - src
          ![](https://i.imgur.com/NMALHxm.png)
        - target
          ![](https://i.imgur.com/F64Wzln.png)
    - ### log_artifact()
        > Besides a single file, it can also log an entire directory (including the directory itself).
        - `mlflow.log_artifact("data", artifact_path="states")`
        - src
          ![](https://i.imgur.com/NMALHxm.png)
        - target
          ![](https://i.imgur.com/acJV5M6.png)

### [MLflow Tracking / Automatic Logging](https://www.mlflow.org/docs/latest/tracking.html#automatic-logging)

| Params logged? | XGBoost | LightGBM | RandomForest |
| ------------------ | ------- | -------- | ------------ |
| `mlflow.xgboost.autolog()` | v | | |
| `mlflow.lightgbm.autolog()` | | v | |
| `mlflow.sklearn.autolog()` | v | v | v |
| `mlflow.autolog()` | v | v | v |

- sklearn autologging adds a prefix to param names:
    - RandomForest gets the `randomforestclassifier` prefix
    - LightGBM gets the `lgbmclassifier` prefix
- Coverage of XGBoost params:
    - `mlflow.autolog()` > `mlflow.sklearn.autolog()`
      ![](https://i.imgur.com/A5cjh0q.png)
    - `mlflow.autolog()` > `mlflow.xgboost.autolog()`
    - `mlflow.sklearn.autolog()` & `mlflow.xgboost.autolog()`
        - each logs some params that the other does not
        - each misses some params that the other logs
    - `mlflow.autolog()` = `mlflow.sklearn.autolog()` + `mlflow.xgboost.autolog()`
- Conclusion
    1. `mlflow.autolog()` covers the widest set of params.
    2. Calling `mlflow.autolog()` alone is enough.

### [MLflow Models / Model Signature / How To Log Models With Signatures](https://www.mlflow.org/docs/latest/models.html#how-to-log-models-with-signatures)
```python=
import pandas as pd
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
import mlflow
import mlflow.sklearn
from mlflow.models.signature import infer_signature

iris = datasets.load_iris()
iris_train = pd.DataFrame(iris.data, columns=iris.feature_names)
clf = RandomForestClassifier(max_depth=7, random_state=0)
clf.fit(iris_train, iris.target)

# Infer the input/output schema from the training data and the predictions.
signature = infer_signature(iris_train, clf.predict(iris_train))
mlflow.sklearn.log_model(clf, "iris_rf", signature=signature)
```

<hr>

## [Official] Source code
### [mlflow.autolog](https://github.com/mlflow/mlflow/blob/master/mlflow/sklearn/__init__.py)
- https://github.com/mlflow/mlflow/blob/master/mlflow/sklearn/__init__.py#L1005
    ```=1013
    For post training metrics autologging, the metric key format is:
    "{metric_name}[-{call_index}]_{dataset_name}"
    ```
- [[#1510] patched_predict](https://github.com/mlflow/mlflow/blob/master/mlflow/sklearn/__init__.py#L1510)
    > Keyword source: `unknown_dataset` in the metric name
    > unknown_dataset -> register_prediction_input_dataset ->
- [[#1575] patched_model_score](https://github.com/mlflow/mlflow/blob/master/mlflow/sklearn/__init__.py#L1575)
    > Keyword source: `unknown_dataset` in the metric name
    > unknown_dataset -> register_prediction_input_dataset ->
- [[#1698] log_post_training_metrics](https://github.com/mlflow/mlflow/blob/master/mlflow/sklearn/__init__.py#L1698)
    ![](https://i.imgur.com/PvyfnlB.png)

### patch
- [[#555]`patch_function(call_original, *args, **kwargs)`](https://github.com/mlflow/mlflow/blob/master/mlflow/utils/autologging_utils/safety.py#L555)
    ![](https://i.imgur.com/eskNCZ3.png)

<hr>

## Starting the mlflow server
### file-based ?
```
$ mlflow server -h 0.0.0.0 -p 5000
```
- The working directory must contain an `mlflow` directory.
- Source of the command:
    - [[mlflow] Issue: Bug Report](https://github.com/mlflow/mlflow/issues/new?assignees=&labels=bug&template=bug_report_template.yaml&title=%5BBUG%5D)

### db-based ?
> [Scenario 3: MLflow on localhost with Tracking Server](https://mlflow.org/docs/1.24.0/tracking.html#scenario-3-mlflow-on-localhost-with-tracking-server)

- ### Step1: Start the mlflow server
    ```bash
    $ rm -rf mlruns
    $ rm mlruns.db
    $ mlflow server -h 0.0.0.0 -p 5000 \
        --backend-store-uri sqlite:///mlruns.db \
        --default-artifact-root ./mlruns
    ```
- ### Step2: Check that the web server renders a page
    Open http://10.78.26.241:35000/ (example).
- ### Step3: Log information to mlflow
    ```python
    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    mlflow.set_tracking_uri("sqlite:///mlruns.db")
    mlflow.set_tag('author', 'tj_tsai')

    iris = load_iris()
    sk_model = RandomForestClassifier()
    sk_model.fit(iris.data, iris.target)

    # log model params
    mlflow.log_param("criterion", sk_model.criterion)
    mlflow.log_param("ccp_alpha", sk_model.ccp_alpha)

    # log model
    mlflow.sklearn.log_model(sk_model, "sk_models")
    ```
    - Run information
        ```
        mlflow.active_run().to_dictionary()
        ```
        ![](https://i.imgur.com/ZARSWvK.png)
    - Contents of the mlruns directory
        ![](https://i.imgur.com/A1iNirp.png)
- ### Step4: Save the model
    - http://10.78.26.241:35000/
      ![](https://i.imgur.com/SQeeidR.png)
    - View the model
      ![](https://i.imgur.com/runrD80.png)
    - Save the model
      ![](https://i.imgur.com/XolxbHw.png)
    - Result after saving
      ![](https://i.imgur.com/SQVoWc9.png)
- ### Step5: Load the model
    - case1
        ```python
        import mlflow
        from mlflow.tracking import MlflowClient

        client = MlflowClient()
        client.get_model_version_download_uri('penguin', '1')
        ```
        - Reference
            - [get_model_version_download_uri(name: str, version: str)](https://mlflow.org/docs/latest/python_api/mlflow.client.html#mlflow.client.MlflowClient.get_model_version_download_uri)

        Result:
        ```
        './mlruns/0/ec33df4b41e04dc5bf49e9fabe0f147a/artifacts/sk_models'
        ```
    - case2
        ```python
        import mlflow.sklearn
        from sklearn.metrics import accuracy_score

        # `iris` is the dataset loaded in Step3.
        model_uri = 'models:/penguin/1'
        model2 = mlflow.sklearn.load_model(model_uri)

        y_true = iris.target
        y_pred = model2.predict(iris.data)
        accuracy_score(y_true, y_pred)
        ```
        Result:
        ```
        1.0
        ```

<hr>

## Simple test data 1
> [Source](https://www.mlflow.org/docs/latest/python_api/mlflow.sklearn.html)

- ### Step1
    ```python=
    import mlflow
    import os

    os.environ["AWS_ACCESS_KEY_ID"] = "*****"
    os.environ["AWS_SECRET_ACCESS_KEY"] = "*****"
    os.environ["MLFLOW_S3_ENDPOINT_URL"] = "https://cloudstorage.oneai.twcc.ai"
    os.environ["MLFLOW_TRACKING_URI"] = "https://mlflow-token:*****"
    os.environ["MLFLOW_EXPERIMENT_NAME"] = "tj-mlflow-0629"
    ```
- ### Step2
    ```python=
    mlflow.autolog()
    ```
- ### Step3
    ```python=
    run = mlflow.start_run()
    ```
    ![](https://i.imgur.com/MnDZov3.png)
- ### Step4
    ```python=
    import numpy as np
    from sklearn.linear_model import LinearRegression

    X = np.array([[1],[2],[3],[4]])
    y = np.array([2,4,6,8])
    model = LinearRegression()
    model.fit(X, y)
    ```
    [![](https://i.imgur.com/2YE8ls3.png)](https://i.imgur.com/2YE8ls3.png)
    [![](https://i.imgur.com/ZASbOXw.png)](https://i.imgur.com/ZASbOXw.png)
    [![](https://i.imgur.com/RjMCpIu.png)](https://i.imgur.com/RjMCpIu.png)
- ### Step5
    > https://www.mlflow.org/docs/latest/python_api/mlflow.sklearn.html#post-training-metrics
    > - For post training metrics autologging, the metric key format is:
    >   `{metric_name}[-{call_index}]_{dataset_name}`
    > - If the metric function is `model.score`, then "metric_name" is `{model_class_name}_score`
    > - If multiple calls are made to the same scikit-learn metric API, each subsequent call adds a "call_index" (starting from 2) to the metric key.
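
To make the quoted key format concrete, here is a minimal sketch in plain Python. `build_metric_key` is a hypothetical helper written for this note, not part of the MLflow API; it only mirrors the documented pattern `{metric_name}[-{call_index}]_{dataset_name}` and assumes the index part is omitted on the first call.

```python
from typing import Optional


def build_metric_key(
    metric_name: str,
    dataset_name: str = "unknown_dataset",
    call_index: Optional[int] = None,
) -> str:
    """Hypothetical helper (not an MLflow API): format a post-training metric key."""
    # The "-{call_index}" part is optional; per the quoted docs it starts from 2.
    index_part = f"-{call_index}" if call_index is not None and call_index >= 2 else ""
    return f"{metric_name}{index_part}_{dataset_name}"


# First call of model.score(X, y) on a LinearRegression model
print(build_metric_key("LinearRegression_score", "X"))
# Second call with the same input variable name
print(build_metric_key("LinearRegression_score", "X", call_index=2))
# Input passed as an inline expression, so no variable name is available
print(build_metric_key("LinearRegression_score"))
```

The three calls correspond to a first round, a second round, and the unnamed-input situation exercised in the cases that follow; the actual keys should still be confirmed in the MLflow UI.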

**Case1:**
```python=
X = [[5],[6],[7]]
y = [10,12,14]
model.score(X, y)
```
- **round-1**
  ![](https://i.imgur.com/pmxRsYR.png)
- **round-2**: call_index=2
  ![](https://i.imgur.com/fXHRTJP.png)
- **round-3**: call_index=3
  ![](https://i.imgur.com/EGS6THM.png)

---

**Case2:**
```python=
iris_X = [[5],[6],[7]]
iris_y = [10,12,14]
model.score(iris_X, iris_y)
```
- **round-1**
  ![](https://i.imgur.com/HvE9CAq.png)
- **round-2**: call_index=2
  ![](https://i.imgur.com/OVMCGHR.png)
- **round-3**: call_index=3
  ![](https://i.imgur.com/YVteks3.png)

---

**Case3:**
> If the "prediction input dataset" instance is an intermediate expression without a defined variable name, the dataset name is set to "**unknown_dataset**"

```python=
model.score([[5],[6],[7]], [10,12,14])
```
- **round-1**
  ![](https://i.imgur.com/eBRG9MB.png)
- **round-2**: call_index=2
  ![](https://i.imgur.com/ykPz5d2.png)
- **round-3**: call_index=3
  ![](https://i.imgur.com/Nhjiv18.png)

:::warning
:bulb: **"unknown_dataset"**
If a metric name contains "unknown_dataset", an array was most likely passed directly to `model.score([...])` or `model.predict([...])`.
:::

<hr>

## Simple test data 2
> [Source](https://www.mlflow.org/docs/latest/tracking.html#automatic-logging)

```python=
import mlflow
import os

os.environ["AWS_ACCESS_KEY_ID"] = "*****"
os.environ["AWS_SECRET_ACCESS_KEY"] = "*****"
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "https://cloudstorage.oneai.twcc.ai"
os.environ["MLFLOW_TRACKING_URI"] = "https://mlflow-token:*****"
os.environ["MLFLOW_EXPERIMENT_NAME"] = "tj-mlflow-0629"

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

mlflow.autolog()

db = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)

# Create and train models.
rf = RandomForestRegressor(n_estimators=100, max_depth=6, max_features=3)
rf.fit(X_train, y_train)

# Use the model to make predictions on the test dataset.
predictions = rf.predict(X_test)
autolog_run = mlflow.last_active_run()
```
- error: `boto3` must be installed
    ```
    2022/06/30 17:48:48 INFO mlflow.tracking.fluent: Autologging successfully enabled for sklearn.
    2022/06/30 17:48:48 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '5191d4672ab3473199d7a47bb5418590', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current sklearn workflow
    2022/06/30 17:48:50 WARNING mlflow.utils.autologging_utils: Encountered unexpected error during sklearn autologging: No module named 'boto3'
    ```
    ```
    pip install boto3
    ```
    - [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html): used to create, configure, and manage AWS services

```
from sklearn.metrics import r2_score

# Presumably run with the test split named test_X / test_y and
# pred_y = rf.predict(test_X); the variable name test_X becomes the dataset name.
r2_score(test_y, pred_y)
```
This automatically produces the metric `r2_score_test_X`.
![](https://i.imgur.com/oAoH4m1.png)

<hr>

## Reference: Cynthia's code
### diff
- [is_support_mlflow](http://10.78.26.44:30000/ai_maker_template/ml_ai_maker/-/commit/546d29950202eca9b0f27fef6e4e54aebe28626f)
- [add mlflow params and tags](http://10.78.26.44:30000/ai_maker_template/ml_ai_maker/-/commit/56043105fbbe852731474b0ccf9cf8a26eade9d2)
    > Do not call mlflow.set_experiment(mlflow_experiment)
- [log_model](http://10.78.26.44:30000/ai_maker_template/ml_ai_maker/-/commit/1901cdb321c241851f42be27b240e704a856f1ad)

### [utils / env_utils.py](http://10.78.26.44:30000/ai_maker_template/ml_ai_maker/-/blob/mlflow/workspace/utils/env_utils.py)
- [is_support_mlflow](http://10.78.26.44:30000/ai_maker_template/ml_ai_maker/-/blob/mlflow/workspace/utils/env_utils.py#L53)
- [get_mlflow_experiment](http://10.78.26.44:30000/ai_maker_template/ml_ai_maker/-/blob/mlflow/workspace/utils/env_utils.py#L58)

### [train.py](http://10.78.26.44:30000/ai_maker_template/ml_ai_maker/-/blob/mlflow/workspace/train.py)
- is_support_mlflow=env_utils.is_support_mlflow()

## ASUS
### Mlflow architecture
- [Mlflow Model on Model Management
  Service](https://hackmd.io/kJzzKT95Qv6VL80FmKdqrg?view)

### AI Maker documentation
- [AI Maker (Preview Edition)](https://docs.oneai.twcc.ai/s/3uxGFglX0)

<hr>

## [Issues](http://10.78.26.44:30000/UXQ/tws_aimaker/-/issues/)
- [[Image] Training image + template must support mlflow](http://10.78.26.44:30000/UXQ/tws_aimaker/-/issues/325)
- [Understanding SMTR error messages](https://hackmd.io/C0BidgsjQ8m4BOuBo1VEUw)
- ✗ Create MLflow experiment failed! some error msg
- [[Training Job] Preview-edition training jobs do not handle the set_experiment API correctly, causing the training job to fail](http://10.78.26.44:30000/UXQ/tws_aimaker/-/issues/362)
- [[fixed][Training Job] The Random algorithm keeps appending "pip install mlflow-asus-aimaker" to the command](http://10.78.26.44:30000/UXQ/tws_aimaker/-/issues/363)
- [[Portal] Entering special characters as a name locks up the UI](http://10.78.26.44:30000/UXQ/tws_aimaker/-/issues/331)
- [[Training Job] Checkboxes reset automatically, making it hard to select multiple training jobs](http://10.78.26.44:30000/UXQ/tws_aimaker/-/issues/346)
- [[Log_Model] Preview-edition training job hits an S3 error when using log_model](http://10.78.26.44:30000/UXQ/tws_aimaker/-/issues/367)
    > " warnings.warn(", "2022/07/13 11:09:46 WARNING mlflow.utils.autologging_utils: Encountered unexpected error during sklearn autologging: Failed to upload /tmp/tmp_ego8mh_/model/model.pkl to mlflow/60/2c01ede2c05b40fcb60d4b3411bf1324/artifacts/model/model.pkl: An error occurred (InternalServerError) when calling the CompleteMultipartUpload operation (reached max retries: 4): Unknown",

<hr>

## Portal
### env

| Variable | Value |
| -------- | ----- |
| AWS_ACCESS_KEY_ID | `*****` |
| AWS_SECRET_ACCESS_KEY | `*****` |
| MLFLOW_S3_ENDPOINT_URL | https://cloudstorage.oneai.twcc.ai |
| MLFLOW_TRACKING_URI | file-plugin:/mnt/mlflow |

### Search
- Entering a filter condition
    > INVALID_PARAMETER_VALUE: Invalid entity type 'Metrics'. Valid values are ['metric', 'parameter', 'tag', 'attribute']

## References
- [Python sklearn RandomForestClassifier non-reproducible results](https://stackoverflow.com/questions/47433920)
- [Automatic Logging](https://www.mlflow.org/docs/latest/tracking.html#automatic-logging)
- [15 Best Tools for ML Experiment Tracking and Management](https://neptune.ai/blog/best-ml-experiment-tracking-tools#overview)