[ASUS] mlflow
===
###### tags: `ML / sklearn`
###### tags: `ML`, `sklearn`, `python`, `mlflow`
<br>
[TOC]
<br>
<hr>
## [MLflow Model Registry](https://mlflow.org/docs/1.24.0/model-registry.html#)
### [Fetching an MLflow Model from the Model Registry](https://mlflow.org/docs/1.24.0/model-registry.html#fetching-an-mlflow-model-from-the-model-registry)
```python
import mlflow.sklearn
model_name = "sk-learn-random-forest-reg-model"
model_version = 1
model = mlflow.sklearn.load_model(
model_uri=f"models:/{model_name}/{model_version}"
)
model.predict(data)
```
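The `models:/` URI in the snippet above is the registered model name and a version joined by slashes. A minimal sketch of building and splitting such a URI with plain string handling; `build_models_uri` and `parse_models_uri` are hypothetical helpers written for these notes, not part of MLflow, and only the `models:/<name>/<version>` form shown above is handled:

```python
from typing import Tuple


def build_models_uri(name: str, version: int) -> str:
    """Build a URI of the form used above: models:/<name>/<version>."""
    return f"models:/{name}/{version}"


def parse_models_uri(uri: str) -> Tuple[str, str]:
    """Split a models:/<name>/<suffix> URI into (name, suffix)."""
    prefix = "models:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a models:/ URI: {uri!r}")
    # Split on the last slash so the suffix is the version part.
    name, sep, suffix = uri[len(prefix):].rpartition("/")
    if not sep or not name or not suffix:
        raise ValueError(f"expected models:/<name>/<version>, got {uri!r}")
    return name, suffix


uri = build_models_uri("sk-learn-random-forest-reg-model", 1)
print(uri)
print(parse_models_uri(uri))
```

The parsed version comes back as a string, which matches how the version is written inside the URI.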
<br>
## [MLflow Tracking](https://mlflow.org/docs/latest/tracking.html)
### [Backend Stores](https://mlflow.org/docs/latest/tracking.html#id13)
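Use `--backend-store-uri` to configure the type of backend store. You specify either a file store (a local path such as `./mlruns`) or a database-backed store as a SQLAlchemy URI (for example `sqlite:///mlruns.db`, used in the server setup later in these notes).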
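As a rough illustration of how the value passed to `--backend-store-uri` selects the store type, the sketch below classifies a URI by its scheme. `classify_backend_store` is a hypothetical helper, and the dialect list is an assumption based on the database engines MLflow documents (SQLite, MySQL, PostgreSQL, MSSQL); it is a simplified model, not MLflow's own parsing code.

```python
from urllib.parse import urlparse

# Assumption: SQLAlchemy dialects that indicate a database-backed store.
DATABASE_DIALECTS = {"sqlite", "mysql", "postgresql", "mssql"}


def classify_backend_store(uri: str) -> str:
    """Return "database", "file", or "other" for a backend store URI."""
    scheme = urlparse(uri).scheme
    dialect = scheme.split("+")[0]  # "mysql+pymysql" -> "mysql"
    if dialect in DATABASE_DIALECTS:
        return "database"
    if dialect in ("", "file"):  # bare paths have no scheme
        return "file"
    return "other"


for uri in ("./mlruns", "file:/tmp/mlruns", "sqlite:///mlruns.db"):
    print(uri, "->", classify_backend_store(uri))
```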
<br>
## [Official] Python API
### [mlflow](https://www.mlflow.org/docs/latest/python_api/mlflow.html#module-mlflow)
```python
import mlflow
mlflow.start_run()
mlflow.log_param("my", "param")
mlflow.log_metric("score", 100)
mlflow.end_run()
```
### [mlflow.sklearn](https://www.mlflow.org/docs/latest/python_api/mlflow.sklearn.html)
### [mlflow.xgboost](https://www.mlflow.org/docs/latest/python_api/mlflow.xgboost.html)
- [xgboost release](https://github.com/dmlc/xgboost/releases)
- 1.6.1
- 1.5.2
### [mlflow.lightgbm](https://www.mlflow.org/docs/latest/python_api/mlflow.lightgbm.html)
<br>
### [MLflow Tracking / Logging Data to Runs](https://www.mlflow.org/docs/latest/tracking.html#logging-data-to-runs)
> - set_experiment()
> - start_run()
> - end_run()
> - log_param()
> - ...
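- ### Records
- Within the same source, everything logged before `end_run()` is called counts as a single record
- ### Guessed behavior of `source`:
- The filename from `__file__` is taken as the source
- For a notebook cell this is usually `ipykernel_launcher.py`
- Different kernels also produce different memory addresses or hash codes,
so even with the same source = `ipykernel_launcher.py` they are treated as different records
- ### Setting the experiment name
`mlflow.set_experiment(os.environ["MLFLOW_EXPERIMENT_NAME"])`
- Calling it is optional
- `log_xxx()` automatically picks up the `MLFLOW_EXPERIMENT_NAME` environment variable and uses it
- i.e.
assuming the environment variables are all set, the following is enough to produce a record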
```
import mlflow
mlflow.log_xxx()
```
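The lookup order described in the notes above can be sketched as a small pure-Python function. `resolve_experiment_name` is a hypothetical helper that only models the behavior (an explicit `set_experiment()` name first, then the `MLFLOW_EXPERIMENT_NAME` environment variable); the final fallback to MLflow's built-in `Default` experiment is an assumption added here, not something the notes verified.

```python
import os
from typing import Mapping, Optional


def resolve_experiment_name(
    explicit_name: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Simplified model of which experiment a log_xxx() call ends up in."""
    if explicit_name:  # set_experiment() was called with this name
        return explicit_name
    from_env = env.get("MLFLOW_EXPERIMENT_NAME")
    if from_env:  # picked up automatically, per the notes above
        return from_env
    return "Default"  # assumption: MLflow's default experiment


print(resolve_experiment_name(env={"MLFLOW_EXPERIMENT_NAME": "tj-mlflow-0629"}))
print(resolve_experiment_name("manual-name", env={"MLFLOW_EXPERIMENT_NAME": "x"}))
print(resolve_experiment_name(env={}))
```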


- ### [mlflow.log_artifact()](https://www.mlflow.org/docs/latest/python_api/mlflow.html#mlflow.log_artifact) vs [mlflow.log_artifacts()](https://www.mlflow.org/docs/latest/python_api/mlflow.html#mlflow.log_artifacts)
- ### doc
- mlflow.log_artifact()
> logs a local file or directory as an artifact, optionally taking an artifact_path to place it in within the run’s artifact URI. Run artifacts can be organized into directories, so you can place the artifact in a directory this way.
- mlflow.log_artifacts()
> logs all the files in a given directory as artifacts, again taking an optional artifact_path.
- ### log_artifacts()
> Copies all files and directories under the source directory into the target directory
- `mlflow.log_artifacts("data", artifact_path="states")`
- src

- target

- ### log_artifact()
> Besides uploading a single file, it can also upload a whole directory (including the directory itself)
- `mlflow.log_artifact("data", artifact_path="states")`
- src

- target

<br>
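The layout difference between `log_artifacts()` and `log_artifact()` noted above can be reproduced with plain file copies. This is a sketch using only the standard library: `copy_like_log_artifacts` and `copy_like_log_artifact` are hypothetical stand-ins that mimic the observed behavior on a local directory, they do not call MLflow, and the `data` / `states` names simply reuse the ones from the notes.

```python
import shutil
import tempfile
from pathlib import Path


def copy_like_log_artifacts(local_dir: Path, artifact_root: Path, artifact_path: str) -> None:
    """Copy the *contents* of local_dir into <artifact_root>/<artifact_path>/."""
    shutil.copytree(local_dir, artifact_root / artifact_path, dirs_exist_ok=True)


def copy_like_log_artifact(local_path: Path, artifact_root: Path, artifact_path: str) -> None:
    """Copy local_path *itself* (file or directory) under <artifact_root>/<artifact_path>/."""
    dest_dir = artifact_root / artifact_path
    dest_dir.mkdir(parents=True, exist_ok=True)
    if local_path.is_dir():
        shutil.copytree(local_path, dest_dir / local_path.name, dirs_exist_ok=True)
    else:
        shutil.copy2(local_path, dest_dir / local_path.name)


def listing(root: Path) -> list:
    """Relative POSIX paths of all files under root, sorted."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    data = tmp / "data"
    (data / "sub").mkdir(parents=True)
    (data / "a.txt").write_text("a")
    (data / "sub" / "b.txt").write_text("b")

    plural_root, single_root = tmp / "run_plural", tmp / "run_single"
    copy_like_log_artifacts(data, plural_root, "states")
    copy_like_log_artifact(data, single_root, "states")

    plural_files = listing(plural_root)
    single_files = listing(single_root)

print(plural_files)  # contents of data/ directly under states/
print(single_files)  # data/ itself nested under states/
```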
### [MLflow Tracking / Automatic Logging](https://www.mlflow.org/docs/latest/tracking.html#automatic-logging)
| params logged? | XGBoost | LightGBM | RandomForest |
| ------------------ | ------- | -------- | ------------ |
| `mlflow.xgboost.autolog()` | v | | |
| `mlflow.lightgbm.autolog()` | | v | |
| `mlflow.sklearn.autolog()` | v | v | v |
| `mlflow.autolog()` | v | v | v |
- sklearn autologging adds a prefix to parameter names:
- RandomForest gets the prefix `randomforestclassifier`
- LightGBM gets the prefix `lgbmclassifier`
- Coverage of XGBoost params:
- `mlflow.autolog()` > `mlflow.sklearn.autolog()`

- `mlflow.autolog()` > `mlflow.xgboost.autolog()`
- `mlflow.sklearn.autolog()` & `mlflow.xgboost.autolog()`
- each logs some params the other does not
- each lacks some params the other logs
- `mlflow.autolog()`
= `mlflow.sklearn.autolog()` + `mlflow.xgboost.autolog()`
- Conclusion
1. `mlflow.autolog()` covers the widest set of params
2. Calling `mlflow.autolog()` alone is enough
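A comparison like the one above boils down to set operations on the parameter names of two runs. The sketch below shows one way to do that; `compare_param_keys` is a hypothetical helper, and the two dicts are placeholder values written for illustration, not actual autolog output.

```python
def compare_param_keys(a: dict, b: dict) -> dict:
    """Summarize which parameter names two runs share or lack."""
    keys_a, keys_b = set(a), set(b)
    return {
        "only_a": sorted(keys_a - keys_b),  # logged by A, missing in B
        "only_b": sorted(keys_b - keys_a),  # logged by B, missing in A
        "both": sorted(keys_a & keys_b),
        "union": sorted(keys_a | keys_b),
    }


# Placeholder parameter dicts standing in for two runs' params.
params_a = {"p_shared": "1", "p_only_a": "2"}
params_b = {"p_shared": "1", "p_only_b": "3"}

summary = compare_param_keys(params_a, params_b)
for name, keys in summary.items():
    print(name, keys)
```

With real runs, the two dicts could be taken from `mlflow.get_run(run_id).data.params` for each run being compared.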
<br>
### [MLflow Models / Model Signature / How To Log Models With Signatures](https://www.mlflow.org/docs/latest/models.html#how-to-log-models-with-signatures)
```python=
import pandas as pd
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
import mlflow
import mlflow.sklearn
from mlflow.models.signature import infer_signature
iris = datasets.load_iris()
iris_train = pd.DataFrame(iris.data, columns=iris.feature_names)
clf = RandomForestClassifier(max_depth=7, random_state=0)
clf.fit(iris_train, iris.target)
signature = infer_signature(iris_train, clf.predict(iris_train))
mlflow.sklearn.log_model(clf, "iris_rf", signature=signature)
```
<br>
<hr>
<br>
## [Official] Source code
### [mlflow.autolog](https://github.com/mlflow/mlflow/blob/master/mlflow/sklearn/__init__.py)
- https://github.com/mlflow/mlflow/blob/master/mlflow/sklearn/__init__.py#L1005
```=1013
For post training metrics autologging, the metric key format is:
"{metric_name}[-{call_index}]_{dataset_name}"
```
- [[#1510] patched_predict](https://github.com/mlflow/mlflow/blob/master/mlflow/sklearn/__init__.py#L1510)
> Keyword source: `unknown_dataset` in the metric name
> unknown_dataset -> register_prediction_input_dataset ->
- [[#1575] patched_model_score](https://github.com/mlflow/mlflow/blob/master/mlflow/sklearn/__init__.py#L1575)
> Keyword source: `unknown_dataset` in the metric name
> unknown_dataset -> register_prediction_input_dataset ->
- [[#1698] log_post_training_metrics](https://github.com/mlflow/mlflow/blob/master/mlflow/sklearn/__init__.py#L1698)

### patch
- [[#555]`patch_function(call_original, *args, **kwargs)`](https://github.com/mlflow/mlflow/blob/master/mlflow/utils/autologging_utils/safety.py#L555)

<br>
<hr>
<br>
## Starting the mlflow server
### file-based ?
```
$ mlflow server -h 0.0.0.0 -p 5000
```
- The working directory must contain an `mlflow` directory
- Source of the command:
- [[mlflow] Issue: Bug Report](https://github.com/mlflow/mlflow/issues/new?assignees=&labels=bug&template=bug_report_template.yaml&title=%5BBUG%5D)
### db-based ?
> [Scenario 3: MLflow on localhost with Tracking Server](https://mlflow.org/docs/1.24.0/tracking.html#scenario-3-mlflow-on-localhost-with-tracking-server)
- ### Step1: Start the mlflow server
```bash
$ rm -rf mlruns
$ rm -f mlruns.db
$ mlflow server -h 0.0.0.0 -p 5000 \
--backend-store-uri sqlite:///mlruns.db \
--default-artifact-root ./mlruns
```
- ### Step2: Check that the web server UI loads
Open http://10.78.26.241:35000/ (example)
- ### Step3: Log information to mlflow
```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
mlflow.set_tracking_uri("sqlite:///mlruns.db")
mlflow.set_tag('author', 'tj_tsai')
iris = load_iris()
sk_model = RandomForestClassifier()
sk_model.fit(iris.data, iris.target)
# log model params
mlflow.log_param("criterion", sk_model.criterion)
mlflow.log_param("ccp_alpha", sk_model.ccp_alpha)
# log model
mlflow.sklearn.log_model(sk_model, "sk_models")
```
- Run info
```
mlflow.active_run().to_dictionary()
```

- Under the `mlruns` directory

- ### Step4: Save the model
- http://10.78.26.241:35000/

- View the model

- Save the model

- Result after saving

- ### Step5: Load the model
- case1
```python
import mlflow
from mlflow.tracking import MlflowClient
client = MlflowClient()
client.get_model_version_download_uri('penguin', '1')
```
- Reference
- [get_model_version_download_uri(name: str, version: str)](https://mlflow.org/docs/latest/python_api/mlflow.client.html#mlflow.client.MlflowClient.get_model_version_download_uri)
Output:
```
'./mlruns/0/ec33df4b41e04dc5bf49e9fabe0f147a/artifacts/sk_models'
```
- case2
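```python
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

# Load version 1 of the registered model "penguin" and score it on iris
iris = load_iris()
model_uri = 'models:/penguin/1'
model2 = mlflow.sklearn.load_model(model_uri)
y_true = iris.target
y_pred = model2.predict(iris.data)
accuracy_score(y_true, y_pred)
```
Output: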
```
1.0
```
<br>
<hr>
<br>
## Simple test data 1
> [Source](https://www.mlflow.org/docs/latest/python_api/mlflow.sklearn.html)
- ### Step1
```python=
import mlflow
import os
os.environ["AWS_ACCESS_KEY_ID"] = "*****"
os.environ["AWS_SECRET_ACCESS_KEY"] = "*****"
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "https://cloudstorage.oneai.twcc.ai"
os.environ["MLFLOW_TRACKING_URI"] = "https://mlflow-token:*****"
os.environ["MLFLOW_EXPERIMENT_NAME"] = "tj-mlflow-0629"
```
- ### Step2
```python=
mlflow.autolog()
```
- ### Step3
```python=
run = mlflow.start_run()
```

- ### Step4
```python=
import numpy as np
from sklearn.linear_model import LinearRegression
X = np.array([[1],[2],[3],[4]])
y = np.array([2,4,6,8])
model = LinearRegression()
model.fit(X, y)
```
[](https://i.imgur.com/2YE8ls3.png)
[](https://i.imgur.com/ZASbOXw.png)
[](https://i.imgur.com/RjMCpIu.png)
- ### Step5
> https://www.mlflow.org/docs/latest/python_api/mlflow.sklearn.html#post-training-metrics
> - For post training metrics autologging, the metric key format is:
> `{metric_name}[-{call_index}]_{dataset_name}`
> - If the metric function is `model.score`, then "metric_name" is `{model_class_name}_score`
> - If multiple calls are made to the same scikit-learn metric API, each subsequent call adds a "call_index" (starting from 2) to the metric key.
**Case1:**
```python=
X = [[5],[6],[7]]
y = [10,12,14]
model.score(X, y)
```
- **round-1**

- **round-2**: call_index=2

- **round-3**: call_index=3

---
**Case2:**
```python=
iris_X = [[5],[6],[7]]
iris_y = [10,12,14]
model.score(iris_X, iris_y)
```
- **round-1**

- **round-2**: call_index=2

- **round-3**: call_index=3

---
**Case3:**
> If the "prediction input dataset" instance is an intermediate expression without a defined variable name, the dataset name is set to "**unknown_dataset**"
```python=
model.score([[5],[6],[7]], [10,12,14])
```
- **round-1**

- **round-2**: call_index=2

- **round-3**: call_index=3

:::warning
:bulb: **"unknown_dataset"**
If a metric name contains "unknown_dataset",
an array was most likely passed directly to `model.score([...])` or `model.predict([...])`
:::
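The key format quoted in Step5 can be mimicked with a few lines of Python to make the three cases easier to follow. `MetricKeyNamer` is a hypothetical toy class, not MLflow's implementation; it assumes the call counter is kept per metric name, which is one reading of "multiple calls are made to the same scikit-learn metric API".

```python
from collections import Counter
from typing import Optional


class MetricKeyNamer:
    """Toy model of the key format {metric_name}[-{call_index}]_{dataset_name}."""

    def __init__(self) -> None:
        self._calls = Counter()  # metric_name -> number of calls so far

    def key(self, metric_name: str, dataset_name: Optional[str] = None) -> str:
        self._calls[metric_name] += 1
        call_index = self._calls[metric_name]
        # The first call has no index; later calls get -2, -3, ...
        indexed = metric_name if call_index == 1 else f"{metric_name}-{call_index}"
        # No variable name available -> "unknown_dataset"
        return f"{indexed}_{dataset_name or 'unknown_dataset'}"


namer = MetricKeyNamer()
keys = [
    namer.key("LinearRegression_score", "X"),   # model.score(X, y)
    namer.key("LinearRegression_score", "X"),   # same call again
    namer.key("LinearRegression_score", None),  # model.score([[5],[6],[7]], ...)
]
print(keys)
```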
<br>
<hr>
<br>
## Simple test data 2
> [Source](https://www.mlflow.org/docs/latest/tracking.html#automatic-logging)
>
```python=
import mlflow
import os
os.environ["AWS_ACCESS_KEY_ID"] = "*****"
os.environ["AWS_SECRET_ACCESS_KEY"] = "*****"
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "https://cloudstorage.oneai.twcc.ai"
os.environ["MLFLOW_TRACKING_URI"] = "https://mlflow-token:*****"
os.environ["MLFLOW_EXPERIMENT_NAME"] = "tj-mlflow-0629"
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
mlflow.autolog()
db = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)
# Create and train models.
rf = RandomForestRegressor(n_estimators = 100, max_depth = 6, max_features = 3)
rf.fit(X_train, y_train)
# Use the model to make predictions on the test dataset.
predictions = rf.predict(X_test)
autolog_run = mlflow.last_active_run()
```
- error: `boto3` needs to be installed
```
2022/06/30 17:48:48
INFO mlflow.tracking.fluent: Autologging successfully enabled for sklearn.
2022/06/30 17:48:48
INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID
'5191d4672ab3473199d7a47bb5418590', which will track hyperparameters,
performance metrics, model artifacts, and lineage information for the
current sklearn workflow
2022/06/30 17:48:50
WARNING mlflow.utils.autologging_utils: Encountered unexpected error
during sklearn autologging: No module named 'boto3'
```
```
pip install boto3
```
- [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html): used to create, configure, and manage AWS services
```
from sklearn.metrics import r2_score
# assumes pred_y = model.predict(test_X) was computed beforehand
r2_score(test_y, pred_y)
```
This automatically produces the metric `r2_score_test_X`

<br>
<hr>
<br>
## Reference: Cynthia's code
### diff
- [is_support_mlflow](http://10.78.26.44:30000/ai_maker_template/ml_ai_maker/-/commit/546d29950202eca9b0f27fef6e4e54aebe28626f)
- [add mlflow params and tags](http://10.78.26.44:30000/ai_maker_template/ml_ai_maker/-/commit/56043105fbbe852731474b0ccf9cf8a26eade9d2)
> Do not call `mlflow.set_experiment(mlflow_experiment)`
- [log_model](http://10.78.26.44:30000/ai_maker_template/ml_ai_maker/-/commit/1901cdb321c241851f42be27b240e704a856f1ad)
### [utils / env_utils.py](http://10.78.26.44:30000/ai_maker_template/ml_ai_maker/-/blob/mlflow/workspace/utils/env_utils.py)
- [is_support_mlflow](http://10.78.26.44:30000/ai_maker_template/ml_ai_maker/-/blob/mlflow/workspace/utils/env_utils.py#L53)
- [get_mlflow_experiment](http://10.78.26.44:30000/ai_maker_template/ml_ai_maker/-/blob/mlflow/workspace/utils/env_utils.py#L58)
### [train.py](http://10.78.26.44:30000/ai_maker_template/ml_ai_maker/-/blob/mlflow/workspace/train.py)
- is_support_mlflow=env_utils.is_support_mlflow()
<br>
## ASUS
### MLflow architecture
- [Mlflow Model on Model Management Service](https://hackmd.io/kJzzKT95Qv6VL80FmKdqrg?view)
### AI Maker documentation
- [AI Maker (preview edition)](https://docs.oneai.twcc.ai/s/3uxGFglX0)
<br>
<hr>
<br>
## [Issues](http://10.78.26.44:30000/UXQ/tws_aimaker/-/issues/)
- [[Image] Training image + template must support mlflow](http://10.78.26.44:30000/UXQ/tws_aimaker/-/issues/325)
- [Understanding SMTR error messages](https://hackmd.io/C0BidgsjQ8m4BOuBo1VEUw)
- ✗ Create MLflow experiment failed! some error msg
- [[Training Job] Training jobs in the preview edition do not handle the set_experiment API correctly, causing the job to fail](http://10.78.26.44:30000/UXQ/tws_aimaker/-/issues/362)
- [[fixed][Training Job] The Random algorithm keeps appending "pip install mlflow-asus-aimaker" to the command](http://10.78.26.44:30000/UXQ/tws_aimaker/-/issues/363)
- [[Portal] Entering special characters as a name locks up the UI](http://10.78.26.44:30000/UXQ/tws_aimaker/-/issues/331)
- [[Training Job] Checkboxes reset automatically, making it hard to select multiple training jobs](http://10.78.26.44:30000/UXQ/tws_aimaker/-/issues/346)
- [[Log_Model] Preview-edition training jobs hit an s3 error when using log_model](http://10.78.26.44:30000/UXQ/tws_aimaker/-/issues/367)
> " warnings.warn(",
"2022/07/13 11:09:46 WARNING mlflow.utils.autologging_utils: Encountered unexpected error during sklearn autologging: Failed to upload /tmp/tmp_ego8mh_/model/model.pkl to mlflow/60/2c01ede2c05b40fcb60d4b3411bf1324/artifacts/model/model.pkl: An error occurred (InternalServerError) when calling the CompleteMultipartUpload operation (reached max retries: 4): Unknown",
<br>
<hr>
<br>
## Portal
### env
| Variable | Value |
| -------- | ----- |
| AWS_ACCESS_KEY_ID | ***** |
| AWS_SECRET_ACCESS_KEY | ***** |
| MLFLOW_S3_ENDPOINT_URL | https://cloudstorage.oneai.twcc.ai |
| MLFLOW_TRACKING_URI | file-plugin:/mnt/mlflow |
### Search
- Entering a search condition
> INVALID_PARAMETER_VALUE: Invalid entity type 'Metrics'. Valid values are ['metric', 'parameter', 'tag', 'attribute']
<br>
## References
- [Python sklearn RandomForestClassifier non-reproducible results](https://stackoverflow.com/questions/47433920)
- [Automatic Logging](https://www.mlflow.org/docs/latest/tracking.html#automatic-logging)
- [15 Best Tools for ML Experiment Tracking and Management](https://neptune.ai/blog/best-ml-experiment-tracking-tools#overview)