
[TOC]
# MLflow
MLflow is an open source platform for managing the **end-to-end** machine learning lifecycle.

## Why use MLflow?
1. Keep track of experiments
2. Reuse or reproduce code
3. Package and deploy models
4. Manage models
## MLflow Tracking
The MLflow Tracking component is an **API** and **UI** for logging parameters, code versions, metrics, and output files when running your machine learning code, and for later **visualizing** the results. MLflow Tracking lets you log and query experiments using the [Python](https://mlflow.org/docs/latest/python_api/index.html), [R](https://mlflow.org/docs/latest/R-api.html), [Java](https://mlflow.org/docs/latest/java_api/index.html), and [REST](https://mlflow.org/docs/latest/rest-api.html) APIs.

MLflow Tracking is organized around the concept of **runs**, which are executions of some piece of data science code.

Example:
```python
import mlflow
# Each run records its own parameters, metrics, and artifacts
with mlflow.start_run() as run:
    mlflow.log_param("my", "param")
    mlflow.log_metric("score", 100)
```
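After logging a few runs, you can browse and compare them in the tracking UI. A minimal way to launch it locally (it serves http://localhost:5000 by default):
```bash
# Start the tracking UI from the directory that contains ./mlruns
mlflow ui
```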
## MLflow Projects
An MLflow Project is a format for packaging data science code in a **reusable** and **reproducible** way, based primarily on conventions. In addition, the Projects component includes an **API** and **command-line** tools for running projects, making it possible to chain together projects into workflows.

Example:
```python
import mlflow
project_uri = "https://github.com/mlflow/mlflow-example"
params = {"alpha": 0.5, "l1_ratio": 0.01}
# Run MLflow project and create a reproducible conda environment
# on a local host
mlflow.run(project_uri, parameters=params)
```
```bash
# Run a project from a remote Git repository at a specific commit
mlflow run https://github.com/mlflow/mlflow-example -v 3c0711f8868232f17a9adbb69fb1186ec8a3c0c7 -b local -P alpha=0.5 -P l1_ratio=0.01
# Run a project from a local directory
mlflow run /home/u1353162/project -b local -P alpha=0.5 -P l1_ratio=0.01
```
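The conventions a project relies on are typically spelled out in an MLproject file at the repository root. A rough sketch of such a file, loosely modeled on the mlflow-example project (the entry-point command and parameter defaults here are illustrative):
```yaml
name: mlflow-example
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.5}
      l1_ratio: {type: float, default: 0.1}
    command: "python train.py {alpha} {l1_ratio}"
```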
## MLflow Models
An MLflow Model is a standard format for packaging machine learning models that can be used in a variety of downstream tools, for example real-time serving through a REST API or batch inference.

### Storage Format
* All of the flavors that a particular model supports are defined in its MLmodel file in YAML format
* mlflow.sklearn outputs models as follows:
```
model/
├── MLmodel
└── model.pkl
```
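For illustration, the MLmodel file of such a scikit-learn model might look roughly like this (the exact fields and versions vary with your environment):
```yaml
artifact_path: model
flavors:
  python_function:
    loader_module: mlflow.sklearn
    model_path: model.pkl
    python_version: 3.8.10
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.24.2
```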
### Built-In Model Flavors
* Python Function (python_function)
* R Function (crate)
* H2O (h2o)
* Keras (keras)
* MLeap (mleap)
* PyTorch (pytorch)
* Scikit-learn (sklearn)
* Spark MLlib (spark)
* TensorFlow (tensorflow)
* ONNX (onnx)
* MXNet Gluon (gluon)
* XGBoost (xgboost)
* LightGBM (lightgbm)
* spaCy (spacy)
* fastai (fastai)
* Statsmodels (statsmodels)
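As a minimal sketch of how flavors are used, the snippet below logs a scikit-learn model and loads it back through the generic python_function flavor (the toy data is purely illustrative):
```python
import mlflow
import mlflow.pyfunc
import mlflow.sklearn
import numpy as np
from sklearn.linear_model import LinearRegression

# Fit a trivial model on toy data
model = LinearRegression().fit(np.array([[0.0], [1.0]]), np.array([0.0, 1.0]))

# Logging writes the MLmodel file with both sklearn and python_function flavors
with mlflow.start_run() as run:
    mlflow.sklearn.log_model(model, "model")

# Any model with a python_function flavor can be loaded this way
loaded = mlflow.pyfunc.load_model(f"runs:/{run.info.run_id}/model")
print(loaded.predict(np.array([[2.0]])))
```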
### Deploy MLflow models
You can deploy an MLflow model locally, or generate a Docker image for it, using the CLI interface to the mlflow.models module.
#### Commands
* **serve** deploys the model as a local REST API server.
* **build_docker** packages a REST API endpoint serving the model as a Docker image.
* **predict** uses the model to generate a prediction for a local CSV or JSON file.
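For illustration, typical invocations might look like this (the run ID, image name, port, and file names are placeholders):
```bash
# Serve the model as a local REST API on port 5001
mlflow models serve -m runs:/<run-id>/model -p 5001

# Package the model as a Docker image that serves it over REST
mlflow models build-docker -m runs:/<run-id>/model -n my-model-image

# Generate predictions for a local CSV file
mlflow models predict -m runs:/<run-id>/model -i input.csv -t csv
```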
# MLflow User Guide
## Storing artifacts in S3
1. Set the credentials and endpoints as environment variables
```bash
export MLFLOW_S3_ENDPOINT_URL=https://cos.twcc.ai
export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
export MLFLOW_TRACKING_URI=https://ai-mkt.nchc.org.tw/mlflow/9a0737e7ee387f06338f6addd7b9ab51/
```
2. Install mlflow and boto3
```bash
pip install mlflow
pip install boto3
```
3. Start MLflow and configure the artifact store

Method 1: start an MLflow tracking server (the endpoint variables above must be set before starting the tracking server)
```bash
mlflow server --default-artifact-root s3://bucket --host IP --port PORT
```
Method 2: create an experiment (CLI)
```bash
mlflow experiments create -n experiment-name -l s3://bucket/path
```
Method 3: create an experiment (Python SDK)
```python
from mlflow.tracking import MlflowClient
# Create an experiment with a unique, case-sensitive name
# and an S3 artifact location
client = MlflowClient()
experiment_id = client.create_experiment("experiment name", "s3://bucket/path")
```
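Once the experiment exists, any run logged to it stores its artifacts under the configured S3 path; a minimal sketch (the parameter and metric values are illustrative):
```python
import mlflow

# Runs started with this experiment_id write artifacts to s3://bucket/path
with mlflow.start_run(experiment_id=experiment_id):
    mlflow.log_param("alpha", 0.5)
    mlflow.log_metric("rmse", 0.78)
```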
## Deploying a model
```bash
mlflow models serve -m s3://bucket/your/model/path --host IP --port PORT --no-conda
```
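Once the server is up, you can score it over REST. A sketch for MLflow 1.x, whose scoring server accepts pandas-split JSON at /invocations (the column names and values are placeholders, and the request schema changed in MLflow 2.x):
```bash
curl -X POST http://IP:PORT/invocations \
  -H 'Content-Type: application/json; format=pandas-split' \
  -d '{"columns": ["x1", "x2"], "data": [[0.5, 0.01]]}'
```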
{"metaMigratedAt":"2023-06-15T19:01:23.862Z","metaMigratedFrom":"YAML","title":"Mlflow","breaks":true,"contributors":"[{\"id\":\"0ef7395c-32c4-41e1-b0fe-911cf0fdf7a1\",\"add\":7521,\"del\":3360}]"}