<!--
Event: MLOps - from concept to product, bringing ideas to life

Sandra Meneses is a Machine Learning Engineer with experience in B2C product companies. She has deployed multiple end-to-end solutions bringing business value, and is currently freelancing and building a product that promotes self-education. She enjoys participating in data community activities and supporting open-source projects.

## Outline

This talk covers:

- What MLOps is and why it matters for reliable, evolvable data products.
- How best practices from other disciplines have enabled the growth of ML products in the market.
- A recommended approach to reach a high MLOps maturity level in an organization.
- Resources to help you decide whether MLOps is for you and how to learn it.

Finding a way to use your data to solve a problem is a great first step, but it must kick off a process that moves from a Proof of Concept (POC) to a feature or product. Products are meant to be used (obviously) by users who have expectations about their performance, reliability and usability. This process is guided by MLOps practices. In this talk, we will explore what that really means and how you can start applying these practices in real-world scenarios.
-->

<style>
.reveal h1 {
  font-size: 100px;
}
.reveal h2 {
  font-size: 75px;
  text-align: left;
}
.reveal h3 {
  font-size: 50px;
  text-align: left;
}
.reveal p {
  font-size: 40px;
}
.reveal tr {
  font-size: 28px;
}
.reveal ul, ol {
  font-size: 30px;
  display: block;
  text-align: left;
}
</style>

# MLOps - from concept to product :rocket:

#### Bringing ideas to life

---

# What is this talk about? 👩🏽‍🏫

- Overview of MLOps practices
- Answering these questions:
  - What is MLOps?
  - Why should we follow MLOps practices?
  - What do we need to do MLOps?
  - How to start following MLOps practices?

---

# What is MLOps?
- Process of using machine learning <span style="color:#eee8d5">**models**</span> as <span style="color:#eee8d5">**a useful, verifiable and evolvable product**</span>.
- MLOps is an <span style="color:#eee8d5">**infrastructure- and language-agnostic practice**</span>.
- Application of <span style="color:#eee8d5">**DevOps**</span> to the machine learning workflow.

---

# What is DevOps?

Practices to automate the software delivery lifecycle

**Why:** :arrow_up: Agility 🟰 Quality

:busts_in_silhouette: People + 🛞 Process + :wrench: Tools

<img src="https://software.af.mil/wp-content/uploads/2019/08/devops-loop.svg" height="250"/>

---

## Continuous Integration (CI) and Continuous Delivery (CD)

Practices to ensure software is ready to be deployed and deployment can be done automatically

- Test-driven development ➡️ Anyone can deploy
- Production-like staging environments

<img src="https://universaltechconsulting.com/wp-content/uploads/2022/05/ci-cd-diagram.png" height="250"/>

---

# ML Systems :robot_face:

---

# Machine learning lifecycle

<img src="https://i.imgur.com/Pokpvmi.png" alt="drawing" height="500"/>

---

# Data Team

<img src="https://i.imgur.com/puAvInl.png" height="450"/>

---

# Why are ML systems different?
- DATA
- Higher computing resources
- Hard to define and measure
- Poorly understood
- Team skills

---

# ML Challenges in the Dev process

<img src="https://miro.medium.com/v2/resize:fit:924/format:webp/1*qtxDxwn2ba3t47vVtcZ2ug.png"/>

---

# Experimenting

- Features
- Algorithms
- Hyperparameter tuning

<img style="float: right;" src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/MLOps-practices-experiments.jpeg?ssl=1" width="450"/>

---

## Reproducibility

- Why: deployment, debugging
- How:
  - Inputs: Data ( D ) + Code ( C ) + Parameters ( P )
  - Remote storage
  - Environments (containers)
  - Reduce non-deterministic behaviour (seeds)

---

## Tracking and Versioning

- Track inputs:
  - Data: Features
  - Code: Training and Prediction
  - Parameters: (Hyper)parameters (over time if variable)
- Track metrics:
  - Model training: Loss curve
  - System: Speed, RAM and CPU/GPU usage
  - Model performance: Which configuration (D + C + P) is the best?

---

## Git for Data Science

- Versioning:
  - Code
  - Data: [DVC](https://dvc.org) (checksum)
  - Model artifacts
- CI/CD:
  - Training pipelines: [CML](https://cml.dev/)
  - GitOps: Infrastructure as Code (IaC)

---

## Automated Testing

- Data validation:
  - Schema: Anomalies :arrow_right: Debugging
  - Values: Quality metrics :arrow_right: Retraining
- Model evaluation: model quality metrics
- Model validation: new model is better

<img src="https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/MLOps-practices-code.jpg?ssl=1" height="280"/>

---

## Deployment

- ML Training Pipeline
- Deploy model
- Deploy service

---

## Monitoring

<table>
<tr>
<th>
<ul>
<li>What</li>
<ul>
<li>End-to-end metrics: revenue</li>
<li>Model performance metrics</li>
<li>Integration: no predictions</li>
<li>Drift: Statistical metrics</li>
</ul>
<li>Alerts</li>
<ul>
<li>Jobs or scheduler activity</li>
<li>Usage: CPU/RAM, rpm per model</li>
</ul>
<li>It triggers ➡️</li>
<ul>
<li>Rolling back</li>
<li>(Auto) Scaling ↕️</li>
<li>Debugging and/or retraining</li>
<li>Data
pipeline fixes</li>
</ul>
</ul>
</th>
<th><img src="https://deepchecks.com/wp-content/uploads/2022/12/data-drift-my-model.webp" width="300"/></th>
</tr>
</table>

---

# MLOps Practices

---

# Data Management

- Feature stores: features for training and inference :arrow_up: Consistency :arrow_up: Reusability
  - Management: Metadata
  - Computation: Transformations (Normalization, Anonymization, Labelling)
- ML Metadata Store: :arrow_up: Reproducibility :arrow_up: Debugging :arrow_up: Data lineage
  - ML Pipeline execution details
  - Reference to artifacts and previous versions
  - Metrics in training and test data

---

# Model Management

- Model registry: collection of models :package:
- Why: Tracking and Deployment
- What:
  - Definition: author, type, version, stage
  - Reference to: D + C + P
  - Metrics and Artifacts

---

# Model Evaluation

- Why: Reliable system
  - Experimenting: Start simple and build up
  - Debugging: :arrow_up: Components :arrow_up: Time
- Tracking and Versioning
  - Data distribution and feature importance
  - Artifacts:
    - Logs
    - Training and model metrics
- ML Pipeline in Production

---

# Online ML System Validation

- Why:
  - Model has to perform better than the baseline or previous version
  - Models for different user clusters
  - Estimate retraining frequency
- How:
  - A/B test: Comparing variants with random routing
  - Bandits: Comparing variants with routing by model performance
  - Canary release: Rolling out to a percentage of users

---

# Responsible AI

- Fairness: does my product have any bias?
  - Correlation between input data and predictions within different user clusters
- Explainability: can I answer why my product behaves in a particular way?
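
---

## Fairness check: a sketch

A minimal illustration of the fairness check above, not from the talk: demographic parity compares positive-prediction rates across user clusters. Group labels and predictions here are hypothetical.

```python
# Compare positive-prediction rates across user clusters
# (demographic parity difference). All data below is hypothetical.

def demographic_parity_gap(groups, preds):
    """Max difference in positive-prediction rate between groups."""
    counts = {}
    for g, p in zip(groups, preds):
        total, positives = counts.get(g, (0, 0))
        counts[g] = (total + 1, positives + p)
    rates = {g: pos / total for g, (total, pos) in counts.items()}
    return max(rates.values()) - min(rates.values())

groups = ["a", "a", "a", "b", "b", "b"]
preds = [1, 1, 0, 1, 0, 0]  # binary model outputs
gap = demographic_parity_gap(groups, preds)  # a: 2/3, b: 1/3 -> gap 1/3
```

A gap close to zero suggests the model treats the clusters similarly on this metric; what counts as "too large" is a product decision.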
---

# Continuous Training (CT)

CD of ML

- Why:
  - Model metrics decay
  - New data is available
- Main challenges:
  - Fresh data
  - Evaluation
- Options to consider:
  - Batch vs Online training
  - Stateless vs Stateful training: from scratch or incremental
  - Mechanism to trigger training: scheduler, new data, performance decay, data drift

---

# MLOps Maturity Model

| Level | People | Process |
| ---------------------------- | ----------------- | ------- |
| 0 No MLOps | Disperse | No tracking, manual training and evaluation, limited monitoring |
| 1 DevOps but no MLOps | Disperse | Data pipelines, automatic tests and builds for the prediction service |
| 2 Automated Training | DS+DE | Tracked experiments, managed compute, training pipeline |
| 3 Automated Model Deployment | DS+DE / DE+SE | Model testing and (online) validation, CI/CD, automatic release |
| 4 Full MLOps | DS+DE+SE | Automatic retraining |

DS: Data Scientist, DE: Data Engineer, SE: Software Engineer

---

# Automated Pipeline

<img src="https://cloud.google.com/static/architecture/images/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning-4-ml-automation-ci-cd.svg" height="550"/>

---

# What did we learn? 👩🏽‍🏫

- What is MLOps?
- Why should we follow MLOps practices?
- What do we need to do MLOps?
- How to start following MLOps practices?
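
---

## Retraining trigger: a sketch

A minimal sketch of the Continuous Training trigger mechanisms covered earlier (scheduler, performance decay, data drift); thresholds and metric names are hypothetical, not from the talk.

```python
# Decide whether to retrain: trigger on performance decay, data drift,
# or a scheduled interval. All thresholds are hypothetical defaults.

def should_retrain(current_metric, baseline_metric, drift_score,
                   days_since_training, *,
                   decay_tolerance=0.05, drift_threshold=0.2, max_age_days=30):
    if baseline_metric - current_metric > decay_tolerance:
        return True  # performance decay
    if drift_score > drift_threshold:
        return True  # data drift detected
    if days_since_training >= max_age_days:
        return True  # scheduled retraining
    return False

# e.g. accuracy dropped from 0.90 to 0.80, well past the tolerance
should_retrain(0.80, 0.90, 0.05, 10)  # True
```

In practice this decision would sit in a monitoring job that kicks off the training pipeline, rather than retraining inline.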
---

# Books

<img src="https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcRTsaKuwGamDyhUPARjC0Q-lmIBfbPFLik8kfZW6YS3OrV5jmTH" width="400"/> <img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSRl5dz8AT9ZGjpwRqxtOPsLw4HubCpwGhMrzRDtrBg5EOd_SPX" width="400"/>

---

# Sources

- [MLOps Principles](https://ml-ops.org/content/mlops-principles)
- [Machine Learning operations maturity model](https://learn.microsoft.com/en-us/azure/architecture/example-scenario/mlops/mlops-maturity-model)
- [MLOps: Continuous delivery and automation pipelines in machine learning](https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning)
- [Awesome Production Machine Learning](https://github.com/EthicalML/awesome-production-machine-learning)

---

# Tools Review

![](https://i.imgur.com/kUKNF97.png)

<!--
# Tools to review

AI Platform
AzureML
SageMaker
-->