# Machine Learning System Architecture

### Components:
- infrastructure
- applications
- data
- documentation
- configuration

___

# Research environment

### ML Pipeline:
- Gathering data sources
- Data analysis
- Feature engineering: filling missing values, etc. (d)
- Feature selection: selecting the most predictive features (d)
- Model building: build several ML models and compare their performance (d)
- Business uplift evaluation of the model

```mermaid
graph LR
A[Gather Data] --> B[Data Analysis] --> C[Data Pre-processing] --> D[Variable Selection] --> E[Machine Learning Model Building] --> F[Evaluation]
```

### Feature Engineering: Variable Transformation
- missing data
- labels (categorical variables)
- distribution: a better spread of values may benefit performance
- outliers: values that are extremely low or high relative to the rest of the dataset

### Feature Selection: algorithms to find the best subset of features (the most predictive ones)
- enhances generalization by reducing overfitting
- models with fewer features are easier to deploy

___

# Model Deployment

**ML systems challenge** -> reproducibility

**Reproducibility** -> returning the same results given the same data, across systems (e.g. research vs. production)

### Deployment of ML Pipelines
```mermaid
graph LR
A[Raw Data] --> B(Feature Engineering) --> C(Model Training) --> D(Scoring) --> E(Prediction)
```

- **research environment** -> develop ML models (Jupyter, NumPy, Pandas, etc.)
- **production environment** -> deploy and serve ML models (Python, Docker, etc.)

##### Deployment of ML models:
In the research environment:
```mermaid
flowchart LR
Database[(Historical Data)] --> ML_Model[Machine Learning Model]
```

In the production environment:
```mermaid
flowchart LR
Live_Data[Live Data] --> ML_Model[Machine Learning Model]
```

##### ML Pipeline:
- the series of steps that need to occur from the moment we receive data to the moment we make a prediction
- created in the research environment
- we need to make it reproducible in production

### Key principles for ML systems:
- automate all stages of the ML workflow
- training is reproducible
- use version control
- testing ML models:
  - the full ML pipeline is integration tested
  - all input feature code is tested
  - model specification code is unit tested
  - model quality is validated before attempting to serve it
  - shadow and canary release processes
  - monitor model performance

### Architecture Approaches for ML Systems:
- serving ML models
  - formats:
    - embedded (predict on the fly)
    - dedicated model API (sketched below)
    - model published as data
    - offline predictions (batch; served predictions may be outdated)
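The feature-engineering step described above (missing data, labels, distribution, outliers) can be illustrated with a minimal pandas/scikit-learn sketch. The toy dataset, column names, and chosen transformers (median/mode imputation, one-hot dummies, a power transform, percentile capping) are illustrative assumptions, not a prescribed recipe.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import PowerTransformer

# Toy data standing in for the gathered data sources (hypothetical columns).
df = pd.DataFrame({
    "income": [42_000, np.nan, 55_000, 1_000_000, 38_000],  # missing value + outlier
    "city": ["london", "paris", "london", np.nan, "madrid"],  # categorical labels
})

# Missing data: fill numeric gaps with the median, categorical gaps with the mode.
df["income"] = SimpleImputer(strategy="median").fit_transform(df[["income"]]).ravel()
df["city"] = SimpleImputer(strategy="most_frequent").fit_transform(df[["city"]]).ravel()

# Labels: turn categories into numeric columns the model can consume.
df = df.join(pd.get_dummies(df["city"], prefix="city"))

# Distribution: a power transform gives skewed values a better spread.
df["income_transformed"] = PowerTransformer().fit_transform(df[["income"]]).ravel()

# Outliers: cap extreme values at the 1st/99th percentiles (winsorising).
low, high = df["income"].quantile([0.01, 0.99])
df["income_capped"] = df["income"].clip(low, high)

print(df)
```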
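Feature selection, as noted earlier, is about keeping only the most predictive subset of features. A minimal sketch using scikit-learn's univariate `SelectKBest` on a synthetic dataset; the choice of scoring function and `k` are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# 20 candidate features, of which only 5 are actually informative.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)

# Keep the 5 features with the strongest univariate relationship to the target.
selector = SelectKBest(score_func=f_classif, k=5)
X_selected = selector.fit_transform(X, y)

print("kept feature indices:", selector.get_support(indices=True))
print("shape before/after:", X.shape, X_selected.shape)
```

Fewer features generally means a smaller, simpler artifact to deploy and less feature code to test in production.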
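Reproducibility between the research and production environments is easier when the whole pipeline (feature engineering + model) is packaged as one versioned artifact. A minimal sketch: the dataset, file path, and model choice are assumptions; the point is the fixed `random_state` and the single persisted object that both environments load.

```python
import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Fixed random_state so training gives the same result across runs and environments.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", RandomForestClassifier(n_estimators=100, random_state=42)),
])
pipeline.fit(X, y)

# Research environment: version and persist the trained pipeline.
joblib.dump(pipeline, "model_v1.joblib")

# Production environment: load the identical artifact and score incoming data.
loaded = joblib.load("model_v1.joblib")
print(loaded.predict(X[:5]))
```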
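One of the key principles above is that model quality is validated before the model is allowed to serve predictions. A sketch of such a quality gate as a pytest-style test: the dataset, baseline, and threshold are illustrative assumptions, and in practice the comparison would run against the currently deployed model rather than a dummy baseline.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def test_model_quality_beats_baseline():
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
    candidate = make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ).fit(X_train, y_train)

    baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
    candidate_acc = accuracy_score(y_test, candidate.predict(X_test))

    # Fail the build if the candidate does not clearly improve on the baseline.
    assert candidate_acc > baseline_acc + 0.05
```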
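For the "dedicated model API" serving format, a minimal sketch using Flask: the pipeline artifact is loaded once at startup and exposed behind an HTTP endpoint. The artifact path, route, and JSON schema are assumptions for illustration.

```python
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
# Artifact produced and versioned in the research environment (assumed path).
model = joblib.load("model_v1.joblib")


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()  # e.g. {"features": [[5.1, 3.5, 1.4, 0.2]]}
    prediction = model.predict(payload["features"])
    return jsonify({"prediction": prediction.tolist()})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

The embedded format, by contrast, ships the same artifact inside the consuming application and calls `model.predict` in-process instead of over HTTP.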
___

# Reading resources:
- Feature Engineering:
  - [Feature Engineering for Machine Learning: A Comprehensive Overview](https://www.blog.trainindata.com/feature-engineering-for-machine-learning/)
  - [Best Resources to Learn about Feature Engineering for Machine Learning](https://trainindata.medium.com/best-resources-to-learn-feature-engineering-for-machine-learning-6b4af690bae7)
  - [Practical Code Implementation of Feature Engineering Techniques with Python](https://towardsdatascience.com/practical-code-implementations-of-feature-engineering-for-machine-learning-with-python-f13b953d4bcd)
  - [Resources to learn more about Machine Learning](https://trainindata.medium.com/find-out-the-best-resources-to-learn-machine-learning-cd560beec2b7)
- Technical debt in ML systems: https://papers.nips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf
- Monitoring ML models: https://christophergs.com/machine%20learning/2020/03/14/how-to-monitor-machine-learning-models/
- Integration testing: https://martinfowler.com/bliki/IntegrationTest.html
- Testing guide: https://www.martinfowler.com/testing/
- Shadow deployment: https://christophergs.com/machine%20learning/2019/03/30/deploying-machine-learning-applications-in-shadow-mode/
- Rubric for ML production systems: https://storage.googleapis.com/pub-tools-public-publication-data/pdf/45742.pdf
- Netflix architecture for recommendation systems: https://netflixtechblog.com/system-architectures-for-personalization-and-recommendation-e081aa94b5d8
- Site reliability engineering: https://sre.google/sre-book/table-of-contents/
- Repo: https://github.com/trainindata/deploying-machine-learning-models