# Lecture 2: What is a Machine Learning Platform

Part of the mini-course [Apache Submarine: Design and Implementation of a Machine Learning Platform](https://hackmd.io/@submarine/B17x8LhAH). Day 1, [Lecture 2](https://cloudera2008-my.sharepoint.com/:p:/g/personal/weichiu_cloudera2008_onmicrosoft_com/EWewCTOgLDhMrypPSKRxb-sBe5n85XGqequJfqWcHor2wg?e=D5fOrX)

* 1.5 hr
* Also known as machine learning infrastructure, AI infrastructure, Machine Learning Operations (MLOps)
* Why do you need a system at all?
  * Reduce the time/effort to develop an ML product, simplify the workflow
  * Repeatable process
  * Support a variety of ML frameworks and users
    * This is a booming field: there are many ML algorithms and frameworks, and data scientists don’t standardize on any of them.
  * Easier to evaluate models
  * Easy to scale up/down, push to production
  * Industrialization of AI / Productive ML / Productionizing ML...
  * Worth noting: this is a new domain, and people are still trying to figure it out. So this is an exciting area, with best practices yet to be established. But it also means that what I talk about today may become outdated very soon.
* Why not just a notebook, like Jupyter?
  * Collaboration
  * [Interesting read: “I Don't Like Notebooks”](https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUhkKGvjtV-dkAIsUXP-AL4ffI/edit?usp=sharing)
    * To summarize: hidden state, bad coding habits, and problems with modularity, reproducibility, and sharing across media
  * Reproducibility: Python library versions
* Why not a data visualization/BI tool?
* Why not an ML framework?
  * ML Platform → ML Framework → ML Algorithms
    * ML Platform = supports one or multiple ML frameworks + a toolkit to support the ML workflow
    * ML Framework / ML library = an implementation of ML algorithms; supports one or more ML algorithms and one or more languages
* What is there in the market
  * [13 frameworks for mastering machine learning](https://www.idginsiderpro.com/article/3026262/13-frameworks-for-mastering-machine-learning.html)
  * Open source: Submarine, MLflow, Kubeflow, TFX
    * MLflow
      * Open source platform for managing the end-to-end machine learning lifecycle.
      * Tracking experiments to record and compare parameters and results ([MLflow Tracking](https://mlflow.org/docs/latest/tracking.html#tracking)).
      * Packaging ML code in a reusable, reproducible form in order to share with other data scientists or transfer to production ([MLflow Projects](https://mlflow.org/docs/latest/projects.html#projects)).
      * Managing and deploying models from a variety of ML libraries to a variety of model serving and inference platforms ([MLflow Models](https://mlflow.org/docs/latest/models.html#models)).
    * Kubeflow
      * TBD
    * TFX
      * Machine learning platform based on TensorFlow
      * Paper: [https://ai.google/research/pubs/pub46484](https://ai.google/research/pubs/pub46484)
      * [Comparison with other ML pipelines](https://docs.google.com/document/d/1KK1aNsivo6Eyji4r71bLIdwS-hdLeVX0_HCkgQZYoFM/edit#heading=h.wdicv4gxymrz)
    * Others:
      * Apache SINGA, Apache Marvin
      * Lyft Flyte
        * [https://flyte.org/](https://flyte.org/)
        * Combines machine learning and data processing into a platform.
      * Netflix Metaflow
        * [Open-Sourcing Metaflow, a Human-Centric Framework for Data Science](https://medium.com/netflix-techblog/open-sourcing-metaflow-a-human-centric-framework-for-data-science-fa72e04a5d9)
  * Commercial: Cloudera Data Science Workbench, SageMaker, Azure Machine Learning Studio, H2O.ai, SAS, RapidMiner, Dataiku, DataRobot, IBM DSX...
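
To make the idea of experiment tracking (recording and comparing parameters and results) more concrete, here is a minimal sketch using only the Python standard library. It is not MLflow's API: the `RunTracker` class and its `log_run`, `load_runs`, and `best_run` methods are hypothetical names invented for this illustration, and the one-JSON-file-per-run layout is an assumption, not how any of the platforms above store their data.

```python
import json
import platform
import tempfile
import time
import uuid
from pathlib import Path


class RunTracker:
    """Toy experiment tracker (hypothetical; not MLflow's API).

    Each run is stored as one JSON file under `root`.
    """

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def log_run(self, params, metrics):
        """Persist one run's parameters and metrics; return its run ID."""
        run_id = uuid.uuid4().hex
        record = {
            "run_id": run_id,
            "time": time.time(),
            # Recording the interpreter version helps with reproducibility.
            "python": platform.python_version(),
            "params": params,
            "metrics": metrics,
        }
        (self.root / f"{run_id}.json").write_text(json.dumps(record))
        return run_id

    def load_runs(self):
        """Read back every recorded run."""
        return [json.loads(p.read_text()) for p in self.root.glob("*.json")]

    def best_run(self, metric, higher_is_better=True):
        """Compare runs on one metric and return the best record, or None."""
        runs = [r for r in self.load_runs() if metric in r["metrics"]]
        if not runs:
            return None
        pick = max if higher_is_better else min
        return pick(runs, key=lambda r: r["metrics"][metric])


def demo():
    """Log two runs with made-up numbers and return the better one."""
    with tempfile.TemporaryDirectory() as tmp:
        tracker = RunTracker(tmp)
        tracker.log_run({"learning_rate": 0.1}, {"accuracy": 0.81})
        tracker.log_run({"learning_rate": 0.01}, {"accuracy": 0.88})
        best = tracker.best_run("accuracy")
        return best["params"], best["metrics"]
```

Calling `demo()` logs two runs and returns the parameters and metrics of the run with the higher accuracy. A real platform adds what this sketch leaves out: a shared server, a UI for comparison, artifact storage, and access control.
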
* Apache Submarine
  * Big data, large scale, distributed GPU training
  * [Submarine project spin-off to TLP proposal](https://docs.google.com/document/d/1kE_f-r-ANh9qOeapdPwQPHhaJTS7IMiqDQAS8ESi4TA/edit)
    * Algorithm development, model batch training, model incremental training, model online services, and model management
  * Why Submarine: the data is already stored in a Hadoop cluster, so it is natural to run ML/DL jobs on that cluster.
  * Big tech companies have already developed their own ML platforms (MLPs); however, these are generally not open source. Submarine intends to be the open standard for machine learning platforms.

### Related conferences

* USENIX Symposium on Networked Systems Design and Implementation (NSDI)
  * [https://www.usenix.org/conference/nsdi20](https://www.usenix.org/conference/nsdi20)
* ACM Symposium on Operating Systems Principles (SOSP)
  * [https://sosp19.rcs.uwaterloo.ca/](https://sosp19.rcs.uwaterloo.ca/)
* Machine Learning and Systems (MLSys)
  * [https://mlsys.org/](https://mlsys.org/)
* USENIX Conference on Operational Machine Learning (OpML)
  * [https://www.usenix.org/conference/opml19](https://www.usenix.org/conference/opml19)

![](https://lh6.googleusercontent.com/vDcTTWf93Lwy-oq0VUuYs4x3vt7IyW5-oVOwsYKOzBtHEZPX2dT7kDfxl9cjkvG59dnJ1VJ1ZiZoVOWFpwOsdWFAyPqiXaUJ5V1r7joZ)

### Submarine Architecture

![](https://lh3.googleusercontent.com/UmQWJYIa-xMGDDmRwqytvo4XXhy7SNqB6sFoKvYVvEYCfnHHSNcqUlo3QKVYjeyUS1kY1-xckaHcNGot8n57lu0Iq00fU8Y3NMTTS6O_lf60G7y4VnzgzNkiy-S6h9Upo_Sm5Jyn)

### TFX

![](https://lh5.googleusercontent.com/Il3NP1IF702hGGuqVlovne303Sh_JLhMVClUcBLOQOWZC0h0du43gPmXYzVpMr0f9-_IcRxLO0109tY65GYheJ0BTHb6cd3MKbr3h-AmG3SRoWdMHhXggSHq7WipLQF-wBmrKkMB)

###### tags: `2019-minicourse-submarine` `Machine Learning`