---

title: 'Lecture 2: What is a Machine Learning Platform'
tags: [2019-minicourse-submarine, Machine Learning]

---

# Lecture 2: What is a Machine Learning Platform

Part of the mini-course [Apache Submarine: Design and Implementation of a Machine Learning Platform](https://hackmd.io/@submarine/B17x8LhAH). Day 1, [Lecture 2](https://cloudera2008-my.sharepoint.com/:p:/g/personal/weichiu_cloudera2008_onmicrosoft_com/EWewCTOgLDhMrypPSKRxb-sBe5n85XGqequJfqWcHor2wg?e=D5fOrX).

* 1.5 hr
* Also known as machine learning infrastructure, AI infrastructure, Machine Learning Operations (MLOps)
* Why do you need a system at all? 
    * Reduce time/effort to develop ML product, simplify workflow
    * Repeatable process
    * Support a variety of ML frameworks, users
        * The field is booming: there are many ML algorithms and frameworks, and data scientists don’t standardize on any single one of them.
    * Easier to evaluate models
    * Easy to scale up/down and push to production
* Industrialization of AI / Productive ML / Productionizing ML...
* Worth noting: this is a new domain, and people are still figuring it out. That makes it an exciting area where best practices have yet to be established, but it also means that what I talk about today may become outdated soon.
* Why not just a notebook, like Jupyter?
    * Collaboration
    * [Interesting read: “I Don't Like Notebooks”](https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUhkKGvjtV-dkAIsUXP-AL4ffI/edit?usp=sharing)
        * In summary: hidden state, bad coding habits, poor modularity, poor reproducibility, and difficulty sharing across media
            * Reproducibility: e.g., results can depend on which Python library versions are installed
* Why not a data visualization/BI tool?
* Why not an ML framework?
    * ML Platform → ML Framework → ML Algorithms
        * ML Platform = supports one or more ML frameworks, plus a toolkit to support the ML workflow
        * ML Framework / ML library = an implementation of ML algorithms; supports one or more ML algorithms and one or more languages
* What is on the market
    * [13 frameworks for mastering machine learning](https://www.idginsiderpro.com/article/3026262/13-frameworks-for-mastering-machine-learning.html)
    * Open source: Submarine, MLflow, Kubeflow, TFX
        * MLflow
            * An open source platform for managing the end-to-end machine learning lifecycle.
            * Tracking experiments to record and compare parameters and results ([MLflow Tracking](https://mlflow.org/docs/latest/tracking.html#tracking)).
            * Packaging ML code in a reusable, reproducible form in order to share with other data scientists or transfer to production ([MLflow Projects](https://mlflow.org/docs/latest/projects.html#projects)).
            * Managing and deploying models from a variety of ML libraries to a variety of model serving and inference platforms ([MLflow Models](https://mlflow.org/docs/latest/models.html#models)).
        * Kubeflow
            * TBD
        * TFX
            * A machine learning platform based on TensorFlow
            * Paper: [https://ai.google/research/pubs/pub46484](https://ai.google/research/pubs/pub46484)
            * [Comparison with other ML pipelines](https://docs.google.com/document/d/1KK1aNsivo6Eyji4r71bLIdwS-hdLeVX0_HCkgQZYoFM/edit#heading=h.wdicv4gxymrz)
        * Others:
            * Apache Singa, Apache Marvin
        * Lyft Flyte
            * [https://flyte.org/](https://flyte.org/)
            * Combines machine learning and data processing into a platform.
        * Netflix Metaflow
            * [Open-Sourcing Metaflow, a Human-Centric Framework for Data Science](https://medium.com/netflix-techblog/open-sourcing-metaflow-a-human-centric-framework-for-data-science-fa72e04a5d9)
    * Commercial: Cloudera Data Science Workbench, SageMaker, Azure Machine Learning Studio, H2O.ai, SAS, RapidMiner, Dataiku, DataRobot, IBM DSX...
* Apache Submarine
    * Big data, large scale, distributed GPU training
    * [Proposal to spin off Submarine as a top-level project (TLP)](https://docs.google.com/document/d/1kE_f-r-ANh9qOeapdPwQPHhaJTS7IMiqDQAS8ESi4TA/edit)
    * Scope: algorithm development, model batch training, model incremental training, model online serving, and model management
    * Why Submarine: data is already stored in Hadoop clusters, so it is natural to run ML/DL jobs on the same cluster.
    * Big tech companies have already developed ML platforms (MLPs), but they are generally not open source. Submarine intends to be an open standard for machine learning platforms.
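
The experiment-tracking idea listed under MLflow above (record parameters and results, then compare runs), together with the notebook reproducibility concern about Python library versions, can be illustrated with a small sketch. This is a minimal, hypothetical in-memory tracker written with the Python standard library only; the names `Tracker`, `Run`, `log_run`, `best_run` and `capture_environment` are assumptions made for illustration and are not the API of MLflow, Submarine, or any other platform mentioned here. The training step is replaced by a toy scoring formula, so the numbers carry no meaning.

```python
import json
import platform
import uuid
from dataclasses import asdict, dataclass, field
from importlib import metadata


def capture_environment(packages):
    """Record the interpreter and package versions used by a run (hypothetical helper)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package not installed in this environment
    return {"python": platform.python_version(), "packages": versions}


@dataclass
class Run:
    """One training run: its inputs (params), results (metrics) and environment."""
    params: dict
    metrics: dict = field(default_factory=dict)
    environment: dict = field(default_factory=dict)
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class Tracker:
    """Hypothetical in-memory run store; a real platform would persist runs."""

    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics, packages=()):
        run = Run(params=dict(params), metrics=dict(metrics),
                  environment=capture_environment(packages))
        self.runs.append(run)
        return run

    def best_run(self, metric, higher_is_better=True):
        """Compare runs on one metric and return the best one."""
        candidates = [r for r in self.runs if metric in r.metrics]
        if not candidates:
            raise ValueError(f"no run recorded metric {metric!r}")
        pick = max if higher_is_better else min
        return pick(candidates, key=lambda r: r.metrics[metric])

    def to_json(self):
        """Export all runs so they can be shared or archived."""
        return json.dumps([asdict(r) for r in self.runs], indent=2)


# Usage: sweep one hyperparameter and compare the recorded runs.
tracker = Tracker()
for lr in (0.1, 0.01, 0.001):
    accuracy = 1.0 - abs(lr - 0.01)  # toy stand-in for a real training job
    tracker.log_run({"learning_rate": lr}, {"accuracy": accuracy}, packages=["numpy"])

best = tracker.best_run("accuracy")
print(best.params)  # {'learning_rate': 0.01}
```

A real platform would persist runs to a shared store and attach artifacts (models, datasets) to each run; keeping everything in memory here only keeps the sketch self-contained.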

### Related conferences

* USENIX Symposium on Networked Systems Design and Implementation (NSDI)
    * [https://www.usenix.org/conference/nsdi20](https://www.usenix.org/conference/nsdi20)
* ACM Symposium on Operating Systems Principles (SOSP)
    * [https://sosp19.rcs.uwaterloo.ca/](https://sosp19.rcs.uwaterloo.ca/)
* Machine Learning and Systems (MLSys)
    * [https://mlsys.org/](https://mlsys.org/)
* USENIX Conference on Operational Machine Learning (OpML)
    * [https://www.usenix.org/conference/opml19](https://www.usenix.org/conference/opml19)

![](https://lh6.googleusercontent.com/vDcTTWf93Lwy-oq0VUuYs4x3vt7IyW5-oVOwsYKOzBtHEZPX2dT7kDfxl9cjkvG59dnJ1VJ1ZiZoVOWFpwOsdWFAyPqiXaUJ5V1r7joZ)

### Submarine Architecture

![](https://lh3.googleusercontent.com/UmQWJYIa-xMGDDmRwqytvo4XXhy7SNqB6sFoKvYVvEYCfnHHSNcqUlo3QKVYjeyUS1kY1-xckaHcNGot8n57lu0Iq00fU8Y3NMTTS6O_lf60G7y4VnzgzNkiy-S6h9Upo_Sm5Jyn)

### TFX

![](https://lh5.googleusercontent.com/Il3NP1IF702hGGuqVlovne303Sh_JLhMVClUcBLOQOWZC0h0du43gPmXYzVpMr0f9-_IcRxLO0109tY65GYheJ0BTHb6cd3MKbr3h-AmG3SRoWdMHhXggSHq7WipLQF-wBmrKkMB)
