# ML Platform Research Note

- [HackMD: LaTeX Syntax and Examples](https://hackmd.io/@sysprog/B1RwlM85Z?type=view)

## Paper

- [Clipper: A Low-Latency Online Prediction Serving System](https://www.usenix.org/system/files/conference/nsdi17/nsdi17-crankshaw.pdf)
- [TensorFlow-Serving: Flexible, High-Performance ML Serving](http://learningsys.org/nips17/assets/papers/paper_1.pdf)
- [A Case for Serverless Machine Learning](http://learningsys.org/nips18/assets/papers/101CameraReadySubmissioncirrus_nips_final2.pdf)
- [TicTac: Accelerating Distributed Deep Learning with Communication Scheduling](https://mlsys.org/Conferences/2019/doc/2019/199.pdf)
- [PipeMare: Asynchronous Pipeline Parallel DNN Training](https://arxiv.org/pdf/1910.05124.pdf)
- [TFX: A TensorFlow-Based Production-Scale Machine Learning Platform](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/b500d77bc4f518a1165c0ab43c8fac5d2948bc14.pdf)
- [Kubebench: A Benchmarking Platform for ML Workloads](https://alln-extcloud-storage.cisco.com/ciscoblogs/5c0fda3a560b9.pdf)
- [Hidden Technical Debt in Machine Learning Systems](https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf)
- [Deep CTR Prediction in Display Advertising](https://arxiv.org/pdf/1609.06018.pdf)
- [Large-Scale Machine Learning at Twitter](http://users.umiacs.umd.edu/~jimmylin/publications/Lin_Kolcz_SIGMOD2012.pdf)
- [Practical Lessons from Predicting Clicks on Ads at Facebook](https://research.fb.com/wp-content/uploads/2016/11/practical-lessons-from-predicting-clicks-on-ads-at-facebook.pdf)
- [On Challenges in Machine Learning Model Management](http://sites.computer.org/debull/A18dec/p5.pdf)
- [An Experimentation and Analytics Framework for Large-Scale AI Operations Platforms](https://www.usenix.org/system/files/opml20-paper-rausch_0.pdf)

## PPT

- [Accelerated Computing for AI](http://learningsys.org/nips18/assets/slides/Catanzaro_AI_Systems_Workshop_2018.pdf)
- [PipeDream: Generalized Pipeline Parallelism for DNN Training](https://sosp19.rcs.uwaterloo.ca/slides/narayanan.pdf)
- [MLOp Lifecycle Scheme for Vision-based Inspection Process in Manufacturing](https://www.usenix.org/sites/default/files/conference/protected-files/opml19_slides_lim.pdf)
- [[Lecture] SOFA Quick Start](https://docs.google.com/presentation/d/1fyNnLlU-0WMIddkI8hgYn0Tg1vbP9i7VuXSPIsXB2L4/edit#slide=id.g5c0adbf077_0_422)
- [Bighead: Airbnb’s End-to-End Machine Learning Infrastructure](https://static1.squarespace.com/static/53629df3e4b02e2dc6655a87/t/5d7bb27e4e663e641fc69c15/1568387721126/B147+-+Hoh%2C+Andrew.pdf)
- [Apache Hadoop Machine Learning Engine Submarine and Its Ecosystem (Liu Xun)](https://myslide.cn/slides/18398)
- [Scaling Deep Learning on Hadoop at LinkedIn](https://www.slideshare.net/ssuser72f42a/scaling-deep-learning-on-hadoop-at-linkedin)
- [MLeap: Productionize Data Science Workflows Using Spark](https://www.slideshare.net/JenAman/mleap-productionize-data-science-workflows-using-spark?next_slideshow=1)
- [Bighead: Airbnb’s End-to-End Machine Learning Infrastructure](https://cdn.oreillystatic.com/en/assets/1/event/278/Bighead_%20Airbnb_s%20end-to-end%20machine%20learning%20platform%20Presentation.pdf)
- [Machine Learning as Code and Kubernetes with Kubeflow](https://myslide.cn/slides/13049)
- [Kubebench: Benchmarking ML Workloads on Kubernetes](https://schd.ws/hosted_files/kccncchina2018english/17/Kubebench_KubeCon2018China.pdf)
- [ML Ops and Kubeflow Pipelines](https://www.usenix.org/sites/default/files/conference/protected-files/srecon19apac_slides_sato.pdf)
- [Building AI Platform Based on Kubernetes and TensorFlow](http://bos.itdks.com/a1d52ddb24d34f19a194c83a30ff6f43.pdf)
- [Apache Spark Model Deployment](https://www.slideshare.net/databricks/apache-spark-model-deployment)
- [What are the Unique Challenges and Opportunities in Systems for ML](https://www.slideshare.net/matei/what-are-the-unique-challenges-and-opportunities-in-systems-for-ml)
- [Kubeflow++ Building an Open Source Data Science Platform](https://events19.linuxfoundation.org/wp-content/uploads/2017/12/Kubeflow-Building-and-Operating-a-OSS-Data-Science-Platform-J%C3%B6rg-Schad-Mesosphere.pdf)
- [Zipline—Airbnb’s Declarative Feature Engineering Framework](https://www.slideshare.net/databricks/ziplineairbnbs-declarative-feature-engineering-framework)

## GitHub

- https://github.com/kanonjz/paper
- https://github.com/cortexlabs/cortex
- https://github.com/ucbrise/clipper
- https://github.com/tensorflow/serving
- https://github.com/Angel-ML/serving
- https://github.com/tensorflow/tensorrt
- https://github.com/tensorflow/tfx
- https://github.com/kubeflow/examples
- https://github.com/microsoft/nni
- https://github.com/apache/submarine
- https://github.com/tensorflow/cloud
- https://github.com/Netflix/metaflow
- https://github.com/quantumblacklabs/kedro
- https://github.com/HDI-Project/AutoBazaar
- https://github.com/awslabs/djl
- https://github.com/bentoml/BentoML

## Conference

List of Machine Learning and Deep Learning conferences in 2020:
https://tryolabs.com/blog/machine-learning-deep-learning-conferences/

- [Systems for ML 2018](http://learningsys.org/nips18/acceptedpapers.html)
- [SysML Conference 2019](https://mlsys.org/Conferences/2019/index.html#schedule)
- [aiconference](https://aiconference.london/)
- [SOSP 2019 Program](https://sosp19.rcs.uwaterloo.ca/program.html)
- [OpML '19 Conference Program](https://www.usenix.org/conference/opml19/program)
- [ScaledML 2019](http://scaledml.org/2019/index.html)
- [Workshop on AI Systems at SOSP 2019](http://learningsys.org/sosp19/)

## Talk

- [Bighead: Airbnb’s End-to-End Machine Learning Platform-1](https://databricks.com/session/bighead-airbnbs-end-to-end-machine-learning-platform)
- [Bighead: Airbnb’s End-to-End Machine Learning Platform-2](https://www.youtube.com/watch?v=UvcnoOrgyhE)
- [Zipline: Airbnb’s Machine Learning Data Management Platform](https://databricks.com/session/zipline-airbnbs-machine-learning-data-management-platform)
- [Machine Learning with TensorFlow and PyTorch on Apache Hadoop using Cloud Dataproc (Cloud Next '19)](https://www.youtube.com/watch?v=hr7_pG3yEOQ)
- [Benchmarking Machine Learning Workloads on Kubeflow - Xinyuan Huang, Cisco Systems, Inc. & Ce Gao](https://www.youtube.com/watch?v=9sLRIBYYUlQ)
- [Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda Tan, Hortonworks](https://www.youtube.com/watch?v=RlLfxa81hgo)
- [SFBigAnalytics_20200825: Apache Submarine: State of the union](https://www.youtube.com/watch?v=Zu_YxxmL6LU&ab_channel=SFBigAnalytics)
- [TensorFlow On Spark: Scalable TensorFlow Learning on Spark Clusters](https://databricks.com/session/tensorflow-on-spark-scalable-tensorflow-learning-on-spark-clusters)
- [Simplifying Model Management with MLflow - Matei Zaharia (Databricks) Corey Zumar (Databricks)](https://www.youtube.com/watch?v=MSUTaCBhD7A&ab_channel=Databricks)
- [Platform for Complete Machine Learning Lifecycle (mlflow)](https://pyvideo.org/pydata-miami-2019/platform-for-complete-machine-learning-lifecycle.html)
- [Building and Managing a Centralized Kubeflow Platform at Spotify - Keshi Dai & Ryan Clough, Spotify](https://www.youtube.com/watch?v=m9XhsnNSMAI&ab_channel=CNCF%5BCloudNativeComputingFoundation%5D)
- [Human-Centric Machine Learning Infrastructure @Netflix](https://www.youtube.com/watch?v=XV5VGddmP24&ab_channel=InfoQ)

## Blog

- [Meet Michelangelo: Uber’s Machine Learning Platform](https://eng.uber.com/michelangelo-machine-learning-platform/)
- [Twitter meets TensorFlow](https://blog.twitter.com/engineering/en_us/topics/insights/2018/twittertensorflow.html)
- [Using Deep Learning at Scale in Twitter’s Timelines](https://blog.twitter.com/engineering/en_us/topics/insights/2017/using-deep-learning-at-scale-in-twitters-timelines.html)
- [Introducing FBLearner Flow: Facebook’s AI backbone](https://engineering.fb.com/core-data/introducing-fblearner-flow-facebook-s-ai-backbone/)
- [A Tour of End-to-End Machine Learning Platforms](https://www.kdnuggets.com/2020/07/tour-end-to-end-machine-learning-platforms.html)
- [3 Common Technical Debts in Machine Learning and How to Avoid Them](https://towardsdatascience.com/3-common-technical-debts-in-machine-learning-and-how-to-avoid-them-17f1d7e8a428)
- [Implementing Apache Submarine — a unified AI Platform](https://medium.com/analytics-vidhya/implementing-apache-submarine-a-unified-ai-platform-459c9edd541e)

###### tags: `Research` `Note`