# Lecture 9: Model development/experiment/management
Part of the mini-course [Apache Submarine: Design and Implementation of a Machine Learning Platform](https://hackmd.io/@submarine/B17x8LhAH). Day 3, Lecture 9
* 1 hr
## Data catalog
* Record data and code in each step of a task execution
* The "middle platform" (「中台」) idea: memoization (Lyft) / task code and data snapshots (Netflix)
* In Lyft's Flyte, not every step is cached automatically, because a step may have side effects, so its result may change on re-execution. A "cache" annotation can be added to opt in.
* Lineage
* Data dependency is tracked
* Caching
* Reproducibility
* Containerization + caching (immutable snapshot of code and input data)
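The memoization idea above can be sketched as a cache keyed by a content hash of a task's code and its input data. This is a minimal in-process sketch with hypothetical names (`run_task`, `_digest`); real systems such as Flyte shard the cache across a service and, as noted above, make caching opt-in per step.

```python
import hashlib
import json

_cache = {}  # maps (code hash, input hash) -> cached result


def _digest(obj) -> str:
    # Hash a JSON-serializable object into a stable content key.
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def run_task(fn, inputs: dict, cacheable: bool = True):
    """Run fn(**inputs), memoizing on a snapshot of code + input.

    `cacheable=False` mirrors opt-in caching: steps with side effects
    should skip the cache and always re-execute.
    """
    key = (_digest(fn.__code__.co_code.hex()), _digest(inputs))
    if cacheable and key in _cache:
        return _cache[key]
    result = fn(**inputs)
    if cacheable:
        _cache[key] = result
    return result
```

Because the key covers both code and input, editing the task body or changing its inputs forces a re-run, while an identical re-execution is served from the cache, which is also what makes a pipeline run reproducible.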
## AutoML
### Experiment/training
* Hyperparameter tuning
* **Hyperparameter** for DL: learning rate, number of hidden units, convolution kernel width, and more.
* [Hyperparameter Tuning (Katib)](https://www.kubeflow.org/docs/components/hyperparameter-tuning/hyperparameter/)
* [Google Vizier: A Service for Black-Box Optimization](https://static.googleusercontent.com/media/research.google.com/ja//pubs/archive/bcb15507f4b52991a0783013df4222240e942381.pdf)
* [HyperOpt](https://docs.databricks.com/applications/machine-learning/automl/hyperopt/index.html)
* [Hyperopt: Distributed Asynchronous Hyper-parameter Optimization](https://hyperopt.github.io/hyperopt/)
* Bergstra, J., Yamins, D., Cox, D. D. (2013) Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures. To appear in Proc. of the 30th International Conference on Machine Learning (ICML 2013).
* [Understanding Hyperparameters Optimization in Deep Learning Models: Concepts and Tools](https://towardsdatascience.com/understanding-hyperparameters-optimization-in-deep-learning-models-concepts-and-tools-357002a3338a)
* Automatic tuning algorithms: grid search, random search, Bayesian optimization
* Commercial tools: AWS Sagemaker, Cloud ML, Azure ML, Comet.ml, [https://deepcognition.ai/](https://deepcognition.ai/), Weights & Biases
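As a minimal illustration of the automatic-tuning algorithms listed above, random search samples each hyperparameter from a given range and keeps the best trial. This is a pure-Python sketch (hypothetical `random_search` helper and toy loss); tools like Katib or Hyperopt add smarter samplers such as Bayesian optimization or TPE on top of the same loop.

```python
import random


def random_search(objective, space, n_trials=20, seed=0):
    """Minimize `objective` over hyperparameters drawn from `space`.

    `space` maps a parameter name to a (low, high) range; each trial
    samples one value per parameter uniformly from its range.
    """
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


# Example: tune learning rate and hidden units against a toy quadratic "loss"
# standing in for a real validation metric.
space = {"learning_rate": (1e-4, 1e-1), "hidden_units": (16, 256)}

def toy_loss(p):
    return (p["learning_rate"] - 0.01) ** 2 + (p["hidden_units"] - 128) ** 2 / 1e4

params, loss = random_search(toy_loss, space, n_trials=200)
```

Grid search would replace the sampling line with a cartesian product over fixed values; the rest of the loop is identical, which is why these strategies are usually pluggable in tuning services.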
* Parameter server
* Training very large models.
* [Scaling Distributed Machine Learning with the Parameter Server](https://www.cs.cmu.edu/~muli/file/parameter_server_osdi14.pdf)
* [CS 6453: Parameter Server](http://www.cs.cornell.edu/courses/cs6453/2017sp/slides/paramserver.pdf)
* [Parameter Server for Distributed Machine learning](https://medium.com/coinmonks/parameter-server-for-distributed-machine-learning-fd79d99f84c3)
* A parameter server is a key-value store used for training machine learning models on a cluster. The **values** are the parameters of a machine-learning model (e.g., a neural network). The **keys** index the model parameters.
* For example, in a movie **recommendation system**, there may be one key per user and one key per movie. For each user and movie, there are corresponding user-specific and movie-specific parameters. In a **language-modeling** application, words may act as keys and their embeddings may be the values. In its simplest form, a parameter server may implicitly have a single key and allow all of the parameters to be retrieved and updated at once.
* TODO: Parameter server/client pseudo code
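A minimal in-process sketch of the pull/push interface described above (hypothetical names; a real parameter server shards keys across machines and handles many workers pushing updates concurrently, often asynchronously):

```python
class ParameterServer:
    """Key-value store for model parameters: pull reads, push applies updates."""

    def __init__(self, init_params: dict):
        self.params = dict(init_params)

    def pull(self, keys):
        # Workers fetch current values for the keys they need.
        return {k: self.params[k] for k in keys}

    def push(self, grads: dict, lr: float = 0.1):
        # Workers send gradients; the server applies an SGD step.
        for k, g in grads.items():
            self.params[k] -= lr * g


def worker_step(ps, data):
    # One training step for a toy linear model y = w * x with squared loss.
    w = ps.pull(["w"])["w"]
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    ps.push({"w": grad})


ps = ParameterServer({"w": 0.0})
data = [(1.0, 2.0), (2.0, 4.0)]  # generated by the true model w = 2
for _ in range(50):
    worker_step(ps, data)
```

Each worker only ever touches the keys it needs, which is what makes the abstraction scale: in the recommendation example above, a worker processing one user's data pulls just that user's key plus the keys of the movies in its minibatch.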
* It has been argued that MapReduce is not a good abstraction for machine learning (Hidden Technical Debt in Machine Learning Systems) and that the parameter server may be a better one. The paper "Scaling Distributed Machine Learning with the Parameter Server" elaborates: MapReduce-based ML frameworks such as Mahout and MLI require iterative, synchronous communication, which is costly at scale. The TensorFlow paper further notes that MapReduce-style frameworks require immutable state and deterministic subcomputations so that failed tasks can be restarted.
* AngelML: a parameter server for large-scale machine learning
* [Angel-ML/angel](https://github.com/Angel-ML/angel)
* Runs on YARN and Kubernetes.
* [ray-project/ray](https://github.com/ray-project/ray)
* [TensorFlow: A System for Large-Scale Machine Learning (publications)](https://www.usenix.org/system/files/conference/osdi16/osdi16-abadi.pdf)
* [TensorFlow: A System for Large-Scale Machine Learning (slides)](http://www.cs.cornell.edu/courses/cs6453/2017sp/slides/tensorflow.pdf)
* TensorFlow adds the flexibility to offload arbitrary computation onto the devices that host the shared parameters.
* From the paper: "In TensorFlow there is no such thing as a parameter server. On a cluster, we deploy TensorFlow as a set of tasks (named processes that can communicate over a network) that each export the same graph execution API and contain one or more devices. Typically a subset of those tasks assumes the role that a parameter server plays in other systems [11, 14, 20, 49], and we therefore call them PS tasks; the others are worker tasks. However, since a PS task is capable of running arbitrary TensorFlow graphs, it is more flexible than a conventional parameter server: users can program it with the same scripting interface that they use to define models."
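A sketch of how such a deployment assigns roles: a TF1-style cluster spec names every task and designates some as `ps` and the rest as `worker`. The addresses below are made up; the dict is the shape TensorFlow's `tf.train.ClusterSpec` accepts, but the role lookup here is plain Python so the sketch runs without TensorFlow.

```python
# Job name -> list of task addresses. "ps" tasks hold shared parameters;
# "worker" tasks run training steps against them. All tasks run the same
# binary and learn their role from (job, index).
cluster_spec = {
    "ps":     ["ps0.example.com:2222", "ps1.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222",
               "worker2.example.com:2222"],
}


def address_of(job: str, index: int) -> str:
    """Return the address a given task should bind, validating its role."""
    if job not in cluster_spec or index >= len(cluster_spec[job]):
        raise ValueError(f"unknown task {job}:{index}")
    return cluster_spec[job][index]
```

The paper's point is that a `ps` task is not a special server binary: it runs the same graph-execution API as the workers, so the line between the two roles is just this spec.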
* Model Warm starting (TFX)
* Model quality vs. model freshness.
* Transfer learning
* For developing a new version of an existing model: initialize the parameters from the previous version of the model, which converges much faster.
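Warm starting can be sketched as copying the previous model's parameters into the new model wherever the shapes still match, leaving changed or new layers at their fresh initialization. The `warm_start` helper and the flat parameter dicts are hypothetical; frameworks such as TFX expose the same idea through a warm-start configuration on the trainer.

```python
def warm_start(new_params: dict, old_params: dict) -> dict:
    """Copy parameters from the old model into the new one where shapes match.

    Layers that changed shape (or did not exist before) keep their fresh
    initialization, so only compatible weights are transferred.
    """
    out = dict(new_params)
    for name, old in old_params.items():
        if name in out and len(out[name]) == len(old):
            out[name] = list(old)
    return out


# The new model added a layer ("extra") and resized "dense";
# only "embed" carries over from the previous version.
old = {"embed": [0.5, -0.2], "dense": [1.0, 1.0, 1.0]}
new = {"embed": [0.0, 0.0], "dense": [0.0] * 4, "extra": [0.0]}
warm = warm_start(new, old)
```

Transfer learning follows the same mechanics, except the "old" parameters come from a model trained on a different task rather than a previous version of the same one.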
* Machine learning governance
* [Creating an Open Standard: Machine Learning Governance using Apache Atlas](https://blog.cloudera.com/creating-an-open-standard-machine-learning-governance-using-apache-atlas/)
* [Tensorflow: ML Metadata](https://www.tensorflow.org/tfx/guide/mlmd)
* [Kubeflow Metadata](https://www.kubeflow.org/docs/components/misc/metadata/)
* A model is any self-contained package that takes features computed from input data and produces a prediction (or score) and metadata. Today, it is typically a Docker container with a Python module or Java library installed.
* [Uber Michelangelo: Model engineering is similar to software engineering](https://eng.uber.com/michelangelo/)
###### tags: `2019-minicourse-submarine` `Machine Learning`