# Introduction to Federated Learning (FL)

<br>
<br>
<small>Neil John D. Ortega</small><br>
<small> <small>ML Engineer @ LINE Fukuoka</small><br> <small>2021/06/04</small> </small>

---

## Agenda

- What?
- Why?
- How does it work?
- How to use?
- Challenges
- Recap

---

## What is Federated Learning (FL)?

- ML setting where:<!-- .element: class="fragment" -->
  - Multiple clients collaborate in solving an ML problem<!-- .element: class="fragment" -->
  - Training is orchestrated by a central server<!-- .element: class="fragment" -->
  - Clients' data remain local and are never exchanged/transferred<!-- .element: class="fragment" -->
  - Clients' model updates are sent to the central server for aggregation<!-- .element: class="fragment" -->

----

## What is Federated Learning (FL)?

- Can be **cross-device** or **cross-silo**<!-- .element: class="fragment" -->
  - The main difference is scope: "cross-silo" FL trains models on siloed (i.e. organization-level) data, while "cross-device" FL trains across many edge devices<!-- .element: class="fragment" -->
- Highly interdisciplinary<!-- .element: class="fragment" -->
  - ML, distributed computing and optimization, cryptography, security, differential privacy, fairness, information theory, etc.<!-- .element: class="fragment" -->

---

## Why is FL relevant?

- Allows training of models without the need to centralize (usually private) data<!-- .element: class="fragment" -->
- Direct application of the following principles [1]:<!-- .element: class="fragment" -->
  - **Focused collection**: consumers have a right to limit the amount of personal data companies collect AND retain<!-- .element: class="fragment" -->
  - **Data minimization**: organizations should only collect personally identifiable information (PII) directly relevant to the task at hand AND retain it only for as long as necessary<!-- .element: class="fragment" -->
- **How can we protect the privacy of users while at the same time promoting innovation?**<!-- .element: class="fragment" -->

---

## How does FL work? - Model Lifecycle

![Model lifecycle in a federated learning setting](https://i.imgur.com/PXgvRsH.png)
<small><strong>Fig. 1.</strong> The model lifecycle and the involved actors in an FL setting [2]. Accessed 3 Jun 2021.</small>

----

## How does FL work? - Model Lifecycle

1. **Problem identification**: ML engineer identifies a problem where FL makes sense<!-- .element: class="fragment" -->
2. **Client instrumentation**: make sure that the client (e.g. an edge device) has everything it needs to perform local training<!-- .element: class="fragment" -->
3. **Simulation prototyping (optional)**: ML engineer may experiment with model architectures, hyperparameters, etc. in an FL simulation on a proxy dataset<!-- .element: class="fragment" -->

----

## How does FL work? - Model Lifecycle

4. **Federated training**: federated training tasks are started to train the model<!-- .element: class="fragment" -->
5. **(Federated) model evaluation**: after successful/sufficient training, the resulting models are evaluated based on metrics in either a centralized or federated manner (i.e. cross-validation but on held-out devices)<!-- .element: class="fragment" -->
6. **Deployment**: after evaluation, the final model goes through the standard model deployment process<!-- .element: class="fragment" -->

----

## How does FL work? - Federated training

![The typical federated training process](https://i.imgur.com/OjyIxZH.png =680x433)
<small><strong>Fig. 2.</strong> The typical federated training process [4]. Accessed 3 Jun 2021.</small>

----

## How does FL work? - Federated training

1. **Client selection**: server samples from all eligible clients<!-- .element: class="fragment" -->
2. **Broadcast**: the sampled clients download (a) the current model weights and (b) a training program from the server<!-- .element: class="fragment" -->
3. **Client computation**: each client computes a local update to the model by executing the training program<!-- .element: class="fragment" -->
4. **Aggregation**: the server collects the client updates<!-- .element: class="fragment" -->
5. **Model update**: the server updates the shared model based on the computed aggregate (see the sketch on the next slide)<!-- .element: class="fragment" -->
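----

## How does FL work? - One round in code

To make the loop concrete, here is a minimal single-process sketch of one federated round in plain Python/NumPy. It follows the five steps above but is not any particular framework's API; `local_update`, `fedavg_round`, and the linear-model toy data are hypothetical stand-ins.

```python
import numpy as np

def local_update(weights, client_data, lr=0.1, epochs=1):
    """Client computation: a few epochs of SGD on local data (toy linear model)."""
    w = weights.copy()
    for _ in range(epochs):
        for x, y in client_data:
            grad = (w @ x - y) * x  # gradient of 0.5 * (w.x - y)^2
            w -= lr * grad
    return w

def fedavg_round(weights, clients, rng, num_sampled=2):
    # 1. Client selection: sample from all eligible clients
    sampled = rng.choice(len(clients), size=num_sampled, replace=False)
    updates, sizes = [], []
    for i in sampled:
        # 2. Broadcast: the client receives the current global weights
        # 3. Client computation: local training on the client's own data
        updates.append(local_update(weights, clients[i]))
        sizes.append(len(clients[i]))
    # 4.-5. Aggregation + model update: data-size-weighted average (FedAvg)
    total = sum(sizes)
    return sum((n / total) * w for n, w in zip(sizes, updates))

# Toy run: 4 clients, each holding a few (x, y) samples; data never leaves a client
rng = np.random.default_rng(0)
clients = [[(rng.normal(size=3), rng.normal()) for _ in range(5)] for _ in range(4)]
weights = np.zeros(3)
for _ in range(10):
    weights = fedavg_round(weights, clients, rng)
```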
----

## How does FL work? - Federated training via `FedAvg` Algo

![`FedAvg` algorithm explained](https://i.imgur.com/xHkLnWv.png =821x439)
<small><strong>Fig. 3.</strong> The <code>FedAvg</code> algorithm - a concrete example of federated training [3][4]. Accessed 3 Jun 2021.</small>

----

## How does FL work? - Federated training

![Typical cross-device FL orders of magnitude](https://i.imgur.com/LGMTCqH.png)
<small><strong>Fig. 4.</strong> Typical order-of-magnitude sizes for cross-device FL applications [2]. Accessed 3 Jun 2021.</small>

---

## How to use FL? - Frameworks and Datasets

- Frameworks<!-- .element: class="fragment" -->
  - [TensorFlow Federated](https://github.com/tensorflow/federated) - mainly FL; has abstractions for aggregation, broadcast, and serialization of TF computations, and can potentially be used in production<!-- .element: class="fragment" -->
  - [PySyft](https://github.com/OpenMined/PySyft) - not specific to FL; also includes differential privacy and multi-party computation (MPC)<!-- .element: class="fragment" -->
  - 🌸 [**Flower**](https://github.com/adap/flower) - geared towards production environments, under active development 🔥<!-- .element: class="fragment" -->
  - ... and many more!<!-- .element: class="fragment" -->
- Datasets (for benchmarks, experiments)<!-- .element: class="fragment" -->
  - [LEAF](https://leaf.cmu.edu/) - compilation of FL-ready versions of well-known datasets such as MNIST (image classification), Shakespeare (next-character prediction), etc.<!-- .element: class="fragment" -->

----

## How to use FL? - 🌸 [**Flower**](https://github.com/adap/flower) Demo
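For reference, a minimal Flower client along the lines of the demo, adapted from Flower's TensorFlow quickstart (API as of mid-2021; names may differ in newer releases). The Keras model and the MNIST data are illustrative stand-ins for a client's local model and data.

```python
import flwr as fl
import tensorflow as tf

# Illustrative local data/model; a real client holds its own (often non-IID) partition
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile("sgd", "sparse_categorical_crossentropy", metrics=["accuracy"])

class MnistClient(fl.client.NumPyClient):
    def get_parameters(self):
        return model.get_weights()  # current local weights, when the server asks

    def fit(self, parameters, config):
        model.set_weights(parameters)  # start from the broadcast global weights
        model.fit(x_train, y_train, epochs=1, batch_size=32, verbose=0)
        return model.get_weights(), len(x_train), {}

    def evaluate(self, parameters, config):
        model.set_weights(parameters)  # federated evaluation on local held-out data
        loss, acc = model.evaluate(x_test, y_test, verbose=0)
        return loss, len(x_test), {"accuracy": acc}

# Server side (separate process): fl.server.start_server("[::]:8080", config={"num_rounds": 3})
fl.client.start_numpy_client("[::]:8080", client=MnistClient())
```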
---

## Challenges

- Handling non-IID (independent and identically distributed) data is still an open problem<!-- .element: class="fragment" -->
- Communication is a big bottleneck, prompting further research in communication efficiency and compression<!-- .element: class="fragment" -->
- Adapting techniques from the centralized setting (e.g. hyperparameter tuning, debugging, interpretability) to the FL setting is not straightforward<!-- .element: class="fragment" -->
- Expanding FL into other learning settings (e.g. semi-supervised, unsupervised, RL, etc.)<!-- .element: class="fragment" -->

----

## Challenges

- Integrating other privacy-preserving techniques (differential privacy, MPC, etc.) into the FL setting<!-- .element: class="fragment" -->
- Verifying that parties have faithfully executed the parts of a computation delegated to them (i.e. adversarial clients/servers)<!-- .element: class="fragment" -->
- Constant tension between improving robustness and preserving privacy<!-- .element: class="fragment" -->
- Ensuring fairness despite the lack of access to data<!-- .element: class="fragment" -->
- Systems engineering of the entire model lifecycle<!-- .element: class="fragment" -->
- Support for on-device training is still lacking<!-- .element: class="fragment" -->

---

## Recap

<style>
.reveal h1 {font-size: 2.0em !important;}
.reveal h2 {font-size: 1.28em !important;}
.reveal ul {font-size: 32px !important;}
.reveal ol strong, .reveal ul strong { color: #E26A6A !important; }
</style>

- Federated Learning (FL) is useful in settings where an ML-based solution is desired but data cannot be centralized<!-- .element: class="fragment" -->
- FL can address the privacy-vs-innovation problem<!-- .element: class="fragment" -->
- FL introduces some major changes to the entire ML model lifecycle, specifically at the model training step<!-- .element: class="fragment" -->
- Lots of existing frameworks for FL, but mainly for simulation - [**Flower**](https://github.com/adap/flower) intends to make FL available for production settings<!-- .element: class="fragment" -->
- Still lots of open problems in FL<!-- .element: class="fragment" -->

---

# Thank you! :nerd_face:

---

## References

<!-- .slide: data-id="references" -->

<style>
.reveal p {font-size: 20px !important;}
.reveal ul, .reveal ol { display: block !important; font-size: 30px !important; }
.reveal li {line-height: 1.4 !important;}
section[data-id="references"] p { text-align: center !important; }
</style>

[1] The White House. "[Consumer Data Privacy in a Networked World: A Framework for Protecting Privacy and Promoting Innovation in the Global Digital Economy](https://journalprivacyconfidentiality.org/index.php/jpc/article/view/623)." Journal of Privacy and Confidentiality 4 (2) (2013). https://doi.org/10.29012/jpc.v4i2.623.

[2] Kairouz, P. et al. "[Advances and Open Problems in Federated Learning](https://arxiv.org/abs/1912.04977)." arXiv:1912.04977 (2019).

[3] McMahan, H. B. et al. "[Communication-Efficient Learning of Deep Networks from Decentralized Data](https://arxiv.org/abs/1602.05629)." AISTATS (2017).

[4] Visengeriyeva, L. et al. "[Three Levels of ML Software](https://ml-ops.org/content/three-levels-of-ml-software.html)." INNOQ blog.
{"metaMigratedAt":"2023-06-17T05:03:17.346Z","metaMigratedFrom":"YAML","title":"Introduction to Federated Learning","breaks":true,"description":"View the slide with \"Slide Mode\".","slideOptions":"{\"spotlight\":{\"enabled\":false}}","contributors":"[]"}
    191 views