# RIVM Federated Learning

![](https://i.imgur.com/fWEw7u1.png)

# Some Definitions

## Federated Learning

The term federated learning was introduced in 2016 by McMahan et al.

> "*We term our approach Federated Learning, since the learning task is solved by a loose federation of participating devices (which we refer to as clients) which are coordinated by a central server.*"

Federated learning is a machine learning setting where multiple entities (clients) collaborate in solving a machine learning problem, under the coordination of a central server or service provider. Each client's raw data is stored locally and not exchanged or transferred; instead, focused updates intended for immediate aggregation are used to achieve the learning objective.

> For example, the Gboard mobile keyboard.

### Cross-Silo Federated Learning

The cross-silo setting is relevant where a number of companies or organizations share an incentive to train a model based on all of their data, but cannot share their data directly.

### Horizontal Federated Learning

Horizontal federated learning, or sample-based federated learning, applies to scenarios in which data sets share the same feature space but differ in samples. For example, two regional banks may have very different user groups from their respective regions, so the intersection of their user sets is very small. However, their business is very similar, so the feature spaces are the same. [REF](https://arxiv.org/pdf/1902.04885.pdf)

![](https://i.imgur.com/9D6axKN.png)

### Vertical Federated Learning

Vertical federated learning, or feature-based federated learning, is applicable to cases where two data sets share the same sample ID space but differ in feature space. For example, consider two different companies in the same city: one is a bank, and the other is an e-commerce company. Their user sets are likely to contain most of the residents of the area, so the intersection of their user spaces is large. However, since the bank records the user's revenue, expenditure behavior and credit rating, while the e-commerce company retains the user's browsing and purchasing history, their feature spaces are very different. Suppose that we want both parties to have a prediction model for product purchase based on user and product information.

![](https://i.imgur.com/m88CieL.png)

## Data Partitioning

In the cross-device setting the data is assumed to be partitioned by examples. In the cross-silo setting, in addition to partitioning by examples, partitioning by features is of practical relevance. An example is when two companies in different businesses have the same or an overlapping set of customers, such as a local bank and a local retail company in the same city. This difference has also been referred to as horizontal and vertical federated learning.

## Dataset Types

While a variety of different assumptions can be made on the per-client functions being optimized, the most basic split is between assuming IID and non-IID data.
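Before the formal definition below, here is a minimal sketch of what this split can look like in practice. The dataset, the number of clients, and the label-skew scheme are illustrative assumptions (plain NumPy, not taken from any of the cited papers): the same training set is partitioned once uniformly at random (IID) and once sorted by label (a common way to simulate non-IID clients).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labelled dataset: 1,000 examples, 10 classes (illustrative assumption).
num_examples, num_classes, num_clients = 1000, 10, 5
labels = rng.integers(0, num_classes, size=num_examples)

# IID partition: shuffle all indices and split them evenly, so every
# client's local data looks like a uniform sample of the global dataset.
iid_parts = np.array_split(rng.permutation(num_examples), num_clients)

# Non-IID partition (label skew): sort indices by label first, so each
# client ends up holding only a few of the classes.
non_iid_parts = np.array_split(np.argsort(labels), num_clients)

for cid, (iid, skew) in enumerate(zip(iid_parts, non_iid_parts)):
    print(f"client {cid}: IID classes={np.unique(labels[iid]).size}, "
          f"non-IID classes={np.unique(labels[skew]).size}")
```

In the IID case every client sees all ten classes; in the label-skewed case each client sees only a couple, which is the kind of heterogeneity the next paragraphs discuss.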
**Formally, having IID data at the clients means that each mini-batch of data used for a client's local update is statistically identical to a uniformly drawn sample (with replacement) from the entire training dataset (the union of all local datasets at the clients).** Since the clients independently collect their own training data, which vary in both size and distribution, and these data are not shared with other clients or the central node, the IID assumption **clearly almost never holds in practice.** However, this assumption greatly simplifies theoretical convergence analysis of federated optimization algorithms, and it establishes a baseline that can be used to understand the impact of non-IID data on optimization rates. Thus, a natural first step is to obtain an understanding of the landscape of optimization algorithms for the IID setting.

### Non-IID Datasets

> 🤯 But if we have the capability to run training on the local data on each device (which is necessary for federated learning of a global model), is training a single global model even the right goal?

There are many cases where having a single model is to be preferred, e.g. in order to provide a model to clients with no data, or to allow manual validation and quality assurance before deployment. Nevertheless, since local training is possible, it becomes feasible for each client to have a customized model. This approach can turn the non-IID problem from a bug into a feature, almost literally: since each client has its own model, the client's identity effectively parameterizes the model, rendering some pathological but degenerate non-IID distributions trivial. [AOP-FL](https://arxiv.org/pdf/1912.04977v3.pdf)

![](https://i.imgur.com/PnNmMui.png)

## Privacy Techniques

### Differential Privacy

### Secure Multiparty Computation

### Homomorphic Encryption

## Frameworks and Libraries

### FedML

[GITHUB](https://github.com/FedML-AI/FedML)

FedML is an open research library and benchmark that facilitates the development of new federated learning algorithms and fair performance comparisons. FedML supports three computing paradigms (distributed training, mobile on-device training, and standalone simulation) so users can conduct experiments in different system environments. FedML also promotes diverse algorithmic research with a flexible and generic API design and reference baseline implementations. A curated and comprehensive benchmark dataset for the non-IID setting aims at making comparisons fair.

### PySyft

[GITHUB](https://github.com/OpenMined/PySyft)

PySyft is an open-source library from OpenMined for secure and private machine learning, covering federated learning, differential privacy and encrypted computation.

### TensorFlow Federated

[GITHUB](https://github.com/tensorflow/federated)

TensorFlow Federated (TFF) is an open-source framework for machine learning and other computations on decentralized data. TFF has been developed to facilitate open research and experimentation with Federated Learning (FL), an approach to machine learning where a shared global model is trained across many participating clients that keep their training data locally. For example, FL has been used to train prediction models for mobile keyboards without uploading sensitive typing data to servers.
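As a rough illustration of how a TFF training loop is wired together, here is a sketch modelled on the library's Federated Averaging tutorials. The API names (in particular `tff.learning.from_keras_model` and `tff.learning.build_federated_averaging_process`) follow older TFF releases and may have moved in newer versions; the tiny Keras model and the synthetic client datasets are illustrative assumptions only.

```python
import numpy as np
import tensorflow as tf
import tensorflow_federated as tff  # API names follow the older TFF tutorials

# Synthetic client datasets: an illustrative stand-in for real federated data.
def make_client_dataset(rng):
    x = rng.normal(size=(32, 784)).astype("float32")
    y = rng.integers(0, 10, size=(32,)).astype("int64")
    return tf.data.Dataset.from_tensor_slices((x, y)).batch(8)

rng = np.random.default_rng(0)
federated_train_data = [make_client_dataset(rng) for _ in range(3)]

def model_fn():
    # A deliberately tiny Keras model; TFF wraps it so it can be trained
    # inside the federated computation.
    keras_model = tf.keras.Sequential(
        [tf.keras.layers.Dense(10, activation="softmax", input_shape=(784,))]
    )
    return tff.learning.from_keras_model(
        keras_model,
        input_spec=federated_train_data[0].element_spec,
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    )

# Build the Federated Averaging process: clients run local SGD and the
# server averages the resulting model updates each round.
fed_avg = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02),
)

state = fed_avg.initialize()
for round_num in range(5):
    state, metrics = fed_avg.next(state, federated_train_data)
    print(f"round {round_num}: {metrics}")
```

The point of the sketch is the shape of the loop: the raw `tf.data.Dataset`s stay with the clients (here merely simulated in one process), and only model updates flow through the iterative process.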
### IBM Federated

[GITHUB](https://github.com/IBM/federated-learning-lib)

IBM federated learning provides a basic fabric for FL, to which advanced features can be added. It is not dependent on any specific machine learning framework and supports different learning topologies.

### Flower - A Friendly Federated Learning Framework

[GITHUB](https://github.com/adap/flower)

Flower is an FL framework which is both agnostic towards heterogeneous client environments and also scales to a large number of clients, including mobile and embedded devices. Flower's abstractions let developers port existing mobile workloads with little overhead, regardless of the programming language or ML framework used, while also allowing researchers the flexibility to experiment with novel approaches to advance the state of the art. [REF](https://arxiv.org/pdf/2007.14390.pdf)

### Substra

[GITHUB](https://github.com/SubstraFoundation/substra)

Substra gathers data providers and algorithm designers into a network of nodes that can train models on demand but under advanced permission regimes. To guarantee data privacy, Substra implements distributed learning: the data never leave their nodes; only algorithms, predictive models and non-sensitive metadata are exchanged on the network. The computations are orchestrated by a Distributed Ledger Technology which guarantees the traceability and authenticity of information without needing to trust a third party. Although originally developed for healthcare applications, Substra is not data-, algorithm- or programming-language-specific. It supports many types of computation plans, including the parallel computation plans commonly used in federated learning. With appropriate guidelines, it can be deployed for numerous machine learning use cases with data or algorithm providers where trust is limited.

### Nvidia Clara

[LINK](https://developer.nvidia.com/clara)

NVIDIA Clara is a healthcare application framework for AI-powered imaging, genomics, and the development and deployment of smart sensors. It includes full-stack GPU-accelerated libraries, SDKs and reference applications for developers, data scientists and researchers to create real-time, secure and scalable solutions.

## Benchmark Frameworks

### FATE

[GITHUB](https://github.com/FederatedAI/FATE)

FATE (Federated AI Technology Enabler) is an open-source project initiated by WeBank's AI Department to provide a secure computing framework to support the federated AI ecosystem. It implements secure computation protocols based on homomorphic encryption and multi-party computation (MPC). It supports federated learning architectures and secure computation of various machine learning algorithms, including logistic regression, tree-based algorithms, deep learning and transfer learning.

### LEAF

[LINK](https://leaf.cmu.edu)

LEAF is a benchmarking framework for learning in federated settings, with applications including federated learning, multi-task learning, meta-learning, and on-device learning. Future releases will include additional tasks and datasets.

## Complementary Techniques in FL

### Neural Architecture Search (NAS)

[LINK](https://arxiv.org/pdf/2004.08546.pdf)

The FedNAS algorithm lets scattered workers collaboratively search for an architecture with higher accuracy. Experiments on a non-IID dataset show that the architecture found by FedNAS can outperform a manually predefined architecture.
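FedNAS's actual search procedure alternates local updates of model weights and architecture parameters on each worker before aggregation. As a rough, hypothetical NumPy sketch of the aggregation idea only (not the FedNAS implementation), the server could average both sets of parameters across clients each round, weighted by local dataset size:

```python
import numpy as np

rng = np.random.default_rng(0)
num_clients, num_weights, num_arch_params = 4, 100, 14

# Hypothetical per-client states after one round of local search: each
# client holds ordinary model weights plus architecture parameters
# (e.g. mixing coefficients over candidate operations).
client_weights = [rng.normal(size=num_weights) for _ in range(num_clients)]
client_alphas = [rng.normal(size=num_arch_params) for _ in range(num_clients)]
client_sizes = np.array([120, 80, 200, 100])  # local dataset sizes

# Server-side aggregation: weighted average of both the weights and the
# architecture parameters, proportional to local dataset size.
coeffs = client_sizes / client_sizes.sum()
global_weights = sum(c * w for c, w in zip(coeffs, client_weights))
global_alphas = sum(c * a for c, a in zip(coeffs, client_alphas))

print(global_weights.shape, global_alphas.shape)
```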
### Meta-Learning in FL

[LINK](https://arxiv.org/abs/2002.07948)

## Towards Federated Learning at Scale: System Design

[PAPER](https://arxiv.org/pdf/1902.01046.pdf)

## Challenges in FL

[LINK](https://blog.ml.cmu.edu/2019/11/12/federated-learning-challenges-methods-and-future-directions/)

**Extreme communication schemes**: It remains to be seen how much communication is necessary in federated learning. For example, can we gain a deeper theoretical and empirical understanding of one-shot/few-shot communication schemes in massive and statistically heterogeneous networks?

**Novel models of asynchrony**: The two communication schemes most commonly studied in distributed optimization are bulk synchronous and asynchronous approaches. However, in federated networks, each device is often not dedicated to the task at hand, and most devices are not active on any given iteration. Can we devise device-centric communication models beyond synchronous and asynchronous training, where each device can decide when to interact with the server (rather than being dedicated to the workload)?

**Heterogeneity diagnostics**: Recent works have aimed to quantify statistical heterogeneity through various metrics, though these metrics must be calculated during training. This motivates the following questions: Are there simple diagnostics that can be used to quantify systems and statistical heterogeneity before training? Can these diagnostics be exploited to further improve the convergence of federated optimization methods?

**Granular privacy constraints**: Privacy is typically defined at either a local or a global level with respect to all devices in the network. However, in practice, it may be necessary to define privacy on a more granular level, as privacy constraints may differ across devices or even across data points on a single device. Can we define more granular notions of privacy and develop methods to handle mixed (device-specific or sample-specific) privacy restrictions?

**Productionizing federated learning**: There are a number of practical concerns that arise when running federated learning in production. For example, how can we handle issues such as concept drift (when the underlying data-generation model changes over time), diurnal variations (when the devices exhibit different behavior at different times of the day or week), and cold-start problems (when new devices enter the network)?

## Algorithms

![](https://i.imgur.com/xUioki5.png)
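The baseline algorithm in this space is Federated Averaging (FedAvg) from McMahan et al. The following plain-NumPy sketch uses a hypothetical linear-regression model and synthetic client data, purely for illustration; it shows the basic loop of broadcasting the global model, running local SGD on each client, and averaging the updated weights proportionally to local dataset size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, non-identical client datasets for a linear-regression task
# (illustrative only; real FL clients hold their own private data).
def make_client(n):
    x = rng.normal(size=(n, 5))
    w_true = rng.normal(size=5)
    y = x @ w_true + 0.1 * rng.normal(size=n)
    return x, y

clients = [make_client(n) for n in (50, 120, 80)]

def local_update(w, x, y, epochs=5, lr=0.01, batch=16):
    """Run a few epochs of mini-batch SGD on one client's local data."""
    w = w.copy()
    for _ in range(epochs):
        idx = rng.permutation(len(y))
        for start in range(0, len(y), batch):
            b = idx[start:start + batch]
            grad = 2 * x[b].T @ (x[b] @ w - y[b]) / len(b)
            w -= lr * grad
    return w

# FedAvg server loop: broadcast, local training, weighted averaging.
w_global = np.zeros(5)
for rnd in range(20):
    sizes, updates = [], []
    for x, y in clients:  # here every client participates in every round
        updates.append(local_update(w_global, x, y))
        sizes.append(len(y))
    sizes = np.array(sizes, dtype=float)
    w_global = sum((n / sizes.sum()) * w for n, w in zip(sizes, updates))
    loss = np.mean([np.mean((x @ w_global - y) ** 2) for x, y in clients])
    print(f"round {rnd:02d}  mean local MSE = {loss:.4f}")
```

In a real deployment only a sampled subset of clients participates each round and the updates travel over the network; the weighted average itself is the same.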