# Federated Learning

## What is FL?

### Keywords
- Data Privacy
- Distributed Machine Learning

### Key properties of federated optimization (vs. ordinary distributed ML)
- **Non-IID** (not independently and identically distributed): each local dataset is not representative of the overall distribution
- **Unbalanced**: clients hold very different amounts of data
- **Massively distributed**: the number of participating clients is very large

### How it works
1. Each client trains on its own local data (no raw data is uploaded).
2. Each client uploads its result (gradient) to the central server, so the server never receives user data.
3. The central server aggregates the clients' gradients into a global gradient and applies one global update.

![Architecture](https://i.imgur.com/SC4XyCp.png)

### Bottleneck of FL
- Communication: have each client run multiple local gradient steps per round, reducing the number of communication rounds (FedAvg).

### Problem
- Does "the server never receives user data" really mean user privacy is protected?
- The uploaded gradients themselves carry features of the user's data; a motivated attacker can still reconstruct information from them.

### Reference
- [Federated Learning 聯邦學習簡介](https://biic.ee.nthu.edu.tw/blog-detail.php?id=2)

## Comparison of FedSGD and FedAvg
- FedSGD: baseline FL; each client computes **one local gradient update per round**.
- FedAvg: each client performs several local update steps; the server then applies the **averaged update**.

## Implementing FedAvg: code tracing

This section traces an existing implementation: [Federated-Averaging-PyTorch](https://github.com/vaseline555/Federated-Averaging-PyTorch)

### How it is implemented

The central server partitions the dataset (either IID or Non-IID) across all clients; local updates for the selected clients then run in parallel via multiprocessing.

- multi-processing:

```
# update the selected clients using their local datasets
if self.mp_flag:
    with pool.ThreadPool(processes=cpu_count() - 1) as workhorse:
        selected_total_size = workhorse.map(
            self.mp_update_selected_clients, sampled_client_indices)
    selected_total_size = sum(selected_total_size)
```

Clients are sampled at random each round; only the sampled clients compute and upload their gradient updates.

### Structure
- Config loading: `main.py` reads `config.yaml` with PyYAML
  - `fed_config`:

    ```
    C: 0.1    # fraction of clients sampled each round
    K: 100    # total number of clients
    R: 500    # number of communication rounds
    E: 10     # local epochs per round
    B: 10     # local batch size
    criterion: torch.nn.CrossEntropyLoss
    optimizer: torch.optim.SGD
    ```
  - `model_config`:
    - *TwoNN*
    - *CNN*
- Central server: `src/server.py`
  > At first, the center server distributes the model skeleton to all participating clients along with configurations. While proceeding through federated learning rounds, the center server samples some fraction of clients, receives their locally updated parameters, averages them into a global parameter (model), and applies it to the global model.
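The aggregation step the server performs, clients train locally, then the server takes a size-weighted average of their parameters, can be sketched in a few lines. This is a minimal illustration in plain Python, not the traced repository's actual code; `fedavg_aggregate` is a hypothetical name.

```python
# Minimal FedAvg aggregation sketch: the server averages locally updated
# parameter vectors, weighting each client by its local dataset size.
def fedavg_aggregate(client_weights, client_sizes):
    """client_weights: one parameter list per sampled client;
    client_sizes: number of local training samples per client."""
    total = sum(client_sizes)
    num_params = len(client_weights[0])
    global_weights = [0.0] * num_params
    for w, n in zip(client_weights, client_sizes):
        for i in range(num_params):
            global_weights[i] += (n / total) * w[i]
    return global_weights

# Example: client 0 trained on twice as much data as client 1,
# so its parameters carry twice the weight in the average.
print(fedavg_aggregate([[1.0, 2.0], [4.0, 5.0]], [200, 100]))
# -> [2.0, 3.0]
```

Weighting by dataset size is what makes FedAvg robust to the "Unbalanced" property noted above: a client with few samples cannot pull the global model as strongly as one with many.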
> In the next round, newly selected clients receive the updated global model as their local model.
- Client: `src/client.py`

## Multi-Layer Federated Learning

### Results

![](https://hackmd.io/_uploads/BkyWimfUn.png)
![](https://hackmd.io/_uploads/S1WGiQMUh.png)

#### With 2 masters

- 2NN
  - Non-IID

    ![](https://hackmd.io/_uploads/HyuZVxsB3.png)
    ![](https://hackmd.io/_uploads/S1nGVliB3.png)
    ![](https://hackmd.io/_uploads/HJFCQxjBh.png)
    ![](https://hackmd.io/_uploads/ryMAQxor3.png)
  - IID

    ![](https://hackmd.io/_uploads/Sk0a6ERH2.png)
    ![](https://hackmd.io/_uploads/H1wGA4RHh.png)
    ![](https://hackmd.io/_uploads/ryefCaERBn.png)
    ![](https://hackmd.io/_uploads/rJYbAVASn.png)
  - All

    ![](https://hackmd.io/_uploads/r1XneOkU2.png)
    ![](https://hackmd.io/_uploads/ryIhlOkIh.png)
    ![](https://hackmd.io/_uploads/rkFtANRHn.png)
    ![](https://hackmd.io/_uploads/rk4qxO183.png)
- CNN

  ![](https://hackmd.io/_uploads/Sk_wvsnr3.png)
  ![](https://hackmd.io/_uploads/ryDODo3Sh.png)
  ![](https://hackmd.io/_uploads/HJHGwj2B2.png)
  ![](https://hackmd.io/_uploads/BkDHDihSh.png)
  - Total

    ![](https://hackmd.io/_uploads/rJ4Ajvk8h.png)
    ![](https://hackmd.io/_uploads/SkcpovJI3.png)
    ![](https://hackmd.io/_uploads/Sk12CPJI3.png)
    ![](https://hackmd.io/_uploads/r11C0vkL3.png)
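The two-master setup in these experiments can be read as hierarchical FedAvg: each master averages the clients in its own group, then the masters' models are averaged into the global model. A minimal sketch of that aggregation, under the assumption that both layers use size-weighted averaging (all names here are illustrative, not from the traced repository):

```python
# Hierarchical (multi-layer) FedAvg sketch: each master aggregates its
# own clients, then a top-level server averages the masters' models,
# weighting each master by its group's total sample count.
def weighted_average(weight_lists, sizes):
    """Size-weighted average of parameter lists."""
    total = sum(sizes)
    return [sum((n / total) * w[i] for w, n in zip(weight_lists, sizes))
            for i in range(len(weight_lists[0]))]

def multilayer_round(groups):
    """groups: one (client_weights, client_sizes) pair per master."""
    master_models, master_sizes = [], []
    for client_weights, client_sizes in groups:
        master_models.append(weighted_average(client_weights, client_sizes))
        master_sizes.append(sum(client_sizes))
    # Top-level aggregation across masters
    return weighted_average(master_models, master_sizes)

# Two masters with two equally sized clients each: master A averages to
# [2.0], master B to [6.0], and the global model to [4.0] -- the same
# result flat FedAvg would give over all four clients.
global_model = multilayer_round([
    ([[1.0], [3.0]], [10, 10]),
    ([[5.0], [7.0]], [10, 10]),
])
print(global_model)  # -> [4.0]
```

With size-weighted averaging at both layers, the two-level scheme reproduces flat FedAvg exactly, so any accuracy differences in the plots above come from how clients are partitioned between masters and sampled per round, not from the averaging itself.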