# FedDRL: Deep Reinforcement Learning-based Adaptive Aggregation for Non-IID Data in Federated Learning
This paper first identifies the **cluster-skew non-IID** phenomenon. Existing work mainly studies the non-identical-distribution aspect of non-IID data and rarely discusses the non-independent nature of the clients' data: clients that share similar features can be strongly correlated (inter-client correlation).
To address this, the authors propose Federated Learning with Deep Reinforcement Learning (FedDRL), **using RL to decide the weights with which the models uploaded by different clients are aggregated into the global model**.
**FEATURE**:
1. Non-IID data, specifically cluster-skew non-IID
2. RL determines how to aggregate the local models: the non-IID problem is addressed by adjusting the per-client weights used in aggregation (see the sketch after this list)
3. Single-global-model FL
4. DDPG as the RL algorithm
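
A minimal sketch of what RL-weighted aggregation means in practice: instead of FedAvg's fixed data-size weights, the server combines the uploaded local models with per-client weights produced by the RL agent. The dict-of-arrays model format here is an assumption for illustration only.

```python
import numpy as np

def aggregate(local_models, client_weights):
    """Weighted average of local models into a new global model.

    local_models:   list of models, each a dict mapping parameter name -> np.ndarray.
    client_weights: per-client aggregation weights chosen by the RL agent,
                    assumed non-negative and summing to 1.
    """
    return {
        name: sum(w * model[name] for w, model in zip(client_weights, local_models))
        for name in local_models[0]
    }
```

In FedAvg these weights would simply be each client's data fraction n_k / n; FedDRL instead lets a DDPG agent choose them every round.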
**LIMITATIONS**:
1. <font color="#f00">A trained RL agent can only serve an FL system with a fixed number of clients</font>
    * One workaround is to let each action decide the weight of a single client at a time, but this performs worse
    * Extending that idea, the state could also include system-wide information, which might let the agent reason more globally (though when deciding the first client's weight it still has to account for the clients that follow)
2. Designed for synchronous FL; asynchronous FL is not supported
3. Overhead of FedDRL: performing the aggregation at the server also incurs extra overhead for calculating the impact factors
**NON-IID**:
1. Feature skew or attribute skew
2. Quantity skew
3. Label skew: the main focus of this paper, with particular attention to the cluster phenomenon
    * generation: each client gets a different label distribution (power-law or Dirichlet); samples are then assigned to clients according to their labels (see the partition sketch after this list)
    * cluster skew: clients that share the same feature may have the same data distribution
    * the global distribution of data labels is not uniform
    * the data labels are frequently partitioned into clusters, and a group of clients owns the data labels belonging to the same cluster
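
A minimal sketch of the Dirichlet label-skew partition mentioned above; the concentration parameter `alpha` and the helper name are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def dirichlet_label_partition(labels, num_clients, alpha=0.5, seed=0):
    """Split sample indices across clients with Dirichlet-distributed label proportions.

    labels: 1-D array of integer class labels for the whole dataset.
    alpha:  concentration parameter; smaller alpha -> more skewed clients.
    """
    rng = np.random.default_rng(seed)
    num_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(num_clients)]

    for c in range(num_classes):
        idx_c = np.where(labels == c)[0]
        rng.shuffle(idx_c)
        # Fraction of class c assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        split_points = (np.cumsum(proportions) * len(idx_c)).astype(int)[:-1]
        for client_id, part in enumerate(np.split(idx_c, split_points)):
            client_indices[client_id].extend(part.tolist())

    return client_indices
```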
**PROBLEM**:
Under single-global-model FL, the goal is to:
1. minimize the global objective function over the concatenation of all the local data
2. balance the local objective functions of all the clients
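
A sketch of the standard single-global-model objective these two goals refer to, written in the usual FedAvg-style notation ($n_k$, $F_k$); this is an assumed formulation, not copied verbatim from the paper:

$$
\min_{w}\; F(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w),
\qquad
F_k(w) = \frac{1}{n_k} \sum_{i \in \mathcal{D}_k} \ell(w; x_i, y_i)
$$

where $n_k$ is client $k$'s data size and $n = \sum_k n_k$; the second goal additionally asks that the values $F_k(w)$ stay close to each other across clients.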
**DRL**:
**DDPG** is used as the RL agent. Compared with a plain on-policy actor-critic method, whose samples can be used only once, DDPG makes more efficient use of the collected data: it keeps an experience (replay) buffer so that old experience can be reused at every update. On top of this, the paper adopts a **temporal difference-prioritizing strategy**: every stored sample is assigned a priority, so that training can focus on the most informative transitions. A sample's priority is measured by its TD error; the larger the TD error, the higher the probability that the sample is drawn for training.
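A minimal sketch of TD-error-based prioritized sampling from a replay buffer; the exponent `alpha_priority` and the constant `eps` are common prioritized-experience-replay choices assumed here, not taken from the paper.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Replay buffer whose sampling probability grows with each transition's TD error."""

    def __init__(self, capacity, alpha_priority=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha_priority
        self.eps = eps
        self.storage = []      # transitions, e.g. (state, action, reward, next_state)
        self.priorities = []   # one priority per stored transition

    def add(self, transition, td_error):
        if len(self.storage) >= self.capacity:   # drop the oldest transition
            self.storage.pop(0)
            self.priorities.pop(0)
        self.storage.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size, rng=None):
        rng = rng or np.random.default_rng()
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = rng.choice(len(self.storage), size=batch_size, p=probs)
        return [self.storage[i] for i in idx], idx

    def update_priorities(self, idx, td_errors):
        # Refresh priorities after the critic recomputes TD errors on the sampled batch.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = (abs(err) + self.eps) ** self.alpha
```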
1. agent: uses the DRL policy to obtain each client's aggregation weight and performs the aggregation
2. state: the following information is collected from every client
    1. the client's data size
    2. the loss of the local model after the round-t update
    3. the loss of the global model at round t (as downloaded, before the update)
3. action:
    1. one weight per client
    * the actor first outputs a Gaussian distribution (mean and standard deviation) per client; sampling from it and applying softmax() gives the actual client weights (see the sketch after the reward item below)

4. reward:
    * design goals:
        1. Improve the global model's accuracy across all clients >> achieved by reducing the global model's average loss across all client datasets
        2. Balance the global model's performance over all clients' datasets >> achieved by minimizing the gap between the global model's maximum and minimum losses on the clients' datasets
* reward function:
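
The reward equation itself is not reproduced here; below is a hedged sketch, consistent with the two design goals above, of how the action is turned into aggregation weights and how such a reward could be computed. The function names, the Gaussian sampling step, and the `balance_coef` term are illustrative assumptions, not the paper's exact formula.

```python
import numpy as np

def softmax(x):
    z = np.asarray(x) - np.max(x)
    e = np.exp(z)
    return e / e.sum()

def action_to_weights(means, stds, rng=None):
    """Sample one raw score per client from the actor's Gaussian output,
    then apply softmax so the aggregation weights are positive and sum to 1."""
    rng = rng or np.random.default_rng()
    raw = rng.normal(means, stds)
    return softmax(raw)

def reward(global_losses, balance_coef=1.0):
    """Hypothetical reward: penalize the average global-model loss over all
    client datasets (goal 1) and the max-min loss gap across clients (goal 2)."""
    losses = np.asarray(global_losses)
    return -(losses.mean() + balance_coef * (losses.max() - losses.min()))
```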

NOTE: the paper describes in detail how the ACTOR and CRITIC networks are designed; see the paper if needed.
**ALGORITHM**:
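The algorithm listing is not reproduced here; below is a hedged pseudocode-style sketch of one synchronous FedDRL round assembled from the agent/state/action/reward description above. Helper names such as `local_train`, `evaluate_loss`, `aggregate`, and `reward` are placeholders (the last two refer to the illustrative sketches earlier in this note), not the paper's notation.

```python
def feddrl_round(global_model, clients, agent, replay_buffer):
    """One synchronous FedDRL communication round (illustrative sketch)."""
    local_models, state = [], []

    # 1. Each client trains locally starting from the downloaded global model.
    for client in clients:
        local_model = client.local_train(global_model)
        local_models.append(local_model)
        state.append([
            client.num_samples,                      # data size
            client.evaluate_loss(local_model),       # loss of the updated local model
            client.evaluate_loss(global_model),      # loss of the downloaded global model
        ])

    # 2. The DDPG agent maps the state to per-client Gaussian parameters,
    #    which are turned into aggregation weights via softmax.
    weights = agent.act(state)

    # 3. The server aggregates the local models with those weights.
    new_global_model = aggregate(local_models, weights)

    # 4. Reward from the new global model's per-client losses; store the transition
    #    in the prioritized replay buffer and update the agent.
    losses = [client.evaluate_loss(new_global_model) for client in clients]
    r = reward(losses)
    replay_buffer.add((state, weights, r), td_error=abs(r))  # priority placeholder
    agent.update(replay_buffer)

    return new_global_model
```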
