# FedDRL: Deep Reinforcement Learning-based Adaptive Aggregation for Non-IID Data in Federated Learning
This paper first identifies the **cluster-skew non-IID** phenomenon. Existing work mainly studies the non-identical-distribution aspect of non-IID data and rarely discusses the non-independent nature of the clients' data: clients that share similar features can be strongly correlated (inter-client correlation).
To address this, the authors propose Federated Learning with Deep Reinforcement Learning (FedDRL), **using RL to decide the weights with which the models uploaded by different clients are aggregated into the global model**.
**FEATURE**:
1. Non-IID data, specifically cluster-skew non-IID
2. RL determines how to aggregate the local models: the non-IID problem is addressed by adjusting the per-client weights used in aggregation (see the sketch after this list)
3. Single-global-model FL
4. DDPG as the RL algorithm
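
A minimal sketch of what RL-weighted aggregation means in practice: instead of FedAvg's fixed data-size weights, the server combines the uploaded local models with per-client weights produced by the RL agent. The dict-of-arrays model format here is an assumption for illustration only.

```python
import numpy as np

def aggregate(local_models, client_weights):
    """Weighted average of local models into a new global model.

    local_models:   list of models, each a dict mapping parameter name -> np.ndarray.
    client_weights: per-client aggregation weights chosen by the RL agent,
                    assumed non-negative and summing to 1.
    """
    return {
        name: sum(w * model[name] for w, model in zip(client_weights, local_models))
        for name in local_models[0]
    }
```

In FedAvg these weights would simply be each client's data fraction n_k / n; FedDRL instead lets a DDPG agent choose them every round.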
**LIMITATIONS**:
1. <font color="#f00">A trained RL agent can only serve an FL system with a fixed number of clients</font>
    * One workaround is to let each action decide the weight of a single client at a time, but this performs worse
    * Extending that idea, the state could also include system-wide information, which might let the agent reason more globally (though when deciding the first client's weight it still has to account for the clients that follow)
2. Designed for synchronous FL; asynchronous FL is not supported
3. Overhead of FedDRL: performing the aggregation at the server also incurs extra overhead for calculating the impact factors
**NON-IID**:
1. Feature skew or attribute skew
2. Quantity skew
3. Label skew: the main focus of this paper, with particular attention to the cluster phenomenon
    * generation: each client gets a different label distribution (power-law or Dirichlet); samples are then assigned to clients according to their labels (see the partition sketch after this list)
    * cluster skew: clients that share the same feature may have the same data distribution
    * the global distribution of data labels is not uniform
    * the data labels are frequently partitioned into clusters, and a group of clients owns the data labels belonging to the same cluster
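
A minimal sketch of the Dirichlet label-skew partition mentioned above; the concentration parameter `alpha` and the helper name are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def dirichlet_label_partition(labels, num_clients, alpha=0.5, seed=0):
    """Split sample indices across clients with Dirichlet-distributed label proportions.

    labels: 1-D array of integer class labels for the whole dataset.
    alpha:  concentration parameter; smaller alpha -> more skewed clients.
    """
    rng = np.random.default_rng(seed)
    num_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(num_clients)]

    for c in range(num_classes):
        idx_c = np.where(labels == c)[0]
        rng.shuffle(idx_c)
        # Fraction of class c assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        split_points = (np.cumsum(proportions) * len(idx_c)).astype(int)[:-1]
        for client_id, part in enumerate(np.split(idx_c, split_points)):
            client_indices[client_id].extend(part.tolist())

    return client_indices
```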
**PROBLEM**:
Under single-global-model FL, the goal is to:
1. minimize the global objective function over the concatenation of all the local data
2. balance the local objective functions of all the clients
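
A sketch of the standard single-global-model objective these two goals refer to, written in the usual FedAvg-style notation ($n_k$, $F_k$); this is an assumed formulation, not copied verbatim from the paper:

$$
\min_{w}\; F(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w),
\qquad
F_k(w) = \frac{1}{n_k} \sum_{i \in \mathcal{D}_k} \ell(w; x_i, y_i)
$$

where $n_k$ is client $k$'s data size and $n = \sum_k n_k$; the second goal additionally asks that the values $F_k(w)$ stay close to each other across clients.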
**DRL**:
**DDPG** is used as the RL agent. Compared with a plain on-policy actor-critic method, whose samples can be used only once, DDPG makes more efficient use of the collected data: it keeps an experience (replay) buffer so that old experience can be reused at every update. On top of this, the paper adopts a **temporal difference-prioritizing strategy**: every stored sample is assigned a priority, so that training can focus on the most informative transitions. A sample's priority is measured by its TD error; the larger the TD error, the higher the probability that the sample is drawn for training.
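A minimal sketch of TD-error-based prioritized sampling from a replay buffer; the exponent `alpha_priority` and the constant `eps` are common prioritized-experience-replay choices assumed here, not taken from the paper.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Replay buffer whose sampling probability grows with each transition's TD error."""

    def __init__(self, capacity, alpha_priority=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha_priority
        self.eps = eps
        self.storage = []      # transitions, e.g. (state, action, reward, next_state)
        self.priorities = []   # one priority per stored transition

    def add(self, transition, td_error):
        if len(self.storage) >= self.capacity:   # drop the oldest transition
            self.storage.pop(0)
            self.priorities.pop(0)
        self.storage.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size, rng=None):
        rng = rng or np.random.default_rng()
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = rng.choice(len(self.storage), size=batch_size, p=probs)
        return [self.storage[i] for i in idx], idx

    def update_priorities(self, idx, td_errors):
        # Refresh priorities after the critic recomputes TD errors on the sampled batch.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = (abs(err) + self.eps) ** self.alpha
```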
1. agent: uses the DRL policy to obtain each client's aggregation weight and performs the aggregation
2. state: the following information is collected from every client
    1. the client's data size
    2. the loss of the local model after the round-t update
    3. the loss of the global model at round t (as downloaded, before the update)
3. action:
    1. one weight per client
    * the actor first outputs a Gaussian distribution (mean and standard deviation) per client; sampling from it and applying softmax() gives the actual client weights (see the sketch after the reward item below)

4. reward:
    * design goals:
        1. Improve the global model's accuracy across all clients >> achieved by reducing the global model's average loss across all client datasets
        2. Balance the global model's performance over all clients' datasets >> achieved by minimizing the gap between the global model's maximum and minimum losses on the clients' datasets
* reward function:
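
The reward equation itself is not reproduced here; below is a hedged sketch, consistent with the two design goals above, of how the action is turned into aggregation weights and how such a reward could be computed. The function names, the Gaussian sampling step, and the `balance_coef` term are illustrative assumptions, not the paper's exact formula.

```python
import numpy as np

def softmax(x):
    z = np.asarray(x) - np.max(x)
    e = np.exp(z)
    return e / e.sum()

def action_to_weights(means, stds, rng=None):
    """Sample one raw score per client from the actor's Gaussian output,
    then apply softmax so the aggregation weights are positive and sum to 1."""
    rng = rng or np.random.default_rng()
    raw = rng.normal(means, stds)
    return softmax(raw)

def reward(global_losses, balance_coef=1.0):
    """Hypothetical reward: penalize the average global-model loss over all
    client datasets (goal 1) and the max-min loss gap across clients (goal 2)."""
    losses = np.asarray(global_losses)
    return -(losses.mean() + balance_coef * (losses.max() - losses.min()))
```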

NOTE: the paper describes in detail how the ACTOR and CRITIC networks are designed; see the paper if needed.
**ALGORITHM**:
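The algorithm listing is not reproduced here; below is a hedged pseudocode-style sketch of one synchronous FedDRL round assembled from the agent/state/action/reward description above. Helper names such as `local_train`, `evaluate_loss`, `aggregate`, and `reward` are placeholders (the last two refer to the illustrative sketches earlier in this note), not the paper's notation.

```python
def feddrl_round(global_model, clients, agent, replay_buffer):
    """One synchronous FedDRL communication round (illustrative sketch)."""
    local_models, state = [], []

    # 1. Each client trains locally starting from the downloaded global model.
    for client in clients:
        local_model = client.local_train(global_model)
        local_models.append(local_model)
        state.append([
            client.num_samples,                      # data size
            client.evaluate_loss(local_model),       # loss of the updated local model
            client.evaluate_loss(global_model),      # loss of the downloaded global model
        ])

    # 2. The DDPG agent maps the state to per-client Gaussian parameters,
    #    which are turned into aggregation weights via softmax.
    weights = agent.act(state)

    # 3. The server aggregates the local models with those weights.
    new_global_model = aggregate(local_models, weights)

    # 4. Reward from the new global model's per-client losses; store the transition
    #    in the prioritized replay buffer and update the agent.
    losses = [client.evaluate_loss(new_global_model) for client in clients]
    r = reward(losses)
    replay_buffer.add((state, weights, r), td_error=abs(r))  # priority placeholder
    agent.update(replay_buffer)

    return new_global_model
```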
