*Federated Learning with Non-IID Data*
https://arxiv.org/abs/1806.00582
The first half of this paper uses experiments to show that weight divergence is strongly tied to EMD; the second half proposes an algorithm that improves FedAvg through data sharing. These notes focus on the experimental results in the first half.
1. Experiment (1): illustrates the impact of non-IID data on FedAvg
* FL algorithm: FedAvg
* Datasets: (a) **MNIST** (b) **CIFAR-10** (c) **KWS** (keyword spotting)
* Distribution (10 clients, 10 classes; see the partition sketch after this list):
    1. IID: each of the 10 clients receives data uniformly distributed over the 10 classes
    2. Non-IID:
        1. Case 1: each client holds data from only one class
        2. Case 2: each of the 10 classes is split into two shards, and each client receives two shards drawn from two different classes
* Results:
    1. Case 1 (one class per client) shows the most severe accuracy reduction
    2. Increasing the number of local epochs does not improve convergence
    3. Using a pre-trained global model as the initial model and continuing FL training on non-IID data does not help, and can even hurt
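
The three partitions above are straightforward to reproduce. Below is a minimal sketch (my own illustration, not the paper's code; all names are mine), assuming a 10-class dataset such as MNIST with labels given as a 1-D integer array:

```python
import numpy as np

def partition(labels, num_clients=10, scheme="iid", seed=0):
    """Split example indices among clients to mimic the three
    distributions of Experiment (1). `labels` holds class ids 0-9;
    classes are assumed roughly balanced, as in MNIST/CIFAR-10."""
    rng = np.random.default_rng(seed)
    idx = np.arange(len(labels))
    if scheme == "iid":
        # Uniform: shuffle, then deal out equal shards.
        rng.shuffle(idx)
        return np.array_split(idx, num_clients)
    # Sort by label so contiguous slices are single-class.
    idx = idx[np.argsort(labels, kind="stable")]
    per_class = np.array_split(idx, 10)        # one slice per class
    if scheme == "one_class":
        return per_class                       # client k holds class k only
    if scheme == "two_shards":
        # Split every class into two shards, then give client k shard 0
        # of class k and shard 1 of class (k+1) % 10, so each client's
        # two shards always come from different classes.
        shards = [s for c in per_class for s in np.array_split(c, 2)]
        return [np.concatenate([shards[2 * k],
                                shards[2 * ((k + 1) % 10) + 1]])
                for k in range(num_clients)]
    raise ValueError(f"unknown scheme: {scheme}")
```

Each returned entry is one client's index array, e.g. `partition(train_labels, scheme="one_class")`, from which per-client data loaders can be built.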


2. <font color="#f00">relation between **weight difference** vs **EMD**</font>
The paper proves that the weight divergence during training is bounded by the earth mover's distance (EMD) between the distribution over classes on each device (or client) and the population distribution.
The bound (Proposition 3.1 of the paper) takes the form

$$\left\|w^{(f)}_{mT} - w^{(c)}_{mT}\right\| \le \sum_{k=1}^{K} \frac{n^{(k)}}{\sum_{k'=1}^{K} n^{(k')}} \left( (a^{(k)})^{T} \left\|w^{(f)}_{(m-1)T} - w^{(c)}_{(m-1)T}\right\| + \eta \sum_{i=1}^{C} \left\|p^{(k)}(y=i) - p(y=i)\right\| \sum_{j=1}^{T-1} (a^{(k)})^{j}\, g_{\max}\!\left(w^{(c)}_{mT-1-j}\right) \right)$$

where $w^{(f)}$ / $w^{(c)}$ are the FedAvg and centralized-SGD weights, clients synchronize every $T$ steps, $\eta$ is the learning rate, $a^{(k)} = 1 + \eta \sum_{i=1}^{C} p^{(k)}(y=i)\,\lambda_{x|y=i}$, and $g_{\max}(w) = \max_{i}\left\|\nabla_w \mathbb{E}_{x|y=i}[\log f_i(x, w)]\right\|$.
1. From the bound, the weight divergence is governed by two factors:
    1. the weight divergence of the previous synchronization round (the $(a^{(k)})^{T}\,\|w^{(f)}_{(m-1)T} - w^{(c)}_{(m-1)T}\|$ term)
    2. the difference between the data distribution on client $k$ and the distribution of the whole population (the $\sum_{i=1}^{C}\|p^{(k)}(y=i) - p(y=i)\|$ term)
2. If the clients start from different initial models, a large weight divergence appears even when training on IID data, because the first term never vanishes.

3. Conclusion:
When all clients start from the same initial model, the difference between each client's data distribution and the population distribution (measured by EMD) becomes the dominant cause of weight divergence. How strongly EMD matters is governed by the learning rate $\eta$, the number of local steps $T$ taken before each synchronization, and the gradient magnitude $g_{\max}(w^{(c)}_{mT-1-j})$. EMD is therefore a usable measure for predicting weight divergence (a sketch of the computation follows).
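
The distribution term of the bound, $\sum_{i}\|p^{(k)}(y=i) - p(y=i)\|$, is what the experiments effectively report as EMD; over discrete class labels it is just an L1 distance between class histograms. A minimal sketch of the per-client computation (function names are mine):

```python
import numpy as np

def class_dist(labels, num_classes=10):
    """Empirical class distribution p(y=i) from an array of labels."""
    counts = np.bincount(np.asarray(labels), minlength=num_classes)
    return counts / counts.sum()

def emd_term(client_labels, population_labels, num_classes=10):
    """Distribution term of the bound for one client:
    sum_i |p_k(y=i) - p(y=i)|, the L1 distance between the client's
    class histogram and the population's."""
    p_k = class_dist(client_labels, num_classes)
    p = class_dist(population_labels, num_classes)
    return float(np.abs(p_k - p).sum())
```

Sanity check: with 10 balanced classes, a one-class client yields $0.9 + 9 \times 0.1 = 1.8$, while an IID client yields a value near 0, matching the ordering of the accuracy drops in Experiment (1).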
3. Experiment (2): (to revisit)
* Weight Divergence vs. EMD (see the sketch below for how weight divergence is measured)
* Test Accuracy vs. EMD
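
For the first comparison, the paper measures weight divergence as the relative L2 distance $\|w^{FedAvg} - w^{SGD}\| / \|w^{SGD}\|$ between the FedAvg model and a model trained with centralized SGD on the same data, reported layer by layer. A minimal sketch, assuming each model's weights come as a list of per-layer numpy arrays:

```python
import numpy as np

def weight_divergence(w_fedavg, w_sgd):
    """Relative weight divergence ||w_FedAvg - w_SGD|| / ||w_SGD||,
    returned per layer; both arguments are equally ordered lists of
    per-layer numpy weight arrays."""
    return [float(np.linalg.norm(fa - sg) / np.linalg.norm(sg))
            for fa, sg in zip(w_fedavg, w_sgd)]
```

Sweeping partitions of increasing skew and plotting these values, together with final test accuracy, against the corresponding `emd_term` gives the two comparisons listed above.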