*Federated Learning on Non-IID Data: A Survey*
https://arxiv.org/abs/2106.06843 [2021]
1. Introduction to federated learning
1. Horizontal/homogeneous: clients share the same feature space but have different sample spaces

2. Vertical: clients share the same sample space but have different feature spaces
1. No central server
2. The party that holds the labels is called the guest party or active party
3. Parties without labels are called host parties or passive parties

3. Differences between the two:
1. Horizontal FL has a centralized server to aggregate, while vertical FL does not; instead, the client that holds the labels coordinates the training
2. Horizontal FL aggregates by exchanging model weights (sketched below). In vertical FL, instead, the guest client receives model outputs from the connected host clients and sends the intermediate gradient values back for local model updates
3. In horizontal FL each client only needs to communicate once per round, while in vertical FL multiple exchanges per round are required
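A minimal NumPy sketch of the horizontal-FL side of point 2, i.e., FedAvg-style weighted averaging of client weights on the server (function and variable names are my own illustration, not from the survey; vertical FL would instead exchange intermediate outputs and gradients):

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    # Horizontal FL: the server averages the clients' model weights,
    # weighted by local dataset size (the FedAvg rule).
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    return [
        sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
        for k in range(n_layers)
    ]

# Toy example: 3 clients, each holding a 2-layer model as NumPy arrays.
clients = [[np.full((2, 2), float(i)), np.full(2, float(i))] for i in range(3)]
sizes = [100, 200, 700]  # local dataset sizes
global_weights = fedavg_aggregate(clients, sizes)
# Each layer becomes the 0.1*w0 + 0.2*w1 + 0.7*w2 mixture.
```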
2. Non-IID types
1. Attribute skew
1. Non-overlapping attribute skew
* The data features across the clients are mutually exclusive
* The features may be correlated with one another (see the figure in the paper) or uncorrelated
* This type does not hurt vertical FL: the training result is the same as with centralized training

2. Partial-overlapping attribute skew
* Parts of the data features are shared among the clients
* The shared features may have different distributions across clients (if they have the same distribution, these shared features cause no non-IID divergence, since they share a common learning target; otherwise they do)
3. Full-overlapping attribute skew
* Every client has the same feature space, and the feature distributions are usually assumed to differ across clients
* Causes of this type (see the sketch after this list):
1. Different local noise leads to different distributions
2. Real-world feature imbalance (e.g., everyone's handwriting looks slightly different)
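A small sketch of cause 1: the same underlying samples seen through client-specific noise, so every client has the full feature space but a different feature distribution (the noise schedule here is my own assumption for illustration, not taken from the paper):

```python
import numpy as np

def client_view(x, client_id, base_std=0.1):
    # Full-overlapping attribute skew: same feature space on every
    # client, but each client adds its own noise level, so the local
    # feature distributions differ.
    rng = np.random.default_rng(client_id)
    std = base_std * (1 + client_id)  # noise grows with client id (illustrative)
    return x + rng.normal(0.0, std, size=x.shape)

x = np.random.default_rng(0).random((32, 10))      # 32 samples, 10 shared features
views = [client_view(x, cid) for cid in range(4)]  # 4 differently-noised local views
```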
2. Label skew
* The label distribution differs across clients
1. Label distribution skew
* P_k(y) differs across clients, but P_k(x|y) is the same (y: label, x: feature)
* It is usually caused by location variation among the clients, where clients at similar locations store similar types of local training data
* Two main label distribution skew settings (see the Dirichlet sketch after the label skew list):
1. Label size imbalance
2. Label distribution imbalance

2. Label preference skew
* P_k(y) is the same across clients, but P_k(x|y) differs (the same features can map to different labels on different clients); this is common in practice
3. Temporal skew
* P_k(x, y|t): the joint distribution of (x, y) changes over time
1. Spatio-temporal data
2. Time-series data
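For the two label distribution skew settings above, a common simulation recipe in the FL literature (not specific to this survey) is a Dirichlet partition: each class's samples are split across clients with Dirichlet-drawn proportions, so a small alpha yields strong label size/distribution imbalance. A sketch, with names of my own choosing:

```python
import numpy as np

def dirichlet_label_skew(labels, n_clients, alpha=0.5, seed=0):
    # Split sample indices so each client's label distribution P_k(y)
    # follows Dirichlet(alpha): small alpha -> strong skew, large -> near-IID.
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(n_clients)]
    for y in np.unique(labels):
        idx = np.flatnonzero(labels == y)
        rng.shuffle(idx)
        props = rng.dirichlet([alpha] * n_clients)            # per-client share of class y
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for cid, part in enumerate(np.split(idx, cuts)):
            client_indices[cid].extend(part.tolist())
    return client_indices

labels = np.random.default_rng(1).integers(0, 10, size=5000)  # toy 10-class labels
parts = dirichlet_label_skew(labels, n_clients=5, alpha=0.1)  # strongly skewed split
```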
4. Attribute & Label skew
* Different clients hold data with both different labels and different features
5. Quantity skew
* The amount of training data varies across clients; this can occur on top of all the situations discussed above (a quick simulation is sketched below)
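Quantity skew can be simulated independently of the label/attribute splits, e.g., by drawing per-client sample counts from a Dirichlet prior (a minimal sketch; the choice of prior is my assumption):

```python
import numpy as np

def quantity_skew_sizes(n_samples, n_clients, alpha=1.0, seed=0):
    # Only the *amount* of local data varies: draw per-client counts
    # from Dirichlet(alpha); smaller alpha -> more unbalanced counts.
    rng = np.random.default_rng(seed)
    props = rng.dirichlet([alpha] * n_clients)
    return np.maximum((props * n_samples).astype(int), 1)  # at least 1 sample each

print(quantity_skew_sizes(10_000, n_clients=5, alpha=0.3))  # heavily unbalanced counts
```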
3. Impact and challenges of non-IID data on federated learning models
* The authors split the models trained in FL into parametric/non-parametric and horizontal FL/vertical FL; here I only record the parametric, horizontal FL part
* Horizontal FL
1. Non-IID in horizontal FL usually refers to label distribution skew (the divergence caused by label distribution skew is more severe than that caused by label preference skew)
2. Non-IID data causes global model divergence, particularly when the number of epochs for local updates is large (see the sketch after this list)

3. Multi-layer perceptrons, convolutional neural networks, and long short-term memory (LSTM) networks are commonly trained with FedAvg on image classification and next-word prediction tasks in non-IID experiments
=> The deeper the model, the more severe the error caused by non-IID data (shallow models are more tolerant). The more complex the model, the higher the accuracy it can reach, but also the larger the error non-IID data causes
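To see why many local epochs amplify divergence (point 2), here is a toy least-squares experiment entirely of my own making (not from the survey's experiments): each client optimizes toward a different local optimum, and the more local epochs per round, the further the local models drift apart before averaging.

```python
import numpy as np

def local_train(w, x, y, epochs, lr=0.1):
    # Toy local update: plain gradient descent on a least-squares loss.
    # More epochs let the local model drift further toward its own
    # (skewed) optimum before the server averages.
    for _ in range(epochs):
        grad = x.T @ (x @ w - y) / len(y)
        w = w - lr * grad
    return w

def fedavg_round(w_global, clients, epochs):
    locals_ = [local_train(w_global.copy(), x, y, epochs) for x, y in clients]
    return np.mean(locals_, axis=0)  # unweighted average (equal client sizes)

rng = np.random.default_rng(0)
# Two non-IID clients: same features, but targets shifted differently.
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50) + shift) for shift in (0.0, 5.0)]
w = np.zeros(3)
for _ in range(10):
    w = fedavg_round(w, clients, epochs=20)  # try epochs=1 vs 20 to compare the drift
```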
4. Approaches for handling non-IID data
