[HCZ+21] Personalized Cross-Silo Federated Learning on Non-IID Data

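# Personalized Cross-Silo Federated Learning on Non-IID Data

This paper first argues that the conventional FL setup, in which the only goal is to "collaboratively train one global model", is the bottleneck on non-IID data. The reason is that a model produced this way cannot fully fit every client's local dataset. The authors therefore propose a new FL framework whose goal is for each client to improve its own model through collaboration (compared with training alone), rather than to jointly build a single model. More specifically, the framework adjusts weights to decide how much each of the other clients' models contributes when a client's own model is aggregated (one can picture as many global models being trained at the same time as there are clients).

![Screenshot (333)](https://hackmd.io/_uploads/rkwFaWiST.png)

**THINKING**: Based on what the clients are collaborating toward, I think FL can be further divided into two categories:
1. The goal is to jointly train a single model
    * [FedDRL: Deep Reinforcement Learning-based Adaptive Aggregation for Non-IID Data in Federated Learning](<https://hackmd.io/@nnlijpg3Ts6jTgdB-pUfEw/BJKsy3FSp>): uses RL to adjust the weights of the different models during aggregation
    * [Optimizing Federated Learning on Non-IID Data](<>): uses RL to select clients
2. The goal is to improve one's own model while also helping others
    * Clustered FL, multi-task FL, and personalized FL all fall into this category

**FEATURE**:
1. Multi-task model FL (personalized)
2. Non-IID
3. Uses optimization to handle non-IID data: it decides the weight of each local model at every aggregation round

**PROBLEM**:
![Screenshot (327)](https://hackmd.io/_uploads/SkZRJZjra.png)
1. The first term (F()): each client's personalized model should fit that client's own local data as well as possible (F_i denotes the estimated loss on client i)
    * NOTE: in FedAvg, every client evaluates this term on the global model w rather than on its own w_i
2. The second term (A()): this term encourages collaboration among the clients
    * A() is called the attention-inducing function and must satisfy the following assumptions:
        * non-linear function
        * increasing and concave
        * continuously differentiable
        * finite
    * The authors mention that it can be: (1) the negative exponential function, (2) the smoothly clipped absolute deviation function, or (3) the minimax concave penalty function
    * In effect, it measures the difference between two clients' models.

**ALGORITHM**: The algorithm in this paper is derived by solving the objective above.
1. How is the objective function solved? (This is the centralized version, but the solution happens to decompose into an FL version.)
    1. Incremental-type optimization (Bertsekas 2011)
        * Optimize W with respect to F() and A() separately, and repeat until convergence
        * Concretely: each round first optimizes W with A(), and only then with F()
            1. Optimize W with A(): a gradient descent step yields U (called the prox-center)
               ![Screenshot (328)](https://hackmd.io/_uploads/S1N6rbiBT.png)
                * This step can be computed centrally. It does not use client data, so it does not leak private information
                * U_k stacks one prox-center per client. Splitting it into client_number vectors u_i gives the expression below, which turns out to be a weighted aggregation of the different clients' models. This is how the paper tackles non-IID data
                  ![Screenshot (330)](https://hackmd.io/_uploads/S1zMKZira.png)
                * The weights sum to 1 and, with α chosen appropriately, are non-negative, so each u_i is a convex combination of the client models (I am not fully sure why this matters; it is probably needed for the proof)
            2. Then optimize W with F(): a proximal point step (Rockafellar 1976)
                * This step inherently needs client data, but the computation separates across clients: each client computes its own part locally and uploads the result (the objective is the sum of each model's own loss)
                  ![Screenshot (329)](https://hackmd.io/_uploads/HJfZLZora.png)
        * Convergence: converges to an optimal solution when G(W) is a convex function, and to a stationary point when G(W) is non-convex (proofs are given in the paper)
2. ANALYSIS: Why does an objective formulated this way let every client reach its own best solution while promoting collaboration?
    * The weights found by the optimization above can be written as in the figure below. The more similar another client's model is to one's own (that is, the more similar its data distribution), the higher the weight it receives in this aggregation, which promotes collaboration.
    ![Screenshot (331)](https://hackmd.io/_uploads/BJvVjWiSa.png)
3. PSEUDOCODE
   ![Screenshot (332)](https://hackmd.io/_uploads/r1SGpWoHp.png)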
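
To make the two alternating steps under ALGORITHM concrete, here is a minimal plain-Python sketch, not the authors' implementation. It assumes the negative exponential attention function A(t) = 1 - exp(-t / sigma), a toy quadratic client loss F_i(w) = 0.5 * ||w - c_i||^2 (chosen only because its proximal step has a closed form), and the update form as I read it from the notes: a server-side weighted aggregation followed by a client-side proximal step. All function names, `sigma`, `lam`, `alpha`, and the exact constants are illustrative assumptions.

```python
import math


def attention_derivative(t, sigma=1.0):
    """A'(t) for the assumed negative exponential A(t) = 1 - exp(-t / sigma)."""
    return math.exp(-t / sigma) / sigma


def squared_distance(a, b):
    """Squared Euclidean distance between two model vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def aggregation_weights(models, alpha, sigma=1.0):
    """Return the m x m weight matrix xi; row i holds the weights used for u_i.

    Off-diagonal: xi[i][j] = alpha * A'(||w_i - w_j||^2).
    Diagonal: whatever is left, so that every row sums to 1.
    """
    m = len(models)
    xi = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i != j:
                dist = squared_distance(models[i], models[j])
                xi[i][j] = alpha * attention_derivative(dist, sigma)
        xi[i][i] = 1.0 - sum(xi[i])  # diagonal is still 0.0 when summed
    return xi


def aggregate(models, xi):
    """Server step (uses no client data): u_i = sum_j xi[i][j] * w_j."""
    m, dim = len(models), len(models[0])
    return [
        [sum(xi[i][j] * models[j][d] for j in range(m)) for d in range(dim)]
        for i in range(m)
    ]


def local_prox_step(u, center, lam, alpha):
    """Client step for the toy loss F_i(w) = 0.5 * ||w - center||^2.

    Solves argmin_w F_i(w) + lam / (2 * alpha) * ||w - u||^2 in closed form.
    """
    rho = lam / alpha
    return [(c + rho * x) / (1.0 + rho) for c, x in zip(center, u)]


def run(centers, rounds=20, lam=1.0, alpha=0.1, sigma=1.0):
    """Alternate the A() step (server) and the F() step (clients)."""
    models = [list(c) for c in centers]
    xi = []
    for _ in range(rounds):
        xi = aggregation_weights(models, alpha, sigma)
        prox_centers = aggregate(models, xi)
        models = [
            local_prox_step(u, c, lam, alpha)
            for u, c in zip(prox_centers, centers)
        ]
    return models, xi


if __name__ == "__main__":
    # Clients 0 and 1 have similar optima; client 2 is far away.
    final_models, final_xi = run([[0.0], [0.2], [5.0]])
    print(final_models)
    print(final_xi[0])
```

In this toy setup the two similar clients give each other most of their off-diagonal weight and pull toward each other, while the distant client is effectively left alone, which is the behavior the ANALYSIS item describes. With a real model the closed-form `local_prox_step` would be replaced by a few local gradient steps on the proximal objective.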