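# Fair Contrastive Learning for Facial Attribute Classification

[toc]

## introduction
+ This paper argues that contrastive loss carries implicit ethical risks
    + During training the model can pick up sensitive attribute information. Race, gender, and age are common sensitive attributes. The model may then rely on them to predict labels instead of using the correct target attribute, which produces bias
    + Data imbalance among demographic groups also biases the model during training
+ The authors therefore propose Fair Supervised Contrastive Loss (FSCL)
    + It penalizes sensitive attribute information in the representation in order to improve fairness
+ The authors also propose group-wise normalization: normalizing per demographic group addresses the data imbalance problem mentioned above

## related work
### Supervised Contrastive Loss
+ ![](https://i.imgur.com/xIZZaja.png)
+ The first related work is a review of the standard supervised contrastive loss
+ Start with the diagram at the bottom right: $x$ passes through an encoder network $f(\cdot)$ to give the representation $h$, which then passes through the projection head $g(\cdot)$ to give $z$, the vector used to compute the contrastive loss
+ In the formula, $A(i)$ is the set of all samples except anchor $i$
+ The first $\sum$ is divided by $|P(i)|$, where $P(i)$ is the set of elements in $A(i)$ with the same label as anchor $i$; $|P(i)|$ is therefore the number of positive pairs
+ In the second $\sum$, the numerator is the cosine similarity of a positive pair divided by the hyperparameter $\tau$; the denominator runs over $A(i)$, i.e. every sample except $i$, so it includes all positive and negative pairs

### Gradient Reversal Against Discrimination
+ ![](https://i.imgur.com/4BN1nPi.png)
+ The second related work uses the model architecture itself to keep the network from learning sensitive attributes
+ In the figure above, feature extraction is followed by two branches
    + The left one is the target branch, which predicts the target label $y$
    + The right one is the attribute branch, which predicts the sensitive attribute $a_p$
+ When the attribute branch on the right backpropagates into the feature extractor, the gradient is negated, so that the feature extractor does not learn the sensitive attribute information that the attribute branch picks up

### Fair Attribute Classification Through Latent Space De-Biasing
+ The last related work generates a set of balanced synthetic images $x_{syn}$ to de-correlate the sensitive attribute from the target label in the images
+ ![](https://i.imgur.com/ga5kRsI.png)
+ The authors want (see the formula in the last point of the figure above) the model's predicted probability for the target label to be the same whether or not the sensitive label $g$ is given. That would mean the generated images carry no sensitive attribute information, which is the goal
+ ![](https://i.imgur.com/6CuXzqV.png)
+ Ordinary latent vector perturbation uses a single vector $z$ to generate an image
+ The authors instead start from a vector $z$ and generate a complementary vector $z'$
    + $z$ and $z'$ have the same target label but opposite sensitive attributes
    + Suppose the target label is whether the person is smiling and the sensitive attribute is gender
        + $z$ has target label "smiling" and sensitive attribute "male"
        + $z'$ has target label "smiling" and sensitive attribute "female"
+ Next the authors train two classifiers, $h_t$ and $h_g$
    + Both take $z$ as input
    + The output of $h_t$ should be as close as possible to the target label (smiling)
    + The output of $h_g$ should be as close as possible to the sensitive attribute (male)
+ Once trained, the two classifiers are used to check $z$ and $z'$
    + $z$ and $z'$ should give the same output under $h_t$
    + $z$ and $z'$ should give outputs of opposite sign under $h_g$
    + This matches the goal (same target label but the opposite sensitive attribute label)
+ ![](https://i.imgur.com/WR4Wndu.png)
+ ![](https://i.imgur.com/PVyj63T.png)
+ The relationship between $z$ and $z'$ is shown in the figure above
    + Look first at the target attribute hyperplane $w_t$: $z$ and $z'$ lie on the same side of $w_t$ because they have the same target label
    + $z$ and $z'$ lie on opposite sides of $w_g$ because their outputs differ by a sign
    + So the relative positions of $z$ and $z'$ are roughly as drawn
+ $z'$ can be computed with the formula in the figure above. The derivation is somewhat long and is not the focus here, so see that paper if you are interested

## method
+ This paper investigates what causes unfairness in supervised contrastive loss

1. Learning sensitive attribute information
    + Learning either the target attribute or the sensitive attribute reduces the loss, so the model learns both
    + The encoding network cannot tell the two apart, so it learns the sensitive attribute, which leads to unfairness
2. Data imbalance among demographic groups
    + ![](https://i.imgur.com/zB55ejJ.png)
    + Start with the left-hand formula, which is the supervised contrastive loss function from the related work. Inside the first $\sum$, the only normalization is by $|Z_p(i)|$, the number of positive pairs
    + The right-hand formula is mathematically identical to the left one. Going through it term by term:
        + The first $\sum$ iterates over $j, k$, where $j$ is the target label and $k$ is the sensitive attribute
        + The second $\sum$ takes each $z_i$ in $Z^{j, k}$ in turn, which makes the sensitive attribute of every anchor $i$ explicit
        + The third $\sum$ iterates over $k$, i.e. over the demographic groups (sensitive attribute)
        + The fourth $\sum$ picks $z_p$ (positive pairs) from $Z^k_p(i)$, where $Z^k_p(i)$ means the positive pairs must match anchor $i$ not only in target label but also in sensitive attribute
    + ![](https://i.imgur.com/UxnTCt9.png)
    + A numerical example
        + Suppose the summed term on the left is 10 and the normalizer $|Z_p(i)|$ is 10; with the leading minus sign, the total loss is $-10/10 = -1$
        + With the right-hand formula, the sum is split across all demographic groups (sensitive attributes) while the normalization term stays the same. Caucasian is usually the majority in the data, so it has more positive pairs and contributes a larger share of the loss
        + This shows that, even with normalization, the weight still leans toward the majority group (Caucasian)
        + The authors identify this as a cause of unfairness in the standard supervised contrastive loss

---

+ ![](https://i.imgur.com/VexuQAh.png)
+ Next is Fair Supervised Contrastive Loss (FSCL)
    + **IG-SIM**: the sample has the same target label and the same sensitive attribute as the anchor
    + **SG-SIM**: the sample has the same target label as the anchor but a different sensitive attribute
    + **TG-SIM**: the sample has a different target label from the anchor and the same sensitive attribute
    + **TSG-SIM**: the sample has a different target label and a different sensitive attribute from the anchor
+ ![](https://i.imgur.com/zKTpEE7.png)
+ The authors' method only considers the first three similarities
    + The first two have the same target label as the anchor (regardless of sensitive attribute), so they are positive samples
    + The third has a different target label from the anchor and is the only kind of negative pair
    + The authors want the similarity of the first two (positive samples) to be higher than that of the third (negative samples)

---

+ The authors discuss FSCL in two cases (because there are two kinds of positive sample)
+ ![](https://i.imgur.com/EPAYYiW.png)

1. In the first case the positive sample comes from **IG-SIM** (the first similarity)
    + The biggest difference from the standard supervised contrastive loss is the denominator of the second $\sum$
        + In the standard supervised contrastive loss the denominator covers every positive and negative sample except the anchor
        + In FSCL the denominator contains only negative samples
    + In the table at the top right of the figure above, the positive and negative samples both have the same sensitive attribute as the anchor
    + During training the encoding network therefore does not treat the sensitive attribute as useful information, because it cannot be used to pull positive samples toward the anchor or push negative samples away
    + ![](https://i.imgur.com/QgiUtDi.png)
2. In the second case the positive sample comes from **SG-SIM** (the second similarity)
    + In the table at the top right of the figure above, the positive sample's sensitive attribute differs from that of the anchor and the negative sample
    + If the encoding network learned sensitive attributes, the anchor and the positive samples would be pushed apart, and the anchor and the negative samples would be pulled together
    + That is the opposite of what the loss rewards, so minimizing the loss discourages the encoding network from learning sensitive information, which reduces unfairness

---

+ ![](https://i.imgur.com/BJQz00M.png)
+ Finally, the normalization method the authors propose, group-wise normalization

1. The normalization term of the first $\sum$, $|Z^{j, k}|$, is the number of anchors with target label $j$ and sensitive attribute $k$, taken over all $j$ and $k$
2. The normalization term of the second and third $\sum$, $|Z^k_p(i)|$, is the number of positive samples whose target label and sensitive attribute both match the anchor

+ Reading from the inside out, the loss is first normalized across demographic groups and then normalized by the number of anchors
+ Because every demographic group is normalized separately, the weight no longer leans toward the majority group

## experiment
+ ![](https://i.imgur.com/6Mavakw.png)
+ The experiments use three datasets
    + CelebA
        + About 200,000 images, each with 40 attributes
    + UTK Face
        + About 20,000 images, each annotated with three attributes: gender, age, ethnicity
        + The authors use gender as the target attribute and age and ethnicity as sensitive attributes
        + Because the method works with binary attributes, age is split at 35 years old and ethnicity is split into Caucasian or not
    + Dogs and Cats
        + Nearly 40,000 images of cats and dogs
        + Color is the sensitive attribute and species (cat or dog) is the target attribute

---

+ ![](https://i.imgur.com/zL24PgY.png)
+ The fairness metric used in the experiments is equalized odds (EO)
    + In the figure above, $P_{s^0}$ and $P_{s^1}$ are the probabilities under the two values of the sensitive attribute, given the same target label and the same classifier output
    + When fairness is high, the difference inside the absolute value is close to 0, which means the model predicts the same values even when the sensitive attribute differs

---

+ ![](https://i.imgur.com/ddgb4lW.png)
+ The encoder network is ResNet-18
+ The projection network is an MLP with two hidden layers
+ All images are resized to 128x128
+ Training runs for 100 epochs

---

+ ![](https://i.imgur.com/IH6nDSL.png)
+ The first experiment is on CelebA
    + The first row of the table has $T$ and $S$, which stand for the target label and the sensitive attribute
    + The lowercase $a, m, y, ...$ stand for different attributes, as listed in the figure above
    + For every combination of target label and sensitive attribute, the authors' method has the best EO value, i.e. the highest fairness
    + FSCL and FSCL+ differ only in whether group-wise normalization is used (FSCL+ uses it)
+ ![](https://i.imgur.com/xmsvvAH.png)
+ The second experiment applies the gradient reversal method from the related work
    + With adversarial training, the standard supervised contrastive loss gets a lower EO value, but its accuracy drops as well
    + The authors' method already has a good EO value without adversarial training, and adding adversarial training actually raises it
+ ![](https://i.imgur.com/Hzv2Dln.png)
+ The third experiment uses the de-biased data generation method from the related work
    + Cross-entropy loss gets a lower EO value when de-biased data is used
    + The authors' method gets a further reduction in EO with de-biased data, and its accuracy also improves slightly
+ ![](https://i.imgur.com/FKU3IAT.png)
+ The next experiment looks at intra-group compactness and inter-class separability
    + The values in the figure above are normalized to sum to unity, i.e. all blue (or all orange) values in one chart add up to 1
    + It uses 8 test set groups; the target label is attractiveness and the sensitive attributes are male and young
    + From the two charts alone, the orange values are closer to one another than the blue values are (higher fairness)
    + The table at the bottom right of the figure also shows that the standard deviation of FSCL+ is lower than that of FSCL
        + So adding group-wise normalization does reduce the standard deviation
+ ![](https://i.imgur.com/SBcVQSv.png)
+ This experiment uses t-SNE to show how points are distributed in the embedding space with the authors' method
    + The target label is attractiveness and the sensitive attribute is male
    + Ideally the points are split into two regions by the target label only
    + On the left, the standard supervised contrastive loss is clearly split into two regions by the target label and then split again by the sensitive attribute
        + So the model has learned the sensitive attribute
    + On the right, the authors' method is clearly split only by the target label, so the model has not learned the sensitive attribute
+ ![](https://i.imgur.com/w0MPVN6.png)
+ This experiment tests EO and accuracy under data imbalance
    + The target label is gender and the sensitive attribute is ethnicity
    + The data is set up so that Caucasian male samples are $\alpha$ times the Caucasian female samples, and non-Caucasian female samples are $\alpha$ times the non-Caucasian male samples
    + $\alpha=2, 3, 4$ along the bottom of the charts is the degree of imbalance
    + In the left chart, the blue and red lines at the bottom are the authors' method; their EO stays comparatively low even under data imbalance
    + In the right chart, the blue and red lines are again the authors' method; their accuracy is also among the top few
+ ![](https://i.imgur.com/y3MUBbw.png)
+ The last experiment checks how the authors' method behaves under a different type of bias
    + So it is run on the Dogs and Cats dataset
    + The target label is species (dog or cat) and the sensitive attribute is color
    + The blue and red points in the chart are the authors' method: on the horizontal axis they have the two lowest EO values, and on the vertical axis they have the two highest accuracies

## conclusion
+ This paper analyzes why the standard supervised contrastive loss causes unfairness
    + One cause is that the model learns sensitive attribute information
    + The other is data imbalance
+ The proposed FSCL addresses both problems
    + The proposed loss function keeps the model from learning the sensitive attribute in both cases
    + Group-wise normalization, which normalizes for each sensitive attribute, removes the loss's tilt toward the majority group
+ Across the experiments above, the authors' method improves fairness while accuracy drops only slightly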
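
As a rough illustration of the FSCL description in the method section, the sketch below computes the loss for one batch in plain Python. It is not the authors' implementation. The function name `fscl_loss`, the default `tau=0.1`, the assumption that embeddings are already L2-normalized, and the choice to skip anchors that have no positives or no negatives and average over the rest are all assumptions made here. Group-wise normalization is left out.

```python
import math


def fscl_loss(z, targets, sensitives, tau=0.1):
    """Sketch of FSCL as described in these notes (hypothetical helper).

    Positives: other samples with the anchor's target label (IG-SIM and SG-SIM).
    Denominator: only negatives with a different target label and the
    anchor's sensitive attribute (TG-SIM).
    z is a list of embedding vectors assumed to be L2-normalized.
    """
    def sim(a, b):
        # exp(dot product / tau); dot product equals cosine similarity for unit vectors
        return math.exp(sum(x * y for x, y in zip(a, b)) / tau)

    n = len(z)
    total, used = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and targets[p] == targets[i]]
        negatives = [k for k in range(n)
                     if targets[k] != targets[i] and sensitives[k] == sensitives[i]]
        if not positives or not negatives:
            continue  # assumption: skip anchors with nothing to contrast against
        denom = sum(sim(z[i], z[k]) for k in negatives)
        log_ratios = [math.log(sim(z[i], z[p]) / denom) for p in positives]
        total += -sum(log_ratios) / len(positives)  # normalize by number of positives
        used += 1
    return total / used if used else 0.0


# Toy batch with (target, sensitive) = (0,0), (0,1), (1,0), (1,1)
targets = [0, 0, 1, 1]
sensitives = [0, 1, 0, 1]
by_target = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]     # clusters follow the target label
by_sensitive = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # clusters follow the sensitive attribute
loss_fair = fscl_loss(by_target, targets, sensitives)
loss_biased = fscl_loss(by_sensitive, targets, sensitives)
```

On this toy batch, `loss_fair` comes out lower than `loss_biased`, which matches the argument in the two cases: embeddings organized by the sensitive attribute are penalized. Because the denominator excludes positives, the value can be negative.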
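
The equalized odds metric from the experiment section can also be sketched in a few lines. This is a minimal version under stated assumptions: labels, predictions, and the sensitive attribute are all binary (0 or 1), the gaps are averaged over the four (target label, classifier output) combinations, and an empty group contributes probability 0. The function name `equalized_odds_gap` is hypothetical, and the exact aggregation used in the paper's figure may differ.

```python
from itertools import product


def equalized_odds_gap(y_true, y_pred, sensitive):
    """Mean of |P_s0(pred = c | true = t) - P_s1(pred = c | true = t)| over t, c in {0, 1}."""
    def cond_prob(group, t, c):
        # Predictions for samples in this sensitive group whose true label is t
        preds = [p for y, p, s in zip(y_true, y_pred, sensitive) if s == group and y == t]
        # Assumption: an empty group contributes probability 0
        return sum(p == c for p in preds) / len(preds) if preds else 0.0

    gaps = [abs(cond_prob(0, t, c) - cond_prob(1, t, c))
            for t, c in product((0, 1), repeat=2)]
    return sum(gaps) / len(gaps)


# Group 0 is classified perfectly; group 1 misses one positive sample
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
sensitive = [0, 0, 0, 0, 1, 1, 1, 1]
gap = equalized_odds_gap(y_true, y_pred, sensitive)
```

A value of 0 means both groups get the same conditional prediction rates; larger values mean the classifier treats the groups differently for the same true label.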