###### tags: `Coursework`

# Survey on AI Fairness and Bias

## Video

{%youtube wmyVODy_WD8 %}

### Video Content Presentation

- [Topic Presentation 1](https://docs.google.com/presentation/d/1Pb0fBO2OZcNCjqsVbF2dAJYBJn82aV0kz3bjHamPpmc/edit?usp=sharing)

### What is bias?

- Prototype: the typical case.
- Stereotype: particular labels and features that are not in the minority and that confound our decisions.

### Taxonomy of Common Bias

- Data-driven and interpretation-driven
- Class balancing:
    - Batch selection
    - Weighting
- ![](https://i.imgur.com/i0zPKLJ.png)

### Fair model

- Bias mitigation
    - Remove the problematic signal
- Inclusion
    - Add the desired feature signal
- Fairness and bias
    - A classifier $f_{\theta}(x)$ is **biased** if it changes its prediction when given some additional sensitive feature as input
    - It is **fair** with respect to a variable $z$ if $f_{\theta}(x) = f_{\theta}(x, z)$

## Survey Paper List

### General

- [A Survey on Bias and Fairness in Machine Learning (ACM Computing Surveys, Volume 54, Issue 6, July 2021)](https://arxiv.org/abs/1908.09635)

### Bias mitigation

- [Topic Presentation 2](https://docs.google.com/presentation/d/1sRWgEmfSgCD1iKvAVU_0Z-xGbkoTgDfVbbFTYF89a3g/edit?usp=sharing)
- [Topic Presentation 3](https://docs.google.com/presentation/d/1cwEe1i7z7OnH5W-cZHPl0cmbka7Vk6D-Ob0bQDdc4fY/edit?usp=sharing)

#### In video

- [Uncovering and Mitigating Algorithmic Bias through Learned Latent Structure (AIES 2019)](https://dl.acm.org/doi/10.1145/3306618.3314243)
    - Mitigate bias through a learned latent structure
    - ![](https://i.imgur.com/GxVq4ng.png)
    - Resample according to the reciprocal of the probability under the estimated sampling distribution
        - $\hat{Q}(z|X) \propto \prod_{i}\hat{Q}_{i}(z_{i}|X)$
        - $W(z(x)|X) \propto \prod_{i}\dfrac{1}{\hat{Q}_i(z_i(x)|X) + \alpha}$
        - Here $W$ is the probability distribution of selecting a datapoint $x$
- [Data Decisions and Theoretical Implications when Adversarially Learning Fair Representations (FAT/ML 2017)](https://arxiv.org/pdf/1707.00075.pdf)
- [Mitigating Unwanted Biases with Adversarial Learning (AIES 2018)](https://arxiv.org/pdf/1801.07593.pdf)
    - ![](https://i.imgur.com/7LgGxd7.png)
    - Predict the sensitive attribute $z$
    - Negate the gradient for the $z$ head
    - Remove the effect of $z$

#### NLP

- [Gender-preserving Debiasing for Pre-trained Word Embeddings (ACL 2019)](https://arxiv.org/abs/1906.00742)
- [Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings (NAACL 2019)](https://aclanthology.org/N19-1062.pdf)
    - [github](https://github.com/TManzini/DebiasMulticlassWordEmbedding)
    - Multiclass debiasing method
    - Basic idea
        - Identify words that are gender-neutral ($N$) and gender-definitional ($S$).
        - Project away the gender subspace from the gender-neutral words: $w = w - w \cdot B$ for $w \in N$, where $B$ is the gender subspace (sketched below).
        - Normalize the vectors.
    - ![](https://i.imgur.com/kbQIdNc.png =400x)
    - PCA for component recognition
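A minimal sketch of this project-and-normalize step ("hard debias"), which is also the `HardDebias` subroutine that Double-Hard Debias below builds on. The bias subspace is taken as the top principal direction(s) of difference vectors of definitional word pairs; the function names and the numpy-only setup are illustrative, not taken from either paper's repository.

```python
import numpy as np

def bias_subspace(embeddings, defining_pairs, k=1):
    """Estimate the bias subspace B from difference vectors of definitional
    word pairs (e.g. ("he", "she")) via PCA / SVD."""
    diffs = np.stack([embeddings[a] - embeddings[b] for a, b in defining_pairs])
    _, _, vt = np.linalg.svd(diffs - diffs.mean(axis=0), full_matrices=False)
    return vt[:k]                          # top-k principal directions, shape (k, d)

def hard_debias(w, B):
    """Remove the component of w lying in span(B), then re-normalize:
    w <- (w - (w . B) B) / ||.||, i.e. the projection step described above."""
    w = w - B.T @ (B @ w)
    return w / np.linalg.norm(w)

# Usage (illustrative): debias every gender-neutral word in N.
# B = bias_subspace(embeddings, [("he", "she"), ("man", "woman")])
# debiased = {word: hard_debias(embeddings[word], B) for word in N}
```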
- [Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation (ACL 2020)](https://arxiv.org/pdf/2005.00965.pdf)
    - [github](https://github.com/uvavision/Double-Hard-Debias)
    - ![](https://i.imgur.com/lqKCDPl.png)
    - Word frequency implicitly plays an important role in the embedding algorithm, so frequency differences between words can harm the effectiveness of the hard-debias algorithm.
    - Neighborhood Metric (from [Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them (NAACL 2019)](https://arxiv.org/pdf/1903.03862))
        - The bias of a word is the proportion of words with the same gender-bias polarity among its nearest neighbors.
    - **Do the following steps** (a code sketch follows this list)
        1. Compute the principal components $\{u_{1}, ..., u_{k}\}$ of the whole word embedding matrix as candidate frequency directions.
        2. Pick a set of high-gender-bias words, such as programmer, homemaker, doctor, nurse.
        3. Repeat steps 4-6 for each candidate direction $u_{i}$.
        4. Project the word embeddings $w$ onto the subspace orthogonal to $u_{i}$ to get $w'$.
        5. $\hat{w} = HardDebias(w')$
        6. Take the debiased embeddings $\hat{w}_{HighBiasWords}$ of all high-gender-bias words from step 5, run k-means clustering (k = 2, for *male* and *female*) on $\hat{w}_{HighBiasWords}$, and calculate the clustering accuracy.
        - ==The direction $u_{i}$ whose removal decreases the clustering accuracy the most is the one we want to delete.==
    - Running code
        - Word2Vec
            - ![](https://i.imgur.com/jHrEtYC.png)
        - HardDebias Word2Vec
            - ![](https://i.imgur.com/AglInIr.png)
        - Double-HardDebias Word2Vec
            - ![](https://i.imgur.com/JfDwMIJ.png)
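A minimal sketch of the selection loop in steps 3-6, reusing `hard_debias` from the sketch above. `embeddings` (a word-to-vector dict), `biased_words`, the 0/1 `labels` array marking male- vs. female-biased words, and the function names are illustrative rather than taken from the official repo; clustering accuracy is scored against the better of the two possible cluster labelings.

```python
import numpy as np
from sklearn.cluster import KMeans

def clustering_accuracy(vectors, labels):
    """Step 6: how well k-means (k=2) still separates male- vs female-biased words."""
    pred = KMeans(n_clusters=2, n_init=10).fit_predict(vectors)
    acc = (pred == labels).mean()
    return max(acc, 1.0 - acc)                                # cluster ids are arbitrary

def pick_frequency_direction(embeddings, biased_words, labels, B, n_candidates=20):
    """Return the candidate direction u whose removal hurts clustering accuracy most."""
    # Step 1: candidate frequency directions = top principal components of all embeddings.
    all_vecs = np.stack(list(embeddings.values()))
    _, _, vt = np.linalg.svd(all_vecs - all_vecs.mean(axis=0), full_matrices=False)
    X = np.stack([embeddings[w] for w in biased_words])       # step 2
    best_u, best_acc = None, 1.0
    for u in vt[:n_candidates]:                               # step 3
        X_proj = X - np.outer(X @ u, u)                       # step 4: project out u
        X_hat = np.stack([hard_debias(w, B) for w in X_proj]) # step 5: HardDebias
        acc = clustering_accuracy(X_hat, labels)              # step 6
        if acc < best_acc:                                    # lower accuracy = less gender info
            best_u, best_acc = u, acc
    return best_u
```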
- [Counterfactual Inference for Text Classification Debiasing (ACL 2021)](https://aclanthology.org/2021.acl-long.422.pdf)
    - [github](https://github.com/qianc62/Corsair)
    - ![](https://i.imgur.com/I1gCJ7v.png)
    - Post-hoc debiasing framework: no need for data manipulation or a specially designed balancing mechanism.
    - First, train a base model on the training data directly, so that the dataset biases are preserved in the trained model.
        - Loss function: standard cross entropy, $\mathcal{L}(\theta) = -\dfrac{1}{n} \sum_{i=1}^{n}\sum_{y\in\mathcal{Y}}\pi_{i, y}\ln\bar{\pi}_{i, y}$
    - During the inference phase
        - Identify two kinds of bias to be removed later: **Label Bias** and **Keyword Bias**
        - Label Bias
            - Instead of feeding the original test words to the classifier, ***mask*** them with the `[MASK]` token.
            - Fully-blinded input to the classifier: $[[MASK]_{1}, [MASK]_{2}, ..., [MASK]_{n}]$ for a length-$n$ input
            - Embed it using the average document feature
        - Keyword Bias
            - Take a summarization of the document, $x_{content}$, and let $x_{context} = x - x_{content}$
            - ***Mask*** $x_{content}$ and keep only $x_{context}$
            - Partially-blinded input to the classifier: $[[MASK]_{content, 1}, w_{context, 2}, ..., w_{context, n-1}, [MASK]_{content, n}]$ for a length-$n$ input
        - Factual inference, removing ($\backslash$) the Label Bias and the Keyword Bias
            - $f(x) \backslash f(\hat{x}) \backslash f(\tilde{x}) = f(x) - \hat{\lambda}f(\hat{x}) - \tilde{\lambda}f(\tilde{x})$
            - $\hat{\lambda}$ and $\tilde{\lambda}$ can be found by grid search
    - Running the code
      ```log
      Twitter-RoBERTa-Seed=503881793-Start_Time=20211227224109-Current_Time=20211227231223-Epoch=20 | dev_factual_maf1=75.09% | dev_counterfactual_maf1=76.52% | test_counterfactual_maf1=71.85%(rate=0.90,-0.90) | test_counterfactual_maf1=70.95%(rate=0.00,0.00) | test_counterfactual_maf1=76.54%(rate=1.00,0.00) | test_counterfactual_maf1=68.16%(rate=0.00,1.00) | test_counterfactual_maf1=75.04%(rate=0.50,0.50) | factual_label_fairness=14.663680447358479 | counterfactual_label_fairness=10.286049960575063 | factual_keyword_fairness=18.059472894597317 | counterfactual_keyword_fairness=17.511511395244046 |
      ```

#### CV

- [A Reductions Approach to Fair Classification (ICML 2018)](https://arxiv.org/pdf/1803.02453.pdf)
- [REPAIR: Removing Representation Bias by Dataset Resampling (CVPR 2019)](https://arxiv.org/pdf/1904.07911)
    - [github](https://github.com/JerryYLi/Dataset-REPAIR)
    - Resample the dataset to avoid representation bias, which may enable **shortcuts** (the representations for which the dataset is biased) that a model can exploit to solve the dataset without learning the underlying task of interest.
    - Each example carries a weight that encodes the probability ![](https://i.imgur.com/cbm5IrV.png)
    - Goal: find the weights that minimize the bias
        - This leads to a minimax optimization problem: minimize the maximum bias we may get
    - ![](https://i.imgur.com/CDyHHvP.png)
    - Running code
        - ![](https://i.imgur.com/frGVTK9.png)
        - ![](https://i.imgur.com/R8XUwG3.png)

        |           | Original | REPAIR     |
        | --------- | -------- | ---------- |
        | Grayscale | 64.30%   | **77.14%** |
- [Towards Fairness in Visual Recognition: Effective Strategies for Bias Mitigation (CVPR 2020)](https://arxiv.org/pdf/1911.11834.pdf)
    - [github](https://github.com/princetonvisualai/DomainBiasMitigation)
- [Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations (ICCV 2019)](https://openaccess.thecvf.com/content_ICCV_2019/papers/Wang_Balanced_Datasets_Are_Not_Enough_Estimating_and_Mitigating_Gender_Bias_ICCV_2019_paper.pdf)
    - [github](https://github.com/uvavision/Balanced-Datasets-Are-Not-Enough)
    - [demo](https://www.vislang.ai/genderless)
    - ![](https://i.imgur.com/50xQcvp.png =500x)
    - Mask the image regions that may cause biased predictions for deep object-recognition models.
    - Gradient reversal is also used (see the sketch below).
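The "negate the gradient" / gradient-reversal trick mentioned here (and in the adversarial-learning papers above) can be implemented as an autograd function that is the identity in the forward pass and flips the gradient sign in the backward pass. A minimal PyTorch sketch; `GradReverse`, `lambd`, and the usage names are illustrative, not taken from the papers' repositories.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity forward; multiplies the incoming gradient by -lambd on the way back."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # The encoder receives a reversed gradient, so it learns features
        # that *hurt* the protected-attribute (z) head.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Illustrative wiring: the task head sees the features as usual,
# while the protected-attribute head sees them through the reversal layer.
# task_logits = task_head(features)
# z_logits    = z_head(grad_reverse(features, lambd=1.0))
# loss = task_criterion(task_logits, y) + z_criterion(z_logits, z)
```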
    - Leakage
        - $X_{i}, Y_{i}, g_{i}$ denote the data, labels, and protected attribute
        - We want to prevent a reverse-engineering (attacker) function $f$ from guessing $f(Y_{i}) \approx g_{i}$
        - Dataset leakage
            - $\lambda_{D} = \frac{1}{|\mathcal{D}|} \sum_{(Y_{i},g_{i})\in \mathcal{D}} 1[f(Y_{i}) == g_{i}]$
            - $\lambda_{D}(a) = \frac{1}{|\mathcal{D}|} \sum_{(Y_{i},g_{i})\in \mathcal{D}} 1[f(r(Y_{i}, a)) == g_{i}]$
            - $\lambda_{D}(a)$ measures the leakage of an ideal model that achieves performance level $a$ but makes its mistakes randomly, not due to systematic bias.
        - Model leakage
            - $\lambda_{M}(a) = \frac{1}{|\mathcal{D}|} \sum_{(\hat{Y}_{i},g_{i})\in \mathcal{D}} 1[f(r(\hat{Y}_{i}, a)) == g_{i}]$
        - Bias amplification
            - For example, women are depicted as cooking twice as often as men in imSitu, but after models are trained and evaluated on similarly distributed data, they predict cooking for women three times as often as for men.
            - $\Delta = \lambda_{M}(a) - \lambda_{D}(a)$
            - A model with $\Delta$ larger than 0 leaks more information about gender than we would expect from simply accomplishing the task defined by the dataset.
        - Note that the measured leakage is only a lower bound on the overall leakage, since a perfect attacker function $f$ would be needed to capture the real leakage (a sketch of these estimates follows at the end of this section).
    - Adversarial Debiasing
        - Hypothesis
            - Models leak extra information about protected attributes because the underlying representation is overly sensitive to features related to those attributes.
        - Build the loss
            - A critic $c$ tries to predict the protected attribute from the intermediate representation $h_{i}$ that a predictor $p$ computes for a given image $X_{i}$
            - The critic minimizes the loss over the information it can extract: $\sum_{(h_{i}, g_{i}) \in \mathcal{D}}L_{c}(c(h_{i}), g_{i})$
            - The predictor minimizes its task-specific loss while increasing the critic's loss: $L_{p} = \sum_{(X_{i}, h_{i}, Y_{i}) \in \mathcal{D}}[L(p(X_{i}), Y_{i}) - \lambda L_{c}(c(h_{i}), g_{i})]$
            - For the encoder-decoder model: $L_{p} = \sum_{i}[\beta|X_{i} - \hat{X_{i}}|_{\ell_{1}} + L(p(\hat{X_{i}}), Y_{i}) - \lambda L_{c}(c(h_{i}), g_{i})]$
                - $\hat{X}_{i}$ is the input masked by $M_{i}$, a mask generated by an encoder-decoder bottleneck network from $X_{i}$
        - Experiment
            - Models
                - Classification: ResNet-50
                - Encoder-decoder that predicts the mask $M$: U-Net
- [EnD: Entangling and Disentangling deep representations for bias correction (CVPR 2021)](https://openaccess.thecvf.com/content/CVPR2021/papers/Tartaglione_EnD_Entangling_and_Disentangling_Deep_Representations_for_Bias_Correction_CVPR_2021_paper.pdf)
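To make the leakage and amplification definitions above concrete, here is a minimal sketch that estimates $\lambda_{D}$, $\lambda_{M}$, and $\Delta$ with a simple attacker. The logistic-regression attacker, the variable names, and the omission of the performance-matching perturbation $r(\cdot, a)$ are all simplifications of mine, so the resulting numbers are only a lower bound, as noted above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def leakage(label_vectors, protected, seed=0):
    """Accuracy of an attacker f trained to predict the protected attribute g
    from annotation vectors Y (dataset leakage) or model outputs Y_hat (model leakage)."""
    Y_tr, Y_te, g_tr, g_te = train_test_split(
        label_vectors, protected, test_size=0.5, random_state=seed)
    attacker = LogisticRegression(max_iter=1000).fit(Y_tr, g_tr)
    return attacker.score(Y_te, g_te)        # fraction of examples with f(Y_i) == g_i

# Y      : (n, num_labels) ground-truth annotations
# Y_hat  : (n, num_labels) model predictions on the same examples
# g      : (n,) protected attribute, e.g. gender
# lambda_D = leakage(Y, g)                   # dataset leakage
# lambda_M = leakage(Y_hat, g)               # model leakage
# delta    = lambda_M - lambda_D             # > 0 means the model amplifies the dataset bias
```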