# Verifying Bias of Machine Learning Algorithms using Adaptive Sampling and Online Learning
Some of the papers summarized here study fairness metrics, while others also mitigate unwanted biases. We focus on the measurement of fairness and design a toy model to be used alongside other baselines (to be developed further over time).
## What problem are we trying to solve?
Fairness is essentially context dependent and has no unified definition; in general, it incorporates social norms to prevent discrimination. The general framework for mitigating biases in order to "guarantee" fairness is given in the following figure ([Barocas, Hardt, Narayanan, Fairness in Machine Learning](https://www.myecole.it/biblio/wp-content/uploads/2020/11/2020-Fairness-book.pdf)):

We distinguish between fairness among individuals and fairness among groups.
- **Individual fairness**: Assumes access to a task-specific similarity metric.
- General setting: Let $U$ denote the space where the individuals live. We define a function $$\delta : U \times U \to [0,1].$$ A fair hypothesis is supposed to "treat" similar individuals similarly. One way to make this precise is to require that the hypothesis $f$ (which depends on the learning task) satisfies a Lipschitz condition: $$\forall (x,x') \in U^2: |f(x)-f(x')|\leq \delta(x,x') . $$ A trivially fair solution is the constant hypothesis (but it has very poor accuracy). A small sketch checking this condition empirically is given after this list.
- **Group fairness**: For a few protected groups $\{S_i\}_{i \in I}$ from a population $U$, we are interested in making our predictor behave on each $S_i$ similarly to how it behaves on the general population $U$. There are various interpretations of "predictors that behave similarly", represented by fairness metrics:
- Statistical parity: Every prediction outcome is as likely on each $S_i$ as on $U$.
- Balance: Similar false positive and false negative rates on each $S_i$ and on $U$.
- Calibration: Prediction values are accurate on average on each $S_i$ and on $U$.
- The list is not exhaustive; moreover, satisfying a fairness metric does not by itself guarantee fairness.
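As referenced above, here is a minimal sketch of how the individual-fairness Lipschitz condition could be checked empirically on sampled pairs. The similarity metric `delta`, the hypothesis `f`, and the synthetic data are illustrative placeholders, not taken from any paper discussed here.

```python
import numpy as np

def lipschitz_violations(f, delta, X, n_pairs=1000, seed=0):
    """Estimate how often |f(x) - f(x')| > delta(x, x') on random pairs.

    f     : hypothesis mapping a feature vector to a score in [0, 1]
    delta : task-specific similarity metric on pairs of individuals
    X     : array of individuals, shape (n, d)
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    i, j = rng.integers(0, n, n_pairs), rng.integers(0, n, n_pairs)
    gaps = np.abs(np.array([f(X[k]) for k in i]) - np.array([f(X[k]) for k in j]))
    bounds = np.array([delta(X[a], X[b]) for a, b in zip(i, j)])
    return float(np.mean(gaps > bounds))  # fraction of violated pairs

# Illustrative placeholders: a logistic score and a capped Euclidean similarity.
X = np.random.default_rng(1).normal(size=(500, 5))
w = np.array([0.2, -0.1, 0.3, 0.0, 0.1])
f = lambda x: 1.0 / (1.0 + np.exp(-x @ w))                # score in (0, 1)
delta = lambda x, xp: min(1.0, np.linalg.norm(x - xp))    # similarity bound in [0, 1]
print(lipschitz_violations(f, delta, X))
```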
In this work, we focus on group fairness.
For the sake of developing an intuition about the problem, let's imagine that we are solving the following simple classification problem:
- $X$: Input space that includes race as a feature among others.
- $Y = \{0,1\}$: binary output space, where:
- $0$: the patient does not require medical treatment.
- $1$: the patient requires medical treatment.
Classification is often solved by first solving a regression problem so that the model provides a score; a natural score function is $E[Y|X]$. We then select a threshold on this risk score and base the decision on it.
More concretely, suppose we have trained our model, obtained a hypothesis, computed the expected risk, and found that it is "well calibrated" against the actual risk, as the following figure shows:

At first inspection, our predictor seems to work fine, but upon partitioning the population by the race attribute we obtain the following result:

Once we fix a threshold for our classification problem, we implicitly use two different group-specific thresholds on the true risk. In other words, to qualify for advanced treatment a Black patient needs to be considerably sicker than a white counterpart, which implicitly leads to discrimination. Hence, predicted risk and actual risk are no longer calibrated within groups, due to internal variability (Simpson's paradox). This type of bias is called aggregation bias.
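A minimal sketch of how this aggregation bias could be surfaced numerically: we compare calibration on the whole population against calibration within each group defined by a sensitive attribute. The synthetic data and the helper below are illustrative assumptions, not results from any dataset discussed here.

```python
import numpy as np
import pandas as pd

def calibration_by_group(scores, outcomes, groups, n_bins=10):
    """Mean predicted risk vs. observed rate, overall and per group.

    scores   : predicted risk in [0, 1]
    outcomes : observed binary labels
    groups   : group membership (e.g. a race attribute), same length
    """
    df = pd.DataFrame({"score": scores, "y": outcomes, "group": groups})
    df["bin"] = pd.cut(df["score"], bins=np.linspace(0, 1, n_bins + 1))
    overall = df.groupby("bin", observed=True)[["score", "y"]].mean()
    per_group = df.groupby(["group", "bin"], observed=True)[["score", "y"]].mean()
    return overall, per_group  # large score-vs-y gaps per group signal aggregation bias

# Illustrative synthetic example with two groups.
rng = np.random.default_rng(0)
g = rng.integers(0, 2, 5000)
true_risk = np.clip(rng.beta(2, 5, 5000) + 0.15 * g, 0, 1)               # group 1 is sicker on average
y = rng.binomial(1, true_risk)
score = np.clip(true_risk - 0.1 * g + rng.normal(0, 0.05, 5000), 0, 1)   # score underestimates group 1
overall, per_group = calibration_by_group(score, y, g)
print(per_group)
```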
The goal of this first version of the draft is to identify general patterns of algorithmic discrimination and to formalize fairness as a theoretically well-defined, measurable concept.
## Two different perspectives:
Specifying a fairness metric is a core difficulty in making use of (sub)group fairness. Later on, our approach will focus on learning the metric through feedback. We can think of it as a measure of distance between subgroups: instead of directly defining a measure of fairness, we define a distance between the distributions over protected and unprotected subgroups, which we learn to reduce through feedback.
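As an illustration of "a distance between the distributions on protected and unprotected subgroups", here is a minimal sketch computing two candidate distances between the predictor's score distributions on two subgroups (binned total variation and the 1-Wasserstein distance). These particular distances and the synthetic data are assumptions for illustration; learning the distance from feedback is left to the approach developed later.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def subgroup_distances(scores, groups, n_bins=20):
    """Distances between the score distributions of two subgroups (labelled 0 and 1)."""
    s0, s1 = scores[groups == 0], scores[groups == 1]
    bins = np.linspace(0, 1, n_bins + 1)
    h0, _ = np.histogram(s0, bins=bins)
    h1, _ = np.histogram(s1, bins=bins)
    p0, p1 = h0 / h0.sum(), h1 / h1.sum()
    tv = 0.5 * np.abs(p0 - p1).sum()       # binned total variation distance
    w1 = wasserstein_distance(s0, s1)      # 1-Wasserstein (earth mover's) distance
    return tv, w1

# Illustrative synthetic scores, shifted upward for group 1.
rng = np.random.default_rng(0)
groups = rng.integers(0, 2, 4000)
scores = np.clip(rng.beta(2, 4, 4000) + 0.1 * groups, 0, 1)
print(subgroup_distances(scores, groups))
```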
### Problem formulation 1:
Take into account the difference in distributions of the subgroups
### Problem formulation 2:
Consider a single distribution that we want to learn, while taking into account the variation in expected risk between the different classes.
In this first section, we focus on multicalibrated predictors and introduce the theory of calibrated predictors.
### Goal:
The goal is to assign each individual a representation while being aware of membership in the protected group $A$.
# Literature review:
## Survey on fairness and fairness metrics:
- **Classification**: Most common setup involves the following parameters:
- $X$: Input space.
- $Y$: labels.
- $\hat{Y}$: Predicted output.
- $A$: a sensitive attribute (race, gender, age, socioeconomic status).
- The goal is to learn a classifier that is accurate and fair toward the sensitive (protected) attribute $A$, which requires some invariance with respect to $A$. In the literature, and specifically for classification, defining fairness and measuring fairness are often conflated; common criteria include the following (a sketch computing some of them is given at the end of this classification part):
- **Statistical parity/ Group parity/ Demographic parity**: $\hat{Y} \perp A \Leftrightarrow \forall \hat{y} \in\{0,1\}:P[\hat{Y} = \hat{y}|A=0]=P[\hat{Y} = \hat{y}|A=1]$
- **Equalized odds**: $\hat{Y} \perp A|Y$
- **Equal opportunity**: $\hat{Y} \perp A|Y=y$ for $y \in \{0,1\}$
- **Predictive rate parity (sufficiency)/Equal weak calibration**: $Y \perp A|\hat{Y} \Leftrightarrow \forall \hat{y},y \in\{0,1\}: P[Y = y|A=0,\hat{Y} = \hat{y}]=P[Y = y|A=1,\hat{Y} = \hat{y}]$
- **Equal strong calibration**: $Y \perp A|\hat{Y}$ & $\hat{Y} = P[Y = 1]$
- **Fair subgroup accuracy**: ${1}_{[\hat{Y} = Y]} \perp A$
- **Impossibility theorem**: No two of the following fairness conditions can be achieved simultaneously (except in degenerate cases):
- **(1) Statistical parity**: $\hat{Y} \perp A$
- **(2) Equalized odds**: $\hat{Y} \perp A|Y$
- **(3) Predictive rate parity** $Y \perp A|\hat{Y}$
- Proof:
- Let's prove that properties **(1)** and **(2)** can only hold together in degenerate cases. Assume both hold. Since $\hat{Y} \perp A$: $$P[\hat{Y} = \hat{y}] = P[\hat{Y} = \hat{y}|A=a] = \sum_{y \in Y} P[\hat{Y} = \hat{y}|Y=y,A=a]\, P[Y =y|A=a],$$ and since $\hat{Y} \perp A|Y$: $$P[\hat{Y} = \hat{y}] = \sum_{y \in Y} P[\hat{Y} = \hat{y}|Y=y]\, P[Y =y|A=a].$$ On the other hand, by the law of total probability, $$P[\hat{Y} = \hat{y}] = \sum_{y \in Y} P[\hat{Y} = \hat{y}|Y=y]\, P[Y =y].$$
Hence, $$\sum_{y \in Y} P[\hat{Y} = \hat{y}|Y=y]\, P[Y =y|A=a] = \sum_{y \in Y} P[\hat{Y} = \hat{y}|Y=y]\, P[Y =y].$$ In the simple case of a binary classification problem, $Y$ takes binary values:
$$p_0 p+p_1 (1-p) = p_0p_a + p_1 (1-p_a) $$
Where:
$p_y = P[\hat{Y} = \hat{y}|Y=y]$ for $y\in \{0,1\}$,
$p = P[Y =0]$,
$p_a = P[Y =0|A=a]$.
We get: $$p_0 ( p-p_a) = p_1(p-p_a) $$
This means either $p_0 = p_1$ or $p = p_a$; in other words, $\hat{Y} \perp Y$ or $A \perp Y$, which are degenerate cases.
- The other pairs of properties are handled similarly.
- **We can deduce that there's no universal definition of fairness (in the sense that it would satisfy all other constraints given by other definitions).**
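To make the criteria listed above concrete, here is a minimal sketch computing the empirical gaps for statistical parity, equalized odds, and predictive rate parity on binary predictions. The synthetic data and the particular gap definitions (absolute differences between the two groups) are illustrative assumptions, not a standard library API.

```python
import numpy as np

def fairness_gaps(y_true, y_pred, a):
    """Empirical gaps for three group-fairness criteria on binary data.

    y_true : observed labels in {0, 1}
    y_pred : predicted labels in {0, 1}
    a      : binary sensitive attribute in {0, 1}
    """
    def rate(event, cond):
        return event[cond].mean() if cond.any() else np.nan

    # Statistical parity: P[Yhat = 1 | A = 0] vs P[Yhat = 1 | A = 1].
    sp = abs(rate(y_pred == 1, a == 0) - rate(y_pred == 1, a == 1))
    # Equalized odds: compare TPR and FPR across groups, take the worst gap.
    tpr = [rate(y_pred == 1, (a == g) & (y_true == 1)) for g in (0, 1)]
    fpr = [rate(y_pred == 1, (a == g) & (y_true == 0)) for g in (0, 1)]
    eo = max(abs(tpr[0] - tpr[1]), abs(fpr[0] - fpr[1]))
    # Predictive rate parity: P[Y = 1 | A = a, Yhat = 1] across groups.
    ppv = [rate(y_true == 1, (a == g) & (y_pred == 1)) for g in (0, 1)]
    prp = abs(ppv[0] - ppv[1])
    return {"statistical_parity": sp, "equalized_odds": eo, "predictive_parity": prp}

# Illustrative synthetic example with group-dependent base rates.
rng = np.random.default_rng(0)
a = rng.integers(0, 2, 2000)
y_true = rng.binomial(1, 0.3 + 0.2 * a)            # base rates differ by group
y_pred = rng.binomial(1, 0.25 + 0.25 * y_true)     # imperfect predictor
print(fairness_gaps(y_true, y_pred, a))
```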
- **Regression**: [Richard Berk et al. 2017](https://arxiv.org/abs/1706.02409) propose fairness measures for the regression setting in several variants. The idea is to turn fairness into an optimization problem in which we penalize the discrimination functions $f_i$ defined below, each multiplied by a hyperparameter that trades off accuracy against fairness:
- **Individual fairness**: Let $S$ be the space of samples partitioned into $S_1$ and $S_2$ (If $S$ is the set of patients for a clinical trial, then $S_1$ and $S_2$ can be the subgroups of white and black people)
$S = S_1 \uplus S_2$,
$n = \#S, n_1 = \#S_1, n_2= \#S_2$,
Let $d$ be a measure defined on $(\mathbb{R}^d)^2$,
$$f_1(\mathbf{w},S) = \frac{1}{n_1 n_2} \sum_{(\mathbf{x_i},y_i) \in S_1} \sum_{(\mathbf{x_j},y_j) \in S_2} d(y_i,y_j) (\mathbf{w}.\mathbf{x_i}-\mathbf{w}.\mathbf{x_j})^2$$
**Note**: The definition of the measure can be given in terms of distributions over subgroups, and this eventually follows **problem formulation 1**.
- **Group fairness**: For this case the penalization term of fairness is given by: $$f_2(\mathbf{w},S) = (\frac{1}{n_1 n_2} \sum_{(\mathbf{x_i},y_i) \in S_1} \sum_{(\mathbf{x_j},y_j) \in S_2} d(y_i,y_j) (\mathbf{w}.\mathbf{x_i}-\mathbf{w}.\mathbf{x_j}))^2$$
**Note**: As we can see, unlike $f_1$, $f_2$ allows errors to compensate each other in the overall sum (a sketch computing both penalties is given below).
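A minimal sketch of how these two penalties could be computed for a linear model, following the formulas above. The Gaussian choice for $d$ and the synthetic data are illustrative assumptions rather than the exact setup of Berk et al.

```python
import numpy as np

def fairness_penalties(w, X1, y1, X2, y2, d=None):
    """Individual (f1) and group (f2) fairness penalties in the style of Berk et al. (2017).

    w        : weight vector of the linear regressor
    X1, y1   : features / targets of subgroup S1
    X2, y2   : features / targets of subgroup S2
    d        : weight on target closeness across groups; an illustrative
               Gaussian kernel exp(-(y_i - y_j)^2) is used by default
    """
    if d is None:
        d = lambda yi, yj: np.exp(-(yi - yj) ** 2)
    n1, n2 = len(y1), len(y2)
    gaps = (X1 @ w)[:, None] - (X2 @ w)[None, :]      # pairwise w.x_i - w.x_j, shape (n1, n2)
    weights = d(y1[:, None], y2[None, :])              # pairwise d(y_i, y_j), shape (n1, n2)
    f1 = np.sum(weights * gaps ** 2) / (n1 * n2)       # squared gaps: no error compensation
    f2 = (np.sum(weights * gaps) / (n1 * n2)) ** 2     # signed gaps can cancel out
    return f1, f2

# Illustrative synthetic example with two subgroups.
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(40, 3)), rng.normal(0.5, 1.0, size=(60, 3))
w_true = np.array([1.0, -0.5, 0.2])
y1 = X1 @ w_true + rng.normal(0, 0.1, 40)
y2 = X2 @ w_true + rng.normal(0, 0.1, 60)
print(fairness_penalties(w_true, X1, y1, X2, y2))
```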
## Multicalibrated predictors:
## At the frontiers of our research area:
- Fair ranking
- Enforce awareness in neural rankers
- Main use $\Rightarrow$ Ranking the documents retrieved for a query while providing a balanced representation of the protected attributes.
- How? $\Rightarrow$ Infer genderedness measurements of documents and incorporate them into a hinge loss function, penalizing bias through the optimization objective (a rough sketch of such a penalized loss is given after this list).
- Other references ... (It may not be relevant to go through them all)
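As a rough illustration of penalizing bias directly in the loss, here is a hedged sketch: a pairwise hinge ranking loss augmented with a hinge penalty on the exposure-weighted "genderedness" of the induced ranking. The scoring model, the genderedness scores, and the exposure definition are all illustrative assumptions, not the formulation of any specific neural ranker cited here.

```python
import numpy as np

def fairness_aware_ranking_loss(scores, relevance, genderedness, lam=1.0, margin=0.1):
    """Pairwise hinge ranking loss plus a hinge penalty on genderedness exposure.

    scores       : model scores for the documents retrieved for one query
    relevance    : graded relevance labels for the same documents
    genderedness : per-document genderedness measurement in [-1, 1]
    """
    # Ranking term: penalize document pairs ordered against their relevance order.
    diff_rel = relevance[:, None] - relevance[None, :]
    diff_score = scores[:, None] - scores[None, :]
    rank_loss = np.maximum(0.0, margin - diff_score)[diff_rel > 0].mean()

    # Fairness term: exposure-weighted genderedness of the induced ranking,
    # pushed toward zero (balanced representation) by a hinge.
    order = np.argsort(-scores)
    exposure = 1.0 / np.log2(np.arange(len(scores)) + 2)   # DCG-style position discounts
    gender_exposure = np.sum(exposure * genderedness[order]) / exposure.sum()
    fair_loss = np.maximum(0.0, abs(gender_exposure) - margin)

    return rank_loss + lam * fair_loss

# Illustrative synthetic query with 20 retrieved documents.
rng = np.random.default_rng(0)
scores = rng.normal(size=20)
relevance = rng.integers(0, 3, 20).astype(float)
genderedness = rng.uniform(-1, 1, 20)
print(fairness_aware_ranking_loss(scores, relevance, genderedness))
```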
## Upcoming study: Toy model that guarantees fairness (in progress; content will be added this weekend)
### Motivation: Analogy with physics
### Building the theoretical framework
## Upcoming literature study (in progress; content will be added this weekend):
- Learning fairness metrics:
- Eliciting and Enforcing Subjective Individual Fairness, Jung, C., et al.
- Metric Learning for Individual Fairness, Ilvento, C.
- Probably Approximately Metric-Fair Learning, Rothblum, G. N. and Yona, G.
## Literature review (Part 2):
### [On the Fairness of Disentangled Representations](https://proceedings.neurips.cc/paper/2019/file/1b486d7a5189ebe8d8c46afc64b0d1b4-Paper.pdf)
**Dataset**: Various image datasets: [Sprites (2D Video Game Character Sprites)](https://github.com/YingzhenLi/Sprites), [smallNORB](https://cs.nyu.edu/~ylclab/data/norb-v1.0-small/), [dSprites](https://github.com/deepmind/dsprites-dataset).
**Code**: Unavailable
**Overview**: Bla bla
**Category**: Fair representation learning
**Fair approach**: Incorporate demographic parity into the fairness measure
**ML Model**: VAE - Gradient boosting classifier
**Evaluation Metrics**:
* **Accuracy**:
* **Fairness**: Total variation measure (implicitly measures demographic parity)
**Key Results**: Blabla
**Pros**:
* Blabla
* Blabla
**Cons**:
* Blabla
* Blabla
## Baselines:
## Our approach (First version):