### Question (VBnU)

Comment: Thank authors for the additional experimental results and responses. My original questions and concerns are still not resolved. The term "(un)fair" and "bias" are heavily overloaded in the paper and the response. The data imbalance is irrelevant to what predictor one would like to use, and what fairness notion on prediction one would like to apply. By "fairness attack shifts the data distribution in a more biased way", the data imbalance (which may be problematic in itself and may be not) is mixed together with the DP violation. What exactly is the goal of ALFA? Is it intended to be a general strategy of data augmentation for different fairness notions? Or, is it specifically designed to deal with group-level DP notion, which happen to be closely related to "imbalance" in prediction $\hat{Y}$ (instead of the data $Y$)?

In addition, I am still having difficulty parsing the connection between "data augmentation in latent space" and the characterization of "unfair region". While I understand the highlighted region in the figure corresponds to "unfair region", the characterization is too general to be informative. What is the "unfair region" in cases beyond binary classifications? What is the relation between rotating (in a general sense) the classification hyperplane and augmenting the data in latent space?

### Response

Thank you for taking the time to respond to our rebuttal.

We focus on the correlation between the sensitive attribute $A$ and the prediction $\hat{Y}$. For each sensitive group, we call the set of misclassified instances that drive this correlation an unfair region. The unfair regions induce high $\Delta DP$ and $\Delta EOd$ because the privileged group receives a higher chance of being predicted positive (a higher FPR) while the unprivileged group receives a lower chance (a higher FNR). ALFA's goal is to improve both $\Delta DP$ and $\Delta EOd$ by mitigating the correlation captured by the unfair regions. In detail, since the unfair regions correspond to over- or under-estimated subgroups, correcting them makes the classifier fairer with respect to both $\Delta DP$ and $\Delta EOd$. In this sense, ALFA is a general data augmentation strategy for different fairness notions, even beyond the binary scenario.

In the case of multi-class classification or multiple sensitive attributes, the correlation between $A$ and $\hat{Y}$ is hard to define directly. However, we can still define the unfair region as "any subgroup with a higher misclassification rate". ALFA's strategy is to generate correctly labeled samples inside the unfair regions. Although this augmentation does not necessarily "rotate" the decision boundaries when there are multiple labels or protected attributes, the augmented latent features drive the classifier to reduce the misclassification rate in the unfair regions. ALFA focuses on the data augmentation in the unfair regions; the decision boundary rotation observed in some cases is a consequence of that augmentation.

Moreover, we consider the unfair regions and data augmentation in the latent space because the linear separability assumption makes the unfair regions easier to define there. Linear separability is a widespread assumption, and deep neural networks transform the input data into well-separated latent features. In contrast, how a data augmentation in the input space affects the decision boundary is not transparent because of the non-linearity of deep models. Therefore, defining unfair regions and recovering them with data augmentation is more straightforward in the latent space.
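To make the binary-case description above concrete, the following is a minimal sketch of the two gaps and of the instances we treat as the unfair regions. The function and variable names (`fairness_gaps`, `unfair_region_mask`, and the arrays `y_true`, `y_pred`, `a`) are illustrative, not taken from our implementation.

```python
import numpy as np

def fairness_gaps(y_true, y_pred, a):
    """Binary DP and EOd gaps between groups a=1 (privileged) and a=0."""
    p1, p0 = (a == 1), (a == 0)
    tpr = lambda m: y_pred[m & (y_true == 1)].mean()  # P(Y_hat=1 | Y=1, group)
    fpr = lambda m: y_pred[m & (y_true == 0)].mean()  # P(Y_hat=1 | Y=0, group)
    return {
        # Demographic parity gap: difference in positive prediction rates.
        "dDP": abs(y_pred[p1].mean() - y_pred[p0].mean()),
        # Equalized odds gap: TPR difference plus FPR difference.
        "dEOd": abs(tpr(p1) - tpr(p0)) + abs(fpr(p1) - fpr(p0)),
    }

def unfair_region_mask(y_true, y_pred, a):
    """Misclassified instances that widen the gaps: false positives in the
    privileged group and false negatives in the unprivileged group."""
    fp_priv = (a == 1) & (y_true == 0) & (y_pred == 1)
    fn_unpriv = (a == 0) & (y_true == 1) & (y_pred == 0)
    return fp_priv | fn_unpriv
```

In this binary picture, the regions occupied by the instances flagged by `unfair_region_mask` are where ALFA generates its correctly labeled augmented samples.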
### Question (KRtS)

I would like to thank the authors for their rebuttal work. For most parts, my questions have been answered. However, I do have some additional comments on them which seem to be important:

**Linear separability:** The original comment should be read as separation of data wrt different sensitive attributes, which seems to be essential to the explanation that rotation of the linear boundary would help cover unfair regions. Even though data with different classes (labels) would be linearly separable [1], the same is not guaranteed for data with different sensitive attributes. Consider Figure 1: if the two sensitive attributes were inter-mixed in latent space, I think a rotation would be insufficient to fix unfairness. This issue worsens further when there are more classes of sensitive attributes.

**Label flipping:** The original question Q1 was to generate augmented samples by maximizing FPR for one of the classes. This could in theory generate samples to cover the unfair region as well. How would this direct approach do in comparison to the covariance loss used in the paper?

**Contribution of $\lambda$:** I was of the understanding that Figures 2 and 3 are plots of EOd or DP vs. Accuracy. Q3 was regarding the absolute value of $\lambda$ vs. EOd and Accuracy.

**New results on multi-class classification:** The new results seem to indicate that the proposed method leads to a significant drop in accuracy with not much gain in DP. Also, why is EOd not reported for this experiment?
**Typo:** Minor typo in the definition of Differential Fairness (DF) in the rebuttal. Please check indices i, j.

### Response

Thank you for taking the time to respond to our rebuttal.

#### Linear separability

If the demographic groups were perfectly inter-mixed in the latent space, that would in itself indicate a fair representation; naturally, we hypothesize that they are separated. In fact, prior work [1,2,3] requires linear separability of the sensitive groups because it relies on an auxiliary classifier that predicts the sensitive attribute. However, linear separability with respect to the sensitive attribute is not strictly required in our framework. Even if the demographic groups are partially inter-mixed, we can still define the unfair regions, and the decision boundary can be rotated as long as the latent feature distribution is linearly separable in terms of the labels.

[1] Zhibo Wang, Xiaowei Dong, Henry Xue, Zhifei Zhang, Weifeng Chiu, Tao Wei, and Kui Ren. Fairness-aware adversarial perturbation towards bias mitigation for deployed deep models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10379-10388, 2022.

[2] Christina Wadsworth, Francesca Vera, and Chris Piech. Achieving fairness through adversarial learning: An application to recidivism prediction. FAT/ML Workshop, 2018.

[3] Vikram V. Ramaswamy, Sunnie S. Y. Kim, and Olga Russakovsky. Fair attribute classification through latent space de-biasing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.

#### Label flipping

Finding adversarial samples by maximizing FPR or FNR looks similar to our framework, but the direction of the perturbation differs. Because the covariance-based fairness attack is designed to maximize the covariance between the sensitive attribute and the label, the perturbation direction is perpendicular to the decision boundary, which covers the unfair region effectively. Maximizing FPR or FNR, in contrast, has no such obvious tendency: for example, pushing samples toward outliers could successfully maximize FPR, yet it would not improve fairness.

#### Contribution of $\lambda$

We visualize the effect of $\lambda$ on three datasets (COMPAS, German, and Drug) with Logistic Regression, varying $\lambda = 0.1, 0.3, 0.5, 0.7, 0.9$ and using $\lambda = 0$ as the baseline, in Figure 11, Appendix M. Compared to the baseline ($\lambda = 0$), every choice of $\lambda$ improves fairness. Intuitively, relying heavily on either the original samples or the perturbed samples alone is not a good strategy; we therefore suggest $\lambda = 0.5$ as the default setting, which gives the original and augmented features equal contribution.

#### Experimental results for multi-class classification

Here we report a new experimental result with $EOd_{\text{multi}}$. Since $EOd_{\text{multi}}$ is not strictly defined in the literature, we follow the Predictive Equality (PE) and Equal Opportunity (EO) notions of [4] for each class and take the maximum over classes of the sum of the PE and EO gaps, analogously to $DP_{\text{multi}}$. We define $EOd_{\text{multi}}$ as
\begin{align}
EOd_{\text{multi}} &= \max_{k\in[K]} \Bigl( \bigl\vert P(\hat{Y}=k \mid a=1, Y=k) - P(\hat{Y}=k \mid a=0, Y=k)\bigr\vert \\
&\quad + \bigl\vert P(\hat{Y}\neq k \mid a=1, Y\neq k) - P(\hat{Y}\neq k \mid a=0, Y\neq k)\bigr\vert \Bigr).
\end{align}
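For concreteness, the following is a minimal sketch of how $EOd_{\text{multi}}$ can be computed from arrays of labels, predictions, and binary group indicators. The function and variable names are illustrative (not taken from our released code), and the sketch assumes every (group, condition) cell contains at least one sample.

```python
import numpy as np

def eod_multi(y_true, y_pred, a, num_classes):
    """Max over classes k of the TPR gap plus FPR gap between groups a=1 and a=0.
    The FPR gap equals the P(Y_hat != k | Y != k) gap in the definition above,
    since the two conditional probabilities sum to one."""
    worst = 0.0
    for k in range(num_classes):
        gap = 0.0
        # First pass conditions on Y = k (Equal Opportunity / TPR term),
        # second pass on Y != k (Predictive Equality / FPR term).
        for cond in (y_true == k, y_true != k):
            r1 = (y_pred[cond & (a == 1)] == k).mean()
            r0 = (y_pred[cond & (a == 0)] == k).mean()
            gap += abs(r1 - r0)
        worst = max(worst, gap)
    return worst
```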
Here we also document the technical changes made during the rebuttal period. In binary classification, fine-tuning learning rates of 0.01 and 0.001 did not affect the experimental results, so we used 0.01 for convenience (including for the previous results in the rebuttal). However, we found that ALFA's performance on multi-class classification is sensitive to hyperparameters, including the learning rate. Fixing $\lambda=0.5$, we tuned the remaining hyperparameters and obtained improved results with $\alpha=0.01$, $\epsilon=1$, and $lr=0.001$.

| Model (Drug-multi) | Accuracy (mean) | Accuracy (std.) | $\Delta DP_{\text{multi}}$ (mean) | $\Delta DP_{\text{multi}}$ (std.) | $\Delta EOd_{\text{multi}}$ (mean) | $\Delta EOd_{\text{multi}}$ (std.) |
| --- | --- | --- | --- | --- | --- | --- |
| MLP | 0.5207 | 0.0024 | 0.1917 | 0.0062 | 0.3010 | 0.0158 |
| MLP + ALFA (lr=0.01) | 0.4851 | 0.0231 | 0.1669 | 0.0224 | 0.3185 | 0.0551 |
| MLP + ALFA (lr=0.001) | 0.5501 | 0.0013 | 0.0374 | 0.0323 | 0.1934 | 0.0302 |

We grid-search the hyperparameters while fixing $\lambda=0.5$, with search ranges $\epsilon \in \{0.1, 0.2, 0.5, 1\}$, $\alpha \in \{0.01, 0.1, 1, 10\}$, and $lr \in \{0.01, 0.001\}$. Each fine-tuning run uses 30 epochs. We measure the training time of a single run for the baseline and the average over 10 runs for fine-tuning.

| Dataset | Baseline | Fine-tuning |
| --- | --- | --- |
| Drug (Multi) | 9.580s | 6.929s |

[4] Julien Rouzot, Julien Ferry, and Marie-José Huguet. Learning optimal fair scoring systems for multi-class classification. In 2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI), 2022.

#### Typo

Thank you for pointing out the typo. The error was in the notation of the positive-rate and negative-rate terms inside the log probabilities; we have changed the index notation to a Cartesian-product notation over pairs of groups:
$$
DF = \max_{(i,j) \in S \times S} \max\!\left( \left\vert \log \frac{P(\hat{y}=1 \mid a=i)}{P(\hat{y}=1 \mid a=j)} \right\vert,\; \left\vert \log \frac{1 - P(\hat{y}=1 \mid a=i)}{1 - P(\hat{y}=1 \mid a=j)} \right\vert \right)
$$
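As a sanity check on the corrected definition, here is a minimal sketch of the computation for a binary predictor over an arbitrary set of groups. The function name `differential_fairness` and its arguments are illustrative, and the sketch assumes every group's positive rate is strictly between 0 and 1 so the log-ratios are finite.

```python
import numpy as np

def differential_fairness(y_pred, a, groups):
    """Worst-case absolute log-ratio, over all ordered pairs of groups, of the
    positive prediction rates and of the negative prediction rates."""
    pos_rate = {g: np.mean(y_pred[a == g] == 1) for g in groups}
    df = 0.0
    for i in groups:
        for j in groups:
            if i == j:  # identical groups contribute log(1) = 0
                continue
            df = max(
                df,
                abs(np.log(pos_rate[i] / pos_rate[j])),
                abs(np.log((1 - pos_rate[i]) / (1 - pos_rate[j]))),
            )
    return df
```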