# ICML 2024 ALFA Rebuttal (Revised)

# Reviewer 1 (Qyhc)

### W1: Improving Readability

Thank you for your feedback. In response, we will carefully revise the introduction and visual aids, and we will provide more accessible materials, such as animations explaining the concept on our GitHub page, if the paper is published.

### W2: Rotation and Translation of the Decision Boundary

Thank you for pointing this out. Our proposed method indeed not only rotates the decision boundary but also translates it. Since the last layer is trained on both the original and perturbed features, both of its parameters, the weight and the bias, are updated. The change in the weight corresponds to a rotation of the decision boundary, while the updated bias corresponds to a translation. We will emphasize that the last layer is fine-tuned, resulting in both rotation and translation.

# Reviewer 2 (ncUi)

### W1: Clarification of the Concept

Our paper does not suggest that the decision boundary itself covers an area; rather, the data augmentation covers the unfair region, leading the newly trained decision boundary to separate the latent space in a fairer manner. As the reviewer mentioned, the pre-trained decision boundary separates the latent space. Sometimes it may produce unfair predictions, such as a higher false positive rate for the privileged group and a higher false negative rate for the unprivileged group. We refer to these subgroups with higher misclassification rates as the "unfair region," defined in Line 107 of the paper:

> "This region is characterized by disproportionate misclassification rates between privileged and underprivileged groups. Figure 1(a) illustrates this concept, highlighting areas where biased predictions are most prevalent."

Moreover, the caption of Figure 1 explains:

> "The misclassification rates of subgroup {A = 1, Y = 0} and {A = 0, Y = 1} are disproportionately high, indicated as the unfair region in the left figure."

As emphasized in the paper, our proposed method is a data augmentation approach in the latent space. The augmented features have the same class labels and sensitive attributes as the samples in the unfair region and are located in the unfair region, covering that area. Ultimately, the decision boundary newly trained on the augmented features separates the latent space in a fairer manner.

### W2: Regarding Adversarial Learning References

Compared to the given references, our proposed method is still distinct and novel. While our method employs adversarial training, it specifically attacks fairness, not accuracy. Consequently, we discuss fairness attacks in our literature review from lines 130 to 140. Here, we summarize how the proposed method is novel compared to the five references from the reviewer.

- Reference [2] adopts a counterfactual data augmentation strategy by blinding the identity terms in the text. This approach involves neither adversarial training nor fairness attacks, whereas our method considers which group of samples specifically causes fairness issues.
- References [3] to [5] aim to ensure robust classification within the $\ell_p$-ball of a target instance by utilizing adversarial training, which is a min-max optimization in terms of accuracy. As a result, the prediction becomes more stable and less prone to dynamic changes, but this neither improves group fairness nor detects unfair regions. Although these studies include adversarial training, their goals and methodologies differ from ours.
- Reference [1] includes a classifier, named the adversary, that aims to predict the sensitive attribute, while an encoder tries to deceive the adversary. This is also adversarial training, but it differs from ours because [1] does not involve perturbation, data augmentation, or a fairness attack.
- Despite the difference in methodology, we recognize that Reference [1] is one of the methods that achieve group fairness. We thank the reviewer for mentioning this related work and have conducted experiments for comparison. We note that their method, named LAFTR, is only applicable to MLPs and reports experimental results solely on the Adult dataset.
- Here, we show that the performance of LAFTR [1] is not consistent across datasets, and our method, ALFA, significantly outperforms LAFTR on various datasets.

| Adult | Accuracy | $\Delta DP$ | $\Delta EOd$ |
| ------------------- | ------------- | -------------- | ------------- |
| Baseline | 0.8525±0.0010 | 0.1824±0.0114 | 0.1768±0.0411 |
| ALFA (Ours) | 0.8380±0.0045 | 0.1642±0.0261 | **0.0971±0.0098** |
| LAFTR (Madras et al.) | 0.8470±0.0020 | **0.1497±0.0191** | 0.1117±0.0443 |

| COMPAS | Accuracy | $\Delta DP$ | $\Delta EOd$ |
| ------------------- | ------------- | ------------- | ------------- |
| Baseline | 0.6711±0.0049 | 0.2059±0.0277 | 0.3699±0.0597 |
| ALFA (Ours) | 0.6701±0.0020 | **0.0207±0.0142** | **0.0793±0.0418** |
| LAFTR (Madras et al.) | 0.6397±0.0284 | 0.1164±0.0183 | 0.2089±0.0252 |

| German | Accuracy | $\Delta DP$ | $\Delta EOd$ |
| ------------------- | ------------- | ------------- | ------------- |
| Baseline | 0.7800±0.0150 | 0.0454±0.0282 | 0.2096±0.0924 |
| ALFA (Ours) | 0.7570±0.0024 | **0.0053±0.0064** | **0.0813±0.0110** |
| LAFTR (Madras et al.) | 0.7308±0.0270 | 0.0419±0.0410 | 0.1677±0.1433 |

| Drug | Accuracy | $\Delta DP$ | $\Delta EOd$ |
| ------------------- | ------------- | ------------- | ------------- |
| Baseline | 0.6674±0.0096 | 0.2760±0.0415 | 0.4718±0.0838 |
| ALFA (Ours) | 0.6382±0.0061 | **0.0820±0.0259** | **0.1068±0.0476** |
| LAFTR (Madras et al.) | 0.6195±0.0352 | 0.1848±0.1035 | 0.3235±0.1715 |

[1] Madras et al., Learning Adversarially Fair and Transferable Representations, 2018
[2] Garg et al., Counterfactual Fairness in Text Classification through Robustness, 2019
[3] Yurochkin et al., Training Individually Fair ML Models with Sensitive Subspace Robustness, 2020
[4] Ruoss et al., Learning Certified Individually Fair Representations, 2020
[5] Peychev et al., Latent Space Smoothing for Individually Fair Representations, 2022

### W3: Consistency of Experiments, Accuracy-Fairness Trade-off

Thanks for pointing out a significant insight. We observe that the accuracy-fairness trade-off does not always occur; sometimes accuracy and fairness can be improved simultaneously.

- First, the Pareto frontier in our results represents the optimal trade-off line. Below this line, it is possible for both accuracy and fairness to improve simultaneously.
- Moreover, [6] shows that there can be an ideal distribution in which accuracy and fairness are in accord, which supports our observation.

[6] Dutta, S., Wei, D., Yueksel, H., Chen, P. Y., Liu, S., & Varshney, K. (2020). Is there a trade-off between fairness and accuracy? A perspective using mismatched hypothesis testing. In International Conference on Machine Learning (pp. 2803-2813). PMLR.

### Q1: Considerations on Post-Processing for Fairness

Thanks for raising a valuable question.
Post-processing does not rotate the decision boundary, which is determined by the prediction model's weights. Instead, it adjusts the threshold or the bias of the last layer. As an example, we applied a fair post-processing method [7] to the synthetic dataset. The image below illustrates that, while post-processing can translate the decision boundaries by adjusting the threshold for each demographic group, the weights of the linear classifier remain unchanged, i.e., the boundary is not rotated. Consequently, compared with post-processing methods, our proposed adversarial augmentation method is more effective because it both rotates and translates the decision boundary (as discussed in Reviewer Qyhc - W2), and it can thus achieve a better accuracy-fairness trade-off.

![image](https://hackmd.io/_uploads/B10LBDfyC.png)

[7] Jang, T., Shi, P., & Wang, X. (2022). Group-aware threshold adaptation for fair classification. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 36, No. 6, pp. 6988-6995).

### Q2 & Q5: Interpretability of the Augmented Feature and Input Perturbation

##### Interpretability

Thanks for pointing out the issue of interpretability. In this work, interpretability can be considered from two aspects: 1) interpretability of the decision boundary (latent space) and 2) interpretability of the input features (input space). While we have focused on the first aspect, we argue that the proposed method can cover the second aspect as well, addressing the reviewer's concern. Let us break down this two-fold interpretability and how our method applies in both cases.

* First, we focus on the interpretability of decision boundaries, which is a common approach to understanding a classifier's behavior [8, 9]. By manipulating features in the latent space through the fairness attack, we can interpret the decision boundary by discovering the unfair region and adjusting the boundary. In this case, it is true that we cannot analyze how changes in the input features affect the decision boundary.
* On the other hand, interpretability of the input features would make it possible to analyze how the fairness attack perturbs the input data. However, it sacrifices interpretability of the decision boundary, such as discovering unfair regions and understanding the last layer's behavior.

##### Fairness attack in input space, and fine-tuning the entire network

Fortunately, our framework can also be applied in the input space by deploying the fairness attack and perturbation there. In this case, the entire model is fine-tuned, while offering input-level interpretability. We conducted additional experiments with an MLP to show the validity of our framework in the input space, reported in the tables below. Consequently, our method can offer interpretability in either the latent space or the input space. In both cases, we maintain the accuracy level while mitigating the fairness issue. We opt to freeze the pretrained encoder and deploy perturbations in the latent space, as this approach generally leads to greater improvements in fairness than input-space perturbation across various datasets.
| Adult | Accuracy | $\Delta DP$ | $\Delta EOd$ |
| ------------------- | ------------- | -------------- | ------------- |
| Baseline | 0.8525±0.0010 | 0.1824±0.0114 | 0.1768±0.0411 |
| Latent perturbation | 0.8380±0.0045 | 0.1642±0.0261 | **0.0971±0.0098** |
| Input perturbation | 0.8473±0.0016 | **0.1588±0.0135** | 0.1016±0.0394 |

| COMPAS | Accuracy | $\Delta DP$ | $\Delta EOd$ |
| ------------------- | ------------- | ------------- | ------------- |
| Baseline | 0.6711±0.0049 | 0.2059±0.0277 | 0.3699±0.0597 |
| Latent perturbation | 0.6701±0.0020 | **0.0207±0.0142** | **0.0793±0.0418** |
| Input perturbation | 0.6629±0.0051 | 0.0610±0.0389 | 0.1086±0.0649 |

| German | Accuracy | $\Delta DP$ | $\Delta EOd$ |
| ------------------- | ------------- | ------------- | ------------- |
| Baseline | 0.7800±0.0150 | 0.0454±0.0282 | 0.2096±0.0924 |
| Latent perturbation | 0.7570±0.0024 | **0.0053±0.0064** | **0.0813±0.0110** |
| Input perturbation | 0.7465±0.0067 | 0.0188±0.0106 | 0.1700±0.0400 |

| Drug | Accuracy | $\Delta DP$ | $\Delta EOd$ |
| ------------------- | ------------- | ------------- | ------------- |
| Baseline | 0.6674±0.0096 | 0.2760±0.0415 | 0.4718±0.0838 |
| Latent perturbation | 0.6382±0.0061 | 0.0820±0.0259 | **0.1068±0.0476** |
| Input perturbation | 0.6188±0.0146 | **0.0571±0.0365** | 0.1893±0.0809 |

[8] Guidotti, R., Monreale, A., Matwin, S., & Pedreschi, D. (2020). Black box explanation by learning image exemplars in the latent feature space. In Machine Learning and Knowledge Discovery in Databases: ECML PKDD 2019, Proceedings, Part I (pp. 189-205). Springer International Publishing.
[9] Bodria, F., Guidotti, R., Giannotti, F., & Pedreschi, D. (2022). Interpretable latent space to enable counterfactual explanations. In International Conference on Discovery Science (pp. 525-540). Cham: Springer Nature Switzerland.

### Q3: Analysis on Synthetic Data

Thank you for highlighting this detail. We provide the details of the synthetic data, illustrating the concept of the unfair region and how the decision boundary is rotated. We simplify the binary classification task with a 2D Gaussian mixture model, as assumed in [10], consisting of two classes $y \in \{0, 1\}$ and two sensitive attributes $A \in \{0, 1\}$ (indicating unprivileged and privileged groups):

\begin{align}
x \sim
\begin{cases}
\text{group 1: } \mathbf{N}\left(\begin{bmatrix} \mu \\ \mu \end{bmatrix}, \sigma^2\right) & \text{if } y=1, a=1 \\
\text{group 2: } \mathbf{N}\left(\begin{bmatrix} \mu \\ \mu^\prime \end{bmatrix}, \sigma^2\right) & \text{if } y=0, a=1 \\
\text{group 3: } \mathbf{N}\left(\begin{bmatrix} 0 \\ \mu \end{bmatrix}, (K\sigma)^2\right) & \text{if } y=1, a=0 \\
\text{group 4: } \mathbf{N}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, (K\sigma)^2\right) & \text{if } y=0, a=0
\end{cases}
\end{align}

where $\mu^\prime = r\mu$ with $0<r<1$, $K>1$, and the numbers of samples in the four groups are in the ratio $N_1 : N_2 : N_3 : N_4$. We arbitrarily set $K=3$, $r=0.7$, $\mu = 1$, $N_1 = N_2 = 100$, and $N_3 = N_4 = 400$. A minimal sampling sketch with these settings is given below.

On this synthetic data, we observe a decision boundary like the one in Figure 2(a) of the paper. Due to the dataset imbalance, the subgroup $a=1, y=0$ is overestimated as label $y=1$, and the subgroup $a=0, y=1$ is underestimated as label $y=0$. The disparity in misclassification rates is depicted in Figure 2(c). We define these disparities as "unfair regions," where the misclassification rate is disproportionately high.
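For concreteness, the following is a minimal NumPy sketch of how such a dataset could be sampled with the settings above ($K=3$, $r=0.7$, $\mu=1$, $N_1=N_2=100$, $N_3=N_4=400$). The value of $\sigma$ and all variable names are illustrative assumptions, not taken from our experimental code.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, r, K, sigma = 1.0, 0.7, 3.0, 0.3  # sigma is an illustrative choice; its value is not stated above

def sample_group(mean, std, n, y, a):
    """Draw n 2-D points from N(mean, std^2 I) with class label y and sensitive attribute a."""
    x = rng.normal(loc=mean, scale=std, size=(n, 2))
    return x, np.full(n, y), np.full(n, a)

groups = [
    sample_group([mu, mu],     sigma,     100, y=1, a=1),  # group 1
    sample_group([mu, r * mu], sigma,     100, y=0, a=1),  # group 2
    sample_group([0.0, mu],    K * sigma, 400, y=1, a=0),  # group 3
    sample_group([0.0, 0.0],   K * sigma, 400, y=0, a=0),  # group 4
]
X = np.concatenate([g[0] for g in groups])
y = np.concatenate([g[1] for g in groups])
a = np.concatenate([g[2] for g in groups])
```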
Regarding $\delta$: it is a trainable parameter in our framework rather than a hyperparameter. However, in Figure 2(c) only, we manually vary the amount of perturbation $\delta$ from 0 to 0.2 to show its impact by demonstrating how the misclassification rate of each group changes accordingly. As the fairness evaluation metric $\Delta EOd$ is defined as the sum of the True Positive Rate (TPR) gap and the False Positive Rate (FPR) gap between demographic groups, we plot the TPR gap, the FPR gap, $\Delta EOd$, and the overall misclassification rate. Figure 2(c) shows that both the TPR gap and the FPR gap decrease significantly, indicating a small EOd, with only a minor increase in the overall misclassification rate.

[10] Xu, H., Liu, X., Li, Y., Jain, A., & Tang, J. (2021). To be robust or to be fair: Towards fairness in adversarial training. In International Conference on Machine Learning (pp. 11492-11501). PMLR.

### Q4-1: Goal of Fairness Attack

The goal of the attack is clearly stated in Section 3, from lines 164 to 172:

> "The proposed method aims to automatically discover unfair regions and generate perturbed samples that directly cover these regions with over/underestimated demographic groups for each label, by attacking the fairness constraint. Training on the perturbed latent features results in a rotated decision boundary that reduces the misclassification rate of biased subgroups."

This is closely related to the choice of loss function, as discussed in Section 3.1.

### Q4-2: Why Covariance-Attack?

Indeed, any type of fairness constraint can be used in the attacking step. For example, we can adopt a convex fairness constraint [11]. We elaborate on the convex fairness constraint in Appendix J and report the experimental results in the tables below, comparing the baseline, the covariance-based fairness attack (proposed in the paper), and the convex fairness attack. The experiments show that our method can adopt either type of fairness constraint during the attacking step, with both variants improving fairness. While our framework is widely adaptable in the choice of fairness constraint for the fairness attack, we chose the covariance constraint instead of the convex one because it does not depend on the empirical outputs and admits the clear proofs given in Proposition 3.1 and Theorem 3.2. All results are reported as mean ± std.

| Adult | Accuracy | $\Delta DP$ | $\Delta EOd$ |
| --- | --- | --- | --- |
| Logistic (Baseline) | 0.8470±0.0007 | 0.1829±0.0020 | 0.1982±0.0077 |
| Logistic + ALFA (covariance) | 0.8464±0.0004 | 0.1555±0.0013 | **0.0616±0.0022** |
| Logistic + ALFA (convex) | 0.8227±0.0026 | **0.0852±0.0078** | 0.1547±0.0133 |
| MLP (Baseline) | 0.8525±0.0010 | 0.1824±0.0114 | 0.1768±0.0411 |
| MLP + ALFA (covariance) | 0.8380±0.0045 | 0.1642±0.0261 | 0.0971±0.0098 |
| MLP + ALFA (convex) | 0.8324±0.0031 | **0.1400±0.0166** | **0.0904±0.0184** |

| COMPAS | Accuracy | $\Delta DP$ | $\Delta EOd$ |
| --- | --- | --- | --- |
| Logistic (Baseline) | 0.6578±0.0034 | 0.2732±0.0129 | 0.5319±0.0245 |
| Logistic + ALFA (covariance) | 0.6682±0.0040 | **0.0210±0.0167** | **0.0931±0.0323** |
| Logistic + ALFA (convex) | 0.6740±0.0034 | 0.0470±0.0180 | 0.1444±0.0379 |
| MLP (Baseline) | 0.6711±0.0049 | 0.2059±0.0277 | 0.3699±0.0597 |
| MLP + ALFA (covariance) | 0.6701±0.0020 | 0.0207±0.0142 | 0.0793±0.0418 |
| MLP + ALFA (convex) | 0.6624±0.0010 | **0.0130±0.0075** | **0.0738±0.0150** |

| German | Accuracy | $\Delta DP$ | $\Delta EOd$ |
| --- | --- | --- | --- |
| Logistic (Baseline) | 0.7220±0.0131 | 0.1186±0.0642 | 0.3382±0.1268 |
| Logistic + ALFA (covariance) | 0.7660±0.0189 | 0.0397±0.0261 | 0.1596±0.0354 |
| Logistic + ALFA (convex) | 0.7410±0.0130 | **0.0240±0.0179** | **0.1030±0.0360** |
| MLP (Baseline) | 0.7800±0.0150 | 0.0454±0.0282 | 0.2096±0.0924 |
| MLP + ALFA (covariance) | 0.7570±0.0024 | **0.0053±0.0064** | **0.0813±0.0110** |
| MLP + ALFA (convex) | 0.7575±0.0087 | 0.0181±0.0120 | 0.1960±0.0079 |

| Drug | Accuracy | $\Delta DP$ | $\Delta EOd$ |
| --- | --- | --- | --- |
| Logistic (Baseline) | 0.6626±0.0135 | 0.2938±0.0761 | 0.5064±0.1616 |
| Logistic + ALFA (covariance) | 0.6554±0.0067 | 0.0909±0.0261 | **0.1170±0.0255** |
| Logistic + ALFA (convex) | 0.6509±0.0072 | **0.0596±0.0198** | 0.1284±0.0286 |
| MLP (Baseline) | 0.6674±0.0096 | 0.2760±0.0415 | 0.4718±0.0838 |
| MLP + ALFA (covariance) | 0.6382±0.0104 | **0.0820±0.0259** | **0.1068±0.0476** |
| MLP + ALFA (convex) | 0.6329±0.0173 | 0.1002±0.0826 | 0.1955±0.0956 |

[11] Wu, Y., Zhang, L., & Wu, X. (2019). On convexity and bounds of fairness-aware classification. In The World Wide Web Conference.

### Q6-1: Impact of Hyperparameters

Eq.(5) and Eq.(6) reflect our intention to retain accuracy while ensuring fairness: the Sinkhorn distance maintains the semantic meaning of the perturbed samples, and training on the perturbed and original samples together maintains the accuracy level. We vary the $\alpha$ value, the weight of the Sinkhorn distance, to construct the Pareto frontier, as stated in line 356. Beyond the Pareto frontier itself, we agree that the impact of $\alpha$ and of weighting the two terms equally in Eq.(6) is insufficiently discussed in our paper. Here, we report a detailed analysis of how each component of Eq.(5) and Eq.(6) affects the fairness-accuracy trade-off by showing 1) the result without the Sinkhorn distance ($\alpha=0$) in Eq.(5), and 2) the result without the original features in Eq.(6).

1) As shown in the table below, maximizing $L_{fair}$ alone during the fairness attack (i.e., $\alpha=0$) also improves fairness, but it compromises accuracy. As intended in Section 3.2, the Sinkhorn distance maintains the semantic meaning of the perturbed samples and thus retains accuracy.

| Drug | Accuracy | $\Delta DP$ | $\Delta EOd$ |
| --- | --- | --- | --- |
| Logistic (Baseline) | 0.6626±0.0135 | 0.2938±0.0761 | 0.5064±0.1616 |
| Logistic + ALFA ($\alpha=0$) | 0.6395±0.0067 | 0.0325±0.0244 | 0.1638±0.0593 |
| Logistic + ALFA ($\alpha=10$) | 0.6554±0.0067 | 0.0909±0.0261 | 0.1170±0.0255 |
| MLP (Baseline) | 0.6674±0.0096 | 0.2760±0.0415 | 0.4718±0.0838 |
| MLP + ALFA ($\alpha=0$) | 0.6276±0.0092 | 0.0393±0.0407 | 0.0691±0.0518 |
| MLP + ALFA ($\alpha=10$) | 0.6382±0.0104 | 0.0820±0.0259 | 0.1068±0.0476 |

2) Similar to the role of the Sinkhorn distance, we expect that retraining the classifier solely on the perturbed features may hurt accuracy. In line with this intuition, training exclusively on perturbed features results in slightly lower accuracy, although it still achieves fairness effectively.

| Drug | Accuracy | $\Delta DP$ | $\Delta EOd$ |
| --- | --- | --- | --- |
| Logistic (Baseline) | 0.6626±0.0135 | 0.2938±0.0761 | 0.5064±0.1616 |
| Logistic + ALFA (only perturbed) | 0.6515±0.0070 | 0.0829±0.0249 | 0.1237±0.0275 |
| Logistic + ALFA (original+perturbed) | 0.6554±0.0067 | 0.0909±0.0261 | 0.1170±0.0255 |
| MLP (Baseline) | 0.6674±0.0096 | 0.2760±0.0415 | 0.4718±0.0838 |
| MLP + ALFA (only perturbed) | 0.6340±0.0050 | 0.0533±0.0313 | 0.0762±0.0530 |
| MLP + ALFA (original+perturbed) | 0.6382±0.0104 | 0.0820±0.0259 | 0.1068±0.0476 |

Consequently, our proposed method is effectively designed to retain accuracy while ensuring fairness.

### Q6-2: Minimizing Fairness Constraint

Minimizing the fairness constraint [12] together with our framework would make it difficult to verify the effectiveness of the proposed data augmentation; therefore, we do not minimize $L_{fair}$ during training. Nevertheless, we report additional experiments that minimize $L_{fair}$ on top of our framework. As shown in the table below, combining the two methods does not further improve fairness.

| COMPAS | Accuracy | $\Delta DP$ | $\Delta EOd$ |
| --- | --- | --- | --- |
| Logistic (Baseline) | 0.6578±0.0034 | 0.2732±0.0129 | 0.5319±0.0245 |
| Logistic + ALFA | 0.6682±0.0040 | 0.0210±0.0167 | 0.0931±0.0323 |
| Logistic + ALFA + minimizing $L_{fair}$ | 0.6701±0.0037 | 0.0481±0.00431 | 0.1291±0.0732 |
| MLP (Baseline) | 0.6711±0.0049 | 0.2059±0.0277 | 0.3699±0.0597 |
| MLP + ALFA (covariance) | 0.6701±0.0020 | 0.0207±0.0142 | 0.0793±0.0418 |
| MLP + ALFA + minimizing $L_{fair}$ | 0.6632±0.0513 | 0.1242±0.0087 | 0.0422±0.0474 |

Furthermore, the baseline that uses solely the fairness constraint is depicted on the Pareto frontier as "covariance loss" (brown), and it shows a less significant improvement in fairness than our method.

[12] Zafar, M. B., et al. (2017). Fairness constraints: Mechanisms for fair classification. In Artificial Intelligence and Statistics. PMLR.

# Reviewer 3 (dbYU)

### W1: The Analysis and Justification of Eq.(5) and (6)

Thanks for pointing out a core principle of our paper. Here we provide evidence for the validity of each formulation.

First, the validity of Eq.(5) is justified by Theorem 3.2. Eq.(5) is designed to maximize the fairness constraint by intentionally generating biased features that cover the unfair region. Theorem 3.2 shows that the perturbations that maximize the fairness constraint also increase the gaps in Demographic Parity (DP) and Equalized Odds (EOd). To enlarge the EOd gap, the perturbed features must move in a specific direction, resulting in a rotated decision boundary when the classifier is trained on the perturbed features.

Second, Eq.(6) represents empirical risk minimization (ERM) applied to both the original and augmented features. While this may be a heuristic approach, it is widely used in the data augmentation literature, such as [1] and [2], for enhancing model performance. A minimal sketch of this two-step procedure is given below.

[1] Hsu, C. Y., Chen, P. Y., Lu, S., Liu, S., & Yu, C. M. (2022). Adversarial examples can be effective data augmentation for unsupervised machine learning. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 36, No. 6, pp. 6926-6934).
[2] Zhao, L., Liu, T., Peng, X., & Metaxas, D. (2020). Maximum-entropy adversarial data augmentation for improved generalization and robustness. Advances in Neural Information Processing Systems, 33, 14435-14447.
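For illustration, here is a minimal PyTorch-style sketch of this two-step procedure, under several assumptions: a last linear layer acting on frozen latent features `z`, a covariance-style fairness constraint in the spirit of Cov(a, score), and the Sinkhorn divergence from the `geomloss` package. All names, hyperparameters, and the exact forms of the losses are illustrative simplifications rather than our exact implementation of Eq.(5) and Eq.(6).

```python
import torch
from geomloss import SamplesLoss  # Sinkhorn divergence; any OT implementation would do

sinkhorn = SamplesLoss(loss="sinkhorn", p=2, blur=0.05)

def covariance_fairness(scores, a):
    """Covariance-style fairness constraint: Cov(a, decision score)."""
    return ((a - a.mean()) * (scores - scores.mean())).mean()

def fairness_attack(z, a, last_layer, alpha=10.0, lr=0.01, epochs=10):
    """Step 1 (Eq.(5)-style): learn a perturbation delta on frozen latent features z
    by increasing the fairness constraint while staying close to the original features."""
    delta = torch.zeros_like(z, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        scores = last_layer(z + delta).squeeze(-1)
        loss = -covariance_fairness(scores, a).abs() + alpha * sinkhorn(z + delta, z)
        loss.backward()
        opt.step()
    return (z + delta).detach()

def retrain_last_layer(z, z_pert, y, last_layer, lr=0.01, epochs=50):
    """Step 2 (Eq.(6)-style): ERM on the original and perturbed features together."""
    opt = torch.optim.Adam(last_layer.parameters(), lr=lr)
    bce = torch.nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = bce(last_layer(z).squeeze(-1), y) + bce(last_layer(z_pert).squeeze(-1), y)
        loss.backward()
        opt.step()
    return last_layer
```

This matches the latent-space setting described above in that the encoder stays frozen and only the last layer is retrained in the second step.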
### W2: Contribution of each factor to find $\delta$

Thank you for raising this question about the clarity of our framework. To derive the perturbation $\delta$ in Eq.(5) for the fairness attack, we aim to maximize $L_{fair}$ while minimizing the Sinkhorn distance. Thus, the impact of the Sinkhorn distance should be analyzed; we dissect its contribution to obtaining the perturbation $\delta$ in Section W2-1. Furthermore, in response to the reviewer's inquiry, we elaborate on the distinction between our adversarial augmentation with $L_{fair}$ and the direct minimization of $L_{fair}$ in Section W2-2.

#### W2-1: Contribution of Sinkhorn distance

Indeed, $L_{fair}$ alone during the fairness attack is sufficient to make the decision boundary fairer. However, to prevent the new boundary from hurting accuracy, we utilize the Sinkhorn distance to maintain the semantic meaning of the perturbed samples. Here, we present a comparison with and without the Sinkhorn distance, reporting the improvements in accuracy and fairness rather than the $\delta$ values themselves. Without the Sinkhorn distance, $L_{fair}$ alone still improves fairness, but it compromises accuracy. Therefore, the Sinkhorn distance is not a component that enhances fairness, but it is crucial for retaining accuracy.

| Drug | Accuracy | $\Delta DP$ | $\Delta EOd$ |
| --- | --- | --- | --- |
| Logistic (Baseline) | 0.6626±0.0135 | 0.2938±0.0761 | 0.5064±0.1616 |
| Logistic + ALFA (w/o Sinkhorn) | 0.6395±0.0067 | 0.0325±0.0244 | 0.1638±0.0593 |
| Logistic + ALFA (w/ Sinkhorn) | 0.6554±0.0067 | 0.0909±0.0261 | 0.1170±0.0255 |
| MLP (Baseline) | 0.6674±0.0096 | 0.2760±0.0415 | 0.4718±0.0838 |
| MLP + ALFA (w/o Sinkhorn) | 0.6276±0.0092 | 0.0393±0.0407 | 0.0691±0.0518 |
| MLP + ALFA (w/ Sinkhorn) | 0.6382±0.0104 | 0.0820±0.0259 | 0.1068±0.0476 |

#### W2-2: Minimizing Fairness Constraint

Minimizing the fairness constraint within our framework would make it challenging to verify the effectiveness of our proposed method. Therefore, we demonstrate our method separately; a baseline that uses solely the fairness constraint [3] is depicted on the Pareto frontier as "covariance loss" (brown). Minimizing only the fairness constraint shows a less significant improvement in fairness than our method.

[3] Zafar, M. B., et al. (2017). Fairness constraints: Mechanisms for fair classification. In Artificial Intelligence and Statistics. PMLR.
### Q1: Iterative Adversarial Training

Thank you for highlighting this. As the reviewer mentioned, Eq.(5) is optimized iteratively. This is mentioned indirectly in line 328 of Section 4.2, where we discuss the learning rate of the adversarial stage, but we did not specify the number of epochs. In practice, it is iteratively optimized for 10 epochs. We will revise the experimental setup section to clarify this.

### Q2: Details about Synthetic Dataset

Thank you for highlighting this detail. We provide the details of the synthetic data, illustrating the concept of the unfair region and how the decision boundary is rotated. We simplify the binary classification task with a 2D Gaussian mixture model, as assumed in [4], consisting of two classes $y \in \{0, 1\}$ and two sensitive attributes $A \in \{0, 1\}$ (indicating unprivileged and privileged groups):

\begin{align}
x \sim
\begin{cases}
\text{group 1: } \mathbf{N}\left(\begin{bmatrix} \mu \\ \mu \end{bmatrix}, \sigma^2\right) & \text{if } y=1, a=1 \\
\text{group 2: } \mathbf{N}\left(\begin{bmatrix} \mu \\ \mu^\prime \end{bmatrix}, \sigma^2\right) & \text{if } y=0, a=1 \\
\text{group 3: } \mathbf{N}\left(\begin{bmatrix} 0 \\ \mu \end{bmatrix}, (K\sigma)^2\right) & \text{if } y=1, a=0 \\
\text{group 4: } \mathbf{N}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, (K\sigma)^2\right) & \text{if } y=0, a=0
\end{cases}
\end{align}

where $\mu^\prime = r\mu$ with $0<r<1$, $K>1$, and the numbers of samples in the four groups are in the ratio $N_1 : N_2 : N_3 : N_4$. We arbitrarily set $K=3$, $r=0.7$, $\mu = 1$, $N_1 = N_2 = 100$, and $N_3 = N_4 = 400$. We will revise the appendix to provide these details of the synthetic data.

[4] Xu, H., Liu, X., Li, Y., Jain, A., & Tang, J. (2021). To be robust or to be fair: Towards fairness in adversarial training. In International Conference on Machine Learning (pp. 11492-11501). PMLR.

### Q3: Rationalizing the Piecewise Linear Approximation

In our implementation, the inverse sigmoid function is approximated by a piecewise linear function, eliminating any issues related to infinity. It is true that the logit function can produce infinite values; indeed, this is why we employ the piecewise linear approximation. Without it, highly confident samples (e.g., $p(y) \approx 1$) could disproportionately influence the overall loss value during the adversarial attack, potentially leading to suboptimal optimization. By approximating the logit function as piecewise linear, we achieve more stable optimization of the fairness attack. This is done by adjusting the sigmoid output range from $(0, 1)$ to $(\beta, 1-\beta)$, where $\beta$ is a very small value ($10^{-7}$). As defined in line 238, this adjustment results in a logit value range of $(-16.1181, 16.1181)$, preventing extreme values from dominating the loss calculation. A small numeric sketch of this range adjustment is given below.

![image](https://hackmd.io/_uploads/H1ygDonRT.png)
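As a quick numeric illustration of the range adjustment described above (a sketch, not our exact implementation): restricting the probability to $(\beta, 1-\beta)$ with $\beta = 10^{-7}$ bounds the logit at $\pm\log\frac{1-\beta}{\beta} \approx \pm 16.1181$.

```python
import numpy as np

BETA = 1e-7  # small constant from the discussion above; bounds the logit

def clipped_logit(p, beta=BETA):
    """Inverse sigmoid with the probability restricted to (beta, 1 - beta)."""
    p = np.clip(p, beta, 1.0 - beta)
    return np.log(p / (1.0 - p))

print(clipped_logit(np.array([0.0, 0.5, 1.0])))
# approximately [-16.1181, 0.0, 16.1181]; extreme probabilities no longer produce +/- infinity
```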
### Limitation: Adaptability to Large-Scale Datasets

Our framework can be applied to large-scale datasets. To illustrate this, we employed the Wikipedia toxicity classification dataset, an NLP dataset consisting of over 100,000 comments from English Wikipedia, as introduced in Appendix I.4. We highlight the results below:

![image](https://hackmd.io/_uploads/HkfuAohC6.png)

# Reviewer 4 (CNDR)

### W1: Improving Readability of Comparison Methods

Thank you for highlighting the readability issue regarding the comparison methods. While these methods are detailed in the appendix, we will revise the paper to introduce them briefly earlier, incorporating this information into Section 2 for better clarity.

### W2: Enhancing Clarity in Result Analysis

We will revise the text in Sections 4.4.1 and 4.4.2 to better demonstrate the effectiveness of our method and provide more details about the comparison methods. Additionally, we will update the captions of Figures 3, 4, and 5 to highlight the consistent improvements achieved by our approach.

### Q1: Addressing the Trade-off in the Adult Dataset with MLP

An extension of our framework can address the impact of correctly predicted samples in unfair regions by employing the same strategy in the input space. Here, we elaborate on the issue and the solution.

- In Figures 3 and 4, our proposed method (ALFA) fails to achieve the best results only on the Adult dataset with the MLP. We suspect that the MLP encoder may already extract a mixed representation of misclassified privileged and correctly classified unprivileged samples, which makes it challenging to define the unfair region. In this case, relying solely on latent perturbation, an accuracy-fairness trade-off is likely to occur, since our method cannot enhance the encoder's ability to distinguish between these two sets of samples.
- However, the adaptability of our method, through a fairness attack and perturbation in the input space, offers an alternative way to mitigate this trade-off. In this case, the perturbation is applied to the input features, and the entire network is fine-tuned on the perturbed data. Specifically, on the Adult dataset with the MLP, input perturbation exhibits a better trade-off than latent perturbation, as shown below. Therefore, this modification could resolve the issue.

| Adult | Accuracy | $\Delta DP$ | $\Delta EOd$ |
| ------------------- | ------------- | -------------- | ------------- |
| Baseline | 0.8525±0.0010 | 0.1824±0.0114 | 0.1768±0.0411 |
| Latent perturbation | 0.8380±0.0045 | 0.1642±0.0261 | **0.0971±0.0098** |
| Input perturbation | **0.8473±0.0016** | **0.1588±0.0135** | 0.1016±0.0394 |

### Limitation: Mitigating Data Imbalance in Multi-Class Classification

In response to the reviewer's comment, we recognize that the number of classes in multi-class classification may lead to data imbalance in the one-to-all strategy discussed in Appendix A. To address this concern, we employ an upsampling strategy to equalize the number of samples in each subgroup, as outlined in line 205 (a minimal sketch of this balancing step is shown below). This approach mitigates the data imbalance issue by ensuring that each class is equally represented in the dataset. In doing so, we enhance the fairness of our model and improve its performance in multi-class classification scenarios. We believe this strategy provides a practical solution to the data imbalance problem in the one-to-all strategy.
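For illustration, a minimal pandas sketch of such subgroup balancing is shown here; the column names `y` and `a` are hypothetical, and this is not our exact preprocessing code.

```python
import pandas as pd

def upsample_subgroups(df, label_col="y", attr_col="a", seed=0):
    """Resample each (label, sensitive-attribute) subgroup with replacement
    so that every subgroup reaches the size of the largest one."""
    groups = df.groupby([label_col, attr_col])
    target = groups.size().max()
    balanced = [g.sample(n=target, replace=True, random_state=seed) for _, g in groups]
    return pd.concat(balanced).reset_index(drop=True)
```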
