## Reviewer 1 (zsXF)
We sincerely appreciate the time and effort you have dedicated to reviewing this paper and providing invaluable feedback to enhance its quality. We provide a point-by-point reply to the mentioned concerns and questions below.
---
> **Reviewer**: The technical contribution of this paper is a bit weak in that they mostly followed [1]. The main contributions merely lie in reframing the problems into the robustness on fairness.
**Authors**: We would like to clarify that the technical novelty of our work differs significantly from the contribution of [1]. Specifically, the contribution of [1] mainly lies in utilizing Gaussian noise to certify a specific predicted label over continuous input. However, such an approach cannot be directly used to tackle the problem studied in our paper, for three reasons: (1) the certification achieved by [1] **is not able to handle discrete input**; (2) the certification achieved by [1] **fails to achieve joint certification over the span of multiple input data modalities**; and (3) the certification achieved by [1] focuses on a specific data point **instead of a property of a group of data points**.
Therefore, we argue that beyond formulating a novel research problem, the novelty of our work also lies in the following: (1) we propose a strategy to properly handle the smoothing over graph topology; (2) we take the first step toward joint certification over the span of continuous and discrete data modalities (i.e., node attributes and graph topology), which is **particularly essential** for certifying Graph Neural Networks due to their naturally multi-modal input; and (3) we establish a framework, ELEGANT, that certifies a property of a group of data points with the techniques above.
In fact, the only connection between our work and [1] is that the same type of noise is adopted to smooth the input data in the continuous domain (i.e., node attributes). Gaussian noise is widely used in randomized-smoothing research, and thus adopting it to smooth continuous input data should not be considered to undermine the novelty of our work.
[1] Cohen, J., Rosenfeld, E., & Kolter, Z. (2019, May). Certified adversarial robustness via randomized smoothing. In international conference on machine learning (pp. 1310-1320). PMLR.
---
> **Reviewer**: There is a major concern about how the proposed method can improve the fairness of graph neural networks. In particular, according to the theorem 1, the smoothed version of the classifier can only guarantee the discrimination wouldn't change a lot after any perturbations. However, if the given GNN is biased, how can this method improve the fairness? Can you clarify how the proposed method can improve the fairness of models?
**Authors**: We note that improving fairness is a byproduct of ELEGANT, and our main focus is to achieve certification over the fairness level of the prediction results. However, as suggested, we provide a detailed discussion about why the fairness performance can be significantly improved in ELEGANT below.
As mentioned in Section 3.4, the proposed strategy to obtain the output predictions in ELEGANT is to select the fairest result among the output set $\hat{\mathcal{Y}}'$, where each output is derived based on a sample $\boldsymbol{\Gamma}_{\boldsymbol{A}}' \in \bar{\mathcal{A}}'$ (i.e., $\mathrm{argmin}_{\hat{\boldsymbol{Y}}'} \pi(\hat{\boldsymbol{Y}}', \mathcal{V}_{\text{tst}}) \;\; s.t. \; \hat{\boldsymbol{Y}}' \in \hat{\mathcal{Y}}'$). Such a strategy provides a large enough probability to achieve certification in light of Proposition 1. At the same time, we point out that such a strategy also helps to significantly improve fairness since outputs with a high level of bias are excluded.
<u>As suggested, we will also add the detailed discussion above in our next version.</u>
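For illustration, the selection strategy above can be sketched as follows. This is a minimal sketch with hypothetical names (`bias_level` instantiates $\pi(\cdot)$ as the demographic-parity gap; `select_fairest` implements the argmin over the output set), not the exact implementation in ELEGANT:

```python
import numpy as np

def bias_level(pred_labels, sens):
    """One possible instantiation of the bias function pi:
    the demographic-parity gap |P(y_hat=1 | s=0) - P(y_hat=1 | s=1)|."""
    rates = [pred_labels[sens == g].mean() for g in (0, 1)]
    return abs(rates[0] - rates[1])

def select_fairest(candidate_outputs, sens):
    """Return the candidate prediction with the lowest bias level,
    i.e., the argmin over the output set (hypothetical names)."""
    return min(candidate_outputs, key=lambda y: bias_level(y, sens))
```

Given two candidate outputs, one with a large demographic-parity gap and one with a small gap, `select_fairest` returns the latter, which is why highly biased outputs are excluded from the final prediction.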
---
> **Reviewer**: More analysis on the certified robustness in fairness should be given.
**Authors**: We have provided a comprehensive analysis of the certified robustness on fairness in our paper. To help navigate the main analysis, we provide a brief summary of each lemma, theorem, and proposition below.
*Theorem 1. (Certified Fairness Defense for Node Attributes)*: This theorem gives a way to compute the perturbation-invariant budget (i.e., the budget within which the fairness level is guaranteed not to drop below a certain threshold) for node attributes. However, since both input data modalities could be attacked, we still need to extend the analysis over the span of node attributes and graph topology (see Theorem 4).
*Theorem 2. (Certified Defense Budget for Structure Perturbations)*: This theorem formulates an optimization problem whose solution is the perturbation-invariant budget for the graph topology under the smoothed node attributes. However, to solve this optimization problem, we need to explicitly compute $\underline{P_{\tilde{g}_{\boldsymbol{X}}=1}}$ (see Theorem 3).
*Theorem 3. (Positive Probability Lower Bound)*: This theorem provides a way to explicitly compute $\underline{P_{\tilde{g}_{\boldsymbol{X}}=1}}$, which directly enables us to solve the optimization problem in Theorem 2.
<u>*Theorem 4. (Certified Defense Budget for Attribute Perturbations)*:</u> This theorem is built upon Theorem 1 and provides a way to explicitly compute the perturbation-invariant budget of node attributes over the span of node attributes and graph topology.
*Lemma 1. (Perturbation-Invariant Budgets Existence)*: This lemma establishes the existence and tractability of the perturbation-invariant budgets on both data modalities, which are further detailed in Theorems 2 and 4.
*Lemma 2. (Positive Probability Bound Under Noises)*: This lemma establishes the existence and tractability of $\underline{P_{\tilde{g}_{\boldsymbol{X}}=1}}$, which is further detailed in Theorem 3.
*Proposition 1. (Probabilistic Guarantee for the Fairness Level of Node Classification)*: This proposition provides a probabilistic theoretical guarantee: we have a sufficiently large probability of successfully achieving certified defense on fairness (i.e., obtaining results with a bias level lower than the given threshold).
We hope the summary above helps address your concern. We are also eager to learn about any analysis that could be further clarified or added.
## Reviewer 2 (dksM)
We sincerely appreciate the time and effort you have dedicated to reviewing this paper and providing invaluable feedback to enhance its quality. We provide a point-by-point reply to the mentioned concerns and questions below.
---
> **Reviewer**: 1) Why the authors choose to certify a classifier on top of an optimized GNN is unclear, given the existing works on robustness certification of GNNs on regular attacks mainly choose to directly target GNNs.
**Authors**: We present below the reason to *certify a classifier on top of an optimized GNN* instead of *directly certifying a GNN model against regular attacks*.
We note that the rationale of certified defense is to provably maintain the classification results against attacks. In this context, most existing works on certifying a deep learning model focus on certifying a specific predicted label for a given data point. Since the prediction results being certified are themselves classification results, these works can directly certify the model itself.
However, this strategy is not feasible for our studied problem, because we seek to certify the fairness level of a group of nodes, and the value of such a group-level property cannot be directly treated as a classification result. Therefore, we propose to first formulate a classifier on top of an optimized GNN, which makes achieving certification feasible. In fact, this also serves as one of the novel aspects of our work (see the first reply to reviewer 1, zsXF).
---
> **Reviewer**: 2) The difference between the attacking performance of GNNs and the fairness of GNNs should be clarified, especially how this difference affects the assumptions and theoretical results. It seems like the certification approach could also be applied without considering the binary sensitive attribute.
**Authors**: As suggested, we provide a detailed discussion about *(1) the difference between the attacking performance of GNNs and the fairness of GNNs* and *(2) applying the certification approach without considering the binary sensitive attribute* below.
**(1) What is the difference between the attacking performance of GNNs and the fairness of GNNs?**
In traditional attacks on the performance of GNNs, the attacker's objective is simply to induce false predictions on as many nodes as possible, so that the overall performance is jeopardized. In attacks on the fairness of GNNs, however, whether the attacker's goal is achieved is jointly determined by the GNN predictions over all nodes. Such cross-node dependency in achieving the attacking goal makes defending against fairness attacks more difficult, since the defense cannot be performed directly at the node level but must instead be performed at the model level. Correspondingly, this necessitates (1) constructing an additional classifier as discussed in the previous reply, and (2) additional theoretical analysis over the constructed classifier as in Theorems 1, 2, and 3 to achieve certification.
**(2) Can we apply the certification approach without considering the binary sensitive attribute?**
Yes. In this work, we adopt the most widely studied setting, in which the sensitive attributes are assumed to be binary. However, our certification approach is not tailored to binary sensitive attributes. Therefore, our approach can be easily extended to scenarios where the sensitive attributes are multi-class or continuous by adopting the corresponding fairness metric as the function $\pi(\cdot)$ in Definition 1.
<u>In addition, we provide a more detailed discussion in our Appendix.</u>
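As a minimal sketch of this extension (hypothetical names, not from the paper), one possible instantiation of $\pi(\cdot)$ for a multi-class sensitive attribute is the largest pairwise gap in positive prediction rates across groups, which reduces to the usual demographic-parity gap when the attribute is binary:

```python
import numpy as np

def pi_multiclass(pred_labels, sens):
    """A hypothetical generalization of the demographic-parity gap to
    multi-class sensitive attributes: the largest pairwise gap in positive
    prediction rates across all sensitive groups."""
    groups = np.unique(sens)
    rates = [pred_labels[sens == g].mean() for g in groups]
    return max(rates) - min(rates)
```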
---
> **Reviewer**: 3) How the main theoretical findings differ from existing works on robustness certification of GNNs on regular attacks could be explicitly discussed for ease of understanding.
**Authors**: Most existing works on robustness certification (e.g., [1, 2, 3]) can only defend against attacks on either node attributes or graph structure. Due to the multi-modal input data of GNNs, existing works usually fail to handle attacks over node attributes and graph structure at the same time. In contrast, ELEGANT is able to defend against attacks over both data modalities. This necessitates using both continuous and discrete noise for smoothing, together with an analysis for joint certification over the span of the two input data modalities (as shown in Figure 1).
[1] Wang, B., Jia, J., Cao, X., & Gong, N. Z. (2021, August). Certified robustness of graph neural networks against adversarial structural perturbation. In SIGKDD.
[2] Scholten, Y., Schuchardt, J., Geisler, S., Bojchevski, A., & Günnemann, S. (2022). Randomized message-interception smoothing: Gray-box certificates for graph neural networks. NeurIPS.
[3] Geisler, S., Zügner, D., & Günnemann, S. (2020). Reliable graph neural networks via robust aggregation. NeurIPS.
---
> **Reviewer**: Some recent works tackling graph robustness are not covered in the related works, especially spectral-based methods such as [1-4].
**Authors**: We note that the attacking scenario studied in this paper is the **model evasion attack**. Therefore, we did not include [1] and [3], **since they study a different problem of data poisoning attacks**. In addition, the approach proposed in this paper **does not have any overlap with the spectral analysis of GNNs**, and thus it can be generalized to both spatial and spectral GNNs. Therefore, we did not include [2] and [4], since they solely focus on the spectral analysis of spectral GNNs.
<u>However, as suggested, we also added additional discussion over the listed references [1-4] in our appendix.</u>
---
> **Reviewer**: For experiments, do there exist other defense methods that also defend the fairness-aware attack on graphs that could be included as baselines?
**Authors**: We note that the study of fairness attacks and defenses over GNNs remains at an early stage. To the best of our knowledge, this paper serves as the first work studying the problem of fairness defense for GNNs. Therefore, **there is no other existing method** that defends against fairness-aware attacks on graphs. Taking such an initial step also makes this research problem particularly interesting to us.
---
> **Reviewer**: Minor: in Theorem 4, should “Certified Defense Budget for Attribute Perturbations” be “Certified Defense Budget for Structure Perturbations”?
**Authors**: Theorem 4 presents the computation of the perturbation budget over the node attributes under binary noise on the graph structure. To avoid potential confusion, <u>we have changed its name to “Certified Defense Budget over Node Attributes”.</u> We thank the reviewer again for pointing this out.
## Reviewer 3 (NACu)
We sincerely appreciate the time and effort you have dedicated to reviewing this paper and providing invaluable feedback to enhance its quality. We provide a point-by-point reply to the mentioned concerns and questions below.
---
> **Reviewer**: I could not follow how the proposed scheme can be used to certify fairness metrics requiring the ground-truth label information (e.g., equal opportunity) over the test set. The paper claims that equal opportunity is a metric for which they can provide verification.
**Authors**: We thank the reviewer for pointing this out. As suggested, we provide a detailed clarification below on how ELEGANT certifies fairness metrics that require ground-truth label information (e.g., equal opportunity) over the test set.
---
> **Reviewer**: There are already existing works searching over such augmentations for bias mitigation [1, 2, 3], which utilize theoretical findings and systematic designs to automatize augmentation designs. Thus, the bias mitigation part of this work is not of sufficient contribution (Figure 2 would be fairer, if fairness-aware methods' results are obtained over the same set of corrupted graphs used for ELEGANT).
**Authors**: We note that improving fairness is a byproduct of ELEGANT; our main focus is to achieve certification over the fairness level of the GNN prediction results.
Beyond bias mitigation, this paper establishes an approach for joint certification over the span of both continuous and discrete input data. To the best of our knowledge, this serves as the first work that (1) achieves certified defense for the fairness of GNNs and (2) achieves joint certification against attacks on both node attributes and graph topology.
---
> **Reviewer**: While the initial research problem is an interesting and important one, to the best of my understanding, this paper applies well-known corruptions to the input graphs multiple times and utilizes Monte Carlo to estimate if the predictions achieve a certain level of fairness. The proposed approach is computationally costly, and lacks an inventive approach.
**Authors**: We provide the clarification of the (1) time complexity analysis and (2) the novelty of the proposed approach below.
**(1) Time complexity analysis.**
*Theoretical*: The time complexity is linear w.r.t. the total number of random perturbations $N$, i.e., $\mathcal{O}(N)$. In our experiments, we perform 30,000 random perturbations over the span of node attributes and graph structure. We note that the running time remains acceptable even when $N$ is large, since the certification process does not require GNN re-training (the most costly procedure in the pipeline). In addition, the runs over these random perturbations do not depend on each other's prediction results, so in principle they can all be parallelized, further reducing the running time.
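The $\mathcal{O}(N)$ structure can be sketched as follows; this is an illustrative Monte Carlo loop under assumed noise parameters (`sigma`, `beta`) and a fixed `model`, not the exact ELEGANT implementation:

```python
import numpy as np

def monte_carlo_votes(model, X, A, N=30000, sigma=1.0, beta=0.9, seed=0):
    """Estimate the smoothed classifier by N random perturbations:
    Gaussian noise on the continuous attributes X and Bernoulli flips
    (probability 1 - beta) on the binary adjacency A.  Each draw is an
    independent forward pass of the *fixed* model, so the cost is O(N)
    with no retraining, and the draws can run in parallel.
    Names here are illustrative."""
    rng = np.random.default_rng(seed)
    votes = 0
    for _ in range(N):
        X_noisy = X + rng.normal(0.0, sigma, size=X.shape)
        flip = rng.random(A.shape) > beta       # flip each entry w.p. 1 - beta
        A_noisy = np.where(flip, 1 - A, A)
        votes += model(X_noisy, A_noisy)        # model returns a 0/1 decision
    return votes / N
```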
*Experimental*: We also study the running time. Specifically, in Appendix B.8 we compare the running time of a successful certification under 30,000 random noise samples against a regular training-inference cycle with a vanilla GCN. We observe that (1) although ELEGANT increases the computational cost compared with the vanilla GNN backbones, the running time remains acceptable; and (2) ELEGANT exhibits a lower running-time growth rate on larger datasets. For example, E-SAGE takes around 10x the running time of vanilla SAGE on German Credit (a smaller dataset) but only around 4x on Credit Default (a larger dataset). Hence we argue that ELEGANT bears a high level of usability in terms of complexity and running time.
**(2) The novelty of the proposed approach.**
We would like to argue that utilizing a commonly adopted noise distribution for randomized smoothing does not jeopardize the novelty of this paper, since the novelty does not lie in the selection of noise or any "well-known corruptions". Instead, this paper establishes an approach for joint certification over the span of both continuous and discrete input data, which particularly fits the needs of Graph Neural Networks. To the best of our knowledge, this serves as the first work that (1) achieves certified defense for the fairness of GNNs and (2) achieves joint certification against attacks on both node attributes and graph topology.
---
> **Reviewer**: How can one utilize ELEGANT for fairness certification for a fairness metric requiring the ground-truth labels (like equal opportunity)?
**Authors**: Please see the 1st reply above.
---
> **Reviewer**: Can you additionally provide the results for fairness-aware baselines over the same corrupted graphs as ELEGANT uses for the results in Figure 2? Accordingly, I suggest re-framing the discussion about bias mitigation in the paper.
**Authors**: As suggested, we provide the complete experimental results of the fairness-aware baselines over the same corrupted graphs used for ELEGANT below.
[ADDITIONAL RESULTS: complete experimental results of fairness-aware baselines]
## Reviewer 4 (wEZx)
We sincerely appreciate the time and effort you have dedicated to reviewing this paper and providing invaluable feedback to enhance its quality. We provide a point-by-point reply to the mentioned concerns and questions below.
---
> **Reviewer**: The paper ignores the existence of collective certificates (see e.g. [4] and follow up work). Since for such collective certificates the predictions of all nodes are simultaneously certified this automatically implies a certificate on any function of those predictions and in particular any fairness metric. Therefore, a comparison with collective certificates is warranted.
**Authors**: We note that, to the best of our knowledge, there is **no existing work** in which a joint certification for GNNs is achieved over both continuous node attributes and discrete graph topology. In fact, the problem studied in this paper is novel, and no straightforward certification approach can be directly adopted to handle it. We are also unsure which piece of literature the reviewer refers to by [4], since no corresponding reference is provided. We are eager to engage in further discussion to learn whether the reply above addresses this concern and how we might extend any existing approach to serve as a baseline.
---
> **Reviewer**: As a minor point: While the authors are the first to consider certificates w.r.t. both features and structure when the features are continuous, joint certificates for discrete features have already been studied in [3].
**Authors**: We note that **(1) the randomized smoothing technique adopted in [3] is different from the proposed randomized smoothing approach on the graph topology in our paper** and **(2) the techniques in [3] tackle a different problem** from our paper. We elaborate on more details below.
**The techniques in [3] are different from our paper.** Although both randomized smoothing approaches are able to handle binary data, the randomized smoothing approach proposed in [3] is *data-dependent*, whereas the one proposed in our paper is *data-independent*. In practice, a data-independent approach enables practitioners to pre-generate noise, which significantly improves usability.
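For example, a data-independent smoothing distribution allows all noise draws to be generated offline, before any input graph is observed. A hypothetical helper (names and parameters are illustrative, not from the paper):

```python
import numpy as np

def pregenerate_noise(n_nodes, n_feats, N, sigma, beta, seed=0):
    """Because the smoothing distribution is data-independent, all N noise
    draws (Gaussian perturbations for the attributes, Bernoulli flip masks
    for the topology) can be generated before any input graph is seen."""
    rng = np.random.default_rng(seed)
    gauss = rng.normal(0.0, sigma, size=(N, n_nodes, n_feats))
    flips = rng.random((N, n_nodes, n_nodes)) > beta   # True => flip the entry
    return gauss, flips
```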
**The studied problem in [3] is different from our paper.** Although the authors of [3] claim to achieve a joint certificate for graph topology and node attributes, all node attributes are assumed to be binary, which only applies to cases where the attributes are constructed as bag-of-words representations (as mentioned in the second-to-last paragraph of the Introduction of [3]). In our work, we instead follow a more realistic setting where only the graph topology is assumed to be binary, while the node attributes are considered continuous. This makes the problem **more difficult to handle**, since different strategies must then be adopted for the different data modalities. In addition, achieving joint certification over the span of both continuous and discrete input data is non-trivial. In summary, compared with [3], the problem studied in this paper is more realistic and more suitable for GNNs.
Based on the discussion above, we believe the contribution of this paper is novel and particularly interesting to the researchers working in this area.
---
> **Reviewer**: As a minor point: The authors claim that there is a high computational cost of calculating $\epsilon_{\textbf{A}}$. While the cost is higher compared to Gaussian smoothing it is definitely not prohibitively high since as shown by [3] the certificate can be computed in linear time w.r.t. the certified radius, and thus the maximum certified radius can be also efficiently computed.
**Authors**: We agree with the reviewer that the computational cost of calculating $\epsilon_{\textbf{A}}$ is not prohibitively high. **We need to clarify a misunderstanding below**.
In our paper, by claiming that the computational cost of calculating $\epsilon_{\textbf{A}}$ is high (as in Appendix B.6), we only mean that it is **higher than that of Gaussian smoothing**, not that it is prohibitively high. This claim explains why we adopt the certification on node attributes as the inner certification: since the certified radius of the inner certification must be computed more times than that of the outer certification in our proposed method, it is reasonable to use the one with the lower computational cost for the inner certification.
<u>We thank the reviewer for pointing this out, and we have revised the corresponding discussion to avoid confusion.</u>
---
> **Reviewer**: In addition to the above I have strong doubts in the correctness of Proposition 1 and the discussion in the preceding "Obtaining Fair Classification Result" paragraph.
**Authors**: There seems to be a misunderstanding that Proposition 1 only holds when the output prediction corresponding to each specific node is certified separately. We note that **Proposition 1 does not rely on certification over any specific prediction**, and this is achieved by formulating a classifier on top of an optimized GNN model (as in Definition 3 and discussed in the first reply to reviewer dksM). <u>To avoid potential misunderstanding, we have revised the corresponding paragraph for better clarification.</u>
We note that one of the major differences between our work and other existing works on certification is that most existing works aim to achieve certification over a specific prediction, while our work aims to achieve certification over the property (i.e., the level of fairness) of a set of predictions. The problem studied in this paper is more difficult since the goal in our studied problem cannot be achieved by simply certifying each prediction separately. Such a challenge also highlights the novelty of our work.
---
> **Reviewer**: Moreover, it seems that the guarantee in Proposition 1 only applies to perturbations w.r.t. the features since the authors take the best (smallest) value over all structure perturbation. If this is the case this should be clearly mentioned, if this is not the case it is not clear at all why the certificate holds when selecting the argmin as proposed.
**Authors**: We would like to clarify a misunderstanding here: **Proposition 1 holds for the joint certification over node attributes and graph topology**. We provide a detailed proof in Appendix A.8. The corresponding intuition is also simple: under the joint certification on node attributes and graph topology, we are able to identify those output matrices of all test nodes that satisfy $\mathrm{Pr}(\pi(\hat{\textbf{Y}}', \mathcal{V}_{\text{tst}}) > \eta)<0.5$ and construct $\hat{\mathcal{Y}}'$ given Theorems 2 and 4. Correspondingly, $\mathrm{Pr}(\pi(\hat{\textbf{Y}}, \mathcal{V}_{\text{tst}}) > \eta)=\Pi_{\hat{\textbf{Y}}'\in\hat{\mathcal{Y}}'}\mathrm{Pr}(\pi(\hat{\textbf{Y}}', \mathcal{V}_{\text{tst}}) > \eta)<0.5^{|\hat{\mathcal{Y}}'|}$ also holds under a joint certification of node attributes and graph topology.
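As a quick numerical illustration of the product bound (with made-up probabilities, assuming independence as in the product form above):

```python
# If each retained output Y' exceeds the bias threshold with probability
# p_i < 0.5, and these events are independent, then the probability that the
# final output exceeds the threshold is bounded by their product, and hence
# by 0.5 ** k.  The probabilities below are illustrative, not from the paper.
probs = [0.4, 0.3, 0.45]          # hypothetical per-output Pr(pi > eta)
joint = 1.0
for p in probs:
    joint *= p
assert joint < 0.5 ** len(probs)  # the product bound holds
```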
---
> **Reviewer**: Another weakness is the evaluation. While the 3 datasets (German Credit, Recidivism, Credit Defaulter) are often used in the fairness literature they are not ideal since the graph structure is not given but derived from the features (see NIFTY (Agarwal et al., 2021)).
**Authors**: As suggested, we provide additional experimental results below on Pokec-z and Pokec-n, two larger datasets whose graph structure is naturally given rather than derived from the features.
| | Pokec-z | Pokec-z | Pokec-z | Pokec-n | Pokec-n | Pokec-n |
| :----: | :--------------: | :-----------------: | :--------------: | :--------------: | :-----------------: | :--------------: |
| | ACC ($\uparrow$) | Bias ($\downarrow$) | FCR ($\uparrow$) | ACC ($\uparrow$) | Bias ($\downarrow$) | FCR ($\uparrow$) |
| SAGE | 63.13 $\pm$ 0.37 | 6.29 $\pm$ 0.20 | | 57.60 $\pm$ 2.74 | 6.43 $\pm$ 1.08 | |
| E-SAGE | 62.09 $\pm$ 2.22 | 4.18 $\pm$ 1.87 | 94.00 $\pm$ 5.66 | 60.74 $\pm$ 1.87 | 5.23 $\pm$ 0.13 | 91.50 $\pm$ 7.78 |
| GCN | 64.89 $\pm$ 0.93 | 3.44 $\pm$ 0.16 | | 59.86 $\pm$ 0.09 | 4.26 $\pm$ 0.40 | |
| E-GCN | 62.38 $\pm$ 0.26 | 1.52 $\pm$ 0.49 | 90.50 $\pm$ 0.71 | 59.83 $\pm$ 4.16 | 3.23 $\pm$ 1.20 | 94.00 $\pm$ 8.49 |
| JK | 63.06 $\pm$ 1.00 | 7.89 $\pm$ 3.05 | | 57.70 $\pm$ 1.05 | 8.81 $\pm$ 2.46 | |
| E-JK | 61.49 $\pm$ 2.55 | 3.63 $\pm$ 2.18 | 87.50 $\pm$ 2.12 | 61.19 $\pm$ 0.50 | 5.60 $\pm$ 0.01 | 93.00 $\pm$ 9.90 |
[ADDITIONAL RESULTS: experimental results from pokec-n and pokec-z]
---
> **Reviewer**: Since every entry is flipped with the same probability $1 - \beta$, even for very small values of $1 - \beta$ we will introduce many new ones in the adjacency matrix, destroying the sparsity, which makes scaling to larger graphs more difficult.
**Authors**: We note that the proposed approach can be easily extended to the batch training case, which has been widely adopted by existing scalable GNNs. Specifically, a commonly adopted batch training strategy of scalable GNNs is to input only a node and its surrounding subgraph into the GNN, since the prediction of a GNN depends only on the information of the node itself and its multi-hop neighbors, where the number of hops is determined by the number of GNN layers. Since the approach proposed in our paper aligns with this basic pipeline of GNNs, the perturbation can also be performed for each specific batch of nodes. In this case, all theoretical analyses in this paper still hold, since they do not rely on an assumption of non-batch training. **Therefore, we would like to argue that the proposed approach can be easily scaled to large graphs.** However, we note that this goes beyond the main focus of this paper.
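One way to sketch this batch-wise perturbation (hypothetical code, not the paper's implementation) is to restrict the Bernoulli flips to the k-hop subgraph that a target node's prediction depends on, so the remainder of a large graph stays untouched and sparse:

```python
import numpy as np

def khop_nodes(A, seed_node, k):
    """Nodes within k hops of seed_node in a binary adjacency A (simple BFS)."""
    frontier = {seed_node}
    reached = {seed_node}
    for _ in range(k):
        nxt = {j for i in frontier for j in np.flatnonzero(A[i])}
        frontier = nxt - reached
        reached |= nxt
    return sorted(reached)

def perturb_subgraph(A, seed_node, k, beta, rng):
    """Flip entries with probability 1 - beta only inside the k-hop subgraph
    that the GNN's prediction for seed_node depends on.  A hypothetical
    batch-wise variant sketched for illustration."""
    nodes = khop_nodes(A, seed_node, k)
    sub = A[np.ix_(nodes, nodes)]
    flip = rng.random(sub.shape) > beta
    return np.where(flip, 1 - sub, sub), nodes
```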
---
> **Reviewer**: Finally, I think the motivation behind the Fairness Certification Rate (FCR) metric is questionable. A more informative metric would be e.g. the largest certified threshold $\eta$ for different budgets.
**Authors**: We note that the reason for adopting the Fairness Certification Rate (FCR) as the corresponding metric is that such a metric serves as a close extension of the metric of certified accuracy, which has been widely adopted in existing certified defense literature [1, 2]. The corresponding intuition is to measure the ratio of outputs that are both (1) true and (2) successfully certified. **Hence, we argue that adopting FCR maintains the evaluation consistency between this work and other existing works.**
In addition, we agree with the reviewer that the largest certified threshold $\eta$ for different budgets would also be a meaningful metric. Nevertheless, for any given budget, the corresponding smoothed classifier $\tilde{g}_{\textbf{A},\textbf{X}}$ is intractable to evaluate exactly. Therefore, it is **impractical** to compute the corresponding largest $\eta$ in practice.
[1] Cohen, J., Rosenfeld, E., & Kolter, Z. (2019, May). Certified adversarial robustness via randomized smoothing. In *international conference on machine learning* (pp. 1310-1320). PMLR.
[2] Lee, G. H., Yuan, Y., Chang, S., & Jaakkola, T. (2019). Tight certificates of adversarial robustness for randomly smoothed classifiers. *Advances in Neural Information Processing Systems*, *32*.
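For concreteness, the FCR computation described in the reply above can be sketched as follows (a hypothetical helper; `results` pairs each run's measured bias with whether certification succeeded):

```python
def fairness_certification_rate(results, eta):
    """FCR sketch: the fraction of runs whose output is both successfully
    certified and actually below the bias threshold eta, mirroring how
    certified accuracy counts outputs that are correct and certified.
    `results` is a list of (bias, certified) pairs; illustrative only."""
    hits = sum(1 for bias, certified in results if certified and bias <= eta)
    return hits / len(results)
```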
---
> **Reviewer**: Assuming that only the features are perturbed which final vector of test predictions is returned and why exactly is this vector certified?
**Authors**: We are not entirely sure what the reviewer is asking here. Our understanding is that the question concerns a modification of the proposed method: if we apply randomized smoothing only to the node attributes and achieve certification only over the node attributes, how do we obtain the corresponding output with fairness certification on the node attributes?
We note that our method naturally extends to this case. Specifically, by setting $\beta = 1.0$ (i.e., no noise is added to the graph topology), Theorem 1 and Lemma 1 remain valid, since this value of $\beta$ satisfies the assumption $0.5 \leq \beta \leq 1.0$ (when $\beta = 1.0$, the $\epsilon_{\textbf{A}}$ in Lemma 1 becomes 0). Correspondingly, the set of $\tilde{\epsilon}_{\textbf{X}}$ presented in Theorem 4 contains a single value, computed directly from Theorem 1. Accordingly, only one output is included in $\hat{\mathcal{Y}}'$, and this output is returned as the final output. Since the assumption on $\beta$ still holds throughout this process, all theoretical analysis remains valid, and by Theorem 1 and Lemma 1 the fairness certification corresponding to this output holds. <u>We also revised the discussion in Section 3.4 to avoid potential misunderstanding.</u>
---
> **Reviewer**: Does Proposition 1 apply to both feature and structure perturbations?
**Authors**: Yes. We note that Proposition 1 holds under the joint certification over node attributes and graph structures. Please see the 5th reply above.
---
> **Reviewer**: How exactly are Theorems 1, 2 and 3 different from existing results (See weaknesses)?
**Authors**: We elaborated on their differences below.
**Difference between Theorem 1 and Existing Results**: The mentioned reference [1] presents the theoretical results most similar to Theorem 1. However, the certification in [1] is achieved directly over the classification model itself, while the certification in Theorem 1 is achieved over a constructed classifier. We argue that such an extension is not obvious; therefore, we perform a comprehensive theoretical analysis in our paper.
**Difference between Theorems 2 and 3 and Existing Results**: We note that Theorems 2 and 3 are established for the joint certification over (continuous) node attributes and (discrete) graph topology. To the best of our knowledge, there is **no existing work** in which a joint certification for GNNs is achieved over both continuous node attributes and discrete graph topology (also see the 1st reply above). In fact, the problem studied in this paper is novel, and no straightforward certification approach can be directly adopted to handle it.
---
> **Reviewer**: How does the approach compare to collective certificate (w.r.t. only structure perturbations for example)?
**Authors**: Please see the 1st reply above.