# The ICLR 6021 Rebuttal zn1U
## zn1U:
### Weaknesses:
1. The explanation of the Emotional Graph Transform (EGT) step lacks depth. Although the authors demonstrate in their experiment (4.5) the difference between EGT and BGI, which is able to transform an intermodal heterogeneous graph into a more emotion-specific graph, the motivation for this step is not clear enough to me, and it seems more like a step based on experimental attempts to determine what to do; in other words, the authors seem to know what has to be done and how it should be done, but are unable to explain why doing so allows HetG to undergo an emotionally relevant weighting transformation. I think it should be that EGT creates an attention-like effect between the original input and the HetG weights, and in training this attention tends to notice the HetG edges that are more emotionally relevant. I think the authors should experimentally demonstrate what makes EGT work and provide a more direct explanation in the paper.
2. Lack of a flowchart of the overall model. This paper contains a large number of formulas that are difficult to read, and coupled with the lack of a flowchart of the overarching model, I had a hard time imagining what the complete model would look like, how the temporal and spatial RDA components would be linked, and how the heterogeneous edges would be generated. Although the authors used formulas to explain the steps, this piling up of formulas made it difficult for me to understand the framework of the model; I would have liked a clearer flowchart as a guide.
3. The choice of baselines is rather narrow. Although several baselines are included, they are basically methods from the BCI area. There are many DA methods in other fields, such as Maximum Classifier Discrepancy, proposed at CVPR, and a series of methods derived from it. I think adding a more diverse set of DA methods to compare and analyse would make the results more convincing, as DA is a relatively uncommon method in the BCI domain. Multimodal data in other areas do not necessarily correlate across subjects as well as physiological signals do, so a comparison with methods from other areas can demonstrate the applicability of VBH-GNN in the field of physiological signals.
Note: rebut the reviewer directly, explaining that our number of baselines is sufficient and that the selection is reasonable.
### Questions:
1. Is the BCI in Figure 2 trying to represent BGI?
### Response to Reviewer zn1U:
Thank you so much for the positive rating and insightful comments. Your valuable suggestions are beneficial for further strengthening our paper. We have revised our paper according to your comments.
#### Answer to Weakness 1:
If we understand correctly, the reviewer finds our explanation of what role EGT plays in RDA, and how it plays that role, unclear. Following the reviewer's suggestion, we have updated the description of EGT in the paper. Here, we would like to clarify the role of EGT and its rationale.
**The role of EGT is to distinguish the latent relationship distributions found by BGI for different emotions**. We mention this in our paper:
> EGT divides the clustering centers of the two emotion categories in the source and target domains into two.
Therefore, EGT is a component designed for the downstream ER task: it extracts emotion-related representations from HetG (the output of BGI). As shown in Fig. 4(c) of our paper, the distribution after EGT is better adapted to the downstream classification task. As shown in Section 4.3 (Ablation Experiments), accuracy decreases significantly when the EGT loss is removed. We summarize the effect of EGT on VBH-GNN as follows:
> For EGT loss, its effect on the model is to determine the degree of convergence.
In other words, the EGT can be regarded as a bridge between downstream tasks and BGI. It makes the domain-invariant relationship distribution inferred from BGI more suitable for downstream tasks.
**EGT is achieved by transforming HetG, weighting each edge with a conditional variable**. This conditional variable follows a Gaussian distribution computed from the edge embedding and conditioned on the edges of HetG:
> $$\mathcal{N}_{s\lor lt}|Z_{\text{HetG}} \sim \mathcal{N}(Z_{\text{HetG}} \times \bar{\mu}_{s\lor lt}, Z_{\text{HetG}} \times \bar{\sigma }^2_{s\lor lt})$$
>
It integrates the emotion information from the node embedding into the graph structure. We then apply the re-parameterization trick to this conditional variable to obtain an emotion-related weight, which is used to transform HetG into EmoG.
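The weighting step described above can be illustrated with a minimal sketch. It is not the implementation from the paper: the function names, the toy values, the use of plain Python lists, and the assumption that the sampled weight is applied to HetG by element-wise multiplication are all hypothetical. It only shows how a sample from $\mathcal{N}(Z_{\text{HetG}} \times \bar{\mu}, Z_{\text{HetG}} \times \bar{\sigma}^2)$ can be written as mean plus standard deviation times standard Gaussian noise, which is the form the re-parameterization trick relies on.

```python
import math
import random


def sample_edge_weights(z_hetg, mu_bar, sigma2_bar, rng):
    """Draw one re-parameterized weight per edge (illustrative only).

    Each weight follows N(z * mu_bar, z * sigma2_bar), written as
    mean + std * eps with eps ~ N(0, 1), so the randomness is isolated
    in eps and the mean and variance remain ordinary deterministic terms.
    """
    weights = []
    for z, mu, s2 in zip(z_hetg, mu_bar, sigma2_bar):
        mean = z * mu
        std = math.sqrt(max(z * s2, 0.0))  # guard against a negative variance
        eps = rng.gauss(0.0, 1.0)          # standard Gaussian noise
        weights.append(mean + std * eps)
    return weights


def transform_hetg(z_hetg, weights):
    # Assumption: the emotion-related weight rescales each HetG edge
    # by element-wise multiplication; the paper may apply it differently.
    return [z * w for z, w in zip(z_hetg, weights)]


# Hypothetical toy values: four candidate edges, one of them absent (z = 0).
rng = random.Random(0)
z_hetg = [1.0, 0.0, 1.0, 1.0]
mu_bar = [0.8, 0.5, 0.1, 0.4]
sigma2_bar = [0.05, 0.05, 0.05, 0.05]

weights = sample_edge_weights(z_hetg, mu_bar, sigma2_bar, rng)
emo_g = transform_hetg(z_hetg, weights)
```

In this sketch an absent edge (`z = 0`) has zero mean and zero variance, so it stays absent after the transform, while existing edges are rescaled around their mean $\bar{\mu}$.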
#### Answer to Weakness 2:
Following the reviewer's suggestion, **we have added a flowchart of VBH-GNN to our paper (please see Figure 2 in the revised PDF)**. It shows the workflow of VBH-GNN and its critical components. We have also added a more detailed explanation of each component in Section 3.1. We hope this helps the reader understand VBH-GNN better.
#### Answer to Weakness 3:
We apologize for not clearly stating our criteria for baseline selection. Our perspective on baseline selection differs somewhat from the reviewer's, so we explain our reasoning below. We nonetheless agree that a broader comparison is valuable and have added new experiments to broaden the scope of our baselines.
**First, physiological signals are time series, and the MCD family of methods is unsuitable for this scenario**. MCD-style methods are mostly designed for tasks such as object classification. Compared with that type of data, physiological signals have a significant feature: temporal correlation. Therefore, the model needs to analyze the spatial-temporal relationships in the data. Because of the different data scales, these models are difficult to transfer directly to cross-subject ER tasks.
**Second, to increase baseline diversity, we added two baselines from other fields**. The baselines in [1] and [2] use graphs to learn spatial relationships in multivariate time-series data. This kind of data is consistent with the multimodal physiological signals we use. We have updated the experimental results in Table 1:
| Method | DEAP Arousal Accuracy | DEAP Arousal F1 Score | DEAP Valence Accuracy | DEAP Valence F1 Score | DREAMER Arousal Accuracy | DREAMER Arousal F1 Score | DREAMER Valence Accuracy | DREAMER Valence F1 Score |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| MTGNN [1] | $67.46 \pm 11.51$ | $63.03 \pm 12.19$ | $64.77 \pm 7.98$ | $67.24 \pm 8.33$ | $66.66 \pm 9.54$ | $66.24 \pm 11.5$ | $63.35 \pm 6.29$ | $64.01 \pm 9.39$ |
| RAINDROP [2] | $66.06 \pm 10.11$ | $63.7 \pm 12.43$ | $65.59 \pm 7.38$ | $64.29 \pm 7.98$ | $65.74 \pm 8.99$ | $62.17 \pm 10.82$ | $65.85 \pm 7.61$ | $62.44 \pm 8.07$ |
| Our VBH-GNN | **73.5** $\pm$ **7.22** | **71.53** $\pm$ **10.86** | **71.21** $\pm$ **6.41** | **71.85** $\pm$ **7.38** | **70.64** $\pm$ **7.74** | **69.66** $\pm$ **9.51** | **73.38** $\pm$ **4.21** | **69.08** $\pm$ **6.98** |
These models do not perform as well as VBH-GNN because the data in their intended scenarios differ from physiological signals in sparsity, sampling rate, and other respects.
[1] Zonghan Wu, et al. "Connecting the dots: Multivariate time series forecasting with graph neural networks." ACM SIGKDD, 2020.
[2] Xiang Zhang, et al. "Graph-Guided Network for Irregularly Sampled Multivariate Time Series." ICLR, 2022.
#### Answer to Question 1:
We thank the reviewer for carefully reading our manuscript and pointing out this typo. **We examined the manuscript thoroughly and did our best to correct all the typos we found**. We have also redrawn Figure 2 (now Figure 3 in the current version). We hope this provides readers with a better reading experience.