# ICML24_PhASER_rebuttal
# Note to Area Chairs before the final decision-making
Dear (Senior) Area Chairs,
We deeply appreciate the time and effort invested by you and all the reviewers in providing constructive feedback on our work.
We wish to bring to your attention some concerns we have encountered during the review process, particularly regarding our interactions with Reviewer v4mZ. While we acknowledge the valuable insights provided by this reviewer, we have faced challenges in engaging with them effectively.
**Despite our best efforts to address Reviewer v4mZ's concerns in detail, we have not yet received any acknowledgment or further feedback from them**. We have taken great care to respond thoughtfully to their queries. Drawing upon insights from **other reviewers whose concerns have been effectively resolved by our responses, we are confident that we have also addressed the concerns raised by Reviewer v4mZ**, particularly as they primarily revolve around presentation aspects of the paper (which do not warrant rejection, especially when this reviewer acknowledges the technical merit and impact of the work).
Additionally, we notice that **the submission of Reviewer v4mZ's review (15th March) occurred after the required review deadline (14th March)**, which suggests a passive attitude toward the ICML reviewing responsibilities. This makes us apprehensive about their commitment to the review process and their willingness to engage in constructive discussion, even during the AC-reviewer phase. Consequently, we are concerned that **Reviewer v4mZ's assessment may not accurately reflect the true quality and significance** of our work, especially considering that all other reviewers have consistently provided positive ratings.
In light of these concerns, we respectfully request your careful consideration of the aforementioned issues before reaching a final decision on our paper. **We have full confidence in the integrity of the review process overseen by you and trust that you will ensure a fair and thorough evaluation for all submissions**.
Thank you for your attention to this matter and your continued support of the scholarly community.
Warm regards,
Authors of Paper 2154
# Kind Reminder to Area Chairs before the response period closes
Dear (Senior) Area Chairs,
We are grateful for your efforts in conducting a thorough review of our paper. We have not yet received any acknowledgment of our rebuttals from reviewers v4mZ and YZm2. As we are nearing the conclusion of our response period, we seek your assistance in engaging with our reviewers to address their concerns and clarify their initial queries for a fair evaluation of our work.
Additionally, we want to respectfully draw your attention to the following:
* Regarding Reviewer v4mZ, most of their concerns revolve around refining captions for figures and providing additional insights into our reported results, while also acknowledging the novelty and rigor of our method. We have made efforts to address these concerns by providing visualizations of our method and two baseline methods, along with additional justifications for our results and updated captions for the figures. While we believe our responses comprehensively address the reviewer's specific queries, we are prepared to promptly address any further clarifications requested by the reviewer within the response period. We want to highlight that while these concerns are important for improving the quality of the paper, they are more cosmetic in nature. The technical aspect and the impact of our method are well regarded by Reviewer v4mZ; however, their rating does not reflect this currently.
* We have diligently addressed all questions and possible misunderstandings of our method's premise raised by Reviewer wA1N during the response period, and the reviewer responds in full agreement to our response ("Thanks for the reminder and all the responses. They are well received."). However, the reviewer still recommends a low score, despite acknowledging the novel design of our method and commending the ablation studies in their initial review. We kindly appeal to you to be mindful of such inconsistencies in the feedback versus the rating we received for our paper.
Systematic consideration of time series for machine learning applications is a relatively new but critical and timely field of work. We believe our work contributes a unique perspective to domain generalization for non-stationary time series, as identified unanimously by all reviewers and specifically highlighted by Reviewer YZm2. We conduct extensive empirical evaluations across 10 baselines and 5 datasets, along with theoretical grounding, and hope to inspire more work from our design schema. We kindly request that you duly consider the merit of our work (as highlighted by our reviewers in their written feedback more than in their numerical ratings) and make your fair decision.
Thank you for your understanding and consideration.
Warm regards,
Authors of Paper 2154
<!-- As highlighted in Reviewer YZm2's initial review, we are one of the first works to successfully evaluate a time-series domain generalization method in challenging scenarios (single-domain generalization), and the reviewer acknowledges the innovativeness of our design, from high-level intuition to theoretical justification. -->
# Kind Reminder to Area Chairs
Dear (Senior) Area Chairs,
We hope this message finds you well. We deeply appreciate the thorough evaluation that our reviewers have provided and the time you have dedicated to the discussion phase. We would like to draw your attention to a few points that we believe are crucial for a fair and accurate assessment of our paper.
Firstly, we have noticed that **most reviewers have not participated in any discussions so far**. We have diligently addressed their concerns and value their input; regrettably, we have not yet had any discussions with them, and we are eager for their feedback. We kindly request your help in encouraging the reviewers to participate, and we thank you for your assistance and continued dedication.
In light of the reviews received, we wish to respectfully draw your attention to some issues. **All reviewers uniformly recognize the technical novelty and rigor of our work**, particularly highlighting the innovative application of the Hilbert Transform for enhancing out-of-domain generalization, the novel approach to characterizing time-series non-stationarity through emphasis on phase information, and the strong theoretical foundation supporting domain generalization and Hilbert Transform-based augmentation strategies.
Specifically, **Reviewer YZm2 commended our work with a high rating and detailed the strengths of our approach without identifying any significant weaknesses**. Conversely, **Reviewers v4mZ and 1RSH assigned lower ratings, citing concerns primarily related to the clarity of presentation**, such as the captions for Figures 1 and 3 and requests for higher-resolution versions of them. We have diligently addressed these in our response, improving figure clarity and providing more explicit descriptions to facilitate better understanding. Moreover, we believe that **the essence and contribution of a research paper should predominantly be evaluated on its technical merit and innovation rather than on presentation aspects** that are readily amendable. The discrepancy in the assessment, where minor presentation issues overshadow the recognized technical contributions, seems to reflect an imbalance in evaluation criteria. Furthermore, while **Reviewer wA1N had reservations about certain aspects of our methodology and its rationale, we have provided comprehensive clarifications** in our rebuttal, addressing each concern in detail. We are confident that a closer examination of our responses and a revisiting of our paper would alleviate these misunderstandings.
Consequently, we respectfully request that you take into consideration the possibility that **their current ratings may not fully reflect the true quality of our paper**. We wish to emphasize that we have diligently addressed and resolved all of their concerns. In light of this, we firmly believe that if the reviewers invest some time in reviewing our responses, they will likely reconsider and significantly elevate their current ratings.
In conclusion, we kindly request that you consider these factors when making the final acceptance decisions. We are grateful for the time and effort you have dedicated to our paper throughout this stage and the forthcoming steps. Your guidance and insights are invaluable to us, and we sincerely appreciate your thoughtful evaluation of our work. Should you require any further information or clarifications, please do not hesitate to contact us.
Thank you for your understanding and consideration.
Warm regards,
Authors of Paper 2154
# Global Response
We would like to thank all reviewers for their insightful comments and constructive suggestions. In particular, we really appreciate all reviewers' acknowledgments of the technical novelty of our proposed methods and the soundness of empirical evaluations. Below we provide detailed responses to each reviewer for every question, concern, and clarification. Thank you again and we look forward to any further feedback and discussion.
# Reviewer v4mZ
Thank you for taking the time to read and review our work. We are happy that you acknowledged the innovative approach we proposed for addressing non-stationarity in time-series data, as well as the clarity and rigor of our theoretical derivations. We address your questions and comments below.
## Weaknesses
> 1. The analysis of experimental results is not clear.
We will incorporate the responses to Questions 3-5 below into Section 3 of our future revision. We will also add the visualizations (response to Question 6) in the Appendix.
> 2. Some figures and descriptions need enhancement.
We have updated the captions for Figure 1 and Figure 3 and provide them below in response to the direct questions (Questions 1 and 2) about these figures. We have also provided updated higher-resolution versions of the figures here: [Figure 1](https://drive.google.com/file/d/1Nc76PdiCiQB7OsU-XOgew7ekjmBqwFJW/view?usp=sharing), [Figure 3](https://drive.google.com/file/d/1sguaaBHU_Zf0D-hYEZEhr1ZKGxccaT82/view?usp=drive_link), [Figure 4](https://drive.google.com/file/d/1_F_QtV4xBmqAMw_1J9WfUEDcTLwQGp0u/view?usp=drive_link). In the manuscript, we will resize the figures with updated captions to improve readability.
> 3. There is a lack of intuitive visual analysis of results.
We address this in our response to Question 6 using t-SNE visualization for PhASER and two of the best baselines, Diversify and BCResNet, in [Figure TSNE](https://drive.google.com/file/d/1710NIEbEIisP9YILCxDyW1PrG00EzD31/view?usp=sharing).
## Questions
> 1. The description of Figure 1 and its related four steps is not clear enough.
We have updated the caption as - "Overview of PhASER's components. I. illustrates Hilbert transform-based phase augmentation using phasor representation (top left) for the negative and positive frequency components of a time-series signal. The augmentation translates a signal $\mathbf{x}(t)$ to its $\pi/2$ phase-shifted version, $\widehat{\mathbf{x}}(t)$. II. demonstrates separate feature encoding of the time-varying phase and magnitude derived from the Short-Term Fourier Transform (STFT) using the magnitude encoder $F_\mathrm{Mag}$ and the phase encoder $F_\mathrm{Pha}$ with sub-feature normalization. We find that such separate encoding followed by fusion provides superior performance over mere concatenation of magnitude and phase features for further processing. III. shows the key elements of the phase-residual broadcasting network. The dimensions of the intermediate feature maps are annotated to demonstrate the design of the depth-wise feature encoder ($F_\mathrm{Dep}$) followed by the temporal encoder ($F_\mathrm{Tem}$) and the incorporation of the phase-projection head's ($g_\mathrm{Res}$) output for broadcasting. IV. represents the task-specific classification encoder ($g_\mathrm{Cls}$) that optimizes a categorical objective."
We also updated Figure 1 to incorporate all the notations of the design blocks and renamed the II subfigure Separate Magnitude-Phase Encoders for clarity. The updated figure is available here - [Figure 1 Updated](https://drive.google.com/file/d/1Nc76PdiCiQB7OsU-XOgew7ekjmBqwFJW/view?usp=sharing).
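As a concrete illustration of step I, the $\pi/2$ phase shift can be sketched with a discrete Fourier transform: rotate the positive-frequency phasors by $-\pi/2$ and the negative-frequency ones by $+\pi/2$. The following is a hypothetical NumPy reconstruction of the idea, not our released implementation (the function name `hilbert_phase_shift` and the zero-mean assumption are ours for this sketch):

```python
import numpy as np

def hilbert_phase_shift(x):
    # Multiply the spectrum by -1j * sign(f): positive-frequency phasors are
    # rotated by -pi/2 and negative ones by +pi/2, as in the analytic-signal
    # construction of the Hilbert transform. Magnitudes are unchanged.
    X = np.fft.fft(x)
    f = np.fft.fftfreq(len(x))
    rot = np.where(f > 0, -1j, 1j)
    rot[f == 0] = 0.0  # DC has no phase to shift; we assume a zero-mean signal
    return np.fft.ifft(X * rot).real

t = np.arange(256) / 256.0
x = np.cos(2 * np.pi * 4 * t)      # original signal x(t)
x_hat = hilbert_phase_shift(x)     # its pi/2 phase-shifted version
print(np.allclose(x_hat, np.sin(2 * np.pi * 4 * t)))  # True
```

For a pure cosine the output is the corresponding sine, while the magnitude spectrum is untouched, which is exactly the property the augmentation relies on.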
> 2. If the description of Figure 3 could be optimized, it would improve readability.
We have updated the caption for Figure 3 as - "Illustrative example of non-stationarity diversification by shifting a signal's phase. a) shows the temporal non-stationarity of a signal denoted by varying mean ($\mu$) and variance ($\sigma$) within a domain for three regions, color-coded and denoted as I, II, and III. b) shows that the magnitude response ($|\text{DFT}|$) of the Discrete Fourier Transform (DFT) for each region is distinct; there is a clear shift in the dominant frequency for each region. c) shows the original (solid lines) and shifted (dotted lines) phase responses ($\angle(\text{DFT})$) for each region. The $\angle(\text{DFT})$ for each region is distinct, and the shift operation can greatly change the signal's phase response. d) illustrates the overall time-domain response of the original signal and the phase-shifted version. We achieve non-stationarity diversification by shifting the phase response without altering the magnitude response of the signal, thus preserving task-relevant semantics."
The purpose of Figure 3 is to illustrate that statistical and spectral non-stationarity are prominent in most time-series data. By dividing the entire signal into three regions, we highlight that the mean and variance, dominant frequency, and phase responses are not stationary through subfigures 3a), 3b), and 3c), respectively. In subfigure 3c) we further show what the $\pi/2$ phase-shifted version looks like in the frequency domain. Finally, in subfigure 3d) we illustrate the difference between the time-domain responses of the original and phase-shifted signals.
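The region-wise statistics behind subfigures 3a) and 3b) can be reproduced on a toy signal; the construction below is illustrative (frequencies, scales, and offsets are hypothetical, not the signal used in the paper):

```python
import numpy as np

# Three concatenated regions whose mean, variance, and dominant frequency all
# differ, mimicking Figure 3's statistical and spectral non-stationarity.
t = np.linspace(0, 1, 256, endpoint=False)
regions = [m + s * np.sin(2 * np.pi * f * t)
           for f, s, m in ((2, 1.0, 0.0), (5, 2.0, 1.0), (9, 0.5, -1.0))]
x = np.concatenate(regions)

for i, r in enumerate(np.split(x, 3), start=1):
    # dominant frequency bin after removing the region's mean
    k = int(np.argmax(np.abs(np.fft.rfft(r - r.mean()))))
    print(f"region {i}: mean={r.mean():.2f}, var={r.var():.2f}, dominant bin={k}")
```

Each region reports a distinct mean, variance, and dominant DFT bin, which is precisely the non-stationarity the figure visualizes.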
> 3. In the comparative experiments, why is the RevIN module fused in some models and not in others? Please explain this in the text.
RevIN (Reversible Instance Normalization) is one of the earliest techniques proposed to address non-stationarity in time-series forecasting applications, which typically involve a single domain. The RevIN technique involves removing (normalizing) each instance's statistics before inputting it into an encoder network and then reintroducing (denormalizing) them at the output of the encoder. This ensures that statistical non-stationarity is retained, allowing the model to be aware of this signal property while learning meaningful semantics for a given task. Primarily, this is a model-agnostic technique; however, the input-output space of the encoder needs to be the same (i.e., the input and feature-map dimensions need to match) to carry out such reversible normalization for each sample. **We are interested in assessing the benefits of this explicit statistical non-stationarity preservation alongside our proposed techniques in PhASER.** Therefore, we only modify our architecture to incorporate the RevIN module around the depthwise feature encoder. Overall, we do not find any significant gains for human activity recognition, sleep stage classification, and gesture recognition applications.
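For concreteness, the normalize/denormalize round trip described above can be sketched as follows (a minimal illustration; the actual RevIN also learns affine parameters, which we omit, and the class and variable names here are ours):

```python
import numpy as np

class ReversibleNorm:
    """Minimal RevIN-style sketch: remove per-instance statistics before the
    encoder and restore them at its output. Requires the encoder's input and
    output dimensions to match, as noted above."""
    def __init__(self, eps=1e-5):
        self.eps = eps

    def normalize(self, x):            # x: (batch, length, channels)
        self.mu = x.mean(axis=1, keepdims=True)
        self.sigma = x.std(axis=1, keepdims=True)
        return (x - self.mu) / (self.sigma + self.eps)

    def denormalize(self, z):          # z must have the same shape as x
        return z * (self.sigma + self.eps) + self.mu

rng = np.random.default_rng(0)
x = rng.normal(2.0, 3.0, size=(8, 128, 6))   # toy non-zero-mean batch
rev = ReversibleNorm()
z = rev.normalize(x)                          # statistics removed
x_back = rev.denormalize(z)                   # identity encoder round-trips
```

With an identity encoder the round trip recovers the input exactly, which is the reversibility property the technique depends on.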
Additionally, to support our observation, we also adapt the feature-extractor module in the best baseline, Diversify, and assess its performance across three human activity recognition datasets (refer to Table 2 in the manuscript). Below, we present a snippet of the results, **which consistently indicate that RevIN does not yield significant improvements for domain-generalizable classification tasks in such applications**. We will make dedicated revisions in the results section (Section 3.1) and integrate these supplementary experiments.
|     | **Model**          | **WISDM 1** | **2** | **3** | **4** | **Avg.** | **HHAR 1** | **2** | **3** | **4** | **Avg.** | **UCIHAR 1** | **2** | **3** | **4** | **Avg.** | **Overall Avg.** |
|-----|--------------------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|
| I   | Diversify + RevIN* | 0.83 | 0.79 | 0.84 | 0.83 | 0.82 | 0.70 | 0.79 | 0.88 | 0.62 | 0.75 | 0.88 | 0.88 | 0.76 | 0.87 | 0.85 | 0.81 |
| II  | Diversify          | 0.82 | 0.82 | 0.84 | 0.81 | 0.82 | 0.82 | 0.76 | 0.82 | 0.68 | 0.77 | 0.89 | 0.84 | 0.93 | 0.90 | 0.89 | 0.83 |
| III | Ours + RevIN*      | 0.86 | 0.85 | 0.84 | 0.84 | 0.85 | 0.82 | 0.82 | 0.92 | 0.85 | 0.85 | 0.96 | 0.90 | 0.93 | 0.97 | 0.94 | 0.88 |
| IV  | Ours               | 0.86 | 0.85 | 0.85 | 0.82 | 0.85 | 0.83 | 0.83 | 0.94 | 0.88 | 0.87 | 0.96 | 0.91 | 0.95 | 0.97 | 0.95 | 0.89 |
We would like to highlight that the experiments on RevIN with PhASER for the sleep-stage classification and gesture recognition datasets are presented in Table 13 and Table 14 of the Appendix, respectively; they were placed there to save space in the main text.
> 4. Regarding the results analysis in Figures 3.1 and 3.2, can there be further explanations on why certain models perform well while others perform poorly?
Since the manuscript does not contain Figures 3.1 and 3.2, we provide an explanation based on our best interpretation of this question below. If our interpretation is inconsistent with your initial intention, please let us know and we will gladly provide further answers.
We assume the reference is to Table 3, with Sources 1 and 2 as 3.1 and 3.2. We attempt to explain why we observe a drop in performance for the Diversify algorithm, which is generally the best baseline for HHAR tasks. Note that this setting involves single-domain generalization: we train on only one domain and test on all other domains. Diversify's core training scheme assumes the training data is composed of multiple domains and assigns pseudo-domain labels along with sub-domain labels that characterize the latent distribution shifts. Due to this inherent assumption, it performs poorly in the single-domain generalization case. Single-domain generalization is a very challenging experimental setting, and to the best of our knowledge, we are the first work to demonstrate this capability for time-series data. **Our approach, PhASER, makes no assumptions about the domain space of the training data and holistically addresses non-stationarity through phase-shift-based diversification, thus learning distribution-agnostic semantic representations that allow for successful single-domain generalization as well.**
> 5. Why is there a slight decrease in performance in the Gesture Recognition-4 experiment in Table 4? Are there any limitations in the method?
In scenario 4 for Gesture Recognition (GR), Diversify (the best baseline) offers slightly better results (0.76 $\pm$ 0.01) than PhASER (0.75 $\pm$ 0.01); however, across all other datasets and settings, PhASER outperforms the best baseline by an average of 5%, and by up to 13% in some cases. **This can be attributed to the GR dataset being the least non-stationary of the datasets considered.** The Augmented Dickey-Fuller (ADF) test statistics listed in Table 8 in the Appendix assign it the lowest score, indicating its higher level of stationarity. While PhASER excels when the dataset exhibits pronounced non-stationarity, it also remains competitive on more stationary datasets compared to previous time-series classification benchmarks. Overall, in the context of Gesture Recognition, PhASER performs comparably, if not slightly better on average, as depicted in Table 4.
> 6. Could visualizations of some results be provided to analyze the effectiveness of the method and its inevitable limitations (if any)?
**To enable visualization, we conducted t-distributed stochastic neighbor embedding (t-SNE) analyses on our method (PhASER), Diversify, and BCResNet for the HHAR dataset on the left-out domains in scenario 1, and provide the visualization here: [Figure TSNE](https://drive.google.com/file/d/1710NIEbEIisP9YILCxDyW1PrG00EzD31/view?usp=sharing).** We illustrate the t-SNE plots for in-domain and out-of-domain data; the different colors indicate the six activity classes of this dataset. In all cases, we only make the modifications necessary to extract the embeddings from the last layer of the network before categorical score assignment, and we tune the perplexity parameter during plotting for an optimal 2-dimensional projection. For the in-domain data we hold out 20% of the training samples, and for out-of-domain data we follow the scenario 1 setting (domains 0 and 1 are the targets). [Figure TSNE](https://drive.google.com/file/d/1710NIEbEIisP9YILCxDyW1PrG00EzD31/view?usp=sharing) (a, d) shows that the clustering for each class is distinct and clearly separable for both in-domain and out-of-domain data using PhASER. The accuracy disparity for unseen domains is also very low (0.97 in-domain versus 0.94 out-of-domain for PhASER), which demonstrates PhASER's strong generalization ability without access to any target-domain samples. We would also like to point out that t-SNE plots are sensitive to hyperparameters; hence, even though the accuracy of Diversify is better than that of BCResNet on out-of-domain data, [Figure TSNE](https://drive.google.com/file/d/1710NIEbEIisP9YILCxDyW1PrG00EzD31/view?usp=sharing) (f) may visually convey better separation between classes than [Figure TSNE](https://drive.google.com/file/d/1710NIEbEIisP9YILCxDyW1PrG00EzD31/view?usp=sharing) (e).
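For reference, the embedding-and-projection step behind these plots follows the standard recipe sketched below with scikit-learn; the clustered Gaussian features are hypothetical stand-ins for the last-layer embeddings, not our actual model outputs:

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-ins for last-layer embeddings of three activity classes.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(c, 0.5, size=(20, 32)) for c in (0.0, 3.0, 6.0)])
labels = np.repeat([0, 1, 2], 20)   # used only to color the scatter plot

# Perplexity is tuned per plot; because t-SNE layouts are sensitive to it,
# visual separation alone can be misleading when comparing models.
emb = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(features)
print(emb.shape)
```

The resulting 2-D embedding is what gets scattered and colored by class label in each panel of the figure.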
> 7. The references seem to be repetitive.
We have rectified the duplicated citations referring to different versions of a paper in the revised manuscript.
## First Reminder Response
Dear Reviewer v4mZ,
This is a gentle reminder that we are currently several days into the response period, and keenly await your response. Thank you for taking the time to review and provide feedback on our paper. We would greatly appreciate the opportunity to discuss and address any concerns you may have regarding the presentation and clarity of the manuscript.
We hope our responses can effectively address your concerns regarding the figure captions, results visualization, and provide additional insights into our empirical observations. This presents a valuable opportunity for us to enhance our work, and we would be grateful for any additional feedback you could provide.
Best Regards,
Authors of Paper 2154
## Second Reminder Response
Dear Reviewer v4mZ,
The response period concludes tomorrow and we would like to express our sincere gratitude for your feedback. With the deadline approaching, we respectfully remind you that we cannot respond to any further questions or concerns after the 4th.
We are hopeful that our comprehensive responses to your initial review, particularly regarding the clarity of captions for Figures 1 and 3, as well as the additional experimental insights and visualizations, can effectively address any concerns and highlight the enhancements made to our paper. We understand your time is precious, yet we earnestly ask for a moment of your consideration to review our responses and raise the rating if possible. Your positive assessment would not only affirm our efforts but also contribute significantly to the paper's soundness and quality. We are also glad to answer any questions and concerns.
Thank you once again for your dedication and insights.
Best Regards,
Authors of Paper 2154
## Third Reminder Response
Dear Reviewer v4mZ,
The response period concludes in a few hours. We want to thank you for your feedback and kindly request you to take some time to review our responses. We understand the demands on your schedule and the importance of your role in providing feedback. We kindly request a moment of your time to review the updates we've made in response to your queries. We've meticulously addressed your concerns by providing updated captions to the figures, enhancing visualizations, and offering deeper insights into our results to the best of our interpretation.
Your feedback is invaluable to us, and we genuinely appreciate your consideration of our additional efforts. We humbly ask for your reconsideration of our paper's score based on the clarifications we've provided. Your assessment plays a pivotal role in ensuring our work receives the recognition it merits, as highlighted in your summary and strengths.
Rest assured, we remain vigilant in monitoring the portal for any further feedback or queries you may have during these remaining hours. Should you require additional information or have any lingering concerns, we stand ready to address them promptly and comprehensively.
Once again, we extend our heartfelt thanks for your dedication and invaluable insights.
Best Regards,
Authors of Paper 2154
# Reviewer 1RSH
Thank you for taking the time to read and review our work. We are happy that you acknowledged our exhaustive empirical analyses and intuitive design of PhASER. We address your questions and comments below.
## Weaknesses
> The introduction of phase information and separate feature encoding could potentially increase the complexity of the model, making it harder to interpret and analyze.
The computational complexity of extracting magnitude and phase information is identical. In fact, the separate encoder design facilitates implementation-specific optimizations for efficient computation. Table 5 in the main paper is an ablation study analyzing the impact of the different modules of PhASER, which is not negatively affected by incorporating phase-related computations. Although model interpretability and explainability are key areas of machine learning research, our focus here is on improving the generalizability of time-series classification across domains. Our studies show that phase brings significant enhancements at different stages of the design (phase-based non-stationarity diversification, separate magnitude-phase encoding, and the phase-driven residual network), helping us achieve out-of-domain generalization (Tables 2 and 4) and succeed in challenging scenarios like single-domain generalization (Table 3).
## Questions
> 1. How does the computational cost of the proposed PhASER framework compare to other existing methods in terms of time complexity and resource utilization?
To assess the resource utilization of PhASER against other baselines, we offer two metrics: 1) the number of multiply-accumulate operations per sample (MACs), approximating the computational complexity at run-time, and 2) the number of trainable parameters, determining the memory footprint. We compute these for the HHAR dataset below (these metrics depend on the input dimensions, so different choices of dataset, sequence length, and modalities can yield different numbers).
| Model | MACs ($\times 10^6$) | Number of Trainable Parameters ($\times 10^3$)|
|-----------|----------|----------------------------------- |
| ERM | 19.5 | 98.1 |
| GroupDRO | 19.5 | 98.1 |
| DANN | 21.7 | 102.9 |
| RSC | 19.5 | 98.1 |
| ANDMask | 19.5 | 98.1 |
| BCResNet | 55.3 | 154.7 |
| NSTrans | 35.3 | 75.6 |
| MAPU | 46.9 | 128.3 |
| Diversify | 35.7 | 922.9 |
| Ours | 48.6 | 81.4 |
Our computational cost is comparable to that of the other baseline methods while achieving much better performance. We also determine the asymptotic time complexity of the PhASER modules below. For multi-layer neural network modules (rows 3-6), the representative time complexity for one layer is provided.
| | Module | Complexity per module | Description of input notation for each module |
|--|-------------------|------- |----- |
|1| Hilbert augmentation (using Fast-Fourier transform) | $\mathcal{O}(V \cdot N \log N)$ | $N$ is the sequence length and $V$ is the number of variates of the input |
|2| Short-Term Fourier Transform|$\mathcal{O}(V \cdot N \cdot W \log W)$| $N$ is the sequence length, $V$ is the number of variates and $W$ is the window size of the input |
|3| Magnitude Encoder ($F_\mathrm{Mag}$), Phase Encoder ($F_\mathrm{Pha}$), Phase Projection Head ($g_\mathrm{Res}$) - 2D Convolution Layers |$\mathcal{O}(k^2 \cdot N \cdot d \cdot c_{in} \cdot c_{out} )$| $N$ is the sequence length, $d$ is the feature dimension, $c_{in}$ is the number of channels for the input. $k$ is the size of a symmetric kernel for the convolution layer and $c_{out}$ is the number of channels for the output. |
|4| Depthwise Feature Encoder ($F_\mathrm{Dep}$) - 2D Convolution Layers with average pooling along feature axis |$\mathcal{O}(k^2 \cdot N \cdot d \cdot c_{in} \cdot c_{out} ) + \mathcal{O}(d)$| $N$ is the sequence length, $d$ is the feature dimension, $c_{in}$ is the number of channels for the input. $k$ is the size of a symmetric kernel for the convolution layer and $c_{out}$ is the number of channels for the output. |
|5| Temporal Encoder ($F_\mathrm{Tem}$) - (worst case backbone) Transformer Encoder |$\mathcal{O}(N^2 \cdot d)$| $N$ is the sequence length, $d$ is the feature dimension; the self-attention term dominates per layer |
|6| Classification Encoder ($g_\mathrm{Cls}$) - fully connected layers |$\mathcal{O}(d \cdot h)$|$d$ is the input feature dimension and $h$ is the hidden layer dimension.|
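As a concrete instance of the convolutional entries (rows 3 and 4), the MAC count of one stride-1, 'same'-padded 2D convolution layer follows directly from the $\mathcal{O}(k^2 \cdot N \cdot d \cdot c_{in} \cdot c_{out})$ expression; the example dimensions below are illustrative, not the settings used in PhASER:

```python
def conv2d_macs(k, n, d, c_in, c_out):
    """Multiply-accumulate operations for one 2D convolution layer with a
    k x k kernel over an (n x d) feature map, stride 1, 'same' padding.
    Different strides or padding change the exact count."""
    return k * k * n * d * c_in * c_out

# e.g. a 3x3 convolution over a 128 x 32 feature map, 16 -> 32 channels
print(conv2d_macs(3, 128, 32, 16, 32))  # 18874368, i.e. ~18.9 M MACs
```

Summing such per-layer counts over a network is how per-sample MAC totals like those in the table above are obtained.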
> 2. Could you please provide higher-resolution versions of Figures 1, 3, and 4 as the details in them are not sufficiently clear.
We have provided the higher resolution versions of the figures here - [Figure 1](https://drive.google.com/file/d/1Nc76PdiCiQB7OsU-XOgew7ekjmBqwFJW/view?usp=sharing), [Figure 3](https://drive.google.com/file/d/1sguaaBHU_Zf0D-hYEZEhr1ZKGxccaT82/view?usp=drive_link), [Figure 4](https://drive.google.com/file/d/1_F_QtV4xBmqAMw_1J9WfUEDcTLwQGp0u/view?usp=drive_link).
## First Reminder Response
Dear Reviewer 1RSH,
We would like to gently remind you that we are now several days into the response period and eagerly await your feedback and further discussion. Your time and effort in reviewing this paper are greatly appreciated as they contribute significantly to its improvement.
We aim to address your concerns regarding the figure resolutions and the computational costs of our proposed method, PhASER, through our responses. This presents a valuable opportunity for us to enhance our work, and we eagerly anticipate any additional feedback you could provide.
Best Regards,
Authors of Paper 2154
## Second Response
Dear Reviewer 1RSH,
Thank you for raising the score and for your positive feedback. If you have any other concerns, please let us know so we can address them during this response period. If there are no lingering concerns, we kindly ask you to consider raising the score further based on the improvements we are implementing and the modifications we will include in a future revision. Your endorsement would greatly enhance our work's credibility and its chance of acceptance. We are available for any further questions.
Best Regards,
Authors of Paper 2154
## Third Reminder Response
Dear Reviewer 1RSH,
Thank you for raising your rating for our paper and for your positive feedback. We want to kindly remind you that we are one day away from the conclusion of the reviewer-author response period.
We are committed to promptly addressing any further concerns or questions you may have within the current response window. We kindly ask for your thoughtful reconsideration regarding the scoring, provided there are no lingering issues. We deeply appreciate your valuable feedback and sincerely hope you will consider raising your rating in light of the enhancements we've made and your satisfaction with our responses. Your endorsement would significantly bolster the credibility of our work and increase its chances of acceptance. We are available for any further questions.
Best Regards,
Authors of Paper 2154
# Reviewer YZm2
We sincerely thank you for acknowledging our method's merit from high-level intuition to theoretical analysis and appreciating our attempts to rigorously evaluate our method on challenging scenarios. We address your questions and comments below.
## Questions
> 1. Curious about whether some traditional augmentation techniques are useful for time-series DG, such as scaling, reverting, temporal shifting, noise injection, and so on.
For time series, blunt augmentations like scaling, reverting, cropping, and jittering may not always be suitable, as they can alter morphological properties that are important for the task. Even more advanced techniques, like frequency-time warping and additive noise, need deliberate characterization of the signal's frequency response to provide a meaningful augmented view while retaining the task-relevant semantics. This is one of the key factors motivating us to explore a general-purpose augmentation strategy that diversifies the non-stationarity in a signal without altering its task-specific semantics (magnitude and frequency responses).
To demonstrate the use of traditional augmentations with PhASER for human-activity recognition, we incorporate the following augmentations proposed by past works [2, 3] on the HHAR dataset.
* Rotation - applying arbitrary rotation matrices to simulate different sensor placements [2].
* Permutation - random temporal permutation of fixed-size windows within each sample [3].
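For concreteness, a minimal sketch of these two augmentations for a tri-axial sensor window might look as follows (our own illustrative implementation, not the exact code of [2, 3]; the function names and segment count are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def rotate(window, rng):
    """Rotate a (T, 3) tri-axial window about the z-axis by a random angle,
    simulating a different sensor orientation (illustrative only)."""
    theta = rng.uniform(0, 2 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return window @ R.T

def permute(window, rng, n_segments=4):
    """Split the window into temporal segments and shuffle their order."""
    segments = np.array_split(window, n_segments, axis=0)
    order = rng.permutation(n_segments)
    return np.concatenate([segments[i] for i in order], axis=0)

x = rng.standard_normal((128, 3))        # a 128-step accelerometer window
x_rot, x_perm = rotate(x, rng), permute(x, rng)
```

Note that rotation preserves the per-step acceleration magnitude (it is norm-preserving), while permutation preserves the sample values but scrambles their temporal order.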
We apply these augmentations in place of the Hilbert augmentation, run the PhASER model, and present the results below (row 2). We also run an experiment with identical settings but no augmentation at all (row 3).
| | **Target (HHAR)** | **1** | **2** | **3** | **4** | **Avg.** |
|-|-------------------|-------|-------|-------|-------|---------|
|1| Ours (with Hilbert Augmentation; Table 2 in main paper) | 0.83 | 0.83 | 0.94 | 0.88 | 0.87 |
|2| Ours + Traditional Augmentation (rotation, permutation) | 0.76 | 0.76 | 0.83 | 0.75 | 0.78 |
|3| Ours + No Augmentation | 0.83 | 0.72 | 0.89 | 0.84 | 0.82 |
These results indicate that arbitrary time-domain augmentations do not necessarily diversify the non-stationarity of a signal. Hence, PhASER's design principles, such as the residual connection that re-introduces the non-stationary dictionary via phase projection and broadcasting (using $g_\mathrm{Res}$), are less effective here, and even the no-augmentation setting (row 3) outperforms the traditional temporal augmentations on this domain-generalization task. However, we may encounter applications in the future where established augmentation strategies, combined with Hilbert augmentation, are the best choice. In this work, we aim to propose a more generic framework that benefits most time-series classification tasks by achieving better generalizability.
> 2. Why is the performance of Nonstationary transformer [1] so poor? Suppose nonstationary transformer is also designed for addressing nonstationarity, why its out-of-distribution generalization is not good. Can the author(s) provide some potential reasons or explanations?
We tune the Nonstationary transformer architecture (number of encoder layers and attention heads) for each dataset for the best possible performance in our experiments. Its poor performance can be attributed to two main reasons:
(1) The Nonstationary transformer was originally designed for time-series forecasting and employs an encoder-decoder architecture. **To successfully apply its core module, stationarization-destationarization [1], the input and output spaces need to remain consistent.** This consistency is naturally ensured in an encoder-decoder design. In our classification applications, however, we only utilize the encoder module. Although we maintain the input-output dimensions, the semantics of the latent space and the input space are not the same, so destationarization is not very successful.
(2) The Nonstationary transformer's inputs consist of raw time-series data with positional encoding. Given the fine-grained nature of the current tasks, such an approach can be data-hungry, as it tries to establish attention relations among every pair of time steps. Therefore, it may not perform well on short-range classification tasks that focus on domain generalization. **This indicates a limitation in using only its encoder with a classification head to directly optimize a categorical objective.**
However, through our ablation studies and further analyses, shown in Figure 4 of the paper, we demonstrate that by applying PhASER principles and redesigning the Nonstationary transformer (updating the input space with the short-time Fourier transform magnitude and phase, and diversifying with Hilbert augmentation), we achieve significant performance improvements. On the WISDM dataset, the average performance improves from 0.40 to 0.83, and on HHAR, from 0.24 to 0.78. Put simply, we instantiate the temporal encoder ($F_\mathrm{Tem}$) as a Nonstationary transformer encoder. Although we currently do not find this PhASER + Nonstationary transformer variant to be the optimal design, it may prove promising in the future for more general applications that generalize across domains in forecasting tasks.
[1] Liu, Y., et al. Non-stationary transformers: Exploring the stationarity in time series forecasting. NeurIPS, 2022.
[2] Qin, X., et al. Generalizable low-resource activity recognition with diverse and discriminative representation learning. ACM KDD, 2023.
[3] Um, T. T., et al. Data augmentation of wearable sensor data for Parkinson's disease monitoring using convolutional neural networks. Proceedings of the 19th ACM International Conference on Multimodal Interaction, 2017.
## A Gentle Reminder of Further Feedback from Reviewer YZm2
Dear Reviewer YZm2,
This is a gentle reminder that we have now been in the discussion phase for several days, and we are keenly anticipating your response. We deeply value the time and effort you have dedicated to reviewing our paper and contributing to its enhancement.
Thank you again for the detailed and constructive reviews. We hope our response addresses your comments on combining other standard data augmentations with our method and on the further analysis of the Nonstationary transformer. We take this as a great opportunity to improve our work and shall be grateful for any additional feedback you could give us.
Best Regards,
Authors of Paper 2154
## Second Reminder
Dear Reviewer YZm2,
The response period concludes tomorrow and we would like to express our sincere gratitude for your feedback. We kindly remind you that after the 4th, we will no longer be able to address any further questions you may have.
We would greatly appreciate any feedback on our rebuttal. We fully understand that you may be busy at this time, but hope that you could kindly have a quick look at our responses and assess whether they have addressed your concerns and warrant an update to the rating. We would also welcome any additional feedback and questions.
Best Regards,
Authors of Paper 2154
# Reviewer wA1N
Thank you for taking the time to read and review our work. We are happy that you acknowledged our method's novelty and design. We address your questions and comments below.
## Weaknesses
> 1. Motivation of employment of the Hilbert transform. We need better understanding for this.
In our framework, PhASER, the role of the Hilbert transform is to generate an augmented view of each sample with a different set of non-stationary statistics. We hypothesize that the phase information of a signal embeds a non-stationary dictionary of the signal, and by using the Hilbert transform we can deterministically shift the phase response by $\pi / 2$, thereby diversifying the pool of training data in terms of non-stationarity. It is widely accepted and proven in domain generalization (DG), from both empirical and theoretical perspectives, that the generalization ability of models depends strongly on the quantity and diversity of the training data [1]. Note that Hilbert-transform-based augmentation leaves the magnitude response of a signal untouched and only manipulates the phase response of its spectrum. For classification tasks, the key semantics are derived from the magnitude response. Our pilot analyses in Table 1 of the main paper empirically support this: training the same backbone network with only magnitude-derived or only phase-derived features shows that using phase information alone drops performance by at least 19% compared to magnitude features. Hence, phase-shifting alone does not corrupt the semantic information relevant to the classification task, but it does diversify the non-stationary statistics of the training dataset. As an illustrative example in [Figure Phase_Sinusoid](https://drive.google.com/file/d/1h9qO1fW_--brIm_inMGirCAb-zH6498h/view?usp=drive_link), we show that for two sinusoids of different degrees of non-stationarity (signal 1, stationary; signal 2, non-stationary), naive phase mixing, i.e., combining the phase response of signal 1 with the magnitude response of signal 2 in the frequency domain and then taking the inverse Fourier transform, creates a new sinusoid with the same magnitude response as signal 2 but with distinct non-stationarity.
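This naive phase-mixing illustration can be reproduced in a few lines (a sketch with our own toy signals, not the exact signals in the linked figure):

```python
import numpy as np

t = np.linspace(0, 1, 256, endpoint=False)
s1 = np.sin(2 * np.pi * 3 * t)               # signal 1: stationary sinusoid
s2 = np.sin(2 * np.pi * (3 + 4 * t) * t)     # signal 2: chirp (non-stationary)

S1, S2 = np.fft.rfft(s1), np.fft.rfft(s2)
# keep the magnitude response of signal 2, borrow the phase response of signal 1
mixed = np.fft.irfft(np.abs(S2) * np.exp(1j * np.angle(S1)), n=len(t))
```

The mixed signal matches signal 2's magnitude spectrum on all non-DC, non-Nyquist bins, while its temporal structure, and hence its non-stationarity, differs.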
In principle, any phase shift can diversify non-stationarity, but real-world signals are not pure sinusoids, and accurately phase-shifting a signal with arbitrary frequency components is difficult. Hence, we adopt the Hilbert transform as a tool to carry out a $\pi / 2$ phase shift on our dataset.
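This property of the discrete Hilbert transform, an untouched magnitude spectrum with every non-DC, non-Nyquist component shifted by $\pi / 2$, can be checked directly with `scipy.signal.hilbert` (a sanity-check sketch, not the paper's pipeline code):

```python
import numpy as np
from scipy.signal import hilbert

t = np.linspace(0, 1, 512, endpoint=False)
x = np.sin(2 * np.pi * 5 * t)        # 5 Hz test tone
xh = np.imag(hilbert(x))             # Hilbert transform of x (~ -cos term)

X, Xh = np.fft.rfft(x), np.fft.rfft(xh)
# phase difference at the active bin, wrapped to (-pi, pi]
k = 5
dphi = np.angle(X[k]) - np.angle(Xh[k])
dphi = (dphi + np.pi) % (2 * np.pi) - np.pi
```

The magnitude spectra agree on all interior bins, and the phase at the 5 Hz bin differs by exactly $\pi / 2$.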
Additionally, the intuition behind the superior performance of PhASER's design elements, separate encoding followed by fusion of magnitude and phase, and reintroduction of phase as a residual connection, can be attributed to the fact that a signal can be reconstructed from only its magnitude or only its phase. Theoretically, if only the magnitude response of a signal is available, the time-domain signal can be reconstructed under the zero-phase assumption; similarly, a phase-only response permits reconstruction under the unit-magnitude assumption [4].
## Questions
> 1. With augmenting using Hilbert transform and subsequent STFT, seems the phase information is mixed together. Please clarify its merits and possible relation to self-attention.
In the proposed PhASER, the Hilbert transform is leveraged to augment the source-domain dataset $\mathrm{S}$ into a phase-shifted version $\hat{\mathrm{S}}$, and we use the merged dataset $\mathrm{S}^\prime = \mathrm{S} \cup \hat{\mathrm{S}}$ to train the model. The STFT is then applied to $\mathrm{S}^\prime$ to obtain temporal magnitude and phase responses, which are fed into separate magnitude and phase encoders. As a result, the use of the Hilbert transform and STFT should not be interpreted as any kind of phase mixing or enhancement; rather, it is a form of data preprocessing before the data enters the model. Moreover, we demonstrate that separating the magnitude and phase features is a superior design, leveraging the intuition of magnitude-only and phase-only reconstruction to allow the model to learn distribution-agnostic, task-specific features [4].
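This preprocessing order (augment first, then split each view's STFT into magnitude and phase channels) can be sketched as follows; the function name and the `fs`/`nperseg` values are our own illustrative choices, not the paper's code:

```python
import numpy as np
from scipy.signal import hilbert, stft

def preprocess(x, fs=50.0, nperseg=64):
    """Return (magnitude, phase) STFT channels for the original window
    and its Hilbert-augmented (pi/2 phase-shifted) view."""
    views = [x, np.imag(hilbert(x))]           # S and its augmented copy
    channels = []
    for v in views:
        _, _, Z = stft(v, fs=fs, nperseg=nperseg)
        channels.append((np.abs(Z), np.angle(Z)))
    return channels

x = np.sin(2 * np.pi * 5 * np.linspace(0, 2, 200, endpoint=False))
(mag, phase), (mag_aug, phase_aug) = preprocess(x)
```

Each view yields a non-negative magnitude map and a phase map bounded by $\pi$, which are then consumed by the separate encoders.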
We believe a more accurate analogy for our design of reintroducing phase information as a residual connection deeper in the network for broadcasting is implicit regularization: it guides the model toward learning nonstationarity-invariant patterns more effectively.
> 2. For theorem 2.5, what are the connections between the theorem and the proposed method. Any items in the upper bound corresponding to the Hilbert transform? How tight is the bound? For non-stationary time series, the bound should be pretty loose if we do not have regularities on the extent of the non-stationarity.
In the insight discussion of Theorem 2.5 (please refer to lines 275 to 290, left column), we introduce the detailed connections between the theorem and the proposed PhASER framework. Specifically, the first term of the upper bound is the expected disagreement on the target domain, whose data is unavailable in the DG setting; hence, we cannot directly optimize this term during training. The second term carries a coefficient $\epsilon$ corresponding to the maximum $\beta$-Divergence within the source domains. In Theorem 2.5, we specify the derivation that uses nonstationary statistics to represent the $\beta$-Divergence (please refer to Eq. (14) and Definition 2.1). We also note that a direct or indirect approximation of the $\beta$-Divergence is infeasible, as it is defined on the raw feature space of the data [2]; consequently, nonstationarity cannot be directly or indirectly regularized in the optimization objective. Instead, we take another route: associating the $\beta$-Divergence (i.e., non-stationarity) with the classification task via a residual connection. In other words, if classification can be made insensitive to non-stationarity, the model tends not to learn non-stationarity, since doing so does not improve classification. The factor $\left[\mathrm{e}_{\mathcal{D}_{\bar{\mathrm{U}}}}(\rho) \right]^{1-\frac{1}{q}}$ in the second term is the empirical risk of the source domains, which can be approximated with the available source-domain training data; we minimize it with the standard cross-entropy loss.
Although no item in Theorem 2.5 corresponds directly to the Hilbert transform, the second term $\epsilon \cdot \left[\mathrm{e}_{\mathcal{D}_{\bar{\mathrm{U}}}}(\rho) \right]^{1-\frac{1}{q}}$ is influenced by it: supposing the Hilbert transform does diversify non-stationarity, it provides augmented data for a better approximation of the empirical risk. As for the tightness of the bound, no prior work establishes a theoretical connection between time-series non-stationarity and a model's out-of-domain generalization, so we extend the tightest generalization bound of standard DG [3] ($\beta$-Divergence is much tighter than disagreement-based domain discrepancy [2]) to time-series DG and provide a general generalization bound for this topic (which we hope will inspire future work). We stress again that Theorem 2.5 is built to provide insights for our design (telling us which factor needs to be addressed and how), and is not dedicated to PhASER; it therefore cannot be interpreted as a generalization bound on the final convergence of PhASER. For this reason, we believe the tightness of Theorem 2.5 is not affected by whether non-stationarity is regularized.
[1] Wang, J., et al. Generalizing to unseen domains: A survey on domain generalization. IEEE Transactions on Knowledge and Data Engineering, 2022.
[2] Germain, P., et al. A new PAC-Bayesian perspective on domain adaptation. ICML, 2016.
[3] Yang, M., et al. Invariant learning via probability of sufficient and necessary causes. NeurIPS, 2023.
[4] Hayes, M., Lim, J., and Oppenheim, A. Signal reconstruction from phase or magnitude. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28, 1980.
## First Reminder
Dear Reviewer wA1N,
This is a gentle reminder that we have now been in the discussion phase for several days, and we eagerly await your response. We greatly appreciate your time and effort in reviewing this paper and helping us improve it.
Thank you again for the detailed and constructive reviews. We hope our response is able to address your comments related to further interpretation of employing Hilbert Transform in time-series classification, and the general clarification of our proposed time-series domain generalization bound (Theorem 2.5). We take this as a great opportunity to improve our work and shall be grateful for any additional feedback you could give us.
Best Regards,
Authors of Paper 2154
## Thanks to Reviewer wA1N and a kind request to raise the score
Dear Reviewer wA1N,
Thank you for your feedback. We wonder whether your concerns and questions have been resolved, and we are available for any further questions you may have. We kindly request your consideration in raising the score if there are no lingering concerns. We value your input and sincerely hope all your concerns can be addressed during the current discussion phase. Your endorsement would greatly enhance our work's quality and chance of acceptance.
Best Regards,
Authors of Paper 2154
## Thanks to Reviewer wA1N and a kind request to raise the score (reminder)
Dear Reviewer wA1N,
As the response period ends tomorrow, we would like to express our sincere gratitude for your feedback. We kindly remind you that after the 4th, we will no longer be able to address any further questions you may have.
We are writing to ensure that our responses adequately address all the concerns and questions you raised during the author-reviewer discussion phase. We have diligently worked to address each point raised in your initial review, aiming to improve the clarity and quality of our work.
Your feedback is invaluable to us, and we eagerly await any potential updates to your ratings. Your thoughtful evaluation is critical in the assessment of our paper, and we genuinely hope that our responses have effectively resolved your concerns and provided satisfactory explanations.
Thank you once again for your dedication and time.
Best regards,
Authors of Paper 2154