Scott Chang
# ICLR 2024 Rebuttal (CODA)

## Rebuttal Summary

### Summary of Rebuttal

We thank all the reviewers and the AC for their time and effort in evaluating our work. We are pleased that all of the reviewers found our experiments well-established and effective, and we sincerely value their constructive comments during the rebuttal session.

We are glad that the unclear parts for Reviewer VxLr were addressed during the rebuttal session; after re-evaluating the first revised version, Reviewer VxLr raised the score. Reviewers BJ2w and q1Jd maintained their scores of 6 after our responses, indicating that we addressed their concerns in the current version without raising further issues.

After the second round of responses for Reviewer VxLr, we firmly believe that we addressed the reviewer's three main concerns:
1. We updated the claims and motivations in our paper.
2. We clarified the novelty that distinguishes CODA from existing approaches (in fact, we propose a brand-new, model-agnostic branch of solution with flexibility and transferability for architecture-type exploration).
3. We conducted more experiments on two real-world datasets to make our experimental results more solid.

As for the second round of responses for Reviewer GM2p, we also firmly believe that we addressed all of the reviewer's follow-up questions:
1. We detailed our experiments on high-dimensional data.
2. We elaborated on the novel perspective behind CODA (the experiments added in our second round of responses for Reviewer VxLr further support this point).
3. We explained why the conditional data generator idea mentioned by the reviewer is not feasible; it is, in fact, out of our work's scope and unrelated to our work.

**With the clarification and extra experiments in the rebuttal, we believe we have resolved all the reviewers' concerns and look forward to positive feedback.**

### Contributions of Our Work

In this work, we propose a model-agnostic framework to address concept drift from a novel, data-centric perspective. The main motivation behind our approach is to "nip the problem in the bud": the root cause of concept drift lies in the temporal evolution of data, and our solution directly tackles this by generating future data for model training. Besides introducing a novel perspective in TDG, the experimental results demonstrate the effectiveness of our solution by achieving SOTA. We believe the proposed perspective will benefit further research in TDG.

## Reviewer VxLr

We thank the reviewer for the constructive comments and appreciate the recognition of the effectiveness of our work.

**W1: Clarification of the Motivation**

*"Concerning using a data-centric approach over a model-centric method in TDG."*

[AW1]: We consider model-centric and data-centric approaches as parallel strategies, and our goal is not to position one approach against the other. Our main motivations are as follows:
1. **Nip the problem in the bud**
   - We believe the fundamental cause of concept drift is the underlying temporal trend of the data distribution over time. Therefore, with the generated future data, TDG can be achieved by training a prediction model on an i.i.d. dataset.
   - In other words, **the root cause of concept drift lies in the temporal evolution of data**. Our solution directly tackles this problem from the data perspective, i.e., it achieves TDG by generating future data for training a prediction model.
2. **Flexibility and transferability for architecture-type exploration**
   - It is evident that the most effective model architecture can vary across datasets and downstream tasks, e.g., MLP, tree-based, and Transformer-based backbones. This observation is supported by Table 1 of our paper, which shows that the best-performing architecture differs among the five datasets.
   - However, existing model-centric methods are limited to specific model architectures. In contrast, our data-centric CODA framework offers flexibility for exploring various architectures by providing transferable training datasets, which are adaptable for training different backbone architectures. For a detailed analysis of this adaptability, refer to "Cross-Architecture Transferability" in Section 4.3.

**W2: Why not generate data in representation space?**

[AW2]: Although the ultimate goal is to train a predictor (as mentioned in the response to Weakness 1), we generate instances that simulate the future data distribution in order to remain model-agnostic: the efficacy of an architecture choice, such as tree-based or Transformer-based models, varies across datasets and downstream tasks, as supported by the results in Table 1. Generating data in representation space cannot guarantee model-agnosticism, because a fixed encoder is required for the representations to be effective.

**W3: Contributions and Novelties**

*"The authors need to explicitly distinguish the contributions of existing works [1][2] and novelties in light of these studies."*

[AW3]: Our main contribution and novelty is a data-centric (model-agnostic) TDG framework that uses feature correlation matrices **to simplify the challenge of capturing the temporal trend**. The challenge of capturing the temporal trend among multiple time points is two-fold:
1. In most real-world benchmarks, **we do not have a sample index for each data instance across time domains**, so we cannot treat each instance as a time series and model its temporal evolution pattern. Therefore, it is impossible to predict future features per sequence; we can only capture the trend of the data distribution along the time domains and generate the future dataset/samples.
2. An alternative is to capture the underlying temporal trend among multiple datasets (distributions) with a kernel density estimation method, which is **computationally infeasible and hardly generates effective training data** (details of the analysis are in Section 3.1).

**Novelties and contributions**
- For the two missed related works [1][2], we have added them to our references. Although they are also generative methods, both data augmentation in latent space [1] and the generation of augmented features within latent space [2] are **not model-agnostic**, because predictors that are not trained with the same encoder cannot recognize the augmented embeddings or features.
- We also ran CODA on Rot-MNIST for performance comparison, using the same encoder structure as [1] (MNIST ConvNet) and training three predictors with different architectures. All three predictors trained on the dataset generated by CODA outperform [1, 2] and the other baselines, which again demonstrates the **effectiveness and transferability of CODA**. We have added the experimental results in Appendix G; the added section title is highlighted in blue.
| Frameworks | Sine | Rot-MNIST |
| :----: | :----: | :----: |
| LSSAE [1] | 36.8 $\pm$ 1.5 | 16.6 $\pm$ 0.7 |
| DDA [2] | 1.6 $\pm$ 0.9 | 13.8 $\pm$ 0.3 |
| GI | 33.2 $\pm$ 0.7 | 7.7 $\pm$ 1.3 |
| DRAIN | 3.0 $\pm$ 1.0 | 7.5 $\pm$ 1.1 |
| CODA (MLP) | 2.7 $\pm$ 0.9 | **6.0 $\pm$ 1.2** |
| CODA (LightGBM) | **1.2 $\pm$ 0.4** | **5.8 $\pm$ 0.6** |
| CODA (FT-Transformer) | **1.1 $\pm$ 0.4** | **6.3 $\pm$ 0.5** |

[1] Tiexin Qin, et al. "Generalizing to evolving domains with latent structure-aware sequential autoencoder." ICML 2022.
[2] Qiuhao Zeng, et al. "Foresee what you will learn: Data augmentation for domain generalization in non-stationary environment." AAAI 2023.

**W4: Why model the correlation between two consecutive domains?**

[AW4]: We would like to clarify this misunderstanding. We do not model the correlation between two consecutive time domains; we use an LSTM to model the temporal trend of **feature correlation matrices over all the time domains in the training datasets**.
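To make this setup concrete, here is a minimal sketch (our illustration, not the paper's code) of the data-side bookkeeping: each time domain is summarized by a feature correlation matrix with the label appended as the final column (as described in our response to Q1), and a naive linear extrapolation stands in for the LSTM-based Correlation Predictor $H(\cdot)$:

```python
import numpy as np

def domain_corr_matrix(X, y):
    """Correlation matrix over [features | label]; the label occupies the
    final row/column, so feature-label correlations are included."""
    Z = np.column_stack([X, y])
    return np.corrcoef(Z, rowvar=False)

# Toy drifting domains: the feature-label correlation strengthens over time.
rng = np.random.default_rng(0)
mats = []
for t in range(5):
    x = rng.normal(size=500)
    y = 0.2 * t * x + rng.normal(size=500)
    mats.append(domain_corr_matrix(x[:, None], y))

# Stand-in for the Correlation Predictor H(.): the paper uses an LSTM over
# C_1..C_T; a linear extrapolation of the last two matrices is used here
# purely for illustration.
C_hat_next = mats[-1] + (mats[-1] - mats[-2])
```

In the actual framework, a sequence model would replace the last line, consuming the sequence $\mathcal{C}_1,\dots,\mathcal{C}_T$ and emitting $\mathcal{\hat{C}}_{T+1}$.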
Eq.(3) optimizes the loss between the predicted $\mathcal{\hat{C}}_t$ and the ground truth $\mathcal{C}_{t}$ given $\mathcal{C}_{1}$ to $\mathcal{C}_{t-1}$.
- As mentioned in the response to your Weakness 3, the two challenges of capturing the temporal trend from datasets are:
  1. We do not have a sample index for each data instance across time domains, so we cannot capture the temporal trend of each sample independently; instead, we must capture the temporal trend among multiple data distributions.
  2. However, our preliminary experiments and analysis show the infeasibility of directly modeling the temporal evolution among data distributions (refer to Section 3.1).
- To this end, the core idea of our solution is **to simplify the data distribution at each time domain so that the underlying temporal trend can be captured better**. In this work, we achieve this simplification with feature correlation matrices and provide a theoretical analysis of the rationale for representing a data distribution by its feature correlation matrix (refer to Section 3.4).
- We want to emphasize that while numerous methods exist to simplify dataset information, **our pioneering research introduces a natural and theoretically supported data-centric approach for this purpose**.

**W5 & Q4: Clarification of the Data Simulator and how the estimated correlation matrix is incorporated for data generation.**

[AW5 & AQ4]: Based on the current data distribution $\mathcal{D}_{T}$, the Data Simulator ${G}(\mathcal{D}_{T} ; \mathcal{\hat{C}}_{T+1} | \theta_{G})$ can simulate the future data distribution $\mathcal{\hat{D}}_{T+1}$ subject to the predicted correlation matrix $\mathcal{\hat{C}}_{T+1}$. We further explain the details as follows:
- Specifically, our CODA framework comprises two replaceable components: the Correlation Predictor ${H}(\cdot)$ (see Section 3.2) and the Data Simulator $G(\cdot)$ (see Section 3.3).
  They can essentially be substituted by other models that perform similar functions, where $G(\cdot)$ should be a generative model that can incorporate prior knowledge into data generation. In our case, the prior knowledge is the predicted future correlation matrix $\mathcal{\hat{C}}_{T+1}$, as described in Eq.(4) and Eq.(5).
- Simultaneously, the trained $G(\cdot)$ should learn a data distribution similar to that of the current domain $\mathcal{D}_{T}$. This is based on the assumption that distribution shifts are smooth and that nearby time domains are closely related (refer to assumption (iii) in Theorem 1).
- In our experiments, we instantiate $G(\cdot)$ with a generative model that jointly learns the encoder and decoder of a VAE-based generative model together with a learnable graph. It can therefore **treat the prior knowledge as an adjacency matrix and encourage the learned graph to be similar to the given prior (refer to Sections 3.3 and 4.1)**.
- Therefore, based on the current data distribution $\mathcal{D}_{T}$, ${G}(\mathcal{D}_{T} ; \mathcal{\hat{C}}_{T+1} | \theta_{G})$ can simulate the future data distribution $\mathcal{\hat{D}}_{T+1}$ subject to the predicted correlation matrix $\mathcal{\hat{C}}_{T+1}$.

**W6: Experimental Results on Commonly Used Benchmarks**

*"Several commonly used benchmark data sets are also missing, including both synthetic (e.g., Circle, Sine) and real (e.g., RMNIST, Portraits, Ocular, Caltran, WILDS) data sets."*

[AW6]: We have considered diverse concept drift patterns:
- Synthetic datasets: Besides 2-Moons, we experimented on the Sine dataset, shown in the table below.
- Real datasets: Besides the real-world datasets used in our experiments (Elec2, ONP, Shuttle, and Appliance), we conducted one more experiment on Rot-MNIST, using the same encoder structure as [1] (MNIST ConvNet) and training three predictors with different architectures; the results are shown in the table below. All three predictors trained on the dataset generated by CODA outperform the other baselines. Furthermore, the differences among the three trained predictors support one of our contributions: the model-agnostic CODA framework is flexible for exploring the best architecture for different datasets and downstream tasks. We have added the experimental results in Appendix G; the added section title is highlighted in blue.
| Frameworks | Sine | Rot-MNIST |
|:---------------------:|:-----------------:|:-----------------:|
| LSSAE [1] | 36.8 $\pm$ 1.5 | 16.6 $\pm$ 0.7 |
| DDA [2] | 1.6 $\pm$ 0.9 | 13.8 $\pm$ 0.3 |
| GI | 33.2 $\pm$ 0.7 | 7.7 $\pm$ 1.3 |
| DRAIN | 3.0 $\pm$ 1.0 | 7.5 $\pm$ 1.1 |
| CODA (MLP) | 2.7 $\pm$ 0.9 | **6.0 $\pm$ 1.2** |
| CODA (LightGBM) | **1.2 $\pm$ 0.4** | **5.8 $\pm$ 0.6** |
| CODA (FT-Transformer) | **1.1 $\pm$ 0.4** | **6.3 $\pm$ 0.5** |

**Q1: Do the correlation matrices include the label information?**

[AQ1]: **Yes, they include label information.** In a feature correlation matrix, each row and column corresponds to a specific feature, and the final row and column represent the label. Each cell indicates the degree of correlation between a pair of features. Furthermore, **Section 3.4** presents a theoretical analysis that guarantees the consistency of our feature correlation estimation under **three assumptions that can be easily satisfied in reality**.

**Q2: Explanation of Eq.(3)**

[AQ2]: The three regularization terms in Eq.(3) are explained as follows:
1. The $\ell_1$-norm encourages sparsity in the predicted $\mathcal{\hat{C}}_t$: it effectively "zeroes out" less important entries, and the correlation matrices are generally sparse (as shown in Appendix Figure 9).
2. The $\ell_2$-norm is sensitive to large errors and penalizes $\mathcal{\hat{C}}_t$ for substantial deviations, promoting overall reconstruction accuracy.
3. The cross-entropy loss $\mathcal{L}_{CE}$ measures how well the distribution of $\mathcal{\hat{C}}_t$ matches the ground-truth distribution of $\mathcal{C}_t$.

Although the error between the predicted $\mathcal{\hat{C}}_{t+1}$ and the ground truth $\mathcal{C}_{t+1}$ is already minimal (as shown in Figure 10 in the Appendix), there is still potential to enhance the Correlation Predictor.
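For intuition only, the three terms might be combined as in the following sketch; the weights `lam1`/`lam2`/`lam3` and the particular cross-entropy instantiation are our illustrative assumptions, with Eq.(3) in the paper defining the actual objective:

```python
import numpy as np

def correlation_loss(C_hat, C, lam1=0.01, lam2=1.0, lam3=0.1, eps=1e-8):
    """Illustrative combination of the three regularizers discussed above."""
    l1 = np.abs(C_hat).sum()                 # sparsity on the prediction
    l2 = ((C_hat - C) ** 2).sum()            # penalize large errors
    # Cross-entropy between normalized magnitude "distributions" of the
    # ground-truth and predicted matrices (one possible instantiation).
    p = np.abs(C).ravel() + eps
    p /= p.sum()
    q = np.abs(C_hat).ravel() + eps
    q /= q.sum()
    ce = -(p * np.log(q)).sum()
    return lam1 * l1 + lam2 * l2 + lam3 * ce

C_true = np.array([[1.0, 0.5], [0.5, 1.0]])
perfect = correlation_loss(C_true, C_true)        # only l1 + entropy terms
off = correlation_loss(0.5 * C_true + 0.1, C_true)  # adds reconstruction error
```

A perturbed prediction incurs a strictly larger loss than the exact matrix, as expected from the $\ell_2$ and cross-entropy terms.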
As one of the future directions, this could be achieved with a more sophisticated sequential prediction framework than an LSTM.

**Q5: Connection between Theorem 1 and the Proposed Method (e.g., Eq.(5))**

*"Theorem 1 states that for two random vectors, if they are bounded and their distributions are close, then the difference between their correlation matrices is also bounded. But how is this related to the algorithm?"*

[AQ5]: Theorem 1 serves as the theoretical foundation for using the prior knowledge (the predicted feature correlation matrix $\mathcal{\hat{C}}_{T+1}$) in the Data Simulator.
- As mentioned in our responses to Weaknesses 3 and 4, directly capturing the temporal trend among multiple time domains is computationally infeasible (refer to Section 3.1). Our framework addresses this by representing the data distribution at each time domain by its feature correlation matrix, which effectively captures the temporal trend. This simplification can faithfully represent the original distribution information only if Theorem 1 holds.
- Based on the analysis in Section 3.4, we conclude that the three assumptions in Theorem 1 can be easily satisfied in reality, which provides the theoretical foundation for the simplification.

---

## Re: Response to the rebuttal (VxLr)

Q1: Motivations and novelties: while the authors claim that "We consider model-centric and data-centric approaches as parallel strategies", the motivations in the paper are still the same. In fact, the authors did not revise the paper to highlight this point at all. Regarding the novelty, I agree that model-agnostic can be considered as a benefit of CODA (though still not emphasized enough in the paper), but other than that, I cannot see fundamental improvements over [1], [2]. In particular, [1] [2] also face challenge 1, and challenge 2 is not prominent in [1] [2] as they generate samples in the representation space.

Q2: The experiments are still not solid enough. RMNIST is probably one of the simplest real-world data sets in TDG.

=====================================================

We appreciate the reviewer adjusting the score in light of our clarifications, and we are glad to further address the remaining concerns.

**[AQ1-1]: Revised the paper.** Thanks to the reviewer for reminding us of the points that should be revised. We have updated the claims and motivations, highlighted in blue in the Introduction section.

**[AQ1-2]: Novelty.** We would like to emphasize that our work proposes **a new branch of solution** to the concept drift problem. We argue that such a new branch is itself novel and serves as a fundamental improvement over the existing literature. Additionally, as a new-branch solution, it is not obliged to improve the approaches of another branch. Our novelty lies in:
1. **Developing a new branch of solution** for addressing the concept drift problem from a data-centric perspective.
2. Our **model-agnostic** CODA framework provides **flexibility and transferability for architecture-type exploration**.

Although [1] and [2] face similar scenarios, their approaches train predictors with **specific encoders**. This setup is not model-agnostic, since the generated embeddings cannot be recognized by predictors and decoders that were not trained with the same encoders.

**[AQ2]:** As the reviewer mentioned, Rot-MNIST is also a real-world dataset with concept drift. Besides, in Table 1, we evaluated on four additional real-world concept drift datasets (Elec2, ONP, Shuttle, Appliance). We believe our experiments on the synthetic Sine dataset and the real-world Rot-MNIST dataset are sufficient. We are trying our best to add one or two more real-world datasets before the rebuttal deadline; thank you for your understanding.

**[AQ2 Part 2]:** We have added two more real-world TDG datasets, Portraits and Forest Cover. For Portraits, we use the same encoder as [2] (Wide ResNet) before the Correlation Predictor module. We have added the experimental results in Appendix G.

| Frameworks | Portraits | Forest Cover |
|:---------------------:|:-----------------:|:------------------:|
| LSSAE [1] | 6.9 $\pm$ 0.3 | 36.8 $\pm$ 0.4 |
| DDA [2] | 5.1 $\pm$ 0.1 | 34.7 $\pm$ 0.5 |
| GI [3] | 6.3 $\pm$ 0.2 | 36.4 $\pm$ 0.4 |
| CODA (MLP) | 5.1 $\pm$ 0.1 | **34.4 $\pm$ 0.4** |
| CODA (LightGBM) | 6.2 $\pm$ 0.1 | **33.0 $\pm$ 0.3** |
| CODA (FT-Transformer) | **4.9 $\pm$ 0.2** | **33.7 $\pm$ 0.3** |

[1] Tiexin Qin, et al. "Generalizing to evolving domains with latent structure-aware sequential autoencoder." ICML 2022.
[2] Qiuhao Zeng, et al. "Foresee what you will learn: Data augmentation for domain generalization in non-stationary environment." AAAI 2023.
[3] Anshul Nasery, et al. "Training for the Future: A Simple Gradient Interpolation Loss to Generalize Along Time." NeurIPS 2021.
**With the discussion and added results above, we hope that we have resolved all the reviewer's concerns, and we look forward to clarifying any further questions that may arise.**

---

## Reviewer BJ2w

We thank the reviewer for the constructive comments and appreciate the recognition of the effectiveness of our work.

**Q1: Difference between the Proposed Framework and Model-centric Methods**

*"The generated data would still be utilized as the training data for prediction models. Would it still go back to model-centric strategies?"*

[AQ1]:
- Our framework focuses on solving the problem by generating effective training data, which identifies it as a data-centric paradigm.
- The *data-centric paradigm* involves methods for building effective training data; in contrast, *model-centric* methods focus on identifying more effective model designs trained on the original data [1].
- Our framework addresses the concept drift issue by generating future datasets for model training, achieving a model-agnostic approach to exploring different model architectures for downstream tasks.

[1] Daochen Zha, et al. "Data-centric artificial intelligence: A survey." arXiv:2303.10158.

**Q2: Baseline Comparison**

*"The generated data is then used to train models, would it be unfair for comparison methods? Should the comparison methods also use the same generated data to fine-tune?"*

**[Q2-1]: Fair performance comparison.**

[AQ2-1]: **Yes, the performance comparison is fair.**
In our experiments, the MLP predictor trained on the data generated by CODA uses the same architecture as the baseline DRAIN. Instead of designing predictor model structures, our approach focuses on the quality and efficacy of the generated training data.

**[Q2-2]: Should the comparison methods also use the same generated data to fine-tune?**

[AQ2-2]: **The other baselines cannot be trained on a single time domain's dataset alone.**
- CODA simulates the one-domain-ahead data for model training. In contrast, the other baselines require all training domains to fine-tune their whole models or frameworks and therefore cannot be trained using data from a single domain alone.

---

## Reviewer q1Jd

We thank the reviewer for the constructive comments and appreciate the recognition of the effectiveness of our work.

**W1: Limited dynamic network adaptability compared to some existing methods.**

[AW1]: We are not sure about the content of this weakness; we read it as claiming that CODA can only be used with certain deep neural networks. Based on this understanding, we believe **this is a misunderstanding**, for the following reasons:
- Our main contribution and novelty is a data-centric (model-agnostic) TDG framework that uses feature correlation matrices to simplify the challenge of future data generation. With the generated future data, TDG can be achieved by training a prediction model on an i.i.d. dataset.
- Our model-agnostic CODA framework offers flexibility for exploring various architectures by providing transferable training datasets. The datasets generated by CODA are adaptable for training different backbone architectures, as demonstrated in Table 1.
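As a small illustration of this transferability claim (a sketch under our own toy assumptions, not the paper's pipeline), the same generated table can train heterogeneous backbones without any shared encoder; here `GradientBoostingClassifier` stands in for LightGBM:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier

# Stand-in for a CODA-generated future dataset D_hat_{T+1}; in the
# framework this would come from the Data Simulator G(D_T; C_hat_{T+1}).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# The same generated data trains different backbone architectures --
# the model-agnostic property discussed above.
backbones = {
    "MLP": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
    "GBDT": GradientBoostingClassifier(random_state=0),  # LightGBM stand-in
}
scores = {name: model.fit(X, y).score(X, y) for name, model in backbones.items()}
```

Because the generated data lives in the original feature space rather than an encoder's latent space, nothing ties the downstream predictor to a particular architecture.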
For a detailed analysis of this adaptability, refer to "Cross-Architecture Transferability" in Section 4.3.

In sum, being free from a specific model architecture for all downstream tasks is one of our main contributions rather than a limitation.

**W2. Constrained application in model-agnostic learning scenarios.**

[AW2]: We are also not certain what this weakness refers to. We interpret the reviewer's point as "CODA can only be leveraged for model-agnostic tasks." If our understanding is correct, then we believe **this is also a misunderstanding**, for the following reasons:
- "Model-agnostic" is a property of *approaches* rather than of *downstream tasks* [1]. This merit provides the flexibility to explore the predictor architecture best suited to whatever task or scenario is at hand.
- The datasets generated by CODA are adaptable for training different backbone architectures, as demonstrated in Table 1. For a detailed analysis of this adaptability, refer to "Cross-Architecture Transferability" in Section 4.3.

[1] Daochen Zha, et al., "Data-centric artificial intelligence: A survey," arXiv:2303.10158

**W3 & Q1: Effectiveness on High-dimensional Data.**

*"In the context of high-dimensional data, how does CODA maintain performance efficiency?"*

[AW3 & AQ1]: Based on Theorem 1 and our analysis in Section 3.4, we agree that feature correlation matrices may not effectively represent high-dimensional data.
**However, this does not mean that the proposed CODA framework can only work on low-dimensional data.** We evaluated CODA on a high-dimensional dataset (Rotated MNIST) for performance comparison, shown in the table below. We use the same encoder structure as [1] (the MNIST ConvNet) and train predictors with three different architectures. The results show that all three predictors trained on the dataset generated by CODA outperform the other baselines. Furthermore, the differences among the three trained predictors support one of our contributions: the proposed model-agnostic CODA framework is flexible enough to explore the best architecture for different datasets and downstream tasks. We have added the experimental results in Appendix G, and the added section title is highlighted in blue.

| Frameworks | Sine | Rot-MNIST |
| :----: | :----: | :----: |
| LSSAE[2] | 36.8 $\pm$ 1.5 | 16.6 $\pm$ 0.7 |
| DDA[3] | 1.6 $\pm$ 0.9 | 13.8 $\pm$ 0.3 |
| GI | 33.2 $\pm$ 0.7 | 7.7 $\pm$ 1.3 |
| DRAIN | 3.0 $\pm$ 1.0 | 7.5 $\pm$ 1.1 |
| CODA (MLP) | 2.7 $\pm$ 0.9 | **6.0 $\pm$ 1.2** |
| CODA (LightGBM) | **1.2 $\pm$ 0.4** | **5.8 $\pm$ 0.6** |
| CODA (FT-Transformer) | **1.1 $\pm$ 0.4** | **6.3 $\pm$ 0.5** |

[2] Tiexin Qin, et al., "Generalizing to evolving domains with latent structure-aware sequential autoencoder," ICML 2022.
[3] Qiuhao Zeng, et al., "Foresee what you will learn: Data augmentation for domain generalization in non-stationary environment," AAAI 2023.

**W4 & Q2: Effectiveness in Diverse Concept Drift Scenarios.**

*"Does CODA account for various natures of concept drift, such as abrupt or cyclical changes?"*

[AW4 & AQ2]: We have already considered diverse concept drift patterns, as follows:
- Synthetic concept drifts:
  1. Cyclical change: the 2-Moons dataset is built with a cyclical concept drift pattern, and we additionally evaluate on the **Rot-MNIST** and **Sine** datasets, as shown in the table above.
  2. Abrupt change: this type of temporal trend does not fit our assumption that **"the joint distribution of features and labels with smooth data shift over time"** (refer to the Introduction).
- Real-world concept drift: the real-world datasets used in our experiments feature varied and unknown patterns of concept drift. They cover diverse realistic temporal trends, such as electricity demand changes (Elec2), space shuttle defects (Shuttle), and appliances energy usage changes (Appliance).

## Reviewer GM2p

We thank the reviewer for the constructive comments and for recognizing the effectiveness of our work.

**W1: Effectiveness on High-dimensional Data.**

*"The proposed algorithm can only work on low-dimensional data (as the authors also mentioned). It is intractable to learn the correlation matrix on the high-dimensional data. I guess that's why some dataset such as rotating MNIST has been excluded from evaluation."*

[AW1]: Based on Theorem 1 and our analysis in Section 3.4, we agree that feature correlation matrices may not effectively represent high-dimensional data. **However, this does not mean the proposed framework can only work on low-dimensional data.** We evaluated CODA on the high-dimensional dataset (Rotated MNIST) mentioned by the reviewer for performance comparison. We use the same encoder structure as [1] (the MNIST ConvNet) and train predictors with three different architectures. **The results show that all three predictors trained on the dataset generated by CODA outperform the other baselines.** Furthermore, the differences among the three trained predictors support one of our contributions: the proposed model-agnostic CODA framework is flexible enough to explore the best architecture for different datasets and downstream tasks. We have added the experimental results in Appendix G, and the added section title is highlighted in blue.

| Frameworks | Rot-MNIST |
|:---------------------:|:-----------------:|
| LSSAE[1] | 16.6 $\pm$ 0.7 |
| DDA[2] | 13.8 $\pm$ 0.3 |
| GI | 7.7 $\pm$ 1.3 |
| DRAIN | 7.5 $\pm$ 1.1 |
| CODA (MLP) | **6.0 $\pm$ 1.2** |
| CODA (LightGBM) | **5.8 $\pm$ 0.6** |
| CODA (FT-Transformer) | **6.3 $\pm$ 0.5** |

**W2-1: Justification for the end-to-end SOTA comparison.**

[AW2-1]: We agree that end-to-end approaches are often preferred for their convenient training process. At the same time, **we also believe that end-to-end approaches may not always be the best solution for tackling the root cause of a problem**, for the following reasons:
- The main motivation behind our approach is to **"nip the problem in the bud"**.
In other words, **the root cause of concept drift lies in the temporal evolution of data**. Our solution is to tackle this problem directly from the data perspective, i.e., to achieve TDG by generating future data on which a prediction model is trained.
- While end-to-end approaches may overfit due to the tight coupling between data and model, one of our main motivations is to offer the flexibility of model-architecture exploration for different datasets and downstream tasks by providing high-quality and effective training data.
- It is evident that the most effective model architecture varies across datasets and downstream tasks. This observation is supported by Table 1, which shows that the best-performing architecture differs among the five datasets.

**W2-2 & Q1: Efficiency of CODA.**

*"Requires separate steps to solve the final task is inefficient."*

[AW2-2]: In this work, we mainly focus on the effectiveness of achieving TDG rather than on efficiency. Although efficiency is not our main goal, we would like to show that our framework achieves decent efficiency compared to the SOTA method DRAIN. The reason is that, by splitting the whole temporal-trend modeling and data generation process into three sub-processes (learning the Correlation Predictor $H(\cdot)$, learning the Data Simulator $G(\cdot)$, and predictor training), **each sub-process is a manageable sub-problem that takes less training time than a whole end-to-end model**.
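To make the first sub-process concrete, here is a minimal sketch. It is not our actual implementation: the helper names are hypothetical, and a per-entry linear trend stands in for the learned Correlation Predictor $H(\cdot)$, purely to illustrate "compute one correlation matrix per domain, then extrapolate the next one."

```python
import numpy as np

def domain_correlations(domains):
    """One feature correlation matrix per time domain."""
    return [np.corrcoef(X, rowvar=False) for X in domains]

def extrapolate_next(corrs):
    """Naive stand-in for H(.): fit a per-entry linear trend over the
    observed matrices and evaluate it one step into the future."""
    stacked = np.stack(corrs)                      # (T, d, d)
    T = len(corrs)
    t = np.arange(T, dtype=float)
    A = np.vstack([t, np.ones_like(t)]).T          # (T, 2) design matrix
    flat = stacked.reshape(T, -1)                  # (T, d*d)
    coef, *_ = np.linalg.lstsq(A, flat, rcond=None)
    pred = np.array([float(T), 1.0]) @ coef        # evaluate trend at t = T
    C_next = pred.reshape(stacked.shape[1:])
    return np.clip(C_next, -1.0, 1.0)              # keep valid correlations

# Toy example: 5 domains whose feature correlation drifts smoothly
rng = np.random.default_rng(0)
domains = []
for k in range(5):
    rho = 0.1 * k                                  # correlation grows over time
    cov = np.array([[1.0, rho], [rho, 1.0]])
    domains.append(rng.multivariate_normal([0.0, 0.0], cov, size=2000))

C_hat = extrapolate_next(domain_correlations(domains))
```

On this toy drift, the extrapolated off-diagonal entry lands near the continuation of the trend, which is the behavior the Correlation Predictor is trained to produce.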
We report the training-time comparison with the SOTA DRAIN on the Elec2 dataset in the table below, where we train an MLP predictor with the same structure. We have added the experimental results in Appendix H, and the added section title is highlighted in blue.

| Framework & Components | Training Time (s) |
| :----: | :----: |
| **DRAIN** | **465.936** |
| **CODA (Total)** | **447.817** |
| CODA (Correlation Predictor) | 142.110 |
| CODA (Data Simulator) | 290.826 |
| CODA (MLP) | 14.880 |

**W3: A conditional data generator considering the time index.**

*"Why not training a conditional data generator considering the time index."*

[AW3]: Unfortunately, this idea cannot be implemented because of the **lack of sample indices for instances in each time domain**. The explanations are as follows:
- One key challenge of capturing temporal trends across multiple time points is that we have no time index for each individual data instance, so we cannot treat an instance as sequential data and model its temporal evolution pattern, as a diffusion model would.
- An alternative is to capture the underlying temporal trend among multiple datasets (distributions), which is computationally infeasible and makes it hard to generate effective training data (see the analysis in Section 3.1). Therefore, our solution is to simplify the data distribution at each time domain so that the underlying temporal trend can be captured better.
We use feature correlation matrices to achieve this simplification and provide a theoretical analysis that justifies representing a data distribution with a feature correlation matrix (refer to Section 3.4).
- Note that the baseline GI proposes a time-sensitive model that extrapolates samples to the near future via a first-order Taylor expansion, which is an implicit way of using the time index as a condition for prediction. As shown in Table 1, three different architectures trained on the data generated by CODA outperform GI on all benchmarks.

**Q2: Without using all the previous domains for data simulation.**

*"Eq.(5) only uses the last domain and not all the previous domains."*

[AQ2]: In our proposed CODA framework, the trained Data Simulator $G(\cdot)$ should learn a data distribution similar to that of the current domain $\mathcal{D}_{T}$. This is based on the assumption that distribution shifts are smooth and that nearby time domains are closely related (refer to assumption (iii) in Theorem 1). Therefore, starting from the current data distribution $\mathcal{D}_{T}$, ${G}(\mathcal{D}_{T} ; \mathcal{\hat{C}}_{T+1} | \theta_{G})$ can simulate the future data distribution $\mathcal{\hat{D}}_{T+1}$ subject to the predicted correlation matrix $\mathcal{\hat{C}}_{T+1}$.

---

## Re: Response to Rebuttal (GM2p)

Q1: I cannot understand how the model can work on high-dimensional data.
My understanding is that the correlation matrices incur quadratic computational and memory complexity. So, it seems intractable on high-dimensional data.

Q2: Also, if my understanding (about computational and memory complexity) is correct, the training time for the correlation predictor and data simulator sub-processes is not manageable. The training-time comparison to DRAIN on the Elec2 dataset may be misleading since Elec2 has only a few dimensions.

Q3: For conditional generation, I did not mean to use sample indices. I meant to use the domain index as the time index for all the samples in a domain.

=====================================================

We appreciate the reviewer's feedback and are glad to further address the remaining concerns.

**Q1 & Q2: How the model can work on high-dimensional data.**

[AQ1 & AQ2]: We would like to clarify the confusion. We agree that the $O(N^2)$ computational complexity may limit feasibility. However, CODA can still work on high-dimensional data. Our **empirical results** show the efficacy of the proposed CODA framework on a high-dimensional dataset (Rot-MNIST). The key reason is its flexibility in choosing either the input space or the latent space for the Correlation Predictor module, which computes the correlation matrices used to generate future data while preserving the model-agnostic property.

We agree that using feature correlation may be limited by its computational complexity. To this end, we adopt **a simple solution: first encode the original samples into a low-dimensional** latent space, which allows us to compute feature correlations and incorporate them into the CODA framework (as we describe in our [previous response](https://openreview.net/forum?id=CE7lUzrp1o&noteId=FJ6NN4Dmc8)).
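As an illustration of this workaround, the sketch below computes the feature correlation matrix in a low-dimensional latent space instead of the raw pixel space. All names are hypothetical, and a fixed random projection stands in for the trained encoder (in our experiments, the MNIST ConvNet).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical high-dimensional domain: 1000 flattened 28x28 samples
X = rng.normal(size=(1000, 784))

# Stand-in encoder: a fixed random projection to a low-dimensional
# latent space (in CODA this role is played by a trained encoder)
d_latent = 16
W = rng.normal(size=(784, d_latent)) / np.sqrt(784)
Z = X @ W                                   # (1000, 16) latent codes

# Correlation is now computed over 16 latent features rather than
# 784 raw pixels, shrinking the quadratic cost of the matrix
C_latent = np.corrcoef(Z, rowvar=False)     # 16 x 16 instead of 784 x 784
```

The resulting 16x16 matrix is small enough for the Correlation Predictor to model, while the downstream predictor remains free to use any architecture.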
For the purpose of conducting a fair performance comparison with DRAIN, we apply the same pre-processing. The additional experiment also demonstrates the effectiveness of CODA.

Again, we would like to emphasize that **our main contribution lies in proposing a model-agnostic solution (enabled by the Data Generator module) to address concept drift from a novel, data-centric perspective**. We agree that exploring TDG on high-dimensional data is a critical and under-studied topic, and it will be our future direction for enhancing the robustness of our framework.

**Q3: Feasibility of a conditional data generator idea.**

[AQ3]: For training a conditional data generator **as a diffusion model**, $x_1$ and $x_2$ should be the same sample at different time indices. However, **we have no sequential time index for each sample**, so training a diffusion model on such concept drift datasets is infeasible.

On the other hand, it is feasible to train a VAE-based conditional data generator using the time index as an input condition. Unfortunately, such native **conditional generation models can hardly capture the underlying temporal trend**, since the model architecture cannot exploit the continuity of the input time-index condition. **As noted in the existing work GI [1]**:

> **(Section 1)** "as a general-purpose neural network $F(x, t)$ that takes as input $x$, $t$..."
> **(Section 3.3)** "A naive way to do that is to concatenate $t$ with $x$ to obtain an augmented feature vector [$x$, $t$]. However, such an approach cannot capture complex trends in data, e.g., periodicity."

To tackle this difficulty, GI designs a time-sensitive model architecture with a proposed time-dependent activation function.
However, that prior work still captures the temporal trend only implicitly, which may limit TDG performance. In our work, we explicitly capture temporal trends by modeling the temporal evolution of correlation matrices, and empirical results demonstrate that CODA achieves better TDG performance.

[1] Anshul Nasery, et al., "Training for the Future: A Simple Gradient Interpolation Loss to Generalize Along Time," NeurIPS 2021.

**With the clarification above, we hope that we have resolved all the reviewer's concerns and look forward to clarifying any further questions that may arise.**

---

## General Comments for All Reviewers

Dear reviewers,

We thank all reviewers for their constructive comments and helpful feedback. We have revised the paper accordingly and marked the modifications in blue for visibility. We are pleased that the reviewers find our paper well-written and well-organized (VxLr and GM2p), our approach novel and meaningful (BJ2w and q1Jd), our method theoretically sound (BJ2w and GM2p), and our experiments well-established and effective (VxLr, BJ2w, q1Jd, and GM2p).

To address your primary concerns, we have done our best to extend the work with additional experiments and to reply to your concerns and suggestions with more clarification and discussion. We propose a model-agnostic framework that tackles the root cause of concept drift by generating future data for model training. The generated training data provides flexibility and transferability for architecture exploration, and experimental results reveal that different model architectures can be effectively trained on the generated data.

The revisions are summarized as follows:
- (q1Jd, GM2p) We have revised the discussion of effectiveness on high-dimensional data in Section 3.4.
- (VxLr, BJ2w, q1Jd, GM2p) We have added the experiments of baseline comparisons and citations in Appendix G.
- (q1Jd, GM2p) We have added the training-time efficiency comparison in Appendix H.

We appreciate all of the reviewers' suggestions for enhancing our work. We look forward to your feedback and to addressing any follow-up questions you may have.

Sincerely,
Authors
