# PEM KDD Rebuttal

## Common Questions

We thank all the reviewers for their valuable feedback. Here, we address the common questions raised by the reviewers.

**1. Missing citations in lines 215, 216.**

We thank the reviewers for identifying the missing citations in these two lines. We will update the manuscript by adding the following references: [1] for the DeepAR model, [2] for Deep Markov models, and [3, 4] for Deep State models. We checked and found no other missing citations.

[1] Salinas, David, et al. "DeepAR: Probabilistic forecasting with autoregressive recurrent networks." International Journal of Forecasting 36.3 (2020).

[2] Krishnan, Rahul, Uri Shalit, and David Sontag. "Structured inference networks for nonlinear state space models." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 31. No. 1. 2017.

[3] Li, Longyuan, et al. "Learning Interpretable Deep State Space Model for Probabilistic Time Series Forecasting." International Joint Conference on Artificial Intelligence (2019).

[4] Gu, Albert, Karan Goel, and Christopher Ré. "Efficiently Modeling Long Sequences with Structured State Spaces." International Conference on Learning Representations (2022).

**2. More details on training time, batch size, and memory.**

We found that pre-training for up to 5000 epochs on all SSL tasks simultaneously was sufficient, as longer pre-training did not significantly improve the SSL losses or downstream performance. During training, we set 5000 epochs as the maximum, but we observed that most downstream tasks converged and reached the early-stopping criterion within 1500-2500 epochs. Since the datasets for most tasks fit into GPU memory, we set the batch size equal to the number of training data points. On average, pre-training took around 8 hours and fine-tuning took 20-150 minutes per task (using the Nvidia Tesla V100 GPU mentioned in the paper). We observed similar training times for most top baselines, which we also trained with early stopping. Regarding memory requirements, none of the datasets required more than 8 GB of VRAM. Similarly, during pre-training, since we randomly choose a pre-train dataset for each batch (line 521), the memory requirements range from 4-8 GB of VRAM depending on dataset size. We will include these additional details in the final version.

## Reviewer RU49

We thank the reviewer for the valuable feedback and for recognizing the significance of our work. We address the questions and concerns as follows.

**The training details are relatively rough. For the pre-training studies, the training costs, e.g., training time in terms of GPU, are useful information for other researchers but are not included in the paper.**

We would like to clarify that we provide the specific hyperparameters for both pre-training and fine-tuning in our manuscript (lines 646-655). Furthermore, we conducted ablation studies and a hyperparameter sensitivity analysis to investigate the impact of the important self-supervised learning (SSL) and architecture hyperparameters in Section 7, with the results presented in Tables 5, 6, and 7. In addition, we provide further details on training time, pre-training time, and memory usage in the *Common Questions* above (Point 2), which we will add to the final version. In summary, pre-training on all the datasets took about 8 hours of GPU time, and fine-tuning took 20-150 minutes per task depending on the size of the training dataset.
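For concreteness, the following is a minimal, hypothetical sketch of the kind of training loop described above (full-batch updates, a randomly chosen pre-train dataset per batch, a 5000-epoch cap, and early stopping). The model, the `ssl_loss` placeholder, the learning rate, and the patience value are illustrative assumptions, not the actual PEM implementation.

```python
import random
import torch

# Stand-in encoder; the real PEM architecture is the one described in the paper.
model = torch.nn.GRU(input_size=1, hidden_size=32, batch_first=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Each pre-train dataset is a tensor of shape (num_series, seq_len, 1);
# the batch is the whole dataset (batch size = number of training points).
datasets = [torch.randn(64, 100, 1), torch.randn(128, 80, 1)]

def ssl_loss(outputs):
    # Placeholder for the combined SSL objectives; purely illustrative.
    return outputs.pow(2).mean()

best_loss, patience, bad_epochs = float("inf"), 100, 0
for epoch in range(5000):                 # 5000-epoch cap
    batch = random.choice(datasets)       # random pre-train dataset per batch
    optimizer.zero_grad()
    outputs, _ = model(batch)
    loss = ssl_loss(outputs)
    loss.backward()
    optimizer.step()

    # Simple early-stopping criterion on the training loss.
    if loss.item() < best_loss - 1e-4:
        best_loss, bad_epochs = loss.item(), 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```

Only the scheduling logic here (full-batch updates, random dataset choice, early stopping) mirrors the description above; the model, loss, and thresholds are stand-ins.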
**The pre-training usually enables few-shot learning or learning efficiency on the downstream tasks. The related experiments are missing in the paper.**

While the use of pre-trained large language models for few-shot or zero-shot learning has been successfully demonstrated in many natural language processing (NLP) tasks, extending this approach to heterogeneous multi-domain time-series datasets and tasks remains an important and open research problem that no previous work has addressed. Our work focuses on the impact of pre-trained model weights on more performant training of downstream tasks, and it would enable future research on leveraging our methods for problems such as few-shot learning in time series. In terms of learning efficiency, as discussed in response to the previous question, our proposed probabilistic epidemic models (PEMs) require training time similar to other baselines while significantly outperforming them in downstream performance. This indicates that our approach not only achieves better results by effectively leveraging the pre-train datasets but also does so in a computationally efficient manner.

**Will the pre-trained checkpoints be publicly available?**

Yes, we will publicly release the model weights along with the implementation code upon publication of the paper.

## Reviewer 2RSK

We thank the reviewer for their valuable comments and questions, and for recognizing the significance of our work in designing pre-training methods for time series. We address the reviewer's questions and concerns as follows.

**The motivation is not clear enough. The authors argue that there are two challenges in time-series domain pre-training, but they don't explain clearly how their framework can solve these problems.**

As explained in lines 134-156, the two main challenges in applying pre-training frameworks to multiple time-series datasets are the higher heterogeneity of time-series data compared to images and the smaller datasets available for pre-training. As a result, general time-series SSL tasks such as random masking may not effectively capture the important properties of a large number of small, heterogeneous datasets with varying patterns such as seasonality, periodicity, and noise. To address these challenges, we specifically designed SSL tasks (Section 4.3) that enable the model to efficiently extract useful epidemic dynamics, such as identifying peaks and their dynamics (PEAKMASK), learning to forecast future values (LASTMASK), and detecting seasonal information (SEASONDETECT). These tasks effectively learn epidemiologically relevant patterns from all the heterogeneous epidemic time-series datasets, which can be leveraged for improved predictive performance on multiple downstream tasks. We will stress this point in the introduction and Section 3.2.
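To give a flavor of how masking-based SSL examples of this kind can be constructed, here is an illustrative sketch; the mask widths, the mask value, and the autocorrelation-based seasonality label are assumptions made for this example only, and the exact task definitions are the ones given in Section 4.3 of the paper.

```python
import numpy as np

def make_ssl_examples(series, mask_value=0.0, peak_width=5, last_len=10):
    """Illustrative PEAKMASK-, LASTMASK-, and SEASONDETECT-style examples."""
    series = np.asarray(series, dtype=float)

    # PEAKMASK-style: hide a window around the peak; the model must reconstruct it.
    peak = int(np.argmax(series))
    lo, hi = max(0, peak - peak_width), min(len(series), peak + peak_width + 1)
    peak_input = series.copy()
    peak_input[lo:hi] = mask_value
    peak_target = series[lo:hi]

    # LASTMASK-style: hide the last segment so the model learns to forecast it.
    last_input = series.copy()
    last_input[-last_len:] = mask_value
    last_target = series[-last_len:]

    # SEASONDETECT-style: a binary label from a crude autocorrelation check
    # (any seasonality criterion could be substituted here).
    centered = series - series.mean()
    acf = np.correlate(centered, centered, mode="full")[len(series):]
    season_label = int(acf.max() > 0.5 * centered.dot(centered))

    return (peak_input, peak_target), (last_input, last_target), season_label

# Example usage on a noisy seasonal series.
t = np.arange(120)
demo = 10 * np.sin(2 * np.pi * t / 52) + np.random.randn(120)
(peak_x, peak_y), (last_x, last_y), is_seasonal = make_ssl_examples(demo)
```

During pre-training, the masked inputs would be fed to the encoder, with the hidden values (or the seasonality label) used as reconstruction or classification targets.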
**The experiment results in Tables 1 and 2 are not convincing enough. Direct comparisons are unfair for other methods because PEM uses more data. It would be helpful to see the performance of other methods under the same dataset setting.**

First, we wish to clarify that the other baselines, which follow the traditional paradigm of training only on datasets relevant to the task, cannot be trivially adapted to use the pre-train data used by PEMs. For example, there is no straightforward way to leverage the measles dataset when training to predict influenza. In contrast, our approach of pre-training with SSL on a wide range of heterogeneous epidemic time-series datasets is a novel framework that can effectively utilize these multiple pre-train datasets to extract useful patterns. Therefore, we believe that the experimental setup for the performance comparison is fair. In addition, we compare our method with past state-of-the-art SSL methods (Table 5), where we use the full set of pre-train datasets to pre-train the baselines as well (therefore using the same datasets as PEM). However, these SSL methods significantly underperform since they cannot adapt to the heterogeneity of the data and cannot effectively capture useful patterns from the pre-train datasets. This further emphasizes the effectiveness of our SSL methods for pre-training.

**The analysis in the ablation study is insufficient. For example, the impact of each hyperparameter should be discussed in more detail.**

We have thoroughly investigated the impact of the hyperparameters related to the main contributions of our work (Tables 5, 6, 7). Specifically, we studied the impact of each SSL task and each architectural novelty, and we studied the sensitivity of the related hyperparameters: the segment size, the masking probabilities of the SSL tasks, and reverse instance normalization. In addition, we provide the specific hyperparameters for both pre-training and fine-tuning in our manuscript (lines 646-655). Furthermore, we provide additional details on training and pre-training time, memory, and batch sizes in the *Common Questions* above, which we will incorporate into the revised manuscript. We have also observed that the model's hyperparameters perform well across all downstream tasks (line 923).

**In line 921, the authors said that when the segment size is 2, they got the best score, but in Table 7, P=4 has higher scores.**

This is a typo in line 921: we meant that the best segment size is 4. We thank the reviewer for identifying the error, and we will fix it in the revised version.

**The writing should be further polished. There are many typos that need to be corrected.**

We address the missing citations in lines 215 and 216 in the *Common Questions* above. We will go over the manuscript to correct these and any other small typos in the revised version.

## Reviewer evxR

Thank you for your positive feedback on our method's technical novelty, effectiveness, and potential impact. Regarding your comments on missing citations, we address the four missing citations in the *Common Questions* above, where we also add details on the compute and memory requirements during pre-training and training. We will fix the citations and add these additional pre-training details in the final version.

**Given the ones reported here for the proposed method, what is the training time cost and memory cost of the baselines?**

We apologize for missing your point on comparing with the baselines. We measured the training time and maximum memory requirements of all baselines as follows:

**Training time (min)**

| Model / Benchmark | Influenza-US | Influenza-Japan | Cryptosporidiosis | Typhoid |
|---|---|---|---|---|
| Autoformer | 37.9 | 31.6 | 29.7 | 49.5 |
| Pyraformer | 44.7 | 38.7 | 42.1 | 62.5 |
| Informer | 31.6 | 42.5 | 35.9 | 55.1 |
| Fedformer | 47.4 | 32.9 | 48.6 | 53.9 |
| GP | 3.7 | 3.1 | 2.7 | 3.5 |
| EpiFNP | 27.4 | 22.5 | 29.3 | 47.2 |
| EpiDeep | 39.1 | 42.7 | 39.6 | 53.6 |
| EB | 3.4 | 3.2 | 3.9 | 3.5 |
| FUNNEL | 0.6 | 0.5 | 0.9 | 0.2 |
| PEM | 42.4 | 35.5 | 39.2 | 64.5 |

**Max. memory (GB)**

| Model / Benchmark | Influenza-US | Influenza-Japan | Cryptosporidiosis | Typhoid |
|---|---|---|---|---|
| Autoformer | 4.2 | 3.8 | 4.9 | 3.7 |
| Pyraformer | 2.6 | 2.9 | 2.5 | 2.1 |
| Informer | 4.5 | 3.7 | 4.3 | 3.2 |
| Fedformer | 3.1 | 3.6 | 3.7 | 2.9 |
| GP | 0.2 | 0.1 | 0.2 | 0.2 |
| EpiFNP | 2.8 | 2.1 | 3.5 | 3.1 |
| EpiDeep | 3.2 | 2.7 | 3.4 | 3.1 |
| EB | 0.1 | 0.1 | 0.1 | 0.1 |
| FUNNEL | 0.1 | 0.1 | 0.13 | 0.1 |
| PEM | 4.7 | 3.5 | 4.8 | 4.1 |

We observe that PEM's training time is similar to that of the transformer-based baselines, with similar memory requirements across all downstream tasks. Methods such as GP, FUNNEL, and EB are not deep-learning based and use considerably less time and memory, but they provide worse performance.

**In addition, as discussed in the introduction, [12, 18, 31, 33, 10] are proposed to solve the data sparsity and deal with noise in epidemic forecasting, but they are not considered baselines. What is their performance?**

We explicitly chose state-of-the-art machine-learning-based epidemic and general time-series forecasting baselines, which use only the past time series of an epidemic to forecast future values. The methods referred to by the reviewer rely on other sources of external or expert knowledge that are specific to a given epidemic. [12] is a spatio-temporal model that requires graph knowledge between regions, such as mobility. [18, 31] require expert knowledge of the mechanics of the epidemic, such as a mechanistic model of the differential equations governing the spread of the disease. [33] is specifically curated for Covid-19 and uses multiple Covid-19-pandemic-specific features relevant to the US. [10] is another Covid-19 forecasting model that is an ensemble of predictions from top models designed by multiple research groups in the US, called the Covid-19 Forecast Hub.
