Sophia Sun
# Reviewer 1 gQHv

We would like to sincerely thank Reviewer gQHv for providing valuable comments and recognizing the value of our work.

### Response to Weaknesses

> W1: multivariate parts that author claim to be new seems to not be really in the novel parts, but multi-step ones are main focus. Otherwise, kindly describe why existing methods by Stankeviciute et al. (2021) cannot handle this setting (with small modifications).

We agree with Reviewer gQHv's comment. In Stankeviciute et al. (2021), only univariate time series were handled by the algorithm; the multivariate setting was mentioned as a future direction. We made a small modification to cover multivariate data by computing the vector norm in Eqn (1). Our key contribution is multi-step forecasting. Due to the curse of dimensionality, the uncertainty regions often become very large for multivariate time series if we directly apply CP or Stankeviciute et al. (2021)'s approach. We developed our method, CopulaCPTS, to significantly shrink the uncertainty regions while still maintaining coverage, especially for multivariate datasets (see Table 1). To summarize, we mention the multivariate setting as a motivation and useful application area for this work.

> W2: enhance UQ literatures and discussion on other classes. For example there are several advances in forecasting based on quantile regresison methods for univariate and multivariate https://arxiv.org/abs/2202.11316, https://arxiv.org/abs/2111.06581, and even conformal prediction enhance quantile regression method itself https://arxiv.org/abs/1905.03222.

Thank you for the valuable suggestion! We have incorporated these papers in our literature review section. Even though quantile methods are widely used in time-series forecasting UQ, they do not give finite-sample guarantees on coverage, so they are not directly related to this work. In the CP section we have discussed conformalized quantile regression (Romano et al., 2019) and cited a later work (Sousa et al., 2022).
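To make the vector-norm modification mentioned in our W1 response concrete, here is a minimal sketch of inductive conformal prediction with the score $s_i = \|\hat{\mathbf{y}}_i - \mathbf{y}_i\|$. This is our own illustrative reimplementation, not the paper's code; the function name and data are assumptions for the example.

```python
import numpy as np

def icp_ball_region(y_cal_pred, y_cal_true, y_test_pred, alpha=0.1):
    """Inductive CP with a vector-norm nonconformity score.

    y_cal_pred, y_cal_true: (n, d) forecasts and ground truths on the
    calibration set. Returns the center and radius r of the ball
    {y : ||y - y_test_pred|| <= r}, which covers the true value with
    probability >= 1 - alpha under exchangeability.
    """
    scores = np.linalg.norm(y_cal_pred - y_cal_true, axis=1)  # s_i = ||y_hat_i - y_i||
    n = len(scores)
    # conformal quantile with the finite-sample (n + 1) correction
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    radius = np.quantile(scores, level)
    return y_test_pred, radius
```

The only change from the univariate procedure is that the residual is reduced to a scalar via the norm before taking the calibration quantile, so the prediction set becomes a $d$-dimensional ball rather than an interval.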
Please understand that, in the interest of space, we have kept the discussion concise.

> W3: more deep dive on behvarious $\alpha$. For example, is this related to the author claim the capability of capturing dependency?

Please see the updated Appendix C.6 for a study on $\alpha_h$. Let us know if it clarifies the role $\alpha_h$ plays in representing the correlation.

> W4: the experiment setting are poorly described and difficult to examine univatie/multivate cases and which various base forecasters are used and why.

We have edited Section 5.1, Section 5.2, and Appendix C.1 for better clarity on the experimental setting and model selection.

### Response to Clarity

> in eq (7), h is typo?

Yes, thank you for the note. We've corrected it in the updated version of the paper.

> the data split of training and val is not clearly described, especially multivatie settings.

See W4 for the updated description.

Thank you again for your thorough review and insightful questions! Let us know if anything remains unclear.

# Reviewer 3 oH8D

We sincerely thank Reviewer oH8D for providing thorough and insightful comments.

> -: many typos in the paper, suggesting a rushed in submission. Among others:

We apologize for the typos! The updated version has them all corrected. Thank you for carefully reading our paper.

> -: in the case of auto-regressive, it is unclear how is treated the newly produced point in order to produce further predictions. In particular, since the newly predicted point is a conformal sets, do authors run conformal prediction using weak-labels/intervals (as in "Cauchois, M., Gupta, S., Ali, A., & Duchi, J. (2022). Predictive Inference with Weak Supervision."), or do they simply use the new point-wise prediction? What guarantees that the previous validity guarantees with respect to the ground-truth will hold (as the prediction is not the true value)? Also, how does one obtain the cumulative distribution for the non-conformity scores of the k+1 prediction?
> Does one use the same cumulative distribution, and if this is the case, why do we need to re-run an optimisation?

In our current setup we only use the point-wise predictions for autoregressive forecasting. The coverage guarantee will always only be on the $k$ steps that are predicted by the model, whether these are time steps $\{t+1, \ldots, t+k\}$ or $\{t+2, \ldots, t+k+1\}$, etc. The algorithm described in the paper does not give cumulative coverage guarantees over more than $k$ steps of forecasts. We assume we have validation data for the length of the autoregressive forecasts. This way, we model the copula for the uncertainty of _the first $k$ steps of the autoregressive forecast_, _the $t+1$ to $t+k+1$ steps of the autoregressive forecast_, and so on. As outlined in paragraph 2 of Section 4.3 in the paper, we might or might not need to refit for each of these cases.

We understand that it is desirable to have coverage guarantees for the entire forecasting horizon, or to use predicted intervals for future predictions. However, this goes beyond the scope of our paper, so we leave it for future work.

> -: While the optimisation program introduced to find suitable time-step wise confidence degrees seems fine, it is unclear what we can/should expect from it? In particular, shall we in practice have very similar confidence degrees or very imbalanced ones? What justify to have different confidence degrees at each run? Why not optimizing to minimize the obtained volume? How does it compare to the constant time dichotomic search assuming all confidence degrees being equal?

Please see the newly added Appendix C.6 for a study on finding $\alpha_h$.

Please let us know if any of the explanations are unclear! Thank you again.

# Reviewer 4 FqUe

We would like to sincerely thank Reviewer FqUe for providing these comments. We believe there are some misunderstandings, which we will address in this response.
### Response to Weakness 1

> the ICP validity definition on coverage (Definition 1) is not correct (unless there are reasons, which are not described). The probability needs to be taken over X, Y, and also a calibration set (e.g., [R1])

We respectfully disagree. Our definition of validity follows from seminal works on conformal prediction: see Proposition 1 of Vovk (2012) [1] and the introduction section of Lei et al. (2018) [2]. We want to point out that $\mathcal{P}$ in our Definition 1 _**is**_ the data distribution for $(X, Y)$. In the regression case, we can represent the training, calibration, and test data as an exchangeable set of data drawn from distribution $\mathcal{P}$. Theorem 1 in [R1] is identical to our formulation; it omits the subscript because the coverage inequality holds for an arbitrary data distribution $\mathcal{P}$.

[1] Vovk, Vladimir. "Conditional validity of inductive conformal predictors." Asian Conference on Machine Learning. PMLR, 2012.

[2] Lei, Jing, Max G'Sell, Alessandro Rinaldo, Ryan J. Tibshirani, and Larry Wasserman. "Distribution-free predictive inference for regression." Journal of the American Statistical Association 113, no. 523, 2018.

> Also, this coverage guarantee is not used consistently. In particular, the validity statement in Proposition 1 contains the expectation over y, but to my understanding, the probability here is also taken over y.

The expectation in Proposition 1's statement is a typo; we sincerely apologize for it. The inequality should be on the probability, as written in the proof, not on the expectation. The content of the proof (in Appendix A) is correct and is what we mean by the coverage guarantee. We have removed the extraneous expectation notation in the updated version of the paper. We believe Proposition 1 in the paper is now correct.

> In particular, (11) assumes $u^*$ is given, which needs to be found in Line 15 of Algorithm 1 based on a calibration set.
> As $u^*$ depends on a calibration set, it is a random vector.

We want to highlight that $u^*$ does not depend on the calibration set, but only on the copula function $C$. Given a copula $C$, the set of values $u^*$ such that $C(u^*) = 1-\alpha$ is fixed. Sklar's theorem (Theorem 1) proves the existence of this $u^*$. The proposition is proven with the exact $u^*$ here; we assume our search for $u^*$ is accurate given the law of large numbers.

> Finally, it is unclear how the exchangeability assumption is used in the proof.

The exchangeability assumption is used when estimating the $\hat{F}$ functions (Equations 6 and 7). Note that Equation (6) is an ICP procedure; the equation holds because of the exchangeability assumption and the coverage guarantee that follows (Equation 3). CopulaCPTS combines the results of $k$ individual ICP procedures for a cumulative guarantee. Please let us know if this explanation is clear. If not, what can we add to the proof to clarify?

### Response to Weakness 2

> Missing baselines:
> - A score function can be simply be $\| \hat{\mathbf{y}} - \mathbf{y} \|$
> - by modifying RNN to estimate the standard deviations of $\hat{\mathbf{y}}$

We have included the baselines per request in Appendix C.7 of the updated paper. However, we justify our choice of not including them in the main text as follows.

Using the L2 norm over the entire forecast horizon as the nonconformity score has two drawbacks: (1) the L2 distance does not grant us the desired property of creating a cone of uncertainty for time-series forecasting. (2) From an application perspective, it is limited in its ability to aid decisions. In the Covid forecasting scenario, we want to know the highest and lowest number of cases per day with high confidence. For robotics applications, we want to draw bounding boxes around our prediction so we can navigate around it safely. The L2 norm does not give direct guidance on the uncertainty for each timestep.
(To visualize the uncertainty estimate, we assume that for each timestep all error comes from that one step. This results in very inefficient confidence regions.)

Directly using an RNN to estimate the standard deviations is widely known to be prone to over-confidence [3], hence more sophisticated approaches such as the papers you cited are researched. There is also a misunderstanding here: if one directly estimates the variance with an RNN, one makes the assumption that the error distribution is Gaussian. Conformal prediction is a distribution-free UQ method and avoids making such assumptions. One would not need CP once one has a Gaussian estimate and can directly obtain confidence regions, as we did in Figure 11(b) in Appendix C.7.

[3] Guo, Chuan, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. "On calibration of modern neural networks." In International Conference on Machine Learning, pp. 1321-1330. PMLR, 2017.

Particle dataset $\sigma = 0.01$, with confidence $1-\alpha = 0.9$:

| Method | Coverage (90%) | Area $\downarrow$ |
| --- | --- | --- |
| L2-Conformal | $88.5 \pm 0.4$ | $7.21 \pm 0.35$ |
| Direct Gaussian | $11.9 \pm 0.09$ | $0.07 \pm 0.31$ |
| CF-RNN | $97.0 \pm 2.3$ | $3.13 \pm 3.24$ |
| Copula-RNN | $\textbf{91.5} \pm 2.1$ | $\textbf{1.06} \pm 0.36$ |

Particle dataset $\sigma = 0.05$, with confidence $1-\alpha = 0.9$:

| Method | Coverage (90%) | Area $\downarrow$ |
| --- | --- | --- |
| L2-Conformal | $89.7 \pm 0.6$ | $7.21 \pm 0.35$ |
| Direct Gaussian | $0.0 \pm 0.0$ | $0.08 \pm 0.02$ |
| CF-RNN | $97.0 \pm 2.3$ | $5.79 \pm 0.51$ |
| Copula-RNN | $\textbf{90.3} \pm 0.7$ | $\textbf{4.50} \pm 0.07$ |

### Response to Clarity

> Page 4: The citation (Gibbs & Candes, 2021) in the first paragraph is not appropriate; the cited paper attacks online learning, but the submitted paper does not consider the same setup, making readers confusing.

Gibbs & Candes also consider conformal prediction for time-series forecasting, which is related.
However, their setting treats each time step independently (as in online learning), whereas we consider the correlations among multiple steps jointly. In the conformal prediction section of related works, we have elaborated on this distinction.

> Page 5: why is (8) true? The Sklar's theorem only implies the first equality.

For the first equality, by Theorem 1 (Sklar's theorem), we have

$$ F(s_1, \ldots, s_k) = C(F_1(s_1), \ldots, F_k(s_k)) $$

The $F_h$ are CDF functions, and $F_h(s_h)$ is the probability that, at time step $h$, the nonconformity score is less than $s_h$. The $\alpha_h$ values here are introduced as variables:

$$ F_h(s_h) = 1-\alpha_h, \; h \in 1, \ldots, k $$

The last part of the equation is the objective for conformal prediction. We want to find the $\alpha_1, \ldots, \alpha_k$ values such that

$$ C(1-\alpha_1, \ldots, 1-\alpha_k) \geq 1 - \alpha $$

In practice, we perform the search in the case of equality, hence we have written (8) as it is. Your comment made us realize that it might be confusing, so we have updated Section 4.2 of the paper to clarify.

> Page 5: epsilon_t below the (8) is not defined.

Thank you for pointing this out. We have corrected it to $\alpha_h$.

> Page 7: in "Metrics", the expectation in the definition of coverage does not make sense to me.

As written in our Metrics section, coverage is calculated as

$$ coverage_{1-\alpha} = \mathbb{E}_{x,y \sim \mathbf{X} \times \mathbf{Y}} \, P(\mathbf{y} \in \Gamma^{1-\alpha}(\mathbf{x})) \approx \frac{1}{n} \sum^n_1 \mathbb{1} (\mathbf{y}_n \in \Gamma^{1-\alpha}_n(\mathbf{x}_n)) $$

which is the expectation over $\mathbf{X} \times \mathbf{Y}$ of the probability of our prediction region covering $\mathbf{y}$.

> For the Covid19 dataset, it consists of 380 sequences; how can it be collected in UK regions? It would be better if the paper is self-contained. Is there any chance that the exchangeability assumption can be violated for this dataset?
Please see Appendix C.2 for a detailed description of how the dataset is obtained and processed. The 380 sequences are daily new cases from 380 regions (to name a few: Barnet, Worthing, East Devon). We follow the experiment setup in Stankeviciute et al. (2021) [3] and treat them as interrelated time series that are exchangeable.

[3] Stankeviciute, Kamile, Ahmed M. Alaa, and Mihaela van der Schaar. "Conformal time-series forecasting." Advances in Neural Information Processing Systems 34 (2021): 6216-6228.

# Thank you for your response

We appreciate your careful review and reply.

> please explicitly denote that the probability is also taken over calibration samples.

The probability is not taken over the calibration set. Please refer to Proposition 1 of [1]. The precise wording is "the probability of error $Y_{l+1} \notin \Gamma^{\epsilon}(\ldots)$ does not exceed $\epsilon$," which is a probability for the test sample $Y_{l+1}$.

[1] Vovk, Vladimir. "Conditional validity of inductive conformal predictors." Asian Conference on Machine Learning. PMLR, 2012.

> The proof needs to consider the randomness from the calibration set. I think an asymptotic guarantee is okay, but the paper still needs to change notations.

It is true that in Proposition 1 we made the assumption that both $C$ and $\textbf{u}^*$ are accurately estimated; an asymptotic guarantee is achievable if we were to remove this assumption.

> Bonferroni correction, reusing calibration dataset.

We use the calibration set exactly as CF-RNN [2] does, by treating each forecasting timestep $h$ as a separate task. By the validity of ICP, each of these $k$ estimations is individually valid.
The Bonferroni correction, and hence CF-RNN, allows us to make a cumulative guarantee because of Boole's inequality:

$$ \mathbf{P}\left[ \bigcup_{i=1}^k \left(p_i \leq \tfrac{\alpha}{k}\right) \right] \leq \sum_{i=1}^k \mathbf{P}\left[p_i \leq \tfrac{\alpha}{k}\right] = k \cdot \frac{\alpha}{k} = \alpha $$

In the second paragraph of Section 5.1, we noted that the Bonferroni correction is an upper bound of copula functions. The idea of our paper is that if we can find an accurate copula, then CopulaCPTS is theoretically guaranteed to have smaller regions than CF-RNN. Learning the copula with the calibration set does not constitute reusing it for calibration, as there is no conformal prediction after the $k$ ICPs for each forecast step. Although we did not provide an analysis of the accuracy of estimating the copula, the extensive experimental results show the effectiveness of our method.

[2] Stankeviciute, Kamile, Ahmed M. Alaa, and Mihaela van der Schaar. "Conformal time-series forecasting." Advances in Neural Information Processing Systems 34 (2021): 6216-6228.

> Instead, use the Guassian as a score function for ICP as ICP works with any score function

The RNN will not be able to output standard deviations if it is not trained with a Gaussian assumption.

> Figure 11(b) demonstrates that if the intervals are correctly chosen based on ICP, "Direct Gaussian" may provide a smaller interval size than the proposed approach, while satisfying the coverage guarantee simply due to ICP.

This is an ungrounded argument. The area at 12% or 0% coverage is not informative of the area at higher coverage levels; ICP methods calibrated for 12% coverage will also have very small areas, for that matter. We do not know of any methods that combine Bayesian forecasting with ICP, and to our knowledge there is no simple way of combining them without compromising the generality of conformal prediction. We thank the reviewer for the comment and leave it for future work.
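To illustrate why an accurate copula can only tighten the Bonferroni bound, the following is a simplified sketch, our own illustrative code rather than the paper's. It uses a single shared level $u$ for all steps instead of the per-step $\alpha_h$ optimization in the paper:

```python
import numpy as np

def bonferroni_thresholds(scores, alpha=0.1):
    """Per-step quantiles at level 1 - alpha/k (CF-RNN-style union bound).
    scores: (n, k) nonconformity scores on the calibration set."""
    n, k = scores.shape
    level = min(1.0, np.ceil((n + 1) * (1 - alpha / k)) / n)
    return np.quantile(scores, level, axis=0)

def empirical_copula_thresholds(scores, alpha=0.1):
    """Simplified copula-based search with one shared level u: the smallest u
    whose per-step u-quantiles jointly cover >= 1 - alpha of the calibration
    points (CopulaCPTS instead optimizes per-step alpha_h values)."""
    for u in np.linspace(1 - alpha, 1.0, 200):
        q = np.quantile(scores, u, axis=0)
        joint = np.mean(np.all(scores <= q, axis=1))  # empirical C(u, ..., u)
        if joint >= 1 - alpha:
            return q
    return np.max(scores, axis=0)
```

On calibration scores with strong cross-step correlation, the copula-based thresholds come out substantially smaller than the Bonferroni ones at the same joint coverage, while with independent steps the two nearly coincide, consistent with the upper-bound relationship noted in Section 5.1.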