Han Wang
# ICLR 2024 Rebuttal

## Message to AC

Dear Area Chair,

We would like to express our sincere appreciation for your time and effort in overseeing the review process. We are writing this message since we are deeply concerned about the soundness and fairness of reviewer uUKR (R2). Specifically:

- R2 appears to lack fundamental knowledge of spectral graph theory, and raises incorrect criticism of terminology (which has been used in prior works). This suggests that R2 does not possess the necessary expertise to judge our core technical contribution, which is to perform spectral decomposition on the population graph to construct principled embeddings for OOD generalization and OOD detection.
- R2's review significantly lacks depth, with minimal engagement beyond surface-level remarks on terminology. No comments were raised about the core methodology or theoretical analysis.

While we have replaced the terminology with "graph-based" as suggested, we believe R2's argument is insufficient grounds to rate our paper as a reject, and that doing so is unfair. We have carefully clarified all reviewers' questions. Additional experiments and baselines have been added, where our method's strong performance holds. The overall novelty and soundness of our work have been recognized by all reviewers.

Your judgment on this matter is very important to us. Thank you again for your valuable time and service.

Sincerely,
Authors

<!-- ## Follow up

Dear reviewers,

As we approach the end of the author-reviewer discussion phase, we want to express our gratitude for your time and effort in evaluating our submission. As the rebuttal phase is crucial for clarifying any remaining concerns, we would greatly appreciate it if the reviewers could kindly advise whether our responses resolve the concerns raised. If you have any further questions or require additional information, we will do our best to address them. We understand the demands on your time and appreciate your ongoing commitment to the review process.
Thank you for your dedication.

Sincerely,
Authors -->

## Response summary

We thank all the reviewers for their time and commitment to providing valuable feedback and suggestions on our work. We are encouraged that ALL reviewers find our paper _novel_ (WvgX, uUKR, BLbH), that our methodology and theoretical insights are _interesting, sound and valuable_ (WvgX, uUKR, BLbH), and that our results are _impressive/extraordinary and significant_ (WvgX, uUKR, BLbH). We appreciate that reviewers acknowledge our _clear organization and presentation_ (uUKR, BLbH).

We have responded to all comments and questions from each reviewer in detail below. We have also modified the manuscript in response to the reviewers' suggestions. The main changes are highlighted in blue, which include:

+ Replaced "graph-theoretic" with "graph-based" throughout the paper.
+ Replaced "augmentation graph" with "augmentation transformation probability" (Section 4.3).
+ Added the latest SOTA baseline for OOD generalization (Section 5).
+ Fixed typos (Section 3.1).

We believe these changes have helped strengthen our manuscript. Once again, we express our gratitude for your thoughtful and thorough evaluations.

Sincerely,
Authors

## Response to Reviewer WvgX

We sincerely appreciate your positive feedback and insightful comments! We address the questions below in detail.

> **W1. Confusion Between Augmented Graph and Image**

Thank you for pointing this out! In Section 3.1, we introduce the definition of the _graph_ $G(\mathcal{X},w)$, whose vertex set $\mathcal{X}$ consists of all _augmented images_. For any two augmented images $x$ and $x'\in \mathcal{X}$, we define the weight $w_{xx'}$ based on Equation (2).
\begin{align}
w_{x x^{\prime}} = \eta_{u} w^{(u)}_{x x^{\prime}} + \eta_{l} w^{(l)}_{x x^{\prime}},
\end{align}

where

\begin{align}
w^{(u)}_{x x^{\prime}} & \triangleq \mathbb{E}_{\bar{x} \sim {\mathbb{P}}} \mathcal{T}(x| \bar{x}) \mathcal{T}(x'| \bar{x}), \\
w^{(l)}_{x x^{\prime}} & \triangleq \sum_{i \in \mathcal{Y}_l}\mathbb{E}_{\bar{x}_{l} \sim {\mathbb{P}_{l_i}}} \mathbb{E}_{\bar{x}'_{l} \sim {\mathbb{P}_{l_i}}} \mathcal{T}(x | \bar{x}_{l}) \mathcal{T}\left(x' | \bar{x}'_{l}\right).
\end{align}

Here, $\mathcal{T}(x|\bar{x})$ denotes the probability of $x$ being augmented from $\bar{x}$. In other words, we can derive the graph from the augmentation transformation and its probability. The relative magnitude of $w_{xx'}$ intuitively captures the closeness between $x$ and $x'$ with respect to the augmentation transformation. For most unrelated $x$ and $x'$, the value $w_{xx'}$ will be significantly smaller than the average value. For example, when $x$ and $x'$ are random croppings of a cat and a dog respectively, $w_{xx'}$ will be essentially zero because no natural data can be augmented into both $x$ and $x'$.

We understand that confusion may arise from the wording "augmentation graph" on page 6, which simply encodes the augmentation probability between any two images $x$ and $x'$. We have revised the draft accordingly and changed it to "augmentation transformation probability", which hopefully avoids the confusion.

> **W2. Model complexity**

Indeed, as you concur, directly performing eigendecomposition on the graph may be computationally intractable for real-world data with many images. That is precisely the reason for proposing the spectral loss (Section 3.2), which turns the eigendecomposition problem into a representation learning problem that can be optimized efficiently with neural networks.
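For intuition, the decomposition that the spectral loss sidesteps can still be computed explicitly on a toy graph. Below is a minimal numpy sketch; the 4-node adjacency matrix and all variable names are invented for illustration and are not from the paper.

```python
import numpy as np

# Hypothetical 4-node graph: two tight clusters {0,1} and {2,3} with weak
# cross-cluster edge weight (all numbers invented for illustration).
A = np.array([[0.0, 1.0, 0.05, 0.0],
              [1.0, 0.0, 0.0, 0.05],
              [0.05, 0.0, 0.0, 1.0],
              [0.0, 0.05, 1.0, 0.0]])

deg = A.sum(axis=1)
A_norm = A / np.sqrt(np.outer(deg, deg))   # D^{-1/2} A D^{-1/2}

# Explicit eigendecomposition: tractable here, intractable at scale.
eigvals, eigvecs = np.linalg.eigh(A_norm)  # eigenvalues in ascending order
emb = eigvecs[:, -2:]                      # top-2 eigenvectors as 2-D embeddings

# Same-cluster nodes land closer together than cross-cluster nodes.
same = np.linalg.norm(emb[0] - emb[1])
cross = np.linalg.norm(emb[0] - emb[2])
print(same < cross)  # True
```

The spectral loss described next replaces the `np.linalg.eigh` step with a learned network whose outputs play the role of the rows of `emb`.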
In particular, we parameterize the rows of the eigenvector matrix as a neural net function and assume embeddings can be represented by $f(x)$ for some $f \in \mathcal{F}$, where $\mathcal{F}$ is the hypothesis class containing neural networks. This provides us with the convenience of utilizing the power of neural networks and learning on a large amount of data. In other words, our loss function in Equation (6) bypasses graph construction and eigendecomposition, and can be optimized efficiently as a form of contrastive learning objective (by pulling closer the positive pairs and pushing apart the negative pairs).

Importantly, our learning objective allows us to draw a theoretical equivalence between the learned representations and the top-$k$ singular vectors of the normalized adjacency matrix $\tilde{A}$. Such equivalence facilitates theoretical understanding of the OOD generalization and OOD detection capability encoded in $\tilde{A}$, while enjoying the benefits of being end-to-end trainable. Empirically, we have demonstrated that our method is applicable to real-world datasets including ImageNet (Appendix E.2), without computational issues.

> **W3. Derivation**

For your reference, the full derivation can be found in **Appendix D**, pages 20-21. In short, the coefficient comes from the closed-form derivation of the top singular values.

> **W4. Typos**

Yes, $x^+$ should be replaced with $x'$. This has been fixed in our new draft. Thanks for the careful read!

> **Q1. Experiments on diverse datasets**

We agree that such an extension would be meaningful and important to support the broad applicability of our framework. Apart from CIFAR-10, we also provide large-scale results on the ImageNet dataset in **Appendix E.2** and additional results on the OfficeHome dataset in **Appendix E.3**.

## Response to Reviewer uUKR

We thank the reviewer for recognizing the novelty and soundness of our approach.
We are encouraged that you appreciate the theoretical justification and our experimental results. Below we address your comments in detail.

> **W1. Clarification on terminology "graph-theoretic framework"**

We thank you for raising this concern. The central component of our framework is to perform spectral decomposition on the population graph to construct principled embeddings for OOD generalization and OOD detection (see Section 3.2). Such a decomposition is fundamentally related to **spectral graph theory** [1,2]. Spectral graph theory is a classic research field concerning the study of graph partitioning through analyzing the eigenspace of the adjacency matrix. We provided an extensive discussion of spectral graph theory in the related work section (page 9), along with its connection to modern machine learning.

We would also like to point out that the terminology of a _graph-theoretic_ framework is adopted from the recent pioneering work of HaoChen et al. [3], which provided a theoretical analysis of unsupervised learning. Different from previous literature, our work focuses on the joint problem of OOD generalization and detection, which has a fundamentally different data setup and learning goals (cf. Section 2). In particular, we are dealing with unlabeled data drawn from a heterogeneous mixture distribution, which is more general and challenging than previous works. We are interested in leveraging labeled data to classify some unlabeled data correctly into the known categories while rejecting the remainder of the unlabeled data from new categories. Accordingly, we derive a novel theoretical analysis uniquely tailored to our problem setting, as shown in Section 4.

With that being said, we believe the suggested terminology of a _graph-based framework_ is also applicable, and it does not affect the essence of our contributions. We have accordingly revised our manuscript (changes marked in blue). Thank you again for pointing this out!

[1] Fan RK Chung and Fan Chung Graham. Spectral graph theory. Number 92. American Mathematical Soc., 1997.
[2] Jeff Cheeger. A lower bound for the smallest eigenvalue of the Laplacian. In Proceedings of the Princeton conference in honor of Professor S. Bochner, pages 195–199, 1969.
[3] Jeff Z. HaoChen, Colin Wei, Adrien Gaidon, and Tengyu Ma. Provable guarantees for self-supervised deep learning with spectral contrastive loss. In NeurIPS, pp. 5000–5011, 2021.

> **W2. Additional evaluations, baselines**

Literature in OOD generalization commonly considers covariate shift and domain shift. For domain shift, we have included additional evaluations on Office-Home in **Appendix E.3**, where the competitiveness of our method holds.

For baselines, we follow closely the latest work SCONE [4], which considers the identical problem setting as ours. In general, methods that have access to wild data are more competitive than standard OOD generalization methods (which only have access to in-distribution data). We believe our results and comparisons are meaningful since we already included the SOTA method SCONE. To make our comparison with OOD generalization baselines more convincing, we provide results for EQRM [5] and the latest SOTA baseline from CVPR'23, SharpDRO [6]. The results of employing SVHN as the semantic OOD dataset are shown in the table below (CIFAR-10 as ID, CIFAR-10-C as covariate-shifted OOD). Compared to the SOTA baseline tailored for OOD generalization, our method improves OOD Acc. by 7.59%.

| Method | OOD Acc. $\uparrow$ | ID Acc. $\uparrow$ | FPR $\downarrow$ | AUROC $\uparrow$ |
| ------------ | --------- | --------- | -------- | --------- |
| EQRM [5] | 75.71 | 92.93 | 51.86 | 90.92 |
| SharpDRO [6] | 79.03 | 94.91 | 21.24 | 96.14 |
| SLW (Ours) | 86.62±0.3 | 93.10±0.1 | 0.13±0.0 | 99.98±0.0 |

The numbers for the OOD detection baselines are consistent with Table 3 in Bai et al. Strictly following [4], we used a Wide ResNet with 40 layers and a widen factor of 2 to conduct our experiments.
The results may differ from those reported in the original baseline papers due to different network architectures and pre-training checkpoints.

[4] Bai, Haoyue et al. Feed two birds with one scone: Exploiting wild data for both out-of-distribution generalization and detection. In ICML, 2023.
[5] Eastwood, Cian et al. Probable domain generalization via quantile risk minimization. In NeurIPS, 2022.
[6] Huang, Zhuo et al. Robust generalization against photon-limited corruptions via worst-case sharpness minimization. In CVPR, 2023.

> **W3. Terminology in Section 3.1**

We agree with the reviewer that the title of Section 3.1 should be revised, since spectral graph theory only becomes relevant in subsequent sections. As suggested, we have changed it to simply "Graph Formulation". Thanks again for the suggestion!

## Response to Reviewer BLbH

We sincerely appreciate your positive and constructive feedback! We address the questions below in detail.

> **W1. Concept shifts**

That is a very insightful comment! We focus on covariate or domain shift for the OOD generalization task, since it is one of the most studied forms of dataset shift in popular benchmarks [1]. As you pointed out, we do recognize the existence of more challenging concept shifts, where the posterior distribution $p(y|x)$ changes while $p(x)$ remains the same. Adaptation in this setting can be fundamentally difficult without labeled target data: to estimate conditional distributions, one may need simultaneous observations of both variables. Admittedly, this setting would be difficult for our framework to solve, since we focus on leveraging unlabeled data, and moreover, the unlabeled data can be mixed with other types of distributional shifts (e.g., semantic shifts). We concur with the reviewer that the SVD decomposition on such joint distributions may not accurately capture the posterior distribution in the target domain.
With that being said, we do believe the problem posed here is very interesting and perhaps worth diving deeper in future work. We have revised our paper accordingly to reflect this. [1] Gulrajani, Ishaan et al. "In search of lost domain generalization." arXiv:2007.01434 (2020). > **W2. Theoretical guarantees and assumptions** We provide a theoretical guarantee for OOD generalization in Theorem 4.1, which bounds the linear probing error $\mathcal{E}(f)$ using the learned representations. As defined in Equation (7), the linear probing error measures the misclassification of linear head on covariate-shifted OOD data. We show that when the scaled connection between the class is stronger than the domain, the model could learn a perfect ID classifier and effectively generalize to the covariate-shifted domain, achieving perfect OOD generalization with linear probing error $\mathcal{E}(f)=0$. For the above theorem to hold, we assume the magnitude order of the probability of augmentation follows $\rho\gg \text{max}(\alpha,\beta)\ge\text{min}(\alpha,\beta)\gg\gamma\ge0$, where $\alpha$ indicates the augmentation probability when two samples share the same label but different domains, $\beta$ indicates the probability when two samples share different class labels but with the same domain, and $\gamma$ is the probability when two samples differ in both class and domain labels. Indeed, as you pointed out, generalization to arbitrary OOD can be impossible when the test distribution is unknown [2,3]. Different from prior literature, our problem setting considers both ID data as well as unlabeled wild data (_which contains samples from the covariate-shifted domain_). Thus, the previous theory in the OOD-agnostic setting no longer applies to our case. We show that the generalization can provably reach low error when one can learn from the wild data, which provides new insight for the community. [2] Gilles Blanchard et al. 
Generalizing from several related classification tasks to a new unlabeled sample. In NeurIPS, 2011. [3] Krikamol Muandet et al. Domain generalization via invariant feature representation. In ICML, 2013 > **W3. Motivation for choosing a graph to model the sample correlations.** Graph is a classic structure to model the connectivity among points, and reveal useful sub-structures. The sub-structures may correspond to images from different known classes, or OOD data with unknown classes. By performing spectral decomposition on such a graph, our driving motivation is to uncover meaningful structures for both OOD generalization and detection (e.g., covariate-shifted OOD data is close to the ID data, whereas semantic-shifted OOD data is distinguishable from ID data). Importantly, the graph allows us to theoretically understand how wild unlabeled data impacts OOD generalization and detection, through the lens of spectral graph theory. By way of spectral decomposition on the adjacency matrix $\widetilde{A}$, we can rigorously reason the OOD generalization and detection capability encoded in $\widetilde{A}$. As exemplified in Theorem 4.1 and Theorem 4.2, we analyze closed-form solutions for the OOD generalization and detection error. We believe our graph-based formulation provides a new angle to the community for the problems of OOD generalization and OOD detection, and may inspire more future work. > Q1. Without constructing a graph Our key insight is that our spectral contrastive loss (Equation 6) can be derived from a spectral decomposition of the graph adjacency matrix $\tilde A$. This loss effectively turns the eigendecomposition problem into a representation learning problem that can be optimized efficiently with neural networks. In other words, our loss function in Equation (6) allows bypassing the graph construction and eigendecomposition, and can be optimized as a form of contrastive learning objective.
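To make the pull-together/push-apart behavior concrete, here is a toy numpy sketch of a spectral-contrastive-style objective in the spirit of HaoChen et al.'s loss; the function name, the 2-sample embeddings, and all numbers are invented for illustration and this is not the authors' implementation.

```python
import numpy as np

def spectral_contrastive_loss(z1, z2):
    """Spectral-contrastive-style loss on (n, d) embeddings of two views.

    The first term pulls paired views together; the squared inner-product
    term pushes independently drawn pairs apart.
    """
    pos = -2.0 * np.mean(np.sum(z1 * z2, axis=1))  # -2 E[f(x)^T f(x^+)]
    neg = np.mean((z1 @ z2.T) ** 2)                # E[(f(x)^T f(x'))^2]
    return pos + neg

# Toy embeddings: two samples in 2-D.
z = np.array([[1.0, 0.0],
              [0.0, 1.0]])
swapped = z[::-1]  # deliberately mispaired "views"

aligned_loss = spectral_contrastive_loss(z, z)          # -1.5
misaligned_loss = spectral_contrastive_loss(z, swapped)  # 0.5
print(aligned_loss < misaligned_loss)  # aligned views achieve a lower loss
```

Minimizing such an objective over a network's outputs recovers (up to rotation) the top eigenvector embeddings, without ever materializing the graph.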
