# General response

We would like to thank all the reviewers for their valuable effort and helpful comments. We find it particularly encouraging that reviewers consider the hyperspherical energy score a **promising** and **better-suited** OOD detection score due to **its connection to the log-likelihood** (oN3L, td2X, Vf4g), that the method is **empirically supported** through **extensive** and **well designed and conducted experiments** (JWN7, oN3L, td2X, Vf4g), and that it is strengthened by **extensive ablations** (td2X). We are also glad that the reviewers find our paper **clear** and **easy to follow** (JWN7, oN3L, td2X, Vf4g).

The theoretical and practical significance of our work forms a key part of the positive feedback we have received. Our method, the first to connect hyperspherical representations and Helmholtz free energy for OOD detection, offers unique contributions and rigorous theoretical interpretations. In practice, our method outperforms existing OOD detection scores in speed, in performance on benchmarks such as ImageNet-1k, and in reduction of FPR95. Compared to the original energy score [1] --- one of the most commonly used scoring functions today --- our hyperspherical energy steers the OOD community in a new direction with theoretical soundness and significant performance benefits (a 40% reduction in FPR95$\downarrow$ on CIFAR-100).

We respond to each reviewer's comments in detail below. We will also revise the manuscript according to the reviewers' suggestions, which will make our paper more comprehensive for readers.

[1] Liu et al., Energy-based Out-of-distribution Detection. NeurIPS 2020.

# Reviewer JWN7

We sincerely thank the reviewer for the detailed review and constructive criticism. We also appreciate the reviewer's recognition of our paper's clarity and strong results. We address the question below in detail.

> **Q1: Difference between hyperspherical energy and CIDER [1]**

CIDER and our approach differ significantly in their **test-time** OOD detection method. The design of the OOD scoring function is the central intellectual component of much of the OOD literature, and it is what we contribute and focus on in this paper. The difference between CIDER and our approach is discussed in the introduction (**L59-L69**). Specifically, CIDER relies on a _non-parametric_ KNN score [2], which requires a nearest-neighbor search in the learned embedding space. In contrast, our OOD detection relies on a novel hyperspherical energy formulation, which can be viewed as a _parametric_ OOD scoring function (a sketch contrasting the two scores follows the list below). To further highlight the significance and novelty of our approach:

- **Theoretical significance.** Our method is the first to establish the connection between hyperspherical representations and Helmholtz free energy for OOD detection. Our OOD scoring function enjoys a rigorous theoretical interpretation from a log-likelihood perspective, while CIDER's does not. Our derivation and interpretation of hyperspherical energy in Section 3.1 is an entirely new contribution relative to CIDER. From a training-time perspective, we also derive new insight into how the learning objective induces lower hyperspherical energy (Section 3.2), which directly connects to our proposed OOD scoring function.
- **Practical significance.** We also show empirically that our proposed OOD score achieves competitive performance on different OOD detection benchmarks and is computationally efficient compared to CIDER (Section 4). For example, on the large-scale ImageNet benchmark, our method is more than 10x faster than CIDER while simultaneously reducing the average FPR95 by 11.85% and establishing state-of-the-art performance. Moreover, hyperspherical energy is a privacy-friendly OOD detection score: it does not require access to a pool of ID data, whereas the KNN-based CIDER method needs a certain amount of labeled ID data at test time, which can pose privacy risks when data privacy is a concern.
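For concreteness, here is a minimal sketch of the two test-time scores (our illustrative reconstruction in NumPy, assuming unit-normalized embeddings and class prototypes; the function names and temperature value are ours, not the paper's exact interface):

```python
import numpy as np
from scipy.special import logsumexp

def knn_score(z, id_bank, k=50):
    """Non-parametric score (KNN [2], used by CIDER): distance to the k-th
    nearest ID training embedding. The ID bank must be kept at test time."""
    dists = np.linalg.norm(id_bank - z, axis=1)
    return np.sort(dists)[k - 1]

def hyperspherical_energy(z, prototypes, tau=0.1):
    """Parametric score (sketch of ours): only the C class prototypes are
    needed at test time, so no raw ID data is exposed. Assumes unit-norm z
    and prototypes, with per-class energy E(z, c) = -z . mu_c."""
    sims = prototypes @ z                  # cosine similarities, shape (C,)
    return -tau * logsumexp(sims / tau)    # E(z) = -tau log sum_c e^{sims/tau}
```

The parametric score replaces an $O(N)$ nearest-neighbor search over the ID bank with an $O(C)$ prototype lookup, which underlies both the speed and the privacy benefits discussed above.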
Lastly, we do not claim novelty on learning hyperspherical embeddings, since the von Mises-Fisher (vMF) distribution is long established in directional statistics [3] and pre-dates CIDER. We also properly credit and cite the prior works [2,4] that enable efficient optimization using the prototype update scheme (please see L161, L198). These are not our core contributions -- but they enable our work to propose a novel hyperspherical energy method for OOD detection.

[1] Ming et al., How to Exploit Hyperspherical Embeddings for Out-of-Distribution Detection? ICLR 2023.
[2] Sun et al., Out-of-Distribution Detection with Deep Nearest Neighbors. ICML 2022.
[3] P.E. Jupp and K.V. Mardia. Directional Statistics. Wiley Series in Probability and Statistics. Wiley, 2009.
[4] Li et al., MoPro: Webly Supervised Learning with Momentum Prototypes. ICLR 2021.

# Reviewer oN3L

We sincerely appreciate your positive feedback and insightful comments. Your acknowledgment of our work's clarity and unique contribution means a lot to us. We address the questions below in detail.

> **Q1: OOD data's hyperspherical energy**

Thank you for the insightful question. Our method operates under the assumption that OOD data lies in the low-likelihood (high hyperspherical energy) region of the hyperspherical space, relative to the ID data, which has higher likelihood (low hyperspherical energy). This assumption is natural and commonly adopted by many likelihood-based approaches. Indeed, our empirical results on CIFAR-10, CIFAR-100, and the more challenging ImageNet-1k benchmark support the high hyperspherical energy of OOD data. Effectively, the low FPR is a result of separable hyperspherical energy distributions between ID and OOD data (a toy illustration follows at the end of this answer).

On the theoretical side, there is a tradeoff between the data used for training and the kind of guarantees we can provide. For example, in the most practical and unrestrictive setting with ID data only, we can indeed only guarantee inducing lower hyperspherical energy for ID data (as you said); this is inherently limited by the data exposed to the learner. Alternatively, one could extend our framework to a more restrictive data setting that imposes auxiliary outlier data and explicitly optimizes for high hyperspherical energy on the outlier training data. This, in turn, could provide some degree of guarantee on OOD data. For this study, we intentionally focused our scope on the former case (which is more general), and we believe the latter is an interesting extension to look into in the near future.
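As a toy illustration of this point (synthetic numbers only, not results from the paper), separable energy distributions immediately yield a low FPR95:

```python
import numpy as np

def fpr_at_95_tpr(id_energy, ood_energy):
    # Threshold at the 95th percentile of ID energy (95% of ID accepted),
    # then measure the fraction of OOD energies falling below the threshold.
    thresh = np.percentile(id_energy, 95)
    return float(np.mean(ood_energy <= thresh))

rng = np.random.default_rng(0)
id_e = rng.normal(-1.0, 0.3, size=10_000)   # ID: low hyperspherical energy
ood_e = rng.normal(0.5, 0.4, size=10_000)   # OOD: high hyperspherical energy
print(fpr_at_95_tpr(id_e, ood_e))           # near zero once the modes separate
```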
> **Q2: Why do generative models like VAE and Flow suffer from overestimation of the likelihood for OOD data [1,2], while this work does not?**

That is another insightful question. We would like to briefly mention several studies aiming to understand this phenomenon. For example, Ren et al. [3] showed that deep generative models' reliance on the background can undesirably lead to high likelihood estimates for OOD samples. Kirichenko et al. [4] showed that generative models overfit the training data, especially the background pixels that are non-essential for determining the image semantics.

In contrast, our method does not suffer from this issue, since the training objective is in essence _supervised_ learning rather than _unsupervised_ generative modeling like VAE or Flow. As seen in Equation (12), the learning objective has access to the ground-truth semantic label of each ID sample. Hence, the latent embeddings are shaped and guided by the image semantics, without being prone to overfitting the background like deep generative models. In general, methods relying on supervised learning offer stronger performance than their generative counterparts. What makes our method compelling is that it offers the best of both worlds --- hyperspherical energy displays competitive empirical performance while enjoying a theoretical interpretation from a log-likelihood perspective.

[1] Nalisnick et al., Do Deep Generative Models Know What They Don't Know? ICLR 2019.
[2] Li et al., Out-of-Distribution Detection with an Adaptive Likelihood Ratio on Informative Hierarchical VAE. NeurIPS 2022.
[3] Ren et al., Likelihood Ratios for Out-of-Distribution Detection. NeurIPS 2019.
[4] Kirichenko et al., Why Normalizing Flows Fail to Detect Out-of-Distribution Data. NeurIPS 2020.

> **Q3: On the relationship between $P_X(x)$ and $P_Z(z)$**

The discrepancy arises because our method is not a deep generative model (which optimizes $p(x)$) but a supervised model (which optimizes the posterior probability $p(y|x)$). Precisely because supervised models do not explicitly optimize $p(x)$, the issue you raise applies to all OOD detection methods relying on supervised learning; this is also why classification-based OOD detection is fundamentally challenging :) Fortunately, hyperspherical energy offers a theoretical guarantee that OOD data can be detected if it lies in the low-likelihood region of the hyperspherical space. Hence, compared to prior OOD detection methods based on supervised models, our method makes a significant step forward by connecting to the log-likelihood interpretation (which usually requires generative modeling). Mitigating the issue you mention may require a hybrid model that combines supervised learning and generative modeling, the latter of which is often difficult to optimize in practice. In contrast, our method, based purely on supervised learning, can be tractably optimized, is easy to use in practice, and offers strong empirical performance (outperforming SOTA baselines).

> **Typos**

All fixed - thank you for the careful read!

# Reviewer td2X

We sincerely appreciate the reviewer's acknowledgment of the conciseness of our work and the extensiveness of our experiments. We are grateful for the thorough comments and suggestions. We address the questions in detail below.

> **Rapid explanation of key developments**

It is helpful to hear your feedback on this. Although we were constrained by the page limit, we plan to expand the details in the revised supplementary material.
To elaborate, we assess the gradient of the loss function with respect to the model parameters $\theta$ for an embedding $\mathbf{z}$ and its corresponding class label $y$, as illustrated in Equation (14). Applying the chain rule, we compute the partial derivative with respect to $\theta$, as presented in Equations (14.1, 14.2). For Equation (14.3), we substitute the probability function with the Gibbs-Boltzmann distribution, as specified in Equation (2), which we then rewrite as Equation (14.4).

$$
\begin{align}
\frac{\partial \mathcal{L}(\mathbf{z}, y ; \theta)}{\partial \theta} &= \frac{1}{\tau} \frac{\partial E(\mathbf{z}, y)}{\partial \theta} + \sum_{j=1}^C \frac{\partial \exp(-E(\mathbf{z}, j) / \tau)}{\partial \theta} \cdot \frac{1}{\sum_{c=1}^C \exp(-E(\mathbf{z}, c) / \tau)} \tag{14.1} \\
&= \frac{1}{\tau} \frac{\partial E(\mathbf{z}, y)}{\partial \theta} - \frac{1}{\tau} \sum_{j=1}^C \frac{\partial E(\mathbf{z}, j)}{\partial \theta} \cdot \frac{\exp(-E(\mathbf{z}, j) / \tau)}{\sum_{c=1}^C \exp(-E(\mathbf{z}, c) / \tau)} \tag{14.2} \\
&= \frac{1}{\tau} \frac{\partial E(\mathbf{z}, y)}{\partial \theta} - \frac{1}{\tau} \sum_{j=1}^C \frac{\partial E(\mathbf{z}, j)}{\partial \theta} \cdot p(Y=j \mid \mathbf{z}) \tag{14.3} \\
&= \frac{1}{\tau} \left( \frac{\partial E(\mathbf{z}, y)}{\partial \theta} \big(1 - p(Y=y \mid \mathbf{z})\big) - \sum_{j \neq y} \frac{\partial E(\mathbf{z}, j)}{\partial \theta} \, p(Y=j \mid \mathbf{z}) \right) \tag{14.4}
\end{align}
$$

The final expression is a contrastive term: during loss optimization, the energy of the correct label $y$ is lowered while the energies of all other labels are raised. Furthermore, the sample-wise hyperspherical energy of ID data, $E(\mathbf{z}) = -\tau \cdot \log \sum_{c=1}^C \exp(-E(\mathbf{z}, c) / \tau)$, is dominated by $E(\mathbf{z}, y)$ for the ground-truth label. Hence, training overall induces lower hyperspherical energy for ID data. By further connecting to our Lemma 3.1, this low hyperspherical energy directly translates into a high ($\uparrow$) log-likelihood $\log p(\mathbf{z})$ for the training data distribution. The sketch below numerically checks this identity.
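The following sketch (PyTorch) verifies the contrastive form of Equation (14.4) against autograd on the loss. The per-class energy here is an illustrative stand-in (negative cosine similarity to a learnable prototype $\boldsymbol{\mu}_c$, consistent with the vMF setup); the identity itself holds for any differentiable $E(\mathbf{z}, c)$.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
C, d, tau = 4, 8, 0.1
mu = torch.randn(C, d, requires_grad=True)   # prototypes, standing in for theta
z = F.normalize(torch.randn(d), dim=0)       # a fixed unit-norm embedding
y = 2

def energy(mu):
    # Illustrative per-class energy E(z, c) = -z . mu_c / ||mu_c||.
    return -(F.normalize(mu, dim=1) @ z)

# Loss of Eq. (12): L = E(z, y)/tau + log sum_c exp(-E(z, c)/tau).
loss = energy(mu)[y] / tau + torch.logsumexp(-energy(mu) / tau, dim=0)
loss.backward()
grad_autograd = mu.grad.clone()

# Eq. (14.4): contrastive combination of the per-class energy gradients.
with torch.no_grad():
    p = torch.softmax(-energy(mu) / tau, dim=0)   # p(Y = j | z)
grad_manual = torch.zeros_like(mu)
for j in range(C):
    mu.grad = None
    energy(mu)[j].backward()
    coeff = (1 - p[y]) if j == y else -p[j]
    grad_manual += coeff / tau * mu.grad

print(torch.allclose(grad_autograd, grad_manual, atol=1e-6))   # True
```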
> **Discussion on other hyperspherical approaches**

Great suggestion. Our method differs from SSD+ [1] in terms of both the training-time loss and the test-time scoring function.

- At training time, SSD+ directly uses the off-the-shelf SupCon (Supervised Contrastive) loss [2], which does not explicitly model the latent representations as vMF distributions. Instead of promoting instance-to-prototype similarity, SupCon promotes instance-to-instance similarity among positive pairs; its formulation thus does not directly correspond to the vMF distribution. Geometrically speaking, our framework operates directly on the vMF distribution, a key property that enables our hyperspherical energy score with its log-likelihood interpretation.
- At test time, SSD+ uses the Mahalanobis distance, whereas we propose a novel hyperspherical energy score that is compatible with the learned embedding geometry.

As suggested, we will include the following table in our revised paper, summarizing the key distinctions among different hyperspherical approaches.

| | SSD+ | KNN+ | CIDER | Ours |
|-|-|-|-|-|
| Training-time loss function | SupCon | SupCon | vMF | vMF |
| Test-time scoring function | Mahalanobis (parametric) | KNN (non-parametric) | KNN (non-parametric) | Hyperspherical energy (parametric) |

> **Clarification on CIFAR-10 performance**

We would like to note that CIFAR-10 is a very saturated benchmark. We leave it to the Appendix because it is considered an almost-solved task, and our performance is on par with CIDER (the baseline most closely related to ours). Under the same embeddings, our method achieves an FPR95 of 14.16%, which is similar to CIDER's (13.85%). While FPR95 can be somewhat sensitive because it is threshold-dependent, the overall AUROC (averaged across all thresholds) is almost identical to CIDER's (97.62 vs. 97.51). We believe more challenging benchmarks such as ImageNet-1k and CIFAR-100 better signify the efficacy of an OOD detection approach; these are discussed extensively in the main paper.

> **Comparison with ReAct + DICE**

We appreciate your suggestion and have conducted experiments comparing our method with the combination of DICE [3] and ReAct [4]. We set the percentile $p$ for ReAct and the sparsity parameter $p$ for DICE following their respective validation strategies. Our findings indicate that even this combination still underperforms the hyperspherical energy score; detailed results are outlined below. Furthermore, in our revised paper, we plan to integrate the ReAct + DICE results into Tables 2, 3, and 5 for a more comprehensive comparison.

**CIFAR-10**

| | SVHN | | Places365 | | LSUN | | iSUN | | Texture | | Average | |
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
| **Method** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** |
| ReAct | 48.21 | 92.20 | 48.11 | 90.97 | 23.03 | 95.96 | 22.02 | 96.38 | 48.90 | 91.19 | 38.05 | 93.34 |
| DICE | 65.34 | 89.66 | 50.44 | 89.81 | 3.95 | 99.21 | 34.98 | 94.87 | 59.22 | 88.50 | 42.79 | 92.41 |
| ReAct + DICE | 48.44 | 91.32 | 61.97 | 87.65 | 9.95 | 98.11 | 21.76 | 96.42 | 42.59 | 91.62 | 36.94 | 93.02 |
| Hyperspherical energy | **3.89** | **99.28** | **32.59** | **94.14** | **3.05** | **99.29** | **16.02** | **97.20** | **15.27** | **97.64** | **14.16** | **97.51** |

**CIFAR-100**

| | SVHN | | Places365 | | LSUN | | iSUN | | Texture | | Average | |
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
| **Method** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** |
| ReAct | 67.07 | 86.83 | 80.98 | 77.39 | 62.89 | 86.90 | 61.62 | 86.61 | 75.69 | 82.85 | 69.65 | 84.12 |
| DICE | 53.23 | 89.70 | 83.03 | 75.49 | 44.63 | 91.38 | 74.87 | 79.05 | 84.68 | 73.29 | 68.09 | 81.78 |
| ReAct + DICE | 33.05 | 93.49 | 90.38 | 67.91 | 40.92 | 90.46 | 78.90 | 79.36 | 63.24 | 83.30 | 61.30 | 82.90 |
| Hyperspherical energy | **17.81** | **96.39** | **76.68** | **76.01** | **8.48** | **98.25** | **57.39** | **86.21** | **32.07** | **93.25** | **38.49** | **90.02** |
**ImageNet-1k**

| | iNaturalist | | SUN | | Places | | Textures | | Average | |
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
| **Method** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** |
| ReAct | 20.38 | 96.22 | **24.20** | **94.20** | **33.85** | **91.58** | 47.30 | 89.80 | 31.43 | 92.95 |
| DICE | 25.63 | 94.49 | 35.15 | 90.83 | 46.49 | 87.48 | 31.72 | 90.30 | 34.75 | 90.77 |
| ReAct + DICE | 18.64 | 96.24 | 25.45 | 93.94 | 36.86 | 90.67 | 28.07 | 92.74 | 27.25 | 93.40 |
| Hyperspherical energy | **8.76** | **98.00** | 36.95 | 91.52 | 49.33 | 87.67 | **11.45** | **96.56** | **26.62** | **93.44** |

> **Hyperspherical energy with ReAct**

Thank you for the suggestion. We have conducted experiments on ImageNet combining ReAct [4] with the hyperspherical energy score. The results are provided below for your reference; we observe a slight improvement (a sketch of the combination follows the table).

| | iNaturalist | | SUN | | Places | | Textures | | Average | |
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
| **Method** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** |
| Hyperspherical energy + ReAct ($p=90\%$) | 13.31 | 97.55 | 30.65 | 93.53 | 44.08 | 89.53 | 14.45 | 97.02 | 25.62 | 94.41 |
| Hyperspherical energy + ReAct ($p=95\%$) | 10.34 | 97.94 | 29.37 | 93.61 | 43.74 | 89.72 | 12.13 | 97.21 | 23.89 | 94.62 |
| Hyperspherical energy + ReAct ($p=99\%$) | 9.18 | 98.05 | 33.46 | 92.54 | 47.38 | 88.55 | 10.57 | 97.23 | 25.15 | 94.09 |
| Hyperspherical energy | 8.76 | 98.00 | 36.95 | 91.52 | 49.33 | 87.67 | 11.45 | 96.56 | 26.62 | 93.44 |
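For reference, a minimal sketch of the combination (our illustrative reconstruction in NumPy; the feature and prototype arrays are synthetic stand-ins for activations extracted from the trained encoder):

```python
import numpy as np
from scipy.special import logsumexp

def react_truncate(feats, c):
    # ReAct [4]: clamp penultimate activations at threshold c.
    return np.minimum(feats, c)

def hyperspherical_energy(feats, prototypes, tau=0.1):
    # Sketched score: project onto the unit hypersphere, then
    # E(z) = -tau * log sum_c exp(z . mu_c / tau).
    z = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    mu = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return -tau * logsumexp(z @ mu.T / tau, axis=1)

rng = np.random.default_rng(0)
id_feats = rng.standard_normal((1000, 512)).clip(min=0)    # toy ReLU features
test_feats = rng.standard_normal((8, 512)).clip(min=0)
prototypes = rng.standard_normal((100, 512))

# The threshold is the p-th percentile of ID activations (p in {90, 95, 99}).
c = np.percentile(id_feats, 95)
scores = hyperspherical_energy(react_truncate(test_feats, c), prototypes)
```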
[1] Sehwag et al., SSD: A Unified Framework for Self-Supervised Outlier Detection. ICLR 2021.
[2] Khosla et al., Supervised Contrastive Learning. NeurIPS 2020.
[3] Sun et al., DICE: Leveraging Sparsification for Out-of-Distribution Detection. ECCV 2022.
[4] Sun et al., ReAct: Out-of-Distribution Detection With Rectified Activations. NeurIPS 2021.

# Reviewer Vf4g

We truly value your acknowledgment of the clarity and organization of our paper, as well as the significance of our goal of addressing the issues with unconstrained energy scores from a log-likelihood perspective. Your assessment is encouraging and insightful. We address the questions below in detail.

> **Difference w.r.t. SIREN [1]**

We highlight the major differences below:

- SIREN uses the maximum class-conditional likelihood, which is mathematically different from the hyperspherical energy score. Hyperspherical energy is theoretically sound for OOD detection due to its log-likelihood interpretation (see Section 3.1 for details), whereas SIREN's scoring function does not enjoy such an interpretation. Our derivation and interpretation of hyperspherical energy in Section 3.1 is an entirely new contribution relative to SIREN.
- SIREN primarily focuses on a representation-shaping loss at training time. In contrast, we focus on a novel test-time OOD scoring function. Our method is the first to establish the connection between hyperspherical representations and Helmholtz free energy for OOD detection. From a training-time perspective, we also derive new insight into how the learning objective induces lower hyperspherical energy (Section 3.2), which directly connects to our proposed OOD scoring function.

> **Visualization analysis for large-scale dataset**

That is a great suggestion. As part of the author response, we have included a PDF that illustrates the learned embeddings via a UMAP visualization on ImageNet (a reproduction sketch follows the figure). We indeed observe compact representations, where each sample is effectively pulled close to its corresponding class prototype. A notable separation between in-distribution (ID) and OOD classes is also evident, suggesting that the OOD samples exhibit high hyperspherical energy scores.

![](https://hackmd.io/_uploads/SkL-ZBKs2.png)
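For readers who wish to reproduce this kind of plot, a minimal sketch follows (using the `umap-learn` package; the feature arrays are synthetic stand-ins for embeddings extracted from the trained encoder):

```python
import numpy as np
import matplotlib.pyplot as plt
import umap  # pip install umap-learn

rng = np.random.default_rng(0)
id_feats = rng.standard_normal((2000, 512))   # stand-in: ID embeddings
ood_feats = rng.standard_normal((500, 512))   # stand-in: OOD embeddings

# The cosine metric matches the unit-hypersphere geometry of the embeddings.
proj = umap.UMAP(metric="cosine", random_state=0).fit_transform(
    np.concatenate([id_feats, ood_feats]))

plt.scatter(*proj[: len(id_feats)].T, s=2, label="ID")
plt.scatter(*proj[len(id_feats):].T, s=2, label="OOD")
plt.legend()
plt.savefig("umap_id_vs_ood.png", dpi=200)
```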
> **Comparison with FeatureNorm [2]**

Thank you for your suggestion! Following your advice, we will update the results for FeatureNorm [2] in our revised manuscript, modifying Tables 2, 3, 4, and 5 to facilitate a more extensive comparison. We provide the updated results in the tables below for your reference. We use the ImageNet-1k results as reported in [2]; for CIFAR-10 and CIFAR-100, we run the experiments using block 4.1 as the selected block. We observe that our method consistently outperforms FeatureNorm across all the benchmarks.

**CIFAR-10**

| | SVHN | | Places365 | | LSUN | | iSUN | | Texture | | Average | |
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
| **Method** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** |
| FeatureNorm | 8.79 | 98.27 | 76.75 | 79.84 | 0.16 | 99.92 | 37.67 | 94.17 | 29.96 | 94.08 | 30.67 | 93.26 |
| Hyperspherical energy | 3.89 | 99.28 | 32.59 | 94.14 | 3.05 | 99.29 | 16.02 | 97.20 | 15.27 | 97.64 | 14.16 | 97.51 |

**CIFAR-100**

| | SVHN | | Places365 | | LSUN | | iSUN | | Texture | | Average | |
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
| **Method** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** |
| FeatureNorm | 52.69 | 87.95 | 95.26 | 55.62 | 5.96 | 98.74 | 99.33 | 38.51 | 62.11 | 76.16 | 63.07 | 71.40 |
| Hyperspherical energy | 17.81 | 96.39 | 76.68 | 76.01 | 8.48 | 98.25 | 57.39 | 86.21 | 32.07 | 93.25 | 38.49 | 90.02 |

**ImageNet-1k**

| | iNaturalist | | SUN | | Places | | Textures | | Average | |
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
| **Method** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** | **FPR** | **AUROC** |
| FeatureNorm | 22.01 | 95.76 | 42.93 | 90.21 | 56.80 | 84.99 | 20.07 | 95.39 | 35.45 | 91.59 |
| Hyperspherical energy | 8.76 | 98.00 | 36.95 | 91.52 | 49.33 | 87.67 | 11.45 | 96.56 | 26.62 | 93.44 |

[1] Du et al., SIREN: Shaping Representations for Detecting Out-of-Distribution Objects. NeurIPS 2022.
[2] Yu et al., Block Selection Method for Using Feature Norm in Out-of-Distribution Detection. CVPR 2023.
