NeurIPS Rebuttal 2021
# **Response to reviewer WNca**

Thank you for taking the time to review our submission. Here are some clarifications regarding your observations. We hope that they improve your opinion of our work, and we kindly ask you to consider raising your score.

* **Regarding the practical implications of our work:** Although the main purpose of our analysis is to highlight the different generalization behaviors of different regularizers, some practical implications can be drawn from our work: it may be a good idea to use nuclear norm rather than Frobenius norm regularization in conjunction with quadratic classifiers, especially in high dimensions when the data is nearly isotropic (a minimal illustrative sketch of this comparison follows this list). However, we stress that our paper is not arguing for the use of one regularizer over the other; we are simply showing that an observed phenomenon (improved sample complexity) can be explained by regularization. More generally, our results also *suggest* (but do not imply) two further avenues of research: (1) it might be a good idea to try nuclear norm regularization in multiplicative networks [A], which are composed of quadratic layers, and (2) the adaptivity of neural networks to the data might be due to small nuclear norm of the weight matrices (this aligns nicely with the line of work arguing that SGD implicitly minimizes rank, for which the nuclear norm is a proxy). Of course, **these suggestions are not part of the contributions of our paper**. We will clarify this in the conclusion section of the final version.
* **Regarding the choice of $\lambda$ in the experiments:** We set $\lambda = 1$ in the experiments. (This is indicated on line 307, but we can make it figure more prominently.)
* **Regarding the assumption $\|x\|_2^2 \geq c\, \mathbb{E}\|x\|_2^2$ almost surely:** We think you mean the inequality $\|x\|_2^2 \leq c\, \mathbb{E}\|x\|_2^2$ in Lemma 4 (line 207)? This is only a technical assumption: it means that the distribution is bounded and that its magnitude concentrates near the mean. It is satisfied, in particular, by natural images with fixed average pixel magnitude. Writing it this way lets us cover the entire class of bounded distributions whose magnitudes may scale with dimension; once normalized by the expected norm, the dimension dependence disappears. Indeed, without such an assumption the empirical covariance is not a good estimator of the true covariance (see Theorem 5.6.1 and Exercise 5.6.5 in [B]).
* **Regarding the interpretation of the experiments, and Figure 1.b:** Our theoretical bounds, like all uniform generalization bounds, hold for the worst function in the class, so we designed the experiment to find this worst-performing function. We do so to exhibit the intrinsic-dimension dependence of the worst train-test gap and to corroborate our theoretical findings. We do not think the decrease observed in Figure 1.b is substantial enough to conclude that there is a downward trend: in the rightmost part of the plot, for instance, the generalization gap starts to increase slightly, breaking the downward trajectory. Overall, we believe the trend is *flat*. The main message is that the different regularizations clearly exhibit a _different sensitivity_ to an increase in dimension. The plot is the result of non-convex optimization in increasing dimensions, so some variability is to be expected.
* **Regarding the writing style:**
  * *<u>In Proof of Cor 2 it says "both identities in eq. 3" -- but eq(3) is one inequality!</u>*: We will rephrase this more clearly. What we mean is that one should use the two identities $r(\Sigma) \approx d^s$ and $\|\Sigma\|_2 \approx d^{1-s}$ together with the inequality given in (3) to arrive at the final result. The line should be read as "Plugging the two previously mentioned identities into the right-hand side of inequality (3)...".
  * *<u>bottom of p 8: "try to find" sounds vague - indeed, what if the optimiser doesn't find the function we look for?</u>*: The problem is a non-convex maximization problem; therefore, even if the optimizer does not find the maximum, it nevertheless provides a lower bound.
  * *<u>top of p 10: what is mean by "an equal regularisation parameter"?</u>*: This line means that the parameter $\lambda$ has the same numerical value for both regularizations.
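To make the comparison in the first point concrete, here is a minimal, purely illustrative sketch (this is *not* the code of our experiments; the logistic loss, proximal-gradient optimizer, step size, and planted model below are assumptions made only for this example) showing how the two penalties enter the same training loop for a quadratic classifier $\mathrm{sign}(x^\top A x)$:

```python
# Illustrative sketch: quadratic classifier sign(x^T A x) trained with
# proximal gradient descent under either a nuclear- or a Frobenius-norm penalty.
import numpy as np

def prox_nuclear(A, t):
    """Prox of t * nuclear norm: soft-threshold the singular values of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

def prox_frobenius(A, t):
    """Prox of t * Frobenius norm: shrink the whole matrix toward zero."""
    nrm = np.linalg.norm(A)
    return A * max(1.0 - t / nrm, 0.0) if nrm > 0 else A

def train(X, y, prox, lam=1.0, lr=0.1, steps=500):
    """Minimize mean logistic loss of y * x^T A x plus lam * norm(A)."""
    n, d = X.shape
    A = np.zeros((d, d))
    for _ in range(steps):
        margins = y * np.einsum('ni,ij,nj->n', X, A, X)
        coef = -y / (1.0 + np.exp(np.clip(margins, -30, 30)))  # per-sample loss derivative
        grad = (X * coef[:, None]).T @ X / n                   # gradient w.r.t. A
        A = prox(A - lr * grad, lr * lam)                      # penalty applied via prox
    return A

rng = np.random.default_rng(0)
d, n = 20, 200
X = rng.standard_normal((n, d))                 # nearly isotropic data
u, v = rng.standard_normal(d), rng.standard_normal(d)
A_star = np.outer(u, u) - np.outer(v, v)        # planted rank-2, indefinite matrix
y = np.sign(np.einsum('ni,ij,nj->n', X, A_star, X))
A_nuc = train(X, y, prox_nuclear)               # tends to recover a low-rank A
A_fro = train(X, y, prox_frobenius)
```

With $\lambda$ taking the same numerical value in both runs (as in our experiments, where $\lambda = 1$), the only difference between the two methods is the proximal map, which makes the two regularizers directly comparable.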
# **References**

[A] Siddhant M. Jayakumar, Wojciech M. Czarnecki, Jacob Menick, Jonathan Schwarz, Jack Rae, Simon Osindero, Yee Whye Teh, Tim Harley, Razvan Pascanu. "Multiplicative Interactions and Where to Find Them." ICLR 2020.

[B] Roman Vershynin. "High-Dimensional Probability: An Introduction with Applications in Data Science." 2020. https://www.math.uci.edu/~rvershyn/papers/HDP-book/HDP-book.html

# **Response to reviewer DAu4**

Thank you for taking the time to review our paper. We will make the following changes to improve the presentation. We will precisely define quadratic classifiers as classifiers based on the sign of a quadratic polynomial. We will add that by pixel space we mean the space $[0, 1]^d$ to which images belong, as they are arrays of bounded values. We will also make sure that the definition of _isotropy_ appears before the term is mentioned in the abstract; precisely, we will rewrite the first sentence as "It has been recently observed that neural networks, unlike kernel methods, enjoy a reduced sample complexity when the distribution is isotropic (i.e., when the covariance matrix is the identity)." Finally, we will correct the typos that you mention.

# **Response to reviewer FnjC**

Thank you for taking the time to review our submission. We appreciate your comments on the comparison of the trace norm and the Frobenius norm. We included the section on computability for completeness; it can be moved to the appendix.

The links to neural network learning are indeed interesting. As noted in our response to reviewer WNca, our work points toward the following avenues of research. The most direct application of our results is to networks with quadratic activations or multiplicative interactions [A]. The second, more interesting, avenue is to show that the adaptivity of neural networks to the data might be due to small nuclear norm of the weight matrices. The implicit regularization effect of SGD, which, at least for linear neural networks, is argued to be a form of rank minimization, is closer to trace-norm regularization than to Frobenius-norm regularization, because trace-norm minimization is a better proxy for rank minimization. Consequently, it may be this implicit regularization that explains the data adaptivity. These speculations, however, require careful analysis, and we judged them best suited for follow-up research.

We will nonetheless add a few lines detailing the following points. First, we can describe in more detail the empirical observations that studied intrinsic dimension and its influence on generalization. Second, we can write out the derivation showing that a single hidden layer neural network with quadratic activations $x \mapsto x^2$ is exactly a non-convex parametrization of a quadratic classifier; the identity we have in mind is sketched below.
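The derivation is the following standard identity (the width $m$ and weights $a_i, w_i$ are generic notation for this sketch, not symbols from the paper):

$$
f(x) \;=\; \sum_{i=1}^{m} a_i \big(w_i^\top x\big)^2
\;=\; \sum_{i=1}^{m} a_i\, x^\top w_i w_i^\top x
\;=\; x^\top \Big(\sum_{i=1}^{m} a_i\, w_i w_i^\top\Big) x
\;=\; x^\top A x,
\qquad A := \sum_{i=1}^{m} a_i\, w_i w_i^\top,
$$

so $\mathrm{sign}(f(x))$ is exactly a quadratic classifier with matrix $A$, and the network weights $(a_i, w_i)_{i=1}^m$ form a non-convex (factored) parametrization of the symmetric matrix $A$. Conversely, the eigendecomposition of any symmetric $A$ realizes it with $m = d$ such units.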
We believe this will clarify our position that our work serves as an important first step toward theoretically characterizing why neural networks generalize well. Our main message, however, remains that a simple model can exhibit some of the phenomena observed in complicated, difficult-to-analyze settings: regularization can explain improved sample complexity over kernels.

# **References**

[A] Siddhant M. Jayakumar, Wojciech M. Czarnecki, Jacob Menick, Jonathan Schwarz, Jack Rae, Simon Osindero, Yee Whye Teh, Tim Harley, Razvan Pascanu. "Multiplicative Interactions and Where to Find Them." ICLR 2020.
