# ICLR Rebuttal: MoCA <!-- example refer to https://hackmd.io/Fg4NScxuT4WBUzEno05obg -->

# General response

We would like to thank all reviewers for their feedback and constructive suggestions. We are very encouraged by the reviewers' assessment that our work is interesting ("timely and interesting" (JpF6), "concept is interesting" (4n7p), "novel and significant contributions" (9zZN)). We have taken the reviewers' feedback into account and provide an updated revision with a detailed response to each reviewer's suggestions. In the revised manuscript, we have highlighted our changes in red.

**Below we summarize our updates to the manuscript.**

* To avoid naming confusion, we rename the previously defined "prototype semantic cell" to "semantic cell" and the previously defined "prototype component cell" to "prototype cell".
* We rephrase the introduction to clarify the connection between our proposed method and neuroscience. In essence, our model is inspired by the "grandmother cell" hypothesis, and we explore the utility of prototype cells as memory priors in a standard computer vision image generation task, seeking to provide potential insights into the functional advantages of having these "grandmother" neurons in the visual cortex.
* We add comparisons with other works that use prototype learning and memory banks in the related work section and highlight what distinguishes our work.
* We modify Figure 1 and Figure 2 to clarify the three projection heads $\theta(\cdot)$, $\phi(\cdot)$ and $\psi(\cdot)$ and how they are used in our model.
* Visualization results are added as new Figure 5 and Figure 6 in the main text to provide further insight into the semantic meanings of the different clusters and prototypes formed in MoCA. The corresponding discussion is updated in Section 4.3.
* To verify that MoCA does not simply memorize the training samples, we visualize the top-3 closest training images for the generated images in Appendix 8.8.
* We move the original analysis of MoCA's prototypes (original Figure 5, Figure 6 and the subsection "Importance of memory organization") to Appendix 8.1 and Appendix 8.6.

# Response to JpF6

We thank the reviewer for the insightful comments. Below we address the reviewer's concerns.

**(Connection to neuroscience)** We thank the reviewer for the cautionary note about the connection of our work to neuroscience. In fact, all three reviewers raised this issue, but all agreed that it is not a major concern, as we did not claim this is a neural model, only neural-inspired, and we agree that the paper should be judged as a computer vision paper. We will revise the paper to clarify this point. That said, our inquiry was genuinely triggered by the finding of the super-sparse code of neurons in the superficial layer of V1. The question we asked was: what could be the computational advantage, for the feedforward (analysis) / feedback (synthesis) paths in the hierarchical visual system, of having neurons with such high response sparsity and selectivity? We used a computer vision task to explore this question at a functional level. We did not claim that our model is an explicit neural model with a serious correspondence at the architectural or cellular level. However, we still believe our work is relevant to neuroscience at a functional level. From our perspective, CNNs and GANs are relevant for understanding feedforward and feedback computation, respectively, in the hierarchical visual cortex.
The conceptual frameworks of analysis-by-synthesis and predictive coding are popular among cognitive and theoretical neuroscientists, and the study of GANs and image generation is meaningful and potentially relevant for understanding the function of recurrent feedback. We toned down our abstract and revised the introduction to say: "Although our work is inspired by neurophysiological findings, we do not claim that the proposed MoCA module is a neural model. Rather, our goal is to explore the utility of prototype cells as memory priors in a standard computer vision image generation task, which might provide potential insights into the advantages of having these "grandmother" neurons in the visual cortex at a functional level."

**(Parameter Inflation)** We thank the reviewer for bringing up this question. We address this concern with results already documented in the original paper as well as new experiments conducted during the rebuttal period. First, we do include a fair comparison with the baseline in our manuscript. Specifically, MoCA is an architectural layer with two paths: one is conventional Self-Attention (SA); the other is the newly proposed prototype memory attention path. Note that the projection heads ($\theta(\cdot)$, $\phi(\cdot)$, $\psi(\cdot)$) of the prototype memory path are shared with Self-Attention, so compared with using self-attention alone there is no **learnable** parameter inflation (learnable in the sense of gradient descent). Yet Table 3 in the main text shows MoCA's improvement over Self-Attention. Compared with the original FastGAN and StyleGAN, our method increases the number of **learnable** parameters by only a tiny fraction (about 0.1% on the FastGAN baseline and 0.6% on StyleGAN). Yet the addition of a fixed-parameter MoCA layer provides a 5%~20% performance improvement (measured by FID; see Section 4.1 of the main text). Furthermore, to fully control for the number of parameters, we added "feature" parameters to a FastGAN layer so that it has the same number of parameters as MoCA + FastGAN. We found that the resulting improvement was minuscule (the best FID improves by less than 1% and overlaps with the normal-size model most of the time). Thus, the performance improvement observed with MoCA cannot be explained by parameter count alone.

**(Memorizing examples from the training set)** Does MoCA simply remember example images for few-shot image generation? We provide further justification and empirical analysis in the Appendix. In theory, MoCA is a module that can be installed in any layer. In our current implementation, we place MoCA in an intermediate layer, so it only caches parts-level activations, analogous to visual concepts, but in a remapped space. The prototypes are clusters over the entire dataset, gathering part-level prototypes that can potentially serve as reconfigurable parts for synthesizing novel images in a compositional framework. Such parts are useful for few-shot image generation because the training data is limited, so reconfiguring existing parts can ease the training process. However, in contrast to the And-Or Image Grammar, our reconfiguration is softer, implemented with an attention mechanism (a simplified sketch of the two-path layer is given below).
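To make the two-path design described above concrete, the following much-simplified PyTorch sketch shows a layer with a self-attention path and a prototype-memory path that reuse the same projection heads. It is an illustration written for this response, not our released implementation; the module and buffer names and the memory initialization are schematic.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoPathAttention(nn.Module):
    """Self-attention plus a prototype-memory attention path sharing the same heads."""

    def __init__(self, channels: int, inner: int, num_prototypes: int):
        super().__init__()
        # Shared 1x1 projection heads: theta (query), phi (key), psi (value).
        self.theta = nn.Conv2d(channels, inner, kernel_size=1)
        self.phi = nn.Conv2d(channels, inner, kernel_size=1)
        self.psi = nn.Conv2d(channels, inner, kernel_size=1)
        self.out = nn.Conv2d(inner, channels, kernel_size=1)
        # Prototype memory: filled by clustering / momentum updates, not by gradients,
        # hence registered as buffers rather than nn.Parameters.
        self.register_buffer("proto_keys", torch.randn(num_prototypes, inner))
        self.register_buffer("proto_values", torch.randn(num_prototypes, inner))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (b, hw, inner)
        k = self.phi(x).flatten(2)                     # (b, inner, hw)
        v = self.psi(x).flatten(2).transpose(1, 2)     # (b, hw, inner)

        # Path 1: conventional self-attention over spatial locations.
        sa = F.softmax(q @ k, dim=-1) @ v              # (b, hw, inner)

        # Path 2: the same queries attend over the cached prototype memory,
        # softly combining prototype values as a memory prior.
        mem = F.softmax(q @ self.proto_keys.t(), dim=-1) @ self.proto_values

        y = (sa + mem).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                         # residual modulation
```

Because `proto_keys` and `proto_values` are registered as buffers rather than `nn.Parameter`s, the memory path adds no gradient-trained weights beyond the heads already used by self-attention, which is the sense in which there is no learnable parameter inflation relative to SA alone.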
Empirically, we compare generated images with their top-3 most similar training images (based on pretrained VGG features and cosine similarity) on the Obama, MSCOCO-300, and Animal Face Dog datasets to demonstrate that our MoCA-trained model does not memorize the training set (see **Appendix Section 8.8** for examples and further discussion). Qualitatively, although the generated images bear some similarity to their nearest neighbors, they are **not** simply memorized copies of the training images.

**(Concerns about MoCA cluster visualization)** To facilitate further understanding of the MoCA system, we update the manuscript with two additional analyses. They answer the questions: (i) which semantic cluster is used during the attention process at different locations of the image, and does it carry real semantics? (ii) what is the relationship between the prototype cells and their corresponding clusters? We provide detailed results and discussion in Section 4.3. Briefly, the results suggest that **(1)** semantic clusters carry semantic meaning, and different semantic clusters bias different regions of the image according to those semantics; **(2)** prototype cells are specialized sub-parts of their parent semantic concept clusters, suggesting MoCA may facilitate a hierarchical compositional system in the image synthesis path.

To make this concrete, consider a layer in which the prototype component cells encode face parts such as eyes, noses, and mouths. The prototype semantic cell will be coding "face-ness." The self-attention 1x1 projection serves to map eyes, noses, and mouths to proximal locations in a low-dimensional transformed space, where these related concepts are clustered together and can be represented by their cluster center, the "face-ness" semantic cell. Note that this "face-ness" cell has the spatial scope of a single hyper-column and differs from the neurons in the preceding (upstream) layer, which, with a larger receptive or projective field, represent the entire face with all the face parts in an appropriate spatial configuration. The prototype semantic cell is thus more like a circuit switch or grandmother selector. It represents an abstract semantic meaning like face-ness and differs from visual concepts or prototypes; the semantic cell itself does not carry all the fine detail needed to provide the appropriate modulation. We found that the subset of grandmother cells turned on (or not turned off) by the prototype semantic cell must work together to provide the appropriate contextual modulation.
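For reference, the nearest-neighbor check described above (Appendix 8.8) can be sketched as follows, assuming a torchvision-pretrained VGG-16 as the feature extractor and unit-normalized features so that a dot product equals cosine similarity; the helper names are illustrative, not from our code base.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Pretrained VGG-16 convolutional features as a generic perceptual embedding.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()

@torch.no_grad()
def vgg_features(images: torch.Tensor) -> torch.Tensor:
    """images: (n, 3, 224, 224), ImageNet-normalized; returns unit-norm features."""
    feats = vgg(images).flatten(1)      # (n, d)
    return F.normalize(feats, dim=1)

@torch.no_grad()
def top3_training_neighbors(generated: torch.Tensor, training: torch.Tensor):
    """For each generated image, return the similarities and indices of the
    three most similar training images under cosine similarity."""
    sims = vgg_features(generated) @ vgg_features(training).t()   # (n_gen, n_train)
    return sims.topk(3, dim=1)
```

If the generator were simply memorizing, the top-1 neighbor would be a near-duplicate for most samples; the qualitative comparison in Appendix 8.8 suggests this is not the case.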
# Further Response to JpF6

Dear Reviewer JpF6,

Thank you again for your feedback. As the deadline for discussion is approaching, we would be happy to provide any additional clarifications that you may need. We have carefully studied your comments and made updates to the revision as summarized below:

- Provided a discussion of how our work relates to neuroscience and revised the manuscript to clarify that our work is not a neural model and that MoCA corresponds with neuroscience only at a functional level.
- Provided a discussion based on existing ablations and additional results showing that MoCA's performance improvements are due to the proposed mechanism itself rather than parameter inflation.
- Provided the top-3 nearest neighbors of the generated images to demonstrate that MoCA is not merely memorizing the training set but facilitates generalization by re-configuring parts-level information.
- Added more interpretable visualizations to demonstrate how MoCA works: it caches parts-level information, which is then used to softly bias the corresponding activations.

Please let us know if you have any remaining questions. We would be happy to do anything that would be helpful in the time remaining. Thank you for your time!

Best,
Authors

# Response to 4n7p

We thank the reviewer for the constructive feedback. Below we address the concerns.

**(Concerns about grandmother cells in neuroscience)** We thank the reviewer for the advice. We made this clear in the abstract and the introduction of the updated manuscript as follows: "We called these highly selective sparse-responding feature detectors 'grandmother neurons' to highlight their possible explicit encoding of prototypes, even though in reality, a prototype is likely represented by a sparse cluster of neurons in the brain. Neurons in different layers of each visual area exhibit different degrees of response sparseness, complementing one another in various functions." It is worth noting that even in our model there is a variety of neural codes, ranging from distributed to sparse to "grandmotherly." The codes of the neurons in the feature layers of the GAN, for example, are quite distributed. Distributed codes provide greater flexibility and power in analysis and synthesis. Further, while a "grandmother cell" in MoCA might code an explicit prototype memory or visual concept, multiple grandmother cells in the same semantic cluster, not just one, are activated by any particular input, and their codes are combined (weighted by a softmax) to generate the contextual modulation needed for the activation patterns in downstream processing (a toy sketch of this cluster-gated combination is given after this response section).

**(Comparison with existing ideas on memory banks and prototype learning)** We appreciate the reviewer's suggestion. We add a comparison with memory bank ideas from the literature under the related work section titled "Prototype Memory Mechanism". Indeed, the memory bank is not a completely new idea. However, most existing work uses memory banks at the instance level [1], whereas our method uses the memory bank at the intermediate-representation (parts) level. By keeping the memory bank at the parts level, our system naturally incorporates the ideas of a compositional system and reconfigurable parts. We show that this is particularly valuable for few-shot image generation. In addition, we discuss our differences from existing prototype ideas such as [2] under the related work section titled "Few Shot Prototypes Learning" in the updated manuscript. The main differences lie in the level at which prototypes are formed and how the prototypes are used. The prototypes in [2] are learned as representatives of different classes at the instance level, whereas our prototypes exist at the intermediate feature (parts) level. Moreover, our method uses an attention mechanism to softly bias the intermediate features with the cached prototypes during inference, while [2] selects the closest prototype in a discrete manner to perform category prediction at test time. Although both involve forming prototypes and using them during inference, our attention mechanism can smoothly modify the intermediate representation and can therefore be applied to more challenging tasks such as few-shot image generation.
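As a toy illustration of the softmax-weighted combination of prototype (grandmother) cells within a semantic cluster discussed above, the following sketch treats the semantic cell as a hard cluster switch followed by a soft combination of that cluster's prototype cells; all names are illustrative assumptions, and this is not our actual implementation (which applies this idea per spatial location inside the attention layer).

```python
import torch
import torch.nn.functional as F

def cluster_gated_modulation(query: torch.Tensor,
                             cluster_centers: torch.Tensor,
                             proto_keys: torch.Tensor,
                             proto_values: torch.Tensor,
                             cluster_id: torch.Tensor) -> torch.Tensor:
    """query: (d,); cluster_centers: (C, d); proto_keys, proto_values: (P, d);
    cluster_id: (P,) assigning each prototype cell to a semantic cluster.
    The semantic cell acts as a switch: pick the best-matching cluster, then
    softly combine only that cluster's prototype cells."""
    c = (cluster_centers @ query).argmax()              # semantic-cell (cluster) selection
    mask = cluster_id == c
    w = F.softmax(proto_keys[mask] @ query, dim=0)      # softmax over the cluster's cells
    return w @ proto_values[mask]                       # combined contextual modulation
```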
**(Neuroscience correspondence, the neural analog of the semantic prototype cell, and why not use Ki for the prototype memory)** Our inquiry was inspired by the finding of the super-sparse code of neurons in the superficial layer of V1. We wish to identify the computational advantages of having neurons with such high response sparsity and selectivity in the feedforward (analysis) / feedback (synthesis) paths of the hierarchical visual system, and we used a computer vision task to explore this question at a functional level. Note that we do not claim our work is a neural model that closely follows the biological details; it is rather an exploration of the potential computational roles of these recently found "grandmotherly" neurons in a hierarchical visual system. The benefit MoCA brings to the computer vision task can potentially hint at answers to our questions about the computational advantages of such neurons. Overall, our model is a computer vision model designed with some neuroscience properties in mind. We clarify the connection and its limitations in the revised manuscript (abstract, introduction, and conclusion).

As to the questions "What do the prototype semantic cells and the prototype component cells correspond to in the neocortex? Is there a correspondence, or is it just for network design?": in our mind, the prototype component cells correspond to the super-sparse cells found in a superficial layer of V1 that motivated our study. The prototype semantic cells, if you allow us to speculate, might be a type of inhibitory neuron (SOM) that performs circuit switching to select a certain subset of prototype cells to participate in contextual modulation. The prototype semantic cell represents more abstract information and does not carry all the fine detail needed to provide the appropriate contextual modulation; it is responsible for selecting an appropriate set of prototype cells to participate in the attention process, and this group of prototype (grandmother) cells works together to provide the appropriate contextual modulation.

**(Questions about $\theta$, $\phi$ and $\psi$)** We appreciate the reviewer's question. We clarify in the updated main architecture figure (Figure 1) that our proposed method contains two paths: one is Memory Concept Attention (MoCA) and the other is Self-Attention (SA). MoCA and SA share the same projection heads. The use of three heads follows the convention in the Self-Attention / non-local network literature: $\theta$ serves as the query; $\tilde{\phi}$ serves as the key in MoCA ($\phi$ is the key module in SA, and $\tilde{\phi}$ is its momentum-updated version), through which we cache $\tilde{\phi}(A_{ij})$ into the memory so that, for every query, we can retrieve a group of related keys from the memory and bias the query; and $\psi$ corresponds to the 'value' in the SA literature. We have updated Figure 1 to show the network architecture more clearly. We did not intend to use the capitalized $\Theta$, $\Phi$, and $\Psi$ and have corrected them in the updated manuscript.
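To illustrate how $\tilde{\phi}$ relates to $\phi$ and how keys enter the memory, here is a minimal sketch following the MoCo-style momentum convention of [1]; the ring-buffer write policy and all names are simplifying assumptions for illustration (in MoCA the cached keys are further organized into clusters), not our actual code.

```python
import torch

@torch.no_grad()
def momentum_update(phi: torch.nn.Module, phi_tilde: torch.nn.Module, m: float = 0.999):
    """phi: gradient-trained key head; phi_tilde: its slowly moving EMA copy."""
    for p, p_t in zip(phi.parameters(), phi_tilde.parameters()):
        p_t.data.mul_(m).add_(p.data, alpha=1.0 - m)

@torch.no_grad()
def cache_keys(phi_tilde: torch.nn.Module, activations: torch.Tensor,
               memory: torch.Tensor, ptr: int) -> int:
    """Project parts-level activations A_ij with the momentum head phi_tilde and
    write them into the prototype memory, here modeled as a simple ring buffer."""
    keys = phi_tilde(activations).flatten(2).transpose(1, 2).reshape(-1, memory.shape[1])
    n = keys.shape[0]
    idx = (ptr + torch.arange(n, device=memory.device)) % memory.shape[0]
    memory[idx] = keys
    return int((ptr + n) % memory.shape[0])
```

At inference time, the queries $\theta(x)$ attend over these cached keys to bias the activations, as in the two-path layer sketch above.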
[1] He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2] Snell, J., Swersky, K., & Zemel, R. S. (2017). Prototypical networks for few-shot learning. arXiv preprint arXiv:1703.05175.

# Further Response to 4n7p

Dear Reviewer 4n7p,

Thank you again for your feedback. As the deadline for discussion is approaching, we would be happy to provide any additional clarifications that you may need. We have carefully studied your comments and made updates to the revision as summarized below:

- Revised the manuscript to clarify our connection to neuroscience and provided a discussion of the correspondence with grandmother cells, the role of semantic cells, and the rationale for choosing a prototype memory.
- Provided a further comparison with existing works that utilize memory bank and prototype ideas.
- Clarified the use of the projection heads $\theta$, $\phi$ and $\psi$, and updated the main architecture figure to make it clearer.

Please let us know if you have any remaining questions. We would be happy to do anything that would be helpful in the time remaining. Thank you for your time!

Best,
Authors

# Response to 9zZN

We thank the reviewer for the very positive comments. We hope that the updates we added to the paper have further strengthened it. Below we address your concerns.

**(Concerns about baseline numbers)** We appreciate the reviewer's concern. FastGAN and StyleGANv2 were chosen because, in terms of architecture design, they are the strongest image generation models for unconditional few-shot image generation (FastGAN) and for image generation in general (StyleGANv2). We also include another baseline model (LSGAN) to further test the generalization of MoCA. Although the LSGAN backbone is weaker than FastGAN and StyleGANv2, we still observe that MoCA improves upon the baseline model by a noticeable margin (see Appendix 8.2 for details).

**(Concerns about neuroscience correspondence)** We thank the reviewer for sharing this concern. Our work was inspired by the neuroscience findings, but the correspondence is at best at a functional level. We do not intend to claim that this is a neural model, nor did we construct the architecture from neural evidence. Our work is rather a computer vision method designed with neuroscience questions in mind. Although the exact biological correspondence may not be obvious at the moment, the performance improvement in the image generation task could spark ideas about the potential computational roles of these sparse complex feature detectors in hierarchical compositional visual systems. We have updated our manuscript to make the connection and its limitations clearer.
