Dear Reviewer HhWA,

Thank you again for your feedback. As the deadline for discussion is approaching, we would be happy to provide any additional clarifications you may need. In our previous response, we carefully studied your comments and made updates to the revision, as summarized below:

* Provided a discussion of how we enable generalization in the absence of multiview consistency via local geometric features.
* Added a discussion of the advantage of using an implicit function compared to using the dataset directly.
* Clarified in the paper that the grid features are learned, and provided additional details on how we initialize the grid.
* Modified our notation to distinguish the original impulse response from the STFT representation of the impulse response at a specific magnitude/phase index.
* Added a visualization using the nearest-neighbor and linear interpolation baselines.
* Provided a description of the variables used in each equation, and the dimension of each variable.
* Clarified the objective used in our NeRF-only training, and the objective used when training NeRF and NAF jointly.

Please let us know if you have any remaining questions. We would be happy to do anything that would be helpful in the time remaining!

Thank you for your time!

Best,
Authors

===========================

Dear Reviewer U1r6,

Thank you again for your suggestions. As the deadline for the discussion is coming up, we would be happy to address any remaining questions. In our previous response, we carefully studied your suggestions and made updates to our revision, which we summarize below:

* We clarify the scope and goal of our work in the introduction.
* We provide additional technical details on the dimension and initialization of the grid.
* We added an experiment to the Appendix on blending the left/right latent.
* We now detail the dimension of each variable, and modify the notation to make the meaning of each variable more obvious.
* We now clarify which version of our NAF has a grid in the paper.
* We provide additional details on how the waveform can be recovered without learning the phase.
* We added more details on how we perform temporal padding of the training data.
* We added additional details on how we select the test set.
* We have clarified our use of Eqn. 8 and modified the language to clarify how we perform localization.

We would like to know if you have any additional comments or suggestions. We would be very happy to address any remaining questions in the time remaining!

Thank you for your suggestions!

Best,
Authors

=======================================

# V59y

<!-- Thank you for your comments. We are glad to see that we have addressed your main concerns on modeling phase, and appreciate your comments on our work. We are happy to address any remaining concerns with additional clarifications followed by a paper revision! -->

Thank you for your comments. We have worked hard to address your concerns, and we have incorporated your suggestions in this revision. We are happy to address any remaining concerns with additional clarifications, followed by a paper revision.

> **Comparison with simple baselines.** We have provided a comparison against two simple but strong baselines (nearest neighbor and linear interpolation); quantitative results are shown in Table 1, with qualitative results in Figure A5 of the Appendix. We demonstrate that we achieve lower error when measured in magnitude-STFT, and lower average error when measured using T60 as a metric. We are currently working on additional qualitative results for our website; this will take one or two days.
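As a point of reference for the T60 metric above, the sketch below shows one standard way to estimate T60 from a time-domain impulse response via Schroeder backward integration; the sample rate `fs`, the fit range, and the function name are illustrative assumptions rather than details from the paper.

```python
import numpy as np

def estimate_t60(ir, fs, fit_range_db=(-5.0, -25.0)):
    """Estimate T60 from a time-domain impulse response via Schroeder backward
    integration: fit the decay slope over `fit_range_db` (dB) and extrapolate
    the time needed to decay by 60 dB."""
    energy = np.asarray(ir, dtype=np.float64) ** 2
    edc = np.cumsum(energy[::-1])[::-1]            # Schroeder energy decay curve
    edc_db = 10.0 * np.log10(edc / edc[0] + 1e-12)  # normalized, in dB

    t = np.arange(len(ir)) / fs
    hi_db, lo_db = fit_range_db
    mask = (edc_db <= hi_db) & (edc_db >= lo_db)
    slope, _ = np.polyfit(t[mask], edc_db[mask], deg=1)  # dB per second (negative)
    return -60.0 / slope
```

A T60 error between a predicted and a measured response can then be taken as the absolute difference of the two estimates.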
> **Advantages of our NAFs over interpolation baselines.** A further important advantage is the compact nature of the implicit representation. On average, our NAFs use 0.5% of the storage of these interpolation baselines, as we show in Table A1 of the Appendix. The compact nature of the implicit representation means that we can encode and utilize spatial audio even when we cannot store the full amount of spatial audio data (<30 MB for NAFs versus several GBs for interpolation).

> **Limitations and future work.** For modeling only the magnitude of the STFT of a spatial impulse response, precedent can be found in the Image2Reverb (ICCV 2021) paper, which describes modeling the magnitude-only STFT (and sampling a random phase) as a way to model the spatial impulse response with high quality. We agree that the magnitude alone cannot account for all the information in the impulse response. We will clarify our goal and the limitations of our work in an upcoming revision.
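To make the magnitude-only reconstruction mentioned above concrete, here is a minimal sketch of attaching uniformly random phase to a magnitude STFT and inverting it; the STFT parameters and the use of `librosa` are assumptions for illustration, not details from either paper.

```python
import numpy as np
import librosa

def magnitude_to_ir(mag_stft, hop_length=128, win_length=512, seed=0):
    """Invert a magnitude-only STFT to a time-domain impulse response by
    sampling uniformly random phase (Image2Reverb-style reconstruction)."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(-np.pi, np.pi, size=mag_stft.shape)
    complex_stft = mag_stft * np.exp(1j * phase)
    return librosa.istft(complex_stft, hop_length=hop_length, win_length=win_length)
```

`librosa.griffinlim` is a drop-in alternative when an iteratively refined phase estimate is preferred over random phase.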
> **Consistency in paper writing.** As the first work modeling continuous spatial audio in a scene, we strive to make this framework accessible to a general audience and to inspire follow-up work in this new, emerging field. We understand your concerns, and we are happy to change the text to make our application and claims more appropriate.

Because we are no longer able to update the original paper, we **provide an updated revision on [our website](https://sites.google.com/view/nafs-iclr-2022/home).** We have modified our language in the introduction to reflect that our goal is to model plausible spatial audio, and we have modified our conclusion to discuss the limitations of our current model and potential avenues for future work.

Please feel free to let us know if you have additional comments!

<!-- Thank you for your comments. Early in the project, we reached out to the authors behind the recent SIGGRAPH papers in Microsoft's Project Triton. However, the code and data could not be released under an academic license. In their work, the primary concern is to achieve a compact encoding. Because we utilize a learned representation, our approach has a fixed storage cost regardless of scene size and complexity. As an additional baseline, we propose adding a comparison where the impulse responses undergo lossy compression with a modern audio codec (libopus) or a modern image codec (JPEG/AV1/HEVC) applied to the STFT. This would allow us to compare our approach against settings where the room acoustics are compactly encoded for VR/gaming applications. As far as we are aware, T60 error is the primary quantitative metric used to evaluate the output in recent papers on learned spatial audio (Image2Reverb, Deep Acoustics), and we provide T60 error as a metric following your feedback. We will further refine our claims in a revision to our paper. -->

# U1r6

Thank you for the response.

> **How is linear interpolation computed?** As we clarify in the caption of Table 1, we adopt a stronger baseline by interpolating in the time/waveform domain. Prior work mentions interpolation in both the time and the frequency domain as valid approaches [1]. We initially used linear interpolation in the log-magnitude STFT domain; however, after performing the interpolation in the time domain, we observe lower MSE error and lower T60 error than with interpolation in the log-magnitude STFT domain followed by Griffin-Lim phase recovery. For this reason, we believe that time-domain interpolation is the stronger baseline.
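To make the time-domain baseline concrete, the sketch below interpolates directly on waveforms; the inverse-distance weighting over the k nearest measured responses is an illustrative choice on our part and may differ from the exact interpolation scheme used in the paper.

```python
import numpy as np

def time_domain_interp(query_pos, train_positions, train_irs, k=4):
    """Baseline: interpolate impulse responses directly on their time-domain
    waveforms, using inverse-distance weights over the k nearest training points.

    train_positions: (N, D) listener coordinates
    train_irs:       (N, T) measured impulse responses, assumed time-aligned
    """
    dists = np.linalg.norm(train_positions - query_pos, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + 1e-8)
    weights /= weights.sum()
    # No phase recovery (e.g. Griffin-Lim) is needed, since the interpolation
    # operates on raw waveforms rather than log-magnitude STFTs.
    return weights @ train_irs[nearest]
```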
> **Phase and spatial audio.** For gaming/VR tasks, past spatial audio representations generally also do not model phase. In particular, Pulkki's DirAC [2], Raghuvanshi's parametric coding [3], and Image2Reverb [4] do not encode phase; they construct random-phase filters, minimum-phase filters, or sample a random phase for log-magnitude STFT reconstruction, respectively.

> **Speech dereverberation.** Like other work, we model the spatial sound by convolving a clean source with the impulse response generated by our system. We do not explore dereverberation via deconvolution in this work, and we agree that direct deconvolution using our magnitude-only representation with inferred phase would not yield high-quality dereverberation. A possible approach to producing perceptually reasonable dereverberated speech/audio would be to learn a network that produces dry speech/audio conditioned on the reverberant audio STFT and the NAF-predicted magnitude-STFT impulse response [5], instead of performing blind dereverberation. In that case, we can reconstruct the time-domain signal from the clean magnitude-spectrogram estimate using Griffin-Lim or a neural vocoder (WaveNet). Our work takes a first step toward modeling the magnitude of impulse responses, and we believe it will inspire follow-up work in this exciting direction.
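As a small illustration of the rendering step mentioned above (convolving a clean source with a generated impulse response), the following sketch assumes single-channel signals at a shared sample rate; it is not code from the paper.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_reverberant(dry, impulse_response):
    """Render audio at a listener position by convolving a dry (anechoic)
    source with the predicted room impulse response."""
    wet = fftconvolve(dry, impulse_response, mode="full")
    peak = np.max(np.abs(wet))
    return wet / peak if peak > 0 else wet  # normalize to avoid clipping
```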
> **Additional qualitative results.** Thank you for this great suggestion. We are currently working on additional qualitative results; this will take one or two days. We will provide an update once the results are posted.

Please let us know if you have any additional questions!

[1] Raghuvanshi, Nikunj, et al. "Precomputed wave simulation for real-time sound propagation of dynamic sources in complex scenes." SIGGRAPH (2010).
[2] Pulkki, Ville. "Directional audio coding in spatial sound reproduction and stereo upmixing." Audio Engineering Society Conference (2006).
[3] Raghuvanshi, Nikunj, and John Snyder. "Parametric wave field coding for precomputed sound propagation." SIGGRAPH (2014).
[4] Singh, Nikhil, et al. "Image2Reverb: Cross-Modal Reverb Impulse Response Synthesis." ICCV (2021).
[5] Han, Kun, et al. "Learning spectral mapping for speech dereverberation and denoising." TASLP (2015).

<!-- A possible approach to producing perceptually reasonable dereverberated speech/audio could be dividing the reverberant STFT magnitude by the predicted RIR frequency magnitude to obtain an estimate of the dry speech/audio magnitude [5] in the frequency domain. The time-domain signal could then be created from this magnitude-spectrogram estimate using Griffin-Lim or a neural vocoder (WaveNet). -->

=========================

# HhWA

Dear Reviewer HhWA,

We deeply appreciate your feedback. Our goal has not changed since our initial submission: to learn a continuous representation of spatial acoustics from sparse training samples. The link we provided in the original paper included many qualitative results, which we have further augmented with a direct comparison between our network and the interpolation baselines: [https://sites.google.com/view/nafs-iclr-2022](https://sites.google.com/view/nafs-iclr-2022/home)

Generalization to novel locations remains an open question for both visual and acoustic models that use implicit functions. As we show on the website, interpolation using the dataset alone cannot yield satisfactory results. We show in our paper that learning local spatial features can help in the absence of the multiview consistency available in vision. As the first work modeling continuous spatial audio in a scene, we hope to make this framework accessible to a general audience and to inspire follow-up work in this new, emerging field. We hope you will consider a more positive evaluation of our work. Honestly, we are uncomfortable with the concern about "overclaiming". It seems that all reviewers overlooked the website posted with our first draft and thus misunderstood the goal of our paper; this is clearly not our fault. We have offered all the clarifications, addressed all of the concerns, and updated the draft accordingly. We would much appreciate it if you would reconsider and move your score to the positive side.

Thanks for your time!

Best,
Authors

# HhWA v2

Dear Reviewer HhWA,

We deeply appreciate your feedback, and are grateful that you consider our work to be interesting. Our goal has never changed since our initial submission: to learn a continuous representation of spatial acoustics from sparse training samples. The link we provided in the original paper included many qualitative results, which we have further augmented with a direct comparison between our network and the interpolation baselines: [https://sites.google.com/view/nafs-iclr-2022](https://sites.google.com/view/nafs-iclr-2022/home)

As we show on the website, interpolation using the dataset alone cannot yield satisfactory results. We respectfully push back on the concern of "overclaiming": we **never** claim that we can generalize to unseen scenes. In the context of neural implicit representations, the goal is usually to learn a representation that generalizes to unseen locations/views within a *single scene*, after training on sparse observations of that same scene. In the absence of the multiview consistency used in vision, we demonstrate that learning local spatial features is an applicable alternative for spatial acoustics. In our very first draft, we included a website with qualitative results that demonstrate unequivocally that our networks can continuously infer the spatial audio at unseen locations.

We have further offered the requested clarifications, addressed the concerns, and updated the revision accordingly. As the first work modeling continuous spatial audio in a scene, we hope to make this framework sufficiently general for a broad audience and to inspire follow-up work in this new, emerging field. We hope you will consider a more positive evaluation of our work.

Thank you for your time!

Best,
Authors
