# ICLR'22 DiffSkill rebuttal

## General Response

We would like to thank the reviewers for their thoughtful feedback. We are glad to see that reviewers generally appreciated our paper: the rationality and novelty of our method (reviewers H1Qf, DHQA, wVGi, ifUw), the difficulty of the tasks our method can solve (reviewers H1Qf, DHQA, wVGi, ifUw), the contribution to the robotics community (reviewers H1Qf, DHQA), the generality of our method across different tasks (reviewers wVGi, H1Qf), and the clear presentation (reviewers H1Qf, DHQA).

In addition to the responses to individual reviewers, we summarize the added experiments and revisions to the paper here:

**[New experiments and discussion]**

- Comparison with an additional model-free RL baseline, i.e., SAC (Haarnoja et al., 2018), in Table 1.
- Comparison of DiffSkill with the Traj Opt and RL baselines on single-tool tasks in Appendix B. (Reviewer ifUw)
- Comparison of pre-training the VAE versus training the VAE jointly with the other losses in Appendix C. (Reviewer ifUw)
- Our website (https://sites.google.com/view/iclr2022diffskill) is updated with visualizations of the baseline methods. (Reviewers H1Qf, ifUw)
- Added a discussion of the limitations and future directions of our work in the conclusion section. (Reviewers H1Qf, wVGi)

We hope our responses have convincingly addressed all reviewers' concerns. We thank all reviewers for their time and effort! Please don't hesitate to let us know of any additional comments on the manuscript or the changes.

## R4 (Reviewer ifUw)

Thank you for your detailed feedback. We are glad that you find our method "inspirational". Below, we address each of your concerns in the weakness section with new experiments and clarifications.

> Is a skill dependent on the tool alone or the observations together as well? Specifically, if you have different start/end positions of the same short-horizon task using only one tool, are they considered the same skill?

We define separate skills based on the tool alone and not the observations. In our experiments, a single skill is applied to different observations. For example, in the LiftSpread task, the skill of using the rolling pin can be applied either when the dough is on the right or when the dough is on the cutting board. The same skill can also generalize to different environment configurations where the shape of the dough or the start/end positions of the tool are different.

> Why not training the VAE alone? Generally speaking, this module should be standalone.

Thank you for the great suggestion. We have performed an additional experiment comparing two approaches: one is our method in the initial submission, which trains the VAE jointly with the other modules; the other is to train the VAE first and then freeze the VAE encoder when training the other modules. The planning performance on the different tasks is shown below (Normalized Performance / Success Rate):

| | LiftSpread | GatherTransport | CutRearrange |
| ---------- | ----------- | ----------- | ----------- |
| Joint Training | 0.450 / 100% | 0.663 / 60% | 0.367 / 20% |
| Pre-train VAE | 0.438 / 100% | 0.654 / 60% | 0.450 / 40% |

We find that pre-training the VAE provides a slight performance gain, although it takes longer to train overall since we need to train the VAE first. We will include this comparison in the final version of our paper.
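To make the two training schemes concrete, here is a minimal PyTorch-style sketch. The `vae`, `feas_head`, and `reward_head` objects, their `.loss` interfaces, and all hyperparameters are hypothetical placeholders, not our actual implementation:

```python
# Minimal sketch (PyTorch-style) of the two training schemes compared above.
# Module interfaces and hyperparameters are hypothetical placeholders.
import torch

def train_joint(vae, feas_head, reward_head, loader, epochs=10, lr=1e-4):
    """Joint training: one optimizer over all modules, one combined loss."""
    params = list(vae.parameters()) + list(feas_head.parameters()) + list(reward_head.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for batch in loader:
            z, recon_loss, kl_loss = vae(batch["obs"])           # shared encoder
            loss = (recon_loss + kl_loss
                    + feas_head.loss(z, batch)                    # feasibility loss
                    + reward_head.loss(z, batch))                 # reward loss
            opt.zero_grad(); loss.backward(); opt.step()

def train_pretrained(vae, feas_head, reward_head, loader, epochs=10, lr=1e-4):
    """Pre-train the VAE alone, then freeze its encoder for the other heads."""
    vae_opt = torch.optim.Adam(vae.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in loader:
            _, recon_loss, kl_loss = vae(batch["obs"])
            vae_opt.zero_grad(); (recon_loss + kl_loss).backward(); vae_opt.step()
    for p in vae.parameters():                                    # freeze the VAE
        p.requires_grad_(False)
    head_opt = torch.optim.Adam(list(feas_head.parameters()) + list(reward_head.parameters()), lr=lr)
    for _ in range(epochs):
        for batch in loader:
            with torch.no_grad():
                z, _, _ = vae(batch["obs"])                       # frozen encoder
            loss = feas_head.loss(z, batch) + reward_head.loss(z, batch)
            head_opt.zero_grad(); loss.backward(); head_opt.step()
```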
> How do you determine H in each long-horizon task? How does different H in one task impact the performance?

Currently, H is manually specified according to how many stages each task takes. For example, the LiftSpread task requires a two-stage execution of first lifting up the dough and then spreading it. Using a smaller H leads to complete failure of the task, since these challenging tasks inherently require multiple stages of execution with different tools. On the other hand, increasing H unnecessarily increases the difficulty of planning, and indeed we observe a decrease in performance with a larger H. Planning over more steps would be an interesting direction for future work. We have added a discussion on this point in the conclusion section.

> The method exhaustively search over all combinations of different skills in each small step. Does it mean that in some specific configurations where H and num_skills are large, the proposed method can actually be even slower than the trajectory optimization provided by differentiable physics? I would like to see more discussions or experiments regarding this matter.

First, we want to clarify the differences between our method (DiffSkill) and gradient-based trajectory optimization (GBTO): GBTO requires the full state information in the simulator, while DiffSkill directly takes RGBD images as input, so DiffSkill can be applied in the more general case where the full state is not known. This is important because estimating the full state in the real world can be very challenging. Additionally, as shown in Table 1, DiffSkill is able to solve the long-horizon tasks while GBTO cannot. For these two reasons, merely comparing the computation speed of DiffSkill and GBTO does not tell the full story. Nevertheless, we conducted experiments on LiftSpread (with two skills); the planning time for different plan steps is shown below.

| Plan step | 1 | 2 | 3 | 4 |
| ---------- | ----------- | ----------- | ----------- | ----------- |
| Planning time (s) | 11 | 31 | 107 | 223 |

The planning time for DiffSkill does grow exponentially as H increases (on the other hand, GBTO cannot solve this task at all). We have included this point in the limitation section. A potential future direction is to incorporate a policy or value function for more efficient planning. To reiterate, even though GBTO could theoretically be faster for a sufficiently large H, GBTO does not solve the discrete planning problem of which tool to use and thus would not solve any long-horizon multi-tool task directly.

> How did DiffSkill perform on single tool experiments?

We have conducted experiments comparing DiffSkill with the RL baseline on single-tool tasks in Appendix B, and we found that DiffSkill can robustly complete these tasks. Additionally, the RL baseline we compare to can also solve the easier task of lifting.

> Why Trajectory Opt has 0.544 improvement and 0% success rate in 'Tool A Only' while the numbers changed to 0.385/20% in 'Multi-Tool'? Aren't they positively correlated?

You are correct; thank you for pointing this out! This specific entry for Trajectory Opt with Multi-tool on the GatherTransport task has a typo, and the score should be 0.503. We have checked the rest of the entries carefully and there are no other typos in our results. We have updated this entry in the latest version of our paper. For completeness, we paste here the raw performance of Trajectory Opt using either Tool A only or Multi-tool for each of the 5 trials. Multi-tool has a 20% success rate as its performance surpasses the threshold of 0.65 on trial 5.
| | trial 1 | trial 2 | trial 3 | trial 4 | trial 5 | average |
| ---------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
| Tool A only | 0.4927 | 0.5312 | 0.564 | 0.5046 | 0.6255 | 0.544 |
| Multi-tool | 0.4197 | 0.4357 | 0.4405 | 0.5679 | 0.6535 | 0.503 |

> It seems that all previous baselines somewhat fails in the experiments designed. I wonder how they would perform in simpler ones where they actually have achieved something meaningful (i.e. success rate>0), and how the proposed method would compare in these examples.

All baselines fail due to the challenges presented by the long-horizon manipulation tasks. The updated videos on our website may help illustrate how the baselines fail. To make our comparison with RL baselines more comprehensive, we compare with an additional model-free RL baseline on our multi-stage tasks: Soft Actor-Critic (SAC) [1]. The results are updated in Table 1 of the revised paper. We can see that SAC performs better than TD3 but is still much worse than DiffSkill. Videos of how the SAC baseline fails can be found on our website: SAC performs reasonable actions but gets stuck in local optima due to the long-horizon nature of the task. To further demonstrate the correctness of our SAC implementation, we compare with SAC on single-tool, single-stage tasks in Appendix B, where SAC solves the Lift task very well. Videos for the single-tool tasks are also updated on our website.

[1] Haarnoja, Tuomas, et al. "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor." International conference on machine learning. PMLR, 2018.

**We hope that our response has addressed your concerns and turns your assessment to the positive side.**

*If you have any additional questions, please feel free to let us know during the rebuttal window.*

Best,
Authors

## R3 (Reviewer wVGi)

Thank you for your helpful feedback. We address each of your concerns below:

> Correct the typos In Equation 2, should it be maximizing ? and should C(k, z) be negative or positive?

Thank you for pointing out this typo. We have corrected Equation 2 in the updated version of the paper.

> When optimizing zi in equation (2), is it efficient/correct to first treat the problem as unconstrained and then project variables back to the constraint set? Can you directly solve it as a constrained optimization?

- Projected gradient descent (PGD) is a common approach for solving constrained optimization. When the objective function is convex and $\beta$-smooth on a constraint set that is also convex, PGD converges to the optimal solution (correctness) and has the same convergence rate as in the unconstrained case. We refer to Sections 3.1, 3.2 and Theorem 3.7 of the book by Bubeck (2015): http://sbubeck.com/Bubeck15.pdf.
- We use PGD instead of other constrained optimization methods because projection onto an l2-constrained set is simple and straightforward.
- Additionally, we note that PGD has also been used with neural network functions in other works, such as in the case of adversarial attacks [1].

[1] Madry, Aleksander, et al. "Towards deep learning models resistant to adversarial attacks." ICLR 2018.
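To make the projection step concrete, here is a minimal sketch of the planning loop: exhaustive search over the discrete skill sequences (as discussed in our response to Reviewer ifUw above), combined with gradient steps on the latent subgoals where each update is projected back onto an l2 ball. The function `plan_score` is a hypothetical differentiable stand-in for the feasibility and reward terms of Eqn. 2; this is a sketch, not our actual implementation.

```python
# Minimal sketch: exhaustive search over skill sequences plus projected
# gradient ascent on the latent subgoals. `plan_score` is a hypothetical
# differentiable stand-in for the feasibility and reward predictors.
import itertools
import torch

def project_l2(z, radius=1.0):
    """Project each latent vector back onto an l2 ball of the given radius."""
    norm = z.norm(dim=-1, keepdim=True).clamp(min=1e-8)
    return torch.where(norm > radius, z * (radius / norm), z)

def plan(plan_score, num_skills, horizon, latent_dim, steps=100, lr=0.1):
    best_seq, best_z, best_val = None, None, -float("inf")
    # Exhaustive search: num_skills ** horizon discrete skill sequences.
    for seq in itertools.product(range(num_skills), repeat=horizon):
        z = torch.randn(horizon, latent_dim, requires_grad=True)
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            loss = -plan_score(seq, z)        # maximize the objective of Eqn. 2
            opt.zero_grad(); loss.backward(); opt.step()
            with torch.no_grad():
                z.copy_(project_l2(z))        # projection step of PGD
        val = plan_score(seq, z).item()
        if val > best_val:
            best_seq, best_z, best_val = seq, z.detach(), val
    return best_seq, best_z
```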
> What could be the possible limitation and future work of this method.

We have updated the conclusion section of our paper with a discussion of the limitations and future work. We paste it below:

- There are a few interesting directions for future work. First, DiffSkill currently uses exhaustive search for planning over the discrete space. As the planning horizon and the number of skills grow larger for more complex tasks, exhaustive search quickly becomes intractable. A potential solution is to incorporate a policy or value function for more efficient planning. Second, similar to many other data-driven methods, while the neural skill abstraction gives good prediction results in scenarios where a lot of data is available, it performs worse when tested on situations that differ more from training. This can be remedied either by collecting more diverse training data in simulation, for example under an online reinforcement learning framework, or by using a more structured representation beyond RGBD images, such as an object-centric representation. Third, we hope to extend our current results to the real world by using a more transferable representation, such as just a depth map or a point-cloud representation. Finally, we hope to see DiffSkill applied to other similar tasks, such as those related to cloth manipulation.

*We sincerely appreciate your comments. Please let us know if you have further feedback.*

Best,
Authors

## R2 (Reviewer H1Qf)

Thank you for your insightful comments and for finding our work "novel" and "beneficial for the robotics community". We address each of your concerns below.

> For the optimization problem in (2), shouldn't you maximize the cost function C(k,z) instead of minimizing it?

Thank you for pointing out this typo. We have corrected it in Eqn. 2 in the revised version of the paper.

> A video comparison of the resulting policies to the chosen baselines would be nice (RL, BC and trajectory optimizer). This could illustrate where the other methods fail.

Please see our updated website for videos of the baselines. The baselines are unable to complete the task as they get stuck in local optima.

> What are the possible limitations of this approach in practice?

We have updated the conclusion section of our paper with a discussion of the limitations and future work. We also paste it below:

- There are a few interesting directions for future work. First, DiffSkill currently uses exhaustive search for planning over the discrete space. As the planning horizon and the number of skills grow larger for more complex tasks, exhaustive search quickly becomes infeasible. An interesting direction is to incorporate a heuristic policy or value function for more efficient planning. Second, similar to many other data-driven methods, while the neural skill abstraction gives good prediction results in regions where a lot of data is available, it performs worse when tested on situations that differ more from training. This can be remedied either by collecting more diverse training data in simulation, for example under an online reinforcement learning framework, or by using a more structured representation beyond RGBD images, such as an object-centric representation. Third, we hope to extend our current results to the real world by using a more transferable representation, such as just a depth map or a point-cloud representation. Finally, we hope to see DiffSkill applied to other similar tasks, such as those related to cloth manipulation.

> Is the goal representation as an RGB-D image practical for example for real robot applications?

It is difficult to directly transfer our current RGB-D models to the real robot, since our simulator is not photorealistic and we do not render any robot arm.
There are two potential directions for better sim2real transfer. The first is to make the rendering in the simulator more photorealistic, or to use domain randomization [1]. The second, as discussed in the conclusion section, is to use a more transferable representation, such as just a depth map or a point-cloud representation [2].

[1] Tobin, Josh, et al. "Domain randomization for transferring deep neural networks from simulation to the real world." IROS 2017.

[2] Lin, Xingyu, et al. "Learning Visible Connectivity Dynamics for Cloth Smoothing." CoRL 2021.

> How critical is resetting tools to initial poses at the end of each skill policy execution?

We apply this reset procedure because it is easy to do both in simulation and on real robots, and it generally simplifies the planning problem. Without the reset procedure, the planner would also need to reason about collisions of the tools with each other and plan extra steps to avoid such collisions.

*We sincerely appreciate your comments. Please let us know if you have further feedback.*

Best,
Authors

## R1 (Reviewer DHQA)

Thank you for your positive feedback. We are glad to hear that you find our work to be "a great contribution to the robotic manipulation literature". We address each of your concerns below.

> W.1 Maybe this is just me but the writing could be a bit more polished for accessibility. It took me 2 reads for it to "click" that when you write "neural skill abstractor" in the abstract and intro, what you mean is that you're learning image-based control from a state-based ("expert") policy with the purpose of applying this in an image-based setting. I think just adding a sentence or two to explain this or create more of an intuition why we need to learn this would go a long way. The same is true imho for the fact that you're discovering intermediate goals via searching over z.

Thank you for the writing suggestion. We have updated the corresponding text in the abstract and introduction in our revised version, highlighted in red.

> W.2 There's a couple typos in the main body of the text but I'm sure you can fix those. Also in the first sentence, calling elder care "deformable object manipulation" is... interesting.

We have fixed all the typos that we found and clarified the elder care example in the introduction. To also clarify here, the task of elder care involves assistive dressing, feeding, or bed bathing [1], which involve manipulation of fabrics and food or lifting and cleaning the human body, all of which are deformable/non-rigid.

[1] Erickson, Zackory, et al. "Assistive gym: A physics simulation framework for assistive robotics." ICRA 2020.

> When mentioning differentiable physics and planning therein, it would be nice to also mention the recent work on GradSim (Jatavallabhula, Krishna Murthy, Miles Macklin, Florian Golemo, Vikram Voleti, Linda Petrini, Martin Weiss, Breandan Considine et al. "gradSim: Differentiable simulation for system identification and visuomotor control." ICLR 2021.)

Thank you for the reference. We have added this citation.

> Why did you decide to train all network components simultaneously with one big loss term? Did you try training them separately? Are there any upsides/downsides to that?

* The reason for training them with a single loss is that the feasibility predictor, the reward predictor, and the VAE model in our framework share the same encoder, so the losses for training each prediction head can affect one another. The architecture is described more formally in Section 3.3 of the paper, under the variational auto-encoder block.
* Sharing the encoder also allows us to do planning more efficiently. During planning, DiffSkill searches for the intermediate goal images in the latent space $\mathbf{z}$, as seen in Eqn. 2. Sharing the encoder among the different modules means that the latent space is also shared, which enables us to directly evaluate the feasibility and reward of each plan using the latent vectors as input, instead of first decoding them into images.
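For illustration, a minimal sketch of such a shared-encoder model with separate prediction heads might look as follows. The architecture, dimensions, inputs, and loss weighting here are hypothetical placeholders and far simpler than our actual model:

```python
# Minimal sketch of a shared-encoder model with separate prediction heads.
# Names, dimensions, and inputs are hypothetical placeholders.
import torch
import torch.nn as nn

class SharedEncoderModel(nn.Module):
    def __init__(self, obs_dim=1024, latent_dim=32, num_skills=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                     nn.Linear(256, 2 * latent_dim))   # mean and log-variance
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, obs_dim))
        # Both heads consume latent vectors, so plans can be scored without decoding images.
        self.feasibility = nn.Sequential(nn.Linear(2 * latent_dim + num_skills, 128),
                                         nn.ReLU(), nn.Linear(128, 1))
        self.reward = nn.Sequential(nn.Linear(2 * latent_dim, 128),
                                    nn.ReLU(), nn.Linear(128, 1))

    def encode(self, obs):
        mu, logvar = self.encoder(obs).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()           # reparameterization
        return z, mu, logvar

    def loss(self, obs, goal, skill_onehot, feasible_label, reward_label):
        z_obs, mu, logvar = self.encode(obs)
        z_goal, _, _ = self.encode(goal)
        recon = ((self.decoder(z_obs) - obs) ** 2).mean()
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
        feas = nn.functional.binary_cross_entropy_with_logits(
            self.feasibility(torch.cat([z_obs, z_goal, skill_onehot], dim=-1)).squeeze(-1),
            feasible_label)
        rew = ((self.reward(torch.cat([z_obs, z_goal], dim=-1)).squeeze(-1) - reward_label) ** 2).mean()
        return recon + kl + feas + rew   # one combined loss over the shared encoder
```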
*We sincerely appreciate your comments. Please let us know if you have further feedback.*

Best,
Authors

## Experiments / Visualization to add

- [x] Running SAC baseline, or show RL working on easier examples
  - Updated new results in Appendix B.
- [x] Add visualization of RL, show why they fail
  - See here: https://www.notion.so/notebookxingyu/SAC-Visualization-726880941aa0445d8863b720bdf53318
- [x] Add visualization of Traj opt and BC
- [x] Compare DiffSkill with RL baselines on single-stage tasks
- [x] Run RL on single-stage task. Directly run single-stage policy?
- [x] Try different horizon H for DiffSkill
  - Preliminary result
- [x] Compare performance using a pre-trained VAE vs. training the VAE along with other losses
  - In progress

## Modification to the paper

- [x] Added future work and limitations in the conclusion
- [x] Modify introduction
- [x] Fix all typos
