## Summary of our discussion with Reviewers htJq, qQsj, and vQKv

We appreciate the engaging discussion of our paper with Reviewers htJq, qQsj, and vQKv. All of them raised insightful points that strengthened our paper, improving its clarity, exposition, and rigor. Since the discussion thread is long, we summarize the discussions following the initial review and our follow-ups.

**Reviewer htJq**
- All concerns (both the initial review and follow-up questions) have been **addressed** and **acknowledged**.
- In general, finds the paper promising and **suggested acceptance** once the presentation concerns were addressed. We addressed all presentation concerns raised by Reviewer htJq by:
  - revising the main contributions, the abstract, parts of Section 3, Section 4, Section 5.4, and Section 6.
- More specifically, after the initial review, Reviewer htJq helped improve our paper's exposition by (1) simplifying the explanation of behavioral and instant incentives, (2) precisely stating our contributions, and (3) framing our results in a way that better aligns with the stated contributions.
- The post-initial-review presentation questions have been **acknowledged** by the reviewer.

**Reviewer qQsj**
- All technical concerns (both the initial review and follow-up questions) have been **addressed** and **acknowledged**.
- Agrees with the other reviewers that the paper has a good contribution and good results. Primarily had issues with the narrative and writing:
  - the narrative in the introduction (motivate our approach without inaccurately demoting CTDE), and
  - the lack of a proper discussion of limitations (generalization, high-dimensional input).
- We addressed these by:
  - rewriting the introduction in Section 1, and
  - adding a section on limitations to Section 6 that discusses the possible limitations w.r.t. generalizability and high-dimensional input states, among other things.
- We also expanded the discussion of the empirical analysis of centralized and decentralized methods in Section 5.4, along with a justification of the design choice of incentives.

**Reviewer vQKv**
- All concerns (initial review only; no follow-up concerns) have been **addressed** and **acknowledged**.
- Noted our responses to the other reviewers regarding (1) clarity and (2) CTDE vs. DTDE, and found them sufficient.
- Finds our paper very solid, and raised the recommendation to **Strong Accept**.

<!-- Reviewer qQsj mentioned that we have addressed many technical comments, including: (1) performing a more rigorous testing phase, with standard statistical tests, on frozen models; (2) including an ablation study on a centralized version of our approach; and (3) performing weight sharing over the inference modules. Reviewer qQsj also pointed out a few limitations that were not addressed in our initial paper, including (1) challenges our approach may face when working with high-dimensional states, and (2) the generalization problem when encountering agents with unknown strategies. We appreciate that Reviewer qQsj helped us raise these limitations in our discussion and highlight potential directions for future work. Meanwhile, the discussion with Reviewer qQsj over CTDE vs. DTDE helped us rephrase our problem setting and make it more rigorous. We appreciate the effort of Reviewer qQsj in making our paper more rigorous and solid. -->

<!-- Reviewer htJq acknowledged the interesting and promising approach of our paper, especially in the domain of autonomous driving, and praised our effort in clarifying our approach, experimental domains, and pseudocode. Their follow-up questions and comments helped us improve our paper, including (1) formulating the definitions of the behavioral and instant incentives used in our approach with clear examples, (2) sharpening the contributions of our paper by emphasizing behavior-awareness in trajectory prediction and decision-making, and (3) framing our results discussion and motivating future research into the representational choices of the inference models. All of these helped us improve the exposition and positioning of our paper. -->

<!-- ## Rebuttal for Reviewer htJq (2nd round)

> First, can you articulate how your results justify the added complexity of using the behavioral/instant incentive? I can see the experimental results and there is some additional analysis. I would like to hear why the improvements you see between, e.g., IPLAN-GAT and IPLAN justify the additional complexity of IPLAN.

Thanks for the question! The added incentive modules not only yield novel insights into decentralized MARL for heterogeneous autonomous driving, which we believe would be broadly beneficial to the decentralized MARL research community, but also do **not** incur high computational complexity compared to the base controller. Specifically, we are happy to report that:

- Each of the GAT and BM modules occupies only about 10% of the size of the core controller (which is essentially iPlan without the two incentive modules). We can provide specific parameter sizes if necessary.
- In terms of inference time, the core controller and the added incentive modules are of the same order.
- Finally, in terms of training time (using the hyperparameters in the paper), iPlan takes 4.897 days whereas iPlan-GAT takes 4.042 days. Note that the training time also reflects the complexity of the simulator environment.

Let us know if you'd like us to analyze complexity in any other way.

> Second, I think there is room for improvement in explaining and motivating the hierarchical incentives used here. For example, in describing the instant incentive, you say that collision avoidance becomes more important in heavy traffic. I expect a lot of readers will object to that reasoning because collision avoidance is always important --- it is simply easier to do in light traffic so one can drive faster. I guess that it is helpful because you don't have the right features. I.e., the preference over velocity is a simplification of a much more complicated set of preferences so it is easy for the inference to allow it to adapt.

We apologize for the confusion. Let us try again. Perhaps the most concrete way to explain the two incentives is to first differentiate their objectives.

- **Behavior incentive**: Given the observations from the previous few seconds, the behavior incentive performs high-level decision-making akin to action planning: "*What is the most likely next action of this driver?*" The answer is encoded via $\hat\beta^t_i$. This tells an agent when it can speed up in sparse or empty traffic or should slow down in dense traffic. It is also able to recognize conservative drivers and the possible need to overtake them. This incentive can therefore reason about aggressive versus conservative drivers.
- **Instant incentive**: The instant incentive then asks, "*How should I execute this maneuver using my controller so that I am safe and still on track towards my goal?*" It measures classical efficiency metrics from the robotics literature, such as collision avoidance (safety), distance from the goal, and smoothness.

With each incentive's responsibility in mind, here is a toy example. Suppose Alice is driving behind Bob. Alice is a relatively more assertive and confident driver than Bob, who is driving very slowly. Alice's *behavior incentive* has been tracking both Alice's and Bob's driving for the past few minutes, and after observing for a short while it will tell her to overtake Bob. At this point, her behavior incentive informs her *instant incentive*, which modifies her trajectory and shows her exactly how (i.e., with what controls) to execute the overtaking maneuver safely, as opposed to leaving her stuck behind Bob.

Another way to look at it is that the instant incentive is akin to motion forecasting, whereas the behavior incentive is akin to high-level decision-making. The behavior incentive thus biases the motion forecasting in a behavior-aware manner, making it better suited to heterogeneous traffic. As evidence, note that in more homogeneous traffic iPlan has a success rate (68.44) similar to iPlan-GAT (no behavior incentive), whereas in chaotic traffic the success rate drops significantly for iPlan-GAT compared to iPlan (61.88 versus 67.81), indicating that behavior modeling is needed to survive in more heterogeneous, chaotic traffic.

> Finally, I do still notice a few typos and grammatical errors --- some additional editing is needed for the camera ready. Perhaps consider using Grammarly or a similar tool to check for missing words or incorrect verb tenses (e.g., l.165 "Behavioral incentive captures these inherent tendencies," should either say "The behavioral incentive captures" or "Behavioral incentives capture").

We are currently fixing these. We will update this response once complete. Thanks! -->
## Rebuttal for Reviewer htJq (3rd round)

Thank you for clarifying your follow-up question about methodological complexity, and our apologies for the earlier confusion. Below, we briefly discuss the additional tunable hyperparameters, additional code, and design choices. However, it should be noted that our main contribution is a practically working and efficient novel joint trajectory and intent prediction algorithm using MARL for autonomous driving in heterogeneous traffic. In general, it is well known that training even simple MARL algorithms is hard. Yet, our approach extends MARL-based trajectory planning research in autonomous driving to harder domains (heterogeneous traffic) under minimalistic assumptions (decentralized training, no weight sharing, a variable number of agents, etc.). Considering that even getting decentralized MARL algorithms to converge effectively in simpler environments is a challenge, the fact that our combined approach not only trains well but also outperforms many state-of-the-art baselines is, in our opinion, a significant achievement. In summary, thinking of our contribution in terms of just the improved percentage points is highly reductive. Our work is a significant push in the research landscape of decentralized MARL and autonomous driving in heterogeneous traffic.

Regarding the methodological complexity concerns you raise:

- **More interacting parts of the system:** There are no additional interacting parts of the system, neither among agents nor between agents and the environment. All three modules (controller, behavioral incentive inference, and instant incentive inference) use the same form of input, derived from the ego agent's observations of opponents. As described in Section 4 and Fig. 1, the behavioral incentive inference takes the sequence of historical observations as input; the instant incentive inference takes the current observations of opponents together with the behavioral incentives; and the controller combines the current observation with the behavioral and instant incentives. The only extra component is the observation wrapper that processes the observations, which is shared with the episode batch creator. Notably, we use the same observation wrapper to convert the raw environment observations into inputs for all baselines in our paper.
- **More hyperparameters to tune:** Yes, the behavioral and instant incentive inference modules introduce a few extra hyperparameters (see the sketch at the end of this response). We detail them in Appendix C and present additional experimental results on tuning them in Appendix E.
  - **Behavioral incentive inference module**:
    - the hidden state dimension of the encoder and decoder,
    - the dimension of the behavioral incentive,
    - the learning rate of the behavioral incentive inference module,
    - the coefficient of the soft update policy,
    - the length of the historical observation sequence,
    - the dropout rate.
  - **Instant incentive inference module**:
    - the hidden state dimension of the GAT and recurrent layer,
    - the batch size when sampling moments from the episode batch for training,
    - the learning rate of the instant incentive inference module,
    - the length of the trajectory prediction,
    - the dropout rate.
- **More code to maintain:** Yes, the behavioral and instant incentive inference modules are defined separately, in separate files. The behavioral incentive inference module has its own training and execution code, built on autoencoder network structures. Similarly, the instant incentive inference module has its own training and execution code, built on GAT and recurrent network structures.
- **More design choices to make:** Yes, we also explored alternative designs for our inference modules, such as different network structures and a hard update policy in the behavior module. We include these results in Appendix D; they show that our current design performs better.

Given our hyperparameter tuning, design-choice exploration, and ablation studies, we find that our current design of the behavioral and instant incentive inference achieves better performance than the alternative approaches we explored.
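For concreteness, below is a minimal PyTorch sketch of how a behavioral incentive inference module and its hyperparameters might look. All identifiers, layer choices, and values here are our own illustrative placeholders, not the exact names or settings from our codebase (those are listed in Appendix C); in this sketch, the hard update policy ablated in Appendix D would correspond to `tau = 1`.

```python
import torch
import torch.nn as nn

# Placeholder hyperparameters mirroring the lists above (illustrative
# values only; the actual settings are in Appendix C of the paper).
BEHAVIOR_CFG = {
    "hidden_dim": 64,    # hidden state dim of the encoder/decoder
    "incentive_dim": 8,  # dimension of the behavioral incentive
    "lr": 3e-4,          # learning rate of the behavioral module
    "tau": 0.01,         # coefficient of the soft update policy
    "history_len": 20,   # length of the historical observation sequence
    "dropout": 0.1,      # dropout rate
}
INSTANT_CFG = {
    "hidden_dim": 64,    # hidden state dim of the GAT and recurrent layer
    "batch_size": 32,    # moments sampled from the episode batch per update
    "lr": 3e-4,          # learning rate of the instant module
    "pred_len": 10,      # length of the trajectory prediction
    "dropout": 0.1,      # dropout rate
}

class BehaviorIncentiveInference(nn.Module):
    """Encoder-decoder over the history of opponent observations; the
    latent code plays the role of the behavioral incentive estimate."""

    def __init__(self, obs_dim: int, cfg: dict):
        super().__init__()
        self.encoder = nn.GRU(obs_dim, cfg["hidden_dim"], batch_first=True)
        self.to_incentive = nn.Linear(cfg["hidden_dim"], cfg["incentive_dim"])
        # Reconstruction head used by the autoencoder loss during training
        # (training loop not shown).
        self.decoder = nn.Linear(cfg["incentive_dim"], obs_dim)
        self.dropout = nn.Dropout(cfg["dropout"])
        self.tau = cfg["tau"]

    def forward(self, obs_history: torch.Tensor) -> torch.Tensor:
        # obs_history: (batch, history_len, obs_dim)
        _, h_last = self.encoder(obs_history)
        beta_new = torch.tanh(self.to_incentive(self.dropout(h_last[-1])))
        return beta_new  # (batch, incentive_dim)

    def soft_update(self, beta_old: torch.Tensor,
                    beta_new: torch.Tensor) -> torch.Tensor:
        # Soft update of the running incentive estimate; a hard update
        # would replace it outright (tau = 1).
        return (1.0 - self.tau) * beta_old + self.tau * beta_new

# Example: infer an opponent's behavioral incentive from 20 steps of
# 6-dimensional observations for a batch of 4 agents.
module = BehaviorIncentiveInference(obs_dim=6, cfg=BEHAVIOR_CFG)
beta = module(torch.randn(4, BEHAVIOR_CFG["history_len"], 6))
```

The instant incentive inference module would be sketched analogously, with a GAT layer over the opponent graph feeding a recurrent trajectory-prediction head, and the controller consuming the current observation together with both incentive estimates, as in Fig. 1.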
___

## Rebuttal for Reviewer htJq (4th round)

Thank you for your follow-up question regarding our module design and the contributions claimed in our paper, and our apologies for the confusion here. We appreciate your suggestion on the contributions we claim, and we propose to rephrase the second contribution of our paper as:
