李政毅
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note No publishing access yet

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.

      Your account was recently created. Publishing will be available soon, allowing you to share notes on your public page and in search results.

      Your team account was recently created. Publishing will be available soon, allowing you to share notes on your public page and in search results.

      Explore these features while you wait
      Complete general settings
      Bookmark and like published notes
      Write a few more notes
      Complete general settings
      Write a few more notes
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights New
    • Engagement control
    • Make a copy
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Note Insights Versions and GitHub Sync Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Make a copy Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note No publishing access yet

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.

    Your account was recently created. Publishing will be available soon, allowing you to share notes on your public page and in search results.

    Your team account was recently created. Publishing will be available soon, allowing you to share notes on your public page and in search results.

    Explore these features while you wait
    Complete general settings
    Bookmark and like published notes
    Write a few more notes
    Complete general settings
    Write a few more notes
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       Owned this note    Owned this note      
    Published Linked with GitHub
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    # Multi-agent RL for traffic signal control ## Introduction ### Multi-agent vs. Single-agent multi-agent跟single-agent最大的差別在single agent只需要根據環境的互動來做學習 ![](https://i.imgur.com/XwbBsY1.png) 但multi-agent除了跟環境的互動外其他agents的動作也會影響到自己,且multi-agent的行為在訓練中是不可預期的,所以會比single agent要來的複雜許多 ### Multi-agent的分類 ![](https://i.imgur.com/ZZLpMkI.png) - Anlysis of emergent behaviors : 簡單來說是把所有的agent直接灑到環境中,不溝通也不共享訊息,各自學習的一種訓練方式 - Learning communicate :在學習的過程中agent彼此間會傳輸訊息或是共享訊息來學會如何溝通而達到更好的結果 - Learning cooperation : 在學習的過程中agent透過學習互相合作而達到更好的結果 - Agent modeling agents: 若agent間並沒有溝通,但是agent透過看到其他agent的一些外在結果來去model自己的訊息 ## Why this is important ### Action space exponetial increase 舉例:若一個路口會輸出綠燈時間0-30秒,共30個action可以選,那若今天有三個路口那action space就會增長為30^3=27000個,是exponential成長,透過Action branch能達到線性成長 變30+30+30=90。 ### traffic signal control is no good enough ![](https://i.imgur.com/nAVZjv9.png) 現今的交通號誌設計不夠良好,沒有友善利用 如圖可見即便左右向是沒有車流的前後向的車流卻還是被block住 ## Background knowledge ### Centralized vs. Decentralized Centralized : 只透過一個agent決定所有的actions Decentrailzied : 會有多個agents,且各自的agent透過學習得出各自的action ### 傳統的設計方式與RL設計方式 ![](https://i.imgur.com/1Doo0xH.png) - 當前較常使用的控制系統還是pre-designed的可能會根據過往歷史的經驗判斷哪些時段是高峰就開較長的綠燈時間 ,但車流問題會因為每個城市的發展狀況而有所改變 - 例如該地若新建捷運後可能會較多人搭乘捷運車流量反而會較少 - 傳統在解這個問題時會假設車子的speed是uniform的情況下,去解最小化旅行時間的問題,但這在現實生活中顯然不合理。 ## Existed works ### Action Branching Architectures for Deep Reinforcement Learning #### Abstract + Intro 舉例:若一個路口會輸出綠燈時間0-30秒,共30個action可以選,那若今天有三個路口那action space就會增長為30^3=27000個,是exponential成長,透過Action branch能達到線性成長 變30+30+30=90。 拿這篇paper的架構Branching Dueling Q-network,是透過一層share layer接到各自對應的branch上 經由這個share layer來去調空各個branch的action。 BDQ可以在一個把continuos action space 切成6.5*10^25的 discrete action space後玩的比DDPG好 BDQ跟IDQ做比較(IDQ是完全沒有共用參數)得到IDQ隨著維度增加很快就爆了 這種對控制的部份分佈其實在自然界中也能看到:例如章魚的腳 #### Branching Dueling Q-Network 如果可以直接不管其他agent各自練各自的,每個agent只自己做優化那就可以大幅減少action space 但直接這樣練顯然有問題,為了解決它提出了一個結合Dueling DQN的演算法 稱之為BDQ ![](https://i.imgur.com/T0ckW5N.png) 透過三種方法去改善DQN這個演算法 #### Double Q-learning + DDQN 原本的Q learning會有 overestimation的問題 透過Double Q learning改善這問題 Double-Q 可以解決 select max的問題 DDQN 透過target network解決bootstrapping問題 ![](https://i.imgur.com/LNw5MR6.png) #### Prioritized Experience Replay sample時不要uniform選,而是透過Loss去優先選一些比較重要的buffer來提昇訓練的結果 #### Dueling Network Architecture ![](https://i.imgur.com/kOi4i39.png) ![](https://i.imgur.com/FwtKuq5.png) 透過這公式去實作出同時能產生state vlaue 跟 advantage的網路去估計Q值 而實驗顯示扣掉mean A*結果較好 #### Methods - Common State-Value Estimator 在架構上每個branch的使用的state value都是同一個 而實驗上得出(無理論)證明扣掉mean A*結果較1.不扣、2.扣maximum好 - Temporal-Difference Target TD target的計算按照理論上也是用max action去預估下一個state的Q值 但實驗上得出去取mean也會比max來的好 - Loss function 把每個branch的predict 跟target的error取決對值加起來取平均當作loss是最值觀的作法 但實驗上得出square error的效果比直接取error更好 - Gradient Rescaling 因為是所有branch會更新到同一個share memory上所以要rescaling 除以N+1的原因是因為還有考慮State Value - Error for Experience Prioritization 透過所有branch的target跟predict的絕對值累加當作priority的error error越大代表預測的越不準,代表還有很大的進步空間所以需要比較高的優先權來sample #### Experiments 拿去跟沒有變形的Dueling DDQN比也在連續環境中去跟DDPG比 而為了要讓演算法能在連續環境上使用,使用到切割的技巧把連續的空間切成N等份變成離散空間 #### Custom Reaching Environments 去客製化reacher v1這個遊戲 擴張成維度3 4 5 的情況下去看Dueling跟BDQ的表現誰比較好 在延伸去把個別dimension的action space分成n = 5跟n= 9的情況下作考慮 若N=5 n= 9 就有9^5種state可以選 結果可以看到BDQ隨著N跟n增長的表現會比較穩定 #### Standard Benchmark Environments 比較BDQ跟IDQ跟DDPG的演算法在4個環境中,值得注意的是DDPG在最複雜的遊戲中輸了 而在其他較簡單的遊戲中則跟BDQ差不多或是略優於BDQ,作者解釋的原因是因為DDPG的探索是用noise的緣故在單前時段能夠做更好的的exploration 而可以發現Dueling DDQN在Reacher還練的起來但到Hopper就不行了 這只多了一個維度N=2 → 3 #### Note BDQ的穩健性整體是比較好的但隨著priority experience replay buffer 的ablation它會嚴重惡化 因此prioritize experience replay仍是未來研究的主題 ### Multi-Agent Deep Reinforcement Learning for Large-scale Traffic Signal Control #### Abstract + Intro 早期的交通方法是在解優化問題,但需要透過大量的數學計算,使它們不太好用,且不一定符合現實狀況 因為action space過大的關係所以要透過MARL的方式去處理,但也因此面臨幾項挑戰 環境在各個agent只能看到部份的訊息,因為agent之間的溝通不夠完整。 提出了一個可擴展、decentralized的MARL 方法,將multi-agent 結合A2C 去跟完全獨立的A2C、Q learning做比較,結果證明了與其他最先進的分散式 MARL 算法相比最優 #### decentrailized vs. independent decentrailized代表各個agent中還是各自訓練,並沒有透過一個layer去調控,但有可能去抓取其它agent的資訊來做訓練,但independent就完全是沒有其他agent的任何資訊了 #### Method #### Consider latest neighbor agent policy and state 把最新的policy的鄰域也考慮進來,增加local agent觀察的範圍 ![](https://i.imgur.com/1b58FCk.png) #### Use spatial discount factor adjust the state and reward signal from other. 透過spatial discount factor來去調整從其他agent收進來的reward該如何影響個別agent的決策 α = 0 代表完全只考慮自己的reward (greedy) α = 1 代表會完全考慮其他neighbor的reward(coopeateive) ![](https://i.imgur.com/nUUd761.png) #### Archeitecture ![](https://i.imgur.com/toIkZ4x.png) 這邊用LSTM的原因是因為歷史交通資訊對agent來說很重要,因此透過LSTM讓它能有效去recall這些歷史訊息 ### CoLight: Learning Network-level Cooperation for Traffic Signal Control #### Problem - 環境的變化是動態的 把target路口相鄰的路口拉進去考慮時,相鄰的路口若是主要幹道那它對target路口的影響應該要更大,但相鄰的路口對target路口的影響是會隨時間改變的。 例如到早上巔峰A→target→C很塞,晚上下班可能就會是A←target←C很塞,方向會改變 透過Graph attention network 來去學這種動態的影響 - 分享model variable可能會有不相容的問題 因為agent間會透過share model variable來提升訓練效率,但可能會遇到不同intersection所考慮的importance不同衝突 例如:A路口的主要影響會是W,E,因此會把neihbor index成[W,E,S,N],而B路口的主要影響會是N,S,index成[N,S,W,E] 這篇paper透過定義impotance跟attention,透過attetion wegited來implement index-free model來解決這個問題 ![](https://i.imgur.com/x8GEazz.png) #### Method ![](https://i.imgur.com/39YdtCJ.png) #### Cooperation through dynamic communication & Index-free model learning with parameter sharing ![](https://i.imgur.com/6H8BZzH.png) - embeded observation 把observation過embeded layer後得到hidden state - importnace Wt := embeding parameters for target Ws := embeding parameters for source - attetion 把先前定義的importance過一個softmax得到相對的attention value - representation of several source instersections to target i 經過一次與經過兩次Graph attetion network的結果,可看到第二次更明顯的顯示出了主要幹道上的負載量,經過越多層GAT理論上可以達到更精準的評分結果 #### Multi-head Attention ![](https://i.imgur.com/7mIn163.png) 如同上述步驟,但同時做了多種linear projection 用了不同種的{Wt,Ws,Wc},最後在取平均 實務上5個head attetion的結果最好 #### Q-value Prediction ![](https://i.imgur.com/xtX1Xty.png) p代表的是有可能的phase數量,像一般的Q-value選擇一樣,最終從q個維度中選值最大的當作output action phase。 ## Possible future direction - Graph Neural Network based training - Design better reward and state such that RL training have a good goal - Partial observation training since sensors are expensive ## Slide https://docs.google.com/presentation/d/15kDveFa2epGSe9yJs2caYn8kx24eM6Ro/edit#slide=id.p1

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password
    or
    Sign in via Google Sign in via Facebook Sign in via X(Twitter) Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    By signing in, you agree to our terms of service.

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully