###### tags: `RNE Course 2022`

# Group 3B Paper Summary

:::info
Presenter: 林柏均 林家合
Deep Reinforcement Learning based Automatic Exploration for Navigation in Unknown Environment
Authors: Haoran Li, Qichao Zhang, Dongbin Zhao
Journal: IEEE Transactions on Neural Networks and Learning Systems
:::

## Introduction

### Automatic exploration

Without any prior knowledge, the robot keeps moving and exploring in a new environment and incrementally builds a map of the whole environment from what it observes.

### Traditional methods

* Frontier-based:
    1. Core question: given the current understanding of the world, where should the robot go next to obtain as much new information as possible?
    2. The agent explores by taking the boundaries between open space and unknown space on the current map (the frontiers) as reference points.
    3. The concept was first proposed in the 1997 paper [A Frontier-Based Approach for Autonomous Exploration](https://ieeexplore.ieee.org/document/613851).
    4. Compared with earlier methods, the advantages of that paper are: (1) it can explore environments containing both open and cluttered space; (2) it can explore environments whose walls and obstacles are oriented arbitrarily; (3) it explores efficiently by moving to the locations most likely to add new information to the map. (A minimal frontier-detection sketch is given after this subsection.)
* Information-based:
    1. Core idea: the problem of exploring unknown territory can be formulated as an information gain maximization problem.
    2. Goal: maximize the map information by simultaneously maximizing the expected Shannon information gain over the occupancy grid map and minimizing the uncertainty of the vehicle pose and map features during SLAM. (The former drives the agent to explore the unknown parts of the environment, while the latter keeps the built map and the agent's own pose estimate reasonably accurate.)
    3. The agent judges from the current information which action yields the largest information gain and then takes that action.
    4. This method was proposed in the 2002 paper [Information Based Adaptive Robotic Exploration](https://ieeexplore.ieee.org/abstract/document/1041446).

In short, the key of the traditional methods is to find an optimum next point from the current position and the known map. As a result, they rely heavily on expert features of maps fixed in advance (e.g. frontiers). Moreover, it is also difficult for traditional methods to design a general terminal mechanism that balances exploration efficiency against computational burden.

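To make the frontier idea concrete, here is a minimal sketch of frontier detection on a 2D occupancy grid. The cell encoding (`-1` unknown, `0` free, `1` occupied) and the 4-connected neighbourhood are our own illustrative assumptions, not taken from either paper.

```python
import numpy as np

UNKNOWN, FREE, OCCUPIED = -1, 0, 1   # illustrative cell encoding, not from the paper

def find_frontiers(grid):
    """Return (row, col) indices of free cells adjacent to at least one unknown cell."""
    rows, cols = grid.shape
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] != FREE:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # 4-connected neighbours
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr, nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers

# Toy map: a small room whose right half has not been observed yet.
grid = np.array([
    [1, 1, 1,  1,  1],
    [1, 0, 0, -1, -1],
    [1, 0, 0, -1, -1],
    [1, 1, 1,  1,  1],
])
print(find_frontiers(grid))   # -> [(1, 2), (2, 2)], the free cells touching unknown space
```

A frontier-based planner would then pick one of these cells (for example the nearest one, or the centroid of the largest frontier cluster) as the next navigation goal.
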
### Learning based exploration methods

In recent years, deep reinforcement learning based methods have been widely applied to robotic visual control and navigation tasks; they learn a mapping relationship between raw sensor data and the control policy. Although such end-to-end approaches let the agent explore the environment autonomously from sensor data, they have two drawbacks. First, they cannot reuse the mature exploration methods proposed in earlier work, so the agent must learn from raw data both how to move in the environment and how to explore it efficiently, which clearly increases the training difficulty. Second, the gap between simulated and real sensor data makes the transfer from simulation to reality very challenging.

## Contribution

The authors propose an automatic exploration framework built on map building, decision making and planning modules. The contributions of this paper can be summarized in two points:

1. Compared with end-to-end approaches that take raw sensor data as input and output a control policy, the proposed exploration framework incorporates traditional navigation methods, which reduces the reality gap between simulation and the real world. In addition, the framework lowers the training difficulty compared with end-to-end approaches.
2. Compared with traditional methods, the authors design a deep reinforcement learning value network and combine the DRL-based decision algorithm with classical robotic methods, improving both the exploration efficiency and the adaptability to unknown environments.

## Related Work

Exploration methods: traditional methods and intelligent methods.

Traditional methods:

1. [A Frontier-Based Approach for Autonomous Exploration](https://ieeexplore.ieee.org/document/613851): defines the concept of frontiers and uses frontiers as the next candidate points.
2. [Navigation Strategies for Exploring Indoor Environments](https://journals.sagepub.com/doi/abs/10.1177/0278364902021010834?casa_token=J2zW-F-Gz60AAAAA:9IhkH8tQmCPzXvvaTjOm--lNWR7kG--ACutmQygqWigJaFrIeSog_XhJGd73571g3z8230SKh0KLKg): defines the concept of safety regions and uses the Next-Best-View algorithm to decide which region the robot should move to.
3. [Information Based Adaptive Robotic Exploration](https://ieeexplore.ieee.org/abstract/document/1041446): uses Shannon information gain and uncertainty as the utility function to decide the robot's next position.

Intelligent methods:

1. [Mobile robots exploration through cnn-based reinforcement learning](https://jrobio.springeropen.com/articles/10.1186/s40638-016-0055-x): uses DQN to train a control network that takes only RGB-D images as input.
2. [Learning to Navigate in Complex Environments](https://arxiv.org/abs/1611.03673): uses A3C to train an agent that explores complex 3D mazes using only raw sensory data as input.

## Methodology

### Relationship between the modules of the exploration framework

#### Exploration Framework

![](https://i.imgur.com/D9KReOT.png)

#### Decision Module

The decision function $f_{decision}$ takes the map produced by the mapping module and the robot's past trajectory as input and generates the goal point, where $l_{0:t}$ denotes the robot's positions at the past time steps and $m_t$ is the surrounding map produced by the mapping module:

\begin{aligned}
g_t = f_{decision}(l_{0:t}, m_t)
\end{aligned}

In this paper the authors propose a DRL-based decision module. Since the decision module is the focus of the paper, it is described in more detail below.

* **Objective function**

  The objective is that the map predicted during exploration should be as close as possible to the real environment; moreover, since the robot is limited by its battery, we also want the shortest possible path to the goal point. Here $\hat{m}$ denotes the estimated map and $L(\cdot)$ the path length:

  \begin{aligned}
  c(\hat{m}, x_{t=0:T}) = \min_{u_{t=0:T}} || m - \hat{m} ||^2 + L(x_{t = 0:T})
  \end{aligned}

  In reality, however, the ground-truth map is usually unavailable, so this objective cannot be evaluated directly. Inspired by the use of Shannon entropy for evaluating exploration, and noting that each cell $(i, j)$ can be in one of three states (unknown, free and occupied), the objective is instead defined through the map entropy:

  \begin{aligned}
  H(m) = -\sum_i \sum_j p(m_{i, j}) \log p(m_{i, j})
  \end{aligned}

* **Deep Reinforcement Learning**

  DRL has become a common approach for robotic navigation. However, the sensor data received in real environments is high-dimensional, so the robot needs a large amount of trial and error before it can find good state-action pairs and reach good performance. The robot is therefore usually trained in a simulated environment, which causes two problems:

  1. The error model of the simulated environment differs from reality, which prevents good generalization.
  2. Mechanical errors are hard to simulate, and they easily create a gap between simulation and reality.

  The authors therefore separate the decision module from the planning module; the former focuses only on finding the goal position, and the reward function is derived as:

  \begin{aligned}
  r_t = \alpha (H(m_{t-1}) - H(m_t) - L(x_{t-1}, x_t))
  \end{aligned}

  In addition, a heuristic penalty is introduced into the reward for safety:

  \begin{aligned}
  \begin{equation}
  r_t=
  \begin{cases}
  -1, & \text{if the next point is in unknown space}\ \\
  -1, & \text{if the next point is too close to the obstacle}
  \end{cases}
  \end{equation}
  \end{aligned}

  Furthermore, so that exploration can terminate in time, the authors define a terminal action and set its reward as:

  \begin{aligned}
  \begin{equation}
  r_t=
  \begin{cases}
  1, & \text{if the ratio of explored region in map ρ > 0.85}\ \\
  -1, & \text{otherwise}
  \end{cases}
  \end{equation}
  \end{aligned}

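  The sketch below shows how such an entropy-based reward could be computed for a discrete occupancy grid. We assume per-cell occupancy probabilities in $[0, 1]$ (0.5 for unknown cells), measure each cell's uncertainty with its binary entropy, and use the Euclidean distance between consecutive decision points as $L(x_{t-1}, x_t)$; the function names and constants are our own, not the authors' implementation.

  ```python
  import numpy as np

  def map_entropy(prob_map, eps=1e-12):
      """Map entropy, taken as the binary entropy of each cell's occupancy probability."""
      p = np.clip(prob_map, eps, 1.0 - eps)
      return float(-np.sum(p * np.log(p) + (1.0 - p) * np.log(1.0 - p)))

  def step_reward(prev_map, curr_map, prev_pos, curr_pos, alpha=1.0,
                  in_unknown=False, near_obstacle=False):
      """Entropy reduction minus travelled distance, with the heuristic safety penalties."""
      if in_unknown or near_obstacle:      # chosen point is unsafe
          return -1.0
      path_len = float(np.linalg.norm(np.asarray(curr_pos) - np.asarray(prev_pos)))
      return alpha * (map_entropy(prev_map) - map_entropy(curr_map) - path_len)

  def terminal_reward(explored_ratio, rho=0.85):
      """Reward of the terminal action: +1 only if enough of the map has been explored."""
      return 1.0 if explored_ratio > rho else -1.0
  ```
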
* **Action Space and Network Architecture**

  1. Action Space

     In this paper an action is defined as selecting a point on the map and moving to it. In the figure below, the green points are free points that can be reached, so they form the set of feasible actions.

     ![](https://i.imgur.com/z1Qc38n.png)

  2. Network Architecture

     ![](https://i.imgur.com/1jlctQL.png)

     The network design borrows the Dueling DQN architecture with a few changes. In the original dueling architecture, the feature map first passes through two dense layers to produce two feature vectors, which are then fed into two further dense layers to produce the state value and the advantages. Here the authors drop the leading dense layers: the advantage values are produced directly by a 1x1 convolutional layer, while for the state value a global max-pooling produces a feature vector that is passed through a fully connected layer. The Q value of each state-action pair is defined as follows (a small numeric sketch of this aggregation appears after this list):

     \begin{aligned}
     Q(s,a;α,β) = V(s;β)+(A(s,a;α)− {1 \over |A|} \sum_{a'} A(s,a′;α))
     \end{aligned}

     where α and β are the network parameters of the two streams. The Q value of the terminal action is defined as follows (we have not yet fully understood this part):

     \begin{aligned}
     Q(s,a;β) = V(s;β)+(A(s,a;β)− {1 \over |A|} \sum_{a'} A(s,a′;β))
     \end{aligned}

* **Auxiliary task**

  A third branch is added to the network for segmentation, serving as an auxiliary training signal. During navigation it is very important to know what object each point on the map represents; in this paper there are three classes: obstacles, frontier and others. The loss function is defined as:

  \begin{aligned}
  L_s = -\sum_{p} \sum^M_{c = 1} y_{o, c} \log p_{o, c}
  \end{aligned}

  Putting the whole network together, the low-level convolutional layers are updated with both losses:

  \begin{aligned}
  \Delta \omega = -\epsilon({{\partial L^{DQN} \over {\partial \omega}}} - \lambda{{\partial L_s} \over {\partial \omega}})
  \end{aligned}

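As a concrete illustration of the dueling aggregation above, the following sketch combines a scalar state value with per-action advantages exactly as in the $Q(s,a;α,β)$ equation; everything else (batching, and the streams that would actually produce $V$ and $A$) is assumed for the example.

```python
import numpy as np

def dueling_q(state_value, advantages):
    """Combine V(s) with shape [batch] and A(s, a) with shape [batch, num_actions] into Q(s, a).

    Q(s, a) = V(s) + (A(s, a) - mean over a' of A(s, a')), so the advantage
    stream has zero mean over the actions and the two streams stay identifiable.
    """
    adv_mean = advantages.mean(axis=1, keepdims=True)
    return state_value[:, None] + (advantages - adv_mean)

# Toy check: one state, three candidate goal points.
V = np.array([0.7])               # from the value stream (global max-pooling + FC layer)
A = np.array([[0.2, -0.1, 0.5]])  # from the 1x1-convolution advantage stream
print(dueling_q(V, A))            # -> [[0.7, 0.4, 1.0]]
```
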
#### Planning Module

The planning module aims to find a feasible trajectory from the current position to the goal point. Planning can basically be split into two stages: global planning and local planning. The former finds a feasible path from the current position to the goal, while the latter converts that path into a trajectory and adjusts it according to real-time sensor data. The authors use $A^*$ as the global planning algorithm. Since $A^*$ yields a discretized trajectory, in the local planning stage the authors use the timed-elastic-band (TEB) method to convert the discrete points into robot velocities that serve as the control signal. Compared with other tracking methods, this one can use the real-time point cloud produced by the LiDAR to avoid obstacles that do not appear in the built map.

\begin{aligned}
\tau_{t:t+T} = f_{planning}(l_{0:t}, g, m_t)
\end{aligned}

where $g$ is the goal point produced by the decision module and the remaining symbols are defined as in the previous section.

#### Mapping Module

The mapping module estimates the robot's current position and the surrounding map from a sequence of sensor data; this is exactly SLAM. The corresponding equation is:

\begin{aligned}
m_t, l_t = f_{mapping}(l_{t-1}, m_{t-1}, u_t, z_t)
\end{aligned}

where $u_t$ and $z_t$ are the current control and observation, respectively. In this paper the authors use Karto SLAM, a graph-based SLAM method.

## Experimental Results

### Simulation Environment

#### Training & Testing Maps

![](https://i.imgur.com/5kxUJaX.png)

- Top left: the simulation map used for training.
- Top middle and top right: testing maps of the same size.
- Bottom left and bottom right: testing maps of different sizes.

#### Training Results

![](https://i.imgur.com/wLZA1GO.png)

- The figure shows that AFCQN outperforms the other two models both in training performance and in final reward.
- A possible explanation for DQN reaching a higher final training reward than FCQN is that DQN has more parameters with which to handle the more complex control process.

![](https://i.imgur.com/ff8FuNh.png)

- This table compares the explored grids and path length of the different models during training.
- AFCQN has the highest explored-grids values, because the architecture is designed to obtain a safer path by building up the map.

![](https://i.imgur.com/iVxg3JI.png)

- AFCQN also has the highest explored region rate, for the same reason as above.
- FCQN performs well on path length and exploration efficiency: its aim is to reach the goal point quickly, without caring whether the map is built completely.

#### Testing Results

![](https://i.imgur.com/qG9xDKA.png)

- This table shows the explored region rate, path length and exploration efficiency of the different models on the test maps.
- Since the input/output size of DQN is fixed, it cannot be tested on test maps of different sizes.
- AFCQN again has the highest explored region rate.
- The remaining conclusions are the same as in the previous section.

![](https://i.imgur.com/sdbcFeR.png)

- Here AFCQN is compared with the frontier-based method.
- The frontier-based method has a higher explored region rate, because it concentrates on covering the whole map.
- Since AFCQN cares about reaching the goal point in time while building the map, its exploration efficiency is better.

### Physical Environment

#### Real World & Simulation Training Results

![](https://i.imgur.com/oAuUNKF.png)

#### Cases that make the frontier-based method fail

![](https://i.imgur.com/gCeL1PZ.png)

## Your Insight

1. Although the deep reinforcement learning methods proposed in recent years have achieved good performance in many fields, they are not necessarily suitable for every task. Instead, incorporating traditional methods that have already proven very useful can combine the advantages of both and push the automatic exploration task a step further.
2. Adding an auxiliary task to the training pipeline can guide the agent to learn specific knowledge, and choosing the right task can encourage the agent to explore the unknown parts of the environment more often. Therefore, besides the choice of method, the training pipeline itself also needs a good design. (A small sketch of combining the two losses follows below.)

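As a small illustration of that second point, this is one common way to fold an auxiliary segmentation loss into the DQN update so that the shared convolutional layers receive gradients from both tasks. It is a PyTorch-style sketch under our own assumptions (loss choices, tensor shapes, the weighting `lam`); the sign convention may differ from the $\Delta \omega$ expression above, and this is not the authors' code.

```python
import torch.nn.functional as F

def combined_update(optimizer, q_pred, q_target, seg_logits, seg_labels, lam=0.5):
    """One gradient step using both the DQN loss and the auxiliary segmentation loss."""
    dqn_loss = F.smooth_l1_loss(q_pred, q_target)        # TD error on the chosen actions
    seg_loss = F.cross_entropy(seg_logits, seg_labels)   # obstacle / frontier / others classes
    loss = dqn_loss + lam * seg_loss                     # shared layers get both gradients
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss.item())
```
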
