# Rainbow DQN

###### tags: `paper` `research`

## Rainbow = DQN + 6 Extensions

### 1. Double Q-learning
-> **Addresses the *overestimation of Q-values* in standard Q-learning**

### 2. Prioritized replay
Standard DQN samples uniformly from the replay buffer, but ideally we want to sample more often the transitions that still need to be learned (those with large TD error). (A toy sampling sketch appears at the end of this note.)
-> **Addresses *failure to converge* and *slow convergence***

### 3. Dueling networks
Proposes a new network architecture: the final layers of the standard DQN network are split into value and advantage streams. (See the dueling-head sketch at the end of this note.)

-> **Convergence may be faster**

### 4. Multi-step learning

Instead of using only the single-step reward, compute a multi-step (n-step) return. (Combined with Double Q-learning in the target sketch at the end of this note.)
-> ***Speeds up learning***

### 5. Distributional RL
The final output of the network becomes a distribution over returns (each action has its own distribution).
This is harder to implement in practice, because the support (range) of the distribution must be chosen in advance.
The loss function is the Kullback-Leibler divergence.
**-> On certain Atari games its performance is markedly better than the other six DQN variants**

### 6. Noisy Nets
Noise is added to either the actions or the network parameters, typically Gaussian noise with parameters (μ, σ).
-> ***Speeds up exploration***

## Rainbow DQN Result

Figure: Median human-normalized performance across 57 Atari games.
A3C uses the multi-step technique, so it stands in for multi-step DQN in this comparison.

## Reference
Rainbow paper: https://arxiv.org/abs/1710.02298
Hung-yi Lee YouTube playlist: https://youtube.com/playlist?list=PLJV_el3uVTsODxQFgzMzPLa16h6B8kWM_
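
## Code Sketches

The sketches below are not from the Rainbow paper's code; all function, class, and parameter names are illustrative assumptions, and the hyperparameters are common defaults rather than the paper's exact settings.

The first sketch combines extensions 1 and 4: the online network selects the greedy next action while the target network evaluates it (Double Q-learning), and bootstrapping happens only after an n-step return (multi-step learning). Here `online_net` / `target_net` are assumed to map states to Q-values of shape `[batch, num_actions]`, `rewards` holds the n per-step rewards, and `next_states` is the state n steps ahead.

```python
import torch

def n_step_double_dqn_target(online_net, target_net, rewards, next_states,
                             dones, gamma=0.99, n=3):
    # Multi-step learning: discounted n-step return  sum_{k=0}^{n-1} gamma^k * r_{t+k}
    discounts = gamma ** torch.arange(n, dtype=rewards.dtype, device=rewards.device)
    n_step_return = (rewards * discounts).sum(dim=1)

    with torch.no_grad():
        # Double Q-learning: the online network *selects* the greedy action ...
        best_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # ... and the target network *evaluates* it, reducing overestimation.
        next_q = target_net(next_states).gather(1, best_actions).squeeze(1)

    # Bootstrap after n steps; (1 - dones) zeroes the bootstrap at episode ends.
    return n_step_return + (gamma ** n) * next_q * (1.0 - dones)
```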
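
The dueling architecture splits the last part of the network into a scalar value stream V(s) and a per-action advantage stream A(s, a), then recombines them into Q(s, a). The layer sizes and class name below are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    def __init__(self, feature_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        # Value stream: one scalar V(s) per state.
        self.value = nn.Sequential(
            nn.Linear(feature_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        # Advantage stream: one A(s, a) per action.
        self.advantage = nn.Sequential(
            nn.Linear(feature_dim, hidden), nn.ReLU(), nn.Linear(hidden, num_actions))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        v = self.value(features)      # [batch, 1]
        a = self.advantage(features)  # [batch, num_actions]
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - a.mean(dim=1, keepdim=True)
```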
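
Finally, a toy sketch of proportional prioritized sampling over a flat array of TD errors. Real implementations use a sum-tree for O(log N) sampling; the names and defaults here (`alpha`, `beta`, `eps`) are common choices, not necessarily the paper's values.

```python
import numpy as np

def sample_prioritized(td_errors, batch_size, alpha=0.6, beta=0.4, eps=1e-6):
    # Priority is |TD error| plus a small epsilon so no transition has zero probability.
    priorities = (np.abs(td_errors) + eps) ** alpha
    probs = priorities / priorities.sum()
    idx = np.random.choice(len(td_errors), size=batch_size, p=probs)
    # Importance-sampling weights correct the bias from non-uniform sampling.
    weights = (len(td_errors) * probs[idx]) ** (-beta)
    weights /= weights.max()
    return idx, weights
```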