# Deep Reinforcement Learning with Double Q-learning

## Links

- [https://arxiv.org/pdf/1509.06461.pdf](https://arxiv.org/pdf/1509.06461.pdf)
- [https://yq.aliyun.com/articles/311010](https://yq.aliyun.com/articles/311010)
- [https://www.cnblogs.com/wangxiaocvpr/p/5620365.html](https://www.cnblogs.com/wangxiaocvpr/p/5620365.html)
- [https://www.jianshu.com/p/c87c539e26bd](https://www.jianshu.com/p/c87c539e26bd)
- [https://zhuanlan.zhihu.com/p/25239682](https://zhuanlan.zhihu.com/p/25239682)
- [https://junmo1215.github.io/paper/2017/12/08/Note-Deep-Reinforcement-Learning-with-Double-Q-learning.html](https://junmo1215.github.io/paper/2017/12/08/Note-Deep-Reinforcement-Learning-with-Double-Q-learning.html)
- [https://blog.csdn.net/taoyafan/article/details/90951058](https://blog.csdn.net/taoyafan/article/details/90951058)

## Abstract

- The popular Q-learning algorithm is known to ==overestimate== action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they harm performance, and whether they can generally be prevented.
- Q-learning is one of the most popular reinforcement learning algorithms, but it is known to sometimes learn unrealistically high action values because it includes a maximization step over estimated action values, which tends to prefer overestimated to underestimated values.
- To test whether overestimations occur in practice and at scale, we investigate the performance of the recent DQN algorithm. DQN combines Q-learning with a flexible deep neural network and was tested on a varied and large set of deterministic Atari 2600 games, reaching human-level performance on many of them. Perhaps surprisingly, we show that even in this comparatively favorable setting DQN sometimes substantially overestimates the values of the actions.
- We show that the idea behind the Double Q-learning algorithm (van Hasselt, 2010), which was first proposed in a tabular setting, can be generalized to work with arbitrary function approximation, including deep neural networks. We use this to construct a new algorithm we call Double DQN.

## Background

- ![](https://i.imgur.com/2A5YLfi.png)
- ![](https://i.imgur.com/wY4KI3e.png)
- ![](https://i.imgur.com/yYwbUQd.png)

## Deep Q Networks

- A deep Q network (DQN) is a multi-layered neural network that, for a given state s, outputs a vector of action values Q(s, · ; θ), where θ are the parameters of the network. For an n-dimensional state space and an action space containing m actions, the neural network is a function from $R^n$ to $R^m$ (see the sketch below).
- The target network, with parameters $θ^−$, is the same as the online network except that its parameters are copied from the online network every τ steps, so that then $θ^−_t = θ_t$, and kept fixed on all other steps. The target used by DQN is then ![](https://i.imgur.com/vVw52nu.png)
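
As a concrete illustration of the $R^n \to R^m$ description above, here is a minimal sketch of such a network, assuming PyTorch; the class name `QNetwork`, the MLP layout, and the layer sizes are illustrative assumptions, not from the paper (which uses the convolutional DQN architecture for Atari).

```python
# A minimal sketch of a Q network as an R^n -> R^m function, assuming PyTorch.
# The MLP layout and sizes are illustrative; the paper itself uses the
# convolutional architecture of DQN for Atari.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, n_state: int, m_actions: int, hidden: int = 128):
        super().__init__()
        # Maps a state vector in R^n to one estimated value per action (R^m).
        self.net = nn.Sequential(
            nn.Linear(n_state, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, m_actions),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)  # shape (batch, m): the vector Q(s, . ; theta)
```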
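
The overestimation described in the Abstract comes purely from taking a max over noisy estimates: even when every true action value is identical, the largest estimate is biased upward. A small numerical sketch (the noise scale and action count are arbitrary):

```python
# Numerical sketch of the maximization bias: every true action value is 0,
# yet the max over noisy, unbiased estimates is positive on average.
import numpy as np

rng = np.random.default_rng(0)
n_trials, m_actions = 100_000, 10

# Unbiased estimates: true value (0) plus zero-mean noise.
estimates = rng.normal(0.0, 1.0, size=(n_trials, m_actions))
print(estimates.max(axis=1).mean())        # ~1.54, although max_a E[Q] = 0

# Double-estimator idea (van Hasselt, 2010): select the argmax with one set
# of estimates, evaluate it with an independent set -> the bias vanishes.
estimates2 = rng.normal(0.0, 1.0, size=(n_trials, m_actions))
a_star = estimates.argmax(axis=1)
print(estimates2[np.arange(n_trials), a_star].mean())  # ~0.0
```

This decoupling of action selection from action evaluation is exactly what Double DQN applies to the two sets of network parameters.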
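
With the target network in hand, the DQN target above and the Double DQN target can be written side by side. A sketch reusing the `QNetwork` class from the earlier snippet; `r`, `s_next`, and `done` are stand-ins for a sampled replay batch, and the terminal-state masking is standard practice rather than part of the paper's equation:

```python
# Sketch of the DQN target vs. the Double DQN target, reusing QNetwork from
# the snippet above; r, s_next, done are stand-ins for a replay-buffer batch.
import copy
import torch

gamma = 0.99
q_net = QNetwork(n_state=4, m_actions=2)      # online network, theta
target_net = copy.deepcopy(q_net)             # target network, theta^-

def sync_target():
    # Every tau steps, copy the online parameters: theta^-_t = theta_t.
    target_net.load_state_dict(q_net.state_dict())

@torch.no_grad()
def dqn_target(r, s_next, done):
    # Y^DQN = r + gamma * max_a Q(s', a; theta^-): the target network both
    # selects and evaluates the action, so the maximization bias remains.
    # (1 - done) masks terminal transitions (standard practice).
    return r + gamma * (1.0 - done) * target_net(s_next).max(dim=1).values

@torch.no_grad()
def double_dqn_target(r, s_next, done):
    # Double DQN: the online network (theta) selects the action, the target
    # network (theta^-) evaluates it, decoupling selection from evaluation.
    a_star = q_net(s_next).argmax(dim=1, keepdim=True)
    q_eval = target_net(s_next).gather(1, a_star).squeeze(1)
    return r + gamma * (1.0 - done) * q_eval
```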
{"metaMigratedAt":"2023-06-14T22:59:12.403Z","metaMigratedFrom":"YAML","title":"Deep Reinforcement Learning with Double Q-learning","breaks":true,"contributors":"[{\"id\":\"fc12e243-a9bb-45bc-8657-e6beffed25c8\",\"add\":3309,\"del\":456}]"}
Expand menu