# Deep Reinforcement Learning with Double Q-learning
## Links
- [https://arxiv.org/pdf/1509.06461.pdf](https://arxiv.org/pdf/1509.06461.pdf)
- [https://yq.aliyun.com/articles/311010](https://yq.aliyun.com/articles/311010)
- [https://www.cnblogs.com/wangxiaocvpr/p/5620365.html](https://www.cnblogs.com/wangxiaocvpr/p/5620365.html)
- [https://www.jianshu.com/p/c87c539e26bd](https://www.jianshu.com/p/c87c539e26bd)
- [https://zhuanlan.zhihu.com/p/25239682](https://zhuanlan.zhihu.com/p/25239682)
- [https://junmo1215.github.io/paper/2017/12/08/Note-Deep-Reinforcement-Learning-with-Double-Q-learning.html](https://junmo1215.github.io/paper/2017/12/08/Note-Deep-Reinforcement-Learning-with-Double-Q-learning.html)
- [https://blog.csdn.net/taoyafan/article/details/90951058](https://blog.csdn.net/taoyafan/article/details/90951058)
## Abstract
- The popular Q-learning algorithm is known to ==overestimate== action values under certain conditions.
It was not previously known whether, in practice, such overestimations are common, whether they harm performance, and whether they can generally be prevented.
- Q-learning is one of the most popular reinforcement learning algorithms, but it is known to sometimes learn unrealistically high action values because it includes a maximization step over estimated action values, which tends to prefer overestimated to underestimated values (a numerical sketch of this bias follows this list).
- To test whether overestimations occur in practice and at scale, we investigate the performance of the recent DQN algorithm (Mnih et al., 2015).
DQN combines Q-learning with a flexible deep neural network and was tested on a varied and large set of deterministic Atari 2600 games, reaching human-level performance on many games.
Perhaps surprisingly, we show that even in this comparatively favorable setting DQN sometimes substantially overestimates the values of the actions.
- We show that the idea behind the Double Q-learning algorithm (van Hasselt, 2010), which was first proposed in a tabular setting, can be generalized to work with arbitrary function approximation, including deep neural networks. We use this to construct a new algorithm we call Double DQN.
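
A rough numerical illustration of the overestimation bias (my own, not from the paper): when all true action values are equal, the max over noisy estimates is biased upward, since $\mathbb{E}[\max_a \hat{Q}(s,a)] \ge \max_a \mathbb{E}[\hat{Q}(s,a)]$. A minimal NumPy sketch, with the number of actions and the noise scale chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions = 10     # arbitrary number of actions
noise_std = 1.0    # estimation error on each action value
n_trials = 100_000

# True action values are all zero, so the true max is 0.
# Each estimate is the true value plus zero-mean Gaussian noise.
estimates = rng.normal(0.0, noise_std, size=(n_trials, n_actions))

# Q-learning's max over estimated values is biased upward.
print("mean of max_a Q_hat:", estimates.max(axis=1).mean())  # ~1.54, not 0
```

The bias grows with the number of actions and with the noise in the estimates, which matches the paper's point that the max step prefers overestimated values.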
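
To make the Double DQN idea concrete, here is a minimal NumPy sketch (my own, not the paper's code) contrasting the two targets: DQN both selects and evaluates the next action with the target network $\theta^-$, while Double DQN selects with the online network $\theta$ and evaluates with $\theta^-$. The function names, `gamma = 0.99`, and the example values are assumptions for illustration:

```python
import numpy as np

def dqn_target(r, q_target_next, gamma=0.99, done=False):
    """DQN: select and evaluate the next action with the target network."""
    return r if done else r + gamma * np.max(q_target_next)

def double_dqn_target(r, q_online_next, q_target_next, gamma=0.99, done=False):
    """Double DQN: select the argmax action with the online network,
    but evaluate it with the target network, decoupling the two roles."""
    if done:
        return r
    a_star = np.argmax(q_online_next)         # selection: online net
    return r + gamma * q_target_next[a_star]  # evaluation: target net

# Made-up next-state action values for 3 actions.
q_online_next = np.array([1.0, 2.5, 2.0])
q_target_next = np.array([1.2, 1.8, 2.2])
print(dqn_target(0.5, q_target_next))                        # 0.5 + 0.99 * 2.2
print(double_dqn_target(0.5, q_online_next, q_target_next))  # 0.5 + 0.99 * 1.8
```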
## Background
- To solve sequential decision problems, we learn estimates for the optimal value of each action, defined as the expected sum of discounted future rewards when taking that action and following the optimal policy thereafter: $Q_*(s, a) = \max_\pi Q_\pi(s, a)$, where $Q_\pi(s, a) \equiv \mathbb{E}\left[R_1 + \gamma R_2 + \dots \mid S_0 = s, A_0 = a, \pi\right]$ and $\gamma \in [0, 1]$ discounts later rewards.
- Q-learning updates a parameterized value function $Q(s, a; \theta_t)$ toward a target $Y_t^Q$: $\theta_{t+1} = \theta_t + \alpha \left(Y_t^Q - Q(S_t, A_t; \theta_t)\right) \nabla_{\theta_t} Q(S_t, A_t; \theta_t)$, where $\alpha$ is a scalar step size.
- The standard Q-learning target is $Y_t^Q \equiv R_{t+1} + \gamma \max_a Q(S_{t+1}, a; \theta_t)$; the max over the same values that are being learned is the source of the overestimation discussed above (a tabular sketch of the update follows this list).
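
A minimal tabular sketch of this update (my own code): in the tabular case the gradient term selects the single entry $Q[s, a]$, so the update reduces to moving $Q[s, a]$ toward the target. The state/action counts, step size, and example transition are arbitrary:

```python
import numpy as np

n_states, n_actions = 5, 3
alpha, gamma = 0.1, 0.99             # step size and discount (arbitrary)
Q = np.zeros((n_states, n_actions))  # tabular Q(s, a)

def q_learning_update(s, a, r, s_next, done):
    """One Q-learning step: move Q[s, a] toward the target
    Y = r + gamma * max_a' Q[s_next, a'] (just r at terminal states)."""
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

# Example transition: from state 0, action 1 yields reward 1.0, next state 2.
q_learning_update(s=0, a=1, r=1.0, s_next=2, done=False)
print(Q[0, 1])  # 0.1
```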
## Deep Q Networks
- A deep Q network (DQN) is a multi-layered neural network that, for a given state $s$, outputs a vector of action values $Q(s, \cdot\,; \theta)$, where $\theta$ are the parameters of the network. For an $n$-dimensional state space and an action space containing $m$ actions, the neural network is a function from $\mathbb{R}^n$ to $\mathbb{R}^m$.
- The target network, with parameters $\theta^-$, is the same as the online network except that its parameters are copied every $\tau$ steps from the online network, so that then $\theta^-_t = \theta_t$, and kept fixed on all other steps. The target used by DQN is then
  $$Y_t^{\mathrm{DQN}} \equiv R_{t+1} + \gamma \max_a Q(S_{t+1}, a; \theta^-_t).$$
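
A minimal PyTorch sketch of the online/target pair described above (my own code, not the paper's): the layer sizes and `tau = 1000` are placeholder choices; what follows the text is the structure, i.e. a network mapping $\mathbb{R}^n$ to $\mathbb{R}^m$, a periodic copy of $\theta$ into $\theta^-$, and the max over target-network values in the target:

```python
import copy
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps an n-dimensional state to m action values: R^n -> R^m."""
    def __init__(self, n_state, m_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_state, hidden), nn.ReLU(),
            nn.Linear(hidden, m_actions),
        )

    def forward(self, s):
        return self.net(s)  # Q(s, .; theta), one value per action

online = QNetwork(n_state=4, m_actions=2)  # online parameters theta
target = copy.deepcopy(online)             # target parameters theta^-
tau = 1000                                 # copy period (placeholder value)

def maybe_sync(step):
    """Every tau steps, copy theta into theta^-; otherwise keep it fixed."""
    if step % tau == 0:
        target.load_state_dict(online.state_dict())

def dqn_target(r, s_next, done, gamma=0.99):
    """Y_t^DQN = r + gamma * max_a Q(s_next, a; theta^-)."""
    with torch.no_grad():
        return r if done else r + gamma * target(s_next).max().item()
```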
{"metaMigratedAt":"2023-06-14T22:59:12.403Z","metaMigratedFrom":"YAML","title":"Deep Reinforcement Learning with Double Q-learning","breaks":true,"contributors":"[{\"id\":\"fc12e243-a9bb-45bc-8657-e6beffed25c8\",\"add\":3309,\"del\":456}]"}