Video Game New Paper Survey
---

## Value-based DRL
- [ ] Original DQN (Playing Atari with Deep Reinforcement Learning)
- [ ] Nature DQN (Human-level Control through Deep Reinforcement Learning)
- [ ] Gorila DQN (Massively Parallel Methods for Deep Reinforcement Learning)
- [ ] DRQN (Deep Recurrent Q-Learning for Partially Observable MDPs)
- [ ] DDQN (Deep Reinforcement Learning with Double Q-learning)
- [ ] Dueling Network (Dueling Network Architectures for Deep Reinforcement Learning)
- [ ] PER (Prioritized Experience Replay)
- [ ] Bootstrapped DQN (Deep Exploration via Bootstrapped DQN)
- [ ] C51 (A Distributional Perspective on Reinforcement Learning)
- [ ] Rainbow (Rainbow: Combining Improvements in Deep Reinforcement Learning)
- [ ] QR-DQN (Distributional Reinforcement Learning with Quantile Regression)
- [ ] IQN (Implicit Quantile Networks for Distributional Reinforcement Learning)
- [ ] Ape-X (Distributed Prioritized Experience Replay)
- [ ] R2D2 (Recurrent Experience Replay in Distributed Reinforcement Learning)

## Policy/Actor-Critic-based DRL
- [ ] DDPG (Continuous Control with Deep Reinforcement Learning)
- [ ] A3C (Asynchronous Methods for Deep Reinforcement Learning)
- [ ] GA3C (Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU)
- [ ] TRPO (Trust Region Policy Optimization)
- [ ] PPO (Proximal Policy Optimization Algorithms)
- [ ] ACER (Sample Efficient Actor-Critic with Experience Replay)
- [ ] IMPALA (IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures)
- [ ] TD3 (Addressing Function Approximation Error in Actor-Critic Methods)
- [ ] D4PG (Distributed Distributional Deterministic Policy Gradients)

## Exploration
- [ ] CTS (Unifying Count-Based Exploration and Intrinsic Motivation)
- [ ] Curiosity (Curiosity-Driven Exploration by Self-Supervised Prediction)
- [ ] RND (Exploration by Random Network Distillation)
- [ ] Go-Explore (Montezuma's Revenge Solved by Go-Explore)
- [ ] Noisy Network (Noisy Networks for Exploration)
- [ ] DQN/C51-IDS (Information-Directed Exploration for Deep Reinforcement Learning)
- [ ] Episodic Curiosity (Episodic Curiosity through Reachability)
- [ ] Soft Q-Learning (Reinforcement Learning with Deep Energy-Based Policies)
- [ ] SAC (Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor)
- [ ] UNREAL (Reinforcement Learning with Unsupervised Auxiliary Tasks)
- [ ] NGU (Never Give Up: Learning Directed Exploration Strategies)
- [ ] Agent57 (Agent57: Outperforming the Atari Human Benchmark)

## Model-based RL
- [ ] Prioritized Sweeping (Planning by Prioritized Sweeping with Small Backups)
- [ ] MuZero (Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model)

## Demonstration
- [ ] Single Demo (Learning Montezuma's Revenge from a Single Demonstration)
- [ ] GAIL (Generative Adversarial Imitation Learning)
- [ ] DQfD (Deep Q-learning from Demonstrations)
- [ ] Ape-X DQfD (Observe and Look Further: Achieving Consistent Performance on Atari)

## Fundamental Theory
- [ ] Invariant Reward (Policy Invariance under Reward Transformations: Theory and Application to Reward Shaping)
- [ ] Negative Q (A Study of Q-learning Considering Negative Rewards)
- [ ] GAE (High-Dimensional Continuous Control Using Generalized Advantage Estimation)

## Applications
- [ ] OpenAI Five (Dota 2 with Large Scale Deep Reinforcement Learning)
- [ ] FTW FPS (Human-level Performance in 3D Multiplayer Games with Population-based Reinforcement Learning)
- [ ] AlphaStar (Grandmaster Level in StarCraft II Using Multi-agent Reinforcement Learning)
- [ ] Hide-and-seek (Emergent Tool Use from Multi-Agent Autocurricula)

## Explainable
- [ ] Visualizing and Understanding Atari Agents
- [ ] Grad-CAM (Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization)