# Seminar Report

Project & Seminar Reports:

* Hand in at the end of the semester (only then are you officially finished with the course).
* Length: approx. 5 to 8 pages (excluding appendix and references) for the seminar report and usually >10 pages for the project reports (but keep in mind: less is more, quality over quantity).
* There is no official template; as a rule of thumb, anything that follows the style and structure of a paper is perfectly fine (e.g. abstract, introduction, related work, …, appendix, references, citations, etc.).
* Here too, include the course you are registered in, your name and student ID, your supervisor's name(s), and the title of your report on the first page.

## Notes

- Bias in Natural Actor-Critic Algorithms: policy gradient methods can reach global optima, like Sarsa$(\lambda)$[^biasActorCritic].
- The reason to use a replay buffer: when samples are generated by exploring an environment sequentially, the assumption that the samples are independently and identically distributed no longer holds, so minibatches are instead drawn uniformly from a buffer of stored transitions (see the sketch in the appendix below).

### TO BE SORTED

- [Connected Papers graph for "Continuous control with deep reinforcement learning"](https://www.connectedpapers.com/main/024006d4c2a89f7acacc6e4438d156525b60a98f/Continuous-control-with-deep-reinforcement-learning/graph)
- [Vanilla Policy Gradient](https://spinningup.openai.com/en/latest/algorithms/vpg.html) (good to copy from)
- https://openai.com/blog/openai-baselines-ppo/
- https://paperswithcode.com/methods/category/policy-gradient-methods
- https://github.com/supercurious/supercurious.github.io/blob/a64407d984da5c0100630a2ff8e0d92012827e5c/_posts/2018-12-20-deep-rl-continuous-ctrl.markdown

## TODO References

1. - [X] [Lillicrap et al., 2015, (**DDPG**) Continuous control with deep reinforcement learning](https://arxiv.org/abs/1509.02971)
2. - [ ] [Silver et al., 2014, (**DPG**) Deterministic Policy Gradient Algorithms](https://deepmind.com/research/publications/deterministic-policy-gradient-algorithms)
3. - [ ] [Mnih et al., 2013, (**DQN**) Playing Atari with Deep Reinforcement Learning](https://arxiv.org/abs/1312.5602)
4. - [ ] [Fujimoto et al., 2018, (**TD3**) Addressing Function Approximation Error in Actor-Critic Methods](https://arxiv.org/abs/1802.09477)
5. - [ ] [Haarnoja et al., 2018, (**SAC**) Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor](https://arxiv.org/abs/1801.01290)
6. - [ ] [van Hasselt et al., 2018, (instability of target networks) Deep Reinforcement Learning and the Deadly Triad](https://arxiv.org/pdf/1812.02648.pdf)
7. - [ ] [Schulman et al., 2015, (**TRPO**) Trust Region Policy Optimization](https://arxiv.org/pdf/1502.05477.pdf)
8. - [ ] [Mnih et al., 2016, Asynchronous Methods for Deep Reinforcement Learning](https://arxiv.org/pdf/1602.01783.pdf)
9. - [ ] [Gu et al., 2016, Continuous Deep Q-Learning with Model-based Acceleration](https://arxiv.org/pdf/1603.00748.pdf)
10. - [ ] [Schulman et al., 2017, Proximal Policy Optimization Algorithms](https://arxiv.org/pdf/1707.06347.pdf)
11. - [ ] [Thomas, 2014, Bias in Natural Actor-Critic Algorithms](http://proceedings.mlr.press/v32/thomas14.pdf)

## Citations

[^biasActorCritic]: [Thomas, 2014, Bias in Natural Actor-Critic Algorithms](http://proceedings.mlr.press/v32/thomas14.pdf)
[^ddpg]: [Lillicrap et al., 2015, (DDPG) Continuous control with deep reinforcement learning](https://arxiv.org/abs/1509.02971)
[^td3]: [Twin Delayed DDPG (TD3)](https://spinningup.openai.com/en/latest/algorithms/td3.html)
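
## Appendix: Replay Buffer Sketch

To make the replay-buffer note above concrete, here is a minimal, illustrative sketch of a uniform-sampling buffer in Python. The names and defaults (`ReplayBuffer`, `capacity`, `batch_size`) are my own choices for illustration, not an implementation taken from the DDPG or DQN papers.

```python
import random


class ReplayBuffer:
    """Fixed-size ring buffer of transitions.

    Sampling minibatches uniformly from stored transitions breaks the
    temporal correlation of sequentially collected samples (the motivation
    noted above).
    """

    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self.storage = []   # list of (state, action, reward, next_state, done)
        self.position = 0   # index of the next slot to overwrite once full

    def add(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.position] = transition  # overwrite the oldest entry
        self.position = (self.position + 1) % self.capacity

    def sample(self, batch_size=64):
        batch = random.sample(self.storage, batch_size)
        # Regroup into parallel tuples: states, actions, rewards, next_states, dones
        return tuple(zip(*batch))

    def __len__(self):
        return len(self.storage)
```

A typical training loop would call `add` after every environment step and, once the buffer holds more than `batch_size` transitions, call `sample` to obtain a decorrelated minibatch for the critic/actor update.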
{"metaMigratedAt":"2023-06-15T16:37:15.038Z","metaMigratedFrom":"YAML","title":"Seminar Report -","breaks":false,"contributors":"[{\"id\":\"f76b4bc5-114c-445f-a24d-bc3cfc8414e9\",\"add\":16703,\"del\":13337}]"}
    187 views