# Seminar Report -
Project & Seminar Reports:
* Hand in at the end of the semester (only then are you officially finished with the course)
* Length: approx. 5 to 8 pages (excluding appendix and references) for the seminar report and usually >10 pages for project reports (but keep in mind: less is more, quality over quantity)
* There is no official template, but as a rule of thumb, anything that follows the style and structure of a paper is perfectly fine (e.g. abstract, introduction, related work, …, appendix, references, citations, etc.)
* Also include the course you are registered in, your name + student ID, your supervisor's name(s), and the title of your report on the first page
## Notes
- Bias in Natural Actor-Critic Algorithms: Policy gradient methods can reach global optima like Sarsa$(\lambda)$[^biasActorCritic] (see the policy-gradient formulas after this list).
- The reason to use a replay buffer: when samples are generated by exploring an environment sequentially, the assumption that the samples are independently and identically distributed (i.i.d.) no longer holds (a minimal replay-buffer sketch follows this list).
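For the first note, these are the standard (vanilla and natural) policy-gradient expressions, included only for context; they are textbook definitions, not results specific to Thomas, 2014:
$$
\nabla_\theta J(\theta) = \mathbb{E}_{s,a \sim \pi_\theta}\big[\nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s,a)\big],
\qquad
\tilde{\nabla}_\theta J(\theta) = F(\theta)^{-1} \nabla_\theta J(\theta),
$$
$$
F(\theta) = \mathbb{E}_{s,a \sim \pi_\theta}\big[\nabla_\theta \log \pi_\theta(a \mid s)\, \nabla_\theta \log \pi_\theta(a \mid s)^{\top}\big].
$$
Here $F(\theta)$ is the Fisher information matrix of the policy; natural actor-critic methods follow an estimate of $\tilde{\nabla}_\theta J(\theta)$ instead of the vanilla gradient.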
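For the second note, a minimal replay-buffer sketch in Python (illustrative only; the class and parameter names are my own assumptions, not taken from any of the papers below). Storing transitions and sampling them uniformly at random breaks the temporal correlation of sequentially collected experience, so minibatches are closer to i.i.d.:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size FIFO store of transitions with uniform random sampling (sketch)."""

    def __init__(self, capacity=100_000):
        # deque drops the oldest transition automatically once capacity is reached
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # uniformly random indices -> the minibatch is decorrelated in time
        idx = random.sample(range(len(self.buffer)), batch_size)
        states, actions, rewards, next_states, dones = zip(*(self.buffer[i] for i in idx))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

Typical usage: call `buffer.add(s, a, r, s2, done)` after every environment step, then draw minibatches with `buffer.sample(64)` once the buffer holds enough transitions.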
### TO BE SORTED
[Continuous papers connections](https://www.connectedpapers.com/main/024006d4c2a89f7acacc6e4438d156525b60a98f/Continuous-control-with-deep-reinforcement-learning/graph)
[Vanilla Policy Gradient](https://spinningup.openai.com/en/latest/algorithms/vpg.html)
Good reference to borrow from: https://openai.com/blog/openai-baselines-ppo/
https://paperswithcode.com/methods/category/policy-gradient-methods
https://github.com/supercurious/supercurious.github.io/blob/a64407d984da5c0100630a2ff8e0d92012827e5c/_posts/2018-12-20-deep-rl-continuous-ctrl.markdown
## TODO References
1. - [X] [Lillicrap et al., 2015, (**DDPG**) Continuous control with deep reinforcement learning](https://arxiv.org/abs/1509.02971)
2. - [ ] [Silver et al., 2014, (**DPG**) Deterministic Policy Gradient Algorithms](https://deepmind.com/research/publications/deterministic-policy-gradient-algorithms)
3. - [ ] [Mnih et al., 2013, (**DQN**) Playing Atari with Deep Reinforcement Learning](https://arxiv.org/abs/1312.5602)
4. - [ ] [Fujimoto et al., 2018, (**TD3**) Addressing Function Approximation Error in Actor-Critic Methods](https://arxiv.org/abs/1802.09477)
5. - [ ] [Haarnoja et al., 2018, (**SAC**) Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor](https://arxiv.org/abs/1801.01290)
6. - [ ] [van Hasselt et al., 2018, (instability of target networks) Deep Reinforcement Learning and the Deadly Triad](https://arxiv.org/pdf/1812.02648.pdf)
7. - [ ] [Schulman et al., 2015, (**TRPO**) Trust Region Policy Optimization](https://arxiv.org/pdf/1502.05477.pdf)
8. - [ ] [Mnih et al., 2016, (**A3C**) Asynchronous Methods for Deep Reinforcement Learning](https://arxiv.org/pdf/1602.01783.pdf)
9. - [ ] [Gu et al., 2016, (**NAF**) Continuous Deep Q-Learning with Model-based Acceleration](https://arxiv.org/pdf/1603.00748.pdf)
10. - [ ] [Schulman et al., 2017, (**PPO**) Proximal Policy Optimization Algorithms](https://arxiv.org/pdf/1707.06347.pdf)
11. - [ ] [Thomas, 2014, Bias in Natural Actor-Critic Algorithms](http://proceedings.mlr.press/v32/thomas14.pdf)
## Citations
[^biasActorCritic]: [Thomas, 2014, Bias in Natural Actor-Critic Algorithms](http://proceedings.mlr.press/v32/thomas14.pdf)
[^ddpg]: [Lillicrap et al., 2015, (DDPG) Continuous control with deep reinforcement learning](https://arxiv.org/abs/1509.02971)
[^td3]: [Twin Delayed DDPG (TD3)](https://spinningup.openai.com/en/latest/algorithms/td3.html)
{"metaMigratedAt":"2023-06-15T16:37:15.038Z","metaMigratedFrom":"YAML","title":"Seminar Report -","breaks":false,"contributors":"[{\"id\":\"f76b4bc5-114c-445f-a24d-bc3cfc8414e9\",\"add\":16703,\"del\":13337}]"}