# Reinforcement Learning
###### tags: `強化學習`
- [Policy Gradient](/8SuIRATlSlGjBOdaQ01hlQ)
- [Proximal Policy Optimization (PPO)](/f8-xjHBPSUS28GwsbmRO_g)
- [Q-learning (Basic Idea)](/bSzqZQ5KRvSmvQ8bQh2gUQ)
- [Q-learning (Advanced Tips)](/uKLA4H5JSteu45gRihq1HQ)
- [Q-learning (Continuous Action)](/J6YPq_wzQ0a4kUQY1kRZVw)
- [Actor-Critic](/9JTyIkUCS4OYqgWIBSjVsQ)
- [Sparse Reward](/HZa1TOJsTZueozjKjemLng)
- [Imitation Learning](/vE4cMXitT8yfRhjB3fdVyw)

Resources
https://bigdatafinance.tw/index.php/tech/data-processing/528-2018-02-27-04-31-42
https://github.com/tinyzqh/awesome-reinforcement-learning
https://github.com/NeuronDance/DeepRL
https://github.com/dennybritz/reinforcement-learning
https://datawhalechina.github.io/easy-rl/
https://www.inside.com.tw/article/12526-deep-reinforcementlearing-not-work