# Reinforcement Learning

## DQN

- Discrete actions: a DNN approximates the Q-table.
- Why can't DQN work on continuous control? (A minimal sketch appears near the end of these notes.)

![](https://i.imgur.com/g4mqzjS.png)

Reference: [DQN从入门到放弃7 连续控制DQN算法-NAF](https://zhuanlan.zhihu.com/p/21609472)

## Actor-Critic

- Actor: Policy (π) Gradient

  Actor loss: ![Actor loss](https://i.imgur.com/Z9kiifz.png)

- Critic: Q-Learning (Q^π(s,a))

  Critic loss: ![Critic loss](https://i.imgur.com/5lAtYPW.png)

![](https://i.imgur.com/a6YQc19.png)

The Critic network computes the TD error, and the TD error is then used to update the Actor network. Objective: minimize the TD error. (A sketch of this update loop appears at the end of these notes.)

## TD (Temporal Difference)

Why the "−Q(s,a)" term? The TD error is δ = r + γQ(s',a') − Q(s,a): the bootstrapped one-step target minus the current estimate, so it measures how far the current value estimate is from the target. The worked example below makes this concrete.

Reference: [時序差分學習](https://ithelp.ithome.com.tw/articles/10234455)
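A worked numeric example of the TD update above, with made-up numbers, showing where the "−Q(s,a)" term comes from:

```python
# Tabular TD-style update with made-up numbers, to show why the TD error
# subtracts the current estimate Q(s, a) from the bootstrapped target.
gamma, alpha = 0.9, 0.1

Q = {("s0", "a0"): 2.0, ("s1", "a1"): 5.0}   # hypothetical current estimates

r = 1.0                                       # reward observed for (s0, a0) -> s1, then a1
td_target = r + gamma * Q[("s1", "a1")]       # 1.0 + 0.9 * 5.0 = 5.5
td_error = td_target - Q[("s0", "a0")]        # 5.5 - 2.0 = 3.5   <- the "- Q(s, a)" term

# Without "- Q(s, a)" we would overwrite the estimate with the noisy one-step target;
# with it, we only move the current estimate a small step toward that target:
Q[("s0", "a0")] += alpha * td_error           # 2.0 + 0.1 * 3.5 = 2.35
print(Q[("s0", "a0")])                        # 2.35
```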
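Coming back to the DQN section: a minimal sketch of how a DNN stands in for the Q-table, and where the discrete-action assumption enters. PyTorch is assumed here, and the network size and names (`QNetwork`, `q_net`) are illustrative, not from the original notes.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """DNN replacing the Q-table: input a state, output one Q-value per discrete action."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),        # one output head per discrete action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)               # shape: (batch, n_actions)

# Greedy action selection only works because the action set is finite:
# we enumerate all Q(s, a) and take the argmax.
q_net = QNetwork(state_dim=4, n_actions=2)
state = torch.randn(1, 4)
action = q_net(state).argmax(dim=1)

# With a continuous action (e.g. a torque in [-1, 1]) there is no finite list of
# outputs to enumerate; max_a Q(s, a) becomes an inner optimization problem at every
# step, which is why plain DQN does not extend to continuous control (NAF and DDPG
# are designed around this limitation).
```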
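For the Actor-Critic section: a hedged sketch of one update step, in which the critic's TD error is both the critic's own training signal (squared) and the weight on the actor's policy-gradient term. PyTorch is assumed; the layer sizes, learning rates, and the name `actor_critic_update` are illustrative, not from the original notes.

```python
import torch
import torch.nn as nn

state_dim, n_actions, gamma = 4, 2, 0.99

# Actor: stochastic policy π(a|s) over the discrete actions.
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, n_actions), nn.Softmax(dim=-1))
# Critic: Q^π(s, ·), one estimated value per action (Q-learning style, as in the notes).
critic = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                       nn.Linear(64, n_actions))

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def actor_critic_update(s, a, r, s_next, done):
    """One update for a single (unbatched) transition (s, a, r, s_next)."""
    s = torch.as_tensor(s, dtype=torch.float32)
    s_next = torch.as_tensor(s_next, dtype=torch.float32)

    # Critic: TD error δ = r + γ·max_a' Q(s', a') − Q(s, a); the critic minimizes δ².
    q_sa = critic(s)[a]
    with torch.no_grad():
        td_target = r + gamma * critic(s_next).max() * (1.0 - float(done))
    td_error = td_target - q_sa
    critic_loss = td_error.pow(2)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: policy-gradient loss −log π(a|s)·δ, with δ detached so only the actor's
    # parameters move here; actions with positive TD error become more probable.
    log_prob = torch.log(actor(s)[a])
    actor_loss = -log_prob * td_error.detach()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

# Example call with made-up transition data:
actor_critic_update(s=[0.1, -0.2, 0.0, 0.3], a=1, r=1.0,
                    s_next=[0.0, 0.1, 0.2, -0.1], done=False)
```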