# Reinforcement Learning

## DQN

- Discrete actions
- Uses a DNN to approximate the Q-table

Why can't DQN work on continuous control?

reference: [DQN从入门到放弃7 连续控制DQN算法-NAF](https://zhuanlan.zhihu.com/p/21609472)

## Actor-Critic

- Actor: Policy (π) Gradient, Actor_Loss
- Critic: Q-Learning (Q^π(s,a)), Critic_Loss

The Critic network computes the TD-error, and the TD-error is used to update the Actor network.

Goal: minimize the TD-error

## TD (Temporal Difference)

Why -Q(s,a)?

reference: [時序差分學習](https://ithelp.ithome.com.tw/articles/10234455)
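
As a rough illustration of the Actor-Critic and TD notes above, here is a minimal tabular sketch of one update step. It assumes the common one-step form of the TD-error, `delta = r + gamma * Q(s', a') - Q(s, a)`, in which `-Q(s, a)` subtracts the Critic's current estimate from the bootstrapped target. The softmax policy over action preferences, the learning rates, `GAMMA`, and all function names are assumptions made for illustration. The notes themselves describe networks, not tables.

```python
import math

GAMMA = 0.9  # discount factor (assumed value, not from the notes)


def td_error(reward, q_sa, q_next, gamma=GAMMA, done=False):
    """One-step TD error: delta = r + gamma * Q(s', a') - Q(s, a)."""
    target = reward if done else reward + gamma * q_next
    return target - q_sa


def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(prefs, q_row, action, reward, q_next,
                      actor_lr=0.1, critic_lr=0.5, gamma=GAMMA, done=False):
    """Update one state's preferences (Actor) and Q-values (Critic) in place."""
    delta = td_error(reward, q_row[action], q_next, gamma, done)

    # Critic: move Q(s, a) toward the TD target, which shrinks the TD error.
    q_row[action] += critic_lr * delta

    # Actor: policy-gradient step scaled by the TD error.
    # For a softmax policy, grad of log pi(a|s) w.r.t. preferences is
    # (one_hot(a) - probs).
    probs = softmax(prefs)
    for a in range(len(prefs)):
        grad_log_pi = (1.0 if a == action else 0.0) - probs[a]
        prefs[a] += actor_lr * delta * grad_log_pi

    return delta


# Hypothetical single state with two actions; a terminal step with reward 1.
prefs = [0.0, 0.0]
q_row = [0.0, 0.0]
delta = actor_critic_step(prefs, q_row, action=1, reward=1.0,
                          q_next=0.0, done=True)
```

A positive TD error means the action turned out better than the Critic expected, so the Actor raises that action's preference. A negative one lowers it.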