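###### tags: `Deep learning`

# Reinforcement Learning

## Model-free RL (the environment is not modeled)
* The agent has no model of the environment. It learns step by step: it takes an action, receives feedback from the environment, and uses that feedback to decide the next action.

## Model-based RL (the environment is modeled)
* The agent uses past experience to build a model that simulates the feedback of the real world.
* It uses this model to "imagine" (predict) what will happen next, then picks the best option among the predicted outcomes.

## Policy-based RL
* Chooses an action based on the current observation; every action is assigned a probability.
* model: Policy gradients

## Value-based RL
* Chooses an action based on the current observation; every action is assigned a value, and the action with the highest value is chosen.
* model: Q-learning, Sarsa

## Policy-based RL + Value-based RL
Takes an action based on the policy, then assigns a value to that action.

## Update mechanism

### Monte-Carlo update
Updates only after the game (episode) has ended.

### Temporal-Difference update
Can update after every single step.

### On-policy
* Must learn from its own experience.
* model: Sarsa

### Off-policy
* Can learn from its own experience or from the experience of others.
* model: Q-learning, Deep Q Network

## Reference
1. https://www.youtube.com/watch?v=AINrxEjflaM&list=PLXO45tsB95cK7G-raBeTVjAoZHtJpiKh3&index=2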
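
## Example: Sarsa vs. Q-learning update

The difference between the on-policy and off-policy models listed in these notes can be seen in their one-step Temporal-Difference updates. Below is a minimal sketch, assuming a tabular setting in which `Q` is a NumPy array indexed by `[state, action]`. The function names, the learning rate `alpha`, and the discount factor `gamma` are illustrative choices and do not come from the notes.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy TD update (hypothetical helper).

    The target uses the highest-valued next action, whether or not
    the agent will actually take it.
    """
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])  # updates Q in place
    return Q

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD update (hypothetical helper).

    The target uses a_next, the action the agent really takes next.
    """
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])  # updates Q in place
    return Q

# Toy table: 2 states x 2 actions, with made-up values for state 1.
Q = np.zeros((2, 2))
Q[1] = [1.0, 2.0]

# Same transition (s=0, a=0, r=1.0, s_next=1), two different targets.
Q_off = q_learning_update(Q.copy(), 0, 0, 1.0, 1)
Q_on = sarsa_update(Q.copy(), 0, 0, 1.0, 1, a_next=0)
print(Q_off[0, 0], Q_on[0, 0])
```

Both functions update after a single step, so both are Temporal-Difference updates. The only difference is the bootstrap term: Q-learning takes the maximum over next actions, which is why it can learn from transitions generated by another policy, while Sarsa needs the next action that its own policy chose.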