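###### tags: `Deep learning`

# Reinforcement Learning

## Without a model of the environment (model-free RL)
* The agent does not understand how the environment works. It learns by taking actions one step at a time and using the feedback the environment returns after each step.

![](https://i.imgur.com/m23rl7n.png)

## With a model of the environment (model-based RL)
* The agent uses past experience to build a model that simulates the feedback of the real world.
* Using this model, it "imagines" what will happen next and picks the best action based on those predicted outcomes.

![](https://i.imgur.com/CERr7Bo.png)

## Policy-based RL
* Chooses actions based on what it perceives. Each action is assigned a probability, and the action taken is sampled from that distribution.

![](https://i.imgur.com/KlJlWZF.png)

* Model: Policy Gradients

## Value-based RL
* Chooses actions based on what it perceives. Each action is assigned a value, and the action with the highest value is chosen.

![](https://i.imgur.com/CWYfDTw.png)

* Models: Q-learning, Sarsa

## Policy-based + Value-based RL
The agent acts according to a policy, and each action it takes is then assigned a value that is used to judge it (e.g. Actor-Critic).

![](https://i.imgur.com/1cRaCYw.png)

## Update mechanisms
### Monte-Carlo update
Updates can be made only after the episode (the game) has ended.

![](https://i.imgur.com/Cr98a3B.png)

### Temporal-Difference update
An update can be made after every single step.

![](https://i.imgur.com/Vij0ccT.png)

### On-policy
* Must learn from its own experience: the policy being improved is the same one that chooses the actions.
* Model: Sarsa

![](https://i.imgur.com/dWbcFwk.png)

### Off-policy
* Can learn from the experience of others as well as from its own: the experience may come from a different policy than the one being learned.
* Models: Q-learning, Deep Q Network

![](https://i.imgur.com/Wuz8XV7.png)

## Reference
1. https://www.youtube.com/watch?v=AINrxEjflaM&list=PLXO45tsB95cK7G-raBeTVjAoZHtJpiKh3&index=2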
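## Sketch: a model-based update (Dyna-Q style)

The model-based idea of learning from "imagined" experience can be illustrated with Dyna-Q, a classic tabular model-based method. It is used here only as an example and is not necessarily the method the referenced lecture has in mind. The sketch assumes a deterministic environment, so the learned "model" is just a dict mapping `(state, action)` to the last observed `(reward, next_state)`. The function names `q_update` and `dyna_q_step` and the states `"s0"`, `"s1"` are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

QTable = Dict[Tuple[Hashable, int], float]
# Hypothetical learned model: (state, action) -> (reward, next_state), deterministic env assumed.
Model = Dict[Tuple[Hashable, int], Tuple[float, Hashable]]


def q_update(Q: QTable, s, a, r, s_next, actions: List[int],
             alpha: float, gamma: float) -> None:
    """Standard one-step Q-learning update."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def dyna_q_step(Q: QTable, model: Model, s, a, r, s_next, actions: List[int],
                rng: random.Random, planning_steps: int = 5,
                alpha: float = 0.1, gamma: float = 0.9) -> None:
    # 1) Learn directly from the real transition (the model-free part).
    q_update(Q, s, a, r, s_next, actions, alpha, gamma)
    # 2) Record the transition in the learned model of the environment.
    model[(s, a)] = (r, s_next)
    # 3) Planning: replay "imagined" transitions produced by the model.
    for _ in range(planning_steps):
        (ps, pa), (pr, pnext) = rng.choice(list(model.items()))
        q_update(Q, ps, pa, pr, pnext, actions, alpha, gamma)


actions = [0, 1]
Q_no_plan: QTable = defaultdict(float)
dyna_q_step(Q_no_plan, {}, "s0", 0, 1.0, "s1", actions, random.Random(0), planning_steps=0)

Q_plan: QTable = defaultdict(float)
dyna_q_step(Q_plan, {}, "s0", 0, 1.0, "s1", actions, random.Random(0), planning_steps=10)

print(Q_no_plan[("s0", 0)])  # 0.1 after a single real update
print(Q_plan[("s0", 0)])     # larger: the same experience was reused 10 more times
```

With planning, one real step is squeezed for much more learning, which is the practical appeal of having a model; the cost is that a wrong model produces wrong imagined experience.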
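## Sketch: policy-based vs value-based action selection

Policy-based and value-based methods differ in how an action is picked. The sketch below assumes a small discrete action space; the preference and value numbers are made up, and `reinforce_step` is a hypothetical helper showing one REINFORCE-style policy-gradient step for a softmax policy, which is one common form of the Policy Gradients method.

```python
import math
import random
from typing import List, Sequence


def softmax(prefs: Sequence[float]) -> List[float]:
    """Turn action preferences into a probability for every action."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def policy_based_action(prefs: Sequence[float], rng: random.Random) -> int:
    """Policy-based: every action has a probability; sample one from it."""
    probs = softmax(prefs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def value_based_action(q_values: Sequence[float]) -> int:
    """Value-based: every action has a value; take the highest (greedy)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def reinforce_step(prefs: List[float], action: int, G: float, lr: float) -> List[float]:
    """One policy-gradient (REINFORCE) step for a softmax policy.
    The gradient of log pi(action) w.r.t. pref_b is 1[b == action] - pi(b)."""
    probs = softmax(prefs)
    return [p + lr * G * ((1.0 if b == action else 0.0) - probs[b])
            for b, p in enumerate(prefs)]


rng = random.Random(0)
print(softmax([1.0, 2.0, 0.5]))                    # probabilities summing to 1
print(policy_based_action([1.0, 2.0, 0.5], rng))  # random, most often 1
print(value_based_action([1.0, 2.0, 0.5]))        # always 1
print(reinforce_step([0.0, 0.0], action=0, G=1.0, lr=1.0))  # [0.5, -0.5]
```

A positive return `G` raises the preference (and so the probability) of the action that was taken, which is how a policy-based agent learns without ever storing per-action values.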
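## Sketch: Monte-Carlo vs Temporal-Difference updates

To make the difference between the Monte-Carlo and Temporal-Difference update mechanisms concrete, here is a minimal Python sketch for estimating state values V(s). It assumes a tabular setting and a hand-written two-step episode; the function names `monte_carlo_update` and `td0_update` and the states `"A"`, `"B"` are hypothetical. The TD variant shown is the one-step TD(0) rule.

```python
from typing import Dict, List, Tuple

# A transition is (state, reward, next_state, done); an episode is a list of them.
Transition = Tuple[str, float, str, bool]


def monte_carlo_update(V: Dict[str, float], episode: List[Transition],
                       alpha: float = 0.1, gamma: float = 1.0) -> None:
    """Every-visit Monte-Carlo: runs only once the episode has ended and moves
    each visited state's value toward the return G actually observed from it."""
    G = 0.0
    # Walk the episode backwards so G accumulates the discounted future reward.
    for state, reward, _next_state, _done in reversed(episode):
        G = reward + gamma * G
        V[state] += alpha * (G - V[state])


def td0_update(V: Dict[str, float], transition: Transition,
               alpha: float = 0.1, gamma: float = 1.0) -> None:
    """TD(0): runs after every single step and bootstraps from the current
    estimate of the next state's value instead of waiting for the real return."""
    state, reward, next_state, done = transition
    target = reward if done else reward + gamma * V[next_state]
    V[state] += alpha * (target - V[state])


# Hypothetical episode: A -(reward 0)-> B -(reward 1)-> terminal.
episode = [("A", 0.0, "B", False), ("B", 1.0, "end", True)]

V_mc = {"A": 0.0, "B": 0.0}
monte_carlo_update(V_mc, episode, alpha=0.5)  # one update pass after the episode

V_td = {"A": 0.0, "B": 0.0}
for step in episode:                          # one update per step, during the episode
    td0_update(V_td, step, alpha=0.5)

print(V_mc)  # {'A': 0.5, 'B': 0.5}
print(V_td)  # {'A': 0.0, 'B': 0.5}
```

After one episode, Monte-Carlo has already propagated the final reward back to `A`, while TD(0) has only updated `B`; `A` picks up that reward in a later episode by bootstrapping from `V["B"]`.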
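## Sketch: Sarsa (on-policy) vs Q-learning (off-policy)

The on-policy / off-policy split shows up directly in the update target. Below is a minimal tabular sketch, assuming discrete actions numbered `0..n-1` and a Q-table stored as a `defaultdict` keyed by `(state, action)`; all function names and states are hypothetical. Sarsa's target uses the next action that the epsilon-greedy behaviour policy actually picked, while Q-learning's target uses the greedy (max) action no matter what the agent does next, which is why Q-learning can also learn from experience generated by another policy.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

QTable = Dict[Tuple[Hashable, int], float]


def epsilon_greedy(Q: QTable, state: Hashable, actions: List[int],
                   epsilon: float, rng: random.Random) -> int:
    """Behaviour policy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def sarsa_update(Q: QTable, s, a, r, s_next, a_next, done: bool,
                 alpha: float = 0.1, gamma: float = 0.9) -> None:
    """On-policy: the target uses a_next, the action the behaviour policy really took."""
    target = r if done else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def q_learning_update(Q: QTable, s, a, r, s_next, actions: List[int], done: bool,
                      alpha: float = 0.1, gamma: float = 0.9) -> None:
    """Off-policy: the target uses the greedy action in s_next, whatever is done next."""
    target = r if done else r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


# Hypothetical situation: in s1, action 1 looks much better than action 0,
# but exploration made the agent pick action 0 next.
actions = [0, 1]
base = defaultdict(float, {("s1", 0): 1.0, ("s1", 1): 5.0})

Q_sarsa = defaultdict(float, base)
sarsa_update(Q_sarsa, "s0", 0, 0.0, "s1", 0, False, alpha=0.5)

Q_qlearn = defaultdict(float, base)
q_learning_update(Q_qlearn, "s0", 0, 0.0, "s1", actions, False, alpha=0.5)

print(Q_sarsa[("s0", 0)])   # 0.45  (0.5 * 0.9 * 1.0)
print(Q_qlearn[("s0", 0)])  # 2.25  (0.5 * 0.9 * 5.0)
print(epsilon_greedy(base, "s1", actions, epsilon=0.0, rng=random.Random(0)))  # 1
```

Because Sarsa evaluates the exploratory action it will really take, it tends to learn more cautious behaviour; Q-learning always evaluates the greedy action, so it can reuse stored or third-party experience, which Deep Q Network builds on.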