# Practical_RL - Lecture 1 : Introduction to Reinforcement Learning

[toc]

## Reinforcement Learning

**Branches of Machine Learning :**

![](https://i.imgur.com/aPpDOZR.png)

**The basic idea of Reinforcement Learning :**

The common approach:
- Get data by trial and error and error and error and error
- Learn (situation) → (optimal action)
- Repeat

![](https://i.imgur.com/nQdqn61.png)

**Decision process : Agent and Environment**

![](https://i.imgur.com/3gcH4Mq.png)

![](https://i.imgur.com/r1bykFA.png)

What makes Reinforcement Learning different from other machine learning paradigms:
- There is no supervisor, only a reward signal
- Feedback is delayed, not instantaneous
- The agent's actions affect the subsequent data it receives

**Examples of Reinforcement Learning :**

1. Reality check : dynamic systems

   ![](https://i.imgur.com/o11qFG7.jpg)

   ![](https://i.imgur.com/fThlSbt.png)

2. ~~Reality~~ check : videogames

   ![](https://i.imgur.com/fdPRQkw.png)

**Reward :**
- The reward $R_t$ is a scalar feedback signal
- It indicates how well the agent is doing at step $t$
- The agent's job is to maximise cumulative reward

![](https://i.imgur.com/OEs8IVD.png)

**Markov Decision Process (MDP) :**

An MDP is a mathematical model of an agent acting in an environment: it describes the agent's policy and the rewards it receives, under the assumption that the environment's states have the Markov property.

Markov property: a concept from probability theory. A stochastic process has the Markov property if the conditional probability distribution of its future states, given the present state and all past states, depends only on the present state. In other words, given the present state, the future is conditionally independent of the past states (the history of the process).

![](https://i.imgur.com/yuePmxF.png)

$P(s_{t+1} \mid s_t, a_t)$ is the probability that taking action $a_t$ in state $s_t$ leads the environment to state $s_{t+1}$.

MDP diagram :

![](https://i.imgur.com/pPzlKuC.png)

The state has the Markov property :

![](https://i.imgur.com/GC1kyqD.png)

Markov assumption :

![](https://i.imgur.com/Zzb2ysg.png)

**Exploration and Exploitation : choosing actions**
- Reinforcement learning is like trial-and-error learning
- The agent should discover a good policy from its experiences of the environment
- Exploration finds more information about the environment
- Exploitation exploits known information to maximise reward
- It is usually important to explore as well as exploit
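The exploration/exploitation trade-off in the list above is commonly illustrated with an ε-greedy rule on a multi-armed bandit. This is a minimal sketch, not the lecture's own method: the function name `epsilon_greedy_bandit`, the arm success probabilities, and the value of `epsilon` are all hypothetical. With probability ε the agent explores by pulling a random arm; otherwise it exploits by pulling the arm with the highest estimated value.

```python
import random


def epsilon_greedy_bandit(true_probs, steps=1000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Bernoulli multi-armed bandit (hypothetical setup).

    true_probs: success probability of each arm, unknown to the agent.
    Returns (value estimates, pull counts, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    values = [0.0] * n_arms  # running mean reward observed for each arm
    counts = [0] * n_arms    # number of times each arm was pulled
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: pull a random arm to gather more information
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: pull the arm with the best current estimate
            # (max returns the first arm among ties)
            arm = max(range(n_arms), key=lambda a: values[a])
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental mean update: Q <- Q + (r - Q) / n
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return values, counts, total_reward


values, counts, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print("estimates:", [round(v, 2) for v in values])
print("pulls:", counts, "total reward:", total)
```

With `epsilon=0.0` and arms `[0.0, 1.0]`, this agent always pulls arm 0 (the tie at the start is broken toward the first arm) and never discovers that arm 1 always pays out, which is why some exploration is needed.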
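Examples :
- Restaurant selection
  - Exploitation : go to your favourite restaurant
  - Exploration : try a new restaurant
- Game playing
  - Exploitation : play the move you believe is best
  - Exploration : play an experimental, random move

![](https://i.imgur.com/ilYoMC2.png)

![](https://i.imgur.com/uaBQXsD.png)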
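Going back to the agent-environment loop, reward, and MDP definitions earlier in this lecture, the pieces can be tied together in a short simulation. This is a sketch under stated assumptions: the states (`cool`, `warm`, `overheated`), actions (`slow`, `fast`), transition table `P`, reward table `R`, and the policies are all hypothetical. The point it shows is that the environment samples $s_{t+1}$ from $P(s_{t+1} \mid s_t, a_t)$ using only the current state and action (the Markov property), and that the quantity the agent cares about is the cumulative reward.

```python
import random

# Hypothetical toy MDP: P[state][action] -> list of (next_state, probability)
P = {
    "cool": {"slow": [("cool", 1.0)],
             "fast": [("cool", 0.5), ("warm", 0.5)]},
    "warm": {"slow": [("cool", 0.5), ("warm", 0.5)],
             "fast": [("overheated", 1.0)]},
    "overheated": {},  # terminal state: no actions available
}

# Hypothetical reward R(s, a) for taking action a in state s
R = {("cool", "slow"): 1.0, ("cool", "fast"): 2.0,
     ("warm", "slow"): 1.0, ("warm", "fast"): -10.0}


def step(state, action, rng):
    """Environment: sample s_{t+1} ~ P(. | s_t, a_t) and return (s_{t+1}, r_t)."""
    next_states, probs = zip(*P[state][action])
    next_state = rng.choices(next_states, weights=probs, k=1)[0]
    return next_state, R[(state, action)]


def rollout(policy, start="cool", max_steps=20, seed=0):
    """Agent-environment loop; returns the visited states and the cumulative reward."""
    rng = random.Random(seed)
    state, total, trajectory = start, 0.0, [start]
    for _ in range(max_steps):
        if not P[state]:  # terminal state reached, episode ends
            break
        action = policy(state, rng)               # agent chooses a_t given s_t
        state, reward = step(state, action, rng)  # environment returns s_{t+1}, r_t
        total += reward
        trajectory.append(state)
    return trajectory, total


def random_policy(state, rng):
    """Hypothetical policy: pick any available action uniformly at random."""
    return rng.choice(list(P[state]))


def always_slow(state, rng):
    """Hypothetical policy: always drive slowly."""
    return "slow"


traj, G = rollout(always_slow)
print(traj[:5], G)  # -> ['cool', 'cool', 'cool', 'cool', 'cool'] 20.0
```

In this toy setup, choosing `fast` in `cool` earns more immediate reward (2.0 vs 1.0) but can lead to `warm`, where another `fast` ends the episode with -10.0. It is a small illustration of delayed feedback: an action's full consequences only show up in later rewards.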