# Practical_RL - Lecture 1 : Introduction to Reinforcement Learning

[toc]

## Reinforcement Learning

**Branches of Machine Learning :**

**The basic idea of Reinforcement Learning :**

The common idea :
- Get data by trial and error and error and error and error
- Learn (situation) → (optimal action)
- Repeat

**Decision process : Agent and Environment**

How Reinforcement Learning differs from other machine learning methods :
- There is no supervisor, only a reward signal
- Feedback is delayed, not instantaneous
- Agent’s actions affect the subsequent data it receives

**Examples of Reinforcement Learning :**

1. Reality check : dynamic systems
2. ~~Reality~~ check : videogames

**Reward :**
- Reward $R_t$ is a scalar feedback signal
- The reward indicates how well the agent is doing at step $t$
- The agent's job is to maximise the cumulative reward

**Markov Decision Process (MDP) :**

An MDP is a mathematical model of an agent's policy and its rewards within an environment, where the environment's states have the Markov property.

Markov property : a concept from probability theory. A stochastic process has the Markov property if, given the present state and all past states, the conditional probability distribution of its future state depends only on the present state. In other words, given the present state, the future state is conditionally independent of the past states (the history of the process).

$P(s_{t+1} \mid s_t, a_t)$ is the probability of reaching state $s_{t+1}$ when the agent takes action $a_t$ in state $s_t$ and interacts with the environment.

MDP diagram :

The state has the Markov property :

Markov assumption :

**Exploration and Exploitation : action selection**
- Reinforcement learning is like trial-and-error learning
- The agent should discover a good policy from its experiences of the environment
- Exploration finds more information about the environment
- Exploitation exploits known information to maximise reward
- It is usually important to explore as well as exploit

Examples :
- Choosing a restaurant
    - Exploitation : go to your favourite restaurant
    - Exploration : try a new restaurant
- Playing a game
    - Exploitation : make the move you believe is best
    - Exploration : make an experimental, random move
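
The exploration/exploitation trade-off can be sketched with a small multi-armed bandit. The notes do not name a specific action-selection rule; the example below uses ε-greedy, one common choice, purely as an illustration. The function names `epsilon_greedy` and `run_bandit`, the Gaussian reward model, and all parameter values are assumptions made for this sketch, not part of the lecture.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action index: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Exploration: try a uniformly random action.
        return rng.randrange(len(q_values))
    # Exploitation: take the first action with the highest estimated value.
    return q_values.index(max(q_values))


def run_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Run epsilon-greedy on a bandit whose arms pay Gaussian rewards (assumed model)."""
    rng = random.Random(seed)
    q_values = [0.0] * len(true_means)  # running estimate of each arm's mean reward
    counts = [0] * len(true_means)      # number of times each arm was pulled
    total_reward = 0.0
    for _ in range(steps):
        action = epsilon_greedy(q_values, epsilon, rng)
        # Scalar reward signal R_t: noisy sample around the arm's true mean.
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Incremental average: move the estimate toward the observed reward.
        q_values[action] += (reward - q_values[action]) / counts[action]
        total_reward += reward
    return q_values, counts, total_reward


if __name__ == "__main__":
    q, n, total = run_bandit([0.2, 0.5, 0.8])
    print("estimated values:", [round(v, 2) for v in q])
    print("pull counts:", n)
    print("cumulative reward:", round(total, 1))
```

With `epsilon=0` the agent only exploits and can stay stuck on whichever arm first looks good; with `epsilon=1` it only explores and never uses what it has learned. Small values in between let it keep gathering information while mostly choosing the arm it currently believes is best.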