---
title: Reinforcement Learning
tags: ml
---
### Reinforcement Learning
- Develop an agent that interacts with an environment and takes actions

- Explanation of diagram
- Overarching idea: the agent interacts with an environment over a series of time steps
- At each time step, the agent receives some observation from the environment and must choose an action that is subsequently transmitted back to the environment via some mechanism (sometimes called an actuator)
- The agent then receives a reward from the environment
- The agent then receives the next observation, and the cycle continues
- Behavior of a reinforcement learning agent is governed by a policy
- A policy is just a function that maps from observations of the environment to actions
- Goal of reinforcement learning is to produce a good policy
- Problems reinforcement learners might have to deal with
  - Credit assignment
    - Determining which actions to credit or blame for an outcome
  - Partial observability
    - The current observation might not tell you everything about your current state
- Special cases of the reinforcement learning problem
  - Markov decision process
    - When the environment is fully observed
  - Contextual bandit problem
    - When the state does not depend on the previous actions
  - Multi-armed bandit problem
    - When there is no state, just a set of available actions with initially unknown rewards
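
The interaction loop described in these notes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Corridor` environment, the `always_right` policy, and the `run_episode` helper are hypothetical names made up for this example, not part of any library. The environment is fully observed (the observation is the state), so it is a small instance of the Markov decision process case.

```python
class Corridor:
    """Hypothetical toy environment: a 1-D corridor with the goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Start every episode at the left end and return the first observation.
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (move left) or +1 (move right); stay inside the corridor.
        self.position = max(0, min(self.length - 1, self.position + action))
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0  # reward only when the goal is reached
        return self.position, reward, done


def always_right(observation):
    """A policy is a function from observations to actions; this one ignores its input."""
    return +1


def run_episode(env, policy, max_steps=20):
    """Run the observe -> act -> reward cycle and return the total reward collected."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)                   # agent chooses an action
        observation, reward, done = env.step(action)   # environment responds
        total_reward += reward
        if done:
            break
    return total_reward


if __name__ == "__main__":
    print(run_episode(Corridor(), always_right))
```

Because the reward arrives only on the final step, this toy setup also hints at the credit assignment problem: every earlier move to the right contributed to the outcome, but none of them was rewarded directly.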