# [Chapter 2 solutions](https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf)
Author: [Raj Ghugare](https://github.com/Raj19022000)
###### tags: `Sutton and Barto` `Solutions`
### Chapter 2 - Multi-armed Bandits:
* ### Exercise 2.1:
The probability of the greedy action being selected in this case is $0.75$: the greedy action is taken either when we exploit (probability $1-\epsilon$) or when the exploratory draw happens to land on it (see the short check below).
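A quick worked check, assuming the exercise's setting of two actions and $\epsilon = 0.5$:

$$
P(\text{greedy}) = (1-\epsilon) + \frac{\epsilon}{2} = 0.5 + 0.25 = 0.75
$$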
* ### Exercise 2.2:
Up to and including the third time step, the selected actions were consistent with greedy selection. After the third action was taken, the sample-average $Q$ estimates of the arms were:
$Q(a_{1}) = -1$
$Q(a_{2}) = -0.5$
$Q(a_{3}) = 0$
$Q(a_{4}) = 0$
At this stage the greedy actions are $a_{3}$ and $a_{4}$, but action $2$ was selected, so this step was definitely an $\epsilon$ case.
After the fourth action was taken, the $Q$ estimates of the arms were:
$Q(a_{1}) = -1$
$Q(a_{2}) = \frac{1}{3}$
$Q(a_{3}) = 0$
$Q(a_{4}) = 0$
But at this stage action $3$ was taken even though action $2$ was the unique greedy action, so this step was again definitely an $\epsilon$ case.
Apart from these two steps, the $\epsilon$ case could possibly have occurred on every other time step as well, because exploration selects uniformly among all $k$ arms (each with probability $\epsilon/k$), so it can also happen to pick the greedy arm. The sketch below reproduces this bookkeeping.
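A minimal Python sketch, assuming the action/reward sequence given in the exercise ($A_1=1, R_1=-1$; $A_2=2, R_2=1$; $A_3=2, R_3=-2$; $A_4=2, R_4=2$; $A_5=3, R_5=0$); it recomputes the sample-average estimates above and flags the steps where the chosen arm was outside the greedy set:

```python
# Sketch only: recompute sample-average Q estimates for the exercise's sequence
# (actions/rewards as given in the exercise; arms are 1-indexed).
actions = [1, 2, 2, 2, 3]
rewards = [-1, 1, -2, 2, 0]

q = [0.0] * 4   # Q estimates, initialised to 0
n = [0] * 4     # visit counts per arm

for t, (a, r) in enumerate(zip(actions, rewards), start=1):
    greedy = [i + 1 for i, v in enumerate(q) if v == max(q)]
    verdict = "definitely" if a not in greedy else "possibly"
    print(f"step {t}: Q={q}, greedy={greedy}, chose {a} -> {verdict} an epsilon step")
    n[a - 1] += 1
    q[a - 1] += (r - q[a - 1]) / n[a - 1]   # incremental sample-average update
```

Running this flags steps $4$ and $5$ as the only ones where the chosen arm was not greedy, matching the argument above.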
* ### Exercise 2.3:
In the long run (say, millions of steps), $\epsilon = 0.01$ performs best: once the action-value estimates have converged, it selects the optimal action with probability about $1 - \epsilon + \epsilon/10 \approx 0.99$ on the 10-armed testbed, versus roughly $0.91$ for $\epsilon = 0.1$, so its average reward per step is higher. A further improvement would be to decay $\epsilon$ over time, exploring heavily at first and hardly at all later; see the simulation sketch below.
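A minimal simulation sketch (not the book's code) of a single run on the 10-armed Gaussian testbed from the chapter, comparing the long-run fraction of optimal-action selections for the two $\epsilon$ values; the specific parameters (100 000 steps, unit-variance rewards, fixed seed) are just illustrative assumptions:

```python
import numpy as np

def optimal_action_rate(eps, steps=100_000, k=10, seed=0):
    """One epsilon-greedy run with sample-average estimates on a k-armed Gaussian testbed."""
    rng = np.random.default_rng(seed)
    q_true = rng.normal(0.0, 1.0, k)      # true action values q*(a)
    best = int(np.argmax(q_true))
    q_est = np.zeros(k)
    counts = np.zeros(k)
    hits = 0
    for _ in range(steps):
        if rng.random() < eps:
            a = int(rng.integers(k))      # explore: pick an arm uniformly at random
        else:
            a = int(np.argmax(q_est))     # exploit: pick the arm with the best estimate
        r = rng.normal(q_true[a], 1.0)    # reward ~ N(q*(a), 1)
        counts[a] += 1
        q_est[a] += (r - q_est[a]) / counts[a]   # incremental sample-average update
        hits += (a == best)
    return hits / steps

for eps in (0.1, 0.01):
    print(eps, optimal_action_rate(eps))  # roughly 0.91 vs 0.99 over a long horizon
```

With $\epsilon = 0.01$ a single run may take longer to identify the best arm, but once it does it keeps selecting it about $99\%$ of the time, which is why it wins over a long enough horizon.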