# Smooth Exploration for Robotic Reinforcement Learning
###### [original paper](https://proceedings.mlr.press/v164/raffin22a/raffin22a.pdf)

## Situation
#### Reinforcement learning (RL) enables robots to learn skills from interactions with the real world. In practice, however, the unstructured step-based exploration used in Deep RL – often very successful in simulation – leads to jerky motion patterns on real robots. The resulting shaky behavior causes poor exploration and can even damage the robot.

## State-Dependent Exploration
#### State-Dependent Exploration (SDE) is an intermediate solution that adds noise, as a function of the state $s_t$, to the deterministic action $\mu(s_t)$. At the beginning of an episode, the parameters $\theta$ of that exploration function are drawn from a Gaussian distribution. The resulting action $a_t$ is as follows (a minimal code sketch is given at the end of these notes):

![](https://hackmd.io/_uploads/Bkg0aV1d2.png)

## Generalized State-Dependent Exploration
#### In the paper, generalized SDE (gSDE) extends SDE in two ways: the exploration parameters can be resampled every $n$ steps instead of only at the start of an episode, and the noise is a function of policy features (e.g. the last latent layer) rather than the raw state.

![](https://hackmd.io/_uploads/Skf11S1d3.png)
![](https://hackmd.io/_uploads/SyhgyHJ_3.png)
![](https://hackmd.io/_uploads/rk2eSSJ_h.png)

## Thoughts
#### [cited paper](https://arxiv.org/pdf/1903.11524.pdf)
1. #### I don't know whether this claim in the paper is valid: <br>The samples from Gaussian exploration policies (e.g. SAC, PPO) are temporally coherent only through the distribution mean. In most environments this coherence is not sufficient to provide consistent and effective exploration.
2. #### Are there other domains that suffer from jerky motion patterns and could be addressed with similar approaches?
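
## Code Sketch
#### To make the SDE/gSDE mechanism above concrete, here is a minimal sketch in PyTorch. This is my own illustration, not the authors' implementation: the class name `StateDependentNoise`, the `sample_freq` argument, and the shapes are assumptions. The exploration matrix $\theta_\epsilon$ is drawn from a Gaussian, held fixed for a whole episode (SDE) or refreshed every `sample_freq` steps (gSDE), and the noise added to $\mu(s_t)$ is a linear function of the state (SDE) or of the policy's latent features (gSDE).

```python
import torch


class StateDependentNoise:
    """Sketch of (g)SDE-style exploration noise (illustrative, not the paper's code).

    The exploration parameters theta_eps are sampled from a Gaussian and kept
    fixed for a whole episode (SDE) or for `sample_freq` steps (gSDE), so the
    noise varies smoothly with the state instead of being redrawn
    independently at every step.
    """

    def __init__(self, feature_dim, action_dim, log_std=-1.0, sample_freq=None):
        # Per-element standard deviation of the exploration parameters.
        self.std = torch.ones(feature_dim, action_dim) * torch.exp(torch.tensor(log_std))
        # None -> resample only at the start of an episode (plain SDE).
        self.sample_freq = sample_freq
        self.step_count = 0
        self.resample()

    def resample(self):
        # theta_eps ~ N(0, std^2), one weight per (feature, action) pair.
        self.theta_eps = torch.randn_like(self.std) * self.std

    def __call__(self, features, deterministic_action):
        # gSDE: refresh the exploration parameters every `sample_freq` steps.
        if self.sample_freq is not None and self.step_count > 0 and self.step_count % self.sample_freq == 0:
            self.resample()
        self.step_count += 1
        # The noise is a deterministic function of the current features:
        # eps(s_t) = features @ theta_eps, so nearby states receive similar noise.
        return deterministic_action + features @ self.theta_eps


# Toy usage: `features` would be the state (SDE) or the policy's latent features (gSDE).
noise = StateDependentNoise(feature_dim=8, action_dim=2, sample_freq=16)
features = torch.randn(8)   # z(s_t)
mu = torch.zeros(2)         # deterministic action mu(s_t)
action = noise(features, mu)
```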
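
#### Since the first author also maintains Stable-Baselines3, gSDE is exposed there via the `use_sde` and `sde_sample_freq` arguments of SAC/PPO. A quick usage example (the environment and hyperparameters here are just placeholders):

```python
from stable_baselines3 import SAC

# use_sde=True replaces the unstructured Gaussian noise with gSDE;
# sde_sample_freq controls how often the exploration matrix is resampled
# (-1 = only at the start of each rollout).
model = SAC("MlpPolicy", "Pendulum-v1", use_sde=True, sde_sample_freq=4, verbose=1)
model.learn(total_timesteps=10_000)
```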