# Notes on [Learning Agile and Dynamic Motor Skills for Legged Robots](https://arxiv.org/abs/1901.08652)

Author: [Sharath Chandra](https://sharathraparthy.github.io/)

###### tags: sim2real-transfer, legged-robots

---

## Outline

This paper addresses the simulation-to-real transfer problem in robotics and proposes a technique for it in the context of legged locomotion. The problem is divided into four parts. First, we identify the physical parameters and estimate their uncertainties. Second, we learn a mapping from high-level actions to motor torques. Third, we train a policy in a simulator. Finally, we deploy the policy on an actual quadruped robot.

The learned policy is evaluated on three tasks: **Command-conditioned locomotion**, which evaluates the generality and robustness of the learned controller; **High-speed locomotion**, which evaluates the speed improvement; and **Recovery from a fall**, which checks how fast the robot can recover from a fall.

## Method

### Modeling the rigid body dynamics

Effective sim2real transfer needs an accurate simulation platform. To reduce modelling errors in the simulator, this method uses the rigid body contact solver presented in [J. Hwangbo et al.]() to model the rigid body dynamics. The inertial properties of the links are hard to measure and are estimated to carry about 20% error. To account for this, the policy network is trained on a wide range of inertial properties (similar to what is done in domain randomization, DR). The center of mass positions, the masses of the links, and the joint positions are randomized in the same way.

### Modeling the actuation

Most legged robots are driven by mechanical, electrical, or hydraulic actuators, which are difficult to model analytically. To overcome this, the authors use supervised learning to establish the relation between actions and torques. They train an **actuator network** that takes a history of joint position errors and joint velocities and outputs the estimated joint torque.
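
To make the actuator network idea concrete, here is a minimal NumPy sketch of a small MLP that maps a short history of position errors and velocities to one torque estimate. The class name `ActuatorNet`, the layer sizes, the softsign activation, and the history length of three samples are illustrative assumptions rather than details from these notes, and the weights are random instead of being fit to logged robot data.

```python
import numpy as np


def softsign(x):
    # Smooth, bounded activation: x / (1 + |x|).
    return x / (1.0 + np.abs(x))


class ActuatorNet:
    """Hypothetical MLP: (position error, velocity) history -> joint torque."""

    def __init__(self, history_len=3, hidden=32, n_hidden_layers=3, seed=0):
        rng = np.random.default_rng(seed)
        # Input holds history_len position errors plus history_len velocities.
        sizes = [2 * history_len] + [hidden] * n_hidden_layers + [1]
        # Random initial weights (assumption: a real model would be trained).
        self.weights = [
            rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
            for m, n in zip(sizes[:-1], sizes[1:])
        ]
        self.biases = [np.zeros(n) for n in sizes[1:]]

    def predict(self, pos_err_history, vel_history):
        # Concatenate both histories into one feature vector per sample.
        x = np.concatenate([pos_err_history, vel_history], axis=-1)
        for w, b in zip(self.weights[:-1], self.biases[:-1]):
            x = softsign(x @ w + b)
        # Linear output layer; drop the trailing singleton dimension.
        return (x @ self.weights[-1] + self.biases[-1])[..., 0]


def mse_loss(net, pos_err_history, vel_history, measured_torque):
    # Supervised objective: mean squared error against measured torques.
    prediction = net.predict(pos_err_history, vel_history)
    return float(np.mean((prediction - measured_torque) ** 2))


net = ActuatorNet()
pos_err = np.array([0.02, 0.015, 0.01])  # rad, newest sample first (assumed)
vel = np.array([0.5, 0.4, 0.3])          # rad/s, same ordering (assumed)
torque_estimate = net.predict(pos_err, vel)
```

In a supervised setup, `mse_loss` would be minimized over pairs of logged histories and measured torques. The resulting network then stands in for the actuator model inside the simulator.
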
### Training in the Simulator

After the above two steps, the policy is trained with the TRPO algorithm, which requires almost no parameter tuning.

### Deployment in real-world

The policies trained in the simulator are deployed on the **ANYmal robot**. See the paper for the exact implementation details.

![](https://i.imgur.com/Wm6fwwx.png)

To sum up, first we identify the physical parameters and model the rigid body dynamics. Second, we use supervised learning to find the relation between the joint position errors, the joint velocities, and the torque. Third, using these two models, we train a policy in the simulator. Lastly, we deploy that policy in the real world.

## Pros and Cons

### Pros

1. The proposed method outperformed the best existing model-based controller running on the same robot.
2. The trained policies are robust while using very little compute and energy.

### Cons

1. The method requires human expertise for tuning and for designing the initial state distribution for each new task.
2. The **actuator net** may overfit during training.
3. "*Another limitation of our approach was observed over the course of this study. A single neural network trained in one session manifests single-faceted behaviors that do not generalize across multiple tasks*"