# Notes on "Combining Physical Simulators and Object-Based Networks for Control"
#### Author: [Sharath Chandra](https://sharathraparthy.github.io/)
## [Paper Link](https://arxiv.org/pdf/1904.06580.pdf)
###### tags: simulation, interaction-networks, robotics
---
In this paper the authors propose a hybrid dynamics model, the Simulation-Augmented Interaction Network (SAIN), which incorporates an interaction network into a physics engine to solve complex real-world robotic control tasks.
## Brief Outline:
Most physics-based simulators serve as good platforms for robot planning and control tasks, but no simulator is perfect: each has its own modelling errors. As a result, most physics engines (MuJoCo, Bullet, Gazebo, etc.) exhibit discrepancies between their predictions and real-world behaviour. Many methods have been proposed to reduce these errors, including randomizing the simulation environment, famously known as domain randomization. In this paper, model errors are instead tackled by learning a residual model between the real world and the simulator. In other words, rather than adding perturbations to the environment parameters, some real-world data is used to correct the simulator. Even though this method uses real-world data, it is shown to be sample efficient and to generalize better.
### Interaction Networks
An interaction network, proposed by [Peter W. Battaglia et al.](https://arxiv.org/abs/1612.00222), is a model that reasons about how objects interact in complex systems. In this paper the authors incorporate interaction networks into a physics-based simulator to learn the residual.
## Method:
### Formulation:
Let $S$ be the state space and $A$ the action space. A dynamics model is a function $f: S \times A \rightarrow S$ which predicts $s_{t+1}$ given $s_{t}$ and $a_{t}$. There are two types of dynamics models:
* Analytical models
* Data-driven models
The goal is to learn a hybrid model that combines both: the data-driven model learns the discrepancy between the analytical model's predictions and real-world data. Let $f_{r}$ be the hybrid model, $f_{p}$ the physics engine, and $f_{\theta}$ the residual component. Mathematically, $$f_{r}(s, a) = f_{\theta}(f_{p}(s,a), s, a)$$
Here the residual model $f_{\theta}(\cdot)$ refines the physics engine's predictions.
![](https://i.imgur.com/ZZMq8xX.png)
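The composition $f_{r}(s, a) = f_{\theta}(f_{p}(s,a), s, a)$ can be sketched in a few lines. This is an illustrative toy, not the authors' code: `physics_engine` and `residual_net` below are hypothetical numeric stand-ins for $f_p$ and the learned $f_{\theta}$.

```python
def physics_engine(s, a):
    # toy analytical model f_p: simple additive dynamics
    return s + a

def residual_net(s_sim, s, a):
    # toy stand-in for the learned residual f_theta:
    # corrects the simulator output by a fixed bias
    return s_sim + 0.1

def hybrid_step(s, a):
    # f_r(s, a) = f_theta(f_p(s, a), s, a)
    return residual_net(physics_engine(s, a), s, a)
```

The key design choice is that $f_{\theta}$ sees both the simulator's prediction and the raw $(s, a)$ pair, so it only has to learn the (hopefully small) correction rather than the full dynamics.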
### Interaction Networks:
An interaction network consists of two neural networks.
* $f_{rel}$ : Calculates the pairwise forces between the objects.
* $f_{dyn}$ : Predicts the next state of the object based on the objects it is interacting with and the nature of interactions.
Here the aim is to predict the state of object $i$ at time $t+1$, $\hat{o}_{t+1}^{i}$, given its current state $o_{t}^{i}$. If there are $n$ objects, then $s_{t} = [{o}_{t}^{1}, {o}_{t}^{2}, \ldots, {o}_{t}^{n}]$ represents the state of the $n$ objects at time $t$. Each ${o}_{t}^{i}$ is a vector of object properties such as position $(p)$, velocity $(v)$, mass $(m)$ and radius $(r)$, so the current state of object $i$ can be written as $o_{t}^{i} = [p_{t}^{i}, v_{t}^{i}, m^{i}, r^{i}]$. If $a_{t}^{i}$ is the action applied to object $i$ at time $t$, the interaction network equations are the following:
1. $e_{t}^{i} = \sum_{j \neq i} f_{rel}(v_{t}^{i}, p_{t}^{i} - p_{t}^{j}, v_{t}^{i} - v_{t}^{j}, m^{i}, m^{j}, r^{i}, r^{j})$
2. $\hat{v}_{t+1}^{i} = v_{t}^{i} + dt \cdot f_{dyn}(v_{t}^{i}, a_t^{i}, m^{i}, r^{i}, e_{t}^{i})$
3. $\hat{p}_{t+1}^{i} = p_{t}^{i} + dt \cdot \hat{v}_{t+1}^{i}$
4. $\hat{o}_{t+1}^{i} = [\hat{p}_{t+1}^{i}, \hat{v}_{t+1}^{i}, m^{i}, r^{i}]$
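The update steps above can be sketched numerically. This is a minimal 1-D toy, not the paper's implementation: `f_rel` and `f_dyn` below are hand-written stand-ins for the learned MLPs, and objects are tuples `(p, v, m, r)`.

```python
dt = 0.1  # integration time step

def f_rel(v_i, dp, dv, m_i, m_j, r_i, r_j):
    # toy pairwise effect: spring-like pull on the relative position
    return -0.5 * dp

def f_dyn(v_i, a_i, m_i, r_i, e_i):
    # toy dynamics: acceleration from action plus aggregated effects
    return (a_i + e_i) / m_i

def in_step(objs, actions):
    """One interaction-network step for all objects."""
    nxt = []
    for i, (p_i, v_i, m_i, r_i) in enumerate(objs):
        # 1. aggregate pairwise effects e_t^i over all other objects j
        e_i = sum(f_rel(v_i, p_i - p_j, v_i - v_j, m_i, m_j, r_i, r_j)
                  for j, (p_j, v_j, m_j, r_j) in enumerate(objs) if j != i)
        # 2. velocity update
        v_new = v_i + dt * f_dyn(v_i, actions[i], m_i, r_i, e_i)
        # 3. position update (semi-implicit Euler, uses the new velocity)
        p_new = p_i + dt * v_new
        # 4. next object state
        nxt.append((p_new, v_new, m_i, r_i))
    return nxt
```

Note that steps 2–3 hard-code Euler integration; only the effect and dynamics functions are learned, which builds a useful physical prior into the network.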
### Simulator-Augmented Interaction Networks (SAIN):
Here, $f_{rel}$ and $f_{dyn}$ now also take the prediction of the physics engine $f_p$ into account. Let $\bar{s}_t$ be the state and $\bar{o}_t^{i}$ the state of object $i$ predicted by the physics engine. The equations of **SAIN** are the following:
1. $\bar{s}_{t+1} = f_{p}(\bar{s}_t, a_{t}^{1}, a_{t}^{2}, ..., a_{t}^{n})$
2. $e_{t}^{i} = \sum_{j \neq i} f_{rel}(v_{t}^{i}, \bar{v}_{t+1}^{i} - \bar{v}_{t}^{i}, p_{t}^{i} - p_{t}^{j}, v_{t}^{i} - v_{t}^{j}, m^{i}, m^{j}, r^{i}, r^{j})$
3. $\hat{v}_{t+1}^{i} = v_{t}^{i} + dt \cdot f_{dyn}(v_{t}^{i}, \bar{p}_{t+1}^{i} - \bar{p}_{t}^{i}, a_t^{i}, m^{i}, r^{i}, e_{t}^{i})$
4. $\hat{p}_{t+1}^{i} = p_{t}^{i} + dt \cdot \hat{v}_{t+1}^{i}$
5. $\hat{o}_{t+1}^{i} = [\hat{p}_{t+1}^{i}, \hat{v}_{t+1}^{i}, m^{i}, r^{i}]$
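The difference from a plain interaction network can be sketched the same way: the effect and dynamics functions additionally receive the engine's predicted velocity and position changes. Again a 1-D toy with hand-written stand-ins for the learned networks, not the paper's code; `sim_objs`/`sim_next` play the role of $\bar{s}_t$ and $\bar{s}_{t+1}$.

```python
dt = 0.1  # integration time step

def sain_step(objs, sim_objs, sim_next, actions):
    """One SAIN step: real states o_t^i plus the physics engine's
    predicted states o-bar at times t and t+1."""
    nxt = []
    for i, (p_i, v_i, m_i, r_i) in enumerate(objs):
        dv_sim = sim_next[i][1] - sim_objs[i][1]  # engine's velocity change
        dp_sim = sim_next[i][0] - sim_objs[i][0]  # engine's position change
        # toy f_rel, now conditioned on the engine's velocity change
        e_i = sum(-0.5 * (p_i - p_j) + 0.1 * dv_sim
                  for j, (p_j, v_j, m_j, r_j) in enumerate(objs) if j != i)
        # toy f_dyn, now conditioned on the engine's position change
        v_new = v_i + dt * ((actions[i] + e_i + dp_sim) / m_i)
        p_new = p_i + dt * v_new
        nxt.append((p_new, v_new, m_i, r_i))
    return nxt
```

In the real model the learned networks decide how much to trust the engine's prediction, so they only need to capture the residual dynamics the simulator gets wrong.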
## Experiments and Results
The authors test this approach on a challenging planar manipulation task and evaluate its generalization by introducing objects of new shapes and materials. The task is to push a first disk so as to direct a second disk to a randomly generated goal position, so the state is a set of two objects, $s = \{o_1, o_2\}$. After every trajectory, environment dynamics such as the friction between the disks and the surface and the masses of the disks are randomized (sampled from uniform distributions with fixed ranges). For network architecture details, please refer to the paper. The loss function over a trajectory is
$$\frac{1}{T} \sum_{i=1}^{2}\sum_{t=0}^{T -1} \|p_{t}^{i} - \hat{p}_{t}^{i}\|_{2}^{2} + \|v_{t}^{i} - \hat{v}_{t}^{i}\|_{2}^{2} + \|\sin{r_{t}^{i}} - \sin{\hat{r}_{t}^{i}}\|_{2}^{2} + \|\cos{r_{t}^{i}} - \cos{\hat{r}_{t}^{i}}\|_{2}^{2}$$
where $r_{t}^{i}$ here denotes the object's rotation (not its radius).
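The trajectory loss can be sketched directly. A toy scalar version (1-D position and velocity for brevity, not the authors' code); the sin/cos terms on the rotation make the angular error wrap correctly at $\pm\pi$.

```python
import math

def traj_loss(true_traj, pred_traj):
    """Each trajectory is a list over time t of per-object tuples
    (position, velocity, rotation)."""
    T = len(true_traj)
    total = 0.0
    for t in range(T):
        for (p, v, r), (ph, vh, rh) in zip(true_traj[t], pred_traj[t]):
            total += (p - ph) ** 2 + (v - vh) ** 2
            # compare rotations via sin/cos so 0 and 2*pi count as equal
            total += (math.sin(r) - math.sin(rh)) ** 2
            total += (math.cos(r) - math.cos(rh)) ** 2
    return total / T
```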
For action selection, the paper relies on classical control algorithms; one could instead use state-of-the-art RL algorithms such as PPO, TRPO or DDPG.
### Results
The results show that SAIN achieves the best performance compared to plain interaction networks, both in simulation and in the real world. To test generalization, the authors conducted several tests on different surfaces using the same disks; the evaluation shows a 92% success rate with fine-tuned SAIN.
