# Notes on "Combining Physical Simulators and Object-Based Networks for Control"
#### Author: [Sharath Chandra](https://sharathraparthy.github.io/)
## [Paper Link](https://arxiv.org/pdf/1904.06580.pdf)
###### tags: simulation, interaction-networks, robotics
---
In this paper the authors propose a hybrid dynamics model, the Simulation-Augmented Interaction Network (SAIN), which incorporates an interaction network into a physics engine to solve complex real-world robotic control tasks.
## Brief Outline:
Most physics-based simulators serve as good platforms for robot planning and control tasks, but no simulator is perfect: each has its own modelling errors. As a result, most physics engines (MuJoCo, Bullet, Gazebo, etc.) exhibit discrepancies between their predictions and real-world behaviour. Many methods have been proposed to reduce these errors, including randomizing the simulation environment, famously known as domain randomization. In this paper, model errors are instead tackled by learning a residual model between the real world and the simulator. In other words, rather than adding perturbations to the environment parameters, some real-world data is used to correct the simulator. Even though this method uses real-world data, it is shown to be sample efficient and to generalize better.
### Interaction Networks
An interaction network, proposed by [Peter W. Battaglia et al.](https://arxiv.org/abs/1612.00222), is a model that reasons about how objects interact in complex systems. In this paper the authors incorporate interaction networks into a physics-based simulator to learn the residual.
## Method:
### Formulation:
Let $S$ be the state space and $A$ the action space. A dynamics model is a function $f: S \times A \rightarrow S$ which predicts $s_{t+1}$ given $s_{t}$ and $a_{t}$. There are two types of dynamics models:
* Analytical models
* Data-driven models
The goal is to learn a hybrid model that combines both: the data-driven model learns the discrepancy between the analytical model's predictions and real-world data. Let $f_{r}$ be the hybrid model, $f_{p}$ the physics engine, and $f_{\theta}$ the residual component. Mathematically, $$f_{r}(s, a) = f_{\theta}(f_{p}(s,a), s, a)$$
Here the residual model $f_{\theta}(\cdot)$ refines the physics engine's predictions.
![](https://i.imgur.com/ZZMq8xX.png)
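The composition $f_{r}(s, a) = f_{\theta}(f_{p}(s,a), s, a)$ can be sketched in a few lines. This is an illustrative toy, not the authors' code: `physics_engine` and `residual_net` below are hypothetical numeric stand-ins for $f_p$ and the learned $f_{\theta}$.

```python
def physics_engine(s, a):
    # toy analytical model f_p: simple additive dynamics
    return s + a

def residual_net(s_sim, s, a):
    # toy stand-in for the learned residual f_theta:
    # corrects the simulator output by a fixed bias
    return s_sim + 0.1

def hybrid_step(s, a):
    # f_r(s, a) = f_theta(f_p(s, a), s, a)
    return residual_net(physics_engine(s, a), s, a)
```

The key design choice is that $f_{\theta}$ sees both the simulator's prediction and the raw $(s, a)$ pair, so it only has to learn the (hopefully small) correction rather than the full dynamics.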
### Interaction Networks:
An interaction network consists of two neural networks.
* $f_{rel}$ : Calculates the pairwise forces between the objects.
* $f_{dyn}$ : Predicts the next state of the object based on the objects it is interacting with and the nature of interactions.
Here the aim is to predict the state of object $i$ at time $t+1$, $\hat{o}_{t+1}^{i}$, given its current state $o_{t}^{i}$. If there are $n$ objects, then $s_{t} = [{o}_{t}^{1}, {o}_{t}^{2}, \ldots, {o}_{t}^{n}]$ represents the state of the $n$ objects at time $t$. Each ${o}_{t}^{i}$ is a vector of object properties such as position $(p)$, velocity $(v)$, mass $(m)$ and radius $(r)$, so the current state of object $i$ can be written as $o_{t}^{i} = [p_{t}^{i}, v_{t}^{i}, m^{i}, r^{i}]$. If $a_{t}^{i}$ is the action applied to object $i$ at time $t$, the interaction network equations are the following:
1. $e_{t}^{i} = \sum_{j \neq i} f_{rel}(v_{t}^{i}, p_{t}^{i} - p_{t}^{j}, v_{t}^{i} - v_{t}^{j}, m^{i}, m^{j}, r^{i}, r^{j})$
2. $\hat{v}_{t+1}^{i} = v_{t}^{i} + dt \cdot f_{dyn}(v_{t}^{i}, a_t^{i}, m^{i}, r^{i}, e_{t}^{i})$
3. $\hat{p}_{t+1}^{i} = p_{t}^{i} + dt \cdot \hat{v}_{t+1}^{i}$
4. $\hat{o}_{t+1}^{i} = [\hat{p}_{t+1}^{i}, \hat{v}_{t+1}^{i}, m^{i}, r^{i}]$
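The update steps above can be sketched numerically. This is a minimal 1-D toy, not the paper's implementation: `f_rel` and `f_dyn` below are hand-written stand-ins for the learned MLPs, and objects are tuples `(p, v, m, r)`.

```python
dt = 0.1  # integration time step

def f_rel(v_i, dp, dv, m_i, m_j, r_i, r_j):
    # toy pairwise effect: spring-like pull on the relative position
    return -0.5 * dp

def f_dyn(v_i, a_i, m_i, r_i, e_i):
    # toy dynamics: acceleration from action plus aggregated effects
    return (a_i + e_i) / m_i

def in_step(objs, actions):
    """One interaction-network step for all objects."""
    nxt = []
    for i, (p_i, v_i, m_i, r_i) in enumerate(objs):
        # 1. aggregate pairwise effects e_t^i over all other objects j
        e_i = sum(f_rel(v_i, p_i - p_j, v_i - v_j, m_i, m_j, r_i, r_j)
                  for j, (p_j, v_j, m_j, r_j) in enumerate(objs) if j != i)
        # 2. velocity update
        v_new = v_i + dt * f_dyn(v_i, actions[i], m_i, r_i, e_i)
        # 3. position update (semi-implicit Euler, uses the new velocity)
        p_new = p_i + dt * v_new
        # 4. next object state
        nxt.append((p_new, v_new, m_i, r_i))
    return nxt
```

Note that steps 2–3 hard-code Euler integration; only the effect and dynamics functions are learned, which builds a useful physical prior into the network.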
### Simulator-Augmented Interaction Networks (SAIN):
Here, $f_{rel}$ and $f_{dyn}$ now also take the prediction of the physics engine $f_p$ into account. Let $\bar{s}_t$ be the state and $\bar{o}_t^{i}$ the state of object $i$ predicted by the physics engine. The equations of **SAIN** are the following:
1. $\bar{s}_{t+1} = f_{p}(\bar{s}_t, a_{t}^{1}, a_{t}^{2}, ..., a_{t}^{n})$
2. $e_{t}^{i} = \sum_{j \neq i} f_{rel}(v_{t}^{i}, \bar{v}_{t+1}^{i} - \bar{v}_{t}^{i}, p_{t}^{i} - p_{t}^{j}, v_{t}^{i} - v_{t}^{j}, m^{i}, m^{j}, r^{i}, r^{j})$
3. $\hat{v}_{t+1}^{i} = v_{t}^{i} + dt \cdot f_{dyn}(v_{t}^{i}, \bar{p}_{t+1}^{i} - \bar{p}_{t}^{i}, a_t^{i}, m^{i}, r^{i}, e_{t}^{i})$
4. $\hat{p}_{t+1}^{i} = p_{t}^{i} + dt \cdot \hat{v}_{t+1}^{i}$
5. $\hat{o}_{t+1}^{i} = [\hat{p}_{t+1}^{i}, \hat{v}_{t+1}^{i}, m^{i}, r^{i}]$
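The difference from a plain interaction network can be sketched the same way: the effect and dynamics functions additionally receive the engine's predicted velocity and position changes. Again a 1-D toy with hand-written stand-ins for the learned networks, not the paper's code; `sim_objs`/`sim_next` play the role of $\bar{s}_t$ and $\bar{s}_{t+1}$.

```python
dt = 0.1  # integration time step

def sain_step(objs, sim_objs, sim_next, actions):
    """One SAIN step: real states o_t^i plus the physics engine's
    predicted states o-bar at times t and t+1."""
    nxt = []
    for i, (p_i, v_i, m_i, r_i) in enumerate(objs):
        dv_sim = sim_next[i][1] - sim_objs[i][1]  # engine's velocity change
        dp_sim = sim_next[i][0] - sim_objs[i][0]  # engine's position change
        # toy f_rel, now conditioned on the engine's velocity change
        e_i = sum(-0.5 * (p_i - p_j) + 0.1 * dv_sim
                  for j, (p_j, v_j, m_j, r_j) in enumerate(objs) if j != i)
        # toy f_dyn, now conditioned on the engine's position change
        v_new = v_i + dt * ((actions[i] + e_i + dp_sim) / m_i)
        p_new = p_i + dt * v_new
        nxt.append((p_new, v_new, m_i, r_i))
    return nxt
```

In the real model the learned networks decide how much to trust the engine's prediction, so they only need to capture the residual dynamics the simulator gets wrong.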
## Experiments and Results
The authors test this approach on a challenging planar manipulation task and evaluate its generalization by introducing objects of new shapes and materials. The task is to push a first disk so as to direct a second disk to a randomly generated goal position, so the state is a set of two objects, $s = \{o_1, o_2\}$. After every trajectory, environment dynamics such as the friction between the disks and the surface and the masses of the disks are randomized (sampled from uniform distributions with fixed ranges). For network architecture details, please refer to the paper. The loss function over a trajectory is
$$\frac{1}{T} \sum_{i=1}^{2}\sum_{t=0}^{T -1} \|p_{t}^{i} - \hat{p}_{t}^{i}\|_{2}^{2} + \|v_{t}^{i} - \hat{v}_{t}^{i}\|_{2}^{2} + \|\sin{r_{t}^{i}} - \sin{\hat{r}_{t}^{i}}\|_{2}^{2} + \|\cos{r_{t}^{i}} - \cos{\hat{r}_{t}^{i}}\|_{2}^{2}$$
where $r_{t}^{i}$ here denotes the object's rotation (not its radius).
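The trajectory loss can be sketched directly. A toy scalar version (1-D position and velocity for brevity, not the authors' code); the sin/cos terms on the rotation make the angular error wrap correctly at $\pm\pi$.

```python
import math

def traj_loss(true_traj, pred_traj):
    """Each trajectory is a list over time t of per-object tuples
    (position, velocity, rotation)."""
    T = len(true_traj)
    total = 0.0
    for t in range(T):
        for (p, v, r), (ph, vh, rh) in zip(true_traj[t], pred_traj[t]):
            total += (p - ph) ** 2 + (v - vh) ** 2
            # compare rotations via sin/cos so 0 and 2*pi count as equal
            total += (math.sin(r) - math.sin(rh)) ** 2
            total += (math.cos(r) - math.cos(rh)) ** 2
    return total / T
```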
For action selection, the paper relies on classical control algorithms; one could instead use state-of-the-art RL algorithms such as PPO, TRPO or DDPG.
### Results
The results show that SAIN achieves the best performance compared to plain interaction networks, both in simulation and in the real world. To test generalization, the authors conducted several tests on different surfaces using the same disks; the evaluation shows a 92% success rate with fine-tuned SAIN.
