---
tags: Paper Survey
---
# Motion Perception in Reinforcement Learning with Dynamic Objects (Similar)
* [Paper Link](https://arxiv.org/pdf/1901.03162.pdf) | [Github (including environments)](https://github.com/lmb-freiburg/flow_rl) | [Youtube](https://www.youtube.com/watch?v=YALmehmmu3Q) | [Project Site](https://lmb.informatik.uni-freiburg.de/projects/flowrl/)
#### Architecture

#### Details
* Provide the agent with a **vector of robot state variables** as well as the **high-dimensional sensory observation**.
* Train with **20M timesteps** / PPO.
* TinyFlowNet consists of a 5-layer encoder and a 2-layer decoder and is pretrained on 20,000 image pairs (generated by taking random actions), using a teacher-student concept.
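The pretraining setup above can be sketched as follows. `collect_image_pairs` and the Gym-style `env` API are illustrative assumptions, and the average endpoint error is one plausible distillation target between the teacher flow network and TinyFlowNet (the paper's exact loss may differ):

```python
import numpy as np

def collect_image_pairs(env, n_pairs):
    """Roll out random actions and record consecutive frames.

    Assumes a Gym-like API (reset/step returning an RGB frame);
    this is a sketch, not the authors' data-collection code.
    """
    pairs = []
    frame = env.reset()
    while len(pairs) < n_pairs:
        nxt, _, done, _ = env.step(env.action_space.sample())
        pairs.append((frame, nxt))
        frame = env.reset() if done else nxt
    return pairs

def endpoint_error(student_flow, teacher_flow):
    """Average endpoint error between student and teacher flow fields.

    Both arrays have shape (H, W, 2): the teacher is a large pretrained
    optical-flow network, the student is TinyFlowNet.
    """
    return np.mean(np.linalg.norm(student_flow - teacher_flow, axis=-1))
```

Distilling into a tiny network keeps flow estimation cheap enough to run inside the RL loop at every timestep.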

* Evaluate and compare against the following baselines:
    * Image
    * Image stack
    * Image stack + image difference
    * LSTM
    * **Image + Segmentation** -- <span class="red"> **The mask is a motion segmentation taken from the predicted optical flow.** (maybe binary?) </span>
    * **Image + Backward Flow** -- Flow is computed in the <span class="red"> **backward direction** </span> to ensure that the object in the flow image is co-located with the object in the color image.
* Experiment Environments
    * Standard Control Tasks (Walker / Swimmer / Hopper) -- There are <span class="red">**no moving objects in these tasks apart from the agent itself.**</span>
    * Tasks with Dynamic Objects -- The environment <span class="red">**surrounding the robot contains moving objects.**</span>
    * Target speed
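The stack-based baselines only differ in how frames are packed into the observation. A minimal sketch of the two representations, assuming grayscale frames and a channels-last layout (the paper does not specify the exact channel ordering):

```python
import numpy as np

def image_stack(frames):
    """Concatenate the last k grayscale frames along a new channel axis."""
    return np.stack(frames, axis=-1)

def image_stack_with_diff(frames):
    """Frame stack plus per-step temporal differences, as in the
    'image stack + image difference' baseline (layout is an assumption)."""
    stack = np.stack(frames, axis=-1)
    diffs = stack[..., 1:] - stack[..., :-1]
    return np.concatenate([stack, diffs], axis=-1)
```

Note that the difference channels are a linear function of the stack channels, which is what makes the performance gap between the two baselines surprising.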

Some insights --
1. In 2D environments: the ***image diff*** baseline outperforms the other baselines and nearly matches the performance of the flow-based agent in **2D Chaser**.
> It is interesting that a **minor change in the input representation from image stack to the image difference leads to such a dramatic performance improvement**, despite the fact that a network with image stack as input could easily learn to compute the image difference. We believe this inability to learn the simple difference operation is **due to the complexity and instability of the network optimization**.
2. Providing the segmentation mask of the moving ball did not reach the same performance as providing the optical flow. <span class="red">**This shows that the optical flow is not just used for localizing the moving object, but also for predicting its future position.**</span>
<span class="brown">Q. **Does this assume the environment contains only the important moving object, with no irrelevant motion?** </span>
3. Is the flow network pre-trained or trained from scratch?
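Insight 2 can be made concrete: one plausible way to derive the segmentation baseline's mask is to threshold the flow magnitude (the paper's exact procedure is not stated here). The sketch below shows how binarization keeps the object's location but discards the direction and speed that a flow-based agent could use to extrapolate its future position:

```python
import numpy as np

def motion_mask(flow, threshold=0.5):
    """Binarize a flow field (H, W, 2) by per-pixel magnitude.

    A hypothetical reconstruction of the motion-segmentation baseline:
    the output says *where* motion is, but not in which direction or
    how fast, so future positions cannot be predicted from it.
    """
    magnitude = np.linalg.norm(flow, axis=-1)
    return magnitude > threshold
```

Two objects moving in opposite directions at different speeds produce identical masks, which is consistent with the segmentation baseline underperforming the flow input.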
* Demo Video
<iframe src='//gifs.com/embed/wVX5Jr' frameborder='0' scrolling='no' width='360px' height='180px' style='-webkit-backface-visibility: hidden;-webkit-transform: scale(1);' ></iframe>
<iframe src='//gifs.com/embed/D1Gr36' frameborder='0' scrolling='no' width='360px' height='180px' style='-webkit-backface-visibility: hidden;-webkit-transform: scale(1);' ></iframe>