---
tags: Paper Survey
---

# Motion Perception in Reinforcement Learning with Dynamic Objects (Similar)

* [Paper Link](https://arxiv.org/pdf/1901.03162.pdf) | [Github (including environments)](https://github.com/lmb-freiburg/flow_rl) | [Youtube](https://www.youtube.com/watch?v=YALmehmmu3Q) | [Project Site](https://lmb.informatik.uni-freiburg.de/projects/flowrl/)

#### Architecture

![](https://i.imgur.com/GFk8z7V.png)

#### Details

* The agent receives a **vector of robot state variables** together with the **high-dimensional sensory observation**.
* Trained with PPO for **20M timesteps**.
* TinyFlowNet consists of a 5-layer encoder and a 2-layer decoder; it is pretrained on 20,000 image pairs (generated by taking random actions) using a teacher-student scheme.

![](https://i.imgur.com/1elIikb.png)

* Evaluated against the following baselines:
    * Image
    * Image stack
    * Image stack + image difference
    * LSTM
    * **Image + Segmentation** -- <span class="red">**The mask is a motion segmentation taken from the predicted optical flow.** (maybe binary?)</span>
    * **Image + Backward Flow** -- Flow is computed in the <span class="red">**backward direction**</span> to ensure that the object in the flow image is co-located with the object in the color image.
* Experiment environments:
    * Standard control tasks (Walker / Swimmer / Hopper) -- There are <span class="red">**no moving objects in these tasks apart from the agent itself.**</span>
    ![](https://i.imgur.com/Dbbggeq.png)
    * Tasks with dynamic objects -- The environment <span class="red">**surrounding the robot contains moving objects.**</span>
    ![](https://i.imgur.com/a51m4TH.png)
    * Target speed
    ![](https://i.imgur.com/sHi0RIj.png)

Some insights --

1. In 2D environments, the ***image diff*** baseline outperforms the other baselines and nearly matches the performance of the flow-based agent in **2D Chaser**.
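To make the "image stack" vs. "image difference" distinction concrete, here is a minimal NumPy sketch of the two input encodings (the frame size and values are illustrative toys, not the paper's actual observations):

```python
import numpy as np

# Two consecutive grayscale frames (toy 4x4 "images").
frame_t0 = np.zeros((4, 4), dtype=np.float32)
frame_t1 = np.zeros((4, 4), dtype=np.float32)
frame_t0[1, 1] = 1.0  # object at (1, 1) in the first frame
frame_t1[1, 2] = 1.0  # object has moved right by one pixel

# "Image stack" input: frames concatenated along a channel axis.
stack = np.stack([frame_t0, frame_t1], axis=0)  # shape (2, 4, 4)

# "Image difference" input: a single channel that already encodes motion.
diff = frame_t1 - frame_t0                      # shape (4, 4)

# The mover shows up as a +/- pair in the difference image, so motion is
# explicit in the input rather than something the network must learn.
print(np.argwhere(diff > 0))  # new position
print(np.argwhere(diff < 0))  # old position
```

The difference is a trivial function of the stack, which is what makes the performance gap between the two representations surprising.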
> It is interesting that a **minor change in the input representation from image stack to the image difference leads to such a dramatic performance improvement**, despite the fact that a network with image stack as input could easily learn to compute the image difference. We believe this inability to learn the simple difference operation is **due to the complexity and instability of the network optimization**.

2. Providing the segmentation mask of the moving ball did not reach the same performance as providing the optical flow. <span class="red">**This shows that the optical flow is not just used for localizing the moving object, but also for predicting its future position.**</span> <span class="brown">Q. **Does this assume the environment contains only the one important moving object, with no unnecessary movements?**</span>
3. Pre-trained or trained from scratch?

* Demo Video

<iframe src='//gifs.com/embed/wVX5Jr' frameborder='0' scrolling='no' width='360px' height='180px' style='-webkit-backface-visibility: hidden;-webkit-transform: scale(1);' ></iframe> <iframe src='//gifs.com/embed/D1Gr36' frameborder='0' scrolling='no' width='360px' height='180px' style='-webkit-backface-visibility: hidden;-webkit-transform: scale(1);' ></iframe>
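As a side note on insight 2 above: a binary motion mask localizes the mover but discards the direction and speed information that the raw flow carries, which is one way to read why the segmentation baseline falls short of the flow-based agent. A minimal NumPy sketch of deriving such a mask from a predicted flow field (the magnitude threshold is my assumption; the paper only says the mask is taken from the predicted flow):

```python
import numpy as np

# Toy dense flow field, shape (H, W, 2) = (dy, dx) per pixel;
# values are illustrative, standing in for TinyFlowNet's prediction.
flow = np.zeros((4, 4, 2), dtype=np.float32)
flow[2, 3] = (0.0, -2.0)   # a small moving object
flow[0, 0] = (0.05, 0.0)   # sub-threshold noise

# Binary motion segmentation: threshold the per-pixel flow magnitude.
magnitude = np.linalg.norm(flow, axis=-1)
mask = (magnitude > 0.5).astype(np.uint8)

# The mask keeps only "where" the mover is; the (dy, dx) vectors that
# would let the agent extrapolate its future position are gone.
print(mask)
```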