# Fei-Fei Li: Understanding and Interacting

- Intro: Stanford, HAI, ImageNet
- Evolution of vision: key to the Cambrian explosion

## Understanding

- Human visual perception: object, speed
- Object understanding, identification
- 2000: hand-designed features, learned models
- An 8-year-old child can identify 20K objects
- 2016 Visual Genome: scene graph
- 2022 MOMA: activity understanding, video scene graph
- Representation learning, captioning

## Interaction

- Plato's allegory of the cave: degraded perception of the world
- Activity of neurons, inhibitor
- Robotics: highly programmed for structured environments -> unstructured (messy) environments
- Explorative learning
  - No goal
  - SSL: inspiration from infant learning
  - Intrinsic motivation: world-model based
  - Self model (error of the world model) vs. world model (consequences)
- Exploitative learning
  - Goal driven
  - Task: short-horizon -> long-horizon tasks (organizing, planning) (Neural Task Programming)
  - Curriculum learning: generating tasks
- Image forecasting

## Big Data for Robotic Learning

- Dynamic, massive, interactive environments
- Ongoing: BEHAVIOR ([Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments](https://arxiv.org/abs/2108.03332))
- Some of the tasks are ones we don't want robots to do
- Survey: https://openreview.net/forum?id=_8DoIe8G3t
- Still challenging
- Sim2Real is crucial: https://github.com/StanfordVL/OmniGibson

## Q

- You focus on perception and understanding for robots
- https://www.nature.com/articles/s41467-021-25874-z