# Fei-Fei Li: Understanding and Interacting
- Intro: Stanford, HAI, ImageNet
- Evolution of vision: key to the Cambrian explosion
## Understanding
- Human visual perception: object, speed
- Object understanding, identification
- 2000: hand-designed features, learned models
- An 8-year-old child can identify about 20K objects
- 2016 Visual Genome: Scene graph
- 2022 MOMA: Activity Understanding, video scene graph
- representation learning, captioning
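
A scene graph, as mentioned for Visual Genome above, can be pictured as objects with attributes plus (subject, predicate, object) relations. The following is a minimal sketch of that data structure only; the class name `SceneGraph`, its methods, and the example objects are hypothetical and are not taken from the Visual Genome or MOMA tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SceneGraph:
    """Toy scene graph: nodes are objects, edges are labeled relations."""
    objects: Dict[int, str] = field(default_factory=dict)
    attributes: Dict[int, List[str]] = field(default_factory=dict)
    relations: List[Tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, name, attributes=()):
        # Assign the next integer id to the new object node.
        obj_id = len(self.objects)
        self.objects[obj_id] = name
        self.attributes[obj_id] = list(attributes)
        return obj_id

    def relate(self, subject_id, predicate, object_id):
        # Store a directed, labeled edge between two object nodes.
        self.relations.append((subject_id, predicate, object_id))

    def triples(self):
        # Render relations with object names instead of ids.
        return [(self.objects[s], p, self.objects[o])
                for s, p, o in self.relations]


# Hypothetical example scene: a man wearing a hat feeds a horse.
graph = SceneGraph()
man = graph.add_object("man", ["standing"])
horse = graph.add_object("horse", ["brown"])
hat = graph.add_object("hat")
graph.relate(man, "feeding", horse)
graph.relate(man, "wearing", hat)
print(graph.triples())
```

A video scene graph for activity understanding could, under the same assumptions, be approximated as one such graph per frame or time segment.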
## Interaction
- Plato's allegory of the cave: Degenerated perception of the world
- Activity of Neurons, inhibitor
- Robotics: highly programmed for structured environments -> unstructured (messy) environments
- Explorative Learning
> No goal
> SSL: Inspiration from infant learning
> Intrinsic Motivation: World model based
> Self model (error of the world model) vs. world model (consequences)
- Exploitative Learning
> Goal driven
> Tasks: short-horizon -> long-horizon (organizing, planning); Neural Task Programming
> Curriculum learning: generating tasks
- Image Forecasting
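
The explorative-learning notes above (no goal, intrinsic motivation, a world model predicting consequences and a self model predicting the world model's error) can be illustrated with a toy loop. This is only a sketch under strong assumptions: the dynamics in `TRUE_EFFECT`, the tabular `WorldModel` and `SelfModel` classes, and the learning rates are all hypothetical and are not the method presented in the talk.

```python
# Hypothetical toy dynamics: each action changes the state by a fixed amount.
TRUE_EFFECT = {0: 0.0, 1: 1.0, 2: -2.0}


class WorldModel:
    """Predicts the consequence (state change) of each action."""

    def __init__(self, actions, lr=0.5):
        self.pred = {a: 0.0 for a in actions}
        self.lr = lr

    def update(self, action, observed_delta):
        # Prediction error is measured before the model is corrected.
        error = abs(observed_delta - self.pred[action])
        self.pred[action] += self.lr * (observed_delta - self.pred[action])
        return error


class SelfModel:
    """Predicts how wrong the world model will be for each action."""

    def __init__(self, actions, lr=0.5, initial=1.0):
        self.expected_error = {a: initial for a in actions}
        self.lr = lr

    def choose(self):
        # Intrinsic motivation: pick the action expected to surprise most.
        return max(self.expected_error, key=self.expected_error.get)

    def update(self, action, error):
        self.expected_error[action] += self.lr * (error - self.expected_error[action])


def explore(steps=30):
    actions = list(TRUE_EFFECT)
    world, self_model = WorldModel(actions), SelfModel(actions)
    state, history = 0.0, []
    for _ in range(steps):
        action = self_model.choose()            # no external goal or reward
        next_state = state + TRUE_EFFECT[action]
        error = world.update(action, next_state - state)
        self_model.update(action, error)
        history.append((action, error))
        state = next_state
    return world, history


world, history = explore()
print(world.pred)
```

In this toy setting the agent keeps choosing whichever action it expects to predict worst, so the world model's errors shrink across all actions without any task being specified.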
## Big Data for Robotic Learning
- Dynamic, massive, interactive environments
- Ongoing: BEHAVIOR ([Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments](https://arxiv.org/abs/2108.03332))
- Some of the tasks we don't want robots to do
- Survey: https://openreview.net/forum?id=_8DoIe8G3t
- still challenging
- Sim2Real is crucial: https://github.com/StanfordVL/OmniGibson
## Q
- You focus on the perception and understanding for robots
- https://www.nature.com/articles/s41467-021-25874-z