# Hierarchies in Models and Control

Humans have a rich repertoire of motion skills and are able to quickly recombine them in unexpected ways, for example to adapt to a new situation (a parkour athlete or soccer player) or even for aesthetic creativity (a dancer or skateboarder). Robots, on the other hand, are still largely trained for specific tasks, and with rather rudimentary performance metrics: mostly effort-squared plus some proxy for stability. A critical limitation, both in conventional approaches such as model-predictive control (MPC) and in reinforcement learning (RL), is a heavy reliance on reference trajectories.

A key to recreating a richness in motion similar to that of humans is the ability to reason across different scales and resolutions of both (state-) space and time, which we believe requires a (flexible) hierarchy of models and policies. We distinguish between

- _horizon length_: how far into the future you predict
- _timescale_: temporal resolution, essentially your control or sim dt (maybe call it _bandwidth_)
- _order_: dimensionality of your state-space model
- _resolution_: spatial resolution, e.g. the precision of your state-space predictions.

In most MPC frameworks, these properties are fixed ad hoc and by hand, and limited to 2-3 levels. E.g. CMPC has a _horizon length_ of X seconds, a _timescale_ of Y (the dt in the traj-node), and an _order_ of 12 states and 12 actions. _Resolution_ is rarely quantified, but it is typically assumed to be perfect: the tracking controller (the WBC in this case) is tasked with perfect tracking.

While our existing model-based frameworks give us working examples and insight into the choices of these hierarchical levels, we still lack a principled understanding of how they interact and how to relate them to specific motions. In particular, current frameworks treat all motions as "one size fits all". RL methods have begun to provide tools that allow flexibility through a _latent representation_, but the current state of the art rarely incorporates hierarchies and offers little insight.

We conjecture that a **flexible hierarchy** of models and policies is essential to efficiently explore the space of trajectories to discover new "skill blocks" autonomously, tractably decompose existing motions into composable skill blocks, and compose new motions out of this repertoire of skill blocks.

Some intuitive (hand-wavy) examples (parameterized more concretely in the sketch after this list):

Walking across a room requires, at the
- **high level**: the entire horizon, but at a very coarse timescale, at a very low order, and low resolution
- **low level** (stabilizing motion): short or zero horizon, fine timescale, high/full order, mid resolution

Long jump (at a competitive level)
- **high level**: the entire horizon, at a fine timescale, at close to full order, and high resolution

Dribbling a soccer ball while avoiding adversaries
- **high level**: far enough to avoid the next adversary (flexible), mid timescale, low-mid order, low resolution
- **mid level**: one "ball manipulation" horizon (flexible), mid-high timescale, mid-full order, mid-high resolution (depending on manipulation complexity)
- **low level**: short horizon, fine timescale, full order, high resolution

<!-- This requires a hierarchy of models and policies/planners that can reason at different resolutions. -->
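
To make the four properties concrete, here is a minimal, hypothetical sketch of one way to write them down per hierarchy level. The class and field names (`Level`, `horizon_s`, `dt_s`, `order`, `resolution_m`) and all numeric values are illustrative assumptions chosen to mirror the walking and dribbling examples above; they are not an existing implementation or measured data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Level:
    """One level of a (hypothetical) model/policy hierarchy.

    Fields mirror the four properties discussed above; names and units
    are illustrative assumptions, not an existing API.
    """
    name: str
    horizon_s: float     # horizon length: how far this level predicts [s]
    dt_s: float          # timescale / bandwidth: control or sim dt [s]
    order: int           # order: state-space dimensionality at this level
    resolution_m: float  # resolution: assumed spatial precision [m]


@dataclass
class Hierarchy:
    levels: List[Level]  # ordered coarse (high level) -> fine (low level)

    def summary(self) -> str:
        return "\n".join(
            f"{lv.name:>5}: horizon={lv.horizon_s:.2f}s, dt={lv.dt_s:.3f}s, "
            f"order={lv.order}, resolution={lv.resolution_m:.3f}m"
            for lv in self.levels
        )


# "Walking across a room": coarse, low-order planning over the full horizon
# plus a short-horizon, full-order stabilizing layer. Numbers are made up.
walking = Hierarchy(levels=[
    Level("high", horizon_s=10.0, dt_s=0.5,   order=3,  resolution_m=0.2),
    Level("low",  horizon_s=0.0,  dt_s=0.002, order=36, resolution_m=0.02),
])

# "Dribbling a soccer ball" adds a mid-level ball-manipulation layer.
dribbling = Hierarchy(levels=[
    Level("high", horizon_s=3.0, dt_s=0.2,   order=6,  resolution_m=0.3),
    Level("mid",  horizon_s=0.8, dt_s=0.05,  order=18, resolution_m=0.05),
    Level("low",  horizon_s=0.1, dt_s=0.002, order=36, resolution_m=0.01),
])

if __name__ == "__main__":
    print(walking.summary())
    print(dribbling.summary())
```

Writing such a table out explicitly per motion is one way to start quantifying how the levels should interact and differ across motions, rather than fixing them ad hoc for a single framework.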