# Unsticking Animation

Bevy animation sounds good on paper: we merged the animation graph RFC, we implemented it, we have a *blazingly fast* parallelized animation evaluator, and we even have support for animating arbitrary ECS component properties. But the community response has been mostly lukewarm. We often hear that the system is complex, verbose, and limiting, and, in some sense, I don't think we fully understand why the response has been that way. As engine contributors, we have mostly opted to wait for a better solution to emerge, or for the issues to become clearer. As yet, neither of those things has happened. So we're kind of stuck. Now it's time for us to start thinking critically about how we can get ourselves un-stuck.

I want to start by looking at `bevy_animation` as it exists today, so that everyone has a firm grasp of just what it is we have actually been supplying to users. Then I want to contrast our system with the alternatives available in other engines and UI toolkits. The goal of this document is to identify the strengths and weaknesses of the current design, so that we can forge a new path forward for the animation ecosystem.

## The core of `bevy_animation`

Bevy animation was originally built to play baked animations loaded from glTF files. That's been its main purpose since the start, and though it's expanded a bit, all of the infrastructure is more or less designed around that workflow. So let's start by looking at how that is done. Typically, playing a clip looks something like this:

```rust
// spawn a gltf scene
let scene = asset_server.load(GltfAssetLabel::Scene(0).from_asset(path));
world.spawn(SceneRoot(scene));

// load an animation from the same gltf file, and build a trivial graph around it
let clip = asset_server.load(GltfAssetLabel::Animation(2).from_asset(path));
let (anim_graph, clip_index) = AnimationGraph::from_clip(clip);

// register the graph as an asset so we can refer to it by handle
let graph_handle = world
    .resource_mut::<Assets<AnimationGraph>>()
    .add(anim_graph);

// ... then after the scene finishes loading ...

// let's assume we have a skinned mesh in the scene
let entity = get_skinned_mesh_entity();

// we must associate the animation graph with the entity
world.get_entity_mut(entity)?
    .insert(AnimationGraphHandle(graph_handle));

// ... then at any later point ...

// we can start the clip playing, or do other stuff
let mut anim_player = world.get_mut::<AnimationPlayer>(entity)?;
anim_player.play(clip_index).repeat();
```

From this example, it should be clear that playing an animation on an entity requires three things: a clip, a graph, and a player.

### Animation Clips

The `AnimationClip` type is the core unit of animation data. It contains:

+ A duration (in `f32` seconds).
+ A mapping from "animation targets" (aka "bones") to interpolated value curves.
+ A mapping from "event targets" to a timed sequence of events (essentially callbacks).

If it can't be expressed as an animation clip, `bevy_animation` can't play it. For convenience, animation clips are assets, and you typically deal with them through `Handle<AnimationClip>`.

### Animation Graphs

An animation graph (`AnimationGraph`) blends between multiple clips. Each time the game updates, the graph is evaluated to calculate the values of animated properties for that frame.

:::success
Structurally, each animation graph is a DAG with multiple "source" nodes and a single "sink". The sources are specific animation clips, which are blended together using various weights to produce an output at the sink.
:::
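To make that structure concrete, here's a rough sketch of building a two-clip graph by hand using the `AnimationGraph` builder methods. Here `walk_clip` and `run_clip` are assumed to be `Handle<AnimationClip>`s; double-check the method signatures against the current release.

```rust
let mut graph = AnimationGraph::new();

// An intermediate blend node, parented to the graph's root (the "sink").
let blend_node = graph.add_blend(0.5, graph.root);

// Two "source" clips feeding into the blend node, each with its own weight.
let walk_node = graph.add_clip(walk_clip.clone(), 1.0, blend_node);
let run_node = graph.add_clip(run_clip.clone(), 1.0, blend_node);
```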
The `AnimationGraph::from_clip` function, which we saw above, creates a simple graph with one source (the clip itself) and one sink (the output), with a blend weight of 100%. Trivial graphs like this are necessary because you can only apply *animation graphs* to entities, not clips. That's what inserting `AnimationGraphHandle` does; it causes the animation graph to be evaluated on the entity it's inserted on, using whatever clips are playing on that entity as inputs.

### Animation Players

Because animation graphs are assets, multiple instances of the same graph can appear on different entities. The `AnimationPlayer` component on each of those entities controls the "inputs" to that animation graph instance: starting, stopping, pausing, and seeking the "source clips" that feed into the graph.

## Animating Properties

Bevy also provides ways to animate things other than baked skeletal animations loaded from glTF files: animated UI, animated entity properties, and so on. This part of the system is newer and less well-loved.

### Animatable

At the core of the system sits the `Animatable` trait, which provides the following two methods:

```rust
// interpolates between a and b on the time range [0, 1].
fn interpolate(a: &Self, b: &Self, time: f32) -> Self;

// blends between multiple inputs, and provides a default if none are provided.
fn blend(inputs: impl Iterator<Item = BlendInput<Self>>) -> Self;
```

To animate a value of a component, that value must implement `Animatable`, which means it must support both *interpolation* and *additive blending*. Most important things already implement this: colors, quats, vectors, transforms, and so on. But let's look at how we might implement it for something custom.

```rust
// A simple linear interpolation helper.
fn lerp(a: f32, b: f32, t: f32) -> f32 {
    a + (b - a) * t
}

#[derive(Reflect, Clone, Copy, Default)]
struct AudioFilter {
    damping: f32,
    room_size: f32,
    delay: f32,
    volume: f32,
}

impl AudioFilter {
    // Builder-style setters, so that values are clamped on the way in
    // and calls can be chained.
    fn set_damping(mut self, damping: f32) -> Self {
        self.damping = damping.max(0.0);
        self
    }

    fn set_room_size(mut self, room_size: f32) -> Self {
        self.room_size = room_size.max(0.0);
        self
    }

    fn set_delay(mut self, delay: f32) -> Self {
        self.delay = delay.max(0.0);
        self
    }

    fn set_volume(mut self, volume: f32) -> Self {
        self.volume = volume.clamp(0.0, 1.0);
        self
    }
}

impl Animatable for AudioFilter {
    fn interpolate(a: &Self, b: &Self, time: f32) -> Self {
        // Using the setters here means we clamp the output properly
        AudioFilter::default()
            .set_damping(lerp(a.damping, b.damping, time))
            .set_room_size(lerp(a.room_size, b.room_size, time))
            .set_delay(lerp(a.delay, b.delay, time))
            .set_volume(lerp(a.volume, b.volume, time))
    }

    fn blend(inputs: impl Iterator<Item = BlendInput<Self>>) -> Self {
        let mut filter = AudioFilter::default();
        for BlendInput { weight, value, additive } in inputs {
            if additive {
                filter = AudioFilter::default()
                    .set_damping(filter.damping + value.damping * weight)
                    .set_room_size(filter.room_size + value.room_size * weight)
                    .set_delay(filter.delay + value.delay * weight)
                    .set_volume(filter.volume + value.volume * weight);
            } else {
                filter = Self::interpolate(&filter, &value, weight);
            }
        }
        filter
    }
}
```

Even for this relatively simple object, implementing `Animatable` is somewhat verbose. Regardless, now that we have it, we can begin animating `AudioFilter` fields wherever they appear.
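Before wiring this into the animation system, here's a quick sketch of how these methods behave when called directly. The values are made up for illustration; `BlendInput`'s fields are `weight`, `value`, and `additive`.

```rust
let dry = AudioFilter::default();
let wet = AudioFilter::default().set_room_size(0.8).set_volume(1.0);

// A filter halfway between the two.
let mid = AudioFilter::interpolate(&dry, &wet, 0.5);

// Blend the wet filter in at 25% strength, non-additively.
let blended = AudioFilter::blend(
    [BlendInput { weight: 0.25, value: wet, additive: false }].into_iter(),
);
```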
### Animatable Properties

Let's suppose we have some component `CutScene` which contains an `AudioFilter`.

```rust
#[derive(Component)]
struct CutScene {
    // ... other fields ...
    filter: Option<AudioFilter>,
    // ... other fields ...
}
```

To *animate* this *property* of `CutScene`, we have to define a new `AnimatableProperty`.

```rust
#[derive(Clone)]
struct CutSceneAudioFilterProperty;

impl AnimatableProperty for CutSceneAudioFilterProperty {
    type Property = AudioFilter; // This must be an Animatable

    fn get_mut<'a>(
        &self,
        entity: &'a mut AnimationEntityMut,
    ) -> Result<&'a mut Self::Property, AnimationEvaluationError> {
        // Try to get a CutScene component from the provided entity
        let cutscene = entity
            .get_mut::<CutScene>()
            .ok_or(AnimationEvaluationError::ComponentNotPresent(
                TypeId::of::<CutScene>(),
            ))?
            .into_inner();

        // Try to get a mutable reference to the filter from the cutscene
        cutscene
            .filter
            .as_mut()
            .ok_or(AnimationEvaluationError::PropertyNotPresent(
                TypeId::of::<Option<AudioFilter>>(),
            ))
    }

    fn evaluator_id(&self) -> EvaluatorId {
        EvaluatorId::Type(TypeId::of::<Self>())
    }
}
```

Most animatable properties look more or less like this one.

:::success
For components that implement `Reflect`, you can often use the `animated_field!` macro instead of manually writing out a custom property.

```rust
let field = animated_field!(CutScene::filter);
```

But in this case, because our `AudioFilter` is inside an `Option`, this won't work.
:::

### Animatable Curves

When we dealt with glTFs, we loaded "animation clips" as assets, and it was these clips that contained the actual curves which defined our animations. Here, since we are animating a custom value, we will have to create the curves manually.

```rust
let curve = AnimatableCurve::new(
    CutSceneAudioFilterProperty,
    AnimatableKeyframeCurve::new([
        (0.0, AudioFilter::default()),
        (0.5, AudioFilter::default().set_volume(1.0)),
        (0.8, AudioFilter::default().set_volume(1.0).set_delay(200.0)),
        (1.0, AudioFilter::default().set_delay(200.0)),
    ])?,
);
```

### Playing Custom Animations

To play our animation, our custom curve must be manually added to a clip.

```rust
let mut animation_clip = AnimationClip::default();

// When this clip is evaluated on an entity, this will apply our curve
// to the animation target named "Cutscene"
let animation_target_name = Name::new("Cutscene");
let animation_target_id = AnimationTargetId::from_name(&animation_target_name);
animation_clip.add_curve_to_target(animation_target_id, curve);
```

At this point, we retrace our steps from the original glTF animation example.

```rust
// register the clip as an asset, then create an animation graph for it
let clip_handle = world
    .resource_mut::<Assets<AnimationClip>>()
    .add(animation_clip);
let (anim_graph, clip_index) = AnimationGraph::from_clip(clip_handle);
let graph_handle = world
    .resource_mut::<Assets<AnimationGraph>>()
    .add(anim_graph);

// get the cutscene entity somehow
let entity = get_cutscene_entity();

// associate the animation graph, a player, and the animation target
// with the entity (no gltf loader is doing any of this for us here)
world.get_entity_mut(entity)?
    .insert((
        AnimationGraphHandle(graph_handle),
        AnimationPlayer::default(),
        AnimationTarget { id: animation_target_id, player: entity },
    ));

// play the clip on the entity
let mut anim_player = world.get_mut::<AnimationPlayer>(entity)?;
anim_player.play(clip_index);
```

## What We Can Already See

Even before moving on to comparison with other animation systems, I think some features of this design are already becoming apparent.

### Static and Structural

Our animation system is very *static*. It's designed to load mostly static animation assets and play them on other mostly static mesh assets. When you have access to static data, the system works well and is not too verbose. When you don't, things get messy.

This isn't an arbitrary choice; the system is designed around static data for performance reasons. And it must be said, our animation system is extremely performant (especially compared to Unity). We achieve this performance by eagerly compiling the animation graphs down into accelerated structures that can be rapidly applied to sequences of clips. This approach is clever and works very well.

As a consequence, though, dynamic actions (like transitioning between two clips by varying the blend weights) usually take the form of *structural modifications to data*. This is where much of the verbosity and user annoyance with the system seems to come from: creating the data manually is tedious, and modifying the internal structure of an animation graph is nontrivial, as the sketch below illustrates.
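For example, a simple cross-fade ends up looking something like this: a system that reaches into the graph asset every frame and rewrites node weights. This is only a sketch; the `CrossFade` component and its fields are hypothetical, and the exact `Time` method names vary a bit between Bevy releases.

```rust
// Hypothetical component tracking a cross-fade between two graph nodes.
#[derive(Component)]
struct CrossFade {
    from: AnimationNodeIndex,
    to: AnimationNodeIndex,
    progress: f32, // 0.0 = fully `from`, 1.0 = fully `to`
}

fn update_cross_fades(
    time: Res<Time>,
    mut graphs: ResMut<Assets<AnimationGraph>>,
    mut query: Query<(&AnimationGraphHandle, &mut CrossFade)>,
) {
    for (graph_handle, mut fade) in &mut query {
        let Some(graph) = graphs.get_mut(&graph_handle.0) else {
            continue;
        };
        fade.progress = (fade.progress + time.delta_secs()).min(1.0);

        // "Dynamic" behavior means structurally editing the graph's node data.
        if let Some(node) = graph.get_mut(fade.from) {
            node.weight = 1.0 - fade.progress;
        }
        if let Some(node) = graph.get_mut(fade.to) {
            node.weight = fade.progress;
        }
    }
}
```

Note that this mutates the graph *asset*, so every entity sharing that graph handle cross-fades together; per-entity fades require per-entity copies of the graph.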
### Blending-Focused

Everything in this system is designed around interpolation and additive blending (largely built on top of the `Animatable` trait). This is great when you want to animate stuff that can be blended or interpolated. But not everything is blendable, and not everything that we might wish to "change over time" (in the more general sense of animation) needs to change smoothly.

There are some animation contexts where we shouldn't allow playing more than one animation clip on an object at once, and in these situations blending is useless. Many of those situations can also get by without interpolation, with the value changing discretely at each keyframe. Neither case is supported by the current system.

:::success
As an aside, I also think `bevy_animation` tends to conflate the terms "animation/animatable" with "blending/blendable". I'd like to see us rename the `Animatable` trait to `Blendable` in the future. I do have [a pr] that improves the situation a bit, but it's just a start.
:::

[a pr]: https://github.com/bevyengine/bevy/pull/18674

### Clip-Oriented

The system is very focused on clips. Everything goes into a clip, even basic UI tweens. This is because clips are effectively the only unit of data that the animation system is prepared to deal with.

There's no way to orchestrate animations on a grander scale than individual clips: grouping and playing sequences of clips, creating state machines for character controllers, synchronizing clips played in multiple graphs across multiple entities. These are all left entirely to the user. We have been so focused on clips that we have neglected to specify how non-clip-based but animation-adjacent systems (like physics, or inverse kinematics) should interact with animation.

# Prior Work

Now let's take a look at how other systems handle animation. I'm actually going to start with `egui`, because the topic of UI animation is a hot-button issue at the moment.

## Animation in Egui

Animation in `egui` works a little differently, since it's an immediate-mode library. At the lowest level of the animation stack, egui provides a function called [`animate_bool`], which more or less looks like this:

```rust
fn animate_bool(&self, id: Id, value: bool) -> f32;
```

Each time `animate_bool` is called with `true`, it returns a number a bit closer to `1.0`. Each time it is called with `false`, it returns a number a bit closer to `0.0`. After a few frames of being passed only `true` (or only `false`), the value will eventually approach and settle on `1.0` (or `0.0`, respectively).

[`animate_bool`]: https://docs.rs/egui/latest/egui/struct.Context.html#method.animate_bool

This is a neat little function, isn't it? In my mind, it's like a "procedural boolean easing function" or "procedural boolean tween". Notice how this is more or less the inverse of blending: you start with a stream of values (in this case coming from user interface input) and get back a blend weight that can then be used as input for other animations.
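Here's what using it looks like in practice, sketched from memory of egui's API (`Ui::set_opacity` exists in recent egui versions, but double-check against whatever release you're on):

```rust
fn show_details(ctx: &egui::Context, ui: &mut egui::Ui, open: bool) {
    // Eases toward 1.0 while `open` is true, and toward 0.0 while it's false.
    let openness = ctx.animate_bool(egui::Id::new("details-panel"), open);
    if openness > 0.0 {
        // Drive whatever we like with the weight: alpha, height, etc.
        ui.set_opacity(openness);
        ui.label("Some details...");
    }
}
```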
This general design would be pretty easy to adapt to Bevy. Here's a rough sketch:

```rust
use std::ops::Deref;

struct AnimatedBool {
    /// The current state of the boolean.
    state: bool,
    /// The last value returned by `get`.
    last_value: f32,
    /// The time of the last call to `get`, in seconds (if it has been called).
    last_update: Option<f64>,
    /// How quickly the value changes, expressed in Hz
    /// (e.g. 12.0 means a full transition takes 1/12th of a second).
    period: f32,
    /// The maximum amount the value will change between reads.
    max_delta: f32,
}

impl AnimatedBool {
    fn new(value: bool) -> AnimatedBool {
        AnimatedBool {
            state: value,
            last_value: if value { 1.0 } else { 0.0 },
            last_update: None,
            // Default to a duration of 1/12th of a second, or about 5 frames.
            period: 12.0,
            max_delta: f32::INFINITY,
        }
    }

    fn set<T: Default>(&mut self, state: bool, time: &Time<T>) -> f32 {
        self.state = state;
        self.get(time)
    }

    fn get<T: Default>(&mut self, time: &Time<T>) -> f32 {
        let (start, end) = if self.state { (0.0, 1.0) } else { (1.0, 0.0) };
        let current_time = time.elapsed_secs_f64();
        let speed = match self.last_update {
            Some(last_update) => {
                let delta = (current_time - last_update) as f32 * self.period;
                delta.min(self.max_delta)
            }
            None => self.max_delta,
        };
        let new_value = self.last_value + (end - start) * speed;
        self.last_value = if new_value.is_finite() {
            new_value.clamp(0.0, 1.0)
        } else {
            // With the default infinite `max_delta`, the first read
            // jumps straight to the target.
            end
        };
        self.last_update = Some(current_time);
        self.last_value
    }
}

impl Deref for AnimatedBool {
    type Target = bool;

    fn deref(&self) -> &bool {
        &self.state
    }
}
```

This object provides a way for users to take their hard, cold state and derive weights that can be used to drive transitions or juicy animations. You can imagine how one might generalize this to provide weights for a collection of objects, rather than just boolean values.

Most of the rest of `egui`'s animation support isn't applicable to us because it's rooted in immediate-mode patterns. It is largely focused on easings and transitions.

## Animation with CSS Animations

Continuing the theme of UI, we should probably also look at how animations are handled on the web. Like most UI tools, CSS animations are focused on easings and transitions, but unlike egui, CSS is retained-mode. The CSS animation system is weird, and rather than trying to describe it better than the thousand-and-one other tutorial websites, I'm just going to try to distill the key points:

+ Animations contain keyframes that can set any CSS property.
+ Properties are either not animatable, discrete (set by keyframes but do not blend), or computed (meaning some property-specific interpolation scheme is used to blend between keyframes).
+ Each HTML element plays whatever animations are listed in its `animation-name` CSS property; multiple animations never blend, later ones simply override earlier ones.
+ Animations start automatically when the `animation-name` property is set, which is usually either when a node is created or on page load.
+ JavaScript can react to animations through the `animationstart`, `animationend` and `animationiteration` events fired at specific HTML elements (and can start them by setting `animation-name` or toggling classes).

In some ways, this is astonishingly similar to Bevy, but there are some clear ergonomics improvements that we lack:

+ We lack support for discretely animated properties; everything has to be blendable.
+ We could use observer events (like JS events) to play animations on an entity instead of using an `AnimationPlayer` component (see the sketch after this list).
+ Because an element effectively plays at most one animation (in Bevy parlance, a single "clip") per property, with no blending, CSS has no need for a complex blending graph.
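Here's a sketch of that observer idea. None of this is a current `bevy_animation` API: `PlayAnimation` is a hypothetical event, and the exact observer method names vary a bit across recent Bevy releases.

```rust
// Hypothetical event asking an entity to play a node from its graph.
#[derive(Event)]
struct PlayAnimation(AnimationNodeIndex);

fn setup(mut commands: Commands) {
    // A global observer that forwards the event to the entity's player.
    commands.add_observer(
        |trigger: Trigger<PlayAnimation>, mut players: Query<&mut AnimationPlayer>| {
            // `trigger.target()` is spelled `trigger.entity()` on older releases.
            if let Ok(mut player) = players.get_mut(trigger.target()) {
                player.play(trigger.event().0);
            }
        },
    );
}

// Elsewhere, anything with `Commands` can kick off playback
// without ever touching the `AnimationPlayer` directly:
fn play_on(mut commands: Commands, entity: Entity, node: AnimationNodeIndex) {
    commands.trigger_targets(PlayAnimation(node), entity);
}
```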
## Animation with CSS Transitions

CSS has a second animation system specialized for transitions, which is arguably more useful and somewhat more complex. Instead of applying animations composed of multiple keyframes, the `transition` property lets users specify that particular attributes should appear to transition smoothly when changed. Here's an example:

```css
div {
  width: 100px;
  height: 100px;
  background: red;
  transition: width 2s;
}

div:hover {
  width: 300px;
}
```

When the div is hovered, the width changes. But instead of changing abruptly, the width smoothly extends over the course of two seconds. If the mouse moves away mid-transition, the div slowly and gracefully shifts back to its original size.

This is somewhat similar to what we saw with `animate_bool` in egui, but now with arbitrary properties. @dreamertalin has a prototype of something like this working in Bevy, and I invite them to expand on it here if they wish.

## Animation in Unreal Engine 5

TODO:
+ Talk about the hierarchy of animation sequences

Unreal doesn't have a single animation system. There are various systems that share some code and UI but are distinct to the user:

- Skeletal animation: https://dev.epicgames.com/documentation/en-us/unreal-engine/skeletal-mesh-animation-system-in-unreal-engine
  - There's also an ECS-based crowd system that reuses some parts of the skeletal animation system.
  - And an "AnimNext" successor in development, but I haven't looked at it.
- Sprite animation: https://dev.epicgames.com/documentation/en-us/unreal-engine/paper-2d-overview-in-unreal-engine
- UI animation: https://dev.epicgames.com/documentation/en-us/unreal-engine/animating-umg-widgets-in-unreal-engine
- Motion graphics: https://dev.epicgames.com/documentation/en-us/unreal-engine/motion-design-in-unreal-engine
- Sequencer: https://dev.epicgames.com/documentation/en-us/unreal-engine/cinematics-and-movie-making-in-unreal-engine
  - Primarily a cinematics tool that can drive other systems and do property animation.
  - Can be used in non-cinematic contexts.
  - Can be used as a higher-level controller for skeletal animation.

### Skeletal Animation In Unreal

#### Background

Unreal and most other skeletal animation systems have some concept of "skeleton", "pose" and "animation clip".

- A skeleton is a hierarchy of named joints with a single root joint.
  - Skeletons only describe the hierarchy - they're not an instance of the joints in the world and don't have transforms.
  - Joint names are usually unique within the skeleton.
  - Skeletons may also define "attributes" - named values of various types, sometimes bound to joints, sometimes non-interpolable.
- A pose is a set of transforms, one for each joint in a skeleton.
  - Effectively an instance of a skeleton.
  - The transforms are often but not always in parent-space.
  - Can also contain attribute values.
- Skeleton assets will sometimes contain a "reference pose" which is used as a default or placeholder.
- Skinned meshes contain one "bind pose" that relates joints to vertices.
- An animation clip is essentially a function `f(time) -> Pose`.
  - Underneath it might be curves for each joint/attribute, or an array of poses.
- Systems often have other assets which are essentially functions to poses, but take different kinds of parameters.
  - E.g. a static blendspace that maps from some N-dimensional coordinate to a pose by interpolating an array of poses positioned in that coordinate space.
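In Rust terms, these concepts might be spelled something like this. This is purely illustrative; no engine defines exactly these types:

```rust
/// A hierarchy of named joints: pure structure, no transforms.
struct Skeleton {
    joint_names: Vec<String>,
    /// Parent index of each joint; `None` for the single root.
    parents: Vec<Option<usize>>,
}

/// An "instance" of a skeleton: one transform per joint,
/// here assumed to be in parent space.
struct Pose {
    joint_transforms: Vec<Transform>,
}

/// Anything that can produce a pose from some parameter,
/// e.g. an animation clip sampled at a point in time.
trait PoseSource {
    fn sample(&self, time: f32) -> Pose;
}
```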
#### Unreal Basics

- Skeleton descriptions, skinned meshes and animation clips are immutable assets.
- Skinned mesh and animation clip assets link to a single skeleton asset that describes them.
- There are also blendspaces and a couple of other assets in the "function from parameters to pose" mould.
- A "skeletal mesh component" instances a skinned mesh in the world.
  - (There are actually several different components, but I'll pretend there's one for simplicity.)
  - Contains a current pose.
  - Contains a `UAnimInstance` object that's responsible for calculating the current pose.
- There are various `UAnimInstance` implementations, the simplest of which is basically an animation clip and a time.
- Joints are not first-class entities - they're just data that can be looked up by name within a skeletal mesh component.
  - Entities can still be attached to joints. Roughly speaking, where Bevy's hierarchical relationship is `ChildOf(Entity)`, Unreal's is `ChildOf(Entity, Option<JointName>)`.

#### Animation Blueprints

- An Unreal animation blueprint is one implementation of `UAnimInstance`.
- It's both an immutable asset and a mutable instance.
  - (Unreal often uses the prototype pattern, so the runtime instance is a mutable copy of the asset.)
  - The instance of a blueprint is contained in a skeletal mesh component.
- The animation blueprint asset links to a skeleton asset.
  - This means it's aware of the hierarchy and can reference joints.
- The animation blueprint contains:
  - Parameters: Named variables used throughout the animation blueprint.
  - Event graph: Code graph that writes parameters.
  - Blend graphs: Dataflow graph that outputs a pose.
  - State graphs: State transition graph + state machine that outputs a pose.
- The event graph reads data from the world/actor (e.g. the actor's speed and angle) and writes parameters.
- The state and blend graphs read those parameters - they (mostly?) can't access anything outside the animation blueprint.
- The blend graph has:
  - Edges that are poses or data.
  - Nodes that are operations on poses (e.g. a BlendTwo node takes two input poses and a weight input, and outputs a pose), or operations on data/parameters (get parameter by name, add two values, etc.).
  - Nodes can have internal mutable state.
  - There's always a terminal pose node that takes a single pose as input.
- The state graph has:
  - Nodes that are states.
  - Edges with trigger conditions that can read from parameters (e.g. "if parameters.speed > 50").
- State and blend graphs can be recursively nested.
  - The outermost graph is a blend graph.
  - Blend graph nodes can contain a state graph.
  - State graph nodes can contain a blend graph.
- Animation blueprints can be nested.
  - A blend graph node can instance another animation blueprint asset.
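To make the blend-graph model concrete, here's a rough Rust rendering of the node interface described above, reusing the `Pose` sketch from earlier. All of these types are hypothetical; Unreal's actual node machinery is far more involved:

```rust
/// Hypothetical pose-level node interface, Unreal style:
/// edges carry whole poses, and blend weights live *inside* nodes.
trait PoseNode {
    fn evaluate(&mut self, time: f32) -> Pose;
}

/// A two-input blend node. The weight is node state (often driven
/// by a blueprint parameter each frame), not edge data.
struct BlendTwo {
    a: Box<dyn PoseNode>,
    b: Box<dyn PoseNode>,
    weight: f32,
}

impl PoseNode for BlendTwo {
    fn evaluate(&mut self, time: f32) -> Pose {
        let pose_a = self.a.evaluate(time);
        let pose_b = self.b.evaluate(time);
        // Interpolate every joint transform at once: the node
        // operates on whole poses, not individual joints.
        Pose {
            joint_transforms: pose_a
                .joint_transforms
                .iter()
                .zip(&pose_b.joint_transforms)
                .map(|(a, b)| Transform {
                    translation: a.translation.lerp(b.translation, self.weight),
                    rotation: a.rotation.slerp(b.rotation, self.weight),
                    scale: a.scale.lerp(b.scale, self.weight),
                })
                .collect(),
        }
    }
}
```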
- "Deformer graph": I'm not familiar with this - seems like a dataflow graph focused on more low-level mesh transforms through compute shaders. #### Contrasting Skeletal Animation In `bevy_animation` And Unreal Unreal and Bevy differ a lot in their fundamentals: - Bevy doesn't delineate skeletons and poses. - Joints are first-class entities within the world/scene. - In theory, any joint can reference any mesh and any animation player/graph, regardless of the hierarchy. - Unreal joints are always contained in skeleton descriptions and poses. - Bevy blend graph evaluation is per-joint. - Bevy doesn't support features that read or write multiple joints. - Includes cross-joint constraints, IK, non-parentspace blending/masking, complex retargeting. - Unreal blend graphs work on poses. - Bevy blend graph edges are weighted. - Unreal blend graph edges are unweighted. Nodes control blend weights. - Bevy blend graph nodes are limited to engine defined types. - Unreal allows apps to define new node types. - Bevy requires blend graphs. - Unreal allows anything that can implement a reasonable subset of the `UAnimInstance` interface, and has a "just a single animation please" version. - Bevy joints are identified by a UUID built from the joint name and the names of all its ancestors. - Unreal joints just have names, which are required to be unique within a skeleton. I'm also gonna note a few cases where they're fairly close but still different. - Bevy only allows a single blend graph to affect a joint. - Unreal allows nesting/instancing of multiple graphs. - Bevy animation players are separate from the blend graph, but can be referenced by a blend graph node. - Unreal animation players are kinda just graph nodes, although there's some magic underneath for cross-player syncing. - Bevy animation clips are somewhat tied to graph evaluation (see `AnimationCurve` trait). - Unreal animation clips are easier to treat as plain data. - Bevy animation clips are curves of any interpolable type with irregular keys. - Unreal animation clips have optimised support for joint transforms and float attributes. - These are compressed and might be baked down to regular, linearly interpolated keys. - There's also support for arbitrary attributes bound to joints, some of which are non-interpolable (e.g. strings). - Bevy animation clip events are non-serializable closures at a point in time. - Unreal animation clip events are serializable objects with a overridable callback, and can optionally span time with begin/tick/end callbacks. - Bevy blend graph evaluation uses a system thread local to minimise allocations (`AnimationEvaluationState`). - Unreal uses a thread local bump allocator that's shared with other systems. ## Animation in Unity 6 TODO ## Animation in Godot 4 TODO + Talk about blend-trees (similar to our animation graph) ## Animation in Source 2 TODO + Talk about their animation graph