Input managers have one core goal: transforming messy, raw user inputs from the meat hands of your players into a pure, beautiful stream of actions, which represent semantic tasks or states for the game (or business) logic to interpret.
This seems simple: define a set of inputs, a set of actions, create a hashmap and you're golden. Right?
Wrong!
Many years ago, I sat down to build leafwing-input-manager,
aiming to create a clean, ergonomic API for setting up keybindings in simple games.
In the years that have passed, my horror has only grown as the sheer complexity of the domain has slowly unfolded. Can we make something that's simple enough for beginners, pleasant enough to spark joy, and powerful enough to handle the complexities of even a demanding finished video game?
As we dream, let's break down the three key parts of any input manager:
- user inputs
- player actions
- picking the right actions
We should keep the following secondary goals in mind: they have a big impact on the correct choice of representation!
- actions should be mockable (great for AI, essential for testing)
- input maps should be serializable (easy keybinding menus)
- action state should be serializable (great for networking)
- everything should work correctly even when frame times are inconsistent or simulation steps are decoupled from input polling (like Bevy's fixed timestep)
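To see why mockability matters, consider what happens when game logic reads actions through a small interface rather than raw device state: tests and AI agents can then drive it without any hardware attached. The sketch below is illustrative only; the trait and type names are hypothetical, not the crate's API.

```rust
use std::collections::HashSet;

/// Anything that can answer "is this action currently pressed?".
trait ActionSource {
    fn pressed(&self, action: &str) -> bool;
}

/// A mock source: actions are "pressed" by inserting them into a set,
/// no keyboard or gamepad required.
#[derive(Default)]
struct MockActions(HashSet<String>);

impl MockActions {
    fn press(&mut self, action: &str) {
        self.0.insert(action.to_string());
    }
}

impl ActionSource for MockActions {
    fn pressed(&self, action: &str) -> bool {
        self.0.contains(action)
    }
}

/// Game logic only sees the trait, so it cannot tell a mock from a player.
fn should_jump(actions: &impl ActionSource) -> bool {
    actions.pressed("Jump")
}

fn main() {
    let mut ai = MockActions::default();
    assert!(!should_jump(&ai));
    ai.press("Jump");
    assert!(should_jump(&ai));
}
```

The same seam that makes testing easy is what lets AI-controlled characters share code paths with player-controlled ones.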
Credit where it's due: the folks behind Unreal's Enhanced Input and other game engines have also done a ton of thinking about this domain! This document synthesizes what I've seen in those systems and Shatur's work in bevy_enhanced_input.
The first step is capturing user input in all of its diverse forms, and storing it in a single representation that can be bound to an action.
Keyboards and gamepads and mice oh my!
AXIS, BUTTON ETC
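One way to store this diversity in a single bindable form, following the Button / Axis split discussed later in this document, is a small family of enums with one variant per device family plus a `Custom` escape hatch. All names here are an illustrative sketch, not the crate's real types.

```rust
/// A button-like input: it is either pressed or released.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum ButtonlikeInput {
    Key(&'static str),    // e.g. Key("Space")
    MouseButton(u8),      // e.g. MouseButton(0) for left click
    GamepadButton(u8),    // e.g. GamepadButton(0) for "south"
    Custom(&'static str), // escape hatch for unconventional devices
}

/// An axis-like input: it produces a continuous value, typically in [-1.0, 1.0].
#[derive(Debug, Clone, PartialEq)]
enum AxislikeInput {
    GamepadStickX,
    GamepadStickY,
    MouseWheel,
    Custom(&'static str),
}

fn main() {
    // Inputs from different physical devices now share one type,
    // so a single binding list can mix them freely.
    let jump_bindings = vec![
        ButtonlikeInput::Key("Space"),
        ButtonlikeInput::GamepadButton(0),
    ];
    assert_eq!(jump_bindings.len(), 2);
    assert_ne!(jump_bindings[0], jump_bindings[1]);
}
```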
The traditional game input settings menu has two distinct, parallel keybinding menus: "gamepad controls" and "keyboard-mouse controls".
This idea, that users interact with your software through a specific physical device (or set of devices), shapes the design of input managers.
In practice, there are four distinct modalities seen in games today:
- keyboard and mouse
- gamepad (or flight stick or something else pretending to be a gamepad)
- touch screens (and gyros)
- virtual reality
This raises three questions:
- Should we explicitly model input modalities?
- Should we allow cross-device chords?
- Should we allow users to store inputs from the "wrong" modality in the associated map?
Enums and traits
DEADZONES
INPUT VS ACTION-LEVEL
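As one concrete data point for the input-level vs action-level question: here is a standard radial deadzone applied at the input level, before any action sees the value. The rescaling step avoids a discontinuous "jump" at the deadzone boundary. This is a generic sketch of the technique, not this library's implementation; the 0.1 threshold is an illustrative default.

```rust
/// Apply a radial deadzone to a 2D stick reading.
/// Returns (0.0, 0.0) inside the deadzone; otherwise rescales the
/// magnitude so the output still spans the full range smoothly.
fn apply_deadzone(x: f32, y: f32, threshold: f32) -> (f32, f32) {
    let magnitude = (x * x + y * y).sqrt();
    if magnitude < threshold {
        return (0.0, 0.0);
    }
    // Remap magnitude from [threshold, 1.0] back to [0.0, 1.0],
    // then scale the original vector to that new length.
    let scale = ((magnitude - threshold) / (1.0 - threshold)) / magnitude;
    (x * scale, y * scale)
}

fn main() {
    // Inside the deadzone: fully zeroed, so stick drift never leaks through.
    assert_eq!(apply_deadzone(0.05, 0.0, 0.1), (0.0, 0.0));
    // Full deflection still reaches 1.0, thanks to the rescaling.
    let (x, _) = apply_deadzone(1.0, 0.0, 0.1);
    assert!((x - 1.0).abs() < 1e-6);
}
```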
Player actions
Sending actions
MUST BE MOCKABLE
Reading actions
STATE-BASED AND EVENT-BASED
BEVY'S JUST_PRESSED IS A MUDDLED COMPROMISE?
FIXED TIMESTEP
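The subtlety with "just pressed" semantics is that they are only meaningful relative to an explicit tick boundary, and once simulation steps are decoupled from input polling, which loop advances that boundary determines what "just" means. A minimal sketch of button-like action state (illustrative names, not the crate's API):

```rust
/// Minimal button-like action state with edge detection.
#[derive(Default)]
struct ButtonState {
    pressed: bool,
    just_pressed: bool,
    just_released: bool,
}

impl ButtonState {
    /// Called once per tick, *before* new inputs are applied.
    /// If this is driven by the render loop, `just_pressed` means
    /// "since the last frame"; if driven by a fixed timestep, it means
    /// "since the last simulation step". Those are different answers!
    fn tick(&mut self) {
        self.just_pressed = false;
        self.just_released = false;
    }

    fn press(&mut self) {
        if !self.pressed {
            self.just_pressed = true;
        }
        self.pressed = true;
    }

    fn release(&mut self) {
        if self.pressed {
            self.just_released = true;
        }
        self.pressed = false;
    }
}

fn main() {
    let mut jump = ButtonState::default();
    jump.press();
    assert!(jump.just_pressed && jump.pressed);
    // After the next tick the press is no longer "new"...
    jump.tick();
    assert!(!jump.just_pressed);
    // ...but the button is still held.
    assert!(jump.pressed);
}
```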
Picking the right action
Context is king
Putting it all together
Phew, that was a lot! Here are my thoughts on the critical questions identified above:
- how do we unify and organize user input?
- user inputs are split into Button / Axis / DualAxis etc enums
- variants correspond to things like gamepad, mouse buttons etc
- custom inputs go into a Custom variant that stores a &'static str
- should chords be of fixed length?
- no, but optimize for length two
- should we explicitly model input modalities?
- yes: this is the standard way to define keybindings
- should we allow cross-device chords?
- yes: Keyboard + Mouse chords are common and important
- should we allow users to store inputs from the "wrong" modality in the associated map?
- by default, yes. This is a good escape hatch for unconventional inputs, especially for disabled users.
- should we allow cross-modality chords?
- yes, we have to support this for KBM at least
- generally nice for accessibility
- how should we define actions?
- each action gets its own type
- this plays nicely with observers
- the kind of each action can be encoded in the type system
- easily extended with modding etc
- how are inputs mapped?
- each context gets its own InputMap
- registered one at a time using a fluent API in (input, action) pairs
- how should users respond to actions?
- for interrupt driven actions (e.g. Jump), use observers
- for state polling, check a global ActionState resource in an ordinary system
- how do we select the right action to use?
- context: only the bindings in the active context's InputMap are considered
- optionally, the correct input modality is detected based on inputs received from a matching device
- within a context, how do we decide which actions to prioritize?
- use insertion order: earlier actions take priority
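The answers above can be sketched end to end. This is a hypothetical toy, not the real leafwing-input-manager API; it also uses a single Action enum for brevity, even though the design above gives each action its own type. It shows a per-context InputMap built with a fluent API from (input, action) pairs, where insertion order decides priority.

```rust
/// One action per variant, for brevity; the design above would use one type each.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    Jump,
    Interact,
}

/// A simplified user input, standing in for the Button/Axis enums above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Input {
    Key(&'static str),
    GamepadButton(u8),
}

/// One InputMap per context; bindings stay in insertion order,
/// so earlier entries win when the same input is bound twice.
#[derive(Default)]
struct InputMap {
    bindings: Vec<(Input, Action)>,
}

impl InputMap {
    /// Fluent registration of a single (input, action) pair.
    fn with(mut self, input: Input, action: Action) -> Self {
        self.bindings.push((input, action));
        self
    }

    /// Resolve a pressed input to at most one action via insertion order.
    fn resolve(&self, pressed: Input) -> Option<Action> {
        self.bindings
            .iter()
            .find(|(input, _)| *input == pressed)
            .map(|(_, action)| *action)
    }
}

fn main() {
    // The "gameplay" context, built fluently.
    let gameplay = InputMap::default()
        .with(Input::Key("Space"), Action::Jump)
        .with(Input::Key("Space"), Action::Interact) // shadowed: lower priority
        .with(Input::GamepadButton(0), Action::Jump);

    assert_eq!(gameplay.resolve(Input::Key("Space")), Some(Action::Jump));
    assert_eq!(gameplay.resolve(Input::GamepadButton(0)), Some(Action::Jump));
    assert_eq!(gameplay.resolve(Input::Key("E")), None);
}
```

Because the bindings are a plain ordered list of pairs, this representation also serializes naturally, which is exactly what a keybinding menu needs.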
The Hard Bits
Start with these as examples:
- input binding menu
- serializing and deserializing input maps