Input managers have one core goal: transforming messy, raw user inputs from the meat hands of your players into a pure, beautiful stream of actions, which represent semantic tasks or states for the game (or business) logic to interpret.
This seems simple: define a set of inputs, a set of actions, create a hashmap and you're golden. Right?
Wrong!
Many years ago, I sat down to build leafwing-input-manager,
aiming to create a clean, ergonomic API for setting up keybindings in simple games.
In the years that have passed, my horror has only grown as the sheer complexity of the domain has slowly unfolded. Can we make something that's simple enough for beginners, pleasant enough to spark joy, and powerful enough to handle the complexities of even a demanding finished video game?
As we dream, let's break down the three key parts of any input manager:
- user inputs
- player actions
- picking the right actions
We should keep the following secondary goals in mind: they have a big impact on the correct choice of representation!
- actions should be mockable (great for AI, essential for testing)
- input maps should be serializable (easy keybinding menus)
- action state should be serializable (great for networking)
- everything should work correctly even when frame times are inconsistent or simulation steps are decoupled from input polling (like Bevy's fixed timestep)
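To see why mockability matters, consider what happens when game logic reads actions through a small interface rather than raw device state: tests and AI agents can then drive it without any hardware attached. The sketch below is illustrative only; the trait and type names are hypothetical, not the crate's API.

```rust
use std::collections::HashSet;

/// Anything that can answer "is this action currently pressed?".
trait ActionSource {
    fn pressed(&self, action: &str) -> bool;
}

/// A mock source: actions are "pressed" by inserting them into a set,
/// no keyboard or gamepad required.
#[derive(Default)]
struct MockActions(HashSet<String>);

impl MockActions {
    fn press(&mut self, action: &str) {
        self.0.insert(action.to_string());
    }
}

impl ActionSource for MockActions {
    fn pressed(&self, action: &str) -> bool {
        self.0.contains(action)
    }
}

/// Game logic only sees the trait, so it cannot tell a mock from a player.
fn should_jump(actions: &impl ActionSource) -> bool {
    actions.pressed("Jump")
}

fn main() {
    let mut ai = MockActions::default();
    assert!(!should_jump(&ai));
    ai.press("Jump");
    assert!(should_jump(&ai));
}
```

The same seam that makes testing easy is what lets AI-controlled characters share code paths with player-controlled ones.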
Credit where it's due: the folks behind Unreal's Enhanced Input and other game engines have also done a ton of thinking about this domain! This document synthesizes what I've seen in those systems and Shatur's work in bevy_enhanced_input.
The first step is capturing user input in all of its diverse forms, and storing it in a single representation that can be bound to an action.
Keyboards and gamepads and mice oh my!
AXIS, BUTTON ETC
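One way to store this diversity in a single bindable form, following the Button / Axis split discussed later in this document, is a small family of enums with one variant per device family plus a `Custom` escape hatch. All names here are an illustrative sketch, not the crate's real types.

```rust
/// A button-like input: it is either pressed or released.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum ButtonlikeInput {
    Key(&'static str),    // e.g. Key("Space")
    MouseButton(u8),      // e.g. MouseButton(0) for left click
    GamepadButton(u8),    // e.g. GamepadButton(0) for "south"
    Custom(&'static str), // escape hatch for unconventional devices
}

/// An axis-like input: it produces a continuous value, typically in [-1.0, 1.0].
#[derive(Debug, Clone, PartialEq)]
enum AxislikeInput {
    GamepadStickX,
    GamepadStickY,
    MouseWheel,
    Custom(&'static str),
}

fn main() {
    // Inputs from different physical devices now share one type,
    // so a single binding list can mix them freely.
    let jump_bindings = vec![
        ButtonlikeInput::Key("Space"),
        ButtonlikeInput::GamepadButton(0),
    ];
    assert_eq!(jump_bindings.len(), 2);
    assert_ne!(jump_bindings[0], jump_bindings[1]);
}
```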
The traditional game input settings menu has two distinct, parallel keybinding menus: "gamepad controls" and "keyboard-mouse controls".
This idea, that users interact with your software through a specific physical device (or set of devices), shapes the design of input managers.
In practice, there are four distinct modalities seen in games today:
- keyboard and mouse
- gamepad (or flight stick or something else pretending to be a gamepad)
- touch screens (and gyros)
- virtual reality
This raises three questions:
- Should we explicitly model input modalities?
- Should we allow cross-device chords?
- Should we allow users to store inputs from the "wrong" modality in the associated map?
Enums and traits
DEADZONES
INPUT VS ACTION-LEVEL
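As one concrete data point for the input-level vs action-level question: here is a standard radial deadzone applied at the input level, before any action sees the value. The rescaling step avoids a discontinuous "jump" at the deadzone boundary. This is a generic sketch of the technique, not this library's implementation; the 0.1 threshold is an illustrative default.

```rust
/// Apply a radial deadzone to a 2D stick reading.
/// Returns (0.0, 0.0) inside the deadzone; otherwise rescales the
/// magnitude so the output still spans the full range smoothly.
fn apply_deadzone(x: f32, y: f32, threshold: f32) -> (f32, f32) {
    let magnitude = (x * x + y * y).sqrt();
    if magnitude < threshold {
        return (0.0, 0.0);
    }
    // Remap magnitude from [threshold, 1.0] back to [0.0, 1.0],
    // then scale the original vector to that new length.
    let scale = ((magnitude - threshold) / (1.0 - threshold)) / magnitude;
    (x * scale, y * scale)
}

fn main() {
    // Inside the deadzone: fully zeroed, so stick drift never leaks through.
    assert_eq!(apply_deadzone(0.05, 0.0, 0.1), (0.0, 0.0));
    // Full deflection still reaches 1.0, thanks to the rescaling.
    let (x, _) = apply_deadzone(1.0, 0.0, 0.1);
    assert!((x - 1.0).abs() < 1e-6);
}
```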
Player actions
Sending actions
MUST BE MOCKABLE
Reading actions
STATE-BASED AND EVENT-BASED
BEVY'S JUST_PRESSED IS A MUDDLED COMPROMISE?
FIXED TIMESTEP
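The subtlety with "just pressed" semantics is that they are only meaningful relative to an explicit tick boundary, and once simulation steps are decoupled from input polling, which loop advances that boundary determines what "just" means. A minimal sketch of button-like action state (illustrative names, not the crate's API):

```rust
/// Minimal button-like action state with edge detection.
#[derive(Default)]
struct ButtonState {
    pressed: bool,
    just_pressed: bool,
    just_released: bool,
}

impl ButtonState {
    /// Called once per tick, *before* new inputs are applied.
    /// If this is driven by the render loop, `just_pressed` means
    /// "since the last frame"; if driven by a fixed timestep, it means
    /// "since the last simulation step". Those are different answers!
    fn tick(&mut self) {
        self.just_pressed = false;
        self.just_released = false;
    }

    fn press(&mut self) {
        if !self.pressed {
            self.just_pressed = true;
        }
        self.pressed = true;
    }

    fn release(&mut self) {
        if self.pressed {
            self.just_released = true;
        }
        self.pressed = false;
    }
}

fn main() {
    let mut jump = ButtonState::default();
    jump.press();
    assert!(jump.just_pressed && jump.pressed);
    // After the next tick the press is no longer "new"...
    jump.tick();
    assert!(!jump.just_pressed);
    // ...but the button is still held.
    assert!(jump.pressed);
}
```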
Picking the right action
Context is king
Putting it all together
Phew, that was a lot! Here are my thoughts on the critical questions identified above:
- how do we unify and organize user input?
- user inputs are split into Button / Axis / DualAxis etc enums
- variants correspond to things like gamepad, mouse buttons etc
- custom inputs go into a Custom variant that stores a &'static str
- should chords be of fixed length?
- no, but optimize for length two
- should we explicitly model input modalities?
- yes: this is the standard way to define keybindings
- should we allow cross-device chords?
- yes: Keyboard + Mouse chords are common and important
- should we allow users to store inputs from the "wrong" modality in the associated map?
- by default, yes. This is a good escape hatch for unconventional inputs, especially for disabled users.
- should we allow cross-modality chords?
- yes, we have to support this for KBM at least
- generally nice for accessibility
- how should we define actions?
- each action gets its own type
- this plays nicely with observers
- the kind of each action can be encoded in the type system
- easily extended with modding etc
- how are inputs mapped?
- each context gets its own InputMap
- registered one at a time using a fluent API in (input, action) pairs
- how should users respond to actions?
- for interrupt driven actions (e.g. Jump), use observers
- for state polling, check a global ActionState resource in an ordinary system
- how do we select the right action to use?
- context: only the bindings in the active context's InputMap are considered
- optionally, the correct input modality is detected based on inputs received from a matching device
- within a context, how do we decide which actions to prioritize?
- use insertion order: earlier actions take priority
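The answers above can be sketched end to end. This is a hypothetical toy, not the real leafwing-input-manager API; it also uses a single Action enum for brevity, even though the design above gives each action its own type. It shows a per-context InputMap built with a fluent API from (input, action) pairs, where insertion order decides priority.

```rust
/// One action per variant, for brevity; the design above would use one type each.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    Jump,
    Interact,
}

/// A simplified user input, standing in for the Button/Axis enums above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Input {
    Key(&'static str),
    GamepadButton(u8),
}

/// One InputMap per context; bindings stay in insertion order,
/// so earlier entries win when the same input is bound twice.
#[derive(Default)]
struct InputMap {
    bindings: Vec<(Input, Action)>,
}

impl InputMap {
    /// Fluent registration of a single (input, action) pair.
    fn with(mut self, input: Input, action: Action) -> Self {
        self.bindings.push((input, action));
        self
    }

    /// Resolve a pressed input to at most one action via insertion order.
    fn resolve(&self, pressed: Input) -> Option<Action> {
        self.bindings
            .iter()
            .find(|(input, _)| *input == pressed)
            .map(|(_, action)| *action)
    }
}

fn main() {
    // The "gameplay" context, built fluently.
    let gameplay = InputMap::default()
        .with(Input::Key("Space"), Action::Jump)
        .with(Input::Key("Space"), Action::Interact) // shadowed: lower priority
        .with(Input::GamepadButton(0), Action::Jump);

    assert_eq!(gameplay.resolve(Input::Key("Space")), Some(Action::Jump));
    assert_eq!(gameplay.resolve(Input::GamepadButton(0)), Some(Action::Jump));
    assert_eq!(gameplay.resolve(Input::Key("E")), None);
}
```

Because the bindings are a plain ordered list of pairs, this representation also serializes naturally, which is exactly what a keybinding menu needs.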
The Hard Bits
Start with these as examples:
- input binding menu
- serializing and deserializing input maps