The Bevy Audio Backend Problem (aka `common-audio-traits`)

Summary

There are many audio backend libraries for Bevy, and authors of non-backend audio libraries must implement their libraries once for every audio backend. This is miserable for a non-backend audio library author.

Motivation

The default audio crate for Bevy, `bevy_audio`, lacks many features. As a result, many people have made alternative audio libraries, such as `bevy_kira_audio` and `bevy_oddio`, built on different audio backends (`kira` and `oddio`, respectively).

This fractures the Bevy audio ecosystem: libraries that deal with audio in Bevy (for example, `bevy_fundsp`) have to implement their functionality separately for every Bevy audio backend library.

If the backend of `bevy_audio` is abstracted, future non-backend audio libraries will not have to rely on backend-specific details. Users can simply plug in their preferred audio backend (the default will be `rodio`), and library authors can write their code without relying on any specific underlying implementation.

In the case of `bevy_fundsp`, three different implementations are written and gated behind mutually exclusive features. This violates Cargo's assumption that features are additive, so confusing errors pop up that the user may not understand at first glance.
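A minimal sketch of the mutually exclusive feature pattern, assuming hypothetical feature names `kira` and `oddio` and a hypothetical `backend_name` function (this is not `bevy_fundsp`'s actual code). With neither feature enabled, it falls back to a `rodio` implementation:

```rust
// Hypothetical features; a real crate would declare `kira` and `oddio`
// under `[features]` in its Cargo.toml.
#[cfg(all(feature = "kira", feature = "oddio"))]
compile_error!("features `kira` and `oddio` are mutually exclusive");

/// Name of the backend this build was compiled against.
#[cfg(feature = "kira")]
pub fn backend_name() -> &'static str {
    "kira"
}

#[cfg(all(feature = "oddio", not(feature = "kira")))]
pub fn backend_name() -> &'static str {
    "oddio"
}

// Fallback when neither feature is enabled.
#[cfg(not(any(feature = "kira", feature = "oddio")))]
pub fn backend_name() -> &'static str {
    "rodio"
}
```

Because Cargo unifies the features requested by every crate in the dependency graph, two unrelated dependencies that enable `kira` and `oddio` respectively will trigger the `compile_error!`, even though neither crate did anything wrong on its own.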

Problems Faced by Bevy Audio Libraries

- Complexity increases with every new backend crate.
- Backend implementation details leak into audio library implementations.
- Backend APIs are inconsistent with one another.

Case Studies of Audio Libraries in Bevy

Here is some real code that reflects this fracture of the audio backend ecosystem (if you have your own audio library, please share your implementation in a reply).

`bevy_fundsp`

Copied from `bevy_fundsp`'s backend abstraction:

```rust
trait AudioBackend {
    /// The static audio source.
    /// Usually stores a collection of sound bytes.
    type AudioSource;

    /// Initialization of App that is specific for the given Backend.
    fn init_app(app: &mut App);
    /// Convert the given [`DspSource`] to the defined static audio source.
    fn convert_to_audio_source(dsp_source: DspSource) -> Self::AudioSource;
}
```

This is the initial code `bevy_fundsp` uses to abstract over different audio backends.
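To make the shape of this trait concrete, here is a minimal sketch of one backend implementation. `App` and `DspSource` are simplified stand-ins for the Bevy and `bevy_fundsp` types, and `RodioLikeBackend` is hypothetical; a library following this pattern needs one such `impl` per supported backend:

```rust
use std::sync::Arc;

// Simplified stand-ins for `bevy::prelude::App` and `bevy_fundsp`'s `DspSource`.
#[derive(Default)]
pub struct App {
    pub resources: Vec<&'static str>,
}

pub struct DspSource {
    pub samples: Vec<f32>,
}

trait AudioBackend {
    type AudioSource;
    fn init_app(app: &mut App);
    fn convert_to_audio_source(dsp_source: DspSource) -> Self::AudioSource;
}

/// Hypothetical backend whose static source is a shared byte buffer,
/// loosely mirroring `bevy_audio`'s `Arc<[u8]>`.
pub struct RodioLikeBackend;

impl AudioBackend for RodioLikeBackend {
    type AudioSource = Arc<[u8]>;

    fn init_app(app: &mut App) {
        // A real backend would add its plugin, resources, and systems here.
        app.resources.push("rodio-like output");
    }

    fn convert_to_audio_source(dsp_source: DspSource) -> Self::AudioSource {
        // Encode each sample as 4 little-endian bytes.
        dsp_source
            .samples
            .iter()
            .flat_map(|sample| sample.to_le_bytes())
            .collect()
    }
}
```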

Tangentially related is `bevy_fundsp`'s extension trait for `Audio` types:

```rust
/// Extension trait to add a helper method for playing DSP sources.
pub trait DspAudioExt {
    /// The [`Assets`](bevy::prelude::Assets)
    /// for the concrete `Audio` type in the given backend.
    type Assets;
    /// The settings that are usually passed
    /// to the concrete `Audio` type of the given backend.
    type Settings: Default;
    /// The audio sink that is usually returned
    /// when playing the given DSP source.
    type Sink;

    /// Play the given [`DspSource`] with the given settings.
    fn play_dsp_with_settings(
        &mut self,
        assets: &mut Self::Assets,
        source: &DspSource,
        settings: Self::Settings,
    ) -> Self::Sink;

    /// Play the given [`DspSource`] with the default settings.
    fn play_dsp(&mut self, assets: &mut Self::Assets, source: &DspSource) -> Self::Sink {
        self.play_dsp_with_settings(assets, source, default())
    }
}
```
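A minimal sketch of how a backend would fill in the associated types, using hypothetical `MockAudio`, `MockAssets`, `MockSettings`, and `MockSink` types in place of a real backend's `Audio`, `Assets`, settings, and sink. `DspSource` is a simplified stand-in, and `Default::default()` plays the role of Bevy's `default()` helper:

```rust
pub struct DspSource {
    pub name: &'static str,
}

pub trait DspAudioExt {
    type Assets;
    type Settings: Default;
    type Sink;

    fn play_dsp_with_settings(
        &mut self,
        assets: &mut Self::Assets,
        source: &DspSource,
        settings: Self::Settings,
    ) -> Self::Sink;

    fn play_dsp(&mut self, assets: &mut Self::Assets, source: &DspSource) -> Self::Sink {
        // `Default::default()` stands in for Bevy's `default()`.
        self.play_dsp_with_settings(assets, source, Default::default())
    }
}

// Hypothetical backend types.
#[derive(Default)]
pub struct MockAudio {
    pub played: Vec<(&'static str, f32)>,
}

#[derive(Default)]
pub struct MockAssets {
    pub count: usize,
}

pub struct MockSettings {
    pub volume: f32,
}

impl Default for MockSettings {
    fn default() -> Self {
        MockSettings { volume: 1.0 }
    }
}

pub struct MockSink {
    pub id: usize,
}

impl DspAudioExt for MockAudio {
    type Assets = MockAssets;
    type Settings = MockSettings;
    type Sink = MockSink;

    fn play_dsp_with_settings(
        &mut self,
        assets: &mut MockAssets,
        source: &DspSource,
        settings: MockSettings,
    ) -> MockSink {
        // A real backend would turn `source` into an asset stored in `assets`.
        assets.count += 1;
        self.played.push((source.name, settings.volume));
        MockSink { id: assets.count }
    }
}
```

Only `play_dsp_with_settings` is backend-specific; `play_dsp` comes for free from the `Settings: Default` bound.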

Requirements

These requirements should be met when designing an implementation for this (potential) RFC.

- Abstraction of audio backends. `bevy_audio` should allow different backends such as `rodio`, `kira`, `oddio`, or other Rust libraries. Users can simply choose which backend to use, and non-backend library authors do not need to care about its implementation details.
- Extensibility of a bare-minimum interface. Some backends have features that others lack. The design shouldn't restrict backend library authors who want to add features that aren't defined by `bevy_audio`.
- Interoperability. Audio sources should not care about their underlying representation, only about whether they are a custom `Read + Seek` or an `Iterator<Item = Frame>`.

User-facing explanation

Common audio types

Here are the common types that are present for all audio backend libraries:

- Static audio sources
- Audio sinks
- A type that allows users to play audio (the `Audio` type)
- A trait for custom audio (`Send` but not `Sync`)
- A trait for audio data that implements `Asset` (and is therefore `Sync`)
- Audio control types (for custom audio)

Static audio sources

These are types that contain all the bytes of the audio. The audio bytes are loaded into memory all at once.

| Library | Type | Note |
| --- | --- | --- |
| `bevy_audio` | `AudioSource` | `rodio` does not have an equivalent static audio source. `bevy_audio` uses `Arc<[u8]>` internally. |
| `bevy_kira_audio` | `AudioSource` | Uses `StaticSoundData` from `kira` internally. |
| `bevy_oddio` | `AudioSource` | Uses `oddio::Frames` internally. |

Audio sinks

| Library | Type |
| --- | --- |
| `bevy_audio` | `AudioSink` |
| `bevy_kira_audio` | `PlayAudioCommand` |
| `bevy_oddio` | `AudioSink` |

Type that allows users to play audio

These are usually accessed through a resource.

| Library | Type | Note |
| --- | --- | --- |
| `bevy_audio` | `Audio` | |
| `bevy_kira_audio` | `AudioChannel` or `DynamicAudioChannel` | `Audio` is simply a type alias for `AudioChannel<MainTrack>`. |
| `bevy_oddio` | `Audio` | |

The trait for custom audio

These traits allow users to implement their own audio type.

| Library | Trait | Note |
| --- | --- | --- |
| `bevy_audio` | `Decodable::Decoder` | Internally uses `rodio::Source`. |
| `bevy_kira_audio` | Not supported (as of 0.12) | `kira` has `Sound`. |
| `bevy_oddio` | `oddio::Signal` | Exposes `oddio::Signal` directly. |

The trait that is converted to the playing audio

This trait is typically needed because the type must implement Bevy's `Asset`, which requires `Sync`.

| Library | Trait | Note |
| --- | --- | --- |
| `bevy_audio` | `Decodable` | `rodio` has no equivalent trait. |
| `bevy_kira_audio` | Not supported (as of 0.12) | `kira` has `SoundData`. |
| `bevy_oddio` | `ToSignal` | `oddio` has no equivalent trait. |

The trait that allows control of playing audio

This is distinct from audio sinks, as it allows custom, source-specific control of the playing audio.

| Library | Trait | Note |
| --- | --- | --- |
| `bevy_audio` | N/A | `rodio` has no equivalent trait. |
| `bevy_kira_audio` | N/A | `kira` has `SoundData::Handle`. |
| `bevy_oddio` | `oddio::Signal` that implements `Controlled`, which has `Controlled::Control` | |

Supported audio files

- mp3
- wave
- ogg
- flac

Miscellaneous Terminologies

Backend: libraries that handle the playing of audio.

Implementation Strategy

`AudioSource`

`AudioSource` is now a trait (called `Source` below). `Frame` is simply `[f32; 2]` or similar, representing the left and right channels of stereo audio.

```rust
/// Similar to rodio's Source, oddio's Signal,
/// and kira's Sound.
trait Source: Iterator<Item = Frame> + Send {
    /// Controls must implement the basic functionality of an audio sink.
    /// They can, however, make their own methods
    /// specific to the type implementing `Source`.
    type Control: Sink;
    /// Get the next frame after `delta` seconds
    /// have passed.
    fn tick(&mut self, delta: f64) -> Frame;
}
```
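A minimal sketch of a `Source` implementation: a hypothetical `Sine` generator whose `Control` is a hypothetical shared volume handle, `VolumeControl`. The `Sink` trait is trimmed to its two volume methods to keep the example short, and `next` simply calls `tick` with one sample period:

```rust
use std::f64::consts::TAU;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

/// One stereo frame: `[left, right]`.
pub type Frame = [f32; 2];

/// Trimmed `Sink`: only the volume methods.
pub trait Sink {
    fn volume(&self) -> f32;
    fn set_volume(&self, volume: f32);
}

pub trait Source: Iterator<Item = Frame> + Send {
    type Control: Sink;
    fn tick(&mut self, delta: f64) -> Frame;
}

/// Shared volume handle; the `f32` is stored as raw bits in an atomic.
#[derive(Clone)]
pub struct VolumeControl(Arc<AtomicU32>);

impl Sink for VolumeControl {
    fn volume(&self) -> f32 {
        f32::from_bits(self.0.load(Ordering::Relaxed))
    }
    fn set_volume(&self, volume: f32) {
        self.0.store(volume.to_bits(), Ordering::Relaxed);
    }
}

/// Hypothetical sine-wave generator.
pub struct Sine {
    frequency: f64,
    sample_rate: f64,
    phase: f64, // position within one cycle, in [0, 1)
    control: VolumeControl,
}

impl Sine {
    pub fn new(frequency: f64, sample_rate: f64) -> Self {
        let control = VolumeControl(Arc::new(AtomicU32::new(1.0f32.to_bits())));
        Sine { frequency, sample_rate, phase: 0.0, control }
    }

    /// Handle the game thread can keep after the source is sent away.
    pub fn control(&self) -> VolumeControl {
        self.control.clone()
    }
}

impl Source for Sine {
    type Control = VolumeControl;

    fn tick(&mut self, delta: f64) -> Frame {
        // Advance by `delta` seconds, then sample the wave.
        self.phase = (self.phase + self.frequency * delta).fract();
        let sample = (self.phase * TAU).sin() as f32 * self.control.volume();
        [sample, sample]
    }
}

impl Iterator for Sine {
    type Item = Frame;

    fn next(&mut self) -> Option<Frame> {
        // A sine wave never ends: one frame per sample period.
        Some(self.tick(1.0 / self.sample_rate))
    }
}
```

Storing the volume as the raw bits of an `f32` in an `AtomicU32` lets the game thread change the volume without a lock while the audio thread renders; a `Mutex<f32>` would also work.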

Static audio sources

There should be a way to convert a vector of frames into a static audio source.

```rust
trait StaticSource: Source {
    fn into_static_source<I>(frames: I) -> Self
    where
        I: IntoIterator<Item = Frame>,
        I::IntoIter: ExactSizeIterator;
}
```
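A minimal sketch of a `StaticSource`, assuming a hypothetical `StaticFrames` type backed by an `Arc<[Frame]>`, a hypothetical placeholder control `NoControl`, and a `Sink` trimmed to one method. The proposed `into_static_source` signature carries no sample rate, so the sketch hard-codes 44.1 kHz:

```rust
use std::sync::Arc;

/// One stereo frame: `[left, right]`.
pub type Frame = [f32; 2];

/// Trimmed `Sink`: one method is enough for this sketch.
pub trait Sink {
    fn is_paused(&self) -> bool;
}

pub trait Source: Iterator<Item = Frame> + Send {
    type Control: Sink;
    fn tick(&mut self, delta: f64) -> Frame;
}

pub trait StaticSource: Source {
    fn into_static_source<I>(frames: I) -> Self
    where
        I: IntoIterator<Item = Frame>,
        I::IntoIter: ExactSizeIterator;
}

/// Placeholder control for a source with no extra controls.
pub struct NoControl;

impl Sink for NoControl {
    fn is_paused(&self) -> bool {
        false
    }
}

/// Hypothetical in-memory source (assumes 44.1 kHz audio).
pub struct StaticFrames {
    frames: Arc<[Frame]>,
    position: f64, // fractional index into `frames`
    sample_rate: f64,
}

impl StaticFrames {
    /// Frame at the current position, or `None` past the end.
    fn current(&self) -> Option<Frame> {
        self.frames.get(self.position.round() as usize).copied()
    }
}

impl Iterator for StaticFrames {
    type Item = Frame;
    fn next(&mut self) -> Option<Frame> {
        let frame = self.current()?;
        self.position += 1.0;
        Some(frame)
    }
}

impl Source for StaticFrames {
    type Control = NoControl;
    fn tick(&mut self, delta: f64) -> Frame {
        // Skip ahead `delta` seconds; output silence past the end.
        self.position += delta * self.sample_rate;
        self.current().unwrap_or([0.0, 0.0])
    }
}

impl StaticSource for StaticFrames {
    fn into_static_source<I>(frames: I) -> Self
    where
        I: IntoIterator<Item = Frame>,
        I::IntoIter: ExactSizeIterator,
    {
        let iter = frames.into_iter();
        // `ExactSizeIterator` lets us allocate the buffer exactly once.
        let mut buffer: Vec<Frame> = Vec::with_capacity(iter.len());
        buffer.extend(iter);
        StaticFrames {
            frames: buffer.into(),
            position: 0.0,
            sample_rate: 44_100.0,
        }
    }
}
```

Keeping the frames behind an `Arc` means several playing copies of the same sound could share one buffer.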

`AudioSink`

`AudioSink` is a trait that provides the basic functionality of being… an audio sink.

```rust
// Copy pasted from here: https://github.com/mockersf/bevy/blob/b9dd4d03f37b079c909404af006fa3b946c55414/crates/bevy_audio/src/sinks.rs#L7-L52
trait Sink {
    /// Gets the volume of the sound.
    ///
    /// The value `1.0` is the "normal" volume (unfiltered input). Any value other than `1.0`
    /// will multiply each sample by this value.
    fn volume(&self) -> f32;

    /// Changes the volume of the sound.
    ///
    /// The value `1.0` is the "normal" volume (unfiltered input). Any value other than `1.0`
    /// will multiply each sample by this value.
    fn set_volume(&self, volume: f32);

    /// Gets the speed of the sound.
    ///
    /// The value `1.0` is the "normal" speed (unfiltered input). Any value other than `1.0`
    /// will change the play speed of the sound.
    fn speed(&self) -> f32;

    /// Changes the speed of the sound.
    ///
    /// The value `1.0` is the "normal" speed (unfiltered input). Any value other than `1.0`
    /// will change the play speed of the sound.
    fn set_speed(&self, speed: f32);

    /// Resumes playback of a paused sink.
    ///
    /// No effect if not paused.
    fn play(&self);

    /// Pauses playback of this sink.
    ///
    /// No effect if already paused.
    /// A paused sink can be resumed with [`play`](Self::play).
    fn pause(&self);

    /// Is this sink paused?
    ///
    /// Sinks can be paused and resumed using [`pause`](Self::pause) and [`play`](Self::play).
    fn is_paused(&self) -> bool;

    /// Stops the sink.
    ///
    /// It won't be possible to restart it afterwards.
    fn stop(&self);
}
```
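Because every `Sink` method takes `&self`, an implementation needs interior mutability so that the game thread and the audio thread can share one handle. A minimal sketch, assuming a hypothetical `SharedSink` that keeps its state behind `Arc<Mutex<_>>` and a hypothetical `is_stopped` helper for the audio thread:

```rust
use std::sync::{Arc, Mutex};

/// The proposed `Sink` trait, without doc comments.
pub trait Sink {
    fn volume(&self) -> f32;
    fn set_volume(&self, volume: f32);
    fn speed(&self) -> f32;
    fn set_speed(&self, speed: f32);
    fn play(&self);
    fn pause(&self);
    fn is_paused(&self) -> bool;
    fn stop(&self);
}

struct State {
    volume: f32,
    speed: f32,
    paused: bool,
    stopped: bool,
}

/// Hypothetical handle shared by the game thread and the audio thread.
#[derive(Clone)]
pub struct SharedSink(Arc<Mutex<State>>);

impl SharedSink {
    pub fn new() -> Self {
        let state = State { volume: 1.0, speed: 1.0, paused: false, stopped: false };
        SharedSink(Arc::new(Mutex::new(state)))
    }

    /// What the audio thread would check before rendering more frames.
    pub fn is_stopped(&self) -> bool {
        self.0.lock().unwrap().stopped
    }
}

impl Sink for SharedSink {
    fn volume(&self) -> f32 {
        self.0.lock().unwrap().volume
    }
    fn set_volume(&self, volume: f32) {
        self.0.lock().unwrap().volume = volume;
    }
    fn speed(&self) -> f32 {
        self.0.lock().unwrap().speed
    }
    fn set_speed(&self, speed: f32) {
        self.0.lock().unwrap().speed = speed;
    }
    fn play(&self) {
        let mut state = self.0.lock().unwrap();
        // A stopped sink cannot be restarted.
        if !state.stopped {
            state.paused = false;
        }
    }
    fn pause(&self) {
        self.0.lock().unwrap().paused = true;
    }
    fn is_paused(&self) -> bool {
        self.0.lock().unwrap().paused
    }
    fn stop(&self) {
        self.0.lock().unwrap().stopped = true;
    }
}
```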

`AudioOutput`

AudioOutput is now a trait that simply handles the audio thread.

```rust
trait Output {
    /// plays the audio (usually getting the samples
    /// of the source and feeding it to `cpal`)
    /// and returns the handle of the audio source.
    fn play<Au: Source>(&mut self, source: Au) -> Au::Control;
}
```
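A minimal sketch of an `Output`, assuming a hypothetical `OfflineOutput` that renders a fixed number of frames into memory instead of feeding `cpal`. The proposal does not say how an output obtains `Au::Control` from a source, so this sketch adds a hypothetical `Source::control` method, trims `Sink` to one method, and uses hypothetical `Constant` and `NeverPaused` test types:

```rust
pub type Frame = [f32; 2];

/// Trimmed `Sink`: one method is enough for this sketch.
pub trait Sink {
    fn is_paused(&self) -> bool;
}

pub trait Source: Iterator<Item = Frame> + Send {
    type Control: Sink;
    fn tick(&mut self, delta: f64) -> Frame;
    /// Hypothetical addition: how an output obtains the control handle.
    fn control(&self) -> Self::Control;
}

pub trait Output {
    fn play<Au: Source>(&mut self, source: Au) -> Au::Control;
}

/// Hypothetical output that renders into memory instead of feeding `cpal`.
pub struct OfflineOutput {
    pub frames_per_play: usize,
    pub rendered: Vec<Frame>,
}

impl Output for OfflineOutput {
    fn play<Au: Source>(&mut self, source: Au) -> Au::Control {
        let control = source.control();
        // A real output would move `source` to the audio thread and pull
        // frames on demand; here we render a fixed number synchronously.
        self.rendered.extend(source.take(self.frames_per_play));
        control
    }
}

/// Test source: repeats one frame forever.
pub struct Constant(pub Frame);
pub struct NeverPaused;

impl Sink for NeverPaused {
    fn is_paused(&self) -> bool {
        false
    }
}

impl Iterator for Constant {
    type Item = Frame;
    fn next(&mut self) -> Option<Frame> {
        Some(self.0)
    }
}

impl Source for Constant {
    type Control = NeverPaused;
    fn tick(&mut self, _delta: f64) -> Frame {
        self.0
    }
    fn control(&self) -> NeverPaused {
        NeverPaused
    }
}
```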

`AudioData`

`AudioData` is the `Asset` form of `AudioSource`. An audio source is usually `Send` but not `Sync`, so we add another trait that converts audio data into audio sources. This is similar to `bevy_audio`'s `Decodable`, `kira`'s `SoundData`, and `bevy_oddio`'s `ToSignal` traits.

```rust
trait AudioData {
    type Source: Source;
    type Settings;

    fn to_source(&self, settings: Self::Settings) -> Self::Source;
}

trait StaticAudioData: AudioData
where
    Self::Source: StaticSource,
{
    fn to_static_source(&self, settings: Self::Settings) -> Self::Source;
}
```
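A minimal sketch of the asset/source split, assuming a hypothetical `SoundBuffer` asset whose frames live in an `Arc<[Frame]>`, hypothetical `PlaybackSettings`, `BufferSource`, and `FixedVolume` types, and a `Sink` trimmed to one method. The asset is `Sync` and can be shared freely; each call to `to_source` creates an independent playing source over the same buffer:

```rust
use std::sync::Arc;

pub type Frame = [f32; 2];

/// Trimmed `Sink`: one method is enough for this sketch.
pub trait Sink {
    fn volume(&self) -> f32;
}

pub trait Source: Iterator<Item = Frame> + Send {
    type Control: Sink;
    fn tick(&mut self, delta: f64) -> Frame;
}

pub trait AudioData {
    type Source: Source;
    type Settings;
    fn to_source(&self, settings: Self::Settings) -> Self::Source;
}

/// Hypothetical asset: immutable frames behind an `Arc`, so it is `Send + Sync`.
pub struct SoundBuffer {
    frames: Arc<[Frame]>,
}

/// Hypothetical per-playback settings.
pub struct PlaybackSettings {
    pub volume: f32,
}

/// A playing instance of a `SoundBuffer`, with its own read position.
pub struct BufferSource {
    frames: Arc<[Frame]>,
    position: usize,
    volume: f32,
}

/// Control type that only reports a volume.
pub struct FixedVolume(f32);

impl Sink for FixedVolume {
    fn volume(&self) -> f32 {
        self.0
    }
}

impl Iterator for BufferSource {
    type Item = Frame;
    fn next(&mut self) -> Option<Frame> {
        let [left, right] = *self.frames.get(self.position)?;
        self.position += 1;
        Some([left * self.volume, right * self.volume])
    }
}

impl Source for BufferSource {
    type Control = FixedVolume;
    fn tick(&mut self, _delta: f64) -> Frame {
        // Simplification: ignores `delta`, advances one frame, silence at the end.
        self.next().unwrap_or([0.0, 0.0])
    }
}

impl AudioData for SoundBuffer {
    type Source = BufferSource;
    type Settings = PlaybackSettings;

    fn to_source(&self, settings: PlaybackSettings) -> BufferSource {
        // Cloning the `Arc` is cheap: all playing sources share one buffer.
        BufferSource {
            frames: Arc::clone(&self.frames),
            position: 0,
            volume: settings.volume,
        }
    }
}
```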

`AudioMixer` and `Audio`

`AudioMixer` is the public API, usually accessed by the user through a resource. Implementing libraries should generally provide a type alias that specifies the `AudioOutput` used.

```rust
struct AudioMixer<O> {
    output: O,
    // ...impl details
}

impl<O: Output> AudioMixer<O> {
    fn play<Au: AudioData>(
        &mut self,
        data: Handle<Au>,
        settings: Au::Settings,
    ) -> <Au::Source as Source>::Control {
        // use self.output internally
        todo!()
    }
}

// For example, we use a `rodio` backend library
type Audio = AudioMixer<RodioOutput>;
// and the provided plugin will register this
// type as a resource
```
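A minimal, self-contained sketch of what `AudioMixer::play` might do, assuming a `HashMap` and a hypothetical `Handle(u32)` as stand-ins for Bevy's `Assets` and `Handle`, a hypothetical `Source::control` accessor (the proposal does not specify how the control is obtained), traits trimmed to what the mixer needs (`tick` omitted), and hypothetical test doubles (`CountingOutput`, `Silence`, `NoopSink`, `SilentClip`):

```rust
use std::collections::HashMap;

pub type Frame = [f32; 2];

// Trimmed versions of the proposed traits.
pub trait Sink {
    fn stop(&self);
}

pub trait Source: Iterator<Item = Frame> + Send {
    type Control: Sink;
    /// Hypothetical addition: how an output obtains the control handle.
    fn control(&self) -> Self::Control;
}

pub trait Output {
    fn play<Au: Source>(&mut self, source: Au) -> Au::Control;
}

pub trait AudioData {
    type Source: Source;
    type Settings;
    fn to_source(&self, settings: Self::Settings) -> Self::Source;
}

/// Hypothetical stand-in for Bevy's `Handle<T>`: a key into an asset map.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub struct Handle(u32);

pub struct AudioMixer<O> {
    output: O,
}

impl<O: Output> AudioMixer<O> {
    pub fn new(output: O) -> Self {
        AudioMixer { output }
    }

    /// Returns `None` when `data` does not refer to loaded audio.
    pub fn play<Au: AudioData>(
        &mut self,
        assets: &HashMap<Handle, Au>,
        data: Handle,
        settings: Au::Settings,
    ) -> Option<<Au::Source as Source>::Control> {
        let source = assets.get(&data)?.to_source(settings);
        Some(self.output.play(source))
    }
}

// Hypothetical test doubles for a backend.
pub struct CountingOutput {
    pub plays: usize,
}

impl Output for CountingOutput {
    fn play<Au: Source>(&mut self, source: Au) -> Au::Control {
        // A real output would hand `source` to its audio thread.
        self.plays += 1;
        source.control()
    }
}

pub struct Silence;
pub struct NoopSink;
pub struct SilentClip;

impl Sink for NoopSink {
    fn stop(&self) {}
}

impl Iterator for Silence {
    type Item = Frame;
    fn next(&mut self) -> Option<Frame> {
        Some([0.0, 0.0])
    }
}

impl Source for Silence {
    type Control = NoopSink;
    fn control(&self) -> NoopSink {
        NoopSink
    }
}

impl AudioData for SilentClip {
    type Source = Silence;
    type Settings = ();
    fn to_source(&self, _settings: ()) -> Silence {
        Silence
    }
}

/// What a backend crate would export.
pub type Audio = AudioMixer<CountingOutput>;
```

Returning `Option` deviates from the proposed signature: Bevy loads assets asynchronously, so a handle may not resolve to loaded data yet, and the proposal leaves that case open.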

`AudioLoader`

Audio loaders are simply types that implement AssetLoader. It's up to the backend library authors to implement them.
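Bevy's `AssetLoader` trait has changed between versions, so the sketch below only shows the step audio loaders share: turning bytes into frames. It assumes a hypothetical raw format of interleaved, little-endian, 16-bit stereo PCM and a hypothetical `decode_pcm16_stereo` helper; a real loader would first parse a container such as WAV or decode a compressed format:

```rust
/// One stereo frame: `[left, right]`.
pub type Frame = [f32; 2];

/// Hypothetical helper: decodes interleaved, little-endian, 16-bit stereo PCM.
/// Returns `None` if the input is not a whole number of frames.
pub fn decode_pcm16_stereo(bytes: &[u8]) -> Option<Vec<Frame>> {
    // Each frame is 2 channels x 2 bytes.
    if bytes.len() % 4 != 0 {
        return None;
    }
    let frames = bytes
        .chunks_exact(4)
        .map(|chunk| {
            let left = i16::from_le_bytes([chunk[0], chunk[1]]);
            let right = i16::from_le_bytes([chunk[2], chunk[3]]);
            // Scale the i16 range to [-1.0, 1.0).
            [f32::from(left) / 32768.0, f32::from(right) / 32768.0]
        })
        .collect();
    Some(frames)
}
```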

`AudioPlugin`

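Plugins will not be provided by `bevy_audio`; backend authors will provide them instead. Generally, a backend's plugin will set up its audio thread, systems, resources, and components, including registering its `Audio` type as a resource.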

For Backend Library Authors

Backend library authors must implement the following traits for their own types:

- `Source`
- `StaticSource`
- `Sink`
- `Output`
- `AudioData`

For Users (Both Bevy Game Developers and Non-Backend Library Authors)

Users should essentially see no API changes (except for imports). They can choose their own custom backend (currently rodio, kira, and oddio in the bevy ecosystem).

`AudioPlugin` should come from the backend libraries rather than from `bevy_audio`, as each backend library has its own way of setting up its audio threads, systems, resources, components, etc.
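From a non-backend library author's point of view, the payoff is that code written once against the traits works with any backend. A minimal sketch, assuming a hypothetical `Gain` effect that wraps any `Source`, a hypothetical `Ramp` source and `AlwaysPlaying` control standing in for a backend's types, and a `Sink` trimmed to one method:

```rust
pub type Frame = [f32; 2];

/// Trimmed `Sink`: one method is enough for this sketch.
pub trait Sink {
    fn is_paused(&self) -> bool;
}

pub trait Source: Iterator<Item = Frame> + Send {
    type Control: Sink;
    fn tick(&mut self, delta: f64) -> Frame;
}

/// Hypothetical effect from a non-backend library. It depends only on the
/// `Source` trait, so it works with whichever backend the user picked.
pub struct Gain<S> {
    inner: S,
    gain: f32,
}

impl<S: Source> Gain<S> {
    pub fn new(inner: S, gain: f32) -> Self {
        Gain { inner, gain }
    }

    fn apply(&self, [left, right]: Frame) -> Frame {
        [left * self.gain, right * self.gain]
    }
}

impl<S: Source> Iterator for Gain<S> {
    type Item = Frame;
    fn next(&mut self) -> Option<Frame> {
        let frame = self.inner.next()?;
        Some(self.apply(frame))
    }
}

impl<S: Source> Source for Gain<S> {
    // Reuse the wrapped source's control handle type.
    type Control = S::Control;
    fn tick(&mut self, delta: f64) -> Frame {
        let frame = self.inner.tick(delta);
        self.apply(frame)
    }
}

/// Stand-in for a backend's source: counts up by 1.0 each frame.
pub struct Ramp(pub f32);
pub struct AlwaysPlaying;

impl Sink for AlwaysPlaying {
    fn is_paused(&self) -> bool {
        false
    }
}

impl Iterator for Ramp {
    type Item = Frame;
    fn next(&mut self) -> Option<Frame> {
        self.0 += 1.0;
        Some([self.0, -self.0])
    }
}

impl Source for Ramp {
    type Control = AlwaysPlaying;
    fn tick(&mut self, _delta: f64) -> Frame {
        self.next().unwrap_or([0.0, 0.0])
    }
}
```

`Gain` never names a backend: swapping `Ramp` for a `rodio`, `kira`, or `oddio` source would require no change to the effect itself, and no mutually exclusive features.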

Drawbacks

- This will force backend library authors to rewrite their entire libraries.
- The current design is a lot of trait soup, which will probably confuse some users.

Rationale and alternatives

Why abstract `bevy_audio`?

As a non-backend library author, working with the bevy audio ecosystem is frustrating:

- Since each audio backend has its own idiosyncrasies, the author has to write a separate implementation just to support each backend.
- Since these backends are mutually incompatible, the implementations have to be gated behind non-additive features, which Cargo does not handle well.

Why not improve `rodio`? Why not focus on `kira`?

Each audio backend library has its own use cases:

- `kira` focuses on timing audio correctly, which is why it has functionality related to tweening and clocks.
- `oddio` focuses more on raw digital signal processing. It works through iterator-like combinators that you chain together to manipulate audio in real time. As such, it places less focus on static audio files and more on procedural generation and manipulation of audio.
- `synthizer`, an unpublished audio backend crate, is heavily optimized for binaural audio but brings in a native dependency.
- `rodio` is `rodio`.

Since there are currently three main contenders for backend libraries (`rodio`, `kira`, and `oddio`), non-backend library authors end up creating three mutually exclusive crate features for their libraries. This is a very frustrating experience, as mutually exclusive features are not well supported by Cargo.

Users should not care about implementation details unless they need specific functionality provided by their chosen backend library.

Unresolved questions

What should our default audio backend be?
- rodio? It has many potential problems with its API, as well as a problem with stereo audio (see rodio#444 and bevy#6122).
- kira? This is the best potential default backend for Bevy; however, I found some features lacking, such as sound effects and digital signal processing.
- oddio? This is pretty much bare-bones in terms of features, opting for more flexibility for the user. It has first-class support for spatial audio. It does not, however, output audio; it only manipulates signals. bevy_oddio handles this by using cpal directly.
How does bevy-rrise fit into this RFC?
- bevy-rrise can probably ignore this, as it has a vastly different architecture compared to other backend crates.

Future possibilities

Traits for spatial audio. These would be useful for interfacing audio components with different backends, since we don't want to lock our implementation into a single backend that some users may not want.
