# The Bevy Audio Backend Problem (aka `common-audio-traits`)
## Summary
There are a lot of audio backend libraries for bevy. Authors of non-backend audio libraries must implement their libraries against every one of these backends. This is a miserable experience for those authors.
## Motivation
The default audio crate for bevy is lacking. As such, many people have made alternative audio libraries, such as `bevy_kira_audio` and `bevy_oddio`, using different audio backends (`kira` and `oddio`, respectively).
This fractures the bevy audio ecosystem: a library dealing with audio in bevy has to be implemented against every bevy audio backend library (see [`bevy_fundsp`](https://github.com/harudagondi/bevy_fundsp/blob/refactor-0.2/src/backend.rs)).
If the backend of `bevy_audio` is abstracted, then future non-backend audio libraries will not have to rely on specific backend details. Users can simply plug-and-play their own preferred audio backend (the default will be `rodio`), and library authors can write their own code without relying on any specific underlying implementation.
In the case of `bevy_fundsp`, three different implementations are written and gated behind mutually exclusive features. This violates Cargo's assumption that features are additive. As a result, confusing errors pop up, which the user may not understand at first glance.
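A minimal sketch of why this goes wrong, assuming two hypothetical Cargo features named `backend_kira` and `backend_oddio` (illustrative names, not taken from `bevy_fundsp`). Each feature provides its own definition of the same item, so enabling both at once fails to compile unless the crate guards against it explicitly:

```rust
// Hypothetical feature names; a real crate would declare them in Cargo.toml.

// Explicit guard: without it, enabling both features fails with a less
// helpful "defined multiple times" error on `backend_name`.
#[cfg(all(feature = "backend_kira", feature = "backend_oddio"))]
compile_error!("features `backend_kira` and `backend_oddio` are mutually exclusive");

#[cfg(feature = "backend_kira")]
pub fn backend_name() -> &'static str {
    "kira"
}

#[cfg(feature = "backend_oddio")]
pub fn backend_name() -> &'static str {
    "oddio"
}

// Fallback when neither feature is enabled.
#[cfg(not(any(feature = "backend_kira", feature = "backend_oddio")))]
pub fn backend_name() -> &'static str {
    "bevy_audio"
}

fn main() {
    println!("selected backend: {}", backend_name());
}
```

Because Cargo takes the union of the features requested across the whole dependency graph, two dependencies that each ask for a different backend would enable both features at the same time, and the user would hit this error without having done anything wrong themselves.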
### Problems Faced by Bevy Audio Libraries
1. Increasing complexity with addition of new backend crates.
2. Backend implementation leaks to audio library implementation.
3. All backend APIs are inconsistent.
### Case Studies of Audio Libraries in Bevy
Here is some real code that reflects this fracture of audio backend libraries. (If you have your own audio library, please share your implementation in a reply.)
#### `bevy_fundsp`
Copying from the [`bevy_fundsp`](https://github.com/harudagondi/bevy_fundsp/pull/6) implementation of `Backend`:
```rust
trait AudioBackend {
    /// The static audio source.
    /// Usually stores a collection of sound bytes.
    type AudioSource;

    /// Initialization of App that is specific for the given Backend.
    fn init_app(app: &mut App);

    /// Convert the given [`DspSource`] to the defined static audio source.
    fn convert_to_audio_source(dsp_source: DspSource) -> Self::AudioSource;
}
```
This is the initial code for abstracting different audio backends, used for `bevy_fundsp`.
Tangentially related is `bevy_fundsp`'s extension trait for `Audio` types:
```rust
/// Extension trait to add a helper method for playing DSP sources.
pub trait DspAudioExt {
    /// The [`Assets`](bevy::prelude::Assets)
    /// for the concrete `Audio` type in the given backend.
    type Assets;
    /// The settings that are usually passed
    /// to the concrete `Audio` type of the given backend.
    type Settings: Default;
    /// The audio sink that is usually returned
    /// when playing the given DSP source.
    type Sink;

    /// Play the given [`DspSource`] with the given settings.
    fn play_dsp_with_settings(
        &mut self,
        assets: &mut Self::Assets,
        source: &DspSource,
        settings: Self::Settings,
    ) -> Self::Sink;

    /// Play the given [`DspSource`] with the default settings.
    fn play_dsp(&mut self, assets: &mut Self::Assets, source: &DspSource) -> Self::Sink {
        self.play_dsp_with_settings(assets, source, default())
    }
}
```
### Requirements
These requirements should be met when designing an implementation for this (potential) RFC.
1. **Abstraction of audio backends**. `bevy_audio` should allow different backends such as `rodio`, `kira`, `oddio`, or other Rust libraries. Users can simply choose which backend to use, and non-backend library authors do not need to care about its implementation details.
2. **Allow extensibility of bare-minimum interface**. Some backends are missing features that others have. The implementation shouldn't restrict backend library authors if they want to add features that aren't defined by `bevy_audio`.
3. **Interoperability**. An `AudioSource` should not care about how it is represented, whether that is a custom `Read + Seek` or an `Iterator<Item = Frame>`.
## User-facing explanation
### Common audio types
Here are the common types that are present for all audio backend libraries:
1. Static audio sources
2. Audio sinks
3. Type that allows users to play audio (`Audio` type)
4. Trait for custom audio (`!Sync` and `Send`)
5. Trait for implementing `Asset` and `Sync`
6. Audio control types (for custom audio)
### Static audio sources
These are types that contain all the bytes of the audio. Audio bytes are loaded in memory all at once.
| Library | Type | Note |
| - | - | - |
| `bevy_audio` | `AudioSource` | `rodio` does not have an equivalent static audio source. `bevy_audio` uses `Arc<[u8]>` internally. |
| `bevy_kira_audio` | `AudioSource` | uses `StaticSoundData` in `kira` internally |
| `bevy_oddio` | `AudioSource` | uses `oddio::Frames` internally |
### Audio sinks
| Library | Type |
| - | - |
| `bevy_audio` | `AudioSink` |
| `bevy_kira_audio` | `PlayAudioCommand` |
| `bevy_oddio` | `AudioSink` |
### Type that allows users to play audio
These are usually accessed through a resource.
| Library | Type | Note |
| - | - | - |
| `bevy_audio` | `Audio` | |
| `bevy_kira_audio` | `AudioChannel` or `DynamicAudioChannel` | `Audio` is simply a type alias for `AudioChannel<MainTrack>` |
| `bevy_oddio` | `Audio` | |
### The trait for custom audio
These traits allow users to implement their own audio type.
| Library | Trait | Note |
| - | - | - |
| `bevy_audio` | `Decodable::Decoder` | internally uses `rodio::Source`. |
| `bevy_kira_audio` | does not support it as of 0.12 | `kira` has `Sound` |
| `bevy_oddio` | `oddio::Signal` | exposes `oddio::Signal` directly |
### The trait that is converted to the playing audio
Typically, a type implementing this trait also needs to implement bevy's `Asset`, and must therefore be `Sync`.
| Library | Trait | Note |
| - | - | - |
| `bevy_audio` | `Decodable` | `rodio` has no equivalent trait |
| `bevy_kira_audio` | does not support it as of 0.12 | `kira` has `SoundData` |
| `bevy_oddio` | `ToSignal` | `oddio` has no equivalent trait |
### The trait that allows control of playing audio
This is typically different from audio sinks, as this allows custom control of playing audio.
| Library | Trait | Note |
| - | - | - |
| `bevy_audio` | N/A | `rodio` has no equivalent trait |
| `bevy_kira_audio` | N/A | `kira` has `SoundData::Handle` |
| `bevy_oddio` | `oddio::Signal` that implements `Controlled`, which has `Controlled::Control` | |
### Supported audio files
1. `mp3`
2. `wav`
3. `ogg`
4. `flac`
### Miscellaneous Terminologies
- Backend: libraries that handle the playing of audio.
## Implementation Strategy
### `AudioSource`
`AudioSource` is now a trait (named `Source` in the code here). `Frame` is simply `[f32; 2]` or similar, which represents the left and right channels in stereo.
```rust
/// Similar to rodio's Source, oddio's Signal,
/// and kira's Sound.
trait Source: Iterator<Item = Frame> + Send {
    /// Controls must implement the basic functionality of an audio sink.
    /// They can, however, make their own methods
    /// specific to the type implementing `Source`.
    type Control: Sink;

    /// Get the next frame after `delta` seconds
    /// have passed.
    fn tick(&mut self, delta: f64) -> Frame;
}
```
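To make the trait more concrete, here is a minimal sketch of a procedural source. `SineWave` and `SineControl` are hypothetical names, the `Sink` trait is trimmed to its volume methods for brevity, and sharing the volume through an atomic is just one possible way to connect a source with its control:

```rust
use std::f32::consts::TAU;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

/// One stereo frame: `[left, right]`.
type Frame = [f32; 2];

/// Stand-in for the full `Sink` trait, trimmed to the volume methods.
trait Sink {
    fn volume(&self) -> f32;
    fn set_volume(&self, volume: f32);
}

trait Source: Iterator<Item = Frame> + Send {
    type Control: Sink;
    fn tick(&mut self, delta: f64) -> Frame;
}

/// Hypothetical control handle: the volume is stored as `f32` bits in a
/// shared atomic so it can be changed while the source is playing.
#[derive(Clone)]
struct SineControl(Arc<AtomicU32>);

impl Sink for SineControl {
    fn volume(&self) -> f32 {
        f32::from_bits(self.0.load(Ordering::Relaxed))
    }
    fn set_volume(&self, volume: f32) {
        self.0.store(volume.to_bits(), Ordering::Relaxed);
    }
}

/// Hypothetical procedural source producing a sine tone on both channels.
struct SineWave {
    frequency: f32,
    sample_rate: f32,
    /// Position inside the current cycle, in `[0, 1)`.
    phase: f32,
    volume: Arc<AtomicU32>,
}

impl SineWave {
    fn new(frequency: f32, sample_rate: f32) -> (Self, SineControl) {
        let volume = Arc::new(AtomicU32::new(1.0_f32.to_bits()));
        let wave = Self { frequency, sample_rate, phase: 0.0, volume: Arc::clone(&volume) };
        (wave, SineControl(volume))
    }
}

impl Iterator for SineWave {
    type Item = Frame;
    fn next(&mut self) -> Option<Frame> {
        // One iterator step corresponds to one sample period.
        let delta = 1.0 / f64::from(self.sample_rate);
        Some(self.tick(delta))
    }
}

impl Source for SineWave {
    type Control = SineControl;
    fn tick(&mut self, delta: f64) -> Frame {
        // Advance the phase by `delta` seconds, then sample the wave.
        self.phase = (self.phase + self.frequency * delta as f32).fract();
        let volume = f32::from_bits(self.volume.load(Ordering::Relaxed));
        let sample = (self.phase * TAU).sin() * volume;
        [sample, sample]
    }
}

fn main() {
    let (mut wave, control) = SineWave::new(440.0, 44_100.0);
    control.set_volume(0.5);
    let frames: Vec<Frame> = wave.by_ref().take(3).collect();
    println!("volume {} -> {frames:?}", control.volume());
}
```

Note that the source itself only needs to be `Send`: it is moved to wherever frames are pulled from, while the control handle stays with the caller.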
### Static audio sources
There should be a way to convert a vector of frames into a static audio source.
```rust
trait StaticSource: Source {
    fn into_static_source<I>(frames: I) -> Self
    where
        I: IntoIterator<Item = Frame>,
        I::IntoIter: ExactSizeIterator;
}
```
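As a rough illustration, a backend could implement `StaticSource` for an in-memory buffer. In this sketch `FrameBuffer`, `StopHandle`, and the inherent `control` method are hypothetical, `Sink` is trimmed to `stop`, and `tick` ignores `delta` and simply returns the next stored frame:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

type Frame = [f32; 2];

/// Stand-in for the full `Sink` trait, trimmed to a single method.
trait Sink {
    fn stop(&self);
}

trait Source: Iterator<Item = Frame> + Send {
    type Control: Sink;
    fn tick(&mut self, delta: f64) -> Frame;
}

trait StaticSource: Source {
    fn into_static_source<I>(frames: I) -> Self
    where
        I: IntoIterator<Item = Frame>,
        I::IntoIter: ExactSizeIterator;
}

/// Hypothetical control that can only stop playback.
struct StopHandle(Arc<AtomicBool>);

impl Sink for StopHandle {
    fn stop(&self) {
        self.0.store(true, Ordering::Relaxed);
    }
}

/// Hypothetical static source: all frames are held in memory.
struct FrameBuffer {
    frames: Arc<[Frame]>,
    cursor: usize,
    stopped: Arc<AtomicBool>,
}

impl FrameBuffer {
    /// Hypothetical helper; the proposal does not define how controls are created.
    fn control(&self) -> StopHandle {
        StopHandle(Arc::clone(&self.stopped))
    }
}

impl Iterator for FrameBuffer {
    type Item = Frame;
    fn next(&mut self) -> Option<Frame> {
        if self.stopped.load(Ordering::Relaxed) {
            return None;
        }
        let frame = self.frames.get(self.cursor).copied()?;
        self.cursor += 1;
        Some(frame)
    }
}

impl Source for FrameBuffer {
    type Control = StopHandle;
    fn tick(&mut self, _delta: f64) -> Frame {
        // Simplification: ignore `delta`; output silence once the buffer ends.
        self.next().unwrap_or([0.0, 0.0])
    }
}

impl StaticSource for FrameBuffer {
    fn into_static_source<I>(frames: I) -> Self
    where
        I: IntoIterator<Item = Frame>,
        I::IntoIter: ExactSizeIterator,
    {
        let iter = frames.into_iter();
        // `ExactSizeIterator` lets the buffer be allocated once, up front.
        let mut buffer = Vec::with_capacity(iter.len());
        buffer.extend(iter);
        Self { frames: buffer.into(), cursor: 0, stopped: Arc::new(AtomicBool::new(false)) }
    }
}

fn main() {
    let mut source = FrameBuffer::into_static_source(vec![[0.1, 0.2], [0.3, 0.4]]);
    let control = source.control();
    println!("{:?}", source.next());
    control.stop();
    println!("{:?} (silence from tick: {:?})", source.next(), source.tick(0.0));
}
```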
### `AudioSink`
`AudioSink` is now a trait (named `Sink` in the code here) that covers the basic functionality of being... an audio sink.
```rust
// Copy pasted from here: https://github.com/mockersf/bevy/blob/b9dd4d03f37b079c909404af006fa3b946c55414/crates/bevy_audio/src/sinks.rs#L7-L52
trait Sink {
    /// Gets the volume of the sound.
    ///
    /// The value `1.0` is the "normal" volume (unfiltered input). Any value other than `1.0`
    /// will multiply each sample by this value.
    fn volume(&self) -> f32;

    /// Changes the volume of the sound.
    ///
    /// The value `1.0` is the "normal" volume (unfiltered input). Any value other than `1.0`
    /// will multiply each sample by this value.
    fn set_volume(&self, volume: f32);

    /// Gets the speed of the sound.
    ///
    /// The value `1.0` is the "normal" speed (unfiltered input). Any value other than `1.0`
    /// will change the play speed of the sound.
    fn speed(&self) -> f32;

    /// Changes the speed of the sound.
    ///
    /// The value `1.0` is the "normal" speed (unfiltered input). Any value other than `1.0`
    /// will change the play speed of the sound.
    fn set_speed(&self, speed: f32);

    /// Resumes playback of a paused sink.
    ///
    /// No effect if not paused.
    fn play(&self);

    /// Pauses playback of this sink.
    ///
    /// No effect if already paused.
    /// A paused sink can be resumed with [`play`](Self::play).
    fn pause(&self);

    /// Is this sink paused?
    ///
    /// Sinks can be paused and resumed using [`pause`](Self::pause) and [`play`](Self::play).
    fn is_paused(&self) -> bool;

    /// Stops the sink.
    ///
    /// It won't be possible to restart it afterwards.
    fn stop(&self);
}
```
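Every method of this trait takes `&self`, so an implementation needs some form of interior mutability shared with the audio thread. One possible approach, sketched here with hypothetical `SinkState` and `BasicSink` types, is to keep the state in atomics behind an `Arc`:

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;

/// The proposed `Sink` trait, with doc comments omitted.
trait Sink {
    fn volume(&self) -> f32;
    fn set_volume(&self, volume: f32);
    fn speed(&self) -> f32;
    fn set_speed(&self, speed: f32);
    fn play(&self);
    fn pause(&self);
    fn is_paused(&self) -> bool;
    fn stop(&self);
}

/// Hypothetical state shared between the handle and the audio thread.
/// `f32` values are stored as their bit patterns in `AtomicU32`.
struct SinkState {
    volume: AtomicU32,
    speed: AtomicU32,
    paused: AtomicBool,
    stopped: AtomicBool,
}

/// Hypothetical sink handle; clones refer to the same playing sound.
#[derive(Clone)]
struct BasicSink(Arc<SinkState>);

impl BasicSink {
    fn new() -> Self {
        Self(Arc::new(SinkState {
            volume: AtomicU32::new(1.0_f32.to_bits()),
            speed: AtomicU32::new(1.0_f32.to_bits()),
            paused: AtomicBool::new(false),
            stopped: AtomicBool::new(false),
        }))
    }

    /// Extra method beyond the common trait, which backends are free to add.
    fn is_stopped(&self) -> bool {
        self.0.stopped.load(Ordering::Relaxed)
    }
}

impl Sink for BasicSink {
    fn volume(&self) -> f32 {
        f32::from_bits(self.0.volume.load(Ordering::Relaxed))
    }
    fn set_volume(&self, volume: f32) {
        self.0.volume.store(volume.to_bits(), Ordering::Relaxed);
    }
    fn speed(&self) -> f32 {
        f32::from_bits(self.0.speed.load(Ordering::Relaxed))
    }
    fn set_speed(&self, speed: f32) {
        self.0.speed.store(speed.to_bits(), Ordering::Relaxed);
    }
    fn play(&self) {
        self.0.paused.store(false, Ordering::Relaxed);
    }
    fn pause(&self) {
        self.0.paused.store(true, Ordering::Relaxed);
    }
    fn is_paused(&self) -> bool {
        self.0.paused.load(Ordering::Relaxed)
    }
    fn stop(&self) {
        // The flag is never cleared, so a stopped sink cannot be restarted.
        self.0.stopped.store(true, Ordering::Relaxed);
    }
}

fn main() {
    let sink = BasicSink::new();
    let audio_thread_view = sink.clone();
    sink.set_volume(0.25);
    sink.set_speed(2.0);
    sink.pause();
    println!(
        "volume {}, speed {}, paused {}",
        audio_thread_view.volume(),
        audio_thread_view.speed(),
        audio_thread_view.is_paused()
    );
    sink.play();
    sink.stop();
    println!("stopped: {}", audio_thread_view.is_stopped());
}
```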
### `AudioOutput`
`AudioOutput` is now a trait that simply handles the audio thread.
```rust
trait Output {
    /// plays the audio (usually getting the samples
    /// of the source and feeding it to `cpal`)
    /// and returns the handle of the audio source.
    fn play<Au: Source>(&mut self, source: Au) -> Au::Control;
}
```
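A sketch of what an implementation might look like, using a hypothetical `OfflineOutput` that renders into memory rather than feeding `cpal`. The traits above do not say how an `Output` obtains `Au::Control` from a source, so this sketch assumes an extra `control` method on `Source`; that method, along with `Counter` and `StopHandle`, is an assumption made for illustration only:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

type Frame = [f32; 2];

/// Stand-in for the full `Sink` trait, trimmed to a single method.
trait Sink {
    fn stop(&self);
}

trait Source: Iterator<Item = Frame> + Send {
    type Control: Sink;
    fn tick(&mut self, delta: f64) -> Frame;
    /// ASSUMPTION: not part of the proposed trait. Some hook like this is
    /// needed so that an `Output` can hand back a control for the source.
    fn control(&self) -> Self::Control;
}

trait Output {
    fn play<Au: Source>(&mut self, source: Au) -> Au::Control;
}

/// Hypothetical output that renders into memory instead of an audio device.
struct OfflineOutput {
    rendered: Vec<Frame>,
    max_frames: usize,
}

impl Output for OfflineOutput {
    fn play<Au: Source>(&mut self, source: Au) -> Au::Control {
        let control = source.control();
        // A real backend would move `source` to its audio thread instead.
        self.rendered.extend(source.take(self.max_frames));
        control
    }
}

/// Hypothetical control that can only stop playback.
struct StopHandle(Arc<AtomicBool>);

impl Sink for StopHandle {
    fn stop(&self) {
        self.0.store(true, Ordering::Relaxed);
    }
}

/// Hypothetical source yielding 0.0, 1.0, 2.0, ... on both channels.
struct Counter {
    next_value: f32,
    stopped: Arc<AtomicBool>,
}

impl Counter {
    fn new() -> Self {
        Self { next_value: 0.0, stopped: Arc::new(AtomicBool::new(false)) }
    }
}

impl Iterator for Counter {
    type Item = Frame;
    fn next(&mut self) -> Option<Frame> {
        if self.stopped.load(Ordering::Relaxed) {
            return None;
        }
        let value = self.next_value;
        self.next_value += 1.0;
        Some([value, value])
    }
}

impl Source for Counter {
    type Control = StopHandle;
    fn tick(&mut self, _delta: f64) -> Frame {
        // Simplification: ignore `delta`; output silence once stopped.
        self.next().unwrap_or([0.0, 0.0])
    }
    fn control(&self) -> StopHandle {
        StopHandle(Arc::clone(&self.stopped))
    }
}

fn main() {
    let mut output = OfflineOutput { rendered: Vec::new(), max_frames: 3 };
    let control = output.play(Counter::new());
    println!("{:?}", output.rendered);
    control.stop();
}
```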
### `AudioData`
`AudioData` is the `Asset` form of `AudioSource`. An `AudioSource` is usually `Send` but not `Sync`, so we add another trait that converts audio data into audio sources. This is similar to `bevy_audio`'s `Decodable`, `kira`'s `SoundData`, and `bevy_oddio`'s `ToSignal` traits.
```rust
trait AudioData {
    type Source: Source;
    type Settings;

    fn to_source(&self, settings: Self::Settings) -> Self::Source;
}

trait StaticAudioData: AudioData
where
    Self::Source: StaticSource,
{
    fn to_static_source(&self, settings: Self::Settings) -> Self::Source;
}
```
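The split between data and source can be sketched as follows. `ClipData`, `ClipSettings`, `ClipPlayback`, and `NoControl` are hypothetical types, and `Sink` is reduced to an empty stand-in. The data side is immutable and shareable, while every call to `to_source` creates an independent playback with its own cursor:

```rust
use std::sync::Arc;

type Frame = [f32; 2];

/// Stand-in for `Sink`; its methods are omitted here for brevity.
trait Sink {}

trait Source: Iterator<Item = Frame> + Send {
    type Control: Sink;
    fn tick(&mut self, delta: f64) -> Frame;
}

trait AudioData {
    type Source: Source;
    type Settings;
    fn to_source(&self, settings: Self::Settings) -> Self::Source;
}

/// Hypothetical asset type: immutable frames that are `Send + Sync`.
struct ClipData {
    frames: Arc<[Frame]>,
}

impl ClipData {
    fn new(frames: Vec<Frame>) -> Self {
        Self { frames: frames.into() }
    }
}

/// Hypothetical per-playback settings.
struct ClipSettings {
    gain: f32,
}

impl Default for ClipSettings {
    fn default() -> Self {
        Self { gain: 1.0 }
    }
}

/// Hypothetical control with no functionality.
struct NoControl;
impl Sink for NoControl {}

/// Hypothetical playback state created from `ClipData`.
struct ClipPlayback {
    frames: Arc<[Frame]>,
    cursor: usize,
    gain: f32,
}

impl Iterator for ClipPlayback {
    type Item = Frame;
    fn next(&mut self) -> Option<Frame> {
        let [left, right] = *self.frames.get(self.cursor)?;
        self.cursor += 1;
        Some([left * self.gain, right * self.gain])
    }
}

impl Source for ClipPlayback {
    type Control = NoControl;
    fn tick(&mut self, _delta: f64) -> Frame {
        // Simplification: ignore `delta`; output silence after the clip ends.
        self.next().unwrap_or([0.0, 0.0])
    }
}

impl AudioData for ClipData {
    type Source = ClipPlayback;
    type Settings = ClipSettings;
    fn to_source(&self, settings: ClipSettings) -> ClipPlayback {
        // Cloning the `Arc` shares the frames; only the cursor is per-playback.
        ClipPlayback { frames: Arc::clone(&self.frames), cursor: 0, gain: settings.gain }
    }
}

/// Compile-time check that the asset side can be shared across threads.
fn assert_send_sync<T: Send + Sync>() {}

fn main() {
    assert_send_sync::<ClipData>();
    let data = ClipData::new(vec![[1.0, 1.0], [0.5, 0.5]]);
    let quiet: Vec<Frame> = data.to_source(ClipSettings { gain: 0.5 }).collect();
    println!("{quiet:?}");
}
```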
### `AudioMixer` and `Audio`
`AudioMixer` is the public API usually accessed by the user through a resource. Generally, implementing libraries should provide a type alias that specifies the `AudioOutput` used.
```rust
struct AudioMixer<O> {
    output: O,
    // ...impl details
}

impl<O: Output> AudioMixer<O> {
    fn play<Au: AudioData>(
        &mut self,
        data: Handle<Au>,
        settings: Au::Settings,
    ) -> <Au::Source as Source>::Control {
        // use self.output internally
        todo!()
    }
}

// For example, we use a `rodio` backend library
type Audio = AudioMixer<RodioOutput>;
// and the provided plugin will be registering this
// type as a resource
```
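The following sketch shows how the pieces could fit together inside `play`. It is not tied to Bevy: a plain `HashMap<u32, Au>` and a `u32` id stand in for `Assets<Au>` and `Handle<Au>`, `Source` is trimmed (no `tick`) and given an assumed `control` hook, and `Beep`, `BeepData`, `NoControl`, and `CountingOutput` are hypothetical types used only to exercise the mixer:

```rust
use std::collections::HashMap;

type Frame = [f32; 2];

/// Stand-in for `Sink`; its methods are omitted here for brevity.
trait Sink {}

/// Trimmed `Source` (`tick` omitted).
trait Source: Iterator<Item = Frame> + Send {
    type Control: Sink;
    /// ASSUMPTION: hypothetical hook, not part of the proposed trait.
    fn control(&self) -> Self::Control;
}

trait Output {
    fn play<Au: Source>(&mut self, source: Au) -> Au::Control;
}

trait AudioData {
    type Source: Source;
    type Settings;
    fn to_source(&self, settings: Self::Settings) -> Self::Source;
}

struct AudioMixer<O> {
    output: O,
}

impl<O: Output> AudioMixer<O> {
    /// `assets` and `id` stand in for Bevy's `Assets<Au>` and `Handle<Au>`.
    /// Returns `None` when the asset is not (yet) available.
    fn play<Au: AudioData>(
        &mut self,
        assets: &HashMap<u32, Au>,
        id: u32,
        settings: Au::Settings,
    ) -> Option<<Au::Source as Source>::Control> {
        let data = assets.get(&id)?;
        Some(self.output.play(data.to_source(settings)))
    }
}

// --- Hypothetical backend types used to exercise the mixer ---

struct NoControl;
impl Sink for NoControl {}

/// Yields `remaining` frames of full-scale signal, then ends.
struct Beep {
    remaining: usize,
}

impl Iterator for Beep {
    type Item = Frame;
    fn next(&mut self) -> Option<Frame> {
        self.remaining = self.remaining.checked_sub(1)?;
        Some([1.0, 1.0])
    }
}

impl Source for Beep {
    type Control = NoControl;
    fn control(&self) -> NoControl {
        NoControl
    }
}

/// Asset form of `Beep`; the setting is the length in frames.
struct BeepData;

impl AudioData for BeepData {
    type Source = Beep;
    type Settings = usize;
    fn to_source(&self, frames: usize) -> Beep {
        Beep { remaining: frames }
    }
}

/// Counts frames instead of sending them to an audio device.
struct CountingOutput {
    frames_played: usize,
}

impl Output for CountingOutput {
    fn play<Au: Source>(&mut self, source: Au) -> Au::Control {
        let control = source.control();
        self.frames_played += source.count();
        control
    }
}

/// The alias a backend crate would export.
type Audio = AudioMixer<CountingOutput>;

fn main() {
    let mut assets = HashMap::new();
    assets.insert(7_u32, BeepData);
    let mut audio: Audio = AudioMixer { output: CountingOutput { frames_played: 0 } };
    let played = audio.play(&assets, 7, 4).is_some();
    println!("played: {played}, frames: {}", audio.output.frames_played);
}
```

Code written against `AudioMixer<O>` with only an `O: Output` bound never names the concrete backend, which is the property non-backend library authors are after.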
### `AudioLoader`
Audio loaders are simply types that implement `AssetLoader`. It's up to the backend library authors to implement them.
### `AudioPlugin`
Plugins will not be provided by `bevy_audio`. They will be provided by backend authors instead. Generally, a backend's plugin type will:
1. Register their `StaticSource` as a library
### For Backend Library Authors
Backend library authors must implement the following traits for their own types:
1. `Source`
2. `StaticSource`
3. `Sink`
4. `Output`
5. `AudioData`
### For Users (Both Bevy Game Developers and Non-Backend Library Authors)
Users should essentially see no API changes (except for imports). They can choose their own custom backend (currently `rodio`, `kira`, and `oddio` in the bevy ecosystem).
`AudioPlugin` should be from the backend libraries and not from `bevy_audio`, as each backend library will have their own way of setting up their audio threads, systems, resources, components, etc.
## Drawbacks
1. This will force backend library authors to rewrite their whole library.
2. The current implementation uses a lot of trait soup. This will probably confuse some users.
## Rationale and alternatives
### Why abstract `bevy_audio`?
As a non-backend library author, working with the bevy audio ecosystem is frustrating:
1. Since each audio backend has its own idiosyncrasies in its implementation, the author has to write a different implementation for each backend just to support it.
2. Since these backends are mutually incompatible, implementations have to be gated behind non-additive features, which of course Rust does not like.
### Why not improve `rodio`? Why not focus on `kira`?
Each audio backend library has its own use cases:
1. [`kira`] focuses on timing audio correctly. This is why it has functionalities related to tweening and clocks.
2. [`oddio`] is more on raw digital signal processing. It works by using iterator-like combinators that you combine together to manipulate audio in real time. As such, it places less focus on static audio files, and more on procedural generation and manipulation of audio.
3. [`synthizer`], an unpublished audio backend crate, is heavily optimized for binaural audio, but it brings a native dependency.
4. [`rodio`] is `rodio`.
Since there are currently three main contenders for backend libraries (`rodio`, `kira`, and `oddio`), this will cause non-backend library authors to create three exclusive crate features for their library. This creates a very frustrating experience, as exclusive features are not well supported in Rust.
Users should not care about implementation details, unless they need specific functionalities provided by their chosen backend libraries.
## Unresolved questions
- What should be our default audio backend?
  - [`rodio`]? It has many potential problems with its API, and it has a problem regarding stereo audio. (See [rodio#444] and [bevy#6122])
  - [`kira`]? This is the best potential default backend for bevy; however, I found some features lacking, like sound effects and digital signal processing.
  - [`oddio`]? This is pretty much bare bones in terms of its features, opting for more flexibility for the user. It has first-class support for spatial audio. It does not, however, output audio; it only manipulates signals. `bevy_oddio` handles this by using `cpal` directly.
- How does [`bevy-rrise`] mesh with this RFC?
  - Probably [`bevy-rrise`] can simply ignore this, as it has a vastly different architecture compared to other backend crates.
[rodio#444]: https://github.com/RustAudio/rodio/issues/444
[bevy#6122]: https://github.com/bevyengine/bevy/issues/6122
[`bevy-rrise`]: https://github.com/dtaralla/bevy-rrise
## Future possibilities
- Traits for spatial audio. This is useful for interfacing audio components with different backends. We don't want to lock our implementation to a single backend that some users may not want.
[`rodio`]: https://github.com/RustAudio/rodio
[`kira`]: https://github.com/tesselode/kira
[`oddio`]: https://github.com/Ralith/oddio
[`synthizer`]: https://github.com/synthizer/synthizer