# Improving Bevy's Async + ECS ergonomics
## Motivation
### What do Bevy users want to do with async integration?
### How it currently works
TODO: verify and summarize https://bevy-cheatbook.github.io/fundamentals/async-compute.html
### Why that's not good enough
## A comparative analysis of prior art
Bevy's API here is quite limited, and has been for a very long time.
As a result there is a *lot* of prior art. Direct attempts to solve this problem include:
- [`bevy_coroutine_system`](https://crates.io/crates/bevy_coroutine_system)
- [`bevy_mod_async`](https://crates.io/crates/bevy_mod_async)
- [`bevy_defer`](https://crates.io/crates/bevy_defer)
- [`bevy_flurx`](https://github.com/not-elm/bevy_flurx)
- [`bevy_async_task`](https://github.com/loopystudios/bevy_async_task)
- [`bevy-async-ecs`](https://github.com/dlom/bevy-async-ecs)
- [`pecs`](https://github.com/jkb0o/pecs/tree/master)
- [`bevy_jobs`](https://github.com/frewsxcv/bevy_jobs)
In addition:
- [`bevy_mod_scripting`'s async workloads](https://discord.com/channels/691052431525675048/918591326096850974/1419809954021380280)
- [`haalka`](https://github.com/databasedav/haalka)
## What is most desired
The ideal situation for ergonomic async integration would be a runtime-agnostic one: an integration that works with any and all async executors with minimal or no setup, and that can easily be used from async tasks *not* spawned by our particular library. It should accept non-send data like `&mut` borrows from the environment when accessing ECS data, so tasks can ergonomically write results back into their own async state. And it should work with multithreaded runtimes, so that existing networking crates like `Quinn`, built on `Tokio`'s multithreaded executor, just work.
The existing crates in the ecosystem do not provide this capability, but a crate I created to test the viability of such a thing *does*: `bevy_malek_async`. It provides a *single* function, `async_access`, that takes a `WorldId` and a closure that does not need to be `Send`. It works with multithreaded runtimes like tokio with zero integration work, and is only 166 lines.
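For context, the rough shape of that entry point, paraphrased from the description above (the exact generics and bounds are assumptions, not the crate's actual signature):
```rust
use bevy::ecs::system::SystemParam;
use bevy::ecs::world::WorldId;

/// Hedged sketch of `async_access`: wait until the target world can
/// service the request, then run `f` with the requested system param.
pub async fn async_access<P, F, R>(world_id: WorldId, f: F) -> R
where
    P: SystemParam,
    F: for<'w, 's> FnOnce(P::Item<'w, 's>) -> R, // note: no `Send` bound
{
    let _ = (world_id, f);
    todo!("handshake with the target world's driver system")
}
```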
Currently the implementation runs in PreUpdate, Update, and PostUpdate, but we *could* create an implementation that is schedule-locked and only runs in a specific schedule. The pros and cons of this are discussed [here](https://discord.com/channels/691052431525675048/1422021261655015564/1428467495697776772) as well. Do we want greedy scheduling, or do we want it to predictably run only at the beginning or end of a schedule when we schedule-lock it?
The pros of greedy scheduling are that we can interleave lots of tasks and be sure we get as many async accesses as possible, as fast as possible.
The cons are that scheduling becomes *more* unpredictable, though async is already unpredictable to some extent. A sketch of the two setups follows.
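A minimal sketch of how the two variants might be registered, assuming a hypothetical driver system that drains pending accesses (the system name and setup are illustrative, not `bevy_malek_async`'s actual API):
```rust
use bevy::prelude::*;

/// Hypothetical driver that hands the world to any queued `async_access`
/// closures; the name and body are assumptions for illustration.
fn drain_async_accesses(_world: &mut World) {
    // ...service pending access requests here...
}

fn main() {
    let mut app = App::new();
    // Greedy variant: service pending accesses at several points per frame.
    app.add_systems(PreUpdate, drain_async_accesses)
        .add_systems(Update, drain_async_accesses)
        .add_systems(PostUpdate, drain_async_accesses);
    // Schedule-locked variant: run it in exactly one place instead, e.g.
    // `app.add_systems(Last, drain_async_accesses);`
    app.run();
}
```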
## Analysis of Alternatives
### bevy_coroutine_system
This crate sadly requires nightly features, so any talk of upstreaming it would have to wait until the coroutine feature is stabilized in Rust, which doesn't appear to be on the horizon. That said, it is *very* cool and in many ways an optimal way of doing things. If only coroutines were stabilized.
That being said, coroutines *aren't* async, and we also want tight integration with async itself, so we can do things like networking and asset loading, and interface with the wider Rust async ecosystem.
### bevy_mod_async
`bevy_mod_async` is a great project. It improves the ergonomics of async tasks in Bevy by quite a bit, but there are downsides. It is tied to a single async runtime (Bevy's internal one), which makes directly interfacing with other async executors like tokio or smol more difficult. This is a problem for tokio in particular, because one of the use cases of a more ergonomic async integration is being runtime-agnostic. Its general API also threads a `cx` parameter around instead of taking a `WorldId`, which means that even within Bevy tasks it is not easy to use unless you go through its specific spawning API. Furthermore, the `Fn` it requires must be `Send`, which significantly limits the types you can temporarily borrow within it.
This makes ergonomics much worse if you want to take a `&mut Thing` and update it within the closure. This is a *very* common pattern that an ergonomic async ECS should support:
```rust
IoTaskPool::get()
    .spawn(async {
        let id = ...; // we get the world id from somewhere
        let mut uwu = String::new();
        let awa = &mut uwu;
        async_access::<Res<MyThing>, _, _>(id, |my_thing| {
            *awa = my_thing.0.clone();
        })
        .await;
    })
    .detach();

#[derive(Resource)]
pub struct MyThing(String);
```
This kind of pattern can only work when the equivalent of `async_access` or `cx.with_world` accepts a non-`Send` closure, i.e. produces a non-`Send` future.
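To make the difference concrete, here is a hedged sketch of the two bound shapes (both signatures are assumptions, not either crate's actual API). Note that it is the usual `Send + 'static` pair that rules out both non-`Send` captures and short-lived `&mut` borrows like `awa` above:
```rust
use bevy::prelude::*;

// Send-bounded shape (bevy_mod_async-style): the closure and everything it
// captures must be Send + 'static, so it cannot borrow `awa` from the
// enclosing async block.
fn with_world_send<F>(_f: F)
where
    F: FnOnce(&mut World) + Send + 'static,
{
    // ...queue `_f` to run against the world...
}

// Desired shape: no `Send` or `'static` bound on the closure, so the
// enclosing async block can lend it `&mut` state directly.
async fn async_access_local<F, R>(f: F) -> R
where
    F: FnOnce(&mut World) -> R,
{
    let _ = f;
    todo!("handshake with the world while the closure's borrows stay live")
}
```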
There are other ergonomic features that `bevy_mod_async` adds that would be nice to have in Bevy, such as directly awaiting an asset load, or the spawning and despawning of an entity:
```rust=
let e = cx.spawn(()).await;
cx.despawn(e).await;
let a = cx.load_asset::<Mesh>("model.glb#Mesh0").await.unwrap();
```
These things are out of scope for the current working group, but would be great follow-ups once we have the initial async support. We could go even further and await the adding or removing of components on existing entities as well!
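For example, a hypothetical extension in the same `cx` style (none of these methods exist today):
```rust
// Hypothetical follow-up ergonomics, extrapolating from bevy_mod_async's
// style; these method names are made up for illustration.
cx.entity(e).insert(Transform::default()).await;
cx.entity(e).wait_for_component::<GlobalTransform>().await;
cx.entity(e).remove::<Transform>().await;
```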
### bevy_defer
`bevy_defer` is a great crate that makes many strong tradeoff decisions. It runs on its own runtime/executor, not Bevy's or tokio's, which also means it cannot easily be used together with those; you must communicate over channels.
It has many higher-level niceties like `bevy_mod_async`, but its fundamental model is to store a `&mut World` in a thread-local and access it from futures. This makes it unsuitable for multithreaded runtimes like normal tokio usage as well, and it cannot be adapted to them. Some of its niceties might be worth adding as higher-level features, such as encoding access information (like the component type) in a copyable handle and then calling `.get` on it. Because of its structure, it is able to hand data to closures *without* an await call:
```rust=
let translation = AsyncWorld
    .entity(entity)
    .component::<Transform>()
    .get(|t| t.translation)
    .unwrap();
```
This differs from how `bevy_malek_async` and `bevy_mod_async` operate, which require the await call. It is possible because of `bevy_defer`'s single-threaded, thread-local nature.
`bevy_defer`'s approach *is* compatible with bringing `&mut` and other non-send types into its closures when accessing ECS data, unlike `bevy_mod_async`; however, its fundamental structure makes it incompatible with both parallel runtimes and being runtime-agnostic in general.
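A minimal sketch of that thread-local model as described above (heavily simplified, and not `bevy_defer`'s actual code):
```rust
use bevy::prelude::*;
use std::cell::Cell;

thread_local! {
    // The executor thread stashes a world pointer here before polling its
    // futures, so closures can reach the world synchronously.
    static WORLD: Cell<*mut World> = Cell::new(std::ptr::null_mut());
}

fn with_world<R>(f: impl FnOnce(&mut World) -> R) -> R {
    WORLD.with(|w| {
        let ptr = w.get();
        assert!(!ptr.is_null(), "only valid while the executor is being ticked");
        // SAFETY: sound only because the single-threaded executor is driven
        // from the exclusive system that set this pointer. This is exactly
        // why the model cannot be ported to a multithreaded runtime: the
        // pointer is only visible to futures polled on this one thread.
        f(unsafe { &mut *ptr })
    })
}
```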
### bevy_flurx
`bevy_flurx` is a wonderful crate with very detailed documentation. It uses async to allow for scheduling delays and interactions between things in Bevy's ECS. It also has 'side effects', which allow integration with tokio, the Bevy task runtime, and thread tasks. Together these create a very interesting, structured way of using async. It is highly specialized toward representing specific, useful logic with its abstractions. For certain kinds of game logic a strong argument could be made for its existence, but it is not the kind of *generally useful* async ECS integration that we are looking to upstream a version of.
This way of doing async is not reversible: it allows the integration of async code into the ECS, but not vice versa. It is also not runtime-agnostic, having specific extensions for tokio, and, in the opinion of Malek, it is not ergonomic in general, with a boilerplate-heavy syntax that makes expressing terse, easy-to-understand async <-> ECS code more difficult.
`bevy_flurx` exists to solve a very specific (and useful!) subset of timing problems using async, and does so well, but it is not the solution I think we should go with.
### bevy_async_task
`bevy_async_task` provides some improvements on normal Bevy async ergonomics, but entirely in the realm of doing async things from a system, not accessing ECS data from an async task. As such, its ideas are not very applicable to the niche we want to serve: making it ergonomic to access and manipulate ECS data / the Bevy world from async tasks.
### bevy_async_ecs
`bevy_async_ecs` is a very interesting crate with an ergonomic yet limited approach to async ECS access.
It sends command queues to perform its async ECS access, but the way it has gone about it means that what one can actually do is limited: you can run systems, spawn and despawn entities, and send and wait for events. All of it is rather interesting; I believe it could be modified to work in a runtime-agnostic manner, and it already works with multithreaded runtimes.
Currently it cannot do the equivalent of `cx.world_access(|&mut World|)` or `async_access(cx, |&mut World|)`, but its fundamental approach *could* be modified to support this. However, like `bevy_mod_async`, this limits it to `Send` futures, vastly harming ergonomics compared to non-`Send` futures, which let one ergonomically mutate state shared between the async and closure sections.
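A rough sketch of the command-queue mechanism described above, assuming a plain channel (this is not `bevy_async_ecs`'s actual internals, and the import path for `CommandQueue` varies across Bevy versions):
```rust
use bevy::ecs::world::CommandQueue;
use bevy::prelude::*;
use std::sync::mpsc::{Receiver, Sender};

// Async side: build a queue of commands and ship it to the world. Note the
// closure must be Send + 'static to cross the channel, which is exactly the
// ergonomics limitation discussed above.
fn queue_despawn(sender: &Sender<CommandQueue>, e: Entity) {
    let mut queue = CommandQueue::default();
    queue.push(move |world: &mut World| {
        world.despawn(e);
    });
    let _ = sender.send(queue);
}

// World side: drain the channel and apply each queue; in practice this
// would live in an exclusive system that runs every frame.
fn apply_async_queues(world: &mut World, receiver: &Receiver<CommandQueue>) {
    while let Ok(mut queue) = receiver.try_recv() {
        queue.apply(world);
    }
}
```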
In Malek's opinion this is the most promising method of upstreaming ergonomic async ecs access other than Malek's own `bevy_malek_async` crate.
### pecs
`pecs` is a macro-heavy method of doing async <-> ECS, reminiscent in structure of `bevy_flurx`, as it also relies heavily on method chaining.
The following is the example from its README:
```rust=
fn setup(mut commands: Commands, time: Res<Time>) {
    let start = time.elapsed_seconds();
    commands
        // create PromiseLike chainable commands
        // with the current time as state
        .promise(|| start)
        // will be executed right after the current stage
        .then(asyn!(state => {
            info!("Wait a second..");
            state.asyn().timeout(1.0)
        }))
        // will be executed a second after the previous call
        .then(asyn!(state => {
            info!("How large is the Bevy main web page?");
            state.asyn().http().get("https://bevyengine.org")
        }))
        // will be executed after the request completes
        .then(asyn!(state, result => {
            match result {
                Ok(response) => info!("It is {} bytes!", response.bytes.len()),
                Err(err) => info!("Ahhh... something went wrong: {err}")
            }
            state.pass()
        }))
        // will be executed right after the previous one
        .then(asyn!(state, time: Res<Time> => {
            let duration = time.elapsed_seconds() - state.value;
            info!("It took {duration:0.2}s to do this job.");
            info!("Exiting now");
            asyn::app::exit()
        }));
}
```
## Stage 1: Async Commands
## Async tasks
It seems to me that one of the major use-cases of async within the ECS would be to ergonomically handle multi-frame tasks.
Things like:
1) despawn this entity after 10 ticks
2) spawn entity after asset loaded
You can accomplish these using systems today, but:
- it has a performance cost: for 2) we would add an observer that triggers on **every** `AssetLoaded` event, and there is no easy way to despawn the observer afterwards
- it's not very ergonomic

A sketch of what the async alternative could look like follows below.
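For contrast, a hedged sketch of what example 1) might look like with the kind of API discussed later in this section (`spawn_task`, `ticks`, and `ecs_task` are all made-up names):
```rust
// Hypothetical: despawn an entity after 10 ticks, written as a task.
commands.spawn_task(async move {
    ticks(10).await; // made-up helper: resolves after 10 schedule runs
    ecs_task(move |mut commands: Commands| {
        commands.entity(e).despawn();
    })
    .await;
});
```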
The MultiThreadedExecutor already runs in a somewhat async manner:
- a certain number of threads in a thread pool run the same `spawn_system_tasks` fn, which tries to determine which systems can run by checking the `FilteredAccess`
- if a system can run: the `FilteredAccess` is updated and the system is spawned as an async Task. When the task completes, the execution metadata is updated.
Spawning bevy tasks would be fairly similar to spawning these systems.
However there are some differences.
#### Computing access
Similarly to systems, the bevy tasks would require ECS data and have an associated `FilteredAccess` that the executor could check to determine whether the task is currently runnable without conflict. However, for async tasks the ECS access might only be needed for a subset of the task; how can we specify this information?
```rust
async fn task() {
    ecs_task(|r: Res<R>| ...).await;
    time.sleep(1000).await;
    ecs_task(|s: Res<S>| ...).await;
}
```
For example, in this case we don't want the task to request access to both `R` and `S` for its entire duration; the access should be tailored to the portion of the task that currently needs to run.
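One way the per-step access could be derived (an assumption, not a settled design; the exact method names vary across Bevy versions) is to reuse the existing `SystemParam` machinery: initialize a throwaway system from the step's closure and read off its computed access:
```rust
use bevy::ecs::component::ComponentId;
use bevy::ecs::query::Access;
use bevy::ecs::system::{IntoSystem, System};
use bevy::prelude::*;

/// Hypothetical helper: compute the ECS access of a single `ecs_task` step
/// by leaning on the same machinery systems already use.
fn access_of<M>(
    world: &mut World,
    step: impl IntoSystem<(), (), M>,
) -> Access<ComponentId> {
    let mut system = IntoSystem::into_system(step);
    system.initialize(world); // registers params and computes their access
    system.component_access().clone()
}
```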
#### Run order
The executor currently uses the Schedules to create an ordered graph of systems that can run, along with their dependencies. It then uses a bit-set to track the current state of each system, to determine which systems can currently run.
For async tasks, how would we determine when a given task can run?
1) specifying order with systems/schedules
It might be very useful to be able to specify: 'run this portion of the task only after a given Schedule/System/Set'
For example something like:
```rust
async fn task_a() {
    ecs_task(|r: ResMut<R>| ...)
        .after_schedule(PostUpdate)
        .after_set(Transform::Propagate)
}
```
### General implementation
The general idea could be that the user can spawn `Future`s that would be scheduled on Bevy's task pool.
The user also has access to an `ecs_task` function that lets them create an ECS-aware Task.
`ecs_task` takes a `FunctionSystem` as input and registers it inside the current bevy executor's metadata.
When the executor tries to determine which systems can currently run (using the information computed from the schedule), it would also consider the added ecs-tasks as potential 'systems' that can be run.
Here the main complexity will come from efficiently determining which tasks can run. For systems we precompute all ordering-related constraints **once** when the Schedule is created; for tasks this would have to be done on the fly. How to compute this information efficiently is still an open problem. Even if the ecs-tasks had no ordering constraints, it might be expensive to figure out which ones can run:
- one strategy could be to run them all in a task-dedicated schedule, to ensure that they cannot conflict with the normal system schedules
- another (very expensive) option would be that after every system completes, we loop through all pending tasks to determine whether their accesses now allow them to run (sketched below)
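A minimal sketch of that per-system check, assuming the executor keeps a combined `Access` for everything currently running (the names here are illustrative):
```rust
use bevy::ecs::component::ComponentId;
use bevy::ecs::query::Access;
use std::task::Waker;

/// Hypothetical bookkeeping for an `ecs_task` step awaiting ECS access.
struct PendingEcsTask {
    access: Access<ComponentId>,
    waker: Waker,
}

/// After each system finishes, retry every pending task against the
/// combined access of everything currently running.
fn try_schedule_pending(
    active_access: &Access<ComponentId>,
    pending: &mut Vec<PendingEcsTask>,
) {
    pending.retain(|task| {
        if task.access.is_compatible(active_access) {
            // Wake the future so the task pool polls it again; the executor
            // would also merge `task.access` into its combined access here.
            task.waker.wake_by_ref();
            false // no longer pending
        } else {
            true
        }
    });
}
```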
Bevy also precomputes the list of conflicting systems based on their Accesses **once**, at schedule creation. We would have to move this step to happen on the fly whenever ecs-tasks are present. This should be doable, since Bevy used to do exactly that back when system conflict detection was based on `ArchetypeComponentId` rather than `ComponentId`.
The task is scheduled similarly to a system: a new async task is spawned for it, and upon its completion the executor metadata is updated (notably the combined Access) so that it can determine which new systems/tasks can run. What changes is that the executor would also call the Waker associated with the ecs-task, to let the bevy task pool know that the Future can now resume.
Some open questions:
- how do we efficiently find which tasks can run, considering that their access/ordering constraints were not pre-computed at the start of the schedule?
    - one option could be to have one step per frame where we re-compute the entire schedule graph with these 'ecs-tasks' added. Maybe there are graph algorithms that can do this efficiently when the new graph is only a slightly perturbed version of the original?
- how does the ecs-task get access to the current Executor so that it can add itself to its metadata?