# The Great Winit Refactorening This document lays out a plan of action for refactoring the winit runner. Through this initiative, we aim to: 1. Significantly simplify the winit runner and make it maintainable. 2. Decouple bevy ECS updates from the winit event loop. 3. Allow bevy to make full use of Web workers. 4. Allow bevy to run properly and easily without winit. 5. Introduce the infrastructure necessary for frame-pacing. 6. Introduce the infrastructure necessary for accurate input timestamps. These goals are all closely interrelated. ## 1. Plan Of Attack This plan revolves around three important changes: 1. Introduce a task-queue that can be used to inject code into the winit runner. 2. Move `!Send + !Sync` data out of the ECS and into the winit runner. 3. Introduce a channel that the winit runner uses to send events to the ECS world. These changes, taken together, will give us the option of running the ECS event loop on its own thread[^1]. This should unblock all the other goals we stated above. I will discuss each of these changes in detail, then proceed to outline some other details about how winit ought to be used in the future, and how we can approach the higher-level goals. ### 1.1 Main Thread Task Queue The basic structure of a winit application is as follows: ```rust #[derive(Default)] struct App { ... } impl ApplicationHandler for App { ... } fn main() { let event_loop = EventLoop::new().unwrap(); let mut app = App::default(); event_loop.run_app(&mut app); } ``` As far as we are concerned, the call to `run_app` __must__ take place on your process's main thread. This call essentially does not yield until your app is closed, meaning that it effectively takes charge of that thread. For reasons that I will discuss in great detail further on, this made a lot of people very angry and has been widely regarded as a bad move. 
While winit is running its event loop on the main thread, it periodically dispatches callbacks to a value that implements `ApplicationHandler` (`App` in the example above). This is how we run code within the main thread loop. Let's take a closer look at what callbacks are available to us.

```rust
pub trait ApplicationHandler<T: 'static = ()> {
    // Required methods
    fn resumed(&mut self, event_loop: &ActiveEventLoop);
    fn window_event(&mut self, event_loop: &ActiveEventLoop, window_id: WindowId, event: WindowEvent);

    // Provided methods
    fn new_events(&mut self, event_loop: &ActiveEventLoop, cause: StartCause) { ... }
    fn user_event(&mut self, event_loop: &ActiveEventLoop, event: T) { ... }
    fn device_event(&mut self, event_loop: &ActiveEventLoop, device_id: DeviceId, event: DeviceEvent) { ... }
    fn about_to_wait(&mut self, event_loop: &ActiveEventLoop) { ... }
    fn suspended(&mut self, event_loop: &ActiveEventLoop) { ... }
    fn exiting(&mut self, event_loop: &ActiveEventLoop) { ... }
    fn memory_warning(&mut self, event_loop: &ActiveEventLoop) { ... }
}
```

During normal operation, the winit event loop (as seen by an `ApplicationHandler`) looks something like this:

1. The thread sleeps for a bit.
2. It receives an event, wakes up, and calls `new_events`.
3. It calls `window_event`, `device_event`, and `user_event` a few times.
4. It runs out of events, calls `about_to_wait`, and goes to sleep.
5. Return to step 1.

We'll focus on these five calls. Neither `new_events` nor `about_to_wait` is especially useful to us. Both `window_event` and `device_event` are the main way that bevy receives input from the OS (mouse movement, button presses, and so on). There's also a special `WindowEvent` called `RedrawRequested`, which informs us that the OS would like us to repaint our window. For us, the most important of these is `user_event`. The `ApplicationHandler` trait is generic, so it can receive any type we like.
And instead of receiving data from the operating system, _we_ can send these events via an `EventLoopProxy` (which is `Send + Sync`).

>[!Important] Why This Matters
> The important thing here is that `EventLoopProxy::send_event` _wakes the winit thread if it is sleeping_ so that it can call the `user_event` callback, which means we can run some custom code within the main thread loop, whenever we want.

We can take advantage of this to execute arbitrary closures within `user_event`, by treating winit as a work queue.

```rust
// somewhere in bevy_winit
...

pub type WinitTask = Box<dyn FnOnce(&mut WinitApp, &ActiveEventLoop) + Send + 'static>;

impl ApplicationHandler<WinitTask> for WinitApp {
    ...
    fn user_event(&mut self, event_loop: &ActiveEventLoop, event: WinitTask) {
        event(self, event_loop);
    }
    ...
}

enum WinitTaskError {
    NoEventLoopProxy,
    EventLoopClosed
}

impl World {
    /// Runs the provided closure within the winit event loop
    pub fn spawn_winit_task<F>(&self, func: F) -> Result<(), WinitTaskError>
    where
        F: FnOnce(&mut WinitApp, &ActiveEventLoop) + Send + 'static
    {
        self.get_resource::<EventLoopProxyWrapper<WinitTask>>()
            .ok_or(WinitTaskError::NoEventLoopProxy)?
            .send_event(Box::new(func) as WinitTask)
            .map_err(|_| WinitTaskError::EventLoopClosed)
    }
}
```

Here's an example of what this might look like:

```rust
// Asynchronously creating a window from a non-main thread
world.spawn_winit_task(|bevy_winit_runner, event_loop| {
    let window_attributes = Window::default_attributes()
        .with_title("A lovely new window!");

    let new_window = event_loop.create_window(window_attributes);

    // This isn't a standard api, it's just an example of how we might
    // be able to store data on the bevy_winit_runner within callbacks.
    bevy_winit_runner.add_window(new_window)
});
```

>[!Note]
> I'm using the world here instead of a `static` because statics don't work across web-workers, whereas normal references do.
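As a sanity check that this pattern holds up without any winit machinery, here is a self-contained, std-only sketch of the same idea: a dedicated thread drains a queue of boxed `FnOnce` closures that mutate state it owns. `MainThreadState`, `Task`, and `run_demo` are hypothetical stand-ins for `WinitApp`, `WinitTask`, and the runner; the `mpsc` channel plays the role of the `EventLoopProxy`.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for the winit-side state (`WinitApp` in the proposal).
struct MainThreadState {
    counter: u32,
}

// A task is just a boxed closure run against the main-thread state,
// playing the role of `WinitTask`.
type Task = Box<dyn FnOnce(&mut MainThreadState) + Send + 'static>;

// Spawns the "main thread loop", sends it two tasks from outside,
// and returns the final counter value.
fn run_demo() -> u32 {
    let (sender, receiver) = mpsc::channel::<Task>();

    // The loop: block until a task arrives, then run it. In the real
    // design this role is played by the `user_event` callback.
    let main_loop = thread::spawn(move || {
        let mut state = MainThreadState { counter: 0 };
        while let Ok(task) = receiver.recv() {
            task(&mut state);
        }
        state.counter
    });

    // Any other thread can inject code into that loop, much like
    // `spawn_winit_task` would via the `EventLoopProxy`.
    sender
        .send(Box::new(|state: &mut MainThreadState| state.counter += 1))
        .unwrap();
    sender
        .send(Box::new(|state: &mut MainThreadState| state.counter += 10))
        .unwrap();

    drop(sender); // closing the channel ends the loop
    main_loop.join().unwrap()
}

fn main() {
    println!("final counter: {}", run_demo());
}
```

The one thing this sketch cannot model is the wake-up behavior: a plain `recv` already blocks, whereas winit needs `send_event` to explicitly wake its sleeping event loop.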
We might also want to implement a version of `spawn_winit_task` that blocks the calling context, since this would let us return an output, and also allow the closure to safely borrow locally scoped variables.

```rust
impl World {
    pub fn scoped_winit_task<F, T>(&self, func: F) -> Result<T, WinitTaskError>
    where
        F: FnOnce(&mut WinitApp, &ActiveEventLoop) -> T + Send,
        T: Send + 'static
    {
        // I'm going to omit the actual implementation, since to do it
        // properly we'd want to avoid heap allocation, and that would
        // require a fair amount of unsafe code that isn't really
        // relevant to the broader proposal.
        //
        // Be content that this can be done. If you are curious, check
        // out forte's `StackJob` type.
    }
}
```

This is a sorta-sketchy thing to do, but it's very flexible.

```rust
let name = "foo";

let item = world.scoped_winit_task(|bevy_winit_runner, event_loop| {
    // this is borrowing `name`, passing it to our runner, and returning data!
    return bevy_winit_runner.get_some_property(&name);
});
```

With these two methods in hand, you may be starting to see how I intend to approach the data-storage problem.

### 1.2 Main Thread Data Storage

I have good news, and I have bad news. The good news is: `World` is `Send + Sync`. The bad news is, it's not send or sync in any way that matters.

It turns out that even though `World` is `Send + Sync`, it can still contain `!Send + !Sync` resources. It just panics if they are accessed across threads. This currently prevents us from moving the ECS out of the main thread, since resources are first created on the main thread, and then handed to the ECS. We can't move the ECS without leaving them behind.

>[!Note]
> Rust datatypes come in several forms: the good (`Send + Sync`), the bad (`!Send + !Sync` + locked to the main thread), and the ugly (four other permutations that are neither exactly "good" nor entirely "bad").
>
> Since "bad" datatypes have the most restrictions, any infrastructure that can handle them will also work for "ugly" data-types. So going forward, I'm going to assume all our datatypes are either "good" (`Send + Sync`) or "bad" (`!Send + !Sync` and locked to the main thread), and just ignore all the "ugly" ones.
>
> I'm also going to keep using "bad" to refer to this type of data.

We need to take "bad" resources out of the ECS entirely. To start with, we can remove `Storages::non_send_resources` and the hacky const generic `SEND` from `Resources` and `ResourceData`. We can also remove the following methods:

+ `World::insert_non_send_resource`
+ `World::insert_non_send_by_id`
+ `World::init_non_send_resource`
+ `World::initialize_non_send_internal`
+ `World::remove_non_send_resource`
+ `World::remove_non_send_by_id`
+ `World::contains_non_send`
+ `World::contains_non_send_by_id`
+ `World::non_send_resource`
+ `World::non_send_resource_mut`
+ `World::get_non_send_resource`
+ `World::get_non_send_resource_mut`
+ `World::get_non_send_by_id`
+ `World::get_non_send_mut_by_id`

From now on, "bad" data will need to be stored in thread-locals or static variables. This lets us avoid contaminating any other types (like `App` or `World`) and solves some fiddly platform-specific constraints.

```rust
cfg::std! {
    if {
        thread_local! {
            static NONSEND_ATTACHMENTS: RefCell<AnyMap> = RefCell::new(AnyMap::new());
        }
    } else {
        // thread_local is not supported on no_std targets
        static NONSEND_ATTACHMENTS: RefCell<Option<AnyMap>> = RefCell::new(None);
    }
}
```

The methods for adding non-send data to `App` should just proxy over to this storage.

```rust
impl App {
    ...
    pub fn insert_non_send_data<D: 'static>(&mut self, data: D) -> &mut Self {
        cfg::std! {
            if {
                NONSEND_ATTACHMENTS.with_borrow_mut(|attachments| {
                    attachments.insert(data);
                });
            } else {
                NONSEND_ATTACHMENTS
                    .borrow_mut()
                    .get_or_insert_with(AnyMap::default)
                    .insert(data);
            }
        }
        self
    }
    ...
}
```

To access non-send data, we should provide a convenience api that combines the tasks from section 1.1 with the backing storage.

```rust
impl World {
    fn with_nonsend_data<D, F>(&self, func: F)
    where
        D: 'static,
        F: FnOnce(&mut D) + Send + 'static
    {
        self.spawn_winit_task(move |_, _| {
            cfg::std! {
                if {
                    NONSEND_ATTACHMENTS.with_borrow_mut(|attachments| {
                        attachments.get_mut::<D>().map(func);
                    })
                } else {
                    if let Some(attachments) = NONSEND_ATTACHMENTS.borrow_mut().as_mut() {
                        attachments.get_mut::<D>().map(func);
                    }
                }
            }
        });
    }
}
```

This ends up working quite a bit like `resource_scope`.

### 1.3 Windowing Events Channel

The last thing needed before we can move the ECS out of the winit event loop is some way to forward windowing events to the ECS. This calls for a channel (probably `crossbeam_channel::unbounded` or something similar).

```rust
// somewhere in bevy_winit
...

enum RuntimeEvent {
    Suspended,
    Resumed,
    Exiting,
    WindowEvent { window_id: WindowId, event: WindowEvent },
    DeviceEvent { device_id: DeviceId, event: DeviceEvent },
    MemoryWarning
}

// This is the same implementation of winit::ApplicationHandler again
struct WinitApp {
    ...
    pub events: Sender<RuntimeEvent>
    ...
}

impl ApplicationHandler<WinitTask> for WinitApp {
    // contains the simple task executor from section 1.1
    fn user_event(&mut self, event_loop: &ActiveEventLoop, event: WinitTask) {
        ...
    }

    fn suspended(&mut self, event_loop: &ActiveEventLoop) {
        self.events.send(RuntimeEvent::Suspended);
    }

    fn resumed(&mut self, event_loop: &ActiveEventLoop) {
        self.events.send(RuntimeEvent::Resumed);
    }

    fn exiting(&mut self, event_loop: &ActiveEventLoop) {
        self.events.send(RuntimeEvent::Exiting);
    }

    fn window_event(
        &mut self,
        event_loop: &ActiveEventLoop,
        window_id: WindowId,
        event: WindowEvent,
    ) {
        self.events.send(RuntimeEvent::WindowEvent { window_id, event });
    }

    fn device_event(
        &mut self,
        event_loop: &ActiveEventLoop,
        device_id: DeviceId,
        event: DeviceEvent,
    ) {
        self.events.send(RuntimeEvent::DeviceEvent { device_id, event });
    }

    fn memory_warning(&mut self, event_loop: &ActiveEventLoop) {
        self.events.send(RuntimeEvent::MemoryWarning);
    }
}
```

Then we just need to set up a runner in `bevy_winit` that processes this stream *on the ECS thread* before running the schedule. A very simple version might look something like this:

```rust
// Drains pending events, then runs a single app update, forever
fn run_app(
    mut app: App,
    mut events: Receiver<RuntimeEvent>,
) {
    loop {
        // Handle all delivered events
        while let Ok(event) = events.try_recv() {
            // handle events here
            app.handle_event(event);
        }

        // Then execute a single "tick" of the world
        app.update();
    }
}
```

This is a terrible implementation of `run_app` for lots of reasons. Ignore its faults for now; we will fix them in chapter 3.

## 2. Demolishing `WinitAppRunnerState`

I am going to be blunt: our current `ApplicationHandler` implementation (the `WinitAppRunnerState` type) is pretty much unsalvageable. I am strongly of the opinion that it is time to write a new runner from scratch. We may be able to repurpose some of the logic in a piecemeal fashion, but it really is an absolute mess. In this section, we'll write the basics of a replacement.

### 2.1 The Current State

The heart of `bevy_winit` is effectively this little snippet, taken from `WinitPlugin::build`.
```rust
let mut event_loop_builder = EventLoop::<WakeUp>::with_user_event();

let event_loop = event_loop_builder
    .build()
    .expect("Failed to build event loop");

app.set_runner(|app| winit_runner(app, event_loop))
```

This tells bevy to call `winit_runner` after it finishes building the app. Here's a slightly simplified version of that function:

```rust
pub fn winit_runner(mut app: App, event_loop: EventLoop<WakeUp>) -> AppExit {
    app.world_mut()
        .insert_resource(EventLoopProxyWrapper(event_loop.create_proxy()));

    cfg::web! {
        if {
            let runner_state = WinitAppRunnerState::new(app);
            event_loop.spawn_app(runner_state);
            AppExit::Success
        } else {
            let mut runner_state = WinitAppRunnerState::new(app);
            if let Err(err) = event_loop.run_app(&mut runner_state) {
                error!("winit event loop returned an error: {err}");
            }

            // If everything is working correctly then the event loop only exits
            // after it's sent an exit code.
            runner_state.app_exit.unwrap_or_else(|| {
                error!("Failed to receive an app exit code! This is a bug");
                AppExit::error()
            })
        }
    }
}
```

>[!Note]
> The most important part of this is the call to `WinitAppRunnerState::new(app)`. We are passing the app directly into the winit app runner.

That just about gets us to `WinitAppRunnerState` which, as I've already said, is a mess.

### 2.2 Replacing `winit_runner`

The big thing that needs to happen here is to move the ECS to its own thread before we start winit. Incorporating that with the changes from chapter 1 will give us a `winit_runner` that looks something like this:

```rust
// Note: I've replaced `WakeUp` with the `WinitTask` type from section 1.1
pub fn winit_runner(mut app: App, event_loop: EventLoop<WinitTask>) -> AppExit {
    app.world_mut()
        .insert_resource(EventLoopProxyWrapper(event_loop.create_proxy()));

    // Create channels for sending runtime events
    let (runtime_events_sender, runtime_events_receiver) = unbounded();

    cfg::web!
    {
        if {
            // this is using `run_app` from section 1.3
            // wasm_thread just provides thread-style apis for web-workers
            wasm_thread::spawn(move || run_app(app, runtime_events_receiver));

            // this no longer takes ownership of the bevy app
            let winit_app = WinitApp::new(runtime_events_sender);
            event_loop.spawn_app(winit_app);

            AppExit::Success
        } else {
            // this is using `run_app` from section 1.3
            thread::spawn(move || run_app(app, runtime_events_receiver));

            // this no longer takes ownership of the bevy app
            let mut winit_app = WinitApp::new(runtime_events_sender);
            if let Err(err) = event_loop.run_app(&mut winit_app) {
                error!("winit event loop returned an error: {err}");
            }

            // TODO: this branch still needs to report the real exit code; one
            // option is to have `run_app` return an `AppExit` and retrieve it
            // by joining the ECS thread here.
            AppExit::Success
        }
    }
}
```

Hey-Presto: our ECS is now running in its own thread! And, now that the runner doesn't own the app, probably 90% of the logic in `WinitAppRunnerState` can be removed or ported over to `run_app`. `WinitApp` will be tiny, probably not much more than what was already shown in sections 1.1 and 1.3. By contrast, `run_app` will probably assume much of the complexity budget, and be much longer than what I presented here.

## 3. The Rest of the Fucking Owl

That's the broad strokes of the proposal, but there's a very, very long tail of details which will need to be resolved to keep feature parity with the existing solution. In this section, we'll address those additional details one by one.

### 3.1 Single-Threaded Fallback

Hot take: I don't think we should support running the ECS within winit after this change. Any platform supported by winit will realistically support running the ECS on a separate thread, and there's basically no reason not to take advantage of that.

Obviously, `no_std` targets don't support threading, but they don't support `winit` either, so this isn't an issue. Winit is for operating systems, and operating systems _have_ threads. On the web, the story is the same. Web workers are ubiquitous and have far better support than other things we rely on (like `WebGPU`).
IMO, bevy should plan to _require web worker support_ at some point in the future. This will complicate web builds, but this complexity is manageable.

### 3.3 Running Winit Off the Main Thread

Winit has a `run_on_any_thread` option. In this case, we should run the ECS on the main thread instead. This will require changes to:

+ `winit_runner`, to swap around the thread spawning
+ `World::with_nonsend_data`, to directly access resources stored in the main ECS thread.

I am currently not exactly sure what this will look like. We may need to put some kind of task queue in the main ECS executor.

### 3.4 Non-Send Data Without Winit

We also need to support non-send data when winit is not in use. This seems similar to the problem from section 3.3, since the ECS owns the main thread where the data is stored. Anything that works for 3.3 should work for this as well.

### 3.5 The Windows-As-Entities API

Winit windows _can_ be used from any thread, but may make a blocking call to the main thread on some platforms. When making multiple calls on a window, it's much more efficient to send a single task to the main thread and do all the window logic there. Supporting bevy's existing windowing API will take some care, as now we will have to pick which side of the winit integration things live on: on the winit side, or within the ECS. I recommend:

+ Storing mappings between winit IDs and entities on the ECS side
+ Storing winit types themselves on the winit side
+ Using `scoped_winit_task` to do bulk data reads from the winit side to the ECS
+ Using `spawn_winit_task` to do bulk data writes from the ECS to winit
+ Calling `scoped_winit_task` and `spawn_winit_task` as few times as possible

### 3.6 Reactive & Continuous Presentation

Bevy can run in either "reactive" or "continuous" update modes. Reactive mode means that ECS updates are only run after (a) an input is received, (b) the OS requests a repaint, or (c) the app itself requests a repaint.
Continuous mode just runs app updates continuously. To model this, we're going to have to revisit that super-minimal version of `run_app` I showed above. Let's try just implementing the "reactive" mode first.

```rust
fn run_app(
    mut app: App,
    mut events: Receiver<RuntimeEvent>,
) {
    loop {
        // Wait to receive an event
        match events.recv() {
            Ok(event) => app.handle_event(event),
            Err(_) => return,
        }

        // Try to drain any other events that may be pending
        loop {
            match events.try_recv() {
                Ok(event) => app.handle_event(event),
                Err(TryRecvError::Empty) => break,
                Err(TryRecvError::Disconnected) => return,
            }
        }

        // Then execute a single "tick" of the world
        app.update();
    }
}
```

To allow bevy to request a manual redraw, we can pass a `Sender<RuntimeEvent>` into the app and provide a method to send a `RuntimeEvent::RequestUpdate` event, or something. Then, to provide a continuous update mode, all we really need to do is send a `RequestUpdate` event every frame.

### 3.7 Frame-Pacing

Let's assume we have some function `app.next_update_boundary() -> Instant` that provides frame-pacing timings, and we want to avoid calling `app.update()` before the timestamp it gives us. To do this, we can return to `run_app`.
```rust
fn run_app(
    mut app: App,
    mut events: Receiver<RuntimeEvent>,
) {
    let mut next_update = Instant::now();

    loop {
        // Wait to receive any event
        match events.recv() {
            Ok(event) => app.handle_event(event),
            Err(_) => return,
        }

        // Receive other events while waiting for the update boundary
        loop {
            match events.recv_deadline(next_update) {
                Ok(event) => app.handle_event(event),
                Err(RecvTimeoutError::Timeout) => break,
                Err(RecvTimeoutError::Disconnected) => return,
            }
        }

        // Then execute a single "tick" of the world
        app.update();

        // Determine when we should start the next update
        next_update = app.next_update_boundary();
    }
}
```

This provides frame-pacing for both reactive and continuous modes, and also has the benefit of preventing us from getting stuck in `try_recv` if there is an endless stream of messages.

### 3.8 Input Timestamps

Running the ECS on its own thread also lets us improve the accuracy of our input timings. Winit does not (currently) provide timestamps with inputs, so it's up to us to collect them when we first receive the event. For this reason, it behooves us to do as little work in the winit runner as possible, to avoid introducing latency between event creation and event read.

```rust
enum RuntimeEvent {
    Suspended,
    Resumed,
    Exiting,
    WindowEvent { window_id: WindowId, timestamp: Instant, event: WindowEvent },
    DeviceEvent { device_id: DeviceId, timestamp: Instant, event: DeviceEvent },
    MemoryWarning
}

struct WinitApp {
    pub events: Sender<RuntimeEvent>
}

impl ApplicationHandler<WinitTask> for WinitApp {
    fn user_event(&mut self, event_loop: &ActiveEventLoop, event: WinitTask) {
        ...
    }

    fn suspended(&mut self, event_loop: &ActiveEventLoop) {
        self.events.send(RuntimeEvent::Suspended);
    }

    fn resumed(&mut self, event_loop: &ActiveEventLoop) {
        self.events.send(RuntimeEvent::Resumed);
    }

    fn exiting(&mut self, event_loop: &ActiveEventLoop) {
        self.events.send(RuntimeEvent::Exiting);
    }

    fn window_event(
        &mut self,
        event_loop: &ActiveEventLoop,
        window_id: WindowId,
        event: WindowEvent,
    ) {
        let timestamp = Instant::now();
        self.events.send(RuntimeEvent::WindowEvent { window_id, timestamp, event });
    }

    fn device_event(
        &mut self,
        event_loop: &ActiveEventLoop,
        device_id: DeviceId,
        event: DeviceEvent,
    ) {
        let timestamp = Instant::now();
        self.events.send(RuntimeEvent::DeviceEvent { device_id, timestamp, event });
    }

    fn memory_warning(&mut self, event_loop: &ActiveEventLoop) {
        self.events.send(RuntimeEvent::MemoryWarning);
    }
}
```

## 4. Other Considerations

This is a living document, and I'm going to try to reify any surrounding decisions / discussions into this section.

### 4.1 Web-Atomics

It's our opinion that, in order for bevy to meet its performance and quality targets, we need to use atomics on the web. Therefore, this proposal is written assuming atomics are available. In our opinion, bevy should require the nightly compiler for web builds going forward, along with a very specific set of compiler flags (including bulk-memory and atomics).

### 4.2 Non-Winit Backends

In order to ensure non-winit backends are supported, we should develop a very minimal "windowing backend" example that demonstrates the correct architectural patterns.

### 4.3 Ugly Data-Types

This proposal suggests locking all non-send data to the main thread, even for types that can be created on arbitrary threads. In the future, we should design better solutions for handling this type of non-send-but-not-locked-to-main data directly within the ECS.

### 4.4 Web Multithreading Requirements, Benefits & Costs

To use all the multithreading bells and whistles on wasm, you must:

1. Use nightly
2. Pass `target-feature=+atomics` to rustflags
3. Build with `--target=wasm32-unknown-unknown` and `build-std=std,panic_abort`
4. Serve with the `Cross-Origin-Opener-Policy` header set to `same-origin` and the `Cross-Origin-Embedder-Policy` header set to `require-corp` or `credentialless`

Notably:

+ Many other useful browser APIs are locked behind these headers, like sub-millisecond-precision timers.
+ `itch.io` claims to enable these headers automatically.
+ Enabling these flags does make it harder to pull in assets from other domains, such as Google Fonts.

[^1]: When I refer to "threads", I mean this to also include web-workers.
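For reference, build steps 1 through 3 above can be combined into a single invocation that looks something like the following. This is a sketch, not a verified recipe: the exact set of target features (e.g. whether `+bulk-memory` must be spelled out alongside `+atomics`) depends on the toolchain version.

```shell
# Build a multithreaded wasm binary (requires the nightly toolchain).
# `-C target-feature=...` enables shared-memory wasm, and
# `-Z build-std` rebuilds std with those features enabled.
RUSTFLAGS='-C target-feature=+atomics,+bulk-memory' \
  cargo +nightly build \
    --target wasm32-unknown-unknown \
    -Z build-std=std,panic_abort
```

Step 4 (the cross-origin isolation headers) is a server-side concern and is independent of the build itself.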