Bevy's Rendering Crates

authors: tychedelia, nth

This is a work in progress, and may contain omissions or mistakes. If you notice an issue or have a question, please don't hesitate to:

  • Bring it up in #documentation-dev on the bevy discord
  • Leave feedback in the form of a comment or suggested edit
  • Or just fix it yourself

This is a community document, which anyone can edit!

You are welcome to:

  • Add new sections
  • Complete tasks listed as TODO
  • Rewrite anything for clarity or correctness
  • Fix spelling mistakes
  • Do anything else at all

Submitted as bevy_website PR #1080.

Introduction

This is a technical summary of Bevy's rendering architecture as of the 0.13 release.

Bevy's rendering code is largely concentrated in three loosely-coupled crates: bevy_render, bevy_core_pipeline, and bevy_pbr. The foundation is bevy_render, which manages how low-level rendering tasks are scheduled and executed. Above this sits bevy_core_pipeline, which defines the individual steps of rendering each frame. Above both is bevy_pbr, which adds a material abstraction and uses it to provide physically-based shaders.

These three crates sit roughly in the "upper-middle" of Bevy's default stack and do not fully encompass all of Bevy's rendering capabilities. Although we will not cover them here, both bevy_ui and bevy_sprite also hook into and interact with rendering.

We will begin with the lowest levels of the stack and progress upwards towards high-level concepts like materials. But before diving in, let's take a look at the two big dependencies underpinning the rest of the rendering stack: wgpu and bevy_ecs.

wgpu Preliminaries

This is a quick overview of some important aspects of wgpu. Those experienced with wgpu can skip this section. Those unfamiliar with wgpu should check out the wgpu book and the WebGPU spec.

wgpu is a Rust crate providing a cross-platform implementation of WebGPU, a modern graphics API. It strikes a nice balance between ergonomics and power, and can run pretty much anywhere.

Layers of Abstraction

wgpu is a low level graphics library, but because it is designed to work across a wide range of platforms there are many layers of abstraction between it and the graphics hardware. Going from lowest to highest level:

  1. First, there is the GPU hardware itself.
  2. Above this sit platform-specific GPU drivers (supplied by Nvidia, AMD, Apple, Windows or the Linux Kernel),
  3. Then there is the native GPU API (Metal, Vulkan, GLES, etc.) supplied by your platform,
  4. Then the platform-specific adapters (wgpu_hal::vulkan, wgpu_hal::gles, wgpu_native, the browser's own WebGPU adapter, etc.), which are low-level unsafe Rust bindings to the native API,
  5. And finally there is the safe WebGPU api provided by wgpu, which abstracts over all the previous layers.

It's good to be vaguely aware of this stack. wgpu does its best to stay consistent, but some features are not fully supported on all platforms. It's also common for the "Hardware Abstraction Layer" (HAL) implementations on layer 4 to support features beyond the base WebGPU spec.

For example, some features that are platform dependent are:

  • Textures, particularly those used for advanced use cases like compression.
  • The availability of storage buffers.
  • Compute shader capabilities.

Additionally, because WebGPU is both a slow-moving specification and a kind of "lowest common denominator" API, there may be some advanced rendering features supported by your GPU, e.g. mesh shaders, that are not yet supported in wgpu.

The software sections of this stack (layers 2-5) are represented in wgpu by a wgpu::Adapter and a wgpu::Device. The adapter provides access to the platform-specific WebGPU implementation (layers 2 and 3 of the stack). The device is a logical representation of layers 4 and 5 obtained from the adapter, and is ultimately what allows us to submit work to the GPU. Both of these abstractions come directly from the WebGPU spec.
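
To make this concrete, here is a minimal sketch of acquiring these handles directly through wgpu (outside of Bevy). Exact descriptor fields and method signatures vary a little between wgpu versions, so treat it as illustrative:

```rust
// Acquire the adapter (the platform's WebGPU implementation) and the device
// (the logical handle used to create resources and submit work).
async fn init_wgpu() -> (wgpu::Adapter, wgpu::Device, wgpu::Queue) {
    let instance = wgpu::Instance::default();
    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions::default())
        .await
        .expect("no suitable GPU adapter found");
    let (device, queue) = adapter
        .request_device(&wgpu::DeviceDescriptor::default(), None)
        .await
        .expect("failed to create a device");
    (adapter, device, queue)
}
```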

Buffers

Allocating memory is the most basic of all operations. In wgpu, GPU memory management is done through buffers.

A wgpu::Buffer represents a contiguous block of memory allocated on the GPU. Buffers can be created by calls to wgpu::Device::create_buffer (which only allocates the buffer) or wgpu::util::DeviceExt::create_buffer_init (which both allocates and initializes the buffer).

You can reference a slice of a buffer using wgpu::Buffer::slice. This returns a wgpu::BufferSlice, which can be "mapped" to main program memory for reading and writing. Writes are flushed by calling wgpu::Buffer::unmap, and the buffer can be deallocated with wgpu::Buffer::destroy. Writes can also be queued as part of commands (which we will talk about later) using wgpu::Queue::write_buffer. Basically, write_buffer delays the memory write until the next set of instructions is sent to the GPU.

When you create a buffer, you generally have to declare the intended use with wgpu::BufferUsages. For example, buffers must have BufferUsages::MAP_READ to be mapped for reading.
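
As a rough illustration (assuming a device and queue obtained as in the previous sketch), creating and updating a small buffer looks something like this:

```rust
use wgpu::util::DeviceExt; // provides create_buffer_init

fn make_buffer(device: &wgpu::Device, queue: &wgpu::Queue) -> wgpu::Buffer {
    // Allocate and initialize in one call, declaring the intended usage up front.
    let buffer = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
        label: Some("example uniform buffer"),
        contents: &[0u8; 16],
        usage: wgpu::BufferUsages::UNIFORM | wgpu::BufferUsages::COPY_DST,
    });
    // write_buffer does not write immediately: the copy is queued and performed
    // when the next batch of commands is submitted to the GPU.
    queue.write_buffer(&buffer, 0, &[1u8; 16]);
    buffer
}
```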

Textures, Views, and Samplers

A wgpu::Texture represents an image stored in GPU memory. Textures can be created by calling wgpu::Device::create_texture, which requires information about the format, dimensions, and so on. The main way to write to a texture is [wgpu::Queue::write_texture], which is similar to [wgpu::Queue::write_buffer] in that the write is delayed until commands are next sent to the GPU. It's also possible to copy a buffer to a texture using [wgpu::CommandEncoder::copy_buffer_to_texture]. Like buffers, textures can be deallocated using [wgpu::Texture::destroy].

Textures are usually accessed in shaders through views and samplers.

A [wgpu::TextureView] is a "view" on a specific texture. The view describes how the raw texture data should be interpreted, and the subset of the texture to access. Views are created using [wgpu::Texture::create_view]. They will be very important when we get into bind groups later on.

A [wgpu::Sampler] is a method of sampling color values from a texture at a given texture coordinate. They are created using [wgpu::Device::create_sampler], and will also be important in the context of bind groups, which we are about to introduce.
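
A hedged sketch tying the three together (descriptor fields differ slightly between wgpu versions):

```rust
fn make_texture(device: &wgpu::Device) -> (wgpu::TextureView, wgpu::Sampler) {
    let texture = device.create_texture(&wgpu::TextureDescriptor {
        label: Some("example texture"),
        size: wgpu::Extent3d { width: 256, height: 256, depth_or_array_layers: 1 },
        mip_level_count: 1,
        sample_count: 1,
        dimension: wgpu::TextureDimension::D2,
        format: wgpu::TextureFormat::Rgba8UnormSrgb,
        // Will be bound in shaders and written to via queue/encoder copies.
        usage: wgpu::TextureUsages::TEXTURE_BINDING | wgpu::TextureUsages::COPY_DST,
        view_formats: &[],
    });
    // The view decides how the raw texture data is interpreted by shaders.
    let view = texture.create_view(&wgpu::TextureViewDescriptor::default());
    // The sampler decides how the texture is filtered when sampled.
    let sampler = device.create_sampler(&wgpu::SamplerDescriptor::default());
    (view, sampler)
}
```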

Pipelines and Shaders

Before introducing bind groups, we have to introduce the idea of a Pipeline.

A pipeline is like a highly-specialized program that runs on the GPU. Currently wgpu supports two types of pipeline: [wgpu::RenderPipeline] for rasterizing geometry, and [wgpu::ComputePipeline] for general computation.

Pipelines are composed of a sequence of stages, which can be either fixed or programmable. Fixed stages are inflexible hardware-accelerated operations, with no user control. Programmable stages, on the other hand, can be controlled by user-supplied shader modules.

Shaders

Shaders are small, highly-specialized programs that run on the GPU within the programmable stages of pipelines. They are written in a language called WGSL (the WebGPU Shading Language).

Unlike code written to run on the CPU, shader programs are highly parallel by default, meaning your code will be executed across many threads at the same time, for example once per pixel in a fragment shader. This places some limitations on what shader code can look like: shaders must avoid complex branching or dependencies between threads to ensure efficient parallel execution. Operations should be designed to run independently for each thread, and control flow should remain uniform where possible to prevent performance bottlenecks caused by thread divergence.

Shader programs also tend to be written to target a specific stage of the programmable graphics pipeline. We describe the pipeline as "programmable" because it supports running shader programs at certain steps, as opposed to fixed-function APIs that are not programmable. For example, in a traditional graphics pipeline, a vertex shader processes vertex data to transform it into screen space, and the result is then passed to a fixed-function rasterizer which cannot be explicitly programmed. Similarly, a fragment shader calculates the color of each pixel, which is then used by a fixed-function stage to write the pixel data to the framebuffer. The stages of this pipeline may also be backed by specialized hardware in your GPU for efficiency.

In recent years, it has also become possible both to program more stages of the graphics pipeline, as well as to write "general purpose" shaders known as compute shaders, which are not tied to the traditional graphics pipeline. Compute shaders allow developers to perform a wide range of parallel computations, such as physics simulations, image processing, or machine learning, using the GPU’s immense computational power for tasks beyond rendering. Combining compute shaders with more traditional techniques allows building powerful modern graphics features.

Layouts and Bind Groups

Bindings allow buffers, texture views, and samplers (as well as arrays of the same) to be accessed in pipelines just like global variables. Each binding occupies a unique (but not necessarily sequential) index from 0 to 1000 in one of four "bind group" slots.

On the GPU side, the expected type for each binding must be hard-coded in the shader. On the CPU side, the layout of bindings for each slot is specified by a "bind group layout" ([wgpu::BindGroupLayout]). Up to four of these layouts can be specified when creating a pipeline, one for each bind group slot.

Bind groups are created on the CPU to assign specific buffers and textures to each binding in a layout.

Once created, bind groups are bound to the four pipeline bind group slots using GPU commands (which we will talk more about later). When a pipeline is invoked (also via GPU commands), the group bound to each slot must match the layout of the slot. Bind groups work as a stack: when a group is rebound, all bind groups with a higher slot index must be rebound as well.
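
Here is a hedged sketch of the CPU side: a layout describing a single uniform-buffer binding at index 0, and a bind group assigning a concrete buffer to it. In the shader this would correspond to a declaration like @group(0) @binding(0) var<uniform> ...:

```rust
fn make_bind_group(device: &wgpu::Device, buffer: &wgpu::Buffer) -> wgpu::BindGroup {
    let layout = device.create_bind_group_layout(&wgpu::BindGroupLayoutDescriptor {
        label: Some("example layout"),
        entries: &[wgpu::BindGroupLayoutEntry {
            binding: 0,
            visibility: wgpu::ShaderStages::VERTEX_FRAGMENT,
            ty: wgpu::BindingType::Buffer {
                ty: wgpu::BufferBindingType::Uniform,
                has_dynamic_offset: false,
                min_binding_size: None,
            },
            count: None,
        }],
    });
    // The bind group must match the layout entry-for-entry.
    device.create_bind_group(&wgpu::BindGroupDescriptor {
        label: Some("example bind group"),
        layout: &layout,
        entries: &[wgpu::BindGroupEntry {
            binding: 0,
            resource: buffer.as_entire_binding(),
        }],
    })
}
```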

Only certain types of bindings can be written to by shaders:

  • Storage buffers
  • Storage textures

Compute Pipelines

A compute shader is a GPU program designed for general-purpose parallel computation. Compute shaders can bind resources like buffers and textures.

Workgroups

A compute shader operates on data in parallel, with threads organized into workgroups: small, cohesive groups of threads that share memory and are scheduled together by the GPU. Each workgroup contains multiple threads (also called invocations), and each thread has a unique local ID within the group.

Dispatch

Compute shaders are launched via a dispatch command, which defines the number of workgroups to execute across the GPU. Calculating the right number of workgroups is part of optimizing the workload distribution: the problem size should be divided evenly among threads while taking into account the hardware's capabilities, such as the number of available cores and the size of each workgroup, in order to maximize parallel efficiency and minimize idle threads. A common pattern is shown below.
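
For example, assuming a compute pipeline whose shader declares @workgroup_size(64) and a bind group covering its resources, a dispatch might look like this (a hedged sketch):

```rust
fn dispatch(
    encoder: &mut wgpu::CommandEncoder,
    pipeline: &wgpu::ComputePipeline,
    bind_group: &wgpu::BindGroup,
    item_count: u32,
) {
    const WORKGROUP_SIZE: u32 = 64; // must match @workgroup_size in the shader
    // Round up so a final, partially-filled workgroup still gets dispatched.
    let workgroup_count = (item_count + WORKGROUP_SIZE - 1) / WORKGROUP_SIZE;

    let mut pass = encoder.begin_compute_pass(&wgpu::ComputePassDescriptor::default());
    pass.set_pipeline(pipeline);
    pass.set_bind_group(0, bind_group, &[]);
    pass.dispatch_workgroups(workgroup_count, 1, 1);
}
```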

Render Pipelines

TODO: Explain vertex and fragment shaders and the graphics pipeline.

Render pipelines are more specialized than compute pipelines, but also have more mechanisms for input and output than just bindings.

Render pipelines have the following additional inputs:

  • Vertex and Index Buffers
  • Color and Depth attachments

Render pipelines have the following outputs:

  • Color and Depth attachments

Render Attachments

Render attachments are textures that store the results of rendering; they are what get rendered to. Render pipelines expect a specific number of color attachments, and can also configure a depth attachment. Failing to provide the attachments when invoking a pipeline will result in an error.

Attachments can be sampled from as well as written to. TODO: Double-check that this is correct.

Vertex and Index Buffers

The vertex buffer is a special buffer that is used as input for the pipeline.

In order for the pipeline to know what to draw, it must first define the geometry of what is being drawn. This is done by supplying a series of vertices to the vertex shader; these represent the triangles used by the rasterizer and are ultimately passed on to the fragment shader. Vertex data is typically described in "local" space, which is the space relative to the origin of the mesh that the vertex is part of. For example, the center of a cube might be represented as the point (0, 0, 0) in local space. The goal of the vertex shader is to transform the vertex data out of local space, first into world space and then, using the perspective of the camera, into screen or clip space, which represents the normalized 2D coordinates of the screen.

The vertex data supplied to the vertex shader is built on the CPU using a certain topology that describes how the data is laid out in the vertex buffer. Typically, this is a list of triangles. Additionally, users may supply an optional index buffer, which tells the GPU in which order the vertices should be drawn. This must match the "winding" defined in the pipeline descriptor, i.e. clockwise or counter-clockwise, which determines which direction the "front" of the triangle faces.

Vertex data may also be instanced, where the same mesh is drawn multiple times in a single draw call. In this case an additional vertex buffer may be bound (at a different vertex buffer slot) describing the per-instance data (as opposed to per-vertex data, like positions or colors).
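
As a rough example, a per-vertex layout carrying a position and a uv coordinate (matching hypothetical @location(0) and @location(1) inputs in the vertex shader) might be described like this:

```rust
fn vertex_layout() -> wgpu::VertexBufferLayout<'static> {
    const ATTRIBUTES: [wgpu::VertexAttribute; 2] = [
        // position: three f32s at @location(0)
        wgpu::VertexAttribute {
            format: wgpu::VertexFormat::Float32x3,
            offset: 0,
            shader_location: 0,
        },
        // uv: two f32s at @location(1), starting after the position (12 bytes)
        wgpu::VertexAttribute {
            format: wgpu::VertexFormat::Float32x2,
            offset: 12,
            shader_location: 1,
        },
    ];
    wgpu::VertexBufferLayout {
        array_stride: 20, // bytes per vertex: 3 + 2 floats
        step_mode: wgpu::VertexStepMode::Vertex, // use Instance for per-instance buffers
        attributes: &ATTRIBUTES,
    }
}
```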

Commands and Passes

Communication between the CPU and the GPU is asynchronous; the CPU dispatches commands to the GPU and must wait for the GPU to finish its workload and transmit a response. We'll refer to these as GPU commands to differentiate them from Bevy ECS commands.

As previously said, a pipeline is like a highly-specialized program that runs on the GPU. To "run" a pipeline, multiple GPU commands must be issued to set up the various bindings, select the active pipeline, and finally invoke the pipeline itself. These GPU commands are grouped together in a "pass".

Passes group and organize GPU commands. A wgpu::RenderPass allows issuing GPU commands relating to render pipelines, and a similar structure wgpu::ComputePass exists for compute pipelines.

Render attachments are set during render pass creation and fixed for the duration; you can think of a render pass as a resource-scope for a certain set of attachments. All pipelines executed within a pass must be compatible with the provided set of attachments. Between passes, writes are flushed and attachments can be swapped around or reconfigured.

Commands are queued together in a wgpu::CommandEncoder when they are added to a pass. Calling CommandEncoder::finish() encodes the commands into a wgpu::CommandBuffer, which is ready to be sent off to the GPU for execution. Work begins when command buffers are submitted to the GPU command queue.
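
Putting the pieces together, a single frame of work might look roughly like the following (field types such as StoreOp and the timestamp/occlusion fields differ across wgpu versions):

```rust
fn draw_frame(
    device: &wgpu::Device,
    queue: &wgpu::Queue,
    target: &wgpu::TextureView,
    pipeline: &wgpu::RenderPipeline,
    bind_group: &wgpu::BindGroup,
) {
    let mut encoder = device
        .create_command_encoder(&wgpu::CommandEncoderDescriptor { label: Some("frame") });
    {
        // The pass borrows the encoder; its attachments are fixed for its lifetime.
        let mut pass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor {
            label: Some("example pass"),
            color_attachments: &[Some(wgpu::RenderPassColorAttachment {
                view: target,
                resolve_target: None,
                ops: wgpu::Operations {
                    load: wgpu::LoadOp::Clear(wgpu::Color::BLACK),
                    store: wgpu::StoreOp::Store,
                },
            })],
            depth_stencil_attachment: None,
            timestamp_writes: None,
            occlusion_query_set: None,
        });
        pass.set_pipeline(pipeline);
        pass.set_bind_group(0, bind_group, &[]);
        pass.draw(0..3, 0..1); // three vertices, one instance
    } // dropping the pass ends it
    // Encoding produces a command buffer; the GPU starts work once it is submitted.
    queue.submit([encoder.finish()]);
}
```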

Limitations

As we have seen, wgpu sits atop many semi-interchangeable layers. Unfortunately, it's often necessary to design features to accommodate the lowest-common-denominator of supported platforms. Here are some limitations to keep in mind:

  • Only 4 bind-groups are allowed in a pipeline layout.
  • Only 4 storage buffers with dynamic offsets are allowed in a pipeline layout.
  • Only 4 storage textures are allowed in a pipeline layout.
  • Only 8 color attachments can be added to a pass.
  • Only 16 variables can be passed between shader stages.

The full list is available here.

bevy_ecs Preliminaries

This is a quick overview of some important aspects of bevy's Entity Component System. Those experienced with the ECS can skip this section. Everyone else should refer to the book and the docs for more info.

Systems and System Sets

A System is a stateful instance of a function that can access data stored in a world. A SystemSet is a logical group of systems (which can include other system sets).

By default systems have neither strict execution order nor any conditions for execution. For any system or system set, you can define:

  • Its execution order relative to other systems or sets.
  • Any conditions that must be true for it to run.
  • Which set(s) it belongs to.

These properties are all additive, and properties can be added to existing sets. Adding another does not replace an existing one, and they cannot be removed. If incompatible properties are added, the schedule will panic at startup.
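
A hedged sketch of these knobs (the set, resource, and system names are hypothetical):

```rust
use bevy::prelude::*;

#[derive(SystemSet, Debug, Clone, PartialEq, Eq, Hash)]
struct Simulation;

#[derive(Resource, Default)]
struct Paused(bool);

fn apply_velocity() {}
fn detect_collisions() {}
fn not_paused(paused: Res<Paused>) -> bool {
    !paused.0
}

fn main() {
    App::new()
        .init_resource::<Paused>()
        // The whole set only runs while the game is not paused.
        .configure_sets(Update, Simulation.run_if(not_paused))
        .add_systems(
            Update,
            // Both systems belong to Simulation; collisions run after movement.
            (apply_velocity, detect_collisions.after(apply_velocity)).in_set(Simulation),
        )
        .run();
}
```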

Schedules

A Schedule is a collection of systems that are ordered, dispatched and executed together as a batch. Every system belongs to exactly one schedule (If the same function is added to two different schedules, it is considered a different system).

Systems in different schedules are independent; each schedule has its own collection of SystemSets and its own system execution ordering.

Apps

An App contains a World, several schedules, and a runner function which manages an event-loop.

Every app has one main schedule, which the runner (generally) executes once per frame. Other schedules may be executed by calling World::run_schedule.

Sub-Apps

A SubApp is a secondary world with its own set of schedules, which is contained within the main app. Like the main app, a sub-app has a main schedule which is (usually) executed by the main app's runner once per frame. Before the runner executes the main schedule, it calls SubApp::extract to synchronize the sub-app with the main world. No schedules can execute on the main world during extraction.

The bevy_render Crate

The bevy_render crate is a modular rendering toolkit. It's mostly concerned with the nuts-and-bolts of scheduling and executing the rendering workload. The tools it provides are largely independent, but they are intended to be used together. We will start by investigating how the rendering code integrates with wgpu through Render Resources, then take a look at the ECS side of things with the render sub-app, before moving on to high level scheduling abstractions like the Render Graph and Render Phases.

Render Resources

The bevy_render crate wraps many primitive resources from wgpu in higher level abstractions that are accessible through the RenderContext. There are several things you can do with a RenderContext, including setting up a render pass and queuing a task to generate a command buffer in parallel.

Bevy provides its own convenience wrappers around wgpu::Device and wgpu::CommandEncoder, called [Device] and [CommandEncoder] respectively. These are both accessible through the RenderContext.

Bind Groups

There are two ways to create bind groups in Bevy, at different layers of abstraction. The most direct path is to create a layout with [Device::create_bind_group_layout]; instead of specifying a slice of BindGroupLayoutEntry items manually, it is usually more ergonomic to use the [BindGroupLayoutEntries] helper. Bind group instances can then be created from a layout using [Device::create_bind_group]. Here again, it is usually preferable to use [BindGroupEntries] over manually constructing a slice of [BindGroupEntry].

While it is possible to create bind groups directly in this manner, most users will want to use an [AsBindGroup] trait derive instead. Types that implement [AsBindGroup] provide two important functions: [AsBindGroup::bind_group_layout] and [AsBindGroup::as_bind_group]. The former is a static function which creates a bind group layout (which is the same for all instances of the type) and the latter is a method which returns a bind group instance with the appropriate layout. We will cover when these are called later when we get into render workload scheduling.
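
A hedged sketch of the derive (the struct and field names are illustrative; the attribute indices correspond to @binding(..) declarations in the shader):

```rust
use bevy::prelude::*;
use bevy::render::render_resource::AsBindGroup;

#[derive(AsBindGroup, Clone)]
struct GlowSettings {
    // A uniform buffer at @binding(0).
    #[uniform(0)]
    color: Color,
    // A texture view at @binding(1) and its sampler at @binding(2).
    #[texture(1)]
    #[sampler(2)]
    mask: Handle<Image>,
}
```

The derive generates both the static bind_group_layout function and the as_bind_group method for this type.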

Tracked Render Passes

In bevy, render passes can be created by calling RenderContext::begin_tracked_render_pass. A TrackedRenderPass is a wrapper around a wgpu::RenderPass with some handy automatic resource-management built-in. It's a "Tracked" pass because it keeps track of the current pipeline id, configured bind-groups, and bound vertex and index buffers. This lets us treat several important render commands as idempotent. Binding the same pipeline multiple times, for example, will only result in a single BindPipeline command being sent to the GPU driver.

Note: This approach avoids redundant GPU instructions, which can be very costly, but at the expense of additional state-management overhead on every frame. Some rendering experts are experimenting with alternatives, and it's likely this will change in the near future.

Parallel Command Encoding

As covered in the wgpu preliminaries, rendering involves queuing render commands onto a CommandEncoder so that they can be submitted to the GPU in a single batch. Unfortunately, queuing a large number of commands can take quite a long time. Command buffer generation tasks alleviate this issue by allowing us to perform this costly work in parallel.

A "Command-Buffer Generation Task" is a function registered with RenderContext::add_command_buffer_generation_task which takes a read-only reference to the underlying wgpu::Device and returns a wgpu::CommandBuffer. BeforeRenderContext finishes, it runs all registered tasks in parallel and then joins the resulting commands back together in the order they were added.

The RenderContext is not available within tasks. A new wgpu::CommandEncoder and wgpu::RenderPass must be created from the provided wgpu::Device. Bevy's wrapper types must be added to wgpu resources manually. Tracking, for instance, can be added to an existing wgpu::RenderPass by passing it to TrackedRenderPass::new().

The Render Sub-App

We are now going to move away from the wgpu side of things and look at how rendering intersects with scheduling and the ECS. Bevy optionally supports pipelined-rendering, which is a technique where the current frame is rendered at the same time as the next game update runs.

Pipelined-rendering is achieved by moving rendering work into a sub-app which (mostly) executes on its own thread. On single-threaded targets (like wasm), pipelined-rendering is disabled and the render sub-app instead runs on the main thread between executions of the main app.

The render sub-app is a very specialized use of the ECS. It relies upon an entity-sharing scheme with the main world that involves clearing all the entities after every frame. Because entities don't stick around for more than a single frame, most of the interesting stuff in the render world happens within resources. One of these resources, the RenderGraph, is what ultimately drives the rendering work done each frame.

The render sub-app has two schedules, called ExtractSchedule and Render (the inconsistent naming avoids conflict with the Extract system parameter, which we will talk about later). The extract schedule is executed during the extract function (which we discussed in the ECS preliminaries), and allows access to both the main world and the render world. The render schedule is the main schedule of the render sub-app, and runs after ExtractSchedule. All entities are cleared at the very end of the Render schedule.

The Entity Sharing Scheme

As of 0.13, the docs on Entity have this to say about cross-world entity use:

[An entity is] only valid on the World it’s sourced from. Attempting to use an Entity to fetch entity components or metadata from a different world will either fail or return unexpected results.

The render world is an exception; it implements an explicit entity-sharing scheme with the main world to enable cross-world use. It works like this:

  • World::flush_and_reserve_invalid_assuming_no_entities is called on the render world before each frame, to reserve all entities that are (or could be) used in the main world.
  • The reserved entities are spawned into the render world as "invalid", and won't really be used until components are assigned to them.
  • World::get_or_spawn lets render-world systems assign render-world components to entity ids initially derived from the main world.
  • During the frame, World::spawn can be used to spawn new render-world exclusive entities, which are guaranteed not to conflict with main world entities.
  • All entities are cleared from the render world at the end of each frame so that the reserve function can be safely called again at the top of the next frame.

From now on, a reference to a "reserved entity" in the render world will always mean an entity that was reserved before extraction, with a corresponding entity in the main world. "Unreserved entity" may sometimes be used to refer to an entity that is spawned into the render world directly, without a corresponding twin in the main world.

The upside is that entities from the main world can be used in the render world without fear of collisions. The downside is that the entities need to be added back to the render world every frame.

The Extract Schedule

The ExtractSchedule is a sync-point that allows the render world to be populated with data from the main world. Render-world systems in this schedule access the main world through the special MainWorld resource (which can be mutable), or the Extract parameter (which is read-only). Read-only access through the Extract parameter is preferred.

While the Extract schedule runs, no other schedule can execute on either the main app or the render sub-app. It effectively locks both worlds, and is the only part of bevy that can bottleneck both game logic and rendering. Bevy takes great pains to keep the extract schedule as slim and efficient as possible, and users should likewise keep their systems in the extract schedule small.

The RenderPlugin only adds a single system to the extract schedule, PipelineCache::extract_shaders, which we will talk more about when we introduce the Shader asset. Most of the systems in the extract schedule read some main-world component Foo and add a matching ExtractedFoo render component to the corresponding reserved entity in the render world.
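
A hedged sketch of that pattern (Glow and ExtractedGlow are hypothetical components used for illustration):

```rust
use bevy::prelude::*;
use bevy::render::Extract;

#[derive(Component, Clone)]
struct Glow {
    intensity: f32,
}

#[derive(Component)]
struct ExtractedGlow {
    intensity: f32,
}

// Runs in ExtractSchedule: reads the main world through `Extract` and writes
// into the render world through `Commands`.
fn extract_glows(mut commands: Commands, glows: Extract<Query<(Entity, &Glow)>>) {
    for (entity, glow) in glows.iter() {
        // `get_or_spawn` targets the entity id reserved from the main world.
        commands
            .get_or_spawn(entity)
            .insert(ExtractedGlow { intensity: glow.intensity });
    }
}
```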

The Render Schedule

The Render schedule runs directly after extraction. It comes with 16 built-in system sets, grouped as variants of the RenderSet enum.

The ExtractCommands Set

The first set to run in the render schedule is ExtractCommands. This set usually contains a single system that applies ECS commands dispatched in the extract schedule. Applying ECS commands in Render rather than Extract means the main world spends less time locked in extraction.

The ManageViews Set

The ManageViews set runs after ExtractCommands, and contains four systems:

  • sort_cameras Sorts extracted cameras by Camera::order.
  • prepare_windows Gets each window ready to be drawn to.
  • [prepare_view_attachments] After windows have been prepared, adds an OutputColorAttachment to each camera, containing the output texture view and format. These attachments can optionally be overridden by user code before prepare_view_targets runs, for example to insert a different texture for a single frame when taking a screenshot.
  • prepare_view_targets After the correct attachment has been selected, adds a ViewTarget component to each camera, which contains both the output texture and a reference to the "intermediate" texture that will be rendered to.

It will become more clear why this is called ManageViews when we talk about views later.

The PrepareAssets Set

The PrepareAssets set runs between ExtractCommands and Prepare, in parallel with ManageViews, Queue and PhaseSort. The purpose of the PrepareAssets set is to run the instances of a generic system called prepare_assets that are added by the various RenderAssetPlugins. After this set completes, the various RenderAssets<A> resources are populated with data ready to be sent to the GPU.

See the section on the RenderAssetPlugin for more information.

The Queue Set

The Queue set runs after ManageViews. The bevy_render crate intentionally leaves populating the Queue set to higher levels of the stack. The systems in this set are expected to queue items onto render phases, which we will talk about later.

The Queue set also contains a special subset QueueMeshes which executes after prepare_assets<Mesh> completes. It is also empty by default.

The PhaseSort Set

The PhaseSort set runs after Queue and is also left empty by bevy_render. We will talk more about this set alongside render phases and the core pipeline.

The Prepare Set

The Prepare set runs after PhaseSort and PrepareAssets complete. It is intended for use by systems which translate entities and components into GPU-friendly formats and create bind groups.

TODO: This may be wrong, creating bind groups probably happens in PrepareBindGroups. Fix or clarify

The PrepareResources Subset

TODO: Explain prepare the resources subset of Prepare.

The PrepareBindGroups Subset

TODO: Explain the bind groups subset of Prepare.

The Render Set

The Render set (not to be confused with the Render schedule to which it belongs) runs after Prepare, and is when the actual draw calls get issued to the GPU.

The bevy_render crate adds two systems to the Render set:

The render_system mostly triggers an execution of the render graph, bevy's render workload scheduling system. We will talk more about the render graph in its own section.

The Cleanup Set

The Cleanup set is the last set in the schedule, running after Render.

The bevy_render crate adds two systems to the Cleanup set:

  • [World::clear_entities], which drops all the entities from the render world.
  • [update_texture_cache_system], which unloads textures that haven't been used in the last three frames.

Views

The render world has one very special component, with no direct analog in the main world: ExtractedView. Any entity that ends up with this component is considered a 'view'. At the time of writing, only two kinds of main-world entities get an ExtractedView attached to them in the render world:

  • cameras (added by bevy_render::camera::extract_cameras),
  • and shadow-casting lights (added by a system in bevy_pbr, as we will see later)

A 'View' represents a point in space from which render commands can be issued. Each ExtractedView contains a projection, a transform, a viewport rectangle (width, height, origin), and a few other bits and bobs.

Views are important because of how they hook into the render graph: Each view has the potential to execute a specialized render workload.

The Render Graph

The RenderGraph is bevy's specialized task scheduler for render workloads. It consists of a set of Nodes connected by Edges into a directed acyclic computation graph. Each node represents a self-contained unit of work (a task, effectively) that needs to be performed to render a frame. The edges specify dependencies between nodes.

Every node has a Node::run function, which takes as arguments the node itself, a RenderGraphContext, a RenderContext, and a read-only reference to the world. When the graph is executed, the nodes are ordered according to the dependency graph and executed in sequence. The render graph is executed once per frame (in the Render set) by the render_system.

What goes in a Node::run function? Whatever you want! Nodes can run arbitrary code. They only get immutable access to the render world, but can generate GPU work through the provided RenderContext. In practice, most nodes set up a single render pass and queue up a bunch of draw commands (more or less).
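
A minimal sketch of a custom node (the exact lifetimes on run vary slightly between Bevy versions; a real node would typically begin a pass through the RenderContext and record work into it):

```rust
use bevy::prelude::*;
use bevy::render::render_graph::{Node, NodeRunError, RenderGraphContext};
use bevy::render::renderer::RenderContext;

struct ExampleNode;

impl Node for ExampleNode {
    fn run<'w>(
        &self,
        _graph: &mut RenderGraphContext,
        _render_context: &mut RenderContext<'w>,
        _world: &'w World,
    ) -> Result<(), NodeRunError> {
        // Queue GPU work here, e.g. by beginning a tracked render pass.
        Ok(())
    }
}
```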

Subgraphs

The render graph can have additional named child graphs associated with it. These are called "subgraphs", even though they are not subgraphs in the traditional graph-theory sense of the term. Instead, a subgraph is an auxiliary graph of render tasks which can be executed (possibly multiple times) by nodes in the main graph.

Subgraphs do not run when the main render graph is executed. Only invoking RenderGraphContext::run_sub_graph within a top-level node's run function will cause the subgraph to run. Subgraphs are queued and executed in sequence after the main render graph finishes.

TODO: Redraft this section and merge it with CameraDriverNode and ViewNode.

CameraDriverNode and ViewNode

With knowledge of subgraphs and views under our belt, we can approach one of the most commonly misunderstood and underappreciated aspects of the render graph: the CameraDriverNode and ViewNodes.

These are both implementations of render_graph::Node. The CameraDriverNode is automatically added to the main render graph by the CameraPlugin. When it runs, it looks up every extracted camera entity and runs the subgraph specified in ExtractedCamera::render_graph. The subgraphs are queued to run after the main graph, and execute in sequence. The CameraDriverNode also passes along the camera entity into the subgraph so it can be accessed through RenderGraphContext::view_entity.

ViewNodes are a convenient wrapper around Node which makes it easy to query for data off of the RenderGraphContext::view_entity. It basically lets you grab extracted data off the camera entity in a camera-driven subgraph. If that all seems very abstract, just keep going for now. It will make more sense when we move on to the bevy_core_pipeline crate.

Render Phases

bevy_render supplies a generic work queue called a "Render Phase" that is intended to supply high-level render instructions for render graph nodes to execute.

Phase Items

PhaseItem is a trait for types that contain renderable data. All instances of a type that implements PhaseItem are expected to be drawn using a single pipeline. bevy_render provides a convenient set of tools for grouping, sorting, batching and rendering PhaseItems, but leaves the actual implementation up to its dependencies (largely bevy_core_pipeline).

Sorted vs Binned Phases

Phases that are defined for use with the main 2d and 3d pipelines are typically defined as being sorted or binned, and have additional data stored in ViewSortedRenderPhases<T> and ViewBinnedRenderPhases<T> resources respectively.

These different strategies allow for different optimizations depending on how a phase is intended to be drawn. For example, for items with transparency, it's necessary to sort the items from back to front to ensure correct blending, as transparent objects need to be drawn after opaque objects for proper rendering. This is handled using the sorted phase strategy to manage the draw order based on depth.

For opaque items, it’s more efficient to use a binned strategy, which allows grouping items by certain properties (like material or mesh) to reduce state changes in the rendering pipeline. This leads to better performance since opaque objects don’t require depth sorting and can be drawn in any order.

Draw Functions and Render Commands

Each PhaseItem defines how it should be drawn using Draw functions. This is a trait with two methods: [Draw::prepare], and [Draw::draw]. The former sets up the function for drawing, and the latter draws an individual item.

Multiple draw functions can be registered to a PhaseItem type using the [DrawFunctions] resource. PhaseItem::draw_function determines which registered Draw function is applied to any given instance of a PhaseItem.

Bevy also allows users to compose draw functions from an ordered set (e.g. a tuple) of [RenderCommands]. [RenderCommands] offer a simple, modular API and fulfill the same fundamental purpose as draw functions. Render commands must also be registered, by invoking App::add_render_command on the render sub-app.
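
A hedged sketch of composing a draw function from a tuple of render commands and registering it for a phase item (the bind group indices are illustrative; the real indices depend on the pipeline layout in use):

```rust
use bevy::prelude::*;
use bevy::core_pipeline::core_3d::Transparent3d;
use bevy::pbr::{DrawMesh, SetMeshBindGroup, SetMeshViewBindGroup};
use bevy::render::render_phase::{AddRenderCommand, SetItemPipeline};
use bevy::render::RenderApp;

type DrawMyThing = (
    SetItemPipeline,         // bind the item's specialized render pipeline
    SetMeshViewBindGroup<0>, // bind per-view data (camera, lights, ...)
    SetMeshBindGroup<1>,     // bind per-mesh data (transforms, ...)
    DrawMesh,                // issue the actual draw call
);

fn register(app: &mut App) {
    app.sub_app_mut(RenderApp)
        .add_render_command::<Transparent3d, DrawMyThing>();
}
```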

Phases and The Render Schedule

Phase items are collected into a RenderPhase<I: PhaseItem> component on each View. The bevy_render crate expects items to be added to each RenderPhase during the Queue render set. The PhaseSort set is (as can probably be guessed) expected to contain systems that sort and batch render phases.

Phases and The Render Graph

After all the items are queued onto phases, the render graph runs. A specific view node is usually given the task of calling [RenderPhase::render], which calls the registered Draw function on each queued PhaseItem. This is when the "actual rendering" happens.

The RenderPlugin

The [RenderPlugin] is responsible for:

  • Creating and configuring the render sub-app (as described above),
  • Starting wgpu,
  • Adding the Shader asset type,
  • Adding a bunch of other self-contained plugins.

The RenderAssetPlugin

The RenderAssetPlugin<A> takes a type implementing RenderAsset, a trait for types that can be encoded into a GPU-friendly format. Each instance of the RenderAssetPlugin adds two systems:

  • [extract_render_asset::<A>] in the extract schedule, which extracts new and changed instances of A into the render world. If RenderAsset::asset_usage does not include RenderAssetUsages::MAIN_WORLD, this system also unloads the asset from the main world.
  • [prepare_asset::<A>] in PrepareAssets calls RenderAsset::prepare_asset to get the GPU-friendly version.

The bevy_render crate contains two implementations of RenderAsset:

  • [Image] which transforms into a [GpuImage]
  • [Mesh] which transforms into a [GpuMesh].

Each gets their own RenderAssetPlugin instance added by [MeshPlugin] and [ImagePlugin] (though for some reason ImagePlugin is part of the default plugins and MeshPlugin is added directly by RenderPlugin).

The Shader Asset

Shaders are loaded using Bevy's asset management system and interacted with via Handle<Shader>, which represents a particular shader source. Typically, a shader will be loaded from the assets folder, but one can also be created from a WGSL source snippet or even raw SPIR-V bytes.

Shaders are not compiled until they are processed by the PipelineCache when a pipeline is queued for creation, either through pipeline specialization (e.g. for use with a material) or manually requested via queue_render_pipeline.
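
A hedged sketch of obtaining a Handle<Shader> (the asset path and resource are illustrative):

```rust
use bevy::prelude::*;
use bevy::render::render_resource::Shader;

#[derive(Resource)]
struct MyShader(Handle<Shader>);

// A shader is loaded like any other asset; it is only compiled later, when the
// PipelineCache processes a pipeline that references this handle.
fn load_my_shader(mut commands: Commands, asset_server: Res<AssetServer>) {
    let handle: Handle<Shader> = asset_server.load("shaders/my_shader.wgsl");
    commands.insert_resource(MyShader(handle));
}
```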

naga_oil

Bevy uses a preprocessor library called naga_oil in order to compose various shader resources together that are defined in separate sources. For example, many internal Bevy shaders start with an import directive #import that describes what other shader sources are required in order to compose (i.e. concatenate into a single source) the final shader.

naga_oil also uses "shader defs" to power #ifdef preprocessing, which can be used by shader authors to turn certain features on or off during pipeline specialization.

Thus, a single Handle<Shader> may actually represent several different compiled shader instances associated with a single handle, cached on the basis of which shader defs were enabled for a particular shader source.

The WindowRenderPlugin

The [WindowRenderPlugin] adds systems that extract window information from the main world into the render world. It creates two resources:

  • [ExtractedWindows]: A mapping from entity ids to extracted window data.
  • [WindowSurfaces]: A mapping from entity ids to a wgpu::Surface (for each window).

In the extract schedule:

  • [extract_windows] populates ExtractedWindows.

In the render schedule:

  • First, [create_surfaces] creates a wgpu::Surface for any window that needs one.
  • Then, in ManageViews, prepare_windows prepares a texture for drawing to.

Note that get_current_texture has to wait for the GPU to be ready to draw, so prepare_windows may block waiting for the texture under a heavy graphics workload.

The CameraPlugin

The [CameraPlugin] adds systems that extract and sort cameras. It creates one resource:

  • [SortedCameras]: A sorted list of camera information, including the camera render-target.

In the extract schedule:

  • [extract_cameras] adds [ExtractedCamera] and ExtractedView components to reserved camera entities.

The ViewPlugin

TODO: Explain the view plugin (or remove this section because it is partially covered in the render graph section).

ImagePlugin and MeshPlugin

TODO: Explain how the renderer sets up image and mesh extraction.

Summary of bevy_render

The best way to think of bevy_render is as a toolbox, but thus far we've mostly covered how these tools work. Now, as we move into the higher levels of the rendering stack, we can start to think about how to use them:

  • The render sub-app: For rendering in parallel with game logic.
  • Render resources: For automatically managing low-level rendering resources.
  • Views: For rendering from multiple different "camera-like" perspectives.
  • Phases: For rendering many objects (e.g. meshes and lights).
  • The Render Graph: For scheduling render tasks for arbitrary numbers of views and queued phase items.

The bevy_core_pipeline Crate

Now we can move on to bevy_core_pipeline: An opinionated rendering solution built using the bevy_render toolbox. It doesn't touch the actual shading, but it does set up a common RenderGraph and a set of standard RenderPhases.

The main thing bevy_core_pipeline does is add two new subgraphs to the render graph: Core2d and Core3d, each introduced by their respective plugins.

Forward vs Deferred Rendering

TODO: Explain how deferred rendering is implemented. See the original pr.

The Core 2d Subgraph

The Core2d subgraph is executed once for every 2d camera. Nodes are identified by variants of the Node2d enum. We will look at the nodes of the graph label-by-label, following the dependency graph:

MainPass
Tonemapping
EndMainPassPostProcessing
Upscaling

The MainPass is implemented by the view node [MainPass2dNode], which mostly just renders the Transparent2d render phase. No draw functions are registered for this phase, which allows higher levels of the rendering architecture to hook in their own rendering code.

Why is it called Transparent2d? Because it's drawn as if it might be transparent. TODO: Elaborate on this.

Both Tonemapping and Upscaling use the same logic for the 2d and 3d graphs, and we will cover them in their own section.

The Core 3d Subgraph

The Core3d subgraph is executed once for every 3d camera. Nodes are identified by variants of the Node3d enum. Again, we will look at the graph label by label following the dependency graph:

Prepass partition:
DeferredPrepass
Prepass
CopyDeferredLightingId
EndPrepass

Main pass partition:
StartMainPass
MainOpaquePass
MainTransmissivePass
MainTransparentPass
EndMainPass

Tonemapping
EndMainPassPostProcessing
Upscaling

The Prepass Section

Prepass is implemented by view node [PrepassNode].

TODO: Explain what the prepass section is for.

The Mainpass Section

MainOpaquePass is implemented in view node [MainOpaquePass3dNode]. It renders the [Opaque3d] and [AlphaMask3d] phases along with the skybox in a single render pass.

MainTransmissivePass is implemented by view node [MainTransmissivePass3dNode]. It draws the [Transmissive3d] render phase.

MainTransparentPass is implemented by view node [MainTransparentPass3dNode]. It draws the [Transparent3d] phase.

Shared Logic

Tonemapping

Upscaling

The bevy_pbr Crate

TODO: Give an overview of the bevy_pbr crate.

License

This document is released under the MIT License. Copyright 2024 Miles Silberling-Cook.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.