Bevy's Rendering Crates

authors: tychedelia, nth

This is a work in progress, and may contain omissions or mistakes. If you notice an issue or have a question, please don't hesitate to:

  • Bring it up in #documentation-dev on the bevy discord
  • Leave feedback in the form of a comment or suggested edit
  • Or just fix it yourself

This is a community document, which anyone can edit!

You are welcome to:

  • Add new sections
  • Complete tasks listed as TODO
  • Rewrite anything for clarity or correctness
  • Fix spelling mistakes
  • Do anything else at all

Submitted as bevy_website PR #1080.

Introduction

This is a technical summary of Bevy's rendering architecture as of the 0.13 release.

Bevy's rendering code is largely concentrated in three loosely-coupled crates: bevy_render, bevy_core_pipeline, and bevy_pbr. The foundation is bevy_render, which manages how low-level rendering tasks are scheduled and executed. Above this sits bevy_core_pipeline, which defines the individual steps of rendering each frame. Above both is bevy_pbr, which adds a material abstraction and uses it to provide physically-based shaders.

These three crates sit roughly in the "upper-middle" of Bevy's default stack and do not fully encompass all of Bevy's rendering capabilities. Although we will not cover them here, both bevy_ui and bevy_sprite also hook into and interact with rendering.

We will begin with the lowest levels of the stack and progress upwards towards high-level concepts like materials. But before diving in, let's take a look at the two big dependencies underpinning the rest of the rendering stack: wgpu and bevy_ecs.

wgpu Preliminaries

This is a quick overview of some important aspects of wgpu. Those experienced with wgpu can skip this section. Those unfamiliar with wgpu should check out the wgpu book and the WebGPU spec.

wgpu is a Rust crate providing a cross-platform implementation of WebGPU, a modern graphics API. It strikes a nice balance between ergonomics and power, and can run pretty much anywhere.

Layers of Abstraction

wgpu is a low level graphics library, but because it is designed to work across a wide range of platforms there are many layers of abstraction between it and the graphics hardware. Going from lowest to highest level:

  1. First, there is the GPU hardware itself.
  2. Above this sit platform-specific GPU drivers (supplied by Nvidia, AMD, Apple, Windows or the Linux Kernel),
  3. Then there is the native GPU API (Metal, Vulkan, GLES, etc.) supplied by your platform,
  4. Then the platform-specific adapters (wgpu_hal::vulkan, wgpu_hal::gles, wgpu_native, the browser's own WebGPU adapter, etc.), which are low-level unsafe Rust bindings to the native API,
  5. And finally there is the safe WebGPU api provided by wgpu, which abstracts over all the previous layers.

It's good to be vaguely aware of this stack. wgpu does its best to stay consistent, but some features are not fully supported on all platforms. It's also common for the "Hardware Abstraction Layer" (HAL) implementations on layer 4 to support features beyond the base WebGPU spec.

For example, some features that are platform dependent are:

  • Textures, particularly those used for advanced use cases like compression.
  • The availability of storage buffers.
  • Compute shader capabilities.

Additionally, because WebGPU is both a slow-moving specification and a kind of "lowest common denominator" API, there may be some advanced rendering features supported by your GPU, e.g. mesh shaders, that are not yet supported in wgpu.

The software sections of this stack (layers 2-5) are represented in wgpu by a wgpu::Adapter and a wgpu::Device. The adapter provides access to the platform-specific WebGPU implementation (layers 2 and 3 of the stack). The device is a logical representation of layers 4 and 5 obtained from the adapter, and is ultimately what allows us to submit work to the GPU. Both of these abstractions come directly from the WebGPU spec.
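
To make this concrete, here is a minimal sketch of acquiring these handles directly through wgpu (outside of Bevy). Exact descriptor fields and method signatures vary a little between wgpu versions, so treat it as illustrative:

```rust
// Acquire the adapter (the platform's WebGPU implementation) and the device
// (the logical handle used to create resources and submit work).
async fn init_wgpu() -> (wgpu::Adapter, wgpu::Device, wgpu::Queue) {
    let instance = wgpu::Instance::default();
    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions::default())
        .await
        .expect("no suitable GPU adapter found");
    let (device, queue) = adapter
        .request_device(&wgpu::DeviceDescriptor::default(), None)
        .await
        .expect("failed to create a device");
    (adapter, device, queue)
}
```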

Buffers

Allocating memory is the most basic of all operations. In wgpu, GPU memory management is done through buffers.

A wgpu::Buffer represents a contiguous block of memory allocated on the GPU. Buffers can be created by calls to wgpu::Device::create_buffer (which only allocates the buffer) or wgpu::util::DeviceExt::create_buffer_init (which both allocates and initializes the buffer).

You can reference a slice of a buffer using wgpu::Buffer::slice. This returns a wgpu::BufferSlice, which can be "mapped" to main program memory for reading and writing. Writes are flushed by calling wgpu::Buffer::unmap, and the buffer can be deallocated with wgpu::Buffer::destroy. Writes can also be queued as part of commands (which we will talk about later) using wgpu::Queue::write_buffer. Basically, write_buffer delays the memory write until the next set of instructions is sent to the GPU.

When you create a buffer, you generally have to declare the intended use with wgpu::BufferUsages. For example, buffers must have BufferUsages::MAP_READ to be mapped for reading.
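
As a rough illustration (assuming a device and queue obtained as in the previous sketch), creating and updating a small buffer looks something like this:

```rust
use wgpu::util::DeviceExt; // provides create_buffer_init

fn make_buffer(device: &wgpu::Device, queue: &wgpu::Queue) -> wgpu::Buffer {
    // Allocate and initialize in one call, declaring the intended usage up front.
    let buffer = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
        label: Some("example uniform buffer"),
        contents: &[0u8; 16],
        usage: wgpu::BufferUsages::UNIFORM | wgpu::BufferUsages::COPY_DST,
    });
    // write_buffer does not write immediately: the copy is queued and performed
    // when the next batch of commands is submitted to the GPU.
    queue.write_buffer(&buffer, 0, &[1u8; 16]);
    buffer
}
```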

Textures, Views, and Samplers

A wgpu::Texture represents an image stored in GPU memory. Textures can be created by calling wgpu::Device::create_texture, which requires information about the format, dimensions, and so on. The main way to write to a texture is [wgpu::Queue::write_texture], which is similar to [wgpu::Queue::write_buffer] in that the write is delayed until commands are next sent to the GPU. It's also possible to copy a buffer to a texture using [wgpu::CommandEncoder::copy_buffer_to_texture]. Like buffers, textures can be deallocated using [wgpu::Texture::destroy].

Textures are usually accessed in shaders through views and samplers.

A [wgpu::TextureView] is a "view" on a specific texture. The view describes how the raw texture data should be interpreted, and the subset of the texture to access. Views are created using [wgpu::Texture::create_view]. They will be very important when we get into bind groups later on.

A [wgpu::Sampler] is a method of sampling color values from a texture at a given texture coordinate. They are created using [wgpu::Device::create_sampler], and will also be important in the context of bind groups, which we are about to introduce.
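
A hedged sketch tying the three together (descriptor fields differ slightly between wgpu versions):

```rust
fn make_texture(device: &wgpu::Device) -> (wgpu::TextureView, wgpu::Sampler) {
    let texture = device.create_texture(&wgpu::TextureDescriptor {
        label: Some("example texture"),
        size: wgpu::Extent3d { width: 256, height: 256, depth_or_array_layers: 1 },
        mip_level_count: 1,
        sample_count: 1,
        dimension: wgpu::TextureDimension::D2,
        format: wgpu::TextureFormat::Rgba8UnormSrgb,
        // Will be bound in shaders and written to via queue/encoder copies.
        usage: wgpu::TextureUsages::TEXTURE_BINDING | wgpu::TextureUsages::COPY_DST,
        view_formats: &[],
    });
    // The view decides how the raw texture data is interpreted by shaders.
    let view = texture.create_view(&wgpu::TextureViewDescriptor::default());
    // The sampler decides how the texture is filtered when sampled.
    let sampler = device.create_sampler(&wgpu::SamplerDescriptor::default());
    (view, sampler)
}
```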

Pipelines and Shaders

Before introducing bind groups, we have to introduce the idea of a Pipeline.

A pipeline is like a highly-specialized program that runs on the GPU. Currently wgpu supports two types of pipeline: [wgpu::RenderPipeline] for rasterizing geometry, and [wgpu::ComputePipeline] for general computation.

Pipelines are composed of a sequence of stages, which can be either fixed or programmable. Fixed stages are inflexible hardware-accelerated operations, with no user control. Programmable stages, on the other hand, can be controlled by user-supplied shader modules.

Shaders

Shaders are small, highly-specialized programs that run on the GPU within the programmable stages of pipelines. They are written in a language called WGSL (the WebGPU Shading Language).

Unlike code written to run on the CPU, shader programs are highly parallel by default, meaning your code will be executed across many threads at the same time, for example once per pixel in a fragment shader. This places some limitations on what shader code can look like: shaders must avoid complex branching or dependencies between threads to ensure efficient parallel execution. Operations should be designed to run independently for each thread, and control flow should remain uniform where possible to prevent performance bottlenecks caused by thread divergence.

Shader programs also tend to be written to target a specific stage of the programmable graphics pipeline. We describe the pipeline as "programmable" because it supports running shader programs at certain steps, as opposed to fixed-function APIs that are not programmable. For example, in a traditional graphics pipeline, a vertex shader processes vertex data to transform it into screen space, and the result is then passed to a fixed-function rasterizer which cannot be explicitly programmed. Similarly, a fragment shader calculates the color of each pixel, which is then used by a fixed-function stage to write the pixel data to the framebuffer. The stages of this pipeline may also be backed by specialized hardware in your GPU for efficiency.

In recent years, it has also become possible both to program more stages of the graphics pipeline, as well as to write "general purpose" shaders known as compute shaders, which are not tied to the traditional graphics pipeline. Compute shaders allow developers to perform a wide range of parallel computations, such as physics simulations, image processing, or machine learning, using the GPU’s immense computational power for tasks beyond rendering. Combining compute shaders with more traditional techniques allows building powerful modern graphics features.

Layouts and Bind Groups

Bindings allow buffers, texture views, and samplers (as well as arrays of the same) to be accessed in pipelines just like global variables. Each binding occupies a unique (but not necessarily sequential) index from 0 to 1000 in one of four "bind group" slots.

On the GPU side, the expected type for each binding must be hard-coded in the shader. On the CPU side, the layout of bindings for each slot is specified by a "bind group layout" ([wgpu::BindGroupLayout]). Up to four of these layouts can be specified when creating a pipeline, one for each bind group slot.

Bind groups are created on the CPU to assign specific buffers and textures to each binding in a layout.

Once created, bind groups are bound to the four pipeline bind group slots using GPU commands (which we will talk more about later). When a pipeline is invoked (also via GPU commands), the group bound to each slot must match the layout of the slot. Bind groups work as a stack: when a group is rebound, all bind groups with a higher slot index must be rebound as well.
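
Here is a hedged sketch of the CPU side: a layout describing a single uniform-buffer binding at index 0, and a bind group assigning a concrete buffer to it. In the shader this would correspond to a declaration like @group(0) @binding(0) var<uniform> ...:

```rust
fn make_bind_group(device: &wgpu::Device, buffer: &wgpu::Buffer) -> wgpu::BindGroup {
    let layout = device.create_bind_group_layout(&wgpu::BindGroupLayoutDescriptor {
        label: Some("example layout"),
        entries: &[wgpu::BindGroupLayoutEntry {
            binding: 0,
            visibility: wgpu::ShaderStages::VERTEX_FRAGMENT,
            ty: wgpu::BindingType::Buffer {
                ty: wgpu::BufferBindingType::Uniform,
                has_dynamic_offset: false,
                min_binding_size: None,
            },
            count: None,
        }],
    });
    // The bind group must match the layout entry-for-entry.
    device.create_bind_group(&wgpu::BindGroupDescriptor {
        label: Some("example bind group"),
        layout: &layout,
        entries: &[wgpu::BindGroupEntry {
            binding: 0,
            resource: buffer.as_entire_binding(),
        }],
    })
}
```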

Only certain types of bindings can be written to by shaders:

  • Storage buffers
  • Storage textures

Compute Pipelines

A compute shader is a GPU program designed for general-purpose parallel computation. Compute shaders can bind resources like buffers and textures.

Workgroups

A compute shader operates on data in parallel, with threads organized into workgroups: small, cohesive groups of threads that share memory and are scheduled together by the GPU. Each workgroup contains multiple threads (also called invocations), and each thread has a unique local ID within the group.

Dispatch

Compute shaders are launched via a dispatch command, which defines the number of workgroups to execute across the GPU. Calculating the right number of workgroups is part of optimizing the workload distribution: the problem size should be divided evenly among threads while taking into account the hardware's capabilities, such as the number of available cores and the size of each workgroup, in order to maximize parallel efficiency and minimize idle threads. A common pattern is shown below.
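
For example, assuming a compute pipeline whose shader declares @workgroup_size(64) and a bind group covering its resources, a dispatch might look like this (a hedged sketch):

```rust
fn dispatch(
    encoder: &mut wgpu::CommandEncoder,
    pipeline: &wgpu::ComputePipeline,
    bind_group: &wgpu::BindGroup,
    item_count: u32,
) {
    const WORKGROUP_SIZE: u32 = 64; // must match @workgroup_size in the shader
    // Round up so a final, partially-filled workgroup still gets dispatched.
    let workgroup_count = (item_count + WORKGROUP_SIZE - 1) / WORKGROUP_SIZE;

    let mut pass = encoder.begin_compute_pass(&wgpu::ComputePassDescriptor::default());
    pass.set_pipeline(pipeline);
    pass.set_bind_group(0, bind_group, &[]);
    pass.dispatch_workgroups(workgroup_count, 1, 1);
}
```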

Render Pipelines

TODO: Explain vertex and fragment shaders and the graphics pipeline.

Render pipelines are more specialized than compute pipelines, but also have more mechanisms for input and output than just bindings.

Render pipelines have the following additional inputs:

  • Vertex and Index Buffers
  • Color and Depth attachments

Render pipelines have the following outputs:

  • Color and Depth attachments

Render Attachments

Render attachments are textures that store the results of rendering; they are what get rendered to. Render pipelines expect a specific number of color attachments, and can also configure a depth attachment. Failing to provide the attachments when invoking a pipeline will result in an error.

Attachments can be sampled from as well as written to. TODO: Double-check that this is correct.

Vertex and Index Buffers

The vertex buffer is a special buffer that is used as input for the pipeline.

In order for the pipeline to know what to draw, it must first define the geometry of what is being drawn. This is done by supplying a series of vertices to the vertex shader; these represent the triangles used by the rasterizer and are ultimately passed on to the fragment shader. Vertex data is typically described in "local" space, which is the space relative to the origin of the mesh that the vertex is part of. For example, the center of a cube might be represented as the point (0, 0, 0) in local space. The goal of the vertex shader is to transform the vertex data out of local space, first into world space and then, using the perspective of the camera, into screen or clip space, which represents the normalized 2D coordinates of the screen.

The vertex data supplied to the vertex shader is built on the CPU using a certain topology that describes how the data is laid out in the vertex buffer. Typically, this is a list of triangles. Additionally, users may supply an optional index buffer, which tells the GPU in which order the vertices should be drawn. This must match the "winding" defined in the pipeline descriptor, i.e. clockwise or counter-clockwise, which determines which direction the "front" of the triangle faces.

Vertex data may also be instanced, where the same mesh is drawn multiple times in a single draw call. In this case an additional vertex buffer may be bound (at a different vertex buffer slot) describing the per-instance data (as opposed to per-vertex data, like positions or colors).
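
As a rough example, a per-vertex layout carrying a position and a uv coordinate (matching hypothetical @location(0) and @location(1) inputs in the vertex shader) might be described like this:

```rust
fn vertex_layout() -> wgpu::VertexBufferLayout<'static> {
    const ATTRIBUTES: [wgpu::VertexAttribute; 2] = [
        // position: three f32s at @location(0)
        wgpu::VertexAttribute {
            format: wgpu::VertexFormat::Float32x3,
            offset: 0,
            shader_location: 0,
        },
        // uv: two f32s at @location(1), starting after the position (12 bytes)
        wgpu::VertexAttribute {
            format: wgpu::VertexFormat::Float32x2,
            offset: 12,
            shader_location: 1,
        },
    ];
    wgpu::VertexBufferLayout {
        array_stride: 20, // bytes per vertex: 3 + 2 floats
        step_mode: wgpu::VertexStepMode::Vertex, // use Instance for per-instance buffers
        attributes: &ATTRIBUTES,
    }
}
```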

Commands and Passes

Communication between the CPU and the GPU is asynchronous; the CPU dispatches commands to the GPU and must wait for the GPU to finish its workload and transmit a response. We'll refer to these as GPU commands to differentiate them from Bevy ECS commands.

As previously said, a pipeline is like a highly-specialized program that runs on the GPU. To "run" a pipeline, multiple GPU commands must be issued to set up the various bindings, select the active pipeline, and finally invoke the pipeline itself. These GPU commands are grouped together in a "pass".

Passes group and organize GPU commands. A wgpu::RenderPass allows issuing GPU commands relating to render pipelines, and a similar structure wgpu::ComputePass exists for compute pipelines.

Render attachments are set during render pass creation and fixed for the duration; you can think of a render pass as a resource-scope for a certain set of attachments. All pipelines executed within a pass must be compatible with the provided set of attachments. Between passes, writes are flushed and attachments can be swapped around or reconfigured.

Commands are queued together in a wgpu::CommandEncoder when they are added to a pass. Calling CommandEncoder::finish() encodes the commands into a wgpu::CommandBuffer, which is ready to be sent off to the GPU for execution. Work begins when command buffers are submitted to the GPU command queue.
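
Putting the pieces together, a single frame of work might look roughly like the following (field types such as StoreOp and the timestamp/occlusion fields differ across wgpu versions):

```rust
fn draw_frame(
    device: &wgpu::Device,
    queue: &wgpu::Queue,
    target: &wgpu::TextureView,
    pipeline: &wgpu::RenderPipeline,
    bind_group: &wgpu::BindGroup,
) {
    let mut encoder = device
        .create_command_encoder(&wgpu::CommandEncoderDescriptor { label: Some("frame") });
    {
        // The pass borrows the encoder; its attachments are fixed for its lifetime.
        let mut pass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor {
            label: Some("example pass"),
            color_attachments: &[Some(wgpu::RenderPassColorAttachment {
                view: target,
                resolve_target: None,
                ops: wgpu::Operations {
                    load: wgpu::LoadOp::Clear(wgpu::Color::BLACK),
                    store: wgpu::StoreOp::Store,
                },
            })],
            depth_stencil_attachment: None,
            timestamp_writes: None,
            occlusion_query_set: None,
        });
        pass.set_pipeline(pipeline);
        pass.set_bind_group(0, bind_group, &[]);
        pass.draw(0..3, 0..1); // three vertices, one instance
    } // dropping the pass ends it
    // Encoding produces a command buffer; the GPU starts work once it is submitted.
    queue.submit([encoder.finish()]);
}
```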

Limitations

As we have seen, wgpu sits atop many semi-interchangeable layers. Unfortunately, it's often necessary to design features to accommodate the lowest-common-denominator of supported platforms. Here are some limitations to keep in mind:

  • Only 4 bind-groups are allowed in a pipeline layout.
  • Only 4 storage buffers with dynamic offsets are allowed in a pipeline layout.
  • Only 4 storage textures are allowed in a pipeline layout.
  • Only 8 color attachments can be added to a pass.
  • Only 16 variables can be passed between shader stages.

The full list is available here.

bevy_ecs Preliminaries

This is a quick overview of some important aspects of bevy's Entity Component System. Those experienced with the ECS can skip this section. Everyone else should refer to the book and the docs for more info.

Systems and System Sets

A System is a stateful instance of a function that can access data stored in a world. A SystemSet is a logical group of systems (which can include other system sets).

By default systems have neither strict execution order nor any conditions for execution. For any system or system set, you can define:

  • Its execution order relative to other systems or sets.
  • Any conditions that must be true for it to run.
  • Which set(s) it belongs to.

These properties are all additive, and properties can be added to existing sets. Adding another does not replace an existing one, and they cannot be removed. If incompatible properties are added, the schedule will panic at startup.
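
A hedged sketch of these knobs (the set, resource, and system names are hypothetical):

```rust
use bevy::prelude::*;

#[derive(SystemSet, Debug, Clone, PartialEq, Eq, Hash)]
struct Simulation;

#[derive(Resource, Default)]
struct Paused(bool);

fn apply_velocity() {}
fn detect_collisions() {}
fn not_paused(paused: Res<Paused>) -> bool {
    !paused.0
}

fn main() {
    App::new()
        .init_resource::<Paused>()
        // The whole set only runs while the game is not paused.
        .configure_sets(Update, Simulation.run_if(not_paused))
        .add_systems(
            Update,
            // Both systems belong to Simulation; collisions run after movement.
            (apply_velocity, detect_collisions.after(apply_velocity)).in_set(Simulation),
        )
        .run();
}
```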

Schedules

A Schedule is a collection of systems that are ordered, dispatched and executed together as a batch. Every system belongs to exactly one schedule (If the same function is added to two different schedules, it is considered a different system).

Systems in different schedules are independent; each schedule has its own collection of SystemSets and its own system execution ordering.

Apps

An App contains a World, several schedules, and a runner function which manages an event-loop.

Every app has one main schedule, which the runner (generally) executes once per frame. Other schedules may be executed by calling World::run_schedule.

Sub-Apps

A SubApp is a secondary world with its own set of schedules, which is contained within the main app. Like the main app, a sub-app has a main schedule which is (usually) executed by the main app's runner once per frame. Before the runner executes the main schedule, it calls SubApp::extract to synchronize the sub-app with the main world. No schedules can execute on the main world during extraction.

The bevy_render Crate

The bevy_render crate is a modular rendering toolkit. It's mostly concerned with the nuts-and-bolts of scheduling and executing the rendering workload. The tools it provides are largely independent, but they are intended to be used together. We will start by investigating how the rendering code integrates with wgpu through Render Resources, then take a look at the ECS side of things with the render sub-app, before moving on to high level scheduling abstractions like the Render Graph and Render Phases.

Render Resources

The bevy_render crate wraps many primitive resources from wgpu in higher level abstractions that are accessible through the RenderContext. There are several things you can do with a RenderContext, including setting up a render pass and queuing a task to generate a command buffer in parallel.

Bevy provides its own convenience wrappers around wgpu::Device and wgpu::CommandEncoder, called [Device] and [CommandEncoder] respectively. These are both accessible through the RenderContext.

Bind Groups

There are two ways to create bind groups in Bevy, at different layers of abstraction. The most direct path is to create a layout with [Device::create_bind_group_layout]; instead of specifying a slice of BindGroupLayoutEntry items manually, it is usually more ergonomic to use the [BindGroupLayoutEntries] helper. Bind group instances can then be created from a layout using [Device::create_bind_group]. Here again, it is usually preferable to use [BindGroupEntries] over manually constructing a slice of [BindGroupEntry].

While it is possible to create bind groups directly in this manner, most users will want to use an [AsBindGroup] trait derive instead. Types that implement [AsBindGroup] provide two important functions: [AsBindGroup::bind_group_layout] and [AsBindGroup::as_bind_group]. The former is a static function which creates a bind group layout (which is the same for all instances of the type) and the latter is a method which returns a bind group instance with the appropriate layout. We will cover when these are called later when we get into render workload scheduling.
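
A hedged sketch of the derive (the struct and field names are illustrative; the attribute indices correspond to @binding(..) declarations in the shader):

```rust
use bevy::prelude::*;
use bevy::render::render_resource::AsBindGroup;

#[derive(AsBindGroup, Clone)]
struct GlowSettings {
    // A uniform buffer at @binding(0).
    #[uniform(0)]
    color: Color,
    // A texture view at @binding(1) and its sampler at @binding(2).
    #[texture(1)]
    #[sampler(2)]
    mask: Handle<Image>,
}
```

The derive generates both the static bind_group_layout function and the as_bind_group method for this type.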

Tracked Render Passes

In bevy, render passes can be created by calling RenderContext::begin_tracked_render_pass. A TrackedRenderPass is a wrapper around a wgpu::RenderPass with some handy automatic resource-management built-in. It's a "Tracked" pass because it keeps track of the current pipeline id, configured bind-groups, and bound vertex and index buffers. This lets us treat several important render commands as idempotent. Binding the same pipeline multiple times, for example, will only result in a single BindPipeline command being sent to the GPU driver.

Note: This approach avoids redundant GPU instructions, which can be very costly, but at the expense of additional state-management overhead on every frame. Some rendering experts are experimenting with alternatives, and it's likely this will change in the near future.

Parallel Command Encoding

As covered in the wgpu preliminaries, rendering involves queuing render commands onto a CommandEncoder so that they can be submitted to the GPU in a single batch. Unfortunately, queuing a large number of commands can take quite a long time. Command buffer generation tasks alleviate this issue by allowing us to perform this costly work in parallel.

A "Command-Buffer Generation Task" is a function registered with RenderContext::add_command_buffer_generation_task which takes a read-only reference to the underlying wgpu::Device and returns a wgpu::CommandBuffer. BeforeRenderContext finishes, it runs all registered tasks in parallel and then joins the resulting commands back together in the order they were added.

The RenderContext is not available within tasks. A new wgpu::CommandEncoder and wgpu::RenderPass must be created from the provided wgpu::Device. Bevy's wrapper types must be added to wgpu resources manually. Tracking, for instance, can be added to an existing wgpu::RenderPass by passing it to TrackedRenderPass::new().

The Render Sub-App

We are now going to move away from the wgpu side of things and look at how rendering intersects with scheduling and the ECS. Bevy optionally supports pipelined-rendering, which is a technique where the current frame is rendered at the same time as the next game update runs.

Pipelined-rendering is achieved by moving rendering work into a sub-app which (mostly) executes on its own thread. On single-threaded targets (like wasm), pipelined-rendering is disabled and the render sub-app instead runs on the main thread between executions of the main app.

The render sub-app is a very specialized use of the ECS. It relies upon an entity-sharing scheme with the main world that involves clearing all the entities after every frame. Because entities don't stick around for more than a single frame, most of the interesting stuff in the render world happens within resources. One of these resources, the RenderGraph, is what ultimately drives the rendering work done each frame.

The render sub-app has two schedules, called ExtractSchedule and Render (the inconsistent naming avoids conflict with the Extract system parameter, which we will talk about later). The extract schedule is executed during the extract function (which we discussed in the ECS preliminaries), and allows access to both the main world and the render world. The render schedule is the main schedule of the render sub-app, and runs after ExtractSchedule. All entities are cleared at the very end of the Render schedule.

The Entity Sharing Scheme

As of 0.13, the docs on Entity have this to say about cross-world entity use:

[An entity is] only valid on the World it’s sourced from. Attempting to use an Entity to fetch entity components or metadata from a different world will either fail or return unexpected results.

The render world is an exception; it implements an explicit entity-sharing scheme with the main world to enable cross-world use. It works like this:

  • World::flush_and_reserve_invalid_assuming_no_entities is called on the render world before each frame, to reserve all entities that are (or could be) used in the main world.
  • The reserved entities are spawned into the render world as "invalid", and won't really be used until components are assigned to them.
  • World::get_or_spawn lets render-world systems assign render-world components to entity ids initially derived from the main world.
  • During the frame, World::spawn can be used to spawn new render-world exclusive entities, which are guaranteed not to conflict with main world entities.
  • All entities are cleared from the render world at the end of each frame so that the reserve function can be safely called again at the top of the next frame.

From now on, a reference to a "reserved entity" in the render world will always mean an entity that was reserved before extraction, with a corresponding entity in the main world. "Unreserved entity" may sometimes be used to refer to an entity that is spawned into the render world directly, without a corresponding twin in the main world.

The upside is that entities from the main world can be used in the render world without fear of collisions. The downside is that the entities need to be added back to the render world every frame.

The Extract Schedule

The ExtractSchedule is a sync-point that allows the render world to be populated with data from the main world. Render-world systems in this schedule access the main world through the special MainWorld resource (which can be mutable), or the Extract parameter (which is read-only). Read-only access through the Extract parameter is preferred.

While the Extract schedule runs, no other schedule can execute on either the main app or the render sub-app. It effectively locks both worlds, and is the only part of bevy that can bottleneck both game logic and rendering. Bevy takes great pains to keep the extract schedule as slim and efficient as possible, and users should likewise keep their systems in the extract schedule small.

The RenderPlugin only adds a single system to the extract schedule, PipelineCache::extract_shaders, which we will talk more about when we introduce the Shader asset. Most of the systems in the extract schedule read some main-world component Foo and add a matching ExtractedFoo render component to the corresponding reserved entity in the render world.
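
A hedged sketch of that pattern (Glow and ExtractedGlow are hypothetical components used for illustration):

```rust
use bevy::prelude::*;
use bevy::render::Extract;

#[derive(Component, Clone)]
struct Glow {
    intensity: f32,
}

#[derive(Component)]
struct ExtractedGlow {
    intensity: f32,
}

// Runs in ExtractSchedule: reads the main world through `Extract` and writes
// into the render world through `Commands`.
fn extract_glows(mut commands: Commands, glows: Extract<Query<(Entity, &Glow)>>) {
    for (entity, glow) in glows.iter() {
        // `get_or_spawn` targets the entity id reserved from the main world.
        commands
            .get_or_spawn(entity)
            .insert(ExtractedGlow { intensity: glow.intensity });
    }
}
```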

The Render Schedule

The Render schedule runs directly after extraction. It comes with 16 built-in system sets, grouped as variants of the RenderSet enum.

The ExtractCommands Set

The first set to run in the render schedule is ExtractCommands. This set usually contains a single system that applies ECS commands dispatched in the extract schedule. Applying ECS commands in Render rather than Extract means the main world spends less time locked in extraction.

The ManageViews Set

The ManageViews set runs after ExtractCommands, and contains four systems:

  • sort_cameras Sorts extracted cameras by Camera::order.
  • prepare_windows Gets each window ready to be drawn to.
  • [prepare_view_attachments] After windows have been prepared, adds an OutputColorAttachment to each camera, containing the output texture view and format. These attachments can optionally be overridden by user code before prepare_view_targets runs, for example to insert a different texture for a single frame when taking a screenshot.
  • prepare_view_targets After the correct attachment has been selected, adds a ViewTarget component to each camera, which contains both the output texture and a reference to the "intermediate" texture that will be rendered to.

It will become more clear why this is called ManageViews when we talk about views later.

The PrepareAssets Set

The PrepareAssets set runs between ExtractCommands and Prepare, in parallel with ManageViews, Queue and PhaseSort. The purpose of the PrepareAssets set is to run the instances of a generic system called prepare_assets that are added by the various RenderAssetPlugins. After this set completes, the various RenderAssets<A> resources are populated with data ready to be sent to the GPU.

See the section on the RenderAssetPlugin for more information.

The Queue Set

The Queue set runs after ManageViews. The bevy_render crate intentionally leaves populating the Queue set to higher levels of the stack. The systems in this set are expected to queue items onto render phases, which we will talk about later.

The Queue set also contains a special subset QueueMeshes which executes after prepare_assets<Mesh> completes. It is also empty by default.

The PhaseSort Set

The PhaseSort set runs after Queue and is also left empty by bevy_render. We will talk more about this set alongside render phases and the core pipeline.

The Prepare Set

The Prepare set runs after PhaseSort and PrepareAssets complete. It is intended for use by systems which translate entities and components into GPU-friendly formats and create bind groups.

TODO: This may be wrong, creating bind groups probably happens in PrepareBindGroups. Fix or clarify

The PrepareResources Subset

TODO: Explain prepare the resources subset of Prepare.

The PrepareBindGroups Subset

TODO: Explain the bind groups subset of Prepare.

The Render Set

The Render set (not to be confused with the Render schedule to which it belongs) runs after Prepare, and is when the actual draw calls get issued to the GPU.

The bevy_render crate adds two systems to the Render set:

The render_system mostly triggers an execution of the render graph, bevy's render workload scheduling system. We will talk more about the render graph in its own section.

The Cleanup Set

The Cleanup set is the last set in the schedule, running after Render.

The bevy_render crate adds two systems to the Cleanup set:

  • [World::clear_entities], which drops all the entities from the render world.
  • [update_texture_cache_system], which unloads textures that haven't been used in the last three frames.

Views

The render world has one very special component, with no direct analog in the main world: ExtractedView. Any entity that ends up with this component is considered a 'view'. At the time of writing, only two kinds of main-world entities get an ExtractedView attached to them in the render world:

  • cameras (added by bevy_render::camera::extract_cameras),
  • and shadow-casting lights (added by a system in bevy_pbr, as we will see later)

A 'View' represents a point in space from which render commands can be issued. Each ExtractedView contains a projection, a transform, a viewport rectangle (width, height, origin), and a few other bits and bobs.

Views are important because of how they hook into the render graph: Each view has the potential to execute a specialized render workload.

The Render Graph

The RenderGraph is bevy's specialized task scheduler for render workloads. It consists of a set of Nodes connected by Edges into a directed acyclic computation graph. Each node represents a self-contained unit of work (a task, effectively) that needs to be performed to render a frame. The edges specify dependencies between nodes.

Every node has a Node::run function, which takes as arguments the node itself, a RenderGraphContext, a RenderContext, and a read-only reference to the world. When the graph is executed, the nodes are ordered according to the dependency graph and executed in sequence. The render graph is executed once per frame (in the Render set) by the render_system.

What goes in a Node::run function? Whatever you want! Nodes can run arbitrary code. They only get immutable access to the render world, but can generate GPU work through the provided RenderContext. In practice, most nodes set up a single render pass and queue up a bunch of draw commands (more or less).
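
A minimal sketch of a custom node (the exact lifetimes on run vary slightly between Bevy versions; a real node would typically begin a pass through the RenderContext and record work into it):

```rust
use bevy::prelude::*;
use bevy::render::render_graph::{Node, NodeRunError, RenderGraphContext};
use bevy::render::renderer::RenderContext;

struct ExampleNode;

impl Node for ExampleNode {
    fn run<'w>(
        &self,
        _graph: &mut RenderGraphContext,
        _render_context: &mut RenderContext<'w>,
        _world: &'w World,
    ) -> Result<(), NodeRunError> {
        // Queue GPU work here, e.g. by beginning a tracked render pass.
        Ok(())
    }
}
```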

Subgraphs

The render graph can have additional named child graphs associated with it. These are called "subgraphs", even though they are not subgraphs in the traditional graph-theory sense of the term. Instead, a subgraph is an auxiliary graph of render tasks which can be executed (possibly multiple times) by nodes in the main graph.

Subgraphs do not run when the main render graph is executed. Only invoking RenderGraphContext::run_sub_graph within a top-level node's run function will cause the subgraph to run. Subgraphs are queued and executed in sequence after the main render graph finishes.

TODO: Redraft this section and merge it with CameraDriverNode and ViewNode.

CameraDriverNode and ViewNode

With knowledge of subgraphs and views under our belt, we can approach one of the most commonly misunderstood and underappreciated aspects of the render graph: the CameraDriverNode and ViewNodes.

These are both implementations of render_graph::Node. The CameraDriverNode is automatically added to the main render graph by the CameraPlugin. When it runs, it looks up every extracted camera entity and runs the subgraph specified in ExtractedCamera::render_graph. The subgraphs are queued to run after the main graph, and execute in sequence. The CameraDriverNode also passes along the camera entity into the subgraph so it can be accessed through RenderGraphContext::view_entity.

ViewNodes are a convenient wrapper around Node which makes it easy to query for data off of the RenderGraphContext::view_entity. It basically lets you grab extracted data off the camera entity in a camera-driven subgraph. If that all seems very abstract, just keep going for now. It will make more sense when we move on to the bevy_core_pipeline crate.

Render Phases

bevy_render supplies a generic work queue called a "Render Phase" that is intended to supply high-level render instructions for render graph nodes to execute.

Phase Items

PhaseItem is a trait for types that contain renderable data. All instances of a type that implements PhaseItem are expected to be drawn using a single pipeline. bevy_render provides a convenient set of tools for grouping, sorting, batching and rendering PhaseItems, but leaves the actual implementation up to its dependencies (largely bevy_core_pipeline).

Sorted vs Binned Phases

Phases that are defined for use with the main 2d and 3d pipelines are typically defined as being sorted or binned, and have additional data stored in ViewSortedRenderPhases<T> and ViewBinnedRenderPhases<T> resources respectively.

These different strategies allow for different optimizations depending on how a phase is intended to be drawn. For example, for items with transparency, it's necessary to sort the items from back to front to ensure correct blending, as transparent objects need to be drawn after opaque objects for proper rendering. This is handled using the sorted phase strategy to manage the draw order based on depth.

For opaque items, it’s more efficient to use a binned strategy, which allows grouping items by certain properties (like material or mesh) to reduce state changes in the rendering pipeline. This leads to better performance since opaque objects don’t require depth sorting and can be drawn in any order.

Draw Functions and Render Commands

Each PhaseItem defines how it should be drawn using Draw functions. This is a trait with two methods: [Draw::prepare], and [Draw::draw]. The former sets up the function for drawing, and the latter draws an individual item.

Multiple draw functions can be registered to a PhaseItem type using the [DrawFunctions] resource. PhaseItem::draw_function determines which registered Draw function is applied to any given instance of a PhaseItem.

Bevy also allows users to compose draw functions from an ordered set (e.g. a tuple) of [RenderCommands]. [RenderCommands] offer a simple, modular API and fulfill the same fundamental purpose as draw functions. Render commands must also be registered, by invoking App::add_render_command on the render sub-app.
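
A hedged sketch of composing a draw function from a tuple of render commands and registering it for a phase item (the bind group indices are illustrative; the real indices depend on the pipeline layout in use):

```rust
use bevy::prelude::*;
use bevy::core_pipeline::core_3d::Transparent3d;
use bevy::pbr::{DrawMesh, SetMeshBindGroup, SetMeshViewBindGroup};
use bevy::render::render_phase::{AddRenderCommand, SetItemPipeline};
use bevy::render::RenderApp;

type DrawMyThing = (
    SetItemPipeline,         // bind the item's specialized render pipeline
    SetMeshViewBindGroup<0>, // bind per-view data (camera, lights, ...)
    SetMeshBindGroup<1>,     // bind per-mesh data (transforms, ...)
    DrawMesh,                // issue the actual draw call
);

fn register(app: &mut App) {
    app.sub_app_mut(RenderApp)
        .add_render_command::<Transparent3d, DrawMyThing>();
}
```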

Phases and The Render Schedule

Phase items are collected into a RenderPhase<I: PhaseItem> component on each View. The bevy_render crate expects items to be added to each RenderPhase during the Queue render set. The PhaseSort set is (as can probably be guessed) expected to contain systems that sort and batch render phases.

Phases and The Render Graph

After all the items are queued onto phases, the render graph runs. A specific view node is usually given the task of calling [RenderPhase::render], which calls the registered Draw function on each queued PhaseItem. This is when the "actual rendering" happens.

The RenderPlugin

The [RenderPlugin] is responsible for:

  • Creating and configuring the render sub-app (as described above),
  • Starting wgpu,
  • Adding the Shader asset type,
  • Adding a bunch of other self-contained plugins.

The RenderAssetPlugin

The RenderAssetPlugin<A> takes a type implementing RenderAsset, a trait for types that can be encoded into a GPU-friendly format. Each instance of the RenderAssetPlugin adds two systems:

  • [extract_render_asset::<A>] in the extract schedule, which extracts new and changed instances of A into the render world. If RenderAsset::asset_usage does not include RenderAssetUsages::MAIN_WORLD, this system also unloads the asset from the main world.
  • [prepare_asset::<A>] in PrepareAssets calls RenderAsset::prepare_asset to get the GPU-friendly version.

The bevy_render crate contains two implementations of RenderAsset:

  • [Image] which transforms into a [GpuImage]
  • [Mesh] which transforms into a [GpuMesh].

Each gets their own RenderAssetPlugin instance added by [MeshPlugin] and [ImagePlugin] (though for some reason ImagePlugin is part of the default plugins and MeshPlugin is added directly by RenderPlugin).

The Shader Asset

Shaders are loaded using Bevy's asset management system and interacted with via Handle<Shader>, which represents a particular shader source. Typically, a shader will be loaded from the assets folder, but one can also be created from a WGSL source snippet or even raw SPIR-V bytes.

Shaders are not compiled until they are processed by the PipelineCache when a pipeline is queued for creation, either through pipeline specialization (e.g. for use with a material) or manually requested via queue_render_pipeline.
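
A hedged sketch of obtaining a Handle<Shader> (the asset path and resource are illustrative):

```rust
use bevy::prelude::*;
use bevy::render::render_resource::Shader;

#[derive(Resource)]
struct MyShader(Handle<Shader>);

// A shader is loaded like any other asset; it is only compiled later, when the
// PipelineCache processes a pipeline that references this handle.
fn load_my_shader(mut commands: Commands, asset_server: Res<AssetServer>) {
    let handle: Handle<Shader> = asset_server.load("shaders/my_shader.wgsl");
    commands.insert_resource(MyShader(handle));
}
```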

naga_oil

Bevy uses a preprocessor library called naga_oil in order to compose various shader resources together that are defined in separate sources. For example, many internal Bevy shaders start with an import directive #import that describes what other shader sources are required in order to compose (i.e. concatenate into a single source) the final shader.

naga_oil also uses "shader defs" to power #ifdef preprocessing, which can be used by shader authors to turn certain features on or off during pipeline specialization.

Thus, a single Handle<Shader> may actually represent several different compiled shader instances associated with a single handle, cached on the basis of which shader defs were enabled for a particular shader source.

The WindowRenderPlugin

The [WindowRenderPlugin] adds systems that extract window information from the main world into the render world. It creates two resources:

  • [ExtractedWindows]: A mapping from entity ids to extracted window data.
  • [WindowSurfaces]: A mapping from entity ids to a wgpu::Surface (for each window).

In the extract schedule:

  • [extract_windows] populates ExtractedWindows.

In the render schedule:

  • First, [create_surfaces] creates a wgpu::Surface for any window that needs one.
  • Then, in ManageViews, prepare_windows prepares a texture for drawing to.

Note that get_current_texture has to wait for the GPU to be ready to draw, so prepare_windows may block waiting for the texture under a heavy graphics workload.

The CameraPlugin

The [CameraPlugin] adds systems that extract and sort cameras. It creates one resource:

  • [SortedCameras]: A sorted list of camera information, including the camera render-target.

In the extract schedule:

  • [extract_cameras] adds [ExtractedCamera] and ExtractedView components to reserved camera entities.

The ViewPlugin

TODO: Explain the view plugin (or remove this section because it is partially covered in the render graph section).

ImagePlugin and MeshPlugin

TODO: Explain how the renderer sets up image and mesh extraction.

Summary of bevy_render

The best way to think of bevy_render is as a toolbox, but thus far we've mostly covered how these tools work. Now, as we move into the higher levels of the rendering stack, we can start to think about how to use them:

  • The render sub-app: For rendering in parallel with game logic.
  • Render resources: For automatically managing low-level rendering resources.
  • Views: For rendering from multiple different "camera-like" perspectives.
  • Phases: For rendering many objects (e.g. meshes and lights).
  • The Render Graph: For scheduling render tasks for arbitrary numbers of views and queued phase items.

The bevy_core_pipeline Crate

Now we can move on to bevy_core_pipeline: An opinionated rendering solution built using the bevy_render toolbox. It doesn't touch the actual shading, but it does set up a common RenderGraph and a set of standard RenderPhases.

The main thing bevy_core_pipeline does is add two new subgraphs to the render graph: Core2d and Core3d, each introduced by their respective plugins.

Forward vs Deferred Rendering

TODO: Explain how deferred rendering is implemented. See the original pr.

The Core 2d Subgraph

The Core2d subgraph is executed once for every 2d camera. Nodes are identified by variants of the Node2d enum. We will look at the nodes of the graph label-by-label, following the dependency graph:

MainPass
Tonemapping
EndMainPassPostProcessing
Upscaling

The MainPass is implemented by the view node [MainPass2dNode], which mostly just renders the Transparent2d render phase. No draw functions are registered for this phase, which allows higher levels of the rendering architecture to hook in their own rendering code.

Why is it called Transparent2d? Because it's drawn as if it might be transparent. TODO: Elaborate on this.

Both Tonemapping and Upscaling use the same logic for the 2d and 3d graphs, and we will cover them in their own section.

The Core 3d Subgraph

The Core3d subgraph is executed once for every 3d camera. Nodes are identified by variants of the Node3d enum. Again, we will look at the graph label by label following the dependency graph:

Prepass partition:
DeferredPrepass
Prepass
CopyDeferredLightingId
EndPrepass

Main pass partition:
StartMainPass
MainOpaquePass
MainTransmissivePass
MainTransparentPass
EndMainPass

Tonemapping
EndMainPassPostProcessing
Upscaling

The Prepass Section

Prepass is implemented by view node [PrepassNode].

TODO: Explain what the prepass section is for.

The Mainpass Section

MainOpaquePass is implemented in view node [MainOpaquePass3dNode]. It renders the [Opaque3d] and [AlphaMask3d] phases along with the skybox in a single render pass.

MainTransmissivePass is implemented by view node [MainTransmissivePass3dNode]. It draws the [Transmissive3d] render phase.

MainTransparentPass is implemented by view node [MainTransparentPass3dNode]. It draws the [Transparent3d] phase.

Shared Logic

Tonemapping

Upscaling

The bevy_pbr Crate

TODO: Give an overview of the bevy_pbr crate.

License

This document is released under the MIT License. Copyright 2024 Miles Silberling-Cook.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.