authors: tychedelia, nth
This is a work in progress, and may contain omissions or mistakes. If you notice an issue or have a question, please don't hesitate to reach out in #documentation-dev on the Bevy Discord.

This is a community document, which anyone can edit! Submitted as bevy_website PR #1080.
This is a technical summary of Bevy's rendering architecture as of the 0.13 release.
Bevy's rendering code is largely concentrated in three loosely-coupled crates: `bevy_render`, `bevy_core_pipeline`, and `bevy_pbr`.
The foundation is `bevy_render`, which manages how low-level rendering tasks are scheduled and executed. Above this sits `bevy_core_pipeline`, which defines the individual steps of rendering each frame. Above both is `bevy_pbr`, which adds a material abstraction and uses it to provide physically-based shaders.
These three crates sit roughly in the "upper-middle" of Bevy's default stack and do not fully encompass all of Bevy's rendering capabilities. Although we will not cover them here, both `bevy_ui` and `bevy_sprite` also hook into and interact with rendering.
We will begin with the lowest levels of the stack and progress upwards towards high-level concepts like materials. But before diving in, let's take a look at the two big dependencies underpinning the rest of the rendering stack: `wgpu` and `bevy_ecs`.
## `wgpu` Preliminaries

This is a quick overview of some important aspects of `wgpu`. Those experienced with `wgpu` can skip this section. Those unfamiliar with `wgpu` should check out the wgpu book and the WebGPU spec.
`wgpu` is a Rust package providing a cross-platform implementation of WebGPU, a modern graphics API. It strikes a nice balance between ergonomics and power, and can run pretty much anywhere.
`wgpu` is a low-level graphics library, but because it is designed to work across a wide range of platforms there are many layers of abstraction between it and the graphics hardware. Going from lowest to highest level:

1. the graphics hardware itself,
2. the graphics driver,
3. the native graphics API (`metal`, `vulkan`, `gles`, etc.) supplied by your platform,
4. the "Hardware Abstraction Layers" (`wgpu_hal::vulkan`, `wgpu_hal::gles`, `wgpu_native` and the browser's own `WebGpu` adapter, etc.), which are low-level unsafe Rust bindings to the native API,
5. and the full `WebGPU` API provided by `wgpu`, which abstracts over all the previous layers.

It's good to be vaguely aware of this stack. `wgpu` does its best to stay consistent, but some features are not fully supported on all platforms. It's also common for the "Hardware Abstraction Layers" (HAL) on layer 4 to support features beyond the base `WebGPU` spec.
For example, some features are only available on certain platforms or backends.
Additionally, because `wgpu` tracks the WebGPU specification, which is both slow-moving and a kind of "lowest common denominator" API, there may be some advanced rendering APIs supported by your GPU, e.g. mesh shaders, that are not yet supported in `wgpu`.
The software sections of this stack (layers 2-5) are represented in `wgpu` by a `wgpu::Adapter` and a `wgpu::Device`. The adapter provides access to the platform-specific `WebGPU` implementation (layers 2 and 3 of the stack). The device is a logical representation of layers 4 and 5 obtained from the `Adapter`, and is ultimately what allows us to submit work to the GPU. Both of these abstractions come directly from the `WebGPU` spec.
Allocating memory is the most basic of all operations. In `wgpu`, GPU memory management is done through buffers.

A `wgpu::Buffer` represents a contiguous block of memory allocated on the GPU. Buffers can be created by calls to `wgpu::Device::create_buffer` (which only allocates the buffer) or `wgpu::util::DeviceExt::create_buffer_init` (which both allocates and initializes the buffer).
You can reference a slice of a buffer using `wgpu::Buffer::slice`. This returns a `wgpu::BufferSlice`, which can be "mapped" to main program memory for reading and writing. Writes are flushed by calling `wgpu::Buffer::unmap`, and buffers can be deallocated with `wgpu::Buffer::destroy`. Writes can also be queued as part of commands (which we will talk about later) using `wgpu::Queue::write_buffer`. Basically, `write_buffer` delays the memory write until the next set of instructions is sent to the GPU.
When you create a buffer, you generally have to declare its intended use with `wgpu::BufferUsages`. Buffers must have `BufferUsages::MAP_READ` to be read from.
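To make this concrete, here is a minimal sketch using the raw `wgpu` API; the labels, sizes, and data are arbitrary, and the `bytemuck` crate is assumed for byte casting:

```rust
use wgpu::util::DeviceExt; // extension trait providing create_buffer_init

fn create_buffers(device: &wgpu::Device, queue: &wgpu::Queue) {
    // Allocate (but do not initialize) a small readback buffer.
    // MAP_READ is required because we intend to read it from the CPU.
    let readback = device.create_buffer(&wgpu::BufferDescriptor {
        label: Some("readback buffer"),
        size: 1024,
        usage: wgpu::BufferUsages::MAP_READ | wgpu::BufferUsages::COPY_DST,
        mapped_at_creation: false,
    });

    // Allocate *and* initialize a buffer in a single call.
    let values = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
        label: Some("values buffer"),
        contents: bytemuck::cast_slice(&[1.0f32, 2.0, 3.0, 4.0]),
        usage: wgpu::BufferUsages::STORAGE | wgpu::BufferUsages::COPY_DST,
    });

    // Queue a deferred write: the copy happens with the next command submission.
    queue.write_buffer(&values, 0, bytemuck::cast_slice(&[5.0f32]));

    let _ = (readback, values);
}
```

Note how the usage flags declare up front everything we intend to do with each buffer.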
A `wgpu::Texture` represents an image stored in GPU memory. Textures can be created by calling `wgpu::Device::create_texture`, which requires information about the format, dimensions, and so on. The main way to write to a texture is `wgpu::Queue::write_texture`, which is similar to `wgpu::Queue::write_buffer` in that dispatch is delayed until commands are next sent to the GPU. It's also possible to copy a buffer to a texture using `wgpu::CommandEncoder::copy_buffer_to_texture`. Like buffers, textures can be deallocated using `wgpu::Texture::destroy`.
Textures are usually accessed in shaders through views and samplers.
A `wgpu::TextureView` is a "view" onto a specific texture. The view describes how the raw texture data should be interpreted, and the subset of the texture to access. Views are created using `wgpu::Texture::create_view`. They will be very important when we get into bind groups later on.
A `wgpu::Sampler` describes a method of sampling color values from a texture at a given texture coordinate. Samplers are created using `wgpu::Device::create_sampler`, and will also be important in the context of bind groups, which we are about to introduce.
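Putting textures, views, and samplers together, a hedged sketch of creating all three with raw `wgpu` might look like this (the size, format, and filter settings are arbitrary example choices):

```rust
fn create_texture_resources(device: &wgpu::Device) {
    // A small RGBA texture we can sample from in a shader.
    let texture = device.create_texture(&wgpu::TextureDescriptor {
        label: Some("example texture"),
        size: wgpu::Extent3d { width: 256, height: 256, depth_or_array_layers: 1 },
        mip_level_count: 1,
        sample_count: 1,
        dimension: wgpu::TextureDimension::D2,
        format: wgpu::TextureFormat::Rgba8UnormSrgb,
        usage: wgpu::TextureUsages::TEXTURE_BINDING | wgpu::TextureUsages::COPY_DST,
        view_formats: &[],
    });

    // A view that interprets the whole texture with its default format.
    let view = texture.create_view(&wgpu::TextureViewDescriptor::default());

    // A sampler with linear filtering; addressing modes default to clamp-to-edge.
    let sampler = device.create_sampler(&wgpu::SamplerDescriptor {
        label: Some("example sampler"),
        mag_filter: wgpu::FilterMode::Linear,
        min_filter: wgpu::FilterMode::Linear,
        ..Default::default()
    });

    let _ = (view, sampler);
}
```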
Before introducing bind groups, we have to introduce the idea of a pipeline. A pipeline is like a highly-specialized program that runs on the GPU. Currently `wgpu` supports two types of pipeline: `wgpu::RenderPipeline` for rasterizing geometry, and `wgpu::ComputePipeline` for general computation.
Pipelines are composed of a sequence of stages, which can be either fixed or programmable. Fixed stages are inflexible hardware-accelerated operations, with no user control. Programmable stages, on the other hand, can be controlled by user-supplied shader modules.
Shaders are small, highly-specialized programs that run on the GPU within the programmable stages of pipelines. They are written in a language called WGSL.
Unlike code written to run on the CPU, shader programs are highly parallel by default: your code is executed across many threads at the same time, for example once per pixel in a fragment shader. This places some limitations on what shader code can look like. Shaders must avoid complex branching and dependencies between threads to execute efficiently in parallel; operations should be designed to run independently on each thread, and control flow should remain uniform where possible to prevent the performance bottlenecks caused by thread divergence.
Shader programs also tend to be written to target a specific stage of the programmable graphics pipeline. We describe the pipeline as "programmable" because it supports running shader programs at certain steps, as opposed to fixed-function APIs that are not programmable. For example, in a traditional graphics pipeline, a vertex shader transforms vertex data into screen space and passes it to a fixed-function rasterizer, which cannot be explicitly programmed. Similarly, a fragment shader calculates the color of each pixel, which a fixed-function stage then writes to the framebuffer. The stages of this pipeline may also be backed by specialized hardware in your GPU to make them more efficient.
In recent years, it has also become possible both to program more stages of the graphics pipeline and to write "general purpose" shaders known as compute shaders, which are not tied to the traditional graphics pipeline. Compute shaders allow developers to perform a wide range of parallel computations, such as physics simulations, image processing, or machine learning, using the GPU's immense computational power for tasks beyond rendering. Combining compute shaders with more traditional techniques allows building powerful modern graphics features.
Bindings allow buffers, texture views, and samplers (as well as arrays of the same) to be accessed in pipelines just like global variables. Each binding occupies a unique (but not necessarily sequential) index from 0 to 1000 in one of four "bind group" slots.
On the GPU side, the expected type for each binding must be hard-coded into the shader. On the CPU side, the layout of bindings for each slot is specified by a "bind group layout" (`wgpu::BindGroupLayout`). Up to four of these layouts can be specified when creating a pipeline, one for each bind group slot.
Bind groups are created on the CPU to assign specific buffers and textures to each binding in a layout.
Once created, bind groups are bound to the four pipeline bind group slots using GPU commands (which we will talk more about later). When a pipeline is invoked (also via GPU commands), the group bound to each slot must match the layout of the slot. Bind groups work as a stack: when a group is rebound, all bind groups with a higher slot index must be rebound as well.
Only certain types of bindings can be written to by shaders: storage buffers and storage textures. Uniform buffers, samplers, and sampled textures are read-only.
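As an illustration of how layouts and bind groups mirror each other, here is a sketch in raw `wgpu` that declares a uniform buffer, a texture, and a sampler at bindings 0-2 of one group; the visibility and binding choices are only examples:

```rust
fn create_bind_group(
    device: &wgpu::Device,
    buffer: &wgpu::Buffer,
    view: &wgpu::TextureView,
    sampler: &wgpu::Sampler,
) -> wgpu::BindGroup {
    // The layout declares what each binding index will hold.
    let layout = device.create_bind_group_layout(&wgpu::BindGroupLayoutDescriptor {
        label: Some("example layout"),
        entries: &[
            wgpu::BindGroupLayoutEntry {
                binding: 0,
                visibility: wgpu::ShaderStages::FRAGMENT,
                ty: wgpu::BindingType::Buffer {
                    ty: wgpu::BufferBindingType::Uniform,
                    has_dynamic_offset: false,
                    min_binding_size: None,
                },
                count: None,
            },
            wgpu::BindGroupLayoutEntry {
                binding: 1,
                visibility: wgpu::ShaderStages::FRAGMENT,
                ty: wgpu::BindingType::Texture {
                    sample_type: wgpu::TextureSampleType::Float { filterable: true },
                    view_dimension: wgpu::TextureViewDimension::D2,
                    multisampled: false,
                },
                count: None,
            },
            wgpu::BindGroupLayoutEntry {
                binding: 2,
                visibility: wgpu::ShaderStages::FRAGMENT,
                ty: wgpu::BindingType::Sampler(wgpu::SamplerBindingType::Filtering),
                count: None,
            },
        ],
    });

    // The bind group assigns concrete resources to each binding in the layout.
    device.create_bind_group(&wgpu::BindGroupDescriptor {
        label: Some("example bind group"),
        layout: &layout,
        entries: &[
            wgpu::BindGroupEntry { binding: 0, resource: buffer.as_entire_binding() },
            wgpu::BindGroupEntry { binding: 1, resource: wgpu::BindingResource::TextureView(view) },
            wgpu::BindGroupEntry { binding: 2, resource: wgpu::BindingResource::Sampler(sampler) },
        ],
    })
}
```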
A compute shader is a GPU program designed for general-purpose parallel computation. Compute shaders can bind resources like buffers and textures.
A compute shader operates on data in parallel, with threads organized into workgroups: small, cohesive groups of threads that share memory and are scheduled together by the GPU. Each workgroup contains multiple threads (also called invocations), and each thread has a unique local ID within the group.
Compute shaders are launched via a dispatch command, which defines the number of workgroups to execute across the GPU. Calculating the number of workgroups needed to run the program efficiently is part of optimizing workload distribution: the problem size should divide evenly among threads while accounting for the hardware's capabilities, such as the number of available cores and the size of each workgroup, to maximize parallel efficiency and minimize idle threads.
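For example, a dispatch helper might compute the workgroup count by rounding up, so that a partially-filled final workgroup still covers the tail of the data. This sketch uses raw `wgpu`, and assumes a shader declared with `@workgroup_size(64)`:

```rust
fn dispatch_compute(
    encoder: &mut wgpu::CommandEncoder,
    pipeline: &wgpu::ComputePipeline,
    bind_group: &wgpu::BindGroup,
    element_count: u32,
) {
    // Must match the @workgroup_size declared in the shader.
    const WORKGROUP_SIZE: u32 = 64;
    // Round up: e.g. 100 elements / 64 per group => 2 workgroups.
    let workgroup_count = (element_count + WORKGROUP_SIZE - 1) / WORKGROUP_SIZE;

    let mut pass = encoder.begin_compute_pass(&wgpu::ComputePassDescriptor::default());
    pass.set_pipeline(pipeline);
    pass.set_bind_group(0, bind_group, &[]);
    pass.dispatch_workgroups(workgroup_count, 1, 1);
}
```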
TODO: Explain vertex and fragment shaders and the graphics pipeline.
Render pipelines are more specialized than compute pipelines, but also have more mechanisms for input and output than just bindings.
Beyond bindings, render pipelines take vertex buffers (and an optional index buffer) as additional inputs, and write their outputs to render attachments. Both are described below.
Render attachments are textures that store the results of rendering; they are what get rendered to. Render pipelines expect a specific number of color attachments, and can also configure a depth attachment. Failing to provide the attachments when invoking a pipeline will result in an error.
Attachments can be sampled from as well as written to. TODO: Double-check that this is correct.
The vertex buffer is a special buffer that is used as input for the pipeline.

In order for the pipeline to know what to draw, it must first define the geometry of what is being drawn. This is done by supplying a series of vertices to the vertex shader, which represent the triangles used by the rasterizer and are ultimately passed to the fragment shader. Vertex data is typically described in "local" space, that is, relative to the origin of the mesh that the vertex is part of. For example, the center of a cube might be represented as the point (0, 0, 0) in local space. The job of the vertex shader is to transform vertex data out of local space: first into world space, then into view space using the camera's perspective, and finally into screen (or "clip") space, which represents normalized 2d coordinates on the screen.

The vertex data supplied to the vertex shader is built on the CPU using a certain topology that describes how the data is laid out in the vertex buffer. Typically, this is a list of triangles. Additionally, users may supply an optional index buffer, which tells the GPU in which order the vertices should be drawn. This order must match the "winding" defined in the pipeline descriptor (clockwise or counter-clockwise), which determines which direction the "front" of each triangle faces.

Vertex data may also be instanced, meaning the same mesh is drawn multiple times in a single draw call. In this case there may be an additional vertex buffer bound (at a different vertex buffer slot) that describes per-instance data, as opposed to per-vertex data like coordinates or colors.
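As a sketch of how the CPU side describes this layout to the pipeline, here is a hypothetical vertex type and its `wgpu::VertexBufferLayout`; the attribute locations are assumptions that must match the vertex shader's declared inputs:

```rust
use std::mem;

// Hypothetical per-vertex data: a position and a color.
#[repr(C)]
#[derive(Clone, Copy)]
struct Vertex {
    position: [f32; 3],
    color: [f32; 4],
}

// Describes how the vertex buffer's bytes map to shader inputs.
fn vertex_layout() -> wgpu::VertexBufferLayout<'static> {
    const ATTRIBUTES: [wgpu::VertexAttribute; 2] =
        wgpu::vertex_attr_array![0 => Float32x3, 1 => Float32x4];
    wgpu::VertexBufferLayout {
        array_stride: mem::size_of::<Vertex>() as wgpu::BufferAddress,
        // Use VertexStepMode::Instance for a per-instance buffer instead.
        step_mode: wgpu::VertexStepMode::Vertex,
        attributes: &ATTRIBUTES,
    }
}
```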
Communication between the CPU and the GPU is asynchronous; the CPU dispatches commands to the GPU and must wait for the GPU to finish its workload and transmit a response. We'll refer to these as GPU commands to differentiate them from Bevy ECS commands.
As previously mentioned, a pipeline is like a highly-specialized program that runs on the GPU. To "run" a pipeline, multiple GPU commands must be issued to set up the various bindings, select the active pipeline, and finally invoke the pipeline itself. These GPU commands are grouped together in a "pass".
Passes group and organize GPU commands. A `wgpu::RenderPass` allows issuing GPU commands relating to render pipelines, and a similar structure, `wgpu::ComputePass`, exists for compute pipelines.
Render attachments are set during render pass creation and fixed for the duration; you can think of a render pass as a resource-scope for a certain set of attachments. All pipelines executed within a pass must be compatible with the provided set of attachments. Between passes, writes are flushed and attachments can be swapped around or reconfigured.
Commands are queued together in a `wgpu::CommandEncoder` when they are added to a pass. Calling `CommandEncoder::finish()` encodes the commands into a `wgpu::CommandBuffer`, which is ready to be sent off to the GPU for execution. Work begins when command buffers are submitted to the GPU command queue.
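Tying the last few sections together, a minimal frame might be encoded and submitted like this. This is a sketch against wgpu 0.19 (the version Bevy 0.13 uses); the single clear-only pass is just an example:

```rust
fn render_frame(
    device: &wgpu::Device,
    queue: &wgpu::Queue,
    target: &wgpu::TextureView,
) {
    let mut encoder =
        device.create_command_encoder(&wgpu::CommandEncoderDescriptor { label: Some("frame") });

    {
        // The pass borrows the encoder; commands recorded here are queued into it.
        let _pass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor {
            label: Some("clear pass"),
            color_attachments: &[Some(wgpu::RenderPassColorAttachment {
                view: target,
                resolve_target: None,
                ops: wgpu::Operations {
                    load: wgpu::LoadOp::Clear(wgpu::Color::BLACK),
                    store: wgpu::StoreOp::Store,
                },
            })],
            depth_stencil_attachment: None,
            timestamp_writes: None,
            occlusion_query_set: None,
        });
        // set_pipeline / set_bind_group / draw calls would go here.
    }

    // Encode and submit; the GPU begins work once the buffer is on the queue.
    queue.submit([encoder.finish()]);
}
```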
As we have seen, `wgpu` sits atop many semi-interchangeable layers. Unfortunately, it's often necessary to design features to accommodate the lowest common denominator of supported platforms, and there are limits (on buffer sizes, binding counts, and so on) to keep in mind. The full list is available in the `wgpu::Limits` documentation.
## `bevy_ecs` Preliminaries

This is a quick overview of some important aspects of Bevy's Entity Component System. Those experienced with the ECS can skip this section. Everyone else should refer to the book and the docs for more info.
A `System` is a stateful instance of a function that can access data stored in a world. A `SystemSet` is a logical group of systems (and can comprise other system sets).
By default, systems have neither a strict execution order nor any conditions for execution. For any system or system set, you can define ordering constraints (that it must run before or after other systems or sets) and run conditions (predicates that decide whether it runs at all), as in the sketch below. These properties are all additive, and properties can be added to existing sets. Adding another property does not replace an existing one, and properties cannot be removed. If incompatible properties are added, the schedule will panic at startup.
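Here is a small sketch of these properties in Bevy; the two systems, the resource, and the pause condition are all hypothetical:

```rust
use bevy::prelude::*;

// Hypothetical resource used as a run condition.
#[derive(Resource)]
struct GamePaused(bool);

// Hypothetical systems, purely for illustration.
fn read_input() {}
fn apply_movement() {}

fn main() {
    App::new()
        .insert_resource(GamePaused(false))
        .add_systems(
            Update,
            (
                read_input,
                apply_movement
                    // Ordering constraint: always run after `read_input`.
                    .after(read_input)
                    // Run condition: skip entirely while the game is paused.
                    .run_if(|paused: Res<GamePaused>| !paused.0),
            ),
        )
        .run();
}
```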
A `Schedule` is a collection of systems that are ordered, dispatched, and executed together as a batch. Every system belongs to exactly one schedule (if the same function is added to two different schedules, it is considered a different system).

Systems in different schedules are independent; each schedule has its own collection of `SystemSet`s and its own system execution ordering.
An `App` contains a `World`, several schedules, and a runner function which manages an event loop.

Every app has one main schedule, which the runner (generally) executes once per frame. Other schedules may be executed by calling `World::run_schedule`.
A `SubApp` is a secondary world with its own set of schedules, which is contained within the main app. Like the main app, a sub-app has a main schedule which is (usually) executed by the main app's runner once per frame. Before the runner executes the main schedule, it calls `SubApp::extract` to synchronize the sub-app with the main world. No schedules can execute on the main world during extraction.
## The `bevy_render` Crate

The `bevy_render` crate is a modular rendering toolkit. It's mostly concerned with the nuts and bolts of scheduling and executing the rendering workload. The tools it provides are largely independent, but they are intended to be used together. We will start by investigating how the rendering code integrates with `wgpu` through Render Resources, then take a look at the ECS side of things with the render sub-app, before moving on to high-level scheduling abstractions like the Render Graph and Render Phases.
The `bevy_render` crate wraps many primitive resources from `wgpu` in higher-level abstractions that are accessible through the `RenderContext`. There are several things you can do with a `RenderContext`, including setting up a render pass and queuing a task to generate a command buffer in parallel.
Bevy provides its own convenience wrappers around `wgpu::Device` and `wgpu::CommandEncoder`, called `Device` and `CommandEncoder` respectively. These are both accessible through the `RenderContext`.
There are two ways to create bind groups in Bevy, at different layers of abstraction. The most direct path is with `Device::create_bind_group_layout`. Instead of specifying a slice of `BindGroupLayoutEntry` items, it is usually more ergonomic to use `BindGroupLayoutEntries`. Bind group instances can be created from a layout using `Device::create_bind_group`. Here again, it is usually preferable to use `BindGroupEntries` over manually constructing a slice of `BindGroupEntry`.
While it is possible to create bind groups directly in this manner, most users will want to use an `AsBindGroup` trait derive instead. Types that implement `AsBindGroup` provide two important functions: `AsBindGroup::bind_group_layout` and `AsBindGroup::as_bind_group`. The former is a static function which creates a bind group layout (which is the same for all instances of the type), and the latter is a method which returns a bind group instance with the appropriate layout. We will cover when these are called later, when we get into render workload scheduling.
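For a feel of the derive, here is a sketch modeled on Bevy 0.13's custom-material examples; the type and field names are hypothetical:

```rust
use bevy::prelude::*;
use bevy::render::render_resource::AsBindGroup;

// Derives both `bind_group_layout` and `as_bind_group` for this type.
#[derive(Asset, TypePath, AsBindGroup, Clone)]
struct GlowData {
    // Becomes a uniform buffer at binding 0.
    #[uniform(0)]
    color: Color,
    // Becomes a texture view at binding 1 and a sampler at binding 2.
    #[texture(1)]
    #[sampler(2)]
    mask: Handle<Image>,
}
```

The generated layout (uniform at 0, texture at 1, sampler at 2) is exactly what the corresponding shader must declare for bind group slot 2 when used as a material, or whatever slot the user binds it to in custom code.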
In Bevy, render passes can be created by calling `RenderContext::begin_tracked_render_pass`. A `TrackedRenderPass` is a wrapper around a `wgpu::RenderPass` with some handy automatic resource management built in. It's a "tracked" pass because it keeps track of the current pipeline id, configured bind groups, and bound vertex and index buffers. This lets us treat several important render commands as idempotent. Binding the same pipeline multiple times, for example, will only result in a single `BindPipeline` command being sent to the GPU driver.
Note: This approach avoids redundant GPU instructions, which can be very costly, but at the expense of additional state-management overhead on every frame. Some rendering experts are experimenting with alternatives, and it's likely this will change in the near future.
As covered in the `wgpu` preliminaries, rendering involves queuing render commands onto a `CommandEncoder` so that they can be submitted to the GPU in a single batch. Unfortunately, queuing a large number of commands can take quite a long time. Command-buffer generation tasks alleviate this issue by allowing us to perform this costly work in parallel.
A "Command-Buffer Generation Task" is a function registered with
RenderContext::add_command_buffer_generation_task
which takes a read-only
reference to the underlying wgpu::Device
and returns a wgpu::CommandBuffer
.
BeforeRenderContext
finishes, it runs all registered tasks in parallel and
then joins the resulting commands back together in the order they were added.
The `RenderContext` is not available within tasks. A new `wgpu::CommandEncoder` and `wgpu::RenderPass` must be created from the provided `wgpu::Device`. Bevy's wrapper types must be added to `wgpu` resources manually. Tracking, for instance, can be added to an existing `wgpu::RenderPass` by passing it to `TrackedRenderPass::new()`.
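A hedged sketch of registering such a task follows; the closure signature is simplified from the actual 0.13 API, the helper function is hypothetical, and a direct `wgpu` dependency is assumed for the descriptor type:

```rust
use bevy::render::renderer::{RenderContext, RenderDevice};

// Hypothetical helper that queues some encoding work as a generation task.
fn queue_parallel_work(render_context: &mut RenderContext) {
    render_context.add_command_buffer_generation_task(move |render_device: RenderDevice| {
        // The RenderContext is unavailable inside the task; create a fresh
        // encoder from the device we were handed.
        let mut encoder =
            render_device.create_command_encoder(&wgpu::CommandEncoderDescriptor {
                label: Some("parallel work"),
            });

        // ... record passes and commands here, wrapping them in Bevy's tracked
        // types (e.g. TrackedRenderPass::new) if tracking is desired ...

        encoder.finish()
    });
}
```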
We are now going to move away from the `wgpu` side of things and look at how rendering intersects with scheduling and the ECS. Bevy optionally supports pipelined rendering, a technique where the current frame is rendered at the same time as the next game update runs.
Pipelined-rendering is achieved by moving rendering work into a sub-app which (mostly) executes on its own thread. On single-threaded targets (like wasm), pipelined-rendering is disabled and the render sub-app instead runs on the main thread between executions of the main app.
The render sub-app is a very specialized use of the ECS. It relies upon an entity-sharing scheme with the main world that involves clearing all entities after every frame. Because entities don't stick around for more than a single frame, most of the interesting stuff in the render world happens within resources. One of these resources, the `RenderGraph`, is what ultimately drives the rendering work done each frame.
The render sub-app has two schedules, called `ExtractSchedule` and `Render` (the inconsistent naming avoids a conflict with the `Extract` system parameter, which we will talk about later). The extract schedule is executed during the `extract` function (which we discussed in the ECS preliminaries), and allows access to both the main world and the render world. The render schedule is the main schedule of the render sub-app, and runs after `ExtractSchedule`. All entities are cleared at the very end of the `Render` schedule.
As of 0.13, the docs on `Entity` have this to say about cross-world entity use:

> [An entity is] only valid on the World it’s sourced from. Attempting to use an Entity to fetch entity components or metadata from a different world will either fail or return unexpected results.
The render world is an exception; it implements an explicit entity-sharing scheme with the main world to enable cross-world use. It works like this:
- `World::flush_and_reserve_invalid_assuming_no_entities` is called on the render world before each frame, to reserve all entities that are (or could be) used in the main world.
- `World::get_or_spawn` lets render-world systems assign render-world components to entity ids initially derived from the main world.
- `World::spawn` can be used to spawn new render-world exclusive entities, which are guaranteed not to conflict with main world entities.

From now on, a reference to a "reserved entity" in the render world will always mean an entity that was reserved before extraction, with a corresponding entity in the main world. "Unreserved entity" may sometimes be used to refer to an entity that is spawned into the render world directly, without a corresponding twin in the main world.
The upside is that entities from the main world can be used in the render world without fear of collisions. The downside is that the entities need to be added back to the render world every frame.
The `ExtractSchedule` is a sync point that allows the render world to be populated with data from the main world. Render-world systems in this schedule access the main world through the special `MainWorld` resource (which can be mutated), or the `Extract` parameter (which is read-only). Read-only access through the `Extract` parameter is preferred.
While the extract schedule runs, no other schedule can execute on either the main app or the render sub-app. It effectively locks both worlds, and is the only part of Bevy that can bottleneck both game logic and rendering. Bevy takes great pains to keep the extract schedule as slim and efficient as possible, and users should likewise keep their extract-schedule systems small.
The `RenderPlugin` only adds a single system to the extract schedule, `PipelineCache::extract_shader`, which we will talk more about when we introduce the `Shader` asset. Most of the systems in the extract schedule read some main-world component `Foo` and add a matching `ExtractedFoo` render component to the corresponding reserved entity in the render world.
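A typical extract system therefore looks something like this sketch; the `Glow` and `ExtractedGlow` components are hypothetical:

```rust
use bevy::prelude::*;
use bevy::render::Extract;

// Hypothetical main-world component...
#[derive(Component)]
struct Glow { intensity: f32 }

// ...and its render-world counterpart.
#[derive(Component)]
struct ExtractedGlow { intensity: f32 }

// Runs in ExtractSchedule: read from the main world, write to the render world.
fn extract_glows(mut commands: Commands, glows: Extract<Query<(Entity, &Glow)>>) {
    for (entity, glow) in glows.iter() {
        // `get_or_spawn` targets the reserved entity that shares this id.
        commands
            .get_or_spawn(entity)
            .insert(ExtractedGlow { intensity: glow.intensity });
    }
}
```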
The `Render` schedule runs directly after extraction. It comes with 16 built-in system sets, grouped as variants of the `RenderSet` enum.
### `ExtractCommands` Set

The first set to run in the render schedule is `ExtractCommands`. This set usually contains a single system that applies ECS commands dispatched during the extract schedule. Applying ECS commands in `Render` rather than `Extract` means the main world spends less time locked in extraction.
### `ManageViews` Set

The `ManageViews` set runs after `ExtractCommands`, and contains four systems:

- `sort_cameras`: Sorts extracted cameras by `Camera::order`.
- `prepare_windows`: Gets each window ready to be drawn to.
- `prepare_view_attachments`: After windows have been prepared, adds an `OutputColorAttachment` to each camera, containing the output texture view and format. These attachments can optionally be overridden by user code before `prepare_view_targets` runs, for example to insert a different texture for a single frame when taking a screenshot.
- `prepare_view_targets`: After the correct attachment has been selected, adds a `ViewTarget` component to each camera, which contains both the output texture and a reference to the "intermediate" texture that will be rendered to.

It will become clearer why this is called `ManageViews` when we talk about views later.
### `PrepareAssets` Set

The `PrepareAssets` set runs between `ExtractCommands` and `Prepare`, in parallel with `ManageViews`, `Queue`, and `PhaseSort`. Its purpose is to run a bunch of instances of a generic system called `prepare_assets`, added by different `RenderAssetPlugin`s. After this set completes, the various `RenderAssets<A>` resources are populated with data ready to be sent to the GPU. See the section on the `RenderAssetPlugin` for more information.
### `Queue` Set

The `Queue` set runs after `ManageViews`. The `bevy_render` crate intentionally leaves populating the `Queue` set to higher levels of the stack. The systems in this set are expected to do stuff with render phases, which we will talk about later.
The `Queue` set also contains a special subset, `QueueMeshes`, which executes after `prepare_assets::<Mesh>` completes. It is also empty by default.
### `PhaseSort` Set

The `PhaseSort` set runs after `Queue` and is also left empty by `bevy_render`. We will talk more about this set alongside render phases and the core pipeline.
### `Prepare` Set

The `Prepare` set runs after `PhaseSort` and `PrepareAssets` complete. It is intended for systems which translate entities and components into GPU-friendly formats and create `BindGroups`.
TODO: This may be wrong, creating bind groups probably happens in PrepareBindGroups. Fix or clarify…
#### `PrepareResources` Subset

TODO: Explain the `PrepareResources` subset of `Prepare`.
#### `PrepareBindGroups` Subset

TODO: Explain the `PrepareBindGroups` subset of `Prepare`.
### `Render` Set

The `Render` set (not to be confused with the `Render` schedule to which it belongs) runs after `Prepare`, and is when the actual draw calls get issued to the GPU.
The `bevy_render` crate adds two systems to the `Render` set:

- `PipelineCache::process_queue`, which compiles all the necessary shader pipelines, and
- `render_system`, which does the rendering.

The `render_system` mostly triggers an execution of the render graph, Bevy's render workload scheduling system. We will talk more about the render graph in its own section.
### `Cleanup` Set

The `Cleanup` set is the last set in the schedule, running after `Render`. The `bevy_render` crate adds two systems to the `Cleanup` set:

- `World::clear_entities`, which drops all the entities from the render world, and
- `update_texture_cache_system`, which unloads textures that haven't been used in the last three frames.

The render world has one very special component, with no direct analog in the main world: `ExtractedView`. Any entity that ends up with this component is considered a "view". At the time of writing, only two main-world components get an `ExtractedView` attached to them in the render world:
- cameras (extracted by `bevy_render::camera::extract_camera`), and
- lights (extracted by `bevy_pbr`, as we will see later).

A "view" represents a point in space from which render commands can be issued. Each `ExtractedView` contains a projection, a transform, a viewport rectangle (width, height, origin), and a few other bits and bobs.
Views are important because of how they hook into the render graph: Each view has the potential to execute a specialized render workload.
The `RenderGraph` is Bevy's specialized task scheduler for render workloads. It consists of a set of `Node`s connected by `Edge`s into a directed acyclic computation graph. Each node represents a self-contained unit of work (a task, effectively) that needs to be performed to render a frame. The edges specify dependencies between nodes.
Every node has a `Node::run` function, which takes as arguments a reference to the node itself, a `RenderGraphContext`, a `RenderContext`, and the world. When the graph is executed, the nodes are ordered according to the dependency graph and executed in sequence. The render graph is executed once per frame (in the `Render` set) by the `render_system`.
What goes in a `Node::run` function? Whatever you want! Nodes can run arbitrary code. They only get immutable access to the render world, but can generate GPU work through the provided `RenderContext`. In practice, most nodes set up a single render pass and queue up a bunch of draw commands (more or less).
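To make the shape concrete, here is a minimal sketch of a node implementation. The node is hypothetical and does nothing useful; the signature is written against the 0.13 `Node` trait, so treat the exact lifetimes as approximate:

```rust
use bevy::ecs::world::World;
use bevy::render::render_graph::{Node, NodeRunError, RenderGraphContext};
use bevy::render::renderer::RenderContext;

// A hypothetical node demonstrating the shape of `run`.
struct ExampleNode;

impl Node for ExampleNode {
    fn run<'w>(
        &self,
        _graph: &mut RenderGraphContext,
        render_context: &mut RenderContext<'w>,
        _world: &'w World,
    ) -> Result<(), NodeRunError> {
        // Arbitrary GPU work can be queued here, e.g. by beginning a tracked
        // render pass or touching the command encoder directly.
        let _encoder = render_context.command_encoder();
        Ok(())
    }
}
```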
The render graph can have additional named child graphs associated with it. These are called "subgraphs", even though they are not subgraphs in the traditional graph-theory sense of the term. Instead, a subgraph is an auxiliary graph of render tasks which can be executed (possibly multiple times) by nodes in the main graph.
Subgraphs do not run when the main render graph is executed. Only invoking `RenderGraphContext::run_sub_graph` within a top-level node's run function will cause the subgraph to run. Subgraphs are queued and executed in sequence after the main render graph finishes.
TODO: Redraft this section and merge it with CameraDriverNode and ViewNode.
With knowledge of subgraphs and views under our belt, we can approach one of the most commonly misunderstood and underappreciated aspects of the render graph: the `CameraDriverNode` and `ViewNode`s.
These are both implementations of `render_graph::Node`. The `CameraDriverNode` is automatically added to the main render graph by the `CameraPlugin`. When it runs, it looks up every extracted camera entity and runs the subgraph specified in `ExtractedCamera::render_graph`. The subgraphs are queued to run after the main graph, and execute in sequence. The `CameraDriverNode` also passes the camera entity along into the subgraph, where it can be accessed through `RenderGraphContext::view_entity`.
`ViewNode` is a convenient wrapper around `Node` which makes it easy to query for data on the `RenderGraphContext::view_entity`. It basically lets you grab extracted data off the camera entity in a camera-driven subgraph. If that all seems very abstract, just keep going for now. It will make more sense when we move on to the `bevy_core_pipeline` crate.
`bevy_render` supplies a generic work queue called a "Render Phase" that is intended to supply high-level render instructions for render graph nodes to execute.
`PhaseItem` is a trait for types that contain renderable data. All instances of a type that implements `PhaseItem` are expected to be drawn using a single pipeline. `bevy_render` provides a convenient set of tools for grouping, sorting, batching, and rendering `PhaseItem`s, but leaves the actual implementation up to its dependencies (largely `bevy_core_pipeline`).
Phases defined for use with the main 2d and 3d pipelines are typically either sorted or binned, and have additional data stored in `ViewSortedRenderPhases<T>` and `ViewBinnedRenderPhases<T>` resources respectively.
These different strategies allow for different optimizations depending on how a phase is intended to be drawn. For example, for items with transparency, it's necessary to sort the items from back to front to ensure correct blending, as transparent objects need to be drawn after opaque objects for proper rendering. This is handled using the sorted phase strategy to manage the draw order based on depth.
For opaque items, it’s more efficient to use a binned strategy, which allows grouping items by certain properties (like material or mesh) to reduce state changes in the rendering pipeline. This leads to better performance since opaque objects don’t require depth sorting and can be drawn in any order.
Each `PhaseItem` defines how it should be drawn using `Draw` functions. `Draw` is a trait with two methods: `Draw::prepare` and `Draw::draw`. The former sets up the function for drawing, and the latter draws an individual item. Multiple draw functions can be registered for a `PhaseItem` type using the `DrawFunctions` resource. `PhaseItem::draw_function` determines which registered `Draw` function is applied to any given instance of a `PhaseItem`.
Bevy also allows users to compose draw functions from an ordered set (e.g. a tuple) of `RenderCommand`s. `RenderCommand`s offer a simple modular API, and they fulfill the same fundamental purpose as draw functions. Render commands must also be registered, by invoking `App::add_render_command` on the render sub-app.
Phase items are collected into a `RenderPhase<I: PhaseItem>` component on each view. The `bevy_render` crate expects items to be added to each `RenderPhase` during the `Queue` render set. The `PhaseSort` set is (as you can probably guess) expected to contain systems that sort and batch render phases.
After all the items are queued onto phases, the render graph runs. A specific view node is usually given the task of calling `RenderPhase::render`, which calls the registered `Draw` function on each queued `PhaseItem`. This is when the "actual rendering" happens.
### `RenderPlugin`

The `RenderPlugin` is responsible for, among other things:

- initializing `wgpu`, and
- adding the `Shader` asset type.

### `RenderAssetPlugin`
The `RenderAssetPlugin<A>` takes a type implementing `RenderAsset`, a trait for types that can be encoded into a GPU-friendly format. Each instance of the `RenderAssetPlugin` adds two systems:

- `extract_render_asset::<A>`, in the extract schedule, which extracts new and changed instances of `A` into the render world cache. If `RenderAsset::asset_usage` returns `!RenderAssetUsages::MAIN_WORLD`, then this system also unloads the asset from the main world.
- `prepare_asset::<A>`, in `PrepareAssets`, which calls `RenderAsset::prepare_asset` to get the GPU-friendly version.

The `bevy_render` crate contains two implementations of `RenderAsset`:

- `Image`, which transforms into a `GpuImage`, and
- `Mesh`, which transforms into a `GpuMesh`.

Each gets its own `RenderAssetPlugin` instance, added by `MeshPlugin` and `ImagePlugin` respectively (though for some reason `ImagePlugin` is part of the default plugins while `MeshPlugin` is added directly by `RenderPlugin`).
### `Shader` Asset

Shaders are loaded using Bevy's asset management system and interacted with via `Handle<Shader>`, which represents a particular shader source. Typically, a shader will be loaded from the assets folder, but one can also be created from a WGSL source snippet or even raw SPIR-V bytes.
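Both paths might look like this sketch; the file path and inline source are made up for illustration:

```rust
use bevy::prelude::*;
use bevy::render::render_resource::Shader;

// A hypothetical setup system showing two ways to obtain a Handle<Shader>.
fn load_shaders(asset_server: Res<AssetServer>, mut shaders: ResMut<Assets<Shader>>) {
    // Loaded from the assets folder ("shaders/glow.wgsl" is a made-up path).
    let _from_file: Handle<Shader> = asset_server.load("shaders/glow.wgsl");

    // Created directly from an inline WGSL snippet.
    let _from_source: Handle<Shader> = shaders.add(Shader::from_wgsl(
        "fn unused() {}",
        "inline_example.wgsl", // virtual path used for imports and error messages
    ));
}
```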
Shaders are not compiled until they are processed by the `PipelineCache` when a pipeline is queued for creation, either through pipeline specialization (e.g. for use with a material) or when manually requested via `queue_render_pipeline`.
#### `naga_oil`

Bevy uses a preprocessor library called `naga_oil` in order to compose various shader resources together that are defined in separate sources. For example, many internal Bevy shaders start with an `#import` directive that describes what other shader sources are required in order to compose (i.e. concatenate into a single source) the final shader.
`naga_oil` also uses "shader defs" to power `#ifdef` preprocessing, which can be used by shader authors to turn certain features on or off during pipeline specialization.
Thus, a single `Handle<Shader>` may actually represent several different compiled instances of a shader, which are cached on the basis of which shader defs were enabled for a particular shader source.
### `WindowRenderPlugin`

The `WindowRenderPlugin` adds systems that extract window information from the main world into the render world. It creates two resources:

- `ExtractedWindows`: A mapping from entity ids to extracted window data.
- `WindowSurfaces`: A mapping from entity ids to a `wgpu::Surface` (one for each window).

In the extract schedule:

- `extract_windows` populates `ExtractedWindows`.

In the render schedule:

- `create_surfaces` creates a `wgpu::Surface` for any window that needs it.
- In `ManageViews`, `prepare_windows` prepares a texture for drawing to.

Note that `get_current_texture` has to wait for the GPU to be ready to draw. `prepare_windows` may block waiting for the texture under heavy graphics workloads.
### `CameraPlugin`

The `CameraPlugin` adds systems that extract and sort cameras. It creates one resource:

- `SortedCameras`: A sorted list of camera information, including the camera render target.

In the extract schedule:

- `extract_cameras` adds `ExtractedCamera` and `ExtractedView` components to reserved camera entities.

### `ViewPlugin`
TODO: Explain the view plugin (or remove this section, because it is partially covered in the render graph section).
### `ImagePlugin` and `MeshPlugin`
TODO: Explain how the renderer sets up image and mesh extraction.
## `bevy_render`

The best way to think of `bevy_render` is as a toolbox, but thus far we've mostly covered how these tools work. Now, as we move into the higher levels of the rendering stack, we can start to think about how to use these tools.
## The `bevy_core_pipeline` Crate

Now we can move on to `bevy_core_pipeline`: an opinionated rendering solution built using the `bevy_render` toolbox. It doesn't touch the actual shading, but it does set up a common `RenderGraph` and a set of standard `RenderPhase`s.
The main thing `bevy_core_pipeline` does is add two new subgraphs to the render graph, `Core2d` and `Core3d`, each introduced by its respective plugin.
TODO: Explain how deferred rendering is implemented. See the original PR.
The `Core2d` subgraph is executed once for every 2d camera. Nodes are identified by variants of the `Node2d` enum. We will look at the nodes of the graph label by label, following the dependency graph:
The `MainPass` is implemented by the view node `MainPass2dNode`, which mostly just renders the `Transparent2d` render phase. No draw functions are registered for this phase, which allows higher levels of the rendering architecture to hook in their own rendering code.
Why is it called `Transparent2d`? Because it's drawn as if it might be transparent.
TODO: Elaborate on this.
Both `Tonemapping` and `Upscaling` use the same logic for 2d and 3d, and we will cover them in their own section.
The `Core3d` subgraph is executed once for every 3d camera. Nodes are identified by variants of the `Node3d` enum. Again, we will look at the graph label by label, following the dependency graph:
`Prepass` is implemented by the view node `PrepassNode`.
TODO: Explain what the prepass section is for.
`MainOpaquePass` is implemented by the view node `MainOpaquePass3dNode`. It renders the `Opaque3d` and `AlphaMask3d` phases, along with the skybox, in a single render pass.
`MainTransmissivePass` is implemented by the view node `MainTransmissivePass3dNode`. It draws the `Transmissive3d` render phase.
`MainTransparentPass` is implemented by the view node `MainTransparentPass3dNode`. It draws the `Transparent3d` phase.
## The `bevy_pbr` Crate

TODO: Give an overview of the `bevy_pbr` crate.
This document is released under the MIT License. Copyright 2024 Miles Silberling-Cook.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.