In GPU we Rust

`whoami`

rust-projects

Agenda

Landscape of GPU abstractions
History of wgpu
Blade of difference

GPU abstractions

safe/lightweight/portable

Map of Portability

platform availability

gpu-safety

case: glow

Purity:
Safety:

OpenGL is safe, but the Rust API is not

Backends: GL/GLES/WebGL

no compute on Apple platforms

Overhead:

API itself is close to zero overhead
but actual platforms may involve translation

Ergonomics: AA+

relatively small API
boilerplate related to bindings and framebuffers

Downloads: every 8 seconds

case: Ash

Purity: (no shader solution)
Safety:
Backends: Vulkan
Overhead:
Ergonomics: A
Downloads: every 9 seconds

is a dependency of many others

case: Vulkano

Purity: host, shader processing (3rd party C++)
Safety: host, shaders, relies on robust buffer/image access
Backends: Vulkan
Overhead:

every draw/dispatch is iterating all the used resources
actual commands are recorded at the end of the pass

Ergonomics: AA

automatic barriers, bit of type sugar

Downloads: every 2.5 minutes

case: wgpu

Purity: (includes shader solution via naga)
Safety: (includes shader instrumentation)
Backends: Vulkan, D3D12, Metal, GL, WebGPU, WebGL2
Overhead:

tracking every bind group setup
actual commands are recorded at the end of the pass
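
The two overhead points above can be pictured with a toy model. This is a minimal sketch, not wgpu's actual internals: `RenderPass`, `Command`, and `end` are hypothetical names, and the "tracking" is reduced to collecting the bind groups a pass touched. Commands are only buffered while the pass is open, and the tracking work happens once, when the pass ends.

```rust
// Toy model (hypothetical, not wgpu's real code): commands issued inside
// a pass are buffered and only processed when the pass ends.

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Command {
    SetBindGroup { index: u32, group_id: u32 },
    Draw { vertex_count: u32 },
}

#[derive(Default)]
struct RenderPass {
    pending: Vec<Command>,
}

impl RenderPass {
    // Nothing reaches the native API here; the call is just remembered.
    fn set_bind_group(&mut self, index: u32, group_id: u32) {
        self.pending.push(Command::SetBindGroup { index, group_id });
    }

    fn draw(&mut self, vertex_count: u32) {
        self.pending.push(Command::Draw { vertex_count });
    }

    // Ending the pass is where the per-bind-group tracking cost is paid.
    // Returns the buffered commands plus the sorted, de-duplicated list
    // of bind groups that the pass used.
    fn end(self) -> (Vec<Command>, Vec<u32>) {
        let mut used: Vec<u32> = Vec::new();
        for cmd in &self.pending {
            if let Command::SetBindGroup { group_id, .. } = cmd {
                if !used.contains(group_id) {
                    used.push(*group_id);
                }
            }
        }
        used.sort_unstable();
        (self.pending, used)
    }
}
```

With the full resource list known at `end`, an implementation can decide on synchronization before replaying the buffered commands into the native command buffer.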

Ergonomics: AAA

simple specification
automatic state tracking

Downloads: every 12 seconds

case: wgpu-hal

Purity: (includes shader solution via naga)
Safety:
Backends: Vulkan, D3D12, Metal, GL/GLES/WebGL2, WebGPU
Overhead: (directly mapped)
Ergonomics: A+

a bit simpler than Vulkan

Downloads: every 12 seconds (same as wgpu)

case: Blade

Purity: (includes shader solution via naga)
Safety:
Backends: Vulkan, Metal, GLES/WebGL2
Overhead: (directly mapped)
GPU penalty: (to be discussed)
Ergonomics: AAA+

doesn't involve any bind group layout business
no resource states or barriers
but requires manual resource destruction

Downloads: every 15 minutes

Ergonomics scale

ergonomics

wgpu: Implementation of WebGPU

webgpu-problem

WebGPU: Targets

wgpu-intersection

wgpu: History

wgpu-history

wgpu: Architecture

wgpu-graph

wgpu: Safety

Core idea: validating correctness takes as much computation as providing it.
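
One way to read the core idea, sketched under assumptions: to check that a user placed barriers correctly, the validator has to track the last usage of every resource, and that is the same bookkeeping needed to place the barriers automatically. The `Usage`, `Barrier`, and `Tracker` types below are hypothetical and far simpler than real usage tracking.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Usage {
    CopyDst,
    StorageWrite,
    ShaderRead,
}

#[derive(Debug, PartialEq, Eq)]
struct Barrier {
    resource: u32,
    from: Usage,
    to: Usage,
}

#[derive(Default)]
struct Tracker {
    last: HashMap<u32, Usage>,
}

impl Tracker {
    // Records a use; returns the needed transition if the usage changed.
    fn transition(&mut self, resource: u32, usage: Usage) -> Option<Barrier> {
        match self.last.insert(resource, usage) {
            Some(prev) if prev != usage => Some(Barrier { resource, from: prev, to: usage }),
            _ => None,
        }
    }
}

// "Providing" correctness: emit the barriers the user never wrote.
fn insert_barriers(uses: &[(u32, Usage)]) -> Vec<Barrier> {
    let mut tracker = Tracker::default();
    uses.iter()
        .filter_map(|&(resource, usage)| tracker.transition(resource, usage))
        .collect()
}

// "Validating" correctness: walk the same tracker and compare each
// needed barrier against the ones the user supplied, in order.
fn validate(uses: &[(u32, Usage)], user_barriers: &[Barrier]) -> bool {
    let mut tracker = Tracker::default();
    let mut supplied = user_barriers.iter();
    for &(resource, usage) in uses {
        if let Some(needed) = tracker.transition(resource, usage) {
            if supplied.next() != Some(&needed) {
                return false;
            }
        }
    }
    supplied.next().is_none()
}
```

Both functions run the same loop over the same tracker, so in this model validating costs about as much as generating the barriers outright.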

wgpu: Synchronization

wgpu-usages wgpu-sync

WebGPU Shading Language

webgpu-shading-language2

WGSL: Motivation

one of the drivers behind early Web was the ability to inspect/edit/write pages directly.
no existing shading language was designed for safety and the absence of undefined behavior (UB).
GLSL is outdated, SPIR-V spec is difficult, everything else is poorly specified…

Naga translates GLSL -> SPIR-V in just 1.5 ms per shader.

naga: Architecture

naga-architecture

wgpu: Conclusion

most mature, portable, well specified
pretty fast, and the only truly safe option

vangers debug

blade

Lean and mean graphics API

blade: Motivation

it's not always worth it to provide the driver with all the info ahead of time.
lots of workflows are leaning toward compute-only, e.g. 2D graphics rendering, ray tracing, neural networks.
most API complexity is from rasterization.
modern APIs are too verbose.


blade: Principles

hacking graphics should be fun!
- we can live without resource barriers
- shader resource layouts can be simpler
- uniforms are just data
simplicity >> safety
- no runtime validation
- copyable handles

validation

blade: Look, ma, no bindings!

Shader:

var<storage,read_write> particles: array<Particle>;
var<uniform> parameters: Parameters;

Host:

pc.bind(0, &MainData {
    particles: particle_buffer.into(),
    parameters: Parameters {
        my_uniform: [1, 2, 3, 4],
    },
});
pc.dispatch([group_count, 1, 1]);
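
The matching names in the shader and host snippets suggest that struct fields are paired with shader globals by name. A minimal sketch of that idea follows. It assumes name-based matching, and the `ShaderData` trait, `BindingKind`, `BufferHandle`, and `resolve` are hypothetical stand-ins rather than Blade's actual API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BindingKind {
    StorageBuffer,
    UniformData,
}

// Hypothetical trait: a data struct describes its fields by name.
trait ShaderData {
    fn layout() -> Vec<(&'static str, BindingKind)>;
}

#[allow(dead_code)]
struct BufferHandle(u32);

#[allow(dead_code)]
struct Parameters {
    my_uniform: [u32; 4],
}

#[allow(dead_code)]
struct MainData {
    particles: BufferHandle,
    parameters: Parameters,
}

// Written by hand here; a derive macro could generate the same thing.
impl ShaderData for MainData {
    fn layout() -> Vec<(&'static str, BindingKind)> {
        vec![
            ("particles", BindingKind::StorageBuffer),
            ("parameters", BindingKind::UniformData),
        ]
    }
}

// Pairs every struct field with the shader global of the same name and
// returns (slot, kind) per field, where the slot is the global's position
// in the shader's declaration order. No explicit binding indices needed.
fn resolve<T: ShaderData>(shader_globals: &[&str]) -> Result<Vec<(usize, BindingKind)>, String> {
    T::layout()
        .into_iter()
        .map(|(name, kind)| {
            shader_globals
                .iter()
                .position(|global| *global == name)
                .map(|slot| (slot, kind))
                .ok_or_else(|| format!("shader has no global named `{}`", name))
        })
        .collect()
}
```

In this sketch a mismatch between the struct and the shader surfaces as an error when the pipeline is resolved, instead of as a hand-maintained binding index going stale.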

blade: Synchronization

if let mut pass = command_encoder.compute("fill-gbuf") {
    let mut pc = pass.with(&self.fill_pipeline);
    pc.bind(0, &FillData {...});
    pc.dispatch(groups);
}
// implicit barrier between passes
if let mut pass = command_encoder.compute("ray-trace") {
    let mut pc = pass.with(&self.main_pipeline);
    pc.bind(0, &MainData {...});
    pc.dispatch(groups);
}
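
The implicit barrier can be modeled as follows. This is a toy sketch, not Blade's implementation: the `CommandEncoder`, `ComputePass`, and `Recorded` types are hypothetical, and the barrier is just a marker in a recorded stream. The point is that the encoder inserts one coarse barrier whenever a new pass begins, so the user never names resources or states.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Recorded {
    BeginPass(String),
    Dispatch([u32; 3]),
    FullBarrier,
}

#[derive(Default)]
struct CommandEncoder {
    stream: Vec<Recorded>,
    passes: usize,
}

struct ComputePass<'a> {
    encoder: &'a mut CommandEncoder,
}

impl CommandEncoder {
    // Starting any pass after the first one records a full barrier,
    // without looking at which resources either pass touches.
    fn compute(&mut self, label: &str) -> ComputePass<'_> {
        if self.passes > 0 {
            self.stream.push(Recorded::FullBarrier);
        }
        self.passes += 1;
        self.stream.push(Recorded::BeginPass(label.to_string()));
        ComputePass { encoder: self }
    }
}

impl ComputePass<'_> {
    // Dispatches go straight into the stream; no per-resource tracking.
    fn dispatch(&mut self, groups: [u32; 3]) {
        self.encoder.stream.push(Recorded::Dispatch(groups));
    }
}
```

The trade-off in this model is coarser synchronization than a per-resource tracker would produce, in exchange for zero bookkeeping per dispatch.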

blade-zed

blade: Performance

API translation and command recording:

Rasterization:

GPU	blade	wgpu-hal
Ryzen 3500U	20K	20K
Ryzen 6850U	70K	70K
GeForce 3050	100K	100K

blade: GPU Penalty

@krOoze on Khronos forums:

Supplying GENERAL everywhere sure is state-of-the-art weapons-grade laziness…

Drivers:

NVIDIA: irrelevant
- "Just leave images in the VK_IMAGE_LAYOUT_GENERAL layout"
AMD: comes down to ac_surface_supports_dcc_image_stores
- roughly starts with RDNA
- experiments show no penalty on Vega
Intel: unclear

Easy to mitigate by inserting transitions around render passes.

blade: conclusion

easy to use, hackable
very fast and portable

game

Thank you!

torus
