<!-- markdownlint-disable MD013 -->

# (Drafts) Multiple components cannot effectively share a GPUDevice because it is stateful

**These are drafts. The working proposal can be found [here](https://hackmd.io/@webgpu/ByXm27I29).**

[TOC]

Multiple components, for example multiple graph/plot widgets on a page, cannot effectively share a `GPUDevice` because it is stateful:

- It has a global error scope state, which means any error scope spanning an async boundary will capture errors from other components.
  This is very difficult for an application to work around.
- It can be destroyed, which means one component which wants to clean up after itself cannot take advantage of `GPUDevice.destroy()` to do so.
  An application can somewhat work around this by tracking its own resource allocations, which is also more flexible for more complex ownership models (like refcounting), though that only allows freeing Buffer/Texture/QuerySet.

Separating this state can be useful for some other things as well:

- An application/engine that has some asynchronous work that goes on in the "background" can separate the state for that work.
- A library that wants to expose its `GPUDevice` to a user can separate the user's state from its own.

Relevant past work:

- <https://github.com/gpuweb/gpuweb/issues/2599>
- <https://github.com/gpuweb/gpuweb/issues/2119#issuecomment-921828306>
- <https://hackmd.io/@webgpu/r1mBAEGCK>

## Premises

- It should be possible for multiple things using the same device on the same thread to have different error scope state.
- There should be no restriction on sharing objects between them.
  - It's annoying to check and unnecessary.
- It's okay if each of those things can't `device.destroy()` independently.
  - It should be a lightweight object that can be handled by GC.
- Error scope state should not be shared across threads, at least by default.
  - Having an opt-in way to share across threads is actually good for matching native-style semantics.

## Alternatives

- 8: Refinement of 7 with no explicit `*Handle` objects.
- 7: Refinement of 6 with no explicit `GPUErrorSink` object.
- 6: Like 2 but forked devices have no error sink by default. All errors go to `uncapturederror`.
  - There is only one `uncapturederror` event target and it's on a non-shareable, non-transferrable object.
  - Eliminates the problem with proposal 3 where multithreading gives you racy error scopes by default, which is probably not what you want.
  - Error sinks would be shareable (if you do so explicitly) so that native-like semantics can be implemented.
  - Things with error sinks would be shareable but by default would arrive with no error sink. There would be a way to give them one.
- 5: Error scope state is cross-thread BUT nothing has an error sink by default. All errors go to the `uncapturederror` event (which perhaps has only one instance).
- 4: Like 3 but make `GPUTexture` and possibly everything else also be a client so `createView()` doesn't have to move.
  - Would require a way to make new clients of every object (that has methods that can generate errors on it, but probably need to do all for future proofing).
- 3: Error scope state is cross-thread.
- 2: `GPUErrorSink` which holds the error state and is an (immutable) member of both `GPUDevice` and `GPUQueue` (and command/bundle encoders).
- 1: Multiple "sub device" objects which are stateful and all part of one device (can always share).
- 99: Share groups, where you explicitly opt into sharing a resource across `GPUDevice`s and opt out of having it destroyed and unmapped by `GPUDevice.destroy()`. Keep the per-device per-realm thing.

## Proposal Idea 8

- All objects have an `[[errorTarget]]`, which roughly speaking has type `GPUDevice?`.
  - `createView()`, for example, does not have to move.
- The only pre-V1 change to today's API is the addition of `GPURootDevice`. (I have long expected we would want this - I just didn't realize it was a breaking change.)
- There is only one `uncapturederror` `EventTarget` object for the device (`GPURootDevice`) and it's never possible to duplicate or transfer it. This approach is chosen so `[[errorTarget]]` can be nullable.
  - When an object is sent to another thread, the error target does not come with it; `[[errorTarget]]` is `null`. Errors that happen on this object get sent straight to `uncapturederror`. To get them to go to an error target, you have to "clone" the object and specify an `errorTarget` to use.
- `GPUDevice` **is** the error target and can never have a null error target.
  - When you send it to another thread you have to explicitly choose whether it's going to share its error state tracker or get a new one.

### 8 Pre-V1

Changes:

- Add a `GPURootDevice` subclass of `GPUDevice`. Move `EventTarget` to `GPURootDevice`. Return `GPURootDevice` from `requestDevice()`.

Editorial changes:

- `GPUObjectBase` gets an internal slot `[[errorTarget]]`, pointing to a device "client".
- An object created from `this` inherits `this.[[errorTarget]]`.
- All operations send their errors to `(GPUObjectBase)this`, which passes them through `[[errorTarget]]` and forwards uncaptured errors to `uncapturederror`.

Notes:

- There is not yet any way to override `[[errorTarget]]`. No significant implementation work is needed at this stage.

### 8 Post-V1: Multi-Client

Changes:

- Most objects gain a method `object.withErrorTarget(GPUDevice errorTarget)`, which gives you an instance of the object with the provided `[[errorTarget]]`.
  - `GPUDevice` and `GPURootDevice` don't have these, as (roughly speaking) they *are* the error target.
- Add `GPUDevice.withNewErrorTracker()` which returns a new `GPUDevice` with an empty error state tracker.

### 8 Post-V1: Multi-Threading

Changes:

- Make `[[errorTarget]]` nullable.
  - Objects without an `[[errorTarget]]` send errors directly to `uncapturederror`.
- Most objects are thread-shareable. When deserialized, they have a null `[[errorTarget]]`. To associate them with an error target, the receiver must call `withErrorTarget()`.
- `GPURootDevice` is not shareable or transferrable. If you try to send it we send the base `GPUDevice`. (TBD how to spec this.)
- When `GPUDevice` is sent, you receive a `GPUReceivedDevice`. To get a `GPUDevice` from it, you must choose either `.withNewErrorTracker()` or `.withSharedErrorTracker()`. (These can be called multiple times.)
  - This affects whether the internal `[[errorTarget]]` points to the same device "client" or a new one.

Note:

- Native-style semantics can be implemented with `withSharedErrorTracker()`. This also eliminates a need in the actual native API to use thread-local storage to track which error scope state to target.

### 8 Post-V1: Multi-Queue

New independent queue objects behave the same way as other objects.

## Proposal Idea 7

### 7 Pre-V1

Changes:

- Add a `GPURootDevice` subclass of `GPUDevice`. Move `EventTarget` to `GPURootDevice`. Return `GPURootDevice` from `requestDevice()`.

Editorial changes:

- `GPUObjectBase` gets an internal slot `[[errorTarget]]`, pointing to a DeviceClient.
- An object created from `this` inherits `this.[[errorTarget]]`.
- All operations send their errors to `(GPUObjectBase)this`, which passes them through `[[errorTarget]]` and forwards uncaptured errors to `uncapturederror`.

Notes:

- There is not yet any way to override `[[errorTarget]]`. No significant implementation work is needed at this stage.

### 7 Post-V1: Multi-Client

Changes:

- Most objects get a `*Handle` version, accessible by `object.handle`.
  - There is a `GPUDeviceHandle` but not a `GPURootDeviceHandle`.
- `handle.instantiate(GPUDevice errorTarget)` gives you an instance of the object, with the provided `[[errorTarget]]`.
  - `GPUDeviceHandle.instantiate()` does not take an `errorTarget`, as it's creating one.

Notes:

- `requestDevice` can be described as creating a `GPUDeviceHandle` and returning `GPUDeviceHandle.instantiate()`.

### 7 Post-V1: Multi-Threading

Changes:

- Make `[[errorTarget]]` nullable.
  - Objects without an `[[errorTarget]]` send errors directly to `uncapturederror`.
- Normal objects are not thread-shareable or transferrable. Their `*Handle` versions are shareable. The receiver must call `handle.instantiate()` and may pass a `GPUDevice` or `null`.
- Add a `GPUDeviceHandle.instantiateWithSharedErrorTracker()`.
  - This makes the internal `[[errorTarget]]` point to the same device "client" instead of a new one.

Note:

- Native-style semantics can be implemented with `instantiateWithSharedErrorTracker()`. This also eliminates a need in the actual native API to use thread-local storage to track which error scope state to target.

### 7 Post-V1: Multi-Queue

Nothing special compared to other objects.

## Proposal Idea 6

### 6 Pre-V1

Changes:

- Add a `GPUErrorSink` object and move `pushErrorScope()`/`popErrorScope()` to it.
- `GPUDevice` gets a `readonly attribute GPUErrorSink errorSink`.

Editorial changes:

- `GPUObjectBase` gets an internal slot `[[errorSink]]`, nullable, pointing to an error sink (handle).
- `requestDevice` creates a new `GPUErrorSink` and uses it for the `GPUDevice` and its `defaultQueue`.
- All object creations inherit the error sink from `this` (a device/texture/pipeline).
- All operations send their errors to `(GPUObjectBase)this`, which passes them through `[[errorSink]]` (if any) and forwards uncaptured errors to `uncapturederror`.

Notes:

- There is not yet any way to attach an existing `GPUErrorSink` to new objects. No significant implementation work is needed at this stage.

### 6 Post-V1: Multi-Client

Changes:

- Most objects get a `*Handle` version, accessible by `object.handle`.
- `handle.instantiate(GPUErrorSink?
errorSink)` gives you an instance of the object, with the provided `GPUErrorSink` (if any).
  - Objects without error sinks send errors directly to `uncapturederror`.
- `GPUDeviceHandle.instantiate()` is different from the others:
  - The `errorSink` argument is optional instead of nullable.
  - If no `GPUErrorSink` was provided, it creates a new one for the device and its `defaultQueue`.
  - It returns a `GPUDeviceClone` instead, which does not receive `uncapturederror` events.

Notes:

- `GPUErrorSink` can now be attached to new objects.
- `GPUErrorSink` is not constructible and new ones can only be gotten from `GPUDevice`.
- `requestDevice` can be described as creating a `GPUDeviceHandle` and returning `GPUDeviceHandle.instantiate()`.

### 6 Post-V1: Multi-Queue

Nothing special compared to other objects.

### 6 Post-V1: Multi-Threading

- `GPUErrorSink` is thread-shareable (and internally synchronized). This never happens implicitly.
- Normal objects are not thread-shareable or transferrable. Their `*Handle` versions are shareable. The receiver must call `handle.instantiate()` and may pass a `GPUErrorSink`.

Note:

- Native-style semantics can be implemented by sharing a `GPUErrorSink` object across threads. This eliminates a need in the actual native API to use thread-local storage to track which error scope state to target.

## Proposal Idea 3

In this proposal, we keep things as is except you can have multiple `GPUDeviceClient`s on one thread, with their own error scope states, instead of having all `GPUDevice`s on one thread magically point to the same one.

For multi-threading, one client (and its error scope state) is **shared** across threads and internally synchronized.

`GPUDeviceClient.destroy()` still unmaps all buffers in the same thread, as `GPUBuffer`s are not associated with a specific client.

(Native API note: eliminates thread local storage for error scope state.)
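
As a rough illustration of per-client error scope state, the sketch below models two clients of one device in plain TypeScript. It is not WebGPU code: `ToyDeviceHandle`, `ToyDeviceClient`, the synchronous `popErrorScope()`, and the rule that a non-positive buffer size is a validation error are all assumptions made for the example. Sending uncaptured errors to a list on the shared handle is also just one possible choice.

```typescript
// Toy model of Proposal Idea 3. These classes are illustrative stand-ins,
// not WebGPU interfaces; only the error routing is modelled.

type ToyFilter = "validation" | "out-of-memory";

interface ToyError {
  filter: ToyFilter;
  message: string;
}

interface ToyScope {
  filter: ToyFilter;
  first: ToyError | null; // first error captured by this scope
}

// Stands in for the underlying device; it holds no error scope state.
class ToyDeviceHandle {
  readonly uncaptured: ToyError[] = [];
}

// Stands in for GPUDeviceClient: each client owns its own scope stack.
class ToyDeviceClient {
  private readonly scopes: ToyScope[] = [];

  constructor(readonly handle: ToyDeviceHandle) {}

  pushErrorScope(filter: ToyFilter): void {
    this.scopes.push({ filter, first: null });
  }

  // Synchronous for brevity; the real popErrorScope() returns a Promise.
  popErrorScope(): ToyError | null {
    const scope = this.scopes.pop();
    if (scope === undefined) throw new Error("error scope stack is empty");
    return scope.first;
  }

  // Hypothetical operation: a non-positive size counts as a validation error.
  createBuffer(size: number): void {
    if (size <= 0) {
      this.report({ filter: "validation", message: `bad buffer size ${size}` });
    }
  }

  // The innermost scope with a matching filter captures the error;
  // with no matching scope the error is uncaptured.
  private report(error: ToyError): void {
    for (const scope of [...this.scopes].reverse()) {
      if (scope.filter === error.filter) {
        if (scope.first === null) scope.first = error;
        return;
      }
    }
    this.handle.uncaptured.push(error);
  }
}

// Two widgets share one device but hold separate clients.
function demoSeparateScopes() {
  const handle = new ToyDeviceHandle();
  const plotA = new ToyDeviceClient(handle);
  const plotB = new ToyDeviceClient(handle);

  plotA.pushErrorScope("validation");
  plotB.createBuffer(-1); // interleaved work from the other widget
  plotA.createBuffer(16);
  const capturedByA = plotA.popErrorScope();

  return { capturedByA, uncaptured: handle.uncaptured };
}
```

In this model `demoSeparateScopes()` returns `capturedByA` as `null`, because the failure from `plotB` never enters the scope opened by `plotA`. With a single shared stack, the same interleaving would hand that error to `plotA`.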

### 3 Stage 0 (pre-V1)

Changes:

- Move `GPUTexture.createView()` back out to `GPUDevice.createTextureView()` so there's a place to send its errors.
- Move `GPUPipelineBase.getBindGroupLayout()` to `GPUDevice`.
  - Or, change it to be a special exception where it returns an invalid `GPUBindGroupLayout` that doesn't actually generate any errors upon its creation. This would kind of make sense because it's not "creating" an object (but it is).

Notes:

- Of the other methods not on `GPUDevice` or `GPUQueue`:
  - Encoder `finish()` calls can stay on command/bundle encoders. It can be said that an encoder is permanently associated with the `GPUDeviceClient` it's created on.
  - `mapAsync()`, `getMappedRange()`, `unmap()`, `compilationInfo()`, and buffer/texture/queryset `destroy()` would not be able to generate WebGPU errors (they do not, currently).

### 3 Stage 1 (pre-V1)

`GPUDevice` becomes an "interface" through which you send commands to an underlying device/queue (referred to by a "handle"). For now call this `GPUDeviceClient`.

`GPUQueue` becomes permanently associated with the specific `GPUDeviceClient` that it was created from. For now call this `GPUQueueClient`.

(In practice, the names wouldn't change.)

Changes: none

Notes:

- `GPUQueueClient`, `GPUCommandEncoder`, and `GPURenderBundleEncoder` are permanently associated with a specific `GPUDeviceClient`.
- All other objects associate with an actual **device** (handle) (which cannot receive errors), not a device **client**.
- `label` remains associated with `GPUDeviceClient`/`GPUQueueClient`, not the respective handles.

### 3 Multi-Client (post-V1)

Allow creating new `GPUDeviceClient`s for existing devices.

Changes:

- Add `GPUDeviceClient.createNewClient()` to create new clients from a handle.
  - New clients have a fresh error scope state.
  - The `defaultQueue` of this new `GPUDeviceClient` is a new `GPUQueueClient` (of the old queue handle) associated with the new `GPUDeviceClient`.
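
The relationship between clients and queue handles in the multi-client change can be pictured with a small toy model. Only the name `createNewClient()` comes from the proposal. The `Sketch*` classes, the flat `errors` list standing in for error scope state, and the rule that an empty submission is an error are assumptions made for illustration.

```typescript
// Toy model of Idea 3's multi-client step. The classes are illustrative
// stand-ins; only createNewClient() is a name taken from the proposal.

// The one underlying queue, shared by every client of the device.
class SketchQueueHandle {
  readonly submitted: string[] = [];
}

// The underlying device; it cannot receive errors itself.
class SketchDeviceHandle {
  readonly queue = new SketchQueueHandle();
}

// Stands in for GPUQueueClient: bound to the client that created it.
class SketchQueueClient {
  constructor(
    readonly owner: SketchDeviceClient,
    readonly handle: SketchQueueHandle,
  ) {}

  // Hypothetical rule: an empty submission counts as an error.
  submit(commands: string): void {
    if (commands === "") {
      this.owner.errors.push("empty submission");
      return;
    }
    this.handle.submitted.push(commands);
  }
}

// Stands in for GPUDeviceClient.
class SketchDeviceClient {
  readonly errors: string[] = []; // simplified error scope state
  readonly defaultQueue: SketchQueueClient;

  constructor(readonly handle: SketchDeviceHandle) {
    this.defaultQueue = new SketchQueueClient(this, handle.queue);
  }

  // Fresh error state and a new queue client over the same queue handle.
  createNewClient(): SketchDeviceClient {
    return new SketchDeviceClient(this.handle);
  }
}

function demoNewClient() {
  const first = new SketchDeviceClient(new SketchDeviceHandle());
  const second = first.createNewClient();

  first.defaultQueue.submit("draw");
  second.defaultQueue.submit(""); // the error is recorded on `second` only

  return { first, second };
}
```

Here the two `defaultQueue` objects are distinct clients over one `SketchQueueHandle`, so work from either lands on the same underlying queue while each error stays with the client whose queue produced it.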

Notes:

- `GPUDeviceClient.destroy()` destroys the whole device, not just the client. Just like today, it still unmaps all buffers for that device on the thread where it's called, but not on other threads.
- There is no way to detach a `GPUDeviceClient` without destroying the underlying device. It should be a lightweight object that can be handled by GC.

### 3 Multi-Queue (post-V1)

Changes: none

Notes:

- When a queue is created you get a `GPUQueueClient` pointing back to the `GPUDeviceClient` it was created from. If you want a queue with its own error state (for some reason) you can just create it off of a fresh `GPUDeviceClient`.

### 3 Multi-Threading (post-V1)

Changes:

- Make various handle-only objects serializable (`GPUBuffer`, etc.)
- Make `GPUDeviceClient` shareable across threads (but not transferrable).
  - The `GPUDeviceClient`'s error state would be shared and internally synchronized.
- Make `GPUQueueClient` serializable (but not transferrable).
  - It keeps being associated with the same `GPUDeviceClient`.

### 3 Open Questions

- `uncapturederror` exists for the telemetry use case. Should there be a separate uncapturederror sink for each `GPUDeviceClient`, or one common one shared by all? (If there's a common one, an uncaptured error on any client would probably trigger the event on all clients.)
  - Having one shared sink is less flexible (can't handle/ignore errors from one specific client), but makes telemetry easier (everything arrives in one place).
  - The biggest annoyance with separate sinks is probably that apps will have to deal with `uncapturederror` coming in on multiple threads. However, this is also true of the `unhandledrejection` event for unhandled promise rejections, so this is probably fine.

## Proposal Idea 2

In this proposal, we keep things as is except we extract the error state into a separate object.
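
A minimal sketch of what extracting the error state could look like, assuming a sink object with `pushScope()`/`popScope()` that a device client and its default queue both point at. The names `ToyErrorSink`, `SinkQueueClient`, and `SinkDeviceClient` are hypothetical, error filters are left out, `popScope()` is synchronous, and the validation rules are invented for the example.

```typescript
// Toy model of Proposal Idea 2. ToyErrorSink, SinkQueueClient and
// SinkDeviceClient are illustrative stand-ins, not WebGPU interfaces.

class ToyErrorSink {
  private readonly scopes: { first: string | null }[] = [];
  readonly uncaptured: string[] = []; // stands in for `uncapturederror`

  pushScope(): void {
    this.scopes.push({ first: null });
  }

  // Synchronous for brevity; returns the first error the scope captured.
  popScope(): string | null {
    const scope = this.scopes.pop();
    if (scope === undefined) throw new Error("no error scope to pop");
    return scope.first;
  }

  // Error filters are omitted: the innermost scope captures everything.
  report(message: string): void {
    const top = this.scopes[this.scopes.length - 1];
    if (top === undefined) {
      this.uncaptured.push(message);
    } else if (top.first === null) {
      top.first = message;
    }
  }
}

class SinkQueueClient {
  constructor(readonly errorSink: ToyErrorSink) {}

  // Hypothetical rule: a negative size counts as an error.
  writeBuffer(bytes: number): void {
    if (bytes < 0) this.errorSink.report(`bad write size ${bytes}`);
  }
}

class SinkDeviceClient {
  readonly defaultQueue: SinkQueueClient;

  // A new sink is created when none is supplied.
  constructor(readonly errorSink: ToyErrorSink = new ToyErrorSink()) {
    // A device and its default queue get the same sink.
    this.defaultQueue = new SinkQueueClient(errorSink);
  }

  // Hypothetical rule: a non-positive size counts as an error.
  createBuffer(size: number): void {
    if (size <= 0) this.errorSink.report(`bad buffer size ${size}`);
  }
}

function demoErrorSinks() {
  const engine = new SinkDeviceClient();
  const background = new SinkDeviceClient(); // separate sink, separate state

  engine.errorSink.pushScope();
  background.createBuffer(0); // not seen by the engine's scope
  engine.defaultQueue.writeBuffer(-1); // captured via the shared sink
  const captured = engine.errorSink.popScope();

  return { captured, backgroundUncaptured: background.errorSink.uncaptured };
}
```

Because the sink is a separate object, passing one `ToyErrorSink` to two `SinkDeviceClient` constructors would make them share scope state, which is the kind of explicit sharing this idea allows.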

**For multi-threading, we keep the "per-device per-realm" semantics, but only for `GPUDeviceClient.destroy()` unmapping buffers, not for error scope state.**

### 2 Stage 0 (pre-V1)

Changes:

- Move `GPUTexture.createView()` back out to `GPUDevice.createTextureView()` so there's a place to send its errors.
- Move `GPUPipelineBase.getBindGroupLayout()` to `GPUDevice`.
  - Or, change it to be a special exception where it returns an invalid `GPUBindGroupLayout` that doesn't actually generate any errors upon its creation. This would kind of make sense because it's not "creating" an object (but it is).

Notes:

- Of the other methods not on `GPUDevice` or `GPUQueue`:
  - Encoder `finish()` calls can stay on command/bundle encoders. It can be said that an encoder inherits the `GPUErrorSink` from the `GPUDeviceClient` it's created on.
  - `mapAsync()`, `getMappedRange()`, `unmap()`, `compilationInfo()`, and buffer/texture/queryset `destroy()` would not be able to generate WebGPU errors (they do not, currently).

### 2 Stage 1 (pre-V1)

`GPUDevice` and `GPUQueue` become "interfaces" through which you send commands to an underlying device/queue. For now call these `GPUDeviceClient` and `GPUQueueClient`. (In practice, the names wouldn't change.)

Error stuff moves to a new object. For now call this `GPUErrorSink`.

Changes:

- Move all the error scope state, `pushErrorScope`/`popErrorScope`, and `EventTarget`/`onuncapturederror` onto `GPUErrorSink` (as `pushScope`/`popScope`).
- Add `[[error_sink]]` internal slots to `GPUDeviceClient` and `GPUQueueClient`. Not yet configurable, and by default a device and its default queue get the same `GPUErrorSink`, so semantically nothing changes.
- Add a `GPUDeviceClient.errorSink` readonly attribute.

Notes:

- `label` remains associated with `GPUDeviceClient`, not `GPUDeviceHandle`.
- **OPEN QUESTION:** `uncapturederror` exists for the telemetry use case. If it's on `GPUErrorSink`, applications must register it on every `GPUErrorSink` they create.
It could go on `GPUDeviceHandle` instead, which makes it easier to capture everything for telemetry, but it's also less flexible (can't specifically ignore errors from one user of the device).
  - The biggest annoyance here is probably that apps will have to deal with `uncapturederror` coming in on multiple threads. However, this is also true of the `unhandledrejection` event for unhandled promise rejections, so this is probably fine.

### 2 Stage 2 (post-V1)

Allow creating new `GPUDeviceClient`s for existing devices.

Add an object to represent the underlying device. For now call this `GPUDeviceHandle`.

Changes:

- Expose the `GPUDeviceHandle` as `GPUDeviceClient.handle`.
- Add a way to construct `GPUErrorSink`.
- Add `GPUDeviceHandle.createClient()` to create new clients from a handle.
- Allow configuring the `errorSink` in `requestDevice()` and both `createClient()` methods.
  - If a `GPUErrorSink` is not provided, both create a new one.

Notes:

- A `GPUErrorSink` can now be shared by multiple completely unrelated devices.
- All `GPUQueueClient` objects are the default queue of some `GPUDeviceClient` and therefore inherit their error sinks.
- `GPUDeviceClient.destroy()` destroys the whole device, not just the client. Just like today, it still unmaps all buffers for that device on the thread where it's called, but not on other threads.
- There is no way to detach a `GPUDeviceClient` without destroying the underlying device. It should be a lightweight object that can be handled by GC.

### 2 Multi-threading (post-V1)

Changes:

- Make `GPUDeviceHandle` sharable. `GPUDeviceClient` and `GPUErrorSink` are not sharable or transferrable.

Notes:

- `GPUDeviceHandle` can have events registered from any thread. If there's an uncaptured error, fire all of them. (This isn't really any different from registering multiple handlers on one event except that they can't cancel each other.)

### 2 Multi-Queue (post-V1)

Now, not all queues are default queues.

Changes:

- Add a `GPUQueueHandle`, `GPUQueueClient.handle`, `GPUQueueHandle.createClient()`.
- Allow configuring the error sink when you create a queue.

## Proposal Idea 1

Separate the error scope state from the "global" object so that there can be multiple states maintained separately (for each component).

For multithreading, take advantage of this separation as well, so there's no "per-device per-realm" state like we had planned.

(In this proposal, bad names are used intentionally for disambiguation from current concepts, so they would have to be replaced later.)

### 1 Stage 1: Preparation (pre-V1)

Centralize state, state checks, and ordering on one object. For now call this `GPUStatefulDevice`.

Each error-generating call must be clearly associated with a single `GPUStatefulDevice` so there is a place for errors to go.

Changes:

- Move `GPUTexture.createView()` back out to `GPUStatefulDevice.createTextureView()`.

Notes:

- Encoder `finish()` calls stay on the encoders. An encoder is permanently associated with one `GPUStatefulDevice`.
- `defaultQueue` becomes associated with one `GPUStatefulDevice` in particular. For now call this type `GPUSharedQueueInterface`.

### 1 Stage 2: Groups (post-V1?)

Create a new concept of a group of `GPUStatefulDevice` objects which can all share their objects freely. For now call this `GPUAdapterConnection`.

Changes:

- Add `GPUStatefulDevice.connection`, which is the `GPUAdapterConnection` for the `GPUStatefulDevice`.

Notes:

- Calling `GPUStatefulDevice.destroy()` destroys the whole group, because it's not really meaningful to destroy just one `GPUStatefulDevice` (it doesn't own any resources so there's nothing to clean up).
- If done before V1, `destroy()`, and possibly `limits`, `features`, could move to `GPUAdapterConnection`.

### 1 Stage 3: Multiple `GPUStatefulDevice`s (post-V1)

Make it possible to get more `GPUStatefulDevice`s from the `GPUAdapterConnection`. Required for multi-threading.

Changes:

- Add `GPUAdapterConnection.createStatefulDevice()` which gives you a new device in the same group, with the same limits/features, and an empty error scope state. Without multi-queue, it gives you a `GPUSharedQueueInterface defaultQueue` which has the same underlying `GPUSharedQueue` as the other `GPUStatefulDevice`s.
- `GPUAdapterConnection` is serializable (sharable).

Notes:

- `GPUStatefulDevice` is not sharable, because it is stateful. It is also not transferrable, because that would require it to be closeable.

### 1 Multi-Queue (post-V1)

Changes:

- Add a type `GPUSharedQueue` which represents the actual queue (without an associated `GPUStatefulDevice`). You can't do anything with this except create `GPUSharedQueueInterface`s.
- Add `GPUSharedQueueInterface.sharedQueue` pointing to the `GPUSharedQueue` (maybe).
- Add `GPUAdapterConnection.defaultQueue` which is the default `GPUSharedQueue` (maybe).
- In `GPUAdapterConnection.createStatefulDevice()`, allow configuring the `defaultQueue`. Depending on the solution to multi-queue, this may take a `GPUQueueDescriptor`, or it may only take a `GPUSharedQueue` that the `defaultQueue` should point to.

### 1 Options

- Do Stage 2 before V1 and move some stuff to `GPUAdapterConnection`.
- ~Merge `GPUStatefulDevice` and `GPUQueue`. Multi-queue becomes multi-`GPUStatefulDeviceAndQueue`. It would not be possible to submit work from multiple threads to the same queue (probably a dealbreaker).~
- Don't actually have an object for `GPUAdapterConnection`, instead "clone" `GPUStatefulDevice` directly. Find a name that makes it clear it won't clone the error scope state, like `makeNewStatefulDeviceForAdapterConnection`.
- Instead of `GPUStatefulDevice`/`GPUAdapterConnection`, have a device and sub-devices. The sub-devices wouldn't have `destroy()`. (I think I had a reason not to do this but I forgot it.)

### 1 Open Questions

- Can we make `GPUStatefulDevice.destroy()` meaningful (destroy only the stuff on that device and not the whole group)? Would require some form of ownership.
- Right now, `GPUDevice.destroy()` can unmap everything on the thread. This becomes impossible. Can we preserve that?
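
To make the group semantics behind these questions concrete, here is a toy sketch of Stages 2 and 3, in which every stateful device keeps its own error state while `destroy()` on any of them takes down the whole group. The `Toy*` classes only echo the placeholder names from the text. Their bodies, the flat `errors` list, and the rule that an empty format string is an error are assumptions made for illustration.

```typescript
// Toy model of Proposal Idea 1. The class names echo the placeholder names
// in the text; the bodies are illustrative guesses, not a specified API.

// Stands in for GPUAdapterConnection: the group that owns the real device.
class ToyAdapterConnection {
  destroyed = false;

  // New device in the same group, starting with empty error state.
  createStatefulDevice(): ToyStatefulDevice {
    return new ToyStatefulDevice(this);
  }
}

// Stands in for GPUStatefulDevice: owns error state but no resources.
class ToyStatefulDevice {
  readonly errors: string[] = []; // simplified error scope state

  constructor(readonly connection: ToyAdapterConnection) {}

  // Hypothetical rule: an empty format counts as a validation error.
  createTextureView(format: string): void {
    if (format === "") this.errors.push("invalid texture view format");
  }

  // Destroying one stateful device destroys the whole group.
  destroy(): void {
    this.connection.destroyed = true;
  }
}

function demoGroup() {
  const connection = new ToyAdapterConnection();
  const widget = connection.createStatefulDevice();
  const worker = connection.createStatefulDevice();

  widget.createTextureView(""); // recorded on `widget` only
  worker.destroy(); // takes the whole group down

  return { connection, widget, worker };
}
```

After `demoGroup()`, the error sits on `widget` alone, yet `widget.connection.destroyed` is `true` even though only `worker` called `destroy()`. Making `destroy()` per-device would need the ownership tracking that the first open question mentions.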