
(Drafts) Multiple components cannot effectively share a GPUDevice because it is stateful

These are drafts. The working proposal can be found here.

Multiple components, for example multiple graph/plot widgets on a page, cannot effectively share a GPUDevice because it is stateful:

  • It has a global error scope state, which means any error scope spanning an async boundary will capture errors from other components. This is very difficult for an application to work around.
  • It can be destroyed, which means one component which wants to clean up after itself cannot take advantage of GPUDevice.destroy() to do so. An application can somewhat work around this by tracking its own resource allocations, which is also more flexible for more complex ownership models (like refcounting), though that only allows freeing Buffer/Texture/QuerySet.
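The error scope problem above can be illustrated with a minimal mock. None of these classes are real WebGPU API; they only model the stack semantics of pushErrorScope()/popErrorScope() on a shared device:

```javascript
// Hypothetical mock of GPUDevice's global error scope stack, illustrating
// why two components sharing one device can capture each other's errors.
class MockDevice {
  constructor() { this.scopeStack = []; }
  pushErrorScope(filter) { this.scopeStack.push({ filter, errors: [] }); }
  popErrorScope() {
    // Real popErrorScope() returns a Promise; a sync return keeps the mock small.
    return this.scopeStack.pop().errors;
  }
  // Any error lands in the innermost scope, regardless of which component caused it.
  raiseError(message) {
    const scope = this.scopeStack[this.scopeStack.length - 1];
    if (scope) scope.errors.push(message);
  }
}

const device = new MockDevice();

// Component A opens a scope that spans an async boundary...
device.pushErrorScope('validation');

// ...and component B, sharing the same device, errors in the meantime.
device.raiseError('error from component B');

const captured = device.popErrorScope();
console.log(captured); // component A sees B's error: ['error from component B']
```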

Separating this state can be useful for some other things as well:

  • An application/engine that has some asynchronous work that goes on in the "background" can separate the state for that work.
  • A library that wants to expose its GPUDevice to a user can separate the user's state from its own.

Relevant past work:

Premises

  • It should be possible for multiple things using the same device on the same thread to have different error scope state.
  • There should be no restriction on sharing objects between them.
    • It's annoying to check and unnecessary.
  • It's okay if each of those things can't device.destroy() independently.
    • It should be a lightweight object that can be handled by GC.
  • Error scope state should not be shared across threads, at least by default.
    • Having an opt-in way to share across threads is actually good for matching native-style semantics.

Alternatives

  • 8: Refinement of 7 with no explicit *Handle objects.
  • 7: Refinement of 6 with no explicit GPUErrorSink object.
  • 6: Like 2 but forked devices have no error sink by default. All errors go to uncapturederror.
    • There is only one uncapturederror event target and it's on a non-shareable, non-transferrable object.
    • Eliminates the problem with proposal 3 where multithreading gives you racy error scopes by default, which is probably not what you want.
    • Error sinks would be shareable (if you do so explicitly) so that native-like semantics can be implemented.
    • Things with error sinks would be shareable but by default would arrive with no error sink. There would be a way to give it one.
  • 5: Error scope state is cross-thread BUT nothing has an error sink by default. All errors go to the uncapturederror event (which perhaps has only one instance).
  • 4: Like 3 but make GPUTexture and possibly everything else also be a client so createView() doesn't have to move.
    • Would require a way to make new clients of every object (that has methods that can generate errors on it, but probably need to do all for future proofing).
  • 3: Error scope state is cross-thread.
  • 2: GPUErrorSink which holds the error state and is an (immutable) member of both GPUDevice and GPUQueue (and command/bundle encoders).
  • 1: Multiple "sub device" objects which are stateful and all part of one device (can always share).
  • 99: Share groups, where you explicitly opt into sharing a resource across GPUDevices and opt out of having it destroyed and unmapped by GPUDevice.destroy(). Keep the per-device per-realm thing.

Proposal Idea 8

  • All objects have an [[errorTarget]], which roughly speaking has type GPUDevice?.
    • createView(), for example, does not have to move.
  • The only pre-V1 change to today's API is the addition of GPURootDevice. (I have long expected we would want this - I just didn't realize it was a breaking change.)
    • There is only one uncapturederror EventTarget object for the device (GPURootDevice) and it's never possible to duplicate or transfer it. This approach is chosen so [[errorTarget]] can be nullable.
  • When an object is sent to another thread, the error target does not come with it; [[errorTarget]] is null. Errors that happen on this object get sent straight to uncapturederror. To get them to go to an error target, you have to "clone" the object and specify an errorTarget to use.
  • GPUDevice is the error target and can never have a null error target.
    • When you send it to another thread you have to explicitly choose whether it's going to share its error state tracker or get a new one.

8 Pre-V1

Changes:

  • Add a GPURootDevice subclass of GPUDevice. Move EventTarget to GPURootDevice. Return GPURootDevice from requestDevice().

Editorial changes:

  • GPUObjectBase gets an internal slot [[errorTarget]], pointing to a device "client".
  • An object created from this inherits this.[[errorTarget]].
  • All operations send their errors to (GPUObjectBase)this, which passes them through [[errorTarget]] and forwards uncaptured errors to uncapturederror.

Notes:

  • There is not yet any way to override [[errorTarget]]. No significant implementation work is needed at this stage.

8 Post-V1: Multi-Client

Changes:

  • Most objects gain a method object.withErrorTarget(GPUDevice errorTarget), which gives you an instance of the object with the provided [[errorTarget]].
    • GPUDevice and GPURootDevice don't have these, as (roughly speaking) they are the error target.
  • Add GPUDevice.withNewErrorTracker() which returns a new GPUDevice with an empty error state tracker.
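A sketch of how the multi-client semantics above could look, using mocks (GPUDeviceMock, GPUBufferMock, failSomehow() are illustrative stand-ins, not real WebGPU API):

```javascript
// Hypothetical sketch of Proposal 8's multi-client semantics:
// per-client error trackers plus withErrorTarget() clones.
class ErrorTracker { constructor() { this.stack = []; } }

class GPUDeviceMock {
  constructor(tracker) { this.tracker = tracker ?? new ErrorTracker(); }
  // Proposed: a new GPUDevice over the same handle, with fresh error state.
  withNewErrorTracker() { return new GPUDeviceMock(new ErrorTracker()); }
  pushErrorScope(filter) { this.tracker.stack.push({ filter, errors: [] }); }
  popErrorScope() { return this.tracker.stack.pop().errors; }
}

class GPUBufferMock {
  constructor(errorTarget) { this.errorTarget = errorTarget; } // [[errorTarget]]
  // Proposed: clone the object with a different [[errorTarget]].
  withErrorTarget(device) { return new GPUBufferMock(device); }
  failSomehow(msg) {
    const stack = this.errorTarget.tracker.stack;
    if (stack.length) stack[stack.length - 1].errors.push(msg);
  }
}

const root = new GPUDeviceMock();           // stands in for GPURootDevice
const clientA = root.withNewErrorTracker(); // per-component error state
const buffer = new GPUBufferMock(root).withErrorTarget(clientA);

clientA.pushErrorScope('validation');
root.pushErrorScope('validation');
buffer.failSomehow('oops');                 // routed to clientA's scope only

const rootErrors = root.popErrorScope();
const clientErrors = clientA.popErrorScope();
console.log(rootErrors, clientErrors); // [] ['oops']
```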

8 Post-V1: Multi-Threading

Changes:

  • Make [[errorTarget]] nullable.
    • Objects without an [[errorTarget]] send errors directly to uncapturederror.
  • Most objects are thread-shareable. When deserialized, they have a null [[errorTarget]]. To associate them with an error target, the receiver must call withErrorTarget().
    • GPURootDevice is not shareable or transferrable. If you try to send it we send the base GPUDevice. (TBD how to spec this.)
  • When GPUDevice is sent, you receive a GPUReceivedDevice. To get a GPUDevice from it, you must choose either .withNewErrorTracker() or .withSharedErrorTracker(). (These can be called multiple times.)
    • This affects whether the internal [[errorTarget]] points to the same device "client" or a new one.

Note:

  • Native-style semantics can be implemented with withSharedErrorTracker(). Also eliminates a need in the actual native API to use thread-local storage to track which error scope state to target.

8 Post-V1: Multi-Queue

New independent queue objects behave the same way as other objects.

Proposal Idea 7

7 Pre-V1

Changes:

  • Add a GPURootDevice subclass of GPUDevice. Move EventTarget to GPURootDevice. Return GPURootDevice from requestDevice().

Editorial changes:

  • GPUObjectBase gets an internal slot [[errorTarget]], pointing to a DeviceClient.
  • An object created from this inherits this.[[errorTarget]].
  • All operations send their errors to (GPUObjectBase)this which passes them through [[errorTarget]] and forwards uncaptured errors to uncapturederror.

Notes:

  • There is not yet any way to override [[errorTarget]]. No significant implementation work is needed at this stage.

7 Post-V1: Multi-Client

Changes:

  • Most objects get a *Handle version, accessible by object.handle.
    • There is a GPUDeviceHandle but not a GPURootDeviceHandle.
  • handle.instantiate(GPUDevice errorTarget) gives you an instance of the object, with the provided [[errorTarget]].
  • GPUDeviceHandle.instantiate() does not take an errorTarget, as it's creating one.
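The handle/instantiate split could look roughly like this mock (class names are illustrative, not real WebGPU API):

```javascript
// Hypothetical sketch of Proposal 7's *Handle objects: the handle carries
// the stateless, shareable identity; instantiate() binds an error target.
class ErrorTracker { constructor() { this.stack = []; } }

class DeviceClient {
  constructor() { this.tracker = new ErrorTracker(); }
}

class TextureHandle {
  // Each call yields a fresh instance bound to the given error target.
  instantiate(errorTarget) { return new TextureClient(this, errorTarget); }
}

class TextureClient {
  constructor(handle, errorTarget) {
    this.handle = handle;           // object.handle points back at the handle
    this.errorTarget = errorTarget; // [[errorTarget]]
  }
}

const clientA = new DeviceClient();
const clientB = new DeviceClient();

const handle = new TextureHandle();
const texA = handle.instantiate(clientA);
const texB = handle.instantiate(clientB);

// Both instances share one handle but route errors independently.
console.log(texA.handle === texB.handle);           // true
console.log(texA.errorTarget === texB.errorTarget); // false
```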

Notes:

  • requestDevice can be described as creating a GPUDeviceHandle and returning GPUDeviceHandle.instantiate().

7 Post-V1: Multi-Threading

Changes:

  • Make [[errorTarget]] nullable.
    • Objects without an [[errorTarget]] send errors directly to uncapturederror.
  • Normal objects are not thread-shareable or transferrable. Their *Handle versions are shareable. The receiver must call handle.instantiate() and may pass a GPUDevice or null.
  • Add a GPUDeviceHandle.instantiateWithSharedErrorTracker().
    • This makes the internal [[errorTarget]] point to the same device "client" instead of a new one.

Note:

  • Native-style semantics can be implemented with instantiateWithSharedErrorTracker(). Also eliminates a need in the actual native API to use thread-local storage to track which error scope state to target.

7 Post-V1: Multi-Queue

Nothing special compared to other objects.

Proposal Idea 6

6 Pre-V1

Changes:

  • Add a GPUErrorSink object and move pushErrorScope()/popErrorScope() to it.
  • GPUDevice gets a readonly attribute GPUErrorSink errorSink.

Editorial changes:

  • GPUObjectBase gets an internal slot [[errorSink]], nullable, pointing to an error sink (handle).
  • requestDevice creates a new GPUErrorSink and uses it for the GPUDevice and its defaultQueue.
  • All object creations inherit the error sink from this (a device/texture/pipeline).
  • All operations send their errors to (GPUObjectBase)this which passes them through [[errorSink]] (if any) and forwards uncaptured errors to uncapturederror.

Notes:

  • There is not yet any way to attach an existing GPUErrorSink to new objects. No significant implementation work is needed at this stage.

6 Post-V1: Multi-Client

Changes:

  • Most objects get a *Handle version, accessible by object.handle.
  • handle.instantiate(GPUErrorSink? errorSink) gives you an instance of the object, with the provided GPUErrorSink (if any).
    • Objects without error sinks send errors directly to uncapturederror.
  • GPUDeviceHandle.instantiate() is different from the others:
    • The errorSink argument is optional instead of nullable.
    • If no GPUErrorSink was provided, it creates a new one for the device and its defaultQueue.
    • It returns a GPUDeviceClone instead, which does not receive uncapturederror events.
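The nullable [[errorSink]] routing can be sketched with a mock (dispatchError() and the class names are illustrative, not real WebGPU API):

```javascript
// Hypothetical sketch of Proposal 6: an explicit error sink holding the
// scope stack, with a nullable [[errorSink]] per object. Errors on objects
// without a sink fall through to uncapturederror.
class ErrorSink {
  constructor() { this.stack = []; }
  pushScope(filter) { this.stack.push({ filter, errors: [] }); }
  popScope() { return this.stack.pop().errors; }
}

class DeviceMock {
  constructor() {
    this.errorSink = new ErrorSink(); // requestDevice creates the first sink
    this.uncaptured = [];             // stand-in for the uncapturederror event
  }
  dispatchError(object, message) {
    // Route through the object's [[errorSink]] if it has one...
    const sink = object.errorSink;
    if (sink && sink.stack.length) {
      sink.stack[sink.stack.length - 1].errors.push(message);
    } else {
      this.uncaptured.push(message); // ...otherwise straight to uncapturederror
    }
  }
}

const device = new DeviceMock();
const withSink = { errorSink: device.errorSink };
const withoutSink = { errorSink: null }; // e.g. a freshly instantiated handle

device.errorSink.pushScope('validation');
device.dispatchError(withSink, 'captured');
device.dispatchError(withoutSink, 'uncaptured');

const scoped = device.errorSink.popScope();
console.log(scoped);            // ['captured']
console.log(device.uncaptured); // ['uncaptured']
```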

Notes:

  • GPUErrorSink can now be attached to new objects.
  • GPUErrorSink is not constructible and new ones can only be gotten from GPUDevice.
  • requestDevice can be described as creating a GPUDeviceHandle and returning GPUDeviceHandle.instantiate().

6 Post-V1: Multi-Queue

Nothing special compared to other objects.

6 Post-V1: Multi-Threading

  • GPUErrorSink is thread-shareable (and internally synchronized). This never happens implicitly.
  • Normal objects are not thread-shareable or transferrable. Their *Handle versions are shareable. The receiver must call handle.instantiate() and may pass a GPUErrorSink.

Note:

  • Native-style semantics can be implemented by sharing a GPUErrorSink object across threads. This eliminates a need in the actual native API to use thread-local storage to track which error scope state to target.

Proposal Idea 3

In this proposal, we keep things as is except you can have multiple GPUDeviceClients on one thread, with their own error scope states, instead of having all GPUDevices on one thread magically point to the same one. For multi-threading, one client (and its error scope state) is shared across threads and internally synchronized.

GPUDeviceClient.destroy() still unmaps all buffers in the same thread, as GPUBuffers are not associated with a specific client.

(Native API note: eliminates thread local storage for error scope state.)

3 Stage 0 (pre-V1)

Changes:

  • Move GPUTexture.createView() back out to GPUDevice.createTextureView() so there's a place to send its errors.
  • Move GPUPipelineBase.getBindGroupLayout() to GPUDevice.
    • Or, change it to be a special exception where it returns an invalid GPUBindGroupLayout that doesn't actually generate any errors upon its creation. This would kind of make sense because it's not "creating" an object (but it is).

Notes:

  • Of the other methods not on GPUDevice or GPUQueue:
    • Encoder finish() calls can stay on command/bundle encoders. It can be said that an encoder is permanently associated with the GPUDeviceClient it's created on.
    • mapAsync(), getMappedRange(), unmap(), compilationInfo(), and buffer/texture/queryset destroy() would not be able to generate WebGPU errors (they do not, currently).

3 Stage 1 (pre-V1)

GPUDevice becomes an "interface" through which you send commands to an underlying device/queue (referred to by a "handle"). For now call this GPUDeviceClient.

GPUQueue becomes permanently associated with the specific GPUDeviceClient that it was created from. For now call this GPUQueueClient.

(In practice, the names wouldn't change.)

Changes: none

Notes:

  • GPUQueueClient, GPUCommandEncoder, and GPURenderBundleEncoder are permanently associated with a specific GPUDeviceClient.
  • All other objects associate with an actual device (handle) (which cannot receive errors), not a device client.
  • label remains associated with GPUDeviceClient/GPUQueueClient, not the respective handles.

3 Multi-Client (post-V1)

Allow creating new GPUDeviceClients for existing devices.

Changes:

  • Add GPUDeviceClient.createNewClient() to create new clients from a handle.
    • New clients have a fresh error scope state.
    • The defaultQueue of this new GPUDeviceClient is a new GPUQueueClient (of the old queue handle) associated with the new GPUDeviceClient.
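A mock of the client-creation behavior described above (class names follow the proposal's placeholders; nothing here is real WebGPU API):

```javascript
// Hypothetical sketch of Proposal 3: multiple device clients over one
// underlying device handle, each with its own error scope state.
class DeviceHandle {}
class QueueHandle {}

class QueueClient {
  constructor(handle, deviceClient) {
    this.handle = handle;
    this.deviceClient = deviceClient; // permanently associated
  }
}

class DeviceClient {
  constructor(deviceHandle, queueHandle) {
    this.handle = deviceHandle;
    this.scopeStack = [];             // fresh error scope state per client
    this.defaultQueue = new QueueClient(queueHandle, this);
  }
  // Proposed: make another client of the same underlying device.
  createNewClient() {
    return new DeviceClient(this.handle, this.defaultQueue.handle);
  }
}

const a = new DeviceClient(new DeviceHandle(), new QueueHandle());
const b = a.createNewClient();

console.log(a.handle === b.handle);                           // true
console.log(a.defaultQueue.handle === b.defaultQueue.handle); // true
console.log(a.defaultQueue === b.defaultQueue);               // false
console.log(b.scopeStack.length);                             // 0
```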

Notes:

  • GPUDeviceClient.destroy() destroys the whole device, not just the client. Just like today, it still unmaps all buffers for that device on the thread where it's called, but not on other threads.
    • There is no way to detach a GPUDeviceClient without destroying the underlying device. It should be a lightweight object that can be handled by GC.

3 Multi-Queue (post-V1)

Changes: none

Notes:

  • When a queue is created you get a GPUQueueClient pointing back to the GPUDeviceClient it was created from. If you want a queue with its own error state (for some reason) you can just create it off of a fresh GPUDeviceClient.

3 Multi-Threading (post-V1)

Changes:

  • Make various handle-only objects serializable (GPUBuffer, etc.)
  • Make GPUDeviceClient shareable across threads (but not transferrable).
    • The GPUDeviceClient's error state would be shared and internally synchronized.
  • Make GPUQueueClient serializable (but not transferrable).
    • It keeps being associated with the same GPUDeviceClient.

3 Open Questions

  • uncapturederror exists for the telemetry use case. Should there be a separate uncapturederror sink for each GPUDeviceClient, or one common one shared by all? (If there's a common one, an uncaptured error on any client would probably trigger the event on all clients.)
    • Having one shared sink is less flexible (can't handle/ignore errors from one specific client), but makes telemetry easier (everything arrives in one place).
    • The biggest annoyance with separate sinks is probably that apps will have to deal with uncapturederror coming in on multiple threads. However, this is also true of the unhandledrejection event for unhandled promise rejections, so this is probably fine.

Proposal Idea 2

In this proposal, we keep things as is except we extract the error state into a separate object.
For multi-threading, we keep the "per-device per-realm" semantics, but only for GPUDeviceClient.destroy() unmapping buffers, not for error scope state.

2 Stage 0 (pre-V1)

Changes:

  • Move GPUTexture.createView() back out to GPUDevice.createTextureView() so there's a place to send its errors.
  • Move GPUPipelineBase.getBindGroupLayout() to GPUDevice.
    • Or, change it to be a special exception where it returns an invalid GPUBindGroupLayout that doesn't actually generate any errors upon its creation. This would kind of make sense because it's not "creating" an object (but it is).

Notes:

  • Of the other methods not on GPUDevice or GPUQueue:
    • Encoder finish() calls can stay on command/bundle encoders. It can be said that an encoder inherits the GPUErrorSink from the GPUDeviceClient it's created on.
    • mapAsync(), getMappedRange(), unmap(), compilationInfo(), and buffer/texture/queryset destroy() would not be able to generate WebGPU errors (they do not, currently).

2 Stage 1 (pre-V1)

GPUDevice and GPUQueue become "interfaces" through which you send commands to an underlying device/queue. For now call these GPUDeviceClient and GPUQueueClient. (In practice, the names wouldn't change.)

Error stuff moves to a new object. For now call this GPUErrorSink.

Changes:

  • Move all the error scope state, pushErrorScope/popErrorScope, and EventTarget/onuncapturederror onto GPUErrorSink (as pushScope/popScope).
  • Add [[error_sink]] internal slots to GPUDeviceClient and GPUQueueClient. Not yet configurable, and by default a device and its default queue get the same GPUErrorSink, so semantically nothing changes.
  • Add a GPUDeviceClient.errorSink readonly attribute.

Notes:

  • label remains associated with GPUDeviceClient, not GPUDeviceHandle.
  • OPEN QUESTION: uncapturederror exists for the telemetry use case. If it's on GPUErrorSink, applications must register it on every GPUErrorSink they create. It could go on GPUDeviceHandle instead, which makes it easier to capture everything for telemetry, but it's also less flexible (can't specifically ignore errors from one user of the device).
    • The biggest annoyance here is probably that apps will have to deal with uncapturederror coming in on multiple threads. However, this is also true of the unhandledrejection event for unhandled promise rejections, so this is probably fine.

2 Stage 2 (post-V1)

Allow creating new GPUDeviceClients for existing devices.

Add an object to represent the underlying device. For now call this GPUDeviceHandle.

Changes:

  • Expose the GPUDeviceHandle as GPUDeviceClient.handle.
  • Add a way to construct GPUErrorSink.
  • Add GPUDeviceHandle.createClient() to create new clients from a handle.
  • Allow configuring the errorSink in requestDevice() and both createClient() methods.
    • If a GPUErrorSink is not provided, both create a new one.
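The configurable error sink could behave roughly like this mock (createClient()'s signature here is an assumption based on the bullets above; not real WebGPU API):

```javascript
// Hypothetical sketch of Proposal 2's Stage 2: createClient() on a device
// handle, taking an optional shared error sink.
class ErrorSink { constructor() { this.stack = []; } }

class DeviceHandle {
  createClient(errorSink) {
    // If no sink is provided, each new client gets a fresh one.
    return new DeviceClient(this, errorSink ?? new ErrorSink());
  }
}

class DeviceClient {
  constructor(handle, errorSink) {
    this.handle = handle;
    this.errorSink = errorSink; // readonly attribute in the proposal
  }
}

const handle = new DeviceHandle();
const a = handle.createClient();            // fresh sink
const b = handle.createClient();            // another fresh sink
const c = handle.createClient(a.errorSink); // explicitly shares a's sink

console.log(a.errorSink === b.errorSink); // false
console.log(a.errorSink === c.errorSink); // true
```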

Notes:

  • A GPUErrorSink can now be shared by multiple completely unrelated devices.
  • All GPUQueueClient objects are the default queue of some GPUDeviceClient and therefore inherit their error sinks.
  • GPUDeviceClient.destroy() destroys the whole device, not just the client. Just like today, it still unmaps all buffers for that device on the thread where it's called, but not on other threads.
    • There is no way to detach a GPUDeviceClient without destroying the underlying device. It should be a lightweight object that can be handled by GC.

2 Multi-threading (post-V1)

Changes:

  • Make GPUDeviceHandle shareable. GPUDeviceClient and GPUErrorSink are not shareable or transferrable.

Notes:

  • GPUDeviceHandle can have events registered from any thread. If there's an uncaptured error, fire all of them. (This isn't really any different from registering multiple handlers on one event except that they can't cancel each other.)

2 Multi-Queue (post-V1)

Now, not all queues are default queues.

Changes:

  • Add a GPUQueueHandle, GPUQueueClient.handle, GPUQueueHandle.createClient().
  • Allow configuring the error sink when you create a queue.

Proposal Idea 1

Separate the error scope state from the "global" object so that there can be multiple states maintained separately (for each component).

For multithreading, take advantage of this separation as well, so there's no "per-device per-realm" state like we had planned.

(In this proposal, bad names are used intentionally for disambiguation from current concepts, and so we have to replace them later.)

1 Stage 1: Preparation (pre-V1)

Centralize state, state checks, and ordering on one object. For now call this GPUStatefulDevice.

Each error-generating call must be clearly associated with a single GPUStatefulDevice so there is a place for errors to go.

Changes:

  • Move GPUTexture.createView() back out to GPUStatefulDevice.createTextureView().

Notes:

  • Encoder finish() calls stay on the encoders. An encoder is permanently associated with one GPUStatefulDevice.
  • defaultQueue becomes associated with one GPUStatefulDevice in particular. For now call this type GPUSharedQueueInterface.

1 Stage 2: Groups (post-V1?)

Create a new concept of a group of GPUStatefulDevice objects which can all share their objects freely. For now call this GPUAdapterConnection.

Changes:

  • Add GPUStatefulDevice.connection, which is the GPUAdapterConnection for the GPUStatefulDevice.

Notes:

  • Calling GPUStatefulDevice.destroy() destroys the whole group, because it's not really meaningful to destroy just one GPUStatefulDevice (it doesn't own any resources so there's nothing to clean up).
  • If done before V1, destroy() (and possibly limits and features) could move to GPUAdapterConnection.

1 Stage 3: Multiple GPUStatefulDevices (post-V1)

Make it possible to get more GPUStatefulDevices from the GPUAdapterConnection.
Required for multi-threading.

Changes:

  • Add GPUAdapterConnection.createStatefulDevice() which gives you a new device in the same group, with the same limits/features, and an empty error scope state. Without multi-queue, it gives you a GPUSharedQueueInterface defaultQueue which has the same underlying GPUSharedQueue as the other GPUStatefulDevices.
  • GPUAdapterConnection is serializable (shareable).
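A mock of the group semantics described above (names follow the proposal's deliberately bad placeholders; nothing here is real WebGPU API):

```javascript
// Hypothetical sketch of Proposal 1: a GPUAdapterConnection group from which
// new stateful devices (each with empty error scope state) can be created.
class SharedQueue {}

class StatefulDevice {
  constructor(connection, sharedQueue) {
    this.connection = connection;
    this.scopeStack = []; // per-device error scope state
    this.defaultQueue = { sharedQueue, device: this }; // GPUSharedQueueInterface
  }
}

class AdapterConnection {
  constructor() { this.sharedQueue = new SharedQueue(); }
  createStatefulDevice() {
    // Same group, same underlying queue, fresh error scope state.
    return new StatefulDevice(this, this.sharedQueue);
  }
}

const connection = new AdapterConnection();
const devA = connection.createStatefulDevice();
const devB = connection.createStatefulDevice();

console.log(devA.connection === devB.connection);                             // true
console.log(devA.defaultQueue.sharedQueue === devB.defaultQueue.sharedQueue); // true
console.log(devA.scopeStack.length);                                          // 0
```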

Notes:

  • GPUStatefulDevice is not shareable, because it is stateful. It is also not transferrable, because that would require it to be closeable.

1 Multi-Queue (post-V1)

Changes:

  • Add a type GPUSharedQueue which represents the actual queue (without an associated GPUStatefulDevice). You can't do anything with this except create GPUSharedQueueInterfaces.
  • Add GPUSharedQueueInterface.sharedQueue pointing to the GPUSharedQueue (maybe).
  • Add GPUAdapterConnection.defaultQueue which is the default GPUSharedQueue (maybe).
  • In GPUAdapterConnection.createStatefulDevice(), allow configuring the defaultQueue. Depending on the solution to multi-queue, this may take a GPUQueueDescriptor, or it may only take a GPUSharedQueue that the defaultQueue should point to.

1 Options

  • Do Stage 2 before V1 and move some stuff to GPUAdapterConnection.
  • ~Merge GPUStatefulDevice and GPUQueue. Multi-queue becomes multi-GPUStatefulDeviceAndQueue. It would not be possible to submit work from multiple threads to the same queue (probably a dealbreaker).~
  • Don't actually have an object for GPUAdapterConnection, instead "clone" GPUStatefulDevice directly. Find a name that makes it clear it won't clone the error scope state, like makeNewStatefulDeviceForAdapterConnection.
  • Instead of GPUStatefulDevice/GPUAdapterConnection, have a device and sub-devices. The sub-devices wouldn't have destroy(). (I think I had a reason not to do this but I forgot it.)

1 Open Questions

  • Can we make GPUStatefulDevice.destroy() meaningful (destroy only the stuff on that device and not the whole group)? Would require some form of ownership.
  • Right now, GPUDevice.destroy() can unmap everything on the thread. This becomes impossible. Can we preserve that?