Extended buffer protocol PEP draft for device memory support

Abstract

The Python buffer protocol is widely adopted across the ecosystem to share data between packages such as NumPy and its downstream libraries. Adoption happens, for example, via Cython's typed memoryviews.

However, the protocol has not been extended for many years, and modern scientific programs now often use accelerators such as GPUs. In part, this need has led to many new protocols, such as:

and these solve most of the issues that occur, to varying degrees. Unfortunately, none has quite the reach and deep integration of the Python buffer protocol.

In this PEP we propose a lightweight extension of the buffer protocol to allow exposing non-CPU buffers. This extension is designed so that it will allow, or at least simplify, the addition of new features. An example of such a future extension would be indicating buffer borrowing, which can be useful for ownership management and is desired, for example, in Rust.[1]

Motivation

Today's data-intensive workflows often use both CPUs and accelerators, or cross the boundaries between them. We believe that the buffer protocol could fill at least some of these needs if it were to be extended. Importantly, we believe doing this inside the buffer protocol will help projects that wish to support a wide variety of devices to do so with less code duplication.

The buffer protocol is deeply integrated into Python, which gives it a central position and at least small performance benefits. As it is also widely adopted, we believe that if it can fill new needs more widely, its adoption will increase and make the creation of future protocols unnecessary or simpler.

There is always a danger of creating N + 1 protocols for a similar task, which is a reason to push for broader use of the buffer protocol instead. Right now, no single protocol solves all needs, and even with this extension the buffer protocol can cover most, but not all, use cases.

In general, we believe that the possibility of only partial adoption should not be seen as problematic. Libraries may implement whichever subset of the buffer protocol is useful and easy for them to implement.

This proposal not only enables adoption for more current use cases, it also unblocks extending the most central buffer exchange protocol in Python to push new ideas and capabilities.

Rationale

The buffer protocol, with all its flaws, is widely adopted in the scientific Python ecosystem. However, because it lives in Python itself and it is not obvious how to extend it, new capabilities were never added.

This PEP wishes to address this by:

  • Proposing an extension mechanism that will simplify future additions.
  • Enabling the export of data that is not CPU accessible.

A major point in extending the buffer protocol in this direction is that we wish to remain as compatible as possible:

  • Any existing code should keep working unmodified (backwards compatibility).
  • Ideally, most extensions can be used on old Python versions (forward compatibility to the extent possible). This applies to this extension as well as to future ones.

Further, we realize that neither Python nor this PEP can or should describe how data exchange on non-CPU memory works. Exchanging data on accelerators (or distributed data) is far more complex than for CPU data:

  • There are a host of accelerators, all with their own conventions and their own additional information of interest.
  • Accelerators work asynchronously: a function using a dataset may still be running when a new function is already called (similar to threading). This means that some form of synchronization is needed to order operations correctly. However, this synchronization is both complicated and device specific.

Thus, the design rationale here is to extend the buffer protocol in a backwards-compatible way with forward compatibility in mind, and further to leave any device-specific information outside of Python's purview. Python may facilitate the exchange of such extensions, but is unlikely to define them itself.

Specification

New slot for supported flags

Although the device flag described below explicitly tolerates producers that ignore it, we propose adding a new field to the buffer slots in PyBufferProcs:

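struct PyBufferProcs {
    /* existing get buffer and release buffer slots */
    getbufferproc bf_getbuffer;
    releasebufferproc bf_releasebuffer;
    /* new: flags for which the exporter advertises support */
    int bf_supported_flags;
};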

We also propose a new PyBUF_REQUIRED_FLAGS constant (see below) and a function to query the supported flags:

/*
 * Check if an object supports the buffer protocol and advertises support
 * for all of the given flags.
 */
int PyObject_CheckBufferSupports(PyObject *obj, int flags);

On future Python versions, PyObject_CheckBufferSupports(obj, flags) can be used to query whether an object is a buffer and, if it is, whether it advertises support for all of the given flags. (Users may need to call PyObject_CheckBuffer as well, since a return value of 0 may also indicate that the buffer protocol is not supported at all.)

In practice, some flags are soft request/capability flags. Currently this applies to PyBUF_INDIRECT, which indicates support by the consumer, while the producer is unlikely to make use of it.

Python will define PyBUF_REQUIRED_FLAGS to indicate which flags must not be ignored by the producer. PyObject_GetBuffer will call PyObject_CheckBufferSupports(obj, flags & PyBUF_REQUIRED_FLAGS) and set a BufferError if the check fails.
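The masking step can be sketched in plain C without the Python headers. This is only a model under stated assumptions: the SK_* names and bit values are invented for illustration, the choice of which flags count as required is hypothetical, and the special slot values discussed under Compatibility are left out.

```c
#include <assert.h>

/* Hypothetical bit values, chosen only for this sketch; they are not the
   values CPython uses. */
enum {
    SK_WRITABLE = 0x0001,
    SK_FORMAT   = 0x0004,
    SK_INDIRECT = 0x0100,   /* soft capability flag, may be ignored */
    /* Stand-in for PyBUF_REQUIRED_FLAGS; which flags belong here is an
       assumption of this sketch. */
    SK_REQUIRED_FLAGS = SK_WRITABLE | SK_FORMAT
};

/* Model of PyObject_CheckBufferSupports for an object already known to be a
   buffer: returns 1 only if every bit in `flags` is set in `supported`. */
static int sk_check_supports(int supported, int flags)
{
    return (supported & flags) == flags;
}

/* Model of the check inside PyObject_GetBuffer: soft flags are masked out
   before the query. Returns 0 to proceed, -1 where a BufferError would be
   set. */
static int sk_get_buffer_gate(int supported, int requested)
{
    assert(requested >= 0);   /* the sketch does not use the sign bit */
    return sk_check_supports(supported, requested & SK_REQUIRED_FLAGS) ? 0 : -1;
}
```

With these definitions, a producer advertising only SK_WRITABLE still accepts a request for SK_WRITABLE | SK_INDIRECT, because the soft flag is masked out, while a request for SK_FORMAT is rejected.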

Currently, it is typical practice to ignore unknown flags and this practice is actually useful for us. In the future, Python will enforce flag support for known flags.

Compatibility

Currently, it is possible for a type to support only PyBUF_SIMPLE == 0. However, for types that do nothing, bf_supported_flags == 0 must be the default for technical reasons. This means that 0 will be interpreted as support for all currently existing flags, which does not change current behavior.

We suggest using -1 as an indicator that only PyBUF_SIMPLE is supported. (One only needs to ensure that -1 can never take the meaning of "all flags", which may prevent the sign bit from being used as a flag in the future.)
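One way to picture the two special values is as a normalization step applied to the raw slot before any flag check. The sketch below is an assumption about how an implementation might do this; SK_ALL_EXISTING and SK_DEVICE are made-up masks, not CPython constants.

```c
#include <assert.h>

enum {
    SK_SIMPLE       = 0,
    SK_ALL_EXISTING = 0x01FF,   /* hypothetical mask of all pre-existing flags */
    SK_DEVICE       = 0x0200    /* hypothetical bit for a newly added flag */
};

/* Translate the raw bf_supported_flags slot value into the set of flags the
   exporter is treated as supporting. */
static int sk_effective_supported(int raw_slot)
{
    if (raw_slot == 0) {
        return SK_ALL_EXISTING;   /* legacy type that never set the slot */
    }
    if (raw_slot == -1) {
        return SK_SIMPLE;         /* explicit "only PyBUF_SIMPLE" marker */
    }
    assert(raw_slot > 0);         /* sketch assumes the sign bit is not a flag */
    return raw_slot;
}
```

Note that a legacy type (slot value 0) is treated as supporting every pre-existing flag but none of the flags introduced later.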

With the above, this extension has the following backwards/forward compatibility:

  • Users do not need to check the flags or set the bf_supported_flags slot to keep existing implementations working. Implementing it would only avoid the creation of some BufferErrors.
  • After adoption of the new slot, new required flags can be backported (i.e. adopted earlier), with the one caveat that users must check for new flags via PyObject_CheckBufferSupports themselves, as PyObject_GetBuffer would not check for these on older Python versions.

Extended buffer struct and PyBUF_DEVICE flag

We propose a new "extended" set of flags with the only current member of the family being PyBUF_DEVICE.

The new flag PyBUF_DEVICE will be a request flag to be passed to PyObject_GetBuffer. This flag is not a "required" flag, but allows the producer to fill in device information if desired.

If PyBUF_DEVICE (or any future extended flag) is passed, the structure passed must have a layout of:

struct Py_buffer_extended {
    /* All fields of the existing Py_buffer come first, unchanged */
    int flags;
    int ext_flags;
    /* Identifier and small scratch space for device indication */
    char *device_type;
    uintptr_t device_specific_storage[3];
    /* Future flags can add new fields here */
};

The additions are: flags, which indicates which of the new request flags were used; ext_flags, as general flag space for the future; and the device-specific space. (We are happy to add additional information or reserved space, but growing this struct by introducing a new request flag seems easy.)

A producer may fill in this extended information if such an extended flag is passed. A producer must not touch the additional fields if the corresponding request flag was not passed. Thus if, for example, PyBUF_DEVICE was not passed but is required to describe the buffer, a BufferError must be raised. (Support must be indicated in bf_supported_flags, but that only allows the consumer to skip calling PyObject_GetBuffer if it would reject all CPU buffers anyway.)

If a producer fills in any extended information, it must set the corresponding bits in flags. That also means that the consumer must check flags before using any of the extended fields, even if the producer advertises support.[2]

In the future PyObject_GetBuffer() will zero both flags and ext_flags to ensure correctness.
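The handshake described above can be modelled in plain C. This is a sketch under assumptions, not the proposed implementation: the struct is a reduced stand-in for Py_buffer_extended with the classic fields collapsed into one pointer, SK_DEVICE is a made-up bit value, "demo-device" is a hypothetical identifier, and the sketch always passes the extended struct even when no extended flag is requested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_DEVICE 0x0200   /* hypothetical bit for PyBUF_DEVICE */

/* Reduced stand-in for Py_buffer_extended: the classic Py_buffer fields are
   collapsed into a single data pointer. */
typedef struct {
    void *buf;
    int flags;
    int ext_flags;
    const char *device_type;
    uintptr_t device_specific_storage[3];
} sk_buffer_extended;

typedef int (*sk_getbufferproc)(sk_buffer_extended *view, void *data,
                                int request);

/* Producer whose memory lives on a hypothetical "demo-device". */
static int sk_device_getbuffer(sk_buffer_extended *view, void *data,
                               int request)
{
    if (!(request & SK_DEVICE)) {
        return -1;   /* device info cannot be conveyed: BufferError */
    }
    view->buf = data;
    view->device_type = "demo-device";
    view->device_specific_storage[0] = 7;   /* made-up device ordinal */
    view->flags |= SK_DEVICE;               /* announce the filled-in fields */
    return 0;
}

/* Legacy CPU producer: ignores the request flag and the extended fields. */
static int sk_cpu_getbuffer(sk_buffer_extended *view, void *data, int request)
{
    (void)request;
    view->buf = data;
    return 0;
}

/* Model of the future PyObject_GetBuffer: zero flags and ext_flags when an
   extended flag is requested, then call the producer. */
static int sk_get_buffer(sk_getbufferproc producer, sk_buffer_extended *view,
                         void *data, int request)
{
    assert(producer != NULL && view != NULL);
    if (request & SK_DEVICE) {
        view->flags = 0;
        view->ext_flags = 0;
    }
    return producer(view, data, request);
}

/* Consumer side: device fields are valid only if the producer set the bit. */
static int sk_is_device_buffer(const sk_buffer_extended *view)
{
    return (view->flags & SK_DEVICE) != 0;
}
```

Because the flags are zeroed before the producer runs, a legacy producer that ignores the request leaves the device bit unset, and the consumer correctly falls back to treating the result as a CPU buffer.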

Compatibility

As producers are free to ignore the extended flags, this extension is fully backwards compatible. Producers that raise an error on undefined flags may exist; however, we are not aware of any.

It is correct that some PyBuffer_* functions will only be valid for non-device buffers. However, they cannot be called on device buffers accidentally, so this only requires documenting which functions actually support them.

This extension comes with a future compatibility design:

  • Many new extended flags can be added at any time to add new information or grow the buffer struct, since consumers are free to ignore these.
  • The use of request-style flags like PyBUF_DEVICE on Python versions without PyObject_CheckBufferSupports is possible in practice by zeroing flags and ext_flags.
  • If a new extended flag would be strictly required to be supported by the producer, backporting is possible but requires some care. Python's PyBUF_REQUIRED_FLAGS would not include the flag, so the user must check PyObject_CheckBufferSupports manually. (Such flags cannot be backported to Python versions without PyObject_CheckBufferSupports.)
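The last point can be illustrated with a small model of a backport helper. Everything here is hypothetical: SK_NEW_STRICT stands for some future strictly required flag, SK_OLD_REQUIRED for the required-flags mask known to the running (older) Python, and the functions only mimic the checks described in this PEP.

```c
#include <assert.h>

enum {
    SK_OLD_REQUIRED = 0x0001,   /* required flags known to the older Python */
    SK_NEW_STRICT   = 0x0400    /* hypothetical new, strictly required flag */
};

static int sk_check_supports(int supported, int flags)
{
    return (supported & flags) == flags;
}

/* What the older Python checks: it does not know about SK_NEW_STRICT. */
static int sk_old_python_gate(int supported, int requested)
{
    return sk_check_supports(supported, requested & SK_OLD_REQUIRED) ? 0 : -1;
}

/* Backport helper: check the new flag by hand before delegating. */
static int sk_backport_get_buffer(int supported, int requested)
{
    assert(requested >= 0);
    if ((requested & SK_NEW_STRICT)
            && !sk_check_supports(supported, SK_NEW_STRICT)) {
        return -1;   /* the producer would silently ignore a required flag */
    }
    return sk_old_python_gate(supported, requested);
}
```

The older gate alone would let a request for the new flag through to a producer that does not support it, which is exactly the case the manual check catches.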

Device information

If PyBUF_DEVICE is passed, the device information must be filled in, in a well defined way. Python reserves the "cpu" identifier for possible future extension.

Since we reject the idea of Python defining device-specific standards, we instead propose that the device_type field mentioned above must be either NULL or point to a unique, null-terminated string.

If a device is matched, the device specific storage can then be reinterpreted to whatever matches the corresponding specification.
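A consumer-side match might look like the following sketch. The identifier "demo-device" and the sk_demo_device_info layout are invented for illustration; a real layout would come from the community specification for that device type, and how a NULL device_type should be interpreted is left open here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout that a "demo-device" specification might define for
   the three words of device-specific storage. */
typedef struct {
    uintptr_t device_id;
    uintptr_t stream;
    uintptr_t reserved;
} sk_demo_device_info;

_Static_assert(sizeof(sk_demo_device_info) == 3 * sizeof(uintptr_t),
               "layout must fit the device-specific storage exactly");

/* Returns 1 and fills `info` if `device_type` names the specification this
   consumer understands; returns 0 otherwise and leaves `info` untouched. */
static int sk_match_demo_device(const char *device_type,
                                const uintptr_t storage[3],
                                sk_demo_device_info *info)
{
    assert(storage != NULL && info != NULL);
    if (device_type == NULL || strcmp(device_type, "demo-device") != 0) {
        return 0;
    }
    memcpy(info, storage, sizeof *info);   /* reinterpret the scratch space */
    return 1;
}
```

Copying with memcpy rather than casting the storage pointer keeps the reinterpretation free of aliasing concerns.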

The actual device specification may be tricky and will not be provided by Python itself.

Choice and list of device identifiers

Since this proposal uses a unique name as the device identifier, there is a problem of competing names and of who has the authority to use a canonical name.

Python cannot fully control this, but users specifying an extension should open a documentation PR to Python before they adopt a name, and Python reserves the right to reject a choice if it may be contested.

For example, using SYCL, CUDA, or HIP as a name is not acceptable without clear consensus. Experiments could instead use a name like cupy-cuda, even if that may unfortunately mean a transition in the future.

Consumers of such a definition can support multiple definitions, but the producer cannot deprecate theirs, unfortunately.

Notes for device specification

Since an exporter cannot support multiple devices (except via a global config), care should be taken when designing device specific information.

  • Include flags or a version to possibly extend the device information in the future.
  • You may also wish to reserve
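As an illustration of the versioning advice, a device specification could reserve the first storage word for a version number. The layout below (version in word 0, device id in word 1) and the constant SK_DEMO_SPEC_VERSION are purely hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_DEMO_SPEC_VERSION 1u   /* newest version this consumer understands */

/* Read the device id from version-tagged storage. Returns 0 on success and
   -1 if the version is missing or newer than this consumer understands. */
static int sk_read_demo_info(const uintptr_t storage[3], uintptr_t *device_id)
{
    assert(storage != NULL && device_id != NULL);
    if (storage[0] == 0 || storage[0] > SK_DEMO_SPEC_VERSION) {
        return -1;
    }
    *device_id = storage[1];
    return 0;
}
```

A consumer built against version 1 then rejects data from a producer using a later revision instead of misreading it.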

Backwards Compatibility

This PEP is fully backwards compatible. It is also largely forward compatible because many exporters will ignore additional extended flags.

The sections above contain brief notes on backwards and forward compatibility.

Reference Implementation

Beyond documentation, this PEP requires relatively minor extensions to CPython itself. We propose:

  • Define the PyBUF_DEVICE flag.
  • Define PyBUF_REQUIRED_FLAGS.
  • Define the new PyObject_CheckBufferSupports function.
  • Introduce the Py_buffer_extended struct (which may grow in the future).
  • Change PyObject_GetBuffer to zero out flags and ext_flags and to check PyBUF_REQUIRED_FLAGS.
  • Change type creation to fill in bf_supported_flags correctly.

Definitions which do not touch bf_supported_flags may be backported e.g. to https://github.com/vstinner/pythoncapi_compat.

Otherwise, the buffer protocol documentation needs to be extended with these definitions, and notes need to be added to all public API functions that cannot work with the new extended flag.

Security Implications

There are no security implications beyond incorrect implementations or use of existing API on buffers requested via the new flags.

Rejected Ideas

As we do wish to integrate this and extend the buffer protocol and do not wish for Python to define details of device support, we are not aware of alternative approaches.

In detail, we considered using a unique symbol rather than a char * device name to identify the device. While this seemed very reasonable, having a string is useful for raising errors when a device type is not understood. If we need a string for this purpose anyway, it seems reasonable to also use it for device identification.
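The error-reporting benefit of a string identifier can be seen in a short sketch. The helper name and the message wording are made up for illustration; a real consumer would most likely raise a Python exception rather than format into a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format a readable error for a device type the consumer does not
   understand. Returns the value of snprintf. */
static int sk_format_unknown_device(char *out, size_t out_size,
                                    const char *device_type)
{
    assert(out != NULL && out_size > 0);
    return snprintf(out, out_size, "unsupported device type: %s",
                    device_type != NULL ? device_type : "(none)");
}
```

With an opaque symbol instead of a name, the consumer would have nothing meaningful to show the user in this message.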

Open Discussions

How to exchange community defined device types

Python itself should not need to define exact device types as there are many such devices and they may be complex.

A problem is how to reserve names for new devices and exchange existing definitions. We propose here to have a light review process by asking for a PR to the Python documentation, as well as a discussion on the Python Discourse forum for new ideas.

Python cannot strictly police this, although it may rejec

Including new fields now and future struct growth

We could include some reserved space already now in the Py_buffer_extended struct, which may be nice for future adoption. However, other than this space coming before the device space there seems little gain in it.

In the future Py_buffer_extended may grow, even if users do not need the additional space. We consider this to be OK and users who are concerned by it should vendor the struct definition.

Add an explicit PyBUF_EXTENDED flag

We could add an explicit PyBUF_EXTENDED request flag to indicate exactly that the extended flags are available. This seemed unnecessary to us for now, but we are happy to do this if it seems clearer or there is a possible future use-case this would simplify.


  1. The concept may also be useful for device data exchange, since knowing that a buffer is only borrowed temporarily can simplify worries about synchronization (where multiple workers might use the data at the same time). ↩︎

  2. I.e. if a consumer declares a Py_buffer_extended buf and calls PyObject_GetBuffer(obj, (Py_buffer *)&buf, PyBUF_DEVICE), it must check (buf.flags & PyBUF_DEVICE) and, if the flag is not set, must not access the device-specific information and must assume a CPU buffer. ↩︎
