The Python buffer protocol is widely adopted across the ecosystem to share data between packages such as NumPy and its downstream libraries. Adoption happens, for example, via Cython's typed memoryviews.
However, it has not been extended for many years, and modern scientific programs now often use accelerators such as GPUs. In part, this need has led to many new protocols, such as:
and these solve most of the issues that occur, to varying degrees. Unfortunately, none has quite the reach and low-level integration of the Python buffer protocol.
In this PEP we propose a lightweight extension of the buffer protocol to allow exposing non-CPU buffers. This extension is designed so that it will allow and simplify the addition of new features. An example of such a feature would be indicating buffer borrowing, which can be useful for ownership management and is desired, for example, in Rust.[1]
Today's data-intensive workflows often use both CPUs and accelerators, or cross the boundaries between them. We believe that the buffer protocol could fill at least some of these needs if it were extended. Importantly, we believe doing this inside the buffer protocol will help projects that wish to support a wide variety of devices to do so with less code duplication.
The buffer protocol is deeply integrated into Python and, because of that, has a central position and at least small performance benefits. As it is also widely adopted, we believe that if it can fill new needs more widely, its adoption will increase and make the creation of future protocols unnecessary or simpler.
There is always a danger of creating N + 1 protocols to do a similar task, and as such we are pushing for wider use of the buffer protocol.
Right now, no single protocol solves all needs, and even with this extension the buffer protocol can cover most, but not all, use cases.
In general, we believe that the possibility of only partial adoption should not be seen as problematic. Libraries may implement whichever subset of the buffer protocol is useful and easy for them to implement.
This proposal not only allows adoption by more current use cases, it also unblocks extending the most central buffer exchange protocol in Python to push new ideas and capabilities.
The buffer protocol, with all its flaws, is widely adopted in the scientific Python ecosystem. However, because it lives in Python itself and it is not obvious how to extend it, new capabilities were never added.
This PEP wishes to address this by:
A major point in extending the buffer protocol in this direction is that we wish to remain as compatible as possible:
Further, we realize that neither Python nor this PEP can or should describe how data exchange on non-CPU memory can work. Exchanging data on accelerators (or distributed data, …) is far more complex than for CPU data:
Thus, the design rationale here is to extend the buffer protocol in a backwards compatible way with forward compatibility in mind, and further to leave any device-specific information outside of Python's purview. Python may facilitate the exchange of such extensions, but is unlikely to define them itself.
While the device flag described below may specifically be ignored by producers, we propose adding a new field to the buffer slots in PyBufferProcs:
struct PyBufferProcs {
    /* get buffer and release buffer slots */
    int bf_supported_flags;
};
We also propose a new PyBUF_REQUIRED_FLAGS constant (see later) and a way to query these via:
/*
* Check if an object supports the buffer protocol and certain flags.
*
*/
int PyObject_CheckBufferSupports(PyObject *obj, int flags);
On future Python versions, PyObject_CheckBufferSupports(obj, flags) can be used to query whether an object is a buffer and, if it is, whether it advertises support for all flags (users may also need to call PyObject_CheckBuffer, as a 0 return may indicate no buffer protocol support).
In practice, some flags are soft request/capability flags. Currently this is PyBUF_INDIRECT, which indicates support by the consumer, but in practice the producer is unlikely to use it.
In practice, Python will define PyBUF_REQUIRED_FLAGS to indicate which flags must not be ignored by the producer, and PyObject_GetBuffer will use PyObject_CheckBufferSupports(obj, flags & PyBUF_REQUIRED_FLAGS) and set a BufferError if the check fails.
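The required-flag check described above can be sketched in plain C. This is a minimal model rather than CPython code: the MockType struct, the mock_* functions, the flag values, and the choice of treating the writable flag as "required" are all assumptions made for illustration, and bf_supported_flags is treated here as a plain bit mask.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values only; these are assumptions, not CPython's. */
#define MOCK_PyBUF_WRITABLE 0x0001
#define MOCK_PyBUF_INDIRECT 0x0100
#define MOCK_PyBUF_DEVICE   0x1000

/* Assumption: only the writable flag must never be ignored by a producer. */
#define MOCK_PyBUF_REQUIRED_FLAGS MOCK_PyBUF_WRITABLE

/* Hypothetical stand-in for a type with (or without) buffer slots. */
typedef struct {
    int has_buffer_procs;   /* 0 if the type does not export buffers */
    int bf_supported_flags; /* flags the producer advertises */
} MockType;

/* Returns 1 if tp exports buffers and advertises every bit in flags. */
static int mock_check_buffer_supports(const MockType *tp, int flags)
{
    assert(tp != NULL);
    if (!tp->has_buffer_procs) {
        return 0;
    }
    return (tp->bf_supported_flags & flags) == flags;
}

/* Returns 0 if the request may proceed, -1 where a BufferError would be set. */
static int mock_get_buffer_precheck(const MockType *tp, int flags)
{
    /* Soft flags are masked out, so only required flags can cause an error. */
    if (!mock_check_buffer_supports(tp, flags & MOCK_PyBUF_REQUIRED_FLAGS)) {
        return -1;
    }
    return 0;
}
```

In this model a soft flag such as the indirect or device flag never causes a rejection by itself, which matches the intent that producers may ignore it.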
Currently, it is typical practice to ignore unknown flags and this practice is actually useful for us. In the future, Python will enforce flag support for known flags.
Compatibility
Currently, supporting only PyBUF_SIMPLE == 0 is possible, but for types that do nothing, bf_supported_flags == 0 must be the default for technical reasons. This means that 0 will be translated to support for all currently existing flags, which does not change current behavior.
We suggest using -1 as an indicator that only PyBUF_SIMPLE is supported. (One only needs to ensure that -1 can never have a meaning of "all flags", which may prevent the sign bit from being used as a flag in the future.)
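The translation of the two special slot values can be sketched as follows. This is only an illustration of the rule stated above: the function names and the mask standing in for "all currently existing flags" are assumptions, not part of any existing API.

```c
#include <assert.h>

#define MOCK_PyBUF_SIMPLE 0
/* Assumed mask standing in for "all currently existing flags". */
#define MOCK_ALL_EXISTING_FLAGS 0x01FF

/* Translate the raw slot value into the effective set of supported flags. */
static int mock_effective_supported_flags(int bf_supported_flags)
{
    if (bf_supported_flags == 0) {
        return MOCK_ALL_EXISTING_FLAGS; /* untouched slot: legacy behavior */
    }
    if (bf_supported_flags == -1) {
        return MOCK_PyBUF_SIMPLE;       /* explicit "only PyBUF_SIMPLE" */
    }
    return bf_supported_flags;
}

/* Returns 1 if every requested bit is in the effective set, else 0. */
static int mock_supports(int bf_supported_flags, int flags)
{
    assert(flags >= 0); /* request flags are a non-negative bit mask here */
    return (mock_effective_supported_flags(bf_supported_flags) & flags) == flags;
}
```

A type that never touches the slot therefore keeps advertising every existing flag, while a type that sets -1 advertises nothing beyond a simple buffer.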
With the above, this extension has the following backwards/forward compatibility:
- […] bf_supported_flags slot to keep existing implementations working. Implementing it would only avoid some BufferError creation.
- […] PyObject_CheckBufferSupports themselves, as PyObject_GetBuffer would not check for these on older Python versions.
PyBUF_DEVICE flag
We propose a new "extended" set of flags, with the only current member of the family being PyBUF_DEVICE.
The new flag PyBUF_DEVICE will be a request flag to be passed to PyObject_GetBuffer.
This flag is not a "required" flag, but allows the producer to fill in device information if desired.
If PyBUF_DEVICE (or any future extended flag) is passed, the structure passed must have a layout of:
struct Py_buffer_extended {
    /* Until here identical to previous buffer interface */
    int flags;
    int ext_flags;
    /* Identifier and small scratch space for device indication */
    char *device_type;
    uintptr_t device_specific_storage[3];
    /* Future flags can add new fields here */
};
This currently adds flags, to indicate which of the new request flags were used, ext_flags as general flag space for the future, and device-specific space.
(We are happy to add additional information or reserved space, but growing this struct by introducing a new request flag seems easy.)
A producer may fill in this extended information if such an extended flag is passed. A producer must not touch additional fields if the corresponding request flag was not passed.
Thus if, for example, PyBUF_DEVICE wasn't passed but is required to describe the buffer, a BufferError must be raised.
(Support must be indicated in bf_supported_flags, but that only allows the consumer to skip calling PyObject_GetBuffer if it would reject all CPU buffers anyway.)
If a producer fills in any extended information, it must set flags to include this information.
That also means that the consumer must check flags before using any of the passed fields, even if the producer advertises support.[2]
In the future, PyObject_GetBuffer() will zero both flags and ext_flags to ensure correctness.
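The producer and consumer obligations above can be illustrated with a small self-contained model. Nothing here is CPython API: MockBufferExtended only mirrors the new trailing fields (a single buf pointer stands in for all pre-existing fields), and the producers, the consumer, the device name, and the flag value are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_PyBUF_DEVICE 0x1000 /* assumed value, for illustration only */

typedef struct {
    void *buf; /* stands in for all pre-existing Py_buffer fields */
    int flags;
    int ext_flags;
    char *device_type;
    uintptr_t device_specific_storage[3];
} MockBufferExtended;

typedef int (*mock_getbufferproc)(MockBufferExtended *view, void *data,
                                  int request);

static char mock_device_name[] = "mock-device"; /* hypothetical identifier */

/* A CPU-only producer that ignores the extended flags entirely. */
static int mock_cpu_producer(MockBufferExtended *view, void *data, int request)
{
    (void)request;
    view->buf = data;
    return 0;
}

/* A device producer: it cannot describe its buffer without PyBUF_DEVICE. */
static int mock_device_producer(MockBufferExtended *view, void *data,
                                int request)
{
    if (!(request & MOCK_PyBUF_DEVICE)) {
        return -1; /* where a BufferError would be raised */
    }
    view->buf = data;
    view->device_type = mock_device_name;
    view->device_specific_storage[0] = 7; /* arbitrary payload */
    view->flags |= MOCK_PyBUF_DEVICE;     /* record what was filled in */
    return 0;
}

/* Consumer: returns 1 for a device buffer, 0 for a CPU buffer, -1 on error. */
static int mock_consume(mock_getbufferproc producer, void *data, int request)
{
    MockBufferExtended view;
    assert(producer != NULL);
    view.flags = 0;     /* zeroed by the caller before the request */
    view.ext_flags = 0;
    if (producer(&view, data, request) != 0) {
        return -1;
    }
    /* Device fields may only be read if the producer set the flag. */
    return (view.flags & MOCK_PyBUF_DEVICE) ? 1 : 0;
}
```

Because the consumer zeroes flags before the call, a producer that ignores the request leaves the flag unset and the buffer is treated as CPU memory.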
Compatibility
As producers are free to ignore the extended flags, this extension is fully backwards compatible. Producers that raise an error on undefined flags may exist; however, we are not aware of any.
One correct observation is that some PyBuffer_* functions will only be valid on non-device buffers. However, they cannot be called accidentally, so this only requires documentation for actual support.
This extension comes with a future compatibility design:
- […] PyBUF_DEVICE on Python versions without PyObject_CheckBufferSupports is possible in practice by zeroing flags and ext_flags.
- […] PyBUF_REQUIRED_FLAGS would not include the flag, so the user must check PyObject_CheckBufferSupports manually. (Such flags cannot be backported to Python versions without PyObject_CheckBufferSupports.)
If PyBUF_DEVICE is passed, the device information must be filled in, in a well-defined way.
Python reserves the "cpu" identifier for possible future extension.
Since we reject the idea of Python defining device-specific standards, we instead propose that the device identifier mentioned above (the device_type field) must be either NULL or point to a unique, null-terminated string (char *).
If a device is matched, the device specific storage can then be reinterpreted to whatever matches the corresponding specification.
The actual device specification may be tricky and will not be provided by Python itself.
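As a rough sketch of how a consumer might match an identifier and then reinterpret the storage, consider a hypothetical specification named "example-accel" that keeps a device ordinal in the first storage slot. The name, the slot layout, and the treatment of NULL as plain CPU memory are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MOCK_DEVICE_NAME "example-accel" /* hypothetical identifier */

enum {
    MOCK_UNKNOWN_DEVICE = -1, /* identifier not understood by this consumer */
    MOCK_CPU_BUFFER = 0,      /* assumption: NULL means no device info */
    MOCK_MATCHED_DEVICE = 1   /* identifier matched, storage was decoded */
};

/* Match the identifier and, on success, decode the hypothetical layout. */
static int mock_match_device(const char *device_type,
                             const uintptr_t storage[3], int *ordinal)
{
    assert(storage != NULL && ordinal != NULL);
    if (device_type == NULL) {
        return MOCK_CPU_BUFFER;
    }
    if (strcmp(device_type, MOCK_DEVICE_NAME) != 0) {
        /* The string is still available for an informative error message. */
        return MOCK_UNKNOWN_DEVICE;
    }
    *ordinal = (int)storage[0]; /* slot 0: device ordinal, by assumption */
    return MOCK_MATCHED_DEVICE;
}
```

Comparing with strcmp rather than by pointer keeps the sketch independent of how uniqueness of the identifier is guaranteed.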
Choice and list of device identifiers
Since this proposal is to use a unique name as a device identifier there is a problem of competing naming and authority to use a canonical name.
Python cannot fully control this, but users specifying an extension should open a documentation PR to Python before they adopt a name, and Python does reserve the right to reject a choice if it may be contested.
For example, using SYCL, CUDA, or HIP as a name is not acceptable without clear consensus.
Experiments could rather use a name like cupy-cuda, even if that may unfortunately mean a transition in the future.
Consumers of such a definition can support multiple definitions, but the producer cannot deprecate theirs, unfortunately.
Since an exporter cannot support multiple devices (except via a global config), care should be taken when designing device specific information.
This PEP is fully backwards compatible. It is also largely forward compatible because many exporters will ignore additional extended flags.
The above sections contain brief notes on backwards and forward compatibility.
Beyond documentation, this PEP requires relatively minor extensions to CPython itself. We propose:
- PyBUF_DEVICE flag
- PyBUF_REQUIRED_FLAGS.
- PyObject_CheckBufferSupports function.
- Py_buffer_extended struct (which may grow in the future).
- PyObject_GetBuffer to zero out flags and ext_flags and to check PyBUF_REQUIRED_FLAGS.
- […] bf_supported_flags correctly.
Definitions which do not touch bf_supported_flags may be backported, e.g. to https://github.com/vstinner/pythoncapi_compat.
Otherwise, the buffer protocol documentation needs to be extended with these definitions, and notes need to be added to all public API functions that cannot work with the new extended flag.
There are no security implications beyond incorrect implementations or use of existing API on buffers requested via the new flags.
As we do wish to integrate this and extend the buffer protocol and do not wish for Python to define details of device support, we are not aware of alternative approaches.
In detail, we considered using a unique symbol rather than a char * device name to identify the unique device.
While this seemed very reasonable, having a string is useful for raising errors when a device type is not understood.
If we need a string for this purpose it seems reasonable to use it for device identification.
Python itself should not need to define exact device types as there are many such devices and they may be complex.
A problem is how to reserve names for new devices and exchange existing definitions. We propose here to have light review by asking for a PR to the Python documentation as well as discussion on the Python Discuss forum for new ideas.
Python cannot strictly police this, although it may reject…
We could already include some reserved space in the Py_buffer_extended struct, which may be nice for future adoption.
However, other than this space coming before the device space there seems little gain in it.
In the future, Py_buffer_extended may grow, even if users do not need the additional space. We consider this to be OK, and users who are concerned by it should vendor the struct definition.
PyBUF_EXTENDED flags
We could add an explicit PyBUF_EXTENDED request flag to indicate exactly that the extended flags are available.
This seemed unnecessary to us for now, but we are happy to do this if it seems clearer or there is a possible future use-case this would simplify.
[1] The concept may also be useful for device data exchange, since knowing that a buffer is only borrowed temporarily can simplify worries about synchronization (where multiple workers might use data at the same time).
[2] I.e. if a consumer calls PyObject_GetBuffer(obj, &buf, PyBUF_DEVICE) with buf being a Py_buffer_extended, it must check (buf.flags & PyBUF_DEVICE) and, if it is not set, must not access the device-specific information and must assume a CPU buffer.