Author: Stephan Hoyer
Status: Informational
This document outlines what NumPy needs to make it possible to write fully-featured data types from Python. We focus on universal functions, since these are NumPy's main abstraction for data-type-dependent functionality.
This arose out of discussion at the NumPy developer meeting in Berkeley, California on November 30, 2018. Thanks to Eric Wieser, Matti Picus, Charles Harris, and Travis Oliphant for sharing their thoughts on parts of this proposal (did I miss anyone?).
Data types ("dtype") in NumPy idenfies the type of the data stored in arrays (e.g., float vs integer). The power of dtypes is that they allows for a clean separation (at least in principle) between shape and data dependent functionality:
concatenate
, reshape
, indexing.add
, sin
, sum
.NumPy up to 1.16 has poor support for custom dtypes. Defining dtype-dependent functionality requires writing inner loops in C, and even the C API is insufficiently flexible for many use cases (e.g., there is no support for functions that depend on metadata stored on dtypes).
NumPy comes with many builtin dtypes, and there are also many use cases for user-defined dtypes. These fall into several categories:
These are all valuable, but here we want to focus on category (3). We think this sort of dtype has the largest need, and would have the largest beneficial impact on the broader NumPy ecosystem.
This NEP intentionally does not address questions of how NumPy's low-level dtype machinery will need to be updated to make it possible to write high-level logical dtypes. These will be the subject of another NEP.
"Duck arrays" are great and we should support them better in NumPy. See NEP-18 for an overview and discussion. Most functionality envisioned for dtypes could also be implemented by duck arrays, but duck arrays require a lot more work: you need to redefine all of the shape functions as well as the dtype functions.
Subclasses of NumPy arrays are widely used and, at least in principle, could satisfy the use-case of extending dtype-specific functions without rewriting shape-dependent functions. Unfortunately, subclasses are a little too flexible: it's easy to write a subclass that violates the Liskov substitution principle. NumPy itself includes several examples of ndarray subclasses that violate substitutability, either by changing shape semantics (for np.matrix) or the behavior of methods (for np.ma.MaskedArray).
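For example, plain indexing already behaves differently for np.matrix; this small snippet (which runs with current NumPy) illustrates the kind of substitutability problem meant here:

import numpy as np

a = np.arange(6).reshape(2, 3)
m = np.matrix(a)

# Indexing a row of a plain ndarray drops a dimension...
print(a[0].shape)  # (3,)

# ...but np.matrix insists on staying 2-D, so shape-generic code
# written against ndarray semantics silently breaks.
print(m[0].shape)  # (1, 3)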
Dtypes provide just the right amount of flexibility to define a new data type, while allowing them to be reliably used from third-party libraries. You need to define new functionality related to the dtype, but shape functions are guaranteed to work exactly like standard NumPy arrays.
The core idea of universal functions is that they are data-dependent but shape-agnostic. Their shared function signature allows them to be processed in standard ways, e.g., with __array_ufunc__.
We should double down on ufuncs, by making all data-dependent behavior in NumPy use ufuncs. See issue 12514 for a list of things that need to change. This isn't to say that every data-dependent function in NumPy needs to be a ufunc: in many cases, functions have esoteric enough signatures (e.g., for shape handling) that they can't fit into the ufunc model. But we should write ufuncs for all the data-dependent functionality within NumPy, and implementing ufuncs for a new dtype should be NumPy's extension mechanism for dtype-specific functionality.
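To illustrate why the shared signature matters, a single hook can intercept every ufunc without knowing anything about individual functions. This sketch works with NumPy >= 1.13 today:

import numpy as np

class LoggingArray(np.ndarray):
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        print('intercepted', ufunc.__name__, method)
        # strip the subclass and defer to the standard implementation
        inputs = tuple(np.asarray(x) for x in inputs)
        return getattr(ufunc, method)(*inputs, **kwargs)

x = np.arange(3.0).view(LoggingArray)
np.sin(x)         # intercepted sin __call__
np.add(x, 1)      # intercepted add __call__
np.add.reduce(x)  # intercepted add reduce

Anything routed through a ufunc is automatically overridable and introspectable this way; anything that is not (e.g., np.median today) is invisible to such hooks.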
Note: This NEP presumes that dtypes will be rewritten as metaclasses, so NumPy scalars will become instances of dtypes. This will be the subject of another NEP; prototyping has started in pull requests #12467 and #12462.
Writing a high-level Python dtype should simply entail writing a new Python type that implements a standard interface. This interface should specify:
- itemsize: how many bytes does each dtype element consume?
- alignment (should be itemsize or smaller).
- How to convert elements to and from Python objects (like dtype=object, which uses NPY_USE_GETITEM), e.g., what should ndarray.item() return?
- Something like __array_ufunc__ to override existing ufuncs (see __dtype_ufunc__ below).
- A __repr__ method.

By design, it is impossible to override arbitrary NumPy functions that act on shapes, e.g., np.concatenate. These should not vary in a dtype dependent fashion.
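For comparison, NumPy's existing (C-level) dtypes already carry this information; a Python-level dtype would need to provide equivalents. This snippet runs with current NumPy (the alignment value is platform dependent, typically 8):

import numpy as np

dt = np.dtype(np.complex128)
print(dt.itemsize)    # 16 bytes per element
print(dt.alignment)   # typically 8
print(np.zeros(1, dt)[0].item())  # conversion to a Python object: 0j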
We propose a more restricted variant of __array_ufunc__ (only for high-level Python dtypes) that does not handle duck arrays, which we'll tentatively call __dtype_ufunc__.
Unlike __array_ufunc__, calling __dtype_ufunc__ overrides should happen at a lower level in the ufunc machinery:
However, __dtype_ufunc__ overrides happen at a higher level than NumPy's existing ufunc implementations:
Drawbacks:

- Yet another level of ufunc dispatch (__array_ufunc__, __dtype_ufunc__ and NumPy's internal machinery).

Consider datetime and timedelta dtypes like NumPy's datetime64/timedelta64. Most operations could be implemented simply by casting to int64 and calling another ufunc on the int64 data, e.g., for np.subtract:
class MyDatetime(metaclass=np.dtype):
    @classmethod
    def __dtype_ufunc__(cls, ufunc, method, *inputs, **kwargs):
        if method != '__call__':
            return NotImplemented
        if ufunc is np.subtract:
            a, b = inputs
            if isinstance(a, cls) and isinstance(b, cls):
                # datetime - datetime -> timedelta
                return (a.view(np.int64) - b.view(np.int64)).view(MyTimedelta)
            elif isinstance(b, MyTimedelta):
                # datetime - timedelta -> datetime
                return (a.view(np.int64) - b.view(np.int64)).view(MyDatetime)
            else:
                return NotImplemented
        # implement other ufuncs
        return NotImplemented

class MyTimedelta(metaclass=np.dtype):
    ...
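Under this proposal, usage might look like the following (nothing here runs today; MyDatetime and MyTimedelta are the hypothetical dtypes defined above):

# int64 seconds reinterpreted as the high-level datetime dtype
dates = np.array([0, 86400], dtype=np.int64).view(MyDatetime)
deltas = np.array([3600, 7200], dtype=np.int64).view(MyTimedelta)

np.subtract(dates, deltas)  # dispatches to MyDatetime.__dtype_ufunc__,
                            # returning an array with dtype MyDatetime
dates - dates               # returns an array with dtype MyTimedelta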
__dtype_ufunc__
NumPy should check for __dtype_ufunc__ attributes after looking for __array_ufunc__ overrides, but before builtin ufunc implementations, e.g.,
def implement_dtype_ufunc(ufunc, method, *inputs, **kwargs):
    outputs = kwargs.get('out', ())
    arrays = inputs + outputs
    # dtype dispatching
    dtypes = [item.dtype
              for item in arrays
              if hasattr(item.dtype, '__dtype_ufunc__')]
    if dtypes:
        for dtype in dtypes:
            # note: each element in inputs is a numpy array
            # or subclass
            result = dtype.__dtype_ufunc__(
                ufunc, method, *inputs, **kwargs)
            if result is not NotImplemented:
                check_result(result, ufunc, inputs)
                return result
        raise TypeError('dtypes did not implement ufunc')
    # base ndarray implementation
    return getattr(ufunc, method)(*inputs, **kwargs)
As part of calling __dtype_ufunc__ overrides, NumPy should verify that the custom ufunc implementation honors appropriate invariants:
def check_result(result, ufunc, inputs):
    # various consistency checks for the result
    if type(result) is not tuple:
        result = (result,)
    if len(result) != ufunc.nout:
        raise ValueError('wrong number of outputs')
    for x in result:
        if not isinstance(x, np.ndarray):
            raise TypeError('wrong result type')
    # TODO: handle gufunc shapes
    expected_shape = np.broadcast(*inputs).shape
    for res in result:
        if res.shape != expected_shape:
            raise ValueError('wrong shape')
Most dtypes need new functions, beyond those that already exist as ufuncs in NumPy. For example, our new datetime type should have functions for doing custom datetime conversions.
Logically, almost all of these operations are element-wise, so they are a good fit for NumPy's ufunc model. But right now it's hard to write ufuncs: you need to define the inner loops at a C level, and sometimes even write or modify NumPy's internal "type resolver" function that determines the proper output type and inner loop function to use given certain input types (e.g., NumPy has hard-coded support for datetime64 in the type resolver for np.add). For user-defined dtypes written in Python to be usable, it should be possible to write user-defined ufuncs in Python, too.
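For context, the options that exist today are limited (this snippet runs with current NumPy): np.frompyfunc does create a real ufunc, but its loop runs element-by-element in Python and always returns object arrays, while np.vectorize is not a ufunc at all, so it is invisible to __array_ufunc__ overrides:

import numpy as np

py_dayofyear = np.frompyfunc(lambda d: d % 365, 1, 1)
print(isinstance(py_dayofyear, np.ufunc))   # True
print(py_dayofyear(np.arange(3)).dtype)     # object

vec_dayofyear = np.vectorize(lambda d: d % 365)
print(isinstance(vec_dayofyear, np.ufunc))  # False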
There are at least three use-cases for writing ufuncs in Python:

1. Like np.vectorize, but actually creating a ufunc. This will not be terribly useful, because it is painfully slow to do inner loops in Python.
2. Wrapping existing performant vectorized functions so they support the full generality of NumPy's ufuncs.
3. Marking functions as overridable via __dtype_ufunc__ or __array_ufunc__. This provides useful introspection options for third-party libraries to build upon, e.g., dask.array can automatically determine how to parallelize such a ufunc.

For usable user-defined ufuncs, case (2) is probably most important. There are lots of examples of performant vectorized functions in user code, but with the exception of trivial cases where non-generalized NumPy ufuncs are wrapped, most of these don't handle the full generality of NumPy's ufuncs.
For NumPy itself, case (3) could be valuable: we have lots of non-ufuncs that could logically fit into the ufunc model, e.g., argmin, median, sort, where, etc.
Note: numba.vectorize does not produce a ufunc currently, but it should.
A ufunc decorator should check arguments, and do broadcasting/reshaping such that the ufunc implementation only needs to handle arrays with one more dimension than the number of "core dimensions" in the gufunc signature. For example:
@ufunc(signature='()->()')
def dayofyear(dates):
    # dates is a 1D numpy array
    return dates.view(np.int64) % 365
or perhaps supporting multiple loops
# the gufunc signature shows nin/nout and dimensionality
@ufunc(signature='()->()')
def dayofyear(dates):
    """Used just for documentation."""

@dayofyear.define_loop([MyDatetime, MyTimedelta], MyDatetime)
def datetime_add(date, delta, out):
    # How do we determine the precise output dtype?
    out[:] = ...

def _resolver(date_dtype, time_dtype):
    return date_dtype

@dayofyear.define_loop(_resolver)
def datetime_add(date, delta):
    return ...

@dayofyear.define_loop([MyDatetime], int)
def dayofyear(dates, out):
    # dates is a 1D numpy array
    out[:] = dates.view(np.int64) % 365

@dayofyear.define_loop([np.generic], int)
def dayofyear(dates):
    ...

# or extracting the dtypes from annotations:
@dayofyear.define_loop
def dayofyear(dates: Array[np.generic]) -> Array[int]:
    ...
This is doing three things:
Why is @ufunc different from vectorize?

- It can be overridden with __array_ufunc__ or __dtype_ufunc__.
- It supports standard ufunc arguments (e.g., where, and axis for gufuncs).

TODO: finish cleaning this up
For each ufunc, we currently have:
This results in hard-wired cases for new dtypes (e.g., np.datetime64) inside type resolver functions, which is not very extensible.
Instead, we might want:

- A low-level dispatch step (like __dtype_ufunc__ but without the overhead of Python calls) that finds a dtype that implements the ufunc for all the given argument dtypes.

We will want to default to using NumPy's current type resolving protocol for current builtin dtypes/ufuncs, i.e., by writing a generic version of __low_level_dtype_ufunc__ to set on builtin dtypes.
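A minimal sketch of that dispatch idea, assuming a hypothetical __low_level_dtype_ufunc__ hook (the name and signature here are illustrative only, not an existing API):

def resolve_loop(ufunc, dtypes):
    # Ask each argument dtype, in order, whether it can provide a loop
    # implementation covering all of the given dtypes.
    for dtype in dtypes:
        resolver = getattr(dtype, '__low_level_dtype_ufunc__', None)
        if resolver is None:
            continue
        impl = resolver(ufunc, dtypes)
        if impl is not NotImplemented:
            return impl
    raise TypeError('no matching loop for %r with %r' % (ufunc, dtypes))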
Some data-dependent functions will not fit the ufunc model cleanly:

- Reductions with extra arguments (e.g., ddof on np.std): like np.mean, perhaps we could define these as ufuncs that have a reduce method but no __call__ method.
- Some functions (e.g., np.linalg.solve) have their own strange casting rules. If we want to support these, we will need some dtype equivalent version of __array_function__.
- Functions with more unusual signatures (e.g., np.busday_count).
- sort, mean, median, etc.; mean will need new axis rules.
- Functions like np.where are vectorized like ufuncs, but np.where can use a generic (non dtype-dependent) inner loop (e.g., np.where.ufunc).

from enum import IntFlag

class DescrFlags(IntFlag):
    # The item must be reference counted when it is inserted or extracted.
    ITEM_REFCOUNT = 0x01
    # Same as needing REFCOUNT
    ITEM_HASOBJECT = 0x01
    # Convert to list for pickling
    LIST_PICKLE = 0x02
    # The item is a POINTER
    ITEM_IS_POINTER = 0x04
    # memory needs to be initialized for this data-type
    NEEDS_INIT = 0x08
    # operations need Python C-API so don't give-up thread.
    NEEDS_PYAPI = 0x10
    # Use f.getitem when extracting elements of this data-type
    USE_GETITEM = 0x20
    # Use f.setitem when setting / creating 0-d arrays from this data-type
    USE_SETITEM = 0x40
    # A sticky flag specifically for structured arrays
    ALIGNED_STRUCT = 0x80
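Since IntFlag members combine with bitwise operators, checking a dtype's flags would look the same as testing the C flag bits today (a small usage sketch):

# Flags combine and test with bitwise operators, mirroring the C usage:
flags = DescrFlags.NEEDS_INIT | DescrFlags.NEEDS_PYAPI
print(bool(flags & DescrFlags.NEEDS_INIT))    # True
print(bool(flags & DescrFlags.USE_GETITEM))   # False

# ITEM_HASOBJECT shares its bit with ITEM_REFCOUNT, so it is an alias:
print(DescrFlags.ITEM_HASOBJECT is DescrFlags.ITEM_REFCOUNT)  # True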
class current_dtype(object):
    itemsize: int
    alignment: int
    byteorder: str
    flags: DescrFlags
    metadata: ...  # unknown

    # getters
    hasobject: bool
    isalignedstruct: bool
    isbuiltin: bool
    isnative: bool

    def newbyteorder(self) -> current_dtype: ...

    # to move to a structured dtype subclass
    names: Tuple[str]
    fields: Dict[str, Union[
        Tuple[current_dtype, int],
        Tuple[current_dtype, int, Any],
    ]]

    # to move to a subarray dtype subclass
    subdtype: Optional[Tuple[current_dtype, Tuple[int, ...]]]
    shape: Tuple[int]
    base: current_dtype

    # to deprecate
    type: Type  # merge with cls
    kind: str
    num: int
    str: str
    name: str
    char: str
    descr: List[...]
class UfuncLoopImpl:
    """An implementation of a ufunc with all the argument types resolved."""
    dtypes: ...
    inner_loop: ...
    context: ...  # opaque data passed through to the inner loop

    def __call__(self, *args_and_outs, where):
        do_setup()
        for args_1d in the_flat_loop(args_and_outs, where):
            self.inner_loop(*args_1d, self.context)
        do_teardown()
class ufunc:
    _resolvers = ...  # List[Tuple[Type[dtype], Function]]

    @memoize
    def _resolve(self, *dtypes):
        assert all(dtypes[:self.nin]), "all input types must be specified"
        dtype_categories = [type(d) for d in dtypes]
        matching_resolvers = [
            (types, func)
            for types, func in self._resolvers
            if all(
                # unspecified output matches anything
                issubclass(d, t) or d is type(None)
                for d, t in zip(dtype_categories, types)
            )
        ]
        # todo: find the "most specific" matching resolver
        types, resolver = matching_resolvers[0]
        impl = resolver(*dtypes)
        assert impl.dtypes[self.nin:].matches_the_requested_dtypes()
        return impl

    def register_resolver(self, outputs):
        if not isinstance(outputs, tuple):
            outputs = (outputs,)
        ...
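To connect the pieces, calling a ufunc would first resolve a loop implementation from the argument dtypes and then execute it. A hypothetical sketch, not working code:

def call_ufunc(ufunc, *args, out=None, where=True):
    # how __call__ might use _resolve and UfuncLoopImpl from the sketches above
    in_dtypes = tuple(a.dtype for a in args)
    out_dtypes = (out.dtype,) if out is not None else (None,) * ufunc.nout
    impl = ufunc._resolve(*in_dtypes, *out_dtypes)
    return impl(*args, out, where=where)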
Example: Implementing multiplication for a unit dtype
# new terminology - a "category" is a class of related dtypes
class UnitCategory(np.dtype):
    base: np.dtype
    units: str

    def __new__(cls, base, units):
        self = super().__new__(cls, ...)
        self.base = base
        self.units = units
        return self

# context is the void pointer passed to C
# this may or may not be the inner loop.
def the_resolved_mul_loop(a, b, out, context):
    base_impl = context
    # this could also have been handled by creating 8 different inner
    # loops and choosing them in the dispatchers; optimization lets us
    # move this as needed
    if isinstance(a.dtype, UnitCategory):
        a = a.view(a.dtype.base)
    if isinstance(b.dtype, UnitCategory):
        b = b.view(b.dtype.base)
    if isinstance(out.dtype, UnitCategory):
        out = out.view(out.dtype.base)
    base_impl.inner_loop(a, b, out)
# could use casting here
@mul.register_resolver
def mul(a_type: UnitCategory, b_type: np.dtype, out_dtype: UnitCategory = None) -> UfuncLoopImpl:
    base_impl = mul._resolve(
        a_type.base, b_type, out_dtype.base if out_dtype else None
    )
    output_type = UnitCategory(base_impl.dtypes[-1], a_type.units)
    return UfuncLoopImpl(
        (a_type, b_type, output_type), the_resolved_mul_loop, base_impl)

@mul.register_resolver
def mul(a_type: np.dtype, b_type: UnitCategory, out_dtype: UnitCategory = None) -> UfuncLoopImpl:
    base_impl = mul._resolve(
        a_type, b_type.base, out_dtype.base if out_dtype else None
    )
    output_type = UnitCategory(base_impl.dtypes[-1], b_type.units)
    return UfuncLoopImpl(
        (a_type, b_type, output_type), the_resolved_mul_loop, base_impl)

@mul.register_resolver
def mul(a_type: UnitCategory, b_type: UnitCategory, out_dtype: UnitCategory = None) -> UfuncLoopImpl:
    base_impl = mul._resolve(
        a_type.base, b_type.base, out_dtype.base if out_dtype else None
    )
    new_units = a_type.units + b_type.units
    if out_dtype and new_units:
        assert out_dtype.units == new_units
    output_type = UnitCategory(base_impl.dtypes[-1], new_units)
    return UfuncLoopImpl(
        (a_type, b_type, output_type), the_resolved_mul_loop, base_impl)
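Hypothetical usage of the resolvers above (nothing here runs today; mul stands for the multiplication ufunc being sketched):

km = UnitCategory(np.dtype(np.float64), 'km')
s = UnitCategory(np.dtype(np.float64), 's')

# mixed unit/plain case hits the first resolver: the result keeps 'km'
impl = mul._resolve(km, np.dtype(np.float64), None)
# impl.dtypes == (km, float64, UnitCategory(float64, 'km'))

# unit/unit case hits the third resolver: units concatenate
impl = mul._resolve(km, s, None)
# impl.dtypes == (km, s, UnitCategory(float64, 'kms'))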
Example: flexible dtype bytes addition
# encoding is harder
class BytesCategory(np.dtype):
    nchars: int

    def __new__(cls, nchars):
        self = super().__new__(cls, ...)
        self.nchars = nchars
        return self

# context is not needed here
def the_resolved_add_loop(a, b, out, context):
    assert context is None
    out[:len(a)] = a
    out[len(a):len(a)+len(b)] = b
    out[len(a)+len(b):] = '\0'

@add.register_resolver(outputs=(BytesCategory,))
def add(a_type: BytesCategory, b_type: BytesCategory) -> UfuncLoopImpl:
    output_type = BytesCategory(a_type.nchars + b_type.nchars)
    return UfuncLoopImpl(
        (a_type, b_type, output_type), the_resolved_add_loop, None)
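Hypothetical usage (add stands for the addition ufunc; fixed-width results grow to hold both inputs):

impl = add._resolve(BytesCategory(4), BytesCategory(3), None)
# impl.dtypes == (BytesCategory(4), BytesCategory(3), BytesCategory(7))
# the loop copies both inputs and zero-pads the rest of each output element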