Inspired by https://hackmd.io/ok21UoAQQmOtSVk6keaJhw?view
Some terminology notes:
dtype
" refers to the PyTypeObject *
with value PyArrayDescr_Type
dtype
" refers to a PyArray_Descr *
with either a static or heap-allocated value, and is an instance of "dtype
"PyTypeObject *
with value Py@NAME@ArrType_Type
, that is a subclass of PyGenericArrType_Type
Py@NAME@ScalarObject *
, which is an instance of "A numpy scalar type"Two orthogonal goals:
dtype
to aid with repesentation of parameterized types (builtin and third-party)In some ways, np.dtype
is becoming a union of all possible data types. For example, it contains the fields (shown as .python
, ->c
):
.fields
(->fields
), .names
(->names
), .isalignedstruct
- only applicable to structured arrays.subdtype
(->subarray->base
), .shape
(->subarray->shape
) - only applicable to subarrays->c_metadata
- only applicable to datetime types, and perhaps third party types.metadata
- not used internally, but accessible to pythonEach of these assume some default value in the cases when they don't apply.
This type of approach does not scale well. Many third party types may want to store their own metadata alongside a dtype (and make it accesible from C), such a quantity type wanting to store a PyObject *unit
. Our c_metadata
slot allows them to do this, but it makes them second-class citizens.
It would be better to allow custom dtypes to freely add their own information at the end of this struct by subclassing. To dogfood ourselves on this approach, we could try to extract structured, subarray, and datetime dtypes into their own dtype subclasses. Expressed in python, this would look like:
This sets out to primarily help users / numpy itself writing parameterizable dtypes, including:
unit: Unit
)values: Tuple[Any]
)encoding: str
)Open questions:
quaternion
[mattip]Maybe mention this holds 4 doubles and overrides all the ufunc methods?
int256
In favor:
Against:
PyArrayDescr
an opaque struct, and requiring users to use PyArrayDescr_KIND(desc)
instead of desc->kind
. After a couple releases of this, we could remove the old API, and start to change the underlying struct layouts.@property
wrappers in the base class.Counter-argument:
dtype
different to a type
(comparing dtype methods to classmethod, dtype attributes to class attributes)? type
has few subclasses, instead opting for customization via tp_*
slots. Why can't we do the same with dtype?dtype
a metaclass, so that np.dtype('i')
is itself a typeThis would:
np.int64
, a scalar type, and np.dtype(np.int64)
, its dtype.dtype.type
to be deprecated, and just return self
my_dtype(0)
instead of my_dtype.type(0)
.Although it opens
TODO: more justification needed here.
TODO