---
tags: Dtype, NumPy
---
# Dtype tasks
After reading the issues, discussions, and [brainstorming](https://github.com/numpy/numpy/wiki/Dtype-Brainstorming), here is my (Matti's) take on what we should do (rough draft). Also see the [ufunc tutorial](http://www.numpy.org/devdocs/user/c-info.ufunc-tutorial.html?highlight=ufunc%20tutorial).
- [ ] Convert the dtype objects [np.int32, np.float16 ...](#Builtin-Numeric-types-by-number) into PyTypeObjects, much like the scalar type objects in `scalartypes.c.src`
- code is in `arraytypes.c.src`,
- np.dtype('int32') will return the `PyDescr_Int32` **type object**
- 2018-11-28: progress in PR [#12462](https://github.com/numpy/numpy/pull/12462)
- will allow subclassing dtypes
- will break people's compiled C code (should be OK once they recompile)
- Scalars should be instances of the appropriate dtype. This will remove dtype.typeobj, and simplify
- `PyArray_CheckAnyScalarExact` (something like `PyCheckObject(obj, PyDescr_Type) && obj->num < NPY_SCALAR`)
- `_typenum_fromobj` (something like `if (PyCheckObject(obj, PyDescr_Type)) {return obj->num;} else {return -1;}`)
- Make sure `offset` and `elsize` can be very large (today they are limited to int32)
- [ ] Provide `ArrFuncs` to be able to extend the structure in the future.
- [ ] Add dunder methods to dtypes
- Should they allow self? Probably not
- needed for ufuncs. Replaces [`PyArray_SetNumericOps`](https://github.com/numpy/numpy/blob/v1.15.3/numpy/core/src/multiarray/number.c#L63)
- Some (most?) of the ufuncs are already dunder methods in `PyNumberMethods`, some come from [`PyArray_ArrFuncs`](https://github.com/numpy/numpy/blob/v1.15.3/numpy/core/src/multiarray/usertypes.c#L93)
- [ ] Pass dtypes in to ufuncs
discussion at [issue 12518](https://github.com/numpy/numpy/issues/12518)
- New inner loop signature, so we will need to support both old and new inner loops (don't mess with optimized stuff)
- by default, inner loop will use dtype methods on arguments
- except for current ufunc type selector mechanism for the 21 built-in dtypes
- MKL can still override some of the 21 built-in float/int types with its own inner loop functions via [`PyUFunc_ReplaceLoopBySignature`](https://docs.scipy.org/doc/numpy/reference/c-api.ufunc.html#c.PyUFunc_ReplaceLoopBySignature) and [`PyUFunc_RegisterLoopForType`](https://docs.scipy.org/doc/numpy/user/c-info.beyond-basics.html#c.PyUFunc_RegisterLoopForType)
- [ ] Allow subclassing dtype from both Python and C
- Uses dunder methods from Python; they go to the slots
- People who want to add attributes like units and categories can do so from Python or C by subclassing
- [ ] How can a dtype cause a change in higher-level functions?
- [ ] Dtypes have to cast to new dtypes and may want to promote others
in calculations.
- Say I create two category dtypes and two arrays `a`,`b` from each. Then `a + b` means concatenation (or does it?). How can the dtype cause that to happen?
- (*Sebastian*:) Currently, dtypes can register new casting functions
(how to coerce/convert), as well as define whether one is "safe" (there
are more complex hooks for scalars possible but probably unused).
*We probably need to add a possibility to add promotion rules.* For
usertypes, promotion rules only exist in the form of multiple ufunc
loops. Possibly, a better concept of promotion rules can replace
"same kind" casting?
- [ ] Object-type dtypes like text, unicode, shapefiles.
- [ ] What operations (slots) are required from a dtype? Is there a protocol like `__array_function__`? Does the default `np.dtype` simply return NotImplemented for all the slots?
- [ ] What about immutability of dtypes? How does that influence using as keys or pickling?
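
The promotion question raised in the list above can be made concrete with a toy, pure-Python model. This is only a sketch under stated assumptions: `ToyDType`, `register_promotion`, and `promote` are hypothetical names, not NumPy API, and the lookup order (identical classes first, then the registered pair in either order, otherwise an error) is just one possible design.

```python
# Toy model of user-registered promotion rules; none of this is NumPy API.
# ToyDType, register_promotion and promote are hypothetical names.


class ToyDType:
    """Minimal stand-in for a dtype class."""
    name = "generic"


class Int32(ToyDType):
    name = "int32"


class Float64(ToyDType):
    name = "float64"


class Category(ToyDType):
    name = "category"


# Registry keyed by an ordered pair of dtype classes.
_PROMOTIONS = {}


def register_promotion(left, right, result):
    """Declare that combining `left` and `right` yields `result`."""
    _PROMOTIONS[(left, right)] = result


def promote(left, right):
    """Return the common dtype class, trying both argument orders."""
    if left is right:
        return left
    for key in ((left, right), (right, left)):
        if key in _PROMOTIONS:
            return _PROMOTIONS[key]
    raise TypeError(f"no promotion rule for {left.name} and {right.name}")


register_promotion(Int32, Float64, Float64)

print(promote(Float64, Int32).name)  # float64
try:
    promote(Category, Int32)
except TypeError as exc:
    print(exc)  # no promotion rule for category and int32
```

In this model a category dtype that wants `a + b` to mean something special would have to register its own rule; without one, the operation fails loudly instead of silently casting.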
### Builtin Numeric types by number
```
In [1]: {i: np.sctypeDict[i].__name__ for i in range(24)}
Out[1]:
{0: 'bool_',
1: 'int8',
2: 'uint8',
3: 'int16',
4: 'uint16',
5: 'int32',
6: 'uint32',
7: 'int32',
8: 'uint32',
9: 'int64',
10: 'uint64',
11: 'float32',
12: 'float64',
13: 'float64',
14: 'complex64',
15: 'complex128',
16: 'complex128',
17: 'object_',
18: 'bytes_',
19: 'str_',
20: 'void',
21: 'datetime64',
22: 'timedelta64',
23: 'float16'}
```
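
Some names repeat in the output above (for example `int32` at both 5 and 7) because the type numbers are assigned per C type, not per bit width. As a reading aid, the sketch below lists the C-level names in the order of NumPy's `NPY_TYPES` enum. The `NpyType` wrapper class is illustrative only and not part of NumPy. Which entries share a name is platform dependent; the session above appears to come from a platform where C `long` is 32 bits and `long double` is the same size as `double`.

```python
from enum import IntEnum


class NpyType(IntEnum):
    """C-level type numbers, following the order of NumPy's NPY_TYPES enum."""
    BOOL = 0
    BYTE = 1
    UBYTE = 2
    SHORT = 3
    USHORT = 4
    INT = 5
    UINT = 6
    LONG = 7
    ULONG = 8
    LONGLONG = 9
    ULONGLONG = 10
    FLOAT = 11
    DOUBLE = 12
    LONGDOUBLE = 13
    CFLOAT = 14
    CDOUBLE = 15
    CLONGDOUBLE = 16
    OBJECT = 17
    STRING = 18
    UNICODE = 19
    VOID = 20
    DATETIME = 21
    TIMEDELTA = 22
    HALF = 23


# Numbers 5 and 7 are distinct C types even when both print as 'int32'.
print(NpyType(5).name, NpyType(7).name)  # INT LONG
```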
- Use cases for reference:
- units,
- enumerations (categories),
- datetime (refactor datetime64?),
- extended numerics
- rational
- fixed-point int/float
- quaternions / geometric algebra
- "conjugate complex" type storing ``x - yi`` as ``{x, y}``
- fixed-size data structures like multi-column data in Pandas
- object types like str, unicode, shapely/GEOS geometries - use raw pointers
- "missing data" sentinels that inherit from the [24 numeric types](#Builtin-Numeric-types-by-number)
- automatic differentiation dual values
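
To make the units use case and the immutability question a little more tangible, here is a minimal pure-Python sketch. It assumes dtypes become ordinary subclassable classes; `ToyFloat64` and `UnitFloat64` are hypothetical stand-ins, not NumPy API, and frozen dataclasses are just one convenient way to get immutable, hashable instances.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class ToyFloat64:
    """Stand-in for a subclassable float64 dtype; not NumPy API."""
    itemsize: int = 8


@dataclass(frozen=True)
class UnitFloat64(ToyFloat64):
    """float64 storage plus a unit label, added purely by subclassing."""
    unit: str = "dimensionless"


metres = UnitFloat64(unit="m")
seconds = UnitFloat64(unit="s")

# Frozen dataclasses compare by value and are hashable, so two equal
# dtype instances resolve to the same dictionary key.
cache = {metres: "loop for metres"}
print(cache[UnitFloat64(unit="m")])  # loop for metres
print(metres == seconds)             # False

try:
    metres.unit = "km"
except FrozenInstanceError:
    print("dtype instances are immutable")
```

Keeping instances immutable is what makes them safe as dictionary keys here; a mutable dtype whose unit could change after insertion would break the lookup.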