---
tags: Dtype, NumPy
---

# Dtype Thoughts

## Goals:

### Clarify the meaning of `np.dtype`

It is [documented](http://www.numpy.org/devdocs/reference/generated/numpy.dtype.html?highlight=dtype#numpy.dtype) as "Create a data type object". But it is a class. When I do `int(100)` it creates an instance of the `int` class, with 100 as one of its attributes. However, `np.dtype('int8')` returns an instance of the `np.dtype` class whose methods and attributes are totally different from those of `np.dtype([('x','int'), ('y', int)])`. So what is `np.dtype`? A class factory? A class `__new__` method?

- It currently calls `PyArrayDescr_Type.tp_new`, which is `arraydescr_new`, a class `__new__`. But the instance it returns depends on the arguments: it may return a `uint8` or a `void`.
- The instances all have separate methods and attributes, so they should be considered instances of separate subclasses.
- The `PyArray_Descr` type is more of a container than a type with methods. For instance, casting relies on a global table (tree?, list of lists?) to resolve casting rules rather than calling a method on the dtype instance. The same holds for result-type resolution for ufuncs.

### Propose a mechanism to subclass existing dtypes

- Refactor along the lines of the NEP proposals in [PR #12660](https://github.com/numpy/numpy/pull/12660) or [PR #12636](https://github.com/numpy/numpy/pull/12660) to allow subclassing. This is actually easier than thinking about what it all means, and has been implemented in [PR #12585](https://github.com/numpy/numpy/pull/12585).
- Requires rethinking `PyArray_Descr` to be more like a `tp_as_number` that is inherited in subclasses (filled in from the class hierarchy when calling `PyType_Ready`).
- Requires rethinking the ufunc lookup rules. For instance, what would happen if we create an int-overflow class that checks int operations for overflow? How would we then look up the ufunc to use?
  Rather than having a global table, we should have a more Pythonic protocol.
- Propose a mechanism to create dtypes in Python; this would need to do something like `np.frompyfunc`.

You cannot override `tp_as_number` slots on instances, and dtypes would behave the same way: once you call `PyType_Ready` on your class, the slots are set.

```
class A(int):
    def __add__(self, other):
        return 'in A.__add__'

def adder(self, other):
    return 'in adder'

a = A(10)
print(a + 20)  # prints "in A.__add__"
# Assigning to the instance does not change the type-level slot
a.__add__ = adder
print(a + 20)  # still prints "in A.__add__"
```

## Discussion on specifying dtypes at array creation

- Specifying dtypes. [This PR](https://github.com/numpy/numpy/pull/5634) suggests extending array creation to `a = np.array(b, min_dtype=np.float32)`, but the discussion then moved on to more exotic dtype specification syntax:
  - `np.array(b, dtype=np.blasable)` for `(s, d, c, z)`
  - `np.array(b, dtype=np.floating)`

  There are also proposals to make the default never convert to an object array, to avoid the `np.array([[1], [2, 2]])` problems. I used `np.array`, but the same holds for `np.asarray` or `np.asanyarray` or ... (Matti)
- Hameer suggests ordering the dtypes and adding casting rules. Would need a flag for abstract?
- Can we do this as a table? A tree? A method on a dtype subclass?
- Use cases: np.blasable, np.complex, np.no-object, np.inexact (float, complex)
- Julia casting tables for type promotion
- Maybe use context managers to modify the casting rules?
- Think about this in the context of overflow

## Appendix

Other documents

- SciPy 2018 [brainstorming session](https://github.com/numpy/numpy/wiki/Dtype-Brainstorming)
- [PR to refactor dtypes](https://github.com/numpy/numpy/pull/12585)
- [NEP design PR](https://github.com/numpy/numpy/pull/12630) (one of many such PRs since the subject got complicated)

### Quaternions

Separate [repo](https://github.com/moble/quaternion) as one C file that

- Calls `PyObject_New(PyArray_Descr, &PyArrayDescr_Type)`
- Assigns all the special `PyArray_Descr` fields including `f` (with a `_PyQuaternion_ArrFuncs` providing functions like `setitem`, `getitem`)
- Registers the dtype via `PyArray_RegisterDataType`, which gives it a `type_num` (Python-level `dtype.num`)
- Defines all the needed ufuncs and registers them via `PyUFunc_FromFuncAndData`, `PyUFunc_RegisterLoopForType`, which requires adding them to `np.add`, `np.subtract`,... This is done for `quat, quat -> quat`, `quat, double -> quat`, `double, quat -> quat`
- Defines extra ufuncs specifically for quaternions and exports them as `np.norm`, `np.normalized`, `np.*parity*`, `np.rot*`
- Registers all the casting functions via `PyArray_RegisterCastFunc`, `PyArray_RegisterCanCast`
- Adds `_eps` and `quaternion` to the top-level `np` namespace

Whew!!

### Rational numbers

Part of the NumPy repo, `src/umath/_rational_tests.c.src`

- Defines a `PyArray_ArrFuncs` named `npyrational_arrfuncs` and fills fields like `setitem`, `getitem`
- Defines a `PyArray_Descr` statically rather than calling `PyObject_New`

  ```
  PyArray_Descr npyrational_descr = {
      PyObject_HEAD_INIT(0)
      ...
      &npyrational_arrfuncs, /* f */
  }
  ```
- Registers the dtype via `PyArray_RegisterDataType`, which gives it a `type_num` (Python-level `dtype.num`)
- Defines all the needed ufuncs and registers them via `PyUFunc_FromFuncAndData`, `PyUFunc_RegisterLoopForType`, which requires adding them to `np.add`, `np.subtract`,...
  This is done for `rational, rational -> rational`
- Registers all the casting functions via `PyArray_RegisterCastFunc`, `PyArray_RegisterCanCast`

## User Stories

- units
  - [unyt](https://github.com/yt-project/unyt) (ndarray subclass)
  - [Pint](https://github.com/hgrecco/pint) (wrapper via [`__array_prepare__` and `__array_wrap__`](https://github.com/hgrecco/pint/blob/master/pint/unit.py#L237), see [comment](https://pint.readthedocs.io/en/latest/numpy.html#comments))
  - astropy.units (ndarray subclass)
  - several others, see https://www.youtube.com/watch?v=N-edLdxiM40
- enumerations / categorical
- text
  - encoded fixed width text (utf8, latin1, ...)
  - variable width
- datetime
  - 360 day calendar
  - Ora: https://github.com/alexhsamuel/ora
- shapely/GEOS geometries
  - does this include jagged arrays of polygons?
- Numerics
  - Novel floating point formats
  - Decimal (arbitrary precision)
  - Big int
  - finite fields
  - Rationals https://github.com/numpy/numpy-dtypes
  - float16?
- Missing values
  - sentinels
  - bitmask
- record-like array
- optional
- quaternion
  - https://github.com/martinling/numpy_quaternion (outdated)
  - https://github.com/moble/quaternion (maintained)
- a general pointer dtype that does memory packagement
- xdress (https://github.com/xdress/xdress)
- ndtypes (https://ndtypes.readthedocs.io/)
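
Returning to the question under Goals of what `np.dtype` is: the following is a minimal plain-Python sketch of the idea that a single constructor's `__new__` could hand back instances of separate subclasses, with the casting rule living on the class as a method instead of in a global table. All names here (`DType`, `Int8`, `Int16`, `Struct`, `can_cast_to`) are hypothetical and chosen for illustration only; this is not NumPy's implementation and not the design from any of the linked PRs.

```python
class DType:
    """Hypothetical stand-in for np.dtype; not NumPy's implementation."""

    def __new__(cls, spec=None):
        if cls is not DType:
            # A subclass was instantiated directly: construct it normally.
            return super().__new__(cls)
        # Called as DType(spec): pick the subclass from the argument.
        if isinstance(spec, list):
            target = Struct  # e.g. [('x', 'int8'), ('y', 'int16')]
        elif spec == 'int8':
            target = Int8
        elif spec == 'int16':
            target = Int16
        else:
            raise TypeError(f"unknown dtype spec: {spec!r}")
        return super().__new__(target)

    def __init__(self, spec=None):
        self.spec = spec

    def can_cast_to(self, other):
        # Default protocol: only identical dtype classes cast safely.
        return type(other) is type(self)


class Int8(DType):
    itemsize = 1

    def can_cast_to(self, other):
        # Widening to Int16 is safe; the rule lives on the class.
        return isinstance(other, (Int8, Int16))


class Int16(DType):
    itemsize = 2


class Struct(DType):
    def __init__(self, spec=None):
        super().__init__(spec)
        # Field names are an attribute only structured dtypes carry.
        self.names = tuple(name for name, _ in spec or [])


a = DType('int8')
b = DType([('x', 'int8'), ('y', 'int16')])
print(type(a).__name__, type(b).__name__, b.names)
print(a.can_cast_to(DType('int16')), DType('int16').can_cast_to(a))
```

In this sketch `DType(...)` behaves like a factory while every result still passes `isinstance(x, DType)`, and a new dtype can change casting behaviour by overriding one method rather than editing shared state.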
