# 2019-09-23 NumPy DType Catch-up

- Time: 11am Pacific Time
- Join via Zoom at https://zoom.us/j/6398421986 (active now)
- [Trello workboard](https://trello.com/b/Azg4fYZH/numpy-at-bids)
- [Previous meetings](https://github.com/BIDS-numpy/docs/tree/master/status_meetings)

**NEP draft hackmd:** https://hackmd.io/kxuh15QGSjueEKft5SaMug

**Present:**

* DTypeInitFromSpec may want something like "subclass"/"inherit" from slot...

# Topics (please add)

* UFunc dispatching
* Casting
* Abstract dtypes
    * nice for ufunc dispatching
    * useful for promotion (int -> float) rules

# Proposal Overview

Proposal
--------

### UFunc Dispatching

* Before calling the inner loop, datatypes need to be fixed (e.g. the length of strings when adding two strings).
* UFuncs need to figure out which "inner loop" to call.
* Type promotion needs to be defined (Float + Int -> Float).

Solution:

1. Create UFuncImpl objects:
    * ``UFuncImpl[Int64, Int64 -> Int64]`` replaces the current "inner-loop" function
    * Handles that the output of ``"a" + "b"`` is the two-letter string ``"ab"``
    * Exposed to Python and operates like a UFunc with a specific signature (probably limited to NumPy arrays).
2. UFuncImpl is registered to the UFunc
    * We have a list of available UFuncImpls (at least typically)
3. Promotion:
    * Register promotion/resolver functions which:
        * Return a UFuncImpl
        * Could use "common type"/promotion and try again
    * Promotion function lookup:
        1. Try all functions
        2. Dispatch based on categories

[mattip] Give the promotion resolver function a name, explain the mechanism to resolve the promotion logic

```python
# Pseudocode for the proposed API (not existing NumPy functions).
@add.register_resolver(Number, Number)
def resolver(ufunc, number1, number2):
    # Promote both inputs to a common DType, then look up the matching loop.
    common = np.common_dtype(number1, number2)
    return ufunc.get_loop[common, common, common]

def get_impl(dtype_cls1, dtype_cls2):
    pass
```

### AbstractDType

DTypes, but cannot be attached to arrays:

* Numeric, Integral, Floating, …

Implementation solves two problems:

1. Answers the question ``isinstance(Int64, Integral)``
    * ``dtype.kind`` is not extensible enough for user dtypes!
    * We currently abuse the scalar type hierarchy for this
2. Registration of promotion functions:
    * Can use dispatching (multiple dispatching) based on AbstractDTypes.
3. Could even allow writing ``arr.astype(np.floating)`` in the future.
4. Unfortunately, something like this is necessary for value-based promotion in UFunc dispatching.

> [mattip] this is awkward and needs a definition of value-based promotion

### DTypes are classes, ``arr.dtype`` is their instance

* Currently: ``arr.dtype.num`` is the type number
* Subclassing may be allowable, but we would need to drop,
* Allows, for example, the definition of ``unit.to_si()`` and gives dtype authors the full flexibility of Python types.
* UFuncs currently dispatch on type number → should be DType class

![overview_hierarchy_and_type_definitions](https://i.imgur.com/cu8xMBA.png)

### Define as much as possible on the DType class

* ``dt_slots`` to define methods and classmethods.
* Casting/promotion goes through dunder-methods/slots, just like Python operators.
* Most important slots:
    * ``__can_cast_from_other__(cls, other, casting) -> CastingImpl``
    * ``__can_cast_to_other__(cls, other, casting) -> CastingImpl``
    * ``__common_dtype__(cls, other) -> DType class``
        * E.g. concatenating String and Float gives a String.
    * ``__common_instance__``:
        * String instances S4 and S8 concatenate to S8.
* Existing, but probably slightly changing "slots":
    * ``dtype_getitem``/``dtype_setitem``
    * sorting... (but these should all morph and currently live somewhere between class and instance)
* Do **not** make the C-struct visible, allowing easy extension and even deprecation (since we can deprecate slots just like Python functions).

### Casting

* Multistep:
    1. Ask both dtype classes
    2. Classes provide a CastingImpl (much like UFuncImpl above)
    3. CastingImpl checks details:
        * Float to "S8" is not possible because the string is too short

    (Non-flexible dtypes can skip 2. and 3.)
* CastingImpl can be easily extended:
    * Return a specialized implementation (performance, we do this internally, replicate the complicated implementation we currently have)

### Iffy points:

* We have to break ABI compatibility in a very minor, back-portable way. Some small incompatibilities are likely.
* Promotion (coercion) between different dtype instances can require multiple steps
* Promotion for value-based casting (`np.array([1], dtype="int8") + 1` to give an `"int8"` array):
    * Requires an AbstractDType for Python integers to store the value!
    * User DTypes may have to deal with it
    * (May provide fallback paths, e.g. to use the smallest possible signed integer)
* Array coercion:
    * Finding the correct DType:
        * Either looping (plus caching) through ``__discover_dtype_from_pytype__`` on all DTypes
        * global dict of Scalar->DType
    * e.g. string coercion is slow because it needs two passes, and the second pass is complex
    * Second pass (currently) creates many new dtype instances (slow, could be cached, or use "adjustment" logic)
* Have to provide compatibility with old slots
    * Sorting, take, casting: all need an API refresh
* Inspecting values during UFunc calls (before the inner loop) should not be possible
    * But does anyone have a need for that?
    * Could be possible to add later (but would be highly discouraged)

### API Considerations:

* **DTypes**
    * Users have to use ``PyArray_InitDTypeMetaFromSpec`` to initialize a DType (from C)
        * Hide implementation
        * Allow easy later addition of new slots
    * Users can create their own DType types
    * We may allow users to create their own DTypeMeta types (maybe not initially)
* **CastingImpl**
    * Initially no new power (except flexible dtypes)
    * Can easily extend/make API public to allow performance optimization (without having to add many slots on the DTypes)
    * Will be UFuncImpl (but does not need to be initially!)
* **UFuncImpl**
    * Need to decide on exact API
        * Loops will need hooks to allow better setup/teardown
        * Allow better error reporting from inside loops.
    * Initially:
        * Possibly not full control initially
    * Easily extensible (after initial dispatching, NumPy could forfeit almost all control)
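
To make the UFunc dispatching proposal above more concrete, here is a minimal pure-Python sketch of the lookup order it describes: first try the registered UFuncImpls for an exact match, then fall back to resolver functions registered on abstract DType categories. This is only a toy model under stated assumptions. Every name in it (`DType`, `Number`, `rank`, `common_dtype`, `UFunc.register_impl`, `UFunc.exact_impl`, and so on) is hypothetical and chosen for illustration; none of it is NumPy API, and the real design would live in C.

```python
# Toy model of the dispatching scheme in these notes.
# All classes and methods here are hypothetical stand-ins, not NumPy API.

class DType:
    """Base class for all toy DTypes."""

class Number(DType):
    """Abstract DType: a category that is never attached to an array."""

class Integral(Number):
    """Abstract DType for integer types."""

class Floating(Number):
    """Abstract DType for floating-point types."""

class Int64(Integral):
    rank = 1  # assumed ordering used only by the toy promotion below

class Float64(Floating):
    rank = 2


def common_dtype(dt1, dt2):
    """Toy promotion rule: the higher-ranked class wins (Int64, Float64 -> Float64)."""
    return dt1 if dt1.rank >= dt2.rank else dt2


class UFuncImpl:
    """One concrete loop for a fixed signature, e.g. (Int64, Int64) -> Int64."""

    def __init__(self, in_dtypes, out_dtype, func):
        self.in_dtypes = tuple(in_dtypes)
        self.out_dtype = out_dtype
        self.func = func


class UFunc:
    def __init__(self, name):
        self.name = name
        self._impls = {}      # exact input DType classes -> UFuncImpl
        self._resolvers = []  # (abstract categories, resolver function)

    def register_impl(self, impl):
        self._impls[impl.in_dtypes] = impl

    def register_resolver(self, *categories):
        def decorator(func):
            self._resolvers.append((categories, func))
            return func
        return decorator

    def exact_impl(self, *dtypes):
        """Look up a registered UFuncImpl without any promotion."""
        try:
            return self._impls[dtypes]
        except KeyError:
            raise TypeError(f"no loop registered for {self.name}{dtypes}") from None

    def get_impl(self, *dtypes):
        # Step 1: exact match among the registered UFuncImpls.
        if dtypes in self._impls:
            return self._impls[dtypes]
        # Step 2: dispatch on abstract categories and let the resolver decide.
        for categories, resolver in self._resolvers:
            if len(categories) == len(dtypes) and all(
                issubclass(dt, cat) for dt, cat in zip(dtypes, categories)
            ):
                return resolver(self, *dtypes)
        raise TypeError(f"no loop or resolver for {self.name}{dtypes}")


add = UFunc("add")
add.register_impl(UFuncImpl((Int64, Int64), Int64, lambda a, b: a + b))
add.register_impl(UFuncImpl((Float64, Float64), Float64,
                            lambda a, b: float(a) + float(b)))


@add.register_resolver(Number, Number)
def number_resolver(ufunc, dt1, dt2):
    # Promote both inputs to a common DType, then require an exact loop for it.
    common = common_dtype(dt1, dt2)
    return ufunc.exact_impl(common, common)


impl = add.get_impl(Int64, Float64)  # no exact loop, so the resolver runs
print(impl.out_dtype.__name__, impl.func(1, 2.5))
```

In this sketch the resolver only consults exact loops after promoting, which avoids re-entering the resolver indefinitely when no loop exists for the common DType. Whether the real mechanism should "try again" through the full dispatch, as the notes suggest it could, is left open here.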
