# 2019-09-23 NumPy DType Catch-up
- Time: 11am Pacific Time
- Join via Zoom at https://zoom.us/j/6398421986 (active now)
- [Trello workboard](https://trello.com/b/Azg4fYZH/numpy-at-bids)
- [Previous meetings](https://github.com/BIDS-numpy/docs/tree/master/status_meetings)
**NEP draft hackmd:** https://hackmd.io/kxuh15QGSjueEKft5SaMug
* DTypeInitFromSpec may want something like "subclass"/"inherit" from slot...
# Topics (please add)
* UFunc dispatching
* Abstract dtypes
* nice for ufunc dispatching
* useful for promotion (int -> float) rules
# Proposal Overview
### UFunc Dispatching
* Before calling inner-loop, datatypes need to be fixed (e.g. length of strings when adding two strings.)
* UFuncs need to figure out which "inner loop" to call
* Type promotion needs to be defined (Float + Int -> Float)
1. Create UFuncImpl objects:
* ``UFuncImpl[Int64, Int64 -> Int64]`` replaces the current "inner-loop" function
* Handle that the output of ``"a" + "b"`` is the two letter string ``"ab"``
* Exposed to Python and operates like a UFunc with a specific signature (probably limited to NumPy arrays).
2. UFuncImpl is registered to UFunc
* We have a list of available UFuncImpls (at least typically)
* Register promotion/resolver functions which:
* Return a UFuncImpl
* Could use "common type"/promotion and try again
* Promotion function lookup:
1. Try all functions
2. Dispatch based on categories
[mattip] Give the promotion resolver function a name, explain the mechanism to resolve the promotion logic

    def resolver(ufunc, number1, number2):
        common = np.common_dtype(number1, number2)
        return ufunc.get_loop[common, common, common]

    def get_impl(dtype_cls1, dtype_cls2):
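
The registration and lookup steps above can be sketched in pure Python. This is only an illustration under stated assumptions: `DType`, `UFunc`, `register_impl`, `register_resolver`, `common_dtype` and the promotion table are hypothetical names, not NumPy API. Only the `UFuncImpl` concept and the resolver idea come from these notes.

```python
class DType:
    """Hypothetical stand-in for a DType class."""

class Int64(DType):
    pass

class Float64(DType):
    pass

# Hypothetical promotion table; the notes only state Float + Int -> Float.
_PROMOTIONS = {frozenset({Int64, Float64}): Float64}

def common_dtype(dt1, dt2):
    """Return the common DType class of two DType classes."""
    if dt1 is dt2:
        return dt1
    try:
        return _PROMOTIONS[frozenset({dt1, dt2})]
    except KeyError:
        raise TypeError(f"no common dtype for {dt1.__name__} and {dt2.__name__}")

class UFuncImpl:
    """One inner loop with a fixed signature, e.g. UFuncImpl[Int64, Int64 -> Int64]."""
    def __init__(self, in_dtypes, out_dtype, loop):
        self.in_dtypes = tuple(in_dtypes)
        self.out_dtype = out_dtype
        self.loop = loop

class UFunc:
    def __init__(self, name):
        self.name = name
        self._impls = {}      # (DType, DType) -> UFuncImpl
        self._resolvers = []  # tried in order when no exact match exists

    def register_impl(self, impl):
        self._impls[impl.in_dtypes] = impl

    def register_resolver(self, resolver):
        self._resolvers.append(resolver)

    def get_impl(self, dt1, dt2):
        impl = self._impls.get((dt1, dt2))
        if impl is not None:
            return impl
        # Lookup strategy 1 from the notes: try all registered functions.
        for resolver in self._resolvers:
            impl = resolver(self, dt1, dt2)
            if impl is not None:
                return impl
        raise TypeError(f"{self.name}: no loop for {dt1.__name__}, {dt2.__name__}")

def common_dtype_resolver(ufunc, dt1, dt2):
    """Promote both inputs to their common dtype and look up again."""
    try:
        common = common_dtype(dt1, dt2)
    except TypeError:
        return None
    return ufunc._impls.get((common, common))

add = UFunc("add")
add.register_impl(UFuncImpl((Int64, Int64), Int64, lambda a, b: a + b))
add.register_impl(UFuncImpl((Float64, Float64), Float64, lambda a, b: a + b))
add.register_resolver(common_dtype_resolver)

impl = add.get_impl(Int64, Float64)  # resolved through promotion
```

Here the mixed `Int64`/`Float64` call has no exact match, so the resolver promotes both inputs and returns the `Float64` loop.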
### AbstractDTypes
AbstractDTypes are DTypes, but cannot be attached to arrays:
* Numeric, Integral, Floating, …
Implementation solves two problems:
1. Answers the question ``isinstance(Int64, Integral)``
* ``dtype.kind`` is not extensible enough for user dtypes!
* We abuse the scalar type hierarchy for this currently
2. Registration of promotion functions:
* Can use dispatching (multiple dispatching) based on AbstractDTypes.
3. Could even allow writing ``arr.astype(np.floating)`` in the future.
4. Unfortunately, something like this is necessary for value-based promotion in UFunc dispatching.
> [mattip] this is awkward and needs a definition of value-based promotion
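
A rough Python sketch of how abstract categories could answer the hierarchy question and drive registration of promotion functions. All names here (`DType`, the `abstract` flag, `attach_dtype`, `register_promoter`, `find_promoter`) are assumptions for illustration. The sketch uses plain subclassing and `issubclass` as a stand-in for the ``isinstance(Int64, Integral)`` check mentioned above, and the lookup is a naive MRO walk, not a full multiple-dispatch algorithm.

```python
class DType:
    abstract = False

# Abstract categories: usable for dispatch, never attached to an array.
class Numeric(DType):
    abstract = True

class Integral(Numeric):
    abstract = True

class Floating(Numeric):
    abstract = True

# Concrete DTypes.
class Int64(Integral):
    abstract = False

class Float64(Floating):
    abstract = False

def attach_dtype(dtype_cls):
    """Mimic attaching a DType to an array; abstract DTypes are rejected."""
    if dtype_cls.abstract:
        raise TypeError(f"{dtype_cls.__name__} is abstract")
    return dtype_cls

_promoters = {}  # (category, category) -> promotion function

def register_promoter(cat1, cat2, func):
    _promoters[(cat1, cat2)] = func

def find_promoter(dt1, dt2):
    """Walk both MROs from most to least specific; return the first match."""
    for c1 in dt1.__mro__:
        for c2 in dt2.__mro__:
            func = _promoters.get((c1, c2))
            if func is not None:
                return func
    raise TypeError(f"no promoter for {dt1.__name__}, {dt2.__name__}")

# Hypothetical rules: Integral with Floating gives the floating dtype.
register_promoter(Integral, Floating, lambda a, b: b)
register_promoter(Numeric, Numeric, lambda a, b: a)

promoted = find_promoter(Int64, Float64)(Int64, Float64)
```

Registering on `Integral` and `Floating` rather than on each concrete type is what lets a user-defined integer DType reuse the same rule.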
### DTypes are classes, ``arr.dtype`` is their instance
* Currently the type number ``arr.dtype.num`` identifies the data type
* Subclassing may be allowable, but we would need to drop,
* Allows, for example, the definition of ``unit.to_si()`` and gives dtype authors the full flexibility of Python types.
* UFuncs currently dispatch on type number → should be DType class
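
To make the class/instance split concrete, here is a small sketch, assuming a hypothetical `Unit` DType whose instances carry the unit and expose a `to_si()` method. `DType`, `FakeArray` and the conversion table are illustrative stand-ins, not NumPy objects.

```python
class DType:
    """Hypothetical stand-in for the proposed DType base class."""

class Unit(DType):
    # Hypothetical conversion factors to the SI base unit (metre).
    _TO_SI = {"m": 1.0, "cm": 0.01, "km": 1000.0}

    def __init__(self, unit):
        if unit not in self._TO_SI:
            raise ValueError(f"unknown unit {unit!r}")
        self.unit = unit  # instance-level parameter

    def to_si(self):
        """Return the SI dtype instance and the factor to scale the data by."""
        return Unit("m"), self._TO_SI[self.unit]

    def __repr__(self):
        return f"Unit({self.unit!r})"

class FakeArray:
    """Minimal array stand-in: ``dtype`` is an instance of a DType class."""
    def __init__(self, data, dtype):
        self.data = list(data)
        self.dtype = dtype

arr = FakeArray([1.0, 2.5], Unit("km"))
si_dtype, factor = arr.dtype.to_si()
si_arr = FakeArray([x * factor for x in arr.data], si_dtype)
```

Dispatch would then key on `type(arr.dtype)` (the class `Unit`), while the unit itself lives on the instance.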
### Define as much as possible on the DType class
* ``dt_slots`` to define methods and classmethods.
* Casting/promotion goes through dunder-methods/slots just like in Python operators.
* Most important slots:
* ``__can_cast_from_other__(cls, other, casting) -> CastingImpl``
* ``__can_cast_to_other__(cls, other, casting) -> CastingImpl``
* ``__common_dtype__(cls, other) -> DType class``
* E.g. concatenating String and Float gives a String.
* String instances S4 and S8 concatenate to S8.
* Existing, but probably slightly changing "slots":
* sorting... (but these should all morph, and currently live somewhere between class and instance)
* Do **not** make the C-struct visible. This allows easy extension, and even deprecation (since we can deprecate slots just like Python functions).
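
The ``__common_dtype__`` slot could behave much like Python's binary operator protocol. The sketch below assumes that a slot returns `NotImplemented` when it cannot handle the other class, and that the instance-level step lives in a hypothetical `common_instance` method; neither detail is fixed by these notes.

```python
class DType:
    @classmethod
    def __common_dtype__(cls, other):
        return NotImplemented

class Float64(DType):
    @classmethod
    def __common_dtype__(cls, other):
        # Float64 only knows about itself.
        return cls if other is cls else NotImplemented

class String(DType):
    def __init__(self, length):
        self.length = length

    @classmethod
    def __common_dtype__(cls, other):
        # Per the notes, combining String and Float gives a String.
        if other is cls or other is Float64:
            return cls
        return NotImplemented

    def common_instance(self, other):
        # S4 and S8 combine to S8: keep the longer length.
        return String(max(self.length, other.length))

def common_dtype(cls1, cls2):
    """Ask both classes in turn, as Python does for binary operators."""
    for first, second in ((cls1, cls2), (cls2, cls1)):
        result = first.__common_dtype__(second)
        if result is not NotImplemented:
            return result
    raise TypeError(f"no common dtype for {cls1.__name__}, {cls2.__name__}")

s8 = String(4).common_instance(String(8))
```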
1. Ask both dtype classes
2. Classes provide CastingImpl (much like UFuncImpl above)
3. CastingImpl checks details:
* Float to "S8" is not possible because string is too short
(Non-flexible dtypes can skip 2. and 3.)
* CastingImpl can be easily extended:
* Return specialized implementation (performance, we do this internally, replicate the complicated implementation we currently have)
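
The three casting steps can be sketched as follows. This is an assumption-laden illustration: the `CastingImpl` constructor, its `can_cast` method, `get_casting_impl` and the minimum string length for floats are all hypothetical; only the slot names and the Float to "S8" example come from the notes.

```python
class CastingImpl:
    """Hypothetical object describing one cast between two DType classes."""
    def __init__(self, from_cls, to_cls, check=None):
        self.from_cls = from_cls
        self.to_cls = to_cls
        self.check = check

    def can_cast(self, from_dt, to_dt):
        """Step 3: inspect the instances (only needed for flexible dtypes)."""
        return True if self.check is None else self.check(from_dt, to_dt)

class DType:
    @classmethod
    def __can_cast_to_other__(cls, other, casting):
        return NotImplemented

    @classmethod
    def __can_cast_from_other__(cls, other, casting):
        return NotImplemented

class Float64(DType):
    pass

class String(DType):
    MIN_FLOAT_CHARS = 32  # hypothetical length needed to hold any float

    def __init__(self, length):
        self.length = length

    @classmethod
    def __can_cast_from_other__(cls, other, casting):
        if other is Float64:
            return CastingImpl(
                other, cls,
                check=lambda f, t: t.length >= cls.MIN_FLOAT_CHARS,
            )
        return NotImplemented

def get_casting_impl(from_cls, to_cls, casting="same_kind"):
    """Steps 1 and 2: ask both classes for a CastingImpl."""
    impl = from_cls.__can_cast_to_other__(to_cls, casting)
    if impl is NotImplemented:
        impl = to_cls.__can_cast_from_other__(from_cls, casting)
    if impl is NotImplemented:
        raise TypeError(f"cannot cast {from_cls.__name__} to {to_cls.__name__}")
    return impl

impl = get_casting_impl(Float64, String)
too_short = impl.can_cast(Float64(), String(8))
long_enough = impl.can_cast(Float64(), String(32))
```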
### Iffy points:
* We have to break ABI compatibility in a very minor, back-portable way. Some small incompatibilities are likely.
* Promotion (coercion) between different dtype instances can require multiple steps
* Promotion for value-based casting (`np.array(..., dtype="int8") + 1` to give an `"int8"` array):
* Requires AbstractDType for python integers to store value!
* User DTypes may have to deal with it
* (May provide fallback paths e.g. to use smallest possible signed integer)
* Array coercion:
* Finding correct DType:
* Either looping (plus caching) through ``__discover_dtype_from_pytype__`` on all DTypes
* Or a global dict of Scalar->DType
* e.g. string coercion is slow because it needs two passes, and second pass is complex
* Second pass (currently) creates many new dtype instances (slow, could be cached, or use "adjustment" logic)
* Have to provide compatibility to old slots
* Sorting, take, casting: All need an API refresh
* Inspecting values during UFunc calls (before the inner loop) should not be possible
* But does anyone have a need for that?
* Could be possible to add later (but would be highly discouraged)
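
The dict-based variant of DType discovery during array coercion might look like this minimal sketch. The mapping, the class names and `discover_dtype` are hypothetical, and promotion of mixed inputs is deliberately left out. It shows why strings need a second pass: the class is known after the first pass, but the length is not.

```python
class Int64:
    pass

class Float64:
    pass

class String:
    def __init__(self, length):
        self.length = length

# Hypothetical global dict mapping Python scalar types to DType classes.
_PYTYPE_TO_DTYPE = {int: Int64, float: Float64, str: String}

def discover_dtype(values):
    """Return a dtype instance for a flat list of Python scalars."""
    # First pass: find the DType class of every element.
    classes = {_PYTYPE_TO_DTYPE[type(v)] for v in values}
    if len(classes) != 1:
        raise TypeError("mixed or empty input is not handled in this sketch")
    (dtype_cls,) = classes
    # Second pass: only flexible dtypes need to look at the values again.
    if dtype_cls is String:
        return String(max(len(v) for v in values))
    return dtype_cls()

str_dt = discover_dtype(["a", "abcd"])
int_dt = discover_dtype([1, 2, 3])
```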
### API Considerations:
* Users have to use ``PyArray_InitDTypeMetaFromSpec`` to initialize DType (from C)
* Hide implementation
* Allow easy later addition of new slots
* Users can create their own DType Types
* We may allow users to create their own DTypeMeta types (maybe not initially)
* Initially no new power (except flexible dtypes)
* Can easily extend/make API public to allow performance optimization (without having to add many slots on the DTypes)
* Will be UFuncImpl (but does not need to be initially!)
* Need to decide on exact API
* Loops will need hooks to allow better setup/teardown
* Allow better error reporting from inside loops.
* Possibly not full control initially
* Easily extensible (after initial dispatching, numpy could forfeit almost all control)
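
As a loose Python analogy to spec-based initialization from C, the sketch below builds a DType class from a dict of slots instead of exposing the class layout. `init_dtype_from_spec`, `_KNOWN_SLOTS` and the metaclass body are assumptions made for illustration and do not describe the eventual C API.

```python
# Slots the hypothetical implementation understands; new ones can be
# added later without changing how users describe their DType.
_KNOWN_SLOTS = {
    "__common_dtype__",
    "__can_cast_from_other__",
    "__can_cast_to_other__",
}

class DTypeMeta(type):
    """Hypothetical metaclass shared by all DType classes."""

class DType(metaclass=DTypeMeta):
    pass

def init_dtype_from_spec(name, slots):
    """Create a new DType class from a name and a mapping of slot functions."""
    unknown = set(slots) - _KNOWN_SLOTS
    if unknown:
        raise TypeError(f"unknown slots: {sorted(unknown)}")
    # Slots are defined on the class, so wrap them as classmethods.
    namespace = {slot: classmethod(func) for slot, func in slots.items()}
    return DTypeMeta(name, (DType,), namespace)

MyInt = init_dtype_from_spec(
    "MyInt",
    {"__common_dtype__": lambda cls, other: cls},
)
```

Because callers only pass named slots, the internal representation stays hidden and unknown slot names fail early.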