# 2019-09-23 NumPy DType Catch-up
- Time: 11am Pacific Time
- Join via Zoom at https://zoom.us/j/6398421986 (active now)
- [Trello workboard](https://trello.com/b/Azg4fYZH/numpy-at-bids)
- [Previous meetings](https://github.com/BIDS-numpy/docs/tree/master/status_meetings)
**NEP draft hackmd:** https://hackmd.io/kxuh15QGSjueEKft5SaMug
**Present:**
* ``DTypeInitFromSpec`` may want something like "subclass"/"inherit" for slots...
# Topics (please add)
* UFunc dispatching
* Casting
* Abstract dtypes
    * nice for ufunc dispatching
    * useful for promotion (int -> float) rules
# Proposal Overview
Proposal
--------
### UFunc Dispatching
* Before calling the inner loop, datatypes need to be fixed (e.g. the length of the output string when adding two strings).
* UFuncs need to figure out which "inner loop" to call
* Type promotion needs to be defined (Float + Int -> Float)
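
The first point can be made concrete with a toy sketch in plain Python (no NumPy) of an ``add`` implementation for fixed-length strings. ``StringDType``, ``StringAddImpl`` and ``resolve_descriptors`` are hypothetical names, not NumPy API; the sketch only shows that the output descriptor (its length) is fixed before the inner loop runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringDType:
    """Toy fixed-length string DType instance (hypothetical)."""
    length: int


class StringAddImpl:
    """Toy UFuncImpl for add(String, String) -> String (hypothetical)."""

    def resolve_descriptors(self, a: StringDType, b: StringDType) -> StringDType:
        # The output length must be known before the inner loop runs.
        return StringDType(a.length + b.length)

    def inner_loop(self, xs, ys):
        # Element-wise concatenation; results fit the resolved output length.
        return [x + y for x, y in zip(xs, ys)]

    def __call__(self, xs, ys, a_dt, b_dt):
        out_dt = self.resolve_descriptors(a_dt, b_dt)
        return self.inner_loop(xs, ys), out_dt


impl = StringAddImpl()
result, out_dt = impl(["a", "xy"], ["b", "z"], StringDType(2), StringDType(1))
print(result, out_dt)  # ['ab', 'xyz'] StringDType(length=3)
```
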
Solution:
1. Create UFuncImpl objects:
    * ``UFuncImpl[Int64, Int64 -> Int64]``, replacing the current "inner-loop" function
    * Handle that the output of ``"a" + "b"`` is the two-letter string ``"ab"``
    * Exposed to Python and operate like a UFunc with a specific signature (probably limited to NumPy arrays).
2. UFuncImpls are registered with the UFunc
    * Each UFunc has a list of available UFuncImpls (at least typically)
3. Promotion:
    * Register promotion/resolver functions which:
        * Return a UFuncImpl
        * Could use the "common type"/promotion and try again
    * Promotion function lookup:
        1. Try all functions
        2. Dispatch based on categories
[mattip] Give the promotion resolver function a name, and explain the mechanism used to resolve the promotion logic
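```python
# Proposed API, not existing NumPy: ``register_resolver``, the abstract
# ``Number`` DType, ``np.common_dtype`` and ``get_loop`` are all hypothetical.
@add.register_resolver(Number, Number)
def resolver(ufunc, number1, number2):
    # Promote both input DTypes to a common one, then return the
    # UFuncImpl registered for that exact signature.
    common = np.common_dtype(number1, number2)
    return ufunc.get_loop[common, common, common]

def get_impl(dtype_cls1, dtype_cls2):
    pass
```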
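
A runnable toy version of steps 2 and 3 of the solution, assuming plain Python classes stand in for DType classes. ``UFunc``, ``register_impl``, ``resolve`` and ``common_dtype`` are hypothetical names, not NumPy API. The resolver mirrors the one in the proposal snippet: it promotes the inputs to a common DType and returns the loop registered for that exact signature.

```python
class DType:
    """Root of a toy DType class hierarchy (hypothetical)."""


class Number(DType): pass
class Integral(Number): pass
class Floating(Number): pass
class Int64(Integral): pass
class Float64(Floating): pass


def common_dtype(a, b):
    # Toy stand-in for the proposed ``np.common_dtype``.
    if a is b:
        return a
    if {a, b} == {Int64, Float64}:
        return Float64
    raise TypeError(f"no common DType for {a.__name__} and {b.__name__}")


class UFunc:
    def __init__(self, name):
        self.name = name
        self.impls = {}      # (in1, in2, out) DType classes -> loop function
        self.resolvers = []  # (abstract input DTypes, resolver function)

    def register_impl(self, signature, loop):
        self.impls[signature] = loop

    def register_resolver(self, *abstract_types):
        def decorator(func):
            self.resolvers.append((abstract_types, func))
            return func
        return decorator

    def resolve(self, *dtypes):
        # 1. Exact match on the concrete input DType classes.
        for signature, loop in self.impls.items():
            if signature[:-1] == dtypes:
                return loop
        # 2. Otherwise, the first resolver whose abstract DTypes match the inputs.
        for abstract_types, resolver in self.resolvers:
            if len(abstract_types) == len(dtypes) and all(
                issubclass(d, a) for d, a in zip(dtypes, abstract_types)
            ):
                return resolver(self, *dtypes)
        names = ", ".join(d.__name__ for d in dtypes)
        raise TypeError(f"{self.name}: no loop for ({names})")


add = UFunc("add")
add.register_impl((Int64, Int64, Int64), lambda x, y: x + y)
add.register_impl((Float64, Float64, Float64), lambda x, y: float(x) + float(y))


@add.register_resolver(Number, Number)
def resolver(ufunc, number1, number2):
    # Promote to a common DType, then look up the exact-signature loop.
    common = common_dtype(number1, number2)
    return ufunc.impls[(common, common, common)]


print(add.resolve(Int64, Float64)(1, 2.5))  # 3.5
```

This sketch tries exact matches first and then resolvers in registration order, which is one simple variant of "try all functions"; dispatching on the most specific category would be the other option listed in the solution.
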
### AbstractDType
DTypes that cannot be attached to arrays:
* Numeric, Integral, Floating, …

The implementation addresses several points:
1. Answers the question ``isinstance(Int64, Integral)``
    * ``dtype.kind`` is not extensible enough for user dtypes!
    * We currently abuse the scalar type hierarchy for this (e.g. ``np.issubdtype(dtype, np.integer)``)
2. Registration of promotion functions:
    * Can use (multiple) dispatching based on AbstractDTypes.
3. Could even allow writing ``arr.astype(np.floating)`` in the future.
4. Unfortunately, something like this is necessary for value-based promotion in UFunc dispatching.

> [mattip] this is awkward and needs a definition of value-based promotion
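
A minimal sketch of abstract DTypes as ordinary Python classes, assuming a toy ``DTypeMeta`` metaclass that refuses to instantiate abstract classes (so they can never become an ``arr.dtype``). All names are illustrative. Because DTypes here are classes, the category check is an ``issubclass`` test on the class or an ``isinstance`` test on a DType instance, and a user DType joins a category simply by subclassing, with no ``dtype.kind`` character needed.

```python
class DTypeMeta(type):
    """Toy metaclass shared by all DType classes, concrete or abstract."""

    def __call__(cls, *args, **kwargs):
        # Abstract DTypes cannot be instantiated, so no array can ever
        # have one of them as its ``arr.dtype``.
        if cls.__dict__.get("_abstract", False):
            raise TypeError(f"cannot instantiate abstract DType {cls.__name__}")
        return super().__call__(*args, **kwargs)


class DType(metaclass=DTypeMeta):
    _abstract = True


class Numeric(DType):
    _abstract = True


class Integral(Numeric):
    _abstract = True


class Floating(Numeric):
    _abstract = True


class Int64(Integral):
    pass


class Float64(Floating):
    pass


class MyInt128(Integral):
    """A user DType joins the hierarchy just by subclassing."""


print(issubclass(Int64, Integral))    # True
print(isinstance(Int64(), Integral))  # True (a DType instance, like arr.dtype)
print(issubclass(Float64, Integral))  # False
print(issubclass(MyInt128, Numeric))  # True

try:
    Integral()
except TypeError as exc:
    print(exc)  # cannot instantiate abstract DType Integral
```
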
### DTypes are classes, ``arr.dtype`` is their instance
* Currently, DTypes are identified by the type number ``arr.dtype.num``
* Subclassing may be allowable, but we would need to drop,
* Allows, for example, the definition of ``unit.to_si()``, and gives dtype authors the full flexibility of Python types.
* UFuncs currently dispatch on the type number → they should dispatch on the DType class
![overview_hierarchy_and_type_definitions](https://i.imgur.com/cu8xMBA.png)
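
A minimal sketch of the class/instance split, assuming a hypothetical ``Unit`` user DType whose instances carry the unit as a parameter; the conversion table and all names are illustrative. It shows a method such as ``to_si()`` living on the DType, and a loop table keyed on the DType class rather than on a type number.

```python
from dataclasses import dataclass


class DTypeMeta(type):
    """Toy metaclass: every DType class is an instance of it."""


class DType(metaclass=DTypeMeta):
    pass


# Assumed conversion table, for illustration only: unit -> (SI unit, factor).
TO_SI = {"km": ("m", 1000.0), "ms": ("s", 0.001), "m": ("m", 1.0), "s": ("s", 1.0)}


@dataclass(frozen=True)
class Unit(DType):
    """Hypothetical user DType; each instance carries its unit as a parameter."""
    unit: str

    def to_si(self):
        # A regular Python method, possible because DTypes are classes.
        si_unit, factor = TO_SI[self.unit]
        return Unit(si_unit), factor


dt = Unit("km")          # plays the role of ``arr.dtype``
print(type(dt) is Unit)  # True: the DType class is ``type(arr.dtype)``
print(dt.to_si())        # (Unit(unit='m'), 1000.0)

# Loops keyed on the DType class instead of an integer type number.
loops = {Unit: "unit-aware add loop"}
print(loops[type(dt)])   # unit-aware add loop
```
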
### Define as much as possible on the DType class
* ``dt_slots`` to define methods and classmethods.
* Casting/promotion goes through dunder-methods/slots, just like Python operators.
* Most important slots:
    * ``__can_cast_from_other__(cls, other, casting) -> CastingImpl``
    * ``__can_cast_to_other__(cls, other, casting) -> CastingImpl``
    * ``__common_dtype__(cls, other) -> DType class``
        * E.g. concatenating String and Float gives a String.
    * ``__common_instance__``:
        * String instances S4 and S8 concatenate to S8.
* Existing, but probably slightly changing "slots":
    * ``dtype_getitem``/``dtype_setitem``
    * sorting... (but these should all morph, and currently live somewhere between class and instance)
* Do **not** make the C-struct visible, allowing easy extension and even deprecation (since we can deprecate slots just like Python functions).
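
A minimal sketch of the slot-based promotion protocol, assuming toy Python classes in place of C slots. ``__common_dtype__`` is asked on the first class and then on the second, with ``NotImplemented`` meaning "ask the other side", as Python does for ``__add__``/``__radd__``; ``__common_instance__`` then picks the instance (``S4`` and ``S8`` give ``S8``). ``common_dtype``, ``Float64`` and ``String`` are illustrative only.

```python
class DType:
    @classmethod
    def __common_dtype__(cls, other):
        # Default: only the same DType class is "common"; otherwise defer.
        return cls if other is cls else NotImplemented


class Float64(DType):
    pass


class String(DType):
    def __init__(self, length):
        self.length = length

    def __repr__(self):
        return f"S{self.length}"

    @classmethod
    def __common_dtype__(cls, other):
        # String can absorb Float64 (e.g. when concatenating).
        if other is cls or other is Float64:
            return cls
        return NotImplemented

    def __common_instance__(self, other):
        # For two String instances the longer one holds both.
        return self if self.length >= other.length else other


def common_dtype(a, b):
    """Ask ``a`` first, then ``b``, like ``__add__``/``__radd__``."""
    result = a.__common_dtype__(b)
    if result is NotImplemented:
        result = b.__common_dtype__(a)
    if result is NotImplemented:
        raise TypeError(f"no common DType for {a.__name__} and {b.__name__}")
    return result


print(common_dtype(Float64, String).__name__)    # String
print(String(4).__common_instance__(String(8)))  # S8
```
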
### Casting
* Multistep:
    1. Ask both dtype classes
    2. Classes provide a CastingImpl (much like UFuncImpl above)
    3. CastingImpl checks details:
        * Float to "S8" is not possible because the string is too short

    (Non-flexible dtypes can skip steps 2 and 3.)
* CastingImpl can be easily extended:
    * Return a specialized implementation for performance (we already do this internally; this would replicate the complicated implementation we currently have)
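
The multistep casting check can be sketched as follows, assuming toy classes and a hypothetical ``CastingImpl`` with a ``can_cast`` method; returning ``None`` from a slot stands for "no cast known". The 32-character minimum for casting a float to a string is an assumption made for the example.

```python
class CastingImpl:
    """Hypothetical object returned by the casting slots (step 2)."""

    def __init__(self, check=None):
        self._check = check  # instance-level check; None for non-flexible DTypes

    def can_cast(self, from_dt, to_dt):
        return self._check is None or self._check(from_dt, to_dt)


class DType:
    @classmethod
    def __can_cast_to_other__(cls, other, casting):
        return None  # None means "I don't know how to cast to ``other``"

    @classmethod
    def __can_cast_from_other__(cls, other, casting):
        return None


class Float64(DType):
    pass


class String(DType):
    def __init__(self, length):
        self.length = length

    @classmethod
    def __can_cast_from_other__(cls, other, casting):
        if other is Float64:
            # Assumption: a float64 may need up to 32 characters as text.
            return CastingImpl(check=lambda f, s: s.length >= 32)
        return None


def can_cast(from_dt, to_dt, casting="safe"):
    # ``casting`` only matches the proposed slot signature; unused in this toy.
    from_cls, to_cls = type(from_dt), type(to_dt)
    # Step 1: ask both DType classes for a CastingImpl.
    impl = from_cls.__can_cast_to_other__(to_cls, casting)
    if impl is None:
        impl = to_cls.__can_cast_from_other__(from_cls, casting)
    if impl is None:
        return False
    # Step 3: the CastingImpl checks instance details such as string length.
    return impl.can_cast(from_dt, to_dt)


print(can_cast(Float64(), String(8)))   # False: "S8" is too short
print(can_cast(Float64(), String(32)))  # True
```
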
### Iffy points:
* We have to break ABI compatibility in very minor, back-portable ways. Some small incompatibilities are likely.
* Promotion (coercion) between different dtype instances can require multiple steps
* Promotion for value-based casting (`np.array([1], dtype="int8") + 1` to give an `"int8"` array):
    * Requires an AbstractDType for Python integers that stores the value!
    * User DTypes may have to deal with it
        * (May provide fallback paths, e.g. to use the smallest possible signed integer)
* Array coercion:
    * Finding the correct DType:
        * Either looping (plus caching) through ``__discover_dtype_from_pytype__`` on all DTypes
        * Or a global dict mapping scalar types to DTypes
    * E.g. string coercion is slow because it needs two passes, and the second pass is complex
        * The second pass (currently) creates many new dtype instances (slow; could be cached, or use "adjustment" logic)
* We have to provide compatibility with the old slots
    * Sorting, take, casting: all need an API refresh
* Inspecting values during UFunc calls (before the inner loop) should not be possible
    * But does anyone need that?
    * Could be added later (but would be highly discouraged)
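
Value-based promotion means the result DType depends on the *value* of a Python scalar, not only on its type: in the ``np.array([1], dtype="int8") + 1`` example, ``1`` fits in ``int8``, so the array keeps ``int8``, while a value outside the ``int8`` range would require a larger type. A minimal sketch, assuming only signed integers, DTypes represented by their names for brevity, and a hypothetical ``PyIntAbstractDType`` that stores the value and implements the "smallest possible signed integer" fallback:

```python
# Toy signed integer DTypes, smallest first: name -> number of bits.
SIGNED_INTS = {"int8": 8, "int16": 16, "int32": 32, "int64": 64}


class PyIntAbstractDType:
    """Hypothetical abstract DType for a Python int that stores its value."""

    def __init__(self, value):
        self.value = value

    def smallest_signed(self):
        # Fallback path: the smallest signed integer DType holding the value.
        for name, bits in SIGNED_INTS.items():
            if -(2 ** (bits - 1)) <= self.value <= 2 ** (bits - 1) - 1:
                return name
        raise OverflowError(f"{self.value} does not fit in int64")


def promote(array_dtype, scalar):
    """Value-based promotion of a signed integer array DType with a Python int."""
    needed = PyIntAbstractDType(scalar).smallest_signed()
    # Keep the array's DType unless the value needs a larger one.
    return max(array_dtype, needed, key=SIGNED_INTS.__getitem__)


print(promote("int8", 1))     # int8
print(promote("int8", 1000))  # int16
```
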
### API Considerations:
* **DTypes**
    * Users have to use ``PyArray_InitDTypeMetaFromSpec`` to initialize a DType (from C)
        * Hides the implementation
        * Allows easy later addition of new slots
    * Users can create their own DType types
        * We may allow users to create their own DTypeMeta types (maybe not initially)
* **CastingImpl**
    * Initially no new capabilities (except for flexible dtypes)
    * Can easily be extended/made public API to allow performance optimizations (without having to add many slots to the DTypes)
    * Will be a UFuncImpl (but does not need to be initially!)
* **UFuncImpl**
    * Need to decide on the exact API
    * Loops will need hooks to allow better setup/teardown
    * Allow better error reporting from inside loops.
    * Initially:
        * Possibly not full control
        * Easily extensible (after initial dispatching, NumPy could forfeit almost all control)
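
``PyArray_InitDTypeMetaFromSpec`` is a C-level function in the proposal; the Python sketch below only mimics the idea behind the **DTypes** bullets and the earlier point about not exposing the C-struct. Slots are passed as a spec, unknown slots are rejected, missing slots receive defaults, and a deprecated slot triggers a warning. ``init_dtype_from_spec``, the slot names and the defaults are all hypothetical.

```python
import warnings

# Hypothetical slot names this toy "library" knows about.
KNOWN_SLOTS = {"common_dtype", "common_instance", "getitem", "setitem", "legacy_sort"}
DEPRECATED_SLOTS = {"legacy_sort"}


def _default_common_instance(self, other):
    return self


DEFAULT_SLOTS = {"common_instance": _default_common_instance}


class DTypeMeta(type):
    """Toy metaclass; the slot storage is private to the library."""


def init_dtype_from_spec(name, spec):
    """Toy analogue of spec-based DType initialization (names hypothetical).

    Slots arrive as a mapping instead of a filled-in struct, so the
    internal layout stays hidden, new slots can get defaults later, and
    old slots can be deprecated with a warning.
    """
    unknown = set(spec) - KNOWN_SLOTS
    if unknown:
        raise ValueError(f"unknown slots: {sorted(unknown)}")
    for slot in DEPRECATED_SLOTS & set(spec):
        warnings.warn(f"slot {slot!r} is deprecated", DeprecationWarning, stacklevel=2)
    slots = {**DEFAULT_SLOTS, **spec}
    return DTypeMeta(name, (), {"_slots": slots})


MyDType = init_dtype_from_spec("MyDType", {"getitem": lambda self, i: i})
print(sorted(MyDType._slots))  # ['common_instance', 'getitem']
```
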