Note that I continued parts of this in new documents which are not yet online, but this should still be relevant because it lists many things (including some smaller side notes).
Prepared Document for Meeting: https://hackmd.io/B5TPPP-8QiKOFODPqmXtfA
TODO Document restructuring/new document
I will probably try to split the document up into:
- Requirements (possibly split into different sections)
- Decisions that we probably need to make
- Either as part of the decisions, or separately, some suggestions for them.
It may be interesting to have some python dummy implementation…
NumPy Dtype Requirements and Approaches
This is a work in progress to try to structure my thoughts.
It may be a bit random and some of these are probably premature.
Please feel free to change/edit this!
Nomenclature
Depending on how types/instances are created, things can get confusing, so I would suggest the following terms:
- dtype: a class (whether or not it is a Python type)
- descriptor (or a dtype): the object tagged onto the array (an instance of a dtype)
- scalar type: a type which, when instantiated, provides the scalar instance
- scalar: an instance of a scalar type
Just to note, the names below are just working names; they would of course all need an `__array_` or `__numpy_` prefix.
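As a concrete illustration of these terms with current NumPy:

```python
import numpy as np

arr = np.arange(3, dtype="float64")
descriptor = arr.dtype           # descriptor: the instance tagged onto the array
dtype_class = type(descriptor)   # dtype: the class (np.dtype, or a subclass of it)
scalar_type = descriptor.type    # scalar type: np.float64
scalar = arr[0]                  # scalar: an instance of the scalar type
assert isinstance(scalar, scalar_type)
```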
Please try to list all related documents here (in case I missed some):
Descriptor (dtype instance) Requirements
- Provides an immutable instance acting as the descriptor tagged onto the array.
  - Maybe for some dtypes with their own storage management this would not be true (but maybe it would be, since if the storage changes it is probably some kind of shared storage of the dtype class and not of the descriptor).
- Dtype comparison and hash
- Allow overriding:
  - Representation
  - slots (ArrFuncs), from Python and C
  - Inner loop implementations
    - Probably at instantiation time, so that a Python callback can be set at the "slot" level (I guess this is what Python does as well).
- Casting and promotion rules
  - Decision needed: How do subclasses inherit these?
  - Should a new type easily be allowed to make an impossible loop possible? (thoughts below depend on this!)
    - Say a unit dtype could define `timedelta * timedelta -> unit[timedelta**2]`, even though datetime itself does not know this may be plausible.
    - Thinking more about this: probably not immediately, but if we allow registering arbitrary loops, it becomes possible (depends on who we ask).
    - Is it OK if import order/loop-adding order can affect results?
  - It would be nice to stay close to `__array_ufunc__` when it comes to the implementation.
    - Assuming we go that way, we probably need caching?
  - If we have a `unit[m]`, is the casting handled on the `unit` level?
  - Currently I do not think promotion rules really exist aside from `result_type` (and its less-than-ideal scalar rules); see the short example after this list.
    - Promotion rules may be ufunc dependent, but `np.result_type` does try to handle them generically.
    - While value-based promotion sucks, it would be good to be able to do it?! (also affects possible caching)
  - Promotion could be handled by the ufunc loop:
    - Input dtypes may not realize a loop exists?
    - Could ask `np.result_type` for a common type (promotion) on failure (this is what Julia does, I think).
  - An `np.result_type`-like common type operation is also needed, e.g. for `concatenate`.
    - Casting tables could solve this (a safe-casting graph of depth 1?)
    - Ask all involved dtypes (but what if there is a more awesome one which knows how to cast them all, say a unit which understands ints and datetimes)?
- Ufunc hook:
  - Dtypes should be able to reuse existing inner loops.
    - Do we need to expose them somehow (including to Python), or is it enough to have a way to do this for subclassing?
  - But they need to be able to run setup and teardown code?
    - Reason: units need to check the inputs and find the output descriptor. However, after they have done this, they can simply use the normal inner loops.
    - Currently, we use the type resolver ufunc functions for this. It needs to be possible to inject logic here for custom dtypes!
  - Dtype hooks run after `__array_ufunc__` (just noting; it seems logical and pretty obvious).
- Storage of metadata (e.g. similar to units)
- Storage of additional data for ragged arrays/variable-length strings.
  - "Reference counting"
- Associated with a scalar type; this is currently done through `dtype.type` (so it already exists).
- Extensible ArrFunctions:
  - Example: new sorting implementations (the current timsort hack).
- We must be free to add new slots/methods to dtypes (although sometimes we should likely move them to functions).
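A short example of the current promotion entry points mentioned above (existing NumPy behavior, including the value-based scalar logic):

```python
import numpy as np

np.result_type(np.int8, np.int64)    # dtype('int64')
np.result_type(np.int8, 1)           # dtype('int8'): value-based for the Python scalar
np.promote_types(np.int8, np.uint8)  # dtype('int16'): pure dtype promotion (safe-cast graph)
```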
Other Possible Requirements:
- dtype parsing capability, for `dtype="MyType[unit]"`. (Frankly, I am not convinced of this at all, but it can wait in any case. Mentioned in https://github.com/numpy/numpy/wiki/Dtype-Brainstorming)
- "fused types" on the python level as mentioned by Matti and
this PR:
* np.array(b, dtype=np.blasable)
for (s, d, c, z)
* np.array(b, dtype=np.floating)
Are these like ABCs that types register to? Is this similar to
a flexible type?
They could also be used for casting…
- `np.load` and `np.save` would require pickling for user dtypes, which seems very annoying.
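For reference, abstract scalar types such as `np.floating` already exist in the scalar type hierarchy; the "fused type" idea would roughly mean also accepting them as `dtype=` arguments (the `np.blasable` name above is hypothetical):

```python
import numpy as np

# The abstract hierarchy exists today:
np.issubdtype(np.dtype("float32"), np.floating)  # True
isinstance(np.float64(1.0), np.floating)         # True
# But passing an abstract type as dtype= is deprecated/ill-defined;
# the "fused type" idea would make it select a matching concrete dtype.
```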
List of (current) ArrFunction slots/attributes
- get- and setitem (Python/PyObject coercion casting)
- copyswap and copyswapn (copying elements, with optional byte swapping)
- compare (comparison function, like the Python `__cmp__` slot)
- argmax/argmin
- dot product
- Scanfunc (parsing an ASCII file)
- fromstr (parse a single string)
- nonzero
- arange (FillFunc) implementation
- fill with scalar (fillwithscalar)
- Sorting and Argsorting implementations (fixed size)
- "Fast" functions:
Other slots:
- casting dictionary (castdict)
- Casting rules:
  - ScalarKindFunc (same-kind casting)
  - cancastscalarkindto (which scalar kinds can be cast to)
Flags
- Item Refcount
- Has Object
- Convert to list for pickling (TODO: What is this?)
- Is pointer: the item is a pointer (for extension types?)
- Needs Init (e.g. objects need initialization, no "empty")
- Needs PyAPI
- Use getitem/setitem for extracting a 0-d array from a scalar
  - Rational/user types in general need to define this, I think?
  - Object arrays use this
- (Aligned Struct)
Possibly nice to haves?
- `isinstance(scalar, descriptor)` could be nice, but that does not necessarily mean that `descriptor is scalar_type`?
  - We already have `(dtype|descriptor).type`, which may be easier to think of in any case.
- I am also wondering if we can make such things as a `decimal_dtype`, in a sense specializing Python types (which would mostly mean asserting the output type and possibly exposing some of its methods).
- A place to put dtype specific ufuncs?
- (Seb.) A convenient/standard place to put something like methods?
      MyDate.normalize(arr)             # Use dtype namespace?
      arr.for_each_element.normalize()  # Could expose as a method-like (thought for later)
  - Should also work for operators/dunder methods.
Proposals
The specific proposals are outdated and need some thought.
- Let's not worry about making the descriptor a type (and thus the same as scalars). This could probably be added later, but I/we are not convinced it is a good idea.
- The exact steps still need to be decided, but:
  - We should aim to keep close to other APIs (`__array_ufunc__`).
- (More?)
Proposed API for dtype subclassing. This is really brainstorming right now:
class dtype(object):
    def __promote__(self, other):
        """Used by np.promote_types; identical to result_type,
        but without the scalar logic.
        """
        if isinstance(other, type(self)):
            # make sure to give priority to subclasses:
            self, other = other, self
        if self.__can_cast__(other):
            return other
        if other.__can_cast__(self):
            return self
        raise TypeError("cannot promote to a common type.")
class unit(dtype):
    itemsize = 8
    type = pyunit

    def __new__(cls):
        # what is specifically needed here?
        pass

    def __item_unpack__(self, val):
        """How to do this as low-level loops?"""
        return self.type.from_bytes(val)

    def __item_pack__(self, obj):
        if not isinstance(obj, self.type):
            raise ValueError
        return obj.to_bytes()

    @classmethod  # not sure?
    def __promote__(cls, dt1, dt2):
        """By default, checks can_cast(dt1, dt2)."""
        return np.result_type(dt1, dt2)

    @classmethod
    def __can_cast__(cls, dt1, dt2, casting="safe"):
        return True or False  # or unit["m"], loop?

    @classmethod
    def __get_loop__(cls, ufunc, in_types, out_types):
        """
        Alternatively, register loops and:
        * check exact loops -> promote -> check loops again
        * (loop could still refuse)
        """
        if not_possible:
            return NotImplemented
        return UfuncLoop
class UfuncLoop:
    # probably largely C-slots, but could be filled from Python
    inner_loop = None
    setup_loop = None      # e.g. FloatClearErr
    teardown_loop = None   # e.g. FloatCheckErr
    needs_api = False      # flag (or allow setup to set/override?)
    identity = NotImplemented
    # more flags to add
    # Other or even extendable things?:
    specialized_inner_loops = ()  # contiguous (copy code), AVX?
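A minimal sketch of how `np.promote_types` could defer to the `__promote__` slot above (hypothetical dispatch, loosely mirroring the binary operator protocol):

```python
def promote_types(dt1, dt2):
    # Ask both descriptors, first argument first; in the sketch above,
    # a dtype that cannot handle the pairing raises TypeError.
    for a, b in [(dt1, dt2), (dt2, dt1)]:
        try:
            return a.__promote__(b)
        except TypeError:
            continue
    raise TypeError(f"cannot promote {dt1} and {dt2} to a common type")
```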
Casting inner loops basically seem like "Type1 -> Type2" ufuncs, so whatever API we end up using, unary ufunc calls and the final casting call should likely look identical. In fact, they could probably be identical.
Promotion is necessary if there is no ufunc loop for the specific types. Pushing it to the objects may also allow hacking in the value-based promotion that we are currently stuck with.
We do have some "special" promotion rules currently hacked in, first
thing that comes to mind is integer addition using at least long
precision.
Should check current TypeResolution
functions for other special cases.
The question is how to split this up. We currently have `np.result_type`, and it would be nice if that could just call the `__promote__` logic. This is also used elsewhere, e.g. in `concatenate` (see the example below).
If the promotion step gets additional information, it could handle most of the things done in setup (see below), making the default setup basically a no-op.
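For example, `concatenate` already relies on this common-type operation today:

```python
import numpy as np

a = np.zeros(2, dtype=np.int32)
b = np.zeros(2, dtype=np.float32)
np.result_type(a.dtype, b.dtype)  # dtype('float64')
np.concatenate([a, b]).dtype      # dtype('float64'), via the same promotion
```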
Other things to keep in mind:
- Flexible dtypes may be an issue. If promotion does not know about the ufunc, it may have to return a generic string (or something like a generic-string user dtype).
- One possible thought: have `dtype=IntWithUnit`, but the unit is still flexible? Note that this is likely such a corner case that I am not sure we have to worry about it.
- Flexible types are an option to handle promotion though: `unit * unit -> unit`, and the ufunc setup-time check decides which unit.
- Special rule for the addition reduction (sum)! This is hardcoded "promotion"/loop-selection logic, which is hard to represent (unless you pass `method="reduce"` to some dtype/loop-related setup).
Existing Implementations:
Julia and XND
Julia:
- Generally uses "exact" signatures.
- Promotion rules can be registered: type1, type2 → typeX, with `promote`.
  - `promote` can promote any number of arguments, of course.
- The super-type (`Number`) implementation of functions (math operators):
  - is only used if no more specific implementation is found
  - calls `promote(*args)` and tries again (a rough sketch of this pattern follows below).
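A rough Python sketch of that fallback pattern (all names here are hypothetical), matching the "check exact loops → promote → check loops again" idea from the `__get_loop__` docstring above:

```python
def resolve_loop(ufunc, dtypes):
    # 1. look for an exactly matching registered loop:
    loop = ufunc.loops.get(dtypes)       # hypothetical loop registry
    if loop is not None:
        return loop
    # 2. no exact match: promote to a common type and try once more:
    common = promote_types_many(dtypes)  # hypothetical n-ary promotion
    loop = ufunc.loops.get((common,) * len(dtypes))
    if loop is None:
        raise TypeError("no matching loop, even after promotion")
    return loop                          # (the loop could still refuse)
```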
xnd-project:
- Does not know about promotion as such.
- gumath seems to simply stuff promotion into the ufunc loops, i.e. you register all loops that you may want to use. (I do not think there is casting involved at all in function calls? For xnd, the interesting part is probably resolving the shape part of the datashape.)
Details about casting
We currently have the following casting levels:
- unsafe casting
- safe casting
- same-kind casting
- equivalent (byte order changes)
- no casting
And Python coercion (which can be seen as unsafe casting to/from PyObject):
- to PyObject (a special type of casting used by `item`)
- from PyObject (maybe these are identical to unsafe casting)
I think it may be possible to get around `same_kind` casting, but maybe there is also not much reason for it.
There is also the concept of construction, which we may also need for the `np.array(..., dtype=dtype)` logic (compare Julia's reasoning), although possibly it can be seen as unsafe casting?
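The existing casting levels can be queried with `np.can_cast`:

```python
import numpy as np

np.can_cast(np.int64, np.float64, casting="safe")         # True
np.can_cast(np.float64, np.float32, casting="safe")       # False
np.can_cast(np.float64, np.float32, casting="same_kind")  # True
np.can_cast(np.float64, np.int64, casting="same_kind")    # False
np.can_cast(np.float64, np.int64, casting="unsafe")       # True
```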
Other Notes:
- Should the inner-loop dispatcher be allowed to override casting? This may be useful, e.g. to allow unsafe casting when the existing loop you want to reuse has, for example, a more precise output dtype.
Issues/Discussion?:
- Inheritance/chained casting: `MyInt(int)` could inherit casting from `int`, but often that probably does not make sense. However, unsafe casting may make sense.
- Type1 → Type2 and Type2 → Type3 may be defined, but not Type1 → Type3. (Probably we should just not – accidentally – allow chaining these?)
- Dtype discovery for `np.array` (unsupported for user types?)
Details about setup/teardown
At some point, we have to discover which inner loop to use and give
the ufunc a chance to set other information:
- Discover the correct output dtype (to get there, promotion of the input dtypes may be necessary!).
  - Since some loops may be mixed and others are not (timedelta * float is OK, but timedelta * timedelta is not), it seems like pushing this into promotion may be simpler (promotion would know the ufunc!?).
- Return the inner-loop function (in some form or another).
  - It could be plausible to just return inner-loop types, such as "f,f->f", which together with the ufunc name defines the inner loop.
  - More likely: expose `np.add.loop["f,f->f"]` using some PyCapsule-style wrapper object.
- Set whether the inner loop requires the Python API (maybe tagged onto the inner-loop object itself; e.g. Python-implemented ones will always need it, ours never will).
- Run additional setup code:
  - Set up working memory
  - Clear error flags
- Set up a teardown function to (see the sketch after this list):
  - Free working memory
  - Check error flags and give warnings, or raise errors
- Kwargs to ufuncs, forwarded to inner loops?
  - Parameter-type arguments (e.g. a precision argument)
    - Resolved during ufunc setup?
  - Broadcastable arguments seem difficult/out of scope (example: `np.clip(arr, minval=None, maxval=None)`)
- `arr.view(new_dtype)` is buggy if types use their own storage area:
  - Could have a flag/reuse HASREF (probably have to?)
  - (Other things to keep in mind?)
  - Depending on the type, this may or may not make sense? (object → object is OK, but for a type with metadata and refs it can probably go both ways)
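A minimal sketch of the setup/run/teardown sequence (hypothetical API, building on the `UfuncLoop` sketch above):

```python
def run_loop(loop, data):
    # setup: e.g. allocate working memory, clear floating point error flags
    state = loop.setup_loop(data) if loop.setup_loop else None
    try:
        # run the inner loop until finished (or until a stop flag is given)
        loop.inner_loop(data, state)
    finally:
        if loop.teardown_loop:
            # teardown: free working memory, check error flags,
            # give warnings or raise errors
            loop.teardown_loop(state)
```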
API
What needs to happen if we call a ufunc (a pseudo-Python sketch follows the list):
- Ask the dtypes if they implement a ufunc loop that should be used.
  - The type resolution step returns the loop which should be used, and thus the output dtype (although that could still be cast in principle).
  - If none is found: an additional promotion step here (like Julia)?
- Decide if casting the input to the output can be handled.
- Run the inner loop:
  - Allow for inner-loop specific setup
  - Run the inner loop until finished or a stop flag is given
  - Allow for inner-loop specific teardown
→ What type of access should we give these steps? E.g. access to the values?
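In pseudo-Python, the whole call sequence could look roughly like this (all names are hypothetical; `resolve_loop` and `run_loop` are the sketches from earlier sections):

```python
import numpy as np

def call_ufunc(ufunc, *arrays):
    in_dtypes = tuple(arr.dtype for arr in arrays)
    # 1. ask the dtypes/ufunc for a loop (promoting on failure):
    loop = resolve_loop(ufunc, in_dtypes)
    # 2. decide whether casting the inputs to the loop's dtypes is OK
    #    (loop.in_types is a hypothetical attribute):
    for actual, wanted in zip(in_dtypes, loop.in_types):
        if not np.can_cast(actual, wanted, casting="same_kind"):
            raise TypeError(f"cannot cast {actual} to {wanted}")
    # 3. setup -> inner loop -> teardown:
    return run_loop(loop, arrays)
```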
Currently:
- The TypeResolver of the ufunc gets run (these currently do not really know about user types):
  - This typically calls `ResultType` to find the output type
  - (often) does a linear search over the existing loops (unless the function has a specific TypeResolver)
- The loop selector is run (this is set on the ufunc object):
  - Finds the actual loop
  - Can force `needs_api` (otherwise the iterator will decide)
- The ufunc machinery decides on what casting is necessary
- Runs the loop:
  - Runs the loop until it finishes (except breaking on PyErrors)
  - Checks for floating point errors (when done)
Main issue: everything is tagged onto the ufunc object, so the ufunc object would have to ask the dtypes specifically.
C-API vs. Python API
Some thoughts:
Py-API:
- Make wrapping elementwise functions into inner loops easy.
  - Probably similar to `np.frompyfunc`, except it would return some capsule.
  - Type annotations could be nice to use here as well.
- Provide a way to register/pass in existing inner loops (or, say, Cython-defined inner loops) easily.
  - It should be OK to have a Python object that implements a few fast loops in Cython and tags them on from Python at instantiation time?
- Something like PyCapsule, or NpyInnerLoopCapsule?:
  - Requires an API flag?
  - Leave room for other flags? (e.g. optimization hints or even alignment requirements)
From the Python side, it may be an alternative to simply view the array and then call `np.add` again; OTOH, that may force some consistency checking/wrapping to make sure the Python side cannot just return a wrong dtype or shape.
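For comparison, `np.frompyfunc` already wraps a Python callable into a ufunc today, but it always produces object-dtype output; the idea above would instead return a typed inner-loop capsule:

```python
import numpy as np

hypot_obj = np.frompyfunc(lambda a, b: (a * a + b * b) ** 0.5, 2, 1)
out = hypot_obj(np.arange(3.0), np.arange(3.0))
out.dtype  # dtype('O'): everything is boxed as Python objects
```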
C-API:
- Inner-loop registration/capsuling should be similar.
- Probably need to make the existing inner loops available as "capsules"?
- …
Arguments for and against making descriptors types
Pro:
- Feels logical for simple scalars, and `float32_arr.dtype(3)` and `isinstance(float32_arr[0], float32_arr.dtype)` are nice.
Against:
- It is probably possible to change this later without a large compatibility break.
- The implementation needs metaclasses, which are somewhat harder to reason about (although it is probably not very hard either).
- Say I want to create a dtype for `decimal.Decimal`. It is not possible to change `Decimal` to add numpy-specific information, so off-loading it into a dtype/descriptor with `npdecimal.type is Decimal` seems easier?
- (There is at least some discussion about whether scalars are even a good idea within numpy.)
Ufunc properties
- Ufuncs have an `identity`; should this move to the loop implementation (or be overridable)? The loop implementation knows the correct output type. (It could live on the dtype, but that seems unnecessary/strange?)
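For reference, `identity` currently lives on the ufunc object itself:

```python
import numpy as np

np.add.identity       # 0, used e.g. by np.add.reduce([]) -> 0.0
np.multiply.identity  # 1
np.maximum.identity   # None: reducing an empty array raises instead
```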
Ufunc signatures
See: https://github.com/numpy/numpy/issues/12518
The ufunc inner loops are pretty limited right now. I am (personally) not in favor of bloating the API too much, but we may want to add some things to them while we are at it:
- Possibly a return value to signal errors or a stop condition?
- Better payload/metadata values to use for custom dtypes; these may already fit into the current pointer we have, but are definitely necessary.
- Possibly more fields related to/in gufuncs?
Plausibly, the old ufuncs could receive a very lightweight wrapper.