Note that I continued part of this in new documents which are not yet online, but this document should still be relevant because it lists many things (including some smaller side notes).

Prepared Document for Meeting: https://hackmd.io/B5TPPP-8QiKOFODPqmXtfA

TODO Document restructuring/new document

I will probably try to split the document up into:

  1. Requirements (possibly split into different sections)
  2. Decisions that we probably need to make
  3. Either in 2. or separately, some suggestions for it.

It may be interesting to have some python dummy implementation.

NumPy Dtype Requirements and Approaches

This is a work in progress to try to structure my thoughts.
It may be a bit random and some of these are probably premature.

Please feel free to change/edit this!

Nomenclature

Depending on how types/instances are created, things can be confusing, so I would suggest the following terms:

  • dtype: a class (whether an actual Python type or not)
  • descriptor (or a dtype): the object tagged onto the array (an instance of a dtype)
  • scalar type: a type which, when instantiated, provides the scalar instance
  • a scalar: an instance of a scalar type.
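
For reference, a minimal sketch of how these terms map onto current NumPy:

import numpy as np

arr = np.arange(3, dtype="float64")
descr = arr.dtype                      # the descriptor tagged onto the array
assert isinstance(descr, np.dtype)     # np.dtype plays the role of the dtype class
scalar = arr[0]                        # a scalar
assert isinstance(scalar, descr.type)  # descr.type is the scalar type (np.float64)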

Just to note, the names below are just working names, and they would all need an __array_ or __numpy_ prefix of course.

Please try to list all related documents here (in case I missed some):

Descriptor (dtype instance) Requirements

  • Provides an immutable instance acting as the descriptor tagged to the array
    • Maybe for some dtypes with their own storage management
      this would not hold (then again, it probably would, since
      if the storage changes it is likely some kind of shared
      storage belonging to the dtype class and not to the descriptor)
  • Dtype comparison and hash
  • Allow overriding:
    • Representation
    • slots (ArrFuncs) from python and C.
    • Inner loop implementations
      • Probably at instantiation time, so that a python
        callback can be set at the "slot" level (I guess this is
        what python does as well).
  • Casting and promotion rules
    • Decision needed: How do subclasses inherit these?
    • Should a new type be allowed to make an impossible loop
      possible easily? (thoughts below depend on this!)
      • Say a unit dtype could define timedelta * timedelta -> unit[timedelta**2],
        even though datetime itself does not know that this is plausible.
      • Thinking more about this: probably not immediately, but
        if we allow registering arbitrary loops, it becomes
        possible (depends on who we ask).
    • Is it OK if import order/loop-adding order can affect results?
    • It would be nice to stay close to __array_ufunc__
      when it comes to implementation.
      • Assuming we go that way, we probably need caching?
      • If we have a unit[m], is the casting handled at the
        unit level?
    • Currently I do not think promotion rules really exist
      aside from "result_type" (and its less-than-ideal scalar
      rules).
      • Promotion rules may be ufunc dependent, but np.result_type
        does try to handle them generically.
      • While value-based promotion is painful, it would be good to be
        able to support it?! (this also affects possible caching)
      • Promotion could be handled by the ufunc loop:
        • Input dtypes may not realize a loop exists?
        • Could fall back to asking np.result_type for a common type
          (promotion) on failure (this is what Julia does, I think)
      • A np.result_type-like common type operation is also needed
        • i.e. for concatenate
        • Casting tables could solve this (safe-casting graph of depth 1?)
        • Ask all involved dtypes (but what if there is a more capable one
          which knows how to cast them all, say a unit which understands
          ints and datetime)? See the sketch after this list.
  • Ufunc hook:
    • Dtypes should be able to reuse existing inner-loops.
      • Do we need to expose them somehow (including to python),
        or is it enough to have a way to do this for subclassing?
    • But they need to be able to run setup and teardown code?
      • Reason: units need to inspect the inputs and find the
        output descriptor. However, after they have done this, they
        can simply use the normal inner-loops.
      • Currently, we use the type resolver ufunc
        functions for this. It needs to be possible to
        inject logic here for custom dtypes!
    • Dtype hooks run after __array_ufunc__
      (just noting seems logical and pretty obvious).
  • Storage of metadata (e.g. similar to units)
  • Storage of additional data for ragged arrays/variable-length strings.
  • "Reference counting"
  • Association with a scalar type; this is currently done through dtype.type
    (so it already exists).
  • Extensible ArrFunctions:
    • Example: New sorting implementations
      (current timsort hack).
    • We must be free to add new slots/methods to dtypes
      (although sometimes we should likely move them to functions)
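
A rough sketch of the "ask all involved dtypes" idea for the common type operation mentioned above (all names, in particular __common_dtype__, are hypothetical placeholders):

def common_dtype(*dtypes):
    # Hypothetical: give every involved dtype a chance to propose a common
    # type, so that a dtype which "understands" all the others (say a unit
    # dtype that knows ints and datetimes) can win.
    for dt in dtypes:
        result = dt.__common_dtype__(dtypes)
        if result is not NotImplemented:
            return result
    raise TypeError("no common dtype for {}".format(dtypes))
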
Other Possible Requirements:
  • dtype parsing capability, for dtype="MyType[unit]".
    (Frankly, I am not convinced of this at all, but it can wait
    in any case. Mentioned in https://github.com/numpy/numpy/wiki/Dtype-Brainstorming)
  • "fused types" on the python level as mentioned by Matti and
    this PR:
    * np.array(b, dtype=np.blasable) for (s, d, c, z)
    * np.array(b, dtype=np.floating)
    Are these like ABCs that types register to? Is this similar to
    a flexible type?
    They could also be used for casting
  • np.load and np.save would require pickling for user dtypes,
    that seems very annoying.
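
A minimal sketch of the ABC reading of "fused types", using Python's abc machinery (Floating here is a hypothetical python-level stand-in, not an existing NumPy object):

import abc
import numpy as np

class Floating(abc.ABC):
    """Hypothetical abstract dtype category on the python level."""

Floating.register(np.float32)
Floating.register(np.float64)

assert issubclass(np.float64, Floating)
# np.array(b, dtype=Floating) would then have to pick a concrete member.
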
List of (current) ArrFunction slots/attributes
  • get- and setitem (Python/PyObject coercion casting)
  • copyswap and copyswapn (copying dtype)
  • compare (comparison function, like Python __cmp__ slots)
  • argmax/argmin
  • dot product
  • scanfunc (parsing an ASCII file)
  • fromstr (parse a single string)
  • nonzero
  • arange (FillFunc) implementation
  • fill with scalar (fillwithscalar)
  • Sorting and Argsorting implementations (fixed size)
  • "Fast" functions:
    • clip, putmask, take
Other slots:
  • casting dictionary (castdict)
    • Cast between user types
  • Casting rules:
    • ScalarKindFunc (same kind casting)
    • can cast scalar kind to
Flags
  • Item Refcount
  • Has Object.
  • Convert to list for pickling (TODO: What is this?)
  • Is pointer: Item is a pointer (For extension types?)
  • Needs Init (e.g. objects need initialization, no "empty")
  • Needs PyAPI
  • Use Getitem/Setitem for extracting the 0-D array from a scalar
    • Rational/Usertypes in general need to define this I think?
    • Object arrays use this
  • (Aligned Struct)

Possibly nice to haves?

  • isinstance(scalar, descriptor) could be nice, but that
    does not necessarily mean that descriptor is scalar_type?
    • We already have (dtype|descriptor).type, which may be
      easier to think about in any case.
    • I am also wondering if we can make things such as a
      decimal_dtype, in a sense specializing python types
      (which would mostly mean asserting the output type and
      possibly exposing some of its methods)
  • A place to put dtype specific ufuncs?
  • (Seb.) A convenient/standard place to put something like methods?
    MyDate.normalize(arr)  # Use dtype namespace?
    # Could expose as a method-like (thought for later)
    arr.for_each_element.normalize()
    
    • Should also work for Operators/dunder methods.

Proposals

The specific proposals are outdated and need some more thought.

  1. Let's not worry about making the descriptor a type (and thus the same as scalars). This could probably be added later, but I/we are not convinced it is a good idea.
  2. The exact steps still need to be decided, but:
    • We should aim to keep close to other APIs (__array_ufunc__)
  3. (More?)

Proposed API for subclassing numpy dtypes. This is really brainstorming right now:


import numpy as np


class dtype(object):
    def __promote__(self, other):
        """Used by np.promote_types; identical to result_type
        but without the scalar logic.
        """
        if isinstance(other, type(self)):
            # make sure to give priority to subclasses:
            self, other = other, self

        if self.__can_cast__(other):
            return other
        if other.__can_cast__(self):
            return self

        raise TypeError("cannot promote to a common type.")


class unit(dtype):
    itemsize = 8
    type = pyunit  # placeholder: some python-level unit scalar type

    def __new__(cls):
        # what is specifically needed here?
        pass

    def __item_unpack__(self, val):
        """How to do this as low level loops?"""
        return self.type.from_bytes(val)

    def __item_pack__(self, obj):
        if not isinstance(obj, self.type):
            raise ValueError
        return obj.to_bytes()

    @classmethod  # not sure?
    def __promote__(cls, dt1, dt2):
        """By default, checks can_cast(dt1, dt2)."""
        return np.result_type(dt1, dt2)

    @classmethod
    def __can_cast__(cls, dt1, dt2, casting="safe"):
        return True  # or False, or unit["m"], or a loop?

    @classmethod
    def __get_loop__(cls, ufunc, in_types, out_types):
        """
        Alternatively, register loops and:
            * check exact loops → promote → check loops again
            * (loop could still refuse)
        """
        if not_possible:  # pseudocode condition
            return NotImplemented
        return UfuncLoop


class UfuncLoop:
    # probably largely C-slots, but could be filled from python
    inner_loop = None
    setup_loop = None     # e.g. FloatClearErr
    teardown_loop = None  # e.g. FloatCheckErr
    needs_api = False     # flag (or allow setup to set/override?)
    identity = NotImplemented
    # more flags to add

    # Other or even extendable things?:
    specialized_inner_loops = ()  # contiguous (copy code), AVX?

Casting inner loops basically look like "Type1->Type2" ufuncs,
so whatever API we end up using, unary ufunc calls and the final
casting call should likely look identical. In fact, they could
probably be identical.
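
A sketch of that symmetry (CastLoop is a hypothetical name, mirroring the UfuncLoop sketch above):

class CastLoop(UfuncLoop):
    # A cast is structurally a one-input/one-output loop, just like a
    # unary ufunc loop such as np.negative's "d->d".
    def __init__(self, from_dtype, to_dtype, inner_loop):
        self.signature = (from_dtype, to_dtype)
        self.inner_loop = inner_loop  # same calling convention as a ufunc loop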

Details about promotion

Promotion is necessary if there is no ufunc loop for the specific types.
Pushing it to the objects may also allow hacking in the value-based
promotion that we are currently stuck with.

We do have some "special" promotion rules currently hacked in, first
thing that comes to mind is integer addition using at least long precision.

Should check current TypeResolution functions for other special cases.

The question is how to split it up. We currently have np.result_type, and
it would be nice if that could just call the __promote__ logic.
This is also used elsewhere, e.g. in concatenate.

If the promotion gets additional information, it could handle most
of the things done in setup (see below), making the default setup
basically a no-op.

Other things to keep in mind:
  • Flexible dtypes may be an issue. If promotion does not know about
    the ufunc, it may have to return a generic string (or something like a
    generic string user dtype).
    • One possible thought: have dtype=IntWithUnit but the unit still
      is flexible? Note that this is likely such a corner case that
      I am not sure we have to worry about it.
    • Flexible types are an option to handle promotion though:
      • unit * unit -> unit, and the ufunc setup time check
        decides which unit.
  • Special rule for addition reduction (sum)! This is hardcoded "promotion"
    loop selection logic, which is hard to represent (unless you pass method="reduce"
    to some dtype/loop related setup; see the sketch below).
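
A sketch of ufunc-aware promotion that could absorb the hardcoded sum rule (promote_for_ufunc and its signature are hypothetical):

import numpy as np

def promote_for_ufunc(ufunc, method, dtypes):
    # Hypothetical hook: promotion that knows the ufunc and the call method,
    # so special rules need not be hardcoded in the ufunc machinery.
    if ufunc is np.add and method == "reduce":
        # simplified version of the current rule: bool/int sums use at
        # least the default integer precision (unsigned ints analogous)
        dtypes = [np.promote_types(dt, np.int_) if dt.kind in "bi" else dt
                  for dt in dtypes]
    result = dtypes[0]
    for dt in dtypes[1:]:
        result = np.promote_types(result, dt)
    return result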

Existing Implementations:

Julia and XND
Julia:
  • Generally uses "exact" signatures
  • promotion rules can be registered as type1, type2 → typeX with promote
    • promote can promote any number of arguments, of course
  • Supertype (Number) implementations of functions (math operators):
    • are only used if no more specific implementation is found
    • call promote(*args) and try again
xnd-project:
  • Does not know about promotion as such.
  • gumath seems to simply fold promotion into the ufunc
    loops, i.e. you register all loops that you may want to use.
    (I do not think there is casting involved in function calls at all?
    For xnd, the interesting part is probably resolving the shape part
    of the datashape.)
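
Julia's promote-and-retry fallback, sketched in Python (find_exact_loop and promote are placeholders for the registered lookup and promotion rules):

def call(op, *args):
    # try the exact signature first; otherwise promote all arguments
    # with the registered rules and try once more
    loop = find_exact_loop(op, tuple(type(a) for a in args))
    if loop is None:
        args = promote(*args)
        loop = find_exact_loop(op, tuple(type(a) for a in args))
    if loop is None:
        raise TypeError("no matching loop found")
    return loop(*args)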

Details about casting

Note the different levels of casting. We currently have:

  • unsafe casting
  • safe casting
  • same kind casting
  • equivalent (byte order changes)
  • no casting
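
For reference, how these levels behave with the current np.can_cast:

import numpy as np

assert np.can_cast("f8", "f4", casting="unsafe")
assert not np.can_cast("f8", "f4", casting="safe")   # would lose precision
assert np.can_cast("f8", "f4", casting="same_kind")  # both are floats
assert np.can_cast("<i4", ">i4", casting="equiv")    # byte order change only
assert not np.can_cast("<i4", ">i4", casting="no")   # not the identical dtype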

In addition, there is python coercion (which can be seen as unsafe casting to/from PyObject):

  • to PyObject (a special type of casting used by item)
  • from PyObject (maybe these are identical to unsafe casting)

I think it may be possible to get around same_kind casting, but
maybe there is also not much reason for it.

There is also the concept of construction, which we may also need for
np.array(..., dtype=dtype) logic (compare Julia's reasoning), although
possibly it can be seen as unsafe casting?

Other Notes:
  • Should the inner loop dispatcher be allowed to override casting?
    This may be useful, e.g. to allow unsafe casting when the existing loop
    you want to reuse has a more precise output dtype.
Issues/Discussion?:
  • Inheritance/chained casting:
    • MyInt(int) could inherit casting from int, but often that probably
      does not make sense. However, unsafe casting may make sense.
    • Type1 → Type2 and Type2 → Type3 may be defined, but not Type1 → Type3
      (probably we should just not – accidentally – allow chaining this?)
  • Dtype discovery for np.array (unsupported for user types?)

Details about setup/teardown

At some point, we have to discover which inner loop to use and give
the ufunc a chance to set other information:

  • Discover the correct output dtype (to get there, promotion of the input
    dtypes may be necessary!).

    • Since some loops may be mixed and others are not (timedelta * float
      is OK, but timedelta * timedelta is not), it seems like pushing
      this into promotion may be simpler (promotion would need to know
      the ufunc!? See the sketch after this list.)
  • Return the inner loop function (in some form or another)

    • It could be plausible to just return inner loop types, such as
      "f,f->f" which together with the ufunc name defines the inner loop
    • More likely: expose np.add.loop["f,f->f"] using some PyCapsule
      style wrapper object.
  • Set whether the inner loop requires the Py-API (maybe tagged onto the
    inner loop object itself; e.g. python-implemented ones always need it
    anyway, ours never will).

  • Run additional setup code:

    • Setup working memory
    • Clear error flags
  • Setup a teardown function to:

    • Free working memory
    • Check error flags and give warnings, or raise errors
  • Kwargs to ufuncs, forwarded to inner-loops?

    • parameter-type arguments (e.g. a precision argument)
    • Resolved during ufunc setup?
    • Broadcastable arguments seem difficult/out of scope
      (Example: np.clip(arr, minval=None, maxval=None))
  • arr.view(new_dtype) is buggy if types use their own storage area:

    • Could have a flag/reuse HASREF (probably have to?)
    • (Other things to keep in mind?)
    • Depending on type may or may not make sense?
      (object → object is OK, but for a type with metadata and refs
      it can probably go both ways)
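
As a concrete illustration of the output-dtype discovery above, a hypothetical setup hook for a unit dtype (unit_dtype is a made-up name; the loop lookup follows the np.add.loop["f,f->f"] idea mentioned in the list): it resolves the output descriptor from the input units and then reuses the plain float64 multiply loop.

import numpy as np

def multiply_setup(in_descrs):
    # compute the output unit from the inputs, then reuse the existing
    # float64 inner loop instead of writing a new one
    u1, u2 = in_descrs[0].unit, in_descrs[1].unit
    out_descr = unit_dtype(base=np.float64, unit=u1 * u2)
    inner_loop = np.multiply.loop["d,d->d"]  # hypothetical loop lookup API
    return out_descr, inner_loop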

API

What needs to happen if we call a ufunc:

  1. Ask dtypes if they implement a ufunc loop that should be used

    • The TypeResolution step returns the loop which should be used,
      and thus the output dtype (although that could still be cast in principle).
    • If none is found: an additional promotion step here (like Julia)?
  2. Decide if casting input to the output can be handled.

  3. Run the inner-loop:

    1. Allow for inner-loop specific setup:
      • e.g. FPU error clearing
    2. Run the inner loop until finished or a stop flag is given
    3. Allow for inner-loop specific teardown:
      • e.g. FPU error checking

    → What type of access should we give these steps? E.g. access to values?
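
The whole sequence as a Python sketch (resolve_loop is a placeholder; the loop attributes follow the hypothetical UfuncLoop above):

def call_ufunc(ufunc, *arrays):
    # 1. ask the dtypes for a loop (with a promotion fallback)
    loop = resolve_loop(ufunc, [arr.dtype for arr in arrays])
    # 2. the casting decision for inputs/outputs would go here
    # 3. run the loop, bracketed by setup and teardown
    state = loop.setup_loop() if loop.setup_loop else None  # e.g. clear FPU flags
    try:
        return loop.inner_loop(*arrays)
    finally:
        if loop.teardown_loop:
            loop.teardown_loop(state)  # e.g. check FPU flags, warn or raise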

Currently:

  1. The TypeResolver of the ufunc gets run (these currently do not really know about user types)
    • This typically calls ResultType to find the output type
    • (often) does a linear search over the existing loops
      (unless there is a specific TypeResolver for the function)
  2. Loop selector is run, this is set for the ufunc object:
    • Finds the actual loop
    • Can force needs_api (otherwise the iterator will decide)
  3. Ufunc machinery decides on what casting is necessary
  4. Runs the loop:
    • Run the loop until it finishes (except breaking on PyErrors)
    • Check for floating point errors (when done)

Main issue: everything is tagged onto the ufunc object, so the ufunc object would have to ask the dtypes specifically.

C-API vs. Python API

Some thoughts:

Py-API:

  • Make wrapping elementwise functions into inner-loops easy
    • Probably similar to np.frompyfunc, except it would return some
      capsule (see the example after this list).
    • Type annotations could be nice to use here as well.
  • Provide a way to register/pass in existing inner-loops (or, say, Cython
    defined inner loops) easily.
    • It should be OK to have a python object that implements a few
      fast loops in Cython and tags them on from python at instantiation
      time?
    • Something like PyCapsule, or NpyInnerLoopCapsule?:
      • Requires an API flag?
      • Leave room for other flags? (e.g. optimization hints
        or even alignment requirements)
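
For reference, the existing precedent: np.frompyfunc wraps a Python callable into an object-dtype ufunc; the proposal here would return a loop "capsule" instead.

import numpy as np

# wrap a Python callable taking 2 inputs and returning 1 output
add_pairs = np.frompyfunc(lambda a, b: a + b, 2, 1)
print(add_pairs(np.arange(3), np.arange(3)))  # object array: [0 2 4]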

From the python side, it may be an alternative to simply view the array
and then call np.add again; OTOH that may force some consistency
checking/wrapping to make sure the python side cannot just return a wrong
dtype or shape.

C-API:

  • Inner-loop registration/capsuling should mirror the Python API.
  • We probably need to make the existing inner-loops available as "capsules"?

Arguments for and against making descriptors types

Pro:

  • Feels logical for simple scalars and float32_arr.dtype(3)
    and isinstance(float32_arr[0], float32_arr.dtype) is nice.

Against:

  • It is probably possible to change later without a large compatibility
    break.
  • The implementation needs metaclasses, which are somewhat harder to
    reason about (although it is probably also not very hard).
  • Say I want to create a dtype for decimal.Decimal. It is not
    possible to change Decimal to add numpy specific information, so
    off-loading it into a dtype/descriptor with npdecimal.type is Decimal
    seems easier?
  • (There is at least some discussion, whether scalars are even a good idea
    within numpy.)
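
A minimal sketch of what the metaclass route would involve (DTypeMeta and float64_descr are hypothetical): descriptors become classes whose instance checks delegate to the scalar type.

class DTypeMeta(type):
    def __instancecheck__(cls, obj):
        # make isinstance(scalar, descriptor) delegate to the scalar type
        return isinstance(obj, cls.type)

class float64_descr(metaclass=DTypeMeta):
    type = float  # stand-in for the np.float64 scalar type

assert isinstance(3.0, float64_descr)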

Ufunc properties

  • Ufuncs have an identity; should this move to the loop implementation
    (or be overridable)? The loop implementation knows the correct output type.
    (It could live on the dtype, but that seems unnecessary/strange?)

Ufunc signatures

See: https://github.com/numpy/numpy/issues/12518

The ufunc inner loops are pretty limited right now. I am (personally) not
in favor of bloating the API too much, but we may want to add some things
to it while we are at it:

  1. Possibly a return value to signal:
    • StopIteration
    • Error
  2. Better payload/metadata values to use for custom dtypes; these
    may already fit into the current pointer we have, but are definitely
    necessary.
  3. Possibly more fields related to/in gufuncs?
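
A sketch of points 1 and 2, mirroring the current C loop signature loop(args, dimensions, steps, data) in Python (the status constants and the payload argument are hypothetical):

OK, STOP_ITERATION, ERROR = 0, 1, -1

def add_loop(args, dimensions, steps, payload):
    # same layout as the current C signature, plus a status return value
    # and a dtype-specific payload instead of the bare data pointer
    in1, in2, out = args
    s1, s2, so = steps  # strides, schematic (element steps rather than bytes)
    for i in range(dimensions[0]):
        out[i * so] = in1[i * s1] + in2[i * s2]
    return OK  # could return STOP_ITERATION or ERROR to signal early exit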

Plausibly, the old ufuncs could receive a very lightweight wrapper.
