NumPy 2 lightning talk: Refactor np.ma.MaskedArray to use __array_function__ instead of inheriting from np.ndarray

Even with all the checks added for MaskedArrays inside NumPy code, there are still issues around properly propagating masks. In PR 22914, @greglucas proposed adding MaskedArray.__array_ufunc__, building on work done in other PRs that were subsequently reverted.

On the mailing list, Allan Haldane announced a third-party implementation of MaskedArray using __array_function__ (https://github.com/ahaldane/ndarray_ducktypes). The post mentions two open problems:

  • what to do about the mask of arr.imag = MaskedArray([1,X,1])
  • how to ducktype scalars when using __array_function__? Do we always need an associated "MaskedScalar" type?
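
One way to see why the scalar question comes up is to look at what the current np.ma implementation returns when a single element is indexed. The snippet below only illustrates existing behavior and is not a proposal; the variable names are made up for the example.

```python
import numpy as np

arr = np.ma.array([1, 2, 3], mask=[False, True, False])

# Indexing an unmasked element returns a plain NumPy scalar with no mask attached.
unmasked_item = arr[0]

# Indexing a masked element returns the shared np.ma.masked singleton,
# which does not carry the dtype of the array it came from.
masked_item = arr[1]

item_is_singleton = masked_item is np.ma.masked
singleton_dtype = np.ma.masked.dtype  # float64, even though arr holds integers

print(type(unmasked_item), item_is_singleton, arr.dtype, singleton_dtype)
```

A duck-typed implementation has no ndarray scalar machinery to fall back on, so it has to decide whether indexing returns a 0-d masked array, a dedicated masked scalar type, or something like the singleton above.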

In the PR, @greglucas says:

My desires from a new class:

  • No auto-masking of "bad" computation locations; let the user do that a priori if they want. I haven't seen anyone argue for this yet, and this looks like it goes way back in git blame.
  • No warnings when doing a computation of a bad value under a mask. It would be annoying to get divide by zero warnings on your computations when you've already pre-masked the data (see my example at the top of the PR for the motivation here!)
  • np.func gets forwarded to np.ma.func whenever possible (implement __array_ufunc__ and __array_function__). It is quite surprising right now to do np.stack((ma, ma)) and get a Masked instance back, but with an incorrect mask! (it would be better to just get a plain ndarray IMO)
  • Proper subclass wrapping/hierarchy preservation. Should fall out of (3) if implemented properly, with NotImplemented being returned.

@mhvk adds:

There is only one item I'd add: think carefully how to deal with underlying data other than ndarray. For instance, part of the reason I made a separate implementation in astropy was that the current MaskedArray simply cannot be used to create MaskedQuantity; with MaskedArray holding a Quantity, attributes like the unit become hidden.

There is also PR 22913 to implement this idea on the NumPy MaskedArray, without removing the inheritance from ndarray.
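
For contrast with the in-place approach, here is a minimal sketch of what a container that does not inherit from ndarray might look like when it forwards NumPy calls through __array_ufunc__ and __array_function__. Everything in it is hypothetical: the class name Masked, the HANDLED_FUNCTIONS registry, and the implements decorator are invented for illustration and are not taken from either PR or from ndarray_ducktypes. It also assumes plain ndarray data and silences floating-point warnings everywhere, which is cruder than only silencing them under the mask.

```python
import numpy as np

# Hypothetical registry mapping NumPy functions to masked-aware implementations.
HANDLED_FUNCTIONS = {}


def implements(numpy_function):
    """Register a masked-aware implementation for a NumPy function."""
    def decorator(func):
        HANDLED_FUNCTIONS[numpy_function] = func
        return func
    return decorator


class Masked:
    """Hypothetical duck-typed masked container (not an ndarray subclass)."""

    def __init__(self, data, mask=None):
        self.data = np.asarray(data)
        if mask is None:
            self.mask = np.zeros(self.data.shape, dtype=bool)
        else:
            mask = np.asarray(mask, dtype=bool)
            self.mask = np.broadcast_to(mask, self.data.shape).copy()

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain single-output calls without keyword arguments are handled.
        if method != "__call__" or kwargs or ufunc.nout != 1:
            return NotImplemented
        datas, masks = [], []
        for x in inputs:
            if isinstance(x, Masked):
                datas.append(x.data)
                masks.append(x.mask)
            elif isinstance(x, (np.ndarray, np.generic, int, float, complex)):
                datas.append(x)
            else:
                return NotImplemented  # let the other type try
        # No warnings and no auto-masking: "bad" results are simply kept.
        with np.errstate(all="ignore"):
            result = ufunc(*datas)
        # The output mask is the union of the input masks.
        mask = np.zeros(np.shape(result), dtype=bool)
        for m in masks:
            mask = mask | m
        return Masked(result, mask)

    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented
        if not all(issubclass(t, (Masked, np.ndarray)) for t in types):
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)


@implements(np.stack)
def _stack(arrays, axis=0):
    """Stack data and masks separately so the mask stays correct."""
    arrays = [a if isinstance(a, Masked) else Masked(a) for a in arrays]
    data = np.stack([a.data for a in arrays], axis=axis)
    mask = np.stack([a.mask for a in arrays], axis=axis)
    return Masked(data, mask)


a = Masked([1.0, 2.0, 3.0], mask=[False, True, False])
b = Masked([0.0, 0.0, 1.0], mask=[True, False, False])

quotient = np.divide(a, b)   # dispatched through __array_ufunc__
stacked = np.stack((a, b))   # dispatched through __array_function__
```

In this sketch, a function without a registered implementation (np.concatenate, for example) makes __array_function__ return NotImplemented, so NumPy raises a TypeError rather than returning a result with a wrong mask.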

Implications for Array API compatibility

Masked arrays are currently out of scope for the Array API.

Who is in favor?

There are many open issues around MaskedArray. A refactor would enable cleaning up the API space and smoothing out many of the rough edges, making maintenance easier.

What will break and how could we work around the breakage?

Why not make the change?

Is it worth it?

<Discussion></Discussion>

Decision

<Discussion></Discussion>