---
tags: NumPy, NumPy2
---

# NumPy2 lightning talk: Refactor `np.MaskedArray` to use `__array_function__` instead of inheriting from `np.ndarray`

Even with all the checks added for `MaskedArray`s inside NumPy code, there are still issues around properly propagating masks. In [PR 22914](https://github.com/numpy/numpy/pull/22914#issuecomment-1372842554), @greglucas proposed adding `MaskedArray.__array_ufunc__`, building on work done in other PRs that were subsequently reverted.

On [the mailing list](https://mail.python.org/archives/list/numpy-discussion@python.org/thread/OSGGDTJX6XZK3LXLNQYGZP766FD3YFQQ/#QNVWET3O7KXFGAB7NVZNWUGHE766TGNL), Allan Haldane announced a third-party implementation of `MaskedArray` using `__array_function__`: https://github.com/ahaldane/ndarray_ducktypes. The post mentions two open problems:

- what to do about the mask of `arr.imag = MaskedArray([1,X,1])`
- how to ducktype scalars when using `__array_function__`: do we always need an associated "MaskedScalar" type?

In the PR, @greglucas says:

> My desires from a new class:
> 1. No auto-masking of "bad" computation locations; let the user do that a priori if they want. I haven't seen anyone argue for this yet, and this looks like it goes way back in git blame.
> 2. No warnings when doing a computation of a bad value under a mask. It would be annoying to get divide-by-zero warnings on your computations when you've already pre-masked the data (see my example at the top of the PR for the motivation here!)
> 3. `np.func` gets forwarded to `np.ma.func` whenever possible (implement `__array_ufunc__` and `__array_function__`). It is quite surprising right now to do `np.stack([ma, ma])` and get a `Masked` instance back, but with an incorrect mask! (It would be better to just get a plain `ndarray` IMO.)
> 4. Proper subclass wrapping/hierarchy preservation. Should fall out of (3) if implemented properly with `NotImplemented` being returned.

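
To make the wish list above more concrete, here is a minimal sketch of how those points could map onto the `__array_ufunc__` (NEP 13) and `__array_function__` (NEP 18) protocols for a masked duck array that does *not* subclass `np.ndarray`. The names `SimpleMasked`, `HANDLED_FUNCTIONS` and `implements` are hypothetical and chosen for illustration only; this is not `np.ma.MaskedArray`, Allan Haldane's `ndarray_ducktypes`, or the code in PR 22914. It assumes that masks propagate by logical OR, and it only handles plain ufunc calls plus `np.stack`.

```python
import functools
import numbers

import numpy as np

# Hypothetical registry mapping NumPy functions to masked implementations
# (the registration pattern suggested in NEP 18).
HANDLED_FUNCTIONS = {}


def implements(np_function):
    """Register a SimpleMasked implementation of ``np_function``."""
    def decorator(func):
        HANDLED_FUNCTIONS[np_function] = func
        return func
    return decorator


class SimpleMasked:
    """Hypothetical minimal masked duck array; it does NOT subclass np.ndarray."""

    def __init__(self, data, mask=False):
        self.data = np.asarray(data)
        # Store a private boolean mask with the same shape as the data.
        self.mask = np.broadcast_to(
            np.asarray(mask, dtype=bool), self.data.shape
        ).copy()

    def __repr__(self):
        # Show masked entries as None; their underlying data is unspecified.
        shown = np.where(self.mask, None, self.data.astype(object))
        return f"SimpleMasked({shown.tolist()})"

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain elementwise calls without out= or where= are handled here.
        if method != "__call__" or "out" in kwargs or "where" in kwargs:
            return NotImplemented
        datas, masks = [], []
        for x in inputs:
            if isinstance(x, SimpleMasked):
                datas.append(x.data)
                masks.append(x.mask)
            elif type(x) is np.ndarray or isinstance(x, (np.generic, numbers.Number)):
                datas.append(x)
            else:
                # Unknown types (including ndarray subclasses) get a chance
                # to handle the operation themselves.
                return NotImplemented
        # The result is masked wherever any input is masked (logical OR).
        mask = functools.reduce(np.logical_or, masks, np.False_)
        # Evaluate only the unmasked elements, so bad values hidden under the
        # mask raise no warnings; entries under the mask are left unset.
        result = ufunc(*datas, where=~mask, **kwargs)
        # No auto-masking: invalid results in unmasked slots stay unmasked.
        if isinstance(result, tuple):
            return tuple(SimpleMasked(r, mask) for r in result)
        return SimpleMasked(result, mask)

    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented
        # Defer to other duck arrays and ndarray subclasses we don't know about.
        if not all(issubclass(t, SimpleMasked) or t is np.ndarray for t in types):
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)


@implements(np.stack)
def stack(arrays, axis=0):
    """Stack data and masks separately so the result mask stays aligned."""
    arrays = [a if isinstance(a, SimpleMasked) else SimpleMasked(a) for a in arrays]
    data = np.stack([a.data for a in arrays], axis=axis)
    mask = np.stack([a.mask for a in arrays], axis=axis)
    return SimpleMasked(data, mask)
```

For example, `np.sqrt(SimpleMasked([1.0, -1.0, 4.0], mask=[False, True, False]))` only evaluates the unmasked entries, so no warning is raised for the masked `-1.0`, while `np.log` of an unmasked `0.0` still yields an unmasked `-inf`. Using the ufunc `where=` argument (rather than wrapping everything in `np.errstate(all="ignore")`) keeps warnings for bad values that are *not* under the mask, and returning `NotImplemented` for unrecognized types is what lets other array types take over, in the spirit of point 4.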

@mhvk adds:

> There is only one item I'd add: think carefully how to deal with underlying data other than ndarray. For instance, part of the reason I made a separate implementation in astropy was that the current `MaskedArray` simply cannot be used to create `MaskedQuantity` -- with `MaskedArray` holding a `Quantity`, attributes like the unit become hidden.

There is also [PR 22913](https://github.com/numpy/numpy/pull/22913) to implement this idea on the NumPy `MaskedArray`, without removing the inheritance from `ndarray`.

## Implications for Array API compatibility

Masked arrays are currently out of scope for the Array API.

## Who is in favor?

There are many open issues around `MaskedArray`. A refactor would enable cleaning up the API space and smoothing out many of the rough edges, making maintenance easier.

## What will break and how could we work around the breakage?

## Why not make the change?

## Is it worth it?

<Discussion></Discussion>

## Decision

<Discussion></Discussion>