Array API standard - community meeting

Meeting minutes 15 May 2025

Time: 10am PDT, 1pm EDT, 5pm GMT, 7pm CEST
Present: Ralf Gommers, Athan Reines, Hameer Abbasi, Rok Mihevc, Aaron Meurer, Leo Fang,

  1. Announcements
  2. Can default device be None?
  3. adding eig and eigvals?
  4. test suite pain points
  5. meshgrid/broadcast_arrays return type
  6. matrix_transpose when fewer than 2 dims
  7. 0d fill value when passed to full?
  8. what should happen when providing an empty axis tuple to reductions (e.g., count_nonzero)?
  9. Adding put_along_axis to the specification

Notes

Announcements

Ralf: SciPy release coming up. Trying to get Array API updates in. Release branch cut on 17th. Then 5 weeks until final release.

Can default device be None? (jake/evgeni)

Ralf: yes, seems like everyone in agreement.

Hameer: can None be compared, as devices must be comparable?

Ralf: yes, confirmed.

Resolution: update spec accordingly. Add single sentence that None may be returned, as long as None is accepted by the device kwarg.

Hameer: can this be backported?

Aaron: yes, I think so, as this would be more of a clarification than anything.
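
A minimal sketch (not from the meeting) of what this implies for portable consuming code; xp below is a placeholder for any conforming array namespace, not a specific library:

def zeros_on_default_device(xp, shape):
    # Per the resolution above, default_device() may return None; whatever it
    # returns (None included) must be accepted by device keyword arguments.
    dev = xp.__array_namespace_info__().default_device()
    return xp.zeros(shape, device=dev)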

eig and eigvals (evgeni)

Leo: added in cuSolver. In theory, this is doable now. Not sure how the API should look. May be possible to follow NumPy. In CuPy, should be doable soonish.

Ralf: decent amount of usage across multiple SciPy modules. My guess is that it is generally useful, but should likely wait until available in CuPy. In theory, once CUDA 12.6 is the baseline, this should be available everywhere, as CuPy always builds with the latest CUDA release. Uses CUDA minor version compatibility.

Aaron: does it always return complex?

Ralf: may be a slight hiccup in that JAX diverges from NumPy. NumPy has a value-dependent output dtype.

Ralf: some homework needs to be done to see if this is addressable in NumPy.
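
For context (NumPy behavior, not discussed verbatim in the meeting): the output dtype of eig depends on the eigenvalues themselves, not just the input dtype.

import numpy as np

w_sym, _ = np.linalg.eig(np.array([[2.0, 1.0], [1.0, 2.0]]))   # symmetric matrix
w_rot, _ = np.linalg.eig(np.array([[0.0, -1.0], [1.0, 0.0]]))  # 90-degree rotation

print(w_sym.dtype)  # float64: all eigenvalues are real, so NumPy returns a real dtype
print(w_rot.dtype)  # complex128: eigenvalues are +/- 1j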

Resolution: tentative yes, provided can be addressed in NumPy.

In what ways can the test suite be improved? (evgeni)

(postponed)

meshgrid/broadcast_arrays return type (evgeni)

Ralf: the reason it was changed in NumPy 2.0 was that returning a list is problematic with Numba. Tuples are immutable, and compilers are happier here.

Ralf: would need to check with Jax whether it can be changed.

Aaron: can we use Sequence?

Ralf: for output types, this is not desirable.

Ralf: IIRC, only reason you want a list is if you wanted a mutable container, but not clear that this is a dominant need.

Aaron: during NumPy v2.0, were there any instances of downstream concerns?

Ralf: no, as mainly just need to index.

Resolution: get JAX to change. If they do not agree, then may need to update spec to allow both Tuple|List.

Resolution: combine the issues into one, and audit the spec to determine whether any other APIs are affected.
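
For reference (NumPy behavior underlying the discussion above): broadcast_arrays returns a tuple on NumPy >= 2.0, and downstream code typically only needs to index the result.

import numpy as np

arrays = np.broadcast_arrays(np.ones((3, 1)), np.ones((1, 4)))
print(type(arrays))                      # tuple on NumPy >= 2.0 (a list on 1.x)
print(arrays[0].shape, arrays[1].shape)  # (3, 4) (3, 4)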

matrix_transpose when fewer than 2 dims (athan)

Leo: context: we are trying to adopt the Array API in cuTile. We wanted to know how to be compliant after conversations with the compiler team.

Ralf: I vote for raising an exception after checking with various implementations.

Resolution: confirm with existing implementations and then, if all in agreement, add language to raise an exception.

0d fill value when passed to full? (athan)

  • Discussion: https://github.com/data-apis/array-api/issues/909
  • Would allow for the dtype to be inferred from the fill value; however, this could also be considered sugar for full_like(fill_value, float(fill_value)).
  • If we choose to support, what should happen when the fill_value dtype is different from dtype kwarg value?

Aaron: seems like the Dask behavior is a mistake, and should be fixed. Based on Matt's testing, seems like dtype kwarg always takes precedence.

Ralf: seems okay, although not as nice from typing perspective.

Aaron: would also allow for device inference, not just dtype inference, from fill_value.

Resolution: yes, if issue fixed in Dask. If this is a real need, do the work. Can also support in compat. Mildly in favor.
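
For reference (current NumPy behavior, which the proposal is broadly consistent with): dtype is inferred from the fill value when no dtype is given, and an explicit dtype kwarg takes precedence.

import numpy as np

fill = np.asarray(3, dtype=np.int8)   # 0-d fill value
print(np.full((2, 2), fill).dtype)                    # int8: inferred from fill_value
print(np.full((2, 2), fill, dtype=np.float32).dtype)  # float32: dtype kwarg wins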

What should happen when providing an empty axis tuple to reductions? (evgeni)

  • Discussion: https://github.com/data-apis/array-api/issues/937
  • Implementation-defined? Error? Or should it fall out naturally from the fact that the set of non-reduced dimensions is the complement of the list of provided axes, and thus an empty axis tuple means that no axes are reduced.

(postponed)
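
For reference, the "falls out naturally" reading mentioned above, shown with NumPy's sum; whether reductions such as count_nonzero should follow it is the open question.

import numpy as np

x = np.arange(6).reshape(2, 3)
print(np.sum(x, axis=()).shape)  # (2, 3): an empty axis tuple reduces over no axes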

Adding put_along_axis to the specification

Ralf: NumPy would need to change its behavior, as it currently returns None.

Ralf: given that the ecosystem is still not universally aligned, harder to push forward. array-api-extra supports at and takes in indices which are arrays. JAX's at supports arbitrary indices. In which case, the use case is covered, it seems.

Resolution: ecosystem needs greater alignment and problem is not particularly urgent (only used in 2 places in SciPy).

Resolution: punt on the need for put and put_along_axis. If we are going to really solve this, pursue a view tracking implementation. The second best option is supporting an at function, which would be more general than these functions, as it can be used not only for arbitrary in-place updates, but also for in-place operations (e.g., add, subtract, etc.).

Resolution: keep https://github.com/data-apis/array-api/issues/177 open, as this is a recurring discussion.
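
For context, JAX's functional update API referenced above; array-api-extra provides a similar at helper, whose exact signature may differ from this sketch.

import jax.numpy as jnp

x = jnp.zeros(5)
y = x.at[jnp.array([1, 3])].set(7.0)  # out-of-place update: returns a new array
z = x.at[2].add(1.0)                  # in-place-style operations expressed the same way
print(y)  # [0. 7. 0. 7. 0.]
print(z)  # [0. 0. 1. 0. 0.]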


Meeting minutes 1 May 2025

Time: 10am PDT, 1pm EDT, 5pm GMT, 7pm CEST
Present: Ralf Gommers, Athan Reines, Evgeni Burovski, Aaron Meurer, Guido Imperiale, Rok Mihevc, Leo Fang,

  1. Announcements
  2. 2025 focus areas
  3. Can default device be None?

Notes

Announcements

No announcements.

2025 focus areas

  • Previous discussion 9 January 2025

    • in-place updates (setitem)
    • cond function
    • typing
    • device transfers
    • DLPack v1.0 support
      • DLPack C API support?
    • making progress on documentation
      • what types of documentation?
        • Gap: high-level docs on how do all the array-api-* packages fit together.
      • where should this documentation live?
    • nan* functions
    • missing data/masks (tentative proposals to get ecosystem consensus)
  • What are the pain points? Where are the gaps?

  • Question: the compatibility layer continues to grow as a shim layer. Over the past couple of years, we've grown increasingly reliant on array-api-compat to provide compatible behavior, which has enjoyed success in downstream libraries, such as SciPy and sklearn, among others. However, the compat layer's existence, to some degree, disincentivizes upstream array libraries from adding compliant behavior, and users moving from one library to the next still cannot simply swap an import and enjoy a frictionless experience, at least for the common core of Array API functionality. So the question becomes: should we be making a greater effort to upstream compliant behavior and better align array libraries? Some folks, such as the JAX community, have made great strides to improve Array API compliance; others, such as PyTorch, have stalled out.

    • A (rough) snapshot of test suite compliance results (using max-examples=1; note: failures are over-represented by special cases):
      • NumPy: 19 failures, 938 pass, 66 skipped, 98%/91.6%
      • CuPy: 349 failures, 872 pass, 9 skipped, 71.4%/70.9%
      • PyTorch: 210 failures, 995 pass, 25 skipped, 82.6%/80.9%
      • JAX: 349 failures, 872 pass, 9 skipped, 71.4%/70.9%

Guido: cannot think of use cases for the cond function, thus far. However, I have not implemented any convergence algorithms. I would like to see a use case. I could see use cases for while and for.

Ralf: the idea is lazy boolean evaluation. I agree that we have yet to identify a high-priority use case. Thus far, we have not been bitten by this.
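
For context, the kind of pattern a standardized cond would address; shown here with JAX's existing jax.lax.cond purely as an illustration, not as proposed spec syntax.

import jax
import jax.numpy as jnp

def f(x):
    return jax.lax.cond(
        jnp.any(x < 0),        # traced (lazy) boolean; a plain Python `if` would force evaluation
        lambda x: jnp.abs(x),  # branch when the predicate is True
        lambda x: x,           # branch when the predicate is False
        x,
    )

print(f(jnp.array([-1.0, 2.0])))  # [1. 2.]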

Ralf: update on DLPack v1.0. PyTorch PR adding this. Implementation looks ready.

Leo: I already told our PyTorch team about this, as well.

Ralf: missing data/masks is likely not a priority, as most array libraries do not handle them. nan* functions, however, yes, there may be room here.

Guido: I think tutorials around "you want to write an array API compatible package, here is how you do it".

Evgeni: some worked examples. Here is a SciPy function; here is how you make it array API compatible.

Guido: agreed.

Guido: how to migrate from pure eager to lazy and the design decisions that are possible. Demonstrate alternative designs. E.g., in SciPy, there are ongoing discussions regarding generalized dispatch mechanisms.

Ralf: may not be great for generalized tutorial/guide, as specific to SciPy. May be better as developer docs for SciPy, itself.

Aaron: my main gripe with standard documentation is that it can be difficult to find information. E.g., when going to homepage, currently only a table of contents. Not clear that the meat of the spec is in a sub-folder.

Ralf: agreed. Would be good to provide some high-level context on the specification homepage.

Guido: having some way of querying whether an array is read-only, or of requiring an array to be read-only. Essentially, I want an API I can call at the beginning of any Python function to get back a read-only view, so that any code using setitem cannot cause accidental mutation.

Guido: in CuPy, it doesn't exactly replicate NumPy's semantics, such as in broadcasting, where NumPy returns a read-only view in order to avoid updating multiple elements at once. CuPy does not cover this edge case, as it doesn't have the same implementation details.
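
For context (NumPy behavior referenced above): broadcast views are already read-only, and a read-only view can be requested explicitly via the writeable flag; the latter is the per-library idiom that a standardized API would generalize.

import numpy as np

x = np.zeros(3)
b = np.broadcast_to(x, (4, 3))
print(b.flags.writeable)   # False: writing would update multiple elements at once

v = x.view()
v.flags.writeable = False  # an explicitly read-only view of the same data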

Aaron: wouldn't copy-on-write be another approach for addressing this issue?

Ralf: yes. NumPy is likely moving in this direction, due to free-threading support.

Ralf: I think this topic is one of the main areas that needs addressing.

Guido: that and at, as currently it is missing in Sparse. And even in JAX.

Ralf: still the same problem, as it can be solved via view tracking. And no point in discussing with JAX until we solve the problems in PyTorch and NumPy. Once we do, it is simply syntax to add. The at problem is the in-place update problem is the view-tracking problem.

Guido: I think JAX also struggles when have a mutable wrapper around an immutable JAX array.

Ralf: the reason JAX does not want to discuss this is that not possible to currently replicate NumPy behavior. However, if we are able to solve this in NumPy, then it becomes possible to do so in JAX.

Can default device be None?

(postponed until next meeting)


Meeting minutes 17 April 2025

Time: 10am PDT, 1pm EDT, 5pm GMT, 7pm CEST
Present: Ralf Gommers, Hameer Abbasi, Rok Mihevc, Joren Hammudoglu, Jake Vanderplas, Lucas Colley, Aaron Meurer, Guido Imperiale, Tim Head, Tyler Reddy, Evgeni Burovski, Raúl Cumplido, Leo Fang,

  1. Announcements
  2. Typing
  3. Assorted discussions (Lucas)
  4. 2025 focus areas

Notes

Announcements

Rok: contributor to Arrow for past 6 years. Been working on tensor arrays in Arrow.

Raúl: Also a contributor to Apache Arrow.

Ralf: any announcements?

Tim: anyone going to be at PyCon Germany next week? If so, happy to meet up.

Typing

Joren: general question regarding whether to follow official Python spec.

Joren: for historical reasons, union of bool and int is equivalent to int. Similarly, for other unions.

Ralf: we should document the divergence. Spec is docs. Typing library is for type checking behavior.

Aaron: consider the typing library as the authoriative implementation.

Joren: can also clarify that some things may be accepted even if not supported.

Ralf: we should update the typing design topic page. Essentially, agree on the separation of concerns.

Guido: so, in array-api-compat, which types should be followed?

Joren: there are also cases where type checkers behave differently, so which choice is followed does impact end result.

Resolution: array-api-compat should follow separate typing library. We should follow up in the spec and be more explicit about where, e.g., bool is not allowed when int is, etc.
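
An illustration (standard Python typing rules, not spec text) of the divergence being documented: because bool is a subtype of int, bool | int collapses to int, and int annotations accept Python bools even where a library may not want them.

def scale(values: list[int], factor: int) -> list[int]:
    return [v * factor for v in values]

scale([1, 2], True)  # accepted by type checkers, since bool is consistent with int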

Assorted discussions (Lucas)

Lucas: someone with permissions could help debug the docs deployment.

Lucas: would be good to get feedback from other libraries, so that we can make the assertion infrastructure standardized across libraries.

Aaron: my suggestion might be to make things a PyTest plugin, as would then allow fixtures.

Guido: is there a reason why NumPy testing not shipping as a plugin?

Evgeni: predating plugins?

Aaron: that is likely one reason.

Guido: fixtures end up being rather different. E.g., SciPy uses a number of SciPy-specific env flags. array-api-extra uses a bunch of magic to test against different backends.

Lucas: idea would be to identify what could be pulled out and should be pulled out, but yes, point taken that there is going to be a subset of behavior which cannot be pulled out.

Lucas: Tim, would you be willing to take a look at how different things are for sklearn?

Tim: had a quick look. Most of the time use assert and numpy equal.

Lucas: yes, different in SciPy, but still worthwhile to have another look.

import cupy as cp

with cp.cuda.Device(1):
    y = cp.asarray(1, device=cp.cuda.Device(0))  # y should be on Device(0)
    z = y + 1  # device of z?

Guido: currently, the specification provides recommendations, but is not prescriptive. Atm, we have issues in SciPy where we encounter difficulties using CuPy with arrays allocated on different devices.

Tim: based on the example above, what would the device of z be?

Guido: based on my understanding of spec, z should be on Device(0), as the input array device to a function takes precedence.

Tim: okay, and in this case, 1 should inherit the device of y.

Ralf: and CuPy chokes?

Guido: correct.

Tim: right, the problem seems to be that CuPy ends up creating 1 on device 1, rather than 0.

Ralf: seems like a CuPy bug.

Lucas: CuPy also has guidance that context manager should take precedence. Would need to discuss further to determine whether following spec would mean a backward compat break.

Ralf: I think the only thing which is reasonable is for z to be on same device as y.

Resolution: we should clarify in the spec device guidance for scalar arguments to behave similarly to how dtype is inherited from array argument in binops.

Lucas: general question about where metadata should live to allow consuming libraries to inspect array flavors.

Guido: applies more generally anytime you need to special case a library. E.g., was not able to implement xp.at that was not special-cased for JAX, as no standardized behavior in standard which would support JAX's at semantics. Another case is where Dask cannot support boolean masks when there is an unknown shape.

Ralf: discussion at https://github.com/pydata/duck-array-discussion/issues may be relevant.

Guido: another problem area is lazy_apply. Requires quite a bit of special casing for Dask and JAX. And if either come wrapped by, say, MArray, nothing will work.

Guido: not sure about solutions. One solution is to have a standardized API for wrapping and unwrapping, but this is problematic when it comes to metadata. For various use cases, you need domain-specific knowledge in order to generate metadata.

Ralf: I am very skeptical that we'd be able to standardize anything here. One layer of wrapping is already difficult. Not a new request. People have been trying to do this for over a decade. Given the niche aspects of the various wrappers, I think it may be difficult to do in a generic way which is not terrible for performance.

Guido: could be possible to generalize for element-wise ops.

Ralf: yes, but that is not going to get you very far.

Guido: in general, it is easy to standardize an API for wrapping and unwrapping, until you crash and burn wrt metadata. E.g., going across chunks in Dask. In order to rebuild metadata, you need to know the backend and the operation involved.

Hameer: same thing in Sparse.

Lucas: problem is when the meta data is not completely determined by performing the operation.

Ralf: right now, there is only a few of these libraries, and, atm, seems like these libraries have to know about one another.

Ralf: I think this entire enterprise is not amenable to standardization. Even masked arrays are niche. NumPy has them, but other libraries do not.

Guido: Dask wrapping a CuPy array is not niche.

Ralf: correct, but that is one use case. I think we can revisit this in a year and see where things stand.

2025 focus areas

  • Previous discussion 9 January 2025

    • in-place updates (setitem)
    • cond function
    • typing
    • device transfers
    • DLPack v1.0 support
    • making progress on documentation
    • missing data/masks (tentative proposals to get ecosystem consensus)
  • What are the pain points? Where are the gaps?

(postponed until next meeting)


Meeting minutes 3 April 2025

Time: 10am PDT, 1pm EDT, 5pm GMT, 7pm CEST
Present: Ralf Gommers, Joren Hammudoglu, Aaron Meurer, Hameer Abbasi, Lucas Colley, Guido Imperiale, Athan Reines, Evgeni Burovski, Tyler Reddy, Nikita Grigorian, Leo Fang, Tim Head, Sebastian Berg,

  1. Announcements
  2. Triage
  3. Sparse interchange update (Hameer)
  4. DLPack (Leo)
  5. Assorted discussions (Lucas)
  6. 2025 focus areas

Notes

Announcements

Leo: DLPack v1.1 tagged and released. In this release, added more enumerators: support for fp4, fp6, fp8 ML data types.

Triage

Guido: would be good to discuss. We can either postpone until the end of the meeting or next time.

Resolution:

Sparse interchange update (Hameer)

  • PR: https://github.com/data-apis/array-api/pull/912
  • What should the extension mechanism be?
    • These APIs do not fit neatly into how we've done other extensions, such as fft and linalg, and require that extension behavior be present on the array object, itself.
  • Should there be a minimum set of supported formats?

Aaron: could binsparse be implemented for dense arrays, as well?

Hameer: in principle, yes, but not clear why you would want to do that, as it is less efficient than just using DLPack.

Aaron: do we need to be cognizant of using sparse as a sub-namespace name? E.g., PyTorch has a dedicated sparse namespace.

Ralf: no, I don't think this should be a problem, as PyTorch has its APIs primarily in the main namespace.

Aaron: could we think about adding an is_sparse_array API?

Ralf: passing a dense array to from_binsparse should be disallowed. In general, we should recommend that dense arrays not implement the __binsparse__ method.

DLPack (Leo)

Leo: one concern is that NumPy cannot represent sub-byte types. NumPy will pad, even though types use fewer bits than a full byte (fp4, only 4 bits).

Leo: added a flag to determine whether a dtype is padded or packed.

Leo: would be good to get feedback.

Sebastian: NumPy dtypes can probably be made to support it, but the array object wouldn’t be able to do anything with it.

Leo: Another issue is https://github.com/dmlc/dlpack/issues/74.

Ralf: this would be adding new C APIs but matching the semantics of what we have in Python.

Leo: correct. Aim is to have more to discuss in 2-4 weeks.

Hameer: in general, if using these compact dtypes, they should not be packed.

Leo: correct, if on GPUs, don't want to pad. However, NumPy, at least, cannot support this. JAX uses the NumPy dtype system to allow for easy prototyping. The goal would be to allow NumPy to at least express these specialized dtypes.

Leo: at NVIDIA, we noticed that, even with DLPack implemented at the C level, there is too much overhead relative to kernel launch. At a bare minimum, we need only the data pointer, shape, and strides. Don't have an explicit ask here, but I at least wanted to share our experience after having worked with DLPack over the past few years. This, at least, is how CuPy does things.

Assorted discussions (Lucas)

Lucas: should array-api-compat follow the sklearn dep policy? Seems reasonable to me, but it would mean testing NumPy v1.22 in CI.

Guido: I think one of the concerns is that policy means keeping NumPy v1 until 2027.

Ralf: a couple of comments. SPEC 0 has not been universally accepted. PyTorch tried to be a bit more loose, but got feedback from users and essentially adopted the sklearn policy.

Aaron: in general, the compat library should be as conservative as possible, and not drop versions prematurely. The compat library is a compatibility library and the entire point is to smooth versions. Main thing is CI matrix is a bit annoying to maintain.

Aaron: the strict library can support the latest and greatest. Strict is not a runtime dependency. They should not necessarily have the same policies.

Ralf: NumPy v2 has only been out for 9 months. There are still major projects which haven't migrated. So it is likely that v1.26 will be around for another year and some months.

Guido: Python 3.12 is the last Python version for which NumPy v1 wheels exist.

Tim: there is an escape hatch in the sklearn policy. Maybe more of an appetite for dropping NumPy v1 in 12-18 months.

(further discussion related to points above postponed until next meeting; especially typing which should be placed at beginning of agenda)

2025 focus areas

  • Previous discussion 9 January 2025

    • in-place updates (setitem)
    • cond function
    • typing
    • device transfers
    • DLPack v1.0 support
    • making progress on documentation
    • missing data/masks (tentative proposals to get ecosystem consensus)
  • What are the pain points? Where are the gaps?

(postponed until next meeting)


Meeting 20 Mar: CANCELLED


Meeting 6 Mar: CANCELLED


Meeting minutes 20 February 2025

Time: 10am PDT, 1pm EDT, 6pm GMT, 8pm CEST
Present: Athan Reines, Evgeni Burovski, Hameer Abbasi, Aaron Meurer, Tim Head, Stephan Hoyer, Sebastian Berg, Jake Vanderplas, Leo Fang,

  1. Announcements
  2. Integer array indexing
    • Overview
    • Supporting multiple integer index arrays (beyond 0D)
    • Supporting multi-dimensional integer arrays
    • Supporting sequences (non-arrays)
    • setitem semantics
    • allowed dtypes
    • other considerations?
  3. Sparse interchange update

Notes

Announcements

Integer array indexing

  1. Overview of integer array indexing support

    • https://github.com/data-apis/array-api-tests/pull/343

      • brings array-api-tests style testing
      • tests indexing 1D and nD arrays with mixtures of integers and 1D index arrays, mixtures of integers and nD index arrays (4 separate tests: 1D-1D, nD-1D, 1D-nD, nD-nD)
    • NumPy: supports all sorts of fancy indexing, maybe even too much

    • Torch: all tests pass

    • CuPy: all tests pass

    • JAX: does not support lists for fancy indexing. https://github.com/jax-ml/jax/issues/4564

    • ndonnx: all fancy indexing tests fail with ValueError: The lazy Array Array(dtype=Int32) cannot be converted to a scalar int

    • mlx: (unknown)

    • Dask: Understands a single 1D index array. Fails to allow

      • multidim indexing arrays: da.arange(5)[da.asarray([[1], [1]])] -> NotImplementedError: Slicing with dask.array of ints only permitted when the indexer has zero or one dimensions
      • multiple indexing arrays: a2 = da.arange(12).reshape(3, 4), then a2[1, da.array([0, 1])] works but a2[da.array([1, 0]), da.array([0, 1])] -> NotImplementedError: Don't yet support nd fancy indexing
      • da.take does not seem to allow arrays either

Stephan: it does appear that ONNX does support N-dimensional integer array indexing via its gather API: https://onnx.ai/onnx/operators/onnx__GatherND.html

  2. Should we support multiple integer index arrays?

    • Dask only seems to support x[[1,2,3]] and

      In [36]: a3.compute()
      Out[36]:
      array([[[ 0,  1],
              [ 2,  3]],
             [[ 4,  5],
              [ 6,  7]],
             [[ 8,  9],
              [10, 11]]])
      In [38]: a3[0, 1, da.array([0, 1])].compute()
      Out[38]: array([2, 3])

Stephan: without multiple integer index arrays, not really that useful. Dask can support, just may not be that efficient.

Athan: could make it an optional feature, similar to boolean indexing.

Stephan: I don't think you want the binary choice, as no technical limitation. It is more whether it is advisable to actually use integer array indexing.

Jake: not really a property you can query or reason for it, as no way to guarantee efficiency due to random access.

Resolution: yes, support multiple integer arrays.

  3. Should we support multi-dimensional integer arrays?

    • NumPy, PyTorch, and JAX support; however, Dask does not.

      a3[0, da.array([0]), da.array([0, 1])].compute()  -> NotImplementedError
      
    • What would be the workaround in the compatibility layer?

Aaron: for the compat layer, you could flatten it and use unravel_index, etc.
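
A sketch of that flattening workaround, shown with NumPy for clarity (a compat helper would do the equivalent for Dask): build flat positions from the index arrays, then take from the flattened array.

import numpy as np

x = np.arange(12).reshape(3, 4)
rows = np.array([[0, 1], [2, 2]])
cols = np.array([[1, 0], [3, 2]])

flat = np.ravel_multi_index((rows, cols), x.shape)  # flat positions, shape (2, 2)
out = np.take(np.reshape(x, (-1,)), flat)           # same result as x[rows, cols]
assert np.array_equal(out, x[rows, cols])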

Aaron: for ndonnx, unless they do chunking, there may not be a data dependence, and it should be feasible.

Stephan: biggest use case for me is in XArray as it allows generalized indexing.

Tim: for sklearn, main use case is single array for randomly selecting elements.

Resolution: yes, support multi-dimensional integer arrays. In the compat layer, will need to add a helper function which can do the right thing for Dask.

  4. Should we support sequences, such as lists, or only standardize integer array indices?

Jake: in general, JAX does not support lists where NumPy does due to a perf footgun.

Leo: Yeah no Python objects as N-D indexers if possible, plz.

Leo: I found the instance in which PyTorch preferred list arguments: the repeat() API: https://github.com/data-apis/array-api/issues/654

Sebastian: not applicable here.

Resolution: only support integer arrays.

  5. __setitem__ semantics for multi-dimensional integer arrays?

    • __getitem__ results in an array having a larger rank
    • should we add a sentence saying that integer array indexing only applies to the RHS of an assignment operation for the time being, with LHS semantics determined in 2025?
    • NumPy creates a LHS view and then assigns such that the last value wins

Example:

In [1]: z2
Out[1]:
array([[[ 0.,  1.,  2.,  3.,  4.],
        [ 5.,  6.,  7.,  8.,  9.],
        [10., 11., 12., 13., 14.],
        [15., 16., 17., 18., 19.],
        [20., 21., 22., 23., 24.]],

       [[25., 26., 27., 28., 29.],
        [30., 31., 32., 33., 34.],
        [35., 36., 37., 38., 39.],
        [40., 41., 42., 43., 44.],
        [45., 46., 47., 48., 49.]],

       [[50., 51., 52., 53., 54.],
        [55., 56., 57., 58., 59.],
        [60., 61., 62., 63., 64.],
        [65., 66., 67., 68., 69.],
        [70., 71., 72., 73., 74.]],

       [[75., 76., 77., 78., 79.],
        [80., 81., 82., 83., 84.],
        [85., 86., 87., 88., 89.],
        [90., 91., 92., 93., 94.],
        [95., 96., 97., 98., 99.]]])
        
In [2]: z2.shape
Out[2]: (4, 5, 5)
        
In [3]: v2 = np.linspace(1,81,81).reshape((3,3,3,3))

In [4]: v2
Out[4]:
array([[[[ 1.,  2.,  3.],
         [ 4.,  5.,  6.],
         [ 7.,  8.,  9.]],

        [[10., 11., 12.],
         [13., 14., 15.],
         [16., 17., 18.]],

        [[19., 20., 21.],
         [22., 23., 24.],
         [25., 26., 27.]]],


       [[[28., 29., 30.],
         [31., 32., 33.],
         [34., 35., 36.]],

        [[37., 38., 39.],
         [40., 41., 42.],
         [43., 44., 45.]],

        [[46., 47., 48.],
         [49., 50., 51.],
         [52., 53., 54.]]],


       [[[55., 56., 57.],
         [58., 59., 60.],
         [61., 62., 63.]],

        [[64., 65., 66.],
         [67., 68., 69.],
         [70., 71., 72.]],

        [[73., 74., 75.],
         [76., 77., 78.],
         [79., 80., 81.]]]])
         
In [5]: z2[[1],[1],np.ones((3,3,3,3),dtype='int32')] = v2

In [6]: z2
Out[6]:
array([[[ 0.,  1.,  2.,  3.,  4.],
        [ 5.,  6.,  7.,  8.,  9.],
        [10., 11., 12., 13., 14.],
        [15., 16., 17., 18., 19.],
        [20., 21., 22., 23., 24.]],

       [[25., 26., 27., 28., 29.],
        [30., 81., 32., 33., 34.],   # <-- only element updated; last value (81.) wins
        [35., 36., 37., 38., 39.],
        [40., 41., 42., 43., 44.],
        [45., 46., 47., 48., 49.]],

       [[50., 51., 52., 53., 54.],
        [55., 56., 57., 58., 59.],
        [60., 61., 62., 63., 64.],
        [65., 66., 67., 68., 69.],
        [70., 71., 72., 73., 74.]],

       [[75., 76., 77., 78., 79.],
        [80., 81., 82., 83., 84.],
        [85., 86., 87., 88., 89.],
        [90., 91., 92., 93., 94.],
        [95., 96., 97., 98., 99.]]])

Resolution: punt setitem to 2025.

  6. Allowed integer array data types?

    • Torch only allows native int types, such as int32 and int64, but not uint64 or int16

Resolution: must support default indexing dtype.

  7. Other considerations?

Leo: re: out of bounds, cupy cannot handle it like in numpy: https://docs.cupy.dev/en/stable/user_guide/difference.html#out-of-bounds-indices

Sparse interchange update

Hameer: PRs up for SciPy and in PyData/Sparse.

DLPack

Sebastian: if Leo shows up, we could discuss briefly how to do fp6 in DLPack. The questions are around whether to pad to bytes or not, and whether to use .bits or extend it.

Leo: I’d like to discuss https://github.com/dmlc/dlpack/issues/65.

Leo: folks at NVIDIA are complaining that DLPack is slow.

Leo: getting various requests about the C API. Want to get overhead down to lower than kernel launch latency.


Meeting minutes 6 February 2025

Time: 10am PDT, 1pm EDT, 6pm GMT, 8pm CEST
Present: Ralf Gommers, Aaron Meurer, Hameer Abbasi, Tim Head, Athan Reines, Nikita Grigorian, Sebastian Berg, Jake Vanderplas, Simon Adorf, Evgeni Burovski, Leo Fang,

  1. Announcements
  2. Triage
  3. DLPack update
  4. Adding spread_dims to array libraries?
  5. Binsparse update
  6. Downstream library update

Notes

Announcements

Simon: software engineer at NVIDIA, working on cuML. Dropping in to learn more.

Leo: cupy 13.4.0 in preparation (to support ctk 12.8/blackwell gpu)

Triage

AR: What remains for v2024?

DLPack Update

Ralf: progress on DLPack v1.0. PyTorch PR is up, but not finished. Should get in for PyTorch 2.7. Peter Hawkins was looking at JAX, which is the last major library we are tracking. He identified that it would be nice to have better test coverage; DLPack is currently missing from the test suite. This is likely one of the most important missing APIs in the test suite. Hopefully, we can borrow round-trip tests from NumPy et al.

Jake: no updates from me regarding JAX.

Leo: speaking of JAX, it doesn't currently support CUDA stream -1, which was a hack to work around a limitation. We plan to file an issue for JAX/XLA.

Leo: DLPack covers all the basic data types; however, with ML use cases, need has arisen for supporting lower precision data types.

Leo: quick way out would be to add new DLPack enumerated dtype variants. This solves the immediate needs. However, I am wondering if this is too cumbersome and would inflate the enum variants too much. Thoughts?

Hameer: for all the floating-point dtypes, do they all boil down to sign bit plus exponent bits plus mantissa?

Jake: that is mostly true; however, they can vary in terms of their representation of NaN, etc. My main concern is that there is a lack of agreement as to what those dtypes mean/represent.

Jake: I think we could potentially standardize around the 4 that PyTorch has and the ML dtypes. However, there could be others that come along based on research.

Sebastian: I think it should be fine, as we have enough enum space. And we are not asking for everyone to support. I think inflating the enum is easier compared to the approach with byte sizes being split out. And even if another name takes over, we add an alias for the name.

Jake: PyTorch bf16 is not compatible with ML dtypes bf16.

Ralf: I would hope that we can standardize bf16 in the next year.

Jake: would be nice to get that into NumPy.
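
For context, the "ML dtypes" referenced are those provided by the ml_dtypes package (used by JAX); a small illustration, assuming ml_dtypes is installed:

import numpy as np
import ml_dtypes

x = np.array([1.0, 2.5], dtype=ml_dtypes.bfloat16)
print(x.dtype)               # bfloat16, usable through NumPy's dtype machinery
print(x.astype(np.float32))  # [1.  2.5]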

Leo: okay, so the plan is to open a new PR to add new enums to DLPack. Will seek feedback.

spread_dims (v2025; Athan)

  • Discussion: https://github.com/data-apis/array-api/issues/760#issuecomment-2602163086

  • Proposal:

    def spread_dims(x: array, ndims: int, axes: Tuple[int, ...]) -> array
    
  • TL;DR: expand_dims has issues when supporting multiple axes due to ambiguities when mixing nonnegative and negative indices, thus hindering standardization of a common use case. Is there room for adding a separate API which handles the "spreading" of dimensions more explicitly?
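
For context (NumPy behavior): expand_dims already accepts a tuple of axes, but portably specifying a mix of nonnegative and negative axes is where the ambiguity arises.

import numpy as np

x = np.ones((2, 3))
print(np.expand_dims(x, axis=(0, 2)).shape)  # (1, 2, 1, 3)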

Sebastian: Not sure about naming, I think I could be convinced if there are some people saying that it seems nicer for their use-cases.
The new API does make sense, but I am not immediately sure it would be used much. Asking for a non-ambiguous subset is of course easier in a sense.

Resolution: try to standardize unambiguous subset of behavior. Revisit PyTorch objections.

binsparse update

Hameer: discussion this week about adding blocked sparse formats, as well as additional formats. This should hopefully address the missing major format in the binsparse specification.

Ralf: is there a complete implementation somewhere?

Hameer: of the Python version, no. There is still the discussion of the one or two function version. Will try to implement the two function version in SciPy and PyData/Sparse.

Downstream libraries update

Evgeni: we are planning on dropping old NumPy support in SciPy.

Evgeni: another version update is that array-api-strict will require Python v3.12 and up. Change coming this summer.

Jake: seems fine to me to follow SPEC 0.

Tim: for array-api-strict, should not be a problem, as we only use it during testing.

Tim: in sklearn, we decided to not follow SPEC 0, as we believe it moves too fast. For NumPy, we look to whatever the default is being shipped on Ubuntu LTS.

Tim: that said, we may be open to it.

Ralf: I'd say we shouldn't drop it so long as sklearn supports it.

Tim: may be good to coordinate.


Meeting minutes 23 January 2025

Time: 10am PDT, 1pm EDT, 6pm GMT, 8pm CEST
Present: Ralf Gommers, Oleksandr Pavlyk, Sebastian Berg, Aaron Meurer, Evgeni Burovski, Nikita Grigorian,

  1. Announcements
  2. Triage
  3. Clarifying copy kwarg behavior in asarray
  4. Adding support for a copy kwarg to real and conj
  5. Adding dtype kwarg support to fftfreq and rfftfreq
  6. Supported data type kinds in astype
  7. Adding spread_dims to array libraries?

Notes

Announcements

Oleksandr: joining NVIDIA as part of CUDA Python team.

Nikita: work for Intel. Been working with Oleksandr at Intel. Background in physics and ML research.

Triage

Clarifying copy kwarg behavior in asarray (Athan)

  • Discussion: https://github.com/data-apis/array-api/issues/495

  • PR: https://github.com/data-apis/array-api/pull/886

    • Proposed note: "A common use case for setting copy equal to True is to guarantee that, when provided an array x, asarray(x, copy=True) returns an array which may be mutated without affecting user data. For conforming implementations which disallow mutation, explicitly copying array data belonging to a known array type to a new memory location may not be necessary. Accordingly, such implementations may choose to ignore the copy keyword argument when obj is an array belonging to that implementation."
  • Related discussions:

  • Summary: in the specification, we have tried to avoid prescribing implementation behavior. However, a question that has arisen on multiple occasions is what is meant by "copy". In general, we've tended to (informally) say "logical copy", which means that, so long as a library can guarantee that a returned array behaves semantically like a copy, then conforming libraries can avoid actually allocating and copying to new memory. However, implementors have struggled with understanding the intended semantics of "copy", and there seems to be some subtlety in terms of desired use cases:

  • Additional confusion arises due to a divergence in astype and asarray kwarg behavior, where copy=False does NOT mean never copy, but only copy when the input and desired output dtype differ.

  • APIs supporting a copy kwarg in the specification:

    • asarray
      • "If True, the function must always copy. If False, the function must never copy for input which supports the buffer protocol and must raise a ValueError in case a copy would be necessary. If None, the function must reuse existing memory buffer if possible and copy otherwise."
        • Comment: TMK, for this API, we've tended to define "copy" as a logical copy.
      • Default: None
    • reshape
      • "If True, the function must always copy. If False, the function must never copy. If None, the function must avoid copying, if possible, and may copy otherwise."
        • Comment: should we raise a ValueError here in case a copy would be necessary and copy == False?
        • Comment: here, "copy" can be both logical and physical (e.g., for strided implementations, some shapes may require copying data to a new contiguous chunk of memory)
      • Default: None
    • __dlpack__
      • "If True, the function must always copy (performed by the producer). If False, the function must never copy, and raise a BufferError in case a copy is deemed necessary (e.g. if a cross-device data movement is requested, and it is not possible without a copy). If None, the function must reuse the existing memory buffer if possible and copy otherwise."
      • Default: None
    • astype
      • "If True, a newly allocated array must always be returned. If False and the specified dtype matches the data type of the input array, the input array must be returned; otherwise, a newly allocated array must be returned."
        • Comment: here, "copy" means physical copy (i.e., "newly allocated")
      • Default: True
  • Questions to focus discussion:

    1. What language should we add to the standard to clarify what the specification means by "copy" and where should this clarification live?
    2. Do we need an API which explicitly guarantees a physical copy? (ref: John and Jake/JAX use cases)
    3. Should astype behavior be aligned with asarray et al in terms of (a) default behavior and (b) support of None?
    4. Should astype be allowed to reinterpret elements without allocation (i.e., no forced copy) when dtypes do not agree but are of the same bit width or are a multiple of the desired bit width (e.g., float32 => int32, float64 => int32, etc)?

Initial recommendation: recommend a physical copy, unless you can guarantee that an array can be mutated without side effects. Providing that guarantee requires whole-program analysis.

Recommendation: add copy/views to the design topics. If a copy is made, the intent is fulfilled: namely, that mutation can happen free of side effects.

Recommendation: put a copy function in array-api-extra, as one-liner convenience function.

Addendum: For users, if copy is True, then you are free to use in-place operations.
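
An illustration of the addendum above, using NumPy >= 2.0 semantics for asarray's copy kwarg:

import numpy as np

a = np.arange(3)
b = np.asarray(a, copy=True)
b += 10
print(a)  # [0 1 2]: the caller's data is unaffected, so in-place ops on b are safe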

Sebastian: I think the question is, if the user does:

b = np.asarray(a, copy=True)
c = np.from_dlpack(b)

Does b ensure that a cannot be modified, or do we allow libraries to be sloppy about it (telling users to be careful about from_dlpack there)?

Oleksandr: I think modifying b can actually modify a. We would need b = asarray(a); c = from_dlpack(b, copy=True).

Aaron: But if asarray actually did the copy, the copy in from_dlpack could be redundant.

Aaron: also depends on provenance of array data (e.g., from a buffer, dlpack). If coming from upstream, then cannot actually know if data cannot be mutated. If a user specifies copy, then may be indicating to actually perform the physical copy in order to guard against mutation.

Recommendation: astype => ship has sailed on this one. Change in NumPy would be very disruptive.

Recommendation: separate proposal for reinterpreting memory. Would also deviate from astype as could change shape. Could be a candidate for array-api-extra, using library-specific functions.

Adding support for a copy kwarg to real and conj (Athan)

  • Discussion: https://github.com/data-apis/array-api/pull/884
  • TL;DR: libraries having the concept of strides, such as NumPy, return a strided view, rather than copying real components to a new memory location, and libraries such as PyTorch perform "lazy conjugation" by setting a conjugate bit. In order to perform explicit conjugation, PyTorch provides conj_physical. This discussion raises the question of whether we should add an explicit copy kwarg to real and conj to force data copying, as downstream consumers may want to mutate elements free of side effects and a copy kwarg would allow signaling intent.

Oleksandr: I would be in favor of separate APIs for forcing a physical copy.

Sebastian: for real and imag, could add a copy kwarg. But for conj in NumPy, this is a ufunc and we do not want to have to add a copy kwarg to every ufunc.

Oleksandr: in dpctl, we implement as ufuncs.

Aaron: in compat, we had to use conj_physical.

Ralf: IIRC, that PyTorch behavior was a bug.

Ralf: moving things away from ufuncs would be a huge implementation lift and would not be backward-compat. Can also force a copy using: asarray(real(x), copy=True).

Oleksandr: would be nice to have dedicated APIs for explicitly setting either real or imaginary components.

Ralf: can do something like: real(x) + 1.j * fill(pi)

Oleksandr: if not JAX, this is rather wasteful given the number of temporary copies. Not just about perf, but also limited memory usage on GPUs.

Recommendation: shelve this.

Adding dtype kwarg support to fftfreq and rfftfreq (Athan)

  • Discussion: https://github.com/data-apis/array-api/issues/717
  • PR: https://github.com/data-apis/array-api/pull/885
  • TL;DR: these two creation APIs lack the ability to specify an output dtype and thus the output floating-point precision. Currently, an array having the default floating-point dtype is returned. This PR proposes to add support for specifying a dtype, with the caveat that NumPy et al do not currently support a dtype kwarg (note: PyTorch does support a dtype kwarg), so this would be new in various libraries and would need to be supported in array-api-compat in the interim.

Recommendation: move forward. No objections.

Supported data type kinds in astype (Athan)

Oleksandr: need to accommodate array libraries which support, e.g., float16.

Aaron: I would punt this to v2025 and allow it to incubate in the compat layer to allow for further discussion.

Evgeni: I vote for deferring and add implementation to the compat layer.

spread_dims (v2025; Athan)

  • Discussion: https://github.com/data-apis/array-api/issues/760#issuecomment-2602163086

  • Proposal:

    def spread_dims(x: array, ndims: int, axes: Tuple[int, ...]) -> array
    
  • TL;DR: expand_dims has issues when supporting multiple axes due to ambiguities when mixing nonnegative and negative indices, thus hindering standardization of a common use case. Is there room for adding a separate API which handles the "spreading" of dimensions more explicitly?

(postponed until next meeting)


Meeting minutes 9 January 2025

Time: 10am PDT, 1pm EDT, 6pm GMT, 8pm CEST
Present: Ralf Gommers, Athan Reines, Evgeni Burovski, Hameer Abbasi, Sebastian Berg, Oleksandr Pavlyk, Tim Head, Jake Vanderplas, Lucas Colley, Aaron Meurer, Leo Fang,

  1. Announcements
  2. Triage
  3. Mutable arrays in capabilities (Lucas)
  4. __binsparse__ next steps and updates (Hameer)
  5. Type promotion behavior in diff (Athan)
  6. Adding nan_to_num to the specification
  7. Adding support for specialized matrix products
  8. Other 2025 focus areas and/or missing APIs?

Notes

Announcements

  • Happy New Year!

Ralf: any announcements?

Lucas: spoke with Tim and Olivier about getting array-api-extra in sklearn. PR is up, and looks like this is good to go.

Triage

Mutable arrays in capabilities (Lucas)

Lucas: context was DLPack in SciPy. I believe we skip this, due to Evgeni's PR to add support for the buffer protocol.

Ralf: sounds good.

__binsparse__ next steps and updates (Hameer)

  • Discussion: https://github.com/data-apis/array-api/issues/840
  • Seems like some kind of blocked-sparse support will land in the binsparse protocol; there's general interest.
  • Also there is general interest in supporting the in-memory interchange.
  • Should we have __binsparse_format__/.format? To me (Hameer), the answer is yes, similar to __dlpack_device__/.device.

Hameer: there is a point of contention about having two separate protocols: one for querying the format and another for actually materializing/transferring data.

Hameer: any issues with having two separate APIs?

Ralf: to me, this is analogous to the dense case, with __dlpack_device__ and .device.

Ralf: seems like we are good to go here.

Outcome: go ahead and implement.

Athan: what is the current status of the binsparse spec?

Leo: IIRC, there were some concerns around performance due to parsing JSON, etc.

Hameer: I think the initial path forward is to use Python dictionaries. However, no real progress on a C struct, TMK.

Leo: so they are not ready to commit to a standardized struct layout?

Hameer: correct. Discussions are still ongoing.

Ralf: seems like starting with Python makes sense as helps us get to a prototype faster.

Type promotion behavior in diff (Athan)

Outcome: maintaining status quo is fine here.

Adding nan_to_num to the specification

Outcome: adding to array-api-extra seems like preferred path.

Adding specialized methods for matrix products

Jake: to me, these are not necessary to add to the standard.

Ralf: we can reconsider in the future if all libraries in the standard have implemented.

Leo: IMO, we should focus on standardizing einsum, rather than having these specialized APIs.

Outcome: punt on this.

Any specific focus areas and/or missing APIs for 2025?

Jake: the big item is in-place updates, setitem. Still some design work to be done.

Ralf: agreed. I think we will also need to revisit the cond function, as anything in an if creates problems.

Ralf: typing is another area.

Lucas: in SciPy, the things we have punted on is device transfers. Related: https://github.com/data-apis/array-api-extra/pull/86#issuecomment-2580929411

Ralf: DLPack v1.0 would be nice to get supported.

Oleksandr: more or less, we are good in OneAPI/sycl. Where the spec is silent, we choose to align with NumPy.

Ralf: would be good to make progress on documentation.

Ralf: we finally have GPU CI in SciPy. We currently use Cirrus runners. These may be useful for other projects. Currently, much more effective and cheaper than using GitHub. Link: https://cirrus-runners.app. One of the big advantages is caching.

Leo: this would be great for CuPy.

Leo: thus far, only Linux support and no multi-GPU support. But that would be a high ask to extend to Windows and multi-GPU.

Lucas: quick announcement regarding having array API support with unit libraries: https://github.com/orgs/quantity-dev/discussions.

Athan: is there anything we need to do for missing data/masks? Ref: https://github.com/data-apis/array-api/issues/875

Ralf: there may be room here for tentative proposals in order to get consensus across the ecosystem. In this case, it would be good to ensure consistency across NumPy.

Sebastian: re: NumPy inconsistency: all of the ones that don’t have it, are a bit weird ones (i.e. not reduction implementation wise). So that’s why they don’t have it.

Lucas: would it make sense to extend membership to consumer libraries? Ref: https://github.com/data-apis/governance/issues/26

Ralf: we should tweak the governance language. The easy part is just adding folks to the list.


Meeting minutes 12 December 2024

Time: 10am PDT, 1pm EDT, 6pm GMT, 8pm CEST
Present: Ralf Gommers, Athan Reines, Hameer Abbasi, Oleksandr Pavlyk, Sebastian Berg, Evgeni Burovski, Lucas Colley, Tim Head, Aaron Meurer, Jake Vanderplas,

  1. Announcements
  2. Triage (Athan)
  3. Scalar support in where (Athan/Tim)
  4. Scalar support in result_type (Athan)
  5. Require that dtypes obey Python hashing rules (Evgeni)
  6. Mutable arrays in capabilities (Lucas)
  7. __binsparse__ next steps and updates (Hameer)

Notes

Announcements

  • Cancel the meeting on Dec 26? Next meeting Jan 9, 2025?

Ralf: everyone good with canceling on Dec 26? Good. Canceled.

Ralf: any other announcements?

Triage

Ralf: need a proper discussion in NumPy to ensure alignment. That should happen as a prerequisite before merging.

Oleksandr: what is the hold-up?

Ralf: NaN sorting.

Sebastian: kwarg support.

Scalar support in where (Athan/Tim)

Outcome: restrict where to having a condition argument being an array. See also https://github.com/data-apis/array-api/issues/807#issuecomment-2159156834. Require at least one of x1 or x2 to be an array.

Scalar support in result_type (Athan)

Outcome: require at least one array or dtype object.

Hashing rules for dtypes (Evgeni)

Evgeni: use case in sklearn to be able to add dtypes as dictionary keys.

Aaron: in terms of hashing, the issue is NumPy as it has different ways of representing dtypes and they don't hash the same. This has been a pain, so would be nice to resolve.

Aaron: the current issue is that x.dtype and np.float64 do not hash the same, as one is an instance and one is a class.

Sebastian: don't think we can fix the hashing, in this case. We may be able to change the hash of the scalar type. But would need to put in the work to make this happen.

Aaron: to circle back, using dtypes as dictionary keys relies on hashing, which differs between the two (see below).

>>> import numpy as np
>>> x = np.asarray(0.)
>>> {np.float64: 0, x.dtype: 1}
{<class 'numpy.float64'>: 0, dtype('float64'): 1}
>>> x.dtype == np.float64
True

Aaron: I wrote a whole blog post about this a long time ago https://www.asmeurer.com/blog/posts/what-happens-when-you-mess-with-hashing-in-python

Evgeni: could we standardize around __str__?

Ralf: problem there is that would mean dtypes from different array libraries would compare as true, which is not desired.

Outcome: need a better summary on the issue, with the various constraints.

Ideal outcome: NumPy could fix this.

Sebastian: I think you can fix it to the extent that you want it fixed.

Mutable arrays in capabilities (Lucas)

(postponed discussion)

Lucas: this can wait, depending on how urgent it is for array-api-strict to move away from __array__.

__binsparse__ next steps and updates (Hameer)

  • Seems like some kind of blocked-sparse support will land in the binsparse protocol; there's general interest.
  • Also there is general interest in supporting the in-memory interchange.
  • Should we have __binsparse_format__/.format? To me (Hameer), the answer is yes, similar to __dlpack_device__/.device.

(postponed discussion)


Meeting minutes 28 November 2024

Time: 10am PDT, 1pm EDT, 6pm GMT, 8pm CEST
Present: Ralf Gommers, Athan Reines, Tim Head, Evgeni Burovski, Lucas Colley, Nathaniel Starkman, Sebastian Berg, Joren Hammudoglu, Aaron Meurer,

Agenda

  1. Announcements
  2. Revisiting materialize (Hameer)
  3. __binsparse__ next steps and updates (Hameer)
  4. Adding isin to the specification
  5. Static typing
  6. Triage
  7. array-api-extra and scikit-learn (Lucas)

Notes

Announcements

  • Aaron's departure from Quansight
  • Meeting logistics
    • Public calendar?
      • (Ralf) In the meantime, happy to add folks to my calendar invite for this.

Ralf: will figure out a public calendar invite. In the meantime, please let me know if you'd like to be added to my private calendar invite. Just make sure you set your timezone to UTC.

Evgeni: should we still have a Google Group?

Sebastian: We could use the scientific-python discuss (happy to use it for numpy too, FWIW).

Ralf: that may be better.

Aaron: IMO we should keep public discussions on GitHub, since that's where things have been mostly done already

Revisiting materialize (Hameer)

  • Issue: https://github.com/data-apis/array-api/issues/839
  • Graph breaks work for e.g. PyTorch
  • Sometimes (e.g. Dask, Galley/sparse) hints are absolutely necessary, and automatic breaks can be sub-optimal.
    • E.g. iterative algorithms should materialize after each iteration

(postponed until next meeting)

__binsparse__ next steps and updates (Hameer)

  • Seems like some kind of blocked-sparse support will land in the binsparse protocol; there's general interest.
  • Also there is general interest in supporting the in-memory interchange.
  • Should we have __binsparse_format__/.format? To me (Hameer), the answer is yes, similar to __dlpack_device__/.device.

(postponed until next meeting)

Adding isin to the specification (Lucas or Ralf)

Ralf: question for the second argument in terms of supported values: tensor and scalar?

Sebastian: how would type promotion work? Scalars make sense (not sure how important it is). The big thing you could skip is promotion, if you force the second argument to match the first's dtype?

Ralf: could restrict to same dtype?

Evgeni: does the spec specify hashing behavior?

Aaron: I think the implementation is usually based on searchsorted, so not usually hashing.

Sebastian: could be implemented using hashing, in principle.

Sebastian: should see what NumPy does in terms of type promotion.

Aaron: I would tend toward following what we do for equal.

Aaron: separate question: what is the kind argument? The choice of algorithm?

Lucas: yes, and certainly for Dask, could be more important for being able to choose algorithm in distributed contexts.

Ralf: specifying the algorithm is not something we want to specify/require.

Joren: how would different shapes work? Would the second argument ravel?

Sebastian: we could just limit the second argument to 1D.

Nathaniel: looking at NumPy's implementation, just uses __contains__. So invariant to shape of array.

Joren: if allow other shapes for second argument, could be confusing, as users may expect different output shape.

Sebastian: I think it’s fine to limit, but also fine not to if others support it.

Ralf: I think we can punt on this decision for now and do more R&D. At a minimum, we can specify 1D and always expand to n-dimensional later.

Static typing (Nathaniel)

Nathaniel: was motivated as a downstream consumer of array libraries in order to build array-agnostic libraries.

Nathaniel: I am interested in eventually achieving having something pip-installable to allow static and runtime type checking.

Joren: agree. A standalone library seems like the most obvious solution. If it is separate from the spec, that would also be nice for release cadence.

Joren: have also thoughts on typing of dtypes and devices.

Ralf: seems like we have converged over the past week to have a separate thing called array-api-typing.

Nathaniel: Nick Tesser is another individual who'd likely be interested in getting involved.

Joren: dtypes are not particularly amenable to static typing atm. Currently there is not a way to statically distinguish between dtypes. Related: https://github.com/jorenham/optype/issues/25.

Nathaniel: personally, I'd love if array could be parameterized by the dtype. Could also be nice to parameterize by backend, as well. Would allow seeing how dtypes flow through a function.

Action item: get the typing repo/project going and we can iterate there.

Action item: we should make clear that the typing stubs we have now are there for Sphinx and not intended to be used directly.

Triage (Athan)

Athan: unless folks object, will go ahead and merge within the next week or so.

Ralf: add a comment to that extent so everyone gets a ping.

array-api-extra and scikit-learn (Lucas)

  • Question: vendor (git submodule new for scikit-learn?) vs optional dependency (not useful?) vs hard dependency (not feasible?)

Lucas: sklearn has array-api-compat as an optional dep, unlike SciPy, which vendors it.

Tim: the reason we don't currently use array-api-extra is that we don't currently need it. We just have some custom implementations/wrappers. I suggest creating an issue which articulates how to transition.

Tim: most of what goes into sklearn's _array is what is in NumPy, but not in the Array API standard. Occasionally, it is something outside of NumPy.

Lucas: could work to have extras as an optional dep, which only gets pulled in if array-api-compat installed.

Tim: I am slightly leaning toward vendoring. Currently, if array-api-compat is missing, we print a message instructing the user to install. Could do the same, but not ideal to impose other actions.

Aaron: I'd advocate for vendoring, especially for backward-compat reasons and handling breaking changes.

Ralf: agreed that vendoring is preferred.

Tim: Lucas, I suggest creating an issue.

Lucas: seems like vendoring could be applicable to array-api-compat.


Meeting minutes 14 November 2024

Time: 10am PDT, 1pm EDT, 6pm GMT, 8pm CEST
Present: Ralf Gommers, Athan Reines, Hameer Abbasi, Evgeni Burovski, Leo Fang, Sebastian Berg, Aaron Meurer, Oleksandr Pavlyk, Tim Head,

Agenda

  1. Intros & context on community meeting
  2. Announcements
  3. __array__ support (Aaron)
  4. Future of the project as a dispatching mechanism? (Leo)
  5. Revisiting materialize (Hameer)
  6. __binsparse__ next steps and updates (Hameer)
  7. Triage

Notes

Please help to edit meeting notes collaboratively during the call! And afterwards, please help ensure that key points, plans and decisions are recorded on the relevant GitHub issues.

Intros and call overview

Ralf: maybe next time we can announce it on the mailing lists of SciPy, sklearn, and other interested projects.

Tim: key will be ensuring good minutes.

Announcements

  • Blog post about the new CZI EOSS cycle 6 award and related work that is planned: https://data-apis.org/blog/eoss6_award/.
  • Array API strict (Aaron)
    • Started working on v2024 support (preliminary work in a PR, but will merge and can be enabled via a flag)
      • Still need to add scalar support
    • Still need test suite support

Aaron: next meeting is Thanksgiving.

Ralf: we should probably keep the meeting, as better than being off for a month. We should just avoid making major decisions.

Aaron: Array API strict update. Have started working on v2024; however, it needs to be enabled via a flag. This is all preliminary. Assumes that functions are already implemented correctly in NumPy. The test suite still needs to be updated. Biggest gap is scalar support, as we have yet to standardize. Regardless, if using the strict library and you want to use one of the new functions, you should be able to access it now, provided you opt in.
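A sketch of the opt-in being described, assuming the flag-setting helper that array-api-strict exposes (the exact version string may differ depending on the installed release):

```python
import array_api_strict as xp

# Opt in to the draft v2024 behavior via the library's flags mechanism.
xp.set_array_api_strict_flags(api_version="2024.12")

# New-in-v2024 functions should then be reachable from the namespace,
# to the extent they are already implemented.
```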

__array__ support (Aaron)

Aaron: originally, the strict library was numpy.array_api. During that time, we had added __array__ which allows NumPy to convert input objects to NumPy arrays.
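For context, a minimal sketch of the __array__ hook being discussed (the wrapper class is hypothetical; the dtype/copy parameters follow the NumPy 2.x calling convention):

```python
import numpy as np

class Wrapped:
    def __init__(self, data):
        self._data = data

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this when converting the object via np.asarray/np.array.
        return np.asarray(self._data, dtype=dtype)

np.asarray(Wrapped([1, 2, 3]))  # -> array([1, 2, 3])
```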

Aaron: I attempted to remove from the strict library; however, this broke many things in SciPy.

Oleksandr: this has been problematic in the context of GPU libraries, especially for 0D arrays.

Sebastian: could just implement __array__ and then error or do whatever else makes sense. Want to avoid the sequence route and just error.

Hameer: what about removing __array__ and then just rely on DLPack?

Ralf: if we could implement the Buffer protocol in the strict library, then that would resolve the issue, I believe.

Sebastian: cannot do that, as would not work on many arrays. Only applicable for CPU arrays.

Aaron: strict library does have some "fake" device support. We could say that the Buffer protocol works for default devices (CPU) and then error for non-default devices.

Ralf: agree with Sebastian, as DLPack is preferable here.

Aaron: should we put something in the standard about requirements for implementing the Buffer protocol, if you can? At least in Python v3.12, you can implement the Buffer protocol in Python. While arguably not how most libraries would implement it, it should be possible.
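A rough sketch of what a pure-Python buffer protocol implementation looks like since Python 3.12 (PEP 688); the CPUArray class is hypothetical, and a real implementation would also honor the flags argument:

```python
import numpy as np

class CPUArray:
    def __init__(self, data):
        self._data = np.asarray(data)

    def __buffer__(self, flags):
        # Expose the underlying CPU memory; a non-CPU array would raise here.
        return memoryview(self._data)

np.asarray(CPUArray([1.0, 2.0, 3.0]))  # conversion goes through the buffer protocol
```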

Hameer: some libraries would not be able to implement via C API. Sparse and Dask would have difficulty.

Evgeni: what about just avoiding the Buffer protocol and only using DLPack?

Sebastian: not sure what problem we are trying to solve.

Evgeni: in SciPy, we were relying on xp.asarray for any object, including for objects from other namespaces.

Ralf: there were many places in SciPy where we intentionally needed to convert to NumPy arrays.

Sebastian: I think it is safe to add to the standard that, if on CPU, you should always implement the Buffer protocol.

Oleksandr: for non-CPU arrays, libraries such as SciPy would explicitly error, and then user would be required to make an explicit transfer to CPU.

Action item: update the standard to require Buffer protocol for array libraries supporting CPU.

Future of the project as a dispatching mechanism? (Leo)

  • Three mechanisms for dispatching:
    • Pure Python, array/tensor based programs
      • This is the Array API, as used in SciPy, sklearn, etc
    • Import hook hijacking (ex: cudf.pandas)
      • Allows intercepting import statements, while still allowing isinstance checks to continue working.
    • Backend dispatch system (ex: cudf.polars, NetworkX)
      • Allow drop-in replacement, as everything is transparent to upstream user/developer.

Leo: NetworkX approach is currently being discussed elsewhere, and I wonder if there is an opportunity to standardize something in this area.

Ralf: I believe there is a fourth option here. The above assumes that SciPy/sklearn is doing things in Python, but this may not be true.

Oleksandr: my understanding for NetworkX is that a user can say, for this task, I have an implementation for the GPU, so make a transfer to the GPU, perform computation, and then transfer back.

Ralf: at least in SciPy, if we get an array in, and if there is a matching function, then delegate to that. E.g., if CuPy array in, introspect, and then explicitly import CuPy and then call the corresponding function in CuPy.
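A hedged sketch of this delegation pattern (the function and the CuPy counterpart are illustrative examples, not SciPy's actual internals):

```python
from array_api_compat import array_namespace, is_cupy_array

def logsumexp_dispatch(x):
    # Delegate to a native implementation when one exists for this array type.
    if is_cupy_array(x):
        from cupyx.scipy.special import logsumexp as cupy_logsumexp
        return cupy_logsumexp(x)
    # Otherwise fall back to a portable Array API implementation.
    xp = array_namespace(x)
    m = xp.max(x)
    return m + xp.log(xp.sum(xp.exp(x - m)))
```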

Ralf: But there is another approach, nanobind, which allows supporting Torch, NumPy, and other array libraries at the same time.

Ralf: You can also write Cython memory view code that works with multiple CPU libraries.

Leo: https://nanobind.readthedocs.io/en/latest/api_extra.html#n-dimensional-array-type

Leo: Seems like a similar idea to array-api-{compat,strict}, or what we have in cuda.core.StridedMemoryView.

Oleksandr: but these work the same way in that the computation happens on the same device as the array that came in. But there is another use case here where you want multi-device support.

Leo: returning to the original question: are we at a stage where we can make explicit recommendations for people to consider for their projects?

Sebastian: I think this topic doesn’t fit in a single meeting. :)

Ralf: I am not sure, as this is still early days.

Leo: right, but then we can at least say that here are 5 or so approaches, but here is why you should first try Array API support.

Ralf: correct, and we rank according to complexity.

Tim: would be good to connect with the Scientific Python community on this, as there is interest and some discussions happening.

Ralf: I think it is a matter of writing a good overview. Something along the lines of: here are the 4-5 conceptual approaches, here are a few concrete implementations, and here is some guidance on the pitfalls, shortcomings, and strengths of each approach.

Action item: write an overview.

Leo: proving out some of these approaches could be a good project for an intern.

Revisiting materialize (Hameer)

  • Graph breaks work for e.g. PyTorch
  • Sometimes (e.g. Dask, Galley/sparse) hints are absolutely necessary, and automatic breaks can be sub-optimal.
    • E.g. iterative algorithms should materialize after each iteration

Hameer: as an alternative to a materialize API, could we standardize around a compile decorator which is a no-op for those libraries not supporting compilation?
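A minimal sketch of that idea, purely illustrative: a compile decorator that is a no-op for libraries without a compiler, while compiling backends could map it to their own machinery (e.g. jax.jit or torch.compile):

```python
def compile(func=None, **hints):
    # No-op backend: return the function unchanged; a compiling backend
    # would instead return a compiled/traced version and may use `hints`.
    def wrap(f):
        return f
    return wrap if func is None else wrap(func)


@compile
def step(x):
    return x * 2  # behaves identically whether or not compilation exists
```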

Ralf: this is likely a non-starter in libraries such as Jax and Torch, as they are not going to change.

(postponed until next meeting)

__binsparse__ next steps and updates (Hameer)

  • Seems like some kind of blocked-sparse support will land in the binsparse protocol; there's general interest.
  • Also there is general interest in supporting the in-memory interchange.
  • Should we have __binsparse_format__/.format? To me (Hameer), the answer is yes, similar to __dlpack_device__/.device.

(postponed until next meeting)

Triage (Athan)

(postponed until next meeting)
