Time: 10am PDT, 1pm EDT, 5pm GMT, 7pm CEST
Present: Ralf Gommers, Athan Reines, Hameer Abbasi, Rok Mihevc, Aaron Meurer, Leo Fang,
Agenda:
eig and eigvals
meshgrid/broadcast_arrays return type
matrix_transpose when fewer than 2 dims
full?
count_nonzero?
Adding put_along_axis to the specification
Ralf: SciPy release coming up. Trying to get Array API updates in. Release branch cut on the 17th; then 5 weeks until the final release.
Can None be returned as a device? (jake/evgeni)
Ralf: yes, seems like everyone is in agreement.
Hameer: can None be compared, as devices must be comparable?
Ralf: yes, confirmed.
Resolution: update the spec accordingly. Add a single sentence that None may be returned, as long as None is accepted by the device kwarg.
Hameer: can this be backported?
Aaron: yes, I think so, as this would be more of a clarification than anything.
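A minimal sketch of the clarified contract (xp and x below are placeholders for any conforming namespace and array; this is illustrative, not text from the meeting):

def device_roundtrip_ok(xp, x):
    # Per the resolution: .device may be None, as long as the returned value
    # is accepted back by the device= keyword and devices remain comparable.
    dev = x.device                          # may be None for single-device libraries
    y = xp.asarray([1, 2, 3], device=dev)   # passing it back must be accepted
    return y.device == dev                  # None == None, so comparability still holds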
eig and eigvals (evgeni)
Leo: added in cuSolver. In theory, this is doable now. Not sure what the API should look like. May be possible to follow NumPy. In CuPy, should be doable soonish.
Ralf: decent amount of usage across multiple SciPy modules. My guess is that it is generally useful, but we should likely wait until it is available in CuPy. In theory, once CUDA 12.6 is out, it should be available everywhere, as CuPy always builds with the latest CUDA release. Uses CUDA minor version compatibility.
Aaron: does it always return complex?
Ralf: there may be a slight hiccup in that JAX diverges from NumPy. NumPy has a value-dependent output dtype.
Ralf: some homework needs to be done to see if this is addressable in NumPy.
Resolution: tentative yes, provided can be addressed in NumPy.
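For reference, a small NumPy illustration of the value-dependent output dtype mentioned above (JAX reportedly returns a complex dtype unconditionally):

import numpy as np

# symmetric input -> real eigenvalues, so NumPy returns a real dtype
print(np.linalg.eigvals(np.eye(2)).dtype)                             # float64
# rotation matrix -> complex eigenvalues, so the output dtype switches
print(np.linalg.eigvals(np.array([[0., -1.], [1., 0.]])).dtype)       # complex128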
(postponed)
meshgrid/broadcast_arrays return type (evgeni)
Ralf: the reason it was changed in NumPy 2.0 was that lists are problematic with Numba. Tuples are immutable, and compilers are happier here.
Ralf: would need to check with Jax whether it can be changed.
Aaron: can we use Sequence?
Ralf: for output types, this is not desirable.
Ralf: IIRC, only reason you want a list is if you wanted a mutable container, but not clear that this is a dominant need.
Aaron: during NumPy v2.0, were there any instances of downstream concerns?
Ralf: no, as mainly just need to index.
Resolution: get JAX to change. If they do not agree, then we may need to update the spec to allow both Tuple | List.
Resolution: combine the issues into one, and audit the spec to determine whether any other APIs are affected.
matrix_transpose when fewer than 2 dims (athan)
Leo: context: we are trying to support the Array API in cuTile. We wanted to know how to be compliant after conversations with the compiler team.
Ralf: I vote for raising an exception, after checking with various implementations.
Resolution: confirm with existing implementations and then, if all in agreement, add language to raise an exception.
full? (athan)
full_like(fill_value, float(fill_value)). What if the fill_value dtype is different from the dtype kwarg value?
Aaron: seems like the Dask behavior is a mistake, and should be fixed. Based on Matt's testing, it seems like the dtype kwarg always takes precedence.
Ralf: seems okay, although not as nice from typing perspective.
Aaron: would also allow for device inference, not just dtype inference, from fill_value.
Resolution: yes, if the issue is fixed in Dask. If this is a real need, do the work. Can also support in compat. Mildly in favor.
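A hedged sketch of the behavior implied by the resolution (xp and fill are placeholders; 0-D array fill_value support is a proposal, not current spec):

def make_filled(xp, fill):
    # fill is assumed to be a 0-D array
    a = xp.full((2, 2), fill)                    # dtype (and device) inferred from fill
    b = xp.full((2, 2), fill, dtype=xp.float64)  # an explicit dtype kwarg takes precedence
    return a, b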
(postponed)
Adding put_along_axis to the specification
inplace=False when calling?
Ralf: NumPy would need to change its behavior, as it currently returns None.
Ralf: given that the ecosystem is still not universally aligned, this is harder to push forward. array-api-extra supports at and takes indices which are arrays. JAX's at supports arbitrary indices. In that case, the use case seems covered.
Resolution: ecosystem needs greater alignment and problem is not particularly urgent (only used in 2 places in SciPy).
Resolution: punt on the need for put and put_along_axis. If we are going to really solve this, pursue a view-tracking implementation. The second-best option is supporting an at function, which would be more general than these functions, as it can be used not only for arbitrary in-place updates, but also for in-place operations (e.g., add, subtract, etc.).
Resolution: keep https://github.com/data-apis/array-api/issues/177 open, as this is a recurring discussion.
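For reference, the JAX at syntax referred to above looks like the following; array-api-extra exposes a similar at helper for portable code (exact usage there may differ):

import jax.numpy as jnp

x = jnp.zeros(5)
y = x.at[2].set(7.0)     # functional "in-place" update: returns a new array
z = x.at[1:3].add(1.0)   # also covers in-place-style operations (add, multiply, ...)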
Time: 10am PDT, 1pm EDT, 5pm GMT, 7pm CEST
Present: Ralf Gommers, Athan Reines, Evgeni Burovski, Aaron Meurer, Guido Imperiale, Rok Mihevc, Leo Fang,
No announcements.
Previous discussion 9 January 2025
cond function
How the array-api-* packages fit together
nan* functions
What are the pain points? Where are the gaps?
Question: the compatibility layer continues to grow as a shim layer. Over the past couple of years, we've grown increasingly reliant on array-api-compat
to provide compatible behavior, which has enjoyed success in downstream libraries, such as SciPy and sklearn, among others. However, the compat layer's existence, to some degree, disincentivizes upstream array libraries from adding compliant behavior, and users moving from one library to the next still cannot simply swap an import and enjoy a frictionless experience, at least for the common core of Array API functionality. So the question becomes: should we be making a greater effort to upstream compliant behavior and better align array libraries? Some folks, such as the JAX community, have made great strides to improve Array API compliance; others, such as PyTorch, have stalled out.
(test suite run with max-examples=1; note: failures are over-represented by special cases)
Guido: cannot think of use cases for a cond function thus far. However, I have not implemented any convergence algorithms. I would like to see a use case. I could see a need for while and for.
Ralf: the idea is lazy boolean evaluation. I agree that we have yet to identify a high priority use case. Thus far, we have not been bitten, yet.
Ralf: update on DLPack v1.0. PyTorch PR adding this. Implementation looks ready.
Leo: I already told our PyTorch team about this, as well.
Ralf: missing data/masks is likely not a priority, as most array libraries do not handle them. nan* functions, however, yes, there may be room here.
Guido: I think tutorials along the lines of "you want to write an array API compatible package, here is how you do it" would help.
Evgeni: some worked examples. Here is a SciPy function; here is how you make it array API compatible.
Guido: agreed.
Guido: how to migrate from pure eager to lazy and the design decisions that are possible. Demonstrate alternative designs. E.g., in SciPy, there are ongoing discussions regarding generalized dispatch mechanisms.
Ralf: may not be great for generalized tutorial/guide, as specific to SciPy. May be better as developer docs for SciPy, itself.
Aaron: my main gripe with standard documentation is that it can be difficult to find information. E.g., when going to homepage, currently only a table of contents. Not clear that the meat of the spec is in a sub-folder.
Ralf: agreed. Would be good to provide some high-level context on the specification homepage.
Guido: having some way of querying whether an array is read-only, or of requiring an array to be read-only. Essentially, I want an API which I can call at the beginning of any Python function to return a read-only view, in any API that uses setitem, in order to avoid accidental mutation.
Guido: CuPy doesn't exactly replicate NumPy's semantics, such as in broadcasting, where NumPy returns a read-only view in order to avoid updating multiple elements at once. CuPy does not cover this edge case, as it doesn't have the same implementation details.
Aaron: wouldn't copy-on-write be another approach for addressing this issue?
Ralf: yes. NumPy is likely moving in this direction, due to free-threading support.
Ralf: I think this topic is one of the main areas that needs addressing.
Guido: that, and at, as it is currently missing in Sparse. And even in JAX.
Ralf: still the same problem, as it can be solved via view tracking. And there is no point in discussing with JAX until we solve the problems in PyTorch and NumPy. Once we do, it is simply syntax to add. The at problem is the in-place update problem, which is the view-tracking problem.
Guido: I think JAX also struggles when have a mutable wrapper around an immutable JAX array.
Ralf: the reason JAX does not want to discuss this is that it is not currently possible to replicate NumPy behavior. However, if we are able to solve this in NumPy, then it becomes possible to do so in JAX.
(postponed until next meeting)
Time: 10am PDT, 1pm EDT, 5pm GMT, 7pm CEST
Present: Ralf Gommers, Hameer Abbasi, Rok Mihevc, Joren Hammudoglu, Jake Vanderplas, Lucas Colley, Aaron Meurer, Guido Imperiale, Tim Head, Tyler Reddy, Evgeni Burovski, Raúl Cumplido, Leo Fang,
Rok: contributor to Arrow for past 6 years. Been working on tensor arrays in Arrow.
Raúl: Also a contributor to Apache Arrow.
Ralf: any announcements?
Tim: anyone going to be at PyCon Germany next week? If so, happy to meet up.
bool | int | float | complex
Joren: general question regarding whether to follow official Python spec.
Joren: for historical reasons, the union of bool and int is equivalent to int. Similarly for other unions.
Ralf: we should document the divergence. Spec is docs. Typing library is for type checking behavior.
Aaron: consider the typing library as the authoritative implementation.
Joren: can also clarify that some things may be accepted even if not supported.
Ralf: we should update the typing design topic page. Essentially, agree on the separation of concerns.
Guido: so, in array-api-compat, which types should be followed?
Joren: there are also cases where type checkers behave differently, so which choice is followed does impact end result.
Resolution: array-api-compat should follow the separate typing library. We should follow up in the spec and be more explicit about where, e.g., bool is not allowed when int is, etc.
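A small illustration of the divergence Joren describes, using plain Python typing (not the consortium's stubs):

def scale(factor: int) -> int:
    return factor * 2

scale(True)  # type checkers accept this, since bool is a subclass of int;
             # the spec prose can still state that bool is not allowed where int is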
Lucas: someone with permissions can help debug docs deployment.
Lucas: would be good to get feedback from other libraries, so that we can make the assertion infrastructure standardized across libraries.
Aaron: my suggestion might be to make things a PyTest plugin, as would then allow fixtures.
Guido: is there a reason why NumPy testing not shipping as a plugin?
Evgeni: predating plugins?
Aaron: that is likely one reason.
Guido: fixtures end up being rather different. E.g., SciPy uses a number of SciPy-specific env flags. array-api-extra uses a bunch of magic to test against different backends.
Lucas: idea would be to identify what could be pulled out and should be pulled out, but yes, point taken that there is going to be a subset of behavior which cannot be pulled out.
Lucas: Tim, would you be willing to take a look at how different we are doing for sklearn?
Tim: had a quick look. Most of the time they use assert and NumPy equality checks.
Lucas: yes, different in SciPy, but still worthwhile to have another look.
with cp.cuda.Device(1):
    y = cp.asarray(1, device=cp.cuda.Device(0))  # y should be on Device(0)
    z = y + 1  # device of z?
Guido: currently, the specification provides recommendations, but is not prescriptive. Atm, we have issues in SciPy where we encounter difficulties using CuPy with arrays allocated on different devices.
Tim: based on the example above, what would the device of z be?
Guido: based on my understanding of the spec, z should be on Device(0), as the input array device to a function takes precedence.
Tim: okay, and in this case, 1 should inherit the device of y.
Ralf: and CuPy chokes?
Guido: correct.
Tim: right, the problem seems to be that CuPy ends up creating 1 on device 1, rather than 0.
Ralf: seems like a CuPy bug.
Lucas: CuPy also has guidance that context manager should take precedence. Would need to discuss further to determine whether following spec would mean a backward compat break.
Ralf: I think the only thing which is reasonable is for z to be on the same device as y.
Resolution: we should clarify in the spec that device guidance for scalar arguments behaves similarly to how dtype is inherited from the array argument in binops.
MetadataArray library? xpx.lazy_apply?
Lucas: general question about where metadata should live to allow consuming libraries to inspect array flavors.
Guido: applies more generally anytime you need to special-case a library. E.g., I was not able to implement xp.at without special-casing JAX, as there is no standardized behavior in the standard which would support JAX's at semantics. Another case is where Dask cannot support boolean masks when there is an unknown shape.
Ralf: discussion at https://github.com/pydata/duck-array-discussion/issues may be relevant.
Guido: another problem area is lazy_apply. It requires quite a bit of special casing for Dask and JAX. And if either comes wrapped by, say, MArray, nothing will work.
Guido: not sure about solutions. One solution is to have a standardized API for wrapping and unwrapping, but this is problematic when it comes to metadata. For various use cases, you need domain-specific knowledge in order to generate metadata.
Ralf: I am very skeptical that we'd be able to standardize anything here. Even one layer of wrapping is already difficult. Not a new request: people have been trying to do this for over a decade. Given the niche aspects of the various wrappers, I think it may be difficult to do in a generic way which is not terrible for performance.
Guido: could be possible to generalize for element-wise ops.
Ralf: yes, but that is not going to get you very far.
Guido: in general, it is easy to standardize an API for wrapping and unwrapping, until you crash and burn with respect to metadata. E.g., going across chunks in Dask. In order to rebuild metadata, you need to know the backend and the operation involved.
Hameer: same thing in Sparse.
Lucas: the problem is when the metadata is not completely determined by performing the operation.
Ralf: right now, there are only a few of these libraries, and, atm, it seems like these libraries have to know about one another.
Ralf: I think this entire enterprise is not amenable to standardization. Even masked arrays are niche. NumPy has them, but other libraries do not.
Guido: Dask wrapping a CuPy array is not niche.
Ralf: correct, but that is one use case. I think we can revisit this in a year and see where things stand.
Previous discussion 9 January 2025
cond function
What are the pain points? Where are the gaps?
(postponed until next meeting)
Time: 10am PDT, 1pm EDT, 5pm GMT, 7pm CEST
Present: Ralf Gommers, Joren Hammudoglu, Aaron Meurer, Hameer Abbasi, Lucas Colley, Guido Imperiale, Athan Reines, Evgeni Burovski, Tyler Reddy, Nikita Grigorian, Leo Fang, Tim Head, Sebastian Berg,
Vendoring array-api-compat and array-api-extra rather than having optional dependencies: https://github.com/scikit-learn/scikit-learn/pull/30340
Leo: DLPack v1.1 tagged and released. In this release, added more enumerators. Support for fp4, fp6, fp8 ML data types.
Fix typing for the indexing kwarg in meshgrid: was str; updated to the Literal values.
Should full be updated to allow 0D arrays for the fill_value argument?
Should vecdot support a keepdims kwarg? Should vecdot support multiple axes?
Discussion concerning multi-device support across ecosystem libraries
Guido: would be good to discuss. We can either postpone until the end of the meeting or next time.
What should be the expected behavior when providing an empty array to vector_norm and matrix_norm?
Nested subsequences containing arrays provided to asarray
Resolution: fft and linalg; require that extension behavior be present on the array object itself.
Aaron: could binsparse be implemented for dense arrays, as well?
Hameer: in principle, yes, but it is not clear why you would want to do that, as it is less efficient than just using DLPack.
Aaron: do we need to be cognizant of using sparse as a sub-namespace name? E.g., PyTorch has a dedicated sparse namespace.
Ralf: no, I don't think this should be a problem, as PyTorch has its APIs primarily in the main namespace.
Aaron: could we think about adding an is_sparse_array API?
Ralf: passing a dense array to from_binsparse should be disallowed. In general, we should recommend that dense arrays not implement the __binsparse__ method.
How to do fp6 in DLPack? (Sebastian) .bits or extend?
Standardize a C interface for stream exchange: https://github.com/dmlc/dlpack/issues/65 (Leo)
Leo: one concern is that NumPy cannot represent sub-byte types. NumPy will pad, even though types use fewer bits than a full byte (fp4, only 4 bits).
Leo: added a flag to determine whether a dtype is padded or packed.
Leo: would be good to get feedback.
Sebastian: NumPy dtypes can probably be made to support it, but the array object wouldn’t be able to do anything with it.
Leo: Another issue is https://github.com/dmlc/dlpack/issues/74.
Ralf: this would be adding new C APIs but matching the semantics of what we have in Python.
Leo: correct. Aim is to have more to discuss in 2-4 weeks.
Hameer: in general, if using these compact dtypes, they should not be packed.
Leo: correct, if on GPUs, you don't want to pad. However, NumPy, at least, cannot support this. JAX uses the NumPy dtype system to allow for easy prototyping. The goal would be to allow NumPy to at least express these specialized dtypes.
Leo: at NVIDIA, we noticed that, even with DLPack implemented at C level, too much overhead for kernel launch. At a bare minimum, need only data pointer, shape, and strides. Don't have an explicit ask here, but I at least wanted to share our experience after having worked with DLPack over the past few years. This, at least, is how CuPy does things.
bool | int | float | complex
Lucas: should array-api-compat follow the sklearn dep policy? Seems reasonable to me, but it would mean testing NumPy v1.22 in CI.
Guido: I think one of the concerns is that policy means keeping NumPy v1 until 2027.
Ralf: a couple of comments. SPEC 0 has not been universally accepted. PyTorch tried to be a bit more loose, but got feedback from users, so they essentially adopted the sklearn policy.
Aaron: in general, the compat library should be as conservative as possible, and not drop versions prematurely. The compat library is a compatibility library and the entire point is to smooth versions. Main thing is CI matrix is a bit annoying to maintain.
Aaron: the strict library can support the latest and greatest. Strict is not a runtime dependency. They should not necessarily have the same policies.
Ralf: NumPy v2 has only been out for 9 months. There are still major projects which haven't migrated. So it is likely that v1.26 will be around for another year and some months.
Guido: Python 3.12 is the last Python version for which NumPy v1 wheels exist.
Tim: there is an escape hatch in the sklearn policy. Maybe more of an appetite for dropping NumPy v1 in 12-18 months.
(further discussion related to points above postponed until next meeting; especially typing which should be placed at beginning of agenda)
Previous discussion 9 January 2025
cond function
What are the pain points? Where are the gaps?
(postponed until next meeting)
Meeting 20 Mar: CANCELLED
Meeting 6 Mar: CANCELLED
Time: 10am PDT, 1pm EDT, 6pm GMT, 8pm CEST
Present: Athan Reines, Evgeni Burovski, Hameer Abbasi, Aaron Meurer, Tim Head, Stephan Hoyer, Sebastian Berg, Jake Vanderplas, Leo Fang,
Overview of integer array indexing support
https://github.com/data-apis/array-api-tests/pull/343
NumPy: supports all sorts of fancy indexing, maybe even too much
Torch: all tests pass
CuPy: all tests pass
JAX: does not support lists for fancy indexing. https://github.com/jax-ml/jax/issues/4564
ndonnx: all fancy indexing tests fail with ValueError: The lazy Array Array(dtype=Int32) cannot be converted to a scalar int
mlx: (unknown)
Dask: Understands a single 1D index array. Fails to allow
da.arange(5)[da.asarray([[1], [1]])] -> NotImplementedError: Slicing with dask.array of ints only permitted when the indexer has zero or one dimensions
a2 = da.arange(12).reshape(3, 4), then a2[1, da.array([0, 1])] works but a2[da.array([1, 0]), da.array([0, 1])] -> NotImplementedError: Don't yet support nd fancy indexing
da.take does not seem to allow arrays either
Stephan: it does appear that ONNX does support N-dimensional integer array indexing via its gather API: https://onnx.ai/onnx/operators/onnx__GatherND.html
Should we support multiple integer index arrays?
Dask only seems to support x[[1,2,3]] and:
In [36]: a3.compute()
Out[36]:
array([[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]],
[[ 8, 9],
[10, 11]]])
In [38]: a3[0, 1, da.array([0, 1])].compute()
Out[38]: array([2, 3])
Stephan: without multiple integer index arrays, not really that useful. Dask can support, just may not be that efficient.
Athan: could make it an optional feature, similar to boolean indexing.
Stephan: I don't think you want the binary choice, as no technical limitation. It is more whether it is advisable to actually use integer array indexing.
Jake: it's not really a property you can query or reason about, as there is no way to guarantee efficiency due to random access.
Resolution: yes, support multiple integer arrays.
Should we support multi-dimensional integer arrays?
NumPy, PyTorch, and JAX support; however, Dask does not.
a3[0, da.array([0]), da.array([0, 1])].compute() -> NotImplementedError
What would be the workaround in the compatibility layer?
Aaron: for the compat layer, you could flatten it and use unravel_index, etc.
Aaron: for ndonnx, unless they do chunking, there may not be a data dependence, and should be feasible.
Stephan: biggest use case for me is in XArray as it allows generalized indexing.
Tim: for sklearn, main use case is single array for randomly selecting elements.
Resolution: yes, support multi-dimensional integer arrays. In the compat layer, will need to add a helper function which can do the right thing for Dask.
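A rough sketch of the kind of helper discussed, using NumPy's ravel_multi_index to combine per-axis index arrays into flat indices (the function name and a Dask-aware variant are hypothetical):

import numpy as np

def fancy_index_2d(x, idx0, idx1):
    # combine per-axis integer index arrays into flat indices, then gather
    flat = np.ravel_multi_index((np.asarray(idx0), np.asarray(idx1)), x.shape)
    return np.take(np.reshape(x, (-1,)), flat)

x = np.arange(12).reshape(3, 4)
assert np.array_equal(fancy_index_2d(x, [0, 2], [1, 3]), x[[0, 2], [1, 3]])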
Should we support sequences, such as lists, or only standardize integer array indices?
Jake: in general, JAX does not support lists where NumPy does due to a perf footgun.
Leo: Yeah no Python objects as N-D indexers if possible, plz.
Leo: I found the instance in which PyTorch did not prefer list arguments: PyTorch preferred list for the repeat() API: https://github.com/data-apis/array-api/issues/654
Sebastian: not applicable here.
Resolution: only support integer arrays.
__setitem__ semantics for multi-dimensional integer arrays?
__getitem__ results in an array having a larger rank. Example:
In [1]: z2
Out[1]:
array([[[ 0., 1., 2., 3., 4.],
[ 5., 6., 7., 8., 9.],
[10., 11., 12., 13., 14.],
[15., 16., 17., 18., 19.],
[20., 21., 22., 23., 24.]],
[[25., 26., 27., 28., 29.],
[30., 31., 32., 33., 34.],
[35., 36., 37., 38., 39.],
[40., 41., 42., 43., 44.],
[45., 46., 47., 48., 49.]],
[[50., 51., 52., 53., 54.],
[55., 56., 57., 58., 59.],
[60., 61., 62., 63., 64.],
[65., 66., 67., 68., 69.],
[70., 71., 72., 73., 74.]],
[[75., 76., 77., 78., 79.],
[80., 81., 82., 83., 84.],
[85., 86., 87., 88., 89.],
[90., 91., 92., 93., 94.],
[95., 96., 97., 98., 99.]]])
In [2]: z2.shape
Out[2]: (4, 5, 5)
In [3]: v2 = np.linspace(1,81,81).reshape((3,3,3,3))
In [4]: v2
Out[4]:
array([[[[ 1., 2., 3.],
[ 4., 5., 6.],
[ 7., 8., 9.]],
[[10., 11., 12.],
[13., 14., 15.],
[16., 17., 18.]],
[[19., 20., 21.],
[22., 23., 24.],
[25., 26., 27.]]],
[[[28., 29., 30.],
[31., 32., 33.],
[34., 35., 36.]],
[[37., 38., 39.],
[40., 41., 42.],
[43., 44., 45.]],
[[46., 47., 48.],
[49., 50., 51.],
[52., 53., 54.]]],
[[[55., 56., 57.],
[58., 59., 60.],
[61., 62., 63.]],
[[64., 65., 66.],
[67., 68., 69.],
[70., 71., 72.]],
[[73., 74., 75.],
[76., 77., 78.],
[79., 80., 81.]]]])
In [5]: z2[[1],[1],np.ones((3,3,3,3),dtype='int32')] = v2
In [6]: z2
Out[6]:
array([[[ 0., 1., 2., 3., 4.],
[ 5., 6., 7., 8., 9.],
[10., 11., 12., 13., 14.],
[15., 16., 17., 18., 19.],
[20., 21., 22., 23., 24.]],
[[25., 26., 27., 28., 29.],
[30., 81., 32., 33., 34.],  # note: only this element was updated (to 81.)
[35., 36., 37., 38., 39.],
[40., 41., 42., 43., 44.],
[45., 46., 47., 48., 49.]],
[[50., 51., 52., 53., 54.],
[55., 56., 57., 58., 59.],
[60., 61., 62., 63., 64.],
[65., 66., 67., 68., 69.],
[70., 71., 72., 73., 74.]],
[[75., 76., 77., 78., 79.],
[80., 81., 82., 83., 84.],
[85., 86., 87., 88., 89.],
[90., 91., 92., 93., 94.],
[95., 96., 97., 98., 99.]]])
Resolution: punt setitem to 2025.
Allowed integer array data types?
int32 and int64, but not uint64 or int16
Resolution: must support default indexing dtype.
Leo: re: out of bounds, cupy cannot handle it like in numpy: https://docs.cupy.dev/en/stable/user_guide/difference.html#out-of-bounds-indices
Hameer: PRs up for SciPy and in PyData/Sparse.
Sebastian: if Leo shows up, we could discuss briefly how to do fp6 in DLPack. The questions are around whether to pad to bytes or not, and whether to use .bits or extend it.
Leo: I’d like to discuss https://github.com/dmlc/dlpack/issues/65.
Leo: folks at NVIDIA are complaining that DLPack is slow.
Leo: getting various requests about the C API. Want to get overhead down below kernel launch latency.
Time: 10am PDT, 1pm EDT, 6pm GMT, 8pm CEST
Present: Ralf Gommers, Aaron Meurer, Hameer Abbasi, Tim Head, Athan Reines, Nikita Grigorian, Sebastian Berg, Jake Vanderplas, Simon Adorf, Evgeni Burovski, Leo Fang,
spread_dims to array libraries?
Simon: software engineer at NVIDIA, working on cuML. Dropping in to learn more.
Leo: cupy 13.4.0 in preparation (to support ctk 12.8/blackwell gpu)
Discussion on the ambiguity of shape behavior
Adding broadcast_shapes to the specification
Clarify support for negative indices in take and take_along_axis
Clarify accuracy requirements for sqrt
Define Python complex <op> fp_array: 1j*<real_float_array>
Clarify type promotion in in-place operators
Clarify clip behavior when min and max have different dtypes than x
Fix incorrect boundary conditions in searchsorted
Scalar argument support
AR: What remains for v2024?
copy kwarg
Ralf: progress on DLPack v1.0. PyTorch PR is up, but not finished. Should get in for PyTorch 2.7. Peter Hawkins was looking at JAX, which is the last major library we are tracking. He identified that it would be nice to have a better test suite. Currently missing in the test suite. This is likely one of the most important missing APIs in the test suite. Hopefully, we can borrow round-trip tests from NumPy et al.
Jake: no updates from me regarding JAX.
Leo: speaking of JAX, it doesn't currently support the string CUDA-1, which was a hack to work around a limitation. We plan to file an issue for JAX/XLA.
Leo: DLPack covers all the basic data types; however, with ML use cases, need has arisen for supporting lower precision data types.
Leo: quick way out would be to add new DLPack enumerated dtype variants. This solves the immediate needs. However, I am wondering if this is too cumbersome and would inflate the enum variants too much. Thoughts?
Hameer: for all the floating-point dtypes, do they all boil down to sign bit plus exponent bits plus mantissa?
Jake: that is mostly true; however, they can vary in terms of their representation of NaN, etc. My main concern is that there is a lack of agreement as to what those dtypes mean/represent.
Jake: I think we could potentially standardize around the 4 that PyTorch has and the ML dtypes. However, there could be others that come along based on research.
Sebastian: I think it should be fine, as we have enough enum space. And we are not asking for everyone to support. I think inflating the enum is easier compared to the approach with byte sizes being split out. And even if another name takes over, we add an alias for the name.
Jake: PyTorch bf16 is not compatible with ML dtypes bf16.
Ralf: I would hope that we can standardize bf16 in the next year.
Jake: would be nice to get that into NumPy.
Leo: okay, so the plan is to open a new PR to add new enums to DLPack. Will seek feedback.
spread_dims (v2025; Athan)
Discussion: https://github.com/data-apis/array-api/issues/760#issuecomment-2602163086
Proposal:
def spread_dims(x: array, ndims: int, axes: Tuple[int, ...]) -> array
TL;DR: expand_dims has issues when supporting multiple axes due to ambiguities when mixing nonnegative and negative indices, thus hindering standardization of a common use case. Is there room for adding a separate API which handles the "spreading" of dimensions more explicitly?
Sebastian: Not sure about naming, I think I could be convinced if there are some people saying that it seems nicer for their use-cases.
The new API does make sense, but I am not immediately sure it would be used much. Asking for a non-ambiguous subset is of course easier in a sense.
Resolution: try to standardize unambiguous subset of behavior. Revisit PyTorch objections.
Hameer: discussion this week about adding blocked sparse formats, as well as additional formats. This should hopefully address the missing major format in the binsparse specification.
Ralf: is there a complete implementation somewhere?
Hameer: of the Python version, no. There is still the discussion of the one or two function version. Will try to implement the two function version in SciPy and PyData/Sparse.
Evgeni: we are planning on dropping old NumPy support in SciPy.
Evgeni: another version update is that array-api-strict will require Python v3.12 and up. Change coming this summer.
Jake: seems fine to me to follow SPEC 0.
Tim: for array-api-strict, should not be a problem, as we only use it during testing.
Tim: in sklearn, we decided to not follow SPEC 0, as we believe it moves too fast. For NumPy, we look to whatever the default is being shipped on Ubuntu LTS.
Tim: that said, we may be open to it.
Ralf: I'd say we shouldn't drop it so long as sklearn supports it.
Tim: may be good to coordinate.
Time: 10am PDT, 1pm EDT, 6pm GMT, 8pm CEST
Present: Ralf Gommers, Oleksandr Pavlyk, Sebastian Berg, Aaron Meurer, Evgeni Burovski, Nikita Grigorian,
copy kwarg behavior in asarray
Adding copy kwarg to real and conj
Adding dtype kwarg support to fftfreq and rfftfreq
astype
spread_dims to array libraries?
Oleksandr: joining NVIDIA as part of the CUDA Python team.
Nikita: work for Intel. Been working with Oleksandr at Intel. Background in physics and ML research.
Adding dtype kwarg support to fft.fftfreq and fft.rfftfreq
Adding real argument support to real and conj
Clarifying copy kwarg behavior in asarray
Clarifying exceptions in __dlpack__
Clarifying broadcast behavior in broadcast_to
Applying type promotion rules according to device capabilities in result_type and can_cast
copy kwarg behavior in asarray (Athan)
Discussion: https://github.com/data-apis/array-api/issues/495
PR: https://github.com/data-apis/array-api/pull/886
Proposed wording (excerpt): "... copy equal to True is to guarantee that, when provided an array x, asarray(x, copy=True) returns an array which may be mutated without affecting user data. For conforming implementations which disallow mutation, explicitly copying array data belonging to a known array type to a new memory location may not be necessary. Accordingly, such implementations may choose to ignore the copy keyword argument when obj is an array belonging to that implementation."
Related discussions:
copy kwarg means for Dask: https://github.com/data-apis/array-api/issues/866
copy kwarg behavior in astype: https://github.com/data-apis/array-api/issues/788#issuecomment-2080416220
Summary: in the specification, we have tried to avoid prescribing implementation behavior. However, a question that has arisen on multiple occasions is what is meant by "copy". In general, we've tended to (informally) say "logical copy", which means that, so long as a library can guarantee that a returned array behaves semantically like a copy, then conforming libraries can avoid actually allocating and copying to new memory. However, implementors have struggled with understanding the intended semantics of "copy", and there seems to be some subtlety in terms of desired use cases:
Additional confusion arises due to a divergence in astype and asarray kwarg behavior, where, for astype, copy=False does NOT mean never copy, but only copy when the input and desired output dtype differ (see the astype entry below).
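To make the divergence concrete, a NumPy 2.x illustration (behavior as understood here, not spec text):

import numpy as np

x = np.asarray([1.0, 2.0])
y = np.asarray(x, copy=False)          # "never copy": raises if a copy were unavoidable
z = x.astype(np.float64, copy=False)   # same dtype -> no copy, returns x itself
w = x.astype(np.int64, copy=False)     # different dtype -> a copy is still made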
APIs supporting a copy kwarg in the specification:
asarray: "If True, the function must always copy. If False, the function must never copy for input which supports the buffer protocol and must raise a ValueError in case a copy would be necessary. If None, the function must reuse existing memory buffer if possible and copy otherwise."
reshape: "If True, the function must always copy. If False, the function must never copy. If None, the function must avoid copying, if possible, and may copy otherwise."
Question: should a ValueError be raised here in case a copy would be necessary and copy == False?
__dlpack__: "If True, the function must always copy (performed by the producer). If False, the function must never copy, and raise a BufferError in case a copy is deemed necessary (e.g. if a cross-device data movement is requested, and it is not possible without a copy). If None, the function must reuse the existing memory buffer if possible and copy otherwise."
astype: "If True, a newly allocated array must always be returned. If False and the specified dtype matches the data type of the input array, the input array must be returned; otherwise, a newly allocated array must be returned."
Questions to focus discussion:
Should astype behavior be aligned with asarray et al in terms of (a) default behavior and (b) support of None?
Should astype be allowed to reinterpret elements without allocation (i.e., no forced copy) when dtypes do not agree but are of the same bit width or are a multiple of the desired bit width (e.g., float32 => int32, float64 => int32, etc.)?
Initial recommendation: recommend a physical copy, unless you can guarantee that an array can be mutated without side effects. In order to guarantee that, you have to perform whole-program analysis.
Recommendation: add copy/view behavior to the design topics. If a copy is made, the intent is fulfilled: namely, that mutation can happen free of side effects.
Recommendation: put a copy function in array-api-extra, as a one-liner convenience function.
Addendum: for users, if copy is True, then you are free to use in-place operations.
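A minimal sketch of the suggested one-liner (hypothetical name and signature, not the actual array-api-extra API):

def copy(x, xp):
    # convenience wrapper: always return an array that is safe to mutate
    return xp.asarray(x, copy=True)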
Sebastian: I think the question is, if the user does:
b = array(a, copy=True)
np.from_dlpack(b)
Does b ensure that a cannot be modified, or do we allow libraries to be sloppy on it (telling users to be careful about from_dlpack there)?
Oleksandr: I think modifying b can actually modify a. We would need b = asarray(a); c = from_dlpack(b, copy=True).
Aaron: but if asarray actually did the copy, the copy in from_dlpack could be redundant.
Aaron: also depends on the provenance of the array data (e.g., from a buffer, DLPack). If coming from upstream, then we cannot actually know whether the data can be mutated. If a user specifies copy, then they may be indicating to actually perform the physical copy in order to guard against mutation.
Recommendation: astype => the ship has sailed on this one. A change in NumPy would be very disruptive.
Recommendation: separate proposal for reinterpreting memory. Would also deviate from astype, as it could change shape. Could be a candidate for array-api-extra, using library-specific functions.
Adding a copy kwarg to real and conj (Athan)
conj_physical. This discussion raises the question of whether we should add an explicit copy kwarg to real and conj to force data copying, as downstream consumers may want to mutate elements free of side effects, and a copy kwarg would allow signaling intent.
Oleksandr: I would be in favor of separate APIs for forcing a physical copy.
Sebastian: for real and imag, could add a copy kwarg. But for conj in NumPy, this is a ufunc and we do not want to have to add a copy kwarg to every ufunc.
Oleksandr: in dpctl, we implement these as ufuncs.
Aaron: in compat, we had to use conj_physical.
Ralf: IIRC, that PyTorch behavior was a bug.
Ralf: moving things away from ufuncs would be a huge implementation lift and would not be backward-compat. Can also force a copy using: asarray(real(x), copy=True).
Oleksandr: would be nice to have dedicated APIs for explicitly setting either real or imaginary components.
Ralf: can do something like: real(x) + 1.j * fill(pi)
Oleksandr: if not JAX, this is rather wasteful given the number of temporary copies. Not just about perf, but also limited memory usage on GPUs.
Recommendation: shelve this.
Adding dtype kwarg support to fftfreq and rfftfreq (Athan)
Proposal: add a dtype kwarg, with the caveat that NumPy et al do not currently support a dtype kwarg (note: PyTorch does support a dtype kwarg), so this would be new in various libraries and would need to be supported in array-api-compat in the interim.
Recommendation: move forward. No objections.
astype (Athan)
x + 1j yields a type promoted complex floating-point dtype; in standard-land, that is undefined, as 1j should be converted to the data type of x before performing the operation. This PR seeks to add support for specifying a data type kind so that one may do xp.astype(x, dtype="complex floating") + 1j, thus promoting x to the nearest representable complex floating-point data type before converting 1j and performing +.
Oleksandr: need to accommodate array libraries which support, e.g., float16.
Aaron: I would punt this to v2025 and allow to incubate in the compat to allow for further discussion.
Evgeni: I vote for deferring and adding an implementation to the compat layer.
spread_dims (v2025; Athan)
Discussion: https://github.com/data-apis/array-api/issues/760#issuecomment-2602163086
Proposal:
def spread_dims(x: array, ndims: int, axes: Tuple[int, ...]) -> array
TL;DR: expand_dims has issues when supporting multiple axes due to ambiguities when mixing nonnegative and negative indices, thus hindering standardization of a common use case. Is there room for adding a separate API which handles the "spreading" of dimensions more explicitly?
(postponed until next meeting)
Time: 10am PDT, 1pm EDT, 6pm GMT, 8pm CEST
Present: Ralf Gommers, Athan Reines, Evgeni Burovski, Hameer Abbasi, Sebastian Berg, Oleksandr Pavlyk, Tim Head, Jake Vanderplas, Lucas Colley, Aaron Meurer, Leo Fang,
__binsparse__ next steps and updates (Hameer)
diff (Athan)
Adding nan_to_num to the specification
Ralf: any announcements?
Lucas: spoke with Tim and Olivier about getting array-api-extra in sklearn. PR is up, and looks like this is good to go.
Adding scalar support to result_type: https://github.com/data-apis/array-api/pull/873
Adding scalar support to where: https://github.com/data-apis/array-api/pull/860
Adding scalar support to element-wise functions: https://github.com/data-apis/array-api/pull/862
Clarify zero-dimensional array behavior in take: https://github.com/data-apis/array-api/pull/876
Address default value for axis kwarg in vecdot: https://github.com/data-apis/array-api/pull/880 (axis is consistent for the vecdot and linalg.vecdot API signatures)
Clarify that sqrt must be correctly rounded: https://github.com/data-apis/array-api/pull/882 (sqrt should follow IEEE 754)
Lucas: context was DLPack in SciPy. I believe we skip this, due to Evgeni's PR to add support for the buffer protocol.
Ralf: sounds good.
__binsparse__ next steps and updates (Hameer)
__binsparse_format__/.format? To me (Hameer), the answer is yes, similar to __dlpack_device__/.device.
Hameer: point of contention is whether to have two separate protocols: one for querying format and another for actually materializing/transferring data.
Hameer: any issues with having two separate APIs?
Ralf: to me, this is analogous to the dense case, with __dlpack_device__ and .device.
Ralf: seems like we are good to go here.
Outcome: go ahead and implement.
Athan: what is the current status of the binsparse spec?
Leo: IIRC, there were some concerns around performance due to parsing JSON, etc.
Hameer: I think the initial path forward is to use Python dictionaries. However, no real progress on a C struct, TMK.
Leo: so they are not ready to commit to a standardized struct layout?
Hameer: correct. Discussions are still ongoing.
Ralf: seems like starting with Python makes sense as helps us get to a prototype faster.
diff (Athan)
Currently, append and prepend have the same dtype as x. NumPy supports type promotion, where optional array kwargs may affect the output array dtype. Are we okay maintaining the status quo, or should we follow NumPy and allow type promotion?
Outcome: maintaining the status quo is fine here.
Adding nan_to_num to the specification
Outcome: adding it to array-api-extra seems like the preferred path.
matvec and vecmat
Jake: to me, these are not necessary to add to the standard.
Ralf: we can reconsider in the future if all libraries in the standard have implemented.
Leo: IMO, we should focus on standardizing einsum, rather than having these specialized APIs.
Outcome: punt on this.
Jake: the big item is in-place updates, setitem. Still some design work to be done.
Ralf: agreed. I think we will also need to revisit the cond function, as anything in an if creates problems.
Ralf: typing is another area.
Lucas: in SciPy, the things we have punted on is device transfers. Related: https://github.com/data-apis/array-api-extra/pull/86#issuecomment-2580929411
Ralf: DLPack v1.0 would be nice to get supported.
Oleksandr: more or less, we are good in OneAPI/sycl. Where the spec is silent, we choose to align with NumPy.
Ralf: would be good to make progress on documentation.
Ralf: we finally have GPU CI in SciPy. We currently use Cirrus runners. These may be useful for other projects. Currently, much more effective and cheaper than using GitHub. Link: https://cirrus-runners.app. One of the big advantages is caching.
Leo: this would be great for CuPy.
Leo: thus far, only Linux support and no multi-GPU support. But that would be a high ask to extend to Windows and multi-GPU.
Lucas: quick announcement regarding having array API support with unit libraries: https://github.com/orgs/quantity-dev/discussions.
Athan: is there any thing we need to do for missing data/masks? Ref: https://github.com/data-apis/array-api/issues/875
Ralf: there may be room here for tentative proposals in order to get consensus across the ecosystem. In this case, it would be good to ensure consistency across NumPy.
Sebastian: re: NumPy inconsistency: all of the ones that don’t have it, are a bit weird ones (i.e. not reduction implementation wise). So that’s why they don’t have it.
Lucas: would it make sense to extend membership to consumer libraries? Ref: < https://github.com/data-apis/governance/issues/26>
Ralf: we should tweak the governance language. The easy part is just adding folks to the list.
Time: 10am PDT, 1pm EDT, 6pm GMT, 8pm CEST
Present: Ralf Gommers, Athan Reines, Hameer Abbasi, Oleksandr Pavlyk, Sebastian Berg, Evgeni Burovski, Lucas Colley, Tim Head, Aaron Meurer, Jake Vanderplas,
where (Athan/Tim)
result_type (Athan)
__binsparse__ next steps and updates (Hameer)
Ralf: everyone good with canceling on Dec 26? Good. Canceled.
Ralf: any other announcements?
Adding count_nonzero to the specification: https://github.com/data-apis/array-api/pull/803
Adding cumulative_prod to the specification: https://github.com/data-apis/array-api/pull/793
condition argument in where: https://github.com/data-apis/array-api/pull/868
mean: https://github.com/data-apis/array-api/pull/850
nan* reductions: https://github.com/data-apis/array-api/issues/621
Adding top_k to the specification: https://github.com/data-apis/array-api/pull/722
top_k mode kwarg
Ralf: need a proper discussion in NumPy to ensure alignment. That should happen as a precedent before merge.
Oleksandr: what is the hold-up?
Ralf: NaN sorting.
Sebastian: kwarg support.
where (Athan/Tim)
Should condition be allowed to be a scalar? What if x1 and x2 are scalars?
Outcome: restrict where to having a condition argument which is an array. See also https://github.com/data-apis/array-api/issues/807#issuecomment-2159156834. Require at least one of x1 or x2 to be an array.
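A short sketch of what the outcome permits (xp, cond, and fallback are placeholders):

def clamp_negative(xp, cond, fallback):
    # cond must be an array (a scalar condition is not allowed);
    # the scalar 0.0 is fine here because fallback is an array,
    # satisfying "at least one of x1 or x2 must be an array".
    return xp.where(cond, 0.0, fallback)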
result_type (Athan)
Outcome: require at least one array or dtype object.
name attribute; __str__ method
Evgeni: use case in sklearn to be able to add dtypes as dictionary keys.
Aaron: in terms of hashing, the issue is NumPy as it has different ways of representing dtypes and they don't hash the same. This has been a pain, so would be nice to resolve.
Aaron: the current issue is that x.dtype and np.float64 do not hash the same, as one is an instance and one is a class.
Sebastian: don't think we can fix the hashing, in this case. We may be able to change the hash of the scalar type. But would need to put in the work to make this happen.
Aaron: to circle back, the use of dictionary keys assume different hashing.
>>> x = np.asarray(0.)
>>> {np.float64: 0, x.dtype: 1}
{<class 'numpy.float64'>: 0, dtype('float64'): 1}
>>> x.dtype == np.float64
True
Aaron: I wrote a whole blog post about this a long time ago https://www.asmeurer.com/blog/posts/what-happens-when-you-mess-with-hashing-in-python
Evgeni: could we standardize around __str__?
Ralf: problem there is that would mean dtypes from different array libraries would compare as true, which is not desired.
Outcome: need a better summary on the issue, with the various constraints.
Ideal outcome: NumPy could fix this.
Sebastian: I think you can fix it to the extent that you want it fixed.
(postponed discussion)
Lucas: can wait, depending on how urgent it would be for array-api-strict to move away from __array__.
__binsparse__ next steps and updates (Hameer)
__binsparse_format__/.format? To me (Hameer), the answer is yes, similar to __dlpack_device__/.device.
(postponed discussion)
Time: 10am PDT, 1pm EDT, 6pm GMT, 8pm CEST
Present: Ralf Gommers, Athan Reines, Tim Head, Evgeni Burovski, Lucas Colley, Nathaniel Starkman, Sebastian Berg, Joren Hammudoglu, Aaron Meurer,
materialize (Hameer)
__binsparse__ next steps and updates (Hameer)
Adding isin to the specification
Ralf: will figure out a public calendar invite. In the meantime, please let me know if you'd like to be added to my private calendar invite. Just make sure you set your timezone to UTC.
Evgeni: should we still have a Google Group?
Sebastian: We could use the scientific-python discuss (happy to use it for numpy too, FWIW).
Ralf: that may be better.
Aaron: IMO we should keep public discussions on GitHub, since that's where things have been mostly done already
materialize (Hameer)
For some libraries (e.g., sparse), hints are absolutely necessary, and automatic breaks can be sub-optimal.
(postponed until next meeting)
__binsparse__ next steps and updates (Hameer)
__binsparse_format__/.format? To me (Hameer), the answer is yes, similar to __dlpack_device__/.device.
(postponed until next meeting)
Adding isin to the specification (Lucas or Ralf)
Ralf: question for the second element in terms of supported values: tensor and scalar?
Sebastian: how would type promotion work? Scalars make sense (not sure how important it is). The big thing you could skip is promotion: you could force the second one to match the first's dtype?
Ralf: could restrict to same dtype?
Evgeni: does the spec specify hashing behavior?
Aaron: I think the implementation is usually based on searchsorted, so not usually hashing.
Sebastian: could be implemented using hashing, in principle.
Sebastian: should see what NumPy does in terms of type promotion.
Aaron: I would tend toward following what we do for equal.
Aaron: separate question: what is the kind argument? The choice of algorithm?
Lucas: yes, and certainly for Dask, could be more important for being able to choose algorithm in distributed contexts.
Ralf: specifying the algorithm is not something we want to specify/require.
Joren: how would different shapes work? Would the second argument ravel?
Sebastian: we could just limit the second argument to 1D.
Nathaniel: looking at NumPy's implementation, it just uses __contains__. So invariant to the shape of the array.
Joren: if allow other shapes for second argument, could be confusing, as users may expect different output shape.
Sebastian: I think it’s fine to limit, but also fine not to if others support it.
Ralf: I think we can punt on this decision for now and do more R&D. At a minimum, we can specify 1D and always expand to n-dimensional later.
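To make Aaron's searchsorted point concrete, a small NumPy sketch for 1-D test elements (illustrative only; it assumes test_elements is non-empty and ignores promotion details):

import numpy as np

def isin_1d(x, test_elements):
    t = np.sort(np.asarray(test_elements).ravel())
    idx = np.searchsorted(t, x)        # insertion positions of x into sorted t
    idx = np.minimum(idx, t.size - 1)  # clamp so the gather below stays in bounds
    return t[idx] == x                 # elementwise exact-match check

print(isin_1d(np.asarray([0, 3, 7, 5]), [5, 3]))   # [False  True False  True]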
Nathaniel: was motivated as a downstream consumer of array libraries in order to build array-agnostic libraries.
Nathaniel: I am interested in eventually achieving having something pip-installable to allow static and runtime type checking.
Joren: agree. Standalone library seems like the most obvious solution. If have separate from spec, would also be nice for release cadence.
Joren: have also thoughts on typing of dtypes and devices.
Ralf: seems like we have converged over the past week to have a separate thing called array-api-typing.
Nathaniel: Nick Tesser is another individual who'd likely be interested in getting involved.
Joren: dtypes are not particularly amenable to static typing atm. Currently there is not a way to statically distinguish between dtypes. Related: https://github.com/jorenham/optype/issues/25
Nathaniel: personally, I'd love it if array could be parameterized by the dtype. Could also be nice to parameterize by backend, as well. Would allow seeing how dtypes flow through a function.
Action item: get the typing repo/project going and we can iterate there.
Action item: we should make clear that the typing stubs we have now are there for Sphinx and are not intended to be explicitly used.
cumulative_prod: https://github.com/data-apis/array-api/pull/793
count_nonzero: https://github.com/data-apis/array-api/pull/803
Athan: unless folks object, will go ahead and merge within the next week or so.
Ralf: add a comment to that extent so everyone gets a ping.
Lucas: sklearn has array-api-compat as an optional dep, unlike SciPy, which vendors it.
Tim: the reason we don't currently use array-api-extra is that we don't currently need it. We just have some custom implementations/wrappers. I suggest creating an issue which articulates how to transition.
Tim: most of what goes into sklearn's _array is what is in NumPy, but not in the Array API standard. Occasionally, it is something outside of NumPy.
Lucas: could work to have extras as an optional dep, which only gets pulled in if array-api-compat installed.
Tim: I am slightly leaning toward vendoring. Currently, if array-api-compat is missing, we print a message instructing the user to install. Could do the same, but not ideal to impose other actions.
Aaron: I'd advocate for vendoring, especially for backward-compat reasons and handling breaking changes.
Ralf: agreed that vendoring is preferred.
Tim: Lucas, I suggest creating an issue.
Lucas: seems like vendoring could be applicable to array-api-compat.
Time: 10am PDT, 1pm EDT, 6pm GMT, 8pm CEST
Present: Ralf Gommers, Athan Reines, Hameer Abbasi, Evgeni Burovski, Leo Fang, Sebastian Berg, Aaron Meurer, Oleksandr Pavlyk, Tim Head,
__array__ support (Aaron)
materialize (Hameer)
__binsparse__ next steps and updates (Hameer)
Please help to edit meeting notes collaboratively during the call! And afterwards, please help ensure that key points, plans and decisions are recorded on the relevant GitHub issues.
Ralf: maybe next time, we can put on the mailing lists of SciPy, sklearn, and other interested projects.
Tim: key will be ensuring good minutes.
Aaron: next meeting is Thanksgiving.
Ralf: we should probably keep the meeting, as better than being off for a month. We should just avoid making major decisions.
Aaron: Array API strict update. Have started working on v2024; however, need to enable a flag. This is all preliminary. Assumes that functions are already implemented correctly in NumPy. The test suite still needs to be updated. Biggest gap is scalar support, as we have yet to standardize. Regardless, if you are using the strict library and want to use one of the new functions, you should have the ability to access that now, provided you opt in.
__array__ support (Aaron)
Aaron: originally, the strict library was numpy.array_api. During that time, we had added __array__, which allows NumPy to convert input objects to NumPy arrays.
Aaron: I attempted to remove from the strict library; however, this broke many things in SciPy.
Oleksandr: this has been problematic in the context of GPU libraries, especially for 0D arrays.
Sebastian: could just implement __array__ and then error, or whatever else makes sense. Want to avoid the sequence route and just error.
Hameer: what about removing __array__ and then just relying on DLPack?
Ralf: if we could implement the Buffer protocol in the strict library, then that would resolve the issue, I believe.
Sebastian: cannot do that, as would not work on many arrays. Only applicable for CPU arrays.
Aaron: strict library does have some "fake" device support. We could say that the Buffer protocol works for default devices (CPU) and then error for non-default devices.
Ralf: agree with Sebastian, as DLPack is preferable here.
Aaron: should we put something in the standard about requirements for implementing the Buffer protocol, if you can? At least in Python v3.12, you can implement the Buffer protocol in Python. While arguably not how most libraries would implement it, it should be possible.
Hameer: some libraries would not be able to implement via C API. Sparse and Dask would have difficulty.
Evgeni: what about just avoiding the Buffer protocol and only using DLPack?
Sebastian: not sure what problem we are trying to solve.
Evgeni: in SciPy, we were relying on xp.asarray for any object, including for objects from other namespaces.
Ralf: there were many places in SciPy where we intentionally needed to convert to NumPy arrays.
Sebastian: I think it is safe to add to the standard that, if on CPU, you should always implement the Buffer protocol.
Oleksandr: for non-CPU arrays, libraries such as SciPy would explicitly error, and then user would be required to make an explicit transfer to CPU.
Action item: update the standard to require Buffer protocol for array libraries supporting CPU.
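A toy sketch of what exposing the buffer protocol from pure Python (3.12+) can look like for a NumPy-backed wrapper; this is illustrative, not something the standard prescribes:

import numpy as np

class WrappedArray:
    def __init__(self, data):
        self._data = np.asarray(data)

    def __buffer__(self, flags):
        # Python 3.12+: expose the underlying memory via the buffer protocol
        return memoryview(self._data)

x = WrappedArray([1.0, 2.0, 3.0])
print(np.asarray(x))   # NumPy can now consume the wrapper via the buffer protocol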
Leo: NetworkX approach is currently being discussed elsewhere, and I wonder if there is an opportunity to standardize something in this area.
Ralf: I believe there is a fourth option here. The above assumes that SciPy/sklearn is doing things in Python, but this may not be true.
Oleksandr: my understanding for NetworkX is that a user can say, for this task, I have an implementation for the GPU, so make a transfer to the GPU, perform computation, and then transfer back.
Ralf: at least in SciPy, if we get an array in, and if there is a matching function, then delegate to that. E.g., if CuPy array in, introspect, and then explicitly import CuPy and then call the corresponding function in CuPy.
Ralf: but another approach is nanobind, which allows supporting Torch, NumPy, and other array libraries at the same time.
Ralf: You can also write Cython memory view code that works with multiple CPU libraries.
Leo: https://nanobind.readthedocs.io/en/latest/api_extra.html#n-dimensional-array-type
Leo: Seems like a similar idea to array-api-{compat,strict}, or what we have in cuda.core.StridedMemoryView.
Oleksandr: but these work the same way in that the computation happens on the same device as the device of the array that came in. But there is another use case here where you want multi-device support.
Leo: returning to the original question: are we at a stage where we can make explicit recommendations for people to consider for their projects?
Sebastian: I think this topic doesn’t fit in a single meeting. :)
Ralf: I am not sure, as this is still early days.
Leo: right, but then we can at least say that here are 5 or so approaches, but here is why you should first try Array API support.
Ralf: correct, and we rank according to complexity.
Tim: would be good to connect with the Scientific Python community on this, as there is interest and some discussions happening.
Ralf: I think it is a matter of writing a good overview. Something along the lines of here are the 4-5 conceptual approaches, here are a few concrete implementations, and provide some guidance on the pitfalls, shortcomings, and strengths of each approach.
Action item: write an overview.
Leo: proving out some of these approaches could be a good project for an intern.
materialize (Hameer)
For some libraries (e.g., sparse), hints are absolutely necessary, and automatic breaks can be sub-optimal.
Hameer: as an alternative to a materialize API, could we standardize around a compile decorator which is a no-op for those libraries not supporting compilation?
Ralf: this is likely a non-starter in libraries such as Jax and Torch, as they are not going to change.
(postponed until next meeting)
__binsparse__ next steps and updates (Hameer)
__binsparse_format__/.format? To me (Hameer), the answer is yes, similar to __dlpack_device__/.device.
(postponed until next meeting)
cumulative_prod: https://github.com/data-apis/array-api/pull/793
count_nonzero: https://github.com/data-apis/array-api/pull/803
(postponed until next meeting)