# Draft array-api sparse issue

Hi all,

Continuing from this issue on scipy:

* https://github.com/scipy/scipy/issues/18915

I would like to start a discussion about array-api support for sparse arrays and where the existing issues are. Currently, `scipy.sparse` has regained some momentum around the development of the `array`-like classes. I would like some level of `array-api` support to be part of this, as I'd really like to have better support for sparse arrays in libraries like `xarray`, `dask`, and `scikit-learn`.

I would be very keen to hear input from authors of sparse array implementations on what their appetite is for adopting the array API and where they see issues.

* `cupyx.scipy.sparse`:
* `pytorch`:
* `pydata/sparse`: @hameerabbasi
* `jax.sparse`: @jakevdp
* `python-graphblas`: @eriknw, @jim22k

I think a good potential outcome here would be a `sparse` extension to the array-api which addresses some of the issues I've included below:

## Potential issues

### `dlpack` based interchange

Since sparse arrays are meant to be an efficient encoding of matrices with large numbers of zeros, it does not necessarily make sense to materialize all those missing values as a default interchange mechanism. Surely we can achieve more efficient interchange of sparse matrices between devices.

I would like to plug https://github.com/graphBLAS/binsparse-specification here, an effort I'm contributing to for establishing a binary interchange format for sparse tensors.

### Multiple sparse types

Sparse array libraries typically provide multiple formats of sparse arrays (e.g. CSC, CSR, COO). This is unlike the currently supported array-api libraries, which each have only one array class. More complicated still are the return types of each operation, especially with lazy APIs where output formats may depend on optimizations.

### Dense types

Even more complicated, sparse libraries also have associated dense array types.
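As a concrete sketch of this pairing (using `scipy.sparse`; other libraries pair with their own dense class), the dense type shows up both when constructing a sparse array and when densifying it back:

```python
import numpy as np
from scipy import sparse

dense = np.array([[1, 0, 2],
                  [0, 0, 3]])

# Construction accepts the paired dense type (np.ndarray here)...
csr = sparse.csr_array(dense)

# ...and densification returns that same dense type.
back = csr.toarray()
assert isinstance(back, np.ndarray)
assert np.array_equal(back, dense)
```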
For example:

| Library | Dense class |
| ----------- | ----------- |
| `scipy.sparse` | `np.ndarray` |
| `pydata/sparse` | `np.ndarray` |
| `cupyx` | `cupy.ndarray` |
| `jax.experimental.sparse` | `jax.Array` |
| `torch` | `torch.Tensor` |

Notably, `torch`'s sparse arrays are also `torch.Tensor`s. These dense types are generally supported for construction, and sometimes as the result of a reduction (e.g. `csr_array.sum(axis=0) -> ndarray`). This somewhat breaks the "not dealing with multiple array types" principle, but it is a very controlled form of interoperation.

### Broadcasting with null values

Sparse libraries often don't play very well with null values. The optimization of skipping the stored zeros often means that null values (like `NaN`) fail to propagate into the implicit zeros, giving results that differ from the dense equivalent:

<details>
<summary> Example </summary>

```python
from scipy import sparse
import numpy as np

coo = sparse.coo_array(([1, 1, 1, 1, 1], ([0, 0, 1, 2, 2], [0, 1, 1, 0, 2])))
coo.toarray()
# array([[1, 1, 0],
#        [0, 1, 0],
#        [1, 0, 1]])

(coo * np.array([np.nan, np.nan, 2.])).toarray()
# array([[nan, nan,  0.],
#        [ 0., nan,  0.],
#        [nan,  0.,  2.]])
```

</details>

### nD support

While 1d support may be reasonable, nD support, especially for non-COO formats, is likely out of scope for this library. See also:

* https://github.com/scipy/scipy/pull/18530

This means that specific concatenation, indexing, and reshaping operations may not work. Arguably, the reshape operation may not even make sense. CSR and CSC classes are only 2d. Some libraries, like python-graphblas, only support 1d and 2d structures.
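To illustrate the 2d restriction above (a sketch against current `scipy.sparse`; the exact exception type may vary between versions), constructing a CSR array from a 3d input is rejected outright:

```python
import numpy as np
from scipy import sparse

# CSR is a 2d-only format: a 3d dense input cannot be converted.
raised = False
try:
    sparse.csr_array(np.zeros((2, 2, 2)))
except (TypeError, ValueError):
    raised = True
assert raised
```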