Developer Summit 2023 report: `scipy.sparse`

Date: May 26, 2023
Contributors: CJ, Julien, Isaac, Dan, Ross, Stéfan, Levi, Sebastian

I've locked edit access for now. If you have changes or suggestions, let me know. CJ Carey

Context

scipy.sparse provides a widely-used set of types for working with sparse data, which mimic the numpy.matrix API. As the community continues to move away from matrix semantics, we want to provide a sparse array-like API that follows modern conventions.

This involves a lot of work, including some necessary churn for end-users, but it also provides a unique opportunity to re-evaluate many design choices that have accumulated over the last 20+ years of SciPy development.

What we accomplished

Reorganized class hierarchies to allow easy isinstance checking.
- isinstance(..., sparray) now selects only sparse arrays
- isinstance(..., spmatrix) now selects only sparse matrices
- A new private base class (_spbase) underpins all sparse container types.
Streamlined the behavior of the isspmatrix and isspmatrix_<fmt> helper functions to better reflect their names (see gh-18528).
- isspmatrix* now return True only for sparse matrices, not for sparse arrays.
- issparse is the recommended function to check for either array or matrix sparse containers.
- As part of this cleanup, internal usage of isspmatrix and the associated isspmatrix_<fmt> functions was replaced with issparse and isinstance checks, as appropriate.
Updated the sparray interface, leaving spmatrix untouched for backward compatibility:
- Deprecated several methods in sparse array classes: asfptype, getrow, getcol, getnnz, getformat.
- Deprecated the .H and .A attributes in sparse array classes.
- Ensured that sparray doesn't downcast index arrays from 64-bit to 32-bit.
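For code that relied on the deprecated spellings, the equivalents below are one way to migrate. This is a sketch: the mapping (for example `A[[i], :]` in place of `getrow(i)`, or an explicit `astype` in place of `asfptype`) is our suggestion, not an official replacement table.

```python
import numpy as np
from scipy import sparse

A = sparse.csr_array(np.array([[1 + 2j, 0], [0, 3]]))

dense = A.toarray()   # instead of A.A
herm = A.conj().T     # instead of A.H (conjugate transpose)
nnz = A.nnz           # instead of A.getnnz()
fmt = A.format        # instead of A.getformat()
row0 = A[[0], :]      # instead of A.getrow(0); list index keeps a 2-D (1, n) shape
col1 = A[:, [1]]      # instead of A.getcol(1); list index keeps a 2-D (m, 1) shape

B = sparse.csr_array(np.array([[1, 0], [0, 2]]))
B_float = B.astype(np.float64)  # instead of B.asfptype(); target dtype chosen explicitly
```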
Came to a decision about creation functions for sparse arrays.
- We identified three approaches:
  - our choice: Define new creation functions with distinct names in the existing scipy.sparse namespace. For example, scipy.sparse.diags_array() will act like scipy.sparse.diags() but return a sparse array instead of a sparse matrix.
  - Define a new scipy.sparse.array namespace. This would complicate the process of incrementally updating dependent library code, as upgrades would need to be performed in an all-or-nothing fashion.
  - Add an array=None keyword argument to each existing creation function. This would make it hard to change other arguments and clean up the API going forward.
- This entails two updates for users over the transition period:
  - switching call-sites to use the new functions
  - (eventually) switching the new functions back to their shorter names, once the original functions are gone. This may need to coincide with a major version bump.
Made progress toward 1D sparse arrays
- Generalized isshape and check_shape to optionally handle non-2d shapes.
- Started implementing n-dimensional coo_array support (see gh-18530).
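To make the n-dimensional COO idea concrete, here is a NumPy-only sketch of the storage scheme: one integer coordinate array per axis plus a data array. The helper coo_nd_to_dense is hypothetical and is not the code in gh-18530; it only shows how the same triple generalizes from 1-D to 3-D.

```python
import numpy as np

def coo_nd_to_dense(coords, data, shape):
    """Hypothetical helper: densify an n-D COO triple (coords, data, shape).

    coords is a tuple with one integer array per axis, each of length nnz.
    Duplicate coordinates are summed, matching scipy.sparse COO conventions.
    """
    if len(coords) != len(shape):
        raise ValueError("need one coordinate array per axis")
    out = np.zeros(shape, dtype=np.asarray(data).dtype)
    # np.add.at accumulates repeated indices instead of overwriting them.
    np.add.at(out, tuple(np.asarray(c) for c in coords), data)
    return out

# 1-D: a length-5 vector with nonzeros at positions 1 and 4.
v = coo_nd_to_dense((np.array([1, 4]),), np.array([10.0, 20.0]), (5,))

# 3-D: the entry at (0, 1, 2) appears twice and is summed.
t = coo_nd_to_dense(
    (np.array([0, 0, 1]), np.array([1, 1, 0]), np.array([2, 2, 3])),
    np.array([1.0, 2.0, 5.0]),
    (2, 2, 4),
)
```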
Explored feasibility and usefulness of defining __array_ufunc__ and other __array_*__ protocols for sparse arrays
- See this proof of concept branch that includes some workarounds.
- This experiment with a direct ufunc fallback is slower than the current specialized implementations, but it could reduce the complexity of our templated code.
- These efforts might also solve a long-standing issue related to build size/time of the sparsetools shared library. We sometimes get requests to support more dtypes (see gh-7408) but our current approach requires generating separate C++ functions covering the cross-product of all index and value types.
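The simplest form of a direct ufunc fallback is to apply a NumPy ufunc to the stored values only, which works for any dtype NumPy supports without a per-dtype C++ kernel. The helper apply_unary_ufunc below is hypothetical and is not the proof-of-concept branch's code; it is a sketch that only handles zero-preserving unary ufuncs on CSR, CSC and COO containers.

```python
import numpy as np
from scipy import sparse

def apply_unary_ufunc(A, ufunc):
    """Hypothetical helper: apply a unary NumPy ufunc to a CSR/CSC/COO container.

    Only the stored values are touched, so the result stays sparse. This is valid
    only when ufunc(0) == 0; otherwise every implicit zero would change as well.
    """
    if A.format not in ("csr", "csc", "coo"):
        raise TypeError("this sketch only handles csr, csc and coo")
    with np.errstate(all="ignore"):
        image_of_zero = ufunc(np.zeros(1, dtype=A.dtype))
    if image_of_zero[0] != 0:
        raise ValueError(f"{ufunc.__name__}(0) != 0, so the result would be dense")
    B = A.copy()
    B.sum_duplicates()      # combine duplicate entries before a nonlinear ufunc
    B.data = ufunc(B.data)  # elementwise on stored values; NumPy handles the dtype
    return B

A = sparse.csr_array(np.array([[0.0, 1.0], [2.0, 0.0]]))
S = apply_unary_ufunc(A, np.sin)  # fine: sin(0) == 0
```

Calling it with np.cos raises ValueError, since cos(0) == 1 would fill every implicit zero.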
Stopped performing slow O(nnz) checks for downcasting (gh-18509)
- We used to iterate through all the index arrays to see whether their values could fit into a smaller bit width. This slow check has been removed.
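The cost difference can be illustrated with two hypothetical helpers. The first mirrors the kind of content scan that was removed: it reads every index value, so it is O(nnz). The second is an O(1) bound computed only from shape and nnz; it is included as an assumption for contrast and is not necessarily the rule SciPy applies.

```python
import numpy as np

INT32_MAX = np.iinfo(np.int32).max

def index_dtype_by_contents(*index_arrays):
    """Hypothetical helper: O(nnz) check that scans every stored index value."""
    maxval = max((int(a.max()) for a in index_arrays if a.size), default=0)
    return np.int32 if maxval <= INT32_MAX else np.int64

def index_dtype_by_metadata(shape, nnz):
    """Hypothetical helper: O(1) upper bound from shape and nnz only.

    For CSR/CSC, column (row) indices are below the matching dimension and
    indptr entries never exceed nnz, so max(shape) and nnz bound all values.
    """
    maxval = max(max(shape), nnz)
    return np.int32 if maxval <= INT32_MAX else np.int64
```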
Improved documentation for sparse arrays
- Added new module-level documentation (gh-18516)
- Documented the canonical formats (gh-18539)
Added missing benchmarks for matrix power (gh-18553)
Triaged older issues from the backlog
- Fixed gh-15177: element-wise division densifies
- Fixed gh-16929: argmin/argmax return the wrong values
- Fixed gh-18494: minimum spanning tree ordering with np.argsort(..., kind='stable')
- Partially addressed gh-16774: int64 index arrays by default

What remains to be done

1-d sparse array support
- Construct generic tests for 1d-array semantics.
- Continue iterating on gh-18530: Generalize coo_array to support n-dimensional shapes.
- Explore using a sparse array class that wraps a dense array (dps_array) to enable testing and prototyping, first in 2-D (gh-18514) and then in 1-D.
- Start looking at other sparse formats that might be a good fit for 1d array support.
Decide whether SciPy wants to implement n-dimensional arrays (for n > 3)
- Potentially relevant formats: CSF and GCXS
Continue improving sparse creation functions
- Fix gh-18555: Improve validation for scipy.sparse.eye and related functions.
- Add sparse array creation functions for eye, random, and others.
- Deprecate some of the old sparse matrix creation functions. Candidates include spdiags, rand and identity.
Deprecate more matrix-specific functionality
- Deprecate the isspmatrix_<fmt> functions now that the class hierarchy for sparray and spmatrix has been adjusted. To check for a particular format, users can write x.format == 'coo'.
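A format check in the suggested style that works for sparse arrays and sparse matrices alike. The helper name is_coo is hypothetical; it simply combines issparse with the format attribute.

```python
import numpy as np
from scipy import sparse

def is_coo(x):
    """Hypothetical helper: True for any COO container, array or matrix."""
    return sparse.issparse(x) and x.format == "coo"

print(is_coo(sparse.coo_array(np.eye(2))))   # True
print(is_coo(sparse.coo_matrix(np.eye(2))))  # True
print(is_coo(sparse.csr_array(np.eye(2))))   # False
print(is_coo(np.eye(2)))                     # False (dense input)
```

Checking issparse first avoids an AttributeError on inputs, such as plain NumPy arrays, that have no format attribute.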
General performance improvements
- Address gh-18546: Fast path for sparse array "canonical format"
- MAINT Remove redundant checks in sparse arrays and sparse matrices
Adapting scikit-learn to support sparse arrays (to be discussed with scikit-learn's maintainers).
