Sparse summit follow up meeting

---
tags: scipy, sparse, scientific-python-summit
---

# Monday April 29, 2024

Video call link: https://meet.google.com/jrp-qgcf-nxh
If not working: https://colgate.zoom.us/j/276854695

## Attendees: CJ Carey, Dan Schult, Ilan Gold, Stéfan van der Walt

## News
- [Scipy Release schedule for 1.14](https://discuss.scientific-python.org/t/proposed-release-schedule-for-scipy-1-14-0/1151)
    - This has branch and rc1 before Scientific Python Summit. Should we request a delay till June 10? (Summit over and Tyler back from PTO)

## Updates:
- 1D PRs:
    - `csr-1d` [PR 19833](https://github.com/scipy/scipy/pull/19833) minor merge conflicts, CJ to do final review
    - `index-1d` [PR 20120](https://github.com/scipy/scipy/pull/20120) wait until csr-1d goes in
- Other PRs
    - parallel sparse [PR 19717](https://github.com/scipy/scipy/pull/19717)
    - single pass CSR binops [PR 19765](https://github.com/scipy/scipy/pull/19765)
    - singleton review from CJ [PR 20490](https://github.com/scipy/scipy/pull/20490)
        - Try adding the validation in `_spbase.__init__()`, using `isinstance(self, sparray)`. This would avoid another hop in the `__init__` chain, and hopefully avoid order dependence for multiple inheritance.
- Issues needing PRs
    - [issue 20169](https://github.com/scipy/scipy/issues/20169) CSR/CSC Division Yields COO
        - is [name=Isaac]'s suggestion (return CSR instead of COO) okay? Yes
        - should we have a rule for when the returned format can change?
            - switch when it's faster
            - try to only switch to more feature-rich formats
    - [issue 20377](https://github.com/scipy/scipy/issues/20377) Fix the `__str__` method for 1d sparse arrays (Also see April 1 discussion)
    - [issue 20378](https://github.com/scipy/scipy/issues/20378) Fix the `__repr__` method for all sparse arrays (Also see April 1 discussion)

## Discussion
- 1d return class [issue 20128](https://github.com/scipy/scipy/issues/20128)
    - we have a PR to fix this: [PR 20490](https://github.com/scipy/scipy/pull/20490)
- What is needed before Sparse Arrays are deemed feature complete -- so we can announce to folks that they should switch?
    - 1D arrays need CSR
    - indexing
    - docs and tools for changing spmatrix code to sparray (use ruff rules? probably not sophisticated enough. Lean on deprecation warnings and migration guide with list of gotchas. Topic for Summit.)
    - returning 1D slices (restrict to cases where we are currently returning a 2D version of a row or column)
- What other big picture changes are on the horizon? (ramp up to summit)
    - make test_base more effective:
        - split test_base into tests that need to change by dtype vs those that don't (but currently run on every dtype anyway)
        - other ideas
    - sparsetools build size and time:
        - check for speedup over similar/last 5yrs numpy tools? If there is no speedup, use numpy.
    - nD arrays
        - DOK and COO.
        - Generalized CSR: mark axes as compressed or not. ravel each set of axes to make a 2D equivalent (generalized CS). Audit how methods would apply with this storage.
    - audit of functions that use canonical format or could be changed to take advantage of it. Looks like having classes enforce it is hard to handle. Perhaps have functions use parameters to skip sorting, etc., with good docs to let the user enforce canonical format and choose to skip checks.
    - Map of features:
        - Make it easier to contribute (docs for developers)
        - which operations cause excess time/conversions/etc
        - which use sparsetools?
        - which allow/use which dtypes?
    - code cleanup for after we remove spmatrix

---------------------

# Monday April 15, 2024

video call link: https://meet.google.com/jrp-qgcf-nxh
If not working: https://colgate.zoom.us/j/276854695

## Attendees: CJ, Dan, Ilan, Isaac

## Updates:
- 1D PRs:
    - `csr-1d` [PR 19833](https://github.com/scipy/scipy/pull/19833) minor merge conflicts, CJ to do final review
    - `index-1d` [PR 20120](https://github.com/scipy/scipy/pull/20120) wait until csr-1d goes in
- Other PRs
    - DIA * scalar fix [PR 20445](https://github.com/scipy/scipy/pull/20445) is merged :tada:
    - parallel sparse [PR 19717](https://github.com/scipy/scipy/pull/19717)
    - single pass CSR binops [PR 19765](https://github.com/scipy/scipy/pull/19765)
- CSR/CSC Division Yields COO [issue 20169](https://github.com/scipy/scipy/issues/20169)
    - is [name=Isaac]'s suggestion (return CSR instead of COO) okay? Yes
    - should we have a rule for when the returned format can change?
        - switch when it's faster
        - try to only switch to more feature-rich formats
    - in need of someone to take this on

## Discussion
- [PR 20322](https://github.com/scipy/scipy/pull/20322) (addon to [20175](https://github.com/scipy/scipy/pull/20175) which added dok.pop()) This makes the `default` arg match `dict.pop` in case of no such key: it raises a KeyError if `default` is not specified, and a TypeError if `default` is provided as a keyword argument instead of a positional parameter.
    - Merged! :tada:
- [PR 20444](https://github.com/scipy/scipy/pull/20444) 1D input in matrix & array creates a 2D row. We've talked about that before (Jan 8) here. The notes say we decided that we want 1D input to create 1D arrays, but not how to handle other cases.
    - breaking change is needed
    - Should we raise ValueError for arrays that don't support 1D?
        - [name=CJ] yes
    - Should we add FutureWarning for matrices?
      (could just let matrices create 2D, but a FutureWarning tells users to switch now -- making the switch to arrays easier later)
        - [name=CJ] no, let's wait until we deprecate spmatrix
        - current behavior matches np.matrix, so it's okay
- sparsetools uses index array dtype for shape scalars [PR 20311](https://github.com/scipy/scipy/issues/20311)
    ```c
    void coo_tocsr(const I n_row,  // <---
                   const I n_col,  // <---
                   const I nnz,
                   const I Ai[],
                   const I Aj[],
    ```
    - Fix: replace `I` with `int64_t` for the first two args
    - Hard part: make usable regression tests for this
    - Could maybe use broadcasting tricks to avoid large allocation
- 1d return class [issue 20128](https://github.com/scipy/scipy/issues/20128)
    - repro: `csr_array(0)` used to return a 1x1 CSR
    - until the csr-1d PR is merged, we should return a 1x1
    - after, do we return a shape (1,) array?
    - broken in 1.13, could be backported if desired
    - Option 1:
        - always return 1x1 to match spmatrix + old behavior
        - pro: consistent with old behavior
        - pro: makes array API tests run more easily
        - con: not logically consistent with some formats
    - Option 2:
        - always raise ValueError (for all sparse arrays)
        - pro: consistent across all formats
        - con: makes array API testing more painful
    - Consensus: option 2 (with a readable error message)

------------------------

# Monday April 1, 2024

video call link: https://meet.google.com/jrp-qgcf-nxh
If not working: https://colgate.zoom.us/j/276854695

## Attendees: Dan, CJ, Isaac

## Updates:
- 1D PRs:
    - `csr-1d` [PR 19833](https://github.com/scipy/scipy/pull/19833)
        - Needs CJ's review, should be ready to go
    - `index-1d` [PR 20120](https://github.com/scipy/scipy/pull/20120)
        - Computes desired shape while handling index. Handles `newaxis` now.
        - Raises if >2D is requested.
        - 0D indexing results -> numpy 0D
        - 1D results stick with format unless not available (lil format indexing), then use coo.
- Other PRs
    - parallel sparse [PR 19717](https://github.com/scipy/scipy/pull/19717)
        - Waiting on author, but maybe needs encouragement?
        - Action: CJ will send a friendly nudge
    - single pass CSR binops [PR 19765](https://github.com/scipy/scipy/pull/19765)
        - [name=CJ] TODO: take a closer look, seems complete
    - coalesce binops [PR 19766](https://github.com/scipy/scipy/pull/19766)
        - Cuts sparsetools.so in half, but incurs ~0 to ~30% performance hit
        - Action: Respond on the PR with consensus that it's probably too much of a performance hit
- CSR/CSC Division Yields COO [issue 20169](https://github.com/scipy/scipy/issues/20169)
    - is [name=Isaac]'s suggestion okay? Yes
    - should we have a rule for when the returned format can change?
        - switch when it's faster
        - try to only switch to more feature-rich formats
- sparsetools uses index array dtype for shape scalars [PR 20311](https://github.com/scipy/scipy/issues/20311)
    ```
    void coo_tocsr(const I n_row, const I n_col, const I nnz, const I Ai[], const I Aj[],
    ```

## Discussion
- The next summit is coming up (in June)!
- Should we return numpy scalars or 0d-arrays?
    - What about sparse constructors? `csr_array(4)`
    - Bump this to next week when Ilan is here.
- Path forward for [`scipy.sparse.csr_matrix(scalar)`](https://github.com/scipy/scipy/issues/20128)
    - technically a regression, and something of a blocker for array-api compat for creation: [the specification](https://data-apis.org/array-api/latest/API_specification/generated/array_api.asarray.html#array_api.asarray) has specific wording about this: "May be a Python scalar, a (possibly nested) sequence of Python scalars"
    - Maybe solved by 1D arrays?
    - Return a 0D ndarray?
    - DECISION: Return scalar or 0d array in core (what is the difference?) and then check in the array api for the numpy type before returning 1x1
- 0D arrays? Should these be "sparse" arrays (reshaping creates sparse)? Or is numpy array OK?
    - DECISION: use numpy 0D arrays and keep our eyes out for cases where that might not work (where sparse is needed)
- `str(A)` issues:
    - returns "" for zero array. Confusing!
        - maybe include the shape or format in `__str__`? Then would never be nothing. Could also reuse `repr` at the top.
        - or print `(...) 0` to say that everything is zero
        - or wrap the coords/values in brackets of some kind
    - broken for 1d arrays, because it prints 2d indices (with zeros)
    - what if we just delegate to `__repr__` (and delete `__str__`)?
- `repr(A)` returns `<8 sparse array of type ('<class 'int64'>') with 3 stored elements in Compressed Sparse Row format>`.
    - shape at start, class next, format at end. Propose we switch format and shape?
        - `<Compressed Sparse Row sparse array of type ('<class 'int64'>') with 3 stored elements and shape (8,)>`
    - comes down to impact of change on users. wording is better but at what cost?
    - `<sparse array of type int64 in CSR format with 3 stored elements and shape (8,)>`
    - See Dask: `dask.array<ones_like, shape=(3,), dtype=float64, chunksize=(3,), chunktype=numpy.ndarray>`
        - `csr_array<shape=(8,), dtype=int64, nnz=3>`
        - `csr_array<shape=(8,), dtype=int64, 3 stored values>`
        - `dia_array<shape=(8, 4), dtype=float32, 3 stored values (2 diagonals)>`
    - Should include things like `has_canonical_format`?
    - Or we could special-case the 1d repr shape: `<length-3 sparse array ...>`

## Issues to file
- Fix the `__str__` method for 1d sparse arrays
- Fix the `__repr__` method for all sparse arrays

# Monday March 18, 2024

If not working, use this one: https://colgate.zoom.us/j/276854695

## Attendees: Thad Bindas, Dan Schult, Isaac Virshup, Ilan Gold

## Updates:
- sklearn is now entirely tested against sparse arrays!! :tada: [PR 27090](https://github.com/scikit-learn/scikit-learn/pull/27090)
- `dok-1d` [PR 19715](https://github.com/scipy/scipy/pull/19715) is now merged!
  :tada:
- Roadmap sparse update [PR 19941](https://github.com/scipy/scipy/pull/19941) now merged :tada:
- setdiag performance [Issue 19943](https://github.com/scipy/scipy/issues/19943) with [PR 19962](https://github.com/scipy/scipy/pull/19962). Now merged :tada:
- 1D PRs:
    - `csr-1d` [PR 19833](https://github.com/scipy/scipy/pull/19833)
        - Action: CJ needs to re-review
    - `index-1d` [PR 20120](https://github.com/scipy/scipy/pull/20120)
        - Now computes desired shape while handling index. Handles `None` now.
        - Raises if >2D is requested.
        - 0D results -> numpy scalars
        - 1D results stick with format unless not available; in that case, use coo
- Other PRs
    - parallel sparse [PR 19717](https://github.com/scipy/scipy/pull/19717)
        - Waiting on author, but maybe needs encouragement?
        - Action: CJ will send a friendly nudge
    - single pass CSR binops [PR 19765](https://github.com/scipy/scipy/pull/19765)
        - [name=CJ] TODO: take a closer look, seems complete
    - coalesce binops [PR 19766](https://github.com/scipy/scipy/pull/19766)
        - Cuts sparsetools.so in half, but incurs ~0 to ~30% performance hit
        - Action: Respond on the PR with consensus that it's probably too much of a performance hit
    - test/add all dict methods to DOK [PR 20175](https://github.com/scipy/scipy/pull/20175)
        - `pop`, `__reversed__`, `__or__`, `__ror__`, `__ior__`
        - Note that `__or__` has different semantics for dict vs arrays
        - Decision: focus on `pop` :popcorn:
        - General rule: let's follow the Array API when possible, and then the Dict API otherwise
- CSR/CSC Division Yields COO [issue 20169](https://github.com/scipy/scipy/issues/20169)
    - is [name=Isaac]'s suggestion okay? Yes
    - should we have a rule for when the returned format can change?
        - switch when it's faster
        - try to only switch to more feature-rich formats
- minor typos in sparse code [PR 20274](https://github.com/scipy/scipy/pull/20274)
    - axis 2 -> -2
    - `_get_index_dtype(maxval=(nnz, N)))` -> `nnz,M,N`
    - `__rmul__` uses `*args, **kwargs` which differs from `__mul__`. mypy doesn't like it.

## Discussion
- Path forward for [`scipy.sparse.csr_matrix(scalar)`](https://github.com/scipy/scipy/issues/20128)
    - technically a regression, and something of a blocker for array-api compat for creation: [the specification](https://data-apis.org/array-api/latest/API_specification/generated/array_api.asarray.html#array_api.asarray) has specific wording about this: "May be a Python scalar, a (possibly nested) sequence of Python scalars"
    - Maybe solved by 1D arrays?
    - Return a 0D ndarray?
    - DECISION: Return scalar or 0d array in core (what is the difference?) and then check in the array api for the numpy type before returning 1x1
- 0D arrays? Should these be "sparse" arrays (reshaping creates sparse)? Or is numpy array OK?
    - DECISION: use numpy 0D arrays and keep our eyes out for cases where that might not work (where sparse is needed)
- `str(A)` returns "" for zero array. Confusing. Replace with `(...) 0`?
    - maybe include the shape or format in `__str__`? Then would never be nothing.
    - the proposed `(...) 0` is also good.
- `repr(A)` returns `<8 sparse array of type ('<class 'int64'>') with 3 stored elements in Compressed Sparse Row format>`.
    - shape at start, class next, format at end. Propose we switch format and shape?
        - `<Compressed Sparse Row sparse array of type ('<class 'int64'>') with 3 stored elements and shape (8,)>`
    - comes down to impact of change on users. wording is better but at what cost?

# Monday March 4, 2024

video call link: https://meet.google.com/jrp-qgcf-nxh
If not working: https://colgate.zoom.us/j/276854695

## Attendees: CJ Carey, Thad Bindas, Dan Schult, Isaac Virshup, Stefan van der Walt

## Updates
- Timing for Numpy 2.0? For Scipy 1.13?
    - If numpy gets bumped out another month and scipy a month after that, will 1.14 be delayed well beyond June? If so, maybe we want the deprecations to go into 1.13.
    - Action: let's ask the "deprecation squad".
- 1D PRs:
    - `dok-1d` [PR 19715](https://github.com/scipy/scipy/pull/19715) is now merged! :tada:
    - `csr-1d` [PR 19833](https://github.com/scipy/scipy/pull/19833)
        - Action: CJ needs to re-review, Dan can resolve conflicts
    - `index-1d` [PR 20120](https://github.com/scipy/scipy/pull/20120)
        - Handle `None` in the index now? What about other fancy indexing methods that could build >2D results?
            - [name=CJ]: Let's try to accept anything so long as it produces a (0,1,2)-d result
            - Would be useful to "symbolically evaluate" the indexing arguments to check the result shape (so long as we don't accidentally allocate huge temporary arrays).
            - Action: ping seberg or other numpy folks
        - 0D results -> numpy scalars
        - 1D results stick with format unless not available; in that case, use coo
- Other PRs
    - Roadmap sparse update [PR 19941](https://github.com/scipy/scipy/pull/19941)
        - Action: [name=CJ] will re-read and approve
    - setdiag performance [Issue 19943](https://github.com/scipy/scipy/issues/19943) with [PR 19962](https://github.com/scipy/scipy/pull/19962).
        - Action: Dan will resolve conflicts, and then CJ can solo-merge.
    - parallel sparse [PR 19717](https://github.com/scipy/scipy/pull/19717)
        - Waiting on author, but maybe needs encouragement?
        - Action: CJ will send a friendly nudge
    - single pass CSR binops [PR 19765](https://github.com/scipy/scipy/pull/19765)
        - [name=CJ] TODO: take a closer look, seems complete
    - coalesce binops [PR 19766](https://github.com/scipy/scipy/pull/19766)
        - Cuts sparsetools.so in half, but incurs ~0 to ~30% performance hit
        - Action: Respond on the PR with consensus that it's probably too much of a performance hit
    - add fromkeys to DOK. Merged :tada:
    - test/add all dict methods to DOK [PR 20175](https://github.com/scipy/scipy/pull/20175)
        - `pop`, `__reversed__`, `__or__`, `__ror__`, `__ior__`
        - Note that `__or__` has different semantics for dict vs arrays
        - Decision: focus on `pop` :popcorn:
        - General rule: let's follow the Array API when possible, and then the Dict API otherwise
- CSR/CSC Division Yields COO [issue 20169](https://github.com/scipy/scipy/issues/20169)
    - is [name=Isaac]'s suggestion okay? Yes
    - should we have a rule for when the returned format can change?
        - switch when it's faster
        - try to only switch to more feature-rich formats

## Discussion
1) `resize()`: Numpy handles this via ravel/unravel -- differently from scipy.sparse, which clips the shape. Array-api doesn't have it at all. All features (except in-place nature) should be available via indexing and reshape.
   - What is our goal for this method? Should we remove it from sparrays?
   - [name=CJ]: I think it's worth keeping (with distinct semantics from numpy)
2) CSR @ CSC sparsetools for wide array @ tall array (e.g. 1d@1d) and large number of values summed (e.g. `A@A.T`)? One problem with CSR @ CSR is that CSR for a column requires M+1 entries in `indptr`. An algorithm for CSR@CSC has complexity: `O(n_row_A*n_col_B*K)` where K is the maximum nnz in a row of A and column of B. Quadratic in resulting matrix size, linear in K; so good for wide @ tall matrices. Compare to CSR@CSR complexity of `O(n_row_A*K^2 + max(n_row_A,n_col_B))` which is linear in result shape but quadratic in K.
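
The CSR @ CSC idea in item 2 can be sketched in pure Python. This is only an illustration under stated assumptions: `csr_matmul_csc` is a hypothetical name (not a sparsetools routine), indices are assumed sorted and duplicate-free within each row of A and each column of B, and the result is returned as a plain dict rather than a sparse array. Each (row, column) pair costs one merge of two sorted index lists, which is where the `O(n_row_A*n_col_B*K)` bound comes from.

```python
def csr_matmul_csc(a_indptr, a_indices, a_data,
                   b_indptr, b_indices, b_data):
    """Multiply CSR matrix A by CSC matrix B; return {(i, j): value}.

    Hypothetical helper for illustration only. Assumes sorted,
    duplicate-free indices in every row of A and every column of B.
    """
    n_row_a = len(a_indptr) - 1
    n_col_b = len(b_indptr) - 1
    result = {}
    for i in range(n_row_a):
        a_start, a_end = a_indptr[i], a_indptr[i + 1]
        for j in range(n_col_b):
            p, q = a_start, b_indptr[j]
            q_end = b_indptr[j + 1]
            total = 0
            hit = False
            # Merge the sorted column indices of row i of A with the
            # sorted row indices of column j of B.
            while p < a_end and q < q_end:
                if a_indices[p] == b_indices[q]:
                    total += a_data[p] * b_data[q]
                    hit = True
                    p += 1
                    q += 1
                elif a_indices[p] < b_indices[q]:
                    p += 1
                else:
                    q += 1
            if hit:
                result[(i, j)] = total
    return result


# Wide (1 x 4) times tall (4 x 1), i.e. the 1d @ 1d case.
# A = [[1, 0, 2, 3]] stored as CSR, B = [[4], [5], [0], [6]] stored as CSC.
product = csr_matmul_csc([0, 3], [0, 2, 3], [1, 2, 3],
                         [0, 3], [0, 1, 3], [4, 5, 6])
```

Note that both operands keep a two-entry `indptr` here, which is the storage advantage over holding the tall operand as CSR.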
   - would be cool to have a dispatcher that looks at the shapes (and maybe densities?)
   - Ideally users could choose to call these explicitly
3) Where should we be validating indices dtypes? [name=Isaac] running into cases where construction doesn't validate, then we get C++ errors from indexing. Shouldn't this just happen at index time?
   - See https://github.com/scipy/scipy/issues/20182 https://github.com/scipy/scipy/pull/20183
   - [name=CJ] I don't love it, but it's okay.

# Monday February 19, 2024

video call link: https://meet.google.com/jrp-qgcf-nxh
Fallback: https://framatalk.org/sp-sparse-arrays
Next fallback? https://colgate.zoom.us/j/276854695

## Attendees: Ilan Gold, Dan Schult, Tadd Bindas, Isaac Virshup

## Updates
- Timing for Numpy 2.0? For Scipy 1.13?
    - If numpy gets bumped out another month and scipy a month after that, will 1.14 be delayed well beyond June? If so, maybe we want the deprecations to go into 1.13.
- misshaped boolean index. Merged :tada: [PR 19957](https://github.com/scipy/scipy/pull/19957)
- PRs about 1d-arrays
    - `base-1d` Merged :tada: [PR 19853](https://github.com/scipy/scipy/pull/19853) needed for the others.
    - `csr-1d` [PR 19833](https://github.com/scipy/scipy/pull/19833)
        - I think this is ready after discussing:
            - scalar vs 0-dim array for 1d@1d (0-dim here)
        - Big picture questions remain (not this PR):
            - remove inference of shape -- make it be explicit
            - generalize `_get_*X*` methods (maybe `_get1`, `_get_many`, `_get_fancy`)
    - `dok-1d` [PR 19715](https://github.com/scipy/scipy/pull/19715)
        - reviews by Stefan and CJ. changes made.
        - 2 changes not quite as requested -- OK?
        - longer term issue of what to do with `resize`. See discussion items below.
    - Coming soon: `index-1d` PR for 1d indexing
- Roadmap sparse update [PR 19941](https://github.com/scipy/scipy/pull/19941)
- Other Sparse PRs
    - setdiag performance [Issue 19943](https://github.com/scipy/scipy/issues/19943) with [PR 19962](https://github.com/scipy/scipy/pull/19962).
    - parallel sparse [PR 19717](https://github.com/scipy/scipy/pull/19717)
        - Waiting on author, but maybe needs encouragement?
    - single pass CSR binops [PR 19765](https://github.com/scipy/scipy/pull/19765)
        - [name=CJ] TODO: take a closer look, seems complete
    - coalesce binops [PR 19766](https://github.com/scipy/scipy/pull/19766)
        - Cuts sparsetools.so in half, but incurs ~0 to ~30% performance hit
    - pydata for csgraph. Merged :tada: [PR 19796](https://github.com/scipy/scipy/pull/19796)

## Discussion
1) When to return sparse vs dense (how to decide?). Maybe build sparse and then see performance issues? Different cases:
   - indexing
   - reductions e.g. `.sum()`

   Comments: probably best to return dense but have a way to override that to request a sparse result.
2) `resize()`: Numpy handles this via ravel/unravel -- differently from scipy.sparse, which clips the shape. Array-api doesn't have it at all. All features (except in-place nature) should be available via indexing and reshape.
   - What is our goal for this method? Should we remove it?
3) CSR @ CSC sparsetools for wide array @ tall array (e.g. 1d@1d)? Can convert 1d array to dense. The problem is that CSR for a column requires M+1 entries in `indptr`. An algorithm for sparse@sparse has complexity: `O(n_row_A*n_col_B*K)` where K is the maximum nnz in a row of A and column of B. Quadratic in matrix size, but good for wide @ tall matrices. Compare to CSR@CSR complexity of `O(n_row_A*K^2 + max(n_row_A,n_col_B))`.
4) Where should we be validating indices dtypes? [name=Isaac] running into cases where construction doesn't validate, then we get C++ errors from indexing. Shouldn't this just happen at index time?

# Monday February 5, 2024

video call link: https://framatalk.org/sp-sparse-arrays
Looks like the framatalk link is (still) down. Fallback: https://meet.google.com/jrp-qgcf-nxh

## Attendees: CJ, Ilan, Isaac, Dan, Stefan

## Updates
- Dev Summit again this year?
  (planned for early June)
- Timing on v1.13: soon after Numpy 2.0 is released which is ~late Feb.
- Deprecations bumped to v1.14
    - [#19892](https://github.com/scirpy/scipy/pull/19892) bumped the removal version to 1.14. We could push for getting them in for 1.13. Removing in 1.13 would be less than a year. (But 1.14 is probs > a year?)
    - Note that we will deprecate functions in `sparray` NOT in `spmatrix`. So the impact is less.
        - [ ] get_format, get_shape, set_shape, getH
        - [ ] asfptype, getmaxprint, H
        - [ ] getcol/getrow
        - [ ] A
    - We need to decide how to provide axis support for nnz.
        - Keep `getnnz`? [name=CJ] votes keep for now (undeprecate and maybe deprecate later for new name/functionality)
        - New method like `A.nnz_by_axis(axis)`? (the `get*` method names are largely being removed)
        - `.count_stored_values()` (similar to `.count_nonzero()`), with `axis=` kwarg
            - Should have the option to not reduce, similar to numpy's `.nonzero()`.
            - Would want separate codepaths for reduction cases
        - see previous discussion in [Issue 3343](https://github.com/scipy/scipy/issues/3343#issuecomment-184729746)
- PRs about 1d-arrays
    - `base-1d` [PR 19853](https://github.com/scipy/scipy/pull/19853) needed for the others.
        - getcol() should raise on a 1d input, mention indexing
    - `csr-1d` [PR 19833](https://github.com/scipy/scipy/pull/19833) needs base-1d but independent of dok-1d.
    - `dok-1d` [PR 19715](https://github.com/scipy/scipy/pull/19715) needs base-1d but independent of csr-1d
- Name changes
    - `coo.indices -> coo.coords` [PR 20003](https://github.com/scipy/scipy/pull/20003) :tada: merged
    - `A._mul_* -> A._matmul_*` to see what it looks like [PR 20004](https://github.com/scipy/scipy/pull/20004)
- Roadmap sparse update [PR 19941](https://github.com/scipy/scipy/pull/19941)
    - The sparse array [draft roadmap](https://hackmd.io/3Vyca8s8T0KwGD0pURaFtg) was sent to [scipy-dev list](https://mail.python.org/archives/list/scipy-dev@python.org/thread/2CZBCBWP2LMQW67MWKMDJ74PSPGRAXP2/).
- [Repo with tentative new direction for pydata sparse](https://github.com/willow-ahrens/finch-tensor)
    - [Discussion in pydata sparse repo](https://github.com/pydata/sparse/discussions/618) indicates a dynamic compiling approach using TACO?
        - [Isaac] Using JIT, not specifically using TACO (but again, tentative)
        - [Dan] Am I reading this post incorrectly, or is it out of date?
        - [Isaac] I believe it's a similar set of people, some of whom have worked on TACO. I also thought the post said taco, but reading it again it just mentions TACO. Willow also thought it was worded oddly
        - [Dan] :) Thanks!
- Other sparse related PRs:
    - misshaped boolean index [PR 19957](https://github.com/scipy/scipy/pull/19957) spmatrix differs from np.matrix and ndarray.
        - [name=CJ] TODO read + review this
    - setdiag performance [Issue 19943](https://github.com/scipy/scipy/issues/19943) with [PR 19962](https://github.com/scipy/scipy/pull/19962). Test failing in optimize due to csr_matrix idxdtype
        - Ready to merge, after `indices` -> `coords` rename
    - parallel sparse [PR 19717](https://github.com/scipy/scipy/pull/19717)
        - Waiting on author, but maybe needs encouragement?
    - single pass CSR binops [PR 19765](https://github.com/scipy/scipy/pull/19765)
        - [name=CJ] TODO: take a closer look, seems complete
    - coalesce binops [PR 19766](https://github.com/scipy/scipy/pull/19766)
        - Cuts sparsetools.so in half, but incurs ~0 to ~30% performance hit
    - pydata for csgraph [PR 19796](https://github.com/scipy/scipy/pull/19796)
        - Seems ready to merge, needs a second approval
        - Main issue: testing could be cleaner

## Discussion
1) When to return dense 1d and when sparse 1d?
   - for functions we can add options for user control.
   - for operators we can return sparse container but perhaps with dense storage under the hood.
   - in some cases one is clearly better and in other cases the other is. Which is better depends on the data values.
       - alt 1: user chooses with kwarg
       - alt 2: automatically choose based on density, etc
       - alt 3: always return sparse container but it might have dense storage under the hood.
   - can add options to the `dot` function and other functions. And in the reduce functions as well.
   - Prior art: https://github.com/flatironinstitute/sparse_dot
     > The output will be a dense array, unless both inputs are sparse, in which case the output will be a sparse matrix. The sparse matrix output format will be the same as the left (A) input sparse matrix. dense=True will directly produce a dense array during sparse matrix multiplication. dense has no effect if a dense array would be produced anyway. Dense array outputs may be row-ordered or column-ordered, depending on input ordering.

# Monday January 22, 2024

video call link: https://framatalk.org/sp-sparse-arrays
Looks like the framatalk link is down. Fallback: https://meet.google.com/jrp-qgcf-nxh

## Attendees: CJ, Dan, Stefan, Isaac, Ilan

## Updates
- PRs about 1d-arrays
    - `dok-1d` is now split to allow easier review:
        - `base-1d` [PR 19853](https://github.com/scipy/scipy/pull/19853) needed for the others. Includes `test_common1d.py` and changes to `_base.py`
        - `csr-1d` [PR 19833](https://github.com/scipy/scipy/pull/19833) needs base-1d but independent of dok-1d.
        - `dok-1d` [PR 19715](https://github.com/scipy/scipy/pull/19715) needs base-1d but independent of csr-1d
    - :tada: merged coo_row [PR 19940](https://github.com/scipy/scipy/pull/19940) Handling `A.row` for 1d in `coo`. Proposal from last meeting:
        - keep the `A.row` syntax in the 1d case
        - getter: if 1d, return `zeros_like(col)` but set `tmp.setflags(write=False)`
        - setter: if 1d, raise Error
        - Merged as of today!
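
The `A.row` getter/setter proposal for 1d can be sketched with a small stand-in class. `Coo1D` is a hypothetical toy and not scipy's `coo_array`; the choice of `ValueError` in the setter is also an assumption, since the notes only say "raise Error". The only library calls used are NumPy's `zeros_like` and `ndarray.setflags`.

```python
import numpy as np


class Coo1D:
    """Toy 1d COO container (hypothetical, for illustration only)."""

    def __init__(self, col, data):
        self.col = np.asarray(col)
        self.data = np.asarray(data)

    @property
    def row(self):
        # A 1d array has no row indices: hand back zeros, marked
        # read-only so that in-place edits fail instead of being lost.
        tmp = np.zeros_like(self.col)
        tmp.setflags(write=False)
        return tmp

    @row.setter
    def row(self, value):
        # Assumed exception type; the notes do not name one.
        raise ValueError("cannot set row on a 1d array")


A = Coo1D([0, 3, 5], [1.0, 2.0, 3.0])
row = A.row
```

With the read-only flag set, code that mutates `A.row` in place (as described for sklearn's index dtype handling) gets an immediate error rather than silently editing a temporary.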
- sparse roadmap [PR 19941](https://github.com/scipy/scipy/pull/19941)
    - The sparse array [draft roadmap](https://hackmd.io/3Vyca8s8T0KwGD0pURaFtg) was sent to [scipy-dev list](https://mail.python.org/archives/list/scipy-dev@python.org/thread/2CZBCBWP2LMQW67MWKMDJ74PSPGRAXP2/). This PR is close to the same language, but incorporates some of that discussion.
- Other sparse related PRs:
    - parallel sparse [PR 19717](https://github.com/scipy/scipy/pull/19717)
    - single pass CSR binops [PR 19765](https://github.com/scipy/scipy/pull/19765)
    - coalesce binops [PR 19766](https://github.com/scipy/scipy/pull/19766)
    - pydata for csgraph [PR 19796](https://github.com/scipy/scipy/pull/19796)
    - setdiag performance [Issue 19943](https://github.com/scipy/scipy/issues/19943)

### Discussion:
0) What is a 1d CSR array, anyway?
   * just the name we use for a COO-style array with a trivial indptr
1) Scipy deprecation removal for v1.13
   - Timing discussed in [PR 19880](https://github.com/scipy/scipy/pull/19880) due to shorter release clock. Upshot: stick with 1.13 planned timeline but removals can always be delayed.
   - Which to remove:
       - [ ] get_format, get_shape, set_shape, getH
       - [ ] make .shape a property
       - [ ] asfptype, getmaxprint
       - [ ] remove .H
       - [ ] remove getcol/getrow
       - [ ] remove .A
       - [ ] getnnz (issue: users providing `axis=` kwarg)
   - Notes:
       - https://stackoverflow.com/a/30668375 says assigning to shape is not recommended
       - https://numpy.org/doc/stable/reference/generated/numpy.ndarray.shape.html has a warning about assignment to shape
2) Name Change Proposals
   - Proposal 1: **Change `coo.indices` to `self.coords` (coordinates).** `indices` is a public attribute that hasn't been released for coo. So we should get the name right before it is released.
       - Problems with `indices`
           - coo `A.indices` looks too much like csr `A.indices`
           - Code for coo/csr conversion may be confusing.
           - When reading code w/o context, hard to tell if csr or coo format code.
       - Positives with `coords`
           - avoids `tuple` vs `1d-nparray` confusion for same name
           - for 1d, avoids `indices = self.indices[0]` confusion when converting to csr. Instead `indices = self.coords[0]`
           - matches the name of the class.
   - Proposal 2: **Switch to `_matmul_vector`, `_matmul_multivector` and `_matmul_sparse_array`** Changing these private names would help readability.
       - names `_mul_vector` and friends do `matmul` but are named `mul`.
       - Some are used within `multiply` to do elementwise multiply with broadcasting by using matmul. Confusing whether a method is doing matmul or mul.
       - The matrix part of name `_mul_sparse_matrix` might be confusing during the transition to arrays.
3) When to return dense 1d and when sparse 1d?
   - in some cases one is clearly better and in other cases the other is. Which is better depends on the data values.
       - alt 1: user chooses with kwarg
       - alt 2: automatically choose based on density, etc
       - alt 3: always return sparse container but it might have dense storage under the hood.
   - can add options to the `dot` function and other functions. And in the reduce functions as well.
   - Prior art: https://github.com/flatironinstitute/sparse_dot

Are we going to Seattle again this year?

### Report
Report on 1d uni-format exploration.
- Summary of Dan's conclusion that `coo-1d` and `csr-1d` should be separate.
    - Dan tried `uni_array` as a subclass of both `coo_array` and `_cs_matrix`. Added `__new__` to decide whether to be coo_array or uni_array based on shape.
    - coo and csr mesh well together, easy to implement with about 8 methods for uni_array
    - BUT harder to maintain because of `__new__` method with value dependent class choice.
    - Also BUT `resize` must change `ndim` in place!! And we can't change class in-place.
    - not much savings of duplicated code: every coo method and csr method must handle both 1d & 2d cases just like with separated classes. Or, all those methods have to get mostly duplicated in `uni-array`.
    - Dan recommends sticking with separated classes for coo-1d and csr-1d.

# Monday January 8, 2024

video call link: https://framatalk.org/sp-sparse-arrays

## Attendees: CJ, Dan, Stefan

## Updates
- Merged :tada: `coo-1d` [PR #18530](https://github.com/scipy/scipy/pull/18530)
- Merged :tada: `minmax-1d` [PR 19743](https://github.com/scipy/scipy/pull/19743) builds on `coo-1d` to provide min-max support.
- dok-1d [PR 19715](https://github.com/scipy/scipy/pull/19715)
    - suggestion to raise for 1d getrow. Q: should getcol raise error for 1d? Shouldn't users use indexing instead?
        - Decision to not support getcol/getrow in sparse arrays has already been made with a deprecation for v1.13. There is a box tracking this in the issue tracker for deprecations
    - Q: remove extraneous np.array from 2d code? [Yes]
    - Looks like `_base.py` changes are reviewed. How to make it easier to review tests and dok? [split base and test changes from dok changes]
    - [Deprecation tracker](https://github.com/scipy/scipy/issues/15765) mentions removing things like `getrow` and `getcol` from the sparray API.
- sparse array [draft roadmap](https://hackmd.io/3Vyca8s8T0KwGD0pURaFtg) sent to [scipy-dev list](https://mail.python.org/archives/list/scipy-dev@python.org/thread/2CZBCBWP2LMQW67MWKMDJ74PSPGRAXP2/)
    - [roadmap](https://github.com/scipy/scipy/blob/main/doc/source/dev/roadmap.rst), [roadmap detailed](https://github.com/scipy/scipy/blob/main/doc/source/dev/roadmap-detailed.rst)
    - [name=Dan] will create a PR

##### new non-1d related PRs
- parallel sparse [PR 19717](https://github.com/scipy/scipy/pull/19717)
- single pass CSR binops [PR 19765](https://github.com/scipy/scipy/pull/19765)

## Discussion
- Followup to #18530
    - Scikit-learn found a few surprises in their code after #18530 (coo-1d).
See [scikit-learn/scikit-learn#28047](https://github.com/scikit-learn/scikit-learn/pull/28047) and possible fix at [PR 19809](https://github.com/scipy/scipy/pull/19809): - sparse array interface changed after #18530. Before: 1d-input gave 2d array. `A = csr_array([1,2,3])` makes shape `(1, 3)`. After: raises when converting from coo_array to csr_array. - We want it to create a 1d array. [Yes] - Direct mutation of 1d `A.row` doesn't change `A`. The 1d `A.row` property creates/returns `np.zeros_like(self.col)`. sklearn changes the idx_dtype. Other users might change values. For 1d, mutating `A.row` doesn't change the array (there are no row indices in 1d). - proposed fix in [PR 19809](https://github.com/scipy/scipy/pull/19809): - raise on `A.row` for 1d `A`. - add `A._row_as_2d` property for internal use to set up 1d coo to use sparsetools. (similar to `A._shape_as_2d`) - Will need transition docs to make noise about finding 1d -> 2d silent conversion in sparse matrix and adjusting for arrays. - CJ's proposal: - keep the `A.row` syntax in the 1d case (for now, at least) - getter: if 1d, return zeros_like(col) but set `tmp.setflags(write=False)` - setter: if 1d, raise Error - potential new spbase method: `cast_indices() -> None` - would be useful to support format-agnostic index array casting - sklearn does this manually right now, likely others as well - csr-1d [PR 19833](https://github.com/scipy/scipy/pull/19833): csr sparsetools machinery "works" for 1d. `indptr` set to `np.array([0, nnz])`. But `csc` is bigger than dense for storage of one row (indptr by itself takes more memory than dense). Options: - make a new format for 1d that uses csr sparsetools. Avoids expectations from names csr/csc. - allow csr to store as one row AND csc to store as one column. Converting is fast (current 2d transpose). Perhaps confusing to users.
- CJ's take: provide 1d csr but not 1d csc - 1d CSC is never the right call (worse than dense) - users can treat CSR/CSC interchangeably from an API POV # Monday December 18, 2023 link: https://meet.google.com/qvk-bydd-tem ## Attendees: Dan, CJ, Stefan ## Updates - `_construct.py` [PR #19171](https://github.com/scipy/scipy/pull/19171) merged :tada: - `coo-1d` [PR #18530](https://github.com/scipy/scipy/pull/18530) - v1.12 branch created (so we can merge for v1.13 when ready) - [PR to the PR](https://github.com/perimosocordiae/scipy/pull/13) to handle 1d outside `_coo.py` - will need more fixes for `_construct` in `block`, `kron`, etc. Should we delay until after coo-1d is merged? - sparse array [draft roadmap](https://hackmd.io/3Vyca8s8T0KwGD0pURaFtg) sent to [scipy-dev list](https://mail.python.org/archives/list/scipy-dev@python.org/thread/2CZBCBWP2LMQW67MWKMDJ74PSPGRAXP2/) - [roadmap](https://github.com/scipy/scipy/blob/main/doc/source/dev/roadmap.rst), [roadmap detailed](https://github.com/scipy/scipy/blob/main/doc/source/dev/roadmap-detailed.rst) - [name=Dan] will create a PR - scipy-dev list [query about parallel sparse](https://mail.python.org/archives/list/scipy-dev@python.org/thread/TWLORTYBBBUPWYW62QWBHBRJJYRRWPOO/) has been encouraged to explore it. - `coo-dok-1d` [PR 19688](https://github.com/scipy/scipy/pull/19688) builds on `coo-1d` to provide min-max and dok support. ## Discussion - next steps for Roadmap discussion: - make PR to change sparse section of [roadmap page](https://docs.scipy.org/doc/scipy/dev/roadmap.html) - No blocker in adding 1D structures; currently arrays fail when reductions produce 1D arrays. - Some Qs on `coo-dok-1d` PR: - for `__iter__` can we count on `self[r]` to return same as `self[r,:]` for 2d? - Yes - OK to rewrite `validate_indices` and helpers? Could be more closely aligned with pydata sparse approach: License aligns... more, smaller, well-named functions.
- Yes, but let's do that in a separate indexing PR - 1st PR can do dok without large scale indexing support - https://github.com/pydata/sparse/blob/master/sparse/_slicing.py - OK to change "incorrect" handling of some cases? i.e. currently no error if bool index's shape mismatches when no `True` value lies outside shape. - We might need a deprecation cycle for this - No handling of Newaxis yet. how much `nd` handling should we add now? - Let's err on the side of providing too little functionality (at first). - private functions changed. should we change their name to make clear changes have occurred? At what point do we split the matrix code from array code? - Now have base for csr support -- meaning matmul/arithmetic/binop. Do we add 1d to `csr_array` or somehow put into `coo_array` or make `uni_array` for 1d-only? (can do more than one to see what they look like) - order: dok w/o indexing - indexing - csr or uni - future steps for full feature 1d: - works with `construct` - make reducing 2d ops return 1d - what about scalars? (shift from 1x1 to 0d?) - best org for testing of all this? - shift testing from matrix to array? - let's build a testing system that localizes the info about what cases it is testing -- yet covers what test_base covers. 
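The boolean-index question in the December 18 discussion (no error today when a mask's shape mismatches, as long as no `True` value lies outside the shape) can be illustrated with a small sketch of the stricter check. This is not SciPy's `validate_indices`; the helper name `validate_bool_index` and its signature are hypothetical and it uses only NumPy. It assumes the stricter rule is "mask length must equal axis length", which is what NumPy enforces for dense arrays.

```python
import numpy as np


def validate_bool_index(idx, axis_len):
    """Hypothetical helper: convert a 1-D boolean mask to integer positions.

    Unlike the lenient behavior discussed in the notes, this rejects any
    mask whose length differs from the axis length, even when every entry
    beyond the axis is False.
    """
    idx = np.asarray(idx)
    if idx.dtype != np.bool_:
        raise TypeError("expected a boolean index array")
    if idx.ndim != 1:
        raise IndexError("this sketch only handles 1-D boolean masks")
    if idx.shape[0] != axis_len:
        raise IndexError(
            f"boolean index has length {idx.shape[0]} "
            f"but the indexed axis has length {axis_len}"
        )
    # Positions of the True entries, usable as an integer index.
    return np.flatnonzero(idx)


# A mask of the right length is accepted.
positions = validate_bool_index([True, False, True], 3)

# A mask that is too long is rejected, even though its extra entry is False.
try:
    validate_bool_index([True, False, True, False], 3)
except IndexError as exc:
    message = str(exc)
```

Switching existing code paths to a strict check like this changes behavior for callers who rely on the lenient case, which is why the notes suggest a deprecation cycle.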
# Monday December 4, 2023 link: https://meet.google.com/qvk-bydd-tem ## Attendees: Dan Schult, Stéfan van der Walt, CJ Carey ## Discussion - Dan's PR [PR #19171](https://github.com/scipy/scipy/pull/19171) and CJ's PRs [PR #18530](https://github.com/scipy/scipy/pull/18530) need to go in before [SciPy's Dec 6th deadline](https://mail.python.org/archives/list/scipy-dev@python.org/thread/VFV2Y2CYRFGACTTVDQOCUWP3I5R6MPHX/) - Or can ask Tyler for a few more days - Dan's PR: - [`random_state` naming](https://github.com/scipy/scipy/pull/19171/files#diff-139795d27596cd4bc81110c1a0319c87147cafe9338b8e008cccf8a06b4dab4eR1095) - We don't want a [slow default RNG](https://github.com/scipy/scipy/pull/19171/files#diff-139795d27596cd4bc81110c1a0319c87147cafe9338b8e008cccf8a06b4dab4eR1100) - [Context](https://github.com/scipy/scipy/issues/19159#issuecomment-1699161579) - CJ's PR: - Default "orientation" of 1D - row/col semantics - Limiting to <= 2D mostly to ensure compatibility with sparsetools, but underlying implementation is generalized - Breakage introduced (`if issparse(x): m, n = x.shape`) - [Parallel sparse operations email](https://mail.python.org/archives/list/scipy-dev@python.org/thread/TWLORTYBBBUPWYW62QWBHBRJJYRRWPOO/) to scipy-devel - 1d full-feature - we'll need to make sure that multiplying by a 1d works for 2d csr+. Checking for `issparse` will no longer ensure 2d -- and there may be code in many places that assumes 2d for sparse. - multiplying two 1d arrays could use csr_matmat with some fake `indptr` stuff. But that indptr is expensive (N+1 entries in indptr). - Testing: tests for coo-1d are in `tests/test_coo.py`. Do we need more? The file [tests/test_common1d.py](https://github.com/dschult/scipy/blob/more-1d/scipy/sparse/tests/test_common1d.py) in a dschult branch [more-1d](https://github.com/dschult/scipy/tree/more-1d) contains a set of "common" 1d tests based on `test_base.py`. But is there a better approach? 
That more-1d branch builds off of coo-1d and provides `dok` format and the common tests. - `test_array.py` refit test_base with just checking if behavior is the same for numpy array and sparse. - Generic 1D container, vs specialized 1D container of each type (coo, dok, lil, etc.) - Plan for while: - CJ will get sparse constructor PR over the finishing line - We hold off on 1D COO until SciPy 1.12 fork - We will get that in early in the next release cycle, and add other 1D formats - Dan to share preliminary roadmap with SciPy mailing list ## PR / Issue Roundup * sparse constructors: [PR #19171](https://github.com/scipy/scipy/pull/19171) * 1d COO format: [PR #18530](https://github.com/scipy/scipy/pull/18530) * [name=CJ]: Most of Julien's comments have been addressed. Needs a bit more polish before merging. * Will likely adjust validation to disallow >2d shapes. * Timing of merging constructors and coo-1d relative to scipy-1.12 release * constructors might be OK to go? * check coo-1d interaction with 2d checks for `issparse`. i.e. matmul against a coo-1d * perhaps v1.13 for timing of dok-1d, uni-1d, csr-1d, csc?, dsp?. * roadmap: plan for migrating to arrays * [draft](https://hackmd.io/3Vyca8s8T0KwGD0pURaFtg) -- probably leave off the last sections of the draft doc (more discussion) * how to communicate to community? ## Previous topics: * docstring fixes: [PR #18898](https://github.com/scipy/scipy/pull/18898) * [name=CJ]: Needs a bit of love to resolve merge issues * sparse nonzeros for argmin/argmax: [PR #16467](https://github.com/scipy/scipy/pull/16467) * [name=CJ]: Still needs some design work, see recent comments by @yfarjoun. * Maybe the simplest solution is to let the caller provide an `implicit_value` kwarg, as suggested in the comments. * Old effort to improve the lil_matrix constructor: [PR #13415](https://github.com/scipy/scipy/pull/13415) * [name=CJ]: This was close to mergeable back in 2021, but it fell off my radar. Might be nice to get it in, finally. 
* [name=Julien] scikit-learn test suite extension * Now explicitly checks behavior with sparse arrays * WIP, PRs are open now * [name=Isaac] Fast path for canonical format, issues * https://github.com/scipy/scipy/issues/19106 * [name=Isaac] array-api draft issue: https://hackmd.io/YKoA7a7HRCOndnwv38Dn3A # Friday August 25 2023 ## Attendees - Isaac - Julien - CJ - Dan ## PR / Issue Roundup * matrix_power was merged: [PR #18544](https://github.com/scipy/scipy/pull/18544) * [name=CJ]: We noted some opportunities for optimizing this further in the PR discussion. A followup PR would be useful. * block_array + friends: [PR #18862](https://github.com/scipy/scipy/pull/18862) * [name=CJ]: This looks ready to merge, I'll do a final read before pushing the button. * 1d COO format: [PR #18530](https://github.com/scipy/scipy/pull/18530) * [name=CJ]: Most of Julien's comments have been addressed. Needs a bit more polish before merging. * Will likely adjust validation to disallow >2d shapes. * docstring fixes: [PR #18898](https://github.com/scipy/scipy/pull/18898) * [name=CJ]: Needs a bit of love to resolve merge issues * sparse nonzeros for argmin/argmax: [PR #16467](https://github.com/scipy/scipy/pull/16467) * [name=CJ]: Still needs some design work, see recent comments by @yfarjoun. * Maybe the simplest solution is to let the caller provide an `implicit_value` kwarg, as suggested in the comments. * Old effort to improve the lil_matrix constructor: [PR #13415](https://github.com/scipy/scipy/pull/13415) * [name=CJ]: This was close to mergeable back in 2021, but it fell off my radar. Might be nice to get it in, finally. 
* [name=Julien] scikit-learn test suite extension * Now explicitly checks behavior with sparse arrays * WIP, PRs are open now * [name=Isaac] Fast path for canonical format, issues * https://github.com/scipy/scipy/issues/19106 * [name=Isaac] array-api draft issue: https://hackmd.io/YKoA7a7HRCOndnwv38Dn3A ## Discussion * 1d support for DOK format * [#18929](https://github.com/scipy/scipy/pull/18929) was merged, so we can move ahead on this * canonical format / sorted indices * [name=CJ]: Maybe we need to redesign this from the ground up. Tracking/exploiting canonical formats is a bit of a mess right now. * See [#18546](https://github.com/scipy/scipy/issues/18546) and [#19106](https://github.com/scipy/scipy/issues/19106). * Multithreaded sparse dot products * See [#19112](https://github.com/scipy/scipy/issues/19112). * https://github.com/Quantco/tabmat * A pluggable backend, reasonable dependency (from both technical and license perspective), or simple implementation could be good * For simple implementation, cython has parallel loops * openMP dependency is a problem: https://github.com/scipy/scipy/issues/15129 * [EuroSciPy 2023 Maintainer Track](https://docs.google.com/presentation/d/1VVj-jOYulBC6h8vMURA-fiQHfYyKi6PUzfFL9K8QLoY/edit?usp=sharing) * Fill values for sparse arrays? * Brings a lot of added complexity * Updating the scipy.sparse roadmap entry * The old roadmap came from an era where we thought about spinning out scipy.sparse into a third-party dep * pydata/sparse was potentially going to become a dependency of scipy, but its use of Numba is preventing that * The current plan is to keep maintaining scipy.sparse with the modern ndarray API, while explicitly supporting users who bring their own sparse container types * We should redo this + add some history * It would be good to get Ralf's opinion (we should invite him to one of these meetings). * Array api issue * who are maintainers for other libraries?
* cupy.scipy.sparse: https://github.com/pearu # Friday August 4 2023, 14:00 to 15:00 UTC ## Attendees - CJ - Julien - Dan - Levi ## From last meeting: * Update the [`scipy.sparse` section of the roadmap]( https://docs.scipy.org/doc/scipy/dev/roadmap-detailed.html#sparse) found by Isaac? ([name=Julien]) * dok-1d: * added PR [#18929](https://github.com/scipy/scipy/pull/18929) to stop subclassing from `dict` * [name=CJ] approved the PR * [name=CJ] started taking [a quick look at this](https://github.com/scipy/scipy/compare/main...perimosocordiae:scipy:dok-1d), two notes: (a) this will rely on 1d COO support, so we need to get that merged first. (b) DOK supports indexing, which means we may need to overhaul the IndexMixin class as well. * Get a 1d version of some TestCommon-like tests * suggested new method `spmatrix.to_sparray()` on Issue [#18918](https://github.com/scipy/scipy/issues/18918) * Still need to implement. Perhaps add a separate free function in `_sputil.py` that could then be deprecated. * Or maybe we just wait until the sparse array creation functions are complete and see if we still need this (Let's do this option.) * Follow up on: - PRs: - Matrix Power [#18544](https://github.com/scipy/scipy/pull/18544) - Waiting on final author updates from Levi - We're not going to do the efficiency part yet, that can be a separate PR - Basically just needs docstring changes - sparse arrays for hstack, vstack, bmat, block_diag [#18862](https://github.com/scipy/scipy/pull/18862) - TODO [name=CJ]: still need to read through this again after the recent changes - connected component index type [#18913](https://github.com/scipy/scipy/pull/18913) - there's some Cython weirdness, but we can merge first and fix it later - 1d COO support [#18530](https://github.com/scipy/scipy/pull/18530) - any method that uses `to_csr` raises for coo-1d. OK to merge anyway since "that functionality isn't provided by coo-1d and will be included in csr-1d." ?? Is this the right approach? 
(Yes) - We currently allow >2d constructor usage, but most/all functionality is missing for those shapes. We should restrict this (in the allow_ndim validation) with a comment explaining why. This will prevent users from getting the wrong idea about what we plan to support in the near future. - TODO [name=CJ]: resolve Julien's comments - Issues: - [name=Isaac] Fast path for canonical format [#18546](https://github.com/scipy/scipy/issues/18546) - This isn't the top priority right now, but the idea is good. - Semantics of sparse array creation [#18592](https://github.com/scipy/scipy/pull/18592) - Dan is looking to take this over - array API eye and np.eye both only support 2D with (n, m=None) - Maybe this is fine, it just won't support anything other than 2d sparray. Users can use `identity()` for other cases. - concatenate arrays [#18839](https://github.com/scipy/scipy/issues/18839) => related to PR #18862 above - It seems like the new `block()` function can support this, so let's wait and maybe add this later for Array API compat. - split docstrings for array vs matrix #18669 Needs Review at [#18898](https://github.com/scipy/scipy/pull/18898) - Julien recently left comments, Dan and CJ will also review - We need to be careful about docstrings all over the place (many base classes + mixins). - Follow the array-api as much as possible [#18915](https://github.com/scipy/scipy/issues/18915) - some functions like `ones` create dense arrays unless we get an optional fill value - Lots of discussion from Scipy maintainers on this one - Overall it seems like a good general idea, but we're not obliged to support 100% - Remove the `_is_array` attribute [#18921](https://github.com/scipy/scipy/issues/18921) :Tada: done... ## Notes - [name=Julien Jerphanion] Please ping me for reviews. - On holidays. I reviewed some PRs. I am Preparing EuroScipy 2023. 
- I am thinking of reviewing and participating in: - [ENH: sparse.linalg: Implement matrix_power()](https://github.com/scipy/scipy/pull/18544) - [ENH: Fast path for sparse array "canonical format"](https://github.com/scipy/scipy/issues/18546) - [DOC:sparse: More properties and docs for sparse array classes](https://github.com/scipy/scipy/pull/18561) - I am planning to continue: - [BUG: check array bounds in csr_todense](https://github.com/scipy/scipy/pull/16243) - Next meeting (August 18th) may be cancelled because several people will be out. We'll confirm over email. # Friday July 21 2023, 14:00 to 15:00 UTC ## Attendees - Isaac - Dan - Kat - Julien - CJ ## From last meeting: - calendar corrected [community calendar](https://scientific-python.org/calendars/) - scientific-python blog posted about [sparse array progress at the summit](https://blog.scientific-python.org/scientific-python/dev-summit-1-sparse/) - [name=Julien] will be organizing a maintainer track for the work on [Sparse Data at EuroScipy 2023](https://pretalx.com/euroscipy-2023/me/submissions/8ZANVV/); hope this will help discussion ## Agenda * Update the [`scipy.sparse` section of the roadmap]( https://docs.scipy.org/doc/scipy/dev/roadmap-detailed.html#sparse) found by Isaac? * Consider DOK for 1D arrays. * [name=CJ] started taking [a quick look at this](https://github.com/scipy/scipy/compare/main...perimosocordiae:scipy:dok-1d), two notes: (a) this will rely on 1d COO support, so we need to get that merged first. (b) DOK supports indexing, which means we may need to overhaul the IndexMixin class as well. * Re: indexing woes, it's a big pain that `dok_matrix` is a subclass of `dict`. Maybe we can make `dok_array` a generic Mapping but internally it would have a dict attribute that holds the entries. * Create 1d array version of TestCommon [name=Dan] * Maybe we can steal parts of the Array API conformance tests * Request for a function to convert from matrix to array?
[#18918](https://github.com/scipy/scipy/issues/18918); The long term fix is to stop making matrices, but in the meantime... * `coo_array(my_mtrx)` * Method on `spmatrix`: `as_sparse_array()` * Reopen issue, suggest ^ * Follow up on: - PRs: - Matrix Power [#18544](https://github.com/scipy/scipy/pull/18544) - Waiting on author updates from Levi - sparse arrays for hstack, vstack, bmat, block_diag [#18862](https://github.com/scipy/scipy/pull/18862) - Should we name this `scipy.sparse.block()` instead of `block_array()`? - Index dtypes - Add _array version of diags creation functions [#18538](https://github.com/scipy/scipy/pull/18538) - Still merged, will appear in the next release - connected component index type [#18913](https://github.com/scipy/scipy/pull/18913) - 1d COO support [#18530](https://github.com/scipy/scipy/pull/18530), now ready for review! - Follow-ups: more testing, toarray() that doesn't go through the current native code path, indexing, broadcasting for math ops - Please kick the tires! We definitely don't have tests covering 100% of the usage. - Issues: - [name=Isaac] Fast path for canonical format [#18546](https://github.com/scipy/scipy/issues/18546) - The python attribute isn't passed to our native code, so the C++ always checks (which is a O(nnz) operation) - JAX does has_unique and has_sorted separately - May need some discussion! The priority may be low if we can't pass has_canonical in more places. - Semantics of sparse array creation [#18592](https://github.com/scipy/scipy/pull/18592) - spdiags patch is still on main - Ross and Stéfan are planning to work on this - array API eye and np.eye both only support 2D with (n, m=None) - Consider using block_array instead of barray for bmat (Dan) If any of inputs are sparse arrays, produce sparse array as output. Or just scipy.sparse.block, to match np.block. 
- concatenate arrays [#18839](https://github.com/scipy/scipy/issues/18839) => related to PR #18862 above - [array vs matrix in docstrings #18669]( https://github.com/scipy/scipy/issues/18669) Follow-up at [#18898](https://github.com/scipy/scipy/pull/18898) - [name=CJ] I think we should split the docstrings - what about docstrings in methods of e.g. `_cs_matrix`? - Follow the array-api as much as possible [#18915](https://github.com/scipy/scipy/issues/18915) - some functions like `ones` create dense arrays unless we get an optional fill value - Remove the `_is_array` attribute [#18921](https://github.com/scipy/scipy/issues/18921) ## Notes * Different array/matrix docstrings * Do separate class docstrings for the current PR [name=Kat] * We can overload the `.__doc__` in some cases * Tackle separate method docstrings in future issue/PR * coo_1d * Prefer follow up PRs * dok * make dok_array not subclass from dict? Open an issue * 1d # Friday July 7 2023, 14:00 to 15:00 UTC >[name=Julien Jerphanion] I might not be able to attend this time. My comment: > - I added some info in the actionable items of the last meeting > - I will be organizing [a maintainer track for the work on Sparse Data at EuroScipy 2023](https://pretalx.com/euroscipy-2023/me/submissions/8ZANVV/); hope this will help discussion ## Attendees - Isaac - CJ - Dan - Stéfan - Jarrod ## Agenda * Blog Post * Dan is going to work on this, maybe at SciPy'23 * Update the [`scipy.sparse` section of the roadmap]( https://docs.scipy.org/doc/scipy/dev/roadmap-detailed.html#sparse) found by Isaac? * Update the [community calendar](https://scientific-python.org/calendars/) to have an open meeting link * Updating the date/time of the meeting: https://github.com/scientific-python/scientific-python.org/pull/436 * TODO [name=CJ]: Send an email to Scipy-Dev publicizing the meeting, link these notes and the calendar page.
* Consider DOK for 1D arrays [name=Dan] * Follow up on: - PRs: - [SuperLU index downcasting #18644](https://github.com/scipy/scipy/pull/18644) was merged - [use sparse.diags instead of spdiags internally#18802](https://github.com/scipy/scipy/pull/18802) - [name=CJ] Merged! :tada: - [Matrix power #18544](https://github.com/scipy/scipy/issues/18544) - Issues: - [spmatrix lost its methods #18749]( https://github.com/scipy/scipy/issues/18749) - [name=CJ] I just closed this. - [connected_components index dtype #18716]( https://github.com/scipy/scipy/issues/18716): - [Cython code in question](https://github.com/scipy/scipy/blob/3c89445b6439f3ce7bffc4cf11c6407c39faedc5/scipy/sparse/csgraph/_traversal.pyx#L647) - [example of using a fused index type]( https://github.com/scipy/scipy/blob/3c89445b6439f3ce7bffc4cf11c6407c39faedc5/scipy/sparse/csgraph/_reordering.pyx#L81C1-L81C1) - [array vs matrix in docstrings #18669]( https://github.com/scipy/scipy/issues/18669) - Now that semantics are changing in meaningful ways, we should probably separate docstrings. - Matrix docstrings won't be changing much anyway, so should be safe to "freeze". - [Kat's commit with token replacement]( https://github.com/KatMistberg/scipy/commit/7f6fe9a27f53c4eb5dbe57b8824ec6c7fbb53819) - [name=CJ] I think we should just split the docstrings - [array creation semantics #18592](https://github.com/scipy/scipy/issues/18592) - spdiags patch is still on main - [name=Ross] and [name=Stéfan] are planning to work on this - [array API `eye`](https://data-apis.org/array-api/2022.12/API_specification/generated/array_api.eye.html) and `np.eye` both only support 2D with `(n, m=None)` - Consider using `block_array` instead of `barray` for `bmat` ([name=Dan]) - If any of inputs are sparse arrays, produce sparse array as output - Or just `scipy.sparse.block`, to match `np.block`. 
- [name=Isaac] Fast path for canonical format - [1d COO support #18530](https://github.com/scipy/scipy/pull/18530) - Actually not too far off - [concatenating sparse arrays #18839](https://github.com/scipy/scipy/issues/18839) # Friday June 17 2023, 14:00 to 15:00 UTC ## Attendees - Dan - Erik - Isaac - CJ - Julien ## Agenda * Sort out and attribute [what remains to be done](https://hackmd.io/1Q2832LDR_2Uv_-cV-wnYg#What-remains-to-be-done) * Roadmap/ plan for array-like sparse arrays * Deprecation plan for matrices * [Creation routines](https://github.com/scipy/scipy/issues/18592) * Interop with other sparse matrix packages (isaac) * Blog post ## Notes * Introduction, Erik works on libraries using sparse data structures * >[name=Julien] Erik: here is [the worklog of the summit](https://hackmd.io/iEtdfbxfSbGwOAJTXmqyIQ) ### What remains to be done * How do we work * [Dan] Where does in-progress work go * CJ's fork, make PRs onto branch there * Using the view on the diff on GitHub might prevent the creation of spurious Draft PRs * Dense pretending to be sparse can be ephemeral, never needs to go to scipy * Dan will make a branch for exploration that people can fork and make PRs to. Dan will try to keep it up to date with the main branch too. * 1d sparse array support Timeline? * Dan + CJ working on it * Targeting 1.12 * nd sparse array support * Julien, CJ: seems like low priority and this would be a huge maintenance cost for SciPy for relatively rare use-cases * CJ: Array api makes interop easier, we can lean on external packages for now * creation functions [scipy#18592](https://github.com/scipy/scipy/issues/18592) * Want to get them out in a single release * Is Ross going to implement these, or need to assign elsewhere? * Current plan: `_array` methods, so we can apply this to the fast_matrix_market io at the same time * Bundling in api changes (like tuples for shapes) * Deprecate `isspmatrix_` methods?
* Sounds like combination of classes + format attribute cover this * (Julien) Deprecation plan: have suggested code to replace these things with * Add a section in the documentation explaining the change of semantics and a migration plan for downstream libraries during the deprecation cycle. * Make a roadmap for array api * scikit-learn * When will scikit-learn be willing to support a compatible version of scipy? * Julien: * reasoning is based on some Linux package managers not supporting the latest NumPy and SciPy * Personally would like scikit-learn to follow SPEC 0 * See discussion for [1.3](https://github.com/scikit-learn/scikit-learn/issues/26438) * [Dedicated RFC]( https://github.com/scikit-learn/scikit-learn/issues/26418) * Some part of scikit-learn's behavior depends on SciPy's version (e.g. see [this one](https://github.com/scikit-learn/scikit-learn/blob/784ba9ef9f65d5e4e33087dd7f5b87d65b605efc/sklearn/preprocessing/_polynomial.py#L61-L73)); we might potentially rely on a similar mechanism for the support of sparse arrays (e.g. reshaping outputs) * scikit-learn uses nightly builds to test the developer version of SciPy and NumPy and failing tests are updated in [this issue](https://github.com/scikit-learn/scikit-learn/issues/26154). We can watch those for breakage in scikit-learn when we make changes in SciPy. * General speed ups * "Fast path for canonical" * TODO (Isaac) * Figure out canonical for all the formats, probably normalize APIs * Add keyword public argument for constructors ## Broad plan for deprecation * Add support for sparse_arrays everywhere we can * Creation functions: target 1.12 * 1d COO: target 1.12 * Deprecate spmatrix-specific functions (where there's an easy replacement) * Target: 1.12 * Once everything works with sparse arrays, then deprecate spmatrix * And the sparse array API is stable!
(specifically, 1d support)
    * Target: 1.13

## Actionable items
- [Julien] Add a section in the documentation explaining the change of semantics and a migration plan for downstream libraries during the deprecation cycle.
    - see [the section below](https://hackmd.io/9UCjKdRUTEeoS8KdQNll-A#Migration-guide-from-scipysparsespmatrices-to-scipysparsesparray)
- [Julien] Drop `isspmatrix_*` checks in scikit-learn and use `issparse` and `format`
    - https://github.com/scikit-learn/scikit-learn/pull/26751

## Next meeting date and time
* Every two weeks with the option to drop every other meeting (at least once a month)

> [name=Julien] +1. I proposed to meet every two weeks for 1h on the same timeslot

---

## Migration guide from `scipy.sparse.spmatrices` to `scipy.sparse.sparray`

### Rationale for the migration
In SciPy 1.8.0, a new set of classes for Sparse Arrays (`csr_array` et al.) was added. The goal is to align with NumPy's array semantics. Sparse Matrices will be deprecated before being removed entirely. For details about future changes to the `sparse` module, see [this section from the roadmap](https://docs.scipy.org/doc/scipy/dev/roadmap-detailed.html#sparse)

### How to migrate
- Replace `scipy.sparse.<format>_matrix` with `scipy.sparse.<format>_array`
- Replace `isspmatrix` with `issparse`
- Replace `isspmatrix_<format>(X)` with `issparse(X) and X.format == '<format>'`
- Mind the changes in behavior between Sparse Matrices and Sparse Arrays. For instance:
    - shape might be different in case of 1-D array
- For examples of migrations done in downstream libraries, see these PRs:
    - https://github.com/scikit-learn/scikit-learn/pull/26751
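
The replacement rules in the migration guide can be sketched in a few lines. This is an illustration rather than official migration code: the helper name `is_csr` is made up for the example, and the shape comparison assumes a SciPy version (1.8 or later) in which reductions on sparse arrays return 1-D NumPy arrays while sparse matrices keep returning 2-D `np.matrix` results.

```python
import numpy as np
from scipy.sparse import csr_array, csr_matrix, issparse


def is_csr(x):
    """Hypothetical helper replacing `isspmatrix_csr(x)`.

    Works for both spmatrix and sparray inputs by combining `issparse`
    with the `format` attribute, as suggested in the guide.
    """
    return issparse(x) and x.format == "csr"


dense = np.array([[1, 0, 2], [0, 0, 3]])

M = csr_matrix(dense)  # old-style sparse matrix
A = csr_array(dense)   # new-style sparse array

# The format check behaves the same for both containers.
both_csr = is_csr(M) and is_csr(A)
coo_is_csr = is_csr(A.tocoo())

# Behavior change to watch for: reducing along an axis.
col_sums_matrix = M.sum(axis=0)  # 2-D np.matrix for spmatrix
col_sums_array = A.sum(axis=0)   # 1-D np.ndarray for sparray
```

Code that unpacks `m, n = result.shape` after a reduction is one place where this difference tends to surface during migration.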
