# New Xarray meeting notes
https://us02web.zoom.us/j/87503265754?pwd=cEFJMzFqdTFaS3BMdkx4UkNZRk1QZz09
Archive: https://hackmd.io/Vv6g2ABzTPKbe2MWBQqS1w
## Dec 18, 2024
### Attendees
- Benoît Bovy / @benbovy (cannot attend, unfortunately)
- Kai Mühlbauer / @kmuehlbauer
- Tom Nicholas / @TomNicholas
- Matt Savoie / @flamingbear
- Justus Magin / @keewis
- Eni Awowale / @eni-awowale
- Deepak Cherian / @dcherian
- Stephan Hoyer
### 60 second updates
- Benoît
- continued working on https://github.com/benbovy/xproj
- Kai
- [Relax nanosecond datetime restriction in CF time decoding](https://github.com/pydata/xarray/pull/9618)
- Now splitting out backwards compatible code parts for better review experience. Reviews appreciated on the pending PRs.
- [x] [Add "unit"-parameter to date_range, enhance iso time parser to us](https://github.com/pydata/xarray/pull/9885)
- [x] [move scalar-handling logic into possibly_convert_objects](https://github.com/pydata/xarray/pull/9900)
- [ ] [Enhance and move ISO-8601 parser to coding.times](https://github.com/pydata/xarray/pull/9899)
- [ ] [split out CFDatetimeCoder, deprecate use_cftime as kwarg ](https://github.com/pydata/xarray/pull/9901)
- [ ] [time coding refactor](https://github.com/pydata/xarray/pull/9906)
- [ ] use iso-parser when reference time out-of-bounds (needs iso-parser and time-coding refactor)
- Tom
- Blog post on DataTree collaboration
- Hoping to finish this today, and release before the end of the year
- AGU
- VirtualiZarr
- Serverless parallelization of opening files in `open_mfdataset`
- See https://github.com/zarr-developers/VirtualiZarr/pull/349#discussion_r1885979222
- Eni
- xarray.DataTree poster for AGU last week
- Working on/thinking about a good tutorial notebook for DataTree
- `DataTree.to_zarr(append_dim)` bug (https://github.com/pydata/xarray/issues/9858)
- Deepak
- working on rewriting linear interp to use indexing + averaging
- rewrote interp to use apply-ufunc
- https://github.com/pydata/xarray/pull/9881
- using shuffle for groupby binary ops
- https://github.com/pydata/xarray/pull/9896
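
One way to read "indexing + averaging" for linear interpolation (Deepak's update above): look up the two neighbouring points by index, then take a distance-weighted average of them. The sketch below is plain Python written for these notes, not the code in the linked PR; `linear_interp` and its arguments are hypothetical names, and it assumes `x` is strictly increasing with at least two points.

```python
from bisect import bisect_right


def linear_interp(x, y, new_x):
    """Linearly interpolate y(x) at each point of new_x (no extrapolation)."""
    result = []
    for xi in new_x:
        if not x[0] <= xi <= x[-1]:
            result.append(float("nan"))  # outside the data range
            continue
        # Indexing step: locate the pair of points that brackets xi.
        hi = min(bisect_right(x, xi), len(x) - 1)
        lo = hi - 1
        # Averaging step: weight the two neighbours by distance.
        w = (xi - x[lo]) / (x[hi] - x[lo])
        result.append((1 - w) * y[lo] + w * y[hi])
    return result


x = [0.0, 1.0, 2.0]
y = [10.0, 20.0, 40.0]
interpolated = linear_interp(x, y, [0.5, 1.0, 1.75])
```

Both steps are simple elementwise operations, which is presumably what makes this formulation attractive for chunked arrays.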
### Agenda
- NumFOCUS SDG idea: Xarray objects reprs (Benoît)
- https://numfocus.org/programs/small-development-grants (we should use it more)
- HTML (interactive) repr:
- Well-defined scope, well suited for NumFOCUS SDG application
- Impactful! (https://matthewrocklin.com/blog/2019/07/04/html-repr)
- Fix html repr rendering issues in sphinx-based documents (dark-mode)
- Datatree repr (https://github.com/pydata/xarray/issues/9350)
- Embed nd-array visualization (https://github.com/pydata/xarray/issues/9324)
- https://github.com/benbovy/xarray-fancy-repr (the whole Javascript ecosystem at our fingertips!)
- I'm (Benoît) happy to draft an SDG proposal. Who else is interested in contributing? Find a front-end / JavaScript / Viz expert?
- Moving CF-related codecs outside of xarray
## Dec 04, 2024
### Attendees
- Deepak Cherian / @dcherian
- Justus Magin / @keewis
- Scott Henderson / @scottyhq
- Tom Nicholas / @TomNicholas
- Nick Hodgskin / @VeckoTheGecko
### 60 second updates
- Deepak:
- played around with better vectorized interp with dask
- better idxmin, idxmax with dask
- https://github.com/pydata/xarray/pull/9800
- pushed Anderson's namedarray/backends refactor quite close
- https://github.com/pydata/xarray/pull/9273
- Scott:
- worked w/ Benoît last week to rekindle CRSIndex
- prototype https://github.com/benbovy/xproj
- sent email about possible NSF Grant, any takers?
- https://new.nsf.gov/funding/opportunities/safe-ose-safety-security-privacy-open-source-ecosystems
- Justus
- some progress on [marray](https://github.com/mdhaber/marray/)
- Nick:
- New to xarray!
- Bio
- Research Software Engineer at Utrecht University working on OceanParcels and other oceanography projects
- Looking to depend more on Xarray, and contribute upstream for my own professional development.
- Low hanging fruit (https://github.com/pydata/xarray/pull/9821, https://github.com/pydata/xarray/pull/9840). Still need feedback on 9821
- Tom
- Mostly VirtualiZarr stuff for AGU
- Small xarray things
- Writing blog post announcing `xarray.DataTree`
- Ideas https://github.com/xarray-contrib/xarray.dev/issues/708
- Stephan
- Writing yet-another implementation of labeled arrays on top of JAX: https://github.com/neuralgcm/neuralgcm/tree/main/neuralgcm/experimental/coordax
### Agenda
- NSF security grant
- anything to include in datatree blog announcement?
- Want to include thoughts about collaboration with NASA
- Including the in-kind dev time contributions that ESDIS made
- Ideal in the sense of literally zero overhead
- Also, a core dev spending 10% of their time directing someone with more time is an efficient use of relative expertise
- Less ideal that Tom/Justus/Stephan didn't get paid for the work
- In future better to have one of the paid people at the contributing org already be a core dev
- pushed Anderson's namedarray/backends refactor quite close. ready for prelim review.
- https://github.com/pydata/xarray/pull/9273
## Nov 20, 2024
### Attendees
- Matt Savoie / @flamingbear
- Deepak Cherian / @dcherian
- Justus Magin / @keewis
- Stephan Hoyer
- Eni Awowale / @eni-awowale
- Kai Mühlbauer / @kmuehlbauer
- Tom Nicholas / @TomNicholas
- Alfonso Ladino / @aladinor
### 60 second updates
- Matt: mostly just watching the repo for datatree issues and using it constantly in my day to day.
- Deepak:
- lots of dask stuff
- zarr v3 compatibility
- icechunk distributed writes
- Justus:
- rewrite of the min-deps check script
- creation of a separate github action
- Kai:
- datetime64 decoding (non nanosecond relaxation, https://github.com/pydata/xarray/pull/9618)
- Tom:
- Not much direct xarray stuff
- Eni:
- Testing xarray.DataTree internally; ran into some issues with numpy 2.0 :-/
- Working on DataTree poster for AGU, will share accordingly with folks!
### Agenda
- fsspec utility PR: https://github.com/pydata/xarray/pull/9797
- icechunk & to_zarr
- https://github.com/earth-mover/icechunk/issues/383
- add the notion of closing a store?
- maybe make the store readable after pickle?
- upstream issue: https://github.com/earth-mover/icechunk/issues/185
- duck array / array api PR: https://github.com/pydata/xarray/pull/9798
- ImportError / ValueError when chunkmanagers not installed?
- https://github.com/pydata/xarray/pull/9676
- Fine to maintain explicit list of "expected" chunkmanagers
- This would help us improve error messages by pointing to packages like cubed-xarray
- Separate question of whether or not the entire entrypoint system was overkill
- But we can punt on that for later
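
The "explicit list of expected chunkmanagers" idea above could look roughly like the sketch below. Everything here is hypothetical and written for these notes: `EXPECTED_CHUNKMANAGERS`, `get_chunkmanager_or_explain`, and the `available` dict (standing in for whatever entrypoint discovery found) are not xarray API. The choice between `ImportError` and `ValueError` mirrors the open question in the agenda item.

```python
# Hypothetical mapping: chunk manager name -> package a user would install.
EXPECTED_CHUNKMANAGERS = {
    "dask": "dask",
    "cubed": "cubed-xarray",
}


def get_chunkmanager_or_explain(name, available):
    """Return the manager registered under `name`, or raise a helpful error."""
    if name in available:
        return available[name]
    if name in EXPECTED_CHUNKMANAGERS:
        # Known manager that is simply not installed: say what to install.
        raise ImportError(
            f"chunk manager {name!r} is not available; "
            f"try installing the {EXPECTED_CHUNKMANAGERS[name]!r} package"
        )
    # Unknown name: most likely a typo, so list what was actually found.
    raise ValueError(
        f"unrecognized chunk manager {name!r}; available: {sorted(available)}"
    )
```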
## Nov 06, 2024
### Attendees
- Tom Nicholas / @TomNicholas
- Deepak Cherian / @dcherian
- Kai Mühlbauer / @kmuehlbauer
- Owen Littlejohns / @owenlittlejohns
- Matt Savoie / @flamingbear
- Justus Magin / @keewis
- Stephan Hoyer / @shoyer
- Eni Awowale / @eni-awowale
### 60 second updates
- Tom
- Was ill
- Now crying about election
- Working on virtualizarr
- Datatree release seems to have gone okay?
- Deepak
- shuffle:
- Groupby.shuffle: https://github.com/pydata/xarray/pull/9320
- GroupBy.map(..., shuffle=True) https://github.com/pydata/xarray/pull/9706
- Joe
- zarr3
- zarr3+xarray concurrency
- Kai:
- non-nanosecond time decoding (https://github.com/pydata/xarray/pull/9618)
- old issue treatment
- Justus:
- astropy / numpy subclasses: https://github.com/pydata/xarray/pull/9705
### Agenda
- fsspec by default in all backends:
- https://github.com/pydata/xarray/issues/9723
- perhaps with only basic fsspec functionality
- will ask to open PR.
- PRs needing review:
- Shuffle: https://github.com/pydata/xarray/pull/9320
- GroupBy.shuffle() -> GroupBy
- GroupBy.sort() -> Dataset
- Dataset.shuffle_by(Groupers) -> Dataset
- GroupBy.map(.., shuffle=True) -> uses shuffle + map_blocks (useful for quantile)
- Pandas extensionarray: https://github.com/pydata/xarray/pull/9671
- kai + someone else
- unlock setup-micromamba https://github.com/pydata/xarray/pull/9732
## Oct 23, 2024
### Attendees
- Justus Magin / @keewis
- Joe Hamman / @jhamman
- Tom Nicholas / @TomNicholas
- Eni Awowale / @eni-awowale
- Deepak Cherian / @dcherian
### 60 second updates
- Stephan:
- Lots of DataTree refinements
- Added xarray.group_subtrees(): https://github.com/pydata/xarray/pull/9636
- Justus:
- open_datatree + chunks
- missing value support for numpy (marray / dtypes)
- Tom
- Reviewing Stephan's DataTree PRs
- Some small DataTree PRs myself, including updating the HTML repr to match new inheritance model
- Otherwise mostly VirtualiZarr stuff
- Joe
- Zarr v3
- Icechunk
- Deepak
- just back from vacation.
### Agenda
- Release?
- What to do with `xarray-contrib/datatree`?
- Yank it from PyPI?
- No - instead release one more time with a warning on import
- Maybe yank in future...
- link to the migration guide in the readme
- retire the old datatree repository
- Tom volunteered to do the release
- DataTree stuff to finish up?
- Support chunks in open_datatree()
- compute, load, chunk, persist?
- Re-write coordinates in each group when writing to Zarr?
- Zarr V3 PR
- stops interpreting Zarr `.fill_value` as CF `_FillValue` only for new V3 stores
- are we affected by the RTD add-ons deprecation?
- https://about.readthedocs.com/blog/2024/07/addons-by-default/#how-does-it-affect-my-projects
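
For the "release one more time with a warning on import" item above, a minimal sketch of what the final release of the old package might do. The message wording and the `warn_on_import` helper are made up for illustration; in a real package the call would sit at the top level of `__init__.py`.

```python
import warnings

# Hypothetical wording; the real message would link to the migration guide.
MIGRATION_MESSAGE = (
    "The standalone `datatree` package has been merged into xarray; "
    "use `xarray.DataTree` instead and see the migration guide in the README."
)


def warn_on_import():
    # FutureWarning is shown to end users by default, whereas
    # DeprecationWarning is hidden outside of __main__ and test runners.
    warnings.warn(MIGRATION_MESSAGE, FutureWarning, stacklevel=2)
```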
## Oct 9, 2024
### Attendees
- Tom Nicholas / @TomNicholas
- Justus Magin / @keewis
- Joe Hamman / @jhamman
- Owen Littlejohns / @owenlittlejohns
- Spencer Clark / @spencerkclark
- Mathias Hauser / @mathause
### 60 second updates
- Tom
- Reviewing Stephan's datatree PRs around coordinate inheritance
- Migration guide for users of old datatree repo https://github.com/pydata/xarray/pull/9598
- Justus
- PR to avoid truncating fixed-width strings: https://github.com/pydata/xarray/pull/9586
- Joe
- Xarray <-> Zarr-python V3 integration is close but not in `main`
- Mathias
- Issue with reducing non-numeric scalars
- Spencer
- Just wanted to thank Kai for looking at datetime precision issue
- Owen
- Planning to review Tom's PR on datatree alignment docs
### Agenda
- Zarr-python v3 status update
- Consolidated metadata is on by default in xarray but not part of v3 spec
- But Tom A has made that work on a branch
- FillValue issues
- https://github.com/pydata/xarray/issues/5475
- Strings
- Added a variable-length string codec in zarr
- working branches
```
pip install git+https://github.com/TomAugspurger/zarr-python@xarray-compat git+https://github.com/TomAugspurger/xarray/@fix/zarr-v3 git+https://github.com/jhamman/dask@fix/zarr-array-construction-2
```
## Sep 25, 2024
### Attendees
- Kai Mühlbauer / @kmuehlbauer
- Justus Magin / @keewis
- Deepak Cherian / @dcherian
- Matt Savoie / @flamingbear
- Tom Nicholas / @TomNicholas
- Eni Awowale / @eni-awowale
- Paul Ockenfuß / @Ockenfuss
- Spencer Clark / @spencerkclark
### 60 second updates
- Kai
- [ERAD2024](https://openradarscience.org/erad2024/) short course on open source software for weather radar processing
- h5netcdf new release soon with additional capabilities
- preparing xarray for that change
- Justus
- nested duck array introspection issue: https://github.com/data-apis/array-api/discussions/843#discussioncomment-10714668
- feedback: instead of a new protocol, get nested namespaces
- swapping the order of preference to `__array_namespace__` over `__array_function__`
- https://github.com/pydata/xarray/pull/9530
- Deepak
- groupby things (chunked array, shuffle)
- Stephan
- DataTree inheritance issues, related to https://github.com/pydata/xarray/issues/9475
- Tom
- DataTree inheritance model discussions
- Wrote some documentation on the new data model
- https://xray--9501.org.readthedocs.build/en/9501/user-guide/hierarchical-data.html#alignment-and-coordinate-inheritance
- Would be a good thing for others to take a look at
- But bear in mind it is affected by the unsolved issue https://github.com/pydata/xarray/issues/9475
- Spencer
- Addressing various issues arising from changes made to enable lazy encoding of chunked arrays of datetimes. https://github.com/pydata/xarray/pull/9498 should hopefully more robustly address most of these.
- See discussion here for more background: https://github.com/pydata/xarray/issues/9488#issuecomment-2351149546.
- Defer cast to different dtype to its usual place in the encoding pipeline.
- More safely allow different default choice of datetime64[ns] encoding units: https://github.com/pydata/xarray/issues/9154.
### Agenda
1. Need decision on xarray, xarray-core on conda-forge
- https://github.com/conda-forge/xarray-feedstock/pull/113#issuecomment-2265819231
- Decision: xarray, xarray-core on conda; xarray & xarray[recommended] on PyPI?
- 6 month deprecation period.
2. PRs needing review:
- GroupBy(chunked array) : https://github.com/pydata/xarray/pull/9522
- netcdf4/h5netcdf: complex numbers and enums https://github.com/pydata/xarray/pull/9509
- API naming for improved gap filling. See summary of open questions [here](https://github.com/pydata/xarray/pull/9402#issuecomment-2341844048) and [here](https://github.com/pydata/xarray/pull/9402#issuecomment-2344171177)
3. Grouped Shuffle
- general issue: https://github.com/pydata/xarray/issues/9546
- PR: https://github.com/pydata/xarray/pull/9320
4. DataTree inheritance issue, related to https://github.com/pydata/xarray/issues/9475
- separate discussion meeting? (Tom: Yes, we could also just stay on the call after? Stephan: unfortunately I cannot today)
## Sep 11th, 2024
### Attendees
- Matt Savoie / @flamingbear
- Tom Nicholas / @TomNicholas
- Owen Littlejohns / @owenlittlejohns
- Eni Awowale / @eni-awowale
- Stephan Hoyer / @shoyer
### 60 second updates
- Tom
- Lots of DataTree stuff
- We are very close to releasing!
- Matt
- (datatree) keeping up with main changes in docs. Need to fix current doc errors.
### Agenda
- Release
- DataTree q's (maybe answered now)
- Hashable vs str
- https://github.com/pydata/xarray/issues/8836#issuecomment-2341963401
- `DataTree.subtree.<method>` namespace?
- https://github.com/pydata/xarray/issues/9472#issuecomment-2341590576
- Avoid duplicated variables by design
- Eni's `open_groups` typing Q
## Aug 28, 2024
### Attendees
- Justus Magin / @keewis
- Matt Savoie / @flamingbear
- Tom Nicholas / @TomNicholas
- Deepak Cherian / @dcherian
- Daniel Kaufman / @danielfromearth
### 60 second updates
- Justus: sprint on xarray + duckarrays testing framework with Tom (https://github.com/xarray-contrib/xarray-array-testing/)
- Tom:
- duckarrays testing
- using `conventions.decode_cf_variables` without decoding actual values
- https://github.com/zarr-developers/VirtualiZarr/pull/224
- going to NumFOCUS summit next week
- Deepak:
- groupby multiple arrays
- providing input to dask things; shuffling, blockwise reshape, auto rechunking, reshaping;
- https://github.com/dask/dask/pull/11350/files
- Matt: Nada
### Agenda
- Anyone else want to come and represent Xarray at the NumFOCUS summit next week in Boston?
- Eni?
- decode_cf
- PRs needing review:
- speed up docs build: https://github.com/pydata/xarray/pull/9395
- Shuffling API:
- https://github.com/pydata/xarray/pull/9320
- `Dataset.shuffle_by() -> Dataset`
- `DatasetGroupBy.shuffle() -> DatasetGroupBy`.
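
The notes only list signatures for the shuffling API. As a rough illustration, assuming "shuffle" here means reordering the data so that all members of a group sit next to each other (and hence can land in the same chunk), the core step is a stable sort by group label. This is plain Python with hypothetical names, not the implementation in the PR.

```python
def shuffle_by(values, labels):
    """Reorder `values` so that members of each group are adjacent.

    Returns the reordered values and, per label, the slice its group occupies.
    """
    # A stable sort on the labels keeps the original order within each group.
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    shuffled = [values[i] for i in order]

    slices = {}
    start = 0
    for pos in range(1, len(order) + 1):
        # Close the current group at the end, or when the label changes.
        if pos == len(order) or labels[order[pos]] != labels[order[start]]:
            slices[labels[order[start]]] = slice(start, pos)
            start = pos
    return shuffled, slices


values = [1, 2, 3, 4, 5, 6]
labels = ["b", "a", "b", "c", "a", "b"]
shuffled, slices = shuffle_by(values, labels)
```

After the shuffle, a per-group operation such as a quantile only needs one contiguous slice of the data per group.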
## Aug 14, 2024
### Attendees
- Justus Magin / @keewis
- Owen Littlejohns / @owenlittlejohns
- Eni Awowale / @eni-awowale
- Tom Nicholas / @TomNicholas
- Deepak Cherian / @dcherian
### 60 second updates
- Moving old datatree issues
- Tom
- Working on chunkmanager PR
- Deepak
- groupby shuffle
- https://github.com/pydata/xarray/pull/9320
- engaging with dask
- https://github.com/dask/dask/issues/11314
- https://github.com/dask/dask/pull/11303
- https://github.com/dask/dask/pull/11273
- Justus
- merged the python 3.9 dropping PR
### Agenda
- VirtualiZarr ideas?
- Eni's open_groups PR
- https://github.com/pydata/xarray/pull/9243
## July 31, 2024
### Attendees
- Matt Savoie / @flamingbear
- Tom Nicholas / @TomNicholas
- Justus Magin / @keewis
- Owen Littlejohns / @owenlittlejohns
### 60 second updates
- Matt
- is trying to wrap my head around copying trees. [#9285](https://github.com/pydata/xarray/issues/9285) should not be as hard as I'm making it.
- Tom
- Trying to coordinate to push datatree over the finish (well first release) line
- Fixes for a bunch of small datatree bugs
- PR for allowing chunked arrays that aren't dask/cubed through xarray
- https://github.com/pydata/xarray/pull/9286
- rename chunkmanagers vs "ComputeManagers"
- Justus:
- released 2024.07.0 yesterday (new script to extract contributors from git commits)
- pint-xarray: accessor entrypoints / PintIndex
### Agenda
- DataTree should avoid any in-place modification
- Auto-copy on setting parent?
- Remove the ability to assign .parent entirely?
- Need to keep .parent accessible in order to walk up through tree
- Who is submitting to AGU today?
- (Tom is, on VirtualiZarr)
- (Ryan is)
- Owen is
- Stephan maybe
- ChunkManager vs ComputeManager https://github.com/pydata/xarray/pull/9286
- Justus tell us about the PintIndex (postponed to next time)
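
The DataTree item above (auto-copy on setting parent, no assignable `.parent`, but `.parent` still readable for walking up the tree) can be sketched with a toy node class. `Node` and its methods are hypothetical names for illustration, not xarray's `DataTree`.

```python
class Node:
    """Toy tree node that never modifies the objects passed to it."""

    def __init__(self, name, children=None):
        self.name = name
        self._parent = None
        self._children = {}
        for child in children or []:
            self.attach(child)

    @property
    def parent(self):
        # Read-only: there is deliberately no setter.
        return self._parent

    @property
    def children(self):
        return dict(self._children)

    def _copy_subtree(self):
        new = Node(self.name)
        for child in self._children.values():
            new.attach(child)
        return new

    def attach(self, child):
        """Attach a copy of `child`; the caller's object is left untouched."""
        new = child._copy_subtree()
        new._parent = self
        self._children[new.name] = new
        return new

    @property
    def path(self):
        # Walking up through .parent is why it has to stay accessible.
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


leaf = Node("leaf")
root = Node("root", children=[Node("group", children=[leaf])])
attached_leaf = root.children["group"].children["leaf"]
```

Here `leaf` still has no parent after the tree is built, while `attached_leaf` is a separate object that knows its full path.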
## July 17, 2024
Cancelled -- only Stephan Hoyer and Justus Magin showed up.
## July 3, 2024
### Attendees
- Justus Magin / @keewis
- Tom Nicholas / @TomNicholas
- Matt Savoie / @flamingbear
- Joe Hamman / @jhamman
### 60 second updates
- Tom
- Reviewed datatree coordinate inheritance PR properly (https://github.com/pydata/xarray/pull/9063)
- Now unblocked for releasing datatree in `main`
- Mostly actually wrote code for virtualizarr
- Justus
- lots of fixes for numpy2 (for the dependencies we couldn't test before)
- https://github.com/pydata/xarray/pull/9136 should be ready for merging
- other bug fixes (hypothesis test for datetime ExtensionArrays, arrays as attributes)
- nested duck arrays: finding `cupy` underneath arbitrary layers (especially dask)
- Matt
- Watching [#9063](https://github.com/pydata/xarray/pull/9063) get merged.
- Joe
- Just working on zarr-python
### Agenda
- Codecs separate from xarray?
- Keeps coming up in virtualizarr
- https://github.com/zarr-developers/VirtualiZarr/issues/68#issuecomment-2197682388
- Can we get zarr-python to open a netCDF file by using chunk manifests + defining enough new codecs?
- One difference is that zarr codecs take arrays -> arrays but xarray decoding takes Variables -> Variables
- action: open a new issue to consolidate the discussion
- Interesting question of subclassing xarray.Dataset in virtualizarr
- https://github.com/zarr-developers/VirtualiZarr/issues/171
- Cupy + dask
- https://github.com/pydata/xarray/issues/7721 (discussion of the issue)
- https://github.com/keewis/nested-duck-arrays
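
To make the "arrays -> arrays vs Variables -> Variables" difference above concrete, here is a toy comparison using CF-style `scale_factor` / `add_offset` attributes. The `Variable` dataclass is a stand-in written for these notes, not `xarray.Variable`, and both function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """Toy stand-in for a labelled variable: values plus metadata."""

    dims: tuple
    data: list
    attrs: dict = field(default_factory=dict)


def scale_offset_codec(data, scale, offset):
    """Array -> array: the parameters must be supplied from outside."""
    return [value * scale + offset for value in data]


def decode_cf_scale_offset(var):
    """Variable -> Variable: the parameters are read from the attrs."""
    attrs = dict(var.attrs)
    # Consumed attributes are dropped here; xarray instead keeps them in
    # the variable's encoding, which this sketch leaves out.
    scale = attrs.pop("scale_factor", 1.0)
    offset = attrs.pop("add_offset", 0.0)
    return Variable(var.dims, scale_offset_codec(var.data, scale, offset), attrs)


raw = Variable(
    ("x",), [0, 10, 20], {"scale_factor": 0.5, "add_offset": 1.0, "units": "K"}
)
decoded = decode_cf_scale_offset(raw)
```

The array-level codec has no access to the attributes, so expressing CF decoding as codecs would need that metadata passed in as codec configuration.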
## June 19, 2024
### Attendees
- Matt Savoie / @flamingbear
- Justus Magin / @keewis
- Tom Nicholas / @TomNicholas
- Stephan Hoyer / @shoyer
- Owen Littlejohns / @owenlittlejohns
- Eni Awowale / @eni-awowale
- David Auty / @autydp
- From NASA EED-3, knows Matt and Owen
### 60 second updates
- Matt
- Hope to continue datatree inheritance discussion.
- Justus: numpy2-compatible release last week
### Agenda
- DataTree coordinate inheritance question
- Release timeline
- Can we release by the time of Eni and Tom's SciPy talk about DataTree? (~July 10th)
- How much feedback from community do we need?
- Stephan: Got plenty already
- David: Has "quirky" data at NASA
- Would probably prefer more lenient data model
- Stephan: Prefer not to have "fallback mechanisms"
- David: Wants to use datatree for analysis, ideally changing the structure as little as possible
- Tom: What do we think about this `open_as_dict_of_datasets` idea? Would that help?
- Tom: Solves problem of interrogating data / displaying groups
- Stephan: Makes some sense - analogous to how `open_mfdataset` works for 90% of cases
- As if we had made a `open_mf_as_grid_of_datasets` function to create an interrogatable intermediate structure
- Stephan: Function to write a messy dataset too? (lower priority)
- Matt: In favour
- David: Can you open just a subtree of a file? Tom: Yes if we add a group kwarg to `open_datatree`
- Eni: It would be useful if, when `open_datatree` fails on alignment, it gave a very clear report of what should be fixed
- Justus: `preprocess` arg could be useful for "massaging"
- Tom: Could use python's new Exception Groups feature for showing all errors at once
- Stephan: Should also think about saving out a "crooked datatree"
- Consensus?!
- Plan going forward
- Everyone who is interested look in detail at Stephan's PR (https://github.com/pydata/xarray/pull/9063)
- Likely to spawn smaller issues / PRs about reprs and so on
- Need separate PR for `open_as_dict_of_datasets` (or `open_datatree_as_dict`?) (Tom can raise issue for this)
- Orthogonal to Stephan's PR
- Tutorial for tidying up a messy nested netCDF file into a nice sane aligned DataTree (similar to the "Tidy Xarray" idea)
## June 5, 2024
### Attendees
- Justus Magin / @keewis
- Tom Nicholas / @TomNicholas
- Kai Mühlbauer / @kmuehlbauer
- Stephan Hoyer / @shoyer
- Joe Hamman / @jhamman
- Deepak Cherian / @dcherian
- Matt Savoie / @flamingbear
- Mathias Hauser / @mathause
### 60 second updates
- Justus:
- numpy 2: more progress, still not done (lots of edge cases): https://github.com/pydata/xarray/pull/8946
- Joe
- still heads down on zarr 3, alpha release coming this week
- I will be at the CZI annual meeting next week, kicking off our latest grant
- We're hiring (https://jobs.gusto.com/postings/earthmover-xarray-community-developer-498dca94-335e-4d5c-a6c7-83ca19772512)
- Tom:
- Owen, Matt, and Eni accepted our invitation to join the core team!
- Mostly just this discussion about datatree coordinate inheritance behaviour (https://github.com/pydata/xarray/issues/9056)
- Kai:
- a bit of issue clearance
- jumped now on the open_datatree-stuff
- Deepak:
- job posting: https://jobs.gusto.com/postings/earthmover-xarray-community-developer-498dca94-335e-4d5c-a6c7-83ca19772512
- @deepak (https://discourse.pangeo.io/t/potential-for-adapting-pythia-foundations-for-different-disciplines-e-g-neuro/4239/3?u=tomnicholas)
- user survey: https://docs.google.com/forms/u/2/d/1x9bOIelnUsDMyI1tF4bN7TWK0v4nBDiwhpxh9mi6PaI/edit
- last call for comments.
- Stephan:
- DataTree inheritance model: https://github.com/pydata/xarray/pull/9063
### Agenda
- Numpy 2: dtype casting in where
- separate meeting to discuss in detail
- Miscellaneous PRs
- https://github.com/pydata/xarray/pull/5704
## May 22nd, 2024
### Attendees
- Matt / @flamingbear
- Justus / @keewis
- Tom Nicholas / @TomNicholas
- Mathias Hauser / @mathause
### 60 second updates
- Matt
- no update
- Justus
- numpy 2: array api fixes (ready for a final review!) https://github.com/pydata/xarray/pull/8854
- numpy 2: where dtype casting. Stephan helped me figure out a clean way to implement this, but I haven't had time to do it yet.
- Tom
- No real updates on xarray itself
- Q about ChunkManager https://github.com/pydata/xarray/issues/8733#issuecomment-2111146588
- Joe
- Still cranking on zarr v3
- Deepak is heads down this week
- Mathias
- no update
### Agenda
## May 8th, 2024
### Attendees
- Deepak
- Justus
- Matt
- Tom
- Mathias
- Ryan
### 60 second updates
- Deepak
- optimizing zarr region writes / appends
- iterating on Xarray User Survey
- Justus
- more numpy 2 compat... we're now down to failures with just the array api and the casting changes due to NEP 50
- Tom
- Trying to start some deprecation cycles
- PR to concat without creating indexes
- Ryan
- Trying to nerd-snipe someone into making some useful indexes
- https://github.com/pydata/xarray/discussions/8955#discussioncomment-9226372
- https://discourse.pangeo.io/t/example-which-highlights-the-limitations-of-netcdf-style-coordinates-for-large-geospatial-rasters/4140/26
- Tom: even simpler case - no indexes!
- PR for concat is all that is needed
- Immediate next case is pandas index that is disconnected from variable data
- https://github.com/TomNicholas/VirtualiZarr/issues/18#issuecomment-2025423042
- Mathias
- no news
- Matt
- Also no update
### Agenda
- NamedArray update
- Stalled, Anderson has run out of time
- 80% of the way there for decoupling the backends from lazy indexing.
- Action item: make a todo list for what's left and needed.
- PRs to review/merge
- Tom: Please someone merge this indexes PR https://github.com/pydata/xarray/pull/8872
- I can't release v0.1 of VirtualiZarr until it's in xarray main...
- Deepak : https://github.com/pydata/xarray/pull/8998
- Numpy 2 compat:
- should we switch casting behavior to NEP 50? https://github.com/pydata/xarray/pull/8946
- https://numpy.org/neps/nep-0050-scalar-promotion.html
- array api is close: https://github.com/pydata/xarray/pull/8854
- Release plan:
- first, a release that doesn't fully support numpy 2 yet
- release one version with numpy 2 and py3.9
- then drop python 3.9
## April 24th, 2024
### Attendees
- Justus Magin / @keewis
- Matt Savoie / @flamingbear
- Kai Mühlbauer / @kmuehlbauer
- Tom Nicholas / @TomNicholas
- Joe Hamman / @jhamman
- Owen Littlejohns / @owenlittlejohns
- Deepak Cherian / @dcherian
- Stephan
### 60 second updates
- Justus: upstream-dev CI / numpy 2 compat
- Tom:
- On a train, probably can't call in
- Looking at changing backends.NetCDFDataStore to only open file once when reading many groups
- Kai are you or others planning to take this on?
- I want to change internal invariants to stop checking for default pandas indexes
- https://github.com/pydata/xarray/pull/8960#discussion_r1573306634
- @deepak what option are you referring to? I don't see a kwarg to `assert_equal`... https://github.com/pydata/xarray/blob/b0036749542145794244dee4c4869f3750ff2dee/xarray/testing/assertions.py#L88-L120
- https://github.com/pydata/xarray/blob/b0036749542145794244dee4c4869f3750ff2dee/xarray/testing/assertions.py#L385-L387
- +We should plumb it through.+ Wrong: look here: https://github.com/pydata/xarray/blob/b0036749542145794244dee4c4869f3750ff2dee/xarray/tests/__init__.py#L286
- (we don't use the public API directly in tests)
- Cool.
- This might be a good time to write an "Assertions" section into the [docs page on testing](https://docs.xarray.dev/en/stable/user-guide/testing.html#testing-your-code)
- Kai: not much, considering helping with datatree backend stuff together with @aladinor and @mgrover1, need to check which way to go (from the xarray side, or from the external backend side)
- Matt: also not much recently. Always datatree. listening.
- Owen: Continued datatree migration, current PR open: https://github.com/pydata/xarray/pull/8967
- Joe:
- Foo proposal was funded! Deepak and I are hoping to hire a near-full time dev with bio experience to come work with us
- Going to try zarr-python-3 in xarray next week.
- Deepak - not much, pushed on public grouper api
### Agenda
- Break behaviour of dataset constructor?
- https://github.com/pydata/xarray/issues/8959
- `ds = xr.Dataset(data_vars={'x': ('x', [0])})`
- promotes to coordinate
- Start with `PendingDeprecationWarning`
- Add a separate more explicit construction method/kwarg? Or use the new behavior in case a `Coordinates` object is passed
- numpy 2:
- array api: https://github.com/pydata/xarray/pull/8854
- main concerns: dispatching between numpy issubdtype and arrayapi isdtype <- this is kinda hairy
- stephan will take a look
- copy parameter to `__array__` (typing, mostly): https://github.com/pydata/xarray/pull/8939/files
- dtype casting rules: https://github.com/pydata/xarray/pull/8946#issuecomment-2068949796
- general rule: determine dtype without python scalars (which are "weak dtypes" in jax), then cast python scalars to array using that dtype. If that doesn't work, either raise or determine a fallback
- implementation `as_shared_dtype`
- `concatenate` and `stack` shouldn't really allow python scalars (?)
- may be specific to `where`, in which case that code could also go there
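
The "general rule" for dtype casting above can be sketched with NumPy as follows. This is an illustration written for these notes, not xarray's `as_shared_dtype`: the function name is hypothetical, and raising `TypeError` when the cast fails is just one of the two options mentioned (raise, or determine a fallback).

```python
import numpy as np

PYTHON_SCALAR_TYPES = (bool, int, float, complex)


def as_shared_dtype_sketch(values):
    """Cast all values to one dtype chosen from the non-scalar inputs only."""
    # Step 1: ignore Python scalars ("weak" dtypes) when picking the dtype.
    arrays = [v for v in values if type(v) not in PYTHON_SCALAR_TYPES]
    if not arrays:
        raise TypeError("need at least one array to determine the dtype")
    dtype = np.result_type(*arrays)

    # Step 2: cast everything, Python scalars included, to that dtype.
    out = []
    for value in values:
        try:
            out.append(np.asarray(value, dtype=dtype))
        except (OverflowError, TypeError, ValueError) as err:
            raise TypeError(f"cannot represent {value!r} as {dtype}") from err
    return out


data = np.array([1.0, 2.0], dtype=np.float32)
data_cast, fill = as_shared_dtype_sketch([data, float("nan")])
masked = np.where(np.array([True, False]), data_cast, fill)
```

With this rule a `float32` array combined with a Python `nan` stays `float32`, which is the result the failing `test_where_type_promotion` expects.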
## April 10, 2024
### Attendees
- Matt Savoie / @flamingbear
- Kai Mühlbauer / @kmuehlbauer
- Tom Nicholas / @TomNicholas
- Justus Magin / @keewis
- Deepak Cherian / @dcherian
- Owen Littlejohns / @owenlittlejohns
### 60 second updates
- Matt : Good meeting for Datatree yesterday. [PR](https://github.com/flamingbear/xarray/pull/11) to [existing PR](https://github.com/pydata/xarray/pull/8879) for simplifying iterators is ready. Owen will ping Tom later today when he merges.
- Justus:
- "source" encoding from `fsspec` objects
- h5netcdf + character sets
- Tom
- Mostly thinking about the virtualizarr stuff (i.e. not propagating xarray indexes and dealing with encoding)
- Chance of me being able to think about datatree inheritance has gone up since NCAR machines are all down...
- Kai: not much xarray related (beside some h5netcdf char encoding ;-)
- Owen: [Open PR for iterators.py](https://github.com/pydata/xarray/pull/8879) - Will update based on recent feedback.
- Deepak :
- merged in stateful tests (https://github.com/pydata/xarray/pull/8658)
- Explanation of hypothesis testing strategies https://docs.xarray.dev/en/stable/user-guide/testing.html#hypothesis-testing
### Agenda
- upstream tests:
- https://github.com/pydata/xarray/issues/8844
- string dtypes (needs volunteer)
- array API tests
- https://github.com/pydata/xarray/pull/8854
```
xarray/tests/test_duck_array_ops.py::TestOps::test_where_type_promotion: AssertionError: assert dtype('float64') == <class 'numpy.float32'>
+ where dtype('float64') = array([ 1., nan]).dtype
+ and <class 'numpy.float32'> = np.float32
xarray/tests/test_duck_array_ops.py::TestDaskOps::test_where_type_promotion: AssertionError: assert dtype('float64') == <class 'numpy.float32'>
+ where dtype('float64') = array([ 1., nan]).dtype
+ and <class 'numpy.float32'> = np.float32
```
- encoding and virtualizarr
- https://github.com/TomNicholas/VirtualiZarr/issues/68
- https://github.com/fsspec/kerchunk/blob/a0c4f3b828d37f6d07995925b324595af68c4a19/docs/source/tutorial.rst
## March 27, 2024
### Attendees
- Deepak Cherian
- Alex Ford / @asford
- Tom Nicholas / @TomNicholas
- Matt Savoie / @flamingbear
- Stephan Hoyer
### 60 second updates
- Deepak : upstream-dev fixes
- Tom:
- Datatree meetings
- Xarray without indexes
- With a few un-merged PRs I can actually xr.concat Datasets without indexes at all
- See [example in VirtualiZarr](https://github.com/TomNicholas/VirtualiZarr/blob/main/docs/usage.md#manual-concatenation-ordering)
- Alex F
- First time attending.
- Question on possible wrapping of torch-tensors in xarray
- We have working internal fork, interested in upstreaming
- Matt
- good meeting yesterday with agreement to move forward faster not smarter. Basically move most code without improvements and identify places we want to work later.
- Stephan
- Benoit might be working on indexes again in a couple of months, funding from NASA grant at UW.
### Agenda
- torch inside xarray
- Relevant issues
- https://github.com/pydata/xarray/issues/3232
- https://github.com/data-apis/array-api-compat
- as a comparison point: [JAX-Xarray](https://github.com/google-deepmind/graphcast/blob/main/graphcast/xarray_jax.py)
- https://github.com/pytorch/pytorch/issues/58743
- Why xarray?
- "problem of dimension tracking", sequence information, hypercubes, align into a canonical coordinate frame, write gradient-aware calculations inside that coordinate system
- like named dims, tried NamedTensor, switched to Xarray, have many coordinate variables
- Pain points?
- pytorch isn't compliant with array API standard
- can be mostly solved using the array-api-compat shim library
- non-numpy dtypes
- Not really covered in the array API standard
- Might need special-casing within xarray