New Xarray meeting notes

https://us02web.zoom.us/j/87503265754?pwd=cEFJMzFqdTFaS3BMdkx4UkNZRk1QZz09

Archive: https://hackmd.io/Vv6g2ABzTPKbe2MWBQqS1w

Mar 12, 2025

Attendees

  • Eni Awowale / @eni-awowale
  • Justus Magin / @keewis
  • Julia Signell / @jsignell
  • Alfonso Ladino / @aladinor
  • Davis Bennett / @d-v-b

60-second updates

Agenda

Feb 26, 2025

Attendees

  • Justus Magin / @keewis
  • Alfonso Ladino / @aladinor
  • Matt Savoie / @flamingbear
  • Tom Nicholas / @TomNicholas
  • Eni Awowale / @eni-awowale
  • Ian Hunt-Isaak / @ianhi
  • Joe Hamman / @jhamman
  • Davis Bennett / @d-v-b

60-second updates

Agenda

Feb 12, 2025

Attendees

  • Deepak Cherian / @dcherian
  • Justus Magin / @keewis
  • Alfonso Ladino / @aladinor
  • Eni Awowale / @eni-awowale
  • Tom Nicholas / @TomNicholas
  • Max Jones / @maxrjones
  • Ian Hunt-Isaak / @ianhi
  • Stephan

Updates

Agenda

  • SciPy proposals?
    • Email same list of people who ran tutorial last time
  • Pandas extension arrays in Xarray
    • bug about how we've stopped eagerly converting to numpy: https://github.com/pydata/xarray/issues/9742
    • more extension array support: https://github.com/pydata/xarray/pull/9671
    • Possible solutions:
      1. Preserve them as pandas extension types but a lot of operations break
        • nice for categoricalarray, intervalarray, datetimearray with timezone
        • add support for N-d data into PandasExtensionWrapper, by adding a .shape attribute
      2. Convert into corresponding NumPy dtypes but this is lossy
        • option to control which dtypes are converted
      3. Wrap them in masked duck arrays, using marray
        • increased memory usage (additional bool mask)
        • somewhat surprising?
      4. Somehow make it easier to write custom dtypes in NumPy
  • Array conversion methods (e.g., https://github.com/pydata/xarray/pull/9823)
    ds.as_array_type(cp.asarray)
    ds.as_array_type(jnp.from_dlpack)
    ds.as_array_type(jnp.asarray, device=jax.devices("gpu")[0])
    ds.as_array_type(pint.Quantity, units="m/s")
    ds.is_array_type(cp.ndarray)  # -> True
    
    • Could also map over all array nodes ala JAX-Xarray?
      • jax.tree.map(cp.asarray, ds) or isinstance(array.data, cp.ndarray)
    • Conceptually this is a typing issue: could Dataset be a generic array mapping over a value type?
      • Kind of like dict / TypedDict
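The typing idea above can be sketched with the standard library alone. This is a hypothetical illustration, not xarray's API: `TypedDataset` is an invented class, its `as_array_type` and `is_array_type` methods only borrow the names proposed in the discussion, and plain lists and tuples stand in for NumPy, CuPy, or JAX arrays.

```python
from __future__ import annotations

from collections.abc import Callable, Iterator, Mapping
from typing import Generic, TypeVar

ArrayT = TypeVar("ArrayT")
NewArrayT = TypeVar("NewArrayT")


class TypedDataset(Generic[ArrayT]):
    """Hypothetical stand-in for a Dataset that is generic over its array type."""

    def __init__(self, variables: Mapping[str, ArrayT]) -> None:
        self._variables = dict(variables)

    def __getitem__(self, name: str) -> ArrayT:
        return self._variables[name]

    def __iter__(self) -> Iterator[str]:
        return iter(self._variables)

    def __len__(self) -> int:
        return len(self._variables)

    def as_array_type(
        self, convert: Callable[..., NewArrayT], **kwargs: object
    ) -> TypedDataset[NewArrayT]:
        # Apply the conversion to every variable; a type checker can then
        # infer the new value type from the converter's return type.
        return TypedDataset(
            {name: convert(data, **kwargs) for name, data in self._variables.items()}
        )

    def is_array_type(self, array_type: type) -> bool:
        # True only if every variable holds the given array type.
        return all(isinstance(data, array_type) for data in self._variables.values())


# Lists play the role of the original arrays, tuples the converted ones.
ds = TypedDataset({"u": [1.0, 2.0], "v": [3.0, 4.0]})
converted = ds.as_array_type(tuple)  # analogous to ds.as_array_type(cp.asarray)
```

With this shape, `ds` would be seen by a type checker as `TypedDataset[list[float]]` and `converted` as a dataset over tuples, which is the dict / TypedDict analogy from the discussion.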

Jan 29, 2025

Attendees

  • Tom Nicholas / @TomNicholas
  • Kai Mühlbauer / @kmuehlbauer
  • Justus Magin / @keewis
  • Alfonso Ladino / @aladinor
  • Matt Savoie / @flamingbear
  • Eni Awowale / @eni-awowale
  • Joe Hamman / @jhamman
  • Josh Moore / @joshmoore

60 seconds updates

Agenda

  • Executive order?
    • NumFOCUS is technically a NASA subcontractor through xarray funding
    • Let's just wait for NumFOCUS to advise
  • Flexible indexes PR status?
  • Multiple DataTree bugs surfaced
  • Release including non-nanosecond in January (tomorrow or Friday) or postpone to February
    • Just release? Kai will create a final release tracking issue (#10002) and then prepare the release tomorrow/Friday
  • marray appears to be ready for trying with xarray (might be possible to interface with fillna, bfill / ffill, where, notnull / isnull, interpolate_na):
    import marray
    import numpy as np
    import xarray as xr

    # build a masked array from the data plus a boolean mask
    data = marray.numpy.asarray(np.array([1, 2, 3], dtype="int32"), mask=np.array([True, False, True]))
    arr = xr.DataArray(data, dims="x")
    assert arr.dtype == "int64"
    

Jan 15, 2025

Attendees

  • Stephan Hoyer
  • Kai Mühlbauer / @kmuehlbauer
  • Eni Awowale / @eni-awowale
  • Davis Bennett

60 seconds updates

Agenda

Dec 18, 2024

Attendees

  • Benoît Bovy / @benbovy (cannot attend, unfortunately)
  • Kai Mühlbauer / @kmuehlbauer
  • Tom Nicholas / @TomNicholas
  • Matt Savoie / @flamingbear
  • Justus Magin / @keewis
  • Eni Awowale / @eni-awowale
  • Deepak Cherian / @dcherian
  • Stephan Hoyer

60 seconds updates

Agenda

Dec 04, 2024

Attendees

  • Deepak Cherian / @dcherian
  • Justus Magin / @keewis
  • Scott Henderson / @scottyhq
  • Tom Nicholas / @TomNicholas
  • Nick Hodgskin / @VeckoTheGecko

60 second updates

Agenda

  • NSF security grant
  • anything to include in datatree blog announcement?
    • Want to include thoughts about collaboration with NASA
    • Including the in-kind dev time contributions that ESDIS made
    • Ideal in the sense of literally zero overhead
      • Also, a core dev spending 10% of their time directing someone with more time is an efficient use of relative expertise
    • Less ideal that Tom/Justus/Stephan didn't get paid for the work
      • In future better to have one of the paid people at the contributing org already be a core dev
  • Pushed Anderson's namedarray/backends refactor quite close to done; ready for preliminary review.
    • https://github.com/pydata/xarray/pull/9273

Nov 20, 2024

Attendees

  • Matt Savoie / @flamingbear
  • Deepak Cherian / @dcherian
  • Justus Magin / @keewis
  • Stephan Hoyer
  • Eni Awowale / @eni-awowale
  • Kai Mühlbauer / @kmuehlbauer
  • Tom Nicholas / @TomNicholas
  • Alfonso Ladino / @aladinor

60 second updates

  • Matt: mostly just watching the repo for datatree issues and using it constantly in my day to day.
  • Deepak:
    • lots of dask stuff
    • zarr v3 compatibility
    • icechunk distributed writes
  • Justus:
    • rewrite of the min-deps check script
    • creation of a separate github action
  • Kai:
  • Tom:
    • Not much direct xarray stuff
  • Eni:
    • Testing xarray.DataTree internally; ran into some issues with numpy 2.0 :-/
    • Working on DataTree poster for AGU, will share accordingly with folks!

Agenda

Nov 06, 2024

Attendees

  • Tom Nicholas / @TomNicholas
  • Deepak Cherian / @dcherian
  • Kai Mühlbauer / @kmuehlbauer
  • Owen Littlejohns / @owenlittlejohns
  • Matt Savoie / @flamingbear
  • Justus Magin / @keewis
  • Stephan Hoyer / @shoyer
  • Eni Awowale / @eni-awowale

60 second updates

Agenda

Oct 23, 2024

Attendees

  • Justus Magin / @keewis
  • Joe Hamman / @jhamman
  • Tom Nicholas / @TomNicholas
  • Eni Awowale / @eni-awowale
  • Deepak Cherian / @dcherian

60 second updates

  • Stephan:
  • Justus:
    • open_datatree + chunks
    • missing value support for numpy (marray / dtypes)
  • Tom
    • Reviewing Stephan's DataTree PRs
    • Some small DataTree PRs myself, including updating the HTML repr to match new inheritance model
    • Otherwise mostly VirtualiZarr stuff
  • Joe
    • Zarr v3
    • Icechunk
  • Deepak
    • just back from vacation.

Agenda

  • Release?
    • What to do with xarray-contrib/datatree?
      • Yank it from PyPI?
        • No - instead release one more time with a warning on import
        • Maybe yank in future
      • link to the migration guide in the readme
      • retire the old datatree repository
    • Tom volunteered to do the release
  • DataTree stuff to finish up?
    • Support chunks in open_datatree()
    • compute, load, chunk, persist?
    • Re-write coordinates in each group when writing to Zarr?
  • Zarr V3 PR
    • Stops interpreting Zarr .fill_value as CF _FillValue, but only for new V3 stores
  • are we affected by the RTD add-ons deprecation?
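The "release one more time with a warning on import" decision for xarray-contrib/datatree could look roughly like the sketch below. The message wording and the `warn_on_import` helper are assumptions made for illustration; in a real package the call would sit at the top of `__init__.py` rather than in a helper. `FutureWarning` is used here because Python shows it to end users by default, whereas `DeprecationWarning` is hidden by default outside `__main__`.

```python
import warnings

# Hypothetical wording; the actual message would be up to the maintainers.
MIGRATION_NOTE = (
    "The standalone datatree package is being retired; "
    "use xarray.DataTree instead. "
    "See the migration guide linked from the README."
)


def warn_on_import() -> None:
    # Emitted once when the package is imported.
    warnings.warn(MIGRATION_NOTE, FutureWarning, stacklevel=2)


# Demonstrate that the warning fires and can be captured.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_on_import()
```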

Oct 9, 2024

Attendees

  • Tom Nicholas / @TomNicholas
  • Justus Magin / @keewis
  • Joe Hamman / @jhamman
  • Owen Littlejohns / @owenlittlejohns
  • Spencer Clark / @spencerkclark
  • Mathias Hauser / @mathause

60 second updates

  • Tom
  • Justus
  • Joe
    • Xarray <-> Zarr-python V3 integration is close but not in main
  • Mathias
    • Issue with reducing non-numeric scalars
  • Spencer
    • Just wanted to thank Kai for looking at datetime precision issue
  • Owen
    • Planning to review Tom's PR on datatree alignment docs

Agenda

  • Zarr-python v3 status update
    • Consolidated metadata is on by default in xarray but not part of v3 spec
      • But Tom A has made that work on a branch
    • FillValue issues
    • Strings
      • Added a variable-length string codec in zarr
    • working branches
      pip install git+https://github.com/TomAugspurger/zarr-python@xarray-compat git+https://github.com/TomAugspurger/xarray/@fix/zarr-v3 git+https://github.com/jhamman/dask@fix/zarr-array-construction-2
      

Sep 25, 2024

Attendees

  • Kai Mühlbauer / @kmuehlbauer
  • Justus Magin / @keewis
  • Deepak Cherian / @dcherian
  • Matt Savoie / @flamingbear
  • Tom Nicholas / @TomNicholas
  • Eni Awowale / @eni-awowale
  • Paul Ockenfuß / @Ockenfuss
  • Spencer Clark / @spencerkclark

60 second updates

Agenda

  1. Need decision on xarray, xarray-core on conda-forge
  2. PRs needing review:
  3. Grouped Shuffle
  4. DataTree inheritance issue, related to https://github.com/pydata/xarray/issues/9475
    • separate discussion meeting? (Tom: Yes, we could also just stay on the call after? Stephan: unfortunately I cannot today)

Sep 11th, 2024

Attendees

  • Matt Savoie / @flamingbear
  • Tom Nicholas / @TomNicholas
  • Owen Littlejohns / @owenlittlejohns
  • Eni Awowale / @eni-awowale
  • Stephan Hoyer / @shoyer

60 second updates

  • Tom
    • Lots of DataTree stuff
    • We are very close to releasing!
  • Matt
    • (datatree) keeping up with main changes in docs. Need to fix current doc errors.

Agenda

Aug 28, 2024

Attendees

  • Justus Magin / @keewis
  • Matt Savoie / @flamingbear
  • Tom Nicholas / @TomNicholas
  • Deepak Cherian / @dcherian
  • Daniel Kaufman / @danielfromearth

60 second updates

Agenda

Aug 14, 2024

Attendees

  • Justus Magin / @keewis
  • Owen Littlejohns / @owenlittlejohns
  • Eni Awowale / @eni-awowale
  • Tom Nicholas / @TomNicholas
  • Deepak Cherian / @dcherian

60 second updates

Agenda

July 31, 2024

Attendees

  • Matt Savoie / @flamingbear
  • Tom Nicholas / @TomNicholas
  • Justus Magin / @keewis
  • Owen Littlejohns / @owenlittlejohns

60 second updates

  • Matt
    • Trying to wrap my head around copying trees. #9285 should not be as hard as I'm making it.
  • Tom
    • Trying to coordinate to push datatree over the finish (well first release) line
    • Fixes for a bunch of small datatree bugs
    • PR for allowing chunked arrays that aren't dask/cubed through xarray
  • Justus:
    • released 2024.07.0 yesterday (new script to extract contributors from git commits)
    • pint-xarray: accessor entrypoints / PintIndex

Agenda

  • DataTree should avoid any in-place modification
    • Auto-copy on setting parent?
    • Remove the ability to assign .parent entirely?
      • Need to keep .parent accessible in order to walk up through tree
  • Who is submitting to AGU today?
    • (Tom is, on VirtualiZarr)
    • (Ryan is)
    • Owen is
    • Stephan maybe
  • ChunkManager vs ComputeManager https://github.com/pydata/xarray/pull/9286
  • Justus to tell us about the PintIndex (postponed to next time)
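The "auto-copy on setting parent" option from the DataTree item above can be sketched with a toy tree. `Node` is a hypothetical class written for this note, not the DataTree implementation: attaching a child stores a copy so the caller's object is never modified, while `.parent` stays readable (but not assignable) so the tree can still be walked upwards.

```python
from __future__ import annotations


class Node:
    """Toy tree node (hypothetical, not xarray.DataTree)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._parent: Node | None = None
        self._children: dict[str, Node] = {}

    @property
    def parent(self) -> Node | None:
        # Read-only: no setter is defined, so `node.parent = other` fails.
        return self._parent

    @property
    def path(self) -> str:
        # Walking up the tree only needs read access to .parent.
        if self._parent is None:
            return "/"
        return self._parent.path.rstrip("/") + "/" + self.name

    def add_child(self, child: Node) -> Node:
        # Copy the incoming subtree so the caller's object is left untouched.
        attached = child._detached_copy()
        attached._parent = self
        self._children[attached.name] = attached
        return attached

    def _detached_copy(self) -> Node:
        new = Node(self.name)
        for grandchild in self._children.values():
            new.add_child(grandchild)
        return new


root = Node("root")
leaf = Node("leaf")
attached = root.add_child(leaf)  # `leaf` itself stays parentless
```

The trade-off in this sketch is an extra copy on every attachment in exchange for never mutating an object the user still holds a reference to.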

July 17, 2024

Cancelled: only Stephan Hoyer and Justus Magin showed up.

July 3, 2024

Attendees

  • Justus Magin / @keewis
  • Tom Nicholas / @TomNicholas
  • Matt Savoie / @flamingbear
  • Joe Hamman / @jhamman

60 second updates

  • Tom
  • Justus
    • lots of fixes for numpy2 (for the dependencies we couldn't test before)
    • other bug fixes (hypothesis test for datetime ExtensionArrays, arrays as attributes)
    • nested duck arrays: finding cupy underneath arbitrary layers (especially dask)
  • Matt
  • Joe
    • Just working on zarr-python

Agenda

June 19, 2024

Attendees

  • Matt Savoie / @flamingbear
  • Justus Magin / @keewis
  • Tom Nicholas / @TomNicholas
  • Stephan Hoyer / @shoyer
  • Owen Littlejohns / @owenlittlejohns
  • Eni Awowale / @eni-awowale
  • David Auty / @autydp
    • From NASA EED-3, knows Matt and Owen

60 second updates

  • Matt
    • Hope to continue datatree inheritance discussion.
  • Justus: numpy2-compatible release last week

Agenda

  • DataTree coordinate inheritance question
    • Release timeline
      • Can we release by the time of Eni and Tom's SciPy talk about DataTree? (~July 10th)
      • How much feedback from community do we need?
        • Stephan: Got plenty already
    • David: Has "quirky" data at NASA
      • Would probably prefer more lenient data model
    • Stephan: Prefer not to have "fallback mechanisms"
    • David: Wants to use datatree for analysis, ideally changing the structure as little as possible
    • Tom: What do we think about this open_as_dict_of_datasets idea? Would that help?
      • Tom: Solves problem of interrogating data / displaying groups
      • Stephan: Makes some sense - analogous to how open_mfdataset works for 90% of cases
        • As if we had made an open_mf_as_grid_of_datasets function to create an interrogatable intermediate structure
      • Stephan: Function to write a messy dataset too? (lower priority)
      • Matt: In favour
      • David: Can you open just a subtree of a file? Tom: Yes if we add a group kwarg to open_datatree
      • Eni: It would be useful if, when open_datatree fails on alignment, it gave a very clear report of what should be fixed
      • Justus: preprocess arg could be useful for "massaging"
      • Tom: Could use Python's new Exception Groups feature for showing all errors at once
      • Stephan: Should also think about saving out a "crooked datatree"
      • Consensus?!
    • Plan going forward
      • Everyone who is interested look in detail at Stephan's PR (https://github.com/pydata/xarray/pull/9063)
        • Likely to spawn smaller issues / PRs about reprs and so on
      • Need separate PR for open_as_dict_of_datasets (or open_datatree_as_dict?) (Tom can raise issue for this)
        • Orthogonal to Stephan's PR
      • Tutorial for tidying up a messy nested netCDF file into a nice sane aligned DataTree (similar to the "Tidy Xarray" idea)

June 5, 2024

Attendees

  • Justus Magin / @keewis
  • Tom Nicholas / @TomNicholas
  • Kai Mühlbauer / @kmuehlbauer
  • Stephan Hoyer / @shoyer
  • Joe Hamman / @jhamman
  • Deepak Cherian / @dcherian
  • Matt Savoie / @flamingbear
  • Mathias Hauser / @mathause

60 second updates

Agenda

May 22nd, 2024

Attendees

  • Matt / @flamingbear
  • Justus / @keewis
  • Tom Nicholas / @TomNicholas
  • Mathias Hauser / @mathause

60 second updates

Agenda

May 8th, 2024

Attendees

  • Deepak
  • Justus
  • Matt
  • Tom
  • Mathias
  • Ryan

60 second updates

Agenda

April 24th, 2024

Attendees

  • Justus Magin / @keewis
  • Matt Savoie / @flamingbear
  • Kai Mühlbauer / @kmuehlbauer
  • Tom Nicholas / @TomNicholas
  • Joe Hamman / @jhamman
  • Owen Littlejohns / @owenlittlejohns
  • Deepak Cherian / @dcherian
  • Stephan

60 second updates

Agenda

  • Break behaviour of dataset constructor?
    • https://github.com/pydata/xarray/issues/8959
    • ds = xr.Dataset(data_vars={'x': ('x', [0])})
      • promotes to coordinate
      • Start with PendingDeprecationWarning
      • Add a separate more explicit construction method/kwarg? Or use the new behavior in case a Coordinates object is passed
  • numpy 2:
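For the Dataset constructor item above, the "start with PendingDeprecationWarning, then offer an explicit kwarg" path could be modelled as below. This is a toy model using only the standard library, not xarray code: `build_variables` and the `promote_dim_vars` kwarg are hypothetical names, and variables are plain `(dims, values)` tuples.

```python
import warnings


def build_variables(data_vars, promote_dim_vars=None):
    """Toy model of the constructor: split input into (coords, data_vars).

    `promote_dim_vars` is a hypothetical kwarg: None keeps the implicit
    promotion but warns; True or False makes the choice explicit and silent.
    """
    coords, remaining = {}, {}
    for name, (dims, values) in data_vars.items():
        dims_tuple = (dims,) if isinstance(dims, str) else tuple(dims)
        is_dim_var = dims_tuple == (name,)
        if is_dim_var and promote_dim_vars is None:
            warnings.warn(
                f"variable {name!r} shares its name with its only dimension and "
                "is being promoted to a coordinate; this may change in the future",
                PendingDeprecationWarning,
                stacklevel=2,
            )
        if is_dim_var and promote_dim_vars in (None, True):
            coords[name] = (dims, values)
        else:
            remaining[name] = (dims, values)
    return coords, remaining


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Implicit promotion: behaves as before but warns.
    coords, data = build_variables({"x": ("x", [0])})
    # Explicit opt-out: 'x' stays a data variable and no warning is raised.
    coords_explicit, data_explicit = build_variables(
        {"x": ("x", [0])}, promote_dim_vars=False
    )
```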

April 10, 2024

Attendees

  • Matt Savoie / @flamingbear
  • Kai Mühlbauer / @kmuehlbauer
  • Tom Nicholas / @TomNicholas
  • Justus Magin / @keewis
  • Deepak Cherian / @dcherian
  • Owen Littlejohns / @owenlittlejohns

60 second updates

  • Matt: Good meeting for Datatree yesterday. PR to existing PR for simplifying iterators is ready. Owen will ping Tom later today when he merges.
  • Justus:
    • "source" encoding from fsspec objects
    • h5netcdf + character sets
  • Tom
    • Mostly thinking about the virtualizarr stuff (i.e. not propagating xarray indexes and dealing with encoding)
    • Chance of me being able to think about datatree inheritance has gone up since NCAR machines are all down
  • Kai: not much xarray related (besides some h5netcdf char encoding ;-)
  • Owen: Open PR for iterators.py - Will update based on recent feedback.
  • Deepak:

Agenda

xarray/tests/test_duck_array_ops.py::TestOps::test_where_type_promotion: AssertionError: assert dtype('float64') == <class 'numpy.float32'>
 +  where dtype('float64') = array([ 1., nan]).dtype
 +  and   <class 'numpy.float32'> = np.float32
xarray/tests/test_duck_array_ops.py::TestDaskOps::test_where_type_promotion: AssertionError: assert dtype('float64') == <class 'numpy.float32'>
 +  where dtype('float64') = array([ 1., nan]).dtype
 +  and   <class 'numpy.float32'> = np.float32

March 27, 2024

Attendees

  • Deepak Cherian
  • Alex Ford / @asford
  • Tom Nicholas / @TomNicholas
  • Matt Savoie / @flamingbear
  • Stephan Hoyer

60 second updates

  • Deepak: upstream-dev fixes
  • Tom:
    • Datatree meetings
    • Xarray without indexes
  • Alex F
    • First time attending.
    • Question on possible wrapping of torch-tensors in xarray
    • We have working internal fork, interested in upstreaming
  • Matt
    • good meeting yesterday with agreement to move forward faster not smarter. Basically move most code without improvements and identify places we want to work later.
  • Stephan
    • Benoît might be working on indexes again in a couple of months, funding from NASA grant at UW.

Agenda
