# Weekly Xarray-DataTree design meeting
[Zoom link](https://us02web.zoom.us/j/87503265754?pwd=cEFJMzFqdTFaS3BMdkx4UkNZRk1QZz09)
[Meetings issue (#8747)](https://github.com/pydata/xarray/issues/8747) - includes list of design questions
[Tracking issue (#8572)](https://github.com/pydata/xarray/issues/8572) - includes checklist of what's been done so far
## Oct 22nd, 2024
### Attendees
- Justus Magin / @keewis
- Alfonso Ladino / @aladinor
- Tom Nicholas / @TomNicholas
- Owen Littlejohns / @owenlittlejohns
### Updates
- Justus
-
- Tom
-
### Agenda
- Forbid slashes in coordinate names https://github.com/pydata/xarray/pull/9492
- `group` arg to `open_datatree`
- remove empty parents on top of the selected node
- add ancestor path to `encoding['source']`?
- `chunks` support?
- Before release:
- https://github.com/pydata/xarray/issues/9634
- get `open_datatree` and `open_groups` to support `chunks`
- implement `chunk`, `compute`, `load` and `persist`
- Justus will look into `open_datatree` / `open_groups` with `chunks`
- dask specific methods can be added after the release (https://github.com/pydata/xarray/issues/9355)
## Oct 15th, 2024
### Attendees
- Justus Magin / @keewis
- Alfonso Ladino / @aladinor
- Eni Awowale / @eni-awowale
- Matt Savoie / @flamingbear
### Agenda
- how do we test the `group` argument of `open_datatree`?
## Oct 8th, 2024
### Attendees
- Tom Nicholas / @TomNicholas
- Justus Magin / @keewis
- Alfonso Ladino / @aladinor
- Eni Awowale / @eni-awowale
### Agenda
- Close last issues on xarray-contrib repo?
## Oct 4th, 2024
### Attendees
- Tom Nicholas / @TomNicholas
- Justus Magin / @keewis
- Matt Savoie / @flamingbear
- Gui(lherme) Castelao / @castelao
- Kai Mühlbauer / @kmuehlbauer
### Agenda
- inheritance for map_over_subtree, to_dict, and `to_<file_format>`
## Oct 1st, 2024
### Attendees
- Tom Nicholas / @TomNicholas
- Justus Magin / @keewis
- Matt Savoie / @flamingbear
- Gui(lherme) Castelao / @castelao
- Kai Mühlbauer / @kmuehlbauer
- Alfonso Ladino / @aladinor
### Updates
- Alfonso
- https://github.com/pydata/xarray/pull/9428 ready to go.
### Agenda
- Performance issue with [StoreBackendEntrypoint](https://github.com/pydata/xarray/blob/095d47fcb036441532bf6f5aed907a6c4cfdfe0d/xarray/backends/zarr.py#L1352) when opening a datatree in zarr. It is taking too long compared with using [open_dataset](https://github.com/pydata/xarray/blob/095d47fcb036441532bf6f5aed907a6c4cfdfe0d/xarray/backends/zarr.py#L1225).
- Open a new issue showing the unexpected behavior
## Sept 24th, 2024
### Attendees
- Tom Nicholas / @TomNicholas
- Justus Magin / @keewis
- Matt Savoie / @flamingbear
- Owen Littlejohns / @owenlittlejohns
- Gui(lherme) Castelao / @castelao
- Kai Mühlbauer / @kmuehlbauer
- Alfonso Ladino / @aladinor
- Eni Awowale / @eni-awowale
### Updates
- Tom
- https://xray--9501.org.readthedocs.build/en/9501/user-guide/hierarchical-data.html#alignment-and-coordinate-inheritance
- Matt: Merged docs and ghosted
### Agenda
- Problem of duplicating inherited coordinates across nodes
- https://github.com/pydata/xarray/issues/9475
- Coordinates backed by indexes can be cheaply (eagerly) compared, and therefore de-duplicated on assignment
- This seems fine, Stephan has a PR to add this
- https://github.com/pydata/xarray/pull/9531
- Problem is this doesn't work for non-indexed coordinates, because any comparison could eagerly load an arbitrarily large variable into memory
- Suggestion 1: pass inherited coordinates separately in `map_over_subtree`
- two arguments go into `map_over_subtree` calls
- downside: can't apply functions that work on datasets anymore
- e.g. `def func(ds: Dataset) -> Dataset: ...` passed as `dt.map_over_subtree(func)`
- variant: mark inherited coords with a temporary attribute, and people can duplicate by removing that
- Suggestion 2: Don't allow access to inherited non-indexed coordinates
- Specifically for `.dataset` inside `map_over_subtree`?
- Restricts use cases to not be able to even access non-indexed coordinates
- e.g. want to make decision based on scalar `ds.coords['cloud_coverage']`
- Suggestion 3: Disallow overwriting any inherited coordinates inside `map_over_subtree`
- Should we raise an error or warn if user tries to overwrite inherited coords?
- e.g. `map_over_subtree(lambda ds: ds.isel(...))`
- Add kwarg `replace_duplicated_inherited`
- Suggestion 4: Forbid overriding coordinates in child nodes completely
- Very restrictive, breaks netCDF model
- Stronger version of suggestion 3
- https://github.com/pydata/xarray/pull/9428 might be ready?
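
The de-duplication idea from this agenda can be pictured with a toy model. This is only a sketch under loud assumptions: `ToyNode`, `inherited_coords` and `assign_coords` are hypothetical names, tuples stand in for in-memory (indexed) coordinates, and none of this is xarray API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a tree node; not the xarray DataTree class."""

    coords: dict[str, tuple] = field(default_factory=dict)
    parent: ToyNode | None = None

    def inherited_coords(self) -> dict[str, tuple]:
        """Collect coordinates from all ancestors, nearest ancestor winning."""
        found: dict[str, tuple] = {}
        node = self.parent
        while node is not None:
            for name, values in node.coords.items():
                found.setdefault(name, values)
            node = node.parent
        return found

    def assign_coords(self, new: dict[str, tuple]) -> None:
        """Store only coordinates that differ from what is already inherited."""
        inherited = self.inherited_coords()
        for name, values in new.items():
            if name in inherited and inherited[name] == values:
                continue  # cheap in-memory comparison: drop the duplicate
            self.coords[name] = values


root = ToyNode(coords={"x": (0, 1, 2)})
child = ToyNode(parent=root)
child.assign_coords({"x": (0, 1, 2), "y": (10, 20)})
```

The `==` here is cheap only because the values are already in memory. For a lazily loaded, non-indexed coordinate the same comparison could force a large read, which is the problem noted above.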
## Sept 17th, 2024
### Attendees
- Tom Nicholas / @TomNicholas
- Guilherme Castelao / @castelao
- Stephan Hoyer / @shoyer
### Updates
- Tom
- Wrote some docs on DataTree alignment and coordinate inheritance
- https://github.com/pydata/xarray/pull/9501
- Been refactoring to use a new `._walk_to` method
- Stephan
- Deduplicated coordinates
- https://github.com/pydata/xarray/pull/9510
- Issue with passing state to the `._post_attach` method
- But just an internal detail
- Can't have conflicting coordinates on descendants ("no overriding")
- What to do about non-indexed coordinates?
- Indexed coordinates are in memory so easy to check for duplication
- But non
- Current design might be slow
- Lots of internal method calls
- Some methods have performance that scales poorly with tree depth
- e.g. `__init__` constructor has quadratic performance
- Let's raise an issue for this
- Want to complete some traversing refactors
-
## Sept 10th, 2024
### Attendees
- Tom Nicholas / @TomNicholas
- Matt Savoie / @flamingbear
- Owen Littlejohns / @owenlittlejohns
- Eni Awowale / @eni-awowale
### Updates
- Tom
- Sprinted with Eni at NumFOCUS summit on Saturday
- Moved / closed a bunch of issues
- PRs
- https://github.com/pydata/xarray/pull/9465
- https://github.com/pydata/xarray/pull/9453
- https://github.com/pydata/xarray/pull/9451
- https://github.com/pydata/xarray/pull/9470
- Reviewed several of Stephan's PRs
- Eni
- Sprint with Tom at NumFOCUS summit
- PR for `open_groups` with zarr https://github.com/pydata/xarray/pull/9469
### Agenda
- Should the docs be in a separate branch?
- Documenting coordinate inheritance and alignment rules
- Deserves its own PR...
- Names of things
- https://github.com/pydata/xarray/issues/9458
- `DataTree(data=...)` or `DataTree(node=...)` or ?
- `DataTree.ds` or `DataTree.node` or ?
- Migration guide
- Blog post https://github.com/xarray-contrib/xarray.dev/issues/708
- Issue-moving spree
- Eni's `open_groups` for Zarr PR
- https://github.com/pydata/xarray/pull/9469
- Bear in mind Stephan is about to change the meaning of "identical" slightly https://github.com/pydata/xarray/pull/9473
## Sept 3rd, 2024
### Attendees
- Matt Savoie / @flamingbear
- Eni Awowale / @eni-awowale
- Justus Magin / @keewis
- Tom Nicholas / @TomNicholas
- Alfonso Ladino Rincon / @aladinor
### Updates
- Matt: Pushed changes to fix Tom's [PR#9297](https://github.com/pydata/xarray/pull/9297) for shallow copy.
Added more to remove parent from constructor keywords on my [branch](https://github.com/flamingbear/xarray/tree/datatree_init_dont_modify_inplace). I pushed to Tom's repo.
- Alfonso working on `open_zarr` #9198
### Agenda
## Aug 27th, 2024
### Attendees
- Justus Magin / @keewis
- Tom Nicholas / @TomNicholas
- Alfonso Ladino Rincon / @aladinor
- Matt Savoie / @flamingbear
- Owen Littlejohns / @owenlittlejohns
- Eni Awowale / @eni-awowale
### Updates
- Tom
- Worked more on https://github.com/pydata/xarray/pull/9297
- Failing doctest: https://github.com/pydata/xarray/actions/runs/10476674854/job/29016117743?pr=9297
- Matt
- Added Eni's open_groups to the [Documentation PR.](https://github.com/pydata/xarray/pull/9033)
- Just rescanned the issues from Aug 13 triage session.
- Justus: nothing (but I do remember wanting to post a review comment on [#9378](https://github.com/pydata/xarray/pull/9378))
- Alfonso: Nothing - (Still looking at [#9198](https://github.com/pydata/xarray/pull/9198))
- Owen:
- Moved some issues over
### Agenda
- Merge some more PRs?
- Go through more old issues?
## Aug 20th, 2024
### Attendees
- Justus Magin / @keewis
- Tom Nicholas / @TomNicholas
- Alfonso Ladino Rincon / @aladinor
### Updates
- Tom
- Working on https://github.com/pydata/xarray/pull/9297
### Agenda
- Alfonso's PR on opening zarr stores with consolidated group
- https://github.com/pydata/xarray/pull/9377
- merged
- Etienne's PR on disallowing paths with slashes
- https://github.com/pydata/xarray/pull/9378
- further modify the error message to mention that `/` in variable names is only illegal when creating datatree nodes
- Stephan's PR on improving error message
- https://github.com/pydata/xarray/pull/9222
- unsure why the error message is now less explicit
- Cloud storage credentials
- https://github.com/pydata/xarray/pull/9198
- partially fixed by Alfonso's PR, the rest can be fixed by further optimizing zarr's open_datatree to use the pre-opened store
- AoB
- Continue moving old issues / working on PRs
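
A minimal sketch of the kind of check discussed for the slashes PR, assuming node paths use `/` as the separator. `validate_node_name` and `join_path` are hypothetical helpers written for illustration, not functions from xarray.

```python
def validate_node_name(name: str) -> str:
    """Reject names containing the path separator (hypothetical helper)."""
    if "/" in name:
        raise ValueError(
            f"node names cannot contain '/': {name!r}; "
            "'/' in variable names is only a problem when creating tree nodes"
        )
    return name


def join_path(parent_path: str, name: str) -> str:
    """Build an absolute node path from a parent path and a validated name."""
    return parent_path.rstrip("/") + "/" + validate_node_name(name)
```

Without such a check, a name like `a/b` would be indistinguishable from a child `b` of a node `a` once it is joined into a path.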
## Aug 13, 2024
### Attendees
- Matt Savoie / @flamingbear
- Justus Magin / @keewis
- Tom Nicholas / @TomNicholas
- Eni Awowale / @eni-awowale
- Owen Littlejohns / @owenlittlejohns
- Gui Castelao
- Alfonso Ladino Rincon / @aladinor
### Updates
- Tom
- Might have use cases for DataTree at CWorthy
- Eni : still working on #9243. Will try the suggested mypy fix
### Triaging session
<details><summary>Issues and PRs to triage</summary>
```markdown
_Originally posted by @<user> in <link>_
```
Please add the `topic-datatree` label!
Open issues:
- [x] #5 - (Tom) Moved upstream
- [x] #9 - moved [9347](https://github.com/pydata/xarray/issues/9347) Tom
- [x] #47 - moved [9348](https://github.com/pydata/xarray/issues/9348) Eni
- [x] #51 (Justus) - moved to xarray
- [x] #55 - recommend closing / asked @maxgrover1 and @kmuehlbauer if we can close
- [x] #58 (Justus) - moved to xarray
- [x] #61 - closed: PR was merged for issue and issue is accounted for in #8572 (Eni)
- [x] #67 (Tom) - closed in favor of existing xarray issue
- [x] #77 (Tom) - moved over
- [x] #79 (Tom) - moved over
- [x] #80 (Tom) - closed as arguably already solved
- [x] #93 - (Owen) migrated [9337](https://github.com/pydata/xarray/issues/9337) - closing of file using open_datatree in context manager
- [x] #97 (Tom) migrated upstream
- [x] #100 (Eni) Closed and moved to [9437](https://github.com/pydata/xarray/issues/9437)
- [x] #124 (Justus) - closed
- [x] #134 (Eni) Closed and moved to [9438](https://github.com/pydata/xarray/issues/9438)
- [x] #145 - (Owen) migrated [9343](https://github.com/pydata/xarray/issues/9343)
- [x] #146 - (Owen) migrated [9365](https://github.com/pydata/xarray/issues/9365)
- [x] #152 - Eni - moved upstream
- [ ] #168 - Eni - closed
- [x] #184 - (Tom) closed as same ideas implemented by Stephan in https://github.com/pydata/xarray/pull/9064
- [x] #186 - (Tom) moved upstream
- [x] #189 - Eni: moved to https://github.com/pydata/xarray/issues/9440
- [ ] #191
- [X] #192 - migrated https://github.com/pydata/xarray/issues/9349
- [ ] #193
- [ ] #195
- [ ] #199
- [x] #200 - migrated [#9335](https://github.com/pydata/xarray/issues/9335)
- [x] #203 - migrated [#9345](https://github.com/pydata/xarray/issues/9345)
- [x] #204 - Close in favor of #192
- [X] #206 - migrated to pydata/xarray#9350
- [X] #207 - migrated [#9338](https://github.com/pydata/xarray/issues/9338)
- [ ] #210
- [ ] #230
- [ ] #232
- [ ] #235
- [x] #240 - (Tom) moved to xarray
- [ ] #242
- [ ] #244
- [x] #250 - (Justus) closed
- [x] #252 - Eni - closed and moved upstream https://github.com/pydata/xarray/issues/9502
- [x] #254 - (Justus) moved to xarray
- [ ] #258
- [ ] #266
- [ ] #270
- [x] #276 - Eni (working on)
- [ ] #277
- [ ] #281
- [ ] #283
- [x] #290 (Justus) - moved to xarray
- [x] #292 - Eni - moved upstream https://github.com/pydata/xarray/issues/9503
- [x] #297 (Justus) - closed in favor of existing xarray issue (#9056)
- [ ] #309
- [x] #311 (Tom) - moved to xarray
- [ ] #312
- [ ] #313
- [x] #316 (Justus) - moved to xarray
- [x] #320 (Eni) - moved https://github.com/pydata/xarray/issues/9539. Thought this was an interesting feature request.
- [x] #322 (Justus) - closed in favor of the existing xarray issue (#9197)
- [ ] #323
- [x] #325 (Tom) - closed with link to upstream replacement
- [x] #331 (Tom) - closed with comment
- [ ] #337
Open PRs:
- [x] #114 - (Owen) linked to from xarray issue (9335).
- [ ] #142
- [ ] #147
- [x] #155 - (Owen) linked to from xarray issue (9343).
- [ ] #196
- [ ] #198
- [ ] #217
- [ ] #220
- [ ] #221
- [ ] #238
- [ ] #253
- [ ] #265
- [ ] #271
- [ ] #282
- [ ] #307
- [ ] #310
- [x] #314 (Tom) - linked to from new issue on xarray
- [ ] #319
- [ ] #338
</details>
## Aug 6, 2024
### Attendees
- Matt Savoie / @flamingbear
- Justus Magin / @keewis
- Tom Nicholas / @TomNicholas
- Eni Awowale / @eni-awowale
### 60 Second Updates.
- Matt : waiting for PRs before re-reviewing the Documentation. If you want to see the diff for the docs updated for inheritance: [here](https://github.com/pydata/xarray/pull/9033/files/b303d6255d762f0a82188ff6446b25a7bc82aadb..421c404c59c18ab36bfb2ab9fe1db016a154d9ad) And the [current PR docs](https://xray--9033.org.readthedocs.build/en/9033/)
- Eni : still working on #9243
### Agenda
- issues on the old repository:
- block about an hour separately to go through the issues
- Justus will organize / create a poll to find a good time
- Special dask methods
- https://github.com/pydata/xarray/blob/c508cc6a2e3000a9d87d2f8c611aae8733be07bf/xarray/core/dataset.py#L879
- https://github.com/xarray-contrib/datatree/pull/196
- tutorial files for datatree: possibly synthetic, neuro-imagery, or a geoscience (weather?) image pyramid
- check for isomorphic trees (for `map_over_subtree`): also compare names to avoid relying on the order of the nodes
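
The name-based isomorphism check mentioned in the last item could look roughly like the following. It is a sketch only: trees are modelled as nested dicts mapping a child name to its own children, and `diff_tree_structure` and `is_isomorphic` are hypothetical names.

```python
from __future__ import annotations


def diff_tree_structure(a: dict, b: dict, path: str = "/") -> list[str]:
    """Return the paths where two nested-dict trees differ in child names."""
    problems = []
    if set(a) != set(b):
        problems.append(path)
    # Recurse by name, so the order in which children were added is irrelevant.
    for name in sorted(set(a) & set(b)):
        problems.extend(diff_tree_structure(a[name], b[name], f"{path}{name}/"))
    return problems


def is_isomorphic(a: dict, b: dict) -> bool:
    """Two trees match when no node differs in its set of child names."""
    return not diff_tree_structure(a, b)


tree1 = {"a": {"x": {}, "y": {}}, "b": {}}
tree2 = {"b": {}, "a": {"y": {}, "x": {}}}  # same names, different order
tree3 = {"a": {"x": {}, "z": {}}, "b": {}}  # one child renamed
```

Pairing children by name rather than by position means `tree1` and `tree2` are treated as the same structure, while `tree3` is reported as differing below `/a/`.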
## Jul 30, 2024
### Attendees
- Matt Savoie / @flamingbear
- Justus Magin / @keewis
- Tom Nicholas / @TomNicholas
- Stephan Hoyer / @shoyer
- Eni Awowale / @eni-awowale
- Owen Littlejohns / @owenlittlejohns
### 60 Second Updates.
- Matt: Almost completed the update for [Doc PR](https://github.com/pydata/xarray/pull/9033)
- Tom:
- Looked at fixing several bugs
- https://github.com/pydata/xarray/issues/9285
- https://github.com/pydata/xarray/issues/9196
- https://github.com/pydata/xarray/pull/9292
- Eni:
- PR [#9243](https://github.com/pydata/xarray/pull/9243)
### Agenda
## Jul 23, 2024
### Attendees
- Matt Savoie / @flamingbear
- Justus Magin / @keewis
- Stephan Hoyer / @shoyer
- Eni Awowale / @eni-awowale
- Alfonso Ladino Rincon / @aladinor
- Etienne Schalk / @eschalkargans
- Tom Nicholas / @TomNicholas
### 60 Second Updates.
- Tom:
- Was at SciPy then PTO
- Matt: still nothing. looking at Eni's draft [PR #9243](https://github.com/pydata/xarray/pull/9243/files)
- Etienne: convert datatree to dict [PR #9080](https://github.com/pydata/xarray/pull/9080) (note: with coordinate inheritance, inherited coords are duplicated; disadvantage: denormalization of data; advantage: self-sufficient leaf groups)
- Eni: Back from SciPy and PTO working on draft PR #9243
- Will add tests to new file
### Agenda
- SciPy report
- We should move old issues
- Best to do manually as then a human will check
- Eni has issue with openDAP for trees
- Latest [tasks](https://github.com/pydata/xarray/issues/8572#issuecomment-2218020742) to get datatree released and original set [#8572](https://github.com/pydata/xarray/issues/8572)
## Jul 16, 2024
### Attendees
- Matt Savoie / @flamingbear
- Stephan Hoyer / @shoyer
- Justus Magin / @keewis
- Alfonso Ladino / @aladinor
### 60 Second Updates.
- Matt has barely been even following issues.
### Agenda
- Not much but Alfonso had two PRs to discuss
- Options for credentials for s3 when opening zarr stores https://github.com/pydata/xarray/pull/9198/files
- Addresses backend kwargs that were removed (addresses [#9135](https://github.com/pydata/xarray/issues/9135)) https://github.com/pydata/xarray/pull/9199/files
- Early adjournment
## Jul 9, 2024
### Attendees
- Justus Magin / @keewis
- Stephan Hoyer
- Tom Nicholas / @TomNicholas
### Agenda
- checklist for releasing datatree
- https://github.com/pydata/xarray/issues/8572#issuecomment-2218020742
-
## Jul 2, 2024
### Attendees
- Matt Savoie / @flamingbear
- Justus Magin / @keewis
- Owen Littlejohns / @owenlittlejohns
- Stephan Hoyer
- Alfonso Ladino / @aladinor
### 60 second updates
- Tom
- Reviewed coordinate inheritance PR properly
- Matt
- Also viewed the inheritance PR; understood most of it.
- Owen
- Also reviewed PR 9063 (inheritance)
- Stephan
- Inheritance PR
- Alfonso
- Got both PR for keywords and benchmarks ready.
- https://github.com/pydata/xarray/pull/9158
- https://github.com/pydata/xarray/pull/9199
### Agenda
- Are we happy to merge Stephan's PR?
- Outstanding Q's?
- A couple of other things to merge
- Constructor parent not mutating
- What does that unblock?
- Release schedule
- release
- what's required
- docs PR
- open_as_dict_of_datasets
- blog
## Jun 25, 2024
### Attendees
- Matt Savoie / @flamingbear
- Justus Magin / @keewis
- Owen Littlejohns / @owenlittlejohns
- Stephan Hoyer
### 60 second updates
- Matt: Reviewed / following the inherited coordinate PR [#9063](https://github.com/pydata/xarray/pull/9063/files)
- Tom: Also reviewed the PR
- Owen: Also partially reviewed Stephan's PR [#9063](https://github.com/pydata/xarray/pull/9063/files)
### Agenda
- Benchmark for open_datatree: https://github.com/pydata/xarray/pull/9158
- Probably should close the files
- DataTree should be a context manager (like how you can already do `with open_dataset(path) as ds:`)
- raise an issue for this!
- Backend kwargs are not forwarded: https://github.com/pydata/xarray/issues/9135
- Review of coordinate inheritance PR [#9063](https://github.com/pydata/xarray/pull/9063/files)
- Tom: Main question is what should the internal structure be?
- DataTree repr: https://github.com/pydata/xarray/pull/9064
- SciPy talk
- Practice talk for NASA 2nd July 12pm EDT
- Everyone welcome on teams (https://teams.microsoft.com/l/meetup-join/19%3ameeting_NDc3ZWRiOGUtOTdhNS00ZDkyLWI2ZGQ[…]2c%22Oid%22%3a%2275a4b9ac-327c-4e32-9aeb-1eab36528186%22%7d)
- Tom and Eni will give half of the talk each
- Tom on general datatree idea, Eni on NASA's use case
- Top-level functions like `xr.concat` accepting DataTree objects?
- https://github.com/pydata/xarray/issues/9106
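
To illustrate the context-manager point above, here is a toy sketch of a tree-like object that closes every underlying handle on exit. `ToyTree` is a hypothetical class and `io.StringIO` objects stand in for open files; this is not how xarray implements it.

```python
import io


class ToyTree:
    """Hypothetical stand-in for an opened tree holding file-like handles."""

    def __init__(self, handles):
        self._handles = list(handles)

    def close(self) -> None:
        """Close every handle backing the tree."""
        for handle in self._handles:
            handle.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Returning None lets any exception from the with-block propagate.
        self.close()


handles = [io.StringIO("group a"), io.StringIO("group b")]
with ToyTree(handles) as tree:
    still_open = [not h.closed for h in handles]
all_closed = all(h.closed for h in handles)
```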
## Jun 18, 2024
### Attendees
- Matt Savoie / @flamingbear
- Tom Nicholas
- Eni Awowale/ @eni-awowale
- Owen Littlejohns / @owenlittlejohns
- Alfonso Ladino Rincon
### 60 second updates
- Trying hard to wrap my head around the current discussion [#9077](https://github.com/pydata/xarray/issues/9077)
### Agenda
- Inherited coordinates -- allow overrides or not?
- The case for forbidding overrides
- If non-alignment is allowed, we would need a way to tell update/setitem methods whether or not we want them to check alignment in this particular case
- Alignment will have to be checked between variables on the same node anyway
- Discuss #9077 some more?
- Particularly this `open_as_dict_of_datasets` idea
- Could even point to this function from within the alignment failure in `open_datatree`
- Is the value in having `open_datatree` work on everything or having some xarray function work on everything?
- Optional vs forbidden overriding of dimensions in child nodes
- How much feedback do we actually need from the community?
- Mapping top-level functions like concat over trees https://github.com/pydata/xarray/issues/9106
- Eni's SciPy talk?
## Jun 11, 2024
### Attendees
- Matt Savoie / @flamingbear
- Eni Awowale / @eni-awowale
- Owen Littlejohns / @owenlittlejohns
- Tom Nicholas
- Justus Magin / @keewis
### 60 second updates
- Matt - Following discussions at most.
- Tom - Mostly just following other people's issues / PRs
- Justus - nothing datatree-related, but I'll try releasing numpy 2 later today
- Eni - dropped a bug report #9093 about segmentation faults with `open_datatree()`
### Agenda
- Let's merge some things?
- open_datatree speedup PR
- Matt will add commits to remove unneeded kwargs then we can merge
- Tom reply to Etienne's PR about to_dict
- Owen self-merge common.py PR
- Coordinate inheritance issue
- Stephan summarized it nicely
- We should use his description to ask around
- Pangeo discourse
- Twitter
- ESDIS metadata manager people?
- Point out on issue
- that one can still open invalid files using group/root kwarg
- becomes hard to list the groups in a file
- New function?:
- `list_groups`
- `open_datasets_dict`
- Numpy release status?
- basically done, one PR missing
- will release today or tomorrow morning
## Jun 4, 2024
### Attendees
- Matt Savoie / @flamingbear
- Owen Littlejohns / @owenlittlejohns
- Justus Magin / @keewis
- Eni Awowale / @eni-awowale
- Tom Nicholas
### 60 second updates
- Matt - have only read [proposal](https://github.com/pydata/xarray/issues/9056#) and PRs.
- Owen - have open PRs for migration https://github.com/pydata/xarray/issues/9011, https://github.com/pydata/xarray/issues/9033 (latter probably needs to wait for numpy 2.0 support)
- Stephan - sketch of hierarchical coordinates: https://github.com/pydata/xarray/pull/9063
- Tom
- Also messed with hierarchical coordinates: https://github.com/pydata/xarray/pull/9065/files
### Agenda
- Owen's TreeAttrAccessMixin PR
- Decision to not worry about slots/dict stuff too much and move forward
- Alfonso's [open_datatree PR](https://github.com/pydata/xarray/pull/9014)
- Review
- Stephan's hierarchical coordinates PR
## May 28, 2024
### Attendees
- Matt Savoie / @flamingbear
- Justus Magin / @keewis
- Eni Awowale / @eni-awowale
- Tom Nicholas
- Stephan Hoyer
### 60 second updates
- Matt - still nothing.
### Agenda
- decision on variable inheritance:
- should we change behavior now? Or should we have a separate API instead?
- Way to defer the decision?
- Proposal
- Keep `.ds`, `__getitem__` as-is
- Define "compatible variables" for inheritance
- Same-named dimensions have to be the same
- Alignable
- (Compare with what it says in the CF conventions)
- Additional API which allows access to inherited variables
- dt.ds will never give access to inherited vars
- But dt.inherited.ds would allow `__getitem__` access to inherited vars
- `dt.inherited[...].ds`?
- `dt.inherited.to_dataset()` -> xr.Dataset containing inherited vars
- Don't change `map_over_subtree` (again for backwards compatibility)
- `map_over_inherited_subtree` isolates the concept of mapping over a tree with inherited variables
- issues: e.g. map over and see the same variable multiple times (in its "local" group and in all its child groups)
- Explicit API for propagating / shallow-copying variables to child nodes?
- dt.inherit()? -> DataTree
- Either way: this will be a new feature, to be done in a separate release (i.e. no blocker right now)
## May 21, 2024
### Attendees
- Matt Savoie / @flamingbear
- Justus Magin / @keewis
- Owen Littlejohns / @owenlittlejohns
- Eni Awowale / @eni-awowale
- Tom Nicholas
### 60 sec updates.
- Matt: Reviewed Alfonso's open_datatree PR. No ticket work.
- Owen: Submitted PR for documentation and exposing DataTree in public API (https://github.com/pydata/xarray/pull/9033)
### Agenda
- Announcements
- Write a blog post
- Doesn't need to be long
- https://medium.com/pangeo/easy-ipcc-part-1-multi-model-datatree-469b87cf9114
## May 14, 2024
### Attendees
- Matt Savoie / @flamingbear
- Justus Magin / @keewis
- Tom Nicholas
- Alfonso Ladino
- Owen Littlejohns / @owenlittlejohns
- Stephan Hoyer
- Eni Awowale
### 60 sec updates.
- Matt slacking on other work and time off.
- Owen responding to feedback for [PR](https://github.com/pydata/xarray/pull/9011) migrating `io.py` and `common.py`
- Tom prepping for virtualizarr talk tomorrow
### Agenda
- Alfonso's `open_datatree` performance PR
- https://github.com/pydata/xarray/pull/9014
- Coordinate inheritance discussion
- Implementation isn't that hard, difficulty is clear model and behaviour, especially wrt mapping
- Need to keep the Dataset invariant that all shared dims in one group have the same length
- Option (1): Explicit API separation of group with inherited variables
- e.g. dt.inherited.ds
- The check:
`xarray.align(*[node.ds, node.parent.ds, node.parent.parent.ds, ...], join='exact')`
- Tom to make an issue to write out thoughts/options
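
The ancestor check written above with `xarray.align(..., join='exact')` can be mimicked in a toy model. This is a sketch under stated assumptions: `ToyNode` and `check_alignment` are hypothetical, each node stores its index labels as plain tuples, and the check only compares a node against each of its ancestors (which is enough if it runs whenever a node is attached).

```python
from __future__ import annotations


class ToyNode:
    """Hypothetical node: `indexes` maps a dimension name to its labels."""

    def __init__(self, indexes: dict[str, tuple], parent: ToyNode | None = None):
        self.indexes = indexes
        self.parent = parent


def check_alignment(node: ToyNode) -> None:
    """Raise if any dimension shared with an ancestor has different labels."""
    ancestor = node.parent
    while ancestor is not None:
        for dim, labels in node.indexes.items():
            if dim in ancestor.indexes and ancestor.indexes[dim] != labels:
                raise ValueError(f"dimension {dim!r} conflicts with an ancestor")
        ancestor = ancestor.parent


root = ToyNode({"time": (0, 1, 2)})
ok = ToyNode({"time": (0, 1, 2), "x": (10, 20)}, parent=root)
bad = ToyNode({"time": (0, 1)}, parent=ok)
```

Here `check_alignment(ok)` passes because `time` matches the root, while `check_alignment(bad)` raises because its `time` labels differ from those of its ancestors.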
## May 7, 2024
### Attendees
- Matt Savoie / @flamingbear
- Justus Magin / @keewis
- Tom Nicholas
- Alfonso Ladino
- Owen Littlejohns / @owenlittlejohns
### 60 sec updates.
- Owen: [PR migrating last pieces of datatree code into xarray.core](https://github.com/pydata/xarray/pull/9011)
### Agenda
- Alfonso shows us his work on opening stuff efficiently
- 1-2 orders of magnitude speedup with <= 1000 groups on netcdf4!
- Separate PRs would be great
- important things left in the merge
- docs
- formalize the backend
- moving to_netcdf and AttrAccessMixin
- issue with slots
- split up into 2 PRs to separate out the potential rabbit hole
## Apr 30th, 2024
### Attendees
- Matt Savoie / @flamingbear
- Tom Nicholas
- Eni
- Ty
- Justus
### 60 sec updates.
- Matt: PR for [ops.py](https://github.com/pydata/xarray/pull/8976)
### Agenda
- Progress / priorities
- Good progress on merging core modules
- Still need also docs, expose API, backends optimization
- Should docs be added on same release as API is made public?
- Each docs page is intended to be merged into the existing xarray docs page of the same name
- With the exception of "Hierarchical Data", which is its own new page in the user guide
- inherited variables:
- maybe have a separate namespace (for example, `dt.cf["/path/to/inherited/variable"]` does inherited access as defined by the CF conventions)
- or `dt.ia[]` for inherited access.
- the advantage would be that we would be able to release, then add this feature later
## Apr 23rd, 2024
### Attendees
- Matt Savoie / @flamingbear
- Justus Magin / @keewis
- Tom Nicholas
- Eni Awowale / @eni-awowale
- Owen Littlejohns / @owenlittlejohns
### 60 sec updates.
- Matt: I'm just returning my attention. ops.py.
- Owen: Working on migrating most of remaining modules.
### Agenda
- Merge tarball PR (merged)
- SciPy talk?
- Ideally be able to say DataTree is in xarray main by then (July)
- Integrating backends
- https://github.com/xarray-contrib/datatree/issues/330
- Currently we create a new `CachingFileManager` for each group
- Want to only create one per file
- two options:
- Modify netcdfdatastore object to iterate over groups
- allow creating the datastore given a file manager object
- How do we test the performance of this?
- Benchmark
- Create datatree object with many nodes (but doesn't need actual data)
- Write to disk, then benchmark opening it up.
- Action items
- Tom: Dedicated issue for this? (on xarray)
- Write that benchmark first (goes with the other airspeed velocity tests)
- Modify netcdfdatastore to only create one FileManager
- Publicly expose the top-level `open_datatree` function (plus docs on datatree backends)
- Tom: Ask Kai and Max etc. if they are actually planning to do this
- Quick questions on xarray.core.common.py and testing.py.
- `from_root` kwargs to `assert_equal` → add `**options` to `assert_*`
-
## Apr 16th, 2024
### Attendees
- Matt Savoie / @flamingbear
- Tom Nicholas
- Stephan Hoyer
- Owen Littlejohns / @owenlittlejohns
- Eni Awowale
### 60 sec updates.
- Matt: working other side.
- Owen: looking at `mapping.py`
- Eni: HTML repr
- https://github.com/pydata/xarray/pull/8930
-
### Agenda
- Justus (can't join but would like to bring this up):
- type checking of xarray apparently fails because of the typing import of `DataTree`: https://github.com/pydata/xarray/issues/8768
- should we remove that for now / replace with `"DataTree"` (not sure if that works)?
- action: Matt will change tarball to stop stripping out datatree
## Apr 9th, 2024
### Attendees
- Tom Nicholas
- Matt Savoie / @flamingbear
- Ty Schlichenmeyer
### Agenda
- Discussed the original Xarray [Tracking issue (#8572)](https://github.com/pydata/xarray/issues/8572). Tom will update where we are.
- Matt will see if we can add planned work for getting the documentation another pair of eyes before the merge as well as to get a short (no pressure) blog post for both NASA and Xarray to celebrate :tada: completion.
- Talked through the depth first (PreOrderIter) and breadth first (LevelOrderIter) and discussed if there was any benefit to having both in the code base. We are going to try to replace and simplify by using LevelOrderIter only. We could not determine a performance reason for having depth first considering all of the intermediate nodes have to be created.
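
For reference, the two traversal orders discussed above can be sketched on a toy tree. The function names `pre_order` and `level_order` and the `TREE` data are made up for illustration; they are not the iterator classes from the code base.

```python
from collections import deque

# Toy tree as a dict of node name -> list of child names (hypothetical data).
TREE = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "a1": [],
    "a2": [],
    "b1": [],
}


def pre_order(tree, start="root"):
    """Depth-first: visit a node, then each child's whole subtree in turn."""
    yield start
    for child in tree[start]:
        yield from pre_order(tree, child)


def level_order(tree, start="root"):
    """Breadth-first: visit all nodes at one depth before going deeper."""
    queue = deque([start])
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(tree[node])
```

Both orders visit every node exactly once and always reach a parent before its children, so either can be used to build a tree from the root downwards.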
## Apr 2nd, 2024
### Attendees
- Tom Nicholas
- Justus Magin / @keewis
- Eni Awowale / @eni-awowale
## Mar 26th, 2024
### Attendees
- Matt Savoie / @flamingbear
- Tom Nicholas
- Owen Littlejohns / @owenlittlejohns
- Stephan Hoyer
### 60 Second updates
- Matt: Looking at mapping.py
- Owen: Resolve last few mypy issues with datatree.py PR (thanks to Matt for help there). PR is pretty much ready to go.
### Agenda
- Current [datatree.py PR](https://github.com/pydata/xarray/pull/8789). [Should we pull everything that is imported from `datatree_` out of this one?](https://github.com/pydata/xarray/pull/8789#discussion_r1538584210)
- `ops.py` should go into xarray's `generate_aggregations`? [no for now, can be cleaned up later, add an issue?]
- Priorities?
- `from xarray import datatree`
## Mar 19th, 2024 (special time)
### Attendees
- Matt Savoie / @flamingbear
- Tom Nicholas
- Owen Littlejohns / @owenlittlejohns
- Justus Magin / @keewis
### Agenda
Discussed "DataTree handles Hashables"
- The use cases seemed very infrequent.
- zarr groups are limited to strings. netCDF4 doesn't have types, but you can't create a group from an int: `TypeError: expected str, bytes or os.PathLike object, not int`
- To move forward, allow the getter to have a Hashable type, but be clear that we only use str and raise errors on non-str in DataTrees. Hopefully this solves problems with traversing and finding data, but keeps us from having terrible typing conflicts between Dataset, DataArray and DataTree.
Discussed issues with wrapping a Dataset in a "FrozenDataset" as a replacement for DatasetView which problematically inherits from Dataset.
- First suggested solution for FrozenDataset was failing because special methods aren't caught by `__getattr__`.
- Owen was looking into a metaclass solution that seemed really complicated.
- Tom, Matt and Owen decided that we should move on if Owen's next stab also failed (using a mixin).
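
The `__getattr__` limitation mentioned above is general Python behaviour and can be reproduced in a few lines. `NaiveWrapper` is a hypothetical class written for this illustration; it is not the FrozenDataset attempt itself.

```python
class NaiveWrapper:
    """Forward attribute access to a wrapped object via __getattr__ only."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # Only called when normal lookup fails on the instance and its class.
        return getattr(self._wrapped, name)


wrapped = NaiveWrapper({"a": 1, "b": 2})

# Explicit attribute access is forwarded to the dict.
keys = sorted(wrapped.keys())
explicit_len = wrapped.__len__()

# Implicit special-method lookup goes through the type, bypassing __getattr__.
try:
    len(wrapped)
    len_works = True
except TypeError:
    len_works = False
```

Because `len()`, indexing, operators and similar syntax look up special methods on the class rather than the instance, a wrapper of this kind has to define each special method explicitly (or generate them), which is what makes the mixin and metaclass approaches more involved.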
Tom showed Matt the metaprogramming in [generate_aggregations.py](https://github.com/pydata/xarray/blob/main/xarray/util/generate_aggregations.py) and the resulting [_aggregations.py](https://github.com/pydata/xarray/blob/main/xarray/core/_aggregations.py), and it sounded like he convinced himself that we might use that instead of the code currently in ops.py to apply the map_over_subtree decorator. This solution wasn't available before, as the datatree repo was separate from xarray when it was implemented. This would also allow us to fix up some of the documentation for datatree that is "good enough". Probably a good thing for Tom and Stephan to discuss before we migrate that code.
## Mar 12th, 2024
### Attendees
- Matt Savoie / @flamingbear
- Tom Nicholas
- Owen Littlejohns / @owenlittlejohns
- Eni Awowale / @eni-awowale
- Justus Magin / @keewis
- Stephan
### 60 second updates
- Matt: No progress last week.
- Have PR up for datatree.py migration. Working on FrozenDataset.
### Agenda
- Slow week with not much to report.
- Some discussion about missing API pieces to Datatree. For merging or filtering in particular.
- It was mostly agreed that maybe an advanced usage documentation with recipes for how to do common operations could be useful, but keep an eye open for opportunities to improve if obvious, repeating use cases appear.
## Mar 5th, 2024
### Attendees
- Matt Savoie / @flamingbear
- Justus Magin / @keewis
- Tom Nicholas
- Stephan Hoyer
- Eni Awowale / @eni-awowale
### 60 second updates
- Matt
- Struggling to rectify the mypy errors in [#8789](https://github.com/pydata/xarray/pull/8789). Looking for advice on which way to proceed.
- Same story for implementing Hashable for Datatree.
### Agenda
- Continue Discussion around Datatree following CF model for [scoping variables](https://cfconventions.org/cf-conventions/cf-conventions.html#_scope).
+ Justus would like a flag for behavior switching, Tom thinks that would overcomplicate things including docs and support.
+ Tom will go back to thinking and see if he can prototype something.
- Questions for implementing Hashable for Datatree led to discussion
+ Should backslash "\", slash "/", dot "." and dotdot ".." be allowed in variable names (I think this was the discussion).
+ Seemed like Hashable should work except for the Paths. Maybe it was a bad idea in Xarray? Don't think we had a decision on how to move here, but Matt will continue to think about it. Overall generally inconsequential.
- Matt will replace DatasetView with a Frozen style wrapper to Dataset.
## Feb 27th, 2024
### Attendees
- Matt Savoie / @flamingbear
- Stephan Hoyer
- Tom Nicholas
- Eni Awowale / @eni-awowale
- Etienne Schalk / @etienneschalk
### 60 second updates
- Tom
- Not much - at conference
- Matt
- Waiting on first PR, have a few others behind. https://github.com/pydata/xarray/pull/8757
### Agenda
- Recap of previous meeting
- Updates / Q's
- Deep dive?
- Data model for inherited nodes
- e.g.,
- Entirely independent?
- Shared coordinates from parent nodes?
- CF conventions: https://cfconventions.org/cf-conventions/cf-conventions.html#groups
- Key clause: "If any dimension of an out-of-group variable has the same name as a dimension of the referring variable, the two must be the same dimension (i.e. they must have the same netCDF dimension ID)."
- design questions:
- Should we be able to open any netCDF file?
- Dict contents are ambiguous when there is fallback look-up
- Could maybe use ChainMap for inheritance
- Example in h5netcdf https://github.com/h5netcdf/h5netcdf/blob/b19d4a03a4bb553312d77135c23f3eedba243899/h5netcdf/core.py#L697
- are we excluding any use-cases by adopting a netCDF data model?
- do we allow conflicts in inherited variables?
- CF conventions do not allow conflicting dimensions
- Do we want to allow conflicting coordinates/data variables?
- EDIT: Tom commented a summary of this https://github.com/xarray-contrib/datatree/issues/297#issuecomment-1967328385
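
The ChainMap idea mentioned in the design questions can be tried out with the standard library alone. The group contents below are made-up placeholders, and this sketch says nothing about how DataTree actually stores variables.

```python
from collections import ChainMap

# Hypothetical variables stored on each group (names are made up).
root_vars = {"time": "time axis on root", "lat": "latitude on root"}
child_vars = {"temperature": "data on child", "lat": "latitude on child"}

# Look-ups try the child's own variables first, then fall back to the parent.
visible = ChainMap(child_vars, root_vars)

own = visible["temperature"]  # defined locally
inherited = visible["time"]  # found via fallback to the parent
shadowed = visible["lat"]  # child definition hides the parent's

# Writes go to the first mapping only, so the parent is never modified.
visible["pressure"] = "new data on child"

# Iterating the ChainMap mixes local and inherited names together.
all_names = sorted(visible)
```

The last line shows the ambiguity raised above: once fallback look-up exists, the "contents" of a node can mean either its own variables or everything visible from it, and a conflicting name such as `lat` is silently resolved in favour of the child.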
## Feb 20th, 2024
### Attendees
- Matt Savoie / @flamingbear
- Justus Magin / @keewis
- Owen Littlejohns / @owenlittlejohns
- Stephan Hoyer
### 60 second updates
- datatree tests are not skipped in the new release
### Agenda
- Intro to the purpose of these meetings
- Update from Matt?
- High-level explanation of datatree's overall design from Tom
- One group, one `Dataset`
- Nested dictionary
- Independent nodes
- Store `Variable` objects instead of `Dataset`s
- Map API downwards
- Deep-dive into one decision / part of code (if time)
- pathlib: non-pure paths on datatree?
### Actions
- [X] Track down reason for exploding Dataset into pieces in datatree in issues.
https://github.com/pydata/xarray/issues/8747#issuecomment-1955051183
- [X] Make migrations flat, i.e. no datatree subdir in xarray.
### Ideas
- Ideas from Stephan:
- Switched OrderedDict -> dict
- Move Dataset-like hidden properties onto a dedicated object?
- idea: subtree mapping: returns the full tree with just the specified nodes (and maybe their children)
```python
dt.subtree(["/a", "/b/c"]).isel(...)
```
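
A rough sketch of the path selection such a subtree mapping would need, assuming it keeps the named nodes together with their descendants. `select_subtree` is a hypothetical helper operating on plain path strings, using `PurePosixPath` only for path arithmetic.

```python
from pathlib import PurePosixPath


def select_subtree(paths, keep):
    """Keep each path that is one of `keep` or a descendant of one."""
    keep = [PurePosixPath(k) for k in keep]
    selected = []
    for path in paths:
        p = PurePosixPath(path)
        # p.parents lists every ancestor of p, up to and including "/".
        if any(p == k or k in p.parents for k in keep):
            selected.append(path)
    return selected


all_paths = ["/a", "/a/x", "/b", "/b/c", "/b/c/d", "/b/e"]
picked = select_subtree(all_paths, ["/a", "/b/c"])
```

Whether intermediate nodes such as `/b` should be kept as empty parents is a separate design choice that this sketch does not settle.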