August 16th, 2023
CANCELED
Updates
BP: Scott Simmons to schedule public comment on charter the week of August 28th in the AM for US time zones. Will post details to this thread once confirmed day/time.
CN: from TC-Announce "The SWG proposers will hold a webinar to highlight the planned activities and answer any questions. The webinar is scheduled for 30 August 2023 at 1400 UTC / 1000 EDT"
The GeoZarr Standard Working Group (SWG) is chartered to develop a Zarr encoding for geospatial gridded data in the form of Zarr conventions (based on the approach described in the draft Zarr Enhancement Proposal 4). Zarr specifies a protocol and format used for storing Zarr arrays, while GeoZarr defines conventions and recommendations for storing multidimensional georeferenced grids of geospatial observations (including rasters). The GeoZarr SWG will also work on improving Climate and Forecast (CF) metadata conventions where necessary, particularly for alternative coordinate reference system encoding if relevant. Since January 2023, a community effort has been convening bi-weekly to take the next steps in the OGC process, with a draft charter submitted for consideration in July 2023. This presentation will provide a status of the charter, walk through core aspects of the specification as well as optional conformance classes, and finally demonstrate interoperability with common geospatial tools using example zarr stores.
Hackathon for implementation examples
July 19th, 2023
CANCELED Brianna: Many of us are attending ESIP in Vermont. I provided Scott from OGC the link to the merged charter PR. His reply:
This is a really good charter - thanks for the excellent work!
The next step is to go to Member and public comment. I can get the document to the right place and kick off that process. The comment period lasts 3 weeks, after which the proposed SWG needs to be presented to Membership and a vote initiated. For these last steps, the presentation could occur in our Closing Plenary at our next Member Meeting in Singapore on 28 September or via webinar before that date.
Please let me know if you would prefer to have this presented in Singapore or via webinar… either would happen after the member/public comment ends.
I personally have the preference to have it presented via webinar rather than the next member meeting in Singapore - please leave a comment if you have a different preference.
July 5th, 2023
CANCELED Please provide last feedback/approval for charter PR by Friday July 7th.
Agenda
CN: I would like to address a critical point concerning ESA and Spacebel: the primary/original goal of GeoZarr is providing native geospatial functionalities (serverless) in Zarr, such as visualization (akin to COG), optimized access for analysis (like OGC API coverage), etc. While it is highly beneficial to add a range of other objectives to this foundation (encoding refinements/improvements are not needed by our own customers), those joining the initiative should be aware of the original objective so it can be pursued and extended, as it remains a fundamental aspect for some participants (otherwise, why not start another format project from scratch?). Note also that some specific/advanced aspects might be explored in later versions of the spec (e.g. symbology is not a primary aspect, but I will elaborate during the project how it is the ideal companion of multiscales for visualisation). Finally, remember that the OGC SWG philosophy is based on inclusion…
BP: Christophe - we discussed some of these points extensively in the last call; I tried to capture some notes below. I think it's important for us within the SWG to identify what parts of the specification are part of the core and what are part of conformance classes. For example, the visualization piece is centered around compatibility with tools. But as there are many use cases for even viz itself, we wouldn't want to require very specific specifications, but rather have optional conformance classes available for guidance. Take the example of pyramiding/multi-scaling: not every user of zarr needs this, but there should be guidance on how to link this when a zarr convention is released. See my last comment on the PR; I suggested canceling this week's meeting in preference for any last async discussions needed. I think we are going in circles without a decision in the charter, and we need to have some of these discussions in the SWG itself.
CN: Yes, the comment is fine and I adhere to conformance classes, as it's something I have mentioned from the start (although it is generally not reported in the charter). For information, as per OGC API Common, the core of geozarr should typically be one/multiple conformance classes as well (and personally, I don't see any topic which is needed by every user of Zarr). I believe the charter is in very good shape now.
Attendance of OGC meeting, have not received much feedback.
BP: FWIW, I do not think we need to resolve every comment in the draft charter; I would assume some of these points can/should be discussed as part of the formalized SWG. Scott did say we should submit a charter with enough consensus, but it feels like there should be a path forward to discuss open points.
MH: Would like to include a slide on GeoZarr for FOSS4G conference.
Solution: leave it worded as 'possibly' and discuss as part of charter meetings?
Brianna will remove "(2) Visualisation: Simplifying the creation and display of geospatial data in web browsers without the need for complex workarounds, making geospatial information more accessible to users." and add to (1) the compatibility point of being compatible with viz software.
We know of existing work; this work needs champions. Brianna brought it up in the bi-weekly zarr meeting; need to have a follow-up with Josh Moore. Regardless, do we table these points? Wait for zarr conventions, then adopt as needed? If so, what do we do in the meantime?
CN: I think we don't need to wait. However, I would encourage each member to tackle aspects of interest one by one, then see how to manage and consider the work in a consolidated specification, potentially with categories or "conformance classes" organized by domain.
ED: Does not have to be the only group pushing multi-scale, but geozarr can participate, can reference other documents that will be worked on.
MH: Aligned with OGC API features. Somewhat two-way relationship
Same as above for 'upstream' - optimized rechunking?
RA: Would ideally be another convention; could also have multiple chunking schemes, already implemented, creating rechunked versions. This is domain independent. A stand-alone convention in zarr; fine to say this is something we want, but I don't think geozarr needs to specify it.
MH: Sounds like a conformance class - called a requirements class. In COG, you have core conformance: all of the properties that make it a COG, like table of contents, image file directory, tile chunks. Having overviews at reduced resolution is not necessary; this is optional, a separate conformance class.
RA: Working group decides what is in conformance class versus core.
ED: Multi-res and multiple chunking is interesting for many communities. If there is a way to make it abstract that would be awesome. Conformance class is within spec, or you can have core or higher level spec documents.
MH: These have been stored in STAC, the classification extension of STAC can be used.
AS: This is way too specific. The goal should be that the fewest people have to care about/adjust for a special-case GeoZarr. Maybe the netcdf API needs to understand geozarr, but not everything downstream should have to know/care.
To Do
[]
June 7th, 2023
CANCELED Due to attendance at the OGC meeting Will provide updates async Thank you for everyone who is actively working on the charter PR.
Meeting with Google Earth Engine Ingest team last week, more related to pangeo-forge, but they had interest as well in figuring out ways to work with Zarr stores in GEE.
Alex Merose to try and demo the ability of working with a zarr store directly in GEE
June 05 - next OGC meeting; Scott is happy to set up time for a discussion and socializing session, and there should be options to join remotely. Would need some draft in any state before then. Next Member meeting is Sept 25.
David: Fits as a community standard better than an SWG. Zarr v2 is a community standard. The CF baseline is not governed in OGC; CF is its own community and has its own governance structure. This is building up from CF and zarr, which feels more like a community standard. Also, if we want to go with an SWG we have to write a charter and get it approved, and only then can we start working, so it could also delay things. We need operators.
Ethan: Community can be easier and faster; on the other hand, when getting something like this going, governance issues come up. NetCDF came into existence before the community standard track.
Action Items
Amit creating an example zarr with geotransform, per the PR from David
Brianna can lead interoperability testing with the example zarr posted by Amit
David adding Sean Gillies to the thread; might have to explicitly ask Sean + Alan to submit a review.
Questions Christophe posed to Scott. We are unsure if this aligns with the expectations for an OGC Implementation Standard and whether it is suitable for an SWG. We would greatly appreciate your input on the following:
Are the objectives of the GeoZarr working group compliant with the requirements for an OGC SWG?
Is the Zarr ZEP-4 document suitable for developing an OGC Implementation Standard?
Scott's response:
In short related to Christophe’s question: yes, the objectives are definitely in line with OGC intent and the ZEP-4 document includes information that is suitable for an OGC Standard. BUT, OGC would still need a formal Standard document to reference that looks quite different from the ZEP-4 document.
Here is how I can see this working. The content of a ZEP would need to be described as clearly-defined requirements that are testable. The OGC template for a Standard [1] would then be populated from the ZEP text and requirement(s). The Standard could be very short - no need to write hundreds of pages if very few are needed. Finally, the enhancement (which I suppose would be extended, but optional functionality for Zarr) could also be described as is best for the Zarr community as an included Annex in the Standard so that Zarr users see what they are used to and OGC Standard readers also see what they expect.
Matt: The last thing people want is to be handed a pdf with the spec and asked to implement it; if you give people tools, that's more successful. That's why STAC was successful.
Come up with something like a basic spec, then make it work.
David: Build on CF conceptual model that lends itself to a nice zarr implementation.
Ryan: If we can say zarr is using the CF convention for CRS, then gdal would be able to decode it. Whereas if geozarr is something separate, it would have to be implemented at a different layer. The use case in practice is data cubes.
David: Building on top of the CF convention instead of netcdf is a core assumption we need to build on. We wouldn't go through the full standard track; we would already come with something working. What has emerged is a desire to write a zarr convention using the OGC process. I think where we left off: we would write a charter for an SWG defining what done looks like, and kick off an SWG spec development process through OGC where the result would be presented as a zarr convention through the ZEP4 process.
Matt: A community standard requires the right people, and we don't have all of them. Maybe we need more outreach to get the right people.
Matt: what is the role of STAC in all of this? Is it orthogonal to using CF?
Ryan: netCDF, COG, zarr, all different assets in a STAC collection. STAC is for searching. STAC can be useful for many data cubes. Zarr is a catalogue. Zarr <> STAC same conceptual level.
Matt: It seems a lot of people are using zarr for smaller datasets that aren't global. I still want to do that geospatial query; is the answer there that it's not a good use case?
Christophe: cloud native data store using zarrs from ESA. For each directory, there will be a zarr file including the metadata. Also a STAC file describing.
David: Something that looks/feels like nczarr, an incremental add on, bring on WKT, this is geozarr: it builds on CF, it breaks some netcdf
Amit: gdal is missing the transform when using zarrs. Band shuffling needed, CRS needed. Time series remote sensing.
Ryan: gdal writes a zarr, xarray tries to open it, but cannot because xarray uses the netcdf model. Add origin offset/transform… why are we doing it in zarr? This is a netcdf issue. xarray is built on the netcdf model, not CF.
David: wouldn't break anything in xarray, just xarray wouldn't understand.
Ryan: we have that with rioxarray.
Ethan: So many tools that already read netcdf haven't been written for the cloud. The netcdf model is just arrays/attributes; CF is the convention that tells you what attributes to put in there to identify things. Not sure where on the netcdf/cf continuum this lies. The basic stuff like coordinate variables, 2d lat/lon: there's a lot of stuff that anything from the geo world is going to know how to deal with. CF + CRS… the problem is not enough people pushing; that's what it takes, a concerted effort and a willingness to work.
Ryan: xarray cherry-picked: it can decode time, but nothing else like lat/lon. cf-python is implementing an xarray-like thing.
David: can we ask Even…
Ryan: solve gdal <> xarray problem.
Christophe: Very specific: the intent of ESA is to migrate all datasets to a standard format. I believe our expectation was not only to have a data format, but to have an alternative to geodatacubes; geozarr is an alternative to datacubes. Not all functionalities would be available, but this means holding multiple projections or scales of the data.
Ryan: We want to fix interoperability with zarr; Christophe has an opinionated idea of what a serverless geodatacube should contain. geozarr is how to put a crs in zarr.
Action Items
write a PR to describe how to put origin offset metadata in zarr; can use what gdal does; can we prototype interoperability, just around crs
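As a strawman for this action item, the metadata could look like the following. This is purely illustrative: the attribute names `crs_wkt` and `geotransform` are assumptions, not an agreed convention, and the dict simply mirrors what a Zarr array's `.zattrs` JSON might hold.

```python
# Hypothetical sketch only: CRS + origin/offset metadata stored as
# plain attributes on a Zarr array (the contents of its .zattrs).
# The attribute names are illustrative, not part of any agreed spec.
import json

zattrs = {
    # WKT string would normally come from pyproj/GDAL; truncated here
    "crs_wkt": 'GEOGCRS["WGS 84", ...]',
    # GDAL-style 6-parameter geotransform:
    # (x_origin, x_res, x_skew, y_origin, y_skew, y_res)
    "geotransform": [-180.0, 0.25, 0.0, 90.0, 0.0, -0.25],
}

# .zattrs must be valid JSON, so round-trip it to check
print(json.dumps(zattrs, indent=2))
```

A prototype reader (e.g. an xarray accessor or a GDAL driver) would only need to look for these two keys to recover the CRS and pixel grid.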
Sean Harkins / Development Seed / @sharkinsspatial
David Blodgett (USGS)
… ?
Summary
Participants debated whether to develop a spec extension or a convention, with Ryan suggesting the latter. They also discussed the potential of going through the OGC SWG process, with Christophe advocating for its formation and Brianna and Sean acknowledging the benefits of a parallel effort. They also discussed the roadmap, existing conventions, and encoding differences between GDAL and Xarray. The conversation touched upon the relationship between GeoZarr and other standards, as well as the implications of incorporating CF conventions and CRS.
Agenda
Discuss upcoming CEOS WGISS #54 meeting on April 19th, 2023.
Christophe: Is our objective to adhere to the ZEP process confirmed?
Ryan: I believe our work is better suited as a convention. It doesn't require any core changes or extensions to Zarr; it's just a set of guidelines for storing metadata and organizing data within the existing Zarr framework. We can post this proposed convention on the Zarr website and follow the process. It doesn't need a ZEP (Zarr enhancement proposal), in my opinion.
OGC Process - Confirmed?
Christophe: Is our objective to adhere to the OGC process confirmed?
Brianna: I was under the impression that going through the OGC process is a good idea since it can be done in parallel. Whatever we decide as the convention can be presented to OGC. I'm curious about others' viewpoints. From a NASA perspective, yes.
Sean: You're right, Brianna. As long as it's a parallel effort that doesn't distract from our main work, it's a fine approach. However, it may be a slow, friction-filled process. Having more existing traction and widespread adoption could act as a forcing function for acceptance.
Ryan: I believe we should discuss a roadmap. We have engaging conversations every two weeks, but I don't see a clear path for converging, aligning, and implementing our spec.
David: I'd like to point out that there's an HDF SWG, so having a Zarr SWG wouldn't be out of line with current practices. I'm not advocating for it, but it could be a way to discuss encoding data in Zarr within the OGC sphere. There's also a NetCDF SWG for encoding the semantics of NetCDF data, which is separate from the binary encoding. Additionally, there's a GeoTIFF SWG, etc.
Brianna: I agree with Ryan that we don't have much to show yet. My perspective is that while I want to pursue the OGC route and don't mind if it takes a year, I'm more focused on what we can start referencing. I've added to the agenda and invited Denis from NCZarr, but I'm not sure he received my invitation.
Christophe: With regards to Sean's comment, writing conventions on your own to impose adoption seems to me to be completely contrary to a standardization process. Indeed, we should gather all experts (including from the OGC community) as soon as possible in order to better represent all use cases and gather all the skills in our team. An OGC SWG provides the opportunity to gather other ideas and be supported by research projects (such as OGC Testbed), mailing lists, etc., so why not start the creation process immediately?
Ryan: Christophe has a point – what's the harm in starting the SWG creation process? It will take time, and some people will be pleased to know we've begun setting up the SWG. It won't happen quickly, and since we're already moving slowly without a clear process, maybe the OGC could provide the structure we need. I guess I'm in favor of that.
Brianna: I'm mindful of potential issues with OGC, which may create barriers for some people to participate. If we have someone actively contributing who's not an official member associated with a company, I'd want to help facilitate a fairer process, but only if we have someone in that situation.
Ryan: A counter proposal worth considering is that we could develop a convention ourselves and present it to OGC as a community standard, essentially saying, "Here it is. Take it or leave it."
Ryan: Perhaps we should discuss the roadmap and the specific work that needs to be done first, and then revisit this question later on in the meeting.
Ryan: Do we piggyback off CF, or do we create a new, separate standard? The existing zarr community standard doc says we put netcdf data into Zarr. The issue is that complications arose when GDAL did something differently, creating a new way of incorporating geospatial data into Zarr. Now we have two competing conventions as a result. Christophe's original is aligned with the CF approach.
Brianna: How is this related to nczarr, if at all?
Ryan: another standard, not zarr compliant
David: a binary encoding of CF, but it also includes geotiff use cases.
Ryan: We already have on file at OGC how to put netcdf data into zarr, so if the question is how to encode imagery, maybe we look at the CF conventions, not OGC. We would need a very coordinated proposal signed off by heavy hitters, which says this is what we want in CF; if it's not put in, we fork CF.
Christophe: The current draft of geozarr essentially reuses standard names from CF (other stuff is optional), so I don't know if we really want to apply the full, very substantial CF conventions. For concerns about the SWG: it's up to the chair to decide what we do with requests from external people.
Ryan: Transform concept in CF
Sean: If we specify a transform-based approach, will it break things? Can we keep parallel representations?
Ryan: We will always have zarr data encoding netcdf into zarr; do we want another route, where you don't care about a fully netcdf-compliant dataset?
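For reference, the CF route discussed above stores the CRS in a dedicated grid-mapping variable that data variables reference by name. A minimal sketch, shown as plain attribute dicts (mirroring what the corresponding `.zattrs` would hold) to avoid depending on any particular library; the variable name `crs` is an arbitrary choice.

```python
# Sketch of the CF-conventions route: a data variable points at a
# companion "grid mapping" variable via the grid_mapping attribute.
# Shown as plain dicts; attribute names follow CF conventions, the
# variable name "crs" is an arbitrary example.
temperature_attrs = {
    "standard_name": "air_temperature",
    "units": "K",
    # name of the companion variable holding the CRS description
    "grid_mapping": "crs",
}

crs_attrs = {
    # CF grid-mapping attributes for a plain lat/lon (WGS 84) grid
    "grid_mapping_name": "latitude_longitude",
    "semi_major_axis": 6378137.0,
    "inverse_flattening": 298.257223563,
}
```

A transform-based route would instead hang something like a geotransform attribute directly on the array, which is where the "parallel representations" question comes from.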
Please provide any updates/requests on https://github.com/zarr-developers/geozarr-spec/ For those interested in a co-working session I am blocking off time next Monday and Tuesday afternoon (EST) to make progress on the numerous use cases we've defined
Sean Harkins / Development Seed / @sharkinsspatial
Matthew Hanson / Element 84 / @matthewhanson
Summary
Participants discussed their progress and shared updates on tasks from the previous week. Brianna provided a small Zarr store example, and the group acknowledged that they felt stuck. Sean shared a use case focusing on browser-based visualization, while Ryan and Brianna suggested working with example Zarr stores to identify any issues. The group also discussed the GeoZarr spec, example workflows, and the need for support for rasterio's CRS model in Zarr. The participants agreed to work with the provided example Zarr stores and to build example notebooks based on these datasets. The next steps include Amit sharing sample data and all members continuing to develop example notebooks.
Ryan: Viz can bring in complicated issues. Some software understands geospatial info and some does not; for example, netcdf and xarray hold geospatial info but do not do anything with it, while GDAL must understand it. We want to achieve interoperability with both chains.
Sean: The biggest use case for rioxarray is writing out external netcdf files from an xarray dataset created in an analysis environment. Focused on writing netcdf from a source dataset.
Brianna: I prefer to send out a zarr store, people trying to use it, see what breaks.
Ryan: the geozarr spec is written down, but not implemented.
Brianna: We provided a netcdf-based zarr; let's get a tiff-based zarr out there and have people try to work with it.
Ryan: example workflow
Can I open data with Xarray / Zarr and then pass it to rioxarray? Can we generate the rioxarray spatial_ref variable from a generic dataset?
In memory rep of geospatial and then serialization
Could we make a zipped geozarr that is functionally identical to a single COG?
Let's try to actively work with example zarr posted.
Attendees: Brianna R. Pagán (NASA GES DISC) Aaron Friesz (NASA LP DAAC) Anderson Banihirwe (CarbonPlan) David Blodgett (U.S. Geological Survey) Alexey Shiklomanov (NASA GSFC ESDIS) Christophe Noel (Spacebel) Scott (OGC)
Summary
The team discussed the OGC presentation, the dependency mapping, and collaborations with other ecosystems. Scott provided insights on the process of bringing a spec into OGC as a standard and estimated the time required. The team exchanged ideas on the challenges of transforming source formats into Zarr, including encoding, data models, and implementation in various software. They also discussed the aspects that GeoZarr should address or recommend, such as multiple resolutions, projections, and dimensional optimizations. They identified individuals to explore QGIS/Python, R (stars), and other ecosystems, and shared links to demo Zarr catalogs. To-dos include compiling use cases, revisiting a GitHub issue, and exploring QGIS/Python and R ecosystems.
Agenda
OGC presentation/chat by Scott
The OGC model offers two ways of bringing a spec in as a standard.
Standard working group process: use OGC resources, GitHub, etc.
Spec fully developed in community, then apply to become an OGC community standard. External community still owns it. This works well for existing efforts.
The SWG path is faster in that there is no need to have the spec done, or widely implemented.
Three OGC members have to support the charter; the charter is written and goes through an approval process.
For something already cooking, the whole process could be done in 8-9 months.
For community standard about the same time of lead time.
Can be done in parallel, publicly in GitHub. Once everyone feels comfortable enough, we can submit. Scott endorses this approach.
David: domain alignment? Scott: discussions about open geo data cube
Scott has spent a lot of time writing these up and is willing to help us with discussions on writing the charter.
Christophe: The challenge is bringing the original format to what zarr adds as functionality, which is an n-dimensional array, and deciding what features we want from those applications. When you use existing tools you can easily transform source formats into zarr, but if you do a simple conversion, you do not get more than the original format.
Anderson: In Python ecosystem, ongoing effort kerchunk. What can we pull from this?
Not exclusive to this effort
We can add this to the mapping
Alexey: How do we encode? Unidata/netcdf versus GDAL. We need a spec that fits in both paradigms. Is there any major use case we're missing?
David: push back on GDAL; what about level-2 swath?
Alexey: Let's punt on L2 swath data (irregular pixel sizes); that's not a problem that anyone has solved. But, GeoZarr absolutely should support just linear affine transforms ("rotations") in the way that most GDAL drivers do.
Since Xarray is so popular, maybe we start by prototyping an Xarray extension for "virtual variables" that parse CRS information and a 6-parameter affine transform (stored as a parameter), just to have something to play with.
Implementation in GDAL seems a lot harder…but maybe a GDAL person
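Alexey's "virtual variables" idea can be sketched without any xarray machinery: given a GDAL-style 6-parameter affine transform stored as metadata, coordinates are computed on demand rather than materialised as full arrays. A minimal sketch; the function name and parameter layout are illustrative, with the parameter order following GDAL's GetGeoTransform().

```python
# Sketch: derive world coordinates from a GDAL-style 6-parameter
# affine transform stored as metadata, instead of storing full
# coordinate arrays. Parameter order follows GDAL's GetGeoTransform():
# (x_origin, x_res, x_skew, y_origin, y_skew, y_res).
import numpy as np

def pixel_to_world(gt, rows, cols):
    """Map pixel (row, col) centres to world coordinates."""
    x0, dx, rx, y0, ry, dy = gt
    c = np.asarray(cols, dtype=float) + 0.5  # pixel centres
    r = np.asarray(rows, dtype=float) + 0.5
    x = x0 + c * dx + r * rx
    y = y0 + c * ry + r * dy
    return x, y

# Example: 0.25-degree global grid, north-up (no rotation/skew)
gt = (-180.0, 0.25, 0.0, 90.0, 0.0, -0.25)
x, y = pixel_to_world(gt, rows=[0, 0], cols=[0, 1439])
# x -> [-179.875, 179.875], y -> [89.875, 89.875]
```

An xarray "virtual variable" would wrap exactly this computation behind lazily generated coordinate arrays, so only the six numbers need to live in the store.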
David: Yes, two data models. Common data structure. A real dichotomy between CRS and WKT.
Even: In geotiff it's binary numbers; in netcdf it's text. You can define a mapping to EPSG codes and names. Do we invent a mapping?
David: You need to be able to map these parameters to the software. The EPSG registry is a registry of projection models, some of which are supported CRSs.
Alexey: how to read netcdf into a raster data model. You can start with opening dataset in xarray, everything is an additional attribute, applying it in
David: The spec might support more than one implementation; it's about supporting certain functions. If you have a dataset with too many cells for CF, you represent it as a GDAL-style origin offset. There is a risk in accepting 'optional' fields. Do we expect people who have geotiffs to follow variable naming?
Even: More an issue for the conversion tool; that is going to be specific to each dataset. Some might have geotiff bands for time, or each time step in a separate file. That's not the goal of this spec; you should support n-dimensional arrays. There are two different sides of GDAL: one is the historical 2-d raster model, but more recently there is the multi-dimensional model, strongly modeled on netcdf and hdf5; few gdal drivers implement both models. QGIS/rasterio both use the 2-d classic version, not multi-dimensional. For 3-d, gdal has n-dim info and n-dim translate.
Christophe: geozarr should at least recommend how I can find multiple resolutions or multiple projections. For the bands, you need to map multiple files into dimensions of bands, or a different convention, but this would help with knowing what to do when accessing the data.
NOTE: I would expect GeoZarr to address / recommend such aspects:
* How to describe/access multiple related variables, with heterogeneous coordinates (e.g. children Datasets)
* How to describe/access multiple resolutions of the data (multiscales draft may help)
* How to encode/describe for optimised Map Tiling support
* How to describe/access subsets only available in some resolutions (e.g. an index of the dimensions / resolution)
* How to describe/access multiple projections (index ?)
* How to describe/access multiple dimensional optimisations (rechunking)
* How to describe/access typical EO products (e.g. multispectral band recommended as a dimension of the array)
* How to describe/access time series that have not been normalized (e.g. footprints not aligned) - David: multi-member ensembles
Who can steer other ecosystems (R, QGIS, Julia, Javascript)
Even: If you have 2-d zarr array this should work with gdal/QGIS
What happens with 3-d, a zarr with time? Too many and it would explode.
(Alexey) NOTE from internal EOSDIS discussion:
EOSDIS does inter-conversion between Proj strings and the CF specification
HDF_EOS-2 (built on HDF-4 format) is based on GCTP projection specifications
To Do
Christophe to compile some concrete use cases he has access to, optimized on some dimensions.
Will make a more explicit request and formalize some presentations via the suggestions of Ryan:
Scott or someone from OGC could explain the SWG process (currently RSVP'ed as maybe)
Matthew Hanson could explain how the STAC process has worked (declined this AM)
Ryan could explain how the Zarr spec and convention process works
Christophe could give an overview of where GeoZarr stands today and what are some of the challenges / open questions that have to be addressed (requested for this to be at a later date)
High level overview of zarr (Ryan):
Zarr was created in 2015; Alistair created it and the Python implementation. Both zarr and n5 (not HDF5) arose as simpler, more "hackable" data formats
Many native implementations of zarr in different languages
Shifted to a community model after Alistair exited; Sanket is the funded community director
Currently working on a V3 spec: creates a formal mechanism to extend the spec
Also working on conventions, don't require changes to spec, more for changing variables, downstream applications, more lightweight than a spec
Is geozarr going to be a convention or spec? Right now it's akin to a convention
High level overview of existing geozarr spec repo (Christophe):
Initially involved with a young company, Constellar, under contract to provide a cloud-native database
Originally working with COGs, needed the extra dimensions of light spectrum and time; some work to extend COGs with time, but it didn't have the speed (as it was not really an N-D array but rather a series of arrays)
Zarr fulfilled the capabilities we were looking for, but libraries like xarray, rasterio, gdal etc. all needed some geospatial metadata.
Christophe based the geozarr conventions on xarray conventions, extended with additional features, because he wanted integration of symbology
Adopted CF conventions for the standard names, allows client to know exactly what coordinates refer to
Christophe created conventions very close to netcdf
The interest is for geozarr to be extended to include geotiff capabilities
Thinking this should not be restricted to CF
What are the high-level objectives of this committee:
Focus on use cases that we want enabled by this work
Not going to convince communities how to encode data; the best thing we can do is work out interoperability between tools
If we land on a convention here? How do we get community consensus. Two primary conventions:
GDAL/WKT
Grid mapping/CF conventions
Here is how geozarr will use this convention so that all software reading zarr can identify if that convention is present
Very few people have implemented CRS math; in practice it always goes through proj/wkt, which is not supported by CF
Affine transforms / polynomial affine transforms are very complex, hard to implement
Important to solve these workflow issues, but not the focus of this group
Let's brainstorm!
As a data scientist I want to open zarr in gdal and get crs
As a data scientist I want to write data in xarray then open in gdal and get crs
Right now there are conflicting standards: xarray uses CF conventions, gdal uses an ad hoc approach
Global models where cell geometry becomes important, bounds concept from CF
Remote sensing, or DEMs with high res, storing individual cell coordinates that become larger than the dataset itself, and you need an origin offset
I want to be able to subset swath/L2/irregular grids
Cross-domain interoperability?
Geospatial viz in the browser. With v2 we had to build custom workarounds; v3 could benefit from extensions or the geozarr spec
As a GIS analyst I want to be able to read a zarr store into ArcGIS/QGIS with correct spatial representation
Proposal: let's do 5 minutes of silent writing of use cases
Template:
- As a [type of User], I need to [do something] with Zarr using [tool X]
As a geospatial analyst, I would like to have support for rectilinear affine transforms (already supported by GDAL)
As a geospatial analyst, I would like to have support for ground control points (already supported by GDAL)
As a climate scientist, I need to open CMIP6 data from AWS stored in Zarr format with Xarray, reproject the data to web Mercator, and export a COG for visualization purposes. The Zarr data were transcoded directly from NetCDF using CF conventions with a grid_mapping variable and no WKT.
As a data scientist at a remote sensing company, I want to build harmonized datacubes of Sentinel / MODIS / Landsat data and store them in Zarr on S3. I need all my tools to understand the CRS of the data cube.
As a publisher of integrated climate and landscape data products, I need one set of conventions to house both 2D coordinate variable low granularity (e.g. climate) data and highly granular (e.g. elevation) data, so my client software and the infrastructure we use to work with both can be less complex and more understandable for all involved.
As a client/tool, I want to discover dimensions, coordinates, and variables. Dimensions shall include (if relevant) the spectral band or wavelength and provide an unambiguous description (e.g. standard name) to interpret the coordinates.
As a GeoTIFF provider, I want to be able to encode in GeoZarr my set of GeoTIFFs (e.g. one file per resolution) and encode the various resolution/band arrays in a standard way
As a client/tool, I want to discover if downscaled versions (overviews) of the data are available
As a client/tool, I want to discover if rechunked (dimension-optimised) instances of the data are available (e.g., a time-series-optimised rechunked array)
As a client/tool, I want to discover a composition of arrays (e.g. subarrays being temporal instances or adjacent regions)
As a user/client, I want to be able to retrieve a subset of the data.
As a client/tool, I want to discover a set of visual portrayals of the geospatial data and the relevant symbology.
As a Map Viewer, I want to be able to discover the GeoZarr product and display the data on a map with the right projection and be able to browse the other dimensions (time, elevation, bands, wavelengths)
As a Catalogue, I want to be able to provide the necessary information about the GeoZarr product so it can be displayed on a map
As a tiler or frontend developer, I can access a zarr archive with reduced resolution overviews stored with a standard CRS and level convention.
As an xarray user I'd like flexible CRS enabled indexes to be able to optimally request sharded data with spatial operations.
As a frontend developer, I would like to be able to develop browser-based tools to visualize data stored in a zarr store by taking full advantage of the zarr geospatial/CF conventions
As a geospatial analyst, I want to analyze remote sensing / climate datasets (that follow the NetCDF/Xarray data model) alongside "traditional" raster and vector datasets in a variety of projections in my desktop GIS client (QGIS, ArcGIS Pro).
QGIS is almost entirely based on GDAL
As a climate scientist working in climate impacts, I want to aggregate gridded climate/remote sensing data (that follow the NetCDF/Xarray data model) to political units (e.g., counties, states, countries) distributed as spatial polygons.
If I'm coding in Python: Geopandas, Xarray, rasterio
If I'm coding in R: ncdf4/RNetCDF; stars/terra (bindings for GDAL); sf (bindings for OGR)
As a scientist using remote sensing data, I want to be able to use the latest, most frequent, and highest-resolution satellite data (which are only distributed as L2 swaths) in my spatial analyses (that also involve "normal" raster data in GeoTIFF and vector data for my site).
As a GIS specialist, I can open an S3/HTTP URL pointing to a Zarr dataset in the cloud and interact with it the same way I would with a COG
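Several of the client/tool use cases above (downscales, rechunked instances, compositions) reduce to discoverability from group metadata alone. A hypothetical sketch of what that could look like in a group's `.zattrs`; the `multiscales`/`datasets` key names here are assumptions modeled on existing multiscale conventions in the Zarr ecosystem, not agreed GeoZarr spec text:

```python
import json

# Hypothetical sketch: key names are assumptions modeled on existing
# multiscale conventions, not final GeoZarr spec text.
group_zattrs = {
    "multiscales": [
        {
            "name": "tas",
            # Each entry points at a child group/array holding one level;
            # level "0" is full resolution, higher levels are downscales.
            "datasets": [{"path": "0"}, {"path": "1"}, {"path": "2"}],
        }
    ]
}

# A client can discover the available overview levels from the JSON alone,
# without reading any chunk data.
levels = [d["path"] for d in group_zattrs["multiscales"][0]["datasets"]]
print(levels)  # ['0', '1', '2']

# The layout round-trips cleanly as .zattrs JSON.
assert json.loads(json.dumps(group_zattrs)) == group_zattrs
```

The same pattern (a small list-of-dicts attribute at the group level) could serve the rechunked-instance and composition use cases as well; the open question for the group is which key names to standardize.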
Software/repos needed for this interoperability
gdal
netcdf-java
xarray
rasterio / rioxarray (are these subsumed by GDAL?)
As a community, let's diagram the geospatial Zarr stack and dependency chain for each ecosystem (Python, R, Julia, Rust, QGIS) - Ryan will kick this off
Brianna and Christophe (thanks!) to ask for a presentation from OGC
PR to geozarr repo that would propose what is needed to encompass more of the above use cases (David + Sean, @ Evan in the PR)
Roundtable Intros: names, roles, motivation for joining the call
Finding a home for the geozarr-spec repo:
Leave as is
Move to community org, Ryan offered zarr-developers
Other ideas?
Discussion
Christophe: ESA is on board with this collaboration
David: bringing in front of netcdf standards working group
Alexey: if NASA leads we will just slow things down
Ryan: We cannot assume the spec we all agree on will be what it is today. We should not assume that what we align on will be CF conventions; there is a vocal community, coming from the GIS raster world, that doesn't want that. We should get the netCDF working group from OGC involved; we will confront the culture clash between the raster GIS and netCDF climate communities.
David: Spatial-first encoding for multi-dimensions; momentum has taken over. The coverage implementation folks are harder to get on board.
Ryan: Align with OGC somehow; we don't need to follow the full OGC playbook. Another standards group, Zarr itself, has done a lot of work with ZEPs (see link below). Part of this was developing a process for extensions, implementors, etc.
Hailiang: We have ZEP3; version 3 has lots of new features: irregular chunking, sharding, etc.
Ryan: We can do this orthogonal to the Zarr version. Ryan introduced ZEP as a convention for how a domain will store metadata. It doesn't need to be a Zarr extension; we just need to say how we are going to store metadata in the container, and this would apply to ZEP2 or ZEP3, etc. It can be separated from that process.
Hailiang: any existing examples?
Ryan: many ad-hoc conventions out there, xarray, microscopy community, but no formal process
David: hesitant to bring it to a SWG that's not the netCDF SWG. Zarr as a binary carrier is a standalone spec, same as TIFF and HDF5, that we build on top of with conventions.
Alexey: I'm on the GeoDataCube SWG; they've only had a few meetings so far, so it's in pretty early stages. They are definitely aware of the NetCDF data model; I'm pretty sure not listing it there is just a minor oversight, but I'll ensure it's there during the next meeting. The idea is to avoid the overhead of another SWG; this could be an agile way forward, pointing to an existing SWG.
Ultimately intent is to be an OGC spec and can be moved to the opengeospatial repo
Chartering a new Standards Working Group (SWG) under OGC
What does this entail?
Timeline?
Who is involved?
Discussion:
David: It doesn't have to be slow, it can be rapid. Recommend drafting something in zarr community, get ready to roll.
Ryan: Zarr + OGC question: is the community standard process the right one? It's a lot more lightweight; the standard is developed outside of OGC. What are the pros/cons of a community standard vs. the OGC standards working group process?
Alexey: Nothing formally at NASA keeping us from using this, we already are
Christine: having OGC stamp on it helps.
Brianna: zarr is not an official NASA approved data format, but we're still moving forward with use
Matt: proponent of the community standard approach, based on his own personal experience with STAC. Adoption is the most important part: have people use whatever we come up with. We do this by supporting open source implementations that can utilize it. That is more important than where it lives.
David: why not community standard? Because you want to be in OGC architecture.
Ryan: there are clear political advantages to becoming an OGC standard; consensus that this is our long-term goal, but don't let progress be blocked by it. Convene implementors; we as a group move forward doing the hard work, which is discussing what is correct from a technical point of view.
Alexey: To clarify - If the focus here is on the metadata and not the internal Zarr storage, we could use whatever we want for the storage backend, right, via fsspec-reference/Kerchunk-like workflows? I.e., We are specifically targeting the JSON, not the underlying storage? (I think this is similar to what Ryan just said).
Christophe: Yes but I think the extension must consider the aspects specific for S3 backend.
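To illustrate Alexey's point that the conventions target the JSON metadata rather than the storage: a minimal Kerchunk-style reference set (version 1 layout) maps Zarr keys either to inline JSON metadata or to byte ranges in an existing file. The bucket name, offsets, and lengths below are made up for illustration:

```python
import json

# Minimal Kerchunk-style reference set (version 1 layout); the bucket,
# offsets, and lengths are made up for illustration.
refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        # Metadata (including any GeoZarr conventions) lives inline as JSON...
        "tas/.zattrs": json.dumps({"grid_mapping": "crs"}),
        # ...while chunk keys point at [url, offset, length] byte ranges in
        # the original file, so the storage backend is untouched.
        "tas/0.0": ["s3://example-bucket/tas.nc", 8192, 4096],
    },
}

# A reader resolves a chunk key to a byte range in the source file.
url, offset, length = refs["refs"]["tas/0.0"]
print(url, offset, length)
```

Because the GeoZarr attributes sit in the `.zattrs` entries, the same conventions apply whether the bytes live in native Zarr chunks or are referenced out of a NetCDF/HDF5 file on S3, which is the workflow Alexey describes.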
Who are the implementors for this being a success:
NASA
Brianna, Christine
Unidata / NetCDF
Ethan Davis and Dennis Heimbigner
Zarr:
Ryan
GDAL:
Even Rouault - invite? Planet contracts with him.
Planetary Computer:
Tom - invite, Matt will invite.
Open Geospatial Data Cube:
Kirill
STAC:
Panoply?
Robert Schmunk, ncZarr
Action items
Ryan will coordinate with Christophe to transfer the repo
Brianna will send invitations to the implementers listed above to get involved
Brianna will schedule a bi-weekly call, but sometimes what's really needed is to all get in a room together.
15 May 2024: Issues with calendar invites were discussed, leading to a suggestion to use a dedicated page for meeting details. Brianna presented the first iteration of the GeoZarr validator, covering grid mapping, time bounds, and validation models. The progress on conforming to the OGC template and updates on the multi-scale PR were also reviewed, with discussions on implementation challenges with Julia and webmapping models versus geotiff overview models.
17 April 2024: Ethan and Brianna discussed Zarr compression compatibility with NCZarr, with Ethan following up on documentation indicating potential support. Brianna and Christophe updated on the branch refactoring to comply with the OGC template. Ryan provided updates on limitations of netCDF-style coordinates for large geospatial rasters and the potential implementation of new indexing types in xarray. A scheduled discussion on Pangeo/NASA funding and GeoZarr specification updates were also noted.
3 April 2024: The creation of a new branch to conform with the OGC template was discussed, focusing on distinguishing conventions from extensions within the Zarr framework. Ryan suggested linking to the PR for the Zarr spec implementing the ZEP. A need for an example Zarr file was reiterated. Discussions included the development of a tool for checking compliance with GeoZarr specifications and related NASA funding and ecosystem support.
6 March 2024: Ethan presented on the OGC netCDF SWG. The organizational structure of GeoZarr was discussed, focusing on mapping GeoZarr to Zarr and the alignment with CF conventions. Tile Matrix and compression algorithm support were reviewed, including the need to specify Zarr version compatibility. Interoperability issues with Julia and sparse array support in Zarr were also discussed.
21 February 2024: Christophe and Brianna were appointed co-chairs for the OGC subgroup. Updates included repo formatting, Zarr sprint summaries, and interoperability issues. Discussion covered GeoTiff to GeoZarr PR, Tile Matrix PR, and the move away from consolidated metadata. A working session was scheduled for March 4, and action items were assigned.
24 January 2024: Charter approved in November 2023. Nominees for co-chairs were discussed. Zarr sprint logistics and focus groups were outlined, with an emphasis on practical demos and use cases. Integration of Zarr with STAC catalogs and issues with CF encoding of CRS were highlighted. Discussions on achieving round-trip CRS compatibility between Python and GDAL, and handling pyramiding in QGIS, were also covered.
21 June 2023: The feedback on the OGC meeting attendance was limited. There was a debate on whether it was necessary to resolve every commit in the draft charter before moving ahead. Discussion on the draft OGC Charter focused on whether to include visualization in the spec, etc.
24 May 2023: There was a discussion on the open PR for the draft OGC Charter, and participants were encouraged to provide feedback. The meeting with Radiant Earth Foundation regarding the Cloud-Native Geospatial Foundation was also discussed, as was the upcoming meeting with the Google Earth Engine Ingest team.
26 April 2023: The addition of GeoTransform as implemented by GDAL was discussed, and the need for an example Zarr file was highlighted. The discussion then moved to the Standards Working Group OGC Draft, with debates on whether to move forward with a community standard or an SWG. The importance of interoperability and community involvement was highlighted.
12 April 2023: The questions posed to Scott from OGC regarding the alignment of GeoZarr objectives with the requirements for an OGC SWG were reviewed. Discussions ensued on how to create the OGC standard from the ZEP-4 document, with Scott suggesting a process for translating the ZEP into a Standard. The role of STAC in relation to the use of CF was also discussed, along with the challenges associated with interoperability between GDAL and xarray.
29 March 2023: Debated developing a spec extension or a convention for GeoZarr, and discussed the potential of going through the OGC SWG process. Also touched upon a roadmap, and encoding differences between GDAL and Xarray.
1 March 2023: Discussed progress and shared updates on example Zarr stores to identify any issues, focusing on browser-based visualization and the need for support for rasterio's CRS model in Zarr. Agreed to continue developing example notebooks based on the provided datasets.
15 February 2023: Scott Simmons' presentation of OGC standardisation processes. Debated challenges of transforming source formats into Zarr, including encoding, data models, and implementation in various software. Explored aspects that GeoZarr should address or recommend.
1 February 2023: Explored high-level objectives for the committee, such as use cases, compatibility, and community consensus; brainstormed use cases and software/repos needed for interoperability.
19 January 2023: Discussions for moving forward a community-led geozarr spec, transferring the repo, and organizing bi-weekly calls and in-person meet-ups.