Intake-STAC Design Doc

Authors: Joe Hamman, Scott Henderson

Date: Started February 26, 2019

TL;DR: intake-stac is an intake plugin for accessing datasets described using the SpatioTemporal Asset Catalog (STAC) specification.

Background and High-level Goals

STAC is a simple catalog format that is finding wide adoption in the remote sensing world, especially for datasets stored in the Cloud.
Intake is a lightweight Python package for finding, investigating, loading, and disseminating data.

The goal of intake-stac is to facilitate lazy loading of remote sensing datasets stored on remote servers into xarray datasets for analysis with Python.

Intake-stac should:

ingest STAC catalogs, providing a mapping from STAC to intake catalog formats (see https://github.com/sat-utils/sat-stac)
support queries against STAC metadata (see https://github.com/sat-utils/sat-api and https://github.com/sat-utils/sat-search)
support reading data using the intake-xarray plugin
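
A minimal sketch of what the STAC-to-intake mapping could look like, using only the standard library. It turns each loadable asset of one STAC item into an intake-style entry (driver / args / metadata). The function name `item_to_entries`, the media-type lookup table, and the example item are all assumptions for illustration, not intake-stac or sat-stac API; the driver names are meant to correspond to intake-xarray drivers.

```python
# Assumed lookup from asset media type to an intake-xarray driver name.
DRIVER_BY_MEDIA_TYPE = {
    "image/tiff": "rasterio",
    "application/x-netcdf": "netcdf",
}


def item_to_entries(item):
    """Return {entry_name: entry_dict} for every loadable asset of a STAC item."""
    entries = {}
    for key, asset in item.get("assets", {}).items():
        # Drop media-type parameters such as "; profile=cloud-optimized".
        media_type = asset.get("type", "").split(";")[0].strip()
        driver = DRIVER_BY_MEDIA_TYPE.get(media_type)
        if driver is None:
            continue  # skip assets with no known driver (e.g. thumbnails)
        entries[f"{item['id']}.{key}"] = {
            "driver": driver,
            "args": {"urlpath": asset["href"]},
            "metadata": dict(item.get("properties", {})),
        }
    return entries


# Hypothetical item; the id, hrefs, and properties are made up for illustration.
example_item = {
    "type": "Feature",
    "id": "scene-001",
    "properties": {"datetime": "2018-06-01T10:00:00Z"},
    "assets": {
        "red": {"href": "s3://example-bucket/scene-001/red.tif", "type": "image/tiff"},
        "thumbnail": {"href": "s3://example-bucket/scene-001/thumb.jpg", "type": "image/jpeg"},
    },
}

entries = item_to_entries(example_item)
print(sorted(entries))
```

Assets without a known driver are skipped rather than raising, on the assumption that a catalog should still load when it contains thumbnails or sidecar files.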

Design Philosophy

lightweight
let other tools do the heavy lifting (intake, xarray, s3fs, etc.)
provide for anticipated/common query patterns
- what types of data are available in this bounding box and time period?
  - narrow by source name ('MODIS')
  - narrow by data type ('MSLA')
- which sources provide data of a given type?
- which types are provided by a given source?
- coincidence
- data quality and density
Should Intake-STAC access appear automatically in a Pangeo JupyterHub? If so, what would it look like?
where Intake-STAC functionality ends, suggest where to go next:
- provide FAQ pointers to stable external resources (Jake's book, Scott's Binder notebooks, etc.)
Interop with DataCite DOIs
- Makes data discoverable through Google
- First goal: Don't replicate existing
- Second goal: Take advantage of this project
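
The bounding-box / time-period query pattern listed above, narrowed by source name, can be sketched in plain Python over a list of STAC items. This is only an illustration of the filter logic: the `search` function, the use of an `eo:platform` property as the source name, and the four example items are assumptions, and a real implementation would more likely delegate to sat-search or a STAC API.

```python
from datetime import datetime


def bboxes_intersect(a, b):
    """True when two [west, south, east, north] boxes overlap.

    Assumption: neither box crosses the antimeridian.
    """
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def parse_utc(timestamp):
    """Parse an RFC 3339 timestamp with a trailing 'Z' into an aware datetime."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def search(items, bbox, start, end, source=None):
    """Return items that overlap bbox, fall within [start, end], and match source."""
    t0, t1 = parse_utc(start), parse_utc(end)
    matches = []
    for item in items:
        props = item["properties"]
        if not bboxes_intersect(item["bbox"], bbox):
            continue
        if not t0 <= parse_utc(props["datetime"]) <= t1:
            continue
        # Assumption: the source name is stored in the "eo:platform" property.
        if source is not None and props.get("eo:platform") != source:
            continue
        matches.append(item)
    return matches


# Hypothetical items, made up for illustration.
items = [
    {"id": "modis-a", "bbox": [-123, 45, -121, 47],
     "properties": {"datetime": "2018-06-01T00:00:00Z", "eo:platform": "MODIS"}},
    {"id": "landsat-b", "bbox": [-122.5, 46, -120, 48],
     "properties": {"datetime": "2018-07-15T00:00:00Z", "eo:platform": "landsat-8"}},
    {"id": "modis-c", "bbox": [10, 10, 12, 12],
     "properties": {"datetime": "2018-06-10T00:00:00Z", "eo:platform": "MODIS"}},
    {"id": "modis-d", "bbox": [-122, 46, -121.5, 46.5],
     "properties": {"datetime": "2019-03-01T00:00:00Z", "eo:platform": "MODIS"}},
]

area = [-122.6, 45.9, -121.4, 46.9]
in_2018 = search(items, area, "2018-01-01T00:00:00Z", "2018-12-31T23:59:59Z")
modis_2018 = search(items, area, "2018-01-01T00:00:00Z", "2018-12-31T23:59:59Z", source="MODIS")
print([item["id"] for item in in_2018])     # ['modis-a', 'landsat-b']
print([item["id"] for item in modis_2018])  # ['modis-a']
```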

Scope questions

Do we limit to data stored on AWS/GCS/Azure? Current STAC implementations are limited compared to archives on government servers: https://github.com/radiantearth/stac-spec/blob/master/implementations.md
Will intake-stac support transformations from government servers to archives of convenience (e.g. COG or Zarr on S3)?
Element 84 has put together a STAC proxy for CMR search, which catalogs NASA's entire archive. CMR queries can return STAC catalogs, but the STAC version needs updating; maybe this should be incorporated directly into CMR? https://github.com/Element84/cmr-stac-api-proxy

Technical Design

what we want:

# converting to intake catalog will enable intake tools such as gui browser
cat = intake.StacCatalog('landsat8-aws.json')

# or leverage existing tools such as sat-api/sat-search
cat = intake.StacSearch(collection='landsat8', bbox=[], datetime='2017/2019')
cat.filter(bands=['red','green','nir'], cloudcover=20)

# need to share STAC catalogs with colleagues / reproduce work later
cat.to_file('my-catalog.json') 

# would be great to explore metadata as geopandas geodataframe
df = cat.to_dataframe()

# for archives on gov servers or legacy formats
cat.to_archive_of_convenience(s3bucket, awscredentials)

# currently sat-utils allows data download, but not lazy loading via xarray:
ds = cat.to_dask()

# default plots with geoviews?
cat.plot.thumbnails()
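
As a rough idea of what could sit behind the wished-for `to_dataframe()`, the sketch below flattens STAC items into one flat record per item. The helper name `items_to_records` and the example items are hypothetical. The resulting list of dicts is the kind of input a pandas or geopandas constructor could take, but that step is left out here so the example stays standard-library only.

```python
def items_to_records(items):
    """Flatten STAC items into one dict per item: id, properties, and bbox corners."""
    records = []
    for item in items:
        record = {"id": item["id"]}
        record.update(item.get("properties", {}))
        west, south, east, north = item["bbox"]
        record.update(west=west, south=south, east=east, north=north)
        records.append(record)
    return records


# Hypothetical items, made up for illustration.
example_items = [
    {"id": "scene-001", "bbox": [-123.0, 45.0, -121.0, 47.0],
     "properties": {"datetime": "2018-06-01T10:00:00Z", "eo:cloud_cover": 12}},
    {"id": "scene-002", "bbox": [-122.5, 46.0, -120.0, 48.0],
     "properties": {"datetime": "2018-07-15T10:00:00Z", "eo:cloud_cover": 35}},
]

records = items_to_records(example_items)
clear_scenes = [r["id"] for r in records if r["eo:cloud_cover"] <= 20]
print(clear_scenes)  # ['scene-001']
```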

Currently, getting remote sensing time series into xarray datasets takes many manual functions (even with intake): https://nbviewer.jupyter.org/github/scottyhq/pangeo-binder-test/blob/master/notebooks/3-intake-stac-landsat.ipynb
challenges:
- the STAC spec is changing rapidly, so intake-stac versions should match STAC spec versions (currently 0.6.1)
- STAC item assets can be any format (not just COG or Zarr)
  - what to do with complex NASA HDF data?
- I suspect we will need a 'plugin' system for subcatalogs that define options for every satellite / sensor (e.g. landsat8.yml, sentinel2.yml, modis.yml, sentinel.yml). This is what will specify defaults and parameters for the to_dask() function.
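
One way the per-sensor 'plugin' idea might work is sketched below: a registry of defaults per sensor, merged with whatever the user passes at load time. The dictionary stands in for the contents of parsed files such as landsat8.yml; every key and value in it, and the `resolve_options` helper, are hypothetical placeholders rather than a proposed schema.

```python
import copy

# Stand-in for parsed per-sensor YAML files; all values are illustrative only.
SENSOR_DEFAULTS = {
    "landsat8": {"bands": ["red", "green", "nir"], "chunks": {"x": 2048, "y": 2048}},
    "sentinel2": {"bands": ["B04", "B03", "B08"], "chunks": {"x": 1024, "y": 1024}},
}


def resolve_options(sensor, **overrides):
    """Return the sensor's default load options with user overrides applied on top."""
    if sensor not in SENSOR_DEFAULTS:
        raise ValueError(f"no defaults registered for sensor {sensor!r}")
    # Deep copy so callers cannot mutate the shared registry through the result.
    options = copy.deepcopy(SENSOR_DEFAULTS[sensor])
    options.update(overrides)
    return options


default_opts = resolve_options("landsat8")
custom_opts = resolve_options("landsat8", bands=["red"])
print(custom_opts["bands"])   # ['red']
print(default_opts["bands"])  # ['red', 'green', 'nir']
```

The options returned here are what a `to_dask()` call could consume, so that users only need to name the sensor and override the few parameters they care about.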
