<!-- Docs for making Markdown slide deck on HackMD using Revealjs
https://hackmd.io/s/how-to-create-slide-deck
https://revealjs.com
-->

### A streaming data pipeline with Harmonized Landsat Sentinel-2 (HLS) :satellite: imagery

<small> Lunch and Learn presentation at NASA IMPACT<br>
Friday 30 Jun 2023, 20:00-21:00 (UTC) </small>

_by **[Wei Ji Leong](https://github.com/weiji14)** & [Ryan Avery](https://github.com/rbavery) @ [DevelopmentSeed](https://developmentseed.org)_

<!-- Put the link to this slide here so people can follow -->
<small> P.S. Slides are at https://hackmd.io/@weiji14/2023mlpipeline</small>

---

### :floppy_disk: Typical way of producing ML-ready geospatial :earth_africa: data

1. Download imagery from cloud to local disk
2. Pre-process and chip into smaller size (e.g. 512x512)
3. Store as GeoTIFF/NPY/TFRecords/etc

| Pros | Cons |
|--|--|
| Can be handled by GIS/geospatial expert | Need to reprocess data for different input size |
| Often faster to load into neural network model | Loss of geospatial metadata if not careful |

---

### :cloud: Cloud-native way of creating ML-ready geospatial :earth_asia: data

1. Access data from Spatiotemporal Asset Catalog (STAC)
2. Data-proximate pre-processing and chipping on the fly
3. Load tensors directly into GPUs in cloud environment

| Pros | Cons |
|--|--|
| Can experiment with different input sizes as hyperparameter | ML engineer has to manage data pipeline |
| Save on local storage and file management | More data loading latency if off-region |

---

### Demo

Data pipeline for Harmonized Landsat Sentinel-2 + Burn scar masks

<img src="https://hackmd.io/_uploads/ryn96bhu2.png" alt="Harmonized Landsat Sentinel-2 image on left, burn scar mask on right" style="margin:0px auto;display:block" width="50%"/>

Follow along at https://nasa-impact.github.io/ml-pipeline/docs/01_datapipelines_with_torchdata.html

---

### Take home messages

- Data: Publish as Spatiotemporal Asset Catalogs (STAC)
- Model: Look into STAC ML-model standard: https://github.com/stac-extensions/ml-model
- Learn: About scalable geospatial machine learning!

---

### Links

- Repo: https://github.com/NASA-IMPACT/ml-pipeline
- Jupyter Book: https://nasa-impact.github.io/ml-pipeline
- Contact:
  - weiji@developmentseed.org (@weiji14)
  - ryan@developmentseed.org (@rbavery)
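
---

### Appendix: chip size as a hyperparameter

The "chipping on the fly" idea from the cloud-native slide can be sketched in plain Python. This is a minimal illustration, not the code used in the demo: `chip_windows` is a hypothetical helper, and the 3660x3660 pixel tile size is an assumption about a 30 m HLS tile. It only computes pixel windows, so changing the chip size needs no reprocessing of stored files.

```python
from typing import Iterator, Optional, Tuple

# (row_offset, col_offset, height, width) in pixel coordinates
Window = Tuple[int, int, int, int]


def chip_windows(
    height: int, width: int, chip_size: int = 512, stride: Optional[int] = None
) -> Iterator[Window]:
    """Yield windows that tile an image of shape (height, width).

    Hypothetical helper for illustration. Partial chips at the right and
    bottom edges are dropped so every chip has the same shape.
    """
    if stride is None:
        stride = chip_size  # non-overlapping chips by default
    for row in range(0, height - chip_size + 1, stride):
        for col in range(0, width - chip_size + 1, stride):
            yield (row, col, chip_size, chip_size)


# Assumed tile size: 3660 x 3660 pixels
for size in (512, 256):
    n_chips = sum(1 for _ in chip_windows(3660, 3660, chip_size=size))
    print(f"chip_size={size}: {n_chips} chips")
# chip_size=512: 49 chips
# chip_size=256: 196 chips
```

Each window could then be used to slice an array that is read lazily from cloud storage, so only the pixels for the requested chips are fetched.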