Exercise document: Geospatial Python workshop 2022-11-21

# Exercise document: Geospatial Python workshop 2022-11-21 [Link](https://carpentries-incubator.github.io/geospatial-python/) to the lesson materials Note that the exercises start from Episode 5: **Access satellite imagery using Python**. For the intro section there is no exercise. - Tobias: - Andrew: - Erik: - Robbin: - Max: - Julia: - Jan: - I do: - Jelle: - Luca: - Rui: - Petra: - Christina: - Mirko: - Raul: - Anniek: |Name| | |---|---| |Tobias| | |Andrew| | |Erik| | |Robbin| | |Max| | |Julia| | |Jan| | |Ido| | |Jelle| | |Luca| | |Rui| | |Petra| | |Christina| | |Mirko| | |Raul| | |Anniek| | ## Day 1 ### Access satellite imagery using Python #### Exercise: Discover a STAC catalog Let’s take a moment to explore the Earth Search STAC catalog, which is the catalog indexing the Sentinel-2 collection that is hosted on AWS. We can interactively browse this catalog using the STAC browser at [this link](https://stacindex.org/catalogs/earth-search#/). 1. Open the link in your web browser. Which (sub-)catalogs are available? 2. Open the Sentinel-2 L2A COGs collection (link ending in `sentinel-s2-l2a-cogs`), and select one item from the list. Each item corresponds to a satellite “scene”, i.e. a portion of the footage recorded by the satellite at a given time. Have a look at the metadata fields and the list of assets. What kind of data do the assets represent? #### Exercise: Search satellite scenes using metadata filters Search for all the available Sentinel-2 scenes in the `sentinel-s2-l2a-cogs` collection that satisfy the following criteria: - intersect a provided bounding box (use ±0.01 deg in lat/lon from the previously defined point); - have been recorded between 20 March 2020 and 30 March 2020; - have a cloud coverage smaller than 10% (hint: use the `query` input argument of `client.search`). How many scenes are available? Save the search results in GeoJSON format. #### Exercise: Downloading Landsat 8 Assets In this exercise we put in practice all the skills we have learned in this episode to retrieve images from a different mission: [Landsat 8](https://www.usgs.gov/landsat-missions/landsat-8). In particular, we browse images from the [Harmonized Landsat Sentinel-2 (HLS) project](https://lpdaac.usgs.gov/products/hlsl30v002/), which provides images from NASA’s Landsat 8 and ESA’s Sentinel-2 that have been made consistent with each other. The HLS catalog is indexed in the NASA Common Metadata Repository (CMR) and it can be accessed from the STAC API endpoint at the following URL: https://cmr.earthdata.nasa.gov/stac/LPCLOUD. - Using `pystac_client`, search for all assets of the Landsat 8 collection (`HLSL30.v2.0`) from February to March 2021, intersecting the point with longitude/latitute coordinates (-73.97, 40.78) deg. - Visualize an item’s thumbnail (asset key `browse`). ### Read and visualize raster data #### Exercise: find the axes units of the CRS What units are our data in? See if you can find a method to examine this information using `help(crs)` or `dir(crs)` #### Exercise: set the plotting aspect ratio Plotting a satellite image can lead to a "stretched" image. Let’s visualize it with the right aspect ratio. You can use the [documentation](https://xarray.pydata.org/en/stable/generated/xarray.DataArray.plot.imshow.html) of `DataArray.plot.imshow()`. ## Day 2 ### Vector data in Python #### Exercise: Import Line and Point Vector Datasets Using the steps above, load the waterways and groundwater well vector datasets (`data/status_vaarweg.zip` and `data/brogmwvolledigeset.zip`, respectively) into Python using `geopandas`. Name your variables `waterways_nl` and `wells_nl` respectively. Answer the following questions: - What type of spatial features (points, lines, polygons) are present in each dataset? - What is the CRS, give the EPSG ID. - What is the total extent (bounds) of each dataset? - How many spatial features are present in each file? - #### Exercise: Investigate the waterway lines Now we will take a deeper look in the Dutch waterway lines: `waterways_nl`. Let’s visualize it with the `plot` function. Can you tell what is wrong with this vector file? ### Crop raster data with rioxarray and geopandas #### Exercise: select the underground water monitoring wells in the cropped raster extent `brogmwvolledigeset.zip` contains the underground water monitoring wells of the whole Netherlands. They are not all relevant to us. Can you 1) select all wells within the extent of the cropped raster `raster_clip`, and 2) plot the selected wells on top of `raster_clip`? Hint: think of how we selected the crop fields. #### Exercise: Select the raster data around the wells. Now we have the buffer polygons around the groudwater monitoring wells, i.e. wells_buffer. Can you crop the image `raster_clip` to the buffer polygons? Can you visualize the results of cropping? #### Exercise: Select the raster data around the waterways In the previous episode we have corrected the waterway vector data and saved it in `waterways_nl_corrected.shp`. Can you select out all the raster data within 100m around the waterways, and visualize the results? ### Raster Calculations in Python #### Exercise: Explore NDVI Raster Values It’s often a good idea to explore the range of values in a raster dataset just like we might explore a dataset that we collected in the field. The histogram we just made is a good start but there’s more we can do to improve our understanding of the data. 1. What is the min and maximum value for thCan you crop the image `raster_clip` toAre thbuffer ere missing values? 2. Plot a histogram with 50 bins instead of 8. What do you notice that wasn’t clear before? 3. Plot the `ndvi` raster using breaks that make sense for the data. #### Exercise: Compute the NDVI for the Texel island Data are often more interesting and powerful when we compare them across various locations. Let’s compare the computed NDVI map with the one of another region in the same Sentinel-2 scene: the [Texel island](https://en.wikipedia.org/wiki/Texel), located in the North Sea. 1. You should have the red- and the NIR-band rasters already loaded (red and nir variables, respectively). 2. Crop the two rasters using the following bounding box: `(610000, 5870000, 630000, 5900000)`. Don’t forget to check the shape of the data, and make sure the cropped areas have the same CRSs, heights and widths. 3. Compute the NDVI from the two raster layers and check the max/min values to make sure the data is what you expect. 4. Plot the NDVI map and export the NDVI as a GeoTiff. 5. Compare the distributions of NDVI values for the two regions investigated. ## Day 3 ### Calculating Zonal Statistics on Rasters #### Exercise: Calculate zonal statistics for zones defined by `ndvi_classified` Let’s calculate NDVI zonal statistics for the different zones as classified by `ndvi_classified` in the previous episode. Load both raster data-sets and convert into 2D `xarray.DataArray`. Then, calculate zonal statistics for each `class_bins`. Inspect the output of the `zonal_stats` function. ### Parallel raster computations using Dask #### Chunk sizes matter We have already seen how COGs are regular GeoTIFF files with a special internal structure. Another feature of COGs is that data is organized in “blocks” that can be accessed remotely via independent HTTP requests, enabling partial file readings. This is useful if you want to access only a portion of your raster file, but it also allows for efficient parallel reading. You can check the blocksize employed in a COG file with the following code snippet: ```python import rasterio with rasterio.open(visual_href) as r: if r.is_tiled: print(f"Chunk size: {r.block_shapes}") ``` In order to optimally access COGs it is best to align the blocksize of the file with the chunks employed when loading the file. Open the blue-band asset (“B02”) of a Sentinel-2 scene as a chunked `DataArray` object using a suitable chunk size. Which elements do you think should be considered when choosing the chunk size?