This guide helps researchers work with Earth Observation (EO) data using CSC's computing resources. It gives an overview of the available options, so that it is easier to decide whether CSC has suitable services for your EO research. It also helps you find the right data and tools for raster-based EO tasks. If you are interested in the fundamentals of EO, please check the resources and further reading section.
What are the benefits of using EO data?
For working with EO data there are in general three main options:
CSC's options do not fit neatly into this categorization; rather, they combine features from all of them. CSC computing services provide a lot of computing power and storage space, and they are free of charge for Finnish researchers for academic or educational use.
At CSC, EO data can be processed and analyzed using the Puhti supercomputer or a virtual machine in the cPouta cloud service. Few other EO services match Puhti's computing capacity, in terms of both available cores and amount of memory. Both Puhti and cPouta also have GPU resources, which are especially useful for deep learning use cases.
Puhti also has a lot of pre-installed applications, so it is an environment that is ready to use. cPouta virtual machines are similar to commercial cloud offerings, where all set-up is done by the end-user. In both services, it is only possible to install tools that support Linux.
Only some Finnish datasets are available at CSC, so working with CSC computing services often requires downloading EO data from other services; see the list of EO data download services. The local storage available in Puhti and cPouta is ~1-20 TB; more space is available from the Allas object storage.
Using CSC computing services is technically somewhat demanding: it requires basic Linux skills and the ability to use a scripting language or command-line tools. Supercomputers and virtual machines involve several specific concepts, so it takes a few hours to get started. The new Puhti web interface makes getting started considerably easier by providing several graphical tools. It also has RStudio and JupyterLab for an easy start with R or Python.
What to consider when choosing data:
Name | Max resolution, m | Revisit time, days | Years of operation | Open data |
---|---|---|---|---|
Optical, multispectral | ||||
ESA, Sentinel-2 | 10-60 | 5 | 2015-> | Yes |
NASA, Landsat | 15-120 | 8 | 1972-> | Yes |
ESA, Proba-V | 100-1000 | 1-2 | 2013-> | Yes |
Airbus, Spot | 1.5 | - | 1986-> | No |
Planet, several satellites | 0.5-5 | - | 2009-> | No |
DigitalGlobe, WorldView | 0.3-30 | - | 1997-> | No |
Airbus, Pleiades | 0.3-0.5 | - | 2012-> | No |
Hyperspectral | ||||
NASA, MODIS | 250-1000 | 1-2 | 1999-> | Yes |
NASA, EO-1 | 10-30 | - | 2000-2017 | Yes |
Radar, SAR | ||||
ESA, Sentinel-1 | 5 | 6 | 2014-> | Yes |
CSA, Radarsat | 1-100 | 24 | 1995-> | Yes |
TanDEM-X/TerraSAR-X | 0.25-40 | - | 2010-> | No |
ICEYE | 0.5-2.5 | 1 | 2018-> | No |
LiDAR | Footprint size | |||
NASA, ICESat2 | 13 | 91 | 2019-> | Yes |
NASA, GEDI | 25 | - | 2018-> | Yes |
!!! default "EO database"
A database of all EO missions and instruments can be found in the [CEOS EO handbook database](http://database.eohandbook.com/database/instrumenttable.aspx).
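The resolutions in the table above largely determine how much data a study produces. As a rough sketch (not a CSC tool), the uncompressed size of a raster can be estimated from the area, resolution, number of bands and sample size. The function name, the 100 km study area, the four 16-bit bands and the five-day interval below are illustrative assumptions; real products are usually compressed and delivered in their own tiling schemes.

```python
import math


def raster_size_bytes(width_m, height_m, resolution_m, bands, bytes_per_sample=2):
    """Estimate the uncompressed size of a raster covering width_m x height_m."""
    cols = math.ceil(width_m / resolution_m)
    rows = math.ceil(height_m / resolution_m)
    return cols * rows * bands * bytes_per_sample


# Hypothetical study area: 100 km x 100 km, four 10 m bands stored as 16-bit integers.
size = raster_size_bytes(100_000, 100_000, 10, bands=4)
print(f"{size / 1e9:.1f} GB per acquisition")

# Hypothetical five-year time series with one acquisition every 5 days.
scenes = 5 * 365 // 5
print(f"{scenes * size / 1e12:.2f} TB for {scenes} acquisitions")
```

An estimate like this can help decide whether the data fits in local storage or whether object storage is needed.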
Commercial data is usually available from the data provider, while open datasets often have several copies in different services. It is usually best to use data stored close to where it is processed, to speed up the download. For bigger downloads, it is important that the service has a download API.
!!! default "STAC"
Many data providers offer a SpatioTemporal Asset Catalog (STAC) of their datasets. These catalogues help in finding available data based on time and location, with the possibility of multiple additional filters, such as cloud cover and resolution. The [STAC Index](https://www.stacindex.org/) provides a good overview of available catalogues from all over the world. The STAC Index page also includes many resources for learning and utilizing STAC. Also check out CSC's [examples for utilizing STAC from Python](https://github.com/csc-training/geocomputing/blob/master/python/STAC).
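To make the time, location and cloud cover filtering more concrete, the sketch below builds the JSON body of a STAC API item search request using only the Python standard library. It does not contact any catalogue. The function name, the collection id and the bounding box are illustrative assumptions, and the `query` filter is an optional STAC API extension that not every catalogue supports.

```python
import json


def build_stac_search(collection, bbox, start, end, max_cloud_cover=None, limit=100):
    """Build a JSON body for a STAC API item search (POST to <catalogue>/search)."""
    body = {
        "collections": [collection],
        "bbox": list(bbox),            # [west, south, east, north] in WGS84 degrees
        "datetime": f"{start}/{end}",  # closed interval of RFC 3339 timestamps
        "limit": limit,
    }
    if max_cloud_cover is not None:
        # Optional "query" extension; check that the catalogue supports it.
        body["query"] = {"eo:cloud_cover": {"lt": max_cloud_cover}}
    return body


search = build_stac_search(
    collection="sentinel-2-l2a",    # assumed collection id, check the catalogue
    bbox=(24.5, 60.1, 25.3, 60.4),  # roughly the Helsinki area
    start="2022-06-01T00:00:00Z",
    end="2022-08-31T23:59:59Z",
    max_cloud_cover=20,
)
print(json.dumps(search, indent=2))
```

In practice a client library would normally build and send this request for you; the sketch only shows which pieces of information a search consists of.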
Some Finnish EO datasets are available locally at CSC. STAC for data at CSC is coming in 2023.
SYKE/FMI, Finnish image mosaics: mosaics are available for Sentinel-1, Sentinel-2 and Landsat, for several time periods per year. Some of them are available in Puhti, but not all. FMI also has a STAC catalog for these mosaics.
European Space Agency's SciHub provides all main products for Sentinel 1, 2 and 3 worldwide. It requires free registration. A large part of the data is in the "Long term archive" and cannot be downloaded directly, but needs to be requested. Download is limited to 2 concurrent processes per user.
FinHub is the Finnish national mirror of SciHub; other national mirrors also exist. It covers Finland and the Baltics and offers Sentinel 2 L1C (but not L2A) and Sentinel 1 SLC, GRD and OCN products. It requires its own registration. FinHub has neither concurrent download limitations nor a "Long term archive".
!!! default ""
Both of the above provide a similar Graphical User Interface (GUI) and Application Programming Interface (API) for accessing the data. You can also use, for example, the sentinelsat tool for downloading data from ESA open access hubs. See also the CSC examples for SciHub and FinHub data download. A STAC catalog will be available for SciHub data during 2023.
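When scripting bulk downloads from SciHub, the limit of 2 concurrent processes per user has to be respected. One possible way to do this in Python is a thread pool with two workers, sketched below. The `fake_fetch` function and the product ids are stand-ins made up for this example; a real script would call an actual download function in their place.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_DOWNLOADS = 2  # per-user limit described above for SciHub


def download_all(product_ids, fetch, max_workers=MAX_CONCURRENT_DOWNLOADS):
    """Call fetch(product_id) for each id, with at most max_workers running at once.

    Results are returned in the same order as product_ids.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, product_ids))


# Stand-in for a real download call: it only records how many calls overlap.
stats = {"active": 0, "peak": 0}
stats_lock = threading.Lock()


def fake_fetch(product_id):
    with stats_lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.05)  # pretend to transfer data
    with stats_lock:
        stats["active"] -= 1
    return f"{product_id}.zip"


# Hypothetical product ids, for illustration only.
ids = ["S2A_0001", "S2A_0002", "S2B_0003", "S2B_0004", "S2A_0005"]
files = download_all(ids, fake_fetch)
print(files, "peak parallel downloads:", stats["peak"])
```

For services without such a limit, such as FinHub, `max_workers` could be raised, keeping in mind the network and disk capacity of the machine doing the download.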
USGS EarthExplorer provides many different US-related datasets, as well as worldwide Landsat mission datasets. It requires free registration. Data can be browsed and downloaded via the web interface and bulk download. USGS is the prime provider of the new Landsat Collection 2 data.
NASA Earthdata provides, among many others, the harmonized Landsat 8 and Sentinel-2 dataset. It requires free registration, and download is possible via the web interface and bulk download.
Amazon Web Services (AWS) open EO data is a collection of worldwide EO datasets provided by different organizations, including Landsat and Sentinel. Some of the data can be downloaded only on a "Requester Pays" basis. The situation is changing all the time; currently Sentinel-2 L2A Cloud-optimized GeoTIFFs are available for free, also via STAC.
Microsoft Planetary Computer: Sentinel, Landsat, MODIS etc. A STAC catalog is available. The service is currently in preview.
Google Cloud Storage open EO data, including Sentinel-2 L1C and Landsat Collection 1 data. Data can be downloaded from here, for example, with FORCE.
Terramonitor provides pre-processed, analysis-ready Sentinel-2 data from Finland for 2018-2020. Commercial service.
!!! default "Other geospatial datasets"
To find other geospatial datasets, check out [CSC open spatial dataset list](https://research.csc.fi/open-gis-data).
You can find more information about geocomputing using CSC resources and how to get started on the CSC geocomputing pages, including links to creating user accounts and all other practical information.
FORCE - Framework for Operational Radiometric Correction for Environmental monitoring. All-in-one processing engine with CLI for EO image archives. FORCE example for Puhti.
GDAL (OGR) - Geospatial Data Abstraction Library. Collection of command-line tools for accessing and transforming geospatial data. The tools are relatively fast and require few computational resources. GDAL supports reading data directly from the Internet or object storage. GDAL is included in many other tools for data reading and writing. GDAL example for Puhti.
Julia - Puhti's Julia installation does not include any geospatial packages, but they can be installed by the user. JuliaGeo provides an overview of packages for geospatial data.
Matlab - you can run Matlab jobs on Puhti conveniently from your own computer's Matlab installation.
Orfeo Toolbox (OTB) - offers a wide variety of applications, from ortho-rectification and pansharpening all the way to classification, SAR processing, and much more. Orfeo Toolbox is available as CLI, GUI and via Python interface.
QGIS - a very widely used GUI for working with spatial data, with limited multispectral image processing capabilities. It offers batch processing and a Python interface, and is used, for example, for visualization, map algebra and other raster processing. Many plug-ins are available; for EO data processing, check out the QGIS Semi-automatic classification plugin.
R - Puhti's R installation includes a lot of geospatial packages, including several useful for EO data processing.
Sen2Cor - a command-line tool for Sentinel-2 Level 2A product generation and formatting.
Sen2mosaic - a command-line tool to download, preprocess and mosaic Sentinel-2 data.
SNAP - ESA Sentinel Application Platform. Tool for processing of Sentinel data (+ support for other data sources). GUI, CLI (Graph Processing Tool, GPT) and Python interfaces. SNAP GPT example for Puhti.
If you need further applications, you can ask CSC to install them for you.
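GDAL's ability to read data directly from the Internet or object storage, mentioned in the list above, is based on its virtual file systems: a path prefixed with `/vsicurl/` is read over HTTP(S) and a path prefixed with `/vsis3/` is read from S3-compatible object storage. The helper below is a small illustrative sketch, not part of GDAL; the function name and the example URLs are made up.

```python
from urllib.parse import urlparse


def to_gdal_vsi_path(url):
    """Translate a URL into a GDAL virtual file system path.

    http(s) URLs map to /vsicurl/, s3://bucket/key maps to /vsis3/bucket/key,
    and anything else (for example a local path) is returned unchanged.
    """
    parsed = urlparse(url)
    if parsed.scheme in ("http", "https"):
        return f"/vsicurl/{url}"
    if parsed.scheme == "s3":
        return f"/vsis3/{parsed.netloc}{parsed.path}"
    return url


# Hypothetical locations, for illustration only.
print(to_gdal_vsi_path("https://example.com/data/scene.tif"))
print(to_gdal_vsi_path("s3://my-bucket/sentinel/scene.tif"))
print(to_gdal_vsi_path("/scratch/project/scene.tif"))
```

The resulting path can be given to GDAL command-line tools such as `gdalinfo`, or to other tools that read data through GDAL. Reading from object storage usually also requires credentials and, for non-AWS services, endpoint settings to be configured.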
One example of advanced usage of EO data is machine learning. If you are interested in the topic, you can find a lot of examples in the CSC machine learning with spatial data course materials. For practical guidelines, see also the CSC machine learning guide.
Below is a list of alternative EO processing services that might be useful, when a lot of data is required and downloading it all to CSC might not be feasible. All of them include the main open datasets: Sentinel, Landsat, MODIS etc.
Google Earth Engine is a processing platform that requires registration, but is free of charge for research users. It can be accessed via a browser and has worldwide analysis-ready data available (browse the catalogue). In general, JavaScript is used on the platform, but Python and R are also supported. Check out GEE's tutorials. Note that Google Cloud Storage might be needed to export large datasets.
Microsoft Planetary Computer. For computing, it offers JupyterHub together with Dask Gateway; both CPUs and GPUs are available. The service is currently in preview.
Data and Information Access Services (DIAS): cloud-based Virtual Machines (VMs), dedicated bare-metal servers, containers, operating system and software images. Specialized in EO, user support available. Commercial services.
Sentinel Hub. Several different APIs. Commercial service.
Commercial clouds: Amazon, Google Cloud and Microsoft Azure all provide virtual machines and other processing services, and all of them have some local data; see the links above. No EO-specific support.
If you are interested in using CSC services for your EO research, please make yourself familiar with the services:
You can find all the ways to get help from CSC specialists via the CSC contact page. We are happy to help with technical problems around our services and are open to suggestions on which software should be installed on Puhti, what kind of courses should be offered, or which materials and examples should be prepared.
This guide was developed in cooperation with the Finnish Environment Institute, SYKE, as part of the Geoportti project.
If you are interested in the fundamentals of EO, take a look at these excellent resources:
Further reading:
!!! default "Raster data format"
Most EO data is available in <a href="https://towardsdatascience.com/the-ultimate-beginners-guide-to-geospatial-raster-data-feb7673f6db0" aria-label="Towards data science guide to raster data">raster format</a>. The most common file formats are <a href="https://en.wikipedia.org/wiki/GeoTIFF" aria-label="GeoTiff data format description">GeoTiff</a> and <a href="http://giswiki.org/wiki/GeoJPEG2000" aria-label="GeoJPEG2000 data format description">GeoJPEG2000</a>.