# EDAM-Geo workathon (= a few long days spent working, like a marathon)
###### tags: `EDAM-geo` `nicest2`
Attendees are expected to follow our code of conduct: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html
All content is publicly available under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/
### Logistics
- Information about the hackathon: https://indico.neic.no/event/221/
- Location: room 317 in Geology building, University of Oslo, Sem Sælands vei 1 (map), Blindern, 0371 Oslo (Norway)
- For those online:
https://uio.zoom.us/j/67381104332?pwd=MW5sbUVIYVFyTzVlMlpFRW5pd1RFQT09
Meeting ID: 673 8110 4332
Passcode: 976636
Documentation on how to use Zoom can be found here:
https://www.uio.no/english/services/it/phone-chat-videoconf/zoom/
## Welcome
## Presentation
- Anne Fouilloux, University of Oslo, Norway. Working on Earth System Modelling (NICEST2)
- Olga Silantyeva, University of Oslo, Norway, working on hydrological software development
- Jean Iaquinta, University of Oslo, Norway, also working with ESMs, previously in the INES project (Infrastructure for the Norwegian Earth System Model) and now as part of NICEST2
- Hamish Struthers, Linköpings Universitet. Climate modeling support for SNIC.
- Matúš Kalaš, University of Bergen, Norway. Working in ELIXIR on bioscientific infrastructure (main work = EDAM), but also with other communities
## Who is busy when (UTC+2)
Anne - Thursday and Friday 11:00 -12:00
Hamish - 10am - 10:30am Thu. Friday I have a meeting 13:30 - 14:00
Matus - 12-13 Wed
Olga - 12.15-13.00 lunch seminar, Friday, 06-05-22
## Meet-up
- Wed afternoon: 13:30 UTC+2
- Thursday: 9:30 UTC+2, 13:15 UTC+2
- Friday: 09:30 UTC+2
## Presentation of EDAM
[EDAM presentation](https://docs.google.com/presentation/d/1t0HYYrAz13XQIIuvYMAB2dXCggqgt1uGCc0O0tXQ0_Y)
- [Main EDAM github repository](https://github.com/edamontology/edamontology)
- [EDAM geo ontology](https://github.com/edamontology/edam-geo)
We work in WebProtégé: https://webprotege.stanford.edu/#projects/69591619-4eda-4f03-9e7f-65b213038fe1/edit/Classes
EDAM Popovers extension for Chrome-based browsers: https://chrome.google.com/webstore/detail/edam-popovers/amboeicaknkjjpffmdgkopjfljneolca
## Notes
**What is the difference between GCMs and ESMs?**
Jean:
The difference between GCMs and ESMs is that the former generally represents physical processes occurring in the atmosphere, ocean, cryosphere, and interactions between these domains. In addition to representing oceanic and atmospheric dynamics, *ESMs also include information on biogeochemical cycling in terrestrial and marine ecosystems* and allow for these ecosystems to have feedbacks on the circulation. Therefore, all ESMs are GCMs, but not all GCMs are ESMs (from https://aslopubs.onlinelibrary.wiley.com/doi/pdf/10.1002/lob.10113)
Digital Earth?
https://joint-research-centre.ec.europa.eu/scientific-activities-z/digital-earth_en
# Day-2 and Day-3
## Annotate tools
### GIS Tool
Repository: https://github.com/expertanalytics/rasputin
Short Description: The main usage is conversion of raster DEMs into simplified triangular meshes
License: GNU GPLv3
API access: python3, shell
Type of tool: Command-line tool, Library (python), web-application
Relations: dependencies: CGAL and Boost, Meshio, pybind11, Armadillo, date, Catch
Data Dependency: GlobCov 2009 (http://due.esrin.esa.int/page_globcover.php) or Corine (Europe) (https://land.copernicus.eu/global/products/) datasets
Docker: https://github.com/expertanalytics/rasputin/blob/master/Dockerfile
Example: https://github.com/expertanalytics/rasputin#minimal-example
Application: Hydrologic modelling, Avalanche forecasting
Has topic:
- Computational Geometry (https://en.wikipedia.org/wiki/Computational_geometry)
- GIS (https://en.wikipedia.org/wiki/Geographic_information_system)
- Grid/Mesh (triangular irregular network TIN) (https://en.wikipedia.org/wiki/Triangulated_irregular_network),
- Data structure (https://en.wikipedia.org/wiki/Data_structure)
- Polygon
- Spatial representation
- Vector based representation
- Surface/terrain representation
- LandUse (Land types, vegetation types)
- Mesh resolution, DEM resolution
- Sun position
- Insolation, shading
- Visualization -> TODO: fix EDAM visualization to be more generic, web-based mesh visualization
- Universal Transverse Mercator coordinate system (UTM) (https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system)
- map projection (https://en.wikipedia.org/wiki/Map_projection)
- latitude, longitude, elevation (node coordinates)
- node, edge
- UTC (https://en.wikipedia.org/wiki/Coordinated_Universal_Time)
Operations:
- tessellation/triangulation (added to EDAM)
- landuse assignment (general in EDAM: Data integration or rather Data transformation. See EDAM Bioimaging at https://webprotege.stanford.edu/#projects/2ce704bf-83ed-4d2e-985f-84c4841fac71/edit/Classes?selection=Class(%3Chttp://edamontology.org/operation_Geometrical_transform%3E))
- mesh coarsening
- shading computation (extra feature: astronomical sun exposure with down to 1-hour resolution)
- sun position calculation (as part of the previous)
- Avalanche prediction based on NVE forecast and Sun position (tentative, output only in Visualization tool) (possible application; this avalanche analysis is a work-in-progress extra feature, hardcoded in Rasputin)
- Do I need to go into the details of the methods? Rasputin uses the CGAL 2D Delaunay triangulation (incremental randomized Delaunay triangulation algorithm; Devillers, 1999). For mesh coarsening it uses the CGAL half-edge collapse algorithm based on Lindstrom and Turk (1998) and Lindstrom and Turk (1999), which coarsens the mesh while minimizing the deformations according to the given cost function.
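
To make the raster-to-mesh conversion concrete, here is a minimal sketch of the simplest possible triangulation: every square of four neighbouring DEM nodes is split into two triangles. This is not what Rasputin does (it uses CGAL Delaunay triangulation and half-edge collapse coarsening, as noted above); the function name `grid_to_mesh`, the `cell_size` parameter, and the sample elevations are hypothetical and only illustrate the data structures (nodes and faces) involved.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, elevation)
Face = Tuple[int, int, int]         # indices into the list of points


def grid_to_mesh(dem: List[List[float]], cell_size: float = 1.0) -> Tuple[List[Point], List[Face]]:
    """Turn a raster DEM (rows of elevations) into a triangle mesh.

    Hypothetical helper: each raster value becomes a mesh node, and each
    square of four neighbouring nodes is split into two triangles.
    """
    n_rows, n_cols = len(dem), len(dem[0])
    points = [
        (col * cell_size, row * cell_size, dem[row][col])
        for row in range(n_rows)
        for col in range(n_cols)
    ]
    faces = []
    for row in range(n_rows - 1):
        for col in range(n_cols - 1):
            i = row * n_cols + col  # first corner of the current square
            faces.append((i, i + 1, i + n_cols))               # first triangle
            faces.append((i + 1, i + n_cols + 1, i + n_cols))  # second triangle
    return points, faces


# Made-up 3 x 3 elevation raster with 25 m cells
dem = [
    [10.0, 12.0, 11.0],
    [13.0, 15.0, 14.0],
    [12.0, 16.0, 18.0],
]
points, faces = grid_to_mesh(dem, cell_size=25.0)
print(len(points), "nodes,", len(faces), "triangles")
```

A regular split like this keeps every raster value, so the mesh is as large as the raster; the coarsening step listed above exists precisely to reduce the number of triangles where the terrain allows it.
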
Data:
- input:
  - digital elevation map (DEM) in GeoTIFF
  - area specification ("shapefile") in WKT
  - landuse file (land cover map), for example, http://due.esrin.esa.int/page_globcover.php
    - user chooses none or one of the reference land use/cover maps (GlobCover in GML, Corine in GeoTIFF +metadata?)
  - optionally link to NVE avalanche forecast data
- output:
  - TIN mesh of the defined resolution (EDAM: TIN under Irregular spatial grid) in HDF5, can optionally contain also the land cover data
  - Shades Timeseries of each mesh cell (tentative)
Formats:
- input:
  - GeoTIFF (DEM),
  - wkt (shapefile) (https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry) -> need to add to EDAM,
  - gml (https://en.wikipedia.org/wiki/Geography_Markup_Language) need to add "seeAlso" to EDAM,
  - xsd (landuse file (GlobCover dataset, Corine dataset)) (https://en.wikipedia.org/wiki/XML_Schema_(W3C)) "seeAlso" needs to be added to EDAM
- output:
  - HDF5 (h5) (https://en.wikipedia.org/wiki/Hierarchical_Data_Format) need to add "seeAlso" in EDAM,
  - xdmf (https://en.wikipedia.org/wiki/XDMF) -> need to add to EDAM
  - log (tool diagnostics)
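
Since WKT is proposed as a new EDAM format, a small illustration of what such an area specification looks like may help. The sketch below formats a ring of coordinates as a WKT `POLYGON` string; the helper name `polygon_to_wkt` and the coordinates are made up for illustration and are not taken from Rasputin.

```python
from typing import List, Sequence, Tuple


def polygon_to_wkt(ring: Sequence[Tuple[float, float]]) -> str:
    """Format a ring of (x, y) coordinates as a WKT POLYGON string."""
    coords: List[Tuple[float, float]] = list(ring)
    if coords[0] != coords[-1]:
        coords.append(coords[0])  # WKT polygon rings must be closed
    body = ", ".join(f"{x} {y}" for x, y in coords)
    return f"POLYGON (({body}))"


# Hypothetical area of interest given as (longitude, latitude) pairs
area = [(10.0, 60.0), (10.5, 60.0), (10.5, 60.3), (10.0, 60.3)]
wkt = polygon_to_wkt(area)
print(wkt)
```

WKT is a plain-text geometry representation, which is worth keeping in mind when annotating it next to the binary formats in the list above.
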
### ESMval
https://www.esmvaltool.org/
https://github.com/ESMValGroup/ESMValTool
License: Apache License v2 (http://www.apache.org/licenses/)
Description: The Earth System Model Evaluation Tool (ESMValTool) is a community diagnostics and performance metrics tool for the evaluation of Earth System Models (ESMs)
Interface/type of tool
- ESMValTool is a Command-line tool, Workflow, Toolkit/Framework (Suite in biotoolsSchema), with plugins, API for Jupyter & Python
- ESMValCore is a library for Python, only, and a Workflow
Data:
- CMIP6
- CMIP5
- CMIP3
- these CMIPs are standardised more than just a generic NetCDF-CF. What to do with that?
- OBS, OBS6
- obs4MIPs
- ana4mips
- CORDEX
This is a workflow with a number of steps, where some are workflows themselves. Which steps of the workflows are performed depends on the config/inputs
These are 2 tools/frameworks/workflows. Operations of the workflow are:
- ESMValCore does the first steps in the workflow
- a lot of various (pre)processing steps and quality control
- incl. data finding+download (Data retrieval)
- Data standardisation (CMORisation etc.), incl. masking of missing values and outliers, also normalisation? incl. Unit conversion (Conversion in EDAM)
- segmentation/masking into land/sea/ice, time/space selection (all some sort of Data selection)
- Reprojection/Regridding (is a kind of alignment/registration. Controversy reprojection vs regridding: what is more generic etc... are perhaps just 2 different things)
- Statistical analysis, incl. correlation, ML regression, ...
- ESMValTool adds the additional diagnostics/comparison/plotting to the workflow
- diagnostics/validation but possibly also comparison
- plotting (any specific? spatial/geo/geospatial(+temporal, can be animation) visualisation & specific plotting?)
- Inputs:
- "config user"
- selected datasets from the centralised DB or local, 1 or more (usually 2, or more). Ideally CMORised but doesn't have to be, not even fully NetCDF-CF
- ES model data = trajectory (3D geolocated time series of various variables) = simulation result = climate prediction or projection (long-term, using scenarios) in NetCDF
- observed (3D geolocated time series of various variables) from satellite imaging, ground measurements, aerial sensing, in NetCDF
- reanalysis = mix of modelled and observed data
- Jean: I wrote in Edam "This is a bit like running a Numerical Weather Prediction model "backward", using data assimilation (or "analysis") techniques to calculate the best estimates of many atmospheric variables such as wind and temperature in the past (typically covering the period from 1950 to present time)"
- "config developer"
- additional info? optional?
- "recipe"
- selected set of diagnostics (https://docs.esmvaltool.org/en/latest/_images/catchments.png)
- EDAM data: Climate data, which in this case could be simulated, measured, or combined a.k.a. "reanalysis". Plus some additional info about the workflow steps/diagnostics/plots/etc... (Tool/workflow configuration data?)
- Outputs of ESMValCore:
- CMORised and otherwise processed NetCDF-CF (not only NetCDF). EDAM data: same as input, but standardised and otherwise processed
- provenance and logs. EDAM: Provenance metadata, Report...
- Outputs of ESMValTool:
- plot(s). EDAM: Plot (Do we need more details?)
- provenance and logs
- optionally additional NetCDF-CF from the comparison/diagnostics (in addition to the processed inputs)
Note: It's possible to add new diagnostics like plugins.
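
To make a few of the preprocessing operations listed above concrete (masking of missing values, unit conversion, time selection, a simple statistic), here is a minimal pure-Python sketch on a tiny made-up time series. It does not use the ESMValCore API; all function names, the `MISSING` fill value, and the sample temperatures are assumptions for illustration only.

```python
from typing import List, Optional

MISSING = 1.0e20  # assumed fill value marking missing data


def mask_missing(values: List[float], fill_value: float = MISSING) -> List[Optional[float]]:
    """Replace fill values with None (masking of missing values)."""
    return [None if v == fill_value else v for v in values]


def kelvin_to_celsius(values: List[Optional[float]]) -> List[Optional[float]]:
    """Unit conversion, leaving masked entries untouched."""
    return [None if v is None else v - 273.15 for v in values]


def select_period(years: List[int], values: List[Optional[float]],
                  start: int, end: int) -> List[Optional[float]]:
    """Time selection: keep values whose year lies in [start, end]."""
    return [v for y, v in zip(years, values) if start <= y <= end]


def mean_ignoring_missing(values: List[Optional[float]]) -> float:
    """Statistic computed over the valid (unmasked) values only."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid)


# Made-up annual near-surface air temperatures in kelvin
years = [2000, 2001, 2002, 2003, 2004]
tas_kelvin = [287.15, MISSING, 288.15, 289.15, 288.65]

celsius = kelvin_to_celsius(mask_missing(tas_kelvin))
selected = select_period(years, celsius, 2001, 2003)
result = mean_ignoring_missing(selected)
print(round(result, 2))
```

Each function corresponds to one candidate EDAM operation (masking, Conversion, Data selection, Statistical calculation), which mirrors the way the workflow steps are split up for annotation in the list above.
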
Formats:
- input
-- netCDF - What about format for observations? are they all in netCDF? (Yes)
- output
-- netCDF-CF - What about other types of outputs (plots, etc.)?
-- png?
-- provenance/reports:
-- metadata YAML
-- settings YAML
-- logs in text
has topic:
- Earth System Modelling
- oceanography
- atmospheric science
- biosphere
- cryosphere
- etc.
Operation: diagnostics
- analysis
- calculation
- comparison
- conversion
- data handling
- visualization
- data retrieval
- correlation
- geographical visualization map
- machine learning regression
### SH matching (https://bio.tools/sh_matching)
https://github.com/TU-NHM/sh_matching_pub
Data:
- input
- DNA sequence (http://edamontology.org/data_3494 would strictly mean sequence alone, whereas Nucleic acid / Nucleotide sequence record http://edamontology.org/data_2887 might mean much more than just sequence + id. FASTA formats do not point here)
- output
- Taxonomy (http://edamontology.org/data_3028)
- These are sequence IDs from the inputs plus best matches in a reference DB
- Taxonomic profile would include some extra additional statistics etc.
- Probably Taxonomic classification (report) http://edamontology.org/data_1872 is the best match here
- Sequence composition report(?) (http://edamontology.org/data_1261 seems to be defined very narrowly)
- this output is about how many or which sequences belong to which taxonomic ranks
- sounds like Taxonomic profile would be
These I/O are valid for a user's operation. On the other hand, a reference database can be downloaded at any time to the location of the installed tool. This should be at least a comment for human readers, but could also be a separate "pseudo-operation", unavailable if used as a web API or web app.
Formats:
- input:
- FASTA (http://edamontology.org/format_2200)
- output:
- HTML (http://edamontology.org/format_2331)
- TSV (http://edamontology.org/format_3475)
has topic:
- ~~Data visualisation (http://edamontology.org/topic_0092)~~
- FAIR data (http://edamontology.org/topic_4012). This would have to be commented for human readers how it is more FAIR data-related than other similar tools
- ~~Workflows (http://edamontology.org/topic_0769)~~
- Biodiversity (http://edamontology.org/topic_3050). Taxonomic profiling or even more specifically Metabarcoding should be added as topic(s)
- Metabarcoding (<= missing term that could fit to Environmental sciences:Ecology)
**metabarcoding** - large-scale taxonomic identification of complex environmental samples via analysis of DNA sequences for short regions of one or a few genes -- is this Taxonomic profiling? Should metabarcoding/taxonomic profiling/community profiling be a sublevel of DNA barcoding?
Operation:
- Taxonomic classification (http://edamontology.org/operation_3460 and there is a data concept with the same name at the moment: http://edamontology.org/data_1872. Plus DNA barcoding http://edamontology.org/operation_3200), but that should be a topic. Is Taxonomy (information) or classification (taxon tree) meant under the data concept?
- Visualisation (http://edamontology.org/operation_0337). We can consider if we need specifically Taxonomic profile visualisation or not.
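
The annotation choices above can be collected into one machine-readable record. The sketch below assembles the EDAM URIs and labels exactly as noted in this section into a nested dictionary and prints it as JSON. The key layout (`topic`, `function`, `operation`, `input`, `output`, `data`, `format`) is loosely modelled on bio.tools-style records and should be treated as an assumption, not as the validated biotoolsSchema; the helper `concept` is hypothetical.

```python
import json

EDAM = "http://edamontology.org/"


def concept(edam_id: str, term: str) -> dict:
    """Hypothetical helper: pair an EDAM URI with its human-readable label."""
    return {"uri": EDAM + edam_id, "term": term}


# URIs and labels are copied from the notes above; the structure is an assumption.
sh_matching = {
    "name": "SH matching",
    "homepage": "https://github.com/TU-NHM/sh_matching_pub",
    "topic": [
        concept("topic_3050", "Biodiversity"),
        concept("topic_4012", "FAIR data"),
    ],
    "function": [
        {
            "operation": [
                concept("operation_3460", "Taxonomic classification"),
                concept("operation_0337", "Visualisation"),
            ],
            "input": [
                {
                    "data": concept("data_3494", "DNA sequence"),
                    "format": [concept("format_2200", "FASTA")],
                }
            ],
            "output": [
                {
                    "data": concept("data_1872", "Taxonomic classification"),
                    "format": [
                        concept("format_2331", "HTML"),
                        concept("format_3475", "TSV"),
                    ],
                }
            ],
        }
    ],
}

print(json.dumps(sh_matching, indent=2))
```

Writing the record out this way makes the open questions visible: the missing Metabarcoding topic and the reference database "pseudo-operation" have no slot yet and would need either new concepts or a free-text note.
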
### FATES
Data: model data, satellite observation, observations, timeseries
Format: netCDF-CF, netCDF
Topics: environmental science, physics, biodiversity, ecology
Operation: ecological modelling, modelling and simulation, species distribution modelling, predictions (growth, death, and regeneration of plants, i.e. their fate)
### EODIE
https://eodie.readthedocs.io/en/latest/
Format: Geotiff, shp
Data:
- input
- satellite images (like Sentinel-2/Landsat)
- a geospatial vector data file with polygons of the objects of interest (JSON?)
- output
- time series of indices and statistics derived from the satellite data like **NDVI** (not in EDAM)
- output format = .csv, tabular
Topics: **Earth observation** (not in EDAM), satellite imaging, environmental science, ecology, biodiversity, **global change** (not in EDAM)
Operations: statistical calculation, **index calculation** (not in EDAM)
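
As an illustration of what "index calculation" followed by "statistical calculation" means here, the sketch below computes NDVI, defined as (NIR - Red) / (NIR + Red), for the pixels falling inside one polygon and then summarises them. It is not EODIE code; the function names and the reflectance values are made up for illustration.

```python
from typing import Dict, List, Optional


def ndvi(nir: float, red: float) -> Optional[float]:
    """Normalised Difference Vegetation Index for one pixel."""
    if nir + red == 0:
        return None  # index is undefined when both bands are zero
    return (nir - red) / (nir + red)


def polygon_statistics(nir_pixels: List[float], red_pixels: List[float]) -> Dict[str, float]:
    """Summarise NDVI over the pixels belonging to one object of interest."""
    values = [
        v for v in (ndvi(n, r) for n, r in zip(nir_pixels, red_pixels))
        if v is not None
    ]
    return {
        "count": len(values),
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical reflectances for three pixels inside one polygon
nir_band = [0.5, 0.6, 0.4]
red_band = [0.1, 0.2, 0.1]
stats = polygon_statistics(nir_band, red_band)
print({key: round(value, 3) for key, value in stats.items()})
```

Repeating this per polygon and per acquisition date yields the tabular time series described under the outputs, which is why both operations are worth annotating separately.
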
### Panoply?
https://www.giss.nasa.gov/tools/panoply/
https://live.usegalaxy.eu/?tool_id=interactive_tool_panoply
# Follow-up: What kind of output we want?
## Paper
Software management/stewardship
Title: "Fairification of Earth System Science tools: Case study"
Actions: Olga to start the paper writing, git repo for the paper
## Readme.md
New structure suggestion to software community, check other schemas
## Bringing bio and geo together
### Fates
could be a good example/case study of it
## We need funding
EOSC
RDA
Elixir
Possible funding source Sigma2 (Anne to check what is about "politics")
# Actions
- If there is anything related to Imaging, please search for the concepts in EDAM Bioimaging and put comments there (https://webprotege.stanford.edu/#projects/2ce704bf-83ed-4d2e-985f-84c4841fac71/edit/Classes)
- But new Imaging concepts, such as Satellite and Aerial imaging are added into EDAM-geo (We could consider doing this differently, perhaps adding them to EDAM Bioimaging instead)
- Matúš: merge geo edam (in small steps). It needs:
- Definitions!
- Fixes in the main EDAM
- Priority what we want to merge first
- Olga: start paper
- Anne: figure out possibilities of funding from Sigma2
- need to put all the "definitions" in EDAM for proper merging --> need a regular half-day "hackathon" to share and discuss
- Travel: Matúš is not able to travel August to October, Olga should be able to travel from August; First 2-3 weeks of August probably (Follow-up in Bergen) --> Anne organises
- Need to ask Adil to support it (Anne)
- Tentative days so far agreed are 10-12 August 2022
- Please let everyone know if they stop working for you and we can try to move it a bit either way
- Location: Ideally Bergen, maybe Oslo
- Other people to involve:
- Sonya Geange (Bergen), and/or colleagues
- Natural History Museum Oslo (FATES)
- Eva Lieungh
- maybe Korbinian Bösl (Bergen), at least to support this work
- more biodiversity and ecology folks:
- Kristjan Adojaan, University of Tartu
- climate people from Stockholm University (Bolin Centre database).
- Write a summary/blog of the hackathon (to advertise on EOSC-Nordic/NeIC)
- Talk to Bjoern and/or Herve for https://github.com/bio-tools/content (we have cesm and fates-emerald) if we want to add "mockup" for "geotools" and/or ROtools? (Anne)