This HackMD is re-used under a CC-BY license from The Turing Way collaboration cafe template
Environmental Data Science book online Collaboration Cafe [PUBLIC ARCHIVE]
Archive: 22 February 2022 | Prepare compelling and reproducible notebooks for the EnvDS book
Sign up below
Name + What is your essential clothing accessory for freezing days? + an emoji to represent it (emoji cheatsheet)
- Alejandro + Gloves +

- Pirta + Beanie +

- Tim + Argyle +

- Ricardo + Scarf +

Conversation Starters
Breakout rooms: Topic proposals
- No breakout rooms, all in the main room
Notes and questions
Request for reviews!
Feedback at the end of the call
Archive: 23 November 2021 | FAIR data in Environmental Sciences
The Turing Way project illustration by Scriberia. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807.
Sign up below
Name + What is your favorite recent resource, tool, app or software? + an emoji to represent it (emoji cheatsheet)
- Alejandro + The Turing Way +

- Bea + scivision +

Conversation Starters
Breakout rooms: Topic proposals
- Main room (silent mode)
- Bea: working on the submission of her PhD thesis.
- Alejandro:
- adding helpful resources about FAIR and examples of research repositories for Environmental Sciences.
- checking which sample data within the Environmental Data Science book can be curated in the Environmental Data Science Zenodo community.
Notes and questions
- Alejandro
-
Useful resources about FAIR:
-
Research data platforms:
- General
- re3data.org: an initiative indexing research data platforms by content topic and knowledge domain.
- Stats datacite: a dashboard mapping the registration of persistent identifiers (DOIs) for research data and other research outputs.
- Environmental science (list of platforms with the highest total number of DOI registrations according to Stats datacite):
-
Other interesting FAIR-driven platforms:
- ROHub: a research object management platform supporting the preservation and lifecycle management of scientific investigations, research campaigns and operational processes. It implements FAIR digital objects and specific metadata for data cubes in Earth Science.
-
Challenges of FAIR data repositories for Environmental sciences (ES):
- Much ES data is tabular, collected in the field or laboratory (see further discussion in BEXIS2).
- Making data FAIR can be daunting for many ES researchers and organisations due to a lack of awareness, efficient data management tools, infrastructure and skills (see further discussion in BEXIS2).
- Spatio-temporal data (data cubes): this seems to be addressed by novel research object management platforms such as ROHub.
Request for reviews!
Feedback at the end of the call
- Alejandro: There were few participants in this particular collaboration cafe. We should restructure the promotion strategy, proposing new topics and/or changing the format for the upcoming collaboration cafes in 2022.
Archive: 26 October 2021 - Reproducibility in Environmental Science
Sign up below
Name + Share a song that expresses your personality + an emoji to represent it (emoji cheatsheet)
- Alejandro + Should I Stay or Should I Go (The Clash) +
- Sam + Wish You Were Here (Pink Floyd)
- Matt + BBC Grandstand Theme +

Conversation Starters
Breakout rooms: Topic proposals
- Matt: making the GitHub code for his MRes dissertation reproducible
- Alejandro: preparing contribution guidelines for the Environmental AI book
- Sam J: exploring resources for reproducibility and giving feedback on Matt's and Alejandro's topics
Notes and questions
- Sam J:
- The Turing Way: a great resource to guide environmental scientists in reproducible research.
- Cornell Dataset Description: a good starting template for dataset documentation!
- Standards in data catalogues, e.g. STAC (though it is not yet mature)
- Alejandro:
- Zenodo:
- A great place to keep your sample data (up to 50 GB).
- notebooksharing.space
- A nice resource for sharing notebooks with interactive plotting (up to 10 MB). However, it does not support tracking changes as ReviewNB does.
- Contributors guidelines for the EnvAI book
- Sam suggests listing example environmental Python packages with links to notebooks (e.g. hvplot, geopandas)
- Guidelines for a minimal publishable version, e.g. Binder
- Use external links for general versioning principles, e.g. how to open a pull request on GitHub
- Provide examples of how to create locked environments
- A section on tools for sharing notebooks, e.g. ReviewNB, notebooksharing.space
- Matt
- Publishing reproducible code for environmental science
- It can be more important that the process can be reproduced than that accuracies match to the nearest 0.01%
- Where the data owners are not happy to share the whole dataset, use a subset of the data to demonstrate the tool (both training and inference)
- In environmental science, a visual demonstration of the results can be more useful than a command-line readout of accuracy
- Suggest sensible ranges for hyperparameters in the documentation
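Matt's points (reproduce the process, share a subset when the full data cannot be shared, document sensible hyperparameter ranges) can be sketched in plain Python. This is a minimal illustration, not Matt's code: `make_demo_subset`, `check_hyperparameters` and the ranges in `HYPERPARAMETER_RANGES` are hypothetical placeholders. The fixed seed makes the demo subset identical on every run.

```python
import random

# Hypothetical documented ranges; names and bounds are placeholders only.
HYPERPARAMETER_RANGES = {
    "learning_rate": (1e-4, 1e-1),
    "n_estimators": (50, 500),
}


def make_demo_subset(records, fraction, seed=42):
    """Return a reproducible random subset of `records`.

    A fixed seed means anyone rerunning the notebook gets the same subset,
    so the process can be reproduced even if the full dataset is not shared.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not records:
        return []
    rng = random.Random(seed)  # local generator: global random state untouched
    k = max(1, round(len(records) * fraction))
    return rng.sample(records, k)


def check_hyperparameters(params, ranges=HYPERPARAMETER_RANGES):
    """Return the names of hyperparameters outside their documented range.

    Names missing from `ranges` raise KeyError, which flags undocumented ones.
    """
    out_of_range = []
    for name, value in params.items():
        low, high = ranges[name]
        if not low <= value <= high:
            out_of_range.append(name)
    return out_of_range


if __name__ == "__main__":
    site_ids = [f"site_{i:03d}" for i in range(100)]
    demo = make_demo_subset(site_ids, fraction=0.1)
    print(len(demo))  # 10
    print(check_hyperparameters({"learning_rate": 0.5, "n_estimators": 100}))
    # ['learning_rate']
```

Keeping the ranges in one dictionary lets the same object feed both the documentation and a sanity check before training.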
Request for reviews!
- Sam J: reviewers needed for the SEVIRI wildfire data notebook of the EnvAI book, see PR #12
Feedback at the end of the call
Archive: 28 September - Data preprocessing
Name + What's the hardest part about working virtually for you? And the easiest? + an emoji to represent it (emoji cheatsheet)
- Alejandro + social interaction, more sleep time +

- Sam A. + I still have just as many meetings if not more and it is soooo tiring!

- Evangeline + Feeling self-conscious on camera, flexibility +

Conversation Starters
Breakout rooms: Topic proposals
- Sam A.: Manufacturing urban data in GIS format
- Evie: Preprocessing satellite data for crop yield prediction
- Alejandro: Preprocessing FluxNet data and related gridded products
Notes and questions
- We showcased the SEPAL platform for vegetation satellite image analysis.
- Discussed challenges around scoping and extracting satellite data for machine learning models of vegetation (agricultural crops):
- Appropriate satellite platform (Sentinel/Landsat?)
- Preprocessing of radar and optical data (e.g. dealing with cloud cover)
- Appropriate time series/critical dates for plant growth
- Sam A. used ArcGIS Pro to extract site-specific temperature information from a gridded netCDF dataset using the Spatial Analyst 'Sample' tool. It is very useful because it works across the time dimension, so a whole year of data could be processed in one go. It is also possible to set the desired output coordinate system. The data could then be saved as a CSV file and processed further with standard Python tools such as pandas and NumPy.
- Sam A. suggests using the Iris package for reprojecting gridded netCDF files. The project is
- Data preprocessing is still too time-consuming, and the available tools are poorly communicated.
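Sam A.'s workflow (sample a gridded field at a site for every time step, then save a CSV for pandas/NumPy) can be sketched with the standard library. This is a nearest-neighbour lookup only and does not reproduce ArcGIS Pro's Sample tool. It assumes ascending coordinate lists and nested lists standing in for a netCDF variable; the site name and values are made up. With real netCDF files, xarray's `sel(..., method="nearest")` performs the same kind of nearest-cell lookup.

```python
import csv
import io
from bisect import bisect_left


def nearest_index(coords, target):
    """Return the index of the value in ascending `coords` closest to `target`."""
    i = bisect_left(coords, target)
    if i == 0:
        return 0
    if i == len(coords):
        return len(coords) - 1
    # Choose the closer neighbour; ties go to the lower index.
    return i if coords[i] - target < target - coords[i - 1] else i - 1


def sample_sites(times, lats, lons, data, sites):
    """Sample data[t][lat][lon] at each site's nearest grid cell for every time step.

    Assumptions: `lats` and `lons` are ascending, `data` is nested lists
    indexed [time][lat][lon], and `sites` maps a site name to (lat, lon).
    Returns (site, time, value) rows.
    """
    rows = []
    for name, (lat, lon) in sites.items():
        j = nearest_index(lats, lat)
        k = nearest_index(lons, lon)
        for t, label in enumerate(times):
            rows.append((name, label, data[t][j][k]))
    return rows


def to_csv(rows, value_name="temperature"):
    """Serialise the sampled rows as CSV text (e.g. to save and load with pandas)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["site", "time", value_name])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    times = ["2020-01-01", "2020-01-02"]
    lats = [50.0, 51.0, 52.0]
    lons = [-1.0, 0.0, 1.0]
    # Toy temperatures (degrees C), indexed [time][lat][lon].
    data = [
        [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [10.0, 11.0, 12.0]],
        [[3.0, 4.0, 5.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]],
    ]
    sites = {"site_a": (51.2, 0.4)}  # hypothetical site location
    print(to_csv(sample_sites(times, lats, lons, data, sites)), end="")
```

Running it prints a header row followed by `site_a,2020-01-01,8.0` and `site_a,2020-01-02,7.0`, because (51.2, 0.4) falls nearest to the grid cell at (51.0, 0.0).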
Request for reviews!
Feedback at the end of the call
Archive: 29 June - Data Visualization
Name + Something you watched recently (video, movie, documentary, etc.) that inspired you? + an emoji to represent it (emoji cheatsheet)
Conversation Starters
- Alejandro: EGU public call for session proposals (all other sessions), deadline 6 September 2021
- Scott: The Pangeo European Community is growing and there are plans for coffee chats and regular showcase meetings
Breakout rooms: Topic proposals
- Sam J: Regridding MODIS data for wildfire detection
- Tom: Producing a script to reproduce the IceNet paper figures for Nature Communications
- Emily: Visualization of LiDAR data
- Scott: Organizing and administering the EnvSensors WPs project timetable
- Alejandro: Deploying the visualization outputs of a FluxNet use case for the Environmental Data Science book
Notes and questions
- Emily showed a cool visualization of a laser scan image (100 GB) using the scanner's proprietary software. After data preprocessing, she will use libraries to visualize individual trees.
- Emily says there are also some radar sensors that collect soil data.
- Tools for regridding MODIS data: Sam is using satpy. Suggestions of other existing tools are welcome.
- Tom is making his code cleaner (e.g. splitting it into modules) and more efficient (e.g. using dask).
- Alejandro showed a FluxNet demo.
- Emily suggested adding woodlands and shrubs to the FluxNet data subset.
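To illustrate what regridding MODIS data involves: swath-based satellite products typically give per-pixel latitude/longitude rather than a regular grid, so each target grid cell has to pick up values from nearby swath pixels. The sketch below shows the simplest approach, nearest-neighbour resampling within a radius of influence. It is a toy under stated assumptions: `regrid_nearest` is a hypothetical name, distances are plain Euclidean in degrees (only a rough approximation), and the search is brute force. satpy's resampling (built on pyresample) handles this properly with efficient neighbour searches and other methods besides nearest-neighbour.

```python
import math


def regrid_nearest(points, lat_centres, lon_centres, radius):
    """Resample scattered (lat, lon, value) swath points onto a regular grid.

    Each grid cell takes the value of the nearest swath point within `radius`
    degrees (inclusive), or None if there is none. Distance is Euclidean in
    degrees, a rough approximation away from the equator. Brute force:
    O(cells x points), suitable for toy examples only.
    """
    grid = []
    for lat in lat_centres:
        row = []
        for lon in lon_centres:
            best_value, best_dist = None, radius
            for p_lat, p_lon, value in points:
                d = math.hypot(p_lat - lat, p_lon - lon)
                if d <= best_dist:  # closer (or equally close) candidate wins
                    best_value, best_dist = value, d
            row.append(best_value)
        grid.append(row)
    return grid


if __name__ == "__main__":
    # Hypothetical swath pixels: (lat, lon, brightness temperature in K).
    swath = [(10.02, 20.01, 300.0), (10.48, 20.52, 310.0), (11.9, 21.9, 320.0)]
    print(regrid_nearest(swath, [10.0, 10.5], [20.0, 20.5], radius=0.1))
    # [[300.0, None], [None, 310.0]]
```

Cells with no swath pixel inside the radius stay `None`, which keeps gaps visible instead of silently filling them.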
Feedback at the end of the call
- Add a disclaimer that the collaboration cafes' HackMDs are public.
- Names for breakout rooms.
- We should aim to keep to time once we are used to the format.