
This HackMD is re-used under a CC-BY license from The Turing Way collaboration cafe template

Environmental Data Science book ⛰ 🌳 🏙️ ❄️ 🔥 🌊 online Collaboration Cafe [PUBLIC ARCHIVE]

Archive: 22 February 2022 | Prepare compelling and reproducible notebooks for the EnvDS book

Sign up below

Name + Which is your essential clothing accessory for frozen days? + an emoji to represent it (emoji cheatsheet)

  • Alejandro + Gloves + :gloves:
  • Pirta + Beanie + :billed_cap:
  • Tim + Argyle + :socks:
  • Ricardo + Scarf + :scarf:

Conversation Starters

  • None

Breakout rooms: Topic proposals

  • No breakout rooms, all in the main room

Notes and questions

  • Alejandro went through the key steps to submit a notebook to the EnvDS book:

    • Step 1: Notebook idea
      • Log your notebook idea as a new issue in the project repo
      • Once the idea is clear (e.g. purpose, data sources, packages) and/or you have received feedback from a collaborator or the EnvDS community, you can move to the next step :arrow_down:
    • Step 2: Preparation
      • Open a terminal in your local/remote machine.
      • Fork the EnvDS repository to your personal GitHub account.
      • Clone the forked repository into your local/remote machine.
      • Go to the folder of the forked repository in your local/remote machine.
      • If the environmental system and/or topic doesn't exist yet, create a folder for it in the forked repository.
      • Copy the template for your topic from the community chapter (see here).
    • Step 3: Setup
      • Open a terminal in your local/remote machine and change the current directory (path) to the directory of the forked repo.
      • Check whether conda is installed by typing conda. If the command isn't recognised, follow this guide for installing conda.
      • Prepare a conda environment for your notebook. The environment should use the Python version of the EnvDS book (Python 3.8) and include jupyter, the library used to edit the notebook. The lines below guide you through launching a Jupyter Notebook session, one of the Jupyter interfaces for editing notebooks.
        • conda create -n <environment_name> python=3.8 jupyter
        • follow the instructions to activate your environment
        • check the list of packages in the environment.yml of the EnvDS book.
        • install relevant packages from the list using conda install <package-name>
        • type jupyter notebook
        • If a package relevant to your notebook isn't in the list, you can add a cell that installs it before the import libraries section of the notebook. Packages can be installed using pip -q install <package-name>, where -q runs pip in quiet mode.
    • Step 4: Edit the notebook
      • Once you have the environment ready for your notebook, you can modify the sections of the template.
      • (optional) follow the 10 rules of compelling notebooks provided by the EarthCube initiative available in their Notebook Template (section Data processing and analysis).
      • Once you are happy with the first edits of the notebook, save it and push the changes. If you are not sure how to push changes, follow the Turing Way community chapter on GitHub.
    • Step 5: Open a pull request
      • Go to the forked repository in your GitHub account.
      • Click Contribute and open a pull request.
      • Fill in and submit the form that appears, following the information requested.
      • Go to the EnvDS main repo and you'll see the PR as below:
    • Step 6: Edits
      • Click on your pull request
        • To facilitate the interaction with the reviewer, we are using ReviewNB and Netlify previews. You can access ReviewNB by clicking the purple button below:
      • Note that you can continue making changes in the forked repo. They will automatically update the notebook in the PR.
    • Step 7: Review
      • Once you're happy with the first version in the pull request, you can change it to Ready to review using the checkbox.
      • A reviewer will be assigned to your pull request.
      • The reviewer will start a discussion of your notebook through the ReviewNB platform.
    • Step 8: Publication
      • Once both parties, author(s) and reviewer(s) are ha
  • Some useful resources mentioned in the meeting:

  • Celebrations :rocket:

    • Four great notebook ideas!
      • Alejandro: COSMOS-UK Sensor Visualisation, see issue#49
      • Ricardo: Long timeseries phenology using Landsat data, see issue#52
      • Tim: Concatenating a gridded rainfall dataset into a time series, see issue#53
      • Pirta: Nutrientscape mapping in optically shallow tropical coastal waters, see issue#54
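
The setup step above mentions adding a notebook cell that installs a package missing from the book's environment. A minimal sketch of such a cell is given below. The helper names `python_matches` and `ensure_package` are hypothetical (they are not part of the EnvDS book or its template), and the sketch assumes pip is available in the active environment.

```python
import importlib.util
import subprocess
import sys


def python_matches(expected=(3, 8)):
    """Return True if the running interpreter has the expected (major, minor) version."""
    return sys.version_info[:2] == tuple(expected)


def ensure_package(import_name, pip_name=None):
    """Install a package with pip only if it cannot already be imported.

    Returns True if an install was attempted, False if the package was
    already importable. `pip_name` covers packages whose pip name differs
    from their import name.
    """
    if importlib.util.find_spec(import_name) is not None:
        return False
    # Use the interpreter running the notebook so the package lands in the
    # same environment; -q keeps pip's output quiet.
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "-q", pip_name or import_name]
    )
    return True


# Warn (rather than fail) if the environment is not on the book's Python version.
if not python_matches((3, 8)):
    print("Warning: the EnvDS book targets Python 3.8, found", sys.version.split()[0])
```

In a notebook, a call such as `ensure_package("<package-name>")` would go in a cell before the import libraries section, so the install only runs when the package is actually missing.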

Request for reviews!

  • None

Feedback at the end of the call

  • None

Archive: 23 November 2021 | FAIR data in Environmental Sciences

Illustration of the FAIR principles to show the definition of being Findable, Accessible, Interoperable and Reusable. Source: The Turing Way: The FAIR Principles The Turing Way project illustration by Scriberia. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807.

Sign up below

Name + What is your recent favorite resource or tool or app or software? + an emoji to represent it (emoji cheatsheet)

  • Alejandro + The Turing Way + :milky_way:
  • Bea + scivision + :koala:

Conversation Starters

  • None

Breakout rooms: Topic proposals

  • Main room (silent mode)
    • Bea: working on the submission of her PhD thesis.
    • Alejandro:
      • adding helpful resources about FAIR and examples of research repositories for Environmental Sciences.
      • checking which sample data within the Environmental Data Science book can be curated in the Environmental Data Science Zenodo community.

Notes and questions

  • Alejandro
    • Useful resources about FAIR:

    • Research data platforms:

    • Other interesting FAIR-driven platforms:

      • ROHub: research object management platform supporting the preservation and lifecycle management of scientific investigations, research campaigns and operational processes. It implements FAIR digital objects and specific metadata for data-cube in Earth Science.
    • Challenges of FAIR data repositories for Environmental sciences (ES):

      • ES data are typically structured as tabular data collected in the field or laboratory (see further discussion in BEXIS2).
      • Making data available in a FAIR-enabled way can be daunting for many ES researchers and organisations due to a lack of awareness, efficient data management tools, infrastructure and skills (see further discussion in BEXIS2).
      • Spatio-temporal data (data cubes): this seems to be addressed by novel research object management platforms such as ROHub.

Request for reviews!

  • None

Feedback at the end of the call

  • Alejandro: There were few participants in this particular collaboration cafe. We should restructure the promotion strategy, proposing new topics and/or changing the format for the coming collaboration cafes in 2022 :face_with_monocle:.

Archive: 26 October 2021 - Reproducibility in Environmental Science

Sign up below

Name + Share a song that expresses your personality + an emoji to represent it (emoji cheatsheet)

  • Alejandro + Should I Stay or Should I Go (The Clash) + 🧳
  • Sam + Wish you were here (Pink Floyd)
  • Matt + BBC Grandstand Theme + :horse_racing:

Conversation Starters

Breakout rooms: Topic proposals

  • Matt, making the code for his MRes dissertation reproducible on GitHub
  • Alejandro, preparing contribution guidelines for the Environmental AI book
  • Sam J, exploration of resources for reproducibility and feedback on Matt and Alejandro's topics

Notes and questions

  • Sam J:
    • The Turing Way is a great resource to guide environmental scientists in reproducible research.
    • Cornell Dataset Description is a good starting template for dataset documentation!
    • Standards in data catalogues, e.g. STAC (but it isn't mature)
  • Alejandro:
    • Zenodo:
      • It is great for storing your sample data (up to 50 GB).
    • notebooksharing.space
      • A nice resource to share notebooks with interactive plotting (up to 10 MB). However, it doesn't allow tracking changes as ReviewNB does.
    • Contributor guidelines for the EnvAI book
      • Sam suggests example environmental python packages with links to notebooks (e.g. hvplot, geopandas etc.)
      • Minimal publishable version guidelines e.g. Binder
      • Use external links for general versioning principles e.g. how to open a pull request on GitHub
      • Provide examples of how to create locked environments
      • Section of tools for sharing notebooks e.g. ReviewNB, notebooksharing.space
  • Matt
    • Publishing reproducible code for environmental science
      • It can be more important that the process is reproducible than that accuracies match to the nearest 0.01%
      • Where the data owners aren't happy to share the whole dataset, use a subset of the data to demonstrate the tool (training and inference)
      • In environmental science, a visual demonstration of the results can be more useful than a command-line readout of accuracy
      • Suggest sensible ranges for hyperparameters in the documentation
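
Two of Matt's points, demonstrating a tool on a shareable subset and documenting sensible hyperparameter ranges, can be sketched in a few lines of standard-library Python. This is only an illustration: the function names, the fixed seed and the ranges below are made up for the example and do not come from Matt's project.

```python
import random


def sample_subset(records, fraction, seed=0):
    """Return a reproducible random subset of `records`, keeping the original order."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not records:
        return []
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    k = max(1, round(len(records) * fraction))
    chosen = sorted(rng.sample(range(len(records)), k))
    return [records[i] for i in chosen]


# Hypothetical documented ranges (illustrative values only).
HYPERPARAMETER_RANGES = {
    "learning_rate": (1e-4, 1e-1),
    "batch_size": (8, 128),
}


def out_of_range(params, ranges=HYPERPARAMETER_RANGES):
    """Return the names of parameters that fall outside their documented range."""
    bad = []
    for name, value in params.items():
        if name in ranges:
            low, high = ranges[name]
            if not low <= value <= high:
                bad.append(name)
    return bad


demo = sample_subset(list(range(100)), fraction=0.1, seed=42)
flagged = out_of_range({"learning_rate": 0.5, "batch_size": 32})
```

Fixing the seed means anyone re-running the demo gets the same subset, which supports the point that reproducing the process matters more than matching accuracies to the last decimal.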

Request for reviews!

  • Sam J: reviewers needed for the SEVIRI wildfire data notebook of the EnvAI book, see PR#12

Feedback at the end of the call

  • None

Archive: 28 September - Data preprocessing

Name + What’s the hardest part about working virtually for you? and the easiest? + an emoji to represent it (emoji cheatsheet)

  • Alejandro + social interaction, more sleep time + :busts_in_silhouette: :sleeping:
  • Sam A. + I still have just as many meetings if not more and it is soooo tiring! :sleeping: :pleading_face:
  • Evangeline + Feeling self-conscious on camera, flexibility + :movie_camera: :clock1:

Conversation Starters

Breakout rooms: Topic proposals

  • Sam A. Manufacture Urban Data in GIS format
  • Evie. Preprocessing satellite data for crop yield prediction
  • Alejandro. Preprocess FluxNet data and related gridded products

Notes and questions

  • We showcased the SEPAL platform for Vegetation Satellite Image analysis.
  • Discussed challenges around scoping and extracting satellite data for machine learning models of vegetation (agricultural crops):
    • Appropriate satellite platform (Sentinel/Landsat?)
    • Preprocessing of radar and optical data (i.e. dealing with cloud cover)
    • Appropriate time series/critical dates for plant growth
  • Sam A. used ArcGIS Pro to extract site-specific temperature information from a gridded netCDF dataset using the Spatial Analyst 'Sample' tool. The tool is very useful because it works across the time dimension, so one year of data can be processed in one go. It is also possible to set a desired output coordinate system. The data can be saved as a CSV file and then processed further with standard Python tools like pandas and numpy.
  • Sam A. suggests using the Iris package for reprojecting gridded netCDF files. The project is
  • Data preprocessing is still too time-consuming, and there is a lack of communication about the tools available.
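
For readers without ArcGIS, the idea behind the site-specific extraction described above (pick the grid cell nearest to a site, read its value at every time step, save the result as CSV) can be sketched in plain Python. This is not the Spatial Analyst 'Sample' tool: it uses a simple nearest-neighbour lookup on latitude and longitude, does no reprojection, and runs on a tiny made-up grid. All names are hypothetical.

```python
import csv
import io


def nearest_index(coords, value):
    """Index of the coordinate closest to `value`."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - value))


def sample_site(times, lats, lons, grid, site_lat, site_lon):
    """Extract the time series at the grid cell nearest to a site.

    `grid` is indexed as grid[time][lat][lon].
    """
    i = nearest_index(lats, site_lat)
    j = nearest_index(lons, site_lon)
    return [(t, grid[k][i][j]) for k, t in enumerate(times)]


def to_csv(rows, header=("time", "temperature")):
    """Serialise (time, value) rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up grid: 2 time steps, 2 latitudes, 3 longitudes.
times = ["2021-01-01", "2021-01-02"]
lats = [50.0, 51.0]
lons = [-1.0, 0.0, 1.0]
grid = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]],
]

series = sample_site(times, lats, lons, grid, site_lat=50.9, site_lon=0.2)
csv_text = to_csv(series)
```

The resulting CSV text can be written to a file and loaded with pandas for further processing, as in Sam A.'s workflow. For real netCDF files, a dedicated library would replace the nested lists used here.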

Request for reviews!

  • None

Feedback at the end of the call

  • None

Archive: 29 June - Data Visualization

Name + Something you watched recently (video, movie, documentary, etc.) that was inspiring for you? + an emoji to represent it (emoji cheatsheet)

Conversation Starters

  • Alejandro: EGU Public call-for-session-proposals all other sessions: Deadline: 6 September 2021
  • Scott: Pangeo European Community is growing and there are plans for coffee chats and regular showcase meetings (see here)

Breakout rooms: Topic proposals

  • Sam J: Regridding MODIS data for wildfire detection
  • Tom: Produce script to reproduce IceNet paper figures for Nature Communications
  • Emily: Visualization of LiDAR data
  • Scott: Organizing and Admin EnvSensors WPs project timetable
  • Alejandro: Deploying a FluxNet use case visualization outputs for the Environmental Data Science book

Notes and questions

  • Emily showed a cool visualization of a laser scan image (100 GB) using the proprietary software of the scanner device. After data preprocessing, she will use libraries for visualizing individual trees.
    • Emily says there are also some radar sensors that collect soil data.
  • Tools for regridding MODIS data. Sam is using satpy. Suggestions of other existing tools are welcome.
  • Tom is making his code nicer (i.e. modular) and more efficient (i.e. using dask).
  • Alejandro shows FluxNet demo
    • Emily suggests adding woodlands and shrubs to subset FluxNet data.

Feedback at the end of the call

  • Add a disclaimer that the collaboration cafes' HackMDs are public.
  • Names for breakout rooms.
  • We should aim to keep to time, once we are used to the format etc.