** sDevTraits 2022 **
# Monday 8/8/22
## Reimbursement
iDiv Intro by Jens
Forms sent by Doreen
- 15€ flat per day
- 9€ ticket
- flight tickets etc. by mail
Dinner
City Tour tomorrow
What to do on Thursday: Expedition to some lakes nearby?
## Collaboration on Webpage
- machine-readable format
- collaborations and maps
- e.g. conferences, workshops
- already existing data model from informatics
- consensus sDevTraits -> GitHub
- participants
- https://github.com/open-traits-network/open-traits-network.github.io/blob/master/_posts/2022-07-25-sdev-traits-workshop.md
- Domains on OTN page, like plants, animals
## 10 rules Traits paper
under review
encourage presenting at conferences, e.g. GFÖ
OA funding covered by DEAL/LMU/iDiv
- change to CC-BY preprint
- journal final -> CC-BY
- add paragraph about licensing, CC-BY for documents, CC-0/Open Database License for data
- but intellectual property
- How to tackle data citations:
- cite the data directly
- a lot of data citations -> data paper synthesize the data set, then cite it in synthesis paper
or if by the journal -> journal,
if not: Zenodo / FigShare but no Impact for data contributors
- recursive literature/data citation
-> needs structure
-> needs tools
=>> GAPS / Body Size try that out
- plazi project / wikidata -> needs review
- https://plazi.org
- link to OpenTraits website for data life cycle?
- mention potential issues with categorical traits in columns, e.g. column names are body_length, diurnal, nocturnal (with 0/1)
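The categorical-traits issue noted above can be illustrated with a minimal sketch. It assumes that the 0/1 columns `diurnal` and `nocturnal` are levels of a single trait, here given the hypothetical name `activity_period`; the species names and values are made up for illustration.

```python
# Hypothetical example rows, modelled on the column names in the note above.
WIDE_ROWS = [
    {"species": "Species A", "body_length": "12.5", "diurnal": "1", "nocturnal": "0"},
    {"species": "Species B", "body_length": "8.0", "diurnal": "0", "nocturnal": "1"},
]

# Assumption: these 0/1 columns are levels of one trait ("activity_period").
CATEGORICAL_LEVELS = {"activity_period": ["diurnal", "nocturnal"]}


def wide_to_long(rows, categorical_levels, id_column="species"):
    """Turn one-row-per-species records into (species, trait, value) records."""
    level_columns = {c for levels in categorical_levels.values() for c in levels}
    long_rows = []
    for row in rows:
        # Plain columns: the column name is the trait, the cell is the value.
        for column, cell in row.items():
            if column == id_column or column in level_columns:
                continue
            long_rows.append({id_column: row[id_column], "trait": column, "value": cell})
        # Indicator columns: the column name is the value, the cell only says yes/no.
        for trait, levels in categorical_levels.items():
            for level in levels:
                if row.get(level) == "1":
                    long_rows.append({id_column: row[id_column], "trait": trait, "value": level})
    return long_rows


LONG_ROWS = wide_to_long(WIDE_ROWS, CATEGORICAL_LEVELS)
for record in LONG_ROWS:
    print(record)
```

The point of the sketch is that a reader of the wide table cannot tell, without extra metadata such as `CATEGORICAL_LEVELS`, whether a column header is a trait name or a trait value.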
## Icebreaker : Unrelated projects
- Alex: Insect Microbiomes along elevational gradient in the Andes
- Brian: Mapping geographical/Biodiversity information in his Postdoc project
- Caterina: Synthesis on Biodiversity data in the Biodiversity Exploratories
- Daniel: Machine-readable integration of IPBES reports with machine learning, Hackathons with Students
- Stefanie: new traits: sound traits for fishes etc. later
- Jorrit: Better ways to find/cite data, unreliability of URLs -> Content IDs/Hashes
- Jens: develop workflow to extract quantitative traits from herbarium specimens
- Markus: Medical Imaging, Cardiac MRI machine learning -> competition
- Jen: Community liaison and corresponding with requests. Example: Traffic limits from HR of Amazon
- Katja: Ecological restoration, monitoring endangered species, facilitation using camera traps. training data sets for machine learning: "Animating the dead" with after effects
## Scholia / WikiData
- link to presentation by Jorrit / Daniel "Exploring Traits of the Open Traits Network (OTN)"
- archived copy: [DOI: 10.5281/zenodo.6863639](https://doi.org/10.5281/zenodo.6863639)
- live doc https://docs.google.com/presentation/d/1Db4wuJwBgCNvv43rzWGKmexezcTqHIbSlzG24aOkEwU/edit?usp=sharing
- Exploring Traits of the Open Traits network
- wikidata good for relationships data, make machine readable
- Scholia: associations of researchers, organizations and products
-> combining these projects, as different nodes in the network
- First integrations by Jorrit and Daniel
- script to generate scholia queries, mines from wikidata -> csv files
- independent snapshots of wikidata
- e.g. (currently only OTN members). Wikidata = Data -> Scholia = Frontend
- author <-> taxa links
- author <-> author relations
- authors <-> publications
- authors <-> research topics/claims
- Annotation of publications important to integrate
- Exploring potential collaborations
- e.g. terms like taxa
- Scholia pages have "improve data"
- Scholia organization "Open Trait network"
- author disambiguation
- Other database resources can be useful
- e.g. art galleries -> annotations of taxa by community efforts
- Data Model for the OTN/Scholia integration: Graph by Jorrit
- OTN informing Scholia/Wikidata
- Personal pages: "Bring your own Links"
- hands on session
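The "script to generate Scholia queries" step in the list above could look roughly like the following sketch. This is not the script Jorrit and Daniel wrote. It assumes that Q112326635 (from the Scholia organization link in these notes) is the OTN item, that membership is modelled with P463 ("member of"), and that works link to authors via P50 and to topics via P921. The sample response is hand-written with a placeholder author ID, so no network access is needed.

```python
import csv
import io

# Assumption: OTN item taken from the Scholia organization URL in these notes.
OTN_ITEM = "Q112326635"


def build_author_topic_query(organization_item):
    """Return a SPARQL query for author <-> research topic links of members."""
    return f"""
SELECT DISTINCT ?author ?authorLabel ?topic ?topicLabel WHERE {{
  ?author wdt:P463 wd:{organization_item} .
  ?work wdt:P50 ?author ;
        wdt:P921 ?topic .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
""".strip()


def bindings_to_csv(sparql_json):
    """Flatten a SPARQL JSON result (W3C results format) into CSV text."""
    columns = sparql_json["head"]["vars"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for binding in sparql_json["results"]["bindings"]:
        # Unbound variables are simply missing from a binding.
        writer.writerow([binding.get(c, {}).get("value", "") for c in columns])
    return buffer.getvalue()


# Hand-written stand-in for a query service response (placeholder author ID).
SAMPLE_RESPONSE = {
    "head": {"vars": ["author", "authorLabel", "topic", "topicLabel"]},
    "results": {"bindings": [
        {
            "author": {"type": "uri", "value": "http://www.wikidata.org/entity/Q0000001"},
            "authorLabel": {"type": "literal", "value": "Example Author"},
            "topic": {"type": "uri", "value": "http://www.wikidata.org/entity/Q113378236"},
        }
    ]},
}

print(build_author_topic_query(OTN_ITEM))
print(bindings_to_csv(SAMPLE_RESPONSE))
```

Writing each result to a dated CSV file would give the independent snapshots of Wikidata mentioned above.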
Topics to get deeper into
- annotation pathways
- access points / legal repositories / OA
- Katja notes that the University of California promotes publishing closed-access work via its internal tool https://escholarship.org/ . How can UC publications be shared for better access / visibility?
- what topics to put on the OTN webpage
- ethical issues
https://opentraits.org/publications
https://scholia.toolforge.org/organization/Q112326635
https://opentraits.org/methods
https://opentraits.org/taxa
https://opentraits.org/traits (example of a trait https://scholia.toolforge.org/topic/Q113378236 )
## BigBee project
Katja Seltmann
Slides: https://docs.google.com/presentation/d/15qwNuDI3qfKC5H_RZPzsOGeGn4Nl0SXlEvVU-0XYN8o/edit?usp=sharing
- NSF funded
- data integrations ongoing
- global data also beyond US
- capturing intra specific variation
- 2D -> 3D "packages"
- also interaction data from flowers collected
- from natural history collections
- data acquisition
- aggregations
- integrations
-> Bee library
-> integration of traits/occurrences etc.
-> GloBI Interactions -> Data sets
- body size important "functional" trait, with relevance to fitness
- common measure intertegular span
- pilosity
- -> thermoregulation
- measured by manual counting or Shannon entropy (incl deep learning)
- via Daniel: Scholia on hairiness so far: https://scholia.toolforge.org/topic/Q113378236
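As a rough illustration of the Shannon entropy measure mentioned for pilosity, the sketch below computes the entropy of a set of grey-level pixel values. The notes do not say how Big-Bee derives the entropy, so applying it to a grey-level histogram is an assumption, and the pixel values are invented.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over levels of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical 8-bit grey levels from two small image patches.
SMOOTH_PATCH = [120] * 16              # uniform surface, no texture
HAIRY_PATCH = [30, 200, 45, 180] * 4   # alternating dark and bright pixels

print(shannon_entropy(SMOOTH_PATCH))
print(shannon_entropy(HAIRY_PATCH))
```

A textured (hairy) patch spreads its pixels over more grey levels than a smooth one, so its entropy is higher; that is the intuition behind using entropy as a proxy for hairiness.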
library.big-bee.net
The full proposal can be found at: Seltmann, K. C. (2021). Extending Anthophila research through image and trait digitization (Big-Bee) proposal. UC Santa Barbara: Cheadle Center for Biodiversity and Ecological Restoration. Retrieved from https://escholarship.org/uc/item/2vm761mv
- re: engaging the natural history collections community, here is a TDWG session about integrating them with the Wikimedia ecosystem https://www.tdwg.org/conferences/2022/session-list/#int19%20the%20role%20of%20the%20wikimedia%20ecosystem%20in%20linking%20biodiversity%20data
- about images sharing: https://www.floridamuseum.ufl.edu/overt/ ; https://github.com/bio-guoda/preston-brit-2022
- alex added "Katja, do you know the DISC3D device? https://small-world-vision.de/de/ "
- katja shares 3d macropod: https://macroscopicsolutions.com/product/the-macropod-pro-3d
## Random notes/ topics
- make presentations / posters about OpenTraits available to everybody?
# 9/8/22
## Project GAPS
Assess global gaps in traits across taxa
- Might encourage people to register their datasets in OpenTraits
- Available publicly
- Open call to the community, assess, another call, re-assess, re-iterate -> Create a workflow
- Use minimal info: trait and species and author (would help with duplicates)
- Use checklists to estimate completeness (+ papers estimating the completeness of checklists)
- Focus on people: diversity of researchers studying a given organism
- Mention the importance of intra-specific variation
- Novelty: across domains
- Make trait categories ("buckets"): size, metabolic,...
- Tasks: review OTN registry, select Taxonomic databases/lists, make table with 10 rows from an example,
Here's the datasheet
https://docs.google.com/spreadsheets/d/1CeW3SHKqfwpX8QCcRC9WRgMZ2jRtaeBpwh8Gl0yS0no/edit?usp=sharing
Brian - BIEN inaccessible
Caterina /datasets/amphi-bio
- traits are reported in different formats (columns are sometimes trait categories or values)
- combine the Genus Species into single verbatim sciName value (e.g., genus: Homo, species: sapiens enter Homo sapiens)
All - process of registering datasets - ask people to point to their existing data. Figure out how to/how we will do synthesis
Daniel - https://opentraits.org/datasets/columbian-anurans links to paper and dataset are identical, so it is not clear how to access the data; data spread over three files - a PDF with explanations and a zip file containing two text files (one for species-level traits, another for individual level); some UTF encoding problems with the species-level file
Stefanie - fishbase User sees webpages, and not the database behind. FB website does not have a taxonID, but links to the ITIS, COL and WoRMS IDs. Not really a traitID, but an ID for the source.
Jorrit - re: Fishbase there's a data publication https://github.com/globalbioticinteractions/fishbase and https://github.com/globalbioticinteractions/fishbase-archiver
All - add traitIdVerbatim, traitNameVerbatim
Jorrit - elton-traits dataset - "links to ESA pub broken; doi works; downloaded https://figshare.com/articles/dataset/Data_Paper_Data_Paper/3559887?backTo=/collections/EltonTraits_1_0_Species-level_foraging_attributes_of_the_world_s_birds_and_mammals/3306933 ; 'line:hash://sha256/97216eb1797da077169ebb1ebea275db293b09fc62f8bb8911f9beb98c50d321!/L2'
Jens - BROT dataset - ok, about 19000 unique species / traits combinations . What is the provenance for this dataset? Is it coming from BROT, or do we want to traverse to the source?
Alex - in an ideal world, source data of an aggregate dataset should also be present in the OTN registry, and the provenance could be traced. And, we'd have to be pragmatic too.
Markus - mentions that AUS-traits contains N/A trait values, suggesting trait values should be checked for N/A, null, DD, unknown, undefined, -1, -999, 999, 0 (present/absent or something else), etc. You have to understand the schema and the expected values: "0" could mean not defined, or be an actual value.
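Two of the import checks discussed above, combining genus and species into one verbatim name and screening trait values for "no data" placeholders, can be sketched as follows. The sentinel list is an assumption assembled from the values named in the notes; in practice it would need to be set per dataset, since -1 or 999 can also be real measurements.

```python
# Assumption: default sentinels taken from the values named in the notes.
# "0" is deliberately left out because it can be a real value.
MISSING_SENTINELS = frozenset(
    {"", "na", "n/a", "null", "dd", "unknown", "undefined", "-1", "-999", "999"}
)


def verbatim_sci_name(genus, species):
    """Join genus and species epithet into one verbatim name string."""
    parts = (genus, species)
    return " ".join(p.strip() for p in parts if p and p.strip())


def is_missing(value, sentinels=MISSING_SENTINELS):
    """True if a raw trait value looks like a placeholder for 'no data'."""
    return value is None or str(value).strip().lower() in sentinels


# Hypothetical raw rows, for illustration only.
RAW_ROWS = [
    {"genus": "Homo", "species": "sapiens", "value": "65"},
    {"genus": "Homo", "species": "sapiens", "value": "N/A"},
    {"genus": "Homo", "species": "sapiens", "value": "0"},
    {"genus": "Homo", "species": "sapiens", "value": "-999"},
]

for row in RAW_ROWS:
    name = verbatim_sci_name(row["genus"], row["species"])
    status = "missing" if is_missing(row["value"]) else "kept"
    print(name, repr(row["value"]), status)
```

Passing a dataset-specific `sentinels` set keeps the decision about ambiguous codes with whoever knows that dataset's expected values.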
**What to do with the output table?**
**WEBSITE**
1. Render as is
2. Filter by trait "bucket"
3. Filter by taxon "bucket"
4. Filter by data source
5. Trait summary from trait "bucket" (OTN - Traits - Dataset)
with trait name, number of records, number of taxa and dataset, last summarised date
6. Taxon summary from taxon "bucket" (OTN - Kingdom - Dataset)
with kingdom name, number of records, number of traits and dataset, last summarised date
7. Tracking - Check for versions / flag updates
-> check for availability: "badge" (red: NA /orange: change /green:unchanged)
-> check if dataset is static or dynamic (when claimed by author)
-> change rate / last update
8. Register EOL traits datasets
Prominent datasets
9. Contact owner curators and share results
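The tracking "badge" in item 7 could be driven by content hashes of the kind Jorrit uses (the `hash://sha256/...` notation appears earlier in these notes). The following is a minimal sketch under that assumption; the function names are hypothetical and the download step is left out, so `current_bytes` stands for whatever was retrieved from the dataset URL, or `None` if retrieval failed.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Content identifier in the hash://sha256/<hex digest> notation."""
    return "hash://sha256/" + hashlib.sha256(data).hexdigest()


def availability_badge(previous_id, current_bytes):
    """Return the badge colour: red = NA, orange = changed, green = unchanged."""
    if current_bytes is None:
        return "red"
    if content_id(current_bytes) == previous_id:
        return "green"
    return "orange"


# Hypothetical dataset snapshot, for illustration only.
SNAPSHOT = b"species,trait,value\nSpecies A,body mass,12.5\n"
SNAPSHOT_ID = content_id(SNAPSHOT)

print(SNAPSHOT_ID)
print(availability_badge(SNAPSHOT_ID, SNAPSHOT))                   # same bytes
print(availability_badge(SNAPSHOT_ID, SNAPSHOT + b"extra row\n"))  # changed bytes
print(availability_badge(SNAPSHOT_ID, None))                       # not retrievable
```

Storing the identifier together with a retrieval date would also give the "change rate / last update" information.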
Groups:
- dataset summary: Daniel, Markus, Caterina
- versioning: Jorrit, Daniel
- data import: Brian, Alex, Jens, Caterina
- registering trait datasets EOL: Jen
- talk about buckets: Stefanie, then discussed as group
- taxonomy: Jorrit
MAD (Make A Database) already has scripts to download datasets: https://github.com/willpearse/MADtraits/blob/master/R/downloads.R
# 10/8/22
Progress
- Alex & Brian
Code in progress, already running for 3 datasets
- Markus & Caterina
Interface and summary tables with fake data - to be implemented from main table
- Stefanie
Buckets: https://docs.google.com/document/d/1XIrxFpcjU6i9gXHIlHwhmxxlUfiYjVwjJbYtoP2rmI0/edit?usp=sharing
- Daniel
- Wikidata-based data models for trait types, versioning and profiling of trait datasets
- Jorrit
pipeline for versioning of datasets and taxonomy solving
- Jens
Data magic for TRY
List of things to do
- Methods paper
**GAP ANALYSIS**
The summary pages may provide the basis for the gap analysis
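One way to turn such summaries into a gap measure, following the earlier idea of using checklists to estimate completeness, is the share of checklist taxa that have at least one record per trait category. The sketch below assumes taxon names are already aligned to the checklist; the checklist and records are invented.

```python
def trait_completeness(records, checklist):
    """Share of checklist taxa with at least one record, per trait category."""
    checklist = set(checklist)
    if not checklist:
        raise ValueError("checklist must contain at least one taxon")
    covered = {}
    for taxon, category in records:
        if taxon in checklist:
            covered.setdefault(category, set()).add(taxon)
    return {category: len(taxa) / len(checklist) for category, taxa in covered.items()}


# Hypothetical checklist and (taxon, trait category) records.
CHECKLIST = ["Taxon a", "Taxon b", "Taxon c", "Taxon d"]
RECORDS = [
    ("Taxon a", "size"),
    ("Taxon a", "size"),       # duplicate record, counted once
    ("Taxon b", "size"),
    ("Taxon c", "phenology"),
    ("Taxon x", "size"),       # not on the checklist, ignored
]

COMPLETENESS = trait_completeness(RECORDS, CHECKLIST)
for category, share in sorted(COMPLETENESS.items()):
    print(f"{category}: {share:.0%} of checklist taxa covered")
```

The gap for a category is then one minus its share; categories with no records at all do not appear in the result and would need to be added from the list of trait "buckets".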
Aug 10, 2022
Daniel brings up the editorial guidelines related to editing member profiles.
Is it ok for someone to enter someone else's profile?
Brian - minor editorial corrections are ok (e.g., precision or errors in lat/long, formatting errors like malformed url). Consent is required for personal information, affiliations, links (e.g., github handles, ORCIDs).
Daniel - whenever a new property is added to the member profile template, a ticket should be opened. E.g., when adding gender, pronouns, sex, you'd have to open a ticket.
re: OTN + Wikidata = ?
Daniel/ Jens - suggest to make a link from OTN member profile to Scholia profile. Opt in/ opt out? Automatically generate? Specify explicitly?
Additional field for Scholia in member template ? Additional instruction to make the wikidata profile representative.
Daniel - did not profile students on the OTN network. Any OTN member with a publication should have a Wikidata entry.
Markus & Daniel worked on rendering the Wikidata-derived CSV files, leading to new pages on the OTN website about [traits](https://opentraits.org/traits) and [events](https://opentraits.org/events) related to OTN members.
Daniel started a [Scholia profile for the workshop](https://scholia.toolforge.org/event/Q113487331)
# 11/8/22
- Key Traits: open data driven, using the overview table across domains, subjective review of data registered to see which traits are most abundant in OTN registered datasets
overview table -> an initial review of a select group of traits across available datasets registered in the Open Traits Network, as seen by some OTN group members, created in Aug 2022.
Jens wants the community to eat the worm: at this workshop we presented an outline and a limited analysis that can be further enhanced. Early registrants will be rewarded for registering early with credit / co-authorship. Tell your friends.
to complete today:
1. Stefanie/Daniel/Jens to assign trait categories to verbatim traits (e.g., "size" category includes "body mass, length, intertegular span"). An example table would be:
before trait category assignment:
traitCategory, traitVerbatim, datasetId
NA, beak-length, https://opentraits.org/datasets/australian-bird
NA, body mass, https://opentraits.org/datasets/australian-bird
after trait category assignment:
traitCategory, traitVerbatim, datasetId
size, beak-length, https://opentraits.org/datasets/australian-bird
size, body mass, https://opentraits.org/datasets/australian-bird
2. Caterina to generate trait/taxa dataset aggregate summary tables (e.g., # traits per dataset, # taxa per dataset)
3. Jorrit to align verbatim taxon information with catalog of life using Nomer and normalized verbatim tables produced by Alex and Brian
4. Jorrit OTN corpus of data available today.
5. Markus to create a web rendering of the Caterina summary table
6. Caterina/Brian to draft a data paper web page that will include Markus's summary tables as figures:
https://docs.google.com/document/d/1p_If1Djw6i5EGLBrC94YTuqz-iuJB8AZBr3DRq0ahF4/edit?usp=sharing
8. Alex to create more datasets
9. Alex to concatenate the aligned datasets into a single table and publish to Zenodo for Caterina/Markus
10. Daniel to curate some of the trait categories from Stefanie into Wikidata and to register the famous Iris dataset in the registry
- registration ticket for the Iris dataset https://github.com/open-traits-network/open-traits-network.github.io/issues/226
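Steps 1 and 2 of the list above (assigning trait categories to verbatim traits, then summarising per dataset) can be sketched as follows, using the example table from step 1. The lookup dictionary is a hypothetical stand-in for the hand-curated category assignments; the actual mapping lives in the shared documents.

```python
import csv
import io
from collections import defaultdict

# Example table from step 1, before category assignment.
VERBATIM_TABLE = """traitCategory, traitVerbatim, datasetId
NA, beak-length, https://opentraits.org/datasets/australian-bird
NA, body mass, https://opentraits.org/datasets/australian-bird
"""

# Assumption: hand-curated lookup from verbatim trait name to category.
CATEGORY_LOOKUP = {"beak-length": "size", "body mass": "size"}


def assign_categories(table_text, lookup):
    """Fill traitCategory from the lookup; unknown traits stay 'NA'."""
    reader = csv.DictReader(io.StringIO(table_text), skipinitialspace=True)
    rows = list(reader)
    for row in rows:
        key = row["traitVerbatim"].strip().lower()
        row["traitCategory"] = lookup.get(key, "NA")
    return rows


def traits_per_dataset(rows):
    """Number of distinct verbatim traits per dataset."""
    traits = defaultdict(set)
    for row in rows:
        traits[row["datasetId"]].add(row["traitVerbatim"])
    return {dataset: len(names) for dataset, names in traits.items()}


CATEGORIZED = assign_categories(VERBATIM_TABLE, CATEGORY_LOOKUP)
for row in CATEGORIZED:
    print(row["traitCategory"], row["traitVerbatim"], row["datasetId"], sep=", ")
print(traits_per_dataset(CATEGORIZED))
```

Leaving unmatched traits as `NA` makes it easy to list what still needs a category.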
next steps:
1. Brian to use the draft data paper to coordinate talk prep about initial gap analysis to be presented at OTN virtual meeting 2022-08-31/2022-09-01 - tentative title - "Global Gap Analysis of Open Trait Data Registered in the Open Trait Network"
2. Jen to fill in Wikidata connections for the rest of the meetings in the series for which https://www.wikidata.org/wiki/Q113493870 was the first meeting. We can maybe invite people to check that they and their friends are attached to pages like that when we meet them at the OTN Zoom meeting.
3. Trait categories: (five of them from GEOBON) phenology, morphology, physiology, reproduction, movement, interaction, habitat, societal value https://docs.google.com/spreadsheets/d/18VAULEfbpmGd8cW5Uis-CFYmUkrjnAYZ/edit#gid=1267670913
4. Handbook of Plant Traits https://www.publish.csiro.au/bt/pdf/BT12225 ; https://scholia.toolforge.org/work/Q56965448