# KB DataFest 2017
## Day 4: December 14, 2017
Available at https://hackmd.io/MYEwrAzBDsBmCcBaAHAJggNkQFlfAhivGFrMAKb4AMAjLLCBsuUA?both
## Day 1: December 11, 2017
Hilmar's opening briefing:
* Wifi: The Chesterfield Public Wifi, need to login every four hours
* DukeVisitor also available, but limited bandwidth
Phenoscape timeline:
* ~2007: Incubated at NESCent
* until ~2011: Phenoscape I
* until ~2016: Phenoscape II
* currently: SCATE (Phenoscape NG)
* http://scate.phenoscape.org/
Paula Mabee's keynote:
* Genomics -> phenomics
* Phenotype RCN: http://www.phenotypercn.org/
* phenotype-environment-evolution
* Science questions
* Inferences:
* Presence of part -> presence of whole
* Quality of part -> presence of part
* Absence of developmental precursor -> infer absence
* 0.5% of populated cells are conflicted: both present and absent, say
* Can help you identify sampling gaps with "missing" phenotypic data
* Potential: Arbor Matrix Explorer- ontology enabled
* Look for phenotypic disparities: where are the similarities, where are the differences
* phylogeny- traits and trees
* Can answer really broad questions like, "how often were fins lost?"
* Phenotypic data is very, very sparse, so we need lots of inferencing and propagating down a tree
* "Forward genomics"- Hiller et al, Cell Reports (http://dx.doi.org/10.1016/j.celrep.2012.08.032)
* Gene-phenotype connections- shared ontologies tie it together
* MODs link phenotypes to genes
* NSF funds 23 model organisms, and is interested in identifying variation of interest to folks focussed on disease
* KB can help predict target genes from existing models- catfish lack a tongue
* semantic similarity reasoning links phenotypes to set of candidate genes
* "Phenoblast": search for genes with a similar phenotype or vice versa
* Can we predict the phenotype of, say, the [Pichi Armadillo, *Zaedyus pichiy*](https://en.wikipedia.org/wiki/Pichi) from its genome?
* free text descriptions not computable
* PNAS publication- predicting phenotype from whole genome sequence
* If phenotype <-> taxon and taxon <-> location and location <-> environment, can we link phenotype with environment, can we start thinking about adaptive changes in similar environments?
* https://peerj.com/articles/1470/
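
The inference rules listed in the keynote notes above (presence of a part implies presence of the whole, a quality of a part implies presence of that part, absence of a developmental precursor implies absence of the structure) can be sketched in a few lines of Python. This is only a toy illustration, not how the Phenoscape KB reasoner works: the `PART_OF` and `DEVELOPS_FROM` tables, the entity names, and the `infer` and `conflicts` helpers are all hypothetical.

```python
from collections import defaultdict

# Toy relations, invented for illustration (not taken from Uberon or the KB).
PART_OF = {"pectoral fin ray": "pectoral fin"}        # part -> whole
DEVELOPS_FROM = {"pectoral fin": "pectoral fin bud"}  # structure -> precursor


def infer(assertions):
    """Propagate (taxon, entity, state) assertions.

    state is 'present', 'absent', or 'quality:<description>'.
    Returns {(taxon, entity): set of 'present'/'absent'}.
    """
    cells = defaultdict(set)
    queue = list(assertions)
    while queue:
        taxon, entity, state = queue.pop()
        if state.startswith("quality:"):
            # Quality of part -> presence of part
            state = "present"
        if state in cells[(taxon, entity)]:
            continue  # already known, nothing new to propagate
        cells[(taxon, entity)].add(state)
        if state == "present" and entity in PART_OF:
            # Presence of part -> presence of whole
            queue.append((taxon, PART_OF[entity], "present"))
        if state == "absent":
            # Absence of developmental precursor -> infer absence
            for structure, precursor in DEVELOPS_FROM.items():
                if precursor == entity:
                    queue.append((taxon, structure, "absent"))
    return dict(cells)


def conflicts(cells):
    """Cells that end up both present and absent."""
    return sorted(k for k, v in cells.items() if {"present", "absent"} <= v)
```

Feeding in an elongated pectoral fin ray together with an absent pectoral fin bud for the same taxon makes `conflicts` report the pectoral fin, which is the kind of conflicted cell the 0.5% figure refers to.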
# Icebreakers
# Subgroups (Open Space exercise):
## Rules of engagement
* Only propose ideas that you would be excited to do yourself.
* Three minute limit on project pitches
* Each idea gets a flipchart page, later refined into a subgroup or discarded through lack of interest
* Everybody should propose, discuss or learn ideas
* Questions that challenge the validity of the idea itself are not permitted right now
* Questions that help you understand the pitch are permitted
## Initial pitches
1. Environments - Anne Thessen
* "Phenopackets": what does "tropical" mean in terms of temperature, humidity, etc.
* Treating the environment as a phenotype
* We think of phenotype as internal to an organism, but it is also how the genotype interacts with the environment
2. Genotype-phenotype mapping using deep learning - Marjan Sadeghi
* Developing an algorithm based on deep-learning
* Is there sufficient data to apply deep learning?
* Can we find specific questions that deep learning might be able to answer?
* Training set: genomic data, gene expression levels,
* Uniprot database might have phenotypes
* What is deep learning? Using (>2 layer) neural networks
* Output: a classifier
3. How to develop queryable phenotypes?- Istvan
* Software like Noctua, mx that
* What reasoners could we use for querying phenotypes?
* Describe best practices -- what we have done so far, what we need to do in the future -- and generate recommenders for reasoners
4. Digitization of museum records - Rebecca
* How to integrate into phenoscape?
* What is a gap in our knowledge of ontology
* Digimorph: effort to digitize museum specimens -- can we use this data to identify/annotate phenotypes?
* At the least, they should be queryable from Phenoscape
* Maybe link this to the Open Tree of Life
5. Extracting data for comparative trait analysis from Phenoscape- Josef
* Traits from phenoscape in
* Dependencies: "this can only evolve if we already have that"
6. Pheno-phylo - Emily Jane
* https://github.com/phenoscape/KB-DataFest-2017/issues/12
* Integrate better phylogenetic information with phenoscape data
7. Wikidata - Gaurav
* connections between Wikidata and Uberon and other phenotypic ontologies
8. Read and annotate phenotypic data - Matt Yoder
* In Taxonworks
* Use case on Github: add an image
* crosswalking proof of concept API experiment
9. How can Phenoscape facilitate other ontologies? - Annika Smith
* Connections to plant data?
* How do you create a phenotype ontology for a clade, like plants?
* There is already a Plant Ontology and an Arabidopsis ontology -- how can phenoscape help put those pieces together?
* Lots of data coming in through digitization
10. Trait/character extraction and data capture from images - Austin
* Translate into ontology terms that are human-readable
11. Noctua: ontology annotation software - Jim
* Web-based graphical editor for describing a set of relationships at the instance level, drawing terms from a set of ontologies that are configured in it (http://noctua.berkeleybop.org/)
* Related to Istvan's project
12. JSON-based templates for describing common phenotypes - Matt
* How can we get other people to understand the semantic complexity of what we do?
13. Correlating phenotypes with function - Ethan
14. Translation of NeXML to JSON-LD - Scott
* Could use NexSON?
* Carl Boettiger's idea
* Could provide a better way to get data out of the Phenoscape API
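
To make the NeXML to JSON-LD pitch concrete, here is a minimal sketch of turning one annotated element into a JSON-LD node using only the Python standard library. It rests on several assumptions: the XML snippet is a simplified, hand-written imitation of a NeXML `otu` with `meta` annotations (no namespaces, not schema-valid), the `example.org` identifiers are placeholders, and `element_to_jsonld` is a hypothetical helper rather than part of any existing tool.

```python
import json
import xml.etree.ElementTree as ET

# Simplified stand-in for a NeXML element with literal and resource annotations.
SNIPPET = """
<otu id="otu1" label="Ictalurus punctatus">
  <meta property="dc:description" content="channel catfish"/>
  <meta rel="dwc:taxonID" href="http://example.org/taxon/1"/>
</otu>
"""

# Prefix mappings so compact keys such as "dc:description" expand to full IRIs.
CONTEXT = {
    "dc": "http://purl.org/dc/terms/",
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "label": "http://www.w3.org/2000/01/rdf-schema#label",
}


def element_to_jsonld(xml_text, base="http://example.org/doc"):
    """Convert one annotated element into a JSON-LD node (a plain dict)."""
    elem = ET.fromstring(xml_text.strip())
    node = {"@context": CONTEXT, "@id": f"{base}#{elem.get('id')}"}
    if elem.get("label"):
        node["label"] = elem.get("label")
    for meta in elem.findall("meta"):
        if meta.get("property") is not None:
            # Literal annotation: property -> plain string value
            node[meta.get("property")] = meta.get("content")
        elif meta.get("rel") is not None:
            # Resource annotation: rel -> link to another IRI
            node[meta.get("rel")] = {"@id": meta.get("href")}
    return node
```

Calling `json.dumps(element_to_jsonld(SNIPPET), indent=2)` prints the node. A real converter would also have to carry over the parts of NeXML that are not annotations, such as the matrix and the trees, which is where most of the design work would lie.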
## Discussion with the leader
Push towards a convincing pitch, with:
* An anticipated product
* Do you have what it takes to get there?
## Second round of pitches
Supportive but critical.
1. Environments! And Phenopackets
* What traits are unique to organisms that live in low-light environments?
* Deliverable: Best practices for representing phenopackets, try to add it to ENVO
* Talk to the ENVO people
* Deliverable: See if we can automatically make a phenopacket
2. Integrating Museum Specimen Data into Phenoscape
* Can definitely map ~200 specimens in Phenoscape to iDigBio
* Linking taxonomy is difficult, but there will be a lot of overlap
* Look at how morphology maps onto a global map
* 1.8 million museum records in iDigBio about frogs
* Records coming from Morphosource, which has manually curated images, hopefully using Uberon terms
* Look at limb structure of frogs on a latitudinal gradient
* Selfish reason: museum specimens to phylogenies via phenotypes
* Open Tree of Life uses GBIF Backbone Taxonomy, so that should be easy to link with iDigBio
* Ideas for questions they could ask?
3. Harnessing Phenoscape infrastructure for other groups
* Make a "Plant Phenoscape Lite"
* How do you move on from a model organism like Arabidopsis? What is the actual process that gets you something that can integrate with Phenoscape?
* What is the process of semantic annotation?
* Want something that's reproducible
* Start within the order Brassicales, focus on one plant family in particular, maybe focus exclusively on floral anatomy
* Will need someone from Phenoscape to work with.
* Deliverable: prepare an outline for a paper on this subject
* iDigBio has a bunch of plants, so include specimen identifiers from the start!
* Research questions?
* Identifying and studying trait evolution?
* Images would be awesome!
4. NeXML to JSON-LD Conversion
* https://hackmd.io/CwdgZghhxgjAtMAxgIxIgHATgKzwziAAzwBsAJkQEwixICmRERSQA===#
* NeXML would have rich expressivity for trait data, including RDFa
* NeXML in theory has an RDF basis
* So you could do NeXML -> RDF, and then save that as JSON-LD? (I don't think that would work well)
* Merged with original idea of Phenoscape talking to RDF
* Could be a white paper on how those transformations take place
* Driving use-cases:
* Phenoscape (or any other semantic) annotations on NeXML may be easier to express in JSON-LD than in meta RDFa.
* Phenoscape data embedded in NeXML would be easier to mobilize if the entire document was in JSON-LD. (The current strategy extracts only the RDFa data, so the rest of the NeXML is lost, and RDF/XML is still cumbersome.)
* moving annotations round-trip from Phenoscape
* What is the application? How do you evaluate that the result meets the goal?
* Looking at what's in the RDF files and seeing what the connections are?
* Goal: preserving semantics between the various portals
5. PhenoPhylo
* Shared evolutionary history of a taxa and
* Better connections between phenotype data and phylogenies
* How do we interpret changes in the composition of taxonomy (e.g. families).
* Pasan has software for reconciling phylogenies from the Open Tree with Phenoscape phenotypes
6. Using Noctua for composing Phenotype
* Set up an instance of Noctua with all the ontologies we use
* Maybe write a white paper on how to express complex phenotypes, based on a previous white paper written by them
* Maybe a demo of Noctua if there is time?
7. Extracting data for comparative trait analysis from Phenoscape
* Deliverable: R package that links RPhenoscape with a package that does ancestral state reconstruction
* Generates trait matrices from Phenoscape data
* https://github.com/phenoscape/KB-DataFest-2017/issues/13
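
As a rough picture of what "generates trait matrices from Phenoscape data" could mean, the sketch below pivots a list of (taxon, character, state) records into a taxon-by-character matrix. It is written in Python purely for illustration (the proposed deliverable is an R package), and the record format, the `?` missing-data symbol, and the `build_matrix` helper are assumptions rather than anything RPhenoscape defines.

```python
def build_matrix(records, missing="?"):
    """Pivot (taxon, character, state) records into a trait matrix.

    Returns (characters, rows), where rows maps each taxon to a list of
    states aligned with `characters`. Cells with no data get `missing`;
    cells with several states are joined with '/' (polymorphic or conflicting).
    """
    taxa = sorted({taxon for taxon, _, _ in records})
    characters = sorted({char for _, char, _ in records})

    # Collect every state observed for each (taxon, character) cell.
    lookup = {}
    for taxon, char, state in records:
        lookup.setdefault((taxon, char), set()).add(state)

    rows = {}
    for taxon in taxa:
        rows[taxon] = [
            "/".join(sorted(lookup[(taxon, char)]))
            if (taxon, char) in lookup
            else missing
            for char in characters
        ]
    return characters, rows
```

Because phenotypic data is sparse, most cells in a real matrix would carry the missing-data symbol, and an ancestral state reconstruction package would need to be told how to treat them.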
## Bootcamp ideas
* Noctua tutorial
* Phenopacket tutorial (maybe get Melissa to do this remotely?)
* Phenoscape API (matt: with emphasis on API content): 3:45pm, somewhere down the hall
## Phenoscape API Bootcamp
* Triplestore called Blazegraph
* Web services application that provides JSON or NeXML (see http://kb.phenoscape.org/apidocs/#/ for the Swagger-rendered docs)
* Website uses web services
* [`/term/search`](http://kb.phenoscape.org/apidocs/#/Terms/get_term_search) can be used to retrieve a list of terms with a particular name, such as a taxonomic name. Might be useful for cross-linking with other databases.
* http://yasgui.org/ http://db.phenoscape.org/bigsparql
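
A small sketch of calling `/term/search` from Python, using only the standard library. The base URL `http://kb.phenoscape.org/api` and the query parameter names `text` and `limit` are assumptions made for this example and should be checked against the Swagger docs linked above; the shape of the JSON response is not assumed here, so the parsed result is returned as-is.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the web services live under this base URL (verify in the API docs).
API_BASE = "http://kb.phenoscape.org/api"


def build_term_search_url(text, limit=10):
    """Build a /term/search URL; 'text' and 'limit' are assumed parameter names."""
    query = urlencode({"text": text, "limit": limit})
    return f"{API_BASE}/term/search?{query}"


def search_terms(text, limit=10, timeout=30):
    """Fetch matching terms and return the parsed JSON response unchanged."""
    with urlopen(build_term_search_url(text, limit), timeout=timeout) as response:
        return json.load(response)
```

For cross-linking with another database, one would call something like `search_terms("Danio rerio")` for each taxonomic name and keep the returned term identifiers alongside the other database's own identifiers.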