# Project 26 notes BioHackathon Europe 22
## Potentially interesting metabolomics datasets:
These datasets were found in the MetaboLights repository (annotated with ChEBI, (class) name, SMILES*, InChI*) and the Metabolomics Workbench (annotated with PubChem*, KEGG*), using the search term 'toxic' (the term 'UVCB' gave 0 results). *If available?
| ID | Species | Chemical(s) | Comment(s) | Technique | Annotated IDs | Unannotated peaks |
| ----------- | ----------- | ----------- | ----------- |----------- |----------- |----------- |
| [MTBLS275](https://www.ebi.ac.uk/metabolights/MTBLS275/descriptors) | mouse (Mus musculus) | chlorpyrifos; chlorpyrifos-methyl metabolite (3,5,6-trichloro-2-pyridinol) | oral exposure | NMR | 52 | 2+? |
| [MTBLS48](https://www.ebi.ac.uk/metabolights/MTBLS48/descriptors) | mouse (Mus musculus) | Municipal wastewater effluents (MWWE) | transcriptomic data also available | NMR | 47 | 1+? |
| [MTBLS532](https://www.ebi.ac.uk/metabolights/MTBLS532/descriptors) | Earthworms (Eisenia fetida) | Triclosan (TCS); methyl-triclosan (MTCS) | TCS is ubiquitous in sewage sludge, a large proportion is transformed into MTCS | GC-MS | 17 | ? |
| [MTBLS602](https://www.ebi.ac.uk/metabolights/MTBLS602/descriptors) | mouse (Mus musculus) | boscalid, captan, chlorpyrifos, thiofanate, thiacloprid, ziram | Mixture oral exposure, Untargeted urine, plasma, liver samples | NMR | 209 | 2+? |
| [MTBLS596](https://www.ebi.ac.uk/metabolights/MTBLS596/descriptors) | mouse (Mus musculus) | boscalid, captan, chlorpyrifos, thiofanate, thiacloprid, ziram | Mixture oral exposure, Untargeted urine samples | UPLC-MS | 6 | 77 |
| [MTBLS360](https://www.ebi.ac.uk/metabolights/MTBLS360/descriptors) | Trypanosoma brucei (African trypanosomiasis) | 3-(oxazolo[4,5-b]pyridine-2-yl)anilide (OXPA) | "non-toxic" drug, unknown mechanism of action. | LC-MS | 506 | 2+? |
| [MTBLS1196](https://www.ebi.ac.uk/metabolights/MTBLS1196/descriptors) | various micro-organisms | Polycyclic aromatic hydrocarbons (PAHs) | soil samples | GC-TOF-MS | 264 | 3+? |
| [MTBLS2878](https://www.ebi.ac.uk/metabolights/MTBLS2878/descriptors) | Euglena gracilis (microalga) | CdCl2 (heavy metal), paromomycin (antibiotic) | - | UHPLC-MS/MS | 3806 | 422 + ? |
| [MTBLS2166](https://www.ebi.ac.uk/metabolights/MTBLS2166/descriptors) | Glossina morsitans morsitans (African trypanosomiasis) | nitisinone | - | LC-MS/MS | 40 | ? |
| [MTBLS5772](https://www.ebi.ac.uk/metabolights/MTBLS5772/descriptors) | Bottlenose dolphin (Tursiops) | "Trace elements" | samples were collected from dead animals |UHPLC-MS(/MS?) | - | 316 + ? |
| [ST001428](https://www.metabolomicsworkbench.org/data/DRCCMetadata.php?Mode=Study&StudyID=ST001428&StudyType=MS&ResultType=1) | human (Homo sapiens) | "environmental toxicants" | nonalcoholic fatty liver disease (NAFLD) and nonalcoholic steatohepatitis (NASH) in children | LC-MS | ? | >103 (duplicates) + ? |
| [ST000446](https://www.metabolomicsworkbench.org/data/DRCCMetadata.php?Mode=Study&StudyID=ST000446&StudyType=MS&ResultType=5) | E. coli | apratoxin | anti-cancer treatment drug |UHPLC-MS | >160(duplicates) | ~20 + ?|
| [ST000415](https://www.metabolomicsworkbench.org/data/DRCCMetadata.php?Mode=Study&StudyID=ST000415&StudyType=MS&ResultType=1) | rat (Rattus norvegicus) | Flame Retardant Mixture Firemaster 550 | placenta samples |GCxGC-MS | ? |73 + ? |
* ~~MZmine could be used to "reannotate" the raw data files from the projects above, to find more matches than currently available. [R package MZmine](https://www.pharmacognosie-parisdescartes.fr/pdf/150420_MZmine_Tutorial_UNIGE.pdf)~~
* WebChem (for reannotation of names) [R package WebChem](https://cran.r-project.org/web/packages/webchem/webchem.pdf), [Github WebChemR](https://github.com/ropensci/webchem)
* Obtaining data: [R package MetabolighteR](https://aberhrml.github.io/metabolighteR/articles/Introduction_to_metabolighteR.html); [Metabolomics Workbench API](https://www.metabolomicsworkbench.org/tools/MWRestAPIv1.0.pdf)
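As a starting point for retrieving the Metabolomics Workbench studies listed in the table, the sketch below only builds request URLs. It assumes the `rest/<context>/<input item>/<input value>/<output item>` URL pattern described in the linked REST API document; the helper name `mw_study_url` and the default `summary` output item are illustrative choices, and no request is actually sent.

```python
from urllib.parse import quote

# Assumed base URL of the Metabolomics Workbench REST service (check the linked PDF).
MW_REST_BASE = "https://www.metabolomicsworkbench.org/rest"

# Study IDs taken from the dataset table above.
STUDY_IDS = ["ST001428", "ST000446", "ST000415"]


def mw_study_url(study_id, output_item="summary"):
    """Build a REST URL for one study (hypothetical helper, no network access)."""
    return f"{MW_REST_BASE}/study/study_id/{quote(study_id)}/{quote(output_item)}"


# Map every study ID to the URL that would be requested.
urls = {study_id: mw_study_url(study_id) for study_id in STUDY_IDS}

if __name__ == "__main__":
    for study_id, url in urls.items():
        print(study_id, url)
```

The URLs can then be fetched with any HTTP client; the response format depends on the chosen output item.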
## PubChemLite
RDF of PubChem is available on a "hidden" Virtuoso endpoint. A REST RDF API ([documentation](https://pubchemdocs.ncbi.nlm.nih.gov/rdf#_5)) provides users with restricted access (to avoid overloading resources) [Evan].
@egonw [Egon]: how do I get a subset of PubChem RDF?
## Biochemical Transformation Dataset
Provided by [Emma], summarized from all data on PubChem (both NORMAN data and ChEMBL).
Data on Zenodo DOI:[10.5281/zenodo.5644560](https://doi.org/10.5281/zenodo.5644560).
* [Egon] interested in creating GPML for this
## UVCBs: NORMAN-SLE
### Outlining SLE datasets for PubChem
Several datasets, roughly in four categories:
* Surfactants: S7 EAWAGSURF, S8 ATHENSUS, S18 TSCASURF, S23 EIUBASURF
* PFAS: S9 PFASTRIER, S25 OECDPFAS, S80 PFASGLUEGE
* Extra: CompTox datasets of interest: PFASDEV1, PFASMARKUSH, PFASMASTER
* Regulatory lists: S17 KEMIMARKET, S18 TSCASURF, S32 REACH
* Plastics: S47 ECHAPLASTICS, S48/9 CPPDB, S77 FCCDB
### UVCB Names
* Curated S77 FCCdb based on error report, new version DOI:[10.5281/zenodo.7304977](https://doi.org/10.5281/zenodo.7304977)
* Curated S71 CECSCREEN based on error report, new version DOI:[10.5281/zenodo.7305138](https://doi.org/10.5281/zenodo.7305138)
* Emma and Evan did a lot of number crunching of NORMAN-SLE UVCB names based on output sent by Jeff. An excerpt:
```
Synonyms found in 1 concepts: 142172
Synonyms found in 2 concepts: 13248
Synonyms found in 3 concepts: 1668
Synonyms found in 4 concepts: 313
Synonyms found in 5 concepts: 76
Synonyms found in 6 concepts: 20
Synonyms found in 7 concepts: 1
Synonyms found in 11 concepts: 1
Synonyms found in 22 concepts: 2
Synonyms found in 23 concepts: 2
```
* Evan and Emma came up with a set of RegEx / name detections to catch "typical" UVCBs (i.e. records in PubChem that should not have CIDs associated with them)
```
if ( $s =~ /C\d+\-C\d+/ ) { $is_uvcb = 1; }
elsif ( $s =~ /C\d+\-\d+/ ) { $is_uvcb = 1; }
elsif ( $s =~ /C\>\d+/ ) { $is_uvcb = 1; }
elsif ( $s =~ /activated/i ) { $is_uvcb = 1; }
elsif ( $s =~ /alcohols/i ) { $is_uvcb = 1; }
elsif ( $s =~ /alkane/i ) { $is_uvcb = 1; }
elsif ( $s =~ /alkanol/i ) { $is_uvcb = 1; }
elsif ( $s =~ /alkoxy/i ) { $is_uvcb = 1; }
elsif ( $s =~ /alkyl/i ) { $is_uvcb = 1; }
elsif ( $s =~ /alloy/i ) { $is_uvcb = 1; }
elsif ( $s =~ /amines/i ) { $is_uvcb = 1; }
elsif ( $s =~ /amides/i ) { $is_uvcb = 1; }
elsif ( $s =~ /analog/i ) { $is_uvcb = 1; }
elsif ( $s =~ /benzylated/i ) { $is_uvcb = 1; }
elsif ( $s =~ /branched/i ) { $is_uvcb = 1; }
elsif ( $s =~ /brominated/i ) { $is_uvcb = 1; }
elsif ( $s =~ /butter/i ) { $is_uvcb = 1; }
elsif ( $s =~ /charcoal/i ) { $is_uvcb = 1; }
elsif ( $s =~ /chlorinated/i ) { $is_uvcb = 1; }
elsif ( $s =~ /complex/i ) { $is_uvcb = 1; }
elsif ( $s =~ /condensed/i ) { $is_uvcb = 1; }
elsif ( $s =~ /distillate/i ) { $is_uvcb = 1; }
elsif ( $s =~ /dervs/i ) { $is_uvcb = 1; }
elsif ( $s =~ /derivs/i ) { $is_uvcb = 1; }
elsif ( $s =~ /diluent/i ) { $is_uvcb = 1; }
elsif ( $s =~ /dimer/i ) { $is_uvcb = 1; }
elsif ( $s =~ /earth/i ) { $is_uvcb = 1; }
elsif ( $s =~ /esters/i ) { $is_uvcb = 1; }
elsif ( $s =~ /extract/i ) { $is_uvcb = 1; }
elsif ( $s =~ /ext\./i ) { $is_uvcb = 1; }
elsif ( $s =~ /fatty/i ) { $is_uvcb = 1; }
elsif ( $s =~ /fluid/i ) { $is_uvcb = 1; }
elsif ( $s =~ /fragment/i ) { $is_uvcb = 1; }
elsif ( $s =~ /fruit/i ) { $is_uvcb = 1; }
elsif ( $s =~ /gum/i ) { $is_uvcb = 1; }
elsif ( $s =~ /hormone/i ) { $is_uvcb = 1; }
elsif ( $s =~ /hydrocarbon/i ) { $is_uvcb = 1; }
elsif ( $s =~ /hydrogel/i ) { $is_uvcb = 1; }
elsif ( $s =~ /hydrogenated/i ) { $is_uvcb = 1; }
elsif ( $s =~ /hydrol[iy][sz]ed/i ) { $is_uvcb = 1; }
elsif ( $s =~ /ketones/i ) { $is_uvcb = 1; }
elsif ( $s =~ /linear/i ) { $is_uvcb = 1; }
elsif ( $s =~ /liq\./i ) { $is_uvcb = 1; }
elsif ( $s =~ /mixture/i ) { $is_uvcb = 1; }
elsif ( $s =~ /modified/i ) { $is_uvcb = 1; }
elsif ( $s =~ /oil/i ) { $is_uvcb = 1; }
elsif ( $s =~ /oligomer/i ) { $is_uvcb = 1; }
elsif ( $s =~ /oxidized/i ) { $is_uvcb = 1; }
elsif ( $s =~ /pigment/i ) { $is_uvcb = 1; }
elsif ( $s =~ /pills/i ) { $is_uvcb = 1; }
elsif ( $s =~ /petrol/i ) { $is_uvcb = 1; }
elsif ( $s =~ /poly/i ) { $is_uvcb = 1; }
elsif ( $s =~ /product/i ) { $is_uvcb = 1; }
elsif ( $s =~ /rare/i ) { $is_uvcb = 1; }
elsif ( $s =~ /reaction/i ) { $is_uvcb = 1; }
elsif ( $s =~ /resin/i ) { $is_uvcb = 1; }
elsif ( $s =~ /rosin/i ) { $is_uvcb = 1; }
elsif ( $s =~ /salt/i ) { $is_uvcb = 1; }
elsif ( $s =~ /solution/i ) { $is_uvcb = 1; }
elsif ( $s =~ /solvent/i ) { $is_uvcb = 1; }
elsif ( $s =~ /starch/i ) { $is_uvcb = 1; }
elsif ( $s =~ /steroid/i ) { $is_uvcb = 1; }
elsif ( $s =~ /syrup/i ) { $is_uvcb = 1; }
elsif ( $s =~ /tannin/i ) { $is_uvcb = 1; }
elsif ( $s =~ /tree/i ) { $is_uvcb = 1; }
elsif ( $s =~ /unsaturated/i ) { $is_uvcb = 1; }
elsif ( $s =~ /wax/i ) { $is_uvcb = 1; }
elsif ( $s =~ /wool/i ) { $is_uvcb = 1; }
elsif ( $s =~ /whole/i ) { $is_uvcb = 1; }
```
* TODO: check how many false positives and false negatives these patterns generate [Emma/Evan]
* We can test the regexes on the chemical compound names from the collected datasets [Denise]
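To make the false positive / false negative check concrete, here is a minimal Python sketch of how the name patterns could be evaluated. It is not the Perl script used by Emma and Evan: it ports only the three carbon-range patterns and a subset of the keywords, and the labelled example names are hypothetical and hand-picked for illustration.

```python
import re

# Carbon-range patterns, case-sensitive as in the Perl snippet above.
CARBON_RANGE = re.compile(r"C\d+-C\d+|C\d+-\d+|C>\d+")

# Subset of the keyword list above (assumption: extend with the remaining terms).
KEYWORDS = ["alcohols", "alkyl", "branched", "derivs", "distillate", "extract",
            "fatty", "mixture", "oil", "poly", "reaction", "salt"]
KEYWORD_RE = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def is_uvcb(name):
    """Return True if the name matches any carbon-range or keyword pattern."""
    return bool(CARBON_RANGE.search(name) or KEYWORD_RE.search(name))


def confusion_counts(labelled):
    """Count tp/fp/tn/fn for an iterable of (name, truly_uvcb) pairs."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for name, truth in labelled:
        predicted = is_uvcb(name)
        key = ("t" if predicted == truth else "f") + ("p" if predicted else "n")
        counts[key] += 1
    return counts


# Hypothetical hand-labelled examples, only meant to exercise all four outcomes.
EXAMPLES = [
    ("Alcohols, C12-14, ethoxylated", True),   # caught by a carbon-range pattern
    ("Fatty acids, tall-oil", True),           # caught by keywords
    ("Benzoic acid, sodium salt", False),      # well-defined compound flagged via "salt"
    ("Caffeine", False),                       # correctly ignored
    ("Tallow", True),                          # missed: no pattern matches
]

if __name__ == "__main__":
    print(confusion_counts(EXAMPLES))
```

Substring keywords such as "salt" or "oil" are the most likely source of false positives, so a curated set of labelled names would show which terms need word boundaries.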
### NORMAN-SLE TSCA surfactant data
About 750 compounds from S18 TSCASURF (from James Little).
DOI:[10.5281/zenodo.2628791](https://doi.org/10.5281/zenodo.2628791)
725 surfactants have been added to Wikidata: the CSV was converted into QuickStatements commands and uploaded.
* code: https://github.com/elixir-europe/biohackathon-projects-2022/tree/main/26/wikidata/norman-sle
* example surfactant in Wikidata: https://www.wikidata.org/wiki/Q115141114
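The CSV-to-QuickStatements step can be pictured roughly as follows. This is a sketch under assumptions, not the code in the repository linked above: the column names `name` and `cas` and the sample rows are hypothetical, and only an English label plus an optional CAS Registry Number statement (Wikidata property P231) are emitted, using the tab-separated QuickStatements V1 syntax.

```python
import csv
import io

# Hypothetical input; the real S18 TSCASURF CSV may use different columns.
SAMPLE_CSV = """name,cas
Example surfactant A,0000-00-0
Example surfactant B,
"""


def row_to_commands(row):
    """Turn one CSV row into QuickStatements V1 lines (tab-separated fields)."""
    label = row["name"].replace('"', "'")  # keep the quoted label well-formed
    commands = ["CREATE", f'LAST\tLen\t"{label}"']
    cas = (row.get("cas") or "").strip()
    if cas:
        # P231 is the Wikidata property for the CAS Registry Number.
        commands.append(f'LAST\tP231\t"{cas}"')
    return commands


def csv_to_quickstatements(csv_text):
    """Convert CSV text into one QuickStatements batch string."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.extend(row_to_commands(row))
    return "\n".join(lines)


if __name__ == "__main__":
    print(csv_to_quickstatements(SAMPLE_CSV))
```

A real batch would also need source references and a check against existing items to avoid creating duplicates.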
Ideas:
* Scan EuropePMC (see Google Colab ERM/JRCNM notebook) [Egon]
* Perform community curation (like the sex-bias project) to find false/true positives? [Denise]
## Bioschemas JSON-LD dumps
Rhea has been converted to Bioschemas Turtle. [Egon]
* [recipe](https://github.com/elixir-europe/biohackathon-projects-2022/tree/main/26/bioschemas/rhea)
Pending:
* convert to JSON-LD for consumption by PubChem (alternative 1)
Possible other sources:
* UniProt
* Scholia
## RDF Subset PubChem
* [Evan, Egon]: PubChem RDF REST interface
## CCS / Lipids
* [Emma, Evan] have coordinated import of Erin Baker lipid and PFAS CCS into PubChem.
* [Baker Lab Data Source](https://pubchem.ncbi.nlm.nih.gov/source/25763) now live with [1,073 Live Substances](https://www.ncbi.nlm.nih.gov/pcsubstance?term=%22Baker%20Lab%2C%20Chemistry%20Department%2C%20The%20University%20of%20North%20Carolina%20at%20Chapel%20Hill%22%5BSourceName%5D%20AND%20hasnohold%5Bfilt%5D)
* CCS values in preview on test website, hopefully live soon!
## Grouping Homologue Series
[Anjana] A list of compounds from the PubChemLite exposomics dataset that form homologue series was compiled using the OngLai algorithm. Some outputs [here](https://gitlab.lcsb.uni.lu/eci/pubchem/-/tree/master/annotations/UVCBs/)
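For intuition only, the sketch below groups molecular formulas that differ by a repeating CH2 (or CF2) unit. This is a simplified formula-level heuristic swapped in for illustration; it is not the OngLai algorithm, which works on chemical structures, and formulas that share a key are only candidate homologues. All function names are hypothetical.

```python
import re
from collections import defaultdict

# One element symbol followed by an optional count; no brackets or charges supported.
FORMULA_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Parse a simple molecular formula such as 'C8HF15O2' into element counts."""
    counts = defaultdict(int)
    for element, number in FORMULA_TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return dict(counts)


def series_key(formula, unit_element="H"):
    """Key that is invariant when CX2 units are added (X = unit_element)."""
    counts = parse_formula(formula)
    carbons = counts.pop("C", 0)
    partner = counts.pop(unit_element, 0)
    # Adding one CX2 unit adds 1 C and 2 X, so (X - 2*C) stays constant.
    return (partner - 2 * carbons, tuple(sorted(counts.items())))


def group_homologues(formulas, unit_element="H"):
    """Return groups (size > 1) of formulas sharing a key, sorted by carbon count."""
    groups = defaultdict(list)
    for formula in formulas:
        groups[series_key(formula, unit_element)].append(formula)
    return [sorted(members, key=lambda f: parse_formula(f).get("C", 0))
            for members in groups.values() if len(members) > 1]


if __name__ == "__main__":
    # Saturated fatty acids share a CH2 series; the unsaturated C18H34O2 does not.
    print(group_homologues(["C16H32O2", "C18H36O2", "C18H34O2"]))
    # Perfluorocarboxylic acids differ by CF2 units.
    print(group_homologues(["C4HF7O2", "C8HF15O2", "C6HF11O2"], unit_element="F"))
```

Isomers and unrelated compounds can share a key, which is one reason a structure-based method is needed for the real grouping.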
## Federated queries
Update queries (from WP to another one) [here](https://github.com/wikipathways/SPARQLQueries/tree/master/C.%20Collaborations)
* Blog on [named queries](https://www.bergnet.org/2022/11/sparql-named-query/) (for faster federated queries)
* [Dominik] MolMeDB-WP: (From WP-endpoint)
Idea: find membrane interactions for the metabolites in one pathway (PW). The query asks MolMeDB for permeabilities of source-target pairs on the same membrane, measured or computed by the same method.
```SPARQL=
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX bao: <http://www.bioassayontology.org/bao#>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX chebi: <http://purl.obolibrary.org/obo/chebi/>
PREFIX sachem: <http://bioinfo.uochb.cas.cz/rdf/v1.0/sachem#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX sso: <http://semanticscience.org/resource/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX mmdbvoc: <https://rdf.molmedb.upol.cz/vocabulary#>
SELECT distinct ?chebioSrc ?similarSrc ?ikeySrc ?srcLogPerm
?chebioTgt ?similarTgt ?ikeyTgt ?tgtLogPerm
?membraneName ?methodName WHERE
{
# WikiPathways service
?interaction dcterms:isPartOf ?pathway ; a wp:Conversion ;
wp:source ?source ;
wp:target ?target .
?source wp:bdbChEBI ?chebiSrc .
?target wp:bdbChEBI ?chebiTgt .
?pathway dcterms:identifier "WP4225".
BIND(iri(concat("http://purl.obolibrary.org/obo/CHEBI_", substr(str(?chebiSrc),37))) AS ?chebioSrc)
BIND(iri(concat("http://purl.obolibrary.org/obo/CHEBI_", substr(str(?chebiTgt),37))) AS ?chebioTgt)
#MolMeDB - everything else happens here
SERVICE <https://idsm.elixir-czech.cz/sparql/endpoint/molmedb>
{
#IDSM CHEBI - find similar molecules in ChEBI
SERVICE <https://idsm.elixir-czech.cz/sparql/endpoint/chebi>
{
?chebioSrc ^sso:is-attribute-of / sso:has-value ?molfileSrc .
?chebioTgt ^sso:is-attribute-of / sso:has-value ?molfileTgt .
[ sachem:compound ?similarSrc; sachem:score ?scoreSrc ]
sachem:similaritySearch [
sachem:query ?molfileSrc ;
sachem:cutoff 98e-2
].
[ sachem:compound ?similarTgt; sachem:score ?scoreTgt ]
sachem:similaritySearch [
sachem:query ?molfileTgt ;
sachem:cutoff 98e-2
].
}
#IDSM - find InChIKeys of similar source and target
SERVICE <https://idsm.elixir-czech.cz/sparql/endpoint/idsm>
{
?similarSrc chebi:inchikey ?ikeySrc .
?similarTgt chebi:inchikey ?ikeyTgt .
}
#search MolMeDB for pairs similar to source-target pairs and permeability for same membrane and method
?mmdbSrc sio:SIO_000008 [a sio:CHEMINF_000059;
sio:SIO_000300 ?ikeySrc] ;
^bao:BAO_0090012 [bao:BAO_0090012 ?membrane ;
bao:BAO_0000212 ?method ;
bao:BAO_0000208 [ a mmdbvoc:LogPerm;
bao:BAO_0095007 ?srcLogPerm]
] .
?mmdbTgt sio:SIO_000008 [a sio:CHEMINF_000059;
sio:SIO_000300 ?ikeyTgt];
^bao:BAO_0090012 [bao:BAO_0090012 ?membrane ;
bao:BAO_0000212 ?method ;
bao:BAO_0000208 [ a mmdbvoc:LogPerm ;
bao:BAO_0095007 ?tgtLogPerm]
] .
?membrane rdfs:label ?membraneName .
?method rdfs:label ?methodName .
}
}
```
* [Denise] MetaNetX-WP (WP endpoint):
```SPARQL=
#Prefixes required which might not be available in the SPARQL endpoint by default
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
#Prefixes for the MetaNetX RDF:
PREFIX mnx: <https://rdf.metanetx.org/schema/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rhea: <http://rdf.rhea-db.org/>
#Variable selection
SELECT DISTINCT (str(?title) as ?pathwayName) ?PWID ?interactionID ?reac
WHERE {
#Pathway Model IDs of interest
VALUES ?PWID {"WP5275"}
?pathway a wp:Pathway . #Define what a pathway is
?pathway dcterms:identifier ?PWID. #Obtain the ID
?pathway dc:title ?title . #Obtain the title
?interaction wp:bdbRhea ?interactionID . #Find interactions with a Rhea ID
?interaction dcterms:isPartOf ?pathway . #Only those part of PW
##The IRI for Rhea IDs from WikiPathways starts with "https://identifiers.org/rhea/", whereas the one from MetaNetX starts with "http://rdf.rhea-db.org/", so we need to rewrite the IRI
BIND( # Bind the created IRI into a new variable (called ?newIRI)
IRI( # Convert the string back to an IRI
CONCAT( # Concatenate item 1 and 2 together as one string
"http://rdf.rhea-db.org/", # First item to concat (more items can be added with a comma)
#Second item to concat:
SUBSTR( # Obtain a substring
STR(?interactionID), # Convert the Rhea IRI from WikiPathways to a string,
30) # removing the first 29 characters
)) AS ?newIRI # Name for the new variable
)
SERVICE <https://rdf.metanetx.org/sparql/> {
SELECT DISTINCT ?reac
WHERE{
?reac mnx:reacXref rhea:17658 .}
}
} ORDER BY ASC(?pathway)
```
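The IRI rewriting done with `BIND`/`CONCAT`/`SUBSTR` in the queries of this section can be checked outside SPARQL. The small Python sketch below mirrors that logic and confirms the offsets used (SPARQL `SUBSTR` is 1-based, so skipping a 29-character prefix means starting at position 30). The helper name is hypothetical, and the WikiPathways ChEBI IRI form is an assumption inferred from the `substr(..., 37)` calls.

```python
# Prefixes as used in the queries of this section.
WP_RHEA_PREFIX = "https://identifiers.org/rhea/"
MNX_RHEA_PREFIX = "http://rdf.rhea-db.org/"
# Assumed WikiPathways ChEBI IRI form, inferred from substr(str(?chebiSrc), 37).
WP_CHEBI_PREFIX = "https://identifiers.org/chebi/CHEBI:"
OBO_CHEBI_PREFIX = "http://purl.obolibrary.org/obo/CHEBI_"


def rewrite_iri(iri, old_prefix, new_prefix):
    """Replace old_prefix by new_prefix, like IRI(CONCAT(new, SUBSTR(STR(iri), n)))."""
    if not iri.startswith(old_prefix):
        raise ValueError(f"unexpected IRI: {iri}")
    return new_prefix + iri[len(old_prefix):]


def sparql_substr_start(prefix):
    """1-based start position to pass to SPARQL SUBSTR in order to skip the prefix."""
    return len(prefix) + 1


if __name__ == "__main__":
    print(sparql_substr_start(WP_RHEA_PREFIX))   # offset used in the MetaNetX query
    print(sparql_substr_start(WP_CHEBI_PREFIX))  # offset used in the ChEBI rewrites
    print(rewrite_iri("https://identifiers.org/rhea/17658",
                      WP_RHEA_PREFIX, MNX_RHEA_PREFIX))
```

Deriving the offset from the prefix length avoids silent breakage if the source IRI scheme changes, for example from `https` to `http`.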
* [Dominik, Egon, Denise] WP-IDSM-Rhea (IDSM-endpoint):
```SPARQL=
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ebi: <http://ebi.rdf.ac.uk/dataset/>
PREFIX sachem: <http://bioinfo.uochb.cas.cz/rdf/v1.0/sachem#>
PREFIX idsm: <https://idsm.elixir-czech.cz/sparql/endpoint/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX sso: <http://semanticscience.org/resource/>
PREFIX rh:<http://rdf.rhea-db.org/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT distinct ?chebioSrc ?similarSrc ?chebioTgt ?similarTgt ?reaction WHERE {
#SELECT distinct ?chebio ?score ?similar ?reaction WHERE {
# WikiPathways service
SERVICE <https://sparql.wikipathways.org/sparql/> {
#SELECT ?chebioSrc ?chebioTgt WHERE{
?interaction dcterms:isPartOf ?pathway ; a wp:Conversion ;
wp:source ?source ;
wp:target ?target .
?source wp:bdbChEBI ?chebiSrc .
?target wp:bdbChEBI ?chebiTgt .
?pathway dcterms:identifier "WP4225".
BIND(iri(concat("http://purl.obolibrary.org/obo/CHEBI_", substr(str(?chebiSrc),37))) AS ?chebioSrc)
BIND(iri(concat("http://purl.obolibrary.org/obo/CHEBI_", substr(str(?chebiTgt),37))) AS ?chebioTgt)
#} LIMIT 10
}
?chebioSrc ^sso:is-attribute-of / sso:has-value ?molfileSrc .
?chebioTgt ^sso:is-attribute-of / sso:has-value ?molfileTgt .
[ sachem:compound ?similarSrc; sachem:score ?scoreSrc ]
sachem:similaritySearch [
sachem:query ?molfileSrc ;
sachem:cutoff "0.98"^^xsd:double
].
[ sachem:compound ?similarTgt; sachem:score ?scoreTgt ]
sachem:similaritySearch [
sachem:query ?molfileTgt ;
sachem:cutoff "0.98"^^xsd:double
].
SERVICE <https://sparql.rhea-db.org/sparql> {
?reaction rh:side / rh:contains / rh:compound / rh:chebi ?similarSrc , ?similarTgt .
?reaction rdfs:subClassOf rh:Reaction .
}
}
```
* [Anne, Denise]: WP to Rhea (from WP endpoint):
```SPARQL=
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX vg: <http://biohackathon.org/resource/vg#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
PREFIX uberon: <http://purl.obolibrary.org/obo/uo#>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX sp: <http://spinrdf.org/sp#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX schema: <http://schema.org/>
PREFIX sachem: <http://bioinfo.uochb.cas.cz/rdf/v1.0/sachem#>
PREFIX rh: <http://rdf.rhea-db.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX pubmed: <http://rdf.ncbi.nlm.nih.gov/pubmed/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
PREFIX patent: <http://data.epo.org/linked-data/def/patent/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX orthodbGroup: <http://purl.orthodb.org/odbgroup/>
PREFIX orthodb: <http://purl.orthodb.org/>
PREFIX orth: <http://purl.org/net/orth#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX np: <http://nextprot.org/rdf#>
PREFIX nextprot: <http://nextprot.org/rdf/entry/>
PREFIX mnx: <https://rdf.metanetx.org/schema/>
PREFIX mnet: <https://rdf.metanetx.org/mnet/>
PREFIX mesh: <http://id.nlm.nih.gov/mesh/>
PREFIX lscr: <http://purl.org/lscr#>
PREFIX lipidmaps: <https://www.lipidmaps.org/rdf/>
PREFIX keywords: <http://purl.uniprot.org/keywords/>
PREFIX insdcschema: <http://ddbj.nig.ac.jp/ontologies/nucleotide/>
PREFIX insdc: <http://identifiers.org/insdc/>
PREFIX identifiers: <http://identifiers.org/>
PREFIX glyconnect: <https://purl.org/glyconnect/>
PREFIX glycan: <http://purl.jp/bio/12/glyco/glycan#>
PREFIX genex: <http://purl.org/genex#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX faldo: <http://biohackathon.org/resource/faldo#>
PREFIX eunisSpecies: <http://eunis.eea.europa.eu/rdf/species-schema.rdf#>
PREFIX ensembltranscript: <http://rdf.ebi.ac.uk/resource/ensembl.transcript/>
PREFIX ensemblterms: <http://rdf.ebi.ac.uk/terms/ensembl/>
PREFIX ensemblprotein: <http://rdf.ebi.ac.uk/resource/ensembl.protein/>
PREFIX ensemblexon: <http://rdf.ebi.ac.uk/resource/ensembl.exon/>
PREFIX ensembl: <http://rdf.ebi.ac.uk/resource/ensembl/>
PREFIX ec: <http://purl.uniprot.org/enzyme/>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX chebislash: <http://purl.obolibrary.org/obo/chebi/>
PREFIX chebihash: <http://purl.obolibrary.org/obo/chebi#>
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX allie: <http://allie.dbcls.jp/>
PREFIX SLM: <https://swisslipids.org/rdf/SLM_>
PREFIX GO: <http://purl.obolibrary.org/obo/GO_>
PREFIX ECO: <http://purl.obolibrary.org/obo/ECO_>
PREFIX CHEBI: <http://purl.obolibrary.org/obo/CHEBI_>
# Select all Rhea reactions that have a given pair of ChEBI IDs as participants on opposite sides.
# Here: return Rhea reactions that have CHEBI:29985 (L-glutamate) as a participant on one side
# and CHEBI:58359 (L-glutamine) on the other side.
# Include the UniProt enzyme ID as an additional check.
SELECT ?uniprot ?chebi1 ?name1 ?chebi2 ?name2 ?rhea ?equation
WHERE {
VALUES (?chebi1) { (CHEBI:29985) }
?chebi1 up:name ?name1 .
?rhea rh:side ?reactionSide1 .
?reactionSide1 rh:contains / rh:compound / rh:chebi ?chebi1 .
VALUES (?chebi2) { (CHEBI:58359) }
?chebi2 up:name ?name2 .
?rhea rh:side ?reactionSide2 .
?reactionSide2 rh:contains / rh:compound / rh:chebi ?chebi2 .
?reactionSide1 rh:transformableTo ?reactionSide2 .
?rhea rh:equation ?equation .
SERVICE <https://sparql.uniprot.org/sparql> {
VALUES (?uniprot) { (uniprotkb:P05041) } # Optional: restrict to specific UniProt entries
?uniprot up:annotation/up:catalyticActivity/up:catalyzedReaction ?rhea .
# If only one value is needed, the VALUES clause and the pattern above can be replaced by:
# uniprotkb:P05041 up:annotation/up:catalyticActivity/up:catalyzedReaction ?rhea .
}
}
```
* [Dominik, Egon, Denise] WP-IDSM-MolMeDB (IDSM-endpoint):
```SPARQL=
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX bao: <http://www.bioassayontology.org/bao#>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX chebi: <http://purl.obolibrary.org/obo/chebi/>
PREFIX sachem: <http://bioinfo.uochb.cas.cz/rdf/v1.0/sachem#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX sso: <http://semanticscience.org/resource/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX mmdbvoc: <https://rdf.molmedb.upol.cz/vocabulary#>
SELECT distinct ?chebioSrc ?similarSrc ?ikeySrc ?srcLogPerm
?chebioTgt ?similarTgt ?ikeyTgt ?tgtLogPerm
?membraneName ?methodName WHERE {
# WikiPathways service
SERVICE <https://sparql.wikipathways.org/sparql/> {
?interaction dcterms:isPartOf ?pathway ; a wp:Conversion ;
wp:source ?source ;
wp:target ?target .
?source wp:bdbChEBI ?chebiSrc .
?target wp:bdbChEBI ?chebiTgt .
?pathway dcterms:identifier "WP4225".
BIND(iri(concat("http://purl.obolibrary.org/obo/CHEBI_", substr(str(?chebiSrc),37))) AS ?chebioSrc)
BIND(iri(concat("http://purl.obolibrary.org/obo/CHEBI_", substr(str(?chebiTgt),37))) AS ?chebioTgt)
}
#IDSM CHEBI - find similar molecules in ChEBI
SERVICE <https://idsm.elixir-czech.cz/sparql/endpoint/chebi>
{
?chebioSrc ^sso:is-attribute-of / sso:has-value ?molfileSrc .
?chebioTgt ^sso:is-attribute-of / sso:has-value ?molfileTgt .
[ sachem:compound ?similarSrc; sachem:score ?scoreSrc ]
sachem:similaritySearch [
sachem:query ?molfileSrc ;
sachem:cutoff "0.98"^^xsd:double
].
[ sachem:compound ?similarTgt; sachem:score ?scoreTgt ]
sachem:similaritySearch [
sachem:query ?molfileTgt ;
sachem:cutoff "0.98"^^xsd:double
].
}
#IDSM - find InChIKeys of similar source and target
SERVICE <https://idsm.elixir-czech.cz/sparql/endpoint/idsm>
{
?similarSrc chebi:inchikey ?ikeySrc .
?similarTgt chebi:inchikey ?ikeyTgt .
}
#find pairs in MolMeDB with data on interaction with same membrane by same method and their permeabilities
?mmdbSrc sio:SIO_000008 [a sio:CHEMINF_000059;
sio:SIO_000300 ?ikeySrc] ;
^bao:BAO_0090012 [bao:BAO_0090012 ?membrane ;
bao:BAO_0000212 ?method ;
bao:BAO_0000208 [ a mmdbvoc:LogPerm;
bao:BAO_0095007 ?srcLogPerm]
] .
?mmdbTgt sio:SIO_000008 [a sio:CHEMINF_000059;
sio:SIO_000300 ?ikeyTgt];
^bao:BAO_0090012 [bao:BAO_0090012 ?membrane ;
bao:BAO_0000212 ?method ;
bao:BAO_0000208 [ a mmdbvoc:LogPerm ;
bao:BAO_0095007 ?tgtLogPerm]
] .
?membrane rdfs:label ?membraneName .
?method rdfs:label ?methodName .
}
```
* [Dominik, Egon, Denise] WP-IDSM-MolMeDB (WP endpoint): identical to the MolMeDB-WP query listed first in this section.
## Feasibility of metabolic reactions
[Anne, Denise]: Discussion on how to select relevant Rhea IDs for reactions in WPs, specifically the direction of the reaction. Example:
A+B -> C+D ; main reaction: A->D, side metabolites: B, C. The other direction requires another side metabolite, e.g. A+B <- C+D+E, and will have a different Rhea ID. Ideas to potentially resolve this:
* Include Substrate, Product, and Protein in query. See #L291
* Atom-to-Atom mapping relevant [paper](https://jcheminf.biomedcentral.com/articles/10.1186/s13321-017-0223-1)
* Thermodynamic direction detection algorithm [paper 1](https://doi.org/10.1186/1471-2105-7-512), [paper 2](https://doi.org/10.1038/srep07022)
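The first idea (including substrate, product, and protein in the query) can be illustrated with a toy in-memory model. The sketch below treats every directed reaction as a separate record, as in the A+B -> C+D example above; the reaction IDs `R1`/`R2`, the enzyme ID `P1`, and the `Reaction` class are hypothetical and not taken from Rhea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """A directed reaction: participants on the left are converted to the right."""
    reaction_id: str
    left: frozenset
    right: frozenset
    enzymes: frozenset = frozenset()


# Toy data mirroring the example: the reverse direction needs the extra metabolite E.
REACTIONS = [
    Reaction("R1", frozenset("AB"), frozenset("CD"), frozenset({"P1"})),
    Reaction("R2", frozenset("CDE"), frozenset("AB"), frozenset({"P1"})),
]


def matching_reactions(reactions, substrate, product, enzyme=None):
    """IDs of reactions with substrate on the left, product on the right,
    and (optionally) catalysed by the given enzyme."""
    hits = []
    for reaction in reactions:
        right_direction = substrate in reaction.left and product in reaction.right
        enzyme_ok = enzyme is None or enzyme in reaction.enzymes
        if right_direction and enzyme_ok:
            hits.append(reaction.reaction_id)
    return hits


if __name__ == "__main__":
    print(matching_reactions(REACTIONS, "A", "D"))               # main reaction A -> D
    print(matching_reactions(REACTIONS, "D", "A"))               # reverse direction
    print(matching_reactions(REACTIONS, "A", "D", enzyme="P2"))  # enzyme check fails
```

Side metabolites (B, C, E) are deliberately ignored when matching, so the main substrate-product pair alone selects the direction.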
## Please add more if needed...