# Associations in Biolink Model

Biolink Model makes use of LinkML and has 5 types of entities defined in the model:
- classes
- associations
- relations
- node properties
- edge properties (or association properties)

## Property Graph Formalism

In a property graph (like Neo4j or networkx) there are nodes and edges. Nodes have node properties and edges have edge properties.

If we were to map the Biolink Model onto a property graph formalism then:
- classes correspond to nodes in the graph (with appropriate node properties)
- associations correspond to edges (where each edge has a `subject`, `predicate`, `object` property; and any additional edge properties)
- relations are used in the `predicate` property of edges

### Graph

If we take the example of a simple Disease to Phenotypic Feature Association and express it as a property graph,

**Nodes:**

```json
{
  "nodes": [
    {
      "id": "MONDO:0005737",
      "name": "Ebola hemorrhagic fever",
      "category": "biolink:Disease"
    },
    {
      "id": "HP:0001892",
      "name": "Abnormal Bleeding",
      "category": "biolink:PhenotypicFeature"
    },
    {
      "id": "ORCID:123-321233-32432-54353",
      "name": "Robinson, P.",
      "category": "biolink:Author"
    },
    {
      "id": "PMID:12345",
      "date": "2019-05-12",
      "authors": [
        "ORCID:123-321233-32432-54353"
      ],
      "category": "biolink:Publication"
    }
  ]
}
```

Each node is typed using an appropriate class from the Biolink Model. The `category` field should have a value from the `biolink:NamedThing` hierarchy.

**Edges:**

```json
{
  "edges": [
    {
      "id": "my_association_1",
      "subject": "MONDO:0005737",
      "predicate": "biolink:has_phenotype",
      "object": "HP:0001892",
      "publications": [
        "PMID:12345"
      ],
      "category": "biolink:DiseaseToPhenotypicFeatureAssociation"
    }
  ]
}
```

The edge is typed as `biolink:DiseaseToPhenotypicFeatureAssociation` via the `category` property. The `category` field should have a value from the `biolink:Association` hierarchy.

The edge also has a `predicate` field which should have a value from the `biolink:related_to` hierarchy.
In this case it is the `biolink:has_phenotype` relation.

## RDF Formalism

In an RDF graph there are only triples.

If we were to map the Biolink Model onto an RDF graph formalism then:
- classes correspond to the nodes in the graph
- node properties correspond to predicates that are used in triples that describe nodes
- relations correspond to the predicates used in triples

But one caveat of RDF is that you cannot describe a triple further.

Taking the previous example and expressing it in RDF,

**Nodes:**

```rdf
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix biolink: <https://w3id.org/biolink/vocab/> .
@prefix MONDO: <http://purl.obolibrary.org/obo/MONDO_> .
@prefix HP: <http://purl.obolibrary.org/obo/HP_> .
@prefix ORCID: <https://orcid.org/> .
@prefix PMID: <http://www.ncbi.nlm.nih.gov/pubmed/> .

MONDO:0005737 biolink:id "MONDO:0005737" .
MONDO:0005737 biolink:name "Ebola hemorrhagic fever" .
MONDO:0005737 biolink:category biolink:Disease .

HP:0001892 biolink:id "HP:0001892" .
HP:0001892 biolink:name "Abnormal Bleeding" .
HP:0001892 biolink:category biolink:PhenotypicFeature .

ORCID:123-321233-32432-54353 biolink:id "ORCID:123-321233-32432-54353" .
ORCID:123-321233-32432-54353 biolink:name "Robinson, P." .
ORCID:123-321233-32432-54353 biolink:category biolink:Author .

PMID:12345 biolink:id "PMID:12345" .
PMID:12345 biolink:date "2019-05-12" .
PMID:12345 biolink:authors ORCID:123-321233-32432-54353 .
PMID:12345 biolink:category biolink:Publication .
```

**Edges:**

```rdf
MONDO:0005737 biolink:has_phenotype HP:0001892 .
```

As you can see, it is not possible to attach additional metadata (for example, the publication information) to this triple. This is because in RDF everything (or every triple) is basically an edge. It is not possible to describe a triple further (assuming the RDF 1.1 specification).

To overcome this limitation, RDF has the concept of **Reification**, which is the projection of a triple onto a node such that you can attach more metadata to that node.
In Biolink Model we apply the concept of Reification to edges, which is why we have Associations (i.e. edges being treated as classes/nodes).

With Reification the previous edge example can be expressed as:

```rdf
<https://example.org/my_association_1> biolink:id "my_association_1" .
<https://example.org/my_association_1> rdf:subject MONDO:0005737 .
<https://example.org/my_association_1> rdf:predicate biolink:has_phenotype .
<https://example.org/my_association_1> rdf:object HP:0001892 .
<https://example.org/my_association_1> biolink:publications PMID:12345 .
```

The process of reification is unnecessary in property graphs but required in RDF graphs.

As of 2021, it is possible to add metadata to triples directly using RDF-star (https://w3c.github.io/rdf-star/cg-spec/editors_draft.html). As a result, Reification is becoming less common.

The concept of 'Association' is basically a combination of the nature of the subject node, the nature of the object node, and the relationship that links the two together (i.e. the edge).

In property graphs we can describe an edge natively. Thus we type the edge as an Association (via the `category` field) so that the nature of the edge is explicitly defined. We do not need to create a node to represent the Association.

Whether or not you create a node for an Association depends on the graph technology you are using:
- **Property Graphs:** If you are using a property graph (Neo4j, networkx) then you can treat Associations as edges directly
- **Hypergraphs:** If you are using a hypergraph (TypeDB) then you can treat Associations as edges directly
- **RDF Graphs:** If you are using an RDF graph (GraphDB, Apache Jena) then you have to treat Associations as nodes (for reasons described previously)
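
For the RDF case, the reification step described above can be sketched in plain Python. This is only an illustration, not part of Biolink Model or any toolkit: the `reify` and `expand` helpers, the `PREFIXES` map (including the `PMID` namespace), and the `https://example.org/` base IRI are all assumptions made for the example. It takes the property-graph edge from earlier and projects it onto a node, emitting one triple per edge property, with each property key used as the local name of a `biolink:` predicate.

```python
# Minimal sketch of reification: turn a property-graph edge into triples
# about an Association node. All names below are hypothetical helpers.

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BIOLINK_NS = "https://w3id.org/biolink/vocab/"

# Assumed CURIE prefix map for this example only.
PREFIXES = {
    "biolink": BIOLINK_NS,
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
    "HP": "http://purl.obolibrary.org/obo/HP_",
    "PMID": "http://www.ncbi.nlm.nih.gov/pubmed/",
}

# Properties that map onto the id and rdf:subject/predicate/object triples.
CORE_KEYS = ("id", "subject", "predicate", "object")


def expand(curie):
    """Expand a CURIE such as 'HP:0001892' into a full IRI string."""
    prefix, local_id = curie.split(":", 1)
    return PREFIXES[prefix] + local_id


def reify(edge, base="https://example.org/"):
    """Return (subject, predicate, object) string triples describing the edge as a node."""
    node = base + edge["id"]
    triples = [
        (node, BIOLINK_NS + "id", edge["id"]),  # the object here is a literal
        (node, RDF_NS + "subject", expand(edge["subject"])),
        (node, RDF_NS + "predicate", expand(edge["predicate"])),
        (node, RDF_NS + "object", expand(edge["object"])),
    ]
    # Every remaining edge property becomes extra metadata on the node.
    for key, value in edge.items():
        if key in CORE_KEYS:
            continue
        values = value if isinstance(value, list) else [value]
        for item in values:
            triples.append((node, BIOLINK_NS + key, expand(item)))
    return triples


edge = {
    "id": "my_association_1",
    "subject": "MONDO:0005737",
    "predicate": "biolink:has_phenotype",
    "object": "HP:0001892",
    "publications": ["PMID:12345"],
    "category": "biolink:DiseaseToPhenotypicFeatureAssociation",
}

for s, p, o in reify(edge):
    print(s, p, o)
```

The sketch assumes every additional edge property holds CURIE values; literal-valued properties would need their own handling. In a property graph none of this is needed, because the edge carries its properties directly.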