# RDF 1.2 semantic datasets
## Rationale
The RDF 1.1 WG did not standardize any semantics for named graphs, because there were already too many different practices with SPARQL 1.1 datasets. But I think that we (I was part of that WG) missed an opportunity to go a little further.
Indeed, there is a subtle difference between SPARQL 1.1 datasets and RDF 1.1 datasets: the former only allow IRIs as graph names, while the latter also allow blank nodes. If I remember correctly, blank nodes were allowed as graph names to accommodate some uses that were already deployed in JSON-LD.
:::info
On further inquiry, there is evidence that SPARQL endpoints still use IRI-named graphs exclusively.
See [this query](http://prod-dekalog.inria.fr/sparql?default-graph-uri=http%3A%2F%2Fns.inria.fr%2Findegx&query=PREFIX+sd%3A+%3Chttp%3A%2F%2Fwww.w3.org%2Fns%2Fsparql-service-description%23%3E%0D%0ASELECT+%28COUNT%28DISTINCT+%3Fign%29+AS+%3Firi_named%29+%28COUNT%28DISTINCT+%3Fbgn%29+AS+%3Fbnode_named%29+%7B%0D%0A++++++%7B+GRAPH+%3Fg+%7B+%3Fdesc+sd%3Aname+%3Fign+%7D+FILTER%28isIri%28%3Fign%29%29+%7D%0D%0A++++++UNION%0D%0A++++++%7B+GRAPH+%3Fg+%7B+%3Fdesc+sd%3Aname+%3Fbgn+%7D+FILTER%28isBlank%28%3Fbgn%29%29+%7D%0D%0A++++%7D+&format=text%2Fhtml&should-sponge=&timeout=0&signal_void=on) on an index of 200 to 300 SPARQL endpoints.
:::
I would argue that, contrary to SPARQL's IRI-named graphs, bnode-named graphs are used in JSON-LD in a quite consistent way, and that in those cases the bnode is supposed to denote the graph itself (e.g. in Verifiable Credentials).
As such, bnode-named graphs are very similar to N3's graph-terms. By the way, Gregg Kellogg's implementations of RDF-star and N3 are consistent with this interpretation:
```n3
# this is N3
<tag:a> <tag:b> { <tag:c> <tag:d> <tag:e> }.
```
is serialized to/from
```n-quads
# this is N-quads
<tag:c> <tag:d> <tag:e> _:_form_0 .
<tag:a> <tag:b> _:_form_0 .
```
## General idea
Define "semantic datasets" as a profile of dataset concrete syntaxes that would endorse the interpretation hinted at above. More specifically, the constraints of this profile would be:
- all graph names are blank nodes
- all graphs (default and named) are RDF 1.2 classic (no quoted triples)
- no self-reference: a named graph cannot (directly or indirectly) contain the blank node that is its name (this requires a more formal definition, but I hope it is clear enough)
:::info
**Why not allow quoted triples there?**
Having both quoted triples and singleton graph-terms would seem redundant in this context. Note that in concrete syntax, we could still use the double pointy brackets (even inside the graph-terms). They would simply be considered as syntactic sugar for a singleton graph-term.
:::
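The three constraints of the profile can be sketched as a small checker. This is only an illustration under stated assumptions: the in-memory model (terms as strings, blank nodes prefixed with `_:`, quoted triples as nested tuples) and the function name `violations` are hypothetical, not taken from any RDF library, and the self-reference test is just one possible reading of the informal constraint.

```python
# Hypothetical in-memory model (an assumption of this sketch):
#   - a term is a str; blank nodes start with "_:"
#   - a quoted triple is a 3-tuple used in subject or object position
#   - a graph is a set of 3-tuples
#   - named_graphs maps each graph name to its graph


def is_bnode(term):
    return isinstance(term, str) and term.startswith("_:")


def violations(default_graph, named_graphs):
    """Return a list of profile violations (empty if the dataset complies)."""
    problems = []

    # Constraint 1: all graph names are blank nodes.
    for name in named_graphs:
        if not is_bnode(name):
            problems.append(f"graph name {name} is not a blank node")

    # Constraint 2: no quoted triples in any graph (default or named).
    all_graphs = [("default", default_graph)] + list(named_graphs.items())
    for name, graph in all_graphs:
        if any(isinstance(t, tuple) for triple in graph for t in triple):
            problems.append(f"graph {name} contains a quoted triple")

    # Constraint 3: no direct or indirect self-reference.
    # A named graph "mentions" another one when it uses its name as a term.
    def mentioned(name):
        return {
            t
            for triple in named_graphs[name]
            for t in triple
            if isinstance(t, str) and t in named_graphs
        }

    def reaches_itself(start):
        seen, stack = set(), list(mentioned(start))
        while stack:
            current = stack.pop()
            if current == start:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(mentioned(current))
        return False

    for name in named_graphs:
        if reaches_itself(name):
            problems.append(f"graph {name} refers to itself")

    return problems
```

For instance, a dataset whose only named graph `_:g` contains the triple `(tag:c, tag:d, _:g)` would be reported as self-referential.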
## Proposal 1 (straw man)
We keep the abstract syntax as it is now (i.e. with quoted triples and no notion of graph-term). As for RDF 1.1, we only define a semantics for graphs in this abstract syntax.
We propose (in a note, or a new REC after rechartering?) an alternative abstract syntax including graph-terms, with a semantics inspired by N3. We explain how a dataset concrete syntax *with* the "semantic dataset" profile can alternatively be parsed into this abstract syntax.
We could then
* define a mapping between the two abstract syntaxes
(where quoted triples would become singleton graph-terms)
* prove that the mapping is inference preserving
PROS: conservative change: we keep RDF 1.2 close to CG-RDF-star
CONS: proving that the mapping preserves inferences might be tricky
CONS: how semantic extensions on graphs (e.g. RDFS) apply to semantic datasets cannot be automatically deduced from these definitions. So for each semantic extension, we would need to do the same work as with simple entailment (define it on both abstract syntaxes, and prove that the mapping is inference preserving between these two semantics). :fearful:
## Proposal 2
We bite the bullet and define the abstract syntax with graph-terms instead of quoted triples. However, we restrict RDF 1.2 graphs to have only singleton graph-terms (same for datasets). As for RDF 1.1, we only define the semantics for graphs.
Overall, RDF 1.2 will be very similar to RDF-star, but we can define the semantics on the full abstract syntax (without the "singleton graph" restriction). That way, the semantics is future-proof.
Any TriG (or n-quads, or JSON-LD) complying with the "semantic dataset" profile could be alternatively parsed as
- a standard RDF 1.2 dataset (with no semantics)
- an "extended" RDF 1.2 graph with non-singleton graph-terms
(using the same mapping as in prop 1)
In the second case, the semantics would naturally apply to this extended graph, which would enable reasoning on semantic datasets.
## Proposal 3 (straw man)
NB: this proposal does not rely on a profile for dataset concrete-syntaxes.
We bite the bullet and we define the abstract syntax with graph-terms instead of quoted triples. We define 3 profiles of RDF 1.2: full, star (only singleton graph-terms), basic (no graph-terms). In RDF 1.2 datasets, graph names can only be IRIs (bnode graph names are still allowed in *concrete* syntaxes, see below).
Graph concrete syntaxes (N-Triples, Turtle, RDF/XML) are restricted to producing RDF 1.2 star (or basic) graphs (although this may change in RDF 1.3 or later). Bnode-named graphs are authorized in dataset concrete syntaxes (N-Quads, TriG, JSON-LD), but they are a syntactic workaround to express arbitrary graph-terms (RDF 1.2 full). The double pointy brackets are of course still usable to express singleton graph-terms.
In that semantics, the following three TriG files produce the same abstract syntax:
```trig
@prefix : <tag:>.
:a :b << :c :d :e >>.
<< :c :d :e >> :f :g.
```
```trig
@prefix : <tag:>.
:a :b _:qt.
_:qt :f :g.
GRAPH _:qt { :c :d :e }
```
```trig
@prefix : <tag:>.
:a :b _:gt1.
_:gt2 :f :g.
GRAPH _:gt1 { :c :d :e }
GRAPH _:gt2 { :c :d :e }
```
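To make the claimed equivalence concrete, here is a minimal sketch of the substitution that turns bnode graph names into graph-terms. Everything in it is an assumption made for illustration: terms are plain strings, a graph-term is modelled as a `frozenset` of triples, and `to_graph_terms` is a hypothetical helper name. The bnode labels in the sample data are chosen for the sketch, and a named graph that is never referenced is simply dropped.

```python
def to_graph_terms(default_graph, named_graphs):
    """Replace each graph name used as a term by the graph-term it denotes.

    A graph-term is modelled as a frozenset of triples (assumption).
    """

    def resolve(term, in_progress):
        if term not in named_graphs:
            return term
        if term in in_progress:
            # The sketch refuses self-referential graph-terms.
            raise ValueError(f"self-referential graph-term: {term}")
        return resolve_graph(named_graphs[term], in_progress | {term})

    def resolve_graph(graph, in_progress):
        return frozenset(
            tuple(resolve(t, in_progress) for t in triple) for triple in graph
        )

    return resolve_graph(default_graph, frozenset())


C_D_E = ("tag:c", "tag:d", "tag:e")

# First file: the quoted triple read as a singleton graph-term.
first = frozenset({
    ("tag:a", "tag:b", frozenset({C_D_E})),
    (frozenset({C_D_E}), "tag:f", "tag:g"),
})

# Second file: one bnode-named graph, referenced twice.
second = to_graph_terms(
    {("tag:a", "tag:b", "_:qt"), ("_:qt", "tag:f", "tag:g")},
    {"_:qt": {C_D_E}},
)

# Third file: two bnode-named graphs with the same content.
third = to_graph_terms(
    {("tag:a", "tag:b", "_:gt1"), ("_:gt2", "tag:f", "tag:g")},
    {"_:gt1": {C_D_E}, "_:gt2": {C_D_E}},
)

assert first == second == third
```

Because graph-terms are compared by content in this model, the distinct labels of the third file collapse into the same term, which is the "type" rather than "token" reading of graph-terms.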
We could extend graph semantics to "datasets that have only a default graph", so that we could say things like "this TriG file entails that TriG file" without being sloppy.
:::warning
What to do with the following TriG snippets, then? They would have no counterpart in the abstract syntax.
```trig
@prefix : <tag:>.
GRAPH _:gt1 { :c :d :e }
# "floating" graph-term
```
```trig
@prefix : <tag:>.
:a :b _:gt1.
GRAPH _:gt1 { :c :d _:gt1 }
# self-referential graph-term
```
Because of this breaking change, I consider that this proposal is too radical. That's why I tagged it as "straw man".
:::
## Proposal 2bis (hand wavy)
Proposal 2 includes graph-terms in the abstract syntax *and* the semantics. Maybe we can keep the abstract syntax as it is *and still* define on top of it a semantics that is "graph-term ready". (After all, the abstract syntax does not allow literals in the subject position, yet the semantics still "supports" them.)
If we did that, then the semantics of RDF 1.2 semantic datasets could be a "natural extrapolation" of the RDF 1.2 semantics for graphs, and semantic extensions on graphs would "propagate" to semantic datasets smoothly.
Rough idea of how this could work:
Each interpretation has a mapping IQ that maps to arbitrary graphs, and which is constrained to map quoted triples to the corresponding singleton graph (but would be allowed to map other terms).
Interpretations of semantic datasets would extend the constraint to force every bnode used as a graph name to map, via IQ, to the corresponding graph (not restricted to singletons).
## Aside: canonicalization
Whichever proposal we use, the idea of mapping quoted triples to singleton bnode-named graphs seems like a good way to canonicalize RDF 1.2 using RDFC-1.0 (RDF Dataset Canonicalization).
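A rough sketch of that mapping, under assumptions of my own: quoted triples are modelled as nested tuples, every distinct quoted triple gets exactly one fresh bnode, and the generated labels (`_:qt0`, `_:qt1`, ...) are assumed not to clash with bnodes already present in the input. The function name `quoted_to_named_graphs` is hypothetical, and the canonicalization step itself is not shown.

```python
def quoted_to_named_graphs(graph):
    """Turn a graph with quoted triples into (default_graph, named_graphs).

    Each distinct quoted triple becomes a singleton graph named by a fresh
    bnode; nested quoted triples are converted recursively.
    """
    named_graphs = {}  # bnode label -> singleton graph
    labels = {}        # quoted triple -> bnode label

    def convert_term(term):
        if not isinstance(term, tuple):
            return term
        if term not in labels:
            inner = convert_triple(term)  # handles nesting first
            label = f"_:qt{len(labels)}"
            labels[term] = label
            named_graphs[label] = {inner}
        return labels[term]

    def convert_triple(triple):
        return tuple(convert_term(t) for t in triple)

    default_graph = {convert_triple(triple) for triple in graph}
    return default_graph, named_graphs


quoted = ("tag:c", "tag:d", "tag:e")
default_graph, named_graphs = quoted_to_named_graphs({
    ("tag:a", "tag:b", quoted),
    (quoted, "tag:f", "tag:g"),
})
```

The resulting dataset contains no quoted triples, so it could in principle be handed to a canonicalization algorithm defined on RDF 1.1 datasets.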
## Proposal 4 (added 2023-09-21) - WIP
We bite the bullet and define the abstract syntax with graph-terms instead of quoted triples. We define 3 profiles of RDF 1.2: Full, Star (only singleton graph-terms), Basic (no graph-terms). We define RDF 1.2 datasets the same way they are defined in RDF 1.1 (graph names can be IRIs or blank nodes).
Graph concrete syntaxes (n-triples, turtle, RDF/XML) are restricted to producing RDF 1.2 star (or basic) graphs (although this may change in RDF 1.3 or later).
Only RDF 1.2 graphs have a specified semantics, RDF 1.2 datasets don't. However, we have a simple way to transform any RDF 1.2 dataset D into an RDF 1.2 Full graph G:
- initialize G with the default graph of D
- decide on a specific predicate P
- for each named graph (N, H), add a triple (N, P, T) to G, where T is the graph-term corresponding to H
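The transformation can be sketched in a few lines. This assumes that each named graph contributes a triple whose subject is the graph name, whose predicate is P, and whose object is a graph-term built from the named graph; the graph-term is modelled as a `frozenset` of triples, and `dataset_to_full_graph` is a hypothetical name used only for illustration.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def dataset_to_full_graph(default_graph, named_graphs, predicate=OWL_SAME_AS):
    """Build an RDF 1.2 Full graph G from a dataset D (sketch).

    default_graph: set of triples
    named_graphs:  dict mapping a graph name N to its graph H
    predicate:     the predicate P linking N to the graph-term for H
    """
    # Initialize G with the default graph of D.
    full_graph = set(default_graph)
    # For each named graph (N, H), link N to the graph-term for H via P.
    for name, graph in named_graphs.items():
        full_graph.add((name, predicate, frozenset(graph)))
    return full_graph


full = dataset_to_full_graph(
    {("tag:a", "tag:b", "_:g")},
    {"_:g": {("tag:c", "tag:d", "tag:e")}},
)
```

Passing a different `predicate` leaves the default graph untouched and only changes how graph names relate to their graph-terms.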
The predicate P intended for a specific file could be indicated by a content-type parameter (e.g. `application/trig; graph_link=http://www.w3.org/2002/07/owl#sameAs`), or possibly by defining a kind of "pragma" in a comment.
:::warning
The content-type parameter / pragma trick is a hack: it keeps part of the semantics out-of-band. So a full-fledged RDF 1.2 Full concrete syntax would be a better alternative in the future.
But at least this provides a smooth evolution from datasets (compatible with current architectures) towards RDF 1.2 full.
:::
NB: by using `owl:sameAs` as the predicate P above, one can encode an arbitrary RDF 1.2 Full graph in an RDF 1.2 dataset. Some datasets, however, might result in inconsistent graphs, or even "impossible" ones (if the named graphs are self-referential)...
NB: by using a predicate other than `owl:sameAs`, named graphs can be considered as "tokens" rather than "types" (i.e. different named graphs containing the same triples can coexist, and still be considered as different things).
...