Reconstruction process
input:
- : RDF graph
- : a well-behaved context
output:
- : a property graph for which is complete
steps:
- find signature triples
- i.e. all triples that have the same as a type signature in
- for each signature triple, there must be exactly one type
(by definition of the "signature")
- associate a type to each bnode
- for each bnode in , find all types associated with the signature triples in which appears
- if is associated with exactly one type, keep this type
- if is associated with exactly one node type, and one or several edge type, only keep the node type
- in any other cases (0 types, several node types), is INVALID-1
- associate a bnode to each triple
- for each triple in
- if contains only one bnode (possibly in several positions),
then associate it with
- if contains two different bnodes (possibly in several positions)
- one of them must be associated to a node type, and the other one to an edge type
- otherwise the graph is INVALID-2
- associate the triple with the "edge" bnode
- in any other case (no bnode, more than two bnodes), is INVALID-2'
- for each bnode
- let be the type associated to , and
- let the set of template triples that have a unique in
- let be the set of triples associated with
- if there is no injection of into , then is INVALID-3
- WARNING: here we are simply ignoring triples extra triples (i.e. those that were generated by triple patterns with non-unique )
- not sure if Julian does the same
- it means that the constructed may not round-trip to
- we might even lose some information in the process (see below: 'the algorith is creative')
- for each in , extract property values and / or ?src / ?dest, and add them to the PG being constructed
- if conflicting values for properties, src or dest arise, then is INVALID-4
- NB: per the definition of well-behaved context,
references all properties of , as well as and for edge types
- so at this point, in the property graph complies with type
Proof
The algorithm is complete
i.e. if was actually generated from a , it will NOT be rejected as invalid.
For this, we need to prove that always has the properties checked at each INVALID case above.
- PROP-0 the bnodes of are exactly the elements of
- not an "INVALID" case in the algorithm, but useful for the rest
- proof sketch: part of the proof of Theorem 3
- PROP-1 every bnode occurs in exactly one "signature triple"
- PROP-2 every triple contains exactly one or two bnodes – in the case of two, one of them is a node, the other is an edge
- proof sketch: Theorem 5
- corollary: every triple is produced by the graph pattern associated to its unique bnode, or to its edge bnode
- PROP-3 for every element of , and every unique triple pattern in , there is exactly one triple in containing and whose is the same
- proof sketch: Corollary PROP-2 and Theorem 7 / Corrolary 3
- PROP-4 for every element of , "the triples generated with b contain the correct values for the placeholders"
- needs to be more formally defined
The algorithm is sound
i.e. if was actually generated from a , then it is that is actually reconstructed
TODO
The algorithm is creative
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
i.e. it may generate a even if was not (or even could not be) generated from a .
Consider a node type with label Person
and properties givenName
and familyName
.
Consider the (well behaved) context that associate that type with the following graph pattern:
Consider now the following PG:
It converts to
But the following graphs would also generate the same PG, with the process above
This one is ok-ish, because it is a subgraph of the original. We removed redundant information. That might be considered acceptable.
This one is more disturbing, because we added information that
- is inconsistent with the context
- will be lost when converting back to PG*
I suspect that Julian will say: "I don't care, I am only concerned with RDF graphs that actually come from a PG". And that will be fair ;)
Note that we could improve the algorithm above to detect that these graphs are invalid (or possibly just the second one), but that might be expensive… So there is a trade-off to be made here.