# Reconstruction process input: - $r$: RDF graph - $c$: a well-behaved context output: - $pg$: a property graph for which $c$ is complete steps: * find signature triples * i.e. all triples that have the same $\kappa$ as a type signature in $c$ * for each signature triple, there must be exactly one type (by definition of the "signature") * associate a type to each bnode * for each bnode $b$ in $g$, find all types associated with the signature triples in which $b$ appears * if $b$ is associated with exactly one type, keep this type * if $b$ is associated with exactly one node type, and one or several edge type, only keep the node type * in any other cases (0 types, several node types), $r$ is **INVALID-1** * associate a bnode to each triple * for each triple $t$ in $r$ * if $t$ contains only one bnode (possibly in several positions), then associate it with $b$ * if $t$ contains two different bnodes (possibly in several positions) * one of them must be associated to a node type, and the other one to an edge type * otherwise the graph is **INVALID-2** * associate the triple with the "edge" bnode * in any other case (no bnode, more than two bnodes), $r$ is **INVALID-2'** * for each bnode $b$ * let $T$ be the type associated to $b$, and $tps = c(T)$ * let $tpus \subseteq tps$ the set of template triples that have a unique $\kappa$ in $tps$ * let $ts$ be the set of triples associated with $b$ * if there is no injection of $tpus$ into $ts$, then $g$ is **INVALID-3** * WARNING: here we are simply ignoring triples extra triples (i.e. those that were generated by triple patterns with non-unique $\kappa$) * not sure if Julian does the same * it means that the constructed $pg$ may not round-trip to $g$ * we might even lose some information in the process (see [below](#the-algotithm-is-creative-): 'the algorith is creative') * for each $tp$ in $tpus$, extract property values and / or ?src / ?dest, and add them to the PG being constructed * if conflicting values for properties, src or dest arise, then $g$ is **INVALID-4** * NB: per the definition of well-behaved context, $tpus$ references all properties of $T$, as well as $?src$ and $?dest$ for edge types * so at this point, $b$ in the property graph complies with type $T$ # Proof ## The algorithm is complete i.e. if $r$ was actually generated from a $pg$, it will NOT be rejected as invalid. For this, we need to prove that $r = prsc(pg, c)$ always has the properties checked at each INVALID case above. * PROP-0 the bnodes of $r$ are exactly the elements of $pg$ * not an "INVALID" case in the algorithm, but useful for the rest * proof sketch: part of the proof of Theorem 3 * PROP-1 every bnode occurs in exactly one "signature triple" * proof sketch: Lemma 2 ? * PROP-2 every triple contains exactly one or two bnodes -- in the case of two, one of them is a node, the other is an edge * proof sketch: Theorem 5 * corollary: every triple is produced by the graph pattern associated to its unique bnode, or to its edge bnode * proof sketch: Theorem 6 * PROP-3 for every element $b$ of $pg$, and every unique triple pattern $tp$ in $c(typeof_p(b))$, there is exactly one triple in $g$ containing $b$ and whose $\kappa$ is the same $tp$ * proof sketch: Corollary PROP-2 and Theorem 7 / Corrolary 3 * PROP-4 for every element $b$ of $pg$, "the triples generated with b contain the correct values for the placeholders" * needs to be more formally defined ## The algorithm is sound i.e. if $r$ was actually generated from a $pg$, then it is $pg$ that is actually reconstructed TODO ## The algorithm is creative :stuck_out_tongue_winking_eye: i.e. it may generate a $pg$ even if $g$ was not (or even could not be) generated from a $pg$. Consider a node type with label `Person` and properties `givenName` and `familyName`. Consider the (well behaved) context that associate that type with the following graph pattern: ```sparql ?self a schema:Person . ?self schema:givenName "givenName"^^prec:valueOf . # unique ?self schema:familyName "familyName"^^prec:valueOf . # unique ?self rdfs:label "givenName"^^prec:valueOf . # NOT unique ?self rdfs:label "familyName"^^prec:valueOf . # NOT unique ``` Consider now the following PG: ```cypher [ Person { givenName: "Jean", familyName: "Jean "}] ``` It converts to ```turtle [] a schema:Person ; schema:givenName "Jean"; schema:familyName "Jean"; rdfs:label "Jean" ; # NB: actually generated twice, but that's ok . ``` But the following graphs would also generate the same PG, with the process above ```turtle [] a schema:Person ; schema:givenName "Jean"; schema:familyName "Jean"; # removed the rdfs:label . ``` This one is ok-ish, because it is a subgraph of the original. We removed redundant information. That might be considered acceptable. ```turtle [] a schema:Person ; schema:givenName "Jean"; schema:familyName "Jean"; rdfs:label "Jean", "Batman"; # ADDED a extraneous rdfs:label . ``` This one is more disturbing, because we added information that - is inconsistent with the context - will be lost when converting back to PG* I suspect that Julian will say: "I don't care, I am only concerned with RDF graphs that actually come from a PG". And that will be fair ;) Note that we could improve the algorithm above to detect that these graphs are invalid (or possibly just the second one), but that might be expensive... So there is a trade-off to be made here.