# Reconstruction process
input:
- $r$: RDF graph
- $c$: a well-behaved context
output:
- $pg$: a property graph for which $c$ is complete
steps:
* find signature triples
* i.e. all triples that have the same $\kappa$ as a type signature in $c$
* for each signature triple, there must be exactly one type
(by definition of the "signature")
* associate a type to each bnode
* for each bnode $b$ in $g$, find all types associated with the signature triples in which $b$ appears
* if $b$ is associated with exactly one type, keep this type
* if $b$ is associated with exactly one node type, and one or several edge type, only keep the node type
* in any other cases (0 types, several node types), $r$ is **INVALID-1**
* associate a bnode to each triple
* for each triple $t$ in $r$
* if $t$ contains only one bnode (possibly in several positions),
then associate it with $b$
* if $t$ contains two different bnodes (possibly in several positions)
* one of them must be associated to a node type, and the other one to an edge type
* otherwise the graph is **INVALID-2**
* associate the triple with the "edge" bnode
* in any other case (no bnode, more than two bnodes), $r$ is **INVALID-2'**
* for each bnode $b$
* let $T$ be the type associated to $b$, and $tps = c(T)$
* let $tpus \subseteq tps$ the set of template triples that have a unique $\kappa$ in $tps$
* let $ts$ be the set of triples associated with $b$
* if there is no injection of $tpus$ into $ts$, then $g$ is **INVALID-3**
* WARNING: here we are simply ignoring triples extra triples (i.e. those that were generated by triple patterns with non-unique $\kappa$)
* not sure if Julian does the same
* it means that the constructed $pg$ may not round-trip to $g$
* we might even lose some information in the process (see [below](#the-algotithm-is-creative-): 'the algorith is creative')
* for each $tp$ in $tpus$, extract property values and / or ?src / ?dest, and add them to the PG being constructed
* if conflicting values for properties, src or dest arise, then $g$ is **INVALID-4**
* NB: per the definition of well-behaved context,
$tpus$ references all properties of $T$, as well as $?src$ and $?dest$ for edge types
* so at this point, $b$ in the property graph complies with type $T$
# Proof
## The algorithm is complete
i.e. if $r$ was actually generated from a $pg$, it will NOT be rejected as invalid.
For this, we need to prove that $r = prsc(pg, c)$ always has the properties checked at each INVALID case above.
* PROP-0 the bnodes of $r$ are exactly the elements of $pg$
* not an "INVALID" case in the algorithm, but useful for the rest
* proof sketch: part of the proof of Theorem 3
* PROP-1 every bnode occurs in exactly one "signature triple"
* proof sketch: Lemma 2 ?
* PROP-2 every triple contains exactly one or two bnodes -- in the case of two, one of them is a node, the other is an edge
* proof sketch: Theorem 5
* corollary: every triple is produced by the graph pattern associated to its unique bnode, or to its edge bnode
* proof sketch: Theorem 6
* PROP-3 for every element $b$ of $pg$, and every unique triple pattern $tp$ in $c(typeof_p(b))$, there is exactly one triple in $g$ containing $b$ and whose $\kappa$ is the same $tp$
* proof sketch: Corollary PROP-2 and Theorem 7 / Corrolary 3
* PROP-4 for every element $b$ of $pg$, "the triples generated with b contain the correct values for the placeholders"
* needs to be more formally defined
## The algorithm is sound
i.e. if $r$ was actually generated from a $pg$, then it is $pg$ that is actually reconstructed
TODO
## The algorithm is creative :stuck_out_tongue_winking_eye:
i.e. it may generate a $pg$ even if $g$ was not (or even could not be) generated from a $pg$.
Consider a node type with label `Person` and properties `givenName` and `familyName`.
Consider the (well behaved) context that associate that type with the following graph pattern:
```sparql
?self a schema:Person .
?self schema:givenName "givenName"^^prec:valueOf . # unique
?self schema:familyName "familyName"^^prec:valueOf . # unique
?self rdfs:label "givenName"^^prec:valueOf . # NOT unique
?self rdfs:label "familyName"^^prec:valueOf . # NOT unique
```
Consider now the following PG:
```cypher
[ Person { givenName: "Jean", familyName: "Jean "}]
```
It converts to
```turtle
[] a schema:Person ;
schema:givenName "Jean";
schema:familyName "Jean";
rdfs:label "Jean" ; # NB: actually generated twice, but that's ok
.
```
But the following graphs would also generate the same PG, with the process above
```turtle
[] a schema:Person ;
schema:givenName "Jean";
schema:familyName "Jean";
# removed the rdfs:label
.
```
This one is ok-ish, because it is a subgraph of the original. We removed redundant information. That might be considered acceptable.
```turtle
[] a schema:Person ;
schema:givenName "Jean";
schema:familyName "Jean";
rdfs:label "Jean", "Batman"; # ADDED a extraneous rdfs:label
.
```
This one is more disturbing, because we added information that
- is inconsistent with the context
- will be lost when converting back to PG*
I suspect that Julian will say: "I don't care, I am only concerned with RDF graphs that actually come from a PG". And that will be fair ;)
Note that we could improve the algorithm above to detect that these graphs are invalid (or possibly just the second one), but that might be expensive... So there is a trade-off to be made here.