Reconstruction process

input:

$r$ : RDF graph
$c$ : a well-behaved context

output:

$p g$ : a property graph for which
$c$ is complete

steps:

find signature triples
- i.e. all triples that have the same
  $κ$ as a type signature in
  $c$
- for each signature triple, there must be exactly one type
  (by definition of the "signature")
associate a type to each bnode
- for each bnode
  $b$ in
  $g$ , find all types associated with the signature triples in which
  $b$ appears
- if
  $b$ is associated with exactly one type, keep this type
- if
  $b$ is associated with exactly one node type, and one or several edge type, only keep the node type
- in any other cases (0 types, several node types),
  $r$ is INVALID-1
associate a bnode to each triple
- for each triple
  $t$ in
  $r$
  - if
    $t$ contains only one bnode (possibly in several positions),
    then associate it with
    $b$
  - if
    $t$ contains two different bnodes (possibly in several positions)
    - one of them must be associated to a node type, and the other one to an edge type
      - otherwise the graph is INVALID-2
      - associate the triple with the "edge" bnode
  - in any other case (no bnode, more than two bnodes),
    $r$ is INVALID-2'
for each bnode
$b$
- let
  $T$ be the type associated to
  $b$ , and
  $t p s = c (T)$
- let
  $t p u s \subseteq t p s$ the set of template triples that have a unique
  $κ$ in
  $t p s$
- let
  $t s$ be the set of triples associated with
  $b$
- if there is no injection of
  $t p u s$ into
  $t s$ , then
  $g$ is INVALID-3
  - WARNING: here we are simply ignoring triples extra triples (i.e. those that were generated by triple patterns with non-unique
    $κ$ )
    - not sure if Julian does the same
    - it means that the constructed
      $p g$ may not round-trip to
      $g$
      - we might even lose some information in the process (see below: 'the algorith is creative')
- for each
  $t p$ in
  $t p u s$ , extract property values and / or ?src / ?dest, and add them to the PG being constructed
- if conflicting values for properties, src or dest arise, then
  $g$ is INVALID-4
- NB: per the definition of well-behaved context,
  
  $t p u s$ references all properties of
  $T$ , as well as
  $? s r c$ and
  $? d e s t$ for edge types
  - so at this point,
    $b$ in the property graph complies with type
    $T$

Proof

The algorithm is complete

i.e. if

r

was actually generated from a

p g

, it will NOT be rejected as invalid.

For this, we need to prove that

r = p r s c (p g, c)

always has the properties checked at each INVALID case above.

PROP-0 the bnodes of
$r$ are exactly the elements of
$p g$
- not an "INVALID" case in the algorithm, but useful for the rest
- proof sketch: part of the proof of Theorem 3
PROP-1 every bnode occurs in exactly one "signature triple"
- proof sketch: Lemma 2 ?
PROP-2 every triple contains exactly one or two bnodes – in the case of two, one of them is a node, the other is an edge
- proof sketch: Theorem 5
- corollary: every triple is produced by the graph pattern associated to its unique bnode, or to its edge bnode
  - proof sketch: Theorem 6
PROP-3 for every element
$b$ of
$p g$ , and every unique triple pattern
$t p$ in
$c (t y p e o f_{p} (b))$ , there is exactly one triple in
$g$ containing
$b$ and whose
$κ$ is the same
$t p$
- proof sketch: Corollary PROP-2 and Theorem 7 / Corrolary 3
PROP-4 for every element
$b$ of
$p g$ , "the triples generated with b contain the correct values for the placeholders"
- needs to be more formally defined

The algorithm is sound

i.e. if

r

was actually generated from a

p g

, then it is

p g

that is actually reconstructed

TODO

The algorithm is creative

Image Not Showing Possible Reasons
The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported
Learn More →

i.e. it may generate a

p g

even if

g

was not (or even could not be) generated from a

p g

Consider a node type with label Person and properties givenName and familyName.

Consider the (well behaved) context that associate that type with the following graph pattern:

  ?self a schema:Person .
  ?self schema:givenName "givenName"^^prec:valueOf .    # unique
  ?self schema:familyName "familyName"^^prec:valueOf .  # unique
  ?self rdfs:label "givenName"^^prec:valueOf .          # NOT unique
  ?self rdfs:label "familyName"^^prec:valueOf .         # NOT unique

Consider now the following PG:

[ Person { givenName: "Jean", familyName: "Jean "}]

It converts to

[] a schema:Person ;
  schema:givenName "Jean";
  schema:familyName "Jean";
  rdfs:label "Jean" ;  # NB: actually generated twice, but that's ok
.

But the following graphs would also generate the same PG, with the process above

[] a schema:Person ;
  schema:givenName "Jean";
  schema:familyName "Jean";
  # removed the rdfs:label
.

This one is ok-ish, because it is a subgraph of the original. We removed redundant information. That might be considered acceptable.

[] a schema:Person ;
  schema:givenName "Jean";
  schema:familyName "Jean";
  rdfs:label "Jean", "Batman"; # ADDED a extraneous rdfs:label
.

This one is more disturbing, because we added information that

is inconsistent with the context
will be lost when converting back to PG*

I suspect that Julian will say: "I don't care, I am only concerned with RDF graphs that actually come from a PG". And that will be fair ;)

Note that we could improve the algorithm above to detect that these graphs are invalid (or possibly just the second one), but that might be expensive… So there is a trade-off to be made here.

Reconstruction process

Proof

The algorithm is complete

The algorithm is sound

The algorithm is creative Image Not Showing Possible Reasons The image file may be corruptedThe server hosting the image is unavailableThe image path is incorrectThe image format is not supported Learn More →

The algorithm is creative

Image Not Showing Possible Reasons
The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported
Learn More →