Reconstruction process

input:

  • r
    : RDF graph
  • c
    : a well-behaved context

output:

  • pg
    : a property graph for which
    c
    is complete

steps:

  • find signature triples
    • i.e. all triples that have the same
      κ
      as a type signature in
      c
    • for each signature triple, there must be exactly one type
      (by definition of the "signature")
  • associate a type to each bnode
    • for each bnode
      b
      in
      g
      , find all types associated with the signature triples in which
      b
      appears
    • if
      b
      is associated with exactly one type, keep this type
    • if
      b
      is associated with exactly one node type, and one or several edge type, only keep the node type
    • in any other cases (0 types, several node types),
      r
      is INVALID-1
  • associate a bnode to each triple
    • for each triple
      t
      in
      r
      • if
        t
        contains only one bnode (possibly in several positions),
        then associate it with
        b
      • if
        t
        contains two different bnodes (possibly in several positions)
        • one of them must be associated to a node type, and the other one to an edge type
          • otherwise the graph is INVALID-2
          • associate the triple with the "edge" bnode
      • in any other case (no bnode, more than two bnodes),
        r
        is INVALID-2'
  • for each bnode
    b
    • let
      T
      be the type associated to
      b
      , and
      tps=c(T)
    • let
      tpustps
      the set of template triples that have a unique
      κ
      in
      tps
    • let
      ts
      be the set of triples associated with
      b
    • if there is no injection of
      tpus
      into
      ts
      , then
      g
      is INVALID-3
      • WARNING: here we are simply ignoring triples extra triples (i.e. those that were generated by triple patterns with non-unique
        κ
        )
        • not sure if Julian does the same
        • it means that the constructed
          pg
          may not round-trip to
          g
          • we might even lose some information in the process (see below: 'the algorith is creative')
    • for each
      tp
      in
      tpus
      , extract property values and / or ?src / ?dest, and add them to the PG being constructed
    • if conflicting values for properties, src or dest arise, then
      g
      is INVALID-4
    • NB: per the definition of well-behaved context,
      tpus
      references all properties of
      T
      , as well as
      ?src
      and
      ?dest
      for edge types
      • so at this point,
        b
        in the property graph complies with type
        T

Proof

The algorithm is complete

i.e. if

r was actually generated from a
pg
, it will NOT be rejected as invalid.

For this, we need to prove that

r=prsc(pg,c) always has the properties checked at each INVALID case above.

  • PROP-0 the bnodes of
    r
    are exactly the elements of
    pg
    • not an "INVALID" case in the algorithm, but useful for the rest
    • proof sketch: part of the proof of Theorem 3
  • PROP-1 every bnode occurs in exactly one "signature triple"
    • proof sketch: Lemma 2 ?
  • PROP-2 every triple contains exactly one or two bnodes in the case of two, one of them is a node, the other is an edge
    • proof sketch: Theorem 5
    • corollary: every triple is produced by the graph pattern associated to its unique bnode, or to its edge bnode
      • proof sketch: Theorem 6
  • PROP-3 for every element
    b
    of
    pg
    , and every unique triple pattern
    tp
    in
    c(typeofp(b))
    , there is exactly one triple in
    g
    containing
    b
    and whose
    κ
    is the same
    tp
    • proof sketch: Corollary PROP-2 and Theorem 7 / Corrolary 3
  • PROP-4 for every element
    b
    of
    pg
    , "the triples generated with b contain the correct values for the placeholders"
    • needs to be more formally defined

The algorithm is sound

i.e. if

r was actually generated from a
pg
, then it is
pg
that is actually reconstructed

TODO

The algorithm is creative
Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

i.e. it may generate a

pg even if
g
was not (or even could not be) generated from a
pg
.

Consider a node type with label Person and properties givenName and familyName.

Consider the (well behaved) context that associate that type with the following graph pattern:

  ?self a schema:Person .
  ?self schema:givenName "givenName"^^prec:valueOf .    # unique
  ?self schema:familyName "familyName"^^prec:valueOf .  # unique
  ?self rdfs:label "givenName"^^prec:valueOf .          # NOT unique
  ?self rdfs:label "familyName"^^prec:valueOf .         # NOT unique

Consider now the following PG:

[ Person { givenName: "Jean", familyName: "Jean "}] 

It converts to

[] a schema:Person ;
  schema:givenName "Jean";
  schema:familyName "Jean";
  rdfs:label "Jean" ;  # NB: actually generated twice, but that's ok
.

But the following graphs would also generate the same PG, with the process above

[] a schema:Person ;
  schema:givenName "Jean";
  schema:familyName "Jean";
  # removed the rdfs:label
.

This one is ok-ish, because it is a subgraph of the original. We removed redundant information. That might be considered acceptable.

[] a schema:Person ;
  schema:givenName "Jean";
  schema:familyName "Jean";
  rdfs:label "Jean", "Batman"; # ADDED a extraneous rdfs:label
.

This one is more disturbing, because we added information that

  • is inconsistent with the context
  • will be lost when converting back to PG*

I suspect that Julian will say: "I don't care, I am only concerned with RDF graphs that actually come from a PG". And that will be fair ;)

Note that we could improve the algorithm above to detect that these graphs are invalid (or possibly just the second one), but that might be expensive So there is a trade-off to be made here.