Semantic Markdown Spec (V0)

Introduction

What is Semantic MarkDown ?

Design Rationale:

  • Embed RDFa-like semantic annotation within MarkDown
  • Ability to mix unstructured human-readable text with machine-readable data in JSON-LD-like lists
  • Ability to add semantic annotations to an existing plain MarkDown document
  • Try to keep human-readability to a maximum

We need 3 kinds of annotations:

  • annotations with a property
  • annotations with a subject identifier
  • annotations with a type/class

About this document

In brief

  • Annotations starting with a . indicate a type/class, and generate an RDFa typeof attribute : {.foaf:Person}
  • Annotations starting with a = indicate the URI of a known entity, and generate an RDFa resource attribute : {=wdt:Q42}
  • Annotations without any marker indicate a property, and generate an RDFa property attribute : {foaf:name}

Paragraph example

The Hitchhiker's Guide to the Galaxy was written by [Douglas Adams]{dct:creator} in [1979]{dct:created}. {=wdt:Q25169}

A short example

:::{.schema:Event}

## Specification meeting {schema:name}

* Date : 11/10 {schema:startDate}
* Place : Our office, Street name, 75014 Paris {schema:location}
* Meeting participants : {schema:attendee}
  * Alice;
  * Bob;
  * [Tim](https://www.wikidata.org/wiki/Q80);
* Description : Some information not annotated
:::

MarkDown extensions needed

Rely on attributes extension

See PHP Markdown extra special attributes and Pandoc's header attributes :

Extract from PHP Markdown extra documentation :

With Markdown Extra, you can set the id and class attribute on certain elements using an attribute block. For instance, put the desired id prefixed by a hash inside curly brackets after the header at the end of the line, like this:

Header 1            {#header1}
========

## Header 2 ##      {#header2}

Then you can create links to different parts of the same document like this:

[Link back to header 1](#header1)

To add a class name, which can be used as a hook for a style sheet, use a dot like this:

## The Site ##    {.main}

You can also add custom attributes having simple values by specifying the attribute name, followed by an equal sign, followed by the value (which cannot contain spaces at this time):

## Le Site ##    {lang=fr}

The id, multiple class names, and other custom attributes can be combined by putting them all into the same special attribute block:

## Le Site ##    {.main .shine #the-site lang=fr}

At this time, special attribute blocks can be used with

  • headers,
  • fenced code blocks
  • links, and
  • images.

Extend where attributes can be placed

Extend attributes to lists

The attribute mechanism needs to be extended to annotate lists. In this case the curly brackets should be put right before the list:

{foaf:member}
- item 1
- item 2
- item 3

Extend attributes to list items

- item 1 {foaf:member}
- item 2 {foaf:member}
- item 3 {foaf:member}

Extend attributes to inlines

Thomas is _39_{foaf:age}.

Attributes on a word without inline delimiters ?

Thomas is 39{foaf:age}.

Allow "property attribute"

An attribute without ., without # and that is not a key-value pair should be recognized as a property name, e.g. {foaf:name}.

Allow "subject attribute"

An attribute beginning with the = sign indicates a subject URI, equivalent to an about=xxx property, e.g. {=wdt:Q42} is equivalent to {about=wdt:Q42}
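
The classification rules above (class, id, subject, key-value pair, property) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `parse_attribute_block` and the shape of the returned dictionary are hypothetical, and tokens are assumed to be separated by whitespace.

```python
def parse_attribute_block(block: str) -> dict:
    """Classify the tokens of an attribute block such as '{.foaf:Person =wdt:Q42}'.

    Hypothetical helper: the dictionary keys are an assumption of this sketch.
    """
    inner = block.strip()
    if not (inner.startswith("{") and inner.endswith("}")):
        raise ValueError(f"not an attribute block: {block!r}")
    result = {"id": None, "typeof": [], "resource": None, "property": [], "other": {}}
    for token in inner[1:-1].split():
        if token.startswith("."):        # type/class
            result["typeof"].append(token[1:])
        elif token.startswith("#"):      # element id
            result["id"] = token[1:]
        elif token.startswith("="):      # shortcut for a subject URI
            result["resource"] = token[1:]
        elif "=" in token:               # key=value pair
            key, value = token.split("=", 1)
            if key in ("about", "resource"):
                result["resource"] = value
            else:
                result["other"][key] = value
        else:                            # anything else is a property name
            result["property"].append(token)
    return result
```

Note that a property written as an absolute URI containing an = sign would be misread as a key-value pair by this sketch; a real parser would need a rule for that case.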

Rely on divs and bracketed spans extension

See Pandoc bracketed spans

Meeting with [Bob]{.foaf:Person}

Should produce

<p>Meeting with <span typeof="foaf:Person">Bob</span></p>

Mechanism to indicate property values (RDFa "property" attribute)

Properties in lists

Key/Value pairs

If the list item contains : or =, the annotation is applied to the string after this character. Should a final dot or semi-colon be left out of the annotated value ?

- Nom : Thomas Francart {foaf:name}
- Age = 39 {foaf:age}
- Profession : Semantic Web Consultant; {rdfs:comment}

Should yield (note how the semi-colon is excluded from the last annotation) :

<ul>
  <li>Nom : <span property="foaf:name">Thomas Francart</span></li>
  <li>Age = <span property="foaf:age">39</span></li>
  <li>Profession : <span property="rdfs:comment">Semantic Web Consultant</span>;</li>
</ul>
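
The key/value rule can be sketched in Python as below. This is a minimal sketch, not a reference implementation: `render_list_item` is a hypothetical name, the input is assumed to be the list item text without its leading `-` marker, and the separator is assumed to be surrounded by whitespace so that the colon inside a prefixed name is not mistaken for it.

```python
import re
from html import escape

# A trailing "{...}" annotation at the end of the list item.
ANNOTATION_RE = re.compile(r"\s*\{([^{}]+)\}\s*$")
# A ":" or "=" separator surrounded by whitespace (assumption of this sketch).
SEPARATOR_RE = re.compile(r"\s[:=]\s")

def render_list_item(item: str) -> str:
    """Render one list item, wrapping the value part in a property span."""
    match = ANNOTATION_RE.search(item)
    if match is None:
        return f"<li>{escape(item.strip())}</li>"
    prop = match.group(1).strip()
    text = item[:match.start()]
    sep = SEPARATOR_RE.search(text)
    if sep:
        key, value = text[:sep.end()], text[sep.end():]
    else:
        key, value = "", text
    value = value.strip()
    trailing = ""
    if value.endswith((".", ";")):
        # Keep the final dot or semi-colon outside of the annotated span.
        value, trailing = value[:-1].rstrip(), value[-1]
    return (f'<li>{escape(key)}<span property="{escape(prop)}">'
            f'{escape(value)}</span>{trailing}</li>')
```

An item whose value is empty, such as "Meeting participants : {schema:attendee}", is not handled by this sketch; it would need to pass the property on to the nested list instead.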

URI written directly as key

- foaf:name : Thomas Francart
- foaf:age = 39
- rdfs:comment : Semantic Web Consultant

Should yield

<ul>
  <li>foaf:name : <span property="foaf:name">Thomas Francart</span></li>
  <li>foaf:age = <span property="foaf:age">39</span></li>
  <li>rdfs:comment : <span property="rdfs:comment">Semantic Web Consultant</span></li>
</ul>

Value-only list items

- Thomas Francart {foaf:name}
- 39 {foaf:age}
- Semantic Web Consultant {rdfs:comment}

Should yield

<ul>
  <li><span property="foaf:name">Thomas Francart</span></li>
  <li><span property="foaf:age">39</span></li>
  <li><span property="rdfs:comment">Semantic Web Consultant</span></li>
</ul>

Annotate a list with a property

Annotating a list with a property annotation should be treated as if all list items are annotated with the same property

{foaf:member}
- Thomas;
- Vincent;
- Nicolas;

Is equivalent to

- Thomas; {foaf:member}
- Vincent; {foaf:member}
- Nicolas; {foaf:member}

And should yield

<ul>
  <li><span property="foaf:member">Thomas</span>;</li>
  <li><span property="foaf:member">Vincent</span>;</li>
  <li><span property="foaf:member">Nicolas</span>;</li>
</ul>
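
The equivalence between a list-level annotation and per-item annotations can be sketched as a simple rewriting step on the source lines. The function name `propagate_list_annotation` is hypothetical, and the sketch assumes the annotation sits alone on the first line and that items start with `- ` or `* `.

```python
def propagate_list_annotation(lines: list[str]) -> list[str]:
    """Copy a leading '{property}' line onto every item of the list that follows."""
    if not lines:
        return lines
    first = lines[0].strip()
    if not (first.startswith("{") and first.endswith("}")):
        return lines
    inner = first[1:-1].strip()
    # Only property annotations are propagated; type, subject and id
    # annotations apply to the list as a whole and are left untouched here.
    if not inner or inner.startswith((".", "=", "#")):
        return lines
    out = []
    for line in lines[1:]:
        is_item = line.lstrip().startswith(("- ", "* "))
        if is_item and not line.rstrip().endswith("}"):
            out.append(f"{line.rstrip()} {first}")
        else:
            out.append(line)
    return out
```

Items that already carry their own annotation are left as they are, which is a design choice of this sketch rather than something the spec decides.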

Inline properties

Properties on inline delimiters

Thomas is [39]{foaf:age}.

Should yield

<p>Thomas is <span property="foaf:age">39</span>.</p>

Same with _, * or **.
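
For the bracketed form, the transformation can be sketched with a regular expression. This is an assumption-laden sketch: `render_paragraph` is a hypothetical name, only the `[text]{property}` form is handled (not `_`, `*` or `**`), and text outside the spans is copied as-is without HTML escaping.

```python
import re
from html import escape

# "[text]{property}" where the annotation does not start with ".", "=" or "#",
# so type, subject and id annotations are not treated as properties.
SPAN_RE = re.compile(r"\[([^\[\]]+)\]\{([^{}\s.=#][^{}\s]*)\}")

def render_paragraph(text: str) -> str:
    """Wrap each '[text]{property}' in a span carrying an RDFa property attribute."""
    def to_span(match: re.Match) -> str:
        content, prop = match.group(1), match.group(2)
        return f'<span property="{escape(prop)}">{escape(content)}</span>'
    return f"<p>{SPAN_RE.sub(to_span, text)}</p>"
```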

Properties on word without delimiters

If a property annotation immediately follows a word with no explicit inline delimiters, it should be applied to this word only. (Is it really possible in terms of parsing ? Not sure.)

Thomas is 39{foaf:age}.

Should yield

<p>Thomas is <span property="foaf:age">39</span>.</p>

Annotate with 2 properties

It should be possible to annotate with 2 properties

- Name : Alice {foaf:name rdfs:label}
- Age : 23 {foaf:age}

Mechanism to indicate current subject

RDFa relies on a mechanism to indicate the current subject of the annotation (precise reference needed). We should aim at having something equivalent in SemanticMarkDown.

Intuitively, the current subject is the resource annotated in the "closest ancestor" of a property annotation.
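
The "closest ancestor" intuition can be sketched as a walk up the document tree. Everything here is hypothetical: the `Node` class, its fields, and the choice to represent an ancestor that has a type but no URI as a blank node label (which mirrors what RDFa does for typeof without about, but is an assumption as far as this spec is concerned).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str                          # e.g. "div", "ul", "li"
    resource: Optional[str] = None     # set by a {=...} annotation
    typeof: Optional[str] = None       # set by a {.…} annotation
    parent: Optional["Node"] = None

def current_subject(node: Node) -> Optional[str]:
    """Return the subject that a property annotation on `node` applies to."""
    ancestor = node.parent
    while ancestor is not None:
        if ancestor.resource is not None:
            return ancestor.resource
        if ancestor.typeof is not None:
            # Typed but unnamed resource: use a blank node label (assumption).
            return f"_:{ancestor.name}"
        ancestor = ancestor.parent
    return None  # no annotated ancestor: the subject would be the document itself
```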

Use a class attribute (RDFa "typeof" attribute)

# Le site {.foaf:Document}
{.foaf:Document}
- item 1
- item 2
- item 3

Use an ID attribute (RDFa "about" or "resource" attribute)

Use an attribute with a key-value pair, with the key "about" or "resource"

# Douglas Adams {resource=wdt:Q42}

Can we find some kind of shortcut ? Maybe use the equal sign

# Douglas Adams {=wdt:Q42}

Combine ID + class

It should be possible to combine an ID and a type attribute

# Douglas Adams {.foaf:Person =wdt:Q42}
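
How a combined annotation could map to RDFa attributes can be sketched as follows. The function name `rdfa_attributes` is hypothetical, and the choice of `resource` (rather than `about`) for the subject is an assumption of this sketch. Several properties or types are emitted as a space-separated list, which RDFa allows for both attributes.

```python
from html import escape

def rdfa_attributes(block: str) -> str:
    """Turn an annotation such as '{.foaf:Person =wdt:Q42}' into RDFa attributes."""
    types, properties, resource = [], [], None
    for token in block.strip().strip("{}").split():
        if token.startswith("."):
            types.append(token[1:])
        elif token.startswith("="):
            resource = token[1:]
        elif token.startswith(("about=", "resource=")):
            resource = token.split("=", 1)[1]
        elif not token.startswith("#") and "=" not in token:
            properties.append(token)
        # ids and other key=value attributes are ignored in this sketch
    parts = []
    if properties:
        parts.append(f'property="{escape(" ".join(properties))}"')
    if types:
        parts.append(f'typeof="{escape(" ".join(types))}"')
    if resource is not None:
        parts.append(f'resource="{escape(resource)}"')
    return " ".join(parts)
```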

Where to find the current subject ?

Current span subject (?) (requires div-span extension)

Used to indicate that a certain inline portion of a sentence is about an entity.

[Tim Berners Lee]{=wdt:Q80} invented the web.

Should yield

<p><span resource="wdt:Q80">Tim Berners Lee</span> invented the web.</p>

Current paragraph subject

Used to indicate that a whole paragraph is about an entity.

Tim Berners Lee invented the web. {=wdt:Q80}

Should yield

<p resource="wdt:Q80">Tim Berners Lee invented the web.</p>

Current list subject

Used to indicate that a whole list describes an entity

{=wdt:Q80}
- Name : Tim Berners Lee {foaf:name}
- ISNI : 0000 0000 7866 6209 {wd:P213}

Should yield

<ul resource="wdt:Q80">
  <li>Name : <span property="foaf:name">Tim Berners Lee</span></li>
  <li>ISNI : <span property="wd:P213">0000 0000 7866 6209</span></li>
</ul>

For readability, the list annotation should be looked for at the end of the line preceding the list:

:::{.schema:Event}

* Date : 11/10 {schema:startDate}
* Meeting participants : {schema:attendee}
  * Alice;
  * Bob;
:::

Indented lists

Indented lists are key because they could make plain MarkDown lists look like JSON-LD-like structures.

Here is our meeting description :

- Date : 10/11/2019
- Location : somewhere
- Attendees :
  - Alice
    - Engineer
    - Works for : Foo
    - Hobbies :
      - Football
      - Video games
  - Bob
    - Sales Manager
    - Works for : Bar
    - Hobbies : 
      - Cooking
      - Cycling

Annotated version:

Here is our meeting description : {.schema:Event}

- Date : 10/11/2019 {schema:startDate}
- Location : somewhere {schema:location}
- Attendees : {schema:attendee}
  - Alice {schema:name}
    - Engineer {schema:jobTitle}
    - Works for : Foo {schema:affiliation}
    - Hobbies : {schema:knowsAbout}
      - Football
      - Video games
  - Bob {schema:name}
    - Sales Manager {schema:jobTitle}
    - Works for : Bar {schema:affiliation}
    - Hobbies : {schema:knowsAbout}
      - Cooking
      - Cycling

Arguably, this is not human-readable anymore

Current blockquote subject (is it useful ?)

Used to indicate that a blockquote describes an entity

TODO

Current header subject

Used to indicate that a certain section of a document describes an entity.

While this is certainly useful and intuitive (and compatible with the attributes MarkDown extension), this is probably the trickiest to implement, because a header in MarkDown does not generate a common HTML ancestor for its whole content.

Let's assume for now that it is possible to generate a <div> that contains the header and its entire content, but the feasibility of this should be checked.

TODO

## Specification meeting {.schema:Event}

- Date : 10/11/2019 {schema:startDate}
- Location : somewhere {schema:location}

Should yield

<div typeof="schema:Event">
  <h2>Specification meeting</h2>
  <ul>
    <li>Date : <span property="schema:startDate">10/11/2019</span></li>
    <li>Location : <span property="schema:location">somewhere</span></li>
  </ul>
</div>
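
One possible way to generate the enclosing <div> is to post-process the flat sequence of rendered blocks: an annotated header opens a <div>, which is closed by the next header of the same or a higher level, or by the end of the document. This is only a sketch to help check feasibility; the `Heading` class and `wrap_sections` function are hypothetical, and blocks are assumed to be already rendered to HTML strings.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Heading:
    level: int          # 1 for "#", 2 for "##", ...
    html: str           # rendered header, e.g. "<h2>Specification meeting</h2>"
    rdfa: str = ""      # e.g. 'typeof="schema:Event"', empty if not annotated

def wrap_sections(blocks: list[Union[Heading, str]]) -> list[str]:
    """Wrap each annotated header and its section content in a <div>."""
    out: list[str] = []
    open_levels: list[int] = []   # levels of the headers whose <div> is still open
    for block in blocks:
        if isinstance(block, Heading):
            # A header of the same or a higher level ends the open sections.
            while open_levels and open_levels[-1] >= block.level:
                out.append("</div>")
                open_levels.pop()
            if block.rdfa:
                out.append(f"<div {block.rdfa}>")
                open_levels.append(block.level)
            out.append(block.html)
        else:
            out.append(block)
    out.extend("</div>" for _ in open_levels)
    return out
```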

Current div subject (requires div-span extension)

:::{=wdt:Q80}
Tim Berners Lee invented the web.

He now works on Solid.
:::

Should yield

<div about="wdt:Q80">
  <p>Tim Berners Lee invented the web.</p>
  <p>He now works on Solid.</p>
</div>

Mechanism to declare namespaces

Use link references, anywhere in the document, preferably at the end to ease readability.


{.schema:Event}
* Date : 10/11/2019 {schema:startDate}
* Location : somewhere {schema:location}
... the rest of the document ...

--
[schema]: http://schema.org/
[rdfs]: http://www.w3.org/2000/01/rdf-schema#

Should yield

<html prefix="schema: http://schema.org/ rdfs: http://www.w3.org/2000/01/rdf-schema#">
  <body>
        <ul typeof="schema:Event">
          <li>Date : <span property="schema:startDate">10/11/2019</span></li>
          <li>Location : <span property="schema:location">somewhere</span></li>
        </ul>
  </body>
</html>

Question : how to distinguish link references that are prefixes from link references that are just link references ? Do we need a special annotation for that, e.g. {@prefix} ?

### Specifications Meeting {.schema:Event}

* Date : 10/11/2019 {schema:startDate}
... the rest of the document ...

--
[schema]: http://schema.org/ {@prefix}
[rdfs]: http://www.w3.org/2000/01/rdf-schema# {@prefix}
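
Both options (every link reference is a prefix, or only those marked with {@prefix}) can be sketched with the same function. The name `prefix_attribute` and the `marked_only` flag are hypothetical, and link references are assumed to sit alone on their line without a title.

```python
import re

# "[label]: url", optionally followed by the {@prefix} marker proposed above.
LINK_REF_RE = re.compile(r"^\[([^\]]+)\]:\s*(\S+)(\s+\{@prefix\})?\s*$")

def prefix_attribute(markdown: str, marked_only: bool = False) -> str:
    """Build the value of the RDFa prefix attribute from link references."""
    pairs = []
    for line in markdown.splitlines():
        match = LINK_REF_RE.match(line.strip())
        if match is None:
            continue
        label, url, marker = match.groups()
        if marked_only and marker is None:
            continue   # an ordinary link reference, not a namespace declaration
        pairs.append(f"{label}: {url}")
    return " ".join(pairs)
```

The result can then be placed in the prefix attribute of the <html> element.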

Mechanism to declare default namespace

Referring to a URI

Absolute URI reference

Meeting with _Bob_{.http://xmlns.com/foaf/0.1/Person}

Absolute URI reference with <>

Meeting with _Bob_{.<http://xmlns.com/foaf/0.1/Person>}

Prefixed URI reference (known prefix)

Meeting with _Bob_{.foaf:Person}

Prefixes known in RDFa Core Initial Context

Meeting with _Bob_{.f:Person}

[f]: http://xmlns.com/foaf/0.1/
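
The different ways of referring to a URI can be resolved with a small function, sketched below. The name `expand_reference` is hypothetical, and so is the rule used to tell a bare absolute URI from a prefixed name: here an undeclared prefix is only accepted when followed by `//`, so other absolute URIs (e.g. `urn:` ones) would have to use the <> form.

```python
def expand_reference(ref: str, prefixes: dict[str, str]) -> str:
    """Expand a type, property or subject reference to a full URI."""
    if ref.startswith("<") and ref.endswith(">"):
        return ref[1:-1]                       # absolute URI written with <>
    prefix, sep, local = ref.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local        # prefixed name with a known prefix
    if sep and local.startswith("//"):
        return ref                             # bare absolute URI such as http://...
    raise ValueError(f"cannot resolve reference: {ref!r}")
```

With a prefix map binding `foaf` and `f` to http://xmlns.com/foaf/0.1/, the four notations shown above all resolve to the same class URI.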

Here is our meeting description : {.schema:Event}

- [Date] : 10/11/2019
- [Location] : somewhere
- [Attendees] :
  - [Name] : Alice
    - [jobTitle] : Engineer
    - [Works for] : Foo
    - [Hobbies] :
      - Football
      - Video games
  - [Name] : Bob
    - [jobTitle] : Sales Manager
    - [Works for] : Bar
    - [Hobbies] :
      - Cooking
      - Cycling

--
[Date]: http://schema.org/startDate
[Location]: http://schema.org/location
[Name]: http://schema.org/name
[jobTitle]: http://schema.org/jobTitle
[Works for]: http://schema.org/affiliation
[Hobbies]: http://schema.org/knowsAbout

In this example, the "key : value" syntax could be used to populate the event correctly (instead of [*]). The first pair is Date : 10/11/2019, meaning the script has to put 10/11/2019 in the date property of the event.
If we encounter "Attendees :", we look up the schema of the event's attendee property (=> foaf:Person) and use it to analyse the items below.
– Swann

Very readable ! It looks very much like JSON-LD with a context. Besides, it makes the keys clickable links, and it does not rely on an extension to capture the property, so it stays closer to the original MarkDown philosophy. Lots of advantages !
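
The link-reference approach can be sketched as a two-pass extraction: first collect the reference definitions, then map each `[Key] : value` list item to the property its key points to. The function name `extract_pairs` and the returned tuple shape (indentation, property URI, value) are hypothetical. The sketch tolerates a space before the colon in definitions, although standard MarkDown link reference definitions do not allow one.

```python
import re

# "[label]: url" reference definition (a space before ":" is tolerated here).
REF_DEF_RE = re.compile(r"^\[([^\]]+)\]\s*:\s*(\S+)\s*$")
# "- [Key] : value" list item; the value may be empty when a nested list follows.
ITEM_RE = re.compile(r"^(\s*)[-*]\s+\[([^\]]+)\]\s*:\s*(.*)$")

def extract_pairs(markdown: str) -> list[tuple[int, str, str]]:
    """Return (indentation, property URI, value) for each '[Key] : value' item."""
    lines = markdown.splitlines()
    refs = {}
    for line in lines:
        match = REF_DEF_RE.match(line)
        if match:
            refs[match.group(1)] = match.group(2)
    pairs = []
    for line in lines:
        match = ITEM_RE.match(line)
        if match and match.group(2) in refs:
            indent, key, value = match.groups()
            pairs.append((len(indent), refs[key], value.strip()))
    return pairs
```

The indentation is kept so that a later step can rebuild the nesting, e.g. attach the Name of each attendee to the right resource.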


Benchmark

Roam-research

  • Similarities :
  • Differences : Open-source

Org-roam

  • Similarities :
  • Differences :

TiddlyRoam

  • Similarities :
  • Differences :