Graduate "Metadata in Table Schema" from pattern to spec

Context

In 2019, we introduced a pattern for schema metadata properties that describe a schema's name, description and other characteristics. These properties help users understand schemas and make them easier to share and reuse, for example as part of a cataloging use case.

These metadata properties have since been used by a significant number of schemas, most of which were created in France and cataloged on schema.data.gouv.fr.

Examples of adoption:

Some of the properties have also been implemented in frictionless-py:

  • descriptor (optional)
  • name (optional)
  • type (optional)
  • title (optional)
  • description (optional)

Proposition

To consolidate the growing adoption of the metadata properties and improve coherence between the spec and its implementations, we propose adding the most frequently used subset of those properties to the Table Schema specification and documentation, as part of the v2 Frictionless Data specs.

We will also open an issue on the frictionless-py repository proposing to implement and document those properties in the library.

All of these properties would remain optional, so that the spec and its implementations stay backward compatible with existing schemas.

  • name:
    • Description: An identifier string for this schema.
    • Format: lower-case string containing only alphanumeric characters, "_" or "-", with no spaces
    • Example: 'schema-static-ev-charger'
  • title:
    • Description: A human-readable title for this schema.
    • Format: string limited to 100 characters
    • Example: 'Static EV charger'
  • description:
    • Description: A text description for this schema.
    • Format: string
    • Example: "Specification of the exchange file for data concerning the geographical location and technical characteristics of electric vehicle charging stations and points."
  • homepage:
  • path:
    The direct path to the schema itself makes the schema easier to access (e.g. for machine readability).
  • sources:
    • Description: A list of dictionaries containing the titles and URLs of documentation sources related to the schema
    • Format: JSON array of documentation sources, each described by the properties "title" and "path"
    • Example:
    [
        {
            "title": "Décret n° 2017-26 du 12 janvier 2017 relatif aux infrastructures de recharge pour véhicules électriques et portant diverses mesures de transposition de la directive 2014/94/UE du Parlement européen et du Conseil du 22 octobre 2014 sur le déploiement d’une infrastructure pour carburants alternatifs",
            "path": "https://www.legifrance.gouv.fr/jo_pdf.do?id=JORFTEXT000033860620"
        }
    ]
    
  • keywords:
    • Description: A list of short keywords related to the schema
    • Format: list of strings
    • Example:
    [
        "electric vehicle",
        "ev",
        "charging station",
        "mobility"
    ]
    
  • resources:
    Oftentimes, schemas are shared with example resources to illustrate them, with valid or even invalid files (e.g. with constraint errors).
    • Description: Example tabular data resource(s) that are valid or invalid against this schema.
    • Format: JSON array of data file(s), each described by the properties "title" and "path"
    • Example:
      [
          {
              "title": "Exemple de fichier IRVE valide",
              "path": "https://raw.githubusercontent.com/etalab/schema-irve/v2.3.0/statique/exemple-valide-statique.csv"
          }
      ]
      
  • created:
    • Description: The date on which this schema was created.
    • Format: date
    • Example: "2018-06-29"
  • lastModified:
    • Description: The date on which this schema was last modified.
    • Format: date
    • Example: "2022-10-10"
  • version:
    • Description: A unique version number for this schema, in the semantic versioning format, possibly prefixed with the letter "v".
    • Format: string
    • Examples: "2.3.0" or "v2.3.0" or "2.3.0-beta"
  • contributors:
    • Description: The contributors to this schema.
    • Format: JSON array of contributors, each described by the properties "title", "email", "organisation" and "role"
    • Example:
      [
          {
              "title": "Alexandre Bulté",
              "email": "validation@data.gouv.fr",
              "organisation": "Etalab",
              "role": "author"
          },
          {
              "title": "Pierre Dittgen",
              "email": "pierre.dittgen@jailbreak.paris",
              "organisation": "Jailbreak",
              "role": "contributor"
          },
          ...
      ]
      

Adding other custom properties would still be allowed, and tolerated by implementations such as frictionless-py.
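
To make the "Format" lines above more concrete, here is a minimal Python sketch of how a tool might check some of the proposed properties. It is only an illustration under stated assumptions, not part of the proposal and not the frictionless-py API: the function name `check_schema_metadata` and both regular expressions are hypothetical, alphanumeric is read as ASCII, and "date" is read as an ISO 8601 calendar date, as in the examples. Every property is treated as optional, and any other key, custom or not, is left alone.

```python
import re
from datetime import date

# Assumed patterns, derived from the "Format" lines of the proposal.
NAME_RE = re.compile(r"[a-z0-9_-]+")
# Optional "v" prefix, MAJOR.MINOR.PATCH, optional pre-release suffix ("-beta").
VERSION_RE = re.compile(r"v?\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?")


def check_schema_metadata(schema: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    name = schema.get("name")
    if name is not None and not (isinstance(name, str) and NAME_RE.fullmatch(name)):
        problems.append("name: expected lower-case alphanumerics, '_' or '-'")

    title = schema.get("title")
    if title is not None and not (isinstance(title, str) and len(title) <= 100):
        problems.append("title: expected a string of at most 100 characters")

    for key in ("created", "lastModified"):
        value = schema.get(key)
        if value is None:
            continue  # the property is optional
        try:
            date.fromisoformat(value)
        except (TypeError, ValueError):
            problems.append(f"{key}: expected a date such as 2018-06-29")

    version = schema.get("version")
    if version is not None and not (
        isinstance(version, str) and VERSION_RE.fullmatch(version)
    ):
        problems.append("version: expected a semantic version such as 2.3.0")

    keywords = schema.get("keywords")
    if keywords is not None and not (
        isinstance(keywords, list) and all(isinstance(k, str) for k in keywords)
    ):
        problems.append("keywords: expected a list of strings")

    # Other keys ("fields", custom properties, ...) are deliberately not checked.
    return problems


example = {
    "name": "schema-static-ev-charger",
    "title": "Static EV charger",
    "created": "2018-06-29",
    "lastModified": "2022-10-10",
    "version": "v2.3.0",
    "keywords": ["electric vehicle", "ev"],
    "x-custom": "tolerated",  # hypothetical custom property
    "fields": [],
}
print(check_schema_metadata(example))
```

Returning a list of problems rather than raising on the first one fits the cataloging use case, where a schema author would want to see every issue at once.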

Next

  • Collect feedback
  • Work on spec (PR)
  • Propose and work on implementation (issue+PR)

We propose to contribute to all or part of this work.