Lossless harvesting storage with two databases (graph and Postgres)
===
[toc]
## Summary
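After analyzing one DCAT data source and the whole process of:
1. harvesting metadata into CKAN,
2. storing metadata in the CKAN metadata DB (PostgreSQL) and
3. exporting metadata from CKAN in the DCAT standard format,

the conclusion is that no metadata defined by the DCAT standard itself is lost.
However, this might not be the case for other DCAT sub-standards (modified versions of the DCAT standard) such as DCAT-AP, DCAT-AP.de, DCAT-AP.it, DCAT-AP.ch, etc. Even in the example below, the DCAT-AP.de properties `dcatde:maintainer`, `dcatde:politicalGeocodingURI` and `dcatde:politicalGeocodingLevelURI` do not appear in the exported RDF.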

[DCAT example (at the bottom of this document)](#Example-DCAT-to-CKAN-to-DCAT-metdata-transition):
1. [Original dataset metadata in DCAT format](#Original-DCAT-catalog-dataset-metadata)
2. [CKAN package metadata](#CKAN-package-metadata)
3. [Exposed metadata in DCAT standard RDF format](#Exposed-metadata-in-DCAT-standard-rdf-format)
## Analysis
There are two possible approaches:
### To enable a triplestore DB only for harvested datasets' metadata and keep both copies of the metadata
* CKAN metadata in the Postgres DB, indexed in Solr and ready for search
* The second copy goes into a graph DB and is used only when we want to expose the original metadata in DCAT RDF (or any other triple-based) format

This way we retain the CKAN philosophy of metadata structure (personally, I think this is a big advantage compared with competing software) and also keep the original DCAT metadata.
We don't need to care about the DCAT metadata standard or its derivatives, because the original metadata is stored without any modification.
Custom DCAT extensions will still be needed to transform and map metadata from DCAT and its derivatives to the CKAN metadata schema.
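
A minimal sketch of such a mapping step, using only the Python standard library. The mapping tables, the `dcat_to_ckan` function and the `dcatde` namespace URI are assumptions made for illustration (this is not the ckanext-dcat profile API); the CKAN field names are borrowed from the CKAN package shown in the example at the bottom of this document. What it illustrates: every property without a mapping entry, such as the DCAT-AP.de `dcatde:*` properties, silently disappears from the CKAN copy, which is the reason for keeping the original copy as well.

```python
import xml.etree.ElementTree as ET

# Namespaces of the harvested metadata; the dcatde URI is an assumption.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcatde": "http://dcat-ap.de/def/dcatde/",
}
RDF_RESOURCE = f"{{{NS['rdf']}}}resource"

# Hypothetical mapping tables: DCAT property -> CKAN package field / extra key.
TOP_LEVEL = {"dct:title": "title", "dct:description": "notes"}
EXTRAS = {"dct:identifier": "identifier", "dct:modified": "modified", "dct:spatial": "spatial_uri"}


def qname(tag: str) -> str:
    """Turn ElementTree's '{uri}local' into 'prefix:local' (unknown URIs kept as-is)."""
    if not tag.startswith("{"):
        return tag
    uri, local = tag[1:].split("}", 1)
    for prefix, ns_uri in NS.items():
        if ns_uri == uri:
            return f"{prefix}:{local}"
    return tag


def dcat_to_ckan(dataset: ET.Element) -> tuple[dict, list[str]]:
    """Map the direct children of a dcat:Dataset to a CKAN-like package dict.

    Returns the package and the list of properties that had no mapping.
    """
    package: dict = {"tags": [], "extras": []}
    dropped: list[str] = []
    for child in dataset:
        prop = qname(child.tag)
        # Literal text if present, otherwise the rdf:resource URI (None for nested nodes).
        value = (child.text or "").strip() or child.get(RDF_RESOURCE)
        if prop == "dcat:keyword":
            package["tags"].append({"name": value})
        elif prop in TOP_LEVEL:
            package[TOP_LEVEL[prop]] = value
        elif prop in EXTRAS:
            package["extras"].append({"key": EXTRAS[prop], "value": value})
        else:
            dropped.append(prop)  # no mapping entry: lost in the CKAN copy
    return package, dropped


# Trimmed from the original DCAT listing; namespace declarations added.
SAMPLE = """<dcat:Dataset
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dcatde="http://dcat-ap.de/def/dcatde/"
    rdf:about="http://www.duva-server.de:8080/obis2dcat/dataset/de-nw-oberhausen-oberbuergermeisterwahl_2009">
  <dct:title>Oberbürgermeisterwahl 2009</dct:title>
  <dct:identifier>45</dct:identifier>
  <dct:spatial rdf:resource="http://www.geonames.org/6553028"/>
  <dcat:keyword>Wahlart</dcat:keyword>
  <dcatde:politicalGeocodingURI rdf:resource="http://dcat-ap.de/def/politicalGeocoding/regionalKey/051190000000"/>
</dcat:Dataset>"""

if __name__ == "__main__":
    package, dropped = dcat_to_ckan(ET.fromstring(SAMPLE))
    print(package)
    print("not mapped:", dropped)
```

Supporting a DCAT derivative then means extending the mapping tables (or writing a custom extension) for each profile; anything left out only survives in the original copy.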
```mermaid
graph LR
A[DCAT metadata from harvesting source] --> B(CKAN metadata DB)
A --> C(Graph DB)
B --> D[Metadata in CKAN schema standard]
C --> E[Metadata in DCAT standard - original metadata]
```
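
A minimal sketch of the harvest step in this approach, with clearly stated stand-ins: Python's built-in `sqlite3` plays both the CKAN Postgres DB and the graph DB, the original metadata is kept as (graph, subject, predicate, object) rows instead of in a real triplestore, and all table and function names are hypothetical. It only shows the idea from the diagram: both copies are written from the same harvested record, with one named graph per harvested dataset, so a re-harvest replaces the original copy instead of duplicating it.

```python
import json
import sqlite3


def open_stores() -> sqlite3.Connection:
    """One in-memory SQLite DB standing in for both the CKAN Postgres DB and the graph DB."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE package (uri TEXT PRIMARY KEY, data TEXT)")  # CKAN copy
    # Original copy; g = named graph (one per harvested dataset).
    # Objects are plain strings here; a real triplestore distinguishes URIs from literals.
    db.execute("CREATE TABLE triple (g TEXT, s TEXT, p TEXT, o TEXT)")
    return db


def store_harvested(db: sqlite3.Connection, uri: str, package: dict,
                    triples: list[tuple[str, str, str]]) -> None:
    """Write the CKAN copy and the untouched original copy of one harvested dataset."""
    # Both stand-ins share one connection, so one transaction covers both writes;
    # two real databases would not give that guarantee.
    with db:
        db.execute("INSERT OR REPLACE INTO package VALUES (?, ?)", (uri, json.dumps(package)))
        db.execute("DELETE FROM triple WHERE g = ?", (uri,))  # re-harvest replaces the old graph
        db.executemany("INSERT INTO triple VALUES (?, ?, ?, ?)",
                       [(uri, s, p, o) for s, p, o in triples])


def original_metadata(db: sqlite3.Connection, uri: str) -> list[tuple[str, str, str]]:
    """Read back the original triples, e.g. to expose them as DCAT RDF."""
    return db.execute("SELECT s, p, o FROM triple WHERE g = ? ORDER BY s, p, o", (uri,)).fetchall()


# Values taken from the example dataset at the bottom of this document.
URI = "http://www.duva-server.de:8080/obis2dcat/dataset/de-nw-oberhausen-oberbuergermeisterwahl_2009"
DCATDE = "http://dcat-ap.de/def/dcatde/"  # assumed DCAT-AP.de namespace URI
TRIPLES = [
    (URI, "http://purl.org/dc/terms/title", "Oberbürgermeisterwahl 2009"),
    (URI, DCATDE + "politicalGeocodingURI",
     "http://dcat-ap.de/def/politicalGeocoding/regionalKey/051190000000"),
]
# The CKAN copy has no field for the DCAT-AP.de property.
PACKAGE = {"name": "oberburgermeisterwahl-20091", "title": "Oberbürgermeisterwahl 2009"}

if __name__ == "__main__":
    db = open_stores()
    store_harvested(db, URI, PACKAGE, TRIPLES)
    store_harvested(db, URI, PACKAGE, TRIPLES)  # second harvest run, no duplicates
    for row in original_metadata(db, URI):
        print(row)
```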
### To support the DCAT standard in CKAN out of the box
* this way publishers can create datasets with the DCAT standard schema (this goes against the CKAN philosophy)
* this would still not solve the issue with all of the DCAT standard derivatives: every derivative would need its own custom implementation
## Example DCAT to CKAN to DCAT metdata transition
Original data source catalog: https://www.duva-server.de/obis2dcat/catalog.rdf
#### Original DCAT catalog dataset metadata
```xml=
<dcat:dataset>
<dcat:Dataset rdf:about="http://www.duva-server.de:8080/obis2dcat/dataset/de-nw-oberhausen-oberbuergermeisterwahl_2009">
<dct:language rdf:resource="http://id.loc.gov/vocabulary/iso639-1/de"/>
<dct:spatial rdf:resource="http://www.geonames.org/6553028"/>
<dcat:keyword>Dirk Karl Buttler CDU</dcat:keyword>
<dcat:keyword>Wahlart</dcat:keyword>
<dct:description>Oberbürgermeisterwahl 2009</dct:description>
<dct:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2021-05-19</dct:modified>
<dcatde:maintainer>
<foaf:Person>
<foaf:name>Jörg Jülkenbeck</foaf:name>
</foaf:Person>
</dcatde:maintainer>
<dcat:keyword>Wahlscheinwähler</dcat:keyword>
<dcat:keyword>Gemeindewahlbezirke</dcat:keyword>
<dcatde:politicalGeocodingURI rdf:resource="http://dcat-ap.de/def/politicalGeocoding/regionalKey/051190000000"/>
<dcat:keyword>Manfred Lorentschat GRÜNE</dcat:keyword>
<dcat:keyword>Stimmabgabebezirk</dcat:keyword>
<dcat:keyword>Amtlicher Gemeindeschlüssel</dcat:keyword>
<dcat:keyword>Wahldatum</dcat:keyword>
<dcat:keyword>Frank Dittmeyer DIE LINKE</dcat:keyword>
<dcat:contactPoint>
<vcard:Individual>
<vcard:hasEmail>joerg.juelkenbeck@oberhausen.de</vcard:hasEmail>
<vcard:fn>Jörg Jülkenbeck</vcard:fn>
</vcard:Individual>
</dcat:contactPoint>
<dcat:keyword>Stadtbezirk</dcat:keyword>
<dcat:keyword>Urnen- oder Briefwahl</dcat:keyword>
<dcat:keyword>Wähler/-innen</dcat:keyword>
<dcat:keyword>Wähler/-innen mit Wahlbrief</dcat:keyword>
<dct:temporal>
<dct:PeriodOfTime>
<schema:endDate rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2009-08-30T00:00:00</schema:endDate>
<schema:startDate rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2009-08-30T00:00:00</schema:startDate>
</dct:PeriodOfTime>
</dct:temporal>
<dcat:keyword>Klaus Heinrich Wehling SPD</dcat:keyword>
<dct:title>Oberbürgermeisterwahl 2009</dct:title>
<dcat:keyword>Regina Boos FDP</dcat:keyword>
<dcat:distribution>
<dcat:Distribution rdf:about="http://www.duva-server.de:8080/obis2dcat/dataset/de-nw-oberhausen-oberbuergermeisterwahl_2009#dist-csv">
<dct:license rdf:resource="http://dcat-ap.de/def/licenses/dl-by-de/2.0"/>
<dct:language rdf:resource="http://id.loc.gov/vocabulary/iso639-1/de"/>
<dct:format rdf:resource="http://publications.europa.eu/resource/authority/file-type/CSV"/>
<dcat:accessURL rdf:resource="http://www.duva-server.de:8080/obis2dcat/dataset/de-nw-oberhausen-oberbuergermeisterwahl_2009"/>
<dct:description>de-nw-oberhausen-oberbuergermeisterwahl_2009</dct:description>
<adms:status rdf:resource="http://purl.org/adms/status/Completed"/>
<dct:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2021-05-19</dct:modified>
<dcat:downloadURL rdf:resource="http://www.duva-server.de:8080/obis2dcat/dataset/de-nw-oberhausen-oberbuergermeisterwahl_2009/content.csv"/>
<dct:title>Oberbürgermeisterwahl 2009</dct:title>
</dcat:Distribution>
</dcat:distribution>
<dcat:theme rdf:resource="http://publications.europa.eu/resource/authority/data-theme/GOVE"/>
<dct:identifier>45</dct:identifier>
<dcat:keyword>ungültige Stimmen</dcat:keyword>
<dcatde:politicalGeocodingLevelURI rdf:resource="http://dcat-ap.de/def/politicalGeocoding/Level/municipality"/>
<dcat:keyword>Wahlberechtigte ohne Sperrvermerk Wahlschein</dcat:keyword>
<dcat:keyword>gültige Stimmen</dcat:keyword>
<dcat:keyword>Wahlberechtigte mit Sperrvermerk Wahlschein</dcat:keyword>
<dcat:keyword>Wahlberechtigte</dcat:keyword>
</dcat:Dataset>
```
#### CKAN package metadata
https://opendata.ruhr/api/3/action/package_show?id=oberburgermeisterwahl-20091
```json=
{
"author": null,
"author_email": null,
"creator_user_id": "4da4df01-6b05-421f-9587-304d475968c6",
"id": "d0651156-9304-42a9-851a-c6fdd0aa0e44",
"isopen": false,
"license_id": "",
"license_title": "",
"maintainer": null,
"maintainer_email": null,
"metadata_created": "2023-01-25T14:00:06.901238",
"metadata_modified": "2023-03-24T07:00:06.763004",
"name": "oberburgermeisterwahl-20091",
"notes": "Oberb\u00fcrgermeisterwahl 2009",
"num_resources": 1,
"num_tags": 20,
"organization": {
"id": "babbd3cb-99fe-41f4-9371-ad002586cb84",
"name": "oberhausen",
"title": "Oberhausen",
"type": "organization",
"description": "",
"image_url": "2019-12-19-095723.483745Oberhausenneuneu.jpg",
"created": "2019-08-29T15:08:52.907277",
"is_organization": true,
"approval_status": "approved",
"state": "active"
},
"owner_org": "babbd3cb-99fe-41f4-9371-ad002586cb84",
"private": false,
"state": "active",
"title": "Oberb\u00fcrgermeisterwahl 2009",
"type": "dataset",
"url": null,
"version": null,
"extras": [
{
"key": "contact_email",
"value": "joerg.juelkenbeck@oberhausen.de"
},
{
"key": "contact_name",
"value": "J\u00f6rg J\u00fclkenbeck"
},
{
"key": "guid",
"value": "http://www.duva-server.de:8080/obis2dcat/dataset/de-nw-oberhausen-oberbuergermeisterwahl_2009"
},
{
"key": "identifier",
"value": "45"
},
{
"key": "language",
"value": "[\"http://id.loc.gov/vocabulary/iso639-1/de\"]"
},
{
"key": "modified",
"value": "2021-05-19"
},
{
"key": "spatial_uri",
"value": "http://www.geonames.org/6553028"
},
{
"key": "temporal_end",
"value": "2009-08-30T00:00:00"
},
{
"key": "temporal_start",
"value": "2009-08-30T00:00:00"
},
{
"key": "theme",
"value": "[\"http://publications.europa.eu/resource/authority/data-theme/GOVE\"]"
},
{
"key": "uri",
"value": "http://www.duva-server.de:8080/obis2dcat/dataset/de-nw-oberhausen-oberbuergermeisterwahl_2009"
}
],
"groups": [
{
"description": "",
"display_name": "Regierung und \u00f6ffentlicher Sektor",
"id": "gove",
"image_display_url": "https://opendata.ruhr/uploads/group/2019-10-15-131217.0083792019-09-25-134640.503466iconkatgove.svg",
"name": "gove",
"title": "Regierung und \u00f6ffentlicher Sektor"
}
],
"resources": [
{
"access_url": "http://www.duva-server.de:8080/obis2dcat/dataset/de-nw-oberhausen-oberbuergermeisterwahl_2009",
"cache_last_updated": null,
"cache_url": null,
"created": "2023-01-25T14:00:06.914692",
"datastore_active": false,
"description": "de-nw-oberhausen-oberbuergermeisterwahl_2009",
"distribution_ref": "http://www.duva-server.de:8080/obis2dcat/dataset/de-nw-oberhausen-oberbuergermeisterwahl_2009#dist-csv",
"download_url": "http://www.duva-server.de:8080/obis2dcat/dataset/de-nw-oberhausen-oberbuergermeisterwahl_2009/content.csv",
"format": "http://publications.europa.eu/resource/authority/file-type/CSV",
"hash": "",
"id": "cba2009e-5086-4e35-84e8-8fd795f5a276",
"language": "[\"http://id.loc.gov/vocabulary/iso639-1/de\"]",
"last_modified": null,
"license": "http://dcat-ap.de/def/licenses/dl-by-de/2.0",
"metadata_modified": "2023-01-26T15:00:06.733503",
"mimetype": "text/csv",
"mimetype_inner": null,
"modified": "2021-05-19",
"name": "Oberb\u00fcrgermeisterwahl 2009",
"package_id": "d0651156-9304-42a9-851a-c6fdd0aa0e44",
"position": 0,
"resource_type": null,
"size": null,
"state": "active",
"status": "http://purl.org/adms/status/Completed",
"uri": "http://www.duva-server.de:8080/obis2dcat/dataset/de-nw-oberhausen-oberbuergermeisterwahl_2009#dist-csv",
"url": "http://www.duva-server.de:8080/obis2dcat/dataset/de-nw-oberhausen-oberbuergermeisterwahl_2009/content.csv",
"url_type": null
}
],
"tags": [
{
"display_name": "amtlicher-gemeindeschl\u00fcssel",
"id": "d943afb6-2d84-4db9-92c6-3f1eee9f8b64",
"name": "amtlicher-gemeindeschl\u00fcssel",
"state": "active",
"vocabulary_id": null
},
{
"display_name": "dirk-karl-buttler-cdu",
"id": "7dd9ca0a-65df-4562-8580-950f13b3cb39",
"name": "dirk-karl-buttler-cdu",
"state": "active",
"vocabulary_id": null
},
{
"display_name": "frank-dittmeyer-die-linke",
"id": "821d7cc8-4f1e-4d02-a1dc-76a6f097a625",
"name": "frank-dittmeyer-die-linke",
"state": "active",
"vocabulary_id": null
},
{
"display_name": "gemeindewahlbezirke",
"id": "cfdb4704-ff4f-45cd-b230-2eb437096bd6",
"name": "gemeindewahlbezirke",
"state": "active",
"vocabulary_id": null
},
{
"display_name": "g\u00fcltige-stimmen",
"id": "fbed9da7-cde0-48c2-b9c0-34a9a4cdd55d",
"name": "g\u00fcltige-stimmen",
"state": "active",
"vocabulary_id": null
},
{
"display_name": "klaus-heinrich-wehling-spd",
"id": "ee9b67b6-8cf7-4141-b95e-e1d603d62f46",
"name": "klaus-heinrich-wehling-spd",
"state": "active",
"vocabulary_id": null
},
{
"display_name": "manfred-lorentschat-gr\u00fcne",
"id": "5ae51e6c-e6f8-4f08-8e96-dd39de05d189",
"name": "manfred-lorentschat-gr\u00fcne",
"state": "active",
"vocabulary_id": null
},
{
"display_name": "regina-boos-fdp",
"id": "43535f89-02e2-4794-a3eb-d6ffad779e82",
"name": "regina-boos-fdp",
"state": "active",
"vocabulary_id": null
},
{
"display_name": "stadtbezirk",
"id": "249aa39c-6cf7-43ab-8a9b-cc6b476b3ef6",
"name": "stadtbezirk",
"state": "active",
"vocabulary_id": null
},
{
"display_name": "stimmabgabebezirk",
"id": "01b20fdd-841b-4beb-9e09-9fdf9b9c32e1",
"name": "stimmabgabebezirk",
"state": "active",
"vocabulary_id": null
},
{
"display_name": "ung\u00fcltige-stimmen",
"id": "e1a11194-c86a-4c07-a37c-7ecd842caa4f",
"name": "ung\u00fcltige-stimmen",
"state": "active",
"vocabulary_id": null
},
{
"display_name": "urnen--oder-briefwahl",
"id": "2349ac0a-d84b-4444-9984-bb35b290751e",
"name": "urnen--oder-briefwahl",
"state": "active",
"vocabulary_id": null
},
{
"display_name": "wahlart",
"id": "4ce2ac58-51f5-42ae-9c8d-13fc81f1c08f",
"name": "wahlart",
"state": "active",
"vocabulary_id": null
},
{
"display_name": "wahlberechtigte",
"id": "ceb78c2c-a9f4-4f0a-ba7a-69f1debc9e42",
"name": "wahlberechtigte",
"state": "active",
"vocabulary_id": null
},
{
"display_name": "wahlberechtigte-mit-sperrvermerk-wahlschein",
"id": "44ed004e-9757-4115-b5fd-c8a2d33465ea",
"name": "wahlberechtigte-mit-sperrvermerk-wahlschein",
"state": "active",
"vocabulary_id": null
},
{
"display_name": "wahlberechtigte-ohne-sperrvermerk-wahlschein",
"id": "ca4b75ec-4762-4b36-8a5a-e1fdea7cfc2f",
"name": "wahlberechtigte-ohne-sperrvermerk-wahlschein",
"state": "active",
"vocabulary_id": null
},
{
"display_name": "wahldatum",
"id": "9c158164-d966-44d4-8247-d4277f9f97d7",
"name": "wahldatum",
"state": "active",
"vocabulary_id": null
},
{
"display_name": "wahlscheinw\u00e4hler",
"id": "c82ea7af-6c20-4a74-ab26-f0ec0a9837f2",
"name": "wahlscheinw\u00e4hler",
"state": "active",
"vocabulary_id": null
},
{
"display_name": "w\u00e4hler-innen",
"id": "89f67e9a-2e87-4b29-ae4d-5f8910bee79d",
"name": "w\u00e4hler-innen",
"state": "active",
"vocabulary_id": null
},
{
"display_name": "w\u00e4hler-innen-mit-wahlbrief",
"id": "a9d96ecb-d627-4d4c-93d9-1d6df467ff18",
"name": "w\u00e4hler-innen-mit-wahlbrief",
"state": "active",
"vocabulary_id": null
}
],
"relationships_as_subject": [],
"relationships_as_object": []
}
```
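
The `tags` above are the original `dcat:keyword` values rewritten as CKAN tag names, and the exported RDF below returns these rewritten names as `dcat:keyword`. Comparing the original listing with this CKAN package, the rewrite in this example looks like: lowercase, drop `/`, replace spaces with `-`. The sketch below reproduces that pattern; it is inferred from this one dataset and not taken from the harvester's code, so the actual rule may differ (for example for other special characters). It shows why this step is lossy: capitalization and the `/` in `Wähler/-innen` cannot be recovered from the tag name.

```python
def keyword_to_tag(keyword: str) -> str:
    """Rule inferred from this one example only: lowercase, drop '/', spaces -> '-'."""
    return keyword.lower().replace("/", "").replace(" ", "-")


# (dcat:keyword in the original metadata, tag name in the CKAN package above)
OBSERVED = [
    ("Dirk Karl Buttler CDU", "dirk-karl-buttler-cdu"),
    ("Urnen- oder Briefwahl", "urnen--oder-briefwahl"),
    ("Wähler/-innen", "wähler-innen"),
    ("Wähler/-innen mit Wahlbrief", "wähler-innen-mit-wahlbrief"),
    ("Amtlicher Gemeindeschlüssel", "amtlicher-gemeindeschlüssel"),
]

if __name__ == "__main__":
    for keyword, tag in OBSERVED:
        print(f"{keyword!r} -> {keyword_to_tag(keyword)!r} (observed: {tag!r})")
    # Hypothetical second keyword: two different keywords collapse into one tag,
    # so the original value cannot be recovered from the tag name.
    print(keyword_to_tag("Wähler/-innen") == keyword_to_tag("wähler-innen"))
```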
#### Exposed metadata in DCAT standard rdf format
https://opendata.ruhr/dataset/oberburgermeisterwahl-20091.rdf
```xml=
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
xmlns:dct="http://purl.org/dc/terms/"
xmlns:vcard="http://www.w3.org/2006/vcard/ns#"
xmlns:dcat="http://www.w3.org/ns/dcat#"
xmlns:adms="http://www.w3.org/ns/adms#"
xmlns:schema1="http://schema.org/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
>
<dcat:Dataset rdf:about="http://www.duva-server.de:8080/obis2dcat/dataset/de-nw-oberhausen-oberbuergermeisterwahl_2009">
<dct:title>Oberbürgermeisterwahl 2009</dct:title>
<dct:description>Oberbürgermeisterwahl 2009</dct:description>
<dct:identifier>45</dct:identifier>
<dcat:keyword>amtlicher-gemeindeschlüssel</dcat:keyword>
<dcat:keyword>dirk-karl-buttler-cdu</dcat:keyword>
<dcat:keyword>frank-dittmeyer-die-linke</dcat:keyword>
<dcat:keyword>gemeindewahlbezirke</dcat:keyword>
<dcat:keyword>gültige-stimmen</dcat:keyword>
<dcat:keyword>klaus-heinrich-wehling-spd</dcat:keyword>
<dcat:keyword>manfred-lorentschat-grüne</dcat:keyword>
<dcat:keyword>regina-boos-fdp</dcat:keyword>
<dcat:keyword>stadtbezirk</dcat:keyword>
<dcat:keyword>stimmabgabebezirk</dcat:keyword>
<dcat:keyword>ungültige-stimmen</dcat:keyword>
<dcat:keyword>urnen--oder-briefwahl</dcat:keyword>
<dcat:keyword>wahlart</dcat:keyword>
<dcat:keyword>wahlberechtigte</dcat:keyword>
<dcat:keyword>wahlberechtigte-mit-sperrvermerk-wahlschein</dcat:keyword>
<dcat:keyword>wahlberechtigte-ohne-sperrvermerk-wahlschein</dcat:keyword>
<dcat:keyword>wahldatum</dcat:keyword>
<dcat:keyword>wahlscheinwähler</dcat:keyword>
<dcat:keyword>wähler-innen</dcat:keyword>
<dcat:keyword>wähler-innen-mit-wahlbrief</dcat:keyword>
<dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2023-01-25T14:00:06.901238</dct:issued>
<dct:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2021-05-19T00:00:00</dct:modified>
<dct:language rdf:resource="http://id.loc.gov/vocabulary/iso639-1/de"/>
<dcat:theme rdf:resource="http://publications.europa.eu/resource/authority/data-theme/GOVE"/>
<dcat:contactPoint>
<vcard:Organization rdf:nodeID="Nf191306d629e427c840361472af353e9">
<vcard:fn>Jörg Jülkenbeck</vcard:fn>
<vcard:hasEmail rdf:resource="mailto:joerg.juelkenbeck@oberhausen.de"/>
</vcard:Organization>
</dcat:contactPoint>
<dct:publisher>
<foaf:Organization rdf:about="https://opendata.ruhr/organization/babbd3cb-99fe-41f4-9371-ad002586cb84">
<foaf:name>Oberhausen</foaf:name>
</foaf:Organization>
</dct:publisher>
<dct:temporal>
<dct:PeriodOfTime rdf:nodeID="N1ca7bc0dc0b3497899ec848ac77815b9">
<schema1:startDate rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2009-08-30T00:00:00</schema1:startDate>
<schema1:endDate rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2009-08-30T00:00:00</schema1:endDate>
</dct:PeriodOfTime>
</dct:temporal>
<dcat:distribution>
<dcat:Distribution rdf:about="http://www.duva-server.de:8080/obis2dcat/dataset/de-nw-oberhausen-oberbuergermeisterwahl_2009#dist-csv">
<dct:title>Oberbürgermeisterwahl 2009</dct:title>
<dct:description>de-nw-oberhausen-oberbuergermeisterwahl_2009</dct:description>
<adms:status rdf:resource="http://purl.org/adms/status/Completed"/>
<dct:license rdf:resource="http://dcat-ap.de/def/licenses/dl-by-de/2.0"/>
<dcat:accessURL rdf:resource="http://www.duva-server.de:8080/obis2dcat/dataset/de-nw-oberhausen-oberbuergermeisterwahl_2009"/>
<dcat:downloadURL rdf:resource="http://www.duva-server.de:8080/obis2dcat/dataset/de-nw-oberhausen-oberbuergermeisterwahl_2009/content.csv"/>
<dct:language rdf:resource="http://id.loc.gov/vocabulary/iso639-1/de"/>
<dcat:mediaType>text/csv</dcat:mediaType>
<dct:format rdf:resource="http://publications.europa.eu/resource/authority/file-type/CSV"/>
<dct:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2021-05-19T00:00:00</dct:modified>
</dcat:Distribution>
</dcat:distribution>
</dcat:Dataset>
</rdf:RDF>
```
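
One way to check the round trip mechanically is to compare which properties of the dataset node survive. The sketch below uses only the Python standard library and parses trimmed copies of the original and exported listings above (namespace declarations added, since the original excerpt omits them; the `dcatde` namespace URI is an assumption). It reports the properties used directly on the `dcat:Dataset` node that were lost on export and those added by CKAN.

```python
import xml.etree.ElementTree as ET

DCAT_DATASET = "{http://www.w3.org/ns/dcat#}Dataset"
DATASET_URI = "http://www.duva-server.de:8080/obis2dcat/dataset/de-nw-oberhausen-oberbuergermeisterwahl_2009"
DECLS = (
    'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:dct="http://purl.org/dc/terms/" '
    'xmlns:dcat="http://www.w3.org/ns/dcat#" '
    'xmlns:foaf="http://xmlns.com/foaf/0.1/" '
    'xmlns:dcatde="http://dcat-ap.de/def/dcatde/"'  # assumed DCAT-AP.de namespace URI
)

# Trimmed from "Original DCAT catalog dataset metadata" (declarations added).
ORIGINAL = f"""<dcat:Dataset {DECLS} rdf:about="{DATASET_URI}">
  <dct:title>Oberbürgermeisterwahl 2009</dct:title>
  <dct:identifier>45</dct:identifier>
  <dcatde:maintainer><foaf:Person><foaf:name>Jörg Jülkenbeck</foaf:name></foaf:Person></dcatde:maintainer>
  <dcatde:politicalGeocodingURI rdf:resource="http://dcat-ap.de/def/politicalGeocoding/regionalKey/051190000000"/>
  <dcatde:politicalGeocodingLevelURI rdf:resource="http://dcat-ap.de/def/politicalGeocoding/Level/municipality"/>
</dcat:Dataset>"""

# Trimmed from "Exposed metadata in DCAT standard rdf format".
EXPORTED = f"""<rdf:RDF {DECLS}>
  <dcat:Dataset rdf:about="{DATASET_URI}">
    <dct:title>Oberbürgermeisterwahl 2009</dct:title>
    <dct:identifier>45</dct:identifier>
    <dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2023-01-25T14:00:06.901238</dct:issued>
    <dct:publisher>
      <foaf:Organization rdf:about="https://opendata.ruhr/organization/babbd3cb-99fe-41f4-9371-ad002586cb84">
        <foaf:name>Oberhausen</foaf:name>
      </foaf:Organization>
    </dct:publisher>
  </dcat:Dataset>
</rdf:RDF>"""


def dataset_properties(xml_text: str) -> set[str]:
    """Predicate URIs used directly on the first dcat:Dataset node (nested nodes ignored)."""
    root = ET.fromstring(xml_text)
    dataset = root if root.tag == DCAT_DATASET else root.find(f".//{DCAT_DATASET}")
    # ElementTree spells tags as '{ns}local'; dropping the braces gives the predicate URI.
    return {child.tag[1:].replace("}", "") for child in dataset}


if __name__ == "__main__":
    before, after = dataset_properties(ORIGINAL), dataset_properties(EXPORTED)
    print("lost on export:", sorted(before - after))
    print("added by CKAN:", sorted(after - before))
```

This only compares property names on the dataset node. Changes inside nested nodes or in literal datatypes, such as the contact point turning from a `vcard:Individual` into a `vcard:Organization` or `dct:modified` going from `xsd:date` to `xsd:dateTime`, would need a full graph comparison, for example with an RDF library.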
---
Q: Storing metadata in a graph DB vs. blob storage: pros and cons?