These instructions are for a data publisher using:
These instructions are being written to inform the development of:
Assumptions and constraints that influence the instructions for each scenario:
datapackage.json or tableschema.json files directly
tableschema.json file stored or referenced for each data resource
datapackage.zip file uploaded to CKAN is stored in some way and can be downloaded from CKAN as either a:
datapackage.zip file
datapackage.json (that doesn't include the README.md)
There are a number of different scenarios for creating or updating data packages and publishing or accessing them on CKAN.
Data publishers can:
and after the data package is published, data consumers can:
datapackage.json
datapackage.zip
To create a data package:
name, type and format values
datapackage.json generated from the column, table and package properties
README.md generated from the provenance information
/data directory
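The package layout described above (a datapackage.json, a README.md and a data directory) can be sketched in Python as follows. This is a minimal illustration, not Data Curator's actual export code: the function name `build_datapackage_zip`, the example descriptor and the file names are all assumptions made for the example.

```python
import io
import json
import zipfile


def build_datapackage_zip(descriptor, readme_text, csv_files):
    """Return the bytes of a zip holding datapackage.json, README.md and data/*.csv.

    Hypothetical helper: the layout follows the description in this document,
    not the source code of Data Curator.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Package, table and column properties go into the descriptor.
        archive.writestr("datapackage.json", json.dumps(descriptor, indent=2))
        # Provenance information goes into the README.
        archive.writestr("README.md", readme_text)
        # Each data resource is stored under the data directory.
        for filename, content in csv_files.items():
            archive.writestr("data/" + filename, content)
    return buffer.getvalue()


# Illustrative descriptor; every value here is made up for the example.
descriptor = {
    "name": "example-package",
    "profile": "tabular-data-package",
    "resources": [
        {
            "name": "example",
            "path": "data/example.csv",
            "profile": "tabular-data-resource",
            "schema": {
                "fields": [{"name": "id", "type": "integer", "format": "default"}]
            },
        }
    ],
}

zip_bytes = build_datapackage_zip(
    descriptor, "# Example package\n", {"example.csv": "id\n1\n"}
)
```

Building the archive in memory keeps the sketch free of temporary files; writing `zip_bytes` to disk would give a file that could be uploaded manually.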
To publish a data package to CKAN and create a dataset and related resources:
tableschema.json file(s)
README.md file
Consider adding support for the following properties in Data Curator:
keyword tags (currently supported by CKAN Data Package Tools.)
bytes (planned) (not currently supported by CKAN Data Package Tools.)
bytes calculated by CKAN?
Many data resources are published as complete snapshots of the data, e.g. at the end of a month, that month's data is appended to the end of the existing data.
To correct or add data to an existing data resource in CKAN:
tableschema.json file
If the CKAN Validation extension isn't installed, before you add or change data in a published data package you may want to validate the data using Data Curator. This will provide two files to be uploaded to CKAN:
README.md explaining any errors
To validate the data using Data Curator:
datapackage.zip from CKAN
datapackage.zip file in Data Curator
README.md
README.md to be uploaded to CKAN
If Data Curator could open a datapackage.json file that references the data and table schemas by URL, then the requirement to provide a datapackage.zip download could be deferred.
The instructions above would still be valid apart from an additional step if you decide to publish the data with errors. As the original README.md is not downloaded, it would need to be downloaded and its contents pasted into the provenance information before it could be updated explaining the errors.
Sometimes data is added to a dataset in increments, e.g. at the end of a year, that year's data is added as a new data resource alongside the other yearly data resources.
To add a new data resource to a published data package:
tableschema.json resource
There is no way to upload the datapackage.zip file and apply it to the existing CKAN dataset. You can either:
A major change to a data package is when you make changes that are incompatible with prior versions, e.g.
An example could be adding a reference table as a new dataset and creating a foreign key relationship between it and the existing data.
To publish a major change to a published data package:
datapackage.zip
datapackage.zip and make the changes
There is no way to upload the datapackage.zip file and apply it to the existing CKAN dataset. You can either:
To download a datapackage.json file:
To download a datapackage.zip:
Some new properties need to be included to support tabular data packages.
Create valid data package properties for use in create.py. In converter.py convert the following properties from a data package to a CKAN dataset.
profile mandatory for tabular data packages
licenses (#62)
contributors (#59) maps to author in CKAN
sources (#59) maps to maintainer in CKAN
See notes below for what metadata is currently lost when converting between CKAN and data packages
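The mapping listed above could look roughly like the sketch below. It is not the code in converter.py: the function name `to_ckan_dataset` is hypothetical, and the choices to take the first contributor with the author role, the first source, and the first license (stored under CKAN's `license_id` field) are assumptions made to show how array-valued data package properties collapse to single CKAN values.

```python
def to_ckan_dataset(descriptor):
    """Map a few data package properties onto a CKAN dataset dict.

    Hypothetical sketch of the mapping described in this document;
    it does not reproduce converter.py.
    """
    dataset = {"name": descriptor["name"]}

    # contributors -> author (assumption: first contributor with role "author").
    authors = [
        contributor
        for contributor in descriptor.get("contributors", [])
        if contributor.get("role") == "author"
    ]
    if authors:
        dataset["author"] = authors[0].get("title")

    # sources -> maintainer (assumption: first source wins).
    sources = descriptor.get("sources", [])
    if sources:
        dataset["maintainer"] = sources[0].get("title")

    # licenses -> a single CKAN license (assumption: first license wins,
    # so any further licenses are lost in the conversion).
    licenses = descriptor.get("licenses", [])
    if licenses:
        dataset["license_id"] = licenses[0].get("name")

    return dataset


# Illustrative input; all values are made up.
example = {
    "name": "example-package",
    "contributors": [
        {"title": "A. Editor", "role": "publisher"},
        {"title": "B. Author", "role": "author"},
    ],
    "sources": [{"title": "Example Agency"}],
    "licenses": [{"name": "CC-BY-4.0"}, {"name": "ODC-BY-1.0"}],
}
ckan_dataset = to_ckan_dataset(example)
```

The second license in the example is dropped, which illustrates the kind of metadata loss the notes refer to.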
Create valid data resource properties for use in create.py. In converter.py, convert the following properties from the data resources to CKAN resources.
schema mandatory for tabular data resources
profile mandatory for tabular data resources
dialect mandatory for tabular data resources, if it differs from specification defaults
encoding mandatory for tabular data resources, if it differs from specification default
See notes below for what metadata is currently lost when converting between data resources and CKAN
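The "only if it differs from the defaults" rule for dialect and encoding can be sketched as below. The function name `resource_overrides` is hypothetical. Only the two dialect defaults named in this document (delimiter and line terminator) are compared, using the CSV Dialect key `lineTerminator`; treating any other dialect key as non-default is a conservative assumption of this sketch.

```python
# Defaults taken from this document; other CSV Dialect defaults are not modelled.
DEFAULT_DIALECT = {"delimiter": ",", "lineTerminator": "\r\n"}
DEFAULT_ENCODING = "utf-8"


def resource_overrides(dialect, encoding):
    """Return only the dialect/encoding properties that need to be stored.

    Hypothetical helper: a property is kept only when it differs from
    the defaults listed above.
    """
    overrides = {}

    # Keys missing from DEFAULT_DIALECT count as differing (assumption),
    # so an unknown setting is never silently dropped.
    differs = any(
        DEFAULT_DIALECT.get(key) != value for key, value in dialect.items()
    )
    if differs:
        overrides["dialect"] = dict(dialect)

    # Compare encodings case-insensitively so "UTF-8" matches "utf-8".
    if encoding.lower() != DEFAULT_ENCODING:
        overrides["encoding"] = encoding

    return overrides


defaults_only = resource_overrides(
    {"delimiter": ",", "lineTerminator": "\r\n"}, "UTF-8"
)
semicolon_latin1 = resource_overrides({"delimiter": ";"}, "latin-1")
```

With this approach a resource that uses the defaults carries no extra properties, while a semicolon-delimited Latin-1 file keeps both.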
In create.py:
schema property for each data resource. This would be a tableschema.json file for a Tabular Data Resource (#61)
datapackage.json and support the CKAN Validation extension
dialect for each data resource.
encoding for each data resource.
See:
Convert the CKAN dataset to a data package using converter.py dataset_to_datapackage
Convert the CKAN resources to data resources using converter.py _convert_to_datapackage_resource
datapackage.json for download
Generate a minimal, valid datapackage.json for download
profile to the data package
schema
"schema": "URL" to point to the schema in CKAN (#49) (noting this discussion), or
profile, dialect and encoding
README.md won't be included
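A minimal datapackage.json of the kind described above, with each schema referenced by URL rather than embedded, might be assembled like this. The function name `minimal_descriptor`, the input keys `data_url` and `schema_url`, and the ckan.example.org URLs are hypothetical placeholders; how CKAN would actually expose the stored schema is not settled in this document.

```python
import json


def minimal_descriptor(name, resources):
    """Build a minimal tabular data package descriptor.

    Hypothetical sketch: each resource points at its data and its
    Table Schema by URL, and no README.md is included.
    """
    return {
        "profile": "tabular-data-package",
        "name": name,
        "resources": [
            {
                "profile": "tabular-data-resource",
                "name": resource["name"],
                "path": resource["data_url"],
                # "schema": "URL" form: a reference instead of an inline schema.
                "schema": resource["schema_url"],
            }
            for resource in resources
        ],
    }


# Placeholder URLs; a real CKAN instance would supply its own.
package = minimal_descriptor(
    "example-package",
    [
        {
            "name": "example",
            "data_url": "https://ckan.example.org/resource/example.csv",
            "schema_url": "https://ckan.example.org/resource/tableschema.json",
        }
    ],
)
print(json.dumps(package, indent=2))
```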
README.md
Store README.md (#60) using create.py
datapackage.zip for download
Generate a full datapackage.zip for download (#52).
This should match the datapackage.zip used to upload the data package to CKAN (less any properties not yet implemented e.g. image).
Store data resources in the CKAN Data Store (#44)
The following properties are converted by CKAN Data Package Tools and the CKAN Data Packager extension (ignoring the issues mentioned above)
name
title
description
version
licenses (CKAN has a single value for a license but a data package supports an array of licenses)
sources
contributor (author role)
keywords
Other properties in the data package are converted to CKAN "extras" properties
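As an illustration of the "extras" conversion, the sketch below collects every property that is not in the directly converted list into CKAN's list-of-key/value extras format. The function name `to_extras`, the JSON encoding of non-string values, and the example properties `update_frequency` and `spatial` are assumptions of this example, not behaviour confirmed for CKAN Data Package Tools.

```python
import json

# Properties this document lists as directly converted, plus resources,
# which are handled separately.
CONVERTED = {
    "name", "title", "description", "version",
    "licenses", "sources", "contributors", "keywords", "resources",
}


def to_extras(descriptor):
    """Return remaining descriptor properties as CKAN-style extras.

    Hypothetical sketch: CKAN extras are a list of {"key", "value"} dicts;
    non-string values are JSON-encoded here (assumption).
    """
    extras = []
    for key, value in sorted(descriptor.items()):
        if key in CONVERTED:
            continue
        if not isinstance(value, str):
            value = json.dumps(value)
        extras.append({"key": key, "value": value})
    return extras


# Made-up custom properties for illustration only.
extras = to_extras(
    {
        "name": "example-package",
        "title": "Example",
        "update_frequency": "monthly",
        "spatial": {"country": "AU"},
    }
)
```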
Properties in the specification that are not directly converted:
profile (e.g. "tabular-data-package")
id
homepage
image
created
In the CKAN Data Package extension name is limited to 2-100 characters. Consider adding this validation to Data Curator (planned).
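The length rule above is simple to check. The sketch below covers only the 2-100 character limit stated here; the function name is hypothetical, and any other naming rules CKAN may enforce are deliberately left out.

```python
def has_valid_name_length(name):
    """Check the 2-100 character limit on a package name.

    Hypothetical helper covering only the length rule mentioned in
    this document, not any other naming constraints.
    """
    return 2 <= len(name) <= 100


too_short = has_valid_name_length("a")
acceptable = has_valid_name_length("example-package")
too_long = has_valid_name_length("x" * 101)
```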
The following properties are converted by CKAN Data Package Tools and the CKAN Data Packager extension (ignoring the issues mentioned above)
name
title
description
homepage
version
licenses
sources
contributor (author role)
keywords
Other properties in CKAN are parsed into "extras" properties
Properties in the specification that are not directly converted:
id
profile
image
created
The following properties are converted by CKAN Data Package Tools and the CKAN Data Packager extension (ignoring the issues mentioned above)
path or data
name
title
description
format (e.g. "csv")
hash
Properties in the specification that are not directly converted:
profile (e.g. "tabular-data-resource")
schema (Table Schema for a Tabular Data Resource or another schema for other data resource types)
dialect (CSV Dialect for a Tabular Data Resource. Defaults: "lineTerminator": "\r\n", "delimiter": ",")
encoding (e.g. default "UTF-8")
mediatype (e.g. "text/csv")
bytes
sources
licenses (CKAN doesn't store licenses at the resource level, they inherit from the dataset)
Would it help if a CKAN Schema was defined to support all data package metadata?
The following properties are converted by CKAN Data Package Tools and the CKAN Data Packager extension (ignoring the issues mentioned above)
name
path
title
description
format (e.g. "csv")
hash
Properties in the specification that are not directly converted:
profile (e.g. "tabular-data-resource")
mediatype (e.g. "text/csv")
encoding (e.g. "UTF-8")
bytes
sources
licenses (CKAN doesn't store licenses at the resource level, they inherit from the dataset)
schema (Table Schema for a Tabular Data Resource or another schema for other data resource types)
dialect (CSV Dialect for a Tabular Data Resource)