---
title: Session E - Table 1
description: Notes from Table 1, Session E
date: 2018-12-13
author: Susan Brown, Valeria Vitale, Florian Thiery, Andreas Wagner
license: cc-by-sa 4.0
tags: LinkedPasts IV
---
Session E - Table 1
===
# Title: Tools and Workflows #LinkedPipes
## Challenges
* versioning -- if you have only RDF
* graph partitioning (named graphs, VoID, reification etc)
* Aligning ontologies and vocabularies. What are the best tools? For example [Protégé](https://protege.stanford.edu/) and [OntoME](http://ontome.dataforhistory.org/) (by the [Data for History consortium](http://dataforhistory.org/)). Historical vs non-historical, event-based vs resource-based, etc. Interoperability of different ontologies: what inconsistencies may you introduce when merging ontologies and data, and how do you find the right ontology(ies) for aligning existing datasets?
* How do you decide what is the best ontology for your project? For example when trying to choose the best ontology to describe time
* Combining archaeological data from different excavations: provenance, idiosyncratic data, different levels of granularity
* Building a community of users, also on the technical side. For example with the CWRC-Writer
* How to express different levels of academic interpretation around objects, for example around 3D objects
* How people can embed LOD in the research process (esp. Recogito); are there workflows going from Recogito to EpiDoc, GIS, etc.?
* Tool replicability; often there are things that are quite close but not quite right for purpose so we build something new; or building database AND all the tools on top of it rather than just working with the data
* Need intuitive tools for teaching people who cannot code; workflows that use open software (e.g. FromThePage to Voyant to Recogito)
* Not promoting the idea of a closed virtual work environment (one single tool to do everything); instead, have a tool inventory and keep the tools pipeline-able, i.e. modular
* pipelines are long and have many segments, and problems replicate at different stages; as an aggregator, WHG allows contributors to enhance their data (e.g. reconciliation); how to manage an update process?
* In the context of aggregation, provide tools for reconciliation and enrichment of data
* impact of modelling decisions on future work, but it is challenging to make time for this work/consulting
* API design (for exposing text, or whatever is specific about the kind of data at hand)
* Sometimes the models and standards to enable the connections are not there yet, so it is up to the practitioners to find their way around it, possibly connecting different tools
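One common answer to the versioning and graph-partitioning challenges above is to keep triples apart in named graphs. As a toy illustration, the following stdlib-only Python sketch groups quads by graph name; the subjects, labels, and graph IRIs are invented for illustration, and a real workflow would use an RDF library such as rdflib.

```python
from collections import defaultdict

# Quads: (subject, predicate, object, named_graph) -- illustrative data only.
quads = [
    ("ex:place1", "rdfs:label", "Alexandria",  "ex:graph2018"),
    ("ex:place1", "rdfs:label", "Alexandreia", "ex:graph2019"),
    ("ex:place2", "rdfs:label", "Miletus",     "ex:graph2018"),
]

def partition_by_graph(quads):
    """Group triples by the named graph they belong to -- one simple way
    to keep dataset versions (or provenance partitions) apart when all
    you have is RDF."""
    graphs = defaultdict(list)
    for s, p, o, g in quads:
        graphs[g].append((s, p, o))
    return dict(graphs)

parts = partition_by_graph(quads)
# The 2018 "version" and the 2019 "version" can now be compared or diffed.
```

The same idea underlies SPARQL's `GRAPH` keyword: each version lives in its own graph, and queries can target one partition or span all of them.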
### Clusters of challenges
* Ontologies:
    * building (deciding which to use; where to extend; whether to import whole ontologies or just cherry-pick parts)
    * mapping/aligning (keeping free from inconsistencies)
    * inferencing (moving from RDFS to OWL 2)
    * tools for validation and quality control (e.g. Protégé)
    * how to support complex queries, e.g. building up from snippets of SPARQL
* Data complexity
* complex provenance information that needs to stay with object
* choosing ontologies to bring together heterogeneous legacy databases
* partitioning
* versioning, persistence, long-term preservation, also relevant to moving objects from one tool environment to another
* replicability of chain of actions performed on an object
* Tools/flows
* input/output data/serialization formats (json-ld vs rdf+xml) and their conversions; limitations in tools (loss due to transformations performed by tools)
* pipelines for building LOD into researcher workflow
* overlaps/complementarity
    * which fields do the tools absolutely need for interoperability and documentation (e.g. which bits of provenance information, serialization format, ontologies used, location of the respective documentation), etc.
* roundtripping/snakepit of data enhancement
* how and where to track provenance, when and how to refer to the original
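As a toy illustration of the serialization-conversion problem above (JSON-LD vs RDF/XML, and loss introduced by tools), the following stdlib-only Python sketch converts a deliberately minimal RDF/XML fragment into JSON-LD-style node objects. The fragment is invented; anything beyond the simplest shape (nested nodes, datatypes, language tags) is silently dropped, which is exactly the kind of transformation loss noted above. A real pipeline would use a full parser such as rdflib or the X3ML toolkit.

```python
import json
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

# A tiny RDF/XML fragment (hypothetical example data).
rdf_xml = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                      xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://example.org/place/1">
    <dcterms:title>Alexandria</dcterms:title>
  </rdf:Description>
</rdf:RDF>"""

def rdfxml_to_jsonld(source: str) -> list:
    """Convert the simplest RDF/XML shape (flat rdf:Description elements
    with literal children) to JSON-LD node objects. Nested resources,
    datatypes, and language tags are lost."""
    root = ET.fromstring(source)
    out = []
    for desc in root.findall(f"{RDF}Description"):
        node = {"@id": desc.get(f"{RDF}about")}
        for child in desc:
            # child.tag is "{namespace}local"; re-join it as a full IRI.
            ns, local = child.tag[1:].split("}", 1)
            node[ns + local] = child.text
        out.append(node)
    return out

print(json.dumps(rdfxml_to_jsonld(rdf_xml), indent=2))
```

Even this tiny example shows why pipelines need to document what each conversion step preserves and what it throws away.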
## Strategies
### Overall Goal
**Strategy on "Tools and Flows" aka Linked Pipes**
**-> build up a LinkedPipes working group**
### About Linked Pipes
* produce a feature matrix specifically related to LOD and promoting pipelines
* fields: name, link to source code; date of entry; entry-level tool?; consumes LOD?; produces LOD?; input formats (a section with individual columns: xml, xml+tei, xml+rdf, csv, json-ld (iiif, ...), djvu+xml, geoJSON, plaintext, html, ttl, n3, graphml, jpg, audio, video, shapefiles, lpif, sql, sparql, shacl); output formats;
* Encourage projects to enter in software registry ([TaPoR](http://tapor.ca/home) or teresah.dariah.eu) and link to that general profile; also advise providing link to tool use case and flows that include that tool; provide link from the main page to other pages (e.g. wiki with list of tools). Focus this on working towards pipelines: Categories like consumption/production/bridging; Data formats consumed/produced;
* More general information about the tools that we suggest should be entered at those other places: license; howtos; also free comments for things to point out about tools individually; project contacts (project director); technical contacts (programmers); institution(s)
* later (?): categorize by type e.g. production, visualization; also bridging tools
* also produce links to the pipelines, wherever they are: another page on github, a programming notebook, blog post, or whatever.
### List of participants in the Linked Pipes working group
(just those whose github ids are not listed below)
|name|
|---|
|Guenther Goerz|
#### github ids etc.
| name | github-id | twitter-id | short |
|------|-----------|------------|-------|
| Ben Brumfield | @benwbrum | @benwbrum | |
| Gimena del Rio | @Gimena | @gimenadelr | |
| Andreas Wagner | @awagner-mainz | @anwagnerdreas | |
| Frank Grieshaber | @wenamun | @wenamun | |
| Florian Thiery | @florianthiery | @fthierygeo | FT |
| Rainer Simon | @rsimon | @aboutgeo | |
| Valeria Vitale | @valeriavitale | @nottinauta | VV |
| Susan Brown | @susanbrown | @susanirenebrown | |
#### Linked Pipes WG
* "Project Manager": Florian
* "Committee": Susan, Valeria, Rainer, Florian, Ben, Andreas, Frank, Gimena, Guenther
### Notes, tools and other links to (maybe) integrate later into the inventory
* In terms of pipelines: Notebooks (Jupyter, R)
* [curl - command line tool and library for transferring data with URLs](https://curl.haxx.se/)
* [xTriples - Web services for extracting rdf from xml](http://xtriples.spatialhumanities.de/index.html)
* [X3ML Toolkit - Extracting rdf from other data formats](https://www.ics.forth.gr/isl/index_main.php?l=e&c=721)
* [jq - a commandline json processor](https://stedolan.github.io/jq/) ([lesson in Programming Historian](https://programminghistorian.org/en/lessons/json-and-jq))
* [SAMOD: an agile methodology for the development of ontologies](http://essepuntato.github.io/samod/)
* [Labeling System - web app for creating and publishing terms with contextual validity as LOD](https://github.com/search?q=topic%3Alabelingsystem+org%3Amainzed+fork%3Atrue)
* [Academic Meta Tool - webapp for modelling vagueness in graphs including reasoning](http://academic-meta-tool.xyz/)
* [Alligator - web app transforming a correspondence analyses to a relative chronology as RDF](https://rgzm.github.io/alligator/)
* [WissKI Virtual Research Environment](http://wiss-ki.eu/) (see also the [Drupal project page](https://www.drupal.org/project/wisski))
* [ResearchSpace environment](https://github.com/researchspace/researchspace)
* [Pipeline RDF+XML to JSON-LD](https://hbz.github.io/swib18-workshop/#/35)
* [Protégé](https://protege.stanford.edu/)
* [OntoME](http://ontome.dataforhistory.org/)
## Commitments
* With regard to *ontologies*, we will try to resume the discussion in ADHO's LOD SIG, and see whether the [Data for History consortium](http://dataforhistory.org/) would be a good community to approach. Andreas will do both (others are very welcome to chime in).
* We nominate Karl :-) to create a [Linked Pipes Repo](https://github.com/LinkedPasts/LinkedPipes) for us in the [Linked Pasts github organization](https://github.com/LinkedPasts) and call it Linked Pipes (short: Linked||). It should have a searchable page with the list of tools, which we will commit individually to documenting; Florian will set it up. *NOTE (FT):* would recommend not an md structure; we should use JSON templates (as single documents, contributed via pull-request files) in order to use nice frameworks like [filter.js](http://jiren.github.io/filter.js/index.html).
* *NOTE (FT): A domain is registered [http://linkedpipes.xyz](http://linkedpipes.xyz/) which will contain the filter.js framework.*
* first logo proposal by FT; VV will do the digital version
![](https://i.imgur.com/GKmooGG.jpg)
*CC BY 4.0 Linked Pipes WG*
![](https://i.imgur.com/EYpgwNp.png)
*CC BY 4.0 Linked Pipes WG*
* Google group for communication inside the Linked Pipes Working Group (Ben will create it, Valeria will collect the email addresses of the people in the group)
* Florian will work with Karl to set up the pages and be project manager of Linked||;
* The following folks will be admins on the repo and collaborate to approve new contributors:
* Will document tool(s):
* Susan
* Valeria and Gimena (Recogito)
* Andreas
* Loïc
* Florian
* Frank
* Guenther (Pointers to WissKi doc)
* Will document at least one workflow:
* Ben
* Valeria and Gimena (Recogito-related workflows)
* Andreas
* Florian (Alligator to AMT)
* Will prepare some kind of summary (blog post/white paper) for reporting at the next Linked Pasts meeting:
* Susan will lead
* Florian will help ;-)
* will be *technical admin(s)* of the LinkedPipes Repo
* Florian
* will be *content admins* of the LinkedPipes Repo
* Susan
* Valeria
* Gimena
* Florian
* Andreas
* Ben
* Rainer
* Frank
* The basic structure of the template:
![](https://i.imgur.com/PwevL2s.jpg) *CC BY 4.0 Linked Pipes WG*
```json
{
  "name": "",
  "links": [],
  "dateOfEntry": "",
  "entryLevel": "{beginner:yes/no}",
  "consumesLOD": "true/false",
  "producesLOD": "true/false",
  "inputFormats": ["JPG", "TIFF", "PNG", "N3", "RDF/XML", "XML-TEI", "CSV", "JSON-LD", "GEOJSON", "IIIF-JSON", "PLAIN-TEXT", "HTML", "TTL", "SHP", "X3D", "any 3D format", "SQL", "SPARQL", "SHACL", "CYPHER", "audio/video"],
  "outputFormats": ["JPG", "TIFF", "PNG", "N3", "RDF/XML", "XML-TEI", "CSV", "JSON-LD", "GEOJSON", "IIIF-JSON", "PLAIN-TEXT", "HTML", "TTL", "SHP", "X3D", "any 3D format", "SQL", "SPARQL", "SHACL", "CYPHER", "audio/video"]
}
```
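A filled-in entry might look like the following Python sketch, together with a minimal completeness check. The Recogito values here are illustrative placeholders, not an agreed record, and `validate_entry` is a hypothetical helper, not part of any agreed tooling.

```python
import json

# Keys required by the template above.
REQUIRED_KEYS = {"name", "links", "dateOfEntry", "entryLevel",
                 "consumesLOD", "producesLOD", "inputFormats", "outputFormats"}

def validate_entry(entry: dict) -> list:
    """Return the sorted list of template keys missing from an entry."""
    return sorted(REQUIRED_KEYS - entry.keys())

# Illustrative entry (values are placeholders, not an official record).
example = json.loads("""{
  "name": "Recogito",
  "links": ["https://recogito.pelagios.org/"],
  "dateOfEntry": "2018-12-13",
  "entryLevel": "beginner:yes",
  "consumesLOD": true,
  "producesLOD": true,
  "inputFormats": ["PLAIN-TEXT", "XML-TEI", "CSV"],
  "outputFormats": ["JSON-LD", "XML-TEI", "CSV", "GEOJSON"]
}""")

print("missing keys:", validate_entry(example))
```

Keeping each entry as a single JSON document like this is what makes the pull-request workflow and filter.js-style browsing mentioned in the commitments straightforward.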