# VAIA/UGain Linked Data and Solid: day 1

An introduction to the world of Linked Data: fostering semantic interoperability

## Agenda

Your hosts for today:

- [Pieter Colpaert](https://pietercolpaert.be/#me)
- [Pieter Heyvaert](https://pieterheyvaert.com/#me)

| Time | Title | By |
| -------- | -------- | -------- |
| 17:30 | Welcoming everyone at Technologiepark and networking over sandwich dinner | |
| 18:30 | First theory class: an introduction to Linked Data (recorded and livestreamed) | Pieter Colpaert |
| 19:30 | First exercises: creating your first bit of Linked Data | Pieter Heyvaert |
| 20:30 | Networking drink (not recorded) | |

## Exercises

See Ufora: open the ZIP file

Description:

## Competences

Check out the recording of the lecture. You must be able to:

* Explain the difference between N-Triples, Turtle, TriG and N-Quads
* Explain RDF triples, named nodes and blank nodes
* Read and write Turtle notation
* Read and write JSON-LD (knowing the functionality from the slides is sufficient)
* Understand what triples are created when given an RDFa example
* Use curl

Slides: _TODO:Link_

Different people have different ways of learning. The text below also contains everything you need to know after today.

## Further reading

### URI dereferencing and disambiguation

The page identified by, or located at, <https://stad.gent/nl/mobiliteit-openbare-werken/parkeren/parkings-gent/parking-sint-pietersplein> is not the same as the thing identified by <https://stad.gent/id/parking/P10>. The latter identifier points at the parking facility, not at a page about this parking lot. Nonetheless, if you dereference the parking lot’s identifier, you will be redirected to a page about it. In RDF documents, you can now make statements about both separately:

    parking:P10 foaf:page <https://…sint-pietersplein> .
    <https://…sint-pietersplein> foaf:primaryTopic parking:P10 .
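
To make the prefixed names in these two statements concrete, here is a minimal Python sketch that expands them into full IRIs and prints the result as N-Triples. The `foaf:` namespace is the standard FOAF one. The expansion of `parking:` is an assumption inferred from the identifier mentioned above, the page URL is the full one from the prose rather than the abbreviated form, and the helper names `expand` and `to_ntriples` are hypothetical.

```python
# Prefix map. The "parking" entry is an assumption inferred from
# <https://stad.gent/id/parking/P10>; "foaf" is the standard FOAF namespace.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "parking": "https://stad.gent/id/parking/",
}

# The page URL as written out in full in the text above.
PAGE = (
    "https://stad.gent/nl/mobiliteit-openbare-werken/parkeren/"
    "parkings-gent/parking-sint-pietersplein"
)


def expand(term: str) -> str:
    """Expand a prefixed name such as 'foaf:page' into a full IRI."""
    if term.startswith(("http://", "https://")):
        return term  # already a full IRI
    prefix, local = term.split(":", 1)
    return PREFIXES[prefix] + local


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) IRI triples as N-Triples lines."""
    return "\n".join(
        f"<{expand(s)}> <{expand(p)}> <{expand(o)}> ." for s, p, o in triples
    )


triples = [
    ("parking:P10", "foaf:page", PAGE),          # the thing has a page
    (PAGE, "foaf:primaryTopic", "parking:P10"),  # the page is about the thing
]

print(to_ntriples(triples))
```

The output has one full statement per line with no prefixes, which is the main visible difference between N-Triples and the Turtle notation used above.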

Web trivia: this is a long-standing conundrum in Web engineering referred to as HTTPRange-14: https://en.wikipedia.org/wiki/HTTPRange-14

Two common solutions are used to make sure a real-world identifier can be disambiguated: an HTTP 303 redirect as in the example above, or hash-identifiers.

A 303 See Other redirect indicates that this is not a page you can GET, but that there is another document somewhere else you can consult to get a representation of the thing this URI is identifying.

```bash
$ curl -I https://stad.gent/id/parking/P10
HTTP/2 303
server: nginx
location: https://mobiliteit.stad.gent/p10-sint-pietersplein
```

Based on the Web Scraping chapter, you should now be able to explain why a 303 is used, and not a 301 for example.

A potential disadvantage for developer experience is that, in a browser, a web developer will not always notice a redirection happening, and may wrongly assume that the URL currently shown in the browser is the URI of the real-world object. This is a common mistake developers make when using Wikidata, where concept URIs and page URLs differ only slightly: https://www.wikidata.org/wiki/Q800814 vs. http://www.wikidata.org/entity/Q800814 for example.

Hash-identifiers are used to identify something that is described in a page. The client knows what identifier in the page to look at. As explained in the URL section of the Web Scraping chapter, the server does not see the # and what comes after it, so no redirection needs to be done at all. Good examples of this:

- https://pietercolpaert.be/#me: identifying someone on a personal website
- http://www.w3.org/1999/02/22-rdf-syntax-ns#type: publishing a vocabulary in a simple RDF file with multiple terms

Finally, it is not uncommon that no disambiguation is done at all and the HTTPRange-14 issue is ignored. Then one identifier is used for both the document and the real-world object. From the context of the triple you could then try to infer what it is about.
For example, if you state that `parking:P10` takes 5 minutes to read, you can infer that only a page can be read, so the statement is about the page rather than the actual parking facility.

### Varying content types

URI dereferencing can be executed by various agents: by a browser like Firefox, or by an HTTP library in a programming language. Content negotiation is often provided on URIs in order to make sure different types of user agents get the Content-Type that is best suited for their use case.

#### Option 1: Content negotiation

Firefox will by default send an Accept header that looks something like:

    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8

However, a Linked Data user agent may send an Accept header that looks more like this:

    Accept: application/ld+json;q=1,application/n-quads;q=1,application/n-triples;q=1,application/rdf+xml;q=1,application/trig;q=1,text/turtle;q=1,text/n3;q=1,text/html;q=0.95

A server also keeps a similar priority list of the content types it supports when handling a GET request to a URL. It multiplies the q-values of matching content types with each other, and takes the content type with the highest resulting value. The Content-Type response header then indicates the media type that was selected. A user agent cannot rely on the Accept header being consistently honored, since the origin server might not implement content negotiation for the requested resource, or might decide that sending a response that doesn’t conform to the user agent’s preferences is better than sending a 406 Not Acceptable response.

When a server does content negotiation, do not forget to also set the Vary header (see the section on caching).

#### Option 2: embedding RDF in an HTML page

Another option to make URI dereferencing work for both humans and Linked Data clients is to use the `<script>` tag inside your HTML page to include a Linked Data snippet, or to include RDFa annotations.
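
As a rough illustration of the `<script>` approach, the following Python sketch pulls a JSON-LD snippet out of an HTML page using only the standard library. It assumes the snippet is embedded as `<script type="application/ld+json">`, which is the usual convention for JSON-LD. The HTML document, the `https://example.org/p10.html` page URL and the class name `JsonLdExtractor` are hypothetical and only serve the example.

```python
import json
from html.parser import HTMLParser

# Hypothetical HTML page: readable by humans, with a JSON-LD snippet for machines.
HTML = """<!DOCTYPE html>
<html>
  <head>
    <title>Example parking page</title>
    <script type="application/ld+json">
    {
      "@context": {"foaf": "http://xmlns.com/foaf/0.1/"},
      "@id": "https://stad.gent/id/parking/P10",
      "foaf:page": {"@id": "https://example.org/p10.html"}
    }
    </script>
  </head>
  <body><p>Human-readable text about the parking facility.</p></body>
</html>"""


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


extractor = JsonLdExtractor()
extractor.feed(HTML)

for doc in extractor.documents:
    print(doc["@id"])
```

This only recovers the JSON-LD document itself. Turning it into triples still requires a JSON-LD processor, and RDFa annotations need a separate RDFa parser.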

You can test getting the RDF triples from various URIs yourself using https://rdf-play.rubensworks.net/. It also comes with a server proxy you can configure in case CORS is not properly configured on the server.

## Other learning resources that may help you

Telling a similar story in a slightly different way:

* An introduction to Linked Data in a video: https://vimeo.com/401026338
* The course by prof. Harald Sack: https://www.youtube.com/playlist?list=PLoOmvuyo5UAcBXlhTti7kzetSsi1PpJGR
* The “Semantic Web” from Web Fundamentals by prof. Ruben Verborgh: https://rubenverborgh.github.io/WebFundamentals/semantic-web/
* The FAIR principles: https://www.go-fair.org/fair-principles/