
# ECB data model

**NEXT MEETING: Tuesday, April 19, 13-14 CET**
Location: https://meet.jit.si/AMARCEurope

### Agenda April 19

Talk about prototyping strategies and tools

*Feel free to add items, we'll discuss the agenda at the beginning of the meeting (~Franz)*

### Agenda March 22

#### 1) Discuss media data model

* Compare again with the most relevant other data models:
    * [schema.org](https://schema.org)
    * [ActivityPub](https://www.w3.org/TR/activitypub/)
    * RSS plus [podcast namespace](https://github.com/Podcastindex-org/podcast-namespace)
    * ARDcore
* Modeling of playlists (of different kinds like episodes, channels, user-created playlists, best-ofs, ..)
* Review naming maybe (if time permits or prepare for next meeting):
    * discuss shared data model for claims on authorship and licensing
    * (prepare) discussion on lower data model for replication (hashes and addresses, encoding and signatures, change tracking and conflict resolution, migration and versioning)

#### 2) Discuss prototyping environments

* Use a "Quick path to API" framework like [Keystone](https://keystonejs.com/), [Directus](https://directus.io)
* RDF based tools, e.g. https://atomicdata.dev/
* Does anyone know other nice interactive data modeling tools?

#### 3) Communication and opening for contributions

* Preparing to broaden the discussion and invite participation?
* Set up mailing list and/or matrix room

#### 4) Next steps

* Define aims until hackathon
* Define and delegate action items
* Find date and facilitator for next meeting

---

> https://app.conceptboard.com/board/8qua-amof-disd-8gti-6hgi

Basic Idea: Digest important aspects of a common data model for a replication network for community media. Initial collection of things to look at follows. For the session we'd collaborate in this doc and/or a collaborative modeling tool.
If you have ideas please share here. The best interactive tool for data modeling that I found so far seems to be LucidChart; there everyone would need a (free) account for shared editing.

## Audio/Video/Media metadata standards

### Specs & standards

* [RSS](https://validator.w3.org/feed/docs/rss2.html)
* [Atom](https://datatracker.ietf.org/doc/html/rfc4287)
* [RSS Podcast Namespace extension](https://github.com/Podcastindex-org/podcast-namespace)
* [RSS Media Namespace extension](https://www.rssboard.org/media-rss)

#### Public broadcasters & GLAM

* [EBUcore](https://tech.ebu.ch/MetadataEbuCore) *RDF*
* [Europeana Data Model](https://pro.europeana.eu/page/edm-documentation) *RDF*
* [BIBFRAME](https://www.loc.gov/bibframe/docs/bibframe2-model.html)
    - ContentItem ~ Work
    - Instance ~ MediaAsset
    - Item ~ File

## Implementations

#### Community media

* [Lohrothek](https://git.hack-hro.de/lohro/lohrothek/lohrothek-api) *REST/JSON*
* [AURA](https://gitlab.servus.at/aura/) *REST/JSON*
* freie-radios.net *RSS/custom*
* CBA *RSS/wordpress REST/JSON*
* [XRCB.cat](https://xrcb.cat/en/) *RSS/wordpress REST/JSON*
* media.ccc.de *RSS/REST/GraphQL*
* Castopod *RSS/REST/GraphQL*

Links to material

- BfR: (first meeting on the API model) https://cloud.freie-radios.de/s/agikozqXWMLL836 Password: BFR_API-2020
- Freie Radio / BFR API https://git.hack-hro.de/lohro/bfr-api
- Presentation for Freie Radio API (BfR API) https://digital.danubestreamwaves.org/wp-content/uploads/2020/11/bfr-api.pdf https://digital.danubestreamwaves.org/2020/11/freie-radio-api/

## Exchange Protocols

* RSS, Atom
* ActivityPub
* RDF dumps

## Replication

*May be deferred to a later session*

* Custom protocol vs existing protocols
* "True P2P" vs authoritative source-of-truth

# Data modeling

The following is a draft of a data model for the exchange of community media. Currently, this is a very rough first draft. Please add comments and/or edit if you see things that are missing or that you would model differently.
### Data types

```mermaid
classDiagram
    class License {
        name
    }
    class File {
        contentURL
        mimetype
        size
        hash
        duration
        codec
        bitrate
        resolution
        additionalMetadata
    }
    class MediaAsset {
        title
        description
        mediaType[audio,video,image,document]
        duration
        imageID
        conceptID[]
        contributorID[]
    }
    class Chapter {
        startTime
        endTime
        title
        type[music,speech]
        meta
        concepts
    }
    class Transcript {
        text
        language
        engine
    }
    class Collection {
        type[podcast,event]
        title
        subtitle
        summary
        description
        imageID
        variant[EPISODIC|SERIAL]
        broadcastSchedule[channel, rrule]
        contributorIDs
        rssFeedURL
        creationDate
        terminationDate
    }
    class BroadcastEvent {
        startTime
        endTime
        channel
    }
    class BroadcastChannel {
        name
        publisher
    }
    class PublicationChannel {
        type(FM, Web)
        address
    }
    class ContentItem {
        title
        subtitle
        summary
        fullText
        concepts
        showID
        groupingID
        groupingDelta
        contributorID[]
        mediaAssetID[]
        relatedContentItemID[]
    }
    class Grouping {
        title
        showID
    }
    class Actor {
        name
        type[person,group,organization]
        contactInformation
        logo/avatar
    }
    class Image {
        title
        alt
        fileIDs
    }
    class Contribution {
        contributedTo
        role
        actor
    }
    Contribution --> Actor
    Image --> File: 1
    ContentItem <--> Collection: n..1
    ContentItem <--> Grouping: n..1
    ContentItem --> MediaAsset: n..n
    Grouping --> Collection: n..1
    BroadcastEvent <--> BroadcastChannel
    BroadcastChannel <--> PublicationChannel
    PublicationChannel --> Actor
    MediaAsset <--> BroadcastEvent
    MediaAsset <--> File
    MediaAsset <--> Transcript
    MediaAsset <--> Chapter
    MediaAsset --> Actor : role
```

`License` should be on `MediaAsset`, `ContentItem`, `Show`, `PublicationChannel`

`Image` should be on `MediaAsset`, `ContentItem`, `Show`, `Chapter`, `Grouping`

`Contribution` should be on `MediaAsset`, `ContentItem`, `Collection`

### Feedback

AndiH:

* use `name` instead of `headline`, cf. https://github.com/schemaorg/schemaorg/issues/373
* use `title`?

Roland Alton:

* use `collection` instead of `series`, because it is more generic?
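As a rough illustration of how the `ContentItem --> MediaAsset` and `MediaAsset <--> File` relations from the class diagram above could be followed in code, here is a minimal Python sketch. It covers only a few of the drafted fields; the `id` fields, the `file_ids` list on `MediaAsset`, the bitrate unit, the helper `files_for` and the example URLs are assumptions made for illustration, not part of the draft model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class File:
    id: str
    content_url: str
    mimetype: str
    bitrate: Optional[int] = None  # unit (kbit/s) is an assumption


@dataclass
class MediaAsset:
    id: str
    title: str
    media_type: str  # one of: audio, video, image, document
    # Assumed field: the draft only states a MediaAsset <--> File relation.
    file_ids: List[str] = field(default_factory=list)


@dataclass
class ContentItem:
    id: str
    title: str
    media_asset_ids: List[str] = field(default_factory=list)


def files_for(item: ContentItem,
              assets: Dict[str, MediaAsset],
              files: Dict[str, File]) -> List[File]:
    """Follow ContentItem -> MediaAsset -> File and collect the File records."""
    return [files[file_id]
            for asset_id in item.media_asset_ids
            for file_id in assets[asset_id].file_ids]


# Hypothetical example data: one broadcast available in two encodings.
files = {
    "f1": File("f1", "https://example.org/show.mp3", "audio/mpeg", 128),
    "f2": File("f2", "https://example.org/show.ogg", "audio/ogg", 96),
}
assets = {"m1": MediaAsset("m1", "Full broadcast", "audio", ["f1", "f2"])}
item = ContentItem("c1", "Rosa Luxemburg and the German Revolution of 1919", ["m1"])

mimetypes = [f.mimetype for f in files_for(item, assets, files)]
print(mimetypes)
```

Referencing records by ID rather than nesting them mirrors the `mediaAssetID[]` style of the draft and keeps each record independently exchangeable between nodes.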
## modeling categories / tags / keywords

- creators/authors/originator
- publishers/distributor
- licenseHolder
- contributor/interviewee/.. ["role"]

all link to "actor" or "concept"

---

ContentItem --> Concepts

Concept

- uuid
- origin: (imported / manually assigned / automatically derived by NLP tools)
- relatedTo --> Concept
- sameAs --> Concept
- parentOf --> Concept
- children --> Concepts
- label
- conceptType
- wikiDataId?

ConceptType: Person, Organization, Location, Subject, Tag

- Concept
    - label: Australia
    - type: Location
- Concept
    - label: Sydney
    - type: Location
    - isChildOf: Australia
- Concept
    - label: Communism
    - type: Subject
- Concept
    - label: Rosa Luxemburg
    - type: Person
    - relatedTo: Communism
- Concept
    - namespace: cba
    - label: Technology
    - isChildOf: cba:Science
- Concept
    - namespace: frn
    - label: "Technik & Wissenschaft"
    - isSameAs: cba:Technology
- ContentItem
    - headline: "Rosa Luxemburg and the German Revolution of 1919"
    - concepts: uuids[reference link]

## class description

The data model should be able to represent serial and non-serial audiovisual content (audio, video, images, documents). It should enable metadata exchange between ECB nodes. In addition, it should be usable for classic applications such as display on podcast and radio websites (e.g. a programme calendar, "What aired when?"). The origin of each piece of content has to be taken into account: how does a node store where a ContentItem came from?

### ContentItem

- -> https://schema.org/CreativeWork
- can stand on its own
- corresponds to a broadcast, feature, work, publication
- good entry point for content search

### MediaAsset

- Media items with their concrete files (audio, video, image, document)
- Example: the 1st MediaAsset is the broadcast, the 2nd MediaAsset is the thumbnail/teaser image
- Example: a segment reports on a talk; in addition there is the complete recording (noted as "90", presumably minutes) that was not aired in that form
- Discussion: Should individual segments of a broadcast be MediaAssets, or should they be modeled as ContentItems?
Both are possible; this is a borderline case. For podcasts, a MediaAsset probably belongs to exactly one ContentItem; for radio broadcasts, several MediaAssets can belong to one ContentItem.

### Transcript

- optional
- The audio track of the MediaAsset as text

### File

- different file versions of the MediaAsset, e.g. different formats and bitrates

### Chapter

- optional
- Chapters within a MediaAsset, e.g. marking content blocks, speech and music, and the like

### Actor

- optional
- Persons/organizations involved, by role (author, originator, interviewer, interviewee, publisher, license holder, etc.)

### License

- optional (?)
- License information for MediaAssets and/or individual Chapters

### Episode

- optional
- For marking serial publications (Show, Season, Episode) ???
- Placement in the broadcast schedule
- Can also belong to a radio show or a podcast
- An Episode can exist for planning purposes without a ContentItem and contains the broadcast scheduling
- (Could also be modeled as a ContentItem)

### Show (Collection, because it is more generic?)

- optional
- Marks the ContentItem as part of a series
- Contains the recurrence rule for the broadcast rhythm
- In the case of a congress, the Shows would be the broadcast rooms/programme tracks and the whole event would be a Channel

### Broadcast

- optional
- For marking ...

### ContentChannel

- optional
- For grouping ContentItems into a "collection", e.g. an event

### PublicationChannel

- optional
- Where was the publication broadcast? Radio, TV, web, etc.

## how do we proceed

### Use cases

- Podcast
- Radio broadcast
- Announcement
- Video archive (e.g. collection of talks)
- Newspaper issue & articles
- Event/congress
- Artworks/art collection (e.g. Europeana)
- ...

### Demo / Prototype

E.g. with https://directus.io/ or https://keystonejs.com/ such data models can be set up quickly and turned into a playground

## Replication Requirements

### Problem definition

Template: https://github.com/jam01/SRS-Template/blob/master/template.md

- existing platforms consist of various systems and DBs over which we have no control
- The result should be a shared index that updates automatically, harvests all nodes and can integrate everything completely (the full content of all nodes up to time x), while also allowing partial integration
- The system should also work if someone only wants part of the data (e.g. "I am only interested in new publications", similar to a podcast tracker)
- Trigger when something new is published?
- Does the replication feed contain the entirety of the data, or do replication streams contain headers of the changes with URLs from which the additional data (e.g. transcripts) is fetched, which would perform better?
- To what extent do the data model and replication cover social media features, and how detailed do we want that to be already?
- Authentication?
- Provenance: verifiability of the origin of each piece of content
- Represent moderation policies?

### Solution

- Changes are reconciled by the pulling node against the full inventory of the node being pulled from, not the other way round
- Pointer: get back everything that was updated up to time x (e.g. via sorting the feed items by modified date DESC) or via comparison with a sequence counter on each item

## meeting minutes 22/03

### discuss media data model

__Comments on the status quo:__

- Andi: naming is not correct; follow schema.org, e.g. Broadcasting -> BroadcastEvent
- Andi will ask whether the documentation of the ARDcore API can be made public, or whether we can at least take a look at it
- ARDcore, Franz: interfaces such as Items for loose elements where different entities fit
- ARDcore, Andi: Item so that elements do not exist multiple times, e.g. Tatort: the different airings are represented as publications
- ActivityPub, Franz: nothing specific for audio & video, but it is freely extended
- Andi: castopod (fr) have modeled a complete extension

-> do not adopt any model 1:1, but compare regularly so that we do not build obstacles for ourselves. Make sure things fit together, but without fear of introducing something new.

__Modeling of playlists:__

- Editorial collection / compilation instead of ContentItem
- todo: model this part (Sid & Franz)

__Todo:__

- Franz takes a closer look at the lower data model for replication (hashes and addresses, encoding and signatures, change tracking and conflict resolution, migration and versioning)
- Andi notes that this depends heavily on the chosen technology, since e.g. Matrix (JSON) already includes a lot, whereas HTTP feeds probably do not
- take a closer look at authorship and licensing
- look at: https://podlovers.org/episode/podlove-radiator/
- look at: https://www.http-feeds.org/ (https://www.youtube.com/results?search_query=http+feeds)
- In preparation for the hackathon slot: requirements doc: which functions does replication need? + document different ways each problem could be solved
    - to sort out later in the discussion process
    - also as a basis for preparing the protocol inputs
    - e.g. should edits from multiple nodes be possible? Probably not, but keep in mind the possibility of enriching the item with metadata (e.g. via a separate record that refers to the original one)
    - Changes: how do the nodes learn about changes? Provide changesets on the source node? Store history with sequence numbers?
    - Do we want bidirectional requests or simply one request that gets processed?
    - Distinguish between full replication, or also allow partial replication? (e.g. on topic xy or for a time period) -> would make sense
    - Exchange only headers in the first step and only then choose what to replicate?
    - The protocol should still be as simple as possible

## meeting minutes 19.04.22

### Prototyping

Andreas Hubel: idea of an audio API to map different data models onto one schema. https://github.com/saerdnaer/audio-api

Mapping via TypeScript. Uses GraphQL schema interfaces to implement the API mapping for each data model.

Feeds should contain `<guid>` https://github.com/Podcastindex-org/podcast-namespace/blob/main/docs/other-recommendations.md#episode-guid

Practical example: https://api-test.ardaudiothek.de/docs/

GraphQL for PostgreSQL https://github.com/graphile/postgraphile

https://api-test.ardaudiothek.de/explorer/graphiql

[postigraphql](https://github.com/kirbo/posti-graphql) has a better explorer UI than the normal graphiql!

https://supabase.com/blog/2022/03/29/graphql-now-available

external_ids[] multivalue text. Always resolve it when searching for something. Rather alternative_ids[]. In addition, store the primary external id and the import source.

How are changes/field updates handled?
- via modified_date
- tag based caching invalidation via http header (etag)

replication

- go through http feeds vs matrix vs kafka/nats again
    - kafka/nats needs someone to operate the queues, http feeds can be implemented in a decentralized way
    - matrix is also interesting
- for search and listings use the index; for detail queries always prefer using the APIs directly if not found in the index

## meeting minutes 26.04.22

Playing with Andi's https://github.com/saerdnaer/audio-api

### Get show from FYYD

```
query {
  show(id: "freakonomics-radio", source: FYYD) {
    title
    subtitle
    nodeId
    _raw
  }
}
```

### Get node type depending on interface

```
query {
  node(nodeId: "freakonomics-radio") {
    nodeId
    __typename
    ... on Item {
      title
      subtitle
    }
    ... on ShowType {
      title
      description
    }
  }
}
```

### Mapping audio API to ECB data model

- ContentItem -> not existing
- MediaAsset -> Item
- Relation between Show, Season and Items via GroupingID

Audio API's 'Show' is either 'Collection' or 'Podcast' (Extensions for Show)

### Next steps

- move to Github/Gitlab (cba & franz)
- Mermaid can be carried over directly
- in the next step, represent it in a structured way, e.g. JSON Schema
- set up mailing list and matrix room (franz)

### Next meetings

- 12/4, 13-14h: in 2 weeks franz, cba & gerald meet about the hackathon
- 19/4, 13-14h: Andi explains prototyping approaches to us
- 26/4: 13-15h: prototyping session
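As a sketch of the pull-based idea noted under Replication Requirements (the pulling node reconciles against the source node, using a sequence counter per change), the following Python example shows one way a cursor-based pull could work. Everything here is hypothetical and in-memory: the class names `SourceNode` and `Replica`, the `changes_since` method and the batch `limit` are assumptions for illustration, not an agreed protocol.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Change:
    seq: int       # sequence counter, increasing per source node
    item_id: str
    payload: dict  # item metadata at this version


class SourceNode:
    """Hypothetical stand-in for a node that exposes a change feed."""

    def __init__(self) -> None:
        self._log: List[Change] = []

    def publish(self, item_id: str, payload: dict) -> int:
        seq = len(self._log) + 1
        self._log.append(Change(seq, item_id, payload))
        return seq

    def changes_since(self, cursor: int, limit: int = 100) -> List[Change]:
        """Return up to `limit` changes with seq > cursor, oldest first."""
        return [c for c in self._log if c.seq > cursor][:limit]


class Replica:
    """The pulling node: keeps a local copy and a cursor for one source."""

    def __init__(self) -> None:
        self.items: Dict[str, dict] = {}
        self.cursor = 0

    def pull(self, source: SourceNode) -> int:
        """Fetch batches until the source has nothing newer; return the count applied."""
        applied = 0
        while True:
            batch = source.changes_since(self.cursor)
            if not batch:
                return applied
            for change in batch:
                self.items[change.item_id] = change.payload  # later change wins
                self.cursor = change.seq
                applied += 1


source = SourceNode()
source.publish("ep1", {"title": "Episode 1"})
source.publish("ep2", {"title": "Episode 2"})
source.publish("ep1", {"title": "Episode 1 (fixed)"})

replica = Replica()
first_pull = replica.pull(source)   # initial full sync
source.publish("ep3", {"title": "Episode 3"})
second_pull = replica.pull(source)  # only the new change is transferred
print(first_pull, second_pull, replica.cursor)
```

Because the replica only stores a cursor, the source node does not need to know who is pulling, which fits the decentralized HTTP feed option discussed in the notes.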
