# How subtitles should work in Opencast

We have a lot of implicit and often slightly different assumptions about how we handle subtitles.
In here, I'll try to document a best-practice approach.
After some discussion, we can maybe add this to the Opencast documentation and make it the official approach.

I'll try to suggest common ways of storing and identifying subtitles from upload, through processing, and finally to showing them in the player.

## TL;DR

:::info
Let's make subtitles a first-class citizen in Opencast, storing them as tracks alongside audio and video streams and letting workflows handle them by default.

Let us use flavors of the form `captions/<processing>`, e.g. `captions/source` and `captions/delivery`. This makes it easy to identify subtitles and also easy to publish everything ready for publication.

Let us specify tags holding additional information of the form `lang:<lang>`, `generator:<type>[:<id>]` and `type:<caption-type>` on each subtitle track.
:::

## History

While there are many components in the history of Opencast dealing with subtitles (Matterhorn admin interface, Engage player, …), the most relevant ones are Opencast's HTML5 players: Theodul and Paella.

:::warning
Since it is still in beta, I leave Paella Player 7 out of this historical consideration.
:::

### Theodul

Theodul supports WebVTT files and considers both tracks and attachments when loading subtitles.
It only supports a single subtitle stream.
If both exist, it prefers tracks over attachments.
The only selection criterion is that the media package element must have the mime type `text/vtt`.

Examples:

- type `track`, flavor `captions/delivery`, no tags, mime type `text/vtt`
  - will be selected for the player
- type `track`, flavor `captions/vtt`, tag `lang:en`, mime type `text/plain`
  - will __not__ be selected

See `loadAndAppendCaptions(…)` in `engage-theodul-plugin-video-videojs/src/main/resources/static/main.js` for more details.
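This selection behavior can be sketched roughly as follows. This is a simplified model for illustration, not the actual Theodul code; the `MediaPackageElement` shape and its field names are assumptions:

```typescript
// Simplified model of a media package element. The field names are
// illustrative assumptions, not Opencast's actual data model.
interface MediaPackageElement {
  type: 'track' | 'attachment';
  flavor: string;
  mimetype: string;
  url: string;
}

// Theodul-style selection: only elements with mime type `text/vtt`
// qualify, and tracks are preferred over attachments.
function selectCaption(elements: MediaPackageElement[]): MediaPackageElement | undefined {
  const candidates = elements.filter((e) => e.mimetype === 'text/vtt');
  return candidates.find((e) => e.type === 'track')
      ?? candidates.find((e) => e.type === 'attachment');
}
```

Note that the flavor plays no role here; a `captions/vtt` track with mime type `text/plain` is skipped while any `text/vtt` element is accepted.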
### Paella

Paella supports DFXP, WebVTT and SubRip files (not entirely sure).
It supports loading subtitles from attachments, catalogs and (since ≥ 12.x) tracks.
It supports multiple subtitle streams.

Subtitle selection happens by selecting all media package elements (only attachments in Opencast ≤ 11) with main flavor `captions`.
The sub-flavor is split at the first `+` character.
The first part is used as the format identifier, the second part, if present, as a language identifier.
If no language was detected, Paella looks for a tag of the form `lang:<language>` to use as language identifier.
The language identifier is also used as the language description.

Examples:

- flavor `captions/vtt+de`, tag `lang:de` becomes:
  - format: `vtt`
  - language identifier: `de`
  - language description: `de`
- flavor `captions/vtt`, tag `lang:en` becomes:
  - format: `vtt`
  - language identifier: `en`
  - language description: `en`
- flavor `captions/delivery`, no tags, mime type `text/vtt`
  - will cause problems since `delivery` is detected as the format

See `getCaptions(…)` in `engage-paella-player/src/main/paella-opencast/plugins/es.upv.paella.opencast.loader/03_oc_search_converter.js` for more details.

## Components

Going forward, we need to support subtitles not only in players, but in several components to help users work with subtitles:

- Creating subtitles
  - **Upload**
    Components which allow uploading media should also allow uploading subtitles by default. This includes the admin interface, LMS integrations and the video portal.
  - **Auto-generate subtitles**
    Opencast should offer to auto-generate subtitles out of the box, leveraging Vosk, Subtitle2go and/or Whisper. It should also provide further integration with SaaS providers like IBM, Google or AmberScript.
- Modifying subtitles
  - **Opencast Editor**
    Opencast's editor should allow users to download, create, upload and modify subtitles easily.
  - **Update Publications**
    It should be easy for users to update publications after modifying a subtitle in the editor.
  - **Editor Backend**
    Cutting parts from video streams should also affect subtitle tracks.
- Presenting subtitles
  - **Default Workflows**
    All of Opencast's workflows should treat subtitles as a first-class citizen and handle them by default, including cutting and publication.
  - **Players**
    Players should offer to display published subtitles.

## Subtitle Formats

Now let's talk about how things should work in the future.

The [Web Video Text Tracks Format (WebVTT)](https://www.w3.org/TR/webvtt1/) is a widely adopted W3C standard for subtitles/captions on the web.
In this context, it has by now largely replaced all competing formats.
We do not need to support any other formats.

:::info
For simplicity, I suggest supporting WebVTT only.
:::

If other formats still exist in the archive, Opencast can convert many of them to WebVTT using FFmpeg.

## Media Package: How to store subtitles

In the past, two media package element categories have been used to store subtitles: tracks and attachments.

- __Attachments__ are static files associated with the event represented by the media package. They usually do not have a temporal aspect and are not commonly used for further processing.
- __Tracks__ usually have a temporal aspect and represent the entire, or at least a part of, the overall event. They are commonly used for further processing.

Since Opencast allows cutting of events, and with the trend to push this task more and more towards end users, we cannot control the order of generating subtitles versus cutting the event.
Therefore, it is important that subtitles are treated similarly to other tracks, since that allows us to cut the subtitles along with video and audio whenever that becomes necessary.
:::info
To allow for this, I suggest always storing subtitles in the **tracks** section of media packages.
:::

Using tracks also allows us to easily use tools like FFmpeg to convert subtitles between different formats, in the same way we convert audio and video using the `encode` operation.

## Flavors

With auto-generated and archived subtitles, and with editing in users' hands, we need to be able to distinguish uncut and cut subtitles.
That is why using `captions/vtt` for everything doesn't really work.

More than that, we may have several different subtitle streams.
They can differ because they are generated differently, they can be in different languages, or they can be either closed captions or subtitles.
Encoding all of this information in a two (arguably three) component flavor is hard, and we should consider putting most of this additional information in more flexible places instead.

With video and audio streams, the main flavor describes the kind of video (`presenter`, `presentation`, …) while the sub-flavor expresses the processing state (`source`, `work`, `delivery`).
Captions unrelated to the main video would imply that a media package also contains unrelated audio streams, which is unlikely.
We can hopefully assume that supporting just one set of captions is enough, and we can always stick to `captions` as the main flavor, making the set of captions easily identifiable.

As sub-flavor, we should use the processing state, similar to what we use for video and audio streams.
This makes it easy for us to distinguish between source material and material which has been processed (e.g. cut).

:::info
I suggest always using `captions` as the main flavor while using the processing state as sub-flavor, similar to other media package tracks.
For example, a caption could be flavored `captions/source` when ingested and `captions/delivery` when it is ready for publication.
:::

Previously, the language was sometimes attached to the sub-flavor in the form `captions/source+en`.
I suggest moving this information to tags instead: tags are more flexible, and encoding the language in the flavor makes generic handling of media package tracks harder.
For example, it's no longer possible to publish `*/delivery` if captions are flavored `captions/delivery+en`.

## Tags

Tags are very flexible and can hold all sorts of additional information about media package elements.
I suggest defining common tags to specify:

- the language of the subtitle track
- the generator (auto-generated, manually generated)
- the type (closed captions, subtitles)

All tags should be optional and components should work without them being present, falling back to generic displays, or not showing the information at all.
When processing subtitles, Opencast's components should either keep these tags as they are, or adjust them accordingly.

To specify the language of a subtitle track, we can use tags of the form `lang:<language>` where `language` is a 3-letter [ISO 639 language code](https://en.wikipedia.org/wiki/ISO_639-3).
Adding multiple tags in case multiple languages are used is possible.

:::danger
Just using 3-letter ISO language codes does not allow us to specify regions.
For example, we cannot distinguish between British English (`en-GB`) and American English (`en-US`).
Do we need that?
If so, we could use [RFC 3066 language codes](https://www.ietf.org/rfc/rfc3066.txt) instead.
:::

Specifying how subtitles are generated, in particular whether they are automatically generated, can help users.
Therefore, I suggest adding tags of the form `generator:<type>[:<id>]`, where `type` should be either `manual` or `auto` and the optional generator `id` may be added to specify the system generating the subtitle.
For example, a manually created subtitle should be tagged `generator:manual`, an auto-generated subtitle should be tagged `generator:auto`, and one generated by Vosk may even be tagged `generator:auto:vosk` to be more specific.
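The proposed scheme is easy to parse generically. A minimal sketch, assuming the return shape below (the function and interface are illustrative, not an existing Opencast API):

```typescript
// Parsed subtitle metadata according to the proposed tag scheme.
// All fields are optional, since all tags are optional.
interface SubtitleInfo {
  languages: string[];     // from `lang:<language>` tags (there may be several)
  generatorType?: string;  // `manual` or `auto`, from `generator:<type>[:<id>]`
  generatorId?: string;    // optional generator id, e.g. `vosk`
  captionType?: string;    // `subtitle` or `closed-caption`, from `type:<type>`
}

// Extract the proposed metadata tags from a track's tag list,
// ignoring any unrelated tags.
function parseSubtitleTags(tags: string[]): SubtitleInfo {
  const info: SubtitleInfo = { languages: [] };
  for (const tag of tags) {
    const [key, ...rest] = tag.split(':');
    if (key === 'lang' && rest[0]) {
      info.languages.push(rest[0]);
    } else if (key === 'generator') {
      info.generatorType = rest[0];
      info.generatorId = rest[1]; // stays undefined if no id is given
    } else if (key === 'type') {
      info.captionType = rest[0];
    }
  }
  return info;
}
```

Since every field falls back to "absent", components consuming this structure naturally fulfill the requirement that all tags are optional.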
If no `generator` tag is specified, components should make no claims about how the subtitle may have been generated.

For accessibility in particular, it is important to know whether a subtitle track contains plain subtitles or closed captions (i.e. including additional information for deaf and hard-of-hearing people).
If we have this information, we should include it with a tag of the form `type:<type>`: either `type:subtitle` or `type:closed-caption`.

:::info
I suggest including additional information about subtitles with tags of the form `lang:<language-code>`, `generator:<type>[:<id>]` and `type:<captions-type>`.
:::

## Mime Type

Since the format of subtitle tracks should always be WebVTT, the mime type [should always be `text/vtt`](https://developer.mozilla.org/en-US/docs/Web/API/WebVTT_API#webvtt_files).
This also makes selecting all subtitles easier and could even be an alternative to using `captions` as the main flavor, although no operations in Opencast currently let you select tracks by mime type.

## Fallbacks

For internal components, no fallbacks should be necessary, unless adopters want to reprocess already generated and/or published subtitles.
We already have the means to [access published media package elements](https://docs.opencast.org/r/12.x/admin/#workflowoperationhandlers/publication-to-workspace-woh/).
If we provide a simple operation to move attachments flavored `captions/vtt+<lang>` to the new track-based format, we should be good to go.

For already published media, it would be great if we could still have a fallback in players, so that old media are still displayed correctly.
That is why I suggest players implement the following fallback mechanism:

- If no subtitle tracks are present, look for attachments with main flavor `captions`. This should already give us a list of all old subtitles.
- If the sub-flavor is of the form `…+<lang>`, treat it as if a tag of the form `lang:<lang>` were present.

:::info
Let players fall back to attachments if no subtitle tracks are present.
:::
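Put together, a player-side fallback could look roughly like this. This is a sketch under the assumptions of this document; the `CaptionElement` shape and function names are hypothetical and not taken from an actual player:

```typescript
// Hypothetical, simplified media package element for illustration.
interface CaptionElement {
  type: 'track' | 'attachment';
  flavor: string;   // e.g. `captions/delivery` or legacy `captions/vtt+de`
  tags: string[];
  mimetype: string;
}

// Select subtitle elements: prefer WebVTT tracks, fall back to
// legacy attachments with main flavor `captions`.
function findSubtitles(elements: CaptionElement[]): CaptionElement[] {
  const tracks = elements.filter(
    (e) => e.type === 'track' && e.mimetype === 'text/vtt');
  if (tracks.length > 0) return tracks;
  return elements.filter(
    (e) => e.type === 'attachment' && e.flavor.split('/')[0] === 'captions');
}

// Derive the language: prefer a `lang:<lang>` tag; otherwise fall
// back to a legacy `…+<lang>` sub-flavor.
function subtitleLanguage(element: CaptionElement): string | undefined {
  const tag = element.tags.find((t) => t.startsWith('lang:'));
  if (tag) return tag.slice('lang:'.length);
  const subFlavor = element.flavor.split('/')[1] ?? '';
  const plus = subFlavor.indexOf('+');
  return plus >= 0 ? subFlavor.slice(plus + 1) : undefined;
}
```

With this shape, new-style publications and old attachment-based publications go through the same code path, and the legacy sub-flavor is only consulted when no `lang:` tag exists.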