We have a lot of implicit and often slightly different assumptions about how we handle subtitles. Here, I'll try to document a best-practice approach. After some discussion, we can maybe add this to the Opencast documentation and make it the official approach.
I'll try to suggest common ways of storing and identifying subtitles from upload, through processing, to finally showing them in the player.
Let's make subtitles first-class citizens in Opencast, storing them as tracks alongside audio and video streams and letting workflows handle them by default.
- Let us use flavors of the form `captions/<processing>`, e.g. `captions/source` and `captions/delivery`. This makes it easy to identify subtitles, and easy to publish everything that is ready for publication.
- Let us attach tags holding additional information of the form `lang:<lang>`, `generator:<type>[:<id>]` and `type:<caption-type>` to each subtitle track.
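To make the proposal concrete, here is what a subtitle track following these conventions could look like, sketched as a TypeScript structure (the interface is a simplified illustration, not Opencast's actual media package model; the URL is made up):

```typescript
// Simplified stand-in for a media package element; Opencast's real model
// is richer (XML media packages, element IDs, checksums, …).
interface SubtitleTrack {
  flavor: string;    // "captions/<processing>"
  tags: string[];    // additional, optional metadata
  mimetype: string;  // always "text/vtt" for subtitles
  url: string;
}

// A German, auto-generated closed-caption track, ready for publication:
const track: SubtitleTrack = {
  flavor: "captions/delivery",
  tags: ["lang:deu", "generator:auto:vosk", "type:closed-caption"],
  mimetype: "text/vtt",
  url: "https://example.org/captions.vtt",
};
```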
While there are many components in the history of Opencast dealing with subtitles (Matterhorn admin interface, Engage player, …), the most relevant ones are Opencast's HTML5 players: Theodul and Paella.
Since it is still in beta, I am leaving Paella Player 7 out of this historical overview.
Theodul supports WebVTT files and considers both tracks and attachments when loading subtitles. It only supports a single subtitle stream. If both exist, it prefers tracks over attachments. The only selection criterion is that the media package element must have the mime type `text/vtt`.
Examples:

- track, flavor `captions/delivery`, no tags, mime type `text/vtt` (would be loaded)
- track, flavor `captions/vtt`, tag `lang:en`, mime type `text/plain` (would be ignored, since the mime type is not `text/vtt`)

See `loadAndAppendCaptions(…)` in `engage-theodul-plugin-video-videojs/src/main/resources/static/main.js` for more details.
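In pseudo-TypeScript, the described behavior amounts to something like the following sketch (illustrative only, not the actual Theodul code referenced above; the element shape is a simplified stand-in):

```typescript
interface MediaPackageElement {
  flavor: string;
  tags: string[];
  mimetype: string;
  url: string;
}

// Select by mime type only, preferring tracks over attachments.
function findTheodulCaptions(
  tracks: MediaPackageElement[],
  attachments: MediaPackageElement[],
): MediaPackageElement | undefined {
  const isVtt = (e: MediaPackageElement) => e.mimetype === "text/vtt";
  // Only a single subtitle stream is supported, so the first match wins.
  return tracks.find(isVtt) ?? attachments.find(isVtt);
}
```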
Paella supports DFXP, WebVTT and SubRip files (not entirely sure). It supports loading subtitles from attachments, catalogs and (since ≥ 12.x) tracks. It supports multiple subtitle streams. Subtitle selection happens by selecting all media package elements (only attachments in Opencast ≤ 11) with the main flavor `captions`.
The sub-flavor is split at the first `+` character. The first part is used as the format identifier; the second part, if present, as a language identifier. If no language was detected, the Paella player looks for a tag of the form `lang:<language>` to use as the language identifier. The language identifier is also used as the language description.
Examples:

- `captions/vtt+de`, tag `lang:de` becomes: format `vtt`, language `de`, description `de`
- `captions/vtt`, tag `lang:en` becomes: format `vtt`, language `en`, description `en`
- `captions/delivery`, no tags, mime type `text/vtt` becomes: format `delivery`, no language

See `getCaptions(…)` in `engage-paella-player/src/main/paella-opencast/plugins/es.upv.paella.opencast.loader/03_oc_search_converter.js` for more details.
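A sketch of the described parsing (illustrative only; the real logic lives in `getCaptions(…)`, referenced above):

```typescript
interface CaptionInfo {
  format: string;
  lang?: string;
  description?: string;
}

// Sketch of the described flavor/tag handling, not the actual Paella code.
function parsePaellaCaption(subFlavor: string, tags: string[]): CaptionInfo {
  // Split the sub-flavor at the first "+": format, then optional language.
  const plus = subFlavor.indexOf("+");
  const format = plus === -1 ? subFlavor : subFlavor.slice(0, plus);
  let lang = plus === -1 ? undefined : subFlavor.slice(plus + 1);

  // Fall back to a lang:<language> tag if the sub-flavor carries no language.
  if (!lang) {
    lang = tags.find((t) => t.startsWith("lang:"))?.slice("lang:".length);
  }

  // The language identifier doubles as the human-readable description.
  return { format, lang, description: lang };
}

parsePaellaCaption("vtt+de", ["lang:de"]); // { format: "vtt", lang: "de", description: "de" }
parsePaellaCaption("delivery", []);        // { format: "delivery", lang: undefined, … }
```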
Going forward, we need to support subtitles not only in players, but in several other components to help users work with them. So let's talk about how things should work in the future.
The Web Video Text Tracks Format (WebVTT) is a widely adopted W3C standard for subtitles/captions on the web. In this context, it has by now completely replaced all competing formats. We do not need to support any other formats.
For simplicity, I suggest supporting WebVTT only.
If other formats still exist in an archive, Opencast can convert many of them to WebVTT using FFmpeg.
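For instance, converting a SubRip file needs nothing more than letting FFmpeg infer the formats from the file extensions (a minimal Node.js sketch; the file names are made up):

```typescript
import { execFileSync } from "node:child_process";

// Convert any subtitle format FFmpeg can read (SubRip, DFXP/TTML, …) to
// WebVTT; FFmpeg picks the output format from the ".vtt" extension.
execFileSync("ffmpeg", ["-y", "-i", "captions.srt", "captions.vtt"]);
```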
In the past, two media package categories have been used to store subtitles: tracks and attachments.
Since Opencast allows cutting of events, and with the trend to push this task more and more towards end users, we cannot control the order of generating subtitles versus cutting the event.
Therefore, it is important that subtitles are treated like other tracks, since that allows us to cut the subtitles along with video and audio whenever that becomes necessary.
To allow for this, I suggest always storing subtitles in the tracks section of media packages.
Using tracks also allows us to easily use tools like FFmpeg to convert subtitles between different formats, in the same way we convert audio and video using the `encode` operation.
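Cutting can then use the exact same toolchain. A sketch, assuming FFmpeg's input seeking applies to subtitle-only inputs the way it does to audio and video (timestamps and file names are made up):

```typescript
import { execFileSync } from "node:child_process";

// Keep four minutes of cues starting at minute one. Seeking before the
// input should shift the remaining cue timestamps towards zero, matching
// audio/video trimmed with the same options.
execFileSync("ffmpeg", [
  "-y",
  "-ss", "00:01:00",
  "-t", "00:04:00",
  "-i", "captions.vtt",
  "captions-cut.vtt",
]);
```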
With auto-generated and archived subtitles, as well as with editing in the hands of users, we need to be able to distinguish uncut from cut subtitles. That is why using `captions/vtt` for everything does not really work.
More than that, we may have several different subtitle streams. They can differ because they are generated differently, they can be in different languages, or they can be either closed captions or subtitles.
Encoding all of this in a two (arguably three) component flavor is hard, and we should consider putting most of this additional information in more flexible places instead.
With video and audio streams, the main flavor describes the kind of stream (`presenter`, `presentation`, …) while the sub-flavor expresses the processing state (`source`, `work`, `delivery`). Having unrelated sets of captions would mean that we also have unrelated audio streams in a media package. That is unlikely. We can hopefully assume that supporting a single set of captions is enough, and we can always stick to `captions` as the main flavor, making the set of captions easily identifiable.
As sub-flavor, we should use the processing state similar to what we use for video and audio streams. This makes it easy for us to distinguish between source material and material which has been processed (e.g. cut).
I suggest always using `captions` as the main flavor while using the processing state as the sub-flavor, similar to other media package tracks. For example, a caption could be flavored `captions/source` when ingested and `captions/delivery` when it is ready for publication.
Previously, the language was sometimes attached to the sub-flavor in the form `captions/source+en`. I suggest moving this information to tags instead: tags are more flexible, and encoding the language in the sub-flavor makes generic handling of media package tracks harder. For example, it is no longer possible to publish `*/delivery` if captions are flavored `captions/delivery+en`.
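To see why, consider a simplified sketch of wildcard flavor matching (illustrative; not Opencast's actual matcher):

```typescript
// Simplified flavor matcher: "*" matches any main or sub-flavor part.
function matchesFlavor(selector: string, flavor: string): boolean {
  const [selMain, selSub] = selector.split("/");
  const [main, sub] = flavor.split("/");
  return (selMain === "*" || selMain === main)
      && (selSub === "*" || selSub === sub);
}

matchesFlavor("*/delivery", "presenter/delivery");   // true
matchesFlavor("*/delivery", "captions/delivery");    // true
matchesFlavor("*/delivery", "captions/delivery+en"); // false: the language in
                                                     // the sub-flavor breaks it
```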
Tags are very flexible and can hold all sorts of additional information about media package elements. I suggest defining common tags to specify the language of a subtitle track, how it was generated, and whether it contains subtitles or closed captions.
All tags should be optional, and components should work without them being present, falling back to generic displays or not showing the information at all.
When processing subtitles, Opencast's components should either keep these tags as they are, or adjust them accordingly.
To specify the language of a subtitle track, we can use tags of the form `lang:<language>`, where `language` is a 3-letter ISO 639 language code. Multiple tags can be added in case multiple languages are used.

Just using 3-letter ISO language codes does not allow us to specify regions. For example, we cannot distinguish between British English (`en-GB`) and American English (`en-US`). Do we need that? If so, we could use RFC 3066 language codes instead.
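Either way, players would not need their own translation tables to display these codes: JavaScript's built-in `Intl.DisplayNames` resolves both 3-letter ISO 639 codes and region-qualified tags (a sketch of how a component might label a `lang:` tag):

```typescript
// Resolve a language code to a human-readable label in the UI language.
const names = new Intl.DisplayNames(["en"], { type: "language" });

console.log(names.of("deu"));   // "German" (3-letter ISO 639 code)
console.log(names.of("en-GB")); // "British English" (RFC 3066 / BCP 47 tag)
```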
Specifying how subtitles were generated, in particular whether they were generated automatically, can help users. Therefore, I suggest adding tags of the form `generator:<type>[:<id>]`, where `type` should be either `manual` or `auto`, and the optional generator `id` may be added to specify the system generating the subtitle.
For example, a manually created subtitle should be tagged `generator:manual`, while an auto-generated subtitle should be tagged `generator:auto`, and one generated by Vosk may even be tagged `generator:auto:vosk` to be more specific.
If no `generator` tag is specified, components should make no claims about how the subtitle may have been generated.
For accessibility in particular, it is important to know whether a subtitle track is actually a subtitle or a closed caption (e.g. including additional information for deaf users). If we have this information, we should include it with a tag of the form `type:<caption-type>`, either `type:subtitle` or `type:closed-caption`.
To summarize, I suggest including additional information about subtitles with tags of the form `lang:<language-code>`, `generator:<type>[:<id>]` and `type:<caption-type>`.
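Taken together, consuming these tags stays simple. A sketch of how a component might parse them (the metadata shape is illustrative, not an existing Opencast interface):

```typescript
interface SubtitleMeta {
  langs: string[];                      // from lang:<language-code>
  generatorType?: "manual" | "auto";    // from generator:<type>[:<id>]
  generatorId?: string;
  type?: "subtitle" | "closed-caption"; // from type:<caption-type>
}

function parseSubtitleTags(tags: string[]): SubtitleMeta {
  const meta: SubtitleMeta = { langs: [] };
  for (const tag of tags) {
    const [key, ...rest] = tag.split(":");
    if (key === "lang" && rest[0]) {
      meta.langs.push(rest[0]);
    } else if (key === "generator") {
      meta.generatorType = rest[0] as "manual" | "auto";
      meta.generatorId = rest[1]; // optional, e.g. "vosk"
    } else if (key === "type") {
      meta.type = rest.join(":") as "subtitle" | "closed-caption";
    }
  }
  return meta; // all tags are optional; missing ones simply stay undefined
}

parseSubtitleTags(["lang:eng", "generator:auto:vosk", "type:subtitle"]);
// { langs: ["eng"], generatorType: "auto", generatorId: "vosk", type: "subtitle" }
```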
Since the format of subtitle tracks should always be WebVTT, the mime type should always be `text/vtt`.
This makes selecting all subtitles easier and could even serve as an alternative to using `captions` as the main flavor, although no operations in Opencast currently let you select tracks by mime type.
For internal components, no fallbacks should be necessary unless adopters want to reprocess already generated and/or published subtitles. We already have the means to access published media package elements. If we provide a simple operation to move attachments flavored `captions/vtt+<lang>` to the new track-based format, we should be good to go.
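Such an operation could boil down to a small metadata transformation, sketched here with a simplified element shape (republishing the actual files, and mapping 2-letter to 3-letter language codes, is left out):

```typescript
interface Element {
  flavor: string;
  tags: string[];
  mimetype: string;
  url: string;
}

// Turn an old-style attachment (e.g. flavor "captions/vtt+en") into a
// track following the new convention.
function migrateCaptionAttachment(attachment: Element): Element {
  const [, subFlavor] = attachment.flavor.split("/");
  const plus = subFlavor.indexOf("+");
  const lang = plus === -1 ? undefined : subFlavor.slice(plus + 1);
  return {
    ...attachment,
    flavor: "captions/delivery",
    tags: lang ? [...attachment.tags, `lang:${lang}`] : attachment.tags,
  };
}

migrateCaptionAttachment({
  flavor: "captions/vtt+en",
  tags: [],
  mimetype: "text/vtt",
  url: "https://example.org/captions.vtt",
});
// → { flavor: "captions/delivery", tags: ["lang:en"], … }
```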
For already published media, it would be great if we could still have a fallback in players, so that old media are still displayed correctly. That is why I suggest that players implement the following fallback mechanism:

1. Select all attachments with the main flavor `captions`. This should already give us a list of all old subtitles.
2. If a sub-flavor of the form `…+<lang>` is present, treat it as if a tag of the form `lang:<lang>` were present.
3. Let players fall back to these attachments if no subtitle tracks are present.
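Sketched in the same simplified model as above, the whole fallback might look like this (illustrative, not actual player code):

```typescript
interface Element {
  flavor: string; // "<main>/<sub>"
  tags: string[];
}

// Fallback for pre-migration publications: attachments are only consulted
// when the media package contains no subtitle tracks.
function selectCaptions(tracks: Element[], attachments: Element[]): Element[] {
  const subtitleTracks = tracks.filter((t) => t.flavor.startsWith("captions/"));
  if (subtitleTracks.length > 0) {
    return subtitleTracks; // new-style tracks win
  }
  return attachments
    .filter((a) => a.flavor.startsWith("captions/"))
    .map((a) => {
      // Derive lang:<lang> from a "…+<lang>" sub-flavor if present.
      const plus = a.flavor.indexOf("+");
      if (plus === -1) return a;
      const lang = a.flavor.slice(plus + 1);
      return { ...a, tags: [...a.tags, `lang:${lang}`] };
    });
}
```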