THIS IS THE INITIAL LIST. FURTHER DOCUMENTATION CAN BE FOUND IN GITLAB.
A list of potential projects based on the initial accessibility discussion.
Vosk is a free speech-to-text engine now included in Opencast. It's an easy way to get automated subtitles on a large scale.
Right now it is included as-is, with no major modifications or improvements in post-processing. We hope that with very little effort, we can improve the experience quite a bit.
Extend vosk-cli
Recognizing sentence structures may be hard, and getting into that topic may require additional work. This sub-task may fail if we don't find a good tutorial on something similar.
LanguageTool can automatically recognize the language of a text, and it responds with rules and suggestions for recognized errors.
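As a rough sketch of what such an integration could look like, the following queries LanguageTool's public v2/check HTTP endpoint and prints the detected language together with rule matches and suggestions (error handling is omitted):

```typescript
// Minimal sketch: send text to LanguageTool's public v2/check endpoint.
// "language=auto" asks LanguageTool to detect the language itself.
async function checkText(text: string): Promise<void> {
  const response = await fetch("https://api.languagetool.org/v2/check", {
    method: "POST",
    body: new URLSearchParams({ text, language: "auto" }),
  });
  const result = await response.json();
  console.log("Detected language:", result.language.detectedLanguage.name);
  for (const match of result.matches) {
    const suggestions = match.replacements.map((r: { value: string }) => r.value);
    console.log(`${match.rule.id}: ${match.message}`, suggestions.join(", "));
  }
}

checkText("this is an test sentence without punctuation");
```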
For a long-term improvement of speech recognition in an academic context, we need better source material to build data from. We should look into whether we can contribute material to Mozilla Common Voice or similar projects to improve recognition long-term.
Research projects, options, and ways for Opencast community members to contribute material to a central database. Does such a database exist? How can we contribute? Will that have any effect on the language models we can use?
Worst case, Common Voice does not want our contributions and we find no other projects. But at least we then know that there are none.
While we will not see any effect short-term, we can hope that this will have a long-term effect on free language models, improving speech recognition.
1 week of Nadine's time.
Vosk is a free speech-to-text engine now included in Opencast. It's an easy way to get automated subtitles on a large scale.
Vosk comes with support for many languages, but it's not tuned to an academic context. It does allow for tuning the language models, though, and we should look into how hard this is and whether we can easily do it.
Research what we need to do to improve the language models for our context. How hard is it? What do we need for that?
Building a new model is out of scope for this project, but may be addressed after evaluation in a follow-up project.
No risks.
We have lots of domain-specific content, while the available models are built for general content. If we have more precise language models, we can potentially increase the accuracy of the recognition.
One week of Nadine's time
To make Vosk easy to use for the community, we should make sure to add vosk-cli and different languages to Opencast's package repository.
Add vosk-cli and language packs to the package repository.
No modifications or improvements of the tools unless necessary for packaging.
No risks.
Having vosk-cli packaged makes the speech recognition actually usable for the masses.
We have several tools with similar functionality which already support keyboard shortcuts or should support them in the future. We should make sure not to use different shortcuts for the same function.
Research and define a set of ideally well-established keyboard shortcuts for common functionality (play/pause, seek, search, …) which we can then implement across different tools.
Not actually implementing any shortcuts anywhere.
None.
A set of at least 10 commonly used shortcuts for web-based media tools.
It's less confusing for users if they do not have to re-learn shortcuts with every tool they use.
One week of Nadine's time.
Opencast Studio, especially the built-in editor, does not support any keyboard shortcuts at the moment.
Add keyboard shortcuts to Opencast Studio where it makes sense. This means first and foremost adding them to the player on the editor view.
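A minimal sketch of what such shortcuts could look like, using well-established media bindings; the #preview-player id is hypothetical:

```typescript
// Sketch: common media shortcuts for the editor's preview player.
const video = document.querySelector<HTMLVideoElement>("#preview-player");

document.addEventListener("keydown", (event) => {
  // Don't steal keystrokes from text inputs.
  if (!video || event.target instanceof HTMLInputElement) {
    return;
  }
  switch (event.key) {
    case " ": // Space: toggle play/pause
      event.preventDefault();
      if (video.paused) {
        video.play();
      } else {
        video.pause();
      }
      break;
    case "ArrowLeft": // seek back 5 seconds
      video.currentTime = Math.max(0, video.currentTime - 5);
      break;
    case "ArrowRight": // seek forward 5 seconds
      video.currentTime = Math.min(video.duration, video.currentTime + 5);
      break;
  }
});
```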
Not all functions need keyboard shortcuts.
Relatively risk-free.
1 week of Nadine's time.
Opencast Studio still uses Travis CI for automated tests and deployments. The Travis checks seem to be broken, which could cause quality issues if we do more development on Studio again. We have switched to GitHub Actions everywhere else and should do that here as well.
1 week of Nadine's time.
Audio normalization can drastically improve the clarity of audio, especially if a video is created by multiple speakers. We can easily make use of FFmpeg's excellent built-in loudnorm filter to add good audio normalization to Opencast.
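A minimal sketch of what the normalization step would eventually run; the loudnorm targets shown are commonly recommended streaming values rather than settled Opencast defaults, and the file names are placeholders:

```typescript
// Sketch: run FFmpeg's loudnorm filter from Node to normalize audio.
import { execFile } from "node:child_process";

execFile(
  "ffmpeg",
  [
    "-i", "input.mp4",
    "-af", "loudnorm=I=-16:TP=-1.5:LRA=11", // integrated loudness, true peak, loudness range
    "-c:v", "copy", // leave the video stream untouched
    "output.mp4",
  ],
  (error) => {
    if (error) throw error;
    console.log("Audio normalization finished.");
  },
);
```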
Relatively risk-free.
2 weeks of Nadine's or Alex's time.
Glare sensitivity can be one type of visual impairment. This means that bright white surfaces can cause problems for users. This can be particularly bad if users have to switch between dark and bright interfaces.
To prevent this, it would be great for Opencast Studio to have a dark mode. Even better, the mode could be triggered automatically by the user's system being in dark mode.
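Detecting that preference is straightforward with the standard prefers-color-scheme media query. A minimal sketch, where applyTheme and the data-theme attribute are placeholders for however Studio actually switches its styles:

```typescript
// Sketch: follow the system color scheme via prefers-color-scheme.
function applyTheme(theme: "light" | "dark"): void {
  document.documentElement.dataset.theme = theme; // hypothetical styling hook
}

const darkQuery = window.matchMedia("(prefers-color-scheme: dark)");
applyTheme(darkQuery.matches ? "dark" : "light");

// React when the user switches their system theme while the app is open.
darkQuery.addEventListener("change", (event) => {
  applyTheme(event.matches ? "dark" : "light");
});
```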
Support an auto mode to respect the user's system settings if possible.
1 week of Nadine's time.
Glare sensitivity can be one type of visual impairment. This means that bright white surfaces can cause difficulties for users. This can be particularly bad if users have to switch between dark and bright interfaces.
To prevent this, it would be great for the Opencast Editor to have a dark mode. Even better, the mode could be triggered automatically by the user's system being in dark mode.
Support an auto mode to respect the user's system settings if possible.
2 weeks of Nadine's time.
Glare sensitivity can be one type of visual impairment. This means that bright white surfaces can cause difficulties for users. This can be particularly bad if users have to switch between dark and bright interfaces.
To prevent this, it would be great for Tobira to have a dark mode. Even better, the mode could be triggered automatically by the user's system being in dark mode.
Support an auto mode to respect the user's system settings if possible.
1 week of Julian's or Lukas' time.
Right now, a lot of elements can only be reached via the mouse in Opencast Studio. Tabbing through active elements and activating them via keyboard should be possible.
Additionally, sensible title and aria-label attributes should be used in the user interface.
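A minimal sketch of the idea; the .toolbar button selector is hypothetical, and the point is that native button elements are Tab-focusable by default and only need a proper label:

```typescript
// Sketch: ensure icon-only controls are announced properly.
for (const button of document.querySelectorAll<HTMLButtonElement>(".toolbar button")) {
  if (!button.hasAttribute("aria-label")) {
    // Fall back to the tooltip text so screen readers announce something useful.
    button.setAttribute("aria-label", button.title || "Unlabeled control");
  }
}
```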
Low risk
A user can create a full recording with no mouse.
Accessibility improvement for motor-impaired people and for visually impaired people using a screen reader.
1 week of Nadine's time.
Opencast automatically recognizes video scenes and, with this, for example, slide changes in the presentation video.
The format Opencast stores these segments in is XML-based and arguably somewhat outdated. We could investigate using WebVTT instead. This would potentially allow us to use the editing tools we have for subtitles.
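A rough sketch of what serializing segments as WebVTT could look like; the Segment shape is made up for illustration:

```typescript
// Sketch: segments as WebVTT cues instead of the current XML format.
interface Segment {
  start: number; // seconds
  end: number; // seconds
  description: string;
}

function toTimestamp(seconds: number): string {
  // "1970-01-01T00:00:42.500Z" -> "00:00:42.500"
  return new Date(seconds * 1000).toISOString().substring(11, 23);
}

function segmentsToWebVtt(segments: Segment[]): string {
  const cues = segments.map(
    (s, i) => `${i + 1}\n${toTimestamp(s.start)} --> ${toTimestamp(s.end)}\n${s.description}`,
  );
  return "WEBVTT\n\n" + cues.join("\n\n") + "\n";
}

console.log(segmentsToWebVtt([
  { start: 0, end: 42.5, description: "Title slide" },
  { start: 42.5, end: 120, description: "Introduction" },
]));
```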
Update Opencast to fully work with WebVTT segments (this could become a resulting follow-up project).
We decide that there is no benefit in switching.
We made a decision.
1 week of Nadine's time.
Opencast automatically recognizes video scenes and, with this, for example, slide changes in the presentation video.
While this works well, sometimes it would be nice if users could fix these: add new segments, change segments, and update segment descriptions.
Blocked by "Switch to WebVTT for Segments", which may make this obsolete.
Users are able to change segments.
7 weeks of Nadine's or Alex's time.
Opencast automatically recognizes video scenes and, with this, for example, slide changes in the presentation video. It also extracts text from these slides.
It would be really helpful if we could identify the most significant parts of these slide texts, such as the slide title. This can help users navigate through the video.
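A naive heuristic as a starting point, assuming (hypothetically) that the OCR step reports a bounding box per text line, so that the tallest line is likely the title:

```typescript
// Sketch: pick a slide title from OCR output. The OcrLine shape is
// hypothetical; height stands in for font size.
interface OcrLine {
  text: string;
  top: number; // y position on the slide
  height: number; // text height, a proxy for font size
}

function guessTitle(lines: OcrLine[]): string | undefined {
  return lines
    .filter((line) => line.text.trim().length > 3) // drop OCR noise
    .sort((a, b) => b.height - a.height || a.top - b.top)[0]?.text;
}
```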
If we cannot make this accurate enough, results may be useless.
We extracted at least 4 titles from the Dual-Stream Demo video.
4 weeks of Nadine's time.
Having a lot of extracted text for each event with the OCR on slides and the speech recognition, we could use something like tf-idf to easily extract subjects or other keywords from these texts to enrich event metadata.
This might work especially well if we look at the term frequency in the documents for one event compared to the frequency across all events of a series.
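A minimal sketch of the idea, with stop-word handling and stemming left out:

```typescript
// Sketch: naive tf-idf keyword extraction. A document would be the
// combined slide text and subtitles of one event; the corpus would be
// all events of the series.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? [];
}

function tfidfKeywords(doc: string, corpus: string[], topN = 5): string[] {
  const terms = tokenize(doc);
  const tf = new Map<string, number>();
  for (const term of terms) {
    tf.set(term, (tf.get(term) ?? 0) + 1);
  }

  // Document frequency: in how many corpus documents does a term occur?
  const corpusSets = corpus.map((d) => new Set(tokenize(d)));
  const score = (term: string): number => {
    const df = corpusSets.filter((set) => set.has(term)).length;
    const idf = Math.log(corpus.length / (1 + df));
    return (tf.get(term)! / terms.length) * idf;
  };

  return [...tf.keys()].sort((a, b) => score(b) - score(a)).slice(0, topN);
}
```

Comparing against the series corpus gives terms that appear in every lecture a low idf, so event-specific subjects float to the top.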
Implement a tf-idf extraction workflow operation to extract keywords from metadata, slide texts and subtitles.
We can add the subjects as Dublin Core subject metadata and maybe add the results with additional information as JSON attachments.
None
Running this on a series of UOS or ETH material should result in recognizable subjects.
2 weeks of Nadine's time.
Analyzing the videos, we can extract a lot of useful information we should properly present to users in the player to help them navigate through a video.
Evaluate, add, and/or improve the presentation of this extracted information (segments, slide texts, subtitles) in Paella Player.
Potential conflict with what UPV and ETH do. Make sure to coordinate with them.
A user can use the slide texts of the Dual-Stream Demo for navigational help in Paella Player.
3 weeks of Nadine's time.
Analyzing the videos, we can extract a lot of useful information in text form about events. It would be really helpful if we could use those for searching in Tobira.
Make Tobira search through event metadata, slide texts, and subtitles.
A user can find events based on speech recognition.
3 weeks of Lukas' or Julian's time.
Subtitle2go is an alternative to Vosk for free, automatic subtitling. It has good support for the German language. We could add it as a transcription service to Opencast.
4 weeks of Nadine's time.
Tooltips (e.g. HTML title attributes) are great to add additional explanations to control elements. They are – in combination with aria-label – also picked up by some accessibility tools to help users further.
But non-persistent tooltips can unfortunately hinder accessibility. Users relying on a screen magnification tool may be unable to read the whole tooltip since only part of the actual screen is visible.
For example, compare screen magnification being used with default title attributes versus using tippy.js for rendering in the Opencast documentation.
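As a rough sketch of the fix, the native tooltips could be replaced with persistent tippy.js tooltips (this assumes the tippy.js package; the blanket [title] selector is just for illustration):

```typescript
// Sketch: replace native title tooltips with persistent tippy.js
// tooltips that stay readable under screen magnification.
import tippy from "tippy.js";
import "tippy.js/dist/tippy.css";

for (const element of document.querySelectorAll<HTMLElement>("[title]")) {
  const text = element.title;
  element.removeAttribute("title"); // suppress the native tooltip
  element.setAttribute("aria-label", text); // keep the text for screen readers
  tippy(element, { content: text });
}
```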
In addition to the regular models, Vosk provides punctuation models for both punctuation and case restoration. We should try them and compare the results to those of the default models for German and English.