# rust-av project

[rust-av](https://github.com/rust-av) is a new multimedia toolkit written in the [Rust](https://www.rust-lang.org/it) language.

## So far

`rust-av` started as an experiment to see whether the Rust language could be used for multimedia purposes. As the language evolved, it became fairly clear that it **does** work.

We contribute to other related projects:

- [rav1e](https://github.com/xiph/rav1e/), an AV1 encoder. We even host some rav1e-related components which are also shared across other encoders.
- Bindings to other multimedia C/C++ libraries. [dav1d-rs](https://github.com/rust-av/dav1d-rs) is the most notable one, since it has also been integrated into the [GStreamer](https://gitlab.freedesktop.org/gstreamer/gst-plugins-rs/-/tree/main/video/dav1d?ref_type=heads) pipeline.
- [NihAV](https://nihav.org/), a similar pure-Rust experimental framework. Its author focused on getting as much as possible done and is not interested in dealing with a community. Although the codebase itself is [AGPLv3](https://www.gnu.org/licenses/agpl-3.0.en.html), he kindly allowed us to relicense the code under a more liberal license.

## Goals

- A community-driven project is the base for a framework. As much as possible, the requests raised by the community **must** be listened to and, when approved by the majority of developers, turned into implementation requests. This does not mean, though, becoming a slave to the community. The community brings its ideas, along with its difficulties in using the framework, and then, through a **procedure**, the core developers choose what they deem best to implement.
- Create a modular library for audio and video processing. This new toolkit is going to provide a clear and definite structure in terms of APIs. Each API **must** tackle a different aspect of multimedia processing.
  For example:
  - A crate for muxing and demuxing formats
  - A crate to encode and decode codecs
  - A crate containing a series of functions and utilities to parse binary data
  - A crate to integrate AI functionalities
  - A crate to provide functionalities for embedded devices, that is, a `no_std` environment
  - A crate to provide a thin but safe layer over hardware decoder APIs
- As the main priority, every public API **must** be safe and secure. Unsound code **must not** be present within crates, and every piece of **unsafe** code needed for optimization **must** be labeled with a specific keyword, in addition to comments which state **every** possible risk.
- Replace **unsafe** or **unoptimized** multimedia libraries in the software which adopts them. We can see this task as a consulting service. Creating a series of API bindings could help the transition towards safer alternatives, which **must** be gradual and tested. We can start from the most appealing programming languages, such as C/C++ and JavaScript. As an example, surveillance systems make use of unsafe libraries, and rust-av might step in as a new **valid** alternative.
- Using a **Continuous Integration** system to catch code smells, code complexity, and vulnerabilities in our project **is** a very important aspect. Automating this process also reduces developer effort. The entire checking process **must** be reproducible on a developer's system as well.

## Programming language

**Rust** offers a series of features aimed at giving developers a higher degree of code security than long-established languages such as `C/C++`.
Among the features of the language, divided by category:

- **Speed and Security**
  - Memory safe (no use-after-free and no data races)
  - Zero-cost abstractions
  - Almost no hidden runtime costs (no garbage collector)
  - More predictable (fewer undefined behavior traps to be aware of)
- **Documentation**
  - Explanatory books and well-documented APIs
  - Small introductory exercises to learn the language in a fun way, with a steady but not steep learning curve
  - A set of tools to better document code
- **API**
  - Clear and flexible APIs can be written through the use of [traits](https://doc.rust-lang.org/book/second-edition/ch10-02-traits.html)
  - Traits and [Cargo features](https://doc.rust-lang.org/cargo/reference/features.html) can facilitate the creation of a thin, modular, plug-and-play unified library and/or binary interface
  - Clear separation between data structures and the methods which operate on them
  - **Asynchronous** APIs can be written with simplicity starting from Rust version 1.77
- **Interoperability**
  - Runs on embedded devices
  - Supports a fair number of architectures and operating systems
  - Integrates easily with other languages through bindings

## Other multimedia frameworks

This section describes a series of frameworks which process audio and video. Each of these frameworks was created with specific purposes in mind and has a different development history. We are going to list the advantages and disadvantages of each one in order to get a clear idea of where rust-av should align with the features of these frameworks and where it should differ.

### FFmpeg

### GStreamer

@guerra

### VapourSynth

### Symphonia (Audio only)

[Symphonia](https://github.com/pdeljanov/Symphonia) provides both demuxers for common multimedia formats and decoders for some audio formats, **implemented in pure Rust**.
It is licensed under the permissive [MPL](https://www.tldrlegal.com/license/mozilla-public-license-2-0-mpl-2) license. Despite being a relatively young project, it **could be a useful inspiration for demuxing/decoding APIs**.

# Project Impact

Using `rust-av` in real projects leads us to ask several questions about the impact of the project on both users and companies. A pure-Rust framework can bring new approaches, and we would like to perform an a priori analysis to identify the risks. To provide a more precise view of the subject, we have turned some of the areas of analysis into questions. Each answer represents our understanding of that matter.

## Which users might be interested in rust-av?

By *users* we mean **end users**, who are going to use software developed with the `rust-av` toolkit, but also every **developer** who will implement multimedia software through our APIs.

If we make a hypothesis about the type of users who might be interested in our framework, we can almost surely find those who are **dissatisfied** with the features offered by other multimedia frameworks. We have created this Venn diagram to better illustrate the concept:

<p style="text-align:center">
<img src="https://hackmd.io/_uploads/rkbLjh1q6.png">
</p>

The people in the white space are those who do not use **any** multimedia framework. However, this hypothesis does not exclude whoever is **inside** a circle: those people **might** be willing to use `rust-av`, but they are less likely to switch from long-maintained, consolidated frameworks to an innovative and unstable one. Getting used to familiar tools usually creates a comfort zone and lowers interest in what is new.
Therefore, our framework is intended for:

- **unsatisfied** multimedia users
- people who are approaching audio and video processing for the first time

To make the experience more appealing for these users, we should provide them with:

- Good, easy-to-browse documentation, full of examples covering the most common use cases one encounters when dealing with multimedia tasks. A well-documented framework makes the learning process more exciting and less exhausting for newcomers.
- Well-described and clear APIs. Having an API with an unambiguous name and a good description of what it does is **extremely** important for a developer. Fully explained APIs simplify the process of putting them together to build a binary, reducing developer effort during feature development.
- Modularity. Using only the required components, without having to import superfluous crates because of poor separation of concerns, reduces the number of errors and improves codebase maintainability.

The hypothesis presented so far has been formulated from observations written by users on forums, chats, and GitHub issues, so it might not be based on a reliable sample. A possible way to get more information from users is to run a **survey** at the end of the year, or every once in a while depending on our needs, with the aim of understanding how our project is progressing. This method would give us more precise information and hence a better plan for the next development cycle.

## What could be the reasons which lead a company to replace their multimedia frameworks with our own?

In the previous question, we only considered people who can freely choose their preferred multimedia frameworks. This scenario does not fit companies which would like to change their audio and video processing libraries but cannot do so because of the huge effort involved in replacing them.
To motivate a company to replace its own libraries, we have to provide the same features it currently **has**, **plus something more**.

A company usually bases its products on two aspects: **security** and **optimization**. Most of the time you get one or the other; having both at the same time is uncommon, for these reasons:

- It is hard to learn a programming language which offers few of the abstractions that speed up code writing
- Maintaining code in an unsafe programming language takes more effort than using an interpreted one which gives security features for free
- If fast software is required, using unsafe libraries is nearly mandatory, since they are optimized. This choice, though, will probably introduce security vulnerabilities into the codebase during development.

For instance, some companies use libraries written in `Python` and `JavaScript` for security purposes, but their software ends up slow and resource-hungry, while other companies provide highly optimized products written in `C/C++` which are subject to several security vulnerabilities.

`rust-av`, being written in Rust, can provide these two properties simultaneously. We can treat this characteristic of the Rust language as the **_something-more_** part which might push a company to replace its own libraries. But is that sufficient? No. We also have to define a method to integrate our libraries into an existing codebase gradually. A step-by-step way to achieve this is through **bindings**. The diagram below illustrates the desired solution:

<p style="text-align:center">
<img src="https://hackmd.io/_uploads/r19ba8Z9p.png">
</p>

The circle in the center represents our Rust framework.
Smaller circles represent bindings to other programming languages, and each of them is connected to a series of rectangles which identify binaries, libraries, and frameworks developed in that binding's language by different companies. Converting every company's software into Rust is **infeasible**, so we have to interact with it through bindings created with the libraries offered by the Rust ecosystem.

As an initial step, we can create a single repository containing bindings for the `C/C++` languages, since they can be easily integrated into any project written in `JavaScript`, `TypeScript`, `Go`, and `Python`. Eventually, we can add to our organization a series of repositories containing bindings for each programming language we decide to support.

# Security Analysis

A security analysis aims to test the security level of our framework. We can adopt the following approaches to achieve this objective:

- Testing the framework in a sandbox environment to determine how it behaves and whether it contains security issues which could affect the real environment where software built with it will eventually run
- Enabling the `clippy` lint called [undocumented_unsafe_blocks](https://rust-lang.github.io/rust-clippy/master/#/undocumented_unsafe_blocks) to require an explanation of why the operations performed inside each `unsafe` block are sound
- Running [cargo-careful](https://github.com/RalfJung/cargo-careful) to detect undefined behavior
- Running a [sanitizer](https://doc.rust-lang.org/beta/unstable-book/compiler-flags/sanitizer.html) to identify memory errors
- Running [valgrind](https://valgrind.org/) to detect different kinds of memory and threading errors
- Running [cargo-audit](https://github.com/RustSec/rustsec/tree/main/cargo-audit) to catch security issues in dependencies

# Possible investors

Below is a list of possible investors for the `rust-av` project:

- [Prossimo](https://www.memorysafety.org/): develops memory-safe multimedia software, e.g.
  [rav1d](https://github.com/memorysafety/rav1d). **Prossimo** might be interested in investing in Rust-only projects focused on memory safety because one of its objectives is a safer web.
- [System76](https://system76.com/): develops the [Pop!\_OS](https://pop.system76.com/) operating system using Rust, so a multimedia library written entirely in this language could be of some interest to them.
- [Servo](https://servo.org/): a browser engine currently hosted by [Linux Foundation Europe](https://linuxfoundation.eu/). It currently uses `GStreamer` as its multimedia pipeline.
- Any company interested in **sponsoring** our framework
- **Anyone** who wants to replace multimedia libraries with a safer alternative written in Rust

# Proof of concept (POC)

A proof of concept (POC) lets us test and validate the actual feasibility of the toolkit we are building. Each subsection describes an environment, in the form of a concrete example, in which our APIs can be presented and combined. The main goal of this proof of concept is to grasp the potential of `rust-av` through simple use cases.

## Multimedia API

@lu-zero

### Example

## Artificial Intelligence

Adding a crate to run artificial intelligence filters and algorithms which perform video enhancement adds value to a multimedia framework. Starting from the functionalities provided by the [burn](https://github.com/tracel-ai/burn) framework, we build a simple filter which:

1. Loads an `ONNX` file containing a video-enhancer neural network
2. Runs the `burn` model derived from the ONNX file on a degraded video
3. Returns the enhanced video as output

We do not make any **training APIs** available inside the crate.

### Example

@Luni-4

## Hardware decoder

Some of the most recent decoders are implemented directly in hardware. Modern GPUs and APUs contain **video decoding ASICs[^asic]** to decode cutting-edge formats, such as AV1.
Those integrated circuits provide a series of registers and computation units which speed up video processing by executing many instructions in parallel and running part of the decoding process in dedicated blocks.

### Common hardware in consumer PCs

The main GPU vendors and the video codec SDKs for their hardware are:

- NVIDIA with the [Video Codec SDK](https://developer.nvidia.com/video-codec-sdk) (NVENC for encoding, NVDEC for decoding)
- AMD with [AMF](https://github.com/GPUOpen-LibrariesAndSDKs/AMF)
- Intel with [VPL](https://www.intel.com/content/www/us/en/developer/tools/vpl/overview.html) (supports both integrated graphics chips and dedicated graphics cards)

The supported features (e.g. video codecs, post-processing effects) vary by vendor and hardware generation. New features in these SDKs are often first implemented in [OBS Studio](https://obsproject.com/) because of its popularity among streamers. FFmpeg is also often one of the first projects to integrate new features.

### Example

# Future goals

- A crate to manage video subtitles through [cosmic-text](https://github.com/pop-os/cosmic-text), a multi-line text handling library which provides advanced text shaping, layout, and rendering primitives wrapped up into a simple abstraction.

# rust-av Organization Roadmap

This is a preliminary roadmap for the `rust-av` organization crates. We will later convert it into a real `GitHub` roadmap. The order of the tasks is not set in stone; it may change depending on different factors such as developers' time availability, incoming requests, and so on.

- [ ] 1. Refresh the [matroska](https://github.com/rust-av/matroska) crate, implementing more parts of the [matroska specification](https://github.com/ietf-wg-cellar/matroska-specification)
- [ ] 2. Implement a crate which parses `.ts` files
- [ ] 3. Finish the [opus](https://github.com/rust-av/opus) decoder. Some parts have already been implemented locally but not yet pushed to its repository.
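
As a rough illustration of what the `.ts` parsing task in the roadmap involves, and assuming `.ts` refers to MPEG transport stream files, the sketch below reads the fixed 4-byte header of a single 188-byte packet, which starts with the sync byte `0x47`. The names `TsHeader`, `TsError`, and `parse_ts_header` are hypothetical and do not refer to any existing `rust-av` API.

```rust
/// Size in bytes of one MPEG transport stream packet.
pub const TS_PACKET_SIZE: usize = 188;
/// Every transport stream packet starts with this sync byte.
pub const TS_SYNC_BYTE: u8 = 0x47;

/// Fields of the 4-byte packet header (hypothetical type for illustration).
#[derive(Debug, PartialEq, Eq)]
pub struct TsHeader {
    pub transport_error: bool,
    pub payload_unit_start: bool,
    /// 13-bit packet identifier.
    pub pid: u16,
    /// 2-bit field telling whether an adaptation field and/or payload follow.
    pub adaptation_field_control: u8,
    /// 4-bit counter used to detect lost packets.
    pub continuity_counter: u8,
}

/// Errors this sketch can report (hypothetical type for illustration).
#[derive(Debug, PartialEq, Eq)]
pub enum TsError {
    /// Fewer than 188 bytes were provided.
    TooShort,
    /// The first byte was not 0x47; the offending byte is returned.
    BadSync(u8),
}

/// Parses the header of one packet using only safe slice indexing.
pub fn parse_ts_header(packet: &[u8]) -> Result<TsHeader, TsError> {
    if packet.len() < TS_PACKET_SIZE {
        return Err(TsError::TooShort);
    }
    if packet[0] != TS_SYNC_BYTE {
        return Err(TsError::BadSync(packet[0]));
    }
    Ok(TsHeader {
        transport_error: (packet[1] & 0x80) != 0,
        payload_unit_start: (packet[1] & 0x40) != 0,
        // The PID spans the low 5 bits of byte 1 and all of byte 2.
        pid: (u16::from(packet[1] & 0x1F) << 8) | u16::from(packet[2]),
        adaptation_field_control: (packet[3] >> 4) & 0x03,
        continuity_counter: packet[3] & 0x0F,
    })
}
```

A real crate would go well beyond this: it would also need to handle adaptation fields, the PAT/PMT tables, and resynchronization after corrupted data. The sketch only shows that the header can be decoded without any `unsafe` code.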
[^asic]: https://en.wikipedia.org/wiki/Category:Video_compression_and_decompression_ASIC

# Communication Tools

- [Jitsi](https://meet.jit.si/) is free video conferencing software which lets us create private rooms for meetings. It only requires one group member to act as moderator, logged in with a GitHub account. It places no limit on the number of participants. On desktop, there is no need to install any application; it can be used from the browser.
- [Telegram](https://desktop.telegram.org/?setln=en) is an instant messaging application initially developed for smartphones, which can now be used on desktop too. We can share unstructured information and talk freely there by creating a private group.
- [GitHub](https://github.com/) is a repository hosting service. We can use [discussions](https://docs.github.com/en/discussions), a collaborative forum, to talk about the approaches we want to follow for our repositories. Then we can use issues to describe features and technical problems.