# Quansight's PyTorch Contributions in 2021
> TODO: Add here a "Contents" section (perhaps in two columns) with links to all the sections / subsections.
## Intro
>What is PyTorch (NumPy-like with GPU / TPU support, autograd, AMP, distributed...)
Why is Quansight a very good match for this project, and how we can bring expertise from 10+ years of OSS support in the PyData world.
Throw a few numbers on why PyTorch is massive (x10 loc than NumPy), very active (800+ active contributors).
## Quansight Contributions
We have a team of 15+ engineers involved in many aspects of PyTorch development. We distinguish the following as our main topic areas: Scientific Computing, Python Array API Standard and NumPy compatibility, Maintainability, Torchvision, and Research topics.
### Python Array API and NumPy Compatibility
In 2010, if a user needed to perform numerical computations on multidimensional data, they would import [NumPy](https://numpy.org/) or its extension [SciPy](https://scipy.org/), store their data in an array, and start taking advantage of the speed of operations implemented in C from the comfort of the Python language. Nowadays, the landscape is quite different. We have libraries with autograd, GPU and TPU support oriented towards deep learning, like [PyTorch](https://pytorch.org/), [TensorFlow](https://www.tensorflow.org/), [JAX](https://jax.readthedocs.io/) or [MXNet](https://mxnet.apache.org/), libraries that provide a CUDA extension for NumPy-like code, like [CuPy](https://cupy.dev/), and libraries that provide cluster-level parallelism, like [Dask](https://dask.org/).
At the same time, not only are these libraries used by millions of people, but they also serve as basic building blocks for most of the other tools in the PyData world. Most other libraries for data science in Python, such as [pandas](https://pandas.pydata.org/), [scikit-learn](https://scikit-learn.org/), or [matplotlib](https://matplotlib.org/), use and consume arrays, and build on top of them to implement higher-level functionality.
The [Python Array API](https://data-apis.org/array-api/latest/) aims to serve as a bridge between these two realities. It is a standard that specifies a common API for array libraries. If downstream libraries write their code in terms of this API, that code becomes library-agnostic. This means that the user can choose the backend library that manages the arrays internally depending on their use case. For this to be possible, the Python Array API is largely based on (a curated subset of) NumPy's API, which is the gold standard that most other libraries follow.
From a user's perspective, if they have a large codebase written in NumPy and want to migrate it to another library that implements the Python Array API, doing so should be as easy as changing the imports.
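
As a rough illustration of what library-agnostic code looks like, the sketch below writes a small function once and runs it with both NumPy and PyTorch. The helper `unit_norm` and its `xp` parameter are hypothetical names chosen for this example. The standard itself lets code obtain the namespace from the array; passing it explicitly is a simplification that avoids assuming any particular level of support in either library.

```python
import numpy as np
import torch


def unit_norm(x, xp):
    """Scale `x` to unit Euclidean norm.

    `xp` is the array namespace (here simply the library module). Only
    functions that both libraries expose under the same name are used.
    """
    return x / xp.sqrt(xp.sum(x * x))


# The same function body runs unchanged on two different array libraries.
np_result = unit_norm(np.asarray([3.0, 4.0]), np)
torch_result = unit_norm(torch.tensor([3.0, 4.0]), torch)
```

Switching the backend amounts to changing which module is passed in, which is the "changing the imports" experience described above.
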
PyTorch decided to implement the Python Array API in May 2021. As of version 1.12, PyTorch implements >90% of its functionality. Quansight has been instrumental in this process in both directions: by extending the functionality that PyTorch provides to cover that specified in the API, and by improving and fine-tuning the API standard itself based on the knowledge acquired during years of developing PyTorch core alongside NumPy and SciPy.
### Scientific PyTorch
Scientific computing was the branch of knowledge that first inspired data analysis. PyTorch, although a deep learning library at heart, is also widely used in the scientific community, where there is a growing trend of boosting classical numerical methods and algorithms with the parallelism of GPUs and the information given by autograd.
Quansight has a number of global experts in this area coming from the SciPy community, or more generally, the PyData community. As such, many of the contributions of Quansight within PyTorch have been in this area. Mostly driven by Quansight efforts, PyTorch 1.12 includes a number of popular modules from SciPy, together with CUDA and autograd support.
#### Linear Algebra: `torch.linalg`
[PyTorch 1.9](https://pytorch.org/blog/torch-linalg-autograd/) included a [`linalg` module](https://pytorch.org/docs/stable/linalg.html) that implemented all the functionality from `numpy.linalg` in a NumPy-compatible way, together with CUDA acceleration and autograd support. Since its release, this module has been expanded to also include a number of popular functions from `scipy.linalg` and more. This module was created and is actively maintained by a group of Quansight engineers.
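
A minimal sketch of what this looks like in practice: solving a small linear system with `torch.linalg.solve` and differentiating through the solution. The matrix and right-hand side are arbitrary values picked for illustration.

```python
import torch

# A small, well-conditioned system A x = b (values chosen for illustration).
A = torch.tensor([[4.0, 1.0], [1.0, 3.0]], dtype=torch.float64, requires_grad=True)
b = torch.tensor([1.0, 2.0], dtype=torch.float64)

x = torch.linalg.solve(A, b)             # same call shape as numpy.linalg.solve
residual = torch.linalg.norm(A @ x - b)  # should be close to zero

# Autograd support: differentiate a scalar function of the solution w.r.t. A.
x.sum().backward()
grad_A = A.grad
```

Moving `A` and `b` to a CUDA device would run the same code on the GPU.
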
#### Forward and backward AD
Automatic differentiation (AD) is arguably the main feature that deep learning frameworks bring to the table over traditional array libraries. Quansight is actively involved in the support and implementation of correct and efficient derivatives. In particular, it helped implement many of the forward AD formulas that made possible the release of [support for forward AD mode in PyTorch 1.11](https://pytorch.org/tutorials/intermediate/forward_ad_usage.html).
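
For readers unfamiliar with forward mode, here is a short sketch using the `torch.autograd.forward_ad` API from the linked tutorial. It computes a Jacobian-vector product of `torch.sin` in a single forward pass; the input and tangent values are arbitrary.

```python
import torch
import torch.autograd.forward_ad as fwAD

primal = torch.tensor([0.0, 1.0, 2.0])
tangent = torch.ones(3)  # direction in which to differentiate

with fwAD.dual_level():
    # A dual tensor carries a value together with its tangent.
    dual_input = fwAD.make_dual(primal, tangent)
    dual_output = torch.sin(dual_input)
    # Unpack the result into its value and the Jacobian-vector product.
    value, jvp = fwAD.unpack_dual(dual_output)
```

Here `jvp` should equal `cos(primal) * tangent`, obtained without building a backward graph.
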
#### Complex numbers
PyTorch 1.10 came out with [complex numbers support](https://pytorch.org/docs/master/complex_numbers.html), and optimization over complex tensors. This feature had been requested [since the beginning of PyTorch](https://github.com/pytorch/pytorch/issues/755), and it has deep applications in fields ranging from signal processing to quantum mechanics. Quansight helped generalize the formulas for many functions and their derivatives to the complex case. The foundations of how to do so are not well understood by the community, so a number of people from Quansight are currently working on publishing a paper formalizing the ideas and semantics that drive PyTorch's design from a theoretical point of view.
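
A small sketch of differentiating a real-valued loss with respect to a complex tensor. The value `3 + 4j` is arbitrary; the point is only that `backward()` works on complex leaves.

```python
import torch

z = torch.tensor([3.0 + 4.0j], requires_grad=True)

# A real-valued loss of a complex variable: |z|^2.
loss = z.abs().pow(2).sum()
loss.backward()

grad_z = z.grad  # a complex tensor with the same shape as z
```

For this loss the gradient is `2 * z`, which is what one would obtain by treating the real and imaginary parts as two independent real variables.
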
#### Mathematical functions: `torch.special`
PyTorch 1.9 introduced the [`torch.special` module](https://pytorch.org/docs/stable/special.html), modelled after the `scipy.special` module. This module contains special functions used in mathematics such as the Riemann zeta function or the gamma function. These functions are of paramount importance in fields like physics, mathematics and statistics, but they also appear when modelling complex systems in biology and mechanics. The module extends SciPy's by adding GPU and autograd support to its functions. It was implemented and is currently maintained by engineers at Quansight.
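
The sketch below evaluates two of these functions and takes a derivative through one of them. It assumes a PyTorch version recent enough to ship `torch.special.gammaln` and `torch.special.zeta`, which may be later than 1.9 for some functions.

```python
import torch

x = torch.tensor([2.0, 3.0, 5.0], requires_grad=True)

# log Gamma(x); for integers this is log((x - 1)!).
log_gamma = torch.special.gammaln(x)

# Autograd support: d/dx log Gamma(x) is the digamma function.
log_gamma.sum().backward()
grad_x = x.grad

# Hurwitz zeta function; with second argument 1 it reduces to Riemann zeta.
zeta_2 = torch.special.zeta(torch.tensor(2.0), torch.tensor(1.0))
```
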
#### Fast Fourier Transforms: `torch.fft`
PyTorch 1.8 introduced the `torch.fft` module implementing fast Fourier transforms fully compatible with `numpy.fft`. All functions support CPU or GPU acceleration and complex autograd. There are plans to expand this module with discrete sine and cosine transform algorithms, compatible with `scipy.fft`. This module is written and maintained by Quansight engineers.
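
As a brief sketch of the NumPy-style interface, the following transforms a pure sine wave, locates its peak frequency bin, and inverts the transform. The signal length and frequency are arbitrary choices for this example.

```python
import math

import torch

n = 64
t = torch.arange(n, dtype=torch.float64) / n
signal = torch.sin(2 * math.pi * 5 * t)  # exactly 5 cycles over the window

spectrum = torch.fft.rfft(signal)        # mirrors numpy.fft.rfft
peak_bin = int(spectrum.abs().argmax())  # index of the dominant frequency

# The inverse transform recovers the original signal.
roundtrip = torch.fft.irfft(spectrum, n=n)
```
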
#### Interpolation
Up/down sampling algorithms are at the core of many algorithms in the field of computer vision. They are used both for preprocessing the data and as components to assess the quality of a given model. [A paper published in 2021](https://github.com/GaParmar/clean-fid) showed that most major deep learning libraries suffered from poor scaling in their interpolation algorithms, giving vastly incorrect results. All these issues have been addressed and corrected by Quansight engineers, implementing new stable and efficient algorithms and their derivatives on CPU and GPU in PyTorch 1.11.
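
One user-visible piece of this work, as far as we can tell from the public API, is the `antialias` flag of `torch.nn.functional.interpolate`. The sketch below downsamples a random image with and without it; the image size and scale factor are arbitrary.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
image = torch.rand(1, 3, 64, 64)  # (batch, channels, height, width)

# Plain bilinear downsampling only looks at a few pixels around each sample point.
small_plain = F.interpolate(image, size=(16, 16), mode="bilinear", align_corners=False)

# With antialias=True the filter support grows with the downscaling factor.
small_antialiased = F.interpolate(
    image, size=(16, 16), mode="bilinear", align_corners=False, antialias=True
)
```
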
### Maintainability
Given the speed of development of PyTorch, with more than a hundred people working full-time on it, usual software engineering practices like testing, integration, benchmarking and documentation are fundamental to be able to provide a seamless user experience. The main challenge here is that textbook solutions often do not scale to projects of the size and complexity of PyTorch. By leveraging years of experience developing and maintaining many other large OSS projects, Quansight has been able to help in the sustainable growth of PyTorch.
#### Automated testing
In 2021, PyTorch set out to reduce and standardize its 100K+ lines of tests into what are referred to as `OpInfo`s and `ModuleInfo`s. Given the number of subsystems within PyTorch (forward AD, backward AD, strided tensors, different JIT backends...) and its extensive API (2000+ functions), it was not sustainable to manually write tests for all these functions against all subsystems. The solution was to write generic tests for each of the subsystems that would be fed the operations together with their characteristics, and the test would then know how to test that function. Quansight engineers have been involved in the implementation of generic tests and in adding support for more and more operations to increase the test coverage. While doing this, the engineers at Quansight have also been involved in fixing the bugs that were found in the process.
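
To make the idea concrete, here is a deliberately simplified sketch. `OpSpec`, `OP_DB` and the two check functions are hypothetical names invented for this post; they are not PyTorch's actual `OpInfo` classes, which carry far more metadata.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

import torch


@dataclass
class OpSpec:
    """Hypothetical, much-simplified stand-in for an OpInfo entry."""
    name: str
    op: Callable[[torch.Tensor], torch.Tensor]
    reference: Callable[[float], float]  # scalar reference implementation
    sample_inputs: Callable[[], List[torch.Tensor]]


def positive_samples() -> List[torch.Tensor]:
    return [torch.tensor([0.5, 1.0, 2.0], dtype=torch.float64)]


# Each operation is described once...
OP_DB = [
    OpSpec("exp", torch.exp, math.exp, positive_samples),
    OpSpec("log", torch.log, math.log, positive_samples),
    OpSpec("sqrt", torch.sqrt, math.sqrt, positive_samples),
]


# ...and each generic test is written once per subsystem.
def check_against_reference(spec: OpSpec) -> None:
    for x in spec.sample_inputs():
        expected = torch.tensor([spec.reference(v) for v in x.tolist()], dtype=x.dtype)
        torch.testing.assert_close(spec.op(x), expected)


def check_backward(spec: OpSpec) -> None:
    for x in spec.sample_inputs():
        x = x.clone().requires_grad_()
        assert torch.autograd.gradcheck(spec.op, (x,))


for spec in OP_DB:
    check_against_reference(spec)
    check_backward(spec)
```

Adding a new operation then means adding one entry to the database, and every generic test picks it up automatically.
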
#### `torch.testing`
Internally, PyTorch had developed elaborate utilities that are needed for testing, e.g. creating random tensors for a given specification and comparing the results of tensor operations. With the ever-growing ecosystem, the demand for having these utilities publicly accessible also grew. In 2021, Quansight engineers started to design and implement a system that is able to handle the complex internal needs of the PyTorch project while simultaneously providing downstream libraries the tools they need. Soon after its inception, `torch.testing` saw adoption by other projects. In early 2022, not even a year after the beginning of the project, the module reached a stable state.
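
A short sketch of the two utilities mentioned above, `make_tensor` and `assert_close`, as exposed in the public `torch.testing` namespace. The shapes and value ranges are arbitrary.

```python
import torch
from torch.testing import assert_close, make_tensor

# Create a random tensor matching a given specification.
a = make_tensor((2, 3), dtype=torch.float32, device="cpu", low=-1.0, high=1.0)

# Passes silently: the two results agree within default tolerances.
assert_close(a * 2, a + a)

# Fails with a descriptive AssertionError when values differ too much.
mismatch_detected = False
try:
    assert_close(torch.tensor([1.0, 2.0]), torch.tensor([1.0, 2.1]))
except AssertionError:
    mismatch_detected = True
```
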
#### Structured Kernels
Even though the API surface of PyTorch is remarkably large, there are a few properties that most, if not all, operations within PyTorch share. A PyTorch function is fed one or more tensors and perhaps other arguments, and returns some new tensors. This simple observation makes it possible to factor any PyTorch operation into two steps: first creating the output tensors given the input tensors, and then computing the values of the operation and filling the output tensors. If we skip the actual computation, this factorization lets us, for example, figure out the size and other properties of all the intermediate tensors of a neural network without really running the model. Quansight engineers have been involved in the design and implementation of parts of this mechanism, and are actively involved in the migration of PyTorch functions to this more flexible model.
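
One place where this split is visible from Python is the `meta` device, whose tensors carry shapes and dtypes but no data. The sketch below is only an illustration of the "shape without computation" idea, with arbitrary sizes.

```python
import torch

# Tensors on the "meta" device have metadata but no storage.
x = torch.empty(8, 16, device="meta")
w = torch.empty(16, 4, device="meta")

# Only the output-creation step runs; no values are computed.
y = torch.relu(x @ w)

output_shape = tuple(y.shape)
output_device = y.device.type
```
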
#### Build times
Editing any PyTorch operator often required rebuilding thousands of C++ and CUDA files and was highly disruptive to the development cycle. Quansight engineers have profiled and eliminated bottlenecks in PyTorch's parallel builds, and fixed structural issues in PyTorch's core C++ codebase that led to thousands of files being rebuilt unnecessarily. Typical build times when switching branches went from 20 minutes to 5 minutes or less.
#### Docs and docs infra
From a usability perspective, a library is only as good as its documentation. Quansight engineers have been, and currently are, involved in rewriting major sections of the documentation of PyTorch. We have also been involved in updating and maintaining the infrastructure that runs and hosts all the documentation pages within PyTorch, improving the formatting of the docs and the overall user experience.
#### Type annotations
Type annotations for PyTorch were an oft-requested feature by users. They help with catching errors and with code completion in IDEs. Two Quansight engineers improved type annotation support significantly by adding a testing framework, running Mypy in CI, moving existing type annotations from stubs inline, and fixing a large number of issues. By April 2021, type annotation support in PyTorch was declared complete.
#### Port legacy code from C to C++
PyTorch originally started as a Python port of the Lua library Torch, which itself was a C library with Lua bindings. From its start, PyTorch decided to rewrite its backend completely in terms of two in-house C++ libraries. The process of migrating the macro-based C backend to the higher-level C++ one started in 2016, and it was completed at the end of 2021 with a final push from an engineer from Quansight, who helped migrate a large amount of highly non-trivial functionality.
#### High prio issues
From its initial involvement in the project in 2019, Quansight has been actively helping Meta deal with high priority issues. These are bugs reported by users that are considered critical, or feature requests that got enough attention from the community to be deemed of particular interest. During the last year, Quansight engineers closed XXX high priority issues.
### Torchvision
#### Datasets & Transforms
`torchvision.datasets` and `torchvision.transforms` have been part of `torchvision` since the initial release in 2016. Their original purpose was to assist image classification scenarios, and for this use case they work well. Soon afterwards, demand grew for other vision tasks like object detection, video classification, and optical flow.
The original API was able to partially support these use cases as well, but there was never a general way to do so. In mid 2021, Quansight and Meta engineers started completely redesigning the API to achieve convergence between the different tasks. This work is still ongoing, and can be found in the `torchvision.prototype` namespace.
Although the revamp brings a plethora of improvements, the most important change to highlight here is that datasets now return everything they have to offer, and transforms now handle that without any need for manual intervention. For example, if a dataset provides a bounding box together with the image, all transformations that alter the shape of the image are also applied to the bounding box to keep them in sync.
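
The bounding box example can be sketched in a few lines of plain PyTorch. The function `horizontal_flip` below is a hypothetical helper written for this post, not the `torchvision.prototype` API; it only illustrates what "keeping them in sync" means for a single transform.

```python
import torch


def horizontal_flip(image: torch.Tensor, boxes: torch.Tensor):
    """Flip an image of shape (C, H, W) and its boxes of shape (N, 4).

    Boxes are assumed to be in (x1, y1, x2, y2) pixel coordinates.
    """
    width = image.shape[-1]
    flipped_image = image.flip(-1)
    x1, y1, x2, y2 = boxes.unbind(-1)
    # Mirroring swaps the roles of the left and right edges.
    flipped_boxes = torch.stack([width - x2, y1, width - x1, y2], dim=-1)
    return flipped_image, flipped_boxes


image = torch.arange(40.0).reshape(1, 4, 10)
boxes = torch.tensor([[1.0, 0.0, 3.0, 4.0]])
flipped_image, flipped_boxes = horizontal_flip(image, boxes)
```
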
#### Video reading
We have seen widespread adoption of torchvision video backends and datasets in 2021. Quansight engineers, in collaboration with community developers and Meta engineers, have continued to push the performance, reliability and accuracy of video infrastructure in torchvision to match the new demand. We refactored and updated the existing API to support the latest versions of FFmpeg system libraries and resolved numerous issues related to video modules. We have also worked closely with engineers from Meta and NVIDIA to bring support for GPU decoding, one of the most-requested features, to torchvision and have integrated it into the existing infrastructure.
### Research Topics
Engineers at Quansight are also often actively involved in the research part of PyTorch. This involves features that are either expected to be used by researchers or are implemented as proofs of concept of promising ideas, and that do not necessarily have equivalents in other deep learning frameworks. These topics require strong design capabilities, together with a good knowledge of the current research literature, which fit well with the academic background of many of the engineers at Quansight.
#### Sparse Tensors
Sparse data appears naturally in fields with high-dimensional datapoints, like vision, chemistry and drug synthesis, or analysis of time sequences in geology or biology. While sparse tensors have been around for as long as data analysis, the semantics for sparse operations in the context of autograd are still far from well-understood. A team of Quansight engineers is involved in both the design and the implementation of sparse operations and their derivatives within PyTorch. The current goal is to have PyTorch match the capabilities of `scipy.sparse`, together with GPU support and sparse gradients when possible.
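
A minimal sketch of a sparse operation with autograd: a sparse COO matrix multiplied by a dense matrix, with gradients flowing to the dense operand. The indices and values are arbitrary.

```python
import torch

# A 3x3 sparse matrix with three non-zero entries, in COO format.
indices = torch.tensor([[0, 1, 2],   # row indices
                        [2, 0, 1]])  # column indices
values = torch.tensor([1.0, 2.0, 3.0])
S = torch.sparse_coo_tensor(indices, values, size=(3, 3))

D = torch.ones(3, 2, requires_grad=True)

out = torch.sparse.mm(S, D)  # sparse @ dense -> dense
out.sum().backward()         # gradient w.r.t. the dense operand
grad_D = D.grad
```
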
#### Functorch
With its release in 2019, JAX introduced a new way of thinking about machine learning. JAX showed what many programming language researchers had theorized for years: it was not only possible but sound to implement an efficient ML framework based on functional programming principles. [Functorch](https://github.com/pytorch/functorch) (for Functional PyTorch) is an approach to marry the benefits and simplicity of the higher-order functional transformations from JAX with the ease of use of the eager mode and class-based approach from PyTorch. Quansight has a number of engineers participating in this project, which will be first released as an external library for PyTorch 1.12. This library is intended to stay as an out-of-tree project until its design is stable. Then, it is planned to be merged into core PyTorch.
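
The sketch below composes two such transformations, `grad` and `vmap`, to compute per-sample gradients. The import falls back between the `torch.func` namespace found in later PyTorch releases and the standalone `functorch` package; which one is available depends on the installed version. The loss function is a toy example.

```python
import torch

try:
    from torch.func import grad, vmap   # in-tree location in later releases
except ImportError:
    from functorch import grad, vmap    # standalone out-of-tree package


def loss(w, x):
    # Toy scalar loss for a single sample x.
    return (w * x).sum() ** 2


w = torch.tensor([1.0, 2.0, 3.0])
xs = torch.arange(12.0).reshape(4, 3)  # a batch of 4 samples

# grad differentiates w.r.t. the first argument; vmap maps over the batch of x.
per_sample_grads = vmap(grad(loss), in_dims=(None, 0))(w, xs)
```

Each row of `per_sample_grads` is the gradient of the loss for one sample, computed without a Python loop.
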
#### Parametrizations
In the same way that data is often preprocessed and cleaned up before being analyzed, transforming the weights of a layer before they are used within it can act as a regularizer to [stabilize the training of a network](https://arxiv.org/abs/1802.05957). PyTorch 1.9 added a way to [parametrize parameters of neural networks](https://pytorch.org/docs/master/_modules/torch/nn/utils/parametrizations.html) in a composable and extensible way. The design of this feature stemmed from the research carried out by one of the engineers at Quansight during their doctoral studies, who then went on to implement this feature in PyTorch core.
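
As a small sketch of the mechanism, the following registers a parametrization that forces the weight of a linear layer to be symmetric. The `Symmetric` module is an illustrative example chosen for this post, not one of the built-in parametrizations.

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize


class Symmetric(nn.Module):
    """Map an unconstrained square matrix to a symmetric one."""

    def forward(self, X):
        # Keep the upper triangle and mirror it below the diagonal.
        return X.triu() + X.triu(1).transpose(-1, -2)


layer = nn.Linear(3, 3)
parametrize.register_parametrization(layer, "weight", Symmetric())

# Accessing the attribute now returns the transformed (symmetric) weight.
W = layer.weight
output = layer(torch.ones(2, 3))
```

The optimizer still updates an unconstrained tensor, while the layer always sees the constrained one.
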
## Closing Remarks
> What to write here? (1-2 sentences per point:)
> 1. Credits: name our team members and PyTorch maintainers
> 2. Concluding remark on 2021 (impact, learning & growth of team)
> 3. Looking forward to 2022
The overall impact of this work
This turned into a very long post, which reflects the huge amount of effort put in by our team of 15+ engineers. This was a true team effort, with contributions from: Ivan Yashchuk, Peter Bell, Mario Lezcano, Ralf Gommers, Nikita Vedeneev, Kurt Mohler, Kshiteej Kalambarkar, Thomas Fan, Philip Meier, Yukio Siraichi, Victor Fomin, Pearu Peterson, Kushashwa Shrimali, Sameer Deshmukh, Hameer Abbasi, Bruno Korbar, Nikita Karetnikov, Matti Picus, Antonio Cuni, Guilherme Leobas, Alexander Ocsa, Edgar Margffoy, and Anirudh Dagar.
All this work also wouldn't have been possible without the excellent collaboration with and support from PyTorch and Torchvision maintainers at Meta. We'd like to thank Mike Ruberry, Natalia Gimelshein, Edward Yang, Alban Desmaison, Anjali Chourdia, Christian Puhrsch, Joel Schlosser, Nikita Shulga and Richard Zou (PyTorch) and Vasilis Vryniotis, Francisco Massa, Prabhat Roy and Nicolas Hug (Torchvision) in particular.