# Migrating crash ingestion Python services and libraries from pip and pip-tools to uv

## Rationale

The crash ingestion Python services and libraries use `pip` and `pip-tools` to manage dependencies. Over the past year, we hit bugs or incompatibilities in these tools with increasing frequency. During the last few months, we had to debug new issues with almost every round of monthly Dependabot upgrades. We hope that migrating to `uv` will eliminate, or at least significantly reduce, the time we have to spend on issues with the dependency management tooling itself.

There are some additional reasons for this migration:

* `uv` is extremely fast -- 10 to 100 times faster than `pip` [according to Astral's benchmarks](https://github.com/astral-sh/uv/blob/main/BENCHMARKS.md). Docker image builds become faster, both locally and in CI. This can save a significant amount of time during development work.
* `uv` is more flexible than our current tooling. For example, we will be able to pin individual packages to an alternative index. With `pip`, every package is looked up on all indexes.
* `uv` is quickly becoming the industry standard for Python package and dependency management. Its speed, flexibility and ease of use provide a compelling improvement over older tools. `uv` has already been adopted by many projects at Mozilla, including projects maintained by the Firefox Delivery Tools team.

## Overview of the current dependency management approach

* **Dependency declaration:** Dependencies are declared with their exact versions in `requirements.in`.
* **Dependency resolution:** Dependencies are resolved using `pip-compile` when running `just rebuildreqs`, producing `requirements.txt` (roughly sketched after this list). This "lock file" includes the exact versions of transitive requirements in addition to the direct dependencies, and contains package hashes for all packages.
* **Dependency pinning:** We include package hashes in `requirements.txt` by passing `--generate-hashes` to `pip-compile`.
* **Verifying the lock file:** Since we specify exact versions in `requirements.in`, the result of the compilation is fully determined by the input. In CI, we verify that `requirements.txt` matches what we get when compiling `requirements.in`.
* **Dependency installation:** Dependencies from `requirements.txt` are installed using `pip install` into the system Python environment of the Docker image.
* **Running the application:** Since dependencies are installed into the system Python environment of the Docker image, simply running the application with the system Python interpreter will make the dependencies available.
* **Keeping dependencies up to date:** We rely on GitHub's Dependabot for most dependency upgrades.
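For reference, the workflow above boils down to roughly the following commands. This is a sketch; the exact flags and file names live in the `justfile` and Dockerfiles and may differ.

```bash
# Resolve requirements.in into a fully pinned requirements.txt with package
# hashes (roughly what `just rebuildreqs` runs via pip-compile).
pip-compile --generate-hashes --output-file requirements.txt requirements.in

# Install the pinned dependencies into the system Python environment of the
# Docker image; pip verifies hashes because requirements.txt contains them.
pip install -r requirements.txt
```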
## Suggested uv-based dependency management approach

`uv` is more flexible than our current tooling. It allows us to imitate our current approach exactly by using the `uv pip` interface. We would basically replace calls to `pip` with `uv pip` and calls to `pip-compile` with `uv pip compile`. While the `uv` replacements aren't exact drop-in replacements, the approach would likely work for our use case. It would probably also help with some of the issues we've experienced – mostly incompatibilities between `pip` and `pip-tools` versions and other bugs in these tools.

However, the `uv pip` interface mostly exists to support legacy workflows, and is not intended as the main `uv` workflow. I think we are better off using `uv` the way it's actually intended. This section lays out the approach I suggest we use.

* **Dependency declaration:** Dependencies are declared in `pyproject.toml`. We only include version ranges where we care, e.g. for Django. Most dependencies can be left unconstrained; exact versions are still pinned in `uv.lock`.
* **Dependency resolution:** Dependencies are resolved using `uv lock`. This command updates the lock file, e.g. to reflect manual changes to the dependencies in `pyproject.toml`. We should expose it as a `just` command, e.g. `just uv-lock`, which replaces `just rebuildreqs`.
* **Dependency pinning:** `uv` includes package hashes in `uv.lock` by default.
* **Verifying the lock file:** Since we no longer have exact versions in `pyproject.toml`, the information in the lock file isn't completely determined by its input. Instead of verifying that regenerating the lock file from scratch gives exactly the same result, we only need to verify that the lock file is consistent with the inputs. `uv` verifies that automatically when installing the dependencies in the Dockerfile, so we don't need to do anything explicitly.
* **Dependency installation:** The dependencies are installed into `uv`'s managed virtual environment in the Dockerfile using `uv sync --locked`. The `--locked` flag ensures that the lock file is not updated, and that the operation fails if the lock file is not consistent with the dependency declarations in `pyproject.toml`.
* **Running the application:** We need to make sure that we use the Python interpreter from the managed virtual environment inside the Docker container. This can be achieved by adding `/app/.venv/bin` at the beginning of the `PATH` environment variable in the Dockerfile:

  ```dockerfile
  ENV PATH="/app/.venv/bin:$PATH"
  ```

  (`/app/.venv/bin/python` is just a symlink to the system Python interpreter; using this path still ensures that dependencies in the virtual environment will be found by the interpreter.)
* **Keeping dependencies up to date:** Dependabot's support for `uv` looks sufficient for our use case – see the dedicated section further down.

## Docker integration

We need to adapt our Dockerfiles to use `uv` instead of `pip`. First, this requires that `uv` is present in the Docker image in the first place. Astral offers the Debian-based Python images we use with `uv` pre-installed, so we can simply switch to these images to have `uv` available inside the Docker image.

To install dependencies, the `uv` documentation [recommends this approach](https://docs.astral.sh/uv/guides/integration/docker/#intermediate-layers):

```dockerfile
RUN --mount=type=cache,target=/root/.cache/uv \
    --mount=type=bind,source=uv.lock,target=uv.lock \
    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
    uv sync --locked
```

Since we don't need to install the app itself, we can omit `--no-install-project` and the second call to `uv sync` after copying the source files. We can consider running this step as the `app` user already, since it doesn't require root permissions. The above command uses a cache mount for the package cache, which ensures the cache isn't left behind in the image.

`uv` places the managed virtual environment inside the app directory, e.g. at the path `/app/.venv`. This is fine in production, but it can cause issues in the development environment, where the `/app` directory is mounted from the host machine. The virtual environment is platform-specific and only works inside the Docker container, but it's leaked to the host machine via the volume, and it's likely to at least confuse IDEs there.

We should make sure that the venv on the host machine and in the Docker container are kept separate. We can prevent the host's venv from being added to the Docker container by adding `.venv` to `.dockerignore`, and we can prevent the Docker container's venv from leaking out to the host during development by adding a separate volume for `/app/.venv` to our `docker-compose.override.yaml` files, which are only used for the development environment.
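A minimal sketch of such an override, assuming a hypothetical service name `web` – the actual service names and mount paths depend on the repository:

```yaml
# docker-compose.override.yaml (development only)
services:
  web:
    volumes:
      # Mount the app directory from the host for live code changes ...
      - .:/app
      # ... but overlay /app/.venv with a named volume so the container's
      # virtual environment never shows up on the host.
      - venv:/app/.venv

volumes:
  venv:
```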
## Dependabot support

* Support generally exists, but may have limitations. It looks good enough for our use case. In particular, there's [support for security updates as of December](https://github.com/astral-sh/uv/issues/2512#issuecomment-3644062604).
* Documentation:
  * By Astral: https://docs.astral.sh/uv/guides/integration/dependency-bots/#dependabot
  * By GitHub: Nothing really, but this issue is useful: https://github.com/dependabot/dependabot-core/issues/12609
* Dependabot bugs with respect to uv: https://github.com/dependabot/dependabot-core/issues?q=state%3Aopen%20label%3A%22L%3A%20python%3Auv%22

## uv configuration

### Settings

* `environments` can be used to restrict resolution to Linux, CPython etc. This improves resolution performance and avoids adding dependencies that are only required on Windows.
* Setting `package` to `false` simplifies things. That way we don't need a build backend, and don't need to "install" the app itself.

```toml
[tool.uv]
environments = ["""
    sys_platform == 'linux'
    and platform_machine == 'x86_64'
    and python_version == '3.11'
    and implementation_name == 'cpython'
"""]
package = false
```

### Environment variables

* `ENV UV_PYTHON_DOWNLOADS=0` in the Dockerfile ensures `uv` never downloads a Python interpreter in the Docker image. We want to use the system Python, which should be picked up automatically, but if we mess up the image version or the Python version constraint in `pyproject.toml`, `uv` could still end up downloading a different interpreter. Setting this environment variable ensures we get an error message in that case, which is what we want.

## Migration

### Build backend

We don't need to build packages for our services, so we should remove our `setup.py` files. Some of the settings can be added to `pyproject.toml` as desired, for example:

```toml
[project]
name = "tecken"
description = "The Mozilla Symbol Server"
readme = "README.rst"
dynamic = ["version"]
requires-python = ">=3.11,<3.12"

[project.urls]
Homepage = "https://github.com/mozilla-services/tecken"
```

### Pinning obs-common to our private index

Unlike with `pip`, we are now able to pin a specific package to an alternative index. This is useful for obs-common, which we can pin with this configuration in `pyproject.toml`:

```toml
[[tool.uv.index]]
name = "cavendish"
url = "https://us-python.pkg.dev/moz-fx-cavendish-prod/cavendish-prod-python/simple/"

[tool.uv.sources]
obs-common = { index = "cavendish" }
```

### Dependencies

The old dependencies can be moved from `requirements.in` to `pyproject.toml` and from `requirements.txt` to `uv.lock` using this command:

```bash
uv add -r requirements.in -c requirements.txt
```

This will only work after pinning obs-common to the private index. We then need to drop the exact version constraints from `pyproject.toml` manually.

## Open questions

* Should we compile Python files to bytecode in the Docker image? I think it would improve startup performance, but it's currently disabled at least for Tecken. We should investigate why it's disabled. (See the sketch below for how `uv` supports bytecode compilation.)
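If we decide we do want bytecode compilation, `uv` supports it via the `UV_COMPILE_BYTECODE` environment variable (or the `--compile-bytecode` flag to `uv sync`). A minimal sketch of how this could be enabled in the Dockerfile, assuming the `uv sync --locked` step shown above:

```dockerfile
# Opt in to compiling installed packages to .pyc files at install time,
# trading slower image builds for faster container startup.
ENV UV_COMPILE_BYTECODE=1

RUN --mount=type=cache,target=/root/.cache/uv \
    --mount=type=bind,source=uv.lock,target=uv.lock \
    --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
    uv sync --locked
```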