---
status: draft
last-updated: 2022-09-22
---
# `Conda-forge`'s Long-term Goals and Plans
## Purpose
This document contains the long-term goals and plans for `conda-forge`.
It exists to help the core team and outside entities understand where `conda-forge` stands as a whole, where it is going, and, most importantly, how the community can further support its goals.
## Table of contents
- [`Conda-forge`'s Long-term Goals and Plans](#conda-forges-long-term-goals-and-plans)
    - [Purpose](#purpose)
    - [Background](#background)
    - [Context](#context)
    - [Continuous Integration Infrastructure and Cloud Services](#continuous-integration-infrastructure-and-cloud-services)
        - [Infrastructure as configuration](#infrastructure-as-configuration)
        - [Mirroring](#mirroring)
        - [Source tarball hosting](#source-tarball-hosting)
        - [Specialized CI needs](#specialized-ci-needs)
    - [Software and Internal Tooling](#software-and-internal-tooling)
        - [Conda and Mamba tooling](#conda-and-mamba-tooling)
        - [Incremental repodata updates](#incremental-repodata-updates)
        - [Periodic snapshots of conda-forge repodata](#periodic-snapshots-of-conda-forge-repodata)
        - [Windows toolchain](#windows-toolchain)
        - [Recipe generation](#recipe-generation)
        - [Recipe maintenance](#recipe-maintenance)
    - [Supply chain security](#supply-chain-security)
    - [Documentation](#documentation)
    - [Achieved goals](#achieved-goals)
        - [The `.conda` format](#the-conda-format)
## Background
The first version of this document was developed by the core team over a period of several months starting in late 2020 and extending into early 2021.
This process and the construction of this document were motivated by a few things happening in the community at the time.
First, `conda-forge` has grown spectacularly since its inception (see e.g. this [`Conda-forge` year-in-review blog post](https://conda-forge.org/blog/posts/2020-12-26-year-in-review/)).
This growth has occurred in multiple ways, including the number of artifacts we host, the number of community members maintaining those artifacts, the number of people downloading them, the diversity of the community, and the diversity of the kinds of software `conda-forge` hosts.
Second, people and organizations beyond Anaconda Inc. started making core contributions to the conda ecosystem tooling. In particular, QuantStack is contributing a huge amount of effort to building efficient tooling and an open-source conda package server (see [mamba-org](https://github.com/mamba-org)).
This effort has transformed and enabled the conda ecosystem in numerous ways. Third, due to its increasing presence in the community, `conda-forge` started receiving offers of financial support from a broader group of people/organizations.
Until now, we have lacked a coherent message about what help we needed and how these groups could help.
While this document cannot possibly answer all of these questions definitively, it can distill our thinking at the time, our current thoughts, and hopefully express clearly what is important in the opinion of the core team. We expect this document to be living in the sense that it is updated and improved over time. We also hope the community finds it useful and engages with `conda-forge` on these items going forward.
-- `conda-forge/core`
## Context
Growth is causing a variety of failures in our tooling and maintenance workflows such as:
- Significantly increased maintenance burdens
- Creation of new or unexpected demands (e.g., driven by the community and users, or by international policies)
- Increase in potential security and reliability concerns
The numbers behind these observations can be checked in the [`conda-forge/by-the-numbers` repository](https://github.com/conda-forge/by-the-numbers).
---
## Continuous Integration Infrastructure and Cloud Services
### Infrastructure as configuration
🚧 _Quansight and QuantStack have submitted a grant proposal (pending decision) to work on this_ 🚧
**Summary**: `conda-forge` depends upon a large amount of infrastructure configuration spread across multiple GitHub repositories, external CI services, and Heroku instances.
The configuration and provisioning of this infrastructure need to be centralized using a tool like Terraform to enable better security, reliability, and recovery from adverse events.
**Effort / cost**: Medium (estimated at 1 FTE over a year or so)
**Priority**: High
**Context**: As `conda-forge` has grown, the infrastructure that powers various user-facing services (e.g., admin commands) and backend services like artifact validation and builds has grown organically too.
This situation has resulted in an array of bot accounts, API keys, and bespoke configuration settings spread across Azure DevOps, GitHub, TravisCI, Drone.io, CircleCI, and Heroku.
Further, little to no documentation exists on how to re-provision any of these services should one of them suffer a serious incident. We also cannot easily perform basic tasks, like rotating API tokens.
**Description**: N/A
**References**: N/A
**Contact info**: N/A
---
### Mirroring
🚧 *QuantStack is working on this item* 🚧
**Summary**: `conda-forge` hosts its artifacts on Anaconda.org. They are further served through a Cloudflare CDN, but Anaconda.org is, in principle, the sole source for all `conda-forge` packages.
Backups and mirrors would be advisable to avoid this single point of failure.
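From a client's point of view, mirrors are only useful if a download can fall back from one host to the next without trusting any of them blindly. The following is a minimal Python sketch of that idea, assuming every mirror exposes the same relative paths and that the expected SHA256 is known in advance (for example, from channel metadata). The function name, the mirror URLs, and the `fetch` callable are hypothetical; this does not describe an existing `conda-forge` service.

```python
import hashlib
from typing import Callable, Iterable


def fetch_from_mirrors(
    subpath: str,
    expected_sha256: str,
    mirrors: Iterable[str],
    fetch: Callable[[str], bytes],
) -> bytes:
    """Try each mirror in order and return the first download whose checksum matches."""
    errors = []
    for base in mirrors:
        url = f"{base.rstrip('/')}/{subpath.lstrip('/')}"
        try:
            data = fetch(url)
        except OSError as exc:  # network errors such as urllib's URLError derive from OSError
            errors.append(f"{url}: {exc}")
            continue
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
        # A mirror that serves different bytes is treated like an unavailable one.
        errors.append(f"{url}: checksum mismatch")
    raise RuntimeError("all mirrors failed: " + "; ".join(errors))
```

In practice, `fetch` could be a thin wrapper around `urllib.request.urlopen(url).read()`; it is injected here so the fallback logic can be exercised without network access.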
**Effort / cost**:
**Priority**:
**Context**:
**Description**:
**References**:
- https://github.com/conda-forge/conda-forge.github.io/issues/1191
**Contact info**:
---
### Source tarball hosting
> [name="Wolf Vollprecht"] This could be quite nicely and easily done as an additional push to an OCI registry. The source is anyways indexed by SHA256 from the recipe.
**Summary**: Most `conda-forge` packages obtain their source code from official origins (git repositories, package indexes, etc.).
However, these sources are not guaranteed to be available forever, threatening the reproducibility of the `conda-forge` packaging efforts.
This item would involve a mechanism to archive the sources used for each package pushed to the `conda-forge` channel.
**Effort / cost**: TBD
**Priority**:
**Context**: `conda-forge` undergoes ABI migrations often, which require rebuilding packages (same version) with different build-time dependencies.
If the source is not available for unrelated reasons, the migration process is interrupted until the new location of the source is found (best case).
**Description**: Note this would probably need its own Terms of Service.
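Because recipes already record a SHA256 for their source, an archive can be keyed by that digest. The sketch below illustrates the idea with a local directory standing in for the real storage backend (an OCI registry, as suggested in the comment above, or anything else). The function names and the directory layout are assumptions made for illustration only.

```python
import hashlib
from pathlib import Path


def sha256_of(data: bytes) -> str:
    """Return the hex SHA256 digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


def archive_source(data: bytes, expected_sha256: str, store: Path) -> Path:
    """Save a source archive under its digest, refusing content that does not match."""
    actual = sha256_of(data)
    if actual != expected_sha256:
        raise ValueError(f"checksum mismatch: expected {expected_sha256}, got {actual}")
    # Hypothetical layout: <store>/<first two hex characters>/<full digest>
    target = store / actual[:2] / actual
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # identical content is stored only once
        target.write_bytes(data)
    return target


def fetch_archived(expected_sha256: str, store: Path) -> bytes:
    """Read an archived source back and verify it again before use."""
    data = (store / expected_sha256[:2] / expected_sha256).read_bytes()
    if sha256_of(data) != expected_sha256:
        raise ValueError("archived file is corrupted")
    return data
```

With this kind of layout, a rebuild during a migration could look up the digest from the recipe when the upstream URL no longer resolves, without needing to know where the source originally lived.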
**References**:
- https://github.com/conda-forge/conda-forge.github.io/issues/839
**Contact info**:
---
### Specialized CI needs
🚧 *[Quansight](https://www.quansight.com/labs) is working on **some** of these items* 🚧
**Summary**: Provide infrastructure to build packages with specific technical requirements, like needing a GPU or copious amounts of RAM / storage not available in free services.
**Effort / cost**:
**Priority**:
**Context**: `conda-forge` uses freely available CI resources (like Azure Pipelines or GitHub Actions) to build its packages.
These generous services have some limitations (allowed execution time, disk space, available RAM, number of processors, etc.), which, in some cases, prevent packages from being built.
Notable examples include `qt`, `pytorch`, and `tensorflow`. The current workaround is to run the build script locally on somebody's machine, which introduces issues such as non-standardized build machines, dependence on volunteer time, and a weaker chain of trust.
**Description**:
Currently needed infrastructure:
* GPU builds:
    * Linux x64: WIP (Quansight)
* Long builds:
    * Linux: WIP (Quansight)
    * Windows
    * macOS x64
* Native architectures:
    * macOS arm64
    * Linux aarch64
    * Linux ppc64le
**References**:
- https://github.com/conda-forge/conda-forge.github.io/issues/63
- https://github.com/conda-forge/conda-forge.github.io/issues/1272
- https://github.com/conda-forge/conda-forge.github.io/issues/1781
- https://github.com/conda-forge/conda-forge.github.io/issues/1537
**Contact info**:
---
## Software and Internal Tooling
### Conda and Mamba tooling
**Context**: Anaconda used to be the sole provider of tooling in the `conda` ecosystem, but other organizations are producing their own tooling to either complement or substitute existing utilities (QuantStack, Quansight, [conda-incubator](https://github.com/conda-incubator), among others).
**Tasks**:
* For better interoperability, some of Anaconda's tools could be split into smaller pieces. For example, `conda index` could be provided on its own, not as part of `conda build`.
* Create schemas for the different file formats involved. (*In progress* 🚧 )
* Standardize the _de facto_ behaviors into technical specifications that tools can reimplement.
---
### Incremental repodata updates
🚧 *[The conda](https://github.com/conda/conda) and [mamba](https://github.com/mamba-org) teams are working on this* 🚧
**Summary**: More efficient serving of the ever-growing `repodata.json` files through incremental updates.
**Effort / cost**:
**Priority**:
**Context**: The package metadata used by the conda solvers is published upfront for each conda channel in a JSON file called `repodata.json`. The more packages published in a channel, the bigger the JSON file. This file is downloaded every time `conda` needs to install something. Popular channels like conda-forge store a lot of packages and are updated often, so caching and compression only help to some extent.
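To make the idea of an incremental update concrete, here is a deliberately simplified Python sketch: the server computes only what changed in the `packages` mapping between two versions of the metadata, and the client applies that delta to its cached copy. The delta layout (`upsert` / `remove`) and the function names are invented for this illustration; this is not the format proposed in the references below.

```python
import copy


def make_delta(old: dict, new: dict) -> dict:
    """Describe how the "packages" mapping changed between two repodata versions."""
    old_pkgs = old.get("packages", {})
    new_pkgs = new.get("packages", {})
    return {
        # Entries that are new, or whose metadata changed.
        "upsert": {fn: rec for fn, rec in new_pkgs.items() if old_pkgs.get(fn) != rec},
        # Filenames that are no longer listed.
        "remove": sorted(fn for fn in old_pkgs if fn not in new_pkgs),
    }


def apply_delta(cached: dict, delta: dict) -> dict:
    """Return an updated copy of the cached repodata without modifying the input."""
    updated = copy.deepcopy(cached)
    pkgs = updated.setdefault("packages", {})
    for fn in delta["remove"]:
        pkgs.pop(fn, None)
    pkgs.update(delta["upsert"])
    return updated
```

When only a handful of packages change between two downloads, the delta contains just those entries, so the transfer size follows the size of the change rather than the size of the channel.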
**Description**:
**References**:
- https://github.com/conda-incubator/ceps/pull/20
- https://github.com/conda/conda/issues/11640
- https://github.com/mamba-org/powerloader
**Contact info**:
---
### Periodic snapshots of conda-forge repodata
**Summary**: Creating snapshots of `conda-forge` `repodata` at regular time intervals
**Effort / cost**:
**Priority**:
**Context**: The channel metadata changes when new packages are added, but it can also be patched retroactively to fix metadata problems introduced in the past. For `conda-forge`, the patches are submitted via [`conda-forge/conda-forge-repodata-patches-feedstock`](https://github.com/conda-forge/conda-forge-repodata-patches-feedstock) and then propagated to the CDN.
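Since retroactive patches mean that the current metadata does not necessarily reflect what a solver saw in the past, a snapshot job would need to keep timestamped copies. A minimal sketch of such a job, assuming the raw `repodata.json` bytes have already been downloaded; the function name and the file naming scheme are hypothetical:

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def snapshot_repodata(
    raw: bytes, snapshot_dir: Path, now: Optional[datetime] = None
) -> Optional[Path]:
    """Store `raw` as a timestamped snapshot unless it is identical to the latest one."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(raw).hexdigest()
    # The timestamp format sorts lexicographically, so the last name is the newest snapshot.
    existing = sorted(snapshot_dir.glob("repodata-*.json"))
    if existing and hashlib.sha256(existing[-1].read_bytes()).hexdigest() == digest:
        return None  # nothing changed since the previous snapshot
    target = snapshot_dir / f"repodata-{now.strftime('%Y%m%dT%H%M%SZ')}.json"
    target.write_bytes(raw)
    return target
```

Skipping unchanged content keeps the number of stored files proportional to the number of actual metadata changes rather than to how often the job runs.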
**Description**:
**References**:
**Contact info**:
---
### Windows toolchain
**Summary**: `conda-forge` provides the `MinGW` compilers, built at a certain _epoch_ (timestamp). Updating to a more recent build would be desirable.
**Effort / cost**:
**Priority**:
**Context**:
**Description**: This requires rebuilding `MinGW`, updating `repodata`, and rebuilding all downstream packages that depend on the `MinGW` toolchain.
**References**:
**Contact info**:
---
### Recipe generation
* Generate R recipes with [Grayskull](https://github.com/conda-incubator/grayskull)
* Generate multi-outputs for pip-extras
* Generate multi-outputs for headers / dynamic / static libraries
---
### Recipe maintenance
* Detect when requirements have changed in a new version and suggest the change as part of the automated PR.
* License-based filtering
* https://github.com/conda-forge/conda-forge.github.io/issues/209
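As an illustration of the first item in the list above, the comparison step could be as simple as diffing two lists of requirement strings. The sketch below assumes requirements are written as `name` optionally followed by a space and a version constraint; how those strings are obtained from upstream for each release is left out, and the function names are hypothetical.

```python
from typing import Dict, List


def _parse(requirements: List[str]) -> Dict[str, str]:
    """Map each lowercased package name to its (possibly empty) version constraint."""
    parsed = {}
    for req in requirements:
        name, _, spec = req.strip().partition(" ")
        parsed[name.lower()] = spec.strip()
    return parsed


def diff_requirements(old: List[str], new: List[str]) -> Dict[str, List[str]]:
    """Report which requirements were added, removed, or had their constraint changed."""
    old_map, new_map = _parse(old), _parse(new)
    common = set(old_map) & set(new_map)
    return {
        "added": sorted(set(new_map) - set(old_map)),
        "removed": sorted(set(old_map) - set(new_map)),
        "changed": sorted(name for name in common if old_map[name] != new_map[name]),
    }
```

A non-empty result could then be rendered as a suggestion in the body of the automated PR, leaving the final decision to the maintainers.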
---
## Supply chain security
* Package signing on Quetz
* Running an [X-ray](https://jfrog.com/xray/) security scan on all the artifacts
---
## Documentation
A lot of institutional knowledge changes often without being consolidated in writing. It needs to be enumerated and written down. Some examples include:
* Infrastructure deployment
* Compilers
* Migration process
* Cross-organization tooling
* Onboarding
* Staged-recipes handbook
---
## Achieved goals
The following items are no longer part of the `conda-forge` roadmap because they were funded and/or completed!
### The `.conda` format
The new package format is live on Anaconda.org, as of XX.XX.XXXX. `conda-forge` supports it.
- https://github.com/conda-forge/conda-forge.github.io/issues/1586
- https://github.com/conda-forge/conda-forge.github.io/issues/877