---
# System prepended metadata

title: Conda-forge wish list and long-term planning

---

# Conda-forge wish list and long-term planning 

> Copied from https://hackmd.io/0zGSUS71SbOdBsdLtDmGjg

## attendees

- MRB
- Cheng
- Uwe
- Eric
- Filipe
- Isuru
- Keith
- Crystal
- John K.
- Chris B.
- Wolf
- Marcelo
- CJ
- Connor

## agenda

1. distill our ideas into
    - a few near-term (2 years) 
    - a few long-term goals (5 years)
2. slot ideas and tasks below into those goals

## Resources

### data on conda-forge

* see https://github.com/conda-forge/by-the-numbers/blob/master/conda-forge-timelines.ipynb
* conda-forge added about 3k feedstocks in 2019 and is on track to add about as many in 2020
* the growth in the amount of data we store appears to be accelerating
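
As a rough back-of-the-envelope check, the sketch below projects how many feedstocks would be added over the 2-year and 5-year planning horizons from the agenda. It assumes the roughly 3k-per-year rate stays constant, which is an illustrative assumption (real growth may be faster), and the function name is hypothetical.

```python
def feedstocks_added(rate_per_year, years):
    """Feedstocks added over `years`, assuming a constant yearly rate."""
    return rate_per_year * years


RATE = 3_000  # approximate yearly rate quoted in these notes

for horizon in (2, 5):  # near-term and long-term horizons from the agenda
    print(f"{horizon} years: ~{feedstocks_added(RATE, horizon)} new feedstocks")
```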

### infrastructure survey

* fill out the risk measurement spreadsheet:  
* notebook: https://gist.github.com/beckermr/9c0f5aa71720cf1b18646ccd0c3ab40f
* results: https://nbviewer.jupyter.org/gist/beckermr/9c0f5aa71720cf1b18646ccd0c3ab40f
    * Style the notebook with >=4 being red and everything else being green:
```python
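import pandas as pd

# `tot` is the survey results table loaded earlier in the notebook.
data = pd.DataFrame(tot.values)

def _color_red_or_green(val):
    """Return red text for risk scores >= 4 and green text otherwise."""
    score = pd.to_numeric(val, errors='coerce')  # non-numeric cells become NaN
    color = 'red' if score >= 4 else 'green'
    return 'color: %s' % color

# Keep the rows and columns that hold the risk scores.
df = data.iloc[[4, 5, 6, 7, 8, 9, 10, 11, 16], 0:22]

# Use the first two columns as the row index (this also removes them as columns).
df = df.set_index([0, 1])

# Promote the first remaining row to column headers, then drop that row.
df.columns = df.iloc[0]
df = df.iloc[1:]

# Note: Styler.applymap is named Styler.map in pandas >= 2.1.
d = df.style.applymap(_color_red_or_green)

d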
```

### pypa roadmap example

* https://github.com/psf/fundable-packaging-improvements/blob/master/FUNDABLES.md


## Notes

* (MRB) Biggest problem is growth. Haven't really thought through infrastructure at that scale.
* Small-ticket items should be things that don't have long-term cost commitments
* If we're going to ask for big things that are going to
* What is the marginal cost of adding one new recipe to conda-forge?
    * Varies from trivial (noarch python) to impossible (all of RAPIDS). Most complex is adding a new compiler.
* What is the path forward for conda-forge if we don't employ 
* What are we going to break next?
    * anaconda.org?
        * Cloudflare is handling most of the traffic there
* What's the actual growth of conda-forge?
    * What load does that put on all of the services we use / depend on?
    * We have or will have added about 3k packages/feedstocks a year.
* Develop plan B for a lot of the big scary items.
    * Plan B for the admin server? Ask Bloomberg to fund a web server that's more robust than our small Heroku instance, which we've pushed to its limit

* Growth is documented here: 

* We are taking a risk census here: https://docs.google.com/spreadsheets/d/1ADNNauwVZWUsEdlh5aEg0OLjyDWvCX7PLoo-K34EqcM/edit#gid=0

## Notes 2020-12-09

### `.conda` format

 - faster to extract metadata
 - filesize is smaller
 - would lower CDN sync time
 - anaconda.org does not have support 
     - would have to host them somewhere else and make the CDN look for .conda files there
     - could find a way to contract with anaconda to get it done
     - need a price tag
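
For context on why metadata is faster to extract: a `.conda` file is a ZIP archive whose `info-*.tar.zst` member holds the package metadata separately from the `pkg-*.tar.zst` payload, so a reader can locate the metadata without unpacking the payload. The sketch below uses only the standard library to find those members. The helper name is hypothetical, and decompressing the inner tarballs would need a zstandard library, which is not shown.

```python
import zipfile


def split_conda_members(path):
    """Return (info_members, pkg_members) for a .conda archive."""
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
    # Metadata and payload live in separate zstd-compressed tarballs.
    info = [n for n in names if n.startswith("info-") and n.endswith(".tar.zst")]
    pkg = [n for n in names if n.startswith("pkg-") and n.endswith(".tar.zst")]
    return info, pkg
```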

### infrastructure as configuration (terraform)

 - solves multiple issues around
     - key person risk
     - rotating credentials
     - recovering from adverse events

### servers for XYZ

 - metachannel
 - linux-64 server 
     - debugging and builds for qt
     - use OVH or amazon
 - GPU servers
     - anaconda?
     - quansight?
     - nvidia can donate maybe: looking for a place to house them 
 
### a CI service for feedstocks

 - problems solved
     - long builds
     - GPU CI 
 - requirements:
     - TOS
     - homes for GPUs

### host source tarballs

 - solves reproducibility issues
 - hosted tarballs would solve some migration issues
 - terms of service required
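
Hosted tarballs only help reproducibility if builds verify what they download. A minimal sketch of that check follows, comparing a file's SHA-256 digest with the `sha256` value a recipe records for its source. The function name is hypothetical.

```python
import hashlib


def sha256_matches(path, expected_hex, chunk_size=1 << 20):
    """Return True if the file at `path` has the expected SHA-256 digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large tarballs are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()
```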

### big/long builds

 - we need a way to pay people for the time they spend updating recipes.
 - we need long builds on CI.
 - we need buy-in from package authors (or community).

### maintain/update MinGW toolchain

 - MinGW recipes work on epochs.
 - Update to a new epoch.
 - Patch our repodata for the epoch.
 - Then rebuild downstream recipes.
 - explain what it is
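
The "patch our repodata" step could look roughly like the sketch below: walk the records in a `repodata.json`-style dictionary and add a constraint to every package that depends on the toolchain runtime without one. The package name `mingw-runtime`, the constraint string, and the function name are placeholders, not the real MinGW package names or conda-forge's actual patching code.

```python
import copy


def pin_dependency(repodata, dep_name, constraint):
    """Return a copy of `repodata` where bare `dep_name` deps gain `constraint`."""
    patched = copy.deepcopy(repodata)  # leave the caller's data untouched
    for section in ("packages", "packages.conda"):
        for record in patched.get(section, {}).values():
            # Only unconstrained dependencies are rewritten; existing pins stay.
            record["depends"] = [
                f"{dep} {constraint}" if dep.strip() == dep_name else dep
                for dep in record.get("depends", [])
            ]
    return patched


# Hypothetical usage: pin old builds to the previous epoch.
example = {"packages": {"a-1.0-0.tar.bz2": {"depends": ["mingw-runtime"]}}}
print(pin_dependency(example, "mingw-runtime", "<2"))
```

Only bare dependencies are touched so that records which already carry a constraint keep it.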

## Focused effort on specific recipes/new packages

- (MRB) pytorch
- (MRB) tensorflow
- (FF) julia
- (??) bazel
    - Does this mean recipes that use bazel? or a package on conda-forge so we can install?
    - Need to build bazel to build tensorflow. Need JVM to build bazel. Then you need to build or package a JVM
    - (IF) building bazel isn't that hard. Using bazel is challenging
- (UK) updated MinGW toolchain

For all of these, we seem to have the following problems:
* access to expertise
* access to CI services

## Infrastructure

- (FF) A server to call my own
    - (ED 2021-01-13) open vote for conda-forge/core to request purchasing a personal machine (laptop / desktop)
- (FF) AWS / OVH / GKE / Heroku credits
- (FF) Long-term Windows cloud VM
    - (MVN) ideally usable for intel fortran
    - (ED 2021-01-13) solved by OVH?
- (IF) A Linux x86_64 server and a macOS server for long-running builds, with drone set up there.
- (MRB) Set up drone on aarch64 and ppc64le servers
- (JK) GPU machines for testing
- (JK) Migrating to the new `.conda` format
- (IF) Host source tarballs
- (CHL) Support for anaconda.org CDN (low priority)
    - (MRB) What does "support for anaconda.org CDN" mean here?
    - (CHL) Largely community support and maintenance 
    - $$$?
    - big key person/organization risk
    - can we do more community maintenance bits?

## Focused effort on internal tooling

* four buckets
    * conda tooling: conda / conda-build / mamba
        * move indexing code out of conda build
        * Anaconda is working on opening conda/conda-build up for community maintenance
            * identify high-value issues on GitHub
            * e.g., reconcile solver uses between conda and mamba
                * make them swappable is the (very) long-term goal
        * Standards
            * support recipe spec in conda-build
        * mamba maintenance
            * missing features for micromamba (upgrade, clean, ...) to make it a full-fledged conda alternative
            * micromamba proper pinning support (also for minor python version)
            * micromamba read ".condarc" or ".mambarc" file
            * proper ssl stuff for micromamba on win / osx 
            * add `repoquery why` and other repoquery command work
                * https://github.com/mamba-org/mamba/issues/620
                * https://github.com/mamba-org/mamba/issues/618
                * https://github.com/mamba-org/mamba/issues/413
    * serving repodata (quetz, anaconda.org, 
      github thing)
        - Snapshotting of the conda-forge channel for 
          a given state in time
        - There's another use case for repodata patching 
          for different labels (i.e., development of new 
          platforms like macos-arm)
        - (WV 2021-01-13) we have some additional ideas in [this issue](https://github.com/mamba-org/quetz/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc) 
          for something like the pypi-timemachine.
          jupyterlab used the rc label quite successfully 
          for the 3.0 release
        - anaconda.org support for .conda format
    * generating recipes (grayskull)
        - (FF) R recipe regeneration for grayskull
        - (SC) If Julia becomes well packaged, Julia
          package recipe generation.
        - (UK) Python extra requirements as additional
          outputs via grayskull
        - (UK) Split C++ libraries into multiple outputs 
          (shared libs, headers, static libs, ..)
          - Build some tooling that simplifies that 
            (RPM, is that you? I knew you would catch on
            to what we are doing!)
    * maintaining recipes (bot, etc.)
        - (JK) Updating dependencies as part of version
          bumps (this is in grayskull already for 
          Python)
        - (MVN) More event driven builds allowing for
          graphs of outputs building in order without
          needing to be in one CI stage.
        - (CJ) Automatic repodata patching
            - Don't issue migration if only a patch is
              needed
            - Provide repodata patches to API pins 
              (or the lack thereof)
        - (SC) Nicer web UI around migrations
        - (CJ) Better detection of ABI breaks
        - (CJ) Fix remaining packages with build -> 
          host issues
        - (FF) "better" graph
        - move more bot stuff to quetz
    * other, misc?
        - (FF) improved docs user/maintainers guide
        - (WV) repology integration / mapping conda-forge <-> repology
        - (WV) (with above) CVE notices?
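
One way to picture the channel snapshotting idea from the "serving repodata" bucket: keep only the records whose `timestamp` is at or before a cutoff. This is a sketch under assumptions, not how quetz or anaconda.org implement it. It assumes every record carries a `timestamp` in milliseconds since the Unix epoch, and it drops records that lack one. The function name is hypothetical.

```python
from datetime import datetime, timezone


def snapshot_repodata(repodata, cutoff):
    """Return repodata restricted to records published at or before `cutoff`.

    `cutoff` should be a timezone-aware datetime.
    """
    cutoff_ms = int(cutoff.timestamp() * 1000)
    snapshot = dict(repodata)  # shallow copy keeps other top-level keys
    for section in ("packages", "packages.conda"):
        snapshot[section] = {
            fn: rec
            for fn, rec in repodata.get(section, {}).items()
            if "timestamp" in rec and rec["timestamp"] <= cutoff_ms
        }
    return snapshot


# Hypothetical usage: the channel as it looked at the start of 2021 (UTC).
cutoff = datetime(2021, 1, 1, tzinfo=timezone.utc)
```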


## Long-term Challenges/Risks

### maintenance 

* (MRB) documentation
* (MRB) staged-recipes

### loss of institutional knowledge

* (MRB) compilers
* (MRB) github automation infrastructure
* (MRB) credentials
* (MRB) policies and practices

### core/maintainer time/resources/burnout

* (MRB) communicating effectively with our maintainers

### provider relationships

* (MRB) we are slowly bleeding our deep connections w/ anaconda
* (MRB) we don't have formal relationships around key resources like azure
* (MRB) we struggle to provide more advanced tooling
    * cuda
    * compilers beyond gcc/clang
* (MRB) github itself

### growth

* (MRB) web hosting
* (MRB) github automation
* (MRB) recipe metadata maintenance and reconciliation 
* (MRB) build capacity 
    * more complicated packages that don't fit in our limits
    * loss of azure build ticket
    * build capacity on specialized architectures
* (MRB) tooling failure due to growth (solves too slow, etc.)

### security

* (MRB) as we have more maintainers, how do we further secure key parts of the infrastructure and packages?

### fiscal sustainability

* (MRB) operating cost constraints
* (MRB) operating cost growth?

# Key

- FF: Filipe
- MVN: Marius
- CJ: CJ-Wright
- MRB: Matt Becker
- CHL: Cheng Lee
- UK: Uwe Korn
- WV: Wolf Vollprecht
- SC: Sylvain Corlay
- JK: John Kirkham
- IF: Isuru Fernando
