# Conda-forge wish list and long-term planning

> Copied from https://hackmd.io/0zGSUS71SbOdBsdLtDmGjg

## attendees

- MRB
- Cheng
- Uwe
- Eric
- Filipe
- Isuru
- Keith
- Crystal
- John K.
- Chris B.
- Wolf
- Marcelo
- CJ
- Connor

## agenda

1. distill our ideas into
    - a few near-term goals (2 years)
    - a few long-term goals (5 years)
2. slot ideas and tasks below into those goals

## Resources

### data on conda-forge

* see https://github.com/conda-forge/by-the-numbers/blob/master/conda-forge-timelines.ipynb
* conda-forge added about 3k feedstocks in 2019 and is on track to add about as many in 2020
* the growth in the amount of data we store appears to be accelerating

### infrastructure survey

* fill out the risk measurement spreadsheet:
    * notebook: https://gist.github.com/beckermr/9c0f5aa71720cf1b18646ccd0c3ab40f
    * results: https://nbviewer.jupyter.org/gist/beckermr/9c0f5aa71720cf1b18646ccd0c3ab40f
* Style the notebook with scores >= 4 in red and everything else in green:

```python
import pandas as pd

# `tot` is the survey table loaded earlier in the notebook (not shown here)
data = pd.DataFrame(tot.values)


def _color_red_or_green(val):
    """Color risk scores >= 4 red and everything else green."""
    try:
        is_high = float(val) >= 4
    except (TypeError, ValueError):
        is_high = False  # non-numeric cells (e.g. labels) stay green
    return "color: red" if is_high else "color: green"


# keep only the rows/columns that hold the risk scores
df = data.iloc[[4, 5, 6, 7, 8, 9, 10, 11, 16], 0:22]
# use the first two columns as the index; set_index also drops them as columns
df = df.set_index([0, 1])
# promote the first remaining row to the column header, then drop it
df.columns = df.iloc[0]
df = df.iloc[1:]
# Styler.map needs pandas >= 2.1; use .applymap on older versions
d = df.style.map(_color_red_or_green)
d
```

### pypa roadmap example

* https://github.com/psf/fundable-packaging-improvements/blob/master/FUNDABLES.md

## Notes

* (MRB) Biggest problem is growth. We haven't really thought through infrastructure at that scale.
* Small-ticket items should be things that don't carry long-term cost commitments
* If we're going to ask for big things that are going to
* What is the marginal cost of adding one new recipe to conda-forge?
    * Varies from trivial (noarch python) to impossible (all of RAPIDS). Most complex is adding a new compiler.
* What is the path forward for conda-forge if we don't employ
* What are we going to break next?
    * anaconda.org?
        * Cloudflare is handling most of the traffic there
* What's the actual growth of conda-forge?
    * What load does that put on all of the services we use / depend on?
    * We have added, or will have added, about 3k packages/feedstocks a year.
* Develop a plan B for a lot of the big scary items.
    * Plan B for the admin server? Ask Bloomberg to fund a web server that's more robust than our small Heroku instance, which we've pushed to its limit
* Growth is documented here:
* We are taking a risk census here: https://docs.google.com/spreadsheets/d/1ADNNauwVZWUsEdlh5aEg0OLjyDWvCX7PLoo-K34EqcM/edit#gid=0

## Notes 2020-12-09

### `.conda` format

- faster to extract metadata
- file size is smaller
    - would lower CDN sync time
- anaconda.org does not have support
    - would have to host them somewhere else and make the CDN look for .conda files there
    - could find a way to contract with Anaconda to get it done
        - need a price tag

### infrastructure as configuration (terraform)

- solves multiple issues around
    - key person risk
    - rotating credentials
    - recovering from adverse events

### servers for XYZ

- metachannel
- linux-64 server
    - debugging and builds for qt
    - use OVH or Amazon
- GPU servers
    - Anaconda?
    - Quansight?
    - NVIDIA can maybe donate: looking for a place to house them

### a CI service for feedstocks

- problems solved
    - long builds
    - GPU CI
- requirements:
    - TOS
    - homes for GPUs

### host source tarballs

- solves reproducibility issues
- hosted tarballs would solve some migration issues
- terms of service required

### big/long builds

- we need a way to pay people for their time updating recipes.
- we need long builds on CI.
- we need buy-in from package authors (or the community).

### maintain/update MinGW toolchain

- MinGW recipes work on epochs.
    - Update to a new epoch.
    - Patch our repodata for the epoch.
    - Then rebuild downstream recipes.
- explain what it is

## Focused effort on specific recipes/new packages

- (MRB) pytorch
- (MRB) tensorflow
- (FF) julia
- (??) bazel
    - Does this mean recipes that use bazel? Or a package on conda-forge so we can install it?
    - Need to build bazel to build tensorflow. Need a JVM to build bazel. Then you need to build or package a JVM
    - (IF) building bazel isn't that hard. Using bazel is challenging
- (UK) updated MinGW toolchain

For all of these, we seem to have the following problems:

* access to expertise
* access to CI services

## Infrastructure

- (FF) A server to call my own
    - (ED 2021-01-13) open vote for conda-forge/core to request purchasing a personal machine (laptop / desktop)
- (FF) AWS / OVH / GKE / Heroku credits
- (FF) Long-term Windows cloud VM
    - (MVN) ideally usable for Intel Fortran
    - (ED 2021-01-13) solved by OVH?
- (IF) A Linux x86_64 server and a macOS server for long-running builds, with drone set up there.
- (MRB) Set up drone on aarch64 and ppc64le servers
- (JK) GPU machines for testing
- (JK) Migrating to the new `.conda` format
- (IF) Host source tarballs
- (CHL) Support for anaconda.org CDN (low priority)
    - (MRB) What does "support for anaconda.org CDN" mean here?
    - (CHL) Largely community support and maintenance
    - $$$?
    - big key person/organization risk
    - can we do more community maintenance bits?

## Focused effort on internal tooling

* four buckets
    * conda tooling: conda / conda-build / mamba
        * move indexing code out of conda-build
        * Anaconda is working on opening conda/conda-build up for community maintenance
        * identify high-value issues on GitHub
        * e.g., reconcile solver uses between conda and mamba
            * making them swappable is the (very) long-term goal
        * Standards
            * support recipe spec in conda-build
        * mamba maintenance
            * missing features for micromamba (upgrade, clean, ...) to make it a full-fledged conda alternative
            * micromamba proper pinning support (also for minor python version)
            * micromamba should read a ".condarc" or ".mambarc" file
            * proper SSL handling for micromamba on Windows / macOS
            * add `repoquery why` and other repoquery command work
                * https://github.com/mamba-org/mamba/issues/620
                * https://github.com/mamba-org/mamba/issues/618
                * https://github.com/mamba-org/mamba/issues/413
    * serving repodata (quetz, anaconda.org, github thing)
        - Snapshotting of the conda-forge channel at a given point in time
        - There's another use case for repodata patching for different labels (e.g., development of new platforms like macos-arm)
        - (WV 2021-01-13) we have some additional ideas in [this issue](https://github.com/mamba-org/quetz/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc) for something like the pypi-timemachine. JupyterLab used the rc label quite successfully for the 3.0 release
        - anaconda.org support for .conda format
    * generating recipes (grayskull)
        - (FF) R recipe regeneration for grayskull
        - (SC) If Julia becomes well packaged, Julia package recipe generation.
        - (UK) Python extra requirements as additional outputs via grayskull
        - (UK) Split C++ libraries into multiple outputs (shared libs, headers, static libs, ...)
            - Build some tooling that simplifies that (RPM is that you? I knew you would catch on to what we are doing!)
    * maintaining recipes (bot, etc.)
        - (JK) Updating dependencies as part of version bumps (this is in grayskull already for Python)
        - (MVN) More event-driven builds, allowing graphs of outputs to build in order without needing to be in one CI stage.
        - (CJ) Automatic repodata patching
            - Don't issue a migration if only a patch is needed
            - Provide repodata patches to API pins (or the lack thereof)
        - (SC) Nicer web UI around migrations
        - (CJ) Better detection of ABI breaks
        - (CJ) Fix remaining packages with build -> host issues
        - (FF) "better" graph
        - move more bot stuff to quetz
    * other, misc?
        - (FF) improved docs: user/maintainer guides
        - (WV) repology integration / mapping conda-forge <-> repology
        - (WV) (with above) CVE notices?

## Long-term Challenges/Risks

### maintenance

* (MRB) documentation
* (MRB) staged-recipes

### loss of institutional knowledge

* (MRB) compilers
* (MRB) github automation infrastructure
* (MRB) credentials
* (MRB) policies and practices

### core/maintainer time/resources/burnout

* (MRB) communicating effectively with our maintainers

### provider relationships

* (MRB) we are slowly bleeding our deep connections with Anaconda
* (MRB) we don't have formal relationships around key resources like Azure
* (MRB) we struggle to provide more advanced tooling
    * cuda
    * compilers beyond gcc/clang
* (MRB) github itself

### growth

* (MRB) web hosting
* (MRB) github automation
* (MRB) recipe metadata maintenance and reconciliation
* (MRB) build capacity
    * more complicated packages that don't fit in our limits
    * loss of azure build ticket
    * build capacity on specialized architectures
* (MRB) tooling failure due to growth (solves too slow, etc.)

### security

* (MRB) as we have more maintainers, how do we further secure key parts of the infrastructure and packages?

### fiscal sustainability

* (MRB) operating cost constraints
* (MRB) operating cost growth?

# Key

- FF: Filipe
- MVN: Marius
- CJ: CJ-Wright
- MRB: Matt Becker
- CHL: Cheng Lee
- UK: Uwe Korn
- WV: Wolf Vollprecht
- SC: Sylvain Corlay
- JK: John Kirkham
- IF: Isuru Fernando
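Repodata patching comes up several times in these notes: patching repodata for a new MinGW epoch, automatic repodata patching, and per-label patches. Here is a minimal Python sketch of the idea. Patching rewrites the channel index, not the packages themselves. The sketch assumes a simplified instruction layout, modeled loosely on the patch-instruction files used in conda channel indexing: a `packages` / `packages.conda` mapping from filename to replacement metadata fields, plus a `remove` list of filenames. The function name and the `libfoo` / `libbar` packages are hypothetical. Real patching also covers cases such as revoked packages, which are left out here.

```python
import copy


def apply_patch_instructions(repodata, instructions):
    """Return a patched copy of a channel's repodata (simplified sketch).

    `instructions` is assumed (hypothetically) to look like:
      {"packages": {filename: {field: new_value, ...}},
       "packages.conda": {...},
       "remove": [filename, ...]}
    """
    patched = copy.deepcopy(repodata)  # never mutate the original index
    patched.setdefault("removed", [])
    for key in ("packages", "packages.conda"):
        entries = patched.setdefault(key, {})
        # Overwrite metadata fields, but only for packages already in the index.
        for fn, fields in instructions.get(key, {}).items():
            if fn in entries:
                entries[fn].update(fields)
        # Drop removed files from the index so solvers no longer see them,
        # while recording their names under "removed".
        for fn in instructions.get("remove", []):
            if entries.pop(fn, None) is not None:
                patched["removed"].append(fn)
    return patched


# Hypothetical example: tighten a dependency pin and hide a broken build.
repodata = {
    "packages": {
        "libfoo-1.0-0.tar.bz2": {"name": "libfoo", "version": "1.0", "depends": ["libbar"]},
        "libfoo-0.9-0.tar.bz2": {"name": "libfoo", "version": "0.9", "depends": ["libbar"]},
    }
}
instructions = {
    "packages": {"libfoo-1.0-0.tar.bz2": {"depends": ["libbar <2"]}},
    "remove": ["libfoo-0.9-0.tar.bz2"],
}
patched = apply_patch_instructions(repodata, instructions)
print(patched["packages"]["libfoo-1.0-0.tar.bz2"]["depends"])  # ['libbar <2']
print(patched["removed"])  # ['libfoo-0.9-0.tar.bz2']
```

This is why "don't issue a migration if only a patch is needed" is attractive. A metadata-only fix reaches existing builds through the index, with no rebuild.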