Cons of the current spec:
(MBargull) If we want to come up with a "more sound" specification that is easier to parse (for machine as well as humans), avoids many complications of the current format and allows future extensions/workflow integrations, we should (on a higher level, IMHO) try to discern needed structures and processes as clearly as feasible.
This means:

requirements sections define something like:

    NODE[X] := X | SELECT_NODE[X]
    SELECT_NODE[X] := [CONDITION, X]
    NODE[REQUIREMENTS] := LIST[REQUIREMENT_NODE]
    NODE[REQUIREMENT] := [PACKAGE_NAME, PACKAGE_VERSION, PACKAGE_BUILD, ...]
"sel(cond)": entry
route, we assume to go with a format that makes it{if: cond, then: entry}
would be a more flexible serialization alternative)TEXT -> Jinja2 -> YAML -> [OBJECTS]
TEXT -> YAML -> [OBJECTS] -> [Jinja2(obj) for obj in OBJECTS]
.pin_compatible
/pin_subpackage
don't have to be processed by theThis should also help us identify which legacy concepts/features we can remove or replace by
more general approaches (e.g., I'd like us to remove unnecessarily domain-specific things like
noarch: python
(should be {arch: noarch, package_type: python}
or the like), CONDA_*
and
PYTHON
/{{ python }}
env/jinja2 vars etc.) and to make implicit behavior more explicit etc.
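For illustration, a sketch of how the two serializations discussed above would express the same conditional dependency (the package name is arbitrary):

# string-key selector form ("sel(cond)": entry); the condition is embedded in the key text
requirements:
  host:
    - python
    - sel(linux): libxcb

# structured mapping form ({if: cond, then: entry}); the condition is ordinary YAML data
requirements:
  host:
    - python
    - if: linux
      then: libxcb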
bld.bat / build.sh: see the "bld.bat / build.sh" section below.

Match specs such as channel:namespace:pkg 0.3.1 XXXXXX or mypackage[build_number>5] seem impossible with the current conda_build spec.

bld.bat / build.sh ↦ unify script names to build.bat and build.sh
(WV) The new selector syntax may tempt people to write code like this:
requirements:
  host:
    - python
    - "sel(win)":
        - curl
        - flask
      "sel(linux)":
        - wget
        - bash
It would require some code to make this work as expected. I wonder if we should support it or not.
MRB: No!
At some point in time, variants could be set with CONDA_PY, CONDA_R, … environment variables. This apparently leads to conda-build having to check and regex the build scripts to see whether these env vars are used. I think it's time to get rid of them and not include them in v2.
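For reference, conda-build's existing variant mechanism already covers this without env vars; a minimal sketch (the versions listed are arbitrary):

# conda_build_config.yaml
python:
  - "3.8"
  - "3.9"

# in meta.yaml the variant values are applied to the matching requirement automatically:
requirements:
  host:
    - python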
It would be great if there was a better story for optional dependencies.
The special-casing of test requirements is inconsistent with the requirements for host/build/run. The fact that you can't independently install the test dependencies is a huge usability wart and suggests there should be a better way of supporting them.
I think that rather than special-casing test dependencies, test should simply be another sub-category under requirements alongside host/build/run. Taking this idea further, a user could define any sub-category they wanted under requirements - e.g. dev, docs, etc…
In the case of dask they would define the sub-categories array, bag, dataframe, distributed, diagnostics, delayed:
https://github.com/dask/dask/blob/master/setup.py#L10-L28
As it is, conda forces you to take an all or nothing approach - e.g.
https://github.com/conda-forge/dask-feedstock/blob/master/recipe/meta.yaml
Even if all you want to use is dask.array, you're forced to install the dependencies for everything. The answer often given is to use outputs, and that's what we do ourselves - every package builds pkg-test, pkg-docs and pkg-dev metapackages along with pkg itself. Whilst this can be made to work, IMHO it's a poor substitute for proper support for optional dependencies. Ideally a recipe maintainer could specify whatever sub-category under requirements they wanted and conda would allow you to install that sub-category independently.
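For instance, a hedged sketch of what such arbitrary sub-categories could look like in a recipe, loosely following the dask extras above (the sub-category names and packages are illustrative, not part of any agreed spec):

requirements:
  host:
    - python
    - pip
  run:
    - python
  test:
    - pytest
  # optional sub-categories a user could install independently
  array:
    - numpy
  dataframe:
    - numpy
    - pandas
  docs:
    - sphinx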
e.g. using pip syntax it would be:

conda install dask[array,bag]

It has been argued that the pip syntax conflicts with existing conda syntax, which is a fair argument, but there could be other ways to specify optional dependencies - e.g.:

conda install dask --include-deps array --include-deps bag

conda install dask could then be used as an alias for conda install dask --include-deps run.
This proposal requires a change to the yaml spec and also support from conda/mamba. The yaml changes could be made backwards compatible by allowing test requirements to be specified either under test/requirements or requirements/test, with a suitable deprecation period.
MRB: we should not further complicate recipes with optional deps or arbitrary dep sections. the gain here is marginal and the costs are big
Outputs can currently be two things:
These two functions and the (implicit?) super-build are confusing and not very intuitive.
One suggestion is to have explicit "transient" builds / outputs but the rules for this are also not clear.
MRB: what do you mean the rules are not clear? The rules are
conda-build defines the following jinja functions:

- load_setup_py_data (used by 0 packages on cf)
- load_setuptools (used by 1 package on cf, conda_smithy)
- load_npm (used by 0 packages on cf)
- load_file_regex (used by 0 packages)
- installed
- pin_compatible (IMPORTANT)
- pin_subpackage (IMPORTANT)
- compiler (IMPORTANT)
- cdt (IMPORTANT)
- resolved_packages
- time -> this might not be great for reproducible builds? (MRB: we have to keep these, they are very important for builds that need to autoincrement the version)
- datetime -> this might not be great for reproducible builds? (MRB: we have to keep these, they are very important for builds that need to autoincrement the version)
- environ (IMPORTANT)

Maybe pin_compatible and pin_subpackage should not be implemented as Jinja functions but rather as a preprocessing step in the build / host environment configuration? The syntax would stay the same but the {{ and }} brackets would be removed. The reasoning is that the output of those two functions depends on the solve and hashing process, which is already part of the processing that conda-build does (vs. the string manipulation and templating that Jinja does). A clean separation could lead to code that becomes easier to digest.
The syntax to continue to use variables would change to pin_subpackage({{ name }}, max_pin='x.x')
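Concretely, a requirements entry under this proposal would look like the following sketch (the pin target is assumed; note there are no outer {{ }} around the call itself):

requirements:
  run:
    # handled as a preprocessing step by conda-build, not by the jinja pass
    - "pin_subpackage({{ name }}, max_pin='x.x')"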
MRB: The change of pin_subpackage and pin_compatible to non-jinja2 is pretty invasive and not needed. We can rearrange the parsing to explicitly use the graph of deps and proceed in topological order to avoid the difficulties here.
WV: Yes, I agree we can keep the Jinja syntax, but internally the jinja might just rewrite the dependency to some other machine-readable output, e.g. {{ pin_subpackage(name, max_pin='x.x') }} => PIN_SUBPACKAGE[somelib, max_pin=x.x, exact=False, ...], which we can parse in a later step. This would remove the need for Jinja and solving to be intermingled.
MRB: FWIW, jinja2 gives you access to their AST. They break the text up in a way that has nothing to do with the YAML AFAICT.
MRB: My goal here is to make sure that all of these changes can be easily put into conda-build. Thus keeping them minimal is very important. The parsing and solving being intermingled is OK once we have a faster solver around.
WV: I think having clean separations between the stages is much preferred though. Right now it's very difficult to follow the conda-build code. The different stages that (theoretically) should exist are probably:
MRB: So all I am saying is that in step 3 above (sorting outputs and getting deps), we render with stub jinja2 functions for pin_subpackage and pin_compatible that just return the name. Then in stage 4, we use the information from the previous build. In other words, we can support the parsing + rendering + solving + building steps above with the jinja2 just fine.
WV: no, I don't think so, because e.g. for sorting we need the right package name inserted. Imagine an output using {{ pin_compatible(name) }}; then we would (for sorting) have to evaluate the jinja etc. It would be much easier (implementation-wise) to evaluate this to mypackage PIN_COMPATIBLE[max_pin='x.x', ...] and we still have a way to figure out the correct sorting order without going back to Jinja. Jinja should execute only once.
MRB: This is a minor implementation detail. Executing jinja2 once to build a special string that we then parse again is no different from having the second step of parsing the string again be done in jinja2 as well. At the end of the day, you have to turn PIN_COMPATIBLE[max_pin='x.x', ...] into an actual package requirement string.
WV: sure, but you are missing that I was using name as a variable. We can throw away the jinja context and all that overhead after the first parsing step is done.
MRB: No I was not. The context is one extra dictionary that is around. Seems like a small thing.
WV: it seems small, but it complicates the architecture. Clean separation of the stages is very important to me. But it's an implementation detail, I agree with that, and one is free to choose to do it differently. The key is that we want to engineer something that will be maintainable for a long time.
MRB: (edit:) implementing PIN_COMPATIBLE[max_pin='x.x', ...] in conda-build as it is now is a huge change.
WV: we don't want to support PIN_COMPATIBLE. The end user still writes {{ pin_compatible( ... ) }}, but internally boa / conda-build might choose to change it to an internal representation and do as it pleases.
MRB: I mean implementing the internal rep, not supporting it for users. Sorry! :D
WV: for reference, that's how I am planning to do it with boa, and then generate the conda-build MetaData class which I hopefully can feed to conda-build to create packages :)
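For reference, a sketch of the two-step rendering described above (the PIN_SUBPACKAGE[...] placeholder is purely illustrative internal notation, not user-facing syntax, and the resolved version range is made up):

# what the recipe author writes (unchanged):
requirements:
  run:
    - "{{ pin_subpackage(name, max_pin='x.x') }}"

# after the single jinja pass (context: name = "somelib"), an internal placeholder remains:
requirements:
  run:
    - "PIN_SUBPACKAGE[somelib, max_pin=x.x, exact=False]"

# after the subpackage has been built/solved, the placeholder becomes a real requirement string:
requirements:
  run:
    - somelib >=1.2,<1.3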
bld.bat / build.sh
Chris Burr: This might be out of scope for this proposal, but it would be nice for the scripts used for building to become more automatic. Nix, Portage and many other package managers have a mechanism by which builds are split into "phases". Packages can modify how other stages behave; for example, depending on cmake causes the configure stage to run something like cmake "$source_dir" "${cmake_flags[@]}". The cmake_flags array can then be set to a sensible default. This makes it easier to write correct recipes without knowing the details of how to use every build system and also makes it easier to modify how all recipes are built. Both Nix and Portage ultimately implement this using bash scripting. I'll include a suggestion of how this might look, but I'm not advocating for it to be exactly this way. I'm especially not sure how to map this over to Windows builds.
If the pip package is defined like:

build:
  export: # Tells dependent recipes to change how they run some stages
    install: python -m pip install "${src_dir}"

and cmake is defined like:

build:
  export_env: | # All exported environment changes are sourced before building any recipes
    declare -a cmake_flags
    cmake_flags+=("-DCMAKE_INSTALL_LIBDIR=lib")
    cmake_flags+=("-DCMAKE_INSTALL_PREFIX=${PREFIX}")
  export_phases:
    configure: |
      cmake \
        "${cmake_flags[@]}" \
        "${src_dir}"
    build: cmake --build . -j${CPU_COUNT}
    install: cmake --build . --target install -j${CPU_COUNT}
then a package that does not use the pip export would define its phases explicitly:

build:
  phases:
    build: python setup.py build
    install: python setup.py install
    check: |
      my_installed_command --help
      my_installed_command --version
    imports:
      - my_package

A package that depends on pip:

build:
  phases: # Install phase is done automatically
    check: |
      my_installed_command --help
      my_installed_command --version
    imports:
      - my_package

A cmake-based package with an extra option (SOME_FLAG) enabled:

build:
  env: |
    cmake_flags+=("-DSOME_FLAG=ON")
  # No need to define any phases, everything is done automatically
Currently, if a package needs to provide an environment variable, it has to be done by writing a pair of activation/deactivation scripts for each shell (bash, fish, csh, bat, …). Many packages don't do this for all shells, making them broken in some shells. It would be nicer to do this declaratively in the recipe metadata.
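A hypothetical sketch of such declarative activation metadata (all field names are invented for illustration; this is not an existing conda-build feature):

build:
  activation:
    set_env:
      MYPKG_HOME: $PREFIX/share/mypkg    # set on activate, restored on deactivate
    prepend_path:
      PATH: $PREFIX/opt/mypkg/bin
# conda would generate the bash/fish/csh/bat activation scripts from this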
Currently some packages that provide static libraries or are header-only require downstream packages to include their licenses. This is difficult to do correctly in practice, therefore it might be useful to have a license_exports field.
See this comment.
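A minimal, hypothetical sketch of the suggested field (only the license_exports name comes from the suggestion above; the placement and shape are assumed):

about:
  license: BSD-3-Clause
  license_file: LICENSE.txt
  # hypothetical: license files downstream packages must ship when they
  # statically link this package or vendor its headers
  license_exports:
    - LICENSE.txt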
A common request is to be able to use the same recipe file for local development or to provide nightly builds. Tools such as conda-devenv have been made to ease this use case, but this duplicates the maintenance work.
Currently, when specifying a version constraint such as numpy >=1.15, it conflicts with the global conda-forge pinning and causes the latest version to be used. It would be nice to be able to minimise versions in host to maximise compatibility later on. Or at least to be able to combine the global pins with the recipe ones, i.e. in this case where the global pin is 1.14 on x86 and 1.16 on ARM/POWER, 1.15 should be chosen on x86 and 1.16 should be chosen on alternative architectures.
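A sketch of the desired combination, using the numbers from the example above:

# recipe
requirements:
  host:
    - numpy >=1.15        # recipe constraint

# global pinning (conda-forge), per the example above:
#   x86:       numpy 1.14
#   ARM/POWER: numpy 1.16

# desired combined pin:
#   x86:       numpy 1.15  (recipe minimum raises the global pin)
#   ARM/POWER: numpy 1.16  (global pin already satisfies >=1.15)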
This one combines some of the great ideas from v1, but tries to make more minimal changes to the spec.
rules:

- test requirements move under test.requirements
- selectors are eval'ed in python - see the comments below on how they are handled

context:
  name: blah
  version: "1.2"
  build_num: 3
  major_ver: "{{ version.split('.')[0] }}"

# indicates that we are using version 2 of the recipe format
recipe_format: 2

package:
  name: {{ name|lower }}
  version: "{{ version }}" # we have to have quotes here or conda build errors

source:
  - git_url: https://github.com/blib/blah
    git_rev: master
    patches:
      - 001_blah_to_blih.patch
      - sel(win): 002_blib_to_blob.patch

build:
  number: 0
  script: "{{ python }} -m pip install . --no-deps --ignore-installed -vv"

requirements:
  host:
    - pip
    - python >=3.5
    # this is a selector
    # it can be as many keys as needed
    # when parsing, only one key is allowed to eval to true, if more than one does
    # an error is raised - the value in the dict with the true key is inserted for the
    # element
    # this construct can appear anywhere in the recipe
    - sel(win or osx): rust =1
      sel(linux): rust >=1
  run:
    - python
    - "{{ pin_subpackage('libxgboost', exact=True) }}"

test:
  requirements:
    - nose
  imports:
    - blah

about:
  home: "https://palletsprojects.com/p/click/"
  license: "BSD"
  license_family: "BSD"
  license_file: "LICENSE.txt"
  summary: "Composable command line interface toolkit"

extra:
  recipe-maintainers:
    - FrankSinatra
    - ElvisPresley
Here is one with outputs:
context:
  name: blah
  version: "1.2"
  build_num: 3
  major_ver: "{{ version.split('.')[0] }}"

package:
  version: "{{ version }}"

build:
  number: 0
  binary_relocation: False
  script: "{{ python }} -m pip install . --no-deps --ignore-installed -vv"

requirements:
  host:
    - pip
    - python >=3.5
    - "sel(win or osx)": rust =1
      "sel(linux)": rust >=1
  run:
    - python
    - "{{ pin_subpackage('libxgboost', exact=True) }}"

test:
  requirements:
    - nose
  imports:
    - blah

# all of these outputs use the same requirements as above
# along with the test requirements, but not imports
outputs:
  - package:
      name: out1
      # has the same version as above
    build:
      binary_relocation: True
      script: out1_build.sh
    test:
      imports:
        - blah.out1
      commands:
        - echo "this format is nice!"
  - package:
      name: out2
      version: "{{ version }}.1"
    build:
      script: out2_build.sh
    test:
      imports:
        - blah.out2

about:
  home: https://palletsprojects.com/p/click
  license: BSD
  license_family: BSD
  license_file: LICENSE.txt
  summary: Composable command line interface toolkit

extra:
  recipe-maintainers:
    - "FrankSinatra"
    - "ElvisPresley"
This proposal is a very logical reaction to complexity. Complexity arises for many reasons, and I (MRB) don't fully understand where the complexity in the original conda recipe format came from. IIUC selectors predate jinja2 in the recipes, so part of it might simply be the accumulation of features over time. I worry that another part of the complexity is from actual needs that we have not fully enumerated or understood. Thus it seems like we should either try to make minimal changes, or we need to take a census of sorts of recipes, both simple and complicated, to ensure that we can support everything that is needed.
The version being a string is something we can test in conda-build itself and error otherwise. This in and of itself is not a motivation for moving to TOML or the like.
The selectors here are implemented too narrowly given how our recipes use them in practice.
I really like the very structured context block. It solves issues around enforcing good jinja2 usage and clearly stating constants at the top.
We will still need jinja2 munging of stuff in the context block. I think a three-stage parsing process makes sense: 1) get the context, 2) eval the munged context with jinja2, 3) render the rest of the recipe with the context (see the sketch after these notes).
This spec doesn't address a bunch of other mildly confusing things (e.g. test commands w/ test scripts too, requires versus requirements in the test section, the py selector value, how build scripts are specified for outputs, …).
I'd like to see all jinja2 control flow commands deprecated.
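A hedged sketch of that three-stage process, applied to the context block from the example above:

# 1) extract the raw context:
context:
  name: blah
  version: "1.2"
  major_ver: "{{ version.split('.')[0] }}"

# 2) eval the jinja2-munged entries, earlier keys being available to later ones,
#    giving: {name: blah, version: "1.2", major_ver: "1"}

# 3) render the rest of the recipe with that evaluated context:
package:
  name: blah
  version: "1.2"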
Switch to a more restricted markup language (TOML) and get rid of some of the custom features.
Rules:
# initialize a variable context (has to come first)
[context]
name = "click"
version = "7.0"
# Now you can use the context variables
[package]
name = "{{ name|lower }}"
version = "{{ version }}"
[source]
url = "https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/{{ name }}-{{ version }}.tar.gz"
sha256 = "5b94b49521f6456670fdb30cd82a4eca9412788a93fa6dd6df72c94d5a8ff2d7"
[build]
number = 0
script = "{{ PYTHON }} -m pip install . --no-deps --ignore-installed -vv "
[requirements]
host = [
"pip",
"python >=3.5",
"rust =1 [win or osx]",
"rust >=1 [linux]"
]
run = [
"python",
"{{ pin_subpackage('libxgboost', exact=True) }}"
]
[test]
imports = [
"click"
]
[about]
home = "https://palletsprojects.com/p/click/"
license = "BSD"
license_family = "BSD"
license_file = "LICENSE.txt"
summary = "Composable command line interface toolkit"
doc_url = "https://palletsprojects.com/p/click/"
description = """
This is a long description
of the package that is described by this
meta.yaml file and it spans multiple lines.
"""
[extra]
recipe-maintainers = [
"FrankSinatra", "ElvisPresley"
]
Clarify how outputs are handled – implicit first-level output, but if the [outputs] key is specified, no implicit outputs?!
- MCS: https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#implicit-metapackages and https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#outputs-section