rattler-build cache

We want to be able to "cache" an intermediary build result for multiple reasons. The main use-case is to split a single build into multiple packages (such as headers, shared library, static library), without needing to rerun the build.

We propose to add a single "cache" step to rattler-build.

cache:
  requirements:
    build:
      - ${{ compiler('c') }}
      - cmake
      - ninja
  script:
    - cmake build ...

The cache is built like a normal package; afterwards, the work directory and any new files from the prefix are zipped up into a cache folder. For each output, we extract the files again from that cache.
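
As an illustration of how a cache step and multiple outputs could fit together, here is a rough sketch. The per-output script keys, the package names, and the concrete cmake invocations are assumptions for the sake of the example, not part of the proposal above.

cache:
  requirements:
    build:
      - ${{ compiler('c') }}
      - cmake
      - ninja
  script:
    # the expensive configure + compile happens once and is cached
    - cmake -GNinja -S . -B build ...
    - cmake --build build

outputs:
  - package:
      name: libfoo
    script:
      # each output only runs a cheap install step against the cached build tree
      - cmake --install build --prefix $PREFIX
  - package:
      name: libfoo-static
    script:
      # "static" is a hypothetical install component, just for illustration
      - cmake --install build --component static --prefix $PREFIX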

QUESTIONS

  • Should the cache step already "install" files into the host prefix? This would make it easy to split the package later, but it makes caching a bit harder since we would need to zip up both the new files from the host prefix and the work directory.
  • Does any of this really make sense, or would ccache / sccache integration be superior? - WV: Actually, the configuration step alone takes quite some time, e.g. when invoking cmake or autotools.
  • Antoine P: About terminology, maybe something along the lines of "pre-build" / "build-common" would be more explicit, especially with respect to "does the cache install?". "Pre-build" would be the clearer name in case the "cache" is not installed.
    I also hope to see caching in between builds (à la ccache, sccache) in the future, so "cache" may end up with multiple meanings.

Use cases:

  • C++ library + Python library (e.g. mamba)
  • C++ split package (headers, library, debug symbols)
  • ?

To make it easy to produce split packages, we would add per-output file globs to rattler-build, e.g.

outputs:
  - package:
      name: libfoo
    files:
      - lib/**
      - include/**
  - package:
      name: py-libfoo
    files:
      - site-packages/**

Challenges

How do we handle run dependencies exported from the cache step?

Antoine P: Sometimes even build requirements (compiler, libs) need to be shared because some compilation can be shared, but not all.
Alternatively, perhaps the abstraction we are looking for is not a "package"-like cache but a "reusable build step". Listing it as a script element would have both the effect of caching and of "merging" all dependencies. In fact, since for caching it needs to be called first, perhaps a new key like "script-reuse" would be needed.
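
To make the run-exports question more concrete, consider the sketch below. Whether the compiler's run export (e.g. libstdcxx-ng for a C++ compiler) should attach to every output, or only to the outputs that actually link the library, is exactly what is unclear; the ignore_run_exports placement follows conda-build conventions and is only an assumption here.

cache:
  requirements:
    build:
      - ${{ compiler('cxx') }}   # carries a run export, e.g. libstdcxx-ng
  script:
    - cmake ...

outputs:
  - package:
      name: libfoo              # links the shared library -> wants the run export
  - package:
      name: libfoo-headers      # header-only -> may want to opt out, e.g.:
    build:
      ignore_run_exports:
        - libstdcxx-ng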

If we have an output chain, should only the first output get the run exports applied?

For example:

outputs:

  • bla
  • py-bla -> requires bla

It would be enough if bla gets the run exports from the cache.
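
For illustration, the output chain could look something like the sketch below, where py-bla depends on bla via pin_subpackage and therefore picks up the relevant runtime constraints transitively; the exact pin and the requirement sections are assumptions, not part of the proposal.

outputs:
  - package:
      name: bla
    # run exports from the cache's build requirements are applied here
  - package:
      name: py-bla
    requirements:
      host:
        - ${{ pin_subpackage('bla', exact=True) }}
      run:
        - ${{ pin_subpackage('bla', exact=True) }}
    # no need to apply the cache's run exports again: the dependency on bla
    # already carries the runtime constraints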