rattler-build cache

We want to be able to "cache" an intermediary build result for multiple reasons. The main use-case is to split a single build into multiple packages (such as headers, shared library, static library), without needing to rerun the build.

We propose to add a single "cache" step to rattler-build.

cache:
  requirements:
    build:
      - ${{ compiler('c') }}
      - cmake
      - ninja
  script:
    - cmake build ...

The cache is built like a normal package; afterwards, the work directory and any new files from the prefix are zipped up into a cache folder. For each output, we extract the files again from that cache.
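
As an illustration of how a cache step and multiple outputs could fit together, here is a rough sketch. The per-output script keys, the package names, and the concrete cmake invocations are assumptions for the sake of the example, not part of the proposal above.

cache:
  requirements:
    build:
      - ${{ compiler('c') }}
      - cmake
      - ninja
  script:
    # the expensive configure + compile happens once and is cached
    - cmake -GNinja -S . -B build ...
    - cmake --build build

outputs:
  - package:
      name: libfoo
    script:
      # each output only runs a cheap install step against the cached build tree
      - cmake --install build --prefix $PREFIX
  - package:
      name: libfoo-static
    script:
      # "static" is a hypothetical install component, just for illustration
      - cmake --install build --component static --prefix $PREFIX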

QUESTIONS

  • Should the cache step already "install" files into the host prefix? This would make it easy to split the package later, but it makes caching a bit harder since we would need to zip up both the new files from the host prefix and the work directory.
  • Does any of this really make sense, or would ccache / sccache integration be superior? - WV: Actually, the configuration step alone takes quite some time, e.g. when invoking cmake or autotools.
  • Antoine P: About terminology, maybe something along the lines of "pre-build" / "build-common" would be more explicit, especially with respect to "does the cache install?". "Pre-build" would be the clearer name in case the "cache" is not installed.
    I also hope to see caching in between builds (à la ccache, sccache) in the future, so "cache" may end up with multiple meanings.

Use cases:

  • C++ library + Python library (e.g. mamba)
  • C++ split package (headers, library, debug symbols)
  • ?

To make it easy to produce split packages, we would add per-output file globs to rattler-build, e.g.

outputs:
  - package:
      name: libfoo
    files:
      - lib/**
      - include/**
  - package:
      name: py-libfoo
    files:
      - site-packages/**

Challenges

How do we handle run dependencies exported from the cache step?

Antoine P: Sometimes even build requirements (compiler, libs) need to be shared because some compilation can be shared, but not all.
Alternatively, perhaps the abstraction we are looking for is not a "package"-like cache but a "reusable build step". Listing it as a script element would have both the effect of caching and of "merging" all dependencies. In fact, since for caching it needs to be called first, perhaps a new key like "script-reuse" would be needed.
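
To make the run-exports question more concrete, consider the sketch below. Whether the compiler's run export (e.g. libstdcxx-ng for a C++ compiler) should attach to every output, or only to the outputs that actually link the library, is exactly what is unclear; the ignore_run_exports placement follows conda-build conventions and is only an assumption here.

cache:
  requirements:
    build:
      - ${{ compiler('cxx') }}   # carries a run export, e.g. libstdcxx-ng
  script:
    - cmake ...

outputs:
  - package:
      name: libfoo              # links the shared library -> wants the run export
  - package:
      name: libfoo-headers      # header-only -> may want to opt out, e.g.:
    build:
      ignore_run_exports:
        - libstdcxx-ng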

If we have an output chain, should only the first output get the run exports applied?

For example:

outputs:

  • bla
  • py-bla -> requires bla

It would be enough if bla gets the run exports from the cache.
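
For illustration, the output chain could look something like the sketch below, where py-bla depends on bla via pin_subpackage and therefore picks up the relevant runtime constraints transitively; the exact pin and the requirement sections are assumptions, not part of the proposal.

outputs:
  - package:
      name: bla
    # run exports from the cache's build requirements are applied here
  - package:
      name: py-bla
    requirements:
      host:
        - ${{ pin_subpackage('bla', exact=True) }}
      run:
        - ${{ pin_subpackage('bla', exact=True) }}
    # no need to apply the cache's run exports again: the dependency on bla
    # already carries the runtime constraints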