---
title: Revamping dependencies in a ca world
tags: ca
---

This document tries to explain why we need to do more in terms of dependency tracking in the presence of ca derivations.

For all the examples, assume the following Nix expression, where `hello` keeps a runtime dependency on both `libhello` and `greetings_text`:

```nix
with import <nixpkgs> {};
rec {
  libhello = stdenv.mkDerivation { ... };
  greetings_text = builtins.toFile "greetings" "Hello, Nixer";
  hello = stdenv.mkDerivation {
    buildInputs = [ libhello ];
    greetings_text = greetings_text;
    ...
  };
}
```

## Copying stuff

### Top-level drv output mapping

In an ia world, the following works:

```shell
nix build .#hello
nix copy --to ssh://remotemachine $(realpath ./result)
rm result
nix-collect-garbage
nix copy --from ssh://remotemachine .#hello
```

However, in a ca world, this doesn't work anymore. `remotemachine` has `outPath(hello!out)` in its store, but doesn't know that this path is `outPath(hello!out)` (and neither does the local machine).

The conclusion is that we must also be able to copy the mapping `hello!out -> outPath(hello!out)`, _i.e._ do something like:

```shell
# --raw prints the .drv path without the surrounding quotes
drvOutputId="$(nix eval --raw .#hello.drvPath)!out"
nix copy --to ssh://remotemachine "$drvOutputId"
```

Or, more concisely, `nix copy --to ssh://remotemachine .#hello` should be equivalent to the above. If we can do that, then we can fetch `.#hello` from the remote machine again, because it knows `outPath(hello!out)`.

### Inner drv output mappings

Another thing that should hold is:

```shell
nix copy --to ssh://remotemachine .#hello
nix-collect-garbage
nix copy --from ssh://remotemachine .#libhello
```

This means that we also need to copy the drv output mapping for `libhello!out`. However, nothing currently links `hello!out` and `libhello!out`. So `nix copy .#hello` must register on the remote store both the output path of `.#hello` and the output path of `.#libhello`.
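Here is a minimal sketch of what `nix copy .#hello` would have to compute. It assumes a hypothetical in-memory model in which each drv output records its output path and the drv outputs it depends on. The `DrvOutputMapping` class, the `drv_output_closure` function and the placeholder ids and store paths are illustrative only. They are not Nix's actual API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DrvOutputMapping:
    """Hypothetical record for one mapping, e.g. `hello!out -> outPath(hello!out)`."""
    out_path: str
    # Other drv outputs this one depends on: the link that is currently
    # missing between `hello!out` and `libhello!out`.
    drv_output_deps: frozenset = field(default_factory=frozenset)


def drv_output_closure(mappings, root):
    """Return every drv output id reachable from `root`, including `root`."""
    seen = set()
    stack = [root]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(mappings[current].drv_output_deps)
    return seen


# Placeholder ids and paths standing in for the example expression.
mappings = {
    "libhello.drv!out": DrvOutputMapping("/nix/store/aaa-libhello"),
    "hello.drv!out": DrvOutputMapping(
        "/nix/store/bbb-hello", frozenset({"libhello.drv!out"})
    ),
}

# The mappings `nix copy --to ssh://remotemachine .#hello` would need to send.
for drv_output in sorted(drv_output_closure(mappings, "hello.drv!out")):
    print(f"{drv_output} -> {mappings[drv_output].out_path}")
```

`greetings_text` does not appear in this closure. It is a plain store path, not a drv output, so it travels with the ordinary store path closure of `outPath(hello!out)`.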
## Wrapping it up

In an ia world, the only (meaningful) dependency in the example is that `outPath(hello!out)` depends on `outPath(libhello!out)` and `greetings_text`. In a ca world, however, we should also register that `hello!out` depends on `libhello!out` and `greetings_text`, which is a slightly different thing.

Moreover, the two dependency sets are not fully separate. Drv outputs depend not only on other drv outputs but also on store paths (like `greetings_text` in the example). The converse shouldn't hold, though: a store path can only depend on other store paths.

## Implementation wise: Changes to the sql schema

This means that we'll have to change the sql schema to take these dependencies into account. I'm not an sql expert, but I see two possibilities.

### Replace the `Refs` table by three tables

Add three tables, `PathToPathRefs`, `DrvOutputToDrvOutputRefs` and `DrvOutputToPathRefs`, to model the three different kinds of refs.

### Have an extra `Storable` table for `std::variant<StorePath,DrvOutput>`

The other option is to add a new table `Storable` (very bad name). Its elements would be an encoding of `std::variant<*DrvOutput, *ValidPathInfo>`, so each one is either a reference to a row of `ValidPaths` or a reference to a row of `DerivationOutputs`. Then change the `Refs` table to be a mapping between `Storable`s rather than between `*ValidPathInfo`s.

I've tried this option, though not to the end, because it involves quite a lot of changes everywhere. Just to make the terminology clear, I've defined

I ended up with the following (probably not final) schema changes:

1. The `DerivationOutputs` table is modified to:

   ```sql=
   create table if not exists DerivationOutputs (
       id integer primary key autoincrement not null,
       drvPath text not null,
       outputName text not null,
       outputPath integer not null,
       foreign key (outputPath) references ValidPaths(id) on delete cascade
   );
   ```

   The most interesting change is that `drvPath` is now raw text, while `outputPath` is a reference to a row of `ValidPaths`. The columns were also renamed to make them clearer. With this change, a derivation output exists independently of its deriver but requires its output path to be live. That is more in line with the (current) semantics of the table. (Because this is the opposite of the previous schema, maybe we should instead create a new table and keep the old one as it was.)

2. There's a new `Storables` (bad name) table, defined as:

   ```sql=
   create table if not exists Storables (
       id integer primary key autoincrement not null,
       path integer unique,
       drvOutput integer unique,
       CHECK ((path is not null AND drvOutput is null) OR (path is null AND drvOutput is not null)),
       foreign key (path) references ValidPaths(id) on delete cascade,
       foreign key (drvOutput) references DerivationOutputs(id) on delete cascade
   );
   ```

   This table holds the `std::variant` described above.

3. The `Refs` table is then modified as follows:

   ```sql=
   create table if not exists Refs (
       referrer integer not null,
       reference integer not null,
       primary key (referrer, reference),
       foreign key (referrer) references Storables(id) on delete cascade,
       foreign key (reference) references Storables(id) on delete restrict
   );
   ```

   It now holds dependencies between `Storable`s instead of between `StorePath`s. This rests on the unchecked assumption that a `StorePath` can only depend on other `StorePath`s.

With all that, we can define a function that gets all the dependencies of a `Storable` and, by extension, computes its closure. The nice thing is that it unifies the handling of `StorePath` and `DrvOutputId`. The less nice thing is that it's a pain to implement.
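Below is a minimal sketch of the closure function on top of the schema above. It uses Python's built-in `sqlite3` module and a recursive CTE. The sketch assumes a stripped-down `ValidPaths` table with only `id` and `path` columns (the real one has more). The `add_path`, `add_drv_output`, `add_ref` and `closure` helpers and all the store paths are hypothetical. This is not how Nix actually queries its database.

```python
import sqlite3

SCHEMA = """
create table ValidPaths (
    id integer primary key autoincrement not null,
    path text unique not null  -- simplified: the real table has more columns
);
create table DerivationOutputs (
    id integer primary key autoincrement not null,
    drvPath text not null,
    outputName text not null,
    outputPath integer not null,
    foreign key (outputPath) references ValidPaths(id) on delete cascade
);
create table Storables (
    id integer primary key autoincrement not null,
    path integer unique,
    drvOutput integer unique,
    CHECK ((path is not null AND drvOutput is null) OR (path is null AND drvOutput is not null)),
    foreign key (path) references ValidPaths(id) on delete cascade,
    foreign key (drvOutput) references DerivationOutputs(id) on delete cascade
);
create table Refs (
    referrer integer not null,
    reference integer not null,
    primary key (referrer, reference),
    foreign key (referrer) references Storables(id) on delete cascade,
    foreign key (reference) references Storables(id) on delete restrict
);
"""

# Every Storable reachable through Refs (`union` also stops on cycles),
# rendered as a store path or as a `drvPath!outputName` id.
CLOSURE_QUERY = """
with recursive reachable(id) as (
    select ?
    union
    select Refs.reference from Refs join reachable on Refs.referrer = reachable.id
)
select coalesce(vp.path, d.drvPath || '!' || d.outputName)
from reachable
join Storables s on s.id = reachable.id
left join ValidPaths vp on vp.id = s.path
left join DerivationOutputs d on d.id = s.drvOutput
"""


def add_path(db, path):
    path_id = db.execute("insert into ValidPaths (path) values (?)", (path,)).lastrowid
    return db.execute("insert into Storables (path) values (?)", (path_id,)).lastrowid


def add_drv_output(db, drv_path, output_name, out_path):
    (path_id,) = db.execute("select id from ValidPaths where path = ?", (out_path,)).fetchone()
    drv_output_id = db.execute(
        "insert into DerivationOutputs (drvPath, outputName, outputPath) values (?, ?, ?)",
        (drv_path, output_name, path_id),
    ).lastrowid
    return db.execute("insert into Storables (drvOutput) values (?)", (drv_output_id,)).lastrowid


def add_ref(db, referrer, reference):
    db.execute("insert into Refs (referrer, reference) values (?, ?)", (referrer, reference))


def closure(db, storable_id):
    return {row[0] for row in db.execute(CLOSURE_QUERY, (storable_id,))}


db = sqlite3.connect(":memory:")
db.execute("pragma foreign_keys = on")
db.executescript(SCHEMA)

libhello_path = add_path(db, "/nix/store/aaa-libhello")
hello_path = add_path(db, "/nix/store/bbb-hello")
greetings = add_path(db, "/nix/store/ccc-greetings")
libhello_out = add_drv_output(db, "/nix/store/xxx-libhello.drv", "out", "/nix/store/aaa-libhello")
hello_out = add_drv_output(db, "/nix/store/yyy-hello.drv", "out", "/nix/store/bbb-hello")

# Path -> path refs, as an ia store already records them.
add_ref(db, hello_path, libhello_path)
add_ref(db, hello_path, greetings)
# Drv output -> drv output and drv output -> path refs: the new ca ones.
add_ref(db, hello_out, libhello_out)
add_ref(db, hello_out, greetings)

for item in sorted(closure(db, hello_out)):
    print(item)
```

Because `Refs` only stores `Storables` ids, the same query serves both store paths (`closure(db, hello_path)`) and drv outputs (`closure(db, hello_out)`), which is the unification mentioned above. The cost is an extra `Storables` insert for every path and drv output. This sketch only follows `Refs`. Pulling in the output paths of the drv outputs it finds, through `DerivationOutputs.outputPath`, would be a separate step.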