Flakes and Git integration

The point of this document is to question the UX choice that makes flakes consider only the files tracked by the git repository, ignoring other files that are visible in the filesystem.
I have observed many complaints about this design, and I wanted to collect opinions, use cases and stumbling blocks in one place, in order to provide the community with a unified design document.
All the questions should be addressed, which should make the choice more acceptable to users, as they can be convinced that their issues have been taken into account in the final decision.

This is not about the integration and prominence of GitHub as the default remote git provider, nor about the github: input scheme for flakes. It is also not about the fact that flakes copy the content of the current git repository into the store for evaluation. That is addressed by the Source tree abstraction #6530 draft PR.

Current design (with `nix (Nix) 2.8.0pre20220411_f7276bc`)

An example: creating a new flake

This is the ideal workflow for creating a new flake. The git repo may already exist, but let's assume we start a completely new project for the sake of the demo.

$ cd /somewhere
$ git init .
$ nix flake init -t templates#simpleContainer
$ nix build .#nixosConfigurations.[...]
warning: Git tree '/somewhere' is dirty
[0/218 built, 1/2/192 copied (7.1/426.0 MiB), 0.8/90.9 MiB DL] (works)

Now, the thing is that nix flake init does some magic for you: it registers the intent to add the files to git with git add -N flake.{nix,lock}. This becomes obvious if the git repo is created after the nix flake init command:

$ cd /somewhere
$ nix flake init -t templates#simpleContainer
$ git init .
$ nix build .#nixosConfigurations.[...] # fails
warning: Git tree '/somewhere' is dirty
error: getting status of '/nix/store/<hash>-source/flake.nix': No such file or directory

Now, nix fails with a missing flake.nix, which we have just created. The error may be even more confusing if you do some development with the flake before creating the git repo, because it would work correctly outside of a git repo.

An attempt at writing down the behavior

The reason it needs to be staged is that flake evaluation will only look at files that are tracked by Git. Note that it’s the content of the file in the worktree that counts, not the staging area. In particular, you can git add -N the flake.nix file and it’ll be just fine, at least until the next commit.

@lourkeur comment in My painpoints with flakes

Each nix evaluation of a flake first makes a copy of the local files into the nix store.
The set of files that are copied changes depending on whether the flake.nix file is inside a git repository (that is, whether there is a valid .git folder in the current path or any of its parent folders).
When it is not, the source tree is formed by copying all the files in the folder that contains the flake.nix file.
Instead, if the flake.nix is inside a git repository, then the set of source files is formed by taking all the files that are currently tracked by git (i.e. in the staging area) but their content is taken from the working area.
That is, the source tree is identical to the one in the commit you would get by calling git commit --all

-a, --all

Tell the command to automatically stage files that have been modified and deleted, but new files you have not told Git about are not affected.

As a consequence, it has been repeatedly explained that the way to make a new file available for the evaluation of a flake is to run git add --intent-to-add on the file.

-N, --intent-to-add

Record only the fact that the path will be added later. An entry for the path is placed in the index with no content. This is useful for, among other things, showing the unstaged content of such files with git diff and committing them with git commit -a.

So, the source tree is quite different when inside and outside git. Inside git, the source tree is rooted at the root of the git repo (instead of at the flake.nix folder) and takes into account a subset of the files (the ones in the staging area) whose content comes from the working area.
Outside git, it takes a copy of the tree rooted at the folder of the flake.nix currently evaluated.
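To make the two rules concrete, here is a small Python model of how the file set could be chosen. It is a sketch under assumptions, not Nix's implementation: the working area is a dict from repo-relative path to content, the index is a set of tracked paths (an intent-to-add entry counts as tracked), and `flake_source_tree` is a hypothetical name.

```python
def flake_source_tree(worktree, index, flake_dir, in_git):
    """Model which files end up in the store copy of a local flake.

    worktree:  {repo-relative POSIX path: content} for the working area.
    index:     set of paths git tracks (a `git add -N` entry counts).
    flake_dir: folder holding flake.nix, relative to the repo root ("." for the root).
    in_git:    whether a .git folder exists in flake_dir or one of its parents.
    """
    if in_git:
        # Rooted at the repo root: only tracked paths, but with working-area content.
        # A tracked file deleted from the worktree simply disappears, as with `git commit --all`.
        return {p: worktree[p] for p in sorted(index) if p in worktree}
    # Outside git: every file under the flake folder, re-rooted at that folder.
    prefix = "" if flake_dir in ("", ".") else flake_dir.rstrip("/") + "/"
    return {p[len(prefix):]: c for p, c in worktree.items() if p.startswith(prefix)}
```

With an empty index, as right after `git init` in the second example, flake.nix is absent from the result, which mirrors the "No such file or directory" error; adding the path to the index (what `git add -N` does) makes it appear. For a flake in a subfolder, the git case keeps repo-root paths while the non-git case re-roots at the flake folder.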

Reasons behind that choice

This design choice was made on purpose. We can find traces of this intent in several places:

Ideally, we would copy the flake to the store only when its outPath attribute is evaluated. However, we also need to ensure that it's not possible to access untracked files (i.e. we need to check every file against git ls-files).

@edolstra, in Copy local flakes to the store lazily #3121

Well, no, I could not find any place where this was discussed.
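The lazy approach in the quote above can be sketched as follows: files are read from the working area only when evaluation asks for them, and any path outside the tracked set is refused. This is a hypothetical Python model, not the code from #3121; `tracked` stands for the output of `git ls-files`, and `TrackedFileAccessor` is a made-up name.

```python
class TrackedFileAccessor:
    """Serve files from the working area on demand, refusing untracked paths.

    Hypothetical model: `worktree` maps paths to contents and `tracked` stands
    for the paths reported by something like `git ls-files`.
    """

    def __init__(self, worktree, tracked):
        self.worktree = worktree
        self.tracked = frozenset(tracked)
        self.copied = {}  # files materialized so far (a stand-in for the store copy)

    def read(self, path):
        if path not in self.tracked:
            raise PermissionError(f"access to untracked file '{path}' is not allowed")
        if path not in self.worktree:
            raise FileNotFoundError(path)
        # Copy on first access instead of copying the whole tree up front.
        if path not in self.copied:
            self.copied[path] = self.worktree[path]
        return self.copied[path]
```

Tracked files that evaluation never touches are never copied, which is the benefit of laziness; the price is that every access must be checked against the tracked set.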

The behavior comes from the fetchGit builtin, which received little criticism, if any. I guess this is mostly because it was seldom used on dirty local checkouts with new untracked files that had to be included.

This was introduced in 2017, in https://github.com/NixOS/nix/commit/72cd52c3cdd1fc465fade6d553b3823aca9f8b6e

builtins.fetchgit: Support importing a working tree

For example, you can write

src = fetchgit ./.;

and if ./. refers to an unclean working tree, that tree will be copied to the Nix store. This removes the need for "cleanSource".

In nix since 2.0, made by Eelco on Oct 30, 2017

The fact that it was initially implemented for a different purpose, and not much used, hints that the design may not have been a deliberate decision, but rather a quick way to get the job done.

May I surmise that the use of git ls-files in there was mainly a way to take .gitignore into account? There would be other ways to do that, but they are way more convoluted AFAIK. Ripgrep, for instance, seems to ship its own .gitignore parser and glob matching engine.
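To illustrate the alternative, here is a deliberately simplified .gitignore matcher in Python built on the standard `fnmatch` module. It is a sketch, not git's algorithm: it handles comments, `!` negation, trailing `/` for directories, anchoring for patterns containing a `/`, and last-match-wins, but it ignores `**`, escapes, nested .gitignore files, and the fact that in git `*` does not match `/`. The function names are hypothetical.

```python
import fnmatch


def parse_gitignore(text):
    """Turn .gitignore text into (pattern, negated, anchored, dir_only) rules."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        negated = line.startswith("!")
        if negated:
            line = line[1:]
        dir_only = line.endswith("/")
        line = line.rstrip("/")
        # A slash at the start or in the middle anchors the pattern at the root.
        anchored = "/" in line
        rules.append((line.lstrip("/"), negated, anchored, dir_only))
    return rules


def _ignored_by_rules(rules, path, is_dir):
    """Apply the rules in order; the last matching rule wins."""
    ignored = False
    for pattern, negated, anchored, dir_only in rules:
        if dir_only and not is_dir:
            continue
        # Unanchored patterns match the last path component at any depth.
        target = path if anchored else path.rsplit("/", 1)[-1]
        if fnmatch.fnmatchcase(target, pattern):
            ignored = not negated
    return ignored


def is_ignored(rules, path):
    """True if the file `path`, or one of its parent directories, is ignored."""
    parts = path.split("/")
    # As in git, a file inside an ignored directory cannot be re-included.
    for i in range(1, len(parts)):
        if _ignored_by_rules(rules, "/".join(parts[:i]), True):
            return True
    return _ignored_by_rules(rules, path, False)
```

Even this reduced version needs some care (directory rules, negation order), which supports the point that reusing git ls-files was the cheap way out.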

Recently, the only reason I heard in support of the current behavior is that it avoids issues downstream, when you push the code to CI only to discover that you forgot to add important files. This argument is weak, as we will see below.

To be honest, I am quite surprised that I could find no more discussion about this part of the design.

Bad

- Flakes perform git commands silently (git add -N).
- Worse, they use --force (see https://github.com/NixOS/nix/issues/5810) to bypass the .gitignore.
- Flakes impose a git workflow on the users (they must use the staging area in a certain way). There are many git workflows. For example, I usually make many changes, and then git add -p them in separate commits.
- It deviates from all the other tools, which ignore git most of the time, or provide opt-in (or opt-out) integration for convenience.
  - Tup generates a .gitignore file when there is none, and can append to existing ones. An opt-in feature!
  - All their effects are visible, and can be disabled.
  - Other tools take an existing .gitignore into account (ripgrep).
- It mixes the staging and working areas, breaking git abstractions.
- It does not fix all the cases: for example, when the file you missed is only evaluated in CI. The missing files are not always detected by a local eval.
- The workaround of using path:. is not equivalent, as it also changes the root of the source tree for nested flakes. (TODO: check that an import ../foo.nix works in git: and not in path:)

Good

- Flakes catch files that are used but not added.
- There is an escape hatch: the path:. input scheme.

My opinion

I think this has been implemented the wrong way. The default should be path:., possibly with support for finding the closest enclosing flake in the project.

Warnings should be about files that are used, but not under version control. This becomes feasible with lazy sources fetching.

This may change the behavior of fetchGit, but 1) no one will notice, as it never caused any issue before, and 2) the new behavior makes way more sense.

Options

- Workspace without ignored files
- Same as now
- Skip git integration.

- Use case for a user who does not want to commit the flakes.
- Use case for a user who

Use cases

From previous discussions, nix flakes have two distinct use cases. The difference lies in nix evaluation versus nix build.

Nix flakes evaluation

When evaluating .nix files, nix does not really need to have them all in the nix store, just as the legacy commands evaluate inside any source tree.

Despite this, flakes copy the source tree and then evaluate the store copy. This is what leads to a discrepancy between the user's workspace and the tree that is actually evaluated in the store.

In this mode, and with this use case in mind, it makes little sense to restrict the set of files that are available. Yet this is what we do now, and it leads to the surprising behavior that nix can display very precisely the file that it would need to evaluate, but still refuses to do it.

Nix flakes builds

When building anything, nix needs a store path. In particular, store paths formed from flake source paths are content addressed. This means we cannot know the nix store path without knowing beforehand the full set of files that will be part of it.

That source path is very important, as it determines the cache hits that we can expect from any cache. If various files enter and leave this tree without being used, they trigger "false cache misses": we are in effect building the same outputs from the same set of meaningful files, but the nix path names change and thus force a rebuild.
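A small Python model of such a false cache miss: the source path is derived from a hash over every file in the tree, so an unrelated file changes it even though the build output would be identical. `tree_hash` is a simplified stand-in for Nix's NAR-based hashing, and `build` is a toy build; both are hypothetical.

```python
import hashlib


def tree_hash(tree):
    """Hash a {path: content} tree: a simplified stand-in for a content-addressed store path."""
    h = hashlib.sha256()
    for path in sorted(tree):
        # Length-prefix each field so that different trees never serialize identically.
        for field in (path.encode(), tree[path].encode()):
            h.update(len(field).to_bytes(8, "big"))
            h.update(field)
    return h.hexdigest()


def build(tree):
    """Toy build whose output only depends on main.c, although it receives the whole tree."""
    return "compiled(" + tree["main.c"] + ")"
```

Adding a notes.txt to the tree leaves `build` unchanged but changes `tree_hash`, so the second build is a cache miss even though nothing meaningful changed.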

In that sense, we pay the price of a lack of granularity.

Unifying these ideas

Evaluation of nix files did not happen in the store before flakes. With flakes, they always need to be in the store. I wonder if this restriction could be lifted.

The current implementation tries very hard to have both of the following use cases hit the same nix cache entries:

nix build github:user/repo#pkg

nix build .#pkg

In other words, the following workflow should build the exact same nix derivation on both computers:

computer A

$ vim /src/file.c    # make changes
$ nix build          # build with local changes
$ git commit --all
$ git push

computer B

$ git pull
$ nix build          # fully cached build

Note about evaluation of flakes vs evaluation of nix files

Evaluating an expression like "${./foo}" does not produce the same path inside a flake and outside it. Because flake evaluation happens in the store, evaluating "${./foo}" does not copy anything to the store: it reuses the foo subpath of the flake store path.

This discrepancy has deep implications. Because evaluating a flake from a remote source (say github:user/repo) pulls the remote resource into a store path and evaluates from there, the behavior differs from what you would expect when evaluating in a local checkout.

To reproduce that behavior, and have local flakes evaluate like remote ones, nix flakes copy all the sources before evaluation. This poses the problem at hand: which files should be part of that copy?

Taking too many files, including random garbage in the git repository, leads to rebuilds even if the relevant files did not change.
Taking too few causes the frustration that a file that is obviously present in the workspace is pedantically ignored by nix.

What if?

What if nix could detect that a file used during evaluation is missing from the sources, and produce a warning? Like "evaluation used file foo but it was not included in the sources for evaluation. This will fail when you upstream it."

That solves the pain point of having to add files to the staging area, but not the underlying issue that anything built from these sources is still invalid, because the build would still be invalidated as soon as the missing file is added to the store.
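The warning could be sketched like this: evaluation runs against the whole workspace through a recording `read` callback, and paths that were read but are not tracked produce the warning quoted above. `eval_with_warnings` and the callback-style evaluator are hypothetical; this is a Python model of the idea, not an existing Nix feature.

```python
def eval_with_warnings(evaluate, worktree, tracked):
    """Evaluate against the whole workspace, then warn about files read but not tracked.

    `evaluate` is a hypothetical evaluator taking a `read(path)` callback;
    `tracked` stands for the set of paths known to git.
    """
    accessed = []

    def read(path):
        accessed.append(path)  # record every file evaluation touches
        return worktree[path]

    result = evaluate(read)
    warnings = [
        f"evaluation used file {path} but it was not included in the sources "
        "for evaluation. This will fail when you upstream it."
        for path in dict.fromkeys(accessed)  # de-duplicate, keep first-access order
        if path not in tracked
    ]
    return result, warnings
```

As with the current behavior, only files that the local evaluation actually reads are checked, so a file used only in CI would still slip through.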

Separate eval from build

Truth is, nix has always had two phases. First evaluate, then build.
This flake evaluation model breaks it, because it uses the same sources for evaluation and building. All of this is because defining sources from inside a store path leads to a different result than evaluating sources from outside of it.

I understand that it may be seen as an optimisation when the sources are already in the store, back in the time when we would evaluate nixpkgs from a tarball copied to the store. Copying every source file again would be a waste of time.

Scrap that, it is false.

The only problem this is trying to solve is to avoid having to build a proper filterSources in the nix derivation.
By using the staging area as a kind of reversed flakeignore, it gets us the ability to do src = self (or src = ./. if you prefer) in a best effort way.

We pay the price here of having a package manager instead of a file manager. Because the nix files are part of the source of the program, changing the nix files in any irrelevant way (spacing) will trigger a full rebuild. Bazel, make, tup: none of them would rebuild on such a silly change. But it raises a good question: what to do with nix configuration files, given a build system (nix) that works on paths/packages/subtrees/derivations?
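An explicit source filter avoids both problems described above: unrelated files and cosmetic edits to nix files no longer change the source hash. This Python sketch mimics the idea of `builtins.filterSource` (a predicate over paths) on an in-memory tree; `tree_hash` is a simplified stand-in for Nix's hashing and `keep_c_sources` is a hypothetical predicate for a C project.

```python
import hashlib


def tree_hash(tree):
    """Hash a {path: content} tree: a simplified stand-in for a store path hash."""
    h = hashlib.sha256()
    for path in sorted(tree):
        for field in (path.encode(), tree[path].encode()):
            h.update(len(field).to_bytes(8, "big"))  # length prefix avoids ambiguity
            h.update(field)
    return h.hexdigest()


def filter_source(keep, tree):
    """Keep only the files accepted by `keep`, in the spirit of builtins.filterSource."""
    return {path: content for path, content in tree.items() if keep(path)}


def keep_c_sources(path):
    """Hypothetical predicate: only C sources and headers matter to this build."""
    return path.endswith((".c", ".h"))
```

Reformatting flake.nix or adding notes.txt changes the hash of the whole tree but not of the filtered one, while a real change to main.c still does. The cost is that the predicate has to be written and maintained by hand, which is exactly what src = self lets people skip.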

We also pay the price of sandboxing vs tracing, as the full set of files needs to be known in advance to be included in the sources path.
A tracing build system would discover the paths that are effectively used, and call that the set of inputs a posteriori. This is not possible here because names need to be known before they are even used, as they appear in .drv files.

More questions

- Why do flakes warn about dirty trees? Local development trees will always be dirty! It feels like a stupid thing to say so loudly.
- Why provide a path: … flake scheme? When there is no git dir it works out of the box anyway, and when there is one it will copy all the crap that .gitignore excludes.

Proposal

Evaluate in "path:." mode as long as possible (let's even include the .gitignored files). When the sources have to be reified (src = self), generate them, with the custom ls-files + dirty changes based algorithm if really important, and fail when one of the files evaluated previously
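A minimal Python model of this proposal, assuming the truncated last clause means "fail when a file read during evaluation is missing from the reified sources". Evaluation may read anything in the workspace, and the git-based file set is only computed when `src = self` is reified. `LazyFlake`, `reify`, and `UntrackedSourceError` are hypothetical names.

```python
class UntrackedSourceError(Exception):
    """A file read during evaluation is missing from the reified sources."""


class LazyFlake:
    """Hypothetical model: evaluate in "path:." mode, reify git-based sources on demand."""

    def __init__(self, worktree, tracked):
        self.worktree = worktree     # every file on disk, .gitignored ones included
        self.tracked = set(tracked)  # e.g. `git ls-files`, intent-to-add entries included
        self.evaluated = set()       # files read so far during evaluation

    def read(self, path):
        # Evaluation may read anything in the workspace; we only remember what it used.
        self.evaluated.add(path)
        return self.worktree[path]

    def reify(self):
        # Called when `src = self` needs a store path: tracked files, working-area content.
        sources = {p: self.worktree[p] for p in sorted(self.tracked) if p in self.worktree}
        missing = sorted(self.evaluated - set(sources))
        if missing:
            raise UntrackedSourceError(f"files used during evaluation are not in the sources: {missing}")
        return sources
```

In this model evaluation never fails because of git, the check only happens when sources are actually needed for a build, and untracked junk such as a result symlink stays out of the store path.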