
# `verdi init`

## Semantics

I'll start with some semantics again, to limit confusion in the presentation and the discussion that follows. I will _not_ define an AiiDA "instance", since this definition is exactly what is up for debate. For now, I will define[^1]:

* **AiiDA installation**: the source code, installed by a package manager so that the `verdi` entry point is available from the command line.
* **The `.aiida` directory**: I've called this the "configuration" before, but at this point it contains more than just configuration:
    * `config.json`: the _actual_ configuration, of both the AiiDA installation _and_ the AiiDA profiles (see below).
    * Daemon-related configuration files and logs for each profile.
    * The default location for repositories.
    * The default location for SQLite databases of the `sqlite_dos` storage.
* **AiiDA profile**: a separate combination of an AiiDA storage and a daemon.

[^1]: These are _my_ definitions, and they can probably be made more precise/complete. Suggestions are welcome; we might even use them to start a glossary.

The image below tries to visualise and connect these concepts:

![image](https://hackmd.io/_uploads/HJu1elFeC.png)

## Introduction

As Julian mentioned, the `verdi init` command (and commit https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe, which it relies on) tries to tackle several issues:

1. Quick setup of a profile that requires no services.
1. Having a contained AiiDA "instance" that is easy to remove _completely_.
1. Better separation of AiiDA "instances".

(1) would be fully satisfied by "`verdi blitz`" (https://github.com/aiidateam/aiida-core/pull/6305), and (2) _mostly_ as well: `verdi profile delete` will remove the storage associated with the profile, but not the daemon-related files. This could be changed, but let's not go into that here.

(3) is the main issue I want to discuss, i.e.
the changes made in https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe. I could also post my essay there, but I think `verdi init`, and the usage advocated for it here, helps to illustrate the consequences of that change and why I'm against it.

## The Status Quo

The current definition of an AiiDA "instance" comes straight from [the documentation](https://aiida.readthedocs.io/projects/aiida-core/en/stable/howto/installation.html#isolating-multiple-instances):

> An AiiDA instance is defined as the installed source code plus the configuration folder that stores the configuration files with all the configured profiles.

It continues:

> It is possible to run multiple AiiDA instances on a single machine, simply by isolating the code and configuration in a virtual environment.

Note that it mentions storing the configuration in the virtual environment as well. Next, it mentions using Python environments, which I'm sure we all agree with:

> To isolate the code, make sure to install AiiDA into a virtual environment, e.g., with conda or venv, as described here. Whenever you activate this particular environment, you will be running the particular version of AiiDA (and all the plugins) that you installed specifically for it.

Then comes the tricky bit:

> This is separate from the configuration of AiiDA, which is stored in the configuration directory which is always named `.aiida` and by default is stored in the home directory. Therefore, the default path of the configuration directory is `~/.aiida`. By default, each AiiDA instance (each installation) will store associated profiles in this folder.

This is the current problem with separating AiiDA instances: by default, the `.aiida` folder is _not_ isolated the way the Python environment is. The documentation does indicate best practices for dealing with this:

> A best practice is to always separate the profiles together with the code to which they belong.
> The typical approach is to place the configuration folder in the virtual environment itself and have it automatically selected whenever the environment is activated.

But this requires setting the `AIIDA_PATH` environment variable, which is not trivial, depends on the environment manager and is, frankly, tedious.

## `git`-like `.aiida` discovery

The changes in https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe proposed a solution to this problem. They insert an extra step in the resolution of the `.aiida` folder:

1. The `AIIDA_PATH` variable, if set.
2. `git`-like `.aiida` discovery.
3. The `$HOME/.aiida` folder.

Basically, if `AIIDA_PATH` is not set, AiiDA will now walk up the directory hierarchy starting from the current working directory and use the first `.aiida` folder it finds.

At first glance the change seems innocuous and reasonable; I liked it too when I first saw the PR. But it has some pretty profound implications, especially when combined with `verdi init` and how it will be used. As I've mentioned, it radically redefines what an AiiDA instance is:

> An AiiDA instance is defined by its configuration directory, which is always named `.aiida`.

It also allows for multiple `.aiida` directories per AiiDA installation[^2].

[^2]: Note that in principle it is also possible to have multiple `.aiida` folders for one AiiDA installation in the status quo: the user _could_ change the `AIIDA_PATH` variable if they wanted to. But it's a use case we clearly don't recommend, and so advanced that I'd be surprised if anyone has done it in practice.

Moreover, its usage in `verdi init` _encourages_ multiple `.aiida` directories:

- Any beginning user who installs AiiDA for the first time in a fresh Python environment will have _two_ `.aiida` folders after running `verdi init`, and will interact with one or the other depending on where they execute `verdi` or where the Python script they are running is located.
- There are already multiple examples in the comments above of users who want to create another profile; this would fail with `verdi init`, because a `.aiida` folder already exists (in the current working directory or a parent) or `AIIDA_PATH` is set. It seems we'd be recommending that they create a new directory and run `verdi init` again.

So, after this is released, we can fully expect users to have multiple `.aiida` directories per AiiDA installation, each of which can have multiple AiiDA profiles.

## The Problem

Great, so what's the problem? I've tried to gather my thoughts here as best as I could, and at the end I've also added some more examples of problems a user could run into because of this change.

### It makes AiiDA _more_ complex

The fact that we can now have multiple `.aiida` folders per installation, each with multiple profiles, adds another layer of complexity to the AiiDA setup, which is exactly what we want to avoid. Let's say we adopt the new definition of an AiiDA instance as a `.aiida` directory. What's the difference between an AiiDA instance and a profile? Why do we need both? To be honest, I wouldn't be able to answer this question. In fact, this new approach to separating AiiDA instances seems to usurp the concept of an AiiDA profile, and it's already clear from this thread that the two are being used interchangeably (incorrectly so!). This is problematic for several reasons:

- New users will probably quite quickly try to use `verdi init` to create a second _profile_ (e.g. if they realise they want/need RabbitMQ). Whether this fails depends on the behaviour we choose [here](https://github.com/aiidateam/aiida-core/pull/6315#issuecomment-2012147746). If it fails, they won't understand why. If it doesn't and creates a new profile, what `verdi init` does changes fundamentally, and they'll have to understand the differences in how to interact with instances and profiles (see below).
- Existing users will most likely be surprised by the behaviour of `verdi init`. They will assume they've just created another profile they can access from anywhere with `verdi`, and wonder "where is my profile?" when they aren't in the correct folder. You could say that `verdi init` isn't meant for them, but then we don't solve the issue of isolating AiiDA instances.
- All users will have to deal with both AiiDA "instances" and profiles, which have different modi operandi. Instances are found through git-like discovery, whereas profiles are configured in the `.aiida` folder and "loaded" through the Python API. Moreover, to get to the correct profile, you'll have to both (1) be in the correct directory and (2) _still_ load the profile somehow. Users will rightfully ask why. Deleting a profile uses `verdi profile delete`, but deleting an instance requires removing the folder.

To me it seems clear that having both multiple `.aiida` instances _and_ profiles will cause quite a bit of confusion. One might argue that we can fully move to git-like discovery "instances" in lieu of profiles, but I see other issues with this:

- It's a massive backwards-incompatible change in behaviour, which would definitely require an AEP and probably a major release.
- The AiiDA installation is not aware of the various `.aiida` instances associated with it, whereas it _is_ aware of its profiles.
- A directory-based approach for selecting an instance is natural for CLI tools, but not so much for Python code. The location of a Python script/Jupyter notebook does not influence which Python instance I'm interacting with; rather, it's the Python binary I'm calling or the kernel I've loaded in my Jupyter notebook.
- It's more challenging to interact with an instance _because_ you need to be in a specific folder hierarchy. Just like you can call a specific Python binary, the current approach allows you to execute `/path/to/verdi -p profile` to e.g. back up a certain profile in a cronjob.
  With git-like discovery, you'll have to make sure that any script or command that interacts with a certain instance does so from within that folder (or somehow tweaks `$PWD`).

### It doesn't _fully_ deal with the issues it's trying to solve

#### Quick setup of a profile that requires no services

[As Xing rightfully mentions](https://github.com/aiidateam/aiida-core/pull/6315#issuecomment-2015226465), an advanced user might still want to set up a profile quickly without having to deal with the complexity of `verdi profile setup` (I for one have rediscovered a newfound love for `verdi quicksetup`). However, anyone who actually uses `AIIDA_PATH` can't use `verdi init`. So I guess we'd still need _another_ command, which brings us back to Julian's point that we shouldn't have too many commands for creating a profile.

#### Having a contained AiiDA "instance" that is easy to remove _completely_

The idea of having a fully contained AiiDA instance within a folder only works for the SQLite database. If we were to extend `verdi init` to Postgres-type databases, the user couldn't simply remove the folder to purge an AiiDA "instance". And if we start promoting `verdi init`, you can be sure that users will _expect_ instances/profiles to be fully contained in that folder. Profiles, on the other hand, can be fully removed with `verdi profile delete` (or at least they should be).

#### Better separation of AiiDA "instances"

As mentioned above, anyone who _doesn't_ use `verdi init` to separate AiiDA instances still has to deal with the same problem. Now, you could argue they can just create a `.aiida` folder, or that we provide some kind of command for that. But really, we'd be making things harder for the user than necessary, to solve something that AiiDA should take care of automatically.

### Some more examples of potential issues

- Separating the AiiDA installation from its configuration can have other downsides. The AiiDA installation determines the version of the configuration file.
  If AiiDA finds that the configuration file is outdated, it automatically upgrades it (note that this _doesn't_ work the other way around!). This is desired behaviour, since we want configuration file format updates to be seamless for the user. However, with the changes in https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe, if I run the `verdi` command of the wrong Python environment within the scope of another `.aiida` directory, it will either raise an error or (worse) automatically migrate my configuration.
- If you run a `git` command outside the scope of a `.git` directory, `git` raises an error. This is not true for AiiDA, which simply falls back to `$HOME/.aiida`. It's why Sebastiaan decided not to allow `verdi init some/path`, as described [here](https://github.com/aiidateam/aiida-core/pull/6315#issuecomment-2015190834). This is problematic because it's easy to forget that you have to be in the correct folder scope (_and_ Python environment) to interact with the AiiDA instance, and it won't be immediately clear when you are not.
- A user downloads a Jupyter notebook and opens it. Normally, they'd just select their Python kernel (VS Code makes this quite easy) and then load the profile. If they are working with `.aiida` instances, though, they first have to move the notebook into the right folder. And then they _still_ have to select the kernel _and_ load the profile.

## Alternative solutions

### Tie the AiiDA install to the Python instance

However, as Daniel mentioned, the status quo is not acceptable either. I'm currently working on an AEP + PR that ties the AiiDA installation to the corresponding Python instance. Effectively, an "AiiDA instance" will once again be the combination of the AiiDA installation + `.aiida` folder, but the latter will be automatically isolated for each Python instance.

### Folder scope for profiles

I do like the concept of git-like, folder-based separation in general, and I understand some users might prefer it.
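
To make the comparison concrete, here is a minimal Python sketch of the folder-based lookup itself, i.e. the three-step resolution order listed in the `git`-like `.aiida` discovery section. It is _not_ the code from the commit: the function names `find_upwards` and `resolve_config_dir` are hypothetical, `AIIDA_PATH` is simplified to a single path (with `.aiida` appended if the path doesn't already end in it, which is an assumption about the details), and any edge cases of the real implementation are ignored.

```python
"""Hypothetical sketch of the `.aiida` resolution order; not aiida-core's implementation."""
import os
from pathlib import Path
from typing import Mapping, Optional

CONFIG_DIR_NAME = ".aiida"


def find_upwards(start: Path, name: str) -> Optional[Path]:
    """Return the first directory `name` in `start` or one of its parents (like git's `.git` lookup)."""
    start = start.resolve()
    for directory in (start, *start.parents):
        candidate = directory / name
        if candidate.is_dir():
            return candidate
    return None


def resolve_config_dir(
    cwd: Optional[Path] = None, environ: Optional[Mapping[str, str]] = None
) -> Path:
    """Resolve the `.aiida` folder: 1. AIIDA_PATH, 2. git-like discovery, 3. $HOME/.aiida."""
    environ = os.environ if environ is None else environ
    cwd = Path.cwd() if cwd is None else cwd

    # 1. An explicit AIIDA_PATH always wins (simplified here to a single path).
    aiida_path = environ.get("AIIDA_PATH")
    if aiida_path:
        path = Path(aiida_path).expanduser()
        return path if path.name == CONFIG_DIR_NAME else path / CONFIG_DIR_NAME

    # 2. Walk up from the working directory; the first `.aiida` found is used.
    discovered = find_upwards(cwd, CONFIG_DIR_NAME)
    if discovered is not None:
        return discovered

    # 3. No `.aiida` in scope: silently fall back to the home directory.
    return Path.home() / CONFIG_DIR_NAME
```

Note how the result depends on `cwd`: calling `resolve_config_dir` from anywhere below a folder that contains `.aiida` returns that folder, while the same call from outside it typically falls back to `~/.aiida`, without any error. The folder-scoped alternative could reuse the same kind of upward walk, but to pick a default _profile_ rather than a configuration folder.
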
What we could do is create a "git-folder-like" profile. Basically, instead of changing the default `.aiida` configuration folder, you'd change the default profile depending on where the code is executed.

## Some questions that came up

* Why does `AIIDA_PATH` allow for multiple directories?

## TL;DR

* While I understand the appeal of having `git`-like `.aiida` directory discovery, I think it's a sub-optimal solution to the problem of separating AiiDA instances.
* The changes in https://github.com/aiidateam/aiida-core/commit/1059b5f2d365e0f2ea4dea4d3c9343ce77829cfe should be reverted in favor of a solution that doesn't decouple the `.aiida` folder from the AiiDA installation.
* In any case, any solution to the "separation of instances" problem is so impactful that it should require an AEP to properly motivate the chosen design.