Questionnaire analysis 2024

## Apart from all positive feedback, there are a few things that need our attention:

- Documentation needs extensive improvement, many comments on discoverability and clarity. **+19 mentions**
- Setting up / debugging workflows/workchain. **+17 mentions** --> [WorkGraph in development]
- The querybuilder is difficult to use. **+7 mentions**
- Some requested folder-based behaviour. **+6 mentions**
- Possible issue on daemon performance. **+3 mentions**
- Some miscellaneous suggestions (see below).

---

### We could have `verdi mirror` **+6 mentions**

- Hard to understand AiiDA's paradigm to explore data
- Figuring out where to find everything after a simulation.
- How the data is structured behind the scenes and how to retrieve what I needed.
- Either a simpler setup process or more browser-based AiiDA instances.
- Slow experimentation/development loop (i.e. lack of "unsafe" no-provenance mode for quick-and-dirty use cases)
- It is as if - rather than making the data easier to explore and navigate - it obfuscates it, requiring help just to figure out how to perform a query.

### Possible issue on daemon performance **+3 mentions**

- For a project with a 12k workchains database, AiiDA v1.6.9, 1) Monitoring the process daemon queue took up most of the project time, which it shouldn't
- When using workflows, especially when performing many simultaneous tasks, the daemons often freeze/do not update the job status. **+2 mentions**

### Found difficult:

- Unintuitive interface, hard to start. **+2 mentions**
- Lack of "best practice" to organize and post-process. **+2 mentions**
- Fixing and resubmitting jobs that did not exit successfully
- Error messages are not immediately clear as to what is happening. Often responses following lengthy Discourse discussion point to issues that are not obviously related to the original error.

### Documentation:

#### Discoverability:

- Very difficult to find information on how to do things and solve issues you encounter. **+2 mentions**
- I did not know where to find the commands in general. Some are shown inconsistently in the tutorials, some others I had to ask colleagues about.
- I think specifying scheduler resources is somewhat opaque. There's one page in the docs that has all of the names, but it took me a while to find it.
- You learn by trial and error, but this takes a huge and unnecessary amount of time.
- Searching for documentation is too chaotic. The various sections, how-to, tutorials, topics overlap too much in the way they provide information. It can be very confusing and time-consuming understanding where to look.
- Readability of the materials online.

#### Found unclear

- Specifying Slurm resources in job scripts is not transparent.
- The concept of "namespace" is unclear.
- How the connection to remote computers works/why it fails/how to debug (or restart) it
- Too much emphasis on visualizing and teaching provenance in tutorials/docs. I just wanted to quickly set up my workflows for automation and know that I have the provenance, if I need it.
- The tutorials were very simplistic and did not help at all in actually using AiiDA for my project.
- Unless I was looking at the AiiDA code itself, it was impossible to find the available/valid options for my queries and workflows
- Provide a standardized "workflow" of setting up AiiDA, running calculations, organizing nodes into `orm.Group`, and data querying and processing could help a lot. This can be as simple as instructions on how to organize scripts and data into a well-organized directory structure. Give a full picture of all I need. **+2 mentions**
- To know when to apply which concept (calcfunctions, -jobs, workflow chains, ...) to my specific use case.
- Docs are a bit difficult to use. There are two categories "How-to Guides" and "Topics" and it's not clear which contains the information I need.
- Can't find in one place all the valid options for the queries
- I would like to see online tutorials on how to do data processing of actual simulation data. In my humble opinion, tutorials on how to add numbers and do other completely irrelevant tasks are just useless (this might sound too harsh but it's true). Please, explain the concepts using real use cases

### Miscellaneous Suggestions:

- Do error handling in the code setup to make sure a code actually exists in a given namespace. It would set up the code without error and then any time I tried to run a calculation it would throw errors.
- Not challenging, but sometimes confusing: I would make Computer a standard AiiDA node.
- It would be really cool to find out how much LLMs could help as an interface.
- How ML workflow would work. Like, how to represent training datasets, embeddings, etc. in provenance.
- AiiDA should support both: Quick and dirty calculations or series of calculations always similar? When is it worth spending some time to learn AiiDA and have calculations organized in an "AiiDA database" rather than in a standard folder?
- `aiida-restapi` is a good start but it is incomplete with very useful functionality missing, e.g. downloading files.
- `AiiDAlab` also I think should be more prominent. While I personally enjoy the CLI and working with scripts, having everything working in Jupyter and allowing to pass from one type of calculation to another, while exploring data, etc. would make it much easier to use for newcomers
- One thing I'd like to see a little more of is integrating data querying with calculations. For example, query a group of calculations and find the job that converged to the highest force, resubmit the job, and show how one would organize the GraphQL data structure to make it clear it was restarted.
- The documentation: disentangle the documentation for users (assume they know nothing, absolutely nothing, and get to the point immediately, straight away) and for developers
- Would be nice if you could copy between remote folders on different machines.
- It would be nice if there was a way to record the versions of standard Python packages (e.g. numpy) that are imported by a calcfunction in the provenance of that calcfunction.
- AiiDA is clearly robust. I just don't want to know that immediately, or rather have to know that to use it. In other words, great if the UI can be significantly simplified, with additional options available only on request where they are most needed (not simple to achieve), thus retaining AiiDA's flexibility.
- I think that the workflows, though very powerful, can sometimes be inflexible. Sometimes calculations fail due to numerical issues, and as an input might need changing, this can invalidate a lot of results in the name of consistency. Though very valuable, it can sometimes be very painful.
- Cannot re-parse data later [I think it means after storage?]
- Allowing the usage of a path as the database host to use UNIX sockets in custom locations instead of TCP/IP. It is possible right now, but very hackish as one has to set the host to "null", and leverage the libpq under the hood by setting the `PGHOST` environment variable. Together with implementing a different broker (e.g. "pgmq"), this would allow running AiiDA without having to open any ports and also without losing efficiency (RabbitMQ only supports running over TCP/IP).
- Implementing CalcJobs, `prepare_for_submission` method [is difficult]

## Other repos

### `verdi shell`

- If I load the node in a verdi shell, I'm not sure if I can see directly to which group(s) it belongs. Also to my knowledge, if the nodes are organized in the groups and subgroups, it's not possible to list all nodes or processes in a parent group of a subgroup, or maybe just show which subgroups there are, similar to a file manager.

### `aiida-quantumespresso`

- Finding the right override template, understanding which Quantum ESPRESSO input variables are not parsed from the override but have to be given separately (like occupation = insulator...)

### `StructureData`

- Maybe an additional layer of user interface commands could simplify profile setup, interaction with the database etc. Something more intuitive for those who are not very familiar with the workflow managers, e.g. maybe something like an `import_structure(filepath, format="cif", store=True)` command that returns a StructureData node ready to be used as an input to a calculation. Also, e.g. in the aiida-quantumespresso plugin it is not so intuitive how to change certain parameters when creating a builder, e.g. specifying a custom k-point grid. Maybe this could be simplified by e.g. adding some "setter" functions (`pwbandsbuilder.set_kpoint_grid([8,8,8])`)

## Already done, has to be advertised:

- `aiida-shell`
  - It was difficult to write plugins, but then I slowly learned how to do that.
  - If I want to do something that's not inherently supported by either AiiDA core or one of the plugins, but I would like to do it from within AiiDA to preserve provenance.
- `verdi process dump`
  - I need direct access to the raw results of the calculations. **+2 mentions**
- `aiida-profiles`
  - HPC installation of AiiDA: using the same profile with other members of the group is not easy.
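
Returning to the miscellaneous suggestion about recording the versions of Python packages used by a calcfunction: the sketch below shows one way the version information itself could be gathered with the standard library only. It is not an existing AiiDA feature, and the names `package_versions` and `record_versions` are hypothetical. How the result would be attached to the provenance of a calcfunction is left open. Note that the lookup uses distribution names, which can differ from import names.

```python
import functools
import importlib.metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Package is not installed in the current environment.
            versions[name] = None
    return versions


def record_versions(*names):
    """Hypothetical decorator: keep the package versions seen at call time."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Refresh the record on every call, then run the function.
            wrapper.recorded_versions = package_versions(names)
            return func(*args, **kwargs)

        wrapper.recorded_versions = None  # nothing recorded before first call
        return wrapper

    return decorator


@record_versions("numpy")
def add(x, y):
    """Stand-in for a function that would be wrapped as a calcfunction."""
    return x + y
```

After calling `add(1, 2)`, the attribute `add.recorded_versions` holds a dictionary with one entry for `numpy` (its version string, or `None` if it is not installed), which could then be stored alongside the function's outputs.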