# Future Architecture of Sirocco
[toc]
## Executive Summary
The suggested adaptations in the Sirocco development were motivated by the following concerns (explained in more detail below):
1. The current design using `aiida-workgraph` doesn't allow ahead-of-time submission of jobs, which, as recently illustrated by one of the EXCLAIM use cases, is a requirement for good usage of the resources. In particular, in the Santis case, where large priority jobs require more than half of the cluster, granting such projects an actual priority partition and avoiding large numbers of idle nodes while gathering resources for them cannot be achieved without ahead-of-time submission.
2. Using AiiDA adds a significant entry barrier for users by running jobs in directories named after auto-generated UUIDs (not human-readable), so that interacting with the AiiDA database becomes a prerequisite for using the tool (inspecting run directories, debugging, finding output data, ...).
3. While the current design using `aiida-workgraph` does give provenance of the workflow (as promised by AiiDA), making use of this provenance requires users to interact with AiiDA's generic provenance tools (like the QueryBuilder API). As these are unfamiliar to the climate & weather user base, this can create a barrier to accessing simulation results and their relationships.
4. The current design makes Sirocco depend on four AiiDA packages, some of which are still in a pre-stable, development phase. This adds uncertainty to the maintenance and future development of Sirocco, given that developers on the AiiDA side will stop working on Sirocco after SwissTwins.
What we suggest is to keep the workflow description and internal representation, but replace the AiiDA-driven orchestration of tasks with a built-in mechanism that integrates more tightly with the scheduler. This is achieved by a Sirocco task that resubmits itself directly and therefore orchestrates the workflow on the cluster. A first working version is implemented in a `standalone` branch and already runs an example based on simple shell tasks. AiiDA would then be used to control the workflow from a local machine, supporting submission through SSH and FirecREST. This part still needs to be implemented but is low effort and has very little uncertainty associated with it.
The plan for the immediate future is to continue development on the two paths until the next milestone (aquaplanet test case) is reached, see the roadmap below, and then confirm by October that the new approach is suitable for every party. In conclusion, the timeline for the aquaplanet milestone is delayed; however, for large-scale production runs (e.g. DYAMOND), the timeline becomes more realistic.

<!-- https://docs.google.com/presentation/d/1G-JeVr3Jmbx17-xUhCT2omRhnieKfGOjKdIidweknsM/ -->
## Motivations for the suggested adaptations
### 1. Upgraded requirement: submitting tasks ahead of time
Large-scale runs require submitting tasks to the cluster for days, weeks or even months. Two approaches can be taken to manage this:
1. Constantly monitor task status and submit new tasks to the scheduler as soon as their dependencies have completed successfully.
2. Submit tasks ahead of time, so that some of them still have pending dependencies, by communicating the dependencies to the scheduler and regularly submitting new tasks based on the workflow status.
Mode 2 allows optimizing the cluster load by giving the scheduler as much information as possible about future resource requests. Still, as this wasn't perceived as crucial early on, we started the development in mode 1, which is supported by `aiida-workgraph`, the component we chose to transfer the declarative workflow description into the AiiDA world.
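As a sketch of mode 2, the following (a hypothetical illustration, not Sirocco's actual API) shows how a small dependency graph could be translated into SLURM submissions, with dependencies expressed via `--dependency=afterok:` flags rather than tracked by a client-side monitor:

```python
# Hypothetical sketch of "mode 2" submission (not Sirocco's actual API):
# every task is submitted immediately, and dependencies are handed to
# SLURM via --dependency flags instead of being tracked client-side.

def build_submit_commands(tasks: dict) -> list:
    """`tasks` maps task name -> list of task names it depends on.
    Returns sbatch commands in a valid submission order."""
    job_ids = {}        # task name -> job id returned by the scheduler
    commands = []
    next_id = 1000      # stand-in for the id a real sbatch call returns

    remaining = dict(tasks)
    while remaining:
        # tasks whose dependencies already have a job id can be submitted
        ready = [t for t, deps in remaining.items()
                 if all(d in job_ids for d in deps)]
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependencies")
        for task in sorted(ready):
            deps = remaining.pop(task)
            cmd = "sbatch"
            if deps:
                cmd += " --dependency=afterok:" + ":".join(job_ids[d] for d in deps)
            cmd += f" {task}.sh"
            commands.append(cmd)
            job_ids[task] = str(next_id)
            next_id += 1
    return commands
```

For a three-task chain `preproc -> icon -> postproc`, this yields `sbatch preproc.sh`, then `sbatch --dependency=afterok:1000 icon.sh`, and so on; the scheduler knows about all pending resource requests up front.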
However, the first large-scale project recently highlighted the requirement for this feature, especially when such projects are granted priority on the cluster. In mode 1, as soon as such a large priority job has completed, the scheduler, lacking information about upcoming submissions, allows other tasks to start during the short interval before the next prioritized job is submitted. It then needs to gather sufficient resources for it again by preventing other tasks from running, so that:
- the cluster resources aren't optimally used
- priority projects don't get a real priority
#### Mitigation strategies
This upgraded requirement means that the first operational version (for early users) cannot rely on `aiida-workgraph`, as no developers are immediately available who could add the feature or even estimate the work required.
Work to overcome this limitation could be undertaken, but a long term maintenance plan is required for it to become viable as a component in `Sirocco`.
A feature request has been made to `aiida-workgraph` and can be tracked here: https://github.com/aiidateam/aiida-workgraph/issues/646
### 2. User adoption barrier: debugging explorative runs
Out of the box, AiiDA assigns every simulation run a random ID (a UUID). The simulation is run on the cluster in a directory whose path is derived from that ID. This folder organization encodes no workflow hierarchy information; rather, it is a flat collection of directories whose names and locations are based on sharding each process's UUID.
Without looking up the ID on the user-side machine (which requires knowledge of the corresponding AiiDA feature), the tree of work directories cannot be navigated. This runs counter to the intuition of this particular user base.
#### Mitigation strategies
AiiDA has facilities for additionally associating human-readable labels with the random IDs. This alone would reduce friction but still requires an individual workdir lookup per subtask of a workflow.
Alternatively, it should be investigated whether the mechanism that assigns work directories can be configured; this could include implementing hierarchical work directories. A feature request for this has been made to the `aiida-core` repository and can be tracked here: https://github.com/aiidateam/aiida-core/issues/6018. The scheme's underlying assumption, that only aggregated results from many parallel simulations are of interest to the user, also does not hold for explorative work.
Ideally, both approaches together could make for something more user-friendly than what users are accustomed to today.
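To illustrate the difference, here is a minimal sketch contrasting AiiDA's sharded layout with a hypothetical hierarchical one; the workflow/cycle/task naming is an illustrative assumption, not an existing Sirocco convention:

```python
# Sketch contrasting AiiDA's default UUID-sharded layout with a
# hypothetical hierarchical layout; the workflow/cycle/task names are
# illustrative assumptions, not an existing Sirocco convention.

def sharded_path(node_uuid: str) -> str:
    """AiiDA-style sharding: the first two hex-character pairs become
    directory levels, hiding any workflow structure."""
    return f"{node_uuid[:2]}/{node_uuid[2:4]}/{node_uuid[4:]}"

def hierarchical_path(workflow: str, cycle: str, task: str) -> str:
    """Human-readable layout that encodes the workflow hierarchy."""
    return f"{workflow}/{cycle}/{task}"
```

With the sharded scheme a run lands in something like `ab/cd/ef123456…`, while the hierarchical scheme would place it under e.g. `aquaplanet/2000-01/icon_run`, navigable without any database lookup.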
In case changes are required in AiiDA to make this work, a long-term maintenance commitment is again required.
### 3. User adoption barrier: accessing provenance
The AiiDA provenance graph connects inputs with results. This is a desired feature (although there is no general agreement on how strongly it is desired). However, the generic tools AiiDA provides to make use of it, such as the QueryBuilder API, are foreign to the W&C user base.
#### Mitigation strategies
AiiDA provides various ways to augment the provenance graph (labels, extras, groups). These are ordinarily up to the user to place strategically. Instead, we could place them automatically via Sirocco. This could simplify user access to nodes of interest in the provenance graph.
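As a sketch, assuming hypothetical key names such as `sirocco.cycle` (these are not an existing convention), Sirocco could attach structured extras automatically and offer simple lookups by workflow coordinates; in a real setup the filtering below would be delegated to AiiDA's QueryBuilder:

```python
# Sketch of automatically attaching structured metadata ("extras") to
# provenance nodes so users can look up results by workflow coordinates
# instead of UUIDs. The key names ("sirocco.cycle", ...) and the record
# format are hypothetical; in AiiDA the filtering below would be done
# by the QueryBuilder.

def make_extras(workflow: str, cycle: str, task: str) -> dict:
    """Extras Sirocco could set on each task's provenance node."""
    return {
        "sirocco.workflow": workflow,
        "sirocco.cycle": cycle,
        "sirocco.task": task,
    }

def find_nodes(nodes: list, criteria: dict) -> list:
    """Return node records whose extras match all given criteria."""
    return [n for n in nodes
            if all(n["extras"].get(k) == v for k, v in criteria.items())]
```

A CLI wrapper around such lookups could then answer questions like "give me the ICON output of cycle 2000-01" without the user ever touching a UUID.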
In addition, a higher level user interface could be built around that. Such an interface could expose common data lookup tasks in the form of single CLI / API calls.
This is unlikely to require changes in `aiida-core` or to rely on APIs that are due to change any time soon.
### 4. Maintenance burden and uncertainty
With the current design, Sirocco relies on four distinct AiiDA packages (`aiida-core`, `aiida-shell`, `aiida-workgraph`, `aiida-firecrest`), of which the latter two are still in a pre-stable state and being actively developed.
This dependency structure creates maintenance risks, particularly given that AiiDA developers will conclude their involvement in Sirocco after the SwissTwins project ends. The reliance on development-phase packages means potential API changes, breaking updates, and the need for ongoing adaptation work. Without guaranteed long-term support from the AiiDA development team, this creates uncertainty around Sirocco's future maintainability and evolution.
Further, even if the missing features mentioned above were implemented, the long-term maintenance burden remains. If bugs are discovered or compatibility issues arise, developers familiar with AiiDA must be available to resolve them. This may not be guaranteed after project completion.
#### Mitigation strategies
The suggested standalone approach (see below) would eliminate AiiDA dependencies entirely if users submit the Sirocco task manually and directly on the cluster. The two remaining AiiDA dependencies - `aiida-core` (a mature, stable package) and `aiida-firecrest` (which evolves alongside the FirecREST development itself) - would only be required for submission from a local machine and monitoring of the self-submitting Sirocco task. This significantly reduces the maintenance burden and dependency risk.
Alternatively, formal maintenance agreements could be established with the AiiDA team, though this would require securing resources and commitments beyond the current project timeline.
## Suggested Adaptations
We would replace `aiida-workgraph` for now with a much smaller component, which acts as a workflow orchestrator on-cluster. AiiDA would be used to submit entire workflows to that component.
### The workflow orchestrator
This is built on the existing code that was originally written to generate and submit a `WorkGraph`. A working version exists, which allows submitting in "mode 2", as well as cancelling and restarting entire workflows.
> [name=GeigerJ2] and can be found in the following branch: https://github.com/leclairm/Sirocco/tree/standalone
It works with the same workflow description file format as originally planned.
> [name=GeigerJ2] Due to the modularization of the Sirocco code, removing the WorkGraph creation code required only removing a single file, `workgraph.py`. With the current adaptations in the `standalone` branch, all tests still pass.
The component is much smaller, because it is tailored for weather and climate workflows:
- does not need to take different transport types into account (done by AiiDA beforehand) as it runs directly on-cluster
- dispenses with other features like workflow construction UI
- it is not a long running daemon-style job
- support for additional schedulers (beyond SLURM) has to be added, but doesn't present a large programming burden (the SLURM integration in the current implementation is ~100 lines)
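A minimal sketch of such a scheduler abstraction, with hypothetical class and method names, suggests why the per-scheduler burden stays small: each backend mainly has to compose its own submission command and dependency syntax:

```python
from abc import ABC, abstractmethod

# Sketch of a small scheduler abstraction; class and method names are
# hypothetical, not Sirocco's actual interface. Supporting a new
# scheduler mainly means composing its submission command and
# dependency syntax.

class Scheduler(ABC):
    @abstractmethod
    def submit_command(self, script: str, dependencies: list) -> str:
        """Compose the submission command for a job script (a real
        implementation would also run it and parse out the job id)."""

class SlurmScheduler(Scheduler):
    def submit_command(self, script: str, dependencies: list) -> str:
        dep = f" --dependency=afterok:{':'.join(dependencies)}" if dependencies else ""
        return f"sbatch{dep} {script}"

class PbsScheduler(Scheduler):
    def submit_command(self, script: str, dependencies: list) -> str:
        dep = f" -W depend=afterok:{':'.join(dependencies)}" if dependencies else ""
        return f"qsub{dep} {script}"
```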
Rather than running alongside the workflow and monitoring it by polling SLURM, this orchestrator re-submits instances of itself directly on the cluster, with a dependency on the currently running tasks.
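A sketch of this resubmission step (orchestrator script name is a hypothetical placeholder): the next orchestrator instance is gated with `afterany`, so it starts once the current tasks have finished, whether they succeeded or failed, and can react accordingly:

```python
# Sketch of the self-resubmission step; the orchestrator script name is
# hypothetical. Using afterany (rather than afterok) lets the next
# instance start even when tasks failed, so it can handle the failure.

def resubmit_command(running_job_ids: list,
                     orchestrator_script: str = "sirocco_step.sh") -> str:
    """Command submitting the next orchestrator instance, gated on the
    currently running tasks."""
    if not running_job_ids:
        return f"sbatch {orchestrator_script}"
    return ("sbatch --dependency=afterany:"
            + ":".join(running_job_ids)
            + f" {orchestrator_script}")
```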
An additional benefit of this design is that it is very familiar to the long term maintainers in C2SM, as it is a broadly used design in that community. The user base is also familiar with the concepts that go alongside.
### The user side AiiDA component
This would initially consist of an AiiDA plugin which facilitates submitting the workflow description to the on-cluster workflow orchestrator.
As problems 2 and 3 (mentioned above) are solved, this component would gain additional UI / CLI endpoints that add significant value for users. It would furthermore rely only on `aiida-core`, which is a mature and stable codebase, as well as `aiida-firecrest`, which evolves alongside `firecrest` itself and `pyfirecrest`.
### Optional parallel tracks
If the resources can be found to implement the necessary features in `aiida-core` and `aiida-workgraph`, the architecture can be shifted back, with little additional work, towards workflows being fully orchestrated on the user machine by AiiDA.
Alternatively, a solution could be investigated that employs much more lightweight tooling (compared to `aiida-workgraph`) based on `aiida-core` to generate static workflows from the declarative input format, which can implement "mode 2" submission. This, again, would allow moving back to AiiDA orchestration.
## Alternatives to the suggested adaptations
### Adapting timeline and resources instead of architecture
To keep the original architecture throughout, it would be necessary to
- add the missing pre-submitting feature to `aiida-workgraph`
- clarify long-term maintainership of that feature in `aiida-workgraph`
before being able to run experiments at scale. This would delay the milestones by an unknown amount of time.