# AiiDA Firecrest scheduler/transport discussion 2023-02-10
###### tags: `design`
###### time: 11am CET
[TOC]
### Present
* Chris
* Gio
* Sebastiaan
* Jusong
* Xing
### Notes
- Chris: Interesting things to consider:
    - consider typer for the command line?
    - Consider switching to SQLAlchemy 2.0 - in 1.4 we already set the future flag, so should be easy, and Table ORM classes now behave as dataclasses and use annotations
    - consider splitting CalcJob into two pieces, an immutable CalcJob, immediately sealed, and a Processing class that has the mutable things that happen while running; you can also expose some mutable properties such as the state (playing, ...) using a SQLAlchemy AssociationProxy.
Agenda for next time:
- redesign new transport and scheduler classes
    - goal: avoid going via a command string that is then executed through bash, and instead have a direct `submit_job(job_description)`
- migrate ssh plugin and slurm scheduler to new class, ideally minimizing changes as much as possible, and providing a list of "migration tasks/instructions"
- possibly already implement the firecrest plugin via pyfirecrest, with the new interface
- async interface; for SSH I would still use the current plugin even if blocking, but check that one could do a second plugin (later) with asyncssh
- make sure the mock (fireflow) uses the plugin interface just designed (and does not hardcode firecrest)
- only then, check what needs to be changed in the engine of AiiDA (doing it first in the mock) to comply with rate limiting, make bulk requests, ...; come out with a (minimal) list of things to change in AiiDA
- implement these changes in AiiDA
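
A minimal sketch of what the "direct `submit_job(job_description)`" goal above could look like. All names here (`JobDescription`, `JobManager`, `get_job_states`, `InMemoryJobManager`) are hypothetical and only illustrate the shape of an async plugin interface; they are not existing AiiDA or pyfirecrest APIs.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class JobDescription:
    """Hypothetical container for everything needed to submit one job."""

    workdir: str
    script: str
    resources: dict = field(default_factory=dict)


class JobManager(ABC):
    """Hypothetical plugin interface: submit directly, no bash command string."""

    @abstractmethod
    async def submit_job(self, job: JobDescription) -> str:
        """Submit the job and return its job ID."""

    @abstractmethod
    async def get_job_states(self, job_ids: list[str]) -> dict[str, str]:
        """Return the state of each requested job (bulk by design)."""


class InMemoryJobManager(JobManager):
    """Toy backend standing in for an SSH+SLURM or FirecREST plugin."""

    def __init__(self) -> None:
        self._jobs: dict[str, JobDescription] = {}

    async def submit_job(self, job: JobDescription) -> str:
        # A real plugin would upload the script (if needed) and then submit.
        job_id = str(len(self._jobs) + 1)
        self._jobs[job_id] = job
        return job_id

    async def get_job_states(self, job_ids: list[str]) -> dict[str, str]:
        return {j: "QUEUED" if j in self._jobs else "UNKNOWN" for j in job_ids}


async def demo() -> dict[str, str]:
    manager = InMemoryJobManager()
    job_id = await manager.submit_job(
        JobDescription(workdir="/scratch/run1", script="#!/bin/bash\necho hi")
    )
    return await manager.get_job_states([job_id, "999"])


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The engine (or the mock) would only depend on the abstract class, so a blocking SSH plugin and an async FirecREST plugin could be swapped without changing the calling code.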
# AiiDA Firecrest scheduler/transport discussion 2022-01-17
###### tags: `design`
###### time: 12pm CET
[TOC]
### Present
* Chris
* Gio
* Francisco
* Sebastiaan
* Jusong
* Xing
### Demonstration
- REST API: https://firecrest-api.cscs.ch
- https://github.com/eth-cscs/firecrest demo server
- https://github.com/eth-cscs/pyfirecrest experimental CLI
- AiiDA minimal calculation mockup ([pyfirecrest/aiida_mock](https://github.com/chrisjsewell/pyfirecrest/tree/aiida_mock))
    - Run multiple calculations asynchronously
    - async polling and file uploads/downloads
    - small vs large file download/upload
- https://github.com/aiidateam/aiida-firecrest
    - Implemented most of Transport
    - Scheduler definitely not compatible though
### Discussion
* FirecREST has a maximum number of connections per second. We need to have some rate-limiting feature in AiiDA (and even in the mock) to avoid too many REST requests within a given time frame. I think this is important because even if a single calculation does a lot of waiting, in AiiDA we often submit hundreds of jobs at the same time.
* We need to have a meeting with them discussing also the roadmap - e.g. if they have some "notification" rather than polling, or if they plan to change something for the rate limiting, or allow bulk operations (e.g. polling). This might affect the design in AiiDA! (We might want anyway to design polling operations in a way that can be run as bulk from a design point of view, and for now we loop if bulk requests are not available in FirecREST - but maybe they are?).
    * For the existing "bulk" requests (via TASKS), does it only give back *everything*, or is it possible to ask only for the tasks in a list?
* Granularity of the tasks and how these are represented in AiiDA: if AiiDA gets stopped during an upload/download, will it be able to recover? Or does it have to restart the upload from scratch? It would be good to cache the info so we can just check if the upload finished.
    * Alternative, maybe better: accept that if the connection is stopped, you need to start a new request in FirecREST (as otherwise this would require a lot of "entanglement" between AiiDA and the transport plugin), but have some feature in AiiDA to ask "please prepare for shutdown and stop doing new things", which will complete running tasks and after some time, possibly minutes, will report that it is ready to be shut down.
* Will stashing go via this?
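
As an illustration of the rate-limiting point above, here is a minimal sketch of a client-side limiter that spaces requests at least a fixed interval apart, shared by all concurrently running calculations. The class name `RateLimiter`, the `min_interval` parameter and `fake_request` are assumptions for illustration; the actual FirecREST limits and the right strategy (fixed spacing, token bucket, ...) would need to be confirmed with CSCS.

```python
import asyncio
import time


class RateLimiter:
    """Space calls to `wait()` at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float) -> None:
        self._min_interval = min_interval
        self._next_slot = 0.0  # earliest time the next request may start
        self._lock = asyncio.Lock()

    async def wait(self) -> None:
        # The lock serialises callers, so each one gets its own time slot.
        async with self._lock:
            now = time.monotonic()
            delay = self._next_slot - now
            if delay > 0:
                await asyncio.sleep(delay)
            self._next_slot = max(now, self._next_slot) + self._min_interval


async def fake_request(limiter: RateLimiter, index: int) -> int:
    """Stand-in for one REST request; only the limiter call matters here."""
    await limiter.wait()
    return index


async def demo(n: int = 5, min_interval: float = 0.05):
    # Create the limiter inside the running loop so the lock binds to it.
    limiter = RateLimiter(min_interval)
    start = time.monotonic()
    results = await asyncio.gather(*(fake_request(limiter, i) for i in range(n)))
    return results, time.monotonic() - start
```

Even if hundreds of calculations are submitted at once, every request would pass through the same limiter instance, so the total request rate stays bounded regardless of how many jobs are waiting.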
#### What issues do people see with Firecrest?
- API Request speed [firecrest#162](https://github.com/eth-cscs/firecrest/issues/162)
- Feature request - ability to register callback url for compute jobs [firecrest#126](https://github.com/eth-cscs/firecrest/issues/126)
- Clearing the file transfer server (could get full?)
#### Questions for CSCS
- Is there a way to clean up the object store from old files that we know we don't need anymore?
- Is it possible to project only specific attributes from a tasks GET request, to limit the data transfer? (If we only need the ID for example)
- Roadmap of FirecREST on managing high-throughput runs: is this dealt with transparently (like hyperqueue) or will there be a bulk submission command?
#### Implementing in AiiDA
* How can the Scheduler be changed, **with back-compatibility**?
    * Maybe we are OK with not being backward compatible in this case, if it is impossible, and we need to release 3.0 (but then we need to design it well!)
* I would separate in two:
    * check what needs to be changed in the AiiDA engine itself (e.g. we want to avoid the flow where the header is written by the scheduler, uploaded by AiiDA, and then we just get a bash string to execute to submit; instead we want a real `submit()` function in the scheduler that, if needed, uses the transport to upload the file before submitting)
    * Also, separate transport in [file copy] + [command execution] (also beyond firecrest, to use blob storage). Question: where in AiiDA do we call `exec_command_wait` for something other than a Scheduler command? Because then, the separation could be [filetransfer] + [scheduler+execution_of_commands(if_needed)]
* Consider asyncssh?
* To test in the mock implementation:
    * batch all the polling requests (get jobs, get tasks)
        * [CJS] Note in aiida this is handled by the `JobsList`: https://github.com/chrisjsewell/aiida_core/blob/e770eaade2d58a2e77a14e9f6c33abe090c6b19c/aiida/engine/processes/calcjobs/manager.py#L28
    * implement some rate limiting option
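
To make the "batch all the polling requests" item concrete, below is a rough sketch of how the mock could collect status requests from many calculations and answer them with a single bulk query. It is only loosely inspired by the idea behind `JobsList` and is not its implementation; `BatchedPoller`, `bulk_query` and the `delay` window are hypothetical names, and it assumes the backend offers (or emulates in a loop) a "states for this list of IDs" call.

```python
import asyncio


class BatchedPoller:
    """Collect status requests and answer them with one bulk query per batch."""

    def __init__(self, bulk_query, delay: float = 0.01) -> None:
        self._bulk_query = bulk_query  # async callable: list of IDs -> {ID: state}
        self._delay = delay  # how long to wait for other callers to join
        self._pending: dict = {}  # job ID -> future shared by all its callers
        self._flush_task = None
        self._tasks: set = set()  # strong references to in-flight flush tasks

    async def get_state(self, job_id: str) -> str:
        future = self._pending.get(job_id)
        if future is None:
            future = asyncio.get_running_loop().create_future()
            self._pending[job_id] = future
        if self._flush_task is None:
            self._flush_task = asyncio.ensure_future(self._flush())
            self._tasks.add(self._flush_task)
            self._flush_task.add_done_callback(self._tasks.discard)
        return await future

    async def _flush(self) -> None:
        await asyncio.sleep(self._delay)  # let other callers join this batch
        pending, self._pending = self._pending, {}
        self._flush_task = None  # later requests start a new batch
        try:
            states = await self._bulk_query(list(pending))
        except Exception as exc:
            for future in pending.values():
                if not future.done():
                    future.set_exception(exc)
            return
        for job_id, future in pending.items():
            if not future.done():
                future.set_result(states.get(job_id, "UNKNOWN"))


async def demo():
    calls = []  # records every bulk query that is actually sent

    async def fake_bulk_query(job_ids):
        calls.append(sorted(job_ids))
        return {job_id: "RUNNING" for job_id in job_ids}

    poller = BatchedPoller(fake_bulk_query)
    states = await asyncio.gather(*(poller.get_state(str(i)) for i in range(3)))
    return states, calls
```

In the demo, three concurrent `get_state` calls result in a single call to `fake_bulk_query`. If FirecREST turns out not to support filtered bulk queries, only the `bulk_query` callable would need to change (e.g. to loop over single requests).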