# Architecture design for the new declarative GT4Py

###### tags: `functional cycle 7`

shaped by: Enrique

appetite: 1/2 cycle for Enrique (supported by Rico).

## Motivation

The original GT4Py was designed as a simple prototype tool to generate GridTools C++ code from a Python DSL at roughly the same level of abstraction. Since then, many new features have been added by amending and extending the existing architecture as each feature required, in an organic, incremental, trial-and-error process. As a result of this organic growth, the current architecture is unnecessarily complex and hard to understand. The fragile design and the backwards compatibility constraints make the overall project prone to workarounds and hacks.

## Goal

The goal of this project is to come up with a new architecture design, taking advantage of the opportunity created by the implementation of the new Declarative GT4Py model, which differs substantially from the current Cartesian GT4Py. The new design should consider the following functionalities while avoiding the well-known problems of the current version:

- General infrastructure
    + Fast fingerprinting of DSL sources (decoupled from frontend parsing).
    + Efficient caching of already generated binaries:
        + Configurable
        + Shareable between multiple users/nodes running in the same cluster (multi-layered?)
    + Efficient mechanism to load generated Python extensions (if running in Python mode)
        + Optionally, investigate whether modern Python versions have added module unloading capabilities, or other alternatives like using stubs (e.g. DaCe?)
    + Configuration system for the toolchain accessible from:
        + Environment variables
        + Configuration files (INI? YAML? JSON? NestedText?)
        + Python API
    + CLI-friendly: support for programmatic access to the infrastructure functionality, decoupled from the user API.
- Analysis and code-generation toolchain
    + Support for multiple frontend parsers, considering as a _frontend_ any text string which can be lowered to a valid Iterator IR program.
    + Support for multiple code-generation backends, considering as a _backend_ a specific combination of hardware platform + software API (e.g. CUDA GPU + GridTools C++, Intel CPU + OpenMP, ...).
    + A flexible mechanism to build analysis and lowering pipelines from a combination of transformation/verification passes.
    + Support for automatic generation of bindings to multiple languages (not only Python)
- Compatible with the new _no-storages_ design for memory allocation (basically, only provide allocation functions, not _Storage_ objects)
- Compatible with running in purely embedded mode without using the toolchain

The design should include a description of the components and their interactions, as well as the rationale behind the design decisions and the alternatives that were discarded.

### Optional goals

- A new file layout proposal for the new repository
- A proposal for agreed recommended coding guidelines and style (docstrings, imports, ...)
- A proposal for testing, documentation and automation tools

## Non-Goals

This is just a design task and therefore a full implementation of the architecture is out of scope. Some partial prototypes might be sketched during the design discussions to quickly evaluate the trade-offs of different design decisions in specific components.
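
As an illustration of the kind of partial prototype meant here, the "flexible mechanism to build analysis and lowering pipelines" from the toolchain goals could be sketched roughly as below. Everything in it is an assumption made for the example: `Program` is a toy stand-in rather than the real Iterator IR, and `Pipeline`, `remove_noops` and `check_not_empty` are hypothetical names, not existing GT4Py APIs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


# Hypothetical stand-in for an IR program; the real Iterator IR is not modelled.
@dataclass(frozen=True)
class Program:
    name: str
    stmts: tuple[str, ...]


# In this sketch a pass is any callable mapping a program to a program.
Pass = Callable[[Program], Program]


def remove_noops(prog: Program) -> Program:
    """Transformation pass: drop statements that are exactly 'noop'."""
    return Program(prog.name, tuple(s for s in prog.stmts if s != "noop"))


def check_not_empty(prog: Program) -> Program:
    """Verification pass: raise on an empty program, else return it unchanged."""
    if not prog.stmts:
        raise ValueError(f"program {prog.name!r} has no statements")
    return prog


@dataclass(frozen=True)
class Pipeline:
    """Immutable, ordered sequence of passes that is itself a pass."""

    passes: tuple[Pass, ...] = ()

    def then(self, *more: Pass) -> Pipeline:
        # Return a new pipeline so that partial pipelines can be shared safely.
        return Pipeline(self.passes + more)

    def __call__(self, prog: Program) -> Program:
        for p in self.passes:
            prog = p(prog)
        return prog


lowering = Pipeline().then(remove_noops, check_not_empty)
result = lowering(Program("demo", ("a = b", "noop", "c = a")))
```

Treating verification passes as identity transformations that may raise keeps a single pass signature, and because a `Pipeline` is callable it can be nested inside another pipeline. Whether that uniformity is worth losing a distinct type for verification passes is one of the trade-offs such a prototype would help evaluate.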
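
Another small prototype could explore the fingerprinting and multi-layered caching items from the general infrastructure goals. The sketch below assumes that a fingerprint is a SHA-256 hash over the raw source text plus the build options, and that each cache layer is a directory searched in order (for example a user-local directory before a cluster-wide shared one). The function names and the on-disk layout are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Mapping, Optional, Sequence


def fingerprint(source: str, options: Mapping[str, object]) -> str:
    """Hash the raw DSL source and the build options without parsing the source."""
    h = hashlib.sha256()
    h.update(source.encode("utf-8"))
    h.update(b"\0")  # separator so source and options cannot run together
    # sort_keys makes the hash independent of the option insertion order.
    h.update(json.dumps(dict(options), sort_keys=True, default=str).encode("utf-8"))
    return h.hexdigest()


def find_cached(key: str, layers: Sequence[Path]) -> Optional[Path]:
    """Return the first existing cache entry for `key`, searching layers in order."""
    for root in layers:
        # Assumed layout: <layer>/<first two hex digits>/<full fingerprint>
        candidate = root / key[:2] / key
        if candidate.exists():
            return candidate
    return None
```

Hashing the source text directly is what keeps fingerprinting decoupled from frontend parsing, at the cost that purely cosmetic edits (whitespace, comments) change the fingerprint and cause a cache miss.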