# Investigate and fix performance issues of GTC Toolchain

###### tags: `betting-archive`

## Problem

The current GTC toolchain works well: it is flexible, easier to extend and maintain, and it generates faster code. However, it comes with some performance overhead compared to the "classic" GT4Py toolchain, especially for large stencils.

So far, the use of the `pydantic` library as the base for all node definitions and validators has been regarded as the most likely cause of the overall slowdown. While `pydantic` is convenient and very general, its implementation is mostly meant for flat data structures rather than highly nested structures like the ones in the IR trees. Nested objects are completely reconstructed each time they are assigned as a node attribute, which we think might lead to overall quadratic complexity for some traversal operations. However, it is not yet clear whether this is the main performance bottleneck of the toolchain or whether other features (e.g. `xiter()`) are involved.

## Goals

- Profile the toolchain to find the actual bottlenecks in the GTC code and their relative impact on the overall execution time.
- Evaluate and profile different alternatives to fix the bottlenecks.
- If it is verified that `pydantic` is the main performance blocker, node classes should be migrated to eve datamodels classes (either the current version with `attrs` or a new one with plain `dataclasses`).

## No Goals

In general, changing the logic of the GTC passes or a major refactoring of GTC is out of scope.

## Potential Rabbit Holes

- Some performance bottlenecks could be hard to fix without major refactoring.
- Moving away from `pydantic` to a custom datamodels implementation might force changes to features that are currently implemented in a highly pydantic-specific way (e.g. dtype propagation).

## Appetite

1/2 or 1 cycle, 1 developer (@egparedes should be available for consultation on pydantic/datamodels questions).
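
## Profiling sketch

As a possible starting point for the profiling goal above, the snippet below is a minimal sketch that uses only the standard-library `cProfile` and `pstats` modules. The helper `profile_call` and the stand-in workload `build_nested` are hypothetical names introduced for illustration; in practice the profiled callable would be whichever GTC entry point is under investigation.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run ``func`` under cProfile and return ``(result, report_text)``.

    Hypothetical helper: the report lists the ``top`` entries sorted by
    cumulative time, which helps to spot where a call tree spends its time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top)
    return result, stream.getvalue()


def build_nested(depth):
    """Stand-in workload (an assumption, not GTC code): build a nested tree."""
    node = {"children": []}
    for _ in range(depth):
        node = {"children": [node]}
    return node


result, report = profile_call(build_nested, 1000)
print(report)
```

Sorting by cumulative time is only one option; sorting by `pstats.SortKey.TIME` instead highlights functions that are expensive by themselves rather than through their callees.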