# GT4Py passes to GTC migration

###### tags: `cycle 0`

## Problem

With Eve we are developing a library to implement IRs, transformations and code generators. GT4Py's backend will directly profit from this development. We see the following advantages for moving to Eve now:

- We will use a common infrastructure for Cartesian backends and the prototype work for unstructured meshes.
- Vulcan actively works on the optimizations in GT4Py. Implementing these optimizations directly on the new infrastructure will save us the work of migrating them later.
- We can develop a testing methodology for transformation passes that can be applied later to unstructured meshes.

## Appetite

Full cycle (Rico, Hannes, help from Johann)

## Solution

Migrate current GT4Py functionality to the Eve infrastructure using the new GTC IRs. A NumPy backend and a GridTools (CPU) backend will be used to test the implementation. A focus will be on testability of transformation passes.

Subtopics:

- Complete GTIR with focus on validation (narrowing it as much as possible to the parallel model).
- Design a middle IR that allows for optimizations like fusing etc. (OIR = optimizable IR)
- Complete the low-level IRs (npir, gtcppir)
- Implement analysis passes for information necessary to lower
- Describe testing patterns and develop simple infrastructure for testing passes: how to add a new test case without much overhead
- Implement optimization passes (nice to have)

No-goal:

- At this point we don't focus on CUDA backends. They should be available via gtcppir. They are not tested explicitly for now, which reduces compile time etc.
- Don't extend existing integration/regression tests of GT4Py (GTScript -> code + execution), but pass all existing tests

## Testing strategy

- Definition IR to GTIR: unit tests as needed for implementation (developer's choice)
- Test as much as possible by a narrow GTIR (pydantic types and validators)
- Analysis passes:
    - Programmatically construct IRs and check against annotations of result IR
    - Unit tests for components used in the passes
- Optimization passes:
    - Programmatically construct IRs and check against features of result IR
    - Unit tests: see above
- Lowering:
    - Unit tests for the lowering of individual nodes
    - No tests from full IR -> full IR (rely on integration tests)
- Codegen:
    - Testing for edge cases for compilability
- Integration tests (GTScript -> code + execution):
    - No new tests (only existing GT4Py tests)

### How to programmatically create IRs

- Provide helpers to construct IRs (make it easy to construct similar IRs with small differences)
- Builder pattern (keep an eye on complexity of the builders)

## Migration of transformation passes

- Transformations are split into analysis (e.g. compute merge candidates) and apply.

Current passes in the order they would most likely be ported (from essential to nice-to-have).

### Lowering

- InitInfoPass: Transcribe definition IR structure into blocks.

  The following `transform_data` attributes are changed:
    - `symbols`: SymbolInfo for each used symbol.
    - `blocks`: Block structure following the original Definition IR.

  Composed of 3 sub-passes:
    - IntervalMaker
    - SymbolMaker
    - BlockMaker

  --> integrate into Lowering DefIR -> GTIR (gt4py side)
- NormalizeBlocksPass: Create a DomainBlockInfo for each StatementInfo.

  The following `transform_data` attributes are changed:
    - `blocks`: Each DomainBlockInfo contains only a single StatementInfo.

  --> integrate into Lowering DefIR -> GTIR
- BuildIIRPass: Transcribe `transform_data.blocks` to `transform_data.implementation_ir`.
  ```
  DomainBlockInfo   -> MultiStage
  IJBlockInfo       -> Stage
  IntervalBlockInfo -> ApplyBlock
  ```

  --> Lowering DefIR -> GTIR

### Analysis

- DataTypePass: Fills in the concrete `data_type` for all nodes set to `DataType.AUTO`.

  These should be limited to:
    - NativeFuncCall
    - Ternary
    - Binary
    - Unary
    - FieldRef
    - VarRef
    - lhs of Assign

  --> Analysis on GTIR: annotate each Expr with its type -> OIR has a subnode with dtype for leaves, e.g. Literal, FieldRef, VarRef, until we can rely on the symbol table to give us this information (no AUTO anywhere)
- ComputeExtentsPass: Loop over blocks backwards and accumulate extents. Writes to `transform_data.blocks`, fills each `IJBlockInfo.compute_extent`, and creates `transform_data.implementation_ir.fields_extent`.

  --> Analysis on OIR? If on GTIR, then the lowering to (and optimizations in) OIR would have to keep this info intact. Required for code generators (i.e. lower-level IRs) and optimizations.
- ComputeUsedSymbolsPass: Fills the SymbolInfo `in_use` attribute in `transform_data.symbols`. (nice to have)

  It does this because an entry was originally created for each Decl in the SymbolMaker part of InitInfoPass, and some of these may not be referenced. This allows Fields that default to None:
    - No checks are done on these fields
    - They are not passed to the C++ side

  --> Currently required in Tasmania

### Optimization

- MergeBlocksPass: Merges `transform_data.blocks` using a greedy algorithm. The first step merges IJBlockInfos as long as compatibility conditions are met, then proceeds to try and merge IJBlockInfos. The secondary merging step attempts to create as few IntervalBlockInfos as necessary, by re-using existing blocks with the same interval. Note that this could be re-implemented as a third merging step for every IJBlockInfo instead.

  The following `transform_data` attributes are changed:
    - `blocks`: Merged as far as possible without reordering.

  --> Optimization on Optimization IR (keep greedy merge strategy)

  2 passes:
    - First vertical loop merging
    - Then horizontal loop merging

  Strategy for each:
    - Find merge candidates
    - Merge
- DemoteLocalTemporariesToVariablePass: Demote symbols only used within a single stage to scalars. This is possible because such symbols can be local variables in the scope and therefore do not need to be fields.

  --> Optimization on Optimization IR
- HouseKeepingPass: Miscellaneous clean-up. The sub-passes should be separated:
    - WarnIfNoEffect: Warn if StencilImplementation has no effect.

      --> probably on GTIR (maybe implicitly checked by GTIR restrictions)
    - PruneEmptyNodes: Removes empty multi-stages, stage groups, and stages.

      --> Optimization on OIR? Or test that optimization passes do not leave empty nodes behind?
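
The backwards accumulation described for ComputeExtentsPass can be illustrated with a one-dimensional sketch. This is not the GT4Py implementation: the `Stage` tuple layout, the `Extent` alias and the function names are hypothetical, and each stage is assumed to write exactly one field.

```python
from typing import Dict, List, Tuple

Extent = Tuple[int, int]  # (lowest, highest) offset along one axis

# Hypothetical stage layout: (written field, [(read field, offset), ...])
Stage = Tuple[str, List[Tuple[str, int]]]


def union(a: Extent, b: Extent) -> Extent:
    """Smallest extent that covers both a and b."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def compute_extents(stages: List[Stage]) -> Tuple[List[Extent], Dict[str, Extent]]:
    """Walk the stages backwards and accumulate compute and field extents."""
    field_extents: Dict[str, Extent] = {}
    stage_extents: List[Extent] = [(0, 0)] * len(stages)
    for idx in range(len(stages) - 1, -1, -1):
        written, reads = stages[idx]
        # A stage must be computed wherever later stages read its output.
        stage_extent = field_extents.get(written, (0, 0))
        stage_extents[idx] = stage_extent
        for name, offset in reads:
            # Shift the compute extent by the read offset and grow the field extent.
            access = (stage_extent[0] + offset, stage_extent[1] + offset)
            field_extents[name] = union(field_extents.get(name, (0, 0)), access)
    return stage_extents, field_extents


# tmp = inp[-1] + inp[+1]; out = tmp[-1] + tmp[+1]
stages: List[Stage] = [
    ("tmp", [("inp", -1), ("inp", 1)]),
    ("out", [("tmp", -1), ("tmp", 1)]),
]
stage_extents, field_extents = compute_extents(stages)
```

In this example the first stage has to be computed on `(-1, 1)` because the second stage reads `tmp` at both neighbours, and `inp` is consequently accessed on `(-2, 2)`.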
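
The "find merge candidates, then merge" strategy for the greedy merge, without reordering, might be sketched as follows. This is a minimal illustration, not the GT4Py or GTC implementation: `VerticalLoop`, `can_merge`, `find_merge_candidates` and `greedy_merge` are hypothetical names, and the compatibility condition (same loop order and same interval, ignoring data dependencies) is an assumption chosen for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VerticalLoop:
    """Hypothetical stand-in for a vertical loop node (not a real GTC node)."""

    loop_order: str  # "PARALLEL", "FORWARD" or "BACKWARD"
    interval: Tuple[int, int]  # simplified (start, end) vertical interval
    stmts: List[str] = field(default_factory=list)


def can_merge(a: VerticalLoop, b: VerticalLoop) -> bool:
    """Assumed compatibility condition for two neighbouring loops."""
    return a.loop_order == b.loop_order and a.interval == b.interval


def find_merge_candidates(loops: List[VerticalLoop]) -> List[int]:
    """Analysis: indices i for which loops[i] can be fused into loops[i - 1]."""
    return [i for i in range(1, len(loops)) if can_merge(loops[i - 1], loops[i])]


def greedy_merge(loops: List[VerticalLoop]) -> List[VerticalLoop]:
    """Apply: fuse every candidate into its predecessor, keeping the original order."""
    candidates = set(find_merge_candidates(loops))
    merged: List[VerticalLoop] = []
    for i, loop in enumerate(loops):
        if i in candidates:
            # Candidates always have a predecessor, so merged is non-empty here.
            merged[-1].stmts.extend(loop.stmts)
        else:
            # Copy the node so that the input IR is left untouched.
            merged.append(VerticalLoop(loop.loop_order, loop.interval, list(loop.stmts)))
    return merged


loops = [
    VerticalLoop("PARALLEL", (0, 10), ["a = b"]),
    VerticalLoop("PARALLEL", (0, 10), ["c = a"]),
    VerticalLoop("FORWARD", (0, 10), ["d = c"]),
    VerticalLoop("PARALLEL", (0, 10), ["e = d"]),
]
result = greedy_merge(loops)
```

Only the first two loops are fused here. The last `PARALLEL` loop stays separate, since merging it with the first two would require moving it across the `FORWARD` loop. Splitting analysis (`find_merge_candidates`) from apply (`greedy_merge`) lets each part be tested on its own.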
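
The testing pattern "programmatically construct IRs and check against features of result IR" could look like the sketch below for a PruneEmptyNodes-style pass. All names (`Stage`, `MultiStage`, `Stencil`, `StencilBuilder`, `prune_empty_nodes`) are hypothetical, and plain dataclasses stand in for the Eve/pydantic node types, so this only illustrates the shape of such a test.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stage:
    stmts: List[str] = field(default_factory=list)


@dataclass
class MultiStage:
    stages: List[Stage] = field(default_factory=list)


@dataclass
class Stencil:
    multi_stages: List[MultiStage] = field(default_factory=list)


class StencilBuilder:
    """Hypothetical builder: chained calls keep similar test IRs cheap to write."""

    def __init__(self) -> None:
        self._multi_stages: List[MultiStage] = []

    def multi_stage(self) -> "StencilBuilder":
        self._multi_stages.append(MultiStage())
        return self

    def stage(self, *stmts: str) -> "StencilBuilder":
        # Adds a stage to the most recently opened multi-stage.
        self._multi_stages[-1].stages.append(Stage(list(stmts)))
        return self

    def build(self) -> Stencil:
        return Stencil(list(self._multi_stages))


def prune_empty_nodes(stencil: Stencil) -> Stencil:
    """Return a new IR without empty stages and without multi-stages left empty."""
    pruned: List[MultiStage] = []
    for multi_stage in stencil.multi_stages:
        stages = [Stage(list(s.stmts)) for s in multi_stage.stages if s.stmts]
        if stages:
            pruned.append(MultiStage(stages))
    return Stencil(pruned)


def test_prune_removes_empty_nodes() -> None:
    ir = (
        StencilBuilder()
        .multi_stage().stage("a = b").stage()  # second stage is empty
        .multi_stage().stage()  # multi-stage containing only an empty stage
        .build()
    )
    result = prune_empty_nodes(ir)
    # Check features of the result IR instead of comparing full IRs.
    assert len(result.multi_stages) == 1
    assert [s.stmts for s in result.multi_stages[0].stages] == [["a = b"]]
```

A new test case is then a few builder calls plus one or two assertions, which keeps the overhead per case low as long as the builder itself stays simple.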