# [Greenline] Make Graupel practical with GTFN

- Shaped by: Till, Hannes
- Appetite (FTEs, weeks):
- Developers:

## Problem

Implementing and debugging the Graupel scheme is unpleasant because of performance issues and bugs in the optimization pipeline and in execution. This prevents us from giving the code its optimal structure and makes debugging time-consuming. This project aims to improve the situation by solving the known issues. Getting the scheme to run with the C++ backend would be nice, but it might not be feasible within this project because not all issues are known and understood.

The known problems are:

- Compilation and/or execution is very slow.
- `if` statements don't work with the gtfn backend.
- Chia Rui discovered a bug where equivalent `if` statements give different results.
- The temporaries pass does not support `scan` (might not be important for Graupel).
- [Optional] A bug in type inference results in a non-deterministic constraint resolution error.

## Appetite

## Solution

### Unblock Chia Rui's work

Currently, the Graupel scheme partially uses non-optimal patterns to work around the compile-time issue. We will add `scan` to embedded field view in [](), which will improve time to solution and therefore make it easier to move the code towards its desired form.

Note: Embedded field view for the `scan` will completely bypass the compilation phase (which is most likely part of the performance problem), but runtime will not be good enough for real-world problem sizes.

### Analyze GT4Py performance

The goal is to get an understanding of the bottlenecks of GT4Py:

- how much time is spent in compilation and how much at runtime
- how the different passes behave for complex codes

Run the scheme in a profiler and identify the most time-consuming parts. If reasonable (i.e. for obvious, simple-to-fix problems), improve GT4Py to reduce compile time and runtime.

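The profiling step described above could start from something like the following sketch, which uses only Python's standard `cProfile` and `pstats` modules. The function `run_graupel_placeholder` is a hypothetical stand-in for the actual call into the Graupel scheme, and `profile_call` is a helper name invented for this illustration. The sketch also assumes that the backend compiles on the first call and reuses a cached result afterwards, so comparing the first and second call gives a rough split between compilation and runtime; this assumption should be checked against the backend actually in use.

```python
import cProfile
import io
import pstats
import time


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, wall-clock seconds, report text)."""
    profiler = cProfile.Profile()
    start = time.perf_counter()
    result = profiler.runcall(func, *args, **kwargs)
    elapsed = time.perf_counter() - start

    # Collect the 15 entries with the largest cumulative time into a string.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(15)
    return result, elapsed, buffer.getvalue()


def run_graupel_placeholder(n=200_000):
    # Hypothetical stand-in: replace the body with the real call into the scheme.
    return sum(i * i for i in range(n))


# First call: under the caching assumption, this includes compilation.
_, first_seconds, first_report = profile_call(run_graupel_placeholder)
# Second call: under the same assumption, this is mostly runtime.
_, second_seconds, _ = profile_call(run_graupel_placeholder)

print(first_report)
print(f"first call: {first_seconds:.3f}s, second call: {second_seconds:.3f}s")
```

Sorting by cumulative time tends to surface whole passes or pipeline stages rather than individual leaf functions, which matches the goal of finding out which passes dominate for complex codes.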
### If statements with tuples (blocks GTFN execution)

Some experiments with rewriting parts of the IR in continuation-passing style suggest that it should be possible to remove all `tuple_get` and `make_tuple` statements (except in expressions that mix tuples created inside a closure with tuples passed in from the outside). This optimization should make it possible to rewrite the IR so that the `lift` statements that originate from the `if`s in the frontend, and that the compiled backend does not support, are transformed away.

### Bug: if statements produce invalid code

Find a small reproducer for the problem and identify the source of the error. If reasonable, fix the problem within this project.

### Temporaries pass compatible with scan

The temporaries pass is currently not compatible with scan operators because the lowering from FOAST to ITIR emits `lift` statements that capture the carry argument. The goal is to transform the ITIR such that no capturing lifts remain.

### Optional: Bug in type inference resulting in a non-deterministic constraint resolution error

This is the lowest-priority task in this project:

- the issue might be very time-consuming to debug because it is non-deterministic
- the current ITIR type inference is difficult to debug, see [Replace ITIR type inference](https://hackmd.io/@gridtools/H1trEmURn)

## Rabbit holes

The difficulty in this project is balancing problem solving against problem analysis. If in doubt, focus on analyzing the problems and schedule the solution for a later cycle.

## No-gos