# GT4Py optimizations

###### tags: `cycle 15`

- Shaped by: Till
- Appetite: 1/2 cycle, 1 developer (plus optional 1-2 weeks of heuristics discussions with another developer (Christoph))
- Developers: Till

## Problem

__Temporaries__

The main piece still missing to achieve good performance is support for automatically extracting temporaries, together with a heuristic for deciding when to do so. Without temporaries, derived values (e.g. dereferenced iterators originating from lifted stencil calls) need to be computed on demand, resulting in an excessive amount of recomputation: for every grid point we compute all required values, regardless of whether they are also needed for other grid points. This is a problem in particular when composing field operators, where it easily results in hundreds of recomputations after only a few compositions.

__Power builtin__

Additionally, we want to look at some smaller optimizations, like transforming power operations with small exponents into regular multiplications (a good beginner's task).

## Appetite

Implementing the temporaries and the heuristic requires 1/2 cycle of 1 FTE. To fine-tune the heuristic, feedback and discussions from the ICON profiling would be useful (1/3 cycle, 1 FTE). The power builtin is around a day for one developer.

## Solution

### Temporaries

While a first prototype for the extraction of temporaries exists, some cleanup is still missing before it can be merged and used in production codes. Furthermore, we need a reasonable heuristic to decide whether a temporary should be extracted (right now we only support unconditionally extracting all temporaries or none at all). In this project we plan to implement a heuristic based on the following principle: if an iterator is dereferenced at multiple offsets (how many should be configurable), we extract a temporary. For example, we want to be able to distinguish between the following two cases:

```
(let it = lift(lambda it: ...)
  deref(shift(...)(it)) + deref(shift(...)(it)))
```

```
(let it = lift(lambda it: ...)
  deref(it)+deref(it))
```

In the first case we might want to create a temporary for `it`, whereas in the second case we won't.

### Power builtin

Write a pass that transforms the following ITIR

```
power(expr, 2)
```

into

```
(lambda x: x*x)(expr)
```

## Rabbit holes

The condition under which extracting a temporary is legal is something we haven't formalized yet. For example, iterators that are only conditionally dereferenced should not be extracted, as this could lead to an undefined access. To our knowledge, for the patterns that occur in a dycore, including stencils with boundary conditions, we should be able to find a simple solution. However, generalizing this such that everything we support in the frontend works is a challenging task.

## No-gos

Profiling is explicitly not part of this project, but will be carried out in other projects (TODO add document for icon, FVM is on demand).
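As an illustration of the temporaries heuristic described under Solution, the following is a minimal sketch that counts the distinct offsets at which a lifted iterator is dereferenced. It does not use the GT4Py ITIR classes: the node types `Sym`, `Shift`, `Deref`, `Plus`, the helper names, the parameter `min_offsets`, and the concrete offsets `("I", -1)` and `("I", 1)` are all hypothetical and chosen only for this example. It also assumes that "multiple offsets" means "at least `min_offsets` distinct offsets", which is one possible reading of the principle.

```python
from dataclasses import dataclass


# Toy expression nodes (hypothetical, not the real ITIR node classes).
@dataclass(frozen=True)
class Sym:
    name: str


@dataclass(frozen=True)
class Shift:
    offsets: tuple  # e.g. ("I", 1); the concrete encoding is an assumption
    it: object


@dataclass(frozen=True)
class Deref:
    it: object


@dataclass(frozen=True)
class Plus:
    lhs: object
    rhs: object


def deref_offsets(expr, name):
    """Return the set of distinct offsets at which iterator `name` is dereferenced."""
    if isinstance(expr, Deref):
        offsets = ()
        it = expr.it
        # Peel off (possibly nested) shifts; the innermost shift is applied first.
        while isinstance(it, Shift):
            offsets = it.offsets + offsets
            it = it.it
        return {offsets} if it == Sym(name) else set()
    if isinstance(expr, Plus):
        return deref_offsets(expr.lhs, name) | deref_offsets(expr.rhs, name)
    return set()


def should_extract(body, name, min_offsets=2):
    """Heuristic: extract a temporary if `name` is dereferenced at enough distinct offsets."""
    return len(deref_offsets(body, name)) >= min_offsets


# First case from the Solution section: dereferenced at two different offsets.
shifted = Plus(
    Deref(Shift(("I", -1), Sym("it"))),
    Deref(Shift(("I", 1), Sym("it"))),
)
# Second case: dereferenced twice, but always at the same (zero) offset.
unshifted = Plus(Deref(Sym("it")), Deref(Sym("it")))

extract_shifted = should_extract(shifted, "it")
extract_unshifted = should_extract(unshifted, "it")
```

With the default threshold, `extract_shifted` is true and `extract_unshifted` is false. The sketch deliberately ignores the legality question raised under Rabbit holes (e.g. conditionally dereferenced iterators).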
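The power builtin transformation from the Solution section could look roughly like the sketch below. It operates on a toy expression tree rather than the real ITIR: the node classes `Var`, `Lit`, `Call`, `Lambda`, `Mul`, the function `expand_power`, and the cutoff parameter `max_exponent` are assumptions made for this example and are not taken from GT4Py.

```python
from dataclasses import dataclass


# Toy expression nodes (hypothetical, not the real ITIR node classes).
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Mul:
    lhs: object
    rhs: object


@dataclass(frozen=True)
class Lambda:
    params: tuple
    body: object


@dataclass(frozen=True)
class Call:
    fun: object
    args: tuple


def expand_power(node, max_exponent=3):
    """Rewrite power(expr, n) for small integer n into (lambda x: x*...*x)(expr)."""
    if isinstance(node, Lambda):
        return Lambda(node.params, expand_power(node.body, max_exponent))
    if isinstance(node, Mul):
        return Mul(
            expand_power(node.lhs, max_exponent),
            expand_power(node.rhs, max_exponent),
        )
    if not isinstance(node, Call):
        return node  # variables and literals are left untouched

    # Transform children first so nested power calls are handled as well.
    fun = expand_power(node.fun, max_exponent)
    args = tuple(expand_power(arg, max_exponent) for arg in node.args)

    is_small_power = (
        fun == Var("power")
        and len(args) == 2
        and isinstance(args[1], Lit)
        and type(args[1].value) is int
        and 2 <= args[1].value <= max_exponent
    )
    if not is_small_power:
        return Call(fun, args)

    # Binding the base to `x` ensures `expr` is evaluated only once.
    x = Var("x")
    body = x
    for _ in range(args[1].value - 1):
        body = Mul(body, x)
    return Call(Lambda(("x",), body), (args[0],))


squared = expand_power(Call(Var("power"), (Var("expr"), Lit(2))))
unchanged = expand_power(Call(Var("power"), (Var("expr"), Lit(5))))
```

Here `squared` corresponds to `(lambda x: x*x)(expr)`, while the call with exponent 5 exceeds the assumed cutoff and is returned unchanged. Wrapping the product in a lambda, as in the target form given above, avoids duplicating `expr` in the generated code.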