# Compiled Backend Temporaries

###### tags: `functional cycle 13`

- Developers: Till
- Appetite: Full cycle

## Problem

On the iterator level, composition of stencils is realized using the `lift` builtin. The resulting iterator of an applied `lift` can be used to evaluate a stencil on neighboring locations (i.e. grid points). Without memoization, those stencils are regularly evaluated many times on the same location. As a result, the runtime $t$ of a stencil closure grows exponentially in the composition depth $d$, i.e. $t(c, d) \in \mathcal{O}(c^d)$, where $c$ is the maximum number of neighboring locations on any nesting level. This is particularly devastating in terms of runtime for nested reductions. Even a simple second-order advection scheme can easily require reevaluation of a hundred or more stencils. Without memoization, fusing of stencils in Icon4Py and a clean, scalable implementation of the FVM dycore are impractical.

## Solution

An efficient way of memoization is the creation of temporaries containing the values of `lift`ed stencils, evaluated at all locations they are accessed on.

## Goals

The goal of this project is to support temporaries in the embedded and gtfn backends for all frontend and iterator tests (unless otherwise disabled).

## Non-goals

## Current state

ITIR has already been extended by two nodes (`Temporary`, `FencilWithTemporaries`) required to represent temporaries and fencils with temporaries. A pass transforming a regular `Fencil` into a `FencilWithTemporaries` has been implemented in `split_closures` (it can be triggered by using `LiftMode.ForceTemporaries`). This pass is currently limited: it can only handle some structures of ITIR and notably fails for patterns encountered in ITIR generated by the frontend.
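
To make the cost argument from the Problem section concrete, here is a minimal plain-Python sketch. It is not gt4py code: the names `OFFSETS`, `run_inlined` and `run_with_temporaries` are hypothetical, and the 1D periodic grid is a simplifying assumption that avoids having to extend the domain of each temporary. It counts stencil evaluations for `depth` nested neighbor sums, once with every level inlined and once with one temporary per nesting level.

```python
from typing import List, Tuple

# Hypothetical neighbor offsets: c = 3 neighbors per nesting level.
OFFSETS = (-1, 0, 1)


def run_inlined(field: List[int], depth: int) -> Tuple[List[int], int]:
    """Evaluate `depth` nested neighbor sums, re-evaluating inner levels on demand."""
    n = len(field)
    evaluations = 0

    def stencil(level: int, i: int) -> int:
        nonlocal evaluations
        if level == 0:
            return field[i % n]  # plain field access, periodic in i
        evaluations += 1
        # Each access to the "lifted" inner stencil evaluates it again.
        return sum(stencil(level - 1, i + o) for o in OFFSETS)

    result = [stencil(depth, i) for i in range(n)]
    return result, evaluations


def run_with_temporaries(field: List[int], depth: int) -> Tuple[List[int], int]:
    """Same computation, but each nesting level is stored in a temporary first."""
    n = len(field)
    evaluations = 0
    current = list(field)
    for _ in range(depth):
        tmp = []
        for i in range(n):
            evaluations += 1
            tmp.append(sum(current[(i + o) % n] for o in OFFSETS))
        current = tmp  # the temporary becomes the input of the next level
    return current, evaluations


if __name__ == "__main__":
    inlined, n_inlined = run_inlined(list(range(8)), depth=3)
    with_tmps, n_tmps = run_with_temporaries(list(range(8)), depth=3)
    print(inlined == with_tmps, n_inlined, n_tmps)
```

For 8 grid points, $c = 3$ and $d = 3$, the inlined variant performs $8 \cdot (1 + 3 + 9) = 104$ stencil evaluations, while the variant with temporaries performs $8 \cdot 3 = 24$ and produces the same result. On a non-periodic grid each temporary would additionally have to cover all locations it is accessed on, which this sketch does not model.
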

_Example stencil not supported by `split_closures` pass_

```python
# Import paths assume the `functional` branch layout.
from functional.iterator.builtins import deref, lift
from functional.iterator.runtime import fundef


@fundef
def stencil(closure_arg_it):
    scalar = 1 + 1  # scalar captured by the lifted lambda below
    lifted_it = lift(lambda it: deref(it) + scalar)(closure_arg_it)
    return deref(lifted_it)
```

Code reference: https://github.com/GridTools/gt4py/blob/functional/src/functional/iterator/transforms/global_tmps.py

## Known steps

- Extend the `split_closures` pass to support ITIR as generated by the frontend
- Optional, but desired: Implement a heuristic with reasonable performance (in between `LiftMode.ForceTemporaries` and `LiftMode.ForceInline`)

## Possible rabbit holes