# Stencil Object/Module Improvements

Argument parsing and validation of stencil arguments have been a constant performance bottleneck in GT4Py. The goal of this project is to implement an interface that avoids those overheads.

###### tags: `cycle 5`

---

## Problem

The main contributor to the stencil call overhead is deducing the domain (a tuple of integers) and the origin of each field (a dictionary mapping each field name to its origin, again a tuple of integers) from the `domain`, `origin` and field arguments given to the stencil (e.g. `domain` might be `None`, in which case a maximum domain needs to be computed).

Note: Validation also contributes significantly, but can already be disabled using the `validate_args` flag.

## Solution

1. Enhance `StencilObject` with an additional method `freeze` returning a `FrozenStencil`, which does not require any validation or argument parsing and only calls `StencilObject.run`, `pre_run` and `post_run`.

    ```python
    @gtscript.stencil()
    def example(...):
        ...

    # Interface 1: Freeze
    fast_stencil = example.freeze(domain, origins...)
    fast_stencil(*fields, **kwparams)
    ```

    Note: The existing `FrozenStencil` implementation in FV3 additionally marks GPU storages as modified, which would be done in `post_run`.

2. *Optional* Implement a caching mechanism for the `domain` and `origin` deduced inside `StencilObject.__call__`, so that subsequent runs do not need to recalculate them over and over again.

    ```python
    # caching: first time caches the domain and origins
    example(*fields, **kwparams)
    # second time uses the cache
    example(*fields, **kwparams)
    ```

    The deduced `domain` and `origin` depend on the passed `domain`, the passed `origin` and the storage metadata (shape, default origin) of the fields. A light-weight cache identifier can be computed as follows:

    - *origin*
        + `dict`: use either `hash(tuple(origin.items()))` or `id(origin)`, depending on execution time and "stability" (i.e.
does `id` change on subsequent calls)
        + `tuple`: use `hash`
    - *domain*: can be `None` or a `tuple`; both are hashable right away
    - *storage metadata*: Extend the `gt.storage.storage` class to retrieve a hash of the storage metadata computed during construction. As the NumPy backends also accept plain `numpy.ndarray`s as storages, this hash should be accessed inside a `try`/`except`. If the access fails, no caching is performed.

    For testing, all combinations of the above should be executed, checking that the cached and non-cached paths pass the same arguments to `StencilObject.run`.

## Prior-work

- The [`FrozenStencil`](https://github.com/VulcanClimateModeling/fv3core/blob/master/fv3core/decorators.py) decorator in `fv3core` already provides similar functionality and can be used as a point of reference.
- Although written for a different purpose, the [stencil decorator](https://github.com/ckuehnlein/FVM_GT4Py/blob/main/src/fvms/utils/stencil.py) in the FVM dycore implements a caching mechanism.

## No-Goals

- Improve performance of checks using C++ or Cython
- Changes in the low-level interface

## Appetite

1/2 cycle with the optional goal, 1 developer

---

<!--## Solution

- Decide on interface
- Review existing impl (https://github.com/VulcanClimateModeling/fv3core/blob/master/fv3core/decorators.py)
- Change existing impl to proposed interface and document it

Project 2 (optional, for later):
- What is the minimal info needed to fingerprint a field
- What is the hash function for this info to check whether out of date
- Implement

## Initial Questions/Notes

JD:
- Do we want a different interface, or first investigate speeding up the existing interface by moving all of the StencilObject calculation into Cython or C++?

EP:
- Should the user be able to disable levels of safety?
- How to implement in performant way?
- Low-level interface for calling stencils that applications can hook into.
TW:
- Changing the low-level interface is a different task from what was envisioned here
- Baseline: checks have to happen either way e.g. compute origins for each field. Can do at call time, pre-compute at generation, or python layer in cython at call time (slow to fast)
- Want to move checks to a different time

EG:
- Frozen stencil is moving checks to generation time from call time
- Good idea to have method to fully turn off all checks and directly call
-->
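
The light-weight cache identifier described under the optional caching solution could be sketched roughly as follows. This is only an illustration, not GT4Py code: `FakeStorage`, its `metadata_hash` attribute, `make_cache_key` and `deduce_cached` are hypothetical names, and the `deduce` callback stands in for whatever `StencilObject.__call__` currently does to compute the domain and origins.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class FakeStorage:
    """Hypothetical stand-in for a storage that exposes a metadata hash."""

    def __init__(self, shape: Tuple[int, ...], default_origin: Tuple[int, ...]):
        self.shape = shape
        self.default_origin = default_origin
        # Computed once at construction time, as proposed for the storage class.
        self.metadata_hash = hash((shape, default_origin))


def origin_key(origin: Any) -> int:
    """Hash an origin given as a dict (name -> tuple), a tuple, or None."""
    if isinstance(origin, dict):
        # Assumes the dict values are tuples of ints and therefore hashable.
        return hash(tuple(origin.items()))
    return hash(origin)


def make_cache_key(domain, origin, fields: Dict[str, Any]) -> Optional[tuple]:
    """Return a hashable key, or None if any field lacks a metadata hash."""
    try:
        field_keys = tuple((name, f.metadata_hash) for name, f in fields.items())
    except AttributeError:
        # E.g. a plain array without the hypothetical attribute: skip caching.
        return None
    return (domain, origin_key(origin), field_keys)


_cache: Dict[tuple, Any] = {}


def deduce_cached(domain, origin, fields: Dict[str, Any], deduce: Callable) -> Any:
    """Call `deduce` only on a cache miss, or when no key can be built."""
    key = make_cache_key(domain, origin, fields)
    if key is None:
        return deduce(domain, origin, fields)
    if key not in _cache:
        _cache[key] = deduce(domain, origin, fields)
    return _cache[key]
```

One point to weigh when choosing between the two `dict` options: `hash(tuple(origin.items()))` depends on the insertion order of the dictionary, so two dictionaries with equal contents but a different key order produce different keys. That only costs an extra cache miss; it does not produce a wrong result.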