# Stencil Object/Module Improvements
Argument parsing and validation of stencil arguments has been a constant performance bottleneck in GT4Py. The goal of this project is to implement an interface that avoids this overhead.
###### tags: `cycle 5`
---
## Problem
The main contributor to the stencil call overhead is deducing two things from the `domain`, `origin` and field arguments passed to the stencil: the domain (a tuple of integers) and the origin of each field (a dictionary mapping each field name to its origin, again a tuple of integers). For example, `domain` may be `None`, in which case the maximum domain has to be computed from the fields.
Note: Validation also contributes significantly, but it can already be disabled using the `validate_args` flag.
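
To make the per-call cost concrete, here is a simplified sketch of that deduction. It is hypothetical: `deduce_domain_and_origins` is not a GT4Py function, fields are represented only by their shapes, fields without an explicit origin are assumed to start at zero (real storages carry their own default origin), and the stencil's halo extents, which the real computation must also account for, are ignored.

```python
from typing import Dict, Optional, Tuple, Union

Index = Tuple[int, ...]


def deduce_domain_and_origins(
    shapes: Dict[str, Index],
    domain: Optional[Index] = None,
    origin: Union[None, Index, Dict[str, Index]] = None,
) -> Tuple[Index, Dict[str, Index]]:
    """Hypothetical, simplified per-call deduction (no halo extents)."""
    # Step 1: normalize `origin` to one tuple per field.
    if origin is None:
        origins = {name: (0,) * len(shape) for name, shape in shapes.items()}
    elif isinstance(origin, tuple):
        origins = {name: origin for name in shapes}
    else:
        # Assumption: fields missing from the dict start at the zero origin.
        origins = {
            name: origin.get(name, (0,) * len(shape)) for name, shape in shapes.items()
        }

    # Step 2: without an explicit domain, use the largest box that fits into
    # every field when it starts at that field's origin.
    if domain is None:
        ndim = len(next(iter(shapes.values())))
        domain = tuple(
            min(shape[d] - origins[name][d] for name, shape in shapes.items())
            for d in range(ndim)
        )
    return tuple(domain), origins
```

Even in this reduced form, every call loops over all fields and dimensions; the real version additionally inspects storage objects and stencil extents, which is why repeating it on every call adds up.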
## Solution
1. Extend `StencilObject` with an additional method `freeze` that returns a `FrozenStencil`. The frozen stencil performs no validation or argument parsing; it only calls `StencilObject.pre_run`, `run` and `post_run`.
```python
from gt4py import gtscript

@gtscript.stencil(backend=backend)
def example(...):
    ...

# Interface 1: Freeze
fast_stencil = example.freeze(domain, origins...)
fast_stencil(*fields, **kwparams)
```
Note: The existing `FrozenStencil` implementation in fv3core additionally marks GPU storages as modified; in the proposed design this would be done in `post_run`.
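
A minimal sketch of the proposed split, using hypothetical classes: `ToyStencilObject` stands in for `StencilObject` and exposes the `pre_run`/`run`/`post_run` hooks named above with made-up signatures, and `freeze` is assumed to take an already-deduced domain and per-field origin dict. This is not GT4Py's API; it only illustrates that a frozen call goes straight to the run hooks.

```python
from typing import Any, Callable, Dict, Tuple

Index = Tuple[int, ...]


class FrozenStencil:
    """Hypothetical sketch: domain and origins are fixed once, at freeze time."""

    def __init__(self, stencil: "ToyStencilObject", domain: Index, origin: Dict[str, Index]):
        self.stencil = stencil
        self.domain = tuple(domain)
        self.origin = dict(origin)

    def __call__(self, *args: Any, **kwargs: Any) -> None:
        # No validation and no deduction: only the three run hooks are called.
        self.stencil.pre_run(*args, **kwargs)
        self.stencil.run(*args, domain=self.domain, origin=self.origin, **kwargs)
        self.stencil.post_run(*args, **kwargs)


class ToyStencilObject:
    """Hypothetical stand-in for `StencilObject`, reduced to the hooks used above."""

    def __init__(self, kernel: Callable[..., None]):
        self.kernel = kernel  # the generated computation

    def pre_run(self, *args: Any, **kwargs: Any) -> None:
        pass  # e.g. make sure device buffers are up to date

    def run(self, *args: Any, domain: Index, origin: Dict[str, Index], **kwargs: Any) -> None:
        self.kernel(*args, domain=domain, origin=origin, **kwargs)

    def post_run(self, *args: Any, **kwargs: Any) -> None:
        pass  # e.g. mark GPU storages as modified (cf. fv3core's FrozenStencil)

    def freeze(self, domain: Index, origin: Dict[str, Index]) -> FrozenStencil:
        # Any checks on `domain` and `origin` would happen here, once.
        return FrozenStencil(self, domain, origin)
```

The design trades safety for speed: once frozen, nothing verifies that later calls pass fields compatible with the frozen domain and origins, so any such check would have to be opted into explicitly.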
2. *Optional:* Implement a caching mechanism for the `domain` and `origin` deduced inside `StencilObject.__call__`, so that subsequent calls do not need to recompute them.
```python
# caching: first time caches the domain and origins
example(*fields, **kwparams)
# second time uses the cache
example(*fields, **kwparams)
```
The deduced `domain` and `origin` depend on the passed `domain` and `origin` arguments and on the storage metadata (shape, default origin) of the fields. A lightweight cache identifier can be computed as follows:
- *origin*
    + `dict`: use either `hash(tuple(origin.items()))` or `id(origin)`, depending on execution time and "stability" (i.e., whether the same dict object, and hence the same `id`, is passed on subsequent calls; note that `id` cannot detect in-place modification of that dict)
    + `tuple`: use `hash`
- *domain*: either `None` or a `tuple`; both are hashable as-is
- *storage metadata*: extend the `gt.storage.storage` class to expose a hash of the storage metadata, computed during construction. Since the numpy backends also accept plain `np.ndarray`s as storages, this hash should be accessed inside a `try`/`except`; if the access fails, no caching is performed.
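
The key scheme above could look roughly as follows. This is a sketch: `metadata_hash` is a hypothetical storage attribute standing in for the proposed metadata hash, and `DeductionCache` is an illustrative wrapper around whatever function performs the deduction. It uses the tuple of origin items itself as part of the key rather than its `hash`, so a hash collision costs at most an extra equality comparison in the dictionary lookup, never a wrong cache hit.

```python
from typing import Any, Callable, Dict, Hashable, Optional, Tuple, Union

Index = Tuple[int, ...]
Origin = Union[None, Index, Dict[str, Index]]


def origin_key(origin: Origin) -> Hashable:
    if isinstance(origin, dict):
        # Content-based key; `id(origin)` would be cheaper but is only safe
        # if callers never mutate the same dict object between calls.
        return tuple(origin.items())
    return origin  # `None` and tuples are hashable as they are


def field_key(field: Any) -> Optional[Hashable]:
    try:
        # Hypothetical attribute: hash of shape, default origin, etc.,
        # computed once when the storage is constructed.
        return field.metadata_hash
    except AttributeError:
        return None  # e.g. a plain ndarray: no metadata hash available


def cache_key(domain: Optional[Index], origin: Origin, fields: Dict[str, Any]) -> Optional[Hashable]:
    keys = []
    for name, field in fields.items():
        key = field_key(field)
        if key is None:
            return None  # one uncacheable field disables caching for this call
        keys.append((name, key))
    return (domain, origin_key(origin), tuple(keys))


class DeductionCache:
    """Memoizes a `deduce(domain, origin, fields)` function by `cache_key`."""

    def __init__(self, deduce: Callable[..., Any]):
        self._deduce = deduce
        self._cache: Dict[Hashable, Any] = {}

    def __call__(self, domain: Optional[Index], origin: Origin, fields: Dict[str, Any]) -> Any:
        key = cache_key(domain, origin, fields)
        if key is None:
            return self._deduce(domain, origin, fields)
        if key not in self._cache:
            self._cache[key] = self._deduce(domain, origin, fields)
        return self._cache[key]
```

Because the key is content-based, two distinct but equal `origin` dicts hit the same entry; switching `origin_key` to `id(origin)` would skip building the tuple at the cost of the stability caveat noted above.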
For testing, all combinations of the above should be exercised, checking that the cached and non-cached paths pass the same arguments to `StencilObject.run`.
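
One way to structure that test, sketched with a self-contained toy: `RecordingStencil` and `deduce` are hypothetical stand-ins, storage metadata is modeled by the field shapes alone, and only a few values per argument are enumerated with `itertools.product`. Each combination runs twice so that the second pass goes through the cache.

```python
import itertools
from typing import Any, Dict, List, Optional, Tuple, Union

Index = Tuple[int, ...]
Origin = Union[None, Index, Dict[str, Index]]


def deduce(shapes: Dict[str, Index], domain: Optional[Index], origin: Origin):
    """Simplified deduction (zero default origin, no halo extents)."""
    if origin is None:
        origins = {n: (0,) * len(s) for n, s in shapes.items()}
    elif isinstance(origin, tuple):
        origins = {n: origin for n in shapes}
    else:
        origins = {n: origin.get(n, (0,) * len(s)) for n, s in shapes.items()}
    if domain is None:
        ndim = len(next(iter(shapes.values())))
        domain = tuple(min(s[d] - origins[n][d] for n, s in shapes.items()) for d in range(ndim))
    return domain, origins


class RecordingStencil:
    """Hypothetical toy stencil that records what it would pass to `run`."""

    def __init__(self, use_cache: bool):
        self.use_cache = use_cache
        self.cache: Dict[Any, Any] = {}
        self.run_args: List[Any] = []

    def __call__(self, shapes: Dict[str, Index], domain=None, origin=None) -> None:
        if self.use_cache:
            origin_key = tuple(origin.items()) if isinstance(origin, dict) else origin
            key = (domain, origin_key, tuple(shapes.items()))
            if key not in self.cache:
                self.cache[key] = deduce(shapes, domain, origin)
            result = self.cache[key]
        else:
            result = deduce(shapes, domain, origin)
        self.run_args.append(result)


DOMAINS = [None, (3, 3, 2)]
ORIGINS = [None, (1, 1, 0), {"a": (1, 1, 0)}, {"a": (1, 1, 0), "b": (0, 1, 0)}]
SHAPES = [{"a": (5, 5, 3), "b": (5, 5, 3)}, {"a": (6, 5, 3), "b": (5, 6, 3)}]


def check_cache_consistency() -> int:
    """Runs every combination twice and compares cached vs. uncached `run` args."""
    cases = list(itertools.product(DOMAINS, ORIGINS, SHAPES))
    cached, uncached = RecordingStencil(True), RecordingStencil(False)
    for _ in range(2):
        for domain, origin, shapes in cases:
            cached(shapes, domain, origin)
            uncached(shapes, domain, origin)
    assert cached.run_args == uncached.run_args
    return len(cases)
```

In a real test suite the two toy stencils would be replaced by the actual `StencilObject` with caching switched on and off, and `run` would be patched to record its arguments.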
## Prior Work
- The [`FrozenStencil`](https://github.com/VulcanClimateModeling/fv3core/blob/master/fv3core/decorators.py) decorator in `fv3core` already provides similar functionality and can serve as a point of reference.
- The [stencil decorator](https://github.com/ckuehnlein/FVM_GT4Py/blob/main/src/fvms/utils/stencil.py) in the FVM dycore implements a caching mechanism, albeit for a different purpose.
## Non-Goals
- Improve performance of checks using C++ or Cython
- Changes in the low-level interface
## Appetite
1/2 cycle with the optional goal, 1 developer
---
<!--## Solution
- Decide on interface
- Review existing impl (https://github.com/VulcanClimateModeling/fv3core/blob/master/fv3core/decorators.py)
- Change existing impl to proposed interface and document it
Project 2 (optional, for later):
- What is the minimal info needed to fingerprint a field
- What is the hash function for this info to check whether out of date
- Implement
## Initial Questions/Notes
JD:
- Do we want a different interface, or first investigate speeding up the existing interface by moving all of the StencilObject calculation into Cython or C++?
EP:
- Should the user be able to disable levels of safety?
- How to implement in performant way?
- Low-level interface for calling stencils that applications can hook into.
TW:
- Changing the low-level interface is a different task from what was envisioned here
- Baseline: checks have to happen either way, e.g. computing the origin of each field. They can be done at call time, pre-computed at generation time, or done at call time in a Python layer written in Cython (slow to fast)
- Want to move checks to a different time
EG:
- Frozen stencil is moving checks to generation time from call time
- Good idea to have method to fully turn off all checks and directly call
-->