GT4Py Execution Model discussions (leading to new functional model)

# GT4Py Execution Model discussions (leading to new functional model) ###### tags: `archive` `discussion` ## Discussion session 1 (10.03.2021) ### Core definitions - Grid: set of connected points. - Field: mapping from grid points to quantities. Inside a stencil can also be viewed as a function that can be called with an offset and returns a value(s). - Stencil: computation using neighborhood shapes to compute output value(s) from a set of input fields. + Each input field may be read with a different neighborhood shape. - Computation area: set of points (or subset of a grid) where a stencil is applied. - Fencil: sequence of _(stencil, computation area)_ pairs, usually computed together for optimization. + In the current DSL, the computation area for each stencil is indirectly defined as a function of a rectangle _domain_ and the stencil itself: * ComputationArea = Function(Stencil, Domain) + Instead, we could define the ComputationArea explictly as the subset of grid points where this stencil will be applied. ``` - Grid: Set[Items] e.g. grid = {p0, p1, p2, ... p_n, e0, ... e_n ...} - ComputationArea: Set[Items] # subset of Grid - Field: Mapping[[Grid], Quantity] - Accessor: Callable[[Optional[Offset], Quantity] # Field bound to an application point inside a stencil - Stencil: Callable[[Accessor, ...]], Quantity] e.g. a(x, y, z) => x[1] + y[0] + 2* z[3] - Fencil: List[Tuple[Stencil, ComputationArea]] e.g. [(a(x), comp_area_1), (b(lift(a)(x)), comp_area_2)] ``` ### Model Assumptions in the internal representation of a stencil to guarantee a well-defined interpretation: - Fields used in a stencil are either input or output (not both). - Symbols are only assigned once (SSA). - Assignment to temporaries can be considered as _lifting_ a stencil (RHS expression) to a field. - The computation area of a stencil is computed explicitly as a function of the domain rectangle passed at run-time. A fencil is implemented as the sequential application of its stencils in the corresponding computation areas. A _call_ from a fencil (_caller_) to another fencil (_callee_) represents the concatenation of the list of stencil-computation pairs of the _callee_ in the call site of the _caller_ fencil. ## Possible issues - Definition of computation areas for each stencil might be more complicated than simply slicing a rectangular domain: + how to define a disjoint computation area for a stencil? + how to define non-rectangular computation areas? + how to use named computation areas defined outside of the stencil? - Grid concept should be explicitly specified as an argument in the stencil definition. - _Halo/ghost cells_ and _domain_ specification should be explicitly defined for any object implementing the _Field_ concept (e.g. gt4py storages) - If a single domain (rectangle?) is passed to the fencil at runtime, what happens when output fields have different domain sizes (e.g. staggered fields)? - What should be the scope of symbol names in the DSL: global (currently), per computation or per region? - If assignment to temporaries is considered as just a "lifting", doesn't it complicate the optimization of assigning to a temporary field and reusing the result from several places? In other words, isn't it easier to demote temporaries to scalars than promote computations to temporaries? - A grid is not just a set of points but several (related) sets of points: vertices, faces, edges (in cartesian grids, even two sets of edges: I-edges and J-edges). In principle, stencils applied on a computation area defined on one of the sets, should only write to output fields defined on that grid item. - How this model can be extended to solvers? - How does the execution model relate to the current GridTools C++ execution model? ## Discussion session 2 Preliminar: So far it is been discussed how to represent the internal model in a way that makes sense for a non-ambiguos implementation. The internal model is lower-level than the current DSL (SSA, lifting, etc.), under the assumption of that the current DSL can be regarded as syntatic sugar on top of the model. What I think is still missing in the discussion is the full description of the transforamation to go from the DSL to our model (the _de-sugarization_ of the DSL). It could also be important to re-examine if the current DSL syntax represents sufficiently well the NWP domain concepts. ```mermaid flowchart LR MWP[NWP equations] --> DSL DSL --> Model[Internal model] Model --> Implemenation ``` ------------------------------ ## Scratchpad ```python= @gt.stencil def stencil_1(in_field_1: InputField, out_field_2: OutputField): with computation(), interval(...): with horizontal_region([3:3, :]): out_field_1 = 0.5 * (in_field_1[I+1] + in_field_1[I-1]) def foo(in_field_1: InputField, in_field_2: InputField): tmp <<= in_field_1[J+1] out_field_1 = in_field_1[I+1] + in_field_1[I-1] + in_field_2[I+2] + in_field_2[I-2], out_field_2 = out_field_1[I+1] + out_field_1[I-1] + in_field_2[I+2] + in_field_2[I-2] return out_field_1, out_field_2 def bar(x): return x[I + 0] + 1 def baz(x: acc) -> value: tmp : acc = bar(x) # bar' = lift(bar) return x[I + 0] + tmp[I - 1] @true_stencil def bazz(x: acc) -> value: tmp = bar(x) # return x[I + 0] + tmp # x[I + 0] + bar(x) vs. x[I + 0] + lift(bar)(x)[I + 0] def lap(x): return ... def foo(x): tmp <= lap(x) return lap(tmp) @gt.stencil def stencil_2(domain, in_field_1: InputField, in_field_2: InputField, out_field_1: OutputField, out_field_2: OutputField): with computation(), interval(...): with horizontal_region(domain[:, :]) as ff: with computation(), interval(...): with horizontal_region([:, :]) as ff: a, b = foo(in_field_1, in_field_2) tmp = baz(in) with horizontal_region([2:-2, :]): out = bar(in) + baz(in) out2 = bar(tmp) #stencil_1(domain[2:-2], in_field_1, out_field_1) stencil(if1, if2, of1, of2, domain=[100, 100]) # origin=(2,0)) #dict(in_field_1=(1,0), in_field_2=(2,0)) def s(in): return in[1, 1, 1] + in[0, 0, 0] stencil is a function from accessor(s) to value(s) fencil: [ {a(x), comp_area}, {b(lift(a)(x) ), comp_area_2 }] y = a(x) z = b(y) return y, z @fencil def aa(x) : return [(a(x), a0), (b(x), a1)] bb = [(c(x), a0}, (d(x), a1)] aa(bb(x)) => [(a(c(x)), a0}, (b(d(x), a1)] ``` ```python @stencil def laplacian(psi: Accessor) -> double: return -4 * delta_psi + delta_psi[I+1] + delta_psi[I-1] + delta_psi[J+1] + delta_psi[J-1] @fencil def laplace_operator(psi: InField) -> OutField: with computation(PARALLEL), interval(...): with horizontal_region([1:-1, 1:-1]) lap_psi = laplacian(psi) return lap_psi @fencil def average_laplacian_operator(psi: InField) -> OutField: """Compute the average of the laplacian of the I+-1, J+-1 neighbors""" with computation(PARALLEL), interval(...): with horizontal_region([1:-1, 1:-1]) # reuse of stencils avg_lap_psi = laplacian(psi[I+1]) + laplacian(psi[I-1]) + laplacian(psi[J+1]) + laplacian(psi[J-1]) return delta_psi @fencil def laplap(psi: InField) -> OutField: # composition of fencils return laplacian(laplacian(psi)) --- @stencil def nabla(psi: Accessor) -> Tuple[double, double]: return [ psi[I+1]-psi[I-1], psi[J+1]-psi[J-1] ] @stencil def divergence(vec: Tuple[Accessor, Accessor]): # arg return vec[0][I+1]-vec[0][I-1] + vec[0][J+1]-vec[0][J-1] @fencil def laplacian_2(psi: InField): with computation(PARALLEL), interval(...): with horizontal_region([1:-1, 1:-1]): nab = nabla(psi) lap = divergence(nab) return lap ``` ## List of samples - gtbench without periodic boundary conditions - ??