# Formalize Field-View concepts

###### tags: `cycle 15`

- Shaped by: Enrique
- Appetite: full cycle for @fthaler and at least 3 part-time core developers with a deep understanding of the gt4py model.
- Developers:

## Problem

The current _field-view_ user frontend in `gt4py.next` is based on the idea of dealing with fields using a NumPy-like/tensor syntax with some extra domain-specific operations on top (e.g. `neighbor_sum`, `shift/remap`, ...). User code is transformed to the IteratorIR representation, which is later processed in the toolchain to obtain the final high-performance implementation of the program.

One of the key ideas behind this _field-view_ frontend design was to support a fast embedded implementation using tensor libraries like NumPy, JAX or PyTorch. Fast embedded execution is a great feature to help users easily debug their codes. However, up until now only a proof-of-concept implementation of embedded execution has been developed. It is still quite rough and cannot be completed until the _field-view_ concepts are fully defined.

## Appetite

The appetite for this project is large: it is a key milestone for formalizing the core concepts of the current _field-view_ frontend and therefore for the development of the _embedded_ execution mode. Moreover, @fthaler is leaving soon and we need to gather and transfer as much knowledge as possible from him, since he has a very deep understanding of the gt4py concepts both in the IR and in the frontends.

## Solution

In this project we will discuss and clarify all the concepts defining _field-view_ (`field`, `remap`, `neighbor_reduction`, `dimension`, ...) and how they translate to an embedded implementation with tensor libraries. Additionally, once the concepts are well defined, we will investigate whether _field-view_ itself can be used as a higher-level tensor-based IR with better optimization opportunities for the code-generation backends.

As part of the discussion, we will:

- Review and collect field-view assumptions used in the current lowering to IteratorIR.
- Discuss efficient Python implementation strategies for the concepts.
- Review and discuss the [FieldIR idea](https://hackmd.io/@gridtools/S1y9Q3rFc) from @fthaler.
- Optionally, if time allows, investigate the relationship between Field IR and Iterator IR.
- Maybe try to define Field-view / FieldIR in terms of _functor_, _profunctor_ and _comonad_ concepts, as is done for Iterator IR (see [Iterator IR, Functors, and Comonads](https://hackmd.io/x7vfhtrdQh-WQgAGEbHiZQ)).

## No-gos

During the project it may be necessary to use the existing embedded field-view prototype or to implement new prototypes to test some concepts; however, those prototypes are not expected to have a production-quality implementation and thus they won't be merged into `main`.

## Progress

Peter's list of GT4Py issues: https://hackmd.io/@gridtools/HkMMfEqSn/edit

### Agenda for 2023-06-13

- [x] Details of how `remap` works, Dimensions, slicing
- [x] Concatenation of fields, boundary conditions
- [x] Scan

### Agenda for 2023-06-20

- [x] Compile-time domain sizes would be much more complicated (especially in the blue line), because in a multi-node setup every node might have a different domain.
  - Short-term, it is not possible to implement a compile-time-size IR until the infrastructure to deal with these issues is in place.
- [ ] Why do all iterators of a stencil have to be in the same position? Write it down!
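For concreteness, the user-facing field-view code that these notes are about looks roughly like the sketch below (purely illustrative: the `Edge`/`Vertex` dimensions and the `E2V`/`E2VDim` connectivity names are placeholders, and imports are omitted as in the other sketches in this document).

```python
@field_operator
def edge_average(v: Field[[Vertex, K], float]) -> Field[[Edge, K], float]:
    # remap vertex values onto edges via a vertex-of-edge connectivity (E2V),
    # then reduce over the resulting local neighbor dimension (E2VDim)
    return neighbor_sum(v(E2V), axis=E2VDim) / 2.0
```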
<!--![](https://hackmd.io/_uploads/SkgKsykuh.jpg)-->

### Open field view questions

- What is a field?
  - $f: D \rightarrow V$, with $D = D_0 \times D_1 \times \dots \times D_n$
- Definition of the domain of operations on fields (a NumPy sketch of the first two rules follows after this list):
  - arithmetic binary:
    - $h(f, g): D_f \cap D_g \rightarrow V$
    - $h(f, g)(x) = h(f(x), g(x)) \quad \forall x \in D_f \cap D_g$
    - currently: implicit broadcast in some cases
  - remap:
    - $\mathrm{remap}(f, c) = f \circ c$
    - $c: D_c \rightarrow D_{f'}$, with $D_{f'} \subset D_f$
    - domain of $\mathrm{remap}(f, c)$:
      - the inverse image under $c$ of the points in $D_f$: $c^{-1}[D_f]$
  - where:
    - maybe desired:
      - does the intersection on all 3 fields, like arithmetic binary
    - $\mathrm{where}(c, t, f)$, with $c: D_c \rightarrow \{\mathrm{True}, \mathrm{False}\}$
    - current state:
      - the resulting domain is $D_c \cap (D_{\tilde{t}} \cup D_{\tilde{f}})$, where $D_{\tilde{t}} = \{\, d \in D_t \mid c(d) = \mathrm{True} \,\}$ and $D_{\tilde{t}} \subset D_c$ ($D_{\tilde{f}}$ analogous)
      - this is bad for:
        - dependency analysis if used for boundary conditions, because $c$ is only known at run time and we need to reconstruct the domain slicing from arbitrary expressions (using index fields)
        - embedded execution: maybe we cannot compute the domain of the resulting field efficiently
  - broadcast:
    - $\mathrm{broadcast}(f, D): D_f \times D \rightarrow V$
    - $\mathrm{broadcast}(f, D)(x_0, \dots, x_{n-1}, x_n) = f(x_0, \dots, x_{n-1})$, where $D_f = D_0 \times \dots \times D_{n-1}$ and $x_n \in D$
  - union (one of the two, not implemented):
    - with precedence
    - without overlap
  - neighbor_reduce:
    - $\mathrm{neighbor\_sum}(f, D_s): D \rightarrow V$, where $f: D \times D_s \rightarrow V$
    - currently:
      - only 1 sparse dimension supported
      - if we want more, we have to think about dimension tags that appear multiple times (and the order of allowed reductions)
- What is a field_operator?
  - single-output field operator: a function $f(g_0, g_1, \dots) \rightarrow h$ where $g_i$, $h$ are fields.
  - multi-output field operator: a set of functions $f_j(g_0, g_1, \dots) \rightarrow h_j$ where $g_i$, $h_j$ are fields.
  - Till wants to make this more concrete.
- Explicit slicing vs intersecting domains vs intersecting domains + explicit slicing?
  - intersecting domains (with the possibility to define the domain of the output)
  - if needed, we might want to introduce explicit slicing on top
- Apply field operator: take a field $f: D_f \rightarrow V$, a domain $D_\text{out}$ and a buffer $M$, and assign the elements $f(D_\text{out})$ to $M$, with $D_\text{out} \subset D_f$.
- Compile-time vs run-time domain?
  - compile-time is much simpler to analyze
  - if embedded is fast enough, compile time doesn't really matter
  - we are partly using compile-time domains already (as a shortcut in temporaries)
- What does combined IR mean?
  - The IR has 2 extremes: a fully field representation (every operation is a map over the temporary domain) and a fully local representation (only a single map of a stencil over the output domain).
  - local view: only uses `map` (and other higher-order functions) to apply scalar functions over fields
  - field view: only uses _mapped_ versions of builtin scalar functions
  - combined view: it is possible to use `map`s and operators between fields
- Do we need tuples of tensors?
  - (tensor IR doesn't have it)
  - A Field that represents data of neighbor points can be represented as a Field with a local dimension (MCH sparse field; for a variable number of neighbors it is a masked/sparse array) or as a Field of local Fields (where the local Fields can have a different number of values).
    - We conclude that for the frontend we want to use the Field with a local dimension.
    - (Probably we should do the same in Iterator IR / combined IR.)
- Column mode of Iterator IR (scan):
  - the Iterator IR column is probably an Iterator of a 1D Field
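A minimal sketch of the first two domain rules above (binary arithmetic on the intersection of domains, and `remap` as composition with the inverse-image domain), assuming a NumPy backing and a toy one-dimensional `Field1D` class invented purely for illustration; this is not the embedded implementation.

```python
import numpy as np

# Toy 1D field: a domain given as a range of global indices plus the values on it.
class Field1D:
    def __init__(self, start, values):
        self.start = start                 # first index of the domain
        self.values = np.asarray(values)

    @property
    def domain(self):
        return range(self.start, self.start + len(self.values))

    def __add__(self, other):
        # arithmetic binary: defined on the intersection of the two domains
        lo = max(self.start, other.start)
        hi = min(self.domain.stop, other.domain.stop)
        return Field1D(lo, self.values[lo - self.start:hi - self.start]
                           + other.values[lo - other.start:hi - other.start])

    def remap(self, offset):
        # remap(f, c) = f ∘ c with c(x) = x + offset;
        # the resulting domain is the inverse image c⁻¹[D_f]
        return Field1D(self.start - offset, self.values)

f = Field1D(0, [0., 1., 2., 3.])
g = Field1D(2, [10., 20., 30., 40.])
print(list((f + g).domain))     # [2, 3]: intersection of the two domains
print(list(f.remap(1).domain))  # [-1, 0, 1, 2]: shifted domain (inverse image)
```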
#### Boundary conditions

- we allow slicing with compile-time values (or relative to the field domain)
- then we can concatenate non-overlapping fields, with the constraint that they produce a rectangular shape

```python=
@field_operator(backend=build_config.backend)
def with_boundary_values(
    lower: Field[[Vertex, K], float_type],
    interior: Field[[Vertex, K], float_type],
    upper: Field[[Vertex, K], float_type],
    level_indices: Field[[K], int],
    num_level: int
) -> Field[[Vertex, K], float_type]:
    return where(level_indices == 0, lower,
                 where(level_indices == num_level - 1, upper, interior))

# TODO(tehrengruber): move to separate file
@field_operator(backend=build_config.backend, grid_type=GridType.UNSTRUCTURED)
def nabla_z(psi: Field[[Vertex, K], float_type], level_indices: Field[[K], int], num_level: int):
    return with_boundary_values(
        (domain(psi)[:, 0:1], psi(Koff[1]) - psi(Koff[0])),
        psi(Koff[1]) - psi(Koff[-1]),
        (domain(psi)[:, -1:], psi(Koff[0]) - psi(Koff[-1])),
    )


field[K(0):K(1)] + field == (field + field)[K(0):K(1)] == where(K == 0, 2*field, None)

# Non-overlapping concatenate
concatenate(
    field[K(0):K(MAX_K - 1)],
    const_field[K(MAX_K):K(MAX_K + 1)]
)

field[K.relative[0:-1]] ^ const_field[K(30):]

# Setting with precedence (JAX syntax)
c = a.at[somewhere].set(b)  # This can be implemented with slicing and concatenate
```

### Combined IR questions

- itir.NeighborList vs field-view local dimension

#### Field concept constraints (_Dimension_ from [Gibbons 2017](https://www.cs.ox.ac.uk/people/jeremy.gibbons/publications/aplicative.pdf))

- Functor: fmap
- Applicative (Monoidal): pure, apply `<*>`
- Foldable: foldl
- Traversable: traverse
- Naperian/Representable <= Profunctor ??
- Profunctor: contramap

#### Iterator concept constraints ([IndexedComonadStore](https://hackage.haskell.org/package/lens-4.14/docs/Control-Lens-Internal-Context.html#t:IndexedComonadStore))

- Functor (IndexedFunctor): fmap
- Comonad (IndexedComonad): extend (ITIR: lift), extract (ITIR: deref)
- StoreComonad (IndexedStoreComonad): pos, peek (extra: peeks, seek, seeks (ITIR: shift), experiment (ITIR: neighbors))

#### Playground: fields and iterators

```python
# -- Fields and field operators --
f_a: Field[[I,J,K], Scalar]
f_b: Field[[I,J,K], Scalar]

def field_operator(
    field_a: Field[[I,J,K], Scalar],
    field_b: Field[[I,J,K], Scalar]
) -> Field[[I,J,K], Scalar]:
    return 0.5*field_a + 0.5*field_b

f_c: Field[[I,J,K], Scalar] = field_operator(f_a, f_b)

# -- Iterator and local operators --
def local_stencil(
    it_a: Iterator[[I,J,K], [I,J,K], Scalar],
    it_b: Iterator[[I,J,K], [I,J,K], Scalar],
) -> Scalar:
    return 0.5*extract(it_a) + 0.5*extract(it_b)

f_a: Field[[I,J,K], Scalar]
f_b: Field[[I,J,K], Scalar]

# Option 1 -> field_op: (Field, Field) -> Field (implicitly computes maximum domain) ??
field_op: Callable[[Field[I,J,K], Field[I,J,K]], Field[I,J,K]] = fmap(
    local_stencil,
    # domain_maker=analyze_stencil(local_stencil)  # what about concatenate??
)
f_c: Field[[I,J,K], Scalar] = field_op(f_a, f_b)

# Option 2 -> field_op: domain -> ((Field, Field) -> Field) (explicit domain) ??
field_op: Callable[
    [Domain], Callable[[Field[I,J,K], Field[I,J,K]], Field[I,J,K]]
] = fmap(local_stencil)
f_c: Field[[I,J,K], Scalar] = field_op[out_domain](f_a, f_b)

# -- Reshaping fields --
def expand_domain(it: Iterator[[I,J,K], [I,J], Field[[K], Scalar]]) -> Scalar:
    return extract(seeks(lambda i, j, k: (i, j), it))[pos(it).k]

f_splitted: Field[[I,J], Field[K]]
f: Field[[I,J,K], Scalar] = fmap(expand_domain)[domain(I,J,K)](f_splitted)

def slice_domain(it: Iterator[[I,J], [I,J,K], Scalar]) -> Field[[K], Scalar]:
    return field(
        [extract(seeks(lambda i, j: (i, j, k), it)) for k in K],
        domain=(K,),
    )

f: Field[[I,J,K], Scalar]
f_splitted: Field[[I,J], Field[K]] = fmap(slice_domain)[domain(I,J)](f)
```
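As a rough embedded reading of the iterator constraints and of Option 2 above, the sketch below uses plain NumPy arrays as field storage and made-up helpers (`Iterator`, `extract`, `seeks`, `fmap_over`) to mirror the correspondences extract ≈ deref and seeks ≈ shift; the output domain is passed as a plain argument rather than via subscription, purely for simplicity. This is an illustration, not the proposed API.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Iterator:
    """A field plus a position: extract ≈ ITIR deref, seeks ≈ ITIR shift."""
    data: np.ndarray   # field values, array index == domain position (toy assumption)
    pos: tuple         # current position in the field's domain

def extract(it):          # comonad extract / ITIR deref
    return it.data[it.pos]

def seeks(offset_fn, it):  # store-comonad seeks / ITIR shift
    return Iterator(it.data, offset_fn(*it.pos))

def fmap_over(local_stencil, out_domain):
    # Option 2 flavour: fmap takes the local stencil and an explicit output domain
    # and returns a field operator that maps the stencil over that domain.
    def field_op(*fields):
        out = np.empty([len(r) for r in out_domain])
        for idx in np.ndindex(*out.shape):
            pos = tuple(r[i] for r, i in zip(out_domain, idx))
            out[idx] = local_stencil(*(Iterator(f, pos) for f in fields))
        return out
    return field_op

def local_stencil(it_a, it_b):
    # combine it_a at the current position with it_b shifted by -1 in i
    return 0.5 * extract(it_a) + 0.5 * extract(seeks(lambda i, j: (i - 1, j), it_b))

f_a = np.arange(20.0).reshape(4, 5)
f_b = np.ones((4, 5))
# explicit output domain: skip i == 0 so the shifted access stays in bounds
print(fmap_over(local_stencil, (range(1, 4), range(0, 5)))(f_a, f_b))
```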
#### Playground: scan

```python
# -- Scan with fields --
f_ijk: Field[[I,J,K], Scalar]

def f_scan_pass(
    state: Field[[I,J], Scalar],  # state: Field[[I_current, J_current], Scalar],
    field: Field[[I,J], Scalar]
) -> Field[[I,J], Scalar]:        # -> Field[[I_current, J_current], Scalar]:
    return state + 0.5*field

field_op: Callable[[Field[I,J,K]], Field[I,J,K]] = scan(
    f_scan_pass, K
)

# -- Scan with iterators --
def it_scan_pass(
    state: Iterator[[?], [?], Scalar],
    it: Iterator[[I,J], [I,J], Scalar]
) -> Scalar:
    return state + 0.5*extract(it) + 0.25*extract(seeks(lambda i: i - 1, it))

field_op: Callable[[Field[I,J,K]], Field[I,J,K]] = scan(
    it_scan_pass, K
)[0:K_max]

def it_scan_pass_k(
    state: Iterator[[?], [?], Scalar],
    it: Iterator[[K], [K], Scalar]
) -> Field[[I,J], Scalar]:
    return state + 0.5*extract(it) + 0.25*extract(seeks(lambda k: k - 1, it))

combined_field_op: Callable[[Field[I,J,K]], Field[I,J,K]] = fmap(
    lambda it_of_f_k: scan(it_scan_pass_k)[0:K_max](extract(it_of_f_k))
)[0:I_max, 0:J_max](f_ijk)
```

#### Playground: Tuples

```python!
def foo_it(it0, it1):
    return make_tuple(deref(it0), deref(it1))

def foo_f(f0: Field[D, dtype], f1: Field[D, dtype]) -> Field[D, tuple[dtype, dtype]]:
    return zip(f0, f1)

def bar_f(f0: Field[D0, dtype], f1: Field[D1, dtype]):
    return f0, f1
```

```python!
def whole_dycore(inp):
    return ...

def main(inp):
    return diff_x(whole_dycore(inp)), diff_y(whole_dycore(inp))

def main2(inp):
    dycore_of_inp = whole_dycore(inp)
    return diff_x(dycore_of_inp), diff_y(dycore_of_inp)
```

```python!
def foo(e, v):
    if bar:
        tmp0 = e + 1
        tmp1 = v + 2 + 4
        return tmp0, tmp1
    else:
        return e, v

def foo_explicit(e, v):
    if bar:
        tmp0 = fmap(plus, d)(e, 1)
        tmp1 = fmap(plus, d)(fmap(plus, d)(v, 2), 4)
        return tmp0, tmp1
    else:
        return e, v

def foo1(e, v):
    if bar:
        tmp0 = fmap(...)
        tmp1 = fmap(lambda x: x + 2 + 4)(v)
    else:
        e, v

def foo2(e, v):
    fmap(lambda e, v, bar: e + 1 if bar else e), fmap(lambda e, v, bar: v + 2 + 4 if bar else v)

def foo_e(e, v):
    if bar:
        tmp0 = e + 1
        tmp1 = v + 2
        return tmp0
    else:
        return e

def foo_v(e, v):
    ...

# Broadcast constant if
def foo_bcast(e, v):
    out_e = where(broadcast(bar, e), e + 1, e)
    out_v = where(broadcast(bar, v), v + 2 + 4, v)
    return out_e, out_v
    # if bar:
    #     tmp0 = e + 1
    #     tmp1 = v + 2 + 4
    #     return tmp0, tmp1
    # else:
    #     return e, v
```

- ~~Tendency: FieldIR doesn't need the concept of returning multiple fields (on different domains), but it is important in the frontend; therefore we need a good understanding of how to translate a field_operator with multiple returns to an IR without that concept (e.g. an extension of the IR to connect to the frontend).~~

Definitions:

- Bob: a function that takes fields and returns a single field, built from builtin functions which are constructed from fmapped local operations.
- Second Try: FieldIR should have the concept of a function taking fields and returning fields. Such a function is not lowerable to a single stencil, but it lowers to a tuple of Bobs, each of them lowering to a stencil. (Since all field-view builtins come from fmap, we can transform every Bob into a single stencil and every function returning fields into a tuple of stencils.) (Till approves, but wants a better name for Bob.)
- Intermediate solution: FieldIR should have the concept of a function taking fields and returning potentially multiple fields defined on arbitrary domains. Such a function is not lowerable to a single ITIR stencil. But: for all cases that we will need in the foreseeable future (a single return statement with fields of different domains, if-statements that are always compile-time) there is a simple way to write such functions so that they only return a single field (without duplicating expressions, which we are unsure how well we can optimize). As such we can also lower them regularly. All other cases that we cannot lower this way should be prohibited (until we potentially abandon ITIR).

```python!
def foo_f(f0: Field[D, dtype], f1: Field[D, dtype]) -> Field[D, array[dtype, 2]]:
    return fzip(f0, f1)

def bar_f(...):
    a, b = funzip(foo_f)


def foo_f(f0: Field[D, dtype], f1: Field[D, dtype]) -> array[Field[D, dtype], 2]:
    return zip(f0, f1)

# func <$> arg1 <*> arg2 <*> arg3

def bar_f(...):
    a, b = foo_f
```

##### Questions

How to fmap fields with the same domain? (Note: above we said we cannot fmap fields with different domains together.)

Ideas (in 1 and 2 we zip the fields first):

1. `(Field[D, dtype], Field[D, dtype]) -> Field[D, array[dtype, 2]]`: now we can fmap fields with the same domain and dtype.
2. `(Field[D, dtype1], Field[D, dtype2]) -> Field[D, tuple[dtype1, dtype2]]`: this might not be a very interesting Field for the frontend, but it allows fmapping fields with different dtypes together, which is what we absolutely want.
3. As an execution hint/policy, introduce a multi-fmap `mfmap`, which maps `N` `field_operator`s over the same domain and returns `N` fields.
4. Similar to 3, but `mfmap` takes a single local operation that returns a tuple and outputs a tuple of fields.

Hannes' thoughts: (1) is not useful, as it doesn't allow expressing a single kernel over operations on different dtypes.

Maybe all of the above doesn't make sense because it's late...
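A possible reading of idea (2), sketched with NumPy and placeholder names (`fzip`, `fmap`): fields of different dtype but the same domain are zipped into a "field of tuples" (represented here simply as a tuple of same-shaped arrays), so that a single fmap over one domain applies.

```python
import numpy as np

def fzip(*fields):
    # "zip" fields on the same domain; different dtypes are allowed
    assert all(f.shape == fields[0].shape for f in fields)
    return fields  # field of tuples, modelled as a tuple of same-domain arrays

def fmap(local_op):
    def field_op(zipped):
        out = np.empty(zipped[0].shape)
        for idx in np.ndindex(zipped[0].shape):
            out[idx] = local_op(tuple(f[idx] for f in zipped))
        return out
    return field_op

f0 = np.linspace(0.0, 1.0, 6).reshape(2, 3)      # float field
f1 = np.arange(6, dtype=np.int64).reshape(2, 3)  # int field, same domain
print(fmap(lambda v: v[0] * v[1])(fzip(f0, f1)))
```

In a real embedded backend the explicit loop would of course be replaced by vectorized array operations; the point here is only that the zipped field lives on a single domain, so one fmap suffices.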