# Formalize Field-View concepts
###### tags: `cycle 15`
- Shaped by: Enrique
- Appetite: full cycle for @fthaler and at least 3 part-time core developers with deep understanding of the gt4py model.
- Developers:
## Problem
The current _field-view_ user frontend in `gt4py.next` is based on the idea of dealing with fields using a NumPy-like/tensor syntax with some extra domain-specific operations on top (e.g. `neighbor_sum`, `shift/remap`, ...). User code is transformed to the IteratorIR representation, which is then processed in the toolchain to obtain the final high-performance implementation of the program.
One of the key ideas behind this _field-view_ frontend design was to support a fast embedded implementation using tensor libraries like NumPy, JAX or PyTorch. Fast embedded execution is a great feature to help users easily debug their codes. However, up until now only a proof-of-concept implementation of embedded execution has been developed, and it is still quite rough; it cannot be completed until the _field-view_ concepts are fully defined.
## Appetite
The appetite for this project is large, since this is a key milestone to formalize the core concepts of the current _field-view_ frontend and therefore for the development of the _embedded_ execution mode. Moreover, @fthaler is leaving soon and we need to gather and transfer as much knowledge as possible from him, since he has a very deep understanding of the gt4py concepts both in the IR and in the frontends.
## Solution
In this project we will discuss and clarify all the concepts defining _field-view_ (`field`, `remap`, `neighbor_reduction`, `dimension`, ...) and how they translate to an embedded implementation with tensor libraries. Additionally, once the concepts are well defined, we will investigate whether _field-view_ itself can be used as a higher-level tensor-based IR with better optimization opportunities for the code-generation backends.
As part of the discussion, we will:
- Review and collect field-view assumptions used in the current lowering to IteratorIR.
- Discuss efficient Python implementation strategies for the concepts.
- Review and discuss [FieldIR idea](https://hackmd.io/@gridtools/S1y9Q3rFc) from @fthaler
- Optionally, if time allows, investigate the relationship between Field IR and Iterator IR
- Maybe try to define Field-view / FieldIR in terms of _functor_, _profunctor_ and _comonad_ concepts, as was done for Iterator IR (see [Iterator IR, Functors, and Comonads](https://hackmd.io/x7vfhtrdQh-WQgAGEbHiZQ))
## No-gos
During the development of the project, it might be necessary to use the existing embedded field-view prototype or to implement new prototypes to test some concepts. However, those prototypes are not expected to have a production-quality implementation and thus they won't be merged into `main`.
## Progress
Peter's list of GT4Py issues: https://hackmd.io/@gridtools/HkMMfEqSn/edit
### Agenda for 2023-06-13
- [x] Details of how `remap` works, Dimensions, slicing
- [x] Concatenation of fields, boundary conditions
- [x] Scan
### Agenda for 2023-06-20
- [x] compile-time domain sizes would be much more complicated (especially in the blue line), because in a multi-node setup every node might have a different domain.
    - Short-term, it is not possible to implement a compile-time size IR, until the infrastructure to deal with these issues is in place.
- [ ] Why do all iterators of a stencil have to be in the same position? Write it down!
### Open field view questions
- What is a field?
    - $f: D \rightarrow V, \quad D = D_0 \times D_1 \times \dots \times D_n$
- Definition of the domain of operations on fields
    - arithmetic binary (a toy sketch of these domain rules follows after this list):
        - $h(f,g): D_f \cap D_g \rightarrow V$
        - $h(f,g)(x) = h(f(x), g(x)) \quad \forall x \in D_f \cap D_g$
        - currently: implicit broadcast in some cases
    - remap:
        - $remap(f, c) = f \circ c$
        - $c: D_c \rightarrow D_{f'}, \quad D_{f'} \subset D_f$
        - $D$ of $remap(f,c)$:
            - the inverse image of $D_f$ under $c$: $c^{-1}[D_f]$
    - where:
        - maybe desired
            - does intersection on all 3 fields, like arithmetic binary
            - $where(c, t, f)$
            - with $c: D_c \rightarrow \{True, False\}$
        - current state
            - resulting domain is $D_c \cap (D_{\tilde{t}} \cup D_{\tilde{f}})$, with $D_{\tilde{t}} = \{ d \in D_t \mid c(d) = True \}$, $D_{\tilde{t}} \subset D_c$ ($\tilde{f}$ analogous).
            - this is bad for
                - dependency analysis if used for boundary conditions, because $c$ is only known at runtime, and we need to reconstruct domain slicing from arbitrary expressions (using index fields).
                - embedded execution: maybe we cannot compute the domain of the resulting field efficiently
    - broadcast:
        - $broadcast(f, D): D_f \times D \rightarrow V$
        - $broadcast(f, D)(x_0, \dots, x_{n-1}, x_n) = f(x_0, \dots, x_{n-1})$, i.e. the coordinate along the new dimension $D$ is ignored
    - union (one of the 2, not implemented):
        - with precedence
        - without overlap
    - neighbor_reduce:
        - $\mathrm{neighbor\_sum}(f, D_s): D \rightarrow V$, where $f: D \times D_s \rightarrow V$
        - currently:
            - only 1 sparse dimension supported
            - if we want more, we have to think about dimension tags that appear multiple times (and the order of allowed reductions)
- What is a field_operator?
    - single-output field operator: a function $f(g_0, g_1, \dots) \rightarrow h$ where $g_i, h$ are fields.
    - multi-output field operator: a set of functions $f_j(g_0, g_1, \dots) \rightarrow h_j$ where $g_i, h_j$ are fields.
        - Till wants to make this more concrete
- Explicit slicing vs intersecting domains vs intersecting domains + explicit slicing?
    - intersecting domains (with the possibility to define the domain in the output)
    - if we need it, we might want to introduce explicit slicing on top
- apply field operator: take a field $f: D_f \rightarrow V$, a domain $D_\text{out}$ and a buffer $M$, and assign the elements $f(D_\text{out})$ to $M$, with $D_\text{out} \subset D_f$
- compile-time vs runtime domain?
    - compile-time is much simpler to analyze
    - if embedded is fast enough, compile time doesn't really matter
    - we are partly using compile-time domains already (as a short-cut in temporaries)
- What does combined IR mean?
    - The IR has 2 extremes: a fully field representation (every operation is a map over the temporary domain) and a fully local representation (only a single map of a stencil over the output domain)
        - local view: it only uses `map` (and other higher-order functions) to apply scalar functions over fields
        - field view: it only uses _mapped_ versions of builtin scalar functions
        - combined view: it is possible to use `map`s and operators between fields
- Do we need tuple of tensor?
    - (tensor IR doesn't have it)
    - A Field that represents data of neighbor points can be represented as a Field with a local dimension (MCH Sparsefield; for a variable number of neighbors it's a masked/sparse array) or as a Field of local Fields (where the local Fields can have different numbers of values).
    - we conclude that for the frontend we want to use the Field with a local dimension
        - (probably we should do the same in Iterator IR / combined IR)
- Column mode of Iterator IR (scan):
    - Iterator IR column probably is an Iterator of 1D Fields
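
A toy sketch (not the gt4py.next implementation) of the field and domain rules collected above, assuming a dict-based field whose domain is simply the set of stored points: binary arithmetic intersects domains, and `remap` composes the field with a connectivity. All names here (`ToyField`, `remap`, ...) are illustrative only.
```python
import dataclasses


@dataclasses.dataclass
class ToyField:
    """Toy model of a field f: D -> V, stored as point -> value."""
    values: dict[tuple, float]

    @property
    def domain(self) -> set[tuple]:
        return set(self.values)

    def __add__(self, other: "ToyField") -> "ToyField":
        # arithmetic binary: defined on the intersection of the domains
        common = self.domain & other.domain
        return ToyField({p: self.values[p] + other.values[p] for p in common})


def remap(f: ToyField, c: dict[tuple, tuple]) -> ToyField:
    # remap(f, c) = f . c, defined on the preimage c^-1[D_f]
    return ToyField({p: f.values[q] for p, q in c.items() if q in f.domain})


if __name__ == "__main__":
    a = ToyField({(i,): float(i) for i in range(5)})   # D_a = {0, ..., 4}
    b = ToyField({(i,): 10.0 for i in range(2, 8)})    # D_b = {2, ..., 7}
    assert (a + b).domain == {(2,), (3,), (4,)}        # intersection of domains

    shift_right = {(i,): (i + 1,) for i in range(5)}   # connectivity c: i -> i + 1
    assert remap(a, shift_right).domain == {(i,) for i in range(4)}  # c^-1[D_a]
```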
#### Boundary conditions
- we allow slicing with compile-time values (or relative to the field domain)
- then we can concatenate non-overlapping fields with the constraint that they produce a rectangular shape
```python=
@field_operator(backend=build_config.backend)
def with_boundary_values(
    lower: Field[[Vertex, K], float_type],
    interior: Field[[Vertex, K], float_type],
    upper: Field[[Vertex, K], float_type],
    level_indices: Field[[K], int],
    num_level: int,
) -> Field[[Vertex, K], float_type]:
    return where(level_indices == 0, lower, where(level_indices == num_level - 1, upper, interior))


# TODO(tehrengruber): move to separate file
@field_operator(backend=build_config.backend, grid_type=GridType.UNSTRUCTURED)
def nabla_z(psi: Field[[Vertex, K], float_type], level_indices: Field[[K], int], num_level: int):
    return with_boundary_values(
        (domain(psi)[:, 0:1], psi(Koff[1]) - psi(Koff[0])),
        psi(Koff[1]) - psi(Koff[-1]),
        (domain(psi)[:, -1:], psi(Koff[0]) - psi(Koff[-1])),
    )


field[K(0):K(1)] + field == (field + field)[K(0):K(1)] == where(K == 0, 2 * field, None)

# Non-overlapping concatenate
concatenate(
    field[K(0):K(MAX_K - 1)],
    const_field[K(MAX_K):K(MAX_K + 1)],
)

field[K.relative[0:-1]] ^ const_field[K(30):]

# Setting with precedence (JAX syntax)
c = a.at[somewhere].set(b)
# This can be implemented with slicing and concatenate
```
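
For comparison, a runnable NumPy approximation of the `where`-based boundary-condition pattern above, showing that it agrees with compile-time slicing plus non-overlapping concatenation. Array axes stand in for the Vertex and K dimensions; all names and sizes here are illustrative.
```python
import numpy as np

num_vertex, num_level = 4, 6  # Vertex is the first axis, K the second
lower = np.full((num_vertex, num_level), -1.0)
interior = np.zeros((num_vertex, num_level))
upper = np.full((num_vertex, num_level), 1.0)
level_indices = np.arange(num_level)  # broadcasts along the Vertex axis

# where(level_indices == 0, lower, where(level_indices == num_level - 1, upper, interior))
bc_where = np.where(level_indices == 0, lower,
                    np.where(level_indices == num_level - 1, upper, interior))

# The same field via compile-time slicing + non-overlapping concatenation along K:
bc_concat = np.concatenate(
    [lower[:, :1], interior[:, 1:num_level - 1], upper[:, num_level - 1:]], axis=1
)
assert np.array_equal(bc_where, bc_concat)
```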
### Combined IR questions
- itir.NeighborList vs field view local dimension
#### Field concept constraints (_Dimension_ from [Gibbons 2017](https://www.cs.ox.ac.uk/people/jeremy.gibbons/publications/aplicative.pdf))
- Functor: fmap
- Applicative (Monoidal): pure, apply <*>
- Foldable: foldl
- Traversable: traverse
- Naperian/Representable <= Profunctor ??
- Profunctor: contramap
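
A minimal Python rendering of these constraints for a toy tuple-backed field, just to make the Gibbons vocabulary concrete (Traversable and Profunctor are omitted for brevity; `fmap`, `pure`, `apply`, `foldl`, `tabulate`/`lookup` are illustrative stand-ins, not gt4py API):
```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")

# Toy "field" over a fixed 1D domain of size N, represented as a tuple.
# Positions index the values, which is what makes it Naperian/Representable.
N = 4
FieldT = tuple


def fmap(f: Callable[[A], B], fa: FieldT) -> FieldT:          # Functor
    return tuple(f(x) for x in fa)

def pure(x: A) -> FieldT:                                     # Applicative: broadcast a scalar
    return tuple(x for _ in range(N))

def apply(ff: FieldT, fa: FieldT) -> FieldT:                  # Applicative: <*> (pointwise)
    return tuple(f(x) for f, x in zip(ff, fa))

def foldl(f: Callable[[B, A], B], init: B, fa: FieldT) -> B:  # Foldable (e.g. reductions)
    acc = init
    for x in fa:
        acc = f(acc, x)
    return acc

def tabulate(f: Callable[[int], A]) -> FieldT:                # Naperian: index -> value
    return tuple(f(i) for i in range(N))

def lookup(fa: FieldT, i: int) -> A:                          # Naperian: value at an index
    return fa[i]


f = tabulate(lambda i: float(i))          # (0.0, 1.0, 2.0, 3.0)
g = pure(10.0)
assert apply(fmap(lambda x: (lambda y: x + y), f), g) == (10.0, 11.0, 12.0, 13.0)
assert foldl(lambda acc, x: acc + x, 0.0, f) == 6.0
```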
#### Iterator concept constraints ([IndexedComonadStore](https://hackage.haskell.org/package/lens-4.14/docs/Control-Lens-Internal-Context.html#t:IndexedComonadStore))
- Functor (IndexedFunctor): fmap
- Comonad (IndexedComonad): extend (ITIR: lift), extract (ITIR: deref)
- StoreComonad (IndexedStoreComonad): pos, peek, (extra: peeks, seek, seeks (ITIR: shift), experiment (ITIR: neighbors))
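
Similarly, a toy store-comonad sketch for iterators, with the ITIR names from the list above in comments (`deref` = extract, `shift` ~ seek/seeks, `lift` = extend). The lazy `lift` mirrors how lifted stencils defer evaluation; everything here is illustrative, not the gt4py implementation.
```python
import dataclasses
from typing import Callable


@dataclasses.dataclass(frozen=True)
class ToyIterator:
    """A 'store': a backing 1D buffer (the field) plus a current position."""
    buf: tuple
    pos: int

    def deref(self) -> float:                        # extract (ITIR: deref)
        return self.buf[self.pos]

    def shift(self, offset: int) -> "ToyIterator":   # seek/seeks (ITIR: shift)
        return ToyIterator(self.buf, self.pos + offset)


@dataclasses.dataclass(frozen=True)
class LiftedIterator:
    """Result of extend (ITIR: lift): derefs by evaluating the stencil lazily."""
    stencil: Callable[[ToyIterator], float]
    it: ToyIterator

    def deref(self) -> float:
        return self.stencil(self.it)

    def shift(self, offset: int) -> "LiftedIterator":
        return LiftedIterator(self.stencil, self.it.shift(offset))


def lift(stencil: Callable[[ToyIterator], float]) -> Callable[[ToyIterator], LiftedIterator]:
    return lambda it: LiftedIterator(stencil, it)


def laplacian1d(it) -> float:
    return it.shift(-1).deref() - 2.0 * it.deref() + it.shift(1).deref()


inp = ToyIterator(buf=(0.0, 1.0, 4.0, 9.0, 16.0), pos=2)
assert laplacian1d(inp) == 2.0
# deref(lift(stencil)(it)) == stencil(it), and lifted iterators can be shifted further
assert lift(laplacian1d)(inp).deref() == 2.0
assert lift(laplacian1d)(inp).shift(1).deref() == 2.0
```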
#### Playground: fields and iterators
```python
# -- Fields and field operators --
f_a: Field[[I, J, K], Scalar]
f_b: Field[[I, J, K], Scalar]

def field_operator(
    field_a: Field[[I, J, K], Scalar],
    field_b: Field[[I, J, K], Scalar],
) -> Field[[I, J, K], Scalar]:
    return 0.5 * field_a + 0.5 * field_b

f_c: Field[[I, J, K], Scalar] = field_operator(f_a, f_b)

# -- Iterators and local operators --
def local_stencil(
    it_a: Iterator[[I, J, K], [I, J, K], Scalar],
    it_b: Iterator[[I, J, K], [I, J, K], Scalar],
) -> Scalar:
    return 0.5 * extract(it_a) + 0.5 * extract(it_b)

f_a: Field[[I, J, K], Scalar]
f_b: Field[[I, J, K], Scalar]

# Option 1 -> field_op: (Field, Field) -> Field (implicitly computes maximum domain) ??
field_op: Callable[[Field[I, J, K], Field[I, J, K]], Field[I, J, K]] = fmap(
    local_stencil,
    # domain_maker=analyze_stencil(local_stencil)  # what about concatenate??
)
f_c: Field[[I, J, K], Scalar] = field_op(f_a, f_b)

# Option 2 -> field_op: domain -> ((Field, Field) -> Field) (explicit domain) ??
field_op: Callable[
    [Domain], Callable[[Field[I, J, K], Field[I, J, K]], Field[I, J, K]]
] = fmap(local_stencil)
f_c: Field[[I, J, K], Scalar] = field_op[out_domain](f_a, f_b)

# -- Reshaping fields --
def expand_domain(it: Iterator[[I, J, K], [I, J], Field[[K], Scalar]]) -> Scalar:
    return extract(seeks(lambda i, j, k: (i, j), it))[pos(it).k]

f_splitted: Field[[I, J], Field[K]]
f: Field[[I, J, K], Scalar] = fmap(expand_domain)[domain(I, J, K)](f_splitted)

def slice_domain(it: Iterator[[I, J], [I, J, K], Scalar]) -> Field[[K], Scalar]:
    return field(
        [extract(seeks(lambda i, j: (i, j, k), it)) for k in K],
        domain=(K,),
    )

f: Field[[I, J, K], Scalar]
f_splitted: Field[[I, J], Field[K]] = fmap(slice_domain)[domain(I, J)](f)
```
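
A hedged NumPy sketch of "Option 2" above: `fmap(local_op)` first takes an explicit output domain (here just a shape, with a call standing in for the `[out_domain]` subscription) and then applies the local operator pointwise over that domain. The helper names and the shape-as-domain encoding are assumptions for illustration only.
```python
import numpy as np

def fmap(local_op):
    """Option 2: fmap(local_op) takes an output domain (a plain shape here)
    and then maps the local operator over every point of that domain."""
    def with_domain(out_shape):
        def field_op(*fields):
            out = np.empty(out_shape)
            for idx in np.ndindex(out_shape):   # map over the output domain
                out[idx] = local_op(*(f[idx] for f in fields))
            return out
        return field_op
    return with_domain

def local_average(a: float, b: float) -> float:
    return 0.5 * a + 0.5 * b

f_a = np.arange(12.0).reshape(3, 4)  # stands in for Field[[I, J], Scalar]
f_b = np.ones((3, 4))
f_c = fmap(local_average)((3, 4))(f_a, f_b)
assert np.array_equal(f_c, 0.5 * f_a + 0.5)
```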
#### Playground: scan
```python
# -- Scan with fields
f_ijk: Field[[I, J, K], Scalar]

def f_scan_pass(
    state: Field[[I, J], Scalar],  # state: Field[[I_current, J_current], Scalar],
    field: Field[[I, J], Scalar],
) -> Field[[I, J], Scalar]:  # -> Field[[I_current, J_current], Scalar]:
    return state + 0.5 * field

field_op: Callable[[Field[I, J, K]], Field[I, J, K]] = scan(
    f_scan_pass, K
)

# -- Scan with iterators
def it_scan_pass(
    state: Iterator[[?], [?], Scalar],
    it: Iterator[[I, J], [I, J], Scalar],
) -> Scalar:
    return state + 0.5 * extract(it) + 0.25 * extract(seeks(lambda i: i - 1, it))

field_op: Callable[[Field[I, J, K]], Field[I, J, K]] = scan(
    it_scan_pass, K
)[0:K_max]

def it_scan_pass_k(
    state: Iterator[[?], [?], Scalar],
    it: Iterator[[K], [K], Scalar],
) -> Field[[I, J], Scalar]:
    return state + 0.5 * extract(it) + 0.25 * extract(seeks(lambda k: k - 1, it))

combined_field_op: Callable[[Field[I, J, K]], Field[I, J, K]] = fmap(
    lambda it_of_f_k: scan(it_scan_pass_k)[0:K_max](extract(it_of_f_k))
)
f_c: Field[[I, J, K], Scalar] = combined_field_op[0:I_max, 0:J_max](f_ijk)
```
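
A minimal NumPy sketch of how the field-view `scan` over `K` might behave in embedded execution: the scan pass sees the carried state and the current K-level as (I, J) slices, and the per-level results are stacked back along K. The `scan` helper below is an assumption for illustration, not the gt4py.next `scan_operator`.
```python
import numpy as np

def scan(scan_pass, init, axis=-1):
    """Apply scan_pass level by level along `axis`, threading the state."""
    def field_op(field):
        field = np.moveaxis(field, axis, 0)   # bring K to the front for iteration
        state = np.full(field.shape[1:], init)
        levels = []
        for level in field:                   # level: the (I, J) slice of one K level
            state = scan_pass(state, level)
            levels.append(state)
        return np.moveaxis(np.stack(levels), 0, axis)
    return field_op

def f_scan_pass(state, field):                # same pass as in the playground above
    return state + 0.5 * field

f_ijk = np.ones((2, 3, 4))                    # (I, J, K)
f_out = scan(f_scan_pass, init=0.0, axis=-1)(f_ijk)
assert np.allclose(f_out[0, 0], 0.5 * np.arange(1, 5))  # cumulative 0.5 along K
```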
#### Playground: Tuples
```python!
def foo_it(it0, it1):
    return make_tuple(deref(it0), deref(it1))

def foo_f(f0: Field[D, dtype], f1: Field[D, dtype]) -> Field[D, tuple[dtype, dtype]]:
    return zip(f0, f1)

def bar_f(f0: Field[D0, dtype], f1: Field[D1, dtype]):
    return f0, f1
```
```python!
def whole_dycore(inp):
    return ...

def main(inp):
    return diff_x(whole_dycore(inp)), diff_y(whole_dycore(inp))

def main2(inp):
    dycore_of_inp = whole_dycore(inp)
    return diff_x(dycore_of_inp), diff_y(dycore_of_inp)
```
```python!
def foo(e, v):
    if bar:
        tmp0 = e + 1
        tmp1 = v + 2 + 4
        return tmp0, tmp1
    else:
        return e, v

def foo_explicit(e, v):
    if bar:
        tmp0 = fmap(plus, d)(e, 1)
        tmp1 = fmap(plus, d)(fmap(plus, d)(v, 2), 4)
        return tmp0, tmp1
    else:
        return e, v

def foo1(e, v):
    if bar:
        tmp0 = fmap(...)
        tmp1 = fmap(lambda x: x + 2 + 4)(v)
        return tmp0, tmp1
    else:
        return e, v

def foo2(e, v):
    return fmap(lambda e, v, bar: e + 1 if bar else e), fmap(lambda e, v, bar: v + 2 + 4 if bar else v)

def foo_e(e, v):
    if bar:
        tmp0 = e + 1
        tmp1 = v + 2
        return tmp0
    else:
        return e

def foo_v(e, v):
    ...

# Broadcast constant if
def foo_bcast(e, v):
    out_e = where(broadcast(bar, e), e + 1, e)
    out_v = where(broadcast(bar, v), v + 2 + 4, v)
    return out_e, out_v

# if bar:
#     tmp0 = e + 1
#     tmp1 = v + 2 + 4
#     return tmp0, tmp1
# else:
#     return e, v
```
- ~~Tendency: FieldIR doesn't need the concept of returning multiple fields (on different domain), but it's important in the frontend, therefore we need a good understanding how to translate from field_operator with multiple returns to the IR without that. (E.g. extension of IR to connect to the frontend).~~
Definitions:
- Bob: a function that takes fields and returns a single field, built from builtin functions which are constructed from fmapped local operations.
- Second try: FieldIR should have the concept of a function taking fields and returning fields. Such a function is not lowerable to a single stencil, but it lowers to a tuple of Bobs, each of them lowering to a stencil. (Since all field-view builtins come from fmap, we can transform every Bob into a single stencil and every function returning fields into a tuple of stencils.) (Till approves, but wants a better name for Bob.)
- Intermediate solution: FieldIR should have the concept of a function taking fields and returning potentially multiple fields defined on arbitrary domains. Such a function is not lowerable to a single ITIR stencil. However, for all cases that we will need in the foreseeable future (a single return statement with fields of different domains, if-statements that are always compile time) there is a simple way to rewrite such functions so that they only return a single field (without duplicating expressions, which we are unsure how well we could optimize), and then we can lower them regularly. All other cases that we cannot lower this way should be prohibited (until we potentially abandon ITIR).
```python!
def foo_f(f0: Field[D, dtype], f1: Field[D, dtype]) -> Field[D, array[dtype, 2]]:
    return fzip(f0, f1)

def bar_f(...):
    a, b = funzip(foo_f)

def foo_f(f0: Field[D, dtype], f1: Field[D, dtype]) -> array[Field[D, dtype], 2]:
    return zip(f0, f1)  # func <$> arg1 <*> arg2 <*> arg3

def bar_f(...):
    a, b = foo_f
```
##### Questions
How do we fmap multiple fields with the same domain together? (Note: above we said we cannot map fields with different domains together.)
Ideas:
- we zip the fields first
1. `(f0: Field[D, dtype], f1: Field[D, dtype]) -> Field[D, array[dtype, 2]]`: now we can fmap fields with the same domain and dtype
2. `(f0: Field[D, dtype1], f1: Field[D, dtype2]) -> Field[D, tuple[dtype1, dtype2]]`: this might not be a very interesting Field for the frontend, but it allows fmapping fields with different dtypes together, which we absolutely want (see the sketch at the end of this section)
3. as an execution hint/policy, introduce a multi-fmap `mfmap` which maps `N` `field_operator`s over the same domain and returns `N` fields
4. similar to 3, but `mfmap` takes a single local operation that returns a tuple and outputs a tuple of fields

Hannes' thoughts: (1) is not useful, as it doesn't allow expressing a single kernel over operations on different dtypes.

Maybe all of the above doesn't make sense because it's late...
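
A hedged sketch of idea (2): zip two fields with different dtypes into a single field of pairs (here a NumPy structured array) so that one local operation can be fmapped over both. `fzip`, the pair encoding and the `fmap`-via-`np.vectorize` shortcut are assumptions for illustration, not a proposed implementation.
```python
import numpy as np

def fzip(f0: np.ndarray, f1: np.ndarray) -> np.ndarray:
    """Field[D, dtype1] x Field[D, dtype2] -> Field[D, (dtype1, dtype2)]."""
    zipped = np.empty(f0.shape, dtype=[("a", f0.dtype), ("b", f1.dtype)])
    zipped["a"], zipped["b"] = f0, f1
    return zipped

def fmap(local_op):
    # pointwise application of a local operation over the zipped field
    return np.vectorize(local_op)

f0 = np.arange(6, dtype=np.float64).reshape(2, 3)
f1 = np.arange(6, dtype=np.int32).reshape(2, 3)

# a single local operation over both dtypes, fmapped over the zipped field
out = fmap(lambda ab: ab["a"] * 2.0 + float(ab["b"]))(fzip(f0, f1))
assert np.array_equal(out, 3.0 * f0)
```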