# Iterator IR without partial offsets

###### tags: `functional cycle 13`

- Developers: Hannes (after ICON dycore GT4Py transition), Enrique (after merge), Rico (after CI/CD-ext)
- Appetite: full cycle

## Motivation

The partial offsets in Iterator IR complicate the IR in several ways:

1. **No simple lowering of nested reductions from field view.** This is related to the problem that full shifts in field view go in the opposite direction than in Iterator IR; the `lift` operation makes reversing this simple, but the (tag, offset) pair (full offset) needs to be kept in that order. With whole-program lowering this could be fixed, but operator-by-operator lowering is not possible (or very complicated).
2. **Reductions cannot be applied to sparse fields.** Reductions work by detecting the last partial shift; for sparse fields, however, there is no shift within Iterator IR. They can be seen as being passed pre-shifted to the Iterator IR entry point.
3. Relatedly, it is **complicated to detect the last shift for a reduction** (see the current `UnrollReduce` pass).
4. **Fields with neighbor dimensions cannot be indexed.** Currently, we would have to represent indexing as a `shift(index)` at the Iterator IR level. In field view we do not know whether a field has the neighbor dimension or whether it was broadcast (the broadcast could have happened outside of the current operator). Therefore, we should not always generate `shift(index)`. In operator-by-operator lowering this cannot be resolved without changing Iterator IR.

## Goal

Make Iterator IR work with the above use cases by exploring the following solution. If we find the solution not viable, we will fall back to alternative solutions, briefly outlined later.

The ideal outcome is a working implementation. If that is not feasible, a detailed project needs to be shaped for the next cycle, in which all exploration tasks have an answer.

## Solution

We propose to replace partial shifts by full shifts to a list of iterators.
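As a rough intuition for the proposal, here is a minimal embedded sketch of the difference between the current partial shifts and a full shift to a list of values. All names (`shift_partial`, `neighbors`, `reduce_`, the `V2E` table, the field data) are hypothetical stand-ins, not the actual gt4py builtins:

```python
# Toy embedded model (hypothetical names, not the real Iterator IR):
# an iterator is just a (table, position) pair, a connectivity is a
# neighbor table, and `neighbors` performs the full shift to a list of
# dereferenced values in one step.
V2E = [[0, 1, 2], [2, 3, 4]]  # made-up vertex-to-edge neighbor table
edge_field = [10.0, 20.0, 30.0, 40.0, 50.0]

def deref(it):
    _table, pos = it
    return edge_field[pos]

def shift_partial(table, i):
    # current style: one partial shift per neighbor index
    def apply(it):
        _, pos = it
        return (None, table[pos][i])
    return apply

def neighbors(table):
    # proposed style: full shift producing a list of values
    def apply(it):
        _, pos = it
        return [edge_field[e] for e in table[pos]]
    return apply

def reduce_(op, init):
    def apply(values):
        acc = init
        for v in values:
            acc = op(acc, v)
        return acc
    return apply

it = (V2E, 0)  # iterator located at vertex 0
# old: sum over explicit partial shifts
old = sum(deref(shift_partial(V2E, i)(it)) for i in range(3))
# new: reduce over the neighbor list
new = reduce_(lambda a, b: a + b, 0.0)(neighbors(V2E)(it))
assert old == new == 60.0
```

In this toy form the reduction never needs to detect a "last shift"; it simply consumes the list, which is the essence of points 1 to 3 above.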
We introduce

```python
neighbor_shift(o: OffsetTag)(it: Iterator) -> Iterator[list]
```

or

```python
derefed_neighbor_shift(o: OffsetTag)(it: Iterator) -> list[Value]
```

(names to be improved), with the semantics

```python!
derefed_neighbor_shift = lambda o: (
    lambda it: make_list(
        deref(shift(o, 0)(it)),
        deref(shift(o, 1)(it)),
        ...,
        deref(shift(o, max_neighbors - 1)(it)),
    )
)
```

and similarly `neighbor_shift`, with the inner lambda lifted to return an iterator of list.

The `reduce` builtin will be changed to work on lists or iterators of list, respectively.

Open questions:

- How much of the list type will be exposed: can the user call `make_list(values, ...)`? Can the user get the size of the list, i.e. is there a `list_size` builtin?
- Should we use the current tuple instead of a separate list type with uniform elements?

### Are the problems solved?

1. Explore nesting of `neighbor_shift`s. In particular, answer the question of `neighbor_shift` (which is better for composability) vs. `derefed_neighbor_shift`.
2. Sparse fields would be represented with the list type as values. Their length needs to be accessible to the implementation of `reduce` (not necessarily to the user).
3. The unroll factor would be accessible, see the previous point.
4. `broadcast` in field view would semantically create a list of the size of the dimension, with all elements being the same (the broadcast value).

**Other problems to be explored**

- Do we need changes to the backend?
- Can we represent the fvm nabla sign field?
- What do we do with

  ```python
  @field_operator
  def number_of_neighbors():
      return neighbor_sum(broadcast(1, (C, C2V)), C2V)
  ```

  Here no iterator is available to know where we are in the mesh, so we cannot count the number of neighbors. The solution might be similar to the solution for an `index()` builtin: somehow we need to inject an iterator for location tracking.
- How do we improve the offset representation? For normal shifts, we only want to accept full shifts, i.e.
a (tag, offset) pair, but for `neighbor_shift` we need just the `tag`. Alternatively, we can represent the `offset` in `neighbor_shift`s as a list of (tag, offset) pairs. Consider shaping this properly for a later cycle.
- How do we deal with optional values (do we replace `can_deref` with `has_value`?)

## Implementation

- Test the solution in embedded execution (main focus).
- If we are confident the problems are solved (and time allows):
  - update transformations to work with the new representation
  - update type inference
  - implement lowering from field view
- Otherwise, explore the problem space with the alternative solutions.

## Rabbit holes

- It might be tempting to extend frontend/language capabilities, e.g. with the possibility to index sparse fields. This is not the goal of this project.

## Alternative Solutions

### `translate_shift` builtin

See the proof of concept in https://github.com/GridTools/gt4py/pull/965.

### Full dimension list for `reduce`

Instead of passing just iterators, provide the offset information explicitly in the `reduce`.

## Optional: Static offset provider information in the IR

Represent the static offset-provider information in Iterator IR:

- NeighborTableOffset: has_skip_values, max_neighbors
- CartesianOffset

## Discussions during implementation

- How do we order list and tuple for an external tuple sparse field?

### Static offset provider information

Add an extra node in `FencilDefinition`:

```python!
class FencilDefinition:
    offset_definitions: list[OffsetDefinition]
    ...

class OffsetDefinition:
    ...

class ConnectivityDefinition(OffsetDefinition):
    has_skip_values: bool
    max_neighbors: int

class SingleOffsetDefinition(OffsetDefinition):
    ...
```

```python!
SList[has_skip_values, size, dtype]
```

```python!
# now
def stencil(inp):
    return deref(shift(koff, 1)(inp))

# new
def stencil(inp):
    return deref(shift(koff(1))(inp))

def fencil(inp: Iterator[], koff: Callable[[int], Callable[[KDim], KDim]]):
    stencil()
```

```python!
def stencil(inp):
    return reduce(...)(nbshift(e2v)(inp))

def fencil_unstructured(e2v: Callable[[E], SList[V, ...]]):
    ...
```

```python!
nbshift :: (a -> [b]) -> It a b d -> It a a [d]
shift :: (a -> b) -> It a c d -> It b c d

derefed_nbshift(c, it) = [deref(shift(nb(c, i))(it)) for i in range(size)]
```

### Which builtins do we implement?

Required:

- `(d)nbshift`
- `map(list[values], ...)`
- `reduce(op, init)(list[values], ...)` (maybe variadic or via `map`)

Optional:

- `list_get(0, dnbshift(E2V)(it)) == deref(shift(E2V(0))(it))`
- `make_list`?

```python!
# def foo(it: It[Cell, Vertex]):
#     return deref(shift(E2V)(shift(C2E)(it)))

def bar_lifted(it: It[Cell, Edge]) -> It[Cell, Cell]:
    return lift(lambda x: deref(shift(C2E)(x)))(it)

def bar_it(it: It[Cell, Edge]) -> It[Edge, Edge]:
    return shift(C2E)(it)

def foo(it: It[Vertex, Edge]) -> It[Cell, Cell]:
    shifted: It[Cell, Edge] = shift(V2C)(it)
    return bar_lifted(shifted)

def bar(it: It[Cell, Edge]) -> List:
    return neighbors(C2E)(it)

def nbar_lifted(it: It[Cell, Edge]) -> Iterator[Cell, Cell, List]:
    return lift(lambda x: neighbors(C2E)(x))(it)

def nbar_it(it: It[Cell, Edge]) -> List[List[Iterator[Edge, Edge, V]]]:
    return hannes_nshift(V2E)(hannes_nshift(C2V)(it))  # ???

def reduced(it):
    reduce(
        lambda acc, x: acc + reduce(lambda y, z: y + deref(z), init0)(x),
        init1,
    )(nbar_it(it))

def foo(it: It[Vertex, Edge]) -> It[Cell, Cell, List]:
    shifted = bar_lifted()
    return bar(shifted)

# def foo(it: It[Cell, Vertex]):
#     return neighbors(C2E)(lift(lambda x: neighbors(E2V)(x))(it))
```

### 2023-02-21

- we want to use `map` (-> map fusion pass)
- make_list(args...)
takes values (scalars have to be explicitly promoted)

#### Strategy for using C++ backend

a) Implement all new builtins (`map`, `list_get`, `reduce`, `make_list`, `_list_size`) in C++ with a tuple-like that has a size (in case of has_skip_values)

a*) Encode the mask in the values (nan-boxing, int)

b)
- merge all maps
- combine the reduce op and the map op -> reduce only has calls to neighbors or derefed sparse fields
- translate `neighbors(i, it)` -> `shift(off, i)(it)` and `deref(it)` -> `shift(i)(it)`
- the old unroll_reduce works

```
tuple_get(1, neighbors(off, a))
reduce(...)(neighbors(Off, foo)) -> reduce(...)(shift(Off)(foo), sparse_field)
```

```python!
(lambda x: reduce(plus, 0)(map(mult, x, neighbors(off, b), deref(c))))(neighbors(off, a))
```

### 2023-03-01

#### Topics

- `make_const_list`, no `make_list` for now
- we want to simplify unroll_reduce: we need easy access to max_neighbors, has_skip_values **and** the argument which can be checked with `can_deref`
- TODOs:
  - add nested reduction test
  - map fuse: take care of symbol clashes (we should collect symbol utils)

Options:

a) replace `can_deref` by `has_value(i, list)`:
   `can_deref(shift(off, i)(it))` -> `has_value(i, neighbors(off, it))`

b) replace `can_deref` by `length`:
   `can_deref(shift(off, i)(it))` -> `if_(length(neighbors(off, it)) > i, true, false)`

```
reduce(add_1, 0)(neighbors(off, it))
Connectivity(length_kind=)
```

```
neighbors(V2E, it): List[Loc]
neighbors(V2C, it2): List[Loc]

shift(V2E, 1)(it: Iterator[V, E]) -> Iterator[E, E]
V2E: V -> LocalField[V2EDim, E]
shift(V2E)(it: Iterator[V, E]) -> LocalField[V2EDim, Iterator[E, E]]
```

```
o1 = FieldOffset(source=a, target=(c, dim))
o2 = FieldOffset(source=b, target=(c, dim))
sum(f(o1)*g(o2), axis=dim)
```

```
it1: Iterator[V, E]
it2: Iterator[V, C]
neighbors(it1, V2E): List[V2E]
it2_mod = lift(lambda it: deref(shift(E2C_one_to_one, 0))(it2): Iterator[V, E]
neighbors(it2_mod, V2E): Value[V2E]
```

#### Conclusions

- We go with the current unroll_reduce (ideally) as a temporary solution, then implement lists
in the C++ backend.
- We remove `can_deref` once we have lists in C++ (length can be implemented in itir with `reduce(add_1, 0)(neighbors(off, it))`).
- Can we reduce sparse fields? Not now, but we could if we don't need unroll_reduce, i.e. once we have list support in C++.
- Type inference for sparse fields works if we pass the constraint from outside.
- Imperative problem:
  - use `reduce` (from normal gtfn once we have it)
  - what does the itir unroll_reduce look like in generated code? Why is it not good enough?
- TODOs:
  - add nested reduction test
  - map fuse: take care of symbol clashes (we should collect symbol utils)
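The `length`-via-`reduce` conclusion above can be sketched in plain Python. This is a hypothetical toy model (the `SKIP` marker, `neighbors`, `reduce_`, and the `E2V` table are all made up for illustration), showing that once neighbors are a list, counting them replaces per-index `can_deref` checks even in the presence of skip values:

```python
# Toy model of `reduce(add_1, 0)(neighbors(off, it))` as a length
# computation for a connectivity with has_skip_values=True.
SKIP = -1  # hypothetical skip_value marker
E2V = [[0, 1], [1, SKIP]]  # made-up table; the second entry has one neighbor

def neighbors(table, pos):
    # full shift to a list, dropping skipped entries
    return [v for v in table[pos] if v != SKIP]

def reduce_(op, init):
    def apply(values):
        acc = init
        for v in values:
            acc = op(acc, v)
        return acc
    return apply

add_1 = lambda acc, _v: acc + 1  # ignores the value, counts entries

length_0 = reduce_(add_1, 0)(neighbors(E2V, 0))
length_1 = reduce_(add_1, 0)(neighbors(E2V, 1))
assert (length_0, length_1) == (2, 1)
```

Under this model a separate `can_deref` builtin is indeed redundant: validity is encoded in which entries survive into the list.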