Reductions for the Functional Model

# Reductions for the Functional Model ###### tags: `IR` `Felix’s remains` ## Antons Original Implementation Signatures of `shift` and `reduce`: ```haskell= shift :: (Connectivity, Offset) → Iterator → Iterator shift :: Connectivity → Iterator → NeighborIterator shift :: Offset → NeighborIterator → NeighborIterator reduce :: (F, Init) → (NeighborIterators, …) → Value ``` ### Problem Analysis According to the definition of lift, the following two equivalences should hold: ```cpp= shift(v2v, 0)(lift(f)(it)) == lift(f)(shift(v2v, 0)(it)) ``` This is a crucial requirement for inline/on-the-fly evaluation of `lift`: ```cpp= deref(shift(v2v, 0)(lift(f)(it))) == f(shift(v2v, 0)(it)); ``` Now to reductions. For simplicity, let’s assume we have a connectivity _v2e_ with _only a single neighbor_, that is, `max_neighbors(v2e) == 1`. According to the current implementation, the evaluation of a reduction is then as follows: ```cpp= g = reduce(f, 0); // Evaluation (with only one neighbor) g(shift(v2e)(it)); shifted = shift(0)(shift(v2e)(it)); res = can_deref(shifted) ? f(res, deref(shifted)) : res; ``` Note that `shift(v2e)(it)` returns a 1D iterator that is only shiftable again with a single integer offset. In this case, this is fine as obviously the next shift is `shift(0)`. Now let’s lift `g`! We immediately introduce a problem; there’s a full shift happening after a partial shift: ```cpp= deref(shift(v2v, 0)(lift(g)(shift(v2e)(it)))) == g(shift(v2v, 0)(shift(v2e)(it))); ``` If we look at the evaluation of the reduction, the situation does not improve. Now we have a partial shift (with connectivity), followed by a full shift, followed by a partial shift (with offset). ```cpp= // Evaluation would be shifted = shift(0)(shift(v2v, 0)(shift(v2e)(it))); res = can_deref(shifted) ? f(res, deref(shifted)) : res; ``` When looking at nested reductions, the situation gets even worse. Now, there are additionally multiple partial shifts applied one after another: ```cpp= h = reduce(f1, 0) // Does not work: normal shift after partial shift! deref(shift(v2v, 0)(lift(g)(shift(v2v)(lift(h)(shift(v2e)(it)))))) == g(shift(v2v, 0)(shift(v2v)(lift(h)(shift(v2e)(it))))) == g(shift(v2v, 0)(lift(h)(shift(v2v)(shift(v2e)(it))))) == g(lift(h)(shift(v2v, 0)(shift(v2v)(shift(v2e)(it))))); // Evaluation would be shifted_g = shift(0)(lift(h)(shift(v2v, 0)(shift(v2v)(shift(v2e)(it))))); // Which is shifted_g = lift(h)(shift(0)(shift(v2v, 0)(shift(v2v)(shift(v2e)(it))))); // And evaluating h shifted_h = shift(0)(shift(0)(shift(v2v, 0)(shift(v2v)(shift(v2e)(it))))); res_h = can_deref(shifted_h) ? f(res_h, deref(shifted_h)) : res_h; // Now (ignoring can_deref) res_g = f(res_g, res_h); ``` ### Just for Reference: `reduce` & `shift` Pseudocode ```cpp= auto reduce(auto f, auto init) { return [=](auto it, auto... its) { auto res = init; for_each_neighbor([&](auto i) { if (can_deref(shift(i)(it))) res = f(res, deref(shift(i)(it)), deref(shift(i)(its))...); }); return res; }; } auto shift(auto connectivity, auto offset) { return [=](auto iter) { if constexpr (is_lifted_stencil_iter(iter)) { return lifted_stencil_iter( iter.func, transform(shift(connectivity, offset), iter.args) ); } else { ... } }; } ``` ## Alternative 1: Trivializing `reduce` If `reduce` is not a stencil, the requirement of a liftable reduce can be dropped. Instead, some new tuple-like type concept—here called `LazyOptionalTuple`—must be introduced to keep efficient lazy shifting and dereferencing capabilities. The newly introduced concept has the following API requirements: ```haskell= neighbors :: It → LazyOptionalTuple reduce :: (F, Init) → LazyOptionalTuples… → Value has_value :: (LazyOptionalTuple, Int) → Bool get_value :: (LazyOptionalTuple, Int) → Value ``` A possible implementation of reduce on the backend side (note that the shifting and dereferencing happens lazily in `get_value`): ```cpp= g = reduce(f, 0); g(neighbors(it1), neighbors(it2), deref(sparse)); auto reduce(auto f, auto init) { return [=](auto t, auto ts...) { auto res = init; for_each_index(t)([&](auto i) { if (has_value(t, i)) res = f(res, get_value(t, i), get_value(ts, i)...); }); return res; }; } ``` ## Alternative 2: Fixing the 1D Iterators Let’s call the fixed iterator type _partial iterator_. Most importantly, we need to be able to recover a normal iterator from a partial iterator if we apply a offset-only shift to a partial iterator. That is, we need the following functionality: ```haskell= shift :: (Connectivity, Offset) → Iterator → Iterator shift :: Connectivity → Iterator → PartialIterator shift :: Offset → PartialIterator → Iterator reduce :: (F, Init) → (PartialIterators, …) → Value ``` Second, to allow for lifted reductions, we need to be able to apply normal shifts interleaved with partial shifts. That is, the expression `shift(0)(shift(v2v, 0)(shift(v2e)(it)))` from the examples above has to be valid. Third, multiple partial shifts must be applicable to enable nested (lifted) evaluation of reductions. For example, the expression `shift(0)(shift(0)(shift(v2v, 0)(shift(v2v)(shift(v2e)(it)))))` from the examples above must be valid. All these requirements can be met in a straightforward way by parametrizing iterators by a stack of partially shifted connectivities. That is, we have the following API: ```haskell= shift :: (Conn, Offset) → Iterator[Conns…] → Iterator[Conns…] shift :: Conn → Iterator[Conns…] → Iterator[Conn, Conns…] shift :: Offset → Iterator[Conn, Conns…] → Iterator[Conns…] reduce :: (F, Init) → (Iterator[Conn1, Conns1…], Iterator[Conn2, Conns2…], …) → Value ``` Note that `can_deref` returns false for any partial iterator, that is for any iterator with at least one partial shift applied. Example type evaluations for the previous expressions: `shift(0)(shift(v2v, 0)(shift(v2e)(it)))`: ```cpp= it; // Iterator[]: normal iterator shift(v2e)(it); // Iterator[v2e], one partial shift applied shift(v2v, 0)(shift(v2e)(it)); // Iterator[v2e], full shift does not change the type shift(0)(shift(v2v, 0)(shift(v2e)(it))); // normal Iterator[] again ``` `shift(0)(shift(0)(shift(v2v, 0)(shift(v2v)(shift(v2e)(it)))))`: ```cpp= it; // Iterator[]: normal iterator shift(v2e)(it); // Iterator[v2e], one partial shift applied shift(v2v)(shift(v2e)(it)); // Iterator[v2v, v2e], second partial shift shift(v2v, 0)(shift(v2v)(shift(v2e)(it))); // Iterator[v2v, v2e] shift(0)(shift(v2v, 0)(shift(v2v)(shift(v2e)(it)))); // Iterator[v2e] shift(0)(shift(0)(shift(v2v, 0)(shift(v2v)(shift(v2e)(it))))); // Iterator[] ``` Note on sparse fields: the previous strategy of making all tuple-like types shiftable does not work anymore, as shifting along any other but the partial dimension is not possible. Instead, we require a function to obtain a partial iterator from an iterator with tuple value type. The function, here called `sparse`, has the following signature: ```haskell= sparse :: Iterator[] → Iterator[TupleDim] ``` An experimental (but tested!) implementation on top of Anton’s C++20 CPU-only code is available at https://github.com/GridTools/gridtools/pull/1703.