Yet Another New Model for Unstructured

# Yet Another New Model for Unstructured Moved to https://github.com/GridTools/concepts/pull/44 ###### tags: `archive` `GTScript` `unstructured` ## Definitions - **Domain** - a set of points - **Field** - a function from point to some value - **Offset(s) Literal** - specifies the position of one point against another. The formulation is very generic on purpose. The believe is that the way how offsets are defined is orthogonal to the execution model. The important part that it should be *literal*. I.e. offsets should be known in generation/compile time. - **Iterator** - represents the position within the field. You can do two things with the iterator: - apply offsets (aka **shift**); - dereference (aka **deref**). The `shift` should have the following feature: for every iterator `i` and offsets literals `A` and `B`, there is a literal `C`, such that: ``` shift(C)(i) == shift(B)(shift(A)(i)) ``` In other way -- shifts are foldable. Any composition of shifts can be presented as a single shift. The *deref* returns not the value directly but the optional. Deref returns **None** if the value was not set in the current position. - **Stencil** - a function that takes iterators and returns a value (or a tuple of values). The simplest possible stencil is *deref*. - **Fencil** - a list of stencil closures (closure is a stencil, input and output fields and the domain). ## Lifting Stencils are not composable -- they take iterators but returns values. Let us try to fix that introducing **lift** builtin that takes a stencil and return a function from iterators to iterator(s). Let us call the lift result **Lifted Stencil**. Semantics: derefencing the shifted iterator(s) returned by the lifted stencil is same as the result of the stencil with shifted arguments (with the same Offset(s) literal). ## Paralellism and polymorphism We can execute a fencil by applying the stencil to each point of the domain in parallel if there are no vertical dependencies. This is the maximum possible parallelism. If there are vertical dependencies, stencils can be defined on the whole columns instead of individual points. In this case fencil executes the stencil for each column in the domain in parallel. To achieve the composability we want to compose value and column stencils. It is easy to do if we assume that each value stencil can take columns as well. In this case it executes the column pointwise. ## Easy builtins Arithmetic operations and conditionals are the bare minimum set of builtins needed for point stencils. Let us focus on summation builtin. The most straitforward and conceptually simple is to define it like that: ```python def sum(a, b): return a + b ``` This does not work because we need point/column polimorhism. The working implementation should look like: ```python def sum(a, b): return (column(a[x] + b[x] for x in a.indices) if are_columns(a, b) else a + b) ``` Alternatively we can combine summation with the *deref* and the *lift* in the same builtin: ```python @lift def mega_sum(a, b): return sum(deref(a), deref(b)) ``` If all the builtins are defined that way, we can get rid of using `lift` and `deref` in the stencil composition. That is probably a good idea for the frontend, but not for the intermediate level. Here the illustration. Suppose we have a stencil: ```python def foo(a, b, c): return hannes_sum(hannes_sum(a, b), c) ``` comparing to ```python def foo(a, b, c): return sum(sum(deref(a), deref(b)), deref(c)) ``` And we are lowering this stencil to C++ in the context of parallel execution (pointwise). No lifting is needed in this case and C++ code should look something like (suppose we have iterator-like args in C++ as well): ```C++ auto foo(auto a, auto b, auto c) { return (*a + *b) + *c; } ``` In `sum` case there is no explicit or implicit lifting. the lowering can be implemented without addional transformation. In `mega_sum` case, in between inner and outer `mega_sum` there is an implicit `lift` that is followed by immediate `deref`. Naive lowering of that construction will be like that (suppose we implement lowering by inlining): ```C++ auto foo(auto a, auto b, auto c) { auto tmp = make_inline( [](auto a, auto b) { return *a + *b; }, a, b); return *tmp + *c; } ``` Which is correct but pessimized way to go. To avoid this pessimization we need to perform some AST transformation. ## Scanning TODO(anstaf): scanning is the same thing as for cartesian. * **scan**: a function with the signature: ```python scan(scan_pass: ScanPass, is_forward: bool, init) -> Stencil ``` It returns a stencil that executes a vertical loop. `scan_pass` serves as a body of the loop; `is_forward` defines direction. The values that `scan_pass` returns are written to the returning column and also passed to the next invocation of `scan_pass` as a first parameter. On the first invocation `init` value is passed to `scan_pass`. Note that the scan returns a stencil that can be applied only to column iterators, but not to point iterators. ## Reduce and sparse fields Say we have the data that is defined on unstructed mesh. And we are applying a stencil on cells over some domain. The stencil takes a two arguments and sums all multiplications of values in the neighbor edges: ```python def val(v): return 0 if v is None else v # maximum number of edges is 4 def foo(a, b): return (val(deref(shift(c2e(0))(a))) * val(deref(shift(c2e(0))(b))) + val(deref(shift(c2e(1))(a))) * val(deref(shift(c2e(1))(b))) + val(deref(shift(c2e(2))(a))) * val(deref(shift(c2e(2))(b))) + val(deref(shift(c2e(3))(a))) * val(deref(shift(c2e(3))(b))) ) ``` Let as split the offset like `c2e(i)` into `c2e` and `nth(i)`. No we can now rewrite `foo`: ```python def foo(a, b): a = shift(c2e)(a) b = shift(c2e)(b) return (val(deref(shift(nth(0))(a))) * val(deref(nth(edge(0))(b))) + val(deref(shift(nth(1))(a))) * val(deref(shift(nth(1))(b))) + val(deref(shift(nth(2))(a))) * val(deref(shift(nth(2))(b))) + val(deref(shift(nth(3))(a))) * val(deref(shift(nth(3))(b))) ) ``` To support that pattern we introduce a builtin **reduce**. With the signature is: ```python def reduce(fun, init) -> Stencil ``` Now we can rewrite `foo` again: ```python def foo(a, b): return reduce(lambda acc, a, b: return acc + a * b, 0)(shift(c2e)(a), shift(c2e)(b)) ``` The implementation of `reduce` could look like: ```python def reduce(fun, init): def sten(*args): assert check_that_all_args_are_comatible(*args) first_arg = args[0] n = get_maximum_number_of_neigbors_from_args(first_arg) res = init for i in range(n): # we can check a single argument # because all arguments share the same pattern if deref(shift(nth(i))(first_arg)) is None: break res = fun(res, *(deref(shift(nth(i))(a)) for a in args)) return res return sten ``` Let us call *Iterator* that can be shifted by `nth(i)` **SparseField**. Two *SparseField*s are compatible if they are constructed with the *shift* with the same *offsets literal*. In this case they have the same maximum number of neighbors. ## Building a neighbor table For each argument of each stencil closure within the fencil we can build its own neighbor table. The trick here is to collect all the `shift`'s and extract offsets literals from them. ## Converting from unstructured offsets to strided offsets with colored fields ## IR Grammar in EBNF ```ebnf= program = { function_definition | fencil_definition | setq }; function_definition = "(", "defun", id, "(", {id}, ")", expr, ")"; fencil_definition = "(", "defen", id, "(", {id}, ")", { "(", ":domain", expr , ":stencil", expr, ":outputs", {id}, ":inputs", {id}, ")" }, ")"; setq = "(", "setq", id, expr, ")"; expr = int_literal | float_literal | offset_literal | id | lambda | fun_call ; lambda = "(", "lambda", "(", {id}, ")", expr, ")"; fun_call = "(", expr, {expr}, ")"; ``` `let` is optional and can be added to expressions. ```ebnf let = "(", "let", "(", {"(", id, expr, ")"}, ")", expr, ")"; ``` ```lisp (let ((a <a_expr>) (b <b_expr>)) <expr>) ``` is the same as ```lisp (lambda (a b) <expr>)(<a_expr> <b_expr>) ``` ```lisp= ;; minimalistic example of the IR script (defen my_copy_fencil(x y z in out) (:domain (cartesian 0 x 0 y 0 z) :stencil deref :output out :input in )) ``` ## Builtins ### Concepts - **Domain** - can be followed by the `:domain` designator in the fencil definition. - **Value** - a type that potentially can be stored in the field. - **Location** - one of `:cell`, `:vertex` or `:edge` - **Offset** - a designator that can passed to the `shift` call; the set of cartesian offsets could be `:i`, `:j`, `:k` and `:1`, `:2`, ...; for unstructured we could introduce smth like `:e2v`, `:v2e` ...; - **Field<Value>** - can be followed by `:output` or `:input` in the fencil definition. - **Column<Value>** - represents a column within the field; the height of the column is determined from *Domain* which is expected to be in the context. *Column<Value>* models *Value* concept. - **Optional<Value>** - Value or `None`. - **Tuple<...>** - just a tuple. - **Iterator<Value>** - supports `shift` and `deref` - **Stencil<R, Args...>** - a function with the signature: `R(Iterator<Args>...)` ### Signatures ```cpp= Domain cartesian(int i0, int i1, int j0, int j1, int k0, int k1); Domain unstructured(Location, int hor0, int hor1, int k0, int k1); // Iterators API Stencil<Iterator<V>, V> shift(Offsets...); Stencil<Optional<V>, V> deref; Stencil<Iterator<R>, Args...> lift(Stencil<R, Args...>); Tuple<Stencil<Iterator<Rs>, Args...>...> lift(Stencil<Tuple<Rs...>, Args...>); Stencil<Column<V>, Column<Args>...> scan( bool is_forward, V init, V pass(V, Iterator<Args>...)); Stencil<V, Args...> reduce(V init, V fun(V, Args...)); // Optional API V opt_deref(Optional<V>); bool is_none(Optional<V>); // Tuple API Tuple<Vs...> make_tuple(Vs...); auto nth(int, Tuple<Vs...>); // Those helpers are needed // if we do not want to support implicit type conversions Column<V> promote(V); To cast<To>(From); // ternary operator V if(bool, V, V); Column<V> if(Column<bool>, Column<V>, Column<V>); // conditionals bool eq(V, V); Column<bool> eq(Column<V>, Column<V>); bool less(V, V); Column<bool> less(Column<V>, Column<V>); // logical bool not(bool); Column<bool> not(Column<bool>); bool and(bool, bool); Column<bool> and(Column<bool>, Column<bool>); bool or(bool, bool); Column<bool> or(Column<bool>, Column<bool>); // arithmetic V negate(V); Column<V> negate(Column<V>); V plus(V, V); Column<V> plus(Column<V>, Column<V>); V minus(V, V); Column<V> minus(Column<V>, Column<V>); V mult(V, V); Column<V> mult(Column<V>, Column<V>); V divides(V, V); Column<V> divides(Column<V>, Column<V>); ```