# Completion of the field-view frontend

###### tags: `functional cycle 8`

Appetite: full cycle
Developers: Rico (after unblocking bindings), contributions from Ben

[TOC]

:::warning
**REMINDER**: any decision regarding DSL/frontend issues should take into account the feasibility of a pure Python _embedded_ implementation.
:::

## Complete reductions

- The current [PR](https://github.com/GridTools/gt4py/pull/640) **should** work for now. Check its current status and implement the missing parts required for the first ICON stencils.
- If there are serious problems with the current PR that cannot be fixed in time, using Iterator IR directly would be the alternative for now.

## Scalar support

**Goal**: Enable support for arithmetic operations between fields and compile-time or run-time provided scalar values, covering the following use cases.

**Potential rabbit holes**:

- It is not clear whether the scalar support in the current implementation of Iterator IR is solid and complete enough:
  - Does the backend/iterator prototype currently support constant fields?

### Arithmetic operations between fields and scalars

Add support for mixing scalars and fields in arithmetic operations. Example:

```python
@field_operator
def foo(bar: Field[..., dtype]) -> Field[..., dtype]:
    return bar + 1
```

There are two possible approaches to be discussed for extending the lowering from FOAST to ITIR:

1. Treat scalars as a different entity
   - Add a `Constant` FOAST node (should follow the PAST structure)
   - Implement binary operator overloads for arguments of type field and constant, where the constant is lowered into a lifted lambda expression.
   - Problem: unnecessarily complex?
2. **Treat scalars as 0-dimensional fields (fields without dimensions)**
   - Treat `alpha: dtype` as syntactic sugar for `alpha: Field[[], dtype]`
   - Lower every constant into a lifted lambda expression with no arguments.
   - Problem: does it also work for embedded execution semantics?
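The 0-dimensional-field approach (option 2) matches the semantics of numpy's 0-d arrays, which participate in arithmetic like scalars. A minimal sketch of the intended behaviour, with numpy arrays standing in for fields:

```python
import numpy as np

# A 0-dimensional array behaves like a scalar in arithmetic, which is the
# behaviour the "scalars as 0-dimensional fields" approach relies on.
bar = np.array([1.0, 2.0, 3.0])  # stand-in for a 1-d field
alpha = np.array(1.0)            # 0-dimensional "field", ndim == 0

result = bar + alpha             # broadcasts alpha to bar's shape
```

Whether the same broadcasting rule carries over cleanly to the embedded execution semantics is exactly the open problem listed above.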
Finally, define the dimensions of the result field (also related to [Broadcasting of dimensions](#Broadcasting-of-dimensions)).

### Arithmetic operations between scalars

Add support for arithmetic operations between scalars. Example:

```python
@field_operator
def foo(bar: Field[..., dtype], alpha: dtype) -> Field[..., dtype]:
    return (2.0 * alpha) * bar
```

After adding support for 0-dimensional fields, this kind of operation should just work out of the box.

### Field operators with scalar parameters

At the frontend, we would like to support scalar arguments in the operator and program signatures for values provided at run time. Example:

```python
@field_operator
def foo(bar: Field[..., dtype], alpha: dtype) -> Field[..., dtype]:
    return alpha * bar
```

With the 0-dimensional field approach this basically works out of the box:

```python=
def foo(bar: Field[..., dtype], alpha: Field[[], dtype]) -> Field[..., dtype]:
    return alpha * bar  # using implicit broadcast syntax for simplicity
```

### Constructing explicitly typed constant values

If a scalar value is just a 0-dimensional field, defining a scalar constant is equivalent to defining a field. See the discussion in [Inline construction of fields](#Inline-construction-of-fields).

## Broadcasting of dimensions

### Alternatives: implicit vs explicit

#### Implicit approach

```python
op(Field[a, ...], Field[b, ...]) -> Field[[*({*a} | {*b})], ...]

# actual syntax:
op(a, b)

# a is Field[A, ...]
# b is Field[B, ...]
c: Field[[A, B], ...] = a * b

# numpy way
(3, 4, 5, 6) + (5, 6) => (3, 4, 5, 6) + (1, 1, 5, 6)

# how about?
[I, K] + [J, K]
[Vertex, Edge]
[A, E, F, G, H] + [G, E]
```

**Open question:** How can dimensions be ordered? Ideas:

- Define an ordering between dimensions and sort topologically.
  + Currently we don't know anything about dimensions; they are only tags that need to be unique in the field arguments.
- Use syntactic ordering (`unique(flatten([[Z,B,E], [D,B,E]])) = [Z,B,E,D]`)
  + Predictable outputs, but probably semantically weird

**Open question:** What about _generic_ fields or functions using _generic_ fields?

#### Explicit approach

```python
# a is Field[A, ...]
# b is Field[B, ...]

# inline syntax for longer expressions?
(broadcast(a, [A, B]) + broadcast(b, [A, B]) - c) * broadcast(a, [A, B]) / broadcast(b, [A, B])

# numpy-like
a[:, B] + b[A, :]  # a: [A] => [A, B], b: [B] => [A, B]
# inspired by: a[:, np.newaxis, :, np.newaxis]

# Problem: it's not clear in the expression whether `B` and `A` are
# just dimensions (making this a broadcast) or variables (making this
# a slice operation).
# Idea: find a less ambiguous syntax that expresses the
# introduction of new dimensions.
# Examples:
a[:, new_axis(B)] + b[new_axis(A), :]
a[:, new(B)] + b[new(A), :]
a[:, axis(B)] + b[axis(A), :]
a[:, ~B] + b[~A, :]
```

**Current conclusion:** Use the explicit _numpy-like_ syntax for operations between fields, but allow implicit broadcasting for constants/0-dimensional fields. Unless there is a strong consensus among the whole team for the proposed broadcasting syntax, it might be better to use a `broadcast()` function, which is uglier and more verbose, but much simpler to deprecate and upgrade in the future.

### Lowering

**Question:** What happens if there is a `shift` in a dimension of a broadcasted `Field` on which the non-broadcasted `Field` was not defined?

- The `Field` should be treated as a constant field in the broadcasted dimensions (there are no index type checks in the current definition of Iterator IR, similar to the SID implementation).
- **TODO:** Remove `shift` safety checks in the embedded Iterator IR implementation.

### Test cases

- Existing example: `mo_nh_diffusion_stencil09` in field view (`Edge` + `Edge`, `K`, as well as `Field` + `Constant`).
- At least one reduction example.
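The syntactic-ordering idea from the implicit approach can be sketched in plain Python; `unique` and `flatten` are hypothetical helpers, with dimension tags shown as strings for illustration:

```python
def flatten(list_of_lists):
    # Concatenate the per-argument dimension lists in source order.
    return [x for inner in list_of_lists for x in inner]

def unique(seq):
    # Keep the first occurrence of each tag, preserving order.
    seen = set()
    return [x for x in seq if not (x in seen or seen.add(x))]

# Dimension tags as strings for illustration.
dims = unique(flatten([["Z", "B", "E"], ["D", "B", "E"]]))
# -> ['Z', 'B', 'E', 'D']
```

This makes the "predictable but semantically weird" trade-off concrete: the result order depends purely on the textual order of the arguments, not on any meaning attached to the dimensions.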
#### Operations involving generic fields

For now, the result of an operation mixing generic and non-generic fields is a generic field.

```
Field[..., dtype], Field[[X, Y, Z], dtype] -> Field[..., dtype]
```

## Inline construction of fields

_Use case_: local weights for neighbor reductions in ICON stencils.

Add syntax to create constant local fields directly in the field view. Example:

```python
# Field constructor-like syntax proposal:
constant_field = Field[[], float32]("1.123")
constant_field = float32("1.123")  # Equivalent to previous statement
local_field = Field[[E2V], float32](["1.123", "2.3"])

## Should implicit dtypes also be accepted?
constant_field_f64 = Field[[]](1.123)

# What about other numpy-like creation functions?

# This is uglier and more verbose, but most likely simpler to
# upgrade and deprecate in the future
constant_field = gt.as_field("1.123", dims=[], dtype=float32)
constant_field = gt.as_field(["1.123"], dims=[], dtype=float32)  ## Is it valid?
local_field = gt.as_field(["1.123", "2.3"], dims=[E2V], dtype=float32)
local_field = gt.as_field([1.123, 2.3], dims=[E2V])
```

### Explicitly typed string conversion for numeric values

In a parsing frontend, literal numeric values can be used directly, since the parser usually has access to the source code string. However, in an embedded execution context, numeric literal values are converted to `int`, `float64` or `complex128` datatypes by the Python interpreter, which is not always a reversible transformation, making it effectively impossible to define constant values with other types like `float32`. For this reason, explicit construction of typed values from string literals containing numeric values should be added as the safe way to define constant values.
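The irreversibility described above can be demonstrated with numpy (a sketch of the underlying floating-point issue, not gt4py API):

```python
import numpy as np

# The Python literal 1.123 is parsed as a float64 by the interpreter.
# Rounding it to float32 and widening back to float64 does not recover
# the original value, so the conversion is not reversible.
as_f64 = np.float64(1.123)
as_f32 = np.float32(as_f64)
lossless = bool(np.float64(as_f32) == as_f64)  # False for this value
```

By the time an embedded frontend sees the value, the float64 rounding has already happened, which is why only the original source string allows constructing the intended `float32` constant safely.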
```python
np.float32(np.float64(CONSTANT)) == np.float32(CONSTANT)  # Not always True
```

See also:

- [Can you safely parse a double when you need a float?](https://lemire.me/blog/2021/11/30/can-you-safely-parse-a-double-and-then-cast-to-a-float/)
- [Sylvie Boldo, Guillaume Melquiond (2004) - When double rounding is odd](https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.75.5554) (only slightly related)

**Open question:** Does Iterator IR have enough support for explicit type casts?

<!--
### Alternatives

If there is no agreement on the syntax for defining local fields, or the implementation becomes harder than expected, other approaches to support neighbor reductions with weights in ICON stencils could be:

#### `reduce` with dimension axis

```python
reduce(sparse_field, dim=E2V, weights=[1, -1])
```

- Does it also work for nested reductions?

#### Use fully allocated constant fields

- _Idea_: allocate fields with repeated weight values for each point in the domain and use that for the time being.
- _Possible optimization_: create a sparse field with a masked dense dimension in Iterator IR and use that for the time being.
-->

## Externals (good first task for learning purposes)

__Goal__: Enable the use of compile-time constant external values by either passing them to the respective decorator (externals) or by implicitly capturing them from the outer scope (closure vars).
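As a sketch of how implicit capture could work mechanically, the standard-library `inspect.getclosurevars` reports which global and closure symbols a function body references; `capture_externals` below is a hypothetical decorator, not gt4py API:

```python
import inspect

GLOBAL_CONSTANT = 42

def capture_externals(func):
    # Hypothetical decorator: record the values of the global symbols the
    # function body references, at decoration ("compile") time.
    closure_vars = inspect.getclosurevars(func)
    func.captured_externals = dict(closure_vars.globals)
    return func

@capture_externals
def foo(a):
    return a + GLOBAL_CONSTANT

# foo.captured_externals now maps "GLOBAL_CONSTANT" to 42.
```

Note this only snapshots the names and current values; detecting later rebinds of those names is a separate problem, discussed in the open questions below.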
### Explicit and implicit externals

#### Explicit externals: decorator parameter

```python
@field_operator(externals={'foo': 42})
def foo(a: Field[..., dtype]) -> Field[..., dtype]:
    from gt4py.__externals__ import foo
    return a + foo
```

#### Implicit externals: closure vars

```python
# Alternative 1: simple Python variables
GLOBAL_CONSTANT = 42
GLOBAL_CONSTANTS = SimpleNamespace(bar=443, baz=4343)
GLOBAL_CONSTANTS = {'a': 43, 'b': 4343}  # Probably not supported

# Alternative 2: wrap values into special sentinel objects
GLOBAL_CONSTANT = gt.Global(42)
GLOBAL_CONSTANTS = gt.Global(bar=443, baz=4343)

@field_operator
def foo(a: Field[..., dtype]) -> Field[..., dtype]:
    return a + GLOBAL_CONSTANT
    # Access to nested variables
    return a + GLOBAL_CONSTANTS.bar
    # Access to mappings, ??
    return a + GLOBAL_CONSTANTS["bar"]
```

- The explicit `externals` dict is always local to the field operator. If several field operators need to share a common set of externals, either the caller explicitly forwards the values as arguments in the call, or the `externals` argument in the decorator gets the same mapping for all the operators meant to share the definitions.

### Open questions

- Should implicit externals (captured closure vars) be wrapped in a `Global` sentinel type or just use plain Python variables? (Preliminary answer: it's likely better to use a wrapper type)
  + There is no way in Python to prevent changes to the value bound to a symbol name. Thus, any _external_ symbol could be modified after some operators have been defined (compiled), which is conceptually wrong.
  + Using plain Python variables is appealing since it does not introduce new syntax and interoperates better with other Python code. However, with this approach the only possible action in case of changes to external symbols in embedded execution is to warn the user and stop the execution.
  + Using a specific `gt4py` wrapper type like `Global()`, the check for possible value changes could be implemented more efficiently, and it could even be possible to implement the right semantics in embedded mode (use the value found at compilation time, not at execution time) using `contextvars` (which also requires all `Global` symbols to be defined at global module level).
- Which data types should be supported?
  1. Only simple numeric scalars
  2. Numeric scalars and arrays/tensors
  3. Numeric scalars plus arrays/tensors and mappings/namespaces of scalars
  4. Numeric scalars plus arrays/tensors and mappings of nested arrays/tensors and mappings of scalars
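The compile-time-value semantics discussed above can be sketched with a snapshotting decorator; all names here (`Global`, `field_operator_sketch`) are illustrative, not gt4py API:

```python
# Sketch: why a wrapper helps. With plain variables, rebinding after
# "compilation" would silently change embedded-execution results; a
# wrapper whose value is snapshotted at decoration time preserves the
# compile-time semantics.
class Global:
    def __init__(self, value):
        self.value = value

SCALE = Global(2)

def field_operator_sketch(func):
    # Snapshot the value of every Global visible to the function at
    # decoration ("compile") time.
    snapshot = {name: g.value for name, g in func.__globals__.items()
                if isinstance(g, Global)}
    def wrapper(*args):
        return func(*args, _globals=snapshot)
    return wrapper

@field_operator_sketch
def scale(a, _globals=None):
    return a * _globals["SCALE"]

result_before = scale(3)  # uses the snapshot taken at definition time
SCALE = Global(10)        # rebinding after definition has no effect
result_after = scale(3)   # same result as before the rebind
```

A `contextvars`-based variant would additionally allow scoped overrides per execution context, at the cost of requiring all `Global` symbols to live at module level.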