Field view frontend working document

# Field view frontend working document ###### tags: `frontend` [TOC] ## Useful links: - [Array API terms](https://data-apis.org/array-api/latest/purpose_and_scope.html#terms-and-definitions) - [Xarray terminology](http://xarray.pydata.org/en/stable/user-guide/terminology.html) - [mypy playground](https://mypy-play.net) - [Python PEPs Index](https://www.python.org/dev/peps/) ## Action points __Week 1__ - [X] Preliminary list of domain scientist candidates for feedback - Jeremy - Christian - David Leutwyler - Anurag? - Stefano? - Oli? - Sascha? (MCH) - Stephanie? (MCH) - [X] Ideas to define the evaluation/feedback process with users - Are all use cases covered? Are there remaining issues / dealbreakers? - What can be improved? (Likes, Wishes, What if?) - Send example + mock API / syntax documentation before first session - 3rd week: First session - Streamlined explanation of syntax, show example (and gather feedback) - Let them play with the examples - 4th week: Second session starts with feedback from them - 5th, 6th week implement feedback, document decisions to implement now, later or not at all - [x] Skeleton for the frontend specification (API design) __Week 2__ ## Feedback *Status* On hold, to be continued after MeteoSwiss workshop. ### Template for asking participants ### Scratchpad #### First email / slack (propspective participants) - [X] ask if willing to give feedback - [X] give impression of time requirement Hi XXX The GT4Py Team is working on the next iteration of GT4Py adding support for non-Cartesian grids. We would like to get some early stage feedback from users with experience in the domain. Our idea is to schedule two meetings. In the first one we would like to present the draft interface and in a follow-up meeting during the following week gather your feedback and opinions. In between you will need to take some more time to study the examples and make up your opinions. Total time required: 3.5 - 5.5 h over a week. If you would be willing, please let us know your availabilities in the week of the Xth to the Yth for a 1 hour meeting. Best, The GT4Py Team #### Second email / slack (interested participants) - [ ] detailed schedule when & what ## Quick start ```python= @gt.local_operator def average(inp: Field[(I, J), DT], *, position: Position[I, J]): i, j = position return 1/5*(inp[i, j] + inp[i-1, j] + inp[i+1, j] + inp[i, j-1] + inp[i, j+1]) @gt.field_operator def double_average(inp: Field[(I, J), DT]): return average[inp.domain](average[inp.domain](inp)) ``` ## API ### Dimension ```python= @dataclass(frozen=True) class Dimension: name: str ``` ### Index ```python= @dataclass(frozen=True) class Index: dimension: str value: int ``` *Usage*: ```python= idx = Index("I", 0) ``` ### Index Range ```python= @dataclass(frozen=True) class IndexRange: dimension: str start: int stop: int ``` *Usage*: ```python= index_range = IndexRange("I", 0, 10) ``` ### Position ```python= @dataclass(frozen=True) class Position: indices: Tuple[Index, ...] ``` *Usage*: ```python= pos = Position(Index("I", 0), Index("J", 1)) ``` ### Domain ```python= @dataclass(frozen=True) class Domain: ranges: List[IndexRange] ``` *Usage*: ```python= domain_1d = Domain(IndexRange("A", 0, 5)) assert Position(Index("A", 0)) in domain assert Position(Index("A", 4)) in domain assert Position(Index("A", 5)) not in domain domain_2d = Domain(IndexRange("A", -3, 3), IndexRange("B", -3, 3)) a_intersect_b = a & b # intersection a_union_b = a | b # union # syntactic sugar domain_3d = Domain(("A", -3, 3), ("B", -2, 2), ("C", -3, 3)) assert ("A", 4) not in domain_3d ``` ### Field ```python= @dataclass(frozen=True) class Field: domain: Domain # no image needed in public api def __getitem__(self, pos: Position): ... @dataclass class MaterializedField(Field): image: NDArray @dataclass class FunctionalField(Field): func: Callable[[Position], Any] def FieldFactory(...): ... ``` *Usage*: Fields can be constructed by passing a domain and either a function or an image to the factory. The return value is the corresponding Field subclass. ```python= domain = Domain(IndexRange("A", -3, 3), IndexRange("B", -3, 3) field = FieldFactory(domain, function=lambda pos: pos) other_field = FieldFactory(domain, image=some_ndarray) test_position = Position(Index("A", 0), Index("B", 0)) assert field[test_position] == test_position ``` #### Arithmetic `Field`s support the regular (element-wise) arithmetic operations `+`, `-`, `*`, `/`. The resulting field's domain is the intersection of the operand's domains. ```python= field_ab = field_a + field_b field_ab.domain == (field_a.domain & field_b.domain) field_ab[pos] == field_a[pos] + field_b[pos] ``` #### Remap ```python= cell_domain = ... field_ij: Field[(I, J), DT] = FieldFactory(cell_domain, function=lambda pos: pos) # relocate field_sij: Field[(I+1/2), J] = field_vertex(I+1/2) field_sij.domain == cell_domain.translate(I, 1/2) # WIP # shift field_sij: Field[(I+1/2), J] = field_vertex(I+4.5) field_sij.domain == cell_domain.translate(I, 4.5) # WIP # laplacian lap = -4*field_a + field_a(I-1) + field_a(I+1) + field_a(J-1) + field_a(J+1) ``` #### Local operator *Usage*: ```python= @gt.local_operator def laplacian(inp: Field[(I, J), DT], *, position: Position[I, J]) -> Field: i, j = position return -4*inp[i, j]+inp[i-1, j]+inp[i+1, j]+inp[i, j-1]+inp[i, j+1] ``` *Lifting* ```python= field_op: FieldOperator = local_op[domain] ``` #### Field operator composition We have already seen examples of field operators in the form of arithmetic between fields or lifted local operators. In many cases it is useful to compose field operators into new field operators, e.g. a laplacian can be composed of a divergence and a nabla operator. ```python= @gt.field_operator def field_op(inp: Field) -> Field: tmp: Field = field_op1(inp) return field_op2(tmp) ``` ## Concepts ### Dimension A *dimension* is a just a tag or type for [indices](#index). ### Local dimension A *local dimension* is a just a tag or type for [local indices](#index). ### Index An index $i_I$ is a signed integer $i \in \mathbb{Z}$ tagged with a [dimension](#Dimension) $I$. The set of all indices of a particular dimension is called an axis. ### Local index A local index $\hat{i}_{\hat I}$ is an unsigned integer tagged with a [local dimension](#Local-dimension). ### Index range An *index range* $[a_I, b_I[$ is the set of all indices between two indices $a_I$, $b_I$ of same dimension $I$, i.e. $\{a_I, \dots, b_I-1\}$ ([wikipedia](https://en.wikipedia.org/wiki/Interval_(mathematics)#Integer_intervals)). As such an *index range* is a subset (without "holes") of an axis. Note that $b_I$ is not an element of the index range. ### Position A *position* is a tuple of indices of different dimension uniquely identifying an element, e.g. $(0_I, -1_J, 2_K)$. ### Domain A *domain* is a cartesian product of index ranges of different dimension. ### Field A *field* is a function $f: D \to V, pos \mapsto v$, with $D$ a domain and $V$ a set of values. __Remap__ The $remap$ operation maps a field $f$ on elements of one type to a new field $g$ on elements of another type such that the resulting field maps elements $remap(f, c) = g$ $g(pos) = f(c(pos, 0))$ (Cartesian) $g(pos) = pos \mapsto (\hat{i} \mapsto f(c(pos, \hat{i})))$ (Unstructured) ### Connectivity A *connectivity* is a function $c_{\hat{dim}}: pos \mapsto (nb\_pos_1, \ldots, nb\_pos_n)$, mapping a position $pos$ to a set of incident positions $nb\_pos_i$. ### Local connectivity A local connectivity $\hat{c}: \hat{i}_\hat{I} \mapsto i_I$ is mapping from a local index $\hat{i}_\hat{I}$ to a (global) index $i_I$. ### Local operator A *local operator* is a function $\hat{op}: (pos, f_1, \ldots, f_n) \mapsto v$, where $pos$ is a position, element of the domain $D$, $f_1, \ldots, f_n$ a set of (input) fields, and $v \in V$ a value. Local operators are used to express what is commonly known as a stencil operation: Given a position we calculate a new value based on the value of the input fields in its neighborhood. ### Field operator A *field operator* is (higher-order) function $op: (f_1, \ldots, f_n) \mapsto f_{\text{out}}$, where $f_1, \ldots, f_n, f_{out}$ are a set fields. In other words a field operator is a function mapping fields to a field. ## Advanced examples ## Scratchpad __Dimension__ A *dimension* is a just a tag or type for [indices](#index) ```python= # Dimension as (empty?) subclasses of "Dimension" class Dimension: # Optional: # def __init_subclass__(cls): # #assert cls.__dict__ is empty # Dimension.__cache__[cls.__module__. + cls.__name__] = cls # def __init__(self): # # assert False, "Do NOT instantiate this class" ... I(3) == Index(I, 3) I[3:9] == IndexRange(I, 3, 9) class I(Dimension): ... # Optional for the user # discouraged for manual use due to name duplication # I = new_type("I", (Dimension,)) # I: Dimension = Dimension("I") # Dimensions as instances # class Dimension(Generic[T]): # name: T # def __new__(): # ... # def index(self, value): # ... # I = Dimension("I") # I: Type[Dimension] = new_type(...) # Dimensions as types # I: str = "I" # Dimensions as strings ``` __Index__ An index is an integer in $\mathbb{Z}$ tagged with a [dimension](#Dimension), i.e. an element of an axis. ```python= # Indices as instances DimT = TypeVar("DimT", bound=Type[Dimension]) class Index(Generic[DimT]): dimension: DimT value: int # Probably instances should be valid indices # def __index__(self) - > int: ... i: Index[I] = Index(I, 0) # dimension may have factory # i2: Index[I] = I.index(1) # nested index class under dimension class: # discarded because indices all behave the # same independently of dimension -> duplication ``` TODO: example math syntax __Index range__ An *index range* is the set of all indices between two indices of same dimension, e.g $\{1_I, 2_I, 3_I\}$, $\{-1_I, 0_I, 1_I\}$ ([wikipedia](https://en.wikipedia.org/wiki/Interval_(mathematics)#Integer_intervals)). As such an *index range* is a subset of an axis. ```python= class IndexRange(Generic[DimT]): dimension: DimT start: int # Included stop: int # Not included # bounds: [start, stop) # TODO: define arithmetic operators for ranges # Iterable[int] # def __iter__(self): .... # Indexable # def __getitem__(self, index: int) -> Index: .... i_range: IndexRange[I] = IndexRange(I, (-2, 10)) ``` __Domain__ A *domain* is a cartesian product of index ranges of different dimension. _We may need to revisit this name in the future (other alternatives: IndexSpace, Space, Region._ ```python= class Domain: ranges: Tuple[IndexRange, ...] def __contains__(self, item): ... # Iterable[Position] # Indexable: # def __getitem__(self, Tuple[int, ...]) -> Position: Domain((I, (-2, 10)), J=10, K=10) domain = Domain(i_range, j_range) domain = i_range * j_range domain.add_subdomain("interior", domain[1:-1, 1:-1]) # what about passing other_domain[1:-1, 1:-1]. assert is subset? domain["interior"] == domain[1:-1, 1:-1] domain[1:-1, 1:-1] = "interior" # subdomains ``` TODO: Domain should be indexable with (absolute) Indices and relative indices __Position__ An element of a domain, i.e. a tuple of indices. ```python= Position = Tuple[Index, ...] class Position: indices: Tuple[Index, ...] # optional: def __getitem__(self, item: int | str): ... Position(Index(I, 10), Index(J, 30)) ``` __Field__ (proposal 1: "always with halos") A field is a datastructure mapping positions to values. As such it can be thought of as a function $f: D \to V, pos \mapsto v$, with $D$ a domain and $V$ a set of values. ---- A *field* is a function $f: D \to V, pos \mapsto v$, with $D$ a domain formed by cartesian product of integer ranges, and $V$ a set of values. In other words a field maps a position to a value. Operations with fields: - _map_: given a field $f: D \to V$ and a function $g: V \to W$, $map(g, f) = g \circ f$ - _imap_: given a field $f: D \to V$ and a surjective function $h: A \to B \subseteq D$, $imap(h, f) = f_{|B} \circ h$ # TODO: restrict h to inverse image of D - _restrict_: given a field $f: D \to V$ and a subdomain $D^\prime$ (i.e. $D^\prime \subset D$), $restrict(f, D^\prime) = f_{|D^\prime}: D^\prime \subset D \to V$ such that $f_{|D^\prime}(pos) = f(pos)$. In other words $f_{|D^\prime}$ is obtained from $f$ by choosing a smaller domain $D^\prime \subset D$. - _combine_: given two fields $f_1: D_1 \to V$ and $f_2: D_2 \to V$ where $D_1 \cup D_2$ forms a valid domain (i.e., a product set of integer ranges), $combine(f_1, f_2) = f: D_1 \cup D_2 \to V$ such that $f(x) = f_1(x)$ if $x \in D_1$ and $f(x) = f_2(x)$ if $x \in D_2$ - what if $D_1 \cap D_2 \neq \{\}$ ? - _unrestrict_: the inverse of restrict, that is, $unrestrict(f_{|D^\prime}) = f: D \to V$. ---- ```python= class Field: domain: Domain image: npt.NDArray[DT] data: npt.NDArray[DT] origin: Tuple[int, ...] # might require dimensions info def __getitem__(self, pos: Position): # check pos in self.domain ... def __op__(self, other) -> Field: assert self.domain == other.domain # pseudocode: Field( domain=self.domain, image=(op(image[position], other.image[position]) for position in domain) ) def remap(self, conn: Callable[[SourcePos], TargetPos]) return Field(map(conn, self.domain), data=self.data, origin=self.origin) def __shift__(self, dimension, offset) -> Field: return Field( domain=self.domain, data=self.data, origin=shift(self.origin, dimension, offset), # origin = shift_origin(self.origin, dimension, offset) # TODO: workout if origin needs to hold more info than a Tuple[int, ...] ) # Shift and arithmethic operation result = field_a(I-1) * field_a(I+1) # Restrict domain operations (with or without halos) sub_field = select(field_a, subdomain) => field_a[subdomain] restricted_field = restrict(field_a, subdomain) => field_a[subdomain].drop_halo() # Combine fields with adjacent domains field_a = Field((I[0:10], J[0:10]), ...) field_bottom_left = Field((I[-2:0], J[-2:0]), ...) merged_field = combine(field_a, field_bottom_left) # TODO: type annotations interface for Fields T = TypeVar('T') IJFIeld = Annotated[Field[T], (I, J)] IJField[VType[float]] # -- should be supported -- I = Dimension(...) D = Domain(...) A = Field(D, some_buffer) B = Field(D, some_other_buffer) C = A + B D = A + shift(B, I + 1) assert shift(B, ...).domain == B.domain # always ``` __Field__ (proposal 2: no halos) A *field* is a function $f: D \to V, pos \mapsto v$, with $D$ a domain and $V$ a set of values. In other words a field maps a position to a value. ```python= class Field: domain: Domain image: npt.NDArray[DT] def __getitem__(self, position_or_subdomain): if isinstance(position_or_subdomain, Position): position = position_or_subdomain return self.image[position-self.domain.origin] # (ricoh) The domain concept does not allow inference on what domain.origin would be # nor does the accompanying example elif isinstance(position_or_subdomain, Domain): subdomain = position_or_subdomain slices = (slice(r2.start - r1.start, r2.stop-r1.stop) for r1, r2 in zip(self.domain.ranges, subdomain.ranges)) # (ricoh) assuming domain range for dimension I is (0, 4) # and subdomain range for dimension I is (1, 3) # then this would return image[-1:1] # It is hard to match this to the concepts it is supposed to illustrate return Field(subdomain, self.image[slices]) raise TypeError() ``` __Field operator__ A *field operator* is function $op: (f_1, \ldots, f_n) \mapsto f_{\text{out}}$, where $f_1, \ldots, f_n, f_{out}$ are a set of fields. In other words a field operator is a function mapping fields to a field. __Restriction__ In agreement with the usual concept of a [restriction](https://en.wikipedia.org/wiki/Restriction_(mathematics)) of a function, the restriction of a field $f: D \to V$ to a domain $D^\prime$ is a new field $f_{|D^\prime}: D^\prime \subset D \to V$ such that $f_{|D^\prime}(pos) = f(pos)$. In other words $f_{|D^\prime}$ is obtained from $f$ by choosing a smaller domain $D^\prime \subset D$.  __Field__ (proposal 2: no halos) A *field* is a field $f: D \to V$ together with a "backing field" $g: \mathbb{D} \to \mathbb{V}$ such that $\forall pos \in D: \, f(pos)=g(pos)$. As such $D \subset \mathbb{D}$ as well as $V \subset \mathbb{V}$. For sake of simplicity we denote the backing field of $f$ as $backing\_field(f)$. Alternative: ```python= class FieldOpMixin: def __op__(self, other): @local_operator def my_local_op(a: Field, b: Field): return op(a[pos], b[pos]) return my_local_op[self.domain](self, other) class Field(FieldOpMixin): domain: Domain image: npt.NDArray[DT] def __getitem__(self, item): return self._backing_field(item) class FieldWithHalo(FieldOpMixin): domain: Domain _backing_field: Field @property def image(self): return self._backing_field[self.domain] def __getitem__(self, item): return self._backing_field(item) def __shift__(self, dimension, offset) return Field(self._backing_field.domain.translate(dimension, offset)).transparent_view(self.domain) for_diff = combine(inp[0], inp[0:-1]-inp[1:], inp[-1]) # results in a field with a bigger domain than inp: # 1 + len(inp) - 1 + 1 = len(inp) + 1 ``` Difference in user code: ```python= a = Field(...) b = Field(...) a + b[I + 1] # vs. a[domain[:-1, ...]] + b[I + 1] # "domain shrinking" inferred for a? # because b[I+1].domain is 1 smaller in I direction # so a has to be shrunken accordingly ```