Description of the Optimization Intermediate Representation (OIR)

# Description of the Optimization Intermediate Representation (OIR) This document gives an overview of the OIR of the gtc backends of GT4Py. A related read is the https://github.com/GridTools/concepts/blob/master/GTScript-Parallel-model.md which defines gt4py's "Parallel Model". The parallel model is a specification of a semantic behavior, to which any optimization must remain equivalent. Most importantly, it specifies that the behavior is as if there were an iteration over the levels of the K axis (potentially parallel if possible based on reads patterns), while each of those loops contains a sequence of assignments which are executed in parallel for the grid points of the IJ planes. Currently, this ordering of a K loop on the outside and IJ maps inside is still reflected in OIR, although up for discussion. (It is not always possible to exchange the loop order, depending on the access pattern, but it is of course sometimes desireable.) ## Overview All IRs in EVE/GTC are currently based on pydantic models. (Although a move to a different framework is in the works, but that will not conceptually change anything.) That means they are data classes for which attribute types and other validity constraints are enforced when they are initialized. The OIR is an IR which is still the same for all code generation backends which are based on gtc. At this level, the main focus is on optimizing the execution order by grouping the individual computations in maps and loops. OIR is not yet set in stone, and as features are added in the coming weeks and months, some attributes will be added to some of the nodes. ## OIR Nodes The most interesting OIR nodes are the following which are concerned with the execution order / grouping into loops and maps. * `Stencil` This is the root node of OIR. In addition to a list of K loops (`VerticalLoop` in the `vertical_loops` attribute) it holds information about the fields. In the SDFG representation as implemented in the ShapeUp cycle in Jan/Feb'21, the `Stencil` is an SDFG, where `VerticalLoop`s are represented by `VerticalLoopLibraryNode`s holding the same information. ```Python class Stencil(LocNode, SymbolTableTrait): name: Str params: List[Decl] # declarations (name and dtype) of API fields and scalar parameters vertical_loops: List[VerticalLoop] # the loop nests in sequential order, see below declarations: List[Temporary] # declarations of temporaries (transients) ``` * `VerticalLoop` A grouping of k loops that ideally operate on different vertical sections of the same fields. They all have the same loop direction (forward, backward, or if possible, parallel). This grouping is useful to keep fields cached for different vertical sections although not the same exact computations are executed. ```Python class VerticalLoop(LocNode): loop_order: common.LoopOrder # forward, backward loop or parallel sections: List[VerticalLoopSection] # different computations to be # executed for different k levels, # see below caches: List[CacheDesc] # declarations of how to cache fields # in this VerticalLoop ``` * `VerticalLoopSection` The actual k loops, covering a range of k values defined by `interval`, running the 2d `horizontal_executions` on each level. In the SDFG representation, each of these loop sections is itself another SDFG, where the `HorizontalExecution`s are representd by `HorizontalExecutionLibraryNode`s holding the original `HorizontalExecution` node as a property. ```Python class VerticalLoopSection(LocNode): interval: Interval # k-levels on which the horizontal executions of this # VerticalLoopSection are to be applied horizontal_executions: List[HorizontalExecution] # ``` * `HorizontalExecution` The mask is a boolean statement, which _masks_ the horizontal execution in the sense that the actual assign statements are only evaluated for grid points where the mask expression evaluates to true. Each statement, as well as the mask expression, comprises of an ast-style tree of (unary, binary, ternary) operations and (field, scalar, literal) references. ```Python class HorizontalExecution(LocNode): body: List[Stmt] mask: Optional[Expr] # a boolean expression (3 dimensional or scalar) # body is only ran for grid points where the mask is true declarations: List[LocalScalar] ``` ## Examples ### Horizontal Stencil #### GTScript ```Python def horizontal_conditional(lap_field: Field3D, in_field: Field3D, flx_field: Field3D): with computation(PARALLEL), interval(...): res = lap_field[1, 0, 0] - lap_field[0, 0, 0] if (res * (in_field[1, 0, 0] - in_field[0, 0, 0])) > 0: flx_field = 0 else: flx_field = res ``` #### OIR ![](https://i.imgur.com/Rf29fq3.png) #### SDFG "Outer" stencil-level SDFG ![](https://i.imgur.com/WRsZPzH.png) One of the "inner" `VerticalLoopSection`-level SDFGs ![](https://i.imgur.com/pqDVE8v.png) ### Vertical Computation Tridiagonal Solver #### GTScript ```Python def tridiagonal_solver(inf: Field3D, diag: Field3D, sup: Field3D, rhs: Field3D, out: Field3D): with computation(FORWARD): with interval(0, 1): sup = sup / diag rhs = rhs / diag with interval(1, None): sup = sup / (diag - sup[0, 0, -1] * inf) rhs = (rhs - inf * rhs[0, 0, -1]) / (diag - sup[0, 0, -1] * inf) with computation(BACKWARD): with interval(-1, None): out = rhs with interval(0, -1): out = rhs - sup * out[0, 0, 1] ``` #### OIR ![](https://i.imgur.com/J938ob3.png) #### SDFG "Outer" stencil-level SDFG ![](https://i.imgur.com/nN0L9G9.png) The (in this case only) "inner" `VerticalLoopSection`-level SDFG ![](https://i.imgur.com/8KVYxlI.png) ## Currently implemented transformations (with visitors) Currently, mainly different varieties of fusions are implemented with node visitors: * `GreedyMerging` Merges the statements of two adjacent `HorizontalExecution`s to the same one. Despite the name, there is no performance model involved, i.e. no guarantee that a fusion choice is locally optimal. It just fuses anything that is valid from a parallel model perspective. * `AdjacentLoopMerging` Similar to `GreedyMerging`, but on the level of `VerticalLoop`s. It is mainly about grouping `VerticalLoopSections` that operate on the same fields in the same `VerticalLoop` so that the fields can be cached in k direction. * `OnTheFlyMerging` `HorizontalExecution`s within a `VerticalLoopSection` are fused by introducing redundant computations instead to replace stores and loads Other smaller transformations are more in a "house keeping" fashion such as turning fields into local variables if possible, pruning unread stores or detecting fields which can be cached as a result of above merging transformations. ## Possible use cases for dace On this level of abstraction, mainly for the former two fusion types, DaCe could be useful to * detect more fusion choices, currently only try to fuse elements that are adjacent in list per node * estimate performance impact of possible fusion choices to build (greedy or better) fusion decisions (by estimating compute / memory bounds) Conversely, based on the graph representation, fissions could be performed. The inner SDFG on the level of `VerticalLoopSection`s with `HorizontalExecution`s as nodes could be split into separate `VerticalLoop`s s.t. they can later be merged together with computations working on the same data rather than e.g. conceptually similar operations working on different data as they are sometimes specified by the users.

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.