# [GT4Py] GTIR for DaCe

<!-- Add the tag for the current cycle number in the top bar -->

- Shaped by:
- Appetite (FTEs, weeks):
- Developers: <!-- Filled in at the betting table unless someone is specifically required here -->

Taken over from cycle 23.

## Problem

We want to enable lowering ICON4Py programs to DaCe SDFGs via GTIR.

Status:

- new itir/gtir type inference merged (https://github.com/GridTools/gt4py/pull/1531)
- itir embedded is upgraded to gtir (https://github.com/GridTools/gt4py/pull/1530)
- new DaCe lowering architecture (https://github.com/GridTools/gt4py/pull/1538) and additional features
    - `neighbors`, `reduce` in progress

The goal is to be able to verify the ICON4Py program [velocity advection stencil 1 to 7](https://github.com/C2SM/icon4py/blob/main/model/atmosphere/dycore/src/icon4py/model/atmosphere/dycore/fused_velocity_advection_stencil_1_to_7.py), which was difficult to optimize in ITIR.

### Status after cycle 23 - 07/24

Many of the pieces are implemented and merged:

- Embedded execution of GTIR
- Translation to DaCe for many features

Many are implemented but not merged to main yet:

- lowering from frontend to GTIR
- domain propagation (needs extensions)
- more features in the DaCe lowering

For cycle 24, we did a status check by merging all pieces into a single branch.

Status as of the betting table:

- 50/83 icon4py tests in the dycore are passing with GTIR embedded (-> lowering + domain propagation is working)
- 33/83 are passing with the DaCe backend.

The failing tests are caused by features that are not yet implemented, see the status section below.

## Appetite

<!-- Explain how much time we want to spend and how that constrains the solution -->

## Solution

Progress and discussions are sketched in the [previous project](https://hackmd.io/QK_2lijFS5-4oHBdLfO6VQ).
### Lowering foast to GTIR

- `lift` -> `as_fieldop`
- TODO discuss how we integrate lowering + domain propagation as a new backend with embedded

### Transformation passes

- function inlining
- domain propagation to `as_fieldop` (in progress)
- CSE must not extract functions for DaCe lowering (but is optional in the first vertical slice)

### Lowering GTIR to DaCe

- finish
    - `shift`
    - `neighbors`, `reduce` (with/without skip values)
- `concat_where`
- support for tuples (return tuples, tuple expressions and tuple arguments)
- let-lambdas (otherwise we need to inline everything)

### Other work to keep GT4Py consistent

- upgrade remaining passes
- add new `select` builtin to all backends

## Rabbit holes

<!-- Details about the solution worth calling out to avoid problems -->

## No-gos

<!-- Anything specifically excluded from the concept: functionality or use cases we intentionally aren’t covering to fit the ## appetite or make the problem tractable -->

## Progress

<!-- Don't fill during shaping. This area is for collecting TODOs during building. As first task during building add a preliminary list of coarse-grained tasks for the project and refine them with finer-grained items when it makes sense as you work on them.
-->

- [ ] DaCe PRs (Edoardo)
    - [x] Skeleton PR
    - [x] shift PR
    - [x] neighbor + reduce
    - [x] add type inference
    - [x] temporary array domain
    - [x] let expression support
    - [x] where
    - [x] as_type
    - [ ] tuples on field level (including input/output to program)
- [ ] IR changes
    - [ ] Remove `cond` and use `if_` everywhere
- [ ] all passes updated to GTIR
    - [x] CSE pass (Till)
    - [x] trace_shift (PR)
    - [x] all others except temporary (PR)
    - [ ] temporaries
- [ ] concat_where improvements
- [ ] index builtin
    - [ ] to GTIR lowering
    - [ ] DaCe
- [ ] Lowering FOAST to GTIR (Enrique, Hannes)
    - [x] foast lowering: almost done (all tests pass)
    - [ ] past lowering (seems ok, )
    - [ ] remaining untested foast lowering
        - [x] as_type
        - [x] unary ops (+/-)
        - [x] where / scalar-if-stmt (in frontend field_operator)
    - [ ] gtir embedded backend starting from field view
        - [x] backend exists
        - [ ] run all tests
- [ ] GTIR embedded
    - [ ] handle cases where the result of `as_fieldop` produces a `List`
- [x] Domain propagation (Sara)
    - [x] nested `as_fieldop`
    - [x] starting from `set_at`
    - [x] unstructured (max domain approach)
    - [x] support let-stmt
    - [x] cond
    - [x] tuple support
    - [x] bugs in complicated lambda structures resulting from SSA pass (if InlineLambdas is disabled)
- [ ] Temporaries
- [x] if_stmt (Till)
    - [x] embedded
    - [x] gtfn
- [ ] cond
- [ ] temporary pass
- [ ] **first combined program of icon4py**
    - [ ] gtir lowering, dace lowering, domain propagation, passes all together
    - [ ] problems:
        - [ ] full tuple support
        - [ ] how to handle cond (see open questions)
        - [ ] need to run with partially inlining lambdas, otherwise the tree explodes, but domain propagation doesn't deal with all structures yet
    - [ ] add the dace backend (which combines the steps above)
- [ ] open questions
    - [ ] something like `as_fieldop` returning tuple of fields with different domain (this is extending what we currently support in gt4py)
    - [ ] the first argument of `cond` seems to be the only place
that has scalar operations in a pure field view. How do we handle that? (What do we do in lowering the domain?)
- [ ] complete support
    - [ ] scan
    - [ ] as_offset
    - [ ] Temporary deallocation

## Discussions

### Maps

- as a start: use maximum domain always
- indices fully concrete, requires compilation for each rank
- abstract domain representation that is understood by DaCe transformations (express relations between domains)
- abstract domain representation in DaCe (think of: one symbol for each domain)

### Implicit domain

```
as_fieldop(stencil, domain)(*args)
as_fieldop(lambda x: deref(x))(*args)
```

## If, Select, Cond

```
if_(bool, scalar, scalar)
#where(field, field, field)
cond(bool, field, field)
if_stmt
```

### 2024-08-14: cond's first argument as scalar or 0d-field

```python
cond(a<b, c, d)
```

Problem: `a < b` is a scalar operation, but the current lowering only deals with field arithmetic.

Possible solutions:

1. in lowering make a special case in `_map` that lowers operations on scalars without `as_fieldop`
2. lower as 0d-field. Needs some special casing to extract the value from the 0d-field in embedded
3. multi stage lowering:
    - first stage: just keep the operation without `as_fieldop` (polymorphic operations)
    - second stage: resolve polymorphism, e.g. `plus -> plus`, `plus -> field_plus` and add library functions for `field_plus = lambda a, b: as_fieldop(plus_stencil)(a,b)`

We go with 1, but evolve to 3. We definitely don't do 2.

### 2024-08-14: Tuples and aliasing in/out of field_operators

```
gtir_tuple_return(x, y, size) {
  {x, y} @ c⟨ IDimₕ: [0, size) ⟩ ← {as_fieldop(λ(__arg0, __arg1) → ·__arg0 + ·__arg1, c⟨ IDimₕ: [0, size) ⟩)(x, 1.0), x};
}
```

should be translated with state

![image](https://hackmd.io/_uploads/SJpz_Z9qR.png)

We then should write an optimization to get rid of the state by using the info that the domain is the same in the maps and the copies. The X->Y memlet should be converted into a copy-tasklet inside the map scope.
Even in the case where the domain is not the same we could mask parts to make it the same domain.

#### Scratch

```
let myif = cond(foo, make_tuple(a,b), make_tuple(c,d))
tuple_get(0, myif)+tuple_get(1, myif)

#ideally: transform to
cond(foo, a,c)

if foo
  tuple_val1 = a
  tuple_val2 = b
else
  tuple_val1 = c
  tuple_val2 = d
do_something(tuple_val1,tuple_val2)

cond(foo, do_something(a, b), do_something(c, d))
cond(foo, a+b, c+d)
```

**Can we express swapping as a map?**

```
gtir_tuple_return(x, y, size) {
  {x, y} @ c⟨ IDim: [0, size) ⟩ ← {y, x};
}
```

```
gtir_tuple_return(x, y, size) {
  {x, y} @ {c⟨ IDim: [0, size-1) ⟩, c⟨ IDim: [0, size-2) ⟩} ← {y, x};
}
```

```
@field_operator
def foo(inp):
    res = compute_on_size_p1(inp)
    return res, res(K)+res(K+1)
```

```
gtir_tuple_return(x, y, size) {
  {x, y} @ {c⟨ KDim: [0, size+1) ⟩, c⟨ KDim: [0, size) ⟩} ← {compute_on_size_p1(...), average_on_size(compute_on_size_p1(...))};
}
```

### 2024-08-20 concat_where

```
concat_where(TDim==1, b, c, domain=...)

concat_where(TDim<5, as_fieldop(id,domain=)(b),
concat(b[TDim<5], c[TDim>=5])

TDim < 5:
(⇑(λ(__arg0, __arg1) → ·__arg0 < ·__arg1))(index(TDim), 5)

(⇑(λ(__arg0, __arg1, __arg2) → if ·__arg0 then ·__arg1 else ·__arg2))(
  (⇑(λ(__arg0, __arg1) → ·__arg0 < ·__arg1))(index(TDim), 5), b, c
)
```

```
concat_where(TDim==1, b, c)
concat_where(TDim >= 1, concat_where(TDim <=1, b, c), c)
```

## Problems

### domain propagation

#### ~~trivial set_at~~

```
out @ c⟨ IDimₕ: [0, __out_size_0), JDimₕ: [0, __out_size_1), KDimᵥ: [0, __out_size_2) ⟩ ← __sym_1;
```

#### ~~passing literal~~

```
operator_testee(__sym_1, out, __out_size_0) {
  testee = λ(a) → (⇑(λ(__arg0, __arg1) → ·__arg0 + ·__arg1))(a, 1);
  out @ u⟨ Vertexₕ: [0, __out_size_0) ⟩ ← (⇑(λ(__arg0, __arg1) → ·__arg0 + ·__arg1))(__sym_1, 1);
}
```

#### cond not implemented

...
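Since `cond` appears in several of the discussions above, a small pure-Python model may help keep the conditionals apart. This is only a sketch under assumptions, not GT4Py code: fields are plain lists, tuples of fields are Python tuples, and `if_`, `where`, `cond` and `add` are hypothetical stand-ins named after the builtins listed under "If, Select, Cond". It also checks the tuple rewrite from the Scratch notes: selecting a tuple and then combining its elements should give the same result as selecting between the combined branches.

```python
# Toy model only; names mirror the builtins discussed above but are
# hypothetical re-implementations, not GT4Py API.

def if_(flag, a, b):
    """Scalar conditional: if_(bool, scalar, scalar)."""
    return a if flag else b


def where(mask, a, b):
    """Pointwise select: where(field, field, field)."""
    return [x if m else y for m, x, y in zip(mask, a, b)]


def cond(flag, a, b):
    """Scalar condition choosing a whole field (or tuple of fields)."""
    return a if flag else b


def add(x, y):
    """Pointwise addition of two fields."""
    return [xi + yi for xi, yi in zip(x, y)]


a, b = [1, 2, 3], [10, 20, 30]
c, d = [4, 5, 6], [40, 50, 60]

results = {}
for foo in (True, False):
    # let myif = cond(foo, make_tuple(a, b), make_tuple(c, d))
    myif = cond(foo, (a, b), (c, d))
    via_tuple = add(myif[0], myif[1])
    # rewritten form: cond(foo, a + b, c + d)
    rewritten = cond(foo, add(a, b), add(c, d))
    results[foo] = (via_tuple, rewritten)
```

In this model both forms agree for either value of `foo`, which is the property the rewrite in the Scratch notes relies on.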
### gtir embedded backend

Problems encountered when using domain propagation with the GTIR embedded backend

__`test_nested_reduction`__

```
__field_operator_testee(__sym_1, out, ____sym_1_size_0, __out_size_0) {
  out @ u⟨ Edgeₕ: [0, __out_size_0) ⟩
       ← as_fieldop(λ(__arg0) → reduce(plus, 0)(·__arg0), u⟨ Edgeₕ: [0, __out_size_0) ⟩)(
         as_fieldop(λ(__arg0) → reduce(plus, 0)(·__arg0), u⟨ Edgeₕ: [0, __out_size_0) ⟩)(
           as_fieldop(λ(it_) → neighbors(E2Vₒ, it_), u⟨ Edgeₕ: [0, __out_size_0) ⟩)(
             as_fieldop(λ(it) → neighbors(V2Eₒ, it), u⟨ Vertexₕ: [0, 9) ⟩)(__sym_1)
           )
         )
       );
}

# this (the innermost as_fieldop above) should produce something like a field of
# lists, but instead produces a tuple of fields
outer = as_fieldop(λ(it) → neighbors(V2Eₒ, it), u⟨ Vertexₕ: [0, 9) ⟩)(__sym_1)
# afterwards in as_fieldop that this is being passed to, i.e.
as_fieldop(λ(it_) → neighbors(E2Vₒ, it_), u⟨ Edgeₕ: [0, __out_size_0) ⟩)(outer)
# `it_` is an iterator of tuples which doesn't work with `neighbors`
# see `if isinstance(cur, (_List, tuple)):` check in `embedded.reduce` builtin
```

__~test_broadcast_simple~__

```
__field_operator_simple_broadcast(__sym_1, out, ____sym_1_size_0, __out_size_0, __out_size_1) {
  simple_broadcast = λ(inp) → inp;
  out @ c⟨ IDimₕ: [0, __out_size_0), JDimₕ: [0, __out_size_1) ⟩ ← __sym_1;
}
```

Error:

```
E   ValueError: Incompatible 'Domain' in assignment. Source domain = 'Domain(IDim[horizontal]=(0:10))', target domain = 'Domain(IDim[horizontal]=(0:10), JDim[horizontal]=(0:10))'.
```

Conclusion: FOAST lowering should emit an `as_fieldop(deref)(...)` (for the explicit broadcast) then the transformations can remove it again.

__`test_ternary_operator`__

`cond` builtin is not implemented in embedded backend so this test fails.

__`test_reduction_expression_in_call`__

This field operator contains an `as_fieldop` whose result is never used (`tmp_nbh_tup[1]`).
As such we cannot infer a domain for it and the domain inference fails with `ValueError: 'target_domain' cannot be 'None'.`. In order to fix this we need to execute the `CollapseTuple` pass before the domain inference. That, however, is not possible since it requires the type inference, which requires the domain inference, creating a circular dependency. The proposed solution is to remove the dependency of the domain propagation pass on the type inference. Then we can execute `CollapseTuple` before domain inference. This is a side-effect of this project: https://hackmd.io/0HIdi4g9TDWNr25n14BkpA

### 2024-09-30 Preparation for Cycle 25

#### Domain inference for unused tuple elements

TODO

#### Should set_at work on scalar?

Scalars at the `set_at` level should be wrapped in a 0-d `as_fieldop` (decided at QPM, possibly to change in the future). Should be fixed in lowering.

#### ALL_NEIGHBORS not handled in translate

See https://github.com/GridTools/gt4py/pull/1648/files#diff-36c97ae1e7197fbdd81532e533a2e6294febc1c23bbb1a6ef2968130e15368fb

#### Handle `map` and `make_const_list` and error "no support for nested red"

To be implemented in dace lowering.

#### Next Cycle project

##### optimizations

Before next cycle we want to have an SDFG for

- apply_diffusion_to_vn
- apply_diffusion_to_w_and_compute_horizontal_gradients_for_turbulence

Make sure that we have big enough serialized data for these 2 examples (or all) OR make sure the grid works for this case and we can run from random data.

Get the baseline from an ICON run. Optimize with DaCe and compare.

Optional: if dependencies are cleared (Python overhead), compare within Python against other implementations.

##### continue GTIR

Goal: Get everything running through GTIR in GT4Py (and delete itir lowering), except `scan`.
(if applicable)

- in field_view mode
- in itir mode

Missing features

- `concat_where`:
    - we implement the simple case first (no support for `==`) that results in proper domains
    - conceptual discussion should be closed after the next cycle, ideally implemented
- (`scan` (maybe copy the itir lowering?))
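
To illustrate the simple `concat_where` case and why `==` needs extra work, here is a minimal pure-Python sketch. It assumes 1D fields stored as lists that all start at index 0 of `TDim`; the helper names (`concat_where_lt` and friends) are hypothetical and not GT4Py API. A `<` or `>=` condition splits the dimension into two contiguous pieces, so the result is a plain concatenation, whereas `==` selects a single interior point and is modelled here through the nested rewrite noted in the 2024-08-20 discussion.

```python
# Toy model only: fields are Python lists indexed from 0 along TDim.
# All function names are hypothetical, not part of GT4Py.

def concat_where_lt(k, b, c):
    """Model of concat_where(TDim < k, b, c): b on [0, k), c on [k, n)."""
    return b[:k] + c[k:]


def concat_where_ge(k, b, c):
    """Model of concat_where(TDim >= k, b, c): c on [0, k), b on [k, n)."""
    return c[:k] + b[k:]


def concat_where_le(k, b, c):
    """Model of concat_where(TDim <= k, b, c), i.e. TDim < k + 1."""
    return concat_where_lt(k + 1, b, c)


def concat_where_eq(k, b, c):
    """Model of concat_where(TDim == k, b, c) via the nested rewrite
    concat_where(TDim >= k, concat_where(TDim <= k, b, c), c)."""
    return concat_where_ge(k, concat_where_le(k, b, c), c)


b = [0, 10, 20, 30, 40, 50]
c = [-1] * 6

simple = concat_where_lt(5, b, c)  # b below index 5, c from index 5 on
nested = concat_where_eq(1, b, c)  # b only at index 1, c elsewhere

# Reference: pointwise selection over the full index range.
reference = [bi if i == 1 else ci for i, (bi, ci) in enumerate(zip(b, c))]
```

In this toy setting the nested form agrees with the pointwise reference. The sketch does not model the case where `b` and `c` live on different domains, which is where the question of "proper domains" actually arises.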