# [DaCe] Fixing ITIR to SDFG Issues For a Comparison

- Shaped by: Philip
- Appetite (FTEs, weeks):
- Developers:

## Problem

This document should be seen as an extension of [[DaCe] ITIR vs. Field View – The Best Representation](https://hackmd.io/@gridtools/BynVNnAu6) that focuses on the ITIR to SDFG translator, i.e. the current DaCe backend in GT4Py. Therefore, it is not a real shaped project.

#### Avoid Dynamic Allocation

The first issue is related to how certain dimensions are handled. Currently, the translator generates a symbol for the shape and stride of every array. Symbolic shapes are not a problem in themselves. However, some dimensions have a known and small upper bound; an example is `E2VDim`, which is always $2$. Keeping such shapes symbolic prevents the static allocation of certain auxiliary arrays, which currently renders the backend unusable on GPU and causes a massive penalty on CPU:

- https://github.com/GridTools/gt4py/issues/1412
- https://github.com/spcl/dace/issues/1500

#### Lowering of neighbor reductions with lift expressions

The current ITIR-2-DaCe backend does not support neighbor reductions with lift expressions. In the ITIR pre-processing stage, we force loop unrolling to get rid of these lifts. Unrolled loops clutter the SDFG, making it difficult to analyze. The problem gets even worse in fused stencils. Another, more important disadvantage is that we force a certain implementation (loop unrolling) in the front-end, while this decision should be taken by the optimization stage in the DaCe backend.

We should not support all lifts, only lifts inside neighbor reductions. We can timebox this task to 3 days; if it takes longer than that, we could drop it.

#### Symbols for Shape and Stride

Currently, every time a new array is generated, the translator generates new symbols for its shape and stride. This produces a massive number of redundant symbols.

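
To make the redundancy concrete, the following is a minimal sketch in plain Python. It is not the translator's code: the helper names (`fresh_shape_symbols`, `shared_shape_symbols`), the symbol naming patterns and the example arrays are all assumptions made for illustration. It contrasts generating fresh shape symbols per array with deriving one symbol per dimension name and reusing it.

```python
import itertools

# Hypothetical stand-ins for illustration only; not the GT4Py implementation.
_uid = itertools.count()


def fresh_shape_symbols(array_name: str, dims: tuple[str, ...]) -> list[str]:
    """Per-array scheme: every call invents new, unique symbol names."""
    uid = next(_uid)
    return [f"__{array_name}_{uid}_size_{i}" for i in range(len(dims))]


def shared_shape_symbols(dims: tuple[str, ...], cache: dict[str, str]) -> list[str]:
    """Per-dimension scheme: the name is derived from the dimension and reused."""
    return [cache.setdefault(dim, f"__size_{dim}") for dim in dims]


# Three arrays defined over the same two dimensions (assumed example data).
arrays = {
    "temperature": ("Vertex", "K"),
    "pressure": ("Vertex", "K"),
    "tmp_0": ("Vertex", "K"),
}

per_array = {
    sym for name, dims in arrays.items() for sym in fresh_shape_symbols(name, dims)
}

cache: dict[str, str] = {}
per_dimension = {
    sym for dims in arrays.values() for sym in shared_shape_symbols(dims, cache)
}

# Number of distinct shape symbols under each scheme.
print(len(per_array), len(per_dimension))
```

In this toy setting the per-array scheme grows with the number of arrays, while the per-dimension scheme grows only with the number of distinct dimensions.
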
#### New loop construct

After the upgrade to dace v0.15.1, it will be possible to replace a loop represented as a state machine with the new loop construct. The loop state machine is used in the scan operator. Please refer to this [PR](https://github.com/spcl/dace/pull/1407) and to the DaCe documentation. Since loops are not yet supported by DaCe transformations, as documented in the above PR, we have to use the utility function `dace.sdfg.utils.inline_loop_blocks`, which inlines any LoopRegions into traditional state machine loops. For the same reason, this task is not expected to give any performance improvement now, but it could in the future, since it enables loop analysis.

## Appetite

As pointed out in [[DaCe] ITIR vs. Field View – The Best Representation](https://hackmd.io/@gridtools/BynVNnAu6), we must fix dynamic allocation, because otherwise we cannot make a meaningful comparison. Thus, we should first try to fix the bug and use the `max_neighbor` property. However, if we are not able to fix the issue after **3** days, we should stop and switch to the emergency plan, i.e. making all sizes fixed, which should take about 2 days.

## Solution

We now outline possible solutions to the different parts of the project.

#### Avoid Dynamic Allocation

The offset providers carry the `max_neighbor` property, which is an upper bound. As far as I can tell, this only applies to dimensions with `DimensionKind.LOCAL`. With this value we are able to create a statically allocated array of that size. To maintain compatibility with other code, a view with dynamic size can be created.

Another, much simpler solution would be to just fix the sizes of all arrays. Its main advantages are that it is very simple and that it is closer to the Jax based translator.

#### Symbols for Shape and Stride

While there is the `add_storage()` function to add new arrays, some parts of the code call `dace.SDFG.add_array()` directly. However, all of them use `new_array_symbols()` to generate the symbols. Thus, we have to change it at only one point. Currently, the function uses a mangling algorithm to generate a unique symbol. While this is fine for strides, we should change the generation algorithm for shapes so that it derives the name from the dimension name and then reuses it.

## Rabbit holes

#### Avoid Dynamic Allocation

Potentially, this could trigger a cascade of changes, but for this project we will not follow all of them; we will only do what is needed to make it run.

## No-gos

The ITIR to SDFG translator has a lot of other issues that will not be addressed in this project.

## Progress

- [x] Avoid Dynamic Allocation ([PR#1430](https://github.com/GridTools/gt4py/pull/1430))
- [x] Symbols for Shape and Stride ([PR#1422](https://github.com/GridTools/gt4py/pull/1422))
    - Only applicable to local dimension of connectivity tables
- [x] New loop construct ([PR#1424](https://github.com/GridTools/gt4py/pull/1424))
- [x] Lowering of neighbor reductions with lift expressions ([PR#1431](https://github.com/GridTools/gt4py/pull/1431))
- [ ] Bugfixes:
    - [x] Fix for neighbor reduction with skip values ([PR#1443](https://github.com/GridTools/gt4py/pull/1443))
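
As a rough illustration of the "Avoid Dynamic Allocation" item, the sketch below shows how a shape could mix a fixed size for a local dimension with symbolic sizes elsewhere. It is plain Python under stated assumptions: `Dimension`, `OffsetProvider` and `array_shape` are hypothetical stand-ins, `DimensionKind` here is a local enum mimicking the one named in this document, and the `__size_<dim>` naming is invented. It is not the code from the linked PRs.

```python
from dataclasses import dataclass
from enum import Enum


class DimensionKind(Enum):
    """Local stand-in for the dimension kinds named in this document."""
    HORIZONTAL = "horizontal"
    VERTICAL = "vertical"
    LOCAL = "local"


@dataclass(frozen=True)
class Dimension:
    name: str
    kind: DimensionKind


@dataclass(frozen=True)
class OffsetProvider:
    """Hypothetical provider that only carries the neighbor upper bound."""
    neighbor_dim: Dimension
    max_neighbor: int


def array_shape(dims, offset_providers):
    """Use a fixed integer size for bounded local dimensions, a symbol name otherwise."""
    bounds = {p.neighbor_dim.name: p.max_neighbor for p in offset_providers}
    shape = []
    for dim in dims:
        if dim.kind is DimensionKind.LOCAL and dim.name in bounds:
            shape.append(bounds[dim.name])  # statically known size
        else:
            shape.append(f"__size_{dim.name}")  # stays symbolic
    return tuple(shape)


Edge = Dimension("Edge", DimensionKind.HORIZONTAL)
E2VDim = Dimension("E2VDim", DimensionKind.LOCAL)
e2v = OffsetProvider(neighbor_dim=E2VDim, max_neighbor=2)

print(array_shape((Edge, E2VDim), [e2v]))
```

Only the local dimension receives a concrete size here; the horizontal dimension remains symbolic, which matches the note that the change applies only to the local dimension of connectivity tables.
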