# DaCe backend: Reductions

Follow-up project of the [Frontend Feature Support for DaCe Backend](https://hackmd.io/XE0BswX9TOmZp0ZKh2JiGA?both) project of cycle 14

###### tags: `cycle 15`

<!-- Change to the current cycle number -->

- Shaped by: Linus
- Developers: at least Linus at 60-70% capacity
- Appetite: 6 weeks, 1 person

## Problem

<!-- The raw idea, a use case, or something we’ve seen that motivates us to work on this -->

The DaCe backend of gt4py.next has been implemented up to the point defined as hard requirements in the previous project description, as covered by the end-to-end tests in the gt4py repository. However, some tasks that were marked as optional in the previous proposal were left unfinished. In addition, we add reductions as a new requirement since they have now stabilized upstream.

This gives the following list of deficiencies that we want to tackle in this cycle, **in order of priority**:

- [ ] **Review & cleanup**: Merge the [Cycle 14 PR]. This mainly requires excluding the DaCe backend in more tests and fixing merge conflicts in tests.
- [ ] **Neighbor Reductions**: Expected to unblock a large number of stencils in icon4py.
- [ ] **Scans**: Lower priority since they can be worked around and are not a very central part of the current icon4py implementation.
- [ ] **Temporaries**: Currently, the DaCe backend assumes that all lifts have been inlined, so no temporaries are needed or implemented. While potentially interesting for analysis in future cycles and for increasing the optimization space of the DaCe backend, this does not unlock new user codes.

Further, the testing should be extended. Notably, reductions should be tested on the [fvm_nabla](./tests/next_tests/integration_tests/multi_feature_tests/iterator_tests/test_fvm_nabla.py) test scripts. In addition, stencil tests in icon4py should be used to verify the validity of the already established work, and potentially of scan operators if they are implemented.
(Stay in contact with the icon4py team to understand which parts can be run after completion of the respective tasks above.)

## Appetite

<!-- How much time we want to spend and how that constrains the solution -->

During the cycle, Linus will also work with Edoardo Paone to introduce him to DaCe and enable him to work on the library node concept.

## Solution

<!-- The core elements we came up with, presented in a form that’s easy for people to immediately understand -->

- **Neighbor Reductions**: There are simple ways to represent these in DaCe, such as for loops in tasklets.
- **Scans**: These can be implemented with the standard practice for representing loops in DaCe: state machines (usually in nested SDFGs) with backward edges. With lambdas working, this loop can be set up with the iteration bounds of the scan, with the loop state just holding the lambda call.
- **Temporaries**: This step requires some design work to map the iterator-based concepts to the array-based representation of DaCe. The open questions are not on the DaCe implementation side, however.

## Rabbit holes

<!-- Details about the solution worth calling out to avoid problems -->

The tasks listed in order of priority above also come with increasing uncertainty. A task can be skipped (e.g. scans) if large research questions arise.

## No-gos

<!-- Anything specifically excluded from the concept: functionality or use cases we intentionally aren’t covering to fit the appetite or make the problem tractable -->

Performance of the backend is not a concern. (The _temporaries_ task will enable more optimizations, yet tuning them is not part of this task.)

Complete support of iterator IR is not the goal, nor is a distinct graph for each iterator IR stencil (i.e. non-semantic changes to the IR are not necessarily reflected in the SDFG's structure). Rather, any frontend code should have some path through itir (including whatever itir transformations it takes) that gets the IR into a shape amenable to the DaCe backend.
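
## Appendix: loop structures named under Solution

The two loop shapes mentioned under Solution can be sketched in plain Python as a reference for the semantics the backend has to reproduce. This is only an illustration, not DaCe or gt4py code: the names `neighbor_sum`, `column_scan`, and `SKIP`, as well as the use of `-1` to mark a missing neighbor, are assumptions made for this sketch. The inner `for` loop of `neighbor_sum` corresponds to what a tasklet would contain, and the loop in `column_scan` corresponds to the loop state that only holds the lambda call.

```python
from typing import Callable, List, Sequence

# Assumption for this sketch: a missing neighbor is marked with -1.
SKIP = -1


def neighbor_sum(field: Sequence[float], table: Sequence[Sequence[int]]) -> List[float]:
    """Sum the field over the neighbors of each element via a connectivity table."""
    out = []
    for neighbors in table:
        # This inner loop is the part that could live inside a single tasklet.
        acc = 0.0
        for idx in neighbors:
            if idx != SKIP:
                acc += field[idx]
        out.append(acc)
    return out


def column_scan(
    step: Callable[[float, float], float], init: float, column: Sequence[float]
) -> List[float]:
    """Carry a state through a column; each iteration only calls the lambda."""
    state = init
    out = []
    for value in column:  # iteration bounds are those of the scan
        state = step(state, value)
        out.append(state)
    return out


# Hypothetical data: four edge values and a vertex-to-edge table with padding.
edge_values = [1.0, 2.0, 3.0, 4.0]
vertex_to_edge = [[0, 1, SKIP], [1, 2, 3], [0, 3, SKIP]]
vertex_sums = neighbor_sum(edge_values, vertex_to_edge)

# A running total as the simplest possible scan.
running_total = column_scan(lambda carry, v: carry + v, 0.0, [1.0, 2.0, 3.0])
```

A reference of this kind could also serve as the expected-value side of a backend test, since it depends on nothing beyond the standard library.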