# [Blueline] Enable GTIR-DaCe backend in blueline

<!-- Add the tag for the current cycle number in the top bar -->

- Shaped by: Edoardo
- Appetite (FTEs, weeks):
- Developers: <!-- Filled in at the betting table unless someone is specifically required here -->

## Problem

<!-- The raw idea, a use case, or something we’ve seen that motivates us to work on this -->

The dace backend is _almost_ working in its current state. The tests pass in icon4py, but there are still some issues in blueline testing.

This project has two objectives:

1. (most important) Finalize the backend so that it _works_ and can be enabled in the icon4py and icon-exclaim CI.
2. Implement some new features to make the backend usable in blueline, with decent performance.

## Appetite

<!-- Explain how much time we want to spend and how that constrains the solution -->

Full Cycle

## Solution

<!-- The core elements we came up with, presented in a form that’s easy for people to immediately understand -->

### Enable the dace backend in icon4py and icon-exclaim CI

For the first objective, two issues are already known and are being looked into:

- Validation error in blueline testing (field `w` is only 89.0% close to reference)
- SDFG transformation for loop-blocking has issues in icon4py (needs to be tested again on new baseline)

More issues could be reported during integration and testing. By the time we have a working backend, we can hopefully also benefit from some improvements in the SDFG optimization pipeline to reduce the duration of CI jobs (see dace project [Optimize the Optimizer](https://hackmd.io/B4GBJaEZRfmFhhgIEt_iIQ)).

### Make the dace backend usable in blueline

For this objective, we need to implement at least two new features:

- Lowering of symbolic expressions without resorting to tasklets, in order to avoid scalar-to-symbol conversion. This is known to be a performance bottleneck in SDFG lowering.
- Memory pool for allocation of transient arrays, to reduce the memory requirement for large experiments.

## Rabbit holes

<!-- Details about the solution worth calling out to avoid problems -->

## No-gos

<!-- Anything specifically excluded from the concept: functionality or use cases we intentionally aren’t covering to fit the appetite or make the problem tractable -->

## Progress

<!-- Don't fill during shaping. This area is for collecting TODOs during building. As first task during building add a preliminary list of coarse-grained tasks for the project and refine them with finer-grained items when it makes sense as you work on them. -->

- [x] Lowering of symbolic expressions without resorting to tasklets ([PR#2122](https://github.com/GridTools/gt4py/pull/2122))
- [x] Memory pool for allocation of transient arrays
  - [x] Fix in dace codegen: [PR#2103](https://github.com/spcl/dace/pull/2103)
  - [x] Change in gt4py: [PR#2130](https://github.com/GridTools/gt4py/pull/2130)
- [x] Fix validation error in blueline
  - [x] Ensure non-negative shape of temporaries in concat_where expressions ([PR#2150](https://github.com/GridTools/gt4py/pull/2150))
  - [x] Get correct symbol mapping in MoveDataflowIntoIfBody ([PR#2154](https://github.com/GridTools/gt4py/pull/2154))
  - [x] Correct dace setting to use default cuda stream ([PR#2189](https://github.com/GridTools/gt4py/pull/2189))
- [ ] Validate dace backend with dycore granule
  - [x] Upgrade GT4Py to v1.0.6: [PR#810](https://github.com/C2SM/icon4py/pull/810)
- [ ] Enable dace backend tests in ICON4Py CI: [PR#822](https://github.com/C2SM/icon4py/pull/822)