# [DaCe] Halo Exchange in fused SDFG (Optimized Lib Node)

<!-- Add the tag for the current cycle number in the top bar -->
###### tags: `cycle 21`

- Shaped by: Christos
- Appetite (FTEs, weeks):
- Developers: Christos & Edoardo

## Problem

<!-- The raw idea, a use case, or something we’ve seen that motivates us to work on this -->

In the previous cycles, we developed a stable version of the fused SDFG for the whole Diffusion module: GT4Py programs and the GHEX exchange object are SDFGConvertibles (i.e., when placed inside a DaCe program they automatically return an SDFG). However, DaCe uses the predefined patterns and domain descriptors from the `GHexMultiNodeExchange` initialization (Python API).

The current solution works without any issue. However, DaCe cannot further analyze or optimize the Halo Exchange library nodes, because everything happens inside tasklets (a black box for DaCe). We need to expose more information to DaCe in order to benefit from optimized data flow. Currently, the Python interface precomputes the domain descriptors/patterns (see the `GHexMultiNodeExchange` class), and these are baked into the halo tasklets.

## Appetite

<!-- Explain how much time we want to spend and how that constrains the solution -->

Multiple cycles.

## Solution

<!-- The core elements we came up with, presented in a form that’s easy for people to immediately understand -->

Before jumping to the optimization of the Halo Exchange DaCe library node, the following issues need to be addressed:

1. Finalize the asynchronous communication (the prototype is successful, but I need to make some further changes to GHEX, as discussed with Fabian).
2. Update the SDFGConvertible interface of the GT4Py programs, as breaking changes were introduced with the new workflows.

Once these are done, I will investigate how we can expose the domain descriptors/patterns (which are basically lists of local/global indices to be exchanged) to DaCe.
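
To make "lists of local/global indices" concrete, here is a minimal pure-Python sketch of what such a pattern could look like once it is plain data instead of being hidden inside a tasklet. It is an illustration only: `HaloPattern`, `build_patterns` and `exchange` are hypothetical names, not part of the GHEX or DaCe APIs, and the exchange is emulated in-process without MPI. The point is that, once the index lists are visible, packing and unpacking reduce to simple gather/scatter loops, which is the kind of data flow DaCe might be able to analyze.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class HaloPattern:
    """Hypothetical stand-in for the precomputed exchange pattern of one rank."""

    send: dict[int, list[int]]  # neighbor rank -> local indices to pack and send
    recv: dict[int, list[int]]  # neighbor rank -> local halo indices to overwrite


def build_patterns(
    global_ids: dict[int, list[int]], owner: dict[int, int]
) -> dict[int, HaloPattern]:
    """Derive per-rank send/recv index lists from a domain description.

    global_ids[r] lists the global indices stored on rank r (owned + halo);
    owner[g] is the rank that owns global index g.
    """
    # Map global index -> local index, separately for every rank.
    local_of = {r: {g: i for i, g in enumerate(ids)} for r, ids in global_ids.items()}
    send: dict[int, dict[int, list[int]]] = {r: {} for r in global_ids}
    recv: dict[int, dict[int, list[int]]] = {r: {} for r in global_ids}
    for r, ids in global_ids.items():
        for local_idx, g in enumerate(ids):
            o = owner[g]
            if o != r:  # halo point on rank r, owned by rank o
                # Both lists grow in the same order, so buffers line up.
                recv[r].setdefault(o, []).append(local_idx)
                send[o].setdefault(r, []).append(local_of[o][g])
    return {r: HaloPattern(send[r], recv[r]) for r in global_ids}


def exchange(fields: dict[int, list[float]], patterns: dict[int, HaloPattern]) -> None:
    """Emulate one halo exchange in-process (no MPI, no GHEX)."""
    # Pack: gather the values at the send indices into contiguous buffers.
    buffers = {
        (src, dst): [fields[src][i] for i in idx]
        for src, pattern in patterns.items()
        for dst, idx in pattern.send.items()
    }
    # Unpack: scatter each received buffer into the halo indices.
    for dst, pattern in patterns.items():
        for src, idx in pattern.recv.items():
            for i, value in zip(idx, buffers[(src, dst)]):
                fields[dst][i] = value


if __name__ == "__main__":
    # Two ranks, six global cells; each rank stores one halo cell at the end.
    ids = {0: [0, 1, 2, 3], 1: [3, 4, 5, 2]}
    owner = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
    fields = {0: [0.0, 1.0, 2.0, -1.0], 1: [3.0, 4.0, 5.0, -1.0]}
    exchange(fields, build_patterns(ids, owner))
    print(fields)
```

In this toy setup each rank owns three cells and keeps one halo copy of its neighbor's boundary cell; after `exchange`, the `-1.0` placeholders are replaced by the neighbor's owned values. How the real descriptors map onto DaCe data containers is exactly the open question of this project.
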
## Rabbit holes

<!-- Details about the solution worth calling out to avoid problems -->

## No-gos

<!-- Anything specifically excluded from the concept: functionality or use cases we intentionally aren’t covering to fit the appetite or make the problem tractable -->

## Progress

<!-- Don't fill during shaping. This area is for collecting TODOs during building. As first task during building add a preliminary list of coarse-grained tasks for the project and refine them with finer-grained items when it makes sense as you work on them. -->

- [x] Finalize the asynchronous communication (the prototype is successful, but I need to make some further changes to GHEX, as discussed with Fabian).
- [x] Update the SDFGConvertible interface of the GT4Py programs, as breaking changes were introduced with the new workflows.
- [x] [GT4Py programs as DaCe SDFGConvertibles #1527](https://github.com/GridTools/gt4py/pull/1527): ready to merge
- [x] Updated to the latest GHEX (merged in ICON4Py), which has a new Python API
- [x] My PR: [Expose C++ obj pointers to Python API](https://github.com/ghex-org/GHEX/pull/163): ready to merge
- [x] First optimization pass for the halo exchange in the fused SDFG: the halo library node is placed automatically in the fused SDFG, so the user does not need to place the halo nodes manually.
- [x] Prototyped how to extract the fused SDFG (without halos) and manipulate it.
- [x] Analysis of the fused SDFG and automatic insertion of the halo lib nodes.