# Introduce Halo-Exchanges in Icon4Py
###### tags: `cycle 14`
- Appetite:
- Developers:
## Objective
**Develop a test**, starting from the diffusion granule, that can work on multiple nodes.
**Secondary objective**: If the GHEX library is chosen for the test, the implementation will also develop, at least partially, the Python bindings to GHEX for unstructured grids (GHEX is the library we want to use in the long term in GT4Py).
## Approach
In order to get the domain decomposition information we follow the approach of the stand-alone advection granule, which (I assume - needs to be checked) obtains it from serialized data. The advection granule also contains a branch in which GHEX is used to initialize the communication calls. Mauro and Fabian understand this code and can help with it when setting up a communication pattern.
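Reading the serialized decomposition data from Python could look roughly like the sketch below. This is only an assumption: the file name and the field names (`global_index`, `owner_rank`) are made up here and need to be matched against what the granule actually writes.

```python
# Hypothetical sketch of reading per-rank decomposition data; the file layout and
# field names are assumptions, not the actual advection-granule output.
import numpy as np
from mpi4py import MPI

rank = MPI.COMM_WORLD.Get_rank()

# Assume one file per MPI rank, written after ICON's domain decomposition.
data = np.load(f"decomposition_cells_rank{rank}.npz")

global_index = data["global_index"]  # global cell index for every local cell (owned + halo)
owner_rank = data["owner_rank"]      # rank owning each local cell
halo_mask = owner_rank != rank       # local cells that belong to the halo

print(f"rank {rank}: {np.count_nonzero(~halo_mask)} owned cells, "
      f"{np.count_nonzero(halo_mask)} halo cells")
```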
The test will then introduce halo-exchanges in the source code (a copy of the diffusion granule), following the Fortran code as a guideline; Will refactored that code and can help identify the relevant sections there.
There are three main possible implementation strategies:
1. Use MPI4Py directly, which implies introducing explicit MPI calls and packing routines for the data to be sent: it is necessary to iterate over the neighbor tables to gather the data to send. Unpacking happens directly into the halo regions, AFAICT, but the ICON halo structure has some complications that are not well documented, so a loop for unpacking could also be needed. The regions to pack and unpack can be found in the GHEX branch of the advection granule (for cell centers). A minimal sketch of this approach is given after the list.
2. Use GHEX, moving toward the desired state of incorporating it into the automatic code generation. The loops for packing and unpacking are handled during the setup phase, so the implementation should be easier.
3. Use YAXT, the native ICON library for halo-exchanges. The packing/unpacking loops are handled by the library. The YAXT developers are currently working on implementing them for GPUs, and there are no Python bindings for it. The bindings should not be complex, since YAXT is a library that does not use compile-time polymorphism, but ideally the YAXT developers should provide them.
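For option 1, a rough sketch of what a direct MPI4Py exchange for one field could look like is shown below. The per-neighbor index lists (`send_indices`, `recv_indices`) are placeholders assumed to come from the serialized decomposition data; they are not existing Icon4Py structures.

```python
# Sketch of option 1 (direct mpi4py); not the actual Icon4Py/ICON index sets.
# `send_indices[nbr]` / `recv_indices[nbr]` are assumed to be numpy index arrays
# per neighbor rank, derived from the serialized decomposition information.
import numpy as np
from mpi4py import MPI


def halo_exchange(field, send_indices, recv_indices, comm=MPI.COMM_WORLD):
    """Exchange halo values of a 1D cell/edge field with all neighbor ranks."""
    requests = []
    recv_buffers = {}
    send_buffers = {}  # keep the send buffers alive until Waitall

    # Post receives into contiguous buffers (the halo rows themselves are not contiguous).
    for nbr, idx in recv_indices.items():
        recv_buffers[nbr] = np.empty(idx.size, dtype=field.dtype)
        requests.append(comm.Irecv(recv_buffers[nbr], source=nbr, tag=0))

    # Pack the owned values each neighbor needs (fancy indexing copies them into a
    # contiguous send buffer) and send them.
    for nbr, idx in send_indices.items():
        send_buffers[nbr] = field[idx]
        requests.append(comm.Isend(send_buffers[nbr], dest=nbr, tag=0))

    MPI.Request.Waitall(requests)

    # Unpack the received values into the (non-contiguous) halo region.
    for nbr, idx in recv_indices.items():
        field[idx] = recv_buffers[nbr]
```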
Of the three approaches the second seems the most valuable, since it implies work that we will need in the future and we have the in-house knowledge to deal with it.
Option 1 is viable only if its development time is actually much shorter than that of option 2.
Option 3 effectively means we will not start in this cycle; if it is not ready before the next one, we need to switch to option 1 or 2 anyway.
## Non-objectives
- For the options above:
    - Full bindings for GHEX
    - YAXT Python bindings
- Optimization of the exchanges in terms of better packing/unpacking
## Possible time sinks
- The advection granule only uses communication of cell centers, while diffusion also needs exchanges on edges. This requires finding the information about halo-exchanges for edges in the diffusion code, too.
- Testing for correctness can become a major time sink, since this is how things are in this business.
- Packing/unpacking loops for MPI4Py could be tricky without guidance in the source code to identify the regions/index sets to iterate over.
## mo_tiny_comm.f90
Notes from a walk-through with Jörg Behrens on April 25th.
See https://gitlab.dkrz.de/icon/pww/icon/-/tree/pww_advection_standalone?ref_type=heads
`src/advection/mo_tiny_comm__adv.f90`
`mo_tiny_comm` is an ad-hoc implementation that isolates communication for use with the advection and diffusion granules. It is supposed to go away in the long run and be replaced by the (yet-to-come) ICON communication library and the new version of YAXT.
### working
`mo_tiny_comm` works in two modes: it can run with the full model or with a standalone granule.
When running with the full model it serializes the information needed for communication after the domain decomposition is done. Hence it needs ICON to do the domain decomposition. The writing is done in:
`mo_tiny_comm__adv.f90: SUBROUTINE my_v2v3`
It writes:
-
- global index...
-
When running standalone, it can only run with the same setup (number of MPI processes) that the full model ran with beforehand.
- It reads in the decomposition information, serialized from the full model run
- It passes this information to YAXT, which is able to determine efficiently, from each rank's local view of the global grid, where the relevant neighbors are and what needs to be communicated where, and sets up the communication patterns in a memory-efficient way.
- It then uses plain MPI_send/recv/gather/scatter to do the communication. Before actually calling the communication routines, it packs/unpacks the data to be sent into/from buffers, because in ICON the halo rows are not contiguous; in this way `mo_tiny_comm` avoids using MPI derived datatypes (see the sketch below).
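To make the last point concrete in Python/mpi4py terms (the real code is Fortran): the MPI-datatype route that `mo_tiny_comm` avoids would describe the scattered halo entries to MPI directly, while the buffer route gathers/scatters them by hand. The halo positions below are invented for the illustration.

```python
# Illustration of two ways to handle non-contiguous halo rows; `halo_displs`
# (the local positions of the halo entries) is made up for this sketch.
import numpy as np
from mpi4py import MPI

field = np.zeros(1000)                    # some local field
halo_displs = np.array([907, 911, 950])   # invented, non-contiguous halo positions

# (a) MPI derived datatype describing the scattered entries (what mo_tiny_comm avoids):
halo_type = MPI.DOUBLE.Create_indexed_block(1, halo_displs.tolist()).Commit()
# MPI.COMM_WORLD.Recv([field, 1, halo_type], source=some_neighbor)

# (b) Explicit buffer packing/unpacking, as mo_tiny_comm does:
recv_buffer = np.empty(halo_displs.size)
# MPI.COMM_WORLD.Recv(recv_buffer, source=some_neighbor)
field[halo_displs] = recv_buffer          # unpack into the non-contiguous halo entries

halo_type.Free()
```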
## next steps
- [ ] (Fabian) GHEX python bindings (structured) merge
- [ ] (Fabian) GHEX python bindings (unstructured)
- [ ] (Magdalena) ICON domain decomposition information serialize/read in (from python)
- [ ]