# Integrating Halo Exchanges and Further Optimizing the SDFG of the `fv3core` Acoustic Substep

###### tags: `cycle 5` `dace`

In the past cycle, we obtained an SDFG of the acoustic substep of fv3core that shows good performance on GPU, and we added support for halo exchanges in parsed programs. The goal of this project is to integrate the halo exchanges into the acoustic substep, measure the scalability of the resulting graph on Piz Daint nodes, and further improve its runtime.

## Tasks

1. Integrate halo exchanges into the acoustic substep (CPU).
2. Extend the halo exchange support to GPUDirect.
3. Establish performance baselines on the GPUs of daint-gpu nodes:
    * representative measurements in fv3core Fortran (acoustics with communication)
    * collect scalability data
4. Experiment with performance by means of existing tools, for example:
    * at DaceProgram scope: dace-builtin transformations

    To learn about the effectiveness of transformations, this can be approached in two substeps:

    a. manual selection of transformation candidates and of OIR optimizations

    b. a fully automated pipeline, applied indiscriminately or based on performance heuristics
5. Should the results of these experiments show potential for significant further improvements, develop new optimizations:
    * at gtscript stencil scope: improving OIR transformations, improving OIR node expansion
    * new "low-level" transformations (possible reuse from 2020 GTBench?, mostly stencil-scope)
    * new OIR transformations (stencil-scope only)

## Goals

* Parsing the fv3core acoustic substep to a single SDFG, including halo exchanges.
* Performance better than Fortran and better than any gt4py+python version.
* Optimization as time permits. There are no further baselines available.

### No-Goals

* Optimizing parsing or code generation times.

## Appetite

* 1 FTE

We further rely on ai2 to provide the Fortran timings and the input data for the scalability tests.
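
For the scalability data mentioned under Tasks, one possible way to summarize the timings is as strong-scaling speedup and parallel efficiency relative to the smallest run. The following is only a minimal sketch: `ScalingPoint` and `strong_scaling` are hypothetical helper names, not part of fv3core or DaCe, and the numbers in the usage lines are placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPoint:
    """One timing of the acoustic substep (hypothetical record type)."""
    nodes: int      # number of nodes used for the run
    seconds: float  # wall-clock time of the substep


def strong_scaling(points):
    """Return (nodes, speedup, efficiency) tuples relative to the smallest run.

    Assumes a fixed total problem size across all runs (strong scaling).
    """
    ordered = sorted(points, key=lambda p: p.nodes)
    base = ordered[0]
    rows = []
    for p in ordered:
        speedup = base.seconds / p.seconds  # measured speedup vs. baseline
        ideal = p.nodes / base.nodes        # speedup under perfect scaling
        rows.append((p.nodes, speedup, speedup / ideal))
    return rows


# Placeholder values for illustration only, not measured timings.
example = [ScalingPoint(nodes=6, seconds=10.0), ScalingPoint(nodes=24, seconds=4.0)]
for nodes, speedup, efficiency in strong_scaling(example):
    print(f"{nodes:4d} nodes  speedup {speedup:.2f}  efficiency {efficiency:.2f}")
```

The same helper could be applied to the Fortran baseline and to the SDFG timings, so that both are compared on equal terms.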