# Brainstorming Cycle 34 02/26

Decision: 1 long 8-week cycle

Goals for the Quarter:

- Make ICON4Py granules (dycore and diffusion) available upstream by end of March or early April.
- warm-bubble (submitted to ICCARUS March 16-20): because we can show the greenline is working (this quarter, but the blueline upstream can't be missed): JW multi-node has priority over Warm-Bubble single-node

PMAP:

- Move PMAP-LES to next

Other things happening in the quarter:

- muphys paper
- SIAM PP conference

## Blueline

- Upstreaming ICON4Py granule integration: ComIn (or py2fgen?) (Will, Christoph, Hannes)
  - Figure out in the shaping if ComIn is a viable approach, and also document the py2fgen plan
  - Upstream refactorings, e.g. swapped indices
  - Ensure documentation and tests
- Session to explore, build and run BlueLine with C2SM (Annika, Matthieu, Mikael, Michael)
- Ensure commonly used ICON output features (e.g., lat/lon interpolation and meteogram) work with nblocks_e=1
- spack 1.X update ((easily) enables newer NVHPC, libfabric, MPI, NCCL) (Can Matthieu come up with a plan/timeline?)
- Nightly/weekly CI verification and performance runs of a real test case? (Christoph, Daniel, C2SM?)
- Continuations:
  - Combine divergence damping coefficient calculation with other stencils ([PR#951](https://github.com/C2SM/icon4py/pull/951))

## Greenline

- [Warm bubble building blocks](https://hackmd.io/HvHaFPQrRP-8d9UzMA_Gkg?edit) (shaped by the group that made this document)
- Continuations:
  - Halo construction
  - MPI testing
  - Likely finishable before next cycle
- Benchmarking
  - Add AMD platform to benchmark-CI of standalone granules and stencil tests

[Warm Bubble Shaping](https://hackmd.io/5ceTe0y2SZWJGZBwFDdMiQ?both)

## GT4Py/DaCe

- Finish embedded implementation and add tests with `jax` / `jax.jit` (Enrique, Hannes)
  - embedded concat_where + as_offset + bugfix for shift with connectivity of same dim + codim
  - concat_where: infinite domain -> lazy field -> field refactoring???
  - jax.numpy supported in testing
  - jax.jit as backend
- Performance optimizations:
  - AMD GPU performance (Ioannis)
  - Someone tries to shape what we can do for balfrin performance (Christoph)
  - Tracer/Expandable parameter use-case (shaping, but not necessarily schedule Till, Enrique, Hannes; we should be at least able to start optimizing this use case)
- Caching bugs (Till, Enrique)
  - Address cache stability issue, related to offset provider, observed in AMD reframe tests
  - Switching CPU <-> GPU backends
- Improve GT4Py compile time / lowering performance & predictability, stability. Motivation: Moving PMAP-LES to gt4py-next
  - GTFN passes cleanup. Requires an investigation of how we can transform the IR in orders other than just pre- or post-order.
  - GTFN: Return K-only fields
- Improve testing infrastructure
  - Large programs that cover current weak points in gt4py which currently only surface in applications
  - Fuzzing / automatically generated programs validated against embedded
- Generic types
  - Any or variable length tuple (e.g. for containers)
  - Union, Optional values
- Support for passing callable types
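
As a possible starting point for the fuzzing item under "Improve testing infrastructure" (automatically generated programs validated against embedded), here is a minimal differential-testing sketch. It is plain Python and does not use GT4Py. The tuple-based mini-language and the names `random_expr`, `reference_eval`, `to_source`, `compiled_eval` and `fuzz` are all hypothetical. The tree-walking interpreter only stands in for embedded execution, and the `compile`/`eval` path only stands in for a code-generating backend.

```python
import random

# Hypothetical mini-language (not GT4Py IR): an expression is either
# a tuple (op, lhs, rhs), a variable name "x"/"y", or an int literal.
OPS = {"add": "+", "sub": "-", "mul": "*"}


def random_expr(rng, depth):
    """Generate a random expression tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(["x", "y", rng.randint(-3, 3)])
    op = rng.choice(sorted(OPS))
    return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))


def reference_eval(expr, env):
    """Tree-walking interpreter: stand-in for the reference (embedded) path."""
    if isinstance(expr, tuple):
        op, lhs, rhs = expr
        a, b = reference_eval(lhs, env), reference_eval(rhs, env)
        if op == "add":
            return a + b
        if op == "sub":
            return a - b
        return a * b
    if isinstance(expr, str):
        return env[expr]
    return expr


def to_source(expr):
    """Lower the tree to Python source: stand-in for a code-generating backend."""
    if isinstance(expr, tuple):
        op, lhs, rhs = expr
        return f"({to_source(lhs)} {OPS[op]} {to_source(rhs)})"
    return repr(expr) if isinstance(expr, int) else expr


def compiled_eval(expr, env):
    """Compile the generated source and evaluate it with the given variables."""
    code = compile(to_source(expr), "<generated>", "eval")
    return eval(code, {"__builtins__": {}}, dict(env))


def fuzz(num_programs=200, seed=0):
    """Return all (program, inputs) pairs on which the two paths disagree."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(num_programs):
        expr = random_expr(rng, depth=4)
        env = {"x": rng.randint(-10, 10), "y": rng.randint(-10, 10)}
        if reference_eval(expr, env) != compiled_eval(expr, env):
            mismatches.append((expr, env))
    return mismatches


if __name__ == "__main__":
    print(f"{len(fuzz())} mismatching programs")
```

The sketch uses integer arithmetic so results can be compared exactly; a version for floating-point fields would presumably need a tolerance-based comparison. The fixed seed keeps any failing program reproducible.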