# GT4Py community meeting
###### tags: `minutes`
## 2026-01-13
- Updates from CSCS
- Happy new year!
- Updates from NASA
- CPU performance part I done

- Clear path to optimization across patterns based on schedule tree
- Waiting to move to DaCe v2 for part II
- Lots (lots) of performance left to squeeze out (this is mostly macro)
- Cartesian moving to DaCe v2
- Experience on the Grace Hopper AST/UM: so far no strategy changes compared to fat nodes; latency kills most of the benefits
- Next up: frontend cleanup and productization
- Updates from NOAA
- Running Pace with "full" aquaplanet physics
- Validating physicality of solutions, start using for science
- Most technical issues are in initializing radiation data with MPI or in compute environment
- Then profiling and performance engineering along NDSL, DaCe developments, resolving issues with bridge to RTE-RRTMGP
- Refactoring PyFV3 for whole-atmosphere model
- Updates from ECMWF
## 2025-12-18
- Global questions
- When does Python 3.10 support get dropped?
- Early next year, pending stability checks for 3.12 on `cartesian`
- Any problems running gt4py with a uv-managed python installation in HPC systems?
- EXCLAIM/ICON4Py roadmap for 2026 (not final)
- Merge the gt4py-dycore into the upstream ICON repo in 2026 Q1/Q2
- Coupled atmosphere/ocean/land runs fully driven from Python before the end of 2026 (using Fortran CPU components)
- Use gt4py.cartesian ECRad in the coupled runs, and probably try to port it to gt4py.next
- GridTools Python package index POC is working: https://gridtools.github.io/python-pkg-index/
- We will install DaCe-next and GHEX wheel packages from there starting next year
- Update from NASA
- First results of CPU optimization on HPC - confirming the strategy

- GT4Py.cartesian
- "Dynamic intervals" available for `debug`, coming to `dace:X`
- Temporaries defined first/only inside an `if` will be made illegal, to fight odd numerical failures
- 2026: accelerated deployment timeline; scope and goals still unclear as the dust settles from budget/shutdown/layoffs
- Try to move the DaCe work to the main/v2 branch soon (early next year?)
- Updates from NOAA
- Moving towards high-resolution aquaplanet simulations once the physics results are validated in smaller runs
- Updates from ECMWF
- Started migration of stencils to gt4py.next
- Loki now supports translating some microphysics schemes to gt4py.cartesian
## 2025-11-18
- GT4Py Roadmap
- Status of 3.10 deprecation
- Python 3.11: LUMI PMAP setup is currently stuck on 3.11
- Update from CSCS
- see performance numbers in https://docs.google.com/presentation/d/1N-HhYtmRVxZ278rXMZE0-RQPEaNXGS3nVQVTncNUcKg/edit?slide=id.g3a247b305ed_21_0#slide=id.g3a247b305ed_21_0
- Update from NASA
- Self-assignment with offset in K (sequential loops) has been tuned and fixed (see the sketch below)
- Column physics: dynamic intervals and the "IJ - Iterative Solver - K" pattern
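- For reference, a minimal sketch of the self-assignment-with-K-offset pattern, assuming a plain gtscript stencil with an illustrative field name and the `numpy` backend (not the tuned production code):

```python
# Sketch of self-assignment with a K offset inside a sequential loop.
import numpy as np
from gt4py.cartesian import gtscript
from gt4py.cartesian.gtscript import FORWARD, computation, interval


@gtscript.stencil(backend="numpy")
def cumulative_sum(q: gtscript.Field[np.float64]):
    # Sequential K loop: each level accumulates the value from the level below.
    with computation(FORWARD), interval(1, None):
        q = q + q[0, 0, -1]
```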
- Update from NOAA
- Updates from ECMWF/PMAP
- Others?
## 2025-10-21
- GT4Py roadmap
- Deprecate gt backends in Cartesian? -> check with PMAP; for NASA it is OK to deprecate/feature-freeze them
- Deprecation of 3.10 (and possibly 3.11)?
- PR to be opened for the cartesian `gtscript_frontend` changes
- Updates from CSCS
- Performance optimizations
- DaCe backend
- Python overhead reduction
- Halo exchanges
- fixed buffer sizes
- experiments with NCCL as exchange backend showed good performance
- Updates from NASA
- Features coming online
- Experimental
- Reading the K index in stencils. We have concerns about this usage; it might be better to implement dynamic intervals. But we have physics relying on it, so we are pushing it forward as experimental.
- 2D temporaries in stencils
- Non experimental
- `IntEnum` support in stencil and as arguments
- CPU performance
- Now matching Fortran or better on isolated runs
- Requires deeper cache work (across stencils) with proper "transient" flagging for DaCe to go further. This is proposed to be solved by introducing a new concept in the NDSL applicative layer.
- Much more vectorization can be done
- Updates from NOAA
- Updates from ECMWF/PMAP
- Others?
## 2025-09-30
- Updates from CSCS
- Next month: full focus on Fortran ICON + ICON4Py performance
- MPI (ghex) performance issue
- managed vs pinned memory?
- Updates from NASA
- [Experimental features](https://github.com/romanc/gt4py/blob/romanc/feature-experimental-absolute-indexing-k/docs/development/ADRs/cartesian/general-experimental-features.md) in `gt4py.cartesian` in the context of [absolute indexing in the `K`-dimension](https://github.com/GridTools/gt4py/pull/2276)
- CPU optimization: slowed by instability of orchestration
- NASA/NOAA work on model pipeline (State, Configs, etc.)
- Updates from NOAA
- Working on non-square layouts with pyFMS, but glad to see GT4Py supports executing stencils on non-square arrays
- Lots of work with NASA on frontend design, implementation
- Shallow convection validates, other physics schemes in process of merging, starting full physics integration and validation now
- Thoughts on 2D temporaries? Restricted K intervals?
- Updates from ECMWF/PMAP
- Others?
## 2025-08-26
- Updates from CSCS
- CI on AMD GPUs
- DaCe backend deactivated
- NamedTuple-like containers
- Updates from NASA
- User manual for NDSL (partially covering `gt4py.cartesian`) almost ready for show
- Porting effort moving to Moist's GF and CO Chemistry next
- String of small fixes to DaCe
- CPU optimization: we cache miss but we vectorize so... half-win.
- Updates from NOAA
- Merging physics schemes and radiation code into PySHiELD
- Will need to performance engineer physics, especially calls to radiation and data handling
- shim_key issue with 4D fields
- Changes to validation in Microphysics
- Deploying via containers on GPU HPCs
- Updates from ECMWF/PMAP
- Others?
## 2025-07-29
- Updates from CSCS
- ICON with ICON4Py
- performance between greenline (pure Python) and blueline (Fortran + Python dycore+diffusion) almost on par (JW setup)
- with GTFN backend: validated, performance needs some tweaks
- with DaCe backend: almost validated, performance in full application ongoing
- Regular releases (probably)
- Plans to change README and make `next` more prominent
- (Numpy issue discussed last time resolved: numpy 1.26.x compiles easily on CI and "my machine")
- Updates from NASA
- debug backend merged
- OIR -> ScheduleTree -> SDFG merged
- Updates from NOAA
- Validating radiation driver
- Performance optimization to come, DaCe may be tricky
- Integrating PBL, surface schemes into PySHiELD, finalizing shallow convection validation
- Needed for coupling with ML LSM scheme
- Scientific runs starting soon
- Updates from ECMWF/PMAP
- Others?
## 2025-07-01
- Updates from CSCS
- Use a single DaCe version so `cartesian` and `next` could use the same newer features?
- Drop NumPy 1.x? (try to override the DaCe 1.x NumPy restriction)
- Which Python versions should we support? Last 3? All which are not End-of-life?
- Last 3 sounds good
- Rename package optional deps `all` to something else e.g. `cartesian`, `next` or `full-cartesian`, `full-next`, ...
- Memory pool for DaCe:
- cuda pool is used in cartesian
- Updates from NASA
- [Dropping the CLI backend](https://github.com/GridTools/gt4py/pull/2090) - only affects cartesian
- [Removing support for py < 3.10](https://github.com/GridTools/gt4py/pull/2093) - affects cartesian and next
- New GT4Py -> DaCe bridge via Schedule Tree nearly operational (requires custom DaCe up until v2 is out + refactor)
- Last checks on `regions`, some orchestration issues
- Optimization work to follow: K-axis merge, trivial 2/3-axis merge, tiling, dynamic block-threading & thread-local transients
- POC Iterative/AI model work has started
- Experimental features coming to mainline with backend restrictions when applicable
- Updates from NOAA
- Working on radiation drivers and coupling
- Bringing more domain scientists into development
- Breaking large stencils up to support differences between nonhydrostatic, hydrostatic, and shallow-water modes for FV3
- Will be guided by performance analysis
- Hopeful DaCe will be able to maintain performance
- Optional fields typehinted as None are supported in `gt4py.cartesian`
- Working on GPU runs on Ursa, Stellar
- Continuing comparisons between FMS (ground truth, via the pyFMS package) and NDSL, in preparation for generating communication and partitioning functionality based on data from calls to FMS
- Updates from ETH/ECMWF
- Port of ECRad is faster than OpenACC
- Collaboration on developments like data dimensions between `gt4py.cartesian` and `gt4py.next` would be productive
## 2025-06-03
- Updates from CSCS
- WIP on Icon4py going into production
- T-shirts!
- Updates from NASA
- UW in-model validation ongoing
- GT4Py -> DaCe bridge: attempt at rewriting it as an OIR -> Schedule Tree -> SDFG pipeline
- POC : hybrid "physical" and data-driven components. Settled on Dynamics -> Microphysics -> Radiation -> ML Land Model
- Updates from NOAA
- Mostly focused on tooling still
- PyFMS
- Validating physics schemes
- Improving coupling with pyRTE-RRTMGP
- Bringing in latest updates from Fortran dynamical core
- Running on new systems
- Ursa
- Stellar
- Targeting full-physics scientific application by September
- Primarily looking at high-resolution aquaplanet simulations to start
- Coupling with ML LSM in collaboration with NASA
## 2025-05-06
- Updates from CSCS
- Work on CI moving features to nox
- ICON4Py dycore integrated in FORTRAN running with DaCe
- Updates from NASA
- Numerical validation of UW Shallow Convection & Dynamics at 32-bit. Scientific in-model validation ongoing
- User Manual for NDSL first draft
- GT4Py -> DaCe bridge unused parameter bug: needs a proper fix by moving to `dace.Scalar` for parameters
- Updates from NOAA
- Physics ports merging in to PySHiELD
- k-offset writes in use in microphysics
- GT4Py fork of DaCe will be useful
- Adopting some of Florian's work on Serialbox to write more granular tests
- Continuing work on PyFMS development and tooling
- Updates from PMAP
## 2025-04-08
- Updates from CSCS
- ADR merge templates or not
- versioningit

- Updates from NASA
- Cleanup of the mixed-precision cartesian feature to be PR'ed into main
- Other "experimental" features will remain on side branch for now
- Optimization work with schedule tree has started
- Work is shared with Stefano & Gabriel
- Updates from NOAA
- Mostly focused on tooling and modeling more than GT4Py itself
- PyFMS development is going well, integration into NDSL will begin soon.
- PySHiELD physics implementations moving forwards
- LSM issues ongoing
- Shallow Convection scheme is in-test
- RTE-RRTMGP: we would have to add GPU capability to the Python frontend; a "simple" wrapper won't be performant.
- Adding more tests to dynamical core
- Updates from ECMWF/ETH
## 2025-03-11
- Updates from CSCS
- Dycore, diffusion Python granule integrated into Fortran
- py2fgen tool to create Fortran bindings to Python functions
- Focus on DaCe backend feature complete
- Updates from NASA
- Microphysics (GFDL v2): validated (60% slower on CPU, 630% faster on GPU)
- GT4Py -> DaCe bridge: PR up; this opens the next phase of optimization work
- Auto-diff: Good first contact with Affif from SPCL. Will put up a meeting with stakeholders (NOAA, NASA, AI2, CSCS)
- Experimental branch is clean and technical documentation is being drafted on the more obscure parts of the cartesian stack -> to be moved under gt4py/ADR
- Updates from NOAA
- Welcome to Janice Kim! New MSD team member working on the Python code for now
- No in-person meetings for the foreseeable future
- PyFMS development is moving forward
- Goal is to use FMS for initial domain decomposition, diagnostic handling, etc. and use that within NDSL infrastructure
- Physics porting also moving forward:
- PBL validation done
- LSM progressing slowly
- RTE-RRTMGP progress slow.
- Python frontend is performant but isn't GPU capable yet
- Can also make a wrapper around Fortran code
- Updates from ECMWF/ETH
## 2025-02-11
- Updates from CSCS
- Plans for 2025: ICON4Py in MeteoSwiss production with "granule"
- DaCe main
- Updates from NASA
- For the next 6 months to a year
- Performance engineering: based on the Schedule Tree, with a first emphasis on macro organization of the code
- GEOS physics: continuing the port -> integrate -> test online, loop
- Cartesian physics features: still on the experimental branch, with more features to come, angling toward a mainline merge
- Will go to DaCe main alongside the performance engineering
- Sitting down with NOAA & ECMWF to gather physics requirements
- Bug: something is off about temporaries. The root cause is unclear, but we have seen the "make it a parameter" strategy fix numerical issues where it shouldn't
- Updates from NOAA
- FMS integration ongoing
- Message passing
- data pointers
- PBL scheme mostly validating
- Temporary variable bug
- Adding stencils changes answers?
- RTE-RRTMGP integration beginning
- Updates from ETH/ECMWF
## 2024-12-17
- Updates from CSCS
- Dropping support for Python 3.8, 3.9
- Providing source dist for GHEX in PyPI
- Adding glue to Serialbox internals to make Python use easier
- Updates from NOAA
- Change to Apache2.0 license allows NVIDIA to collaborate more closely
- Still developing FMS integration, testing physics schemes
- Python frontend for RTE-RRTMGP is intriguing
- Updates from NASA
- Updates to Serialbox PPSer (and some core library development)
- Physics motifs
- Adding the capability for stencils to natively access the current K index; hard to make work in the numpy backend
- Some pointwise stencils need to be applied with a boolean 2D mask field; can the mask be applied at the stencil definition instead of inside the stencil? (see the sketch below)
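- For reference, the pattern in question currently looks roughly like the sketch below (field names and the masked update are illustrative): the mask is declared as a 2D `IJ` field and applied inside the stencil body. The open question above is whether it could instead be attached at the stencil definition or call site.

```python
# Current pattern (sketch): a 2D boolean mask field applied inside the stencil.
import numpy as np
from gt4py.cartesian import gtscript
from gt4py.cartesian.gtscript import IJ, PARALLEL, computation, interval


@gtscript.stencil(backend="numpy")
def masked_update(q: gtscript.Field[np.float64], mask: gtscript.Field[IJ, np.bool_]):
    with computation(PARALLEL), interval(...):
        # The pointwise update only applies where the column is masked in.
        if mask:
            q = 0.0
```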
- Updates from ETH/ECMWF
- Porting ecrad: cloud optics done, working on gas optics, saving the solver for last
## 2024-11-19
- Updates from CSCS
- ICON4Py performance
- Updates from NOAA
- Working on implementation of workflow tools FMS, grid generation
- Pace will couple to OpenACC version of RTE-RRTMGP in the short term, starting discussions with Robert Pincus about GT4Py version
- Aiming for full physics runs ASAP
- Updates from NASA
- Updates from ECMWF/ETH
## 2024-10-22
- Updates from CSCS
- Updates from NOAA
- New release of NDSL, includes support for latest versions of GT4Py and DaCe
- Plan for new releases every 2 months from now on
- Continuing to port Physics schemes, ocean and sea ice finished, LSM and PBL in progress, shallow convection started
- Updates from NASA
- Experimental features ready for a first pass usage
- Absolute K indexing
- Cast to `int`
- `debug` backend (e.g. iterative python)
- Pressing issue for the GEOS physics
- Better `serialbox` capacities to allow for data serialization deep within solvers
- Stencil and sub-stencil mixed precision
- Will probably solve most if not all of it at the frontend level with type hinting and casting
- Updates from ECMWF/ETH
## 2024-09-24
- Next GT4Py workshop 2025
- aim for a 4-day workshop to allow going deeper than in the first workshop
- at CSCS in Zurich
- April - June 2025 time-frame
- Updates from CSCS
- Tagged a new release, will drop 3.8, 3.9 soon
- Fixing ROCm support for storages (for more recent cupy?)
- Planning a physics workshop on our side
- TODO: inform this meeting about the concrete plans
- Updates from NOAA
- Updates from NASA
- Serialbox: developed a `!$ser data_buffered` system that buffers scalars until the array is full and then does a dump. Also, a `!$ser flush_savepoint` to dump all buffers with no checks.
- Physics features worked on an `unstable` fork (https://github.com/FlorianDeconinck/gt4py/tree/unstable/develop)
- pure "debug" Python backend
- absolute K indexing
- cast to `int` in stencil
- data dimensions access via variable for numpy
- K-offset write [Done and in mainline]
- nested K-intervals
- dynamic K-interval (e.g. interval computed from variables) (could be a no-go)
- better failure handling & feedback for data dimensions
- Debug backend in Python with "readability > perf" as an implementation principle
- Serious bug around the inliner and while loops; PR to come soon
- Updates from ECMWF/ETH
## 2024-08-27
- Updates from CSCS
- License changed
- gt4py.next IR refactoring
- Updates from NOAA
- Florian will investigate https://github.com/GridTools/gt4py/pull/1612/files
- Updates from NASA
- DaCe: move conditionals from tasklet to DaCe control flow
- gt4py.cartesian: int cast prototyped; need to debug the offset-in-K PR; need a solution for absolute indexing of K within a FieldAccess
- Updates from ECMWF/ETH
## 2024-07-02
- GT4Py workshop retrospective
- physics patterns
- first merge, then categorize and discuss solutions
- examples should be standalone runnable, ideally Python code (see the minimal sketch below)
- see also [minutes](https://hackmd.io/-3fddUjLRXi6bJDBTkR41Q)
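- A minimal sketch of what such a standalone example could look like, assuming only NumPy and gt4py.cartesian with the `numpy` backend (the stencil and sizes are illustrative):

```python
# Minimal standalone gt4py.cartesian example: define a copy stencil and run it
# on plain NumPy arrays with the "numpy" backend.
import numpy as np
from gt4py.cartesian import gtscript
from gt4py.cartesian.gtscript import PARALLEL, computation, interval


@gtscript.stencil(backend="numpy")
def copy_stencil(src: gtscript.Field[np.float64], dst: gtscript.Field[np.float64]):
    with computation(PARALLEL), interval(...):
        dst = src


nx, ny, nz = 8, 8, 4
src = np.random.rand(nx, ny, nz)
dst = np.zeros((nx, ny, nz))
copy_stencil(src, dst, origin=(0, 0, 0), domain=(nx, ny, nz))
assert np.allclose(dst, src)
```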
- Updates from CSCS
- Minimum Python version (>=3.10 ?)
- probably ok, will check with Florian
- Numpy 2.0 support
- gridtools_cpp minimum boost version changed
- License
- Updates from NOAA
- Working on surface physics, k-extents can differ between columns (but known at compile-time)
- CPU performance
- Updating Python and package versions
- Resolving new DaCe bugs
- Updates from NASA
- Updates from ECMWF/ETH
- Looking into DaCe orchestration
## 2024-05-07
- Updates from CSCS
- No relevant updates in the `gt4py.next` development
- Still working on the license change and the removal of the CA
- Next week we will send more details about the GT4Py workshop after PASC
- Updates from NOAA
- Oliver was invited to present Pace work for NCAR group (next Tuesday)
- Updates from NASA
- Team will be in place in August
- Fixed k-offset write bug in DaCe backends
- Should be ready for review
- Full regression on DaCe backends to be done
- Identified unexpected side effects of data-dimension unrolling (https://github.com/GridTools/gt4py/pull/815)
## 2024-04-09
- Updates from CSCS
- Abishek and Sam from EXCLAIM joined the meeting. They are currently working on calling GT4Py/Python from Fortran and would like to learn and discuss past approaches in PACE.
- License change: it looks like we'll be able to change the GT4Py license very soon.
- PASC minisymposium is preliminarily scheduled for Wednesday (https://pasc24.pasc-conference.org/minisymposia/)
- We will start soon planning the GT4Py workshop and will send you updates.
- Updates from NOAA
- Oliver will attend PASC and the GT4Py workshop in June.
- Working on physics & PACE coupling.
- Next steps could involve more scientists working with the frontend and some numeric development.
- More work on PACE infrastructure. The FMS Fortran library provides services to the Fortran codebase, and there is ongoing work to add Python bindings with other features like domain decomposition.
- Start looking into CPU performance in the C++ code generation.
- Looking into using Intel compilers.
- GSL team beginning project to port CCPP physics into GT4Py
- NVIDIA/Wyoming team also looking to transition physics code to Python
- Updates from NASA
- Asked @havogt for a SerialBox release
## 2024-03-12
- Updates from CSCS
- Christos shared state of halo exchange library node in icon4py and DaCe
- architecture workshop: https://hackmd.io/C_I3twUjRduo08BmgbcwWQ
- Updates from NOAA
- NDSL release (2024.03.00): https://github.com/NOAA-GFDL/NDSL/releases/tag/2024.03.00
- GT4Py: `main` branch as of March 6th.
- Rebuilt simple docker entrypoints for model testing and development
- Debugging doubly-periodic PyFV3 almost finished
- Physics implementations in PySHiELD:
- Fully stencil-based SHiELD microphysics using off-center writes
- Integrating PBL scheme, updating old port with newer features (while loops, higher-dimensional fields)
- Updates from NASA
- off-center writes PR ready
- looking into for-loop
- Python 3.11.7 is the canonical version from now on
- Intel compilers failing on C++17 includes
## 2024-02-13
- Updates from CSCS:
- Regular releases
- NOAA/NCAR/NREL Open Hackathon
- PASC mini-symposium got accepted
- Updates from NOAA
- Xingqiu Yuan joined MSD team, coming from E3SM Kokkos implementation
- Working on implementing more physics in Pace, restructuring physics infrastructure
- microphysics
- surface exchange
- turbulence
- Breaking Pace into multiple modules
- NDSL contains the infrastructure for DSL model development and utilities
- pyFV3 dynamical core
- pySHiELD physics parameterizations
- Updates from NASA
- Will start locking in physical parametrizations
- discuss writes with vertical offsets
- https://github.com/GridTools/concepts/pull/34
- Problem with the DaCe parser on Python 3.11 in full Pace
- Updates from ECMWF-related
## 2024-01-16
- Updates from CSCS
- License of GT4Py can be changed to BSD 3-clause
- no blocker from ICON; ICON will be open source at the end of January
- Updates from NOAA
- Rusty will be at PASC
- benchmark setup for Pace
- Updates from NASA
- people will be at PASC
- validation problems with 32-bit float vs Fortran
- Updates from ECMWF-related
## 2023-12-12
- Updates from CSCS
- Organizational:
- PASC mini-symposium
- Pace in AI2 github organization
- Embedded execution
- Updates from NOAA
- refactoring:
- separation of concerns of infrastructure (I/O, domain decomposition, ...) from model components
- abstracting some GT4Py interfaces
- integrating SHiELD microphysics into Pace
- Updates from NASA
- updated GT4Py version
- setting up a dedicated DSL team to operationalize large parts of GEOS
- Updates from SPCL
- Updates from ECMWF-related
- porting current version of IFS CloudSC to GT4Py
- PhD student working on porting of ECRad to different Python implementations
- Sara (ECMWF) working on gt4py.next for global FVM in GT4Py
Let's restart the meeting on Jan 16th.
## 2023-11-14
- Updates from CSCS
- Continuing the development of the embedded mode and GPU storages for next in python
- Two new hires (Christos, Philip) joined the team to work together with the DaCe team to improve and develop the DaCe backend for gt4py.next
- Workshop in Germany next week about gt4py for the ICON community. The materials will be uploaded to the examples folder in the repo.
- Still waiting for the decision about the PASC minisymposium proposal
- Updates from SPCL:
- Ramping up new ideas for the representation of stencils as SDFG
- Updates from NOAA
- Experimenting with adapting the numpy backend to generate JAX code. It seems to work fine for single stencils.
- Working on model infrastructure (I/O, configurations, ...)
- There is a new hire working part-time on PACE
- Working with other postdocs in scientific applications for PACE
- Refactoring PACE code and removing legacy parts
## 2023-10-17
- Updates from CSCS
- Minisymposium at PASC about GT4Py
- Model development workshop before or after PASC? Who would be interested?
- Pace
- Tasmania (Stefano Ubbiali)
- FVM-LAM, FVM global (ECMWF, ETH)
- ICON EXCLAIM
- Updates from NASA
- what's the status of AMD?
- LUMI AMD MI250x is faster than P100 but slower than A100 for the CloudSC dwarfs.
- performance gap bigger for more complex programs
- What are the compiler options on Lumi for compiling AMD GPU code?
- Updates from NOAA
- Pace cleanup to get ready for new developments
- work on double-periodic domain
- Updates from FVM-related
- latest version of ghex is not working on Piz Daint
## 2023-09-19
- Update from CSCS
- Coming back to real work after summer-time
- Keep working on embedded field view
- Planning a GT4Py (next) workshop for the ICON community at the end of November
- Updates from NASA
- Florian implemented global/absolute indexing using data dimensions trick without spatial dims.
- Florian playing with auto-differentiation with JAX and numpy backend (jitting doesn't work yet)
- Memory pressure issues with the GridTools backend (but DaCe works well) ??
- Small speedup in the CPU GridTools backend. Now it's only 1.8-2.6x slower than Fortran
- Updates from NOAA
- Frank is working on gt4py and PACE.
- PACE vision document for cleaning up the code
- More people joining soon to the effort
- Frank is working on PACE: more modular, analytic test cases, code refactoring, and in the future grid-generation and other initialization cleanups
- Oliver also working on PACE with other scientists/post-docs
- Oliver will present a paper at the ECMWF workshop
## 2023-08-22
- Update from CSCS
- Storage refactoring merged
- License
- Updates from NASA
- GEOS performance numbers for SC
- Updates from NOAA
- Pace infrastructure
- Updates from FVM-related (ECMWF, ETH)
## 2023-07-26
- Updates from CSCS
- Linus left
- make DaCe partial expansion work for as much as possible of Pace
- did some handover to Florian
- status of parallel compilation PR (https://github.com/GridTools/gt4py/pull/1242)
- distributing compilation on MPI ranks would be more interesting for Pace use-case -> tool on top of GT4Py
- in gt4py.next:
- working on embedded field view
- better exception handling
- (currently only for next: dlpack bindings support)
- Updates from NASA
- we have a PR up that updates Pace to the latest gt4py and DaCe (as of last Friday) with good validation
- GT4Py issue: reload so broken (not blocking)
- Updates from NOAA
- (@Oliver: we would need our CAA signed for your contributions)
- validations and verification of Pace
- possible next schemes: radiation schemes
- Updates from FVM-related (ECMWF, ETH)
- support for AMD GPUs in gt4py.cartesian
- Conferences:
- Christian Kühnlein and Till(?) will present at ECMWF workshop
## 2023-05-30
- Updates from CSCS
+ Get feedback from issues from previous meetings
+ Cleanups in internal tests infrastructure
+ Cleaning up public user interface for next (errors, storages)
+ Finishing the DaCe backend
- Updates from NASA:
+ Working on the GEOS side, not much on gt4py
+ Will send a reproducible artifact for the OpenACC vs gt4py comparison
+ Working on physics and discovered some missing features/problems for one-to-one porting:
+ Lookup tables: constant global arrays accessed at runtime from points
+ Breaking out of computations early according to some runtime condition
+ Working in the distributed compilation for FV3 (9 caches, ...)
+ DaCe backend: pending on a couple of bugs/issues but talked to SPCL people and they are working on it
+ Expanding communication layer for non-square layouts
- Updates from NOAA:
+ Working on adding `log10` builtin. Pending on adding the feature in the DaCe backend (DaCe PR already opened)
+ Transforming lookup tables to computations (it should work for now for microphysics)
+ Restructuring and refactoring PACE FV3 dycore
+ There is a possible candidate to help on the gt4py side who could join soon. Still looking for other candidates as well.
+ Duo grid experiments. How could it be accelerated? It currently requires a lot of small computation kernels.
+ "Unifying ... " workshop organized by NOAA will have a talk by Christian Kuhnlein
- Updates from FVM:
+ Poster on PASC
+ NWP conference ...
+ Christian is totally focused on the new global model using gt4py.next
+ The local cartesian model is more or less feature complete
+ Working on CloudSC and CloudSC2 microphysics. Finished the collection of timings per stencil (found a couple of bugs)
+ CloudSC has Fortran, CudaC, Loki (source-to-source translation tool) versions
+ kFirst 20-30% slower on CPU than Fortran
+ DaCe-GPU backend is on par or faster than OpenACC, faster than Loki, and slower than an optimized CudaC implementation
+ CloudSC2 doesn't have a Fortran GPU implementation, so the gt4py implementation is the first GPU version
- PACE paper: https://gmd.copernicus.org/articles/16/2719/2023/
## 2023-05-02
- Updates from CSCS
- parallel compilation PR
- scipy dependency removed (from default)
- Updates from NASA
- Validation of GEOS looking good - still in progress
- benchmark in-situ is 3.25x faster for the dycore
- `gt:gpu` closed the performance gap to orchestrated `dace:gpu`
- a few versions behind for both frameworks
- better thread/block for GT (preliminary nsys)
- `dace:gpu` generates too many kernels
- deeper investigation to come
- GridTools backend at scale: is there code out there?
- On the roadmap:
- test the parallel PR
- bundle physics OACC vs GT examples
- download Linus' brain on gt -> dace & dace CPU
- Updates from NOAA
- refactoring Pace to make it more maintainable (eventually to be merged to the main branch)
- new microphysics validated
- Updates from FVM~~ECMWF/ETH~~
- collecting performance numbers for cloudsc
- looking into porting ECRAD
## 2023-04-04
- Updates from CSCS
- @tehrengruber gist from FVM containing parallel compilation: https://gist.github.com/tehrengruber/bfc0050cf9f46e4fee031ca2bac0e3d8, Line 217ff
- Draft: Drop scipy as required dependency -> slow gamma function by default
- Refactored tests directory structure.
- How is Pace using gt4py.storages?
- allocators are used
- double-k-loop-off-center write pattern, see gh issue
- Updates from NASA
- 4.6x from 1 1/2 node-node (72 CPU cores vs 6 GPUs) 32-bit
- baseline: classic GEOS Fortran
- vs hybrid: physics on CPU, dycore on GPUs
- Physics ported with OpenACC vs GT4Py+DaCe
- bug in DaCe kernel fusion
- GT4Py+DaCe 20% faster than OpenACC
- data locality
- temporary array removal
- CSCS would be interested in seeing a comparison
- Updates from NOAA
- in microphysics:
- `__INLINED` is deprecated: https://github.com/GridTools/gt4py/issues/1012, but a compiler pass to eliminate compile-time ifs is missing (see the usage sketch after this list).
- [ ] would be interesting to see if `__INLINED` actually improves performance or the C++ compiler optimizes properly.
- applying for perlmutter allocation
- projects to optimize maybe at the LLVM level
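- For context, the deprecated `__INLINED` pattern mentioned above under microphysics looks roughly like the sketch below (the external name and the computation are illustrative). The open question from the issue is whether the C++ compiler already optimizes such constant branches well enough without it.

```python
# Sketch of a compile-time `if` via __INLINED on an external value.
import numpy as np
from gt4py.cartesian import gtscript
from gt4py.cartesian.gtscript import PARALLEL, computation, interval, __INLINED


@gtscript.stencil(backend="numpy", externals={"USE_FAST_PATH": True})
def scaled(q: gtscript.Field[np.float64]):
    from __externals__ import USE_FAST_PATH

    with computation(PARALLEL), interval(...):
        # The branch is resolved at stencil compile time from the external value.
        if __INLINED(USE_FAST_PATH):
            q = 2.0 * q
```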
- Updates from ECMWF:
- LAM-FVM (local area, on gt4py.cartesian)
- cloudsc ported to GT4Py
- GT4Py CPU backends slower than Fortran OpenMP
- GT4Py+Dace backend faster than optimized OpenACC, except:
- gt:gpu doesn't compile (compiler hang)
- cloudsc integrated in FVM
- with GHEX python bindings
## 2023-03-07
- Updates from CSCS
- `gt4py.next` part of the main branch
- Updates from NASA-GSFC
- Pace updated to work with float32
- Compiling Pace from scratch requires several (3-4) hours
- Can we build in parallel? (04.04 @tehrengruber gist from FVM containing parallel compilation: https://gist.github.com/tehrengruber/bfc0050cf9f46e4fee031ca2bac0e3d8, Line 217ff)
- Updates from NOAA-GFDL
- integration microphysics into Pace
- has writes on vertical offsets (which are not implemented in GT4Py; see the sketch below)
- a short-term solution in Cartesian could be that the US side implements the feature with CSCS design help
- longer term CSCS will implement a solution for gt4py.next which could be plugged into Pace with DaCe
- see https://github.com/GridTools/gt4py/issues/131
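- As a rough illustration of what a "write on a vertical offset" means in gtscript (field names are illustrative, and this assumes the K-offset write support that was merged later; it was not available at the time of this meeting):

```python
# Illustrative k-offset write: the assignment target carries a K offset
# inside a sequential computation.
import numpy as np
from gt4py.cartesian import gtscript
from gt4py.cartesian.gtscript import FORWARD, computation, interval


@gtscript.stencil(backend="numpy")
def offset_write(q: gtscript.Field[np.float64], flux: gtscript.Field[np.float64]):
    with computation(FORWARD), interval(0, -1):
        # Write to the level above the current one.
        q[0, 0, 1] = q + flux
```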
- Cartesian vs gt4py.next example
- Once we have a DaCe backend for gt4py.next, gt4py.next code could be integrated in DaCe orchestrated code (maybe summer 2023)