This was done by GLM-5.1. Requested by @msimberg. Done on commit ab2c3ecba9084a39ec2146c1ca07b4d3b76657a2. # ICON4Py Architecture Review **Date**: 2026-04-27 **Codebase**: ~66K lines source, ~57K lines tests, 11 packages, uv monorepo **Reviewer**: Automated architectural review --- ## Executive Summary ICON4Py is a well-structured scientific codebase with a clean monorepo layout, enforced dependency boundaries, and consistent coding patterns. The core architectural decision -- a flat dependency graph where atmosphere packages depend only on `common` -- is sound and keeps coupling manageable. The codebase shows signs of organic growth with some areas of over-engineering (unused orchestration layer, elaborate factory patterns) and under-engineering (missing shared abstractions for cross-package duplication, incomplete driver migration). **Overall health: Good, with specific areas that deserve attention.** ### Key numbers | Metric | Value | |---|---| | Source files | 328 | | Test files | 355 | | Source lines | ~66K | | Test lines | ~57K | | Packages | 11 (10 model + 1 tools) | | TODOs in source | 263 | | `type: ignore` in source | 60 | | Largest file | `serialbox.py` (2,252 lines) | --- ## 1. Architecture Positives ### 1.1 Clean monorepo structure with enforced boundaries The `tach.toml` dependency graph is well-designed: atmosphere packages depend only on `common`, tools is fully independent, and the driver layer is the single integration point. Tach enforces this at CI time. No circular dependencies exist between packages (with one exception noted below). ### 1.2 Consistent per-package structure Each atmosphere package follows the same pattern: `<component>.py` (main class), `<component>_states.py` (dataclasses), `stencils/` (one file per stencil). This makes it easy to navigate unfamiliar packages. ### 1.3 Shallow inheritance hierarchies The grid class hierarchy is only two levels (`Grid` -> `IconGrid`). State types are flat dataclasses with no shared base. No deep inheritance chains anywhere. This is appropriate for a numerical codebase where composition wins over inheritance. ### 1.4 Good test infrastructure The pytest plugin architecture is clean: custom markers (`datatest`, `mpi`, `level`), custom CLI options, a well-designed filter system, and the `StencilTest` base class with `__init_subclass__` for automatic test registration. The fixture re-export pattern is explicit and debuggable. ### 1.5 The py2fgen tool is well-engineered The Python-to-Fortran binding generator is a clean, self-contained code generation pipeline. It uses Jinja templates, CFFI embedding, and handles the GPU memory model correctly. The `@export` decorator with annotation hooks is an elegant API. ### 1.6 Dual CI is appropriate Fast GitHub Actions (lint, unit tests) + expensive CSCS CI (GPU/HPC) is the right split for a climate modeling codebase. Nox provides local/CI parity. Pre-commit hooks are thorough. ### 1.7 The Pair/TimeStepPair abstraction `utils/_common.py` defines a generic `Pair[T]` with `PredictorCorrectorPair` and `TimeStepPair` specializations, plus a `named_property` descriptor and `chainable` decorator. This is well-designed, generic Python utility code with zero domain coupling. ### 1.8 Decomposition layer is well-factored The separation between graph partitioning (`decomposer.py`), topological halo construction (`halo.py`), and runtime MPI communication (`mpi_decomposition.py`) is clean. The `Decomposer` protocol with `MetisDecomposer` / `SingleNodeDecomposer` is a textbook strategy pattern. --- ## 2. Architecture Concerns ### 2.1 `standalone_driver` has undeclared dependencies (HIGH) **tach.toml** declares `standalone_driver` depends only on `common`. In reality, it imports from `advection`, `diffusion`, and `dycore` across 4 source files with 12 import statements. This is the largest undeclared dependency in the codebase. **Impact**: Tach is not catching this. The dependency boundary enforcement is incomplete. **Fix**: Update `tach.toml` to declare `standalone_driver -> advection, diffusion, dycore, common`. This would make it symmetric with `driver`. ### 2.2 Circular dependency between `decomposition` and `orchestration` (MEDIUM) `decomposition/definitions.py` imports `DummyNestedSDFG` from `orchestration/halo_exchange.py`. `orchestration/decorator.py` imports `ExchangeRuntime` from `decomposition/definitions.py`. And `decomposition/mpi_decomposition.py` imports `halo_exchange` from `orchestration/halo_exchange.py`. **Impact**: The `DummyNestedSDFG` class (a no-op SDFG for single-node mode) is a decomposition concept that lives in the wrong module. **Fix**: Move `DummyNestedSDFG` to `decomposition/` or a shared utility. ### 2.3 Two drivers in incomplete migration state (MEDIUM) `model/driver` (serialbox-based, legacy) and `model/standalone_driver` (computed, future) coexist with massive code duplication: - `_update_time_levels_for_velocity_tendencies()` is identical in both - `_do_dyn_substepping()` is nearly identical - The JW initial condition Newton iteration is copy-pasted across 3 files - `testcases/utils.py` has duplicate `hydrostatic_adjustment_ndarray` and `zonalwind_2_normalwind_ndarray` The standalone driver has a TODO to "change the name (remove the standalone)" when it becomes the main driver. **Impact**: Every fix must be applied twice. The migration is stalled. **Fix**: Extract shared time-stepping infrastructure and test case utilities to `common` or a shared `driver_utils` package. Then complete the migration. ### 2.4 No common component interface (MEDIUM) The atmosphere packages have completely bespoke APIs: - `SolveNonhydro.time_step(...)` takes 10+ arguments - `Diffusion.run(...)` takes 4 arguments - `Advection.run(...)` takes 5 arguments Each has different config types, different state types, different initialization patterns. Adding a new component requires modifying the driver's time loop, state construction, and configuration manually. The `Component[Ins, Outs]` protocol in `components/components.py` was designed for this but has **zero implementations**. It is dead scaffolding. **Impact**: The driver is a 600-line wiring script that knows the internal field names of every package's state. **Fix**: Either commit to the `Component` protocol or accept the procedural wiring. The current state (protocol exists but is unused) is the worst of both worlds. ### 2.5 The `@orchestrate` decorator is dead code (MEDIUM) `orchestration/decorator.py` (667 lines) implements a DaCe SDFG fusion decorator. It has **zero usages** in the codebase. No function applies `@orchestrate`. The underlying halo exchange SDFG infrastructure IS used (by `mpi_decomposition.py`), but the decorator itself is speculative. **Impact**: 667 lines of dead, high-complexity code. The reflective attribute scanning (iterating `self.__dict__` to find grid/exchange objects) is fragile and undocumented for future maintainers. **Fix**: Either commit to using it (with tests) or remove it. The halo exchange SDFG generation can stay since it is actually used. ### 2.6 Testing package has zero tests (LOW) The `model/testing` package (4,860 lines) has no unit tests. Core utilities like `data_handling.py`, `test_utils.py`, `definitions.py` are untested. **Impact**: Bugs in test infrastructure only surface when tests fail for other reasons. **Fix**: Add at least basic unit tests for `test_utils.py`, `data_handling.py`, `definitions.py`, and `locking.py`. --- ## 3. Overengineering ### 3.1 The `FieldSource` / `FieldProvider` abstraction in `states/factory.py` 804 lines defining 4 protocols, 4 concrete provider classes, and utility functions. The abstraction cost is high for 3 production consumers: `metrics_factory.py`, `interpolation_factory.py`, and `grid/geometry.py`. **Specific issues**: - `EmbeddedFieldOperatorProvider` and `ProgramFieldProvider` share ~60 lines of allocation and offset-provider logic but have no common base - `FieldSource` is a Protocol with a full default implementation of `get()` -- non-idiomatic - `_providers: MutableMapping[str, FieldProvider] = {}` is a mutable class-level default on a Protocol (shared-state bug risk) - `ModelField` and `DataField` (in `model.py`) are unused in production **Verdict**: The abstraction is justified in principle (lazy evaluation, dependency graph, multiple backends) but overbuilt for the current number of consumers. The 4 provider classes could be 2 with a shared base. ### 3.2 The `Component` protocol `components/components.py` defines a generic `Component[Ins, Outs]` protocol with declared input/output metadata. Zero implementations. This is forward-looking scaffolding that should either be committed to or removed. ### 3.3 Per-package `pyproject.toml` duplication All 12 packages share identical build system requirements, classifiers, bump-my-version config, and ruff config references. The gt4py version (`==1.1.9`) is hardcoded in 11 of 12 files with no single source of truth. **Fix**: Use a `[dependency-groups]` entry in the root `pyproject.toml` or a `[tool.uv.override]` to centralize the gt4py pin. --- ## 4. Underengineering / Missing Shared Abstractions ### 4.1 InterpolationState duplication between diffusion and dycore `DiffusionInterpolationState` and `dycore.InterpolationState` share 8 identical fields (`e_bln_c_s`, `rbf_coeff_1`, `rbf_coeff_2`, `geofac_div`, `geofac_n2s`, `geofac_grg_x`, `geofac_grg_y`, `nudgecoeff_e`) with only dtype differences. The dycore version has 12 additional fields. **Fix**: Extract a shared base `InterpolationState` to `common` with the 8 common fields. Packages extend with their specific fields. ### 4.2 `_determine_local_domains()` boilerplate Every main class (advection, diffusion, dycore, velocity_advection, microphysics) repeats the same 10-20 lines computing horizontal domain objects. This is pure boilerplate. **Fix**: A helper function in `common/grid/horizontal.py` that returns a frozen dataclass of all zone indices. ### 4.3 `_scale_k` stencil duplication Identically defined in both `diffusion_utils.py` and `dycore_utils.py`. **Fix**: Move to `common/math/` or `common/stencils/`. ### 4.4 `nudge_max_coeff` config handling The same 15-line pattern for computing `max_nudging_coefficient` from `_nudge_max_coeff` is copy-pasted verbatim between `diffusion.py` and `solve_nonhydro.py`. Both files have TODOs acknowledging this. ### 4.5 No structural link between state fields and CF metadata `PrognosticState.rho` is a `CellKField[wpfloat]` but nothing enforces that `PROGNOSTIC_CF_ATTRIBUTES["air_density"]` describes this same field. The correspondence exists only by naming convention. **Impact**: If someone renames a state field or adds a new one without updating the metadata, tests silently become incorrect. **Fix**: Consider deriving metadata from the state dataclass via a decorator or metaclass, or at minimum, add a validation test. ### 4.6 `ddqz_z_full` is in 4 different metric states Advection, diffusion, dycore, and microphysics all define their own metric state containing `ddqz_z_full`. A base metric state with this common field would reduce duplication. --- ## 5. Code Smells (The Small) ### 5.1 Bugs | Location | Description | |---|---| | `geometry_attributes.py:269` | Copy-paste bug: `EDGE_TANGENT_Z` has `standard_name=EDGE_NORMAL_Z` (2026-05-04, fixed in https://github.com/C2SM/icon4py/pull/1230) | | `factory.py:761` | `NumpyDataProvider.needs_exchange()` references undefined `self._do_exchange` (will raise `AttributeError`) | | `factory.py:172` | Mutable class-level default `{}` on Protocol `FieldSource._providers` | ### 5.2 Frozen dataclass mutation via `object.__setattr__` Used in `GlobalGridParams.__post_init__` (`icon.py:86-105`) and `VerticalGrid.__post_init__` (`vertical.py:145-170`). If a dataclass needs mutation during initialization, it should not be frozen. ### 5.3 `grid/states.py` naming collision with `model/common/states/` Two different modules named `states` in the same namespace tree serving different purposes. `EdgeParams`/`CellParams` (old pattern) coexists with the `FieldSource` factory (new pattern). Rename to `grid/geometry_params.py`. ### 5.4 `EdgeParams` mutable bag-of-fields anti-pattern 20 optional constructor parameters, all defaulting to `None`. Widely used across 45+ files. This is the old pattern that the `FieldSource` factory was designed to replace, but both coexist. ### 5.5 `geometry.py` registration verbosity ~600 lines of repetitive provider registration code in `GridGeometry`. Each provider follows the same `factory.ProgramFieldProvider(func=..., deps=..., fields=..., domain=..., do_exchange=...)` pattern. A data-driven approach could reduce this substantially. ### 5.6 `metric_fields.py` and `interpolation_fields.py` are misnamed Despite the `_fields` naming, these are stencil/computation libraries, not field definitions. `metric_fields.py` (960 lines) contains 17 GT4Py stencil programs. `interpolation_fields.py` (1303 lines) contains 20+ numpy computation functions. ### 5.7 `math/helpers.py` is misnamed 661 lines of core spherical geometry and vector algebra field operators. The name "helpers" suggests minor utilities. Should be `vector_algebra.py` or `spherical_geometry.py`. ### 5.8 Private symbol leakage Both `derivative.py` and `smagorinsky.py` have their underscore-prefixed field_operators imported directly by atmosphere packages (`_compute_first_vertical_derivative_at_cells` by dycore, `_en_smag_fac_for_zero_nshift` by diffusion). This breaks the intended `field_operator` + `program` encapsulation pattern. ### 5.9 `math/xp_utils.py` is dead code Contains only `compute_sqrt(input_val: np.float64) -> np.float64` wrapping `math.sqrt`. Zero internal consumers. The actual numpy/cupy switching is in `data_allocation.py`. ### 5.10 `initialization/topography.py` is orphaned Zero imports. No `__init__.py` in the `initialization/` directory. Either dead code or not yet connected. ### 5.11 State frozen/mutable inconsistency | Package | States | frozen? | |---|---|---| | Advection | All | Yes | | Diffusion | All | Yes | | Dycore | All | **No** | | Microphysics | Mixed | Yes + N/A | Dycore's mutable states allow accidental modification. The inconsistency makes reasoning about state ownership harder. ### 5.12 Typo in filename `microphysics/stencils/microphyiscal_processes.py` should be `microphysical_processes.py`. ### 5.13 `muphys/pyproject.toml` inconsistencies - Uses `license = {file = "..."}` while all others use `license = {text = "..."}` - Different `authors` format - Missing `Python :: 3.11` classifier - Uses `repository` in URLs while others use `Homepage` ### 5.14 `bindings/pyproject.toml` dead extra Still has a `cuda11` optional dependency, but the root `pyproject.toml` only defines `cuda12`, `cuda13`, `rocm7`. --- ## 6. The `model/common` Package: Deep Assessment `model/common` is the hub of the codebase at ~22K lines (100 source files). Everything depends on it. Its internal architecture determines the health of the entire system. ### 6.1 Internal structure ``` common/ grid/ 6,324 lines -- Grid construction, geometry, connectivities states/ 1,441 lines -- State management, field factory metrics/ 3,450 lines -- Vertical grid metrics interpolation/ 3,372 lines -- Interpolation coefficients orchestration/ 1,080 lines -- DaCe SDFG fusion (unused decorator) decomposition/ 1,795 lines -- MPI decomposition, halo exchange components/ 143 lines -- Component/Monitor protocols (aspirational) math/ 1,029 lines -- Vector algebra, stencils utils/ 706 lines -- Generic utilities, memory allocation io/ 1,061 lines -- NetCDF output, UGRID initialization/ 44 lines -- Topography (orphaned) + top-level: ~800 lines -- constants, dimension, type_alias, backends, options, exceptions ``` ### 6.2 Pattern: The three-layer architecture Three submodules (`grid/`, `metrics/`, `interpolation/`) share an identical three-layer pattern: ``` Layer 1: _attributes.py -- Pure data: field names + metadata (1,182 lines total) Layer 2: _fields.py + compute_* -- Computation: stencils + numpy functions Layer 3: _factory.py -- Wiring: dependency graph + lazy evaluation ``` This is a sound architecture. The question is whether the pattern is explicit enough to benefit from a shared base abstraction. ### 6.3 The factory pattern is verbose but justified `metrics_factory.py` (1,022 lines) and `interpolation_factory.py` (685 lines) are large, but they do one thing: wire dependency graphs for lazy field computation. The verbosity comes from the `ProgramFieldProvider(...)` constructor calls (~15-20 lines each). A more concise DSL could halve the line count. Both factories share the same constructor signature, the same `_register_computed_fields` pattern, the same `_sources` property, and the same `metadata`/`backend`/`grid` properties. A `FieldFactoryBase` could extract this common structure. ### 6.4 The old/new pattern coexistence Two patterns for passing geometry data coexist: - **Old**: `EdgeParams`/`CellParams` in `grid/states.py` -- mutable bag-of-fields, widely used - **New**: `FieldSource` / `GridGeometry` in `states/factory.py` -- lazy evaluation with metadata The new pattern is used in `standalone_driver` and `GridGeometry`. The old pattern is used in `driver` (serialbox path), `bindings`, and many tests. Both will likely coexist for the foreseeable future. ### 6.5 What `common` does well - **Clean dimensional model**: `dimension.py` defines all GT4Py dimensions in one place. `field_type_aliases.py` provides parameterized type aliases used everywhere. - **Backend abstraction**: `model_backends.py` and `model_options.py` handle GT4Py backend configuration with a registry pattern. - **Physical constants**: `constants.py` is a clean mapping of ICON's `mo_physical_constants.f90`. - **Memory allocation**: `data_allocation.py` (228 lines) handles numpy/cupy/GT4Py field allocation uniformly. ### 6.6 What `common` could improve 1. **Extract `connectivity.py` from `grid_manager.py`**: The 7 connectivity construction functions (lines 548-776) are pure array manipulation unrelated to grid file management. 2. **Rename `grid/states.py` to `grid/geometry_params.py`**: Eliminates naming collision with `states/`. 3. **Consolidate `math/` patterns**: Currently 3 different organizational styles (operators+stencils split, helpers monolith, operator+program in one file). Pick one. 4. **Split `decomposition/definitions.py`**: At 679 lines it defines streams, process properties, decomposition info, exchange protocols, reduction protocols, and factory functions. Split into `streams.py`, `process.py`, `exchange.py`, `reductions.py`. 5. **Remove dead code**: `xp_utils.py`, `initialization/topography.py`, `ModelField`/`DataField`/`Component` protocol, `@orchestrate` decorator. --- ## 7. Per-Package Assessment ### Advection -- Cleanest atmosphere package Clear ABC + Strategy pattern, frozen state dataclasses, one stencil per file, good docstrings, reasonable file sizes. The `GodunovSplittingAdvection` splits into horizontal/vertical sub-modules cleanly. ### Diffusion -- Solid, with coupling to dycore Well-structured but shares interpolation state with dycore (8 identical fields). The config uses a mutable class rather than a frozen dataclass (inconsistent with advection). Imports a private symbol from `math/smagorinsky.py`. ### Dycore -- Largest and most complex `solve_nonhydro.py` at 1,427 lines is the largest source file. The `__init__` alone is ~440 lines of stencil setup. Mutable state dataclasses. `DiagnosticStateNonHydro` has 21 fields mixing concerns (IAU increments, predictor/corrector pairs). `IntermediateFields` exists because the main class is already too large. **Recommendation**: Factor `__init__` stencil setup into a separate builder. Split `DiagnosticStateNonHydro` into core diagnostics and extension fields. ### Microphysics -- Monolithic stencils, undertested `graupel_stencils.py` (1,322 lines) and `microphyiscal_processes.py` (1,135 lines, typo in filename) are monolithic files that do not follow the per-stencil-file convention. Test coverage is low (3857 src / 416 test lines). Does not use decomposition/exchange at all. ### Muphys -- Different but clean Function-oriented (no classes), minimal common surface area, clean `core/` + `implementations/` + `driver/` structure. Has its own constants in `core/common/constants.py` instead of using common's. Not integrated with the model driver. ### Driver + Standalone Driver -- Incomplete migration See section 2.3 above. The standalone driver is a superset being prepared to replace the original, but the migration is stalled with significant code duplication. ### Tools (py2fgen) -- Excellent Clean code generation pipeline, well-separated concerns, self-contained, no model dependencies. The `@export` decorator with annotation hooks is elegant. ### Bindings -- Functional but with module globals The `_init`/`_run` granule pattern matches ICON Fortran's module structure. Module-level globals (`granule = None`, `grid_state = None`) are acknowledged with TODOs. `dycore_wrapper.py` at 445 lines is the most complex binding. ### Testing -- Functional but bloated `serialbox.py` at 2,252 lines mixes generic serialization infrastructure with domain-specific savepoint definitions. The savepoint classes for each atmosphere component should be co-located with those packages. The testing package has zero unit tests. --- ## 8. CI/CD Assessment ### What works - Dual CI (GitHub fast feedback + CSCS GPU/HPC) is appropriate - Nox provides local/CI parity with 66 well-defined sessions - Pre-commit hooks use uv-managed tool versions - Tach enforces dependency boundaries - Docker images use SHA-based tags for reproducibility ### What needs attention - **gt4py version duplicated 11 times** across per-package pyproject.toml files - **GitHub Actions test-model disables caching** (48 matrix jobs would thrash cache, but costs wall-clock time) - **DaCe GPU backend excluded from several CI paths** due to compilation time, leaving coverage gaps - **No change-detection** to skip irrelevant matrix entries on PRs - **MPI Docker image rebuild** is expensive (builds OpenMPI, libfabric from source) --- ## 9. Actionable Recommendations (Priority Order) ### High Priority 1. **Fix tach.toml**: Declare `standalone_driver`'s actual dependencies on advection/diffusion/dycore. 2. **Fix bugs**: `geometry_attributes.py:269` copy-paste bug, `factory.py:761` undefined `self._do_exchange`, `factory.py:172` mutable class-level default. 3. **Extract shared `InterpolationState` base** for the 8 common fields between diffusion and dycore. 4. **Extract `_determine_local_domains()` helper** from the 5+ copies in atmosphere packages. 5. **Centralize gt4py version** in one place instead of 11 pyproject.toml files. ### Medium Priority 6. **Split `decomposition/definitions.py`** into `streams.py`, `process.py`, `exchange.py`, `reductions.py`. 7. **Extract connectivity construction** from `grid_manager.py` into `connectivity.py`. 8. **Rename misleading files**: `metric_fields.py` -> `metric_stencils.py`, `interpolation_fields.py` -> `interpolation_computations.py`, `math/helpers.py` -> `math/vector_algebra.py`. 9. **Decide on `@orchestrate` decorator**: Either commit to using it or remove it. 10. **Decide on `Component` protocol**: Either drive adoption or remove the dead code. 11. **Complete driver migration**: Extract shared time-stepping and test case code, then deprecate `model/driver`. 12. **Break up `serialbox.py`**: Move domain-specific savepoint classes to their respective packages. ### Low Priority 13. Add unit tests for `model/testing` package. 14. Fix `muphys/pyproject.toml` inconsistencies (license format, authors format, classifiers). 15. Remove dead `cuda11` extra from `bindings/pyproject.toml`. 16. Remove dead code: `xp_utils.py`, `initialization/topography.py`, `ModelField`/`DataField`. 17. Fix typo: `microphyiscal_processes.py` -> `microphysical_processes.py`. 18. Make dycore state dataclasses frozen for consistency. 19. Stop importing private field_operators from `derivative.py` and `smagorinsky.py`. --- ## 10. Metrics Summary ### Codebase size by package | Package | Source lines | Test lines | Test/Source ratio | |---|---|---|---| | common | 21,683 | 12,596 | 0.58 | | advection | 7,908 | 6,000 | 0.76 | | diffusion | 2,822 | 3,456 | 1.22 | | dycore | 10,472 | 15,146 | 1.45 | | microphysics | 3,857 | 416 | 0.11 | | muphys | 4,168 | 2,615 | 0.63 | | driver | 2,452 | 996 | 0.41 | | standalone_driver | 2,223 | 246 | 0.11 | | testing | 4,860 | 38 | 0.01 | | tools | 1,625 | 1,255 | 0.77 | | bindings | 1,876 | 6,939 | 3.70 | ### Largest source files | File | Lines | Concern | |---|---|---| | `testing/serialbox.py` | 2,252 | God file, should be split | | `dycore/solve_nonhydro.py` | 1,427 | Large class, complex init | | `microphysics/stencils/graupel_stencils.py` | 1,322 | Monolithic stencil file | | `common/interpolation/interpolation_fields.py` | 1,303 | Misnamed computation library | | `microphysics/stencils/microphyiscal_processes.py` | 1,135 | Monolithic, typo in name | | `common/metrics/metrics_factory.py` | 1,022 | Verbose wiring, but justified | | `common/metrics/metric_fields.py` | 960 | Misnamed stencil library | | `common/grid/geometry.py` | 918 | Verbose registration | | `diffusion/diffusion.py` | 929 | Acceptable for main class | | `common/grid/geometry_stencils.py` | 852 | Systematic icosahedron/torus duplication |