
Brainstorming Cycle 30 07/25

(checked boxes mean the project is shaped)

Brainstorming Blueline production

GT4Py

  • concat_where (Hannes) (see the usage sketch after this list)
    • merge passes (Hannes could take this)
    • embedded (TODO)
    • integration into passmanager (not reviewed)
    • dace backend (under testing)
  • Different output domains (Hannes) -> let's shape this (see the sketch after this list)
    • when do we need it?
    • Advantages:
      • Nicer user-code
        • Enables solving nlevp1 problem properly
      • More optimization potential exposed (by composing bigger operators)
    • discuss some examples
  • Make the implicit domain (deduced from the size of the output field) available to static args; PMAP could profit from these compile-time domain sizes (Enrique)
    • blocked by compile() for field_operators
    • TODO: check if it works on all backends
  • (Caching)
    • switching between GPU and CPU breaks the translation cache (in a compiled program?)
  • (Investigate locking problems)
  • Unstructured extent analysis
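
A minimal usage sketch for the concat_where and output-domain items above. This is a sketch only: the concat_where import path and the field type syntax differ between gt4py versions, and names like set_surface are made up for illustration.

```python
import gt4py.next as gtx
from gt4py.next.ffront.experimental import concat_where  # import path varies by gt4py version

KDim = gtx.Dimension("K", kind=gtx.DimensionKind.VERTICAL)
KField = gtx.Field[gtx.Dims[KDim], float]

@gtx.field_operator
def set_surface(interior: KField, surface: KField) -> KField:
    # level 0 comes from `surface`, all other levels from `interior`
    return concat_where(KDim == 0, surface, interior)

@gtx.program
def run_set_surface(interior: KField, surface: KField, out: KField, nlev: gtx.int32):
    # explicit output domain: write only levels [0, nlev)
    set_surface(interior, surface, out=out, domain={KDim: (0, nlev)})
```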

DaCe (optimizations)

  • Continue optimizations (Philip)
    • Which programs did DaCe not see? Christoph's (30to38) and the one that Hannes touches (15to28)
    • Performance improvements of the transformations
  • (Code generator is nondeterministic)
  • A number of issues in the lowering to SDFG and dace backend in general (Edoardo):
    • Lowering of symbolic expressions (sympy symbols instead of tasklets; see the illustration after this list)
    • Scalar input to concat_where with empty domain: requires special handling of dynamic memlet on scalars (see dace PR #2064)
    • CUDA codegen issue: wrong CUDA code is generated for a very simple stencil with one scalar input and one field output, which writes the scalar value to a subset of the field (init_constant_cell_kdim_field).
    • Segmentation fault in dycore stencil (to be investigated, could it be related to ScalarToSymbolPromotion pass?)
  • Investigate what's missing to test DaCe performance directly in ICON4Py instead of in the benchmarking repo (Edoardo, Enrique, Magdalena)
    • datatests/stenciltests with static params etc.
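
For context on the symbolic-expression item: DaCe represents sizes and index bounds as sympy-backed symbols, so expressions like N - 1 live in the SDFG rather than in generated tasklet code. A standalone illustration (not the failing GT4Py lowering itself):

```python
import dace
import numpy as np

N = dace.symbol("N")  # sympy-backed symbolic size

@dace.program
def shift_right(inp: dace.float64[N], out: dace.float64[N]):
    out[1:] = inp[:-1]  # bounds such as N - 1 stay symbolic in the SDFG

x, y = np.arange(10.0), np.zeros(10)
shift_right(x, y)  # N is inferred from the argument shapes
```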

ICON4Py

  • Finish the last program (Christoph)
  • Implement CFL stencil in velocity advection with compatible interface to ICON (see below, Christoph, Chia Rui)
    • combine 8_to_18
  • Do a round of refactoring of all combined programs (Hannes)
    • concat_where only where needed, etc
    • Document the strategy that we currently apply
      • Why we do it
      • How should it look in the future
      • etc.
  • Connectivities from Fortran are allocated at nproma size (Hannes)
    • temporaries will use the connectivities
    • fix: shrink connectivities to their respective sizes (see the sketch after this list)
    • Hannes: think about mch_ch2 and GlobalIndices, which don't have nproma sizes
  • CI/Testing/benchmarking (Enrique, Magdalena):
    • Benchmark infrastructure and an experiment with a benchmark-relevant size
    • switch the DSL experiment to the smaller mch_ch2_small
    • Reduce what we run (with the GT4Py strategy)
    • Caching of programs (like in PMAP)
      • Till summarizes the PMAP strategy
      • FileCache or more?
      • Can we just leave this on scratch?
  • (Delete liskov (and programs that are only there for liskov))
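
A possible shape of the connectivity fix mentioned above (hypothetical names and sizes; the real tables come from the Fortran side):

```python
import numpy as np

def shrink_connectivity(table: np.ndarray, num_valid: int) -> np.ndarray:
    """Drop the nproma padding: keep only the first `num_valid` rows."""
    return np.ascontiguousarray(table[:num_valid, :])

# e.g. an edge-to-cell table padded to nproma * nblocks rows:
padded_e2c = np.zeros((32, 2), dtype=np.int32)     # 32 = padded row count
e2c = shrink_connectivity(padded_e2c, num_valid=27)  # 27 actual edges
assert e2c.shape == (27, 2)
```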

Integration

  • Memory consumption (Christoph)
    • Investigate memory consumption
    • Push OpenACC -> CUDA mempool from Dmitry
    • Fortran: check that we don't have allocations that are not used
    • Does the backend keep temporaries too long?
      • How important is this?
      • DaCe+gtfn: re-use/release temporaries within a program
    • DaCe/gtfn: probably shouldn't use the CUDA memory pool
      • check that DaCe does cudaMallocAsync if switched to per-run allocation
    • ICON4Py manual temporaries (decrease scope to where it's needed)
      • make cupy use the CUDA memory pool (see the sketch after this list)
  • Crash in mch_ch2 (Hannes)
    • not resolved in gtfn
    • check with DaCe
  • Make dynamic substepping available from Fortran (pass the relevant variables to Python), combine the programs further, and test the CFL-exceeded cases (extra diffusion and extra substep) (Chia Rui)
  • Continue deployment: uenv + venv including ICON Fortran (Christoph)
  • Performance benchmarks (Christoph)
    • Do we have a production relevant experiment running in CI?
      • soon
    • Is total currently a good comparison of DSL vs OpenACC?
      • Use total plus model_init, or time with an external timer
    • median would be ideal, but does not exist right now. Let's use min for now.
      • switch back to average as soon as we pre-compile
    • Get the mch_ch2 and mch_ch1 running (not in CI).
    • icon4py: mch-ch2 and mch-ch1-medium on 1 GPU
    • icon-exclaim: mch-ch1-medium on 1 GPU (+ mch-ch2 on 4 GPUs, as soon as possible)
    • see also the task in ICON4Py
  • (Integrate DaCe runs into CI)
  • (Return a scalar to Fortran: do we need it for MCH production? (Hannes))
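
For the "make cupy use the CUDA memory pool" item: one option is CuPy's stream-ordered pool, which is backed by cudaMallocAsync (CUDA >= 11.2). Whether this is the same pool the rest of the model uses still needs to be checked; sketch:

```python
import cupy as cp

# Route all CuPy allocations through the stream-ordered
# (cudaMallocAsync-backed) pool instead of CuPy's default device pool.
cp.cuda.set_allocator(cp.cuda.MemoryAsyncPool().malloc)

tmp = cp.zeros((1024, 1024))  # now served by the async pool
```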

Greenline

  • Halos for distributed (Magdalena)
  • Consistent torus implementation (Magdalena)
  • (Structured grid?)

Other

  • (Continuous benchmarking in GT4Py)
  • Investigate: Fix "During handling of the above exception, another exception occurred:" and show the actual exception (in CustomMapping and workflows; see the sketch after this list). (Hannes)
  • (Better asserts for out-of-bounds: forward shape inference (can compute) is a subset of backward inference (need to read))
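
Background for the exception-chaining item: the banner comes from Python's implicit exception chaining inside except blocks; re-raising with `from` makes the cause explicit (or `from None` hides it). Generic sketch, not the actual CustomMapping code:

```python
class MissingFieldError(KeyError):
    pass

def lookup(mapping: dict, key: str):
    try:
        return mapping[key]
    except KeyError as err:
        # `from err` records the KeyError as the direct cause;
        # `from None` would suppress the confusing
        # "During handling of the above exception ..." banner entirely.
        raise MissingFieldError(f"no entry for {key!r}") from err
```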

Came up later

(put stuff here that we didn't discuss together)

  • (The dace orchestration feature currently does not support stencil precompilation, because precompilation was added later. The icon4py granules make use of precompiled programs and are therefore not compatible with dace orchestration. The diffusion granule was the only component that worked with dace orchestration (and only on CPU, because of compilation errors on GPU). With the latest gt4py version we have to disable the dace orchestration tests on the diffusion granule. Do we want to keep maintaining this feature?) -> disable the tests for now