
# ICON4Py benchmark on Clariden

Always work on the scratch file system. Working on the home file system degrades performance for the entire system, including for other users on the machine.

For building and running tests, allocate a compute node (never run from a login node). Alps user guide: https://user.cscs.ch/access/running/alps/

This environment is set up using [Stackinator](https://eth-cscs.github.io/stackinator/) to build the GT4Py dependencies (boost, cmake, gcc and Python 3.10).

1. Prepare a user environment using Stackinator (one-time setup)

You can skip this step and use a pre-built software stack available on the scratch file system:

```
export SQUASHFS_PATH="/iopsstor/scratch/cscs/epaone/uenv-images/clariden-icon4py-py310-cuda118.squashfs"
```

The creation of a user environment follows the procedure described [here](https://eth-cscs.github.io/stackinator/configuring/). The above software stack was built from this Spack environment configuration:

```
gcc-env:
  compiler:
    - toolchain: gcc
      spec: gcc@11
  mpi:
    spec: cray-mpich
    gpu: cuda
  unify: true
  specs:
  - boost@1.76
  - cmake
  - cuda@11.8
  - python@3.10
  variants:
  - +mpi
  - +cuda
  - cuda_arch=80
  views:
    default:
```

The entire configuration is available on GitHub: https://github.com/edopao/alps-spack-stacks/tree/dev-icon4py/recipes/icon4py/a100

2. Check out the icon4py repo (one-time setup)

```
cd $SCRATCH
mkdir repo
cd repo
git clone https://github.com/C2SM/icon4py.git
```

3. Allocate a compute node and load the user environment

Note that we export `CUDAARCHS="80"` in order to compile the CUDA kernels for the A100 GPU architecture.
```
cd $SCRATCH/repo/icon4py
srun -A <YOUR_ACCOUNT> -N1 -t60 --partition=nvgpu --pty --uenv-file=$SQUASHFS_PATH bash

module use /user-environment/modules
module load boost cmake cuda gcc python
export CUDAARCHS="80"

python -m venv .venv-py310
source .venv-py310/bin/activate
python -m pip install -r requirements-dev.txt
python -m pip install cupy-cuda11x
python -m pip install dace

pytest -s --benchmark-skip --backend=gtfn_gpu --grid=simple_grid model/atmosphere/diffusion/tests/diffusion_stencil_tests/test_calculate_nabla4.py

deactivate  # exit python environment
exit        # terminate slurm allocation on gpu node
```

4. Run benchmark

```
cd $SCRATCH/repo/icon4py
srun -A <YOUR_ACCOUNT> -N1 -t60 --partition=nvgpu --pty --uenv-file=$SQUASHFS_PATH bash

uenv modules use
module load boost cmake cuda gcc python
export CUDAARCHS="80"

source .venv-py310/bin/activate

pytest -s -m 'not slow_tests' --benchmark-only --backend=gtfn_gpu --grid=simple_grid model/atmosphere/diffusion/tests/diffusion_stencil_tests

deactivate  # exit python environment
exit        # terminate slurm allocation on gpu node
```
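
The precautions stated on this page (work under `$SCRATCH`, run inside a compute-node allocation, build for `CUDAARCHS="80"`) can be checked before launching `pytest`. The following is only a minimal sketch: the function names `is_under_scratch` and `preflight` are hypothetical and not part of icon4py or the CSCS tooling, and it assumes that Slurm sets `SLURM_JOB_ID` inside the shell started by `srun`.

```shell
#!/usr/bin/env bash
# Illustrative pre-flight check; helper names are hypothetical.

# Return 0 if directory $1 is the scratch root $2 or lies below it.
is_under_scratch() {
    local dir="$1" scratch="$2"
    [ -n "$scratch" ] || return 1
    case "$dir/" in
        "${scratch%/}/"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print one warning per failed check; the return code is the number of failures.
preflight() {
    local failures=0
    if ! is_under_scratch "$PWD" "${SCRATCH:-}"; then
        echo "WARNING: $PWD is not under \$SCRATCH" >&2
        failures=$((failures + 1))
    fi
    # Assumption: SLURM_JOB_ID is only set inside a Slurm allocation.
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "WARNING: no Slurm allocation detected (login node?)" >&2
        failures=$((failures + 1))
    fi
    if [ "${CUDAARCHS:-}" != "80" ]; then
        echo "WARNING: CUDAARCHS is '${CUDAARCHS:-}', expected 80 for A100" >&2
        failures=$((failures + 1))
    fi
    return "$failures"
}
```

One possible use is to source this file after `export CUDAARCHS="80"` and run `preflight && pytest ...`, so that the test run only starts when all three checks pass.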