# ICON4Py benchmark on Clariden
Always work on the scratch file system. Working on the home file system degrades performance for the entire system, including for other users of the machine. For builds and test execution, allocate a compute node; never run them on a login node.
Alps user guide: https://user.cscs.ch/access/running/alps/
This environment is set up using [Stackinator](https://eth-cscs.github.io/stackinator/) to build the GT4Py dependencies (Boost, CMake, GCC and Python 3.10).
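To guard against accidentally working from the home file system, a small shell check can be run before building or testing. This is only a sketch: `is_on_scratch` is a hypothetical helper (not part of ICON4Py or the CSCS tooling), and it assumes that `$SCRATCH` points to your scratch directory, as in the steps below. It compares path strings only and does not resolve symbolic links.

```shell
# Hypothetical helper: returns 0 when the given directory
# (default: the current directory) lies under $SCRATCH.
is_on_scratch() {
    dir="${1:-$PWD}"
    scratch="${SCRATCH%/}"            # drop a trailing slash, if any
    [ -n "$scratch" ] || return 2     # SCRATCH is unset or empty
    case "$dir/" in
        "$scratch"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_on_scratch || echo "Not on scratch: cd to \$SCRATCH first" >&2` prints a warning when the current directory is outside scratch.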
1. Prepare a user environment using Stackinator (one-time setup)
You can skip this step and use the pre-built software stack available on the scratch file system:
```
export SQUASHFS_PATH="/iopsstor/scratch/cscs/epaone/uenv-images/clariden-icon4py-py310-cuda118.squashfs"
```
The creation of a user environment follows the procedure described [here](https://eth-cscs.github.io/stackinator/configuring/). The above software stack was created based on this Spack-environment configuration:
```
gcc-env:
  compiler:
    - toolchain: gcc
      spec: gcc@11
  mpi:
    spec: cray-mpich
    gpu: cuda
  unify: true
  specs:
  - boost@1.76
  - cmake
  - cuda@11.8
  - python@3.10
  variants:
  - +mpi
  - +cuda
  - cuda_arch=80
  views:
    default:
```
The entire configuration is available on GitHub: https://github.com/edopao/alps-spack-stacks/tree/dev-icon4py/recipes/icon4py/a100
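Before requesting an allocation, it can help to confirm that the squashfs image is actually readable from your account, since `srun` is started with `--uenv-file=$SQUASHFS_PATH`. The following is a minimal sketch; `check_uenv_image` is a hypothetical helper and not part of Stackinator or the uenv tooling.

```shell
# Hypothetical helper: verifies that the uenv image path is set and readable.
# Takes the path as an optional argument, defaulting to $SQUASHFS_PATH.
check_uenv_image() {
    image="${1:-$SQUASHFS_PATH}"
    if [ -z "$image" ]; then
        echo "SQUASHFS_PATH is not set" >&2
        return 1
    fi
    if [ ! -r "$image" ]; then
        echo "Cannot read uenv image: $image" >&2
        return 1
    fi
    echo "uenv image OK: $image"
}
```

Running `check_uenv_image` on the login node before `srun` is cheap, because it only inspects the file's presence and permissions.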
2. Check out the icon4py repo (one-time setup)
```
cd $SCRATCH
mkdir repo
cd repo
git clone https://github.com/C2SM/icon4py.git
```
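3. Allocate a compute node, load the user environment, and create the Python virtual environment:
Note that we export `CUDAARCHS="80"` so that the CUDA kernels are compiled for the A100 GPU architecture. The final `pytest` command runs a single stencil test, with benchmarks skipped, to verify the installation.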
```
cd $SCRATCH/repo/icon4py
srun -A <YOUR_ACCOUNT> -N1 -t60 --partition=nvgpu --pty --uenv-file=$SQUASHFS_PATH bash
module use /user-environment/modules
module load boost cmake cuda gcc python
export CUDAARCHS="80"
python -m venv .venv-py310
source .venv-py310/bin/activate
python -m pip install -r requirements-dev.txt
python -m pip install cupy-cuda11x
python -m pip install dace
pytest -s --benchmark-skip --backend=gtfn_gpu --grid=simple_grid model/atmosphere/diffusion/tests/diffusion_stencil_tests/test_calculate_nabla4.py
deactivate # exit python environment
exit # terminate slurm allocation on gpu node
```
4. Run the benchmark
```
cd $SCRATCH/repo/icon4py
srun -A <YOUR_ACCOUNT> -N1 -t60 --partition=nvgpu --pty --uenv-file=$SQUASHFS_PATH bash
uenv modules use
module load boost cmake cuda gcc python
export CUDAARCHS="80"
source .venv-py310/bin/activate
pytest -s -m 'not slow_tests' --benchmark-only --backend=gtfn_gpu --grid=simple_grid model/atmosphere/diffusion/tests/diffusion_stencil_tests
deactivate # exit python environment
exit # terminate slurm allocation on gpu node
```
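To keep benchmark results for later comparison, the timings can be written to a JSON file. The sketch below assumes that the `--benchmark-only` option used above comes from the pytest-benchmark plugin, which also accepts `--benchmark-json=PATH`. The helper `benchmark_results_file` and the `benchmark-results` directory are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: creates a results directory (default:
# $SCRATCH/benchmark-results, or ./benchmark-results if SCRATCH is unset)
# and prints a timestamped JSON file path inside it.
benchmark_results_file() {
    results_dir="${1:-${SCRATCH:-.}/benchmark-results}"
    mkdir -p "$results_dir" || return 1
    echo "$results_dir/diffusion_$(date +%Y%m%d_%H%M%S).json"
}

# Usage inside the allocation, with the virtual environment active:
#   pytest -s -m 'not slow_tests' --benchmark-only \
#       --benchmark-json="$(benchmark_results_file)" \
#       --backend=gtfn_gpu --grid=simple_grid \
#       model/atmosphere/diffusion/tests/diffusion_stencil_tests
```

Storing the files under scratch keeps the output off the home file system, in line with the guidance at the top of this document.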