# Diffusion granule integration - setup guide

## Install icon4py + venv + deps

The steps below set up the icon4py virtual environment and all of its dependencies. These need to be available to the interpreter in your venv before you can execute `py2fgen` and finally run the Python code from Fortran.

**Note:** This has currently only been tested on Tsa; Balfrin is a work in progress.

1. Clone icon4py

```bash
cd $SCRATCH
git clone git@github.com:C2SM/icon4py.git
```

2. Clone netcdf4-python

```bash
git clone git@github.com:Unidata/netcdf4-python.git
cd netcdf4-python
git fetch --all --tags
git checkout -b latest v1.6.5rel # we use the latest tag
git submodule update --init
```

3. Load some modules

```bash
## Balfrin
module use /mch-environment/v5/modules
module load netcdf-c/4.8.1-nvhpc
module load hdf5/1.12.2-nvhpc
module load python/3.10.8
module load gcc/11.3.0
export BOOST_ROOT=/scratch/mch/agopal/gh_benchmarks_dsl/spack-c2sm/spack/opt/spack/linux-sles15-zen3/gcc-11.3.0/boost-1.82.0-svvawyd7blvz5lpguvj66ptwp4odtcvn
```

```bash
## Tsa
module use /apps/common/UES/sandbox/kraushm/tsa-nvhpc/easybuild/modules/all
module load gcc/8.3.0
module use /project/g110/install/tsa/python-3.10/module/
module load python/3.10.4
module load openmpi/4.1.0-nvhpc-21.2-cuda-11.2
source /project/g110/spack/user/tsa/spack/share/spack/setup-env.sh
spack load --first boost@1.77.0
```

4. Create the venv and install icon4py

```bash
cd $SCRATCH/icon4py
python3.10 -m venv .venv
source .venv/bin/activate
pip install --upgrade wheel pip setuptools
pip install --src _external_src -r requirements-dev.txt
# pip install cython==0.29.37  # this version may be required for cupy
```

5. Install cupy

```bash
# Balfrin
srun --pty bash
module load cuda/11.8.0-nvhpc
export CUDAARCHS="80"
export BOOST_ROOT=/scratch/mch/agopal/gh_benchmarks_dsl/spack-c2sm/spack/opt/spack/linux-sles15-zen3/gcc-11.3.0/boost-1.82.0-svvawyd7blvz5lpguvj66ptwp4odtcvn
pip install cupy-cuda11x
```

```bash
# Tsa
srun --partition=debug --pty bash
export CUDAARCHS="70"
export BOOST_ROOT=/project/g110/spack-install/tsa/boost/1.77.0/gcc/ugflscr3j6igarx7gugjwo4dc43lhh2e
pip install cupy-cuda11x
```

To check that it works:

```bash
nvidia-smi
pytest --backend=gtfn_gpu model/atmosphere/diffusion/tests/diffusion_stencil_tests/test_apply_diffusion_to_w_and_compute_horizontal_gradients_for_turbulence.py
```

6. Install mpi4py

```bash
# Balfrin
module load cray-mpich-binary/8.1.18.4-nvhpc
CFLAGS=-noswitcherror pip install mpi4py
```

`CFLAGS` is required due to this [issue](https://github.com/mpi4py/mpi4py/issues/114).

```bash
# Tsa
CFLAGS=-noswitcherror pip install mpi4py
```

7. Install GHEX (first set the optional environment variables for GHEX GPU support)

```bash
export GHEX_USE_GPU=ON
export GHEX_GPU_TYPE=NVIDIA
export GHEX_GPU_ARCH="70;80"
# for Balfrin only
module load gcc/11.3.0
pip install 'git+https://github.com/ghex-org/GHEX.git#subdirectory=bindings/python'
```

8. Install netcdf4-python

On Balfrin, hand edit `setup.cfg` in the netcdf4-python directory to add this line under `[directories]`:

`mpi_incdir=/mch-environment/v5/linux-sles15-zen3/nvhpc-23.3/cray-mpich-8.1.18.4-nvhpc-ytijkhza6iqxk642wz5hgx63q5t3uwm7/include`

then run

```bash
pip install cython
cd $SCRATCH/netcdf4-python
pip install --no-build-isolation -v .
```

```bash
# For Tsa
module load hdf5/1.10.5-nvhpc-21.2-cuda-11.2
module show netcdf/4.7.0-nvhpc-21.2-cuda-11.2
export NETCDF4_DIR=/apps/common/UES/sandbox/kraushm/tsa-nvhpc/easybuild/software/netCDF/4.7.0-NVHPC-21.2-cuda-11.2
export HDF5_DIR=/apps/common/UES/sandbox/kraushm/tsa-nvhpc/easybuild/software/HDF5/1.10.5-NVHPC-21.2-cuda-11.2
pip install cython
cd $SCRATCH/netcdf4-python
pip install -v .
```
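Before generating the bindings, it can save time to verify that the key dependencies import cleanly from the venv. A minimal sanity check (module names follow from the packages installed above; run it on a compute node so the GPU check can succeed):

```bash
source $SCRATCH/icon4py/.venv/bin/activate
# All three imports should succeed without errors
python -c "import cupy, mpi4py, netCDF4; print('imports OK')"
# cupy should report at least one visible device on a GPU node
python -c "import cupy; print(cupy.cuda.runtime.getDeviceCount(), 'GPU(s) visible')"
```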
## Run py2fgen

1. Copy the grid file (in serial mode the grid file is read in Python using the `ICON_GRID_LOC` environment variable). If the grid is called something other than `grid.nc`, you also need to export the `ICON_GRID_NAME` environment variable in the run script.

```bash
# Balfrin
cp /scratch/mch/jenkins/icon/pool/data/ICON/mch/grids/ch_r04b09_dsl/grid.nc icon4py/
```

```bash
# Tsa
cp /scratch/jenkins/icon/pool/data/ICON/mch/grids/ch_r04b09_dsl/grid.nc icon4py/
```

2. Run py2fgen

We can now run `py2fgen` to generate the bindings, most importantly the shared library (`.so`) and the Fortran interface (`.f90`).

*Example (adapt to your use case):*

```bash
py2fgen icon4pytools.py2fgen.wrappers.diffusion diffusion_init,diffusion_run diffusion_plugin -d -b GPU -o build_gpu2py
```

This should generate `diffusion_plugin.{c,f90,o,h,py,so}`. You now need to copy the Fortran interface (`.f90`) to the `src/atm_dyn_iconam` folder of your ICON installation.

**Note:** When running the `mch_r04b09_dsl` experiment you must also pass the `--limited-area` flag to `py2fgen`. The generated Python wrapper code as well as the Fortran interface will then be different (unpacking additional arrays), whereas for global experiments such as `ape_exclaim` you should not use this flag (omitting it is the default).

## Build icon-dsl in ACC mode

**Before you build ICON, make sure to start a new shell, as any previously loaded modules will interfere with the build process; the configure script will load the right ones for you.**

1. Clone the icon-exclaim branch

```bash
git clone -b diffusion_granule_verifying git@github.com:C2SM/icon-exclaim.git diffusion_granule_verifying
cd diffusion_granule_verifying
git submodule update --init
```

2. Follow the instructions in `/dsl/README.md` and, as usual, execute the `./setup.sh` file using either the `build_gpu2py` or `build_cpu2py` mode. Note that the path to the generated shared library has to be correctly specified in the respective configuration script, so check that first if things go wrong (see the sketch below).
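If the build links but the model aborts at startup, one common cause is that the shared library cannot be found. A minimal check, assuming the `build_gpu2py` output directory from the py2fgen example above and the usual `bin/icon` binary location (both are assumptions; adjust to your setup):

```bash
# Confirm the generated shared library exists where the configuration script points
# (file name taken from the py2fgen output listed above)
ls -l $SCRATCH/icon4py/build_gpu2py/diffusion_plugin.so
# Make sure the dynamic linker can resolve it at run time
export LD_LIBRARY_PATH=$SCRATCH/icon4py/build_gpu2py:$LD_LIBRARY_PATH
# bin/icon is the usual location of the ICON binary after a successful build
ldd bin/icon | grep -i diffusion
```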
## Prepare to run

1. Generate run scripts

```bash
./make_runscripts --all
```

2. Add these settings to your run script:

```bash
export BOOST_ROOT=<path to folder containing boost `include`, `lib` directories>
export PATH=/scratch/mch/agopal/run-granule-with-py2fgen/.venv/bin:$PATH
export LD_LIBRARY_PATH=/mch-environment/v5/linux-sles15-zen3/gcc-11.3.0/python-3.10.8-ipylersqr3j767naw72rcs6637o7jd5q/lib:$LD_LIBRARY_PATH
export ICON_GRID_LOC=<path to the grid used by the experiment>
export ICON_GRID_NAME=<grid name if something other than grid.nc, otherwise omit>
export ICON4PY_LAM=<set to 1 for the MCH experiment, omit otherwise>
export ICON4PY_BACKEND=<GPU or CPU>
```

**Example settings for an MCH R04B09 run on Tsa** (adapt to your needs):

```bash
export BOOST_ROOT=/project/g110/spack-install/tsa/boost/1.77.0/gcc/ugflscr3j6igarx7gugjwo4dc43lhh2e
export PATH=/scratch/skellerh/profiling_py2f/run-granule-with-py2fgen/.venv/bin:$PATH
export LD_LIBRARY_PATH=/project/g110/install/tsa/python-3.10/python/lib:$LD_LIBRARY_PATH
export ICON4PY_BACKEND=GPU
export ICON_GRID_LOC=/scratch/jenkins/icon/pool/data/ICON/mch/grids/ch_r04b09_dsl
export PYTHONOPTIMIZE=1
export ICON4PY_LAM=1
```

3. Change the run script to run on 1 node with 1 GPU and 2 MPI tasks (1 GPU task + 1 prefetch process).

4. If you want to reuse precompiled stencils, you can also add these export statements to the run file (use your own scratch directory and create a `tmp` directory there):

```bash
export GT4PY_BUILD_CACHE_LIFETIME=PERSISTENT
export GT4PY_BUILD_CACHE_DIR=/scratch/mch/agopal/tmp
```

## cupy

To build cupy from source instead of installing the prebuilt `cupy-cuda11x` wheel (Balfrin paths shown):

```bash
pip install cython==0.29.37
NVCC=/mch-environment/v5/linux-sles15-zen3/gcc-11.3.0/nvhpc-23.3-4sf7jktvmwpir5zm3bvnqvsjjaihxqzb/Linux_x86_64/23.3/compilers/bin/nvcc NVHPC_CUDA_HOME=/mch-environment/v5/linux-sles15-zen3/gcc-11.3.0/nvhpc-23.3-4sf7jktvmwpir5zm3bvnqvsjjaihxqzb CUDA_PATH=/mch-environment/v5/linux-sles15-zen3/nvhpc-23.3/cuda-11.8.0-fnodbjgsy5oei56xb7swss3lqwpjvbm7 pip install cupy
```
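Whichever install route you use, a one-liner smoke test (standard cupy API; run on a GPU node) confirms that cupy can actually allocate on and compute with the device:

```bash
# Allocates a small array on the GPU, runs a kernel, and copies the result back
python -c "import cupy; x = cupy.arange(10.0); print(float((x * 2).sum()))"
```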