# Diffusion granule integration - setup guide
## Install icon4py + venv + deps
The steps below set up the icon4py virtual environment and all of its dependencies. These need to be available to the interpreter in your venv before you can execute `py2fgen` and, finally, run the Python code from Fortran.
**Note:** This has currently only been tested on Tsa; Balfrin is a work in progress.
1. Clone icon4py
```bash
cd $SCRATCH
git clone git@github.com:C2SM/icon4py.git
```
2. Clone netcdf4-python
```bash
git clone git@github.com:Unidata/netcdf4-python.git
cd netcdf4-python
git fetch --all --tags
git checkout -b latest v1.6.5rel # we use the latest tag
git submodule update --init
```
3. Load some modules
```bash
## Balfrin
module use /mch-environment/v5/modules
module load netcdf-c/4.8.1-nvhpc
module load hdf5/1.12.2-nvhpc
module load python/3.10.8
module load gcc/11.3.0
export BOOST_ROOT=/scratch/mch/agopal/gh_benchmarks_dsl/spack-c2sm/spack/opt/spack/linux-sles15-zen3/gcc-11.3.0/boost-1.82.0-svvawyd7blvz5lpguvj66ptwp4odtcvn
```
```bash
## Tsa
module use /apps/common/UES/sandbox/kraushm/tsa-nvhpc/easybuild/modules/all
module load gcc/8.3.0
module use /project/g110/install/tsa/python-3.10/module/
module load python/3.10.4
module load openmpi/4.1.0-nvhpc-21.2-cuda-11.2
source /project/g110/spack/user/tsa/spack/share/spack/setup-env.sh
spack load --first boost@1.77.0
```
4. Create the venv and install icon4py
```bash
cd $SCRATCH/icon4py
python3.10 -m venv .venv
source .venv/bin/activate
pip install --upgrade wheel pip setuptools
pip install --src _external_src -r requirements-dev.txt
##pip install cython==0.29.37 ## this version required for cupy?
```
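A quick sanity check that the venv sees the installed packages (the package names in the `grep` are assumptions; adjust to your checkout):
```bash
python -c "import gt4py; print('gt4py OK')"
pip list | grep -iE 'icon4py|gt4py'
```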
5. Install cupy
```bash
# Balfrin
srun --pty bash
module load cuda/11.8.0-nvhpc
export CUDAARCHS="80"
export BOOST_ROOT=/scratch/mch/agopal/gh_benchmarks_dsl/spack-c2sm/spack/opt/spack/linux-sles15-zen3/gcc-11.3.0/boost-1.82.0-svvawyd7blvz5lpguvj66ptwp4odtcvn
pip install cupy-cuda11x
```
```bash
# Tsa
srun --partition=debug --pty bash
export CUDAARCHS="70"
export BOOST_ROOT=/project/g110/spack-install/tsa/boost/1.77.0/gcc/ugflscr3j6igarx7gugjwo4dc43lhh2e
pip install cupy-cuda11x
```
To check that it works:
```bash
nvidia-smi
pytest --backend=gtfn_gpu model/atmosphere/diffusion/tests/diffusion_stencil_tests/test_apply_diffusion_to_w_and_compute_horizontal_gradients_for_turbulence.py
```
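A cupy-specific sanity check (a minimal sketch; run on a GPU node with the venv active):
```bash
python -c "import cupy; x = cupy.arange(10); print(x.sum(), x.device)"
```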
6. Install mpi4py
```bash
# Balfrin
module load cray-mpich-binary/8.1.18.4-nvhpc
CFLAGS=-noswitcherror pip install mpi4py
```
Setting `CFLAGS` is required due to this [issue](https://github.com/mpi4py/mpi4py/issues/114).
```bash
# Tsa
CFLAGS=-noswitcherror pip install mpi4py
```
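A minimal mpi4py sanity check (the exact `srun` options depend on the machine; treat them as assumptions):
```bash
srun -n 2 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.rank, 'of', MPI.COMM_WORLD.size)"
```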
7. Install GHEX
Optional environment variables for GHEX GPU support:
```bash
export GHEX_USE_GPU=ON
export GHEX_GPU_TYPE=NVIDIA
export GHEX_GPU_ARCH="70;80"
# for Balfrin only
module load gcc/11.3.0
pip install 'git+https://github.com/ghex-org/GHEX.git#subdirectory=bindings/python'
```
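To verify the installation (assuming the bindings install as the `ghex` Python module):
```bash
python -c "import ghex; print(ghex.__file__)"
```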
8. Install netcdf4-python
On Balfrin only: in the netcdf4-python directory, hand-edit `setup.cfg` to add this line under `[directories]`:
`mpi_incdir=/mch-environment/v5/linux-sles15-zen3/nvhpc-23.3/cray-mpich-8.1.18.4-nvhpc-ytijkhza6iqxk642wz5hgx63q5t3uwm7/include`
then run:
```bash
pip install cython
cd $SCRATCH/netcdf4-python
pip install --no-build-isolation -v .
```
```bash
# Tsa
module load hdf5/1.10.5-nvhpc-21.2-cuda-11.2
module show netcdf/4.7.0-nvhpc-21.2-cuda-11.2
export NETCDF4_DIR=/apps/common/UES/sandbox/kraushm/tsa-nvhpc/easybuild/software/netCDF/4.7.0-NVHPC-21.2-cuda-11.2
export HDF5_DIR=/apps/common/UES/sandbox/kraushm/tsa-nvhpc/easybuild/software/HDF5/1.10.5-NVHPC-21.2-cuda-11.2
pip install cython
cd $SCRATCH/netcdf4-python
pip install -v .
```
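To verify the installation:
```bash
python -c "import netCDF4; print(netCDF4.__version__)"
```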
## Run py2fgen
1. Copy the grid file (in serial mode the grid file is read in Python using the `ICON_GRID_LOC` environment variable). If the grid is called something other than `grid.nc`, you also need to export the `ICON_GRID_NAME` environment variable in the run script.
```bash
# Balfrin
cp /scratch/mch/jenkins/icon/pool/data/ICON/mch/grids/ch_r04b09_dsl/grid.nc icon4py/
```
```bash
# Tsa
cp /scratch/jenkins/icon/pool/data/ICON/mch/grids/ch_r04b09_dsl/grid.nc icon4py/
```
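A quick check that the copied grid file is readable with the netcdf4-python bindings installed earlier (the path is an assumption; adjust to where you copied the file):
```bash
python -c "import netCDF4; ds = netCDF4.Dataset('icon4py/grid.nc'); print(sorted(ds.dimensions))"
```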
2. Run py2fgen
We can now run `py2fgen` to generate the bindings, most importantly the shared library (`.so`) and the Fortran interface (`.f90`).
*Example (adapt to your use case):*
```bash
py2fgen icon4pytools.py2fgen.wrappers.diffusion diffusion_init,diffusion_run diffusion_plugin -d -b GPU -o build_gpu2py
```
This should generate `diffusion_plugin.{c,f90,o,h,py,so}`.
You now need to copy the Fortran interface (`.f90`) to the `src/atm_dyn_iconam` folder of your ICON installation.
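*Example (hypothetical paths; adjust the build output directory and ICON checkout location):*
```bash
cp build_gpu2py/diffusion_plugin.f90 <path-to-icon>/src/atm_dyn_iconam/
```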
**Note:** When running the `mch_r04b09_dsl` experiment you must also pass the `--limited-area` flag to `py2fgen`. The generated Python wrapper code as well as the Fortran interface will then differ (unpacking additional arrays). In global experiments such as `ape_exclaim` you should not use this flag (omitting it is the default).
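*Example with the limited-area flag (same assumptions as the earlier command):*
```bash
py2fgen icon4pytools.py2fgen.wrappers.diffusion diffusion_init,diffusion_run diffusion_plugin -d -b GPU -o build_gpu2py --limited-area
```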
## Build icon-dsl in ACC mode
**Before you build ICON, make sure to start a new shell, as any previously loaded modules will interfere with the build process; the configure script will load the required modules for you.**
1. Clone the icon-exclaim branch
```bash
git clone -b diffusion_granule_verifying git@github.com:C2SM/icon-exclaim.git diffusion_granule_verifying
cd diffusion_granule_verifying
git submodule update --init
```
2. Follow the instructions in `/dsl/README.md` and, as usual, execute the `./setup.sh` file using either the `build_gpu2py` or `build_cpu2py` mode. Note that the path to the generated shared library has to be correctly specified in the respective configuration script, so check this first if things go wrong.
## Prepare to run
1. Generate run script
```bash
./make_runscripts --all
```
2. Add these settings to your run script
```bash
export BOOST_ROOT=<path to folder containing boost `include`, `lib` directories>
export PATH=/scratch/mch/agopal/run-granule-with-py2fgen/.venv/bin:$PATH
export LD_LIBRARY_PATH=/mch-environment/v5/linux-sles15-zen3/gcc-11.3.0/python-3.10.8-ipylersqr3j767naw72rcs6637o7jd5q/lib:$LD_LIBRARY_PATH
export ICON_GRID_LOC=<Path to the grid used by the experiment>
export ICON_GRID_NAME=<grid name if something other than grid.nc, otherwise omit>
export ICON4PY_LAM=<Set to 1 for MCH experiment, omit otherwise.>
export ICON4PY_BACKEND=<GPU or CPU>
```
**Example Settings for MCH R04B09 Run on Tsa** (adapt to your needs)
```bash
export BOOST_ROOT=/project/g110/spack-install/tsa/boost/1.77.0/gcc/ugflscr3j6igarx7gugjwo4dc43lhh2e
export PATH=/scratch/skellerh/profiling_py2f/run-granule-with-py2fgen/.venv/bin:$PATH
export LD_LIBRARY_PATH=/project/g110/install/tsa/python-3.10/python/lib:$LD_LIBRARY_PATH
export ICON4PY_BACKEND=GPU
export ICON_GRID_LOC=/scratch/jenkins/icon/pool/data/ICON/mch/grids/ch_r04b09_dsl
export PYTHONOPTIMIZE=1
export ICON4PY_LAM=1
```
3. Change the run script to run on 1 node with 1 GPU and 2 MPI tasks: one task drives the GPU and one is the prefetch process (see the sketch below).
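A sketch of possible SLURM directives (exact option names depend on the machine's SLURM setup; treat them as assumptions to adapt):
```bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --gres=gpu:1
```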
4. If you want to use precompiled stencils you can also add these export statements to the run file (use your own scratch directory and create tmp directory there):
```bash
export GT4PY_BUILD_CACHE_LIFETIME=PERSISTENT
export GT4PY_BUILD_CACHE_DIR=/scratch/mch/agopal/tmp
```
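For example, assuming you create a `tmp` directory under your own scratch space:
```bash
mkdir -p $SCRATCH/tmp
export GT4PY_BUILD_CACHE_DIR=$SCRATCH/tmp
```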
## cupy (from source on Balfrin)
As an alternative to the prebuilt `cupy-cuda11x` wheel, cupy can be built from source against the NVHPC CUDA installation:
```bash
pip install cython==0.29.37
NVCC=/mch-environment/v5/linux-sles15-zen3/gcc-11.3.0/nvhpc-23.3-4sf7jktvmwpir5zm3bvnqvsjjaihxqzb/Linux_x86_64/23.3/compilers/bin/nvcc \
NVHPC_CUDA_HOME=/mch-environment/v5/linux-sles15-zen3/gcc-11.3.0/nvhpc-23.3-4sf7jktvmwpir5zm3bvnqvsjjaihxqzb \
CUDA_PATH=/mch-environment/v5/linux-sles15-zen3/nvhpc-23.3/cuda-11.8.0-fnodbjgsy5oei56xb7swss3lqwpjvbm7 \
pip install cupy
```