# Python Granule Integration into ICON
This document provides detailed instructions for integrating Python granules into the ICON model, including setup, build instructions, and verification processes.
---
## Granule Status
- **Diffusion Granule**:
- Serial Runs: Integrated and verified on CPU and GPU using `cdo`, `probtest`, and a custom verification framework.
- Parallel Runs: Integrated and verified on CPU and GPU.
- Available in the default branches of `icon-exclaim` and `icon4py`.
- **Dycore Granule**:
- Serial Runs: Runs on CPU and GPU, verification ongoing.
- Parallel Runs: Runs on CPU and GPU, verification ongoing.
- PRs: [PR#282](https://github.com/C2SM/icon-exclaim/pull/282), [PR#576](https://github.com/C2SM/icon4py/pull/576)
---
## Balfrin Machine Setup
### Prerequisites
Ensure you are on the `balfrin` machine at CSCS.
### Download Required Scripts
1. Locate the required files in the `dsl` folder of the GitHub repository and open each one in its raw view in your browser.
2. Copy the URL of each file, and download it using `wget`:
- `install_dependencies.sh`
- `env.sh`
- `setup.sh`
You can download each one using `wget <URL> -O <output-file-name>` (see the example after this list).
3. Make these scripts executable:
```bash
chmod +x install_dependencies.sh env.sh setup.sh
```
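For example (the URLs below are assumptions based on the `dsl` folder of `C2SM/icon-exclaim` referenced in the build docs; substitute the raw URLs you actually copied):
```bash
# Assumed raw URLs; replace with the ones you copied from the repository
wget https://raw.githubusercontent.com/C2SM/icon-exclaim/icon-dsl/dsl/install_dependencies.sh -O install_dependencies.sh
wget https://raw.githubusercontent.com/C2SM/icon-exclaim/icon-dsl/dsl/env.sh -O env.sh
wget https://raw.githubusercontent.com/C2SM/icon-exclaim/icon-dsl/dsl/setup.sh -O setup.sh
```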
---
## Build Instructions
### Prerequisites
Activate `uenv`:
```bash
uenv start /scratch/mch/leclairm/uenvs/images/icon.v1.rc4.sqfs
uenv view icon-wcp:icon
```
This activates the uenv, which includes all dependencies necessary to build ICON.
### 1. Installing Dependencies
Execute `install_dependencies.sh` to set up the environment and clone the necessary repositories. The `--granule` flag is required so that `py2fgen` is used to generate the bindings. Furthermore, if running a LAM experiment, you also need the `--limited-area` flag. For example:
```bash
./install_dependencies.sh --granule --limited-area
```
**Optional Flags**:
- `--icon`: Branch for `icon-exclaim` (default: `icon-dsl`)
- `--icon4py`: Branch for `icon4py` (default: `main`)
- `--gt4py`: Branch for `gt4py` (default: `main`)
- `--gridtools`: Branch for `gridtools` (default: `master`)
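For instance, to build against specific branches (assuming each flag takes the branch name as its argument; the branch names below are purely illustrative):
```bash
# Hypothetical branch names, shown only to illustrate the optional flags
./install_dependencies.sh --granule --limited-area --icon4py my-feature-branch --gt4py main
```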
Note that `install_dependencies.sh` currently builds GHEX in GPU mode. This means that experiments running with multiple MPI tasks, which use halo exchanges and therefore GHEX, require a GPU node even if the experiment itself runs on the CPU.
### 2. Setting Up and Running the Model
Run `setup.sh` to compile and execute the model:
```bash
./setup.sh <build_mode>
```
- **Build Modes**:
- `build_cpu2py`: Compile for CPU.
- `build_gpu2py`: Compile for GPU.
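For example, to compile and run the model on GPU:
```bash
./setup.sh build_gpu2py
```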
---
## Running the Experiment
To run the Python granule, export the required environment variables within the experiment runscript.
**For CPU:**
```bash
export BOOST_ROOT=/user-environment/env/icon/include/boost/
export PATH=/scratch/mch/skellerh/icon4py/.venv/bin:$PATH
export LD_LIBRARY_PATH=/user-environment/env/icon/lib64:$LD_LIBRARY_PATH
export ICON4PY_BACKEND=CPU
```
**For GPU:**
```bash
export BOOST_ROOT=/user-environment/env/icon/include/boost/
export CUDA_PATH=/user-environment/linux-sles15-zen3/gcc-12.3.0/cuda-12.3.0-45bjh5clzy5kol7ymjjqzshbn7ymxriu
export CUDAARCHS="80"
export PATH=/scratch/mch/skellerh/icon4py/.venv/bin:$PATH
export LD_LIBRARY_PATH=/user-environment/env/icon/lib64:$LD_LIBRARY_PATH
export ICON4PY_BACKEND=GPU
```
**Note**: The `BOOST_ROOT`, `LD_LIBRARY_PATH` and `CUDA_PATH` values can be copied one-to-one into your own runscript, as the paths are the same as long as you have activated the `uenv`. However, you need to ensure that the Python path in `PATH` points to the virtual environment of your own icon4py installation.
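For example, if your icon4py checkout lives in your home directory (an illustrative path, not a fixed location):
```bash
# Illustrative path; point this at your own icon4py virtual environment
export PATH=$HOME/icon4py/.venv/bin:$PATH
```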
To reduce compilation times, you can tell GT4Py to cache compiled stencils and reuse them across runs. To enable this, export the following environment variables in your runscript:
```bash
export GT4PY_BUILD_CACHE_DIR=<path-to-your-cache-dir>
export GT4PY_BUILD_CACHE_LIFETIME=PERSISTENT
```
Note that the "cache directory" mentioned above must already exist.
### Parallel vs Single Task Runs
By default, the granules run as a single task. Set up SLURM with 2 tasks, one of which is reserved for I/O:
```bash
#SBATCH --gpus-per-node=1
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
```
For parallel granule runs with halo exchanges, set:
```bash
export ICON4PY_PARALLEL=1
```
If this environment variable is omitted, no halo exchanges will be performed.
Ensure `nproma` is large enough, typically `50,000`; in practice it must be greater than or equal to the number of edges in your grid.
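In a typical ICON experiment runscript, `nproma` is set as a shell variable that is written into the generated namelists; a sketch (the exact placement depends on your runscript template):
```bash
# Must be >= the number of edges in your grid; 50000 is usually safe
nproma=50000
```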
---
## Inspecting Results
Check the `LOG` file in the `run` directory for model outputs. Debugging information is included by default; remove the `-d` flag from the `py2fgen` commands in `install_dependencies.sh` if it is not needed.
Furthermore, at the end of the LOG file you will find a table of timers for each instrumented part of the Fortran code; many timers are already in place by default. If you want to time specific parts of your own code with the Fortran timers, you have to add new ones to your Fortran code manually.
---
## Verifying Results
### Using CDO (Climate Data Operators)
`cdo` can be used to compare the output NetCDF files of a run against those of a reference run. Differences can be inspected with the `diffv` operator.
Load `cdo` on Balfrin:
```bash
module use $USER_ENV_ROOT/modules
module load cdo
```
To compare outputs:
```bash
cdo diffv <actual.nc> <reference.nc>
```
### Using Probtest
Probtest is a tool developed at MeteoSwiss that runs an ensemble of model runs for an experiment and creates reference data from this ensemble. It then lets you compare your own run (for example, one using the granule) against this reference to check whether its outputs lie within an acceptable range of the ensemble spread.
1. **Activate Virtual Environment**:
```bash
source /scratch/d1000/apps/buildbot/bb-venv/bin/activate
```
2. **Setup `probtest.json`**:
```bash
cd icon-exclaim/build_cpu
python ../externals/probtest/probtest.py init --codebase-install $PWD --experiment-name mch_ch_r04b09_dsl --reference $PWD --file-id NetCDF "*atm_3d_ml*.nc"
```
In the generated `probtest.json`, update `account` to `s83` if required, or to whatever CSCS account allows you to submit jobs on the cluster.
3. **Run Ensemble**:
```bash
python ../externals/probtest/probtest.py run-ensemble
```
4. **Generate Ensemble Statistics**:
```bash
python ../externals/probtest/probtest.py stats --ensemble
```
5. **Set Tolerances**:
```bash
python ../externals/probtest/probtest.py tolerance
```
6. **Compare Experiments**:
```bash
python ../externals/probtest/probtest.py stats --no-ensemble --model-output-dir ../build_cpu2py/experiments/mch_ch_r04b09_dsl/ --stats-file-name build_cpu2py_stats.csv
python ../externals/probtest/probtest.py check --input-file-ref stats_ref.csv --input-file-cur build_cpu2py_stats.csv --factor 5
```
The example above assumes a granule run from a `build_cpu2py` build; adjust the paths to match your own build.
**If results are within tolerance, `probtest` will pass. Otherwise, it will fail.**
---
## Granule Verification Framework
The Granule Verification Framework enables runtime verification of granule input/output fields. It compares results between the Fortran and Python implementations, helping to detect discrepancies at a detailed level. The maintainer of this framework is @huppd.
### Enabling Verification
To activate the framework, define the `__USE_PY2F_VERIFY` directive. When enabled, the Fortran diffusion runs first, followed by the Python granule diffusion operating on copies of the fields, so the two can be compared directly; any discrepancies are logged immediately.
**Note:** When verification is enabled, the final output will reflect the results from the Fortran code, as it runs first. To get results from only the Python granule, the verification framework must be turned off.
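One way to define the directive is to pass it as a Fortran preprocessor flag when configuring the build; this is a sketch assuming your configure wrapper forwards `FCFLAGS` to the Fortran compiler, not a documented switch:
```bash
# Assumption: FCFLAGS is forwarded to the Fortran compiler by the build system
export FCFLAGS="${FCFLAGS} -D__USE_PY2F_VERIFY"
```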
### How It Works
#### Verification routines
- `verify_field_wp`: Checks field values from Fortran and Python, calculating maximum errors, mean squared errors, and other metrics to assess accuracy based on a tolerance level.
- `print_verify_header` and `print_verify_footer`: Display a structured summary of results.
#### Example: Diffusion Granule
In the `diffusion_py` routine, the framework verifies fields by:
1. **Copying Initial Fields:** Fields are copied before running Fortran diffusion to serve as a baseline.
2. **Running Diffusion and Verifying:** After the Fortran diffusion, the Python granule diffusion is executed using initial field copies, allowing field-by-field comparisons.
3. **Printing Results:** The framework outputs metrics on any differences.
---
## Performance Considerations
GT4Py code embedded in Fortran can at best match the performance it achieves in a standalone Python interpreter; this sets the performance ceiling. To surpass ICON performance, improvements must therefore be made at the interface layer between Fortran and GT4Py, or within GT4Py itself.
There is an inherent overhead when passing Fortran pointers to Python. These pointers need to be unpacked in Python, transformed into numpy or cupy views, and then allocated as GT4Py Fields. To reduce this overhead, consider the following optimization:
- **Pointer Persistence in Bindings**
Rather than unpacking pointers and allocating fields each time a Python function is invoked, a hash table can store each GT4Py field upon its first creation. On subsequent calls, the field is retrieved directly from the hash table; it remains up to date because it references the same memory as the Fortran side.
Additional optimizations are also feasible within ICON4Py granules, such as:
- **Utilizing `FrozenProgram` in Dycore and Diffusion Granules**
Currently, a processing overhead occurs as GT4Py stencils are lowered with each program execution. To mitigate this, the `FrozenProgram` class can be used, allowing for more efficient execution. This approach should be applied wherever possible to maximize performance.
**These performance optimizations have been implemented for the diffusion granule.** More information can be found here: https://hackmd.io/-RXC1RhDToGe-Z8cZs0sIg
---
## Profiling
Each wrapper in ICON4Py includes `profile_enable` and `profile_disable` functions, which can be embedded within ICON as needed.
To profile a specific granule in Python, use `profile_enable()` in Fortran before the target routine, such as `diffusion_run`. Immediately after the routine, call `profile_disable()` to export profiling statistics from `cProfile` into your experiments directory.
These statistics can be visualized with tools like [snakeviz](https://jiffyclub.github.io/snakeviz/), which provides a browser-based overview of time allocation within the Python code.
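A minimal way to inspect the exported statistics (the file name below is illustrative; use whichever profile file the granule writes into your experiments directory):
```bash
pip install snakeviz
# Opens a browser-based visualization of the cProfile output
snakeviz experiments/mch_ch_r04b09_dsl/profile.prof
```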
For a broader perspective, review the ICON timer report found at the end of the LOG file, and consider creating custom ICON timers as required.
---
## Other hints
### Granule Development and Debugging
When you modify the interface of any Python function embedded within Fortran, you must regenerate the bindings with `py2fgen` and recompile the ICON project. Here’s how to do it step-by-step:
1. **Regenerate Bindings**: Run `py2fgen` to generate new C and Fortran 90 (F90) wrappers, ensuring that the Fortran interface correctly communicates with the updated Python function (see the hint after this list for locating the exact commands).
2. **Recompile ICON**: After updating the bindings, recompile ICON from within the relevant build folder, for example inside `build_cpu` by running:
```bash
make -j 16
```
3. **When Recompilation Isn’t Needed**: If your changes only affect the internal logic of a Python function (or any other part of the Python codebase that the function calls), you don’t need to regenerate the bindings or recompile ICON. Simply rerun your already-built ICON binary; the embedded Python interpreter will pick up the changes to your Python code.
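Since the exact `py2fgen` invocations live in `install_dependencies.sh` (see Inspecting Results above), a quick way to find the commands to rerun after an interface change:
```bash
# List the py2fgen commands that generate the bindings
grep -n "py2fgen" install_dependencies.sh
```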
## Useful Links
### Py2F docs
https://github.com/C2SM/icon4py/blob/main/tools/README.md#py2fgen
### ICON-DSL build docs
https://github.com/C2SM/icon-exclaim/blob/icon-dsl/dsl/README.md
### Blue Line ICON
https://docs.google.com/presentation/d/1s-Ff7p2GapgK5thd42QmDukrzQl41NedKmCsCvOThUo/edit#slide=id.g2f13c407cf2_0_803
### Diffusion Optimizations
https://hackmd.io/-RXC1RhDToGe-Z8cZs0sIg