# Installing dsl_chain_bench on Balfrin
`dsl_chain_bench` currently has quite a few prerequisites: once by way of dusk and dawn, and then on its own because it requires `AtlasUtilities`, which in turn requires `atlas` and `eckit`. Should `dsl_chain_bench` be maintained for whatever reason beyond repeating the benchmarks already present on Balfrin, it is recommended that the dependence on `AtlasUtilities` be removed. This can be achieved by writing a new utility function that leverages `netcdf-cxx4` to read in the neighbor lists directly. Such a hypothetical utility may be inspired by `icon_setup.cpp` [here](https://github.com/C2SM/icon-exclaim/blob/icon-dsl/dsl/icon_setup.cpp).
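Should such a reader ever be written, a minimal sketch of what it could look like using the netcdf-cxx4 API is given below. Note that the variable name `neighbor_cell_index`, the dimension order, and the 1-based indexing are assumptions modeled on typical ICON grid files; check the actual grid file with `ncdump -h` first.
```
#include <netcdf>
#include <string>
#include <vector>

// Hypothetical helper: read a flat (num_neighbors x num_elements) neighbor
// table from a grid file. The variable name and layout are assumptions;
// inspect the actual grid file before relying on them.
std::vector<int> readNeighborTable(const std::string& gridFile,
                                   const std::string& varName) {
  netCDF::NcFile file(gridFile, netCDF::NcFile::read);
  netCDF::NcVar var = file.getVar(varName);
  const size_t rows = var.getDim(0).getSize();
  const size_t cols = var.getDim(1).getSize();
  std::vector<int> table(rows * cols);
  var.getVar(table.data()); // read the whole variable in one go
  for (int& idx : table)
    idx -= 1; // ICON grid files store 1-based (Fortran) indices
  return table;
}
```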
It's probably best to make a new empty folder on your scratch, possibly called `dsl_synthetic_bench` or so. This folder will be referred to as `<your/scratch/dsl_synthetic_bench>` throughout this guide. Also, make sure to source a spack-c2sm installation (`source <path/to/spack_folder>/spack-c2sm/setup/setup-env.sh`).
Install **python 3.8** using spack (dusk is only compatible with Python 3.8):
```
spack install python@3.8.13%gcc@11.3.0
```
Unfortunately, llvm (needed for dawn) cannot be installed using spack due to a bizarre Python version problem. Thus, install **llvm 10** manually instead:
```
cd <your/scratch/dsl_synthetic_bench>
# for some reason it is extremely important that the install prefix lives outside the build folder
mkdir llvm10
module load gcc/11.3.0
module load cmake/3.24.2-gcc
module load python/3.10.6
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
git checkout refs/tags/llvmorg-10.0.0
mkdir build && cd build
cmake ../llvm -DLLVM_ENABLE_PROJECTS=clang -DLLVM_ENABLE_RTTI=ON -DLLVM_ENABLE_TERMINFO=OFF -DCMAKE_INSTALL_PREFIX=<your/scratch/dsl_synthetic_bench>/llvm10
# gcc 11.3 is a bit strict about transitive includes; hotfix the missing <limits> include
printf '%s\n%s\n' "#include <limits>" "$(cat ../llvm/utils/benchmark/src/benchmark_register.h)" > ../llvm/utils/benchmark/src/benchmark_register.h
srun -p postproc -uc 128 make -j 128
make -j8 install
```
Install **dusk** & **dawn**
First
```
cd <your/scratch/dsl_synthetic_bench>
```
It's probably best you store and execute this as a script.
```
#!/bin/bash
set -e -x
module use $USER_ENV_ROOT/modules
module load gcc
module load cmake
spack load python@3.8
python -m venv dusk-venv
source dusk-venv/bin/activate
pip install --upgrade pip setuptools wheel
# Clone and build/install dawn
git clone -b experimentalInlinePass https://github.com/MeteoSwiss-APN/dawn
pushd dawn
mkdir build
pushd build
mkdir install
cmake -DLLVM_ROOT=<your/scratch/dsl_synthetic_bench>/llvm10 -DCMAKE_INSTALL_PREFIX=$(pwd)/install -DGTCLANG_BUILD_TESTING=OFF -DDAWN_BUILD_TESTING=OFF ..
srun -p postproc -uc 36 make -j 36
srun -p postproc -uc 8 make install -j 8
popd
popd
# Set up dusk (branch 'horizon') and install dawn and dusk in virtualenv
git clone https://github.com/dawn-ico/dusk.git
pushd dusk
git checkout horizon
popd
export LLVM_ROOT=<your/scratch/dsl_synthetic_bench>/llvm10
pip install -e dawn/dawn
pip install -e dusk
deactivate
```
Install **atlas**, **eckit** and **atlas_utils** (all required for the bench)
For eckit
```
spack install eckit%gcc@11.3.0~mpi
```
Note the `~mpi`, which disables MPI support in eckit. This is important: otherwise, running the benchmarks in the final step will result in a cryptic MPI error. Disabling MPI is fine since the bench, as it exists today, only runs single-node experiments.
For atlas
```
cd <your/scratch/dsl_synthetic_bench>
git clone git@github.com:ecmwf/atlas.git
cd atlas
spack load eckit
# Environment --- Edit as needed
ATLAS_SRC=$(pwd)
ATLAS_BUILD=build
ATLAS_INSTALL=$(pwd)/build/install
# 1. Create the build directory:
mkdir $ATLAS_BUILD
cd $ATLAS_BUILD
mkdir install
# 2. Run CMake
ecbuild --prefix=$ATLAS_INSTALL -- $ATLAS_SRC
# 3. Compile / Install
make -j10
make install
# 4. Check installation
$ATLAS_INSTALL/bin/atlas --info
```
For atlas utils, first install netcdf-c and netcdf-cxx4 using spack
```
spack install netcdf-c%gcc@11.3.0
spack install netcdf-cxx4%gcc@11.3.0
```
And make sure to load them
```
spack load netcdf-c@4.8.1%gcc@11.3.0
spack load netcdf-cxx4
```
Then:
```
cd <your/scratch/dsl_synthetic_bench>
git clone https://github.com/dawn-ico/AtlasUtilities
cd AtlasUtilities
mkdir build && cd build && mkdir install
cmake .. -Deckit_DIR=$(spack find --format "{prefix}" eckit | head -n1)/lib64/cmake/eckit \
-Datlas_DIR=<your/scratch/dsl_synthetic_bench>/atlas/build/install/lib64/cmake/atlas \
-Dnetcdfcxx4_DIR=$(spack find --format "{prefix}" netcdf-cxx4 | head -n1) \
-DCMAKE_INSTALL_PREFIX=$(pwd)/install
make -j8
make install
```
Then, finally, clone the actual repo
```
cd <your/scratch/dsl_synthetic_bench>
git clone git@github.com:dawn-ico/dsl_chain_bench.git
```
The general idea is now to generate some benchmarks using the `generate.py` script and subsequently build them using cmake:
```
cd dsl_chain_bench
python generate.py
```
Basically, `generate.py` takes the "templates" in the `templates` directory and generates benchmark files (cpp) and dusk files (py) for all 12 possible reduction sequences. A variable `NAME` at the very top of the script determines the template to be generated.
Then do the following to build the benchmarks
```
cd benchmarks/build
module use $USER_ENV_ROOT/modules
module load gcc
module load cmake
module load cuda
module load python/3.10.6
cmake .. -Datlas_DIR=<your/scratch/dsl_synthetic_bench>/atlas/build/install/lib64/cmake/atlas \
-Datlas_utils_DIR=<your/scratch/dsl_synthetic_bench>/AtlasUtilities/build/install/lib/cmake/atlas_utils/ \
-Datlas_utils_LIBRARY_PATH=<your/scratch/dsl_synthetic_bench>/AtlasUtilities/build/install/lib/libatlasUtilsLib.a \
-Ddawn4py_DIR=<your/scratch/dsl_synthetic_bench>/dawn/dawn/src/ \
-DTOOLCHAINPATH=<your/scratch/dsl_synthetic_bench>/dawn/build/install/bin:<your/scratch/dsl_synthetic_bench>/dusk-venv/bin
```
## Running Benchmarks Using the Default (naive) Inlining Technique
You can run all benchmarks in sequence using the `runner.sh` script in the `benchmarks` folder. The runtimes are emitted into a text file called `default_times.txt` in the `build` folder. Note that some inlined stencils may fail the verification due to overcomputation in the inlined case. This could be improved by not checking the error at the domain boundaries, as sketched below.
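As a sketch of that idea (everything here is hypothetical and not taken from the benchmark code; the field layout and the interior bounds are assumptions), a relaxed verification routine could simply skip the boundary region:
```
#include <algorithm>
#include <cmath>
#include <cstddef>

// Hypothetical relaxed verification: compare reference and inlined results
// only on [interiorStart, interiorEnd), i.e. skip the boundary elements where
// the inlined stencils are expected to overcompute. Layout and bounds are
// assumptions, not taken from the benchmark code.
bool verifyInterior(const double* ref, const double* actual,
                    std::size_t interiorStart, std::size_t interiorEnd,
                    std::size_t numLevels, std::size_t horizontalSize,
                    double relTol = 1e-12) {
  for (std::size_t lev = 0; lev < numLevels; ++lev)
    for (std::size_t idx = interiorStart; idx < interiorEnd; ++idx) {
      const double r = ref[lev * horizontalSize + idx];
      const double a = actual[lev * horizontalSize + idx];
      if (std::fabs(a - r) > relTol * std::max(std::fabs(r), 1.0))
        return false;
    }
  return true;
}
```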
## Running Benchmarks Using more Advanced Inlining Strategies
Note: it is strongly recommended that you delete the contents of the build folder completely when switching inlining techniques.
* Running the **compressed** benchmarks (using the terminology in [this slideset](https://docs.google.com/presentation/d/1XLcZt83fxTN5UalYZPhYelgHcu7zKC_l7SzwTlwyFys/edit?usp=sharing)) is a bit cumbersome. First, check out the custom branch `compressed` with `git checkout compressed`.
* Run `generate.py`
* Switch to `benchmarks/build` and run `make -j8`. If you study one of the generated inlined stencils, you will notice that the stencil is invalid: in particular, the weights vector employed only has a single entry.
* In order to correct this and generate the final, correct stencils, an additional Python script located in the `benchmarks/build` folder has to be run: `python inject_patches_unrolled.py`. This script manipulates the emitted stencil code directly; it certainly does not live up to any Python coding standards (and is expected to be very brittle).
* Build again: `make -j8`
* And run `cd .. && ./runner.sh`
* Running the **compressed and unrolled** benchmarks (using the terminology in [this slideset](https://docs.google.com/presentation/d/1XLcZt83fxTN5UalYZPhYelgHcu7zKC_l7SzwTlwyFys/edit?usp=sharing)) is quite easy. In this case, the compressed & unrolled stencils are *not* compiled using the dusk/dawn toolchain but have been hand-written and checked into the repo on a custom branch, somewhat uncanonically, directly into the build folder. So just `git checkout unrolling` and rebuilding will compile all compressed and unrolled benchmarks. The `runner.sh` script on this branch is adapted to run just the compressed and unrolled benchmarks (only 6 of the possible 12 reductions have been implemented). A hypothetical sketch of the unrolling idea follows below.
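To illustrate the terminology, a purely hypothetical sketch (not code from the repo) of what unrolling a fixed-size neighborhood reduction means:
```
// Hypothetical illustration of "unrolling": for a fixed-size neighborhood
// (here 3, e.g. the edges of a triangle), the reduction loop ...
double reduceLooped(const double* field, const int* nbh, int numNbh) {
  double acc = 0.;
  for (int i = 0; i < numNbh; ++i)
    acc += field[nbh[i]];
  return acc;
}

// ... can be written out explicitly, removing the loop overhead:
double reduceUnrolled(const double* field, const int* nbh /* 3 entries */) {
  return field[nbh[0]] + field[nbh[1]] + field[nbh[2]];
}
```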
## Some Notes on the Mesh the Benchmarks are Performed On
* The mesh being used is checked into the GitHub repository [here](https://github.com/dawn-ico/dsl_chain_bench/tree/master/benchmarks/resources), even though it is quite a large binary file. Be that as it may, this is the mesh used by the experiment `mch_ch_r04b09` (without the `_dsl` suffix).
* Should the mesh be changed in the future (perhaps to the operational 1e or 2e mesh), the subdomain markers present in all `*_bench.cpp` files need to be adapted. Currently they are set to conservatively exclude the boundary regions (such that one does not need to check for missing neighbors in neighbor lists). The same files also control how many vertical levels are used for the benchmark; a hypothetical sketch of such markers follows below.
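For orientation only, such markers could take roughly the following shape (names and values below are made up, not copied from the repo):
```
// Hypothetical sketch of subdomain markers; the real names and values live
// in the *_bench.cpp files and depend on the mesh being used.
constexpr int kHorizontalStart = 1000; // placeholder: first index past the boundary rows
constexpr int kHorizontalEnd = 20000;  // placeholder: one past the last interior index
constexpr int kLevels = 65;            // placeholder: number of vertical levels
```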