0. Get the personal access token for BSC-ES gitlab (see `https://earth.bsc.es/gitlab/digital-twins/nvidia/nemo-build/-/blob/master/README.md#authorization-notes` for hints)

---

00. Make a symlink:

`ln -s /gpfs/projects/bsc32/DE340-share/nemo-standalone-data $HOME/data`

---

1. Modify account config files:

- `$HOME/.bashrc`:
```
export PATH=/gpfs/projects/bsc32/DE340-share/nemo-build-utils:$PATH
export DNB_GITLAB_USERNAME="XXXXXX"
export DNB_GITLAB_ACCESS_TOKEN="XXXXXXXXXXXXXX"
```
- `$HOME/.netrc`:
```
machine earth.bsc.es
login XXXXXXX
password XXXXXXXXXXXXXXX
```

> NOTE: in this file, the login/password pair is the same as the `DNB_GITLAB_USERNAME`/`DNB_GITLAB_ACCESS_TOKEN` pair above

> NOTE: don't forget to log off and log in again after changing these config files

---

2. Get `nemo-build`:
```
$ git clone --recursive https://earth.bsc.es/gitlab/digital-twins/nvidia/nemo-build.git
$ cd nemo-build
```

---

3. Create `overrides.yaml`:
```
---
environment:
    - export NEMO_CODEBASE="DE340_NVIDIA"
    - export DNB_PACKAGE_VERSIONS=nemo:HEAD^master
    - export NEMO_WITH_REPRODUCIBLE=0
    - export NEMO_WITH_GPU=1
    - export NEMO_WITH_PROFILING=1
    - export NEMO_WITH_FAKE_GPU=0
    - export NEMO_WITH_MANAGED_MEMORY=0
    - export DNB_SANDBOX_SUBDIR=nemo.gpu.acc.opt
```

---

4. Create `account.yaml`:
```
---
# MN5-ACC:
psubmit:
    queue_name: ""
    account: ehpc01
    node_type: acc_debug
```
If you do not have ehpc01 access, you may use:
```
---
# MN5-ACC:
psubmit:
    queue_name: ""
    account: bsc32
    node_type: acc_debug
```

---

5. Make a `machine.yaml` symlink:
```
ln -s dnb-mn5-acc.yaml machine.yaml
```

> NOTE: the files `overrides.yaml`, `account.yaml` and the `machine.yaml` symlink are meant to stay local; we never push them into the git repository.

---

6. Download and unpack NEMO:
```
$ ./dnb.sh :du 2>&1 | tee build.log
Download and build script for NEMO, for docs please refer:
https://earth.bsc.es/gitlab/digital-twins/nvidia/nemo-build/-/blob/master/README.md
Download and build started at timestamp: 1743411705.
----------
>> dbscripts: Processing account.yaml:
>> dbscripts: Processing overrides.yaml:
export NEMO_CODEBASE="DE340_NVIDIA"
export DNB_PACKAGE_VERSIONS=nemo:HEAD^master
export NEMO_WITH_REPRODUCIBLE=0
...
>> dbscripts: dubi_main: call: dnb_nemo mode=du version=HEAD^master
2025-03-31 11:17:44 URL:https://earth.bsc.es/gitlab/api/v4/projects/1476/repository/archive?sha=master [165646931/165646931] -> "nemogcm_v40.dwn/nemogcm_v40-HEAD^master.tar.gz" [1]
patching file makenemo
>> dbscripts: applying patch_nemo-ANY_isnan_v1.diff:
patching file src/OCE/OBS/obs_inter_h2d.F90
patching file src/OCE/OBS/obsinter_h2d.h90
patching file src/OCE/stpctl.F90
>> dbscripts: applying patch_nemo-ANY_keys.diff:
patching file src/OCE/stpctl.F90
>> dbscripts: applying patch_nemo-ANY_timing.diff:
patching file src/OCE/IOM/in_out_manager.F90
----------
Full operation time: 27 seconds.
```
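> NOTE: the `:du` argument restricts `dnb.sh` to the download and unpack stages (hence `mode=du` in the log above); running `./dnb.sh` with no action argument performs the build and install stages (`mode=bi` in the next step). A sketch of the apparent convention, inferred from these log lines (the leading colon appears to apply the actions to all packages, here just `nemo`; consult the nemo-build README for the authoritative description):

```
# Apparent dnb.sh stage letters, inferred from the "mode=du" / "mode=bi"
# lines in the logs (an assumption -- check the nemo-build README):
#   d = download, u = unpack, b = build, i = install
./dnb.sh :du    # download + unpack only (this step)
./dnb.sh        # default actions: build + install (next step)
```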

---

7. Build NEMO:
```
$ ./dnb.sh 2>&1 | tee build.log
Download and build script for NEMO, for docs please refer:
https://earth.bsc.es/gitlab/digital-twins/nvidia/nemo-build/-/blob/master/README.md
Download and build started at timestamp: 1743412988.
----------
>> dbscripts: Processing account.yaml:
>> dbscripts: Processing overrides.yaml:
export NEMO_CODEBASE="DE340_NVIDIA"
export DNB_PACKAGE_VERSIONS=nemo:HEAD^master
export NEMO_WITH_REPRODUCIBLE=0
export NEMO_WITH_GPU=1
export NEMO_WITH_PROFILING=1
export NEMO_WITH_FAKE_GPU=0
export NEMO_WITH_MANAGED_MEMORY=0
export DNB_SANDBOX_SUBDIR=nemo.gpu.acc.opt
>> dbscripts: Processing machine.yaml:
module load Stages/2025 NVHPC/25.1-CUDA-12 OpenMPI/5.0.5 MPI-settings/CUDA HDF5/1.14.5-serial netCDF/4.9.2-serial CMake/3.30.3
export FC=mpifort
export CXX=g++
export CC=gcc
export NEMO_SCRIPTS_FOLDER="jedi"
export NEMO_AVAILABLE_WORKLOADS=${NEMO_AVAILABLE_WORKLOADS:="eORCA1-spinup eORCA025-spinup"}
export NEMO_USE_PREBUILT_HDF5=TRUE
export NEMO_USE_PREBUILT_NETCDF_C=TRUE
export NEMO_USE_PREBUILT_NETCDF_FORTRAN=TRUE
export NEMO_USE_PREBUILT_XIOS=TRUE
export NEMO_HDF5_PATH="$HDF5_DIR"
export NEMO_NETCDF_C_PATH="$EBROOTNETCDF"
export NEMO_NETCDF_FORTRAN_PATH="/p/project1/training2508/HybridNEMO/thirdparty/netcdf-fortran-4.4.3"
export NEMO_XIOS_PATH="/p/project1/training2508/HybridNEMO/thirdparty/xios-2.5"
export NEMO_WORKLOAD_ARCHIVE=""
WARNING: overriding psubmit/queue_name setting is ignored in machine.yaml: preserved value PSUBMIT_OPT_QUEUE=all
WARNING: overriding psubmit/account setting is ignored in machine.yaml: preserved value PSUBMIT_OPT_ACCOUNT=training2508
NEMO_CODEBASE=${NEMO_CODEBASE:=DE340_NVIDIA}
[ -f nemo-build-select-codebase.inc ] && source nemo-build-select-codebase.inc
Using cmake version 3.30.3
Using CUDA path /p/software/jedi/stages/2025/software/CUDA/12, version: 12.6
>> dbscripts: Package list: nemo
>> dbscripts: EXECUTING: dubi_main "$*"
>> dbscripts: dubi_main: actions: nemo:bi
>> dbscripts: dubi_main: call: dnb_nemo mode=bi version=HEAD^master
Are you sure that you want to remove this directory ORCA2? [y/n]
ORCA2 configuration REMOVED
++++ ./makenemo -j 16 -r ORCA2_ICE_PISCES -n ORCA2 -d 'OCE ICE' -m dnb add_key 'key_asminc key_netcdf4 key_sms key_xios2 key_nosignedzero key_profiling key_gpu key_nvidia_gpu key_prof_gpu' del_key key_top
You are installing a new configuration ORCA2 from ORCA2_ICE_PISCES with sub-components: OCE ICE
Creating ORCA2/WORK = OCE ICE for ORCA2
MY_SRC directory is : ORCA2/MY_SRC
Adding keys in : ORCA2
added key key_asminc in ORCA2
added key key_netcdf4 in ORCA2
...
ar: creating libnemo.a
mpifort -o nemo cfgs/ORCA2/WORK/nemo.o libnemo.a -lstdc++ -fopenmp -acc=gpu -mp=gpu -Minfo=mp,accel -gpu=cc90 -L/p/software/jedi/stages/2025/software/CUDA/12/lib64 -lcudart -lnvToolsExt -L/p/project1/training2508/HybridNEMO/thirdparty/netcdf-fortran-4.4.3/lib -L/p/software/default/stages/2025/software/netCDF/4.9.2-GCCcore-13.3.0-serial/lib -L/p/software/default/stages/2025/software/HDF5/1.14.5-nvompic-2024a/lib -lhdf5 -lhdf5_hl -lnetcdf -lnetcdff /p/project1/training2508/HybridNEMO/thirdparty/xios-2.5/lib/libxios.a
cp nemo cfgs/ORCA2/BLD/bin/nemo.exe
/p/home/jusers/medvedev1/jedi/src/nemo-build/nemo-HEAD^master.src
++++ set +x
'nemo-HEAD^master.src/cfgs/ORCA2/EXP00/nemo' -> 'nemo-HEAD^master/nemo'
>> dbscripts: dubi_main: call: dnb_sandbox
'/p/home/jusers/medvedev1/jedi/src/nemo-build/scripts/fixforcings.sh' -> './fixforcings.sh'
'/p/home/jusers/medvedev1/jedi/src/nemo-build/scripts/getgoldvalues.sh' -> './getgoldvalues.sh'
'/p/home/jusers/medvedev1/jedi/src/nemo-build/scripts/nemo-postproc.sh' -> './nemo-postproc.sh'
'/p/home/jusers/medvedev1/jedi/src/nemo-build/scripts/nemo-preproc.sh' -> './nemo-preproc.sh'
'/p/home/jusers/medvedev1/jedi/src/nemo-build/scripts/generic/cmp.sh' -> './cmp.sh'
'/p/home/jusers/medvedev1/jedi/src/nemo-build/scripts/generic/compare.sh' -> './compare.sh'
'/p/home/jusers/medvedev1/jedi/src/nemo-build/scripts/generic/scalability_table.sh' -> './scalability_table.sh'
'/p/home/jusers/medvedev1/jedi/src/nemo-build/scripts/generic/jedi/get_serialized_regions_nvhpc.sh' -> './get_serialized_regions_nvhpc.sh'
'/p/home/jusers/medvedev1/jedi/src/nemo-build/scripts/generic/jedi/profiling-wrapper.sh' -> './profiling-wrapper.sh'
'/p/home/jusers/medvedev1/jedi/src/nemo-build/scripts/generic/jedi/runner-script.sh' -> './runner-script.sh'
----------
Executables before build start: (unix-time, size, name)
----------
Executables after build finish: (unix-time, size, name)
>> 1743413195 121081232 nemo.bin/nemo
----------
Full operation time: 210 seconds.
```

---

8. Check the resulting binary, scripts and inputs:
```
$ cd sandbox/
$ ls
eORCA025-spinup  eORCA1-spinup  nemo.gpu.acc.opt
$ tree
.
├── eORCA025-spinup -> /p/home/jusers/medvedev1/jedi/data/eORCA025-spinup
├── eORCA1-spinup -> /p/home/jusers/medvedev1/jedi/data/eORCA1-spinup
└── nemo.gpu.acc.opt
    ├── cmp.sh
    ├── compare.sh
    ├── fixforcings.sh
    ├── getgoldvalues.sh
    ├── get_serialized_regions_nvhpc.sh
    ├── lib
    ├── nemo
    ├── nemo-postproc.sh
    ├── nemo-preproc.sh
    ├── profiling-wrapper.sh
    ├── psubmit.opt
    └── runner-script.sh
```

> NOTE: the directory name `nemo.gpu.acc.opt` comes from the `DNB_SANDBOX_SUBDIR` value in `overrides.yaml` and should be changed whenever we change the build configuration: no GPU, reproducible, etc. This is the easy way to keep build options consistent across the binaries we are working on.
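For example, a CPU-only variant could be kept side by side with the GPU one by switching the relevant options together with the sandbox subdirectory and rebuilding. A sketch, assuming the option set from step 3 (the particular combination and the `nemo.cpu.opt` name are illustrative):

```
# Sketch: rebuild a CPU-only variant into its own sandbox subdirectory.
# The option values below are an illustrative assumption based on step 3.
cat > overrides.yaml <<'EOF'
---
environment:
    - export NEMO_CODEBASE="DE340_NVIDIA"
    - export DNB_PACKAGE_VERSIONS=nemo:HEAD^master
    - export NEMO_WITH_REPRODUCIBLE=0
    - export NEMO_WITH_GPU=0
    - export NEMO_WITH_PROFILING=1
    - export NEMO_WITH_FAKE_GPU=0
    - export NEMO_WITH_MANAGED_MEMORY=0
    - export DNB_SANDBOX_SUBDIR=nemo.cpu.opt
EOF
./dnb.sh 2>&1 | tee build-cpu.log   # produces sandbox/nemo.cpu.opt/
```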

---

9. Try to execute the resulting binary:
```
$ cd sandbox/
$ NSTEPS=32 psubmit.sh -u nemo.gpu.acc.opt
Submitted batch job 55837
Job ID 55837
Queue: all; account: training2508; gres: gpu:4
Job status: Q
Job status: CF
Job status: R
Job status: CG
Job status: JOB DISAPPEARED!
Job status: NONE
Results collected: results.55837/
Rank 0 output: results.55837/rank0
Rank 0 errout: results.55837/erank0
Batch system output: results.55837/slurm-55837.out
Psubmit wrapper output: results.55837/psubmit_wrapper_output.55837
--- Batch system output: ---
>>> PSUBMIT: PSUBMIT_JOBID=55837 PSUBMIT_DIRNAME=/p/project1/training2508/HybridNEMO/utils/bin
>>> PSUBMIT: args: -t slurm -n 8 -p 8 -h 9 -g 4 -d /p/project1/training2508/HybridNEMO/utils/bin -s nemo.gpu.acc -e ./runner-script.sh -o nemo.gpu.acc/psubmit.opt -a
>>> PSUBMIT: sbatch -J NEMO_test_job --exclusive --time=10 --gres=gpu:4 --account=training2508 -p all -D /p/home/jusers/medvedev1/jedi/src/nemo-build/sandbox -N 1 -n 8 /p/project1/training2508/HybridNEMO/utils/bin/psubmit-mpiexec-wrapper.sh -t slurm -n 8 -p 8 -h 9 -g 4 -d /p/project1/training2508/HybridNEMO/utils/bin -s nemo.gpu.acc -e ./runner-script.sh -o nemo.gpu.acc/psubmit.opt -a "\"\""
Submitted batch job 55837
--- Psubmit wrapper output: ---
>>> PSUBMIT: PWD=/p/home/jusers/medvedev1/jedi/src/nemo-build/sandbox
>>> PSUBMIT: INJOB_INIT_COMMANDS: "export NEMO_DEFAULT_WORKLOAD=eORCA1-spinup; export LD_LIBRARY_PATH=/p/project1/training2508/HybridNEMO/thirdparty/netcdf-fortran-4.4.3/lib:$LD_LIBRARY_PATH"
>>> PSUBMIT: Nodelist jpbot-001-38:8
>>> PSUBMIT: srun is: /usr/bin/srun
>>> PSUBMIT: Executable is: ./runner-script.sh
++ srun --cpu-bind=no --gpus-per-node=4 --ntasks-per-node=8 --output=out.55837.%t --error=err.55837.%t --input=none ./runner-script.sh
>>> PSUBMIT: Exiting...
--- Rank 0 output: ---
>> runner-script.sh: executing: numactl -l --all --physcpubind=0-8 -- ./nemo
--- Rank 0 errout: ---
Warning: ieee_underflow is signaling
Warning: ieee_inexact is signaling
0
--- NOTE: THERE ARE ERROR OUTPUT FILES
$
```

> NOTE: the messages `Job status: JOB DISAPPEARED!` and `--- NOTE: THERE ARE ERROR OUTPUT FILES` are normal in this context; they do not indicate anything incorrect in our case

> NOTE: here `NSTEPS=32` overrides the default number of timesteps (which is 8)
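> NOTE: other run parameters are passed the same way, as environment variables prefixed to `psubmit.sh`. A sketch combining the knobs used in steps 9 and 10 (the workload names come from the `NEMO_AVAILABLE_WORKLOADS` list in `machine.yaml`; the particular combination is illustrative):

```
# Sketch: pick a workload and a timestep count for a single run.
# eORCA025-spinup and NSTEPS=16 are illustrative values, not prescriptive.
NEMO_WORKLOAD_NAME=eORCA025-spinup NSTEPS=16 psubmit.sh -u nemo.gpu.acc.opt
```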

---

10. Try to execute with profiling:
```
$ PROFILE=each-rank NEMO_WORKLOAD_NAME=eORCA1-spinup psubmit.sh -u nemo.gpu.acc.opt
Submitted batch job 55885
Job ID 55885
Queue: all; account: training2508; gres: gpu:4
Job status: Q
Job status: CF
Job status: R
Job status: CG
Job status: JOB DISAPPEARED!
Job status: NONE
Results collected: results.55885/
Rank 0 output: results.55885/rank0
Rank 0 errout: results.55885/erank0
Batch system output: results.55885/slurm-55885.out
Psubmit wrapper output: results.55885/psubmit_wrapper_output.55885
--- Batch system output: ---
>>> PSUBMIT: PSUBMIT_JOBID=55885 PSUBMIT_DIRNAME=/p/project1/training2508/HybridNEMO/utils/bin
>>> PSUBMIT: args: -t slurm -n 8 -p 8 -h 9 -g 4 -d /p/project1/training2508/HybridNEMO/utils/bin -s nemo.gpu.acc.opt -e ./runner-script.sh -o nemo.gpu.acc.opt/psubmit.opt -a
>>> PSUBMIT: sbatch -J NEMO_test_job --exclusive --time=10 --gres=gpu:4 --account=training2508 -p all -D /p/home/jusers/medvedev1/jedi/src/nemo-build/sandbox -N 1 -n 8 /p/project1/training2508/HybridNEMO/utils/bin/psubmit-mpiexec-wrapper.sh -t slurm -n 8 -p 8 -h 9 -g 4 -d /p/project1/training2508/HybridNEMO/utils/bin -s nemo.gpu.acc.opt -e ./runner-script.sh -o nemo.gpu.acc.opt/psubmit.opt -a "\"\""
Submitted batch job 55885
--- Psubmit wrapper output: ---
>>> PSUBMIT: PWD=/p/home/jusers/medvedev1/jedi/src/nemo-build/sandbox
>>> PSUBMIT: INJOB_INIT_COMMANDS: "export NEMO_DEFAULT_WORKLOAD=eORCA1-spinup; export LD_LIBRARY_PATH=/p/project1/training2508/HybridNEMO/thirdparty/netcdf-fortran-4.4.3/lib:$LD_LIBRARY_PATH; export NEMO_KEEP_TESTBED=TRUE;"
>>> PSUBMIT: Nodelist jpbot-001-40:8
>>> PSUBMIT: srun is: /usr/bin/srun
>>> PSUBMIT: Executable is: ./runner-script.sh
++ srun --cpu-bind=no --gpus-per-node=4 --ntasks-per-node=8 --output=out.55885.%t --error=err.55885.%t --input=none ./runner-script.sh
>>> PSUBMIT: Exiting...
--- Rank 0 output: ---
>> runner-script.sh: executing: numactl -l --all --physcpubind=0-8 -- ./profiling-wrapper.sh ./nemo
Collecting data...
Generating '/tmp/nsys-report-22fd.qdstrm'
[1/1] [========================100%] report_0.nsys.nsys-rep
Generated: /p/home/jusers/medvedev1/jedi/src/nemo-build/sandbox/nemo_testbed_55885/profiling.55885/rank_0/report_0.nsys.nsys-rep
--- Rank 0 errout: ---
Warning: ieee_invalid is signaling
Warning: ieee_divide_by_zero is signaling
Warning: ieee_underflow is signaling
Warning: ieee_inexact is signaling
0
--- NOTE: THERE ARE ERROR OUTPUT FILES
$
```
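> NOTE: the per-rank reports can be opened in `nsys-ui` (see step 11 below), or summarized in the terminal with nsys's `stats` subcommand, e.g.:

```
# Sketch: text-mode summary of one rank's profile, no GUI needed.
# The report path follows the results layout shown in step 11.
nsys stats results.55885/profiling.55885/rank_0/report_0.nsys.nsys-rep
```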

---

11. Exploring the results:

Everything left after execution is saved in the `results.<JOBID>/` directory in the sandbox:
```
results.55885/
├── batch.55885.out -> slurm-55885.out
├── communication_report.txt.55885
├── erank0 -> err.55885.0
├── err.55885.0
├── err.55885.1
├── err.55885.2
├── err.55885.3
├── err.55885.4
├── err.55885.5
├── err.55885.6
├── err.55885.7
├── layout.dat.55885
├── namelist_cfg.55885
├── nemo_testbed_55885.55885
│   ├── ahmcoef.nc -> ../eORCA1-spinup/data/ahmcoef.nc
│   ├── axis_def_nemo.xml -> ../eORCA1-spinup/data/axis_def_nemo.xml
│   ├── ...
│   ├── wU_r1x1.nc -> ../eORCA1-spinup/data/wU_r1x1.nc
│   └── wV_r1x1.nc -> ../eORCA1-spinup/data/wV_r1x1.nc
├── ocean.output.55885
├── out.55885.0
├── out.55885.1
├── out.55885.2
├── out.55885.3
├── out.55885.4
├── out.55885.5
├── out.55885.6
├── out.55885.7
├── out.55885.master
├── output.namelist.dyn.55885
├── output.namelist.ice.55885
├── profiling.55885
│   ├── rank_0
│   │   └── report_0.nsys.nsys-rep
│   ├── rank_1
│   │   └── report_1.nsys.nsys-rep
│   ├── rank_2
│   │   └── report_2.nsys.nsys-rep
│   ├── rank_3
│   │   └── report_3.nsys.nsys-rep
│   ├── rank_4
│   │   └── report_4.nsys.nsys-rep
│   ├── rank_5
│   │   └── report_5.nsys.nsys-rep
│   ├── rank_6
│   │   └── report_6.nsys.nsys-rep
│   └── rank_7
│       └── report_7.nsys.nsys-rep
├── psubmit_wrapper_output.55885
├── rank0 -> out.55885.0
├── result.55885.yaml
├── run.stat.55885
├── run.stat.nc.55885
├── slurm-55885.out
├── time.step.55885
└── timing.output.55885
```

In the results we have:

- `nemo_testbed_55885.55885` is a copy of the execution testbed (not very useful and contains broken symlinks; it is kept just in case)
- `result.55885.yaml` is the most useful file: the result of postprocessing `run.stat` and `timing.output`. The per-time-step dump of norms from `run.stat` can be used to ensure calculation correctness. The timing info is reliable enough to estimate speedups. Please note that the timing statistics depend on: 1) the `NEMO_WITH_PROFILING` build option; 2) the number of time steps calculated. For precise results, it is better to take stats from builds configured with `NEMO_WITH_PROFILING=0; NEMO_WITH_REPRODUCIBLE=0;`, but even with `NEMO_WITH_PROFILING=1` and/or `NEMO_WITH_REPRODUCIBLE=1` the timing stats can be used for a quick estimate (assuming nsys profiling was not turned on)
- `report_<NRANK>.nsys.nsys-rep` are nsys files; they can be viewed and analyzed using `nsys-ui`

---

12. Control norms comparison

If you have two result directories made with two different builds of NEMO (both builds are expected to have `NEMO_WITH_REPRODUCIBLE=1`), you can compare norms using the scripts `cmp.sh` and `compare.sh`; you will find them next to the `nemo` binary. Provide the two directories as args:
```
$ nemo.gpu.acc.opt/compare.sh results.XXX results.YYY
```

---

13. Scalability tests automation (`scripts/generic/scalability_table.sh`):

Can be used to automatically generate per-subroutine scaling/speedup tables. It may require a bit of debugging, but should be useful for the final reports; a hypothetical invocation is sketched below.
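A hypothetical invocation, following the placeholder style of step 12 (the argument convention is an assumption, not verified; check the script header before relying on it):

```
# Hypothetical sketch: tabulate per-subroutine timings across runs done at
# different scales. results.XXX/YYY/ZZZ are placeholders for real result
# directories; the argument convention itself is an assumption.
nemo.gpu.acc.opt/scalability_table.sh results.XXX results.YYY results.ZZZ
```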