0. Get a personal access token for the BSC-ES GitLab (see `https://earth.bsc.es/gitlab/digital-twins/nvidia/nemo-build/-/blob/master/README.md#authorization-notes` for hints)
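A quick way to sanity-check the token before going further (this uses the standard GitLab REST API, nothing specific to `nemo-build`; expect HTTP 200 for a valid token, 401 otherwise):
```
$ curl -s -o /dev/null -w "%{http_code}\n" \
       --header "PRIVATE-TOKEN: XXXXXXXXXXXXXX" \
       https://earth.bsc.es/gitlab/api/v4/user
200
```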
---
00. Make a symlink: `ln -s /gpfs/projects/bsc32/DE340-share/nemo-standalone-data $HOME/data`
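To verify the link, list it; you should see at least the workload directories used later in this guide (the directory may contain more):
```
$ ls $HOME/data
eORCA025-spinup  eORCA1-spinup  ...
```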
---
1. Modify account config files:
- `$HOME/.bashrc`:
```
export PATH=/gpfs/projects/bsc32/DE340-share/nemo-build-utils:$PATH
export DNB_GITLAB_USERNAME="XXXXXX"
export DNB_GITLAB_ACCESS_TOKEN="XXXXXXXXXXXXXX"
```
- `$HOME/.netrc`:
```
machine earth.bsc.es
login XXXXXXX
password XXXXXXXXXXXXXXX
```
> NOTE: in this file, the login/password pair is the same as the `DNB_GITLAB_USERNAME`/`DNB_GITLAB_ACCESS_TOKEN` pair above
> NOTE: don't forget to log off and log in again after changing these config files
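Since `~/.netrc` holds credentials in plain text, it is a good idea to restrict its permissions; after logging back in you can also confirm the variables took effect (a minimal check, not part of the official setup):
```
$ chmod 600 $HOME/.netrc
$ # after re-login:
$ echo "$DNB_GITLAB_USERNAME"                      # should print your username
$ echo "${DNB_GITLAB_ACCESS_TOKEN:+token is set}"
```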
---
2. Get `nemo-build`:
```
$ git clone --recursive https://earth.bsc.es/gitlab/digital-twins/nvidia/nemo-build.git
$ cd nemo-build
```
---
3. Create `overrides.yaml`:
```
---
environment:
- export NEMO_CODEBASE="DE340_NVIDIA"
- export DNB_PACKAGE_VERSIONS=nemo:HEAD^master
- export NEMO_WITH_REPRODUCIBLE=0
- export NEMO_WITH_GPU=1
- export NEMO_WITH_PROFILING=1
- export NEMO_WITH_FAKE_GPU=0
- export NEMO_WITH_MANAGED_MEMORY=0
- export DNB_SANDBOX_SUBDIR=nemo.gpu.acc.opt
```
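The `NEMO_WITH_*` switches select the build variant. As a hypothetical example, a CPU-only reproducible build might look like this; note the `DNB_SANDBOX_SUBDIR` value changed to match (see the NOTE in step 8 on this naming convention):
```
---
environment:
- export NEMO_CODEBASE="DE340_NVIDIA"
- export DNB_PACKAGE_VERSIONS=nemo:HEAD^master
- export NEMO_WITH_REPRODUCIBLE=1
- export NEMO_WITH_GPU=0
- export NEMO_WITH_PROFILING=0
- export NEMO_WITH_FAKE_GPU=0
- export NEMO_WITH_MANAGED_MEMORY=0
- export DNB_SANDBOX_SUBDIR=nemo.cpu.repro
```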
---
4. Create `account.yaml`:
```
---
# MN5-ACC:
psubmit:
queue_name: ""
account: ehpc01
node_type: acc_debug
```
If you do not have ehpc01 access, you may use:
```
---
# MN5-ACC:
psubmit:
queue_name: ""
account: bsc32
node_type: acc_debug
```
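If you are unsure which Slurm accounts you belong to, the standard Slurm accounting tool can list them (generic Slurm usage, not specific to this setup):
```
$ sacctmgr show associations user=$USER format=Account
```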
---
5. Make a `machine.yaml` symlink:
```
ln -s dnb-mn5-acc.yaml machine.yaml
```
> NOTE: the files `overrides.yaml` and `account.yaml` and the `machine.yaml` symlink are meant to stay local; we never push them into the git repository.
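One way to guarantee these local files never end up in a commit is to add them to the repository-local exclude list (a standard git mechanism; a suggestion, not part of the documented workflow):
```
$ cat >> .git/info/exclude <<'EOF'
overrides.yaml
account.yaml
machine.yaml
EOF
```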
---
6. Download and unpack NEMO:
```
$ ./dnb.sh :du 2>&1 | tee build.log
Download and build script for NEMO, for docs please refer: https://earth.bsc.es/gitlab/digital-twins/nvidia/nemo-build/-/blob/master/README.md
Download and build started at timestamp: 1743411705.
----------
>> dbscripts: Processing account.yaml:
>> dbscripts: Processing overrides.yaml:
export NEMO_CODEBASE="DE340_NVIDIA"
export DNB_PACKAGE_VERSIONS=nemo:HEAD^master
export NEMO_WITH_REPRODUCIBLE=0
...
>> dbscripts: dubi_main: call: dnb_nemo mode=du version=HEAD^master
2025-03-31 11:17:44 URL:https://earth.bsc.es/gitlab/api/v4/projects/1476/repository/archive?sha=master [165646931/165646931] -> "nemogcm_v40.dwn/nemogcm_v40-HEAD^master.tar.gz" [1]
patching file makenemo
>> dbscripts: applying patch_nemo-ANY_isnan_v1.diff:
patching file src/OCE/OBS/obs_inter_h2d.F90
patching file src/OCE/OBS/obsinter_h2d.h90
patching file src/OCE/stpctl.F90
>> dbscripts: applying patch_nemo-ANY_keys.diff:
patching file src/OCE/stpctl.F90
>> dbscripts: applying patch_nemo-ANY_timing.diff:
patching file src/OCE/IOM/in_out_manager.F90
----------
Full operation time: 27 seconds.
```
---
7. Build NEMO:
```
$ ./dnb.sh 2>&1 | tee build.log
Download and build script for NEMO, for docs please refer: https://earth.bsc.es/gitlab/digital-twins/nvidia/nemo-build/-/blob/master/README.md
Download and build started at timestamp: 1743412988.
----------
>> dbscripts: Processing account.yaml:
>> dbscripts: Processing overrides.yaml:
export NEMO_CODEBASE="DE340_NVIDIA"
export DNB_PACKAGE_VERSIONS=nemo:HEAD^master
export NEMO_WITH_REPRODUCIBLE=0
export NEMO_WITH_GPU=1
export NEMO_WITH_PROFILING=1
export NEMO_WITH_FAKE_GPU=0
export NEMO_WITH_MANAGED_MEMORY=0
export DNB_SANDBOX_SUBDIR=nemo.gpu.acc.opt
>> dbscripts: Processing machine.yaml:
module load Stages/2025 NVHPC/25.1-CUDA-12 OpenMPI/5.0.5 MPI-settings/CUDA HDF5/1.14.5-serial netCDF/4.9.2-serial CMake/3.30.3
export FC=mpifort
export CXX=g++
export CC=gcc
export NEMO_SCRIPTS_FOLDER="jedi"
export NEMO_AVAILABLE_WORKLOADS=${NEMO_AVAILABLE_WORKLOADS:="eORCA1-spinup eORCA025-spinup"}
export NEMO_USE_PREBUILT_HDF5=TRUE
export NEMO_USE_PREBUILT_NETCDF_C=TRUE
export NEMO_USE_PREBUILT_NETCDF_FORTRAN=TRUE
export NEMO_USE_PREBUILT_XIOS=TRUE
export NEMO_HDF5_PATH="$HDF5_DIR"
export NEMO_NETCDF_C_PATH="$EBROOTNETCDF"
export NEMO_NETCDF_FORTRAN_PATH="/p/project1/training2508/HybridNEMO/thirdparty/netcdf-fortran-4.4.3"
export NEMO_XIOS_PATH="/p/project1/training2508/HybridNEMO/thirdparty/xios-2.5"
export NEMO_WORKLOAD_ARCHIVE=""
WARNING: overriding psubmit/queue_name setting is ignored in machine.yaml: preserved value PSUBMIT_OPT_QUEUE=all
WARNING: overriding psubmit/account setting is ignored in machine.yaml: preserved value PSUBMIT_OPT_ACCOUNT=training2508
NEMO_CODEBASE=${NEMO_CODEBASE:=DE340_NVIDIA}
[ -f nemo-build-select-codebase.inc ] && source nemo-build-select-codebase.inc
Using cmake version 3.30.3
Using CUDA path /p/software/jedi/stages/2025/software/CUDA/12, version: 12.6
>> dbscripts: Package list: nemo
>> dbscripts: EXECUTING: dubi_main "$*"
>> dbscripts: dubi_main: actions: nemo:bi
>> dbscripts: dubi_main: call: dnb_nemo mode=bi version=HEAD^master
Are you sure that you want to remove this directory ORCA2? [y/n]
ORCA2 configuration REMOVED
++++ ./makenemo -j 16 -r ORCA2_ICE_PISCES -n ORCA2 -d 'OCE ICE' -m dnb add_key 'key_asminc key_netcdf4 key_sms key_xios2 key_nosignedzero key_profiling key_gpu key_nvidia_gpu key_prof_gpu' del_key key_top
You are installing a new configuration ORCA2 from ORCA2_ICE_PISCES with sub-components: OCE ICE
Creating ORCA2/WORK = OCE ICE for ORCA2
MY_SRC directory is : ORCA2/MY_SRC
Adding keys in : ORCA2
added key key_asminc in ORCA2
added key key_netcdf4 in ORCA2
...
ar: creating libnemo.a
mpifort -o nemo cfgs/ORCA2/WORK/nemo.o libnemo.a -lstdc++ -fopenmp -acc=gpu -mp=gpu -Minfo=mp,accel -gpu=cc90 -L/p/software/jedi/stages/2025/software/CUDA/12/lib64 -lcudart -lnvToolsExt -L/p/project1/training2508/HybridNEMO/thirdparty/netcdf-fortran-4.4.3/lib -L/p/software/default/stages/2025/software/netCDF/4.9.2-GCCcore-13.3.0-serial/lib -L/p/software/default/stages/2025/software/HDF5/1.14.5-nvompic-2024a/lib -lhdf5 -lhdf5_hl -lnetcdf -lnetcdff /p/project1/training2508/HybridNEMO/thirdparty/xios-2.5/lib/libxios.a
cp nemo cfgs/ORCA2/BLD/bin/nemo.exe
/p/home/jusers/medvedev1/jedi/src/nemo-build/nemo-HEAD^master.src
++++ set +x
'nemo-HEAD^master.src/cfgs/ORCA2/EXP00/nemo' -> 'nemo-HEAD^master/nemo'
>> dbscripts: dubi_main: call: dnb_sandbox
'/p/home/jusers/medvedev1/jedi/src/nemo-build/scripts/fixforcings.sh' -> './fixforcings.sh'
'/p/home/jusers/medvedev1/jedi/src/nemo-build/scripts/getgoldvalues.sh' -> './getgoldvalues.sh'
'/p/home/jusers/medvedev1/jedi/src/nemo-build/scripts/nemo-postproc.sh' -> './nemo-postproc.sh'
'/p/home/jusers/medvedev1/jedi/src/nemo-build/scripts/nemo-preproc.sh' -> './nemo-preproc.sh'
'/p/home/jusers/medvedev1/jedi/src/nemo-build/scripts/generic/cmp.sh' -> './cmp.sh'
'/p/home/jusers/medvedev1/jedi/src/nemo-build/scripts/generic/compare.sh' -> './compare.sh'
'/p/home/jusers/medvedev1/jedi/src/nemo-build/scripts/generic/scalability_table.sh' -> './scalability_table.sh'
'/p/home/jusers/medvedev1/jedi/src/nemo-build/scripts/generic/jedi/get_serialized_regions_nvhpc.sh' -> './get_serialized_regions_nvhpc.sh'
'/p/home/jusers/medvedev1/jedi/src/nemo-build/scripts/generic/jedi/profiling-wrapper.sh' -> './profiling-wrapper.sh'
'/p/home/jusers/medvedev1/jedi/src/nemo-build/scripts/generic/jedi/runner-script.sh' -> './runner-script.sh'
----------
Executables before build start: (unix-time, size, name)
----------
Executables after build finish: (unix-time, size, name)
>> 1743413195 121081232 nemo.bin/nemo
----------
Full operation time: 210 seconds.
```
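Judging by the action codes visible in the logs (`mode=du` for download/unpack, `nemo:bi` for build/install), it should be possible to rebuild without re-downloading the sources by requesting only the build/install actions; check the `nemo-build` README for the authoritative action syntax:
```
$ ./dnb.sh :bi 2>&1 | tee build.log
```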
---
8. Check the resulting binary, scripts and inputs:
```
$ cd sandbox/
$ ls
eORCA025-spinup eORCA1-spinup nemo.gpu.acc.opt
$ tree
.
├── eORCA025-spinup -> /p/home/jusers/medvedev1/jedi/data/eORCA025-spinup
├── eORCA1-spinup -> /p/home/jusers/medvedev1/jedi/data/eORCA1-spinup
└── nemo.gpu.acc.opt
├── cmp.sh
├── compare.sh
├── fixforcings.sh
├── getgoldvalues.sh
├── get_serialized_regions_nvhpc.sh
├── lib
├── nemo
├── nemo-postproc.sh
├── nemo-preproc.sh
├── profiling-wrapper.sh
├── psubmit.opt
└── runner-script.sh
```
> NOTE: the directory name `nemo.gpu.acc.opt` comes from the `DNB_SANDBOX_SUBDIR` value in `overrides.yaml` and should be changed whenever the build configuration changes: no GPU, reproducible, etc. This is an easy way to keep the build options consistent for the binaries we are working on.
---
9. Try to execute the resulting binary:
```
$ cd sandbox/
$ NSTEPS=32 psubmit.sh -u nemo.gpu.acc.opt
Submitted batch job 55837
Job ID 55837
Queue: all; account: training2508; gres: gpu:4
Job status: Q
Job status: CF
Job status: R
Job status: CG
Job status:
JOB DISAPPEARED!
Job status: NONE
Results collected: results.55837/
Rank 0 output: results.55837/rank0
Rank 0 errout: results.55837/erank0
Batch system output: results.55837/slurm-55837.out
Psubmit wrapper output: results.55837/psubmit_wrapper_output.55837
--- Batch system output: ---
>>> PSUBMIT: PSUBMIT_JOBID=55837 PSUBMIT_DIRNAME=/p/project1/training2508/HybridNEMO/utils/bin
>>> PSUBMIT: args: -t slurm -n 8 -p 8 -h 9 -g 4 -d /p/project1/training2508/HybridNEMO/utils/bin -s nemo.gpu.acc -e ./runner-script.sh -o nemo.gpu.acc/psubmit.opt -a
>>> PSUBMIT: sbatch -J NEMO_test_job --exclusive --time=10 --gres=gpu:4 --account=training2508 -p all -D /p/home/jusers/medvedev1/jedi/src/nemo-build/sandbox -N 1 -n 8 /p/project1/training2508/HybridNEMO/utils/bin/psubmit-mpiexec-wrapper.sh -t slurm -n 8 -p 8 -h 9 -g 4 -d /p/project1/training2508/HybridNEMO/utils/bin -s nemo.gpu.acc -e ./runner-script.sh -o nemo.gpu.acc/psubmit.opt -a "\"\""
Submitted batch job 55837
--- Psubmit wrapper output: ---
>>> PSUBMIT: PWD=/p/home/jusers/medvedev1/jedi/src/nemo-build/sandbox
>>> PSUBMIT: INJOB_INIT_COMMANDS: "export NEMO_DEFAULT_WORKLOAD=eORCA1-spinup; export LD_LIBRARY_PATH=/p/project1/training2508/HybridNEMO/thirdparty/netcdf-fortran-4.4.3/lib:$LD_LIBRARY_PATH"
>>> PSUBMIT: Nodelist jpbot-001-38:8
>>> PSUBMIT: srun is: /usr/bin/srun
>>> PSUBMIT: Executable is: ./runner-script.sh
++ srun --cpu-bind=no --gpus-per-node=4 --ntasks-per-node=8 --output=out.55837.%t --error=err.55837.%t --input=none ./runner-script.sh
>>> PSUBMIT: Exiting...
--- Rank 0 output: ---
>> runner-script.sh: executing: numactl -l --all --physcpubind=0-8 -- ./nemo
--- Rank 0 errout: ---
Warning: ieee_underflow is signaling
Warning: ieee_inexact is signaling
0
--- NOTE: THERE ARE ERROR OUTPUT FILES
$
```
> NOTE: the `JOB DISAPPEARED!` message and the `--- NOTE: THERE ARE ERROR OUTPUT FILES` line are normal in this context; they do not indicate anything incorrect in our case
> NOTE: here `NSTEPS=32` overrides the default number of timesteps (which is 8)
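The environment-variable overrides can be combined; for instance, a longer run of the heavier workload (both variables appear elsewhere in this guide, the combination itself is just an illustration):
```
$ NSTEPS=128 NEMO_WORKLOAD_NAME=eORCA025-spinup psubmit.sh -u nemo.gpu.acc.opt
```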
---
10. Try to execute with profiling:
```
$ PROFILE=each-rank NEMO_WORKLOAD_NAME=eORCA1-spinup psubmit.sh -u nemo.gpu.acc.opt
Submitted batch job 55885
Job ID 55885
Queue: all; account: training2508; gres: gpu:4
Job status: Q
Job status: CF
Job status: R
Job status: CG
Job status:
JOB DISAPPEARED!
Job status: NONE
Results collected: results.55885/
Rank 0 output: results.55885/rank0
Rank 0 errout: results.55885/erank0
Batch system output: results.55885/slurm-55885.out
Psubmit wrapper output: results.55885/psubmit_wrapper_output.55885
--- Batch system output: ---
>>> PSUBMIT: PSUBMIT_JOBID=55885 PSUBMIT_DIRNAME=/p/project1/training2508/HybridNEMO/utils/bin
>>> PSUBMIT: args: -t slurm -n 8 -p 8 -h 9 -g 4 -d /p/project1/training2508/HybridNEMO/utils/bin -s nemo.gpu.acc.opt -e ./runner-script.sh -o nemo.gpu.acc.opt/psubmit.opt -a
>>> PSUBMIT: sbatch -J NEMO_test_job --exclusive --time=10 --gres=gpu:4 --account=training2508 -p all -D /p/home/jusers/medvedev1/jedi/src/nemo-build/sandbox -N 1 -n 8 /p/project1/training2508/HybridNEMO/utils/bin/psubmit-mpiexec-wrapper.sh -t slurm -n 8 -p 8 -h 9 -g 4 -d /p/project1/training2508/HybridNEMO/utils/bin -s nemo.gpu.acc.opt -e ./runner-script.sh -o nemo.gpu.acc.opt/psubmit.opt -a "\"\""
Submitted batch job 55885
--- Psubmit wrapper output: ---
>>> PSUBMIT: PWD=/p/home/jusers/medvedev1/jedi/src/nemo-build/sandbox
>>> PSUBMIT: INJOB_INIT_COMMANDS: "export NEMO_DEFAULT_WORKLOAD=eORCA1-spinup; export LD_LIBRARY_PATH=/p/project1/training2508/HybridNEMO/thirdparty/netcdf-fortran-4.4.3/lib:$LD_LIBRARY_PATH; export NEMO_KEEP_TESTBED=TRUE;"
>>> PSUBMIT: Nodelist jpbot-001-40:8
>>> PSUBMIT: srun is: /usr/bin/srun
>>> PSUBMIT: Executable is: ./runner-script.sh
++ srun --cpu-bind=no --gpus-per-node=4 --ntasks-per-node=8 --output=out.55885.%t --error=err.55885.%t --input=none ./runner-script.sh
>>> PSUBMIT: Exiting...
--- Rank 0 output: ---
>> runner-script.sh: executing: numactl -l --all --physcpubind=0-8 -- ./profiling-wrapper.sh ./nemo
Collecting data...
Generating '/tmp/nsys-report-22fd.qdstrm'
[1/1] [========================100%] report_0.nsys.nsys-rep
Generated:
/p/home/jusers/medvedev1/jedi/src/nemo-build/sandbox/nemo_testbed_55885/profiling.55885/rank_0/report_0.nsys.nsys-rep
--- Rank 0 errout: ---
Warning: ieee_invalid is signaling
Warning: ieee_divide_by_zero is signaling
Warning: ieee_underflow is signaling
Warning: ieee_inexact is signaling
0
--- NOTE: THERE ARE ERROR OUTPUT FILES
$
```
---
11. Explore the results:
Everything left after execution is saved in the `results.<JOBID>/` directory in the sandbox:
```
results.55885/
├── batch.55885.out -> slurm-55885.out
├── communication_report.txt.55885
├── erank0 -> err.55885.0
├── err.55885.0
├── err.55885.1
├── err.55885.2
├── err.55885.3
├── err.55885.4
├── err.55885.5
├── err.55885.6
├── err.55885.7
├── layout.dat.55885
├── namelist_cfg.55885
├── nemo_testbed_55885.55885
│ ├── ahmcoef.nc -> ../eORCA1-spinup/data/ahmcoef.nc
│ ├── axis_def_nemo.xml -> ../eORCA1-spinup/data/axis_def_nemo.xml
│ ├── ...
│ ├── wU_r1x1.nc -> ../eORCA1-spinup/data/wU_r1x1.nc
│ └── wV_r1x1.nc -> ../eORCA1-spinup/data/wV_r1x1.nc
├── ocean.output.55885
├── out.55885.0
├── out.55885.1
├── out.55885.2
├── out.55885.3
├── out.55885.4
├── out.55885.5
├── out.55885.6
├── out.55885.7
├── out.55885.master
├── output.namelist.dyn.55885
├── output.namelist.ice.55885
├── profiling.55885
│ ├── rank_0
│ │ └── report_0.nsys.nsys-rep
│ ├── rank_1
│ │ └── report_1.nsys.nsys-rep
│ ├── rank_2
│ │ └── report_2.nsys.nsys-rep
│ ├── rank_3
│ │ └── report_3.nsys.nsys-rep
│ ├── rank_4
│ │ └── report_4.nsys.nsys-rep
│ ├── rank_5
│ │ └── report_5.nsys.nsys-rep
│ ├── rank_6
│ │ └── report_6.nsys.nsys-rep
│ └── rank_7
│ └── report_7.nsys.nsys-rep
├── psubmit_wrapper_output.55885
├── rank0 -> out.55885.0
├── result.55885.yaml
├── run.stat.55885
├── run.stat.nc.55885
├── slurm-55885.out
├── time.step.55885
└── timing.output.55885
```
In the results we have:
- `nemo_testbed_55885.55885` is a copy of the execution testbed (not very useful and contains broken symlinks; kept just in case)
- `result.55885.yaml` is the most useful file: the result of postprocessing `run.stat` and `timing.output`. The per-time-step dump of norms from `run.stat` can be used to verify calculation correctness, and the timing info is reliable enough to estimate speedups. Please note that the timing statistics depend on: 1) the `NEMO_WITH_PROFILING` build option; 2) the number of time steps calculated. For precise results, it is better to take stats from builds configured with `NEMO_WITH_PROFILING=0` and `NEMO_WITH_REPRODUCIBLE=0`, but even with `NEMO_WITH_PROFILING=1` and/or `NEMO_WITH_REPRODUCIBLE=1` the timing stats can be used for a quick estimate (assuming nsys profiling was not turned on)
- `report_<NRANK>.nsys.nsys-rep` are Nsight Systems report files; they can be viewed and analyzed with `nsys-ui`
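Besides the GUI, a textual summary can be produced with `nsys stats` (standard Nsight Systems tooling; the set of available summary reports depends on the nsys version):
```
$ nsys stats results.55885/profiling.55885/rank_0/report_0.nsys.nsys-rep
$ nsys-ui results.55885/profiling.55885/rank_0/report_0.nsys.nsys-rep   # open in the GUI
```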
---
12. Compare control norms:
If you have two result directories produced by two different builds of NEMO (both built with `NEMO_WITH_REPRODUCIBLE=1`), you can compare the norms with the `cmp.sh` and `compare.sh` scripts found next to the `nemo` binary. Pass the two directories as arguments:
```
$ nemo.gpu.acc.opt/compare.sh results.XXX results.YYY
```
---
13. Automate scalability tests (`scripts/generic/scalability_table.sh`):
This script can generate the per-subroutine scaling/speedup tables automatically. It may require a bit of debugging, but should be effective for the final reports.