# build IFS-NEMO using ifsnemo-build

## ensure the credentials are OK

On the download machine (it can be a login node on some systems, but often the downloads have to be done separately) you must have access permissions to: a) the restricted source code repositories; b) the target HPC machines. On a target machine you have to select the account and queue/partition that are used for integration testing purposes.

### check and fill in the required .netrc contents

You need the $HOME/.netrc file properly filled in, with records for `earth.bsc.es` and for `git.ecmwf.int`. The .netrc file typically looks like:

```
machine earth.bsc.es login XXXXXX password XXXXXXXXXXXXXXXX
machine git.ecmwf.int login XXXXXX password XXXXXXXXXXXXXXXX
```

where the password field holds the access token. See the ifsnemo-build project README for hints on how to obtain an access token.

### access to a target HPC system login node

Make sure there is passwordless ssh access from the download machine to the target HPC login node. This is typically arranged by managing ssh keys in the $HOME/.ssh directory and the $HOME/.ssh/config file.

### make a template for account.yaml

The `account.yaml` file on a target machine will be used later, at the stage of arranging the testbed directory, but it is better to agree on its contents early.
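The prerequisites above can be sanity-checked with a short shell script. The snippet below is a self-contained sketch: it validates the record format against a demo .netrc created on the fly; in real use, point `NETRC` at `$HOME/.netrc` instead.

```shell
# Sketch: verify a .netrc has records for both hosts this guide needs.
# NETRC points at a demo file here; in real use set NETRC="$HOME/.netrc".
NETRC=$(mktemp)
cat > "$NETRC" <<'EOF'
machine earth.bsc.es login XXXXXX password XXXXXXXXXXXXXXXX
machine git.ecmwf.int login XXXXXX password XXXXXXXXXXXXXXXX
EOF
chmod 600 "$NETRC"       # curl/git refuse a group/world-readable .netrc

ok=yes
for h in earth.bsc.es git.ecmwf.int; do
  grep -q "^machine $h" "$NETRC" || { echo "missing record for $h"; ok=no; }
done
echo "netrc records: $ok"
```

The passwordless-ssh side can be checked in the same spirit with `ssh -o BatchMode=yes user@target true`, which fails instead of prompting when the keys are not set up.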
The typical structure of account.yaml is:

```
---
psubmit:
  queue_name: XXX
  account: XXX
  node_type: XXXXXX
```

The parameters that we use at BSC internally are given below (for reference only; they can be different in your case):

- MN5-GPP: `queue_name: ""`; `account: bsc32`; `node_type: gp_debug`
- MN5-ACC: `queue_name: ""`; `account: bsc32`; `node_type: acc_debug`
- LUMI-G: `queue_name: "standard-g"`; `account: project_465000454`

## clone the repository of ifsnemo-build project

### actions on a local machine (with full internet access)

```
$ git clone --recursive https://earth.bsc.es/gitlab/digital-twins/nvidia/ifsnemo-build.git
$ cd ifsnemo-build
$ ln -s dnb-generic.yaml machine.yaml
$ vim overrides.yaml
... (may use override_example.yaml for reference)
```

The overrides.yaml file holds the settings specific to a particular build. It is recommended to put at least one record there, similar to this one:

```
---
environment:
  - export DNB_SANDBOX_SUBDIR="ifsMASTER.SP.CPU.GPP"
```

Here DNB_SANDBOX_SUBDIR sets the subdirectory of the target testbed directory where all binaries are placed. Giving each build a specific name helps a lot to keep things in order during the test process. This record does not actually influence the download stage, but it is better to create overrides.yaml early. In some cases more tunings have to be put into overrides.yaml.
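As a concrete instance of the account.yaml template discussed above, a filled-in file for MN5-GPP would look like the fragment below (the values are the BSC-internal reference ones; substitute your own account and node type):

```
---
psubmit:
  queue_name: ""
  account: bsc32
  node_type: gp_debug
```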
The example below shows how to build the NVIDIA/GPU version of IFS-NEMO using non-mainstream source code branches:

```
---
environment:
  - export DNB_IFSNEMO_URL="https://git.ecmwf.int/scm/~ecme6549"
  - export IFS_BUNDLE_RAPS_GIT="$DNB_IFSNEMO_URL/raps-accel.git"
  - export IFS_BUNDLE_RAPS_VERSION="feature/mn5-accel-partition"
  - export IFS_BUNDLE_IFS_SOURCE_GIT="$DNB_IFSNEMO_URL/ifs-source-accel.git"
  - export IFS_BUNDLE_IFS_SOURCE_VERSION="feature/mn5-accel-partition"
  - export DNB_IFSNEMO_BUNDLE_GIT="$DNB_IFSNEMO_URL/ifs-bundle-accel.git"
  - export DNB_IFSNEMO_BUNDLE_BRANCH="feature/mn5-accel-partition"
  - export DNB_IFSNEMO_WITH_GPU=TRUE
  - export DNB_IFSNEMO_WITH_GPU_EXTRA=TRUE
  # - export DNB_IFSNEMO_WITH_GPU_NEMO=TRUE
  - export DNB_SANDBOX_SUBDIR="ifsMASTER.SP.GPU.ACC"
```

Here we override six environment variables that declare the specific URLs and branch names. Besides that, `DNB_IFSNEMO_WITH_GPU`, `DNB_IFSNEMO_WITH_GPU_EXTRA` and `DNB_IFSNEMO_WITH_GPU_NEMO` control certain ifs-bundle tunings for GPU features. Other high-level build features are controlled in a similar way: by exporting `DNB_IFSNEMO_XXXXXX` variables to override their default values.

### make a placeholder directory on target machine

This step is rather straightforward: create the directory on the target machine's filesystem where the build directory will be placed after downloading.

## start build operation

### download all things

The command below downloads everything required for the build:

```
$ ./dnb.sh :du
$
```

### copy them to target

We may archive everything and then call scp for the data transfer:

```
$ tar czf ../ifsnemo-build.tar.gz *
$ scp ../ifsnemo-build.tar.gz user@target.machine:/dir/on/target/machine
```

### unpack and tune the environment on target

After unpacking the tar archive on the target machine, we have to create an appropriate account.yaml, as described above.
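The pack/transfer/unpack sequence above can be rehearsed end to end without a real target machine. The snippet below is a self-contained sketch: two temporary directories stand in for the download and target machines, and `cp` stands in for `scp`:

```shell
# Rehearsal of the pack -> transfer -> unpack round trip.
# SRC stands in for the ifsnemo-build working copy, DST for the target dir;
# in real use the cp line becomes:
#   scp ../ifsnemo-build.tar.gz user@target.machine:/dir/on/target/machine
SRC=$(mktemp -d)
DST=$(mktemp -d)
TARBALL=$(mktemp -u).tar.gz

echo demo > "$SRC/dnb.sh"           # stand-in for the working copy contents
tar czf "$TARBALL" -C "$SRC" .      # pack the working copy
cp "$TARBALL" "$DST/"               # "transfer" (scp in real life)
tar xzf "$DST/$(basename "$TARBALL")" -C "$DST"

test -f "$DST/dnb.sh" && echo "round trip OK"
```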
Besides that, we have to remove the existing machine.yaml symlink and create a new one that corresponds to the target environment:

```
$ rm machine.yaml
$ ln -s dnb-mn5-gpp.yaml machine.yaml
$
```

### start build process

It is strongly recommended to run the build process on a compute node:

```
$ some-command-to-allocate-compute-node
node$ ./dnb.sh :b 2>&1 | tee build.log
node$ finish-node-session
$
```

## do the test

### set up the testbed directory

```
$ ./dnb.sh :i
$
```

Please note that the "installation" stage (`./dnb.sh :i`) is better done on a login node.

### execute the test

If all previous steps are successful, one may go to the ifsnemo/ directory and run a simple test:

```
$ psubmit.sh -n 1 -u ifsMASTER.SP.CPU.GPP
```

where ifsMASTER.SP.CPU.GPP is the name of the binaries subdirectory that was given in the DNB_SANDBOX_SUBDIR variable at the build stage. This means that several subdirectories from different builds may co-exist in the ifsnemo/ directory, making it easy to compare different results.

## parameters to tune

### compile time

TODO: we have to develop tunings:

- to use raps/bundle code for environment loading and mpirun-level tuning, instead of the current procedure where all of this is overridden by ifsnemo-build code
- IFS-FESOM build instead of IFS-NEMO

### run time

Please look at the dnb.yaml file header for more ideas on the runtime options.

We can select the input to work with out of three options:

- tco79-eORCA1 (default)
- tco399-eORCA025
- tco1279-eORCA12

NSTEPS defines how long to run the model. Standard psubmit parameters (https://github.com/a-v-medvedev/psubmit) apply for selecting the parallel scope, the SLURM time limit and other queue-related details.

TODO: we have to develop the runtime knobs for:

- checkpoint/restart functionality
- keeping fdb files
- ...
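The value of giving each build its own DNB_SANDBOX_SUBDIR name can be illustrated with a self-contained sketch (a temporary directory stands in for the real ifsnemo/ testbed directory; the subdirectory names are the examples used throughout this guide):

```shell
# Several builds coexisting under one testbed directory, each selectable
# at run time by its DNB_SANDBOX_SUBDIR name.
IFSNEMO=$(mktemp -d)      # stand-in for the ifsnemo/ testbed directory
mkdir "$IFSNEMO/ifsMASTER.SP.CPU.GPP" "$IFSNEMO/ifsMASTER.SP.GPU.ACC"
ls "$IFSNEMO"
# one of them is then chosen for a test run, e.g.:
#   psubmit.sh -n 1 -u ifsMASTER.SP.GPU.ACC
```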