# meetings with Deucalion
## 20240627
attending: João, Ricardo Vilaça, Alan, Lara, Kenneth, Miguel P.
- status of A64FX support in EESSI
- see also https://gitlab.com/eessi/support/-/issues/76
- bunch of installations optimized for A64FX have been built + deployed in EESSI production repository (`software.eessi.io`)
- through EESSI build-and-deploy bot, running in Kenneth's account on Deucalion
- see for example https://github.com/EESSI/software-layer/pull/624
  - pretty smooth ride, no (major) surprises so far (compared to building for `aarch64/neoverse_v1`)
- 56 installations in total, including:
- `foss/2023a` toolchain
- `SciPy-bundle/2023.07-gfbf-2023a`
- short term goal (this week) is to build `ESPResSo/4.2.2-foss-2023a` for A64FX and run scaling test with it
- currently building, see https://github.com/EESSI/software-layer/pull/625
- still need to do performance comparisons with generic ARM support (and perhaps against Fujitsu compilers for specific apps?)
- using EESSI on a single node can be done via `cvmfsexec`:
```bash
# prepare cvmfsexec tool
mkdir -p /tmp/$USER
cd /tmp/$USER
git clone https://github.com/cvmfs/cvmfsexec.git
cd cvmfsexec
./makedist default
cd ~
# start subshell in which EESSI is available
/tmp/$USER/cvmfsexec/cvmfsexec software.eessi.io -- /bin/bash -l
# bypass CPU auto-detection, force use of a64fx installations
export EESSI_SOFTWARE_SUBDIR_OVERRIDE=aarch64/a64fx
# initialize EESSI environment
source /cvmfs/software.eessi.io/versions/2023.06/init/bash
# run small benchmark with numpy (dot product of 1000x1000 matrix)
# with aarch64/generic: 27.9 msec per loop
# with aarch64/a64fx: 7.03 msec per loop
module load SciPy-bundle/2023.07-gfbf-2023a
python -m timeit -n 3 -r 3 -s "import numpy; x = numpy.random.random((1000, 1000))" "numpy.dot(x, x.T)"
# check CVMFS client cache size
du -sh /tmp/kehoste/cvmfsexec/dist/var/lib/cvmfs/shared
# output: 345M    /tmp/kehoste/cvmfsexec/dist/var/lib/cvmfs/shared
```
- questions
- enroot/squashfs
- we want to do a squashfs export of EESSI, so we can use it for testing EESSI on Deucalion without having CernVM-FS installed system-wide
- is enroot available everywhere (Slurm plugin)?
- Slurm plugin: https://github.com/NVIDIA/pyxis
- focus for this is GPU partition
- only available on GPU & AMD partitions
- only one image per node
- could help us with CernVM-FS
- enroot does not support FUSE mounting (like Apptainer does)
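A possible approach for the squashfs export (untested sketch; image name, paths, and the pyxis usage at the end are assumptions, not something that was agreed in the meeting) is to pack the mounted EESSI tree with `mksquashfs` from within a `cvmfsexec` subshell:

```bash
# untested sketch: pack the EESSI tree into a squashfs image for enroot/pyxis
# (paths and image name are illustrative; the full repository is large,
#  so exporting only a subset, e.g. via cvmfs_shrinkwrap, may be more practical)
/tmp/$USER/cvmfsexec/cvmfsexec software.eessi.io -- \
    mksquashfs /cvmfs/software.eessi.io /tmp/$USER/eessi.sqsh -comp zstd
# the image could then be used on the GPU partition through the pyxis Slurm
# plugin, e.g.: srun --container-image=/tmp/$USER/eessi.sqsh ...
```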
- recommendations for limited amount of available memory per core (~800MB/core) on A64FX partition?
- OOMs/segfaults in application codes are more likely
- May want to consider partition-specific Lmod hooks for user advice
- expensive HBM, same as on Fugaku
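Until there are official recommendations, jobs on the A64FX partition may want to request memory per core explicitly to fail early rather than hit OOM kills; a hypothetical job script (partition name and application are assumptions):

```bash
#!/bin/bash
# hypothetical Slurm job script for the A64FX partition
# (partition name and application binary are illustrative)
#SBATCH --partition=a64fx
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=48   # A64FX has 48 compute cores per node
#SBATCH --mem-per-cpu=700M     # stay safely below the ~800MB/core limit
srun ./my_app
```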
- MACC is looking into core pinning & NUMA and trying to come up with recommendations
- recommendation from Fujitsu is to use their compiler
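On the core pinning & NUMA point: A64FX groups its 48 compute cores into 4 CMGs (NUMA domains) of 12 cores each, so a hybrid MPI+OpenMP layout with one rank per CMG is a common starting point; a sketch (launcher options may need adjusting to Deucalion's Slurm configuration):

```bash
# sketch: one MPI rank per CMG (NUMA domain), 12 OpenMP threads per rank
export OMP_NUM_THREADS=12
export OMP_PLACES=cores
export OMP_PROC_BIND=close
srun --ntasks-per-node=4 --cpus-per-task=12 --cpu-bind=cores ./my_app
```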
- dialogue with Riken
  - Alan was at a meeting with Riken in Barcelona (InPex); not clear if there is much dialogue with them about Deucalion
- see https://inpex.science/workshop/the-2024-inpex-workshop/
- there is dialogue with specific people at Riken
- They seem to have some very good power management capabilities
- Alan can share link to presentations from InPex meeting
    - these features may not be available on Deucalion (FX700?)
- Fugaku is different setup (not using Infiniband)
- Riken has a lot of job data, which is being used for training AI to tweak power usage
- Fugaku will run until ~2030
  - Software stack seems quite static (they do support Spack, though Spack support for Fugaku may be somewhat dated)
- Fugaku will be available for EuroHPC projects (and European systems will be available for Japanese projects)
- would be good to have a shared account to run bot eventually
- additional accounts can be created on Deucalion
- project number `I20240007` should be mentioned
- MACC is currently looking into PyTorch
- special fork of PyTorch available for A64FX
- https://github.com/fujitsu/pytorch
- see also https://github.com/fujitsu/pytorch/wiki/PyTorch-DNNL_aarch64-build-manual-for-FUJITSU-Software-Compiler-Package-(PyTorch-v1.10.1)#build-instruction-for-pytorch-on-fujitsu-supercomputer-primehpc-fx1000fx700
- short projects will be onboarded in July on Deucalion
- larger projects in Oct/Nov
- if needed, a short 24h/48h reservation can be set up for testing EESSI
### Next meeting
- Thu 5 Sept 2024, 9:00 Portuguese time/10:00 CEST
---
## 20240611
- attending
- Kenneth Hoste + Lara Peeters (HPC-UGent, EasyBuild/EESSI/MultiXscale)
- Alan O'Cais (CECAM, Univ. of Barcelona, EasyBuild/EESSI/MultiXscale)
- Miguel Peixoto & João Barbosa (MACC)
  - Miguel Dias Costa (National Univ. of Singapore)
- high-level overview of Deucalion
- AMD Rome + A64FX partition
- NVIDIA GPU partition
- using EasyBuild to install central software stack
- EasyBuild vs EESSI
- can build additional modules on top of EESSI, cfr. http://www.eessi.io/docs/using_eessi/building_on_eessi
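A rough sketch of what building on top of EESSI with EasyBuild could look like (easyconfig name and install prefix are illustrative; see the linked docs for the supported workflow):

```bash
# rough sketch: install extra software on top of EESSI with EasyBuild
# (easyconfig name and install prefix are illustrative)
source /cvmfs/software.eessi.io/versions/2023.06/init/bash
module load EasyBuild
eb --installpath=$HOME/eessi-extend --robot MyApp-1.0-foss-2023a.eb
# make the resulting modules visible alongside EESSI's
module use $HOME/eessi-extend/modules/all
```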
- current status of EESSI
- production repository: `software.eessi.io`
- 8 supported CPU targets (incl. `aarch64/*`)
- over 325 unique open source software projects installed
- plus 100s of Python + R + Perl extensions
- over 600 software installations per CPU target
- over 5000 software installations in total
- see http://eessi.io/docs/available_software/overview
- lots of attention to software testing
    - cfr. SVE bug uncovered in GROMACS
- basic support for NVIDIA GPUs in place
- GPU installations of PyTorch, GROMACS, etc. coming soon
- EESSI works (Miguel DC tried it via `cvmfsexec` in `/tmp`)
- detects `aarch64/generic` as CPU target
- because `asimddp` CPU feature (SIMD Dot Product) is missing in A64FX
- see also https://www.kernel.org/doc/Documentation/arm64/elf_hwcaps.txt
    - `asimdhp` (AdvSIMD half-precision) is present on A64FX
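The feature flags the kernel reports can be inspected directly; on A64FX the `Features` line should list `asimdhp` and `sve`, but not `asimddp`:

```bash
# show the CPU feature flags reported by the kernel
# (the field is "Features" on aarch64, "flags" on x86_64)
grep -m1 -E '^(Features|flags)' /proc/cpuinfo
```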
- we can/want to start looking into installations optimized for A64FX
- questions
- accounts on Deucalion?
- via portal, use EESSI as justification => https://portal.deucalion.macc.fccn.pt
- 24h reservation can be considered
- native CernVM-FS installation?
- see tutorial => https://multixscale.github.io/cvmfs-tutorial-hpc-best-practices
- resources to build for A64FX in EESSI?
- in case of questions, contact Miguel + João (in CC) directly
- concerns with EESSI/CernVM-FS
- missing optimized build for A64FX, EESSI is interested in fixing that
- CernVM-FS caching
    - location of the cache on the client still needs to be figured out
- private mirror server for Deucalion
- will take time
- EESSI is happy to help with this
- see also tutorial: https://multixscale.github.io/cvmfs-tutorial-hpc-best-practices
- follow-up meeting
- on monthly basis
  - Thu 27 June'24, 09:00 Portuguese time, 10:00 CEST, 16:00 Singapore