# meetings with Deucalion

## 20240627

attending: João, Ricardo Vilaça, Alan, Lara, Kenneth, Miguel P.

- status of A64FX support in EESSI
    - see also https://gitlab.com/eessi/support/-/issues/76
    - a bunch of installations optimized for A64FX have been built + deployed in the EESSI production repository (`software.eessi.io`)
        - through the EESSI build-and-deploy bot, running in Kenneth's account on Deucalion
        - see for example https://github.com/EESSI/software-layer/pull/624
    - pretty smooth ride, no (major) surprises so far (compared to building for `aarch64/neoverse_v1`)
    - 56 installations in total, including:
        - `foss/2023a` toolchain
        - `SciPy-bundle/2023.07-gfbf-2023a`
    - short-term goal (this week) is to build `ESPResSo/4.2.2-foss-2023a` for A64FX and run a scaling test with it
        - currently building, see https://github.com/EESSI/software-layer/pull/625
    - still need to do performance comparisons with generic ARM support (and perhaps against the Fujitsu compilers for specific apps?)
    - using EESSI on a single node can be done via `cvmfsexec`:

```bash
# prepare cvmfsexec tool
mkdir -p /tmp/$USER
cd /tmp/$USER
git clone https://github.com/cvmfs/cvmfsexec.git
cd cvmfsexec
./makedist default
cd ~

# start subshell in which EESSI is available
/tmp/$USER/cvmfsexec/cvmfsexec software.eessi.io -- /bin/bash -l

# bypass CPU auto-detection, force use of A64FX installations
export EESSI_SOFTWARE_SUBDIR_OVERRIDE=aarch64/a64fx

# initialize EESSI environment
source /cvmfs/software.eessi.io/versions/2023.06/init/bash

# run small benchmark with numpy (dot product of 1000x1000 matrix)
# with aarch64/generic: 27.9 msec per loop
# with aarch64/a64fx: 7.03 msec per loop
module load SciPy-bundle/2023.07-gfbf-2023a
python -m timeit -n 3 -r 3 -s "import numpy; x = numpy.random.random((1000, 1000))" "numpy.dot(x, x.T)"

# CVMFS cache size
$ du -sh /tmp/kehoste/cvmfsexec/dist/var/lib/cvmfs/shared
345M /tmp/kehoste/cvmfsexec/dist/var/lib/cvmfs/shared
```

- questions
    - enroot/squashfs
        - we want to do a squashfs export of EESSI, so we can use it for testing EESSI on Deucalion without having CernVM-FS installed system-wide (see the first sketch at the end of these notes)
        - is enroot available everywhere (Slurm plugin)?
            - Slurm plugin: https://github.com/NVIDIA/pyxis
            - focus for this is the GPU partition
            - only available on the GPU & AMD partitions
            - only one image per node
            - could help us with CernVM-FS
            - enroot does not support FUSE mounting (like Apptainer does)
    - recommendations for the limited amount of available memory per core (~800MB/core) on the A64FX partition?
        - OOMs/segfaults in application codes are more likely
        - may want to consider partition-specific Lmod hooks for user advice
        - expensive HBM, same as on Fugaku
        - MACC is looking into core pinning & NUMA, and trying to come up with recommendations (see the second sketch at the end of these notes)
        - recommendation from Fujitsu is to use their compiler
- dialogue with Riken
    - Alan was at a meeting with Riken in Barcelona (InPEx), not clear if there is much dialogue with them for Deucalion
        - see https://inpex.science/workshop/the-2024-inpex-workshop/
    - there is dialogue with specific people at Riken
    - they seem to have some very good power management capabilities
        - Alan can share a link to the presentations from the InPEx meeting
        - these features may not be available on Deucalion (FX700?)
    - Fugaku is a different setup (not using InfiniBand)
    - Riken has a lot of job data, which is being used to train AI to tweak power usage
    - Fugaku will run until ~2030
    - software stack seems quite static (but they support Spack, though Spack support for Fugaku may also be a bit dated)
    - Fugaku will be available for EuroHPC projects (and European systems will be available for Japanese projects)
- would be good to have a shared account to run the bot eventually
    - additional accounts can be created on Deucalion
    - project number `I20240007` should be mentioned
- MACC is currently looking into PyTorch
    - special fork of PyTorch available for A64FX
        - https://github.com/fujitsu/pytorch
        - see also https://github.com/fujitsu/pytorch/wiki/PyTorch-DNNL_aarch64-build-manual-for-FUJITSU-Software-Compiler-Package-(PyTorch-v1.10.1)#build-instruction-for-pytorch-on-fujitsu-supercomputer-primehpc-fx1000fx700
- short projects will be onboarded on Deucalion in July
    - larger projects in Oct/Nov
- if needed, a short 24h/48h reservation can be set up for testing EESSI
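As a rough sketch of what the squashfs export discussed above could look like, assuming `cvmfsexec` (set up as shown earlier), `mksquashfs`, and `squashfuse` are available; all paths are illustrative, and this is an open question rather than an agreed procedure:

```bash
# pack (part of) the EESSI tree into a squashfs image via a cvmfsexec mount;
# the full repository is far too large to pack wholesale, so in practice this
# would be restricted to the installations actually needed
/tmp/$USER/cvmfsexec/cvmfsexec software.eessi.io -- \
    mksquashfs /cvmfs/software.eessi.io /tmp/$USER/eessi.sqsh -comp zstd

# on a node without CernVM-FS, the image must end up mounted at the exact same
# /cvmfs prefix (RPATHs and module files embed it); creating /cvmfs requires
# root, which is part of what needs to be figured out with enroot
mkdir -p /cvmfs/software.eessi.io
squashfuse /tmp/$USER/eessi.sqsh /cvmfs/software.eessi.io
source /cvmfs/software.eessi.io/versions/2023.06/init/bash
```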
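And a sketch of the kind of core pinning / NUMA layout that is commonly suggested for A64FX, as a starting point only (not a MACC recommendation; `my_app` is a placeholder):

```bash
#!/bin/bash
# A64FX has 48 compute cores in 4 core memory groups (CMGs) of 12 cores each,
# with each CMG backed by its own HBM2 stack and exposed as a separate NUMA
# node; a common starting point is one MPI rank per CMG with 12 threads each
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4   # one rank per CMG
#SBATCH --cpus-per-task=12    # all cores of one CMG per rank

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export OMP_PLACES=cores       # pin each thread to its own core
export OMP_PROC_BIND=close    # keep a rank's threads together within its CMG

# --cpu-bind=cores confines each rank to its own cores, so first-touch
# allocations land in the local CMG's memory instead of crossing NUMA nodes
srun --cpu-bind=cores ./my_app
```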
### Next meeting

- Thu 5 Sept 2024, 9:00 Portuguese time / 10:00 CEST

---

## 20240611

- attending
    - Kenneth Hoste + Lara Peeters (HPC-UGent, EasyBuild/EESSI/MultiXscale)
    - Alan O'Cais (CECAM, Univ. of Barcelona, EasyBuild/EESSI/MultiXscale)
    - Miguel Peixoto & João Barbosa (MACC)
    - Miguel Dias Costa (Univ. of Singapore)
- high-level overview of Deucalion
    - AMD Rome + A64FX partitions
    - NVIDIA GPU partition
    - using EasyBuild to install central software stack
- EasyBuild vs EESSI
    - can build additional modules on top of EESSI, cfr. http://www.eessi.io/docs/using_eessi/building_on_eessi (see the first sketch at the end of these notes)
- current status of EESSI
    - production repository: `software.eessi.io`
    - 8 supported CPU targets (incl. `aarch64/*`)
    - over 325 unique open source software projects installed
        - plus 100s of Python + R + Perl extensions
    - over 600 software installations per CPU target
        - over 5000 software installations in total
    - see http://eessi.io/docs/available_software/overview
    - lots of attention to software testing
        - cfr. SVE bug uncovered in GROMACS
    - basic support for NVIDIA GPUs in place
        - GPU installations of PyTorch, GROMACS, etc. coming soon
- EESSI works (Miguel DC tried via `cvmfsexec` in `/tmp`)
    - detects `aarch64/generic` as CPU target
        - because the `asimddp` CPU feature (SIMD Dot Product) is missing on A64FX (see the second sketch at the end of these notes)
        - see also https://www.kernel.org/doc/Documentation/arm64/elf_hwcaps.txt
        - `asimdhp` (half-precision AdvSIMD) is available on A64FX
    - we can/want to start looking into installations optimized for A64FX
- questions
    - accounts on Deucalion?
        - via portal, use EESSI as justification => https://portal.deucalion.macc.fccn.pt
        - 24h reservation can be considered
    - native CernVM-FS installation?
        - see tutorial => https://multixscale.github.io/cvmfs-tutorial-hpc-best-practices
    - resources to build for A64FX in EESSI?
        - in case of questions, contact Miguel + João (in CC) directly
- concerns with EESSI/CernVM-FS
    - missing optimized builds for A64FX; EESSI is interested in fixing that
    - CernVM-FS caching
        - location of the cache on the client needs to be figured out (see the third sketch at the end of these notes)
    - private mirror server for Deucalion
        - will take time
        - EESSI is happy to help with this
        - see also tutorial: https://multixscale.github.io/cvmfs-tutorial-hpc-best-practices
- follow-up meeting
    - on a monthly basis
    - Thu 27 June '24, 09:00 Portuguese time, 10:00 CEST, 16:00 Singapore
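As follow-up to the notes above: a minimal sketch of building extra modules on top of EESSI with EasyBuild, per the linked docs; the install prefix and easyconfig name are made up for illustration:

```bash
# initialize EESSI, then use the EasyBuild installation it ships to install
# additional software into a personal prefix, on top of the EESSI stack
source /cvmfs/software.eessi.io/versions/2023.06/init/bash
module load EasyBuild

# install outside the read-only CernVM-FS repository, into $HOME;
# separating per CPU target keeps multi-arch systems like Deucalion sane
export EASYBUILD_INSTALLPATH=$HOME/eessi-extra/$EESSI_SOFTWARE_SUBDIR
eb --robot SomeApp-1.0-foss-2023a.eb   # hypothetical easyconfig

# make the extra modules visible next to the EESSI-provided ones
module use $EASYBUILD_INSTALLPATH/modules/all
```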
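A quick way to confirm the CPU detection point on an A64FX node (nothing EESSI-specific, just inspecting the hwcaps advertised by the kernel):

```bash
# list the advertised CPU features for one core ...
grep -m1 '^Features' /proc/cpuinfo

# ... and check specifically for the SIMD Dot Product extension;
# on A64FX this prints nothing (asimdhp, half-precision AdvSIMD, is present
# instead), which is why CPU auto-detection falls back to aarch64/generic
grep -o 'asimddp' /proc/cpuinfo | sort -u
```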
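And for the CernVM-FS cache question, a sketch of the client-side knobs involved; the values are illustrative defaults rather than recommendations, and the linked tutorial covers the full setup (including a private mirror/Stratum 1):

```bash
# client-side cache configuration lives in /etc/cvmfs/default.local;
# cache location and size are the main things to decide per node type
sudo tee /etc/cvmfs/default.local > /dev/null << 'EOF'
CVMFS_CLIENT_PROFILE=single       # direct connection; use a squid proxy at scale
CVMFS_QUOTA_LIMIT=10000           # soft cache size limit, in MB
CVMFS_CACHE_BASE=/var/lib/cvmfs   # put the cache on fast node-local storage
EOF

# apply the configuration and verify that the EESSI repository mounts
sudo cvmfs_config setup
cvmfs_config probe software.eessi.io
```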