# Meeting about EESSI and Komondor

---

## Attendees

- Alan O'Cais (UB)
- László Környei (SZE, HiDALGO2, DKF)
- Feró Orsolya, Tamás István, Debreczeni Attila (DKF)
- Máté Lohász (DKF)
- Richard Topouchain (UiB)

# Presentation

## EESSI

* Questions
  * Is EESSI delivered via squashfs?
    * No, but it can be (though that is not the preferred option)
  * Diskless nodes on Komondor
    * See https://multixscale.github.io/cvmfs-tutorial-hpc-best-practices/

## Komondor

* GPU and AI partitions
  * NVIDIA A100
    * 4 GPUs per AI node
  * AMD GPUs
  * Intel on the BIG DATA partition (large-memory nodes)
* No local storage on the nodes

# Discussion

* Fabric support
  * Need the latest OpenMPI + libfabric for Slingshot support
  * These can be injected to replace the OpenMPI that EESSI ships
* Software updates
  * via EasyBuild
  * rely on community contributions
* What about GPUs?
  * EESSI uses [`host_injections`](https://www.eessi.io/docs/site_specific_config/host_injections/#the-host_injections-variant-symlink) to let you inject your drivers where EESSI can find them (there is a script that automates this)
  * The same approach is used to expose site-wide extensions of EESSI and to inject site-specific builds of OpenMPI
* What about building other software for different node types, and GPU drivers?
  * Again `host_injections`; in addition, `EESSI-extend` installs software into an architecture-specific path (so you need to build natively on every architecture you want your extensions to support)
  * For example, EESSI itself submits build jobs to architecture-specific Slurm partitions; you could adopt a similar approach with some scripting around Slurm job submission
* What can be done in the short term?
  * You can experiment by installing the CernVM-FS client on two nodes
    * The nodes are diskless, so an easy option is to place the client cache under `/dev/shm`
  * Another option is a squashfs export of some EESSI packages, and then mounting those on the nodes
    * User namespaces would be required _unless_ you go through a container runtime (like `apptainer`). Getting multi-node runs working via a container can be tricky; namespace mounts could avoid this.
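The two-node CernVM-FS experiment with a RAM-backed cache could look roughly like the following. This is a sketch, not a vetted configuration: the package manager, cache size, and the choice of `CVMFS_CLIENT_PROFILE=single` (no site proxy) are assumptions to be adapted to Komondor.

```shell
# Install the CernVM-FS client (assumes the CernVM yum repository is already configured)
sudo dnf install -y cvmfs

# Minimal client configuration; on diskless nodes the cache can live in RAM,
# but note that it then competes with job memory
cat <<'EOF' | sudo tee /etc/cvmfs/default.local
CVMFS_CLIENT_PROFILE=single     # talk directly to a Stratum 1; use CVMFS_HTTP_PROXY if a site proxy exists
CVMFS_CACHE_BASE=/dev/shm/cvmfs # RAM-backed cache for diskless nodes
CVMFS_QUOTA_LIMIT=4000          # cache limit in MB (assumed value)
EOF

# Set up autofs mounts and verify the EESSI repository is reachable
sudo cvmfs_config setup
cvmfs_config probe software.eessi.io
```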
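The squashfs alternative could be sketched as below, on a machine that already has a CernVM-FS mount of EESSI; the EESSI version path and image name are illustrative assumptions.

```shell
# Pack a subtree of the EESSI repository into a squashfs image
mksquashfs /cvmfs/software.eessi.io/versions/2023.06 eessi-2023.06.sqfs -comp zstd

# On the target node, loop-mount the image read-only at the same path
# (an unprivileged mount would need user namespaces, or a container
# runtime such as apptainer that can bind a squashfs image directly)
sudo mkdir -p /cvmfs/software.eessi.io/versions/2023.06
sudo mount -t squashfs -o loop,ro eessi-2023.06.sqfs \
    /cvmfs/software.eessi.io/versions/2023.06
```

Mounting at the same `/cvmfs/...` path matters, because software in EESSI is built with that prefix baked in.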
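The `host_injections` mechanism from the discussion above could be configured roughly as follows. This is a sketch based on the linked EESSI documentation: the shared-storage path is an assumption, and the helper-script location should be checked against the current EESSI software layer.

```shell
# host_injections is a CernVM-FS "variant symlink" that defaults to /opt/eessi;
# repoint it to writable shared storage in your local client configuration
# (e.g. /etc/cvmfs/domain.d/eessi.io.local):
#
#   EESSI_HOST_INJECTIONS=/shared/eessi/host_injections
#
sudo cvmfs_config reload software.eessi.io

# Expose the host's NVIDIA driver libraries where EESSI expects to find them
# (this is the automation script mentioned in the discussion)
/cvmfs/software.eessi.io/scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh
```

The same writable `host_injections` tree is where site-wide extensions and a site-specific OpenMPI build would be placed, so EESSI picks them up on every node that mounts the repository.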