Meeting about EESSI and Komondor
---
## Attendees
- Alan O'Cais (UB)
- László Környei (SZE, HiDALGO2, DKF)
- Feró Orsolya, Tamás István, Debreczeni Attila (DKF)
- Máté Lohász (DKF)
- Richard Topouchain (UiB)
# Presentation
## EESSI
* Questions
    * Is EESSI delivered via squashfs?
        * No, but it can be (though that is not the preferred option)
* Diskless nodes on Komondor
* See https://multixscale.github.io/cvmfs-tutorial-hpc-best-practices/
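* For reference (commands per the EESSI docs, not discussed verbatim in the meeting), EESSI is normally consumed as a native CernVM-FS mount rather than a squashfs image; a minimal check on a node that already has the CVMFS client and EESSI config installed might look like (`2023.06` is just an example version):

```shell
# Verify the EESSI repository is reachable via CernVM-FS
cvmfs_config probe software.eessi.io

# Initialise the EESSI environment (version path per the EESSI docs)
source /cvmfs/software.eessi.io/versions/2023.06/init/bash

# Software is then discoverable via environment modules
module avail
```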
## Komondor
* GPU and AI partitions
* NVIDIA A100
* 4 GPUs on AI nodes
* AMD GPUs
* Intel on BIG DATA partition (large memory nodes)
* No local storage on the nodes
# Discussion
* Fabric support
    * Need the latest OpenMPI and libfabric for Slingshot support
* This can be injected to replace the OpenMPI that EESSI ships
* Software updates
* via EasyBuild
* rely on community contributions
* What about GPUs?
* EESSI uses [`host_injections`](https://www.eessi.io/docs/site_specific_config/host_injections/#the-host_injections-variant-symlink) to allow you to inject your drivers where EESSI can find them (we have a script that automates this)
* The same approach is used to expose site-wide extensions of EESSI and to inject site-specific builds of OpenMPI
* What about heterogeneous node types when building additional software, and GPU drivers?
* Again `host_injections`, also `EESSI-extend` installs software into an architecture specific path (so you need to perform builds natively for all architectures you want to support with your extensions)
* For example, EESSI itself submits build jobs to architecture-specific partitions of Slurm, you could adopt a similar approach with some scripting around job submissions to Slurm
* What can be done in the short term?
    * You can experiment by installing the CernVM-FS client on two nodes
        * The nodes are diskless, so an easy option is to place the cache under `/dev/shm`
        * Another option is a squashfs export of a subset of EESSI packages, and then mounting that
        * User namespaces would be required _unless_ you go through a container runtime (like `apptainer`). Getting multi-node runs working via a container can be tricky; namespace mounts avoid this.
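
As a sketch of the short-term experiment above (diskless nodes, cache in `/dev/shm`), a minimal client-side CernVM-FS setup could look like the following. Package names assume an RPM-based system, and the cache quota value is an assumption to be tuned to node memory:

```shell
# Install the CVMFS client and the EESSI configuration package
# (package names for RPM-based systems; see the CVMFS docs for other distros)
sudo yum install -y cvmfs cvmfs-config-eessi

# Minimal client configuration for a diskless node
sudo tee /etc/cvmfs/default.local <<'EOF'
CVMFS_CLIENT_PROFILE=single      # standalone client, no site squid proxy (assumption)
CVMFS_CACHE_BASE=/dev/shm/cvmfs  # keep the cache in RAM on diskless nodes
CVMFS_QUOTA_LIMIT=4000           # cache limit in MB -- tune to available memory
EOF

# Apply the configuration and check that the EESSI repository mounts
sudo cvmfs_config setup
cvmfs_config probe software.eessi.io
```

Note that a RAM-backed cache competes with jobs for memory, so `CVMFS_QUOTA_LIMIT` should be set conservatively on production nodes.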