# EESSI software layer sync meeting
## planning
- next meeting
- Tue 10 Oct at 09:00 CEST
## Meeting (2023-10-10)
- attending: Kenneth, Lara, Alan, Bob, Pedro, Richard, Maxim, Caspar, Julián
- status update eessi.io
- multiple Stratum-1's set up
- 1 with S3 backend, other with regular storage backend
- old setup is using eessi-infra.org (owned by Terje)
- meeting planned for Fri 13 Oct to discuss this
- what does the ComputeCanada setup look like
- w.r.t. CDN
- eessi.io domain is registered with CloudFlare (by Alan)
- we also have eessi.science
- bot
- v0.1.0 of bot has been tested in a different context (not a setup like EESSI)
- to automate installation of software in central software stack on HPC-UGent systems
- only build phase (no deploy, direct installations into `/apps` NFS dir)
- only small bug fix was needed (see [PR #220](https://github.com/EESSI/eessi-bot-software-layer/pull/220))
- goal is to try and use bot to better automate and track software installation for new HPC-UGent cluster
- next big step (v0.2.0): add testing step in between build & deploy
- leverage EESSI test suite, run tests for software installations produced during build phase (ideally in different OS container, on different Slurm cluster)
- software-layer
- new Slurm clusters are being set up (in AWS + Azure)
- using [Magic Castle](https://github.com/ComputeCanada/magic_castle) instead of CitC
- **bot for EESSI pilot 2023.06 will be migrated to new Slurm cluster soon!**
- should we set up a DNS entry like `mc-aws.eessi.io`?
- CI for testing if EESSI stack is available is only checking single architecture ([issue #349](https://github.com/EESSI/software-layer/issues/349))
- merged PRs
- determine easystack files to process via PR patch file (PRs [#351](https://github.com/EESSI/software-layer/pull/351) + [#354](https://github.com/EESSI/software-layer/pull/354))
- helps to avoid hitting GitHub rate limit
- remove bot configuration (2023.06) (PR #356)
- moved to private repo EESSI/bot-configs
- show active EasyBuild configuration when checking for missing installations (PR #358)
- to help debug [issue #349](https://github.com/EESSI/software-layer/issues/349)
- simplify CI workflow to check for missing installations, just loop over easystack files per CPU target (PR #359)
- BAMM (PR #350)
- SciPy-bundle 2022.05 w/ foss/2022a (PR #352)
- WRF v4.4.1 w/ foss/2022b (PR #336)
- open PRs
- TensorFlow
- stuck for different reasons (testing fails on aarch64/neoverse_v1 for older versions, build )
- may need to mark these installations as missing for aarch64
- via *missing*.yaml
- should have pointer to issue that has more info on failing build
- and also mention which alternative modules are available
- can add "fake" property to generated module for these missing installations
- matplotlib
- stuck due to installation problem with Pillow
- setup.py looks for zlib.h, doesn't consider compat layer
- do we add path to /include in compat layer to `$CPATH`?
- Pillow's setup.py does consider `$CPATH` and `$LIBRARY_PATH`
- Richard is looking into this, Kenneth can help
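
A minimal sketch of the `$CPATH` / `$LIBRARY_PATH` idea for Pillow, assuming the compat layer headers and libraries live under `include` and `lib` below some prefix. The prefix `/path/to/compat`, the `lib` subdirectory, and the helper name `prepend_path` are placeholders for illustration, not the actual EESSI layout or an agreed fix.

```python
import os


def prepend_path(env, var, directory):
    """Return a copy of env with directory placed first in the path-list variable var."""
    updated = dict(env)
    # Drop empty entries and any earlier occurrence of the directory to avoid duplicates.
    current = [p for p in env.get(var, "").split(os.pathsep) if p and p != directory]
    updated[var] = os.pathsep.join([directory] + current)
    return updated


# Hypothetical compat layer prefix, for illustration only.
compat_prefix = "/path/to/compat"
env = prepend_path(os.environ, "CPATH", os.path.join(compat_prefix, "include"))
env = prepend_path(env, "LIBRARY_PATH", os.path.join(compat_prefix, "lib"))
```

The resulting `env` could then be passed to the build command (for example via `subprocess.run(..., env=env)`), so that `setup.py` sees the extra header and library locations without changing the environment of the calling process.
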
- [MXS] ESPResSo
- we need to use `-DPython3_EXECUTABLE` to control which Python is being used, cfr. [fix for VTK](https://github.com/easybuilders/easybuild-easyconfigs/pull/16741)
- Maxim can't access Slurm cluster yet to access build logs
- [MXS] LAMMPS
- still building (first attempt)
- [MXS] waLBerla
- [easyconfig PR #18932](https://github.com/easybuilders/easybuild-easyconfigs/pull/18932) is open
- currently only header files are being installed
- Maxim is helping out Xin with this
- RStudio-Server
- should be synced with `2023.06` since required R dependency is now in place in EESSI 2023.06
- ComputeCanada patches RStudio a bit to make it work in user space
- other
- Pedro is interested in helping out with the bot
- looking for easy tasks to get started with
- planning to look into [issue #212](https://github.com/EESSI/eessi-bot-software-layer/issues/212)
- GPU support
- really need to get started with this...
- we should set up a meeting on this, get a plan worked out, and get it done (definitely involving Alan, Kenneth, ...)
- important for MultiXscale milestones due end of 2023!
- steps
- add CUDA compat libs to EESSI (in compat layer?)
- make sure linker can find system libs
- check via Lmod hook to make sure that libcuda.so is available, or whether script needs to be run to get required symlinks in place
- to build CUDA software in build container, we need a full CUDA install in a local dir
- we're only allowed to ship runtime stuff of CUDA in EESSI (via hook available in ...)
- includes broken links to local CUDA install
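
The `libcuda.so` availability check from the steps above could look roughly like this. The real check is meant to live in an Lmod hook (which would be written in Lua); this Python sketch only illustrates the logic, and the function name `find_libcuda` and the directories passed to it are assumptions.

```python
import glob
import os


def find_libcuda(search_dirs):
    """Return the first usable libcuda.so* found in search_dirs, or None if there is none."""
    for directory in search_dirs:
        # glob returns an empty list for directories that do not exist
        for candidate in sorted(glob.glob(os.path.join(directory, "libcuda.so*"))):
            # os.path.exists() follows symlinks, so a broken symlink counts as missing
            if os.path.exists(candidate):
                return candidate
    return None
```

A `None` result would be the signal that the script for putting the required symlinks in place still needs to be run. Because broken symlinks are treated as missing, links that point into a local CUDA install that is absent on the host are not mistaken for a working library.
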
---
## Meeting (2023-10-03)
- attending: Kenneth, Lara, Bob, Julián, Pedro, Thomas, Richard, Caspar, Alan
- status update eessi.io
- Stratum-0 is set up at RUG
- single Stratum-1 running in AWS (using S3 backend)
- test setup
- required lots of manual work (create VM + S3 bucket) because Atlantis wasn't working
- Ansible playbooks sort of worked, but do not support S3 buckets yet
- see [WIP filesystem-layer PR #160](https://github.com/EESSI/filesystem-layer/pull/160)
- GeoAPI doesn't work well with S3 buckets, clients go straight to S3 bucket
- need to figure out:
- how many Stratum-1's do we want (initially)?
- currently we have 4 for eessi-hpc.org (AWS, Azure, RUG in NL, BGO in Norway)
- how to deal with S3 buckets vs GeoAPI
- who should have admin access?
- DNS
- using CDN (CloudFlare)?
- sync meeting being planned
- bot
- merged PRs:
- add `shared_fs_path` configuration setting [PR #214](https://github.com/EESSI/eessi-bot-software-layer/pull/214)
- README updated ([PR #215](https://github.com/EESSI/eessi-bot-software-layer/pull/215))
- v0.1.0 released: https://github.com/EESSI/eessi-bot-software-layer/releases/tag/v0.1.0
- `develop` branch
- for active development (PRs)
- `main` branch always corresponds to latest release
- open PRs:
- script to clean up tarballs of jobs given a PR number ([PR #217](https://github.com/EESSI/eessi-bot-software-layer/pull/217))
- can let bot use this when a PR is merged/closed
- only cleans up large "checkpoint" tarballs for now, should eventually clean up *everything* related to a PR?
- next steps
- test step in between build & deploy
- make deploy step agnostic of EESSI
- new Slurm clusters for bot
- new Slurm clusters are being set up with [Magic Castle](https://github.com/ComputeCanada/magic_castle)
- in AWS: set up, need to test bot there
- will (very) soon replace current CitC cluster...
- next steps
- create more accounts
- increase disk space to couple of TBs (no EFS used there)
- in Azure: still to be set up, need to figure out account/API stuff
- software-layer
- merged PRs
- foss/2023a ([PR #334](https://github.com/EESSI/software-layer/pull/334))
- ignore flaky failing FFTW.MPI tests (see [issue #325](https://github.com/EESSI/software-layer/issues/325))
- use patch to fix detection of Neoverse V1 in OpenBLAS (cfr. [easyconfigs PR #18870](https://github.com/easybuilders/easybuild-easyconfigs/pull/18870))
- foss/2022a ([PR #310](https://github.com/EESSI/software-layer/pull/310))
- R v4.1.0 w/ foss/2021a ([PR #328](https://github.com/EESSI/software-layer/pull/328))
- add YAML file to keep track of known issues in EESSI pilot 2023.06 ([PR #340](https://github.com/EESSI/software-layer/pull/340))
- only increase limit for numerical test failures for OpenBLAS for aarch64/neoverse_v1 ([PR #345](https://github.com/EESSI/software-layer/pull/345))
- open PRs
- TensorFlow
- TensorFlow v2.7.1 with `foss/2021b` ([PR #321](https://github.com/EESSI/software-layer/pull/321))
- several test failures in `aarch64/*` targets
- may be fixable by backporting a couple of patches, but maybe not worth the trouble?
- TensorFlow v2.8.4 with `foss/2021b` ([PR #343](https://github.com/EESSI/software-layer/pull/343))
- assembler errors on `aarch64/*` when building XNNPACK
- due to use of `-mcpu=native` which clashes with custom `-march=...` options used by XNNPACK build procedure
- see also [easyconfigs issue #18899](https://github.com/easybuilders/easybuild-easyconfigs/issues/18899)
- should be fixed by making sure that `-mcpu=...` is not used when building XNNPACK, see [easyblocks PR #3011](https://github.com/easybuilders/easybuild-easyblocks/pull/3011)
- TensorFlow v2.11.0 with `foss/2022a` ([PR #346](https://github.com/EESSI/software-layer/pull/346))
- assembler errors on `aarch64/*` when building XNNPACK, fixed with [easyblocks PR #3011](https://github.com/easybuilders/easybuild-easyblocks/pull/3011)
- TensorFlow v2.13.0 with `foss/2022b` ([PR #347](https://github.com/EESSI/software-layer/pull/347))
- 928 failing `scipy` tests on `aarch64/neoverse_v1`...
- build error on `x86_64/intel/haswell` because `/usr/include/stdio.h` is picked up
- need to set `$TF_SYSROOT`?
- matplotlib v3.4.3 with `foss/2021b` ([PR #339](https://github.com/EESSI/software-layer/pull/339))
- open PR for Pillow in EasyBuild: [PR #18881](https://github.com/easybuilders/easybuild-easyconfigs/pull/18881)
- ESPResSo
- with `foss/2021a` ([PR #332](https://github.com/EESSI/software-layer/pull/332))
- with `foss/2022a` ([PR #331](https://github.com/EESSI/software-layer/pull/331))
- wrong Python installation is picked up
- WRF
- ([PR #336](https://github.com/EESSI/software-layer/pull/336))
- failing netCDF tests due to RPATH issue
- seems to be caused by `-DCMAKE_SKIP_RPATH=ON` that was added in https://github.com/easybuilders/easybuild-easyblocks/pull/1031 (Nov 2016)
- maybe needed due to a bug in old CMake versions
- notes
- should add "missing" YAML file (like for old TensorFlow versions on `aarch64/*`)
- next packages
- OpenFOAM
- newer R
- Bioconductor
- AlphaFold (GPU)
- GPU
- we should set up a meeting to figure out the right steps...
- plan is to look into supporting GPUs in `software.eessi.io` CVMFS repo
- is ldconfig OK with non-existing paths (to system paths)? also, order matters
- Apptainer also uses ldconfig to figure out paths to required libraries
- CUDA compat libs (could be avoided, only needed as a fallback)
- last location: Apptainer libs
- first step should be to get it working on assumption that GPU driver is sufficiently recent
- Alan will look into planning a sync meeting on GPU support
# Previous meetings
* https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-Software-layer-(2023-10-03)
* https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-Software-layer-(2023-09-26)
* https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-Software-layer-(2023-09-20)
* https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-Software-layer-(2023-09-12)
* https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-Software-layer-(2023-09-05)