# Best Practices for CernVM-FS in HPC
- https://github.com/multixscale/cvmfs-tutorial-hpc-best-practices
- online tutorial, focused on (Euro)HPC system administrators
- aiming for Fall 2023 (Sept-Oct-Nov)
- collaboration between MultiXscale/EESSI partners and CernVM-FS developers
- tutorial + improvements to CernVM-FS docs
- similar approach to introductory tutorial by Kenneth & Bob in 2021, see https://cvmfs-contrib.github.io/cvmfs-tutorial-2021/
- format: tutorial website (+ CVMFS docs) + accompanying slide deck
---
## Final sync meeting (2023-12-01)
### Attending
- EESSI/MultiXscale: Kenneth, Lara, Alan, Bob, Thomas
- CVMFS: Valentin, Laura, Jakob
### Notes
- final tally on registrations: *exactly* 200 people!
- tutorial website: https://multixscale.github.io/cvmfs-tutorial-hpc-best-practices
- "preview" website (boegel.github.io/...) is now 404
- last chance to give feedback on contents
- Kenneth needs to take into account feedback shared by Laura in Mattermost (WIP)
- now in "only important things can still be changed" mode
- no further changes to the contents ("rien ne va plus") after 12:00 CET
- slide deck to drive tutorial (WIP)
- https://docs.google.com/presentation/d/1f7AMtfIa93k9sOTq4MXMhWRHw9Wfsig4RFqiM-yIOtg
- practical info to share with registered attendees
- Zoom link (final)
- tutorial website
- ~~YouTube live stream~~
- dedicated channel in EESSI Slack: `#cvmfs-best-practices-hpc`
- should we also mention CVMFS Mattermost? => not in email
- slide to let ~~Jakob~~ Valentin say a word
- while showing slide 3
- agenda - timing
```
(*): incl. hands-on demo
[13:35-14:00 CET] Introduction to CernVM-FS
[14:00-14:20 CET] * EESSI
[14:20-15:00 CET] * Accessing repositories
[15:00-15:30 CET] (coffee break)
[15:30-15:50 CET] * Configuring CernVM-FS on HPC infrastructure
[15:50-16:10 CET] * Troubleshooting
[16:10-16:30 CET] * Performance aspects
[16:30-16:40 CET] * Containers
[16:40-16:50 CET] Creating a CernVM-FS repository (birds-eye view)
[16:50-17:00 CET] Q&A
```
- live demo of
- structure of EESSI repo + using EESSI
- installing & configuring CVMFS client + proxy + Stratum 1
- show performance impact of no proxy + distant Stratum 1
- start TensorFlow container via unpacked.cern.ch
- troubleshooting (firewall problem, incorrect ACL in Squid proxy config)
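- possible commands for the proxy/Stratum-1 part of the demo (a sketch; `software.eessi.io` is used as example repository, adjust as needed):
```
# which proxy and Stratum-1 is this client actually using?
attr -g proxy /cvmfs/software.eessi.io
attr -g host /cvmfs/software.eessi.io
# cache and network statistics for the mounted repository
cvmfs_config stat -v software.eessi.io
```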
- ISC'24 tutorial submission
- deadline Fri 8 Dec'23
- add Valentin + Jakob + Laura as co-authors
- join Zoom by 13:00 CET to prepare
- Valentin, Kenneth, Alan
----
## Sync meeting (2023-11-28)
### Attending
- CernVM-FS: Laura, Valentin, Jakob
- EESSI/MultiXscale: Bob, Kenneth, Lara
### Notes
- Practical
- T minus 6 days...
- 152 people have registered so far...
- Should we send out another reminder?
- regular Zoom session
- via CERN => Valentin
- can also use Zoom setup at Univ. of Oslo (via Terje)
- no need for webinar mode; we can make sure that participants cannot unmute themselves when joining
- cloud recording
- with support for streaming to YouTube (backup recording)
- dedicated Slack channel in EESSI Slack
- send out practical info on Mon 4 Dec around 09:00 CET
- send out message to notify people that practical info will be sent on Monday 4 Dec
- last minute sync Mon 4 Dec at 10:00 CET
- agenda
- 13:30 - 17:00 CET
- 13:30 CernVM-FS
- [14:00 - 14:15] EESSI
- [14:15 - 14:45] Access
- client setup
- `sudo cvmfs_config setup`
- required to create the `cvmfs` user and configure autofs
- can skip autofs configuration
- client config
- warning on using direct proxy
- fstab based mounting instead of autofs, or manual mount
- `cvmfs_config chksetup` after custom client config
- show hands-on
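- minimal client-setup sketch (RHEL-like install; proxy URL and cache size are example values; assumes the repository configuration/public key, e.g. for EESSI, is already available):
```
# install the CernVM-FS client (RHEL-like example)
sudo yum install -y https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest.noarch.rpm
sudo yum install -y cvmfs
# creates the cvmfs user and (optionally) configures autofs
sudo cvmfs_config setup
# minimal client configuration (example values)
sudo tee /etc/cvmfs/default.local << 'EOF'
CVMFS_HTTP_PROXY="http://proxy.example.org:3128"
CVMFS_QUOTA_LIMIT=10000    # local cache size in MB
EOF
# fstab-based mounting instead of autofs would use an /etc/fstab entry like:
#   software.eessi.io /cvmfs/software.eessi.io cvmfs defaults,_netdev,nodev 0 0
# sanity-check the custom client config, then try to mount
sudo cvmfs_config chksetup
cvmfs_config probe software.eessi.io
```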
- proxy server
- stateless
- show hands-on
- recommendations
- at least two (for maintenance reasons)
- rule of thumb: 1 powerful proxy per 100-500 nodes
- 10Gbit link to nodes
- SSD storage
- decent CPU
- depends on workload mix
- very easy to scale up, especially via round-robin DNS setup
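- minimal Squid config sketch for a forward proxy (subnet and cache sizes are assumptions to adapt):
```
# /etc/squid/squid.conf
http_port 3128
# only allow clients from the local cluster network (adjust subnet)
acl cluster_nodes src 10.0.0.0/16
http_access allow cluster_nodes
http_access deny all
# optionally also restrict destinations to known Stratum-1 servers (see troubleshooting notes)
# memory and disk cache sizes (tune to the available hardware)
cache_mem 4096 MB
maximum_object_size_in_memory 128 KB
maximum_object_size 1024 MB
cache_dir ufs /var/spool/squid 50000 16 256
```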
- replica server
- required resources, monitoring
- pre-run the initial snapshot, because this takes time
- can use S3-like as backend storage (CEPH, Azure blob, AWS S3, ...)
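- Stratum-1 sketch (Stratum-0 URL and key location are placeholders); the initial snapshot is the slow part, hence pre-run it:
```
# add a replica of an existing repository (on the Stratum-1 server)
sudo cvmfs_server add-replica -o $USER \
    http://stratum0.example.org/cvmfs/software.eessi.io /etc/cvmfs/keys/eessi.io/
# pull the initial snapshot (can take a long time for large repositories)
sudo cvmfs_server snapshot software.eessi.io
# keep the replica up to date afterwards, e.g. via cron:
#   */5 * * * * root /usr/bin/cvmfs_server snapshot -a -i
```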
- alternative ways
- cvmfsexec
- cvmfs in container with `apptainer --fusemount`
- alien cache
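- sketch of the alternative access methods (repository name is an example; cvmfsexec needs unprivileged user namespaces, and the container image used with `--fusemount` must ship the CernVM-FS client plus the repository config):
```
# cvmfsexec: mount repositories without a system-wide CernVM-FS installation
git clone https://github.com/cvmfs/cvmfsexec.git && cd cvmfsexec
./makedist default
./cvmfsexec software.eessi.io -- ls /cvmfs/software.eessi.io
# apptainer with CernVM-FS FUSE-mounted inside the container
apptainer exec --fusemount "container:cvmfs2 software.eessi.io /cvmfs/software.eessi.io" \
    docker://cvmfs/service ls /cvmfs/software.eessi.io
```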
- [15:00 - 15:15] coffee break
- [15:15] Configuration for HPC => Bob
- diskless
- preferred => loopback cache on shared FS (see config sketch after this block)
- https://cvmfs.readthedocs.io/en/stable/cpt-hpc.html#loopback-file-systems-for-nodes-caches
- client cache in memory
- stealing memory
- not extensively tested, complex to support this in CernVM-FS
- alien cache on shared FS
- easiest to configure
- may overload shared FS
- offline
- preferred => proxy and/or replica server in local network
- preload alien cache
- drop security bit?
- export CVMFS to other FS
- sync subdirs of CVMFS repo to a filesystem like NFS
- same problems as installing software on NFS filesystem
- heavy-weight process
- needs to be kept in sync
- https://cvmfs.readthedocs.io/en/stable/cpt-shrinkwrap.html
- NFS export is *not* recommended
- https://cvmfs.readthedocs.io/en/stable/cpt-hpc.html#nfs-export-with-cray-dvs
- Parrot not recommended anymore
- https://cvmfs.readthedocs.io/en/stable/cpt-hpc.html#parrot-mounted-cernvm-fs-instead-of-fuse-module
- https://cvmfs.readthedocs.io/en/stable/cpt-configure.html#parrot-connector-to-cernvm-fs
- replaced by cvmfsexec
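- client-side sketches for the diskless scenarios above (paths and sizes are assumptions):
```
# (a) loopback file system on a shared FS as node-local cache
dd if=/dev/zero of=/shared/cvmfs-cache/$(hostname).img bs=1M count=20000
mkfs.ext4 -F /shared/cvmfs-cache/$(hostname).img
mount -o loop /shared/cvmfs-cache/$(hostname).img /var/lib/cvmfs

# (b) alien cache directly on a shared FS (unmanaged: no quota/eviction)
# in /etc/cvmfs/default.local:
#   CVMFS_ALIEN_CACHE=/shared/cvmfs-alien-cache
#   CVMFS_SHARED_CACHE=no     # required when using an alien cache
#   CVMFS_QUOTA_LIMIT=-1      # alien cache is unmanaged
```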
- status update tutorial contents
- https://multixscale.github.io/cvmfs-tutorial-hpc-best-practices
- CernVM-FS section: done
- EESSI section: done
- in progress (see preview at https://boegel.github.io/cvmfs-tutorial-hpc-best-practices)
- Access section: client install + config, proxy, (private) Stratum-1
- will finish this today
- Troubleshooting section
- fix incorrect claim that `CVMFS_REPOSITORIES` "can be used to limit access to specific repositories"
- Performance section
- MPI startup: impact of proxy
- drop OS jitter, drop CDN
- Storage backends => DROP?
- HPC section
- Containers section
- Valentin can look into this section
- incl. example to demo
- Creating CVMFS repo section => very short, refer to 2021 tutorial
----
## Sync meeting (2023-11-20)
### Attending
- CernVM-FS: Laura, Valentin, Jakob
- EESSI/MultiXscale: Alan, Lara, Kenneth
### Notes
- 132 people have registered for the online tutorial on Mon 4 Dec'23 so far...
- [PR #13](https://github.com/multixscale/cvmfs-tutorial-hpc-best-practices/pull/13): *What is CVMFS?*
- terminology moved to separate appendix
- pages renamed (`-` not `_`)
- links to CVMFS docs for `unpacked.cern.ch`
- **Ready for final review + merge**
- [PR #12](https://github.com/multixscale/cvmfs-tutorial-hpc-best-practices/pull/12): *EESSI*
- split up into separate pages
- needs more revision work
- HPC section
- see also https://cvmfs.readthedocs.io/en/stable/cpt-hpc.html
- diskless
- CVMFS cache in ramdisk
- alien cache is unmanaged (no cache eviction)
- can be read-only or read/write
- diskless worker nodes => alien cache on shared FS
- preloaded cache (orthogonal to alien cache)
- often used with alien cache
- promote cvmfsexec
- requires unprivileged user namespaces
- useful if CVMFS is not installed system-wide
- sync to another FS
- shrinkwrap
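- preloading sketch for the preloaded-cache scenario above (server URL and cache path are placeholders):
```
# populate an alien cache on the shared FS from the repository server
cvmfs_preload -u http://stratum1.example.org/cvmfs/software.eessi.io \
              -r /shared/cvmfs-alien-cache
# an optional dirtab file (-d) limits which subtrees are preloaded;
# clients then set CVMFS_ALIEN_CACHE=/shared/cvmfs-alien-cache
```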
- performance section
- see notebooks:
- https://github.com/boegel/cvmfs-tutorial-hpc-best-practices/blob/perf/cvmfs_perf_python_hpcugent.ipynb
- https://github.com/boegel/cvmfs-tutorial-hpc-best-practices/blob/perf/cvmfs_perf_tensorflow_hpcugent.ipynb
- test GPFS
- test CVMFS with cache in ramdisk
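- sketch for the ramdisk-cache test (tmpfs size and paths are assumptions):
```
# mount a tmpfs and point the client cache at it
mkdir -p /mnt/cvmfs-ramdisk-cache
mount -t tmpfs -o size=16G tmpfs /mnt/cvmfs-ramdisk-cache
# in /etc/cvmfs/default.local:
#   CVMFS_CACHE_BASE=/mnt/cvmfs-ramdisk-cache
#   CVMFS_QUOTA_LIMIT=14000   # keep below the tmpfs size (in MB)
# unmount so repositories get remounted with the new cache location
sudo cvmfs_config umount
```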
- troubleshooting section
- starting to collect ideas/structure
- `cvmfs_config chksetup` to check client config (see also the troubleshooting sketch below)
- logging
- can configure syslog facility
- CVMFS_USYSLOG to filename
- access syslog via xattrs (only if repo is mounted)
- proxy
- make sure that proxies are allowed to connect to S1's (in proxy settings)
- usually targets are restricted
- common issue, for example if WLCG is already used
- for revision
- see xattrs
- cURL debug command to download `.cvmfspublished` manifest file
- cURL treats this as a binary file, so be careful
- `-I` to fetch only the headers
- config
- order of files being considered
- `default.conf`, `default.local`, config-repo, config for repo
- `cvmfs_config showconfig` shows the actual config + where each setting was set
- cache corruption => in monitoring section
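- troubleshooting sketch covering the points above (proxy/Stratum-1 URLs are placeholders):
```
# effective client configuration + where each value comes from
cvmfs_config showconfig software.eessi.io
# check connectivity through the proxy by fetching the manifest
# (-I fetches only the headers, so the binary-ish file is not dumped to the terminal)
curl --proxy http://proxy.example.org:3128 \
     -I http://stratum1.example.org/cvmfs/software.eessi.io/.cvmfspublished
# currently mounted revision via extended attributes (only if the repo is mounted)
attr -g revision /cvmfs/software.eessi.io
# log to a file instead of / in addition to syslog (in /etc/cvmfs/default.local):
#   CVMFS_USYSLOG=/var/log/cvmfs.log
#   CVMFS_DEBUGLOG=/var/log/cvmfs-debug.log
```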
- containers
- TensorFlow container?
- via CVMFS: no full download, no conversion, managed cache
- dedup benefit when using multiple container images
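- sketch of running a container straight from `unpacked.cern.ch` (exact image path is an assumption; check which images are actually published there):
```
# unpacked (flat) images are lazy-loaded: no full download, no conversion
ls /cvmfs/unpacked.cern.ch/registry.hub.docker.com/
# run directly from the unpacked image directory with apptainer
apptainer exec /cvmfs/unpacked.cern.ch/registry.hub.docker.com/tensorflow/tensorflow:latest \
    python -c 'import tensorflow as tf; print(tf.__version__)'
```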
- other stuff
- dedicated URL for full replication of clients
- next sync meeting Tue 28 Nov'23 - 10:00 CET
- last-minute sync on Mon 4 Dec'23 - 10:00 CET
----
## Sync meeting (2023-10-23)
### Attending
- CernVM-FS: Laura, Jakob, Valentin
- EESSI/MultiXscale: Kenneth, Lara
- Excused: Bob
### Notes
- go/no-go for online tutorial on Mon 4 Dec 2023
- create event + announce?
- motivation to stick to Mon 4 Dec'23
- ISC'24
- MultiXscale
- maybe make Terminology section an appendix (out of CVMFS intro)
- flagship repos
- plot from department newsletter <LINK>
- unpacked.cern.ch
- https://cvmfs.readthedocs.io/en/stable/cpt-ducc.html
- https://cvmfs.readthedocs.io/en/stable/cpt-containers.html
- performance section
- tiered cache: https://cvmfs.readthedocs.io/en/stable/cpt-hpc.html
- not sure if this should be encouraged
- S3 storage backend
- Stratum-1 with S3 storage
- S3 on-site CEPH
- S3 with cloud provider
- not really compatible with GeoAPI...
- CDN
- https://openhtc.io
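- sketch of an S3 parameter file for an S3-backed Stratum-1 (endpoint, bucket and credentials are placeholders; the file is passed to the `cvmfs_server` tools via `-s`):
```
# /etc/cvmfs/s3.conf (example values)
CVMFS_S3_HOST=s3.example.org
CVMFS_S3_PORT=443
CVMFS_S3_BUCKET=cvmfs-stratum1
CVMFS_S3_ACCESS_KEY=<access key>
CVMFS_S3_SECRET_KEY=<secret key>
CVMFS_S3_USE_HTTPS=true
CVMFS_S3_DNS_BUCKETS=false
```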
- Troubleshooting
- https://github.com/cvmfs/cvmfs/blob/devel/doc/developer/60-debugging-and-testing.md
- Next sync meeting
- mid Nov'23?
- Mon 20 Nov'23 14:00 CET
- OK for Lara, Laura, Jakob, Valentin, Kenneth
- will check with Bob & Alan
----
## Sync meeting (2023-09-28)
### Attending
- CernVM-FS: Laura
- EESSI/MultiXscale: Alan, Lara, Kenneth
### Notes
- Laura's PR https://github.com/cvmfs/cvmfs/pull/3372
- use cases
- GROMACS binary
- TensorFlow import
- (ROOT)
- hot vs warm vs cold cache
- Laura's benchmark script does each run 20 times
- warm cache means kernel cache is cleared between runs
- hot cache is without clearing the kernel cache (sketch at the end of this section)
- scenarios
- private Stratum-1 with and without proxy
- also test with repo not mounted yet (takes a couple of seconds for autofs to kick in)
- GeoAPI impact
- see Alan's script on comparing Stratum-1's @ https://github.com/EESSI/eessi-demo/pull/24
- see also https://cvmfs.readthedocs.io/en/stable/cpt-telemetry.html
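- sketch of how the three cache states can be reproduced in a benchmark run (workload is an example, e.g. the GROMACS binary):
```
# cold: client cache wiped + kernel caches dropped
sudo cvmfs_config wipecache
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
time gmx --version
# warm: only kernel caches dropped, client cache kept
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
time gmx --version
# hot: nothing cleared between runs
time gmx --version
```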
----
# Previous meetings
## Sync meeting (2023-09-25)
Notes available at https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-09-25-with-CernVM-FS-developers-on-Best-Practices-for-CernVM-FS-on-HPC-tutorial
## Sync meeting (2023-09-05)
Notes available at https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-09-05-with-CernVM-FS-developers-on-Best-Practices-for-CernVM-FS-on-HPC-tutorial
## Sync meeting (2023-07-07)
Notes available at https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-07-07-with-CernVM-FS-developers-on-Best-Practices-for-CernVM-FS-on-HPC-tutorial
## Sync meeting (2023-07-03)
Bob, Lara, Kenneth
- next sync meeting early Sept'23
- test tutorial with SURF?
- split up into subsections, based on notes in https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-05-25-with-CernVM-FS-developers-on-Best-Practices-for-CernVM-FS-on-HPC-tutorial
  - (Kenneth) What is CernVM-FS?
  - (Lara) EESSI
  - (Kenneth) Accessing a repository
  - (Kenneth) Configuration on HPC systems
  - (Bob) Troubleshooting and debugging
  - (Alan?) Performance aspects
  - (Bob) Storage backends
  - (Bob) Containers
  - (Getting started with CernVM-FS)
## Initial meeting (2023-05-25)
Notes available at https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-05-25-with-CernVM-FS-developers-on-Best-Practices-for-CernVM-FS-on-HPC-tutorial