# Best Practices for CernVM-FS in HPC

- https://github.com/multixscale/cvmfs-tutorial-hpc-best-practices
- online tutorial, focused on (Euro)HPC system administrators
- aiming for Fall 2023 (Sept-Oct-Nov)
- collaboration between MultiXscale/EESSI partners and CernVM-FS developers
- tutorial + improvements to CernVM-FS docs
- similar approach to introductory tutorial by Kenneth & Bob in 2021, see https://cvmfs-contrib.github.io/cvmfs-tutorial-2021/
- format: tutorial website (+ CVMFS docs) + accompanying slide deck

---

## Final sync meeting (2023-12-01)

### Attending

- EESSI/MultiXscale: Kenneth, Lara, Alan, Bob, Thomas
- CVMFS: Valentin, Laura, Jakob

### Notes

- final tally on registrations: *exactly* 200 people!
- tutorial website: https://multixscale.github.io/cvmfs-tutorial-hpc-best-practices
    - "preview" website (boegel.github.io/...) is now 404
- last chance to give feedback on contents
    - Kenneth needs to take into account feedback shared by Laura in Mattermost (WIP)
    - now in "only important things can still be changed" mode
    - no further changes to contents ("rien ne va plus") after 12:00 CET
- slide deck to drive tutorial (WIP)
    - https://docs.google.com/presentation/d/1f7AMtfIa93k9sOTq4MXMhWRHw9Wfsig4RFqiM-yIOtg
- practical info to share with registered attendees
    - Zoom link (final)
    - tutorial website
    - ~~YouTube live stream~~
    - dedicated channel in EESSI Slack: `#cvmfs-best-practices-hpc`
    - should we also mention CVMFS Mattermost? => not in email
- slide to let ~~Jakob~~ Valentin say a word
    - while showing slide 3
- agenda + timing:

```
(*): incl. hands-on demo

[13:35-14:00 CET] Introduction to CernVM-FS
[14:00-14:20 CET] * EESSI
[14:20-15:00 CET] * Accessing repositories
[15:00-15:30 CET] (coffee break)
[15:30-15:50 CET] * Configuring CernVM-FS on HPC infrastructure
[15:50-16:10 CET] * Troubleshooting
[16:10-16:30 CET] * Performance aspects
[16:30-16:40 CET] * Containers
[16:40-16:50 CET] Creating a CernVM-FS repository (bird's-eye view)
[16:50-17:00 CET] Q&A
```

- live demo of
    - structure of EESSI repo + using EESSI
    - installing & configuring CVMFS client + proxy + Stratum 1 (see the client setup sketch at the end of this section)
    - show performance impact of no proxy + distant Stratum 1
    - start TensorFlow container via unpacked.cern.ch
    - troubleshooting (firewall problem, incorrect ACL in Squid proxy config)
- ISC'24 tutorial submission
    - deadline Fri 8 Dec'23
    - add Valentin + Jakob + Laura as co-authors
- join Zoom by 13:00 CET to prepare
    - Valentin, Kenneth, Alan
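A minimal sketch of the client installation and configuration steps covered in the demo, assuming a RHEL-compatible node; the proxy URL and cache size are placeholders, and access to `software.eessi.io` assumes the EESSI repository configuration (public key + server URLs) is already available, e.g. via EESSI's `cvmfs-config-eessi` package:

```bash
# install the CernVM-FS client (RHEL-compatible example)
sudo yum install -y https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest.noarch.rpm
sudo yum install -y cvmfs

# one-time setup: creates the cvmfs service user and sets up autofs
sudo cvmfs_config setup

# minimal client configuration; proxy URL and cache size are placeholders
sudo tee /etc/cvmfs/default.local > /dev/null << 'EOF'
CVMFS_HTTP_PROXY="http://p1.example.org:3128"   # avoid DIRECT on production clusters
CVMFS_QUOTA_LIMIT=10000                         # local cache size limit in MB
EOF

# verify the setup and check that the repository can be mounted
sudo cvmfs_config chksetup
cvmfs_config probe software.eessi.io
```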
----

## Sync meeting (2023-11-28)

### Attending

- CernVM-FS: Laura, Valentin, Jakob
- EESSI/MultiXscale: Bob, Kenneth, Lara

### Notes

- practical
    - T minus 6 days...
    - 152 people have registered so far...
        - should we send out another reminder?
    - regular Zoom session
        - via CERN => Valentin
        - can also use Zoom setup at Univ. of Oslo (via Terje)
        - no need for webinar mode, we can make sure that participants cannot unmute on join
        - cloud recording
            - with support for streaming to YouTube (backup recording)
    - dedicated Slack channel in EESSI Slack
    - send out practical info on Mon 4 Dec around 09:00 CET
        - send out message to notify people that practical info will be sent on Monday 4 Dec
    - last-minute sync Mon 4 Dec at 10:00 CET
- agenda
    - 13:30 - 17:00 CET
    - [13:30] CernVM-FS
    - [14:00 - 14:15] EESSI
    - [14:15 - 14:45] Access
        - client setup
            - `sudo cvmfs_config setup`
                - required to create `cvmfs` user + configure autofs
                - can skip autofs configuration
            - client config
                - warning on using direct proxy
                - fstab-based mounting instead of autofs, or manual mount
                - `cvmfs_config chksetup` after custom client config
            - show hands-on
        - proxy server (see the Squid config sketch at the end of this section)
            - stateless
            - show hands-on
            - recommendations
                - at least two (for maintenance reasons)
                - rule of thumb: 1 powerful proxy per 100-500 nodes
                    - 10Gbit link to nodes
                    - SSD storage
                    - decent CPU
                    - depends on workload mix
                - very easy to scale up, especially via round-robin DNS setup
        - replica server
            - required resources, monitoring
            - pre-run snapshot, because this takes time
            - can use S3-like backend storage (Ceph, Azure Blob, AWS S3, ...)
        - alternative ways
            - cvmfsexec
            - CVMFS in a container with `apptainer --fusemount`
            - alien cache
    - [15:00 - 15:15] coffee break
    - [15:15] Configuration for HPC => Bob
        - diskless
            - preferred => loopback cache on shared FS
                - https://cvmfs.readthedocs.io/en/stable/cpt-hpc.html#loopback-file-systems-for-nodes-caches
            - client cache in memory
                - stealing memory
                - not extensively tested, complex to support this in CernVM-FS
            - alien cache on shared FS
                - easiest to configure
                - may overload shared FS
        - offline
            - preferred => proxy and/or replica server in local network
            - preload alien cache
                - drop security bit?
        - export CVMFS to other FS
            - sync subdirs of CVMFS repo to a filesystem like NFS
                - same problems as installing software on an NFS filesystem
                - heavy-weight process
                - needs to be kept in sync
                - https://cvmfs.readthedocs.io/en/stable/cpt-shrinkwrap.html
        - NFS export is *not* recommended
            - https://cvmfs.readthedocs.io/en/stable/cpt-hpc.html#nfs-export-with-cray-dvs
        - Parrot not recommended anymore
            - https://cvmfs.readthedocs.io/en/stable/cpt-hpc.html#parrot-mounted-cernvm-fs-instead-of-fuse-module
            - https://cvmfs.readthedocs.io/en/stable/cpt-configure.html#parrot-connector-to-cernvm-fs
            - replaced by cvmfsexec
- status update on tutorial contents
    - https://multixscale.github.io/cvmfs-tutorial-hpc-best-practices
    - CernVM-FS section: done
    - EESSI section: done
    - in progress (see preview at https://boegel.github.io/cvmfs-tutorial-hpc-best-practices)
        - Access section: client install + config, proxy, (private) Stratum-1
            - will finish this today
        - Troubleshooting section
            - incorrect: "`CVMFS_REPOSITORIES` can be used to limit access to specific repositories"
        - Performance section
            - MPI startup: impact of proxy
            - drop OS jitter, drop CDN
        - Storage backends => DROP?
        - HPC section
        - Containers section
            - Valentin can look into this section
            - incl. example to demo
        - Creating CVMFS repo section => very short, refer to 2021 tutorial
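To make the proxy recommendations above concrete, a minimal Squid forward-proxy configuration sketch; the client network, the allowed destination domains and the cache sizes are placeholders to adapt per site (a wrong `acl`/`http_access` pair here is exactly the kind of issue the troubleshooting demo covers):

```bash
# minimal /etc/squid/squid.conf for a dedicated CernVM-FS forward proxy
# (placeholders: client network, allowed Stratum 1 domains, cache sizes)
sudo tee /etc/squid/squid.conf > /dev/null << 'EOF'
http_port 3128

# only accept requests from the cluster's internal network
acl cluster_nodes src 10.0.0.0/16
# only allow fetching from the Stratum 1 servers that are actually used
acl stratum_ones dstdomain .example.org
http_access allow cluster_nodes stratum_ones
http_access deny all

# memory and disk cache sizing, tune to the available hardware
cache_mem 4096 MB
maximum_object_size_in_memory 128 KB
maximum_object_size 1024 MB
cache_dir ufs /var/spool/squid 50000 16 256
EOF

sudo systemctl enable --now squid
```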
----

## Sync meeting (2023-11-20)

### Attending

- CernVM-FS: Laura, Valentin, Jakob
- EESSI/MultiXscale: Alan, Lara, Kenneth

### Notes

- 132 people have registered for the online tutorial on Mon 4 Dec'23 so far...
- [PR #13](https://github.com/multixscale/cvmfs-tutorial-hpc-best-practices/pull/13): *What is CVMFS?*
    - terminology moved to separate appendix
    - pages renamed (`-` not `_`)
    - links to CVMFS docs for `unpacked.cern.ch`
    - **ready for final review + merge**
- [PR #12](https://github.com/multixscale/cvmfs-tutorial-hpc-best-practices/pull/12): *EESSI*
    - split up into separate pages
    - needs more revision work
- HPC section
    - see also https://cvmfs.readthedocs.io/en/stable/cpt-hpc.html
    - diskless
        - CVMFS cache in ramdisk
        - alien cache is unmanaged (no cache eviction)
            - can be read-only or read/write
        - diskless worker nodes => alien cache on shared FS
        - preloaded cache (orthogonal to alien cache)
            - often used with alien cache
    - promote cvmfsexec
        - requires unprivileged user namespaces
        - useful if CVMFS is not installed system-wide
    - sync to another FS
        - shrinkwrap
- performance section
    - see notebooks:
        - https://github.com/boegel/cvmfs-tutorial-hpc-best-practices/blob/perf/cvmfs_perf_python_hpcugent.ipynb
        - https://github.com/boegel/cvmfs-tutorial-hpc-best-practices/blob/perf/cvmfs_perf_tensorflow_hpcugent.ipynb
    - test GPFS
    - test CVMFS with cache in ramdisk
- troubleshooting section (see the command sketch at the end of this section)
    - starting to collect ideas/structure
    - `cvmfs_config chksetup` to check client config
    - logging
        - can configure syslog facility
        - `CVMFS_USYSLOG` to write syslog messages to a file instead
        - access syslog messages via xattrs (only if repo is mounted)
    - proxy
        - make sure that proxies are allowed to connect to Stratum 1s (in proxy settings)
            - usually targets are restricted
            - common issue, for example if WLCG is already used
    - for revision: see xattrs
    - cURL debug command to download `.cvmfspublished` manifest file
        - cURL treats this as a binary file, so be careful
        - `-I`
    - config
        - order of files being considered
            - `default.conf`, `default.local`, config repo, repo-specific config
        - `showconfig` shows actual config + where each value was set
    - cache corruption => in monitoring section
- containers
    - TensorFlow container?
    - via CVMFS: no full download, no conversion, managed cache
    - dedup benefit when using multiple container images
- other stuff
    - dedicated URL for full replication of clients
- next sync meeting Tue 28 Nov'23 - 10:00 CET
- last-minute sync on Mon 4 Dec'23 - 10:00 CET
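A sketch of the troubleshooting probes listed above; repository, proxy and Stratum 1 hostnames are placeholders, and the xattr queries only work while the repository is mounted:

```bash
# inspect a mounted repository via its extended attributes
attr -g proxy /cvmfs/software.eessi.io      # proxy currently in use
attr -g host /cvmfs/software.eessi.io       # Stratum 1 currently in use
attr -g revision /cvmfs/software.eessi.io   # repository revision seen by the client

# show the effective client configuration and where each value was set
cvmfs_config showconfig software.eessi.io

# fetch the repository manifest by hand, through the proxy;
# .cvmfspublished is partly binary, so only print headers (-I) or write it to a file
curl --proxy http://p1.example.org:3128 -I \
    http://stratum1.example.org/cvmfs/software.eessi.io/.cvmfspublished
curl --proxy http://p1.example.org:3128 -o manifest.bin \
    http://stratum1.example.org/cvmfs/software.eessi.io/.cvmfspublished
```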
----

## Sync meeting (2023-10-23)

### Attending

- CernVM-FS: Laura, Jakob, Valentin
- EESSI/MultiXscale: Kenneth, Lara
- Excused: Bob

### Notes

- go/no-go for online tutorial on Mon 4 Dec 2023
    - create event + announce?
    - motivation to stick to Mon 4 Dec'23
        - ISC'24
        - MultiXscale
- maybe make Terminology section an appendix (out of CVMFS intro)
- flagship repos
    - plot from department newsletter <LINK>
    - unpacked.cern.ch
        - https://cvmfs.readthedocs.io/en/stable/cpt-ducc.html
        - https://cvmfs.readthedocs.io/en/stable/cpt-containers.html
- performance section
    - tiered cache: https://cvmfs.readthedocs.io/en/stable/cpt-hpc.html
        - not sure if this should be encouraged
    - S3 storage backend
        - Stratum-1 with S3 storage
        - S3 on-site (Ceph)
        - S3 with cloud provider
            - not really compatible with GeoAPI...
    - CDN
        - https://openhtc.io
- troubleshooting
    - https://github.com/cvmfs/cvmfs/blob/devel/doc/developer/60-debugging-and-testing.md
- next sync meeting
    - mid Nov'23?
    - Mon 20 Nov'23 14:00 CET
        - OK for Lara, Laura, Jakob, Valentin, Kenneth
        - will check with Bob & Alan

----

## Sync meeting (2023-09-28)

Attending:

- CernVM-FS: Laura
- EESSI/MultiXscale: Alan, Lara, Kenneth

### Notes

- Laura's PR: https://github.com/cvmfs/cvmfs/pull/3372
- use cases
    - GROMACS binary
    - TensorFlow import
    - (ROOT)
- hot vs warm vs cold cache (see the timing sketch at the bottom of this page)
    - Laura's benchmark script does each run 20 times
    - warm cache means the kernel cache is cleared between runs
    - hot cache is without clearing the kernel cache
- scenarios
    - private Stratum-1 with and without proxy
    - also test with repo not mounted yet (couple of seconds to let autofs kick in)
    - GeoAPI impact
- see Alan's script for comparing Stratum-1s @ https://github.com/EESSI/eessi-demo/pull/24
- see also https://cvmfs.readthedocs.io/en/stable/cpt-telemetry.html

----

# Previous meetings

## Sync meeting (2023-09-25)

Notes available at https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-09-25-with-CernVM-FS-developers-on-Best-Practices-for-CernVM-FS-on-HPC-tutorial

## Sync meeting (2023-09-05)

Notes available at https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-09-05-with-CernVM-FS-developers-on-Best-Practices-for-CernVM-FS-on-HPC-tutorial

## Sync meeting (2023-07-07)

Notes available at https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-07-07-with-CernVM-FS-developers-on-Best-Practices-for-CernVM-FS-on-HPC-tutorial

## Sync meeting (2023-07-03)

Attending: Bob, Lara, Kenneth

- next sync meeting early Sept'23
- test tutorial with SURF?
- split up in subsections, based on notes in https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-05-25-with-CernVM-FS-developers-on-Best-Practices-for-CernVM-FS-on-HPC-tutorial
    - (Kenneth) What is CernVM-FS?
    - (Lara) EESSI
    - (Kenneth) Accessing a repository
    - (Kenneth) Configuration on HPC systems
    - (Bob) Troubleshooting and debugging
    - (Alan?) Performance aspects
    - (Bob) Storage backends
    - (Bob) Containers
    - (Getting started with CernVM-FS)

## Initial meeting (2023-05-25)

Notes available at https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-05-25-with-CernVM-FS-developers-on-Best-Practices-for-CernVM-FS-on-HPC-tutorial
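To make the hot/warm/cold cache distinction from the 2023-09-28 notes concrete, a minimal timing sketch; the workload is a placeholder, dropping the kernel page cache requires root, and `cvmfs_config wipecache` empties the client cache for a genuinely cold start:

```bash
#!/bin/bash
# rough hot/warm/cold cache timing for a workload that reads from /cvmfs
# (placeholder command; in practice repeat each variant many times, e.g. 20 runs)
CMD="python3 -c 'import tensorflow'"   # placeholder workload

# cold: wipe the CernVM-FS client cache and drop the kernel caches
sudo cvmfs_config wipecache
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
/usr/bin/time -p bash -c "$CMD"

# warm: CernVM-FS client cache is populated, kernel caches dropped between runs
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
/usr/bin/time -p bash -c "$CMD"

# hot: nothing cleared, kernel page cache still holds the data
/usr/bin/time -p bash -c "$CMD"
```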