# EESSI/Azure/SURF sync meeting 2022-03-18

## Agenda

- Overview of spent credits
- EESSI hackathon January 2022
- EUM'22
- Monitoring
- Repository for hosting datasets (e.g. for WRF)
- EESSI paper published!
- Azure support for CitC
- European project proposal (funding for EESSI)

## Attendees

- Bob Dröge (RUG)
- Kenneth Hoste (HPC-UGent)
- Laura Redfern (MS Azure)
- Alan O'Cais (CECAM)
- Hugo Meiland (MS Azure)
- Julian Kreuk (MS Azure)
- Ahmad Hesam (SURF)

## Notes

### Use of sponsored credits

- January 2022: ~€1195
  - temporary Magic Castle cluster for EESSI hackathon (Jan'22): ~€610
  - AMD Milan build node: ~€350
  - Stratum-1 in us-east: ~€200
- February 2022: ~€185
- March 2022 (partial): ~€103
- in last 1.5 months: basically only Stratum-1 in us-east

### EESSI hackathon

- https://github.com/EESSI/meetings/wiki/EESSI-hackathon-Jan'22
- show & tell meeting on last day to wrap up the hackathon
  - slides: https://github.com/EESSI/meetings/blob/main/meetings/EESSI_hackathon_2022-01_show_and_tell.pdf
  - recording: https://www.youtube.com/watch?v=pRjm8cayxi0
- good progress on:
  - installing software on top of EESSI (experiments with LAMMPS)
    - see https://github.com/EESSI/hackathons/tree/02_software_on_top/2022-01/02_software_on_top
  - support for running software on NVIDIA GPUs
    - see https://github.com/EESSI/hackathons/tree/05_gpu/2022-01/05_gpu
    - working solution (proof-of-concept) for Rocky Linux (should work on RHEL8-based Linux distros)
    - some form of approval from NVIDIA to ship CUDA runtime in EESSI (but nothing more than that)
      - still leaves the CUDA compilers
      - even if we include CUDA runtime, we'll still need a script to ensure things work properly
        - can also include sanity checks
    - Hugo: what would be needed in Azure VM images for this?
      - Alan: basically only reasonably recent GPU drivers
        - script can install compat libs in userspace to fill the gaps where needed
    - scripts we have should get integrated in EESSI to make it easier for people to try this out
    - Hugo also has some contacts in NVIDIA that could be helpful
    - Hugo is interested in playing with proof-of-concept GPU support with Devito (see https://github.com/easybuilders/easybuild-easyconfigs/pull/14984)
  - archiving EESSI software installations into a container image
    - see https://github.com/EESSI/hackathons/tree/16_export_software_stack/2022-01/16_export_software_stack

### EUM'22

- https://easybuild.io/eum22
- incl. talk by Hugo & Davide on running WRF on Azure via EESSI
  - see https://easybuild.io/eum22/#azure-eessi-wrf
  - some additional progress has been made there
  - question that came up in this context: could Intel oneAPI compilers and tools (incl. MKL) be included in EESSI?
    - EESSI community has not contacted Intel about this yet
- two talks on EESSI:
  - Getting Started with EESSI
    - see https://easybuild.io/eum22/#eessi-getting-started
  - Semi-automated workflow for adding software to EESSI
    - see https://easybuild.io/eum22/#eessi-workflow

### Monitoring

- good progress, work done by Terje @ Univ. of Oslo
- https://monitoring.eessi-infra.org

### Data repository

- see Hugo's proposal at https://github.com/EESSI/filesystem-layer/pull/112
- WRF input data is a handful of 1GB files
  - should be doable with a "normal" data repository?
  - Bob will look into this
- expectation is that additional data would be pulled in soon for seismic data
  - some concerns there w.r.t. client CernVM-FS cache
    - larger cache than 10GB would be needed for this

### EESSI paper published

- EESSI: A cross-platform ready-to-use optimised scientific software stack
  - https://doi.org/10.1002/spe.3075
  - Open access
  - "Software: Practice and Experience" special issue on "New Trends in HPC"

### Azure support for CitC

- work done by Hugo
- resulted in PRs to CitC:
  - https://github.com/clusterinthecloud/terraform/pull/68
  - https://github.com/clusterinthecloud/ansible/pull/118
- Matt Williams (main developer of CitC) has been granted access to EESSI Azure credits for testing CitC on Azure
  - maybe we should try and get Matt to join next EESSI hackathon
- Magic Castle already has Azure support, but no auto-power-down of unused workernodes there
  - Hugo could help here too
  - Alan: doing things securely is a big concern here

### European project proposal

- seeking letter of support from Azure
- if possible, also some confirmation that sponsored Azure credits could be made available to project (what we basically already have)

### ISC'22

- any plans?
  - half-day EasyBuild tutorial where EESSI will be briefly featured
  - Laura will check if there could be a session set up

### azhop

- integration is done by Hugo, using EESSI is opt-in

### Internal presentation in Azure by Hugo

- was well received
- Hugo is making it very clear that EESSI is still pre-production

### Q&A

- Hugo: best option to spend time on stuff that's helpful for EESSI?
  - Kenneth: blockers for stable EESSI are:
    - dedicated manpower (would be resolved if European project proposal gets accepted)
    - proper support for NVIDIA GPUs
      - Alan could post instructions on how to play with proof-of-concept that was puzzled together during last hackathon
    - GitHub App to automate workflow for community contributions
      - starting point by Kenneth: https://github.com/boegel/pyghee
    - set up central Stratum-0 server securely (physical box, yubikeys)
  - any help on these aspects is welcome :)