# EESSI/AWS sync meetings

- link to AWS project doc: https://docs.google.com/document/d/1CHG9fCh2LkfJ-EI8J-_Wr5NpHL5iwm8Wu6syfK9h7-c

## Next meeting

- every 2 months on 2nd Thursday, 13:00 GMT/BST a.k.a. 14:00 CE(S)T
    - Thu 14 Mar 2024
    - Thu 9 May 2024 => move to June?

---

## Notes 14 Mar 2024 (13:00 UTC)

- sponsored credits: ~$28.5k left (expires 2024-12-31)
    - current burn rate: ~$6k/month (so OK until ~July'24)
- project review of MultiXscale EuroHPC CoE
    - good feedback on EESSI aspect of the project
    - interest within EuroHPC community is growing
- updates
    - over 3,500 software installations in place in EESSI production repo (`software.eessi.io`)
        - ~250 different open source software projects
        - incl. TensorFlow, PyTorch, OpenFOAM, WRF, ...
    - initial support for NVIDIA GPUs: https://www.eessi.io/docs/gpu
    - see also latest EESSI update meeting: https://github.com/EESSI/meetings/blob/main/meetings/EESSI_meeting_20240307.pdf
- Open MPI bug fixed by Luke
    - https://github.com/open-mpi/ompi/issues/12270
- private preview for Graviton4 (Neoverse V2)
    - very similar to NVIDIA Grace (interesting for JUPITER system at JSC)
    - getting access is quite competitive currently
    - r8g instance type
- ISC'24 (12-16 May'24)
    - EESSI attendees: Alan, Kenneth, Lara, Pedro?
    - our tutorial submissions on EESSI, CernVM-FS, Magic Castle did not get accepted
    - AWS tutorial on Sunday
        - fixed program, mostly DevOps focus
    - Arm Neoverse V1 tutorial on Sunday
        - reach out to Filippo to see if EESSI could be part of this?
    - [EESSI BoF session](https://app.swapcard.com/widget/event/isc-high-performance-2024/planning/UGxhbm5pbmdfMTgyNjgxMg==) (Tue 9am)
    - paper for RISC-V workshop in the works
    - thinking about submission to Arm workshop (AHUG)...
    - AWS booth?
        - no AWS booth at ISC'24
    - EESSI social event
        - sponsoring opportunity?
    - Tue evening joint AWS/NVIDIA party
- EUM'24 (23-25 April 2024)
    - https://easybuild.io/eum24
- sync on ISC'24 plans (Brendan, Kenneth): Thu 18 April 14:00 CEST

---

## Notes 12 Oct 2023 (12:00 UTC)

- sponsored credits
    - monthly spend is ramping up
        - April-Aug'23: ~$3k/month
        - Sept'23: $4.1k/month
    - $25k was added by Brendan on 14 Sept'23
    - currently $28k left
        - should suffice until Feb'24 (extra $25k expires 02/29/2024)
- migrating away from Slurm cluster set up with [Cluster-in-the-Cloud](https://github.com/clusterinthecloud), to [Magic Castle](https://github.com/ComputeCanada/magic_castle)
    - Magic Castle is developed by The Alliance (a.k.a. Compute Canada) + actively maintained and supported
    - combo of Terraform and Puppet
    - supports **AWS**, **Azure**, OpenStack, GCP, OVH
    - supports auto-scaling (power up nodes as jobs are queued)
    - very good fit for EESSI
    - [EESSI build-and-deploy bot](https://www.eessi.io/docs/bot) is now running on Slurm cluster in AWS managed with Magic Castle
        - Arm login node, x86_64 mgmt node
        - mix of Arm and x86_64 partitions
    - Rocky 8 is quickly becoming most popular OS in HPC, see https://docs.easybuild.io/user-survey/#operating-system
- preparing switch to CVMFS repo under eessi.io domain
    - current EESSI pilot repo @ `/cvmfs/pilot.eessi-hpc.org`
    - new repo @ `/cvmfs/software.eessi.io`
    - Stratum-0 (central server) set up @ Univ. of Groningen (NL), funded via [MultiXscale EuroHPC project](https://www.multixscale.eu)
    - temporary Stratum-1 (mirror) servers running in AWS, Azure
        - both backed by regular storage and S3
        - S3-backed is preferable in some ways, but not in others
            - CVMFS GeoAPI feature not compatible with serving CVMFS repo via S3
                - GeoAPI is used to figure out which Stratum-1 is geographically closest (assumed to be fastest)
            - CVMFS client can go straight to S3 (no need to talk to Stratum-1 server)
        - can a Stratum-1 use S3 as storage backend but still operate as a proper Stratum-1 server?
    - CDN is something we want to try, but it may complicate things?
    - Spack is using one big S3 bucket in us-west for their binary cache
    - ~1.7GB on x86_64, ~1GB on aarch64 to launch TensorFlow starting from scratch
- integration of EESSI in ParallelCluster
    - via [HPC Recipes for AWS](https://github.com/aws-samples/aws-hpc-recipes/)
    - Steve Messenger (AWS) joined last EESSI update meeting
    - some connection via Univ. of Luxembourg
        - they're using Graviton to prep for future EuroHPC chip (Rhea?)
        - could be interesting w.r.t. the problems we've been seeing on Graviton3 (with SVE)
            - see https://github.com/EESSI/software-layer/blob/2023.06/eessi-2023.06-known-issues.yml
            - we should provide more details to Brendan on that to get in touch with Arm experts
- "CVMFS for HPC" tutorial
    - preliminary date for online session: 1st week of Dec'23
    - may become an ISC'24 tutorial submission as well
        - next to "Getting started with EESSI" tutorial submission
- some changes to how latest libfabric (1.19.0) and Open MPI (4.1.6) stack on top of each other w.r.t. EFA
    - fixes w.r.t. intra-node MPI msgs in libfabric
    - more details on SC'23?
- SC'23
    - EESSI submission for HUST'23 workshop was not selected
    - Alan + HPCNow! + Henk-Jan (RUG) will be attending
        - also some people of Univ. of Brussels (VUB) who are interested in EESSI
    - [Magic Castle tutorial](https://sc23.supercomputing.org/presentation/?id=tut130&sess=sess216) on Sun 12 Nov'23
    - AWS/NVIDIA/Arm "Welcome to Denver" open bar night on Sunday

---

## Notes previous meetings

- 12 Oct 2023: https://github.com/EESSI/meetings/wiki/AWS-meeting-2023-10-12
- 14 Sept 2023: no notes were taken :(
- 10 Aug 2023: https://github.com/EESSI/meetings/wiki/AWS-meeting-2023-08-10
- July 2023: (skipped)
- 8 June 2023: https://github.com/EESSI/meetings/wiki/AWS-meeting-2023-06-08
- 11 May 2023: https://github.com/EESSI/meetings/wiki/AWS-meeting-2023-05-11
- 13 April 2023: https://github.com/EESSI/meetings/wiki/AWS-meeting-2023-04-13
- 9 Mar 2023: https://github.com/EESSI/meetings/wiki/AWS-meeting-2023-03-09
- 11 Jan 2023: https://github.com/EESSI/meetings/wiki/AWS-meeting-2023-01-11
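As background to the Stratum-1/GeoAPI trade-off discussed in the 12 Oct 2023 notes: GeoAPI support is toggled in the CVMFS client's domain configuration. A minimal sketch of such a config, assuming *hypothetical* mirror hostnames (the real EESSI Stratum-1 URLs live in the EESSI/CVMFS config packages, not here):

```ini
# /etc/cvmfs/domain.d/eessi.io.conf -- sketch only, hostnames are made up.
# CVMFS substitutes @fqrn@ with the fully qualified repository name,
# e.g. software.eessi.io.
CVMFS_SERVER_URL="http://s1-eu.example.org/cvmfs/@fqrn@;http://s1-us.example.org/cvmfs/@fqrn@"

# With GeoAPI enabled, the client asks a Stratum-1's web service to sort
# the server list by geographic proximity. A plain S3 bucket serving the
# repo directly cannot answer GeoAPI queries, hence the trade-off noted
# in the meeting.
CVMFS_USE_GEOAPI=yes
```

With a plain S3 backend one would instead disable GeoAPI and order `CVMFS_SERVER_URL` manually per site.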
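The GeoAPI idea from the notes above ("figure out which Stratum-1 is geographically closest, assumed to be fastest") can be illustrated with a small conceptual sketch. This is *not* CVMFS's actual implementation; the hostnames and coordinates are invented for illustration:

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in km."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 6371 * 2 * asin(sqrt(a))

# Hypothetical Stratum-1 mirrors (not the real EESSI ones).
STRATUM1S = {
    "s1-eu.example.org": (52.4, 4.9),    # roughly Amsterdam
    "s1-us.example.org": (39.0, -77.5),  # roughly N. Virginia
}

def closest_stratum1(client_lat, client_lon):
    """Pick the mirror with the smallest great-circle distance, GeoAPI-style."""
    return min(
        STRATUM1S,
        key=lambda host: haversine_km(client_lat, client_lon, *STRATUM1S[host]),
    )
```

A client in Belgium would be routed to the EU mirror, one in New York to the US mirror; the real GeoAPI additionally runs server-side on a Stratum-1, which is exactly why a bare S3 bucket cannot provide it.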