# EESSI/AWS sync meetings

- link to AWS project doc: https://docs.google.com/document/d/1CHG9fCh2LkfJ-EI8J-_Wr5NpHL5iwm8Wu6syfK9h7-c

## Next meeting

- Thu 12 Oct 2023, 12:00 UTC
- Thu 9 Nov 2023, 12:00 UTC

---

## Notes 14 Sept 2023 (12:00 UTC)

- ...

---

## Notes 10 August 2023 (12:00 UTC)

- status update on sponsored credits
  - Costs are about $3k/month for March-July'23 (up from ~$1.5k/month)
    - EFS costs are on the rise (~50% in July)
      - Build bot still leaves behind large tarballs to allow debugging of failing builds, and these are currently not being cleaned up
  - currently ~$10k left in sponsored credits
- Looking into using a CDN
  - Will be Q4 before we look further into this
- Injecting OpenMPI/libfabric libraries into EESSI
  - Full discussion in https://github.com/EESSI/software-layer/issues/252 (see this comment for details on the [potential way forward](https://github.com/EESSI/software-layer/issues/252#issuecomment-1662202921))
  - Basically two steps
    - Take a copy of the host `libmpi.so`
      - We are using the EESSI linker, so we need to force the library to find some of its libraries on the host (like libfabric)
        - We modify the ELF header of the library to do this
      - We also inject some additional dependencies to effectively preload some other required host libraries
    - Place it in a special location where it will preferentially get picked up before the EESSI MPI library
  - Seems to work with latest version of EESSI; GROMACS runs show a performance improvement of ~5%
- failing test suites for OpenBLAS/FFTW/numpy (only) on Graviton 3 (not seeing this on Graviton 2)
  - popping up while populating software stack in EESSI pilot 2023.06
  - Numerical errors with OpenBLAS in LAPACK test suite
    - Some toolchains use an older OpenBLAS which lacks optimisations
      - We see an increased number of failing tests
    - Discussion on issue at https://github.com/xianyi/OpenBLAS/issues/4187
      - Note OpenBLAS devels are only just starting to test on `neoverse_v1`
    - We ignored these failing tests for now, assuming they're mostly harmless
      - cf. https://github.com/EESSI/software-layer/pull/309
  - FFTW: erratic error with a single FFTW test (not always the same one)
    - cf. https://github.com/EESSI/software-layer/pull/310
    - still figuring this out
  - handful of failing tests in numpy test suite
    - cf. https://github.com/EESSI/software-layer/pull/306
    - planning to open upstream issues for this to figure out how serious these are
  - Kenneth will send an email to Angel on this; it could be useful to get some feedback from the AWS Performance Engineering team
- progress on making it easy to integrate EESSI with ParallelCluster
  - Matt is working on open source add-ons for ParallelCluster
- booth talk at AWS booth at SC'23
  - long talks (~45min), repeated a couple of times
  - live demo of getting EESSI working on AWS
  - can cover different aspects

## Notes previous meetings

- 10 Aug 2023: https://github.com/EESSI/meetings/wiki/AWS-meeting-2023-08-10
- July 2023: (skipped)
- 8 June 2023: https://github.com/EESSI/meetings/wiki/AWS-meeting-2023-06-08
- 11 May 2023: https://github.com/EESSI/meetings/wiki/AWS-meeting-2023-05-11
- 13 April 2023: https://github.com/EESSI/meetings/wiki/AWS-meeting-2023-04-13
- 9 Mar 2023: https://github.com/EESSI/meetings/wiki/AWS-meeting-2023-03-09
- 11 Jan 2023: https://github.com/EESSI/meetings/wiki/AWS-meeting-2023-01-11