# MultiXscale WP1+WP5 sync meetings

- Monthly, every 2nd Tuesday of the month at 10:00 CE(S)T
- Notes of previous meetings at https://github.com/multixscale/meetings/wiki

---------------------------

## Next meetings

- Tue 11 Feb 2025 10:00 CET
    - Caspar can't make it due to time difference, Kenneth will chair
- Tue 11 March 2025 10:00 CET
    - Caspar _probably_ can't make it (traveling around this time), Kenneth will chair
- Tue 8 April 2025 10:00 CEST
- Tue 13 May 2025 10:00 CEST
- Tue 10 June 2025 10:00 CEST
- Tue 8 July 2025 10:00 CEST (without Kenneth)
    - reschedule to 1 July?

---------------------------

## Agenda/notes 2025-01-14

attending: Caspar (SURF) | Kenneth, Lara (UGent) | Thomas, Richard (UiB) | Helena, Eli, Pedro, Susana, Nadia (HPCNow!) | Alan (UB) | Bob, Pedro (RUG) | Julián (BSC) | Neja (NIC)

- Final word on deliverables (M24):
    - D1.3 ...
    - D6.2 ...
    - D7.2 ...
    - D8.5 ...
    - All of these were successfully submitted on time (except for WP3)
    - Final versions available on Zenodo + (linked from our website)
- Upcoming deliverables (**M30 - June 2025**):
    - (Alan) create Overleaf project
    - by next sync meeting (Tue 11 Feb): come up with outline ASAP + make sure deliverable description from Grant Agreement is covered
        - D1.4 => RIJKSUNI (Pedro)
        - D1.5 => SURF (Caspar)
        - D5.3 => UGent (Kenneth)
        - D6.3 => NIC (Neja)
    - keep deliverables short => ~15 pages max.
    - set early internal deadline to get these fully done: 1st week of June?
    - D1.4 Support for emerging system architectures (RIJKSUNI)
        - Arm CPUs (in place for `aarch64/{generic,neoverse_n1,neoverse_v1}`)
        - NVIDIA Grace (to start)
        - AMD GPUs / ROCm (to start)
        - Zen4 (AMD Genoa) (how did we manage that, what did we do differently)
        - (outlook to) Zen5 (AMD Turin)
        - improvements in glibc
        - also cover RISC-V (despite that having a separate task)
    - D1.5 Portable test suite for shared software stack (UGent - actual: SURF)
        - Mixin class, easier portability
    - D5.3 Report on testing provided software (SURF - actual: UGent)
        - Not just test suite, but also test suites run during build of software
        - Can say something on the need for GPU build infra so that we can run GPU unit tests
        - Integration with Ramble
        - EESSI testsuite Dashboard
    - D6.3 Interim report on Community outreach, education and Training (NIC)
- WP status updates
    - [SURF] WP1 Developing a Central Platform for Scientific Software on Emerging Exascale Technologies
        - [UGent] T1.1 Stable (EESSI) - **D1.3 due M24 (Dec'24)**
            - `dev.eessi.io`:
                - WIP from last time:
                    - Meeting planned in January with Tilen to have him experiment with `dev.eessi.io`
                        - => Moving to February. They need a GPU (Rome+A100), so also blocked by that
                    - Meeting tomorrow to see what still needs to be implemented before the January meeting (Lara+Pedro)
                        - meeting done, see notes?
                    - couple of things still need to be done
                        - docs for setting up bot: OK
                        - support for changing software subdir
                        - support for GPU builds
                    - schedule other tiger team meeting after MultiXscale GA
                        - ideally before meeting w/ Tilen on 5 Feb'25
                        - so tiger team to be scheduled last week of Jan'25
                - Documentation effort to describe what we should do if we want to onboard a new code / repo to build for `dev.eessi.io`
                    - Pedro is ready to open PR for this
            - NVIDIA GPU support:
                - we had a tiger team meeting on this this morning
                - bot setup in service account at UGent for GPU build nodes is WIP (Lara)
                - Key results:
                    - Fix GPU availability in EESSI container in test step [#847](https://github.com/EESSI/software-layer/pull/847)
                - WIP:
                    - Deploy bot @UGent, testing first builds in [#842](https://github.com/EESSI/software-layer/pull/842)
                    - Fix issues in automatically determining ReFrame config from template in test step [#114](https://gitlab.com/eessi/support/-/issues/114)
                - TODO from last time:
                    - [WIP] updating the `SitePackage.lua` for proper GPU support ([see PR #798](https://github.com/EESSI/software-layer/pull/798)) => STILL waiting for review
                        - will be required for stuff that depends on cuDNN
                    - Deploy bot @SURF
                    - Re-install GPU software in proper location (not in CPU-only prefix)
                        - only applies to CUDA itself + OSU benchmarks + CUDA-samples
                        - these cause some headaches when installing CUDA & co for newer architectures
            - "we will benchmark software from the shared software stack and compare the performance against on-premise software stacks to identify potential performance limitations, ..."
                - work done by Satish for ESPResSo, LAMMPS, GROMACS?, OSU
                - All put into the deliverable, no surprises. EESSI generally on par with local SW stack.
        - [RUG] T1.2 Extending support - **D1.4 due M30 (June'25)**
            - `zen4` _almost_ on par with the rest.
                - PR to do this was merged, but not deployed, so we still need to do that: https://github.com/EESSI/software-layer/pull/841
                    - Question: should these be _hidden_ modules?
                - Then merge https://github.com/EESSI/software-layer/pull/766
            - NVIDIA Grace
                - @Thomas: any update?
                - => set up tiger team for this (Thomas)
            - AMD ROCm (see [planning issue #31](https://github.com/multixscale/planning/issues/31) + [support issue #71](https://gitlab.com/eessi/support/-/issues/71))
                - @Pedro/Bob: any update?
                    - Bob looked at open EasyBuild PRs a bit for ROCm, plans to keep working on this
                - => set up tiger team for this (Bob)
        - [SURF] T1.3 Test suite - D1.5 due M30 (June'25)
            - Ongoing effort: porting tests to use the `eessi_mixin` class, 80% complete
            - Dealing more elegantly with read-only data now [issue](https://github.com/EESSI/test-suite/issues/211)
            - would be nice to get more contributors...
                - talk at EasyBuild User Meeting + hands-on session?
                - webinar on EESSI test suite, maybe via EPICURE?
                    - maybe include hands-on too?
                    - also show off dashboard
            - are we ready to let other sites push in their results and expose it via dashboard?
                - probably require a policy in terms of which data is required (which scales)
            - there will be a talk on Continuous Benchmarking by JSC at EUM'25
                - also remote talk by Ramble?
        - [BSC] T1.4 RISC-V (due M48, D1.6)
            - ... (is build bot active? Who can control it? Should all PRs try to build for this, or not?)
                - Bob is working on this, we're close
                - BSC firewall was blocking events coming from smee
            - should reach out to Vitamin-V project?
                - https://vitamin-v.upc.edu/
            - BSC training using Paraver on RISC-V, will use EESSI for hands-on
        - [SURF] T1.5 Consolidation (starts M25 - Jan'25)
            - continuation of effort on EESSI (T1.1, etc.)
    - [UGent] WP5 Building, Supporting and Maintaining a Central Shared Stack of Optimized Scientific Software Installations
        - [SURF] T5.2 Monitoring/testing, D5.3 due M30 (June'25)
            - Plan to split dashboard & database into two separate VMs (security) => Status?
            - Vega agreed to make test data public. Karolina is waiting for response from their director.
                - Caspar sent reminder on 13-01@16:00h
        - [UGent] T5.4 support/maintenance - D5.4 due M48 (Dec'26)
            - should be a bit more proactive on getting support issues closed + follow-up on software-layer PRs
    - [UB] WP6 Community outreach, education, and training
        - **deliverables due: D6.3 (M30 - June'25)**
        - Upcoming activities:
            - [Alan] EESSI tutorial at HiPEAC 2025 accepted (20-22 Jan'25)
                - standard 2h tutorial + extending EESSI
                - 39 people registered for EESSI tutorial
                - Lara is giving demo at some time, but can maybe be moved to end of the session (closer to 5pm)
                    - details are unclear for Lara
            - [Lara] Also at HiPEAC: another workshop (about CoEs), Lara will present workshop there
                - 2 workshops for MXS: a talk (Mon), and a demo (Wed - collision with EESSI tutorial).
    - [HPCNow] WP7 Dissemination, Exploitation & Communication
        - podcast interview for EuroHPC podcast
            - Any updates? Date planned yet?
        - Contacted HPCwire to see if they can make an article about EESSI => Status?
            - TODO last time: we could make a press release ourselves. Susana would take lead, Kenneth provides input for quote. Status...?
                - WIP by Susana
        - new edition of newsletter is ready to be published
            - would be nice to promote this at HiPEAC
            - website update with new newsletter
        - T7.1 Scientific applications provisioned on demand (lead: HPCNow) (started M13, finished M48)
            - EESSI on 'paid layer' on top of Parallel Cluster: WIP. Status? (Pedro @ HPCNow)
                - PR to AWS merged
                - blog post once new AWS Tech Short is recorded?
            - some discussion with Open OnDemand team on integrating EESSI (Eli, Kenneth)
        - Task 7.2 - Dissemination and communication activities (lead: NIC)
            - Updates ... ?
            - see deliverable + GA
        - Task 7.3 - Sustainability (lead: NIC, started M18, due M42)
            - Updates ... ?
            - see deliverable + GA
        - Task 7.4 - Industry-oriented training activities (lead: HPCNow)
            - Updates ... ?
        - upcoming events
            - HiPEAC'25 (Barcelona, 20-22 Jan)
            - EuroHPC Summit (Krakow, 18-20 March)
            - EasyBuild User Meeting (Jülich, 25-27 March)
                - https://easybuild.io/eum25/
                - 3rd day will probably be focused on EESSI
            - MultiXscale scientific CECAM workshop @ Ljubljana (April 2025, WP6)
                - talk on EESSI (by Alan, remote), would be nice to cover `dev.eessi.io` service
            - EuroHPC User Day (Copenhagen, Denmark, 30 Sept-1 Oct)
                - https://www.deic.dk/events/eurohpc-user-days-2025
    - [NIC] WP8 (Management and Coordination)
        - Amendment not accepted in current form
            - Status? (last time: waiting for IIT for changes, then resubmit on Friday after the sync meeting)
            - amendment was re-submitted 20 Dec, waiting for reply
        - waiting for report from special review
            - for travel budget, a more detailed overview is probably desired?
        - next General Assembly meeting
            - 23-24 Jan'25 in Barcelona/Sitges
        - Neja will send reminder for quarterly reports 2024Q4

### Other topics

- Interim EESSI Steering Committee (https://www.eessi.io/docs/governance/), had initial meeting
    - will meet quarterly (+ additional topical meetings)

---------------------------

## Notes of previous meetings

see https://github.com/multixscale/meetings/wiki

----------------------------

## Template for sync meeting notes

TO COPY-PASTE

- overview of MultiXscale planning
    - https://github.com/orgs/multixscale/projects/1/views/1
- WP status updates
    - [SURF] WP1 Developing a Central Platform for Scientific Software on Emerging Exascale Technologies
        - [UGent] T1.1 Stable (EESSI) - due M12+M24
            - ...
        - [RUG] T1.2 Extending support (starts M9, due M30)
        - [SURF] T1.3 Test suite - due M12+M24
            - ...
        - [BSC] T1.4 RISC-V (starts M13)
        - [SURF] T1.5 Consolidation (starts M25)
    - [UGent] WP5 Building, Supporting and Maintaining a Central Shared Stack of Optimized Scientific Software Installations
        - [UGent] T5.1 Support portal - due M12
            - ...
        - [SURF] T5.2 Monitoring/testing (starts M9)
        - [UiB] T5.3 community contributions (bot) - due M12
            - ...
        - [UGent] T5.4 support/maintenance (starts M13)
    - [UB] WP6 Community outreach, education, and training
        - ...
    - [HPCNow] WP7 Dissemination, Exploitation & Communication
        - ...