
Sync meeting on MultiXscale deliverables due M30 June 2025 (20250528)

Present: Alan, Pedro, Satish, Kenneth, Lara, Maksim, Thomas

D1.4 Support for emerging system architectures

  • Rhea v1 (Neoverse V1) already covered, Rhea v2 (Neoverse V2) very close to NVIDIA Grace which is already supported
    • not building yet for Graviton 4 (only 2+3), but we could/should
    • we could get in touch with SiPearl (via Estella -> Alan)
  • ROCm compat guarantee only from 6.4
  • Overview of ROCm ecosystem (external contribution with our input)
  • LLVM: see if we have enough for a full section (maybe out of scope); external contribution with internal input. We run and pass the test suite. Important for future architectures.
  • Future RISC-V work will be referenced in future deliverable

D1.5

  • Caspar tackled all comments, to be reviewed again by Kenneth
  • OpenFOAM test is close to being ready, so that should be reflected in deliverable

D5.3

  • placeholder page on dashboard to be added to EESSI docs
  • w.r.t. performance comparisons: just mention that no attempt was made to tune the tests for a particular system, it's not a benchmark suite
  • mention that the dashboard will need to be actively maintained to keep the service up

Next sync

  • during WP1+WP5 sync meeting on Tue 10 June
    • aim to have deliverables 100% ready for final review by SC + submission by Petra

Sync meeting on MultiXscale deliverables due M30 June 2025 (20250509)

Present: Pedro, Bob, Petra, Richard, Thomas, Maksim, Kenneth, Satish

D1.4 Support for emerging system architectures

  • Written by Pedro and Bob
  • Thomas will review the document
  • Some things are still missing or need to be updated
    • E.g. the metadata on the first page and the RISC-V LLVM section
    • Numbers of available apps need to be filled in just before submitting the deliverable
  • Naming and references may not be consistent throughout the document
  • A64FX is not an emerging microarchitecture; it has been available for many years
    • we could name it "additional" or "interesting for EuroHPC"
  • The additional targets should match the aforementioned criteria
    • Reason for adding cascadelake could be that we now have a system that allows us to build for GPUs
  • Is the NVIDIA GPU section too detailed?
  • The ROCm section could stress a bit more how complex and fast evolving the ROCm stack is
  • LLVM can be removed from the RISC-V section (or just mention it in the first subsection)

D1.5 Portable test suite for shared software stack

  • List of authors needs to be fixed/updated
  • Satish has reviewed the document
  • Should we show a bit of relevant code of the EESSI Mixin class?
  • This deliverable is close to done, just need to address some details (see comments in Overleaf)
  • More or less complete; only details remain.
  • Write some more detail about the tests.
  • Mention community contributions.

D5.3 Report on testing provided software

  • Remove IP addresses from figure 2
  • Combine figure 4 and 5 (with the same layout as on the website)
  • Section 4.6 on the connected system could move to section 3, Periodic testing.
    • But the dashboard only collects info on some systems. Otherwise refer to D1.5
  • Refer to the list of systems table in periodic testing.
  • 5.1 title (sanity checks -> test step of install procedure)
  • 5.2.2 figure 11, include timepoints before the performance drop
  • Hardware based comparison -> Show a plot comparing between ARM systems
  • Also more or less complete; a great overview.

Next meetings

  • Wed 28 May 2025 10:30-12:00 CEST
    • goal: have deliverables reviewed + camera-ready for handover for final review to Steering Committee


Sync meeting on MultiXscale deliverables due M30 June 2025 (20250407)

D1.4 Support for emerging system architectures

writing effort led by: Bob + Pedro (RUG)

  • process for identifying emerging targets
    • new systems (within EuroHPC context, national systems)
    • support requests (e.g. https://gitlab.com/eessi/support/-/issues/68 for Sapphire Rapids)
    • supported instructions + (expected) performance difference
    • also Intel Cascade Lake + Ice Lake
  • overview of procedure to provide installations for additional CPU target
  • lessons learned from adding additional CPU targets to existing EESSI version
  • TODO
    • A64FX: currently ~1/3 of modules available
      • set up bot in service account + give others access to it
      • use EasyBuild 4.9.4 to install missing bits?
      • use Bob's script to generate easystack files from existing installations
    • NVIDIA Grace CPU: close to ready?
    • AMD ROCm: quite a lot of work to do
    • also make progress on NVIDIA GPU?
      • workflow is missing to put more software installations in place
        • no workflow that includes testing
        • no fixed set of GPU targets (CUDA compute capabilities)
        • expose GPU software in overview in docs
        • capture whether modules were built on GPU build node, or not (in description or module-load-message)
        • should have sanity check for CUDA compute capabilities in place, so we can rely on it
      • more relevant for progress report (due in Aug'25, work done by June'25)
        • but should be kept short
      • tiger team should convene again! => Thomas
  • timeline
    • [Pedro] early draft by end of April
    • [Thomas] review done of the draft by mid May
      • can start reviewing on Monday, May 12
    • [Pedro] camera-ready by end of May
    • June as buffer
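The sanity check for CUDA compute capabilities mentioned above would, in essence, compare the capability a module was built for against what the node's GPUs report. A minimal sketch of the comparison logic (the function name is hypothetical; actually obtaining the values, e.g. via `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`, is assumed rather than shown):

```python
# Hypothetical sketch of a CUDA compute capability sanity check, as discussed
# above. Only the comparison logic is shown; querying the GPUs themselves
# (e.g. with nvidia-smi) is assumed. Names are illustrative, not EESSI code.

def meets_compute_capability(gpu_caps, required):
    """gpu_caps: capability strings reported by the GPUs, e.g. ['8.0', '9.0'];
    required: minimum capability a module was built for, e.g. '7.0'.
    Returns True only if every GPU meets the required capability."""
    def as_tuple(cap):
        major, minor = cap.strip().split(".")
        return (int(major), int(minor))

    req = as_tuple(required)
    return all(as_tuple(cap) >= req for cap in gpu_caps)

print(meets_compute_capability(["8.0", "9.0"], "7.0"))  # True
print(meets_compute_capability(["6.1"], "7.0"))         # False
```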

D1.5 Portable test suite for shared software stack

writing effort led by Caspar (SURF)

  • focus on EESSI test suite itself
  • Supported software in EESSI test suite
  • How it's used (daily runs, test step as part of deployment procedure, and even on local software stacks)
  • EESSI mixin class: extraction of common logic in portable tests to a single mixin class
  • Check for and discuss other substantial improvements in the test suite repo (go through release notes?)
  • Community building (EB user meeting / docs & tutorial on how to contribute)
    • EESSI mixin facilitates writing new tests, as it points you to missing keywords etc.
  • timeline
    • [Caspar+Satish] early draft by end of April
    • [Kenneth] review done of the draft by mid May
    • [Caspar+Satish] camera-ready by end of May
    • June as buffer
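The mixin idea above, common logic extracted into a single class that also points test authors at missing required keywords, can be illustrated with a minimal generic sketch. This is not the actual EESSI mixin code; class and attribute names are made up for illustration:

```python
# Hypothetical sketch of the mixin pattern described above: a base mixin
# holds the common logic and checks, at class definition time, that each
# concrete test sets the required attributes ("keywords"). NOT EESSI code.

class TestMixin:
    # Attributes every concrete test class must define
    required_attrs = ("executable", "num_tasks")

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        missing = [a for a in cls.required_attrs if not hasattr(cls, a)]
        if missing:
            # Points the test author directly at what they forgot
            raise TypeError(
                f"{cls.__name__} is missing required attribute(s): "
                + ", ".join(missing)
            )

class GromacsTest(TestMixin):
    executable = "gmx_mpi"
    num_tasks = 4
```

Defining a subclass without, say, `num_tasks` raises a `TypeError` naming the missing attribute immediately, which is the "points you to missing keywords" behaviour the notes mention.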

D5.3 Report on testing provided software

writing effort led by:
- Lara (UGent) + Satish (SURF) for daily runs + performance results
- Maksim (SURF) on dashboard

  • Daily runs: which systems? Improvements to configuration (decoupling the test suite version from the configuration version), automatically use the latest test suite release
    • Lessons learned
      • alerting based on sudden change of performance?
    • Study performance results
      • more variation for e.g. TensorFlow tests
      • RHEL8 vs RHEL9 (Snellius, HPC-UGent Tier-2)
      • interesting patterns in Vega?
      • OSU Microbenchmarks faster with 2023a compared to 2023b?
        • performance drop in early version of OpenMPI 5.0.x (2024a toolchain)
    • Performance variations due to change in system software, changes to test suite, etc.
      • lack of impact from changes to EESSI (which is a good thing!)
  • Dashboard
    • Inclusion of more sites on the dashboard?
    • Challenges? (e.g. permission to publish?)
  • timeline
    • early draft by end of April
      • Lara+Kenneth on daily runs of test suite
      • Satish on performance results
      • Maksim on dashboard
    • [???] review done of the draft by mid May
    • [Lara/Satish/Maksim] camera-ready by end of May
    • June as buffer
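The alerting idea raised under "Lessons learned" (flagging a sudden change of performance) could be as simple as comparing each new result against a rolling baseline. A minimal sketch, with an illustrative function name, window, and threshold, not the project's actual tooling:

```python
# Hypothetical sketch of performance-drop alerting as discussed above:
# flag any run whose result falls more than `threshold` (as a fraction)
# below the mean of the preceding `window` runs. Parameters are examples.

from statistics import mean

def detect_performance_drops(values, window=5, threshold=0.10):
    """Return indices of runs that dropped >threshold below the rolling mean."""
    alerts = []
    for i in range(window, len(values)):
        baseline = mean(values[i - window:i])
        if values[i] < baseline * (1 - threshold):
            alerts.append(i)
    return alerts

# Example: steady throughput, then a sudden drop on the last run
history = [100, 101, 99, 100, 102, 100, 80]
print(detect_performance_drops(history))  # [6]
```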

Next meetings

  • also invite Petra to these meetings?
  • Wed 30 April 2025 10:30-12:00 CEST
    • goal: have drafts ready for review
  • Wed 28 May 2025 10:30-12:00 CEST
    • goals:
      • have camera-ready versions ready
      • assess whether deliverables are ready for review by MultiXscale Steering Committee