# Reviews
###### tags: `EUROHPC`
## Cross-collaboration
Cross-Project-Collaboration Board.
Two BoFs at Supercomputing 2022 planned.
Two workshops at HiPEAC 2023 on machine learning techniques for software development and optimisation.
Common EuroHPC booth at ISC 2023.
Joint website (on its way).
### Time-X
Parallel-in-time approach for time-domain decomposition.
Requires new algorithms.
Goal is to go from an academic methodology to a widely available technology.
Showcases: medicine, molecular dynamics, electromagnetics, climate/weather.
Project is on time.
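A minimal sketch of the classic Parareal correction that parallel-in-time methods build on (standard textbook form; the exact Time-X algorithms may differ):

$$
U^{n+1}_{k+1} = \mathcal{G}\!\left(U^{n}_{k+1}\right) + \mathcal{F}\!\left(U^{n}_{k}\right) - \mathcal{G}\!\left(U^{n}_{k}\right)
$$

where $\mathcal{G}$ is a cheap coarse propagator applied serially across time slices, $\mathcal{F}$ is the expensive fine propagator applied to all time slices in parallel, $k$ is the iteration index and $n$ the time-slice index.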
### DEEP-SEA
Part of the SEA family, together with IO-SEA and RED-SEA.
Modular Supercomputing Architecture (MSA).
Software stack for exascale.
RED-SEA: network solution.
IO-SEA: I/O software solution.
Goal of DEEP-SEA: ease and automate some of the tasks application developers perform to scale to exascale systems.
Provide high TRL.
Co-design with applications and middleware.
First integrated software roll-out.
Continuous integration infrastructure established and in use.
### IO-SEA
I/O and data management.
Challenges: data scalability, system scalability, CPU/GPU evolution, data placement, data heterogeneity.
Use of object storage.
Hierarchical storage management.
On-demand/ephemeral provisioning of storage and services, and scheduling.
I/O instrumentation and AI-based telemetry analytics.
Co-design with I/O-intensive applications and development of a user-oriented application interface (DASI).
Collaboration with ADMIRE on I/O traces.
Plus collaboration with the other SEA projects.
### RED-SEA
Exascale networking capabilities allowing low-latency and high-bandwidth communication between large numbers of extreme computing and data components.
Exascale needs an efficient network.
Goal: cover the variety of network solutions.
Support high node counts and massively parallel processing systems.
Smarter networks: congestion management.
Support data-centric and AI-related applications.
Network resource management.
Project is on track.
### ADMIRE
Active I/O stack that dynamically adjusts computation and storage requirements through intelligent global coordination.
Receive information (monitoring) and optimise based on it.
Balance computation and data transfer; reduce data movement.
Co-design with several applications.
Basic open-source tools and libraries
in the ADMIRE GitLab.
Open research datasets (Zenodo; EuroHPC community?).
Scientific publications.
Webinars, workshops and talks.
19 deliverables and 4 milestones.
### DCoMEX
Data Driven Computational Mechanics at Exascale.
5 partners from 5 European countries.
Physics-constrained machine learning, data-driven inference and large-scale linear algebra solvers to solve extremely demanding computational mechanics problems.
Provides an open-source, user-friendly and customisable computational mechanics framework.
O1: Construction of AI-Solve, an AI-enhanced linear algebra library;
methods for dimensionality reduction and surrogate modelling, including diffusion maps (DMAP) manifold learning and deep neural networks (DNN).
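For reference, the standard diffusion-maps construction mentioned above (generic formulation, not necessarily the exact variant used in AI-Solve): given samples $x_i$, build a kernel, row-normalise it into a Markov matrix, and embed with the leading eigenpairs:

$$
K_{ij} = \exp\!\left(-\frac{\|x_i - x_j\|^2}{\epsilon}\right), \qquad
P_{ij} = \frac{K_{ij}}{\sum_{l} K_{il}}, \qquad
\Psi_t(x_i) = \left(\lambda_1^{t}\,\psi_1(x_i), \ldots, \lambda_m^{t}\,\psi_m(x_i)\right)
$$

where $\lambda_r, \psi_r$ are the leading eigenvalues and right eigenvectors of $P$, and $m$ (much smaller than the ambient dimension) gives the reduced coordinates.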
O2: Exascale deployment of the MSolve and Korali software engines.
O3: Pre-processing of experimental image data.
O4: Integration of the DCoMEX framework, application and performance evaluation;
applications to an immunotherapy problem and to multi-scale material design.
O5: Scientific contributions and dissemination.
dcomex.eu
github.com/DCoMEX
github.com/mgroupntua
### eProcessor
Hardware side: leverage existing IP, make it more robust, and produce open-source hardware.
Build a new RISC-V out-of-order (OoO) open-source processor and accelerators to deliver the first European full-stack ecosystem.
HPC, AI and bioinformatics applications and middleware.
Tools (performance monitoring, debugging, compiler).
Reduced precision for AI (1-bit for bioinformatics!).
Plus fault tolerance.
Application use cases ported to RISC-V and clearly defined optimization plans.
LLVM able to generate SIMD code with low-precision floating-point types.
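A minimal illustration of the kind of code this enables, assuming a Clang/LLVM toolchain and a target that supports the `_Float16` type (this is a sketch, not eProcessor code): a simple loop over half-precision data that the auto-vectoriser can turn into SIMD instructions.

```c
#include <stddef.h>

/* Half-precision AXPY: y = a*x + y.
 * With e.g. `clang -O3` on a target providing fp16 vector support,
 * this loop is a candidate for auto-vectorisation into
 * low-precision SIMD instructions. */
void axpy_fp16(size_t n, _Float16 a,
               const _Float16 *restrict x, _Float16 *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```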
### MAELSTROM
7 partners.
Weather/climate HPC and ML.
Bringing the communities together.
Open up weather and climate (W&C) prediction as a new usage domain for machine learning applications to exploit exaflop performance.
Develop the optimal software environment.
Co-design cycle:
- application benchmark datasets (open)
- ML workflow & software benchmarking
- hardware benchmarking & bespoke (compute) system design
One optimisation cycle already closed.
MAELSTROM dataset (16 TB) documented, published and available for download.
Project on track.
### SPARCity
Optimisation and co-design framework for sparse computation.
Develop a framework of efficient algorithms and coherent tools for maximising the performance and energy efficiency of sparse computations on emerging HPC systems.
Also applications: which ones?
Digital super-twins of supercomputers? What is that?!
Real-life science applications.
Technical approach:
- Node-level optimizations
- System-level optimization
- digital super twin
- demonstrators
- complete framework.
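As a concrete example of the node-level sparse kernels such a framework targets (a plain reference implementation, not SparseBase or project code), a CSR sparse matrix-vector multiply:

```c
#include <stddef.h>

/* y = A*x with A stored in Compressed Sparse Row (CSR) format:
 * rowptr has nrows+1 entries, colidx/values have one entry per nonzero. */
void spmv_csr(size_t nrows, const size_t *rowptr, const size_t *colidx,
              const double *values, const double *x, double *y)
{
    for (size_t i = 0; i < nrows; ++i) {
        double sum = 0.0;
        for (size_t k = rowptr[i]; k < rowptr[i + 1]; ++k)
            sum += values[k] * x[colidx[k]];  /* irregular, memory-bound access */
        y[i] = sum;
    }
}
```

Node-level optimisations typically target the irregular, memory-bound accesses to `x` (reordering, blocking, vectorisation), which is where most of the performance and energy is lost.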
Cardiac modelling, bioinformatics, social network analysis, autonomous driving.
Open-source software and data repositories:
SparseBase, Digital Twin, DataRepository
Continuous feedback with industry partner Graphcore.
Collaboration with DCoMEX, eProcessor and DEEP-SEA.
### TEXTAROSSA
Development of Integrated Development Vehicles (IDVs).
Goals:
- energy efficiency and thermal control
- sustained application performance
- seamless integration of reconfigurable accelerators
- development of new IPs
- Integrated Development Platforms
Co-design fashion...
Status:
- prototypes of IDV-E and of 2-phase cooling
- Thermal model to drive power management
- prototype tools and IPs (to integrate accelerators or to generate online power monitors)
2 deliverables delayed.
### Alignment with EuroHPC JU strategy
International collaboration with the US and Japan.
Advance community standards to support EU HW/SW:
OpenMP, MPI, PMIx.
Enhance existing open-source SW solutions rather than building everything from scratch.
Example: Slurm and OpenMP.
Widening Usage Pillar.
- advances in algorithms and applications
- advances in system SW will simplify efficient use of HPC resources
- advances in programming models, tools and APIs will result in more & better HPC applications and wider use
- projects roll out a significant volume of high-quality training materials <-- these should be available on the main EuroHPC website.
### Malleability in EuroHPC projects
Current and future potential collaborations.
Run-time varying resources (== malleability).
DEEP-SEA, ADMIRE, Time-X.
Collaboration started with the 1st EuroHPC workshop on malleability in HPC.
Four projects involved (what is the fourth one?).
Day 1, programmers' day: how Slurm needs to be modified.
HPC SW track.
Day 2, algorithm day: application perspective and the scheduling problem.
Next workshop planned in Aug/Sep 2023.
Collaborations:
applications, scheduling strategies.
1. Programming models
Dynamic MPI (Time-X, DEEP-SEA): generic but difficult to use (see the sketch below).
FlexMPI (ADMIRE): does not cover all applications.
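A minimal sketch of the standard dynamic-process mechanism in MPI (`MPI_Comm_spawn`) that such malleability work builds on; this is an illustration only, not the DEEP-SEA/Time-X or FlexMPI implementation, and the `worker` binary name is hypothetical.

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Comm workers;   /* intercommunicator to the newly spawned processes */
    int errcodes[4];

    /* Grow the job by four worker processes at run time. */
    MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &workers, errcodes);

    /* ... hand work to the spawned processes via the intercommunicator ... */

    /* Release the extra resources again when done ("shrink"). */
    MPI_Comm_disconnect(&workers);

    MPI_Finalize();
    return 0;
}
```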
2. Optimal resource scheduling
specialized solutions.
Malleability requires standards --> collaborate to push standards.
Applications' information (from all four projects).
Formal expression.
3. From monolithic to modular
Goal is a multi-layered design with separation of concerns.
Modularisation of components.
4. Collaborative EuroHPC malleability paper
Jointly written paper,
to be submitted next year,
to enhance the visibility of malleability in EuroHPC.
Final words:
REGALE is the latest project joining the collaboration.
These are not competing projects, etc.
### I/O tracing tools and traces
Motivation: I/O is a bottleneck.
An HPC I/O trace archive is required by application and system developers.
Use all applications to build an HPC I/O trace archive (where will it be "archived"?).
Different runs are needed for each application.
Archive should follow FAIR principles.
Investigate many different supercomputers.
Make it easy for people to run the I/O tracing.
Develop a joint metadata scheme.
Put data in OpenAIRE and Zenodo.
Open Trace Format and Darshan file format to enable interoperability.
Plus automatic analysis inside a web environment.
Darshan: trace and profiling environment.
Profiling, tracing and analysis will be built on Darshan.
What license?
Long-term commitment by Argonne National Laboratory.
Counters, histograms, timers, statistics.
Full I/O traces.
POSIX, MPI-IO, HDF5 and Lustre information available.
Darshan will be extended with:
- more system-level I/O tracing: adding strace in combination with seccomp-bpf allows fine-grained syscall collection at small overhead.
Status:
Two EuroHPC projects involved: ADMIRE and IO-SEA.
First version of recipes and scripts for Darshan, strace profiles and Slurm available at pad.zdv.net/s/ccjfwWDps.
Mailing list (@lists.uni-mainz.de).
Website for uploading traces if available.
### Benchmarking
DEEP-SEA, RED-SEA and IO-SEA.
Use JUBE.
Weekly schedule of benchmark runs.
Results put in an archive (which one?).
JUBE helps to automatically run all the analyses we want for the benchmark traces.
DEEP-SEA + IO-SEA
Plus MAELSTROM is interested in the benchmark environment.
DMRLib (DEEP-SEA).
## ADMIRE
November 7, 2022: 10:00 - 17:00
Adaptive multi-tier intelligent data manager for Exascale
### Objectives:
The growing need to process extremely large data sets is the driving force accelerating the transition to high-performance computing. However, the flat data storage hierarchies with a central parallel file system are proving to be inadequate. Emerging multi-tier storage hierarchies could meet the needs of data-intensive applications but currently lack adequate control mechanisms for the available resources. The EU-funded ADMIRE project plans to develop an adaptive storage system that should allow high-performance computing systems to deliver very high throughput and increase application performance. The aim is to significantly improve the runtime of applications in fields such as weather forecasting, remote sensing and deep learning.
### Website
https://www.admire-eurohpc.eu
### Applications
6 use cases.
#### Application 1: Monitoring and Modelling Marine, weather and Air quality
#### Application 2: Car-Parrinello molecular dynamic simulation of large molecules and small proteins
#### Application 3: Simulation of large scale turbulent flow
#### Application 4: Continental-scale land cover mapping with scalable and automatic deep learning frameworks
#### Application 5: Super-resolution imaging using Opera microscopy and SRRF/ImageJ software
#### Application 6: Software Heritage Management & Indexing
### Periodic report
Period covered by the report: from 01/04/2021 to 30/09/2022
#### WP1 Project management
The project management is professional; the work is well organized and communication between workpackages is good.
**Delays (no impact on the project results)**:
Task 1.8. Delay of the EuroHPC projects Consortium Agreement. Foreseen M6. Signed M12
**Deliverable D1.1 (M1)**. Project handbook. Status: Completed.
This handbook provides partners with a summarised guide to the project management structure and associated procedures.
**Deliverable D1.2 (M6)**. Data Management Plan. Status: Completed.
Eight data sets will be produced as part of the project activities.
**Deliverable D1.3 (M12)**. First internal review report. Status: Completed.
The reporting period includes the work done throughout the whole duration of the project, from April 2021 (Month 1) to the end of March 2022 (Month 12).
**Deliverable D1.4 (M18)**. Midterm report. Status: Completed.
This deliverable summarises the progress of the project during the first 18 months. It explains the relationship between different components and work packages of the project. The content of all delivered deliverables is briefly explained in the document.
**Deliverable D1.5 (M18)**. Updated Data Management Plan. Status: Completed.
This updated version shows more details related to the data published in the project, information sharing aspects through tools such as Gitlab, Overleaf, Gdrive, etc., and practical data management procedures implemented by the ADMIRE project consortium.
**Deliverable D1.8 (M6/M12)**. Collaboration plan with EuroHPC Projects. Status: Completed.
A Collaboration Agreement was signed at the end of the first year (M12). Even though it was planned for M6, the complexity of the agreement among more than 60 partners delayed the task. Thus, this deliverable was also delayed to fit the CA.
The clearest scientific cooperation opportunities have been identified in two areas: malleability and ad-hoc storage. Related to the project, the main cooperation will be established with the DEEP-SEA and IO-SEA projects. The clearest opportunities for cooperation in dissemination are through activities associated with the HiPEAC network, which has a strong industry and technical focus on parallel systems, PRACE for testing, and major supercomputing conferences, such as SC and ISC.
**Software**
- InfiniBand Networks Congestion Control Tool. Cooperation ADMIRE-RED-SEA.
##### Resources WP1
Some deviations in the usage of resources but all are clearly justified in the periodic report.
#### WP2 Ad-hoc storage systems
**Deviations**
As mentioned above, JGU's unexpected personnel shortage resulted in the delay of the tasks in T2.1, which focuses on analysing the ADMIRE applications with respect to the semantics they use. Meanwhile, this task is well underway, and we therefore still expect the objectives to be met.
**Deliverable D2.1 (M6)** Definition of requirements for ad-hoc storage systems. Status: Completed
Deliverable D2.1 describes the various ad-hoc storage systems used in ADMIRE, their interfaces with other components in the ADMIRE framework, the ADMIRE applications’ I/O requirements, and how ad-hoc storage systems can be used to improve their runtimes.
**Deliverable D2.2 (M13)** Design of the ad-hoc storage systems. Status: Completed
Deliverable D2.2 provides a detailed analysis of the ad-hoc storage systems, their interfaces to the ADMIRE framework, and an initial analysis of the ADMIRE applications.
**Deliverable D2.3 (M24)**. Short-lived ad-hoc storage systems. Status: In Progress
Deliverable D2.3 is ongoing and will include the software developments for the different ad-hoc file systems.
#### WP3 Malleability management
The main objectives of WP3 are: to provide base mechanisms that allow the I/O resources of jobs to be dynamically adjusted alongside their computational resources; to design algorithms that dynamically balance the I/O allocations of individual jobs across the system to maximise throughput while preserving fairness; to develop malleable ad-hoc storage systems capable of scaling the I/O resources of individual jobs according to scheduler decisions; and to integrate the above solution components into a real scheduler.
**Deliverable D3.1 (M6)** Malleability requirements definition. Status: Completed.
**Deliverable D3.2 (M14)** Base mechanisms for malleability. Status: Completed
**Deliverable D3.3 (M24)**. Scheduling algorithms and policies. Status: In Progress
**Deviations**
UC3M. Less effort than expected as Limitless integration was moved in part to the second half of the project.
INRIA. Less effort than expected as I/O malleability is also included in WP4.
#### WP4 I/O scheduler
The goal of WP4 is to design and develop an I/O Scheduler component for the ADMIRE framework. The prototype for this component should have support for control points to allow fine-tuning of its operation and should also be able to coordinate direct and indirect inputs from the Intelligent Controller, the Job Scheduler, and the Malleability Manager to provide QoS-aware data scheduling.
Therefore, the main responsibility of the I/O Scheduler component is to control (and often execute) the movement of datasets (i.e. files or objects) between storage tiers with the goals of accelerating data processing by maintaining data locality as well as reducing I/O contention to the PFS.
**Deliverable D4.1 (M6)** I/O Scheduler requirements and API. Status: Completed
**Deliverable D4.2 (M28)** Software to support I/O scheduling policies. Status: In Progress
**Deviations**
JGU deviation is due to problems in finding candidates to work on the project.
BSC deviation is due to redefinition of design and implementation.
DDN deviation is due to additional time required to provision and set-up their internal infrastructure (a dedicated test cluster for ADMIRE).
PSNC. Analysis of Life-Sciences application preprocessing phase in order to identify whether it would benefit from in-situ operations was done in the first period. Initially, the use of in-situ mechanisms was planned for large-resolution image visualisation. However, good results and performance of automated processing of smaller images led to lower demand for these mechanisms in machine learning applications used in the project.
#### WP5 Sensing and profiling
WP5 is dedicated to monitoring and profiling.
**Deliverable 5.1 (M6)**. Definitions of the profiling and monitoring requirements. Status: Completed
**Deliverable 5.2 (M14)**. Design of the monitoring and profiling tool. Status: Completed
**Deliverable 5.3 (M24)**. Report on the implementation application I/O profiling. Status: In Progress.
Led by DDN, this deliverable is aimed toward the description and implementation of application I/O profiling.
**Deviations**
TUDA. Deviation due to more efforts to extend Extra-P than what was estimated.
#### WP6 Intelligent controller
**Deliverable D6.1 (M6)** Report on the intelligent controller requirements. Status: Completed
**Deliverable D6.2 (M13)** Report on the intelligent controller design. Status: Completed
This deliverable includes the design of the Intelligent Controller, showing the control and data plane architectural blocks for orchestrating system components, and the final version of the definition of the Application Programming Interface (API) between the Intelligent Controller and the other components.
**Deliverable D6.3 (M24)** Runtime tools to tune I/O system behaviour. Status: In Progress.
This deliverable will present runtime analytics tools to tune I/O system behaviour, including the mechanisms to improve I/O system behaviour and facilitate anticipatory decisions for resource allocations based on the knowledge collected from the I/O systems, application runs, the batch system
Deviations in WP6 are not justified. For instance, JGU and PSNC have about +35% and -35% deviation, respectively.
#### WP7 Application co-design
WP7 is responsible for analysing applications and their codes for identifying and formalising requirements such as co-design input to other Work Packages (WP2-WP6).
**Deliverable D7.1 (M6)** Application requirement definition. Status: Completed
**Deliverable D7.2 (M19)** Application co-design preliminary report. Status: In progress
**Deviations:**
UC3M. The effort spent is higher than estimated due to stronger involvement in the co-design of some use cases (Wacom++, ...) using FlexMPI.
DDN. Extra effort devoted to trace applications with Lustre.
PSNC. In the second period slightly more work will be required to adopt ADMIRE tools.
#### WP8 Dissemination and exploitation
**Deliverable D8.1 (M1)**. Project web site. Status: Completed.
**Deliverable D8.2 (M6)**. Dissemination, communication, exploitation, and standardisation plan. Status: Completed.
**Deliverable D8.3 (M18)**. Midterm report on dissemination, communication, exploitation, and standardisation. Status: Completed.
**Deviations**
BSC deviation is due to the will to first structure their software offers before distributing them widely in the second half of the project.
TUDA. Deviation due to more efforts on preparing the user manual and tutorial for the open-source ElastiSim than what was estimated.
KTH. We planned for a smaller number of dissemination and exploitation efforts in the first part of the project. In the second part of the project, we will largely increase the dissemination and exploitation effort as the results of the project on Nek5000, large-scale I/O, and in-situ data analysis will need to be disseminated and exploited.
CINI. Small deviation due to more efforts in preparing and delivering the D8.3 deliverable.
**Periodic Report is very clear**.
## DEEP-SEA
DEEP Software for Exascale Architectures
November 8, 2022. 9:00 - 17:00
### D7.5 Periodic progress report at M18
### Co-design effort
**Space weather**: xPic uses a large list of libraries (e.g. ParaStation MPI, SIONlib, HDF5, H5hut, and PETSc), tools and frameworks (e.g. Extrae/Paraver, PyTorch, scikit-learn), which are part of the DEEP-SEA OCs. In the reporting period KU Leuven has worked on porting its autoencoder software to PyTorch for the analysis of solar active regions, and on testing it on the DEEP system. This is done with the help of the support team, whose members profit from the interaction as well, as they collect user feedback about the system capabilities and missing software dependencies.
**Weather forecast and climate**: The Dynamic Load Balancing OC has been tested in the CLOUDSC mini-app. The results will be used in developing a new IFS-physics mini-app with efficient load balancing. The MUSA Optimisation Cycle is being used by IFS in the context of spectral transforms.
**Seismic imaging**: The Asynchronous Constraint Execution (ACE) task scheduler has been implemented into FRTM. This has enabled FRTM to be integrated with the GPU kernels. Work to implement memory tools and the malleability and resiliency features are in progress in collaboration with WP3 and WP5. BSIT is working closely with WP3 to implement the Intel Optane features as part of the Memory Management OC.
**Molecular dynamics**: GROMACS is working closely with the developers of the OCs in MSA, Application Mapping Toolchain, Malleability, and Memory System Performance. The work towards implementing these OCs and integrating them with the DEEP-SEA software stack is carried out in collaboration with WP2-WP5. In addition, GROMACS has been instrumented with the VEF trace tool and has successfully generated and shared the traces with the developers in the RED-SEA project.
**Neutron Monte-Carlo transport for nuclear energy**: In collaboration with WP2, work is ongoing on the GPU and OpenMP offloading of PATMOS. In addition, the outputs of the MSA OC for the CPU-only version of PATMOS are being shared with the developers of Scalasca for feedback and smooth implementation. Feedback will be provided on the Application Mapping Toolchain cycle after its complete implementation. PATMOS is also working with RED-SEA to test the VEF traces. The VEF traces from PATMOS were provided to the RED-SEA team for analysis and feedback.
**Earth system modelling**: In collaboration with WP2 the performance tools (e.g. Score-P, Scalasca, Extrae, and Paraver) and monitoring tools (LLview API) are used. For an automated deployment of TSMP using GitLab runners on the DEEP system, tight interaction was established with the CI teams in WP3 and WP4. The Load Balancing OC is being applied to MPI-OpenMP processes in collaboration with WP4. To use the Malleability OC in TSMP, co-design discussions with WP5 are taking place. Close collaboration with IO-SEA is ongoing to exploit synergies between the TSMP activities in both projects. For example, the team has significantly benefited from the common benchmarking workflow between the two projects. Furthermore, interactions between the IO-SEA ephemeral data services and the MSA OC will be explored, as they are important to improve the TSMP performance on exascale machines.
### Synergies & collaboration with other projects
Common Benchmarking workflow between DEEP-SEA and IO-SEA. Could this be re-used for ADMIRE too?
Topics on which two or more projects are active (e.g. malleability, benchmarking, tracing, or co-design) were identified and discussions in each of them continue within work streams. Within the SEA-projects, examples of joint co-design activities have been already given, such as the use of a common benchmarking strategy, the use of IO-SEA ephemeral services, and the collection of VEF traces on the DEEP-SEA applications, which were provided to the RED-SEA team.
### Work Package 1: Co-design applications
A collaboration between DEEP-SEA and RED-SEA has been established around the use of the VEF trace framework. The WP1 applications have generated traces using the VEF trace tool and shared them with the developers in RED-SEA for further analysis and feedback.
Good progress with all applications.
Are there any synergies with other EuroHPC projects? such as ADMIRE for applications? Could the developments done within ADMIRE be "tested" with a few DEEP-SEA applications?
Seismic imaging: FRTM worked on the malleability and resiliency features in addition to the standard monitoring cycle developed in WP5, and on the interaction of the GPI-Space programming model with Slurm and vice versa with WP3. This would allow Slurm to dynamically shrink or increase the allocated MSA resources.
Any work plan with ADMIRE? (for malleability, shrinking/expanding the allocated MSA resources). Does it overlap with ADMIRE? Is it complementary? How?
Computational fluid dynamics: The application Neko, a variation of Nek5000, is receiving support from WP2 around mpi_f08 and on how to instrument with performance analysis tools and generate profiles.
Similar application to the one used in ADMIRE. Any possibility to increase synergies with ADMIRE?
DEEP-SEA uses EasyBuild; ADMIRE uses Spack.
Could projects learn from each other and agree on best practices?
Run on multiple sites (for instance at BSC)
Could the Benchmarks be used by other EuroHPC projects?
Neko is the next generation of Nek5000. Could the ADMIRE framework be used for Neko? This should not require extensive work, right?
### Work Package 2: Measuring, Modelling, Mapping and Monitoring
Deliverable D2.2 has been submitted in time, with contribution from all WP2 tasks. This also includes a release of new versions of several tools.
## RED-SEA
November 9, 8:30 - 16:35
**Plots in the progress report are not readable!**
The RED-SEA overall goal is to prepare a new-generation European interconnect, capable of powering the EU exascale systems to come, through an economically viable and technologically efficient interconnect, leveraging European interconnect technology (i.e., BXI) associated with standard and mature technology (i.e., Ethernet), previous EU-funded initiatives, such as ExaNeSt, EuroEXA, ECOSCALE (Energy-efficient Heterogeneous COmputing at exascale), Mont-Blanc, the DEEP projects, and the European Processor Initiative (EPI) project, as well as open standards and compatible APIs.
### Work Package 1 Architecture, Co-design, and Performance
RED-SEA Application portfolio:
• NEST (INFN) a simulator for spiking neural network models that focuses on the dynamics, size and structure of neural systems, providing over 50 neuron models and over 10 synapse models. NEST supports parallel execution via an MPI and OpenMP hybrid programming style and provides a Python interface for easier setup and interoperability with codes for further algebraic manipulation and statistical investigation over the simulated network and its dynamics.
• LAMMPS (EXACT) a classical molecular dynamics engine with a focus on materials modelling. It is used widely in several branches of science: solid-state physics, computational chemistry, biophysics and many others.
• SOM (EXACT) self-organising maps are artificial neural networks used in the context of unsupervised machine learning; the goal is to develop a (massively) parallel implementation of this algorithm.
RED-SEA Benchmark portfolio:
• DAW (FORTH) a generator of interesting workloads to stress the network interface capabilities at scale and the QoS capabilities of the interconnect.
• LinkTest (FZJ) a scalable benchmark for point-to-point communications, capable of benchmarking a variety of message-passing implementations, including MPI, IB verbs, PSM2, NVLink (only between GPUs on a single node), UCP (part of UCX) and TCP, among others.
• PCVS (CEA) a validation engine designed to evaluate the offloading capabilities of high-speed networks by running large test bases in a scalable manner, taking advantage of highly parallel environments to reduce their time to result, subsequently improving the project efficiency thanks to a more regular validation process.
VEF Traces tool (UCLM) has been chosen as the framework to get the network traces generated by the applications and benchmarks.
VEF Traces is a framework devoted to capture MPI point-to-point and collective communications (and the dependencies among them) of a certain application and to generate network-oriented traces used to feed a network simulator. The analysis of the traces should provide the network requirements for the applications and co-design recommendations for all the technological WPs, regarding network bandwidth, end-to-end latency, scalability, network topology, network reliability requirements.
The partners involved in T1.1 activities have decided to procure a common hardware platform to run applications and benchmarks to obtain homogeneous and comparable results: the Dibona cluster provided by Atos. The Dibona characteristics are depicted in the hardware testbeds table (Table 3). Since 16/11/2021, the cluster has been up and running and remote access is available for the RED-SEA partners, enabling network traces based on the BXI interconnect since the early stages of the project.
### Work package 2 High Performance Ethernet
The main objective of this WP is to study a high-performance, low-latency bridging solution to Ethernet networks based on the BXI HW and SW.
Also some underspending. More or less justified.
Summary of expected results:
- Develop a hardware bridging solution and associated software modules.
- Optimally integrate Internet Protocol (IP), Ethernet and RoCE (RDMA over Converged Ethernet) traffic over an HPC interconnect, achieving low latency and high message rates.
- Multiply by 4 the bandwidth and the message rate available for each endpoint of the network, by doubling the frequency of the link (up to 200 Gb/s) and by doubling the number of network interfaces for each process (multi-rail).
- Cost reduction of 10K euros per IP router while matching Ethernet bandwidth requirements.
- 9 deliverables will be submitted on time.
#### Deviations
Deliverable D2.3 is shifted by 6 months due to the late start of task 2.2, caused by the non-availability of people with the required technical background and a somewhat underestimated workload. In addition, extending D2.3 by moving its due month to M18, as depicted in the graph below (Figure 6), will allow task 2.1 to report its whole work in the deliverable.
The D2.1, D2.4 and D2.7 deliverables are shifted by 2 months. The main reason for the delay is that the BXI architecture has been deeply reviewed and upgraded after project start in order to support the native Ethernet protocol. This change will significantly increase the performance of Ethernet ports, moving from a 200 Gb/s to a 400 Gb/s Ethernet link to address the latest technology available on the market. It has, however, an impact mainly on the first deliverable D2.1 and its associated milestone 3 of task 2.4: two additional months are needed to be able to extract from the current HAS what is actually required for developing the Ethernet Gateway IP.
As a domino effect, the two subsequent deliverables (D2.4 and D2.7) and associated milestones (6 and 7) of task 2.4 will also be delayed due to the delay of D2.1. These 3 deliverables and their associated milestones are the outcomes of task 2.4 as pictured below.
These delays will not have a major impact on the overall WP2.
### Work package 3 Efficient Network Resource Management
Underspending (mostly ATOS side again)
### Work package 4 Endpoint Functions and Reliability
WP4 deals with functions carried out by the endpoint, around the network. The tasks in this work package span from resiliency features implemented in the network interfaces, the integration with new processors and accelerators, to MPI and in-network compute technologies.
Table 8 – NIC features to be implemented
What is in red? Are there any delays?
Again underspending.
### Work package 5 Dissemination & Exploitation
Mostly relies on DEEP-SEA for communication and dissemination!
D5.3 “Table of Exploitable Results” in M6. We are convinced that exploitation should not remain the exclusive domain of industrial partners: exploitation must be every partner’s business, to achieve a great diversity of exploitations: products, patents, IP portfolio, open-source tools, etc. Therefore, all RED-SEA partners were involved in the identification of the RED-SEA Exploitable Results (ER), and each partner committed to at least one ER.
This resulted in a list of 16 Exploitable Results (see Appendix A in D5.3), which sets the basis for the monitoring of the RED-SEA project’s exploitation.
### Work package 6 Project Management
## IO-SEA
November 10, 2022. 8:30 - 17:00
## Logistics
Hotel Pax from November 6 to November 10.
Need to take an extra night.
### Travel
Frequent Flyer Number: SKEBB628582900
Booking Reference: OGPUSW
Outbound: Oslo Gardermoen - Brussels (BRU), Scandinavian Airlines SK 4743, 06 Nov, 08:15 - 10:20 (07:15), 1PC baggage, flight time 02:05, refreshments for purchase, L / Confirmed.
Return: Brussels (BRU) - Oslo Gardermoen, Scandinavian Airlines SK 4746, 11 Nov, 19:40 - 21:35 (18:40), 1PC baggage, flight time 01:55, refreshments for purchase.