# ITN Gigantic Meeting Day 1
_2022.09.07_
[toc]
## Attendees:
cf. Slides
## News from Each Participant
### INSA:
Involved in a French project on SKA, focusing on dataflow aspects. A PhD student was hired last year.
### TUD:
Goals from last year
* introduce high-level abstraction to manage heterogeneity of astronomy systems.
* Use MLIR-based infrastructure
The model-based approach was not sufficiently justified in last year's proposal.
The EU innovation aspect also needs to be improved (possibly by showing benefits beyond astronomy).
[EVEREST](https://everest-h2020.eu) EU project:
* Based on MLIR - TUD now has experience
* Able to do complex transformation between Models
* Further motivates the use of MLIR as a convergence framework for GIGANTIC
* Exchange with AMD Compiler teams: future architectures are dataflow friendly (manycore-like)
* Recently started Dataflow with MLIR
> Mickaël: TODO check Pedro Ciambra's approach to modelling SDF in MLIR.
>
> Francesca: HiPEAC's Vision – MLIR will be covered in a chapter, to expose evolvable/reconfigurable low-level concerns to compilers.
>
> Jeronimo: There is a dialect for ONNX (stencils/tensors) to directly import into MLIR.
Lingua Franca: Timed Discrete Event MoC.
### UNISS
* Ongoing collab with TUD on HW integration:
Scheduling on HW multithreading.
* Connection of MDC with MLIR (early discussions)
how to use the optimization capabilities of MLIR to work/switch different approximate computing configurations
### UPM - CEI
* Previous call: model-based description > HW (still interested in this line)
* Eduardo de la Torre to retire soon.
* Alfonso & Andres Otero to take over.
* Hot topic:
* Hardware composition
* C -> LLVM -(mapping/scheduling on tile-based archi)-> HW
* Optims: loop unrolling.
* New PhD students on the compiler & HW side (with RISC-V and MLIR to properly exploit parallelism extraction and deployment)
### Fermat
Fermat addresses the bottleneck of data accesses.
Storage is "smartly attached" to the computing in FERMAT's solutions (multiple FPGA-SSD systems).
For compatible applications: speedups 300x and more.
Technological updates since last year.
* More powerful FPGAs
* Current UC: mostly banking
Main concern: how to make HW accessible (especially FERMAT's architecture) and "easy" to use.
* Currently, algorithms need to be rewritten (simple ones can go through HLS tools, but this process is not easily scalable).
FERMAT's solution is a PCIe card; up to 4 can be integrated in a server. The solution has not yet been tested in an HPC system.
### ASTRON
SKA antennas are being built.
Still a couple of years to figure out what computers will be needed exactly.
Astron leads a SKA co-design team, with French and Swiss partners, creating benchmarks to help design the most appropriate system. Benchmarks include the possibility of measuring power consumption.
Use [REFRAME from CSCS](https://reframe-hpc.readthedocs.io/en/stable/)
Project on streaming data processing.
Work with hw-accelerators or GPUs also.
In contact with a prof. in Univ. of Amsterdam that could be added to the consortium.
Potential issue: the project does not cover the full duration of a PhD in the Netherlands (also true in Germany and Spain).
One of the use-case providers.
### BSC
Accepted EU Projects:
* EU Project (start Jan 2023):
SKA use case
MLIR concepts are more and more present for BSC also.
BSC is very interested in MLIR also to explore parallel computations.
Possibility to include Universitat Politècnica de Catalunya (who can deliver PhD diploma). To discuss tomorrow in the slot for double diplomas.
Summary: same role, more effort on MLIR.
### ATOS BULL
Focus on SDP-archi
Involved on the training part on:
* Cluster programming (connect, compile, connect)
* Training on programming languages
Interested in supervising a PhD.
* On parallel programming languages
Can access the latest technology for benchmarking, training, and open access for dedicated
### UPM-CITSEM
Hyperspectral image processing domain for Health.
Used to work with "push-broom" sensors to snapshot cameras.
Recently acquired two servers:
* One with GPUs
* One with a mix of GPUs & FPGAs
Signals from radio telescopes are also hyperspectral images, so processing concerns are very similar to those for medical images.
Objective for this proposal:
* Not only focus on compression of hyperspectral cubes
* But also on how to manage computations for Heterogeneous HW.
## Feedback from 2021
Main Objectives:
* Cut development & Operational costs
* Enable seamless evolution of
* Co-design between astronomers & system architects
grading scale:
* well written/described/whatever < good < satisfactory < excellent
* last year we got 88%; the threshold for being funded was 94%
| Section | Score | Weight |
|:-------------|-----------:|-------:|
|Excellence | 4.3 / 5.00 | 50%|
|Impact | 4.4 / 5.00 | 30%|
|Implementation| 4.7 / 5.00 | 20%|
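
The 88% figure is consistent with a weighted sum of the three section scores. The sketch below checks this, assuming the overall mark is the weighted total divided by the maximum of 5 (the notes do not state the formula; the variable names are ours).

```python
# Scores and weights copied from the table above.
scores = {
    "Excellence": (4.3, 0.50),
    "Impact": (4.4, 0.30),
    "Implementation": (4.7, 0.20),
}
MAX_SCORE = 5.0
THRESHOLD = 0.94  # funding threshold quoted in the notes


def weighted_total(scores):
    """Return the weighted sum of the section scores (out of MAX_SCORE)."""
    return sum(score * weight for score, weight in scores.values())


total = weighted_total(scores)
ratio = total / MAX_SCORE
# Assumption: the threshold applies to the same weighted ratio.
gap = THRESHOLD * MAX_SCORE - total

print(f"weighted total: {total:.2f} / {MAX_SCORE:.0f} ({ratio:.1%})")
print(f"points missing to reach the threshold: {gap:.2f}")
```

Under these assumptions the total is 4.41 / 5 (88.2%), about 0.29 points short of the threshold. Since Excellence carries half of the weight, it is where an improvement pays off most.
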
### Ideas
- Show that PhDs trained in GIGANTIC ESR can innovate in SKA-like LSCIs but also on other types of related topics
- Show some interconnects
- Create a Diagram of exchanges between partners
- We need to motivate the training plan with "skills" needed by the Industry (ASTRON, FERMAT, ATOS)++
- Courses for Researchers in the NL (https://www.esciencecenter.nl/digital-skills/, https://www.esciencecenter.nl/training-materials/)
- devOps (code management, software development methodology, etc.)
- Innovation
- we need to connect to domain-specific trainings (and to what EU expects)
- hardware specialization...
- EU is a lot about edge (near-sensors)
- Sustainable systems
- Erwan: SKA is meant to run for decades; it needs smooth upgrades that do not require stopping/replacing/rebuilding the whole system.
- Train the people capable of running the change
- Jeronimo: Cluster of long-term "sustainable" computing in Dresden
- TU Dresden: long-term activity on monitoring all energy consumption
- Grail of ITN: common PhD degree for the consortium
- Courses available in the Dutch ASCI research school, which is focused on computer science and vision (https://asci.tudelft.nl/courses/)
- Mention "third-party" summer schools: [HiPEAC's ACACES](https://www.hipeac.net/events/#/acaces/), etc., where some members of the consortium have taught in the past (BSC, UNISS).
- ESR "career development plans"
- Issue: Training plan is biased towards the beginning of the project.
- Pierre: be convincing at the very beginning of the document and at the very end. (when the reader makes her/his opinion)
- European Innovation Capacity insufficiently demonstrated:
- Tackle long-term objectives: where will PhD Students be 5yrs after the project.
- Remove barriers to innovation (not only in astronomy -> see remark on applicability of GIGANTIC's knowledge to edge computing)
- Remark Pierre: EU usually mixes conference and journal publications
- Each task should produce at least one publication in a high quality journal.
- > Erwan: Propose a special issue in a journal to publish relevant papers (done in other ITNs Erwan knows). We need to make sure this does not appear as an artificial way to increase the number of publications.
- organize a special journal issue in a well ranked journal (complementary to publications in standard issues of well ranked journals)
- 20 journal publications over the 20 ESR
## Ideas to improve the Excellence/Impact/Implementation Sections
### **KPIs** (!!!) :timer_clock: :microscope: :face_with_thermometer:
#### We need to organize (Eduardo Quiñones)
1. Objectives
- Technological objectives
- portability & composability
- KPI1: integrating a new HW in the LSCI with limited manual modifications (measurement method to be invented)
- KPI2: HW/SW composability: new module does not impact existing modules
- tested on examples measurements
- Use case-related objectives
- SKA developed solutions performance
- Exploitation objectives
- optimized libraries for astronomers
2. KPIs and means of verification
- KPIs need to be SMART: specific, measurable, assignable, realistic and time-related
- performance
- real-time
- cost & value for astronomers
- design productivity
- predictable performance
3. Research activities (tasks) to reach objectives
- Chris: **Cost & value** for a radio astronomy system: [Paper link](https://www.astron.nl/~broekema/papers/ASCOM19/1806.06606.pdf)
- Value of the system?
- Number of papers produced.
- **Energy efficiency / sustainability** (baseline?)
- high expected gains
- CO2 Footprint (Not only during computation, but also production of computing systems)
- depending on load level and lifetime EDA tool from [Alex K. Jones, Pittsburgh U.](https://sites.pitt.edu/~akjones/Alex-K-Jones/Home.html)
- [GreenChip](https://www.sciencedirect.com/science/article/pii/S2210537917300823)
- **Accuracy**
- Accuracy of (co-)simulation
- example of the SST simulation framework for supercomputers
- **Dissemination**
- **Networking**
- general info on community, networking/summer school, following on video/social networks
- Jeronimo: increase networking, influence
- Show the influence of EU funding and how the money is used; counter distrust of the EU
- **PhDs Communication skills**
- less important than dissemination
- Target groups for communication
- The Conversation, the Europe magazine: science popularization
- rely on sustainability and astronomy
* **Development & operational cost**
* "From months to days" is it really measurable?
* Chris: Yes: not ideal, but doable.
* Karol: KPI on automation also: no automation / partial automation / full automation.
* > Edu Q.: Automation is the means, not the KPI
* Edu Q.: Reduce development cost by X.
* Need to do the exercise of implementing the system manually without the developed technology, and then with the developed technology.
* Maxime: [Design Productivity Measure Method](https://hal.archives-ouvertes.fr/hal-01358210/file/samos_hls-design-productivity_hal.pdf)
* KPI must come with a method to measure, a method to compare, and a protocol to reduce biases
* some metrics are measurable but badly representative
* some metrics are representative but not really measurable
* performance is multidimensional
* measuring one's own system capacity tends to create biases in evaluation method
* **System Performance**
* Fulfil Real-time/Energy/... constraints
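
As an illustration of the "reduce development cost by X" KPI discussed above, here is a minimal sketch of how X could be derived from the two implementation exercises (without and with the developed technology). The person-day unit, the class name, and the numbers are hypothetical. This is not the method of the linked design-productivity paper, and it does nothing about the measurement biases noted above.

```python
from dataclasses import dataclass


@dataclass
class EffortMeasurement:
    """Effort spent implementing the same system, in person-days (assumed unit)."""
    manual_days: float    # without the developed technology
    assisted_days: float  # with the developed technology


def cost_reduction(m: EffortMeasurement) -> float:
    """Fraction of development effort saved (negative if the tooling costs more)."""
    if m.manual_days <= 0:
        raise ValueError("manual effort must be positive")
    return 1.0 - m.assisted_days / m.manual_days


# Hypothetical numbers, for illustration only ("from months to days").
example = EffortMeasurement(manual_days=60.0, assisted_days=6.0)
print(f"development cost reduced by {cost_reduction(example):.0%}")
```

A single ratio like this is only meaningful alongside the measurement protocol: who did each implementation, in which order, and with what prior knowledge of the system.
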
## Discussion on HW Models & HW Platforms
**We want to be HW-Agnostic**
- Common interest in RISC-V
- [Xilinx AI engine: specific HW](https://www.xilinx.com/products/technology/ai-engine.html)
- [Signal processing with the Xilinx ACAP AI-engines?, Steven van der Vlugt, ASTRON, in Casper 2022](https://sites.google.com/inaf.it/casper2022/agenda?authuser=0)
- Extension to TensorCore, extremely good for Beamforming, very limited applicability to SKA due to data
- Array of VLIW
- Available ALVEO and Versal platforms at UPM CEI: might be used to explore a combination of AI Engines + dedicated HW on FPGA
- Astron:
- Started experimenting with programmable network interfaces (in-network computing)
- NVidia Mellanox platform
- Jeronimo: methodology depends on the HW constraints; constraints could spice it up
- Atos:
- NIC connected directly to a GPU or an FPGA in NIC (Network Interface Card) (small one, similar to embedded profiles and not HPC)
- Example of low power RISC-V for HPC: ["A RISC-V in-network accelerator for flexible high-performance low-power packet processing"](https://arxiv.org/pdf/2010.03536.pdf)
- GPUs
- Jeronimo: In EVEREST, trying to focus on FPGA and other archi which are less "mainstream"
- particular kernels in computational fluid dynamics (spectral elements): Already known they were not very amenable to GPUs.
- taking a different direction: heterogeneous architectures
- Mickael: "FPGAs and high perf mem have been less studied so we do that"
- Pros of FPGAs/AI engines: architectures are exposed, there is no need for cache coherency, only scratchpads manageable through DMAs if we know the scheduling
Correlator and beamformer papers:
GPU based system design (COBALT) https://doi.org/10.1016/j.ascom.2018.04.006
Earlier BlueGene based systems, including detailed description of the algorithms:
https://www.astron.nl/~broekema/papers/URSIGASS-11/lofar.pdf
https://www.astron.nl/~broekema/papers/PPoPP-10/lofar.pdf
https://www.astron.nl/~broekema/papers/SPAA-06/bgl.pdf
All code used in LOFAR is open source, the correlator beamformer code is here:
https://git.astron.nl/ro/lofar/-/tree/master/RTCP/Cobalt
## Discussion on Model-based design
- We all seem to agree on the use of MLIR as the intermediate representation between some high-level abstraction and implementation.
- Use of model-based approach is insufficiently motivated.
- Potential Argument: Competition is doing that: DASK, DALIUGE
- Potential Arguments: Models of computations provide the basic semantics that:
- enable automating the design flow (through well-defined semantics)
- make it possible to provide guarantees on the described applications (timing, deadlock freedom, memory boundedness, ...)
- One of the motivations can be that the SKA application itself is dataflow
- LinguaFranca from Berkeley (with work from TUD) could be interesting
- reactive model and reacting to events
- once events are there, a graph of dependency is triggered
- no explicit FIFOs yet
- if input signal timings are relevant
- LinguaFranca not yet connected to MLIR --> possible work in progress, but not there yet
- WP2 lowers to WP3 ->
- SKA Applications are not completely static
- Datapath needs to be reconfigured dynamically, data-dependent/triggered
- Data-triggered reconfigurations can be related to waiting for an observation (pulsar)
- need to store huge amount of data on trigger
- Fermat: concentrating on first bricks
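
To make the "guarantees" argument concrete, here is a minimal sketch of one classic analysis available for synchronous dataflow (SDF) graphs: solving the balance equations to obtain a repetition vector. If no solution exists, the graph cannot run periodically in bounded memory. The function name, graph encoding, and example rates are our own assumptions, not taken from any GIGANTIC tool. It requires Python 3.9+ for multi-argument `math.lcm`.

```python
from fractions import Fraction
from math import lcm


def repetition_vector(actors, edges):
    """Solve the SDF balance equations q[src] * prod == q[dst] * cons.

    actors: list of actor names.
    edges:  list of (src, dst, prod, cons) with positive integer token rates.
    Returns the smallest positive integer solution as a dict, or None when
    the graph is inconsistent. Assumes the graph is connected.
    """
    neighbours = {a: [] for a in actors}
    for src, dst, prod, cons in edges:
        neighbours[src].append((dst, Fraction(prod, cons)))
        neighbours[dst].append((src, Fraction(cons, prod)))

    # Propagate relative firing rates from an arbitrary first actor.
    rate = {actors[0]: Fraction(1)}
    stack = [actors[0]]
    while stack:
        current = stack.pop()
        for other, factor in neighbours[current]:
            if other not in rate:
                rate[other] = rate[current] * factor
                stack.append(other)

    # Every edge must satisfy its balance equation, including edges closing cycles.
    for src, dst, prod, cons in edges:
        if rate[src] * prod != rate[dst] * cons:
            return None

    # Scale the rational rates to the smallest integer vector.
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(r * scale) for a, r in rate.items()}


# Hypothetical 3-actor pipeline: A produces 2 tokens, B consumes 3, and so on.
print(repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)]))
# prints {'A': 3, 'B': 2, 'C': 1}
```

Consistency is necessary but not sufficient for deadlock freedom: a complete check also has to account for initial tokens on cycles. The dynamic, data-triggered reconfigurations mentioned above fall outside plain SDF and would need a more expressive model.
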
## DC Program
* A few advisor names need to be confirmed/replaced
* Description of DC programs should match the goals defined at the beginning of the project.
### DC6 - example
* Former title: Hypercube analysis & compression
* needs to be more GIGANTIC oriented
* Proposal: Estimation of the impact of Data compression in the computing continuum.
* Lossy or not lossy (that is the question :skull:)
* KPIs: storage, time, energy
* Link with in network computing
* link with approximate computing (DC2)
### DC 1, (2,) 4, 9
* Put MLIR forward (not necessarily in titles) -> "high-level compilation framework"
* DC9 defines the high level input of the framework
* DC4 translates the high-level (DC9) model into an executable model
* DC1 simulates the model from DC4
# ITN Gigantic Meeting Day 2
_2022.09.08_
* We have changed the order of the agenda, first topic now is Doctoral Schools discussion
* EduJ: UPM does not expect any problem with enrolling external students in graduate courses, but he has to check. However, no official course calendar is published.
* We need to try to have some double diplomas (DD)
* Last year, most DD were with INSA.
* Edu Q.: Can we include UPC as an associated partner (for its doctoral school)?
* Pierre: Yes, not a problem for universities awarding PhDs, but we should avoid adding associated partners for trainings.
* J: Univ. of Amsterdam would have the same role.
* DC4 - Replace Astron with TUD.
* DC6 - Maybe replace TUD w/ ASTRON? - DD between UPM & Univ. of Amsterdam - INSA/Jeff as possible secondment
* DC7 - To be discussed with ABI. Also discuss DC2?
* DC9 - Replace INSA w/ UPC (?)
## Budget :money_with_wings: :money_mouth_face: :moneybag: :bank: :euro:
* All beneficiaries will receive money for DCs.
* Living Allowance depends on the country: fixed by ITN.
* Mobility allowance is fixed
* Family allowance for PhD students with family obligations.
* The coordinator of the project (INSA) receives all the money and distributes it.
* Budget for TUD is well below the established salaries for PhD students. Are they managed as grants (or can they be)? [German costs of a PhD student](https://www.dfg.de/formulare/60_12/v/60_12_-2022-_en.pdf)
* Do the living allowance numbers in the table represent what the student will get, or do they include social charges?
* The coordinator retains 10% of the budget of every beneficiary so as to pay for associated partners (for example, to pay for the summer school)
* Also to pay for possible trips from associated partners (e.g., to the kick-off): each beneficiary covers its own travelling expenses, but those of associated partners are covered by the coordinator (INSA).
* Damien: Pay attention to tuition fees, notably in case of double diplomas
## Excellence Section
## Impact Section
## Implementation Section
## TODOs:
* [ ] ALL: Put the word sustainable forward in the proposal
* [ ] ALL: Contact doctoral schools to identify courses offered by existing organisations that could be proposed to the ESRs of the consortium.
* Dresden: graduate academy but not a unique portfolio of lectures accessible to PhD students
* [ ] ALL: Reference summer/winter schools: CPS, ACACES...
* [ ] Non academic partners: share the perfect skills for junior recruitment
* [ ] ALL: align DC names to the goals they cover
* [ ] Make sure that all rules about supervisors' diploma (Hab./PhDs/Prof. status), from all doctoral schools involved (INSA, UPM, TUD, UPC, Univ. Amsterdam, UNICA?) are respected