# Cambridge Hybrid Modelling Workshop Collaborative Notes
We will use this collaborative space to take notes during the workshop and discussion sessions.
Click the [Edit](https://hackmd.io/@jatkinson1000/ByerJSm5le/edit) button to open a split screen.
You can freely edit the markdown in a collaborative fashion and see the preview on the right hand side. Please be sensible and do not erase the work of others.
Questions from the panel discussion and notes from the breakout groups will be used to produce a summary of the workshop outcomes after the event.
## Contents
- [Key Links](#Key-Links)
- [Attendee contact details](#Attendee-contact-details)
- [Panel Discussion](#Panel-Discussion)
- [Breakout Discussion Notes](#Breakout-session-1)
## Key Links
- [Workshop Website](https://cambridge-iccs.github.io/ml-coupling-workshop)
- [Programme and Talks](https://cambridge-iccs.github.io/ml-coupling-workshop/programme.html)
- [Zoom call](https://cam-ac-uk.zoom.us/j/88605044072?pwd=BsVa5OC1h7goScGohax9coXslYS0AF.1)
## Attendee contact details
Add your contact information for other attendees if desired.
We will not be sharing this more widely beyond this event or using it to contact you personally.
- Jack Atkinson - ICCS Cambridge - https://jackatkinson.net/ and jwa34[AT]cam.ac.uk
- Joe Wallwork - ICCS Cambridge - https://joewallwork.com/ and jw2423[AT]cam.ac.uk
- Matt Graham - Centre for Advanced Research Computing, UCL - https://matt-graham.github.io - m.graham[AT]ucl.ac.uk
- Valentin Churavy - Uni Augsburg/Uni Mainz - vchuravy[AT]uni-mainz.de
- Alan Geer - ECMWF - alan.geer[AT]ecmwf.int
- Maha Badri - Potsdam Institute for Climate Impact Research (PIK) and Technical University of Munich (TUM) - maha.badri[AT]tum.de
## Panel Discussion
Questions for the panel discussion will be collated through slido and moderated by the chair.
To submit a question and vote for others please go to slido.com and use code: 3497047
Or use [this direct link](https://app.sli.do/event/eMCHaRGSsWx1A4rMu6t8kk).
## Breakout session 1
### Coupling interfaces
> We’ve seen various options for coupling ML code to numerical models.
> * What are pros/cons to each?
> * Which suit particular circumstances?
> * How can we accommodate different research objectives?
> * How can we simplify the technical details?
> * How can we standardise approaches across model suites?
Chair: Jack Franklin
Note taker:
Notes:
- Hand-writing Fortran ML - a good learning experience; has been taken further towards a usable library
- Exploratory development (easy with coupling libraries like FTorch)
- Performance of Fortran ML vs other packages
- Dependency on libraries beyond our control (e.g., TorchScript will be deprecated) - see the export sketch after this list
- Spread the word about existing coupling methods -> not every research group has to come up with their own solution
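A minimal Python-side sketch of the step usually needed before a PyTorch network can be called from Fortran through a coupling library such as FTorch: the trained model is serialised to TorchScript, which the Fortran side then loads. The network, its sizes, and the file name are made up for illustration (and this is also where the TorchScript-deprecation risk noted above would bite).

```python
import torch

# Hypothetical trained parameterisation: maps a column of model state
# to a column of tendencies. Any torch.nn.Module would do here.
class ColumnNet(torch.nn.Module):
    def __init__(self, n_levels: int = 60):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(n_levels, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, n_levels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = ColumnNet().eval()

# Trace with a dummy input of the shape the Fortran side will pass,
# then save the TorchScript archive for the coupling layer to load.
example = torch.randn(1, 60)
traced = torch.jit.trace(model, example)
traced.save("column_net.pt")
```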
### Hardware
> GPU acceleration shook up scientific computing and facilitated deep learning.
> * What hardware changes could the future bring and how will they impact current approaches?
> * There are large differences between proof of concept and deployment hardware, how can we incorporate this?
> * What novel solutions already exist?
> * What knock-on impacts/constraints will hardware have that need to be considered by software and ML algorithm implementation?
Chair: Ian McInerney
Note taker: Rob Waters
Notes:
- size of ML very important - small problems should still be on CPU
- DL or large NNs - will need GPUs for inference - training will always be on GPU
- ideal problem size will grow - leaving performance on the table if your problem is too small to maximise use of GPUs
- IFS, ICON-ML and NEMO-VAR - moving to GPU port but slow
- DAWN + Isambard - HPCs with GPUs for research
- Chicken + Egg of porting dynamical models --> need to justify now for potential future ML
- data transfer between architectures will be a big issue - the bottleneck!
- Grace Hopper - unified memory - can help with split need
- Getting access to hardware an issue - need testing resources!
- hard with competitive call for CPU hours
- Scheduler issue - can't request a mix of nodes - say if the dynamical core was on the CPU but components were on GPU
- MPI transferring between GPU and CPU nodes - potential limitation on any speed improvement
- newer MPI systems - GPU connected to InfiniBand, MPI writes directly to GPU memory - **CUDA-aware MPI** (see the sketch after this list)
- each GPU has its own InfiniBand card on modern chips
- NVIDIA cutting float64 - AMD might not
- Would be an issue for UKCA - large range of concentrations
- difference between proof of concept vs operationalised
- different scales - desktop GPU vs HPC
- GPU generations - notable difference in performance
- novel solution
- write output between GPU and CPU
- SmartSim - move information between frameworks - potentially Python could be on GPU
- run separately but use a shared database
- still issues regarding scheduling
- cost of buying an operational GPU-based HPC is a big issue - does it justify the potential speed-ups?
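A hedged sketch of the CUDA-aware MPI point above: with an MPI library built with CUDA support, mpi4py (>= 3.1) can hand GPU arrays (here CuPy) straight to MPI, so transfers can go device to device without staging through host memory. Array size, rank layout, and the availability of a CUDA-aware MPI build on the system are assumptions.

```python
# Run with e.g.: mpirun -n 2 python gpu_transfer.py
from mpi4py import MPI
import cupy as cp

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

n = 1_000_000  # illustrative buffer size

if rank == 0:
    send_buf = cp.arange(n, dtype=cp.float64)
    # Make sure the data is ready on the device before MPI touches it.
    cp.cuda.get_current_stream().synchronize()
    # The GPU buffer is passed to MPI directly; with CUDA-aware MPI this
    # avoids an explicit copy to host memory.
    comm.Send(send_buf, dest=1, tag=0)
elif rank == 1:
    recv_buf = cp.empty(n, dtype=cp.float64)
    comm.Recv(recv_buf, source=0, tag=0)
    print("rank 1 received last value:", float(recv_buf[-1]))
```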
### Differentiable Models & Online training
> Differentiable Models and Online Training is a topic increasingly discussed…
> * How do models need to change to be “differentiable”?
> * What will this involve?
> * Should we strive for all scientific models to be differentiable?
> * Is it a silver bullet?
> * What exactly do we mean by “online training”?
> * What are the benefits online training brings us, and what are the challenges?
Chair: Branwen Snelling
Note taker: Matt Graham
Notes:
- What are people currently using / what are their interests?
- Maha - Julia - land surface model Terrarium.jl https://github.com/TUM-PIK-ESM/Terrarium.jl - goal is to couple with other Julia ESM components
- Nell - interested in differentiable programming from the perspective of a user of Firedrake. Looking at coupling machine learning components and being able to differentiate through the whole coupled model end to end.
- Milan - lead developer of SpeedyWeather - early idea to make it differentiable. Been working with people like Valentin to help guide development to ensure it fits into constraints of AD framework e.g. type stability, allowing array mutation.
- Paul - little previous experience with differentiable programming, here to learn.
- Jack - background in modelling, direction towards using hybrid components.
- Valentin - expertise in compilers, working on differentiable programming in Julia but also other languages for several years. Enzyme in Julia initially scoped for quite specific type-stable code, now trying to generalize to allow for use in wider range of codebases.
- Ben - led team developing a dynamical core GungHo written in Firedrake, rewriting in Fortran for HPC deployment. Mainly interested in this space from perspective of hybrid modelling and how to make codes amenable to use with differentiable components - interfacing with Fortran code, reimplementing components in other paradigms / frameworks like JAX or something else.
- Matt - at UCL. Recently using JAX and building tools for spherical model transforms and differentiability.
- Paul - interested in near-term forcings and how they affect climate change. Small effect - how to detect this, need for looking at sensitivities, calibration of parameters.
- Joe - particle physics background introduced to automatic differentiation via PyTorch - great documentation. Scope for rewriting some model components to allow for data assimilation, parameter calibration. Empowering scientists through building useful tools.
- Julien - developing hybrid version of ICON models. Replacing components with ML surrogates - mainly currently using automatic differentiation only to train ML components in isolation. Longer term - end to end differentiability of a model like ICON would allow parameter calibration of coupled model.
- Alan - working on observation operators in a data assimilation context - these always need to be differentiable to provide tangent-linear and adjoint models for variational data assimilation. 'Automatic' differentiation via a human in the loop - lots of suggestions to ECMWF to employ AD frameworks, but currently find the more manual approach works for them.
- Branwen - systems biology modelling, behaviour of groups of cells - interested in using fully differentiable approaches in this context.
- Milan: If ECMWF is always writing differentiable code to allow for data assimilation, why is this not always employed for parameter estimation?
- Alan: Something which there has been work on for a long time, but some skepticism that it would work well. Technical challenges in terms of high-dimensionality of parameter space, variable sensitivity to different parameters. Lots of legacy code in data assimilation framework - difficult to coordinate all different groups / components (physics, observations, assimilation) to agree on route forward.
- Branwen: Trade-off between writing models from scratch versus updating / adapting existing model codes.
- Milan: SpeedyWeather not originally intended to be differentiable, but intention to allow for generality in numerical precision / types made it natural to then fit within constraints of AD frameworks.
- Valentin: CliMA - initially not interested in differentiability due to previous experience with MITgcm, where achieving differentiability was a large development burden. Now retrospectively adding support for differentiability back in. Dangers of trying to do too many things at once - gets in the way of science. Better to start simple and then add complexity, e.g. support for AD, later. How to verify correctness of gradients?
- Alan: In ECMWF context, correctness verified by looking for consistency between tangent linear and adjoint and whether data assimilation scheme actually works as expected.
- Jack: Some interest from collaborators on making models differentiable. Not written with differentiability in mind from the outset - e.g. in place mutations to arrays, propagating derivatives across timesteps.
- Valentin: Initial goal of Enzyme was to break this paradox - allow differentiating scientific code written with typical paradigms. Generally people have high expectations - differentiating through entire model including time stepper. Tradeoff between differentiating and then discretising versus discretising then differentiating.
- Ben: considered this tradeoff in development of tangent linear and adjoint model.
- Valentin: Trajectories of forward and backward models not guaranteed to be consistent when using adjoint sensitivity analysis approach. :exploding_head:
- Alan: what are the applications in mind for AD? Sometimes alternative approaches for obtaining sensitivities - for example ensemble approaches, fitting emulator and differentiating through that. Do we always need line-by-line differentiation? In ECMWF context they do want this level of granularity.
- Valentin: In long running models, sensitivities to initial state becoming weaker.
- Branwen: relation to distinction between online learning and offline. Sometimes less need for differentiability of whole model.
- Milan: sees differentiability of components as first step. Ideally even if components trained offline, will eventually be trained online as part of end to end model.
- Paul: SuperDroplet talk - need for physical constraints to get consistency when embedded in overall model.
- Valentin: presence of one differentiable component opens up avenues to calibrate this component as part of broader model.
- Julien: ideally would like to do online learning, but typically this would need the overall model to be differentiable rather than a single component. Importance of being able to deal with single components in isolation.
- Branwen: concerns about considering components in isolation from physical plausibility perspective.
- Paul: some modularization already from operator splitting schemes - parameterizations dealt with separately.
- Milan: challenges of the time dimension in the NeuralGCM case - model runs initially unstable / diverging. Need to start at shorter rollouts / lower resolutions. Initially start with optimizing the one-step-ahead problem.
- Julien: similar idea used in training ML component of model, training over shorter time horizons initially.
- Valentin: Active learning in molecular dynamics setting - learning 'curriculums' to train different components at different stages. Possible relation to AutoEmulate talk?
- Clarification around when JAX is useful versus other frameworks.
- Tying in to a specific framework - need to use structured control primitives (see the sketch after this list).
- JAX limitations in requiring shapes and datatypes to be known statically at compile time. Lots of limitations come from the JIT model rather than differentiability per se.
- Enzyme also has its own sharp edges.
- JAX easier to pick up if you have existing experience with scientific Python
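A small sketch of the "structured control primitives" point above: in JAX a time-stepping loop is usually written with `jax.lax.scan` rather than a Python `for` loop, so the whole rollout can be jitted and differentiated end to end. The toy dynamics, step size, and parameter are made up.

```python
import jax
import jax.numpy as jnp

def step(state, _, alpha):
    # One explicit Euler step of a toy relaxation model.
    return state + 0.01 * (-alpha * state), None

def rollout_loss(alpha, init_state, n_steps=100):
    # lax.scan replaces a Python for-loop so the whole trajectory is
    # traceable: it can be jitted and differentiated in one call.
    final, _ = jax.lax.scan(
        lambda s, x: step(s, x, alpha), init_state, xs=None, length=n_steps
    )
    return jnp.mean(final ** 2)

loss_and_grad = jax.jit(jax.value_and_grad(rollout_loss))
loss, dloss_dalpha = loss_and_grad(0.5, jnp.ones(10))
print(loss, dloss_dalpha)
```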
Summary points:
- Being clear with goals initially when writing code. Need to know what you want to do and ideally co-design with AD people.
- Importance of approaches to allow dealing with components in isolation first as part of a journey towards more general differentiability.
- Software lock-in - legacy code bases - difficulty in introducing differentiability. No easy solutions.
- Trade-offs in different frameworks - each implies its own constraints on programming model, some will feel more natural / less obtrusive depending on prior experience.
- Manually implementing derivatives can be a feasible approach with sufficient experience (e.g. observation operators at ECMWF), but it is important to have checks for correctness - for example consistency between the tangent-linear and adjoint models (see the dot-product test sketched below).
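One common form of that correctness check is the dot-product (adjoint) test: for a tangent-linear operator J and its adjoint J^T, the inner products <Jv, w> and <v, J^T w> should agree to round-off. A minimal sketch using JAX's `jvp`/`vjp` on a made-up nonlinear function:

```python
import jax
import jax.numpy as jnp

def f(x):
    # Stand-in nonlinear model component (made up for illustration).
    return jnp.sin(x) * jnp.cumsum(x)

x = jax.random.normal(jax.random.PRNGKey(0), (50,))  # linearisation point
v = jax.random.normal(jax.random.PRNGKey(1), (50,))  # input perturbation
w = jax.random.normal(jax.random.PRNGKey(2), (50,))  # output perturbation

# Tangent-linear product Jv via forward mode; adjoint product J^T w via reverse mode.
_, Jv = jax.jvp(f, (x,), (v,))
_, vjp_fn = jax.vjp(f, x)
(JTw,) = vjp_fn(w)

# Dot-product test: <Jv, w> should equal <v, J^T w> up to round-off.
print(jnp.dot(Jv, w), jnp.dot(v, JTw))
```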
---
## Breakout session 2
### Stability and Uncertainty
> Stability has always been a concern/Achilles’ heel for ML/hybrid models, and uncertainty is an important topic in simulation.
> * Stability offline ⇏ stability online, how can we tackle this?
> * How should we handle sensitivity/ensemble simulation with hybrid models?
> * How can we make components/parameterisations portable between models?
> * How can we calibrate/re-calibrate between different settings?
> * How can we test/evaluate stability of models in a hybrid setting?
Chair: Jack Franklin
Note taker:
Notes:
- Physical constraints that are included in the physical model are not necessarily enforced in the ML parameterisation -> need to be included before feeding back to the GCM (e.g. no humidity -> no cloud cover)
- Option 1: by model design (architecture or loss function)
- Option 2: by post-processing model outputs (both options are sketched after this list)
- Training the ML model with constraints (e.g. by ensuring energy conservation in the loss function) can be harder than ensuring conservation post hoc. Especially if the training data does not observe the constraint.
- Many processes are not deterministic -> stochastic models with calibrated uncertainties
- Simulations with many time steps (e.g. climate) mean errors accumulate quickly (as opposed to data-driven weather models with longer time steps) -> stability required
- Online stability can be different between longer simulations and demo cases (e.g. AMIP experiment vs single-column model)
- Source of instability?
- simulation going beyond the training data range
- Feedback loops
- Solutions? Explore parameter space with reinforcement learning, online training
- How to test and quantify the stability of a hybrid model?
- ML optimises for a given metric
- Look into the cause for failure, maybe physical constraints are missing?
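A hedged sketch of the two constraint-handling options noted above, with made-up variable names: option 1 adds a conservation penalty to the training loss; option 2 projects the prediction back onto the constraint afterwards (e.g. clipping a humidity-like field to be non-negative before it is passed back to the GCM).

```python
import torch

def loss_with_constraint(pred, target, column_mass, lam=0.1):
    # Option 1: penalise violation of a (made-up) column-integrated
    # budget alongside the usual regression loss. A zero column sum of
    # mass-weighted tendencies stands in for "energy/mass conservation".
    mse = torch.mean((pred - target) ** 2)
    budget_violation = torch.mean(torch.sum(pred * column_mass, dim=-1) ** 2)
    return mse + lam * budget_violation

def postprocess(pred):
    # Option 2: enforce the constraint after the fact, before handing
    # the field back to the dynamical core.
    return torch.clamp(pred, min=0.0)
```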
Summary:
- Offline training is probably not enough to prevent instabilities
- How do we incorporate uncertainties from ML into the rest of the hybrid model
- Maybe instabilities can be used as a feedback tool to understand the problem space better?
- Incorporate knowledge of physics into ML - how best to do this?
- Toy models for testing/developing methodologies (climlab)
- Types of instability - ML chaotic physics vs feedback loops in hybrid models
### ML architectures
> We began with simple FCNNs but are now seeing increasingly complex models.
> * What architectures are suited to what sorts of problems?
> * Is the future in pre-trained models?
> * What constraints do numerical models/hardware place on architectures? or vice versa?
Chair:
Note taker:
Notes:
### Research to operations
> The eventual goal is to bring hybrid models to benefit in an operational setting.
> * What needs demonstrating before hybrid models are “trusted” in operation?
> * What differences are there between research and operational deployment, and how can we reduce these differences?
> * What should we be worried about regarding:
> * Optimisations
> * Hardware
> * Research time vs. delayed cost
> * Whose responsibility are these aspects?
Chair:
Note taker: Rob
Notes:
- motivations
- Rob - how do we operationalise PhD ML solutions in UKCA
- Dongxiao Hong - huge amount of data from engine research - can we use ML to solve the issue?
- Alan Xavier - flow field predictions, developed some ML models but not deployed anything operationally
- Ian - industry collaborators, how do we get the ML back to them
- Angela Maylee Iza Wong - how do we determine whether ML is actually an improvement, particularly extreme events (rainfall)
- Joe - FTorch, used by researchers - e.g. ML parameterisation, aim is to get it used by operational centres
- Jack - how do we get PhD/research ML to operations - streamline all the good research to operations
- Alan - the difficulties of operationalising other people's ML solutions - appreciates the testing required to operationalise
- ECMWF
- danger of adding complexity to model with low benefit
- some high level planning but also bottom up (area specific developments)
- 6 months' worth of testing to operationalise
- doesn't always show improvement
- research partition on the HPC
* dual language is a common problem (Python and Fortran)
* ensuring backwards compatibility - known good output tests downstream (industry) - their engineers sign off on changes (see the test sketch after this list)
* interpretability of ML - need to understand why, especially for certification - 5-6 year time scale
* operationalise code as well
* industry connections bring a direct workflow to operationalise
* how much engineering should be done during research - RSE needed to operationalise
* data-driven models are very good at showing improvements on standard metrics, but what about extremes - AIFS, snow in the Sahara
* feedback from end users - but technically
* ONNX - no one currently using it
* layers of wrapper can be problematic - lose control over program behaviour
* optimisations - need to know the systems you run on - architectures change etc
* compare vs observations - integrate the model and validate, will increase trust
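A hedged sketch of the "known good output" testing idea raised above: a small pytest-style regression test that compares a component's output against a stored reference within an agreed tolerance, so downstream users can sign off on changes. The component, file paths, and tolerances are all illustrative.

```python
import numpy as np

def ml_parameterisation(state):
    # Placeholder for the component under test (e.g. an ML surrogate
    # wrapped for the host model); replace with the real call.
    return np.tanh(state) * 0.5

def test_against_known_good_output():
    # Inputs and reference outputs captured from a previously accepted
    # ("known good") run and stored alongside the test suite.
    state = np.load("tests/data/sample_state.npy")
    expected = np.load("tests/data/known_good_output.npy")
    result = ml_parameterisation(state)
    # Tolerances encode how much drift downstream users will accept.
    np.testing.assert_allclose(result, expected, rtol=1e-6, atol=1e-8)
```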
Summary
Motivations
- Moving research back to industrial partners (engine vibrations, flow fields)
- Incorporating new ML parametrisations in climate and weather models
- Getting excellent new PhD work benefitting society in operational contexts
- Top down planning versus bottom-up developments - a mix
Contexts
- Everyone has a larger target system, probably Fortran, into which developments need to be incorporated
- No one is using ONNX
Challenges
- ECMWF - huge amount of testing against forecast scores required (6 months) - it can take years of development to show benefit
- Testing and validation is mostly at the full system level (why is no-one writing component tests?)
- Multiple HPC targets to be supported; optimisation
- Should researchers be burdened with engineering concerns (testing, optimisation) at all?
- Extremes (e.g. weather warnings) are key outputs of the models but how to validate and test? Test coverage
- Different codebases
- Different layers, lack of control for optimisation: Fortran wrapping C++
Benefits
- Operational systems allow comparison to observations
- Feedback from users
- Industry contacts are valuable in guiding research