# Cambridge Hybrid Modelling Workshop Collaborative Notes

We will use this collaborative space to take notes during the workshop and discussion sessions. Click the [Edit](https://hackmd.io/@jatkinson1000/ByerJSm5le/edit) button to open a split screen. You can freely edit the markdown in a collaborative fashion and see the preview on the right-hand side. Please be sensible and do not erase the work of others.

Questions from the panel discussion and notes from the breakout groups will be used to produce a summary of the workshop outcomes after the event.

## Contents

- [Key Links](#Key-Links)
- [Attendee contact details](#Attendee-Contact-Details)
- [Panel Discussion](#Panel-Discussion)
- [Breakout Discussion Notes](#Breakout-Discussions)

## Key Links

- [Workshop Website](https://cambridge-iccs.github.io/ml-coupling-workshop)
- [Programme and Talks](https://cambridge-iccs.github.io/ml-coupling-workshop/programme.html)
- [Zoom call](https://cam-ac-uk.zoom.us/j/88605044072?pwd=BsVa5OC1h7goScGohax9coXslYS0AF.1)

## Attendee contact details

Add your contact information for other attendees if desired. We will not be sharing this more widely beyond this event or using it to contact you personally.

- Jack Atkinson - ICCS Cambridge - https://jackatkinson.net/ and jwa34[AT]cam.ac.uk
- Joe Wallwork - ICCS Cambridge - https://joewallwork.com/ and jw2423[AT]cam.ac.uk
- Matt Graham - Centre for Advanced Research Computing, UCL - https://matt-graham.github.io - m.graham[AT]ucl.ac.uk
- Valentin Churavy - Uni Augsburg/Uni Mainz - vchuravy[AT]uni-mainz.de
- Alan Geer - ECMWF - alan.geer[AT]ecmwf.int
- Maha Badri - Potsdam Institute for Climate Impact Research (PIK) and Technical University of Munich (TUM) - maha.badri[AT]tum.de

## Panel Discussion

Questions for the panel discussion will be collated through Slido and moderated by the chair.
To submit a question and vote for others, please go to slido.com and use code 3497047, or use [this direct link](https://app.sli.do/event/eMCHaRGSsWx1A4rMu6t8kk).

## Breakout session 1

### Coupling interfaces

> We’ve seen various options for coupling ML code to numerical models.
> * What are pros/cons to each?
> * Which suit particular circumstances?
> * How can we accommodate different research objectives?
> * How can we simplify the technical details?
> * How can we standardise approaches across model suites?

Chair: Jack Franklin
Note taker:

Notes:
- Hand-writing Fortran ML is a good learning experience, and has been taken further towards a usable library
- Explorative development (easy with coupling libraries like FTorch)
- Performance of Fortran ML vs other packages
- Dependency on libraries beyond our control (e.g., TorchScript will be deprecated)
- Spread the word about existing coupling methods -> not every research group has to come up with their own solution

### Hardware

> GPU acceleration shook up scientific computing and facilitated deep learning.
> * What hardware changes could the future bring and how will they impact current approaches?
> * There are large differences between proof-of-concept and deployment hardware; how can we incorporate this?
> * What novel solutions already exist?
> * What knock-on impacts/constraints will hardware have that need to be considered by software and ML algorithm implementation?
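Relating to the coupling-interfaces notes above: a minimal, hypothetical sketch (pure Python, all names illustrative) of the pattern that libraries like FTorch enable, where the driver loop calls a parameterisation through one interface so a traditional scheme and an ML surrogate are interchangeable:

```python
def physics_tendency(state):
    """Traditional hand-written parameterisation (illustrative toy: simple decay)."""
    return [-0.1 * s for s in state]

def ml_tendency(state):
    """Stand-in for an ML surrogate; a real one would run inference here."""
    return [-0.1 * s for s in state]

def step(state, tendency_fn, dt=1.0):
    """One explicit Euler step; `tendency_fn` is the coupling point."""
    return [s + dt * t for s, t in zip(state, tendency_fn(state))]

state = [1.0, 2.0, 3.0]
for _ in range(10):
    state = step(state, physics_tendency)  # swap in ml_tendency to couple the ML component
```

Keeping the coupling point to a single callable is what makes explorative development cheap: the surrogate can be swapped in and out without touching the driver.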
Chair: Ian McInerney
Note taker: Rob Waters

Notes:
- Size of the ML model is very important - small problems should still be on CPU
- DL or large NNs will need GPUs for inference; training will always be on GPU
- Ideal problem size will grow - you are leaving performance on the table if your problem is too small to maximise use of GPUs
- IFS, ICON-ML and NEMOVAR are moving to GPU ports, but slowly
- DAWN + Isambard - HPCs with GPUs for research
- Chicken-and-egg problem of porting dynamical models --> need to justify the work now for potential future ML
- Data transfer between architectures will be a big issue - the bottleneck!
- Grace Hopper - unified memory can help with split needs
- Getting access to hardware is an issue - need testing resources! Hard with competitive calls for CPU hours
- Scheduler issue - can't request a mix of nodes, say if the dynamical core was on CPU but components were on GPU
- MPI transfers between GPU and CPU nodes - a potential limitation on any speed improvement
- Newer MPI systems - GPU connected to InfiniBand, MPI writes directly to GPU memory - **CUDA-aware MPI** - each GPU has its own InfiniBand card on modern chips
- NVIDIA cutting float64 support - AMD might not. Would be an issue for UKCA, which has a large range of concentrations
- Difference between proof of concept vs operationalised - different scales - desktop GPU vs HPC
- GPU generations - notable difference in performance
- Novel solution - write output between GPU and CPU - SmartSim moves information between frameworks - potentially Python could be on the GPU, run separately but using a shared database - still issues regarding scheduling
- Cost of buying an operational GPU-based HPC is a big issue - does it justify the potential speed-ups?

### Differentiable Models & Online training

> Differentiable Models and Online Training is a topic increasingly discussed…
> * How do models need to change to be “differentiable”?
> * What will this involve?
> * Should we strive for all scientific models to be differentiable?
> * Is it a silver bullet?
> * What exactly do we mean by “online training”?
> * What are the benefits online training brings us, and what are the challenges?

Chair: Branwen Snelling
Note taker: Matt Graham

Notes:
- What are people currently using / what are your interests?
- Maha - Julia - land surface model Terrarium.jl https://github.com/TUM-PIK-ESM/Terrarium.jl - goal is to couple with other Julia ESM components
- Nell - interested in differentiable programming from the perspective of a user of Firedrake. Looking at coupling machine learning components and being able to differentiate through the whole coupled model end to end.
- Milan - lead developer of SpeedyWeather - early idea to make it differentiable. Has been working with people like Valentin to help guide development to ensure it fits the constraints of AD frameworks, e.g. type stability, allowing array mutation.
- Paul - little previous experience with differentiable programming, here to learn.
- Jack - background in modelling, direction towards using hybrid components.
- Valentin - expertise in compilers, working on differentiable programming in Julia but also other languages for several years. Enzyme in Julia was initially scoped for quite specific type-stable code; now trying to generalize to allow use in a wider range of codebases.
- Ben - led the team developing the GungHo dynamical core, written in Firedrake and being rewritten in Fortran for HPC deployment. Mainly interested in this space from the perspective of hybrid modelling and how to make codes amenable to use with differentiable components - interfacing with Fortran code, reimplementing components in other paradigms/frameworks like JAX or something else.
- Matt - at UCL. Recently using JAX and building tools for spherical model transforms and differentiability.
- Paul - interested in near-term climate forcers and how they affect climate change. Small effect - how to detect this; need for looking at sensitivities, calibration of parameters.
- Joe - particle physics background; introduced to automatic differentiation via PyTorch - great documentation. Scope for rewriting some model components to allow for data assimilation and parameter calibration. Empowering scientists through building useful tools.
- Julien - developing a hybrid version of the ICON models. Replacing components with ML surrogates - currently mainly using automatic differentiation only to train ML components in isolation. Longer term, end-to-end differentiability of a model like ICON would allow parameter calibration of the coupled model.
- Alan - working on observation operators in a data assimilation context - these always need to be differentiable to provide tangent linear and adjoint models for variational data assimilation. 'Automatic' differentiation via a human in the loop - lots of suggestions to ECMWF to employ AD frameworks, but they currently find the more manual approach works for them.
- Branwen - systems biology modelling, behaviour of groups of cells - interested in using fully differentiable approaches in this context.
- Milan: If ECMWF is always writing differentiable code to allow for data assimilation, why is this not also employed for parameter estimation?
- Alan: Something there has been work on for a long time, but some skepticism that it would work well. Technical challenges in terms of the high dimensionality of parameter space and variable sensitivity to different parameters. Lots of legacy code in the data assimilation framework - difficult to coordinate all the different groups/components (physics, observations, assimilation) to agree on a route forward.
- Branwen: Trade-off between writing models from scratch versus updating/adapting existing model codes.
- Milan: SpeedyWeather was not originally intended to be differentiable, but the intention to allow generality in numerical precision/types made it natural to then fit within the constraints of AD frameworks.
- Valentin: CliMA - initially not interested in differentiability due to previous experience with MITgcm, where achieving differentiability was a large development burden. Now retrospectively adding support for differentiability back in. Dangers of trying to do too many things at once - it gets in the way of the science. Better to start simple and then add complexity, e.g. support for AD, later. How to verify correctness of gradients?
- Alan: In the ECMWF context, correctness is verified by looking for consistency between the tangent linear and adjoint, and by checking whether the data assimilation scheme actually works as expected.
- Jack: Some interest from collaborators in making models differentiable. Not written with differentiability in mind from the outset - e.g. in-place mutations of arrays, propagating derivatives across timesteps.
- Valentin: The initial goal of Enzyme was to break this paradox - allow differentiating scientific code written with typical paradigms. Generally people have high expectations - differentiating through the entire model including the time stepper. Trade-off between differentiating and then discretising versus discretising then differentiating.
- Ben: considered this trade-off in the development of the tangent linear and adjoint model.
- Valentin: Trajectories of forward and backward models are not guaranteed to be consistent when using the adjoint sensitivity analysis approach. :exploding_head:
- Alan: What are the applications in mind for AD? There are sometimes alternative approaches for obtaining sensitivities - for example ensemble approaches, or fitting an emulator and differentiating through that. Do we always need line-by-line differentiation? In the ECMWF context they do want this level of granularity.
- Valentin: In long-running models, sensitivities to the initial state become weaker.
- Branwen: Relation to the distinction between online and offline learning. Sometimes there is less need for differentiability of the whole model.
- Milan: Sees differentiability of components as a first step.
Ideally, even if components are trained offline, they will eventually be trained online as part of the end-to-end model.
- Paul: SuperDroplet talk - need for physical constraints to get consistency when embedded in the overall model.
- Valentin: The presence of one differentiable component opens up avenues to calibrate that component as part of the broader model.
- Julien: Would ideally like to do online learning, but the typical response is that this would need the overall model to be differentiable rather than a single component. Importance of being able to deal with single components in isolation.
- Branwen: Concerns about considering components in isolation, from a physical-plausibility perspective.
- Paul: Some modularization already comes from operator-splitting schemes - parameterizations are dealt with separately.
- Milan: Challenges of the time dimension in the NeuralGCM case - model runs were initially unstable/diverging. Need to start at shorter rollouts/lower resolutions. Initially start by optimizing the one-step-ahead problem.
- Julien: A similar idea is used in training the ML component of the model - training over shorter time horizons initially.
- Valentin: Active learning in the molecular dynamics setting - learning 'curriculums' to train different components at different stages. Possible relation to the AutoEmulate talk?
- Clarification around when JAX is useful versus other frameworks:
  - Tying in to a specific framework - need to use structured control primitives.
  - JAX limitations in requiring shapes and datatypes to be known statically at compile time. Lots of limitations come from the JIT model rather than differentiability per se.
  - Enzyme also has its own sharp edges.
  - JAX is easier to pick up if you have existing experience with scientific Python.

Summary points:
- Be clear about goals initially when writing code. Need to know what you want to do, and ideally co-design with AD people.
- Importance of approaches that allow dealing with components in isolation first, as part of a journey towards more general differentiability.
- Software lock-in - legacy codebases make it difficult to introduce differentiability. No easy solutions.
- Trade-offs in different frameworks - each implies its own constraints on the programming model; some will feel more natural/less obtrusive depending on prior experience.
- Manually implementing derivatives can be a feasible approach with sufficient experience (e.g. observation operators at ECMWF), but it is important to have checks for correctness - for example, consistency between the tangent linear and adjoint model.

---

## Breakout session 2

### Stability and Uncertainty

> Stability has always been a concern/Achilles’ heel for ML/hybrid models, and uncertainty is an important topic in simulation.
> * Stability offline ⇏ stability online, how can we tackle this?
> * How should we handle sensitivity/ensemble simulation with hybrid models?
> * How can we make components/parameterisations portable between models?
> * How can we calibrate/re-calibrate between different settings?
> * How can we test/evaluate stability of models in a hybrid setting?

Chair: Jack Franklin
Note taker:

Notes:
- Physical constraints that are included in the physical model are not necessarily enforced in the ML parameterisation -> they need to be enforced before feeding back to the GCM (e.g. no humidity -> no cloud cover)
  - Option 1: by model design (architecture or loss function)
  - Option 2: by post-processing model outputs
- Training the ML model with constraints (e.g. by encouraging energy conservation in the loss function) can be harder than ensuring conservation post hoc, especially if the training data does not observe the constraint
- Many processes are not deterministic -> stochastic models with calibrated uncertainties
- Simulations with many time steps (e.g. climate) mean errors accumulate quickly (as opposed to data-driven weather models with longer time steps) -> stability required
- Online stability can differ between longer simulations and demo cases (e.g. an AMIP experiment vs a single-column model)
- Sources of instability?
  - Simulation going beyond the training data range
  - Feedback loops
- Solutions? Explore the parameter space with reinforcement learning, online training
- How do we test and quantify the stability of a hybrid model? ML optimises for a given metric
- Look into the cause of failure - maybe physical constraints are missing?

Summary:
- Offline training is probably not enough to prevent instabilities
- How do we incorporate uncertainties from ML into the rest of the hybrid model?
- Maybe instabilities can be used as a feedback tool to understand the problem space better?
- Incorporate knowledge of physics into ML - how best to do this?
- Toy models for testing/developing methodologies (climlab)
- Types of instability - ML chaotic physics vs feedback loops in hybrid models

### ML architectures

> We began with simple FCNNs but are now seeing increasingly complex models.
> * What architectures are suited to what sorts of problems?
> * Is the future in pre-trained models?
> * What constraints do numerical models/hardware place on architectures? Or vice versa?

Chair:
Note taker:

Notes:

### Research to operations

> The eventual goal is to bring hybrid models to benefit in an operational setting.
> * What needs demonstrating before hybrid models are “trusted” in operation?
> * What differences are there between research and operational deployment, and how can we reduce these differences?
> * What should we be worried about regarding:
>   * Optimisations
>   * Hardware
>   * Research time vs. delayed cost
> * Whose responsibility are these aspects?
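The notes below mention "known good output tests" for industrial sign-off. As a hedged illustration (all names, values and tolerances are invented for the sketch), such a golden-output regression check can be as simple as comparing a scalar diagnostic from a model run against a value recorded from a trusted run:

```python
def model_diagnostic(x):
    """Stand-in for an expensive model run reduced to one scalar diagnostic."""
    return sum(v * v for v in x) ** 0.5

# Value recorded from a previously trusted ("known good") run of the same case.
GOLDEN = {"case_small": 3.7416573867739413}

def check_against_golden(case, inputs, rtol=1e-9):
    """Fail loudly if a code change alters the diagnostic beyond tolerance."""
    got = model_diagnostic(inputs)
    want = GOLDEN[case]
    assert abs(got - want) <= rtol * abs(want), f"{case}: {got} != {want}"

check_against_golden("case_small", [1.0, 2.0, 3.0])
```

The tolerance choice is the hard part in practice: too tight and every compiler or hardware change fails; too loose and real regressions slip through.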
Chair:
Note taker: Rob

Notes:
- Motivations:
  - Rob - how do we operationalise PhD ML solutions in UKCA?
  - Dongxiao Hong - huge amount of data from engine research - can we use ML to solve the issue?
  - Alan Xavier - flow-field predictions; has developed some ML models but not deployed anything operationally
  - Ian - industry collaborators; how do we get the ML back to them?
  - Angela Maylee Iza Wong - how do we determine whether ML is actually an improvement, particularly for extreme events (rainfall)?
  - Joe - FTorch, used by researchers - e.g. ML parameterisation; the aim is to get it used by operational centres
  - Jack - how do we get PhD/research ML to operations - streamline all the good research to operations
  - Alan - the difficulties of operationalising other people's ML solutions - appreciates the testing required to operationalise
- ECMWF - danger of adding complexity to the model with low benefit - some high-level planning but also bottom-up (area-specific developments) - 6 months' worth of testing to operationalise - doesn't always show improvement - research partition on the HPC
- Dual language is a common problem (Python and Fortran)
- Ensuring backwards compatibility - known-good output tests downstream (industry) - their engineers sign off on changes
- Interpretability of ML - need to understand why, especially for certification - 5-6 year timescale
- Operationalise code as well
- Industry connections bring a direct workflow to operationalise
- How much engineering should be done during research? RSEs needed to operationalise
- Data-driven models are very good at showing improvements on standard metrics, but what about extremes? - AIFS, snow in the Sahara
- Feedback from end users - but technically
- ONNX - no one currently using it
- Layers of wrappers can be problematic - you lose control over program behaviour
- Optimisations - need to know the systems you run on - architectures change etc.
- Compare vs observations - integrate the model and validate; this will increase trust

Summary:

Motivations
- Moving research back to industrial partners (engine vibrations, flow fields)
- Incorporating new ML parametrisations in climate and weather models
- Getting excellent new PhD work benefitting society in operational contexts
- Top-down planning versus bottom-up developments - a mix

Contexts
- Everyone has a larger target system, probably Fortran, into which developments need to be incorporated
- No one is using ONNX

Challenges
- ECMWF - huge amount of testing against forecast scores required (6 months) - it can take years of development to show benefit
- Testing and validation is mostly at the full-system level (why is no one writing component tests?)
- Multiple HPC targets to be supported; optimisation
- Should researchers be burdened with engineering concerns (testing, optimisation) at all?
- Extremes (e.g. weather warnings) are key outputs of the models, but how to validate and test? Test coverage
- Different codebases
- Different layers, lack of control for optimisation: Fortran wrapping C++

Benefits
- Operational systems allow comparison to observations
- Feedback from users
- Industry contacts are valuable in guiding research
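As a closing illustration of the correctness check raised in the differentiability discussion (consistency between tangent linear and adjoint): the standard "dot-product test" verifies that, for a linear operator A, ⟨Ax, y⟩ = ⟨x, Aᵀy⟩ for random x and y. A minimal, hypothetical sketch with a fixed 2x2 map:

```python
import random

def tlm(x):
    """Tangent linear model: here just a fixed 2x2 linear map (illustrative)."""
    return [2.0 * x[0] + 1.0 * x[1],
            0.5 * x[0] - 3.0 * x[1]]

def adj(y):
    """Hand-coded adjoint: the transpose of the map above."""
    return [2.0 * y[0] + 0.5 * y[1],
            1.0 * y[0] - 3.0 * y[1]]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

# Dot-product test: <tlm(x), y> must equal <x, adj(y)> up to rounding.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(2)]
y = [random.gauss(0, 1) for _ in range(2)]
assert abs(dot(tlm(x), y) - dot(x, adj(y))) < 1e-12
```

A hand-coded adjoint that fails this test for some random pair has a bug; passing it for several pairs is the usual cheap sanity check before trusting gradients in a data assimilation or calibration loop.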
