# 'Crafting Digital Twins' Seminar Series
:::info
**Today's Topic: Emulators for Scaling-up Simulations**
Zoom Link: https://turing-uk.zoom.us/j/96429763451?pwd=SUpaR01NSi8wVzduZUkvRlNjSmY3Zz09&from=addon
:::
**What?** [TRIC-DT Seminar Series](https://github.com/alan-turing-institute/tric-dt/tree/main/Seminars) is an interdisciplinary platform for [TRIC-DT](https://www.turing.ac.uk/research/research-projects/tric-dt) researchers & friends to **share and discuss the computational methods, algorithms, and models that underpin Digital Twin technology across diverse fields**.
**When?** Thursday afternoons (aiming for 1 or 2pm) Europe/London
**Who?** ***Everyone** interested in digital twin research*
***All questions, comments, and recommendations are welcome!***
---
## Welcome, introduce yourself!
**Name + Pronouns + your affiliation + your research area + an emoji ([emoji cheatsheet](https://github.com/ikatyang/emoji-cheat-sheet/blob/master/README.md))**
*(Remember that this is a public document. You can use a pseudonym if you'd prefer.) Use the 🤫 emoji if you would not like to be included in our public archive*
* [name] + [pronoun] + [affiliation] + [health/infrastructure/natural environment/other?]
* Sophie Arana + she/her + Turing + Innovation & Impact Hub!
* Chris Burr + he/him + Alan Turing Institute + Innovation & Impact Hub!
* James Byrne + he/him + British Antarctic Survey + Infrastructure / Natural Environment
* Martin Stoffel + he/him + Alan Turing Institute / Research Engineering Group
* Kalle Westerling + he/they + Turing Institute / Research Application Manager, TRIC-DT Innovation and Impact Hub
* Cristobal Rodero + he/they + Imperial College London + Health
* Nick Barlow + he/him + Turing REG
* Scott Hosking + Turing + BAS + Natural Environment
---
### Seminar Schedule
| Date | Time | Topic | Room | Speakers |
|------------|-------|-------|--------|-------------------|
| 2023-11-23 | 13:00 | [Emulators for Scaling-up Simulations](https://github.com/alan-turing-institute/tric-dt/issues/12) | Remote (Zoom) | [Marina Strocchi](https://kclpure.kcl.ac.uk/portal/en/persons/marina-strocchi) & [Martin Stoffel](https://www.turing.ac.uk/people/research-engineering/martin-stoffel) |
| 2023-12-14 | 14:00 | Research Roundup | Margaret Hamilton Meeting Room | all |
| 2024-01-?? | 13:00 | tbc | tbc | tbc |
| 2024-02-?? | 13:00 | Data ingress | Remote (Zoom) | tbc |
| 2024-03-28 | 13:00 | Dynamic Knowledge Graphs | Remote (Zoom) | [Xiaoxue Shen](https://digitwin.ac.uk/team/xiaoxue-shen/) |
## Agenda
| Time | Activity |
| ---- | -------- |
| 10 mins | Welcome & Updates |
| 20 mins | Gaussian Processes Emulators for Cardiac Modelling Applications |
| 15 mins | autoemulate - A Python package for making emulation easy |
| 10 mins | Q&A and discussion |
| 5 mins | Questions to audience |
| 5 mins | 👋 Close |
## Q&A
### Questions for everyone
_Have you used an emulator? add :+1: or :-1:_
* Yes :+1: :+1:
* No +1 +1
_If you have used emulators before, what model did you use and what was the biggest challenge in the process?_
* [name=X]
* [name=X]
_Are there any models you can imagine being useful as emulators from your field?_
* [name=X]
* [name=X]
### Questions for Marina
_Gaussian Processes Emulators for Cardiac Modelling Applications_
_Abstract: Computational models of the heart offer a non-invasive tool to link molecular processes to whole-heart function. AI, particularly Gaussian processes emulators (GPEs), helps reduce the cost of these simulations. I'll showcase the GPEs used in our research with some practical examples._
_add your questions below. Add "+1" to upvote_
* +1 Regarding the global sensitivity analysis, each input parameter x will have a different effect on the output y, is there a general way of normalising the dy/dx? And how to make sure it is a "global" one, will it cover the whole range of all the input parameters?
* Marina shared slides mentioned during her talk: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011257
### Questions for Martin
_Autoemulate_
_Abstract: Simulations can be slow and compute-intensive, which is why we often train emulator models for real-world applications and research. Our Python package, autoemulate, automatically evaluates a variety of models to find the best fit. I'll be discussing the package's current state and future plans._
_add your questions below. Add "+1" to upvote_
## Notes
* The call is recorded, but only for internal purposes
* TRIC-DT
* Research and Innovation Cluster in Digital Twins -- advancing the science of digital twinning
* operating in an interdisciplinary way across 3 themes
* I&I hub = convenes science across themes, but also production of open and computational tools through events
* We're RAMs, RCMs, and REG members.
* RAM helps researchers maximise impact + value for their research products
* Seminar series
* meant to be a knowledge-share space + space for connection and potential future collaborations. You're here to geek out!
* More information is available here: https://github.com/alan-turing-institute/tric-dt
* Chime into the issues with suggested future topics -- don't forget to tag with "seminar series": https://github.com/alan-turing-institute/tric-dt/issues?q=is:issue+is:open+label:%22seminar+series%22
* Upcoming seminars around knowledge graphs, and data ingress (about relational databases)
* Important: **everyone shares 1-2 minutes about their research at the Research Roundup in December** = in-person event with delicious snacks!
* Interested in joining on the 14th of December?
* xshen@turing.ac.uk
* Cristóbal Rodero Gómez
* Marina Strocchi
* Jose Alonso Solis Lemus
* Shahrokh Rahmani
* Cesare Corrado
* Abdul Qayyum
* Ziad Georges Ghauch
* Glossary
* terms that will be good to know to follow along in the talks.
* Gaussian Processes Emulators for Cardiac Modelling Applications (Marina Strocchi)
* ![image](https://hackmd.io/_uploads/HkNOe02Np.png)
* Lots of collaborators:
*
* What are emulators?
* ![image](https://hackmd.io/_uploads/SknAgRnV6.png)
* Simulator = links model inputs to outputs. In the context of this presentation, it takes many hours to solve.
* Emulator = also links model inputs to outputs, but does so through a statistical model.
* The emulators we use: Gaussian processes (GP)
* ![image](https://hackmd.io/_uploads/HJZxWAhNa.png)
* QR code takes you to code.
* The GPs have scalar outputs, so we train a separate emulator for each output independently.
* The emulators used create a linear model of the input parameters and
* Training emulators
* ![image](https://hackmd.io/_uploads/B1FSZChN6.png)
* For the validation datasets, we compute the coefficient of determination (R²).
* ![image](https://hackmd.io/_uploads/H1rwZAnEa.png)
* ![image](https://hackmd.io/_uploads/ry5wZR3Na.png)
* Graphic representation gives good idea of what's going on:
* ![image](https://hackmd.io/_uploads/H1Wt-0nE6.png)
* Red: observations (what we're trying to mimic)
* Black: prediction
* Gray: uncertainty
* ![image](https://hackmd.io/_uploads/BJpcW0nNa.png)
* Speeding up our problem-solving
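
The workflow above (train a GP on simulator runs, validate with the coefficient of determination, then predict with uncertainty) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the code behind the QR link: the toy `simulator`, the squared-exponential kernel, and the fixed hyperparameters are all assumptions. A real emulator would fit its hyperparameters to the training data.

```python
import numpy as np

def simulator(x):
    """Hypothetical stand-in for an expensive simulator (scalar output)."""
    return np.sin(3.0 * x[:, 0]) + 0.5 * x[:, 1] ** 2

def rbf_kernel(a, b, length_scale=0.5, variance=1.0):
    """Squared-exponential covariance between two sets of points."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_fit(x_train, y_train, noise=1e-6):
    """Precompute the quantities needed for predictions (GP on centred outputs)."""
    y_mean = y_train.mean()
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    return {"x": x_train, "chol": chol, "alpha": alpha, "y_mean": y_mean}

def gp_predict(model, x_new):
    """Return the predictive mean and standard deviation at x_new."""
    k_star = rbf_kernel(x_new, model["x"])
    mean = k_star @ model["alpha"] + model["y_mean"]
    v = np.linalg.solve(model["chol"], k_star.T)
    var = rbf_kernel(x_new, x_new).diagonal() - (v**2).sum(axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, size=(60, 2))   # training inputs
x_val = rng.uniform(-1, 1, size=(200, 2))    # held-out validation inputs
model = gp_fit(x_train, simulator(x_train))
mean, std = gp_predict(model, x_val)
r2 = r2_score(simulator(x_val), mean)
print(f"validation R^2 = {r2:.3f}")
```

The `std` array plays the role of the gray uncertainty band in the plots: it shrinks near training points and grows away from them.
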
* Sensitivity analysis
* Which model inputs affect the outputs of interest?
* Many ways to do it
* one at a time approach.
* ![image](https://hackmd.io/_uploads/ryKpbRhN6.png)
* variance-based approach
* ![image](https://hackmd.io/_uploads/BkQJfRhNa.png)
* more expensive but more accurate
* that's what we go for.
* ![image](https://hackmd.io/_uploads/B1ogM0nNT.png)
* How to?
* Start with sampling the inputs, then run the (expensive) simulator on those samples. Once we have the training data, we train emulators that we can use instead of the expensive simulator. We use the trained emulators to make predictions quickly.
* ![image](https://hackmd.io/_uploads/BJu7GRn4a.png)
* This gives us two things:
* ![image](https://hackmd.io/_uploads/rkvNMC2Vp.png)
* The darker the rectangle, the stronger the correlation
* ![image](https://hackmd.io/_uploads/rJorGA2NT.png)
* This lets us exclude parameters that are of little importance.
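
A variance-based sensitivity analysis of this kind can be sketched as below. The notes do not say which estimator was used, so this sketch assumes the common "pick-freeze" sampling scheme with Saltelli-style first-order and Jansen-style total-order estimators. The `emulator` function is a hypothetical stand-in for a trained emulator, and the 0.01 exclusion threshold is arbitrary.

```python
import numpy as np

def emulator(x):
    """Hypothetical cheap emulator; the third input has no effect on purpose."""
    return 1.0 * x[:, 0] + 2.0 * x[:, 1] + 0.0 * x[:, 2]

def sobol_indices(f, n_inputs, n_samples=100_000, seed=0):
    """Estimate first-order and total-order Sobol indices for inputs in [0, 1]."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(size=(n_samples, n_inputs))
    b = rng.uniform(size=(n_samples, n_inputs))
    f_a, f_b = f(a), f(b)
    both = np.concatenate([f_a, f_b])
    centre, total_var = both.mean(), both.var()
    f_a, f_b = f_a - centre, f_b - centre  # centring reduces estimator noise
    first, total = np.empty(n_inputs), np.empty(n_inputs)
    for i in range(n_inputs):
        ab = a.copy()
        ab[:, i] = b[:, i]               # A with column i taken from B
        f_ab = f(ab) - centre
        first[i] = np.mean(f_b * (f_ab - f_a)) / total_var
        total[i] = 0.5 * np.mean((f_a - f_ab) ** 2) / total_var
    return first, total

first, total = sobol_indices(emulator, n_inputs=3)
keep = total > 0.01  # arbitrary threshold for "important enough to keep"
print("first-order:", first.round(2))
print("total-order:", total.round(2))
print("keep input? ", keep)
```

Each index needs many model evaluations, which is why a fast emulator is used in place of the simulator here.
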
* History matching (parameter inference)
* Important in terms of building digital twin.
* ![image](https://hackmd.io/_uploads/HyTDz02VT.png)
* Not trying to find perfect fit, but set of parameters that gives us acceptable outputs for clinical data.
* For each output, we need to define a mean target value (from clinical data or the literature) and a standard deviation on the experimental data. The standard deviation matters for the fit: in healthcare it can reflect physiological variability, measurement error, human error...
* We also need the emulators' predictions and uncertainty.
* Then, for each point in parameter space, we can calculate an implausibility measure:
* ![image](https://hackmd.io/_uploads/BkTazCnNa.png)
* Defining cut-off for plausible / implausible points. implausible = no physiological output, not applicable to output data.
* Iterative process, refining the emulators, hoping uncertainty will go to 0. (In practice, it won't but we want it to be very small...)
* How to?
* Draw initial training samples for the parameters and run the simulator on them to train the emulators. Then we can do denser sampling, using quick emulator predictions to compute the implausibility measure and divide these points into plausible and implausible.
* Next iteration, we take some samples from plausible area, and run simulations on them again, to be able to retrain emulators, making them more accurate. We also build new test datasets, within the plausible area, trying to refine it. Why this focus? We don't care about the points that are too far from the data we're trying to fit.
* We can then use the emulators to get model predictions in the test points. Then compute the implausibility measure again. Then we start over again. You stop when you're happy with model predictions or you don't see improvement in emulators accuracy or
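
One wave of the history-matching procedure described above can be sketched as follows. The implausibility used here is the standard form from the history-matching literature (prediction minus target, divided by the combined emulator and observation standard deviation), which is assumed, not confirmed, to match the formula on the slide. The `toy_emulator`, the target values, and the cut-off of 3 are illustrative assumptions.

```python
import numpy as np

def implausibility(em_mean, em_std, target_mean, target_std):
    """Worst-case implausibility over outputs for each candidate point.

    em_mean, em_std: shape (n_points, n_outputs), from the emulators.
    target_mean, target_std: shape (n_outputs,), from data or literature.
    """
    per_output = np.abs(em_mean - target_mean) / np.sqrt(em_std**2 + target_std**2)
    return per_output.max(axis=1)

def toy_emulator(x):
    """Hypothetical emulator returning mean and std for two outputs."""
    mean = np.column_stack([x[:, 0] + x[:, 1], x[:, 0] * x[:, 1]])
    std = np.full_like(mean, 0.05)
    return mean, std

rng = np.random.default_rng(1)
candidates = rng.uniform(0.0, 2.0, size=(5_000, 2))  # dense sampling
em_mean, em_std = toy_emulator(candidates)

target_mean = np.array([2.0, 1.0])  # made-up "observed" outputs
target_std = np.array([0.1, 0.1])   # made-up observation uncertainty

scores = implausibility(em_mean, em_std, target_mean, target_std)
cutoff = 3.0  # a commonly used threshold; the value on the slide may differ
plausible = candidates[scores < cutoff]
print(f"{len(plausible)} of {len(candidates)} candidates remain plausible")
```

In a full run, a subset of `plausible` would be passed back to the simulator to retrain the emulators, and the step would repeat until the stopping criteria are met.
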
* Applications
* Cardiac modelling
* very complex
* The processes that result in our heart beating go from small proteins in cardiac cells
* ![image](https://hackmd.io/_uploads/BkmCmA3N6.png)
* Anything can go wrong across all those dimensions.
* Difficult for clinicians to understand what's going on, and therefore to know what is the best treatment.
* We want to fill the gap with cardiac DTs.
* Why is it important? Cardiac diseases = one of the leading causes of death in the developed world. Clinical decisions are difficult: they are based on large clinical trials in which heterogeneous populations are tested, but each patient is different. We want to deliver precision medicine: tailoring treatment to the patient by building a DT of the heart.
* Which parts of the heart are relevant?
* ![image](https://hackmd.io/_uploads/BJld4C3VT.png)
* Breaking down problem into components. We took cellular scale, looked at all parts individually, understanding sensitivity analysis and history matching:
* ![image](https://hackmd.io/_uploads/BkG5V0hNT.png)
* {Example: how the cell excites and relaxes, based on 29 parameters = we wanted to get rid of as many of them as possible. Used emulators to do this. We ended up excluding 19 of 29 parameters, reducing computational cost.}
* ![image](https://hackmd.io/_uploads/Bk-xSA2Np.png)
* On the ten remaining parameters, we ran history matching:
* ![image](https://hackmd.io/_uploads/SJ--HR2VT.png)
* After matching:
* ![image](https://hackmd.io/_uploads/B1CGHR3Na.png)
* Compared to data that we wanted to match (in black) = history matching is doing what we want it to do:
* ![image](https://hackmd.io/_uploads/SkzNB0nNp.png)
* Final result reduced computational cost hugely:
* ![image](https://hackmd.io/_uploads/ry7LrRhVT.png)
* Autoemulate (Martin Stoffel)
* A Python package for making emulation easy
* Emulation is not an easy process, and you need specialised tools. You can bring a simulation from your own field, and autoemulate helps you build an emulator for it.
* Collaborating now with Eric and Steve. A young project, still.
* Why do we need emulators?
* Simulations are slow and expensive.
* Prediction and sensitivity analysis
* optimisation
* uncertainty quantification
* It's hard to build one
* 1. Experimental design: If each run is expensive, at which input points should we evaluate the simulator?
* 2. Creating the emulator: Which model? (GPs, neural networks), which hyperparameters?
* Our current focus is here.
* {Example: Emulating a rocket simulation with autoemulate.}
* applying a little bit of physics: thrust + launch angle to get to a certain max altitude.
* ![image](https://hackmd.io/_uploads/S1gzwAnVT.png)
* Solution is to build an emulator to get many more datapoints.
* ![image](https://hackmd.io/_uploads/rJfVPChVT.png)
* ![image](https://hackmd.io/_uploads/r1b8P0nNa.png)
* best according to some metric, that you should be able to define.
* Only 3-4 lines of code.
* Diagnostics are simple:
* ![image](https://hackmd.io/_uploads/SktPwRhNT.png)
* Looking at prediction and true output:
* ![image](https://hackmd.io/_uploads/Hk3KvCnVa.png)
* Using the emulator (with the best model), generate a mesh:
* ![image](https://hackmd.io/_uploads/SJ8av02Ep.png)
* predict on the mesh and the output gives us a better estimation:
* ![image](https://hackmd.io/_uploads/SJv0wC34T.png)
* Please don't use autoemulate yet to launch a rocket!
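
The idea of fitting several candidate models, keeping the best one according to a chosen metric, and then predicting on a mesh can be sketched with plain scikit-learn. This is not autoemulate's API. The drag-free `toy_rocket` function (which uses launch speed in place of thrust), the three candidate models, and the cross-validated R² metric are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def toy_rocket(x):
    """Hypothetical 'simulator': peak height of a drag-free projectile.

    x[:, 0] = launch speed (m/s), x[:, 1] = launch angle (degrees).
    """
    speed, angle = x[:, 0], np.deg2rad(x[:, 1])
    return (speed * np.sin(angle)) ** 2 / (2 * 9.81)

# A small set of "expensive" simulator runs
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(10, 100, 80), rng.uniform(20, 90, 80)])
y = toy_rocket(X)

candidates = {
    "ridge": make_pipeline(StandardScaler(), Ridge()),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gaussian_process": make_pipeline(
        StandardScaler(),
        GaussianProcessRegressor(
            kernel=ConstantKernel() * RBF(length_scale=[1.0, 1.0]),
            normalize_y=True,
            alpha=1e-6,
            random_state=0,
        ),
    ),
}

# Compare models with 5-fold cross-validated R^2 and keep the best one
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
best_model = candidates[best_name].fit(X, y)

# Predict on a dense mesh of inputs using the cheap emulator
speed_grid, angle_grid = np.meshgrid(np.linspace(10, 100, 50), np.linspace(20, 90, 50))
mesh = np.column_stack([speed_grid.ravel(), angle_grid.ravel()])
altitude = best_model.predict(mesh).reshape(speed_grid.shape)
print(best_name, {name: round(score, 3) for name, score in scores.items()})
```

Because every candidate follows the scikit-learn estimator interface, adding another model only means adding one more entry to `candidates`.
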
* The idea is to be able to balance simplicity and ability to customise the models to use and strategies for evaluation/validation
* ![image](https://hackmd.io/_uploads/rkgl_ChEp.png)
* Design choice: All the models are compatible with the `scikit-learn` ecosystem.
* Upcoming features:
* Deep learning + GPU support
* Visual diagnostics for each model (like what Marina showed before, uncertainty visualisation etc.)
* Allow for easy contributions (new models, metrics)
* Automating the experimental design step.
* ...all the while keeping it all simple.
* We're looking for:
* new datasets/simulations
* ideas for models/features
* feedback on autoemulate once the MVP is ready
* code contributions
* ![image](https://hackmd.io/_uploads/rJlROR3N6.png)
----
### Link List ✨
#### Today's Slidedecks & Resources
* [autoemulate slidedeck](https://zenodo.org/records/10174843)
* [autoemulate GitHub](https://github.com/alan-turing-institute/autoemulate)
#### General links
* [TRIC-DT website](https://www.turing.ac.uk/research/research-projects/tric-dt)
* [Seminar series page](https://github.com/alan-turing-institute/tric-dt/tree/main/Seminars) and [suggested topics](https://github.com/alan-turing-institute/tric-dt/issues?q=is%3Aopen+is%3Aissue+label%3A%22seminar+series%22)
* [TRIC-DT Glossary](https://hackmd.io/npUXO9llSDqagVM3SntR_w)
* [HackMD guide](https://hackmd.io/@turingway/hackmd-guide)
### Code of conduct
* [Take a moment to read this](https://www.turing.ac.uk/events/policies-and-guidelines)
---