Accelerating UKCA by predicting timesteps with FTorch

<style> .reveal { font-size: 27px; } </style>
<style> .green {color: green;} </style>
<style> .red {color: red;} </style>

## Accelerating UKCA by predicting timesteps with FTorch

<u>Joe Wallwork</u><sup>1</sup>, Luke Abraham<sup>2,3</sup>, Jack Atkinson<sup>1</sup>

<sup>1</sup>Institute of Computing for Climate Science, University of Cambridge, U.K.
<sup>2</sup>Department of Chemistry, University of Cambridge, U.K.
<sup>3</sup>National Centre for Atmospheric Science, U.K.

<img src="https://hackmd.io/_uploads/SyNht0cpyg.png" alt="drawing" width="400"/> <img src="https://hackmd.io/_uploads/ryH0C69a1l.png" alt="drawing" width="300"/>

Slides: https://hackmd.io/@jwallwork/2025-durham-hpc-days?type=slide

---

## Funding

* The [Institute of Computing for Climate Science (ICCS)](https://iccs.cam.ac.uk) acknowledges funding from [Schmidt Sciences](https://www.schmidtsciences.org).
* This project also received funding from a [C2D3-Accelerate grant](https://science.ai.cam.ac.uk/news/2024-12-09-exploring-novel-applications-of-ai-for-research-and-innovation-%E2%80%93-announcing-our-2024-funded-projects.html) for novel applications of AI in research and innovation.

---

## UKCA: United Kingdom Chemistry \& Aerosols

* Atmospheric composition model used in UKESM and at the Met Office.
* ~85-200 tracers and ~300-750 reactions.
* UKCA takes ~25% of UM runtime (depending on configuration).
* UKCA's chemical solver takes ~40% of UKCA runtime (~10% of overall UM runtime).

![Screenshot from 2025-05-30 09-40-36](https://hackmd.io/_uploads/Bk0M1gDGxx.png)

---

## UKCA chunking approach

* Structured latitude-longitude grid.
* *Chemistry solver is spatially independent.*
* Chunking options:
    1. horizontal levels.
    2. vertical columns (slower).
    3. full domain (intended for GPU).

---

## UKCA timestepping approach

* Implicit timestepping, quasi-Newton, full LU decomposition.
* For each time subinterval to be integrated...
![TimestepHalvings.drawio](https://hackmd.io/_uploads/BkyvtvTfxl.svg)

---

## Numbers of halving steps

Low resolution "N48" UKCA job with 10 timesteps.

![nhsteps_hist](https://hackmd.io/_uploads/rkb7eTgZlx.png =x500)

---

## ML halving steps

![nn](https://hackmd.io/_uploads/Syw4MgDMll.svg =x400)

* Idea: for each grid-box, map input variables to the number of halving steps.
* https://github.com/Cambridge-ICCS/mlstep

---

## FTorch - overview

Fortran interface for PyTorch, https://cambridge-iccs.github.io/FTorch.

![FTorch-webpage](https://hackmd.io/_uploads/Bkuyg9gZxx.png =x250)

* Open source (MIT license) and open development.
* Designed to be familiar to both Fortran programmers and PyTorch users.
* Uses `iso_c_binding` to interface with the Torch C++ backend (no data copying).
* Couples directly to `libtorch` $\implies$ no need for a Python runtime.

---

## FTorch - offline training workflow

![offline.drawio](https://hackmd.io/_uploads/Syjv5_cakx.svg)

---

## Offline approach

1. **Generate training data (Fortran)**
   Run UKCA test case, writing input arrays and an output array containing numbers of halving steps with NetCDF.
2. **Data processing (Python)**
   The raw training data is unsuitable for direct use.
3. **Training and scripting (Python)**
   Load training data, use it to train ML model, and save in TorchScript format.
4. **Inference (Fortran)**
   Load trained model and use it to predict the timestep.

---

## Offline approach - 1. Generate training data (Fortran)

```fortran
USE ncutils, ONLY: write_nc_real_1d, write_nc_integer_1d, ...

! [Setup]

! Open training data files for writing
iteration = iteration + 1
IF (training) THEN
  tot_n_points = theta_field_size * model_levels

  ! Write temperature data to file
  ALLOCATE(zt_full(tot_n_points))
  zt_full(:) = PACK(temp,.TRUE.)
  CALL write_nc_real_1d(iteration, "zt", zt_full)
  DEALLOCATE(zt_full)

  ! [Other inputs]
END IF

! [Run solver with a chunk size of 1]

IF (training) THEN
  ! Write numbers of chemistry timesteps to file
  ALLOCATE(ncsteps_full(tot_n_points))
  CALL write_nc_integer_1d(iteration, "ncsteps", ncsteps_full)
  DEALLOCATE(ncsteps_full)
END IF

! [Cleanup]
```

---

## Offline approach - 2. Data processing (Python)

* 10 timesteps $\times$ 3D domain $\implies$ 2.6M data points!
* Mostly zeros $\implies$ massive bias.
* Take nonzero points, plus `zero_factor=3` times as many zero points $\implies$ just 4,904.

---

## Offline approach - 3. Training (Python)

```python
import torch
from mlstep.data_utils import NetCDFDataLoader
from mlstep.net import FCNN
from mlstep.propagate import propagate

# [Setup]

# Load the target and feature data from file
features_1d = ["stratflag", "zp", "zt", "zq", "cldf", "cldl"]
features_2d = ["prt", "dryrt", "wetrt", "ftr"]
ncloader = NetCDFDataLoader(
    features_1d, features_2d, num_timesteps, zero_factor=zero_factor
)
target_data = ncloader.load_target_data()
max_nhsteps = ncloader.max_nhsteps

# Set up model, optimiser, and loss function
nn = FCNN(input_size, max_nhsteps=max_nhsteps, hidden_size=hidden_size)
nn = nn.to(device, dtype=torch.float)
optimizer = torch.optim.Adam(nn.parameters(), lr=lr)
criterion = torch.nn.CrossEntropyLoss(reduction="sum")

# [Training loop]

# Save model in TorchScript format
scripted_model = torch.jit.script(nn)
scripted_model.save("model.pt")
```

---

## Offline approach - 4. Inference (Fortran)

```fortran
use ftorch

! [Setup]

IF (.NOT. training) THEN
  CALL torch_tensor_from_array(in_tensors, ..., torch_kCPU)
  CALL torch_tensor_from_array(out_tensor, out_data, torch_kCPU)
  CALL torch_model_load(mlp, "model.pt", torch_kCPU)
END IF

DO i=1,rows
  DO j=1,row_length
    ! [Array chunking, fill out_data]
    IF (.NOT. training) THEN
      CALL torch_model_forward(mlp, in_tensors, out_tensor)
      ncsteps_full(kcs:kce) = out_data
    END IF
    ! [Run solver on chunk]
  END DO
END DO

! [Cleanup]
```

---

## Offline results

![losses](https://hackmd.io/_uploads/rko1Ud2zll.png)

Validation results: ~0% overestimation!
...and ~10% underestimation.

---

## Summary and conclusions

* UKCA timestepping algorithm needs reworking to improve performance.
* FTorch can be used to integrate a PyTorch emulator.
* Preliminary work on offline training (training in Python).
* Need to generate more training data!
* Keeping more zero data points acts as inbuilt regularisation.
* Might be able to drop some input variables.

---

## Future work: online training

* We recently exposed automatic differentiation and optimisers in FTorch.
* This means we can define the ML model in PyTorch but do the training in Fortran.
* Avoids saving large volumes of training data and makes it possible to extend the loss function to include model errors from Fortran.

---

## Resources

* FTorch webpage: https://cambridge-iccs.github.io/FTorch.
* Atkinson et al., (2025). FTorch: a library for coupling PyTorch models to Fortran. Journal of Open Source Software, 10(107), 7602, https://doi.org/10.21105/joss.07602.

![image](https://hackmd.io/_uploads/rkDMMtxWlg.png)

* [ICCS ML coupling workshop](https://cambridge-iccs.github.io/ml-coupling-workshop) - 3-4 September, Cambridge, U.K.
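
---

## Appendix: zero-point subsampling sketch

The data processing step keeps every nonzero target plus `zero_factor` times as many zero targets. The block below is a minimal sketch of that idea using only the Python standard library. The function name `balance_zero_targets`, the toy data, and the fixed seed are assumptions made for illustration and are not taken from `mlstep`, whose own implementation may differ.

```python
import random


def balance_zero_targets(targets, zero_factor=3, seed=0):
    """Return sorted indices of all nonzero targets plus a random subset of zeros.

    Hypothetical helper: keeps at most zero_factor * (number of nonzero
    targets) zero-valued points, chosen uniformly at random.
    """
    nonzero_idx = [i for i, t in enumerate(targets) if t != 0]
    zero_idx = [i for i, t in enumerate(targets) if t == 0]
    # Never request more zero points than are available
    num_zeros = min(len(zero_idx), zero_factor * len(nonzero_idx))
    rng = random.Random(seed)
    kept_zero_idx = rng.sample(zero_idx, num_zeros)
    return sorted(nonzero_idx + kept_zero_idx)


# Toy targets (assumed data): 95 zeros and 5 nonzero halving-step counts
targets = [0] * 95 + [1, 2, 1, 3, 1]
kept = balance_zero_targets(targets, zero_factor=3)
print(len(kept))  # prints 20: 5 nonzero points plus 3 * 5 zero points
```

With `zero_factor=3`, the retained set is four times the number of nonzero points, provided enough zero points are available.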