Tom Nicholas, 29/06/2021
TLDR: The MCF (magnetic confinement fusion) and plasma physics community should copy the Pangeo project and its (xarray + dask-based) software ecosystem, because the geosciences have already solved many of the software problems we still struggle with.
Problems to solve
In my experience, software practices in fusion research suffer from a few common and widespread problems. In particular:
Code is not open or shared. Whilst there are notable exceptions, the typical workflow of a fusion research group right now involves simulation or diagnostic processing code that is closed-source, sometimes requires a licence (for IDL or MATLAB), and is not designed to be reused or composed by other researchers. This increases the chance of bugs, massively inhibits collaboration, and prevents reproducibility.
Re-inventing the wheel regularly is another inevitable result of not sharing tools. Even different groups who use the same simulation code often have different data loading and analysis scripts. Clear opportunities for standardization are not taken: for example, the ubiquitous edge simulation code SOLPS has no good set of standard analysis tools. This is to say nothing of the myriad equilibrium solvers, field-line tracers, and diverted-geometry plotting scripts.
Not taking advantage of existing solutions from other domains. Well-developed tools already exist for a lot of what we do, and other fields have already faced and overcome many software challenges that we still struggle with. There is perhaps a bit of a cultural issue at play where we think fusion is especially hard or different somehow, e.g. "the science and engineering challenges are novel and unprecedentedly complicated, so we need to use totally bespoke software tools". This does not follow: in reality, the majority of fusion research involves modelling and analysis tasks rather similar to those in other fields of science. Most of us still merely need to load gridded numerical data from simulations or experiments, run some pretty standard analysis functions on it, and then visualize the results. (The one exception might be the degree of complexity of integrated modelling, but that effort would still benefit from these recommendations.)
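To make the "load gridded data, run standard analysis, visualize" pattern concrete, here is a minimal sketch using xarray. Everything about the data is an assumption for illustration: the variable name electron_density, the (time, R, Z) grid, and the random values are synthetic stand-ins, not output from any real code or experiment.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for simulation output (hypothetical data, not from
# any real code): a density field on a (time, R, Z) grid.
rng = np.random.default_rng(seed=0)
time = np.linspace(0.0, 1.0, 5)
R = np.linspace(1.0, 2.0, 8)
Z = np.linspace(-0.5, 0.5, 7)

density = xr.DataArray(
    1e19 * (1.0 + 0.1 * rng.standard_normal((time.size, R.size, Z.size))),
    coords={"time": time, "R": R, "Z": Z},
    dims=("time", "R", "Z"),
    name="electron_density",
    attrs={"units": "m^-3"},
)

# "Pretty standard analysis": average over time, then select the grid
# row closest to the midplane (Z = 0) by coordinate label, not by index.
time_mean = density.mean(dim="time")
midplane = time_mean.sel(Z=0.0, method="nearest")

print(midplane.dims, midplane.sizes["R"])
```

With matplotlib installed, midplane.plot() would draw the radial profile with axes labelled from the coordinates. In a real workflow the synthetic array would typically be replaced by something like xr.open_dataset on a netCDF file, with the analysis lines unchanged.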