# Running molecular dynamics simulations with GROMACS on Snellius (WIP)

**Original authors: Szilárd Páll and Andrey Alekseenko**

Adapted for the BioExcel Summer School 2025 and Snellius by Atte Sillanpää and Laxmana Yetukuri, from [doi:10.6084/m9.figshare.22303477](https://doi.org/10.6084/m9.figshare.22303477).

## Introduction

This tutorial walks through running and benchmarking GROMACS molecular dynamics simulations on the Snellius supercomputer: submitting first jobs, requesting CPU and GPU resources through the SLURM scheduler, and exploring how the assignment of computational tasks to CPUs and GPUs affects performance.

:::success
:dart: **Learning goals**

* Get familiar with the GROMACS tools used in the exercises.
* Understand common features of the `mdrun` command line.
* Understand key parts of the `mdrun` log file structure.
:::

Software: GROMACS 2025.2 (module `gromacs-2025.2-cuda` on Snellius)

[//]: # (### Performance in MD)

[//]: # (Molecular dynamics simulations do the same calculation repeatedly, typically for a large number of time-steps.)

### The GROMACS simulation engine

[GROMACS](https://www.gromacs.org) is a molecular simulation package which comes with a molecular dynamics simulation engine (`mdrun`), a set of analysis tools, and the gmxapi Python API. GROMACS is highly flexible and can be built in various ways depending on the target hardware architecture and the parallelization features enabled. The GROMACS features and dependencies enabled at compile time are shown in the _version header_, which is printed by the `gmx -version` command as well as at the top of the simulation log outputs.

All functionalities of the GROMACS package, the simulation engine and the tools, are provided by the `gmx` program through subcommands. The program name can have suffixes; e.g., MPI builds are typically installed as `gmx_mpi`. In this tutorial, we will use a version of GROMACS that has already been built on Snellius, but if you wish to install GROMACS on your own system, instructions for many different hardware configurations are available in the [GROMACS documentation](https://manual.gromacs.org/current/install-guide/index.html).

#### GROMACS parallelization overview

Parallelizing an MD simulation requires _expressing_ concurrent work (multiple computations happening at the same time) and _exposing_ it using an implementation with the help of a parallel programming model. To express concurrency within a single simulation in GROMACS, we can divide the work using data decomposition (e.g. spatial decomposition algorithms), task decomposition (e.g. rank specialization for the "separate PME ranks" feature), or ensemble decomposition. The exposed concurrent work can then be mapped to various processing units, like CPU cores or GPU accelerators.

GROMACS relies on a hierarchical heterogeneous parallelization using MPI, OpenMP multi-threading, CUDA/SYCL/OpenCL for asynchronous GPU execution, and SIMD for low-level CPU and GPU algorithms. Data parallelism is used to implement spatial decomposition (which consists of dividing the simulation system into parts that are as independent as possible); it takes place across MPI ranks, using multi-threading on CPUs and fine-grained SIMD (Single Instruction Multiple Data) algorithms. At the same time, task parallelism is at the heart of the heterogeneous GPU engine, and it is also what enables scaling the PME algorithm efficiently by employing [rank specialization](https://manual.gromacs.org/documentation/current/user-guide/mdrun-performance.html#separate-pme-ranks).
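Since the parallelization features available depend on how a particular GROMACS binary was built, it is worth inspecting the version header of the installation before running anything. A quick check on a Snellius login node (a sketch; the module path is the one used in the job scripts later in this tutorial, and `head` simply truncates the long output):

```bash
module use /projects/prjs1567/modulefiles
module load gromacs-2025.2-cuda
# print the version header: GROMACS version, MPI/OpenMP/SIMD/GPU support, ...
gmx_mpi -version | head -n 30
```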
MD simulation studies can be classified into two major flavors: those that use a single (or a few) long trajectory, and those that rely on a larger set of trajectories. Due to the timescale of some biological processes, a single or a few very long trajectories may not be enough (and/or may be inefficient) for sampling the conformational space; an alternative is then to use an ensemble of simulations. A wide range of algorithms, from free energy perturbation to replica exchange to the accelerated weight histogram method (AWH), rely on (or require) multiple MD simulations which form an _ensemble_. An _ensemble simulation_ refers to such a set of simulations, where each individual simulation is referred to as an _ensemble member_ (called "replica" in replica exchange and "walker" in AWH). These algorithms provide a source of concurrent work which simulation workflows can parallelize over, and they require different levels of coupling between the ensemble members. E.g., standard free-energy calculations (with a pre-determined simulation length) require no communication across the ensemble members, whereas replica exchange and AWH require exchanging information at regular time intervals; the latter class of methods is referred to as _coupled_ ensembles. Depending on the frequency of data exchange, ensembles can be _weakly_ or _strongly_ coupled (with infrequent or frequent data exchange, respectively). Coupled ensembles are more performance sensitive, hence more prone to be affected by imbalance (e.g. member simulations running with different throughput): the stronger the coupling, the more sensitive the ensemble simulation is to performance bottlenecks.

#### The `mdrun` simulation tool

In GROMACS, the primary way to run simulations across CPUs and GPUs is the command line program `mdrun`. The simulation tool `mdrun` can be invoked as a subcommand of the main program, e.g. `gmx mdrun`. The `mdrun` functionalities available in a specific build depend on the options GROMACS was configured with and can be seen in the version header.

The following list contains key performance-related command line options used in this tutorial:

* `-g LOGFILE` set a custom name for the log file (default `md.log`);
* `-pin on` enable `mdrun`-internal thread affinity setting (this may override externally set affinities; on some clusters, affinities set externally through the batch system are recommended instead, so check the site documentation);
* `-tunepme`/`-notunepme` enable/disable PME task load balancing;
* `-nsteps N` set the number of simulation steps for the current run to `N` (`N=-1` means an infinitely long run, intended to be combined with `-maxh`);
* `-maxh H` stop the simulation after `0.99*H` hours;
* `-resethway` reset performance counters halfway through the run;
* `-nb`/`-pme`/`-bonded`/`-update` task assignment options used to select whether a task runs on the CPU or the GPU;
* `-npme N` set the number of separate ranks to be used for PME (`N=-1` means `mdrun` makes a guess).

Note that some performance features require using environment variables. Documentation for these can be found in the [GROMACS user guide](https://manual.gromacs.org/current/user-guide/environment-variables.html). For further information on the `mdrun` simulation tool command line options and features, see the [online documentation](https://manual.gromacs.org/current/onlinehelp/gmx-mdrun.html).
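Putting several of these options together, the benchmarking invocation used throughout the exercises looks like the following (a sketch; it assumes a prepared `topol.tpr` in the working directory):

```bash
# Run for at most ~1 minute (0.017 h), resetting the performance counters
# halfway through and disabling PME tuning for reproducible timings.
gmx_mpi mdrun -g bench.log -nsteps -1 -maxh 0.017 -resethway -notunepme
```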
#### The `mdrun` log file

The log file of the `mdrun` simulation engine contains extensive information about the GROMACS build, the hardware detected at runtime, the complete set of simulation settings, diagnostic output related to the run, as well as physics and performance statistics.

The version header in a GROMACS `mdrun` log:

![mdlogfile1](https://hackmd.io/_uploads/ryKG1kA-ll.jpg)

The version header contains the GROMACS version and the command line used for the current run (highlighted). It also contains additional information, like where GROMACS is installed and how it was compiled.

The hardware detection section of the `mdrun` log:

![mdlogfile2](https://hackmd.io/_uploads/HyHBe1A-gl.jpg)

This section contains detailed information about the hardware GROMACS is running on. The first line is a brief summary of the available resources (number of nodes, CPU cores and GPUs), followed by additional details: CPU architecture, CPU topology, and the list of GPUs.

The performance accounting in the `mdrun` log file:

![mdlogfil3](https://hackmd.io/_uploads/Sk_z7kR-xg.jpg)

This section is printed at the end of the run. The table contains a breakdown of the total run time into the different kinds of activity. One can see the wall time taken by each activity, as well as its percentage of the total simulation time. Below the table one can find the absolute simulation performance (ns/day).

## Simulation input systems

In the following exercises, we will use two different simulation systems:

* a large [satellite tobacco mosaic virus, STMV](https://en.wikipedia.org/wiki/Tobacco_virtovirus_1) system (~1 million atoms) solvated in a box of TIP3P water molecules, using the CHARMM27 force field;

[//]: # (Comment 1066628 atoms, 2 fs time-step, 1.2 nm cut-offs, h-bond constraints, 0.15 nm PME grid spacing, NVT ensemble.)

* a medium-sized [aquaporin membrane protein](https://en.wikipedia.org/wiki/Aquaporin), a tetrameric water channel (~110,000 atoms), embedded in a lipid bilayer and solvated in a box of TIP3P water, using the CHARMM36 force field. We will use the Accelerated Weight Histogram (AWH) algorithm with 32 walkers.

[//]: # (Comment 2.5 fs time-step, 1.2 nm cut-offs, 0.1125 nm PME grid spacing, h-bond constraints, NPT ensemble.)

Both systems have previously been used to benchmark GROMACS heterogeneous parallelization and acceleration ([DOI:10.1063/5.0018516](https://doi.org/10.1063/5.0018516)).

The simulation input files (including tpr) can be obtained from:
* [Aquaporin](https://a3s.fi/gmx-lumi/aqp-240122.tar.gz)
* [STMV](https://a3s.fi/gmx-lumi/stmv-240122.tar.gz)
* or from the folder `/projects/prjs1567/data` on Snellius (a staging sketch is shown just before Exercise 1.1)

In the exercises below, start with the STMV input; the aquaporin input is intended for the ensemble (AWH) runs. If time allows, feel free to experiment with the other input too.

## 0. Connecting to the Snellius Supercomputer

The GROMACS tutorials are run on the Snellius cluster, whose detailed description can be found in the SURF [user guide](https://servicedesk.surf.nl/wiki/spaces/WIKI/pages/30660184/Snellius). Here are minimal instructions for accessing Snellius:

- **Obtain an account on Snellius**: You should have received two emails with the necessary information to set up your access. To complete the setup:
    - Change your password and accept the usage agreement via the SURF user portal, as instructed in the emails.
    - Wait 10 minutes after accepting the usage agreement before logging into Snellius. This allows your home directory to be generated properly.

**Please note** that course logins are temporary and will be terminated soon after the School. Also, there are limited computing resources allocated to us, so keep usage to the tutorial content.
- **Log in through the Open OnDemand interface**: Once you have credentials, log in to Snellius via the [Open OnDemand interface](https://ondemand.snellius.surf.nl/). From inside the Open OnDemand web interface, you can launch a terminal session on Snellius without using an SSH client.

:::info
:bulb: **Note**: To copy and paste in the Open OnDemand terminal, use the keyboard shortcuts CTRL+C / CTRL+V.
:::

:::info
:bulb: **Note**: If your password contains special characters such as @ or $, it may not work properly in the login shell within the Open OnDemand environment. To avoid issues, either avoid using these characters in your password or copy and paste the password directly when accessing the Snellius terminal.
:::

- **Navigate to the shared working environment**: All participants of the course share the same work environment. Once you have successfully logged into Snellius via Open OnDemand, you will be placed in your `$HOME` directory. You can navigate to the shared work environment as follows:
```
cd /projects/prjs1567/
```
- **Create your own folder for the course tutorials**: Since the environment is shared among all participants, it is highly recommended to create your own folder (e.g. `/projects/prjs1567/$USER` or similar) to keep your work organized:
```
mkdir /projects/prjs1567/$USER
cd /projects/prjs1567/$USER
```
You can now proceed with the tutorials in your personal workspace.

## 1. Running your first jobs on Snellius

In this first exercise, we will submit our initial jobs to GPU nodes (i.e., using the partition `gpu_a100`) and explore key features and peculiarities of the Snellius system, its scheduler, and the GROMACS `mdrun` simulation tool. As simulation system we use the STMV input. We will start with a basic job submission script (batch script) and successively build on it, exploring how to correctly request resources using the SLURM job scheduler, to finally arrive at a script that correctly requests resources on GPU nodes.

:::info
:bulb: Among the GPU partitions on Snellius, the `gpu_a100` and `gpu_h100` partitions are designed for high-performance GPU workloads, each equipped with four powerful NVIDIA GPUs per node. The `gpu_a100` partition features NVIDIA A100 GPUs with 40 GB of memory, making it well suited for general-purpose GPU computing and AI/ML tasks. The `gpu_h100` partition uses the newer NVIDIA H100 GPUs with 80 GB of memory, optimized for demanding HPC and large-scale AI workloads. Both partitions offer high-speed interconnects and top-tier CPU configurations to support intensive computations. For further details on the Snellius GPU architecture, see [the SURF documentation on Snellius](https://servicedesk.surf.nl/wiki/spaces/WIKI/pages/30660209/Snellius+partitions+and+accounting).
:::

:::success
:dart: **Learning goals**

* Know how to submit GROMACS jobs.
* Get familiar with common SLURM scheduling and `mdrun` command line options.
* Get familiar with the `mdrun` console and log outputs.
* _Bonus_: Understand the impact of using multiple CPU cores/threads on parts of the MD computation.
:::

:::info
:bulb: **The GROMACS log file**
Before starting, take a look at the [introduction to the GROMACS `mdrun` simulation tool](https://hackmd.io/GciXcxaBSFOjlLXgDoDpIA#The-mdrun-simulation-tool) and at the description of log files.
:::

### Exercise 1.1: Launching a first GROMACS simulation on Snellius

Now we will launch a first test simulation on Snellius.
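First, stage the STMV input in your own workspace. A possible way to do this (a sketch; it assumes the archive layout matches the download links above, and the exact contents of the shared data folder may differ, so adjust paths as needed):

```bash
mkdir -p /projects/prjs1567/$USER/stmv
cd /projects/prjs1567/$USER/stmv
# either download and unpack the input archive ...
wget https://a3s.fi/gmx-lumi/stmv-240122.tar.gz
tar xzf stmv-240122.tar.gz
# ... or copy the input from the shared data folder (hypothetical layout; adjust the path)
cp /projects/prjs1567/data/stmv/topol.tpr .
```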
Once the `topol.tpr` file is in a working directory of your choice under `/projects/prjs1567/$USER` (this is also where your output files will end up, so it's good to keep it organized!), create a batch file (with suffix `.sh`) with the following content:

```bash=
#!/bin/bash
#SBATCH --partition=gpu_a100    # partition to use
# SBATCH --reservation=gromacs_wednesday   # workshop reservation - FIXME
#SBATCH --time=00:10:00         # maximum execution time of 10 minutes
#SBATCH --nodes=1               # we use 1 node
#SBATCH --ntasks-per-node=1     # 1 MPI rank
#SBATCH --cpus-per-task=1       # 1 CPU core
#SBATCH --gpus=1                # number of GPUs; you should request at least one GPU

module use /projects/prjs1567/modulefiles
module load gromacs-2025.2-cuda

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

srun gmx_mpi mdrun \
    -g ex1.1_${SLURM_NTASKS}x${OMP_NUM_THREADS}_jID${SLURM_JOB_ID} \
    -nsteps -1 -maxh 0.017 -resethway -notunepme
```

Note that the four benchmarking flags used above (`-nsteps`, `-maxh`, `-resethway` and `-notunepme`) [are described in the introduction](https://hackmd.io/GciXcxaBSFOjlLXgDoDpIA#The-mdrun-simulation-tool). We use time-limited runs here by passing `-maxh 0.017`, which sets the run time limit in hours (~one minute); we do this because the simulation throughput changes significantly across the exercises, and we want to avoid both very long wait times and unreliable benchmark measurements due to just a few seconds of runtime.

Submit the job with the `sbatch` command (a submission and monitoring sketch is given at the end of this exercise) and wait until it finishes.

:::warning
* Take a look at the log file (`.log`) and find the hardware detection and performance table. What are the resources detected and used?
:::

Now try to enable multithreading. To do that, we need to request multiple CPU cores. Edit the job script, change the number of CPU cores, and submit a new job.

:::info
:bulb: `SBATCH` arguments provided in the job script header can also be passed on the command line (e.g. `sbatch --cpus-per-task N`), overriding the setting in the job script header. This allows varying submission parameters without having to edit the job script.
:::

```bash=
#!/bin/bash
#SBATCH --partition=gpu_a100    # partition to use
# SBATCH --reservation=gromacs_wednesday   # workshop reservation - FIXME
#SBATCH --time=00:10:00         # maximum execution time of 10 minutes
#SBATCH --nodes=1               # we use 1 node
#SBATCH --ntasks-per-node=1     # 1 MPI rank
#SBATCH --cpus-per-task=9       # 9 CPU cores; there are 72 CPU cores per node in the gpu_a100 partition
#SBATCH --gpus=1                # number of GPUs

module use /projects/prjs1567/modulefiles
module load gromacs-2025.2-cuda

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

srun gmx_mpi mdrun \
    -g ex1.1_${SLURM_NTASKS}x${OMP_NUM_THREADS}_jID${SLURM_JOB_ID} \
    -nsteps -1 -maxh 0.017 -resethway -notunepme
```

:::warning
* Compare the hardware detection and performance tables in the log files of the two runs above. What has changed in terms of resources detected and used? Is there anything still missing?
:::

There are 72 CPU cores and four GPUs per node in the gpu_a100 partition on Snellius, i.e. relatively few CPU cores per GPU, so making the best use of them is important and can have a strong impact on performance. Optimize your resource requests accordingly; we will explore this further in Exercise 1.3.
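To submit and keep an eye on your jobs (a sketch; it assumes you saved the script as `ex1.1.sh`):

```bash
sbatch ex1.1.sh     # submit the job; prints the job ID
squeue -u $USER     # list your queued and running jobs
# once the job has finished, list the log files produced in this directory
ls -lt ex1.1_*.log
```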
### Exercise 1.2: Launching a simple GROMACS GPU run

Check in the log file of the previous exercise how the GPU was detected and used. With 72 CPU cores and four GPUs per node, a single-GPU job can use up to 18 cores, so we now also request the GPU's full share of CPU cores. Use the job script below to submit a job using one GPU.

```bash=
#!/bin/bash
#SBATCH --partition=gpu_a100    # partition to use
# SBATCH --reservation=gromacs_wednesday   # workshop reservation - FIXME
#SBATCH --time=00:10:00         # maximum execution time of 10 minutes
#SBATCH --nodes=1               # we run on 1 node
#SBATCH --ntasks-per-node=1     # 1 MPI rank
#SBATCH --cpus-per-task=18      # 18 CPU cores, a GPU's full share (72 cores / 4 GPUs)
#SBATCH --gpus=1                # get 1 GPU device

module use /projects/prjs1567/modulefiles
module load gromacs-2025.2-cuda

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

srun gmx_mpi mdrun \
    -g ex1.2_${SLURM_NTASKS}x${OMP_NUM_THREADS}_jID${SLURM_JOB_ID} \
    -nsteps -1 -maxh 0.017 -resethway -notunepme
```

:::warning
* Look at the log file hardware detection and performance table: what are the resources detected and used?
* Compare the log files of Exercises 1.1 and 1.2. Has the performance changed?
:::

### _Bonus_ Exercise 1.3: Explore the use of CPUs and OpenMP multi-threading

In this exercise, we will use only the CPUs of the gpu_a100 nodes to explore how the different computational tasks perform with OpenMP multi-threading. Use the job script below to submit a job using only CPUs. Note that you have to fill in a value for `--cpus-per-task`.

:::info
:bulb: **Note**: if the gpu_a100 partition does not accept jobs without a GPU request, add `#SBATCH --gpus=1` back to the script and force CPU-only execution by adding `-nb cpu -pme cpu -bonded cpu -update cpu` to the `mdrun` command line (these task assignment options are explored further in Exercise 2).
:::

```bash=
#!/bin/bash
#SBATCH --partition=gpu_a100    # partition to use
# SBATCH --reservation=gromacs_wednesday   # workshop reservation - FIXME
#SBATCH --time=00:10:00         # maximum execution time of 10 minutes
#SBATCH --nodes=1               # we run on 1 node
#SBATCH --ntasks-per-node=1     # 1 MPI rank
#SBATCH --cpus-per-task=...     # fill in the number of CPU cores

module use /projects/prjs1567/modulefiles
module load gromacs-2025.2-cuda

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

srun gmx_mpi mdrun \
    -g ex1.3_${SLURM_NTASKS}x${OMP_NUM_THREADS}_jID${SLURM_JOB_ID} \
    -nsteps -1 -maxh 0.017 -resethway -notunepme
```

* Modify the script varying the number of CPU cores used (`--cpus-per-task`) and submit a run with each new setting (a submission loop sketch follows the sample logs below).

:::warning
* Look at the `mdrun` log file output (the files will be named `ex1.3_1xN_jIDXXXXXX.log`).
* How does the absolute performance (ns/day) change when increasing the number of cores used?
* How does the wall time of the various computations (e.g. the "Force", "PME mesh" and "Update" tasks) change with the thread count?
:::

:::spoiler Help with the solution
Sample log files for the exercise session (from the earlier LUMI edition of this workshop):
* [ex 1.1](https://github.com/Lumi-supercomputer/gromacs-on-lumi-workshop/tree/main/Exercise-1.1/STMV)
* [ex 1.2](https://github.com/Lumi-supercomputer/gromacs-on-lumi-workshop/tree/main/Exercise-1.2/STMV)
* [ex 1.3](https://github.com/Lumi-supercomputer/gromacs-on-lumi-workshop/tree/main/Exercise-1.3/STMV)
:::
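To scan several core counts for Exercise 1.3 without editing the script each time, you can override `--cpus-per-task` on the `sbatch` command line (a sketch; it assumes the job script is saved as `ex1.3.sh`):

```bash
# submit one job per core count; OMP_NUM_THREADS follows automatically,
# since the script sets it from SLURM_CPUS_PER_TASK
for n in 2 4 8 16 32 64; do
    sbatch --cpus-per-task=$n ex1.3.sh
done
```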
# In the BioExcel Summer School 2025, the plan is to finish here

## 2. GPU accelerated simulations

:::success
:dart: **Learning goals**

* Understand how the GROMACS heterogeneous parallelization allows moving tasks between CPU and GPU and how that impacts performance.
* Understand the difference between force-offload and GPU-resident modes.
* _Advanced_: Explore the effects of load balancing.
:::

The GROMACS MD engine uses a heterogeneous parallelization which can flexibly utilize both CPU and GPU resources. As discussed in the lecture, there are two offload modes:

* in the _force offload_ mode, some or all forces are computed on the GPU but are transferred back to the CPU every iteration for integration;
* in the _GPU-resident_ mode, the integration also happens on the GPU, allowing the simulation state to reside on the GPU for tens or hundreds of iterations.

Further details can be found in the [GROMACS user guide](https://manual.gromacs.org/current/user-guide/mdrun-performance.html#running-mdrun-with-gpus) and in [DOI:10.1063/5.0018516](https://aip.scitation.org/doi/full/10.1063/5.0018516).

In the following exercises, we will learn how moving tasks between the CPU and the GPU impacts performance. As simulation system we use the STMV input. We will be using Snellius gpu_a100 nodes, submitting single-GPU jobs (hence using one of the four GPUs in a full compute node); for further details on the architecture and usage, see [the SURF documentation on Snellius](https://servicedesk.surf.nl/wiki/spaces/WIKI/pages/30660209/Snellius+partitions+and+accounting).

### Exercise 2.1: GPU offloading of force computations (bonus)

The tasks corresponding to the computation of bonded as well as short- and long-range non-bonded forces can be offloaded to a GPU in GROMACS. The assignment of these tasks is controlled by the following `mdrun` command line options:

* (short-range) non-bonded: `-nb ASSIGNMENT`
* particle mesh Ewald: `-pme ASSIGNMENT`
* bonded: `-bonded ASSIGNMENT`

The possible `ASSIGNMENT` values are `cpu`, `gpu`, or `auto`.

We use one GPU together with a few CPU cores in a simulation and assess how the performance changes when offloading different force calculations. As a baseline, first launch a run with all tasks assigned to the CPU (as below).

```bash=
#!/bin/bash
#SBATCH --partition=gpu_a100    # partition to use
# SBATCH --reservation=gromacs_thursday   # workshop reservation - FIXME
#SBATCH --time=00:10:00         # maximum execution time of 10 minutes
#SBATCH --nodes=1               # we run on 1 node
#SBATCH --gpus=1                # we use 1 GPU device
#SBATCH --ntasks-per-node=1     # 1 MPI rank
#SBATCH --cpus-per-task=9       # 9 CPU cores

module use /projects/prjs1567/modulefiles
module load gromacs-2025.2-cuda

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

srun gmx_mpi mdrun \
    -nb cpu -pme cpu -bonded cpu -update cpu \
    -g ex2.1_${SLURM_NTASKS}x${OMP_NUM_THREADS}_jID${SLURM_JOB_ID} \
    -nsteps -1 -maxh 0.017 -resethway -notunepme
```

* Next, submit jobs that incrementally offload the various force tasks (non-bonded `-nb`, PME `-pme`, bonded `-bonded`) to the GPU (see the sketch after the notes below).

:::warning
* How does the performance (ns/day) change as more tasks are offloaded?
* Look at the performance table in the log and observe how the fraction of wall time spent in the tasks left on the CPU changes.
:::

:::info
:bulb: **Note** that the log file performance report only contains timings of tasks executed on the CPU, including the CPU time spent launching GPU work and waiting for GPU results; it does not contain timings of the tasks offloaded to the GPU.
:::
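One way to step through the offload combinations is to change only the task assignment flags on the `srun` line between submissions, keeping the rest of the job script the same (a sketch; the log file names are illustrative):

```bash
# step 1: short-range non-bonded forces on the GPU
srun gmx_mpi mdrun -nb gpu -pme cpu -bonded cpu -update cpu \
    -g ex2.1_nb_jID${SLURM_JOB_ID} -nsteps -1 -maxh 0.017 -resethway -notunepme

# step 2: also PME on the GPU
srun gmx_mpi mdrun -nb gpu -pme gpu -bonded cpu -update cpu \
    -g ex2.1_nb-pme_jID${SLURM_JOB_ID} -nsteps -1 -maxh 0.017 -resethway -notunepme

# step 3: also bonded forces on the GPU (integration still on the CPU)
srun gmx_mpi mdrun -nb gpu -pme gpu -bonded gpu -update cpu \
    -g ex2.1_nb-pme-bonded_jID${SLURM_JOB_ID} -nsteps -1 -maxh 0.017 -resethway -notunepme
```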
### Exercise 2.2: GPU-resident mode (bonus)

Continuing from the previous exercise, we will now explore using the GPU-resident mode.

```bash=
#!/bin/bash
#SBATCH --partition=gpu_a100    # partition to use
# SBATCH --reservation=gromacs_thursday   # workshop reservation - FIXME
#SBATCH --time=00:10:00         # maximum execution time of 10 minutes
#SBATCH --nodes=1               # we run on 1 node
#SBATCH --gpus=1                # we use 1 GPU device
#SBATCH --ntasks-per-node=1     # 1 MPI rank
#SBATCH --cpus-per-task=9       # 9 CPU cores

module use /projects/prjs1567/modulefiles
module load gromacs-2025.2-cuda

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

srun gmx_mpi mdrun \
    -nb gpu -pme gpu -bonded gpu -update gpu \
    -g ex2.2_${SLURM_NTASKS}x${OMP_NUM_THREADS}_jID${SLURM_JOB_ID} \
    -nsteps -1 -maxh 0.017 -resethway -notunepme
```

* Submit a fully offloaded, GPU-resident job using the `-update gpu` option (as above).
* Since we moved most of the computation to the GPU, the CPU cores are left unused. The GROMACS heterogeneous engine allows moving work back to the CPU, so let's try to utilize the CPU cores for potential performance benefits: first try moving the PME task back to the CPU, then the bonded tasks.

:::warning
* How does the GPU-resident mode perform compared to the best-performing force-offload run from Exercise 2.1?
* How does the performance change when force tasks are moved back to the CPU?
* _Bonus_: Enable (PME) task load balancing by replacing `-notunepme` with `-tunepme`. Try this with the PME task assigned to the CPU as well as to the GPU, and observe how the performance changes compared to the earlier, similar runs without load balancing.
* _Bonus_: The frequency of neighbor search (`nstlist`) is a free parameter and can impact performance. Check the default in the log files, try other values (e.g. double and triple the default, using the `-nstlist` option), and observe how the performance changes.
:::

:::spoiler Help with the results
Sample log files for the exercise session (from the earlier LUMI edition of this workshop):
* [ex 2.1](https://github.com/Lumi-supercomputer/gromacs-on-lumi-workshop/tree/main/Exercise-2.1/STMV)
* [ex 2.2](https://github.com/Lumi-supercomputer/gromacs-on-lumi-workshop/tree/main/Exercise-2.2/STMV)
:::

---

:::info
**License**
This material is shared under the CC BY-SA 4.0 license. [DOI:10.5281/zenodo.10556522](https://zenodo.org/doi/10.5281/zenodo.10556522)
![CC BY-SA](https://hackmd.io/_uploads/SJvv38tuT.png =240x84)
:::