# GPU basecalling ONT data

*Author:* [Miles Benton](https://sirselim.github.io/) ([GitHub](https://github.com/sirselim); [Twitter](https://twitter.com/miles_benton))
*Created:* 2021-11-16 10:24:03
*Last modified:* 2022-08-17 10:37:17

###### tags: `Nanopore` `GPU` `notes` `documentation` `benchmarks`

----

## A note on live basecalling

From the ONT community forum ([link](https://community.nanoporetech.com/protocols/Guppy-protocol/v/gpb_2003_v1_revaa_14dec2018/guppy-software-overview)):

> “Keep up” is defined as 80% of the theoretical flow cell output.
> e.g. MinION = 4 kHz x 512 channels x 0.8 = 1.6 M samples/s = 160 kbases/s at 400 b/s

MinION = 4 kHz x 512 channels x 1.0 = 2,048,000 samples/s $\equiv$ 2.048 M samples/s or 2.048e+06 samples/s

It should be noted that this is based on an ideal situation where a flowcell is sequencing at 100% of its capacity / theoretical output. In reality this is unlikely to happen, so it's probably safe to assume that a GPU that can perform a minimum of 1.6 M samples/s for a given basecalling model will be able to keep up live.

:::warning
IMPORTANT: please remember these numbers and calculations are based on MinION flowcells and are not indicative of PromethION flowcell performance.
:::

## Table of results (updated: 2022-08-17)

| GPU / CPU                | FAST model^+^ | HAC model^+^ | SUP model^+^ | Guppy Version |
|--------------------------|:-------------:|:------------:|:------------:|---------------|
| A100 (40GB) (MS)         | 4.42000e+07   | 3.40000e+07  | 1.17000e+07  | 6.2.1         |
| A100 (40GB)              | 3.40604e+07   | 2.68319e+07  | 6.58227e+06  | 6.0.1         |
| RTX3090 (HG)             | 6.09667e+07   | 1.90738e+07  | 6.24702e+06  | 6.0.1         |
| RTX3080Ti (eGPU)         | 5.71209e+07   | 1.18229e+07  | 4.52692e+06  | 6.0.1         |
| A5000                    | 8.71738e+07   | 1.33596e+07  | 4.43743e+06  | 6.2.1         |
| Tesla V100 (16GB) (MS)   | 2.69000e+07   | 1.65000e+07  | 4.32000e+06  | 6.2.1         |
| Titan RTX (P920)         | 3.17412e+07   | 1.47765e+07  | 4.29710e+06  | 6.0.1         |
| Tesla V100 (32GB)        | 2.66337e+07   | 1.58095e+07  | 3.91847e+06  | 5.3.4         |
| RTX6000 (Clara AGX)      | 2.01672e+07   | 1.36405e+07  | 3.42290e+06  | 5.3.4         |
| Titan V (DE)             | 4.71917e+07   | 1.33653e+07  | 3.07009e+06  | 6.0.1         |
| RTX3070 (HG)             | 5.04924e+07   | 1.03841e+07  | 2.95291e+06  | 6.0.1         |
| RTX3070 (MH)             | 4.59143e+07   | 7.32223e+06  | 2.40374e+06  | 6.0.1         |
| RTX3060 (eGPU)           | 4.70238e+07   | 6.40374e+06  | 2.28163e+06  | 5.3.4         |
| RTX2060 SUPER (MS)       | 4.12000e+07   | 8.28000e+06  | 2.24000e+06  | 6.2.1         |
| Tesla T4 (16GB) (MS)     | 2.61000e+07   | 5.16000e+06  | 1.43000e+06  | 6.2.1         |
| RTX4000 (mobile)         | 2.88644e+07   | 4.81920e+06  | 1.36953e+06  | 6.0.1         |
| Tesla P100 (12GB) (MS)   | 1.41000e+07   | 4.00000e+06  | 9.37000e+05  | 6.2.1         |
| Jetson Xavier AGX (16GB) | 8.49277e+06   | 1.57560e+06  | 4.40821e+05  | 5.3.4         |
| Jetson Xavier NX         | 4.36631e+06   | -            | -            | 5.3.4         |
| Jetson TX2 (modified)    | 2.05553e+06   | -            | -            | 5.0.14        |
| Jetson TX2 (Mk1C)        | 1.60000e+06   | -            | -            | -             |
| Xeon W-10885M (CPU)      | 6.43747e+05   | DNF          | DNF          | 5.0.14        |

**NOTE:** the above table is currently sorted by best performance with the Super Accuracy (SUP) model.
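As a rough worked example of the keep-up arithmetic quoted earlier, the sketch below recomputes the MinION threshold (4,000 samples/s x 512 channels x 0.8 = 1,638,400 samples/s, which the quote rounds to 1.6 M), converts it to bases/s, and expresses a few SUP rates from the table as multiples of that threshold. The constant and function names are illustrative assumptions and are not part of Guppy or any ONT tool; the rates are copied from the table.

```python
# Figures quoted in the text for a MinION flow cell.
SAMPLE_RATE_HZ = 4_000     # samples per second per channel (4 kHz)
CHANNELS = 512             # channels on a MinION flow cell
KEEP_UP_FRACTION = 0.8     # "keep up" = 80% of theoretical output
BASES_PER_SECOND = 400     # sequencing speed quoted in the text


def theoretical_samples_per_s() -> int:
    """Samples/s if every channel sequenced all of the time."""
    return SAMPLE_RATE_HZ * CHANNELS


def keep_up_samples_per_s() -> float:
    """Samples/s a basecaller must sustain to 'keep up'."""
    return theoretical_samples_per_s() * KEEP_UP_FRACTION


def samples_to_bases(samples_per_s: float) -> float:
    """Convert samples/s to bases/s (4000 / 400 = 10 samples per base)."""
    samples_per_base = SAMPLE_RATE_HZ / BASES_PER_SECOND
    return samples_per_s / samples_per_base


def headroom(samples_per_s: float) -> float:
    """How many times over the keep-up threshold a measured rate is."""
    return samples_per_s / keep_up_samples_per_s()


# SUP-model rates (samples/s) copied from the results table.
SUP_RATES = {
    "A100 (40GB) (MS)": 1.17000e07,
    "RTX3090 (HG)": 6.24702e06,
    "Jetson Xavier AGX (16GB)": 4.40821e05,
}

if __name__ == "__main__":
    print(f"theoretical: {theoretical_samples_per_s():,} samples/s")
    print(f"keep up:     {keep_up_samples_per_s():,.0f} samples/s")
    print(f"keep up:     {samples_to_bases(keep_up_samples_per_s()):,.0f} bases/s")
    for gpu, rate in SUP_RATES.items():
        print(f"{gpu}: {headroom(rate):.2f}x the keep-up threshold")
```

A headroom above 1 means the card should, on these MinION assumptions, basecall faster than the flow cell produces data; below 1 it will fall behind.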
A massive thank you to all external contributors:

* **David Eccles** (Titan V)
* **Martin Haagmans** (RTX3070)
* **Hasindu Gamaarachchi** (RTX3070, 2x RTX3090)
* **Michael Shamash** (RTX2060, P100, T4, V100-SXM2, A100)

### Table indicating live calling performance

The table below lists results for all the GPUs that we have currently tested. We have used the same example set of ONT fast5 files and Guppy 5.0.16, and where possible have tuned the `chunks_per_runner` parameter to get the most out of HAC and SUP calling on the GPU being tested. This hopefully gives a more "real world" example of what you can expect from these types of cards in terms of basecalling rate.

The colours represent how well a given GPU and basecalling model will keep up with live basecalling during a sequencing run.

* green - easily keeps up in real-time
* orange - will likely keep up with 80-90% of the run in real-time
* red - won't get anywhere close, large lag in basecalling

From the ONT community forum ([link](https://community.nanoporetech.com/protocols/Guppy-protocol/v/gpb_2003_v1_revaa_14dec2018/guppy-software-overview)):

> “Keep up” is defined as 80% of the theoretical flow cell output.
> e.g. MinION = 4 kHz x 512 channels x 0.8 = 1.6 M samples/s = 160 kbases/s at 400 b/s

MinION = 4 kHz x 512 channels x 1.0 = 2,048,000 samples/s $\equiv$ 2.048 M samples/s or 2.048e+06 samples/s

It should be noted that this is based on an ideal situation where a flowcell is sequencing at 100% of its capacity / theoretical output. In reality this is unlikely to happen, so it's probably safe to assume that a GPU that can perform a minimum of 1.6 M samples/s for a given basecalling model will be able to keep up 'live'.

![image](https://user-images.githubusercontent.com/5932864/155636629-da28c09e-8253-46c0-8cf1-636aa2df1db5.png)

\* the metric reported is samples/second - where higher is faster basecalling

**DNF** - did not finish (I couldn’t be bothered waiting hours/days for the CPU)
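The traffic-light scheme above can be sketched as a small classifier. The exact cut-offs used to colour the table are not stated here, so the bands below are assumptions: green means at or above the 80% keep-up threshold, and orange means within a hypothetical 80% of that threshold (`ORANGE_FRACTION`). The results may therefore not match the colours in the image, and all names are illustrative.

```python
# MinION figures quoted in the text.
SAMPLE_RATE_HZ = 4_000     # samples per second per channel (4 kHz)
CHANNELS = 512
KEEP_UP_FRACTION = 0.8     # "keep up" = 80% of theoretical output

# ASSUMPTION: the original colour cut-offs are not given, so this
# fraction of the keep-up threshold is purely illustrative.
ORANGE_FRACTION = 0.8


def keep_up_threshold() -> float:
    """Samples/s needed to keep up with a MinION flow cell."""
    return SAMPLE_RATE_HZ * CHANNELS * KEEP_UP_FRACTION


def classify(samples_per_s: float) -> str:
    """Map a measured basecalling rate to green / orange / red."""
    threshold = keep_up_threshold()
    if samples_per_s >= threshold:
        return "green"
    if samples_per_s >= ORANGE_FRACTION * threshold:
        return "orange"
    return "red"


# SUP-model rates (samples/s) copied from the results table.
SUP_RATES = {
    "RTX3090 (HG)": 6.24702e06,
    "Tesla T4 (16GB) (MS)": 1.43000e06,
    "Jetson Xavier AGX (16GB)": 4.40821e05,
}

if __name__ == "__main__":
    for gpu, rate in SUP_RATES.items():
        print(f"{gpu}: {rate:.3e} samples/s -> {classify(rate)}")
```

Adjusting `ORANGE_FRACTION` (or replacing it with whatever cut-off matches your own tolerance for lag) changes only the orange band; the green band follows directly from the 80% definition quoted above.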
