GPU basecalling ONT data

Author: Miles Benton (GitHub; Twitter)
Created: 2021-11-16 10:24:03
Last modified: 2022-08-17 10:37:17

tags: Nanopore GPU notes documentation benchmarks

A note on live basecalling

From ONT community forum link:

“Keep up” is defined as 80% of the theoretical flow cell output.
e.g. MinION = 4 kHz x 512 channels x 0.8 = 1.6 M samples/s = 160 kbases/s at 400 b/s

MinION = 4 kHz x 512 channels x 1.0 = 2,048,000 samples/s

2.048 M samples/s or 2.048e+06 samples/s

Note that this assumes an ideal situation where a flowcell is sequencing at 100% of its capacity / theoretical output. In reality this is unlikely to happen, so it's probably safe to assume that a GPU that can sustain a minimum of 1.6 M samples/s for a given basecalling model will be able to keep up live.
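The arithmetic behind these numbers can be sketched in a few lines of Python. This is only an illustration: it assumes the 4 kHz (4,000 samples/s per channel) sampling rate that the totals above imply, and the function and constant names are made up for this example.

```python
# Rough sketch of the "keep up" arithmetic for a MinION flowcell.
# Assumption: 4,000 samples/s per channel, which is what the
# 2,048,000 samples/s total above implies for 512 channels.
SAMPLE_RATE_HZ = 4_000     # per-channel sampling rate
CHANNELS = 512             # MinION channels
BASES_PER_SECOND = 400     # sequencing speed quoted above
KEEP_UP_FRACTION = 0.8     # "keep up" = 80% of theoretical output


def theoretical_samples_per_s(sample_rate_hz=SAMPLE_RATE_HZ, channels=CHANNELS):
    """Samples/s if every channel were sequencing all the time."""
    return sample_rate_hz * channels


def keep_up_samples_per_s(fraction=KEEP_UP_FRACTION):
    """Samples/s a basecaller must sustain to 'keep up'."""
    return theoretical_samples_per_s() * fraction


def samples_to_bases_per_s(samples_per_s,
                           sample_rate_hz=SAMPLE_RATE_HZ,
                           bases_per_s=BASES_PER_SECOND):
    """Convert samples/s to bases/s (4000 / 400 = 10 samples per base)."""
    samples_per_base = sample_rate_hz / bases_per_s
    return samples_per_s / samples_per_base


if __name__ == "__main__":
    print(f"theoretical: {theoretical_samples_per_s():,} samples/s")
    print(f"keep up:     {keep_up_samples_per_s():,.0f} samples/s")
    print(f"keep up:     {samples_to_bases_per_s(keep_up_samples_per_s()):,.0f} bases/s")
```

The exact keep-up figure comes out at 1,638,400 samples/s (163,840 bases/s), which the quote rounds to 1.6 M samples/s and 160 kbases/s.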

IMPORTANT: please remember these numbers and calculations are based on MinION flowcells and not indicative of PromethION flowcell performance.

Table of results (updated: 2022-08-17)

GPU\CPU                   FAST model+  HAC model+   SUP model+   Guppy Version
A100 (40GB) (MS)          4.42000e+07  3.40000e+07  1.17000e+07  6.2.1
A100 (40GB)               3.40604e+07  2.68319e+07  6.58227e+06  6.0.1
RTX3090 (HG)              6.09667e+07  1.90738e+07  6.24702e+06  6.0.1
RTX3080Ti (eGPU)          5.71209e+07  1.18229e+07  4.52692e+06  6.0.1
A5000                     8.71738e+07  1.33596e+07  4.43743e+06  6.2.1
Tesla V100 (16GB) (MS)    2.69000e+07  1.65000e+07  4.32000e+06  6.2.1
Titan RTX (P920)          3.17412e+07  1.47765e+07  4.29710e+06  6.0.1
Tesla V100 (32GB)         2.66337e+07  1.58095e+07  3.91847e+06  5.3.4
RTX6000 (Clara AGX)       2.01672e+07  1.36405e+07  3.42290e+06  5.3.4
Titan V (DE)              4.71917e+07  1.33653e+07  3.07009e+06  6.0.1
RTX3070 (HG)              5.04924e+07  1.03841e+07  2.95291e+06  6.0.1
RTX3070 (MH)              4.59143e+07  7.32223e+06  2.40374e+06  6.0.1
RTX3060 (eGPU)            4.70238e+07  6.40374e+06  2.28163e+06  5.3.4
RTX2060 SUPER (MS)        4.12000e+07  8.28000e+06  2.24000e+06  6.2.1
Tesla T4 (16GB) (MS)      2.61000e+07  5.16000e+06  1.43000e+06  6.2.1
RTX4000 (mobile)          2.88644e+07  4.81920e+06  1.36953e+06  6.0.1
Tesla P100 (12GB) (MS)    1.41000e+07  4.00000e+06  9.37000e+05  6.2.1
Jetson Xavier AGX (16GB)  8.49277e+06  1.57560e+06  4.40821e+05  5.3.4
Jetson Xavier NX          4.36631e+06  -            -            5.3.4
Jetson TX2 (modified)     2.05553e+06  -            -            5.0.14
Jetson TX2 (Mk1C)         1.60000e+06  -            -            -
Xeon W-10885M (CPU)       6.43747e+05  DNF          DNF          5.0.14

NOTE: the above table is currently sorted based on best performance in the Super Accuracy Model (SUP).
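As a rough illustration of how to read the table against the 1.6 M samples/s "keep up" figure, the sketch below checks a handful of the SUP results. The rates are copied from the table above; the helper names are made up for this example, and an offline benchmark rate is only a proxy for what happens during a real live run.

```python
# Compare a few SUP-model rates from the table against the
# 1.6 M samples/s "keep up" figure for a MinION flowcell.
KEEP_UP_SAMPLES_PER_S = 1.6e6

# Subset of the SUP column above (samples/s).
SUP_RATES = {
    "A100 (40GB) (MS)": 1.17000e+07,
    "RTX3090 (HG)": 6.24702e+06,
    "RTX3060 (eGPU)": 2.28163e+06,
    "Tesla T4 (16GB) (MS)": 1.43000e+06,
    "Jetson Xavier AGX (16GB)": 4.40821e+05,
}


def keeps_up(samples_per_s, threshold=KEEP_UP_SAMPLES_PER_S):
    """True if the benchmarked rate meets the keep-up threshold."""
    return samples_per_s >= threshold


def headroom(samples_per_s, threshold=KEEP_UP_SAMPLES_PER_S):
    """Benchmarked rate as a multiple of the keep-up threshold."""
    return samples_per_s / threshold


if __name__ == "__main__":
    for gpu, rate in SUP_RATES.items():
        verdict = "keeps up" if keeps_up(rate) else "falls behind"
        print(f"{gpu:<26} {rate:.3e} samples/s  {headroom(rate):5.2f}x  {verdict}")
```

A headroom value comfortably above 1 suggests the card should cope with live SUP calling on a MinION; values below 1 mean basecalling will lag behind the run.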

A massive thank you to all external contributors:

  • David Eccles (Titan V)
  • Martin Haagmans (RTX3070)
  • Hasindu Gamaarachchi (RTX3070, 2x RTX3090)
  • Michael Shamash (RTX2060, P100, T4, V100-SXM2, A100)

Table indicating live calling performance

The table below lists results for all the GPUs that we have tested so far. We used the same example set of ONT fast5 files and Guppy 5.0.16, and where possible tuned the chunks_per_runner parameter to get the most out of HAC and SUP calling on the GPU being tested. This hopefully gives a more "real world" example of the basecalling rate you can expect from these types of cards.
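For anyone wanting to repeat this kind of tuning, here is a minimal sketch of how a chunks_per_runner sweep could be set up. It only builds and prints guppy_basecaller command lines and does not run them. The paths, the config name, the device string and the candidate values are all placeholders assumed for illustration, not the settings used for the results here.

```python
# Build guppy_basecaller command lines for a chunks_per_runner sweep.
# Everything passed in below (paths, config, device, candidate values)
# is a hypothetical placeholder; adjust for your own setup.
import shlex


def build_guppy_command(input_path, save_path, config, chunks_per_runner,
                        device="cuda:0"):
    """Return the argument list for one guppy_basecaller run."""
    return [
        "guppy_basecaller",
        "--input_path", input_path,
        "--save_path", save_path,
        "--config", config,
        "--device", device,
        "--chunks_per_runner", str(chunks_per_runner),
    ]


if __name__ == "__main__":
    # Example candidate values only; a sensible range depends on GPU memory.
    for cpr in (256, 512, 1024):
        cmd = build_guppy_command(
            input_path="fast5/",                    # hypothetical input dir
            save_path=f"basecalls_cpr{cpr}/",       # hypothetical output dir
            config="dna_r9.4.1_450bps_hac.cfg",     # assumed config name
            chunks_per_runner=cpr,
        )
        print(shlex.join(cmd))
```

Each printed line can be run by hand, and the samples/s figure Guppy reports at the end of the run compared across settings.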

The colours represent how well a given GPU and basecalling model will perform for keeping up with live basecalling during a sequencing run.

  • green - easily keeps up in real-time
  • orange - will likely keep up with 80-90% of the run in real-time
  • red - won't get anywhere close, large lag in basecalling

[Image not available: colour-coded table of live basecalling performance by GPU and basecalling model]

* the metric reported is samples/second - where higher is faster basecalling
DNF - did not finish (I couldn’t be bothered waiting hours/days for the CPU)