# Nanopore Guppy GPU basecalling on Windows using WSL2
*Author:* [Miles Benton](https://sirselim.github.io/) ([GitHub](https://github.com/sirselim); [Twitter](https://twitter.com/miles_benton))
*Created:* 2021-06-16 21:05:32
*Last modified:* 2021-06-21 11:24:06
###### tags: `Nanopore` `GPU` `notes` `documentation` `Linux` `Windows`
-----
:::warning
**WARNING:** this is still very much 'experimental' in terms of the process and packages that are available. I will highlight throughout this section exactly which software and/or drivers are currently in beta - be warned it's pretty much everything.
:::
-----
## Foreword
This is documentation of my notes and experiences getting a brand new laptop running Windows 10 to perform GPU basecalling via Windows Subsystem for Linux 2 (WSL2).
Those who know me well know that it has been a very long time since I owned (or really used) a Windows based device. I made the shift to Linux during my masters, which would have been around the time that Windows 7 was quite new on the scene.
*Disclaimer:* **I did not have a good time!** But to be fair I did get it working and once working it seems pretty good. I still believe that Linux offers the best experience for Nanopore sequencing and all downstream processing and analysis. The Linux desktop experience has come a very long way in the last 5-10 years and it's actually very easy and quick to get up and running. Plus Nanopore seem to do all their development either on, or geared towards, Linux (Ubuntu runs on all ONT hardware), and I don't see that changing anytime soon.
## Resources
Before I launch into how I got set up I want to provide a list of some of the sites/resources I visited during the process:
* [Windows Subsystem for Linux Installation Guide for Windows 10](https://docs.microsoft.com/en-us/windows/wsl/install-win10)
* [Getting started with CUDA on Ubuntu on WSL 2](https://ubuntu.com/blog/getting-started-with-cuda-on-ubuntu-on-wsl-2)
* [How to Install WSL 2 on Windows 10 (Updated)](https://www.omgubuntu.co.uk/how-to-install-wsl2-on-windows-10)
* [WSL2 on Windows](https://dinhanhthi.com/docker-wsl2-windows/#wsl-%2B-windows)
* [Install the CUDA Driver and Toolkit in WSL2](https://levelup.gitconnected.com/install-the-cuda-driver-and-toolkit-in-wsl2-be38703fed5c)
* [Docker Desktop WSL 2 backend](https://docs.docker.com/docker-for-windows/wsl/)
As I mentioned previously, I had some real issues getting the Windows 'WSL' drivers communicating with the Ubuntu installed inside WSL2. I 'think' it was the reinstall of the Nvidia preview CUDA drivers with WSL, followed by a random system update (that seemed to take >30 mins) that finally 'fixed' my issue... but your mileage may well vary. However the above resources should be enough to help get around any issues that arise (fingers crossed).
## The system
For completeness' sake I'll record the system specs of the laptop that was used for this experiment. It was a new HP ZBook Fury 17 G7 Mobile Workstation, a very 'beefy'/powerful laptop in the scheme of things.
```
OS Name: Microsoft Windows 10 Pro for Workstations
Version: 10.0.21390 Build 21390
System Model: HP ZBook Fury 17 G7 Mobile Workstation
System Type: x64-based PC
Processor: Intel(R) Xeon(R) W-10885M CPU @ 2.40GHz, 2400 Mhz, 8 Core(s), 16 Logical Processor(s)
System memory: 64 GB RAM
Display: Nvidia Quadro RTX4000
Storage: 2x 2 TB NVMe SSD
```
## The process
The basic overview of what is required looks something like this:
`upgrade to Windows Insider ->
install WSL ->
install Nvidia drivers ->
install CUDA toolkit ->
basecall with GPU Guppy`
It seems simple, it should be simple, but it's not as simple as it's made out to be. Hopefully the below provides some use to those embarking on this journey.
:::info
**DISCLAIMER:** I find writing 'guides' for setting things up on Windows difficult. It's so foreign to me trying to explain things like "navigate here", "click this", "install that", etc, as opposed to something reproducible like `apt install package`. Therefore I'm going to leverage websites and guides that have been already written for most of that type of thing. This is just a heads up that you will need to navigate around a little bit, mainly at the start - once we get into the Linux side of things life becomes 'simple' again.
:::
Let's go!
### Windows Insider installation/upgrade
:::danger
**BETA/PREVIEW SOFTWARE:** this is just a warning that this software is still in preview/beta and may not behave as expected.
:::
First you will need to sign up to the Windows Insider Program. You can do that [here](https://insider.windows.com/).
Once you have access to the Insider program have a read through [this page](https://docs.microsoft.com/en-us/windows-insider/get-started#:~:text=Go%20to%20Settings%20%3E%20Update%20%26%20Security,you%20registered%20with%20and%20continue.) and follow the instructions there to get upgraded to a fresh cutting edge Windows Insider build.
### Installing WSL2
The below gives an overview of the process. It looks simple, it should be simple, I did not find it simple - again your mileage may vary.
1. Enable WSL 2
2. Enable ‘Virtual Machine Platform’
3. Set WSL 2 as default
4. Install a Linux distro
There is a 'preview' simplified approach:
>*The `wsl --install` simplified install command requires that you join the Windows Insiders Program and install a preview build of Windows 10 (OS build 20262 or higher), but eliminates the need to follow the manual install steps. All you need to do is open a command window with administrator privileges and run `wsl --install`, after a restart you will be ready to use WSL.* [(source)](https://docs.microsoft.com/en-us/windows/wsl/install-win10)
So in theory once you are set up with Windows Insider it should now be possible to install WSL, including Ubuntu 20.04, in a single command (in an Admin Powershell window):
```shell=
wsl --install
```
For whatever reason this didn't work for me so I documented the manual approach I took below. Hopefully this one line approach works for others and improves with age.
So on to the manual setup...
#### Enable WSL 2
Run the below code in Admin elevated Powershell:
```shell=
# Enable Windows Subsystem for Linux
# PowerShell as Admin
dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart
```
#### Enable ‘Virtual Machine Platform’
Run the below code in Admin elevated Powershell:
```shell
# Enable Virtual Machine Platform (for WSL2)
dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart
```
You should now restart your system before moving to the next step.
#### Download and install WSL2 update package
Run the below in PowerShell to download the required installer:
```shell=
# Download the WSL2 Linux kernel update package for x64 machines
Invoke-WebRequest -Uri https://wslstorestorage.blob.core.windows.net/wslblob/wsl_update_x64.msi -OutFile wsl_update_x64.msi
```
Run the downloaded update package (double-click to run - you will be prompted for elevated permissions, select ‘yes’ to approve this installation). The system may want to restart here.
#### Set WSL 2 as default
Now in the PowerShell window we set the default environment to WSL2:
```shell=
# PowerShell as Admin
wsl --set-default-version 2
```
I can't recall if there was a system restart here, probably.
#### Install a Linux distro
Time to install our Linux distribution. I picked Ubuntu 18.04 as this at the moment is the easiest to 'mesh' with current ONT software. Other Debian/Ubuntu distros will work fine, you'll just need to change a few software versions in the below code.
So first go to the [Windows Store](https://aka.ms/wslstore) and search for Ubuntu. Select Ubuntu 18.04 and follow the instructions to 'get' and install. Once it's installed you should be able to launch from the start menu. The first time you will be asked to create a user name and password - don't forget these!
Once you're in you can update the distro:
```shell=
# Update system
sudo apt update
sudo apt upgrade # be careful, it takes time!
```
Hooray! We now have Linux in Windows. :)
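If you want to sanity check that the new distro really is running under WSL2 (and not WSL1), one option is to look at the kernel release string from inside Ubuntu. The sketch below is a heuristic rather than an official check: it assumes WSL2 kernels report a release containing `microsoft-standard`, and `is_wsl2_kernel` is just a helper name made up for this example.

```shell
# Heuristic only (an assumption, not an official API): WSL2 kernels are assumed
# to report a release string such as 5.4.72-microsoft-standard-WSL2.
is_wsl2_kernel() {
    case "$1" in
        *microsoft-standard*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the kernel we are currently running on
if is_wsl2_kernel "$(uname -r)"; then
    echo "Looks like a WSL2 kernel: $(uname -r)"
else
    echo "Not a WSL2 kernel (or an unexpected release string): $(uname -r)"
fi
```

If this reports something unexpected, it is worth going back over the 'Set WSL 2 as default' step before installing any drivers.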
### Nvidia CUDA driver (Windows)
Time to grab and install some fancy preview drivers that allow a bridging between Windows and the Linux kernel in WSL2.
:::danger
**BETA SOFTWARE:** this is just a warning that this software is still in preview/beta and may not behave as expected.
:::
First you will need to join the Nvidia developers programme. Once you have done this you will be able to access the preview drivers, as well as a LOT more cool stuff that really is worth checking out if you are at all interested.
Sign up for developer access [here](https://developer.nvidia.com/developer-program).
Once you have access it's time to download the preview driver, you can get that [here](https://developer.nvidia.com/cuda/wsl/download). Depending on your card you'll want to grab either the GeForce or Quadro version of the driver. I grabbed the Quadro version for the mobile RTX4000.
Once that is downloaded it can be installed. Selecting the recommended options is fine and installation should go smoothly.
If you want more detailed instructions for the above please follow along with the first section of this guide ([link](https://levelup.gitconnected.com/install-the-cuda-driver-and-toolkit-in-wsl2-be38703fed5c)).
### Install CUDA-toolkit inside WSL
Now we need to install some packages within our new WSL2 environment.
:::info
**IMPORTANT:** At the moment versions of the cuda-toolkit other than 11.0 and 10.2 (i.e. 11.2 and greater) seem to be 'broken'. I have gone with 11.0 here as this is required for the later versions of Guppy. It works well so it is my current suggestion. I noted that this issue has been clocked and a fix is inbound shortly.
:::
Because the latest versions of Guppy are built against CUDA 11 we'll grab the CUDA 11.0 toolkit and install it in our WSL2 environment.
First we need to set up the Nvidia public key:
```shell=
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
```
Now we add the Nvidia repositories to our system:
```shell=
sudo sh -c 'echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/cuda.list'
```
Time to update:
```shell=
sudo apt-get update
```
Now we are able to install the CUDA toolkit:
```shell=
sudo apt-get --yes install cuda-toolkit-11-0 cuda-toolkit-10-2
```
I grabbed cuda-toolkit-10-2 as well for some 'legacy' needs that may arise for some other things I'm testing.
Once the CUDA toolkit is installed in WSL it's probably worth another restart of the computer (Windows has way too many restarts!). After reboot you can check if a folder named `/usr/lib/wsl` is present in the WSL2 environment. If you can find the folder, the whole installation process has worked. If not, then like me you'll be going back over numerous steps trying to figure out what went wrong. For me re-installing the Nvidia preview driver and doing another system update seemed to do the trick. So hopefully people following along with this get it to work first time, if not don't give up as it is possible - or you could give up and install Linux and have a much easier time... ;)
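The folder check described above can be scripted so it is easy to repeat after each driver reinstall or update. This is a minimal sketch under a couple of assumptions: that the driver libraries are exposed in the `lib` subfolder of `/usr/lib/wsl`, and that a `libcuda.so*` file is among them. `check_wsl_gpu_libs` is a hypothetical helper name, not something shipped by Nvidia or Microsoft.

```shell
# Hypothetical helper: succeeds only if the given directory exists and
# contains a libcuda.so* file. The default path is an assumption based on
# the /usr/lib/wsl folder mentioned in the text.
check_wsl_gpu_libs() {
    dir="${1:-/usr/lib/wsl/lib}"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if ls "$dir"/libcuda.so* >/dev/null 2>&1; then
        echo "ok: CUDA driver library found in $dir"
    else
        echo "missing: no libcuda.so* in $dir"
        return 1
    fi
}

# Run against the default location
check_wsl_gpu_libs || echo "GPU passthrough does not look ready yet"
```

A pass here only shows that the files are in place; the CUDA samples are still the real test.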
Now you can check if CUDA and your GPU works on Ubuntu with a sample program.
#### Testing CUDA
CUDA ships with a large range of builtin samples. These can be built and used to test the GPU/CUDA/Driver environment.
If all has gone well and we now have a working set up we should be able to build any of these samples, run them and get a 'pass'. Below I have done this and report the output (which will be specific to this system).
##### deviceQuery
First you'll need to move to the specific directory and make the sample 'program':
```shell=
cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
```
You can then run the program:
```shell=
$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA Quadro RTX 4000 with Max-Q Design"
CUDA Driver Version / Runtime Version 11.4 / 11.0
CUDA Capability Major/Minor version number: 7.5
Total amount of global memory: 8192 MBytes (8589934592 bytes)
(40) Multiprocessors, ( 64) CUDA Cores/MP: 2560 CUDA Cores
GPU Max Clock rate: 1380 MHz (1.38 GHz)
Memory Clock rate: 6001 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 4194304 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1024
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Managed Memory: Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 11.0, NumDevs = 1
Result = PASS
```
Hooray! All looks like it is working here!
##### BlackScholes
Build this sample the same way as `deviceQuery`, from its own directory under `/usr/local/cuda/samples` (in the CUDA 11.0 layout it should be `4_Finance/BlackScholes`), then run it:
```shell=
$ ./BlackScholes
[./BlackScholes] - Starting...
GPU Device 0: "Turing" with compute capability 7.5
Initializing data...
...allocating CPU memory for options.
...allocating GPU memory for options.
...generating input data in CPU mem.
...copying input data to GPU mem.
Data init done.
Executing Black-Scholes GPU kernel (512 iterations)...
Options count : 8000000
BlackScholesGPU() time : 1.981922 msec
Effective memory bandwidth: 40.364860 GB/s
Gigaoptions per second : 4.036486
BlackScholes, Throughput = 4.0365 GOptions/s, Time = 0.00198 s, Size = 8000000 options, NumDevsUsed = 1, Workgroup = 128
Reading back GPU results...
Checking the results...
...running CPU calculations.
Comparing the results...
L1 norm: 1.741792E-07
Max absolute error: 1.192093E-05
Shutting down...
...releasing GPU memory.
...releasing CPU memory.
Shutdown done.
[BlackScholes] - Test Summary
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
Test passed
```
So with that we can conclude that CUDA is working in WSL2 and utilising the GPU. Feel free to explore the various samples contained in that directory, there are some fun little examples in there.
We can finally get to the point of the whole exercise.
### Basecalling with GPU in WSL
So with all the above done and working the rest is actually straightforward.
First you will require access to the Nanopore Community space to download the software. If you haven't already you can sign up [here](https://community.nanoporetech.com/).
Once that is sorted you can proceed with downloading Guppy.
#### Download and Extract Guppy
In the software section you'll find Guppy listed. You can either manually download the "Linux 64-bit GPU" binaries or you can right click and copy the url (link). If you modify the below code with that url you'll be able to run the following code block in a WSL2 terminal and download and extract Guppy in one go.
```shell=
version=5.0.11
wget https://[paste your link here]/ont-guppy_${version}_linux64.tar.gz
tar -zxvf ont-guppy_${version}_linux64.tar.gz
```
This approach has the added benefit of being able to easily modify the version number should you wish to download an older version of Guppy.
We can now check that the downloaded binary works:
```shell=
$ ~/ont-guppy/bin/guppy_basecaller -v
: Guppy Basecalling Software, (C) Oxford Nanopore Technologies, Limited. Version 5.0.11+2b6dbff
```
Great, this is looking good.
#### Basecall some data
Time to throw some data at Guppy and the GPU and see what sort of performance we get. I'm going to test with both the fast model as well as the new super high accuracy model - I'm very interested to see how this laptop performs with the sup model.
:::warning
**WARNING:** OK, so I got caught out here. My years of not using Windows meant I overlooked the fact that it doesn't default to the best performance in terms of power mode when plugged in. What this means is that certain pieces of hardware (likely CPU and GPU) are 'scaled' down in their performance to save power. When I noticed and changed to the 'Best' performance profile, it greatly impacted basecalling - in a very positive way!
So please consider this a public service announcement if you are using, or plan to use, Windows for GPU basecalling with Nanopore data.
:::
Ok, on with the fun stuff!
##### FAST model
First up the FAST model with default parameters.
```shell=
$ ~/ont-guppy/bin/guppy_basecaller -c dna_r9.4.1_450bps_fast.cfg -i fast5/ -s fastq -x 'auto' --recursive
ONT Guppy basecalling software version 5.0.11+2b6dbff
config file: /home/miles/ont-guppy/data/dna_r9.4.1_450bps_fast.cfg
model file: /home/miles/ont-guppy/data/template_r9.4.1_450bps_fast.jsn
input path: fast5/
save path: fastq
chunk size: 2000
chunks per runner: 160
minimum qscore: 8
records per file: 4000
num basecallers: 4
gpu device: auto
kernel path:
runners per device: 8
Found 10 fast5 files to process.
Init time: 803 ms
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 15874 ms, Samples called: 189550286, samples/s: 1.19409e+07
Finishing up any open output files.
Basecalling completed successfully.
```
It took a total of **15 secs**, nice! Before changing power mode profiles the FAST model with default settings took 37 secs, so that's a reduction of more than 50% in terms of time, or more than a doubling in basecalling speed. Cool!
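If you are collecting timings from several runs, the summary line Guppy prints at the end is easy to pull apart. The below is a minimal sketch, assuming the Guppy output was saved somewhere and that the line looks exactly like the one in the run above; `summarise_guppy_log` is a made-up helper name, not part of Guppy.

```shell
# Read Guppy output on stdin and, for each "Caller time" summary line,
# print: seconds <TAB> samples called <TAB> samples/s (as reported by Guppy)
summarise_guppy_log() {
    awk '/^Caller time:/ {
        gsub(",", "")    # drop the comma separators so the fields are clean
        printf "%.1f\t%s\t%s\n", $3 / 1000, $7, $NF
    }'
}

# Example using the summary line from the FAST run
echo "Caller time: 15874 ms, Samples called: 189550286, samples/s: 1.19409e+07" | summarise_guppy_log
```

In practice you could redirect the basecaller output to a file per run and feed each file to the function to build up a small results table.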
Now we have confirmation of a fully working GPU Guppy set up in Windows WSL2. Let's try the other basecalling models.
##### HAC model
This first run was before I noticed the power mode issue.
```shell=
$ ~/ont-guppy/bin/guppy_basecaller -c dna_r9.4.1_450bps_hac.cfg -i fast5/ -s fastq -x 'auto' --recursive
ONT Guppy basecalling software version 5.0.11+2b6dbff
config file: /home/miles/ont-guppy/data/dna_r9.4.1_450bps_hac.cfg
model file: /home/miles/ont-guppy/data/template_r9.4.1_450bps_hac.jsn
input path: fast5/
save path: fastq
chunk size: 2000
chunks per runner: 256
minimum qscore: 9
records per file: 4000
num basecallers: 4
gpu device: auto
kernel path:
runners per device: 4
Found 10 fast5 files to process.
Init time: 973 ms
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 251015 ms, Samples called: 189550286, samples/s: 755135
Finishing up any open output files.
Basecalling completed successfully.
```
That's **4mins 11secs**.
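The run times quoted in this section appear to be Guppy's `Caller time` converted from milliseconds and truncated to whole seconds. A small sketch of that conversion, using only shell integer arithmetic (`ms_to_minsec` is a hypothetical helper for illustration):

```shell
# Convert a duration in milliseconds to "<m>mins <s>secs",
# dropping the fractional second
ms_to_minsec() {
    total=$(( $1 / 1000 ))
    echo "$(( total / 60 ))mins $(( total % 60 ))secs"
}

ms_to_minsec 251015   # caller time from the HAC run above -> 4mins 11secs
```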
Once I had clocked that the laptop was not running at full power while plugged in I made the required change and reran the models. The HAC model with default parameters (as above) completed in around **3mins 30secs**, so a good increase in basecalling rate just by changing the power profile.
I then had a play with tweaking the model. Trying to optimise basecalling speed for this RTX4000 mobile GPU I increased the parameter `--chunks_per_runner`. I did this in small increments, keeping an eye on GPU memory being used. At around 384 the basecalling rate 'stabilised', by this I mean that increasing the parameter led to smaller and smaller gains in speed. At 412 I recorded the below run:
```shell=
$ ~/ont-guppy/bin/guppy_basecaller -c dna_r9.4.1_450bps_hac.cfg -i fast5/ -s fastq -x 'auto' --recursive --chunks_per_runner 412
ONT Guppy basecalling software version 5.0.11+2b6dbff
config file: /home/miles/ont-guppy/data/dna_r9.4.1_450bps_hac.cfg
model file: /home/miles/ont-guppy/data/template_r9.4.1_450bps_hac.jsn
input path: fast5/
save path: fastq
chunk size: 2000
chunks per runner: 412
minimum qscore: 9
records per file: 4000
num basecallers: 4
gpu device: auto
kernel path:
runners per device: 4
Found 10 fast5 files to process.
Init time: 927 ms
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 123090 ms, Samples called: 189550286, samples/s: 1.53993e+06
Finishing up any open output files.
Basecalling completed successfully.
```
That completed in **2mins 3secs** - which is nearly a minute and a half faster than the default HAC model parameters. That's some really nice gains!
:::info
**NOTE:** It should be noted that this is going to be very different between GPU models. Some GPUs will respond well to parameter optimisation, some won't. Most of the time the default model will be a fine option.
:::
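The incremental search over `--chunks_per_runner` described above could be scripted rather than done by hand. The below is only a dry-run sketch: it prints one command per value instead of running Guppy, the start/stop/step values are illustrative rather than the increments actually used, and the per-run output folder name (`fastq_cpr<value>`) and the helper `sweep_chunks_per_runner` are made up for this example.

```shell
# Dry run: print one guppy_basecaller command per --chunks_per_runner value.
# Arguments: start stop step (all integers). Nothing is executed here.
sweep_chunks_per_runner() {
    start="$1"; stop="$2"; step="$3"
    c="$start"
    while [ "$c" -le "$stop" ]; do
        echo "$HOME/ont-guppy/bin/guppy_basecaller -c dna_r9.4.1_450bps_hac.cfg" \
             "-i fast5/ -s fastq_cpr${c} -x auto --recursive --chunks_per_runner ${c}"
        c=$((c + step))
    done
}

# Illustrative range only; keep an eye on GPU memory when running for real
sweep_chunks_per_runner 256 416 32
```

Once the printed commands look right they can be piped to `sh` to actually run them, one after the other.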
##### SUP model
The SUP (super high accuracy) model came in with Guppy 5.0.7 and is much more taxing on hardware than the HAC model. So we expect this model to run anywhere from 2-8 times slower, depending greatly on the hardware that you have at hand.
This first run is the SUP model (default parameters) before adjusting the performance profile of the laptop.
```shell=
$ ~/ont-guppy/bin/guppy_basecaller -c dna_r9.4.1_450bps_sup.cfg -i fast5/ -s fastq -x 'auto' --recursive
ONT Guppy basecalling software version 5.0.11+2b6dbff
config file: /home/miles/ont-guppy/data/dna_r9.4.1_450bps_sup.cfg
model file: /home/miles/ont-guppy/data/template_r9.4.1_450bps_sup.jsn
input path: fast5/
save path: fastq
chunk size: 2000
chunks per runner: 256
minimum qscore: 10
records per file: 4000
num basecallers: 4
gpu device: auto
kernel path:
runners per device: 12
Found 10 fast5 files to process.
Init time: 2666 ms
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 553242 ms, Samples called: 189550286, samples/s: 342617
Finishing up any open output files.
Basecalling completed successfully.
```
So that SUP model run completed in **9mins 13secs**. I've yet to do enough testing with SUP to have a feel for what this result is like performance wise, but I'm feeling pleasantly surprised with what the mobile RTX4000 is able to do here. I was expecting it to take quite a long time.
This next run is still SUP model default but now running in full performance mode.
```shell=
$ ~/ont-guppy/bin/guppy_basecaller -c dna_r9.4.1_450bps_sup.cfg -i fast5/ -s fastq -x 'auto' --recursive
ONT Guppy basecalling software version 5.0.11+2b6dbff
config file: /home/miles/ont-guppy/data/dna_r9.4.1_450bps_sup.cfg
model file: /home/miles/ont-guppy/data/template_r9.4.1_450bps_sup.jsn
input path: fast5/
save path: fastq
chunk size: 2000
chunks per runner: 256
minimum qscore: 10
records per file: 4000
num basecallers: 4
gpu device: auto
kernel path:
runners per device: 12
Found 10 fast5 files to process.
Init time: 1625 ms
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 261553 ms, Samples called: 189550286, samples/s: 724711
Finishing up any open output files.
Basecalling completed successfully.
```
Well that made a MASSIVE difference. We've gone from 9mins 13secs down to **4mins 21secs**! I am even more impressed with the mobile RTX4000 GPU that is in this HP laptop. I believe that it will easily keep up with HAC live basecalling of one, maybe even two, MinIONs. I also think it might be able to SUP live basecall a single MinION, but we'll need to get it into the lab and start some real-life sequencing runs to confirm.
:::info
**NOTE:** I played with model parameter optimisation for all 3 models but it was only the HAC model that I found made any significant difference. This likely means that the FAST and SUP models are fairly well optimised, at least for the GPU performance that the mobile RTX4000 brings to the table. Your mileage may vary based on the specific GPU(s) that you are using. In general the default models are going to do a really good job, but sometimes a little tweaking can eke out a bit more performance.
I was actually very surprised that I was able to drastically increase the basecalling rate of the HAC model with the change in power mode and adjustment of the chunks per runner parameter - that must have just been the sweet spot for this particular GPU.
:::
## Native Linux vs WSL2
I finally found some time to install Linux in a dual boot setup on the HP ZBook Fury 17 G7. I ended up going with Pop!_OS (21.10) as I've heard lots of great things about it, and I haven't been disappointed - I'll find time to write about that experience elsewhere (spoiler: everything just works!).
So now I've got a native Linux environment I thought it might be fun to see how basecalling compares between WSL2 and "full-blown" Linux. Here is a comparison table based on the same data and models above. I've recorded samples per second as the metric (rate of basecalling).
| Model | WSL2 (samples/s) | Pop!_OS (samples/s) | Speed Up |
|:--------:|:--------:|:--------:|:--------:|
| FAST | 1.19409e+07 | 2.88644e+07 | 2.4x |
| HAC | 1.53993e+06 | 4.8192e+06 | 3.1x |
| SUP | 724711 | 1.36953e+06 | 1.9x |
This seems crazy to me! I wasn't sure what I was expecting, maybe a little faster under native Linux but not this much faster. There are obviously a lot of overheads that are part of the WSL2 system. I reached out on Twitter and a WSL2 commented with some suggestions but at this stage I don't believe WSL2 is going to give the same level of performance that you will see under native Linux.
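For anyone wanting to repeat the comparison on their own hardware, the 'Speed Up' column is simply the native Linux rate divided by the WSL2 rate. A quick sketch of that arithmetic using `awk` (the `speedup` function name is just for illustration):

```shell
# speedup <wsl2_samples_per_s> <native_samples_per_s>
# Prints the ratio native/WSL2 to one decimal place
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1fx\n", b / a }'
}

speedup 1.19409e+07 2.88644e+07   # FAST -> 2.4x
speedup 1.53993e+06 4.8192e+06    # HAC  -> 3.1x
speedup 724711 1.36953e+06        # SUP  -> 1.9x
```

The inputs can be pasted straight from the `samples/s` value in Guppy's final summary line, scientific notation included.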
## What's next?
So after a fair few pain points with getting set up the goal was achieved and GPU basecalling inside WSL2 seems to work nicely. Instead of wiping Windows I might actually dual boot this laptop with Ubuntu 20.04 (or similar). This will allow me to do some more robust testing between the two operating systems. I don't imagine there is much, if any, performance hit to basecalling in Windows vs Linux but I don't think anyone has officially documented this (to the best of my knowledge).
If I find the time I would like to see if MinKNOW can be run inside WSL and accessed remotely for a sequencing run with live basecalling. There are going to be some rather large hurdles in that experiment so I'll probably let sleeping dogs lie for a while longer before revisiting.
So hopefully the process above was of some use to anyone that wants to use a Windows machine for GPU basecalling. I personally will continue to advocate for Linux, but it's nice to have been able to get this process working and I learned a lot, which is always a win in my books.
As with my other notes and documents I see this as fairly dynamic and will update as and when I find time. So please do feel free to check back occasionally.
Thanks for reading!