Data groupA

This tutorial assumes that Docker is installed and working.
See https://hackmd.io/@wvdt/docker for installation instructions.
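
Before starting, it can help to confirm that the tools used below are reachable from your shell. The following is a minimal sketch; `check_tool` is a hypothetical helper written for this tutorial, not part of Docker, conda or Nextflow. Note that `nextflow` may only be on your PATH after activating the matching conda environment.

```shell
# check_tool is a hypothetical helper: reports whether a command is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "MISSING: $1" >&2
        return 1
    fi
}

# Tools used later in this tutorial; a missing one is reported, not fatal.
for tool in docker conda nextflow; do
    check_tool "$tool" || true
done
```

If `docker` is found, running `docker info` as your normal user is a reasonable next check that the Docker daemon is running and that you have permission to use it.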

Let's look at the results for Run1, barcode08. This is a positive control sample.

DIR=/home/$USER/workshop_data/groupA
cd $DIR

# look at the quality of the run with pycoQC
conda activate pycoQC
pycoQC --summary_file ./sequencing_summary*.txt \
--html_outfile pycoQC_report.html

conda activate QC
mkdir -p $DIR/nanoplot/barcode08
cd $DIR/nanoplot/barcode08
NanoPlot --fastq ../../fastq_pass/barcode08/F*.fastq.gz

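# Concatenate the per-file reads. The F* glob (as used for NanoPlot above) keeps
# all_reads.fastq.gz itself out of the input if this step is run a second time.
cat $DIR/fastq_pass/barcode08/F*.fastq.gz > $DIR/fastq_pass/barcode08/all_reads.fastq.gz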

mkdir -p $DIR/fastq_pass/barcode08/filtered
filtlong --min_length 150 $DIR/fastq_pass/barcode08/all_reads.fastq.gz \
| gzip > $DIR/fastq_pass/barcode08/filtered/filtered_reads.fastq.gz
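
Concatenating gzip files with `cat` produces a valid multi-member gzip stream, so no decompression is needed for the merge. One way to sanity-check the result is to count reads before and after. The sketch below runs on two tiny made-up files in a temporary directory; `count_reads` is a hypothetical helper and assumes the usual four lines per FASTQ record.

```shell
# count_reads is a hypothetical helper: counts FASTQ records in gzipped files,
# assuming the usual 4 lines per record.
count_reads() {
    gzip -cd "$@" | awk 'END { print NR / 4 }'
}

# Self-contained demo on two tiny made-up files in a temporary directory.
demo=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' | gzip > "$demo/F_0.fastq.gz"
printf '@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n' | gzip > "$demo/F_1.fastq.gz"

# Concatenated gzip members still form a valid gzip stream.
cat "$demo"/F_*.fastq.gz > "$demo/all_reads.fastq.gz"

count_reads "$demo"/F_*.fastq.gz        # reads in the individual files
count_reads "$demo/all_reads.fastq.gz"  # should print the same number
```

The same helper can be pointed at `filtered_reads.fastq.gz` to see how many reads the length filter removed.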

NFDIR=$DIR/nf-flu/barcode08
mkdir -p $NFDIR

echo "sample,reads" > $NFDIR/samplesheet.csv
echo "barcode08,$DIR/fastq_pass/barcode08/filtered" >> $NFDIR/samplesheet.csv
cat $NFDIR/samplesheet.csv
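
The samplesheet is a plain CSV with a `sample,reads` header and one row per sample. If you want to process several barcodes at once, the rows can be generated in a loop. This is a sketch only: `make_samplesheet` is a hypothetical helper, the directory layout in the demo is made up, and it assumes each barcode already has a `filtered` directory as created above.

```shell
# make_samplesheet is a hypothetical helper: writes a "sample,reads" CSV with
# one row per barcode that has a filtered/ directory under <run_dir>/fastq_pass.
make_samplesheet() {
    run_dir=$1
    out=$2
    shift 2
    echo "sample,reads" > "$out"
    for bc in "$@"; do
        reads="$run_dir/fastq_pass/$bc/filtered"
        if [ -d "$reads" ]; then
            echo "$bc,$reads" >> "$out"
        else
            echo "skipping $bc: $reads not found" >&2
        fi
    done
}

# Demo with a made-up directory layout in a temporary directory.
demo=$(mktemp -d)
mkdir -p "$demo/fastq_pass/barcode01/filtered" "$demo/fastq_pass/barcode08/filtered"
make_samplesheet "$demo" "$demo/samplesheet.csv" barcode01 barcode08 barcode99
cat "$demo/samplesheet.csv"
```

Barcodes without a `filtered` directory (here `barcode99`) are skipped with a warning instead of producing a row that would fail later in the pipeline.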

conda activate nextflow
cd $NFDIR
nextflow run CFIA-NCFAD/nf-flu --input samplesheet.csv \
--platform nanopore -profile docker \
--skip_irma_subtyping_report false

Pre-download the NCBI files to speed up future runs!

# Tip: download the NCBI FASTA and metadata files once and keep them locally.
# You can then pass their file paths to the pipeline when you run it,
# so they do not have to be downloaded again on every run.
mkdir -p ~/workshop_data/references/{ncbi_influenza_fasta,ncbi_influenza_metadata}
wget https://ftp.ncbi.nih.gov/genomes/INFLUENZA/genomeset.dat.gz -O ~/workshop_data/references/ncbi_influenza_metadata/genomeset.dat.gz
wget https://ftp.ncbi.nih.gov/genomes/INFLUENZA/influenza.fna.gz -O ~/workshop_data/references/ncbi_influenza_fasta/influenza.fna.gz

NCBI_META=~/workshop_data/references/ncbi_influenza_metadata/genomeset.dat.gz
NCBI_FASTA=~/workshop_data/references/ncbi_influenza_fasta/influenza.fna.gz
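
To avoid repeating the download when the files are already in place, the `wget` calls can be wrapped in a small guard. This is a sketch under assumptions: `fetch_once` is a hypothetical helper, and it treats a file as complete when it is non-empty and passes `gzip -t`.

```shell
# fetch_once is a hypothetical helper: downloads <url> to <dest> only when
# <dest> is missing, empty, or fails the gzip integrity test.
fetch_once() {
    url=$1
    dest=$2
    if [ -s "$dest" ] && gzip -t "$dest" 2>/dev/null; then
        echo "already present: $dest"
        return 0
    fi
    mkdir -p "$(dirname "$dest")"
    wget "$url" -O "$dest" && gzip -t "$dest"
}

# Usage with the paths from this tutorial (commented out to avoid a large
# download when trying the sketch):
# fetch_once https://ftp.ncbi.nih.gov/genomes/INFLUENZA/influenza.fna.gz "$NCBI_FASTA"
# fetch_once https://ftp.ncbi.nih.gov/genomes/INFLUENZA/genomeset.dat.gz "$NCBI_META"
```

The integrity test also catches a download that was interrupted halfway, which would otherwise leave a truncated file behind.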

Let's do barcode01 of the same run. We will use medaka for variant calling now.
Attention: when using medaka, you have to specify which medaka model to use for variant calling and consensus. It should correspond to the model that Guppy used for basecalling! You can find a list of available models here: https://github.com/nanoporetech/medaka/blob/master/medaka/options.py

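For the r10 pore chemistry the naming scheme is:
r[pore_chemistry]_e82_[basecall_speed]_[basecalling_mode]_[variant_]g[guppy_version]
Variant-calling models carry the extra 'variant' field before the Guppy version; consensus models omit it.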

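Our data was generated on an r10.4.1 pore at 400 bps and basecalled in Fast mode with Guppy 6.4.6 (you can find this information in the file report_[....].html that is generated by the MinION). The model names do not cover every Guppy version, and medaka's documentation recommends taking the model with the highest version that does not exceed the Guppy version used. The best medaka model we can choose for variant calling is therefore 'r1041_e82_400bps_fast_variant_g632' (g632 stands for Guppy 6.3.2).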
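
To make the naming scheme concrete, the model name can be split on its underscores. The sketch below only handles names that follow the r10 scheme described above; `describe_model` and its field labels are hypothetical and chosen for illustration.

```shell
# describe_model is a hypothetical helper: splits an r10-style medaka model
# name into the fields of the naming scheme described above.
describe_model() {
    # The last variable (rest) receives everything after the fourth field.
    IFS=_ read -r pore tag speed mode rest <<EOF
$1
EOF
    case $rest in
        variant_*) kind=variant;   version=${rest#variant_} ;;
        *)         kind=consensus; version=$rest ;;
    esac
    echo "pore=$pore tag=$tag speed=$speed mode=$mode kind=$kind basecaller=$version"
}

describe_model r1041_e82_400bps_fast_variant_g632
describe_model r1041_e82_400bps_fast_g632
```

Comparing the printed fields with the values in your MinION report is a quick way to spot a mismatch, for example a model for a different basecalling mode.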

NCBI_META=~/workshop_data/references/ncbi_influenza_metadata/genomeset.dat.gz
NCBI_FASTA=~/workshop_data/references/ncbi_influenza_fasta/influenza.fna.gz

BARCODE=barcode01

DIR=/home/$USER/workshop_data/groupA

conda activate pycoQC
cd $DIR
pycoQC --summary_file ./sequencing_summary*.txt --html_outfile pycoQC_report.html


conda activate QC
mkdir -p $DIR/nanoplot/$BARCODE
cd $DIR/nanoplot/$BARCODE
NanoPlot --fastq ../../fastq_pass/$BARCODE/F*.fastq.gz

# The F* glob keeps all_reads.fastq.gz itself out of the input if this step is run a second time.
cat $DIR/fastq_pass/$BARCODE/F*.fastq.gz > $DIR/fastq_pass/$BARCODE/all_reads.fastq.gz
mkdir -p $DIR/fastq_pass/$BARCODE/filtered
filtlong --min_length 100 $DIR/fastq_pass/$BARCODE/all_reads.fastq.gz \
| gzip > $DIR/fastq_pass/$BARCODE/filtered/filtered_reads.fastq.gz

NFDIR=$DIR/nf-flu/$BARCODE
mkdir -p $NFDIR
echo "sample,reads" > $NFDIR/samplesheet.csv
echo "$BARCODE,$DIR/fastq_pass/$BARCODE/filtered" >> $NFDIR/samplesheet.csv
cat $NFDIR/samplesheet.csv

conda activate base
cd $NFDIR
nextflow run CFIA-NCFAD/nf-flu --input samplesheet.csv --platform nanopore -profile docker \
--skip_irma_subtyping_report false \
--variant_caller medaka \
--medaka_variant_model r1041_e82_400bps_fast_variant_g632 \
--ncbi_influenza_fasta $NCBI_FASTA \
--ncbi_influenza_metadata $NCBI_META
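
Because a pipeline run takes a while, it can be worth checking that the samplesheet and the reference files exist before launching it. The following is a minimal sketch; `preflight` is a hypothetical helper and the demo files are made up.

```shell
# preflight is a hypothetical helper: fails early if any required input file
# is missing or empty, before a long pipeline run is started.
preflight() {
    rc=0
    for f in "$@"; do
        if [ -s "$f" ]; then
            echo "found: $f"
        else
            echo "missing or empty: $f" >&2
            rc=1
        fi
    done
    return $rc
}

# Demo in a temporary directory: one file exists, one does not.
demo=$(mktemp -d)
echo "sample,reads" > "$demo/samplesheet.csv"
preflight "$demo/samplesheet.csv" "$demo/missing.fna.gz" \
    || echo "fix the inputs before launching the pipeline"
```

In this tutorial the call would be `preflight samplesheet.csv "$NCBI_FASTA" "$NCBI_META"`, run from `$NFDIR` just before the `nextflow run` command.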

Results:
https://drive.google.com/drive/folders/1AotYhrseIOdEprWJT6v5nbouHHsS2zY1?usp=drive_link
