Author Diana Moreno (dmorenos@ttu.edu)
Flye is a de novo assembler for single-molecule sequencing reads, such as those produced by PacBio and Oxford Nanopore Technologies (ONT). It is designed for a wide range of datasets, from small bacterial projects to large mammalian-scale assemblies. It takes raw PacBio or ONT reads as input and outputs polished contigs.
flye (--pacbio-raw | --nano-raw ) <file1> <file2...> --genome-size <int> --out-dir <PATH> --threads <int> --iterations <int>
Assembly of long and error-prone reads
optional arguments:
-h, --help show this help message and exit
--pacbio-raw path [path ...]
PacBio raw reads
--pacbio-corr path [path ...]
PacBio corrected reads
--nano-raw path [path ...]
ONT raw reads
--nano-corr path [path ...]
ONT corrected reads
--subassemblies path [path ...]
high-quality contigs input
-g size, --genome-size size
estimated genome size (for example, 5m or 2.6g)
-o path, --out-dir path
Output directory
-t int, --threads int
number of parallel threads [1]
-i int, --iterations int
number of polishing iterations [1]
-m int, --min-overlap int
minimum overlap between reads [auto]
--asm-coverage int reduced coverage for initial disjointig assembly [not set]
--plasmids rescue short unassembled plasmids
--meta metagenome / uneven coverage mode
--no-trestle skip Trestle stage
--polish-target path run polisher on the target sequence
--resume resume from the last completed stage
--resume-from stage_name
resume from a custom stage
--stop-after stage_name
stop after the specified stage completed
--debug enable debug output
-v, --version show program's version number and exit
To install flye with conda, simply run this command:
conda install flye
Warning
Flye is distributed as a bioconda package, so the bioconda channel must be enabled before installing it.
If bioconda is not enabled, run the following:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
Before running flye, check the available memory. For a human genome with 30x coverage, you will need ~800 GB of RAM at peak.
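The memory check above can be scripted. The following is a minimal sketch, assuming a Linux system where /proc/meminfo reports a MemAvailable field; the function names mem_available_gb and has_enough_memory are made up for this example and are not part of flye.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of flye.

# Print the memory the kernel reports as available, in whole GiB.
# Reads a /proc/meminfo-style file; the optional path argument exists
# so the function can be tried against a sample file.
mem_available_gb() {
    local meminfo="${1:-/proc/meminfo}"
    # MemAvailable is reported in kB; two divisions by 1024 give GiB,
    # and %d truncates to a whole number.
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 / 1024 }' "$meminfo"
}

# Succeed only when at least <required_gb> GiB is available.
# Usage: has_enough_memory <required_gb> [meminfo_file]
has_enough_memory() {
    local required_gb="$1"
    local meminfo="${2:-/proc/meminfo}"
    local available_gb
    available_gb="$(mem_available_gb "$meminfo")"
    [ -n "$available_gb" ] && [ "$available_gb" -ge "$required_gb" ]
}
```

For example, `has_enough_memory 800 || echo "less than 800 GB available" >&2` prints a warning before a human-scale run is started on a machine that is too small.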
Flye can be easily run with a simple command line:
For raw (i.e., uncorrected) nanopore reads:
flye --nano-raw <reads1.fastq reads2.fastq> --genome-size <int> --out-dir <path> --threads <int> --iterations 2
For raw (i.e., uncorrected) PacBio reads:
flye --pacbio-raw <reads1.fastq reads2.fastq> --genome-size <int> --out-dir <path> --threads <int> --iterations 2
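To make the placeholders concrete, here is a small sketch that fills in the command template and prints the result as a dry run instead of starting an assembly. The helper name build_flye_cmd and the sample values (reads1.fastq, reads2.fastq, a 5m genome size, the flye_out directory, 8 threads) are assumptions chosen for illustration; only the flags themselves come from the help text above.

```shell
#!/usr/bin/env bash
# Sketch only: build_flye_cmd is a hypothetical helper, not part of flye.

# Assemble a flye command line from its parts and print it (dry run).
# Usage: build_flye_cmd <nano-raw|pacbio-raw> <genome-size> <out-dir> <threads> <reads...>
build_flye_cmd() {
    local read_type="$1" genome_size="$2" out_dir="$3" threads="$4"
    shift 4
    # Only the two raw-read modes covered in this page are accepted here.
    case "$read_type" in
        nano-raw|pacbio-raw) ;;
        *) echo "unsupported read type: $read_type" >&2; return 1 ;;
    esac
    # At least one read file is required.
    if [ "$#" -eq 0 ]; then
        echo "no read files given" >&2
        return 1
    fi
    echo flye "--$read_type" "$@" \
        --genome-size "$genome_size" \
        --out-dir "$out_dir" \
        --threads "$threads" \
        --iterations 2
}

# Hypothetical inputs: two ONT FASTQ files, a 5 Mb genome, 8 threads.
build_flye_cmd nano-raw 5m flye_out 8 reads1.fastq reads2.fastq
```

Printing the command first makes it easy to review the arguments before committing to a run that may last days; the printed line can then be pasted into the shell or a job script.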
A Flye run can take 1 to 2 weeks on a 2 Gb mammalian genome. If the run stops, you can always restart it with the --resume-from option (e.g. --resume-from polishing).
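For long runs, restarting can be automated. The sketch below retries a failed command with the --resume flag listed in the help text above, which continues from the last completed stage. The wrapper name run_with_resume and the retry limit are assumptions for illustration, and it assumes that rerunning the original command with --resume appended is acceptable for your setup.

```shell
#!/usr/bin/env bash
# Sketch only: run_with_resume is a hypothetical wrapper, not part of flye.

# Run a command; if it fails, rerun it with --resume appended,
# up to <max_attempts> attempts in total.
# Usage: run_with_resume <max_attempts> <command> [args...]
run_with_resume() {
    local max_attempts="$1"
    shift
    local attempt=1
    # First attempt: run the command exactly as given.
    if "$@"; then
        return 0
    fi
    # Later attempts: same command plus --resume.
    while [ "$attempt" -lt "$max_attempts" ]; do
        attempt=$((attempt + 1))
        echo "attempt $attempt: retrying with --resume" >&2
        if "$@" --resume; then
            return 0
        fi
    done
    return 1
}

# Example (hypothetical file names and sizes):
# run_with_resume 3 flye --nano-raw reads1.fastq --genome-size 2g \
#     --out-dir flye_out --threads 16 --iterations 2
```

A blind retry only helps with transient failures such as a job being killed by a scheduler; if the run fails for lack of memory, it will fail again at the same stage.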
The results will be saved in the output directory.