
Genome assembly: Flye software

Author Diana Moreno (dmorenos@ttu.edu)

Flye is a de novo assembler for single molecule sequencing reads, such as those produced by PacBio and Oxford Nanopore Technologies. It is designed for a wide range of datasets, from small bacterial projects to large mammalian-scale assemblies. It takes raw PB / ONT reads as input and outputs polished contigs.


Flye usage:

flye (--pacbio-raw | --nano-raw ) <file1> <file2...> --genome-size <int> --out-dir <PATH> --threads <int> --iterations <int> 

Assembly of long and error-prone reads

optional arguments:
  -h, --help            show this help message and exit
  --pacbio-raw path [path ...]
                        PacBio raw reads
  --pacbio-corr path [path ...]
                        PacBio corrected reads
  --nano-raw path [path ...]
                        ONT raw reads
  --nano-corr path [path ...]
                        ONT corrected reads
  --subassemblies path [path ...]
                        high-quality contigs input
  -g size, --genome-size size
                        estimated genome size (for example, 5m or 2.6g)
  -o path, --out-dir path
                        Output directory
  -t int, --threads int
                        number of parallel threads [1]
  -i int, --iterations int
                        number of polishing iterations [1]
  -m int, --min-overlap int
                        minimum overlap between reads [auto]
  --asm-coverage int    reduced coverage for initial disjointig assembly [not
                        set]
  --plasmids            rescue short unassembled plasmids
  --meta                metagenome / uneven coverage mode
  --no-trestle          skip Trestle stage
  --polish-target path  run polisher on the target sequence
  --resume              resume from the last completed stage
  --resume-from stage_name
                        resume from a custom stage
  --stop-after stage_name
                        stop after the specified stage completed
  --debug               enable debug output
  -v, --version         show program's version number and exit

Install flye using conda

To install flye with conda, simply run this command:

conda install flye

Warning
Flye is a bioconda package, so the bioconda channel must be enabled before installing it.

If bioconda is not enabled, run the following:

conda config --add channels defaults 
conda config --add channels bioconda
conda config --add channels conda-forge 
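After installation, a quick check such as the following can confirm that the flye executable is on the PATH. This is a minimal sketch: it relies only on the -v/--version flag listed in the usage above, and the variable name flye_status is illustrative.

```shell
# Check whether the flye executable can be found, and print its version if so.
if command -v flye >/dev/null 2>&1; then
    flye_status="installed"
    flye --version
else
    flye_status="missing"
    echo "flye not found on PATH; check that the right conda environment is active" >&2
fi
echo "flye: $flye_status"
```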

Run flye

Before running flye, check the available memory. For a human genome with 30x coverage, you will need ~800 GB of RAM at peak.
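One way to make that check is sketched below. It assumes a Linux machine where /proc/meminfo reports a MemAvailable field; the 800 threshold is simply the figure quoted above for a 30x human genome and should be adjusted for other genomes.

```shell
# Estimated peak memory need in GB (figure quoted above; adjust for your genome).
required_gb=800

# MemAvailable is reported in kB on Linux; fall back to 0 if it cannot be read.
avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo 2>/dev/null || true)
avail_kb=${avail_kb:-0}
avail_gb=$(( avail_kb / 1024 / 1024 ))

echo "Available memory: ${avail_gb} GB (estimated peak need: ${required_gb} GB)"
if [ "$avail_gb" -lt "$required_gb" ]; then
    echo "Warning: less memory available than the estimated peak" >&2
fi
```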

Flye can be easily run with a simple command line:

  • For Nanopore raw reads (i.e., uncorrected)

    flye --nano-raw <reads1.fastq reads2.fastq> --genome-size <int> --out-dir <path> --threads <int> --iterations 2

  • For PacBio raw reads (i.e., uncorrected)

    flye --pacbio-raw <reads1.fastq reads2.fastq> --genome-size <int> --out-dir <path> --threads <int> --iterations 2
    

A Flye run can take 1 to 2 weeks on a 2 Gb mammalian genome. If the run stops, you can restart it with the --resume option, which continues from the last completed stage, or with --resume-from to restart from a specific stage (e.g. --resume-from polishing).
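The run and restart commands can be sketched together as follows. The file name, output directory and thread count are hypothetical placeholders, and the sketch assumes that a restart repeats the original arguments with --resume added and that an earlier run leaves a flye.log file in the output directory. The commands are only executed when flye and the reads are actually present; otherwise they are just printed.

```shell
# All values below are placeholders; adjust them to your data and machine.
reads="ont_reads.fastq"   # hypothetical input file
genome_size="2.6g"        # example value from the --genome-size help text
out_dir="flye_out"        # hypothetical output directory
threads=16                # assumption: set to the cores you can use

run_cmd="flye --nano-raw $reads --genome-size $genome_size --out-dir $out_dir --threads $threads --iterations 2"
# Assumption: a restart reuses the same arguments and adds --resume.
resume_cmd="$run_cmd --resume"

if command -v flye >/dev/null 2>&1 && [ -f "$reads" ]; then
    if [ -f "$out_dir/flye.log" ]; then
        $resume_cmd   # an earlier run exists: continue from the last completed stage
    else
        $run_cmd      # no earlier run found: start from scratch
    fi
else
    echo "flye or $reads not found; commands printed only" >&2
    echo "$run_cmd"
    echo "$resume_cmd"
fi
```

Replacing --resume with --resume-from and a stage name (for example polishing) restarts from that stage instead.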

The results will be saved in the output directory.
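A quick sanity check on the result is to count the assembled sequences. The sketch below assumes the final assembly is written as assembly.fasta inside the output directory, which may differ between Flye versions; flye_out and the helper count_contigs are hypothetical names.

```shell
# Count sequences in a FASTA file by counting header lines (those starting with ">").
count_contigs() {
    grep -c '^>' "$1"
}

assembly="flye_out/assembly.fasta"   # assumption: output directory and file name
if [ -f "$assembly" ]; then
    echo "Contigs in $assembly: $(count_contigs "$assembly")"
else
    echo "$assembly not found; check the output directory for the assembly file" >&2
fi
```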

tags: Genome analysis , Diana