---
title: "nf-core/genomeassembler assembly workflow planning"
tags: pipeline,genomeassembler,planning,plans
---

# nf-core/genomeassembler assembly workflow discussion

## Desired components:

- assembles
- polishes
- scaffolds
- purge_dups
- phases
- sub-assembly of organelles, sex chromosomes, germline/somatic genomes (https://www.nature.com/articles/s41467-019-13427-4)
- circular DNA rotation (e.g., plasmids, organelles, etc.)

## Workflow diagram draft:

Are Tricycler & SPAdes prokaryote specific?

```mermaid
flowchart LR
    Illumina-WGS --> SPAdes
    Illumina-WGS --> Shovill
    Illumina-WGS --> Shovill-SE
    SPAdes --> Illumina-Output
    Shovill --> Illumina-Output
    Shovill-SE --> Illumina-Output
```

```mermaid
flowchart LR
    PacBioHifi --> Canu
    PacBioHifi --> HiFiAsm
    PacBioHifi --> Flye
    Canu --> Pacbio-Output
    Flye --> Pacbio-Output
    HiFiAsm --> Pacbio-Output
```

```mermaid
flowchart LR
    Illumina-WGS --> Pilon
    Nanopore --> Canu
    Nanopore --> RedBean
    Nanopore --> Raven
    Nanopore --> Shasta
    Nanopore --> Miniasm
    Nanopore --> Flye
    Canu --> Tricycler
    Flye --> Tricycler
    RedBean --> Tricycler
    Raven --> Tricycler
    Shasta --> Tricycler
    Miniasm --> Tricycler
    Tricycler --> Medaka
    Tricycler --> Pilon
    Tricycler --> NanoPolish
    Nanopore-fast5 --> NanoPolish
    NanoPolish --> Nanopore-Output
    Pilon --> Nanopore-Output
    Medaka --> Nanopore-Output
```

<!--
Felipe Almeida: I am suggesting some changes to the graph here, as I believe some of the processes do not fit.

1. I believe PacBioHifi should not be passed to SPAdes. I believe the tools that handle these reads best are hifiasm, Flye and Canu.
2. I believe the graph should use Tricycler as a consensus maker, since it takes already-generated assemblies and builds a consensus from them. Instead of "Nanopore -> Tricycler" we should have "Canu -> Tricycler".
3. I don't believe Tricycler can be used for short reads.
4. Tricycler output can be polished later with Medaka or Pilon. Tricycler expects "raw" assemblies.
5. I believe labelling the end point of each workflow in the graph should help in understanding it, e.g. "Pacbio-Output".
6. Maybe it is better to split this into separate graphs for cleaner visualization?

I left the removed lines from the mermaid graph below:

Illumina-WGS -\-> Tricycler
PacBioHifi -\-> SPAdes
PacBioHifi -\-> Tricycler
Nanopore -\-> SPAdes
Nanopore -\-> Tricycler
Miniasm -\-> Racon
Racon -\-> NanoPolish
Racon -\-> Medaka
-->
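
As a quick sanity check on the Nanopore diagram, the sketch below encodes its edges as a plain Python dictionary and enumerates every route from the reads to the final output. The node names are copied verbatim from the diagram; the dictionary layout and the helper names `routes` and `inputs_of` are hypothetical and only meant for discussion, not a proposal for how the pipeline itself should be implemented.

```python
# Edges copied from the Nanopore diagram draft (node names as spelled there).
# The data structure and helper functions are illustrative assumptions.
NANOPORE_EDGES = {
    "Illumina-WGS": ["Pilon"],
    "Nanopore": ["Canu", "RedBean", "Raven", "Shasta", "Miniasm", "Flye"],
    "Canu": ["Tricycler"],
    "Flye": ["Tricycler"],
    "RedBean": ["Tricycler"],
    "Raven": ["Tricycler"],
    "Shasta": ["Tricycler"],
    "Miniasm": ["Tricycler"],
    "Tricycler": ["Medaka", "Pilon", "NanoPolish"],
    "Nanopore-fast5": ["NanoPolish"],
    "NanoPolish": ["Nanopore-Output"],
    "Pilon": ["Nanopore-Output"],
    "Medaka": ["Nanopore-Output"],
}


def routes(edges, start, end):
    """Return every path from start to end (depth-first; the graph is acyclic)."""
    if start == end:
        return [[start]]
    found = []
    for nxt in edges.get(start, []):
        for tail in routes(edges, nxt, end):
            found.append([start] + tail)
    return found


def inputs_of(edges, node):
    """Return the sorted list of nodes that feed directly into node."""
    return sorted(src for src, dsts in edges.items() if node in dsts)


if __name__ == "__main__":
    for path in routes(NANOPORE_EDGES, "Nanopore", "Nanopore-Output"):
        print(" -> ".join(path))
    # Polishers that need a second input besides the consensus assembly.
    for polisher in ("Pilon", "NanoPolish", "Medaka"):
        print(polisher, "takes input from", inputs_of(NANOPORE_EDGES, polisher))
```

Listing the routes this way makes two properties of the draft easy to check: every Nanopore route goes assembler, then consensus, then polisher, and Pilon and NanoPolish each depend on an extra input (Illumina-WGS and Nanopore-fast5 respectively) that a run would have to provide.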