# Mega Sequencing Reads Preprocessing Subworkflows

## Specifications and Scope

The main aim of these subworkflows is to provide a common set of steps for preprocessing high-throughput sequencing data that can be installed across most nf-core pipelines analysing this type of data. This will provide consistency and efficiency for developers across all pipelines.

The following points describe the scope:

- Should cover the most common steps of sequencing read preprocessing
- Each subworkflow should target a single sequencing platform
- Should provide maximum flexibility and choice for users (i.e. tool choice at each step)
- Each step should be skippable
- The final emitted output should be the same regardless of which processing steps are run

Eventually these subworkflows could be aggregated into a single pipeline dedicated solely to preprocessing data, whose output could then be chained into downstream pipelines.

## (Illumina) Short Reads

```mermaid
graph LR;
    fastq[/fastq/]

    subgraph initial_stats
        fastq --> pre_fastqc
        fastq --> seqkit_sana --> pre_seqfu_check
    end

    pre_seqfu_check --> umitools_extract

    subgraph umicleandetection
        umitools_extract
    end

    umicleandetection --> adapterclip_merge

    subgraph adapterclip_merge
        direction LR
        fastp
        adapterremoval
        leehom
        cutadapt
        trimmomatic
        trimgalore
        bbduk
        ngmerge
    end

    adapterclip_merge --> complexity_filtering

    subgraph complexity_filtering
        direction LR
        prinseqplusplus
        fastp_complexity
        bbmap_bbduk
    end

    complexity_filtering --> deduplication
    deduplication --> post_fastqc

    subgraph deduplication
        direction LR
        bbmap_clumpify
    end

    deduplication --> hostremoval

    subgraph hostremoval
        direction LR
        deacon
        hostile
    end

    hostremoval --> run_merging --> final_qc

    subgraph run_merging
        cat_fastq
    end

    subgraph final_qc
        direction LR
        seqfu_stats
        post_seqfu_check
    end

    final_qc --> cleaned_fastq[/cleaned_fastq/]
```

## (Nanopore) Long Reads

TBD

## (PacBio) Long Reads

TBD