# Mega Sequencing Reads Preprocessing Subworkflows
## Specifications and Scope
The main aim of these subworkflows is to provide a common set of steps for preprocessing high-throughput sequencing data that can be installed in most nf-core pipelines analysing this type of data.
Sharing one implementation should give developers consistency and efficiency across pipelines.
The following points describe the scope:
- Should cover the most common steps of sequencing processing
- Each subworkflow should target a single sequencing platform
- Should provide maximum flexibility/choice for users (i.e. tool choice at each step)
- Each step should be skippable
- The final emitted output should be the same regardless of which processing steps were run
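
The scope points above can be illustrated with a small Python sketch. It is not the subworkflow implementation (which would be written in Nextflow); the `Step` class, the `run_steps` function, and the toy tools are all hypothetical names introduced only to show the intended contract: one tool is chosen per step, any step can be skipped, and the output has the same shape whichever path was taken.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A "read set" is modelled as a plain list of sequences. A real subworkflow
# would pass FASTQ channels instead; this simplification is an assumption.
Reads = List[str]
Tool = Callable[[Reads], Reads]


@dataclass
class Step:
    """One preprocessing step offering several interchangeable tools."""
    name: str
    tools: Dict[str, Tool]


def run_steps(reads: Reads, steps: List[Step],
              choices: Dict[str, Optional[str]]) -> Reads:
    """Apply each step with the chosen tool; a missing or None choice skips it."""
    for step in steps:
        tool_name = choices.get(step.name)
        if tool_name is None:
            continue  # step skipped: reads pass through unchanged
        if tool_name not in step.tools:
            raise ValueError(f"unknown tool '{tool_name}' for step '{step.name}'")
        reads = step.tools[tool_name](reads)
    return reads  # same type regardless of which steps ran


# Hypothetical toy tools standing in for real trimmers and deduplicators.
steps = [
    Step("trim", {"strip_n": lambda r: [s.strip("N") for s in r],
                  "head_4": lambda r: [s[:4] for s in r]}),
    Step("dedup", {"exact": lambda r: list(dict.fromkeys(r))}),
]

reads = ["NACGTN", "ACGT", "NNGGCC"]
all_steps = run_steps(reads, steps, {"trim": "strip_n", "dedup": "exact"})
no_steps = run_steps(reads, steps, {})  # every step skipped
```

Because each tool maps a read set to a read set, downstream code can consume the result without knowing which tools ran or which steps were skipped.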
Eventually, these subworkflows could be aggregated into a single pipeline dedicated to preprocessing data, which could then be chained into downstream pipelines.
## (Illumina) Short Reads
```mermaid
graph LR;
fastq[/fastq/]
subgraph initial_stats
fastq --> pre_fastqc
fastq --> seqkit_sana --> pre_seqfu_check
end
pre_seqfu_check --> umitools_extract
subgraph umicleandetection
umitools_extract
end
umicleandetection --> adapterclip_merge
subgraph adapterclip_merge
direction LR
fastp
adapterremoval
leehom
cutadapt
trimmomatic
trimgalore
bbduk
ngmerge
end
adapterclip_merge --> complexity_filtering
subgraph complexity_filtering
direction LR
prinseqplusplus
fastp_complexity
bbmap_bbduk
end
complexity_filtering --> deduplication
deduplication --> post_fastqc
subgraph deduplication
direction LR
bbmap_clumpify
end
deduplication --> hostremoval
subgraph hostremoval
direction LR
deacon
hostile
end
hostremoval --> run_merging --> final_qc
subgraph run_merging
cat_fastq
end
subgraph final_qc
direction LR
seqfu_stats
post_seqfu_check
end
final_qc --> cleaned_fastq[/cleaned_fastq/]
```
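
As a rough illustration of how the diagram above could translate into a user-facing selection, the following Python sketch encodes the processing stages in diagram order and turns a per-stage tool selection into an ordered plan. The stage and tool names are copied from the diagram; the `SHORT_READ_STAGES` mapping and the `build_plan` function are hypothetical, and treating the nodes inside each subgraph as mutually exclusive alternatives is an assumption. The `initial_stats` and `final_qc` stages are left out because their nodes appear to run together rather than as alternatives.

```python
from typing import Dict, List, Tuple

# Stage order and tool names copied from the diagram. Treating the nodes in
# each subgraph as alternatives to choose from is an assumption.
SHORT_READ_STAGES: Dict[str, List[str]] = {
    "umicleandetection": ["umitools_extract"],
    "adapterclip_merge": ["fastp", "adapterremoval", "leehom", "cutadapt",
                          "trimmomatic", "trimgalore", "bbduk", "ngmerge"],
    "complexity_filtering": ["prinseqplusplus", "fastp_complexity", "bbmap_bbduk"],
    "deduplication": ["bbmap_clumpify"],
    "hostremoval": ["deacon", "hostile"],
    "run_merging": ["cat_fastq"],
}


def build_plan(selection: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (stage, tool) pairs in diagram order; unselected stages are skipped."""
    unknown = set(selection) - set(SHORT_READ_STAGES)
    if unknown:
        raise ValueError(f"unknown stage(s): {sorted(unknown)}")
    plan = []
    for stage, tools in SHORT_READ_STAGES.items():
        tool = selection.get(stage)
        if tool is None:
            continue  # stage not selected, so it is skipped
        if tool not in tools:
            raise ValueError(f"'{tool}' is not an option for stage '{stage}'")
        plan.append((stage, tool))
    return plan


# The selection order does not matter; the plan follows the diagram order.
plan = build_plan({"hostremoval": "hostile", "adapterclip_merge": "fastp"})
```

Keeping the stage order in one place means a selection can only enable or disable stages, never reorder them, which matches the fixed left-to-right flow of the diagram.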
## (Nanopore) Long Reads
TBD
## (PacBio) Long Reads
TBD