# 6. MarkDuplicatesSpark ###### tags: `2. Main Steps` `tutorials` `markduplicatesspark` `markduplicates` > Programs required: GATK ## Step 1: Sort and Mark Duplicate Reads Next, you'll want to mark duplicate reads and sort the SAM file using the GATK command called MarkDuplicatesSpark, which actually sorts the SAM file and marks duplicate reads in one step! The "Mark Duplicates" function of this command will tag duplicate reads, which will be important for future steps like variant calling so duplicates won't be counted twice. The sort function automatically sorts the SAM file by coordinate. ``` #!/bin/bash #SBATCH -e errs/%a_markdupspark-%A.err #SBATCH -o outs/%a_markdupspark-%A.out #SBATCH --mem-per-cpu=20G #SBATCH -p scavenger #SBATCH --cpus-per-task=4 #SBATCH --mail-type=ALL --mail-user=rp280@duke.edu #SBATCH -a 19 LL=${SLURM_ARRAY_TASK_ID} module load GATK/4.1.9.0 module load Java/1.8.0_60 java -Xmx16G -jar $GATK MarkDuplicatesSpark \ -I ../data/2021_09_21_fastq/mergedBAMs/merged_L19.bam \ -O ../data/2021_09_21_fastq/aligned_merged_mdup_BAMs/aligned_merged_mdup_L${LL}.bam \ -M ../data/2021_09_21_fastq/aligned_merged_mdup_BAMs/mdup_metrics_L${LL}.txt \ --tmp-dir /work/$USER ```