# 5. uBAM to interleaved fastq, align with reference and then merge

###### tags: `2. Main Steps` `tutorials` `aligned BAM`

> Required software: BWA, GATK

## Step 1: Convert uBAM back to fastq

```
#!/bin/bash
#SBATCH -e errs/%a_samtofastq-%A.err
#SBATCH -o outs/%a_samtofastq-%A.out
#SBATCH --mem-per-cpu=20G
#SBATCH -p scavenger
#SBATCH --cpus-per-task=4
#SBATCH --mail-type=ALL --mail-user=rp280@duke.edu
#SBATCH -a 19

# The array task ID doubles as the lane number
LL=${SLURM_ARRAY_TASK_ID}

module load GATK/4.1.9.0

# Write one interleaved fastq per lane; bases marked by the XT tag
# (adapters) get their base quality set to 2
java -Xmx8G -jar $GATK SamToFastq --INTERLEAVE true \
    --INPUT ../data/2021_09_21_fastq/uBAMs/unmapped_adapter_L${LL}.bam \
    --FASTQ ../data/2021_09_21_fastq/rgadded_adapter_L${LL}.fastq \
    --CLIPPING_ATTRIBUTE XT \
    --CLIPPING_ACTION 2 \
    --NON_PF true \
    --TMP_DIR /work/$USER
```

## Step 2: Align fastq to reference using BWA MEM

This step uses your reference genome and its BWA index files.

```
#!/bin/bash
#SBATCH -e errs/%a_alignment-%A.err
#SBATCH -o outs/%a_alignment-%A.out
#SBATCH --mem-per-cpu=20G
#SBATCH -p scavenger
#SBATCH --cpus-per-task=4
#SBATCH --mail-type=ALL --mail-user=rp280@duke.edu
#SBATCH -a 19

LL=${SLURM_ARRAY_TASK_ID}

module load BWA/0.7.17

# -p: the input fastq is interleaved; -t 4 matches --cpus-per-task
# Note: bwa mem writes SAM text to stdout, so this file holds SAM records
# even though it keeps the .bam name that Step 3 expects
bwa mem -M -t 4 -p ../ref/dmel-all-chromosome-r6.41.fasta \
    ../data/2021_09_21_fastq/rgadded_adapter_L${LL}.fastq \
    > ../data/2021_09_21_fastq/alignedBAMs/aligned_L${LL}.bam
```

## Step 3: MergeBamAlignment

This step uses your reference genome and its GATK index files, so make sure they are all in the same directory as the reference.

After you align the fastq to the reference genome (using its BWA index), the aligned BAM is missing all of the information you added to your uBAM (read groups and marked Illumina adapters). The first step, MergeBamAlignment, merges the information in your uBAM with your aligned BAM and outputs an aligned BAM with read groups and marked adapters. After that, you will need to mark duplicate reads and sort by coordinate (both can be done with MarkDuplicatesSpark, the second step).
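
Both BWA (Step 2) and MergeBamAlignment expect index files to sit next to the reference FASTA. As a rough pre-flight check, the sketch below lists whichever ones are missing. The function name `missing_ref_indexes` is made up for this example, and the file list assumes the usual outputs of `bwa index`, `samtools faidx` and GATK `CreateSequenceDictionary`; adjust it if your reference was indexed differently.

```shell
#!/bin/bash
# Sketch (hypothetical helper): print every index file that is missing
# for the reference FASTA given as the first argument.
missing_ref_indexes() {
    local ref="$1"
    local f
    # Assumption: bwa index wrote its five files next to the FASTA
    for f in "$ref.amb" "$ref.ann" "$ref.bwt" "$ref.pac" "$ref.sa"; do
        [ -e "$f" ] || echo "$f"
    done
    # Assumption: the FASTA index is <ref>.fai and the sequence dictionary
    # is the FASTA name with its last extension replaced by .dict
    [ -e "$ref.fai" ] || echo "$ref.fai"
    [ -e "${ref%.*}.dict" ] || echo "${ref%.*}.dict"
    return 0
}
```

For example, `missing_ref_indexes ../ref/dmel-all-chromosome-r6.41.fasta` prints nothing when every expected index is in place.
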
This step combines the alignment information produced by BWA MEM with the read groups and metadata from the uBAM created in the earlier steps.

```
#!/bin/bash
#SBATCH -e errs/%a_mergeBam-%A.err
#SBATCH -o outs/%a_mergeBam-%A.out
#SBATCH --mem-per-cpu=20G
#SBATCH -p scavenger
#SBATCH --cpus-per-task=4
#SBATCH --mail-type=ALL --mail-user=rp280@duke.edu
#SBATCH -a 19

LL=${SLURM_ARRAY_TASK_ID}

module load GATK/4.1.9.0

# NOTE: Step 1 read unmapped_adapter_L${LL}.bam; check that the uBAM named
# below is the one you intend to merge
java -Xmx16G -jar $GATK MergeBamAlignment \
    -O ../data/2021_09_21_fastq/mergedBAMs/merged_L${LL}.bam \
    -R ../ref/dmel-all-chromosome-r6.41.fasta \
    -UNMAPPED_BAM ../data/2021_09_21_fastq/uBAMs/unmapped_L${LL}.bam \
    -ALIGNED_BAM ../data/2021_09_21_fastq/alignedBAMs/aligned_L${LL}.bam \
    -CREATE_INDEX true \
    -ADD_MATE_CIGAR true \
    -CLIP_ADAPTERS false \
    -CLIP_OVERLAPPING_READS true \
    -INCLUDE_SECONDARY_ALIGNMENTS true \
    -MAX_INSERTIONS_OR_DELETIONS -1 \
    -PRIMARY_ALIGNMENT_STRATEGY MostDistant \
    -ATTRIBUTES_TO_RETAIN XS \
    -TMP_DIR /work/$USER
```
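
Because each step reads the previous step's output, it can help to confirm which files exist for a lane before submitting the next job. A minimal sketch, assuming the directory layout and file names used in the scripts above (`lane_status` is a hypothetical helper, not part of GATK or BWA):

```shell
#!/bin/bash
# Sketch (hypothetical helper): report which per-lane outputs of Steps 1-3
# exist and are non-empty under a base data directory.
# Usage: lane_status <base_dir> <lane_number>
lane_status() {
    local base="$1" lane="$2"
    local f status=0
    for f in \
        "rgadded_adapter_L${lane}.fastq" \
        "alignedBAMs/aligned_L${lane}.bam" \
        "mergedBAMs/merged_L${lane}.bam"; do
        # -s is true only for files that exist and have a size above zero
        if [ -s "$base/$f" ]; then
            echo "ok       $f"
        else
            echo "missing  $f"
            status=1
        fi
    done
    return "$status"
}
```

`lane_status ../data/2021_09_21_fastq 19` would report on lane 19 and return a non-zero status if any output is missing or empty. It only checks that files are present, not that their contents are valid.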