# Filtering ddRADSEQ data for plastid and nuclear genomes ###### tags: `RADSeq` `Plastid Genome` `Nuclear Genome` `BWA` `SAMTOOLS` --- Author: Dylan Cohen ___ ## Overview 1. Obtain a Genome 2. Create Index with BWA 3. Align reads to reference with BWA 4. Sort SAM/BAM file using Samtools 5. Extract mapped reads with Samtools 6. Extract unmapped reads with Samtools 7. Convert unmapped files to fasta files 8. Use Stacks and ref.map.pl to call SNPs with mapped bam files 9. Use Stacks and denovo.pl to call snps on fasta files Notes: This could probably all be |piped| and one big loop. --- 2: Creating an index ``` bwa index ref.fa ``` This will create five files - keep them all in your working directory for the next step. --- 3: Align reads to indexed reference with bwa Here is a for loop for a whole directory of samples that follow this **R1_fq.gz** naming convention. ``` for i in $(ls -1 *R1_fq.gz | sed 's/\_R1_fq.gz//'); do bwa mem -t 1 taber.ref.fasta $i\_R1_fq.gz $i\_R2_fq.gz $i\.sam done ``` --- 4: Sort the aligned reads with Samtools ``` for i in *.sam do samtools sort -T $i -o ${i/.sam/.sorted.sam} done ``` --- 5: Extract mapped reads from the sorted ``` for i in $(ls -1 *.sorted.sam | sed 's/\.sorted.sam//') do samtools view -b -F 4 $i.sorted.sam -o $i\.mapped.bam done ``` --- 6:Extract unmapped from the sorted sam file ``` for i in $(ls -1 *.sort.sam | sed 's/\.sort.sam//') do samtools view -b -f 4 $i.sort.sam -o $i\.SUN.bam done ``` --- 7: Convert unmapped sam bam file back to read 1 and read 2 fasta files ``` for i in $(ls -1 *.SNOMAP.bam | sed 's/\.SNOMAP.bam//') do samtools fasta -0 $i.SNOMAP.bam -o $i\.S.fasta done ``` ---