## Filter phased data to look for genetic variants ###### tags: `Pacbio Data` ### 1. BSC630, Nipped-A, 2R:5,177,965..5,251,022 ### Allelic Set (6): #### - L10 (18F08A-1(1)) #### - L72 (20G14B-M1(2)) #### - L95 (18F22A-1) #### - L122 (18F25B-1(1)) #### - L173 (21G29B-M1(1)) #### - L302 (21F25B-M2(2)) ### 1. Use VCFtools to filter `module load VCFtools/0.1.17` ### 2. Filter L302 (first phased genome complete) for Passed variants and those that fall within the breakpoints of L302 `vcftools --gzvcf L302_dv_phased_chr2.vcf.gz --chr 2R --from-bp 5177965 --to-bp 5251022 --remove-filtered RefCall --recode --out L302_dv_phased_nipped_a` ### 3. Filter any line that didn't map to Nipped-A (in this case L301) for variants within Nipped-A breakpoints and spit out the positions and variants at those positions vcftools --vcf L301_dv_phased_nipped_a.recode.vcf --positions-only --out L301_dv_nippeda_positions module load bcftools/1.4 bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\n' L301_dv_phased_nipped_a.recode.vcf > L301_nippeda_dv_variants.txt ### 4. Use BCFtools to remove all variatns from L302 Nipped-A region that are also present in L301 Nipped-A region (these either are from CyO or can't be the lethal variant) bcftools view -T ^L301_nippeda_dv_variants.txt L302_dv_phased_nipped_a.recode.vcf -o filtered_L302nippedA.vcf --force-samples ### 5. Annotate the remaining variants and look for high impact mutations specific to L302 in that region #!/bin/bash #SBATCH -e errs/%a_SnpEff_chr2-%A.err #SBATCH -o outs/%a_SnpEff_chr2-%A.out #SBATCH --mem-per-cpu=20G #SBATCH -p scavenger #SBATCH --mail-type=ALL --mail-user=sbm34@duke.edu #SBATCH -a xx,xx,xx LL=${SLURM_ARRAY_TASK_ID} module load Java/23.0.1 java -Xmx8g -jar /hpc/home/sbm34/snpEff/snpEff.jar \ -c /hpc/home/sbm34/snpEff/snpEff.config Drosophila_melanogaster \ -o gatk \ filtered_L302nippedA.vcf > filtered_L302nippedA.ann.vcf ### 2. DF(2L)ED50001, l(2)gl, 2L:9,839..21,376 ### Allelic Set (6): #### - L10, 18F08A-1(1) #### - L12, 18I06C-M1(1) #### - L212, 18F12A-1(3) #### - L271, 18I24B-M2 #### - L274, 21F02B-M1(1) #### - L288, 21G05B-M1(1) #### - L297, 21F27B-1(1) #### - L301, 21G10D-M1(2) #### - L303, 21F26D-M3(1) ### 1. Use VCFtools to filter `module load VCFtools/0.1.17` ### 2. Filter L301 adn L271 for Passed variants (snps) and those that fall within the breakpoints of l(2)gl `vcftools --gzvcf L301_dv_phased_chr2.vcf.gz --chr 2R --from-bp 1 --to-bp 1000000 --remove-filtered RefCall --recode --out L301_dv_phased_telomere` vcftools --gzvcf L271_dv_phased_chr2.vcf.gz --chr 2R --from-bp 1 --to-bp 1000000 --remove-filtered RefCall --recode --out L271_dv_phased_telomere ### 2. Filter L301 for Passed variants (structural) and those that fall within the breakpoints of l(2)gl `vcftools --vcf L301_pbsv_phased_chr2.vcf --chr 2R --from-bp 1 --to-bp 1000000 --remove-filtered RefCall --recode --out L301_pbsv_phased_telomere` vcftools --vcf L271_pbsv_phased_chr2.vcf --chr 2R --from-bp 1 --to-bp 1000000 --remove-filtered RefCall --recode --out L271_pbsv_phased_telomere ### 3. Filter any line that didn't map to l(2)gl (in this case L256) for variants within telomere and spit out the positions and variants at those positions #### SNPS vcftools --gzvcf L256_dv_phased_chr2.vcf.gz --chr 2R --from-bp 1 --to-bp 1000000 --remove-filtered RefCall --recode --out L256_dv_phased_telomere module load bcftools/1.4 bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\n' L256_dv_phased_telomere.recode.vcf > L256_telomere_dv_variants.txt #### structural variants vcftools --vcf L256_pbsv_phased_chr2.vcf --chr 2R --from-bp 1 --to-bp 1000000 --remove-filtered RefCall --recode --out L256_pbsv_phased_telomere module load bcftools/1.4 bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\n' L256_pbsv_phased_telomere.recode.vcf > L256_telomere_pbsv_variants.txt ### 4. Use BCFtools to remove all variatns from L301 telomere region that are also present in L256 telomere region (these either are from CyO or can't be the lethal variant) #### SNPS bcftools view -T ^L256_telomere_dv_variants.txt L301_dv_phased_telomere.recode.vcf -o filtered_L301_telomere_dv.vcf --force-samples bcftools view -T ^L256_telomere_dv_variants.txt L271_dv_phased_telomere.recode.vcf -o filtered_L271_telomere_dv.vcf --force-samples #### structural variants bcftools view -T ^L256_telomere_pbsv_variants.txt L301_pbsv_phased_telomere.recode.vcf -o filtered_L301_telomere_pbsv.vcf --force-samples bcftools view -T ^L256_telomere_pbsv_variants.txt L271_pbsv_phased_telomere.recode.vcf -o filtered_L271_telomere_pbsv.vcf --force-samples module load Java/23.0.1 java -Xmx8g -jar /hpc/home/sbm34/snpEff/snpEff.jar \ -c /hpc/home/sbm34/snpEff/snpEff.config Drosophila_melanogaster \ -o gatk \ filtered_L301_telomere_pbsv.vcf > filtered_L301_telomere_pbsv.ann.vcf module load Java/23.0.1 java -Xmx8g -jar /hpc/home/sbm34/snpEff/snpEff.jar \ -c /hpc/home/sbm34/snpEff/snpEff.config Drosophila_melanogaster \ -o gatk \ filtered_L271_telomere_pbsv.vcf > filtered_L271_telomere_pbsv.ann.vcf