## Filter phased data to look for genetic variants
###### tags: `Pacbio Data`
### 1. BSC630, Nipped-A, 2R:5,177,965..5,251,022
### Allelic Set (6):
#### - L10 (18F08A-1(1))
#### - L72 (20G14B-M1(2))
#### - L95 (18F22A-1)
#### - L122 (18F25B-1(1))
#### - L173 (21G29B-M1(1))
#### - L302 (21F25B-M2(2))
### 1. Use VCFtools to filter
`module load VCFtools/0.1.17`
### 2. Filter L302 (first phased genome complete) for Passed variants and those that fall within the breakpoints of L302
`vcftools --gzvcf L302_dv_phased_chr2.vcf.gz --chr 2R --from-bp 5177965 --to-bp 5251022 --remove-filtered RefCall --recode --out L302_dv_phased_nipped_a`
### 3. Filter any line that didn't map to Nipped-A (in this case L301) for variants within Nipped-A breakpoints and spit out the positions and variants at those positions
vcftools --vcf L301_dv_phased_nipped_a.recode.vcf --positions-only --out L301_dv_nippeda_positions
module load bcftools/1.4
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\n' L301_dv_phased_nipped_a.recode.vcf > L301_nippeda_dv_variants.txt
### 4. Use BCFtools to remove all variatns from L302 Nipped-A region that are also present in L301 Nipped-A region (these either are from CyO or can't be the lethal variant)
bcftools view -T ^L301_nippeda_dv_variants.txt L302_dv_phased_nipped_a.recode.vcf -o filtered_L302nippedA.vcf --force-samples
### 5. Annotate the remaining variants and look for high impact mutations specific to L302 in that region
#!/bin/bash
#SBATCH -e errs/%a_SnpEff_chr2-%A.err
#SBATCH -o outs/%a_SnpEff_chr2-%A.out
#SBATCH --mem-per-cpu=20G
#SBATCH -p scavenger
#SBATCH --mail-type=ALL --mail-user=sbm34@duke.edu
#SBATCH -a xx,xx,xx
LL=${SLURM_ARRAY_TASK_ID}
module load Java/23.0.1
java -Xmx8g -jar /hpc/home/sbm34/snpEff/snpEff.jar \
-c /hpc/home/sbm34/snpEff/snpEff.config Drosophila_melanogaster \
-o gatk \
filtered_L302nippedA.vcf > filtered_L302nippedA.ann.vcf
### 2. DF(2L)ED50001, l(2)gl, 2L:9,839..21,376
### Allelic Set (6):
#### - L10, 18F08A-1(1)
#### - L12, 18I06C-M1(1)
#### - L212, 18F12A-1(3)
#### - L271, 18I24B-M2
#### - L274, 21F02B-M1(1)
#### - L288, 21G05B-M1(1)
#### - L297, 21F27B-1(1)
#### - L301, 21G10D-M1(2)
#### - L303, 21F26D-M3(1)
### 1. Use VCFtools to filter
`module load VCFtools/0.1.17`
### 2. Filter L301 adn L271 for Passed variants (snps) and those that fall within the breakpoints of l(2)gl
`vcftools --gzvcf L301_dv_phased_chr2.vcf.gz --chr 2R --from-bp 1 --to-bp 1000000 --remove-filtered RefCall --recode --out L301_dv_phased_telomere`
vcftools --gzvcf L271_dv_phased_chr2.vcf.gz --chr 2R --from-bp 1 --to-bp 1000000 --remove-filtered RefCall --recode --out L271_dv_phased_telomere
### 2. Filter L301 for Passed variants (structural) and those that fall within the breakpoints of l(2)gl
`vcftools --vcf L301_pbsv_phased_chr2.vcf --chr 2R --from-bp 1 --to-bp 1000000 --remove-filtered RefCall --recode --out L301_pbsv_phased_telomere`
vcftools --vcf L271_pbsv_phased_chr2.vcf --chr 2R --from-bp 1 --to-bp 1000000 --remove-filtered RefCall --recode --out L271_pbsv_phased_telomere
### 3. Filter any line that didn't map to l(2)gl (in this case L256) for variants within telomere and spit out the positions and variants at those positions
#### SNPS
vcftools --gzvcf L256_dv_phased_chr2.vcf.gz --chr 2R --from-bp 1 --to-bp 1000000 --remove-filtered RefCall --recode --out L256_dv_phased_telomere
module load bcftools/1.4
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\n' L256_dv_phased_telomere.recode.vcf > L256_telomere_dv_variants.txt
#### structural variants
vcftools --vcf L256_pbsv_phased_chr2.vcf --chr 2R --from-bp 1 --to-bp 1000000 --remove-filtered RefCall --recode --out L256_pbsv_phased_telomere
module load bcftools/1.4
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\n' L256_pbsv_phased_telomere.recode.vcf > L256_telomere_pbsv_variants.txt
### 4. Use BCFtools to remove all variatns from L301 telomere region that are also present in L256 telomere region (these either are from CyO or can't be the lethal variant)
#### SNPS
bcftools view -T ^L256_telomere_dv_variants.txt L301_dv_phased_telomere.recode.vcf -o filtered_L301_telomere_dv.vcf --force-samples
bcftools view -T ^L256_telomere_dv_variants.txt L271_dv_phased_telomere.recode.vcf -o filtered_L271_telomere_dv.vcf --force-samples
#### structural variants
bcftools view -T ^L256_telomere_pbsv_variants.txt L301_pbsv_phased_telomere.recode.vcf -o filtered_L301_telomere_pbsv.vcf --force-samples
bcftools view -T ^L256_telomere_pbsv_variants.txt L271_pbsv_phased_telomere.recode.vcf -o filtered_L271_telomere_pbsv.vcf --force-samples
module load Java/23.0.1
java -Xmx8g -jar /hpc/home/sbm34/snpEff/snpEff.jar \
-c /hpc/home/sbm34/snpEff/snpEff.config Drosophila_melanogaster \
-o gatk \
filtered_L301_telomere_pbsv.vcf > filtered_L301_telomere_pbsv.ann.vcf
module load Java/23.0.1
java -Xmx8g -jar /hpc/home/sbm34/snpEff/snpEff.jar \
-c /hpc/home/sbm34/snpEff/snpEff.config Drosophila_melanogaster \
-o gatk \
filtered_L271_telomere_pbsv.vcf > filtered_L271_telomere_pbsv.ann.vcf