# Nanopore and Target Capture
## Background
## The Data
Information from Erika: The Illumina and MinION data have their own folders which contain:
- bowtie2 consensus sequences in fastq (.fq.gz) and fasta (.fq.fa.gz) format
- demultiplexed data of the two species, Packera greenei and P. streptanthifolia (fastq.gz)
- [MinION only] a file containing all raw reads from the MinION sequencer - NOT demultiplexed (MinION_Packera_raw.fastq.gz)
- I also included the Compositae-specific probe sets (cos_hyb_piper_targets.fasta.gz) and a simplified version of the code I used to generate the consensus sequences for the MinION and Illumina data (MinION_steps.txt).
`minion_steps.txt`
```
######MinION demultiplexing and alignment steps - simplified
##Note: Packera greenei is named try11_E11 and Packera streptanthifolia is named try11_E12 throughout!
###Step 1: Demultiplex MinION data with cutadapt using the 5' end and 8bp i7 barcode adapter sequence
#Packera greenei
nohup cutadapt -e 1 -b E11_R1=CACCTGTA -o try11-{name}.fastq.gz ../MinION_Packera_raw.fastq &
#Packera streptanthifolia
nohup cutadapt -e 1 -b E12_R1=TGAGGACT -o try11-{name}.fastq.gz ../MinION_Packera_raw.fastq &
###Step 2: Run bowtie2 to align sequences against probe set
#Build index
bowtie2-build -f cos_hyb_piper_targets.fasta cos_hyb_piper_targets
#Generate consensus sequences in MinION data:
#Packera greenei
bowtie2 -x cos_hyb_piper_targets -U try11-E11_R1.fastq.gz --very-sensitive-local -S try11-E11_R1.sam
samtools sort -T try11-E11_R1.sorted -o try11-E11_R1.bam try11-E11_R1.sam
bcftools mpileup -f cos_hyb_piper_targets.fasta try11-E11_R1.bam > try11-E11_R1.bcf
bcftools call --ploidy 1 -c try11-E11_R1.bcf | perl /home/ermoore3/miniconda2/pkgs/bcftools-1.9-ha228f0b_4/bin/vcfutils.pl vcf2fq > try11-E11_R1.consensus.fq
#Packera streptanthifolia
bowtie2 -x cos_hyb_piper_targets -U try11-E12_R1.fastq.gz --very-sensitive-local -S try11-E12_R1.sam
samtools sort -T try11-E12_R1.sorted -o try11-E12_R1.bam try11-E12_R1.sam
bcftools mpileup -f cos_hyb_piper_targets.fasta try11-E12_R1.bam > try11-E12_R1.bcf
bcftools call --ploidy 1 -c try11-E12_R1.bcf | perl /home/ermoore3/miniconda2/pkgs/bcftools-1.9-ha228f0b_4/bin/vcfutils.pl vcf2fq > try11-E12_R1.consensus.fq
###Step 3: Convert consensus fastq to fasta
for f in ./*.consensus.fq; do
/home/ermoore3/seqtk/seqtk seq -l0 ${f} > temp.fq
/home/ermoore3/seqtk/seqtk seq -a temp.fq > ${f}.fa
done
```
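To see how much of each target the consensus actually covers, a summary along these lines may help. It is a minimal sketch, not part of Erika's pipeline: `consensus_coverage` is a hypothetical helper, and it assumes that positions without a called base appear as `N` or `n` in the consensus FASTA.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read a FASTA stream on stdin and print, per record,
# the name, the total length, and the number of bases that are not N/n.
# Assumption: uncovered positions are written as N or n in the consensus.
consensus_coverage() {
    awk -v OFS='\t' '
        function report() { if (name != "") print name, total, total - missing }
        /^>/ { report(); name = substr($1, 2); total = 0; missing = 0; next }
             { total += length($0); missing += gsub(/[Nn]/, "") }
        END  { report() }
    '
}
```

For a gzipped consensus file, the input can be piped in, for example `gzip -dc some_consensus.fq.fa.gz | consensus_coverage`, where the file name is a placeholder.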
## Analysis plan
One open question is whether HybPiper's `exonerate_hits.py` script identifies target capture genes in Nanopore reads better than the `bowtie2` alignment used above.
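* Re-run demultiplexing on the raw Nanopore reads and convert the result to FASTA
```
# Trim the P. streptanthifolia barcode (E12_R1) from the raw reads, using 40 cores.
# Note: without --discard-untrimmed, reads lacking the barcode are also written to the output.
cutadapt -e 1 -b E12_R1=TGAGGACT -o Packera_streptanthifolia_raw.fq.gz MinION_Packera_raw.fastq-001 -j 40
# Convert the FASTQ output to FASTA.
seqtk seq -A Packera_streptanthifolia_raw.fq.gz > Packera_streptanthifolia_raw.fasta
```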
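A quick sanity check after this step is to confirm that the FASTQ and FASTA files hold the same number of reads, and to compare that number with the raw file. The helpers below are a sketch with hypothetical names. They assume four lines per FASTQ record and one header line per FASTA record.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count FASTQ records in a plain or gzipped file,
# assuming the usual four lines per record.
count_fastq_reads() {
    local file=$1
    if [[ $file == *.gz ]]; then
        gzip -dc "$file"
    else
        cat "$file"
    fi | awk 'END { printf "%d\n", NR / 4 }'
}

# Hypothetical helper: count FASTA records by counting header lines.
count_fasta_records() {
    grep -c '^>' "$1"
}
```

For example, `count_fastq_reads Packera_streptanthifolia_raw.fq.gz` and `count_fasta_records Packera_streptanthifolia_raw.fasta` should print the same number. If that number equals the read count of the raw file, no reads were removed during demultiplexing.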
* Run `exonerate_hits.py` on the MinION assembly
    - This required translating the COS genes to protein and keeping only one ortholog (I chose sunflower).
    - The first run failed, complaining that `--exonerate_hit_sliding_window_thresh` was missing.
    - HybPiper2 does not appear to support calling `exonerate_hits.py` directly, so HybPiper may need to be run from the beginning.
* Run HybPiper2 on the Nanopore FASTQ file
* Compare extracted sequences to those extracted from HybPiper with Illumina data
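Keeping only one ortholog in the target file, as described for the `exonerate_hits.py` run above, can be sketched as a header filter. This is an illustration under assumptions: `filter_fasta_by_taxon` is a hypothetical helper, and the taxon label used to recognise sunflower sequences depends on how the headers in the COS target file are actually written, which is not recorded here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep only FASTA records whose header line contains
# the given taxon label. Works for single-line and multi-line FASTA.
# Usage: filter_fasta_by_taxon LABEL targets.fasta > filtered.fasta
filter_fasta_by_taxon() {
    local taxon=$1 fasta=$2
    awk -v taxon="$taxon" '
        /^>/ { keep = (index($0, taxon) > 0) }   # decide once per record
        keep                                      # print header and sequence lines
    ' "$fasta"
}
```

The label is matched as a plain substring, so it should be specific enough not to match other taxa or gene names in the headers.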
## Results
## Future Analyses
- retrieve coding sequences from HybPiper
- add consensus assembly from Nanopore/bowtie2 for all (just sunflower/lettuce)
- identify whether an older version of `exonerate_hits.py` can be used on raw Nanopore reads
- reference protein sequence from HybPiper output
- ideally, retain the intronic information (stitch contigs together)
- comparison of Illumina/Nanopore data (trees)
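Before building trees, a first-pass comparison could simply tabulate the recovered length of each gene in the two data sets. The sketch below uses a hypothetical helper, `compare_fasta_lengths`, and assumes that both FASTA files name their records by the same gene identifiers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print gene name, length in the first FASTA, and
# length in the second FASTA (tab-separated, sorted by name).
# Genes missing from one file are reported as NA.
# Assumption: the first word of each header is a gene ID shared by both files.
compare_fasta_lengths() {
    awk -v OFS='\t' '
        function get(s, g) { return ((s SUBSEP g) in len) ? len[s SUBSEP g] : "NA" }
        FNR == 1 { src++ }                      # 1 = first file, 2 = second file
        /^>/     { name = substr($1, 2); seen[name] = 1; len[src SUBSEP name] += 0; next }
                 { len[src SUBSEP name] += length($0) }
        END      { for (g in seen) print g, get(1, g), get(2, g) }
    ' "$1" "$2" | sort
}
```

A call such as `compare_fasta_lengths illumina_genes.fasta nanopore_genes.fasta` (placeholder file names) gives one row per gene, which makes it easy to spot genes recovered from only one platform.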