# Nanopore and Target Capture

## Background

## The Data

Information from Erika:

The Illumina and MinION data have their own folders, which contain:

- bowtie2 consensus sequences in fastq (.fq.gz) and fasta (.fq.fa.gz) format
- demultiplexed data of the two species, Packera greenei and P. streptanthifolia (fastq.gz)
- [MinION only] a file containing all raw reads from the MinION sequencer - NOT demultiplexed (MinION_Packera_raw.fastq.gz)
- I also included the Compositae-specific probe sets (cos_hyb_piper_targets.fasta.gz), and a simplified version of the code I used to generate the consensus sequences for the MinION and Illumina data (MinION_steps.txt).

`minion_steps.txt`

```
######MinION demultiplexing and alignment steps - simplified
##Note: Packera greenei is named try11_E11 and Packera streptanthifolia is named try11_E12 throughout!

###Step 1: Demultiplex MinION data with cutadapt using the 5' end and 8bp i7 barcode adapter sequence
#Packera greenei
nohup cutadapt -e 1 -b E11_R1=CACCTGTA -o try11-{name}.fastq.gz ../MinION_Packera_raw.fastq &
#Packera streptanthifolia
nohup cutadapt -e 1 -b E12_R1=TGAGGACT -o try11-{name}.fastq.gz ../MinION_Packera_raw.fastq &

###Step 2: Run bowtie2 to align sequences against probe set
#Build index
bowtie2-build -f cos_hyb_piper_targets.fasta cos_hyb_piper_targets

#Generate consensus sequences in MinION data:
#Packera greenei
bowtie2 -x cos_hyb_piper_targets -U try11-E11_R1.fastq.gz --very-sensitive-local -S try11-E11_R1.sam
samtools sort -T try11-E11_R1.sorted -o try11-E11_R1.bam try11-E11_R1.sam
bcftools mpileup -f cos_hyb_piper_targets.fasta try11-E11_R1.bam > try11-E11_R1.bcf
bcftools call --ploidy 1 -c try11-E11_R1.bcf | perl /home/ermoore3/miniconda2/pkgs/bcftools-1.9-ha228f0b_4/bin/vcfutils.pl vcf2fq > try11-E11_R1.consensus.fq

#Packera streptanthifolia
bowtie2 -x cos_hyb_piper_targets -U try11-E12_R1.fastq.gz --very-sensitive-local -S try11-E12_R1.sam
samtools sort -T try11-E12_R1.sorted -o try11-E12_R1.bam try11-E12_R1.sam
bcftools mpileup -f cos_hyb_piper_targets.fasta try11-E12_R1.bam > try11-E12_R1.bcf
bcftools call --ploidy 1 -c try11-E12_R1.bcf | perl /home/ermoore3/miniconda2/pkgs/bcftools-1.9-ha228f0b_4/bin/vcfutils.pl vcf2fq > try11-E12_R1.consensus.fq

###Step 3: Convert consensus fastq to fasta
for f in ./*.consensus.fq; do
    /home/ermoore3/seqtk/seqtk seq -l0 ${f} > temp.fq
    /home/ermoore3/seqtk/seqtk seq -a temp.fq > ${f}.fa
done
```

## Analysis plan

One question is whether the `exonerate_hits.py` script will be better than `bowtie2` at identifying target capture genes in Nanopore reads.

* Re-run demultiplexing on the raw Nanopore reads and convert to FASTA

```
cutadapt -e 1 -b E12_R1=TGAGGACT -o Packera_streptanthifolia_raw.fq.gz MinION_Packera_raw.fastq-001 -j 40
seqtk seq -A Packera_streptanthifolia_raw.fq.gz > Packera_streptanthifolia_raw.fasta
```

* Run `exonerate_hits.py` on the MinION assembly
    - needed to translate the COS genes to protein and keep only one ortholog (I chose sunflower)
    - first run complained about not having `--exonerate_hit_sliding_window_thresh`
    - so may need to run HybPiper from the beginning, as HybPiper2 does not appear to support a direct call of `exonerate_hits.py`
* Run HybPiper2 with the Nanopore FASTQ file
* Compare extracted sequences to those extracted from HybPiper with Illumina data

## Results

## Future Analyses

- retrieve coding sequence from HybPiper
- add consensus assembly from Nanopore/Bowtie for all (just sunflower/lettuce)
- identify if an older version of `exonerate_hits.py` can be used on Nanopore raw reads
- reference protein sequence from HybPiper output ideally
- retain the intronic information (stitch contigs together)
- comparison of Illumina/Nanopore data (trees)
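
As a possible first pass at the Illumina/Nanopore comparison, before building trees, the sketch below tabulates the sequence length and the number of `N`/`n` bases per target in two FASTA files. It is only an illustration, not part of Erika's pipeline: the function name `compare_fasta` is hypothetical, and it assumes both files are uncompressed, non-empty, and use the same target name as the first word of each header.

```shell
# compare_fasta A.fasta B.fasta
# Hypothetical helper: prints one tab-separated row per target with
# sequence length and N/n count in each file ("NA" if the target is absent).
# Assumes uncompressed, non-empty FASTA input with matching target names.
compare_fasta() {
    awk -v OFS='\t' '
        # Return the stored value, or "NA" if this target was not in file f
        function val(arr, f, t) { return ((f, t) in has) ? arr[f, t] + 0 : "NA" }

        # Track which input file we are in (1 = first, 2 = second)
        FNR == 1 { file++; name = "" }

        # Header line: target name is the first word after ">"
        /^>/ {
            name = substr($1, 2)
            if (!(name in seen)) { seen[name] = 1; order[++k] = name }
            has[file, name] = 1
            next
        }

        # Sequence line (records may span several lines)
        name != "" {
            s = $0
            len[file, name] += length(s)
            nn[file, name] += gsub(/[Nn]/, "", s)   # gsub returns the match count
        }

        END {
            print "target", "len_a", "n_a", "len_b", "n_b"
            for (i = 1; i <= k; i++) {
                t = order[i]
                print t, val(len, 1, t), val(nn, 1, t), val(len, 2, t), val(nn, 2, t)
            }
        }
    ' "$1" "$2"
}
```

The gzipped consensus files would need to be decompressed first (for example with `gunzip -c`), after which the function can be called as `compare_fasta illumina.fa nanopore.fa`, where both file names are placeholders. Targets appear in the order they are first seen, so a target found in only one data set is easy to spot by its `NA` columns.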