---
Caryn-GSL
---

# Looking into primers on V4V5 GSL amplicons

16S 515-926

Forward primer: `GTGYCAGCMGCCGCGGTAA`
Reverse primer: `CCGYCAATTYMTTTRAGTTT`

[toc]

## Conclusion

Based on the peeking shown below, I think both primers have already been trimmed off :thumbsup:

## Work

### cutadapt peeking

To install the same version:

```bash
conda create -n cutadapt -c conda-forge -c bioconda -c defaults cutadapt=2.10
conda activate cutadapt
```

Running:

```bash
# -g / -G: primer expected at the 5' end of R1 / R2
# --discard-untrimmed: discard reads in which no primer was found
# -m 180: discard reads shorter than 180 bp after trimming
cutadapt -g GTGYCAGCMGCCGCGGTAA -G CCGYCAATTYMTTTRAGTTT \
    -o trimmed-R1.fastq -p trimmed-R2.fastq \
    --discard-untrimmed -m 180 \
    CE_GSL_Pool_S1_L001_R1_001.GSL_N3_D_V4V5_I4.fastq \
    CE_GSL_Pool_S1_L001_R2_001.GSL_N3_D_V4V5_I4.fastq
```

cutadapt finds the primers on virtually none of the reads:

```
=== Summary ===
Total read pairs processed: 192,215
Read 1 with adapter: 154 (0.1%)
Read 2 with adapter: 24 (0.0%)
```

So these primers are not attached to the reads. Below, we make sure they are actually the primers that were used by checking where our reads fall on a reference sequence.
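As a quick, tool-independent cross-check of the cutadapt numbers, here is a minimal bash sketch that turns a degenerate primer into a regular expression and counts reads whose sequence starts with it. It assumes uncompressed FASTQ with 4 lines per record and only counts exact matches (no mismatches or indels, unlike cutadapt, which tolerates a 10% error rate by default). The helper names `iupac_to_regex` and `count_primer_starts` are hypothetical, made up for this illustration.

```bash
#!/usr/bin/env bash
# Sketch only: count FASTQ reads whose sequence *starts* with a degenerate primer.
# Assumes an uncompressed, 4-line-per-record FASTQ on stdin; exact matches only
# (no mismatches or indels, unlike cutadapt's default error tolerance).

# Turn IUPAC ambiguity codes into regex character classes, e.g. Y -> [CT].
iupac_to_regex() {
  printf '%s\n' "$1" | sed -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' \
    -e 's/W/[AT]/g' -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' \
    -e 's/D/[AGT]/g' -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Print how many records have a sequence line (line 2 of each 4-line record)
# that begins with the primer.
count_primer_starts() {
  local pattern
  pattern="^$(iupac_to_regex "$1")"
  awk -v pat="$pattern" 'NR % 4 == 2 && $0 ~ pat { n++ } END { print n + 0 }'
}
```

For example, `count_primer_starts GTGYCAGCMGCCGCGGTAA < CE_GSL_Pool_S1_L001_R1_001.GSL_N3_D_V4V5_I4.fastq`, and likewise the reverse primer against the R2 file. If the primers really are gone, both counts should be close to zero, in line with the cutadapt summary.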
### Forward primer peeking

Forward primer: `GTGYCAGCMGCCGCGGTAA`

**Example forward read from `CE_GSL_Pool_S1_L001_R1_001.GSL_N3_D_V4V5_I4.fastq`:**

```
>M02404_75_000000000-J5YC2_1_1101_15933_2703 1:N:0:1
TACCGGCAGCCCGAGTGATGGCCGATCTTATTGGGCCTAAAGCGTCCGTAGCTGGGTCTGAAAGTCCATCGGGAAATCCACTCGCTCAACGAGTGGCCGTCCGGTGGAAACTCCACACCTTGGGACCGGAAGGCTCGAAGGGTACGGCCGGGGTAGGAGTGAAATCCTGTAATCCCGTCCGGACCACCGATGGCGAAAGCACTTCGAGAAAACGGATCCGACAGTGAGGGACGAAAGCTAAGATCGCGAAC
```

**Top BLAST hit:**

```
>NR_126292.1 Halobellus rufus strain CBA1103 16S ribosomal RNA, partial sequence
ATTCCGGTTGATCCTGCCGGAGGCCATTGCTATCGGAGTCCGATTTAGCCATGCTAGTCGCACGAGTTCACACTCGTGGCGGAAAGCTCAGTAACACGTGGCCAATCTACCCTACAGAACTGAACAACCTCGGGAAACTGAGGCTAATTCAGTATACAACTCCCACGGATGAACACGGGGAGTCACAAACGCGTTAGCGCTGTAGGATGAGGCTGCGGCCGATTAGGTAGACGGTGGGGTAACGGCCCACCGTGCCCATAATCGGTACGGGTTGTGAGAGCAAGAGCCCGGAGACGGAATCTGAGACAAGATTCCGGGCCCTACGGGGCGCAGCAGGCGCGAAACCTTTACACTGCACGACAGTGCGATAAGGGGACTCCGAGTGCGAGGGCATATCGTCCTCGCTTTTGTGTACCGTAGGGCGGTACACGAATAAGAGCTGGGCAAGACCGGTGCCAGCCGCCGCGGTAATACCGGCAGCTCAAGTGATGGCCAATCTTATTGGGCCTAAAGCGTCCGTAGCTGGCCCTGAAAGTCCGTCGGGAAATCCACTCGCTCAACGAGTGGGCGTCCGGCGGAAACTTCAGGGCTTGGGACCGGAAGGCTCGAGGGGTACGTCCGGGGTAGGAGTGAAATCCCGTAATCCCGGACGGACCACCGATGGCGAAAGCACCTCGAGAAGACGGATCCGACAGTGAGGGACGAAAGCTAGGGTCTCGAACCGGATTAGATACCCGGGTAGTCCTAGCCGTAAACGATGTTCGCTAGGTGTGGCACAGGCTACGCGCCTGTGCTGTGCCGTAGGGAAGCCGAGAAGCGAACCGCCTGGGAAGTACGTCTGCAAGGATGAAACTTAAAGGAATTGGCGGGGGAGCACTACAACCGGAGGAGCCTGCGGTTTAATTGGACTCAACGCCGGACATCTCACCAGCCCCGACTACAGTAATGACGGTCAGGTTGATGACCTCGCCACGACGCTGTAGAGAGGAGGTGCATGGCCGCCGTCAGCTCGTACCGTGAGGCGTCCTGTTAAGTCAGGCAACGAGCGAGACCCACACTCCTAATTGCCAGCAGCAGTCTCGACTGGCTGGGTACATTAGGAGGACTGCCAGTGCCAAACTGGAGGAAGGAATGGGCAACGGTAGGTCAGTATGCCCCGAATGGGCTGGGCTACACGCGGGCTACAATGGTCGAGACAATGGGTTGCAACCTCGAAAGAGGGCGCTAATCTCCGAAACTCGATCGTAGTTCGGATTGAGGACTGAAACTCGTCCTCATGAAGCTGGATTCGGTAGTAATCGCATTTCACAAGAGTGCGGTGAATACGTCCCTGCTCCTTGCACACACCGCCCGTCAAAGCACCCGAGTGAGGTCCGGATGAGGCCATCGCAAGATGGTCGAATCTGGGCTTCGCAAGGGGGCTTAAGTCGTAACAAGGTAGCCGTAGGGGAATCTGCGGCTGGATCACCTCCT
```

**Aligned region (from the reference):**

```
TACCGGCAGCTCAAGTGATGGCCAATCTTATTGGGCCTAAAGCGTCCGTAGCTGGCCCTGAAAGTCCGTCGGGAAATCCACTCGCTCAACGAGTGGGCGTCCGGCGGAAACTTCAGGGCTTGGGACCGGAAGGCTCGAGGGGTACGTCCGGGGTAGGAGTGAAATCCCGTAATCCCGGACGGACCACCGATGGCGAAAGCACCTCGAGAAGACGGATCCGACAGTGAGGGACGAAAGCTAGGGTCTCGAAC
```

**Aligned region in context of the full reference sequence:**

```
# start of ref
ATTCCGGTTGATCCTGCCGGAGGCCATTGCTATCGGAGTCCGATTTAGCCATGCTAGTCGCACGAGTTCACACTCGTGGCGGAAAGCTCAGTAACACGTGGCCAATCTACCCTACAGAACTGAACAACCTCGGGAAACTGAGGCTAATTCAGTATACAACTCCCACGGATGAACACGGGGAGTCACAAACGCGTTAGCGCTGTAGGATGAGGCTGCGGCCGATTAGGTAGACGGTGGGGTAACGGCCCACCGTGCCCATAATCGGTACGGGTTGTGAGAGCAAGAGCCCGGAGACGGAATCTGAGACAAGATTCCGGGCCCTACGGGGCGCAGCAGGCGCGAAACCTTTACACTGCACGACAGTGCGATAAGGGGACTCCGAGTGCGAGGGCATATCGTCCTCGCTTTTGTGTACCGTAGGGCGGTACACGAATAAGAGCTGGGCAAGACCGGTGCCAGCCGCCGCGGTAA
# aligned
TACCGGCAGCTCAAGTGATGGCCAATCTTATTGGGCCTAAAGCGTCCGTAGCTGGCCCTGAAAGTCCGTCGGGAAATCCACTCGCTCAACGAGTGGGCGTCCGGCGGAAACTTCAGGGCTTGGGACCGGAAGGCTCGAGGGGTACGTCCGGGGTAGGAGTGAAATCCCGTAATCCCGGACGGACCACCGATGGCGAAAGCACCTCGAGAAGACGGATCCGACAGTGAGGGACGAAAGCTAGGGTCTCGAAC
# rest of ref
CGGATTAGATACCCGGGTAGTCCTAGCCGTAAACGATGTTCGCTAGGTGTGGCACAGGCTACGCGCCTGTGCTGTGCCGTAGGGAAGCCGAGAAGCGAACCGCCTGGGAAGTACGTCTGCAAGGATGAAACTTAAAGGAATTGGCGGGGGAGCACTACAACCGGAGGAGCCTGCGGTTTAATTGGACTCAACGCCGGACATCTCACCAGCCCCGACTACAGTAATGACGGTCAGGTTGATGACCTCGCCACGACGCTGTAGAGAGGAGGTGCATGGCCGCCGTCAGCTCGTACCGTGAGGCGTCCTGTTAAGTCAGGCAACGAGCGAGACCCACACTCCTAATTGCCAGCAGCAGTCTCGACTGGCTGGGTACATTAGGAGGACTGCCAGTGCCAAACTGGAGGAAGGAATGGGCAACGGTAGGTCAGTATGCCCCGAATGGGCTGGGCTACACGCGGGCTACAATGGTCGAGACAATGGGTTGCAACCTCGAAAGAGGGCGCTAATCTCCGAAACTCGATCGTAGTTCGGATTGAGGACTGAAACTCGTCCTCATGAAGCTGGATTCGGTAGTAATCGCATTTCACAAGAGTGCGGTGAATACGTCCCTGCTCCTTGCACACACCGCCCGTCAAAGCACCCGAGTGAGGTCCGGATGAGGCCATCGCAAGATGGTCGAATCTGGGCTTCGCAAGGGGGCTTAAGTCGTAACAAGGTAGCCGTAGGGGAATCTGCGGCTGGATCACCTCCT
```

Right in front of the aligned portion, at the very end of the "start of ref" sequence, is the expected forward primer (with its degenerate bases Y and M both resolved as C): `GTGCCAGCCGCCGCGGTAA`

So based on this, and the fact that cutadapt doesn't find it, I think the forward primer has already been cut off :thumbsup:

### Reverse primer peeking

Reverse primer: `CCGYCAATTYMTTTRAGTTT`

**Corresponding reverse read from `CE_GSL_Pool_S1_L001_R2_001.GSL_N3_D_V4V5_I4.fastq`:**

```
>M02404_75_000000000-J5YC2_1_1101_15933_2703 2:N:0:1
AATCCTTGCAGACGTACTTCCCAGGCGGTTCGTTTCTCGGCTTCCCTACGGCACAACACAGCCGCGTAGTCTGTGTCATACCTAACGAACATTGTTTACGGCCAAGACTACCCGGGCATCTAATCCGGTTCGCGATCTTAGCTTTCCTCCCTCACTGTCGGATCCGTTTTCTCGAAGTGCTTTCGCTATCGGTGGTCCGGACGGGATTACAGGATTTCACTCCTACCCCGGCCGTACCCTTCGAGCCTTCC
```

**Top BLAST hit:**

```
>NR_028207.1 Haloquadratum walsbyi C23 16S ribosomal RNA, partial sequence
CGGAGGCCATTGCTATCGGAGTCCGATTTAGCCATGCTAGTCGTGCGAGTTCAGACTCGCGGCACCGAGCTCAGTAACACGTGGCCAAACTACCCTACAGAGACGGATACCCTCGGGAAACTGAGGTTAACCCGTCATATCGATCTCAGGCTTGAATCGCAGAGATCACAAAACGCCCCGGCGCTGTAGGATGTGGCTGCGGTTGATTAGGTAGACGGTGGGGTAACGGCCCACCGTGCCCATAATCAGTACAGGTTGTGAGAGCAAGAACCTGGAGACGGAATCTGAGACAAGATTCCGGGCCCTACGGGGCGCAGCAGGCGCGAAACCTTTACACTGCACGCACGTGCGATAAGGGGACTCCGAGTGCGAGGGCATATCGTCCTCGCTTTCGTGTACCGTAGGGTGGTACACCAACAAGGGCTGGGCAAGACCGGTGCCAGCCGCCGCGGTAATACCGGCAGCCCGAGTGATGGCCGATCTTATTGGGCCTAAAGCGTCCGTAGCTGGCTGCGCAAGTCCGTCGGGAAATCCACTCGCCCAACGAGTGGGCGTCCGACGGAAACTGCACAGCTTGGGACCGGAAGGCTCGAAGGGTACGTTCGGGGTAGGAGTGAAATCCCATAATCCCGCACGGACCACCGATGGCGAAAGCACTTCGAGAAAACGGATCCGACAGTGAGGGACGAAAGCCAGGGTCTCCAACCGGATTAGATACCCGGGTAGTCCTGGCCGTAAACAATGTTCGCTAGGTATGACACAGACTACGCGTCTCTGTTGTGCCGTAGGGAAGCCGAGAAGCGAACCGCCTGGGAAGTACGTCTGCAAGGATGAAACTTAAAGGAATTGGCGGGGGAGCACTACAACCGGAGGAGCCTGCGGTTTAATTGGACTCAACGCCGGACATCTCACCAGCTCCGACTACAGTGATGACGACCAGGTTGATGACCTCATCACGACGCTGTAGAGAGGAGGTGCATGGCCGCCGTCAGCTCGTACCGTGAGGCGTCCTGTTAAGTCAGGCAACGAGCGAGACCCGCACCCCTAATTGCCAGCAACAGTTTCGACTGGTTGGGTACATTAGGAGGACTGCCAGTGTTAAACTGGAGGAAGGAACGGGCAACGGTAGGTCAGTATGCCCCGAATGAGCTGGGCAACACGCGGGCTACAATGGCTAAGACAATGGGTCGCTATCTCGACAGAGAACGCTAATCTCGAAACTTAGTCGTAGTTCGGATTGAGGGCTGAAACTCGCCCTCATGAAGCTGGATTCGGTAGTAGCCGCCTTTCAGTAGAAGGCGACGAATACGTCCCTGCTCCTTGCACACACCGCCCGTCAAAGCACCCGAGTGAGGTCCGGATGAGGCTATCACTGATAGTCGAATCTGGGCTTCGCAAGGGGGCTTAAGTCGTAACAAGGTAGCCGTAGGGGAATCTGC
```

**Aligned region (from the reference; it is the reverse complement of our read):**

```
GGAAGGCTCGAAGGGTACGTTCGGGGTAGGAGTGAAATCCCATAATCCCGCACGGACCACCGATGGCGAAAGCACTTCGAGAAAACGGATCCGACAGTGAGGGACGAAAGCCAGGGTCTCCAACCGGATTAGATACCCGGGTAGTCCTGGCCGTAAACAATGTTCGCTAGGTATGACACAGACTACGCGTCTCTGTTGTGCCGTAGGGAAGCCGAGAAGCGAACCGCCTGGGAAGTACGTCTGCAAGGAT
```

Reverse complement of the reverse primer (generated at [this handy site](http://arep.med.harvard.edu/labgc/adnan/projects/Utilities/revcomp.html)): `AAACTYAAAKRAATTGRCGG`

**Aligned region in context of the full reference sequence:**

```
# start of ref
CGGAGGCCATTGCTATCGGAGTCCGATTTAGCCATGCTAGTCGTGCGAGTTCAGACTCGCGGCACCGAGCTCAGTAACACGTGGCCAAACTACCCTACAGAGACGGATACCCTCGGGAAACTGAGGTTAACCCGTCATATCGATCTCAGGCTTGAATCGCAGAGATCACAAAACGCCCCGGCGCTGTAGGATGTGGCTGCGGTTGATTAGGTAGACGGTGGGGTAACGGCCCACCGTGCCCATAATCAGTACAGGTTGTGAGAGCAAGAACCTGGAGACGGAATCTGAGACAAGATTCCGGGCCCTACGGGGCGCAGCAGGCGCGAAACCTTTACACTGCACGCACGTGCGATAAGGGGACTCCGAGTGCGAGGGCATATCGTCCTCGCTTTCGTGTACCGTAGGGTGGTACACCAACAAGGGCTGGGCAAGACCGGTGCCAGCCGCCGCGGTAATACCGGCAGCCCGAGTGATGGCCGATCTTATTGGGCCTAAAGCGTCCGTAGCTGGCTGCGCAAGTCCGTCGGGAAATCCACTCGCCCAACGAGTGGGCGTCCGACGGAAACTGCACAGCTTGGGACC
# aligned
GGAAGGCTCGAAGGGTACGTTCGGGGTAGGAGTGAAATCCCATAATCCCGCACGGACCACCGATGGCGAAAGCACTTCGAGAAAACGGATCCGACAGTGAGGGACGAAAGCCAGGGTCTCCAACCGGATTAGATACCCGGGTAGTCCTGGCCGTAAACAATGTTCGCTAGGTATGACACAGACTACGCGTCTCTGTTGTGCCGTAGGGAAGCCGAGAAGCGAACCGCCTGGGAAGTACGTCTGCAAGGAT
# rest of ref
GAAACTTAAAGGAATTGGCGGGGGAGCACTACAACCGGAGGAGCCTGCGGTTTAATTGGACTCAACGCCGGACATCTCACCAGCTCCGACTACAGTGATGACGACCAGGTTGATGACCTCATCACGACGCTGTAGAGAGGAGGTGCATGGCCGCCGTCAGCTCGTACCGTGAGGCGTCCTGTTAAGTCAGGCAACGAGCGAGACCCGCACCCCTAATTGCCAGCAACAGTTTCGACTGGTTGGGTACATTAGGAGGACTGCCAGTGTTAAACTGGAGGAAGGAACGGGCAACGGTAGGTCAGTATGCCCCGAATGAGCTGGGCAACACGCGGGCTACAATGGCTAAGACAATGGGTCGCTATCTCGACAGAGAACGCTAATCTCGAAACTTAGTCGTAGTTCGGATTGAGGGCTGAAACTCGCCCTCATGAAGCTGGATTCGGTAGTAGCCGCCTTTCAGTAGAAGGCGACGAATACGTCCCTGCTCCTTGCACACACCGCCCGTCAAAGCACCCGAGTGAGGTCCGGATGAGGCTATCACTGATAGTCGAATCTGGGCTTCGCAAGGGGGCTTAAGTCGTAACAAGGTAGCCGTAGGGGAATCTGC
```

The reverse complement of the reverse primer (`AAACTYAAAKRAATTGRCGG`) matches what comes right after our aligned region. In "rest of ref", just past the first 'G' (which simply fell outside the alignment), we find: `AAACTTAAAGGAATTGGCGG`

So based on this, and the fact that cutadapt doesn't find the reverse primer either, I think it has already been cut off too :thumbsup:
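The reverse-complement and IUPAC matching steps above can also be done locally instead of on a website. Here is a minimal bash sketch that reverse-complements the reverse primer and checks both primers against the reference sites copied from the context blocks above. The helper names `revcomp`, `iupac_to_regex` and `site_matches_primer` are hypothetical, and the sketch assumes uppercase sequences using standard IUPAC codes.

```bash
#!/usr/bin/env bash
# Sketch only: reverse-complement a degenerate primer and check whether a concrete
# reference site is consistent with it. Assumes uppercase IUPAC DNA codes.

# Reverse the string in pure bash, then swap every IUPAC code for its complement.
# S, W and N are their own complements, so tr leaves them as-is.
revcomp() {
  local seq=$1 out='' i
  for (( i = ${#seq} - 1; i >= 0; i-- )); do
    out+=${seq:i:1}
  done
  printf '%s\n' "$out" | tr 'ACGTRYKMBVDH' 'TGCAYRMKVBHD'
}

# Turn IUPAC ambiguity codes into regex character classes, e.g. Y -> [CT].
iupac_to_regex() {
  printf '%s\n' "$1" | sed -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' \
    -e 's/W/[AT]/g' -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' \
    -e 's/D/[AGT]/g' -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Exit status 0 if the whole site matches the degenerate primer (no mismatches).
site_matches_primer() {
  local site=$1 primer=$2 re
  re="^$(iupac_to_regex "$primer")\$"
  [[ $site =~ $re ]]
}

fwd_primer=GTGYCAGCMGCCGCGGTAA
rev_primer=CCGYCAATTYMTTTRAGTTT

revcomp "$rev_primer"   # prints AAACTYAAAKRAATTGRCGG

# Sites taken from the "start of ref" / "rest of ref" blocks above.
site_matches_primer GTGCCAGCCGCCGCGGTAA "$fwd_primer" && echo "forward primer site: match"
site_matches_primer AAACTTAAAGGAATTGGCGG "$(revcomp "$rev_primer")" && echo "reverse primer site: match"
```

Matching against a regex built from the degenerate primer avoids having to resolve Y, M, R, K by hand. It only tests for an exact match at a site you already picked out, though, so it complements rather than replaces the BLAST alignment.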