---
Caryn-GSL
---
# Looking into primers on V4V5 GSL amplicons
16S 515-926
Forward primer: `GTGYCAGCMGCCGCGGTAA`
Reverse primer: `CCGYCAATTYMTTTRAGTTT`
[toc]
## Conclusion
Based on the checks shown below, I think both primers have already been trimmed off :thumbsup:
## Work
### cutadapt peeking
If you want to install the same version:
```bash
conda create -n cutadapt -c conda-forge -c bioconda -c defaults cutadapt=2.10
conda activate cutadapt
```
Running:
```bash
cutadapt -g GTGYCAGCMGCCGCGGTAA -G CCGYCAATTYMTTTRAGTTT \
    -o trimmed-R1.fastq -p trimmed-R2.fastq \
    --discard-untrimmed -m 180 \
    CE_GSL_Pool_S1_L001_R1_001.GSL_N3_D_V4V5_I4.fastq \
    CE_GSL_Pool_S1_L001_R2_001.GSL_N3_D_V4V5_I4.fastq
```
This finds them in virtually none of the read pairs:
```
=== Summary ===
Total read pairs processed: 192,215
Read 1 with adapter: 154 (0.1%)
Read 2 with adapter: 24 (0.0%)
```
So the primers are essentially absent from the reads. Below, we make sure these are the primers that were actually used by seeing where an example read pair aligns to a reference sequence.
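As a stricter, cruder cross-check of the cutadapt summary, one could count how many reads *begin* with an exact match to a degenerate primer. This is a minimal sketch, assuming standard 4-line FASTQ records; unlike cutadapt (default maximum error rate 0.1), it allows no mismatches or indels. The function names `iupac_to_regex` and `count_primer_starts` are hypothetical helpers, not part of any tool.

```bash
#!/usr/bin/env bash
# Sketch: count reads whose sequence starts with an exact match to a
# degenerate (IUPAC) primer. No mismatches or indels are allowed.

# Hypothetical helper: translate IUPAC ambiguity codes into bracket
# expressions usable by grep -E (e.g. Y -> [CT]).
iupac_to_regex() {
  sed -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' -e 's/W/[AT]/g' \
      -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
      -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g' <<< "$1"
}

# Hypothetical helper: print how many sequence lines (line 2 of each
# 4-line FASTQ record) in file $2 begin with primer $1.
count_primer_starts() {
  local regex
  regex=$(iupac_to_regex "$1")
  # grep -c exits 1 when the count is 0; "|| true" keeps that from
  # aborting scripts that use set -e (the "0" is still printed)
  awk 'NR % 4 == 2' "$2" | grep -cE "^${regex}" || true
}

# Example usage with the files from the cutadapt run above:
# count_primer_starts GTGYCAGCMGCCGCGGTAA CE_GSL_Pool_S1_L001_R1_001.GSL_N3_D_V4V5_I4.fastq
# count_primer_starts CCGYCAATTYMTTTRAGTTT CE_GSL_Pool_S1_L001_R2_001.GSL_N3_D_V4V5_I4.fastq
```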
### Forward primer peeking
Forward primer: `GTGYCAGCMGCCGCGGTAA`
**Example forward read from `CE_GSL_Pool_S1_L001_R1_001.GSL_N3_D_V4V5_I4.fastq`:**
```
>M02404_75_000000000-J5YC2_1_1101_15933_2703 1:N:0:1
TACCGGCAGCCCGAGTGATGGCCGATCTTATTGGGCCTAAAGCGTCCGTAGCTGGGTCTGAAAGTCCATCGGGAAATCCACTCGCTCAACGAGTGGCCGTCCGGTGGAAACTCCACACCTTGGGACCGGAAGGCTCGAAGGGTACGGCCGGGGTAGGAGTGAAATCCTGTAATCCCGTCCGGACCACCGATGGCGAAAGCACTTCGAGAAAACGGATCCGACAGTGAGGGACGAAAGCTAAGATCGCGAAC
```
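To get reads like this one into FASTA for pasting into BLAST, one option is to convert the first few FASTQ records. This is a minimal sketch, assuming standard 4-line FASTQ records; `fastq_head_to_fasta` is a hypothetical helper, and which record(s) you pull is up to you.

```bash
#!/usr/bin/env bash
# Sketch: print the first N records of a FASTQ file as FASTA.
# The "@..." header becomes ">..."; the "+" and quality lines are dropped.
fastq_head_to_fasta() {
  local n=$1 file=$2
  head -n $(( 4 * n )) "$file" |
    awk 'NR % 4 == 1 { print ">" substr($0, 2) } NR % 4 == 2 { print }'
}

# Example usage (N=1 is an arbitrary choice):
# fastq_head_to_fasta 1 CE_GSL_Pool_S1_L001_R1_001.GSL_N3_D_V4V5_I4.fastq
```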
**Top BLAST hit:**
```
>NR_126292.1 Halobellus rufus strain CBA1103 16S ribosomal RNA, partial sequence
ATTCCGGTTGATCCTGCCGGAGGCCATTGCTATCGGAGTCCGATTTAGCCATGCTAGTCGCACGAGTTCACACTCGTGGCGGAAAGCTCAGTAACACGTGGCCAATCTACCCTACAGAACTGAACAACCTCGGGAAACTGAGGCTAATTCAGTATACAACTCCCACGGATGAACACGGGGAGTCACAAACGCGTTAGCGCTGTAGGATGAGGCTGCGGCCGATTAGGTAGACGGTGGGGTAACGGCCCACCGTGCCCATAATCGGTACGGGTTGTGAGAGCAAGAGCCCGGAGACGGAATCTGAGACAAGATTCCGGGCCCTACGGGGCGCAGCAGGCGCGAAACCTTTACACTGCACGACAGTGCGATAAGGGGACTCCGAGTGCGAGGGCATATCGTCCTCGCTTTTGTGTACCGTAGGGCGGTACACGAATAAGAGCTGGGCAAGACCGGTGCCAGCCGCCGCGGTAATACCGGCAGCTCAAGTGATGGCCAATCTTATTGGGCCTAAAGCGTCCGTAGCTGGCCCTGAAAGTCCGTCGGGAAATCCACTCGCTCAACGAGTGGGCGTCCGGCGGAAACTTCAGGGCTTGGGACCGGAAGGCTCGAGGGGTACGTCCGGGGTAGGAGTGAAATCCCGTAATCCCGGACGGACCACCGATGGCGAAAGCACCTCGAGAAGACGGATCCGACAGTGAGGGACGAAAGCTAGGGTCTCGAACCGGATTAGATACCCGGGTAGTCCTAGCCGTAAACGATGTTCGCTAGGTGTGGCACAGGCTACGCGCCTGTGCTGTGCCGTAGGGAAGCCGAGAAGCGAACCGCCTGGGAAGTACGTCTGCAAGGATGAAACTTAAAGGAATTGGCGGGGGAGCACTACAACCGGAGGAGCCTGCGGTTTAATTGGACTCAACGCCGGACATCTCACCAGCCCCGACTACAGTAATGACGGTCAGGTTGATGACCTCGCCACGACGCTGTAGAGAGGAGGTGCATGGCCGCCGTCAGCTCGTACCGTGAGGCGTCCTGTTAAGTCAGGCAACGAGCGAGACCCACACTCCTAATTGCCAGCAGCAGTCTCGACTGGCTGGGTACATTAGGAGGACTGCCAGTGCCAAACTGGAGGAAGGAATGGGCAACGGTAGGTCAGTATGCCCCGAATGGGCTGGGCTACACGCGGGCTACAATGGTCGAGACAATGGGTTGCAACCTCGAAAGAGGGCGCTAATCTCCGAAACTCGATCGTAGTTCGGATTGAGGACTGAAACTCGTCCTCATGAAGCTGGATTCGGTAGTAATCGCATTTCACAAGAGTGCGGTGAATACGTCCCTGCTCCTTGCACACACCGCCCGTCAAAGCACCCGAGTGAGGTCCGGATGAGGCCATCGCAAGATGGTCGAATCTGGGCTTCGCAAGGGGGCTTAAGTCGTAACAAGGTAGCCGTAGGGGAATCTGCGGCTGGATCACCTCCT
```
**Aligned region of the reference:**
```
TACCGGCAGCTCAAGTGATGGCCAATCTTATTGGGCCTAAAGCGTCCGTAGCTGGCCCTGAAAGTCCGTCGGGAAATCCACTCGCTCAACGAGTGGGCGTCCGGCGGAAACTTCAGGGCTTGGGACCGGAAGGCTCGAGGGGTACGTCCGGGGTAGGAGTGAAATCCCGTAATCCCGGACGGACCACCGATGGCGAAAGCACCTCGAGAAGACGGATCCGACAGTGAGGGACGAAAGCTAGGGTCTCGAAC
```
**Aligned region in context of full ref sequence:**
```
# start of ref
ATTCCGGTTGATCCTGCCGGAGGCCATTGCTATCGGAGTCCGATTTAGCCATGCTAGTCGCACGAGTTCACACTCGTGGCGGAAAGCTCAGTAACACGTGGCCAATCTACCCTACAGAACTGAACAACCTCGGGAAACTGAGGCTAATTCAGTATACAACTCCCACGGATGAACACGGGGAGTCACAAACGCGTTAGCGCTGTAGGATGAGGCTGCGGCCGATTAGGTAGACGGTGGGGTAACGGCCCACCGTGCCCATAATCGGTACGGGTTGTGAGAGCAAGAGCCCGGAGACGGAATCTGAGACAAGATTCCGGGCCCTACGGGGCGCAGCAGGCGCGAAACCTTTACACTGCACGACAGTGCGATAAGGGGACTCCGAGTGCGAGGGCATATCGTCCTCGCTTTTGTGTACCGTAGGGCGGTACACGAATAAGAGCTGGGCAAGACCGGTGCCAGCCGCCGCGGTAA
# aligned
TACCGGCAGCTCAAGTGATGGCCAATCTTATTGGGCCTAAAGCGTCCGTAGCTGGCCCTGAAAGTCCGTCGGGAAATCCACTCGCTCAACGAGTGGGCGTCCGGCGGAAACTTCAGGGCTTGGGACCGGAAGGCTCGAGGGGTACGTCCGGGGTAGGAGTGAAATCCCGTAATCCCGGACGGACCACCGATGGCGAAAGCACCTCGAGAAGACGGATCCGACAGTGAGGGACGAAAGCTAGGGTCTCGAAC
# rest of ref
CGGATTAGATACCCGGGTAGTCCTAGCCGTAAACGATGTTCGCTAGGTGTGGCACAGGCTACGCGCCTGTGCTGTGCCGTAGGGAAGCCGAGAAGCGAACCGCCTGGGAAGTACGTCTGCAAGGATGAAACTTAAAGGAATTGGCGGGGGAGCACTACAACCGGAGGAGCCTGCGGTTTAATTGGACTCAACGCCGGACATCTCACCAGCCCCGACTACAGTAATGACGGTCAGGTTGATGACCTCGCCACGACGCTGTAGAGAGGAGGTGCATGGCCGCCGTCAGCTCGTACCGTGAGGCGTCCTGTTAAGTCAGGCAACGAGCGAGACCCACACTCCTAATTGCCAGCAGCAGTCTCGACTGGCTGGGTACATTAGGAGGACTGCCAGTGCCAAACTGGAGGAAGGAATGGGCAACGGTAGGTCAGTATGCCCCGAATGGGCTGGGCTACACGCGGGCTACAATGGTCGAGACAATGGGTTGCAACCTCGAAAGAGGGCGCTAATCTCCGAAACTCGATCGTAGTTCGGATTGAGGACTGAAACTCGTCCTCATGAAGCTGGATTCGGTAGTAATCGCATTTCACAAGAGTGCGGTGAATACGTCCCTGCTCCTTGCACACACCGCCCGTCAAAGCACCCGAGTGAGGTCCGGATGAGGCCATCGCAAGATGGTCGAATCTGGGCTTCGCAAGGGGGCTTAAGTCGTAACAAGGTAGCCGTAGGGGAATCTGCGGCTGGATCACCTCCT
```
Immediately in front of the aligned portion (at the end of the "start of ref" sequence) is `GTGCCAGCCGCCGCGGTAA`, which is the expected forward primer `GTGYCAGCMGCCGCGGTAA` with Y = C and M = C.
So, given this and that cutadapt doesn't find the primer, I think it has already been cut off :thumbsup:
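The "this is the expected primer" step can be checked position by position: a concrete base matches an IUPAC code if it is one of the bases the code allows. This is a minimal sketch; `iupac_bases` and `matches_primer` are hypothetical helpers, and the comparison requires equal lengths and allows no mismatches.

```bash
#!/usr/bin/env bash
# Sketch: compare a concrete sequence to a degenerate (IUPAC) primer and
# report how each ambiguity code was resolved.

# Hypothetical helper: bases allowed by each IUPAC code (empty if unknown).
iupac_bases() {
  case "$1" in
    A) echo A ;; C) echo C ;; G) echo G ;; T) echo T ;;
    R) echo AG ;; Y) echo CT ;; S) echo CG ;; W) echo AT ;;
    K) echo GT ;; M) echo AC ;; B) echo CGT ;; D) echo AGT ;;
    H) echo ACT ;; V) echo ACG ;; N) echo ACGT ;;
    *) echo "" ;;
  esac
}

# Hypothetical helper: returns 0 if sequence $2 matches primer $1 at
# every position; prints the resolution of each degenerate position.
matches_primer() {
  local primer=$1 seq=$2 i p s allowed
  [ "${#primer}" -eq "${#seq}" ] || return 1
  for (( i = 0; i < ${#primer}; i++ )); do
    p=${primer:i:1}
    s=${seq:i:1}
    allowed=$(iupac_bases "$p")
    # Unknown code, or base not among the allowed ones -> mismatch
    if [ -z "$allowed" ] || [[ $allowed != *"$s"* ]]; then
      echo "mismatch at position $((i + 1)): primer $p, sequence $s" >&2
      return 1
    fi
    # Only report positions where the primer is degenerate
    if [ "${#allowed}" -gt 1 ]; then
      echo "position $((i + 1)): $p -> $s"
    fi
  done
}

matches_primer GTGYCAGCMGCCGCGGTAA GTGCCAGCCGCCGCGGTAA
# prints:
# position 4: Y -> C
# position 9: M -> C
```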
### Reverse primer peeking
Reverse primer: `CCGYCAATTYMTTTRAGTTT`
**Corresponding reverse read from `CE_GSL_Pool_S1_L001_R2_001.GSL_N3_D_V4V5_I4.fastq`:**
```
>M02404_75_000000000-J5YC2_1_1101_15933_2703 2:N:0:1
AATCCTTGCAGACGTACTTCCCAGGCGGTTCGTTTCTCGGCTTCCCTACGGCACAACACAGCCGCGTAGTCTGTGTCATACCTAACGAACATTGTTTACGGCCAAGACTACCCGGGCATCTAATCCGGTTCGCGATCTTAGCTTTCCTCCCTCACTGTCGGATCCGTTTTCTCGAAGTGCTTTCGCTATCGGTGGTCCGGACGGGATTACAGGATTTCACTCCTACCCCGGCCGTACCCTTCGAGCCTTCC
```
**Top BLAST hit:**
```
>NR_028207.1 Haloquadratum walsbyi C23 16S ribosomal RNA, partial sequence
CGGAGGCCATTGCTATCGGAGTCCGATTTAGCCATGCTAGTCGTGCGAGTTCAGACTCGCGGCACCGAGCTCAGTAACACGTGGCCAAACTACCCTACAGAGACGGATACCCTCGGGAAACTGAGGTTAACCCGTCATATCGATCTCAGGCTTGAATCGCAGAGATCACAAAACGCCCCGGCGCTGTAGGATGTGGCTGCGGTTGATTAGGTAGACGGTGGGGTAACGGCCCACCGTGCCCATAATCAGTACAGGTTGTGAGAGCAAGAACCTGGAGACGGAATCTGAGACAAGATTCCGGGCCCTACGGGGCGCAGCAGGCGCGAAACCTTTACACTGCACGCACGTGCGATAAGGGGACTCCGAGTGCGAGGGCATATCGTCCTCGCTTTCGTGTACCGTAGGGTGGTACACCAACAAGGGCTGGGCAAGACCGGTGCCAGCCGCCGCGGTAATACCGGCAGCCCGAGTGATGGCCGATCTTATTGGGCCTAAAGCGTCCGTAGCTGGCTGCGCAAGTCCGTCGGGAAATCCACTCGCCCAACGAGTGGGCGTCCGACGGAAACTGCACAGCTTGGGACCGGAAGGCTCGAAGGGTACGTTCGGGGTAGGAGTGAAATCCCATAATCCCGCACGGACCACCGATGGCGAAAGCACTTCGAGAAAACGGATCCGACAGTGAGGGACGAAAGCCAGGGTCTCCAACCGGATTAGATACCCGGGTAGTCCTGGCCGTAAACAATGTTCGCTAGGTATGACACAGACTACGCGTCTCTGTTGTGCCGTAGGGAAGCCGAGAAGCGAACCGCCTGGGAAGTACGTCTGCAAGGATGAAACTTAAAGGAATTGGCGGGGGAGCACTACAACCGGAGGAGCCTGCGGTTTAATTGGACTCAACGCCGGACATCTCACCAGCTCCGACTACAGTGATGACGACCAGGTTGATGACCTCATCACGACGCTGTAGAGAGGAGGTGCATGGCCGCCGTCAGCTCGTACCGTGAGGCGTCCTGTTAAGTCAGGCAACGAGCGAGACCCGCACCCCTAATTGCCAGCAACAGTTTCGACTGGTTGGGTACATTAGGAGGACTGCCAGTGTTAAACTGGAGGAAGGAACGGGCAACGGTAGGTCAGTATGCCCCGAATGAGCTGGGCAACACGCGGGCTACAATGGCTAAGACAATGGGTCGCTATCTCGACAGAGAACGCTAATCTCGAAACTTAGTCGTAGTTCGGATTGAGGGCTGAAACTCGCCCTCATGAAGCTGGATTCGGTAGTAGCCGCCTTTCAGTAGAAGGCGACGAATACGTCCCTGCTCCTTGCACACACCGCCCGTCAAAGCACCCGAGTGAGGTCCGGATGAGGCTATCACTGATAGTCGAATCTGGGCTTCGCAAGGGGGCTTAAGTCGTAACAAGGTAGCCGTAGGGGAATCTGC
```
**Aligned region of the reference (it is the reverse complement of our read):**
```
GGAAGGCTCGAAGGGTACGTTCGGGGTAGGAGTGAAATCCCATAATCCCGCACGGACCACCGATGGCGAAAGCACTTCGAGAAAACGGATCCGACAGTGAGGGACGAAAGCCAGGGTCTCCAACCGGATTAGATACCCGGGTAGTCCTGGCCGTAAACAATGTTCGCTAGGTATGACACAGACTACGCGTCTCTGTTGTGCCGTAGGGAAGCCGAGAAGCGAACCGCCTGGGAAGTACGTCTGCAAGGAT
```
Reverse primer's reverse complement (generated at [this handy site](http://arep.med.harvard.edu/labgc/adnan/projects/Utilities/revcomp.html)): `AAACTYAAAKRAATTGRCGG`
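The same reverse complement can be generated offline. This is a minimal sketch, assuming the `rev` utility (util-linux or BSD) is available; `revcomp` is a hypothetical helper name. `tr` swaps each base or ambiguity code for its complement (R/Y, K/M, B/V, D/H), while S, W and N are their own complements and pass through unchanged.

```bash
#!/usr/bin/env bash
# Sketch: IUPAC-aware reverse complement (uppercase input only).
revcomp() {
  # rev reverses the string; tr complements bases and ambiguity codes
  rev <<< "$1" | tr ACGTRYKMBVDH TGCAYRMKVBHD
}

revcomp CCGYCAATTYMTTTRAGTTT
# prints: AAACTYAAAKRAATTGRCGG
```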
**Aligned region in context of full ref sequence:**
```
# start of ref
CGGAGGCCATTGCTATCGGAGTCCGATTTAGCCATGCTAGTCGTGCGAGTTCAGACTCGCGGCACCGAGCTCAGTAACACGTGGCCAAACTACCCTACAGAGACGGATACCCTCGGGAAACTGAGGTTAACCCGTCATATCGATCTCAGGCTTGAATCGCAGAGATCACAAAACGCCCCGGCGCTGTAGGATGTGGCTGCGGTTGATTAGGTAGACGGTGGGGTAACGGCCCACCGTGCCCATAATCAGTACAGGTTGTGAGAGCAAGAACCTGGAGACGGAATCTGAGACAAGATTCCGGGCCCTACGGGGCGCAGCAGGCGCGAAACCTTTACACTGCACGCACGTGCGATAAGGGGACTCCGAGTGCGAGGGCATATCGTCCTCGCTTTCGTGTACCGTAGGGTGGTACACCAACAAGGGCTGGGCAAGACCGGTGCCAGCCGCCGCGGTAATACCGGCAGCCCGAGTGATGGCCGATCTTATTGGGCCTAAAGCGTCCGTAGCTGGCTGCGCAAGTCCGTCGGGAAATCCACTCGCCCAACGAGTGGGCGTCCGACGGAAACTGCACAGCTTGGGACC
# aligned
GGAAGGCTCGAAGGGTACGTTCGGGGTAGGAGTGAAATCCCATAATCCCGCACGGACCACCGATGGCGAAAGCACTTCGAGAAAACGGATCCGACAGTGAGGGACGAAAGCCAGGGTCTCCAACCGGATTAGATACCCGGGTAGTCCTGGCCGTAAACAATGTTCGCTAGGTATGACACAGACTACGCGTCTCTGTTGTGCCGTAGGGAAGCCGAGAAGCGAACCGCCTGGGAAGTACGTCTGCAAGGAT
# rest of ref
GAAACTTAAAGGAATTGGCGGGGGAGCACTACAACCGGAGGAGCCTGCGGTTTAATTGGACTCAACGCCGGACATCTCACCAGCTCCGACTACAGTGATGACGACCAGGTTGATGACCTCATCACGACGCTGTAGAGAGGAGGTGCATGGCCGCCGTCAGCTCGTACCGTGAGGCGTCCTGTTAAGTCAGGCAACGAGCGAGACCCGCACCCCTAATTGCCAGCAACAGTTTCGACTGGTTGGGTACATTAGGAGGACTGCCAGTGTTAAACTGGAGGAAGGAACGGGCAACGGTAGGTCAGTATGCCCCGAATGAGCTGGGCAACACGCGGGCTACAATGGCTAAGACAATGGGTCGCTATCTCGACAGAGAACGCTAATCTCGAAACTTAGTCGTAGTTCGGATTGAGGGCTGAAACTCGCCCTCATGAAGCTGGATTCGGTAGTAGCCGCCTTTCAGTAGAAGGCGACGAATACGTCCCTGCTCCTTGCACACACCGCCCGTCAAAGCACCCGAGTGAGGTCCGGATGAGGCTATCACTGATAGTCGAATCTGGGCTTCGCAAGGGGGCTTAAGTCGTAACAAGGTAGCCGTAGGGGAATCTGC
```
The reverse complement of the reverse primer (`AAACTYAAAKRAATTGRCGG`) matches what comes right after our aligned region in the "rest of ref" sequence, following a single 'G' that sits just outside the aligned region: `AAACTTAAAGGAATTGGCGG`
So, given this and that cutadapt doesn't find this primer either, I think it has already been cut off too :thumbsup:
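The binding-site comparison above can also be located programmatically: reverse-complement the degenerate primer, turn it into a regex, and search the reference for it. This is a minimal sketch, assuming `rev` is available and that `grep` supports `-b` and `-o` together (as GNU grep does, where `-b -o` prints the 0-based offset of each match). The 40-base fragment is copied from the start of the "rest of ref" block above.

```bash
#!/usr/bin/env bash
# Sketch: find the reverse primer's binding site in a reference fragment.
primer=CCGYCAATTYMTTTRAGTTT
# First 40 bases of the "rest of ref" sequence above
ref_fragment=GAAACTTAAAGGAATTGGCGGGGGAGCACTACAACCGGAG

# IUPAC-aware reverse complement of the primer
revcomp=$(rev <<< "$primer" | tr ACGTRYKMBVDH TGCAYRMKVBHD)
# Expand ambiguity codes into bracket expressions (e.g. R -> [AG])
regex=$(sed -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/K/[GT]/g' -e 's/M/[AC]/g' \
            -e 's/S/[CG]/g' -e 's/W/[AT]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
            -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g' <<< "$revcomp")

# -o prints only the matched part; -b prefixes its 0-based byte offset
grep -boE "$regex" <<< "$ref_fragment"
# prints: 1:AAACTTAAAGGAATTGGCGG
```

The offset of 1 corresponds to the single 'G' between the end of the aligned region and the primer site.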