# Submitting sequences to GenBank

Dave Angelini, 14 May 2021.

[GenBank](https://www.ncbi.nlm.nih.gov/genbank/) is a [vast](https://www.ncbi.nlm.nih.gov/genbank/statistics/) open access collection of all publicly available nucleotide sequences, maintained by the National Center for Biotechnology Information ([NCBI](https://www.ncbi.nlm.nih.gov/)). Every time I need to submit sequences to GenBank, typically before submitting a manuscript for publication, I struggle to remember how best to do it. There are many tools, and none of them ever seems to fit what I'd like to do. This tutorial is meant to serve as a record of one successful and relatively painless workflow.

- **Clean up the sequences.** Create a folder in [Geneious](https://www.geneious.com/) that contains all of the sequences to be submitted for a particular project.
  - **Annotations** should be limited. GenBank only requires annotations for `CDS`. The `gene` annotation is required only if the sequence contains multiple genes, but is recommended for all submissions. If they are known, `exon` annotations should also be included.
    - **`CDS` annotations** must have two properties: `transl_table` with value 1 (for the standard, or universal, genetic code), and `codon_start` with value 1 if the annotation begins at the start codon. Otherwise, set `codon_start` to 1, 2, or 3 to give the reading frame, that is, the position of the first base of the first complete codon.
    - **`exon` annotations** must include the property `number`, with a numerical value indicating the exon's number.
- Export all the sequences from Geneious as a single FASTA file, with the sequence names in the style `name [organism=Genus species] longer definition`. For example, the definition might be `Oncopeltus fasciatus doublesex homolog, transcript A`.
- Export each sequence individually as a separate GenBank format file.
- Use [GB2sequin](https://chlorobox.mpimp-golm.mpg.de/GenBank2Sequin.html) to generate a [feature table](https://www.ncbi.nlm.nih.gov/genbank/feature_table/) (`.tbl` file) from each GenBank file.
- Use the following bash one-liner to concatenate all the `.tbl` files into one, with a blank line between files. If you rerun it, delete the old `features.tbl` first, since `*.tbl` would otherwise match the output file too.

  ```bash
  awk 'NR>1 && FNR==1{print ""};1' *.tbl > features.tbl
  ```

- From here you can use NCBI's [BankIt web portal](https://www.ncbi.nlm.nih.gov/WebSub/?form=auth&tool=genbank) for sequence submission. Upload the FASTA file as the starting point and provide the metadata. Then upload `features.tbl` to provide the annotations for all the sequences.

## Notes

- Geneious has a [tutorial](https://www.geneious.com/tutorials/genbank-submission/) for submission to GenBank, but I've found their interface doesn't work well.

---
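
Before uploading to BankIt, it can help to confirm that every header in the exported FASTA file follows the `name [organism=Genus species] longer definition` style described above. The following is a minimal sketch, not part of the original workflow: the function name `check_fasta_headers`, the regular expression, and the sequence names in the demo file are all hypothetical, and the pattern only checks the general shape of the header, not whether the organism name is valid.

```bash
# Hypothetical helper: print FASTA header lines that do not look like
#   >name [organism=Genus species] longer definition
# Succeeds (exit status 0) only when no such header is found.
check_fasta_headers() {
  # First grep keeps header lines; second grep keeps headers that do NOT match.
  # The leading "!" inverts the status, so "nothing printed" means success.
  ! grep '^>' "$1" | grep -Ev '^>[^[:space:]]+ \[organism=[^]]+\] .+'
}

# Demo on a throwaway file (sequence names and bases are made up).
demo=$(mktemp)
cat > "$demo" <<'EOF'
>Ofas_dsx_A [organism=Oncopeltus fasciatus] Oncopeltus fasciatus doublesex homolog, transcript A
ATGGCT
>Ofas_dsx_B Oncopeltus fasciatus doublesex homolog, transcript B
ATGGCA
EOF

# The second header lacks the [organism=...] modifier, so it is listed.
check_fasta_headers "$demo" || echo "Fix the headers listed above before uploading."
```

To use it on a real export, call `check_fasta_headers` with the path to the FASTA file from Geneious; no output means every header matched the assumed pattern.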