Dave Angelini, 14 May 2021.
GenBank is a vast open-access collection of all publicly available nucleotide sequences, maintained by the National Center for Biotechnology Information (NCBI).
Every time I need to submit sequences to GenBank, typically before submitting a manuscript for publication, I struggle to remember how best to do it. There are many tools, and none of them ever seem to fit what I'd like to do. This tutorial is meant to serve as a record of one successful and relatively painless workflow.
CDS. The gene annotation is required only if the sequence contains multiple genes, but is recommended for all submissions. If they are known, exon annotations should also be included.
CDS annotations must have two properties: transl_table with value 1 (for the universal genetic code), and codon_start with value 1 (assuming that the start codon is included; otherwise this value should be the reading frame).
exon annotations must include the property number with a numerical value indicating the exon's number.
name [organism=Genus species] longer definition, for example the definition might be Oncopeltus fasciatus doublesex homolog, transcript A.
.tbl files and add a blank line between each.
features.tbl to provide the annotations for all the sequences.
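As a rough illustration of the pieces described above, here is a minimal Python sketch that builds a definition line and a feature table entry, then joins per-sequence tables with a blank line between each. It assumes NCBI's five-column, tab-delimited feature table layout (a ">Feature" header line, then feature rows and indented property rows). The sequence name Ofas_dsx_A, the gene name dsx, and all coordinates are hypothetical placeholders, and the helper functions are my own, not part of any NCBI tool.

```python
def fasta_defline(name, organism, definition):
    """Build a definition line: >name [organism=Genus species] longer definition."""
    return f">{name} [organism={organism}] {definition}"


def feature_block(seq_id, features):
    """Render one sequence's annotations in the assumed five-column layout.

    features is a list of (start, stop, key, properties) tuples, where
    properties is a list of (name, value) pairs.
    """
    # The ID after ">Feature" should match the name used in the FASTA file.
    lines = [f">Feature {seq_id}"]
    for start, stop, key, properties in features:
        # Feature row: start, stop, and feature key in columns 1 to 3.
        lines.append(f"{start}\t{stop}\t{key}")
        for prop, value in properties:
            # Property row: three empty columns, then the name and value.
            lines.append(f"\t\t\t{prop}\t{value}")
    return "\n".join(lines)


def combine_tables(blocks):
    """Join per-sequence tables with a blank line between each."""
    return "\n\n".join(blocks) + "\n"


# Hypothetical example: a 900 nt transcript with two exons.
defline = fasta_defline(
    "Ofas_dsx_A",
    "Oncopeltus fasciatus",
    "Oncopeltus fasciatus doublesex homolog, transcript A",
)
table = feature_block(
    "Ofas_dsx_A",
    [
        (1, 900, "gene", [("gene", "dsx")]),  # gene name is a placeholder
        (1, 900, "CDS", [("transl_table", 1), ("codon_start", 1)]),
        (1, 300, "exon", [("number", 1)]),
        (301, 900, "exon", [("number", 2)]),
    ],
)

if __name__ == "__main__":
    print(defline)
    print(combine_tables([table]), end="")
```

The string returned by combine_tables could be written to features.tbl once a block has been built for every sequence. Check the output against NCBI's current submission requirements before relying on it, since this sketch only covers the properties mentioned in this tutorial.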