
Submitting sequences to GenBank

Dave Angelini, 14 May 2021.

GenBank is a vast open-access collection of all publicly available nucleotide sequences, maintained by the National Center for Biotechnology Information (NCBI).

Every time I need to submit sequences to GenBank, typically before submitting a manuscript for publication, I struggle to remember how best to do it. There are many tools, and none of them ever seem to fit what I'd like to do. This tutorial is meant to serve as a record of one successful and relatively painless workflow.

  • Clean up the sequences. Create a folder in Geneious that contains all of the sequences to be submitted for a particular project.
  • Annotations should be limited. GenBank only requires annotations for CDS. A gene annotation is required only if the sequence contains multiple genes, but it is recommended for all submissions. If exon boundaries are known, exon annotations should also be included.
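  • CDS annotations must have two properties (GenBank calls them qualifiers): transl_table with value 1 (the standard genetic code), and codon_start with value 1 when the CDS begins with its start codon. If the CDS is incomplete at its 5' end, set codon_start to 1, 2, or 3 to mark the base where the first complete codon begins, i.e. the reading frame.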
  • Exon annotations must include one property: number, whose numerical value gives the exon's position (1 for the first exon, 2 for the second, and so on).
  • Export all the sequences from Geneious as a single FASTA file, with each sequence name in the style: name [organism=Genus species] longer definition. For example, the longer definition might be Oncopeltus fasciatus doublesex homolog, transcript A.
  • Export each sequence individually as a separate GenBank format file.
  • Use GB2sequin to generate a feature table (.tbl file) from each GenBank file.
  • Use the following bash one-liner, run in the folder that holds the .tbl files, to concatenate them into features.tbl with a blank line between consecutive files.
awk 'NR>1 && FNR==1{print ""};1' *.tbl > features.tbl
  • From here you can use NCBI's BankIt web portal for sequence submission. Upload the FASTA file as the starting point and provide the metadata. Then upload features.tbl to provide the annotations for all the sequences.
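Before uploading, it can help to confirm that every definition line in the exported FASTA file carries the [organism=...] modifier from the export step. The following is a minimal POSIX shell sketch under that assumption; check_fasta_headers is a hypothetical helper name, not part of Geneious, GB2sequin, or BankIt. It only checks for the presence of the modifier, not whether the species name is spelled correctly.

```shell
# Hypothetical helper: report FASTA definition lines (lines starting with ">")
# that lack an [organism=...] modifier. Returns 1 if any are missing.
check_fasta_headers() {
  awk '
    /^>/ { n++ }                                  # count sequences
    /^>/ && index($0, "[organism=") == 0 {        # header without modifier
      printf "%s:%d: missing [organism=...]: %s\n", FILENAME, FNR, $0
      bad++
    }
    END {
      printf "%d sequences, %d without [organism=...]\n", n, bad
      exit (bad > 0)
    }' "$@"
}
```

For example, `check_fasta_headers project.fasta` (the file name is only an illustration) prints one line per offending header followed by a summary, and its non-zero exit status makes it easy to use as a guard before moving on to BankIt.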
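The concatenation one-liner reads every *.tbl in the folder, which on a second run includes an existing features.tbl, so awk may read back part of its own output. A minimal POSIX shell sketch wraps the same awk program in a hypothetical helper, concat_tbl, that skips the output file and reports how many ">Feature" header lines ended up in the result. This assumes GB2sequin writes standard NCBI five-column feature tables, each opening with one ">Feature SeqID" line. Because POSIX sh has no local variables, the helper uses distinctively named globals (tbl_out, tbl_f).

```shell
# Hypothetical helper: merge every *.tbl in the current directory into one
# feature table, skipping the output file itself so a rerun never reads it.
concat_tbl() {
  tbl_out=${1:-features.tbl}                # output name, default features.tbl
  set --                                    # reuse "$@" as the list of inputs
  for tbl_f in *.tbl; do
    [ -e "$tbl_f" ] || continue             # glob matched nothing
    [ "$tbl_f" = "$tbl_out" ] && continue   # never read our own output
    set -- "$@" "$tbl_f"
  done
  if [ "$#" -eq 0 ]; then
    echo "concat_tbl: no .tbl files found" >&2
    return 1
  fi
  # Same awk program as the one-liner: a blank line before each file after
  # the first, then every input line printed unchanged.
  awk 'NR>1 && FNR==1{print ""};1' "$@" > "$tbl_out"
  # Each table is expected to open with one ">Feature SeqID" line.
  echo "$# tables merged into $tbl_out ($(grep -c '^>Feature' "$tbl_out") >Feature lines)"
}
```

Running `concat_tbl` in the folder of GB2sequin outputs should report matching counts of tables and ">Feature" lines; a mismatch suggests a file that is empty or not in feature table format.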

Notes

  • Geneious has a tutorial for submission to GenBank, but I've found its interface doesn't work well.