Renato Alves

@unode

Joined on Mar 23, 2020

  • The 2.3 billion sequences generated from predicted open reading frames were clustered into three categories, each based on a different approach and threshold. Unigene: a unigene, or unigene cluster, contains sequences that represent a unique gene. Unigenes are obtained by clustering the 2.3 billion nucleotide sequences at 95% identity, resulting in approximately 300 million unigenes. Protein cluster: a protein cluster is produced by translating the 300 million unigenes to amino acid sequences and then clustering them at 90% identity. Protein family: protein families capture distant homology relationships by clustering the translated 300 million unigenes at 20% identity, additionally requiring a minimum of 50% sequence overlap.
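
    The three thresholds above can be summarised in a small Python sketch. This is only an illustration, not the pipeline that produced the clusters: the names `ClusterLevel`, `LEVELS`, and `same_cluster` are hypothetical, identity and overlap are assumed to be supplied as fractions by some external alignment step, and a minimum overlap of 0.0 is used where the text states no overlap requirement.

    ```python
    from dataclasses import dataclass


    @dataclass(frozen=True)
    class ClusterLevel:
        """One clustering level and its thresholds (fractions in [0, 1])."""
        name: str
        alphabet: str        # "nucleotide" or "protein"
        min_identity: float  # minimum pairwise sequence identity
        min_overlap: float   # minimum sequence overlap; 0.0 = none stated


    # Thresholds taken from the description; the overlap value for the
    # first two levels is an assumption (the text gives none).
    LEVELS = {
        "unigene": ClusterLevel("unigene", "nucleotide", 0.95, 0.0),
        "protein_cluster": ClusterLevel("protein_cluster", "protein", 0.90, 0.0),
        "protein_family": ClusterLevel("protein_family", "protein", 0.20, 0.50),
    }


    def same_cluster(level_name: str, identity: float, overlap: float) -> bool:
        """Return True if a pair of sequences meets the level's thresholds.

        `identity` and `overlap` must be measured on the alphabet of the
        chosen level (nucleotide for unigenes, amino acid otherwise).
        """
        level = LEVELS[level_name]
        return identity >= level.min_identity and overlap >= level.min_overlap
    ```

    For example, a pair of translated sequences with 25% identity over 60% of their length would pass the `protein_family` thresholds but not those of `protein_cluster`.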