Knowledge Article · NLM Customer Support Center

Article: KA-03451

What are genome assembly accession numbers at NCBI?

To provide a single entry point for complex genomic data, NCBI creates genome assemblies. In this context*, a genome assembly is the collection of all the sequences comprising a genome. Each assembly carries a unique name and a unique genome assembly accession number, or simply assembly accession.

NCBI creates genome assemblies for the primary GenBank (INSDC)** genomic data. NCBI also selects some of the primary assemblies and creates RefSeq*** counterparts for them. With its RefSeq collection, NCBI aims for high-quality, well-annotated assemblies from all domains of life. RefSeq assemblies receive separate RefSeq assembly accessions, while assembly names are shared between GenBank and RefSeq.

Accessions for the two assembly types have the following generic formats:

GenBank (primary):

[GCA] [ _ ] [nine digits] [.] [version number]

RefSeq (NCBI-derived):

[GCF] [ _ ] [nine digits] [.] [version number]
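
The two formats above can be checked mechanically. The following is a minimal Python sketch, not an NCBI tool: the function name `parse_assembly_accession` and the keys of the returned dictionary are illustrative assumptions. It also assumes that the version number consists of one or more digits, which the generic format does not spell out.

```python
import re

# Pattern built from the generic format: prefix, underscore,
# nine digits, a period, and a version number (assumed to be digits).
ASSEMBLY_ACCESSION = re.compile(r"(GCA|GCF)_(\d{9})\.(\d+)")


def parse_assembly_accession(accession):
    """Split an assembly accession into its parts, or return None."""
    match = ASSEMBLY_ACCESSION.fullmatch(accession.strip())
    if match is None:
        return None
    prefix, digits, version = match.groups()
    return {
        "source": "GenBank" if prefix == "GCA" else "RefSeq",
        "digits": digits,
        "version": int(version),
    }
```

With this sketch, `parse_assembly_accession("GCF_028878055.3")` reports a RefSeq accession with version 3, while a string that lacks the version number returns `None`.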

Where can you find genome assembly records and their accession numbers?

The NCBI Datasets service provides access to all genome assembly records. Use the Genome tab to search for assemblies of interest. It supports several types of searching and filtering, including searching by assembly accession. On the Datasets website, a single record represents both the GenBank assembly and its RefSeq counterpart, if one has been created. For example, when looking at the siamang reference**** assembly named NHGRI_mSymSyn1-v2.1_pri, you will find the assembly accession numbers below the name: GCA_028878055.3 for GenBank and GCF_028878055.3 for RefSeq.
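
One way to picture such a combined record in your own scripts is as a small data structure that holds the shared name and both accessions. This is only an illustrative sketch: the class `AssemblyRecord` and its field names are hypothetical and do not reflect the actual Datasets data model. The values come from the siamang example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AssemblyRecord:
    """Hypothetical model: one shared name plus its accessions."""
    name: str
    genbank_accession: str
    # None when no RefSeq counterpart has been created.
    refseq_accession: Optional[str] = None

    def accessions(self):
        """List the accessions present on this record."""
        found = [self.genbank_accession]
        if self.refseq_accession is not None:
            found.append(self.refseq_accession)
        return found


siamang = AssemblyRecord(
    name="NHGRI_mSymSyn1-v2.1_pri",
    genbank_accession="GCA_028878055.3",
    refseq_accession="GCF_028878055.3",
)
```

Here `siamang.accessions()` lists the GenBank accession first, followed by the RefSeq accession.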

What can you do with assembly accessions?

You can use an assembly accession to refer to all data associated with a genome: its sequences, annotation, and associated metadata. Make sure that you include the version number when using assembly accessions in your communications.
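
If you prepare text that cites assembly accessions, a short script can flag any that are missing the version number. The sketch below is an assumption-laden illustration, not an NCBI utility: the function name `find_unversioned` is hypothetical, and the pattern assumes that the version is written as digits after a period.

```python
import re

# Nine-digit accession followed by an optional ".version" suffix.
# The (?!\d) check skips strings with more than nine digits.
ACCESSION_IN_TEXT = re.compile(r"\b(GC[AF]_\d{9})(?!\d)(\.\d+)?")


def find_unversioned(text):
    """Return assembly accessions in `text` that lack a version number."""
    return [
        match.group(1)
        for match in ACCESSION_IN_TEXT.finditer(text)
        if match.group(2) is None
    ]
```

For the sentence "We used GCA_028878055.3 and GCF_028878055." the function flags only the second accession, because the trailing period there is not followed by a version number.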

Where can you learn more?

Knowledge articles:

What are accession numbers at NCBI?
* Genome assembly can also refer to the process in which researchers assemble genomic sequences from smaller components. See the What is a genome assembly? article.
**** Note that reference genome is not the same as RefSeq genome. See the What are reference genomes and how can you find these at NCBI? article.

NCBI Datasets documentation

GenBank (INSDC) and RefSeq:

** See the International Nucleotide Sequence Database Collaboration (INSDC) site to learn more about NCBI’s collaboration in exchanging nucleotide sequence data.
*** Visit the RefSeq home page to access information on various NCBI RefSeq projects.