NCBI Reference Sequence accession numbers (or RefSeq accessions) uniquely identify sequence records that NCBI derives from selected GenBank records. Because GenBank is a highly redundant database, NCBI creates RefSeqs to provide a less redundant representation of the naturally occurring nucleic acid and protein molecules. RefSeqs also allow annotation updates and other maintenance to proceed independently of the primary data.
The generic format of a RefSeq accession is as follows:
[two-letter alphabetical prefix][ _ ][series of digits or alphanumeric characters][.][version number]
Some examples of Nucleotide RefSeq accession numbers are NM_001744.6, NC_003619.1, NG_009904.1, NR_135858.1, and NZ_CASIGT010000001.1, while NP_001735.1 and WP_228380365.1 represent RefSeq records in the Protein database.
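As a rough illustration, the generic format above can be checked with a regular expression. The following Python sketch is not an official NCBI grammar: the pattern, the parse_refseq function, and the RefSeqAccession type are assumptions made for this example, and the pattern simply mirrors the format as described here (two uppercase letters, an underscore, an alphanumeric body, and a version number, which is treated as optional so that unversioned strings can still be parsed).

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, modeled on the generic format described in this article:
# [two-letter prefix][_][digits or alphanumeric characters][.][version]
# The version part is optional here so unversioned strings still parse.
REFSEQ_PATTERN = re.compile(r"^([A-Z]{2})_([A-Z0-9]+)(?:\.(\d+))?$")


class RefSeqAccession(NamedTuple):
    """Hypothetical container for the parts of a RefSeq accession."""
    prefix: str
    body: str
    version: Optional[int]


def parse_refseq(text: str) -> Optional[RefSeqAccession]:
    """Return the parts of a RefSeq-style accession, or None if no match."""
    match = REFSEQ_PATTERN.match(text.strip())
    if match is None:
        return None
    prefix, body, version = match.groups()
    return RefSeqAccession(prefix, body, int(version) if version else None)


# The first three strings are examples from this article; the last one has
# no underscore after a two-letter prefix, so it is not matched.
for example in ["NM_001744.6", "NZ_CASIGT010000001.1", "WP_228380365.1", "U49845.1"]:
    print(example, "->", parse_refseq(example))
```

A string that does not match returns None, which makes it easy to separate RefSeq-style accessions from other identifiers before looking anything up.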
How can the accession format help you recognize RefSeq data?
You can quickly recognize a RefSeq accession by the underscore ( _ ) placed between the alphabetical prefix and the remaining alphanumeric characters. To ensure accuracy, you should keep the underscore and the version number when you communicate about a RefSeq record.
RefSeq alphabetical prefixes encode two types of information: (1) the molecule type and (2) the curation status of the record. For example, the “NC_” prefix represents chromosome records, while the “XM_” prefix represents predicted messenger RNA (mRNA). For more details see:
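To illustrate how a prefix can be mapped to a record type in practice, here is a minimal Python sketch. The PREFIX_MEANINGS table and the describe_prefix function are hypothetical names, and the table deliberately contains only the two prefixes described in this article; a real lookup would need to be filled in from the official prefix documentation.

```python
from typing import Optional

# Illustrative table only: it lists just the two prefixes described in this
# article. Any other prefix is reported as unknown rather than guessed.
PREFIX_MEANINGS = {
    "NC_": "chromosome record",
    "XM_": "predicted messenger RNA (mRNA)",
}


def describe_prefix(accession: str) -> Optional[str]:
    """Return the meaning of the accession's three-character prefix, if listed."""
    prefix = accession.strip()[:3].upper()
    return PREFIX_MEANINGS.get(prefix)


# NC_003619.1 is one of the example accessions from this article.
print(describe_prefix("NC_003619.1"))
# NM_ is not in the illustrative table, so this prints None.
print(describe_prefix("NM_001744.6"))
```

Returning None for unlisted prefixes keeps the sketch from asserting a meaning it does not have on record.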
Where can you learn more?
Knowledge articles:
GenBank (INSDC) and RefSeq: