Knowledge Article · NLM Customer Support Center

Article: KA-03570

Who generates genome assemblies that are present in the NCBI databases?

NCBI receives genome sequencing and assembly data from individual researchers as well as from large sequencing centers. Complex eukaryotic genomes are often sequenced and assembled by consortia of collaborating groups from around the world. NCBI itself does not perform nucleotide sequencing, nor does it generate genome assemblies from data found in the NCBI databases. However, some NCBI staff actively participate in the Genome Reference Consortium (GRC), which is responsible for maintaining and improving the human, mouse, zebrafish, and chicken reference assemblies.

To make their assemblies publicly available, researchers submit them to NCBI (via GenBank) or to another member of the International Nucleotide Sequence Database Collaboration (INSDC). The GRC also submits its assembly data to INSDC. Assemblies at all levels* are accepted: contig, scaffold, chromosome, or complete. Currently, most researchers who sequence entire genomes submit them as Whole Genome Shotgun (WGS) submissions. NCBI also accepts raw (unassembled) sequencing reads*, which submitters deposit separately in the Sequence Read Archive (SRA).
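
For readers who handle assembly records in scripts, the four assembly levels named above can be modeled as a small ordered type. This is only an illustrative sketch: the names `AssemblyLevel`, `parse_level`, and `is_acceptable` are hypothetical and not part of any NCBI tool or API, the ordering simply follows the order in which the levels are listed above, and the labels are the ones used in this article rather than the exact strings found in NCBI records.

```python
from enum import IntEnum


class AssemblyLevel(IntEnum):
    """Assembly levels named in this article, in the order listed there.

    Assumption: the numeric order mirrors the article's listing order;
    it is not taken from an official NCBI definition.
    """
    CONTIG = 1
    SCAFFOLD = 2
    CHROMOSOME = 3
    COMPLETE = 4


def parse_level(label: str) -> AssemblyLevel:
    """Map a free-text label such as 'Scaffold' to an AssemblyLevel.

    Matching ignores case and surrounding whitespace; any other label
    raises ValueError.
    """
    try:
        return AssemblyLevel[label.strip().upper()]
    except KeyError:
        raise ValueError(f"unknown assembly level: {label!r}") from None


def is_acceptable(label: str) -> bool:
    """Return True for any of the four listed levels, False otherwise."""
    try:
        parse_level(label)
    except ValueError:
        return False
    return True
```

A call such as `parse_level("Scaffold")` returns `AssemblyLevel.SCAFFOLD`, and `is_acceptable` returns True for each of the four levels, reflecting the statement that all levels are accepted.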

Submitters may or may not annotate their assemblies (that is, indicate the locations of genes and other features on the DNA) before submitting them to GenBank. Submitted assemblies are subject to processing by NCBI.

*See the article on assembly processes and assembly levels.