Article: KA-03568

How are genome assemblies generated and what are assembly levels?

A major strategy for sequencing prokaryotic and eukaryotic genomes is Whole Genome Shotgun (WGS) sequencing. The WGS approach involves (1) isolating genomic DNA from a biological sample and (2) fragmenting the DNA into small pieces that are then sequenced individually. Once the sequences of the small pieces (called reads) are obtained, (3) researchers assemble them, like tiny pieces of a giant puzzle, into progressively larger contiguous sequences (called contigs). The next step (4) is to build scaffolds (supercontigs). To build a scaffold, researchers place several contigs in the correct order and orientation. To make the scaffold a single sequence unit (a single sequence record), they represent the sequencing gaps between its contigs with runs of Ns in place of the unknown bases (A, T, G, or C). The final step (5) is to order, orient, and assemble the scaffolds that belong to the same chromosome into the chromosome sequence. Again, researchers represent any remaining sequencing gaps in an assembled chromosome with runs of Ns.
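Step (4) above can be illustrated with a minimal Python sketch. It assumes the contigs have already been assembled and that their order and orientation are known. The function names (`reverse_complement`, `build_scaffold`, `split_into_contigs`), the toy sequences, and the fixed gap length are hypothetical choices made for illustration; real assemblers and submission formats handle gaps in more detail.

```python
import re

# Translation table for complementing DNA bases; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_scaffold(placements, gap_length=5):
    """Join contigs into one scaffold sequence.

    placements: list of (contig_sequence, strand) tuples, already in
    scaffold order; strand is '+' (as given) or '-' (reverse complement).
    gap_length: number of Ns placed between adjacent contigs
    (an arbitrary value chosen for this example).
    """
    oriented = []
    for seq, strand in placements:
        if strand == "+":
            oriented.append(seq)
        elif strand == "-":
            oriented.append(reverse_complement(seq))
        else:
            raise ValueError(f"unknown strand: {strand!r}")
    return ("N" * gap_length).join(oriented)


def split_into_contigs(scaffold: str):
    """Recover the contig sequences by splitting a scaffold at runs of Ns."""
    return [piece for piece in re.split(r"N+", scaffold) if piece]


# Toy contigs, not real data.
scaffold = build_scaffold([("ATGGCC", "+"), ("TTTACG", "-")], gap_length=5)
print(scaffold)                      # ATGGCCNNNNNCGTAAA
print(split_into_contigs(scaffold))  # ['ATGGCC', 'CGTAAA']
```

The same idea applies to step (5): ordered and oriented scaffolds are joined into a chromosome sequence, with runs of Ns marking the gaps that remain.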
 

What are assembly levels?

Assembly levels indicate how completely researchers have assembled a genome. There are four levels: (1) Complete genome, (2) Chromosome, (3) Scaffold, and (4) Contig. See the definitions for each level in:

NCBI Datasets Glossary

Genomes at the contig level are the most fragmented. Note that researchers may halt their sequencing and assembly efforts once they have gathered the information they need. Therefore, many assemblies remain indefinitely as collections of contigs and/or scaffolds, regardless of the complexity of the organism's genome.

Until recent years, the complete assembly level was unattainable for complex eukaryotic genomes such as the human genome. With new long-read sequencing technology, researchers have overcome the technical difficulties. In 2022, researchers obtained the first complete human genome model, the T2T-CHM13v2.0 assembly.
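When working with assembly metadata, it can help to treat the four levels as an ordered scale from most fragmented to most complete. The sketch below is one possible way to do that in Python. The level labels come from the list above; the `AssemblyLevel` enum, its numeric values, the helper names, and the two `example_asm_*` records are hypothetical and are not part of any NCBI tool.

```python
from enum import IntEnum


class AssemblyLevel(IntEnum):
    """Assembly levels ordered from most fragmented to most complete.

    The numeric values are an assumption used only for comparison.
    """
    CONTIG = 1
    SCAFFOLD = 2
    CHROMOSOME = 3
    COMPLETE_GENOME = 4


# Map the text labels used in this article to enum members.
_LABELS = {
    "contig": AssemblyLevel.CONTIG,
    "scaffold": AssemblyLevel.SCAFFOLD,
    "chromosome": AssemblyLevel.CHROMOSOME,
    "complete genome": AssemblyLevel.COMPLETE_GENOME,
}


def parse_level(label: str) -> AssemblyLevel:
    """Convert a label such as 'Complete genome' to an AssemblyLevel."""
    return _LABELS[label.strip().lower()]


def at_least(assemblies, minimum: AssemblyLevel):
    """Return names of assemblies whose level is at or above `minimum`."""
    return [name for name, label in assemblies if parse_level(label) >= minimum]


# Illustrative records; the first two names are made up.
assemblies = [
    ("example_asm_A", "Contig"),
    ("example_asm_B", "Scaffold"),
    ("T2T-CHM13v2.0", "Complete genome"),
]
print(at_least(assemblies, AssemblyLevel.CHROMOSOME))  # ['T2T-CHM13v2.0']
```

Using an ordered enum keeps comparisons such as "chromosome level or better" explicit instead of relying on string matching.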

 

Where can you learn more?

Knowledge articles:

Blogs on human genomes, including:

NCBI Datasets documentation
The NCBI Genome Assembly Data Model