NCBI RefSeq staff create reference sequence records (RefSeqs) only from selected GenBank (INSDC) records, to obtain one or more RefSeq genomes (assemblies) for each viral species. You can access and download viral RefSeqs through the NCBI FTP site or the web. Choose your access/download path according to your goal:
1. Accessing viral data that are organized in individual RefSeq assemblies
Assembly records aggregate all segments of segmented viral genomes as a single genome assembly. On the web, search the Assembly database for all viral entries or for a smaller taxonomic group (example):
- On the search results page, select Latest RefSeq within the Status facet on the left side of the screen.
- Use additional facets/filters to narrow your search results to the set that you want. (Tip: A statement above the records indicates which filters are active and lets you Clear all before a new search or selection.)
- Use the blue Download Assemblies button at the top of the page and select the format of your choice.
- Note the estimated size of the data (uncompressed). The data will download as a tar archive.
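Once the download finishes, it can be useful to check what the archive contains before unpacking it. The following is a minimal Python sketch, not an NCBI tool: the function name `list_archive_members` is hypothetical, and the archive file name shown in the usage comment is only an assumption about what you saved the download as.

```python
import tarfile


def list_archive_members(archive_path):
    """Return (name, size in bytes) for every regular file in a tar archive.

    Mode "r:*" lets the tarfile module detect whether the archive is a
    plain tar or a compressed one, so the same call works for both.
    """
    with tarfile.open(archive_path, mode="r:*") as archive:
        return [(m.name, m.size) for m in archive.getmembers() if m.isfile()]


# Example usage (the file name is an assumption; use the name of your download):
# for name, size in list_archive_members("genome_assemblies.tar"):
#     print(f"{size:>12}  {name}")
```

Listing the members first lets you compare the total against the estimated size shown on the download page before extracting anything.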
Alternatively, use the /genomes/refseq/viral path on the NCBI FTP site and refer to the assembly_summary file, which lists various metadata that you can use to determine your set of assemblies to download. For additional help on downloading genome assembly data, see the Genome Download (FTP) FAQ.
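As an illustration of using the assembly_summary metadata to choose a download set, here is a minimal Python sketch. It rests on several assumptions that you should check against the file you actually download: that the file is tab-delimited, that its column names appear on the last comment line beginning with `#`, and that it contains columns named `version_status`, `assembly_level`, and `ftp_path`. The function names are hypothetical, and the `_genomic.fna.gz` suffix follows the commonly seen file-naming pattern in assembly directories rather than anything stated here.

```python
def parse_assembly_summary(text):
    """Parse assembly_summary text into a list of dicts keyed by column name."""
    header = None
    rows = []
    for line in text.splitlines():
        if line.startswith("#"):
            # Assumption: the last comment line before the data names the columns.
            if not rows:
                header = line.lstrip("#").strip().split("\t")
        elif line.strip():
            if header is None:
                raise ValueError("no header line found before the data")
            rows.append(dict(zip(header, line.split("\t"))))
    return rows


def select_latest_complete(rows):
    """Keep rows flagged as the latest version of a complete genome."""
    return [
        r for r in rows
        if r.get("version_status") == "latest"
        and r.get("assembly_level") == "Complete Genome"
    ]


def genomic_fasta_url(row):
    """Build the genomic FASTA URL from a row's ftp_path (assumed naming pattern)."""
    base = row["ftp_path"].rstrip("/")
    return f"{base}/{base.rsplit('/', 1)[-1]}_genomic.fna.gz"
```

In practice you would read the downloaded file into a string, pass it to `parse_assembly_summary`, filter the rows, and feed the resulting URLs to the download tool of your choice. Other columns can be used as filters in the same way.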
2. Accessing individual RefSeq genome records for viruses (not organized in individual assemblies)
NCBI creates an individual RefSeq sequence record for each viral segment. Use the links under the Explore Viral Genome Sequences section of the Viral Genomes page (part of the Genome resource) for convenient access to, and selection of, the data that you want:
- Select a browser, for example the Viral genome browser.
- If desired, narrow your selection by a taxonomy node:
  - Use the top part of the browser and click on the node that you want.
  - In the resulting menu, select Complete genomes to reload the page accordingly.
- To obtain RefSeq nucleotide sequence records:
- To obtain sequence records for proteins that are annotated on RefSeq genomes:
- Another option that you will see in the Retrieve sequences menu is Neighbor Nucleotides, which retrieves GenBank (INSDC) records for complete viral genomes.
The Viral Genomes resource page also provides a direct link (under the Download Viral Genome Data section) to the Complete RefSeq release of viral and viroid sequences:
- RefSeq collection releases occur every two months.
- There is no archive of previous releases.
- You can update the records between releases by parsing the daily files.