After you have accessed the set of records in the Nucleotide or the Protein database that you want to download, use the Send to link. The link is located on the right side of the screen above the records and displays a menu with several options. In the Nucleotide database, the menu provides three record-downloading paths. This approach works best for sets of up to about 1,000 sequence records; see below for better options for downloading larger sets.
Path 1: Downloading complete records
- Select Complete Record -> File (as your Destination) -> Format
- From the Format pull-down menu, select one of the twelve available display formats (including GenBank, FASTA, several XML variants, and GFF3).
- Use the Sort by menu to specify how the records will be sorted in your download.
- Click the Create File button and choose a location on your local computer to save the file.
Path 2: Downloading coding sequences
Use this path when sequences (records) are annotated with both coding regions (the CDS feature) and non-coding regions, but you want only the coding regions. Examples include records for large genomic sequences that contain more than one CDS feature, and mRNA records that include the untranslated regions (5' UTR and 3' UTR) in addition to the coding region.
- Select Coding Sequences -> Format
- From the Format pull-down menu, select one of the two formats available for this path: FASTA Nucleotide or FASTA Protein.
- Click the Create File button and choose a location on your local computer to save the file.
Path 3: Downloading gene features
Use this path for large genomic sequences (records) that are annotated with several gene features when you want to exclude the sequence of intergenic regions.
- Select Gene Features -> Format
- The Format pull-down menu offers the only format available for this path: FASTA Nucleotide.
- Click the Create File button and choose a location on your local computer to save the file.
The Protein database offers a single path, with steps similar to Path 1 in the Nucleotide database.
Use Batch Entrez for larger sets (up to ~10,000 records):
- If you experience a server time-out when trying to download your set, use Path 1 and choose Accession List as the download format. This format produces the smallest possible file for a given set.
- Split the list into smaller batch files. You will need to determine a workable batch size empirically.
- Proceed to the Batch Entrez tool and follow the instructions on the page to display the records from a batch on the web.
- Once the records are retrieved, use Send to -> File and choose the path and format you ultimately need.
- Repeat the process for each remaining batch.
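The splitting step in the Batch Entrez procedure is easy to script. The following is a minimal Python sketch, assuming the Accession List file contains one accession per line. The function names, the batch_NNN.txt naming scheme, and the file names in the usage note are hypothetical choices for illustration; they are not part of Batch Entrez or any other NCBI tool.

```python
from pathlib import Path


def split_accession_list(accessions, batch_size):
    """Split a list of accessions into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [accessions[i:i + batch_size] for i in range(0, len(accessions), batch_size)]


def write_batches(list_path, out_dir, batch_size):
    """Read an accession list and write it out as batch_001.txt, batch_002.txt, ...

    Assumption: the input file holds one accession per line; blank lines are skipped.
    Returns the paths of the batch files that were written, in order.
    """
    lines = Path(list_path).read_text().splitlines()
    accessions = [line.strip() for line in lines if line.strip()]

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    paths = []
    for n, batch in enumerate(split_accession_list(accessions, batch_size), start=1):
        # Hypothetical naming scheme: zero-padded batch number keeps files sorted.
        batch_path = out / f"batch_{n:03d}.txt"
        batch_path.write_text("\n".join(batch) + "\n")
        paths.append(batch_path)
    return paths
```

For example, `write_batches("accessions.txt", "batches", 500)` would write batches/batch_001.txt, batches/batch_002.txt, and so on, each holding up to 500 accessions that can be submitted to Batch Entrez one file at a time. Because a workable batch size has to be found by trial, it is left as a parameter rather than fixed in the code.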
For large data downloads, consider these alternatives to the sequence downloads from the Nucleotide and Protein databases: