
How do I use Nucleotide BLAST (blastn) to determine the coding locations on a sequence from a prokaryotic genome?

Nucleotide BLAST (blastn) can help you find coding regions (CDS) on your sequence through the CDS feature display on the BLAST search results page. See the article on blastn and CDS feature setup.

This article deals with finding CDS locations on sequences from prokaryotic genomes. Follow these steps:

  • Perform a blastn search.
  • On the search results page, click the Alignments tab to view pairwise alignments.
  • Check the CDS feature box to display the CDS feature on the alignments.
  • Select an alignment to view.
  • Verify the following for the alignment:
    • Subject has an annotated coding region within the aligned region.
    • Query (your sequence) aligns to Subject across its entire length.
    • The alignment has no gaps.
  • To learn how to verify the above items, see the article on interpreting pairwise alignments.
  • Click the GenBank link in the Range row above the alignment. The link will display only the aligned region of the Subject record.
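  • Infer the CDS locations on Query from the FEATURES section of the displayed Subject record. Because the display is limited to the aligned region, the feature locations are adjusted to that region and, for a gapless alignment that covers the entire Query, correspond to positions on Query.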
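The coordinate adjustment in the last step is simple offset arithmetic when the alignment is gapless and both sequences are on the plus strand. The following is a minimal sketch under those assumptions, not part of BLAST itself: the function names are hypothetical, and the Query is assumed to start at position 1. The alignment range (45661 to 46103) comes from the example in Figure 1.

```python
# Sketch: convert between Subject and Query coordinates for a gapless,
# plus/plus alignment. Function names are illustrative, not a BLAST API.

def subject_to_query(subject_pos, subject_start, query_start=1):
    """Map a 1-based Subject position to the matching 1-based Query position."""
    return subject_pos - subject_start + query_start


def query_to_subject(query_pos, subject_start, query_start=1):
    """Inverse mapping: 1-based Query position to 1-based Subject position."""
    return query_pos - query_start + subject_start


# Alignment range from Figure 1: Subject 45661..46103, Query assumed 1..443.
SUBJECT_START = 45661

print(subject_to_query(45661, SUBJECT_START))  # first aligned base on Query
print(subject_to_query(46103, SUBJECT_START))  # last aligned base on Query
```

With these numbers the two calls print 1 and 443, matching the 443 bp Query. If the alignment contains gaps or Query aligns to the minus strand, this simple offset no longer holds.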

Any gaps in the alignment will shift the CDS locations on Query relative to Subject. Gaps within a CDS may also alter the reading frame. In addition, make sure you are reading the correct coding strand. See the article on determining the coding strand for more information.
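One rough way to screen an alignment for this problem is to count the gap characters in each aligned row. The sketch below assumes the two rows have been copied from the pairwise alignment as equal-length strings with "-" marking gaps; the function name and the toy sequences are hypothetical. It only reports the net length difference modulo 3, so a result of 0 does not guarantee that the frame is intact everywhere between compensating gaps.

```python
# Sketch: count gaps in a pairwise alignment and report the net length
# difference modulo 3. This is a coarse screen, not a full frame analysis.

def gap_report(query_aln, subject_aln):
    """Summarize gaps in two aligned rows of equal length ('-' marks a gap)."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned rows must have the same length")
    query_gaps = query_aln.count("-")      # bases missing from Query
    subject_gaps = subject_aln.count("-")  # extra bases present in Query
    return {
        "query_gaps": query_gaps,
        "subject_gaps": subject_gaps,
        "gapless": query_gaps == 0 and subject_gaps == 0,
        # Non-zero means Query and Subject lengths differ by a non-multiple of 3.
        "length_diff_mod3": (subject_gaps - query_gaps) % 3,
    }


# Hypothetical toy rows: Query lacks one base that Subject has.
print(gap_report("ATGAAA-CCTAA", "ATGAAAGCCTAA"))
```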

Figures 1 and 2 below illustrate an example of the method.

Figure 1: A pairwise alignment of a 443 bp Query against the CP007048 (Subject) sequence. Query aligns across its entire length and the alignment is gapless, so the aligned regions of the two sequences have a one-to-one correspondence. The translation shows a complete CDS. The CDS starts with the ATG codon, which translates to methionine (M, blue oval), and ends with the TAA stop codon (red oval). The translation of Query shows the stop codon as an asterisk (*). The GenBank link in the Range row above the alignment (Range 1: 45661 to 46103 GenBank) displays the aligned part of the CP007048.1 record (locations 45661 to 46103).

Figure 2: The FEATURES section of the CP007048.1 record adjusted to the locations from the aligned region in Figure 1. It shows the CDS locations on Query from bases 81 to 350 (yellow rectangle).
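The checks described for Figure 1 (start codon, stop codon, complete reading frame) can be repeated on your own Query once you have inferred the CDS locations. The sketch below is an illustration only: the function name and the toy sequence are hypothetical, it assumes the CDS lies on the plus strand of Query, and it accepts only ATG as a start codon by default even though prokaryotic genes can use alternative start codons. Note that the interval from Figure 2, bases 81 to 350, spans 270 bases, which is a whole number of codons.

```python
# Sketch: basic sanity checks for a CDS interval on a Query sequence.
# Assumes 1-based, inclusive coordinates on the plus strand.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(sequence, start, end, start_codons=("ATG",)):
    """Check length, start codon, and stop codon of sequence[start..end]."""
    cds = sequence[start - 1:end].upper()
    return {
        "length": len(cds),
        "multiple_of_three": len(cds) % 3 == 0,
        "starts_with_start_codon": cds[:3] in start_codons,
        "ends_with_stop_codon": cds[-3:] in STOP_CODONS,
    }


# Hypothetical toy Query: 2 flanking bases, a 9-base CDS, 2 flanking bases.
toy_query = "GG" + "ATGAAATAA" + "CC"
print(check_cds(toy_query, 3, 11))
```

For a real Query you would pass the full sequence and the locations inferred from the FEATURES section, for example `check_cds(query_sequence, 81, 350)` for the case in Figure 2.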