How do I use Nucleotide BLAST (blastn) and CDS feature display to determine if a protein-coding sequence has poor quality or incorrect nucleotide substitutions?

A coding region (CDS) may contain poor-quality or incorrect nucleotide substitutions. Unlike nucleotide insertions or deletions, nucleotide substitutions do not cause frameshifts in a CDS. However, they can create internal stop codons in a CDS or alter consensus splice sites at intron-exon junctions. Such sequencing errors within a CDS can cause problems during GenBank submission.
You can detect internal stop codons and altered splice sites with Nucleotide BLAST (blastn) by using the CDS feature display on the BLAST search results page. See our article on blastn and CDS feature setup.
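
To illustrate why a substitution keeps the reading frame while a deletion does not, here is a minimal Python sketch. The 15-base sequence and the helper name `codons` are made up for this illustration; they are not taken from any GenBank record or from the BLAST display.

```python
def codons(seq):
    """Split a sequence into complete codons, reading from the first base."""
    usable = len(seq) - len(seq) % 3  # drop any trailing partial codon
    return [seq[i:i + 3] for i in range(0, usable, 3)]

# Hypothetical CDS fragment: ATG TAC CGA TTC TGA
original = "ATGTACCGATTCTGA"

# Substitution: replace the C at index 5 with A (TAC becomes TAA).
substituted = original[:5] + "A" + original[6:]

# Deletion: remove the same base instead of replacing it.
deleted = original[:5] + original[6:]

print(codons(original))     # ['ATG', 'TAC', 'CGA', 'TTC', 'TGA']
print(codons(substituted))  # ['ATG', 'TAA', 'CGA', 'TTC', 'TGA']
print(codons(deleted))      # ['ATG', 'TAC', 'GAT', 'TCT']
```

In the substituted sequence every downstream codon is unchanged, but the second codon is now the stop codon TAA. In the deleted sequence the codons after the deletion point are all shifted.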

To detect possible internal stop codons and consensus splice errors, follow these steps:

  • Perform a blastn search.
  • On the search result page, click the Alignments tab to view pairwise alignments.
  • Check the CDS feature box to display the CDS feature on the alignments.
  • To learn more about the display, see the article on interpreting pairwise alignments.
  • Select an alignment to view.
  • The Query CDS has an internal stop codon if you:
    • See an asterisk (*) in the Query translation above the Query nucleotide sequence.
    • See a mismatch between the Query and Subject nucleotide sequences at the asterisk.
    • Confirm that the codon is a true stop codon under the correct genetic code (translation table) for the Query.
    • Ignore discrepancies that arise only because the standard genetic code is used for all Query translations.
  • The Query has incorrect consensus splice sites if:
    • The intron does not begin with the bases GT at its 5' end.
    • The intron does not end with the bases AG at its 3' end.
    • You confirm that no alternative splice sites apply to the Query sequence.
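
The two checks in the steps above can also be sketched outside of BLAST. The Python below is only an illustration under stated assumptions: the function names `find_internal_stops` and `splice_site_problems` are hypothetical, the CDS is assumed to start in frame at its first base, and only two NCBI translation tables (1 and 2) are listed. It does not reproduce what BLAST or a GenBank submission tool does internally.

```python
# Stop codons for two NCBI translation tables; extend as needed.
STOP_CODONS = {
    1: {"TAA", "TAG", "TGA"},         # standard code
    2: {"TAA", "TAG", "AGA", "AGG"},  # vertebrate mitochondrial code
}

def find_internal_stops(cds, table=1):
    """Return 0-based offsets of in-frame stop codons, excluding the last codon.

    Assumes the CDS starts in frame at its first base.
    """
    cds = cds.upper()
    stops = STOP_CODONS[table]
    n_complete = len(cds) // 3
    # Skip the final complete codon, which is normally the real stop.
    return [i * 3 for i in range(n_complete - 1)
            if cds[i * 3:i * 3 + 3] in stops]

def splice_site_problems(intron):
    """Return a list of deviations from the GT...AG consensus."""
    intron = intron.upper()
    problems = []
    if intron[:2] != "GT":
        problems.append("5' end is " + intron[:2] + ", expected GT")
    if intron[-2:] != "AG":
        problems.append("3' end is " + intron[-2:] + ", expected AG")
    return problems

# Made-up sequences for illustration only.
print(find_internal_stops("ATGTAAGGATGA"))           # [3]
print(find_internal_stops("ATGTGAGGATAA", table=1))  # [3]
print(find_internal_stops("ATGTGAGGATAA", table=2))  # []
print(splice_site_problems("GTAAGTCTTTCAA"))         # ["3' end is AA, expected AG"]
```

The second and third calls show why the translation table matters: TGA is a stop codon in the standard code but not in the vertebrate mitochondrial code. A flagged intron still needs manual review, because this sketch checks only the GT...AG consensus and cannot tell whether an alternative splice site applies.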

Figure 1 illustrates a sequence with an altered splice site and an internal stop codon in the CDS.

Figure 1: An alignment of a 530 bp Query to the AY341426.1 sequence. The Query contains four bases that do not match the Subject. A mismatch (red box) changes the 3' splice site of an intron (indicated with tilde symbols, ~~~). The base change modifies the consensus AG sequence to AA. A GenBank submission tool would report this as a splice site error. Another possibly wrong base in the Query (blue box) changes the TAC codon for the amino acid residue tyrosine (Y) to a stop codon, TAA. The Query translation shows the internal stop codon in the sequence as an asterisk (*).
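
The kind of comparison shown in Figure 1 can be sketched in Python as follows. The short sequences are invented to mimic the TAC to TAA change; they are not the actual Query or AY341426.1 sequences. The function name `compare_codons` is hypothetical, and the sketch assumes a gap-free, in-frame alignment translated with the standard genetic code.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code only

def compare_codons(query, subject):
    """List codons that differ between two aligned, gap-free, in-frame sequences.

    Each entry is (codon_index, subject_codon, query_codon, creates_stop).
    """
    if len(query) != len(subject):
        raise ValueError("sequences must be the same length")
    query, subject = query.upper(), subject.upper()
    differences = []
    for i in range(0, len(query) - len(query) % 3, 3):
        q, s = query[i:i + 3], subject[i:i + 3]
        if q != s:
            creates_stop = q in STOP_CODONS and s not in STOP_CODONS
            differences.append((i // 3, s, q, creates_stop))
    return differences

# Invented exon fragments: the subject has TAC (Y), the query has TAA (stop).
subject_exon = "GGATACCTG"
query_exon = "GGATAACTG"
print(compare_codons(query_exon, subject_exon))  # [(1, 'TAC', 'TAA', True)]
```

A codon flagged with `True` corresponds to an asterisk in the Query translation that coincides with a Query/Subject mismatch, which is the pattern described in the steps above.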