How do I interpret Nucleotide BLAST (blastn) pairwise alignments with the CDS feature display?

You can view Nucleotide BLAST (blastn) search results as pairwise alignments. You can also opt to display the CDS feature on the alignment. See the article on blastn and CDS feature setup. In this article, we describe pairwise alignments with the CDS feature display. A pairwise alignment can help you determine properties of a sequence or spot problems in it, such as mismatches, gaps, or an unexpected stop codon.

Alignments on blastn search results pages (see an example in Figure 1) consist of two rows of nucleotide sequence. The top row represents your search sequence (Query). The bottom row represents a database sequence, called Subject (Sbjct). Vertical lines (|) connect the matching bases between Query and Subject. A missing line indicates a mismatch or a gap. Dashes (-) indicate gaps in the alignment. Gaps represent parts where Query or Subject has no counterpart in the other sequence.
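The relationship between the two sequence rows and the connecting lines can be illustrated with a short Python sketch. It is not BLAST's own code; the function names `match_line` and `summarize` and the two toy rows are hypothetical, chosen only to show how matches, mismatches and gaps can be read off an alignment.

```python
def match_line(query_row: str, subject_row: str) -> str:
    """Build the middle row: '|' where bases match, a space otherwise."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    return "".join(
        "|" if q.upper() == s.upper() and q != "-" else " "
        for q, s in zip(query_row, subject_row)
    )


def summarize(query_row: str, subject_row: str) -> tuple[int, int, int]:
    """Return (identities, mismatches, gaps) for two aligned rows."""
    identities = match_line(query_row, subject_row).count("|")
    gaps = query_row.count("-") + subject_row.count("-")
    mismatches = len(query_row) - identities - gaps
    return identities, mismatches, gaps


# Toy rows (not taken from Figure 1): one mismatch and one gap in Subject.
query_row = "ACGTTGCA"
subject_row = "ACCT-GCA"
print("Query ", query_row)
print("      ", match_line(query_row, subject_row))
print("Sbjct ", subject_row)
print(summarize(query_row, subject_row))
```

The sketch assumes that no alignment column contains a dash in both rows, which is how pairwise alignments are normally displayed.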

If you select the CDS feature option, you will see two more rows of sequence. BLAST translates the CDS annotated on the Subject into a protein. You will see the protein sequence below the Subject nucleotide sequence as a row of letters. These letters are single-letter amino acid (AA) codes. Each code sits in the middle of its nucleotide codon (coding triplet). BLAST finds the aligned region between the Subject CDS and Query. It translates the Query based on the CDS alignment. It shows the protein translation above the Query (top-most row of letters).
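To make the layout of the translation row concrete, here is a minimal Python sketch that places each amino acid letter under the middle base of its codon. It assumes an ungapped coding sequence that starts in frame and uses the standard genetic code; the function name `translation_row` is hypothetical, and the real BLAST display additionally has to follow gaps in the alignment.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translation_row(cds: str) -> str:
    """Return a row as long as `cds` with one letter per complete codon.

    The letter is written at the middle position of the codon; unknown
    codons (for example, those containing N) are shown as 'X'.
    """
    row = [" "] * len(cds)
    for start in range(0, len(cds) - 2, 3):
        codon = cds[start:start + 3].upper()
        row[start + 1] = STANDARD_CODE.get(codon, "X")
    return "".join(row)


cds = "GTGAAATAA"  # toy sequence, not taken from Figure 1
print(cds)
print(translation_row(cds))
```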

BLAST uses the genetic code (translation table) from the Subject record, but it applies this code only to the Subject translation. For any Query translation, it always uses the standard genetic code, which may not be appropriate for your Query.
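The effect of using different genetic codes for Subject and Query shows up most clearly at an alternative start codon. The Python sketch below is an illustration only: the `translate` function is hypothetical, and the start codon set is an assumption based on the NCBI description of translation table 11 (Bacterial, Archaeal and Plant Plastid), whose amino acid assignments are otherwise the same as in the standard code.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: start codons listed for translation table 11.
TABLE11_STARTS = frozenset({"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"})


def translate(cds: str, start_codons: frozenset = frozenset()) -> str:
    """Translate complete codons; show the first codon as M if it is a start."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3].upper()
        aa = STANDARD_CODE.get(codon, "X")
        if i == 0 and codon in start_codons:
            aa = "M"  # an initiation codon is read as methionine
        protein.append(aa)
    return "".join(protein)


cds = "GTGAAATAA"  # toy sequence beginning with GTG
print("Subject-style (table 11 starts):", translate(cds, TABLE11_STARTS))
print("Query-style (standard code):    ", translate(cds))
```

With the same bases, the Subject-style translation begins with M while the Query-style translation begins with V, which is the difference described for the GTG codon in Figure 1.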

Note the numbering for each sequence row. You can use the numbers to count base or amino acid positions.

Query and Subject can represent the same strand of the double-stranded DNA. In that case, you see the Plus/Plus Strand statement above the alignment. If the strands align in opposite directions, BLAST shows the Query sequence as the plus strand and the Subject as the minus strand. You then see the Plus/Minus Strand statement.
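A Plus/Minus alignment means that the Query matches the reverse complement of the Subject strand stored in the database. The following Python sketch shows the reverse complement operation and one way to infer the strand statement from Subject coordinates. The function names are hypothetical, and `strand_label` assumes, as in standard BLAST reports, that Subject coordinates run in descending order for a Plus/Minus alignment.

```python
# Complement map for unambiguous bases; other characters pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def strand_label(sbjct_start: int, sbjct_end: int) -> str:
    """Guess the strand statement from the Subject start and end positions."""
    return "Plus/Plus" if sbjct_start <= sbjct_end else "Plus/Minus"


print(reverse_complement("ATGC"))
print(strand_label(22081, 22559))  # Subject coordinates from Figure 1
```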

Figure 1: A pairwise alignment with the CDS feature display of a 483-bp Query against the KT780704.1 (Subject) sequence, which has a total length of 64127 bp. The Query alignment starts with its first base (T), which matches Subject location 22081; the end of the Query alignment matches Subject location 22559 (yellow rectangles). The CDS starts at Query location 69 (blue oval); the GTG (GUG) codon is an alternative protein initiation codon (M) in the Bacterial, Archaeal and Plant Plastid genetic code used for Subject. BLAST applied the standard genetic code to Query, translating GTG into valine (V). An asterisk (*) marks the stop codon (TAA) at Query location 472 (red oval). Subject and Query translate into a protein of 118 amino acid residues (blue rectangles mark protein start and end locations). Query and Subject have two mismatched bases (orange ovals). Query has four extra bases that introduce a gap in Subject (purple oval). In total, six Query bases have no identity with Subject. The rest (477 out of 483) are identical (see the Identities and Gaps statements above the alignment). The Plus/Plus Strand statement indicates that Query and Subject represent the same strand of the double-stranded DNA.
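The alignment numbers quoted in the Figure 1 caption can be cross-checked with simple arithmetic. The Python sketch below uses only values stated in the caption; the variable names are illustrative, and the percentage is computed here rather than copied from the BLAST report.

```python
# Values stated in the Figure 1 caption.
query_length = 483
subject_start, subject_end = 22081, 22559
mismatches = 2
gap_columns = 4  # extra Query bases aligned opposite a gap in Subject

# Number of Subject bases covered by the alignment (both ends inclusive).
subject_span = subject_end - subject_start + 1

# Each gap in Subject adds one alignment column beyond the Subject span.
alignment_columns = subject_span + gap_columns

# Columns that are neither mismatches nor gaps are identities.
identities = alignment_columns - mismatches - gap_columns
percent_identity = 100 * identities / alignment_columns

print(f"Subject span:      {subject_span} bases")
print(f"Alignment columns: {alignment_columns}")
print(f"Identities:        {identities}/{alignment_columns} "
      f"({percent_identity:.2f}%)")
```

The Subject span of 479 bases plus the four gap columns gives 483 alignment columns, which equals the Query length, and removing the six non-identical columns leaves the 477 identities mentioned in the caption.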

Keywords: Nucleotide BLAST, blastn, nucleotide sequence analysis, coding region analysis, CDS analysis, pairwise alignments, CDS feature, GenBank submissions, BankIt feature annotation