What was the reason for the removal of a prokaryotic RefSeq protein record and how do I find its replacement at NCBI?

NCBI RefSeq staff may remove a RefSeq sequence record for a variety of reasons.

Most prokaryotic RefSeq genomes are annotated through the NCBI Prokaryotic Genome Annotation Pipeline, which is an automated process. Protein annotations may change over time as new sequence data become available or the algorithms are adjusted. Any removed protein record is still accessible on the web in the Protein database. Search the database with the protein accession, for example, WP_011184929.1. The search result shows a message indicating that the record has been removed by RefSeq staff. Click on the accession number to access the record itself. Even though the RefSeq record has been removed, the identical protein records based on several INSDC genomic sequences are still live in the database and are accessible through the Identical Proteins link, located under the title of the record.
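For readers who prefer to script the lookup, the following is a minimal sketch that builds an E-utilities EFetch request URL for a protein accession. The accession pattern (WP_ followed by nine digits and an optional version) and the helper names `is_wp_accession` and `efetch_fasta_url` are assumptions made for illustration. This article describes only the web interface, so whether EFetch returns the sequence of a removed record is also an assumption that should be verified.

```python
import re
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities EFetch service.
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Assumed accession shape: "WP_" + nine digits + optional ".version".
WP_PATTERN = re.compile(r"^WP_\d{9}(\.\d+)?$")


def is_wp_accession(accession: str) -> bool:
    """Return True if the string looks like a WP_ protein accession."""
    return WP_PATTERN.match(accession) is not None


def efetch_fasta_url(accession: str) -> str:
    """Build an EFetch URL requesting the protein record in FASTA format."""
    if not is_wp_accession(accession):
        raise ValueError(f"not a WP_ accession: {accession!r}")
    params = {
        "db": "protein",      # Protein database
        "id": accession,      # accession.version to retrieve
        "rettype": "fasta",   # FASTA sequence
        "retmode": "text",    # plain-text response
    }
    return f"{EUTILS_EFETCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Only prints the URL; no network request is made here.
    print(efetch_fasta_url("WP_011184929.1"))
```

The sketch deliberately stops at constructing the URL, so it can be checked without network access; fetching it is left to the reader's HTTP client of choice.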

To determine whether the protein has been replaced by a new RefSeq protein model (a new WP_ protein), use protein BLAST (blastp). In this example, enter the WP_011184929.1 accession as the query and search against the Reference proteins (refseq_protein) database. Check the top alignments and compare the database sequences (subject) with WP_011184929.1 (query). As of June 2021, the top-aligning subject sequence, WP_002988233.1, is shorter (missing nine residues at the N-terminus of the protein) but otherwise identical to WP_011184929.1, so you may want to consider it the new RefSeq model for the Streptococcus formate acetyltransferase protein. Click on the WP_002988233.1 accession link to access the record in the Protein database. Use the Identical Protein Report to see all identical proteins annotated on RefSeq and INSDC genomic Nucleotide sequence records.
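The comparison described above (a subject that is shorter at the N-terminus but otherwise identical to the query) can also be checked programmatically once both sequences are in hand. The following is a minimal sketch, not a replacement for inspecting the BLAST alignment: the function name `n_terminal_truncation` is hypothetical, and the sequences in the usage example are made-up toy strings, not the actual WP_011184929.1 or WP_002988233.1 sequences.

```python
from typing import Optional


def n_terminal_truncation(query: str, subject: str) -> Optional[int]:
    """Return how many N-terminal residues the subject lacks relative to
    the query, or None if the subject is not an exact C-terminal match.

    A return value of 0 means the two sequences are identical.
    """
    query = query.strip().upper()
    subject = subject.strip().upper()
    # An empty or longer subject cannot be a truncated form of the query.
    if not subject or len(subject) > len(query):
        return None
    # The subject must match the query exactly from some position to the end.
    if query.endswith(subject):
        return len(query) - len(subject)
    return None


if __name__ == "__main__":
    # Toy sequences for illustration only (hypothetical, not real records).
    old_protein = "MKTLLVAGSAMSEQWKGFTP"
    new_protein = old_protein[9:]  # drop the first nine residues
    missing = n_terminal_truncation(old_protein, new_protein)
    print(f"Residues missing at the N-terminus: {missing}")
```

A result of `None` means the candidate differs somewhere other than the N-terminus, in which case the BLAST alignment itself is the better guide to whether the candidate is a plausible replacement.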