Article: KA-03456

How do I search the NCBI Protein database for sequences that belong to a protein family?

Start your search in the Protein database. Choose a text term that you expect to appear in the names of most proteins in the family, for example "heat shock". Limit the term to the title field, one of several search fields available in Advanced search, to construct the following query:

"heat shock"[title]

The query returns records that have the term "heat shock" in the title (definition line) of the record.
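The same query can also be prepared in a script. The following is a minimal Python sketch, not part of the original article. It assumes you want to submit the query through the public NCBI E-utilities ESearch endpoint rather than the web interface; the helper names `title_query` and `esearch_url` are hypothetical. It only builds the query string and the URL and does not send a request.

```python
from urllib.parse import urlencode

# Public E-utilities ESearch endpoint. Using it here is an assumption:
# the article itself only describes the web search interface.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def title_query(phrase: str) -> str:
    """Quote a phrase and restrict it to the title field."""
    return f'"{phrase}"[title]'


def esearch_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for the Protein database (no request is sent)."""
    params = {"db": "protein", "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


query = title_query("heat shock")
print(query)               # the query as you would type it in the search box
print(esearch_url(query))  # the URL-encoded form of the same search
```

Pasting the printed query into the Protein search box should behave the same as building it in Advanced search.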

If your text term consists of several words (such as "G-protein coupled receptor"), it may not work well as a single phrase. Instead, break it into parts, for example:

"G protein"[ti] AND coupled[ti] AND receptor[ti]

The query returns records in which all three terms are present in the title of the record.
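If you build such queries often, the splitting step can be scripted. This is a small Python sketch under the assumption that you supply the parts yourself; the function name `title_and_query` is hypothetical. Parts that contain a space are quoted so that they are searched as a phrase.

```python
def title_and_query(parts):
    """Join name parts into a title-restricted query combined with AND."""
    terms = []
    for part in parts:
        # Quote multi-word parts so they are searched as a phrase.
        term = f'"{part}"' if " " in part else part
        terms.append(f"{term}[ti]")
    return " AND ".join(terms)


print(title_and_query(["G protein", "coupled", "receptor"]))
```

How to split the name (for example, keeping "G protein" together) remains a judgment call that depends on how the family is usually named in record titles.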

You can further narrow your search results using the customizable filters on the left side of the search results page. For example, if you are interested in three-dimensional protein structures, use the Source databases filter and select PDB. PDB (Protein Data Bank) records contain the protein sequences that accompany three-dimensional protein structures available in the NCBI Structure database.

Many records in the Protein database arise from computational translation of partial coding sequences that submitters deposit in GenBank or a collaborating database. To exclude any protein sequences designated as partial, extend your original query with the "NOT" Boolean operator:

"heat shock"[ti] NOT partial[ti]
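
The exclusion can be appended to any title query in the same way. The Python sketch below is an illustration only, with a hypothetical helper name `exclude_from_title`; it assumes the word to exclude is a single term without spaces.

```python
def exclude_from_title(query: str, excluded: str) -> str:
    """Append a NOT clause that drops records whose title contains `excluded`."""
    return f"{query} NOT {excluded}[ti]"


print(exclude_from_title('"heat shock"[ti]', "partial"))
```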