Originally published July 12, 2005
There are 100,000 proteins in the human body, or 1 million, depending on who is counting. The exact number of proteins in humans is unknown, although it's possible to make educated guesses based on the number of genes in the human genome (roughly 20,000), alternative splicing (the process by which a single gene gives rise to multiple proteins) and post-translational modifications (chemical changes made to a protein after it is synthesized). In any event, the number of proteins that constitute the human proteome is quite large.
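To see how such different totals can come from the same gene count, here is a minimal back-of-envelope sketch in Python. Only the gene count comes from the text. The two multipliers (5 splice isoforms per gene, 10 modified forms per isoform) are illustrative assumptions picked to show the arithmetic, not measured values.

```python
# Back-of-envelope proteome size estimate.
# The gene count comes from the text; both multipliers are illustrative
# assumptions chosen only to show how the two quoted totals can arise.

GENE_COUNT = 20_000  # approximate number of human genes (from the text)


def estimate_proteome_size(genes, isoforms_per_gene, modified_forms_per_isoform=1):
    """Multiply the gene count by assumed per-gene and per-isoform factors."""
    return genes * isoforms_per_gene * modified_forms_per_isoform


# Assumption: an average of 5 splice isoforms per gene.
splicing_only = estimate_proteome_size(GENE_COUNT, isoforms_per_gene=5)

# Assumption: each isoform also exists in about 10 modified forms.
with_modifications = estimate_proteome_size(
    GENE_COUNT, isoforms_per_gene=5, modified_forms_per_isoform=10
)

print(f"Splicing only: {splicing_only:,}")
print(f"Splicing plus modifications: {with_modifications:,}")
```

Changing either assumed multiplier shifts the total by the same factor, which is one reason published estimates differ so widely.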
Many efforts are underway to examine the human proteome, and the protein complements in other species as well. To assist in this effort there is a wide variety of bioinformatics tools and databases available for the analysis of proteins. We’ll take a brief look at some of them here.
Protein microarrays are used to identify protein-protein interactions, the substrates of proteins or the targets of biologically active small molecules. Like the market for its cousin, the gene microarray, the protein microarray market is growing rapidly. And with this growth comes a need for bioinformatics tools to analyze the microarrays. Many bioinformatics tools for analyzing protein microarrays are offered by vendors of the microarrays, such as TeleChem, but we should see a steady increase in the number of publicly available, open source bioinformatics tools in the near future.
Substitutions of specific amino acids are used in mutagenesis studies to determine how modifications in structure affect protein function. Therefore, scientists often begin formulating their research protocols around an analysis of the amino acid sequences that make up protein primary structure. A wide variety of tools is available for analyzing protein amino acid sequences; a few of the better-known ones are Modeller, the ExPASy Proteomics Server and HMMER.
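As a small illustration of what an amino acid substitution looks like at the sequence level, the following Python sketch applies a single point substitution written in the common "K8A" shorthand (original residue, 1-based position, replacement residue). The function name `apply_substitution` and the example peptide are hypothetical and used only for illustration. This is not how any of the tools named above work internally.

```python
import re

# The 20 standard amino acids in one-letter code.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def apply_substitution(sequence, mutation):
    """Apply a point substitution such as 'K8A' (Lys at position 8 -> Ala).

    Positions are 1-based, the usual convention for protein sequences.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"Unrecognized mutation format: {mutation!r}")

    original = match.group(1)
    position = int(match.group(2))
    replacement = match.group(3)

    if replacement not in STANDARD_AMINO_ACIDS:
        raise ValueError(f"Not a standard amino acid: {replacement!r}")
    if not 1 <= position <= len(sequence):
        raise ValueError(f"Position {position} is outside the sequence")
    if sequence[position - 1] != original:
        raise ValueError(
            f"Expected {original} at position {position}, "
            f"found {sequence[position - 1]}"
        )

    # Strings are immutable, so build the mutant from two slices.
    return sequence[: position - 1] + replacement + sequence[position:]


wild_type = "MKTAYIAKQR"  # made-up peptide, for illustration only
mutant = apply_substitution(wild_type, "K8A")
print(wild_type)
print(mutant)
```

Checking the original residue before substituting catches off-by-one errors in position numbering, a frequent source of mistakes when planning mutagenesis experiments.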
On one hand, protein folds are defined strictly by their 3D structure and topological arrangement. On the other hand, protein folds are often associated with specific functions. For example, transferase and hydrolase enzymes often contain alpha/beta folds and perform similar functions. So it is important to know both the structure and function of folds in proteomics analysis.

The colorful figure above shows various sets of protein folds. Along one axis (mostly red) are proteins containing predominantly alpha-helices; along another axis (mostly yellow) are proteins containing mostly beta-strands; and along a third axis (mostly green and blue) are proteins containing an equal mix of alpha-helices and beta-strands. Even in the brief sample of proteins shown in this diagram, one can see a great variety of folds and related secondary structures.
Currently, there are about 550 recognized folds, although this number will continue to rise for some time. The total number of unique folds in nature may be around 1,000. There are several well-known protein fold databases, including SCOP, CATH and FSSP.
Protein Interaction Maps. Many cellular processes and regulatory pathways are controlled by networks of interacting proteins. These networks determine how cells grow, divide, die, differentiate and communicate with other cells. Groups of cooperating, interacting proteins also carry out fundamental cellular processes such as DNA replication, DNA repair, transcription and translation (protein synthesis).
Therefore, to gain intimate knowledge of how cellular processes work, it’s important to know how proteins interact with one another. Protein interaction maps identify which proteins interact and how they are grouped together to form functional units. For example, there are about 14,000 proteins in the fruit fly, Drosophila melanogaster. Nearly complete protein interaction maps are available for the fruit fly, which should lead to new discoveries about cell signaling pathways in this organism. Having complete protein interaction maps provides detailed information about specific cell signaling pathways and how they function. Some sites that describe protein interaction maps include Genome Biology and the Institute of Molecular Biotechnology.
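One simple way to picture a protein interaction map is as an undirected graph, with proteins as nodes and observed interactions as edges. The Python sketch below builds such a graph from a list of interacting pairs and groups proteins into connected components. The protein names are placeholders rather than real fly proteins, and connected components are only a crude stand-in for functional units. Published analyses typically rely on more refined clustering and on confidence scores for each interaction.

```python
from collections import defaultdict, deque


def build_interaction_map(pairs):
    """Return an undirected adjacency map: protein -> set of partners."""
    graph = defaultdict(set)
    for protein_a, protein_b in pairs:
        graph[protein_a].add(protein_b)
        graph[protein_b].add(protein_a)
    return graph


def connected_groups(graph):
    """Group proteins that are linked directly or through intermediates."""
    seen = set()
    groups = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Breadth-first search from each protein not yet assigned to a group.
        seen.add(start)
        queue = deque([start])
        group = []
        while queue:
            protein = queue.popleft()
            group.append(protein)
            for partner in sorted(graph[protein]):
                if partner not in seen:
                    seen.add(partner)
                    queue.append(partner)
        groups.append(sorted(group))
    return groups


# Placeholder interaction data, for illustration only.
interactions = [
    ("ProtA", "ProtB"),
    ("ProtB", "ProtC"),
    ("ProtD", "ProtE"),
]

interaction_map = build_interaction_map(interactions)
groups = connected_groups(interaction_map)
for group in groups:
    print(", ".join(group))
```

With the placeholder data, ProtA, ProtB and ProtC fall into one group because ProtB links the other two, while ProtD and ProtE form a second group.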
One of the fascinating frontiers in protein research is the ability to design and engineer proteins with novel structures and functions. By using the methods described above and by systematically altering protein structures, it’s possible to create (semi-)artificial proteins with new enzymatic functions and unusually strong binding capabilities. Researchers at the Howard Hughes Medical Institute are engaged in this type of research, and they have developed a new tool, ORBIT (Optimization of Rotamers by Iterative Techniques), to assist in the protein design process.
Recent articles by Dr. Richard Casey