Bioinformatics in Biomarker Discovery

Originally published September 20, 2005

Biomarkers are molecular signatures that can be used to measure the progress of disease or the physiological effects of therapeutic intervention in the treatment of disease. They often serve as early warning signs of various diseases, including cancer, tuberculosis, cardiovascular disease, and inflammatory diseases. Their presence can alert physicians and researchers to the potential existence of transformed, pre-malignant cells and tissues, and they are thus an active area of investigation in the medical community. Bioinformatics plays a major role in the discovery of new biomarkers, the validation of potential biomarkers and the analysis of disease states. This article briefly examines how bioinformatics is applied to biomarker research.

For examples of biomarker discovery programs and biomarker technology, see the National Cancer Institute’s Early Detection Research Network.

Types of Biomarkers

There are a variety of biomarkers used today, reflecting their disparate uses in experimental research and clinical settings.

  • Early detection biomarkers are used to identify disease states in the earliest stages of onset and progression. 
  • Diagnostic biomarkers are used to identify the presence or absence of a particular disease state. 
  • Prognostic biomarkers are used to determine survival probabilities of patients. 
  • Predictive biomarkers are used to predict the efficacy of drug therapies in the treatment of disease.
  • Translational biomarkers can be applied in both preclinical and clinical settings.
  • Disease biomarkers are related to clinical outcomes or measures of disease states.
  • Efficacy biomarkers reflect the beneficial effects of a given treatment.
  • Surrogate biomarkers are regarded as valid substitutes for measuring clinical outcomes.
  • Toxicity biomarkers report the toxicological effects of drugs on an in vitro or in vivo system.
  • Target biomarkers report the interactions of drugs with their targets.

Biomarker Identification

The discovery of new biomarkers is often carried out by comparing physiological changes between normal and disease states. The physiological and biochemical conditions in normal and disease states result in differential gene expression profiles, protein expression profiles and metabolite profiles. By examining which genes, proteins and metabolites are up-regulated or down-regulated, researchers can identify compounds and genetic patterns that are associated with particular diseases and can therefore serve as new biomarkers.
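The normal-versus-disease comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature names (GENE_A, GENE_B, GENE_C), the expression values and the fold-change cutoff are hypothetical, and the ranking by log2 fold change and Welch's t-statistic is one common way to screen candidates, not a method prescribed by this article.

```python
import math
from statistics import mean, variance


def log2_fold_change(normal, disease):
    """log2 ratio of mean disease level to mean normal level (positive = up-regulated)."""
    return math.log2(mean(disease) / mean(normal))


def welch_t(normal, disease):
    """Welch's t-statistic, which does not assume equal variances in the two groups."""
    standard_error = math.sqrt(
        variance(normal) / len(normal) + variance(disease) / len(disease)
    )
    return (mean(disease) - mean(normal)) / standard_error


def rank_candidates(profiles, min_abs_lfc=1.0):
    """Keep features whose |log2 fold change| meets the cutoff, strongest |t| first."""
    rows = []
    for name, (normal, disease) in profiles.items():
        lfc = log2_fold_change(normal, disease)
        if abs(lfc) >= min_abs_lfc:
            rows.append((name, lfc, welch_t(normal, disease)))
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)


# Hypothetical, strictly positive expression levels:
# feature -> (normal samples, disease samples)
profiles = {
    "GENE_A": ([10.0, 12.0, 11.0, 9.0], [40.0, 44.0, 38.0, 42.0]),   # higher in disease
    "GENE_B": ([50.0, 48.0, 52.0, 50.0], [12.0, 13.0, 11.0, 14.0]),  # lower in disease
    "GENE_C": ([20.0, 22.0, 19.0, 21.0], [21.0, 20.0, 22.0, 19.0]),  # unchanged
}

candidates = rank_candidates(profiles)
for name, lfc, t in candidates:
    direction = "up" if lfc > 0 else "down"
    print(f"{name}: {direction}-regulated, log2FC={lfc:.2f}, t={t:.1f}")
```

With real microarray or proteomics data, thousands of features are tested at once, so a screen like this would normally be followed by a multiple-testing correction before any feature is treated as a candidate biomarker.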

 

Fig. 1.  Some important elements in the discovery of new biomarkers.

The disease state is often characterized by well-known structural changes in proteins and enzymes. For example, glycosylation, which is the addition of carbohydrate chains (sugars) to polypeptides (proteins), yields glycoproteins. Abnormal concentrations of glycoproteins can then act as a biomarker for various diseases, including muscular dystrophy, acute and chronic inflammation, and leukemia. Another example of altered enzyme structure occurs in pancreatic adenocarcinoma. The enzyme RNase 1, which is detectable in urine and blood serum, displays dramatically different glycosylation patterns in normal and tumorous pancreatic cells. Thus, RNase 1 can be used as a biomarker for pancreatic adenocarcinoma.

Biomarker Validation

After identifying potential biomarkers, researchers must determine whether the candidate compounds or genetic patterns are actually useful as one of the biomarker types described above. In the validation phase, researchers systematically modify putative biomarker compounds, and then check for phenotypic changes or alterations in biochemical and physiological profiles.

Putative Biomarker                  Disease State
CA 125                              Ovarian cancer
Kallikrein 6                        Ovarian cancer
Osteopontin                         Ovarian cancer
Prostate-specific antigen           Prostate cancer
Alpha-methylacyl-CoA racemase       Prostate cancer
APC                                 Colon cancer
BRCA-1, BRCA-2 mutations            Breast cancer
Glutathione S-transferase-1         Breast and prostate cancer
EGFR                                Lung cancer
Haptoglobin                         Lung, colon, and breast cancer
CDKN A                              Colon cancer
Des-gamma-carboxy prothrombin       Hepatocellular carcinoma
AFP                                 Hepatocellular carcinoma

Table 1.  Some potential biomarkers and their respective disease states.

The validation step can be accomplished by genetic means, such as gene knockout studies or RNA interference methods. It can also be accomplished by biochemical means, such as developing agonists or antagonists for potential biomarkers and determining what physiological effects these have on biomarker expression patterns.

Bioinformatics Tools for Biomarker Analysis

Because of the diverse types of biomarkers, the many sources of new biomarkers and the various methods used to discover and validate them, there is an equally impressive set of bioinformatics tools available for biomarker analysis.

DNA, RNA, protein and antibody microarrays are commonly used for biomarker analysis. A sample of microarray analysis tools is available at the Delaware Biotechnology Institute, the Eisen Lab and The Institute for Genomic Research.

The comparison of plasma protein concentration levels in normal and diseased states is often used in biomarker analysis. A relatively new Plasma Proteome Database at the Institute of Bioinformatics should prove to be extremely useful in identifying novel biomarkers. The database contains annotated lists of plasma proteins, protein isoforms (variant forms of a protein, for example those arising from alternative splicing) and post-translational modifications. Both isoforms and post-translational modifications are important determinants of protein structure and function.

Single nucleotide polymorphisms (SNPs) are often associated with various disease states, e.g. cystic fibrosis and prostate cancer. There are an estimated 10,000,000 SNPs in the human genome, some portion of which contribute to disease. (See the International HapMap Project for more information about human SNPs.) Bioinformatics tools for detecting SNPs include DRAGON for microarray analysis, the SNP Consortium Database at Cold Spring Harbor, and the SNP Consortium tools and databases, as well as many others.
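As a rough illustration of how a single SNP might be tested for association with a disease state, the sketch below compares allele counts in cases and controls with a chi-square test on a 2x2 table. The counts are hypothetical, the function name is made up for this example, and the approach is a generic textbook test, not the method used by any of the tools named above.

```python
import math


def allelic_association(case_minor, case_major, control_minor, control_major):
    """Chi-square test (1 degree of freedom) and odds ratio for a 2x2 allele count table."""
    a, b, c, d = case_minor, case_major, control_minor, control_major
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


# Hypothetical allele counts for one SNP (100 cases and 100 controls, two alleles each).
chi2, p_value, odds_ratio = allelic_association(
    case_minor=60, case_major=140, control_minor=30, control_major=170
)
print(f"chi2={chi2:.2f}, p={p_value:.2g}, odds ratio={odds_ratio:.2f}")
```

Because millions of SNPs may be screened, a p-value from a single test like this is only a starting point; genome-wide studies apply much stricter significance thresholds.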

A somewhat unusual bioinformatics method called Decision Forest (DF) analysis has been used to identify biomarkers for esophageal cancer. Here, a pattern recognition algorithm is used to locate complex SNP patterns that are correlated with this particular form of cancer. DF analysis identified lists of SNPs, various types of SNPs and specific SNP patterns that could be used as biomarkers for esophageal cancer. DF analysis is now being expanded to identify biomarkers for other disease states.
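The general idea of combining several simple tree classifiers over SNP genotypes can be sketched as follows. This is not the published Decision Forest algorithm; it is a toy ensemble of one-level decision trees (stumps), one per SNP, combined by majority vote. The SNP names, genotypes (coded as 0, 1 or 2 copies of the minor allele) and labels are all hypothetical.

```python
from collections import Counter


def fit_stump(samples, labels, feature):
    """Choose the genotype threshold and direction on one SNP with the best training accuracy."""
    best = None
    for threshold in (1, 2):
        for case_if_at_least in (True, False):
            preds = [int((s[feature] >= threshold) == case_if_at_least) for s in samples]
            accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if best is None or accuracy > best[0]:
                best = (accuracy, threshold, case_if_at_least)
    _, threshold, case_if_at_least = best
    return feature, threshold, case_if_at_least


def predict_stump(stump, sample):
    """Return 1 (case) or 0 (control) according to a single stump."""
    feature, threshold, case_if_at_least = stump
    return int((sample[feature] >= threshold) == case_if_at_least)


def predict_forest(stumps, sample):
    """Majority vote over all stumps (use an odd number of stumps to avoid ties)."""
    votes = Counter(predict_stump(stump, sample) for stump in stumps)
    return votes.most_common(1)[0][0]


# Hypothetical training data: four cases (label 1) followed by four controls (label 0).
samples = [
    {"snp1": 2, "snp2": 1, "snp3": 0},
    {"snp1": 1, "snp2": 2, "snp3": 0},
    {"snp1": 2, "snp2": 2, "snp3": 1},
    {"snp1": 1, "snp2": 0, "snp3": 0},
    {"snp1": 0, "snp2": 0, "snp3": 2},
    {"snp1": 0, "snp2": 1, "snp3": 1},
    {"snp1": 0, "snp2": 0, "snp3": 2},
    {"snp1": 1, "snp2": 0, "snp3": 2},
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

stumps = [fit_stump(samples, labels, snp) for snp in ("snp1", "snp2", "snp3")]
training_predictions = [predict_forest(stumps, s) for s in samples]
new_prediction = predict_forest(stumps, {"snp1": 2, "snp2": 2, "snp3": 0})
print(stumps, training_predictions, new_prediction)
```

No single stump classifies every training sample correctly here, but the vote across the three SNPs does, which is the motivation for looking at multi-SNP patterns instead of one SNP at a time. A realistic analysis would evaluate the ensemble on held-out samples, since training accuracy alone overstates performance.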

An interesting recent development is the application of bioinformatics and biomarker identification to research on bioterrorism-related pathogens. The Viral Bioinformatics Resource Center and the Poxvirus Bioinformatics Resource Center both maintain databases, analytical tools and data mining tools that will encourage the search for pathogen biomarkers. Several biodefense initiatives have identified more than 50 pathogen species that can potentially be used as bioweapons. The discovery of diagnostic pathogen biomarkers for early, rapid detection of bioterrorist attacks is a high-priority item for the National Institutes of Health and other government agencies.

  • Dr. Richard Casey

    Richard is the Founder and Chief Scientific Officer of RMC Biosciences Inc., a firm that offers services in Bioinformatics and Computer Aided Drug Design. Dr. Casey received a Ph.D. in Biological Sciences from Colorado State University. He has 20-plus years of experience in Computational Sciences, Information Technology and High-Performance Computing. He has held corporate and academic positions at Hewlett-Packard, Boeing Computer Services, Arizona State University, Colorado State University, the Alabama Supercomputer Center, and the Institute for Computational Studies at CSU, and was the founder of a software consulting firm, Alpine Computing Inc. He holds a Project Management Professional Certificate and a Bioinformatics Certificate from Stanford University. Richard can be reached at rcasey@rmcbiosciences.com.
