Designing Chemical Compound Libraries for Drug Discovery

Originally published December 1, 2005

Well-designed chemical compound libraries can jump-start the search for new drugs. Today, many biopharmaceutical companies search large compound libraries to identify promising leads for potentially marketable drugs. High-throughput screening (HTS) is a technique widely used in the industry to rapidly scan and analyze these libraries. However, a key factor in a successful library search is the strategy used to design the library itself, and whether that design increases the probability of retrieving promising “hits” and potential leads. Consequently, there is now an intense research focus on compound library design strategies.

We should note that there are two types of compound libraries: experimental HTS (eHTS) libraries and virtual HTS (vHTS) libraries. Whereas eHTS libraries consist of real chemical compounds that are screened in a laboratory environment, vHTS libraries consist of 3D representations of chemical compounds that are screened using computational methods. The logical designs of both library types are often similar, and the two methods, experimental and computational, often complement one another in drug research settings. eHTS and vHTS are also commonly run in parallel in drug discovery campaigns, with the results of one compared to the other. While I will focus on vHTS in this article, many of my comments apply equally to eHTS. We should also note that there is a pressing need for bioinformaticians with the skills required to design vHTS libraries that yield promising new drug leads.

First Generation vHTS Libraries

The first vHTS libraries became popular in the mid-1990s. Early attempts to design these libraries often consisted of including the largest set of compounds possible for whatever computer system hosted them. The basic approach was to screen as many compounds as possible in a given period of time. Less consideration was given to the specific types of compounds in the libraries, with the result that more-or-less random collections of compounds were commonplace. There was a general belief that drug leads could be derived from the sheer number of compounds screened. Libraries typically included up to one million small-molecule compounds, which could be screened relatively quickly, in perhaps a few days to a week of run time. Although these efforts led to some notable successes in finding drug leads, many believed that the screening results were not as fruitful as expected. These early efforts may be called the first generation of vHTS libraries.

Second Generation vHTS Libraries 

Over the last few years there has been considerable research in the design of vHTS libraries to improve their capability for finding drug leads. Library design today is more sophisticated than in the past and centers around the methods used for choosing compound membership. The choice of compounds is often based on two widely used design strategies: diversity-oriented design and target-oriented design. These current efforts may be called the second generation of vHTS libraries.

Diversity-Oriented Design

The goal of this design strategy is to generate libraries with a highly diverse set of chemical compounds. By using a diverse set of compounds, there should be a greater likelihood that query molecules will “hit” one or several novel target compounds.

Numerous methods are available for creating such diversity. Skeletal diversity, for example, is a strategy where the core, backbone, or scaffold elements of chemical compounds are chosen to maximize their variation in 3D shape, electrostatics, or molecular properties. Stereochemical diversity involves the 3D spatial arrangements of atoms and functional groups in molecules, and is maximized such that a range of molecular conformations is sampled during screening runs. Molecular property diversity is another method for generating compound diversity. Here, molecular properties available for modification include hydrogen bond donor groups, hydrogen bond acceptor groups, polarizable groups, charge distributions, hydrophobic and lipophobic groups, and numerous other properties. The diversity of the libraries resulting from these methods is often measured using statistical techniques, such as cluster analysis and principal components analysis.
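As a rough illustration of how diversity can be built into a library, the sketch below picks a diverse subset using greedy MaxMin selection with Tanimoto distance. This is only one common approach, swapped in here for illustration; it is not the cluster analysis or principal components analysis mentioned above. The compound names and fingerprint bit sets are hypothetical toy values, not real molecular descriptors.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def maxmin_pick(fingerprints: Dict[str, FrozenSet[int]],
                n_pick: int, seed_id: str) -> List[str]:
    """Greedy MaxMin selection.

    Starting from a seed compound, repeatedly add the candidate whose
    nearest already-picked neighbor is farthest away, where
    distance = 1 - Tanimoto similarity.
    """
    picked = [seed_id]
    candidates = [c for c in fingerprints if c != seed_id]
    while candidates and len(picked) < n_pick:
        def min_dist(c: str) -> float:
            return min(1.0 - tanimoto(fingerprints[c], fingerprints[p])
                       for p in picked)
        best = max(candidates, key=min_dist)
        picked.append(best)
        candidates.remove(best)
    return picked


# Toy fingerprints: hypothetical bit positions, for illustration only.
toy_library = {
    "cmpd_A": frozenset({1, 2, 3, 4}),
    "cmpd_B": frozenset({1, 2, 3, 5}),
    "cmpd_C": frozenset({10, 11, 12}),
    "cmpd_D": frozenset({3, 4, 10, 20}),
}

diverse_subset = maxmin_pick(toy_library, n_pick=3, seed_id="cmpd_A")
print(diverse_subset)
```

In this toy set, cmpd_B is a near-duplicate of the seed cmpd_A, so it is the compound left out of the three-member subset. In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets.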

Target-Oriented Design

In contrast to diversity-oriented design, target-oriented design seeks to create libraries that are focused around specific chemotypes, molecular species, or classes of compounds. Target-oriented design results in focused libraries with a limited number of well-defined compounds. For example, scaffold compounds can be used as “seed” elements, with various functional groups systematically added to the seed scaffolds to create sets of analogue compounds. Target-oriented design methods use 3D shape, 3D electrostatics, pharmacophore models, molecular descriptors, and other methods to generate focused libraries. Compounds of known 3D structure that bind to active sites can also be used as seeds for libraries.

When building targeted libraries, a common design method is to take existing drug leads and generate neighbors (analogues) of the leads in chemistry space using combinatorial methods and conformational expansions of the lead compounds. The resulting compound libraries thus include many analogues of the lead compounds, which can be used in additional screens for novel leads.
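The combinatorial part of this idea can be sketched in a few lines: take a seed scaffold with labeled attachment points and enumerate every combination of substituents. The scaffold template, the `R1`/`R2` labels, and the substituent lists below are hypothetical, and the SMILES-like strings are built by plain text substitution without validation by a chemistry toolkit. Conformational expansion is not covered by this sketch.

```python
from itertools import product
from typing import Dict, List


def enumerate_analogues(scaffold: str,
                        r_groups: Dict[str, List[str]]) -> List[str]:
    """Return one string per combination of substituents.

    `scaffold` is a template with named placeholders such as {R1};
    `r_groups` maps each placeholder name to its candidate substituents.
    """
    names = sorted(r_groups)
    analogues = []
    for combo in product(*(r_groups[name] for name in names)):
        assignment = dict(zip(names, combo))
        analogues.append(scaffold.format(**assignment))
    return analogues


# Hypothetical seed: a benzene ring with two attachment points.
seed_scaffold = "c1ccc({R1})cc1{R2}"
substituents = {
    "R1": ["F", "Cl", "OC"],  # fluoro, chloro, methoxy
    "R2": ["C", "N"],         # methyl, amino
}

analogues = enumerate_analogues(seed_scaffold, substituents)
for smiles in analogues:
    print(smiles)
```

The number of analogues is the product of the substituent list sizes (here 3 x 2 = 6), which is why focused libraries built this way can grow quickly as attachment points are added.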

Molecular Property Diversity

Whether vHTS libraries are designed for diversity or focused around specific chemotypes, they often use molecular property profiles in the design process. Chemical compounds need to satisfy a variety of constraints before they become marketable drugs, including solubility, oral bioavailability, cell membrane permeability, metabolism by liver enzymes (e.g., the cytochrome P450 series), plasma protein binding, penetration of the blood-brain barrier, toxicity (mutagenicity, carcinogenicity, LD50), and many others. For example, a common design approach is to focus molecular properties around Lipinski’s rules, a set of rules that describes molecular properties shared by many currently marketed oral drugs. Lipinski’s rules place limits on molecular weight, the number of hydrogen bond donors and acceptors, and lipophilicity (the octanol-water partition coefficient, log P). Limits on the number of rotatable bonds and on solubility are often applied alongside them as additional filters. Applied in library design, these rules act as a molecular property filter that restricts the set of compounds to those with drug-like characteristics.
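A property filter of this kind might look like the sketch below. It uses the commonly cited “rule of five” thresholds (molecular weight at most 500, at most 5 hydrogen bond donors, at most 10 hydrogen bond acceptors, log P at most 5). The `Descriptors` record, the compound names, and all descriptor values are hypothetical; in a real workflow the descriptors would be computed by a cheminformatics package.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors for one compound (toy values)."""
    name: str
    mol_weight: float
    h_bond_donors: int
    h_bond_acceptors: int
    log_p: float


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five limits a compound exceeds."""
    checks = [
        d.mol_weight > 500.0,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
        d.log_p > 5.0,
    ]
    return sum(checks)


def drug_like_filter(library: List[Descriptors],
                     max_violations: int = 0) -> List[Descriptors]:
    """Keep compounds with at most `max_violations` rule violations."""
    return [d for d in library
            if rule_of_five_violations(d) <= max_violations]


# Hypothetical compounds with made-up descriptor values.
toy_library = [
    Descriptors("cmpd_1", 320.4, 2, 5, 2.1),
    Descriptors("cmpd_2", 612.7, 6, 12, 5.8),
    Descriptors("cmpd_3", 540.2, 1, 7, 3.3),
]

strict = [d.name for d in drug_like_filter(toy_library)]
relaxed = [d.name for d in drug_like_filter(toy_library, max_violations=1)]
print(strict, relaxed)
```

The `max_violations` parameter is a design choice of this sketch: some workflows tolerate a single violation, while a strict filter tolerates none.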

Natural Product Libraries

A significant number of marketed drugs are derived from natural products, so in many cases a sensible design strategy is to include natural products in compound libraries. For example, a recent study by Feher and Schmidt, reported in the Journal of Chemical Information and Modeling, examined the property space of several representative HTS libraries and compared their property profiles to those of natural products and marketed drugs. In the figure below, the distribution of molecular properties in a representative set of HTS libraries is rather narrowly defined, whereas the distributions of molecular properties in natural products and marketed drugs are much broader.

Distributions of chemical compound properties in a representative set of HTS libraries (red), natural products (blue), and marketed drugs (green). Observe the similarity in distributions between natural products and marketed drugs. And note the differences in distributions between HTS libraries and both natural products and drugs.

Furthermore, the molecular property profiles of natural products and marketed drugs are more similar to each other than either is to those of HTS libraries. These results at least suggest that, when designing libraries, we should emulate the molecular properties and diversity found in natural products and currently marketed drugs.

Future Paths

The second-generation design of vHTS libraries is well underway. In addition to the design methods briefly described here, various other efforts in library design are in progress. These include dynamic combinatorial chemistry, click chemistry, catalyst optimization, multi-component reactions, recursive partitioning, and others, all of which offer additional sophisticated approaches to chemical compound library design.

  • Dr. Richard Casey

    Richard is the Founder and Chief Scientific Officer of RMC Biosciences Inc., a firm that offers services in Bioinformatics and Computer Aided Drug Design. Dr. Casey received a Ph.D. in Biological Sciences from Colorado State University. He has 20-plus years of experience in Computational Sciences, Information Technology and High-Performance Computing. He has held corporate and academic positions at Hewlett-Packard, Boeing Computer Services, Arizona State University, Colorado State University, the Alabama Supercomputer Center, and the Institute for Computational Studies at CSU, and was the founder of a software consulting firm, Alpine Computing Inc. He holds a Project Management Professional Certificate and a Bioinformatics Certificate from Stanford University. Richard can be reached at rcasey@rmcbiosciences.com.
