Deriving knowledge through data mining high-throughput screening data

J Med Chem. 2004 Dec 2;47(25):6373-83. doi: 10.1021/jm049902r.

Abstract

Deriving general knowledge from high-throughput screening data is made difficult by the significant amount of noise, arising primarily from false positives, in the data. The paradigm established for screening an encoded combinatorial library on polymeric support, an ECLiPS library, has a significant amount of built-in redundancy. Because of this redundancy, the resulting data can be interpreted through a rigorous statistical analysis procedure, thereby significantly reducing the number of false positives. Here, we develop the statistical models used to analyze data from high-throughput screens of ECLiPS libraries to derive unbiased true hit rates. These hit rates can also be calculated on subsets of the collection such as those compounds containing a carboxylic acid or those with molecular weight below 350 Da. The relative value of the hit rate on the subset of the collection can then be compared to the overall hit rate to determine the effect of the substructure or physical property on the likelihood of a molecule having biological activity. Here, we show the effects that various functional groups and the standard physical properties, molecular weight, hydrogen bond donors, hydrogen bond acceptors, log P, and rotatable bonds, have on the likelihood of a compound being biologically active. To our knowledge this is the first published account of the use of high-throughput screening data to elucidate the effects of physical properties and substructures on the likelihood of compounds showing biological activity over a broad range of pharmaceutically relevant targets.
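The subset comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' statistical model: the redundancy-based correction for false positives in ECLiPS screens is not reproduced here, and each compound is simply assumed to carry an already-confirmed hit flag. The `Compound` class, its fields, the `hit_rate` and `relative_hit_rate` helpers, and the toy data are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Compound:
    """Hypothetical record; the field names are assumptions, not from the paper."""
    mol_weight: float          # molecular weight in Da
    has_carboxylic_acid: bool  # substructure flag
    is_hit: bool               # assumed to be a confirmed (true) hit


def hit_rate(compounds: Iterable[Compound]) -> float:
    """Fraction of compounds flagged as hits."""
    items: List[Compound] = list(compounds)
    if not items:
        raise ValueError("cannot compute a hit rate on an empty set")
    return sum(c.is_hit for c in items) / len(items)


def relative_hit_rate(
    compounds: Iterable[Compound],
    in_subset: Callable[[Compound], bool],
) -> float:
    """Subset hit rate divided by the overall hit rate.

    A value above 1 suggests the property is associated with a higher
    likelihood of activity; a value below 1 suggests the opposite.
    """
    items = list(compounds)
    overall = hit_rate(items)
    if overall == 0:
        raise ValueError("overall hit rate is zero; the ratio is undefined")
    subset = [c for c in items if in_subset(c)]
    return hit_rate(subset) / overall


# Toy data, invented purely for illustration.
LIBRARY = [
    Compound(300.0, False, True),
    Compound(320.0, False, True),
    Compound(340.0, True, False),
    Compound(280.0, False, False),
    Compound(410.0, True, False),
    Compound(450.0, False, False),
    Compound(500.0, False, False),
    Compound(390.0, False, False),
]

if __name__ == "__main__":
    print("overall:", hit_rate(LIBRARY))
    print("MW < 350 Da:", relative_hit_rate(LIBRARY, lambda c: c.mol_weight < 350))
    print("carboxylic acid:", relative_hit_rate(LIBRARY, lambda c: c.has_carboxylic_acid))
```

In this toy set the low-molecular-weight subset has a relative hit rate above 1 and the carboxylic acid subset has one below 1. With real screening data, raw hit flags would first need a false-positive correction such as the one the paper develops, and small subsets would call for confidence intervals rather than a bare ratio.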

MeSH terms

  • Algorithms
  • Databases, Factual
  • Hydrogen Bonding
  • Models, Molecular*
  • Molecular Conformation
  • Molecular Weight
  • Pharmaceutical Preparations / chemistry*
  • Probability
  • Protein Binding*
  • Quantitative Structure-Activity Relationship*

Substances

  • Pharmaceutical Preparations