Interpretation of Quantitative Structure-Property and -Activity Relationships
Subhash Basak
Natural Resources Research Institute, University of Minnesota, Duluth, 5013 Miller Trunk Hwy,
Duluth, Minnesota 55811
Emilio Benfenati
Instituto Mario Negri, Via Eritrea 62, 20157 Milan, Italy
The potential utility of data reduction methods (e.g. principal component analysis) for the analysis of matrices
assembled from the related properties of large sets of compounds is discussed by reference to results obtained
from solvent polarity scales, ongoing work on solubilities and sweetness properties, and proposed general
treatments of toxicities and gas chromatographic retention indices.
Protection Agency (USEPA) uses two major methods in their premanufacture hazard assessment notification (PMN) of chemicals: (a) class-specific QSARs if available and (b) chemical analogues.6 PCA and PCs derived from computed molecular descriptors can be used for both purposes.
The data reduction capability of PCA has also provided a “synthetic and holistic view” of different solvent polarity scales, insight into the action of structural classes of sweeteners,7a,b and the solubility of chemicals in diverse solvents.8 Analogy indicates that such approaches will yield useful results for the various toxicity scales that have been developed for the assessment of hazard posed by natural and anthropogenic chemicals to human and environmental health.
A fundamental goal of QSAR/QSPR studies is to predict complex physical, chemical, biological, and technological properties of chemicals from simpler “descriptors”, preferably those calculated solely from molecular structure, excluding experimental data.9 To this end, numerous experimental and computed descriptors have been developed for QSAR/QSPR studies.10 Any descriptor, whether experimental or calculated, associates a real number with a chemical and then orders the set of chemicals according to the numerical value of the specific property. Each descriptor or property provides a scale for a particular set of chemicals. Thus an experimentally determined solvent polarity scale orders a set of solvents according to the magnitude of the solvent polarity as defined by the scale. Similarly, the magnitude of the molecular complexity descriptor (e.g., the first-order information content, IC1) maps a set of chemicals into a corresponding set of real numbers, and orders them into a scale.11a,b If such a scale (independent variable), experimental or calculated, is linearly or nonlinearly related to the magnitude (scale) of a particular physical, chemical, biological, or technological property (dependent variable) of interest, this provides a successful QSAR/QSPR model. Multiple linear regressions have been very popular in the formulation of QSARs/QSPRs.
The partial least squares (PLS) method is particularly suited for the extraction of a few highly significant formal correlational factors from large homogeneous sets of descriptors such as molecular field grid data (cf. comparative molecular field analysis, CoMFA).12a However, this approach is often less appropriate in cases of large diverse descriptor sets, as its use can result in the selection of too many formal factors.12b Therefore, in this paper we consider an alternative approach that combines PCA and multilinear regression analysis. The new approach is outlined on the basis of the previous work by our groups.
FURTHER MODEL BUILDING METHODS
For more complicated situations, several statistical methods can be used for flexible nonlinear modeling, including the following: polynomial regression; tree-based models; Bayesian methods.
For instance, Trinajstic and co-workers used nonlinear multivariate regression to predict biological and pharmacological properties.13a Methods of machine learning have also been used in the development of QSAR/QSPR models. In the 1990s, regression methods based on neural networks (NNs) offered new possibilities to QSARs, accounting for nonlinear structure-activity relationships and dealing with nonlinear dependencies.13b Repeatedly, NNs proved to be equal or superior to multivariate regression.13a,b,14a,b Artificial intelligence offers advantages in dealing with numerical continuous values and also with categories and rules.15 Machine learning research seeks to develop algorithms that learn predictive relationships from data by data mining and knowledge discovery techniques. Fuzzy logic can be used to take into account the uncertainty of the property of interest, e.g., the magnitude of a toxicological value.
Numerous QSAR/QSPR models apply both statistical and neural net methods, with a single or a small set of independent variables, to small and structurally related compound sets. Complex properties such as solvent polarity, sweetness, and the toxicity of structurally diverse chemicals (e.g. those in the TSCA inventory) call for broader integrated approaches, rather than such piecemeal methods. Techniques such as PCA provide holistic approaches for combining many independent variable (descriptor) scales for deriving QSAR/QSPR models for complex properties and larger sets of compounds.
The data of p descriptors for n chemicals form an n × p matrix X. Each chemical is now a point in the p-dimensional space, R^p. Since many descriptors, whether experimental or calculated, are significantly intercorrelated, the points in R^p will in fact define a subspace of lower dimension than p, and, as discussed above, PCA can provide PCs which represent reduced data and efficiently combine diverse predictor variables. The PCs can then be used in the prediction of properties, quantification of structural similarity/dissimilarity of chemicals, and the clustering of large and diverse combinatorial libraries of chemicals.16
PCs may find applications in the clustering and classification techniques for complex properties such as toxicity. While classification methods may appear crude, compared to multilinear analysis and NN, given the huge variability of the toxic effects, they will be suitable for the preliminary treatment of large sets of data.17
SOLVENT POLARITY SCALES8,18
Solvent polarity is widely recognized to be of great importance in many fundamental and applied areas of research. However, the precise definition of solvent polarity has proved difficult. More than a hundred quantitative solvent polarity scales have been proposed on the basis of diverse properties, including reaction rates, solvatochromic effects, and entropies. In recent joint work of two of our laboratories, a matrix was formed from 40 of the most important scales and 40 of the most important solvents. However, there were many gaps in this database. A QSPR was established for each of the 40 scales,18 and this was then used to fill in all the gaps in the matrix. The principal component analysis of this matrix8 showed that the first three principal components accounted for about 75% of the total variance. These components described 22 of the scales very well (greater than or equal to 90% of the variance), another 14 were well or fairly well described (70-89% of the variance), and 4 were rather poorly described (54-65% of the variance).
A three-dimensional plot using the loadings of the first three PCs as axes gave very useful information on the scales; see Figure 1. In particular, most scales fell into five groups as follows: (i) expression of dielectric constant; (ii) charge-transfer effects on electronic spectra; (iii) other UV spectral
INTERPRETATION OF QSPR AND QSAR J. Chem. Inf. Comput. Sci., Vol. 41, No. 3, 2001 681
Figure 1. Loadings of the second PCA component plotted versus the loadings of the first component with the third component loading and
scale classification given as labels to the data points. Reprinted with permission from ref 8. Copyright 1999 American Chemical Society.
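As an illustration of the kind of calculation behind such a loadings plot, the following is a minimal sketch, not the authors' procedure. It assumes a complete (gap-free) solvents-by-scales matrix, autoscales each scale, and obtains the PCs by singular value decomposition. The function name `pca_summary`, the synthetic 40 × 40 data, and the choice of autoscaling are all assumptions made for the example.

```python
import numpy as np

def pca_summary(X, k=3):
    """PCA of a (solvents x scales) matrix via SVD of the autoscaled data."""
    # Autoscale each scale (column) to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigvals = s**2 / (Z.shape[0] - 1)            # variance carried by each PC
    explained = eigvals / eigvals.sum()          # fraction of total variance
    scores = U[:, :k] * s[:k]                    # one row per solvent
    loadings = Vt[:k].T * np.sqrt(eigvals[:k])   # one row per scale
    # With autoscaled columns, the squared loadings of a scale add up to the
    # fraction of that scale's variance described by the first k PCs.
    described = (loadings**2).sum(axis=1)
    return explained, scores, loadings, described

# Synthetic stand-in for a 40 x 40 matrix: three hidden factors plus noise.
# These numbers are invented and are not the published solvent data.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3)) @ rng.normal(size=(3, 40)) + 0.5 * rng.normal(size=(40, 40))

explained, scores, loadings, described = pca_summary(X, k=3)
print("variance explained by PC1-PC3:", explained[:3].round(3))
print("scales described to >= 90%:", int((described >= 0.9).sum()))
```

Plotting the rows of `loadings` groups the scales, while plotting the rows of `scores` groups the solvents; `described` corresponds to the per-scale percentages of variance quoted in the text.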
effects; (iv) expression of solvent basicity; (v) expression of solvent refractive index.
Similarly, a three-dimensional plot of the scores of the first three PCs (Figure 2) gave information on the solvents. In particular, the hydroxylic solvents appeared in one group, the dipolar-aprotic solvents in another, and polar solvents in yet another group (well-separated from the nonpolar solvents). Only formamide remained as a single group.
In this way considerable information and rationalization was obtained for both solvents and solvent polarity scales.
General Treatment of Solubility. Consideration of the solubilities of solids or liquids in a liquid solvent is complicated by the need to consider intermolecular interactions in the bulk solute in addition to those in the bulk solvent and between the solute and the solvent. Thus, it is easier to treat the solubilities of vapors and gases, i.e., gas-liquid partition coefficients. Moreover, such gas-liquid partition coefficients are extremely important from an environmental point of view, especially when the liquid is water. Therefore it is of great utility to have the structure-based chemical information on gas solubilities generalized.
Water-gas phase partition coefficients of diverse organic compounds can be adequately described using descriptors based solely on the chemical structures of the organic molecules: a web site is available (http://clogp.pomona.edu/medchem/chem/qsar-db). The partitioning of two sets of organic gases and vapors between water and air (Lw) has been studied using the CODESSA program.19 For a set of 95 alkanes, cycloalkanes, alkylarenes, and alkynes, excellent predictions were obtained with a two-parameter correlation equation (R² = 0.977, R²cv = 0.975) that adequately represented the effective dispersion and cavity formation effects for the solvation of nonpolar solutes in water. A set of 406 structurally diverse organic compounds (including structures containing N, O, S, and halogen atoms) was successfully correlated by a five-parameter equation (R² = 0.941, R²cv = 0.939),19 which accounts for the dispersion energy of polar solutes in solution, the electrostatic part of the solute-solvent interaction, and hydrogen-bonding interactions in liquids.
We recently obtained similar equations for the solubilities of organic molecules in methanol and ethanol.20 The solubilities of 87 gases and vapors in methanol resulted in a four-parameter equation (R² = 0.945, R²cv = 0.938) that adequately represents the solute-solvent interactions described by the polarizability, dipole moment, hydrogen bonding, and lipophilicity. The solubilities in ethanol of 61 gases and vapors also yielded a four-parameter equation (R² = 0.969, R²cv = 0.964), where the solute-solvent intercorrelations, similar to those of methanol, include electrostatic and hydrogen-bonding interactions.
We plan to extend this work to a variety of other solvents, including polar aprotic solvents such as dimethylformamide, dimethyl sulfoxide, and nitrobenzene; polar solvents such as chloroform and ethyl acetate; and nonpolar solvents such as hexane and benzene. This will provide a matrix between the solvents and solutes; vacancies in the matrix will be calculated using the correlations already obtained. A principal
682 J. Chem. Inf. Comput. Sci., Vol. 41, No. 3, 2001 KATRITZKY ET AL.
Figure 2. Plot of the scores of the second component versus the scores of the first component with the third component loading and scale
classification given as labels to the data points. Reprinted with permission from ref 8. Copyright 1999 American Chemical Society.
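The correlation equations quoted in this section are each characterized by a squared correlation coefficient and a cross-validated counterpart. The sketch below shows one common way to compute the two numbers for a multilinear regression; it assumes leave-one-out cross-validation, which the text does not state explicitly. The function name `fit_r2_and_loo_r2` and the synthetic two-descriptor data are hypothetical and do not reproduce any of the published models.

```python
import numpy as np

def fit_r2_and_loo_r2(D, y):
    """Return (R2, leave-one-out cross-validated R2) for y ~ intercept + D."""
    A = np.column_stack([np.ones(len(y)), D])     # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - ((y - A @ coef) ** 2).sum() / ss_tot
    press = 0.0                                   # predictive residual sum of squares
    for i in range(len(y)):
        keep = np.arange(len(y)) != i             # refit without compound i
        c, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += (y[i] - A[i] @ c) ** 2           # error on the left-out compound
    return r2, 1.0 - press / ss_tot

# Synthetic example: 95 "compounds", two descriptors, a little noise.
rng = np.random.default_rng(1)
D = rng.normal(size=(95, 2))
y = 1.5 * D[:, 0] - 0.8 * D[:, 1] + 0.2 * rng.normal(size=95)

r2, r2cv = fit_r2_and_loo_r2(D, y)
print(f"R2 = {r2:.3f}, R2cv = {r2cv:.3f}")
```

Because each left-out prediction error is at least as large as the corresponding fitted residual, the cross-validated value never exceeds the fitted one; a small gap between the two, as in the equations reported here, suggests that the model is not strongly overfitted.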
component analysis on this matrix, similar to that described above for polarity scales, should provide a set of loadings which will characterize the solvents and a set of scores that will characterize the solutes. We believe that examination of the patterns for the loadings and the scores will present useful information and insight into the general phenomenon of solubility.
Gas Chromatographic Retention Times. It would be advantageous to systematize gas chromatographic (GC) retention times based on the chemical structure. We recently reviewed the enormous amount of data on the QSPR and related analyses of GC retention times.1 A systematic treatment should illuminate the structural dependencies between the eluted compound and various stationary phases in GC.
Several authors have estimated retention indices using topological descriptors.21a-e Charged partial surface area (CPSA) descriptors22 have also been successfully combined with topological and geometrical descriptors to predict retention indices of substituted pyrazines,23 polycyclic aromatic compounds,24 stimulants and narcotics,25 and anabolic steroids.26 The CPSA descriptors encode information about charge distribution and surface areas, which relates to interactions between the eluted compounds and molecules in the stationary phase.
As the polarity of the stationary phase changes, the influence of the charge distribution of the eluted molecules changes, and different descriptors become important. Therefore, each phase has to be modeled separately. Diverse classes of descriptors extend the pool of information and consequently should result in a better description of the property. Indeed, topological descriptors can be successfully combined with quantum-chemical descriptors to predict GC retention indices.27a-c Quantum chemical descriptors also encode information about the charge distribution and polarity of molecules and were capable of handling specific effects of the stationary phase. Even alone, quantum chemical descriptors can be useful for this type of study, as indicated by the theoretical linear solvation energy relationship (TLSER) established for GC retention indices.28
Our QSPR analysis of GC retention times utilized a mixed set of topological and quantum-chemical descriptors to model 152 structures, including a wide cross-section of classes of organic compounds.27b A forward procedure for the selection of molecular descriptors in the multilinear regression analysis in the CODESSA program gave a six-parameter model (R² = 0.959, R²cv = 0.955), with polarizability being the most important descriptor in the model. These results were recently reevaluated using improved procedures in CODESSA and new methods for the efficient selection of variables in the multilinear regression analysis.27c In more recent work,29 we analyzed a set of 178 methyl-branched hydrocarbons to give a four-parameter model (R² = 0.9585, R²cv = 0.9543) combining topological and quantum chemical descriptors.
Considering the amount of information encoded into descriptors, insight into the general phenomenon of gas-solid absorption could be obtained by combining QSPRs and subsequent PCAs of a matrix of retention times of a diverse set of compounds using a range of solid phases in GC. It would of course be necessary to make all the GC measurements under the same experimental conditions such as the length of the column, the temperature of the column, the nature of the carrier gas, and the speed of the carrier gas.
Sensory Properties. In unpublished work,7b we have provided QSPRs for the sweetness property, defined as the dimensionless ratio of the concentration of the alternative sweetener to the concentration of sucrose, which has an equally sweet taste. For a comprehensively referenced set of 348 natural and artificial sweeteners, the treatment of data using the linear and nonlinear regression methods of the CODESSA software package resulted in a global three-parameter correlation with R² = 0.71. Significantly more reliable models were developed for various subclasses of compounds (peptides, aldoximes, acesulfamates and sulfamates, guanidines, ureas and thioureas, and various natural sweeteners).
Following the general idea now expounded, it would be of substantial interest to extend this investigation by applying the QSPR treatment to other sensory properties of compounds. It is known that taste reception is localized in four regions of the tongue, corresponding to the sensations of sweetness, saltiness, sourness, and bitterness, each related to different receptors.30 Nevertheless, according to the approach described above for other properties, all these gustatory properties should be treatable simultaneously using a combination of QSPR with PCA. Furthermore, it has been observed that the gustatory properties of certain compounds can be interrelated with the corresponding olfactory properties.7a,31a,b Consequently, a combined QSPR/PCA treatment may also be feasible for the sets of data on both sensory properties. Extensive data on olfactory properties have been collected and systematized using the QSAR approach.32
General Treatment of Toxicities. A general treatment could determine underlying relationships between different measures of toxicity. Although toxicity is far more complex than the topics previously discussed in this paper, we believe that the method could make a significant contribution to the analysis, classification, and understanding of toxicity.
A comparison with the treatment of solvent polarity scales mentioned above is illuminating. Between 100 and 200 solvent polarity scales have been formulated, and perhaps 400 or 500 solvents were examined. The numbers for toxicity are far larger: many different measures of toxicity have been used depending on species, concentration, mode of application, and duration. The number of compounds on which at least one measure of toxicity has been obtained ranges up to six figures. Despite this complexity, the method could investigate (i) general interrelationships between various types of toxicity and (ii) interrelationships between structures in determining toxicity.
The enormous amount of experimental data available makes this attempt challenging. Moreover, considering the data as a matrix of compounds against toxicities, the matrix is very fragmentary: there are far more missing than available data points. Work on multidimensionality problems so far has centered on much simpler topics. It is very difficult to compare the performances of the multitude of different models reported for the prediction of toxicity because they refer to a multitude of situations: different toxicological endpoints, chemical descriptors, mathematical algorithms, and data sets. Some toxicological endpoints can be explained more easily than others. For instance, narcosis in fish is related to nonspecific mechanisms, while carcinogenesis is the result of several complex phenomena involving many biological and chemical steps. Furthermore, it is easier to model the toxicity of a congeneric set of compounds, for instance, a homologous series, while it is more difficult to extrapolate the behavior of chemicals of vastly diverse chemical classes.
In addition, there is the problem of the variability of the biological data arising from the chemical purity of the compound under study, the variability of the protocol, and biological variability. We may be able to avoid much of the variability resulting from the chemical purity and protocol, but the reproducibility of biological tests is much lower than that for other properties, such as gas chromatographic retention times. Such variability is particularly relevant when dealing with reduced sets of data.
QSAR models for various toxicities have been collected.33 Hermens co-coordinated a project in which QSAR models for aquatic toxicity were reviewed:34 log P was the parameter most frequently related to toxicity, but it is insufficient to explain all the toxicological properties.35a,b Many other descriptors can be used in order to predict toxicity better; for example, Basak et al. compared topological, geometrical, and quantum-chemical parameters in predicting mutagenicity,36 aquatic toxicity,36 and dermal penetration37 of chemicals.38 Of course, chemical descriptors can be combined and selected, to take advantage of the most useful parameters; this has been done, for instance, in a study of genotoxicity using multilinear regression.39
A huge number of parameters describing a compound can be measured or computed, but how to deal with this high-dimensional information is a problem. In many cases no a priori knowledge on the role of parameters in determining a property is available. In this situation, a selection of the variables is needed to reduce the complexity of the description, using for instance PCAs (which imply linearity of the model) or genetic algorithms (which may also take into account nonlinearity).
The complexity of toxicity stems from the following: the toxicological aspect, the chemical information, the mathematical approach, the dimension, and the diversity of the set of chemicals. To investigate all of these points, in an ongoing project a data set of compounds presenting six different toxicological endpoints has been compiled. About 200 chemical descriptors were calculated for these compounds, and different computational models were evaluated. Preliminary results40a,b indicated the feasibility of the approach; however, a wider data set is required, both for the number of compounds and for the number of toxicological endpoints.
Quo Vadis? We are suggesting a transition from the familiar one-dimensional QSAR/QSPR treatments, where the variation of a single property with structure is studied, to a general multidimensional treatment. This implies the simultaneous study of many descriptors or the study of the utilization of orthogonal variables extracted from many descriptors in the development of QSAR/QSPR models. Such models should be based solely on parameters that can be calculated directly from the molecular structure using computer algorithms without any input of experimental data. This is essential because even the simplest experimental properties are not available for many known environmental
pollutants and most chemicals of real or virtual combinatorial libraries. This more general approach is also advantageous for real applications: for example, it is much more useful to have a model that takes into account the numerous toxicological endpoints related to an aquatic ecosystem than a model which accounts only for a single endpoint such as lethality in Daphnia. Thus, the approach should provide additional insight into QSAR/QSPR by the application of data-reduction methods such as PCA to property/structure matrices.
ACKNOWLEDGMENT
We thank Dr. Dennis Hall for comments on the draft manuscript.
REFERENCES AND NOTES
(1) Katritzky, A. R.; Maran, U.; Lobanov, V. S.; Karelson, M. Structurally Diverse Quantitative Structure-Property Relationship Correlations of Technologically Relevant Physical Properties. J. Chem. Inf. Comput. Sci. 2000, 40, 1-18.
(2) (a) Basak, S. C.; Magnuson, V. R.; Niemi, G. J.; Regal, R. R. Topological Indexes: Their Nature, Mutual Relatedness, and Applications. Math. Modell. 1987, 8, 300-305. (b) Basak, S. C.; Niemi, G. J.; Regal, R. R.; Veith, G. D. Determining Structural Similarity of Chemicals Using Graph-Theoretic Indexes. Discrete Appl. Math. 1988, 19 (1-3), 17-44.
(3) (a) Basak, S. C.; Grunwald, G. D.; Host, G. E.; Niemi, G. J.; Bradbury, S. P. A Comparative Study of Molecular Similarity, Statistical, and Neural Methods for Predicting Toxic Modes of Action. Environ. Toxicol. Chem. 1998, 17, 1056-1064. (b) Basak, S. C.; Grunwald, G. D. Molecular Similarity and Estimation of Molecular-Properties. J. Chem. Inf. Comput. Sci. 1995, 35, 366-372. (c) Basak, S. C.; Niemi, G. J.; Veith, G. D. Predicting Properties of Molecules Using Graph Invariants. J. Math. Chem. 1991, 7, 243-272. (d) Basak, S. C.; Grunwald, G. D. Use of Topological Space and Property Space in Selecting Structural Analogs. Math. Modell. Sci. Comput., in press. (e) Basak, S. C.; Grunwald, G. D. Development and Application of Molecular Similarity Methods Using Nonempirical Parameters. Math. Modell. Sci. Comput., in press. (f) Basak, S. C.; Grunwald, G. D. Quantitative Comparison of Five Molecular Structure Spaces in Selecting Analogs of Chemicals. Math. Modell. Sci. Comput., in press. (g) Xue, L.; Bajorath, J. Molecular Descriptors for Effective Classification of Biologically Active Compounds Based on Principal Component Analysis Identified by a Genetic Algorithm. J. Chem. Inf. Comput. Sci. 2000, 40, 801-809. (h) Basak, S. C.; Grunwald, G. D. Tolerance Space and Molecular Similarity. SAR QSAR Environ. Res. 1995, 3, 265-277.
(4) Basak, S. C.; Gute, B. D.; Grunwald, G. D. Characterization of the Molecular Similarity of Chemicals Using Topological Invariants. In Advances in Molecular Similarity; JAI Press: Greenwich, CT, 1996; Vol. 2, pp 171-185.
(5) Balasubramanian, K.; Basak, S. C. Characterization of Isospectral Graphs Using Graph Invariants and Derived Orthogonal Parameters. J. Chem. Inf. Comput. Sci. 1998, 38, 367-373.
(6) Auer, C. M.; Nabholz, J. V.; Baetcke, K. P. Mode of Action and the Assessment of Chemical Hazards in the Presence of Limited Data: Use of Structure-Activity Relationships (SAR) Under TSCA, Section 5. Environ. Health Perspect. 1990, 87, 183-197.
(7) (a) Jurs, P. C.; Bakken, G. A.; McClelland, H. E. Computational Methods for the Analysis of Chemical Sensor Array Data from Volatile Analytes. Chem. Rev. 2000, 100, 2649-2678. (b) Katritzky, A. R.; Petrukhin, R.; Karelson, M.; Prakash, I.; Desai, N. Sweetness Correlations Using CODESSA. Part I. Manuscript in preparation.
(8) Katritzky, A. R.; Tamm, T.; Wang, Y.; Karelson, M. A Unified Treatment of Solvent Properties. J. Chem. Inf. Comput. Sci. 1999, 39, 692-698.
(9) Karelson, M.; Lobanov, V. S.; Katritzky, A. R. Quantum-Chemical Descriptors in QSAR/QSPR Studies. Chem. Rev. 1996, 96, 1027-1043.
(10) Karelson, M. Molecular Descriptors in QSAR/QSPR; John Wiley & Sons: New York, 2000.
(11) (a) Basak, S. C. Use of Molecular Complexity Indices in Predictive Pharmacology and Toxicology: A QSAR Approach. Med. Sci. Res. 1987, 15, 605-609. (b) Johnson, M.; Basak, S. C.; Maggiora, G. A Characterization of Molecular Similarity Methods for a Property Prediction. Math. Comput. Modell. 1988, 11, 630-634.
(12) (a) Cramer, R. D.; Patterson, D. E.; Bunce, J. D. Comparative Molecular Field Analysis (CoMFA). 1. Effect of Shape on Binding of Steroids to Carrier Proteins. J. Am. Chem. Soc. 1988, 110, 5959-5967. (b) Höskuldsson, A. PLS Regression Methods. J. Chemom. 1988, 2, 211-228.
(13) (a) Lucic, B.; Trinajstic, N. Multivariate Regression Versus Artificial Neural Networks in QSAR. Second Indo-US Workshop on Mathematical Chemistry. Duluth, MN, May 30-June 3, 2000. (b) Balaban, A. T.; Basak, S. C.; Colburn, T.; Grunwald, G. D. Correlation between Structure and Normal Boiling Points of Haloalkanes C1-C4 Using Neural Networks. J. Chem. Inf. Comput. Sci. 1994, 34, 1118-1121.
(14) (a) Gini, G.; Lorenzini, M.; Benfenati, E.; Grasso, P.; Bruschi, M. Predictive Carcinogenicity: A Model for Aromatic Compounds, with Nitrogen-Containing Substituents, Based on Molecular Descriptors Using an Artificial Neural Network. J. Chem. Inf. Comput. Sci. 1999, 39, 1076-1080. (b) Basak, S. C.; Gute, B. D.; Grunwald, G. D.; Opitz, D. W.; Balasubramanian, K. Use of Statistical and Neural Net Methods in Predicting Toxicity of Chemicals: A Hierarchical QSAR Approach. In Predictive Toxicology of Chemicals: Experiences and Impact of AI Tools; Gini, G. C., Katritzky, A. R., Eds.; AAAI 1999 Spring Symposium Series; AAAI Press: Menlo Park, CA, 1999; pp 108-111.
(15) Benfenati, E.; Gini, G. Computational Predictive Programs (Expert Systems) in Toxicology. Toxicology 1997, 119, 213-225.
(16) Basak, S. C.; Mills, D.; Gute, B. D.; Balaban, A. T.; Basak, K.; Grunwald, G. D. Use of Mathematical Structural Invariants in Analyzing Combinatorial Libraries: A Case Study with Psoralen Derivatives. In Aspects of Mathematical Chemistry; Sinha, D. K., Basak, S. C., Mohanty, R. K., Basumallick, I. N., Eds.; Visva Bharati University Press: in press.
(17) Benfenati, E.; Lorenzini, P.; Grasso, P.; Gini, G. Classification Experiments for the Prediction of Pesticide Ecotoxicity. Second Indo-US Workshop on Mathematical Chemistry. Duluth, MN, May 30-June 3, 2000.
(18) Katritzky, A. R.; Tamm, T.; Wang, Y.; Sild, S.; Karelson, M. QSPR Treatment of Solvent Scales. J. Chem. Inf. Comput. Sci. 1999, 39, 684-691.
(19) Katritzky, A. R.; Mu, L.; Karelson, M. A QSPR Study of the Solubility of Gases and Vapors in Water. J. Chem. Inf. Comput. Sci. 1996, 36, 1162-1168.
(20) Katritzky, A. R.; Tatham, D. B.; Maran, U. The Correlation of the Solubility of Gases and Vapors in Methanol and Ethanol with Their Molecular Structures. J. Chem. Inf. Comput. Sci. Accepted for publication.
(21) (a) Michotte, Y.; Massart, D. L. Molecular Connectivity and Retention Indexes. J. Pharm. Sci. 1977, 66, 1630-1632. (b) Bonchev, D.; Mekenjan, O.; Protic, G.; Trinajstic, N. Application of Topological Indices to Gas Chromatographic Data: Calculation of the Retention Indices of Isomeric Alkylbenzenes. J. Chromatogr. 1979, 176, 149-156. (c) Kier, L. B.; Hall, L. H. Molecular Connectivity Analysis of Structure Influencing Chromatographic Retention Indices. J. Pharm. Sci. 1979, 68, 120-122. (d) Duvenbeck, Ch.; Zinn, P. List Operations on Chemical Graphs. 3. Development of Vertex and Edge Models for Fitting Retention Index Data. J. Chem. Inf. Comput. Sci. 1993, 33, 211-219. (e) Duvenbeck, Ch.; Zinn, P. List Operations on Chemical Graphs. 4. Using Edge Models for Prediction of Retention Index Data. J. Chem. Inf. Comput. Sci. 1993, 33, 220-230.
(22) Stanton, D. T.; Jurs, P. C. Development and Use of Charged Partial Surface Area Structural Descriptors in Computer-Assisted Quantitative Structure-Property Relationship Studies. Anal. Chem. 1990, 62, 2323-2329.
(23) Stanton, D. T.; Jurs, P. C. Computer-Assisted Prediction of Gas Chromatographic Retention Indices of Pyrazines. Anal. Chem. 1989, 61, 1328-1332.
(24) Whalen-Pedersen, E. K.; Jurs, P. C. Calculation of Linear Temperature Programmed Capillary Gas Chromatographic Retention Indices of Polycyclic Aromatic Compounds. Anal. Chem. 1981, 53, 2184-2187.
(25) Georgakopoulos, C. G.; Kiburis, J. C.; Jurs, P. C. Prediction of Gas Chromatographic Relative Retention Times of Stimulants and Narcotics. Anal. Chem. 1991, 63, 2021-2024.
(26) Georgakopoulos, C. G.; Tsika, O. G.; Kiburis, J. C.; Jurs, P. C. Prediction of Gas-Chromatographic Relative Retention Times of Anabolic Steroids. Anal. Chem. 1991, 63, 2025-2028.
(27) (a) Buydens, L.; Massart, D. L.; Geerlings, P. Prediction of Gas Chromatographic Retention Indexes with Topological, Physicochemical, and Quantum Chemical Parameters. Anal. Chem. 1983, 55, 738-744. (b) Katritzky, A. R.; Ignatchenko, E. S.; Barcock, R. A.; Lobanov, V. S.; Karelson, M. Prediction of Gas Chromatographic Retention Times and Response Factors Using a General Quantitative Structure-Property Relationship Treatment. Anal. Chem. 1994, 66, 1799-1807. (c) Lucic, B.; Trinajstic, N.; Sild, S.; Karelson, M.; Katritzky, A. R. A New Efficient Approach for Variable Selection Based on Multiregression: Prediction of Gas Chromatographic Retention Times and Response Factors. J. Chem. Inf. Comput. Sci. 1999, 39, 610-621.
(28) Donovan, W. H.; Famini, G. R. Using Theoretical Descriptors in Structure Activity Relationships: Retention Indices of Sulfur Vesicants and Related Compounds. J. Chem. Soc., Perkin Trans. 2 1996, 83-89.
(29) Katritzky, A. R.; Chen, K.; Maran, U.; Carlson, D. A. QSPR Correlation and Predictions of GC Retention Indexes for Methyl-Branched Hydrocarbons Produced by Insects. Anal. Chem. 2000, 72, 101-109.
(30) Shallenberger, R. S. Taste Recognition Chemistry. Pure Appl. Chem. 1997, 69, 659-666.
(31) (a) Nahon, D. F.; Roozen, J. P.; De Graaf, C. Sensory Evaluation of Mixtures of Maltitol or Aspartame, Sucrose and an Orange Aroma. Chem. Senses 1998, 23, 59-66. (b) Nahon, D. F.; Roozen, J. P.; De Graaf, C. Sensory Evaluation of Mixtures of Sodium Cyclamate, Sucrose and an Orange Aroma. J. Agric. Food Chem. 1998, 46, 3426-3430.
(32) Rossiter, K. J. Structure-Odor Relationships. Chem. Rev. 1996, 96, 3201-3240.
(33) http://clogp.pomona.edu/medchem/chem/qsar-db/search.html.
(34) Hermens, J. QSAR for Prediction of Fate and Effects of Chemicals in the Environment. Final Report, European Commission, Project EV5V-CT92-0211.
(35) (a) Russom, C. L.; Bradbury, S. P.; Broderius, S. J.; Hammermeister, D. E.; Drummond, R. A. Predicting Modes of Toxic Action from Chemical Structure: Acute Toxicity in the Fathead Minnow (Pimephales promelas). Environ. Toxicol. Chem. 1997, 16, 948-967. (b) Klopman, G.; Saiakhov, R.; Rosenkranz, H. S.; Hermens, J. L. M. Multiple Computer-Automated Structure Evaluation Program Study of Aquatic Toxicity 1: Guppy. Environ. Toxicol. Chem. 1999, 18, 2497-2505.
(36) Basak, S. C.; Gute, B. D.; Grunwald, G. D. Assessment of the Mutagenicity of Aromatic Amines from Theoretical Structural Parameters: A Hierarchical Approach. SAR QSAR Environ. Res. 1999, 10, 117-129.
(37) Gute, B. D.; Grunwald, G. D.; Basak, S. C. Prediction of the Dermal Penetration of Polycyclic Aromatic Hydrocarbons (PAHs): A Hierarchical QSAR Approach. SAR QSAR Environ. Res. 1999, 10, 1-15.
(38) Basak, S. C.; Gute, B. D.; Grunwald, G. D. Relative Effectiveness of Topological, Geometrical, and Quantum-Chemical Parameters in Investigating Mutagenicity of Chemicals. In Quantitative Structure-Activity Relationships in Environmental Sciences, VII; Chen, F., Schuurmann, G., Eds.; SETAC Press: Pensacola, FL, 1997; pp 245-261.
(39) Maran, U.; Karelson, M.; Katritzky, A. R. A Comprehensive QSAR Treatment of the Genotoxicity of Heteroaromatic and Aromatic Amines. Quant. Struct.-Act. Relat. 1999, 18, 3-10.
(40) (a) Benfenati, E.; Pelagatti, S.; Grasso, P.; Gini, G. COMET: The Approach of a Project in Evaluating Toxicity. In Predictive Toxicology of Chemicals: Experiences and Impact of AI Tools; Gini, G. C., Katritzky, A. R., Eds.; AAAI 1999 Spring Symposium Series; AAAI Press: Menlo Park, CA, 1999; pp 40-43. (b) Gini, G.; Lorenzini, M.; Vittore, A.; Benfenati, E.; Grasso, P. Some Results for the Prediction of Carcinogenicity Using Hybrid Systems. In Predictive Toxicology of Chemicals: Experiences and Impact of AI Tools; Gini, G. C., Katritzky, A. R., Eds.; AAAI 1999 Spring Symposium Series; AAAI Press: Menlo Park, CA, 1999; pp 139-143.
CI000134W