
METHODS IN MOLECULAR BIOLOGY™

Series Editor
John M. Walker
School of Life Sciences
University of Hertfordshire
Hatfield, Hertfordshire, AL10 9AB, UK

For further volumes:
http://www.springer.com/series/7651
Computational Toxicology

Volume I

Edited by

Brad Reisfeld
Chemical and Biological Engineering & School of Biomedical Engineering
Colorado State University, Colorado, USA

Arthur N. Mayeno
Chemical and Biological Engineering,
Colorado State University, Colorado, USA
Editors
Brad Reisfeld
Chemical and Biological Engineering & School of Biomedical Engineering
Colorado State University
Colorado, USA

Arthur N. Mayeno
Chemical and Biological Engineering
Colorado State University
Colorado, USA

ISSN 1064-3745
ISSN 1940-6029 (electronic)
ISBN 978-1-62703-049-6
ISBN 978-1-62703-050-2 (eBook)
DOI 10.1007/978-1-62703-050-2
Springer New York Heidelberg Dordrecht London
Library of Congress Control Number: 2012947026

© Springer Science+Business Media, LLC 2012


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction
on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation,
computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this
legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for
the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the
Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions
for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution
under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not
imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and
regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the
authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be
made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Humana Press is a brand of Springer


Springer is part of Springer Science+Business Media (www.springer.com)
Preface

Rapid advances in computer science, biology, chemistry, and other disciplines are enabling
powerful new computational tools and models for toxicology and pharmacology. These
computational tools hold tremendous promise for advancing applied and basic science,
from streamlining drug efficacy and safety testing to increasing the efficiency and effective-
ness of risk assessment for environmental chemicals. These approaches also offer the
potential to improve experimental design, reduce the overall number of experimental trials
needed, and decrease the number of animals used in experimentation.
Computational approaches are ideally suited to organize, process, and analyze the vast
libraries and databases of scientific information and to simulate complex biological phe-
nomena. For instance, they allow researchers to (1) investigate toxicological and pharma-
cological phenomena across a wide range of scales of biological organization
(molecular ↔ cellular ↔ organism), (2) incorporate and analyze multiple biochemical
and biological interactions, (3) simulate biological processes and generate hypotheses
based on model predictions, which can be tested via targeted experimentation in vitro or
in vivo, (4) explore the consequences of inter- and intra-species differences and population
variability on toxicological and pharmacological outcomes, and (5) extrapolate biological responses
across individuals, species, and a range of dose levels.
Despite the exceptional promise of computational approaches, there are presently very
few resources that focus on providing guidance on the development and practice of these
tools to solve problems and perform analyses in this area. This volume was conceived as
part of the Methods in Molecular Biology series to meet this need and to provide both
biomedical and quantitative scientists with essential background, context, examples, useful
tips, and an overview of current developments in the field. To this end, we present a
collection of practical techniques and software in computational toxicology, illustrated with
relevant examples drawn principally from the fields of environmental and pharmaceutical
sciences. These computational techniques can be used to analyze and simulate a myriad of
multi-scale biochemical and biological phenomena occurring in humans and other animals
following exposure to environmental toxicants or dosing with drugs.
This book (the first in a two-volume set) is organized into four parts each covering a
methodology or topic, subdivided into chapters that provide background, theory, and
illustrative examples. Each part is generally self-contained, allowing the reader to start with
any part, although some knowledge of concepts from other parts may be assumed. Part I
introduces the field of computational toxicology and its current or potential applications.
Part II outlines the principal elements of mathematical and computational modeling, and
accepted best practices and useful guidelines. Part III discusses the use of computational
techniques and databases to predict chemical properties and toxicity, as well as the use of
molecular dynamics. Part IV delineates the elements and approaches to pharmacokinetic
and pharmacodynamic modeling, including non-compartmental and compartmental mod-
eling, modeling of absorption, prediction of pharmacokinetic parameters, physiologically
based pharmacokinetic modeling, and mechanism-based pharmacodynamic modeling;
chemical mixture and population effects, as well as interspecies extrapolation, are also
described and illustrated.


Although a complete picture of toxicological risk often involves an analysis of environmental
transport, we believe that this expansive topic is beyond the scope of this volume,
and it will not be covered here; overviews of computational techniques in this area are
contained in a variety of excellent references [1–4].
Computational techniques are increasingly allowing scientists to gain new insights into
toxicological phenomena, integrate (and interpret) the results from a wide variety of
experiments, and develop more rigorous and quantitative means of assessing chemical
safety and toxicity. Moreover, these techniques can provide valuable insights before initiat-
ing expensive laboratory experiments and into phenomena not easily amenable to experi-
mental analysis, e.g., detection of highly reactive, transient, or trace-level species in
biological milieu. We believe that the unique collection of explanatory material, software,
and illustrative examples in Computational Toxicology will allow motivated readers to
participate in this exciting field and undertake a diversity of realistic problems of interest.
We would like to express our sincere thanks to our authors whose enthusiasm and
diverse contributions have made this project possible.

Colorado, USA
Brad Reisfeld
Arthur N. Mayeno

References

1. Clark, M.M., Transport modeling for environmental engineers and scientists. 2nd ed. 2009, Hoboken, NJ: Wiley.
2. Hemond, H.F. and Fechner-Levy, E.J., Chemical fate and transport in the environment. 2nd ed. 2000, San Diego: Academic Press.
3. Logan, B.E., Environmental transport processes. 1999, New York: Wiley.
4. Nirmalakhandan, N., Modeling tools for environmental engineers and scientists. 2002, Boca Raton, FL: CRC Press.
Contents

Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix

PART I INTRODUCTION
1 What is Computational Toxicology? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Brad Reisfeld and Arthur N. Mayeno
2 Computational Toxicology: Application in Environmental Chemicals . . . . . . . . . . . 9
Yu-Mei Tan, Rory Conolly, Daniel T. Chang, Rogelio Tornero-Velez,
Michael R. Goldsmith, Shane D. Peterson, and Curtis C. Dary
3 Role of Computational Methods in Pharmaceutical Sciences . . . . . . . . . . . . . . . . . . 21
Sandhya Kortagere, Markus Lill, and John Kerrigan

PART II MATHEMATICAL AND COMPUTATIONAL MODELING

4 Best Practices in Mathematical Modeling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Lisette G. de Pillis and Ami E. Radunskaya
5 Tools and Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Arthur N. Mayeno and Brad Reisfeld

PART III CHEMINFORMATICS AND CHEMICAL PROPERTY PREDICTION

6 Prediction of Physicochemical Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
John C. Dearden
7 Informing Mechanistic Toxicology with Computational Molecular Models . . . . . . 139
Michael R. Goldsmith, Shane D. Peterson, Daniel T. Chang,
Thomas R. Transue, Rogelio Tornero-Velez,
Yu-Mei Tan, and Curtis C. Dary
8 Chemical Structure Representations and Applications
in Computational Toxicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
Muthukumarasamy Karthikeyan and Renu Vyas
9 Accessing and Using Chemical Property Databases . . . . . . . . . . . . . . . . . . . . . . . . . . 193
Janna Hastings, Zara Josephs, and Christoph Steinbeck
10 Accessing, Using, and Creating Chemical Property Databases
for Computational Toxicology Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
Antony J. Williams, Sean Ekins, Ola Spjuth, and Egon L. Willighagen
11 Molecular Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
Xiaolin Cheng and Ivaylo Ivanov


PART IV PHARMACOKINETIC AND PHARMACODYNAMIC MODELING


12 Introduction to Pharmacokinetics in Clinical Toxicology . . . . . . . . . . . . . . . . . . . . . 289
Pavan Vajjah, Geoffrey K. Isbister, and Stephen B. Duffull
13 Modeling of Absorption. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
Walter S. Woltosz, Michael B. Bolger, and Viera Lukacova
14 Prediction of Pharmacokinetic Parameters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
A.K. Madan and Harish Dureja
15 Ligand- and Structure-Based Pregnane X Receptor Models . . . . . . . . . . . . . . . . . . . 359
Sandhya Kortagere, Matthew D. Krasowski, and Sean Ekins
16 Non-compartmental Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
Johan Gabrielsson and Daniel Weiner
17 Compartmental Modeling in the Analysis of Biological Systems . . . . . . . . . . . . . . . 391
James B. Bassingthwaighte, Erik Butterworth,
Bartholomew Jardine, and Gary M. Raymond
18 Physiologically Based Pharmacokinetic/Toxicokinetic Modeling . . . . . . . . . . . . . . . 439
Jerry L. Campbell, Rebecca A. Clewell, P. Robinan Gentry,
Melvin E. Andersen, and Harvey J. Clewell III
19 Interspecies Extrapolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
Elaina M. Kenyon
20 Population Effects and Variability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
Jean Lou Dorne, Billy Amzal, Frédéric Bois, Amélie Crépet,
Jessica Tressou, and Philippe Verger
21 Mechanism-Based Pharmacodynamic Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 583
Melanie A. Felmlee, Marilyn E. Morris, and Donald E. Mager

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601
List of Contributors

HERVÉ ABDI  School of Behavioral and Brain Sciences, The University of Texas
at Dallas, Richardson, TX, USA
BILLY AMZAL  LA-SER Europe Ltd, London, UK
MELVIN E. ANDERSEN  The Hamner Institutes for Health Sciences,
Research Triangle Park, NC, USA
JAMES B. BASSINGTHWAIGHTE  Department of Bioengineering, University
of Washington, Seattle, WA, USA
FRÉDÉRIC Y. BOIS  Royallieu Research Center, Technological University
of Compiegne, Compiegne, France; INERIS, DRC/VIVA/METO,
Verneuil en Halatte, France
MICHAEL B. BOLGER  Simulations Plus, Inc., Lancaster, CA, USA
ERIK BUTTERWORTH  Department of Bioengineering, University of Washington, Seattle,
WA, USA
JERRY L. CAMPBELL JR.  The Hamner Institutes for Health Sciences,
Research Triangle Park, NC, USA
DANIEL T. CHANG  National Exposure Research Laboratory, US Environmental
Protection Agency, Research Triangle Park, NC, USA
XIAOLIN CHENG  Oak Ridge National Laboratory, UT/ORNL Center
for Molecular Biophysics, Oak Ridge, TN, USA; Department of Biochemistry
and Cellular and Molecular Biology, University of Tennessee, Knoxville,
TN, USA
HARVEY J. CLEWELL III  The Hamner Institutes for Health Sciences, Research Triangle
Park, NC, USA
REBECCA A. CLEWELL  The Hamner Institutes for Health Sciences,
Research Triangle Park, NC, USA
JEAN PAUL COMET  I3S laboratory, UMR 6070 CNRS, University of Nice-Sophia
Antipolis, Sophia Antipolis, France
RORY CONOLLY  National Health and Environmental Effects Research Laboratory,
U.S. Environmental Protection Agency, Research Triangle Park, NC, USA
AMÉLIE CRÉPET  French Agency for Food, Environment and Occupational Health Safety
(ANSES), Maisons-Alfort, France
CURTIS C. DARY  National Exposure Research laboratory, US Environmental Protection
Agency, Research Triangle Park, NC, USA
LISETTE G. DE PILLIS  Department of Mathematics, Harvey Mudd College,
Claremont, CA, USA
JEAN LOU DORNE  Emerging Risks Unit, European Food Safety Authority,
Parma, Italy
STEPHEN B. DUFFULL  School of Pharmacy, University of Otago, Otago,
New Zealand
HARISH DUREJA  M. D. University, Rohtak, India


SEAN EKINS  Collaborations in Chemistry, Fuquay Varina, NC, USA; Department
of Pharmaceutical Sciences, University of Maryland, Baltimore, MD, USA; Department
of Pharmacology, University of Medicine & Dentistry
of New Jersey (UMDNJ)-Robert Wood Johnson Medical School, Piscataway,
NJ, USA
MELANIE A. FELMLEE  Department of Pharmaceutical Sciences,
University at Buffalo, State University of New York, Buffalo, NY, USA
JOHAN GABRIELSSON  Division of Pharmacology and Toxicology,
Department of Biomedical Sciences and Veterinary Public Health,
Swedish University of Agricultural Sciences, Uppsala, Sweden
P. ROBINAN GENTRY  Environ International Corporation, Monroe, LA, USA
MICHAEL R. GOLDSMITH  National Exposure Research Laboratory,
US Environmental Protection Agency, Research Triangle Park, NC, USA
JANNA HASTINGS  European Bioinformatics Institute, Hinxton, UK
GEOFFREY K. ISBISTER  Department of Clinical Toxicology and Pharmacology,
Calvary Mater Newcastle, University of Newcastle, Newcastle, NSW, Australia;
Discipline of Clinical Pharmacology, University of Newcastle,
Newcastle, NSW, Australia
IVAYLO IVANOV  Department of Chemistry, Georgia State University, Atlanta,
GA, USA
BARTHOLOMEW JARDINE  Department of Bioengineering, University of Washington,
Seattle, WA, USA
ZARA JOSEPHS  European Bioinformatics Institute, Hinxton, UK
MUTHUKUMARASAMY KARTHIKEYAN  National Chemical Laboratory,
Digital Information Resource Centre & Centre of Excellence in Scientific Computing,
Pune, India
ELAINA KENYON  Pharmacokinetics Branch, Integrated Systems Toxicology Division, MD
B105-03, National Health and Environmental Effects Research Laboratory, Office of
Research and Development, U.S. Environmental Protection Agency,
Research Triangle Park, NC, USA
JOHN KERRIGAN  Cancer Institute of New Jersey, Robert Wood Johnson
Medical School, New Brunswick, NJ, USA
SANDHYA KORTAGERE  Department of Microbiology and Immunology, Drexel University
College of Medicine, Philadelphia, PA, USA
MATTHEW D. KRASOWSKI  Department of Pathology, University of Iowa
Hospitals and Clinics, Iowa City, IA, USA
MARKUS LILL  Department of Medicinal Chemistry and Molecular Pharmacology,
Purdue University, West Lafayette, IN, USA
VIERA LUKACOVA  Simulations Plus, Inc., Lancaster, CA, USA
A.K. MADAN  Pt. B.D. Sharma University of Health Sciences, Rohtak, India
DONALD E. MAGER  Department of Pharmaceutical Sciences, University
at Buffalo, State University of New York, Buffalo, NY, USA
ARTHUR N. MAYENO  Department of Chemical and Biological Engineering,
Colorado State University, Fort Collins, CO, USA
MARILYN E. MORRIS  Department of Pharmaceutical Sciences,
University at Buffalo, State University of New York, Buffalo, NY, USA

SHANE D. PETERSON  National Exposure Research Laboratory, US Environmental
Protection Agency, Research Triangle Park, NC, USA
AMI E. RADUNSKAYA  Department of Mathematics, Pomona College,
Claremont, CA, USA
GARY M. RAYMOND  Department of Bioengineering, University of Washington,
Seattle, WA, USA
BRAD REISFELD  Department of Chemical and Biological Engineering,
Colorado State University, Fort Collins, CO, USA
OLA SPJUTH  Department of Pharmaceutical Biosciences, Uppsala University, Uppsala,
Sweden; Swedish e-Science Research Center, Royal Institute of Technology, Stockholm,
Sweden
CHRISTOPH STEINBECK  European Bioinformatics Institute, Hinxton, UK
YU-MEI TAN  National Exposure Research Laboratory, U.S. Environmental
Protection Agency, Research Triangle Park, NC, USA
ROGELIO TORNERO-VELEZ  National Exposure Research Laboratory, U.S. Environmental
Protection Agency, Research Triangle Park, NC, USA
THOMAS R. TRANSUE  Lockheed Martin Information Technology,
Research Triangle Park, NC, USA
JESSICA TRESSOU  National Institute for Agronomic Research (INRA),
Paris, France
PAVAN VAJJAH  School of Pharmacy, University of Otago, Otago, New Zealand; Systems
Pharmacology Group, Simcyp Ltd, Sheffield, UK
PHILIPPE VERGER  Department of Food Safety and Zoonoses, World Health Organization,
Geneva, Switzerland
RENU VYAS  Department of Bioinformatics and Computer Science,
Dr. D.Y. Patil Biotechnology and Bioinformatics Institute, Pune, India
DANIEL WEINER  Division of Certara, Pharsight Corporation, Cary, NC, USA
ANTONY J. WILLIAMS  Royal Society of Chemistry, Wake Forest, NC, USA
EGON L. WILLIGHAGEN  Department of Pharmaceutical Biosciences, Uppsala University,
Uppsala, Sweden; Division of Molecular Toxicology, Institute of Environmental Medicine,
Karolinska Institutet, Stockholm, Sweden;
Department of Bioinformatics - BiGCaT, Maastricht University,
Universiteitssingel 50, Maastricht, The Netherlands
WALTER S. WOLTOSZ  Simulations Plus, Inc., Lancaster, CA, USA
Part I

Introduction
Chapter 1

What is Computational Toxicology?


Brad Reisfeld and Arthur N. Mayeno

Abstract
Computational toxicology is a vibrant and rapidly developing discipline that integrates information and
data from a variety of sources to develop mathematical and computer-based models to better understand
and predict adverse health effects caused by chemicals, such as environmental pollutants and pharmaceu-
ticals. Encompassing medicine, biology, biochemistry, chemistry, mathematics, computer science, engi-
neering, and other fields, computational toxicology investigates the interactions of chemical agents and
biological organisms across many scales (e.g., population, individual, cellular, and molecular). This multi-
disciplinary field has applications ranging from hazard and risk prioritization of chemicals to safety screening
of drug metabolites, and has active participation and growth from many organizations, including govern-
ment agencies, not-for-profit organizations, private industry, and universities.

Key words: Computational toxicology, Computational chemistry, Computational biology, Systems
biology, Risk assessment, Safety assessment

1. Introduction

There are over 80,000 chemicals in common use worldwide, and
hundreds of new chemicals and chemical mixtures are introduced
into commerce each year. Because chemical safety has traditionally
been assessed using expensive and time-consuming animal-based
toxicity tests, only a small fraction of these chemicals have been
adequately assessed for potential risk.
Aside from these environmentally relevant chemicals, drugs are
another class of chemicals for which toxicity testing is crucial. For
new drugs, toxicity is still the cause of a significant number of
candidate failures during later development stages. In the pharma-
ceutical industry, toxicity assessment is hampered by the large
amounts of compound required for the in vivo studies, lack of
reliable high-throughput in vitro assays, and inability of in vitro
and animal models to correctly predict some human toxicities.

Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_1, # Springer Science+Business Media, LLC 2012


For both environmental pollutants and drugs, there is a clear
need for alternative approaches to traditional toxicity testing to help
predict the potential for toxicity and prioritize testing in light of the
limited resources. One such approach is computational toxicology.

2. What is Computational Toxicology?
The US Environmental Protection Agency (EPA) defines Compu-
tational Toxicology as “the application of mathematical and com-
puter models to predict adverse effects and to better understand the
single or multiple mechanisms through which a given chemical
induces harm.”
In a larger context, computational toxicology is an emerging
multidisciplinary field that combines knowledge of toxicity path-
ways with relevant chemical and biological data to inform the
development, verification, and testing of multi-scale computer-
based models that are used to gain insights into the mechanisms
through which a given chemical induces harm. Computational
toxicology also seeks to manage and detect patterns and interac-
tions in large biological and chemical data sets by taking advantage
of high-information-content data, novel biostatistical methods, and
computational power to analyze these data.

3. What Are Some Application Areas for Computational Toxicology?

Some of the principal application areas for computational toxicology
are (1) hazard and risk prioritization of chemicals, (2) uncovering
mechanistic information that is valuable in tailoring testing pro-
grams for each chemical, (3) safety screenings of food additives and
food contact substances, (4) supporting more sophisticated
approaches to aggregate and cumulative risk assessment, (5) estimat-
ing the extent of variability in response in the human population, (6)
pharmaceutical lead selection in drug development, (7) safety
screening and qualification of pharmaceutical contaminants and
degradation products, and (8) safety screening of drug metabolites.

4. What Are the Major Fields Comprising Computational Toxicology?

Computational toxicology is highly interdisciplinary. Researchers in
the field have backgrounds and training in toxicology, biochemistry,
chemistry, environmental sciences, mathematics, statistics, medicine,
engineering, biology, computer science, and many other disciplines.

In addition, the development of models in computational
toxicology has been supported by the emergence of numerous
“-omics” technologies, which have evolved into a number of scien-
tific disciplines, including genomics, proteomics, metabolomics,
transcriptomics, glycomics, and lipomics.

5. Who Uses Computational Toxicology?
A broad spectrum of international organizations are involved in the
development, application, and dissemination of knowledge, tools,
and data in computational toxicology. These include
l Government agencies in
 The USA (EPA, Centers for Disease Control, Food and
Drug Administration, National Institutes of Health,
Agency for Toxic Substances and Disease Registry).
 Europe (European Chemicals Agency, Institute for Health
and Consumer Protection).
 Canada (Health Canada, National Centre for Occupational
Health and Safety Information).
 Japan (National Institute of Health Sciences of Japan).
l The USA state agencies.
l Not-for-profit organizations.
l National laboratories.
l Nongovernment organizations.
l Military laboratories; private industry.
l Universities.

6. What Are Some Current Areas of Research in Computational Toxicology?

This volume covers a diverse range of applications for computational
toxicology. This trend is also reflected by recent publications
in the scientific literature. Some of the topics of papers published
recently in the area of computational toxicology include:
● Computational methods for evaluating genetic toxicology.
● Structure-based predictive toxicology.
● Informatics and machine learning in computational toxicology.
● Estimating toxicity-related biological pathways.
● Computational approaches to assessing human genetic susceptibility.
● Assessing activity profiles of chemicals evaluated across biochemical targets.
● Pharmacokinetic modeling for nanoparticles.
● Quantitative structure–activity relationships in toxicity prediction.
● In silico prediction of carcinogenic potency.
● Virtual tissues in toxicology.
● Public databases supporting computational toxicology.
● Regulatory use of computational toxicology tools and databases.
● Computational approaches to assess the impact of environmental chemicals on key transcription regulators.
● Molecular modeling for screening environmental chemicals for estrogenicity.
● Predicting inhibitors of acetylcholinesterase by machine learning approaches.
● Predicting activation enthalpies of cytochrome-P450-mediated reactions.

7. What Are Likely Future Directions in Computational Toxicology?

Progress in computational toxicology is expected to facilitate the
transformative shift in toxicology called for by the US National
Research Council in the recent report entitled “Toxicity Testing
in the 21st Century: A Vision and a Strategy.” Specifically, the
following directions are among those that will be critical in enabling
this shift:
● Broadening the usage of high-throughput screening methods to evaluate the toxicity of the backlog of thousands of industrial chemicals in the environment.
● Informing computational models through the continued and expanded use of “-omics” technologies.
● Acquiring new biological data at therapeutic or physiologically relevant exposure levels for more realistic computational endpoints.
● Predicting adverse outcomes of environmental chemical and drug exposure in specific human populations.
● Establishing curated and widely accessible databases that include both chemical and biological information.
● Creating models for characterizing gene–environment interactions.
● Developing approaches to predict cellular responses and biologically based dose–response.
● Utilizing toxicogenomics to illuminate mechanisms and bridge genotoxicity and carcinogenicity.
● Incorporating rigorous uncertainty estimates in models and simulations.
● Implementing strategies for utilizing animals more efficiently and effectively in bioassays designed to answer specific questions.
● Generating models to assess the effects of chemical mixtures by employing system-level approaches that encompass the underlying biological pathways.
● Developing virtual tissues for toxicological investigations.
Chapter 2

Computational Toxicology: Application in Environmental Chemicals

Yu-Mei Tan, Rory Conolly, Daniel T. Chang, Rogelio Tornero-Velez,
Michael R. Goldsmith, Shane D. Peterson, and Curtis C. Dary

Abstract
This chapter provides an overview of computational models that describe various aspects of the source-to-health
effect continuum. Fate and transport models describe the release, transportation, and transformation of
chemicals from sources of emission throughout the general environment. Exposure models integrate the
microenvironmental concentrations with the amount of time an individual spends in these microenviron-
ments to estimate the intensity, frequency, and duration of contact with environmental chemicals. Physio-
logically based pharmacokinetic (PBPK) models incorporate mechanistic biological information to predict
chemical-specific absorption, distribution, metabolism, and excretion. Values of parameters in PBPK models
can be measured in vitro, in vivo, or estimated using computational molecular modeling. Computational
modeling is also used to predict the respiratory tract dosimetry of inhaled gases and particulates [computa-
tional fluid dynamics (CFD) models], to describe the normal and xenobiotic-perturbed behaviors of
signaling pathways, and to analyze the growth kinetics of preneoplastic lesions and predict tumor incidence
(clonal growth models).

Key words: Computational toxicology, Source-to-effect continuum, Fate and transport, Dosimetry,
Signaling pathway, Physiologically based pharmacokinetic model, Biologically based dose response
model, Clonal growth model, Virtual tissue

1. Overview

Computational toxicology involves a variety of computational
tools including databases, statistical analysis packages, and predictive
models. In this chapter, we focus on computational models
that describe various aspects of the source-to-health effect contin-
uum (Fig. 1). Literature on the application of computational
models across the continuum has been expanding rapidly in recent
years. Using the Web of Science portal, we conducted a


Fig. 1. Major components of the source-to-effect continuum.

bibliometric analysis of publications that appeared between 1970
and 2009. Using the search structure [TS = (computational
OR “in silico” OR predictive OR model* OR virtual) AND TS
= (toxicology) AND TS = (environment*)] (TS: Topic), a total
of 397 articles were found. Adding “NOT pharmaceutic*” to the
search structure above found 371 articles, indicating that only a
small fraction of the 397 deal with aspects of drug development.
A PubMed search (Feb 17, 2011) on “physiologically based phar-
macokinetic (PBPK) modeling” found 769 articles, indicating
that our search, which focused on computational modeling spe-
cifically in environmental toxicology, was quite restrictive.
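The Boolean topic query above can be mimicked in a few lines of code. The sketch below is illustrative only: the article records are invented, and matching is plain case-insensitive substring search (with a trailing `*` treated as truncation), which only approximates the Web of Science TS-field semantics.

```python
# Illustrative sketch of the Boolean topic filter described above.
# The records below are invented examples, not real search results.

def matches_topic(text, terms):
    """True if any term occurs in the text (case-insensitive).
    A trailing '*' is treated as truncation, e.g. 'model*' matches 'models'."""
    t = text.lower()
    return any(term.rstrip("*").lower() in t for term in terms)

def in_scope(record):
    """Approximates: TS=(computational OR "in silico" OR predictive OR model*
    OR virtual) AND TS=(toxicology) AND TS=(environment*) NOT pharmaceutic*."""
    text = record["topic"]
    return (matches_topic(text, ["computational", "in silico", "predictive",
                                 "model*", "virtual"])
            and matches_topic(text, ["toxicology"])
            and matches_topic(text, ["environment*"])
            and not matches_topic(text, ["pharmaceutic*"]))

records = [
    {"topic": "Predictive toxicology models for environmental chemicals"},
    {"topic": "In silico toxicology in pharmaceutical lead optimization"},
    {"topic": "Virtual screening of environmental toxicology endpoints"},
]
print([in_scope(r) for r in records])  # → [True, False, True]
```

The second record is excluded by the "NOT pharmaceutic*" clause failing the environment test and by lacking an environmental topic term, mirroring how the narrower search dropped 26 of the 397 articles.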
Literature searches using specific terminology were performed to
understand the publication frequency of some of the most common
types of modeling used in computational toxicology, including fate
and transport, exposure, PBPK, computational fluid dynamics (CFD),
signaling pathway, biologically based dose–response (BBDR), and
clonal growth modeling. Searches were restricted to original scientific
publications only (i.e., reviews were excluded) and fields of science
were restricted (e.g., “NOT eco*”) in order to focus on applications
relevant to human health effects. A yearly breakdown showing publi-
cation frequency over time is presented in Fig. 2. The data show a
rapid increase in publication frequency for many of the modeling
types beginning in the early 1990s and that PBPK, fate and transport,
and signaling pathways are the most common. BBDR and clonal
growth modeling have received considerably less attention.
2 Computational Toxicology: Application in Environmental Chemicals 11

Fig. 2. Literature searches performed to understand publication frequency of common modeling types used in environ-
mental computational toxicology.

2. Computational Models Along the Source-to-Health Effect Continuum

2.1. Fate and Transport

Fate and transport models describe the release, transportation, and
transformation of chemicals from sources of emission throughout
the general environment. Fate addresses persistence, dissipation,
and loss of chemical mass along the migration pathway; transport
addresses mobility of a chemical along the migration pathway
(1). Based on their complexity, models of fate and transport can be
used for either “screening-level” or “higher-tiered” applications
(2). Screening-level models often use default input parameters
that tend to over-predict exposures (the preferred default approach
used in the absence of data). These models are suitable for obtain-
ing a first approximation or to screen out exposures that are not
likely to be of concern (3). Screening-level models have limited
spatial and temporal scope. Higher-tiered models are needed when
analyses require greater temporal and spatial resolution, but much
more information is required, such as site-specific data.
The processes that can be described in fate and transport models
include advection, dispersion, diffusion, equilibrium partitioning
between solid and fluid, biodegradation, and phase separation of
immiscible liquids (1). In general, fate and transport models require
information on physicochemical properties; mechanisms of release

of chemicals to environmental media; physical, chemical, and


biological properties of the media though which migration occurs;
and interactions between the chemical and medium (1). For exam-
ple, typical inputs to an air quality and dispersion model are source
data (e.g., emission rates), meteorological data (e.g., temperature),
and physicochemical properties of the chemical. Inputs to a surface
water model, in addition to source data and physicochemical prop-
erties, may include water flows, soil properties and topography, and
advective/dispersive movement (2).
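The transport processes just listed can be made concrete with a minimal sketch: for an instantaneous point release into a one-dimensional flow field, the advection–dispersion equation has a closed-form Gaussian solution of the kind screening-level models build on. All parameter values below are purely illustrative:

```python
import math

def advection_dispersion_1d(x, t, mass, area, velocity, dispersion):
    """Concentration at position x (m) and time t (days) after an
    instantaneous release of `mass` (kg) into a 1-D flow field:
    C(x,t) = M / (A*sqrt(4*pi*D*t)) * exp(-(x - v*t)^2 / (4*D*t))."""
    spread = math.sqrt(4.0 * math.pi * dispersion * t)
    peak = mass / (area * spread)
    return peak * math.exp(-((x - velocity * t) ** 2) / (4.0 * dispersion * t))

# Illustrative: 1 kg released into a stream of cross-section 10 m^2,
# velocity 0.5 m/day, dispersion coefficient 2 m^2/day, observed at day 10.
c_center = advection_dispersion_1d(x=5.0, t=10.0, mass=1.0, area=10.0,
                                   velocity=0.5, dispersion=2.0)
```

The concentration peak travels with the advective velocity (here x = vt = 5 m at t = 10 days) while dispersion spreads and dilutes it, which is one reason screening-level predictions are sensitive to the assumed dispersion coefficient.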

2.2. Exposure

The outputs of a fate and transport model are concentrations to
which humans may be exposed. These predicted concentrations are
then used, in some cases, as surrogates for actual exposure (2).
Since these provisional estimates do not provide sufficient resolu-
tion of the variation in exposure among individuals and by time and
location, they can also be used as inputs to exposure models.
Exposure models integrate the microenvironmental concentrations
with the amount of time an individual spends in these microenvir-
onments to provide qualitative and quantitative evaluations of the
intensity, frequency, and duration of contact with chemicals, and
sometimes, the resulting amount of chemical that is actually
absorbed into the exposed organism. Exposure models vary con-
siderably in their complexity. Some models are deterministic and
generate site-of-contact-specific point estimates (e.g., dermal con-
centration × contact time). Others are probabilistic, describing
spatial and temporal profiles of chemical concentrations in micro-
environments. Both deterministic and probabilistic models may
aggregate some or all of the major exposure pathways.
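The contrast between the two model classes can be sketched in a few lines. A deterministic model would multiply point estimates; the probabilistic version below instead draws each input from a distribution and summarizes the resulting dose. All distributions and parameter values are hypothetical placeholders, not values taken from NHANES or any other survey:

```python
import random

random.seed(0)  # reproducible draws for this sketch

def simulated_daily_dose(n_draws=10_000):
    """Monte Carlo sketch of a single-pathway exposure model:
    dose (mg/kg/day) = concentration (mg/L) * intake (L/day) / body weight (kg).
    Returns (mean, 95th percentile) of the simulated dose."""
    doses = []
    for _ in range(n_draws):
        conc = random.lognormvariate(-3.0, 0.5)     # water concentration, mg/L
        intake = random.triangular(1.0, 3.0, 2.0)   # drinking-water intake, L/day
        bw = max(random.gauss(70.0, 12.0), 40.0)    # body weight, kg (crude floor)
        doses.append(conc * intake / bw)
    doses.sort()
    return sum(doses) / n_draws, doses[int(0.95 * n_draws)]

mean_dose, p95_dose = simulated_daily_dose()
```

Upper percentiles such as the 95th, which often drive risk decisions, fall directly out of the simulated distribution rather than requiring worst-case point assumptions.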
Probabilistic models can also be used to describe variability in
human behavior. Human activities contribute to exposure variabil-
ity, and at first glance appear to be arbitrary, yet patterns of behavior
are known to be representative of different age groups (e.g., hand-
to-mouth behavior among 3–5 year olds) and this information can
be used to better inform stochastic exposure models (4). A major
challenge in characterizing human activity is overcoming the cost of
collecting information. For example, food consumption question-
naires are important in dietary modeling (e.g., estimating chronic
arsenic exposure from shellfish consumption); however, the accuracy
in assessing chronic exposure is limited by the lack of longitudinal
information in surveys such as the Continuing Survey of
Food Intake by Individuals (CSFII) and National Health and
Nutrition Examination Survey (NHANES) (5, 6). The recent
study of Song et al. (7) examined how much information is needed
in order to predict human behavior. The authors examined the
predictability of macro-scale human mobility over a span of 3
months based on cell phone use—comparing a continuous record
(e.g., hourly) of a user’s momentary location with a less expensive
measure of mobility. The authors found that there is a potential

93% average predictability in user mobility. This predictability


reflects the inherent regularity of human behavior (7) and exem-
plifies an approach that holds promise for examining aspects of
human mobility, thereby reducing the cost of exposure modeling.
The degree of complexity needed in an exposure model
depends on (1) the nature of the chemical (e.g., volatility) and (2)
the number and complexity of the most common exposure scenar-
ios that the model is required to describe. The number of para-
meters in the model and their corresponding data needs are
functions of model complexity. The first choice for obtaining
input parameter data is direct measurement of the environment
concentrations and observations of human activity patterns. When
these specific data are not available, inputs may be obtained from
population-based surveys, such as NHANES or the Exposure Fac-
tors Handbook (8). The outputs of fate, transport, and exposure
models can serve as inputs to pharmacokinetic models for estimat-
ing internal tissue dosimetry.

2.3. Dosimetry

Pharmacokinetic processes translate the exposure or applied dose


into a delivered dose at an internal site. Internal doses often correlate
better with apical effects than do the external doses due to nonlinear
pharmacokinetics (9). Pharmacokinetic data can be obtained from
studies using laboratory animals (10) or from controlled human
exposures (11, 12). Controlled human exposures are largely
reserved for evaluating the safety and efficacy of drugs or therapies,
not for environmental chemicals. Human observational studies
(13–15) may provide some insight into the disposition of environ-
mental chemicals, but the relationship between exposure and sys-
temic levels is obscured because of the complexity of exposure.
Furthermore, the lack of control with regards to chemical
co-occurrence may confound interpretation of the exposure–dose
relationship.
The relationship between exposure to a chemical and its dose at
an internal target site is determined by a set of chemical structure-
dependent properties (e.g., solubility in water, blood, and tissues,
volatility, susceptibility to biotransformation) and corresponding
properties of the biological system (e.g., tissue volumes, blood
flows, metabolic capabilities). Computational models that describe
the minimum set of these characteristics needed to predict chemical-
specific absorption, distribution, metabolism, and excretion (ADME)
are commonly referred to as PBPK models, though PBTK, where
the T stands for toxicokinetic, is also used. Because models of this
type describe the relevant biology that determines ADME, they are
useful not only for predicting pharmacokinetic behavior within the
dose range and time course of available data but also for extrapola-
tion outside these ranges. These characteristics make these models
particularly useful in risk assessments, where extrapolation to doses
well below those for which data are available is often necessary (16).
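The structure of such models can be illustrated with a deliberately small flow-limited sketch: blood, a clearing liver, and a lumped rest-of-body compartment, integrated by Euler stepping after an intravenous bolus. The compartment volumes, flows, partition coefficients, and clearance below are illustrative placeholders, not values for any real chemical:

```python
def simulate_pbpk(dose_mg, hours, dt=0.001):
    """Flow-limited PBPK sketch after an IV bolus. Returns the amounts
    (mg) remaining in blood, liver, and rest-of-body compartments."""
    # Hypothetical physiological parameters
    v_blood, v_liver, v_rest = 5.0, 1.8, 60.0   # compartment volumes, L
    q_liver, q_rest = 90.0, 260.0               # blood flows, L/h
    p_liver, p_rest = 4.0, 1.5                  # tissue:blood partition coefficients
    cl_int = 30.0                               # hepatic intrinsic clearance, L/h

    a_blood, a_liver, a_rest = dose_mg, 0.0, 0.0
    for _ in range(int(hours / dt)):
        c_blood = a_blood / v_blood
        cv_liver = (a_liver / v_liver) / p_liver   # venous conc. leaving liver
        cv_rest = (a_rest / v_rest) / p_rest       # venous conc. leaving rest
        d_liver = q_liver * (c_blood - cv_liver) - cl_int * cv_liver
        d_rest = q_rest * (c_blood - cv_rest)
        d_blood = q_liver * (cv_liver - c_blood) + q_rest * (cv_rest - c_blood)
        a_blood += d_blood * dt
        a_liver += d_liver * dt
        a_rest += d_rest * dt
    return a_blood, a_liver, a_rest
```

Because mass leaves the system only through the hepatic clearance term, the total amount in the three compartments declines monotonically toward zero, a useful sanity check when extending the model with more tissues or an oral uptake route.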

Many of the parameters used in PBPK models can be measured


in vitro (17). Obach and colleagues (18, 19) observed that scaling
in vitro metabolism data from human liver microsomes to in vivo
clearance values yielded predictions that were within 70–80% of
actual values. They also found that the clearance predictions were
improved by accounting for plasma and microsomal protein bind-
ing. Tornero-Velez and colleagues (20) applied the same approach
to account for deltamethrin’s age-dependent pharmacokinetics in
maturing Sprague-Dawley rats, using in vitro parameters for
hepatic and plasma metabolic clearance of deltamethrin. Finding
agreement between in vitro parameter values and in vivo parameter
estimates is one way to explore pharmacokinetic mechanisms and
reduce pharmacokinetic data gaps. In the absence of data, however,
which may often be the case for new chemicals, the exposure–dose
modeler may turn to the emerging field of molecular modeling and
chemoinformatics to obtain provisional pharmacokinetic values.
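The microsomal scaling approach of Obach and colleagues can be sketched with the well-stirred liver model. The scaling factors used here (45 mg microsomal protein per g liver, 20 g liver per kg body weight, hepatic blood flow of 21 mL/min/kg) are commonly quoted human defaults and should be treated as assumptions rather than fixed constants:

```python
def hepatic_clearance_well_stirred(clint_ul_min_mg, fu_plasma, fu_mic=1.0,
                                   mpppgl=45.0, liver_g_per_kg=20.0,
                                   q_h_ml_min_kg=21.0):
    """Scale microsomal intrinsic clearance (uL/min/mg protein) to hepatic
    clearance (mL/min/kg body weight) with the well-stirred liver model,
    correcting for plasma (fu_plasma) and microsomal (fu_mic) binding."""
    # uL/min/mg protein -> mL/min/kg body weight
    clint = clint_ul_min_mg / 1000.0 * mpppgl * liver_g_per_kg
    clint_unbound = clint * fu_plasma / fu_mic
    return q_h_ml_min_kg * clint_unbound / (q_h_ml_min_kg + clint_unbound)

# Illustrative compound: CLint = 50 uL/min/mg, 10% unbound in plasma
cl_hepatic = hepatic_clearance_well_stirred(50.0, fu_plasma=0.1)
```

Predicted clearance can never exceed hepatic blood flow, reflecting the flow limitation built into the well-stirred model.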
Molecular modeling makes use of a wide variety of techniques
to predict or understand chemical behavior at the atomic level.
Modeling chemical interactions is an important step in understand-
ing the molecular events encountered in both biological and envi-
ronmental systems (21–23). These methods have the potential to
explain the underlying molecular processes of chemical interactions
and transformations in the source–exposure–dose–response contin-
uum. Here, the primary use of such tools is to provide in silico
predictions of relevant data where little or no actual data exist.
Provisional estimates derived from structure–activity relationships
may then be tested using focused methods to validate or augment
parameter values.
The field of molecular modeling comprises a wide variety of tools
from chemoinformatics-based disciplines [e.g., quantitative struc-
ture–activity relationships (QSAR)] and graph network theory (e.g.,
two-dimensional topological molecular descriptors) to detailed atom-
istic simulations (e.g., molecular dynamics) and quantum mechanical
simulations of the electron distributions of a molecule. Chemoinfor-
matic techniques have a long history in promoting simple concepts
such as lipophilicity and partitioning (24) as indicators of persistence
and toxicity within the environment (i.e., fate and transport) (25).
These techniques are also used to obtain indicators of chemical dispo-
sition (26) and pharmacodynamics (27) within biological organisms
(28). Many software packages exist whereby one can develop, aug-
ment, and utilize new or existing QSARs for parameters such as
blood–brain barrier transfer coefficients, dermal permeation rates,
cell line permeability, and octanol–water partition coefficient
[e.g., molecular operating environment (MOE), QikProp (https://
www.schrodinger.com/products/14/17/), and OpenEye (http://
www.eyesopen.com)]. These QSAR packages are generally confined
to analysis of the biological system, as seen on the right side of the
source–exposure–dose–response continuum (Fig. 1).

For environmental fate and transport models, QSAR can be


used to estimate the values of the physicochemical parameters
describing the partitioning and transfer processes among air,
water, and soil. For example, the US EPA’s SPARC predictive
modeling system is able to calculate large numbers of physical/
chemical parameters from molecular structure and basic informa-
tion about the environment (e.g., media, temperature, pressure,
pH). These parameters are used in fate and transport modeling of
organic pollutants, nutrients, and other stressors.
Techniques such as QSAR are ideally suited for rapid evaluation
of parameters for pharmacokinetic and fate and transport models.
However, development of these techniques is data intensive,
requiring training sets with well-defined endpoints to develop the
relationship between chemical structure and observed activity. In
addition, QSAR models are fitted to specific molecular subsets
(training set) and it is difficult to apply them to compounds outside
the chemical space represented in the training set.
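A toy example makes both the fitting step and the applicability-domain problem concrete: a one-descriptor QSAR fit by ordinary least squares, refusing to predict outside the descriptor range of its training set. The training data (log Kow versus log bioconcentration factor) are hypothetical:

```python
def fit_simple_qsar(descriptors, activities):
    """Ordinary least-squares fit of activity = slope * descriptor + intercept.
    Returns (slope, intercept, domain); domain is the descriptor range of
    the training set, used as a crude applicability-domain check."""
    n = len(descriptors)
    mx = sum(descriptors) / n
    my = sum(activities) / n
    sxx = sum((x - mx) ** 2 for x in descriptors)
    sxy = sum((x - mx) * (y - my) for x, y in zip(descriptors, activities))
    slope = sxy / sxx
    return slope, my - slope * mx, (min(descriptors), max(descriptors))

def predict(model, x):
    slope, intercept, (lo, hi) = model
    if not lo <= x <= hi:
        raise ValueError("descriptor outside training-set domain")
    return slope * x + intercept

# Hypothetical training set: log Kow vs. log bioconcentration factor
log_kow = [1.5, 2.0, 2.8, 3.5, 4.1, 4.9]
log_bcf = [0.9, 1.3, 1.9, 2.5, 2.9, 3.6]
model = fit_simple_qsar(log_kow, log_bcf)
```

Real QSARs use many descriptors and more sophisticated domain definitions (e.g., leverage), but the failure mode is the same: predictions degrade, often silently, outside the training chemical space.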
While QSAR is the best-known molecular modeling technique
within computational toxicology, there are other tools, such as
classical force-field docking techniques, that can aid in understand-
ing the biological processes which involve chemical interactions
with biomolecular targets. Inter-molecular interactions between
ligands and biomolecular targets determine binding mechanics
that ultimately lead to altered physiological responses and potential
toxicological effects. Thus, an understanding of the relevant bind-
ing interactions can lead to a better understanding of chemical
function and provide a visual representation of chemical binding
and mechanisms of toxicity. For example, estimating the relative
binding affinities of 281 chemicals to a surrogate rat estrogen
receptor, Rabinowitz et al. (29) utilized docking techniques to
screen out 16 actives (“true competitive inhibitors”) from nonac-
tive substrates with no false negatives and eight misclassified false
positives. Molecular dynamics (17, 30) or ab initio molecular
dynamics (31) can be used to simulate time-evolving processes
such as diffusion through environmental media, solvation effects,
and “classical” kinetic rate constants (e.g., solvent-mediated hydro-
lysis, oxidation, and hydrogen abstraction rates). This information
can be used as chemical-specific inputs to pharmacokinetic and
environmental fate and transport models (32–37).
Computational models are also used to predict the respiratory
tract dosimetry of inhaled gases and particulates. These models are
needed because the complex shapes of the nasal airways and the
branching pattern of the airways leading from the trachea to the
alveoli often result in nonuniform deposition of inhaled materials.
Models of the respiratory tract incorporate varying degrees of
anatomical realism. CFD models of the nasal airways use accurate,
three-dimensional reconstructions of the airways (38), while one-
dimensional reconstructions have been more commonly used for
the pulmonary airways (39).

2.4. Signaling Pathways

Signaling pathways such as the mitogen-activated protein kinase
(MAPK) pathway (40) consist of one or more receptors at the cell
surface that, when activated by their cognate ligands, transmit
signals to cytosolic effectors and also to the genome. The cytosolic
effects are rapid, occurring within seconds or minutes of receptor
activation, while the effects on gene expression take longer, with
changes in the associated protein levels typically occurring after one
or more hours. A number of computational models of signaling
pathways have been described (e.g., 41, 42).
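The core of such models is a set of coupled rate equations. The toy sketch below couples reversible ligand-receptor binding to activation of a single cytosolic effector; the rate constants and two-species structure are illustrative inventions, not a published MAPK model:

```python
def simulate_receptor_signal(ligand, hours=1.0, dt=0.0005):
    """Toy signaling model: ligand binds receptor reversibly; the bound
    receptor activates a cytosolic effector, which deactivates at a
    constant rate. Returns (bound receptor, active effector) fractions."""
    k_on, k_off = 5.0, 1.0        # binding/unbinding rates, per h
    k_act, k_deact = 10.0, 2.0    # effector (de)activation rates, per h
    r_free, r_bound = 1.0, 0.0    # receptor pools (normalized to 1)
    e_active = 0.0                # fraction of effector in active state
    for _ in range(int(hours / dt)):
        bind = k_on * ligand * r_free - k_off * r_bound
        d_e = k_act * r_bound * (1.0 - e_active) - k_deact * e_active
        r_free -= bind * dt
        r_bound += bind * dt
        e_active += d_e * dt
    return r_bound, e_active
```

Even this two-equation model reproduces the qualitative behavior described above: the bound-receptor signal saturates with ligand concentration, and the downstream effector follows it with a lag set by its own activation and deactivation rates.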
The National Research Council (NRC) report, Toxicity Test-
ing in the Twenty-First Century (43), introduced the concept of
“toxicity pathways,” defined by the NRC as “interconnected
pathways composed of complex biochemical
interactions of genes, proteins, and small molecules that maintain
normal cellular function, control communication between cells,
and allow cells to adapt to changes in their environment” which,
“when sufficiently perturbed, are expected to result in
adverse health effects” (43). The
adverse effect is the clinically evident effect on health and is
often referred to as the apical effect, denoting its placement at
the terminal end of the toxicity pathways. Although not much
work has been done to date, computational models of signaling
pathways are expected to be integral components of toxicity path-
way models.

2.5. BBDR/Clonal Growth

Cancer is a disease of cell division. In healthy tissue, the respective
rates of cellular division and death are tightly regulated, allowing
for either controlled growth or the maintenance of tissue size in
adulthood. When regulation of division and death rates is dis-
rupted, tumors can develop. (It should also be noted that embry-
onic development depends on tight regulation of division and
death rates, where dysregulation can result in malformations.)
A number of computational models have been developed to
describe tumor incidence and the growth kinetics of preneoplastic
lesions. These vary from purely statistical models fit to incidence
data (44) to models that track time-dependent division and death
rates of cells in the various stages of multi-stage carcinogenesis (45).
These latter kinds of models provide insights into how different
kinds of toxic effects—e.g., direct reactivity with DNA versus cyto-
lethality—can differentially affect tumor development.
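The second kind of model can be illustrated with a deterministic sketch in the spirit of the two-stage clonal growth framework (45): normal cells acquire a first mutation, initiated cells expand clonally, and a second mutation produces a malignant cell, with tumor probability approximated from the accumulated hazard. All rate values below are illustrative:

```python
import math

def two_stage_tumor_prob(years, mu1=1e-7, mu2=1e-7, alpha=0.1, beta=0.09,
                         n_normal=1e8, dt=0.01):
    """Deterministic two-stage clonal growth sketch. Normal cells mutate
    to initiated cells at rate mu1 (per cell per year); initiated cells
    divide (alpha) and die (beta); a second mutation (mu2) yields a
    malignant cell. Tumor probability is approximated as
    1 - exp(-integral of mu2 * I(t) dt)."""
    i_cells = 0.0       # expected number of initiated cells
    cum_hazard = 0.0    # accumulated hazard of a second mutation
    for _ in range(int(years / dt)):
        d_i = mu1 * n_normal + (alpha - beta) * i_cells
        cum_hazard += mu2 * i_cells * dt
        i_cells += d_i * dt
    return 1.0 - math.exp(-cum_hazard)
```

Raising mu1 (a mutagenic effect) and raising alpha (a promotional effect on clonal expansion) both increase tumor probability, but with different time courses, which is exactly the kind of mechanistic distinction these models are used to explore.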
BBDR models represent the entire exposure—target site
dose—apical response continuum. These kinds of models require
large amounts of supporting data but have the capability to predict
both dose–response and time course for development of apical
effects as well as for some intermediate effects (e.g., 46). This
latter capability is important as it provides the opportunity to use
data on biomarkers in support of model development. The
resources needed to develop such models are, unfortunately,

seldom available. In some cases, however, where the economic


importance or the degree of human exposure is sufficient, develop-
ment of BBDR models can be justified.

3. Virtual Tissues

The computational models described above incorporate varying


degrees of biological detail. Over time, these models will be refined
as new data and new degrees of understanding of the relevant
biological processes become available. Taking a long-term view,
the iterative refinement of such models will lead asymptotically to
the development of virtual tissues, where multiple scales of biol-
ogy—molecular, macromolecular, organelle, tissue—are described
in a spatially and temporally realistic manner. Numerous efforts that
are self-described as virtual tissues are underway (47–49). While
important and useful, these are preliminary steps toward the actual
development of virtual tissues, and we do not expect that this
goal will be realized for some time. However, while a long-term
goal of computational toxicology, virtual tissues and, by extension,
virtual organisms, have the potential to eventually reduce and
perhaps even eliminate the use of laboratory animals, thereby revo-
lutionizing toxicity testing.

Disclaimer

The United States Environmental Protection Agency through its


Office of Research and Development funded and managed the
research described here. It has been subjected to the Agency’s admin-
istrative review and approved for publication.

References
1. ASTM (1998) 1998 Annual book of ASTM 3. US EPA (1992) Guidelines for exposure assess-
standards: standard guide for remediation of ment. EPA/600/Z-92/001. US Environmen-
ground water by natural attenuation at petro- tal Protection Agency, Washington, DC
leum release sites (Designation: E 1943-98), 4. Zartarian VG, Xue J, Ozkaynak H, Dang W,
vol 11.04. American Society for Testing and Glen G, Smith L, Stallings C (2006) A proba-
Materials, West Conshohocken, pp 875–917 bilistic arsenic exposure assessment for children
2. Williams PRD, Hubbell WBJ, Weber E et al who contact CCA-treated playsets and decks,
(2010) An overview of exposure assessment Part 1: model methodology, variability results,
models used by the U.S. Environmental Pro- and model evaluation. Risk Anal 26
tection Agency. In: Hanrahan G (ed) Modeling (2):515–531
of pollutants in complex environmental 5. Glen G, Smith L, Isaacs K, Mccurdy T, Lang-
systems, vol 2. ILM Publications, St Albans staff J (2008) A new method of longitudinal

diary assembly for human exposure modeling. 17. Rapaport DC (2004) The art of molecular
J Expo Sci Environ Epidemiol 18(3):299–311 dynamics simulation, 2nd edn. Cambridge
6. Tran NL, Barraj L, Smith K, Javier A, Burke T University, New York
(2004) Combining food frequency and survey 18. Obach RS (1999) Prediction of human
data to quantify long-term dietary exposure: a clearance of twenty-nine drugs from hepatic
methyl mercury case study. Risk Anal 24 microsomal intrinsic clearance data: an exami-
(1):19–30 nation of in vitro half-life approach and non-
7. Song C, Qu Z, Blumm N, Barabasi AL (2010) specific binding to microsomes. Drug Metab
Limits of predictability in human mobility. Sci- Dispos 27(11):1350–1359
ence 327(5968):1018–1021 19. Obach RS, Baxter JG, Liston TE, Silber BM,
8. US EPA (1997) Exposure factors handbook. Jones BC, MacIntyre F, Rance DJ, Wastall P
US Environmental Protection Agency, (1997) The prediction of human pharmacoki-
Washington, DC. http://www.epa.gov/
NCEA/pdfs/efh/front.pdf metabolism data. J Pharmacol Exp Ther 283
9. Watanabe PG, Gehring PJ (1976) Dose- (1):46–58
dependent fate of vinyl chloride and its possible 20. Tornero-Velez R, Mirfazaelina A, Kim KB,
relationship to oncogenicity in rats. Environ Anand SS, Kim HJ, Haines WT, Bruckner JV,
Health Perspect 17:145–152 Fisher JW (2010) Evaluation of deltamethrin
10. Reddy MB, Yang RSH, Clewell HJ, Andersen kinetics and dosimetry in the maturing rats
ME (2005) Physiologically based pharmacoki- using a PBPK model. Toxicol Appl Pharmacol
netic modeling: science and applications. Wiley, 244(2):208–217
Hoboken 21. Böhm G (1996) New approaches in molecular
11. Emmen HH, Hoogendijk EM, Klopping- structure prediction. Biophys Chem 59
Ketelaars WA, Muijser H, Duisterrnaat E, (1–2):1–32
Ravensberg JC, Alexander DJ, Borkhataria D, 22. Fielden MR, Matthews JB, Fertuck KC et al
Rusch GM, Schmit B (2000) Human safety (2002) In silico approaches to mechanistic and
and pharmacokinetics of the CFC alternative predictive toxicology: an introduction to bio-
propellants HFC 134a (1,1,1,2- informatics for toxicologists. Crit Rev Toxicol
tetrafluoroethane) and HFC 227 32(2):67–112
(1,1,1,2,3,3,3-heptafluoropropane) following 23. Marrone TJ, Briggs JM, McCammon JA
whole-body exposure. Regul Toxicol Pharma- (1997) Structure-based drug design: computa-
col 32(1):22–35 tional advances. Annu Rev Pharmacol Toxicol
12. Ernstgard L, Andersen M, Dekant W, Sjogren 37:71–90
B, Johanson G (2010) Experimental exposure 24. Leo A, Handsch C, Elkins D (1971) Partition
to 1,1,1,3,3-pentafluoropropane (HFC- coefficients and their uses. Chem Rev 71
245fa): uptake and disposition in humans. Tox- (6):525–616
icol Sci 113(2):326–336 25. Valko K (2002) Measurements and predictions
13. Sexton K, Kleffman DE, Cailahan MA (1995) of physicochemical properties. In: Darvas F,
An introduction to the National Human Expo- Dorman G (eds) High-throughput ADMETox
sure Assessment Survey (NHEXAS) and estimation. Eaton Publishing, Westborough
related phase I field studies. J Expo Anal Envi- 26. Topliss JG (ed) (1983) Quantitative structure-
ron Epidemiol 5(3):229–232 activity relationships of drugs. Academic, New
14. Shin BS, Hwang SW, Bulitta JB, Lee JB, Yang York
SD, Park JS, Kwon MC, do Kim J, Yoon HS, 27. Cronin MTD, Dearden JC, Duffy JC, Edwards
Yoo SD (2010) Assessment of bisphenol A R, Manga N, Worth AP, Worgan ADP (2002)
exposure in Korean pregnant women by physi- The importance of hydrophobicity and electro-
ologically based pharmacokinetic modeling. philicity descriptors in mechanistically-based
J Toxicol Environ Health A 73 QSARs for toxicological endpoints. SAR
(21–22):1586–1598 QSAR Environ Res 13:167–176
15. Wilson NK, Chuang JC, Morgan MK, Lordo 28. Pratt WB, Taylor P (eds) (1990) Principles of
RA, Sheldon LS (2007) An observational study drug action. Churchill-Livingstone, Inc, New
of the potential exposures of preschool to pen- York
tachlorophenol, bisphenol-A, and nonylphenol 29. Rabinowitz JR, Little S, Laws SC, Goldsmith R
at home and daycare. Environ Res 103(1):9–20 (2009) Molecular modeling for screening envi-
16. Andersen ME (2003) Toxicokinetic modeling ronmental chemicals for estrogenicity: use
and its applications in chemical risk assessment. of the toxicant-target approach. Chem Res
Toxicol Lett 138(1–2):9–27 Toxicol 22(9):1594–1602

30. Allen MP, Tildesley DJ (2002) Computer protein kinases with diverse biological
simulations of liquids. Oxford University, functions. Microbiol Mol Biol Rev
New York 68:320–344
31. Car R, Parrinello M (1985) Unified approach 41. Bhalla US, Prahlad RT, Iyengar R (2002) MAP
for molecular dynamics and density-functional kinase phosphatase as a locus of flexibility in a
theory. Phys Rev Lett 55(22):2471–2474 mitogen-activated protein kinase signaling net-
32. Colombo MC, Guidoni L, Laio A, Magistrato work. Science 297:1018–1023
A, Maurer P, Piana S, Röhrig U, Spiegel K, 42. Hoffman A, Levchenko A, Scott ML, Balti-
Sulpizi M, VandeVondele J, Zumstein M, more D (2005) The IkB-NF-kB signaling
Röthlisberger U (2002) Hybrid QM/MM module: temporal control and selective gene
Carr-Parrinello simulations of catalytic and activation. Science 298:1241–1245
enzymatic reactions. CHIMIA 56(1–2):13–19 43. National Research Council (NRC) Committee
33. Geva E, Shi Q, Voth GA (2001) Quantum- on Toxicity and Assessment of Environmental
mechanical reaction rate constants from cen- Agents (2007) Toxicity testing in the twenty-
troid molecular dynamics simulations. J Chem first century: a vision and a strategy. National
Phys 115:9209–9222 Academies, Washington, DC. ISBN 0-309-
34. Prezhdo OV, Rossky PJ (1997) Evaluation of 10989-2
quantum transition rates from quantum classi- 44. Crump KS, Hoel DG, Langley CH, Peto R
cal molecular dynamics simulation. J Chem (1976) Fundamental carcinogenic processes
Phys 107:5863 and their implications for low dose risk assess-
35. Truhlar DG, Garrett BC (1980) Variational ment. Cancer Res 36:2937–2979
transition-state theory. Acc Chem Res 45. Moolgavkar SH, Dewanji A, Venzon DJ
13:440–448 (1988) A stochastic two-stage model for cancer
36. Tuckerman M, Laasonen K, Sprik M, Parri- risk assessment. The hazard function and prob-
nello M (1995) Ab initio molecular dynamics ability of tumor. Risk Anal 8:383–392
simulation of the solvation and transport of 46. Conolly RB, Kimbell JS, Janszen D, Schlosser
hydronium and hydroxyl ions in water. PM, Kalisak D, Preston J, Miller FJ (2004)
J Chem Phys 103:150–161 Human respiratory tract cancer risks of inhaled
37. Wang H, Sun X, Miller WH (1998) Semiclassi- formaldehyde: dose-response predictions
cal approximations for the calculation of ther-
mal rate constants for chemical reactions in tional modeling of a combined rodent and
complex molecular systems. J Phys Chem 108 human dataset. Toxicol Sci 82:279–296
(23):9726–9736 47. Adra S, Sun T, MacNeil S, Holcombe M,
38. Kimbell JS, Gross EA, Joyner DR, Godo MN, Smallwood R (2010) Development of a three-
Morgan KT (1993) Application of computa- dimensional multiscale computational model
tional fluid dynamics to regional dosimetry of of the human epidermis. PLoS One 5(1):
inhaled chemicals in the upper respiratory tract e8511. doi:10.1371/journal.pone.0008511
of the rat. Toxicol Appl Pharmacol 48. Shah I, Wambaugh J (2010) Virtual tissues in
121:253–263 toxicology. J Toxicol Environ Health 13
39. Overton JH, Kimbell JS, Miller FJ (2001) (2–4):314–328
Dosimetry modeling of inhaled formaldehyde: 49. Wambaugh J, Shah I (2010) Simulating micro-
the human respiratory tract. Toxicol Sci dosimetry in a virtual hepatic lobule. PLoS
64:122–134 Comput Biol 6(4):e1000756. doi:10.1371/
40. Roux PP, Blenis J (2004) ERK and p38 journal.pcbi. 1000756
MAPK-activated protein kinases: a family of
Chapter 3

Role of Computational Methods in Pharmaceutical Sciences


Sandhya Kortagere, Markus Lill, and John Kerrigan

Abstract
Over the past two decades, computational methods have eased the financial and experimental burden of
the early drug discovery process. On the bioinformatics front, in silico methods have provided support in
the form of databases, data mining of large genomes, network analysis, and systems biology; on the
chemoinformatics front, structure–activity relationships, similarity analysis, docking, and pharmacophore
methods support lead design and optimization. This review highlights some of the applications of
bioinformatics and chemoinformatics methods that have enriched the field of drug discovery. It also
provides insights into the use of free energy perturbation methods for efficiently computing binding
energies. These in silico methods are complementary to, and can be easily integrated with, traditional
in vitro and in vivo methods to test pharmacological hypotheses.

Key words: ADME/Tox, Bioinformatics, Chemoinformatics, Protein structure prediction,


Homology models, Virtual screening, Ligand-based methods, Structure-based methods, Drug design,
Docking and scoring, Protein–protein interactions, Protein networks, Systems biology, PK–PD
modeling

1. Introduction

A living cell is composed of a number of processes that are well


networked and compartmentalized in both space and time. These
complex networks hold clues to the normal and diseased physiology
of the living organism. Thus understanding the molecular interac-
tion network to delineate the normal from the disease phenotype
could help diagnose and treat the symptoms of disease states. While
this may sound simplistic, building the complete network of molec-
ular interactions is highly challenging and could at best be described
through model systems by incorporating evidence of interactions
obtained from biochemical experiments. Such evidence is now
obtained in a large scale with the advent of the various “omics and
omes” such as genomics, proteomics, and interactome to name a
few (1–3). The field of “omics” has revolutionized the way drug

Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_3, # Springer Science+Business Media, LLC 2012

21
22 S. Kortagere et al.

Fig. 1. A schematic representation of the role of in silico methods in pharmaceutical sciences.

discovery and development has been done in the past decade.


This transformation has been achieved mainly due to the increase
in computational power and development of novel bioinformatics
algorithms that have helped mine the “omics” data and present it in
usable formats for target and lead discovery (4–6). An associated
wing of bioinformatics that has taken a lead role in drug discovery is
the field of chemoinformatics. The term was aptly coined by Brown
(7, 8) in 1998 as a way to integrate informatics and chemistry.
However, prior to being integrated under the roof of chemoinfor-
matics, the algorithms were functional under the broad title of
ligand-based methods (9). Thus in a broad sense the role of in silico
methods in drug discovery can be discussed under two categories,
namely bioinformatics (as a tool for finding new targets) and
chemoinformatics (for lead identification and optimization) meth-
ods, as summarized in Fig. 1.

2. Bioinformatics

There is no single precise definition of the term bioinformatics. It


varies from being as simple as using computers to study biological
problems to being as complex as methods for generating, storing,

retrieving, analyzing, and modeling biological data such as


genomic, proteomic, and biological pathways. The goal of bioinfor-
matics from a pharmaceutical sciences perspective is to mine the vast
data available and correlate the data with disease phenotypes to
discover new target proteins for further use in developing new
drugs. However, in recent years, there has been a paradigm
shift away from the single-target hypothesis towards protein–
protein interaction (PPI) inhibitors and pathway inhibitors, giving
way to the concept of polypharmacology (10–12). The field of
polypharmacology echoes the early medicines developed in
Ayurvedic and traditional Chinese practice, wherein the goal
was to treat with a concoction of compounds that could hit multiple
targets to provide systemic relief from the symptoms. Thus systems
biology is trending towards becoming a reality and in silico models
to delineate these pathways are being developed. Further, with
improvements in robotic technologies and their utility in protein
crystallography, characterizing the newly identified targets has
become a possibility and this field is called structural bioinformatics.
As the name suggests, structural bioinformatics deals with
protein structure and algorithms that can be used to predict,
analyze, and model the three-dimensional structure of proteins. This is one
of the most popular fields of computational biology with algo-
rithms designed as early as 1960. Protein structure can be explained
at four levels of complexity: the primary, secondary, tertiary, and
quaternary structure. Hence algorithms were developed to deal
with each of these levels. The role of bioinformatics starts with
the sequencing of the genomes during mapping of the loci, frag-
ment assembly and annotations of the sequenced genomes
(13, 14). Gene annotations are complex with algorithms needed
to identify coding and noncoding regions and those for deriving
information from homologous and orthologous genomes
using evolutionary dendrograms (15, 16) and sequence alignment
programs (17–19).

3. Protein Structure
and Prediction
Protein structure prediction algorithms can be classified into three
categories, namely secondary structure prediction, ab initio struc-
ture modeling, and homology modeling. Given that there are 20
amino acids, the number of ways in which a given sequence of amino
acids constituting the primary structure of a protein can fold into
a tertiary structure is astronomical. However, protein
folding under physiological conditions probably follows the path of
least complexity and is therefore highly efficient. To mimic this
process, it is prudent to first predict the secondary structural ele-
ments namely alpha helix, beta sheet, and gamma turns which are
guided by evolutionary and functional relationships among
homologous proteins (20, 21). The algorithms for searching data-
bases and predicting secondary structural elements are based on
heuristic statistical methods. These algorithms include sequence
database search methods such as BLAST (22) and FASTA (23);
sequence alignment methods such as ClustalW (24), Kalign
(25), MUSCLE (26), and T-Coffee (27); secondary structure prediction
methods such as HMM (28), PredictProtein (29), and JPred (30); and
other methods listed in repositories such as ExPASy tools (http://
expasy.org/tools/). These methods assign the most preferred
secondary structural element to a stretch of amino acids based on
the statistical propensity of a given amino acid to be a part of alpha
helix or beta sheet or a turn (31, 32). Ab initio methods on the
other hand use only sequence information to predict the tertiary
structure and rely on conformational analysis methods such as
Monte-Carlo or molecular dynamics simulations to assess whether
the predicted sequence can correctly fold into the assigned struc-
ture (33–35). Homology modeling on the other hand uses the
structure of a known protein called the template to model the
three-dimensional structure of the unknown (36). The resulting
model is then refined using energy minimization and molecular
dynamics simulations and further validated using evidence from
biochemical experiments such as site-directed mutagenesis and
fluorescence measurements (37, 38). All these methods are highly
resourceful in the context of structure-based drug design wherein
the models built can be directly used for screening compounds or
to understand the mode of binding of an inhibitor and hence
designing mutants or in rational lead optimization (39). Several
examples of structure-based drug design are available in the litera-
ture including the early success of HIV protease inhibitors (40).
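The propensity idea behind such secondary structure assignments can be illustrated with a minimal Chou–Fasman-style sketch; the propensity values, window size, and thresholds below are illustrative placeholders, not those of any published method:

```python
# Toy propensity-based secondary-structure assignment (Chou-Fasman style).
# Propensity values are illustrative placeholders, not published ones.
HELIX = {"A": 1.42, "L": 1.21, "E": 1.51, "G": 0.57, "P": 0.57}
SHEET = {"A": 0.83, "L": 1.30, "E": 0.37, "G": 0.75, "P": 0.55}

def assign_ss(seq, window=5):
    """Assign H (helix), E (sheet), or C (coil) per residue by comparing
    average propensities over a sliding window centred on the residue."""
    ss = []
    half = window // 2
    for i in range(len(seq)):
        win = seq[max(0, i - half): i + half + 1]
        h = sum(HELIX.get(r, 1.0) for r in win) / len(win)
        e = sum(SHEET.get(r, 1.0) for r in win) / len(win)
        if h > 1.1 and h >= e:
            ss.append("H")        # window prefers helix
        elif e > 1.1:
            ss.append("E")        # window prefers sheet
        else:
            ss.append("C")        # no strong preference: coil/turn
    return "".join(ss)

print(assign_ss("AELLAGPGLLEA"))
```

Real predictors refine this statistical picture with evolutionary profiles and machine learning, as the methods cited above do.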

4. Protein–Protein
Interactions
It can be envisioned that the next paradigm in drug discovery will
be to design inhibitors to key protein–protein interactions (PPIs).
These PPIs could be present between host and pathogen or entirely
belonging to a host or a pathogen. The feasibility of such drug
design has been shown recently by our group in designing small
molecule interaction inhibitors of key PPIs of the malaria parasite
(41) and other infectious agents such as Toxoplasma gondii and
HIV (42, 43). However, the bottleneck in this design process lies
in identifying key PPIs, given that only a handful of crystal structures
of such complexes are currently available in the Protein Data Bank.
Understanding PPIs is also important from other pharmacological
and biochemical perspectives such as in signaling, cellular
adhesion, enzyme kinetics, pathways, etc. Thus understanding,
predicting, and cataloging these PPIs using bioinformatics and
structural biology methods is crucial. A significant step towards
achieving this goal is the availability of the genomic information
of species of interest. A general hypothesis about PPIs is that if the
proteins coevolve then they have a higher probability of being
interaction partners (44). Several experimental techniques such as
yeast two-hybrid, mammalian hybrid methods, protein fragment
complementation assays, and fluorescence resonance energy transfer
can detect or validate whether two proteins are interaction partners
(45–48). Given the complexities of establishing these in vitro meth-
ods, a number of computational methods have been developed to
predict PPIs. Based on the protein coevolution hypothesis, phylo-
genetic methods that can make inferences about interactions
between pairs of proteins based on their phylogenetic distances
have been designed (49, 50). Others include use of PSI-BLAST
(51) or BLAST algorithms to query for templates either from a set
of nonredundant sequences or from a library of known protein–
protein interfaces (52, 53). These libraries can be built from data-
bases that maintain information about PPIs such as BIND (54),
BioGRID (55), DIP (56), HPRD (57), IntAct (58), and MINT
(59). Some methods of prediction use homology modeling tech-
niques to build the complexes and to score them using statistical
residue interaction energetics (60–62). Machine learning methods
such as SVM and Bayesian network models have also been used to
predict the interacting partners of a given protein (63).
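The coevolution hypothesis underlying the phylogenetic ("mirror-tree") approach can be sketched as the correlation between the inter-species distance matrices of two protein families; the distance values below are invented for illustration:

```python
# Mirror-tree sketch: two proteins whose inter-species evolutionary
# distance matrices are highly correlated are predicted to be
# interaction partners. The distance matrices are illustrative only.
import math

def flatten_upper(m):
    """Upper triangle of a square distance matrix as a flat list."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Distances between orthologs of protein A and protein B in four species.
dist_a = [[0.0, 0.2, 0.5, 0.7],
          [0.2, 0.0, 0.4, 0.6],
          [0.5, 0.4, 0.0, 0.3],
          [0.7, 0.6, 0.3, 0.0]]
dist_b = [[0.0, 0.25, 0.55, 0.65],
          [0.25, 0.0, 0.45, 0.55],
          [0.55, 0.45, 0.0, 0.35],
          [0.65, 0.55, 0.35, 0.0]]

r = pearson(flatten_upper(dist_a), flatten_upper(dist_b))
# A high correlation suggests coevolution, and hence a possible interaction.
print(round(r, 3))
```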

5. Systems Biology
and Protein
Networks
PPIs can be regarded as the minimal subunits of protein networks. The
concept of systems biology deals with systemically understanding
the biological processes by incrementally building up the networks
of interactions that underlie the biological process (64, 65). These
networks then provide the molecular basis for the etiology of dis-
eases and to rationally develop therapeutics that can work at one or
more components of the protein network (66, 67). The systems
approach also helps in identifying key targets and biomarkers, and in
quantifying potential side effects of drugs due to off-target interactions
(68, 69). In delineating these new networks, experimental methods
such as microarrays work in close association with statistical meth-
ods such as Bayesian network models to uncover new protein net-
works (70, 71). In addition, the networks of lower organisms such
as S. cerevisiae (72, 73) and Drosophila melanogaster (74, 75) have
been identified and stored in databases such as KEGG (http://www.
genome.jp/kegg/pathway.html) and UniPathway (http://www.
grenoble.prabi.fr/obiwarehouse/unipathway), and these clearly serve as
models to understand and derive the networks of higher organisms.
Mathematical modeling of the pathways is another tool that has
added to the understanding of these biochemical networks often
called “reaction networks.” Given a few parameters, mathematical
modeling can help derive the unknowns in the equations for flux
modeling and hence help in modeling the networks (76, 77).
A major utility of such networks in the pharmaceutical industry is
the pharmacokinetics and pharmacodynamics models (PK–PD).
Pharmacokinetics (PK) characterizes the absorption, distribution,
metabolism, and elimination properties of a drug. Pharmacody-
namics (PD) defines the physiological and biological response to
the administered drug. PK–PD modeling establishes a mathemati-
cal and theoretical link between these two processes and helps
better predict drug action (78, 79). Recent models for PK–PD
include mechanism-based models which aim to link the drug
administration and effect such as target-site distribution, receptor
binding kinetics, and the dynamics of receptor activation and trans-
duction (80). Programs for PK–PD modeling include GastroPlus
(Simulations Plus, Inc.) and WinNonlin (Pharsight, Inc.), which are
based on a knowledge base of experimentally measured PK–PD
parameters for known drugs. These modeling programs use data
such as drug concentrations in plasma at particular points in time
and allow for calculation and estimation of critical PK parameters
such as maximum concentration, total exposure (i.e., area under
the curve), half-life, clearance rate, and volume of distribution.
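As a simple illustration of how such parameters can be estimated from plasma concentration–time data, the sketch below computes the AUC by the trapezoidal rule and the terminal half-life from a log-linear fit of the last time points; all data values are invented:

```python
import math

# Invented plasma concentration-time data after an IV bolus dose.
times = [0.5, 1, 2, 4, 8, 12]          # h
conc  = [9.2, 8.0, 6.1, 3.5, 1.2, 0.4] # mg/L

# AUC(0-tlast) by the linear trapezoidal rule.
auc = sum((conc[i] + conc[i + 1]) / 2 * (times[i + 1] - times[i])
          for i in range(len(times) - 1))

# Terminal elimination rate constant k from a least-squares log-linear
# fit of the last three points, then half-life t1/2 = ln 2 / k.
t_tail = times[-3:]
lnc_tail = [math.log(c) for c in conc[-3:]]
n = len(t_tail)
mt = sum(t_tail) / n
mc = sum(lnc_tail) / n
slope = (sum((t - mt) * (c - mc) for t, c in zip(t_tail, lnc_tail))
         / sum((t - mt) ** 2 for t in t_tail))
k = -slope
half_life = math.log(2) / k

print(f"AUC(0-12h) = {auc:.1f} mg*h/L, t1/2 = {half_life:.1f} h")
```

Commercial packages add compartmental fitting, dosing-regimen simulation, and covariate models on top of this kind of noncompartmental analysis.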
Other utilities of mathematical modeling involve deriving flux-
based networks in modeling phenotypes in response to a toxic
agent such as receptor overexpression induced by environmental
agents (81). In general, most mathematical models are deterministic
in nature and hence limited to use at small scales, so extending
them to derive large networks is not feasible. However, newer
methods such as stochastic networks and Bayesian models can
overcome such limitations (82, 83).
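A minimal example of such a deterministic pathway model is a pair of mass-action rate equations for the toy reaction network A → B → C, integrated with a simple Euler scheme; the rate constants are illustrative only:

```python
# Deterministic mass-action model of the toy pathway A -> B -> C,
# integrated by forward Euler. Rate constants are illustrative only.
k1, k2 = 0.5, 0.2          # 1/time
a, b, c = 1.0, 0.0, 0.0    # initial concentrations
dt, steps = 0.01, 2000     # integrate to t = 20

for _ in range(steps):
    da = -k1 * a
    db = k1 * a - k2 * b
    dc = k2 * b
    a += da * dt
    b += db * dt
    c += dc * dt

# Mass is conserved, and at long times nearly everything ends up in C.
print(round(a + b + c, 6), round(c, 3))
```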

6. Chemoinformatics
Analogous to bioinformatics, chemoinformatics is a field of sci-
ence that is involved in management of chemical data using
computational methods. This area of science has gained tremen-
dous significance in the past few decades due to the availability of
chemical data in the form of combinatorial chemistry and high
throughput screening (84). These two aspects have set new trends
in drug discovery in which attrition rates are now being seriously
taken into account at early stages of drug discovery. Chemoinfor-
matics plays a major role in designing models for virtual screening,
lead design, lead optimization, preclinical filtering schemes for
drug like properties, ADMET and PK–PD modeling (85).
Chemoinformatics models were earlier referred to as ligand-based
methods. As the name suggests ligand-based methods are derived
solely with information from a molecule(s). A variety of techniques
such as CoMFA, CoMSIA, QSAR (1D or multidimensional),
Bayesian and other numerical and statistical methods and pharma-
cophore methods can be classified under ligand-based methods.

7. Classical QSAR

Quantitative structure–activity relationships (QSAR) aim to
quantitatively relate the biological activity (e.g., inhibition constants, rate
constants, bioavailability, or toxicity) of a series of ligands with the
similarities between the chemical structures of those molecules. It
requires, first, the consistent measurement of biological activity,
second, the quantitative encoding of the chemical structure of the
studied molecules, also known as molecular descriptors, and third,
the derivation of mathematical equations quantitatively relating
molecular descriptors with biological activity.
Many different means of encoding the chemical structure of
ligands have been devised since the pioneering work of Hansch and
Fujita (86), who used hydrophobicity and Hammett constants to
describe the varying substituents of a common scaffold of a series of studied
molecules. Classical Hansch-type QSAR models utilize atomic or
group properties describing the physicochemical properties of the
substitutions such as hydrophobic (e.g., log P, π = partial log P),
electronic (e.g., Hammett σ, quantum-mechanical indices such as
electronegativity or hardness index), or steric and polarizability
properties (e.g., molecular volume, molecular refractivity). Also,
the spatial distribution of physicochemical properties of a ligand
in 3D can be mapped onto molecular surfaces and used as molecu-
lar descriptors (e.g., polar surface area). Several molecular descrip-
tors dik for each ligand k in the dataset are usually combined in a
multi-linear regression model representing a correlation with the
biological activity Ak of compound k:
\[
A_k = c_0 + \sum_i c_i \, d_{ik} \quad \text{for all ligands } k. \tag{1}
\]

The prefactors c0 and ci can be derived using multi-linear
regression analysis to fit the experimental activity data of a pre-
selected set of compounds that are used to train the regression
model, named the training set. Whereas parameters such as the
regression coefficient r2, which measures the ratio of explained
variance to total variance in the training set’s activity data, and the
Fisher value F, which measures the statistical significance of the
model, describe how well the model fits the experimental data of
the training set, they do not provide information about the
28 S. Kortagere et al.

predictive quality of the model for new compounds that are not
included in the training set. Leave-one-out cross-validation, a
technique used in the past to provide this information, proved
to be insufficient to measure predictive power (87–89), but leav-
ing out larger groups throughout cross-validation or scrambling
tests (the activity data of the compounds is randomly reordered
among the dataset and no QSAR model with comparable regres-
sion quality should be obtained for any reordering) have been
shown to more reliably estimate the predictive quality of the
QSAR model. As an ultimate test, however, the QSAR model
should always be validated by its potential to predict compounds,
called the external test set, not included at any stage in the training
process.
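The workflow above — fitting the prefactors by least squares, computing r2, and guarding against chance correlation with a scrambling test — can be sketched for a one-descriptor instance of equation (1); all descriptor and activity values are invented:

```python
import random

# Invented training set: one descriptor (e.g., log P) per ligand and
# the measured activity A_k; a one-descriptor instance of eq. (1).
d = [1.2, 2.1, 0.5, 1.8, 2.5, 0.9]
A = [5.1, 6.8, 3.9, 6.2, 7.5, 4.6]

def fit_r2(d, A):
    """Least-squares fit A = c0 + c1*d; return (c0, c1, r2)."""
    n = len(d)
    md, mA = sum(d) / n, sum(A) / n
    c1 = (sum((x - md) * (y - mA) for x, y in zip(d, A))
          / sum((x - md) ** 2 for x in d))
    c0 = mA - c1 * md
    ss_res = sum((y - (c0 + c1 * x)) ** 2 for x, y in zip(d, A))
    ss_tot = sum((y - mA) ** 2 for y in A)
    return c0, c1, 1 - ss_res / ss_tot

c0, c1, r2 = fit_r2(d, A)

# Scrambling test: refit with randomly reordered activities; a model
# that is not a chance correlation should clearly beat the scrambled fits.
random.seed(0)
scrambled = [fit_r2(d, random.sample(A, len(A)))[2] for _ in range(100)]

print(f"c0={c0:.2f}, c1={c1:.2f}, r2={r2:.3f}, "
      f"mean scrambled r2={sum(scrambled)/len(scrambled):.3f}")
```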
In parallel to Hansch and Fujita, Free and Wilson (90) derived
QSAR models using indicator variables. Indicator variables [a_{i_m}]_k
describe the absence or presence of a chemical substituent i_m (e.g.,
Cl, Br, I, Me) at position m of a common ligand scaffold, with values
of 0 (absence) and 1 (presence):

\[
A_k = c_0 + \sum_{i_0} c_{i_0} [a_{i_0}]_k + \cdots + \sum_{i_N} c_{i_N} [a_{i_N}]_k \quad \text{for all ligands } k, \tag{2}
\]

where N is the number of substitutions. Only one indicator variable in
each sum of equation (2) can have a value of 1 for each ligand.
Although original Free–Wilson type QSAR analysis displayed some
shortcomings over Hansch-type QSAR models (e.g., activity pre-
dictions are only possible for new combinations of substituents
already included in the training set; more degrees of freedom are
necessary to describe every substitution), this QSAR scheme has
become popular again with the onset of structural fingerprints or
hashed keys (91, 92) (Fig. 2) describing the topology of the mole-
cules in the data set.
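The hashed-key idea in Fig. 2 can be sketched as follows; the fragment strings, counts, and key length are illustrative choices, not any published fingerprinting scheme:

```python
# Sketch of a hashed structural fingerprint (cf. Fig. 2): fragment
# occurrences are folded into a fixed-length bit string whose bits can
# serve as indicator variables. All fragments/lengths are illustrative.
def hashed_fingerprint(fragments, n_bits=16):
    """Fold fragment occurrence counts into an n_bits-long bit list."""
    bits = [0] * n_bits
    for frag, count in fragments.items():
        if count > 0:
            bits[hash(frag) % n_bits] = 1   # set the bit for this fragment
    return bits

# Fragments found in a hypothetical ligand, with their frequencies.
ligand = {"c1ccccc1": 2, "C(=O)O": 1, "C-N": 3, "C-Cl": 0}
fp = hashed_fingerprint(ligand)
print(fp, sum(fp))
```

Note that hashing can map two different fragments to the same bit (a collision), which is the price paid for a fixed key length.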

8. 3D-QSAR
and Extensions
With the introduction of comparative molecular field analysis
(CoMFA) (93), for the first time structure–activity relationships
were based on the three-dimensional structure of the ligand mole-
cules (3D-QSAR). In 3D-QSAR, the ligands’ interaction with
chemical probes or the ligands’ property fields (such as electrostatic
fields) are mapped onto a surface or grid surrounding a series of
compounds. The values on the grid or surface points are utilized as
individual descriptors, which are usually grouped into a smaller
number of descriptors, for use in a regression. The quality of the
3D-QSAR model critically depends on the correct superposition of

Fig. 2. (a) A structural fingerprint for a chemical is generated by determining the frequency a specific fragment of a
predefined library is present in the ligand. The frequencies of all fragments are stored as individual bits in a bit string. The
individual bits are used as individual indicator variables. (b) In a hashed key the fragments are generated on-the-fly for all
ligands in the training set and the frequency of presence of a fragment is distributed to a hash key with fixed length.

the ligands representing the native conformations and orientations
of the ligands, a very difficult task, particularly, in the absence of the
X-ray structure of the target protein.
QSAR based on “alignment-independent descriptors” (AIDs)
(94–96) was proposed to circumvent the need for correct ligand
alignment. In these methods, properties of a compound, such as
hydrophobicity or hydrogen bonding, are projected onto its molecular
surface. The surface properties are then transformed into position-
independent characteristics, such as the terms of a moment
expansion of the associated physicochemical fields of a molecule.
However, the selection of the native ligand conformations is
likewise critical to the quality of a QSAR model based on such
descriptors. Alternatively, 4D-QSAR concepts (97–100) approach
the alignment issue by representing each molecule in different
conformations, orientations, tautomers, stereoisomers, or proton-
ation states. The true binding mode (or the bioactive conforma-
tion) is then identified throughout the training procedure of the
QSAR model, e.g., by the energetically most favorable ligand pose
with respect to its interaction with the surrounding molecular
probes. 4D-QSAR not only addresses the uncertainties of the
alignment process, but can also model multimode binding targets,
such as cytochrome P450 enzymes. Those enzymes, critical for
drug metabolism, are known to accommodate a ligand in different
binding poses, each yielding different metabolic products of a given
compound (101). Standard 3D-QSAR methods do not incorpo-
rate protein flexibility upon ligand binding. To model this impor-
tant factor in protein–ligand association, 5D-QSAR techniques that
simulate a topological adaptation of the QSAR model to the indi-
vidual ligand have been devised (100, 102, 103).

9. Applications
of QSAR
QSAR has become an integral component in pharmaceutical
research to optimize lead compounds. Whereas QSAR is widely
used to identify ligands with high affinity for a given target protein,
more recently QSAR methodology has been extended to predict
pharmacokinetic properties, such as absorption, distribution,
metabolism, and elimination (ADME) properties (104) or the oral
bioavailability of compounds (105, 106), as well as the toxicity of drug
candidates. Furthermore, in the context of the Registration, Evalu-
ation, and Authorization of Chemicals (REACH) legislation of the
European Union, the prediction of the toxic potential of environ-
mental chemicals using QSAR has created public interest (107).

10. Pharmacophore-
Based Modeling
Another ligand-based method that has found utility in the
pharmaceutical industry is pharmacophore-based modeling. A pharmacophore
can be defined as a molecular framework required for the biological
activity of a compound or a set of compounds (108). The concept of
pharmacophores has found widespread utility in virtual screening,
similarity analysis, and lead optimization. Three popular pharmaco-
phore modeling tools are DIStance COmparisons (DISCO) (109),
Genetic Algorithm Similarity Program (GASP) (110), and Catalyst
(111) and these have been thoroughly described and compared by
Patel and colleagues (112). The Catalyst program has been used
widely by researchers (113–120) and has two methods for generating
pharmacophores namely HIPHOP (121) and HYPOGEN (122).
HIPHOP uses a few active molecules to derive the common chemical
features, while HYPOGEN derives models based on a series of
molecules with varying structure, activity, and function. Both these
methods have found utility in deriving models for enzymes, nuclear
receptors, ion channels, and transporters. The types of transporter
pharmacophores that have been published to date have been recently
reviewed (123) and along with 3D-QSAR these methods have
become widely accepted methods for assessing the drug–transporter
interactions (124). The pharmacophore methods have been used to
discover new inhibitors or substrates for transporters by first search-
ing a database then generating in vitro data (114, 125–127). The
pharmacophore methods can also have a good hit rate which may be
used alongside other QSAR methods for database screening and
ADMET modeling (128, 129). In addition, 3D pharmacophore
methods are being used for lead design (41, 130).

11. ADMET
Modeling
One of the major applications of ligand-based methods is to predict
absorption, distribution, metabolism, excretion, and toxicity
(ADME/Tox) properties. A number of studies have utilized
different ligand-based models (131–139). In addition there are many
studies that have provided an extensive comparison of these pro-
grams that have been designed to predict the ADMET properties
(140–142). ADMET properties can be described using a set of
physicochemical properties such as solubility, log P, log D, pKa,
and polar surface area that describe permeability, intestinal absorption,
blood brain barrier penetration, and excretion. Solubility is mod-
eled as logarithm of solubility (log S) using molecular descriptors
that govern shape, size, interatomic forces, and polarity (143–147).
Permeability is a measure of the compound’s bioavailability and is
modeled using molecular descriptors that code for hydrophobicity,
steric and electronic properties of molecules (148–150). In addi-
tion to passive diffusion across cellular membranes, permeability
could also be through active transport by membrane bound trans-
porters and pumps (151, 152). In silico models of such active
transport have been modeled using ligand-based methods (153,
154). Permeability across the blood brain barrier is an associated
parameter that is computed exclusively for CNS drugs and for other
compounds as an off-target filter. It is computed as the logarithm of
BB (log BB), which is a measure of the ratio of the concentration of
the drug in the brain to that in the blood (155, 156). Several in silico
models have been proposed that utilize molecular descriptors such
as log P, pKa, TPSA, and molecular weight with a variety of meth-
ods such as ANN, multiple regression models (MLR), QSAR,
Support vector machines (SVM), and other statistical techniques
(157–159). Similarly, several in silico models have been proposed
for drug metabolism and toxicity prediction, as reviewed in a
recent report (160).

12. Structure-
Based Methods
Structure-based methods, as the name suggests, rely on the
three-dimensional structure of the target and the small molecule.
The three-dimensional (3D) structure of the target can be obtained
by experimental methods such as X-ray crystallography or NMR
methods or by homology modeling methods. Several review
articles provide additional details about the methods and utility of
homology models (160–162). Here we discuss the utility of
structure-based methods in virtual screening applications.

13. Virtual
Screening
High-throughput screening (HTS) has become a common tool for
drug discovery used to identify new hits interacting with a certain
biological target. Virtual screening technologies are applied to
either substitute or aid HTS. Both ligand-based methods that use
similarity to previously identified hits and structure-based methods
that use existing protein structure information can be used to
perform virtual high-throughput screening (VHTS) to identify
potentially active compounds. In ligand-based VHTS, one or sev-
eral hits must be identified first, for example, from previous HTS
experiments using a smaller subset of a ligand library or from
previously published hits. Factors such as the set of molecular
descriptors, the measurement of similarity, size, and diversity of
the virtual ligand library, and the similarity threshold value separat-
ing potential active from inactive compounds are critical to the
success of VHTS and must be carefully tuned. Techniques used in
ligand-based VHTS include methods based on 2D and 3D similar-
ity. Examples of such methods are substructure comparison (163),
shape matching (164), or pharmacophore methods (165–167).
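A common choice for the similarity measurement in such 2D screens is the Tanimoto coefficient on bit-string fingerprints; the sketch below ranks a toy library against a query hit and applies a similarity threshold (all fingerprints and the cutoff value are invented):

```python
# Ligand-based VHTS sketch: rank a library by Tanimoto similarity of
# bit-string fingerprints to a known hit. Fingerprints are invented.
def tanimoto(a, b):
    """Tanimoto coefficient of two equal-length 0/1 bit lists."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0

query = [1, 1, 0, 1, 0, 0, 1, 0]
library = {
    "cmpd_A": [1, 1, 0, 1, 0, 0, 0, 0],
    "cmpd_B": [0, 0, 1, 0, 1, 1, 0, 1],
    "cmpd_C": [1, 1, 0, 1, 0, 0, 1, 1],
}

threshold = 0.7   # similarity cutoff separating potential actives
ranked = sorted(library.items(),
                key=lambda kv: tanimoto(query, kv[1]), reverse=True)
hits = [name for name, fp in ranked if tanimoto(query, fp) >= threshold]
print(hits)
```

In practice the fingerprints are much longer, and the threshold is tuned against the diversity of the library, as noted above.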
In structure-based VHTS, automated docking is commonly
used to identify potential active compounds by ranking the ligand
library based on the strength of protein–ligand interactions evalu-
ated by a scoring function. Throughout the docking process, many
different ligand orientations and conformations (binding poses) are
generated in the binding site of the protein using a search algo-
rithm. Docking methods can be classified by the level of flexibility
that will be allowed for the protein and ligands (168). However,
with the increase in computational resources, most recently devel-
oped algorithms allow complete flexibility for the ligand molecules
and varying levels of flexibility to the amino acid side chains that are
involved in binding the ligands. Conformational analysis of ligands
can be performed using several algorithms such as systematic search,
stochastic or random search, and molecular simulation techniques
(169, 170). Systematic search involves performing a conformational
search along each dihedral in the small molecule. This could
lead to an exponential number of conformations that need to be
docked and scored, which may not be practical in many cases. To avoid
this, several algorithms utilize stored conformations of fragments
to limit the number of dihedrals that can be sampled (e.g., FLOG
(171)). Other alternatives include splitting the ligand into the core
and side chain regions and docking the core first and incrementally
sampling and adding the side chains (172). This method has been
adopted by several programs including DOCK (173), FlexX (174),
GLIDE (175), and Hammerhead (176). In stochastic search, flexi-
bility is computed by introducing random changes to the chosen
dihedrals and sampled using a Monte Carlo method or genetic
algorithm method. In each case, the newly formed conformation
is accepted or rejected using a probabilistic function that uses a
library of previously computed conformations (177). Autodock
(178), GOLD (179), MOE-DOCK (http://www.chemcomp.
com/) are some of the well-known programs that utilize the ran-
dom search method. Molecular simulations help derive conforma-
tions that may be compatible in a dynamic model. Molecular
dynamics methods are very efficient in deriving such conformations
but are expensive in terms of time and resources. However,
simulated annealing and accelerated or high-temperature MD
studies have helped overcome some of the issues associated with
MD, including the local minima problem (180,
181). Many programs such as DOCK, GLIDE, MOE-DOCK
utilize MD simulations to refine conformations obtained from
other methods. Protein pockets are generally represented as grid
points, surface, or atoms (180). Thus protein ensemble grids can
also be used to provide information on protein flexibility. The
atomistic representation of proteins is generally used for computing
scoring functions, while surface representations are more useful
in the case of protein–protein docking methods.
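The stochastic search described above — random dihedral moves accepted or rejected probabilistically — can be sketched with a Metropolis criterion acting on a toy one-dihedral energy function (the energy model, temperature, and step size are invented for illustration):

```python
import math
import random

# Toy stochastic conformational search: one dihedral angle, a made-up
# torsional energy profile, and Metropolis acceptance of random moves.
def energy(angle_deg):
    """Invented torsion potential with a minimum near 180 degrees."""
    return 1.0 + math.cos(math.radians(angle_deg))

random.seed(1)
kT = 0.6                 # arbitrary thermal energy in the same units
angle = 0.0              # start in the unfavourable eclipsed geometry
best = (energy(angle), angle)

for _ in range(5000):
    trial = (angle + random.uniform(-30, 30)) % 360
    dE = energy(trial) - energy(angle)
    # Metropolis: always accept downhill, uphill with probability e^(-dE/kT).
    if dE <= 0 or random.random() < math.exp(-dE / kT):
        angle = trial
    if energy(angle) < best[0]:
        best = (energy(angle), angle)

print(f"lowest energy {best[0]:.3f} at dihedral {best[1]:.0f} deg")
```

Genetic-algorithm variants replace the single-move update with crossover and mutation over a population of conformations, but the accept/reject idea is the same.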

14. Scoring
Functions
Scoring functions are used to estimate the protein–ligand interac-
tion energy of each docked pose. The pose with the most favorable
interaction score represents the predicted bioactive pose and in
principle can be used as a starting point for subsequent rational
structure-based drug design. Scoring functions can be classified
into three types, namely force field, knowledge based, and empirical
based on the type of parameters used for scoring protein–ligand
interactions. The force field method uses the molecular mechanics
energy terms to compute the internal energy of the ligand and the
binding energy. However, entropic terms are generally omitted in
the calculations as they are computationally expensive to be com-
puted. Various scoring schemes are built on different force fields
such as Amber (182), MMFF (183), and Tripos (184). In general a
force field-based scoring function consists of the van der Waals term
approximated by a Lennard-Jones potential function and an
electrostatics term in the form of a Coulombic potential with a
distance-dependent dielectric function to reduce the effect of charge–charge
interactions (180). Additional terms can be incorporated in certain
cases wherein the contributions of water molecules or metal ions
are distinctly known, which can increase the accuracy of the scoring
function. Empirical scoring functions are devised to fit known
experimental data derived from a number of protein–ligand com-
plexes (185). Regression equations are derived using these known
protein–ligand complexes, and these coefficients are used to derive
information about the energetics of other protein–ligand complexes.
Scoring schemes employing empirical methods include LUDI (185), Chemscore (186), and
F-score (174). Knowledge-based scoring functions are used to
score simple pairwise atom interactions based on their environ-
ment. A set of known protein–ligand complexes are used to build
the knowledge database about the type of interactions that can
exist. Because of their simplicity, they are well suited for
scoring large databases on relatively short time scales.
Scoring functions that use knowledge-based methods include
Drugscore (187), PMF (188), and SMOG (189).
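The force-field flavour of scoring can be sketched as a sum of Lennard-Jones and distance-dependent-dielectric Coulomb terms over protein–ligand atom pairs; all parameters, charges, and coordinates below are illustrative, not taken from any real force field:

```python
import math

# Minimal force-field-style score: sum of Lennard-Jones and Coulomb
# terms over protein-ligand atom pairs, with a distance-dependent
# dielectric (epsilon = 4r). All parameters are illustrative only.
def pair_energy(r, eps=0.2, sigma=3.2, qi=0.3, qj=-0.4):
    lj = 4 * eps * ((sigma / r) ** 12 - (sigma / r) ** 6)
    coulomb = 332.0 * qi * qj / (4 * r * r)   # kcal/mol-style units
    return lj + coulomb

def score_pose(protein_atoms, ligand_atoms):
    """Sum pairwise energies between all protein and ligand atoms."""
    total = 0.0
    for pa in protein_atoms:
        for la in ligand_atoms:
            total += pair_energy(math.dist(pa, la))
    return total

# Two protein atoms and one two-atom ligand pose (coordinates in A).
protein = [(0.0, 0.0, 0.0), (3.5, 0.0, 0.0)]
ligand_pose = [(0.0, 3.6, 0.0), (3.5, 3.8, 0.0)]
print(f"pose score: {score_pose(protein, ligand_pose):.2f}")
```

A more negative score corresponds to a more favourable predicted interaction; real implementations add the terms for solvent, metals, and torsional strain discussed above.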
Each of the scoring schemes mentioned has its advantages
and disadvantages. Hence the concept of consensus scoring
schemes was introduced to limit the dependency on any of the
schemes (190). A number of publications in the literature describe
comparative studies employing different docking and scoring
schemes (168, 180). There is no set rule for combining scoring
schemes; deriving a consensus score should be customized to
every application to avoid amplifying errors and to balance the
right set of parameters that can be useful for identifying the correctly
docked pose. Although there are several scoring schemes, it should
be noted that existing scoring functions are not accurate enough to
reliably predict the native binding mode and associated free energy
of binding. This limitation originates from the necessity to find a
balance between accuracy and efficiency in order to screen large
ligand libraries. Consequently, scoring functions quantify a simpli-
fied representation of the full protein–ligand interaction by only
including critical elements such as hydrogen bonds and hydropho-
bic contacts and neglecting effects such as polarization and entropy.
In addition to using a simplified scoring function to reduce the
computational time required for VHTS, only critical degrees of
freedom, such as translation, rotation and torsional rotations of
the ligand, are considered during the search algorithm to limit the
conformational space that must be sampled. To compensate for the
tradeoff between efficiency and accuracy, more accurate post-
processing techniques such as free-energy methods based on
molecular dynamics simulations are required to confirm the pre-
dicted bioactive binding pose or to more reliably rank the ligand
library according to their free energy of binding.
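As an illustration of the consensus-scoring idea discussed above, one simple option is to average each pose's rank across several scoring functions. The scores and the rank-averaging rule below are illustrative assumptions for a sketch, not a prescription from the cited literature:

```python
def consensus_rank(score_lists):
    """Average each pose's rank across several scoring functions
    (lower score assumed better) to limit dependence on any one scheme."""
    n = len(score_lists[0])
    rank_sums = [0.0] * n
    for scores in score_lists:
        order = sorted(range(n), key=lambda i: scores[i])
        for rank, i in enumerate(order):
            rank_sums[i] += rank
    return [s / len(score_lists) for s in rank_sums]

# three invented scoring functions evaluated on four docked poses
ranks = consensus_rank([
    [-9.1, -7.2, -8.0, -5.5],
    [-6.4, -7.0, -6.9, -5.1],
    [-8.8, -6.0, -8.5, -7.9],
])
best_pose = ranks.index(min(ranks))
```

In practice the combination rule (rank sum, rank vote, or score normalization) would itself be tuned to the application, as noted above.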

15. Binding Energy Estimation

Estimation of binding free energy for the case of protein–protein
and protein–drug complexes has remained a daunting challenge;
however, significant progress in free energy calculations has been
made over the past 20 years. This section will touch on the rigors of
binding free energy calculations with a focus on the linear interac-
tion energy method covering the past few years. The free energy, G,
is represented as follows
G = −kB T ln(Z), (3)
where kB is the Boltzmann constant, T is the temperature, and Z is
the partition function described as
Z = Σi e^(−β Ei), (4)
where β = 1/(kB T). The equation describes samples of configura-
tions i (each also referred to as a “microstate”) following a Boltzmann
distribution. These samples of configurations i can be thought of as
the snapshots generated by a molecular dynamics (MD) or Monte
Carlo (MC) simulation. The quantity Ei is the potential energy of
configuration i. When it comes to binding free energy, we are most
interested in the differences between two states (e.g., the bound ZB
versus the unbound state ZA). Equation 5 below is known as
Zwanzig's formula (191).
36 S. Kortagere et al.
Fig. 3. Illustration of the unbound receptor plus unbound ligand in equilibrium with the
receptor–ligand complex (the bound state).
ΔG = GB − GA = −kB T ln(ZB/ZA). (5)
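Equations 3–5 can be made concrete with a toy numerical sketch; the microstate energies below are invented purely for illustration:

```python
import math

kB = 0.0019872  # Boltzmann (gas) constant in kcal/(mol*K)
T = 298.15      # temperature in K
beta = 1.0 / (kB * T)

def free_energy(energies):
    """G = -kB*T*ln(Z), with Z = sum_i exp(-beta*E_i) over microstates (Eqs. 3-4)."""
    Z = sum(math.exp(-beta * E) for E in energies)
    return -kB * T * math.log(Z)

# invented microstate energies (kcal/mol) for an unbound (A) and bound (B) state
E_A = [0.0, 0.5, 1.2]
E_B = [-2.0, -1.1, 0.3]

# Eq. 5: dG = G_B - G_A = -kB*T*ln(Z_B/Z_A)
dG = free_energy(E_B) - free_energy(E_A)
```

Because the bound-state energies lie lower, the computed ΔG is negative, i.e., binding is favorable in this toy example.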
The equations below describe the relation to chemical equilib-
rium (Fig. 3). KA is the equilibrium constant for association of recep-
tor (R) and ligand (L) to the complex (RL). The dissociation of the
complex (RL) back to receptor (R) and ligand (L) is described by the
dissociation constant KD, which is also the inhibition constant Ki.
R + L ⇌ RL,    KA = [RL] / ([R][L])
RL ⇌ R + L,    KD = Ki = [R][L] / [RL]
These constants are related to the binding free energy ΔGbind,
enthalpy ΔH, and entropy ΔS via the following relationship:
ΔGbind = −RT ln KA = RT ln KD = ΔH − TΔS. (6)
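Equation 6 is easy to apply directly; for example, converting a dissociation constant into a binding free energy at room temperature (the 10 nM value is a hypothetical example):

```python
import math

R = 0.0019872  # gas constant in kcal/(mol*K)
T = 298.15     # room temperature in K

def dG_bind_from_Kd(Kd_molar):
    """dG_bind = RT*ln(KD) (Eq. 6); a smaller KD (tighter binding)
    gives a more negative dG_bind."""
    return R * T * math.log(Kd_molar)

dG = dG_bind_from_Kd(10e-9)  # a hypothetical 10 nM inhibitor, roughly -10.9 kcal/mol
```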
All of the components of the molecular mechanics force field
contribute to the enthalpy (ΔH) of the system. The change in
entropy (ΔS) associated with the motions (conformational changes)
of the ligand and the protein upon binding is a key contributor to
the overall system free energy. For example, the release of highly
ordered water molecules from a hydrophobic pocket of the protein
upon binding of the ligand gives a positive change in entropy,
resulting in a more favorable ΔGbind. If the bound conformation of
the ligand is not a stable conformation when the ligand is in the free
or unbound state, binding to the protein will be less favored entro-
pically. However, if the bound-state conformation of the ligand is
also the most stable conformation when the ligand is free in solu-
tion (unbound), the binding will be more favored entropically and
the ligand is said to be preorganized for binding (192). The estima-
tion of system entropy is an extensive area of research and an
ongoing challenge in computational chemistry today. A brief yet
elegant discussion of the pitfalls of methods used to calculate the
entropy term can be found in Singh and Warshel’s excellent review
paper on binding free energy calculations (193).

16. Free Energy Perturbation (FEP)

The principal goal of all free energy simulations is the computation
of the ratio of the two states (ZB/ZA). Rigorous methods require
exact sampling of the configurations between the two states. A well-
known technique is the free energy perturbation method based on
Zwanzig’s formula in Eq. 5. The potential energy difference
between the two states can be computed using MD or MC simula-
tions. Intermediate states, represented by a coupling parameter λ
(also referred to as windows), are introduced to cover the space
between the two states. The energy difference between the two
states (e.g., the ligand free in solution versus the ligand bound to
the protein) is often too large to be computed directly in a single
FEP step because the two states are too distinct in their conforma-
tions. Hence, a relative ΔG (ΔΔG) can be computed using a
thermodynamic cycle.
Two sets of FEP calculations need to be performed. The
simpler of the two is the calculation of ΔGm(W), where state A is
ligand L in water and state B is ligand L′ in water. The second,
more expensive, calculation is that for ΔGm(R), in which ligand L is
mutated to L′ in the presence of the receptor (protein). The relative
free energy can then be calculated using Eq. 7:
ΔΔG = ΔGbind(L′) − ΔGbind(L) = ΔGm(R) − ΔGm(W). (7)
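The thermodynamic cycle of Eq. 7 reduces to a simple difference once the two mutation free energies are in hand; the numerical values here are invented for illustration:

```python
def relative_binding_free_energy(dG_mut_receptor, dG_mut_water):
    """Eq. 7: ddG = dG_m(R) - dG_m(W), the mutation of L to L' in the
    receptor minus the same mutation in water (kcal/mol)."""
    return dG_mut_receptor - dG_mut_water

# hypothetical FEP results: mutating L to L' is more favorable in the binding site
ddG = relative_binding_free_energy(-3.2, -1.0)
# ddG < 0 means L' binds more tightly than L
```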
There are several inaccuracies that can plague free energy
calculations. One is error in the force field used. Another is
inadequate sampling of phase space, which can be addressed by
running the simulation for a longer time in the case of molecular
dynamics or by using more iterations in the case of Monte Carlo.
The FEP calculation should be run in both the forward and reverse
directions; the difference between the free energies of the forward
and reverse calculations provides a lower-bound estimate of the
error in the calculation (194). Changing one
atom to another atom or group in the perturbation near the end of
the intermediate states can result in well-known endpoint problems
like numerical instability and singularities. These “endpoint” issues
can be addressed using “soft-core” potentials for the van der Waals
component of the force field (195, 196).
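A minimal sketch of the exponential-averaging (Zwanzig) estimator applied per λ window follows; the Gaussian energy differences are synthetic stand-ins for the MD or MC snapshots described above, and the window count is arbitrary:

```python
import math
import random

kB_T = 0.593  # kB*T in kcal/mol at ~298 K

def fep_window_dG(dE_samples):
    """Zwanzig exponential-averaging estimate for one lambda window:
    dG = -kB*T * ln<exp(-dE/kB*T)>, averaged over configurations
    sampled from the reference state of that window."""
    avg = sum(math.exp(-dE / kB_T) for dE in dE_samples) / len(dE_samples)
    return -kB_T * math.log(avg)

# synthetic per-window energy differences; the total dG is the sum
# of the per-window contributions along the lambda path
random.seed(1)
windows = [[random.gauss(0.2, 0.1) for _ in range(1000)] for _ in range(5)]
dG_total = sum(fep_window_dG(w) for w in windows)
```

Running the same estimator on reversed samples and comparing the two totals gives the hysteresis-based error bound mentioned above.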

17. Linear Interaction Energy (LIE)

Two methods, LIE and Molecular Mechanics-Poisson Boltzmann-
Surface Area (MM-PBSA), are often referred to as “endpoint”
methods because they neglect the intermediate states in the transi-
tion (192). The LIE method developed by Åqvist uses a scaling
factor β, based on the linear response approximation, for the elec-
trostatic component (197, 198), while estimating the van der Waals
term with a scaling factor α (199). This approach considers only
the endpoints: the bound ligand and the unbound or “free” ligand.
ΔGbind = β(⟨Velec⟩bound − ⟨Velec⟩unbound)
         + α(⟨Vvdw⟩bound − ⟨Vvdw⟩unbound) + γ. (8)
The sometimes-used parameter γ is a constant used to account
for the medium when computing absolute free energies. The values
of the scaling factors α, β, and γ depend on the nature of the
binding pocket, the functional groups of the ligands (Table 1), the
force field, and the solvent model (200–205). Several studies in the
literature have used LIE methods to efficiently design small-
molecule inhibitors, such as antimalarials (206), inhibitors of the
antibiofilm agent Dispersin B (207), HIV-1 reverse transcriptase
inhibitors (208), glucose binding to insulin (209), BACE-1 inhibi-
tors (210), tubulin inhibitors (211), and CDK-2 inhibitors (212).
The LIE method continues to evolve and grow as

Table 1
Optimal β parameters based on compound type (203)

Compound            β
Alcohols            0.37
1° Amides           0.41
1°, 2° Amines       0.39
Carboxylic acids    0.40
Cations             0.52
Anions              0.45
Other compounds     0.43
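Equation 8 together with Table 1 can be sketched as a small calculator. The β values are transcribed from Table 1 (ref. 203); α = 0.18 and γ = 0 are common choices in the LIE literature but are assumptions here, not values given in this chapter, and the interaction energies are invented:

```python
# beta values from Table 1 (ref. 203); alpha = 0.18 and gamma = 0
# are common literature choices, assumed here for illustration
BETA_BY_CLASS = {
    "alcohols": 0.37,
    "primary amides": 0.41,
    "primary/secondary amines": 0.39,
    "carboxylic acids": 0.40,
    "cations": 0.52,
    "anions": 0.45,
    "other": 0.43,
}

def lie_dG_bind(v_elec_bound, v_elec_free, v_vdw_bound, v_vdw_free,
                compound_class="other", alpha=0.18, gamma=0.0):
    """Eq. 8: dG_bind = beta*dV_elec + alpha*dV_vdw + gamma, using
    ensemble-averaged ligand-surrounding interaction energies (kcal/mol)."""
    beta = BETA_BY_CLASS[compound_class]
    return (beta * (v_elec_bound - v_elec_free)
            + alpha * (v_vdw_bound - v_vdw_free)
            + gamma)

# invented average interaction energies for an alcohol-class ligand
dG = lie_dG_bind(-45.0, -40.0, -30.0, -12.0, compound_class="alcohols")
```

Only two short simulations (bound and free) are required, which is the source of the method's speed advantage over FEP discussed below.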
a mainstay tool of the computational chemist or biologist. The tech-
nique was born from free energy perturbation methods, is faster than
the FEP or thermodynamic integration (TI) techniques, and, with
careful fitting to training sets of experimental data, is highly accurate
in its predictions. The implicit-solvent methods used in LIE calcula-
tions are roughly an order of magnitude faster than the older
explicit-solvent LIE calculations (208). Owing to its simple func-
tional form, the LIE equation is also useful in a qualitative sense for
early investigative work. The LIE method can further be used to
study protein–protein complexes and small-molecule complexes
with nucleic acids.
In general, docking methods are significantly more time-
consuming than similarity-based methods and thus are often only
applied to a subset of the full ligand library that has been pre-filtered
by a similarity-based method. As well as being applied consecutively
in VHTS applications, structure-based and ligand-based methods
can also be applied in parallel or as a single integrated method, as
occurs in structure-derived pharmacophore models (213, 214) and
hybrid structure-based methods that integrate both ligand-based
and structure-based methods (41, 130).
To estimate the expected success of VHTS or to optimize the
procedure, retrospective screening is often performed: known
actives are mixed with a large number of presumed inactive mole-
cules (decoys), and the percentage of identified actives as a function
of the number of tested molecules is plotted in so-called enrich-
ment plots. While this may show enrichment, it should be noted
that successful retrospective screening does not always imply suc-
cessful prospective screening of a novel ligand library. For example,
similarity-based screening methods may be biased towards the
original set of active compounds and may fail to identify a novel
class of molecules as potential hits.
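The enrichment measured in such retrospective screens is commonly summarized as an enrichment factor; a minimal sketch, with a toy ranked list invented for illustration:

```python
def enrichment_factor(ranked_labels, fraction):
    """EF at a given top fraction: the hit rate in the top of the
    ranked list divided by the hit rate expected at random."""
    n = len(ranked_labels)
    n_top = max(1, int(n * fraction))
    actives_total = sum(ranked_labels)
    actives_top = sum(ranked_labels[:n_top])
    return (actives_top / n_top) / (actives_total / n)

# toy ranked list from a hypothetical retrospective screen: 1 = active, 0 = decoy
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
ef = enrichment_factor(labels, fraction=0.2)  # EF in the top 20%
```

An EF of 1 corresponds to random selection; values well above 1 in the early fraction of the list are what enrichment plots are meant to reveal.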
In many instances, virtual screening is not used to replace HTS
but rather to pre-filter the ligand library to a smaller subset of
compounds (a focused library) that are more likely to have the
required properties to be active compounds for the biological target of
interest. As part of this library design process, computational meth-
ods are also used to estimate lead-like, drug-like, or pharmacoki-
netic properties that are utilized to preselect compounds with
reasonable properties for the drug-discovery project.

18. Conclusions

In this review we have introduced a number of in silico methods
that have a myriad of applications in pharmacology. We have classi-
fied these approaches into bioinformatics-based methods, which
deal with protein targets, and chemoinformatics-based methods,
which deal with small molecules. In addition, we have discussed in
some detail the role of free energy methods in estimating binding
energies. Although the FEP methods are computationally inten-
sive, their applications in designing new inhibitor molecules and in
understanding the structure–activity relationships of series of com-
pounds are unmatched. Taken together, the in silico methods
clearly complement the in vitro and in vivo methods in pharmacol-
ogy and have become an integral part of the drug discovery process.

Acknowledgments

We would like to thank our collaborators for their views and comments
on the manuscript and Bharat Kumar Stanam for his help in
designing the figures. SK is funded by an American Heart Association
Scientist Development Grant.
References
1. Figeys D (2004) Combining different ‘omics’ technologies to map and validate protein–protein interactions in humans. Brief Funct Genomic Proteomic 2:357–365
2. Cusick ME, Klitgord N, Vidal M, Hill DE (2005) Interactome: gateway into systems biology. Hum Mol Genet 14(Spec No. 2):R171–R181
3. Chakravarti B, Mallik B, Chakravarti DN (2010) Proteomics and systems biology: application in drug discovery and development. Methods Mol Biol 662:3–28
4. Butcher EC, Berg EL, Kunkel EJ (2004) Systems biology in drug discovery. Nat Biotechnol 22:1253–1259
5. Cho CR, Labow M, Reinhardt M, van Oostrum J, Peitsch MC (2006) The application of systems biology to drug discovery. Curr Opin Chem Biol 10:294–302
6. Chen C, McGarvey PB, Huang H, Wu CH (2010) Protein bioinformatics infrastructure for the integration and analysis of multiple high-throughput “omics” data. Adv Bioinform 423589:19
7. Gund P, Maliski E, Brown F (2005) Editorial overview: whither the pharmaceutical industry? Curr Opin Drug Discov Dev 8:296–297
8. Brown FK (1998) Chemoinformatics: what is it and how does it impact drug discovery. Annu Rep Med Chem 33:9
9. Gasteiger J, Engel T (2004) Chemoinformatics: a textbook. Wiley, Weinheim
10. Hopkins AL (2008) Network pharmacology: the next paradigm in drug discovery. Nat Chem Biol 4:682–690
11. Metz JT, Hajduk PJ (2010) Rational approaches to targeted polypharmacology: creating and navigating protein-ligand interaction networks. Curr Opin Chem Biol 14:498–504
12. Morrow JK, Tian L, Zhang S (2010) Molecular networks in drug discovery. Crit Rev Biomed Eng 38:143–156
13. Scheibye-Alsing K, Hoffmann S, Frankel A, Jensen P, Stadler PF, Mang Y, Tommerup N, Gilchrist MJ, Nygard AB, Cirera S, Jorgensen CB, Fredholm M, Gorodkin J (2009) Sequence assembly. Comput Biol Chem 33:121–136
14. Huang X (2002) Bioinformatics support for genome sequencing projects. In: Lengauer T (ed) Bioinformatics—from genomes to drugs. Wiley-VCH Verlag GmbH, Weinheim
15. Mihara M, Itoh T, Izawa T (2010) SALAD database: a motif-based database of protein annotations for plant comparative genomics. Nucleic Acids Res 38:D835–D842
16. Katayama S, Kanamori M, Hayashizaki Y (2004) Integrated analysis of the genome and the transcriptome by FANTOM. Brief Bioinform 5:249–258
17. Blanchette M (2007) Computation and analysis of genomic multi-sequence alignments. Annu Rev Genomics Hum Genet 8:193–213
18. Mungall CJ, Misra S, Berman BP, Carlson J, Frise E, Harris N, Marshall B, Shu S,
Kaminker JS, Prochnik SE, Smith CD, Smith E, Tupy JL, Wiel C, Rubin GM, Lewis SE (2002) An integrated computational pipeline and database to support whole-genome sequence annotation. Genome Biol 3:RESEARCH0081
19. Lewis SE, Searle SM, Harris N, Gibson M, Lyer V, Richter J, Wiel C, Bayraktaroglir L, Birney E, Crosby MA, Kaminker JS, Matthews BB, Prochnik SE, Smithy CD, Tupy JL, Rubin GM, Misra S, Mungall CJ, Clamp ME (2002) Apollo: a sequence annotation editor. Genome Biol 3:RESEARCH0082
20. Pirovano W, Heringa J (2010) Protein secondary structure prediction. Methods Mol Biol 609:327–348
21. Cozzetto D, Tramontano A (2008) Advances and pitfalls in protein structure prediction. Curr Protein Pept Sci 9:567–577
22. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410
23. Pearson WR (1990) Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol 183:63–98
24. Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680
25. Lassmann T, Sonnhammer EL (2005) Kalign—an accurate and fast multiple sequence alignment algorithm. BMC Bioinform 6:298
26. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797
27. Notredame C, Higgins DG, Heringa J (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol 302:205–217
28. Tusnady GE, Simon I (1998) Principles governing amino acid composition of integral membrane proteins: application to topology prediction. J Mol Biol 283:489–506
29. Rost B, Liu J (2003) The PredictProtein server. Nucleic Acids Res 31:3300–3304
30. Cole C, Barber JD, Barton GJ (2008) The Jpred 3 secondary structure prediction server. Nucleic Acids Res 36:W197–W201
31. Guzzo AV (1965) The influence of amino-acid sequence on protein structure. Biophys J 5:809–822
32. Chou PY, Fasman GD (1974) Prediction of protein conformation. Biochemistry 13:222–245
33. Bonneau R, Baker D (2001) Ab initio protein structure prediction: progress and prospects. Annu Rev Biophys Biomol Struct 30:173–189
34. Simons KT, Kooperberg C, Huang E, Baker D (1997) Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol 268:209–225
35. Hardin C, Pogorelov TV, Luthey-Schulten Z (2002) Ab initio protein structure prediction. Curr Opin Struct Biol 12:176–181
36. Sali A, Blundell TL (1993) Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 234:779–815
37. Kriwacki RW, Wu J, Tennant L, Wright PE, Siuzdak G (1997) Probing protein structure using biochemical and biophysical methods. Proteolysis, matrix-assisted laser desorption/ionization mass spectrometry, high-performance liquid chromatography and size-exclusion chromatography of p21Waf1/Cip1/Sdi1. J Chromatogr A 777:23–30
38. Kasprzak AA (2007) The use of FRET in the analysis of motor protein structure. Methods Mol Biol 392:183–197
39. Takeda-Shitaka M, Takaya D, Chiba C, Tanaka H, Umeyama H (2004) Protein structure prediction in structure based drug design. Curr Med Chem 11:551–558
40. Wlodawer A, Erickson JW (1993) Structure-based inhibitors of HIV-1 protease. Annu Rev Biochem 62:543–585
41. Kortagere S, Welsh WJ, Morrisey JM, Daly T, Ejigiri I, Sinnis P, Vaidya AB, Bergman LW (2010) Structure-based design of novel small-molecule inhibitors of Plasmodium falciparum. J Chem Inf Model 50:840–849
42. Kortagere S, Mui E, McLeod R, Welsh WJ (2011) Rapid discovery of inhibitors of Toxoplasma gondii using hybrid structure-based computational approach. J Comput Aided Mol Des 25:403–411
43. Kortagere S, Madani N, Mankowski MK, Schön A, Zentner I, Swaminathan G, Princiotto A, Anthony K, Oza A, Sierra LJ, Passic SR, Wang X, Jones DM, Stavale E, Krebs FC, Martín-García J, Freire E, Ptak RG, Sodroski J, Cocklin S, Smith AB 3rd (2012) Inhibiting early-stage events in HIV-1 replication by small-molecule targeting of the HIV-1 capsid. J Virol 86:8472–8481
44. Pazos F, Valencia A (2008) Protein co-evolution, co-adaptation and interactions. EMBO J 27:2648–2655
45. Hu CD, Chinenov Y, Kerppola TK (2002) Visualization of interactions among bZIP and Rel family proteins in living cells using bimolecular fluorescence complementation. Mol Cell 9:789–798
46. Chien CT, Bartel PL, Sternglanz R, Fields S (1991) The two-hybrid system: a method to identify and clone genes for proteins that interact with a protein of interest. Proc Natl Acad Sci U S A 88:9578–9582
47. Selbach M, Mann M (2006) Protein interaction screening by quantitative immunoprecipitation combined with knockdown (QUICK). Nat Methods 3:981–983
48. Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dumpelfeld B, Edelmann A, Heurtier MA, Hoffman V, Hoefert C, Klein K, Hudak M, Michon AM, Schelder M, Schirle M, Remor M, Rudi T, Hooper S, Bauer A, Bouwmeester T, Casari G, Drewes G, Neubauer G, Rick JM, Kuster B, Bork P, Russell RB, Superti-Furga G (2006) Proteome survey reveals modularity of the yeast cell machinery. Nature 440:631–636
49. Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO (1999) Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci U S A 96:4285–4288
50. Dandekar T, Snel B, Huynen M, Bork P (1998) Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci 23:324–328
51. Tan SH, Zhang Z, Ng SK (2004) ADVICE: automated detection and validation of interaction by co-evolution. Nucleic Acids Res 32:W69–W72
52. Aloy P, Russell RB (2003) InterPreTS: protein interaction prediction through tertiary structure. Bioinformatics 19:161–162
53. Aytuna AS, Gursoy A, Keskin O (2005) Prediction of protein-protein interactions by combining structure and sequence conservation in protein interfaces. Bioinformatics 21:2850–2855
54. Bader GD, Donaldson I, Wolting C, Ouellette BF, Pawson T, Hogue CW (2001) BIND—the biomolecular interaction network database. Nucleic Acids Res 29:242–245
55. Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M (2006) BioGRID: a general repository for interaction datasets. Nucleic Acids Res 34:D535–D539
56. Xenarios I, Salwinski L, Duan XJ, Higney P, Kim SM, Eisenberg D (2002) DIP, the database of interacting proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res 30:303–305
57. Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, Surendranath V, Niranjan V, Muthusamy B, Gandhi TK, Gronborg M, Ibarrola N, Deshpande N, Shanker K, Shivashankar HN, Rashmi BP, Ramya MA, Zhao Z, Chandrika KN, Padma N, Harsha HC, Yatish AJ, Kavitha MP, Menezes M, Choudhury DR, Suresh S, Ghosh N, Saravana R, Chandran S, Krishna S, Joy M, Anand SK, Madavan V, Joseph A, Wong GW, Schiemann WP, Constantinescu SN, Huang L, Khosravi-Far R, Steen H, Tewari M, Ghaffari S, Blobe GC, Dang CV, Garcia JG, Pevsner J, Jensen ON, Roepstorff P, Deshpande KS, Chinnaiyan AM, Hamosh A, Chakravarti A, Pandey A (2003) Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res 13:2363–2371
58. Kerrien S, Alam-Faruque Y, Aranda B, Bancarz I, Bridge A, Derow C, Dimmer E, Feuermann M, Friedrichsen A, Huntley R, Kohler C, Khadake J, Leroy C, Liban A, Lieftink C, Montecchi-Palazzi L, Orchard S, Risse J, Robbe K, Roechert B, Thorneycroft D, Zhang Y, Apweiler R, Hermjakob H (2007) IntAct—open source resource for molecular interaction data. Nucleic Acids Res 35:D561–D565
59. Zanzoni A, Montecchi-Palazzi L, Quondam M, Ausiello G, Helmer-Citterich M, Cesareni G (2002) MINT: a molecular INTeraction database. FEBS Lett 513:135–140
60. Ogmen U, Keskin O, Aytuna AS, Nussinov R, Gursoy A (2005) PRISM: protein interactions by structural matching. Nucleic Acids Res 33:W331–W336
61. Keskin O, Ma B, Nussinov R (2005) Hot regions in protein–protein interactions: the organization and contribution of structurally conserved hot spot residues. J Mol Biol 345:1281–1294
62. Chen YC, Lo YS, Hsu WC, Yang JM (2007) 3D-partner: a web server to infer interacting partners and binding models. Nucleic Acids Res 35:W561–W567
63. Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, Chung S, Emili A, Snyder M, Greenblatt JF, Gerstein M (2003) A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science 302:449–453
64. Monk NA (2003) Unravelling nature’s networks. Biochem Soc Trans 31:1457–1461
65. Uetz P, Finley RL Jr (2005) From protein networks to biological systems. FEBS Lett 579:1821–1827
66. Schrattenholz A, Groebe K, Soskic V (2010) Systems biology approaches and tools for analysis of interactomes and multi-target drugs. Methods Mol Biol 662:29–58
67. Lowe JA, Jones P, Wilson DM (2010) Network biology as a new approach to drug discovery. Curr Opin Drug Discov Dev 13:524–526
68. Kell DB (2006) Systems biology, metabolic modelling and metabolomics in drug discovery and development. Drug Discov Today 11:1085–1092
69. Xie L, Bourne PE (2011) Structure-based systems biology for analyzing off-target binding. Curr Opin Struct Biol 21(2):189–199
70. Imoto S, Higuchi T, Goto T, Tashiro K, Kuhara S, Miyano S (2003) Combining microarrays and biological knowledge for estimating gene networks via Bayesian networks. Proc IEEE Comput Soc Bioinform Conf 2:104–113
71. Needham CJ, Manfield IW, Bulpitt AJ, Gilmartin PM, Westhead DR (2009) From gene expression to gene regulatory networks in Arabidopsis thaliana. BMC Syst Biol 3:85
72. Otero JM, Papadakis MA, Udatha DB, Nielsen J, Panagiotou G (2010) Yeast biological networks unfold the interplay of antioxidants, genome and phenotype, and reveal a novel regulator of the oxidative stress response. PLoS One 5:e13606
73. Teusink B, Westerhoff HV, Bruggeman FJ (2010) Comparative systems biology: from bacteria to man. Wiley Interdiscip Rev Syst Biol Med 2:518–532
74. Neumuller RA, Perrimon N (2010) Where gene discovery turns into systems biology: genome-scale RNAi screens in Drosophila. Wiley Interdiscip Rev Syst Biol Med 3:471–478
75. Bier E, Bodmer R (2004) Drosophila, an emerging model for cardiac disease. Gene 342:1–11
76. Gianchandani EP, Chavali AK, Papin JA (2010) The application of flux balance analysis in systems biology. Wiley Interdiscip Rev Syst Biol Med 2:372–382
77. Neves SR, Iyengar R (2009) Models of spatially restricted biochemical reaction systems. J Biol Chem 284:5445–5449
78. Czock D, Markert C, Hartman B, Keller F (2009) Pharmacokinetics and pharmacodynamics of antimicrobial drugs. Exp Opin Drug Metab Toxicol 5:475–487
79. Chien JY, Friedrich S, Heathman MA, de Alwis DP, Sinha V (2005) Pharmacokinetics/pharmacodynamics and the stages of drug development: role of modeling and simulation. AAPS J 7:E544–E559
80. Danhof M, de Jongh J, De Lange EC, Della Pasqua O, Ploeger BA, Voskuyl RA (2007) Mechanism-based pharmacokinetic-pharmacodynamic modeling: biophase distribution, receptor theory, and dynamical systems analysis. Annu Rev Pharmacol Toxicol 47:357–400
81. Paul Lee WN, Wahjudi PN, Xu J, Go VL (2010) Tracer-based metabolomics: concepts and practices. Clin Biochem 43:1269–1277
82. Chipman KC, Singh AK (2011) Using stochastic causal trees to augment Bayesian networks for modeling eQTL datasets. BMC Bioinform 12:7
83. Hou L, Wang L, Qian M, Li D, Tang C, Zhu Y, Deng M, Li F (2011) Modular analysis of the probabilistic genetic interaction network. Bioinformatics 27:853
84. Villar HO, Hansen MR (2009) Mining and visualizing the chemical content of large databases. Curr Opin Drug Discov Dev 12:367–375
85. Langer T, Hoffmann R, Bryant S, Lesur B (2009) Hit finding: towards ‘smarter’ approaches. Curr Opin Pharmacol 9:589–593
86. Fujita T, Hansch C (1967) Analysis of the structure-activity relationship of the sulfonamide drugs using substituent constants. J Med Chem 10:991–1000
87. Golbraikh A, Tropsha A (2002) Beware of q2! J Mol Graph Model 20:269–276
88. Kubinyi H (2002) High throughput in drug discovery. Drug Discov Today 7:707–709
89. Kubinyi H, Hamprecht FA, Mietzner T (1998) Three-dimensional quantitative similarity-activity relationships (3D QSiAR) from SEAL similarity matrices. J Med Chem 41:2553–2564
90. Free SM Jr, Wilson JW (1964) A mathematical contribution to structure-activity studies. J Med Chem 7:395–399
91. Brown RD, Martin YC (1996) Use of structure activity data to compare structure-based clustering methods and descriptors for use in compound selection. J Chem Inform Comput Sci 36:12
92. Brown RD, Martin YC (1997) The information content of 2D and 3D structural descriptors relevant to ligand-receptor binding. J Chem Inform Comput Sci 37:9
93. Cramer RD, Patterson DE, Bunce JD (1988) Comparative molecular-field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins. J Am Chem Soc 110:8
94. Bravi G, Gancia E, Mascagni P, Pegna M, Todeschini R, Zaliani A (1997) MS-WHIM, new 3D theoretical descriptors derived from molecular surface properties: a comparative 3D QSAR study in a series of steroids. J Comput Aided Mol Des 11:79–92
95. Belvisi L, Bravi G, Scolastico C, Vulpetti A, Salimbeni A, Todeschini R (1994) A 3D QSAR approach to the search for geometrical similarity in a series of nonpeptide angiotensin II receptor antagonists. J Comput Aided Mol Des 8:211–220
96. Silverman BD, Platt DE (1996) Comparative molecular moment analysis (CoMMA): 3D-QSAR without molecular superposition. J Med Chem 39:2129–2140
97. Hopfinger AJ, Wang S, Tokarski JS, Jin B, Albuquerque M, Madhav PJ, Duraiswami C (1997) Construction of 3D-QSAR models using the 4D-QSAR analysis formalism. J Am Chem Soc 119:15
98. Vedani A, Briem H, Dobler M, Dollinger H, McMasters DR (2000) Multiple-conformation and protonation-state representation in 4D-QSAR: the neurokinin-1 receptor system. J Med Chem 43:4416–4427
99. Lukacova V, Balaz S (2003) Multimode ligand binding in receptor site modeling: implementation in CoMFA. J Chem Inf Comput Sci 43:2093–2105
100. Lill MA, Vedani A, Dobler M (2004) Raptor: combining dual-shell representation, induced-fit simulation, and hydrophobicity scoring in receptor modeling: application toward the simulation of structurally diverse ligand sets. J Med Chem 47:6174–6186
101. Lill MA, Dobler M, Vedani A (2006) Prediction of small-molecule binding to cytochrome P450 3A4: flexible docking combined with multidimensional QSAR. ChemMedChem 1:73–81
102. Vedani A, Dobler M, Lill MA (2005) Combining protein modeling and 6D-QSAR. Simulating the binding of structurally diverse ligands to the estrogen receptor. J Med Chem 48:3700–3703
103. Vedani A, Dobler M (2002) 5D-QSAR: the key for simulating induced fit? J Med Chem 45:2139–2149
104. Norinder U (2005) In silico modelling of ADMET—a minireview of work from 2000 to 2004. SAR QSAR Environ Res 16:1–11
105. Yoshida F, Topliss JG (2000) QSAR model for drug human oral bioavailability. J Med Chem 43:2575–2585
106. Martin YC (2005) A bioavailability score. J Med Chem 48:3164–3170
107. Worth AP, Bassan A, De Bruijn J, Gallegos Saliner A, Netzeva T, Patlewicz G, Pavan M, Tsakovska I, Eisenreich S (2007) The role of the European Chemicals Bureau in promoting the regulatory use of (Q)SAR methods. SAR QSAR Environ Res 18:111–125
108. Leach AR, Gillet VJ, Lewis RA, Taylor R (2010) Three-dimensional pharmacophore methods in drug discovery. J Med Chem 53:539–558
109. Martin YC, Bures MG, Danaher EA, DeLazzer J, Lico I, Pavlik PA (1993) A fast new approach to pharmacophore mapping and its application to dopaminergic and benzodiazepine agonists. J Comput Aided Mol Des 7:83–102
110. Jones G, Willett P, Glen RC (1995) A genetic algorithm for flexible molecular overlay and pharmacophore elucidation. J Comput Aided Mol Des 9:532–549
111. Chang C, Swaan PW (2006) Computational approaches to modeling drug transporters. Eur J Pharm Sci 27:411–424
112. Patel Y, Gillet VJ, Bravi G, Leach AR (2002) A comparison of the pharmacophore identification programs: Catalyst, DISCO and GASP. J Comput Aided Mol Des 16:653–681
113. Ekins S, Johnston JS, Bahadduri P, D’Souza VM, Ray A, Chang C, Swaan PW (2005) In vitro and pharmacophore-based discovery of novel hPEPT1 inhibitors. Pharm Res 22:512–517
114. Chang C, Bahadduri PM, Polli JE, Swaan PW, Ekins S (2006) Rapid identification of P-glycoprotein substrates and inhibitors. Drug Metab Dispos 34:1976–1984
115. Ekins S, Kim RB, Leake BF, Dantzig AH, Schuetz EG, Lan LB, Yasuda K, Shepard RL, Winter MA, Schuetz JD, Wikel JH, Wrighton SA (2002) Application of three-dimensional quantitative structure-activity relationships of P-glycoprotein inhibitors and substrates. Mol Pharmacol 61:974–981
116. Ekins S, Kim RB, Leake BF, Dantzig AH, Schuetz EG, Lan LB, Yasuda K, Shepard RL, Winter MA, Schuetz JD, Wikel JH, Wrighton SA (2002) Three-dimensional quantitative structure-activity relationships of inhibitors of P-glycoprotein. Mol Pharmacol 61:964–973
117. Bednarczyk D, Ekins S, Wikel JH, Wright SH (2003) Influence of molecular structure on substrate binding to the human organic cation transporter, hOCT1. Mol Pharmacol 63:489–498
118. Chang C, Pang KS, Swaan PW, Ekins S (2005) Comparative pharmacophore
modeling of organic anion transporting poly- method for efficient screening of ligands
peptides: a meta-analysis of rat Oatp1a1 and binding to G-protein coupled receptors.
human OATP1B1. J Pharmacol Exp Ther J Comput Aided Mol Des 20:789–802
314:533–541 131. Ekins S, Waller CL, Swaan PW, Cruciani G,
119. Suhre WM, Ekins S, Chang C, Swaan PW, Wrighton SA, Wikel JH (2000) Progress in
Wright SH (2005) Molecular determinants predicting human ADME parameters in
of substrate/inhibitor binding to the human silico. J Pharmacol Toxicol Methods
and rabbit renal organic cation transporters 44:251–272
hOCT2 and rbOCT2. Mol Pharmacol 132. Ekins S, Ring BJ, Grace J, McRobie-Belle DJ,
67:1067–1077 Wrighton SA (2000) Present and future
120. Ekins S, Swaan PW (2004) Computational in vitro approaches for drug metabolism.
models for enzymes, transporters, channels J Pharm Tox Methods 44:313–324
and receptors relevant to ADME/TOX. Rev 133. Ekins S, Ring BJ, Bravi G, Wikel JH,
Comp Chem 20:333–415 Wrighton SA (2000) Predicting drug-drug
121. Clement OO, Mehl AT (2000) HipHop: interactions in silico using pharmacophores:
pharmacophore based on multiple common- a paradigm for the next millennium. In:
feature alignments. IUL, San Diego, CA Guner OF (ed) Pharmacophore perception,
122. Evans DA, Doman TN, Thorner DA, Bodkin development, and use in drug design. IUL,
MJ (2007) 3D QSAR methods: phase and San Diego, pp 269–299
catalyst compared. J Chem Inf Model 134. Ekins S, Obach RS (2000) Three
47:1248–1257 dimensional-quantitative structure activity
123. Bahadduri PM, Polli JE, Swaan PW, Ekins S relationship computational approaches of pre-
(2010) Targeting drug transporters—com- diction of human in vitro intrinsic clearance.
bining in silico and in vitro approaches to J Pharmacol Exp Ther 295:463–473
predict in vivo. Methods Mol Biol 135. Ekins S, Bravi G, Binkley S, Gillespie JS, Ring
637:65–103 BJ, Wikel JH, Wrighton SA (2000) Three and
124. Ekins S, Ecker GF, Chiba P, Swaan PW four dimensional-quantitative structure activ-
(2007) Future directions for drug transporter ity relationship (3D/4D-QSAR) analyses of
modeling. Xenobiotica 37:1152–1170 CYP2C9 inhibitors. Drug Metab Dispos
125. Diao L, Ekins S, Polli JE (2010) Quantitative 28:994–1002
structure activity relationship for inhibition of 136. Ekins S, Bravi G, Wikel JH, Wrighton SA
human organic cation/carnitine transporter. (1999) Three dimensional quantitative struc-
Mol Pharm 7(6):2120–2131 ture activity relationship (3D-QSAR) analysis
126. Zheng X, Ekins S, Rauffman J-P, Polli JE of CYP3A4 substrates. J Pharmacol Exp Ther
(2009) Computational models for drug inhibi- 291:424–433
tion of the human apical sodium-dependent 137. Ekins S, Bravi G, Ring BJ, Gillespie TA, Gil-
bile acid transporter. Mol Pharm 6:1591–1603 lespie JS, VandenBranden M, Wrighton SA,
127. Diao L, Ekins S, Polli JE (2009) Novel inhi- Wikel JH (1999) Three dimensional-
bitors of human organic cation/carnitine quantitative structure activity relationship
transporter (hOCTN2) via computational analyses of substrates for CYP2B6. J Pharm
modeling and in vitro testing. Pharm Res Exp Ther 288:21–29
26:1890–1900 138. Ekins S, Bravi G, Binkley S, Gillespie JS, Ring
128. Gao Q, Yang L, Zhu Y (2010) Pharmaco- BJ, Wikel JH, Wrighton SA (1999) Three and
phore based drug design approach as a practi- four dimensional-quantitative structure activ-
cal process in drug discovery. Curr Comput ity relationship (3D/4D-QSAR) analyses of
Aided Drug Des 6:37–49 CYP2D6 inhibitors. Pharmacogenetics
9:477–489
129. Keri G, Szekelyhidi Z, Banhegyi P, Varga Z,
Hegymegi-Barakonyi B, Szantai-Kis C, 139. Ekins S, Bravi G, Binkley S, Gillespie JS, Ring
Hafenbradl D, Klebl B, Muller G, Ullrich A, BJ, Wikel JH, Wrighton SA (1999) Three and
Eros D, Horvath Z, Greff Z, Marosfalvi J, four dimensional-quantitative structure activ-
Pato J, Szabadkai I, Szilagyi I, Szegedi Z, ity relationship analyses of CYP3A4 inhibi-
Varga I, Waczek F, Orfi L (2005) Drug dis- tors. J Pharm Exp Ther 290:429–438
covery in the kinase inhibitory field using the 140. Lagorce D, Sperandio O, Galons H, Miteva
Nested Chemical Library technology. Assay MA, Villoutreix BO (2008) FAF-Drugs2: free
Drug Dev Technol 3:543–551 ADME/tox filtering tool to assist drug dis-
130. Kortagere S, Welsh WJ (2006) Development covery and chemical biology projects. BMC
and application of hybrid structure based Bioinform 9:396
46 S. Kortagere et al.

141. Villoutreix BO, Renault N, Lagorce D, maceutical research and development. Wiley,
Sperandio O, Montes M, Miteva MA (2007) Hoboken, NJ, pp 495–512
Free resources to assist structure-based virtual 154. Chang C, Ekins S, Bahadduri P, Swaan PW
ligand screening experiments. Curr Protein (2006) Pharmacophore-based discovery of
Pept Sci 8:381–411 ligands for drug transporters. Adv Drug Del
142. Ekins S (2007) Computational toxicology: Rev 58:1431–1450
risk assessment for pharmaceutical and envi- 155. Hamilton RD, Foss AJ, Leach L (2007)
ronmental chemicals. Wiley, Hoboken, NJ Establishment of a human in vitro model of
143. Wang J, Hou T (2009) Recent advances on in the outer blood-retinal barrier. J Anat
silico ADME modeling. Annu Rep Comput 211:707–716
Chem 5:101–127 156. Loscher W, Potschka H (2005) Drug resistance
144. Jorgensen WL, Duffy EM (2002) Prediction in brain diseases and the role of drug efflux
of drug solubility from structure. Adv Drug transporters. Nat Rev Neurosci 6:591–602
Deliv Rev 54:355–366 157. Abraham MH, Ibrahim A, Zhao Y, Acree WE
145. Wang J, Hou T, Xu X (2009) Aqueous solu- Jr (2006) A data base for partition of volatile
bility prediction based on weighted atom type organic compounds and drugs from blood/
counts and solvent accessible surface areas. plasma/serum to brain, and an LFER analysis
J Chem Inf Model 49:571–581 of the data. J Pharm Sci 95:2091–2100
146. Delaney JS (2005) Predicting aqueous solu- 158. Kortagere S, Chekmarev D, Welsh WJ, Ekins
bility from structure. Drug Discov Today S (2008) New predictive models for blood-
10:289–295 brain barrier permeability of drug-like mole-
147. Votano JR, Parham M, Hall LH, Kier LB, cules. Pharm Res 25:1836–1845
Hall LM (2004) Prediction of aqueous solu- 159. Zhang L, Zhu H, Oprea TI, Golbraikh A, Trop-
bility based on large datasets using several sha A (2008) QSAR modeling of the blood-
QSPR models utilizing topological structure brain barrier permeability for diverse organic
representation. Chem Biodivers compounds. Pharm Res 25:1902–1914
1:1829–1841 160. Kortagere S, Ekins S (2010) Troubleshooting
148. Hou TJ, Zhang W, Xia K, Qiao XB, Xu XJ computational methods in drug discovery.
(2004) ADME evaluation in drug discovery. J Pharmacol Toxicol Methods 61:67–75
5. Correlation of Caco-2 permeation with 161. Grant MA (2009) Protein structure predic-
simple molecular properties. J Chem Inf tion in structure-based ligand design and vir-
Comput Sci 44:1585–1600 tual screening. Comb Chem High
149. Jung E, Kim J, Kim M, Jung DH, Rhee H, Throughput Screen 12:940–960
Shin JM, Choi K, Kang SK, Kim MK, Yun 162. Sjogren B, Blazer LL, Neubig RR (2010)
CH, Choi YJ, Choi SH (2007) Artificial neu- Regulators of G protein signaling proteins as
ral network models for prediction of intestinal targets for drug discovery. Prog Mol Biol
permeability of oligopeptides. BMC Bioin- Transl Sci 91:81–119
form 8:245 163. Willett P (2003) Similarity-based approaches
150. Thomas VH, Bhattachar S, Hitchingham L, to virtual screening. Biochem Soc Trans
Zocharski P, Naath M, Surendran N, Stoner 31:603–606
CL, El-Kattan A (2006) The road map to oral 164. Ebalunode JO, Zheng W (2010) Molecular
bioavailability: an industrial perspective. Exp shape technologies in drug discovery: meth-
Opin Drug Metab Toxicol 2:591–608 ods and applications. Curr Top Med Chem
151. Zheng X, Ekins S, Raufman JP, Polli JE 10:669–679
(2009) Computational models for drug inhi- 165. Horvath D (2011) Pharmacophore-based vir-
bition of the human apical sodium-dependent tual screening. Methods Mol Biol (Clifton,
bile acid transporter. Mol Pharm NJ) 672:261–298
6:1591–1603
166. Yang SY (2010) Pharmacophore modeling
152. Varma MV, Ambler CM, Ullah M, Rotter CJ, and applications in drug discovery: challenges
Sun H, Litchfield J, Fenner KS, El-Kattan AF and recent advances. Drug Discov Today
(2010) Targeting intestinal transporters for 15:444–450
optimizing oral drug absorption. Curr Drug
Metab 11:730–742 167. Ebalunode JO, Zheng W, Tropsha A (2011)
Application of QSAR and shape pharmaco-
153. Chang C, Swaan PW (2006) Computer opti- phore modeling approaches for targeted
mization of biopharmaceutical properties. In: chemical library design. Methods Mol Biol
Ekins S (ed) Computer applications in phar- (Clifton, NJ) 685:111–133
3 Role of Computational Methods in Pharmaceutical Sciences 47

168. Halperin I, Ma B, Wolfson H, Nussinov R molecular docking of ligand-protein com-


(2002) Principles of docking: an overview of plexes. J Comput Aided Mol Des 14:731–751
search algorithms and a guide to scoring func- 182. Cornell WD, Cieplak P, Bayly CI, Gould IR,
tions. Proteins 47:409–443 Merz KM Jr, Ferguson DM, Spellmeyer DC,
169. Lorber DM, Shoichet BK (2005) Hierarchical Fox T, Caldwell JW, Kollman PA (1995) A
docking of databases of multiple ligand con- second generation force field for the simula-
formations. Curr Top Med Chem 5:739–749 tion of proteins, nucleic acids, and organic
170. Koca J (1998) Travelling through conforma- molecules. J Am Chem Soc 117:19
tional space: an approach for analyzing the 183. Halgren T (1996) Merck molecular force
conformational behaviour of flexible mole- field. I. Basis, form, scope, parameterization,
cules. Prog Biophys Mol Biol 70:137–173 and performance of MMFF94. J Comput
171. Miller MD, Kearsley SK, Underwood DJ, Chem 17:490–519
Sheridan RP (1994) FLOG: a system to select 184. Clark M, Crammer RD, Van Opdenbosch N
‘quasi-flexible’ ligands complementary to a (1989) Validation of the general purpose tri-
receptor of known three-dimensional struc- pos 5.2 force field. J Comput Chem 10:30
ture. J Comput Aided Mol Des 8:153–174 185. Bohm HJ (1992) LUDI: rule-based auto-
172. Sousa SF, Fernandes PA, Ramos MJ (2006) matic design of new substituents for enzyme
Protein-ligand docking: current status and inhibitor leads. J Comput Aided Mol Des
future challenges. Proteins 65:15–26 6:593–606
173. Kuntz ID, Blaney JM, Oatley SJ, Langridge 186. Eldridge MD, Murray CW, Auton TR, Pao-
R, Ferrin TE (1982) A geometric approach to lini GV, Mee RP (1997) Empirical scoring
macromolecule-ligand interactions. J Mol functions: I. The development of a fast empir-
Biol 161:269–288 ical scoring function to estimate the binding
174. Rarey M, Kramer B, Lengauer T, Klebe G affinity of ligands in receptor complexes.
(1996) A fast flexible docking method using J Comput Aided Mol Des 11:425–445
an incremental construction algorithm. J Mol 187. Gohlke H, Hendlich M, Klebe G (2000)
Biol 261:470–489 Knowledge-based scoring function to predict
175. Halgren TA, Murphy RB, Friesner RA, Beard protein-ligand interactions. J Mol Biol
HS, Frye LL, Pollard WT, Banks JL (2004) 295:337–356
Glide: a new approach for rapid, accurate 188. Muegge I, Martin YC (1999) A general and
docking and scoring. 2. Enrichment factors fast scoring function for protein-ligand
in database screening. J Med Chem interactions: a simplified potential approach.
47:1750–1759 J Med Chem 42:791–804
176. Welch W, Ruppert J, Jain AN (1996) Ham- 189. DeWitte RS, Shakhnovich EI (1996) SMoG:
merhead: fast, fully automated docking of de novo design method based on simple, fast,
flexible ligands to protein binding sites. and accurate free energy estimates. 1. Meth-
Chem Biol 3:449–462 odology and supporting evidence. J Am
177. Junmei Wang TH, Chen L, Xiaojie Xu (1999) Chem Soc 118:11
Conformational analysis of peptides using 190. Charifson PS, Corkery JJ, Murcko MA, Wal-
Monte Carlo simulations combined with the ters WP (1999) Consensus scoring: a method
genetic algorithm. Chemom Intell Lab Syst for obtaining improved hit rates from docking
45:5 databases of three-dimensional structures into
178. Goodsell DS, Olson AJ (1990) Automated proteins. J Med Chem 42:5100–5109
docking of substrates to proteins by simulated 191. Zwanzig R (1954) High-temperature equa-
annealing. Proteins 8:195–202 tion of state by a perturbation method.
179. Jones G, Willett P, Glen RC, Leach AR, Tay- J Chem Phys 22:1420–1426
lor R (1997) Development and validation of a 192. Gilson MK, Zhou HX (2007) Calculation of
genetic algorithm for flexible docking. J Mol protein-ligand binding affinities. Annu Rev
Biol 267:727–748 Biophys Biomol Struct 36:21–42
180. Kitchen DB, Decornez H, Furr JR, Bajorath J 193. Singh N, Warshel A (2010) Absolute binding
(2004) Docking and scoring in virtual screen- free energy calculations: on the accuracy of
ing for drug discovery: methods and applica- computational scoring of protein-ligand
tions. Nat Rev Drug Discov 3:935–949 interactions. Proteins 78:1705–1723
181. Verkhivker GM, Bouzida D, Gehlhaar DK, 194. Leach AR (2001) Molecular modelling prin-
Rejto PA, Arthurs S, Colson AB, Freer ST, ciples and applications, 2nd edn. Pearson
Larson V, Luty BA, Marrone T, Rose PW Education Ltd, New York, NY
(2000) Deciphering common failures in
48 S. Kortagere et al.

195. Beutler TC, Mark AE, Vanschaik RC, Gerber 206. Orrling KM, Marzahn MR, Gutierrez-de-
PR, van Gunsteren WF (1994) Avoiding sin- Teran H, Aqvist J, Dunn BM, Larhed M
gularities and numerical instabilities in free- (2009) a-Substituted norstatines as the
energy calculations based on molecular simu- transition-state mimic in inhibitors of multi-
lations. Chem Phys Lett 222:529–539 ple digestive vacuole malaria aspartic pro-
196. Zacharias M, Straatsma TP, McCammon JA teases. Bioorg Med Chem 17:5933–5949
(1994) Separation-shifted scaling, a new scal- 207. Kerrigan JE, Ragunath C, Kandra L,
ing method for Lenard-Jones interactions in Gyemant G, Liptak A, Janossy L, Kaplan JB,
thermodynamic integration. J Chem Phys Ramasubbu N (2008) Modeling and
100:9025–9031 biochemical analysis of the activity of antibio-
197. Jorgensen W, Chandrasekhar J, Madura J, film agent Dispersin B. Acta Biol Hung
Klein M (1983) Comparison of simple poten- 59:439–451
tial functions for simulating liquid water. 208. Zhou R, Frienser RA, Ghosh A, Rizzo RC,
J Chem Phys 79:926–935 Jorgensen WL, Levy RM (2001) New linear
198. Berendsen HJ, Postma JP, van Gunsteren WF, interaction method for binding affinity
Hermans J (1981) Interaction models for calculations using a continuum solvent
water in relation to protein hydration. In: Pull- model. J Phys Chem B 105:10388–10397
man B (ed) Intermolecular forces. D. Reidel 209. Zoete V, Meuwly M, Karplus M (2004)
Publishing Co., Dordrecht, pp 331–342 Investigation of glucose binding sites on insu-
199. Åqvist J, Medina C, Samuelsson JE (1994) A lin. Proteins 55:568–581
new method for predicting binding affinity in 210. Liu S, Zhou LH, Wang HQ, Yao ZB (2010)
computer-aided drug design. Protein Eng Superimposing the 27 crystal protein/inhibi-
7:385–391 tor complexes of beta-secretase to calculate
200. Åqvist J, Hansson T (1996) On the validity of the binding affinities by the linear interaction
electrostatic linear response in polar solvents. energy method. Bioorg Med Chem Lett
J Phys Chem 100:9512–9521 20:6533–6537
201. Hansson T, Marelius J, Åqvist J (1998) 211. Alam MA, Naik PK (2009) Applying linear
Ligand binding affinity prediction by linear interaction energy method for binding affinity
interaction energy methods. J Comput calculations of podophyllotoxin analogues
Aided Mol Des 12:27–35 with tubulin using continuum solvent model
202. Almlöf M, Carlsson J, Åqvist J (2007) and prediction of cytotoxic activity. J Mol
Improving the accuracy of the linear interac- Graph Model 27:930–943
tion energy method for solvation free ener- 212. Alzate-Morales JH, Contreras R, Soriano A,
gies. J Chem Theory Comput 3:2162–2175 Tunon I, Silla E (2007) A computational
203. Almlöf, M. (2007) Computational Methods study of the protein-ligand interactions in
for Calculation of Ligand-Receptor Binding CDK2 inhibitors: using quantum mechan-
Affinities Involving Protein and Nucleic Acid ics/molecular mechanics interaction energy
Complexes, In Cell and Molecular Biology, as a predictor of the biological activity. Bio-
p 53, Uppsala University, Uppsala, Sweden phys J 92:430–439
204. Almlof M, Brandsdal BO, Aqvist J (2004) Bind- 213. Wolber G, Langer T (2005) LigandScout:
ing affinity prediction with different force fields: 3-D pharmacophores derived from protein-
examination of the linear interaction energy bound ligands and their use as virtual screen-
method. J Comput Chem 25:1242–1254 ing filters. J Chem Inf Model 45:160–169
205. Jorgensen WL, Maxwell DS, Tirado-Rives J 214. Tan L, Batista J, Bajorath J (2010) Computa-
(1996) Development and testing of the OPLS tional methodologies for compound database
all-atom force field on conformational ener- searching that utilize experimental protein-
getics and properties of organic liquids. J Am ligand interaction information. Chem Biol
Chem Soc 118:11225–11236 Drug Des 76:191–200
Part II

Mathematical and Computational Modeling


Chapter 4

Best Practices in Mathematical Modeling


Lisette G. de Pillis and Ami E. Radunskaya

Abstract
Mathematical modeling is a vehicle that allows for explanation and prediction of natural phenomena. In this
chapter we present guidelines and best practices for developing and implementing mathematical models,
using cancer growth, chemotherapy, and immunotherapy modeling as examples.

Key words: Mathematical modeling, Modeling tutorial, Cancer, Immunology, Chemotherapy, Immunotherapy

1. Introduction and Overview
Mathematics is a concise language that encourages clarity of com-
munication. Mathematical modeling is a process that makes use of
the power of mathematics as a language and a tool to develop
helpful descriptions of natural phenomena. Mathematical models
of biological and medical processes are useful for a number of
reasons. These include the following: clarity in communication;
safe hypothesis testing; predictions; treatment personalization;
new treatment protocol testing.

1.1. Clarity in Communication

Describing a phenomenon using mathematics forces clarity of communication. The process of choosing mathematical terms requires one to be precise, and implicit assumptions are less likely to slip by. This formulation in mathematical terms is sometimes called a "formal model" to distinguish it from, for example, an experimental model, such as a "mouse model" (1).

Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_4, # Springer Science+Business Media, LLC 2012


For example, suppose a clinician wants to model treating a cancer with chemotherapy. Describing the process using mathematics forces us to clarify certain assumptions such as the following:
1. Is the tumor heterogeneous? Is it liquid or solid?
2. Does the immune system have any effect on the tumor; does it
slow growth or stimulate growth? Do we need to include the
immune system in the model? If so, what effect does the
chemotherapy have on the immune system?
3. Does the treatment depend on tumor vasculature and, there-
fore, does the vasculature need to be included in the mathe-
matical model?
4. Does the tumor develop resistance to the drug? Is the phase of
the cell cycle an important consideration in treatment?
A mathematical, or “formal,” model makes clear which features
are most important when considering chemotherapy treatment in a
particular case. In the remainder of this chapter we illustrate the
process of building up the formal model from the simplest level to a
more complicated model, as required to answer a specific question.
We use the treatment of cancer as our running example.

1.2. Safe Hypothesis Testing

A useful mathematical model may allow one to test the possible mechanisms behind certain observed phenomena. For example, the reasons some patients go into remission from cancer and never relapse, while others do relapse, are not fully understood. A mathematical model, however, can allow us to test the hypothesis that the strength of a patient's immune system plays a significant role in whether or not a patient will experience relapse.

1.3. Predictions

Mathematical models can be used to predict system performance under otherwise untestable conditions. One cannot and should not experiment on patients the way one can experiment with a mathematical model. For example, with a mathematical model, one can make predictions about disease progression if a patient does not receive any treatment, and one can also test new combination therapies and alternate protocols without endangering a patient's health or safety.

1.4. Treatment Personalization

A calibrated mathematical model can be used to test personalized treatments. In practice, essentially identical treatments are given to a broad array of patients who are not identical. A mathematical model, however, allows us to take into consideration patient-specific features such as the strength of their immune response and their response to treatments. A variety of scenarios and patient profiles can therefore be efficiently and safely addressed using the mathematical model.

1.5. New Treatment Protocol Testing

A useful mathematical model may allow one to test new medical interventions: a variety of hypothetical interventions can be analyzed through the formal model much more quickly, inexpensively, and safely than can be done using clinical trials.

2. Modeling Philosophy
There are two main approaches to developing a mathematical
model from the ground up. One is to start with the most compli-
cated model that includes everything and pare it down. Alterna-
tively, in the Occam’s Razor approach, one starts with the simplest
model possible and then builds it up as necessary. We recommend
using the Occam’s Razor approach, only adding complexity when
the simple model is not sufficient to achieve the desired results.
Paul Sanchez (1) suggests keeping in mind the following
guidelines when developing a mathematical model.
● Start with the simplest model possible that captures the essential features of the system of interest.
● Build in small steps: once your simple model is working, add features incrementally to make it more realistic. Be sure to only add one thing at a time, and test the model after each addition.
● Keep only those additional features that actually improve the model: is the more complicated model more useful in answering the questions that you need addressed?
● Always compare incremental improvements with previous, simpler versions of the model. Do not hesitate to go back to earlier models, or to start over, considering a different approach.
We can summarize these guidelines in the Goldilocks Principle: a
mathematical model should be not too complicated, but not too
simple either. There is always a trade-off between complexity and
tractability. On the one hand, a highly complicated model can have
many variables and many parameters, making it appear more realistic.
It could be difficult, however, if not impossible, to estimate the large
number of parameters. Each parameter estimate introduces some new
degree of uncertainty, thus potentially diminishing the usefulness of
the model. In addition, it is typically very difficult to mathematically
analyze a system with a large number of variables. On the other hand,
a simpler model might be analytically tractable but too unrealistic: the
simple model may not be able to answer the question of interest.
There is an art to modeling: there is not necessarily one correct
model. Part of this art is to determine which elements are impor-
tant to have in the model, and which elements we can ignore.
Keeping these philosophical guidelines in mind, we view the
modeling process in terms of the following five-step approach.

We illustrate each step in the context of modeling chemotherapy of cancer. A more detailed implementation of the five-step process follows in the next section.
STEP 1: Ask the question.
Which chemotherapy protocol will most effectively control a
patient’s cancer?
STEP 2: Select the modeling approach.
There are many individuals, with many types of cancer. We first pick
a type of cancer, for example a cancer of the blood. This allows us to
use a modeling approach with no spatial component. Since the cell
populations are large, a continuous and deterministic modeling
approach is appropriate. Therefore, we develop an ordinary differ-
ential equations model.

STEP 3: Formulate the model.


We describe the interactions between the model elements as func-
tional expressions. This is the step in which we write mathematical
formulas that describe the system behavior.
STEP 4: Solve and validate.
Find a solution to the mathematical model, either numerically or
analytically. This typically involves the estimation of model para-
meters, and calibration of the model to the specific situation being
studied. In this example we would need tumor growth data from
patients, and data on responses to a particular chemotherapy
against which we could compare model outcomes. Does the
model adequately describe observed behavior? If necessary, go
back to step 2 and revise the model.
STEP 5: Answer the question.
Interpret the results from step 4. Have we determined a reasonable
chemotherapy protocol? Does this give rise to new questions? If so,
return to step 1.
Figure 1 gives a visual representation of our five-step modeling
process.

3. Example: Implementation of the Modeling Process

In this section we illustrate the five-step modeling process by using it to develop a mathematical model of tumor response to the immune system. We then extend this model to study the effects of chemotherapy on the system.

Fig. 1. The Modeling Process. This diagram shows the five steps of the modeling process, with possible loops to illustrate successive model refinement.

Step 1: Ask the Question.


How does the immune system affect tumor cell growth? Could it be
responsible for tumor “dormancy”—when the tumor apparently
regresses for a significant amount of time, followed by aggressive
recurrence?
Step 2: Select the Modeling Approach.
We need the model to track tumor and immune populations over
time. Following the philosophy of Occam’s Razor, we assume that
the populations are homogeneous, i.e., individual cells are identical,
and we assume that populations are well mixed, which means that
each individual in one population is equally likely to interact with
any individual in the other population. This simplification allows us
to neglect spatial changes, and so we only track population size
changes. The next step is to consider which intrinsic population
model we will use.
Population models can be roughly divided into several different
types. One type of model considers the evolution of the popula-
tions by keeping track of the population sizes at discrete points in
time, for example every year, day, or hour. These are (appropriately)
called discrete time models. Continuous time models represent time
as a continuum, and describe the evolution of the population sizes
as continuous functions of time. These two types of models are
illustrated in Fig. 2.

Fig. 2. Discrete versus continuous models. In a discrete model, the system is defined at a finite set of points (left panel), while in a continuous model, the system is defined on a continuous interval of time points (right panel).

Another broad distinction between mathematical models is whether the law of evolution is deterministic or stochastic.
In deterministic models, populations evolve according to a
fixed law that can be formulated as a function. Formally, this
function describes a difference equation in the discrete case:
P(t_{i+1}) = P(t_i) + F[P(t_i)]
and a differential equation in the continuous case:
dP/dt = F[P(t)].
In these equations, P is the population size, t_i is the i-th time point in the discrete model, t is the time in the continuous model,
and F is the law of evolution. In general, there could be many
populations interacting, so F could be a multivariable function. It
is possible that the evolution of the population is affected by time or
other variables, so that these might be independent variables of
the function, F as well.
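The two laws of evolution above can be sketched numerically. The following Python snippet is a minimal illustration, assuming a hypothetical linear law F[P] = rP; the rate r, the initial population, and the Euler step size dt are all arbitrary illustrative choices, not values from this chapter:

```python
# Sketch of the discrete and continuous formulations above.
# The law F[P] = r*P and all numbers are illustrative assumptions.

def F(P, r=0.1):
    """A hypothetical law of evolution: net growth proportional to population."""
    return r * P

def discrete_model(P0, steps):
    """Difference equation: P(t_{i+1}) = P(t_i) + F[P(t_i)]."""
    P = [P0]
    for _ in range(steps):
        P.append(P[-1] + F(P[-1]))
    return P

def continuous_model(P0, t_end, dt=0.01):
    """Approximate dP/dt = F[P(t)] by taking many small forward-Euler steps."""
    P = P0
    for _ in range(int(round(t_end / dt))):
        P += F(P) * dt
    return P

print(discrete_model(100.0, 5)[-1])  # after 5 discrete steps: 100 * 1.1**5
print(continuous_model(100.0, 5.0))  # approaches 100 * e**0.5 as dt -> 0
```

As dt shrinks, the Euler approximation of the continuous model converges to the exact solution P(t) = P0 * e^(rt); the discrete model, by contrast, is exact at its own time points by definition.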
A stochastic, or probabilistic model, assumes that the evolution
of the populations cannot be described by a function. Rather, it
assumes that the population’s size over time is determined by
random events that are described by some probability distribution.
Since some aspects of the environment might be random, while
others might not, it is also reasonable to consider models that
combine both deterministic and stochastic elements.
In our example, we want to describe the evolution of large cell
populations in the body. Physiological processes can be described at
any point in time, and, at least on an observational scale, time
appears to be continuous. If populations are relatively large, as in
the case of a population of tumor cells forming a tumor, or the
immune cells in an individual’s body, they appear to evolve contin-
uously over time. We therefore choose to use a continuous model

to describe these populations. Furthermore, inter-cellular reactions are described by empirically determined rates, so that we are able to
formulate deterministic laws that describe how the two populations
change over time. Hence, for this example, we choose a model that
is continuous and deterministic. We remark that there are analytical
advantages to using continuous functions as well: we can integrate
them and, if they are smooth enough, we can differentiate them.
We would like to take advantage of the analytical tools applicable to
systems of differential equations, and so at this time we choose to
represent our system of two populations as a system of differential
equations:

dE/dt = F_1(E, T)
dT/dt = F_2(E, T),
where E denotes the immune cell populations, and T denotes the
tumor cell population.
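As a numerical sketch of how such a system can be explored, the snippet below integrates one hypothetical choice of F1 and F2 with forward-Euler steps. The functional forms (a constant effector source with natural death and tumor-driven recruitment; logistic tumor growth with a mass-action kill term) and every parameter value are placeholder assumptions for illustration only; they are not the model developed in this chapter:

```python
# Euler-integration sketch of dE/dt = F1(E, T), dT/dt = F2(E, T).
# F1, F2 and all parameters below are hypothetical placeholders.

def F1(E, T, s=0.1, d=0.05, r=0.01):
    """Effectors: constant source, natural death, tumor-driven recruitment."""
    return s - d * E + r * E * T / (1.0 + T)

def F2(E, T, a=0.3, b=0.002, k=0.1):
    """Tumor: logistic growth minus a kill term from effector-tumor encounters."""
    return a * T * (1.0 - b * T) - k * E * T

def simulate(E0, T0, t_end, dt=0.001):
    """Advance both populations together with small forward-Euler steps."""
    E, T = E0, T0
    for _ in range(int(round(t_end / dt))):
        dE, dT = F1(E, T), F2(E, T)
        E, T = E + dE * dt, T + dT * dt
    return E, T

E_final, T_final = simulate(E0=1.0, T0=50.0, t_end=10.0)
print(E_final, T_final)  # both remain positive; T stays below 1/b = 500
```

In practice one would use a tested ODE solver (for example, an adaptive Runge-Kutta method) rather than hand-rolled Euler steps, but the structure, two coupled rate laws advanced together in time, is the same.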
Remark: We point out that, while focusing here on differential
equations models, these are appropriate only when a continuous
description of the variables is appropriate (many cells, many time
points). Also, ordinary differential equations (ODE’s) often assume
that the populations are well-mixed. For an example of a model of
tumor-immune interactions that has discrete and stochastic elements, see ref. (2). In the more complicated model described in the referenced paper, spatial variation and a variety of tumor-immune
interactions can be explored, at the cost of analytical tractability.
The general modeling philosophy is independent of the choice of
model type (continuous versus discrete), but we emphasize that the
choice of model must be informed by the question being asked, as
well as the analytical tools at hand.
Step 3: Formulate the Model.
Applying our philosophy of starting with the simplest possible
model, we choose to track two cell populations: effector cells and
tumor cells. Thus, our model contains two dependent variables:
T(t) = Tumor cells (number of cells or density),
E(t) = Immune Cells that kill tumor cells (Effectors) (number of cells or density),
and one independent variable: t (time).
The growth of each cell population can be divided into two
components: population growth in isolation, and competitive
interactions between populations. We initially focus on the first
component, and begin by examining how each cell population
might grow if it were isolated from the other.

3.1. Modeling Tumor Growth

The simplest growth process involves the cells dividing at a constant rate, giving the differential equation:

dT/dt = kT,  k > 0.

This equation has the solution:

T(t) = T_0 e^{kt},
where T0 = T(0) is the initial tumor population, the number of
tumor cells, at time zero. A tumor large enough to be detected
clinically is approximately 7 mm in diameter. This corresponds to a
population of approximately 10^8 cells. Exponential growth implies
that, with a doubling time of 2 days, the tumor population would
grow to 10^11 cells and weigh 1 kg, in only 20 days! Thus, exponential
growth seems physically unrealistic, even if the growth rate were
much smaller, say, a doubling every 2 weeks.
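The arithmetic behind this claim is easy to check. The short script below (an illustrative Python sketch, not part of the chapter) solves T0 · 2^(t/t_double) = target for the elapsed time t:

```python
import math

# Exponential growth: T(t) = T0 * 2**(t / t_double)
T0 = 1e8        # detectable tumor of ~7 mm diameter (number of cells)
target = 1e11   # population weighing roughly 1 kg
t_double = 2.0  # assumed doubling time, in days

# Solve T0 * 2**(t / t_double) = target for t
t = t_double * math.log2(target / T0)
print(round(t, 1))  # about 19.9 days
```

Growing from 10^8 to 10^11 cells is a factor of 1,000 ≈ 2^10, that is, ten doublings, so a 2-day doubling time gives roughly 20 days, as stated.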
Experiments show that tumor cells grow exponentially when
the population is small, but growth slows down when the popula-
tion is large. Figure 3 compares exponential growth and self-
limiting growth on the same graph.
Figure 4 shows a graph plotting tumor growth rate (dT/dt) as a
function of tumor size (T).
[Fig. 3 plot: "Exponential Growth vs Self-Limiting Growth," with regions labeled "Exponential growth," "Slowing growth regime," and "Growth stops."]
Fig. 3. Exponential versus self-limiting growth. A plot of the time-evolution of two models of population growth on the same
axes. In the initial time period, both graphs look the same: this is the initial exponential growth phase of the self-limiting
growth model. By time t = 6, the graphs of the two models begin to separate, and by time t = 15 the graph of the self-
limiting growth model has leveled off, indicating that growth has stopped.
4 Best Practices in Mathematical Modeling 59
[Fig. 4 plot: "Self-Limiting Growth: dT/dt = aT(1 - bT)," with tumor growth rate dT/dt on the vertical axis and tumor size T (× 10^6) on the horizontal axis.]
Fig. 4. Self-limiting growth. This graph of growth rate (dT/dt) against population (T, in
units of 10^6 cells) shows one self-limiting growth model, the logistic model (see Table 1:
population growth models). Note that, for small population values, growth rate increases
as the population increases, but as the population increases beyond T = 250, growth rate
begins to decrease until it reaches zero at the carrying capacity, T = 500. This corre-
sponds to the value at which the population levels off in Fig. 3. For values of the
population, T, larger than 500, dT/dt is negative, indicating that a population larger than
the carrying capacity will decrease until it reaches 500. Models of growth may be
developed starting with an empirically derived graph, similar to this one, of the rate of
growth against the population.
Note that the curve in Fig. 4 is closely approximated by a line
with positive slope for small values of T; the slope decreases as T
increases, and the growth rate, dT/dt, reaches a maximum when T
is approximately 250. For larger population values, the growth rate
decreases, reaching zero when T = 500. For values of T larger than
500, the growth rate is negative, indicating that the tumor cell
population will decrease if the population exceeds 500. This curve
could be described by the quadratic equation:

dT/dt = aT(1 - bT).
This is also known as the logistic growth equation. In this
particular example, b = 0.002, and the model predicts that the
population will grow if 0 < T < 500, it will decrease if T > 500
and, if T = 500, the population size will remain constant. The
quantity, 1/b, gives the limiting tumor size (in this case 500), and
it is called the carrying capacity. The model also predicts that, if
T = 0, the growth rate is also zero. In the equation, a is the slope of
the graph when T = 0, so we can interpret a as the intrinsic growth
rate of the population: the rate of growth that occurs when the
population is small enough to grow almost exponentially. The
terms a and b are the two model parameters. They could be
Table 1
Growth laws

Growth law       Equation                      Number of parameters
Logistic         dN/dt = aN(1 - bN)            Two parameters: a and b
Power            dN/dt = aN^b                  Two parameters: a and b
Gompertz         dN/dt = aN ln(1/(bN))         Two parameters: a and b
von Bertalanffy  dN/dt = aN((bN)^c - 1)        Three parameters: a, b and c

This Table gives the equations and the number of parameters for four com-
monly used laws of population growth. A growth law should be chosen to fit to
experimental data and according to the principle of "parameter parsimony,"
see Fig. 5.
measured from data, or hypothesized from basic knowledge of the
system.
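To make the logistic model concrete, the sketch below (Python, with hypothetical parameter values) integrates dT/dt = aT(1 - bT) by forward-Euler steps and compares the result with the closed-form logistic solution:

```python
import math

def logistic_rhs(T, a, b):
    """dT/dt = a*T*(1 - b*T): logistic growth with carrying capacity 1/b."""
    return a * T * (1.0 - b * T)

def simulate_logistic(T0, a, b, t_end, dt=1e-3):
    """Integrate the logistic ODE with simple forward-Euler steps."""
    T = T0
    for _ in range(int(t_end / dt)):
        T += dt * logistic_rhs(T, a, b)
    return T

# Hypothetical parameters: intrinsic rate a = 1/day, carrying capacity 1/b = 500
a, b, T0 = 1.0, 0.002, 10.0
K = 1.0 / b

T_num = simulate_logistic(T0, a, b, t_end=15.0)
# Closed-form logistic solution at t = 15 for comparison
T_exact = K / (1.0 + (K / T0 - 1.0) * math.exp(-a * 15.0))
print(T_num, T_exact)  # both close to the carrying capacity of 500
```

Whatever growth law is chosen, this kind of comparison against a known solution is a useful check on the numerical integrator before it is trusted on the full model.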
There are many other possible formulas that could describe
self-limiting growth, which can be found in the literature. Growth
functions, in addition to logistic growth, that are frequently used to
model tumor growth include the power growth law, von Berta-
lanffy growth, and Gompertz growth. The equations for these
growth laws are given in Table 1.
Which intrinsic growth model is best? The answer depends on
several factors, including the type of cancer being modeled, the loca-
tion of the cancer in the body, etc. However, if we have tumor growth
data against which to calibrate potential models, we recommend
choosing an intrinsic growth model based on a best fit to the data.
In Fig. 5 we show an example of calibrating the four different
intrinsic growth models against the same tumor growth data sets.
Diefenbach et al. (3) provided tumor growth data from immuno-
compromised mice in three different sets of experiments, where the
mice were challenged with 10^3, 10^4, and 10^5 tumor cells.
We used the Matlab® routine fminsearch to find the parameter
values for each growth model that minimized the least squares
distance to all three data sets simultaneously. Figure 5 shows the
resulting model curves using the best-fit parameter values, with
residual error bars (the model value minus the data value) shown
under each of the four panels. We then compared the residual error
for each of the four models. The two models shown in the right
panels, the logistic (top) and von Bertalanffy (bottom), have smal-
ler residuals than the other two models. We chose the logistic over
von Bertalanffy using the principle of parameter parsimony, which
prefers models with the fewest parameters. Note that the logistic
model has two parameters: a and b, while the von Bertalanffy model
has three parameters: a, b and c.
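The fitting procedure itself can be sketched in a few lines. The chapter used the Matlab routine fminsearch on the experimental data; the Python sketch below instead uses synthetic data and a brute-force parameter search, purely to illustrate how residuals discriminate between candidate growth laws:

```python
import math

def logistic(t, T0, a, b):
    """Closed-form logistic curve with initial value T0 and carrying capacity 1/b."""
    K = 1.0 / b
    return K / (1.0 + (K / T0 - 1.0) * math.exp(-a * t))

def sse(model, data):
    """Sum of squared residuals between model predictions and data."""
    return sum((model(t) - y) ** 2 for t, y in data)

# Synthetic "tumor growth" data generated from a logistic curve
# (hypothetical values, not the data from ref. (3))
T0 = 10.0
data = [(t, logistic(t, T0, a=1.0, b=0.002)) for t in range(11)]

# Fit exponential growth T0*e^(a*t) by brute-force search over a
best_exp = min(
    (sse(lambda t, a=a: T0 * math.exp(a * t), data), a)
    for a in (i * 0.01 for i in range(1, 201))
)

# Fit the logistic law by brute-force search over (a, b)
best_log = min(
    (sse(lambda t, a=a, b=b: logistic(t, T0, a, b), data), a, b)
    for a in (i * 0.05 for i in range(1, 41))
    for b in (i * 0.0005 for i in range(1, 21))
)

print(best_exp[0], best_log[0])  # the logistic residual is far smaller
```

A proper fit would use a real optimizer (fminsearch, or similar), but the model-selection logic is the same: compare residual errors, then prefer the law with fewer parameters when residuals are comparable.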
Fig. 5. A comparison of four growth laws. Data from ref. (3), which describes three different mouse experiments (marked
as “Data set 1,” “Data set 2,” and “Data set 3,” respectively), is used to fit four different growth laws. The model result is
shown in solid curves, while the data points are shown by filled squares. In each case, the parameters of the models are
chosen to minimize the least squares distance from the model’s predicted values to the data. Residuals showing the
difference between the predicted values and the data are shown as bars below the graphs in each case. Note that the first
data set has more time points than the other two, so that the last three residuals are due to differences coming only from
the first data set. The two models shown in the left column, the power law and the Gompertz models, show larger residuals
than the two models depicted on the right, the logistic and the von Bertalanffy models. In this sense, the two models on the
right are “better” than the two on the left. Using the principle of “parameter parsimony,” which indicates that the model
with the fewest parameters is preferable, the logistic model is “better” than the von Bertalanffy in predicting the outcome
of the experiments represented by these data.
3.2. Modeling the Growth of Effector Cells

Cytotoxic effector cells include those immune cells that are capable
of killing tumor cells, for example Natural Killer (NK) cells or
Killer T Cells (CTL); we combine all cytotoxic effector cells into
one population, referred to as Effector Cells, denoted by E.
If we assume that there is a constant source of effector cells
providing cells at a fixed rate, s, cells per unit time, and that the
fraction of these cells that die off per unit time is another constant,
d, we get the differential equation:
dE/dt = s - dE,  with s, d > 0.
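This linear equation has a closed-form solution, E(t) = s/d + (E0 - s/d)e^(-dt), so in the absence of a tumor the effector population simply relaxes to the steady state E* = s/d. A quick check with illustrative values (not the chapter's parameters):

```python
import math

# Effector dynamics without a tumor: dE/dt = s - d*E
s, d = 0.33, 0.2  # illustrative source rate (cells/day) and death rate (1/day)
E0 = 0.0

def E(t):
    """Closed-form solution: E(t) = s/d + (E0 - s/d) * exp(-d*t)."""
    return s / d + (E0 - s / d) * math.exp(-d * t)

# The population relaxes to the steady state E* = s/d regardless of E0
print(E(0.0), E(50.0), s / d)
```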
3.3. Modeling Tumor and Immune Interactions

We now need to add to the immune cell equation a term that
represents how the production of tumor-specific effector cells
responds to the presence of the tumor. This function should incor-
porate the recognition of antigen by the immune system, and could
take several different forms. The most common function describing
the interaction of two populations follows the Law of Mass
Action, which assumes that the effect of the interaction is directly
proportional to the product of the two populations. For example, if
the effector cells were stimulated by the tumor cells according to
the Law of Mass Action, the equation would take on the following
form:
dE/dt = s - dE + rET,  with s, d, r > 0.
Note, however, that the term rET implies that the larger the
tumor, the greater the response of the immune system, and that this
response could get infinitely large. The rate of production of
tumor-specific effector cells is difficult to measure experimentally;
however, it is known that the effector response saturates, and does
not grow indefinitely. Therefore, the response function should be
an increasing function of the number of tumor cells, but should be
bounded above by some constant. One such function has a Michae-
lis–Menten form, where we replace rET by rET/(σ + T) and the
equation becomes:

dE/dt = s - dE + rET/(σ + T),  with s, d, r, σ > 0.
We now must include the destructive effect that interactions
between tumor cells and immune cells have on the cell populations.
Biologically, we assume that tumor cells can be killed by effector
cells while, in the process, the effector cells are deactivated follow-
ing interactions with tumor cells. In this case, a mass action law
makes sense, since the maximum number of cells that can be killed
under this law will not exceed the total population. Putting these
terms into the differential equations for effector cell and tumor
growth gives the system of equations:
dE/dt = s - dE + rET/(σ + T) - c1ET   (1)

dT/dt = aT(1 - bT) - c2ET,   (2)
where c1 and c2 are also positive constants.
Note that the following assumptions are now included in the
two-dimensional model:
● Effectors have a constant source.
● Effectors are recruited by tumor cells.
● Tumor cells can deactivate effectors (assume mass action law).
● Effectors have a natural death rate.
● Tumor cell population grows logistically (includes death already).
● Effector cells kill tumor cells (assume mass action law).
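Equations 1 and 2 are easy to integrate numerically. The sketch below (Python; the chapter's own computations were done in Matlab) applies the classical fourth-order Runge-Kutta scheme with the parameter values listed later in Table 3 for Figs. 7 and 8:

```python
def rhs(E, T):
    """Right-hand sides of Eqs. 1 and 2 (Table 3 values for Figs. 7 and 8)."""
    s, sigma, r, c1, d = 0.1181, 20.19, 1.131, 0.00311, 0.3743
    a, b, c2 = 1.636, 0.002, 1.0
    dE = s - d * E + r * E * T / (sigma + T) - c1 * E * T
    dT = a * T * (1.0 - b * T) - c2 * E * T
    return dE, dT

def rk4(E, T, t_end, dt=0.01):
    """Classical 4th-order Runge-Kutta integration of the two-equation system."""
    for _ in range(int(t_end / dt)):
        k1 = rhs(E, T)
        k2 = rhs(E + 0.5 * dt * k1[0], T + 0.5 * dt * k1[1])
        k3 = rhs(E + 0.5 * dt * k2[0], T + 0.5 * dt * k2[1])
        k4 = rhs(E + dt * k3[0], T + dt * k3[1])
        E += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        T += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return E, T

# Initial conditions as in the left panel of Fig. 8 (units of 10^6 cells):
# a tiny effector population facing a tumor of 10^7 cells
E_end, T_end = rk4(E=0.00001, T=10.0, t_end=50.0)
print(E_end, T_end)  # the orbit settles near the large-tumor equilibrium
```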
3.4. Dimensional Analysis

Once the model has been developed, it is very important to make
sure that the units for the parameters and populations ("state
variables") are balanced. Inappropriately mixing units is a common
mistake that beginning modelers make. The first step in checking
whether the equations are balanced is to specify the units of each
model parameter. For example, suppose we want to examine the
units in the Michaelis–Menten term. On the left, the units of dE/dt
are cells per time, or #cells/day, since E represents the number of
effector cells, and t represents time. Each term of the equation on
the right must then work out to #cells/day. Looking at the Michae-
lis–Menten term in Eq. 1, with the units in parentheses, and only
the units of r and σ as yet to be determined, we have:

dE(#cells)/dt(day) = ... r(?)E(#cells)T(#cells)/(σ(?) + T(#cells)) ...   (3)
We can now determine the proper units for the parameters r
and σ in the Michaelis–Menten term. First, it is clear that σ must be
in units of cells, that is, σ(?) = σ(#cells), for the denominator to
make sense (we add number of cells to number of cells). Secondly,
the parameter r must be in units of 1/day: since the left hand side of
Eqs. 1 and 3 is in units of (#cells/day), each additive term on the
right hand side of those equations must also be in units of
(#cells/day). Thus, with r in units of (1/day), the Michaelis–Menten
term works out to (1/day) × (#cells)^2/(#cells) = (#cells/day).
The rest of the parameter units can be worked out in a similar
fashion, giving the values for all of the parameters, which are
shown in Table 2.
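This bookkeeping can even be automated. The sketch below (an illustrative Python fragment, not from the chapter) represents each unit as a dictionary of base-unit exponents and checks that every additive term of Eq. 1 reduces to cells/day:

```python
from collections import Counter

def mul(u, v):
    """Multiply two units, each represented as {base_unit: exponent}."""
    out = Counter(u)
    out.update(v)
    return {k: e for k, e in out.items() if e != 0}

def div(u, v):
    """Divide unit u by unit v."""
    return mul(u, {k: -e for k, e in v.items()})

cells = {"cell": 1}
per_day = {"day": -1}
cells_per_day = mul(cells, per_day)

# Units of the parameters and state variables (as in Table 2)
units = {
    "s": cells_per_day, "d": per_day, "r": per_day,
    "sigma": cells, "c1": div(per_day, cells),
    "E": cells, "T": cells,
}

# Each additive term of Eq. 1 must come out to cells/day
terms = {
    "source s": units["s"],
    "death d*E": mul(units["d"], units["E"]),
    "recruitment r*E*T/(sigma+T)":
        div(mul(units["r"], mul(units["E"], units["T"])), units["sigma"]),
    "deactivation c1*E*T": mul(units["c1"], mul(units["E"], units["T"])),
}
for name, unit in terms.items():
    assert unit == cells_per_day, name
print("all terms of Eq. 1 balance to #cells/day")
```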
In summary, the differential equations model has one equation
per variable. According to the mass balance principle, each equation
gives the total change in the variable, which is equal to the amount in
minus the amount out. Positive terms represent the amount going
in, while negative terms represent the amount going out. The form
of each term is derived through biological considerations, such as
knowing there is a constant source of immune cells, and empirical
observations, such as knowing that tumor growth is self-limiting.
Figure 6 summarizes the elements in the formal mathematical model.
Step 4: Solve the System.
In general we cannot find an explicit function or formula for the
solution (also known as a closed form solution) for a mathematical
Table 2
Description of parameters and their units

Parameter name   Description                                Units
s                Constant immune cells source rate          #cells/day
σ                Steepness coefficient                      #cells
r                Tumor recruitment rate of effector cells   1/day
c1               Tumor deactivation rate of effectors       1/(cell·day)
d                Effector death rate                        1/day
a                Intrinsic tumor growth rate                1/day
1/b              Tumor population carrying capacity         #cells
c2               Effector kill rate of tumor cells          1/(cell·day)

The parameters in a model give insight into the quantities that affect the long-
term behavior of the solutions. A dimensional analysis is useful as a check that
the model equations are balanced, and this analysis should always be per-
formed before solving the model and interpreting the results
[Fig. 6 diagram, "Model Elements": the model equations with key terms labeled.

dE/dt = s + rET/(σ + T) - c1ET - dE   (Michaelis–Menten recruitment; mass action deactivation)
dT/dt = aT(1 - bT) - c2ET   (logistic growth; mass action kill)]
Fig. 6. Elements of the tumor-immune model. This figure shows the equations of the
tumor immune model, with key functional elements highlighted. The negative mass action
terms are inside a rectangular border, indicating that we assume that any cell is equally
likely to interact with any other cell. The logistic growth term in the second differential
equation describing tumor growth is shown inside a hexagonal border; this term indicates
our assumption that tumor growth is self-limiting, even in the absence of immune cells.
The recruitment term in the differential equation describing immune cell growth is shown
circled by an oval. The Michaelis–Menten form of this term indicates our assumption that
immune cell production in response to the presence of a tumor saturates at the level given
by the parameter, r. Each term in a model is developed using knowledge of the system
being modeled, for example the dynamics of cell growth and the kinetics of the immune
response, as well as available experimental data showing the shape of growth and
response curves.
Fig. 7. A phase portrait of the tumor-immune model. This figure shows orbits, the result of
numerical solutions to the tumor-immune model given in Eqs. 1 and 2. For this set of
parameter values (given in Table 3), there are three equilibria, indicated by asterisks. Two
equilibria are stable, so that nearby solutions are attracted to them. One of these
represents a relatively large tumor size near the carrying capacity of 5 × 10^8 cells,
while the other represents a relatively small tumor size of approximately 8.5 × 10^5 cells.
The third equilibrium is unstable, but the orbits that converge to it, indicated by dark lines,
serve to separate the two basins of attraction of the stable equilibria. This figure also
depicts several representative orbits, or solutions of the model. Some of these orbits
move up towards the large-tumor stable equilibrium, while the others spiral towards the
small-tumor stable equilibrium. This phase portrait captures all possible behaviors of the
model.
model, so we must use numerical solvers (computational solu-
tions). However, we can use mathematical analysis to determine
some of the qualitative features of the solution. The details are
beyond the scope of this chapter, but see ref. (4) for a good
reference.
3.5. Interpreting the Solution: The Phase Portrait

Figures 7 and 8 show a numerical solution of the model given in
Eqs. 1 and 2. Figure 7 shows a Phase Portrait, which is a graph
showing the evolution of the state variables of the system, in this
case the effector cell population, E(t), on the horizontal axis, and
the tumor cell population, T(t), plotted on the vertical axis. The
phase portrait does not show time explicitly, but rather it shows
how the two state variables of the system change over time with
respect to each other. A phase portrait is a compact way of display-
ing all possible behaviors of the system at once. An orbit of the
differential equation is a solution plotted as a curve in the phase
plane. The graph shows six orbits (thin lines), going to one of three
equilibria, or steady states, marked by asterisks in Fig. 7. The equili-
bria are found by setting the differential equations, Eqs. 1 and 2,
[Fig. 8 plots: two panels titled "Attracted to Large Tumor Equilibrium" and "Attracted to Small Tumor Equilibrium," each showing Effector Cells and Tumor Cells (number of cells, log scale) against time, with different initial tumor sizes.]
Fig. 8. Tumor-immune solution curves. This figure shows two distinct solutions of the tumor immune model using the same
parameter values as in Fig. 7 (see Table 3). On the left, the solution converges to the large tumor equilibrium, while the
graphs on the right show a solution converging to the smaller tumor equilibrium. The initial tumor population values for the
two solutions are the only thing that differs: on the left, T(0) = 10, while on the right T(0) = 1 (in units of 10^6 cells).
equal to zero, and they describe possible long-term behaviors of the
system. Equilibria are determined graphically by plotting nullclines,
the set of points where each differential equation is zero (the thicker
lines) and finding their points of intersection (the three asterisks).
In this case, there are two stable equilibria, one at the upper left of
the graph, representing a high tumor burden and a relatively low
effector cell population, and one in the middle of the graph, repre-
senting a smaller tumor population and a higher effector cell popu-
lation (notice that the tumor population is plotted using a
logarithmic scale on the vertical axis.) Most orbits will approach
one or the other of these equilibria, depending on where they
initiate. The Basin of Attraction of a stable equilibrium is the set
of orbits that converge to it: from a modeling perspective, deter-
mining the basins of attraction is equivalent to being able to predict
the long-term outcome of a particular scenario. For example, if a
patient’s condition is represented by an orbit that starts in the basin
of attraction of the small-tumor equilibrium, we predict that, even
without treatment, the tumor will soon reach a relatively small
steady state size. The third equilibrium, in between the other
two, is unstable, and has only one orbit that approaches it from
each side, one from below and one from above. This special orbit
separates the two basins of attraction.
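Numerically, the equilibria can be located by scanning for intersections of the two nullclines. The Python sketch below uses the Fig. 7 parameter values from Table 3, writes each nullcline as E as a function of T, and bisects on their difference; the equilibrium values it reports come from this reconstruction, not from figures quoted in the chapter:

```python
def tumor_nullcline_E(T, a=1.636, b=0.002, c2=1.0):
    """E on the nontrivial tumor nullcline: a*T*(1 - b*T) - c2*E*T = 0, T != 0."""
    return a * (1.0 - b * T) / c2

def effector_nullcline_E(T, s=0.1181, sigma=20.19, r=1.131,
                         c1=0.00311, d=0.3743):
    """E on the effector nullcline: s - d*E + r*E*T/(sigma+T) - c1*E*T = 0.
    Returns None where E would be unphysical (non-positive denominator)."""
    denom = d + c1 * T - r * T / (sigma + T)
    return s / denom if denom > 1e-9 else None

def gap(T):
    E = effector_nullcline_E(T)
    return None if E is None else tumor_nullcline_E(T) - E

# Scan for sign changes of the gap between the nullclines, then refine
# each bracket by bisection to locate the equilibria seen in Fig. 7.
roots = []
Ts = [0.1 + 0.1 * i for i in range(5000)]
for lo, hi in zip(Ts, Ts[1:]):
    g_lo, g_hi = gap(lo), gap(hi)
    if g_lo is None or g_hi is None or g_lo * g_hi > 0:
        continue
    for _ in range(60):  # bisection refinement
        mid = 0.5 * (lo + hi)
        g_mid = gap(mid)
        if g_mid is None:
            break
        if gap(lo) * g_mid <= 0:
            hi = mid
        else:
            lo = mid
    roots.append(0.5 * (lo + hi))

print(roots)  # three coexisting equilibria, as in the phase portrait
```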
Figure 8 shows two sets of solution curves corresponding to
two different scenarios. The set of solution curves on the left
corresponds to an orbit in the basin of attraction of the high
tumor equilibrium, while the set of curves on the right corresponds
to an orbit in the basin of attraction of the smaller tumor equilib-
rium. The initial tumor size is T(0) ¼ 10 in the first case, and T
(0) ¼ 1 in the second case. The initial effector cell populations and
all parameter values are identical in both cases. Note that in the set
of solutions on the right, the populations oscillate with decreasing
amplitude before settling at the equilibrium. This is observed as a
spiraling orbit in the phase portrait of Fig. 7.
Step 5: Interpret the Results—Did We Answer the Question?
Our question was: does the immune system explain the observed
clinical phenomena known as "tumor dormancy" and "creeping
through"? In both of these scenarios, the tumor shrinks to an
undetectable size, but after some time begins to grow again, some-
times quite aggressively. In the phase portrait analysis above, we do
see solutions in the basin of attraction of the lower tumor equilib-
rium in which the tumor size oscillates around the equilibrium
value. However, we do not see dormancy followed by aggressive
regrowth, or “creeping through.”
3.6. Reexamine Model Parameters and/or Assumptions

At this stage, before abandoning the model as inadequate, we
choose to test other parameter ranges. In fact, with one change in
a parameter value, we can get the "creeping through" phenome-
non. The new phase portrait is shown in Fig. 9. Note that in this
simulation, the value of s, the constant source rate of immune cells,
has been decreased by a factor of 10, making the outcome
particularly sensitive to the strength of the adaptive immune
response. The parameter values are given in Table 3. Two solutions
are shown in Fig. 9. The initial values of the immune and tumor cell
populations are close to each other (the points are labeled on the
graph), but one initial value results in a tumor that is small for a
time, i.e., dormant, then increases in size, gradually reaching the
lower tumor equilibrium in an oscillatory manner. The other initial
value results in a dormant phase followed by aggressive growth
towards the large tumor equilibrium. Solution curves showing the
tumor populations over time are shown in Fig. 10. In this current
model, we now see both tumor dormancy and aggressive growth,
so we have answered the question in the affirmative: the immune
response can explain both tumor dormancy and creeping through.
Step 5, continued: Can We Use the Mathematical Model to Answer
Other Questions? Now that we have a mathematical model of tumor
growth with the immune response, we can return to our original
question: “What is the best way to administer chemotherapy?” In
particular, how can we determine the optimal doses and timings for
treatments? It is probably impossible to find the “best” treatment, so
Fig. 9. Creeping-through and dormancy. This figure shows a phase portrait of the tumor-immune system, similar to Fig. 7.
However, the parameter representing the constant immune source rate is smaller in this numerical solution of the model,
resulting in a phase portrait with a different structure. There is a fourth, unstable equilibrium (marked "H"), in addition to
the three equilibria shown in Fig. 7. In addition, the shape of the basins of attraction of the two stable equilibria ("D" and
“B”) are different, giving rise to the possibility of an observed phenomenon known as creeping through. In this scenario, a
tumor decreases in size and remains small for some time (dormancy), but then begins to regrow until it reaches a
dangerously large size. This phase portrait shows that, depending on the initial conditions, a tumor can exhibit dormancy
followed by oscillating growth and shrinking until it approaches the smaller-tumor equilibrium (labeled “B”), or it can
experience shrinkage, followed by aggressive growth, or creeping-through, to the larger-tumor equilibrium (labeled “D”).
These two solutions are graphed as a function of time in Fig. 10. Parameter values are given in Table 3.
Table 3
Parameter values used in the simulations

Parameter name   Figures 7 and 8   Figures 9 and 10   Units: c = 10^6 cells, d = 10^2 days
s                0.1181            0.01               c/d
σ                20.19             20.19              c
r                1.131             1.131              1/d
c1               0.00311           0.00311            1/(c·d)
d                0.3743            0.3743             1/d
a                1.636             1.636              1/d
b                0.002             0.002              1/c
c2               1                 1                  1/(c·d)
E(0)             0.00001           3.5 and 3.6        c
T(0)             10 and 1          300                c

This Table lists all of the parameter values used to numerically solve the system given by Eqs. 1 and 2. These
solutions are shown in Figs. 7 through 10
[Fig. 10 plot: tumor population (cells × 10^6) against time (days × 10^2), showing two curves labeled "Tumor dormancy" and "Dormancy then creeping through," starting from similar initial conditions.]
Fig. 10. Creeping-through solutions. This figure shows two solutions that are also shown
in the phase portrait in Fig. 9. Only the values of the tumor populations over time are
shown in this figure, so that the two initial conditions look identical, with T(0) = 300
(× 10^6 cells). However, the initial effector cell populations differed, with
E(0) = 3.5 × 10^6 cells (solid line) for the solution that converges to the small tumor
equilibrium ("dormancy"), and E(0) = 3.6 × 10^6 cells (dashed line) for the solution that
converges to the large tumor equilibrium. The fact that a larger initial immune cell
population results in a worse outcome is a nonintuitive result of the structure of the
phase portrait. To explain this phenomenon: if the initial immune population is large, the
immune system is able to control the tumor for a while (for approximately 900 days, or
2½ years), but during this time the immune response declines to a point at which the
tumor is able to escape the immune surveillance entirely. A similar scenario ensues with a
slightly smaller initial effector cell population, but the tumor population does not get quite
as small, and hence the immune response is just a bit larger, preventing the tumor from
escaping to the large tumor equilibrium. Parameter values are given in Table 3.
instead we might be satisfied with answering whether there is a way to
improve existing protocols. When we compare treatments, we must
be able to make quantitative comparisons between outcomes. In this
context, we could decide that a treatment is better if it minimizes the
amount of tumor after a specific amount of time, while keeping
toxicity levels low. In fact, other quantitative comparisons could be
made: for example, we might want to minimize the total amount of
drug given, or we might want to minimize the amount of time until
the tumor is reduced to a specific size. In these cases the steps in the
modeling process would be the same, and we select one set of criteria
here for illustrative purposes. We refine our example model following
the steps outlined above. In the process, we illustrate how mathemat-
ical techniques, in this case optimal control theory, can be applied to a
mathematical model to answer a specific question.
Step 2: Select the Modeling Approach.
Since we have an existing ODE (ordinary differential equation)
model, we add a state variable, u(t), to model the drug, and modify
[Fig. 11 plot: "F(u) = k(1 - e^(-u))," with the per-cell kill rate F(u) on the vertical axis, the amount of drug u on the horizontal axis, and a dashed line marking the saturation level k.]
Fig. 11. Drug kill rate. This figure shows the graph of a function describing the fractional
cell kill as a function of the amount of drug at the tumor site. The function was chosen to
conform with the assumptions that (1) the kill rate increases with the amount of drug
present, and (2) the kill rate saturates at a fixed level, in this case given by the parameter
k = 0.47, indicated by the dashed line.
the equations to describe the interaction between the drug and the
cells. We note, however, that our choice of modeling approach
implies certain assumptions. For example, since we only describe
total populations, all interactions are assumed to be homogeneous.
In particular, we assume that the drug at the tumor site reaches all
tumor cells with equal likelihood, and that all cells are affected in
the same way. Since we want to monitor toxicity, we introduce
another cell population that we denote by N(t) for “normal”
cells. This population will be adversely affected by the drug, and
we consider a treatment to be “not too toxic” if this normal cell
population is maintained above a certain prespecified amount.
Step 3: Formulate the Model.
As before, we use the mass conservation principle to develop the
formal model for the extended model:
The change in population over time is equal to "amount in" minus "amount out."
Applying this to the amount of drug, we assume that the drug
goes in at a rate that we can control. Incorporating the time it takes
for the drug to be transported from the injection site to the tumor
site we denote the rate of influx of the drug to the tumor site by a
function of time, v(t). We assume that the drug decays exponen-
tially, at a rate proportional to the existing amount:
du/dt = v(t) - d2u.
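For a constant infusion rate v(t) = v0, this equation has a simple closed form, u(t) = (v0/d2)(1 - e^(-d2 t)), which makes a convenient sanity check on any numerical solver. A sketch with hypothetical rates:

```python
import math

# Drug kinetics: du/dt = v(t) - d2*u, with a constant infusion v(t) = v0
v0, d2 = 1.0, 0.4  # hypothetical infusion rate and decay rate

def u_exact(t):
    """Closed form: u(t) = (v0/d2) * (1 - exp(-d2*t)), rising toward v0/d2."""
    return (v0 / d2) * (1.0 - math.exp(-d2 * t))

# Forward-Euler integration of the same ODE as a cross-check
u, dt = 0.0, 1e-3
for _ in range(int(20.0 / dt)):
    u += dt * (v0 - d2 * u)

print(u, u_exact(20.0), v0 / d2)  # numerical and exact agree, near 2.5
```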
To model the interaction between the drug and the tumor cells,
we use a mass action term of the form F(u)T, where F(u) is a
saturating function of the amount of drug, u(t):
F(u) = k(1 - e^(-u)).
Figure 11 shows a graph of the function F(u).
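The saturation behavior is easy to verify numerically. This snippet checks that F is zero when no drug is present, increases with u, and never exceeds the plateau k (here k = 0.47, the value used in Fig. 11):

```python
import math

k = 0.47  # saturation level, as in Fig. 11

def F(u):
    """Fractional cell kill as a function of drug amount: F(u) = k*(1 - e^(-u))."""
    return k * (1.0 - math.exp(-u))

values = [F(u) for u in (0.0, 0.5, 1.0, 2.0, 5.0, 10.0)]
print(values)  # monotonically increasing, approaching but never reaching k
```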
We assume that the drug is toxic to all cells, with the toxicity
varying according to cell type. The parameter k is varied for each
cell type to indicate the different levels of toxicity. To model the
normal cell population, we use a formal model similar to the
differential equation describing tumor cell growth, using a logistic
growth law to describe the “amount in.” To model the “amount
out,” in addition to the drug kill rate, a mass action term is used to
describe competition with tumor cells. Finally, since we assume that
normal cells and tumor cells compete, we add another competition
term to reflect this in the equation for the tumor cells. This gives
the following set of four differential equations for the extended
model:
dE/dt = s - dE + rET/(σ + T) - c1ET - k1(1 - e^(-u))E   (4)

dT/dt = a1T(1 - b1T) - c2ET - c3NT - k2(1 - e^(-u))T   (5)

dN/dt = a2N(1 - b2N) - c4NT - k3(1 - e^(-u))N   (6)

du/dt = v(t) - d2u.   (7)
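A direct simulation shows how the four equations interact under a pulsed dosing schedule. The parameter values below are hypothetical, chosen only to give a stable, illustrative run of Eqs. 4-7; they are not the values behind Fig. 12:

```python
import math

# Hypothetical parameters for Eqs. 4-7 (NOT the values used for Fig. 12)
P = dict(s=0.33, d=0.2, r=0.01, sigma=0.3, c1=0.01,
         a1=0.5, b1=1.0, c2=0.5, c3=0.1,
         a2=0.4, b2=1.0, c4=0.1,
         k1=0.1, k2=0.5, k3=0.05, d2=1.0)

def v(t):
    """Pulsed dosing: infusion rate 1 for half a day, every 3 days, 10 times."""
    return 1.0 if (t < 30.0 and t % 3.0 < 0.5) else 0.0

def step(E, T, N, u, t, dt):
    """One forward-Euler step of the four-equation model."""
    kill = 1.0 - math.exp(-u)
    dE = (P["s"] - P["d"] * E + P["r"] * E * T / (P["sigma"] + T)
          - P["c1"] * E * T - P["k1"] * kill * E)
    dT = (P["a1"] * T * (1 - P["b1"] * T) - P["c2"] * E * T
          - P["c3"] * N * T - P["k2"] * kill * T)
    dN = P["a2"] * N * (1 - P["b2"] * N) - P["c4"] * N * T - P["k3"] * kill * N
    du = v(t) - P["d2"] * u
    return E + dt * dE, T + dt * dT, N + dt * dN, u + dt * du

E, T, N, u, dt = 0.15, 0.25, 1.0, 0.0, 1e-3
for i in range(int(60.0 / dt)):
    E, T, N, u = step(E, T, N, u, i * dt, dt)

print(E, T, N, u)
```

With these (hypothetical) rates the tumor is driven toward zero, the normal cells recover toward their carrying capacity, and the drug washes out after dosing stops.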
Step 4: Solve the System.
Our question is now turned into the following mathematical
problem that we must solve. We must find the function, v(t), that
will minimize the value of the tumor variable, T, while maintaining
the normal cell variable, N, above a prescribed level. A mathemati-
cal solution to this problem can be obtained using optimization
techniques based on Pontryagin’s Maximum Principle. For more
details see ref. (5). The results are shown in Fig. 12, which
compares treatments on two hypothetical individuals, one whose
initial immune population is 0.15 (in units of 10^5 cells), and
the other with a slightly smaller initial immune cell population:
E(0) = 0.1, i.e., 10^4 cells. The top row of Fig. 12 shows simulations
of the model when no chemotherapy treatment is given: the
tumor population in both “patients” increases to a dangerous
level. The middle row of Fig. 12 shows the results from the
model if v(t), the function describing the drug dosing, mimics
that of a (hypothetical) “traditional” chemotherapy protocol,
where drug is administered in bolus injections regularly every
3 days for 90 days. We note that this regular, pulsed therapy does
[Fig. 12 plots: six panels in three rows comparing treatment protocols for two initial immune populations, I0 = E(0) = 0.15 (left column) and I0 = E(0) = 0.1 (right column), each with T0 = 0.25, N0 = 1 and s = 0.33, R = 0.01, A = 0.3. Row 1: "No Chemotherapy." Row 2: "Traditional Pulsed Chemotherapy." Row 3: "Solution of Optimization Problem: Irregularly-spaced, Extended Doses." Each panel plots the Immune, Tumor, and Normal cell populations (and Drug Dose, where applicable) against time in days.]
4 Best Practices in Mathematical Modeling 73

eliminate the tumor when the initial immune cell count is 0.15, but
is unable to control tumor growth in the right-hand panel, when
the initial immune cell count is only 0.1. The bottom row of Fig. 12
shows the numerical solution to the optimization problem, which
indicates that the function v(t) should be a step function that
changes irregularly from the minimum (zero) to the maximum
dosage. With this treatment schedule, tumor growth is controlled
in both cases. We note that the total amount of drug given is
identical in the treatment scenarios depicted in the second and third
rows of Fig. 12, the two rows in which drug is given. Furthermore, the normal cell population never
falls below 0.75, which, in these scaled variables, is three-fourths of
its usual level.
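Since the governing equations (Eqs. 4 through 7) are not reproduced in this excerpt, the schedule comparison can only be sketched schematically. The toy model below (a single logistic tumor equation with a drug-kill term; every parameter value is invented for illustration and is not taken from the chapter's model) shows only the mechanics: how a dosing function v(t) can be encoded, and how two schedules can be constrained to deliver the same total drug.

```python
# Toy illustration (NOT the de Pillis-Radunskaya model): logistic tumor
# growth with a drug-kill term, integrated by forward Euler.  The two
# schedules deliver the same total drug: brief bolus pulses every 3 days
# versus one continuous early block of dosing.

def simulate(dose_rate, days=90.0, dt=0.01):
    """dose_rate(t) -> drug infusion rate; returns final tumor burden."""
    T, D = 0.25, 0.0                          # tumor burden (scaled), drug level
    r, K, kill, decay = 0.18, 1.0, 0.9, 0.4   # hypothetical parameters
    t = 0.0
    while t < days:
        dT = r * T * (1.0 - T / K) - kill * D * T
        dD = dose_rate(t) - decay * D
        T = max(T + dt * dT, 0.0)
        D = max(D + dt * dD, 0.0)
        t += dt
    return T

total = 30.0                   # same cumulative dose for both schedules

def pulsed(t):                 # brief bolus at the start of every 3rd day
    return (total / 30.0) / 0.1 if (t % 3.0) < 0.1 else 0.0

def block(t):                  # one continuous early block, then nothing
    return total / 30.0 if t < 30.0 else 0.0

print(simulate(pulsed), simulate(block))
```

Substituting the chapter's full four-equation system for the single toy equation, and the computed optimal-control v(t) for these hand-written schedules, would be the way to reproduce the comparison of Fig. 12; the toy parameters here do not reproduce that result.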
Step 5: Answer the Question.
We have seen that the mathematical model, which reproduced the
clinically observed phenomena of tumor dormancy and “creeping
through,” can be used to study the effect of chemotherapy treat-
ments. Using optimization theory, treatment protocols can be
identified that, according to the model, improve the treatment
outcomes, in the sense that the tumor volume is smaller at the
end of a prescribed amount of time, while normal cells never get
below three-fourths of their usual levels.

4. Conclusion

In this chapter we have given an overview of the modeling process,


and a detailed example of the process applied to the development of
a model of tumor growth, the immune response, and chemotherapy.
We illustrated how new questions might arise as the model is devel-
oped, and how these new questions can lead to new model devel-
opments. In the case of the treatment of tumors, new therapies can

Fig. 12. A comparison of chemotherapy treatment protocols. This figure depicts three different solutions of the model
system given by Eqs. 4 through 7. In the top row, the control function, v(t), which describes the rate of drug delivery, is zero
(no treatment). The left graph shows a solution with an initial value describing a patient with a strong immune system
[E(0) = 1.5 × 10^4 cells], while the right graph shows a patient with a weaker immune system [E(0) = 1 × 10^4 cells].
Without treatment, both solutions show the tumor population (asterisks) converging to the larger tumor equilibrium. In the
second row, the control function, v(t), describes regularly spaced treatments: a bolus injection every 3 days for 90 days (the
amount of drug at the tumor site is depicted with squares). The left panel shows that this treatment does control the tumor
in the patient with the stronger immune system, but it is ineffective in the case of the patient with the weaker immune
system, shown in the right panel. Note that in the right panel, the normal cells (triangles) and the immune cells (circles) are
decreasing after treatment as the tumor grows. In the third row of the figure, the control function v(t) is the solution to the
optimal control problem. It is an irregularly varying function, and in both cases, the stronger and the weaker immune
response shown in the left and right panels, the tumor is reduced to zero. Note that the total amount of drug delivered (the
area under the graph of the function v(t)) is the same in the second and third rows.

also lead to new models. For example, the development of cancer


vaccines has meant the introduction of new compartments into the
model, such as the spleen, in order to include the proliferation of
different types of immune cells due to the vaccine. Delays, repre-
senting the time required for the immune response to initiate, must
also be included in the equations. Further model refinements
include spatial variables to include tumor heterogeneity and the
inclusion of stochastic components to model uncertainties and
variation in the response of individual cells; see refs. (2), (6), and
(7) for further model refinements. General articles about the
modeling process that might be useful are refs. (1) and (8).
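The delay terms mentioned above require the solver to remember past states. A minimal sketch (fixed-step Euler integration of Hutchinson's delayed logistic equation, chosen only as a generic example and not the vaccine model itself) shows the history buffer that such terms introduce:

```python
# Minimal fixed-step integration of a delay differential equation,
# x'(t) = r * x(t) * (1 - x(t - tau))  (Hutchinson's delayed logistic),
# illustrating the history buffer that delay terms require.
def delayed_logistic(r=0.5, tau=1.0, x0=0.1, t_end=40.0, dt=0.01):
    lag = int(round(tau / dt))            # delay measured in steps
    hist = [x0] * (lag + 1)               # constant pre-history on [-tau, 0]
    for _ in range(int(t_end / dt)):
        x_now, x_lag = hist[-1], hist[-1 - lag]
        hist.append(x_now + dt * r * x_now * (1.0 - x_lag))
    return hist[-1]

print(f"x(40) = {delayed_logistic():.4f}")  # settles near the equilibrium x = 1
```

Production work would use a dedicated DDE solver with interpolation of the history rather than this fixed-step buffer, but the bookkeeping idea is the same.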

References

1. Sanchez PJ (2006) As simple as possible, but no simpler: a gentle introduction to simulation modeling. Proceedings of the 2006 winter simulation conference, pp 2–10
2. de Pillis LG, Mallett D, Radunskaya AE (2006) Spatial tumor–immune modeling. Comput Math Model Med 7(2):159–176
3. Diefenbach A, Jensen ER, Jamieson A, Raulet DH (2001) Rae1 and H60 ligands of the NKG2D receptor stimulate tumour immunity. Nature 413:165–171
4. Borrelli R, Coleman C (2004) Differential equations—a modeling perspective. Wiley, New York, NY
5. de Pillis LG, Radunskaya AE (2001) A mathematical tumor model with immune resistance and drug therapy: an optimal control approach. J Theor Med 3:79–100
6. de Pillis LG, Radunskaya AE, Wiseman CL (2005) A validated mathematical model of cell-mediated immune response to tumor growth. Cancer Res 65:7950–7958
7. de Pillis LG, Radunskaya AE (2006) Some promising approaches to tumor–immune modeling. In: Gumel A, Castillo-Chavez C, Mickens R, Clemence DP (eds) Mathematical studies on human disease dynamics: emerging paradigms and challenges. AMS Contemporary Mathematics Series
8. Cooke K (1981) On the construction and evaluation of mathematical models. In: Sabloff JA (ed) Simulations in Archaeology. University of New Mexico Press, Albuquerque, NM
Chapter 5

Tools and Techniques


Arthur N. Mayeno and Brad Reisfeld

Abstract
This chapter lists some of the software and tools that are used in computational toxicology, as presented in
this volume.

Key words: Software, Mathematical modeling, Algorithms, Computational toxicology, Computa-


tional chemistry, Computational biology

1. Introduction

As the name “Computational Toxicology” implies, this field


involves computations, often in the form of mathematical equa-
tions and computer algorithms, implemented either as part of a
dedicated software package or in any of a variety of programming
languages. By algorithm, we mean a list of instructions that calcu-
lates a quantity of interest or performs a task. The need for compu-
tations arises due to the vast quantity of complex information that
must be organized, filtered, and analyzed, and the desire to perform
elaborate calculations that are impossible without the use of a
computer, such as solving complex systems of coupled differential
equations or performing Monte Carlo simulations, where a large
number of repeated random samplings are required.
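As a concrete instance of the repeated random sampling just mentioned, a few lines of code can propagate parameter uncertainty through a simple model. The example below uses a one-compartment plasma-concentration formula with a log-normally distributed clearance; every numerical value is invented purely for illustration.

```python
import math
import random

# Monte Carlo uncertainty propagation through a toy one-compartment
# pharmacokinetic formula C = (Dose/V) * exp(-(CL/V) * t).  Clearance CL
# is sampled from a log-normal distribution; all parameter values are
# hypothetical.
random.seed(0)
dose, volume, t = 100.0, 40.0, 4.0          # mg, L, h (invented)
samples = []
for _ in range(20000):
    cl = random.lognormvariate(math.log(5.0), 0.3)   # L/h, median 5
    samples.append(dose / volume * math.exp(-cl / volume * t))

samples.sort()
mean = sum(samples) / len(samples)
p05, p95 = samples[1000], samples[19000]    # crude 5th/95th percentiles
print(f"mean {mean:.3f} mg/L, 90% interval [{p05:.3f}, {p95:.3f}]")
```

The point is only the pattern: sample the uncertain inputs many times, run the model on each sample, and summarize the resulting output distribution instead of reporting a single number.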
Over the past few decades, as the field has
matured, an increasing number of software packages have become
available. Before acquiring a software application, it is crucial for
prospective users to evaluate a number of factors, in addition to
technical considerations. Some software packages are available at
no cost while others are relatively expensive and may require
the payment of annual, upgrade, and per-seat licensing fees.
The documentation and level of support also vary widely across

Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_5, © Springer Science+Business Media, LLC 2012


the range of available software. Many packages come with some


level of technical support and, sometimes, documentation vali-
dating certain aspects of their performance, while others are
completely unsupported, with no guarantee of performance or
accuracy. Aside from evaluating the current level of support, it is
also important to consider the long-term availability, vitality, and
maintenance of the software. For instance, a software product with
a strong user base and active maintainers and support staff may be
updated with new features and bug fixes on an ongoing basis, while
a niche software application produced in a research lab may not be
available, or be maintained, past its period of use for a particular
project. Finally, there is the issue of the license under which the
software is distributed. Commercial software is often distributed
under a proprietary software license where ownership of the soft-
ware remains with the software publisher and the user is restricted
from an extensive number of activities, such as reverse engineering,
simultaneous use of the software by multiple users, and publication
of benchmarks or performance tests. On the other end of the
spectrum are a variety of free and open source licenses, some of
which essentially grant the end users permission to do anything
they wish with the source code in question, including the right to
take the code and use it as part of closed-source software or soft-
ware released under a proprietary software license.
Performing the calculations requires hardware, namely, compu-
ters. The type of computer and the computing “power” (the speed
at which a computer can perform calculations) required depend on
the nature of the calculations performed. For example, linear regres-
sion calculations (e.g., often used in quantitative structure–activity
relationships) can be accomplished instantly on a spreadsheet, while
other types of calculations (e.g., molecular dynamics of enzyme
catalysis, which involve simulating the movement of atoms and
molecules) may require weeks or months (or longer), even on
“supercomputers.” Thus, in addition to the software, computing
resources are required, which can dramatically increase the cost of
performing research in computational toxicology. Today, most soft-
ware packages can run on one or more of the three most common
computing platforms, i.e., Windows, Macintosh, and Linux/Unix,
although some functionality (such as parallelization) may not be
available on all platforms. Thus, it is essential to carefully investigate
the software and hardware before purchase.
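The linear-regression case mentioned above really is spreadsheet-sized: a one-descriptor QSAR fit needs only the closed-form least-squares expressions. The log P and toxicity values below are invented for illustration; they are not from any dataset in this volume.

```python
# Closed-form simple linear regression, the kind of one-descriptor QSAR
# fit that runs instantly on any machine.  Descriptor (log P) and
# response (log 1/LC50) values are invented for illustration.
logp = [0.5, 1.1, 1.8, 2.4, 3.0, 3.7]
tox  = [1.0, 1.6, 2.1, 2.9, 3.4, 4.1]

n = len(logp)
mx = sum(logp) / n
my = sum(tox) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(logp, tox))
sxx = sum((x - mx) ** 2 for x in logp)
slope = sxy / sxx
intercept = my - slope * mx
r2 = (sxy * sxy) / (sxx * sum((y - my) ** 2 for y in tox))
print(f"log(1/LC50) = {slope:.3f} * logP + {intercept:.3f}  (r^2 = {r2:.3f})")
```

By contrast, a molecular dynamics run has no such closed form and must advance millions of tiny time steps, which is why the two tasks sit at opposite ends of the hardware spectrum.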
Another cautionary note worth mentioning has arisen due to
the ease of performing calculations. As software packages have
allowed calculations to be performed more easily, for example
through the use of simplified user interfaces such as a “wizard,”
the user may not understand what is actually being performed. On
the other hand, new users who attempt to use a complex analysis
program without this simplified interface may be overwhelmed
with the number of program options and settings that must be
specified to solve their problem of interest correctly. As such,

researchers new to the area of computational sciences are wise to


remember the phrase, “Garbage in, garbage out”: the output
results will be worthless if the input quantities are flawed. These
input quantities comprise essentially everything used to perform
the calculations, including the quality of the data itself, parameter
values, assumptions made, structure of the model used, and so
forth. A common mistake that new modelers make is the belief
that the result the software package gives is “correct” and that the
computed number is exact (i.e., precise to the number of digits
output). Neither is true: the results are only as good as the input
used and the appropriateness and correctness of the computational
algorithm and implementation, and the precision of the values pro-
duced typically depends on a preset convergence threshold and on
the algorithm used. When using unvalidated software, the danger of erro-
neous results is even greater.
advances in the understanding of biology and the physical universe,
experimental results are still the “gold standard” in nearly all cases
(the exception being when experiments cannot be performed).
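The threshold dependence of reported precision is easy to demonstrate with a generic example, unrelated to any particular toxicology package: the same Newton iteration for a square root, stopped at different convergence tolerances, produces answers whose printed digits agree only to roughly the level the tolerance supports, even though the program happily prints fifteen of them.

```python
# Newton's method for sqrt(2), stopped at different convergence
# thresholds: digits beyond the tolerance are not meaningful, yet they
# are printed anyway.
def newton_sqrt(a, tol):
    x = a
    while abs(x * x - a) > tol:
        x = 0.5 * (x + a / x)
    return x

for tol in (1e-2, 1e-6, 1e-12):
    print(f"tol={tol:.0e} -> {newton_sqrt(2.0, tol):.15f}")
```

The loose-tolerance answer is wrong already in the third decimal place, yet it is displayed with the same fifteen digits as the tight-tolerance one; trusting displayed digits without knowing the stopping criterion is exactly the mistake described above.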
These caveats are not intended, by any means, to dissuade
scientists and researchers from using and applying computational
methodologies. Rather, they are mentioned here to help those who
are new to the field to avoid these pitfalls. We believe that the
“Golden Age of Computational Toxicology” is underway, and we
hope that others will take advantage of and contribute to this
burgeoning and practical field.
Table 1 outlines many of the software packages and tools pre-
sented in this volume. It is not a comprehensive list of the software
and tools available, just a summary of some of those mentioned in
the chapters. Depending on the purpose of their work, users—
before acquiring software—should take the time to perform a thor-
ough investigation of the software packages available and to under-
stand which packages are acceptable for their needs, either as a
purely research tool or as one used to support a regulatory filing.

Table 1
List of software packages, tools, and companies presented in this volume

Name/Developer Description/URL
Chapter 1.2
MOE (Molecular Software package contains Structure-Based Design; Pharmacophore
Operating Discovery; Protein & Antibody Modeling; Molecular Modeling &
Environment) Simulations; Cheminformatics & (HTS) Quantitative Structure/Activity
Relationship (QSAR); and Medicinal Chemistry Applications.
Chemical Computing http://www.chemcomp.com/software.htm
Group, Inc.
QikProp Package to predict parameters such as octanol/water and water/gas log P,
log S, log BB, overall CNS activity, Caco-2 and MDCK cell permeabilities,
human oral absorption, log Khsa for human serum albumin binding, and log
IC50 for hERG K+-channel blockage.
Schrödinger, LLC http://www.schrodinger.com/products/14/17/
SPARCa Predictive modeling system that calculates large number of physical/chemical
parameters from molecular structure and basic information about the
environment.
US EPA, University of http://www.epa.gov/extrmurl/research/projects/sparc/index.html; http://
Georgia archemcalc.com/index.php
OpenEye Company that offers various software packages, libraries, and toolkits.
OpenEye Scientific www.eyesopen.com
Software, Inc.
Chapter 1.3
BLASTa Sequence database search methods: finds regions of local similarity between
sequences. The program compares nucleotide or protein sequences to
sequence databases and calculates the statistical significance of matches.
BLAST can be used to infer functional and evolutionary relationships
between sequences as well as help identify members of gene families.
National Center for http://blast.ncbi.nlm.nih.gov
Biotechnology
Information
FASTAa Sequence database search methods: The FASTA programs find regions of local
or global (new) similarity between Protein or DNA sequences, either by
searching Protein or DNA databases or by identifying local duplications
within a sequence. Other programs provide information on the statistical
significance of an alignment. Like BLAST, FASTA can be used to infer
functional and evolutionary relationships between sequences as well as help
identify members of gene families.
William R. Pearson and http://fasta.bioch.virginia.edu/fasta_www2/fasta_list2.shtml
the University of
Virginia
ClustalWa Multiple sequence alignment methods.
Ch.EMBnet.org http://www.ch.embnet.org/index.html
Kaligna Multiple sequence alignment methods: a method employing the Wu-Manber
string-matching algorithm.
European http://www.ebi.ac.uk/Tools/msa/kalign/
Bioinformatics
Institute
MUSCLEa Multiple sequence alignment method.
Robert C. Edgar http://www.drive5.com/muscle/
T-Coffeea Multiple sequence alignment methods: A collection of tools for Computing,
Evaluating, and Manipulating Multiple Alignments of DNA, RNA, Protein
Sequences, and Structures.
Center for Genomic http://www.tcoffee.org/
Regulation (CRG)
Predict Proteinb Secondary structure prediction methods: PredictProtein integrates feature
prediction for secondary structure, solvent accessibility, transmembrane
helices, globular regions, coiled-coil regions, structural switch regions,
B-values, disorder regions, intra-residue contacts, protein–protein and
protein–DNA binding sites, subcellular localization, domain boundaries,
beta-barrels, cysteine bonds, metal binding sites, and disulfide bridges.
ROSTLAB.ORG http://www.predictprotein.org/
JPreda Secondary structure prediction method.
Geoff Barton, http://www.compbio.dundee.ac.uk/Software/JPred/jpred.html
Bioinformatics and
Computational
Biology Research,
University of
Dundee, Scotland,
UK
ExPASy-toolsa ExPASy is the new SIB Bioinformatics Resource Portal which provides access
to scientific databases and software tools in different areas of life sciences
including proteomics, genomics, phylogeny, systems biology, evolution,
population genetics, transcriptomics, etc.
Swiss Institute of http://expasy.org/tools/
Bioinformatics
PSI-BLASTa PSI-BLAST is similar to NCBI BLAST except that it uses position-specific
scoring matrices derived during the search; this tool is used to detect distant
evolutionary relationships.
European http://www.ebi.ac.uk/Tools/sss/psiblast/
Bioinformatics
Institute
BLASTa The Basic Local Alignment Search Tool (BLAST) finds regions of local
similarity between sequences. The program compares nucleotide or protein
sequences to sequence databases and calculates the statistical significance of
matches. BLAST can be used to infer functional and evolutionary
relationships between sequences as well as help identify members of gene
families.
National Center for http://blast.ncbi.nlm.nih.gov/Blast.cgi
Biotechnology
Information
BINDa Biomolecular Interaction Network Database (BIND), accessible as a Web database.
BIND offers methods common to related biology databases and
specializations for its protein interaction data.
Christopher Hogue’s http://www.blueprint.org/
Research Lab
BioGRIDa Biological General Repository for Interaction Datasets (BioGRID) database
was developed to house and distribute collections of protein and genetic
interactions from major model organism species.
TyersLab.com http://www.thebiogrid.org
DIPa The DIP™ database catalogs experimentally determined interactions between
proteins. It combines information from a variety of sources to create a
single, consistent set of protein–protein interactions.
Regents of the http://dip.doe-mbi.ucla.edu/dip/Main.cgi
University of
California and David
Eisenberg
HPRDb Human Protein Reference Database (HPRD) is a protein database accessible
through the Internet.
Institute of http://www.hprd.org/
Bioinformatics in
Bangalore, India, and
the Pandey lab at
Johns Hopkins
University
IntActa IntAct provides a freely available, open source database system and analysis
tools for protein interaction data.
European http://www.ebi.ac.uk/intact/
Bioinformatics
Institute (EBI)
MINTa MINT, the Molecular INTeraction database. MINT focuses on experimentally
verified protein–protein interactions mined from the scientific literature by
expert curators.
University of Rome Tor http://mint.bio.uniroma2.it/mint/Welcome.do
Vergata and IRCCS
Fondazione Santa
Lucia
KEGG KEGG (Kyoto Encyclopedia of Genes and Genomes) is a collection of online
databases dealing with genomes, enzymatic pathways, and biological
chemicals. The PATHWAY database records networks of molecular
interactions in the cells, and variants of them specific to particular
organisms.
Kanehisa Laboratories at http://www.genome.jp/kegg/pathway.html
Kyoto University and
the University of
Tokyo
UniPathwaya UniPathway is a curated resource of metabolic pathways for UniProtKB/
Swiss-Prot knowledge base.
Swiss Institute of http://www.grenoble.prabi.fr/obiwarehouse/unipathway
Bioinformatics (SIB),
French National
Institute for Research
in Computer Science
and Control (INRIA
Rhone-Alpes)—
HELIX/BAMBOO
group, Laboratoire
d’Ecologie Alpine,
Pôle Rhône-Alpin de
Bioinformatique
GastroPlus Advanced software program that simulates the absorption, pharmacokinetics,
and pharmacodynamics for drugs in human and preclinical species. The
underlying model is the Advanced Compartmental Absorption and Transit
(ACAT) model.
Simulations-Plus, Inc. http://www.simulations-plus.com/
WinNonLin WinNonlin is the industry standard for pharmacokinetic, pharmacodynamic,
and noncompartmental analysis. In addition to its extensive library of built-
in PK, PD, and PK/PD models, WinNonlin supports custom, user-defined
models to address any kind of data.
Pharsight Inc. http://www.pharsight.com
DOCKb DOCK addresses the problem of “docking” molecules to each other. In
general, “docking” is the identification of the low-energy binding modes of
a small molecule, or ligand, within the active site of a macromolecule, or
receptor, whose structure is known.
Kuntz Lab program http://dock.compbio.ucsf.edu/
FlexX Docking software: Binding mode prediction (predicts the geometry of the
protein–ligand complex) and Virtual high-throughput screening (vHTS).
BioSolveIT http://www.biosolveit.de/FlexX/
Glide Docking software: full spectrum of speed and accuracy from high-throughput
virtual screening of millions of compounds to extremely accurate binding
mode predictions.
Schrödinger, LLC http://www.schrodinger.com/products/14/5/
Autodocka A suite of automated docking tools. It is designed to predict how small
molecules, such as substrates or drug candidates, bind to a receptor of
known 3D structure.
The Scripps Research http://autodock.scripps.edu/
Institute
GOLD Protein–Ligand Docking: a program for calculating the docking modes of
small molecules in protein binding sites and is provided as part of the
GOLD Suite.
University of Sheffield, http://www.ccdc.cam.ac.uk/products/life_sciences/gold/
GlaxoSmithKline plc
and CCDC
Amber A suite of programs that allow users to carry out molecular dynamics
simulations, particularly on biomolecules.
Multiple collaborators: http://ambermd.org/
http://ambermd.org/contributors
Tripos Software company offering a variety of packages, including SYBYL-X Suite,
Muse, and D360.
Tripos http://tripos.com/
Chapter 3.1
SMILES A compact machine- and human-readable chemical nomenclature.
Daylight Chemical www.daylight.com/dayhtml_tutorials/languages/smiles/index.html
Information Systems
Inc.
QSAR Toolboxa Software to fill gaps in (eco-)toxicity data needed for assessing the hazards of
chemicals.
Organisation for http://www.qsartoolbox.org/index.html
Economic Co-
operation and
Development &
European Chemicals
Agency
CODESSA An advanced, full featured QSAR program that ties information from
AMPAC™ to experimental data.
SemiChem Inc. http://www.semichem.com/
MoKa A novel approach for in silico computation of pKa values; trained using a very
diverse set of more than 25,000 pKa values, it provides accurate and fast
calculations using an algorithm based on descriptors derived from GRID
molecular interaction fields.
Molecular Discovery http://www.moldiscovery.com/index.php
COSMO-RS A program for the quantitative calculation of solvation mixture
thermodynamics based on quantum chemistry.
COSMOlogic http://www.cosmologic.de/index.php
SPARCa Predictive modeling system that calculates large number of physical/chemical
parameters from molecular structure and basic information about the
environment.
US EPA, University of http://www.epa.gov/extrmurl/research/projects/sparc/index.html; http://
Georgia archemcalc.com/index.php
PREDICTPlus Prediction of Thermodynamic and Transport Properties.
Dragon Technology http://www.mwsoftware.com/dragon/
Inc.
PhysProps Thermodynamic/chemical physical property database and property
estimation tool.
G&P Engineering http://www.gpengineeringsoft.com/
Software
ChemDBsoft Searches databases by structure and substructure.
Chemistry Database http://www.chemdbsoft.com/
Software
EPI Suitea A Windows®-based suite of physical/chemical property and environmental
fate estimation programs developed by the EPA’s Office of Pollution
Prevention Toxics and Syracuse Research Corporation (SRC).
EPA http://www.epa.gov/opptintr/exposure/pubs/episuite.htm
CSLog D Prediction gives the log of the water/octanol partition coefficient for charged
molecules.
ChemSilico http://www.chemsilico.com/index.html
CSLog P Prediction gives the log of the water/octanol partition coefficient for neutral
molecules.
ChemSilico http://www.chemsilico.com/index.html
Marvin Suite A collection of tools for drawing, displaying, and characterizing chemical
structures, substructures, and reactions.
ChemAxon http://www.chemaxon.com/
Physicochemical & A complete array of tools for the prediction of molecular physical properties
ADMET Prediction from structure. The ability to train allows for the inclusion of novel chemical
Software space in many modules. The value of predictions has also been extended to
include a tool for property-based structure design.
ACD/Labs http://www.acdlabs.com/home/
Molconn-Z The standard program for generation of Molecular Connectivity, Shape, and
Information Indices for QSAR Analyses.
eduSoft http://www.edusoft-lc.com/molconn/
Jaguar An ab initio quantum chemistry package for the calculation of molecular
properties, including NMR, IR, pKa, partial charges, multipole moments,
polarizabilities, molecular orbitals, electron density, electrostatic potential,
Mulliken population, and NBO analysis.
Schrödinger http://www.schrodinger.com/
Statistica A comprehensive package for data analysis, data management, data
visualization, and data mining procedures.
StatSoft http://www.statsoft.com/
SPSS Software Advanced mathematical and statistical packages to extract predictive
knowledge that when deployed into existing processes makes them adaptive
to improve outcomes.
IBM http://www.spss.com/
Scigress Explorer A computational chemistry package to calculate molecular properties and
energy values. The software provides insight into chemical structure,
properties, and reactivity.
SCUBE http://www.scubeindia.com/scigress_bio.html
ProChemist A molecular modeling package to simulate on screen the physical-chemical
properties of organic molecules, design graphically new structures, and
predict their behavior.
Cadcom http://pro.chemist.online.fr/
SAS A widely used, comprehensive statistical analysis software package.
SAS Institute Inc. http://www.sas.com/
PSPPa A program for statistical analysis of sampled data; similar to SPSS; free.
http://savannah.gnu. http://www.gnu.org/software/pspp/
org/projects/pspp
NCSS A statistical and power analysis software.
NCSS http://www.ncss.com/
Minitab A leading statistical software.
Minitab http://www.minitab.com/
MATLAB A high-level language and interactive environment that enables you to
perform computationally intensive tasks faster than with traditional
programming languages such as C, C++, and Fortran.
MathWorks http://www.mathworks.com/
ADMET Predictor A software for advanced predictive modeling of ADMET properties.
Simulations Plus http://www.simulations-plus.com/Default.aspx
PLSRa Performs model construction and prediction of activity/property using the
Partial Least Squares (PLS) regression technique.
VCCLab http://www.vcclab.org/lab/pls/
Chapter 3.2
Coota For macromolecular model building, model completion, and validation,
particularly suitable for protein modeling using X-ray data.
Paul Emsley et al. www.biop.ox.ac.uk/coot/
Chapter 3.6
ProSa 2003b Software tool for the analysis of 3D structures of proteins.
CAME http://www.came.sbg.ac.at/prosa_details.php
Chapter 4.2
GastroPlus An advanced software program that simulates the absorption,
pharmacokinetics, and pharmacodynamics for drugs in human and
preclinical species.
Simulations Plus www.simulations-plus.com
PK-Sim A software tool for summary and evaluation of preclinical and clinical
experimental results on absorption, distribution, metabolism, and excretion
(ADME) by means of whole-body physiologically based pharmacokinetic
(PBPK) modeling and simulation.
Bayer Technology http://www.systems-biology.com/products/pk-sim.html
SimCYP Population-based Simulator for drug development through the modeling and
simulation of pharmacokinetics and pharmacodynamics in virtual
populations.
Simcyp Limited http://www.simcyp.com/
Chapter 4.3
ADME Suite A collection of software modules that provide predictions relating to the
pharmacokinetic profiling of compounds, specifically their ADME
properties.
ACD Labs www.acdlabs.com
ADME Descriptors ADMET Descriptors in Discovery Studio® include models for intestinal
absorption, aqueous solubility, blood–brain barrier penetration, plasma
protein binding, cytochrome P450 2D6 inhibition, and hepatotoxicity.
Accelrys www.accelrys.com
Cloe PK A software system that uses PBPK modeling for pharmacokinetic prediction.
Cyprotex www.cyprotex.com
Gastroplus An advanced software program that simulates the absorption,
pharmacokinetics, and pharmacodynamics for drugs in human and
preclinical species.
Simulations Plus www.simulations-plus.com
META Program for metabolic pathways prediction of molecules, using provided
dictionaries.
Multicase www.multicase.com
MetabolExpert Program for prediction of the metabolic fate of a compound in the drug
discovery process or during the dispositional research phase.
Compudrug www.compudrug.com
QikProp Software for rapid ADME predictions of drug candidates.
Schrödinger www.schrodinger.com
Volsurf+ Software that creates 128 molecular descriptors from 3D Molecular
Interaction Fields (MIFs) produced by Molecular Discovery’s software
GRID, which are particularly relevant to ADME prediction.
Molecular Discovery www.moldiscovery.com
MetaCore An integrated knowledge database and software suite for pathway analysis of
experimental data and gene lists.
GeneGo www.genego.com
MetaDrug A unique systems pharmacology platform designed for evaluation of biological
effects of small molecule compounds on the human body, with pathway
analysis and other bioinformatics applications from toxicogenomics to
translational medicine.
GeneGo www.genego.com
Metasite A computational procedure that predicts metabolic transformations related to
cytochrome-mediated reactions in phase I metabolism.
Molecular Discovery www.moldiscovery.com
MEXAlert A quick and sensitive tool for indicating possibilities of first-pass metabolism.
Compudrug www.compudrug.com
RetroMEX RetroMex predicts the structure of the retro-metabolites, collects them in
compound databases, and presents the results in tree format. Retro-
metabolic drug design encompasses a series of concepts, e.g., prodrugs, soft
drugs, and chemical drug-delivery systems.
Compudrug www.compudrug.com
MoKa A novel approach for in silico computation of pKa values; trained using a very
diverse set of more than 25,000 pKa values, it provides accurate and fast
calculations using an algorithm based on descriptors derived from GRID
molecular interaction fields.
Molecular Discovery www.moldiscovery.com
MCASE/MC4PC Windows-based Structure–Activity Relationship (SAR) automated expert
system: the program will automatically evaluate the dataset and try to
identify the structural features responsible for activity (biophores). It then
creates organized dictionaries of these biophores and develops ad hoc local
QSAR correlations that can be used to predict the activity of unknown
molecules.
Multicase www.multicase.com
METAPC Windows-based Metabolism and Biodegradation Expert System: uses
provided dictionaries to create metabolic paths of molecules submitted to it.
Each product is tested in silico for carcinogenicity.
Multicase www.multicase.com
ADMET Predictor A software for advanced predictive modeling of ADMET properties.
Simulations Plus http://www.simulations-plus.com/Default.aspx
Meteor The program uses expert knowledge rules in metabolism to predict the
metabolic fate of chemicals and the predictions are presented in metabolic
trees.
LHASA www.lhasalimited.org
PK Solutions An automated Excel-based program that does single- and multiple-dose
pharmacokinetic data analysis of concentration–time data from biological
samples (blood, serum, plasma, lymph, etc.) following intravenous or
extravascular routes of administration.
Summit PK www.summitpk.com
86 A.N. Mayeno and B. Reisfeld
PK-Sim A software tool for summary and evaluation of preclinical and clinical
experimental results on ADME by means of whole-body PBPK modeling
and simulation.
Bayer Technology http://www.systems-biology.com/products/pk-sim.html
Chapter 4.4
Discovery Studio A software suite of life science molecular design solutions for computational
chemists and computational biologists.
Accelrys http://accelrys.com/products/discovery-studio/index.html
MOE Software package contains Structure-Based Design; Pharmacophore
Discovery; Protein & Antibody Modeling; Molecular Modeling &
Simulations; Cheminformatics & (HTS) QSAR; and Medicinal Chemistry
Applications.
Chemical Computing http://www.chemcomp.com/software.htm
Group, Inc.
CORINA A fast and powerful 3D structure generator for small and medium-sized,
typically drug-like molecules.
Molecular Networks http://www.molecular-networks.com/products/corina
FlexX Docking software: Binding mode prediction (predicts the geometry of the
protein–ligand complex) and vHTS.
BioSolveIT http://www.biosolveit.de/FlexX/
Chapter 4.6
JSim^b A Java-based simulation system for building quantitative numeric models and
analyzing them with respect to experimental reference data.
Physiome Project http://www.physiome.org/jsim
Chapter 4.7
acslX A modeling, execution, and analysis environment for continuous dynamic
systems and processes.
The AEgis Technologies http://www.acslx.com/
Group, Inc.
Berkeley Madonna General-purpose differential equation solver.
Robert I. Macey & http://www.berkeleymadonna.com
George F. Oster
MATLAB A high-level language and interactive environment that enables you to
perform computationally intensive tasks faster than with traditional
programming languages such as C, C++, and Fortran.
MathWorks http://www.mathworks.com/
ModelMaker Two-way class tree oriented productivity, refactoring and UML-style CASE
tool. Native Refactoring and UML 2.0 modeling for Delphi and C#.
ModelMaker Tools http://www.modelmakertools.com
SCOP Database and tool to provide a detailed and comprehensive description of the
structural and evolutionary relationships between all proteins whose
structure is known.
SCOP http://scop.mrc-lmb.cam.ac.uk/scop
Chapter 5.1
QSAR Toolbox^a Software for grouping chemicals into categories and filling gaps in (eco)
toxicity data needed for assessing the hazards of chemicals.
OECD (Organisation http://www.qsartoolbox.org/index.html
for Economic
Co-operation
and Development)
& European
Chemicals Agency
Statistica A comprehensive package for data analysis, data management, data
visualization, and data mining procedures.
StatSoft http://www.statsoft.com/#
SIMCA-P A tool for scientists, researchers, product developers, engineers, and others
who have huge datasets.
Umetrics http://www.umetrics.com/simca
Chapter 5.4
Toxtree^a A full-featured and flexible user-friendly open source application, which is able
to estimate toxic hazard by applying a decision tree approach.
Ideaconsult Ltd. http://toxtree.sourceforge.net/
OncoLogic^a A desktop computer program that evaluates the likelihood that a chemical may
cause cancer.
EPA http://www.epa.gov/oppt/sf/pubs/oncologic.htm
QSAR Toolbox^a Software for grouping chemicals into categories and filling gaps in (eco)
toxicity data needed for assessing the hazards of chemicals.
OECD (Organisation http://www.qsartoolbox.org/index.html
for Economic Co-
operation and
Development) &
European Chemicals
Agency
OpenTox^a An interoperable predictive toxicology framework which may be used as an
enabling platform for the creation of predictive toxicology applications.
OpenTox http://www.opentox.org
Chapter 5.6
Caesar^a An EC-funded project (Project no. 022674—SSPI), which was specifically
dedicated to develop QSAR models for the REACH legislation.
Istituto di Ricerche http://www.caesar-project.eu
Farmacologiche
Mario Negri
Derek Nexus Expert knowledge base system that predicts whether a chemical is toxic in
humans, other mammals, and bacteria.
Lhasa Limited https://www.lhasalimited.org
HazardExpert A software tool for initial estimation of toxic symptoms of organic compounds
in humans and in animals.
Compudrug http://www.compudrug.com
Lazar^a An open source software program that makes predictions of toxicological
endpoints (e.g., mutagenicity, rodent and hamster carcinogenicity, maximum
recommended daily dose) by analyzing structural fragments in a training set.
In Silico Toxicology http://lazar.in-silico.de
gmbh
Meteor The program uses expert knowledge rules in metabolism to predict the
metabolic fate of chemicals and the predictions are presented in metabolic
trees.
LHASA www.lhasalimited.org
QSAR Toolbox^a Software for grouping chemicals into categories and filling gaps in (eco)
toxicity data needed for assessing the hazards of chemicals.
OECD (Organisation http://www.qsartoolbox.org/index.html
for Economic Co-
operation and
Development) &
European Chemicals
Agency
OncoLogic^a A desktop computer program that evaluates the likelihood that a chemical may
cause cancer.
EPA http://www.epa.gov/oppt/sf/pubs/oncologic.htm
TOPKAT Discovery Studio tool that makes predictions of a range of toxicological
endpoints, including mutagenicity, developmental toxicity, rodent
carcinogenicity, rat chronic Lowest Observed Adverse Effect Level
(LOAEL), rat Maximum Tolerated Dose, and rat oral LD50.
Accelrys http://accelrys.com
ACD/Tox Suite A collection of software modules that predict probabilities for basic toxicity
endpoints.
ACD/Labs http://www.acdlabs.com/home
Toxtree^a A full-featured and flexible user-friendly open source application, which is able
to estimate toxic hazard by applying a decision tree approach.
Ideaconsult Ltd. http://toxtree.sourceforge.net/
Chapter 6.2
CellNetOptimizer^a A MATLAB toolbox for creating logic-based models of signal transduction
networks, and training them against high-throughput biochemical data.
Saez-Rodriguez Group http://www.ebi.ac.uk/saezrodriguez/software.html#CellNetOptimizer
Software
MATLAB A high-level language and interactive environment that enables you to
perform computationally intensive tasks faster than with traditional
programming languages such as C, C++, and Fortran.
MathWorks http://www.mathworks.com/
Chapter 6.3
SMBioNet A tool for modeling genetic regulatory systems, based on the multivalued
logical formalism of René Thomas and the Computational Tree Logic
(CTL).
Laboratoire I3S— http://www.i3s.unice.fr/~richard/smbionet
CNRS & Université
de Nice
Chapter 6.4
Pajek^a A program, for Windows, for analysis and visualization of large networks
having some thousands or even millions of vertices.
Vladimir Batagelj and http://vlado.fmf.uni-lj.si/pub/networks/pajek/
Andrej Mrvar
Cytoscape^a An open source software platform for visualizing complex networks and
integrating these with any type of attribute data.
Cytoscape Consortium http://www.cytoscape.org
Chapter 7.1
MetaCore An integrated knowledge database and software suite for pathway analysis of
experimental data and gene lists.
GeneGo www.genego.com
GoMiner^a A tool for biological interpretation of “omic” data—including data from gene
expression microarrays.
NCI/LMP Genomics http://discover.nci.nih.gov/gominer/index.jsp
and Bioinformatics
group
Chapter 9.3
PASSI Toolkit^a The PTK tool is a Rational Rose plug-in that offers support for PASSI (a
Process for Agent Societies Specification and Implementation), a step-by-
step requirement-to-code methodology for designing and developing
multi-agent societies.
mcossentino, http://sourceforge.net/projects/ptk
mr_lombardo, sirtoy
MASON^a A fast discrete-event multi-agent simulation library core in Java, designed to be
the foundation for large custom-purpose Java simulations.
George Mason http://cs.gmu.edu/~eclab/projects/mason
University’s
Evolutionary
Computation
Laboratory and the
GMU Center for
Social Complexity
NetLogo^a A multi-agent programmable modeling environment.
Uri Wilensky http://ccl.northwestern.edu/netlogo
SeSAm^a A generic environment for modeling and experimenting with agent-based
simulation.
University of Würzburg http://www.simsesam.de
^a Open source or free access
^b Free for noncommercial and/or academic use
Packages and tools are listed based on their occurrence in a chapter; as such, some listings are repeated. Descriptions are generally excerpts from information obtained at the listed URLs. This list is not comprehensive; if a package, tool, or company mentioned in a chapter is not listed, it is due to an inability to locate the appropriate URL or an oversight by the authors
Part III

Cheminformatics and Chemical Property Prediction


Chapter 6

Prediction of Physicochemical Properties


John C. Dearden

Abstract
Physicochemical properties are key factors in controlling the interactions of xenobiotics with living organisms.
Computational approaches to toxicity prediction therefore generally rely to a very large extent on the
physicochemical properties of the query compounds. Consequently it is important that reliable in silico
methods are available for the rapid calculation of physicochemical properties. The key properties are partition
coefficient, aqueous solubility, and pKa and, to a lesser extent, melting point, boiling point, vapor pressure,
and Henry’s law constant (air–water partition coefficient). The calculation of each of these properties from
quantitative structure–property relationships (QSPRs) and from available software is discussed in detail, and
recommendations made. Finally, detailed consideration is given to guidelines for the development of QSPRs
and QSARs.

Key words: Physicochemical properties, Prediction, Quantitative structure–property relationships, QSPR, Prediction software, Partition coefficient, Aqueous solubility, pKa, Melting point, Boiling point, Vapor pressure, Henry’s law constant, QSPR guidelines

1. Introduction

Why, in a book devoted to the prediction of chemical toxicity, is it
necessary to consider the prediction of physicochemical properties
of chemicals? The answer is quite simple: it is largely their physico-
chemical properties that determine and control the absorption,
distribution, metabolism, excretion, and toxicity (ADMET) of
chemicals (1). For example, a chemical needs to have some aqueous
solubility in order to be absorbed by an organism. It also needs to
possess some hydrophobicity (lipophilicity) if it is to be absorbed
through lipid membranes. Both solubility and hydrophobicity are
affected by pKa, which is also a function of chemical structure.
There is a vast range of physicochemical properties that can
affect the toxicity of chemicals, and clearly it would be impossible in
this chapter to consider more than a fraction of these. I have

Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_6, © Springer Science+Business Media, LLC 2012

93
94 J.C. Dearden

therefore chosen to deal with the main physicochemical properties
that have been found from experience to be important in modeling
toxicity. There have been several reviews of the calculation of these
properties (2–4). Other properties concerned with environmental
toxicity are bioconcentration factor (BCF) and soil sorption of
chemicals, and these have been reviewed by Dearden (5, 6).
Physicochemical properties are predicted largely through the
use of quantitative structure–property relationships (QSPRs),
which in effect relate such properties to other, more fundamental
properties and/or structural features such as the presence or
absence of molecular substructures (7). An example is the modeling
of octanol–water partition coefficient (P, Kow) (representing hydro-
phobicity) (8):
log P = 0.088 + 0.562R − 1.054πH + 0.034ΣαH − 3.460ΣβH + 3.814VX  (1)

n = 613, R2 = 0.995, and s = 0.116,

where R = excess molar refractivity (a measure of polarizability),
πH = a polarity term, ΣαH and ΣβH = hydrogen bond donor and
acceptor abilities, respectively, VX = the McGowan characteristic
molecular volume, n = number of chemicals used to develop the
model, R = correlation coefficient, and s = standard error of the
estimate (log units). Since the descriptors in this QSPR are approxi-
mately auto-scaled, the magnitudes of the coefficients give an
indication of the relative contribution of each descriptor to log P.
Thus it can be seen that hydrogen bond acceptor ability and molec-
ular size make the most important contributions to log P. On the
other hand, the contribution of hydrogen bond donor ability is
negligible; this is attributed to the hydrogen bond acceptor abilities
of water and octanol being very similar. In contrast, the hydrogen
bond donor ability of water is very strong, which accounts for the
large negative coefficient on the ΣβH term. The standard error of
prediction is very low compared with the typical experimental error
on log P of about 0.35 log unit (9), and may indicate some
over-fitting of the data (see Subheading 4.12).
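Applying a fitted MLR model such as Eq. 1 is simple arithmetic; the sketch below codes the published coefficients, while the descriptor values in the example call are purely hypothetical, chosen only to show the calculation.

```python
# Sketch of applying the MLR model of Eq. 1 (8); the descriptor values
# passed in the example call are hypothetical, for illustration only.

def log_p_eq1(R, pi_H, sum_a_H, sum_b_H, V_X):
    """Estimate log P from excess molar refractivity (R), polarity
    (pi_H), hydrogen bond donor (sum_a_H) and acceptor (sum_b_H)
    abilities, and the McGowan volume (V_X), using the published
    coefficients of Eq. 1."""
    return (0.088 + 0.562 * R - 1.054 * pi_H
            + 0.034 * sum_a_H - 3.460 * sum_b_H + 3.814 * V_X)

# Hypothetical descriptor values for a small polar solute:
print(round(log_p_eq1(R=0.80, pi_H=0.90, sum_a_H=0.30, sum_b_H=0.60, V_X=0.90), 2))
```

With these made-up values the positive molecular size term outweighs the negative hydrogen bond acceptor term, mirroring the discussion above.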
Many QSPRs are available in the literature for the prediction of
the most important physicochemical properties. For example,
Dearden (10) reported 93 published QSPRs for prediction of
aqueous solubility in diverse data sets between 1990 and 2005.
How then does one select the best, or most appropriate, QSPR for
one’s own purposes? Firstly, choose one with good statistics that
has been validated by the use of data not used in the development of
the QSPR (see Subheadings 4.17 and 4.20). Then check that the
descriptors used in the QSPR are accessible, and preferably have
some clear physicochemical significance. Finally consider whether
6 Prediction of Physicochemical Properties 95

Table 1
Software predictions of aqueous solubility^a

Atropine Caffeine Butylparaben
Measured −2.18 −1.02 −2.96
Software no. 1 −1.87 −1.87 −3.09
Software no. 2 −2.06 −0.65 −3.05
Software no. 3 −1.01 −0.27 −3.07
Software no. 4 −2.03 −0.56 −2.58
Consensus prediction −1.74 −0.84 −2.94

^a Solubility values given as log S, with S in mol/L

the use of more than one such QSPR, to give a consensus of
predictions, is appropriate.
It may be noted that perhaps the majority of QSPR modeling is
carried out using multiple linear regression (MLR), in which it is
assumed that the property being modeled correlates in a rectilinear
(straight-line) manner with each descriptor used. That assumption
is not always valid, and the artificial neural network (ANN)
approach is sometimes used to overcome this problem. The ANN
approach brings its own problems, however, such as over-fitting
and the lack of an interpretable QSPR. There are numerous
other QSPR approaches (11), the discussion of which is outside
the scope of this chapter.
Clearly, for someone not familiar with QSPR, such a task could
be daunting. It is therefore not surprising that numerous software
programs are now available for the prediction of physicochemical
properties (as well as, of course, for the prediction of a range of
toxicity endpoints). Some of these software programs are freely
available, although most have to be purchased. They almost invari-
ably use QSPR methodology for their predictions, but all that the
user has to do is to input a chemical structure, for example as a
SMILES string (12) or a molfile. Again, the use of more than one
prediction program can allow a consensus prediction to be made.
The example in Table 1 shows predicted aqueous solubilities of
three compounds, atropine, caffeine, and butylparaben, using four
different software programs.
For atropine, it is clear that three programs give similar predic-
tions, well within the experimental error of 0.6 log unit, whereas
software no. 3 gives a poor prediction. The consensus (mean) of the
three good predictions is −1.99, which is only 0.2 log unit different
from the experimental value of −2.18. However, it may be noted
that even if the poor prediction from software no. 3 is included,
the consensus predicted value is −1.74, which is still within the
experimental error of the measured value.
For caffeine, there is a considerable divergence of predicted
values, indicating that the solubility of this compound is difficult
to predict. Only two of the four predictions are within the experi-
mental error of 0.6 log unit, but the consensus of all four predic-
tions is −0.84, which is well within the experimental error. This
example really emphasizes the value of consensus modeling.
Butylparaben has a simpler chemical structure than atropine
and caffeine, and this is reflected in the more accurate predictions of
aqueous solubility, with all four being within the experimental error
of 0.6 log unit, and the consensus of all four predictions being
−2.94.
It is recommended that, wherever possible, predictions be
obtained from more than one software program and/or QSPR,
and that the consensus of all the predictions be used, unless one of
the predicted values is clearly very different from the others, in
which case that prediction should be rejected.
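The recommendation above (take a consensus, but reject any prediction clearly discordant with the rest) can be sketched as follows. The median-based rejection rule and the 0.6 log unit threshold are illustrative choices, not part of the source; the example call uses the atropine predictions of Table 1 expressed as negative log S values.

```python
# Consensus of several software predictions, dropping clear outliers.
from statistics import mean, median

def consensus(predictions, reject_if_off_by=0.6):
    """Mean of the predictions, excluding any value lying more than
    `reject_if_off_by` log units from the median (illustrative rule)."""
    med = median(predictions)
    kept = [p for p in predictions if abs(p - med) <= reject_if_off_by]
    return mean(kept)

# Atropine log S predictions from software nos. 1-4:
print(round(consensus([-1.87, -2.06, -1.01, -2.03]), 2))  # software no. 3 is rejected
```

With software no. 3 excluded, the result agrees with the consensus of the three good predictions discussed in the text.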

2. Methods

2.1. QSPR Development

If one wishes to develop a QSPR, there are three main steps to be
followed. Firstly, it is necessary to obtain experimental values of the
property of interest, for a sufficiently large number of chemicals.
“How large?” is an open question; QSPRs have been published
based on training sets of a few to several thousands of chemicals.
The more chemicals that are in the training set, the more robust will
be the QSPR—provided that the experimental data are accurate,
and that the chemicals selected are consistent with the purpose for
which the QSPR is to be developed (13). For example, if one
wished to develop a QSPR for the prediction of aqueous solubility
of aliphatic amines, it would be wrong to include data for aromatic
amines. A number of published papers (14, 15) have discussed the
selection of chemicals for QSPR development. Table 2 lists some
sources of physicochemical property data that could be useful for
QSPR modeling, and more are given by Wagner (27).
If a QSPR applicable to a diverse range of chemicals is required,
then of course the training set data should also be diverse. The
actual diversity is often limited by the availability of data, and care
should be taken in the application of the QSPR. Oyarzabal et al.
(28) have recently described a novel approach to minimize the
problems of using data from various sources and of data sets with
restricted coverage of chemical space.
Secondly, one has to obtain values for descriptors that will
model the specified property well (29, 30). One of two
approaches can be taken here. If one believes that one or two
descriptors will serve, then those are all that may be required.
For example, Yalkowsky and coworkers (31) have found that

Table 2
Some sources of data for modeling
of physicochemical properties

Database Reference
Aquasol (16)
Benchware Discovery 360 (17)
Chem. & Physical Properties Database (18)
Chemical Database Service (19)
ChemSpider (20)
Crossfire (21)
OCHEM (22)
OECD eChemPortal (23)
OECD Toolbox (24)
OSHA (25)
PhysProp (26)

aqueous solubility can often be modeled quite well with log P and
melting point. However, generally one does not know a priori
which descriptor(s) will best model a given property, in which
case the approach is usually to assemble a large pool of descriptors
(from the thousands that are now available) from which to select a
pool. It should be noted also that whilst experimentally measured
descriptor values are generally more accurate than are calculated
values, the use of the latter means that one can use a QSPR to
predict the requisite properties of chemicals not yet synthesized or
available (Table 3).
The third step is to use a statistical method that will select from
the pool the “best” descriptor(s) based on appropriate statistical
criteria, and will generate a QSPR based on those descriptors (42,
43). Typical techniques include stepwise regression and genetic
algorithms, and most commercially available statistical packages
include these (Table 4).
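As a minimal sketch of the stepwise idea just mentioned, the routine below greedily adds whichever descriptor most improves the R² of an ordinary least-squares fit, stopping when the gain becomes small. The stopping threshold and the synthetic descriptor names are arbitrary illustrative choices; production packages also apply F-tests, cross-validation, and collinearity checks.

```python
import numpy as np

def ols_r2(X, y):
    """R-squared of an ordinary least-squares fit with intercept."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def forward_select(X, y, names, min_gain=0.02):
    """Greedy forward selection of descriptor columns of X by R-squared."""
    chosen, best = [], 0.0
    while len(chosen) < len(names):
        r2_new, j = max((ols_r2(X[:, chosen + [j]], y), j)
                        for j in range(len(names)) if j not in chosen)
        if r2_new - best < min_gain:
            break
        chosen.append(j)
        best = r2_new
    return [names[j] for j in chosen], best

# Synthetic demonstration: y depends on two of three (made-up) descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=0.1, size=50)
names, r2 = forward_select(X, y, ["logP", "MW", "HBA"])
print(names, round(r2, 3))
```

The uninformative descriptor is never selected, because adding it improves R² by less than `min_gain`.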
Tetko et al. (60) have discussed the accuracy of ADMET pre-
dictions, and emphasized the importance of accuracy in order to
avoid filtering out promising series of compounds because of, for
example, wrongly predicted log P or aqueous solubility.
Even if a good QSPR is obtained, it may not be a good
predictor of the property in question for compounds not in the
training set. Hence some measure of predictive ability is required.
The best way for predictivity to be assessed is to use the QSPR to
predict the property in question for a number of compounds that

Table 3
Some sources of descriptors for modeling of physico-
chemical properties

Database Reference
ADAPT (32)
Almond (33)
CODESSA (34)
C-QSAR (35)
Discovery Studio (36)
Dragon (37)
eDragon (38)
MOE (39)
MOLCONN-Z (40)
OCHEM (22)
QSARpro (41)
Volsurf (33)

were not used in the training set, but for which the measured value
of the property is known; such a set of compounds is called a test
set. The test set compounds must be reasonably similar to those of
the training set; that is, they must lie within the applicability
domain of the QSPR. This is often achieved by dividing the total
number of compounds into two groups; the larger group forms the
training set, and the smaller group (typically 5–50% of the total)
forms the test set. If the standard error for the test set is much larger
than that for the training set, then the QSPR does not have good
predictivity, and it should not be used for predictive purposes.
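The split-and-compare check described above can be sketched in a few lines of ordinary least squares; the random split, the 20% test fraction, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def split_validate(X, y, test_fraction=0.2, seed=0):
    """Fit an OLS QSPR on a random training subset and return the
    standard errors on the training and held-out test compounds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = max(1, int(len(y) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
    def s(rows):
        resid = y[rows] - A[rows] @ coef
        return float(np.sqrt(np.mean(resid ** 2)))
    return s(train), s(test)

# Synthetic demonstration with made-up descriptors:
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -0.7, 0.3]) + rng.normal(scale=0.1, size=100)
s_train, s_test = split_validate(X, y)
print(round(s_train, 2), round(s_test, 2))
```

A test-set error much larger than the training error would flag the model as unsuitable for prediction, as the text notes.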
If the total number of compounds is small, then it may not be
practicable to split it into training and test sets. In that case a
procedure called internal cross-validation can be used, whereby
each compound in turn is deleted from the training set, the
QSPR is developed with the remaining compounds, and is used
to predict the property value of the omitted compound. That
compound is then returned to the training set and a second com-
pound is deleted, and so on until every compound has been left out
in turn. A cross-validated R2 value, called Q2, is then calculated,
which is an indicator of the internal predictivity of the QSPR. It is,
however, not considered to be as good an indicator as is obtained
using an external test set. Walker et al. (61) have proposed that an
indicator of good predictivity is that Q2 should not be more than

Table 4
Statistical packages for QSPR modeling
of physicochemical properties

Software Reference
ADMET Modeler (44)
ADMEWORKS ModelBuilder (45)
ASNN (46)
Cerius2 (47)
CODESSA (34)
C-QSAR (35)
GENSTAT^a (48)
MATLAB^a (49)
Minitab^a (50)
MOE (39)
NCSS^a (51)
OCHEM (22)
Pentacle (33)
Pipeline Pilot (36)
PNN (46)
PredictionBase (52)
ProChemist (53)
PSPP^a (54)
QSARpro (41)
SAS^a (55)
Scigress Explorer (56)
SPSS^a (57)
Statistica^a (58)
Strike (59)
SYBYL-X (17)

^a General-purpose statistical software

0.3 lower than R2, whilst Eriksson et al. (62) have proposed a
minimal acceptable value of 0.5 for Q2.
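The leave-one-out procedure described above reduces to a short loop: each compound is deleted in turn, the model refit, and its value predicted; Q² then compares the pooled prediction errors (PRESS) with the variance of the measured values. A minimal sketch for an ordinary least-squares QSPR, with synthetic illustrative data:

```python
import numpy as np

def q2_loo(X, y):
    """Leave-one-out cross-validated Q2 for an OLS QSPR."""
    A = np.column_stack([X, np.ones(len(y))])
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # delete compound i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += float((y[i] - A[i] @ coef) ** 2)
    return 1.0 - press / float(np.sum((y - y.mean()) ** 2))

# Synthetic demonstration with made-up descriptors:
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 2))
y = 1.2 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.1, size=40)
print(round(q2_loo(X, y), 3))
```

By the criteria quoted in the text, a Q² no more than 0.3 below R² (61) and above 0.5 (62) would indicate acceptable internal predictivity.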

2.2. Published QSPRs

Many thousands of QSPRs are available in the open literature.
A large number of these have been referenced in review papers

Table 5
Some QSAR/QSPR databases

Database Reference
C-QSAR (35)
JRC QMRF Database (65)
Danish QSAR Database (66)
OCHEM (22)

(for example (10, 63, 64)), and many more can be found with the
use of a search engine. Thus, inputting “QSPR pKa” into Google
Scholar produced 830 hits.
Compilations of QSARs and QSPRs are also available in a
number of databases, some of which are listed in Table 5.

2.3. QSPR Software

There are now numerous software packages available for the pre-
diction of physicochemical properties. These vary in their perfor-
mance and in the range of properties that they predict. They almost
all use QSPR modeling approaches for their predictions. Some are
available free of charge, and some are very expensive. Most give
some indication of their performance, but what is lacking in general
is independent comparative assessment of performance. Many of
these software packages are listed in Table 6. The ChemProp soft-
ware (70) is unique in that it selects the best QSPR or software
program, from those it holds, for the prediction of a given property
of a given compound.

3. Prediction of Selected Physicochemical Properties

As pointed out above, it is beyond the scope of this chapter to consider
all physicochemical properties that might play a part in modeling
toxicity. Those considered most important are octanol–water parti-
tion coefficient, aqueous solubility, pKa, melting point, boiling point,
vapor pressure, and Henry’s law constant (air–water partition coeffi-
cient), and they are discussed below.

3.1. 1-Octanol–Water Partition Coefficient (Log P, Log Kow)

Partition coefficient is defined (84) as the ratio of concentrations at
equilibrium of a solute distributed between two immiscible liquid
phases; the concentration in the more lipophilic phase is, by con-
vention, the numerator. The term “immiscible” does not preclude
the two phases’ having partial miscibility. For ionizable solutes, the
partition coefficient P relates to the undissociated species only, and
is thus (approximately) independent of pH.
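The pH dependence that P itself avoids reappears in the distribution coefficient D (log D, as listed for several packages in Table 6), which counts all species. For a monoprotic acid, assuming only the neutral form partitions into octanol, the standard relationship is log D = log P − log10(1 + 10^(pH − pKa)). A sketch; the example values are typical literature figures for an ibuprofen-like acid, used here only for illustration:

```python
from math import log10

def log_d_acid(log_p, pKa, pH):
    """log D of a monoprotic acid, assuming only the neutral species
    partitions into the octanol phase (standard approximation)."""
    return log_p - log10(1.0 + 10.0 ** (pH - pKa))

# Illustrative values for an ibuprofen-like acid (log P ~ 3.97, pKa ~ 4.9):
print(round(log_d_acid(3.97, 4.9, 7.4), 2))  # ionization lowers log D at pH 7.4
```

At a pH well below the pKa the acid is essentially un-ionized and log D approaches log P.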
Table 6
Physicochemical properties estimated by some commercially and freely available software

Henry’s
Aqueous Melting Boiling Vapor law
Software Log P Log D solubility pKa point point pressure constant Availability References
Absolv ✓ ✓ ✓ ✓ ✓ Purchase (67)
ACD/PhysChem Suite ✓ ✓ ✓ ✓ ✓ ✓ Purchase (67)
ADMET Predictor ✓ ✓ ✓ ✓ Purchase (44)
ADMEWORKS ✓ Purchase (45)
Predictor
ChemAxon ✓ ✓ ✓ Purchase (68)
ChemOffice ✓ ✓ ✓ ✓ Purchase (69)
ChemProp ✓ ✓ ✓ ✓ ✓ ✓ Free online (70)
ChemSilico ✓ ✓ ✓ Purchase (71)
C log P ✓ Purchase (35, 72)
Episuite ✓ ✓ ✓ ✓ ✓ ✓ Free download (73)
MOE ✓ Purchase (39)
Molecular Modeling Pro ✓ ✓ ✓ ✓ ✓ Purchase (74)
MoKa ✓ Purchase (33)
Molinspiration ✓ Free online (75)
MOLPRO ✓ ✓ ✓ ✓ Purchase (76)
OECD Toolbox^a ✓ ✓ ✓ ✓ ✓ ✓ Free download (24)
Pallas ✓ ✓ ✓ Purchase (77)
PhysProps ✓ ✓ ✓ Purchase (78)
Pipeline Pilot ✓ ✓ ✓ ✓ Purchase (36)
PredictionBase ✓ Purchase (52)
PREDICTPlus ✓ ✓ ✓ Purchase (79)
ProChemist ✓ ✓ ✓ ✓ Purchase (53)
ProPred ✓ ✓ ✓ ✓ ✓ ✓ ✓ Consortium (80)
Schrödinger ✓ ✓ ✓ ✓ Purchase (59)
SPARC ✓ ✓ ✓ ✓ ✓ ✓ ✓ Free online (81)
TerraQSAR-LOGP ✓ Purchase (82)
StarDrop^b ✓ ✓ ✓ Purchase (83)
VCCLAB ✓ ✓ ✓ ✓ Free online (46)
^a The OECD Toolbox uses the Episuite software
^b StarDrop offers log D at pH 7.4 only

The 1-octanol/water solvent pair was first used by Hansch
et al. (85) as a surrogate for lipid–water partitioning, and on the
whole has worked well. About 70% of published QSARs and
QSPRs incorporate a log P term, indicating the importance of
this property for modeling biological activities. It is essentially a
transport term, reflecting the rate of movement of a chemical
through biological membranes.
Many publications have dealt with the estimation of
log P values from molecular structure, and there have been a num-
ber of reviews of the subject (9, 86–91). Mannhold et al. (9) in
particular give a detailed critical analysis of available methods. The
main prediction methodologies are based on physicochemical,
structural, and/or topological descriptors, or on atomic or group
contributions.
The earliest work on log P prediction was that of Hansch and
coworkers, who developed (92) a hydrophobic substituent con-
stant π, which was, to a first approximation, additive, although
it required numerous correction factors. Rekker and coworkers
(93, 94) developed a fragmental approach which proved easier to
use. Extension of the fragmental approach by Leo et al. (95) led to
the development of the C log P software (35) for log P prediction.
Bodor et al. (96) developed a QSPR with 14 physicochemical
and quantum chemical descriptors to model log P of a diverse set of
118 organic chemicals, with R2 = 0.882 and a standard error of
prediction of 0.296 log unit. Klopman and Wang (97) used their
MCASE group contribution approach to predict the log P values of
935 organic compounds with a standard error of 0.39 log unit.
These errors are close to the typical experimental error of 0.35 log
unit on log P (89). The method of Ghose et al. (98) used atomic
contributions, and on a set of 893 compounds the standard error
was 0.496 log unit.
Liu and Zhou (99) used molecular fingerprints to model
log P values of a 9,769-chemical training set, with R2 = 0.926
and root mean square error (RMSE) ¼ 0.511 log unit. Chen
(100) compared the ability of multiple linear regression, radial
basis neural networks, and support vector machines to model
log P of about 3,500 diverse chemicals, and found R2 values of
0.88, 0.90, and 0.92, respectively. Tetko et al. (101) used E-state
indices (102) and artificial neural networks to model log P values of
12,777 chemicals, with R² = 0.95 and RMSE = 0.39 log unit.
There are numerous software programs available for the esti-
mation of log P of organic chemicals, and some of these give good
predictions. A recent comparison of 14 such programs (103) found
that, using a 138-chemical test set, the percentage of chemicals with
log P predicted within 0.5 log unit of the measured log P value
ranged from 94% down to 50%; the performances of the top ten programs
are shown in Table 7.
104 J.C. Dearden
Table 7
Log P predictions from ten software packages for a 138-chemical test set (103)

                               % of chemicals with log P
Software                       prediction error ≤ 0.5 log unit    r²       s
QMPRPlus (44) [a]              94.2                               0.965    0.272
ACD/Labs (67)                  93.5                               0.965    0.271
ChemSilico (71)                93.5                               0.958    0.297
ProPred (80)                   89.9                               0.945    0.342
A log P (75)                   89.1                               0.948    0.332
KOWWIN (Episuite) (73)         89.1                               0.947    0.335
SPARC (81)                     88.5                               0.941    0.330
C log P (35)                   88.4                               0.961    0.287
Prolog P (77) [b]              86.2                               0.949    0.329
MOLPRO (76)                    81.1                               0.847    0.568

[a] Now ADMET Predictor
[b] Now in PALLAS
Sakuratani et al. (104) tested six software programs, using a test
set of 134 simple organic compounds. None of the programs
predicted log P values of all the compounds. Their results were as
follows: Episuite (KOWWIN) (73), n = 130, s = 0.94; C log P (35), n = 131, s = 0.95; ACD/Labs (67), n = 127, s = 1.09; VLOGP (36), n = 122, s = 1.11; SLOGP (39), n = 132, s = 1.34; and COSMO (105), n = 129, s = 1.35. These standard
prediction errors are all very high.
Mannhold et al. (9) tested 35 log P software programs, and
found wide variations in performance, with RMSE values ranging
from 0.41 to 1.98 log unit.
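These comparison statistics are easy to reproduce. A minimal sketch of how MAE, RMSE, and the percentage of predictions within 0.5 log unit are computed; the measured and predicted log P values below are invented purely for illustration:

```python
import math

def prediction_stats(measured, predicted):
    """MAE, RMSE, and % of predictions within 0.5 log unit of measurement."""
    errors = [p - m for m, p in zip(measured, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    pct_within = 100.0 * sum(abs(e) <= 0.5 for e in errors) / len(errors)
    return mae, rmse, pct_within

# Invented log P values, for illustration only
measured = [1.20, 2.50, 3.10, 0.40, 4.00]
predicted = [1.00, 2.90, 3.05, 1.20, 3.70]
mae, rmse, pct = prediction_stats(measured, predicted)
print(f"MAE = {mae:.2f}, RMSE = {rmse:.2f}, {pct:.0f}% within 0.5 log unit")
# MAE = 0.35, RMSE = 0.43, 80% within 0.5 log unit
```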
It is recommended that at least two of the better software
programs be used for the prediction of log P. If possible, the
average of several predictions should be taken. It should be noted
that the VCCLAB Web site (46), as well as giving its own log P
prediction, gives predictions from nine other software packages,
together with the mean of all ten.
The distribution coefficient (D) is the ratio of the total concentrations, both dissociated and undissociated, of the solute at a given pH in two liquid phases. It is related to P by Eq. 2 for acids and by Eq. 3 for bases:

log D(pH) = log P - log(1 + 10^(pH-pKa)),  (2)

log D(pH) = log P - log(1 + 10^(pKa-pH)).  (3)

6 Prediction of Physicochemical Properties 105
There are very few published studies on QSPR modeling of
log D. One such study derived a QSPR for uranyl extracted by
podands from water to 1,2-dichloroethane (106). However,
log D values can generally readily be calculated from Eqs. 2 and
3, using either measured or calculated log P and pKa values (see
above and Subheading 3.3).
When a toxicity data set includes ionizable chemicals, it is
worth considering whether or not using log D instead of
log P would result in an improved QSPR correlation. This was
the case for a study of the toxicity of substituted nitrobenzenes to
Tetrahymena pyriformis (107).
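Because log D follows directly from log P and pKa through Eqs. 2 and 3, the conversion is a one-line calculation. A sketch, using typical literature values for benzoic acid (log P about 1.87, pKa about 4.20):

```python
import math

def log_d(log_p, pka, ph, acid=True):
    """log D at a given pH from log P and pKa, using Eq. 2 (acids) or Eq. 3 (bases)."""
    exponent = (ph - pka) if acid else (pka - ph)
    return log_p - math.log10(1.0 + 10.0 ** exponent)

# Benzoic acid is almost fully ionized at physiological pH 7.4,
# so its log D there lies far below its log P
print(round(log_d(1.87, 4.20, 7.4, acid=True), 2))  # -1.33
```

Far below its pKa, an acid is essentially un-ionized and log D converges to log P, as Eq. 2 requires.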
3.2. Aqueous Solubility

Aqueous solubility depends not only on the affinity of a solute for
water, but also on its affinity for its own crystal structure. Molecules
that are strongly bound in their crystal lattice require considerable
energy to remove them. This also means that such compounds have
high melting points, and high-melting compounds generally have
poor solubility in any solvent. Note that solubility can vary consid-
erably with temperature, and it is important that solubility data are
reported at a given temperature.
Removal of a molecule from its crystal lattice also means an
increase in entropy, and this can be difficult to model accurately. For
this reason, as well as the fact that the experimental error on
solubility measurements is estimated to be about 0.6 log unit
(108), the prediction of aqueous solubility is not as accurate as is
the prediction of partition coefficient. Nevertheless, many papers
(10) and a book (109) have been published on the prediction of
aqueous solubility, as well as a number of reviews (10, 87,
110–112).
There are also a number of commercial software programs
available for that purpose (10, 113). Livingstone (90) has discussed
the reliability of aqueous solubility predictions from both QSPRs
and commercial software. It should be noted that there are various
ways that aqueous solubilities can be reported: in pure water, at a
specified pH, at a specified ionic strength, as the undissociated
species (intrinsic solubility), or in the presence of other solvents
or solutes. Solubilities are also reported in different units, for exam-
ple g/100 mL, mol/L, and mole fraction. The use of mol/L is
recommended, as this provides a good basis for comparison.
The aqueous solubility (S) of a chemical could be described as
its hydrophilicity, and so one could perhaps expect an inverse
relationship between aqueous solubility and hydrophobicity
(as measured by partition coefficient). This is in fact so for organic
liquids (114), but does not hold well for solids, because the ther-
modynamics of melting means that there are significant enthalpy
and entropy changes when a molecule is removed from its crystal
lattice (10). The melting point (MP) of a chemical is a reasonable
measure of these changes, and Yalkowsky and Valvani (115) found
that aqueous solubility could be modeled by log P and MP:
log S = 0.87 - 1.05 log P - 0.012 MP,  (4)

n = 155, R² = 0.978, and s = 0.308.
Note that Eq. 4 was developed using experimental log P and MP
(°C) values. Had predicted values been used, the statistics would
probably have been a little worse; melting point in particular cannot
yet be predicted very well (60, 116); see also Subheading 3.4 below.
Equation 4 has now been modified (117), and is called the
General Solubility Equation (GSE):
log S = 0.5 - log P - 0.01(MP - 25).  (5)
The melting point term is taken as zero for compounds melting
at or below 25 °C. Aqueous solubilities of 1,026 nonelectrolytes, with a log S range of -13 to +1 (S in mol/L), calculated with the
GSE had a standard error of 0.38 log unit.
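The GSE itself is trivial to apply. A sketch, using typical literature values for naphthalene (log P about 3.30, MP about 80 °C) as an assumed example:

```python
def gse_log_s(log_p, mp_celsius):
    """General Solubility Equation (Eq. 5): log S (S in mol/L) from log P
    and melting point; the MP term is zero at or below 25 degrees C."""
    return 0.5 - log_p - 0.01 * max(mp_celsius - 25.0, 0.0)

# Naphthalene: log P ~ 3.30, MP ~ 80 C; the measured log S is about -3.6
print(round(gse_log_s(3.30, 80.0), 2))  # -3.35
```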
Good predictions for a large diverse data set have been
obtained by the use of linear solvation energy descriptors (118):
log S = 0.518 - 1.004 R + 0.771 πH + 2.168 ΣαH + 4.238 ΣβH - 3.362 ΣαH·ΣβH - 3.987 VX.  (6)

n = 659, R² = 0.920, and s = 0.557.
It can be seen from Eq. 6 that the main factors controlling aqueous
solubility are hydrogen bond acceptor ability and molecular size.
The ΣαH·ΣβH term models intramolecular hydrogen bonding,
which lowers aqueous solubility. Increased molecular size also low-
ers solubility, since a larger cavity has to be created in water to
accommodate a larger solute molecule.
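Applying a linear solvation energy relationship such as Eq. 6 is simply a weighted sum of descriptors. A sketch using the Eq. 6 coefficients; for a real compound the descriptor values would be taken from published Abraham descriptor tables, and the example values below are invented:

```python
def lser_log_s(R, piH, alphaH, betaH, Vx):
    """Aqueous solubility from the linear solvation energy relationship of Eq. 6.
    Arguments are the Abraham-type descriptors R, piH, sum-alphaH, sum-betaH, VX."""
    return (0.518
            - 1.004 * R
            + 0.771 * piH
            + 2.168 * alphaH
            + 4.238 * betaH
            - 3.362 * alphaH * betaH   # cross term: intramolecular H-bonding
            - 3.987 * Vx)              # cavity term: larger solutes are less soluble

# Invented descriptor values, for illustration only
print(round(lser_log_s(0.8, 0.9, 0.6, 0.5, 1.0), 2))  # -1.17
```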
Katritzky et al. (108) used their CODESSA descriptors to
model the aqueous solubilities of a large diverse set of organic
chemicals:
log Saq = 16.1 Qmin - 0.113 Nel + 2.55 FHDSA(2) + 0.781 ABO(N) + 0.328 0SIC - 0.0143 RNCS - 0.882,  (7)

n = 411, R² = 0.879, and s = 0.573,
where Qmin = most negative partial charge, Nel = number of electrons, FHDSA(2) = fractional hydrogen bond donor area, ABO(N) = average bond order of nitrogen atoms, 0SIC = an information content topological descriptor, and RNCS = relative negatively charged surface area. The CODESSA software is available from SemiChem Inc. (34).
Electrotopological state descriptors (119), hydrogen bonding
and nearest-neighbor similarities (120), and group contributions
(121) have also been used to model the aqueous solubilities of large
diverse data sets of organic chemicals. Palmer et al. (122) used
random forest models for solubility prediction of a diverse data
set, with good results; for an external test set of 330 chemicals,
they obtained R² = 0.89 and s = 0.69 log unit. Similar results were obtained with support vector machines (123): R² = 0.88 and RMSE = 0.62.
Several recent papers (124–126) have reported QSPRs specifi-
cally for the aqueous solubility of drugs and drug-like chemicals.
There are relatively few studies of solubility prediction within
specific chemical classes. Yang et al. (127) found a good correlation
of log S with mean molecular polarizability for a small set of dioxins
(n = 12, r² = 0.978, s = 0.30). Wei et al. (128) obtained a good
correlation of log S of all 209 polychlorinated biphenyls with quan-
tum chemical descriptors and positions of chlorine substitution.
Dearden et al. (129) compared 11 commercial software pro-
grams for aqueous solubility prediction (as log S), and found
considerable variation in performance against a 113-chemical test
set of organic chemicals that included 17 drugs and pesticides. The
performances of the top ten programs are shown in Table 8.
Table 8
Aqueous solubility (S) predictions from ten software packages for a 113-chemical test set (129)

                               % of chemicals with log S
Software                       prediction error ≤ 0.5 log unit    r²       s
ChemSilico (71)                79.7                               0.951    0.451
WATERNT (Episuite) (73)        79.6                               0.954    0.437
VCCLAB (46)                    77.0                               0.943    0.487
QMPRPlus (44) [a]              74.3                               0.939    0.501
ACD/Labs (67)                  72.6                               0.940    0.498
WSKOWWIN (Episuite) (73)       69.9                               0.923    0.562
SPARC (81)                     68.1                               0.853    0.779
ABSOLV (36)                    61.9                               0.888    0.680
QikProp (59)                   55.7                               0.867    0.742
MOLPRO (76)                    50.4                               0.766    0.984

[a] Now ADMET Predictor
Table 9
Aqueous solubility (S) predictions from ten software packages for a 122-chemical test set of drugs (10)

                               % of chemicals with log S
Software                       prediction error ≤ 0.5 log unit    r²       s
Admensa (83) [a]               72.1                               0.76     0.65
ADMET Predictor (44)           64.8                               0.82     0.47
MOLPRO (76)                    62.3                               0.44     1.22
ChemSilico (71)                59.8                               0.67     0.73
ACD/Labs (67)                  59.0                               0.73     0.66
VCCLAB (46)                    51.6                               0.67     0.73
QikProp (59)                   47.6                               0.57     0.97
PredictionBase (52)            46.7                               0.48     1.07
SPARC (81)                     42.9                               0.73     0.96
WSKOWWIN (Episuite) (73)       41.0                               0.51     1.17

[a] Now known as StarDrop
The Episuite predictions were made without the input of
measured melting point values.
Dearden (10) tested 16 commercially available software
programs for their ability to predict the aqueous solubility of a
122-compound test set of drugs with accurately measured solubi-
lities in pure water. Again there was considerable variation in per-
formance. The performances of the top ten programs are shown in
Table 9.
Dearden (130) also tested the relatively new fragment-based
WATERNT module in the Episuite software on both the 113-
chemical test set (comprising mostly simple organic chemicals)
and the 122-drug test set. He found it to be the best for the former
(79.6% within 0.5 log unit of measured value; standard error
= 0.44 log unit) and among the worst for the latter (38.5% within 0.5 log unit of measured value; standard error = 0.93 log unit).
Investigation indicated that this was caused by the program’s not
including all fragments and/or correction factors in its calculations.
3.3. pKa

Within a congeneric series of chemicals, pKa is often closely corre-
lated with the Hammett substituent constant, and this is the basis
for a number of attempts at pKa prediction. Harris and Hayes
(131) and Livingstone (90) have reviewed the published literature
in this area.
The Hammett substituent constant σ was derived from a
consideration of acid dissociation constants Ka, and most noncom-
puterized methods of calculating Ka and pKa values are based on
σ values:

pKa(derivative) = pKa(parent) - ρσ,  (8)

where ρ is the series constant, which is 1.0 for benzoic acids. Harris and Hayes (131) list ρ values for other series.
Harris and Hayes (131) give several examples of pKa calculation, for example for 4-t-butylbenzoic acid. The pKa value of benzoic acid is 4.205, the ρ value for benzoic acids is 1.0, and the σ value for 4-t-butyl is -0.197. Hence the pKa value of 4-t-butylbenzoic acid is calculated as 4.205 - (-0.197) = 4.402.
This value is virtually identical to the measured value for this
compound.
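The worked example above can be reproduced mechanically from Eq. 8:

```python
def hammett_pka(pka_parent, rho, sigma_values):
    """pKa of a derivative via Eq. 8: pKa(parent) - rho * sum of sigma constants."""
    return pka_parent - rho * sum(sigma_values)

# 4-t-butylbenzoic acid: parent benzoic acid pKa 4.205, rho = 1.0,
# sigma(4-t-butyl) = -0.197 (values from the text)
print(round(hammett_pka(4.205, 1.0, [-0.197]), 3))  # 4.402
```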
A number of publications have dealt with estimation of pKa
values from chemical structure, but these relate mostly to specific
chemical classes, e.g., benzimidazoles (132), 4-aminoquinolines
(133), and imidazol-1-ylalkanoic acids (134).
There have also been a number of attempts to model pKa
values of diverse sets of chemicals. Klopman and Fercu (135)
used their MCASE methodology to model the pKa values of a set
of 2,464 organic acids, and obtained good predictions; a test set of
about 600 organic acids yielded a standard error of 0.5 pKa unit.
The COSMO-RS methodology was used to predict pKa values of
64 organic and inorganic acids, with a standard error of 0.49 pKa
unit (136) and of 43 organic bases, with an RMSE of 0.66 pKa unit
(137). Lee et al. (138) used a decision tree approach and SMARTS
strings to model a diverse set of 1,693 monoprotic chemicals, with
an RMSE of 0.80 pKa unit. Milletti et al. (139) used GRID
interaction fields to model pKa on a very large training set of
24,617 pKa values; for a class of 947 6-membered N-heterocyclic
bases they found an RMSE of 0.60 pKa unit. The MoKa program
developed by Cruciani et al. (140) was reported as giving predic-
tions within 0.5 pKa units. A model based on group philicity (141)
yielded a standard error of 0.57 pKa unit for 63 chemicals of
various chemical classes.
There are a number of software programs that predict multiple
pKa values of organic chemicals. ACD/pKa has a claimed standard
error of 0.39 pKa unit for 22 compounds, and one of 0.36 pKa unit
for 26 drugs. pKalc (part of the PALLAS suite) is claimed to be
accurate to within 0.25 pKa unit (142), Schrödinger’s pKa calcula-
tor (59) is claimed to have a mean absolute error (MAE) of 0.19
pKa unit, and SPARC is claimed to have an RMSE of 0.37 pKa unit
when evaluated on 3,685 compounds (143), although Lee et al.
(144) found an RMSE of 1.05 pKa unit when they tested SPARC
on 537 drug-like chemicals. ADMET Predictor is claimed to have
Table 10
Some software predictions of pKa by Dearden et al. (146) and Liao and Nicklaus (147)

                            Dearden et al.       Liao and Nicklaus
Software                    r²       MAE         r²       MAE
ACD/Labs (67)               0.922    0.54        0.908    0.478
ADME Boxes [a]              0.959    0.32        0.944    0.389
ADMET Predictor (44)        0.899    0.67        0.837    0.659
ChemSilico (71) [a]         0.565    1.48        –        –
Epik (59)                   0.768    0.93        0.802    0.893
Jaguar (59)                 –        –           0.579    1.283
ChemAxon/Marvin (68)        0.778    0.90        0.763    0.872
PALLAS (77)                 0.656    1.17        0.803    0.787
Pipeline Pilot (36)         0.852    0.43        0.757    0.769
SPARC (81)                  0.846    0.78        0.894    0.651
VCCLAB (46)                 0.931    0.40        –        –

[a] No longer available

an MAE of 0.56 pKa unit for a test set of 2,143 diverse chemicals.
ChemSilico’s pKa predictor was reported to have an MAE of
0.99 pKa unit for a test set of 665 diverse chemicals, many of
them multiprotic. However, this module does not appear currently
to be available, although it is stated to be used in ChemSilico’s
log D predictor (71). The ChemProp software (70) uses a novel
approach whereby, for a given compound, the best prediction
method is selected based on prediction errors for structurally simi-
lar compounds (145).
Dearden et al. (146) tested the performance of ten available
software programs that calculate pKa values. Some of these pro-
grams will calculate pKa values of all ionizable sites. However, the
test set of 665 chemicals that they used, which was kindly supplied
by ChemSilico Inc. and used by them as their test set, had measured
pKa values only for the prime ionization site in each molecule.
There were doubts about the correct structures of 11 of the test
set chemicals, and so the programs were tested on 654 chemicals.
Some of the software companies kindly ran our compounds
through their software in-house. The results are given in Table 10.
It should be noted that the ACD/pKa predictions were incorrectly
reported by Dearden et al. (146), for which the author apologizes.
The ACD/pKa predictions given in Table 10 are correct.
Liao and Nicklaus (147) carried out a comprehensive comparison
of nine pKa prediction software programs on a test set of 261
drugs and drug-like chemicals. Their results are also given in
Table 10, and on the whole are comparable with those of Dearden
et al. (146).
Meloun and Bordovská (148) compared the pKa predictions
from ACD/pKa, Marvin, PALLAS, and SPARC for 64 drug-like
chemicals, and found MAE values of 0.12, 0.23, 0.55, and
1.64 pKa unit, respectively. Balogh et al. (149) compared the
performance of five available pKa software programs (ACD/pKa
(67), Epik (59), Marvin (68), PALLAS (77), and VCCLAB (46)),
using a 248-chemical Gold Standard test set, and found standard
errors of 0.808, 2.089, 0.957, 1.229, and 0.615 pKa unit, respec-
tively. Manchester et al. (150), using a test set of 211 drug-like
chemicals, found the following RMSE values (pKa unit): ACD/
pKa (67) v. 12, 0.8; Epik (59), 3.0; Marvin (68), 0.9; MoKa (33),
1.0; and Pipeline Pilot (36), 2.6.
It should be noted that the ADME Boxes software tested by
Liao and Nicklaus (147) is no longer available. It was offered by
Pharma Algorithms, which has now been taken over by ACD/
Labs. Similarly the VCCLAB software tested by Balogh et al.
(149) was also that of Pharma Algorithms. The current VCCLAB
pKa prediction tool uses ACD/pKa.
It thus appears that ACD/pKa is generally the best overall pKa
prediction software currently available, with ADMET Predictor
also performing well.
3.4. Melting Point

Melting point is an important property for two main reasons.
Firstly, it indicates whether a chemical will be solid or liquid at
particular temperatures, which will dictate how it is handled. Sec-
ondly, it is used in the GSE (117) to predict aqueous solubility.
The melting point of a crystalline compound is controlled
largely by two factors—intermolecular interactions and molecular
symmetry. For example, 3-nitrophenol, which can hydrogen-bond
via its OH group, melts at 97 °C, whereas its methyl derivative, 3-nitroanisole, which cannot hydrogen-bond with itself, melts at 39 °C. The symmetrical 1,4-dichlorobenzene melts at 53 °C, whilst its less-symmetrical 1,3-isomer melts at 25 °C. These and other
effects have been discussed in detail by Dearden (151).
There have been many attempts to predict the melting point of
organic chemicals, and these have been reviewed by Horvath (152),
Reinhard and Drefahl (87), Dearden (63, 153), and Tesconi and
Yalkowsky (154). It may be noted that in 1884 Mills (155) devel-
oped a QSPR based on carbon chain length for melting points of
homologous series of compounds that was accurate to 2°:

MP = β(x - c)/(1 + γ(x - c)),  (9)
where x = number of CH2 groups in the chain, and β, γ, and c are constants depending on the series.
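Eq. 9 is easy to evaluate. Note that as x grows, MP approaches the limit β/γ, which mirrors the convergence of melting points in long homologous series; the constants below are invented, since β, γ, and c must be fitted to each series:

```python
def mills_mp(x, beta, gamma, c):
    """Mills (1884), Eq. 9: melting point within a homologous series,
    where x is the number of CH2 groups in the chain."""
    return beta * (x - c) / (1.0 + gamma * (x - c))

# Invented constants, for illustration only
print(round(mills_mp(5, 2.0, 0.01, 0.0), 2))       # 9.52
print(round(mills_mp(10_000, 2.0, 0.01, 0.0), 1))  # 198.0, nearing the beta/gamma limit of 200
```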
Essentially two approaches have been used in the prediction of
melting point—the physicochemical/structural descriptor
approach and the group contribution approach. The former is
exemplified by the work of Katritzky et al. (156), who used nine
of their CODESSA descriptors to model a diverse set of 443
aromatic chemicals with R² = 0.837 and s = 30.2°. The
CODESSA software is available from SemiChem Inc. (34). This is
a complex QSPR, with descriptors that are not easy to comprehend,
and reflects the difficulty of modeling the melting points of diverse
data sets. Even for a set of 58 PCB congeners with 1–10 chlorine
atoms, a 5-term QSPR was required (157), with R² = 0.83 and s = 22.1°. Yalkowsky and coworkers have published extensively on
the prediction of melting point. They incorporated terms to
account for conformational flexibility and rotational symmetry
(158) and molecular eccentricity (159) to try to account for the
entropic contributions to melting. They were able (160) to model
the melting points of 1,040 aliphatic chemicals, using a combina-
tion of molecular geometry and group contributions, with a stan-
dard error of 34.4°.
Todeschini et al. (161) used their WHIM descriptors to model
the melting points of 94 European Union environmental priority
chemicals, with a standard error of 32.8°. Bergström et al. (162)
used principal components analysis and partial least squares to
model the melting points of 227 diverse drugs. They used 2-D,
3-D, and a combination of 2-D and 3-D descriptors to give three
separate models. A consensus of all three models gave the best
results, with R² = 0.63 and RMSE = 35.1°. Modarresi et al.
(163) used eight descriptors from the no-longer-available Tsar
program (36), CODESSA (34) and Dragon (37), to model the
melting points of 323 drugs, with R² = 0.660 and RMSE = 41.1°. Godavarthy et al. (164) used genetic algorithms and neural
networks to model the melting points of over 1,250 chemicals,
with R² = 0.95 and RMSE = 12.6°, although it must be said
that those results are so good that over-fitting could have occurred.
Karthikeyan et al. (165) used a very large diverse training set of
4,173 chemicals to develop a QSPR based on a neural network
approach using principal components. They found 2-D descriptors
to be better than 3-D descriptors; their results were as follows:
Data set               n        Statistic     MAE
Training set           2,089    R² = 0.661    37.6
Internal validation    1,042    Q² = 0.645    39.8
Test set               1,042    Q² = 0.658    38.2
Test set (drugs)       277      Q² = 0.662    32.6
Table 11
Some software predictions of melting points of a 96-compound test set (63)

Software          Mean absolute error
Episuite (73)     26.3
ChemOffice (69)   27.0
ProPred (80)      25.8
Considering the size and diversity of the data sets, the statistics
are quite good. However, the methodology used was complex, and
could not readily be applied.
The group contribution approach to melting point prediction
was first used by Joback and Reid (166). Simamora and Yalkowsky
(167) modeled the melting points of a diverse set of 1,690 aromatic
compounds using a total of 41 group contributions and four intra-
molecular hydrogen bonding terms, and found a standard error of
37.5°. Constantinou and Gani (168) used two levels of group
contributions to model the melting points of 312 diverse chemi-
cals, and obtained a mean absolute error of prediction of 14.0°, compared with an MAE of 22.6° for the Joback and Reid method.
Marrero and Gani (169) extended this approach to predict the
melting points of 1,103 diverse chemicals with a standard error of
25.3°. Tu and Wu (170) used group contributions to predict
melting points of 1,310 diverse chemicals with an MAE of 8.2%.
There are several software programs that predict melting point
(see Table 6); they all use one or more group contribution
approaches. Dearden (63) used a 96-compound test set to compare
the performances of three of these programs. Episuite (73) calcu-
lates melting point by two methods, that of Joback and Reid (166)
and that of Gold and Ogle (171), and takes their mean. ChemOf-
fice (69) uses the method of Joback and Reid (166), and ProPred
(80) uses the Gani approach (168, 169). The results are given in
Table 11.
An ECETOC report (113) mentions a 1999 US Environmen-
tal Protection Agency (EPA) test of the performance of the Episuite
MPBPVP module; for two large, diverse test sets the performance
was as follows: (1) n = 666, r² = 0.73, MAE = 45°; (2) n = 1,379, r² = 0.71, MAE = 44°.
Molecular Modeling Pro uses the Joback and Reid (166)
method, so its performance should be the same as that of ChemOf-
fice. Four other programs, Absolv (67), ChemProp (70), OECD
Toolbox (24), and PREDICTPlus (79), also predict melting point.
It can be seen that there is little to choose between the pro-
grams in terms of accuracy of prediction. They can all operate in
batch mode. It is therefore recommended that the Episuite
software, which is freely downloadable, and at least one other
method be used to calculate melting point.
It should be noted that currently both QSPR methods and
software programs have prediction errors well in excess of the
error on experimental measurement of melting point, which is
usually <2°. Therefore it is preferable to use measured melting
points if at all possible.
3.5. Boiling Point

Boiling point (Tb) is an important property since it is an indicator
of volatility, and can be used to predict vapor pressure. From the
Clausius–Clapeyron equation, boiling point is inversely propor-
tional to the logarithm of vapor pressure. Boiling point also indi-
cates whether a chemical is gaseous or liquid at a given temperature.
Lyman (172) has discussed seven recommended methods for
the prediction of boiling point. The methods are based on physico-
chemical and structural properties and group contributions.
Perhaps the simplest of those methods is that of Banks (173),
who developed the following QSPR:

log Tb(K) = 2.98 - 4/√MW,  (10)
where MW = molecular weight. No statistics were given for this
QSPR.
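Despite its age and simplicity, Eq. 10 gives a usable first estimate from molecular weight alone. A sketch, using benzene (MW 78.1, measured boiling point 353 K) as a check:

```python
import math

def banks_tb(molecular_weight):
    """Banks' QSPR (Eq. 10): boiling point in K from molecular weight alone."""
    return 10.0 ** (2.98 - 4.0 / math.sqrt(molecular_weight))

# Benzene (MW 78.1): predicted ~337 K vs. a measured 353 K -- rough, but in range
print(round(banks_tb(78.1)))  # 337
```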
Rechsteiner (174), Reinhard and Drefahl (87), and Dearden
(63) have reviewed the QSPR prediction of boiling point.
Many studies of boiling point prediction have dealt with spe-
cific chemical classes, and very good correlations have generally
been obtained. In 1884 Mills (155) modeled the boiling points of
a number of homologous series with QSPRs based on carbon chain
length, and claimed accuracy to within about 2° (see Eq. 9). Ivan-
ciuc et al. (175) used four topological descriptors to model the
boiling points of 134 alkanes with a standard error of 2.7°, whilst
Gironés et al. (176) used only one quantum chemical descriptor
(electron–electron repulsion energy) to model the boiling points of
15 alcohols with a standard error of 5.6°.
Models based on diverse training sets are, however, more
widely applicable. Katritzky et al. (177) used four CODESSA
descriptors to model the boiling points of 298 diverse organic
compounds:
Tb(K) = 67.4 GI^(1/3) + 21,540 HDSA(2) + 140.4 δmax + 17.5 NCl - 151.3,  (11)

n = 298, R² = 0.973, and s = 12.4°,
where GI = gravitational index, HDSA(2) = area-weighted surface charge of hydrogen-bond donor atoms, δmax = most negative atomic partial charge, and NCl = number of chlorine atoms. The CODESSA software is available from SemiChem Inc. (34).
Sola et al. (178) used eight CODESSA descriptors to model the boiling points of 135 diverse chemicals, with an RMSE of 9.1°.
Wessel and Jurs (179) used their ADAPT descriptors to
develop two QSPRs for the prediction of boiling point—one
for compounds containing O, S, and halogens, and the other for
compounds containing N. The QSPR for O, S, and halogens is
Tb(K) = 0.3009 PPSA - 3.690 PNSA - 51.78 RPCG + 9.515 NRA + 19.21 SQMW + 554.7 SADH - 25.52 NF + 19.52 KETO + 50.84 Nsulf - 135.0 S/NA + 59.86,  (12)

n = 248, R² = 0.991, and RMSE = 11.6°,
where PPSA = partial positive surface area, PNSA = partial negative surface area, RPCG = relative positive charge, NRA = number of ring atoms, SQMW = square root of molecular weight, SADH = surface area of donatable hydrogen atoms, NF = number of fluorine atoms, KETO = indicator variable for ketone, Nsulf = number of sulfide groups, and S/NA = (number of sulfur atoms)/(total number of atoms).
Basak and Mills (180) used eight topochemical, topological,
and hydrogen bonding descriptors to model the boiling points of
1,015 diverse organic compounds, with a standard error of 15.7°.
Probably the best QSPR developed to date is that of Hall and Story
(181), who used atom-type electrotopological descriptors (182)
and a neural network to obtain an MAE of 3.9° for a set of 298 diverse chemicals with a boiling point range of about 430°.
The group contribution approach was used first by Joback and
Reid (166), who obtained an MAE of 12.9° for a set of 438 diverse
chemicals. Stein and Brown (183) devised a simple group contri-
bution method to model boiling points of a very large set of 4,426
diverse chemicals, with an MAE of 15.5°. A group contribution
approach was also used by Marrero and Gani (169) to model the
boiling points of 1,794 organic compounds with a standard error of
8.1°, whilst Labute (184) used 18 atomic contributions on a set of 298 diverse organics, to give a standard error of 15.5°. Simamora
and Yalkowsky (167) used 36 group contributions and four intra-
molecular hydrogen bonding terms to model the boiling points of a
diverse set of 44 aromatic compounds, with a standard error of
17.6°. Ericksen et al. (185) modeled the boiling points of 1,141
chemicals with second-order group contributions, and found an
MAE of 7.8°.
There are a number of software programs available for the
prediction of boiling point, and Dearden (63) compared the per-
formance of six of these using a 100-compound test set. The results
are shown in Table 12.
Table 12
Some software predictions of boiling points of a 100-compound test set (63)

Software                      Mean absolute error
ACD/Labs (67)                 1.0
SPARC (81)                    6.3
Episuite (73)                 13.8
ChemOffice (69)               13.8
ProPred (80)                  16.1
Molecular Modeling Pro (74)   21.7

The ACD/Labs result is based on the 54 chemicals in the test set that were not included in the ACD/Labs training set. Clearly the ACD/Labs software gives by far the best predictions, but has to
be purchased. SPARC is freely accessible, but operates only in
manual mode, with SMILES input. Episuite can be freely down-
loaded, but its standard error of prediction was more than twice
that of SPARC. ECETOC (113) quotes the US EPA testing of the
MPBPVP module of the Episuite software; two very large diverse
test sets yielded the following: n = 4,426, MAE = 15.5°; n = 6,584, MAE = 20.4°. PREDICTPlus is claimed to have an MAE of 12.9°. These results are comparable with those of Dearden
given above. Five other software programs, Absolv (67), Chem-
Prop (70), PhysProps (78), OECD Toolbox (24), and PREDICT-
Plus (79), also predict boiling point.
It is recommended that at least two predictions be obtained,
and their average used.
3.6. Vapor Pressure

The vapor pressure (VP) of a chemical controls its release into the
atmosphere, and thus is an important factor in the environmental
distribution of chemicals. Vapor pressure is highly temperature
dependent. Most literature values are at ambient temperature, but
some QSPRs allow predictions over a range of temperatures. The
variation of vapor pressure with temperature is given by the
Clausius–Clapeyron equation:

ln(VP2/VP1) = -(L/R)(1/T2 - 1/T1),  (13)

where L = latent heat of vaporization, and R = universal gas constant.
If the latent heat of vaporization is high, vapor pressure changes
markedly with temperature, which is why some chemicals (e.g.,
PCBs) deposit out in polar regions.
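Eq. 13 can be rearranged to correct a vapor pressure from one temperature to another, given the latent heat of vaporization. A sketch using water (L about 40,660 J/mol near the normal boiling point, a standard literature value):

```python
import math

R_GAS = 8.314  # universal gas constant, J/(mol K)

def vp_at_t2(vp1, t1, t2, latent_heat):
    """Clausius-Clapeyron (Eq. 13): vapor pressure at T2 from the VP at T1.
    Temperatures in kelvin, latent heat of vaporization in J/mol."""
    return vp1 * math.exp(-(latent_heat / R_GAS) * (1.0 / t2 - 1.0 / t1))

# Water: 101,325 Pa at 373.15 K; the prediction at 363.15 K (~70.6 kPa)
# is close to the handbook value of about 70.1 kPa at 90 C
print(vp_at_t2(101_325.0, 373.15, 363.15, 40_660.0))
```

As the comparison shows, a single latent-heat value reproduces the measured curve well over a narrow temperature range; over wide ranges L itself varies with temperature, which is why PCB-type chemicals condense out in cold polar regions.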
Numerous methods are available for the estimation of vapor
pressure, and Grain (186), Schwarzenbach et al. (111), Delle Site
(187), Sage and Sage (188), and Dearden (63) have reviewed many
of these. The descriptors used in vapor pressure QSPRs include
physicochemical, structural, and topological descriptors, and group
contributions. Katritzky et al. (189) used their CODESSA descrip-
tors to model the vapor pressure of a large set of diverse organic
chemicals:
log VP = -0.00559 GI - 0.708 HDCA(1) + 0.767 NF - 0.00757 WNSA-1 + 7.01,  (14)

n = 645, R² = 0.937, and s = 0.366,
where GI = gravitational index, HDCA(1) = hydrogen-bond donor solvent-accessible surface area, NF = number of fluorine atoms, and WNSA-1 = weighted partial negative surface area. The CODESSA software is available from SemiChem Inc. (34).
Liang and Gallagher (190) used polarizability and seven struc-
tural descriptors to model the vapor pressure of 479 diverse organic
chemicals, using both multiple linear regression and an artificial
neural network. There was little difference between the two meth-
ods with MLR giving a standard error of 0.534 log unit and ANN
yielding 0.522 log unit.
Tu (191) used a group contribution method to model the
vapor pressure of 1,410 diverse organic chemicals. Using 81
group contributions, 2 hydrogen bonding terms, and melting
point he obtained a standard error of 0.36 log unit. Öberg and
Liu (192) used a partial least squares regression (PLSR) method to
model the vapor pressures of a set of 1,340 diverse organic chemi-
cals, with RMSE ¼ 0.410 log unit. Basak and Mills (193) used
topological descriptors to model vapor pressures of 121 chlorinated
organic chemicals, and obtained an RMSE of 0.130 log unit.
The vapor pressures of 352 hydrocarbons and halohydrocar-
bons were modeled by Goll and Jurs (194), using seven of their
ADAPT descriptors. Vapor pressure was recorded in pascals, and
the data covered the log VP range −1.016 to +6.65:

log VP = 0.670 ⁰χ + 0.204 NF + 5.47 × 10⁻² NSB − 0.121 NRA − 6.35 × 10⁻² DPSA + 0.117 N3C + 0.518 RPCG + 8.15,        (15)

n = 352, R² = 0.983, and RMSE = 0.186 log unit,

where ⁰χ = zero-order molecular connectivity, NF = number of fluorine atoms, NSB = number of single bonds, NRA = number of atoms in ring systems, DPSA = difference between partial positive surface area and partial negative surface area, N3C = number of 3rd-order clusters, and RPCG = relative positive charge.
118 J.C. Dearden

Table 13
Some software predictions of vapor pressures at 25 °C of a 100-compound test set (63)

Software                        Mean absolute error (log unit)
SPARC (81)                      0.105
ACD/Labs (67)                   0.107
Episuite (73)                   0.285
Molecular Modeling Pro (74)     0.573

Some of the ADAPT descriptors are difficult to interpret, but have been found to give good correlations of a number of physicochemical properties. The very low RMSE reflects the fact
that there was little chemical diversity within the compounds used.
An interesting approach was used by Staikova et al. (195), who
modeled the vapor pressures of nonpolar chemicals with a single
descriptor, average polarizability, and found a standard error of
prediction of 0.313 log unit. It is unlikely, however, that this
approach would work with polar chemicals.
A number of studies (196–199) allow the estimation of vapor
pressures over a range of temperatures.
There are several commercially available software programs that
will calculate vapor pressure; one of them (ACD/Labs) will allow
the calculation of vapor pressure over a temperature range. Using a
100-compound test set of organic chemicals with vapor pressures
measured at 25 °C, Dearden (63) compared the performance of
four software programs that calculate log (vapor pressure). The
test results are given in Table 13.
The programs can operate in batch mode, except for SPARC.
The ACD/Labs result was determined on only 42 compounds; 46
test set compounds that were used in the ACD/Labs training set
were deleted, and in addition the ACD/Labs software did not give
a vapor pressure at 25 °C for 18 very volatile compounds. ECETOC (113) quotes the US EPA testing of the MPBPVP module of the Episuite software in 1999: n = 805, r² = 0.941, and MAE = 0.476 log unit. This higher MAE probably reflects either the greater diversity of the US EPA test set or improvements made to the software since 1999.
The prediction errors of the PREDICTPlus software (79) are
reported to be 2–5%, depending on the method of calculation. Five
other software programs, Absolv (67), ChemProp (70), PhysProps
(78), OECD Toolbox (24), and ProPred (80), also predict vapor
pressure.
It is recommended that SPARC, ACD/Labs, or Episuite software be used for the calculation of vapor pressure. Predictions
from at least two different sources should be obtained if possible.

3.7. Henry’s Law Constant (Air–Water Partition Coefficient)

The air–water partition coefficient is important in the distribution of chemicals between the atmosphere and water in the environment. The prediction of Henry’s law constant (H) has been
reviewed by Schüürmann and Rothenbacher (200), Schwarzenbach et al. (111), Reinhard and Drefahl (87), Mackay et al. (201), and Dearden and Schüürmann (64).
One simple way of calculating H is to use the ratio of vapor
pressure and aqueous solubility (VP/Cw). It is not a highly accurate
method, but neither is the measurement of H, especially for
chemicals with very high or very low H values. VP/Cw can be
converted to the dimensionless form of H (ratio of concentrations
in air and water, Ca/Cw, or Kaw) by the following equation, which is
valid for 25 °C:

Ca/Cw = 40.874 (VP/Cw).        (16)
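The 40.874 factor in Eq. 16 is simply 1/RT at 25 °C in a particular choice of units; it is reproduced if one assumes VP in atmospheres and Cw in mol/m³ (an assumption of this sketch, not stated in the text):

```python
R_ATM = 0.082057  # gas constant, L*atm/(mol*K)
T = 298.15        # 25 degrees C in kelvin

def dimensionless_kaw(vp_atm, cw_mol_per_m3):
    """Convert vapor pressure VP (assumed in atm) and aqueous solubility
    Cw (assumed in mol/m3) to the dimensionless air-water partition
    coefficient Ca/Cw of Eq. 16."""
    factor = 1000.0 / (R_ATM * T)  # evaluates to 40.874 at 25 degrees C
    return factor * vp_atm / cw_mol_per_m3

print(round(1000.0 / (R_ATM * T), 3))  # 40.874
```

Recomputing the factor this way is a useful sanity check before applying the equation at other temperatures, where 1/RT must be re-evaluated.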
Most prediction methods for H use a group or bond contribu-
tion approach, although some have used physicochemical proper-
ties (202). The group and bond contribution methods were first
used by Hine and Mookerjee (203), who obtained, for a set of 263
diverse simple organic chemicals, a standard deviation of 0.41 log
unit for the group contribution method and one of 0.42 for the
bond contribution method. Cabani et al. (204) claimed an
improvement in the group contribution method over that of
Hine and Mookerjee, whilst Meylan and Howard (205) extended
the bond contribution method and obtained, for a set of 345
diverse chemicals, a standard error of 0.34 log unit. Their method,
together with a group contribution method, is incorporated in the
HENRYWIN module of the Episuite software (73).
Several workers have used physicochemical and/or structural
descriptors to model H.
Nirmalakhandan and Speece (206) developed a QSPR using a
polarizability descriptor, a molecular connectivity term, and an
indicator variable for hydrogen bonding. However, Schüürmann
and Rothenbacher (200) found it to have poor predictive power.
Russell et al. (207) used their ADAPT software to develop a
5-descriptor model of log Kaw for a relatively small but diverse data
set:
log Kaw = −0.547 NHEAVY + 0.0402 WPSA + 0.0360 RNCS + 10.1 QHET − 215 QRELSQ + 0.73,        (17)

n = 63, R² = 0.956, and s = 0.375,
where NHEAVY = number of heavy atoms, WPSA = (total solvent-accessible surface area) × (sum of surface areas of positively charged atoms), RNCS = (charge on most negative atom) × (surface area of most negative atom)/(sum of charges on negatively charged atoms), QHET = (total charge on heteroatoms)/(number of heteroatoms), and QRELSQ = square of (total charge on heteroatoms)/(number of atoms).
Recently QSPRs have been developed by Modarresi et al. (208)
using a very large (940-compound) diverse data set. Using genetic
algorithm selection of descriptors, they obtained a 10-descriptor
QSPR with an RMSE of 0.571 log unit.
The Ostwald solubility coefficient L (the reciprocal of Kaw) of
a very diverse data set of chemicals was modeled by Abraham et al.
(209):
log L = 0.577 R + 2.549 π + 3.813 Σα + 4.841 Σβ − 0.869 VX + 0.994,        (18)

n = 408, R² = 0.996, and s = 0.151,

where R = excess molar refractivity (a measure of polarizability), π = a polarity/polarizability term, Σα and Σβ = the sums of hydrogen-bond donor and acceptor abilities, respectively, and VX = the
McGowan characteristic volume. The Abraham descriptors are
approximately auto-scaled, so that the magnitudes of the coeffi-
cients in Eq. 18 indicate the relative contributions of each term. It is
clear that hydrogen bonding is the most important factor
controlling water–air distribution; the greater magnitude of the
Σβ term probably reflects the strong hydrogen bond donor ability
of water. Molecular size, represented by VX, appears to play only a
minor role in determining air–water partitioning. It may be noted
that the very high correlation coefficient and low standard error of
Eq. 18 suggest possible over-fitting; no external validation of
Eq. 18 was provided. The Abraham descriptors are available in
the Absolv software (67).
Over-fitting of data seems likely in two other papers also
(210, 211), where prediction errors of 0.03 and 0.1 log unit,
respectively, were reported.
Katritzky et al. (212) used their CODESSA software to model
the data set of Abraham et al. (209):
log L = 42.37 HDCA(2) + 0.65 [NO + NN] − 0.16 ΔE + 0.12 PCWT + 0.82 NR + 2.65,        (19)

n = 406, R² = 0.942, and s = 0.52,

where HDCA(2) = hydrogen bond donor ability, (NO + NN) = a linear combination of the numbers of oxygen and nitrogen atoms, ΔE = the HOMO–LUMO energy difference, PCWT = the most negative partial charge-weighted topological electronic index, and NR = number of rings. It may be noted that the standard error of 0.52 log
unit is more realistic than that of 0.151 reported by Abraham et al. (209).
Katritzky et al. (108) used predicted vapor pressure and aqueous solubility to calculate Henry’s law constant according to Eq. 16 for 411 diverse chemicals. The table giving their results was inadvertently omitted from their paper, but they reported a standard error of 0.63 log unit, which is not very much greater than that found (0.52 log unit) in their correlation shown in Eq. 19 above.
There are eight software programs that calculate Henry’s law
constant, namely, Absolv (67), ChemOffice (69), ChemProp (70), Episuite (73), OECD Toolbox (24), ProPred (80), Schrödinger (59), and SPARC (81). The performances of most of them are not known.
Dearden and Schüürmann (64) tested a number of methods for
prediction of log H, using a large, diverse test set of 700 chemicals.
Only one of the methods, the bond contribution method in the
HENRYWIN module of the Episuite software, allowed prediction
of log H for all 700 chemicals, with an MAE of prediction of
0.63 log unit.
It is recommended that the HENRYWIN module of the Epi-
suite software be used for the prediction of Henry’s law constant.

4. Guidelines for Developing QSARs and QSPRs
A number of publications have offered guidelines on how to
develop QSARs and QSPRs (11, 61, 213–215). In March 2002 a
meeting of QSAR/QSPR experts was held in Setúbal, Portugal, to
formulate a set of guidelines for the validation of QSARs/QSPRs,
in particular for regulatory purposes. Six guidelines were drawn up,
which were later adopted by the OECD (216) and modified to five.
The guidelines are the following:
A valid QSAR/QSPR should have:
1. A defined endpoint.
2. An unambiguous algorithm.
3. A defined domain of applicability.
4. Appropriate measures of goodness of fit, robustness, and predictivity.
5. A mechanistic interpretation, if possible.
The guidelines are now known as the OECD Principles for the
Validation of (Q)SARs, although they are intended to apply to
QSPRs also. The OECD has also provided a checklist to provide
guidance on the interpretation of the principles (217).
Table 14
Types of error in QSAR/QSPR development and use
(from Dearden et al. (218), by kind permission of Taylor
& Francis Ltd., publishers (www.informaworld.com))

Relevant OECD
No. Type of error principle(s)
1 Failure to take account of data heterogeneity 1
2 Use of inappropriate endpoint data 1
3 Use of collinear descriptors 2, 4, 5
4 Use of incomprehensible descriptors 2, 5
5 Error in descriptor values 2
6 Poor transferability of QSAR/QSPR 2
7 Inadequate/undefined applicability domain 3
8 Unacknowledged omission of data points 3
9 Use of inadequate data 3
10 Replication of compounds in data set 3
11 Too narrow a range of endpoint values 3
12 Over-fitting of data 4
13 Use of excessive number of descriptors in a QSAR/QSPR 4
14 Lack of/inadequate statistics 4
15 Incorrect calculation 4
16 Lack of descriptor auto-scaling 4
17 Misuse/misinterpretation of statistics 4
18 No consideration of distribution of residuals 4
19 Inadequate training/test set selection 4
20 Failure to validate a QSAR/QSPR correctly 4
21 Lack of mechanistic interpretation 5

Recently Dearden et al. (218) published an analysis of 21 types of error made in the development of QSARs and QSPRs, with
examples taken from the literature, including some of their own.
The different types of error are shown in Table 14, and are dis-
cussed briefly below. However, the reader is directed to Dearden
et al. (218) for further details.
4.1. Heterogeneous Data

There is a temptation, especially if data are scarce, to use values that are not strictly comparable. For example, aqueous solubilities can be measured in pure water, as undissociated species, at a given pH, or at different temperatures. It is important, in the development of a QSPR, to use data that were obtained under the same conditions, and if possible using the same protocol. Failure to do so will result in a less than satisfactory QSPR.

4.2. Inappropriate Endpoint Data

The values of the property of interest must be in molar units, and not weight units. This is an important matter, but one that is frequently not recognized. Again using aqueous solubility as an example, values should be in units of mol/L, and not g/L. The reason is that the effect of a chemical (be it physicochemical or biological) is determined by the number of molecules present, and not by how much they weigh. Consider two chemicals, A with a molecular weight of 100 and B with a molecular weight of 200. Both are found to have an aqueous solubility of 100 mg/L. However, the molar solubility of A is 1 (i.e., 100/100) mmol/L, whilst that of B is 0.5 (i.e., 100/200) mmol/L, so B is really only half as soluble as A.
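The A/B comparison above is trivial to automate; a minimal sketch (the compounds and values are simply the illustration from the text):

```python
def mg_per_l_to_mmol_per_l(solubility_mg_per_l, mol_weight):
    """Convert a weight-based solubility (mg/L) to molar units:
    mg/L divided by molecular weight (g/mol) gives mmol/L directly."""
    return solubility_mg_per_l / mol_weight

# Chemicals A (MW 100) and B (MW 200), both with solubility 100 mg/L
a = mg_per_l_to_mmol_per_l(100.0, 100.0)  # 1.0 mmol/L
b = mg_per_l_to_mmol_per_l(100.0, 200.0)  # 0.5 mmol/L
print(a, b)  # in molar terms, B is only half as soluble as A
```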

4.3. Descriptor Collinearity

If two descriptors in a QSPR are themselves highly correlated (collinear), they contribute essentially the same information. In addition, one could be misled into misinterpretation of the QSPR. A more serious problem is that highly collinear descriptors in a QSPR or a QSAR can give rise to a spurious model. Dearden et al. (218) gave an example where two collinear (r² = 0.959) descriptors separately yielded two good QSARs, both with positive descriptor coefficients. However, if both were included in the same QSAR, one of the descriptors appeared with a negative coefficient and was statistically insignificant.
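A simple guard is to screen the descriptor matrix for highly correlated pairs before model building. In the sketch below the r² cut-off of 0.9 is an arbitrary illustrative threshold, not one prescribed by the chapter:

```python
import numpy as np

def collinear_pairs(X, names, r2_cutoff=0.9):
    """Return descriptor pairs whose pairwise r2 exceeds the cutoff.
    Descriptors are the columns of X."""
    r = np.corrcoef(X, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if r[i, j] ** 2 > r2_cutoff:
                pairs.append((names[i], names[j], round(r[i, j] ** 2, 3)))
    return pairs

rng = np.random.default_rng(0)
d1 = rng.normal(size=50)
d2 = d1 + rng.normal(scale=0.1, size=50)   # nearly collinear with d1
d3 = rng.normal(size=50)                   # independent descriptor
X = np.column_stack([d1, d2, d3])
flagged = collinear_pairs(X, ["d1", "d2", "d3"])
print(flagged)  # only the (d1, d2) pair should be flagged
```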

4.4. Incomprehensible Descriptors

There are now thousands of molecular descriptors available for use in QSPR and QSAR model development, and many of them have no clear physicochemical meaning. Whilst it is not essential for descriptors to be clearly understood, it is helpful and satisfying if they are, as well as aiding in the interpretation of the model. It must nevertheless be recognized that the existence of a correlation, however good, is not a guarantee of causality.

4.5. Errors in Descriptor Values

Most descriptor values, be they measured or calculated, contain error, and so it behoves the QSAR/QSPR practitioner to try to ensure that descriptor values are as accurate as possible (60). For example, measured log P values have a typical error of about 0.35 log unit (9), and hence calculated values cannot be expected to have errors significantly lower than that.
Ghafourian and Dearden (219) examined the use of three different quantum chemical approaches to the calculation of atomic charges and orbital energies as descriptors of hydrogen bonding, and found the MNDO and AM1 methods to be better than PM3.
It is the opinion of this author that much more work needs to
be done on comparison and accuracy of descriptor values, as a
means of improving the accuracy of QSAR and QSPR property
predictions.

4.6. Poor Transferability of QSARs and QSPRs

One of the main values of a QSAR or a QSPR is that it can be used by others for predictive purposes. Hence it has to be transferable and reproducible. Unfortunately, for various reasons (e.g., lack of availability of software) this is often not the case. Hartung et al. (220) have suggested the following criteria for transferability of a QSAR or a QSPR to a different operator:
(a) Descriptor values can be reproduced.
(b) Model definition can be confirmed.
(c) Goodness of fit and statistical robustness can be confirmed.
(d) Reproducibility of predictions can be confirmed.
(e) An assessment is given of the adequacy of documentation on the development and application of the model.

4.7. Inadequate/Undefined Applicability Domain

The applicability domain (AD) of a QSAR or a QSPR has been defined as: “the response and chemical structure space in which the model makes predictions with a given reliability” (221, 222). It is permissible to use a QSAR/QSPR to make predictions a little way outside its AD, but one should have less confidence in the accuracy of such predictions. For example, if a given toxicity endpoint correlated well with log P, and the log P range of the training set chemicals was 0–6, one could not expect an accurate toxicity prediction for a chemical with a log P value of 9.
At present, very few published QSAR/QSPR papers give an
indication of AD, although if descriptor values are given one can see
what ranges of descriptor values were used in the training set. In
addition, of course, one should not use a QSAR or a QSPR to make
predictions for a chemical that is not structurally similar to at least
some of the chemicals in the training set. Again, this guideline is not
always adhered to.
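A minimal descriptor-range check of the kind implied above can be sketched as follows; the log P range 0–6 is the text’s example, while the function and data structure are my own illustrative assumptions (a real AD assessment would also consider structural similarity):

```python
def in_descriptor_ranges(query, training_ranges):
    """Flag whether each descriptor of a query chemical falls inside the
    min-max range observed in the training set (a crude range-based
    applicability-domain check)."""
    return {name: lo <= query[name] <= hi
            for name, (lo, hi) in training_ranges.items()}

training_ranges = {"logP": (0.0, 6.0)}  # range from the text's example
inside = in_descriptor_ranges({"logP": 3.0}, training_ranges)
outside = in_descriptor_ranges({"logP": 9.0}, training_ranges)
print(inside, outside)  # logP = 9 falls outside the training range
```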

4.8. Unacknowledged Omission of Data Points

Data used for the development of a QSAR/QSPR are often taken from the published literature, and one may, for a number of reasons, wish to use only a selection of those available. If some data are
omitted, that must be stated, with reasons (e.g., to keep the train-
ing set to a reasonable number of chemicals, or to examine a
particular class of chemicals). However, it is not uncommon to
find that data have been pruned without a reason being given, or
even without a mention that data have been omitted. Probably the
main reason for omission of data is that the omitted chemicals were
found to be outliers; that is, their property of interest was not well
predicted by the QSAR/QSPR. If this is the case, it must be stated
clearly, and preferably a reason should be given (e.g., the omitted
chemicals were the only ones that were strongly dissociated).

4.9. Use of Inadequate Data

Inadequacy of data can occur in a number of ways. It can include heterogeneity (Subheading 4.1), inappropriate data (Subheading 4.2), an undefined applicability domain (Subheading 4.7), and omission of data (Subheading 4.8). Another common problem
is the accuracy of data, which is often very difficult to determine.
Sometimes one finds incorrect or inadequately defined chemi-
cal names. For example, a QSAR study of skin absorption (223)
listed 4-chlorocresol and chloroxylenol in the training set used. The
former has two isomers (4-chloro-2-cresol and 4-chloro-3-cresol),
whilst chloroxylenol has 18 isomers.
A recent study (224) found that incorrect chemical structures
in a number of public and private databases ranged from 0.1 to
3.4%, and observed that even slight structural errors could cause
pronounced changes in the accuracy of QSAR/QSPR predictions.
It should also be noted that it is unacceptable to use predicted
values of the property of interest, when developing a QSAR or a
QSPR, as one is then making predictions about predictions. An
example is a study of skin permeability of 114 chemicals, 63 of
which had calculated permeability values (225).

4.10. Replication of Chemicals in a Data Set

Replication of chemicals in a QSAR/QSPR data set clearly distorts the development and predictivity of the model. Replicate structures can occur for a number of reasons. The same chemical can sometimes have more than one CAS number, more than one chemical name (226), different numbering systems, and different values of the property of interest (227).
How, then, can one avoid replication in QSAR/QSPR data
sets? Probably the best way is by the use of unique structural
codes such as the InChI code (228), and using a computer program
to check for replicates. An alternative method is to sort all data by
each of the available parameters, particularly the chemical formula,
for possible replicates. Failure to eliminate replicates will result in
invalid correlations and poor predictivity.
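Assuming each structure has already been reduced to a unique code such as a standard InChI string (generating the codes requires a chemistry toolkit and is not shown), replicate detection is a simple grouping exercise; the records below are hypothetical:

```python
from collections import defaultdict

def find_replicates(records):
    """Group (structural code, name) records by a unique code such as an
    InChI string and return the codes that occur more than once."""
    groups = defaultdict(list)
    for code, name in records:
        groups[code].append(name)
    return {code: names for code, names in groups.items() if len(names) > 1}

# Hypothetical records: the same structure entered under two names
records = [
    ("InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H", "benzene"),
    ("InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H", "benzol"),
    ("InChI=1S/CH4/h1H4", "methane"),
]
dups = find_replicates(records)
print(dups)  # flags the duplicated benzene structure
```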

4.11. Too Narrow a Range of Endpoint Values

The greater the range of endpoint values used in the development of a QSPR model, the better is its predictivity (229). It is recommended that a range of endpoint values of at least 1.0 log unit be used, if possible, for a good QSAR/QSPR model to be developed (229). Sometimes, of course, that cannot be achieved, perhaps through lack of availability of sufficient data, or through sheer impossibility. For example, a QSPR for the melting points of substituted anilines used chemicals with a melting range of 244.5–461.5 K, or 0.276 log unit (151).
4.12. Over-fitting of Data

In the development of a QSAR or a QSPR, one aims to achieve as good a model as possible. Sometimes this is done by the use of
many descriptors, by the removal of outliers, or by the use of a
certain statistical technique. In order to reduce the risk of chance
correlations, it is recommended (230) that the ratio of number of
training set chemicals to number of descriptors in the model is at
least 5:1. This “rule” has often been broken, with probably the
worst example being the use of nine descriptors to model the
aquatic toxicities of 12 alcohols (231). Some statistical techniques,
such as fuzzy ARTMAP, appear (232) to produce over-fitted mod-
els in which the standard error of prediction is much lower than the
error on the experimental data used to develop the model.
It is recommended that y-scrambling be carried out to check for
over-fitting and chance correlations. This procedure involves ran-
domizing the values of the property being modeled, and then
developing a new model. This procedure is repeated for, say, 100
times, and the R2 values of the correlations compared with the R2
value of the true correlation. If the true R2 value is well above all of
the R2 values from the randomized models, then one can have
confidence that the original model is valid.
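The y-scrambling procedure described above can be sketched with ordinary least squares standing in for whatever modeling method is used; the data here are synthetic, and the 100-repeat count follows the text:

```python
import numpy as np

def ols_r2(X, y):
    """R2 of an ordinary least-squares fit (intercept included)."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
y = 2.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(scale=0.1, size=30)

true_r2 = ols_r2(X, y)
# Randomize the modeled property 100 times and refit each time
scrambled = [ols_r2(X, rng.permutation(y)) for _ in range(100)]
# A valid model: the true R2 sits well above every scrambled R2
print(round(true_r2, 3), round(max(scrambled), 3))
```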

4.13. Use of Excessive Number of Descriptors in a QSAR/QSPR

QSPRs with a large number of descriptors are difficult to interpret (233). Dearden et al. (218) recommended a maximum of five or six descriptors as a general rule, largely on the grounds of understanding. The principle of Occam’s razor is apposite here: “Entia non
sunt multiplicanda praeter necessitatem” (“One should not
increase beyond what is necessary the number of entities required
to explain anything”). However, occasionally QSARs/QSPRs are
developed with a large number of descriptors, as for example the
use of 55 descriptors to model the aqueous solubility of 1,050
chemicals (234). This practice should be avoided because of the
difficulty of comprehension and the risk of over-fitting (see Sub-
heading 4.12).

4.14. Lack of/Inadequate Statistics

The statistics provided with a QSAR/QSPR are an indication of how well the model fits the training set data, and how predictive the
model is. Many QSAR and QSPR models are still published with-
out full statistics, which means that it is difficult, if not impossible,
to judge the validity of a model. Dearden et al. (218) have recom-
mended that the following statistical indicators are included with
each published QSAR/QSPR: n (number of chemicals in the train-
ing set); r2 or R2 (coefficient of determination or squared correla-
tion coefficient, lower case for a 1-descriptor model and upper case
for a multi-descriptor model); q2 or Q2 (squared cross-validated
correlation coefficient, an internal indicator of predictivity); Radj2
(squared correlation coefficient adjusted for degrees of freedom,
which allows comparison between QSARs and QSPRs containing
different number of descriptors); s (standard error of the estimate)
or RMSE (very similar to standard error of the estimate, especially
when n is large); and full F statistics (F is the Fisher statistic or variance ratio, which indicates the confidence level of the model).
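Most of the recommended indicators can be computed directly from the residuals of a fit. The sketch below uses the standard textbook definitions (not formulas taken from this chapter) for a model with p descriptors:

```python
import numpy as np

def qsar_statistics(y, y_pred, p):
    """n, R2, adjusted R2, standard error s, and F for a model with
    p descriptors, computed from measured y and fitted y_pred."""
    n = len(y)
    ss_res = float(np.sum((y - y_pred) ** 2))
    ss_tot = float(np.sum((y - np.mean(y)) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    s = (ss_res / (n - p - 1)) ** 0.5
    f = ((ss_tot - ss_res) / p) / (ss_res / (n - p - 1))
    return {"n": n, "R2": r2, "R2_adj": r2_adj, "s": s, "F": f}

# Synthetic 1-descriptor example
rng = np.random.default_rng(2)
x = rng.normal(size=40)
y = 1.5 * x + rng.normal(scale=0.3, size=40)
slope, intercept = np.polyfit(x, y, 1)
stats = qsar_statistics(y, slope * x + intercept, p=1)
print({k: round(v, 3) for k, v in stats.items()})
```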

4.15. Incorrect Calculation

It is the duty of authors to ensure that the calculations made to obtain their results are as accurate as possible. Editors and manuscript reviewers expect that to be the case, and it is often impossible
to check accuracy because insufficient data are supplied. There is no
doubt that incorrectly calculated QSARs and QSPRs have been
published, but no one has investigated this problem. A few
instances have come to light. Dearden et al. (218) reported a case
in which a 5-descriptor QSAR was published with R² = 0.958, s = 0.14, and F = 79. Because all descriptor values were given, it was possible to recalculate the QSAR, and the statistics were found to be R² = 0.298, s = 0.56, and F = 1.4.
It is recommended that all calculations be double-checked,
preferably by two different people, before a QSAR or a QSPR is
published.

4.16. Lack of Descriptor Auto-Scaling

Auto-scaling is the modification of descriptor values by subtracting the mean from the value of each descriptor, and then dividing by the standard deviation. This yields descriptor values with a mean of
zero and a variance of one, which means that modeling is less
susceptible to the influence of chemicals with extreme values (42),
and avoids the risk of descriptors with large numerical values dom-
inating those with small values (62). Another important result of
auto-scaling is that the relative contributions of each descriptor to
the model can readily be seen (see Eq. 1).
Regrettably, very few QSAR/QSPR publications use auto-
scaling, and it is recommended that it should be standard practice.
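Auto-scaling as described is a one-line operation per descriptor column; a minimal sketch:

```python
import numpy as np

def auto_scale(X):
    """Auto-scale each descriptor column: subtract its mean and divide by
    its (sample) standard deviation, giving mean 0 and variance 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 5000.0]])  # descriptors on very different scales
Z = auto_scale(X)
print(Z.mean(axis=0).round(6), Z.std(axis=0, ddof=1).round(6))
```

After scaling, the numerically large second descriptor no longer dominates, and regression coefficients on the scaled columns are directly comparable.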

4.17. Misuse or Misinterpretation of Statistics

Not many QSAR/QSPR practitioners are statisticians, and few statisticians have experience of QSAR/QSPR modeling. Nevertheless, statistics is an essential QSAR/QSPR tool. It is therefore not
surprising to find that it is sometimes misused, or its results mis-
interpreted. Livingstone’s book (42) is very helpful in this respect,
and a survey of QSAR/QSPR statistics available on the Internet
(235) gives useful guidance. Useful Web sites are those of the
Scripps Institute (236) and QSAR World (237).
Two cases involving the misuse or misinterpretation of statistics
(231, 232) have already been mentioned in Subheading 4.12.
It behoves QSAR/QSPR workers to familiarize themselves
sufficiently with the requisite statistics, or to enlist the help of a
knowledgeable statistician, in order to ensure that the statistical
techniques and results that they use are valid.

4.18. No Consideration of Distribution of Residuals

Residuals (differences between measured and predicted endpoint values) can arise from both random error and systematic error. Random error arises from lack of data and/or descriptor value reproducibility, whilst systematic error generally results from biases in descriptor values, perhaps from poor choice of descriptors.
If residuals are plotted against measured endpoint values, ran-
dom distribution around the zero residual line indicates random
error. If, however, all or most of the residuals lie on one side of the
zero residual line, or show a consistent variation of residuals with
increasing measured values, systematic error is indicated. This sug-
gests that the model needs to be reexamined in order to eliminate
such error.
Currently very few QSAR/QSPR publications include a con-
sideration of the distribution of residuals. It is recommended that
residual plots be included, as a guide to model improvement.
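The first symptom mentioned above (most residuals on one side of the zero line) is easy to screen for automatically; the 80% threshold in this sketch is an arbitrary illustrative choice, and a residual plot should still be inspected by eye:

```python
def residuals_look_systematic(residuals, frac=0.8):
    """Flag possible systematic error if more than `frac` of the residuals
    fall on the same side of the zero residual line."""
    n = len(residuals)
    pos = sum(1 for r in residuals if r > 0)
    return max(pos, n - pos) / n > frac

biased = residuals_look_systematic([0.2, 0.3, 0.1, 0.25, 0.4, -0.05])
random = residuals_look_systematic([0.2, -0.3, 0.1, -0.25, 0.4, -0.1])
print(biased, random)  # the first set is flagged, the second is not
```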

4.19. Inadequate Training/Test Set Selection

Data for QSAR/QSPR modeling are usually divided into training and test sets either randomly or by ordering the chemicals according to endpoint values and then selecting every nth chemical for the
test set. These approaches have, however, been shown to be subop-
timal (14, 238). The training set should cover a good range of
endpoint values and (for diverse data sets) have an adequate cover-
age of the requisite chemical space. The test set chemicals should
also cover a good range of endpoint values, be sufficiently diverse in
nature, and be similar (but not too similar) to training set chemi-
cals. Various techniques (14, 15, 239, 240) have been proposed for
the rational selection of training and test sets.
It is essential that proper attention is paid to training and test
set selection when preparing to develop a QSAR/QSPR model. It
is recognized, however, that one often serious drawback to this is
lack of availability of satisfactory and appropriate data.

4.20. Failure to Validate a QSAR/QSPR Correctly

The main use of a QSAR or a QSPR is to make predictions of the properties of chemicals that were not in the training set used to develop the model. So the prediction ability (the predictivity) of the
model needs to be assessed (241). Tropsha et al. (242) discussed a
number of ways of doing this, and they recommended that both
internal and external validation be carried out; these procedures
were described briefly in Subheading 2.1. It is now generally
accepted that external validation is the better way to validate a
QSAR/QSPR model (233, 243).
A problem arises if there are only relatively few chemicals available from which to develop the model. Suppose that one has
only 15 chemicals in one’s data set. Removal of 20% of them to use
as a test set would leave only 12 chemicals in the training set, which
would allow, from the Topliss and Costello rule (230), the inclu-
sion of only two descriptors in the model. An alternative acceptable
procedure in such a case is to remove one chemical, and develop the
model on the remaining chemicals, from the whole pool of descrip-
tors. That model is then used to predict the endpoint value of the
omitted chemical. That chemical is then returned to the training
set, a second chemical is removed, and the model is redeveloped,
again from the whole pool of descriptors. This procedure is
repeated until every chemical has been removed in turn, and one
then has external predictions for every chemical. The technique is
not as satisfactory as using separate training and test sets, but is
better than internal cross-validation (244).
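The remove-one, redevelop, predict cycle described above can be sketched as follows, with ordinary least squares standing in for whatever modeling method is used; in practice the descriptor selection step would also be repeated inside the loop, as the text specifies:

```python
import numpy as np

def loo_predictions(X, y):
    """Leave each chemical out in turn, refit the model on the remaining
    chemicals, and predict the omitted chemical's endpoint value."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        A = np.column_stack([X[keep], np.ones(keep.sum())])
        coef, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
        preds[i] = np.append(X[i], 1.0) @ coef
    return preds

# Synthetic 15-chemical, 2-descriptor example
rng = np.random.default_rng(3)
X = rng.normal(size=(15, 2))
y = 1.2 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.2, size=15)
preds = loo_predictions(X, y)
q2 = 1.0 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(q2, 3))  # predictivity estimated from the pooled predictions
```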
It should be noted that at least four journals (SAR & QSAR in
Environmental Research, Molecular Informatics, Journal of Medic-
inal Chemistry and Journal of Chemical Information and Model-
ing) require external validation of published QSARs/QSPRs. It is
thus essential that all QSARs and QSPRs are fully validated before
being used predictively.

4.21. Lack of Mechanistic Interpretation

The documentation for the OECD Principles for the Validation of (Q)SARs (216) states: It is recognised that it is not always possible, from a scientific viewpoint, to provide a mechanistic interpretation of a given (Q)SAR (Principle 5), or that there may even be multiple mechanistic
interpretation for a model does not mean that a model is not poten-
tially useful in the regulatory context. The intent of Principle 5 is not
to reject models that have no apparent mechanistic basis, but to ensure
that some consideration is given to the possibility of a mechanistic
association between the descriptors used in a model and the endpoint
being predicted, and to ensure that this association is documented.
The OECD guidance on the Principles for the Validation of
(Q)SARs/(Q)SPRs (217) recommends that the following ques-
tions be asked regarding the mechanistic basis of a QSAR/QSPR:
1. Do the descriptors have a physicochemical interpretation that is
consistent with a known mechanism?
2. Can any literature references be cited in support of the pur-
ported mechanistic basis of the QSAR/QSPR?
If the answers to both questions are positive, one may have
some confidence in the proposed mechanism of action. If the
answer to one or both questions is negative, then the level of
confidence will be lower. In all cases, it must be remembered that
the existence of a correlation does not imply causality.
Johnson (245) recently commented that QSAR has devolved
into a kind of logical fallacy: cum hoc ergo propter hoc (with this,
therefore because of this). He also stated: “rarely, if ever, are any
designed experiments presented to test or challenge the interpreta-
tion of the descriptors . . . Statistical methodologies should be a tool
of QSAR but instead have often replaced the craftsman tools of our
trade—rational thought, controlled experiments, and personal
observation.” The QSAR/QSPR practitioner would do well to
take those strictures to heart, as well as the recommendations
made in a recent extensive review of QSPR prediction of
physicochemical properties (246).
130 J.C. Dearden

References
1. van de Waterbeemd H (2009) Improving In: Cronin MTD, Madden JC (eds) In silico
compound quality through in vitro and in toxicology: principles and applications. RSC
silico profiling. Chem Biodivers 6:1760–1766 Publishing, Cambridge, pp 59–117
2. Cronin MTD, Livingstone DJ (2004) Calcu- 14. Leonard JT, Roy K (2006) On selection of
lation of physicochemical properties. In: Cro- training and test sets for the development of
nin MTD, Livingstone DJ (eds) Predicting predictive QSAR models. QSAR Comb Sci
chemical toxicity and fate. CRC, Boca 25:235–251
Raton, FL, pp 31–40 15. Golbraikh A, Shen M, Xiao Z et al (2003)
3. Fisk PR, McLaughlin L, Wildey RJ (2004) Rational selection of training and test sets for
Good practice in physicochemical property the development of validated QSAR models.
prediction. In: Cronin MTD, Livingstone DJ J Comput Aided Mol Des 17:241–253
(eds) Predicting chemical toxicity and fate. 16. Aquasol: www.pharmacy.arizona.edu/out-
CRC, Boca Raton, FL, pp 41–59 reach/aquasol/
4. Webb TH, Morlacci LA (2010) Calculation of 17. Tripos: www.tripos.com
physicochemical and environmental fate prop- 18. Chemical & Physical Properties Database:
erties. In: Cronin MTD, Madden JC (eds) In www.dep.state.pa.us/physicalproperties/CPP_
silico toxicology: principles and applications. search.htm
RSC Publishing, Cambridge, pp 118–147
19. Chemical Database Service: cds.dl.ac.uk
5. Dearden JC (2004) QSAR modeling of bioac-
cumulation. In: Cronin MTD, Livingstone 20. ChemSpider: www.chemspider.com
DJ (eds) Predicting chemical toxicity and 21. Crossfire: info.crossfiredatabases.com
fate. CRC, Boca Raton, FL, pp 333–355 22. OCHEM: www.ochem.eu
6. Dearden JC (2004) QSAR modeling of soil 23. OECD eChemPortal: www.echemportal.org
sorption. In: Cronin MTD, Livingstone DJ 24. OECD QSAR Toolbox: www.qsartoolbox.
(eds) Predicting chemical toxicity and fate. org
CRC, Boca Raton, FL, pp 357–371 25. OSHA: www.osha.gov/web/dep/chemical-
7. Schüürmann G, Ebert R-U, Nendza M et al
data/
(2007) Predicting fate-related physicochemi- 26. PhysProp: www.syrres.com/what-we-do/
cal properties. In: van Leeuwen CJ, Vermeire product.aspx?id=133
TG (eds) Risk assessment of chemicals: an
introduction, 2nd edn. Springer, Dordrecht, 27. Wagner AB (2001) Finding physical proper-
pp 375–426 ties of chemicals: a practical guide for scien-
tists, engineers, and librarians. Sci Technol
8. Abraham MH, Chadha HS, Mitchell RC Lib 21(3/4):27–45
(1994) Hydrogen bonding. 32. An analysis
of water–octanol and water–cyclohexane 28. Oyarzabal J, Pastor J, Howe TJ (2009) Opti-
partitioning and the Dlog P parameter of Sei- mizing the performance of in silico ADMET
ler. J Pharm Sci 83:1085–1100 general models according to local require-
ments: MARS approach. Solubility estimations
9. Mannhold R, Poda GI, Ostermann C et al as case study. J Chem Inf Model 49:2837–2850
(2009) Calculation of molecular lipophilicity:
state-of-the-art and comparison of log P 29. Dearden JC (1990) Physico-chemical descrip-
methods on more than 96,000 compounds. tors. In: Karcher W, Devillers J (eds) Practical
J Pharm Sci 98:861–893 applications of quantitative structure–activity
relationships (QSARs) in environmental
10. Dearden JC (2006) In silico prediction of chemistry and toxicology. Kluwer Academic,
aqueous solubility. Exp Opin Drug Discov Dordrecht, pp 25–59
1:31–52
30. Maran U, Sild S, Tulp I et al (2010) Molecular
11. Livingstone DJ (2004) Building QSAR mod- descriptors from two-dimensional chemical
els: a practical guide. In: Cronin MTD, structure. In: Cronin MTD, Madden JC
Livingstone DJ (eds) Predicting chemical (eds) In silico toxicology: principles and
toxicity and fate. CRC, Boca Raton, FL, pp applications. RSC Publishing, Cambridge,
151–170 pp 148–192
12. SMILES: www.daylight.com/dayhtml_tutor- 31. Ran YQ, Jain N, Yalkowsky SH (2001) Pre-
ials/languages/smiles/index.html diction of aqueous solubility of organic com-
13. Nendza M, Aldenberg T, Benfenati E et al pounds by the general solubility equation
(2010) Data quality assessment for in silico (GSE). J Chem Inf Comput Sci 41:
methods: a survey of approaches and needs. 1208–1217
6 Prediction of Physicochemical Properties 131

32. ADAPT: research.chem.psu.edu/pcjgroup/ 63. Dearden JC (2003) Quantitative structure–
adapt.html property relationships for prediction of boil-
33. Molecular Discovery: www.moldiscovery.com ing point, vapor pressure, and melting point.
34. SemiChem: www.semichem.com Environ Toxicol Chem 22:1696–1709
35. Biobyte: www.biobyte.com 64. Dearden JC, Schüürmann G (2003) Quanti-
tative structure–property relationships for
36. Accelrys: www.accelrys.com predicting Henry’s law constant from molec-
37. Dragon: www.talete.mi.it/products/dra- ular structure. Environ Toxicol Chem
gon_description.htm 22:1755–1770
38. eDragon: www.vcclab.org/lab/edragon/ 65. QMRF Database: qsardb.jrc.it/qmrf/
39. ChemComp: www.chemcomp.com 66. Danish QSAR Database: www.130.226.165.
40. EduSoft: www.edusoft-lc.com/molconn/ 14/index.html
41. vLifeSciences: www.vlifesciences.com 67. ACD/Labs: www.acdlabs.com
42. Livingstone D (1995) Data analysis for che- 68. ChemAxon: www.chemaxon.com
mists. Oxford University Press, Oxford 69. CambridgeSoft: www.cambridgesoft.com
43. Rowe PH (2010) Statistical methods for con- 70. UFZ: www.ufz.de/index.php?en=6738
tinuous measured endpoints in in silico toxi- 71. ChemSilico: www.chemsilico.com
cology. In: Cronin MTD, Madden JC (eds) In
silico toxicology: principles and applications. 72. Daylight: www.daylight.com
RSC Publishing, Cambridge, pp 228–251 73. Episuite: www.epa.gov/opptintr/exposure/
44. SimulationsPlus: www.simulations-plus.com pubs/episuite.htm
45. FQS Poland: www.fqs.pl 74. ChemSW: www.chemsw.com
46. VCCLAB: www.vcclab.org 75. Molinspiration: www.molinspiration.com
47. MSI: www.msi.umn.edu/sw/cerius2 76. Chemistry Database Software: www.chemdb-
soft.com
48. VSN International: www.vsni.co.uk/soft-
ware/genstat/ 77. CompuDrug: www.compudrug.com
49. MathWorks: www.mathworks.com 78. G & P Engineering Software: www.gpengi-
neeringsoft.com
50. Minitab: www.minitab.com
79. MW Software: www.mwsoftware.com/
51. NCSS: www.ncss.com dragon
52. IDBS: www.idbs.com 80. ProPred: www.capec.kt.dtu.dk
53. ProChemist: pro.chemist.online.fr 81. SPARC: ibmlc2.chem.uga.edu/sparc
54. GNU: www.gnu.org/software/pspp/ 82. TerraBase: www.terrabase-inc.com
55. SAS: www.sas.com 83. Optibrium: www.optibrium.com
56. Scigress Explorer: www.scigress-explorer.soft- 84. Dearden JC (1985) Partitioning and lipophi-
ware.informer.com licity in quantitative structure–activity rela-
57. SPSS: www.spss.com tionships. Environ Health Perspect
58. StatSoft: www.statsoft.com 61:203–228
59. Schrödinger: www.schrodinger.com 85. Hansch C, Maloney PP, Fujita T et al (1962)
60. Tetko IV, Bruneau P, Mewes H-W et al Correlation of biological activity of phenox-
(2006) Can we estimate the accuracy of yacetic acids with Hammett substituent con-
ADME-Tox predictions? Drug Disc Today stants and partition coefficients. Nature
11:700–707 194:178–180
61. Walker JD, Dearden JC, Schultz TW et al 86. Nendza M (1998) Structure–activity relation-
(2003) QSARs for new practitioners. In: ships in environmental sciences. Chapman &
Walker JD (ed) Quantitative structure–activ- Hall, London
ity relationships for pollution prevention, tox- 87. Reinhard M, Drefahl A (1999) Estimating
icity screening, risk assessment, and web physicochemical properties of organic com-
applications. SETAC, Pensacola, FL, pp 3–18 pounds. Wiley, New York, NY
62. Eriksson L, Jaworska J, Worth AP et al (2003) 88. Leo A (2000) Octanol/water partition coeffi-
Methods for reliability and uncertainty assess- cients. In: Boethling RS, Mackay D (eds)
ment and for applicability evaluations of clas- Handbook of property estimation methods
sification- and regression-based QSARs. for chemicals. Lewis, Boca Raton, FL, pp
Environ Health Perspect 111:1361–1375 89–114

89. Mannhold R, van de Waterbeemd H (2001) 103. Dearden JC, Netzeva TI, Bibby R (2003) A
Substructure and whole molecule approaches comparison of commercially available soft-
for calculating log P. Comput Aided Mol Des ware for the prediction of partition coeffi-
15:337–354 cient. In: Ford M, Livingstone D, Dearden J
90. Livingstone DJ (2003) Theoretical property et al (eds) Designing drugs and crop protec-
predictions. Curr Top Med Chem 3: tants: processes, problems and solutions.
1171–1192 Blackwell, Oxford, pp 168–169
91. Klopman G, Zhu H (2005) Recent meth- 104. Sakuratani Y, Kasai K, Noguchi Y et al (2007)
odologies for the estimation of n-octanol/ Comparison of predictivities of log P calcula-
water partition coefficients and their use in tion models based on experimental data for
the prediction of membrane transport proper- 134 simple organic compounds. QSAR
ties of drugs. Mini Rev Med Chem Comb Sci 26:109–116
5:127–133 105. COSMOlogic: www.cosmologic.de
92. Fujita T, Iwasa J, Hansch C (1964) A new 106. Varnek A, Fourches D, Solov’ev VP et al
substituent constant, π, derived from parti- (2004) “In silico” design of new uranyl
tion coefficients. J Am Chem Soc extractants based on phosphoryl-containing
86:5175–5180 podands: QSPR studies, generation and
93. Nys GG, Rekker RF (1973) Statistical analysis screening of virtual combinatorial library,
of a series of partition coefficients with special and experimental tests. J Chem Inf Comput
reference to the predictability of folding of Sci 44:1365–1382
drug molecules. Introduction of hydrophobic 107. Dearden JC, Cronin MTD, Schultz TW et al
fragmental constants (f values). Chim Ther (1995) QSAR study of the toxicity of nitro-
8:521–535 benzenes to Tetrahymena pyriformis. Quant
94. Rekker RF (1977) The hydrophobic fragmen- Struct Act Relat 14:427–432
tal constant. Elsevier, Amsterdam 108. Katritzky AR, Wang Y, Sild S et al (1998)
95. Leo A, Jow PYC, Silipo C et al (1975) Calcu- QSPR studies on vapor pressure, aqueous sol-
lation of hydrophobic constant (log P) from π tion coefficients. J Chem Inf Comput Sci
and f values. J Med Chem 18:865–868 tion coefficients. J Chem Inf Comput Sci
96. Bodor N, Gabanyi NZ, Wong C-K (1989) A 38:720–725
new method for the estimation of partition 109. Yalkowsky SH, Banerjee S (1992) Aqueous
coefficient. J Am Chem Soc 111:3783–3786 solubility: methods of estimation for organic
97. Klopman G, Wang S (1991) A computer compounds. Dekker, New York, NY
automated structure evaluation (CASE) 110. Mackay D (2000) Solubility in water. In:
approach to calculation of partition coeffi- Boethling RS, Mackay D (eds) Handbook of
cient. J Comput Chem 12:1025–1032 property estimation methods for chemicals:
98. Ghose AK, Pritchett A, Crippen GM (1988) environmental and health sciences. Lewis,
Atomic physicochemical parameters for three Boca Raton, FL, pp 125–139
dimensional structure directed quantitative 111. Schwarzenbach RP, Gschwend PM, Imboden
structure–activity relationships: III. Modeling DM (1993) Environmental organic chemis-
hydrophobic interactions. J Comput Chem try. Wiley, New York, NY
9:80–90 112. Johnson SR, Zheng W (2006) Recent prog-
99. Liu R, Zhou D (2008) Using molecular fin- ress in the computational prediction of aque-
gerprint as descriptors in the QSPR study of ous solubility and absorption. AAPS J 8:
lipophilicity. J Chem Inf Model 48:542–549 E27–E40
100. Chen H-F (2009) In silico log P prediction 113. ECETOC Technical Report No. 89 (2003)
for a large data set with support vector (Q)SARs: evaluation of the commercially
machines, radial basis neural networks and available software for human health and envi-
multiple linear regression. Chem Biol Drug ronmental endpoints with respect to chemical
Des 74:142–147 management applications. ECETOC, Brus-
101. Tetko IV, Tanchuk VYu, Villa AEP (2001) sels
Prediction of n-octanol/water partition coef- 114. Hansch C, Quinlan JE, Lawrence GL (1968)
ficients from PHYSPROP database using arti- The linear free energy relationship between
ficial neural networks and E-state indices. partition coefficients and aqueous solubility
J Chem Inf Comput Sci 41:1407–1421 of organic liquids. J Org Chem 33:347–350
102. Hall LH, Kier LB (1999) Molecular structure 115. Yalkowsky SH, Valvani SC (1980) Solubility
description: the electrotopological state. Aca- and partitioning I: solubility of nonelectro-
demic, New York, NY lytes in water. J Pharm Sci 69:912–922

116. Hughes LD, Palmer DS, Nigsch F et al 129. Dearden JC, Netzeva TI, Bibby R (2003) A
(2008) Why are some properties more diffi- comparison of commercially available soft-
cult to predict than others? A study of QSPR ware for the prediction of aqueous solubility.
models of solubility, melting point, and log P. In: Ford M, Livingstone D, Dearden J et al
J Chem Inf Model 48:220–232 (eds) Designing drugs and crop protectants:
117. Sanghvi T, Jain N, Yang G et al (2003) Esti- processes, problems and solutions. Blackwell,
mation of aqueous solubility by the general Oxford, pp 169–171
solubility equation (GSE) the easy way. QSAR 130. Dearden JC. Unpublished information
Comb Sci 22:258–262 131. Harris JC, Hayes MJ (1990) Acid dissociation
118. Abraham MH, Le J (1999) The correlation constant. In: Lyman WJ, Reehl WF, Rosen-
and prediction of the solubility of compounds blatt DH (eds) Handbook of chemical prop-
in water using an amended solvation energy erty estimation methods. American Chemical
relationship. J Pharm Sci 88:868–880 Society, Washington, DC, pp 6.1–6.28
119. Votano JR, Parham M, Hall LH et al (2004) 132. Brown TN, Mora-Diez N (2006) Computa-
Prediction of aqueous solubility based on tional determination of aqueous pKa values of
large datasets using several QSPR models uti- protonated benzimidazoles (Part 2). J Phys
lizing topological structure representation. Chem B 110:20546–20554
Chem Biodivers 11:1829–1841 133. Kaschula CH, Egan TJ, Hunter R et al (2002)
120. Raevsky OA, Raevskaja OE, Schaper K-J Structure–activity relationships in 4-
(2004) Analysis of water solubility data on aminoquinoline antiplasmodials. The role of
the basis of HYBOT descriptors. Part 3. Sol- the group at the 7-position. J Med Chem
ubility of solid neutral chemicals and drugs. 45:3531–3539
QSAR Comb Sci 23:327–343 134. Soriano E, Cerdan S, Ballesteros P (2004)
121. Klopman G, Zhu H (2001) Estimation of the Computational determination of pK(a)
aqueous solubility of organic molecules by the values. A comparison of different theoretical
group contribution approach. J Chem Inf approaches and a novel procedure. J Mol
Comput Sci 41:439–445 Struct Theochem 684:121–128
122. Palmer DS, O’Boyle NM, Glen RC et al 135. Klopman G, Fercu D (1994) Application of
(2007) Random forest models to predict the multiple computer automated structure
aqueous solubility. J Chem Inf Model 47: evaluation methodology to a quantitative
150–158 structure–activity relationship study of acidity.
123. Lind P, Maltseva T (2003) Support vector J Comput Chem 15:1041–1050
machines for the estimation of aqueous solu- 136. Klamt A, Eckert F, Diedenhofen M et al
bility. J Chem Inf Comput Sci 43:1855–1859 (2003) First principles calculations of aque-
124. Duchowicz PR, Talevi A, Bruno-Blanch LE ous pK(a) values for organic and inorganic
et al (2008) New QSPR study for the predic- acids using COSMO-RS reveal an inconsis-
tion of aqueous solubility of drug-like com- tency in the slope of the pK(a) scale. J Phys
pounds. Bioorg Med Chem 16:7944–7955 Chem A 107:9380–9386
125. Duchowicz PR, Castro EA (2009) QSPR 137. Eckert F, Klamt A (2006) Accurate prediction
studies on aqueous solubilities of drug-like of basicity in aqueous solution with COSMO-
compounds. Int J Mol Sci 10:2558–2577 RS. J Comput Chem 27:11–19
126. Huuskonen J, Livingstone DJ, Manallack DT 138. Lee AC, Yu J-Y, Crippen GM (2008) pKa
(2008) Prediction of drug solubility from prediction of monoprotic small molecules
molecular structure using a drug-like training the SMARTS way. J Chem Inf Model
set. SAR QSAR Environ Res 19:191–212 48:2042–2053
127. Yang G-Y, Yu J, Wang Z-Y et al (2007) QSPR 139. Milletti F, Storchi L, Sforna G et al (2007)
study on the aqueous solubility (lgS(w)) New and original pKa prediction method
and n-octanol/water partition coefficients using GRID molecular interaction fields.
(lgK(ow)) of polychlorinated dibenzo-p- J Chem Inf Model 47:2172–2181
dioxins (PCDDs). QSAR Comb Sci 26: 140. Cruciani G, Milletti F, Storchi L et al (2009)
352–357 In silico prediction and ADME profiling.
128. Wei X-Y, Ge Z-G, Wang Z-Y et al (2007) Chem Biodivers 6:1812–1821
Estimation of aqueous solubility (lgS(w)) 141. Parthasarathi R, Padmanabhan J, Elango M
of all polychlorinated biphenyl (PCB) conge- et al (2006) pKa prediction using group phi-
ners by density function theory and position licity. J Phys Chem A 110:6540–6544
of Cl substitution (N-PCS) method. Chinese 142. Tsantili-Kakoulidou A, Panderi I, Csizmadia
J Struct Chem 26:519–528 F et al (1997) Prediction of distribution

coefficient from structure 2. Validation of 156. Katritzky AR, Maran U, Karelson M et al
PrologD, an expert system. J Pharm Sci 86: (1997) Prediction of melting points for the
1173–1179 substituted benzenes. J Chem Inf Comput Sci
143. Hilal SH, Karickhoff SW, Carreira LA (1995) 37:913–919
A rigorous test for SPARC’s chemical reactiv- 157. Abramowitz R, Yalkowsky SH (1990) Esti-
ity models: estimation of more than 4300 mation of aqueous solubility and melting
ionisation pKa’s. Quant Struct Act Relat 14: point of PCB congeners. Chemosphere
348–355 21:1221–1229
144. Lee PH, Ayyampalayam SN, Carreira LA et al 158. Tsakanikas PD, Yalkowsky SH (1988) Esti-
(2007) In silico prediction of ionization con- mation of melting point of flexible molecules:
stants of drugs. Mol Pharm 4:498–512 aliphatic hydrocarbons. Toxicol Environ
145. Kühne R, Ebert R-U, Schüürmann G (2006)
Model selection based on structural similar- 159. Abramowitz R, Yalkowsky SH (1990) Melt-
ity—method description and application to ing point, boiling point and symmetry. Pharm
water solubility prediction. J Chem Inf Res 7:942–947
Model 46:636–641 160. Zhao L, Yalkowsky SH (1999) A combined
146. Dearden JC, Cronin MTD, Lappin DC group contribution and molecular geometry
(2007) A comparison of commercially avail- approach for predicting melting points of ali-
able software for the prediction of pKa. phatic compounds. Ind Eng Chem Res
J Pharm Pharmacol 59(suppl 1):A-7 38:3581–3584
147. Liao C, Nicklaus MC (2009) Comparison of 161. Todeschini R, Vighi M, Finizio A et al (1997)
nine programs predicting pKa values of phar- 3-D modelling and prediction by WHIM
maceutical substances. J Chem Inf Model descriptors. Part 8. Toxicity and physico-
49:2801–2812 chemical properties of environmental priority
148. Meloun M, Bordovská S (2007) Benchmarking chemicals by 2D-TI and 3D-WHIM descrip-
and validating algorithms that estimate pKa tors. SAR QSAR Environ Res 7:173–193
values of drugs based on their molecular struc- 162. Bergström CAS, Norinder U, Luthman K
ture. Anal Bioanal Chem 389:1267–1281 et al (2003) Molecular descriptors influencing
149. Balogh GT, Gyarmati B, Nagy B et al (2009) melting point and their role in classification of
Comparative evaluation of in silico pKa pre- solid drugs. J Chem Inf Comput Sci 43:
diction tools on the Gold Standard dataset. 1177–1185
QSAR Comb Sci 28:1148–1155 163. Modarresi H, Dearden JC, Modarress H
150. Manchester J, Walkup G, Rivin O et al (2010) (2006) QSPR correlation of melting point
Evaluation of pKa estimation methods on 211 for drug compounds based on different
druglike compounds. J Chem Inf Model sources of molecular descriptors. J Chem Inf
50:565–571 Model 46:930–936
151. Dearden JC (1991) The QSAR prediction of 164. Godavarthy SS, Robinson RL, Gasem KAM
melting point, a property of environmental (2006) An improved structure–property
relevance. Sci Total Environ 109(110):59–68 model for predicting melting-point tempera-
152. Horvath AL (1992) Molecular design: chem- tures. Ind Eng Chem Res 45:5117–5126
ical structure generation from the properties 165. Karthikeyan M, Glen RC, Bender A (2005)
of pure organic compounds. Elsevier, Amster- General melting point prediction based on a
dam diverse compound data set and artificial neu-
153. Dearden JC (1999) The prediction of melting ral networks. J Chem Inf Model 45:581–590
point. In: Charton M, Charton B (eds) 166. Joback KG, Reid RC (1987) Estimation of
Advances in quantitative structure–property pure-component properties from group con-
relationships, vol 2. JAI Press, Stamford, CT, tributions. Chem Eng Commun 57:233–243
pp 127–175 167. Simamora P, Yalkowsky SH (1994) Group
154. Tesconi M, Yalkowsky SH (2000) Melting contribution methods for predicting the
point. In: Boethling RS, Mackay D (eds) melting points and boiling points of aromatic
Handbook of property estimation methods compounds. Ind Eng Chem Res 33:
for chemicals. Lewis, Boca Raton, FL, pp 1405–1409
3–27 168. Constantinou L, Gani R (1994) New group
155. Mills EJ (1884) On melting point and boiling contribution method for estimating proper-
point as related to composition. Phil Mag ties of pure compounds. Am Inst Chem Eng
17:173–187 J 40:1697–1710

169. Marrero J, Gani R (2001) Group-contribution 181. Hall LH, Story CT (1996) Boiling point and
based estimation of pure component proper- critical temperature of a heterogeneous data
ties. Fluid Phase Equil 183–184:183–208 set. QSAR with atom type electrotopological
170. Tu C-H, Wu Y-S (1996) Group-contribution state indices using artificial neural networks.
estimation of normal freezing points of J Chem Inf Comput Sci 36:1004–1014
organic compounds. J Chin Inst Chem Eng 182. Kier LB, Hall LH (1999) Molecular structure
27:323–328 description: the electrotopological state. Aca-
171. Gold PI, Ogle GJ (1969) Estimating thermo- demic, San Diego, CA
physical properties of liquids. Part 4—Boil- 183. Stein SE, Brown RL (1994) Estimation of
ing, freezing and triple-point temperatures. normal boiling points from group contribu-
Chem Eng 76:119–122 tions. J Chem Inf Comput Sci 34:581–587
172. Lyman WJ (2000) Boiling point. In: Boethl- 184. Labute P (2000) A widely applicable set of
ing RS, Mackay D (eds) Handbook of prop- descriptors. J Mol Graph Model 18:464–477
erty estimation methods for chemicals: 185. Ericksen D, Wilding WV, Oscarson JL et al
environmental and health sciences. Lewis, (2002) Use of the DIPPR database for devel-
Boca Raton, FL, pp 29–51 opment of QSPR correlations: normal boiling
173. Banks WH (1939) Considerations of a vapour point. J Chem Eng Data 47:1293–1302
pressure-temperature equation, and their 186. Grain CF (1990) Vapor pressure. In: Lyman
relation to Burnop’s boiling point function. WJ, Reehl WF, Rosenblatt DH (eds) Hand-
J Chem Soc 292–295 book of chemical property estimation meth-
174. Rechsteiner CE (1990) Boiling point. In: ods. American Chemical Society, Washington,
Lyman WJ, Reehl WF, Rosenblatt DH (eds) DC, pp 14.1–14.20
Handbook of chemical property estimations 187. Delle Site A (1996) The vapor pressure of
methods. American Chemical Society, environmentally significant organic chemi-
Washington, DC, pp 12.1–12.55 cals: a review of methods and data at ambient
175. Ivanciuc O, Ivanciuc T, Cabrol-Bass D et al temperature. J Phys Chem Ref Data 26:
(2000) Evaluation in quantitative structure– 157–193
property relationship models of structural 188. Sage ML, Sage GW (2000) Vapor pressure.
descriptors derived from information-theory In: Boethling RS, Mackay D (eds) Handbook
operators. J Chem Inf Comput Sci 40: of property estimation methods for chemicals:
631–643 environmental and health sciences. Lewis,
176. Gironés X, Amat L, Robert D et al (2000) Boca Raton, FL, pp 53–65
Use of electron–electron repulsion energy as a 189. Katritzky AR, Slavov SH, Dobchev DA et al
molecular descriptor in QSAR and QSPR (2007) Rapid QSPR model development
studies. J Comput Aided Mol Des 14: technique for prediction of vapor pressure of
477–485 organic compounds. Comput Chem Eng
177. Katritzky AR, Mu L, Lobanov VS et al (1996) 31:1123–1130
Correlation of boiling points with molecular 190. Liang CK, Gallagher DA (1998) QSPR pre-
structure. 1. A training of 298 diverse organ- diction of vapor pressure from solely
ics and a test set of 9 simple inorganics. J Phys theoretically-derived descriptors. J Chem Inf
Chem 100:10400–10407 Comput Sci 38:321–324
178. Sola D, Ferri A, Banchero M et al (2008) 191. Tu C-H (1994) Group-contribution method
QSPR prediction of N-boiling point and crit- for the estimation of vapor pressures. Fluid
ical properties of organic compounds and Phase Equil 99:105–120
comparison with a group-contribution 192. Öberg T, Liu T (2008) Global and local PLS
method. Fluid Phase Equil 263:33–42 regression models to predict vapor pressure.
179. Wessel MD, Jurs PC (1995) Prediction of QSAR Comb Sci 27:273–279
normal boiling points for a diverse set of 193. Basak SC, Mills D (2009) Predicting the
industrially important organic compounds vapour pressure of chemicals from structure:
from molecular structure. J Chem Inf Com- a comparison of graph theoretic versus quan-
put Sci 35:841–850 tum chemical descriptors. SAR QSAR Envi-
180. Basak SC, Mills D (2001) Use of mathemati- ron Res 20:119–132
cal structural invariants in the development of 194. Goll ES, Jurs PC (1999) Prediction of vapor
QSPR models. Commun Math Comput pressures of hydrocarbons and halohydrocar-
Chem 44:15–30 bons from molecular structure with a

computational neural network model. J 206. Nirmalakhandan NN, Speece RE (1988)
Chem Inf Comput Sci 39:1081–1089 QSAR model for predicting Henry’s con-
195. Staikova M, Wania F, Donaldson DJ (2004) stant. Environ Sci Technol 22:1349–1357
Molecular polarizability as single-parameter 207. Russell CJ, Dixon SL, Jurs PC (1992)
predictor of vapor pressures and octanol-air Computer-assisted study of the relationship
partitioning coefficients of nonpolar com- between molecular structure and Henry’s
pounds: a priori approach and results. Atmos law constant. Anal Chem 64:1350–1355
Environ 38:213–225 208. Modarresi H, Modarress H, Dearden JC
196. Andreev NN, Kuznetsov SE, Storozhenko SY (2007) QSPR model of Henry’s law constant
(1994) Prediction of vapour pressure and for a diverse set of organic chemicals based on
boiling points of aliphatic compounds. Men- genetic algorithm-radial basis function net-
deleev Commun 173–174 work approach. Chemosphere 66:2067–2076
197. Kühne R, Ebert R-U, Schüürmann G (1997)
Estimation of vapour pressures for hydrocar- GS et al (1994) Hydrogen bonding. Part 34.
bons and halogenated hydrocarbons from The factors that influence the solubility of
chemical structure by a neural network. Che- gases and vapours in water at 298 K, and a
mosphere 34:671–686 new method for its determination. J Chem
198. Yaffe D, Cohen Y (2001) Neural network Soc Perkin Trans 2:1777–1791
based temperature-dependent quantitative 210. Yaffe D, Cohen Y, Espinosa G et al (2003) A
structure property relationships (QSPRs) for fuzzy ARTMAP-based quantitative struc-
predicting vapor pressure of hydrocarbons. ture–property relationship (QSPR) for the
J Chem Inf Comput Sci 41:463–477 Henry’s law constant of organic compounds.
199. Godavarthy SS, Robinson RL, Gasem KAM J Chem Inf Comput Sci 43:85–112
(2006) SVRC-QSPR model for predicting 211. Gharagheizi F, Abbasi R, Tirandazi B (2010)
saturated vapor pressure of pure fluids. Fluid Prediction of Henry’s law constant of organic
Phase Equil 246:39–51 compounds in water from a new group-con-
200. Schüürmann G, Rothenbacher C (1992)
tribution-based model. Ind Eng Chem Res
Evaluation of estimation methods for the 49:10149–10152
air–water partition coefficient. Fresenius 212. Katritzky AR, Mu L, Karelson M (1996) A
Environ Bull 1:10–15 QSPR study of the solubility of gases and
201. Mackay D, Shiu WY, Ma KC (2000) Henry’s vapors in water. J Chem Inf Comput Sci 36:
law constant. In: Boethling RS, Mackay D 1162–1168
(eds) Handbook of property estimation 213. Walker JD, Jaworska J, Comber MHI et al
methods for chemicals: environmental and (2003) Guidelines for developing and using
health sciences. Lewis, Boca Raton, FL, pp quantitative structure–activity relationships.
69–87 Environ Toxicol Chem 22:1653–1665
202. Dearden JC, Cronin MTD, Ahmed SA et al 214. Dearden JC, Cronin MTD (2006) Quantita-
(2000) QSPR prediction of Henry’s law con- tive structure–activity relationships (QSAR)
stant: improved correlation with new para- in drug design. In: Smith HJ (ed) Introduc-
meters. In: Gundertofte K, Jørgensen FS tion to the principles of drug design and
(eds) Molecular modeling and prediction of action, 4th edn. Taylor & Francis, Boca
bioactivity. Kluwer Academic/Plenum, New Raton, FL, pp 185–209
York, NY, pp 273–274 215. Madden JC (2010) Introduction to QSAR
203. Hine J, Mookerjee PK (1974) The intrinsic and other in silico methods to predict toxicity.
hydrophilic character of organic compounds. In: Cronin MTD, Madden JC (eds) In silico
Correlations in terms of structural contribu- toxicology: principles and applications. RSC
tions. J Org Chem 40:292–298 Publishing, Cambridge, pp 11–30
204. Cabani S, Gianni P, Mollica V et al (1981) 216. OECD Principles: www.oecd.org/dataoecd/
Group contributions to the thermodynamic 33/37/37849783.pdf
properties of non-ionic organic solutes in dilute 217. OECD Guidelines: www.olis.oecd.org/olis/
aqueous solution. J Solut Chem 10:563–595 2004doc.nsf/LinkTo/NT00009192/
205. Meylan WM, Howard PH (1991) Bond con- $FILE/JT00176183.PDF
tribution method for estimating Henry’s law 218. Dearden JC, Cronin MTD, Kaiser KLE
constants. Environ Toxicol Chem 10: (2009) How not to develop a quantitative
1283–1293

structure–activity or structure–property rela- multiple regression analysis. J Med Chem
tionship (QSAR/QSPR). SAR QSAR Envi- 15:1066–1068
ron Res 20:241–266 231. Romanelli GP, Cafferata LFR, Castro EA
219. Ghafourian T, Dearden JC (2000) The use of (2000) An improved QSAR study of toxicity
atomic charges and orbital energies as hydro- of saturated alcohols. J Mol Struct Theochem
gen-bonding-donor parameters for QSAR 504:261–265
studies: comparison of MNDO, AM1 and 232. Yaffe D, Cohen Y, Espinosa G et al (2001) A
PM3 methods. J Pharm Pharmacol 52: fuzzy ARTMAP based on quantitative struc-
603–610 ture–property relationships (QSPRs) for pre-
220. Hartung T, Bremer S, Casati S et al (2004) A dicting aqueous solubility of organic
modular approach to the ECVAM principles compounds. J Chem Inf Comput Sci
on test validity. ATLA 32:467–472 41:1177–1207
221. Netzeva TI, Worth A, Aldenberg T et al 233. Aptula AO, Jeliazkova NG, Schultz TW et al
(2005) Current status of methods for defin- (2005) The better predictive model: high q2
ing the applicability domain of (quantitative) for the training set or low root mean square
structure–activity relationships. The report error of prediction for the test set? QSAR
and recommendations of ECVAM Workshop Comb Sci 24:385–396
52. ATLA 33:155–173 234. Erös D, Kéri G, Kövesdi I et al (2004) Com-
222. Hewitt M, Ellison CM (2010) Developing parison of predictive ability of water solubility
the applicability domain of in silico models: QSPR models generated by MLR, PLS and
relevance, importance and methods. In: Cro- ANN methods. Mini Rev Med Chem
nin MTD, Madden JC (eds) In silico toxicol- 4:167–177
ogy: principles and applications. RSC 235. Devillers J, Doré JC (2002) e-Statistics for
Publishing, Cambridge, pp 301–333 deriving QSAR models. SAR QSAR Environ
223. Flynn GL (1990) Physicochemical determi- Res 13:409–416
nants of skin absorption. In: Gerrity TR, 236. Scripps Institute: www.scripps.edu/rc/soft-
Henry CJ (eds) Principles of route-to-route waredocs/msi/cerius45/qsar/working_with_
extrapolation for risk assessment. Elsevier, stats.html
Amsterdam, pp 93–127 237. QSAR World: www.qsarworld.com/statistics.
224. Young D, Martin T, Venkatapathy R et al php
(2008) Are the chemical structures in your 238. Golbraikh A, Tropsha A (2002) Predictive
QSAR correct? QSAR Comb Sci 27: QSAR modeling based on diversity sampling
1337–1345 of experimental datasets for the training and
225. Cronin MTD, Dearden JC, Moss GP et al test set selection. J Comput Aided Mol Des
(1999) Investigation of the mechanism of 16:357–369
flux across human skin in vitro by quantitative 239. Eriksson L, Johansson E, M€ uller M et al
structure–permeability relationships. Eur J (2000) On the selection of the training set
Pharm Sci 7:3250330 in environmental QSAR analysis when
226. Hewitt M, Madden JC, Rowe PH, Cronin compounds are clustered. J Chemom 14:
MTD (2007) Structure-based modelling in 599–616
reproductive toxicology: (Q)SARs for the pla- 240. Hemmateenajad B, Javadnia K, Elyasi M
cental barrier. SAR QSAR Environ Res 18: (2007) Quantitative structure-retention rela-
57–76 tionship for the Kovats retention indices of a
227. Doniger S, Hofmann T, Yeh J (2002) Predict- large set of terpenes: a combined data
ing CNS permeability of drug molecules: splitting-feature selection strategy. Anal
comparison of neural network and support Chim Acta 592:72–81
vector machine algorithms. J Comput Biol 241. Cronin MTD (2010) Characterisation, eval-
9:849–864 uation and possible validation of in silico
228. IUPAC InChI code: www.iupac.orgt.inchi models for toxicity: determining if a predic-
229. Gedeck P, Rohde B, Bartels C (2006) tion is valid. In: Cronin MTD, Madden JC
QSAR—how good is it in practice? Compari- (eds) In silico toxicology: principles and
son of descriptor sets on an unbiased cross applications. RSC Publishing, Cambridge,
section of corporate data sets. J Chem Inf pp 275–300
Model 46:1924–1936 242. Tropsha A, Gramatica P, Gombar VK (2003)
230. Topliss JG, Costello RJ (1972) Chance corre- The importance of being earnest: validation is
lations in structure–activity studies using the absolute essential for successful
138 J.C. Dearden

application and interpretation of QSPR mod- 245. Johnson SR (2008) The trouble with QSAR
els. QSAR Comb Sci 22:69–77 (or how I learned to stop worrying and
243. Benigni R, Bossa C (2008) Predictivity of embrace fallacy). J Chem Inf Model 48:25–26
QSAR. J Chem Inf Model 48:971–980 246. Katritzky AR, Kuanar M, Slavov S et al (2010)
244. Dearden JC, Hewitt M, Geronikaki AA et al Quantitative correlation of physical and
(2009) QSAR investigation of new cogni- chemical properties with chemical structure:
tion enhancers. QSAR Comb Sci 28: utility for prediction. Chem Rev
1123–1129 110:5714–5789
Chapter 7

Informing Mechanistic Toxicology with Computational


Molecular Models
Michael R. Goldsmith, Shane D. Peterson, Daniel T. Chang,
Thomas R. Transue, Rogelio Tornero-Velez, Yu-Mei Tan,
and Curtis C. Dary

Abstract
Computational molecular models of chemicals interacting with biomolecular targets provide toxicologists with a valuable, affordable, and sustainable source of in silico molecular-level information that augments,
enriches, and complements in vitro and in vivo efforts. From a molecular biophysical ansatz, we describe
how 3D molecular modeling methods used to numerically evaluate the classical pair-wise potential at the
chemical/biological interface can inform mechanism of action and the dose–response paradigm of modern
toxicology. With an emphasis on molecular docking, 3D-QSAR and pharmacophore/toxicophore
approaches, we demonstrate how these methods can be integrated with chemoinformatic and toxicoge-
nomic efforts into a tiered computational toxicology workflow. We describe generalized protocols in which
3D computational molecular modeling is used to enhance our ability to predict and model the most
relevant toxicokinetic, metabolic, and molecular toxicological endpoints, thereby accelerating the compu-
tational toxicology-driven basis of modern risk assessment while providing a starting point for rational
sustainable molecular design.

Key words: Docking, Molecular model, Virtual ligand screening, Virtual screening, Enrichment,
Toxicity, Toxicoinformatics, Discovery, Prediction, 3D-QSAR, Toxicophore, Toxicant, In silico,
Pharmacophore

Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_7, # Springer Science+Business Media, LLC 2012

1. Introduction

1.1. Overview of Molecular Modeling and Its Role in Computational Toxicology: Filling the Data Gaps in Mechanistic Toxicology

Modern computational molecular modeling methods are some of the most well-established, versatile, and vital computational chemistry methods that are at the very core of the emerging field of both mechanistic (1) and computational toxicology and sustainable molecular design.1 The use of molecular modeling coupled to mathematical and chemical–biological inquiry is crucial "to better understand the mechanisms through which a given chemical induces harm and, ultimately, to be able to predict adverse effects of the toxicants on human health and/or the environment" (2).
A first step in considering the use of three-dimensional (3D) computer-assisted molecular modeling (CAMM) methods is an awareness of the molecular-level questions one can address with the various techniques. Molecular modeling can be used in the context of toxicological inquiry to address three molecular-level aspects of individual small molecules (ligands), biological macromolecules (targets), and the resultant interactions of the ligand/target complex, namely (1) structure, (2) properties, and (3) (re)activity.
In the context of toxicological and chemical genomic research (or
toxicogenomics), one is interested in or requires downstream infor-
mation that makes use of “optimized” structures or geometries of
ligands or biological targets. Of the properties one may be inter-
ested in, molecular complementarity is a key objective along with
catalytic competence of a chemical and possibly molecular suscepti-
bility (or reactivity of a molecule). Similarly, there are two main
research efforts one wishes to inform in mechanistic toxicology,
namely
1. Toxicokinetics or ADME (the rate and fate of chemicals within the body).
2. Toxicodynamics or molecular toxicological interactions that
result in a cellular response.
By considering two principal research paths and the biological
macromolecular target space to which these coupled processes are
related (Table 1) it becomes evident that the subset of molecular
modeling tools that will be used by a toxicologist is not much
different than the in silico drug discovery workflows (3), with the
exception that there is less of an emphasis on lead optimization and
more of an incentive on modeling approaches that possess an ability
to both accurately and efficiently prioritize and categorize chemi-
cals to their respective macromolecular targets; in silico methods

1
Historically molecular modeling methods, stemming from roots in theoretical and computational chemistry, are
composed of an ensemble of developed and thoroughly vetted computational approaches used to investigate
molecular-level processes and phenomena including but not limited to molecular structure, chemical catalysis,
geochemistry, interfacial chemistry, nanotechnology, conformational analysis, stereoselectivity, enzyme biochem-
istry, chemical reaction dynamics, solvation, molecular aggregation, and molecular design.
Table 1
Overlap between macroscopic and mechanistic toxicology: toxicology research streams ("toxico-" kinetics/dynamics), specific toxicology-related processes (ADME/T), examples of targets for which pair-wise ligand/target interactions are most often sought, and the three-dimensional computational molecular modeling (3D-CAMM) methods elaborated in this text that are used to inform the toxicological questions

(I) TOXICOKINETICS (biological fate or disposition models of chemicals):
- (A)bsorption (i.e., dermal, oral, inhalation): ion channels (PgP), molecular transporters, cell membranes (lipid bilayers considered for passive properties)
- (D)istribution (i.e., to target tissue or target organ): extracellular protein binding (e.g., human serum albumin, alpha-fetoprotein, or immunoglobulin binding)
- (M)etabolism (enzyme-mediated chemical transformations typically associated with hepatic clearance mechanisms): intracellular solute carrier proteins (e.g., SHBG or FABP; inhibitor or substrate binding properties); Phase I/II enzymes (i.e., CYP450s, oxidoreductases, carboxylesterases). *For kinetic properties, such as rate constants, consider a QM formalism (see * in methods as well)
- (E)limination/excretion (renal or biliary elimination processes): ion channels, organic molecular ionic transporters, globulins, active transporters

(II) TOXICODYNAMICS (response or effect models):
- Molecular (T)oxicology (ligand/receptor pathways): nuclear receptors, G-protein-coupled receptors, ion channels (e.g., for neuronal impulse propagation or cardiac charge regulation; examples are sodium-gated ion channels or the HERG2 channel)

Molecular modeling methods:
- (A) Geometry optimization; (B) partial charge calculation/assignment
- If target-specific endpoint data are available: (C) pharmacophore modeling and (D) 3D-QSAR
- If the target structure is available: (E) target geometry optimization and/or homology modeling; (F) a priori small-molecule/target interaction evaluation by molecular docking; (G) molecular mechanics or empirical pose scoring; structure-based virtual ligand screening (SB-VLS)
- . . . + SB-VLS + SB-VLS + SB-VLS . . . = in silico chemical genomics
that are complementary to modern experimental toxicogenomic


inquiry.
It is estimated that there exist on the order of 7,000,000 chemical leads for small-molecule drug discovery and ~80,000–100,000 environmental chemicals for which the data matrix for risk is sparsely populated, and so there is a need for large-scale screening efforts for the prioritization and categorization of these large inventories
(1, 3–6). Due to the scale of chemical inventories of interest and
variety of toxicologically implicated targets of interest, the most
appropriate starting point for 3D-CAMM most frequently applied
to toxicology (pharmacophores, 3D-QSAR and molecular dock-
ing) is the use of molecular mechanics force fields to describe or
determine the 3D structure of a chemical/biological molecular
system of interest (7–13). In this approach both ligands and
biological macromolecular targets are mathematically described
and modeled by applying classical Newtonian mechanics to atomic
(not electronic) systems which in turn are numerically evaluated
using modern computational implementations of the underlying
biophysical models. We stress the importance of delineating the
fundamental choice of molecular mechanics as opposed to quan-
tum mechanical approaches for answering questions typical of
chemical/biological perturbations due to the size domain, and
information criteria of the part of mechanistic toxicology one
most often wants to inform in the computational toxicology frame-
work; the pair-wise interaction potential between ligand and mac-
romolecular target. To better understand the difference between
2D/3D molecular modeling methods as applied to computational
toxicology, we present a symbolic graphic (Fig. 1) outlining the
three main 3D molecular modeling techniques used in this chapter.
Although intrinsic property or functional group chemical filters
(e.g., Leadscope) in addition to classical QSAR approaches (14)
and decision tree classifiers (15) are pragmatic and parsimonious components of the chemoinformatic toolkit of computational toxicologists, they lack the intimate molecular-level detail of the
biomolecular interaction that could only be resolved by 3D-
CAMM. Often chemoinformatics methods alone are unsuitable to address structurally related questions that require target-specific insight: for instance, resolving stereoisomerism and its implications in biomolecular interactions, species-related differences in sequences, polymorphism-related extrapolations in susceptible populations, and structural bases for mechanistic variability (inhibition versus substrate, agonist versus antagonist). For such cases there is little question that 3D modeling methods such as (I) pharmacophore mapping, (II) 3D-QSAR, and (III) molecular docking, which necessitate detailed structural information (i.e., Cartesian coordinates of atoms and their specific connectivity), are the only viable alternatives for reliable a priori estimates for risk assessment.

Fig. 1. Point of departure from (a) 1D chemical SMILES notation to a 3D representation, with atom types and specific coordinates spatially defined. (d–f) The three major classes of molecular modeling methods used to evaluate ligand/target interactions.

1.2. Exploring Ligand–Target Interactions Implicitly: 3D-QSAR and Pharmacophores

Unlike specific models of both biological macromolecules and small-molecule ligands, both 3D-QSAR and pharmacophore methods address the fundamental chemical/biological aspects of pair-wise interactions implicitly. Although both deal with the explicit
(i.e., full 3D) structure of a chemical of interest, and both require a training/test set of chemicals with known activities for a given target and mode of action (i.e., agonist or antagonist, substrate or inhibitor), neither pharmacophore nor 3D-QSAR approaches can provide specific molecular-level detail about which atoms of the macromolecule, coupled to those of the ligand, give rise to said activity. Pair-wise interactions between
the ligand and the target molecule must be spatially defined. None-
theless, both methods are a step in the right direction from tradi-
tional 2D-QSAR since inherently both 3D-QSAR and
pharmacophore models have the ability to discriminate activity
based on three-dimensional topology (i.e., inform stereochemical


interactions or regiospecific interactions) without identifying the residue-specific contacts that give rise to the specific interaction.
According to IUPAC, a pharmacophore (or in the case of
toxicology, a toxicophore) is “an ensemble of steric and electronic
features that is necessary to ensure the optimal supramolecular
interactions with a specific biological target and to trigger (or
block) its biological response” (16). In this sense a pharmacophore
model’s objective is to characterize a molecule’s atomic constitu-
ents in terms of the primary interaction types that give rise to pair-
wise interactions; from multiple atoms to a subset of binding
“features.” The features most common for a set of known chemical
actors on a known biological target of undefined tertiary structure
are hydrophobic, aromatic, hydrogen bond acceptor, hydrogen
bond donor, and cationic, anionic, or metal interactions. Further-
more there may be exclusion volumes and feature directionality
included in the pharmacophore. The pharmacophore features are
elucidated by comparing multiple known actors in terms of common overlaid structural features (or alignments). Next, to investigate a series of chemicals and test the pharmacophore, one would sample conformational space; any structure containing a conformation that satisfies the spatial and feature requirements of the pharmacophore model would be considered a complete or partial "hit."
On the other hand, being able to assess which functional
groups or specific spatial features have the ability to modulate the
chemical/biological interactions in a quantitative sense is the area
of 3D-QSAR. Although there are cases of simple QSAR models
dating back to the late 1800s (17), 3D-QSAR is a much more
recent approach. While classical QSAR models are useful for rapidly predicting chemically induced effects based on physicochemical properties, their main weakness is that they do not account for three-dimensional molecular shape, a critical aspect of intermolecular interaction. Instead of relying on physicochemical properties as
molecular descriptors, 3D-QSAR interprets molecular shape using
interaction energies from force field calculations. The huge number
of individual interaction energies was historically difficult to corre-
late with biological activity and it was not until the advent of
computationally implemented Partial Least Squares (PLS) analysis
(18) that 3D-QSAR became technically feasible.
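The field idea can be illustrated with a toy example (not CoMFA itself; the probe parameters, grid spacing, and coordinates are arbitrary assumptions): a steric Lennard-Jones probe is evaluated on a lattice around an aligned molecule, producing the large block of interaction-energy descriptors that PLS then correlates with activity.

```python
import math

def lj_energy(r, epsilon=0.1, sigma=3.5):
    """12-6 Lennard-Jones steric probe/atom energy (toy parameters)."""
    r = max(r, 0.5)                    # clamp the singularity at atom centers
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def steric_field(atoms, grid_step=2.0, extent=4.0):
    """Evaluate the probe on a regular lattice around an aligned molecule.
    atoms: list of (x, y, z). The flattened field -- one energy per lattice
    point, computed identically for every molecule in the aligned series --
    is the descriptor block a PLS model would then correlate with activity."""
    ticks, t = [], -extent
    while t <= extent:
        ticks.append(t)
        t += grid_step
    return [sum(lj_energy(math.dist((gx, gy, gz), a)) for a in atoms)
            for gx in ticks for gy in ticks for gz in ticks]

# Toy diatomic "molecule" aligned at the origin
field = steric_field([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)])
print(len(field))   # → 125 lattice points, i.e., 125 steric descriptors
```

Even this coarse 5 x 5 x 5 grid yields 125 correlated descriptors per molecule, which is why PLS, rather than multiple regression, made the approach feasible.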
The first, and still the most widely used, 3D-QSAR method is known as Comparative Molecular Field Analysis (CoMFA) (19).
Other methods have since emerged, including Comparative Molec-
ular Similarity Indices Analysis (CoMSIA) (20), ALMOND (21),
three-dimensional QSAR (TDQ) (22), Catalyst (23), and Phase
(23), generally to either improve predictive performance or simplify
the model development process. The main drawbacks of 3D-QSAR are the time required and the difficulty of preparing the data set for model development.
Further details and steps required for both pharmacophore
elucidation/mapping and 3D-QSAR as applied to toxicology are
elaborated in Subheading 3.

1.3. Modeling Explicit Pair-Wise Interaction Potential of Ligand–Target: Molecular Mechanics, Empirical Scoring, and the Need for Structurally Informed Molecular Models

In the specific case of modeling ligand/target interactions for virtual ligand screening as applied to toxicology, certain methods for evaluating pair-wise interaction energy are too computationally expensive/intensive and scale poorly with system size; quantum mechanical (QM, sometimes referred to as quantum chemical or electronic structure theory) methods are highly accurate but not ideal (hence not pertinent), due to their computational demand, for almost all of the interaction partners and processes listed in Table 1, with perhaps the exception of the bond breaking/making processes inherent in metabolic reactions or irreversible binding.
Although the principal focus of in silico methods to estimate meta-
bolic rate constants have been quantum mechanical (24, 25), the
majority of pair-wise interactions a toxicologist will require are
related to ligand/macromolecular target pair-wise interactions,
composed of both bonded (ligand and target “self-energy”) and
nonbonded interactions between a small molecule and target
biological macromolecule (i.e., receptor or enzyme) for which all
structural optimization routines are adequate within a classical
physics formalism, or more specifically, within a molecular mechan-
ics (MM) framework in which the smallest unit of relevance are
atoms (not electrons as in the case of QM approaches).
The classical physics approach to modeling molecules requires
the assumptions of molecular mechanics which makes use of
atom-specific functions, or force fields parameters, that have been
developed by a variety of experimental or high-level theoretical
calculations (i.e., ab initio or semi-empirical QM). These are related
to atom-specific terms that describe all bonded and nonbonded
interactions (conformational energy as a function of dihedral angles, bond angles, and bond lengths; intramolecular electrostatic interactions; and van der Waals, or dispersion, forces) in Cartesian
space that are ultimately integrated over all space of the individual
molecule or ligand/biomolecule complex to estimate “intermolec-
ular” interaction energy. The “pair-wise interaction potential”
between a ligand and a macromolecular target is provided in a
simplified form in Fig. 2.
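A minimal sketch of such a pair-wise nonbonded evaluation, using toy charges and Lennard-Jones parameters rather than a real force field such as MMFF or AMBER, might look like:

```python
import math

COULOMB_K = 332.06  # converts e^2/angstrom to kcal/mol

def nonbonded_pair(q1, q2, r, epsilon=0.15, sigma=3.4):
    """Coulomb + 12-6 Lennard-Jones energy of one atom pair (kcal/mol)."""
    sr6 = (sigma / r) ** 6
    return COULOMB_K * q1 * q2 / r + 4.0 * epsilon * (sr6 * sr6 - sr6)

def interaction_energy(ligand, target):
    """Intermolecular part of the pair-wise potential: the double sum of
    nonbonded terms over ligand and target atoms. Each molecule is a list
    of (partial_charge, (x, y, z)); the intramolecular 'self' energies
    cancel for rigid partners and are omitted in this sketch."""
    return sum(nonbonded_pair(qa, qb, math.dist(pa, pb))
               for qa, pa in ligand for qb, pb in target)

# Toy antiparallel dipoles ~4 angstroms apart: a favorable (negative) energy
ligand = [(+0.3, (0.0, 0.0, 0.0)), (-0.3, (1.2, 0.0, 0.0))]
target = [(-0.4, (0.0, 0.0, 4.0)), (+0.4, (1.2, 0.0, 4.0))]
print(interaction_energy(ligand, target) < 0)   # → True
```

A production docking code evaluates essentially this double sum, plus solvation and conformational terms, over thousands of atoms and poses.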
It is well known that there is a relationship between free energy
of a reaction or complex formation and reaction rate (i.e., chemical
kinetic) variables. As provided in Eq. 2 of Fig. 2, if one has a
method for capturing interaction free energy of complex formation
(or association) of a ligand/target complex this thermodynamic
1. ΔE_L:T = Σ_L Σ_T (E_bonded + E_non-bonded + E_solvation)_L:T − Σ_L,T (E_bonded + E_non-bonded + E_solvation)_L,T

2. ΔE ≈ ΔG_association = ΔH_L:T − TΔS_L:T = −RT ln(K_A^L:T) = −RT ln(1/K_D^L:T)

3. K_D^L:T ≈ K_i

Fig. 2. Fundamental classical expressions evaluated in pair-wise interaction modeling between ligand (L) and macromolecular target (T), and the resulting affinity of the complex (L:T). (1) The first expression relates the energy of the molecular components as the difference of the complex's bonded/nonbonded atomic potential from the energy of the individual partners (L, T). (2) The approximation that the energy function is related to the free energy of the system, the thermodynamic representation in terms of enthalpy (H) and entropy (S), and the thermodynamic interpretation of transition-state theory and molecular driving forces for association (Ka). (3) Finally, the relationship between the complex affinity or dissociation constant and a toxicologist's metric, the inhibition constant (Cheng-Prusoff).

variable, ΔG, can be cast in terms of an equilibrium process via the expressions in lines 2 and 3, where the association constant of an L:T complex is Ka = [L:T]/([L][T]), Kd = 1/Ka, and Ki, the inhibition constant that competitive inhibition assays in vitro often quantify, is directly proportional to Kd (the dissociation constant) of the ligand with respect to a reference probe (26).
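Eq. 2 of Fig. 2 and the Cheng-Prusoff relation reduce to a few lines of arithmetic; the temperature and the example Kd below are illustrative assumptions:

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)

def binding_free_energy(kd_molar, temp_k=298.15):
    """Eq. 2 of Fig. 2: dG_association = -RT ln(1/Kd) = RT ln(Kd), kcal/mol."""
    return R * temp_k * math.log(kd_molar)

def ki_from_ic50(ic50_molar, substrate_molar, km_molar):
    """Cheng-Prusoff relation for competitive inhibition,
    Ki = IC50 / (1 + [S]/Km), connecting an assay IC50 to the Ki ~ Kd
    of Eq. 3."""
    return ic50_molar / (1.0 + substrate_molar / km_molar)

# A 1 nM dissociation constant corresponds to about -12.3 kcal/mol at 25 C
print(round(binding_free_energy(1e-9), 1))   # → -12.3
```

The roughly 1.4 kcal/mol per order of magnitude of Kd implied by RT ln(10) at room temperature is a useful rule of thumb when judging whether a scoring function's errors matter for ranking.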
In theory, it is tempting to believe that the free energy from scoring or force-field functions should directly correlate with the experimentally determined biological activity (Kd or Ki) of complex formation as evaluated by pair-wise interaction schemes; in practice, the problem is significantly more involved. The complexity of the problem and the inherent simplifications in molecular docking often result only in an ability to enrich a data set in question, such that "actives" (i.e., biologically active molecules, or "hits" for a target) considered above some threshold expectation value for binding are identified several orders of magnitude better than by a random guess. For screening, this is a reasonable expectation. Details of the various
steps for 3D molecular modeling are addressed briefly in Subhead-
ing 3, with focus on how to use these optimized structures for 3D
pharmacophore elucidation, 3D-QSAR, and molecular docking.
For more extensive methodological resources for any of the meth-
ods provided, we refer the reader to Table 3 which contains expan-
sions of the topics covered in this chapter. We strongly encourage familiarization with these tools through practice if one wants to apply these techniques to individual toxicological research efforts.
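The screening expectation discussed above is commonly quantified as an enrichment factor, the hit rate in the top-scored fraction of a deck relative to the random-guess baseline; a small sketch with a hypothetical deck:

```python
def enrichment_factor(ranked_labels, fraction=0.01):
    """EF at a given fraction of a score-ranked deck: the hit rate among
    the top-scored fraction divided by the hit rate of the whole deck
    (the random-guess baseline).
    ranked_labels: 0/1 activity labels, best score first."""
    n = len(ranked_labels)
    n_top = max(1, int(n * fraction))
    hits_top = sum(ranked_labels[:n_top])
    total_actives = sum(ranked_labels)
    if total_actives == 0:
        return 0.0
    # keep everything in integers until the final division
    return (hits_top * n) / (n_top * total_actives)

# Hypothetical screen: 1,000 compounds, 10 actives, 5 recovered in the top 1%
deck = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 985
print(enrichment_factor(deck, fraction=0.01))   # → 50.0
```

An EF of 50 at 1% means a docking-ranked short list is fifty times richer in actives than a randomly drawn one, which is the practical value of enrichment even when absolute affinities are unreliable.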
Fig. 3. Computational toxicology modeling workflow showcasing the in silico, in vitro, and in vivo integration of data and
models within an informatics framework.

1.4. The Use of Molecular Modeling in Computational Toxicology: The Integrated Modeling Workflow to In Silico Chemical Genomics

Although we have provided an overview of the most popular and useful aspects of 3D-CAMM that could be used to inform mechanistic toxicology, we need to understand how they fall into the computational toxicology framework. To know how and when these methods are applied in practice, and by whom, we have devised a workflow (Fig. 3) that highlights some of these components and how they may complement experimental High Throughput Screening protocols. The objective is to enrich the understanding of chemical/biological interactions through toxicogenomic inquiry. This is achieved by an in silico (filters → 2D-QSAR → pharmacophore → docking/3D-QSAR) tiered approach that is tightly coupled to experimental in vitro screening efforts (i.e., protein ligand binding assays, transient activation assays, gene expression profiling, cytotoxicity assays, etc.) to encode a
chemical-specific biological activity fingerprint or signature. This


conceivably can also be performed in silico using multiple target
screening and used as a metric for chemical/biological activity
comparisons (i.e., similarity based on multi-target virtual affinity
fingerprint as opposed to structure alone).
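One simple way to compare such multi-target affinity fingerprints is a vector similarity measure; the docking scores below are invented for illustration:

```python
import math

def cosine_similarity(a, b):
    """Similarity of two chemicals' multi-target affinity fingerprints,
    i.e., vectors of (virtual) binding scores across a panel of targets."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical docking scores (kcal/mol) against a four-target panel
chem_a = [-9.1, -5.2, -7.8, -4.0]
chem_b = [-8.7, -5.5, -7.2, -4.4]   # similar biological profile
chem_c = [-4.1, -9.0, -3.9, -8.5]   # inverted profile
print(cosine_similarity(chem_a, chem_b) > cosine_similarity(chem_a, chem_c))
```

Two chemicals with very different 2D structures can still have nearly parallel fingerprints, which is exactly the kind of biological-activity similarity that structure-only comparisons miss.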
All of the data from a tiered approach to virtual ligand screening could, and should, ultimately be captured within a database framework so that they can easily be recalled to fill molecular-level data gaps as they arise. We add that a database infrastructure that captures the resultant poses and pair-wise interaction energies (surrogates for affinity) holds value in being able to query molecular-level insight for an experimental chemical genomics screen.
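A minimal sketch of such a pose/score store, using Python's built-in sqlite3 (the table, column names, and example records are illustrative, not DockScreen's actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE docking_results (
    chemical_id TEXT,
    target_id   TEXT,
    pose_rank   INTEGER,
    score_kcal  REAL,    -- pair-wise interaction energy, surrogate for affinity
    pose_block  TEXT)    -- serialized 3D coordinates of the pose
""")
rows = [("CASRN-80-05-7", "ERalpha", 1, -8.9, "..."),
        ("CASRN-80-05-7", "ERalpha", 2, -7.4, "..."),
        ("CASRN-80-05-7", "PXR",     1, -6.1, "...")]
conn.executemany("INSERT INTO docking_results VALUES (?,?,?,?,?)", rows)
conn.commit()

# Recall the best-scoring pose per target for one chemical, as a
# data-gap query against the screen might
best = conn.execute("""SELECT target_id, MIN(score_kcal)
                       FROM docking_results
                       WHERE chemical_id = ?
                       GROUP BY target_id
                       ORDER BY target_id""",
                    ("CASRN-80-05-7",)).fetchall()
print(best)   # → [('ERalpha', -8.9), ('PXR', -6.1)]
conn.close()
```

Keeping poses and scores queryable per chemical/target pair is what turns a one-off docking run into a reusable in silico chemical genomics resource.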
We provide, in brief, a workflow that demonstrates how to pair or couple experimental, in silico, and 3D in silico methods and the various pipes of data that allow one to build a virtual ligand–target complex structural database. This type of strategy has been adopted to build our own in-house resource to support in silico toxicogenomic inquiry (DockScreen), which is explained in Subheading 7.4.

2. Materials

There are well over 350 independent packages (computational


codes) available for various aspects of the molecular modeling or Virtual Ligand Screening (3D-VLS) paradigm that capture the
various components required for 3D modeling of ligand/biomole-
cule interactions: all chemoinformatics and QSAR development,
docking, homology modeling, pharmacophore elucidation, chemi-
cal structure manipulation, structure building, refinement, optimi-
zation, and finally bioinformatics applications.
For the case of computational toxicology, the lead optimization
procedure/process typically associated with in silico drug discovery
or rational drug design and associated methods and coded imple-
mentations are essentially dropped (although they may persist for
sustainable molecular design). These packages run on many differ-
ent platforms including but not limited to Windows/PCs, SGI,
Mac, Linux (UNIX workstations), and some limited functionality
molecular modeling utilities are even available for handheld devices
and smartphones. For practical purposes, we have typically chosen one of several commercial suites with the following features.
1. Platform independence (works on heterogeneous network
architecture).
2. Token-key license structure (check out by user when required).
Table 2
A list of several comprehensive software/tool/data resource lists available on the WWW that provide access to various commercial and open-source software packages, in addition to open-access database resources

Individual 3D-CAMM lists and their uniform resource locators (URLs):
- Directory of In Silico Drug Design Tools (Swiss Institute of Bioinformatics): http://www.click2drug.org
- Universal Molecular Modeling List (NIH): http://cmm.info.nih.gov/modeling/universal_software.html
- Free computer tools in Structural Bioinformatics and Chemoinformatics: http://www.vls3d.com/links.html
- Computational Chemistry List, ltd. (CCL.NET) Software-Related Sites (note: these include "ALL" chemistry-related sites, above and beyond the scope of this paper): http://www.ccl.net/chemistry/links/software/index.shtml
- Virtual Library: Science: Chemistry: Chemistry Software Houses, from the University of Liverpool (UK): http://www.liv.ac.uk/Chemistry/Links/softwarecomp.html

3. Many independent molecular modeling methods, bioinformatics,


chemoinformatics and data mining methods combined.
4. Built-in functionality for scripting, piping data, and auto-
mated/macro-workflows.
5. Is well documented and has good active and passive support
networks (technical service and FAQ/scripting forums).
Publically available resources for the “nonexpert” or experts are
included in Table 2 and provide numerous links for a variety of
software packages, both commercial and open-source, in addition
to visualization tools and databases relevant for informing ligand/
target pair-wise interaction modeling.
From an application standpoint, the authors have required
both bioinformatics and chemoinformatics tools, structural data-
base capabilities, and the ability to perform geometry optimization
of structures, molecular docking, homology modeling of target
structures, conformational searches, pharmacophore elucidation,
and QSAR development. However, we have primarily used Chemi-
cal Computing Group’s Molecular Operating Environment
(MOE) (27) for all database manipulation, QSAR development,
library development, structural optimization, and descriptor calcu-
lation. Similarly, for ADME-related parameter estimation via QSAR
we use Schrodinger’s QikProp (28) which has been vetted against


various animal and human drug targets or ADME-related endpoints (i.e., logP(brain:blood), logK(HSA), number of metabolites, Caco-2 or MDCK permeability, etc.).

3. Methods

As mentioned in the introduction, the classical physics approach to


modeling molecules requires the assumptions of molecular
mechanics which makes use of atom-specific functions, or force
field parameters, that have been developed often for specific classes
of molecules. Force field calculations have been successfully per-
formed on larger polypeptides and protein structures. Park and
Harris (24) utilized AMBER force fields to develop an all-atom
model for CYP2E1 which was subsequently used for docking stud-
ies. A comprehensive review of AMBER protein force field devel-
opment can be found elsewhere (7, 29). Several studies have also
assessed the relative performance of CHARMM, MMx, OPLS, and
AMBER force fields (30, 31). Gundertofte et al. (32) have assessed
the relative accuracies of MMx and AMBER force fields. Jorgensen et al. (33, 34) have also examined the performance of their OPLS
force field in the context of proteins and organic liquids. Regardless
of the framework details, a molecular mechanics force field is always
chosen for structural optimization, and the specific force field
selected is usually chosen that best captures the atom-type diversity
in the data set (i.e., chemical space of the training fragments or
atoms). A broad overview of all the stepwise modeling procedures is
provided in Fig. 4.
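To make the structural-optimization step concrete, here is a naive steepest-descent minimization of a single harmonic bond-stretch term; the force constant, equilibrium length, and step size are illustrative choices, not MMFF parameters:

```python
import math

def bond_energy_and_forces(p1, p2, k=300.0, r0=1.5):
    """Harmonic bond-stretch term E = k (r - r0)^2 -- one of the bonded
    contributions a molecular mechanics force field sums -- plus its
    analytic gradient, returned as forces on the two atoms."""
    r = math.dist(p1, p2)
    energy = k * (r - r0) ** 2
    dEdr = 2.0 * k * (r - r0)
    unit = [(a - b) / r for a, b in zip(p1, p2)]   # unit vector p2 -> p1
    f1 = [-dEdr * u for u in unit]                 # force on atom 1
    f2 = [dEdr * u for u in unit]                  # force on atom 2
    return energy, f1, f2

def minimize(p1, p2, step=1e-3, iters=500):
    """Naive steepest-descent geometry optimization of a diatomic: move
    each atom a small step along its force until the bond relaxes."""
    for _ in range(iters):
        _, f1, f2 = bond_energy_and_forces(p1, p2)
        p1 = [x + step * f for x, f in zip(p1, f1)]
        p2 = [x + step * f for x, f in zip(p2, f2)]
    return p1, p2

p1, p2 = minimize([0.0, 0.0, 0.0], [2.2, 0.0, 0.0])
print(round(math.dist(p1, p2), 3))   # → 1.5, the equilibrium bond length
```

A real optimizer sums many such terms (angles, torsions, nonbonded pairs) over thousands of atoms and uses better line searches, but the relax-along-the-gradient idea is the same.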
● For our molecular modeling needs (i.e., in the case of small-molecule ligands such as environmental chemicals), we have almost exclusively used the MMFFx force field (35) for 3D geometry optimizations of entire libraries of chemicals.
● Subsequently, partial atomic charges are assigned using either empirical (i.e., Gasteiger) or semiempirical charge-model representations of the electrostatics of the system (i.e., AM1-BCC) and are stored in a 3D chemical structural database.
• These structures can be used directly, with target-specific
activity information, as the seed for a conformational search
(spanning all rotatable bonds to predict other relevant geome-
tries) and aligned to other known biologically active chemicals
to generate 3D-QSAR or pharmacophore models.
• However, if the structure is known, the target protein sequence
is known, and a crystal structure or near-neighbor homolog
exists, it is conceivably simple to optimize hydrogens on the
crystal structure obtained from the literature, or to perform
theoretical site-directed mutagenesis or threading, the basis for
homology modeling from a known structural template.

7 Informing Mechanistic Toxicology with Computational Molecular Models
M.R. Goldsmith et al.

Fig. 4. Specific steps in 3D-CAMM (computer-assisted molecular modeling) workflows: chemical/biological knowledge-based boundary conditions (top box), ligand-based approaches (left box), structure-guided methods (central box), structural biological target inventories, and types of questions related to toxicologically relevant target–target extrapolations one can inform from structure-based approaches (dark gray boxes).
• Finally, with an optimized target structure database and an
optimized ligand database, one could perform molecular docking
experiments in which the pair-wise ligand:target interactions
(bonded and nonbonded terms) are systematically evaluated.
The resulting poses from such a docking "run" can each be
individually scored based on known binding affinity. There are
numerous online resources that provide ligand/target binding
affinity data (e.g., www.bindingdb.org).
• These rank-ordered lists of chemicals, based on scored docking
poses between a small molecule and a macromolecular target,
are the starting point for a prioritization or rank-ordering
scheme for screening a specific target: structure
(macromolecule)-based virtual ligand screening or structure-
guided chemical prioritization.
• A library of structural targets of interest, selected for example
on the basis of their roles in a major toxicity pathway a
researcher may be studying, has value in allowing one to "fish"
for targets of any chemical (36). The next section elaborates on
the capabilities of a large-scale ligand/target screening initiative.
For a detailed description of external methods, we encourage
readers to consult Table 3, which contains more detail on each
aspect of the various steps of molecular modeling. Next, we provide
a stepwise breakdown of the modeling steps required for evaluating
ligand/macromolecular target interactions.

3.1. Ligand Preparation

1. Collect a list of all the chemicals of interest. (Optional)
Augment the data set, e.g., by adding "simulated metabolites"
enumerated with a heuristic or knowledge-based metabolite
enumeration algorithm (37, 38).
2. Curate this list with the SMILES representation of each
structure of interest.
3. Convert this 2D chemical structure data set to a 3D representation
by assigning each chemical's absolute configuration (which
includes the atom types and their 3D connectivity, i.e., bond type
and orientation) and selecting an appropriate molecular mechanics
force field and charge model of interest, depending on the chemical
space of the chemicals of interest and on the magnitude of the
screening initiative (i.e., for hundreds to thousands of chemicals
one would be better off using no more than a classical physics
approximation of the molecular geometry).
4. Assign charges to the geometry-optimized structures.

Table 3
A comprehensive set of combined reviews and/or methods papers for various aspects of the 3D molecular modeling methods discussed in this chapter. We urge the reader to become familiar with each of the steps associated with their chosen modeling method and the particular toxicological data gaps they wish to address.

Molecular docking: Morris G, Lim-Wilby M (2008) Molecular docking. Methods in molecular biology, vol 443. Clifton, NJ, p 365

A general introduction to molecular modeling techniques in the area of protein–ligand interactions: (a) Kroemer R (2003) Molecular modelling probes: docking and scoring. Biochem Soc Trans 31:980–984; (b) Van Dijk A, Boelens R, Bonvin A (2005) Data driven docking for the study of biomolecular complexes. FEBS J 272(2):293–312

Docking scoring functions: Pick D (2004) Novel scoring methods in virtual ligand screening. Methods in molecular biology, vol 275, pp 439–448

Chemical database preparation: Bologa C, Olah M, Oprea T (2005) Chemical database preparation for compound acquisition or virtual screening. Methods in molecular biology, vol 316. Clifton, NJ, p 375

Target selection criteria: Wishart D (2008) Identifying putative drug targets and potential drug leads: starting points for virtual screening and docking. Methods in molecular biology, vol 443. Clifton, NJ, p 333

Virtual or in silico affinity fingerprints: Briem H, Lessel U (2000) In vitro and in silico affinity fingerprints: finding similarities beyond structural classes. Perspect Drug Discov Des 20(1):231–244

3D structure-based virtual ligand screening resources and brief overview: Villoutreix B et al (2007) Free resources to assist structure-based virtual ligand screening experiments. Curr Protein Pept Sci 8(4):381–411

In silico chemical genomics (target and ligand preparation): Jongejan A et al (2005) The role and application of in silico docking in chemical genomics research. Methods in molecular biology, vol 310. Clifton, NJ, p 63

Analysis of chemical space in the context of domain of applicability: Jaworska J, Nikolova-Jeliazkova N, Aldenberg T (2005) QSAR applicability domain estimation by projection of the training set in descriptor space: a review. Altern Lab Anim 33(5):445

Analysis of docking data: Bender A et al (2007) Chemogenomic data analysis: prediction of small-molecule targets and the advent of biological fingerprints. Comb Chem High Throughput Screen 10(8):719–731

5. Refine the data set (see Table 3) (39).

(a) Consider charge state and charge model, force field, and
domain of applicability (DOA), depending on the nature of
your chemicals.

6. Capture all 3D geometries into a database.

(a) Almost all major molecular modeling suites (e.g., Chemical
Computing Group's MOE (27), Accelrys Discovery Suite
(40), Schrodinger (28), and Tripos (41)) provide database
representations of the chemicals of interest, so converting
from SMILES to an optimized, cleaned, charge-assigned 3D
representation is relatively seamless.

(b) STOP.

3.2. Target Preparation

1. Coupled with experimental knowledge, searching chemical
genomics databases such as STITCH (http://stitch.embl.de) or
the Comparative Toxicogenomics Database (http://ctd.mdibl.org/)
often identifies relevant targets for a chemical or analog
of interest.
2. Finding a suitable target model (typically an X-ray crystal
structure from http://www.pdb.org) is the next step. Before
assuming that a given target structure will serve as a sound basis for
molecular modeling studies, it is critical to understand that
“protein structures” are models. Although they are based on
experimental data, they are nonetheless prone to bias or ambi-
guity from several sources. While numeric metrics such as
resolution, R-factor, free-R, redundancy, and average I/sigma
(signal to noise ratio) are important considerations for the
overall reliability of a crystal structure model, at least some
local errors or ambiguities are found in nearly all structures.
Active sites are often somewhat rigid, especially when bound to
ligands, so one can hope that the structure of interest is a sound
choice. However, there is usually no substitute for examining
electron density (see Subheading 7.5 and (42)).
3. How does one select the “appropriate structure” if confronted
with several? Perform an RMSD evaluation on a structural
superposition. If the geometries are similar they may cluster
into most probable conformation states. Select a representative
from each cluster.
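Step 3 can be sketched in a few lines of pure Python. This illustrative version assumes the candidate structures have already been superposed (so no rotational fit, e.g., a Kabsch alignment, is needed) and that coordinates are lists of (x, y, z) tuples in matching atom order:

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two already-superposed structures with equal atom counts."""
    sq = sum((a - b) ** 2
             for atom_a, atom_b in zip(coords_a, coords_b)
             for a, b in zip(atom_a, atom_b))
    return math.sqrt(sq / len(coords_a))

def cluster_structures(structures, cutoff=2.0):
    """Greedy clustering: each structure joins the first cluster whose
    representative (first member) lies within `cutoff` angstroms RMSD."""
    clusters = []
    for name, coords in structures.items():
        for members in clusters:
            if rmsd(coords, structures[members[0]]) < cutoff:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters  # clusters[i][0] can serve as that cluster's representative
```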
4. If (a) the target structure is known, (b) the sequence is known,
and (c) a crystal structure or near-neighbor homolog exists, it is
conceivably simple to optimize hydrogen atoms on the crystal
structure obtained from the literature or perform theoretical
site-directed mutagenesis or threading, the basis for homology
modeling from a known structural template.
5. If the target structure is not known and one wishes to perform
structure-based virtual screening or molecular docking, one
must build a homology model of the structure of interest
using the sequence of the desired target (from www.uniprot.
org) and a crystal structure template of the nearest-neighbor
homolog (template or crystal structures from www.pdb.org),
identified via a homology or sequence-identity search using BLAST.

A protein homology model server is available for integrated


web modeling at www.proteinmodelportal.org.

3.3. Molecular Docking

1. With an optimized target structure database and an optimized
ligand database, one can perform molecular docking experiments
in which the pair-wise interactions (bonded and nonbonded
terms) between the ligand and the macromolecular target are
systematically evaluated. The resulting poses from such a docking
"run" can each be individually scored against a training set of
chemicals with known binding affinities for a given target.
2. A binding site is identified (a co-crystallized ligand site or a
rationally selected site) and each ligand is allowed to interact with
the macromolecular target, with sampling and docking trajectories
subject to the force field approximations. Each individual "pose"
is scored or captured for subsequent analysis.
3. Each of the poses is systematically scored using pair-wise
interaction potentials that are either derived from classical physics
approaches (i.e., the force field approximation) or empirical scoring
functions optimized to reproduce experimental in vitro binding
affinities (trained scoring functions).
4. The results are subsequently validated for their ability to enrich
MOA data or rank-order chemical binding for a known target.
Another common validation protocol, which has less to do with
binding affinity and more with pose analysis, is the ability of the
docking algorithm to reproduce the original co-crystallized
ligand in the same geometry. Methods that minimize the RMSD
between the known pose and the docked pose are considered
optimal. This approach of reproducing experimental crystal
structures is termed "pose fidelity" ((43, 44) and references
therein).
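The rank-ordering validation in step 4 is commonly quantified as an enrichment factor: the rate at which known actives are recovered in the top-scoring fraction of the ranked list, relative to random selection. A minimal sketch, assuming lower (more negative) docking scores are better:

```python
def enrichment_factor(scores, known_actives, top_fraction=0.1):
    """scores: {chemical: docking_score}, lower is better.
    Returns (active hit-rate in the top fraction) / (hit-rate by chance)."""
    ranked = sorted(scores, key=scores.get)            # best scores first
    n_top = max(1, round(len(ranked) * top_fraction))
    hits = sum(1 for chem in ranked[:n_top] if chem in known_actives)
    return (hits / n_top) / (len(known_actives) / len(ranked))
```

An enrichment factor well above 1 is what makes a docking protocol useful for prioritization even when its absolute affinity estimates are unreliable.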
5. Docking "experiments" can form the basis of a numerically
continuous complementarity evaluation of ligand/target complexes
(unlike experiments, which rely on binding stronger than a
probe-chemical threshold or limit of detection and otherwise
return an "NA" or blank result). Since one can take the top and
bottom rank-ordered chemicals for a target and deduce
chemoinformatic filters (i.e., intrinsic functional property or
functional group profiling), one could conceivably perform what is
known as "progressive docking," where filters from molecular
docking simulations are used to create "structure-guided" filters
for subsequent chemicals (Progressive Docking: A Hybrid
QSAR/Docking Approach for Accelerating In Silico High
Throughput Screening) (45).
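The "progressive docking" idea in step 5 can be caricatured in a few lines: dock an initial batch, learn a cheap chemoinformatic filter from the top-ranked chemicals, and use it to pre-screen the remainder. The single descriptor used here (molecular weight) is a hypothetical stand-in for real property or functional-group profiles:

```python
def learn_mw_filter(scores, mol_weight, fraction=0.2):
    """Derive the molecular-weight window spanned by the top-scoring
    chemicals; returns a predicate usable as a cheap pre-docking filter."""
    ranked = sorted(scores, key=scores.get)        # best (lowest) scores first
    n = max(1, int(len(ranked) * fraction))
    top_mw = [mol_weight[chem] for chem in ranked[:n]]
    lo, hi = min(top_mw), max(top_mw)
    return lambda mw: lo <= mw <= hi
```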
6. Details about assumptions and expectations from structure-
based virtual ligand screening models, and the very nature of
the target structure used are enumerated in Subheading 7.5. It


is very important to be aware of the various issues that lead to
mismatched expectations (i.e., surprise) when attempting to
apply these 3D-CAMM approaches.

3.4. 3D-QSAR

This section is intended to provide a brief overview of 3D-QSAR,
outlining its advantages and disadvantages and describing the basic
steps required to derive and validate a model. For greater detail on
the topic, the reader is referred to external references (46–48).
Figure 5 outlines the basic steps involved with developing, validat-
ing, and using a 3D-QSAR model. Although not listed on the
figure, the first step in deriving a 3D-QSAR model is really defining
the applicability domain or the chemical space comprising the set of
compounds for which the model is to be used. A good way of
rapidly assessing the chemical space of two chemical lists is using
ChemGPS-NP (49, 50). When that has been defined, a representa-
tive sample of that compound list with known biological activity
must be chosen for further development. 3D-QSAR models have
been developed using data sets ranging from as few as 20–30
compounds to as many as 200–300. A large data set of compounds

Compound
selection

Bio-active Geometry Charge


conformation optimization calculation

Data set
No

Division

Sufficiently
Training set Test set
predictive?

3D-QSAR model
Yes

Internal External Use Model


validation –q2 validation –r2pred

Fig. 5. Workflow for deriving, validating, and using a 3D-QSAR model.


7 Informing Mechanistic Toxicology with Computational Molecular Models 157

will likely cover a larger chemical space, but considerations of the


cost of biological testing usually limit this size. Another consider-
ation in the initial stages of data set design is the overall span in
biological activity values: a span in activity values of 5 log units is
generally considered to be the minimum requirement.
1. After data set compounds have been carefully chosen, several
steps of molecular modeling are necessary to ensure
(a) They are in their biologically active conformation.
(b) Geometry optimization has been performed.
(c) Accurate partial atomic charges have been assigned.
2. Of these steps, identification of the biologically active confor-
mation is the most important and most difficult. Geometry
optimization and charge calculation methods generally have a
much lesser effect on model predictive performance.
(a) The data set is carefully divided into training and test sets. The
training set is a subset of the data set used for deriving the
3D-QSAR model and usually comprises roughly two-thirds
of the original data set. The test set is then composed of the
remaining one-third of compounds and is used for evalua-
tion of the predictive performance of the model. Care must
be taken to ensure the training and test sets have similar
coverage of chemical space as well as a similar span in
activity.
(b) Following data set division, the final software-dependent
steps in 3D-QSAR model derivation may be taken. Since
3D-QSAR correlates biological activity with differences in
structural features, these final steps generally involve align-
ing important pharmacophore groups or features of the
chemical scaffold to bring to light those structural features
that impact biological activity.
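The division in step (a) can be done very simply by sorting the data set by activity and sending every third compound to the test set, so that both sets span the full activity range. An illustrative sketch, with compounds as (name, activity) pairs (the offset of 2 is an arbitrary choice):

```python
def split_by_activity(compounds):
    """Sort by activity and assign indices 2, 5, 8, ... to the test set,
    giving roughly a 2/3 training : 1/3 test division."""
    ordered = sorted(compounds, key=lambda c: c[1])
    test = ordered[2::3]
    train = [c for i, c in enumerate(ordered) if i % 3 != 2]
    return train, test
```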
3. After the 3D-QSAR model has been derived it must be vali-
dated. The first step of the validation process normally includes
internal validation by cross-validation, which gives an indica-
tion of the strength of correlation within the training set com-
pounds. Although this is a useful metric, it gives little
indication of how well the model can predict activity data for
compounds not included in the training set. To understand
this, the model is used to predict the activities of test set
compounds. These predicted values are compared with the
previously known activity values to calculate r2pred, a measure
of external predictive performance.
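Definitions of the external metric vary between software packages; one common illustrative formulation compares the test-set prediction errors against the spread of the test-set activities around the training-set mean:

```python
def r2_pred(y_train, y_test, y_pred):
    """r2pred = 1 - PRESS / SD, where PRESS is the sum of squared test-set
    prediction errors and SD is the sum of squared deviations of the
    test-set activities from the mean training-set activity."""
    mean_train = sum(y_train) / len(y_train)
    press = sum((yt - yp) ** 2 for yt, yp in zip(y_test, y_pred))
    sd = sum((yt - mean_train) ** 2 for yt in y_test)
    return 1.0 - press / sd
```

Perfect predictions give r2pred = 1; a model no better than always predicting the training-set mean gives r2pred near 0.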
4. If the model is found to be sufficiently predictive following
these tests, it may be used to perform predictions on similar
compounds for which biological activity is unknown. If it is
not, the user must repeat the previous steps leading up to


model derivation until the model is sufficiently predictive.

3.5. Pharmacophore/Toxicophore Elucidation

1. Taking the optimized ligand geometries and knowledge of the
specific target with which these chemicals interact, it is possible
to superimpose the various ligands, or conformations of the
various ligands, to recreate or infer the optimal features for a
specific experimental mode of action.
2. Negative data is especially useful in pharmacophore models as it
allows one to rule out “impossible” superpositions (hence
better molecular level boundary conditions) and increase the
predictive accuracy of models to predict “hits” or “non-hits.”
3. Flexible alignment or ligand superposition is preceded by
geometry optimization and conformation enumeration of
each of the 3D geometries that can be made by performing
either stochastic or deterministic molecular simulations within
a molecular mechanics framework to systematically alter the
dihedral angles of the chemicals of interest and localize multiple
low energy conformers or rotamers.
4. Once alignment procedures between chemicals have been com-
pleted, one often finds common molecular features that one
can reduce the ligand structure into (i.e., hydrogen bond
donor, hydrogen bond acceptor, hydrophobic contact, aro-
matic contacts, metal interactions, cationic interactions, and
anionic interactions). These features can include exclusion
volumes or cavities that “wrap” the outer volume of a set of
known ligands. For any given chemical, if a conformation falls
within the cavity and the spatial relationships between features
are completely or partially satisfied, one identifies a
"potential hit."
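In its simplest form, the matching logic in step 4 reduces to a typed distance test: every model feature must be satisfied by a like-typed ligand feature within some tolerance. An illustrative sketch with features as (type, (x, y, z)) tuples and a hypothetical tolerance (exclusion volumes omitted for brevity):

```python
import math

def matches_pharmacophore(conformer_features, model_features, tol=1.5):
    """True if every model feature is matched by a conformer feature of
    the same type within `tol` angstroms."""
    return all(
        any(ftype == mtype and math.dist(fxyz, mxyz) <= tol
            for ftype, fxyz in conformer_features)
        for mtype, mxyz in model_features)

model = [("donor", (0.0, 0.0, 0.0)), ("aromatic", (4.0, 0.0, 0.0))]
hit = [("donor", (0.5, 0.0, 0.0)), ("aromatic", (4.2, 0.3, 0.0))]
miss = [("acceptor", (0.5, 0.0, 0.0)), ("aromatic", (4.2, 0.3, 0.0))]
```

A partial-match scheme simply replaces the all() with a count of satisfied features.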

4. Examples

In order to familiarize toxicologists with pertinent examples for


further exploration, we briefly outline some of the key papers in
table format (Table 4) as they address mechanistic toxicology ques-
tions, the modeling approaches taken, the software used, and the
literature reference of the research effort. Finally, we provide a brief
description of an in-house in silico chemical genomics program and
the key actions taken to bring an in silico molecular modeling
results database to fruition to complement screening and toxicoge-
nomic efforts.
Example(s) I: 3D CAMM to inform toxicology, single target research.

Table 4
Selected research papers that exemplify 3D CAMM to inform mechanistic toxicology

1. Absorption (51)
Modeling methods: Catalyst (Accelrys), 3D-QSAR, pharmacophore, homology modeling
TARGET: P-gp (P-glycoprotein efflux transporter)
LIGANDS: 27 digoxin inhibitors, 21 + 17 vinblastine inhibitors

2. Distribution (52–54)
Modeling methods: homology modeling, 3D-QSAR, molecular dynamics, and molecular docking (various: Glide/Schrodinger and the Chemical Computing Group and Tripos suites); molecular docking (AutoDock) and molecular modeling (Macromodel/Schrodinger); theoretical site-directed mutagenesis (Sybyl, Tripos); 3D-QSAR and molecular docking
TARGET: sex hormone binding globulin; LIGANDS: 80,000 ligands
TARGET: human serum albumin binding (plasma binding); LIGANDS: <10 chemicals structurally related to naturally occurring ochratoxin
TARGET: human serum albumin binding (plasma binding); LIGANDS: 37 structurally related putative interleukin 8 inhibitors

3. Metabolism (55–57)
Modeling methods: pharmacophore and 3D-QSAR
(a) TARGET: CYPs 1A2, 2B6, 2C9, 2D6, 3A4; LIGANDS: various drug-like/leads
(b) TARGET: PXR; LIGANDS: various drug-like/leads

4. Elimination (58, 59)
Modeling methods: pharmacophore and 3D-QSAR [various in-house packages such as CAMDA, as well as commercial suites such as Sybyl (Tripos)]
TARGET: rat multidrug resistance-associated proteins 2 and 1; LIGANDS: >4,000 conformers of >18 metabolism-like leads

5. Molecular toxicology (receptors) (60, 61)
Modeling methods: homology modeling and molecular docking (OpenEye's FRED and SimBioSys Ltd.'s eHiTS)
TARGET: estrogen receptor alpha (i.e., a nuclear receptor) or the 5-HT6 receptor based on the rhodopsin crystal structure (GPCR modeling)

Example II: Building an in silico chemical genomics framework using 3D


CAMM to inform toxicology on multiple targets

This next example focuses primarily on the components


required to build a multiple target large-scale docking inventory
to support chemical genomic research. In an effort to dig deeper
into toxicogenomics data, we built an in-house in silico chemical
genomics infrastructure. The idea was to complement HTS and
chemical genomics programs and support a fully integrated in silico,
in vitro, and in vivo computational toxicology framework.

Particular examples of building a multiple target, multiple chem-


ical docking database are relatively rare in the literature, and for the
most part support drug discovery research. We want to show the
reader, at an overview level of detail, how this was done: the key
considerations, infrastructure, and coding requirements that
needed to be assessed. In this case, we were building a database to
inform the data matrix required for risk assessment of many environ-
mental chemicals. One of the efforts from the US-Environmental
Protection Agency’s National Center for Computational Toxicology
is the ToxCast program (5). In this case, for phase I a total of 320
chemicals (DSSTox data set) were selected for thorough in vitro
work-up. What was not provided by any of these assays, however,
was the type of information at molecular resolution as obtained by
structure-based virtual ligand screening. To fill this gap, an in silico
chemical genomics initiative, DockScreen, was started (62–64).
The Dockscreen data is the result of nearly 2,500 ligands
(environmental, therapeutic, metabolites and industrial chemicals)
docked into about 151 binding sites of ~100 protein targets using
the eHiTS (Simbiosys Ltd, Canada) software package. The result
was over 350,000 docking runs resulting in over nine million ligand
poses. These calculations were performed over a period of 2 months
on 20 servers and collected a total of ~250 Gb of coordinate-
specific pose data for each of the 2,100 chemicals on each of the
151 targets. To store and manage queries to access this data, a
MySQL relational database schema was designed with separate
tables for ligands, proteins, docking runs, and poses as well as
some computational statistics. A custom interface to this database
was built in a Linux OS environment with a PHP enabled Apache
web server. The acronym “LAMP” is often used to refer to such a
combination of Linux, Apache, MySQL, and PHP which have been
used in combination to provide web access to many databases. For
Dockscreen, only a dozen or so PHP scripts were needed to let
users view, query, and select groups of ligands, groups of targets,
and statistical calculations on the distribution of docking scores for
the runs including such ligands and targets. In addition to numeric
statistics, histogram graphics were produced on the fly. An
applet allowing users to draw chemicals and search against ligands
is also built in. We believe in-house tools such as these, which
provide scientists relatively fast access to "molecular-level" target
binding properties and poses, are critical for those wishing to focus
on chemical risk assessment at a molecular level of accountability.
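The relational layout described above can be sketched with Python's built-in sqlite3 standing in for MySQL; the table and column names below are illustrative, not the actual DockScreen schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ligands (ligand_id INTEGER PRIMARY KEY, name TEXT, smiles TEXT);
CREATE TABLE targets (target_id INTEGER PRIMARY KEY, pdb_id TEXT, site TEXT);
CREATE TABLE runs    (run_id INTEGER PRIMARY KEY,
                      ligand_id INTEGER REFERENCES ligands(ligand_id),
                      target_id INTEGER REFERENCES targets(target_id));
CREATE TABLE poses   (pose_id INTEGER PRIMARY KEY,
                      run_id INTEGER REFERENCES runs(run_id),
                      score REAL, coords BLOB);
""")

def best_scores_for_target(con, target_id):
    """Best (minimum) pose score per ligand for one target, ranked."""
    return con.execute("""
        SELECT l.name, MIN(p.score) AS best
        FROM poses p
        JOIN runs r    ON p.run_id = r.run_id
        JOIN ligands l ON r.ligand_id = l.ligand_id
        WHERE r.target_id = ?
        GROUP BY l.ligand_id
        ORDER BY best
    """, (target_id,)).fetchall()
```

Queries of this shape are what the PHP layer wrapped: select a group of ligands and targets, then compute statistics over the distribution of their docking scores.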

5. Notes

• The good modeling practices discussed more broadly in other
chapters of this book still apply to molecular modeling; that is,
to keep an audit trail of the steps and
methods applied to virtual screening one must keep track of


details used to obtain the numerical results.
– Keep track of crystal structure PDB accession number
– Comment on species type and co-crystallized ligands
– Consider any information with regard to MOA to be pertinent
and capture it (e.g., IS inhibitor, IS substrate, IS agonist,
or IS antagonist)
– Consider keeping the crystal structures of the co-crystallized
ligands as a means to test pose fidelity and the "accuracy" of
your modeling experiment
– When selecting a crystal structure it is good practice to inspect
the atoms in the vicinity of the putative binding site for which
you will perform docking. If the B or thermal factor is rela-
tively low, then this is a good sign that the active site is
relatively rigid and not an “ensemble of conformations.”
This information is explicitly found in the PDB file (and can
be downloaded from http://www.pdb.org). Other derived
information about the model geometry can be analyzed with
free tools such as MolProbity (65). This free software helps
identify model inconsistencies which may suggest not using a
given structure. Similarly, it is vital to consider only targets for
which the original X-ray data have been deposited. Using this
data, electron density maps can be calculated or downloaded
from places such as the Uppsala Electron Density server (66).
The maps and models can be viewed using free programs such
as Coot (67), Python Molecular Viewer (PMV) (68, 69),
or SwissPDB's DeepView (70). Even with help from an
experienced X-ray crystallographer, one can confirm that (a) density
clearly follows the shape of the model and (b) there is not
substantial “difference electron density” to indicate that
model atoms are incorrectly placed.
– It is good practice to use crystal structures with relevant co-
crystallized ligands as opposed to only resolution criteria.
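Extracting the thermal (B) factors for a putative binding site from a PDB file is mechanical, since they occupy fixed columns (61-66) of ATOM/HETATM records; a sketch:

```python
def mean_bfactor(pdb_lines, residues=None):
    """Average B-factor over ATOM/HETATM records, optionally restricted
    to a set of residue sequence numbers (e.g., binding-site residues)."""
    bvals = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            resseq = int(line[22:26])             # columns 23-26: residue number
            if residues is None or resseq in residues:
                bvals.append(float(line[60:66]))  # columns 61-66: B-factor
    return sum(bvals) / len(bvals)
```

A relatively low mean B-factor over the site, compared with the rest of the chain, is the "good sign" of rigidity referred to above.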
• Every detail counts: Knowledge of the pH, solvent medium,
ionic strength, and buffers used can have implications on the
model, the charge state of the model, and the type of charge
model you would select to estimate atomic charges.
– Considering the subcellular localization can often help in
determining charge state for a chemical. For instance, the
pH of the cytosol is ~7, the mitochondrial and ER pH is ~5,
whereas the pH of the nucleus is ~7.5–8. This may affect the
charge models you wish to capture.
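The compartment-pH consideration can be made quantitative with the Henderson-Hasselbalch relation; this sketch gives the fraction of a single ionizable group that carries a charge at a given pH:

```python
def fraction_ionized(pka, ph, is_acid=True):
    """Henderson-Hasselbalch: fraction of molecules in the charged form
    (A- for an acid, BH+ for a base) at the given pH."""
    exponent = (ph - pka) if is_acid else (pka - ph)
    ratio = 10.0 ** exponent              # [charged] / [neutral]
    return ratio / (1.0 + ratio)

# A carboxylic acid (pKa ~4.8) is almost fully deprotonated at cytosolic
# pH ~7 but noticeably less so at mitochondrial/ER pH ~5.
```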
• For modeling protein binding, consider that solvation/desolvation,
as implemented in molecular mechanics frameworks or molecular
docking algorithms, may be inadequately captured or addressed.

• For modeling more complex cellular activity phenomena (such as


receptor-mediated transient activation assays) consider cellular
transport processes as surrogates to modeling a molecular
MOA. For instance, estimating cellular membrane permeabil-
ity, nonspecific target binding and specific target binding may
assist in these efforts.
• When validation of pose fidelity is not optimal, consider the
reasons for failure. “Reasons for Pose fidelity failure—Many of
these pose fidelity failures could be attributed to one of four
common causes: (a) insufficient conformational sampling of the
ligand, particularly of aliphatic ring puckering, (b) symmetric
or pseudo symmetric molecule, (c) critical missing water mole-
cules, and (d) ligand pose dominated by electronic (orbital)
effects. These issues are common to all docking methods and
protocols” (71).
• Putting the pieces together (data, models, and different software
packages): One may want to consider either purchasing a
workflow manager such as Pipeline Pilot (40) or using
public-domain alternatives such as KNIME (74), Bioclipse (72), or
Taverna (73). Many of these packages contain the necessary
elements to address steps 1 and 2 of Fig. 2 in the in silico/in vitro
workflow. Then the selection of a docking package is the final
step.

Acknowledgments

Michael-Rock Goldsmith would like to thank James Rabinowitz


and Stephen Little (from the US-EPA’s National Center for
Computational Toxicology) for providing mentorship and assis-
tance during his postdoctoral research, and providing the environ-
ment to explore molecular docking in the context of toxicology
while providing insight and valuable discussion in the development
of the in-house in silico chemical genomics initiative at the US-
EPA.

References

1. Voutchkova A, Osimitz T, Anastas P (2010) Toward a comprehensive molecular design framework for reduced hazard. Chem Rev 110:5845–5882
2. Rusyn I, Daston G (2010) Computational toxicology: realizing the promise of the toxicity testing in the 21st century. Environ Health Perspect 118:1047–1050
3. Rabinowitz J, Goldsmith M, Little S, Pasquinelli M (2008) Computational molecular modeling for evaluating the toxicity of environmental chemicals: prioritizing bioassay requirements. Environ Health Perspect 116:573–577
4. Allinger N, Burkert U (1982) Molecular mechanics. American Chemical Society, Washington, DC
5. Dix D, Houck K (2007) The ToxCast program for prioritizing toxicity testing of environmental chemicals. Toxicol Sci 95:5–12
6. Villoutreix B, Renault N, Lagorce D, Sperandio O, Montes M, Miteva M (2007) Free resources to assist structure-based virtual ligand screening experiments. Curr Protein Pept Sci 8:381–411
7. Ponder J, Case D (2003) Force fields for protein simulations. Adv Protein Chem 66:27–85
8. Pearlman D, Case D, Caldwell J, Ross W, Cheatham T, DeBolt S, Ferguson D, Seibel G, Kollman P (1995) AMBER, a package of computer programs for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to simulate the structural and energetic properties of molecules. Comput Phys Commun 91:1–41
9. MacKerell A, Brooks B, Brooks C, Nilsson L, Roux B, Won Y, Karplus M (1998) CHARMM: the energy function and its parameterization with an overview of the program. In: Schleyer PVR et al (eds) The encyclopedia of computational chemistry. Wiley, Chichester
10. Case D, Cheatham T, Darden T, Gohlke H, Luo R, Merz K, Onufriev A, Simmerling C, Wang B, Woods R (2005) The AMBER biomolecular simulation programs. J Comput Chem 26:1668–1688
11. Brooks B, Brooks C, Mackerell A, Nilsson L, Petrella R, Roux B, Won Y, Archontis G, Bartels C, Caflisch A, Caves L, Cui Q, Dinner A, Feig M, Fischer S, Gao J, Hodoscek M, Im W, Kuczera K, Lazaridis T, Ma J, Ovchinnikov V, Paci E, Pastor R, Post C, Pu J, Schaefer M, Tidor B, Venable R, Woodcock H, Wu X, Yang W, York D, Karplus M (2009) CHARMM: the biomolecular simulation program. J Comput Chem 30:1545–1615
12. Brooks B, Bruccoleri R, Olafson B, States D, Swaminathan S, Karplus M (1983) CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J Comput Chem 4:187–217
13. Allinger N, Yuh Y, Lii J (1989) Molecular mechanics: the MM3 force field for hydrocarbons. J Am Chem Soc 111:8551–8566
14. Leo A, Hansch C, Elkins D (1971) Partition coefficients and their uses. Chem Rev 71:525–616
15. Lipinski C, Lombardo F, Dominy B, Feeney P (1997) Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev 23:3–25
16. Wermuth C, Ganellin C, Lindberg P, Mitscher L (1998) Glossary of terms used in medicinal chemistry. Pure Appl Chem 70:1129–1143
17. Kubinyi H (2002) From narcosis to hyperspace: the history of QSAR. Quant Struct Act Relat 21:348–356
18. Wold S, Ruhe A, Wold H, Dunn W (1984) The collinearity problem in linear regression—the partial least squares (PLS) approach to generalized inverses. SIAM J Sci Stat Comput 5:735–743
19. Cramer R, Patterson D, Bunce J (1988) Comparative molecular field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins. J Am Chem Soc 110:5959–5967
20. Klebe G, Abraham U, Mietzner T (1994) Molecular similarity indices in a comparative analysis (CoMSIA) of drug molecules to correlate and predict their biological activity. J Med Chem 37:4130–4146
21. Pastor M, Cruciani G, McLay I, Pickett S, Clementi S (2000) GRid-INdependent descriptors (GRIND): a novel class of alignment-independent three-dimensional molecular descriptors. J Med Chem 43:3233–3243
22. Norinder U (1996) 3D-QSAR investigation of the Tripos benchmark steroids and some protein-tyrosine kinase inhibitors of styrene type using the TDQ approach. J Chemom 10:533–545
23. Kurogi Y, Guner O (2001) Pharmacophore modelling and three-dimensional database searching for drug design using Catalyst. Curr Med Chem 8:1035–1055
24. Park J, Harris D (2003) Construction and assessment of models of CYP2E1: predictions of metabolism from docking, molecular dynamics and density functional theoretical calculations. J Med Chem 46:1645–1660
25. Jones J, Mysinger M, Korzekwa K (2002) Computational models for cytochrome P450: a predictive electronic model for aromatic oxidation and hydrogen atom abstraction. Drug Metab Dispos 30:7–12
26. Cheng Y, Prusoff W (1973) Relationship between the inhibition constant (Ki) and the concentration of inhibitor which causes 50 per cent inhibition (I50) of an enzymatic reaction. Biochem Pharmacol 22:3099–3108
27. MOE. Chemical Computing Group, Montreal, Quebec, Canada
28. Schrodinger, Inc., New York, NY
29. Cheatham T, Young M (2001) Molecular dynamics simulation of nucleic acids: successes, limitations and promise. Biopolymers 56:232–256
30. Roterman I, Lambert M, Gibson K, Scheraga H (1989) A comparison of the CHARMM, AMBER and ECEPP potentials for peptides. 2. Phi-psi maps for N-acetyl alanine N′-methyl amide: comparisons, contrasts and simple experimental tests. J Biomol Struct Dyn 7:421–453
31. Roterman I, Gibson K, Scheraga H (1989) A comparison of the CHARMM, AMBER and ECEPP potentials for peptides. 1. Conformational predictions for the tandemly repeated
44. Cross J, Thompson D, Rai B, Baber J, Fan K, Hu Y, Humblet C (2009) Comparison of several molecular docking programs: pose prediction and virtual screening accuracy. J Chem Inf Model 49:1455–1474
45. Cherkasov A, Fuqiang B, Li Y, Fallahi M, Ham-
peptide (Asn-Ala-Asn-Pro)9. J Biomol Struct mond G (2006) Progressive docking: a hybrid
Dyn 7:391–419 QSAR/Docking approach for accelerating in
32. Gundertofte K, Liljefors T, Norrby P, Petter- silico high throughput screening. J Med
son I (1996) A comparison of conformational Chem 49:7466–7478
energies calculated by several molecular 46. Peterson S (2007) Improved CoMFA model-
mechanics methods. J Comput Chem ing by optimization of settings: toward the
17:429–449 design of inhibitors of the HCV NS3 protease.
33. Jorgensen W, Maxwell D, Tirado-Rives J Uppsala University, Uppsala
(1996) Development and testing of the OPLS 47. Norinder U (1998) Recent progress in
all-atom force field on conformational energet- CoMFA methodology and related techniques.
ics and properties of organic liquids. J Am Perspect Drug Discov Des 12/13/14:25–39
Chem Soc 118:11225–11236 48. Kim K, Grecco G, Novellino E (1998) A criti-
34. Jorgensen W, Tirado-Rives J (1988) The OPLS cal review of recent CoMFA applications. Per-
potential functions for proteins—energy mini- spect Drug Discov Des 12/13/14:257–315
mizations for crystals of cyclic-peptides and 49. Rosen J, Lovgren A, Kogej T, Muresan S, Gott-
crambin. J Am Chem Soc 110:1657–1666 fries J, Backlund A (2009) ChemGPS-NPWeb:
35. Halgren T (1996) Merck molecular force field. chemical space navigation tool. J Comput
I. Basis, form, scope parameterization and per- Aided Mol Des 23:253–259
formance of MMFF94. J Comput Chem 50. Larsson J, Gottfries J, Muresan S, Backlund A
17:490–519 (2007) ChemGPS-NP: tuned for navigation in
36. Chen Y, Zhi D (2001) Ligand–protein inverse biologically relevant chemical space. J Nat Prod
docking and its potential use in the computer 70:789–794
search of protein targets of a small molecule. 51. Ekins S et al (2002) Three-dimensional quan-
Proteins 43:217–226 titative structure-activity relationships of inhi-
37. Ellis L, Hou B, Kang W, Wackett L (2003) The bitors of P-glycoprotein. Mol Pharmacol
University of Minnesota Biocatalysis/Biodeg- 61:964
radation Database: post-genomic data mining. 52. Thorsteinson N, Ban F, Santos-Filho O, Tabaei
Nucleic Acids Res 31:262–265 S, Miguel-Queralt S, Underhill C, Cherkasov
38. MetaPrint2d http://www-metaprint2d.ch. A, Hammond G (2009) In silico identification
cam.ac.uk/metaprint2d of anthropogenic chemicals as ligands of zebra-
39. Bologa C, Olah M, Oprea T (2005) Chemical fish sex hormone binding globulin. Toxicol
database preparation for compound acquisition Appl Pharmacol 234:47–57
or virtual screening. Methods Mol Biol 316: 53. Perry J, Goldsmith M, Peterson M, Beratan D,
375 Wozniak G, Ruker F, Simon J (2004) Structure
40. Accelrys Discovery Suite, Accelrys, Inc. San of the ochratoxin A binding site within human
Diego, CA serum albumin. J Phys Chem B 108:
41. Sybyl. Tripos, Inc. St. Louis, MO 16960–16964
42. Schwede T, Sali A, Honig B, Levitt M, Berman 54. Aureli L, Cruciani G, Cesta M, Anacardio R,
H, Jones D, Brenner S, Burley S, Das R, De Simone L, Moriconi A (2005) Predicting
Dokholyan N, Dunbrack R, Fidelis K, Fiser A, human serum albumin affinity of interleukin-
Godzik A, Huang Y, Humblet C, Jacobsen M, 8 (CXCL8) inhibitors by 3D-QSPR approach.
Joachimiak A, Krystek S, Kortemme T, Krysh- J Med Chem 48:2469–2479
tafovych A, Montelione G, Moult J, Murray D, 55. Ekins S, de Groot M, Jones J (2001) Pharma-
Sanchez R, Sosinick T, Standley D, Stouch T, cophore and three-dimensional quantitative
Vajda S, Vasquez M, Westbrook J, Wilson I structure activity relationship methods for
(2009) Outcome of a workshop on applica- modeling cytochrome P450 active sites. Drug
tions of protein models in biomedical research. Metab Dispos 29:936–944
Structure 17:151–159 56. Ekins S, Erickson J (2002) A pharmacophore
43. Irwin J (2008) Community benchmarks for for human pregnane X receptor ligands. Drug
virtual screening. J Comput Aided Mol Des Metab Dispos 30:96–99
22:193–199
7 Informing Mechanistic Toxicology with Computational Molecular Models 165

57. Lewis D (2002) Molecular modeling of human 63. http://www.epa.gov/ncct/bosc_review/2009/


cytochrome P450-substrate interactions. Drug posters/2-06_Rabinowitz_CompTox_BOSC09.
Metab Rev 34:55–67 pdf
58. Hirono S, Nakagome L, Imai R, Maeda K, 64. Goldsmith M, Little S, Reif D, Rabinowitz J
Kusuhara H, Sugiyama Y (2005) Estimation Digging deeper into deep data: molecular dock-
of the three-dimensional pharmacophore of ing as a hypothesis-driven biophysical interroga-
ligands for rat multidrug-resistance-associated tion system in computational toxicology
protein 2 using ligand-based drug design tech- 65. http://molprobity.biochem.duke.edu
niques. Pharm Res 22:260–269 66. http://xray.bmc.uu.se/valid/density/form1.
59. DeGorter M, Conseil G, Deeley R, Campbell html
R, Cole S (2008) Molecular modeling of the 67. http://www.biop.ox.ac.uk/coot
human multidrug resistance protein 1
(MRP1/ABCC1). Biochem Biophys Res 68. http://pmvbase.blogspot.com/2009/04/
Commun 365:29–34 electron-density-map.html
60. Rabinowitz J, Little S, Laws S, Goldsmith M 69. http://mgltools.scrips.edu/documentation/
(2009) Molecular modeling for screening envi- tutorial/python-molecular-viewer
ronmental chemicals for estrogenicity: use of 70. http://spdbv.vital.it.ch
the toxicant-target approach. Chem Res Tox- 71. Irwin J, Shoichet B, Mysinger M, Huang N,
icol 22:1594–1602 Colizzi F, Wassam P, Cao Y (2009) Automated
61. Hirst W, Abrahamsen B, Blaney F, Calver A, docking screens: a feasibility study. J Med
Aloj L, Price G, Medhurst A (2003) Differ- Chem 52:5712–5720
ences in the central nervous system distribution 72. Bioclipse. Proteometric Group, Department of
and pharmacology of the mouse 5-hydroxy- Pharmaceutical Biosciences, Uppsala Univer-
tryptamine-6 receptor compared with rat sity, Sweden & Cheminformatics and Metabo-
and human receptors investigated by radioli- lism Team, European Bioinformatics Institute
gand binding, site-directed mutagenesis, (EMBI)
and molecular modeling. Mol Pharmacol 73. Taverna. School of Computer Science, University
64:1295–1308 of Manchester, UK
62. http://oaspub.epa.gov/eims/eimscomm.get- 74. www.knime.org
file?p_download_id¼466705
Chapter 8

Chemical Structure Representations and Applications in Computational Toxicity

Muthukumarasamy Karthikeyan and Renu Vyas

Abstract
Efficient storage and retrieval of chemical structures is one of the most important prerequisites for solving any computation-based problem in the life sciences. Several resources, including research publications, textbooks, and articles, are available on chemical structure representation. Chemical substances can share the same molecular formula yet differ in structural formula and conformation, and it is the skeletal framework, scaffold, and functional groups of a molecule that convey its characteristics. Today, with the aid of sophisticated mathematical models and informatics tools, it is possible to design a molecule of interest with specified characteristics based on applications in pharmaceuticals, agrochemicals, biotechnology, nanomaterials, petrochemicals, and polymers. This chapter discusses both traditional and state-of-the-art representations of chemical structures and their applications in chemical information management and in bioactivity- and toxicity-based predictive studies.

Key words: Linear molecular representations, 2D and 3D representation, Chemical fragments and
fingerprints, Chemical scaffolds, Toxicophores, Pharmacophores, Molecular similarity and diversity,
Molecular descriptors, Structure activity relationship studies

1. Introduction

Chemists believe that everything is made of chemicals and work towards understanding their basic properties from their constituent elements and spatial arrangement, extending this knowledge to design better chemicals. Chemical information is a knowledge-based domain in which primary chemical data are transcribed mainly in the form of chemical structures, which serve as the international language of chemistry (1). Systematic naming of even a moderately complex chemical structure is a challenging task for a graduate student, as the same chemical can have many

Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_8, # Springer Science+Business Media, LLC 2012

167
168 M. Karthikeyan and R. Vyas

different synonyms. A chemical as common as formaldehyde is documented with more than 180 synonyms in publicly available databases (2). For instance, a simple keyword search for "formaldehyde" as a substance in PubChem returns over 5181 hits (3); the PubChem compound ID for formaldehyde is 712 (4), with several user-submitted synonyms. Searching for formaldehyde by its Simplified Molecular Input Line Entry System (SMILES) string, C=O, instead returned 11 entries (5), so readers are encouraged to use structure-based searches to retrieve focused hits. Chemical structure representation therefore requires an unambiguous notation, from the simplest atom to the most complicated molecule, so that the end user can identify the relevant record in the shortest possible time and need not browse through voluminous literature of little or no relevance to the molecule of interest. There are several publicly available chemically relevant databases (6–12) and professional chemical information service providers, such as Chemical Abstracts Service (13) and BEILSTEIN (14), from which one can quickly obtain relevant chemical structure information for a nominal charge. Chemical Abstracts Service (CAS), a pioneer in chemical information management, collects critical chemical data along with metadata from research publications, patents, research reports, theses, etc., and these resources can be searched accurately by CAS Registry Number using user-friendly web-based tools such as SciFinder (15).
A brief look at history reveals that in silico representation of chemical structures proceeded almost in parallel with advances in computer technology. Beginning with punch cards, light pens, keyboards, alphanumeric strings, and graph theory methods for storing a few chemicals, today it is practically possible to store millions of chemical entities in an easily computer-readable format (16). Traditionally, hand-drawn chemical structures were used for chemical communication; these were eventually replaced by computer-generated images. The preferred way to generate chemical structures today is via computer, owing to the flexibility of such structures in manipulation and their reusability in simulation and modeling studies. These modern techniques of computer representation of structures effectively meet the following diverse needs of chemists:
- Information storage, search, and efficient retrieval
- Prediction of physicochemical, biological, and toxicological properties of compounds
- Spectra simulation and structure elucidation
- Reaction modeling
- Synthesis planning
- Drug design and environmental toxicity assessment

2. Materials

In the early days of informatics, attempts were made to digitize hand-drawn chemical structures to facilitate structure-based searches. Over the past two decades, however, the availability of advanced computer software and the development of chemical structure representation standards have shifted practice towards computer-generated chemical structures for chemical registration and communication. Both linear representations (WLN, SLN, and SMILES) and connection table-based matrix representations (MOL, MOL2, and SDF) were developed for compact storage and efficient searching of chemical structures (17, 18). The methods presently in vogue fall mainly into two categories: compact line notation-based systems, such as the SMILES string, which were developed when storing data was expensive, and the more recent connection table formats used in structure databases. An overview of the various chemical structure representation methods reported in the literature is depicted in Fig. 1. Readers are encouraged to consult a recent review article by Warr (19) for comprehensive coverage of the general representation of chemical structures and the references cited therein.

2.1. Linear Representation

Linear line notation is easily accessible to chemists and flexible enough to allow interpretation and generation of chemical notation in a manner similar to natural language. Alphanumeric string-based rules for encoding chemical structures were developed through the pioneering contributions of Wiswesser, Morgan, Weininger, and Dyson, and were eventually applied in machine description. In 1949, William Wiswesser introduced a line notation system based on Berzelian symbols with structural and connectivity features. This system, named Wiswesser Line Notation (WLN), was used online for structure and substructure searches (20). Unfortunately, WLN was never widely adopted, owing to the complexity of its specification and the inflexible rules associated with it. The development of the SMILES notation had a significant effect on compact storage in chemical information systems and led to the modern form of representing chemical structures. This line notation has several advantages over the older systems in terms of compactness, simplicity, uniqueness, and human readability. David Weininger developed the DEPICT program to interpret SMILES into a molecular structure (21). Detailed descriptions of many advanced variants of SMILES, such as USMILES, SMARTS, STRAPS, and CHUCKLES, are available at the Daylight website (22). SMiles ARbitrary Target Specification (SMARTS) is essentially an extension of SMILES used for describing molecular patterns as

Fig. 1. A chart showing the molecular structure representation methods: general representations (line notations such as SMILES, WLN, ROSDAL, and SLN; connection tables; 2D/3D matrices; fragment coding; hash codes; 2D fingerprints; and substructure representations used in searching), abstract representations (Markush/generic structures in patents), and automatic representations (vision, barcoding, RF, and OCR).

well as for substructure searching (22). Sybyl Line Notation (SLN) is another ASCII language, almost similar to SMILES, the difference lying mainly in the representation of explicit hydrogen atoms (23). It can be used for substructure searching, Markush representation, database storage, and network communication, but does not support reactions. The International Chemical Identifier (InChI) is a string of characters capable of uniquely representing a chemical substance (24). It is derived from a structural representation of that substance in a way designed to be independent of how the structure was drawn (thus a single compound will always produce the same identifier). It provides a precise, robust, IUPAC-approved tag for representing a chemical substance. InChI is the latest and most modern of the line notations; it resolves many of the chemical ambiguities not addressed by SMILES, particularly with respect to stereocenters, tautomers, and other valence-model problems. Table 1 shows the various line notations used for representing a chemical compound.

Table 1
Various notations of 3-(p-CHLOROPHENYL)-1,1-DIMETHYLUREA TRICHLOROACETATE

- CAS No: 140-41-0
- Other names: EPA Pesticide Chemical Code 035502; GC-2996; LS-12938; Caswell No. 583A; Urox; Monuron trichloroacetate
- PubChem CID: 8799
- 3-(p-CHLOROPHENYL)-1,1-DIMETHYLUREA TRICHLOROACETATE
- Acetic acid, trichloro-, compd. with 3-(p-chlorophenyl)-1,1-dimethylurea (1:1)
- Urea, 3-(p-chlorophenyl)-1,1-dimethyl-, compd. with trichloroacetic acid (1:1)
- Acetic acid, trichloro-, compd. with N′-(4-chlorophenyl)-N,N-dimethylurea (1:1) (9CI)
- Trichloroacetic acid compound with 3-(p-chlorophenyl)-1,1-dimethylurea (1:1) (8CI)
- Wiswesser Line Notation (WLN): GR DMVRN1&1 &GXGGVO
- Canonical SMILES: CN(C)C(=O)NC1=CC=C(C=C1)Cl.C(=O)(C(Cl)(Cl)Cl)O
- InChI: InChI=1S/C9H11ClN2O.C2HCl3O2/c1-12(2)9(13)11-8-5-3-7(10)4-6-8;3-2(4,5)1(6)7/h3-6H,1-2H3,(H,11,13);(H,6,7)
- InChIKey: DUQGREMIROGTTD-UHFFFAOYSA-N
[2D structure diagram of 3-(p-chlorophenyl)-1,1-dimethylurea trichloroacetate]
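The layered, "/"-delimited design of InChI strings such as the one in Table 1 can be illustrated with a few lines of code. The sketch below is plain Python with no cheminformatics library; it covers only the simple case (no charge, stereo, or isotope sublayers), and the input is the standard InChI of formaldehyde.

```python
def inchi_layers(inchi):
    """Split an InChI string into its named layers.

    The first token is the version header (e.g. '1S' for a standard
    InChI), the second is the molecular formula, and subsequent layers
    carry a one-letter prefix: 'c' for atom connections, 'h' for
    hydrogen positions, and so on.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    tokens = inchi[len("InChI="):].split("/")
    layers = {"version": tokens[0], "formula": tokens[1]}
    for tok in tokens[2:]:
        layers[tok[0]] = tok[1:]   # prefix letter -> layer body
    return layers

# Standard InChI of formaldehyde
layers = inchi_layers("InChI=1S/CH2O/c1-2/h1H2")
print(layers)
# {'version': '1S', 'formula': 'CH2O', 'c': '1-2', 'h': '1H2'}
```

Because each layer is independently addressable, two identifiers can be compared layer by layer, which is one reason InChI is convenient for duplicate detection.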

2.2. Graph/Matrix Representation of Molecules

According to graph theory, a chemical structure is an undirected, unweighted, labeled graph with atoms as nodes and bonds as edges. Grave and Costa augmented molecular graphs with rings and functional groups by inserting additional vertices with corresponding edges (25). Matrix representations of the graph are also used to denote a chemical structure with n atoms as an array of n × n entries. There are several types of matrix representation, such as the adjacency matrix, distance matrix, atom connectivity matrix, incidence matrix, bond matrix, and bond-electron matrix, each with its own set of merits and demerits.
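As a minimal illustration of the matrix view, the snippet below (plain Python; the atom and bond lists for formaldehyde are hard-coded) builds an adjacency matrix and a bond matrix from an edge list:

```python
atoms = ["C", "O", "H", "H"]               # formaldehyde, H2C=O
bonds = [(0, 1, 2), (0, 2, 1), (0, 3, 1)]  # (atom i, atom j, bond order)

n = len(atoms)
adjacency = [[0] * n for _ in range(n)]    # entry = 1 if atoms are bonded
bond_matrix = [[0] * n for _ in range(n)]  # entry = bond order

for i, j, order in bonds:
    adjacency[i][j] = adjacency[j][i] = 1
    bond_matrix[i][j] = bond_matrix[j][i] = order

for row in bond_matrix:
    print(row)
# [0, 2, 1, 1]
# [2, 0, 0, 0]
# [1, 0, 0, 0]
# [1, 0, 0, 0]
```

The adjacency matrix records only connectivity (0/1), while the bond matrix also carries bond orders; a distance matrix would instead hold topological path lengths between atom pairs.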

2.3. Connection Tables

A newer system for representing molecular information is the connection table. Simply defined, a connection table is a list of the atoms and bonds in a molecule: it enumerates the atoms and the bonds connecting specific atoms (26). The molecule shown in the connection table below has three atoms and two bonds. The table provides the three-dimensional (x, y, z) coordinates and information about the bonds connecting the atoms, along with the bond types (1 = single, 2 = double, etc.). Despite their size and format constraints, connection tables are easily handled by computers, although they lack the human interpretability of the structural information; they have nevertheless been widely adopted as a storage file format (Fig. 2).
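A toy reader for the V2000 connection-table layout shown in Fig. 2 can be sketched as follows. It handles only the counts line and the atom and bond blocks, splits on whitespace rather than honoring the fixed-width columns of the real format (so it would fail on dense records), and is run on a hypothetical three-atom, two-bond water record like the one described above.

```python
def parse_v2000(block):
    """Minimal reader for the atom/bond blocks of a V2000 connection table."""
    lines = block.strip().splitlines()
    # Counts line: atom and bond counts occupy 3-character fields.
    n_atoms, n_bonds = int(lines[0][0:3]), int(lines[0][3:6])
    atoms, bonds = [], []
    for line in lines[1:1 + n_atoms]:                 # atom block: x y z symbol ...
        x, y, z, symbol = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    for line in lines[1 + n_atoms:1 + n_atoms + n_bonds]:
        i, j, order = (int(f) for f in line.split()[:3])  # 1-based atom indices
        bonds.append((i, j, order))
    return atoms, bonds

# A hypothetical water record: three atoms, two single bonds.
ct = """\
  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0
    0.9600    0.0000    0.0000 H   0  0
   -0.2400    0.9300    0.0000 H   0  0
  1  2  1  0
  1  3  1  0
"""
atoms, bonds = parse_v2000(ct)
print(len(atoms), len(bonds))   # 3 2
```

Production parsers must respect the exact column positions defined by the CTfile specification, since adjacent fields can touch without intervening whitespace.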

2.4. 3D Structure Generation

Representing stereochemistry in computers was a challenging task in the early days. In the early 1970s, Wipke et al. developed rapid storage and retrieval of chemical structures based on the SEMA (Stereochemically Extended Morgan Algorithm) algorithm (27). James G. Nourse provided a method, CFSG (ConFormation Symmetry Group), for specifying molecular conformation using bond attributes, producing a unique name for each conformation that can be coded for searching databases (28). Helmut described a simple matrix-based stereocode for structures wherein the stereochemistry is defined by the sequence of substituent indexes (29). Peter Willett and coworkers developed a sophisticated similarity-based method for searching files of three-dimensional chemical structures (30). Standard methods for 3D structure generation used today include Concord, Corina, ChemModel, and the ChemAxon tools (31). Nicklaus et al. compared ChemModel against Concord and found the former superior, owing to its ability to map the entire conformational space in comparison with an X-ray crystallographic database (32). Clark et al. found genetic algorithm and directed-tweak methods efficient for searching databases of flexible 3D structures (33).
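The Morgan algorithm underlying SEMA can be sketched in its simplest, stereochemistry-free form: each atom starts with its degree as an initial invariant, and values are iteratively replaced by the sum of the neighbours' values until the number of distinct values stops increasing. The adjacency lists below are hypothetical carbon chains used only for illustration.

```python
def morgan_connectivity(neighbors, max_iter=10):
    """Iteratively refine extended-connectivity values (basic Morgan scheme).

    Stops when a refinement round no longer increases the number of
    distinct values, and returns the values from the previous round.
    """
    ec = [len(nbrs) for nbrs in neighbors]      # initial invariant: atom degree
    n_classes = len(set(ec))
    for _ in range(max_iter):
        new_ec = [sum(ec[j] for j in nbrs) for nbrs in neighbors]
        if len(set(new_ec)) <= n_classes:       # no further discrimination
            break
        ec, n_classes = new_ec, len(set(new_ec))
    return ec

# Hypothetical carbon chains given as 0-based adjacency lists.
print(morgan_connectivity([[1], [0, 2], [1, 3], [2]]))          # [1, 2, 2, 1]
print(morgan_connectivity([[1], [0, 2], [1, 3], [2, 4], [3]]))  # [2, 3, 4, 3, 2]
```

In the five-atom chain the central atom acquires a value distinct from its neighbours, which is exactly the kind of symmetry-breaking canonical numbering schemes rely on; the full SEMA method extends this with stereochemical descriptors.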

Marvin 08081111502D

20 19 0 0 0 0 999 V2000

–1.0992 2.2846 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

–0.3847 1.8721 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0

0.3298 2.2846 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

–0.3847 1.0471 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

–1.0992 0.6346 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0

0.3298 0.6346 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0

0.3298 –0.1904 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

1.0442 –0.6029 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

1.0442 –1.4279 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

0.3298 –1.8404 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

–0.3847 –1.4279 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

–0.3847 –0.6029 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

0.3298 –2.6654 0.0000 Cl 0 0 0 0 0 0 0 0 0 0 0 0

3.9907 0.4083 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

4.4032 1.1227 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0

4.4032 –0.3062 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0

4.8157 –1.0207 0.0000 Cl 0 0 0 0 0 0 0 0 0 0 0 0

3.6887 –0.7187 0.0000 Cl 0 0 0 0 0 0 0 0 0 0 0 0

5.1176 0.1063 0.0000 Cl 0 0 0 0 0 0 0 0 0 0 0 0

3.1657 0.4083 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0

1 2 1 0 0 0 0
2 3 1 0 0 0 0

2 4 1 0 0 0 0

4 5 2 0 0 0 0

4 6 1 0 0 0 0

6 7 1 0 0 0 0

7 8 2 0 0 0 0

8 9 1 0 0 0 0

9 10 2 0 0 0 0

10 11 1 0 0 0 0

11 12 2 0 0 0 0

7 12 1 0 0 0 0

10 13 1 0 0 0 0

14 15 2 0 0 0 0

14 16 1 0 0 0 0

16 17 1 0 0 0 0

16 18 1 0 0 0 0

16 19 1 0 0 0 0

14 20 1 0 0 0 0

M END

Fig. 2. Mol File format of 3-(p-CHLOROPHENYL)-1,1-DIMETHYLUREA TRICHLOROACETATE.

3. Methods

3.1. Strategies Used in Searching Chemical Databases

To know whether a molecule has already been published and studied in a chemical or biological context, scientific literature searches are conducted with the help of exact structures, similar structures, substructures, and hyperstructures.

3.1.1. Fragment and Fingerprint Based Search Strategies

In the fragment-based approach, in order to search a database for similar structures, the query molecule (graph) is fragmented into various logical fragments (subgraphs such as functional groups and rings) (34). The list of retrieved hit structures includes all those containing the substructures (fragments) present in the query structure. Another approach is the fingerprint-based approach, which gives more of a partial description of the molecule as a set of fragments than a well-defined substructure (35). It essentially treats a molecule as a binary array in which each element (1 or 0) represents the presence or absence of a particular fragment: a given bit is set to 1 (true) if a particular structural feature is present and 0 (false) otherwise. Hashed fingerprints are composed of bits of molecular information (fragments), such as ring types, functional groups, and other molecular and atomic data. Comparing fingerprints allows one to determine the similarity between two molecules A and B, as shown in Fig. 3.
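A dictionary-keyed fragment fingerprint can be mocked up directly from this description. The fragment "detector" below merely tests for substrings of a SMILES string, which is far cruder than real subgraph matching, but it shows how presence/absence bits are assembled; the four-fragment dictionary is invented for the example.

```python
# Toy fragment dictionary: bit position -> SMILES substring to look for.
# Real systems match subgraphs, not substrings; this is only an illustration.
FRAGMENTS = ["c1ccccc1",  # benzene ring
             "C(=O)",     # carbonyl group
             "N",         # aliphatic/amide nitrogen present
             "Cl"]        # chlorine present

def toy_fingerprint(smiles):
    """Return a presence/absence bit vector over the fragment dictionary."""
    return [1 if frag in smiles else 0 for frag in FRAGMENTS]

print(toy_fingerprint("CC(=O)Nc1ccccc1"))   # acetanilide -> [1, 1, 1, 0]
print(toy_fingerprint("Clc1ccccc1"))        # chlorobenzene -> [1, 0, 0, 1]
```

Hashed fingerprints differ in that fragments are not looked up in a fixed dictionary but hashed onto bit positions, so unrelated fragments can collide on the same bit.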

3.1.2. Similarity Search

The fingerprints of two structures can be compared and a distance score computed for their similarity. Similarity coefficient metrics such as the Tanimoto coefficient, cosine similarity, and Euclidean distance help to filter relevant molecules rapidly from millions of entries (36). Takashi developed an approach for automatic identification of molecular
Molecule A 1000100011100010100001000000100101 a = 11
Molecule B 0001100001000010010001000000100101 b = 9
Similarity (A and B) 0000100001000010000001000000100101 c = 7

Fig. 3. Similarity and fingerprint analysis.



similarity using reduced graph representation and applied a clique detection algorithm for finding similar structural features (37).
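Using the two bit strings from Fig. 3, the Tanimoto coefficient Tc = c/(a + b - c) can be computed directly, where a and b are the numbers of bits set in each fingerprint and c is the number of bits they share:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two equal-length binary fingerprint strings."""
    a = fp_a.count("1")                                 # bits set in A
    b = fp_b.count("1")                                 # bits set in B
    c = sum(x == y == "1" for x, y in zip(fp_a, fp_b))  # common bits
    return c / (a + b - c)

A = "1000100011100010100001000000100101"   # molecule A (Fig. 3): a = 11
B = "0001100001000010010001000000100101"   # molecule B (Fig. 3): b = 9
print(round(tanimoto(A, B), 3))            # 0.538  (c = 7, so 7/13)
```

A coefficient of 1.0 means identical fingerprints; a common (rule-of-thumb, not universal) screening cutoff treats values above roughly 0.85 as "similar" for typical 2D fingerprints.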

3.1.3. Hash Coding

Hash coding of molecules is an efficient way of storing and searching substructures: a molecule is assigned a unique key that maps directly to the address of the compound in a computer system. There is, however, information loss in this approach, as the source molecule cannot be reconstructed from the hash keys (38). In a report by Wipke et al., four different hash functions were used effectively for the rapid storage and retrieval of chemical structures (39). Hodes and Feldman developed a novel file organization for substructure searching via hash coding (40).
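Hash-based registration can be sketched with Python's hashlib: a canonical string for the molecule (here simply a SMILES string, assumed to be already canonicalized) is hashed to a fixed-length key that addresses the record. As noted above, the mapping is one-way, so the structure cannot be regenerated from the key.

```python
import hashlib

def hash_key(canonical_smiles, length=16):
    """Fixed-length registration key; one-way, so the structure is not recoverable."""
    return hashlib.sha256(canonical_smiles.encode()).hexdigest()[:length]

registry = {}

def register(canonical_smiles, data):
    """Store a record at the address derived from the structure's hash key."""
    registry[hash_key(canonical_smiles)] = data

register("C=O", {"name": "formaldehyde"})
register("c1ccccc1", {"name": "benzene"})

# Exact-structure lookup is a single hashed access:
print(registry[hash_key("C=O")]["name"])    # formaldehyde
```

This gives constant-time exact-structure lookup; substructure search over hash codes, as in the cited work, requires hashing the fragments of each molecule rather than the whole structure.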

3.1.4. Hyperstructures

A hyperstructure has been defined by Robert Brown as a novel representation of a set of 2D chemical structures, suggested for increasing the speed of substructure searches compared with a sequential search. He reported two methods for constructing hyperstructures, the atom-assignment method and the maximal-overlap-set method, of which the first was found to be computationally less intensive (41).

3.1.5. Abstract Representation of Molecules

One of the major challenges confronting the pharmaceutical industry is how to find a molecule, or a list of molecules, with desired properties that has not yet been published. One approach is to simulate all possible chemicals for a given formula and enumerate them to full length; however, this is impractical, and searching each entry against the millions already known is another impossible task. The only solution is to map what is known from the literature, which indirectly indicates what is "not yet reported." It does not necessarily imply that "those entries are not yet identified." It is known in a business context that several thousand new molecules with desired properties are kept as trade secrets and superficially protected by patents as generic structures. On occasion molecules are represented as Markush structures in a generic context to cover a family of molecular structures, sometimes numbering beyond millions. Markush structures are generic structures used in patent databases, such as MARPAT maintained by Chemical Abstracts Service, for protecting intellectual chemical information in patents (42) (Fig. 4).

Fig. 4. A Markush structure.



A formal language, GENSAL (generic structure description language), has been developed to describe generic structures from chemical patents (43). Seminal work by Stuart Kaback in the 1980s highlights the structure-searching parameters in the Derwent World Patents Index (44).

3.2. Chemical Drawing Tools

There are several commonly available tools for generating chemical structures and storing them in standard file formats. The most popular among academia and industry are MarvinSketch from ChemAxon and ChemDraw from CambridgeSoft. Other tools exist for specific purposes, such as ChemSketch for integration with analytical data (1H NMR) and ISIS/Draw with ISIS/Base for inventory management (45). With the help of these drawing tools one can easily drag and drop the necessary pre-built templates and build complex chemical structures. Interconversion of 2D and 3D structures is also possible with these tools. The quality of the 3D structure, however, depends on the methods used within the system; the best 3D model is one comparable with X-ray crystallographic data. Corina, a software package developed by Gasteiger et al., is very fast at computing 3D coordinates for 2D structures, and comparable or even better algorithms are implemented in the ChemAxon tools. With the help of a 3D structure it is possible to calculate the energy of the molecule, its volume, interatomic charge distribution, and other three-dimensional descriptors required for QSAR-based predictive studies.

A simple way to generate a 3D structure is to use MarvinView:
- Create and open a molecule in the MarvinView tool
- Go to Edit and select Clean in 3D
- The output will be the 3D structure of the molecule, as shown in Fig. 5

3.3. Current Trends in Chemical Structure Representation

The traditional methods of chemical structure representation and manipulation can be time-consuming and tedious, entailing long hours of searching. In this section we discuss the new and emerging technologies being developed to handle chemical structures for automation and inventory management. Neural network-based chemical structure indexing is a technique in which the chemical structure is presented as an image to a pulse-coupled neural network (PCNN) to produce binary barcodes of chemical structures (46).

Karthikeyan et al., in their work on encoding chemical structures, reported that chemical structures can be encoded and read as 2D barcodes (PDF417 format) in a fully automated fashion (47). A typical linear barcode consists of a set of black bars of varying width separated by white spaces, encoding alphanumeric characters. To reduce the amount of data that has to be encoded on the

Fig. 5. Molecule 2D and 3D representation in Marvin View.

barcode, a template-based chemical structure encoding method, the ACS file format, was developed (48). This method is based on the Computer-Generated Automatic Chemical Structure Database (CG-ACS-DB), originally developed to create a virtual library of molecules through enumeration from a selected set of scaffolds and functional groups (49). Scaffolds and groups are stored in the Automatic Chemical Structure (ACS) format as a plain text file. In this ACS format, the most commonly used chemical substructures are represented as templates (scaffolds or functional groups) through a reduced-graph algorithm, along with their interconnectivity, rather than atom-by-atom connectivity information. Barcoded chemical structures can be used for error-free chemical inventory management: even a molecule containing over a thousand atoms can be represented as a barcode and decoded automatically and accurately in seconds without manual intervention. Several file formats used for storing and retrieving chemical structures using the barcode method are shown in Fig. 6.
(a) IUPAC NAME Representation
6-chloro-10-(4-methylpiperazin-1-yl)-15-[10-(4-methylpiperazin-1-yl)-2,9-diazatricyclo[9.4.0.0^{3,8}]pentadeca-1(11),3,5,7,9,12,14-heptaen-6-yl]-2,9-diazatricyclo[9.4.0.0^{3,8}]pentadeca-1(11),3,5,7,9,12,14-heptaene

Fig. 6. Barcode representation of a molecule.

(b) InChIKey ¼ KEFNWQBDDAALRP-UHFFFAOYSA-N


The InChI representation of a chemical structure is the most suitable for automated inventory management solutions using barcodes. For example, the InChI code of the chemical structure is represented as a barcode as shown in Fig. 6. This barcode contains 1,309 characters representing the chemical structure in InChI format, describing the connectivity, atom information, etc. Earlier work on barcoding chemical structures used the SMILES format.
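The capacity argument above can be made concrete. The following Python sketch checks whether a text payload fits a single PDF417 symbol and whether a string has the InChIKey block layout; the ~1,850-character capacity constant is a commonly cited figure for PDF417 text mode, and the helper names are ours, not part of any barcode library.

```python
import re

# PDF417 2D barcodes hold up to ~1,850 text characters (a commonly
# cited limit; the exact figure depends on the error-correction
# level), so a full InChI string usually fits in one symbol.
PDF417_TEXT_CAPACITY = 1850

def fits_in_pdf417(payload: str) -> bool:
    """Check whether a text payload fits a single PDF417 symbol."""
    return len(payload) <= PDF417_TEXT_CAPACITY

def looks_like_inchikey(key: str) -> bool:
    """Validate the fixed 14-10-1 uppercase-letter block layout
    of a 27-character InChIKey."""
    return re.fullmatch(r"[A-Z]{14}-[A-Z]{10}-[A-Z]", key) is not None
```

For example, `looks_like_inchikey("KEFNWQBDDAALRP-UHFFFAOYSA-N")` accepts the key shown above, and a 1,309-character InChI payload passes `fits_in_pdf417`.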
Data (inside barcode, in InChI format)
InChI=1/C36H37ClN8/c1-42-14-18-44(19-15-42)35-27-6-3-4-9-29(27)38-30-12-10-24(22-32(30)40-35)26-7-5-8-28-34(26)39-31-13-11-25(37)23-33(31)41-36(28)45-20-16-43(2)17-21-45/h3-13,22-23,38-39H,14-21H2,1-2H3
AuxInfo=1/0/N:45,22,37,36,14,38,13,15,35,25,2,26,3,41,43,18,20,40,44,17,21............ (Lines removed for brevity) ............
.1175,6.7142,0;.5991,7.123,0;1.3114,6.7068,0;-.8298,7.1304,0;
(c) SMILES Format
CN1CCN(CC1)C1=Nc2cc(ccc2Nc2ccccc12)-c1cccc2c1Nc1ccc(Cl)cc1N=C2N1CCN(C)CC1
(d) SMARTS:
[#6]N1CCN(CC1)C1=Nc2cc(ccc2Nc2ccccc12)-c1cccc2c1Nc1ccc(Cl)cc1N=C2N1CCN([#6])CC1
(e) MOL File Format
Structure #1
Marvin 07131117252D

45 52 0 0 0 0 999 V2000
6.1237 0.8510 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
.......... (Lines removed for brevity) ............
-0.8298 7.1304 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 4 0 0 0 0
............ (Lines removed for brevity) ............
43 44 1 0 0 0 0
M END
(f) Gaussian Cube file with atom data
No surface data provided
45 0.000000 0.000000 0.000000
1 1.889726 0.000000 0.000000
............. (Lines removed for brevity) ............
6 6.000000 -2.927110 25.152404 0.000000
0.00000E00
(g) PDB Format
HEADER PROTEIN 13-JUL-11 NONE
TITLE NULL
COMPND MOLECULE: Structure #1
SOURCE NULL
KEYWDS NULL
EXPDTA NULL
AUTHOR Marvin
REVDAT 1 13-JUL-11 0
HETATM 1 C UNK 0 11.431 1.589 0.000 0.00 0.00 C + 0
............. (Lines removed for brevity) ..............
HETATM 45 C UNK 0 -1.549 13.310 0.000 0.00 0.00 C + 0
CONECT 1 2 6 23
............ (Lines removed for brevity) ............
CONECT 45 42
MASTER 0 0 0 0 0 0 0 0 45 0 104 0
END
(h) XYZ format
45
Structure #1
C 11.43091 1.58853 0.00000
......... (Lines removed for brevity) ..............
C -1.54896 13.31008 0.00000
Software programs like Open Babel can interconvert molecules among over 50 standard file formats required by various computational chemistry and chemoinformatics programs (50). Dalby et al., in a classic paper, discussed the file formats developed at Molecular Design Limited (MDL) and their interrelations, which are required for storing and managing chemical structures (51).
RF tagging is a technology complementary to the barcode representation of molecules and is commonly used in security and inventory management (52). Yet another emerging technology, based on OCR (Optical Character Recognition), can recognize molecular structures, reactions, and text from scanned images of printed chemistry literature (53). This can save users the valuable time of redrawing structures from printed material, as it directly transforms the "images" into "real structures" that can then be saved into chemical databases. CLiDE, OSRA, and ChemOCR are well-known programs that perform such recognition (54–56).
One of the major breakthroughs arising from the growth of the World Wide Web is the evolution of a content-based markup language built on XML syntax: the Chemical Markup Language (CML), developed by Peter Murray-Rust (57). CML has become a valuable tool with the functionality to describe atomic, molecular, and crystallographic information; it captures structural information through a concise set of tags with associated semantics. CML representation is well documented; however, its size compared with other existing file formats makes it prohibitive for many applications. If a suitable tool were developed to store CML in compressed form without loss of information and freedom of use, it would encourage the user community to apply CML more widely. CCML is a methodology for encoding chemical structures as compressed CML generated by popular structure-drawing programs like JME (58). The CCML format consists of SMILES and/or equivalent data along with coordinate information about the atoms for generating chemical structures in plain text. Each structure, whether drawn in JME in standalone mode or generated by virtual means, can be stored in this format for efficient retrieval; because the SMILES describes the interconnectivity of the molecule, CCML requires about one tenth or less of the space of the full CML format. The CCML format is well suited to automated inventory applications.
An open-source computer program called Chem Robot has been developed that uses digital video devices to rapidly capture and analyze hand-drawn or computer-generated molecular structures on plain paper (59). The program can extract molecular images from live streaming digital video signals and from prerecorded chemistry-oriented educational videos. The images captured from these sources are transformed into vector graphics for edge detection, node detection, and Optical Character Recognition (OCR), and interpreted as bonds and atoms in a molecular context. The molecular information generated is then transformed into reusable data formats (MOL, SMILES, InChI, SDF) for modeling and simulation studies. The connection table and 2D atomic coordinates generated through this automatic process can be further used to generate IUPAC names of the molecules and to search chemical data in public and commercial chemical databases. With this software, digital webcams and camcorders can be used to recognize molecular structures from hand-drawn or computer-generated chemical images, and the underlying algorithms can be used to harvest chemical structures from other digital documents or images, such as PDF and JPEG files. Effective implementation of this program can enable automatic translation of chemical images into common or IUPAC names for chemical education and research, and the workflow can be extended to mobile devices (smartphones) equipped with Wi-Fi and a camera.

3.4. Handling Chemical Structure Information

Storing chemical structures in a proper file format is very important; for example, structures stored in web pages as GIF, JPG, BMP, or PNG images look alike but may not be compatible with database transfer or computer processing. In the late 1970s, Blake et al. developed methods for processing chemical structure information using a light pen to create the high-quality structures used in Chemical Abstracts volumes (60). A decade later, all the features of the structure-formatting guidelines for Chemical Abstracts publications appeared in a paper by Alan Goodson (61). Another interesting parallel development was the use of SNN (Structure–Nomenclature Notation), wherein the molecule was split into fragments at structure-determining vertices and linked by special signs for use in the Beilstein system (62). Igor developed a compact code for storing structural formulas and performing substructure and similarity searches, and applied it to a set of 50,000 structures (63). A modular architecture for chemical structure elucidation, called the mosaic or artel architecture, was also developed (64). Hagadone and Schulz used a relational database system in conjunction with a chemical software component to create chemical databases with enhanced retrieval capabilities (65).
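The relational-database approach can be sketched with SQLite: structure records keyed by a hashed identifier (here an InChIKey) support direct lookup. The table layout and sample records below are illustrative assumptions, not taken from ref. 65.

```python
import sqlite3

def build_db() -> sqlite3.Connection:
    """Create a minimal in-memory compound registry.

    Schema and sample rows are illustrative only; a production
    system would pair this with a chemistry toolkit for
    substructure search.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE compound ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT, smiles TEXT, inchikey TEXT UNIQUE)"
    )
    conn.executemany(
        "INSERT INTO compound (name, smiles, inchikey) VALUES (?, ?, ?)",
        [
            ("formaldehyde", "C=O", "WSFSSNUMVMOOMR-UHFFFAOYSA-N"),
            ("benzene", "c1ccccc1", "UHOVQNZJYSORNB-UHFFFAOYSA-N"),
        ],
    )
    return conn

def lookup_by_inchikey(conn, key):
    """Direct lookup: the hashed identifier acts as a structure key."""
    return conn.execute(
        "SELECT name, smiles FROM compound WHERE inchikey = ?", (key,)
    ).fetchone()
```

A unique index on the identifier column gives fast exact-structure retrieval, the "direct lookup" style of search mentioned earlier for chemical databases.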

3.5. Cluster Analysis and Classification of Chemical Structures

Clustering is the process of finding common features within a diverse class of compounds, and it requires multivariate analysis methods. In clustering, the consensus score and the distance between sets of compounds can be measured through mean or Euclidean distance measures. This score reflects the similarity or dissimilarity between classes of compounds and helps to identify potentially active or toxic substances through predictive studies. Peter Willett carried out a comparative study of various clustering algorithms for classifying chemical structures (66). An excellent review by Barnard and Downs illustrates the methods useful for clustering files of chemical structures (67). The Jarvis–Patrick algorithm is useful for clustering chemical structures on the basis of 2D fragment descriptors (68).
The Lipinski rule of five is one example in which the shared characteristics of drug molecules were derived by clustering a large number of drugs and lead molecules. ChemAxon provides clustering tools (Library MCS) to analyze hundreds or thousands of molecules via maximum common substructures (69) (Fig. 7).

Fig. 7. Clustering of compounds in Library MCS module of ChemAxon.
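A minimal sketch of fingerprint-based clustering follows: Tanimoto similarity over sets of on-bit positions, with a greedy threshold grouping as a crude stand-in for algorithms such as Jarvis–Patrick. The threshold value and the greedy strategy are illustrative choices, not a published method.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient between two fingerprints given as sets
    of on-bit positions (or fragment identifiers)."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def cluster(fps, threshold=0.6):
    """Greedy single-pass clustering: each fingerprint joins the
    first cluster whose representative is at least `threshold`
    similar, otherwise it starts a new cluster. Illustrative only."""
    reps, clusters = [], []
    for i, fp in enumerate(fps):
        for ci, rep in enumerate(reps):
            if tanimoto(fp, rep) >= threshold:
                clusters[ci].append(i)
                break
        else:
            reps.append(fp)
            clusters.append([i])
    return clusters
```

With fragment-bit sets as input, compounds sharing most of their substructure bits fall into the same cluster, mirroring the 2D-fragment clustering described above.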

3.6. Internet as a Repository of Chemical Structures

In situations where no chemical structure information is available, it is still possible to generate the necessary chemical structure data through metadata harvesting. Karthikeyan et al. developed ChemXtreme, a Java-based computer program to harvest chemical information from Internet web pages using the Google search engine in a distributed computing environment (70). ChemXtreme employs a "search the search engine" strategy, in which the URLs returned by the search engine are analyzed further via textual pattern analysis. This process resembles manual analysis of a hit list, where relevant data are captured and, by means of human intervention, mined into a format suitable for further analysis; ChemXtreme, on the other hand, transforms chemical information automatically into a structured format suitable for storage in databases and further analysis, and also provides links to the original information source. The query data retrieved from the search engine by the server are encoded, encrypted, and compressed, and then sent to all participating active clients in the network for parsing. Relevant information identified by the clients on the retrieved web sites is sent back to the server, verified, and added to the database for data mining and further analysis. Chemical names, including global identifiers like InChI or corporate identifiers like CAS registry numbers and Beilstein registry numbers, can be mapped to the corresponding structural information in a relational database system.
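The textual pattern analysis described above can be sketched for one identifier type: extracting candidate CAS registry numbers from free text and validating them with the published CAS check-digit rule (sum of the digits, each weighted by its position counted from the right, modulo 10). The regular expression is a simplification of what a real harvester would use.

```python
import re

CAS_PATTERN = re.compile(r"\b(\d{2,7})-(\d{2})-(\d)\b")

def cas_checksum_ok(cas: str) -> bool:
    """Validate a CAS Registry Number check digit: the last digit
    equals the weighted sum of the others (weights count up from
    the rightmost non-check digit), modulo 10."""
    digits = cas.replace("-", "")
    body, check = digits[:-1], int(digits[-1])
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == check

def harvest_cas(text: str):
    """Extract candidate CAS numbers from free text and keep only
    those that pass the checksum."""
    return [m.group(0) for m in CAS_PATTERN.finditer(text)
            if cas_checksum_ok(m.group(0))]
```

For example, formaldehyde's registry number 50-00-0 passes the checksum, while a transcription error such as 50-29-4 (for DDT's 50-29-3) is rejected.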

3.7. Structure Property Correlation

Physicochemical properties, bioactivities, and toxicity-related data of chemicals, available from the scientific literature or from experimental results, are used to build predictive models by applying advanced mathematical methods or machine-learning techniques. The principle of "similar structure with similar property" is applied when building such models: chemical structure descriptors or structural features are linked to an independent property of interest, such as activity or toxicity, through mathematical modeling or statistical techniques. The quality of a predictive model depends largely on the choice of relevant molecular descriptors and the accuracy of the experimental data. The applicability domain is one of the most important factors to take into consideration when building mathematical models or when applying prebuilt models for predictive studies. Explaining outliers in the training, test, and predicted sets is one of the requirements of modern structure–property–activity relationship studies. Several types of molecular descriptors and features are used for structure–property relationship studies; the most common are topological, electronic, and shape descriptors, and these three classes of molecular features are intimately related to each other. Programs commonly used to compute descriptors include Dragon, Molconnz, MOE, JOELib, PaDEL, and ChemAxon tools (71). Predictive models can be either continuous or binary: a continuous model predicts the property value within a range, whereas a binary model gives a yes-or-no outcome. Binary fingerprints, fragment keys
(MACCS), predefined pharmacophore features including aromatic,
hydrogen bond acceptors (HBA), or hydrogen bond donors
(HBD) generated from chemical structures helped to design better
and efficient lead molecules in drug discovery research. According
to IUPAC “A pharmacophore is an ensemble of steric and elec-
tronic features that is necessary to ensure the optimal supramolec-
ular interactions with a specific biological target and to trigger (or
block) its biological response” (72). In simple terms, a pharmaco-
phore represents the key features like hydrogen bond donors,
acceptors, centroids, aromatic rings, hydrophobic regions present
in the molecules which are likely to be responsible for bioactivity. In
pharmacophore-based modeling, the positions of consensus pharmacophore features from a set of aligned molecules with known bioactivity are captured in three-dimensional space along with their interatomic distances. These features are used to identify hits from a collection of molecules with unknown
bioactivity. Commercially available programs like Molecular Operating Environment (MOE), Catalyst (Accelrys), and Phase (Schrodinger) are equipped with these modules to design lead molecules (73) (Figs. 8 and 9).

Fig. 8. Pharmacophore model generated by aligning molecules in MOE program.
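A pharmacophore model of this kind reduces, in part, to feature positions and their pairwise distances. The sketch below computes interfeature Euclidean distances for three hypothetical features; the coordinates are invented for illustration and do not come from MOE or any cited model.

```python
import math
from itertools import combinations

# Hypothetical 3D coordinates (in angstroms) for three pharmacophore
# features: hydrogen-bond donor, acceptor, and an aromatic centroid.
FEATURES = {
    "donor": (0.0, 0.0, 0.0),
    "acceptor": (3.0, 4.0, 0.0),
    "aromatic": (0.0, 0.0, 5.0),
}

def feature_distances(features):
    """Return the Euclidean distance for every pair of features,
    the distance matrix a 3D pharmacophore query is built from."""
    return {
        (a, b): math.dist(features[a], features[b])
        for a, b in combinations(sorted(features), 2)
    }
```

Matching a candidate molecule then amounts to checking whether it presents the same feature types at compatible pairwise distances, within a tolerance.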
Experimental chemists and biologists are actually interested in
the properties of the chemicals and their response to biological
systems both in beneficial and in adverse effects contexts. Several
research groups across the world have compared chemical and drug
databases to identify the molecular descriptors that can be used to
classify molecules as drugs or nondrugs, toxins or non-toxins. In a
classic paper by David Bawden, the applications of structure
handling techniques for large-scale molecular property prediction
along with relevant descriptors have been described in detail (74).
Balaban et al. demonstrated correlation between chemical struc-
tures and boiling points of acyclic saturated compounds and
haloalkanes using topological descriptors (75). Helge et al. used
neural networks to predict the clearing temperatures of nematic
liquid crystalline phases of several compounds from their chemical
structures (76). Recently, Karthikeyan et al. built an artificial neural network-based machine learning model for predicting the melting points of diverse organic compounds (77). The quality of prediction
depends on the primary experimental data used for mathematical
modeling. A mathematical modeler is not expected to validate
the experimental data cited in the scientific literature. Therefore it
is the primary responsibility of the experimental chemist or biolo-
gist to publish high-quality data which is original, authentic and
reproducible.

3.7.1. Structure Correlation with Toxicity

Here it is pertinent to discuss in detail the role of molecular structures and functional groups in the toxicity context. Toxicity is another important parameter which needs to be assessed from molecular structures. TOPKAT is one such program, which predicts several toxicology-related endpoints from a given molecular structure based on the presence of selected structural patterns (78). The basic premise of the field is that molecular structure is related to the potential toxicity of a compound. This structure-based modeling
finds application in QSAR, QSTR, environmental toxicity, and high-throughput screening.

Fig. 9. Pharmacophoric annotation scheme in MOE.
A fragment present in a molecule that is responsible for eliciting toxicity, through interaction with a receptor or through metabolic activation, is called a toxicophore. To build models for predicting toxicology data, analysis of molecular structures in terms of the presence or absence of these toxicophores is most important. Figure 10 shows common toxicophores, which include phosphoric groups, acetylenic, acetylide, and acetylene halide groups, diazo, nitroso, nitro, nitrite, N-nitroso, N-nitro, and azo groups, peroxides, hydroperoxides, azides, sulfur groups, diazonium carboxylates, and halogenamines (79) (Fig. 10).
Fig. 10. Common toxicophores.

Among the toxicophores, epoxides and aziridines are electrophilic, alkylating substructures with significant intrinsic reactivity, and they are present in various publicly available compound data sets. It is necessary to identify these functional groups at an early stage of drug design and eliminate them cautiously using virtual screening. Toxicophore models have been developed for mutagenicity prediction, hERG channel blockers, and hepatotoxicity (80). In an interesting study by Hughes et al. on the influence of various physicochemical properties on in vivo toxicity, the TPSA (Topological Polar Surface Area) descriptor showed the strongest correlation with toxicity: a low TPSA allows compounds to pass through lipid membranes and distribute into tissue compartments, leading to high levels of toxicity. Other measures in this study included predicted blood–brain barrier penetration and ClogP; high-ClogP compounds showed a greater incidence of adverse outcomes relative to lower-ClogP compounds (81).
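An early-stage toxicophore screen can be sketched as follows. The SMILES substrings stand in for proper SMARTS matching on the molecular graph, and the TPSA/ClogP cutoffs are illustrative assumptions, not the values from the Hughes et al. study.

```python
# Substring-based structural alerts on SMILES text. Real systems
# match SMARTS patterns on the molecular graph; plain substring
# search is only a rough, illustrative stand-in.
TOXICOPHORE_SMILES_PATTERNS = {
    "nitro": "[N+](=O)[O-]",
    "azide": "N=[N+]=[N-]",
}

def flag_toxicophores(smiles: str):
    """Return the names of alert substrings found in a SMILES."""
    return sorted(
        name for name, pat in TOXICOPHORE_SMILES_PATTERNS.items()
        if pat in smiles
    )

def risky(smiles, tpsa, clogp, tpsa_min=75.0, clogp_max=3.0):
    """Flag a compound if it carries an alert, or if low TPSA /
    high ClogP suggest broad tissue distribution (cutoffs are
    illustrative assumptions, not published thresholds)."""
    return bool(flag_toxicophores(smiles)) or tpsa < tpsa_min or clogp > clogp_max
```

Such a filter is only a triage step: flagged compounds merit closer inspection, not automatic rejection.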

3.7.2. Virtual Library Enumeration

To design a better lead molecule, one has to perform a sequence of steps: collecting molecular data with known bioactivity, analyzing those chemical structures to extract significant features related to the activity of interest, and rebuilding new molecules with promising and favorable bioactivity profiles. A virtual library of diverse molecules that have not yet been synthesized can be enumerated from a set of scaffolds and functional groups by combinatorial means. Here a scaffold represents a molecule containing one or more rings connected by linker atoms. Scaffolds can be generated from complex molecular structures by systematic disconnection of functional groups connected by single bonds. Using this approach, a scaffold translator (ChemScreener and Scaffoldpedia) was built and applied to a large-scale data set of PubChem compounds containing over 12 million ring structures to generate a library of about 1 million scaffolds in a distributed computing environment (82). From this study, it was observed that over 300,000 molecules contain the common scaffold of a benzene ring. The scaffolds and functional groups generated can be further enumerated to build a virtual library of diverse organic molecules. An alternate approach, "lead hopping," is also available to replace a common scaffold with chemically and spatially equivalent core fragments (83).
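Combinatorial enumeration from scaffolds and functional groups can be sketched with itertools. The placeholder-string representation of attachment points is a deliberate simplification of a graph-based enumerator such as the one described above.

```python
from itertools import product

# A scaffold written as a SMILES-like template with {R1}/{R2}
# attachment points, and a small set of substituent fragments.
# Both are illustrative; a real enumerator works on molecular
# graphs and enforces valence rules.
SCAFFOLDS = ["c1ccc({R1})cc1{R2}"]  # benzene with two sites
GROUPS = ["O", "N", "Cl"]

def enumerate_library(scaffolds, groups):
    """Enumerate every scaffold x group x group combination."""
    library = []
    for scaffold in scaffolds:
        for r1, r2 in product(groups, repeat=2):
            library.append(scaffold.format(R1=r1, R2=r2))
    return library
```

With one scaffold and three groups at two sites, the library has 3 x 3 = 9 members; the combinatorial growth with more scaffolds and groups is what makes distributed computing attractive for large enumerations.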
Designing a better molecule is just one aspect of research; getting it synthesized and biologically evaluated is the most critical component of drug discovery programs. It is therefore necessary to rank molecules by a synthetic accessibility score, which represents the ease of synthesizing the required molecule by the shortest possible route with high yield (84). Reactivity pattern fingerprints are being developed to characterize molecules as either reactants or products based on the composition of their functional groups (85). The emerging trend of mapping the reactivity pattern of a molecule is illustrated by the following example: molecule "A" from the KEGG database, with a Reactant Like Score (RLS) of 52:22, contains more reactive groups and is likely to be a reactant in 52 reactions but a product in only 22 known reactions.

Fig. 11. (a) Reactant-like and (b) product-like molecule.

The converse is true for another molecule "B" with a Product Like Score (PLS) of 13:5: it is likely to be less reactive and could be synthesized through 13 different reactions but is likely to undergo just 5 types of reactions, based on its functional groups (Fig. 11). With the help of these PLS and RLS values, one can rapidly prioritize and characterize compounds as either reactants or products through in silico analysis and design efficient synthetic routes accordingly.
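Ranking by reactant- versus product-likeness can be sketched as a simple ratio of the two counts. The scoring function below is our illustrative reading of the RLS/PLS idea, not the published algorithm; the counts for "A" and "B" follow the example in the text.

```python
def reactant_likeness(counts):
    """Rank molecules by decreasing reactant/product ratio.

    counts: {name: (reactions_as_reactant, reactions_as_product)}.
    The ratio heuristic is an illustrative assumption, not the
    published RLS/PLS scoring scheme.
    """
    def ratio(item):
        as_reactant, as_product = item[1]
        return as_reactant / max(as_product, 1)
    return [name for name, _ in
            sorted(counts.items(), key=ratio, reverse=True)]
```

For the molecules above, "A" (52 reactions as reactant, 22 as product) ranks ahead of "B" (5 as reactant, 13 as product), matching the qualitative reading of their scores.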

3.7.3. Environmental Concerns

It is also important to discuss the characteristics of chemicals in the context of the environment. Many petrochemicals used in industrial, agricultural, or domestic settings reach the environment and become a source of grave concern in the long run. Certain chemicals, such as polymers, require harsh conditions or a long time to degrade into harmless substances in the environment. Some organic chemicals used as agrochemicals, for example pesticides and fungicides, persist in the environment and cause significant damage. Federal agencies are now taking measures to control substances that pose a potential risk to the environment and human lives. The Environmental Protection Agency (EPA) strives to develop models for predicting the characteristics of controlled substances and chemicals of environmental concern, such as persistent organic substances (POS), carcinogens, and mutagens. Software tools are available to study the correlation between structure and biodegradability, termed QSBR (quantitative structure–biodegradability relationship) (86). The EPA in the USA and the REACH initiative of the European Union are significant steps toward controlling the movement of large quantities of chemicals with potential risk (87). In the early days, animals were used for toxicological testing of chemicals; however, owing to resistance from bioethics groups and law enforcement by government agencies, the number of such tests is now restricted to a bare minimum. One way to assess
Fig. 12. Toxicological information system: regulatory bodies (EPA, TSCA, REACH), categories of substances (pesticides, fungicides, persistent organic substances such as DDT and Endosulfan, explosives such as RDX, and controlled substances), and associated databases (DSSTOX, RTECS, MSDS, TOXNET, NIOSTIC).

toxicity is to use data from past experiments and to build better
predictive models as a substitute for animal toxicity testing, and to apply these models to develop risk assessment procedures. Over time, several toxicological models have evolved and are frequently used for in silico prediction of the environmental toxicity, mutagenicity, carcinogenicity, and ecotoxicity of chemicals. The EPI (Estimation Programs Interface) Suite developed by the EPA is a physical/chemical property and environmental fate estimation program. EPI Suite uses a set of estimation programs including KOWWIN, AOPWIN, HENRYWIN, MPBPWIN, BIOWIN, BioHCwin, KOCWIN, WSKOWWIN, WATERNT, BCFBAF, HYDROWIN, KOAWIN, AEROWIN, WVOLWIN, STPWIN, LEV3EPI, and ECOSAR (88). Every module in this and similar programs has its own level of approximation and accuracy. It is also important to remember that computational or predictive studies have their own merits and demerits, and preference should be given to available experimental data over computed or estimated data, especially when making decisions on processes or products of interest. DDT and Endosulfan are two compounds historically used as pesticides that have been banned by most countries due to their adverse effects on human health and potential environmental toxicity (89) (Fig. 12).
Today, predictive models are used extensively as guiding tools, essentially as a replacement for animal experiments and environmental damage estimation. It is very important to learn from past experiments and incidents about the dangerous nature of chemicals, and to design better, safer, and greener chemicals for the future with the help of these knowledge-based tools and methods.

4. Notes

Efficient handling of chemical structures is a basic need of the computational toxicologist, and any development in this area is sure to find many applications in organizations worldwide. The user has to choose an arsenal of methods and software cautiously in order to make informed decisions. There is no substitute for human wisdom, and care has to be taken when handling extremely sensitive data for building toxicity prediction models. In cases where wet-lab validation is prohibitive, comparison with the output of other modeling methods should be practiced. To conclude, though the field of structure representation and manipulation appears mature, there is still ample scope for advancement in generic, complex, mixture, or polymeric structure representation, and in predictive studies with precision.

References
1. Ash JE, Warr WA, Willett P (1991) Chemical structure systems. Ellis Horwood, Chichester
2. http://pubchem.ncbi.nlm.nih.gov/. Accessed 11 July 2012
3. http://www.ncbi.nlm.nih.gov/sites/entrez?db=pccompound&term=formaldehyde. Accessed 11 July 2012
4. http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=712. Accessed 11 July 2012
5. http://pubchem.ncbi.nlm.nih.gov/search/search.cgi#. Accessed 10 July 2012
6. http://www.genome.jp/kegg/. Accessed 11 July 2012
7. https://www.ebi.ac.uk/chembldb/. Accessed 11 July 2012
8. http://www.drugbank.ca/. Accessed 11 July 2012
9. http://www.epa.gov/ncct/dsstox/index.html. Accessed 11 July 2012
10. http://www.cdc.gov/niosh/rtecs/. Accessed 11 July 2012
11. http://www.cdc.gov/niosh/database.html. Accessed 11 July 2012
12. http://www.msds.com/. Accessed 11 July 2012
13. http://www.cas.org. Accessed 11 July 2012
14. http://beilstein.com. Accessed 11 July 2012
15. https://scifinder.cas.org. Accessed 11 July 2012
16. Gasteiger J, Engel T (eds) (2003) Chemoinformatics: a textbook. Wiley-VCH, Weinheim
17. Quadrelli L, Bareggi V, Spiga S (1978) A new linear representation of chemical structures. J Chem Inf Comput Sci 18:37–40
18. www.ccl.net/cca/documents/molecular-modeling/node3.html. Accessed 11 July 2012
19. Warr WA (2011) Representation of chemical structures. Vol 1. Wiley, New York. doi:10.1002/wcms.36
20. Fritts L, Schwind E, Margaret M (1982) Using the Wiswesser line notation (WLN) for online, interactive searching of chemical structures. J Chem Inf Comput Sci 22:106–9
21. Weininger D (1990) SMILES Graphical depiction of chemical structures. J Chem Inf Comput Sci 30:237–43
22. www.daylight.com/dayhtml/doc/theory/theory.smarts.html. Accessed 11 July 2012
23. Ash S, Malcolm AC, Homer RW, Hurst T, Smith GB (1997) SYBYL Line Notation (SLN): A versatile language for chemical structure representation. J Chem Inf Comput Sci 37:71–79
24. McNaught A (2006) The IUPAC International Chemical Identifier: InChI. Chemistry International (IUPAC) 28(6). http://www.iupac.org/publications/ci/2006/2806/4_tools.html. Accessed 11 July 2012
25. Grave KD, Costa F (2010) Molecular graph augmentation with rings and functional groups. J Chem Inf Model 50:1660–8
26. www.lohninger.com/helpcsuite/connection_table.htm. Accessed 11 July 2012
27. Wipke WT, Dyott TM (1974) SEMA the stereochemically extended algorithm. J Am Chem Soc 96:4834
28. Christie BD, Leland BA, Nourse JG (1993) Structure searching in chemical databases by direct lookup methods. J Chem Inf Comput Sci 33:545–7
29. Helmut B (1982) Stereochemical structure code for organic chemistry. J Chem Inf Comput Sci 22:215–22
30. Bath PA, Poirrette AR, Willett P, Allen FH (1994) Similarity searching in files of three-dimensional chemical structures: Comparison of fragment-based measures of shape similarity. J Chem Inf Comput Sci 34:141–7
31. http://www.molecular-networks.com. Accessed 11 July 2012
32. Nicklaus MC, Milne GW, Zaharevitz D (1993) ChemX and Cambridge Comparison of computer generated chemical structures with X ray crystallographic data. J Chem Inf Comput Sci 33:639–46
33. Clark DE, Jones G, Willet P et al (1994) Pharmacophoric pattern matching in files of three dimensional chemical structure: Comparison of conformational searching algorithms for flexible searching. J Chem Inf Comput Sci 34:197–206
34. Bond VL, Bowman CM, Davison LC et al (1979) On-line storage and retrieval of chemical information Substructure and biological activity screening. J Chem Inf Comput Sci 19:231–4
35. Wang Y, Bajorath J (2010) Advanced Fingerprint methods for similarity searching: balancing molecular complexity effects. Comb Chem High Throughput Screen 13:220–228
36. Leach A (2007) An introduction to chemoinformatics. Springer
37. Takahashi Y, Sukekawa M, Sasaki S (1992) Automatic identification of molecular similarity using reduced-graph representation of chemical structure. J Chem Inf Comput Sci 32:639–43
38. Zupan J (1989) Algorithms for chemists. Wiley, Chichester, UK
39. Wipke WT, Krishnan S, Ouchi GI (1978) Hash functions for rapid storage and retrieval of chemical structures. J Chem Inf Comput Sci 18:32–7
40. Hodes L, Feldman A (1978) An efficient design for chemical structure searching. II. The file organization. J Chem Inf Comput Sci 18:96–100
41. Brown R, Downs D, Geoffrey M et al (1992) Hyperstructure model for chemical structure handling: generation and atom-by-atom searching of hyperstructures. J Chem Inf Comput Sci 32:522–31
42. Barnard JM, Lynch MF, Welford SM (1982) Computer storage and retrieval of generic structures in chemical patents. 4. An extended connection table representation for generic structures. J Chem Inf Comput Sci 22:160–4
43. Barnard JM, Lynch MF, Welford SM (1981) Computer storage and retrieval of generic chemical structures in patents. GENSAL, a formal language for the description of generic chemical structures. J Chem Inf Comput Sci 21:151–61
44. Kaback SM (1980) Chemical structure searching in Derwent's World Patents Index. J Chem Inf Comput Sci 20:1–6
45. www.chemaxon.com. Accessed 11 July 2012
46. Rughooputh SDDV, Rughooputh HCS (2001) Neural network based chemical structure indexing. J Chem Inf Comput Sci 41:713–717
47. Karthikeyan M, Bender A (2005) Encoding and decoding graphical chemical structures as two-dimensional (PDF417) barcodes. J Chem Inf Model 45:572–580
48. Karthikeyan M, Krishnan S, Steinbeck C (2002) Text based chemical information locator from Internet (CILI) using commercial barcodes. 223rd American Chemical Society Meeting, Orlando, Florida, USA
49. Karthikeyan M, Uzagare D, Krishnan S (2003) Compressed Chemical Markup Language for compact storage and inventory applications. 225th ACS Meeting, New Orleans, March 23–27, 2003. CG ACS
50. Guha R, Howard MT, Hutchinson GR et al (2006) The Blue Obelisk—interoperability in chemical informatics. J Chem Inf Model 46:991–998
51. Dalby A, Nourse JG, Hounshell WD et al (1992) Description of several chemical structure file formats used by computer programs developed at Molecular Design Limited. J Chem Inf Comput Sci 32:244–55
52. Xiai XY, Li RS (2000) Solid phase combinatorial synthesis using microkan reactors, Rf tagging and directed sorting. Biotech Bioeng 71:41–50
53. en.wikipedia.org/wiki/Optical_character_recognition
54. Valko AT, Peter JA (2009) CLiDE Pro: The Latest Generation of CLiDE, a Tool for Optical Chemical Structure Recognition. J Chem Inf Model 49:780–787
55. cactus.nci.nih.gov/osra/
56. infochem.de/mining/chemocr.shtml
57. Murray-Rust P, Rzepa HS (2003) Chemical Markup, XML and the world wide web. CML Schema. J Chem Inf Comput Sci 43:757–72
58. http://www.molinspiration.com/jme/index.html. Accessed 11 July 2012
59. http://moltable.ncl.res.in. Accessed 11 July 2012
192 M. Karthikeyan and R. Vyas

60. Blake JE, Farmer NA, Haines RC (1977) An chemical properties like boiling points. J Chem
interactive computer graphics system for pro- Inf Comput Sci 32:237–44
cessing chemical structure diagrams. J Chem 76. Helge V, Volkmar M (1996) Prediction of
Inf Comput Sci 17:223–8 material properties from chemical structures.
61. Goodson AL (1980) Graphical representation The clearing temperature of nematic liquid
of chemical structures in Chemical Abstracts crystals derived from their chemical structures
Service publication. J Chem Inf Comput Sci by artificial neural Networks. J Chem Inf Com-
20:212–17 put Sci 36:1173–1177
62. Walentowski R (1980) Unique unambiguous 77. Karthikeyan M, Glen RC, Bender A (2005)
representation of chemical structures by com- General melting point prediction based on a
puterization of a simple notation. J Chem Inf diverse compound data set and artificial neural
Comput Sci 23:181–92 networks. J Chem Inf Model 45:581–90
63. Strokov I (1995) Compact code for chemical 78. http://accelrys.com/solutions/scientific-need/
structure storage and retrieval. J Chem Inf predictive-toxicology.html. Accessed 11 July
Comput Sci 35:939–44 2012
64. Strokov I (1996) A new modular architecture 79. Hakimelahi GH, Khodarahmi GA (2005) The
for chemical structure elucidation systems. identification of toxicophores for the predic-
J Chem Inf Comput Sci 36:741–745 tion of mutagenicity, hepatotoxicity and cardi-
65. Hagadone TR, Schulz MW (1995) Capturing otoxicity. JICS 2:244–267
Chemical Structure Information in a Relational 80. Garg D, Gandhi T, Gopi Mohan C (2008)
Database System: The Chemical Software Exploring QSTR and toxicophore of HERG K +
Component Approach. J Chem Inf Comput channel blockers using GFA and HypoGen tech-
Sci 35:879–84 niques. J Mol Graph Model 26:966–76
66. Willett Peter J (1984) Evaluation of relocation 81. Hughes JD, Blagg J, Da P et al (2008) Physi-
clustering algorithms for the automatic classifi- cochemical drug properties associated with
cation of chemical structures. J Chem Inf Com- in vivo toxicological outcomes. Bioorg Med
put Sci 24:29–33 Chem Lett 18:4872–4875
67. Barnard JM, Downs GM (1992) Clustering of 82. Karthikeyan M, Krishnan S, Pandey AK, Bender
chemical structures on the basis of two- A, Tropsha A (2008) Distributed chemical com-
dimensional similarity measures. J Chem Inf puting using Chemstar: an open source Java
Comput Sci 32:644–9 Remote Method Invocation architecture applied
68. Gu Q, Xu J, Gu L (2010) Selecting diversified to large scale molecular data from Pubchem.
compound to build a tangible library for J Chem Info Comput Sci 48:691–703
biological and biochemical assay. Molecules 83. Maass P (2007) Recore: A fast and versatile
15:5031–44 method for scaffold hopping based on small
69. www.chemaxon.com/jchem/doc/user/Lib molecule crystal structure conformations.
MCS.html. Accessed 11 July 2012 J Chem Inf Model 47:390–9
70. Karthikeyan M, Krishnan S, Pandey AK (2006) 84. Ertl P, Schuffenhauer A (2009) Estimation of
Harvesting chemical information from the synthetic accessibility score of drug like mole-
internet using a distributed approach: Chem cules based on molecular complexity and frag-
Extreme. J Chem Inf Model 46:452–461 ment contributions. J Cheminform 1:8
71. www.moleculardescriptors.eu/softwares/soft 85. Melvin JYu (2011) Natural product like virtual
wares.htm. Accessed 11 July 2012 libraries: Recursive atom based enumeration.
72. Horvath D (2011) Pharmacophore-based vir- J Chem Inf Model 51:541–557
tual screening In: Bajorath J (ed) Chemoinfor- 86. Yang H, Jiang Z, Shi S (2006) Aromatic com-
matics and Computational Chemical Biology, pounds biodegradation under anaerobic condi-
Methods in Molecular Biology, Humana Press tions and their QSBR models. Sci Total
672:261–298 Environ 358:265–76
73. Yang SY (2010) Pharmacophore modeling 87. http://www.epa.ie/whatwedo/monitoring/
and applications in drug discovery challenges reach/. Accessed 11 July 2012
and recent advances. Drug Discov Today 15: 88. Zhang X, Brown TN, Wania F (2010) Assess-
444–50 ment of chemical screening outcomes based on
74. Adamson GW, Bawden D (1981) Comparison different partitioning property estimation
of hierarchical cluster analysis techniques for methods. Environ Int 36:514–20
automatic classification of chemical structures. 89. Qiu X, Zhu T, Wang FHJ (2008) Air water gas
J Chem Inf Comput Sci 21:204–9 exchange of organochlorine pesticides in
75. Balaban AT, Kier LB, Joshi N (1992) Correla- Taihu lake China. Environ Sci Technol 42:
tions between chemical structure and physico- 1928–32
Chapter 9

Accessing and Using Chemical Property Databases


Janna Hastings, Zara Josephs, and Christoph Steinbeck

Abstract
Chemical compounds participate in all the processes of life. Understanding the complex interactions of
small molecules such as metabolites and drugs and the biological macromolecules that consume and
produce them is key to gaining a wider understanding in a systemic context. Chemical property databases
collect information on the biological effects and physicochemical properties of chemical entities. Accessing
and using such databases is key to understanding the chemistry of toxic molecules. In this chapter, we
present methods to search, understand, download, and manipulate the wealth of information available in
public chemical property databases, with particular focus on the database of Chemical Entities of Biological
Interest (ChEBI).

Key words: Chemistry, Databases, Ontology, ChEBI, Chemical graph, Cheminformatics, Chemical
properties, Structure search, Chemical nomenclature

1. Introduction

Small molecules are the chemical entities which are not directly
encoded by the genome, but which nevertheless are essential
participants in all the processes of life. They include many of the
vitamins, minerals, and nutritional substances which we consume
daily in the form of food, the bioactive content of most medicinal
preparations we use for the treatment of diseases, the neurotrans-
mitters which modulate our mood and experience, and the meta-
bolites which are transformed and created in a multitude of
complex pathways within our cells.
Understanding the complex interactions of small molecules
such as metabolites and drugs with the biological macromolecules
that consume and produce them is key to gaining a wider biological
understanding in a systemic context, such as can enable the predic-
tion of the therapeutic—and toxic—effects of novel chemical sub-
stances. Ultimately, all therapeutic and harmful effects of chemical
substances depend on the constitution, shape, and chemical

Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_9, # Springer Science+Business Media, LLC 2012


properties of the molecules which influence their interactions
within the body. In order to study these effects, it is therefore
crucial to access databases, which provide clear and accurate data
on these aspects of chemical entities.
Chemical property databases collect information on the nature
and properties of chemical entities. Accessing and using such data-
bases is an essential tool in understanding the chemistry of toxic
molecules. In this chapter, we present methods to search, under-
stand, download, and manipulate the wealth of information avail-
able in public chemical property databases, with particular focus on
the Chemical Entities of Biological Interest (ChEBI) database (1).

2. Materials

2.1. Describing Chemical Structures with Chemical Graphs

The shape and properties of chemical entities depend to a large
extent on the molecular structure of the entity—that is, the
arrangement of atoms and bonds. Chemical graphs are a way of
representing these core elements in a concise formalism. The chem-
ical (molecular) graph describes the atomic connectivity within a
molecule by using labelled nodes for the atoms or groups within
the molecule, and labelled edges for the (usually covalent) bonds
between the atoms or groups (2).
The graph, strictly speaking, encodes only the constitution of
molecules, i.e., their constituent atoms and bonds. However, the
representational formalism is usually extended to include other
information such as idealised 2D or 3D coordinates for the
atoms, and bond order and type (single, double, triple, or aro-
matic). The chemical graph formalism is also accompanied by a
standard for diagrammatic representation. Figure 1 illustrates an
example of a chemical graph, 2D and 3D coordinates, and the
corresponding 2D and 3D visualisations.
Hydrogen nodes, and the edges linking them to their nearest
neighbouring atoms, are not explicitly displayed in Fig. 1, and
carbon nodes, which form the vertices of the illustration, are not
explicitly labelled as such. These representational economies are
possible because of the prevalence of carbon and hydrogen in
organic chemistry, and have been introduced for clarity of depiction
and efficiency of storage. Hydrogen-suppressed graphs of this form
are called skeleton graphs.
Chemical graphs simplify the complex nature of chemical entities
and allow for powerful visualisations, which assist the work of both
bench and computational chemists. They also enable many useful
predictions to be made about those physical and chemical properties
of molecules, which are based on connectivity. For these reasons, most
chemical property databases make use of the chemical graph formal-
ism for the storage of structural information about chemical entities.

Fig. 1. The chemical graph (connection table) for a simple cyclohexane molecule, together with idealised 2D and 3D
coordinates and visualisations (the 3D coordinates and visualisation show the chair conformer of cyclohexane).

2.2. Exchanging Data on Chemicals

The most common format used to encode and exchange chemical
graphs is the MOLfile format owned by Elsevier's MDL (3). The
MOLfile is a flat ASCII text file with a specific structured format,
consisting of an atom table, describing the atoms contained in the
chemical entity, and a bond table, describing the bonds between
the atoms. Both the atom table and the bond table are extended
with additional properties including the isotope and charge of
the individual atoms, and the bond order and type of the bonds.
The MOLfile representation of the cyclohexane molecule is illu-
strated in Fig. 2.
The content of a MOLfile depends on the way in which the
chemical structure is drawn. For this reason, it is not possible to
efficiently check whether two representations are of the same chem-
ical based on the MOLfile, and therefore a canonical (i.e., the same
regardless of which way the molecule is drawn) and unique repre-
sentation of the molecule is needed for efficient identification. An
international standard for identification of chemical entities is the
IUPAC International Chemical Identifier (InChI) code. The InChI
is a nonproprietary, structured textual identifier for chemical enti-
ties which is generated by an algorithm from a MOLfile represen-
tation of the chemical entity (4). The generated InChI identifier is
not intended to be read and understood by humans, but is particu-
larly useful for computational matching of chemical entities, such as
when a computer has to align two different sets of data derived

Fig. 2. The MOLfile format for a cyclohexane molecule. The carbon atoms appear in the atom table; in the bond table the
bonds between atoms are listed, with line numbers from the atom table identifying the atoms participating in each bond.
Additional property columns allow for representation of many more features such as charge and stereochemistry.

from different sources. InChIs have variable lengths depending on
the complexity of the chemical structure, and as a result, database
lookups can be slow. The InChIKey is a hashed key for the InChI,
which allows easier database lookups as it has fewer characters and is
of a specified, invariant length of 14 characters. The InChI and
InChIKey for paracetamol are as follows
InChI=1/C8H9NO2/c1-6(10)9-7-2-4-8(11)5-3-7/h2-5,11H,1H3,(H,9,10)/f/h9H
InChIKey=RZVAJINKPMORJF-BGGKNDAXCW
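The benefit of a fixed-length hashed key for database lookups can be illustrated in a few lines of Python. Note that this sketch uses a generic SHA-256 digest purely as an analogy: it is not the actual InChIKey algorithm, which has its own hashing and encoding scheme.

```python
import hashlib

def hashed_key(inchi: str, length: int = 14) -> str:
    """Derive a fixed-length lookup key from a variable-length InChI string.
    Illustrative only: the real InChIKey uses its own SHA-256-based encoding."""
    return hashlib.sha256(inchi.encode("utf-8")).hexdigest()[:length].upper()

# However long or short the InChI, the derived key has the same fixed length,
# so a database can index it in a fixed-width column for fast exact lookups.
k1 = hashed_key("InChI=1S/H2O/h1H2")
k2 = hashed_key("InChI=1S/C8H9NO2/c1-6(10)9-7-2-4-8(11)5-3-7/h2-5,11H,1H3,(H,9,10)")
print(len(k1), len(k2))  # 14 14
```

Because the key is deterministic, two databases that compute it from the same identifier string will always agree, which is what makes hashed keys useful for matching records across sources.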
The InChI is an example of a line notation for chemical struc-
tures, expressing structural information on a single line. But the
InChI is not intended to be read by humans. Another line notation
for chemical structure representation, which is easier for humans
to parse while still providing a compact representation, is the
SMILES (Simplified Molecular Input Line Entry Specification)
format representation. The original SMILES specification was
developed by Arthur Weininger and David Weininger in the late
1980s. It has since been modified and extended by others, most
notably by Daylight Chemical Information Systems, Inc. (5). The
SMILES for paracetamol is
CC(=O)Nc1ccc(O)cc1
Note that hydrogen atoms are implicit in SMILES (as they
were in the example MOLfile representation for cyclohexane given
above), and to further reduce the space used in the representation,
single bonds are also implicit between neighbouring atoms. Atoms
are numbered when necessary to illustrate where the same atom

appears again (which is the case when the molecule contains a
cycle). Brackets indicate branching, and "=" indicates a double
rather than a single bond. Lowercase letters indicate aromaticity.
The SMILES representation retains a high degree of human
readability while compressing the structural encoding of the mole-
cule into as few characters as possible. However, different algo-
rithms exist which produce different SMILES codes for the same
molecule, so care must be taken to cite the implementation used
when storing and comparing structures of molecules using
SMILES.

2.3. Repositories of Chemical Property Data

Until fairly recently, the bulk of chemical data was only available
through proprietary chemical databases such as Chemical Abstracts
Service (CAS), provided by the American Chemical Society (6), and
Beilstein, which since 2007 has been provided by Elsevier (7).
However, in recent years this has been changing, with more and
more chemical data being brought into the public domain. This
change has been brought about partly through the efforts of the
bioinformatics community, which needed access to chemistry data
to support systems-wide integrative research, and partly through
the joint efforts of pharmaceutical companies to reduce the expense
of pre-competitive research, since pharmaceutical companies had
historically each maintained their own database of chemicals for
pre-competitive research (8).
In 2004, two complementary open access databases were
initiated by the bioinformatics community, ChEBI (1) and Pub-
Chem (9). PubChem serves as an automated repository of the
structures and biological activities of small molecules, containing 29
million compound structures. ChEBI is a manually annotated data-
base of small molecules containing around 620,000 entities. Both
resources provide chemical structures and additional useful infor-
mation such as names and calculated chemical properties. ChEBI
additionally provides a chemical ontology, in which a structure-
based and role-based classification for the chemical entities is
provided. Additional publicly available resources for chemistry
information which are becoming more widely used are ChemSpider
(10), which provides an integrated cheminformatics search plat-
form across many publicly available databases, and the Wikipedia
Chemistry pages (11). Many smaller databases also exist, often
dedicated to particular topic areas or types of chemicals. For a full
listing of publicly available chemistry data sources, see (12), and for
a discussion see (13). Table 1 gives a brief listing of some of the
publicly available databases for chemical structures and properties.

2.4. Software Libraries Manipulating chemical data and generating chemical properties
programmatically requires the use of a cheminformatics software
library which is able to perform standard transformations and
provides implementations for common algorithms. The Chemistry

Table 1
A listing of several chemical property databases which are publicly available
(i.e., not commercial)

ChEBI: A freely available database and ontology of chemical entities of biological interest. http://www.ebi.ac.uk/chebi
PubChem: A deposition-supplied database of publicly available chemical entities and bioactivity assays. http://pubchem.ncbi.nlm.nih.gov/
DrugBank: A database collecting information about drugs, drug targets, and drug active ingredients. http://www.drugbank.ca/
Spectral DB for organic compounds (SDBS): A database for 1H NMR and 13C NMR spectra; also FT-IR, Raman, ESR, and MS data. http://riodb01.ibase.aist.go.jp/sdbs/cgi-bin/cre_index.cgi
NMR spectra DB: A resource dedicated to biomolecules (proteins, nucleic acids, etc.), providing raw spectral data. http://www.bmrb.wisc.edu/
NmrShiftDB: Organic structures and their NMR (H-1 and C-13) spectra. http://www.ebi.ac.uk/nmrshiftdb/
MassBank: Mass spectral data of metabolites. http://www.massbank.jp/index.html?lang=en
caNanolab: A portal for accessing data on nanoparticles important in cancer research. http://nano.cancer.gov/collaborate/data/cananolab.asp
Japan Chemical Substance Dictionary (JCSD, Nikkaji web): Structure, synonyms, systematic names, and MOL files for a wide variety of chemical entities. http://nikkajiweb.jst.go.jp/nikkaji_web/pages/top_e.jsp
GlycosuiteDB: A curated glycan database. http://glycosuitedb.expasy.org/glycosuite/glycodb
Cambridge Structural DB: Crystal structure data from the CCDC. http://www.ccdc.cam.ac.uk/products/csd/
Chemexper chemical directory: Gives structures and MOL files for a large number of entities, as well as seller info. http://www.chemexper.com/
ChemicalBook: Structure, CAS no., MOL file, synonym list, and extensive physicochemical data (melting and boiling points, etc.). http://www.chemicalbook.com/
IUPAC gold book: Useful reference work for terminology and definitions. http://goldbook.iupac.org/
ChemBlink: An online database of chemicals from around the world. Gives molecular structure and formula, CAS no., and hyperlinks to lists of suppliers and market analysis reports. http://www.chemblink.com/index.htm

Development Kit (CDK) is a collaboratively developed open
source software library providing implementations for many of
the common cheminformatics problems (14, 15). These range
from Quantitative Structure–Activity Relationship descriptor cal-
culations to 2D and 3D model building, input and output in
different formats, SMILES parsing and generation, ring searches,
isomorphism checking, and structure diagram generation. It is
written in Java and forms the backbone of a growing number of
cheminformatics and bioinformatics applications.
When interacting with publicly available chemical property
databases, it is often necessary to draw or upload a chemical
structure for the purpose of searching. For this purpose, it will
be necessary to make use of a chemical structure editor, usually
embedded into the website of the chemical database. JChemPaint
is one such editor, a freely available open source Java-based 2D
editor for molecular structures (16). It provides most of the
standard drawing features of commercial chemical structure edi-
tors, including bonds of different orders, stereodescriptors, struc-
ture templates with complex structures, atom type selection from
the periodic table, rotation and zooming of the image, selection
and deletion of parts of the structure, and input and output of
chemical structure in various common formats including SMILES
and InChI.

3. Methods

3.1. Searching for Entities

3.1.1. Searching Using Text and Property Values

The first entry point in accessing chemical data is the search inter-
face provided by the database one is accessing. Generally, such
databases provide a "simple" search as a first entry point, which
takes the search string and searches across all data fields in the
database to bring back candidate hits. For example, ChEBI pro-
vides a search box in the centre of the ChEBI front page and in the
top right of every entry page. The search query may be any data
associated with an entity, such as names, synonyms, formulae, CAS
or Beilstein Registry numbers, or InChIs.
When searching any database with free text, it is important to
know the wildcard character, which allows one to match part of a
word or phrase. For the ChEBI database, the wildcard character is
“*.” A wildcard character allows one to find compounds by typing
in a partial name. The search engine will then try to find names
matching the pattern one has specified.
- To match words starting with a search term, add the wildcard
  character to the end of the search term. For example, searching
  for aceto* will find compounds such as acetochlor, acetophenazine,
  and acetophenazine maleate.
- To match words ending with a search term, add the wildcard
  character to the start of the search term. For example, searching
  for *azine will find compounds such as 2-(pentaprenyloxy)dihy-
  drophenazine, acetophenazine, and 4-(ethylamino)-2-hydroxy-6-
  (isopropylamino)-1,3,5-triazine.
- To match words containing a search term, add the wildcard
  character to the start and the end of the search term. For
  example, searching for *propyl* will find compounds such as
  (R)-2-hydroxypropyl-CoM, 2-isopropylmaleic acid, and 2-methyl-
  1-hydroxypropyl-TPP.
- Any number of wildcard characters may be used within a search
  term, thus making the search facility very powerful.
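This wildcard behaviour can be mimicked with the standard Python fnmatch module; the compound names below are a small sample drawn from the examples in this section, not a real database export.

```python
from fnmatch import fnmatch

# A small sample of compound names (not an actual ChEBI export).
names = [
    "acetochlor",
    "acetophenazine",
    "acetophenazine maleate",
    "2-isopropylmaleic acid",
    "chlorine",
]

def wildcard_search(pattern, names):
    """Case-insensitive wildcard search, with '*' matching any run of characters."""
    return [n for n in names if fnmatch(n.lower(), pattern.lower())]

print(wildcard_search("aceto*", names))    # starts with 'aceto'
print(wildcard_search("*azine*", names))   # contains 'azine'
print(wildcard_search("*propyl*", names))  # contains 'propyl'
```

Note that, as in ChEBI, "aceto*" anchors the match at the start of the name, while "*azine*" matches the term anywhere inside it.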
Even using wildcards to extend the search capability, simple
text searching in chemistry databases can be problematic. Many of
the interesting properties for search purposes are structural or
numeric and therefore not easily expressed in text. To this end,
most chemistry databases provide a chemistry-specific advanced
search interface which allows for searches based on chemical prop-
erties such as mass and formula and, for example, value ranges for
properties such as charge. In ChEBI, the advanced text search
provides for additional granularity by allowing one to specify
which category to search in, as well as providing the option of
using Boolean operations when searching. Figure 3 illustrates the
ChEBI advanced search page, whose search options allow the
user to combine multiple search criteria.
The user has the option to search by mass range (i.e., to find
compounds within a certain minimum and maximum mass), charge

Fig. 3. The ChEBI advanced search showing textual and chemical property-based search options. Multiple search options
may be specified and combined with the logical operators AND, OR, and BUT NOT. Text and property-based searches may
be combined with chemical structure searches.

range, chemical formula, and to combine these searches based on
Boolean operators. The standard Boolean operators are as follows.
- AND
  This operator allows one to find a compound which contains all
  of the specified search terms. For example, when searching for a
  pyruvic acid with formula C5H6O4, specifying *pyruvic acid
  C5H6O4 as the search term and selecting AND as the search
  option will retrieve acetylpyruvic acid.
- OR
  This operator allows one to type two or more words; the search
  then tries to find a compound which contains at least one of these
  words. For example, to find all compounds containing iron in the
  database, one could type in the search string iron fe Fe2 Fe3.
- BUT NOT
  Sometimes, common words can be a problem when searching,
  as they can return too many results. The BUT NOT operator
  can be used to limit the result set. For example, if one were
  looking for a compound related to chlorine but excluding acidic
  compounds, one could specify *chlor as the search string but
  qualify the search by specifying acid in the BUT NOT operator.
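Using set operations, the three Boolean modes can be sketched as follows, again over a small hypothetical name list rather than a real database:

```python
def matches(term, names):
    """Names containing the search term (a crude stand-in for the text search)."""
    return {n for n in names if term.lower() in n.lower()}

names = ["acetylpyruvic acid", "chlorine", "chloroacetic acid", "iron(2+)"]

# AND: names matching every term (set intersection).
and_hits = matches("pyruvic", names) & matches("acid", names)
# OR: names matching at least one term (set union).
or_hits = matches("iron", names) | matches("chlor", names)
# BUT NOT: names matching the first term minus those matching the excluded one.
butnot_hits = matches("chlor", names) - matches("acid", names)

print(sorted(and_hits))     # ['acetylpyruvic acid']
print(sorted(butnot_hits))  # ['chlorine']
```

The correspondence is direct: AND is set intersection, OR is union, and BUT NOT is set difference over the hit lists of the individual terms.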

3.1.2. Chemical Structure Searching

An important search method to master in chemical databases is
searching based on chemical structures. Structure searching is the
method by which a user is able to specify a chemical structure and
thereafter search through the database of chemical structures for
the search structure or similar structures. Different forms of struc-
ture searching include identity search, in which the exact entity
drawn is searched for in the database (which is useful if the name
of the entity is not known), substructure search, in which the entity
drawn is searched for as a wholly contained part of the structures
in the database, and similarity search, in which the entity drawn
is matched for similarity of features with the structures in the
database.
In order to perform a chemical structure search in ChEBI, the
structure must be provided via the JChemPaint applet embedded in
the structure search screen. The ChEBI structure search interface is
illustrated in Fig. 4.
Once the search is executed, a results page is shown with
matching hits. Clicking on the relevant ChEBI accession hyper-
linked under the search result image navigates to the entry page for
that entity. In addition, further structure-based searches may be
performed by hovering the mouse pointer over the displayed image
in the results grid and clicking on one of the resulting popup search
options. This is a shortcut which passes the illustrated structure
directly to the search facility.

Fig. 4. The JChemPaint applet for drawing chemical structures inside of web pages for performing chemical structure-
based searches. The bond selection utility is on the left and the atom selection along the bottom. The top menu includes
file tools and utilities, and along the right hand menu are common structures and an extended structure template library.
Structure search types are “Identity,” “Substructure,” and “Similarity”.

In ChEBI, structure searching is executed on the database in
the background using the ORCHEM library (17). When
performing identity searching, the search is based on the InChI,
which means that an InChI is generated from the drawn or
uploaded structure, and the database is then searched for exact
matches to that InChI, a straightforward string match. The sub-
structure and similarity search options are both based on chemical
fingerprints.
A fingerprint of a chemical structure is a way of representing
special characteristics of that structure in an easily searchable form.
Fingerprints are necessary because the problem of finding whether
a given chemical structure is a substructure of or is similar to,
another structure, is a computationally very expensive problem.
In fact, in the worst case, the time taken will increase exponentially
with the number of atoms. This makes running these searches
across whole databases intractable in practice. Fortunately,
a general (and intractable) substructure or similarity search algo-
rithm does not have to be used across the full database, since
various different heuristics can be used to drastically narrow the
number of candidates for the algorithm to be applied to. For
example, the chemical formula could be used as one such heuristic.
Consider a search for all structures which have paracetamol as a
substructure. The chemical formula of paracetamol is C8H9NO2.
This means that we could immediately eliminate from the search
candidates any structures which don’t contain at least those quan-
tities of carbon, hydrogen, nitrogen, and oxygen. This simple

heuristic acts as a screen which cuts down the number of structures
required to perform the full substructure search against, by a large
percentage.
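The formula screen just described can be sketched in a few lines: parse each molecular formula into element counts, then discard any candidate that lacks the required quantity of some element. The parser below handles only simple formulas of the form C8H9NO2 and is purely illustrative.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Parse a simple molecular formula such as 'C8H9NO2' into element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def passes_screen(query_formula, candidate_formula):
    """True if the candidate has at least the query's count of every element."""
    query, cand = parse_formula(query_formula), parse_formula(candidate_formula)
    return all(cand[el] >= n for el, n in query.items())

# Screening candidates that could contain paracetamol (C8H9NO2) as a substructure:
print(passes_screen("C8H9NO2", "C10H13NO2"))  # True: enough of each element
print(passes_screen("C8H9NO2", "C6H12O6"))    # False: no nitrogen at all
```

Only the candidates that pass this cheap count comparison would then be handed to the expensive substructure-matching algorithm.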
Fingerprints are designed to operate similarly as a screening
device; however, they are more generalised and abstract, allowing
more information about the structure to be encoded and thus elim-
inating a far larger percentage of search candidates. A fingerprint is
a boolean array (i.e. an array of 1s and 0s), or bitmap, in which the
characteristic features of structural patterns are encoded. Finger-
prints are created by an algorithm which generates a pattern for
- each atom in the structure,
- then each atom and its nearest neighbours, including the bonds
  between them,
- then each group of atoms connected by paths up to two bonds long,
- ...continuing, with paths of lengths 3, 4, 5, 6, 7, and 8 bonds long.
For example, the water molecule generates the following patterns:

water (HOH)
  0-bond paths: H, O, H
  1-bond paths: HO, OH
  2-bond paths: HOH

Every pattern in the molecule up to 8 bonds in length is generated.
Each generated pattern is hashed to a bit string; to create the final
bitmap representation of the fingerprint, the individual bit strings
are combined using the logical OR operation, yielding the final
1,024-bit fingerprint. Table 2 shows
how individual structural patterns might look for the example of
water, assuming for purposes of illustration that we have only a 10-
bit result to create the fingerprint.
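The hash-and-OR procedure can be imitated in miniature. The sketch below enumerates linear atom paths (up to two bonds) in a toy adjacency-list representation of water, hashes each pattern into a 10-bit bitmap, and ORs the bits together. The CRC32 hash and the simple path walk are stand-ins for the more elaborate schemes used by real fingerprinting implementations such as the CDK's.

```python
import zlib

# Toy molecule: water as an adjacency list (atom index -> element / neighbours).
atoms = {0: "H", 1: "O", 2: "H"}
bonds = {0: [1], 1: [0, 2], 2: [1]}

def paths_up_to(max_bonds):
    """Enumerate linear atom paths of 0..max_bonds bonds (no atom revisited)."""
    found = set()
    def walk(path):
        found.add("".join(atoms[i] for i in path))
        if len(path) <= max_bonds:          # path of k atoms spans k-1 bonds
            for nxt in bonds[path[-1]]:
                if nxt not in path:
                    walk(path + [nxt])
    for start in atoms:
        walk([start])
    return found

def fingerprint(patterns, n_bits=10):
    """Hash each pattern to one bit position and OR all the bits together."""
    bitmap = 0
    for p in patterns:
        bitmap |= 1 << (zlib.crc32(p.encode()) % n_bits)
    return bitmap

fp = fingerprint(paths_up_to(2))
print(format(fp, "010b"))  # a 10-bit fingerprint for water
```

The enumerated patterns are exactly those listed above for water (H, O, HO, OH, HOH); collisions in the tiny 10-bit bitmap are expected and illustrate why real fingerprints use 1,024 bits or more.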
Since the final result represents “infinitely” many structural
possibilities (as there are infinitely many chemical structures, at
least in theory) in a fixed length bitmap, it is inevitable that colli-
sions will occur—bits may be set already when they appear in a
subsequent pattern. Thus, fingerprints do not uniquely represent
chemical structures. However, fingerprints do have the very useful
property that every bit set in the fingerprint of a substructure of a
given structure, will also be set in the fingerprint of the full struc-
ture. So, for example, the fingerprint pattern for the water sub-
structure hydroxide (OH), which is a part of the full water
molecule, would be computed as shown in Table 3. By comparing
the fingerprint for hydroxide with the fingerprint for water given in
Table 2, it is easy to see that every bit set in the hydroxide finger-
print is also set in the water fingerprint.

Table 2
Bitmaps for individual structural patterns and the resulting
combined fingerprint for a water molecule, in a simplified
fingerprinting example which uses a bitmap of length 10

Pattern   Hashed bitmap
H         0000010000
O         0010000000
HO        1010000000
OH        0000100010
HOH       0000000101
Result    1010110111

Table 3
Bitmaps for individual structural patterns and the resulting
combined fingerprint for hydroxide, which is a substructure
of the water molecule

Pattern   Hashed bitmap
H         0000010000
O         0010000000
OH        0000100010
Result    0010110010

For substructure searching, fingerprints are used as effective
screening devices to narrow the set of candidates for a full substruc-
ture search. Only if all bits in a query fingerprint are also present in
the target fingerprint of a stored database structure is that structure
subjected to the computationally expensive subgraph matching
algorithm. These bit operations are very fast and, owing to the fixed
length of the fingerprint, independent of the number of atoms in a
structure. For similarity searching, fingerprints are used as input
to the calculation of the similarity of two molecules in the form of
the Tanimoto coefficient, a measure of similarity. It is important to
remember that fingerprints have limitations: they are good at indi-
cating that a particular structural feature is not present, but they can
only indicate a structural feature's presence with some probability.
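Using the 10-bit fingerprints from Tables 2 and 3, the screening test is a single mask-and-compare. This sketch assumes fingerprints are held as plain integers:

```python
# Hedged sketch of fingerprint pre-screening for substructure search.
# A stored structure can contain the query substructure only if every bit
# set in the query fingerprint is also set in the target fingerprint:
# a necessary, but not sufficient, condition for a true substructure match.

def passes_screen(query_fp: int, target_fp: int) -> bool:
    """True if all bits of the query fingerprint are present in the target."""
    return query_fp & target_fp == query_fp

# The fingerprints from Tables 2 and 3:
water_fp = 0b1010110111
hydroxide_fp = 0b0010110010

# True: every hydroxide bit is present in water's fingerprint.
print(passes_screen(hydroxide_fp, water_fp))
```

Only the structures that pass this cheap screen are handed to the expensive subgraph matching step.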
9 Accessing and Using Chemical Property Databases 205

Table 4
The calculation of a Tanimoto similarity coefficient between
two chemical structures as represented by fingerprints

Object                   Fingerprint
Object A                 0010110010
Object B                 1010110001
c (bits on in object A and B)  3
a (bits on in object A)        4
b (bits on in object B)        5
Tanimoto, c/(a + b - c)        3/(4 + 5 - 3) = 0.5

The Tanimoto coefficient between two fingerprints is calculated as the ratio

    T(a, b) = c / (a + b - c)

where c is the count of bits "on" (i.e., 1 rather than 0) in the same position
in both fingerprints, a is the count of bits on in object A,
and b is the count of bits on in object B. An example of a Tanimoto
calculation of similarity between two fingerprints is given in Table 4.
The Tanimoto coefficient varies in the range 0.0–1.0, with a
score of 1.0 indicating that the two fingerprints are identical. This
is represented on the search page in terms of a percentage of
similarity. The percentage cutoff for a similarity search may be
specified at 70% (the default), higher at 80% or 90%, or lower at
50% or (the minimum) 25%.
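Under the same assumptions as Table 4 (fingerprints as equal-length bit strings), the Tanimoto calculation can be sketched as:

```python
# Sketch of the Tanimoto calculation from the text, applied to the
# Table 4 fingerprints.

def tanimoto(fp_a: str, fp_b: str) -> float:
    """T(a, b) = c / (a + b - c): a and b count the bits set in each
    fingerprint; c counts the bits set in both."""
    a = fp_a.count("1")
    b = fp_b.count("1")
    c = sum(1 for x, y in zip(fp_a, fp_b) if x == "1" and y == "1")
    return c / (a + b - c)

print(tanimoto("0010110010", "1010110001"))  # 3 / (4 + 5 - 3) = 0.5
```

A 70% cutoff on the search page then corresponds to keeping structures whose coefficient against the query is at least 0.7.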
For all of the structure-based searches in ChEBI, the user has
the option to select “explicit stereo” to include explicit stereochem-
istry in the search term. Additionally, the user has the option to
select more results than the default 200 and to determine the
pagination size of the results page (default 15 results per page).

3.2. Viewing a Typical Database Entry

After a successful search for an entity in a chemical property data-
base, the user will be presented with an entry page for that entity.
The entry page usually displays a visual representation of the struc-
ture of the entity together with useful information such as names
and synonyms, calculated properties, and classification information.
ChEBI database entries contain the following:
- A unique, unambiguous, recommended ChEBI name and an
associated stable unique identifier
- An illustration of the chemical structure where appropriate
(compounds and groups, but generally not classes), as well as
secondary structures such as InChI and related chemical data
such as the formula
- A definition, giving an important textual description of the
entity
- A collection of synonyms, including the IUPAC recommended
name for the entity where appropriate, and brand names and
INNs for drugs
- A collection of cross-references to other databases (where these
are sourced from nonproprietary origins)
- Links to the ChEBI ontology
- Citation information where the chemical has been cited in
publications
ChEBI entities are illustrated by means of a diagram of the
chemical structure. Following best practices, the default illustration
is an unambiguous, two-dimensional representation of the struc-
ture of the entity. Additional structures may be present; for exam-
ple, a three-dimensional structure. Where available, this can be
accessed by clicking on “More structures >>” on the main entity
page. Structures are stored as MOLfiles within ChEBI, and other
structural representations such as InChI are also provided.
The chemical structure can be interactively explored by activating
ChemAxon's MarvinView applet (http://www.chemaxon.com)
via the "applet" checkbox next to the structure image.
MarvinView allows the user to quickly and easily control many aspects
of the display, such as molecule display format, colour scheme,
dimension, and point of view. It is possible to move the displayed
molecule by translation, dragging, changing the zoom, or rotating in
three dimensions. It is also possible to animate the display. The format
of the display can be switched among common formats such as
wireframe, ball and stick, and spacefill. The colour scheme can be
changed. Also, the display of implicit and explicit hydrogens in the
image can be altered.
The entry page collates names and synonyms. Chemical names
may be trivial or systematic. In systematic names, the name encodes
structural features, and indeed, a fully specified systematic name can
be automatically transformed into a structural representation. Triv-
ial names are often assigned in common use and supersede the
systematic name for communication purposes, since trivial names
are easier to pronounce and write. Table 5 shows some examples of
trivial and systematic names for chemical entities.
The entry page also collects together database cross-references
and citations. Cross-references to other databases allow navigation
of the broader knowledge space pertaining to that entity. In partic-
ular, ChEBI provides many links of biological relevance, allowing
9 Accessing and Using Chemical Property Databases 207

Table 5
Some examples of trivial and systematic names for ChEBI entities

ChEBI ID      Trivial name    Systematic name
CHEBI:16285   Phytanic acid   3,7,11,15-Tetramethylhexadecanoic acid
CHEBI:17992   Sucrose         β-D-Fructofuranosyl α-D-glucopyranoside
CHEBI:16494   Lipoic acid     5-(1,2-Dithiolan-3-yl)pentanoic acid
CHEBI:15756   Palmitic acid   Hexadecanoic acid

Trivial names are generally shorter and easier to remember and pronounce
than their systematic counterparts. However, systematic names are more
informative with respect to the chemical structural information.

browsing to references to chemical entities appearing in diverse
biological databases. Citations are to literature resources in which
the entity, or significant properties of the entity, is described.

3.3. Understanding Chemical Classification

Annotation of data is essential for capturing and transmitting the
knowledge associated with data in databases. Annotations are often
captured in the form of free text, which is easy for a human
audience to read and understand, but is difficult for computers to
parse; can vary in quality from database to database; and can use
different terminology to mean the same thing (even within the
same database, if for example different human annotators used
different terminology). A core structure for the organisation of
terminologies used in annotation is into an ontology, which consists
of a structured vocabulary with explicit semantics attached to the
relationships between the terms.
ChEBI provides such an ontology in the domain of chemistry,
organising terms describing structure-based chemical classes into a
structure-based hierarchy, and function-based classes (including
terms for bioactivity) into a function-based hierarchy.
The ChEBI ontology is an ontology for biologically interesting
chemistry. It consists of three sub-ontologies, namely:
Chemical entity, in which molecular entities or parts thereof and
chemical substances are classified according to their structural
features;
Role, in which entities are classified on the basis of their role within
a biological context, e.g., as antibiotics, antiviral agents, coen-
zymes, enzyme inhibitors, or on the basis of their intended use
by humans, e.g. as pesticides, detergents, healthcare products,
and fuel;
Subatomic particle, in which are classified particles smaller than
atoms.

Chemical entities are linked to their structure-based classification
with the ontology "is a" relationship and to the roles or bioactivities
which they are known to exhibit under the relevant circumstances
with the "has role" relationship.
Molecular entities with defined connectivity are classified under
the chemical entity sub-ontology. These include the chemical com-
pounds which themselves could exist in some form in the real
world, such as in drug formulations, insecticides, or different alco-
holic beverages. In addition, classes of molecular entities are classi-
fied under the chemical entity sub-ontology. Classes may be
structurally defined, but they do not represent a single structural
definition; rather, they generalise the structural features which all
members of that class share. It is often useful to define the interesting
parts of molecular entities as groups. Groups have a defined
connectivity with one or more specified attachment points.
The role sub-ontology is further divided into three distinct
types of role, namely biological role, chemical role, and application.
Roles do not themselves have chemical structures, but rather it is
the case that items in the role ontology are linked to the molecular
entities which have those roles.
The ChEBI ontology uses two generic ontology relationships,
namely:
- Is a. Entity A is an instance of Entity B. For example, chloroform
is a chloromethane.
- Has part. Indicates the relationship between part and whole;
for example, potassium tetracyanonickelate(2−) has part tetracya-
nonickelate(2−).
In addition, the ChEBI ontology contains several chemistry-
specific relationships which are used to convey additional semantic
information about the entities in the ontology. These are as follows.
- Is conjugate base of and is conjugate acid of. Cyclic relationships
used to connect acids with their conjugate bases; for example,
pyruvic acid is the conjugate acid of the pyruvate anion, while
pyruvate is the conjugate base of the acid.
- Is tautomer of. Cyclic relationship used to show the relationship
between two tautomers; for example, L-serine and its zwitter-
ion are tautomers.
- Is enantiomer of. Cyclic relationship used in instances when two
entities are mirror images of, and nonsuperimposable upon, each
other. For example, D-alanine is the enantiomer of L-alanine and
vice versa.
- Has functional parent. Denotes the relationship between two
molecular entities or classes, one of which possesses one or more
characteristic groups from which the other can be derived by
functional modification. For example, 16α-hydroxyprogesterone

Fig. 5. A high-level illustration of the ChEBI ontology for the entity (R)-adrenaline. The entity is given a structure-based
classification as "catecholamines" in the chemical entity ontology, and role classifications both in terms of an application
for which the entity is used, "vasodilator agent," and in terms of the biological role, "hormone".

can be derived by functional modification (i.e., 16α-hydroxylation)
of progesterone.
- Has parent hydride. Denotes the relationship between an entity
and its parent hydride; for example, 1,4-naphthoquinone has
parent hydride naphthalene.
- Is substituent group from. Indicates the relationship between a
substituent group/atom and its parent molecular entity; for
example, the L-valino group is derived by a proton loss from
the N atom of L-valine.
- Has role. Denotes the relationship between a molecular entity
and the particular behaviour which the entity may exhibit
either by nature or by human application; for example, mor-
phine has role opioid analgesic.
Figure 5 gives a high-level illustration of the ChEBI ontology
for (R)-adrenaline.
The ChEBI ontology may be browsed online by navigating
between entities using the links in the ChEBI ontology section of
the main entry page. To view all the paths from the entry to the
root, select the “Tree view” option on this screen. On the other
hand, to retrieve all terms which hold a specified relationship to
another term, the Advanced Search page provides an ontology
filter. For example, to find all carboxylic acids in ChEBI, go to the
Advanced Search, scroll down to the "Ontology Filter" selection,
and enter the ChEBI ID for carboxylic acid in the filter input box:
CHEBI:33575, and then click “Search.” The Advanced Search
ontology filter can also be used to retrieve all chemical entities
which participate in a particular role, even though such entities
may have vastly different structures. For example, searching for all
entities which have the has role relationship to the role “vasodilator
agent” retrieves 47 chemical entities (in the current release) includ-
ing convallatoxin and amlodipine.

3.4. Programmatically Accessing and Manipulating Chemical Data

While web interfaces provide an easy point of entry for explorative
research surrounding the data in chemical property databases and
allow access to powerful searches, ultimately, in order to drive any
large-scale computational research, it is important to be able to
download the data, in a computationally accessible format, to the
local machine. The primary format for downloading chemical data
is the SDF format, which consists of a collection of MOLfile
resources, together with custom properties.
In ChEBI, targeted results of chemical searches can be down-
loaded directly from the search results page, in SDF file format,
by clicking the icon entitled "Export your search results." On the
other hand, the full database can be downloaded directly from the
Downloads page available at http://www.ebi.ac.uk/chebi/downloadsForward.do.
To download the chemical structures (and
structure-related properties), an SDF file download is available.
The data is provided in two flavours:
- chebi_lite.sdf contains only the chemical structure, ChEBI
identifier, and ChEBI name.
- chebi_complete.sdf contains all the chemical structures and
associated information. Note that it excludes the ontological
information, since ontological classes do not have chemical
structures and so cannot be represented in SDF. To download
the ontology classification, the popular ontology formats OBO
(18) and OWL (19) are available.
Once an SDF file is available on the local machine, several
publicly available tools facilitate manipulation of the file data, for
example, the Chemistry Development Kit (CDK). Figure 6 shows a brief
snippet of the code necessary to extract data from an SDF file using
version 1.0.2 of the CDK.
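As a rough, dependency-free illustration of the SDF layout itself (MOLfile blocks plus "> <Field>" data items, with records separated by "$$$$" lines), the following Python sketch extracts the data fields. The field names in the toy record are assumptions for illustration; real applications should rely on a toolkit such as the CDK.

```python
# Hedged sketch: a minimal reader for the data fields of an SDF file.
# It ignores the MOLfile block and collects each "> <Field>" data item,
# which runs until the next blank line.

def read_sdf_fields(text: str):
    """Yield one {field: value} dict per SDF record."""
    for record in text.split("$$$$"):
        if not record.strip():
            continue
        fields, current = {}, None
        for line in record.splitlines():
            if line.startswith("> <") and line.rstrip().endswith(">"):
                current = line.strip()[3:-1]   # "> <ChEBI ID>" -> "ChEBI ID"
                fields[current] = ""
            elif current is not None:
                if line.strip() == "":
                    current = None             # blank line ends the data item
                else:
                    fields[current] += line.strip()
        yield fields

# Toy record in the style of chebi_lite.sdf (field names assumed):
sample = """
  (molfile block would appear here)
M  END
> <ChEBI ID>
CHEBI:15377

> <ChEBI Name>
water

$$$$
"""
for rec in read_sdf_fields(sample):
    print(rec)
```

Each record's custom properties come back as a simple dictionary, which is usually all that is needed to join structure files against other tabular data.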
An alternative to parsing and interpreting the full downloadable
data file is to make use of the web service facility to programmatically
retrieve exactly the entries needed in real time. ChEBI provides a web
service implemented in SOAP (Simple Object Access Protocol)
(http://www.w3.org/TR/soap/), which allows programmatic
retrieval of all the data in the database based on a search facility
which mimics the search interface available online. The WSDL

Fig. 6. A code snippet used to parse and extract the data from an SDF file using the Chemistry Development Kit.

(Web Services Description Language) XML document describing the
ChEBI web service implementation is available at
http://www.ebi.ac.uk/webservices/chebi/2.0/webservice?wsdl and
contains the specification of the methods available for programmatic
invocation by clients connected to the ChEBI web service, as well as
details of where the service is running. The WSDL can be used to
generate client applications for various programming languages.
ChEBI provides such clients in Java and Perl for ease of use. The
clients are available for download from
http://www.ebi.ac.uk/chebi/webServices.do.
In the ChEBI web service, there are seven methods provided
with which to access data:
- getLiteEntity
- getCompleteEntity and getCompleteEntityByList
- getOntologyParents
- getOntologyChildren and getAllOntologyChildrenInPath
- getStructureSearch
The search method getLiteEntity retrieves a LiteEntityList and
takes as parameters a search string and a search category (which may
be null to search across all categories). The getCompleteEntity
method retrieves the full data for a specified entity, including syno-
nyms, database links, and structures, and takes a ChEBI identifier
as its parameter. The getCompleteEntityByList method is similar but
retrieves the full data for a list of ChEBI identifiers
rather than a single one. To browse the ontology programmatically,
there is a getOntologyParents method, which retrieves the parents of
the given entity (specified by ChEBI identifier) in the ChEBI
ontology, a getOntologyChildren method, which retrieves the chil-
dren of the given entity (specified by ChEBI identifier), and a
getAllOntologyChildrenInPath method, which returns all entities
found when navigating down the hierarchy of the ontology to the
leaf terms. Finally, the getStructureSearch method allows program-
matic access to chemical structure searching.
The getLiteEntity search method allows the specification of a
search string and a search category. Allowed search categories are
(available as the SearchCategory enumeration in the domain model):
- ALL
- CHEBI ID
- CHEBI NAME
- SYNONYM
- IUPAC NAME
- DATABASE LINK
- FORMULA
- REGISTRY NUMBER
- COMMENT
- INCHI
- SMILES
The search method returns a LiteEntityList, which may contain
many LiteEntity objects. For each LiteEntity contained in the list, the
ChEBI ID may then be used to retrieve the full dataset by passing
it as a parameter to the getCompleteEntity method. The Entity
object which is then returned contains the full ChEBI dataset
linked to that identifier, including structures, database links and
registry numbers, formulae, names, synonyms, and parent and
children ontology relationships.
Navigating the ontology, without retrieving a complete Entity
for each data item in the ontology, is accomplished by using the
methods getOntologyParents (for navigating up towards the root)
and getOntologyChildren (for navigating downwards towards the
leaves). Note that an OntologyDataItem represents a relationship,
specifying the ChEBI ID of the related term and the type of the
relationship. The ChEBI ID can then be used to access the com-
plete Entity if required. Some relationship types are cyclic; in this
case, the flag cyclicRelationship will be set to true. Cyclic relation-
ships should be ignored for purposes of navigation or, if navigating
them is required, care should be taken to traverse them
only once.
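A minimal sketch of such cycle-safe navigation follows, assuming a toy parent table in place of repeated getOntologyParents calls (the term names and relationship labels are illustrative only):

```python
# Hedged sketch of cycle-safe ontology navigation. A visited set guarantees
# that cyclic relationships (e.g. conjugate acid/base pairs) are followed
# at most once, so the walk always terminates.

TOY_ONTOLOGY = {
    # child -> list of (parent, relationship_type); stands in for the
    # OntologyDataItem results of getOntologyParents
    "pyruvate":     [("pyruvic acid", "is conjugate base of"),
                     ("carboxylic acid anion", "is a")],
    "pyruvic acid": [("pyruvate", "is conjugate acid of"),
                     ("carboxylic acid", "is a")],
}

def ancestors(term: str, ontology) -> set:
    """Collect every term reachable by walking parent relationships,
    visiting each term only once so cycles cannot loop forever."""
    visited, stack = set(), [term]
    while stack:
        current = stack.pop()
        for parent, _rel in ontology.get(current, []):
            if parent not in visited:
                visited.add(parent)
                stack.append(parent)
    return visited

print(ancestors("pyruvate", TOY_ONTOLOGY))
```

Without the visited set, the conjugate acid/base pair above would bounce the traversal back and forth indefinitely.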
By using the web service to access the ChEBI data, the user does
not need to know the details of the underlying database model,
nor worry about updates to that model which do not affect the
web service object model. This makes web services a very
user-friendly mechanism for programmatically accessing chemical
databases.

4. Examples

4.1. Understanding Ranking in a Simple Search for Mercury (CHEBI:16170)

Heavy metals (trace metals at least five times denser than water) are
generally systemic toxins with neurotoxic, nephrotoxic, fetotoxic, and
teratogenic effects (http://tuberose.com/Heavy_Metal_Toxicity.html).
Mercury, one such metal, is a well-known environmental
toxin. Its most common organic form is methylmercury, to which
exposure occurs through consumption of seafood (20). Exposure
to elemental metallic mercury occurs from dental amalgam restora-
tions (21). Also found in batteries, fluorescent and mercury lamps,
vaccines, and dental amalgams, its effects include pulmonary, cutane-
ous, and neurological disorders as well as immunological glomerular
disease (22).
When searching ChEBI for “mercury” using the simple search
box, the search retrieves (at the current release) 35 search results.
These results include all the entities in which the word “mercury”
appears in any associated data field. The results are ranked, that is,
sorted by relevance, depending on where the search term appeared
in the searched entity. A search hit in the primary name, for exam-
ple, is promoted (receives a higher search score) above a search hit
in the cross-references or other associated data. The first five results
include all the forms of pure mercury in ChEBI, while the remain-
der of the search results include molecules and salts containing
mercury as part of a larger species. This ranking is designed to assist
the user in discovering the most relevant information more easily.

4.2. Browsing the Cross-References Associated with Sulphur Dioxide (ChEBI:18422)

Sulphur dioxide (SO2) is a poisonous gas produced by volcanoes, in
the combustion of fossil fuels, and in various industrial processes. It
is used as a bleaching agent and, due to its germicidal and insecti-
cidal properties, as a disinfectant and pesticide
(http://en.wikipedia.org/wiki/Sulfur_dioxide). Its preservative and
antioxidant properties make it useful in the food and drink industries (23).
Exposure to high levels of SO2 can lead to headache, acute pharyn-
gitis, acute airway problems, and asthma (24).
In ChEBI, sulphur dioxide has a standard entry page with a
structural representation, a collection of names and synonyms, an
ontology classification, and various database links and cross-
references. As we mentioned before, the cross-references, both
manually annotated (on the main entry page) and automatically
collected (on the separate “Automatic Xrefs” tab), provide a way
to navigate from a chemical entity into the wider space of
biological and chemical knowledge about that chemical. Sulphur
dioxide is manually cross-referenced to the KEGG COMPOUND
resource (25) and the PDBeChem resource (26). It is also auto-
matically cross-referenced to several resources. Collection of auto-
matic cross-references in ChEBI is often driven by the partner
resource performing annotation of ChEBI IDs to their data
records. These annotations are then collected together and passed
back to ChEBI, together with the IDs and links to the partner
resource. Such resources, linked to from sulphur dioxide, include
the UniProt knowledge base of proteins (27), the NMRShiftDB
database for organic structures and their nuclear magnetic reso-
nance (NMR) spectra (28), and the SABIO-RK database for
reaction kinetics (29).

4.3. Exploring the Ontology Structural Relationships with Phenol (ChEBI:15882)

Phenol is an aromatic alcohol with the chemical formula C6H5OH,
used commonly in molecular biology as an organic solvent in bacte-
rial plasmid extraction protocols. It is also used in medical treat-
ments, petroleum refineries, and the production of glue, fibre, and
nylon (30). Although only mildly acidic, it can cause burns (31) and
is also associated with cardiac dysfunction in certain contexts (32).
The ontology entry for phenol contains several structural relation-
ships (conjugate base/acid; has functional parent).

4.4. Exploring the Role Ontology with Bafilomycin A1 (ChEBI:22689)

Bafilomycin A1 is the most commonly used of the bafilomycins, a
family of toxic macrolide antibiotics derived from Streptomyces
griseus. It has a wide range of biological actions, including anti-
bacterial, antifungal, antineoplastic, immunosuppressive, and
antimalarial activities, as well as a tendency to reduce multidrug
resistance (33).
In ChEBI, several roles are annotated to this entity, including
the biological role “toxin,” due to its toxic properties, and the
application “fungicide.” By clicking the link to the target role in
the ontology and selecting “tree view” in the ontology viewer, it is
possible to view the role hierarchy for the role terms and to navigate
to other chemicals which have been annotated with those roles.

4.5. Interacting with 2D and 3D Structural Representations of Fusicoccin (ChEBI:51015)

Fusicoccin is a polycyclic organic compound whose structure contains
three fused carbon rings and another ring containing an oxygen
atom and five carbons. A phytotoxin produced by the fungus Fusi-
coccum amygdali, it causes membrane hyperpolarisation and proton
extrusion in plants (http://en.wikipedia.org/wiki/Fusicoccin). The
toxin acts by stimulating the H+-ATPases of the guard cells,
leading to plasma membrane hyperpolarisation and the irrevers-
ible opening of stomata. The ensuing wilting of the leaves is
followed by cell death (34). Fusicoccin has also been shown to
cause cell death in cultured cells of the sycamore (35).
When viewing the ChEBI entry page for Fusicoccin, the default
structural illustration is the schematic 2D diagram, however, click-
ing “More structures” reveals a 3D structure associated with the
entry page. Selecting the checkbox “applet” loads the MarvinView
applet which allows for interactive exploration of the structure,
including rotation in three dimensions.

4.6. Interacting with the ChEBI Web Service Using Ciguatoxin (ChEBI:36467)

Ciguatoxin is a cyclic polyether produced by the dinoflagellate
Gambierdiscus toxicus. A neurotoxin, it is the agent of ciguatera
fish poisoning (36, 37). Ciguatera, a form of
human poisoning caused by the consumption of seafood, is an
important medical entity in tropical and subtropical Pacific and
Indian Ocean regions and in the tropical Caribbean (38). It is
characterised by gastrointestinal, neurological, and cardiovascular
disturbances, which may lead to paralysis, coma, and death.
Although it is the most widespread marine-borne disease affecting
humans, there is no immunity, and the toxins are cumulative.
Symptoms may persist for months or years or recur periodically.
One common task that a user may wish to perform when
interacting with the ChEBI web service is performing a structure-
based similarity search for structures similar to the structure of a
given entity. Web services are accessed programmatically, but there
is an interface provided for graphically invoking the web service
methods at http://www.ebi.ac.uk/chebi/webServices.do. This
interface allows the web service methods to be tested and the
output examined, which can serve as input for designing and fine-
tuning the programmatic pipeline. For example, there is a table for
executing the structure search method, allowing the input of a
structure in MOLfile or SMILES format. By copying the SMILES
from ciguatoxin into the search box and then executing a similarity
search, the browser displays the XML corresponding to the SOAP
of the web service response, which would be parsed automatically
by the client library in a programmatic execution. A similarity
search executed in this fashion with a 25% Tanimoto cutoff returns
many results, of which okadaic acid (CHEBI:44658) is the most
highly ranked at 84% similarity.

4.7. Linking to the Literature Involving PDC-E2 173–184 Peptide (ChEBI:60738)

The PDC-E2 173–184 peptide is the linear dodecapeptide
sequence DKATIGFEVQEE corresponding to residues 173–184
of dihydrolipoyl transacetylase, the second (E2) of three enzymes in
the pyruvate dehydrogenase enzyme complex (PDC)
(http://en.wikipedia.org/wiki/Pyruvate_dehydrogenase_complex). In the
autoimmune disease primary biliary cirrhosis (PBC), the major
autoepitope recognised by both T and B cells lies in the inner lipoyl
domain of PDC-E2, which contains the PDC-E2 173–184 peptide
(39). Lipoylated in vivo at the lysine (K) residue, this dodecapep-
tide has been shown to act as an autoepitope, eliciting autoantibody
formation in the sera of PBC patients (40).
When browsing the ChEBI entry for PDC-E2 173–184 pep-
tide, notice that the entity is referenced to the literature via the
citations feature. This provides a direct link from the database
record to relevant publications in which properties of the chemical
entity have been described. Sometimes, the only way to establish
the complex properties of chemical entities is to glean them from
the primary literature in this fashion.

5. Notes

5.1. Cross-Database Integration and Chemical Identity

When searching public chemical databases and, in particular, when
searching across multiple databases with a view to collecting as
complete a view as possible of the available information for a
particular chemical entity, redundancy of chemical information is
a large problem, as multiple records proliferate across public data-
bases. PubChem is a database which accepts depositions of chemi-
cal data from a vast array of public databases and thereafter
performs structure-based integration to provide a unified com-
pound detail page for each separately identifiable chemical entity.
Ideally, the user would like to see one record for each distinct
chemical entity and only one such record.
However, the process of deciphering which chemical entities in
different data sources are the same and which are not is a challeng-
ing one, not yet fully solved by the wider community. The
InChI code is intended to solve the problem by providing a stan-
dard identifier which can be used for data integration. It goes a long
way towards achieving this goal, but falls short of a full solution for
several reasons, each of which is an active area of discussion and
research within the chemical database community.
One problem with the InChI as a tool for solving all the data
integration issues facing the public compound databases is that,
although it is intended to resolve to the same code regardless of the
way in which a chemical entity is drawn, the
InChI cannot resolve differences in what is depicted where those
differences in drawing represent real chemical distinctions. For example,
salts may be depicted in different ways in chemical drawing soft-
ware. One possibility is to depict the molecular structure of the salt
as a bonded whole with the ionised bond drawn as charge sepa-
rated. Another possibility is to draw the two ions as distinct units,
disconnected from a graph perspective, with explicit charges. Yet
another possibility includes several copies of the charged ions in
order to illustrate the final quantitative ratio of the different ions
within the salt. Each of these depictions results, inevitably, in a
different InChI code, rendering programmatic unification more
difficult.
Another reason that the InChI is not able to fully resolve data
integration issues across different resources is that the basic unit of
identity for a chemical entity is regarded differently in different
resources where there is a different application context. For exam-
ple, a chemical entity database such as ChEBI regards different salts
as different chemical entities, while a drug database such as Drug-
Bank regards all different salts of the same active ingredient as the
same drug entity. Integration between such resources is a matter of
maintaining one-to-many, and in some cases even many-to-many,
mappings between identifiers. Similar complications arise
because some resources unify all tautomers of a given chemical,
while other resources (including ChEBI) distinguish separate tau-
tomers. Again, from a biological perspective, conjugate bases and
acids are mergeable entities, since in physiological conditions these
forms of a chemical may readily interconvert, while from a chemist’s
perspective these are very different entities, with different names
and properties.
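This kind of identifier bookkeeping can be sketched as a simple lookup table. A minimal illustration in Python, assuming invented placeholder identifiers (these are not real ChEBI or DrugBank accessions):

```python
# Map one drug-level identifier (DrugBank-style) to the many salt-level
# identifiers (ChEBI-style) it corresponds to. All IDs are hypothetical.
drug_to_salts = {
    "DRUG:0001": ["SALT:1001", "SALT:1002", "SALT:1003"],  # free base plus two salts
    "DRUG:0002": ["SALT:2001"],
}

def salts_for_drug(drug_id):
    """Forward lookup: one drug entity -> many chemical entities."""
    return drug_to_salts.get(drug_id, [])

def drug_for_salt(salt_id):
    """Reverse lookup: each chemical entity -> its parent drug entity."""
    for drug, salts in drug_to_salts.items():
        if salt_id in salts:
            return drug
    return None
```

In a real integration pipeline the reverse map would be precomputed, and for combination drugs the relation becomes many-to-many, as the text notes.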

5.2. Calculated Properties and Experimentally Derived Properties

Many properties provided by public chemical databases are calculated directly from the chemical structure. For example, the formula, molecular weight, and overall charge are generally calculated directly from the chemical structure. Unfortunately, some of the properties of interest to toxicology cannot be calculated directly from the chemical structure. Such properties are predicted using models, which correlate chemical structural features with properties based on a large set of training data which is then used for prediction. But the model is only as good as the training data, and for these purposes, to generate input training data, there is no substitute for experimental measurement of property values.

Table 6
Some examples of chemical properties of interest

Hydrogen bond index: H-bonding (the formation of weak intermolecular bonds between a hydrogen carrying a partial positive charge on one molecule and a partially negatively charged N, O, or F on another molecule) plays a pivotal role in biological systems, contributing to the structures and interactions of molecules such as carbohydrates, nucleotides, and amino acids. The transience of hydrogen bonds means that they can be switched on or off with energy values that lie within the range of thermal fluctuations at the temperatures of living systems (1).

Reactivity with water: The bodies of all living organisms are chiefly (70–80%) water, the unique, life-sustaining properties of which arise from H-bonding.

pH tendencies: pH, a measure of hydrogen ion concentration, is a critical factor controlling life processes such as enzyme function. For example, a pH change of as little as 0.2 from the normal value of 7.4 in humans is life-threatening. pH levels similarly determine the functions of enzymes and other molecules in the wide variety of living systems.

Toxicity: Toxins (substances capable of causing harm to living organisms) include heavy metals such as mercury, lead, cadmium, and aluminium; biotoxins, produced by living cells and organisms; xenobiotics, substances that are not a normal component of the organism in question; or food preservatives and cosmetics.

218 J. Hastings et al.

Table 6 lists some properties of interest in biology, including some properties which can only be measured experimentally.
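As an illustration of how a property such as molecular weight is calculated directly from structure-derived information, the following sketch computes it from a molecular formula using a small table of standard atomic masses (only a few elements are included, and the masses are rounded; this is illustrative rather than a production implementation):

```python
import re

# Standard atomic masses (g/mol) for a handful of common elements
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06, "P": 30.974}

def molecular_weight(formula):
    """Molecular weight from a simple Hill-style formula such as 'C8H10N4O2'."""
    weight = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        weight += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return weight

print(round(molecular_weight("C8H10N4O2"), 2))  # caffeine: prints 194.19
```

Properties such as hydrogen-bonding behaviour or toxicity, by contrast, cannot be derived by any such direct calculation, which is why the experimentally measured training data discussed above is indispensable.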

References

1. de Matos P, Alcántara R, Dekker A, Ennis M, Hastings J, Haug K, Spiteri I, Turner S, Steinbeck C (2010) Nucl Acids Res 38:D249–D254
2. Trinajstic N (1992) Chemical graph theory. CRC Press, Florida, USA
3. MDL (2010) http://www.mdl.com/company/about/history.jsp. Last accessed December 2010
4. IUPAC The IUPAC International Chemical Identifier (InChI) (2010) http://www.iupac.org/inchi/. Last accessed July 2012
5. Daylight, inc. (2010) http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html. Last accessed July 2012
6. CAS Chemical Abstracts Service (2010) http://www.cas.org/. Last accessed July 2012
7. Beilstein Crossfire Beilstein Database (2010) http://info.crossfiredatabases.com/beilsteinacquisitionpressreleasemarch607.pdf. Last accessed December 2010
8. Marx V (2009) GenomeWeb BioInform News http://www.genomeweb.com/informatics/tear-down-firewall-pharma-scientists-call-precompetitive-approach-bioinformatic?page=show. Last accessed July 2012
9. Sayers E (2005) PubChem: An Entrez Database of Small Molecules. NLM Tech Bull (342):e2
10. Williams A (2008) Chemistry International 30(1):1
11. Wikipedia Chemistry (2010) http://en.wikipedia.org/wiki/Chemistry. Last accessed July 2012
12. CHEMBIOGRID: chemistry databases on the web (2010) http://www.chembiogrid.org/related/resources/about.html. Last accessed July 2012
13. Williams AJ (2009) Drug Discov Today 13:495–501
14. Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E (2003) J Chem Inf Comput Sci 43(2):493–500
15. Steinbeck C, Hoppe C, Kuhn S, Floris M, Guha R, Willighagen EL (2006) Curr Pharm Des 12:2111–2120
16. Krause S, Willighagen EL, Steinbeck C (2000) Molecules 5:93–98
17. Rijnbeek M, Steinbeck C (2009) J Cheminformatics 1(1):17
18. The Gene Ontology Consortium The OBO language, version 1.2 (2010) http://www.geneontology.org/GO.format.obo-1_2.shtml. Last accessed July 2012
19. Smith MK, Welty C, McGuinness DL (2010) The web ontology language http://www.w3.org/TR/owl-guide/. Last accessed July 2012
20. Berry MJ, Ralston NV (2008) Ecohealth 5:456–459
21. Guzz G, Fogazzi GB, Cantù M, Minoia C, Ronchi A, Pigatto PD, Severi G (2008) J Environ Pathol Toxicol Oncol 27:147–155
22. Haley BE (2005) Medical Veritas 2:535–542
23. Freedman BJ (1980) Br J Dis Chest 74:128–34
24. Longo BM, Yang W, Green JB, Crosby FL, Crosby VL (2010) Toxicol Environ Health A 73:1370–8
25. Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita K, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M (2006) Nucleic Acids Res 34:D354–357
26. PDB The world wide protein data bank (2010) http://www.wwpdb.org/. Last accessed July 2012
27. The UniProt Consortium (2010) Nucleic Acids Res 38:D142–D148
28. Steinbeck C, Krause S, Kuhn S (2003) J Chem Inf Comput Sci 43(6):1733–1739
29. Wittig U, Golebiewski M, Kania R, Krebs O, Mir S, Weidemann A, Anstein S, Saric J, Rojas I (2006) In: Proceedings of the 3rd international workshop on data integration in the life sciences 2006 (DILS'06), Hinxton, UK, pp 94–103
30. Aşkin H, Uysal H, Altun D (2007) Toxicol Ind Health 23:591–8
31. Lin TM, Lee SS, Lai CS, Lin SD (2006) Burns 4:517–21
32. Warner MA, Harper JV (1985) Anesthesiology 62:366–7
33. van Schalkwyk DA, Chan XW, Misiano P, Gagliardi S, Farina C, Saliba KJ (2010) Biochem Pharmacol 79:1291–9
34. Lanfermeijer FC, Prins H (1994) Plant Physiol 104:1277–1285
35. Malerba M, Contran N, Tonelli M, Crosti P, Cerana R (2008) Physiol Plant 133:449–57
36. Mattei C, Wen PJ, Nguyen-Huu TD, Alvarez M, Benoit E, Bourdelais AJ, Lewis RJ, Baden DG, Molgó J, Meunier FA (2008) PLoS One 3:e3448
37. Nguyen-Huu TD, Mattei C, Wen PJ, Bourdelais AJ, Lewis RJ, Benoit E, Baden DG, Molgó J, Meunier FA (2010) Toxicon 56:792–6
38. Lehane L, Lewis RJ (2000) Int J Food Microbiol 61:91–125
39. Long SA, Quan C, Van de Water J, Nantz MH, Kurth MJ, Barsky D, Colvin ME, Lam KS, Coppel RL, Ansari A, Gershwin ME (2001) J Immunol 167:2956–63
40. Amano K, Leung PS, Xu Q, Marik J, Quan C, Kurth MJ, Nantz MH, Ansari AA, Lam KS, Zeniya M, Coppel RL, Gershwin ME (2004) J Immunol 172:6444–52
Chapter 10

Accessing, Using, and Creating Chemical Property Databases for Computational Toxicology Modeling

Antony J. Williams, Sean Ekins, Ola Spjuth, and Egon L. Willighagen

Abstract
Toxicity data is expensive to generate, is increasingly seen as precompetitive, and is frequently used for the
generation of computational models in a discipline known as computational toxicology. Repositories of
chemical property data are valuable for supporting computational toxicologists by providing access to data
regarding potential toxicity issues with compounds as well as for the purpose of building structure–toxicity
relationships and associated prediction models. These relationships use mathematical, statistical, and
modeling computational approaches and can be used to understand the mechanisms by which chemicals
cause harm and, ultimately, enable prediction of adverse effects of these chemicals to human health and/or
the environment. Such approaches are of value as they offer an opportunity to prioritize chemicals for
testing. An increasing amount of data used by computational toxicologists is being published into the
public domain and, in parallel, there is a greater availability of Open Source software for the generation of
computational models. This chapter provides an overview of the types of data and software available and
how these may be used to produce predictive toxicology models for the community.

Key words: Bioinformatics, Cheminformatics, Computational toxicology, Public domain toxicology data, QSAR, Toxicology databases

1. Introduction

Since the inception of computational toxicology as a subdiscipline of toxicology, there have likely been many hundreds of publications
and several books that address approaches for modeling various
toxicity endpoints. The reader is especially recommended to read
the available chapters in several books (to which this volume will
contribute) to see the broad diversity of computational approaches
applied to date (1–3).
Computational toxicology approaches are used in industries
that need to estimate risk early on. For example, the pharmaceuti-
cal, agrochemical, or consumer product industries may want to
determine the effects of a molecule on cytotoxicity. This can be

Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929, DOI 10.1007/978-1-62703-050-2_10, © Springer Science+Business Media, LLC 2012
achieved using in-house data for the end point of interest or using
freely available or commercial data sets for modeling the property.
At a deeper level, the lack of appreciable genotoxicity or cardiotoxi-
city may be important determinants of whether a molecule is
approved for human use or not by government regulatory autho-
rities. As the assays become more specific it is likely that the sources
of data may be restricted. Computational toxicology models could
be used early in the R&D process so that compounds with lower
risk may be progressed as part of a multidimensional process that
also considers other properties of molecules (4). The application of
an array of computational models to fields such as green chemistry
has also recently been reviewed (5). The advantages of such meth-
ods could be that they save money or prevent the physical testing of
compounds (and concomitant animal or other tissue usage) which
may be undesirable or impractical at high volume.
In the past 5 years, 2D-ligand-based approaches have been
increasingly used along with sophisticated algorithms and networks
to form a systems-biology approach for toxicology (6–10). These
studies commonly require compounds with toxicity data and
molecular descriptors to be generated or retrieved from databases.
Several of these recent approaches have described how machine
learning methods can be used for modeling binary or continuous
data relevant to toxicology. We have experience in this domain and
examples of our own activities include studying drug-induced liver
injury using data from a previously published study (11) and colla-
borating with Pfizer to generate models for the time-dependent
inhibition of CYP3A4 (12). Machine learning models for predict-
ing cytotoxicity using data from many different in-house assays that
were combined has also been published separately by Pfizer (13).
As examples of how databases of toxicity information can be
used or created to enable computational modeling we have provided
a few recent examples. There are several examples of quantitative
structure activity relationship (QSAR) or machine learning methods
for predicting hepatotoxicity (14, 15) or drug–drug interactions
(12, 16–18). Drug metabolism in the liver can convert some
drugs into highly reactive intermediates (19–22) and hence cause
drug-induced liver injury (DILI). DILI is the number one reason
why drugs are not approved or are withdrawn from the market after
approval (23). Idiosyncratic liver injury is much harder to predict
from the preclinical in vitro or in vivo situation so we frequently
become aware of such problems once a drug reaches large patient
populations in the clinic and this is generally too late for the drug
developer to identify an alternative and safer drug molecule. One
study assembled a list of approximately 300 drugs and chemicals,
with a classification scheme based on human clinical data for hepa-
totoxicity, for the purpose of evaluating an in vitro testing method-
ology based on cellular imaging of primary human hepatocyte
cultures (24). It was found that the 100-fold Cmax scaling factor
represented a reasonable threshold to differentiate safe versus toxic
drugs, for an orally dosed drug and with regard to hepatotoxicity
(24). The concordance of the in vitro human hepatocyte imaging
assay technology (HIAT) applied to about 300 drugs and chemicals,
is about 75% with regard to clinical hepatotoxicity, with very few
false positives (24). An alternative is to use the human clinical DILI
data to create a computational model, and then validate it with
enough compounds to provide confidence in its predictive ability
so it can be used as a prescreen before in vitro testing.
In a recent study a training set of 295 compounds and a test set
of 237 molecules were used with a Bayesian classification approach
(25, 26). The Bayesian model generated was evaluated by leaving
out either 10, 30, or 50% of the data. In each case, the leave-out
testing was comparable to the leave-one-out approach and these
values were very favorable indicating good model robustness. The
mean concordance >57%, specificity >61%, and sensitivity >52%
did not seem to differ depending on the amount of data left out.
Molecular features such as long aliphatic chains, phenols, ketones,
diols, α-methyl styrene (represents a polymer monomer), conju-
gated structures, cyclohexenones, and amides predominated in
DILI active compounds (11). The Bayesian model was tested
with 237 new compounds. The concordance ~60%, specificity
67%, and sensitivity 56% were comparable with the internal valida-
tion statistics. A subset of 37 compounds of most interest clinically
showed similar testing values with a concordance greater than 63%
(11). Compounds of most interest clinically are defined as well-
known hepatotoxic drugs plus their less hepatotoxic comparators.
These less hepatotoxic comparators are approved drugs that typi-
cally share a portion of the chemical core structure as the hepato-
toxic ones. The purpose of this test set was to explore whether the
Bayesian in silico method could differentiate differences in DILI
potential between or among closely related compounds, as this is
likely the most useful case in the real-world drug discovery setting.
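For reference, the concordance, sensitivity, and specificity figures quoted above all derive from a standard 2 x 2 confusion matrix of predicted versus observed DILI class. A minimal sketch of the calculation (the counts below are invented for illustration and are not the study's actual results):

```python
def classification_stats(tp, tn, fp, fn):
    """Concordance (accuracy), sensitivity, and specificity from confusion-matrix counts."""
    total = tp + tn + fp + fn
    concordance = (tp + tn) / total  # fraction of all predictions that are correct
    sensitivity = tp / (tp + fn)     # fraction of truly toxic compounds flagged
    specificity = tn / (tn + fp)     # fraction of truly safe compounds cleared
    return concordance, sensitivity, specificity

# Hypothetical counts for a 237-compound external test set (illustrative only)
conc, sens, spec = classification_stats(tp=70, tn=72, fp=35, fn=60)
```

Reporting all three statistics matters because, as in the study above, a model can hold steady concordance while trading sensitivity against specificity.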
In order to develop any computational model, there are numer-
ous requirements for the data utilized as the foundation of the algo-
rithm development. There are also independent requirements for the
data, simply as content of a look-up database for medicinal chemists
and toxicologists, for example, to enable search and retrieval of prop-
erty data of interest. For the purpose of lookup and retrieval, the data
need to be of high “quality.” Text searches based on chemical identifiers such as chemical names, CAS numbers, and international identifiers such as European Inventory of Existing Chemical Substances (EINECS) numbers (http://www.ovid.com/site/products/fieldguide/EINECS/eine.htm#abouthealth) should result in the retrieval
of an accurate representation of the chemical compound with asso-
ciated data, preferably including attribution to the source data.
While any aggregated database is sure to contain errors this has
been shown to be a considerable issue when accessing data from
public domain databases (27) and will be examined in more detail
later. For the purpose of sourcing data to be utilized for modeling
purposes, it is preferable that the data span a diverse area of structure space (although this may be dependent on whether “local” or “global” models are being developed), that the assays captured in the database provide responses over many orders of magnitude, and that the data have been acquired with repeatable measurements and,
when possible, validated against other data. Clearly these criteria are
rather challenging (and variable depending on the type of end point)
and as a result the production of a high-quality data set for the
purpose of modeling can be both tremendously time consuming
and exacting in its assembly.
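Two of these criteria can be screened for programmatically before any modeling begins: whether the assay responses span several orders of magnitude, and whether repeated measurements of the same compound agree. A rough sketch, with threshold values that are illustrative assumptions rather than established guidelines:

```python
import math
from collections import defaultdict

def dynamic_range_ok(values, min_log_units=3.0):
    """Check that positive assay values (e.g., IC50s) span several orders of magnitude."""
    logs = [math.log10(v) for v in values if v > 0]
    return (max(logs) - min(logs)) >= min_log_units

def flag_irreproducible(measurements, max_log_spread=0.5):
    """Flag compounds whose replicate measurements disagree by more than ~3-fold."""
    by_compound = defaultdict(list)
    for compound_id, value in measurements:
        by_compound[compound_id].append(math.log10(value))
    return sorted(cid for cid, logs in by_compound.items()
                  if len(logs) > 1 and max(logs) - min(logs) > max_log_spread)
```

Checks of this kind catch only the mechanical problems; assessing assay appropriateness and structure-space diversity still requires expert judgment, as the text emphasizes.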

2. Public Domain Databases
Computational toxicologists have a choice when accessing data for
the purpose of reference or to create models. They can source data
in house or from collaborators or commercial sources [e.g., data and computational models from Leadscope (http://www.leadscope.com), Accelrys (http://accelrys.com), LHASA (https://www.lhasalimited.org), and Aureus (http://www.aureus-sciences.com/aureus/web/guest/adme-overview)] or, as is increasingly common, from online resources (28). Online resources
should, in general, be expected to be more favorable to scientists
as the data are available at no cost and can contain a broader range
of information. However, it should be noted that all structure-
based databases can be prone to error and care needs to be taken
when choosing data from such sources. This is especially true of
public domain structure-based data sources as has been discussed
elsewhere (29).
Online resources hosting toxicity data include the multiple databases integrated via the Toxicity Network, ToxNet (http://toxnet.nlm.nih.gov), the Distributed Structure Searchable Toxicity (DSSTox) database hosted by the EPA (30), the Hazardous Substances Data Bank (http://www.nlm.nih.gov/pubs/factsheets/hsdbfs.html), ACToR (31), ChEMBL (32), and ToxRefDB (http://www.epa.gov/ncct/toxrefdb/), to name just a few. These
are discussed in more detail below.
TOXNET, the TOXicology data NETwork, is a cluster of
databases covering toxicology, hazardous chemicals, environmental
health, and related areas. It is managed by the Toxicology and
Environmental Health Information Program (TEHIP) in the Divi-
sion of Specialized Information Services (SIS) of the National
Library of Medicine (NLM). It is a central portal integrating a
series of databases related to toxicology. These are Integrated Risk
Information System (IRIS), International Toxicity Estimates for
Risk (ITER), Chemical Carcinogenesis Research Information System (CCRIS), Genetic Toxicology (GENE-TOX), the Household
Products Database, Carcinogenic Potency Database (CPDB), and a
number of other related databases. It also integrates to HSDB, the
Hazardous Substances Data Bank.
HSDB focuses on the toxicology of potentially hazardous che-
micals and contains information on human exposure, industrial
hygiene, emergency handling procedures, environmental fate, reg-
ulatory requirements, and other related areas. It is peer reviewed by
a committee of experts and presently contains over 5,000 records.
The EPA recently released the ACToR database online (31). It
is made up of 500 public data sources on over 500,000 chemicals
and contains data regarding chemical exposure, hazard, and poten-
tial risks to human health and the environment. ACToR is
integrated to a toxicity reference database called ToxRefDB which
allows searching and downloading of animal toxicity testing results
on hundreds of chemicals. The database captures 30 years and $2
billion worth of animal testing chemical toxicity data in a publicly
accessible searchable format.
ACToR also links to DSSTox, the EPA DSSTox database proj-
ect (30, 33), that provides a series of documented, standardized,
and fully structure-annotated files of toxicity information. The
initial intention for the project was to deliver a public central
repository of toxicity information to allow for flexible analogue
searching, structure–activity relationship (SAR) model develop-
ment and the building of chemical relational databases. In order
to ensure maximum uptake by the public and allow users to inte-
grate the data into their own systems, the DSSTox project adopted
the use of a common standard file format to include chemical
structure, text, and property information. The DSSTox data sets
are among the most highly curated public data sets available and, in
the judgment of these authors, likely the reference standard in
publicly available structure-based toxicity data. Through ACToR
scientists can also access data from the ToxCast™ database (which
in itself can be used for computational toxicology modeling (34)).
This is a long-term, multi-million dollar effort that hopes to under-
stand biological processes impacted by chemicals that may lead to
adverse health effects, as well as generate predictive models that
should enable cheaper predictions of toxicity (35).
ChEMBL is a database of drugs and other small molecules of
biological interest. The database contains information for
>500,000 compounds and >3 million records of their effects on
biological systems including target binding, the effect of these
compounds on cells and organisms (e.g. IC50), and associated
ADMET data (>60,000 records of which >7,000 relate to toxic-
ity). The database contains manually curated SAR data from almost
40,000 papers from the medicinal chemistry and pharmacology
literature and therefore provides data that may be used for
Fig. 1. The Snorql graphical user interface showing the SPARQL query and the seven most frequent toxicity measurements recorded in ChEMBL.

computational toxicology purposes. Modern application programming interfaces (APIs) make it easy to query such databases. Version 2 of the ChEMBL database is available as resource description
framework (RDF) (36) via web services that allow searching of the
database with the query language known as SPARQL (37). Using a
SPARQL query we can summarize what experimental toxicology
data is available in the database. Figure 1 shows a graphical user
interface around the SPARQL end point (called Snorql), showing
the SPARQL query and the seven most frequent toxicity measure-
ments recorded in ChEMBL.
Using such RDF and SPARQL queries proteochemometrics
studies have recently been performed, linking small molecules to
their targets. In one study, metadata on the assessed assay quality
provided by the ChEMBL curators was used in the modeling of
IC50 values (38), hinting at the potential that the inclusion of
complementary information has in understanding complex
Fig. 2. A screenshot of the Bioclipse wizard used to create a local data set for substructure mining. The wizard allows the user to select two protein families that will be compared. The biological activity, IC50, is selected in this example and is dynamically discovered in the online database.

biological patterns. Similarly, in an unpublished work, kinase
families have been compared in a database-wide substructure
mining study, where molecular fragments were identified in
the small molecules that were more present in the database for one
kinase family compared to others (Fig. 2).
It is important to note that both studies were acting directly on
the remote ChEMBL database, and the use of the SPARQL stan-
dard is crucial, thereby allowing any tool to query the ChEMBL
database using the same Open Standard as an increasing amount of
other databases in life sciences. These include general life sciences
databases exposed by Chem2Bio2RDF (39) and Bio2RDF (40, 41)
as well as toxicology-specific databases.
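Because a SPARQL endpoint is just an HTTP service, any language with URL handling can query these resources; no toolkit-specific binding is required. A sketch using the Python standard library (the endpoint URL is a placeholder, and the `format=json` parameter, while widely supported by endpoint implementations, is not part of the SPARQL standard itself):

```python
import urllib.parse

def build_sparql_request(endpoint, query):
    """Encode a SPARQL query as an HTTP GET URL against an endpoint."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return endpoint + "?" + params

# List resources typed as OpenTox datasets (placeholder endpoint URL)
url = build_sparql_request(
    "http://example.org/sparql",
    "SELECT ?dataset WHERE { ?dataset a <http://www.opentox.org/api/1.1#Dataset> }",
)
# When online, the URL can then be fetched with urllib.request.urlopen(url).
```

The same request could equally be issued from a shell script or a workflow tool, which is precisely the interoperability benefit of agreeing on SPARQL as the query interface.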
The power of semantic web technologies lies primarily in the
fact that they describe more explicitly what the content of the
database is (often using standardized ontologies) and also provide
a unified approach to access data. A nice example of the power of
this approach is the use of a federated SPARQL query by Prud’hommeaux in 2007 (42). A federated SPARQL query hypotheti-
cally searches across multiple different databases simultaneously.
The example in this use case links five online databases, searching
for chemical compounds causing over-expression of a particular
gene involved in apoptosis and which have a chemical similarity to
compounds showing a low lethal dose in mice (35). The databases
providing the data to answer this query are available as Open Data.
The fact that these queries can easily be changed shows how the use
of databases will likely impact computational toxicology in the next
few years. Some examples of online databases that will likely con-
tribute to this shift are given below.
PubChem is the highest profile online database serving our
community and was launched by the NIH in 2004 to support the
“New Pathways to Discovery” component of their roadmap initia-
tive (43). The primary purpose for the database was to act as a
repository for biological properties of chemical probes. PubChem
archives and organizes information about the biological activities of
chemical compounds into a comprehensive biomedical database
and is intended to empower the scientific community to use small
molecule chemical compounds in their research. It contains screen-
ing data that could be used for building models relevant to toxicity.
One example is the availability of data to enable modeling of
the potassium channel human Ether-à-go-go-related gene (hERG).
This channel is particularly important pharmaceutically as many
drugs interact and cause hERG-related cardiotoxicity. Numerous
blockbuster drugs have recently been removed from the market due to long QT syndrome side effects, an abnormality associated with the
hERG and associated channels (44). Several groups have used this
hERG data for computational model testing (45–47). Despite the
authoritative position granted to PubChem, as evidenced in a
recent online questionnaire (Antony J. Williams, personal commu-
nication), PubChem data have been shown to be rather low quality
in many cases, especially when it comes to providing data sets for
the purpose of modeling. While the number of chemical probes screened by the National Screening Libraries to date is limited to 348,258, the PubChem database contains over 31 million unique chemicals. PubChem’s content is derived from the voluntary contributions of commercial, academic, and government organizations and, unfortunately, much of this has contributed to the pollution of the database. Chemical structures often do not
accurately represent the expected chemical and the quality of the
screening data has been questioned by Shoichet and others (42).
No data curation is performed other than cheminformatics filters
and standardization approaches at the time of deposition. We have
found retrieval of available molecular structures for well-known
FDA-approved drugs from PubChem is difficult and this can
severely impact current and future efforts at drug repurposing
using computational methods (48). For example, Symbicort is a
well-known combination drug of budesonide and formoterol.
A search on PubChem should return both drugs as one record
but instead returns only budesonide. Searching on budesonide
itself returns nine hits, one of which is Symbicort but eight of
which are the single component, all of the same molecular mass
but differing in stereochemistry. Searching for formoterol returns
six hits all differing in their stereochemistry. These are not isolated
examples (27).
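Stereochemistry-only duplicates of this kind can be detected mechanically: stripping stereo descriptors from SMILES strings collapses records that differ only in stereo annotation onto one key. A naive sketch (a real workflow would canonicalize structures with a cheminformatics toolkit; the SMILES below are illustrative alanine enantiomers, not the actual PubChem records):

```python
def strip_stereo(smiles):
    """Remove SMILES stereo descriptors: @/@@ atom chirality and / \\ bond geometry."""
    return smiles.replace("@", "").replace("/", "").replace("\\", "")

def group_stereo_variants(records):
    """Group (record_id, smiles) pairs that differ only in stereo annotations."""
    groups = {}
    for record_id, smiles in records:
        groups.setdefault(strip_stereo(smiles), []).append(record_id)
    return groups

# Illustrative hit list: two enantiomers and one unannotated form of the same skeleton
hits = [("CID-1", "C[C@H](N)C(=O)O"),
        ("CID-2", "C[C@@H](N)C(=O)O"),
        ("CID-3", "CC(N)C(=O)O")]
```

Note that naive string stripping still leaves the bracketed `C[CH](N)C(=O)O` form distinct from the unannotated `CC(N)C(=O)O`, which is exactly why proper canonicalization, not text manipulation, is needed for real curation.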
The NIMH Psychoactive Drug Screening Program (PDSP,
http://pdsp.med.unc.edu/indexR.html) (49–51) is a large data
set of compounds associated with biology data (over 55,000 values)
for many G-protein-coupled receptors. Several of these receptors
are associated with toxicity of drugs and may also be off-targets (49,
50, 52–54). The website has links to PubChem; however, a recent upload of this data into another database (Collaborative Drug Discovery, http://www.collaborativedrug.com/) has provided over 20,000 searchable structures, although the quality of these may be questionable (incorrect stereochemistry, etc.) as all the data came from PubChem originally. A subset of this data for the 5-HT2B receptor was previously used for building machine learning models but required extensive sourcing of compounds without structures provided in the database (55).
These public domain databases contain potentially valuable
data that can be used for the purposes of toxicology but, unfortu-
nately, have a number of challenges. There are, of course, the issues
regarding the quality of the experimental data (42) in terms of the
quality of measurement, reproducibility, and appropriateness of the
assays. Richard (33) has discussed the challenges of assembling
high-quality data based on the experiences of creating the DSSTox
data set, while Tropsha et al. (56) have examined the issues of
data quality in terms of the development of QSAR models. Williams
(27, 29) analyzed the quality of public domain databases in relation
to the curation of the ChemSpider database and has identified
common issues with regard to the relationships between chemical
structures and associated chemical names, generally drug names
and associated synonyms. As a result of the processes used to
assemble the databases, especially for repositories such as PubChem
(43), many of the public domain databases are contaminated. As
discussed earlier for PubChem, querying based on chemical names
can result in the retrieval of incorrect chemical structures that are
then used in the development of computational toxicology models.
The studies of Ekins and Williams (11, 29, 57) requiring the
assembly of structure data files have shown numerous challenges
with regards to the assembly of quality data from public domain
data (27). Williams et al. have initiated a study to examine 200 of
the top selling drugs and the consistency between a gold standard
set of compound structures and their presence in a series of public
compound databases. Early reports show that there are significant
issues with data sourced from PubChem, Wikipedia, Drugbank
(58), and others.
Williams and Ekins (27) have recently raised an alert with regard to data quality for internet-based chemistry resources if they and
their content are to be used for drug repurposing or integrated into
other cheminformatics or bioinformatics resources for drug discov-
ery. It is clear that it is not yet appropriate to treat any of these
chemistry databases as authoritative and users should be vigilant in
their use and reliance on the chemical structures and data derived
from these sources. They identified an urgent need for government
funding of data curation for public domain databases to improve
the overall quality of chemistry on the internet and stem the prolif-
eration of errors.

3. Utilizing Databases for Computational Toxicity

Computational access to the databases discussed in this section has not been formalized and, at best, one can download the full data
from a download page. Recently, however, the OpenTox project
proposed a standardized API for accessing data sets (59). The
resulting OpenTox Framework describes the use of a number of
technologies to implement this API, specifically RESTful services to
facilitate web access, and use of the RDF as an open standard for
communication. This combination makes it easy for third party
software projects to integrate databases exposed via the framework.
Bioclipse (60) was recently extended with a number of scripting
extensions to interact with OpenTox servers, and Table 1 shows
two approaches to accessing toxicological data sets made available
via the OpenTox framework (38). The first approach lists all data
sets and saves the first (index = 0) in the standard SDF file
format. The second approach uses an OpenTox ontology server
and queries the underlying RDF data directly with the SPARQL
query language; the query shown in the listing retrieves data sets
related to mutagenicity. At the time of writing
only one data set was available, but this will hopefully change as
the community embraces and adopts the OpenTox Framework.
Figure 3 shows how this data set was subsequently downloaded as
10 Accessing, Using, and Creating Chemical Property Databases. . . 231

Table 1
JavaScript scripts that show two approaches
to how Bioclipse can retrieve information
from OpenTox servers

//Approach 1: list all data sets provided by an OpenTox server
datasets = opentox.listDataSets("http://apps.ideaconsult.net:8080/ambit2/")
opentox.downloadDataSetAsMDLSDfile(
"http://apps.ideaconsult.net:8080/ambit2/",
datasets.get(0), "/OpenTox/dataset1.sdf"
)

//Approach 2: query an OpenTox ontology server
var sparql = " \
PREFIX ot: <http://www.opentox.org/api/1.1#> \
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> \
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> \
select ?dataset ?comment where { \
?dataset rdf:type ot:Dataset . \
OPTIONAL {?dataset ?p ?comment .} \
FILTER regex(?comment, \"mutagenicity\") . \
} \
";
rdf.sparqlRemote("http://apps.ideaconsult.net:8080/ontology/", sparql)

an SDF file and then displayed in a Bioclipse molecule table view.


With these types of approaches and with adoption of the standards
offered by the OpenTox Framework, it is expected that many of the
public domain databases will become more accessible to the com-
munity via interfaces such as that offered by Bioclipse.
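The RESTful access pattern described above can also be sketched outside of Bioclipse. The short Python sketch below rebuilds the Approach 2 query from Table 1 and percent-encodes it into a GET request URL. The endpoint URL is the one from the listing, the helper names are our own, and the request is only constructed (not sent), so the example runs offline.

```python
# Sketch (assumptions): mirrors Approach 2 from Table 1 in plain Python.
# The request is only constructed here, not sent.
from urllib.parse import urlencode

ONTOLOGY_ENDPOINT = "http://apps.ideaconsult.net:8080/ontology/"

def build_dataset_query(keyword):
    """Return a SPARQL query for data sets whose comment matches keyword."""
    return (
        "PREFIX ot: <http://www.opentox.org/api/1.1#> "
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> "
        "SELECT ?dataset ?comment WHERE { "
        "?dataset rdf:type ot:Dataset . "
        "OPTIONAL { ?dataset ?p ?comment . } "
        f'FILTER regex(?comment, "{keyword}") }}'
    )

def build_request_url(endpoint, sparql):
    """Percent-encode the query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": sparql})

url = build_request_url(ONTOLOGY_ENDPOINT, build_dataset_query("mutagenicity"))
```

Sending `url` with any HTTP client would return the same matches that the Bioclipse `rdf.sparqlRemote` call retrieves.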

4. Utilizing Databases for Prediction of Metabolic Sites

The toxicity of compounds is not always directly caused by the
compound itself, but may also be due to metabolism. Public data-
bases related to drug metabolism are rare. The lack of such databases
makes methods that predict metabolism, or just the likelihood that
structures are metabolized, all the more relevant. Predicted meta-
bolites can be screened against toxicity databases and existing pre-
dictive models, while the likelihood that a drug can undergo
metabolism can be used in the overall analysis of a compound’s
toxicity. The cytochrome P450 (CYP) family of heme-thiolate
enzymes is responsible for the majority of drug–drug interactions
and metabolism-dependent toxicity issues (61, 62). The identifica-
tion of the site-of-metabolism (SOM) for CYPs is an important

Fig. 3. A screenshot of a Bioclipse script that downloads a data set with mutagenicity information from an OpenTox server
and shows the resulting set of data in a molecules table view.

problem. Several methods have been reported to predict CYP


metabolism using QSAR, docking, pharmacophore modeling, and
statistical methods (61–64).
In the absence of free CYP metabolism databases the Meta-
Print2D (65, 66) method has been developed for the prediction
of metabolic sites from input chemical structures (see Figs. 4
and 5). The method uses an internal database based on historical
metabolite data derived from the proprietary Accelrys Metabolite
database (http://accelrys.com/products/databases/bioactivity/
metabolite.html), which is preprocessed using circular finger-
prints capable of accurately describing individual atom environ-
ments. New molecules can then be processed with the same
circular fingerprints, and the probability of metabolism can be
calculated for each atom by querying the database. This method
is very fast (ca. 50 ms per compound) (61) and performs well
compared to other algorithms in the field such as SmartCyp (67)
and MetaSite (68).
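To make the circular-fingerprint idea concrete, the sketch below computes a toy atom-environment fingerprint on a hand-made molecular graph and turns historical occurrence counts into a normalized site-of-metabolism score per atom. Everything here is illustrative: the graph encoding, the shell-based fingerprint, and the count tables are our own simplifications, not MetaPrint2D's actual fingerprints or data.

```python
# Molecule as {atom_id: (element, [neighbor_ids])}; ethanol heavy atoms C-C-O.
mol = {0: ("C", [1]), 1: ("C", [0, 2]), 2: ("O", [1])}

def atom_environment(mol, atom, depth=2):
    """Toy circular fingerprint: the multiset of element symbols found at
    each bond distance (0..depth) from `atom`."""
    shells, frontier, seen = [], {atom}, {atom}
    for _ in range(depth + 1):
        shells.append(tuple(sorted(mol[a][0] for a in frontier)))
        nxt = set()
        for a in frontier:
            nxt.update(b for b in mol[a][1] if b not in seen)
        seen |= nxt
        frontier = nxt
        if not frontier:
            break
    return tuple(shells)

def som_scores(mol, metabolized_counts, total_counts):
    """Score each atom by how often its environment was a reported site of
    metabolism in the (here: fabricated) historical data, normalized so the
    most likely atom scores 1.0."""
    raw = {}
    for atom in mol:
        env = atom_environment(mol, atom)
        total = total_counts.get(env, 0)
        raw[atom] = metabolized_counts.get(env, 0) / total if total else 0.0
    top = max(raw.values()) or 1.0
    return {a: r / top for a, r in raw.items()}

# Fabricated history: the oxygen's environment was seen 10 times, 7 as a SOM.
o_env = atom_environment(mol, 2)
scores = som_scores(mol, {o_env: 7}, {o_env: 10})
```

New molecules are scored exactly as in the text: compute each atom's environment, look it up in the precomputed tables, and normalize.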

Fig. 4. A screenshot of the Bioclipse workbench with the MetaPrint2D method predicting the SOM for a set of drugs. In
the interface, individual atoms are colored according to their likelihood of metabolism (red = high, yellow = medium,
and green = low). Please see the color figure online.

5. Standardizing QSAR Experiments
Setting up data sets for QSAR analysis is not without complications.
There are numerous software tools available for the calculation of
descriptors and the conversion of chemical structures into a form
which can be used in statistical and machine-learning methods.
These software packages are generally incompatible, have proprietary
file formats, or are available as standalone applications. A lack of
standardization in terms of descriptors and the associated descriptor
implementations has likely contributed to the poor quality of the
supplemental data for published QSAR models and has made it an

Fig. 5. A screenshot from Bioclipse with a script calling the MetaPrint2D method and outputting the predicted likelihood for
the SOM per atom.

often impossible task to reproduce the formation of the data set
and hence the entire analysis.
QSAR-ML (69) is a new open XML file format for exchanging
QSAR data sets that builds on the Blue Obelisk descriptor ontology
(70). The ontology provides an extensible way of uniquely defining
descriptors for use in QSAR experiments, and the exchange format
supports multiple versioned implementations of these descriptors.
As a result, a data set described in QSAR-ML can be set up again by
others in a completely reproducible way. A Bioclipse plug-in is
available for working with

Fig. 6. A screenshot from Bioclipse showing the QSAR plug-ins working graphically with QSAR datasets and complying with
the QSAR-ML standard for interoperable QSAR datasets.

QSAR data and provides graphical tools for importing molecules,


selecting descriptors, performing calculations, and exporting data
sets in QSAR-ML as well as in the more traditional CSV file formats
(Fig. 6). Descriptors can take advantage of software installed on
desktop computers, as well as calculations on networked computers
via Web services (71).
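The sketch below shows the general idea of such an exchange format: each descriptor is recorded with an ontology identifier and a versioned implementation reference, so the data set can be regenerated exactly. The element and attribute names and the version string are invented for illustration; they are not the actual QSAR-ML schema, for which see ref. 69.

```python
# Illustrative only: element/attribute names below are invented to show the
# idea of versioned descriptor references, not the real QSAR-ML schema.
import xml.etree.ElementTree as ET

def build_dataset(molecules, descriptors):
    root = ET.Element("qsardataset")
    dlist = ET.SubElement(root, "descriptors")
    for d in descriptors:
        ET.SubElement(dlist, "descriptor",
                      ontologyID=d["id"], implementationVersion=d["version"])
    mlist = ET.SubElement(root, "molecules")
    for name, values in molecules.items():
        m = ET.SubElement(mlist, "molecule", title=name)
        for d, v in zip(descriptors, values):
            ET.SubElement(m, "value", descriptor=d["id"]).text = str(v)
    return ET.tostring(root, encoding="unicode")

descriptors = [
    {"id": "blueobelisk:xlogP", "version": "CDK-1.4.0"},   # hypothetical version tag
    {"id": "blueobelisk:weight", "version": "CDK-1.4.0"},
]
xml_doc = build_dataset({"ethanol": [-0.14, 46.07], "benzene": [1.69, 78.11]},
                        descriptors)
```

Because the descriptor identifier and implementation version travel with the data, anyone reading the file can recompute the descriptor matrix with the same software version, which is the reproducibility property discussed above.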
Pfizer has recently evaluated open source descriptors and model
building algorithms using a training set of approximately 50,000
molecules and a test set of approximately 25,000 molecules with
human liver microsomal metabolic stability data (72). A C5.0 deci-
sion tree model demonstrated that the Chemistry Development Kit
(73) descriptors together with a set of SMARTS keys had good
statistics (Kappa = 0.43, sensitivity = 0.57, specificity = 0.91,
positive predictive value (PPV) = 0.64) equivalent to those of models
built with commercial MOE2D software and the same set of SMARTS
keys (Kappa = 0.43, sensitivity = 0.58, specificity = 0.91, PPV = 0.63).
This observation was also confirmed upon extension of the data
set to ~193,000 molecules and generation of a continuous model
using Cubist (http://www.rulequest.com/download.html). When
the continuous predictions and actual values were binned to give a
categorical score, an almost identical Kappa statistic (0.42) was
observed (72).
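For readers unfamiliar with these metrics, the sketch below computes them from a 2x2 confusion matrix. The counts are invented; only the formulas correspond to the statistics quoted above.

```python
# Worked sketch: the categorical statistics reported in the text, computed
# from a 2x2 confusion matrix with made-up counts.

def classification_stats(tp, fp, fn, tn):
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    ppv = tp / (tp + fp)                  # positive predictive value
    observed = (tp + tn) / n              # raw agreement
    # Chance agreement for Cohen's kappa: products of marginal probabilities
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return {"kappa": kappa, "sensitivity": sensitivity,
            "specificity": specificity, "ppv": ppv}

stats = classification_stats(tp=40, fp=10, fn=20, tn=130)
```

With these counts, observed agreement is 0.85 against a chance agreement of 0.60, giving kappa = 0.625; kappa therefore rewards agreement beyond what the class balance alone would produce, which is why it is quoted alongside sensitivity, specificity, and PPV.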

Fig. 7. A schematic indicating the release of ADME/Tox models from commercial descriptors and algorithms, and how it
may facilitate data sharing.

The advantages of models and data built with open source


technologies are the reduced costs and overheads as well as
the ease of sharing with other researchers (Fig. 7). This can enable
groups outside of pharmaceutical companies to have some of the
capabilities that such companies take for granted, such as models for
computational toxicology based on the abundance of data available to
them. We have recently proposed that companies could assist in
providing ADME/Tox data to a central repository as such data
could be considered precompetitive (57) and such a database
would be valuable for building and validating computational
models. Making such data available to the OpenTox Framework
could dramatically expand the availability of computational toxi-
cology across the community.
If toxicology data are to be integrated on a large scale, they must be
available in a form that can be processed by computers. Scientific
data processing by computers has been cited as one of the primary
bottlenecks for research (74).
There are numerous standardization initiatives in the life sciences,
with the Minimum Information standards (http://en.wikipedia.
org/wiki/Minimum_Information_Standards) as examples of stan-
dards which specify the minimum amount of metadata and data
required to meet a specific aim in a field. The MGED consortium
pioneered this approach in bioinformatics with the definition for
Minimum Information About Microarray Experiments (MIAME)
(75), and it has now become a requirement that data must be
deposited in MIAME-compliant public repositories using the
MAGE-ML exchange format in order to publish microarray
experiments in most journals. QSAR-ML is a step towards the

same goal for QSAR studies but in order to meet this demand a
database for sharing QSAR-ML-compliant experiments must first
be developed.

6. Conclusions

Computational toxicology modeling depends on the availability of


data to both generate and validate models and access to software to
generate the appropriate descriptors and develop the resulting
algorithms. At this stage in the development cycle for tools to
support computational toxicology we should expect that high-
quality data sets which can be used as the basis of model develop-
ment and validation would be available. In recent years, especially
with the development of the internet as a distribution network for
data and as a result of investments in bioinformatics and public
domain data sources, there has been a dramatic increase in the
availability of data. While data are available, caution is warranted:
in many cases the data are not curated, and scientists are encouraged
to understand the nature of the sources from which they are
extracting data before using them for modeling.
We have proposed the construction of a validated ADME/Tox
database and suggest an updated strategy for how the scientific
community could build such a resource (57).
1. Identify all available publications containing toxicity data relat-
ing to molecular structures tested in animal or human tissues
in vitro or in vivo. Mine the data from these publications
relating to toxicity properties.
2. Clean and organize the data from these publications, e.g.,
relate by species, tissue, cell types, and capture experimental
conditions using manual curation and create an ontology.
3. Provide a means for other scientists to update and include new
properties.
4. Encourage pharmaceutical companies to publish their previ-
ously “unpublished toxicology data” in exchange for access to a
duplicate of the database of toxicity data for their own in-house
efforts for internal deployment.
5. As an example of the development and value of such a toxicol-
ogy database, toxicity computational models could be built,
validated, and provided over the Web for free.
Assuming that such a database, or series of interconnected
databases, could be developed, it will be necessary for the
community to agree on and adopt standards for describing the data and,
in parallel, develops similar approaches for the distribution of

resulting computational models. The efforts of the OpenTox


Framework or other organizations to develop standards are likely
to catalyze significant shifts. Coupling high-quality data with new
standards in interoperability, model interpretability, and usability
will hopefully enable the development of improved models for
computational toxicology. We can then envisage a deeper under-
standing of the toxicity mechanisms of molecules, and this will
ultimately enable the prediction of the adverse effects of these
chemicals on human health and/or the environment.

Acknowledgments

SE gratefully acknowledges the many collaborators involved in the


cited work. Contributions by OS and ELW were supported by
Uppsala University (KoF 07).
SE consults for Collaborative Drug Discovery, Inc. on a Bill and
Melinda Gates Foundation Grant#49852 “Collaborative drug dis-
covery for TB through a novel database of SAR data optimized to
promote data archiving and sharing.”

References
1. Helma C (ed) (2005) Predictive toxicology. MetaCore and MetaDrug platforms. Xenobio-
Taylor and Francis, Boca Raton tica 36(10–11):877–901
2. Cronin MTD, Livingstone DJ (2004) Predict- 8. Ekins S (2006) Systems-ADME/Tox:
ing chemical toxicity and fate. CRC, Boca resources and network approaches. J Pharma-
Raton col Toxicol Methods 53:38–66
3. Ekins S (2007) Computational toxicology: risk 9. Nikolsky Y, Ekins S, Nikolskaya T, Bugrim A
assessment for pharmaceutical and environ- (2005) A novel method for generation of sig-
mental chemicals. Wiley, Hoboken nature networks as biomarkers from complex
4. Ekins S, Boulanger B, Swaan PW, Hupcey high throughput data. Toxicol Lett 158:20–29
MAZ (2002) Towards a new age of virtual 10. Ekins S, Nikolsky Y, Nikolskaya T (2005)
ADME/TOX and multidimensional drug Techniques: application of systems biology to
discovery. J Comput Aided Mol Des absorption, distribution, metabolism, excre-
16:381–401 tion, and toxicity. Trends Pharmacol Sci
5. Voutchkova AM, Osimitz TG, Anastas PT 26:202–209
(2010) Toward a comprehensive molecular 11. Ekins S, Williams AJ, Xu JJ (2010) A predictive
design framework for reduced hazard. Chem ligand-based Bayesian model for human drug
Rev 110:5845–5882 induced liver injury. Drug Metab Dispos
6. Ekins S, Giroux C (2006) Computers and sys- 38:2302–2308
tems biology for pharmaceutical research and 12. Zientek M, Stoner C, Ayscue R, Klug-McLeod
development. In: Ekins S (ed) Computer appli- J, Jiang Y, West M, Collins C, Ekins S (2010)
cations in pharmaceutical research and devel- Integrated in silico-in vitro strategy for addres-
opment. John Wiley, Hoboken, pp 139–165 sing cytochrome P450 3A4 time-dependent
7. Ekins S, Bugrim A, Brovold L, Kirillov E, inhibition. Chem Res Toxicol 23:664–676
Nikolsky Y, Rakhmatulin EA, Sorokina S, Rya- 13. Langdon SR, Mulgrew J, Paolini GV, van
bov A, Serebryiskaya T, Melnikov A, Metz J, Hoorn WP (2010) Predicting cytotoxicity
Nikolskaya T (2006) Algorithms for network from heterogeneous data sources with Bayesian
analysis in systems-ADME/Tox using the learning. J Cheminform 2:11

14. Clark RD, Wolohan PR, Hodgkin EE, Kelly using a Bayesian model. J Med Chem
JH, Sussman NL (2004) Modelling in vitro 47:4463–4470
hepatotoxicity using molecular interaction 26. Bender A (2005) Studies on molecular similar-
fields and SIMCA. J Mol Graph Model ity. Ph.D. Thesis, University of Cambridge,
22:487–497 Cambridge
15. Cheng A, Dixon SL (2003) In silico models for 27. Williams AJ, Ekins S (2012) A quality alert for
the prediction of dose-dependent human hep- chemistry databases. Towards a gold standard:
atotoxicity. J Comput Aided Mol Des regarding quality in public domain chemistry
17:811–823 databases and approaches to improving the sit-
16. Ung CY, Li H, Yap CW, Chen YZ (2007) In uation, Drug Discovery Today, Volume 17,
silico prediction of pregnane X receptor activa- Issues 13–14, Pages 685–701. Submitted for
tors by machine learning approaches. Mol publication
Pharmacol 71:158–168 28. Judson R (2010) Public databases supporting
17. Marechal JD, Yu J, Brown S, Kapelioukh I, computational toxicology. J Toxicol Environ
Rankin EM, Wolf CR, Roberts GC, Paine MJ, Health 13:218–231
Sutcliffe MJ (2006) In silico and in vitro 29. Williams AJ, Tkachenko V, Lipinski C, Tropsha
screening for inhibition of cytochrome P450 A, Ekins S (2009) Free online resources
CYP3A4 by co-medications commonly used enabling crowd-sourced drug discovery. Drug
by patients with cancer. Drug Metab Dispos Discov World 10(Winter):33–38
34:534–538 30. Richard AM, Williams CR (2002) Distributed
18. Ekins S, Waller CL, Swaan PW, Cruciani G, structure-searchable toxicity (DSSTox) public
Wrighton SA, Wikel JH (2000) Progress database network: a proposal. Mutat Res
in predicting human ADME parameters in 499:27–52
silico. J Pharmacol Toxicol Methods 31. Judson R, Richard A, Dix D, Houck K,
44:251–272 Elloumi F, Martin M, Cathey T, Transue TR,
19. Boelsterli UA, Ho HK, Zhou S, Leow KY Spencer R, Wolf M (2008) ACToR—aggre-
(2006) Bioactivation and hepatotoxicity of gated computational toxicology resource.
nitroaromatic drugs. Curr Drug Metab Toxicol Appl Pharmacol 233:7–13
7:715–727 32. Overington J (2009) ChEMBL An interview
20. Kassahun K, Pearson PG, Tang W, McIntosh I, with John Overington, team leader, chemoge-
Leung K, Elmore C, Dean D, Wang R, Doss G, nomics at the European Bioinformatics Insti-
Baillie TA (2001) Studies on the metabolism of tute Outstation of the European Molecular
troglitazone to reactive intermediates in vitro Biology Laboratory (EMBL-EBI). Interview
and in vivo. Evidence for novel biotransforma- by Wendy A. Warr. J Comput Aided Mol Des
tion pathways involving quinone methide for- 23:195–198
mation and thiazolidinedione ring scission. 33. Richard AM (2006) DSSTox web site launch:
Chem Res Toxicol 14:62–70 Improving public access to databases for build-
21. Walgren JL, Mitchell MD, Thompson DC ing structure-toxicity prediction models. Pre-
(2005) Role of metabolism in drug-induced clinica 2:103–108
idiosyncratic hepatotoxicity. Crit Rev Toxicol 34. Kortagere S, Krasowski MD, Reschly EJ,
35:325–361 Venkatesh M, Mani S, Ekins S (2010) Evalua-
22. Park BK, Kitteringham NR, Maggs JL, Pirmo- tion of computational docking to identify
hamed M, Williams DP (2005) The role of pregnane X receptor agonists in the ToxCast™
metabolic activation in drug-induced hepato- database. Environ Health Perspect
toxicity. Annu Rev Pharmacol Toxicol 118:1412–1417
45:177–202 35. Sanderson K (2011) It’s not easy being green.
23. Schuster D, Laggner C, Langer T (2005) Why Nature 469:18–20
drugs fail—a study on side effects in new 36. Carroll JJ, Klyne G (2004) Resource descrip-
chemical entities. Curr Pharm Des tion framework (RDF): concepts and abstract
11:3545–3559 syntax. Tech rep, W3C
24. Xu JJ, Henstock PV, Dunn MC, Smith AR, 37. Prud’hommeaux E, Seaborne A (2008)
Chabot JR, de Graaf D (2008) Cellular imag- SPARQL query language for RDF, W3C rec-
ing predictions of clinical drug-induced liver ommendation
injury. Toxicol Sci 105:97–105
38. Willighagen EL, Alvarsson J, Andersson A,
25. Xia XY, Maliski EG, Gallant P, Rogers D Eklund M, Lampa S, Lapins M, Spjuth O,
(2004) Classification of kinase inhibitors

Wikberg J (2011) Linking the resource ome to discover the molecular targets for
description framework to cheminformatics plant-derived psychoactive compounds: a
and proteochemometrics. J Biomedical Seman- novel approach for CNS drug discovery. Phar-
tics 2(Suppl 1):S1–S6 macol Ther 102:99–110
39. Chen B, Dong X, Jiao D, Wang H, Zhu Q, 52. Keiser MJ, Setola V, Irwin JJ, Laggner C,
Ding Y, Wild DJ (2010) Chem2Bio2RDF: a Abbas AI, Hufeisen SJ, Jensen NH, Kuijer
semantic framework for linking and data MB, Matos RC, Tran TB, Whaley R, Glennon
mining chemogenomic and systems chemical RA, Hert J, Thomas KL, Edwards DD, Shoi-
biology data. BMC Bioinformatics 11:255 chet BK, Roth BL (2009) Predicting new
40. Ansell P (2011) Model and prototype for que- molecular targets for known drugs. Nature
rying multiple linked scientific datasets. Future 462:175–181
Generat Comput Syst 27:329–333 53. Setola V, Dukat M, Glennon RA, Roth BL
41. Belleau F, Nolin MA, Tourigny N, Rigault P, (2005) Molecular determinants for the interac-
Morissette J (2008) Bio2RDF: towards a tion of the valvulopathic anorexigen norfen-
mashup to build bioinformatics knowledge sys- fluramine with the 5-HT2B receptor. Mol
tems. J Biomed Inform 41:706–716 Pharmacol 68:20–33
42. Prud’hommeaux E (2007) Case study: FeDeR- 54. Rothman RB, Baumann MH, Savage JE, Rau-
ate for drug research. Tech Rep: 4–7 ser L, McBride A, Hufeisen SJ, Roth BL
43. Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, (2000) Evidence for possible involvement of
Bryant SH (2009) PubChem: a public infor- 5-HT(2B) receptors in the cardiac valvulopathy
mation system for analyzing bioactivities of associated with fenfluramine and other seroto-
small molecules. Nucleic Acids Res 37: nergic medications. Circulation
W623–W633 102:2836–2841
44. Crumb WJ Jr, Ekins S, Sarazan D, Wikel JH, 55. Chekmarev DS, Kholodovych V, Balakin KV,
Wrighton SA, Carlson C, Beasley CM (2006) Ivanenkov Y, Ekins S, Welsh WJ (2008) Shape
Effects of antipsychotic drugs on Ito, INa, Isus, signatures: new descriptors for predicting car-
IK1, and hERG: QT prolongation, structure diotoxicity in silico. Chem Res Toxicol
activity relationship, and network analysis. 21:1304–1314
Pharm Res 23:1133–1143 56. Zhu H, Tropsha A, Fourches D, Varnek A,
45. Su BH, Shen MY, Esposito EX, Hopfinger AJ, Papa E, Gramatica P, Oberg T, Dao P, Cherka-
Tseng YJ (2010) In silico binary classification sov A, Tetko IV (2008) Combinatorial QSAR
QSAR models based on 4D-fingerprints and modeling of chemical toxicants tested against
MOE descriptors for prediction of hERG Tetrahymena pyriformis. J Chem Inf Model
blockage. J Chem Inf Model 50:1304–1318 48:766–784
46. Li Q, Jorgensen FS, Oprea T, Brunak S, 57. Ekins S, Williams AJ (2010) Precompetitive
Taboureau O (2008) hERG classification preclinical ADME/Tox data: set It free on the
model based on a combination of support vec- web to facilitate computational model building
tor machine method and GRIND descriptors. to assist drug development. Lab Chip
Mol Pharm 5:117–127 10:13–22
47. Thai KM, Ecker GF (2009) Similarity-based 58. Wishart DS, Knox C, Guo AC, Cheng D, Shri-
SIBAR descriptors for classification of chemi- vastava S, Tzur D, Gautam B, Hassanali M
cally diverse hERG blockers. Mol Divers (2008) DrugBank: a knowledgebase for
13:321–336 drugs, drug actions and drug targets. Nucleic
Acids Res 36:D901–D906
48. Ekins S, Williams AJ, Krasowski MD, Freun-
dlich JS (2011) In silico repositioning of 59. Hardy B, Douglas N, Helma C, Rautenberg M,
approved drugs for rare and neglected diseases. Jeliazkova N, Jeliazkov V, Nikolova I, Benigni
Drug Discov Today 16(7–8):298–310 R, Tcheremenskaia O, Kramer S, Girschick T,
Buchwald F, Wicker J, Karwath A, Gutlein M,
49. Strachan RT, Ferrara G, Roth BL (2006) Maunz A, Sarimveis H, Melagraki G, Afantitis
Screening the receptorome: an efficient A, Sopasakis P, Gallagher D, Poroikov V, Fili-
approach for drug discovery and target valida- monov D, Zakharov A, Lagunin A, Gloriozova
tion. Drug Discov Today 11:708–716 T, Novikov S, Skvortsova N, Druzhilovsky D,
50. O’Connor KA, Roth BL (2005) Finding new Chawla S, Ghosh I, Ray S, Patel H, Escher S
tricks for old drugs: an efficient route for (2010) Collaborative development of predic-
public-sector drug discovery. Nat Rev Drug tive toxicology applications. J Cheminform 2:7
Discov 4:1005–1014 60. Spjuth O, Alvarsson J, Berg A, Eklund M,
51. Roth BL, Lopez E, Beischel S, Westkaemper Kuhn S, Masak C, Torrance G, Wagener J,
RB, Evans JM (2004) Screening the receptor- Willighagen EL, Steinbeck C, Wikberg JE

(2009) Bioclipse 2: a scriptable integration human cytochromes from the perspective of


platform for the life sciences. BMC Bioinfor- the chemist. J Med Chem 48:6970–6979
matics 10:397 69. Spjuth O, Willighagen EL, Guha R, Eklund M,
61. Afzelius L, Arnby CH, Broo A, Carlsson L, Wikberg JE (2010) Towards interoperable and
Isaksson C, Jurva U, Kjellander B, Kolmodin reproducible QSAR analyses: exchange of data-
K, Nilsson K, Raubacher F, Weidolf L (2007) sets. J Cheminform 2:5
State-of-the-art tools for computational site of 70. Floris F, Willighagen EL, Guha R, Rojas M,
metabolism predictions: comparative analysis, Hoppe C (2010) The blue obelisk descriptor
mechanistic insights, and future applications. ontology. Technical report
Drug Metab Rev 39:61–86 71. Wagener J, Spjuth O, Willighagen EL, Wikberg
62. Jolivette LJ, Ekins S (2007) Methods for pre- JE (2009) XMPP for cloud computing in bio-
dicting human drug metabolism. Adv Clin informatics supporting discovery and invoca-
Chem 43:131–176 tion of asynchronous web services. BMC
63. Crivori P, Poggesi I (2006) Computational Bioinformatics 10:279
approaches for predicting CYP-related metab- 72. Gupta RR, Gifford EM, Liston T, Waller CL,
olism properties in the screening of new drugs. Bunin B, Ekins S (2010) Using open source
Eur J Med Chem 41:795–808 computational tools for predicting human met-
64. Stjernschantz E, Vermeulen NP, Oostenbrink abolic stability and additional ADME/TOX
C (2008) Computational prediction of drug properties. Drug Metab Dispos 38:2083–2090
binding and rationalisation of selectivity 73. Steinbeck C, Hoppe C, Kuhn S, Floris M,
towards cytochromes P450. Expert Opin Guha R, Willighagen EL (2006) Recent devel-
Drug Metab Toxicol 4:513–527 opments of the chemistry development kit
65. Boyer S, Arnby CH, Carlsson L, Smith J, Stein (CDK)—an open-source java library for
V, Glen RC (2007) Reaction site mapping of chemo- and bioinformatics. Curr Pharm Des
xenobiotic biotransformations. J Chem Inf 12:2111–2120
Model 47:583–590 74. Brazma A (2001) On the importance of stan-
66. Carlsson L, Spjuth O, Adams S, Glen RC, dardisation in life sciences. Bioinformatics
Boyer S (2010) Use of historic metabolic bio- 17:113–114
transformation data as a means of anticipating 75. Brazma A, Hingamp P, Quackenbush J, Sher-
metabolic sites using MetaPrint2D and Bio- lock G, Spellman P, Stoeckert C, Aach J,
clipse. BMC Bioinformatics 11:362 Ansorge W, Ball CA, Causton HC, Gaasterland
67. Rydberg P, Gloriam DE, Olsen L (2010) The T, Glenisson P, Holstege FC, Kim IF, Marko-
SMARTCyp cytochrome P450 metabolism witz V, Matese JC, Parkinson H, Robinson A,
prediction server. Bioinformatics Sarkans U, Schulze-Kremer S, Stewart J, Taylor
26:2988–2989 R, Vilo J, Vingron M (2001) Minimum infor-
68. Cruciani G, Carosati E, De Boeck B, Ethirajulu mation about a microarray experiment
K, Mackie C, Howe T, Vianello R (2005) (MIAME)-toward standards for microarray
MetaSite: understanding metabolism in data. Nat Genet 29:365–371
Chapter 11

Molecular Dynamics
Xiaolin Cheng and Ivaylo Ivanov

Abstract
Molecular dynamics (MD) simulation holds the promise of revealing the mechanisms of biological
processes in their ultimate detail. It is carried out by computing the interaction forces acting on each
atom and then propagating the velocities and positions of the atoms by numerical integration of Newton’s
equations of motion. In this review, we present an overview of how MD simulations can be conducted to
address computational toxicity problems. The study cases will cover a standard MD simulation performed
to investigate the overall flexibility of a cytochrome P450 (CYP) enzyme and a set of more advanced MD
simulations to examine the barrier to ion conduction in a human α7 nicotinic acetylcholine receptor
(nAChR).

Key words: Molecular dynamics, Force field, Toxicity, Free energy, Enhanced sampling

1. Introduction

1.1. Overview of the Topic: How the Topic Fits into the Wider Scope of Computational Toxicology

Drug toxicity, an exaggerated pharmacological response, is one of
the main reasons for high failure rates in drug discovery and
development (1, 2). Major drug toxicity mechanisms can be divided into
four general groups: on-target toxicity, which results from a drug binding to
the intended receptor but at an inappropriate concentration, with
suboptimal kinetics, or in unintended tissues; off-target toxicity, which is
caused by a drug binding to an unintended receptor; harmful
immunological reactions; and idiosyncratic toxicity. The identifica-
tion of drug toxicity is a complex task, and toxic effects are often not
revealed until a compound has been selected for development, or
has entered the clinic. However, with the structures of enzymes and
receptors associated with chemical toxicity determined or character-
ized, structure-based studies will bring us closer to understanding
the molecular mechanisms of drug toxicity and even predicting drug
toxicity (1, 3). Drug metabolism through specialized enzymes,
e.g., cytochrome P450, is a major consideration for drug clearance

Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_11, # Springer Science+Business Media, LLC 2012


and metabolite-induced toxicity (4, 5). Virtually all drug molecules


are metabolized in the human body, which modifies the pharmaco-
logical properties of these drugs. Additionally, a number of
hormone-mediated receptors are known to be related to toxicity,
e.g., the epidermal growth factor receptor and the androgen recep-
tor, towards which a variety of compounds show agonistic or antag-
onistic activities (6). Finally, in recent years the hERG potassium
channel has become an important target for toxicity screening due to
its possible involvement in the life-threatening drug-induced long
QT syndrome (7, 13).
Common approaches in computational toxicity involve the use
of cheminformatics to determine if a new compound is toxic
(8–11). Quantitative structure–activity relationship (QSAR) tech-
niques have been widely used to correlate toxicity with a variety of
physiochemical and structural properties for congeneric series of
compounds. As toxicity data accumulates and expands in dimen-
sions, data mining and machine learning techniques from computer
science and statistics have also become popular in chemical toxicity
predictions. In recent years, high-throughput screening (HTS) and
“omic” technologies have made important strides in advancing our
understanding of drug toxicity (12–14). In particular, systematic
approaches to studying signaling networks and pathways have
provided tremendous insights into the toxicity mechanisms at the
molecular and cellular levels (1). HTS and “omic” studies produce
enormous amount of data, which in turn requires computational
approaches to analyze, interpret, and build more predictive models.
Cheminformatics-based methods, however, do not provide insights
into the molecular mechanisms of drug toxicity; they nevertheless
serve to predict which compounds are likely to be toxic and to estimate
the relative margin of safety for a group of compounds. In addition
to these more phenomenological approaches, molecular dynamics
(MD) simulation is finding increasing applications in computa-
tional toxicity research, especially when the dynamic nature of
enzymes or receptors involved in drug toxicity cannot be ignored.
With ongoing developments of structural and systems biology and
computational techniques, MD simulation has also become an
essential component in the emerging multi-scale toxicity modeling
approaches that incorporate structural and functional information
at multiple scales (15, 16).
MD simulation based on the numerical integration of New-
ton’s equation of motion can reveal the mechanisms of biological
processes in their ultimate detail, and in general has been used in
two different ways. The first is to use the simulation to probe the
actual dynamics of the system. Thus MD simulation opens the
possibility for direct observation of the motion of biomolecules at
the atomic scale. For example, the folding/unfolding of peptides or
small proteins (17, 18) and the permeation of water molecules through
biological channels (19, 20) have been extensively studied with
11 Molecular Dynamics 245

MD, which has shed light on underlying mechanisms that would
otherwise be difficult to obtain from experiments. The second way
is to use simulation simply as a means of sampling. Then, in a
statistical mechanics framework, a variety of equilibrium and kinetic
properties of the systems can be derived and compared with experi-
ments. Therefore, MD simulation not only allows direct visualiza-
tion of the dynamics of biomolecules but also helps elucidate the
underlying molecular mechanisms (driving forces) of the observed
behavior. Since the 1970s, the method of MD simulation has
gained popularity in biochemistry and biophysics (21–24). As the
simulation capability has increased in complexity and scale, MD has
increasingly served as a “computational microscope” to investigate the
molecular details of many complex biological processes, with appli-
cation to protein folding (25), enzymatic catalysis (26, 27), molec-
ular machines (28), etc. More recently, MD simulation has also
been used to address drug toxicity problems in which the dynamic
nature of proteins is essential.

1.2. Application Areas for the Techniques

In the context of computational toxicology, MD simulations have
been used to help understand the molecular mechanisms of drug
toxicity, especially in cases where the target structures are
known, and, to a lesser extent, to predict drug toxicity. Biological
systems are dynamic in nature; characterization of their dynamics
at the atomistic level is therefore essential to understanding many
biological phenomena including drug toxicity. MD simulation is an
excellent computational tool for capturing dynamics of biological
systems. When coupled with other computational tools, it opens
the opportunity to address many fundamental mechanistic pro-
blems in drug toxicity. Current application areas for MD include
the following:
First, when combined with the theory of thermodynamics and
statistical mechanics, MD simulation can be used to compute many
physicochemical properties of drug-like molecules, such as octa-
nol/water partition coefficients, water solubility, solvent accessible
area, and Henry’s constant. The pKa values, chemical reactivity
(such as redox potentials) and hydrolysis rate constants are also
computable with a combined quantum mechanical/molecular
mechanical (QM/MM) treatment (29, 30). Then, using QSAR-
like or data modeling approaches, the computed physicochemical
properties can be correlated with many toxicity-related properties,
e.g., the distribution of a drug in blood and tissue or the kinetics of
its metabolic activity.
Second, a few classes of enzymes or receptors are known to be
involved in chemical toxicity for various reasons. MD simulations
have been extensively applied to investigate the molecular mechan-
isms of these enzymes or receptors at an atomic level.
The cytochrome P450 superfamily (CYP) is a large and diverse
group of enzymes, accounting for ~75% of the total drug metabolism
246 X. Cheng and I. Ivanov

(31–33). They play a primary role in reducing drug toxicity
through metabolic oxidation that then leads to the clearance of a
drug. The CYP enzymes are capable of recognizing a wide
variety of chemically diverse substrates; the molecular details of this
promiscuity, however, have remained elusive. Therefore, under-
standing the mechanism and specificity of substrate binding in the
CYP enzymes is an important step toward explaining their key role
in drug metabolism, toxicity, and xenobiotic degradation. A variety
of experimental approaches have been employed to probe the
dynamic nature of the enzymes, to study the substrate interaction
with the active sites, and to identify potential residues involved in
the catalytic mechanisms of the substrates. The availability of the
structures of CYP P450 enzymes from X-ray crystallography allows
MD simulations to be used to explore various aspects of the protein
dynamics in substrate recognition and catalysis, complementing
experimental findings (34).
Nuclear receptors comprise a family of ligand-mediated tran-
scription factors within cells that are responsible for sensing hor-
mones and other molecules (35–37). The binding of hormonally
active compounds or endocrine disruptors to these receptors can
elicit a variety of adverse effects in humans, including promotion of
hormone-dependent cancers and reproductive disorders. A number
of receptors involved in drug or environmental toxicity are known,
including thyroid hormone receptor (38), epidermal growth factor
receptor (39), aryl hydrocarbon receptor (40, 41), androgen receptor
(42), and estrogen receptor (43). Ligands that bind to and
activate nuclear receptors are typically lipophilic, such as endoge-
nous hormones and xenobiotic endocrine disruptors. When the
structures of the targets become available, usually from X-ray or
by homology modeling, the most common computational
approach for understanding the ligand–receptor interactions is
molecular docking. However, MD simulations can help overcome
many issues in molecular docking simulation, including optimization
of the complex structures, accommodation of receptor flexibility,
and improvement of scoring functions (44). MD simula-
tions can also be used to provide an explanation for dynamic
regulation and molecular basis of agonicity and antagonicity in
nuclear receptors (45).
The hERG potassium channel is responsible for the electrical
activity of the heart that coordinates the heartbeat. hERG
blockage has been linked to life-threatening arrhythmias and thus
represents a major safety concern in drug development (7). Exten-
sive experimental studies have greatly advanced our knowledge on
the molecular basis of hERG-mediated arrhythmias. Although the
crystal structure of hERG has not been determined, a few homol-
ogy models have been developed, enabling MD simulations to be
used to study these channels (46, 47). A variety of physiological
processes are amenable to MD studies, e.g., how drug molecules
are able to bind to hERG and then block the ion flow through the
channel. MD simulations can be particularly valuable for
membrane-bound hERG potassium channels as experimental char-
acterization of their structural dynamics is very challenging.

1.3. How, When, and by Whom These Techniques or Tools Are Used in Practice

1.3.1. Cytochrome P450 Enzymes

MD simulations have been extensively employed to investigate the
conformational dynamics of cytochrome P450 enzymes, which is
thought to play an important role in ligand binding and catalysis.
Meharenna et al. have performed high-temperature MD simulations
to probe the structural basis for enhanced stability in thermostable
cytochrome P450. The comparison of the MD trajectories at
500 K suggests that the tight nonpolar interactions involving Tyr26
and Leu308 in the Cys ligand loop are responsible for the enhanced
stability in CYP119, the most thermostable P450 known (48).
Using MD simulations at normal and high temperatures, Skopalík
et al. have studied the flexibility and malleability of three
microsomal cytochromes: CYP3A4, CYP2C9, and CYP2A6. MD simu-
lations reveal flexibility differences between these three
cytochromes, which appear to correlate with their substrate prefer-
ences (49). Hendrychová et al. have employed MD simulations and
spectroscopy experiments to probe the flexibility and malleability of
five forms of human liver CYP enzymes, and have demonstrated
consistently from different techniques that CYP2A6 and CYP1A2
have the least malleable active sites while CYP2D6, CYP2C9, and
CYP3A4 exhibit considerably greater degrees of flexibility (50).
Lampe et al. have utilized MD simulations in conjunction with
two-dimensional heteronuclear single quantum coherence NMR
spectroscopy to examine substrate and inhibitor binding to
CYP119, a P450 from Sulfolobus acidocaldarius. Their results sug-
gest that tightly binding hydrophobic ligands tend to lock the
enzyme into a single conformational substate, whereas weakly
binding low-affinity ligands bind loosely in the active site, resulting
in a distribution of localized conformers. Their MD simulation
results further show that the ligand-free enzyme samples ligand-
bound conformations of the enzyme, thus suggesting that ligand
binding proceeds through conformational selection rather than
induced fit (51). By means of MD simulations, Park et al. have
examined the differences in structural and dynamic properties
between CYP3A4 in the resting form and its complexes with the
substrate progesterone and the inhibitor metyrapone (52).
The dynamics of the substrate binding site has also been a
matter of extensive MD studies because of its crucial role in under-
standing the specificity and selectivity of the enzyme towards dif-
ferent substrates. Diazepam is metabolized by CYP3A4 with
sigmoidal dependence kinetics, which has been speculated to be
caused by the cooperative binding of two substrates in the active
site. Fishelovitch et al. have performed MD simulations of the
substrate-free CYP3A4 and the enzymes with one and two
diazepam molecules bound, to understand the factors governing
the cooperative binding (53). Seifert et al. performed extensive MD
simulations to investigate the molecular basis of activity and regios-
electivity of the human microsomal cytochrome CYP2C9 toward
its substrate warfarin (54). The simulations suggest that the crystal
structure of CYP2C9 with warfarin was captured in a nonproduc-
tive state, whereas in the productive states both 7- and 6-positions
of warfarin (with position 7 markedly favored over position 6) are in
contact with the heme, consistent with experimentally determined
regioselectivity.
The crystal structure of CYP3A4 revealed that a small active site
is buried inside the protein, extending to the protein surface
through a narrow channel. Conventional MD simulations and
MD simulations with enhanced sampling methods have been used
to explore the dynamical entrance and exit of substrates/products
into the active site through the access channel. Wade et al. have
extensively investigated the ligand exit pathways and mechanisms of
P450cam (CYP101), P450BM-3 (CYP102), and P450eryF
(CYP107A1) by using random expulsion MD and conventional
MD simulations (55), suggesting that the channel opening
mechanisms are adjusted to the physicochemical properties of the
substrates and can kinetically modulate the protein–substrate spec-
ificity (56). Steered molecular dynamics (SMD) simulations have
been used by Li et al. and Fishelovitch et al. to pull metyrapone
(57), temazepam, testosterone-6bOH (58) out of CYP3A4 respec-
tively, in order to identify the preferred substrate/product path-
ways and their gating mechanism. Based on the simulation results,
they concluded that product exit preferences in CYP3A4 are regu-
lated by protein–substrate specificity. A quantitative assessment of
the channel accessibility for various substrates would require the
calculation of the potential of mean force (PMF) along the plausible
ligand access pathways, but research along this line has not been
reported to date due to the lack of efficient MD sampling
techniques.

1.3.2. hERG Potassium Channel

No experimentally determined 3D structure of the hERG potassium
channel is available, whereas the structures of several homol-
ogous voltage-gated potassium channels have been determined by
X-ray crystallography. MD simulations have previously been used
to refine and evaluate homology models and to acquire an atomic-level
description of the channel pore and possible binding of channel
blockers. Stary et al. have used a combination of geometry/pack-
ing/normality validation methods as well as MD simulations to
obtain a consensus model of the hERG potassium channel (47).
Subbotina et al. have used MD simulations to optimize a full model
of the hERG channel including all transmembrane segments devel-
oped by using a template-driven de-novo design with ROSETTA-
membrane modeling, leading to a structural model that is
consistent with the reported structural elements inferred from
mutagenesis and electrophysiology experiments (59). Masetti
et al. have run MD simulations of the homology models of the
hERG channel in both open and closed states, and showed that the
combined use of MD and docking is suitable to identify possible
binding modes of several drugs, reaching a fairly good agreement
with experiments (46).
During the past decade, many aspects of channel functions
(such as gating, ion permeation, and voltage sensing) of voltage-
gated potassium channels have been extensively studied by means
of MD simulations. In comparison, MD studies of the gating
mechanisms and functions of the hERG channels have been rare.
However, as our knowledge about the structural features of this
important protein continues to advance, MD simulations are
expected to be increasingly used in the study of hERG. Stansfeld
et al. have studied the inactivation mechanisms in the hERG potas-
sium channels, and have revealed that the carbonyl of Phe627,
forming the S0 K+ binding site, swiftly rotates away from the
conduction axis in the wild-type channel while occurs less fre-
quently in the non-inactivating mutant channels (60). MD simula-
tions of a hERG model carried out by Kutteh et al. suggest that the
fast inactivation might be caused by an unusually long S5-P linker
in the outer mouth of hERG that moves closer to the channel axis,
possibly causing a steric hindrance to permeating K+ ions (61).
Osterberg et al. have used MD simulations combined with docking
and the linear interaction energy method to evaluate the binding
affinities of a series of sertindole analogues binding to the human
hERG potassium channel. The calculations reproduce the relative
binding affinities of these compounds very well and indicate that
both polar interactions near the intracellular opening of the
selectivity filter and hydrophobic complementarity in the region
around F656 are important for blocker binding (62).

2. Materials

2.1. Common Software and Methods Used in the Field

Many MD packages have been developed over the years, including
CHARMM (63), AMBER (64), GROMOS (65), GROMACS
(66), TINKER (67), MOLDY (68), DL_POLY (69), NAMD
(70), LAMMPS (71), and Desmond (72). Some of them have
their associated force fields, while others only provide an MD
engine and require compatible force fields for running a simulation.
For biomolecular simulation, CHARMM, AMBER, GROMOS,
NAMD, and GROMACS are most widely used. CHARMM and
AMBER have enjoyed the longest history of continuous develop-
ment and offer a wide range of functionalities for advanced
sampling as well as pre- and post-simulation analysis. They also
carry their own force fields (73, 74) developed for proteins, nucleic
acids, lipids, and carbohydrates. The development of NAMD and
GROMACS has been focused on performance. Recent versions
of NAMD and GROMACS have shown remarkable parallel effi-
ciency on both high-end parallel platforms and commodity clusters,
thus being very attractive for large-scale MD simulations of biomo-
lecular systems. GROMACS supports a variety of force fields,
including all-atom GROMOS (75), AMBER (74), and CHARMM
(73) as well as united atom and coarse-grained force fields. NAMD
runs with standard AMBER, CHARMM, and GROMACS-style
ASCII topology and coordinate files without the need for any format
conversion. Desmond is a relatively new software package devel-
oped at D.E. Shaw Research to perform high-performance MD
simulations of biological systems. Desmond supports several var-
iants of the AMBER, CHARMM, and OPLS-AA force fields (76).

2.2. Special Requirements for the Software and Methods (e.g., Hardware, Computing Platform, and Operating System)

Most MD programs run under Unix/Linux-like operating
systems. As MD simulations of biomolecules are computationally
intensive, production runs are often performed on high-end parallel
platforms or commodity clusters of many processors. Recently,
with the emergence of special-purpose processors such as
field-programmable gate arrays (FPGAs) and graphics processing units
(GPUs) designed to speed up computing-intensive portions of
applications, several MD codes have been adapted and ported to
run on these platforms as well (77–79). Fortran or C/C++ compi-
lers are required since most MD codes are written in Fortran or C/
C++. Special parallel programming libraries are also required to run
MD in parallel on multiple CPU cores or multiple nodes on a
network, e.g., the MPI library for distributed memory systems, and
POSIX threads and OpenMP for shared memory systems. Most MD
codes use MPI, a message-passing application programmer interface,
while NAMD uses Charm++ parallel objects for good performance
on a wide variety of underlying hardware platforms. FFTW, a C
subroutine library for computing the discrete Fourier transform, is
extensively employed by state-of-the-art MD codes for treating long-
ranged electrostatic interactions with the particle mesh Ewald (PME)
(80) or particle–particle particle–mesh (P3M) (81) methods. Pre-
compiled libraries, e.g., special math library, or script language library
may also be required by some MD programs, such as the TCL library
used in both NAMD (70) and VMD (82).

2.3. Preferred Software and Why

Several MD codes are currently used by the biomolecular simulation
community, and each has its strengths and weaknesses. Choosing
MD software for your simulation will depend on many
factors, such as the force field compatibility, the speed/scalability,
the support for simulation setup and post-simulation analysis, and
the availability of special simulation techniques. It also depends on
the problem/system under investigation and the computer
resources at your disposal. NAMD (70) has been the software of
preference for large-scale, conventional explicit solvent simulations
for the following reasons: (1) superior scalability; (2) compatibility
with the AMBER and CHARMM force fields; (3) integration with the
popular molecular graphics program VMD for simulation setup and
trajectory analysis; and (4) flexibility through a TCL scripting
language interface, so that users can customize simulations for
special purposes without needing to modify and recompile the source
code. Last but not least, NAMD provides excellent
user support, including online documentation, tutorials, and
frequent workshops.

3. Methods

3.1. Molecule Building

3.1.1. Initial Coordinates

MD simulation starts with a 3D structure as the initial configuration
of the system. This structure can be an NMR or X-ray structure
obtained from the Brookhaven Protein Databank (http://www.rcsb.org/pdb/).
If no experimentally determined structure is avail-
able, an atomic-resolution model of the “target” protein can be
constructed from its amino acid sequence by homology modeling.
Homology modeling can produce high-quality structural models
when the target and the template, a homologous protein with an
experimentally determined structure, are closely related. The
choice of an initial configuration must be done carefully as this
can influence the results of the simulation. When multiple PDB
entries are available, which structure to choose usually depends on
the quality of the structure, the state in which the structure was
captured and the experimental condition under which the structure
was determined. It is important to choose a configuration in a state
best representing what one wishes to simulate.
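This retrieval step can be scripted. The Python sketch below is an illustration only (the RCSB download URL pattern and the helper names are assumptions, not part of this chapter); it checks that an accession code looks like a PDB ID and downloads the corresponding coordinate file:

```python
from urllib.request import urlretrieve

# Assumed RCSB file-download URL pattern (an illustration, not from this chapter)
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{}.pdb"

def is_valid_pdb_id(pdb_id):
    """PDB accession codes are four alphanumeric characters starting with a digit."""
    return len(pdb_id) == 4 and pdb_id[0].isdigit() and pdb_id.isalnum()

def fetch_pdb(pdb_id, dest_dir="."):
    """Download a PDB entry and return the local file path (requires network)."""
    if not is_valid_pdb_id(pdb_id):
        raise ValueError(f"not a valid PDB ID: {pdb_id!r}")
    path = f"{dest_dir}/{pdb_id.lower()}.pdb"
    urlretrieve(RCSB_DOWNLOAD.format(pdb_id.upper()), path)
    return path
```

In practice one would also inspect the downloaded file's header for the experimental method, resolution, and captured state before using it.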

3.1.2. Prune Protein Structure

With a 3D structure in hand, a few things still need to be sorted out
before a simulation can get started. (1) Remove redundant atoms:
X-ray structures may be captured in a multimer state; NMR may
yield an ensemble of conformations; multiple conformations may
exist for some flexible side chains; extra chemical agents may be
added to facilitate the structure determination. All these redundant
atoms should be removed prior to further structure processing.
(2) Add missing atoms: depending on the quality of a PDB struc-
ture, some coordinates may be missing. It is important to check
whether the missing coordinates are relevant to the question to be
addressed. First, for those proteins whose active form is multimeric,
it is necessary to construct the multimer structures from
the deposited monomer coordinates using associated symmetry
and/or experimental constraints. Second, if only a few residues or
atoms are missing, most MD building programs can reliably rebuild
these missing parts. However, if the missing gap is large, e.g., a
flexible loop that is often found missing or partially missing due to
its dynamic nature, most MD building programs will fail in rebuild-
ing a reliable structure. Ab initio loop modeling or homology
modeling tools should be used instead for these cases (83). Third,
since X-ray structures usually do not give the hydrogen positions,
another necessary step is to add hydrogen atoms and assign appro-
priate protonation state to ionizable residues. The ionization states
of residues such as glutamate, aspartate, histidine, lysine, and argi-
nine can be extremely relevant for the function of a protein. So it is
advisable to use pKa calculations to aid in the assignment of pro-
tonation states. Several software packages and Web servers are
available for this purpose, such as H++ (http://biophysics.cs.vt.edu/H++/),
Karlsberg+ (http://agknapp.chemie.fu-berlin.de/karlsberg/),
PROPKA (http://propka.ki.ku.dk/), and MCCE
(http://134.74.90.158/). (3) Replace mutated atoms: For those
proteins that do not form stable structures or only form transient
(weak) structure complexes with their ligands, some of the residues
might have been modified in order to obtain stable crystal struc-
tures. In these cases, the modified residues or atoms should be
replaced by the native ones. (4) Build the structure from multiple
components: when the complete structure of a protein–ligand com-
plex or a multi-domain protein complex is not available, the molec-
ular docking or protein–protein docking tools can be used to build
an initial complex structure for simulation and refinement, from the
available structures of its components.
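Step (1), removing redundant atoms, can be illustrated with a few lines of code. The Python sketch below is a toy example (the function name and the single-chain/first-altLoc policy are illustrative choices, not the chapter's prescription) that keeps the ATOM records of one chain and collapses alternate side-chain locations:

```python
def prune_pdb(lines, keep_chain="A"):
    """Keep ATOM records of one chain and only the first alternate location
    (altLoc blank or 'A'), blanking the altLoc column in the output.
    Columns follow the PDB format: altLoc at index 16, chain ID at index 21."""
    kept = []
    for line in lines:
        if not line.startswith("ATOM"):
            continue                      # drop HETATM, ANISOU, etc.
        if line[21] != keep_chain:
            continue                      # keep a single chain
        if line[16] not in (" ", "A"):
            continue                      # drop redundant altLoc copies
        kept.append(line[:16] + " " + line[17:])
    return kept
```

Dedicated preparation tools (e.g., psfgen with NAMD/CHARMM, or LEaP with AMBER) handle many more cases, but this column-based filtering is the underlying idea.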

3.1.3. Molecular Structure and/or Topology File

Given a refined PDB structure of the protein or protein complex,
the next step is to generate the topology or parameter files for the
system. The topology file contains the geometrical information of
the system, e.g., bonds, angles, dihedral angles, and interaction list.
Sometimes, topology files are combined with parameter files, thus
may also contain the force field parameters, i.e., the functional
forms and parameter sets used to describe the potential energy of
the system. Various force fields have been developed for different
types of biomacromolecules, including proteins, nucleic acids,
lipids, and carbohydrates. The choice of an appropriate force field
is of substantial importance, and will depend on the nature of the
system (problem) of interest. In general, the chosen force field
should be compatible with the MD engine, and the force fields
for different components of the system should be consistent with
each other. Most MD programs provide auxiliary utility programs
for generating topology and parameter files from PDB files. The
procedure is straightforward except for a few potentially confusing
items, for instance, some special treatments (or patch) may be
required if there exists a disulfide bond between a pair of cysteine
residues, if the residue/atom name conventions in PDB and the
topology generating programs are different, or if a nonstandard
protonation state is assigned to a residue. If the system contains a
nonstandard residue or novel ligand molecule, then its force field
must be generated first. Also, the new force field must be compati-
ble with that used for the rest of the system. Generalized force fields
for drug-like molecules compatible with the AMBER (84) and
CHARMM (85) all-atom force fields have become available.

3.1.4. Solvate the System

To simulate a biological system in an aqueous solution, a choice
should be made between explicit and implicit solvent models.
However, only explicit solvent models (such as the TIP3P, SPC/
E water models) will be discussed in this review. For crystal waters,
it is advisable to keep them, especially for those located in the active
site or the interior of the protein that often play a structural or
functional role. When necessary, additional water molecules can be
placed inside or around the protein using programs such as
DOWSER (http://hekto.med.unc.edu:8080/HERMANS/software/DOWSER/).
The system is then solvated with a pre-equilibrated
water box. Ions (usually Na+, K+, Cl−) are added to
neutralize the system, and to reach a desired ionic concentration.
For membrane-associated systems, proteins need to be inserted into a
pre-equilibrated lipid bilayer. The orientation and position of
proteins in the membrane can be determined by the online server OPM
(http://opm.phar.umich.edu/), together with experiments and
the modeler’s intuition. The lipid composition is another issue
worth considering, as accumulating evidence has shown that it
can have a significant and differential impact on the function of
membrane-bound proteins. The CHARMM force field supports
six types of lipids 1,2-dipalmitoyl-sn-phosphatidylcholine (DPPC),
1,2-dimyristoyl-sn-phosphatidylcholine (DMPC), 1,2-dilauroyl-sn-
phosphatidylcholine (DLPC), 1-palmitoyl-2-oleoyl-sn-phosphati-
dylcholine (POPC), 1,2-dioleoyl-sn-phosphatidylcholine (DOPC),
and 1-palmitoyl-2-oleoyl-sn-phosphatidylethanolamine (POPE),
while VMD provides two types of pre-equilibrated POPC and
POPE membrane patches. After everything is assembled together,
the new structure/topology files can be built, followed by several
rounds of energy minimization to remove bad van der Waals con-
tacts.
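The ion counts themselves follow from a short conversion between molarity and particle number. The Python sketch below (NaCl is assumed, and the rounding convention is an illustrative choice) adds enough ion pairs to reach the target concentration, plus counterions to neutralize the net solute charge:

```python
AVOGADRO = 6.02214076e23

def ion_counts(box_lengths_angstrom, conc_molar, net_charge):
    """Return (n_Na, n_Cl): ion pairs for the target concentration plus
    counterions that neutralize the net solute charge (NaCl assumed)."""
    lx, ly, lz = box_lengths_angstrom
    volume_liters = lx * ly * lz * 1e-27      # 1 Angstrom^3 = 1e-27 L
    pairs = round(conc_molar * AVOGADRO * volume_liters)
    n_na = pairs + max(0, -net_charge)        # extra cations if solute is negative
    n_cl = pairs + max(0, net_charge)         # extra anions if solute is positive
    return n_na, n_cl
```

For example, a 100 × 100 × 100 Å³ box at 0.15 M around a solute of net charge −8 gives 98 Na+ and 90 Cl−.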

3.2. Conducting Simulations

3.2.1. MD Theory and Algorithm

By integrating Newton's equations of motion, molecular dynamics
allows the time evolution of a system of particles to be followed in
terms of a dynamical trajectory (a record of all particle positions and
momenta at discrete points in time over a time span T). For a
system of N particles of masses m_i and positions {r_i}, the equations
of motion are:

F_i = -\nabla_{r_i} U = m_i \ddot{r}_i, \qquad i = 1, \ldots, N,  (1)
where the potential energy U acting on the atomic nuclei may be
described, neglecting the electronic interactions, by simple functions
of the ionic positions, U = U({r_i}). The total potential
energy function can be expressed as a sum of potentials derived
from simple physical forces: van der Waals, electrostatic, mechanical
strains arising from ideal bond length and angle deviations, and
internal torsion flexibility. The forces can be separated into bonded
and nonbonded terms.
U(\{r_i\}) = \sum_{b} K_b (l - l_0)^2 + \sum_{a} K_a (\theta - \theta_0)^2
           + \sum_{\text{imp.}} K_\omega (\omega - \omega_0)^2
           + \sum_{\text{dihed.}} K_\chi \left[ 1 + \cos(n\chi - \delta) \right]
           + \sum_{i,j} \frac{q_i q_j}{4\pi\varepsilon_0 \varepsilon_m r_{ij}}
           + \sum_{i,j} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right).  (2)
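The individual terms of Eq. (2) are simple enough to evaluate directly. The following Python fragment is a minimal sketch (the numerical parameters are illustrative, not taken from any force field; the Coulomb prefactor of ≈332.06 folds 1/(4πε₀) into kcal·Å/(mol·e²) units, assuming ε_m = 1):

```python
def bond_energy(r, kb, r0):
    """Harmonic bond term K_b (l - l_0)^2 of Eq. (2)."""
    return kb * (r - r0) ** 2

def lj_energy(r, a, b):
    """Lennard-Jones (van der Waals) term A_ij / r^12 - B_ij / r^6."""
    return a / r ** 12 - b / r ** 6

def coulomb_energy(r, qi, qj, ke=332.0636):
    """Coulomb term q_i q_j / (4 pi eps0 eps_m r); with ke as given,
    charges in elementary charges and distances in Angstroms yield kcal/mol."""
    return ke * qi * qj / r

# Example: a bond stretched 0.1 A beyond its ideal length (illustrative parameters)
e_bond = bond_energy(1.1, 300.0, 1.0)
```

A full force-field evaluation is just a sum of such terms over the topology's bond, angle, dihedral, and nonbonded pair lists.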
In classical mechanics, the trajectory of the system is deter-
mined by its initial conditions, namely the initial atomic positions
and velocities. The integration algorithm can be derived from the
Taylor series expansion of the atomic positions with respect to the
simulation time t:
r(t + dt) = r(t) + \dot{r}(t)\, dt + \frac{1}{2}\ddot{r}(t)\, dt^2 + \frac{1}{3!}\dddot{r}(t)\, dt^3 + O(dt^4).  (3)
By summing the expansions for r(t + dt) and r(t - dt), we
obtain the Verlet algorithm:

r(t + dt) = 2 r(t) - r(t - dt) + a(t)\, dt^2 + O(dt^4),  (4)
where r(t) and a are the position vector and the acceleration vector,
respectively; dt is the time step. The time step dt has to be chosen
sufficiently small to ensure continuity in the forces
acting on the atoms and overall energy conservation.
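Equation (4) translates almost directly into code. The sketch below (Python; the unit harmonic oscillator is a stand-in test system, not an example from this chapter) propagates the Verlet recursion and can be compared against the analytic solution x(t) = cos(t):

```python
import math

def verlet(x0, x1, accel, dt, nsteps):
    """Position Verlet, Eq. (4): x(t+dt) = 2 x(t) - x(t-dt) + a(t) dt^2.
    x0, x1 are positions at t = 0 and t = dt; accel(x) returns a(x)."""
    xs = [x0, x1]
    for _ in range(nsteps):
        x_prev, x = xs[-2], xs[-1]
        xs.append(2.0 * x - x_prev + accel(x) * dt * dt)
    return xs

# Unit harmonic oscillator, a(x) = -x, exact solution x(t) = cos(t)
dt = 0.01
traj = verlet(1.0, math.cos(dt), lambda x: -x, dt, 1000)
```

With dt = 0.01 the trajectory tracks cos(t) closely over a thousand steps, while a much larger dt would quickly lose both accuracy and energy conservation.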
The ensemble average of an observable A in a system characterized
by the Hamiltonian H, in the classical limit of statistical
mechanics (ℏ → 0), can in most cases be considered equivalent
to its time average:

\langle A \rangle = \frac{\int dp^N\, dr^N\, A(p^N, r^N)\, e^{-\beta H(p^N, r^N)}}{\int dp^N\, dr^N\, e^{-\beta H(p^N, r^N)}}
\simeq \lim_{\tau \to \infty} \frac{1}{\tau} \int_0^{\tau} dt'\, A\big(p^N(t'), r^N(t')\big).  (5)

Here, β = 1/k_BT, where k_B is the Boltzmann constant and N is the
number of degrees of freedom for the system under consideration.
The above equivalence relation is known as the ergodic hypothesis,
and for many systems similar to those considered here, the
validity of the hypothesis has been confirmed. The ergodic hypoth-
esis provides the rationale for the molecular dynamics method and a
practical recipe that allows ensemble averages to be determined
from time averaging over dynamical trajectories.
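The ergodic hypothesis can be illustrated numerically. In the Python sketch below, a toy overdamped Langevin model (an illustrative stand-in, not a system from this chapter) samples a harmonic well U = kx²/2; the time average of x² along one long trajectory approaches the ensemble average kT/k predicted by Eq. (5):

```python
import math
import random

def langevin_time_average(nsteps=200_000, dt=0.01, kT=1.0, k=1.0,
                          gamma=1.0, seed=1):
    """Overdamped Langevin dynamics in a harmonic well U = k x^2 / 2.
    Returns the time average of x^2, which for an ergodic trajectory
    should approach the Boltzmann ensemble average kT / k."""
    rng = random.Random(seed)
    x = 0.0
    acc = 0.0
    sigma = math.sqrt(2.0 * kT * dt / gamma)  # noise amplitude per step
    for _ in range(nsteps):
        x += -(k * x / gamma) * dt + sigma * rng.gauss(0.0, 1.0)
        acc += x * x
    return acc / nsteps
```

Because successive configurations are correlated, the time average converges only as the trajectory length greatly exceeds the system's relaxation time, a caveat that applies equally to real MD sampling.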

3.2.2. Interaction Treatment and Integration Method

How an MD simulation will be run is controlled by a set of input
parameters contained in a configuration file, such as the number of
steps and the temperature. The main options and values can be
divided into three categories: (1) interaction energy treatment; (2)
integration method; (3) ensemble specification. Additional sets of
parameters may be used by advanced simulation techniques, e.g.,
enhanced sampling and free energy simulations. Most explicit sol-
vent MD simulations employ periodic boundary conditions to
avoid the boundary artifact. The most time consuming part of a
simulation is the calculation of nonbonded terms in potential
energy functions, e.g., the electrostatic and van der Waals forces.
In principle, the nonbonded energy terms between every pair of
atoms should be evaluated; in this case, the number of operations
increases as the square of the number of atoms for a pairwise model
(N^2). To speed up the computation, the nonbonded interactions,
e.g., the electrostatic and van der Waals forces, are truncated if two
atoms are separated by more than a predefined cutoff distance. The
long-ranged electrostatic interactions typically use FFT-based PME
(80) or particle–particle particle–mesh (P3M) (81) methods that
reduce the computational complexity from N^2 to N log N. MD
simulation involves the numerical integration of Newton’s equa-
tions of motion in finite time steps that must be small enough to
avoid discretization errors. Typical time steps used in MD are on the
order of 1 fs (i.e., shorter than the period of the fastest vibrations in
biomolecular systems). This value may be increased by using con-
straint algorithms such as SHAKE (86), which constrain the fastest
vibrations (e.g., bonds involving hydrogen atoms). Multiple-time-step methods
are also available, which allow for extended times between updates
of slowly varying long-range forces (87). The total simulation
duration should be chosen to be long enough to reach biologically
relevant time scales or allow sufficient sampling (barrier crossing),
and should also account for the available computational resources
so that the calculation can finish within a reasonable wall-clock
time.
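Two ingredients above, periodic boundary conditions and cutoff truncation, can be sketched in a few lines. The Python fragment below works in one dimension for clarity (real MD codes do this in 3D with neighbor lists, handing the long-range electrostatic part to PME/P3M as described):

```python
def minimum_image(dx, box):
    """Wrap a displacement into the nearest periodic image:
    the minimum-image convention for a box of length `box`."""
    return dx - box * round(dx / box)

def pair_energy(xi, xj, box, cutoff, eps=1.0, sigma=1.0):
    """Truncated Lennard-Jones interaction between two 1D coordinates
    in a periodic box; interactions beyond the cutoff are set to zero."""
    r = abs(minimum_image(xi - xj, box))
    if r >= cutoff:
        return 0.0
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)
```

Combined with a cell or neighbor list, this cutoff test is what reduces the short-range work from N^2 toward order N.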

3.2.3. Temperature and Pressure Control

MD simulation is often performed in one of the following three
thermodynamic ensembles: microcanonical (NVE), canonical (NVT), and
isothermal–isobaric (NPT). In the NVE ensemble, the number of
particles (N ), the volume (V), and the total energy (E ) of the system
are held constant. In the canonical ensemble, N, V, and the temper-
ature (T ) are constant, where the temperature is maintained
through a thermostat. In the NPT ensemble that corresponds
256 X. Cheng and I. Ivanov

most closely to laboratory conditions, N, T, and the pressure (P) are held constant; here, a barostat is needed in addition to a thermostat. In the simulation of biological membranes, anisotropic pressure control is more appropriate, e.g., constant membrane area (PA) or constant surface tension (Pγ). A variety of thermostat methods
are available to control temperature, which include velocity rescal-
ing (88), the Nosé-Hoover thermostat (89, 90), Nosé-Hoover
chains (91), the Berendsen thermostat (92), and Langevin dynam-
ics. Note that the Berendsen thermostat might cause unphysical
translations and rotations of the simulated system. As with temper-
ature control, different ways of pressure control are available for
MD simulation, including the length-scaling technique of Berend-
sen (92) and the extended Nosé-Hoover (extended Lagrangian)
formalism of Martyna et al. (93).
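As a minimal illustration of temperature control, the sketch below implements plain velocity rescaling, the simplest of the thermostats listed above; the Boltzmann constant value, mass, and array shapes are illustrative assumptions, not any package's API.

```python
import numpy as np

kB = 0.0019872041  # Boltzmann constant in kcal/(mol K), CHARMM-style units

def instantaneous_temperature(v, m):
    """Temperature from equipartition: sum(m v^2) = N_dof * kB * T."""
    n_dof = v.size  # 3 degrees of freedom per particle (constraints ignored)
    return (m * v**2).sum() / (n_dof * kB)

def rescale_velocities(v, m, t_target):
    """One velocity-rescaling thermostat step: scale all velocities so the
    instantaneous temperature matches the target exactly."""
    t_now = instantaneous_temperature(v, m)
    return v * np.sqrt(t_target / t_now)

rng = np.random.default_rng(0)
m = 12.0                                   # a single illustrative mass (amu)
v = rng.normal(0.0, 1.0, size=(100, 3))    # arbitrary initial velocities
v = rescale_velocities(v, m, 300.0)
print(round(instantaneous_temperature(v, m), 6))  # -> 300.0
```

Real thermostats such as Nosé-Hoover instead couple the system to an extended variable so that the canonical velocity distribution, not just the mean temperature, is reproduced; brute-force rescaling is shown here only because it fits in a few lines.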

3.2.4. Equilibration

Before a production run, a multistage equilibration simulation is often necessary, especially for a heterogeneous system composed of
multiple components or phases. For a typical system of a protein
embedded in an explicit solvent box, the equilibration often starts
with fixing the protein and letting the waters move to adjust to the
presence of the protein. After the waters are equilibrated, the constraints on the protein can be removed, letting the whole system (protein + water) evolve with time. During the heating phase,
initial velocities corresponding to a low temperature are assigned,
and then the temperature is gradually brought up until the target
temperature is reached. As the simulation continues, several proper-
ties of the system are routinely monitored, including the tempera-
ture, the pressure, the energies, and the structure. The production simulation can be started only after these properties become stable
with respect to time. The purpose of the equilibration phase is to
minimize nonequilibrium effects and avoid unphysical local struc-
tural distortions, thus leading to a more meaningful simulation.

3.3. Postprocessing and Analyzing the Results

During an MD simulation, the coordinates and velocities of every atom in the system can be saved at a prespecified frequency for later analysis. From the saved data, in principle, all the structural, thermodynamic (i.e., energy, temperature, pressure, velocity distributions), and dynamic (diffusion, time correlation functions)
properties of the system can be computed. A variety of post-
processing utility programs can be found in popular MD software
packages. As MD simulations are often used to help visualize and
understand conformational dynamics at an atomic level, after the
simulation the MD trajectory is typically first loaded to molecular
graphics programs to display possible structural changes of interest
in a time-dependent way. Post-processing also includes more quantitative and detailed structural analyses, such as distance, angle,
dihedral angle, contact, hydrogen bond, radius of gyration, radial
distribution functions, protein secondary structure, sugar puckering,
11 Molecular Dynamics 257

and DNA local curvature. Additional geometrical quantities that are routinely calculated from an MD simulation trajectory include the root mean square deviation (RMSD) between two structures and the root mean square fluctuation (RMSF). The time trajectory of the RMSD
shows how a protein structure deviates from a reference structure as
a function of time, while the time-averaged RMSF indicates the
flexibility of different regions of a protein, which is related to the
crystallographic B-factors.
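Both quantities reduce to a few lines of array arithmetic. The sketch below assumes the trajectory frames have already been aligned to a common reference (translation and rotation removed), which real tools do for you; the toy trajectory is fabricated for illustration.

```python
import numpy as np

def rmsd(frame, ref):
    """RMSD between two pre-aligned coordinate sets of shape (n_atoms, 3)."""
    return np.sqrt(((frame - ref) ** 2).sum(axis=1).mean())

def rmsf(trajectory):
    """Per-atom RMS fluctuation about the time-averaged structure.

    trajectory: array of shape (n_frames, n_atoms, 3), already aligned.
    """
    mean = trajectory.mean(axis=0)
    return np.sqrt(((trajectory - mean) ** 2).sum(axis=2).mean(axis=0))

# Toy trajectory: 3 frames, 2 atoms; atom 0 wobbles along x, atom 1 is rigid
traj = np.array([
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.1, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[-0.1, 0.0, 0.0], [1.0, 0.0, 0.0]],
])
print(rmsd(traj[1], traj[0]))  # deviation of frame 1 from frame 0
print(rmsf(traj))              # atom 0 fluctuates, atom 1 does not
```

RMSD compares one frame against one reference; RMSF averages over all frames per atom, which is why it maps naturally onto crystallographic B-factors.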
Nowadays, large-scale MD simulations produce an immense
quantity of data. Clustering and correlation analyses are standard
mathematical tools that are used to group similar structures or
detect correlations in large data sets. Moreover, the covariance
matrix of atomic displacements can be diagonalized to obtain
large-scale collective motions under a quasi-harmonic approxima-
tion, and to compute configurational entropies via various approximations (94, 95).
MD simulations is the (free) energy decomposition by the MM-PB
(GB)SA method (96), or by the integration of individual force
components. Time correlation functions can also be easily com-
puted from an MD trajectory, which, in turn, can be used to relate
the dynamics of atoms and electrons to various molecular spectros-
copy data using the theory of nonequilibrium statistical mechanics.
For nonequilibrium simulations, or simulations using special sampling techniques for achieving importance sampling, e.g., umbrella sampling (97), the generalized ensemble algorithm (98), and the Wang–Landau method (99), special post-processing, such as the weighted histogram analysis method (100) or a maximum-likelihood method, is required to recover equilibrium thermodynamic quantities from the biased simulations.
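The quasi-harmonic step mentioned above — diagonalizing the covariance matrix of atomic displacements to extract large-scale collective motions — can be sketched compactly. The synthetic trajectory below stands in for real MD data; the function name is an assumption for illustration.

```python
import numpy as np

def principal_modes(trajectory):
    """Diagonalize the covariance of atomic displacements.

    trajectory: (n_frames, n_atoms, 3). Returns eigenvalues (descending)
    and eigenvectors of the 3N x 3N covariance matrix; the largest
    eigenvalues correspond to the most collective motions.
    """
    n_frames = trajectory.shape[0]
    flat = trajectory.reshape(n_frames, -1)        # (n_frames, 3N)
    disp = flat - flat.mean(axis=0)                # displacements from mean
    cov = disp.T @ disp / n_frames                 # covariance matrix
    evals, evecs = np.linalg.eigh(cov)             # ascending order
    return evals[::-1], evecs[:, ::-1]             # largest mode first

rng = np.random.default_rng(1)
# Synthetic data: one dominant collective motion plus small noise
t = rng.normal(size=(500, 1))
direction = np.ones(6) / np.sqrt(6)                # 2 atoms x 3 coordinates
traj = (t * direction + 0.01 * rng.normal(size=(500, 6))).reshape(500, 2, 3)
evals, evecs = principal_modes(traj)
print(evals[0] / evals.sum())  # first mode carries almost all the variance
```

In the quasi-harmonic approximation, each eigenvalue also feeds into a configurational-entropy estimate, treating every mode as an independent harmonic oscillator.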

3.4. Validating the Results

The simulation results should be validated at two levels: first, to assess whether the simulation is conducted properly; second, to assess
whether the model underlying the simulation sufficiently describes
the problem to be probed. A variety of MD outputs can provide
hints about whether the simulation is conducted properly, including
the time-dependent thermodynamic quantities (i.e., temperature,
pressure, and volume), their fluctuations, and the distribution of velo-
cities in the system. For example, one would expect the conservation of
the total energy in an NVE ensemble simulation, while any significant
energy drift indicates possible problem of either the integration algo-
rithm or the interaction force evaluation. Structural features of the
system can be validated by visualization to rule out any unphysical
(inappropriate) changes, contacts, or assembly; computer programs, e.g., Verify3D (http://nihserver.mbi.ucla.edu/Verify_3D/), the PROCHECK phi/psi angle check (http://www.ebi.ac.uk/thornton-srv/software/PROCHECK/), WHAT_CHECK Packing 2 (http://swift.cmbi.ru.nl/gv/whatcheck/), Prosa2003 (http://www.came.sbg.ac.at/prosa_details.php), ModFOLD (http://www.reading.ac.uk/bioinf/ModFOLD), and a local quality assessment method (101) are also available for assessing the overall quality of the structures
sampled in simulation. The second level of validation is usually
through comparing simulation results to experiments and/or results
of other methods, which will be further discussed below. When no
experimental result is available for comparison, one should run
benchmark/test simulations first to validate against quantities that have already been determined experimentally. Only after this validation can more trustworthy simulations be performed for other, related but as-yet-unknown properties. Finally, it is always advisable to run
multiple or control simulations, since the behavior of the control
simulations or the difference between the production and control
simulations usually provides valuable insights into the reliability of
the underlying simulations.
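The first-level check for NVE energy drift described above is easy to automate; in this sketch the synthetic energy traces and the acceptance threshold are illustrative assumptions, not values from the text.

```python
import numpy as np

def energy_drift(times, energies):
    """Linear drift (slope) of total energy vs. time by least squares.

    In a healthy NVE run the slope should be near zero; a significant
    value signals an integration or force-evaluation problem.
    """
    slope, _intercept = np.polyfit(times, energies, 1)
    return slope

t = np.linspace(0.0, 10.0, 1001)               # time, e.g. in ns
good = -5000.0 + 0.001 * np.sin(50 * t)        # fluctuation only, no drift
bad = -5000.0 + 0.5 * t                        # steady 0.5 energy-units/ns drift
print(abs(energy_drift(t, good)))              # close to zero
print(energy_drift(t, bad))                    # recovers the imposed drift
```

The same fit applied to temperature, pressure, or volume traces gives a quick first-level screen before any comparison with experiment is attempted.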

3.5. Comparing Simulation Results to Experiments and/or Results of Other Methods

MD simulation provides a window into the ultimate details of biological processes. But, like any theoretical model, it suffers from a number of limitations, especially when applied to complex biological systems or processes. So, whenever possible, MD results should be compared to experiments. From an MD
simulation, structural quantities can be easily calculated and most
of them are directly comparable to experimental measurements.
The average (or most populated) structure can be compared to
X-ray crystallographic or nuclear magnetic resonance (NMR) struc-
ture. Positions of hydrogen atoms in the context of hydrogen
bonding can be compared to neutron diffraction data. Distances
in solution can be compared to NMR nuclear Overhauser effect (NOE) experiments. Protein secondary structure features can be
compared to circular dichroism, infrared, or Raman spectroscopic
experiments. Orientation of molecular fragments can be compared
to NMR order parameters.
It remains challenging for experimental techniques to probe
the conformational dynamics of biomolecules at the atomic level.
Most measurements are only able to capture one aspect of the
dynamical changes, such as distance or overall shape. In this respect,
MD simulations ideally complement the experiments. Time evolu-
tion of distance changes derived from simulation can be compared
with fluorescence resonance energy transfer (FRET) experiment;
slow conformational dynamics (>10⁻⁹ s) captured in simulation can be compared to NMR residual dipolar coupling (RDC) experiments; correlated motions derived from simulation can be compared to quasi-elastic and inelastic neutron scattering, diffuse X-ray scattering, inelastic Mössbauer scattering, and dielectric spectroscopy.
Many thermodynamic (e.g., the free energy changes associated
with solvation, ligand binding or conformational shift) or kinetic
(e.g., the rate of an enzyme-catalyzed reaction or the single-channel
conductance of a biological channel) quantities can also be calcu-
lated from an MD simulation and compared to experimental
measurements. Even though a quantitative agreement between
simulations and experiments is still a difficult task for many complex
biological problems, an increasing number of successful examples
are appearing in the literature. In practical applications, indirect
comparison of the simulation results can also be made with many
experiments that correlate the structural data with the functional
measurements, such as mutagenesis and labeling experiments.

3.6. Interpreting the Results

The interpretation of MD simulation results is often straightforward, since the simulation provides all necessary information in
detail, i.e., the coordinates and velocities for every atom in the
system for all the dynamic steps. However, experiments are often
conducted for complex biological systems under complex condi-
tions, so when interpreting the simulation results, one should
always consider whether the simulation system and/or conditions
adequately reflect or correspond to the experimental settings; oth-
erwise the interpretation of the results could be irrelevant or even
invalid. Furthermore, the interpretation of the simulation results
can become complicated in the following two situations.
First, MD simulations are often limited in their abilities to
investigate “long”-timescale motions. In practice, therefore, one often uses thermodynamic data to avoid the direct simulation of slow dynamics: e.g., a PMF computed with enhanced sampling techniques to help understand ion conductance in biological channels; multiple short trajectories to approximate the long-time dynamics; or extrapolation of the limited data to project possible long-time dynamics.
The convergence of the thermodynamic quantities is the central
question when interpreting large-scale biomolecular simulations
(102). Unfortunately, assessing the errors introduced by these methods is very difficult, and validation against experiments, as discussed above, is often necessary.
Second, when the experimental data is not directly comparable,
some kind of assumption (or model) must be invoked to correlate
the simulation data with the measurements, such as crystallographic
B-factors, NMR S order parameters, and mutagenesis data. Techni-
ques like quasi-elastic and inelastic neutron scattering can in princi-
ple provide information about correlated motions. However, the
interpretation of neutron scattering at molecular level often requires
a theoretical model. For these cases, one should bear in mind the
underlying assumption and/or shortcomings of the model when
interpreting the simulation results or comparing them to experiments. For example, thermal fluctuations are often related to crystallographic B-factors via the relationship B = 8π²⟨u²⟩/3, where ⟨u²⟩ is the time-averaged mean-square atomic fluctuation. However, this relationship assumes that the B-factors approximate only thermal motion within a well-ordered structure, modeled as a harmonic oscillator (i.e., isotropic vibration), while discounting the disorder of the protein.
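This relation (in the isotropic convention B = (8π²/3)⟨u²⟩, the form implied by the Fig. 2 caption) is easy to apply in both directions; the example B-factor below is an illustrative value, not data from the chapter.

```python
import math

def bfactor_from_rmsf(rmsf):
    """Isotropic B-factor (A^2) from the RMSF: B = (8 pi^2 / 3) <u^2>."""
    return (8.0 * math.pi**2 / 3.0) * rmsf**2

def rmsf_from_bfactor(b):
    """Inverse relation, as used for Fig. 2: RMSF = sqrt((3/8) B) / pi."""
    return math.sqrt(3.0 * b / 8.0) / math.pi

b = 30.0                                   # a typical crystallographic B-factor
u = rmsf_from_bfactor(b)
print(round(u, 3))                         # -> 1.068 (Angstroms)
print(round(bfactor_from_rmsf(u), 6))      # -> 30.0 (round-trips exactly)
```

Because real B-factors also absorb static and lattice disorder, RMSF values converted this way usually underestimate the experimental B-factors, which is the caveat stated above.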
3.7. Improving the Model

MD simulations suffer from several drawbacks, which are due to the empirical force fields, the limited simulation length and size,
and the way the simulation models are built. The classical mechani-
cal force field functions and parameters are derived from both
experimental work and high-level quantum mechanical calculations
based on numerous approximations (103, 104). Limitations in current force fields, such as inaccurate conformational preferences for small proteins and peptides in aqueous solution, have been known for years and have led to a number of attempts to
improve these parameters. Recent re-parameterization of the dihe-
dral terms in AMBER (105) and the so-called CMAP correction in
CHARMM (106) have significantly improved the accuracy of
empirical force fields for protein secondary structure predictions.
Moreover, many existing force fields are based on fixed-charge models that do not account for electronic polarization of the environment, although more sophisticated and expensive polarizable models have been shown to be necessary for an accurate description of some molecular properties. A few polarizable force field models have been
developed over the past several years (107–110). Recent systematic validations of the AMOEBA force field have shown significant improvements over fixed-charge models for a variety of structural, thermodynamic, and dynamic properties (108). Increasing use of this next generation of force field models in biomolecular simulations is anticipated within the next few years. Furthermore, classical force fields are based on the assumption that quantum effects
play a minor role or can be separated from classical Newtonian
dynamics. A proper description of charge/electron transfer process
or chemical bond breaking/forming requires quantum treatment
that can be incorporated into simulations in different ways.
Another way of improving the model is to extend the time
scales spanned by MD simulations. Currently attainable time scales
are still about 3–4 orders of magnitude shorter than most biologi-
cally relevant ones. Methodologically, two general ways exist to
extend the time scale of an MD simulation: to make each integra-
tion step faster (mainly) through parallelization, or to improve the
exploration of phase space via enhanced sampling techniques.
During recent years, tremendous efforts have been focused on
improving parallel efficiency of the MD codes so that more CPU
processors can be used, which has enabled many μs and even ms MD simulations of biological systems. One example is the use of the special-purpose computer Anton to reach ms-scale simulations of an Abl
kinase (111), an NhaA antiporter (112), and a potassium channel
(113). As computing power continues to increase, we expect the next generation of computer systems to significantly expand our
capability to simulate more complex and realistic systems for longer
times. However, the increase in computing power alone will not be
sufficient, and the development of more efficient and robust
enhanced sampling techniques will also be required to address
many challenging thermodynamic and kinetic problems in biology.
A variety of enhanced sampling techniques have been developed to
obtain better converged equilibrium thermodynamic quantities,
such as generalized ensemble methods (98, 114), Wang-Landau
algorithm (99), meta-dynamics (115), and accelerated MD (116),
or to obtain reaction pathways and reaction rates, such as transition
path sampling (117) or Markov state models (118).

4. Examples

4.1. Cytochrome P450: A Simple Problem

We will first show how to carry out a standard MD simulation of cytochrome P450 to investigate the overall flexibility of the protein.
Three crystal structures of CYP3A4, unliganded, bound to the
inhibitor metyrapone, and bound to the substrate progesterone,
have been determined (119). Comparison of the three structures revealed little conformational change associated with ligand binding, so it will be interesting to investigate whether any dynamical differences exist among the three structures. Here, we
demonstrate how the protein dynamics can be probed by MD
simulations using the NAMD package (70). The input and config-
uration files will be prepared with VMD (82) and TCL scripts.

4.1.1. Building a Structural Model of CYP3A4

Our simulation will start from an X-ray crystal structure of CYP3A4: an unliganded CYP3A4 soluble domain captured at 2.80 Å resolution (119). The PDB file of CYP3A4 can be downloaded from the PDB database (PDB entry 1w0e). Given
a PDB structure, the next step is to generate the PSF and PDB files
using VMD and psfgen plugin. The script protein.tcl contains the
detailed steps for the process, which can be executed by typing in a
Linux terminal,
vmd -dispdev text -e protein.tcl > protein.log

4.1.2. Solvating and Ionizing the System

Now we will use VMD's Solvate plugin to solvate the protein. Solvate places the solute in a box of pre-equilibrated waters of a
specified size, and then removes waters that are within a certain
cutoff distance from the solute. After solvation, we will use VMD’s
Autoionize plugin to add ions to neutralize the system, which is
important for Ewald-based long-range electrostatic methods such as particle mesh Ewald (PME) to work properly. It can also create a desired ionic concentration. Autoionize works by randomly replacing water molecules with ions.
vmd -dispdev text -e solvate.tcl > solvate.log
vmd -dispdev text -e ionize.tcl > ionize.log

4.1.3. Running a Simulation of CYP3A4

The solvated system comprises about 40,000 atoms. We will run the simulations on a local Linux cluster using 32 cores
(four nodes, each with eight cores).

Fig. 1. Root-mean-square deviation (RMSD) as a function of time during the MD simulations of CYP3A4.

We will first minimize
the system for 2,000 steps to remove bad contacts. The minimiza-
tion run can be executed by submitting to the PBS batch scheduler
with the command,
qsub runmin.pbs
The configuration file for minimization is min.namd, also
shown below.
After minimization, an MD simulation starting from the mini-
mized structure will be run by submitting to the PBS batch sched-
uler with the command,
qsub runmd.pbs
The PBS run script runmd.pbs is similar to runmin.pbs. The
configuration file for the MD simulation md.namd is shown below.

4.1.4. Analyze the Results

The root mean square deviation (RMSD) is a frequently used
measure of the differences between the structures sampled during
the simulation and the reference structure. Using simulation trajec-
tory as an input, RMSD can be calculated with VMD > Extensions
> Analysis > RMSD Trajectory Tool for any selected atoms. Struc-
ture alignment will usually be performed to remove the transla-
tional and rotational movements. Backbone RMSD as a function of
time for CYP3A4 is shown in Fig. 1. Overall, the CYP3A4 structure
appears quite stable, with the RMSD quickly reaching a plateau of ~1.8 Å after about 0.4 ns of simulation.
The root mean square fluctuation (RMSF) measures the move-
ment of a subset of atoms with respect to the average structure over
the entire simulation. RMSF indicates the flexibility of different
regions of a protein, which can be related to crystallographic
B factors. Figure 2a illustrates the RMSFs of the Cα atoms from the simulation (red line) in comparison to those (black line)
Fig. 2. RMSFs of the Cα atoms from the MD simulation as compared to the experimental data (black line), which were calculated from the B-factors of CYP3A4 (PDB code: 1w0e) using RMSF = √((3/8)B_factor)/π. The computed RMSF values are color-coded onto a cartoon representation of the protein structure, with red corresponding to the most mobile region and blue corresponding to the most stable region.

obtained from crystallographic B-factors. Overall, the pattern of the computed RMSFs is moderately consistent with the experimental one. The RMSF values can be written into the
Beta or Occupancy field of a PDB file so that the flexibility of
different regions of a protein can be displayed in different colors
with VMD. Figure 2b illustrates a cartoon representation of the
CYP3A4 structure color-coded by the RMSF values. Clearly, the α-helices and β-strands exhibit low flexibility, while the regions that
fluctuate most significantly are the loops connecting helices and
strands.
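Writing RMSF values into the Beta column, as described above, comes down to fixed-column string editing, since the PDB format keeps the B-factor in columns 61–66 of each ATOM/HETATM record. This is a standalone sketch, not the VMD procedure; the sample record and RMSF value are fabricated.

```python
def set_beta_column(pdb_lines, values):
    """Write per-atom values into the B-factor (Beta) column of a PDB.

    Columns 61-66 (1-based) hold the B-factor, formatted as %6.2f.
    `values` must supply one number per ATOM/HETATM record, in order.
    """
    out, it = [], iter(values)
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            line = line[:60] + f"{next(it):6.2f}" + line[66:]
        out.append(line)
    return out

pdb = ["ATOM      1  CA  ALA A   1      11.104  13.207   2.100  1.00  0.00           C"]
new = set_beta_column(pdb, [1.83])
print(new[0][60:66])  # -> "  1.83"
```

Loading the modified file and coloring by Beta then displays the flexibility map directly on the structure, exactly as Fig. 2b does.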

4.2. Nicotinic Acetylcholine Receptor (nAChR): A More Complex Problem

For a more complex problem, we will use MD simulations to examine the barrier to ion conduction in a model of the cationic human α7 nicotinic acetylcholine receptor (nAChR) (Fig. 3). nAChR concentrates at synapses, where it responds to nerve-released acetylcholine to mediate excitatory transmission throughout the central and peripheral nervous systems (120). The binding
of neurotoxins from snake venoms, such as bungarotoxin, to
nAChR strongly blocks the channel conductance, thus leading to
neuronal toxicity or even sudden death. The simulation strategies used for nAChR, an integral membrane ion channel, should be readily applicable to the study of hERG potassium channels.
Fig. 3. Structure of the nicotinic acetylcholine receptor with the five subunits highlighted
in different colors; chloride ions are shown in green, sodium ions in yellow, the head
group region of the lipid bilayer in dark blue.

4.2.1. Building a Structural Model of nAChR

As no experimental 3D structure of human α7 nAChR is available, the first step is to construct a homology model based on the 4.0 Å
resolution cryo-EM structure of nAChR from T. marmorata (PDB
accession code: 2BG9) (121). The homology modeling will be
conducted using the program Modeller 8 (83). The model used in the following simulation will be the lowest-scoring model generated by Modeller, further evaluated with PROCHECK (122)
and Prosa 2003 (123). Please refer to the reference (124) for more
details of the homology modeling.
Given a PDB structure of nAChR, the script protein.tcl will be
used to generate the PSF and PDB files with VMD and psfgen. The
PDB structure by homology modeling is first divided into ten
protein segments with each corresponding to a continuous chain
and one segment for calcium ions. Five calcium ions in the β8–β9 loop regions are added during the homology modeling process, since it is known from experiments that the binding of these calcium ions helps stabilize the otherwise flexible loop structures. The
default protonation states are used for all the ionizable residues.
Ten disulfide bond patches are applied.
vmd -dispdev text -e protein.tcl > protein.log
To simulate nAChR in a native-like environment, the next step is


to place the protein in a fully hydrated membrane. We will first prepare
a 120 Å × 120 Å 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) bilayer using the VMD Membrane Builder plugin.
vmd -dispdev text -e membrane.tcl > membrane.log
Then the structures of protein and membrane are aligned so
that the channel axis overlays the membrane normal direction.
After this is done, the protein can be placed into the membrane
by running the combine.tcl script. This script combines the two PDB files, removes all membrane atoms within 0.8 Å of the protein, and writes out a new set of PSF and PDB files for the combined protein/membrane system.
vmd -dispdev text -e combine.tcl > combine.log
At this point, we will use solvate.tcl to solvate, and ionize.tcl to
ionize the combined protein/membrane system. Note that the 10 Å padding in solvation is applied only along the membrane normal direction. Sometimes the solvate procedure may put water molecules inside the membrane, which is undesired. So we will use the
delwat.tcl script to remove those water molecules located within the
membrane but not inside the channel pore.
vmd -dispdev text -e solvate.tcl > solvate.log
vmd -dispdev text -e ionize.tcl > ionize.log
vmd -dispdev text -e delwat.tcl > delwat.log
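The delwat.tcl script itself is not listed in this chapter; geometrically, its selection presumably amounts to the following sketch, where the slab bounds, pore radius, and pore axis are hypothetical values chosen for illustration.

```python
import numpy as np

def waters_to_delete(water_xyz, z_min, z_max, pore_radius, axis_xy=(0.0, 0.0)):
    """Indices of water oxygens lying inside the membrane slab
    (z_min < z < z_max) but outside the channel pore (radial distance
    from the pore axis > pore_radius) -- the waters to be removed."""
    z = water_xyz[:, 2]
    r = np.hypot(water_xyz[:, 0] - axis_xy[0], water_xyz[:, 1] - axis_xy[1])
    in_slab = (z > z_min) & (z < z_max)
    outside_pore = r > pore_radius
    return np.where(in_slab & outside_pore)[0]

waters = np.array([
    [30.0, 0.0, 0.0],    # in the slab, far from the axis -> delete
    [2.0, 1.0, 0.0],     # in the slab but inside the pore -> keep
    [25.0, 5.0, 40.0],   # above the membrane -> keep
])
print(waters_to_delete(waters, -20.0, 20.0, 8.0))  # -> [0]
```

The key point is the two-condition test: waters in the hydrophobic slab are removed only if they also fall outside the pore lumen, so the channel interior stays hydrated.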

4.2.2. Minimization and Equilibration

Now that we have prepared all of our input files, we can start to run an MD simulation of the system. However, as the protein structure
is built from homology modeling, and the final system consists of
multiple components: protein, membrane, and water, we will use
more sophisticated equilibration procedures to relax the system.
The entire equilibration protocol consists of the following stages:
minimization with fixed backbone atoms for 2,000 steps; minimi-
zation with restrained Ca atoms for 2,000 steps; Langevin dynam-
ics with harmonic positional restraints on the Ca atoms for
100,000 steps (to heat the system to the target temperature);
constant pressure dynamics (the ratio of the unit cell in the x–y
plane is fixed) with decreasing positional restraints on the Ca atoms
in five steps. The equilibration job will be submitted using
qsub runeq.pbs

4.2.3. Run a Free Energy Simulation Using the Adaptive Biasing Force (ABF) Method

Brute-force simulation of ion translocation through the nAChR channel is still a daunting task, often requiring specialized computer hardware. So here we will focus on understanding the intrinsic properties of the nAChR channel pore by computing the systematic
forces, also known as the PMF, experienced by a sodium ion inside
the channel. The adaptive biasing force (ABF) method, as implemented in the NAMD package by Chipot and Hénin, will be used to construct the PMF (125).

Fig. 4. Potentials of mean force for translocation of Na+ ions (red) and Cl− ions (blue) in nAChR. Positions of M2 pore-lining residues are shown with gray lines and labeled at the top of the graphs.

The details of this method are given elsewhere. Briefly, a reaction coordinate ξ has to be selected. The average forces acting along ξ are accumulated in bins, providing an estimate of the free energy derivative as the simulation progresses. Applying the biasing forces (the negative of the average forces) then allows the system to undergo free self-diffusion along ξ. In the following simulation, the reaction coordinate will be chosen as the normal to the bilayer surface (z). The simulations will be carried out in ten windows of length 5 Å along this direction, which should be sufficient to cover the entire length of
the transmembrane domain region of nAChR. The PBS submission
script and the corresponding ABF simulation configuration file for
one representative window are given below.
qsub runabf1.pbs
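The core idea — binning the average force along the reaction coordinate and integrating its negative to obtain the PMF — can be illustrated offline with synthetic data. Note that this is a post-hoc thermodynamic-integration sketch of the principle, not the adaptive, on-the-fly biasing algorithm of NAMD's ABF module; all names and data here are illustrative.

```python
import numpy as np

def pmf_from_forces(z_samples, f_samples, edges):
    """Average the sampled force in bins along z, then integrate -<F(z)>
    over z to obtain the PMF at the bin edges, up to an additive constant."""
    idx = np.digitize(z_samples, edges) - 1
    n_bins = len(edges) - 1
    mean_f = np.array([f_samples[idx == i].mean() for i in range(n_bins)])
    dz = np.diff(edges)
    pmf = np.concatenate(([0.0], np.cumsum(-mean_f * dz)))
    return pmf - pmf.min()          # shift so the minimum sits at zero

# Synthetic data: instantaneous forces from a harmonic well W(z) = 0.5 k z^2
rng = np.random.default_rng(2)
k = 2.0
z = rng.uniform(-1.0, 1.0, 20000)
f = -k * z + rng.normal(0.0, 0.1, z.size)   # noisy forces, F = -dW/dz
edges = np.linspace(-1.0, 1.0, 21)
pmf = pmf_from_forces(z, f, edges)
print(pmf[10], pmf[0])  # ~0 at z = 0; ~1.0 (= 0.5 k z^2 at |z| = 1) at the ends
```

Recovering the known parabola from noisy forces is exactly the consistency check one would run before trusting the same machinery on real window data.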

4.2.4. Analyze the Results

Combining the PMFs from all the windows by optimally matching the overlapping regions of two adjacent windows will produce the final PMF, as displayed in Fig. 4 (red line). We only briefly summarize
below the main finding of the PMF, and refer interested readers to
the reference (126) for the detailed analysis of the PMF along with
other calculations. The PMF for sodium inside the nAChR pore
features two distinct areas of ion stabilization toward the extracellu-
lar end, corresponding to two distinct sets of negatively charged
residues, D27′ and E20′. In both positions a sodium ion is stabilized by ~2 kcal/mol. Multiple ions can be accommodated at position D27′ owing to the large pore radius at that position of the lumen. The PMF reaches an overall maximum at z ≈ 0 Å. In this region the M2 helices expose primarily hydrophobic residues toward the interior of the receptor (Leu9′, Val13′, Phe14′, and Leu16′). This result implicates a hydrophobic nature of the gate. The effective free energy of sodium in the entire intracellular region of the pore (z between −20 and 0 Å) remains largely unfavorable compared to bulk solvent
and goes through several minor peaks and troughs. Overall, the
computed PMF provides a detailed thermodynamic description of
a sodium ion inside the nAChR channel, including the equilibrium sodium distribution and the locations of ion binding sites and barriers. Moreover, when combined with a macroscopic or semi-microscopic diffusional theory, the PMF can be used to calculate the ionic current, which is directly comparable to single-channel conductance measurements.
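The window-matching step described above can be sketched as follows: each window's PMF is known only up to an additive constant, and the mean difference over the overlapping points determines the shift. The toy parabola data and the helper name are hypothetical, for illustration only.

```python
import numpy as np

def combine_windows(windows):
    """Join per-window PMFs, given as (z, G) array pairs each defined only
    up to a constant, by shifting every window to match the previous ones
    on their overlapping z values."""
    z_all, g_all = list(windows[0][0]), list(windows[0][1])
    for z, g in windows[1:]:
        overlap = np.isin(z, z_all)
        prev = {zz: gg for zz, gg in zip(z_all, g_all)}
        # Optimal (least-squares) constant shift over the overlap region
        shift = np.mean([prev[zz] - gg for zz, gg in zip(z[overlap], g[overlap])])
        for zz, gg in zip(z, g):
            if zz not in prev:
                z_all.append(zz)
                g_all.append(gg + shift)
    order = np.argsort(z_all)
    return np.array(z_all)[order], np.array(g_all)[order]

# Two toy windows of G(z) = z^2, each carrying its own arbitrary offset
z1 = np.arange(0.0, 6.0);  g1 = z1**2 + 3.0
z2 = np.arange(4.0, 11.0); g2 = z2**2 - 7.0    # overlaps window 1 at z = 4, 5
z, g = combine_windows([(z1, g1), (z2, g2)])
print(np.allclose(g - g[0], z**2))  # -> True: the full profile is recovered
```

With real data the windows only agree statistically on the overlap, so the residual mismatch after shifting is itself a useful convergence diagnostic.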

5. Notes

- Guidelines and best practices
- Recommendations and caveats about the tools and methods
- Common pitfalls and how these are mitigated
One of the significant challenges of MD simulation is the limitation on the time scales that can be simulated. So, before starting an MD simulation, one should consider what timescale is expected for the biological process under investigation and what timescale is affordable by the simulation. Timescale mismatches or unconverged
simulations often lead to invalid conclusions or erroneous interpre-
tation of the simulation results. If the biological timescale is indeed out of reach for MD simulation with the available resources, one can consider using advanced or alternative simulation techniques that enhance the sampling of phase space. Among the many available enhanced sampling algorithms, it is advisable to choose one that maintains the rigor of thermodynamics and statistical mechanics, introduces minimal external perturbation to the system, and explores the phase space most efficiently.
The other major limitation of classical MD simulations is the
underlying force field models. A molecular mechanical force field
requires first the definition of a potential function form and then
the fitting of a set of parameters to describe the interactions
between atoms. During the development of a force field, various
levels of approximation have to be introduced. So, when choosing a force field and/or interpreting simulation results, one should be aware of how a particular force field was parameterized, what its potential limitations are, and in what circumstances it is applicable. For example, if the force field has been parameterized
against thermodynamic properties or equilibrium structural data,
then one would expect the simulation with this force field to be less
accurate in reproducing the kinetics. In applications where electro-
static polarization or charge transfer is important, classical fixed
charge force field models will not work very well.
In general, good practices of MD simulations involve the following: (1) a well-defined biological problem that can be addressed by MD simulations; (2) a high-quality starting structure that most closely represents the biological condition; (3) a good force field for the system and/or problem under investigation; (4) carefully monitored minimization and equilibration, especially for multicomponent systems or models generated by homology modeling; and (5) multiple (control) simulations for a self-consistency check. Finally, drug
toxicity is a complex biological process, and it is unlikely that MD simulation alone will be able to identify a toxicity mechanism or predict the toxicity of a new compound; but when tightly coupled with other computational tools as well as a variety of experimental techniques, MD simulation is becoming a necessary tool for computational toxicology research. We anticipate more and more drug toxicity questions to be
addressed by MD simulations in the near future.

6. Sample Input Files

Box 1
TCL script for building the protein structure

protein.tcl
##############################
# Script to build the protein structure of 1W0E
# STEP 1: Build Protein
package require psfgen
# Use the specified CHARMM27 topology file.
topology /home/xc3/toppar/top_all27_prot_lipid.rtf
alias residue HIS HSD
alias atom ILE CD1 CD
# Build one segment
segment PROT {
first ACE
last CT3
pdb 1w0e-prot.pdb
}
# Load the coordinates for each segment.
coordpdb 1w0e-prot.pdb PROT
# Guess the positions of missing atoms.
guesscoord
# Write out coor and psf file
writepdb protein.pdb
writepsf protein.psf
mol load psf protein.psf pdb protein.pdb
quit

Box 2
TCL scripts for solvating the protein structure

solvate.tcl
##############################
# STEP 2: Solvate Protein
package require solvate
solvate protein.psf protein.pdb -t 10 -o solvated
quit

ionize.tcl
##############################
# STEP 3: Ionize Protein
package require autoionize
autoionize -psf solvated.psf -pdb solvated.pdb -is 0.1
quit

Box 3
PBS script to run an energy minimization

runmin.pbs
##############################
## Example PBS script to run a minimization on the linux cluster
#PBS -S /bin/bash
#PBS -j oe
#PBS -m ae
#PBS -N test
#PBS -l nodes=4:ppn=8,walltime=24:00:00
#PBS -V
#PBS -q md

source /share/apps/mpi/gcc/openmpi-1.2.8/bin/mpivars.sh
cd /home/xc3/data7/CYT450
/share/apps/mpi/gcc/openmpi-1.2.8/bin/mpiexec -np 32 /share/apps/namd/NAMD_2.7b1_Source/Linux-amd64-MPI.arch/namd2 min.namd > min.log

Box 4
NAMD configuration file for running an energy minimization

min.namd
##############################
# minimization for 2000 steps
# molecular system
coordinates ionized.pdb
structure ionized.psf
firsttimestep 0
temperature 0
minimization on
numsteps 2000
# force field
paratypecharmm on
parameters par_all27_prot.prm
exclude scaled1-4
1-4scaling 1.0
switching on
switchdist 8.5
cutoff 10
pairlistdist 12
#PME stuff
cellOrigin 57.56 77.37 10.48
cellBasisVector1 64.70 00.00 00.00
cellBasisVector2 00.00 93.59 00.00
cellBasisVector3 00.00 00.00 82.47
PME on
PmeGridsizeX 64
PmeGridsizeY 96
PmeGridsizeZ 81
margin 5
# output
outputname min
outputenergies 1000
outputtiming 1000
restartname min_restart
restartfreq 1000
restartsave no

Box 5
PBS script to run an MD simulation

runmd.pbs
##############################
## Example PBS script to run an MD simulation on the Linux cluster
#PBS -S /bin/bash
#PBS -j oe
#PBS -m ae
#PBS -N test
#PBS -l nodes=4:ppn=8,walltime=24:00:00
#PBS -V
#PBS -q md
source /share/apps/mpi/gcc/openmpi-1.2.8/bin/mpivars.sh
cd /home/xc3/data7/CYT450
/share/apps/mpi/gcc/openmpi-1.2.8/bin/mpiexec -np 32 /share/apps/namd/NAMD_2.7b1_Source/Linux-amd64-MPI.arch/namd2 md.namd > md.log

Box 6
NAMD configuration file for running an MD simulation

md.namd
##############################
# run md for 2000000 steps
# molecular system
coordinates ionized.pdb
structure ionized.psf
bincoordinates min_restart.coor
binvelocities min_restart.vel
extendedSystem min_restart.xsc
firsttimestep 0
numsteps 2000000
# force field
paratypecharmm on
parameters par_all27_prot_lipid.prm
exclude scaled1-4
1-4scaling 1.0
switching on
switchdist 8.5
cutoff 10
pairlistdist 12
# integrator
timestep 1.0
stepspercycle 20
nonbondedFreq 1
#PME stuff
cellOrigin 57.56 77.37 10.48
cellBasisVector1 64.70 00.00 00.00
cellBasisVector2 00.00 93.59 00.00
cellBasisVector3 00.00 00.00 82.47
PME on
PmeGridsizeX 64
PmeGridsizeY 96
PmeGridsizeZ 81
margin 5
# output
outputname md
outputenergies 1000
outputtiming 1000
dcdfreq 1000
wrapAll on
wrapNearest on
restartname md_restart
restartfreq 1000
restartsave no

# temperature & pressure


langevin on
langevinDamping 10
langevinTemp 310
langevinHydrogen on
langevinPiston on
langevinPistonTarget 1.01325
langevinPistonPeriod 200
langevinPistonDecay 100
langevinPistonTemp 310
useGroupPressure yes
useFlexibleCell no
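As a quick sanity check on the md.namd settings above, the numsteps, timestep, and dcdfreq values together determine the simulated time and the trajectory size. The short Python sketch below (an illustration added here, not part of the original workflow) simply restates that arithmetic:

```python
# Values copied from the md.namd configuration above.
timestep_fs = 1.0       # integration timestep in femtoseconds ("timestep 1.0")
numsteps = 2_000_000    # "numsteps 2000000"
dcdfreq = 1000          # one trajectory frame every 1000 steps ("dcdfreq 1000")

total_ns = numsteps * timestep_fs / 1e6   # 1 ns = 1e6 fs
n_frames = numsteps // dcdfreq            # frames written to md.dcd

print(f"simulated time: {total_ns} ns, trajectory frames: {n_frames}")
```

With these settings the production run covers 2 ns and writes 2,000 frames; lengthening the run or coarsening dcdfreq trades disk usage against time resolution.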

Box 7
TCL script files for building a protein structure embedded
in membrane

protein.tcl
##############################
# STEP 1: Build Protein
# Script to build the protein structure of GA
package require psfgen
# Use the specified CHARMM27 topology file.
topology top_all27_prot_lipid.inp
alias atom ILE CD1 CD
alias residue HIS HSD
# Build two segments, one for each Chain.
segment GA1 {
first ACE
last CT3
pdb Chain1.pdb
}
segment GA2 {
first ACE
last CT3
pdb Chain2.pdb
}
segment GA3 {
first ACE
last CT3
pdb Chain3.pdb
}
segment GA4 {
first ACE
last CT3
pdb Chain4.pdb
}
segment GA5 {
first ACE
last CT3
pdb Chain5.pdb
}
segment GA6 {
first ACE
last CT3
pdb Chain6.pdb
}
segment GA7 {
first ACE
last CT3
pdb Chain7.pdb
}
segment GA8 {
first ACE
last CT3
pdb Chain8.pdb
}
segment GA9 {
first ACE
last CT3
pdb Chain9.pdb
}
segment GA10 {
first ACE
last CT3
pdb Chain10.pdb
}
segment GA11 {
auto none
pdb Chain11.pdb
}
# Add patches, for example disulphide bridges.
patch DISU GA1:128 GA1:142
patch DISU GA1:190 GA1:191
patch DISU GA3:495 GA3:509
patch DISU GA3:557 GA3:558
patch DISU GA5:862 GA5:876
patch DISU GA5:924 GA5:925
patch DISU GA7:1229 GA7:1243
patch DISU GA7:1291 GA7:1292
patch DISU GA9:1596 GA9:1610
patch DISU GA9:1658 GA9:1659
# Load the coordinates for each segment.
coordpdb Chain1.pdb GA1
coordpdb Chain2.pdb GA2


coordpdb Chain3.pdb GA3
coordpdb Chain4.pdb GA4
coordpdb Chain5.pdb GA5
coordpdb Chain6.pdb GA6
coordpdb Chain7.pdb GA7
coordpdb Chain8.pdb GA8
coordpdb Chain9.pdb GA9
coordpdb Chain10.pdb GA10
coordpdb Chain11.pdb GA11
# Guess the positions of missing atoms.
guesscoord
# Write out coor and psf file
writepdb protein.pdb
writepsf protein.psf
mol load psf protein.psf pdb protein.pdb
quit

membrane.tcl
##############################
# STEP 2: Building a Membrane Patch
package require membrane
membrane -l popc -x 120 -y 120

combine.tcl
##############################
# STEP 3: Combine Protein and Membrane
##!/usr/local/bin/vmd
# need psfgen module and topology
package require psfgen
topology top_all27_prot_lipid.inp
# load structures
resetpsf
readpsf membrane.psf
coordpdb membrane.pdb
readpsf protein.psf
coordpdb protein.pdb
# write temporary structure
set temp "temp"
writepsf $temp.psf
writepdb $temp.pdb
# reload full structure (do NOT resetpsf!)
mol load psf $temp.psf pdb $temp.pdb
# Select and delete lipids that overlap the protein
# (any atom-to-atom distance under 0.8 A):
set sellip [atomselect top "resname POPC"]
set lseglist [lsort -unique [$sellip get segid]]


foreach lseg $lseglist {
# find lipid backbone atoms
set selover [atomselect top "segid $lseg and within 0.8 of protein"]
# delete these residues
set resover [lsort -unique [$selover get resid]]
foreach res $resover {
delatom $lseg $res
}
}
# Optionally list further lipid residues to delete by hand:
foreach res { } {delatom $LIP1 $res}
foreach res { } {delatom $LIP2 $res}
# select and delete waters that overlap protein:
set selwat [atomselect top "resname TIP3"]
set lseglist [lsort -unique [$selwat get segid]]
foreach lseg $lseglist {
set selover [atomselect top "segid $lseg and within 3.8 of protein"]
set resover [lsort -unique [$selover get resid]]
foreach res $resover {
delatom $lseg $res
}
}
# Optionally list further water residues to delete by hand:
foreach res { } {delatom $WAT1 $res}
foreach res { } {delatom $WAT2 $res}
# write full structure
writepsf protein_and_membrane.psf
writepdb protein_and_membrane.pdb
file delete $temp.psf
file delete $temp.pdb
quit

Box 8
TCL script files for solvating the membrane protein
structure

solvate.tcl
##############################
# STEP 4: Solvate Protein
package require solvate
solvate protein.psf protein.pdb -z 10 -o solvated
quit
ionize.tcl
##############################
# STEP 5: Ionize Protein
package require autoionize
autoionize -psf solvated.psf -pdb solvated.pdb -is 0.1


quit
delwat.tcl
##############################
# STEP 6: Delete water in the membrane
package require psfgen
# Load a pdb and psf file into both psfgen and VMD.
resetpsf
readpsf ionized.psf
coordpdb ionized.pdb
mol load psf ionized.psf pdb ionized.pdb
# Select waters located within the membrane but not inside the channel pore
set badwater [atomselect top "water and (z<14 and z>-14 and (((x>-50
and x<5) or (x>17 and x<75)) or ((y>-50 and y<5) or (y>17 and
y<75))))"]
# Delete the residues corresponding to the atoms we selected.
foreach segid [$badwater get segid] resid [$badwater get resid] {
delatom $segid $resid
}
# write out the new psf and pdb file
writepsf ionized_porewat.psf
writepdb ionized_porewat.pdb
quit

Box 9
NAMD configuration file for running an MD equilibration

equil.namd
##############################
# STEP 7: Minimization and Equilibration
# molecular system
coordinates ionized_porewat.pdb
structure ionized_porewat.psf
temperature 0
# force field
paratypecharmm on
parameters par_all27_prot_lipid.prm
exclude scaled1-4
1-4scaling 1.0
switching on
switchdist 8.5
cutoff 10
pairlistdist 12
# integrator
timestep 1.0
stepspercycle 20
nonbondedFreq 1
#PME stuff
cellOrigin 11.53 12.51 -30.64
cellBasisVector1 123.90 000.00 000.00
cellBasisVector2 000.00 123.25 000.00
cellBasisVector3 000.00 000.00 136.78
PME on
PmeGridsizeX 128
PmeGridsizeY 128
PmeGridsizeZ 144
margin 5
# output
outputname eq
outputenergies 1000
outputtiming 1000
dcdfreq 1000
dcdfile eq.dcd
wrapAll on
wrapNearest on
fixedAtoms on
fixedAtomsForces on
fixedAtomsFile fix_backbone.pdb
fixedAtomsCol B
constraints on
consRef restrain_ca.pdb
consKFile restrain_ca.pdb
consKCol B
langevin on
langevinDamping 10
langevinTemp 310
langevinHydrogen no
langevinPiston on
langevinPistonTarget 1.01325
langevinPistonPeriod 200
langevinPistonDecay 100
langevinPistonTemp 310
useGroupPressure yes # smaller fluctuations
useFlexibleCell yes # allow dimensions to fluctuate independently
useConstantRatio yes # fix shape in x-y plane
# run one step to get into scripting mode
minimize 0
# turn off until later
langevinPiston off
# minimize nonbackbone atoms
minimize 2000
output min_fix
# min all atoms
fixedAtoms off
minimize 2000
output min_all
# heat with CAs restrained
run 100000
output heat
# equilibrate volume with CAs restrained
langevinPiston on
constraintScaling 3.0
output equil_ca1
run 200000
constraintScaling 1.0
output equil_ca2
run 200000
constraintScaling 0.5
output equil_ca3
run 200000
constraintScaling 0.25
output equil_ca4
run 200000
constraintScaling 0
output equil_ca5
run 1000000
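The staged protocol above first heats the system and then releases the Cα restraints in steps (constraintScaling 3.0 down to 0). With the 1 fs timestep set earlier in the file, the run lengths add up as the Python sketch below restates (values copied from the box; an illustration, not part of the original workflow):

```python
timestep_fs = 1.0  # "timestep 1.0" in equil.namd
# Run lengths from the schedule above: heating, four restrained
# equilibration stages, then the final unrestrained equilibration.
stage_steps = [100_000, 200_000, 200_000, 200_000, 200_000, 1_000_000]

total_steps = sum(stage_steps)
total_ns = total_steps * timestep_fs / 1e6  # 1 ns = 1e6 fs
print(f"total equilibration: {total_steps} steps = {total_ns} ns")
```

Summing the stages gives 1.9 ns of equilibration before production, with most of it spent in the final unrestrained stage.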

Box 10
PBS script for running an MD equilibration

runeq.pbs
##############################
## Example PBS script to run an MD equilibration on the Linux cluster
#PBS -S /bin/bash
#PBS -j oe
#PBS -m ae
#PBS -N test
#PBS -l nodes=12:ppn=8,walltime=24:00:00
#PBS -V
#PBS -q md
source /share/apps/mpi/gcc/openmpi-1.2.8/bin/mpivars.sh
cd /home/xc3/data7/nAChR
/share/apps/mpi/gcc/openmpi-1.2.8/bin/mpiexec -np 96 /share/apps/namd/NAMD_2.7b1_Source/Linux-amd64-MPI.arch/namd2 equil.namd > equil.log

Box 11
PBS script for running an MD production

runabf1.pbs
##############################
## Example PBS script to run an ABF production run on the Linux cluster
#PBS -S /bin/bash
#PBS -j oe
#PBS -m ae
#PBS -N abf1
#PBS -l nodes=12:ppn=8,walltime=24:00:00
#PBS -V
#PBS -q md
source /share/apps/mpi/gcc/openmpi-1.2.8/bin/mpivars.sh
cd /home/xc3/data7/nAChR
/share/apps/mpi/gcc/openmpi-1.2.8/bin/mpiexec -np 96 /share/apps/namd/NAMD_2.7b1_Source/Linux-amd64-MPI.arch/namd2 abf1.namd > abf1.log

Box 12
NAMD configuration file for running an ABF MD simulation

abf1.namd
##############################
# STEP 8: ABF Simulation – window 1
# molecular system
# start from slightly modified equilibrated configuration
# the position of the permeating sodium ion is modified to be located
# within the biasing window
coordinates ionized_porewat-abf1.pdb
structure ionized_porewat.psf
bincoordinates eq_restart.coor
binvelocities eq_restart.vel
extendedSystem eq_restart.xsc
firsttimestep 0
temperature 310
numsteps 2000000
# force field
paratypecharmm on
parameters par_all27_prot_lipid.prm
exclude scaled1-4
1-4scaling 1.0
switching on
switchdist 8.5
cutoff 10
pairlistdist 12
# integrator
timestep 1.0
stepspercycle 20
nonbondedFreq 1
#PME stuff
cellOrigin 11.53 12.51 -30.64
cellBasisVector1 123.90 000.00 000.00
cellBasisVector2 000.00 123.25 000.00
cellBasisVector3 000.00 000.00 136.78
PME on
PmeGridsizeX 128
PmeGridsizeY 128
PmeGridsizeZ 144
# output
outputname abf1
outputenergies 1000
outputtiming 1000
dcdfreq 1000
dcdfile abf1.dcd
wrapAll on
wrapNearest on
restartname abf1_restart
restartfreq 1000
restartsave no
# restraints are applied to six Ca atoms on each subunit
# (three at the extracellular end and three at the intracellular end
# of the M2 helices)
constraints on
consRef restrain_ref.pdb
consKFile restrain_ref.pdb
consKCol B
langevin on
langevinDamping 5
langevinTemp 310
langevinHydrogen on
langevinPiston on
langevinPistonTarget 1.01325
langevinPistonPeriod 200
langevinPistonDecay 500
langevinPistonTemp 310
useGroupPressure yes
useFlexibleCell yes
useConstantArea yes
# ABF SECTION
colvars on
colvarsConfig Distance.in

Distance.in
##############################
colvarsTrajFrequency 2000
colvarsRestartFrequency 20000

colvar {
name COMDistance
width 0.1
lowerboundary -25.0
upperboundary 20.0
lowerwallconstant 10.0
upperwallconstant 10.0
# distance along z axis between the ion and the 30 reference atoms
distanceZ {
group1 {
atomnumbers { 245701 }
}
group2 {
atomnumbers { 58205 58216 58223 58687 58705 58717
64078 64089 64096 64560 64577 64590
69951 69962 69969 70433 70450 70463
75824 75835 75842 76306 76323 76336
81697 81708 81715 82179 82196 82209 }
}
}
}
abf {
colvars COMDistance
fullSamples 800
hideJacobian
}
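For context, the abf block above estimates the free energy profile along COMDistance by accumulating the average force acting on the collective variable and applying its negative as a bias. Schematically, the converged profile is recovered as

```latex
A(\xi) \;=\; A(\xi_0) \;-\; \int_{\xi_0}^{\xi} \bigl\langle F_{\xi'} \bigr\rangle \, d\xi'
```

where ⟨F_ξ⟩ is the running estimate of the mean force on the collective variable, and fullSamples sets how many samples must accumulate in each bin before the full bias is applied.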

Part IV

Pharmacokinetic and Pharmacodynamic Modeling


Chapter 12

Introduction to Pharmacokinetics in Clinical Toxicology


Pavan Vajjah, Geoffrey K. Isbister, and Stephen B. Duffull

Abstract
In clinical toxicology, a better understanding of the pharmacokinetics of drugs may be useful in both
risk assessment and formulating treatment guidelines for patients. Pharmacokinetics describes the time
course of drug concentrations and is a driver for the time course of drug effects. In this chapter pharmaco-
kinetics is described from a mathematical modeling perspective as applied to clinical toxicology. The
pharmacokinetics of drugs are described using a combination of input and disposition (distribution and
elimination) phases. A description of the time course of the input and disposition of drugs in overdose
provides a basis for understanding the time course of effects of drugs in overdose. Relevant clinical
toxicology examples are provided to explain various pharmacokinetic principles. Throughout this chapter
we have taken a pragmatic approach to understanding and interpreting the time course of drug effects.

Key words: Pharmacokinetics, Clinical toxicology, Input, Disposition, Clearance, Volume of distri-
bution, Compartmental models

1. Introduction

The primary tenet of pharmacology and toxicology alike is that the effects caused by a drug are related to the concentration of the
drug, whether the effect is beneficial or toxic. Dose–effect relation-
ships have been studied for decades and in these relationships dose
is assumed to be an input or driving force for the effect. While such
studies provide useful information about the size of the effect they
lack information about the change of effect over time. Pharmaco-
kinetics (PK) describes the time course of drug concentrations in
the body and when used as the driver for the effect, it naturally adds
time into the dose–effect relationship. Since pharmacodynamics (PD) is the relationship between concentration and effect, a primary purpose of studying PK is to incorporate the influence of
time on effect. Figure 1 describes the purpose of pharmacokinetics
in pharmacology and toxicology.

Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_12, # Springer Science+Business Media, LLC 2012


Fig. 1. A primary purpose of pharmacokinetics is to add time into the dose/concentration–effect relationship.

The combination of PK and PD therefore provides an understanding of the time course of drug effects, which means the time of
onset of effects, the time for the effect to wear off, and the extent of
the effect (as shown in Fig. 1).
In clinical toxicology, measurement of plasma concentrations
of chemicals may be useful in making treatment decisions for
chronic exposure to drugs/chemicals where there is reasonable
access to assays in a clinically useful period of time. Examples
include chronic lead and chronic pesticide exposure. Measurement
of plasma concentrations of drugs in acute overdose or poisoning is
rarely useful to guide treatment decisions because such assays are
not routinely and rapidly available for most drugs. There are some
exceptions for common or highly toxic drugs including acetamino-
phen, theophylline, iron, anticonvulsants, digoxin, lithium, and
salicylates (1). It is therefore difficult to use drug concentration or
PK to guide treatment decisions in clinical toxicology practice.

In contrast, studying the PK of drugs in overdose can be used to inform us about the treatment of overdose patients. Such
research allows us to understand the dose–concentration–effect
relationships in patients including the effect of various interven-
tions such as decontamination (2). PKPD models can then be
developed to inform guidelines regarding risk assessment and treat-
ment with the aim of finding simple clinical determinants of out-
comes such as dose or early clinical effects. Such guidelines have
been developed for citalopram (3) from a PKPD model of citalo-
pram in overdose (4).
In this chapter we will discuss the basic concepts of PK and use
these to understand the time course and severity of drugs in over-
dose.

1.1. Toxic Dose In this chapter we have taken a broad perspective of the word drug,
where a drug is defined as any exogenously administered chemical
that elicits an effect on the body. Hence from this perspective a drug
could have therapeutic or toxic actions depending on the concen-
tration in the target tissues, which is a function of dose and time.
Based on this definition, the term toxicokinetics, which is often used in the discipline of toxicology, is simply pharmacokinetics. We therefore use the term PK to refer to the time course of the concentration of any drug in the body, whether it is used therapeutically or following inadvertent or deliberate self-poisoning.

2. Materials

2.1. Pharmacokinetics Traditionally the PK of drugs has been described using ADME
(Absorption, Distribution, Metabolism, and Excretion) principles.
The acronym ADME suggests a serial approach to PK such that
absorption occurs first followed by distribution then metabolism
and then excretion. However these processes (ADME) are not tem-
porally discrete and occur simultaneously even though at any point of
time one of the processes may be predominant. For example the
distribution, metabolism, and excretion of a drug usually continue
to occur during the so-called absorption phase. Another way to
consider PK processes is to categorize them into two components:
1. Input which describes the time course of drug movement from
the site of administration to the site of measurement.
2. Disposition which describes the time course of drug distribution
and elimination from the site of measurement.
Figures 2 and 3 show the input and disposition phases of the
drug that is administered via an extravascular route. Here we

Fig. 2. The relationship between input and disposition rates over time.

Fig. 3. The input and disposition phases of the PK process.

assume that the input and disposition rates are concentration dependent and hence described predominantly by a first-order
process. A comparison of PK in terms of input and disposition
versus ADME is discussed below.
PK when studied as ADME is useful in understanding the
processes governing the fate of drug once it enters the body, i.e.,
we may know if the drug is absorbed through active/passive absorp-
tion or if the drug follows phase I or phase II metabolism. However,
absence of time in the ADME concepts makes it a difficult paradigm
for understanding the change of concentration over time and hence
a driver for the time course of drug effects. In contrast, PK can be
naturally divided into input and disposition since these processes are
independent, i.e., the manner in which a drug is administered
(e.g., percutaneous absorption, oral, intravenous) does not affect
its disposition. Hence the overall time course of concentration in the

body is the combination of input and disposition. In this sense ADME can be considered as a method to understand the mechan-
isms of drug movement and input–disposition as a method to
understand the time course of drug movement.
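Because input and disposition are independent first-order processes, the full concentration–time profile for an extravascular dose can be sketched with a one-compartment model (the Bateman function). This is a minimal sketch: the function name and all parameter values below are illustrative assumptions, not taken from this chapter.

```python
import math

def concentration(t, dose, f, ka, k, v):
    """One-compartment model with first-order input (rate constant ka)
    and first-order disposition (rate constant k): the Bateman function."""
    if ka == k:
        raise ValueError("closed form requires ka != k")
    return (f * dose * ka) / (v * (ka - k)) * (math.exp(-k * t) - math.exp(-ka * t))

# Illustrative values only: 500 mg oral dose, complete availability (F = 1),
# ka = 1.4/h (input), k = 0.1/h (disposition), V = 50 L.
profile = [concentration(t, dose=500, f=1.0, ka=1.4, k=0.1, v=50.0)
           for t in range(25)]  # hourly samples over 24 h

# Input dominates early (rising concentrations); once absorption is
# essentially complete, only disposition remains and the curve declines.
peak_time = profile.index(max(profile))  # hours
```

On this hourly grid the peak falls near t = ln(ka/k)/(ka − k), about 2 h with these numbers, after which the profile decays mono-exponentially at rate k.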

2.2. Input Input can be defined as the process by which unchanged drug
travels from the site of administration to the site of measurement
within the body. In the case of an intravenous bolus dose the entire
amount of unchanged drug is available instantaneously in the body.
In the case of the extravascular route of administration the input
process usually involves more than one mechanism and there may
be several possible sites where the drug may be irreversibly elimi-
nated during the input process, hence absorption may be incom-
plete and variable in rate.
When the drugs are dosed orally loss may occur due to biological
reasons and/or poor physicochemical properties of the drugs.
Biological reasons include degradation of the drug in the gastrointes-
tinal tract (e.g., insulin), pre-systemic metabolism by enzymes present
in the gastrointestinal tract wall (e.g., cyclosporin) and transporters in
the gastrointestinal tract wall (e.g., vincristine) providing a counter
flux back into the gastrointestinal tract (5).
Most of the drugs that are substrates of the enzyme CYP3A4 (a class of cytochrome P450 enzyme) may undergo pre-systemic
metabolism. This is due to the presence of CYP3A4 in gut wall
(6). A key example of a drug that undergoes pre-systemic metabo-
lism through CYP3A4 is cyclosporine resulting in often highly
erratic PK profiles (7). Terbutaline (8) also undergoes pre-systemic
sulphation. Drugs that are substrates of P-Glycoprotein (P-gp) also
have poor bioavailability. P-gp is a glycoprotein present in the gut
wall. The major function of P-gp in the gut wall is to transport the
drug back into the gut. Examples of P-gp substrates include
digoxin (9) and fexofenadine (10).
Physicochemical properties of the drug like poor solubility
and/or permeability of the drug may also lead to loss of the drug.
Some drugs like glibenclamide have low solubility in the gastroin-
testinal fluids and hence low bioavailability (11). Others like cimet-
idine have high solubility but low permeability (11). Once the drug
gains entry past the gastrointestinal tract and enters the portal
vein, it may undergo first-pass metabolism in the liver. Figure 4
represents the absorption of the drug via the oral route.

2.2.1. Input and Drug Overdose In most studies of overdose the input phase may be difficult to characterize. This is mainly because the patient does not take the
overdose in the hospital and less information is available about the
dose and the time at which the patient ingested the overdose (12). In
addition, there is a lag between the time of ingestion of the dose and
time at which the first plasma sample can be collected and the input
phase of the drug may be partially or fully completed by this time.

Fig. 4. Barriers that a drug encounters when given via the oral route. The drug can be degraded or poorly absorbed and eliminated through feces, metabolized by intestinal microsomes, affected by transporters in the intestinal wall that may inhibit its passage through the GIT wall, or undergo first-pass metabolism in the liver.

Hence, the data collected may not provide much information about
the input phase of the drug. Methods have been developed in order to
account for this missing information in the input phase and account
for some of the uncertainty which is particularly important for esti-
mating the disposition parameters (e.g., clearance) (2, 12, 13).
It is often assumed that drug absorption takes longer when the
drug is taken in overdose. However, in a number of pharmacoki-
netic studies of drugs in overdose the absorption appears to be
rapid and complete, similar to the pharmacokinetics in therapeutic
doses of the same drugs (2, 12–14). The input phase for drugs in
overdose may therefore be similar to the drug in therapeutic doses
in many cases. In some cases there is clearly prolonged absorption
and this is assumed to be due to the tablets aggregating to form
what are known as pharmacobezoars (15). Carbamazepine is an
example of a drug in overdose that has a prolonged input phase
(16), and this can be seen in Fig. 5 that suggests there is ongoing
absorption for up to 24 h with increasing concentrations or a
plateauing of drug concentrations. This may be due to the forma-
tion of pharmacobezoars or that carbamazepine affects its own
absorption by perhaps reducing gastrointestinal motility.

2.3. Disposition Disposition can be defined as the process by which the drug moves
to and from the site of measurement. Once absorbed the drug
molecules are distributed to the tissues of the body including to
organs of elimination, leading to a decrease in concentration of the
drug at the site of measurement. The decrease in the blood con-
centration could be due to reversible loss of drug from the blood to
the tissues, defined as distribution, or the irreversible loss of drug

Fig. 5. Observed dose-normalized concentrations (to the therapeutic dose of 200 mg)
versus reported time of administration curves for eight patients taking normal release
carbamazepine overdoses with doses ranging from 3 to 36 g. Samples from the same
dose event are connected.

Fig. 6. A physiologically based pharmacokinetic model, describing the distribution of drug to various tissues.

from blood, defined as elimination. Disposition is therefore the combination of distribution and elimination.

2.4. Distribution Distribution can be defined as the reversible transfer of drug from the
central compartment (blood) to the different tissues and is schema-
tically represented in Fig. 6. The rate and extent of distribution of a

drug to the different tissues is determined by the blood flow to the tissue, the ability of the drug to cross the tissue membrane, and the solubility and binding of the drug in the tissue (Fig. 6).
The clearance (CL) and volume of distribution (V) are the two
most important disposition PK parameters. A compartment is a
space in which the drug is assumed to distribute evenly and instan-
taneously. The apparent volume of a compartment may be much
bigger than the physiological space since it is given by the product
of the tissue weight and the partition coefficient (kp) of the drug for
that tissue.
In the model depicted in Fig. 6 the sum of the volumes of distribution of the different tissues is equivalent to the volume of distribution of the drug at steady state.
Vss = KpLung × WTLung × (1 − ELung)
      + KpBrain × WTBrain × (1 − EBrain)
      + KpHeart × WTHeart × (1 − EHeart) + ...
      + KpAdipose × WTAdipose × (1 − EAdipose),    (1)
where Kp is the partition coefficient of the drug into the tissue
which is a composite term that accounts for partitioning of drug
and WT is the weight of the tissue and E is the extraction ratio of
the tissue. If a tissue does not eliminate the drug then the extraction
ratio (E) for that tissue will be equal to zero.
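Equation 1 can be evaluated directly by summing Kp × WT × (1 − E) over tissues. The tissue values below are hypothetical, chosen only to show the mechanics of the sum; none are taken from the chapter.

```python
# Hypothetical tissue data: partition coefficient Kp, tissue weight WT,
# and extraction ratio E (E = 0 for tissues that do not eliminate drug).
tissues = {
    "lung":    {"kp": 2.0,  "wt": 1.0,  "e": 0.0},
    "brain":   {"kp": 4.0,  "wt": 1.4,  "e": 0.0},
    "heart":   {"kp": 3.0,  "wt": 0.3,  "e": 0.0},
    "liver":   {"kp": 5.0,  "wt": 1.8,  "e": 0.4},  # an eliminating organ
    "adipose": {"kp": 10.0, "wt": 12.0, "e": 0.0},
}

def steady_state_volume(tissues):
    """Eq. 1: Vss is the sum over tissues of Kp * WT * (1 - E);
    eliminating tissues (E > 0) contribute less apparent volume."""
    return sum(t["kp"] * t["wt"] * (1.0 - t["e"]) for t in tissues.values())

vss = steady_state_volume(tissues)
```

Note how the high-Kp adipose term dominates the sum, which is why lipophilic drugs can have apparent volumes far larger than body size.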

2.4.1. Volume of Distribution The volume of distribution (V) of a drug is an important pharmacokinetic parameter and is defined as the ratio of the amount of
drug in the body to the concentration in the compartment of
interest (e.g., central compartment or plasma). For a one-
compartment model, it is the ratio of the amount of drug in the
body to the plasma concentration. It has units of volume.
V (volume) = A (mass) / C (mass/volume)    (2)
The relationship between the first-order rate constant (k), clearance (CL), and apparent volume of distribution (V) is given by k = CL/V. The elimination half-life, which is the length of time for the amount in the body to reduce by half, can then be determined as ln 2/k in this model (see Subheading 2.5.5 for a description of CL).
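The relations k = CL/V and t1/2 = ln 2/k can be wrapped in a small helper. This is a sketch only: the CL and V numbers are invented for illustration, not drawn from the chapter.

```python
import math

def half_life(cl, v):
    """Elimination half-life from clearance and volume of distribution:
    k = CL / V (first-order rate constant), then t_half = ln 2 / k."""
    k = cl / v
    return math.log(2) / k

# Illustrative values: CL = 5 L/h and V = 100 L give k = 0.05/h.
t_half = half_life(cl=5.0, v=100.0)  # about 13.9 h
```

Doubling clearance at a fixed volume halves the half-life, which is why two drugs with the same clearance but different volumes of distribution can persist in the body for very different lengths of time.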

2.4.2. Distribution and Drug Overdose Distribution is an important consideration in understanding drug toxicity. For some drugs the time course of distribution rather than
the elimination appears to correlate well with the time course of
toxicity. This is the case for tricyclic antidepressant overdose. Tricy-
clic antidepressants have long elimination half-lives (typically in the
range of 8–80 h) that are not consistent with the time course of

Fig. 7. Drug concentration time curves for seven patients with amitriptyline overdoses
redrawn from Figure 3 in Hulten et al. (18). Shaded area is the therapeutic range of
0.8–2 mM.

toxicity in acute overdose. Tricyclic antidepressant toxicity develops rapidly and then resolves over a period of 6–24 h consistent with
redistribution of drug to muscle and fat from the central nervous
system and heart (17–19). Meineke et al. described the pharmaco-
kinetics of imipramine in sheep given an intravenous overdose with
a three-compartment model showing that there is a rapid decrease
in plasma concentrations over the first few hours and hepatic
metabolism is not an important factor in the initial rapid decrease
in plasma drug concentrations. Studies of amitriptyline overdose in
human patients demonstrate a similar rapid decrease in drug con-
centration in the first 6–24 h from toxic concentrations of
1,000–5,000 ng/mL back into the therapeutic range of
80–200 ng/mL (Fig. 7) (18). Tricyclic antidepressant overdose is
characterized by central nervous system depression and coma, sei-
zures, QRS widening, and cardiac arrhythmias. The clinical and
electrocardiogram abnormalities (QRS widening) develop rapidly
and resolve over 6–24 h consistent with the rapid changes in the
plasma concentrations (17).
Lithium toxicity is another example of a drug where under-
standing its distribution is key to explaining acute versus chronic
toxicity. Lithium is a small molecule that is distributed widely in
extracellular fluid. The major site of toxicity is the central nervous
system where it can cause cerebellar toxicity, confusion, coma, and
death. However, this occurs only with chronic toxicity which devel-
ops over days to weeks (20). Figure 8 shows the time course of
lithium concentrations in a patient with chronic toxicity who pre-
sented with QT prolongation, confusion, and cerebellar signs. In
this case, the decline in lithium concentration was consistent with

Fig. 8. Plasma lithium concentration (black squares) and QT interval (gray circles) versus time for a female patient with
chronic lithium toxicity. The apparent half-life of lithium is 24.8 h assuming a one-compartmental model.

the clinical effects and this is shown with a similar decrease of the
QT interval with the decline in lithium concentration (Fig. 8). In
contrast, in acute lithium overdose toxicity occurs rarely because
there is only transient exposure to high plasma lithium concentra-
tions before it is distributed throughout the body. Uptake into the
central nervous system is slow, and there must be persistently high concentrations (above the upper level of the therapeutic range) for central nervous system toxicity to occur (20).
In this setting, the distribution of lithium into the brain takes
longer than its elimination, which is opposite to tricyclic antide-
pressants where the distribution to the brain is rapid and elimina-
tion slow.
The relationship between the first-order rate constant, clear-
ance, and apparent volume of distribution is shown in Fig. 8 where
there is mono-exponential decay of lithium with a first-order elimi-
nation constant of 0.028/h and an elimination half-life of 24.8 h.
This apparent half-life of elimination is similar to that determined in
pharmacokinetic studies of lithium therapeutically (21).
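The mono-exponential decay in Fig. 8 follows C(t) = C0·exp(−kt); with the chapter's first-order constant of 0.028/h the apparent half-life reproduces the reported 24.8 h. The starting concentration used below is an arbitrary placeholder, not a value from the figure.

```python
import math

K_LITHIUM = 0.028  # first-order elimination rate constant from the Fig. 8 example, per hour

def conc(t, c0, k=K_LITHIUM):
    """One-compartment mono-exponential decay: C(t) = C0 * exp(-k * t)."""
    return c0 * math.exp(-k * t)

apparent_half_life = math.log(2) / K_LITHIUM  # ln 2 / k, about 24.8 h
```

After one half-life any starting concentration has fallen by exactly half, which is what makes the semilog plot of Fig. 8 a straight line.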
Methotrexate is another example of a drug which is similar to
lithium in that its cellular uptake is slower than its renal elimination.
Acute oral overdose of methotrexate has never been reported to cause severe toxicity; in contrast, taking a therapeutic dose of methotrexate daily instead of weekly can be life-threatening (22).

2.5. Elimination Elimination is the irreversible loss of the drug from the site of
measurement which occurs by excretion of unchanged drug,
mainly through kidneys, or conversion of drug into a metabolite

Fig. 9. A physiologically based pharmacokinetic model, describing the elimination of drug from various tissues.

via various metabolic pathways, mainly in the liver. Some drugs are
excreted through bile after being metabolized in the liver, usually
after phase II conjugation reactions. Uncommonly some drugs,
mostly volatiles, may also be excreted through the lungs. We limit
our discussion here to the two most common methods of drug
elimination, namely the renal and the hepatic routes. Figure 9
replicates the distribution pathways in Fig. 6 and adds the major
elimination pathways.

2.5.1. Hepatic Elimination Metabolism is the predominant mechanism by which about 75% of
drugs are eliminated from the body. The majority of drug metabo-
lism occurs in the liver but can occur at other sites, including the
gastrointestinal tract, the kidneys, and the lungs. Drug metabolites
are not always inactive but are generally less active. There are special
circumstances in which the metabolite is active while the parent is
inactive. It is argued that codeine is a special case of this where
codeine itself is thought to be inactive, and morphine, a metabolite,
is active. In some cases drugs are designed specifically for the parent
to be inactive and the metabolite active. These are termed prodrugs
in which the chemical structure is modified so that absorption
occurs more readily, e.g., dabigatran is given as the inactive dabiga-
tran etexilate (23), and mycophenolate is given as the inactive
mycophenolate mofetil (24). There are also less common circum-
stances in which the parent and metabolite are active and the
metabolite itself is sometimes given as the active agent, e.g.,

Fig. 10. Metabolic pathways for acetaminophen excluding the small amount excreted unchanged by the kidneys.

amitriptyline is metabolized to nortriptyline (25), and hydroxyzine is metabolized to cetirizine (26).
There are two major pathways of hepatic metabolism referred
to as phase I and phase II metabolic pathways. Phase I reactions
generally convert the drug into a more polar species, which are then
more easily eliminated by the kidneys, by either introducing a
functional group or unmasking a functional group (e.g., –OH,
–NH2, –SH) and involve the oxidation, reduction or hydrolysis of
the drug. The conversion of acetaminophen to N-acetyl-p-benzo-
quinone imine (NAPQI) by CYP2E1 (27) shown in Fig. 10 is an
example of a phase I reaction. Phase II reactions are characterized
by conjugation pathways and involve combining an endogenous
substrate (e.g., glucuronide, sulfate) to a functional group on the
drug. Figure 10 shows that the majority of acetaminophen is meta-
bolized via conjugation pathways glucuronidation (40–80%) and
sulfation (10–30%). Some drug fates include both phase I and II
reactions. This occurs for acetaminophen where the toxic

Table 1
Examples of in vivo substrates, inhibitors, and inducers of various CYP isozymes
relevant to clinical toxicology. This is not an exhaustive list

CYP 1A2
  Substrates: olanzapine, caffeine, amitriptyline
  Inhibitors: fluvoxamine, fluoroquinolones
  Inducers: smoking

CYP 2C9
  Substrates: warfarin, phenytoin, ibuprofen, sulfonylureas
  Inhibitors: fluconazole, valproate
  Inducers: rifampicin

CYP 2C19
  Substrates: omeprazole, citalopram, diazepam, imipramine
  Inhibitors: omeprazole, fluvoxamine, moclobemide
  Inducers: rifampicin

CYP 2D6
  Substrates: imipramine, amitriptyline, fluoxetine, fluvoxamine, paroxetine, venlafaxine, oxycodone, tramadol, codeine, risperidone, metoprolol
  Inhibitors: paroxetine, fluoxetine, bupropion
  Inducers: none listed

CYP 2E1
  Substrates: halothane and related anesthetics, acetaminophen, theophylline
  Inhibitors: disulfiram
  Inducers: ethanol

CYP 3A4/3A5
  Substrates: midazolam, alprazolam, quetiapine, venlafaxine, methadone, mirtazapine, reboxetine, sertraline, diltiazem, many immunosuppressants and chemotherapeutic agents
  Inhibitors: clarithromycin, indinavir, ketoconazole
  Inducers: rifampicin, carbamazepine, phenytoin, phenobarbitone

metabolite NAPQI is detoxified by conjugation to glutathione and then eliminated (Fig. 10).
The enzymes responsible for phase I metabolism belong to the
family of cytochrome P450 (CYP450). The CYP isoenzymes can be classified into various isoforms such as CYP2E1, CYP2C19, CYP2C9, CYP2D6, and CYP3A4/3A5, which are common and important ones for the metabolism of drugs taken in overdose.
Generally all CYP enzymes are present in all metabolizing tissues (e.g., gut, liver). CYP3A4 is the most abundant enzyme, and CYP2D6, although less abundant, is responsible for metabolizing the largest fraction of drugs that undergo phase I metabolism.
Drugs that undergo metabolism through a specific enzyme are
called substrates for the enzyme. Some drugs may induce or inhibit
the activity of the enzymes and this is important for both therapeu-
tic use of drugs and in some cases the clearance of drugs in overdose
(13). Table 1 provides examples of common drugs that are sub-
strates, inducers and inhibitors of CYP isozymes relevant to clinical
toxicology.
Phase II metabolic reactions are usually detoxification reac-
tions. In these reactions the drug molecule is conjugated with a
cofactor. Examples of cofactors include UDP-glucuronic acid and

glutathione. The conjugation takes place in the presence of enzymes such as the UDP-glucuronosyltransferases, sulfotransferases, N-acetyltransferases, and glutathione S-transferases.

2.5.2. Metabolism and Drug Overdose It is generally thought that in most overdoses these metabolic pathways are saturated. However, there is little evidence to support
saturation in overdose and recent pharmacokinetics studies of cita-
lopram, quetiapine, and venlafaxine in overdose suggest that
metabolism is not saturated for these drugs despite drug concen-
trations 10- to 100-fold those seen with therapeutic use (12–14).
Common examples of drugs where saturation occurs include etha-
nol, theophylline, and phenytoin; however saturation in these cases
occurs with therapeutic doses, albeit for theophylline this is not
noticeable clinically.
In the case of acetaminophen overdose, toxicity is due to the
formation of the toxic metabolite NAPQI and there is some evi-
dence that inhibition of CYP2E1 by ethanol decreases the forma-
tion of this metabolite and therefore toxicity, and conversely
chronic alcohol use induces metabolism (28, 29). Although there
is saturation of the sulfation pathway in acetaminophen overdose
due to depletion of sulfate (30, 31), it is unclear if this changes the
overall pharmacokinetics in overdose.

2.5.3. Renal Elimination The renal elimination of the drugs is usually referred to as drug
excretion and may be elimination of the parent drug or the drug
metabolites. Excretion of drug and the metabolites into the urine
involves three main mechanisms, glomerular filtration, active tubu-
lar secretion, and tubular reabsorption. About 25% of all the drugs
undergo renal elimination as unchanged drug.
Renal elimination depends on various factors including the
lipophilicity of the drug, plasma protein binding and the plasma
drug concentration. The nephron is the basic anatomical unit of
the kidney that is responsible for renal elimination of drugs. Like
any substance eliminated by the kidneys, including endogenous
substances, drugs can be eliminated by one of four major pro-
cesses which occur at different sites along the nephron: glomeru-
lar filtration, active secretion, passive diffusion, and active
reabsorption.
About 25% of cardiac output goes to the kidney and about 10%
of it is filtered through the kidney. The glomerular filtration of a
drug will depend on the glomerular blood flow and the concentra-
tion of unbound drug. Drugs that are bound to plasma proteins are
not filtered. The glomerular filtration rate (GFR) is usually approxi-
mated by the clearance of creatinine, which is a catabolic product of
amino acid metabolism in the muscle. Changes in the plasma
protein binding of drugs will affect filtration, such as saturation of
protein binding or changes in pH.
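Since only unbound drug is filtered at the glomerulus, the filtration contribution to renal clearance can be sketched as the unbound fraction times GFR. This is a minimal sketch; the fu and GFR values are assumptions, not figures from the chapter.

```python
def filtration_clearance(fu, gfr):
    """Glomerular filtration clearance: only the unbound fraction (fu)
    of drug in plasma is filtered, so CL_filtration = fu * GFR."""
    return fu * gfr

# Assumed values: 30% unbound drug, GFR approximated by a creatinine
# clearance of 7.2 L/h (about 120 mL/min).
cl_filtration = filtration_clearance(fu=0.3, gfr=7.2)  # L/h
```

This also shows why changes in plasma protein binding matter: raising fu raises the filtered clearance proportionally.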

If the renal clearance of a drug is greater than GFR, then active secretion pathways are likely to be playing a role. It should be noted
that only the net renal clearance can be determined and the individ-
ual influence of each process is generally not identifiable in clinical
practice. Active secretion occurs via a carrier mechanism and tends to
be sufficiently powerful to remove drug from plasma protein binding
sites and hence is relatively unaffected by binding to plasma proteins. Benzyl
penicillin is 80% protein bound and almost completely removed
from the blood by secretion into the proximal tubule. However,
these efflux transporters can saturate at high concentrations of
drugs. The P-glycoprotein and the organic anion transporting poly-
peptides are specific transport proteins for excretion of drugs. These
are saturable and can lead to nonlinear pharmacokinetics (32).
Water is reabsorbed as the filtrate passes down the nephron so
that only 1% of the original filtrate emerges as urine. Lipophilic
drugs are more likely to permeate through the membrane and be
reabsorbed. This is in contrast to polar drugs that have minimal
reabsorption and their renal clearance will be similar to the GFR, e.g., gentamicin.
Drugs that resemble essential amino acids (levodopa, α-methyldopa, and thyroxine) are actively reabsorbed. Uric acid is also
actively reabsorbed which is inhibited by probenecid, which com-
petes with uric acid for the active transport system. This is the basis
of the use of probenecid for the treatment of gout.
Renal clearance of a drug can be measured with timed collec-
tion of urine and analysis of the drug concentration in the urine
using the following equation:
CLr = (Curine × Q) / C,    (3)
where CLr is renal clearance, C is the concentration of the drug in
plasma, Q is the urine flow rate, and Curine is the concentration of
the drug in urine.
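The timed urine collection calculation above can be sketched as the urinary excretion rate (urine concentration times urine flow) divided by the plasma concentration. The sample numbers below are hypothetical, chosen only to illustrate the arithmetic.

```python
def renal_clearance(c_plasma, urine_flow, c_urine):
    """Renal clearance from a timed urine collection:
    CLr = (Curine * Q) / C, i.e. excretion rate over plasma concentration."""
    return (c_urine * urine_flow) / c_plasma

# Hypothetical collection: plasma 10 mg/L, urine flow 0.06 L/h,
# urine concentration 2000 mg/L.
clr = renal_clearance(c_plasma=10.0, urine_flow=0.06, c_urine=2000.0)  # L/h
```

Comparing the result against GFR then indicates whether net secretion (CLr greater than GFR) or net reabsorption (CLr less than GFR) is occurring.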

2.5.4. Renal Elimination and Drug Overdose Understanding renal elimination is important for both acute overdose and poisoning with drugs that are mainly or completely elimi-
nated by the kidneys. Chronic poisoning by drugs that are renally
eliminated will occur in patients with abnormal renal function or
acute renal failure, including digoxin toxicity, lithium toxicity, and
metformin poisoning.
There are a number of drugs where renal elimination becomes
important in overdose compared to therapeutic doses. An example
of this is salicylate poisoning where the metabolic pathways in the
liver are saturated in overdose leaving the major elimination path-
way as renal clearance of salicylic acid (33). This explains why the
apparent half-life of elimination of salicylate increases from 2 to 4 h
in therapeutic doses (due to hepatic clearance) to approximately
20 h in overdose which is mainly due to renal elimination (34).

Fig. 11. Two patients with acute aspirin (acetylsalicylic acid) overdoses. The first patient
(filled circles; thick line) ingested 36 g, was not treated with sodium bicarbonate and had
a half-life of elimination of 29.8 h. The second patient (open circles; dashed line) ingested
15 g, was treated with a loading dose and infusion of bicarbonate and had a half-life of
5.2 h. The observed concentrations (filled and open circles) have been fitted to a one-
compartment model with first-order input.

Increasing the renal excretion of salicylic acid is therefore important in the treatment of salicylate poisoning. Alkalinization of the
urine, usually by administering a loading dose and infusion of
bicarbonate, increases the dissociated anionic form of salicylic acid
(i.e., salicylate anion) in the urine (35). Reabsorption is therefore
reduced since the molecule is charged (“ion trapping”) which
increases the amount of salicylate excreted in the urine. The differ-
ence in rate of elimination is shown in Fig. 11 for two patients with
aspirin overdose and only one treated with alkaline diuresis.
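The effect of urinary alkalinization can be illustrated with the Henderson–Hasselbalch relationship. This is a simplified sketch, not a clinical calculation: it assumes a pKa of about 3.0 for salicylic acid (an approximate, commonly cited value), a plasma pH of 7.4, and that only the un-ionized form equilibrates across the tubular membrane:

```python
def ionized_ratio(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch for a weak acid: [ionized]/[un-ionized] = 10**(pH - pKa)."""
    return 10.0 ** (ph - pka)

def urine_to_plasma_ratio(ph_urine: float, ph_plasma: float, pka: float) -> float:
    """Equilibrium ratio of total drug (ionized + un-ionized) in urine vs. plasma,
    assuming only the un-ionized form crosses the membrane (ion trapping)."""
    return (1.0 + ionized_ratio(ph_urine, pka)) / (1.0 + ionized_ratio(ph_plasma, pka))

PKA_SALICYLIC = 3.0  # assumed approximate pKa (illustration only)
acidic = urine_to_plasma_ratio(6.0, 7.4, PKA_SALICYLIC)    # acidic urine
alkaline = urine_to_plasma_ratio(8.0, 7.4, PKA_SALICYLIC)  # alkalinized urine
print(f"trapping ratio rises ~{alkaline / acidic:.0f}-fold on alkalinization")
```

Raising urine pH from 6 to 8 increases the equilibrium urine-to-plasma ratio by roughly two orders of magnitude in this idealized model, which is the qualitative basis of alkaline diuresis.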
The elimination of toxic alcohols also changes in overdose. The metabolites of methanol and ethylene glycol (formic acid and oxalic acid, respectively) are the toxic species in toxic alcohol poisoning. Antidotal therapy aims to inhibit alcohol dehydrogenase, the enzyme that metabolizes these alcohols, with either ethanol or fomepizole. This inhibition increases the elimination half-life to about 50 h (methanol) or 17 h (ethylene glycol), since only the renal pathway remains. In the case of methanol this often necessitates hemodialysis to speed its removal.
2.5.5. Clearance

The clearance of a drug from various tissues occurs in parallel, as shown in Fig. 9, so the total body clearance of the drug (CL) is equal to the sum of the clearances of the individual tissues:

CL = CLh + CLg + CLr   (4)

Clearance is a constant that describes the relationship between drug concentration (C) in the body and the rate of elimination of the drug from the body, and it has units of volume per time. It is, however,
12 Introduction to Pharmacokinetics in Clinical Toxicology 305
perhaps most easily understood, based on Fig. 9, as the product of the perfusion of the eliminating organ (Q) and the intrinsic ability of the organ to eliminate the drug, termed the extraction ratio (E):

CL = Q × E   (5)

Since E is unitless, CL has the same units as perfusion (volume per time). The CL of a drug is treated as a constant and is considered a primary parameter.
This is a physiologically appealing definition of CL in the sense that alterations in perfusion and extraction can be shown to change CL in a predictable manner.
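Equations 4 and 5 can be combined in a few lines of code. The organ perfusion values and extraction ratios below are hypothetical round numbers chosen only to illustrate the arithmetic:

```python
def organ_clearance(perfusion: float, extraction: float) -> float:
    """Eq. 5: CL = Q * E, where Q is organ perfusion (L/h) and E is the
    unitless extraction ratio (between 0 and 1)."""
    if not 0.0 <= extraction <= 1.0:
        raise ValueError("extraction ratio must be between 0 and 1")
    return perfusion * extraction

# Hypothetical organ values (illustrative only)
cl_hepatic = organ_clearance(90.0, 0.30)  # 27 L/h
cl_gut     = organ_clearance(60.0, 0.05)  #  3 L/h
cl_renal   = organ_clearance(70.0, 0.10)  #  7 L/h

# Eq. 4: clearances of parallel eliminating organs add
cl_total = cl_hepatic + cl_gut + cl_renal
print(cl_total)  # 37.0 L/h
```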
3. Methods

3.1. Basic Model for PK

Given the many processes involved in the input and disposition of drugs, the mathematical models used to describe the PK of drugs can be complicated. Although the underlying processes can sometimes be exceedingly complex, as depicted in Fig. 9, a simple one-compartment model with first-order input fortunately provides a reasonable description of the time course for many drugs, both at therapeutic doses and in overdose (Fig. 12). Of course, more complex PK models may be necessary.
In Fig. 12 we see that all the tissues of the body are lumped together as a single homogeneous compartment, the "body" (discussed in the next section). The arrows indicate the movement of the drug. The figure shows that part of the drug may be excreted

Fig. 12. A one-compartment PK model with first-order input that can be used to describe the time course of drug concentration. Although two routes of elimination are illustrated, the mathematical model describes total clearance.
unchanged and some part may be metabolized by enzymes in the liver. In the figure, the time course of the dose–concentration relationship may be described by combining input and disposition models. Conceptually this is expressed as:

Drug concentration(t) = f[input model(t), disposition model(t)]   (6)
Here f[input model(t), disposition model(t)] simply signifies that drug concentration is a function of input and disposition, and that these two models are essentially independent of one another. From a mass balance perspective, the total amount of drug that enters the body is equal to the total amount of drug that is eliminated from the body. Hence:

Dose = amount of drug at absorption site
     + amount of drug in the body
     + amount of unchanged drug excreted
     + amount of metabolite in the body
     + amount of metabolite eliminated + ...   (7)

We note that this is a simplification of true mass balance: it does not account for the change of mass due to metabolic processes, but if mass did not change during these processes then this concept would hold exactly.
3.2. Rate of Movement of Drug Around the Body

Most drug movement processes are due to passive diffusion, so the driving factor is concentration: the higher the concentration, the greater the rate at which the drug diffuses across a membrane. This concentration-proportional rate is termed first order and holds for most drugs. If the rate of movement is constant and independent of concentration, it is said to be zero order. If the rate of movement of drug is saturable (i.e., it changes from apparent first order to apparent zero order as concentration increases), it is termed mixed order and is most commonly described by a Michaelis–Menten process.
Note that all the equations shown below are in terms of amount (A) but can be written in terms of concentration (C).
Zero-order process:

dA/dt = −k0   (8)

A zero-order process proceeds at a constant rate (mass per time) that is independent of the amount of drug to be transferred.
First-order process:

dA/dt = −k × A   (9)
A first-order process is described by a rate constant (k) and the amount of drug to be transferred. If we write the same equations in terms of concentration instead of amount, the rate constant is replaced by clearance (CL).
Michaelis–Menten (MM), or mixed-order, process:

dA/dt = −Vmax × (A/V) / (km + A/V)   (10)

A Michaelis–Menten process is described by the Michaelis constant (km) and Vmax, the maximum rate at which the drug can be eliminated. km is defined as the amount giving half the maximum rate (Vmax/2) and has the same units as A. Here V is the volume of distribution. When A >> km, km is negligible and the equation simplifies to

dA/dt = −Vmax   (11)

which is equivalent to a zero-order process (Eq. 8). When km >> A, A is negligible and the equation becomes

dA/dt = −(Vmax/km) × A,   (12)

where the ratio of Vmax (e.g., mg/h) to km (e.g., mg) essentially provides k (1/h), making Eq. 12 equivalent to a first-order process (Eq. 9).
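The limiting behavior described by Eqs. 10–12 can be checked numerically. The parameter values below are arbitrary illustrations, and the sign convention treats elimination as a negative rate of change, as in the equations above:

```python
def mm_rate(amount: float, vmax: float, km: float, v: float) -> float:
    """Eq. 10: dA/dt = -Vmax * (A/V) / (km + A/V)."""
    conc = amount / v
    return -vmax * conc / (km + conc)

vmax, km, v = 100.0, 2.0, 10.0  # arbitrary illustrative values

# A/V >> km: the rate approaches -Vmax (zero order, Eq. 11)
high_dose = mm_rate(10_000.0, vmax, km, v)

# km >> A/V: the rate approaches -(Vmax/km) * (A/V) (first order, Eq. 12)
low_dose = mm_rate(0.01, vmax, km, v)
first_order_approx = -(vmax / km) * (0.01 / v)

print(high_dose, low_dose, first_order_approx)
```

At a high amount the computed rate is within a fraction of a percent of −Vmax, while at a low amount it is indistinguishable from the first-order approximation.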
3.3. Describing the Pharmacokinetics of Drugs

Three different approaches can be used to describe the PK of drugs:
1. Compartmental pharmacokinetic models
2. Non-compartmental pharmacokinetic models
3. Physiologically based pharmacokinetic (PBPK) models

In the compartmental approach the body is divided into one or more compartments, the number of which is dictated by the data. In PBPK modeling, the investigator defines a set of compartments based on physiology. The similarity of the behavior of the drug in the various compartments is assessed from data sampled from each of the compartments (shown in Figs. 6 and 9).
The non-compartmental approach (NCA) does not require the assumption of any compartments for the purpose of analysis and includes the summary variables peak (maximum) drug concentration (Cmax), time to peak drug concentration (Tmax), and area under the curve (AUC). Weak assumptions are required, however, if the non-compartmental summary variables are converted into parameter values such as CL and V.

3.3.1. Compartmental A commonly used approach to characterize the PK of a drug in the


Models body is to represent the body as a series of compartments.
Assumptions are required when models are used to describe data. The compartment model assumes that each of the compartments is a kinetically homogeneous unit. The model also assumes that the drug is instantaneously and evenly distributed throughout the compartment. Although these appear to be strong assumptions, most models perform well, suggesting that violations of these requirements are generally too subtle to cause clinically meaningful problems with the model.
Though not a requirement, it is often assumed when constructing a model that the elimination of the drug from the compartment and the transfer of the drug between compartments follow first-order (linear) kinetics. When naming a compartment model, the number of compartments refers to the total number of disposition compartments. For example, a one-compartment extravascular absorption model actually has two compartments: a gastrointestinal compartment and a compartment incorporating all other tissues in the body. By convention, however, it is termed a one-compartment model based on the number of disposition compartments.
3.3.2. The One-Compartment PK Model

The one-compartment model is the simplest of the models used to describe the PK of drugs. This model assumes that all the tissues in the body are lumped together into a single kinetically homogeneous unit. In the pharmacokinetic model below, the whole body, except the gut, is assumed to be a single compartment. A schematic of a one-compartment PK model is given in Fig. 13.
When a dose D is administered extravascularly, it must be absorbed through a biological barrier to enter the central compartment (blood), where it becomes systemically available. The process itself is complex and determined by factors such as the route of administration, the amount administered, the formulation, and the physicochemical properties of the drug. This complex input process is often assumed to be a first-order input process governed by a single parameter, ka, the absorption rate constant. This assumption is not a requirement, and the absorption rate may be described by any number of processes, including a zero-order input process (which would describe an intravenous infusion) or a mixture of such processes.
For this first-order input one-compartment model, the rate of change of the amount of drug over time (t) at the absorption site and in the body can be represented as a set of ordinary differential equations:

dA(1)/dt = −ka × A(1)   (13)

dA(2)/dt = ka × A(1) − (CL/V) × A(2)   (14)
with the initial conditions A(1) = D and A(2) = 0 at t = 0.

Fig. 13. (a) Schematic of a one-compartment PK model with intravenous bolus administration, showing all the tissues in the body. CL is the clearance and V is the volume of distribution of the drug. (b) Schematic of a one-compartment PK model with intravenous bolus administration, with all the tissues lumped together into a single homogeneous compartment. Both represent the same model; (a) makes the lumping assumptions explicit.
It is important to understand that the one-compartment model does not imply that the concentration in the compartment (plasma or blood, the tissues most commonly sampled for measuring drug concentrations) is equal to the concentration in other tissues of the body. Rather, the rate of change of concentration in the plasma or blood is identical to the rate of change of concentration in the tissues.
By simultaneously solving the above differential equations, the amount of drug in the body at any time t is given by:

A(t) = (ka × D / (ka − CL/V)) × [exp(−(CL/V) × t) − exp(−ka × t)]   (15)

The parameters in Eq. 15 are the absorption rate constant (ka), clearance (CL), and volume of distribution (V). Dividing Eq. 15 by the volume of distribution (V) gives the concentration at any time, as shown in Eq. 16:

C(t) = (ka × D / (V × (ka − CL/V))) × [exp(−(CL/V) × t) − exp(−ka × t)]   (16)
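Equations 13–16 can be verified numerically: integrating the two differential equations forward in time should reproduce the closed-form solution. The parameter values below are arbitrary and chosen only for illustration:

```python
from math import exp

def conc_analytic(t: float, dose: float, ka: float, cl: float, v: float) -> float:
    """Eq. 16: C(t) for a one-compartment model with first-order input."""
    ke = cl / v  # elimination rate constant CL/V
    return (ka * dose) / (v * (ka - ke)) * (exp(-ke * t) - exp(-ka * t))

def conc_numeric(t_end: float, dose: float, ka: float, cl: float, v: float,
                 dt: float = 1e-3) -> float:
    """Euler integration of Eqs. 13-14, returning A(2)/V at t_end."""
    a1, a2, t, ke = dose, 0.0, 0.0, cl / v
    while t < t_end:
        da1 = -ka * a1           # Eq. 13: loss from the absorption site
        da2 = ka * a1 - ke * a2  # Eq. 14: input to, and elimination from, the body
        a1 += da1 * dt
        a2 += da2 * dt
        t += dt
    return a2 / v

# Arbitrary illustrative parameters: dose 100 mg, ka 0.5/h, CL 10 L/h, V 50 L
print(conc_analytic(2.0, 100.0, 0.5, 10.0, 50.0))
print(conc_numeric(2.0, 100.0, 0.5, 10.0, 50.0))
```

With a small step size, the two values agree to well within 1%, which is a useful sanity check when implementing either form.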
Figure 14 gives an example of concentration–time data from a patient who ingested 70 g of amisulpride. The parameters ka, V, and CL have been estimated using a first-order input one-compartment model, and the half-life has been derived from the disposition parameters V and CL. For comparison, the CL of amisulpride at therapeutic doses in a healthy-volunteer study was reported to be 31.2–41.6 L/h (36).

Fig. 14. Observed concentration–time data from a female patient ingesting 14 g of amisulpride, fitted with a first-order one-compartment model, with parameter estimates of ka 0.173/h, V 179 L, and CL 31.3 L/h, and a half-life of 4 h.
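For a one-compartment model, the half-life derived from the disposition parameters is t1/2 = ln(2) × V/CL. Plugging in the parameter estimates quoted for this case (V = 179 L, CL = 31.3 L/h) reproduces the reported half-life of about 4 h:

```python
from math import log

def half_life(v: float, cl: float) -> float:
    """Elimination half-life from volume of distribution and clearance:
    t1/2 = ln(2) * V / CL."""
    return log(2.0) * v / cl

print(round(half_life(179.0, 31.3), 1))  # 4.0 (hours)
```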
4. Notes

- Pharmacokinetics describes the time course of drug concentration in the body.
- Pharmacokinetic models are usually broken down into a model for drug input and a model for drug disposition.
- The primary purpose of pharmacokinetics is to add time into the concentration–effect relationship and thus allow us to understand the time course of drug effects.
- In clinical toxicology, a pharmacokinetic study can form the basis on which the pharmacokinetics after overdose are compared with those at therapeutic doses, to assess for any unexpected influence of very high doses. Such studies also form the basis for determining the likely effectiveness of decontamination procedures.
References

1. Dawson AH, Whyte IM (2001) Therapeutic drug monitoring in drug overdose. Br J Clin Pharmacol 52(Suppl 1):97S–102S
2. Isbister GK (2010) How do we use drug concentration data to improve the treatment of overdose patients? Ther Drug Monit 32:300–304
3. Isbister GK, Friberg LE, Duffull SB (2006) Application of pharmacokinetic-pharmacodynamic modelling in management of QT abnormalities after citalopram overdose. Intensive Care Med 32:1060–1065
4. Friberg LE, Isbister GK, Duffull SB (2006) Pharmacokinetic-pharmacodynamic modelling of QT interval prolongation following citalopram overdoses. Br J Clin Pharmacol 61:177–190
5. Martinez MN, Amidon GL (2002) A mechanistic approach to understanding the factors affecting drug absorption: a review of fundamentals. J Clin Pharmacol 42:620–643
6. Lin JH, Chiba M, Baillie TA (1999) Is the role of the small intestine in first-pass metabolism overemphasized? Pharmacol Rev 51:135–158
7. Hoppu K, Koskimies O, Holmberg C, Hirvisalo EL (1991) Evidence for pre-hepatic metabolism of oral cyclosporine in children. Br J Clin Pharmacol 32:477–481
8. Pacifici GM, Eligi M, Giuliani L (1993) (+) and (−) terbutaline are sulphated at a higher rate in human intestine than in liver. Eur J Clin Pharmacol 45:483–487
9. Tanigawara Y, Okamura N, Hirai M, Yasuhara M, Ueda K, Kioka N, Komano T, Hori R (1992) Transport of digoxin by human P-glycoprotein expressed in a porcine kidney epithelial cell line (LLC-PK1). J Pharmacol Exp Ther 263:840–845
10. Cvetkovic M, Leake B, Fromm MF, Wilkinson GR, Kim RB (1999) OATP and P-glycoprotein transporters mediate the cellular uptake and excretion of fexofenadine. Drug Metab Dispos 27:866–871
11. Lindenberg M, Kopp S, Dressman JB (2004) Classification of orally administered drugs on the World Health Organization Model list of Essential Medicines according to the biopharmaceutics classification system. Eur J Pharm Biopharm 58:265–278
12. Friberg LE, Isbister GK, Hackett LP, Duffull SB (2005) The population pharmacokinetics of citalopram after deliberate self-poisoning: a Bayesian approach. J Pharmacokinet Pharmacodyn 32:571–605
13. Isbister GK, Friberg LE, Hackett LP, Duffull SB (2007) Pharmacokinetics of quetiapine in overdose and the effect of activated charcoal. Clin Pharmacol Ther 81:821–827
14. Kumar VV, Oscarsson S, Friberg LE, Isbister GK, Hackett LP, Duffull SB (2009) The effect of decontamination procedures on the pharmacokinetics of venlafaxine in overdose. Clin Pharmacol Ther 86:403–410
15. Buckley NA, Dawson AH, Reith DA (1995) Controlled release drugs in overdose: clinical considerations. Drug Saf 12:73–84
16. Brahmi N, Kouraichi N, Thabet H, Amamou M (2006) Influence of activated charcoal on the pharmacokinetics and the clinical features of carbamazepine poisoning. Am J Emerg Med 24:440–443
17. Hulten BA, Heath A, Knudsen K, Nyberg G, Starmark JE, Martensson E (1992) Severe amitriptyline overdose: relationship between toxicokinetics and toxicodynamics. J Toxicol Clin Toxicol 30:171–179
18. Hulten BA, Heath A, Knudsen K, Nyberg G, Svensson C, Martensson E (1992) Amitriptyline and amitriptyline metabolites in blood and cerebrospinal fluid following human overdose. J Toxicol Clin Toxicol 30:181–201
19. Meineke I, Schmidt W, Nottrott M, Schroder T, Hellige G, Gundert-Remy U (1997) Modelling of non-linear pharmacokinetics in sheep after short-term infusion of cardiotoxic doses of imipramine. Pharmacol Toxicol 80:266–271
20. Waring WS (2006) Management of lithium toxicity. Toxicol Rev 25:221–230
21. Sproule BA, Hardy BG, Shulman KI (2000) Differential pharmacokinetics of lithium in elderly patients. Drugs Aging 16:165–177
22. Balit CR, Daly FFS, Little M, Murray L (2006) Oral methotrexate overdose. Clin Toxicol 44:1
23. Sanford M, Plosker GL (2008) Dabigatran etexilate. Drugs 68:1699–1709
24. Goldblum R (1993) Therapy of rheumatoid arthritis with mycophenolate mofetil. Clin Exp Rheumatol 11(Suppl 8):S117–S119
25. Watson CP, Vernich L, Chipman M, Reed K (1998) Nortriptyline versus amitriptyline in postherpetic neuralgia: a randomized trial. Neurology 51:1166–1171
26. Tashkin DP, Brik A, Gong H Jr (1987) Cetirizine inhibition of histamine-induced bronchospasm. Ann Allergy 59:49–52
27. Manyike PT, Kharasch ED, Kalhorn TF, Slattery JT (2000) Contribution of CYP2E1 and CYP3A to acetaminophen reactive metabolite formation. Clin Pharmacol Ther 67:275–282
28. Schmidt LE, Dalhoff K, Poulsen HE (2002) Acute versus chronic alcohol consumption in acetaminophen-induced hepatotoxicity. Hepatology 35:876–882
29. Thummel KE, Slattery JT, Ro H, Chien JY, Nelson SD, Lown KE, Watkins PB (2000) Ethanol and production of the hepatotoxic metabolite of acetaminophen in healthy adults. Clin Pharmacol Ther 67:591–599
30. Levy G, Galinsky RE, Lin JH (1982) Pharmacokinetic consequences and toxicologic implications of endogenous cosubstrate depletion. Drug Metab Rev 13:1009–1020
31. Gelotte CK, Auiler JF, Lynch JM, Temple AR, Slattery JT (2007) Disposition of acetaminophen at 4, 6, and 8 g/day for 3 days in healthy young adults. Clin Pharmacol Ther 81:840–848
32. Tirona RG, Kim RB (2002) Pharmacogenomics of organic anion-transporting polypeptides (OATP). Adv Drug Deliv Rev 54:1343–1352
33. Levy G, Tsuchiya T (1972) Salicylate accumulation kinetics in man. N Engl J Med 287:430–432
34. Done AK (1960) Salicylate intoxication. Significance of measurements of salicylate in blood in cases of acute ingestion. Pediatrics 26:800–807
35. Cumming G, Dukes DC, Widdowson G (1964) Alkaline diuresis in treatment of aspirin poisoning. Br Med J 2:1033–1036
36. Rosenzweig P, Canal M, Patat A, Bergougnan L, Zieleniuk I, Bianchetti G (2002) A review of the pharmacokinetics, tolerability and pharmacodynamics of amisulpride in healthy volunteers. Hum Psychopharmacol 17:1–13
Chapter 13

Modeling of Absorption

Walter S. Woltosz, Michael B. Bolger, and Viera Lukacova
Abstract

Absorption takes place when a compound enters an organism, which occurs as soon as the molecules enter the first cellular bilayer(s) in the tissue(s) to which it is exposed. At that point, the compound is no longer part of the environment (which includes the alimentary canal for oral exposure), but has become part of the organism. If absorption is prevented or limited, then toxicological effects are also prevented or limited. Thus, modeling absorption is the first step in simulating/predicting potential toxicological effects. Simulation software used to model absorption of compounds of various types has advanced considerably over the past 15 years. There can be strong interactions between absorption and pharmacokinetics (PK), requiring state-of-the-art simulation computer programs that combine absorption with either compartmental PK or physiologically based pharmacokinetics (PBPK). Pharmacodynamic (PD) models for therapeutic and adverse effects are also often linked to the absorption and PK simulations, providing PK/PD or PBPK/PD capabilities in a single package. These programs simulate the interactions among a variety of factors, including the physicochemical properties of the molecule of interest, the physiologies of the organisms, and in some cases environmental factors, to produce estimates of the time course of absorption and disposition of both toxic and nontoxic substances, as well as their pharmacodynamic effects.
Key words: Absorption, Permeability, Gastrointestinal, Dermal, Nasal/pulmonary, Ocular, Pharmacokinetic, Pharmacodynamic, Toxicology, Solubility, Dissolution, Precipitation, Supersaturation, Formulation, Transporters
1. Introduction

Because absorption is required before any compound can exert a toxicological influence, the field of computational toxicology requires not only the ability to predict whether a molecular structure is likely to be toxic to an organism, and in what ways, but also whether it can be expected to be absorbed at rates and in quantities sufficient to overcome the organism's natural defense mechanisms. If the molecule itself is nontoxic, but one or more of its metabolites are toxic, then the absorption of the parent molecule will govern its availability at the sites of metabolism, but the rates and amounts
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929, DOI 10.1007/978-1-62703-050-2_13, © Springer Science+Business Media, LLC 2012
of metabolite formation will govern the toxicological effects, as seen for acetaminophen (1–3).
Absorption occurs when a compound enters an organism, which happens as soon as the compound enters the first cellular bilayer(s) in the tissue(s) to which it is exposed. For oral absorption in the gastrointestinal tract, the fraction absorbed (Fa) is defined as the fraction of the administered dose that is absorbed into the apical membrane of the enterocytes (4). At that point, the compound is no longer part of the environment, but has become part of the organism. From there, the fate of the compound, and thereby the potential fate of the organism as a result of the exposure, will be determined by the pharmacokinetics and pharmacodynamics of the compound within the organism. Some compounds will be effluxed back out of absorbing cells by transporter proteins, one of nature's defense mechanisms (5). Some may be assisted into the cells by influx transporters, particularly if such transporters perceive them as nutrients (6). Some will be metabolized either in the absorbing cells or in other tissues within the organism. The resulting metabolite(s) may be therapeutic, toxic, or benign. These metabolites may be easily cleared from the organism, or they may become bound to various tissues and clear very slowly. Toxicological effects can arise from one or more metabolites in addition to, or in the absence of, toxicological effects of the original molecule that was absorbed.
Organisms absorb exogenous molecules through a variety of pathways. Some are ingested with food or liquids and are exposed to gastrointestinal tissues. Some are inhaled and are exposed to the tissues in the respiratory system. Some come in contact with the external surfaces of the organism, typically skin, eyes, and hair.
The rate and extent of absorption of molecules that come into contact with an organism depends on a number of factors that affect absorption through complex interactions. Typically, molecules must be in solution to be absorbed; however, endocytosis can result in very small solid particles being taken into some cells (7). When absorption requires molecules in solution, the solubility of the substance in the medium outside the membrane bilayer (intestinal fluids, mucus layer in the respiratory tract, tear layer in the eyes, etc.) will determine the concentration gradient across that first exposed bilayer, and hence the rate of absorption for passive diffusion.
Ionization of the invading molecules (when they are ionizable) can play an important role in the rate and extent of absorption into various tissues. The degree of ionization affects solubility for ionizable compounds, with solubility increasing with percent ionized. Transcellular permeability generally decreases with ionization (8), while observed paracellular permeability of protonated molecules was higher than that of their neutral forms (9). The result is a complex interplay between solubility and permeability that requires mechanistic simulation to solve. This interplay is especially complex for absorption in the gastrointestinal tract, where pH is constantly changing, along with the fluid volume (10), the surface area of the enterocytes (11), and the width of the tight-junction gaps between enterocytes (12, 13). Food effects and biliary secretion at mealtime can change solubility and are often the cause of different absorption rates of the same dosage form of a drug between the fasted and fed states (14). Many toxic substances would also be expected to show different absorption rates under fasted and fed conditions.

Fig. 1. Interacting processes involved in gastrointestinal absorption.
Thus, modeling the absorption of compounds via the various pathways requires knowledge of the physicochemical properties of the molecules, the physiology of the organism, and the biology of the tissues involved. The physical nature of the exposed compound (e.g., solution, powder, suspension), the specific tissues exposed (e.g., skin, nasal/pulmonary passages, gastrointestinal tract, eyes), and the duration of exposure (affected by transit time for some tissues) will determine the rate, and eventually the extent, of absorption into the organism. For some substances, degradation of the invading molecules by chemical or metabolic processes can reduce exposure, so additional equations are required to calculate the degradation rate for such substances.
Thus, dissolution rate, solubility, permeability, transit time, degradation rate, and length of exposure must all be accounted for in an absorption model, and the interactions among these phenomena can be complex (15, 16). For example, a compound that is absorbed quickly produces a sink effect in the donor fluid, reducing the concentration, allowing faster dissolution of any remaining solid particles, and perhaps also reducing the degradation rate. Figure 1 shows the complex interactions involved in the gastrointestinal absorption of an oral dose of a pharmacological
ingredient. These processes are essentially the same for ingested toxicants, with the exception that a dose is not administered in a particular dosage form such as a tablet or capsule; instead, it may be introduced into the organism at a steady rate (e.g., inhalation of toxic fumes) or at multiple times with fixed or random spacing (e.g., via drinking water).
The simplest absorption model takes the form of (1):

dMabs/dt = ka × Msoln   (1)

where Mabs is the amount absorbed, ka is called an absorption rate constant (though it is almost always not constant, but a time-varying coefficient), and Msoln is the mass of compound in solution at any point in time. This function is an oversimplification; a more correct expression for the movement of molecules across a membrane takes the form of (2), which is based on Fick's First Law for passive diffusion across a membrane (17):

dMabs/dt = ka × Vdonor × (Cdonor − Cacceptor)   (2)

where Vdonor is the volume of fluid on the donor side of the membrane, Cdonor is the concentration on the donor side, and Cacceptor is the concentration on the acceptor side of the membrane. Notice that the product Vdonor × Cdonor is equal to Msoln in (1). Notice also that if the two concentration terms become equal, absorption stops. In fact, if Cacceptor becomes greater than Cdonor, the absorption rate would be negative and molecules would move back into the donor fluid. The presence of transporter proteins can override the concentration gradient and result in molecules moving from low concentration to high concentration against the gradient. Modern simulation software includes the ability to model the effects of transporters that both help (influx) and hinder (efflux) the absorption of molecules.
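The gradient-driven behavior of Eq. 2 is easy to demonstrate. The rate coefficient, volume, and concentrations below are hypothetical values chosen only for illustration:

```python
def absorption_rate(ka: float, v_donor: float, c_donor: float, c_acceptor: float) -> float:
    """Eq. 2: dMabs/dt = ka * Vdonor * (Cdonor - Cacceptor).
    Positive values mean net movement into the acceptor side (absorption)."""
    return ka * v_donor * (c_donor - c_acceptor)

KA, V_DONOR = 0.8, 0.25  # hypothetical: 1/h and L

print(absorption_rate(KA, V_DONOR, 2.0, 0.0))  # forward absorption (positive rate)
print(absorption_rate(KA, V_DONOR, 2.0, 2.0))  # equal concentrations: absorption stops
print(absorption_rate(KA, V_DONOR, 1.0, 2.0))  # reversed gradient: drug moves back (negative)
```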
The absorption rate coefficient, ka, in (1) and (2) is influenced by different factors depending on the tissue into which absorption is taking place. The value of ka to input into a simulation can be obtained from in vitro cell culture or artificial membrane experiments, from a variety of animal in situ experiments, or from an in silico prediction based on the molecular structure (18).
The most widely used absorption model for intestinal absorption in the pharmaceutical industry is the one incorporated into GastroPlus and known as the advanced compartmental absorption and transit (ACAT™) model (15). This model is based on the original compartmental absorption and transit (CAT) model developed by Yu (18). Figure 2 shows a diagram of the ACAT model in GastroPlus. A total of nine compartments are used to represent the gastrointestinal tract for humans and animals: the stomach, six small intestine compartments, the caecum, and the colon. Each of these compartments has a mean transit time, pH, permeability,
Fig. 2. Advanced compartmental absorption and transit (ACAT) model in GastroPlus.
concentration of bile salts, and fluid volume based on the physiology of the species being modeled. Each compartment therefore has its own absorption equation, as well as equations for the dissolution of solid particles (which can have various sizes), metabolism in the absorbing cells (enterocytes), degradation of the substance in the intestinal lumen, and transit of absorbed compound through the enterocytes to the portal vein leading to the liver. Each compartment simulates all of the processes illustrated earlier in Fig. 1.
Simulating absorption with this program requires, at a minimum, the following input parameters for the invading molecule:
- Molecular weight
- pKa(s)
- Log P or log D at a specified pH
- Ratio of ionized to un-ionized solubility
- Solubility at a specified pH
- Permeability
- Dose amount
- Dosage form (solution, solid, suspension)
- Particle size(s) if solid particles exist
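As a toy illustration of the compartmental-transit idea described above (not the actual ACAT implementation, which also handles dissolution, pH, fluid volumes, transporters, and gut metabolism), the sketch below moves a normalized dose through a chain of lumen compartments while absorbing from each. All rate constants and the compartment count are hypothetical:

```python
def fraction_absorbed(ka: float, kt: float, n: int = 7,
                      t_end: float = 48.0, dt: float = 1e-3) -> float:
    """Toy transit-and-absorption model: the dose moves through n lumen
    compartments at transit rate kt (1/h) and is absorbed at rate ka (1/h)
    from each; drug leaving the last compartment counts as unabsorbed."""
    lumen = [1.0] + [0.0] * (n - 1)  # dose normalized to 1, all in compartment 1
    absorbed, t = 0.0, 0.0
    while t < t_end:
        new = lumen[:]
        for i, amount in enumerate(lumen):
            new[i] -= (kt + ka) * amount * dt   # leaves by transit and absorption
            absorbed += ka * amount * dt
            if i + 1 < n:
                new[i + 1] += kt * amount * dt  # transits to the next compartment
        lumen = new
        t += dt
    return absorbed

# With first-order steps, Fa tends toward 1 - (kt/(kt+ka))**n as t grows
print(round(fraction_absorbed(ka=0.5, kt=1.0), 3))  # 0.941
```

The closed-form limit follows because each compartment splits its throughput in the fixed proportion ka/(ka + kt) absorbed versus kt/(ka + kt) passed onward.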
When the species to be simulated is selected, built-in values for the numerous physiological and biological properties are automatically invoked to represent the fasted or fed state. To obtain complete plasma concentration–time (Cp–time) predictions, PK parameters are also required as inputs. These can be fitted against observed Cp–time data when such data are available.
Absorption modeling is intimately coupled with pharmacokinetic/toxicokinetic (PK/TK), PBPK, and PD models in today's state-of-the-art software programs. By combining experimental data with in silico predictions for those parameters for which experimental data are not available, simulation programs like GastroPlus can be used not only to predict the absorption of the original (parent) compound but also to predict the rates of formation of metabolites (with known or predicted Vmax and Km), the distribution of parent and metabolites into various tissues (when metabolite structures are known or when metabolite concentration–time data are available), the clearance of parent and metabolites from the organism, and the pharmacodynamic (including toxicological) effects of parent and metabolites. The ability to predict specific metabolite PK and PD requires knowing the molecular structures of the metabolites as well as their specific metabolizing enzymes.
Application areas for the techniques
Mechanistic simulation is the only way to quantitatively assess
the complex interactions among the various mechanisms involved
in absorption and eventual toxicity. A well-constructed simulation
allows us to learn things we don’t know from things that we do
know. Simulation also affords the opportunity to test the sensitiv-
ities of the factors that affect absorption and toxicity, including
variations in exposure (dose), dosage form, physiological factors,
environment (e.g., wind velocity for inhaled compounds), and
biology (e.g., changes in expression levels for transporters and
enzymes in various tissues, which can change with age, gender,
ethnicity, and disease state).
Applications for absorption modeling and simulation in
computational toxicology can include the study of all forms of
toxic exposure, including, but not limited to, natural causes (solar
exposure, water and air pollution as a result of natural events) (19),
industrial chemicals (waste products, process materials, lubricants,
high pressure fluids, coatings, etc.) (20, 21), vehicle exhaust pro-
ducts (aircraft, boats and ships, cars, and trucks), home products
(cleaning supplies, automotive supplies, lawn and plant supplies,
pet supplies, cosmetics) (22), agricultural chemicals (fertilizers,
pesticides) (23), and pharmaceutical products (prescription and
over-the-counter medications) (24).
How, when, and by whom these techniques or tools are used in
practice
Absorption simulation and modeling tools for computational
toxicology are used by environmental toxicologists, academic
researchers, chemical industry scientists, clinical pharmacologists,
and others to:
13 Modeling of Absorption 319

- Estimate pharmacokinetic effects (maximum concentration in plasma and various tissues) from potential exposure to toxic materials, prior to actual exposure in individuals and populations (25, 26).
- Estimate safe levels of exposure to toxic substances (27, 28).
- Fit models that explain observed effects in subjects actually exposed to toxic materials (29, 30).
- Use those models to anticipate actions needed to protect against further damaging exposure.

2. Materials

Simulation computer programs have been developed that employ mechanistic models that account for the interactions identified
above through a series of differential equations that are integrated
forward in time to provide complete absorption-versus-time pro-
files. These programs are commercially available from several
sources, with the dominant ones used in the pharmaceutical indus-
try listed below.
- GastroPlus™ (http://www.simulations-plus.com).
- PK-Sim™ (http://www.systems-biology.com).
- SimCYP™ (http://www.simcyp.com).
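The differential-equation core of such programs can be illustrated with a deliberately minimal sketch: a stomach → intestine → plasma chain integrated forward in time with an explicit Euler scheme. All rate constants, the dose, and the volume of distribution below are arbitrary illustrative values, not defaults from any of the packages listed above, and a real simulator would solve hundreds of such equations with a proper stiff integrator.

```python
def simulate(dose_mg=100.0, kt=3.0, ka=2.0, ke=0.7,
             vd_l=40.0, t_end_h=12.0, dt=0.001):
    """Explicit-Euler integration of a minimal catenary model:
    stomach --kt--> intestine --ka--> plasma --ke--> eliminated.
    kt: gastric emptying rate (1/h); ka: absorption rate (1/h);
    ke: elimination rate (1/h); vd_l: volume of distribution (L)."""
    a_stomach, a_gut, a_plasma = dose_mg, 0.0, 0.0
    profile = []  # (time in h, plasma concentration in mg/L)
    t = 0.0
    for _ in range(int(t_end_h / dt)):
        # evaluate all mass balances at the current state, then step
        d_stomach = -kt * a_stomach
        d_gut = kt * a_stomach - ka * a_gut
        d_plasma = ka * a_gut - ke * a_plasma
        a_stomach += d_stomach * dt
        a_gut += d_gut * dt
        a_plasma += d_plasma * dt
        t += dt
        profile.append((t, a_plasma / vd_l))
    return profile

profile = simulate()
cmax = max(c for _, c in profile)
tmax = max(profile, key=lambda p: p[1])[0]
```

Even this toy model reproduces the qualitative Cp–time shape (rise to a Cmax, then elimination-dominated decline) discussed throughout this chapter.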
All of these programs are available on Microsoft Windows®
platforms and should be able to run on any current version of the
Windows operating system for standalone or server-based installa-
tions. GastroPlus, PK-Sim, and SimCYP are broad-based programs
that offer built-in capabilities that include physiologically based
pharmacokinetics (PBPK), drug–drug interactions (which can also
be used for toxicant interactions with other molecules, including
both drugs and other toxicants), stochastic simulations of variabil-
ities expected in target populations, and model-fitting capabilities.
The solution of as many as hundreds of differential equations over a
typical exposure period is accomplished within seconds on modern
personal computers, allowing scientists to quickly explore various
“what if” scenarios as they attempt to explain observed behaviors
and to predict outcomes for new conditions. To illustrate the
importance of computer speed: a typical laptop computer in the year 2010 is roughly 15,000 times faster than the original IBM PC of 1981. Thus, a simulation that takes 10 s on the modern laptop would take on the order of 150,000 s, or approximately 42 h, on the original IBM PC.
Fitting model parameters can require hundreds of such simulations,
perhaps requiring an hour on the modern laptop (1.7 years on the
1981 computer). The tremendous improvement in speed and
memory in today’s inexpensive computers has made possible the
sophisticated absorption/PK/PD software that is in common use
today. One can only speculate how this will change in yet another
30 years.
For the greatest utility, absorption simulation software should
be available for the most popular operating systems, especially
Windows. It should provide accurate results using an appropriate
level of detail in mechanistic models within the limitations of the
state-of-the-art for such models and the required input parameters,
i.e., limitations of in vitro and in vivo data available from which
simulation parameters can be estimated. Values for critical model
parameters should be built-in where possible. Software should run
with reasonable speed and should incorporate built-in error check-
ing that traps likely user mistakes and provides guidance for cor-
recting them. Tutorials should be comprehensive and should use
real-world data for examples. The software vendor should provide
strong technical support with reasonable direct access to the actual
software development science/engineering team.

3. Methods

During program development, complex simulation programs are tested repeatedly by running a variety of examples with known
outcomes to ensure that the simulations duplicate observed results.
Developers learn from outliers—situations that are not well-
predicted at a certain stage of development and require more
sophisticated mechanistic models and/or additional parameteriza-
tion for proper simulation of those situations. With the continuous
expansion of knowledge about physiology and biology, as well as
the processes relevant in absorption and distribution of xenobiotics
in the body, the models (especially the physiologically based ones)
are improving with each newer version of these programs. Developers as well as end users need to check the reproducibility of previous simulation results under the new assumptions; developers should ensure that results from previous versions of the programs can be reproduced. Where such repeatability is not possible, developers should clearly explain the changes and what differences the end user should expect. In any case, they should provide guidance on what changes end users should make to their previous models to account for the latest body of knowledge.
Supporting data used during development and validation come
from the scientific literature (including scientific posters and pre-
sentations at related meetings), the developer’s own experimental
data, and from in silico predictions when no experimental data are
available. In addition to being one of the most important parts of developing new modeling capabilities, this data gathering is often the most tedious, requiring the collection, reading, and filtering of hundreds of
sometimes conflicting reports. Typical concerns with respect to
literature data include, for example, consistency of the values
reported by different labs, relevance of the in vitro experiment
design to the in vivo situation (e.g., media selection in solubility
or dissolution rate measurements), whether observations have been
corrected for unbound fraction in cases where significant binding
to different components in the in vitro assay may affect the results
(e.g., in vitro measurement of clearance), and whether separate data for different enantiomers are available. However, the gathering of
supporting data is not limited to simply obtaining the values for
obvious processes that would be affecting the absorption of the
compound (e.g., solubility or permeability), but also gathering
information about additional mechanisms that might play a role
in a compound’s absorption (e.g., designing an in vitro experiment
to evaluate whether the compound might be a substrate for trans-
porters).

3.1. Fitting Model Parameters

Regardless of the quantity and quality of data available during software development, fitting of one or more model parameters will be required in some, if not most, instances. Absorption simulation software should provide for robust numerical optimization of such parameters from as many types of experimental data as possible. The user
should be provided a choice of optimization algorithms, objective
functions, and weighting functions and should be able to set con-
straints on both fitted (optimized) parameters and on various results
of the absorption simulation, such as maximum concentration in
plasma (Cmax) or other tissues, time of maximum concentration
(Tmax), area under the plasma concentration–time curve (AUC),
fraction absorbed, and fraction bioavailable.
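As a toy illustration of such a fit, the sketch below recovers the absorption and elimination rate constants of a one-compartment oral model from concentration–time data by a brute-force grid search with 1/y weighting. The model, the data, and the weighting scheme are all invented for illustration; a commercial package would offer proper optimizers, objective functions, and constraints as described above.

```python
import math

def conc(t, dose_over_v, ka, ke):
    """One-compartment oral model with first-order absorption."""
    return dose_over_v * ka / (ka - ke) * (math.exp(-ke * t) - math.exp(-ka * t))

# Illustrative "observed" (time h, concentration mg/L) pairs.
obs = [(0.5, 1.43), (1.0, 1.73), (2.0, 1.45), (4.0, 0.75), (8.0, 0.18)]

def objective(ka, ke, dose_over_v=2.5):
    # 1/y weighting gives the low-concentration tail more influence
    return sum((conc(t, dose_over_v, ka, ke) - c) ** 2 / c for t, c in obs)

best = None
for ka in [0.5 + 0.05 * i for i in range(70)]:
    for ke in [0.05 + 0.01 * j for j in range(60)]:
        if abs(ka - ke) < 1e-6:
            continue  # model is singular when ka == ke
        sse = objective(ka, ke)
        if best is None or sse < best[0]:
            best = (sse, ka, ke)

sse, ka_fit, ke_fit = best
```

With these data the search lands near ka = 2.0 h⁻¹ and ke = 0.35 h⁻¹; in practice a gradient-based or simplex optimizer with bounds replaces the grid.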
Scaling data from in vitro experiments or in vivo measurements
in different species to human is a frequent challenge. For example,
permeability in the gastrointestinal tract of rat or dog can be
significantly different than that in human, even for compounds
that obey simple passive diffusion (i.e., no transporters involved).
Rat permeabilities tend to be around 3–4 times lower than human but correlate very well after scaling, while dog permeabilities tend to be around four times higher. These are gross approximations, however, and the actual reported ranges are much wider, with rat permeabilities as much as 15 times lower than human
(31). It is also important to keep in mind the relevance of in vitro
values to the in vivo situation. For example, aqueous solubility will generally not adequately describe the dissolution of a compound in the gastrointestinal tract (14). The concentration of bile salts in vivo
needs to be accounted for (32), together with the fact that it
changes in different regions of the gastrointestinal tract as well as
in relation to food intake (33). Of course, the effect of bile salts on
solubility will not be relevant for ocular or pulmonary exposure
where different types and amounts of surfactant may aid dissolution
(34, 35).

3.2. Conducting Simulations and Validating Results

Conducting absorption simulations involves a series of steps:
- Gathering the data to be used as inputs and observations and, if necessary, converting the values to units needed by the simulation software. Some programs provide built-in conversion tools that will accept data in a variety of commonly used units and convert them to units needed by the software.
- Examining the available data, both input parameters and expected outputs (observations), for likely challenges in building predictive models:
  – Incomplete data: missing values, or aggregate rather than individual subject data.
  – Anomalies that make the data appear inconsistent.
  – Unknowns within the data (e.g., were in vitro results corrected for unbound fraction?).
  – Recognizable behaviors in the observations that provide clues for using various model options (e.g., double peaks in plasma concentration–time data might indicate mealtimes or enterohepatic circulation, and a significant delay after dose/exposure before a noticeable concentration of the molecule of interest is detected might indicate delayed gastric emptying when the exposure is by the oral route).
- If multiple data sets are available, dividing the available experimental data into two groups: one used to fit relevant model parameters for the simulations and one used to test those models for generality.
- Running simulations with default parameter values to see if the simulated results are reasonably close to observations.
- Fitting model parameters as required to improve the ability of the absorption and pharmacokinetic models to predict observed concentration–time data. A very important part of this step is the selection of the parameters to be fitted. It is easy to match the observed concentration–time data by fitting many parameters, but one needs to keep in mind the final interpretability of the model. Only the parameters that are relevant to the mechanism of absorption for each particular compound should be fitted. One should also keep in mind the presence of measurement errors in each set of experimental data, including errors in sampling as well as in analytical measurement. Models can account for variability among subjects, but it is unreasonable to expect a model developed to describe the compound’s absorption to account
also for experimental errors.

Fig. 3. Example of individual double peaks masked in mean data.

For example, the times at which
samples are taken are typically reported in round numbers of
minutes or hours, yet it is highly unlikely that multiple subjects
were all sampled at exactly 5, 10, 15, and 30 min. The software
has no way to correct these errors, and so the inputs are
accepted as correct. Concern over the accuracies and variances
of measured concentrations is typical, but concern over var-
iances in sampling times at which the samples were taken is
unusual, even though such variances add to the total variances
observed for data among multiple subjects. Note, however,
that simulation software can estimate variabilities in a popula-
tion that are caused by known differences in physiology among
subjects, known variations in the dosage form, and even esti-
mated variabilities from measurement errors if the user can
provide an estimate of those errors.
- Testing fitted models with additional data not used in the fitting process to assess whether the model will generalize to other conditions.
- Using validated models to predict outcomes from new dose/exposure conditions.
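One way to make the test-for-generality step quantitative (offered here as an illustrative convention, not something prescribed by the packages discussed above) is to compare predicted and observed exposure metrics such as AUC or Cmax on the held-out data, using the absolute average fold error (AAFE) and the fraction of predictions falling within twofold of observation:

```python
import math

def aafe(pairs):
    """Absolute average fold error: 10 ** mean(|log10(predicted/observed)|).
    1.0 is a perfect prediction; < 2 is a commonly used acceptance bar."""
    return 10 ** (sum(abs(math.log10(p / o)) for p, o in pairs) / len(pairs))

def fraction_within_twofold(pairs):
    """Fraction of predictions within a factor of 2 of the observation."""
    return sum(1 for p, o in pairs if 0.5 <= p / o <= 2.0) / len(pairs)

# (predicted, observed) AUC values for held-out test data (made-up numbers)
test_auc = [(12.0, 10.0), (3.1, 4.0), (55.0, 20.0), (0.9, 1.0)]

fold_error = aafe(test_auc)
coverage = fraction_within_twofold(test_auc)
```

A model that fits the training data well but shows a large AAFE on the held-out data has probably been over-fitted, which connects directly to the parameter-selection caution above.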
We advocate obtaining individual subject data rather than only mean or median data. Aggregated data can both hide real behaviors, such as double peaks in plasma concentration–time observations, and falsely suggest such behaviors where none exist in individuals. Figure 3 shows an example of double peaks that exist in individual subjects but that would be hidden if mean data were used.
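The converse artifact, a spurious double peak manufactured by averaging, is just as easy to reproduce numerically. In this sketch (purely synthetic curves, not the data of Fig. 3), two subjects each have a single smooth peak at different Tmax values, yet their mean shows two local maxima:

```python
import math

def single_peak(t, tmax, width=0.5):
    """A smooth, single-peaked concentration profile (illustrative shape)."""
    return math.exp(-((t - tmax) / width) ** 2)

times = [0.1 * i for i in range(61)]             # 0 to 6 h
subj_a = [single_peak(t, 1.0) for t in times]    # Tmax = 1 h
subj_b = [single_peak(t, 3.0) for t in times]    # Tmax = 3 h
mean_curve = [(a + b) / 2 for a, b in zip(subj_a, subj_b)]

def n_local_maxima(y):
    """Count strict interior local maxima of a sampled curve."""
    return sum(1 for i in range(1, len(y) - 1)
               if y[i - 1] < y[i] > y[i + 1])

n_individual = n_local_maxima(subj_a)   # each subject: one peak
n_mean = n_local_maxima(mean_curve)     # the average: two peaks
</imports```

This is exactly why the shape of an averaged Cp–time curve should never be interpreted mechanistically without inspecting the individual profiles.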
- Interpreting the Results
Often, absorption models will provide results for which the interpretation is unambiguous. Occasionally, however, results will appear anomalous and require interpretation. Such anomalous behaviors can be caused by environmental factors, disease states, different responses in very young or very old subjects, interactions between substances, sleep/wake cycles, and mealtimes in general, as well as specific food effects
(e.g., the grapefruit juice effect on the metabolism of molecules metabolized by the cytochrome P450 3A4 enzyme), as shown in Fig. 4 for the difference in plasma concentration–time for the same 600 mg dose of saquinavir taken without and after grapefruit juice (36).

Fig. 4. Saquinavir plasma concentration–time without and after grapefruit juice.
- Improving the Model
Model quality is directly related to the quality and quantity of experimental data available upon which to build the equations and relationships that mechanistically describe how a system behaves.
Models will typically improve as new data become available. In cases
where the initial model is not able to describe the compound’s
behavior correctly, the model should be used to explore what
additional mechanisms are missing. These may include involvement
of transporters in absorption and their saturation, or possible pre-
cipitation of the compound in the gastrointestinal tract for oral
exposure. Absorption of compound after inhalation exposure may
not be limited to absorption from the respiratory system. The
process of mucociliary clearance followed by swallowing may result
in a significant portion of compound being absorbed from the
gastrointestinal tract (37).

4. Examples

Consider the following relatively simple example from the world of pharmacology. A common drug used to treat hypertension, propranolol (Inderal), has both high permeability into intestinal cells
and high solubility. An oral dose dissolves quickly and is absorbed
quickly into the proximal small intestine. Using the GastroPlus software, the user need only enter values for the dose, permeability, solubility, logP, and molecular weight, and the software, with the default human fasted physiology, will correctly predict rapid absorption of 100 % of the dose. This is all done without any calibration of the model.
Fig. 5. Propranolol absorption–time simulation results.

Figure 5 shows the absorption–time plot predicted for propranolol by GastroPlus.
With the availability of intravenous (iv) data, pharmacokinetic
parameters for a two-compartment PK model can be calculated.
Figure 6 shows the plasma concentration–time plot produced using
the same absorption model from above along with pharmacokinetic
parameters fitted from intravenous data, and first pass extraction
calculated from the difference in AUC/dose between intravenous
and oral administration. Again, no calibration of the model was
required to achieve a nearly perfect match to the observed plasma
concentration–time data.
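The first-pass calculation mentioned above follows directly from the dose-normalized AUC ratio. The sketch below uses the standard linear trapezoidal rule with invented concentration values (not the propranolol data of Fig. 6); with complete absorption assumed, 1 − F approximates the first-pass extraction ratio.

```python
def auc_trapezoid(times, conc):
    """Linear trapezoidal AUC from the first to the last sample."""
    return sum((conc[i] + conc[i + 1]) / 2 * (times[i + 1] - times[i])
               for i in range(len(times) - 1))

# Illustrative profiles (ng/mL vs h), not literature data.
t = [0, 0.5, 1, 2, 4, 8]
c_iv = [120.0, 80.0, 60.0, 35.0, 12.0, 2.0]     # 10 mg iv dose
c_po = [0.0, 95.0, 143.0, 95.0, 35.0, 6.0]      # 80 mg oral dose

auc_iv = auc_trapezoid(t, c_iv)
auc_po = auc_trapezoid(t, c_po)

# Absolute bioavailability from dose-normalized AUCs; with complete
# absorption, 1 - F approximates the first-pass extraction ratio.
f_bio = (auc_po / 80.0) / (auc_iv / 10.0)
extraction = 1.0 - f_bio
```

In practice the AUC would be extrapolated to infinity using the terminal slope, but the dose-normalized ratio logic is unchanged.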
In a recent complex study that remains confidential at the time
of this writing, a large amount of data were available for individual
subjects, and the data showed that gastric emptying times had
unusually high variance across subjects.
In Fig. 7, the unusually high variance in gastric emptying is evident within the same subject on different days. During a Gastro-
Plus simulation study for this data, it was discovered that changing
only gastric emptying time, while fixing the values of all other
model parameters, provided an excellent fit to observed plasma
concentration–time data for numerous subjects. This was an impor-
tant discovery, as it would have been easy to adjust other model
parameters (absorption rates and PK parameters) for different sub-
jects, but doing so would have required different pharmacokinetic
Fig. 6. Propranolol plasma concentration–time simulation results (line = simulation, squares = observations).

Fig. 7. High variability of gastric emptying (same subject, different days).

parameters for each subject—not a desirable way to fit data unless there is strong justification for doing so.
Another example of complex absorption behavior comes from a study of alprazolam-induced variability of gastric emptying in rats (38). In this study, theophylline and alprazolam were
Fig. 8. Plasma concentration–time profiles for alprazolam (ALP) and theophylline (TPL) in four rats from Metsugi. Reprinted
from (38) with permission from Springer.

coadministered. Plasma concentration–time data were obtained for independent intravenous doses and for coadministered oral solution doses. Figure 8 shows the plasma concentration–time profiles
for four different rats following oral administration of alprazolam
(12.5 mg/kg for rats A and B and 25 mg/kg for rats C and D) and
theophylline (5 mg/kg administered as aminophylline for all rats)
from the study. The double peaks in the oral data are evident for
both alprazolam and theophylline in rats A, C, and D, while only
alprazolam was observed to have a double peak in rat B. In order to
match the data, the simulation must encompass the complete pro-
cess of absorption through pharmacokinetics.

5. Notes

Using mechanistic simulation software requires an understanding of the various phenomena that interact and how they are being
simulated. The expert user of such software must be a good gener-
alist who understands each of the contributing areas of science well
enough to communicate with the experts in each area to obtain the
inputs needed for the simulations, and later to communicate the
results back to them. There are many guidelines, best practices, and
caveats, not all of which can be covered here. Advanced hands-on
training workshops are typically several days to a full week long.
Here we offer some suggestions that should help new users to avoid
common pitfalls.
Some of the most common mistakes made with complex simulation software are:
- Failing to step back and look at the big picture before starting to run simulations.
- Failing to have a simulation plan.
- Providing incorrect or incomplete inputs.
- Fitting the wrong model parameters, or too many of them.
- Using oversimplified models that do not capture the actual governing mechanisms.
- Having the wrong person run the simulation studies.
Failing to step back and look at the big picture before starting to run
simulations. Before the simulation program is started up, it pays to
review the data to be analyzed and the goals of the simulation study.
For example, if the data consist of plasma concentration–time
(Cp–time) data for various subjects, various dose levels, and various
formulations, look over the data to see if it appears to be consistent.
Do you have the same information for all individuals (species, age, body weight, gender, etc.)? We find that quickly plotting the data in
a spreadsheet and looking for trends, nonlinearities, or anomalies
across different data sets often reveals clues as to the underlying key
mechanisms affecting absorption and pharmacokinetics. For exam-
ple, if double peaks are observed in mean Cp–time data, are double
peaks seen in the individual data for all or most subjects, or are they
simply due to different Tmax values for single peaks in various
subjects that only give the appearance of double peaks when aver-
aged? Does AUC appear to be dose-proportional across different
doses? If AUC is not dose-proportional, is it increasing or decreas-
ing as the dose is increased? If it’s decreasing, saturation of an
absorption process should be investigated—saturation of solubility
and saturation of an influx transporter system are often observed in
such cases. If AUC/dose is increasing with dose, saturation of an efflux transporter and/or of a first-pass extraction mechanism may be involved. The interaction of absorption and
pharmacokinetics cannot be ignored—both must be simulated
together. Knowing the most likely mechanisms provides guidance
for the types of models to be investigated. Failure to employ a
model with saturable transport or metabolism when they are
involved can lead to much lost time and wasted simulations with a
simpler model that simply cannot explain the data across all doses.
Our rule is: if you have to change the model parameters for different doses, you don’t have the right model; you need one where the
dose amount is automatically accounted for so that the same model
can be used for all doses.
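The dose-proportionality screening described above can be mechanized. The sketch below flags the two saturation patterns just discussed from AUC/dose trends; the 25 % tolerance and all dose/AUC numbers are invented for illustration.

```python
def dose_proportionality(doses, aucs, tol=0.25):
    """Classify AUC/dose behavior across an ascending dose range.
    Falling AUC/dose suggests saturable solubility or influx transport;
    rising AUC/dose suggests saturable efflux or first-pass extraction."""
    ratios = [a / d for d, a in zip(doses, aucs)]
    lo, hi = min(ratios), max(ratios)
    if hi / lo - 1 <= tol:
        return "roughly dose-proportional"
    if ratios[-1] < ratios[0]:
        return "AUC/dose falls with dose"
    return "AUC/dose rises with dose"

flag_linear = dose_proportionality([100, 300, 1000], [50, 148, 510])
flag_saturating = dose_proportionality([100, 300, 1000], [50, 120, 250])
```

Such a check is only a pointer toward candidate mechanisms; the simulation itself must then test which saturable process actually explains all doses with a single parameter set.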
Failing to have a simulation plan. What are the goals of the absorp-
tion/PK simulation study? What are the next decisions that need to
be made to take the project forward? How are absorption/PK
simulation results expected to affect those decisions?
Examples of the purposes for running absorption/PK simulations include:
- Analyzing animal data to assess a drug’s behavior.
- Estimating first dose in human.
- Fitting models to try to understand unusual observations.
- Testing theories to decide what steps to take next in animal or human studies.
- Performing in vitro–in vivo correlations.
Each of these can involve different approaches to the simulation
study. For example, if the goal is to develop an in vitro–in vivo
correlation for a controlled release formulation, the first step should
be to develop the pharmacokinetic and absorption models from
data for iv and immediate release oral doses. Once those model
parameters are set, then the only remaining factor for the con-
trolled release formulation should be how it releases in vivo,
which might be quite different than the dissolution–time data
from an in vitro experiment. With modern IVIVC methods, the
in vivo release can be fitted (“deconvoluted”) to best match the
simulated Cp–time curve to the observed Cp–time data. This
allows direct comparison of the in vivo and in vitro release/disso-
lution–time profiles. A study of this type would have a different
simulation plan than one designed to estimate first dose in human.
Always develop a plan for what the simulation study is intended
to accomplish, and always organize and examine the data prior to
running simulations. It doesn’t take long, and you will save time
and frustration later.
Incorrect or incomplete inputs. Simulation software is not intelli-
gent—it does what you tell it to do. If you input water solubility
at 25 °C, then you are simulating a gastrointestinal tract filled with 25 °C water, so don’t blame the software if the solubility is too low
and absorption is less than observed. If you don’t provide a com-
plete picture of ionization (all pKas), then solubility-versus-pH and logD-versus-pH will be wrong, and your results might differ dramatically from what they should be. Figure 9 shows the difference in
predicted absorption for a low-solubility monoprotic acid using
pKa values of 4, 4.5, and 5—at this pH, even small changes are
critical. If you input Caco-2 permeability as human intestinal
Fig. 9. Effects of small changes in pKa on dissolution and absorption.

permeability without correcting it, you will get very little absorp-
tion, because Caco-2 permeabilities are typically 2–3 orders of
magnitude lower than in vivo human permeability. If in vitro
metabolism measurements are not properly scaled to the simulated
species, body size, and enzyme expression levels, then the software
will simulate something different. The old computer adage “gar-
bage in—garbage out” applies!
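The pKa sensitivity shown in Fig. 9 follows directly from the Henderson–Hasselbalch relation for a monoprotic acid, S(pH) = S0·(1 + 10^(pH − pKa)). With an assumed (purely illustrative) intrinsic solubility, shifting the pKa by half a unit changes the predicted solubility at pH 6 by roughly threefold:

```python
def acid_solubility(ph, pka, s_intrinsic):
    """Total solubility of a monoprotic acid (Henderson-Hasselbalch):
    un-ionized (intrinsic) solubility plus the ionized fraction."""
    return s_intrinsic * (1.0 + 10.0 ** (ph - pka))

s0 = 0.001  # intrinsic solubility in mg/mL (illustrative value only)
solubility_at_ph6 = {pka: acid_solubility(6.0, pka, s0)
                     for pka in (4.0, 4.5, 5.0)}
```

For pKa values of 4, 4.5, and 5 this gives roughly 0.101, 0.033, and 0.011 mg/mL at pH 6, which is why the dissolution and absorption curves in Fig. 9 diverge so strongly.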
Inputs to absorption/PK simulation software should represent
in vivo conditions to the maximum possible extent. This will include,
among others, ionization constants (pKas—ALL of them for multi-
protic compounds!), solubility versus pH in media that best repre-
sent in vivo conditions, log P (or log D at a specified pH—be sure
you understand the difference!), permeability versus region in the
intestinal tract, plasma protein binding, blood–plasma concentration
ratio, enzyme and transporter expression levels in various tissues, and
Vmax and Km values for metabolism and transport. Generally, Vmax
values for transporters will not be available and will need to be fitted;
however, Km values can often be gleaned from in vitro data. For oral
doses, in addition to physiological and physicochemical inputs,
the simulation program will also need an accurate description of
the formulation(s) to be simulated—particle size distributions,
shape factors for active pharmaceutical ingredient (API) particles,
and any excipients that may affect solubility and dissolution.
Fitting the wrong model parameters, or too many of them. Modern
complex mechanistic absorption/PK simulations allow fitting vir-
tually any combination of model parameters to observed data.
Often, many different combinations of fitted parameters can result
in statistically similarly good fits to the data. The modeler should
resist the temptation to force the simulation line through every data
point—it’s not about that! Statistical software may provide empiri-
cal functions that make pretty pictures, but with complex mecha-
nistic simulations, the goal should be to understand the
mechanisms that govern the behavior of the compound, not to
create perfect plots. The smallest number of parameters that achieves an adequate (not perfect) fit should be used.
Using oversimplified models that do not capture the actual governing
mechanisms. It has been our experience that some scientists use very
simple models for all of their work. Simple models can be adequate
for some data sets, but not for all. Using a one-compartment PK
model for a drug that actually undergoes two- or three-
compartment (or more) distribution will lead to false conclusions
and may lead to the wrong project decisions, wasting time and money. Ignoring nonlinearities in absorption and/or PK can result
in estimating incorrect doses for the next trial. Toxicity studies
involve large doses for the animals involved—simulation can pro-
vide an idea of what dose levels achieve maximum exposure (so that
animals are not wasted on higher doses that achieve no more
exposure than lower ones), but only if the model used accounts
for the true exposure–dose relationship.
Consider the behavior of the drug valacyclovir shown in Fig. 10
(39). The Cp–time data in the graph are for doses ranging from 100
to 1,000 mg. It is clear that the 1,000 mg dose does not result in
ten times the Cmax and AUC of the 100 mg dose, so right away we
know that something is getting saturated. Could it be solubility? It
is not likely for these dose levels as the solubility has been measured
at 7 mg/mL at pH 8 and 39 mg/mL at pH 6 (personal communi-
cation, Richard Lloyd, GSK), so in 250 mL of fluid (the usual amount assumed in small intestine simulations), saturation even at pH 8 would require 1,750 mg. In vitro data indicate that valacyclovir
undergoes chemical degradation at pH > 5, with the rate of
degradation increasing with pH, as shown in Fig. 11 (40). In vitro
data also indicate that the molecule is a substrate for the PepT1
oligopeptide influx transporter (41) (in fact, valacyclovir was
designed as a prodrug for acyclovir to take advantage of the
increased permeability using the transporter). Thus, when concen-
trations are high enough to saturate the transporter, gut permeabil-
ity will be a function of concentration, as shown in Fig. 12. Even at
lower concentrations, permeability will be dependent on the
Fig. 10. Nonlinear dose-dependence of valacyclovir. Reprinted from (39) with permission from Macmillan Publishers Ltd.

Fig. 11. Degradation of valacyclovir as a function of pH. Reprinted from (40) with
permission from John Wiley & Sons, Inc.

amount (expression level) of influx transporter(s) in the enterocytes where the drug is in solution at the intestinal wall. In vivo data are
available for both 5 and 10 mg iv doses, and fitting across these two
doses simultaneously using the PKPlus™ module within Gastro-
Plus shows the best fit (based on the lowest Akaike Information
Fig. 12. Concentration dependence of valacyclovir effective permeability (Km = 1.22 mM).
Reprinted from (40) with permission from John Wiley & Sons, Inc.

Fig. 13. One-compartment PK model for 5 and 10 mg valacyclovir iv doses.

Criterion) is a two-compartment PK model, as shown in Figs. 13 and 14. The effects of the aforementioned mechanisms of degrada-
tion and influx transport, as well as the multicompartment model,
could be ignored with a simple linear one-compartment model. But
the model parameters would be adjusted to cover up the inadequa-
cies in the model, and it would only apply to one dose level at a time.
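The carrier-mediated component behaves like a Michaelis–Menten flux, so effective permeability falls with concentration as P(C) = Ppassive + Pcarrier/(1 + C/Km). The Km of 1.22 mM is the value reported in Fig. 12; the passive and carrier magnitudes below are invented for illustration.

```python
def eff_permeability(conc_mM, km_mM=1.22,
                     p_passive=0.2, p_carrier_max=1.5):
    """Effective permeability (arbitrary units) as passive diffusion plus
    a saturable carrier-mediated term. Because the carrier flux is
    Vmax*C/(Km + C), its permeability contribution is Vmax/(Km + C),
    i.e., it shrinks as concentration approaches and exceeds Km."""
    return p_passive + p_carrier_max / (1.0 + conc_mM / km_mM)

low = eff_permeability(0.01)    # dilute: carrier fully available
high = eff_permeability(50.0)   # far above Km: mostly passive only
```

This concentration dependence is why a single fixed permeability value cannot reproduce valacyclovir behavior across the 100–1,000 mg dose range.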
Having the wrong person run the simulation studies. In addition to
how simulation software should be run, there is the question of
Fig. 14. Two-compartment PK model for 5 and 10 mg valacyclovir iv doses.

who should run it. The nature of simulation software is that it is highly multidisciplinary. Thus, there is a need for generalists who
have an aptitude and a desire to learn a variety of areas of science, to
communicate with the specialists in those areas, and to understand
how to integrate their data into a cohesive model and interpret the
results. Academia in the pharmaceutical sciences has not tradition-
ally trained generalists. Instead, students of the pharmaceutical
sciences tend to specialize—focusing narrowly on a particular area
to become expert at it. The generalist is one who can go deep
enough into any of the specialties to know what questions to ask,
how to interpret and apply the data, and who has the insight and critical thinking skills, when things are not going as expected, to know where to look for resolution. Data from in vivo trials often
exhibit behaviors that are not attributable to obvious explanations.
Complex interplay among competing phenomena requires the
generalist to experiment with the absorption/PK/PD simulations
to determine which of multiple possible explanations best explains the data. Often, the best of a series of competing models cannot be
identified, and the generalist should be able to guide experimental-
ists with respect to what experiments are needed to rule out incor-
rect models and home in on the one that truly represents what is
happening in vivo.
To become such a generalist requires training, time, and prac-
tice, and the user’s organization must provide the time and
resources for these to take place. One cannot expect to be expert
at solving complex problems using absorption/PK/PD software if
it is only a small side activity. As with almost any tool, only through
continued use can skills be kept at a high level. Incorrect use of a
tool can cause more damage than good, so if the user is not provided appropriate time and resources to be good at what he/she is expected to do, then the organization should consider outsourcing for the required expertise. We have seen all too often that inexperienced users blame the software when things go awry, only to discover that the inputs did not represent reality, or that the modeling approach was too simplified for the problem at hand.

References

1. Swaan PW, Marks GJ, Ryan FM et al (1994) Determination of transport rates for arginine and acetaminophen in rabbit intestinal tissues in vitro. Pharm Res 11(2):283–287
2. Slattery JT, Levy G (1979) Acetaminophen kinetics in acutely poisoned patients. Clin Pharmacol Ther 25(2):184–195
3. Clements JA, Heading RC, Nimmo WS et al (1978) Kinetics of acetaminophen absorption and gastric emptying in man. Clin Pharmacol Ther 24(4):420–431
4. Hogben CAM, Tocco DJ, Brodie BB et al (1959) On the mechanism of intestinal absorption of drugs. J Pharmacol Exp Ther 125:275–282
5. Tubic M, Wagner D, Spahn-Langguth H et al (2006) In silico modeling of non-linear drug absorption for the P-gp substrate talinolol and of consequences for the resulting pharmacodynamic effect. Pharm Res 23(8):1712–1720
6. Bolger MB, Lukacova V, Woltosz WS (2009) Simulations of the nonlinear dose dependence for substrates of influx and efflux transporters in the human intestine. AAPS J 11(2):353–363
7. Swaan PW (1998) Recent advances in intestinal macromolecular drug delivery via receptor-mediated transport pathways. Pharm Res 15(6):826–834
8. Palm K, Luthman K, Ros J et al (1999) Effect of molecular charge on intestinal epithelial drug transport: pH-dependent transport of cationic drugs. J Pharmacol Exp Ther 291(2):435–443
9. Adson A, Raub TJ, Burton PS et al (1994) Quantitative approaches to delineate paracellular diffusion in cultured epithelial cell monolayers. J Pharm Sci 83(11):1529–1536
10. Schiller C, Frohlich CP, Giessmann T et al (2005) Intestinal fluid volumes and transit of dosage forms as assessed by magnetic resonance imaging. Aliment Pharmacol Ther 22:971–979
11. Wilson JP (1967) Surface area of the small intestine in man. Gut 8:618–621
12. Fordtran JS, Rector FC Jr, Ewton MF et al (1965) Permeability characteristics of the human small intestine. J Clin Invest 44(12):1935–1944
13. Billich CO, Levitan R (1969) Effects of sodium concentration and osmolality on water and electrolyte absorption from the intact human colon. J Clin Invest 48(7):1336–1347
14. Parrott N, Lukacova V, Fraczkiewicz G et al (2009) Predicting pharmacokinetics of drugs using physiologically based modeling: application to food effects. AAPS J 11(1):45–53
15. Agoram B, Woltosz WS, Bolger MB (2001) Predicting the impact of physiological and biochemical processes on oral drug bioavailability. Adv Drug Deliv Rev 50(Suppl 1):S41–S67
16. Bolger MB, Agoram B, Fraczkiewicz R et al (2003) Simulation of absorption, metabolism, and bioavailability. In: Waterbeemd HVD, Lennernäs H, Artursson P (eds) Drug bioavailability. Estimation of solubility, permeability and bioavailability. Wiley, New York
17. Avdeef A (2001) Physicochemical profiling (solubility, permeability and charge state). Curr Top Med Chem 1(4):277–351
18. Yu LX, Amidon GL (1999) A compartmental absorption and transit model for estimating oral drug absorption. Int J Pharm 186(2):119–125
19. Qiu Y, Kuo CH, Zappi ME (2001) Performance and simulation of ozone absorption and reactions in a stirred-tank reactor. Environ Sci Technol 35(1):209–215
20. Bogdanffy MS, Mathison BH, Kuykendall JR et al (1997) Critical factors in assessing risk from exposure to nasal carcinogens. Mutat Res 380(1–2):125–141
21. Fasano WJ, McDougal JN (2008) In vitro dermal absorption rate testing of certain chemicals of interest to the Occupational Safety and Health Administration: summary and evaluation of USEPA's mandated testing. Regul Toxicol Pharmacol 51(2):181–194
22. Nohynek GJ, Dufour EK, Roberts MS (2008) Nanotechnology, cosmetics and the skin: is there a health risk? Skin Pharmacol Physiol 21(3):136–149
23. Mansour SA, Gad MF (2010) Risk assessment of pesticides and heavy metals contaminants in vegetables: a novel bioassay method using Daphnia magna Straus. Food Chem Toxicol 48(1):377–389
24. Bohus E, Coen M, Keun HC et al (2008) Temporal metabonomic modeling of l-arginine-induced exocrine pancreatitis. J Proteome Res 7(10):4435–4445
25. Andersen ME, Krishnan K (1994) Physiologically based pharmacokinetics and cancer risk assessment. Environ Health Perspect 102(Suppl 1):103–108
26. Dobrev ID, Andersen ME, Yang RSH (2002) In silico toxicology: simulating interaction thresholds for human exposure to mixtures of trichloroethylene, tetrachloroethylene, and 1,1,1-trichloroethane. Environ Health Perspect 110:1031–1039
27. Vinegar A, Jepson GW, Cisneros M et al (2000) Setting safe acute exposure limits for halon replacement chemicals using physiologically based pharmacokinetic modeling. Inhal Toxicol 12(8):751–763
28. Rao HV, Ginsberg GL (1997) A physiologically-based pharmacokinetic model assessment of methyl t-butyl ether in groundwater for bathing and showering determination. Risk Anal 17(5):583–598
29. Yang Y, Xu X, Georgopoulos P (2010) A Bayesian population PBPK model for multiroute chloroform exposure. J Expo Sci Environ Epidemiol 20(4):326–341
30. Campbell A (2009) Development of PBPK model of molinate and molinate sulfoxide in rats and humans. Regul Toxicol Pharmacol 53(3):195–204
31. Fagerholm U, Johansson M, Lennernas H (1996) Comparison between permeability coefficients in rat and human jejunum. Pharm Res 13(9):1336–1342
32. Sugano K (2009) Computational oral absorption simulation for low-solubility compounds. Chem Biodivers 6:2014–2029
33. Porter CJH, Trevaskis NL, Charman WN (2007) Lipids and lipid-based formulations: optimizing the oral delivery of lipophilic drugs. Nat Rev Drug Discov 6:231–248
34. Davies NM, Feddah MR (2003) A novel method for assessing dissolution of aerosol inhaler products. Int J Pharm 255(1–2):175–187
35. Son YJ, McConville JT (2009) Development of a standardized dissolution test method for inhaled pharmaceutical formulations. Int J Pharm 2009(382):1–2
36. Kupferschmidt HH, Fattinger KE, Ha HR et al (1998) Grapefruit juice enhances the bioavailability of the HIV protease inhibitor saquinavir in man. Br J Clin Pharmacol 45(4):355–359
37. Chilvers MA, O'Callaghan C (2000) Local mucociliary defence mechanisms. Paediatr Respir Rev 1(1):27–34
38. Metsugi Y, Miyaji Y, Ogawara K et al (2008) Appearance of double peaks in plasma concentration–time profile after oral administration depends on gastric emptying profile and weight function. Pharm Res 25(4):886–895
39. Weller S, Blum MR, Doucette M et al (1993) Pharmacokinetics of the acyclovir pro-drug valacyclovir after escalating single- and multiple-dose administration to normal volunteers. Clin Pharmacol Ther 54(6):595–605
40. Sinko PJ, Balimane PV (1998) Carrier-mediated intestinal absorption of valacyclovir, the L-valyl ester prodrug of acyclovir: 1. Interactions with peptides, organic anions and organic cations in rats. Biopharm Drug Dispos 19:209–217
41. Giacomini KM, Huang SM, Tweedie DJ et al (2010) Membrane transporters in drug development. Nat Rev Drug Discov 9(3):215–236
Chapter 14

Prediction of Pharmacokinetic Parameters


A.K. Madan and Harish Dureja

Abstract
In silico tools specifically developed for prediction of pharmacokinetic parameters are of particular interest
to pharmaceutical industry because of the high potential of discarding inappropriate molecules during
an early stage of drug development itself with consequent saving of vital resources and valuable time.
The ultimate goal of the in silico models of absorption, distribution, metabolism, and excretion (ADME)
properties is the accurate prediction of the in vivo pharmacokinetics of a potential drug molecule in man,
whilst it exists only as a virtual structure. Various types of in silico models developed for successful
prediction of the ADME parameters like oral absorption, bioavailability, plasma protein binding, tissue
distribution, clearance, half-life, etc. have been briefly described in this chapter.

Key words: In silico models, Absorption, Distribution, Metabolism, Excretion, Bioavailability, Protein binding, Classification models, QSAR, Pharmacokinetics

1. Introduction

Drug discovery is a highly complex and expensive endeavor involving seven major steps: disease selection, target hypothesis, lead identification, lead optimization, preclinical trial, clinical trial, and pharmacogenomic optimization. Among the various techniques used to accelerate the drug discovery process, virtual (or in silico) ligand screening based upon the structure of known ligands or on the structure of the receptor is gradually emerging as a method of choice (1). The application of computational methodology during the drug discovery process significantly reduces the number of experimental studies required for compound selection and development and significantly improves the success rate. The in silico
approaches are being widely used today to assess the absorption,
distribution, metabolism, and excretion (ADME) properties of com-
pounds at the early stages of the drug discovery process (2). Study of
ADME profiles is being widely used in drug discovery to understand

Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929, DOI 10.1007/978-1-62703-050-2_14, © Springer Science+Business Media, LLC 2012


Fig. 1. Reasons for failure in drug development. Reproduced from ref. 6 with permission
from Elsevier Limited, UK.

the properties necessary to convert lead structures into good medicines (3). Early consideration of ADME properties is also becoming increasingly important due to the implementation of combinatorial chemistry and high-throughput screening, which generate vast numbers of potential lead compounds (4). The FDA has established a Simulation Working Group and information on current views
can easily be found at the Center for Drug Development Science. In
addition, other groups such as ECVAM (The European Centre for
the Validation of Alternative Methods) have identified predictive
pharmacokinetic modeling as a beneficial tool for reduction in the
use of animals in drug discovery and development (5). As a result of
studies in the late 1990s indicating that poor pharmacokinetics and toxicity were major causes of costly late-stage failures in drug development, there is an increasing realization that these areas must be considered as early as possible in the drug discovery process (3).
Historically, inappropriate pharmacokinetic characteristics
(Fig. 1) have been a major cause for the failure of compounds in
the later stages of development (6). This was largely owing to an
inability to rectify poor pharmacokinetic characteristics inherent in
many lead series adopted for lead optimization (7). Poor pharma-
cokinetic properties are mainly responsible for terminating the
development of drug candidates, with huge financial impact on
the cost of R&D in the pharmaceutical industry (8). The failure
rate due to pharmacokinetic problems may be greater than
reported because poor pharmacokinetic properties such as lack of
absorption, rapid metabolism or elimination, or unfavorable distri-
bution may be clinically manifested as a lack of efficacy (5). There is
an ever increasing need for good tools for predicting these proper-
ties to serve dual objectives—first, at the initial design stage of new
compounds and compound libraries so as to minimize the risk of
late-stage attrition; and second, to optimize the screening and
testing by considering only the most promising compounds (9).
The significant failure/attrition rate of drug candidates in later
developmental stages is the key driving force for the development
of in vitro, in vivo, and in silico predictive tools that can eliminate inappropriate compounds before substantial time and money are invested in testing (10, 11). Firstly, a wide variety of in vitro assays
have been automated through the use of robotics and miniaturiza-
tion. Secondly, in silico models are being used to facilitate selection
of appropriate assays, as well as selection of subsets of compounds
to go through these screens. Thirdly, predictive models have been
developed that might ultimately become sophisticated and reliable
enough to replace in vitro assays and/or in vivo experiments (9).
Significant advances in automation technology and experimen-
tal ADME/Tox techniques, such as the Caco-2 permeability screen-
ing based on the 3-day Caco-2 culture system, the metabolic stability
screening using microsomes or hepatocytes, and the P450 inhibition
assay, have enabled the assaying of a much larger number of compounds than was possible with traditional strategies (12). The
approach of quickly predicting ADME properties through computational means is especially attractive because experimental ADME testing is enormously expensive and arduous.
Therefore, the use of computational models in the prediction of
ADME parameters has been growing rapidly in the drug discovery
process because of their immense benefits in throughput and early
application in drug design (13). Compared to experimental
approaches, these in silico methods have the distinct advantage that
they do not initially require the compounds to be synthesized and
experimentally tested. Moreover, compound databases can be virtu-
ally screened rapidly in a high-throughput fashion if the calculations
are computationally efficient. Until now, many computational meth-
odologies have already been developed for the ADME/Tox proper-
ties which include aqueous solubility, bioavailability, intestinal
absorption, blood–brain barrier (BBB) penetration, drug–drug
interactions, transporter, plasma–protein binding, and toxicity
(9, 11). The field of computational ADME has witnessed significant advances since the early 1970s (14). In silico predictive methods are
significantly influenced by the quality, quantity, sources, and gener-
ation of the measured data available for model development (15).
The early assessment of ADME characteristics will definitely
help pharmaceutical scientists to select the best candidates for
development as well as to discard those with low probability of
success. The ultimate goal of the in silico models of ADME proper-
ties is the accurate prediction of the in vivo pharmacokinetic of a
potential drug molecule in man, whilst it exists only as a virtual
structure (2). The unexpectedly good predictive power and high throughput of simple computational models render them vital tools in the early screening of drug candidates, whereas laborious cell culture models and animal studies can be beneficial in the
later phases when comprehensive information about the transport
mechanisms is needed (16). There is a gradually emerging consensus that in silico predictions are no less predictive of what occurs in vivo
than are in vitro tests, with the distinct advantage that far less investment in technology, resources, and time is needed (17). An amalgamation of in vitro experiments and in silico modeling will dramatically increase the insight and knowledge about the relevant physiological and pharmacological processes in drug discovery (3). The linkage between data generation and model building has been illustrated in Fig. 2 (14).

Fig. 2. The linkage between the data generation, databases, and model building. Reproduced from ref. 14 with permission from Elsevier Limited, UK.
For data modeling, quantitative structure–property relationship
(QSPR) approaches are generally applied. Based on appropriate
descriptors, QSPR exploiting from simple linear regression to mod-
ern multivariate analysis techniques or machine-learning methods is
now being extensively applied for the analysis of ADME data. Data
modeling can be efficiently applied to a large number of molecules, but requires a significant quantity of high-quality data to derive a relationship between the structures and the modeled property
(11). Quantitative structure–activity relationship (QSAR) model-
ing is a well known and established discipline, where physicochem-
ical and molecular descriptors are correlated with bioassay drug
concentrations eliciting a standard pharmacological response such
as EC50 or IC50. The extension of the QSAR approach to pharma-
cokinetic parameters is similarly referred to as quantitative struc-
ture–pharmacokinetic relationships (QSPKR or QSPkR) modeling.
A relatively less utilized technique, albeit just as empirical as the
QSAR approach, is the direct mapping of molecular descriptors
with the time course of plasma levels following administration of
drug by various routes (18). Pharmacokinetics is the study of the
time course of a drug within the body incorporating the processes
of ADME. Pharmacokinetics can be defined as the study of the time
course of drug and metabolite levels in various fluids, tissues, and excreta of the body, and of the mathematical relationships required to describe them (5). Pharmacokinetic data are highly beneficial in optimizing the dosage form design and establishing the dosage regimen (19).
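As a minimal sketch of the QSPR/QSPkR idea described above, the toy example below fits an ordinary least-squares line relating a single molecular descriptor to a pharmacokinetic response, the simplest of the regression techniques the chapter mentions. The descriptor and response values are invented for illustration.

```python
def fit_simple_qspr(x, y):
    """Ordinary least-squares fit of y = a + b*x for one molecular descriptor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

# Invented training data: descriptor = calculated log P,
# response = a pharmacokinetic property on a log scale.
logp = [0.5, 1.0, 1.8, 2.4, 3.1]
response = [1.10, 1.32, 1.66, 1.92, 2.21]

a, b = fit_simple_qspr(logp, response)
predicted = a + b * 2.0  # prediction for a new virtual compound with log P = 2.0
```

Real QSPkR models replace the single descriptor with many (and linear regression with multivariate or machine-learning methods), but the workflow, fit on measured compounds and predict for virtual ones, is the same.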
Therefore, ADME parameters are of utmost importance and
numerous computational approaches of diverse nature have been
developed for these parameters, which include bioavailability,
human intestinal absorption, permeability, BBB penetration, half-
life, volume of distribution, metabolism and clearance, etc. The
present chapter deals with the in silico models for prediction of
pharmacokinetic parameters.
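These parameters are interrelated rather than independent: under a standard one-compartment assumption, elimination half-life follows from clearance and volume of distribution through t1/2 = ln(2) * Vd / CL, so a model that predicts any two constrains the third. A quick numeric sketch (illustrative values only):

```python
import math

def elimination_half_life(vd_litres, cl_litres_per_h):
    """t1/2 = ln(2) * Vd / CL (standard one-compartment relationship)."""
    return math.log(2.0) * vd_litres / cl_litres_per_h

# Illustrative values: Vd = 42 L, CL = 7 L/h gives a half-life of about 4.2 h
t_half = elimination_half_life(42.0, 7.0)
```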

2. Materials and Methods

The first phase of ADME computational models began in the
1960s with the development of classical QSAR by Hansch. According to him, quantitative relationships could be developed for the
lipophilicity of the drugs as well as metabolic parameters such as
microsomal hydroxylation, demethylation, CYP450–CYP420 con-
version, and duration of drug action (20). The simplest ADME-
concerned filter for short listing of potential drug molecule may be
“rule of five” proposed by Lipinski et al. in 1997 (21). According to
Lipinski, it is much easier to optimize pharmacokinetic properties
in initial stage and to optimize the receptor binding affinity at a later
stage of the drug discovery process (22). Though the “rule of five” may be too simple an approach, it has definitely stimulated considerable interest in the development of fast, generally applicable filters for ADME. This is now suggested as a “rule of thumb”
or guide rather than a definitive cutoff. However, the “rule of five” and other general rules simply lay down minimum criteria for a molecule to be drug-like. The values of different desired characteristics
proposed by various researchers in drug-like rules have been compiled in Table 1 (21, 23, 24). It is relatively easy for a molecule to fall within the “rule of five”, but there is no certainty that it will lead to a drug. As a matter of fact, 68.7% of the 2.4 million compounds in the Available Chemical Directory (ACD) Screening Database and 55% of the 240,000 compounds in the ACD have no violation of the “rule of five” at all (11). Therefore, much more stringent criteria need to be laid down so as to discriminate drug-like molecules from others.
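The Lipinski thresholds in Table 1 translate directly into a screening filter. The sketch below counts "rule of five" violations from precomputed descriptors; the pass threshold of at most one violation is a common convention, not a prescription from this chapter, and the descriptor values in the examples are illustrative.

```python
def rule_of_five_violations(mol_weight, clogp, h_bond_donors, h_bond_acceptors):
    """Count violations of Lipinski's rule of five (thresholds as in Table 1)."""
    return sum([
        mol_weight > 500,       # molecular weight < 500 Da
        clogp > 5,              # octanol/water partition coefficient < 5
        h_bond_donors > 5,      # hydrogen-bond donors < 5
        h_bond_acceptors > 10,  # hydrogen-bond acceptors < 10
    ])

def passes_rule_of_five(mw, clogp, hbd, hba, max_violations=1):
    """Flag a molecule as drug-like if it has at most max_violations violations."""
    return rule_of_five_violations(mw, clogp, hbd, hba) <= max_violations

# A small, polar, drug-like molecule (illustrative descriptor values)
small_polar_ok = passes_rule_of_five(266.3, 0.2, 3, 4)
# A large, lipophilic molecule violating all four criteria
large_lipophilic_ok = passes_rule_of_five(720.0, 6.5, 6, 12)
```

As the text notes, passing this filter says little by itself; most screening-library compounds pass, so such rules are best used to reject obvious outliers early.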
Various QSAR approaches ranging from simple linear regression
to modern multivariate analysis techniques are now being applied to
the analysis of ADME data. Data mining and machine-learning tech-
niques which were originally developed and used in other fields are
now being successfully used for this purpose. These techniques
Table 1
Drug-like rules proposed by various researchers

Lipinski and coworkers (21). Data set employed for developing drug-like rules: 2,245 drugs from World Drug Index. Molecular weight: <500 Da. Octanol/water partition coefficient: <5 (CLOGP) or 4.15 (MLOGP). Number of hydrogen-bond donors: <5. Number of hydrogen-bond acceptors: <10.

Ghose and coworkers (22). Data set: 7,183 compounds from Comprehensive Medicinal Chemistry database. Molecular weight: 160–480 (average value: 357). Calculated log P (ALOGP): –0.4 to 5.6 (average value: 2.52). Molar refractivity: 40–130 (average value: 97). Total number of atoms: 20–70 (average value: 48).

Wenlock and coworkers (23). Data set: 594 compounds from Physicians Desk Reference 1999. Molecular weight: <473. Calculated log P: <5, or calculated log D7.4: <4.3. Number of hydrogen-bond donors: <4. Number of hydrogen-bond acceptors: <7.

include neural networks, classification and regression tree, self-organizing maps, support vector machine, and recursive partitioning
(9). It is important to select an appropriate statistical and mathemati-
cal tool for the analysis of ADME data. Therefore, the data needs to
be pre-analyzed so that linear versus nonlinear methods are correctly
selected and utilized. Stringent model validation is an integral com-
ponent of the successful development of any statistical/mathematical
model. Without proper validation, the predictive ability of the model
cannot be estimated (25).
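As one concrete form of such validation, the sketch below applies leave-one-out cross-validation to a simple one-descriptor linear model: each compound is held out in turn, the model is refit on the remainder, and the squared prediction error on the held-out compound is accumulated. All data values are invented for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def loo_cv_mse(x, y):
    """Leave-one-out cross-validated mean squared prediction error."""
    errors = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]   # drop compound i from the training set
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        errors.append((y[i] - (a + b * x[i])) ** 2)
    return sum(errors) / len(errors)

# Invented descriptor/property pairs
descriptor = [0.5, 1.0, 1.8, 2.4, 3.1, 3.9]
prop = [1.12, 1.28, 1.71, 1.90, 2.31, 2.58]
cv_error = loo_cv_mse(descriptor, prop)
```

The cross-validated error, unlike the training-set fit, estimates how the model will behave on compounds it has never seen, which is the quantity that matters for prospective ADME prediction.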
Apart from the development of predictions of higher confidence, another major challenge is to develop an in silico ADME/Tox prediction software system and integrate the existing tools into a simplified single, consistent workflow environment. A number of companies active in the field of molecular modeling have now developed software to assist in the estimation of ADME/Tox properties (11). These
software/programs are basically computer simulation models devel-
oped and validated for prediction of ADME outcomes, such as rate
and extent of absorption, using a limited number of in vitro data
inputs. They are advanced compartmental absorption and transit
models, in which physicochemical concepts, such as solubility and
lipophilicity, are more easily incorporated than physiological aspects involving transporters and metabolism (9). Commercially available software/programs used for prediction of ADME properties have been exemplified in Table 2.

2.1. Models for Prediction of ADME Parameters

Various in silico models have been developed for the prediction of pharmacokinetic parameters, which include oral absorption, bioavailability, plasma protein binding (PPB), tissue distribution, clearance, half-life, etc. Some of these predictive models have been exemplified in Table 3.
Absorption: The first step in the drug absorption process is the
disintegration of the tablet or capsule and subsequent dissolution
of the drug. Poor biopharmaceutical properties, i.e., poor aqueous
solubility and slow dissolution rate can lead to poor and delayed oral
absorption and hence low oral bioavailability. Consequently, low
solubility is naturally detrimental to good and complete oral absorp-
tion, and so the early consideration of this property is of great
significance in drug discovery process (9). Prediction of intestinal
absorption is a major goal in the design, optimization, and selection
of suitable candidates for development as oral drugs (28). Most of
the computational approaches currently being used to predict
absorption are based on the assumption that absorption is passive
and can be predicted from various molecular descriptors of the
compound. No account is taken of active transport processes, including both uptake and efflux transporters. Presently, it is not known how many compounds are actually actively transported in the gut (2). Drug absorption following oral administration is
difficult to predict because of complex drug-specific parameters
and physiological processes. These include drug release from the
dosage form and dissolution, aqueous solubility, gastrointestinal
(GI) motility and contents, pH, GI blood flow, active or passive
transport systems, and pre-systemic and first-pass metabolism (18).
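Even the crudest mechanistic simplification of this picture, first-order absorption into a one-compartment body, already reproduces the characteristic rise-and-fall oral concentration profile. The sketch below evaluates the standard Bateman equation with invented parameters; it deliberately ignores the dissolution, GI transit, transporter, and first-pass effects listed above.

```python
import math

def c_oral(t, f, dose, v, ka, ke):
    """Bateman equation: first-order absorption (ka) into a one-compartment
    body with first-order elimination (ke); requires ka != ke."""
    coeff = (f * dose * ka) / (v * (ka - ke))
    return coeff * (math.exp(-ke * t) - math.exp(-ka * t))

# Invented parameters: F = 0.6, dose = 100 mg, V = 40 L, ka = 1.5/h, ke = 0.2/h
profile = [c_oral(t, 0.6, 100.0, 40.0, 1.5, 0.2) for t in range(13)]
t_max = math.log(1.5 / 0.2) / (1.5 - 0.2)  # analytic time of peak, about 1.55 h
```

Each physiological complication the text enumerates (dissolution limits, regional pH, transporters, first-pass loss) perturbs this baseline curve, which is why mechanistic absorption simulators layer those processes on top of a compartmental core.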
Permeability: Human intestinal permeability (important for the
absorption of oral drugs) and BBB permeability (important for
the distribution of CNS active agents and toxicity of non-CNS
drugs) constitute important pharmacokinetic parameters (93).
The hydrogen-bonding capacity of a drug solute is generally recog-
nized as an important determinant of permeability. In order to
penetrate through a membrane, a drug molecule needs to break or rupture hydrogen bonds with its aqueous environment; the more potential hydrogen bonds a molecule has, the greater the bond-breaking cost. As a consequence, high hydrogen-bonding
potential is an unfavorable property that is often related to reduced
permeability and absorption (9). Caco-2 cell permeability data still
constitute a major target property for modeling, in spite of sub-
stantial inter- and even intra-laboratory variability in the data (94).
The BBB represents a significant barrier towards entry of drugs into
Table 2
Some examples of commercially available software/programs for prediction of ADME parameters

Software/program | Company | Website
ACD/ADME Suite | ACD Labs | www.acdlabs.com
C2.ADME | Accelrys | www.accelrys.com
CLOE PK | Cyprotex | www.cyprotex.com
Gastroplus | Simulations Plus | www.simlations-plus.com
KnowIAll ADME/Tox | Bio-Rad Laboratories | www3.bio-rad.com
META | Multicase | www.multicase.com
MetabolExpert | Compudrug | www.compudrug.com
QikProp | Schrodinger | www.schrodinger.com
Volsurf+ | Molecular Discovery | www.moldiscovery.com
MetaCore | GeneGo | www.genego.com
MetaDrug | GeneGo | www.genego.com
Metasite | Molecular Discovery | www.moldiscovery.com
MEXAlert | Compudrug | www.compudrug.com
RetroMEX | Compudrug | www.compudrug.com
Moka | Molecular Discovery | www.moldiscovery.com
iDEA pkEXPRESS | Lion Biosciences; Biowisdom | www.lionbiosciences.com; www.biowisdom.com
MCASE/MC4PC | Multicase | www.multicase.com
METAPC | Multicase | www.multicase.com
ADMET Predictor | Simulations Plus | www.simlations-plus.com
METEOR | LHASA | www.lhasalimited.org
PK Solutions | Summit PK | www.summitpk.com
QMPRPlus | Simulations Plus | www.simlations-plus.com
PK-sim | Bayer Technology Services | www.bayertechnology.com
Table 3
Examples of predictive models for ADME

Pharmacokinetic parameter | Data set | Statistical/modeling technique | References
Oral bioavailability | 188 non-congeneric organic medicinals | Fuzzy adaptive least square | (26)
Intestinal drug absorption | 6 β-adrenoreceptor antagonists | Regression | (27)
Human intestinal absorption | 67 drugs and drug-like compounds (training set); 9 drugs (cross-validation set); and 10 drugs (external cross-validation set) | Artificial neural network | (28)
Tissue distribution | 9 n-alkyl-5-ethyl barbituric acids | Regression | (29)
Human jejunal permeability | 22 structurally diverse compounds (training set) and 34 compounds (external validation set) | Multivariate data analysis | (30)
Intestinal absorption | 20 drugs (training set); 74 drugs (prediction set) | Multilinear regression | (31)
Intestinal membrane permeability | 10 non-peptide endothelin receptor antagonists | Rule of five, molecular mechanics, and quantum mechanics | (32)
Intestinal absorption | 20 molecules | Artificial neural network based correlation (hashkey) model | (33)
Intestinal absorption | 234 compounds | Statistical pattern recognition model | (34)
Oral bioavailability | 591 structurally diverse compounds | Step-wise regression | (35)
Human intestinal absorption | 38 drugs (training set) and 131 drugs (prediction set) | Multilinear regression | (36)
Plasma protein binding | 95 diverse compounds | Genetic function approximation | (37)
Oral bioavailability | 21 drugs and drug candidates | Graphical classification model | (38)
Oral bioavailability | 1,100 drug candidates | Regression | (39)
Clearance, volume of distribution, fractal clearance, and fractal volume | 272 structurally unrelated drugs | Principal component analysis and projection to latent structures | (40)
Human clearance | 68 drugs | Multiple linear regression, partial least squares method, and artificial neural network | (41)
Blood–brain barrier permeability | 324 drugs and drug-like molecules | Neural network and support vector machine | (42)
Metabolic stability | 631 diverse chemicals (training set) and 107 chemicals (validation set) | k-Nearest neighbor | (43)
Human intestinal absorption | 1,000 drug-like compounds | Recursive partitioning analyses | (44)
Aqueous solubility, plasma protein binding, and human volume of distribution at steady state | 202, 226, and 204 compounds (training sets) and 442, 94, and 124 compounds (test sets), respectively | Linear regression analysis | (45)
Clearance (oral) | 87 drugs | Multivariate regression analyses, multiple linear regression, and partial least squares analysis | (46)
Half-life, renal and total body clearance, fraction excreted in urine, volume of distribution, and fraction bound to plasma proteins | 20 cephalosporins | Artificial neural network | (47)
Human intestinal absorption | 82 compounds (training set), 127 drugs (prediction set), and 109 drugs (test set) | Topological substructural approach (TOPS-MODE) | (48)
Clearances, fraction bound to plasma proteins, and volume of distribution | 62 structurally diverse compounds | Artificial neural network | (49)
Blood–brain barrier penetration | 150 chemically diverse compounds | 4D-molecular similarity and cluster analysis | (50)
Oral absorption | 1,260 compounds | Classification regression trees | (51)
Blood–brain barrier penetration | 415 drugs | Logistic regression, linear discriminant analysis, k-nearest neighbor, decision tree, probabilistic neural network, and support vector machine | (52)
Human serum albumin affinity | 37 structurally related interleukin-8 inhibitors | 3D-QSAR | (53)
Metabolism | 42 derivatives | Comparative molecular field analysis | (54)
p-Glycoprotein inhibitors | Series of 1,4-dihydropyridines and pyridines | Forward inclusion coupled with multiple regression analysis and partial least square regression | (55)
Permeability | 20 drugs | Partial least square method | (56)
Steady-state volume of distribution | 199 compounds (human data) and 2,086 compounds (rat data) | Bayesian neural networks, classification and regression trees, and partial least squares | (57)
Oral drug absorption | 22 structurally diverse drugs (training set) and 169 drugs (prediction set) | Partial least square analysis | (58)
Blood–brain barrier permeability | 191 drugs (training set) and 50 drugs (test set) | Artificial neural network | (59)
Renal clearance | 130 diverse compounds (training set) and 20 compounds (test set) | Principal component analysis and partial least squares analysis | (60)
Drug clearance | 398 compounds (training set) and 105 compounds (validation set) | General regression neural network, support vector regression, and k-nearest neighbor | (61)
Blood–brain barrier permeability | 28 structurally diverse compounds (training set); 31 compounds (validation set) and 31 compounds (cross-validation set) | Moving average analysis based classification models | (62, 63)
Human intestinal absorption | 480 structurally diverse drug-like molecules (training set) and 98 molecules (test set) | Support vector machine based classification model | (64)
Human oral intestinal drug absorption | 164 compounds (training set) and 24 compounds (test set) | Membrane-interaction QSAR analysis | (65)
Blood–brain barrier permeability | 136 compounds | Approximate similarity matrices and partial least square | (66)
Plasma protein binding | 686 compounds | Partial least square regression | (67)
Oral bioavailability | 30 compounds | Regression | (68)
Oral bioavailability | 768 chemical compounds | Correlation | (69)
Oral bioavailability | 250 structurally diverse molecules (training set) and 52 molecules (test set) | Hologram-QSAR | (70)
Metabolism, tissue distribution, bioavailability | 50 structurally diverse compounds | Correlation | (71)
Human intestinal absorption | 455 compounds (training set) and 193 compounds (test set) | Genetic function approximation technique | (72)
Blood–brain barrier penetration | 78 compounds (training set) and 25 compounds (test set) | Multilinear regression | (73)
Blood–brain barrier permeability | 159 compounds (training set), 99 drugs (external test set-1), and 267 organic compounds (external test set-2) | k-Nearest neighbors and support vector machine | (74)
Oral clearance | 24 compounds | Allometric approaches | (75)
Plasma protein binding and oral bioavailability | 692 compounds | Support vector machine combined with genetic algorithm | (76)
Half-life, renal, and total 20 Cephalosporins Random forest; decision (77)
body clearance, fraction tree; and moving average
excreted in urine, volume analysis
of distribution, and
fraction bound to plasma
proteins
(continued)
14 Prediction of Pharmacokinetic Parameters 349

Table 3
(continued)

Pharmacokinetic Statistical/modeling
parameter Data set technique References
Tmax 28 Structurally diverse Decision tree and moving (78)
antihistamines average analysis
Blood–brain barrier 280 Compounds Regression (79)
permeability
Buccal permeability 15 Drugs (training set) and Multiple linear regression (80)
13 compounds (test set) and maximum likelihood
estimations
Human volume of 669 Drug compounds Linear and nonlinear (81)
distribution at steady state (training set) and 29 statistical techniques,
compounds (test set) partial least squares, and
random forest
Clearance 20,000 Unique compounds Bayesian classification and (82)
extended connectivity
fingerprints
Hepatic clearance 64 Drugs (training set) and Multilinear regression (83)
22 drugs(test set) analysis
Hepatic clearance 33 Drugs Multilinear regression (84)
analysis
Drug distribution 93 Drugs Partial least square and (85)
artificial neural network
Hepatic metabolic clearance 27,697 Compounds Binary classification model (86)
Hepatic clearance 50 Drugs Multilinear regression (87)
analysis
Bioavailability 75 Compounds Cluster analysis (88)
Blood–brain barrier 193 Compounds Genetic approximation (89)
penetration based regression model
Blood–brain barrier 1,093 Compounds (for BBB) Substructure pattern (90)
penetration and human and 480 compounds (for recognition and support
intestinal absorption HIA) vector machine
Clearance (total) 370 Compounds (training k-Nearest neighbors (91)
set) and 92 compounds
(test set)
Human hepatocyte intrinsic 71 Drugs (training set); 18 Artificial neural network (92)
clearance drugs (test set-1); and 112
drugs (test set-2)
the brain and CNS (7). Brain penetration is of utmost importance for drug candidates where the site of action is within the CNS.
However, BBB penetration needs to be minimized for drug mole-
cules that target peripheral sites to reduce potential central nervous
system-related side effects (95). Therefore, it is essential for the
scientific community to develop models that can easily discriminate
between molecules with high or low BBB permeability.
Bioavailability: Among the pharmacokinetic properties, low and highly variable bioavailability is a major cause of discontinuing further development of a drug. Oral bioavailability of a drug depends on many factors, such as dissolution in the GI tract, intestinal membrane permeation, and intestinal/hepatic first-pass metabolism. The in silico prediction of oral bioavailability may be inspired by Lipinski's "rule of five" (69). Membrane permeation is recognized as a vital requirement for oral bioavailability in the absence of active transport, and failure to achieve it will usually result in reduced oral bioavailability (38). Transporter proteins, especially P-glycoprotein (P-gp), have been studied intensively in recent years because transport proteins are found in most organs involved in the uptake and elimination of endogenous compounds and xenobiotics, including drug molecules (10). In recent years, several prediction models for oral bioavailability based on QSPkR analysis have been reported.
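The "rule of five" mentioned above lends itself to a simple computational filter. The sketch below (plain Python) applies the commonly used interpretation that a compound should violate at most one of the four criteria; the descriptor values in the examples are illustrative, not measured data.

```python
def passes_rule_of_five(mw, logp, h_donors, h_acceptors):
    """Return True if a compound violates at most one of Lipinski's
    'rule of five' criteria (the commonly used interpretation)."""
    violations = sum([
        mw > 500,          # molecular weight over 500 Da
        logp > 5,          # calculated octanol/water logP over 5
        h_donors > 5,      # more than 5 H-bond donors
        h_acceptors > 10,  # more than 10 H-bond acceptors
    ])
    return violations <= 1

# Small, polar drug-like descriptor values (illustrative only)
print(passes_rule_of_five(mw=266.3, logp=0.2, h_donors=3, h_acceptors=4))   # True
# A large, lipophilic compound violating both the MW and logP limits
print(passes_rule_of_five(mw=720.0, logp=6.5, h_donors=2, h_acceptors=8))   # False
```

In practice the four descriptors would be computed from structure with a cheminformatics toolkit rather than supplied by hand.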
Distribution: Tissue distribution is a vital determinant of the phar-
macokinetic profile of a drug molecule. Currently, there are several
techniques available for the prediction of tissue distribution. These
techniques either predict tissue:plasma ratios or the volume of distribution at steady state (2). The extent of distribution is
defined by the volume of distribution, a proportionality factor
relating the plasma level to the total amount of drug in the body.
The volume of distribution, along with the clearance rate, deter-
mines the half-life of a drug and consequently its dosage regimen
(18, 96). Hence, in drug development, the early prediction of both
pharmacokinetic parameters would naturally be highly beneficial.
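The relationship referred to here is the standard one-compartment result t1/2 = ln(2) × V/CL, which makes the dependence of half-life on both parameters explicit. A minimal sketch (the numeric values are illustrative only):

```python
import math

def half_life(volume_of_distribution_l, clearance_l_per_h):
    """Elimination half-life (h) for a one-compartment model:
    t1/2 = ln(2) * V / CL."""
    return math.log(2) * volume_of_distribution_l / clearance_l_per_h

# e.g., V = 42 L and CL = 4.2 L/h give t1/2 = ln(2) * 10 h
t_half = half_life(42.0, 4.2)
print(round(t_half, 2))  # 6.93
```

Doubling V (or halving CL) doubles the half-life, which is why early estimates of both parameters feed directly into dosage regimen design.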
Plasma protein binding (PPB): Drugs frequently exhibit nonspe-
cific binding in plasma and tissues to constituent proteins, such as
albumin (primarily acidic drugs), a1-acid glycoprotein (basic
drugs), and lipoproteins (neutral and basic drugs), as well as
circulating cells including red blood cells and platelets (18).
Drugs exist in equilibrium between their protein-bound and free forms. Because only the unbound (free) drug exerts the intended therapeutic effect, the PPB affinity of a drug is a crucial property (93). For these reasons, the development of in silico models for the prediction of PPB is an active area of predictive ADME/Tox.
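A minimal sketch of why PPB is so influential: for a highly bound drug, a small shift in the bound fraction produces a large relative change in the pharmacologically active free concentration. The function and concentrations below are illustrative only.

```python
def unbound_concentration(total_conc, fraction_bound):
    """Only the unbound drug is pharmacologically active:
    C_free = (1 - fb) * C_total."""
    return (1.0 - fraction_bound) * total_conc

# A 99% bound drug at 10 mg/L total leaves only ~0.1 mg/L free;
# a shift to 98% binding doubles the free concentration.
print(round(unbound_concentration(10.0, 0.99), 3))  # 0.1
print(round(unbound_concentration(10.0, 0.98), 3))  # 0.2
```

This sensitivity is one reason a QSPkR model that predicts fraction bound even approximately can be valuable for flagging drug–drug interaction risk.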
Metabolism: Drug metabolism and clearance also contribute towards the success of a drug, as these properties have a significant impact on bioavailability (oral or intravenous) (97). Chemical transformations of xenobiotics by the liver (and other tissues and organs) are key to the ADME profile. It is extremely difficult to develop models for the prediction of the metabolic fate of drugs because of the complex nature of the biochemical processes involved. Of the two sets of metabolic transformations, oxidation by CYP enzymes is especially critical (93). The rate and extent of metabolism influence clearance, whereas the involvement of particular enzymes may lead to issues relating to the polymorphic nature of some of these enzymes as well as to drug–drug interactions (10). Metabolism, a key factor influencing the excretion of drugs, has long been a target of computational models. Most efforts to date have concentrated on modeling cytochrome P450 (CYP450) mediated metabolic stability and CYP450 inhibition. Other drug metabolic endpoints for which computational models have now been developed include glucuronidation and enzymatic hydrolysis (94).
Excretion: Excretion (clearance) is the process through which the body eliminates xenobiotics (i.e., "foreign" or "extraneous" compounds). Most of the biotransformations of these compounds take place in the liver, although that organ is not the exclusive site of metabolism. Another important route of excretion for a metabolite, as well as for the parent compound, is renal clearance (94). There has been relatively little work on the in silico modeling or prediction of excretion. Although most existing drugs are excreted to a variable extent as unchanged molecules via the kidneys or the bile, for only a few is urinary or biliary excretion a major route of elimination (2). Drug clearance is extremely difficult to correlate with physicochemical properties and molecular descriptors because of the complexity of the biological system, the influence of transporters, and the vast range of sites and mechanisms of biotransformation and elimination (18). The systemic clearance measures the efficiency of removal of a xenobiotic from the body; the hepatic clearance is the systemic clearance of those drugs that are exclusively metabolized by the liver. Apart from the evaluation of bioavailability, hepatic clearance also plays a vital role in the estimation and streamlining of early clinical trial doses/exposures in the clinic (93).
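The chapter does not give equations for hepatic clearance, but a widely used first approximation (not specific to any reference cited here) is the well-stirred liver model, which relates hepatic clearance to hepatic blood flow, fraction unbound, and intrinsic metabolic clearance. The values below are illustrative.

```python
def hepatic_clearance_well_stirred(q_h, fu, cl_int):
    """Well-stirred liver model: CLh = Qh * fu * CLint / (Qh + fu * CLint).
    q_h: hepatic blood flow; fu: fraction unbound in blood;
    cl_int: intrinsic metabolic clearance (all in consistent units, e.g. L/h)."""
    return q_h * fu * cl_int / (q_h + fu * cl_int)

# Illustrative values: Qh ~90 L/h (typical human), fu = 0.1, CLint = 100 L/h
clh = hepatic_clearance_well_stirred(90.0, 0.1, 100.0)
print(round(clh, 1))  # 9.0
```

The model shows why clearance is hard to predict from structure alone: the observable CLh blends a physiological term (Qh), a binding term (fu), and an enzymatic term (CLint), each with its own structural determinants.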

3. Example(s)

3.1. Prediction of Multiple Pharmacokinetic Parameters of Cephalosporins (77)

Multiple pharmacokinetic parameters of cephalosporins, such as t1/2, CL, CLR, fe, V, and fb, were predicted using random forest (RF), decision tree, and moving average analysis. The RFs were grown with the R program (version 2.1.0) using the RF library. The R program, along with the RPART library, was used to grow the decision tree.
Fig. 3. The decision tree for distinguishing low value (A) from high value (B). (1) t1/2; (2) CL; (3) CLR; (4) fe; (5) V; (6) fb
(A1, molecular connectivity topochemical index; A2, eccentric adjacency topochemical index; A5, eccentric connectivity
topochemical index; A11, eccentric adjacency index). Reproduced from ref. 77 with permission from Austrian Pharmacists’
Publishing House, Austria.

To construct a single topological index based model for predicting property/activity based ranges, moving average analysis of correctly predicted compounds was used. RF correctly classified the pharmacokinetic parameters into low and high ranges with up to 95% accuracy. A decision tree was constructed for each pharmacokinetic parameter to determine the importance of the topological indices. The decision tree learned the information from the input data with an accuracy of 95% and correctly predicted the cross-validated (tenfold) data with an accuracy of up to 90%. The classification of these pharmacokinetic parameters using the decision tree is shown in Fig. 3. Three independent moving average based topological models were developed using a single range for simultaneous prediction of multiple pharmacokinetic parameters. The accuracy of classification of the single index based models using moving average analysis varied from 65 to 100% (77).
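The single-index moving average classification used in ref. 77 is not fully specified here; the sketch below is a loose, hypothetical reconstruction of the idea: sort compounds by one topological index, smooth the binary class labels with a moving average, and read off index ranges dominated by each class. The window size, threshold, and data are arbitrary illustrative choices.

```python
def moving_average_ranges(values, labels, window=3, threshold=0.6):
    """Sketch of single-index moving average analysis: sort compounds by a
    topological index, smooth the binary class labels (1 = 'high') with a
    moving average, and report the index range covered by each window."""
    pairs = sorted(zip(values, labels))
    ranges = []
    for i in range(len(pairs) - window + 1):
        chunk = pairs[i:i + window]
        avg = sum(lab for _, lab in chunk) / window
        if avg >= threshold:
            cls = "high"
        elif avg <= 1 - threshold:
            cls = "low"
        else:
            cls = "transitional"
        ranges.append((chunk[0][0], chunk[-1][0], cls))
    return ranges

# Ten hypothetical compounds: low index values tend to pair with label 0 ('low')
idx = [1.2, 1.5, 1.9, 2.3, 2.8, 3.1, 3.6, 4.0, 4.4, 4.9]
lab = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
for lo, hi, cls in moving_average_ranges(idx, lab):
    print(f"{lo:.1f}-{hi:.1f}: {cls}")
```

A new compound would then be classified by locating its index value within the low, high, or transitional ranges derived from the training set.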
3.2. Prediction of Tmax of Antihistamines (78)

Models were developed for prediction of a physicochemical property (octanol/water partition coefficient, log P), a critical pharmacokinetic parameter (time to reach the maximum drug level in the bloodstream, Tmax), and a toxicological property (lethal dose, LD50) of structurally diverse antihistaminic compounds using decision tree and moving average analysis. A decision tree was constructed for each property to determine the significance of the topological descriptors. Single topological descriptor based models were developed using moving average analysis. The tree learned the information from the input data with an accuracy of >94% and subsequently predicted the cross-validated (tenfold) data with an accuracy of up to 71%. The classification of Tmax using the decision tree is shown in Fig. 4. The accuracy of prediction of the single index based models using moving average analysis varied from ~63 to 80% (78).

Fig. 4. A decision tree for distinguishing between low Tmax and high Tmax. A1, molecular connectivity topochemical index;
A2, eccentric adjacency topochemical index; A, low Tmax; B, high Tmax. Reproduced from ref. 78 with permission from
Inderscience Enterprises Limited, Switzerland.
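A two-descriptor tree of the kind shown in Fig. 4 amounts to nested threshold tests. The sketch below mimics that structure only; the split thresholds (a1_cut, a2_cut), the branch directions, and the example inputs are hypothetical placeholders, not the values fitted in ref. 78.

```python
def classify_tmax(a1, a2, a1_cut=2.5, a2_cut=1.8):
    """Schematic two-level decision tree over two topological descriptors:
    a1 ~ molecular connectivity topochemical index,
    a2 ~ eccentric adjacency topochemical index.
    Thresholds and branch directions are hypothetical."""
    if a1 <= a1_cut:
        return "low Tmax"          # first split decides directly
    return "low Tmax" if a2 <= a2_cut else "high Tmax"

print(classify_tmax(2.0, 3.0))  # low Tmax  (first split decides)
print(classify_tmax(3.1, 2.4))  # high Tmax (both thresholds exceeded)
```

In the published models, such thresholds are learned from the training data (e.g., with a recursive partitioning library) rather than set by hand.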
4. Conclusion

In silico tools have not only accelerated the drug discovery process but have also led to significant reductions in time, animal sacrifice, and expenditure. In silico tools developed specifically for the prediction of ADME characteristics are of particular interest to the pharmaceutical industry because of their high potential for discarding inappropriate molecules at an early stage of drug development, with consequent savings of vital resources and valuable time. A well-planned, systematically integrated in silico, in vitro, and in vivo approach can discard inappropriate molecules at an early stage and sharply accelerate the drug discovery process at reduced cost.

References
1. Miteva MA, Violas S, Montes M et al (2006) FAF-drugs: free ADME/tox filtering of compound collections. Nucleic Acids Res 34:W738–W744
2. Boobis A, Gundert-Remy U, Kremers P et al (2002) In silico prediction of ADME and pharmacokinetics: report of an expert meeting organised by COST B15. Eur J Pharm Sci 17:183–193
3. Huisinga W, Telgmann R, Wulkow M (2006) The virtual lab approach to pharmacokinetics: design principles and concepts. Drug Discov Today 11:800–805
4. Hodgson J (2001) ADMET—turning chemicals into drugs. Nat Biotechnol 19:722–726
5. Grass GM, Sinko PJ (2002) Physiologically-based pharmacokinetic simulation modeling. Adv Drug Deliv Rev 54:433–451
6. Kennedy T (1997) Managing the drug discovery/development interface. Drug Disc Today 2:436–444
7. Spalding DJM, Harker AJ, Bayliss MK (2000) Combining high-throughput pharmacokinetic screens at the hits-to-leads stage of drug discovery. Drug Disc Today 5:S70–S76
8. Butina D, Segall MD, Frankcombe K (2002) Predicting ADME properties in silico: methods and models. Drug Disc Today 7:S83–S88
9. van Waterbeemd H, Gifford E (2003) ADMET in silico modelling: towards prediction paradise? Nat Rev Drug Disc 2:192–204
10. Hou T, Xu X (2004) Recent development and application of virtual screening in drug discovery: an overview. Curr Pharm Des 10:1011–1033
11. Hou T, Wang J, Zhang W et al (2006) Recent advances in computational prediction of drug absorption and permeability in drug discovery. Curr Med Chem 13:2653–2667
12. Li AP (2001) Screening for human ADME/Tox drug proteins in drug discovery. Drug Disc Today 6:357–366
13. Paul Y, Dhake AS, Singh B (2009) In silico quantitative structure pharmacokinetic relationship modeling of quinolones: apparent volume of distribution. Asian J Pharm 3:202–207
14. Ekins S, Waller CL, Swaan PW et al (2000) Progress in predicting human ADME parameters in silico. J Pharm Toxicol Methods 44:251–272
15. Goodwin JT, Clark DE (2005) In silico predictions of blood–brain barrier penetration: considerations to "keep in mind". J Pharm Exp Ther 315:477–483
16. Linnankoski J, Ranta V-P, Yliperttula M et al (2008) Passive oral drug absorption can be predicted more reliably by experimental than computational models—fact or myth. Eur J Pharm Sci 34:129–139
17. Modi S (2004) Positioning ADMET in silico tools in drug discovery. Drug Disc Today 9:14–15
18. Mager DE (2006) Quantitative structure–pharmacokinetic/pharmacodynamic relationships. Adv Drug Del Rev 58:1326–1356
19. Gunaratna C (2001) Drug metabolism and pharmacokinetics in drug discovery: a primer for bioanalytical chemists, part II. Curr Sep 19:87–92
20. Hansch C (1972) Quantitative relationships between lipophilic character and drug metabolism. Drug Metab Rev 1:1–14
21. Lipinski CA, Lombardo F, Dominy BW et al (1997) Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev 23:3–25
22. Lipinski CA (2000) Druglike properties and the causes of poor solubility and poor permeability. J Pharm Toxicol Methods 44:235–249
23. Ghose AK, Viswanadhan VN, Wendoloski JJ (1999) A knowledge-based approach in designing combinatorial or medicinal chemistry libraries for drug discovery. 1. A qualitative and quantitative characterization of known drug databases. J Comb Chem 1:55–68
24. Wenlock MC, Austin RP, Barton P, Davis AM, Leeson PD (2003) A comparison of physiochemical property profiles of development and marketed oral drugs. J Med Chem 46:1250–1256
25. Norinder U, Bergström CAS (2006) Prediction of ADMET properties. ChemMedChem 1:920–937
26. Hirono S, Nakagome I, Hirano H et al (1994) Non-congeneric structure–pharmacokinetic property correlation studies using fuzzy adaptive least-squares: oral bioavailability. Biol Pharm Bull 17:306–309
27. Palm K, Luthman K, Ungel AL et al (1996) Correlation of drug absorption with molecular surface properties. J Pharm Sci 85:32–39
28. Wessel MD, Jurs PC, Tolan JW et al (1998) Prediction of human intestinal absorption of drug compounds from molecular structure. J Chem Inf Comput Sci 38:726–735
29. Nestorov I, Aarons L, Rowland M (1998) Quantitative structure-pharmacokinetics relationships II. Mechanistically based model for the relationship between the tissue distribution parameters and the lipophilicity of the compounds. J Pharmacokinet Biopharm 26:521–545
30. Winiwarter S, Bonham NM, Ax F et al (1998) Correlation of human jejunal permeability (in vivo) of drugs with experimentally and theoretically derived parameters. A multivariate data analysis approach. J Med Chem 41:4939–4949
31. Clark DE (1999) Rapid calculation of polar molecular surface area and its application to the prediction of transport phenomena. 1. Prediction of intestinal absorption. J Pharm Sci 88:807–814
32. Stenberg P, Luthman K, Ellens H et al (1999) Prediction of the intestinal absorption of endothelin receptor antagonists using three theoretical methods of increasing complexity. Pharm Res 16:1520–1526
33. Ghuloum AM, Sage CR, Jain AN (1999) Molecular hashkeys: a novel method for molecular characterization and its application for predicting important pharmaceutical properties of molecules. J Med Chem 42:1739–1748
34. Egan WJ, Merz KM, Baldwin JJ (2000) Prediction of drug absorption using multivariate statistics. J Med Chem 43:3867–3877
35. Andrews CW, Bennett L, Yu LX (2000) Predicting human oral bioavailability of a compound: development of a novel quantitative structure–bioavailability relationship. Pharm Res 17:639–644
36. Zhao YH, Le J, Abraham MH et al (2001) Evaluation of human intestinal absorption data and subsequent derivation of a quantitative structure–activity relationship (QSAR) with the Abraham descriptors. J Pharm Sci 90:749–784
37. Colmenarejo G, Alvarez-Pedraglio A, Lavandera JL (2001) Chemoinformatic models to predict binding affinities to human serum albumin. J Med Chem 44:4370–4378
38. Mandagere AK, Thompson TN, Hwang KK (2002) A graphical model for estimating oral bioavailability of drugs in humans and other species from their Caco-2 permeability and in vitro liver enzyme metabolic stability rates. J Med Chem 45:304–311
39. Veber DF, Johnson SR, Cheng H-Y et al (2002) Molecular properties that influence the oral bioavailability of drug candidates. J Med Chem 45:2615–2623
40. Karalis V, Tsantili-Kakoulidou A, Macheras P (2002) Multivariate statistics of disposition pharmacokinetic parameters for structurally unrelated drugs used in therapeutics. Pharm Res 19:1827–1834
41. Wajima T, Fukumura K, Yano Y et al (2002) Prediction of human clearance from animal data and molecular structural parameters using multivariate regression analysis. J Pharm Sci 91:2489–2499
42. Doniger S, Hofmann T, Yeh J (2002) Predicting CNS permeability of drug molecules: comparison of neural network and support vector machine algorithms. J Comput Biol 9:849–864
43. Shen M, Xiao YD, Golbraikh A et al (2003) Development and validation of k-nearest-neighbor QSPR models of metabolic stability of drug candidates. J Med Chem 46:3013–3020
44. Zmuidinavicius D, Didziapetris R, Japertas P et al (2003) Classification structure–activity relations (C-SAR) in prediction of human intestinal absorption. J Pharm Sci 92:621–633
45. Lobell M, Sivarajah V (2003) In silico prediction of aqueous solubility, human plasma protein binding and volume of distribution of compounds from calculated pKa and AlogP98 values. Mol Divers 7:69–87
46. Wajima T, Fukumura K, Yano Y et al (2003) Prediction of human pharmacokinetics from animal data and molecular structural parameters using multivariate regression analysis: oral clearance. J Pharm Sci 92:2427–2440
47. Turner JV, Maddalena DJ, Cutler DJ et al (2003) Multiple pharmacokinetic parameter prediction for a series of cephalosporins. J Pharm Sci 92:552–559
48. Pérez MA, Sanz MB, Torres LR et al (2004) A topological sub-structural approach for predicting human intestinal absorption of drugs. Eur J Med Chem 39:905–916
49. Turner JV, Maddalena DJ, Cutler DJ (2004) Pharmacokinetic parameter prediction from drug structure using artificial neural networks. Int J Pharm 270:209–219
50. Pan D, Iyer M, Liu J et al (2004) Constructing optimum blood brain barrier QSAR models using a combination of 4D-molecular similarity measures and cluster analysis. J Chem Inf Comput Sci 44:2083–2098
51. Bai JP, Utis A, Crippen G et al (2004) Use of classification regression tree in predicting oral absorption in humans. J Chem Inf Comput Sci 44:2061–2069
52. Li H, Yap CW, Ung CY et al (2005) Effect of selection of molecular descriptors on the prediction of blood–brain barrier penetrating and nonpenetrating agents by statistical learning methods. J Chem Inf Model 45:1376–1384
53. Aureli L, Cruciani G, Cesta MC et al (2005) Predicting human serum albumin affinity of interleukin-8 (CXCL8) inhibitors by 3D-QSPR approach. J Med Chem 48:2469–2479
54. Rahnasto M, Raunio H, Poso A et al (2005) Quantitative structure–activity relationship analysis of inhibitors of the nicotine metabolizing CYP2A6 enzyme. J Med Chem 48:440–449
55. Zhou XF, Shao Q, Coburn RA et al (2005) Quantitative structure–activity relationship and quantitative structure–pharmacokinetics relationship of 1,4-dihydropyridines and pyridines as multidrug resistance modulators. Pharm Res 22:1989–1996
56. Jung SJ, Choi SO, Um SY et al (2006) Prediction of the permeability of drugs through study on quantitative structure–permeability relationship. J Pharm Biomed Anal 41:469–475
57. Gleeson MP, Waters NJ, Paine SW et al (2006) In silico human and rat Vss quantitative structure–activity relationship models. J Med Chem 49:1953–1963
58. Linnankoski J, Mäkelä JM, Ranta VP et al (2006) Computational prediction of oral drug absorption based on absorption rate constants in humans. J Med Chem 49:3674–3681
59. Garg P, Verma J (2006) In silico prediction of blood–brain barrier permeability: an artificial neural network model. J Chem Inf Model 46:289–297
60. Doddareddy MR, Cho YS, Koh HY et al (2006) In silico renal clearance model using classical Volsurf approach. J Chem Inf Model 46:1312–1320
61. Yap CW, Li ZR, Chen YZ (2006) Quantitative structure–pharmacokinetic relationships for drug clearance by using statistical learning methods. J Mol Graph Model 24:383–395
62. Dureja H, Madan AK (2006) Topochemical models for the prediction of permeability through blood brain barrier. Int J Pharm 323:27–33
63. Dureja H, Madan AK (2007) Validation of topochemical models for the prediction of permeability through blood brain barrier. Acta Pharm 57:451–467
64. Hou T, Wang J, Li Y (2007) ADME evaluation in drug discovery. 8. The prediction of human intestinal absorption by a support vector machine. J Chem Inf Model 47:2408–2415
65. Iyer M, Tseng YJ, Senese CL et al (2007) Prediction and mechanistic interpretation of human oral drug absorption using MI-QSAR analysis. Mol Pharm 4:218–231
66. Cuadrado MU, Ruiz IL, Gómez-Nieto MA (2007) QSAR models based on isomorphic and nonisomorphic data fusion for predicting the blood brain barrier permeability. J Comput Chem 28:1252–1260
67. Gleeson MP (2007) Plasma protein binding affinity and its relationship to molecular structure: an in-silico analysis. J Med Chem 50:101–112
68. Li C, Liu T, Cui X et al (2007) Development of in vitro pharmacokinetic screens using Caco-2, human hepatocyte, and Caco-2/human hepatocyte hybrid systems for the prediction of oral bioavailability in humans. J Biomol Screen 12:1084–1091
69. Hou T, Wang J, Zhang W et al (2007) ADME evaluation in drug discovery. 6. Can oral bioavailability in humans be effectively predicted by simple molecular property-based rules? J Chem Inf Model 47:460–463
70. Moda TL, Montanari CA, Andricopulo AD (2007) Hologram QSAR model for the prediction of human oral bioavailability. Bioorg Med Chem 15:7738–7745
71. De Buck SS, Sinha VK, Fenu LA et al (2007) The prediction of drug metabolism, tissue distribution, and bioavailability of 50 structurally diverse compounds in rat using mechanism-based absorption, distribution, and metabolism prediction tools. Drug Metab Dispos 35:649–659
72. Hou T, Wang J, Zhang W et al (2007) ADME evaluation in drug discovery. 7. Prediction of oral absorption by correlation and classification. J Chem Inf Model 47:208–218
73. Fu XC, Wang GP, Shan HL et al (2008) Predicting blood–brain barrier penetration from molecular weight and number of polar atoms. Eur J Pharm Biopharm 70:462–466
74. Zhang L, Zhu H, Oprea TI et al (2008) QSAR modeling of the blood–brain barrier permeability for diverse organic compounds. Pharm Res 25:1902–1914
75. Sinha VK, De Buck SS, Fenu LA et al (2008) Predicting oral clearance in humans: how close can we get with allometry? Clin Pharmacokinet 47:35–45
76. Ma CY, Yang SY, Zhang H et al (2008) Prediction models of human plasma protein binding rate and oral bioavailability derived by using GA-CG-SVM method. J Pharm Biomed Anal 47:677–682
77. Dureja H, Gupta S, Madan AK (2008) Topological models for prediction of pharmacokinetic parameters of cephalosporins using random forest, decision tree and moving average analysis. Sci Pharm 76:377–394
78. Dureja H, Gupta S, Madan AK (2009) Decision tree derived topological models for prediction of physico-chemical, pharmacokinetic and toxicological properties of antihistaminic drugs. Int J Comput Biol Drug Des 2:353–370
79. Lanevskij K, Japertas P, Didziapetris R et al (2009) Ionization-specific prediction of blood–brain permeability. J Pharm Sci 98:122–134
80. Kokate A, Li X, Williams PJ et al (2009) In silico prediction of drug permeability across buccal mucosa. Pharm Res 26:1130–1139
81. Berellini G, Springer C, Waters NJ et al (2009) In silico prediction of volume of distribution in human using linear and nonlinear models on a 669 compound data set. J Med Chem 52:4488–4495
82. McIntyre TA, Han C, Davis CB (2009) Prediction of animal clearance using naïve Bayesian classification and extended connectivity fingerprints. Xenobiotica 39:487–494
83. Li H, Sun J, Sui X, Yan Z et al (2009) Structure-based prediction of the nonspecific binding of drugs to hepatic microsomes. AAPS J 11:364–370
84. Emoto C, Murayama N, Rostami-Hodjegan A et al (2009) Utilization of estimated physicochemical properties as an integrated part of predicting hepatic clearance in the early drug-discovery stage: impact of plasma and microsomal binding. Xenobiotica 39:227–235
85. Paixão P, Gouveia LF, Morais JA (2009) Prediction of drug distribution within blood. Eur J Pharm Sci 36:544–554
86. Chang C, Duignan DB, Johnson KD (2009) The development and validation of a computational model to predict rat liver microsomal clearance. J Pharm Sci 98:2857–2867
87. Li H, Sun J, Sui X et al (2009) First-principle, structure-based prediction of hepatic metabolic clearance values in human. Eur J Med Chem 44:1600–1606
88. Grabowski T, Jaroszewski JJ (2009) Bioavailability of veterinary drugs in vivo and in silico. J Vet Pharmacol Ther 32:249–257
89. Fan Y, Unwalla R, Denny RA et al (2010) Insights for predicting blood–brain barrier penetration of CNS targeted molecules using QSPR approaches. J Chem Inf Model 50:1123–1133
90. Shen J, Cheng F, Xu Y et al (2010) Estimation of ADME properties with substructure pattern recognition. J Chem Inf Model 50:1034–1041
91. Yu MJ (2010) Predicting total clearance in humans from chemical structure. J Chem Inf Model 50:1284–1295
92. Paixão P, Gouveia LF, Morais JA (2010) Prediction of the in vitro intrinsic clearance determined in suspensions of human hepatocytes by using artificial neural networks. Eur J Pharm Sci 39:310–321
93. Kharkar PS (2010) Two-dimensional (2D) in silico models for absorption, distribution, metabolism, excretion and toxicity (ADME/T) in drug discovery. Curr Top Med Chem 10:116–126
94. Lombardo F, Gifford E, Shalaeva MY (2003) In silico ADME prediction: data, models, facts and myths. Mini Rev Med Chem 3:861–875
95. Liu X, Tu M, Kelly RS et al (2004) Development of a computational approach to predict blood–brain barrier permeability. Drug Metab Dispos 32:132–139
96. Sui X, Sun J, Wu X et al (2008) Predicting the volume of distribution of drugs in humans. Curr Drug Metab 9:574–580
97. Ekins S, Boulanger B, Swaan PW et al (2002) Towards a new age of virtual ADME/TOX and multidimensional drug discovery. J Comput Aided Mol Des 16:381–401
Chapter 15

Ligand- and Structure-Based Pregnane X Receptor Models


Sandhya Kortagere, Matthew D. Krasowski, and Sean Ekins

Abstract
The human pregnane X receptor (PXR) is a ligand dependent transcription factor that can be activated by
structurally diverse agonists including steroid hormones, bile acids, herbal drugs, and prescription medica-
tions. PXR regulates the transcription of several genes involved in xenobiotic detoxification and apoptosis.
Activation of PXR has the potential to initiate adverse effects by altering drug pharmacokinetics or
perturbing physiological processes. Hence, more reliable prediction of PXR activators would be valuable
for pharmaceutical drug discovery to avoid potential toxic effects. Ligand- and protein structure-based
computational models for PXR activation have been developed in several studies. There has been limited
success with structure-based modeling approaches to predict human PXR activators, which can be
attributed to the large and promiscuous binding site of this protein. Slightly better success has been
ligand-based modeling methods including quantitative structure–activity relationship (QSAR) analysis,
pharmacophore modeling and machine learning that use appropriate descriptors to account for the diversity
of the ligand classes that bind to PXR. These combined computational approaches using molecular shape
information may assist scientists to more confidently identify PXR activators. This chapter reviews the
various ligand and structure based methods undertaken to date and their results.

Key words: Agonists, Alignment methods, Antagonists, Bayesian classification, Docking and Scoring,
Pharmacophore, Pregnane Xenobiotic receptors, QSAR, Support vector machines

1. Introduction

Receptor mediated toxicity has received a lot of attention in the past decade, with several studies commissioned to examine the deleterious
effects of man-made chemicals (1–4). These chemicals include pesti-
cides and other industrial chemicals that may elicit toxic endpoints by
mimicking the action of endogenous hormones and neurotransmit-
ters (3, 5). As a first line of investigation, the endocrine receptors have
been implicated in receptor mediated toxicity although the role of
other receptors and transporters has also been profiled (6, 7). One
such class of endocrine receptors is the nuclear hormone receptors
(NRs), which form the largest superfamily of ligand-dependent tran-
scription factors. NRs are involved in a variety of functions including

Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_15, # Springer Science+Business Media, LLC 2012

359
360 S. Kortagere et al.

Fig. 1. General architecture of the nuclear hormone receptor family. DBD represents the DNA binding domain, LBD
represents the ligand binding domain, and AF2 represents the activation function region that binds to the coactivator.

cell proliferation, differentiation, development, and metabolic
homeostasis (8). Functionally, all NRs transcriptionally regulate
the expression of target genes by the recruitment of coactivators
and corepressors. The general architecture of all NRs consists of
four distinct domains (Fig. 1): (a) an A/B domain present at the
amino terminal region which contains the ligand-independent
AF1 activation domain, (b) a DNA-binding domain (DBD) that
contains two conserved C4-type zinc finger motifs, (c) a highly
flexible hinge region (“D-domain”), and (d) the ligand binding
domain (LBD, also known as the E domain) that connects to the
second activation domain called the AF2 domain at the carboxy
terminal of the protein. The DBD binds to target gene responsive
elements that contain conserved hexameric sequences. The LBDs
among the NRs show much more structural diversity than the
DBDs, which explains the diverse array of ligands that bind to
NRs. Because of their diverse structural and functional properties
and their ability to bind to a broad array of endogenous and
exogenous molecules, NRs are involved in regulating a number
of genes. Among the NR superfamily, the role of estrogen recep-
tors (ERs), androgen receptor (AR), and peroxisome proliferator
activated receptors (PPARs) have been well studied in terms of
receptor mediated toxicity due to their direct role in mediating
hormones (2, 9, 10). Federal agencies have introduced legisla-
tion to test for endocrine-disrupting chemicals that interfere with
these reproductive hormone regulators (11, 12).
The human PXR (hPXR) is a ligand dependent NR that plays a
key role in regulating drug–drug interactions (13–15) by activating
an essential class of cytochrome P450 enzymes that metabolize
endobiotics and xenobiotics. PXR agonists include a wide range
of structurally diverse endogenous and exogenous compounds
such as bile acids, steroid hormones, dietary fat-soluble vitamins,
prescription medications, and herbal drugs, as well as environmen-
tal chemicals such as pesticides, estrogens, and antiestrogens (16).
PXR forms a heterodimer with the retinoid X receptor (RXR)
and a homodimer with itself. The homodimerization interface of
PXR influences the coactivator binding site at AF2 by modulating
long range motions of the LBD (17). These motions are responsi-
ble for the promiscuity of the PXR LBD, which helps it bind
ligands of varying sizes and shapes. The role of PXR in various
15 Ligand- and Structure-Based Pregnane X Receptor Models 361

pathophysiological states indicates that PXR agonists could variably
affect human and animal health. For example, PXR agonists
can impact cholesterol metabolism, cell cycle regulation, the endo-
crine system (18, 19) as well as potentiate the toxicity of other
environmental contaminants as reviewed recently (20). Animal
models may not reliably predict hPXR-related problems due to
the diversity of PXRs across species (11, 14) resulting in differences
in ligand selectivity (21). The identification and characteriza-
tion of hPXR agonists is thus important for human pharmacokinet-
ics and the toxicology of environmental chemicals. We (13–15, 22–28)
and others (29–35) have tried to utilize a number of in silico
methods to understand and characterize the agonist and antagonist
binding modes of hPXR. In this chapter we summarize these
in silico methods and the reader is urged to consult other references
listed for a comprehensive overview of all the methods.

2. Results and Discussion

2.1. Ligand-Based Modeling of PXR

Human PXR agonist pharmacophore models have been shown to
possess hydrophobic, hydrogen bond acceptor and hydrogen bond
donor features (Table 1), consistent with the crystallographic struc-
tures of hPXR ligand–receptor complexes (17, 36–38). Several
predictive computational models for hPXR have been developed
to define key binding features of ligands (27, 28, 39). Most

Table 1
hPXR pharmacophore features and how to avoid being an agonist

Pharmacophore features: Hydrophobic (hy); Hydrogen bond acceptors (hb);
Hydrogen bond donor (occasionally)

How to avoid interaction with the protein: Attaching hydrogen bonding
groups on one of the hydrophobic features, adding larger, more rigid
groups, as well as removing central H-bond acceptors
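The avoidance strategies in Table 1 imply a simple screening rule. The sketch below is entirely hypothetical (both the helper name and the feature counts are invented, and real counts would come from a pharmacophore tool such as Catalyst or MOE); it checks a ligand's feature counts against the typical hPXR agonist profile:

```python
# Hypothetical illustration: screening a ligand's pharmacophore feature
# counts against the typical hPXR agonist profile of 4-5 hydrophobes
# and 1-2 hydrogen-bonding features. Counts are invented values.

def matches_agonist_profile(features):
    """True if the feature counts fit the typical hPXR agonist pattern."""
    hy = features.get("hydrophobe", 0)
    hb = features.get("hb_acceptor", 0) + features.get("hb_donor", 0)
    return 4 <= hy <= 5 and 1 <= hb <= 2

# A steroid-like ligand fits the profile; a heavily hydrogen-bonding,
# less hydrophobic ligand does not (cf. the avoidance strategies above).
steroid_like = {"hydrophobe": 4, "hb_acceptor": 1, "hb_donor": 0}
polar_ligand = {"hydrophobe": 2, "hb_acceptor": 3, "hb_donor": 2}
print(matches_agonist_profile(steroid_like))  # → True
print(matches_agonist_profile(polar_ligand))  # → False
```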
362 S. Kortagere et al.

pharmacophore models feature 4–5 hydrophobic features and at
least 1–2 hydrogen-bonding moieties.
One study has used hPXR activation data for 30 steroidal
compounds (including 9 bile acids) to create a pharmacophore
with four hydrophobic features and one hydrogen bond acceptor
(13). This pharmacophore contained 5α-androstan-3β-ol (EC50
0.8 µM), which contains one hydrogen bond acceptor, indicating,
in contrast to the crystal structure of 17β-estradiol (published
EC50 20 µM) bound to hPXR with two hydrogen bonding
interactions (37), that hydrophobic interactions may be more
important for increased affinity (13). This and other pharmaco-
phores have been used to predict hPXR interactions for antibiotics
(24) which were verified in vitro, suggesting one use for computa-
tional approaches in combination with experimental methods (24).
The original hPXR pharmacophore (Table 1) consisted of 4 hydro-
phobes and a hydrogen bond acceptor feature and was found to
map 5 of the antibiotics, one of which, rifampin (RIF), was originally in the
pharmacophore training set. Nafcillin, dicloxacillin, erythromycin,
and troleandomycin mapped to the widely dispersed features well.
The diverse (n = 31) pharmacophore consisted of 2 hydrophobes,
a hydrogen bond acceptor and a hydrogen bond donor feature and
mapped to 16 of the antibiotics, one of which, rifampin (RIF), was
also in the pharmacophore training set. Tetracycline, sulfisoxazole,
sulfmethazole, troleandomycin, and griseofulvin did not map to the
features. Interestingly nafcillin fit well and dicloxacillin had the
lowest fit value. The steroidal (n = 30) pharmacophore consisted
of 4 hydrophobes and a hydrogen bond acceptor feature and
surprisingly was found to map 4 of the antibiotics, dicloxacillin,
troleandomycin, clindamycin, and griseofulvin. Twelve of the anti-
biotics were also docked (erythromycin failed to dock) into one of
the hPXR crystal structures (Protein Data Bank (www.rcsb.org/
pdb) accession # 1NRL), and were then scored using the docking
program FlexX (40). The lower the FlexX score, the better the
complementarity between ligand and receptor. All the penicillins
and cephems docked and scored well apart from cefuroxime,
tetracycline, doxycycline, and clindamycin which had poorer
scores. Docking and scoring with FlexX scored the penicillins
and cephems similarly and failed to dock the larger molecule
erythromycin. We have also recently suggested that docking methods
may need to be combined with quantitative structure–activity
relationship (QSAR) or other computational methods in order
to improve predictions due to the flexibility of the protein and
large binding site that could accommodate multiple pharmaco-
phores (40).
The pharmacophore models have predominantly used structur-
ally diverse ligands in the training set and have the limitation in
most cases of compiling data from multiple laboratories using
different experimental protocols, ultimately forcing the binary

classification of ligands for the training sets (i.e., activating versus
nonactivating). Others have described a statistical quantitative
structure activity relationship (QSAR) model using VolSurf
descriptors and partial least squares (PLS) for a training set of 33
hPXR ligands which identified hydrogen bond acceptor regions
and amide responsive regions; however, no test set data were
provided (41). A second statistical model using a recursive
partitioning method has been used with 99 hPXR activators and
nonactivators to predict the probability of aprepitant, L-742694,
4-hydroxytamoxifen and artemisinin binding to hPXR (26). Addi-
tional models based on machine learning methods using a set of
hPXR activators and nonactivators (33) displayed overall prediction
accuracies between 72% and ~80%, while an external test set of
known activators had a similar level of accuracy.
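Recursive partitioning of the kind used in these models grows a decision tree by repeatedly picking the descriptor/threshold split that best separates activators from nonactivators. A minimal single-split sketch is shown below, using the Gini impurity as the split criterion; the descriptor values and labels are invented for illustration and are not the published models:

```python
# A minimal recursive-partitioning step: choose the descriptor/threshold
# split minimizing weighted Gini impurity. Toy data only.

def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels):
    """Return the (descriptor index, threshold) pair whose split gives the
    lowest weighted Gini impurity, or None if no split improves on it."""
    best, best_score, n = None, gini(labels), len(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [labels[i] for i, r in enumerate(rows) if r[f] <= t]
            right = [labels[i] for i, r in enumerate(rows) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

# Toy descriptors: (logP, molecular weight); label 1 = hPXR activator.
rows = [(4.2, 480.0), (5.1, 510.0), (1.0, 180.0), (0.5, 150.0)]
labels = [1, 1, 0, 0]
print(best_split(rows, labels))  # → (0, 1.0)
```

A full recursive-partitioning model simply applies `best_split` recursively to each resulting subset until the leaves are pure or too small.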
To date there have been few attempts to build ligand-based
models around a large structurally narrow set of hPXR activators.
The absence of large sets of quantitative data for hPXR agonists
has restricted QSAR models to a relatively small universe of mole-
cules compared to the known drugs, drug-like molecules, endo-
biotics and xenobiotics in general (33). Various machine learning
methods (e.g. support vector machines, recursive partitioning etc.)
that can be used with binary biological data have been applied to
hPXR. We have generated computational models for hPXR using
recursive partitioning (RP), random forest (RF), and support vec-
tor machine (SVM) algorithms with VolSurf descriptors. Following
10-fold randomization the models correctly predicted 82.6–98.9%
of activators and 62.0–88.6% of nonactivators. All models were
validated with a test set (N = 145), and the prediction accuracy
ranged from 63 to 67% overall (25). These test set molecules were
found to cover the same area in a principal component analysis plot
as the training set, suggesting that the predictions were within the
applicability domain. A second study used the same training and
test sets with SVM algorithms with molecular descriptors derived
from two sources, Shape Signatures and the Molecular Operating
Environment (MOE) application software. The overall test set
prediction accuracy for hPXR activators with SVM was 72% to
81% (23).
A substantial amount of experimental hPXR data has recently
been generated for classes of steroidal compounds, namely, andros-
tanes, estratrienes, pregnanes, and bile salts (14) which was then
used with an array of ligand-based computational methods includ-
ing Bayesian modeling with 2D fingerprints methods. (15) All 115
compounds were used to generate a Bayesian classification model
(42) using a definition of active as a compound having an EC50 for
hPXR activation of less than 10 µM. Using FCFP_6 and 8 inter-
pretable descriptors (AlogP, molecular weight, rotatable bonds,
number of rings, number of aromatic rings, hydrogen bond accep-
tor, hydrogen bond donor, and polar surface area) a model was

developed with a receiver operating characteristic for leave-one-out
cross validation of 0.84. In addition to the leave-one-out cross
validation, further validation methods were undertaken. After
leaving 20% of the compounds out 100 times, the ROC was 0.84,
concordance 73.2%, specificity 69.1%, and sensitivity 84.1%.
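Statistics such as the concordance, specificity, and sensitivity quoted above all derive from the 2x2 confusion matrix of predicted versus observed classes. A minimal sketch with hypothetical labels (1 = activator) follows:

```python
# Concordance (accuracy), sensitivity, and specificity from a 2x2
# confusion matrix. The labels below are hypothetical, not study data.

def classification_stats(truth, predicted):
    """Compute concordance, sensitivity, and specificity for 0/1 labels."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    return {
        "concordance": (tp + tn) / len(truth),
        "sensitivity": tp / (tp + fn),  # activators correctly recovered
        "specificity": tn / (tn + fp),  # nonactivators correctly recovered
    }

truth     = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 1, 0]
print(classification_stats(truth, predicted))
# → {'concordance': 0.75, 'sensitivity': 0.75, 'specificity': 0.75}
```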
In comparison to molecular docking methods, ligand based models
performed better in classifying the compounds. The Bayesian
method appeared to have good model statistics for internal cross
validation of steroids. We have additionally used this model to
classify a previously used diverse molecule test set. The Bayesian
hPXR model was used to rank 123 molecules (65 activators and 58
nonactivators). Of the top 30 molecules scored and ranked
with this model, 20 were classified as activators (EC50 < 100
µM) (15). All hPXR positive contributing substructures were
essentially hydrophobic, while hPXR negative contributing features
possessed hydroxyl or other substitutions which are likely not
optimally placed to facilitate interactions with hydrogen bonding
features in the hPXR LBD. Therefore, possession of these hydro-
gen bond acceptor and donor features indicated in the steroidal
substructures appears to be related to loss of hPXR activation.
Among the alignment-based methods, CoMFA and CoMSIA
performed worst in classification models, since they rely on
rigid alignment of molecules while the hPXR binding site is
very flexible. 4D and 5D QSAR methods are based on ensembles of
ligand conformations and seem to perform better within a narrow
chemical space (such as a subclass of steroids), but their performance
is limited when extended to a larger chemical class.
The Bayesian approach using fingerprints and 117 structural
descriptors was also used recently with a large diverse training set
comprising 177 compounds. The classifier was used to screen a
subset of FDA approved drugs followed by testing of a few com-
pounds (17 compounds from the top 25) with a cell-based lucifer-
ase reporter assay for evaluation of chemical-mediated hPXR
activation in HepG2 cells. The reporter assay confirmed nine
drugs as novel hPXR activators: fluticasone, nimodipine, nisoldi-
pine, beclomethasone, finasteride, flunisolide, megestrol, secobar-
bital, and aminoglutethimide (29). Such ligand-based Bayesian
approaches with a diverse training set may be more useful than a
narrow structural series of steroidal compounds that was previously
used for database searching a set of pesticides and other industrial
chemicals (22). These global models for hPXR could certainly be of
value for selecting compounds for in vitro screening. We are not
aware of an exhaustive screening of FDA approved drugs for poten-
tial hPXR agonists and then testing of compounds that score highly.
Such an approach may be useful to understand hPXR mediated
drug–drug interactions more comprehensively.
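The Bayesian classifiers described above learn, per fingerprint feature, how much more often the feature occurs in activators than the baseline rate would predict. Below is a minimal Laplacian-corrected scorer in that spirit (but not identical to the commercial implementations used in the studies); the feature sets and labels are toy values, not the published FCFP_6 data:

```python
import math

# Minimal Laplacian-corrected Bayesian scorer over binary fingerprint
# features. Feature sets and labels are invented for illustration.

def train_bayesian(fingerprints, labels):
    """Per-feature weights: log of the Laplacian-corrected ratio of the
    feature's active rate to the baseline active rate."""
    p_active = sum(labels) / len(labels)
    counts = {}
    for fp, y in zip(fingerprints, labels):
        for f in fp:
            total, active = counts.get(f, (0, 0))
            counts[f] = (total + 1, active + y)
    return {f: math.log((a + 1) / ((n + 1) * p_active))
            for f, (n, a) in counts.items()}

def score(weights, fp):
    """Rank a molecule by summing the weights of its present features."""
    return sum(weights.get(f, 0.0) for f in fp)

mols = [{"hydrophobe", "ring"}, {"hydrophobe"}, {"hydroxyl"}, {"hydroxyl", "ring"}]
labels = [1, 1, 0, 0]  # 1 = hPXR activator
w = train_bayesian(mols, labels)
print(score(w, {"hydrophobe"}) > score(w, {"hydroxyl"}))  # → True
```

Features seen mostly in activators (here "hydrophobe") receive positive weights, consistent with the hydrophobic substructures reported above as positive contributors.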

2.2. Structure-Based Modeling of hPXR

At the time of writing there are five high-resolution
crystal structures of the hPXR LBD in complex with a variety of
ligands (17, 36–38) available in the PDB (and a sixth structure to
be deposited (57)). The structures have provided atomic level
details that have led to a greater understanding of the LBD and
the structural features involved in ligand–receptor interactions (27,
28, 39). The cocrystallized ligands include the natural products
hyperforin (active component of the herbal antidepressant St.
John’s wort) and colupulone (from hops), the steroid 17β-estra-
diol, and the synthetic compounds SR12813, T-0901317, and the
antibiotic rifampicin (Fig. 2). These ligands span a range of
molecular sizes (MW range 272.38–713.81 Da, mean
487.58 ± 147.25 Da) and are predicted to be generally hydrophobic
(calculated ALogP (43) 3.54–10.11, mean 5.54 ± 2.41). The cav-
ernous hPXR ligand binding pocket (LBP) with a volume
>1,350 Å³ accepts molecules of these widely varying dimensions
and chemical properties, and is likely capable of binding small
molecules in multiple orientations (44). This complicates overall
prediction of whether a small molecule is likely to be classified as an
hPXR agonist using traditional structure-based virtual screening
methods like docking that treat the receptor as rigid for purposes
of modeling (25, 45). With regard to this, we have previously
shown that the widely used structure-based docking methods

Fig. 2. Two dimensional structures of ligands that are cocrystallized with hPXR; 444 is T0901317, EST is 17β-estradiol,
COL is colupulone, HYF is hyperforin, RFP is rifampicin, and SRL is SR12813.

FlexX and GOLD performed relatively poorly in predicting hPXR
agonists, which is perhaps not surprising based on the observa-
tions described above.
In a recent paper we docked a series of 115 steroids to the six
hPXR crystal structures using GOLD (15). To assess the diversity
of the binding sites in the six published crystal structures, we super-
imposed the backbones. All six crystal structures superimposed
with a backbone root mean squared deviation of 0.5 Å, suggesting
that they had very similar structures and their cocrystallized ligands
bound to the same binding pocket. The docking scores for all
115 compounds were in the range of 36–77 (higher values are
preferred) for all the crystal structures. In this study we designed
a scoring scheme in which the docked scores were weighted based
on their similarity to either the high affinity ligand (5α-androstan-
3β-ol) or the cocrystal steroid ligand 17β-estradiol (see appendix
for other scoring schemes). The corresponding Tanimoto similarity
scores to 5α-androstan-3β-ol and the crystal ligand 17β-estradiol
using MDL public keys were between 0.4 and 1 (with 1 being
maximal similarity). We also chose an EC50 value of 10 µM as a
cutoff to classify the compounds as hPXR activators and nonacti-
vators. Based on these parameters the overall accuracy in classifica-
tion using similarity weighted GoldScore ranged between 35% and
58%. Although the classification success rates were modest, these
values were entirely dependent on the EC50 value chosen as cutoff.
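The backbone superposition described above reduces to a root mean squared deviation over matched atom pairs. Assuming the coordinates have already been superimposed (e.g., by a least-squares fit in the modeling package), the RMSD itself is simple to compute; the coordinates below are toy values:

```python
import math

# RMSD over matched, pre-superimposed atom coordinates. Toy data only.

def rmsd(coords_a, coords_b):
    """RMSD (in Å) between two equal-length lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must match atom for atom")
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
             for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))

# Two toy backbones offset by 0.5 Å along x at every atom.
a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
b = [(0.5, 0.0, 0.0), (2.0, 0.0, 0.0), (3.5, 0.0, 0.0)]
print(rmsd(a, b))  # → 0.5
```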
The performance of docking based classification for individual
classes of steroids was better than the overall performance. The
classification of activators ranged from 66% to 100%. However,
several of the nonactivators among the estratrienes were classified
as activators because of their high similarity scores
to the crystal structure ligand 17β-estradiol. Thus, similarity-
weighted scoring schemes have to be improved further to avoid this
misclassification. Docking studies provided a number of insights
into the mode of binding of steroids to hPXR. Analysis of the
predicted binding mode of the various steroids showed that they
all form hydrogen bonded interactions with His407 and also main
chain atoms of Leu209 and Val211. The steroid core maintains
hydrophobic interactions with Leu240, Leu239, Ile236, Trp299,
Phe288, Leu411, and Met243 (Fig. 3). In a study by Gao et al.
(30), molecular docking methods were used to derive the SAR of a
known class of hPXR activators. It was found that the activation of
hPXR could be attenuated by adding polar groups to portions
of the ligand that would otherwise form favorable hydrophobic
interactions with the protein. These results correlate well with our
docking studies on other xenobiotics as well (45, 46).
In a recent study we used molecular docking and the similarity
weighted GoldScore approach to classify activators of hPXR using
the ToxCast™ database of pesticides and other industrial chemicals
(22). Each ToxCast™ compound was docked into the 5 published

Fig. 3. Structural comparison of the binding modes of (a) hyperforin and (b) a steroid compound in the hPXR LBD.
The 2D interaction diagram was generated using the LIGX module of the program MOE.

crystallographic structures of hPXR and scored according to the
hybrid scoring scheme, and 11 compounds were selected based on
their consensus docking scores for testing. The results from the
docking studies were also correlated with a Bayesian model to
classify the ToxCastTM compounds into hPXR agonists and non-
agonists. These chosen compounds were tested for hPXR activation
using luciferase-based reporter assays in the HepG2 and DPX-2
human liver cell lines. Among the eleven compounds tested,
six were found to be strong agonists and two had weak agonist
activity. Further, docking-based classification of the entire ToxCast™
dataset consisting of 308 molecules was then compared to the
experimental results. Docking-based classification performed well,
with a prediction sensitivity of ~74% for the entire ToxCast™
dataset published (47–49).
Another significant observation with reference to hPXR
involves the presence of two splice variants of hPXR. A recent
study has compared the splice variant hPXR.2 with hPXR.1, using
homology-based models. It was shown that hPXR.2 lacks the 37
amino acids that make up helix 2, which might explain why
agonists do not activate this splice variant in recombinant expres-
sion systems (24).

2.3. hPXR Antagonists

There have been relatively few attempts to understand or develop
in silico models of antagonism of hPXR (50). One computational
approach focused on the LBD using the crystal structure of hPXR
bound to T-0901317 (37), but this proved difficult (31). The list of
hPXR antagonists is however steadily growing and even includes
some compounds first characterized as weak hPXR agonists (15).
For example, the antagonists ketoconazole (33), fluconazole, and
enilconazole (51) have all been shown to inhibit the activation of

hPXR in the presence of paclitaxel, while behaving as weak agonists
on their own. Computational docking results revealed these hPXR
antagonists partially occupied the same hydrophobic groove where
the coactivator motif binds to the receptor, antagonizing the essential
protein–protein interaction ((13) and references therein). Ketoco-
nazole was shown to inhibit the interaction of hPXR with the
coactivator SRC-1 suggesting binding to the AF-2 site. This
hypothesis was further confirmed with site-directed mutagenesis
data (51), indicating ketoconazole behaved like the histidine resi-
due of SRC-1 (51). Based on the three azole hPXR antagonists, a
pharmacophore model was developed to elucidate the important
structural features for binding. When this model was combined
with docking studies and biological testing, it enabled discovery
of several more potent nonazole hPXR antagonists which included
commercially available synthetic compounds and the FDA
approved prodrug leflunomide, confirmed experimentally in vitro
(14). When ketoconazole was docked into the exterior site, the
piperazine ring was predicted as solvent exposed. The pharmaco-
phore model also indicated the minimum requirements of these
azoles, suggesting the complementary nature of different
computer-aided antagonist design methods (13). The new small
molecule antagonists had good ligand efficiencies (as they were
more potent on a per heavy atom basis) compared with ketocona-
zole when determined using the published approaches (52). This
suggests that smaller molecules with optimal protein–ligand
interactions could also be effective antagonists. It is important to
consider that antagonism of hPXR or other NRs could occur via
interactions with other proteins that interact with hPXR or at other
surface sites beyond those currently known. Novel hPXR antago-
nists provide a possible small molecule intervention to control drug
metabolism and transport by reducing the activation of these genes
during therapeutic treatment.
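The ligand efficiencies used above to compare the new antagonists with ketoconazole normalize potency by molecular size. A common formulation (due to Hopkins and coworkers) is LE ≈ 1.37 · pIC50 / N_heavy, roughly the binding free energy per heavy atom in kcal/mol. The sketch below uses this formulation with illustrative numbers, not measured data from the studies cited:

```python
import math

# Ligand efficiency: potency normalized by heavy atom count.
# IC50 values and atom counts below are invented for illustration.

def ligand_efficiency(ic50_uM, heavy_atoms):
    """Approximate ligand efficiency (kcal/mol per heavy atom),
    LE = 1.37 * pIC50 / N_heavy."""
    pic50 = -math.log10(ic50_uM * 1e-6)  # convert µM to M before the log
    return 1.37 * pic50 / heavy_atoms

# A small 10 µM antagonist (20 heavy atoms) beats a larger 10 µM
# compound (38 heavy atoms) on a per-heavy-atom basis.
print(ligand_efficiency(10.0, 20) > ligand_efficiency(10.0, 38))  # → True
```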

3. Conclusions

Long range motions of the hPXR LBD (17), and the potential for
ligands to bind in multiple conformations within the LBD, may
be responsible for the promiscuity of the hPXR LBD.
Thus, developing computational models to predict the binding
mode and affinity of ligands for hPXR is challenging. In this chapter
we have summarized the utility of ligand based and structure based
models to predict agonists and antagonists of hPXR. Each of these
methods has its inherent advantages and limitations, especially
when used on flexible proteins (22). Among the ligand based
methods, some of them are based on ligand alignment while others
are independent of alignments. The performance of alignment-

dependent methods was limited in their ability to classify activators
of hPXR. In the alignment-independent category, the pharmaco-
phore based models perform well within a given class of com-
pounds by generating pharmacophoric features that are relevant
to that class of molecules. However, their predictive potential
decreases when applied to a diverse set of molecules. Machine
learning methods such as Bayesian modeling and Support vector
machines are also alignment independent methods that use 2D
fingerprints and 2D or 3D molecular descriptors to build the
models. In our studies we have found that they perform much
better than the other in silico methods; however, their applicability
domain depends on the training sets, and consideration of
molecular similarity to molecules in the training set may be important.
Some of the ligand based methods cannot deal with stereoisomers
(unlike pharmacophore based methods) and need large training
sets for improved performance. On the other hand structure
based methods have their advantages of physically docking the
molecules to the binding site and sampling both the ligand and
protein conformational space, however they have their inherent
limitations in scoring the best conformers and are computationally
expensive for very large datasets. The best option, therefore, is to
utilize a combination of ligand- and structure-based methods that
are complementary in their approaches to better handle flexible
and promiscuous proteins such as hPXR. The methods described
here are also generally applicable to other flexible proteins.

Acknowledgments

We would like to thank all our collaborators who have contributed
to our studies on PXR. SK is supported by a Scientist Development
Grant awarded by the American Heart Association.

Appendix

Structure-based Methods. Data collection: A comprehensive hPXR
dataset consisting of a variety of chemical classes, namely, andros-
tanes, pregnanes, estratrienes, bile acids, and regular xenobiotics is
available from previously published studies (53, 54). The Environ-
mental Protection Agency has developed a dataset comprising
everyday chemicals, pesticides, and other related chemicals, aptly
called the ToxCast™ dataset, which is available for use by
researchers (47, 49). For all these datasets, either the direct binding

data to hPXR or relative fold induction changes have been pub-
lished. However, the EC50 cutoff value used for classifying the
molecules into hPXR activators and nonactivators is not always
available or does not follow a strict pattern (25, 33). SMILES strings
for compounds listed in datasets or 2D structures in mol or sdf
formats can be obtained from the respective publications (25).
Using these as input, a single low-energy three-dimensional structure
of each molecule can be generated using the Molecular Operating Environ-
ment (MOE, Chemical Computing Group, Montreal, Canada) or
CORINA (Molecular Networks GmbH, Nägelsbachstr. 25, 91052
Erlangen, Germany. http://www.mol-net.de) with partial charges
assigned according to the Gasteiger-Marsili scheme (55).
Molecular Descriptors. hPXR binds a variety of ligands that differ in
shape, size, and chemical composition. However, analysis of well
known hPXR agonists show that the interactions are dominated by
hydrophobic and hydrogen bonding features of the ligands. Hence
to capture these properties, molecular descriptors that represent
shape, size, flexibility and hydrogen bonding, and hydrophobic
properties must be chosen. These include FCFP_6 fingerprints,
volume, weight, KierA1-A3, Kier1-3, number of rotatable bonds,
number of rings and KierFlex, electrostatic features like logP, TPSA,
logs of lip_don, lip_acc, number of N atoms, and number of O
atoms. The values for these specific molecular descriptors can be
derived from MOE. In addition, to analyze the specific role of shape
and electrostatics, specific shape-based descriptors such as shape
signatures can be derived (56).
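Once descriptor vectors of this kind are in hand, similarity to a crystal-structure ligand can be quantified in descriptor space. The sketch below uses a Euclidean distance converted to a (0, 1] weight of the kind later applied to docking scores; the descriptor vectors are invented values standing in for MOE output:

```python
import math

# Descriptor-space similarity to a crystal-structure ligand: Euclidean
# distance over a hypothetical descriptor vector, mapped to a weight.

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity_weight(a, b):
    """Map a distance into a (0, 1] weight; identical vectors weigh 1.0."""
    return 1.0 / (1.0 + euclidean(a, b))

# Hypothetical (volume, logP, rotatable bonds) vectors; in practice the
# descriptors should be scaled to comparable ranges first.
crystal_ligand = [480.0, 5.5, 6.0]
candidate      = [470.0, 5.0, 8.0]
print(similarity_weight(crystal_ligand, crystal_ligand))  # → 1.0
print(similarity_weight(crystal_ligand, candidate) < 1.0)  # → True
```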
Molecular Docking. Five crystal structures of hPXR cocrystalized
with a variety of ligands are available in the protein databank (PDB)
under the codes 1M13, resolution 2.00 Å (36), 1SKX, resolution
2.80 Å (38), 2O9I, resolution 2.80 Å (37), 1NRL, resolution
2.00 Å (36), and 2QNV, resolution 2.80 Å (37). In addition,
another structure in complex with 17β-estradiol that is yet to be
deposited in PDB was also used for docking studies (57). In all
cases, the protein structure is first prepared by removing the crystal
structure ligand and adding hydrogen atoms to the amino acids and
the resulting structures are energy minimized to remove any steric
contacts. All amino acids within 6 Å of the cocrystallized ligand are
generally chosen as being part of the binding site. The docking
program GOLD (ver 4) (58) or FlexX could be used for docking.
GOLD uses a genetic algorithm to explore the various conforma-
tions of ligands and flexible receptor side chains in the binding
pocket. In our studies, we have chosen to perform 20 independent
docking runs for each ligand to sample the ligand and protein
conformational space. The resulting docked complexes can be
scored using GoldScore and ChemScore.

Scoring functions. One of the bottlenecks in docking studies is the
scoring function. Although most docking programs are capable of
sampling the ligand in the binding pocket and generating solutions,
the scoring functions are not sensitive enough to identify the truly
best docked solutions. This is because most scoring functions are
empirical energy-based schemes, as shown below:
ΔGbind = ΔGsolvent + ΔGconf + ΔGint + ΔGrot + ΔGt/r + ΔGvib
where ΔGbind is the binding free energy, ΔGsolvent is the penalty for
desolvation, ΔGint is the internal energy, ΔGrot represents the energy
contribution for bond rotation, and ΔGvib represents the energy
contribution of the vibrational component. However, the solvation
term is approximated with a best-fit model, and the entropy terms are
most likely dropped from the energy equation due to the complexity
involved in computing the entropy factors. Thus, in practice no single
scoring function can work for every target and has to be customized to
suit the needs of the target. In the case of hPXR, given that the binding site
is very promiscuous and binds a variety of ligands, developing a
scoring scheme is challenging. We have used a number of methods
to derive consensus scoring schemes including contact score, shape
based scores, and molecular descriptor based schemes.
1. Contact scoring scheme. The docked receptor–ligand complexes
were scored using a contact based scoring function. Accord-
ingly, an in-house program was used to scan the docked com-
plexes for contacts between the ligand and protein atoms (56).
These contacts were scored based on a weighting scheme that
was derived from the nature of interaction between the ligands
cocrystallized with hPXR. For example, hyperforin has hydro-
gen bond interactions with residues Gln285, His407, and
Ser247 of the hPXR protein in the crystal structure (PDB
ID: 1M13) (Fig. 3). Thus the contact scoring function
weighted all those docked protein–ligand complexes that fea-
tured the hydrogen bonding between the ligands and these
three residues, higher than the rest of the interactions. Similarly,
other nonbonded interactions were weighted based on the
interactions of the ligands in the hPXR crystal structures. All
interaction scores were then summed and normalized against all
crystal structures. A consensus scoring scheme was developed
for final classification based on the following rule: Only those
compounds that had at least half the value of the highest Gold-
Score and a nonzero contact score were assigned as activators
and the rest of the molecules were classified as nonactivators.
2. Shape based scoring scheme. In this scheme, the ligands were
compared with the hPXR ligands from the five crystal struc-
tures for their shape based similarities using two different
approaches. The first was based on the 2D similarity encoded
in MDL public fingerprint keys, calculated using Discovery
Studio 2.0 (Accelrys, San Diego, CA). The Tanimoto coefficient
was used as the metric to compare the molecular fingerprints.
The coefficients varied between 0 and 1, where
0 meant maximally dissimilar and 1 coded for maximally
similar. The Tanimoto coefficient between fingerprints A and B
is defined as [number of features in intersect(A, B)]/[number of
features in union(A, B)], where A and B are the fingerprints of
two compounds.
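The definition above translates directly into set operations; the sketch below is a generic illustration, not the Discovery Studio implementation.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints, each given as a
    set of 'on' feature (bit) indices: |A ∩ B| / |A ∪ B|."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)

# Identical fingerprints score 1.0; disjoint ones score 0.0.
print(tanimoto({1, 4, 9}, {1, 4, 9}))   # 1.0
print(tanimoto({1, 4, 9}, {2, 5, 10}))  # 0.0
print(tanimoto({1, 4, 9}, {1, 4, 10}))  # 0.5
```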
In the second approach, the 3D shapes of the molecules from the
combined dataset were compared with the shapes of each of the
four crystal structure ligands. This was achieved by comparing
their corresponding 1D Shape Signatures and a dissimilarity
score was computed for each ligand pair. The dissimilarity score
was then converted to a similarity score, which was in turn used as
weighting factor for the GoldScore. In all these scoring schemes
the consensus score was calculated as shown below in Eq. (1).
3. Molecular descriptor based scoring. In this scheme, the molecular
descriptors computed using MOE were used to calculate
Euclidean distances from the crystal structure ligands. These
Euclidean distances were used as weighting factors to Gold-
Score. Similarly, the values of the molecular descriptors were
also used to calculate Tanimoto similarity indices (59) with
reference to the cocrystal structure ligands. The values for the
Tanimoto indices for each ligand in the combined set were
calculated against each of the crystal structure ligands and
then used as weighting factors to the GoldScores.
The weighted docking score of compound i in its jth conformation was described as

$$S_{i,j} = w_i\, s_{i,j} \qquad (1)$$

where s_{i,j} is the original GoldScore for compound i in its jth conformation and w_i is the weighting factor for compound i from any of the schemes described above.

References

1. Ruegg J, Penttinen-Damdimopoulou P, Makela S, Pongratz I, Gustafsson JA (2009) Receptors mediating toxicity and their involvement in endocrine disruption. EXS 99:289–323
2. Tabb MM, Blumberg B (2006) New modes of action for endocrine-disrupting chemicals. Mol Endocrinol 20:475–482
3. Safe S, Bandiera S, Sawyer T, Robertson L, Safe L, Parkinson A, Thomas PE, Ryan DE, Reik LM, Levin W et al (1985) PCBs: structure-function relationships and mechanism of action. Environ Health Perspect 60:47–56
4. Gustafsson JA (1995) Receptor-mediated toxicity. Toxicol Lett 82–83:465–470
5. Sewall CH, Lucier GW (1995) Receptor-mediated events and the evaluation of the Environmental Protection Agency (EPA) of dioxin risks. Mutat Res 333:111–122
6. Zollner G, Wagner M, Trauner M (2010) Nuclear receptors as drug targets in cholestasis and drug-induced hepatotoxicity. Pharmacol Ther 126:228–243
7. Lee EJ, Lean CB, Limenta LM (2009) Role of membrane transporters in the safety profile of drugs. Expert Opin Drug Metab Toxicol 5:1369–1383
8. Giguere V (1999) Orphan nuclear receptors: from gene to function. Endocr Rev 20:689–725
9. Ashby J, Houthoff E, Kennedy SJ, Stevens J, Bars R, Jekat FW, Campbell P, Van Miller J, Carpanini FM, Randall GL (1997) The challenge posed by endocrine-disrupting chemicals. Environ Health Perspect 105:164–169
10. Kelce WR, Gray LE, Wilson EM (1998) Antiandrogens as environmental endocrine disruptors. Reprod Fertil Dev 10:105–111
11. Melnick R, Lucier G, Wolfe M, Hall R, Stancel G, Prins G, Gallo M, Reuhl K, Ho SM, Brown T, Moore J, Leakey J, Haseman J, Kohn M (2002) Summary of the National Toxicology Program's report of the endocrine disruptors low-dose peer review. Environ Health Perspect 110:427–431
12. Goldman JM, Laws SC, Balchak SK, Cooper RL, Kavlock RJ (2000) Endocrine-disrupting chemicals: prepubertal exposures and effects on sexual maturation and thyroid activity in the female rat. A focus on the EDSTAC recommendations. Crit Rev Toxicol 30:135–196
13. Ekins S, Chang C, Mani S, Krasowski MD, Reschly EJ, Iyer M, Kholodovych V, Ai N, Welsh WJ, Sinz M, Swaan PW, Patel R, Bachmann K (2007) Human pregnane X receptor antagonists and agonists define molecular requirements for different binding sites. Mol Pharmacol 72:592–603
14. Ekins S, Kholodovych V, Ai N, Sinz M, Gal J, Gera L, Welsh WJ, Bachmann K, Mani S (2008) Computational discovery of novel low micromolar human pregnane X receptor antagonists. Mol Pharmacol 74:662–672
15. Biswas A, Mani S, Redinbo MR, Krasowski MD, Li H, Ekins S (2009) Elucidating the 'Jekyll and Hyde' nature of PXR: the case for discovering antagonists or allosteric antagonists. Pharm Res 26:1807–1815
16. Mnif W, Pascussi JM, Pillon A, Escande A, Bartegi A, Nicolas JC, Cavailles V, Duchesne MJ, Balaguer P (2007) Estrogens and antiestrogens activate PXR. Toxicol Lett 170:19–29
17. Teotico DG, Bischof JJ, Peng L, Kliewer SA, Redinbo MR (2008) Structural basis of human pregnane X receptor activation by the hops constituent colupulone. Mol Pharmacol 74:1512–1520
18. Wada T, Gao J, Xie W (2009) PXR and CAR in energy metabolism. Trends Endocrinol Metab 20:273–279
19. Gong H, Guo P, Zhai Y, Zhou J, Uppal H, Jarzynka MJ, Song WC, Cheng SY, Xie W (2007) Estrogen deprivation and inhibition of breast cancer growth in vivo through activation of the orphan nuclear receptor liver X receptor. Mol Endocrinol 21:1781–1790
20. Rotroff DM, Beam AL, Dix DJ, Farmer A, Freeman KM, Houck KA, Judson RS, LeCluyse EL, Martin MT, Reif DM, Ferguson SS (2010) Xenobiotic-metabolizing enzyme and transporter gene expression in primary cultures of human hepatocytes modulated by ToxCast chemicals. J Toxicol Environ Health B Crit Rev 13:329–346
21. Tirona RG, Leake BF, Podust LM, Kim RB (2004) Identification of amino acids in rat pregnane X receptor that determine species-specific activation. Mol Pharmacol 65:36–44
22. Kortagere S, Krasowski MD, Reschly EJ, Venkatesh M, Mani S, Ekins S (2010) Evaluation of computational docking to identify pregnane X receptor agonists in the ToxCast database. Environ Health Perspect 118:1412–1417
23. Ekins S, Kortagere S, Iyer M, Reschly EJ, Lill MA, Redinbo M, Krasowski MD (2009) Challenges predicting ligand-receptor interactions of promiscuous proteins: the nuclear receptor PXR. PLoS Comput Biol 5:e1000594
24. Yasuda K, Ranade A, Venkataramanan R, Strom S, Chupka J, Ekins S, Schuetz E, Bachmann K (2008) A comprehensive in vitro and in silico analysis of antibiotics that activate pregnane X receptor and induce CYP3A4 in liver and intestine. Drug Metab Dispos 36:1689–1697
25. Khandelwal A, Krasowski MD, Reschly EJ, Sinz MW, Swaan PW, Ekins S (2008) Machine learning methods and docking for predicting human pregnane X receptor activation. Chem Res Toxicol 21:1457–1467
26. Ekins S, Andreyev S, Ryabov A, Kirillov E, Rakhmatulin EA, Sorokina S, Bugrim A, Nikolskaya T (2006) A combined approach to drug metabolism and toxicity assessment. Drug Metab Dispos 34:495–503
27. Bachmann K, Patel H, Batayneh Z, Slama J, White D, Posey J, Ekins S, Gold D, Sambucetti L (2004) PXR and the regulation of apoA1 and HDL-cholesterol in rodents. Pharmacol Res 50:237–246
28. Ekins S, Erickson JA (2002) A pharmacophore for human pregnane-X-receptor ligands. Drug Metab Dispos 30:96–99
29. Pan Y, Li L, Kim G, Ekins S, Wang H, Swaan PW (2011) Identification and validation of novel hPXR activators amongst prescribed drugs via ligand-based virtual screening. Drug Metab Dispos 39:337–344
30. Gao YD, Olson SH, Balkovec JM, Zhu Y, Royo I, Yabut J, Evers R, Tan EY, Tang W, Hartley DP, Mosley RT (2007) Attenuating pregnane X receptor (PXR) activation: a molecular modelling approach. Xenobiotica 37:124–138
31. Lemaire G, Benod C, Nahoum V, Pillon A, Boussioux AM, Guichou JF, Subra G, Pascussi JM, Bourguet W, Chavanieu A, Balaguer P (2007) Discovery of a highly active ligand of human pregnane X receptor: a case study from pharmacophore modeling and virtual screening to "in vivo" biological activity. Mol Pharmacol 72:572–581
32. Schuster D, Langer T (2005) The identification of ligand features essential for PXR activation by pharmacophore modeling. J Chem Inf Model 45:431–439
33. Huang H, Wang H, Sinz M, Zoeckler M, Staudinger J, Redinbo MR, Teotico DG, Locker J, Kalpana GV, Mani S (2007) Inhibition of drug metabolism by blocking the activation of nuclear receptors by ketoconazole. Oncogene 26:258–268
34. Lill MA, Dobler M, Vedani A (2005) In silico prediction of receptor-mediated environmental toxic phenomena—application to endocrine disruption. SAR QSAR Environ Res 16:149–169
35. Lin YS, Yasuda K, Assem M, Cline C, Barber J, Li CW, Kholodovych V, Ai N, Chen JD, Welsh WJ, Ekins S, Schuetz EG (2009) The major human pregnane X receptor (PXR) splice variant, PXR.2, exhibits significantly diminished ligand-activated transcriptional regulation. Drug Metab Dispos 37:1295–1304
36. Watkins RE, Davis-Searles PR, Lambert MH, Redinbo MR (2003) Coactivator binding promotes the specific interaction between ligand and the pregnane X receptor. J Mol Biol 331:815–828
37. Xue Y, Chao E, Zuercher WJ, Willson TM, Collins JL, Redinbo MR (2007) Crystal structure of the PXR-T1317 complex provides a scaffold to examine the potential for receptor antagonism. Bioorg Med Chem 15:2156–2166
38. Chrencik JE, Orans J, Moore LB, Xue Y, Peng L, Collins JL, Wisely GB, Lambert MH, Kliewer SA, Redinbo MR (2005) Structural disorder in the complex of human pregnane X receptor and the macrolide antibiotic rifampicin. Mol Endocrinol 19:1125–1134
39. Schuster D, Laggner C, Steindl TM, Palusczak A, Hartmann RW, Langer T (2006) Pharmacophore modeling and in silico screening for new P450 19 (aromatase) inhibitors. J Chem Inf Model 46:1301–1311
40. Khandelwal A, Krasowski MD, Reschly EJ, Sinz M, Swaan PW, Ekins S (2007) A comparative analysis of quantitative structure activity relationship methods and docking for human pregnane X receptor activation (submitted)
41. Jacobs MN (2004) In silico tools to aid risk assessment of endocrine disrupting chemicals. Toxicology 205:43–53
42. Hassan M, Brown RD, Varma-O'Brien S, Rogers D (2006) Cheminformatics analysis and learning in a data pipelining environment. Mol Divers 10:283–299
43. Ghose AK, Viswanadhan VN, Wendoloski JJ (1998) Prediction of hydrophobic (lipophilic) properties of small organic molecules using fragmental methods: an analysis of ALOGP and CLOGP methods. J Phys Chem 102:3762–3772
44. Watkins RE, Wisely GB, Moore LB, Collins JL, Lambert MH, Williams SP, Willson TM, Kliewer SA, Redinbo MR (2001) The human nuclear xenobiotic receptor PXR: structural determinants of directed promiscuity. Science 292:2329–2333
45. Kortagere S, Chekmarev D, Welsh WJ, Ekins S (2009) Hybrid scoring and classification approaches to predict human pregnane X receptor activators. Pharm Res 26:1001–1011
46. Ekins S, Kortagere S, Iyer M, Reschly EJ, Lill MA, Redinbo MR, Krasowski MD (2009) Challenges predicting ligand-receptor interactions of promiscuous proteins: the nuclear receptor PXR. PLoS Comput Biol 5:e1000594
47. Judson RS, Houck KA, Kavlock RJ, Knudsen TB, Martin MT, Mortensen HM, Reif DM, Rotroff DM, Shah I, Richard AM, Dix DJ (2010) In vitro screening of environmental chemicals for targeted testing prioritization: the ToxCast project. Environ Health Perspect 118:485–492
48. Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA (2010) Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA's ToxCast program. Chem Res Toxicol 23:578–590
49. Knudsen TB, Martin MT, Kavlock RJ, Judson RS, Dix DJ, Singh AV (2009) Profiling the activity of environmental chemicals in prenatal developmental toxicity studies using the U.S. EPA's ToxRefDB. Reprod Toxicol 28:209–219
50. Mani S, Huang H, Sundarababu S, Liu W, Kalpana G, Smith AB, Horwitz SB (2005) Activation of the steroid and xenobiotic receptor (human pregnane X receptor) by nontaxane microtubule-stabilizing agents. Clin Cancer Res 11:6359–6369
51. Chen Y, Tang Y, Wang MT, Zeng S, Nie D (2007) Human pregnane X receptor and resistance to chemotherapy in prostate cancer. Cancer Res 67:10361–10367
52. Reynolds CH, Tounge BA, Bembenek SD (2008) Ligand binding efficiency: trends, physical basis, and implications. J Med Chem 51:2432–2438
53. Ekins S, Reschly EJ, Hagey LR, Krasowski MD (2008) Evolution of pharmacologic specificity in the pregnane X receptor. BMC Evol Biol 8:103
54. Krasowski MD, Yasuda K, Hagey LR, Schuetz EG (2005) Evolution of the pregnane X receptor: adaptation to cross-species differences in biliary bile salts. Mol Endocrinol 19:1720–1739
55. Gasteiger J, Marsili M (1980) Iterative partial equalization of orbital electronegativity—a rapid access to atomic charges. Tetrahedron 36:3219–3228
56. Kortagere S, Welsh WJ (2006) Development and application of hybrid structure based method for efficient screening of ligands binding to G-protein coupled receptors. J Comput Aided Mol Des 20:789–802
57. Xue Y, Moore LB, Orans J, Peng L, Bencharit S, Kliewer SA, Redinbo MR (2007) Crystal structure of the pregnane X receptor-estradiol complex provides insights into endobiotic recognition. Mol Endocrinol 21:1028–1038
58. Jones G, Willett P, Glen RC, Leach AR, Taylor R (1997) Development and validation of a genetic algorithm for flexible docking. J Mol Biol 267:727–748
59. Kogej T, Engkvist O, Blomberg N, Muresan S (2006) Multifingerprint based similarity searches for targeted class compound selection. J Chem Inf Model 46:1201–1213
Chapter 16

Non-compartmental Analysis
Johan Gabrielsson and Daniel Weiner

Abstract
When analyzing pharmacokinetic data, one generally employs either model fitting using nonlinear
regression analysis or non-compartmental analysis techniques (NCA). The method one actually employs
depends on what is required from the analysis. If the primary requirement is to determine the degree of
exposure following administration of a drug (such as AUC), and perhaps the drug’s associated pharmaco-
kinetic parameters, such as clearance, elimination half-life, Tmax, Cmax, etc., then NCA is generally the
preferred methodology to use in that it requires fewer assumptions than model-based approaches. In this
chapter we cover NCA methodologies, which utilize application of the trapezoidal rule for measurements of
the area under the plasma concentration–time curve. This method, which generally applies to first-order
(linear) models (although it is often used to assess if a drug’s pharmacokinetics are nonlinear when several
dose levels are administered), has few underlying assumptions and can readily be automated.
In addition, because sparse data sampling methods are often utilized in toxicokinetic (TK) studies, NCA
methodology appropriate for sparse data is also discussed.

Key words: Non-compartmental, NCA, AUC, Toxicokinetic, TK, λz

1. Non-compartmental Analysis

1.1. Non-compartmental Versus Regression Analysis

Most current approaches to characterize a drug's kinetics involve
non-compartmental analysis, denoted NCA, and nonlinear regression
analysis (1). The advantages of the regression analysis approach
are the disadvantages of the non-compartmental approach and vice
versa. NCA does not require the assumption of a specific compart-
mental model for either drug or metabolite. The method used
involves application of the trapezoidal rule for measurements of
the area under a plasma concentration–time curve. This method,
which applies to first-order (linear) models, is rather assumption
free and can readily be automated. Figure 1 gives a schematic
picture of the NCA and nonlinear regression approaches. As can
be seen, NCA deals with sums of areas whereas regression

Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_16, © Springer Science+Business Media, LLC 2012


Fig. 1. Comparison of NCA (left) and nonlinear regression modeling (right). Ka, K, and V in the right-hand panel indicate the model parameters to be estimated by regressing the model to data.

modeling uses a function with regression parameters. Both methods
are applied for the characterization of the kinetics of a compound.
The time course of drug concentration in plasma can usually be
regarded as a statistical distribution curve. The area under a plot of
the plasma concentration versus time curve is referred to as the area
under the zero moment curve AUC, and the area under the prod-
uct of the concentration and time versus time curve is then called
the area under the first moment curve AUMC. Only the areas of the
zero and first moments are generally used in pharmacokinetic anal-
ysis, because the higher moments are prone to an unacceptable level
of computational error.
This section focuses on NCA with regard to computational
methods, strategies for estimation of λz, pertinent pharmacokinetic
estimates, issues related to steady state, and how to tackle situations
where t1/2 is much less than input time.

1.2. Computational Methods: Linear Trapezoidal Rule

The areas can either be calculated by means of the linear trapezoidal
rule or by the log-linear trapezoidal rule. The total area is then
measured by summing the incremental area of each trapezoid
(Fig. 2).
The magnitude of the error associated with the estimated area
depends on the width of the trapezoid and the curvature of the true
profile. This is due to the fact that the linear trapezoidal rule over-
estimates the area during the descending phase assuming elimina-
tion is first-order, and underestimates the area during the ascending
part of the curve (Fig. 3). This over/underestimation error will be
more pronounced if the sampling interval Δt is large in relation to
the half-life.
Using the linear trapezoidal method for calculation of the area
under the zero moment curve AUC from 0 to time tn, we have

Fig. 2. Graphical presentation of the linear trapezoidal rule. AUC from t_i to t_{i+1} is the area between t_i and t_{i+1}. C_i and C_{i+1} are the corresponding plasma concentrations, and Δt is the time interval. Note that Δt may differ for different trapeziums.

Fig. 3. Concentration versus time during and after a constant rate infusion. The shaded area represents underestimation of the area during ascending concentrations and overestimation of the area during descending concentrations. By decreasing the time step (Δt) between observations, this under- or overestimation of the area is minimized.

$$\mathrm{AUC}_{0\text{–}t_{\mathrm{last}}} = \sum_{i=1}^{n} \frac{C_i + C_{i+1}}{2}\,\Delta t \qquad (1)$$

where Δt = t_{i+1} − t_i and t_last denotes the time of the last measurable
concentration. Unless one has sampled long enough in time so
that concentrations are negligible, the AUC as defined above will
underestimate the true AUC. Therefore it may be necessary to
extrapolate the curve out to t equal to infinity (1). The extrapolated
area under the zero moment curve from the last sampling
time to infinity, AUC_{t_last–∞}, is calculated as

Fig. 4. Semilog plot demonstrating the estimation of λz. The terminal data points are fit by log-linear regression to estimate the slope.

$$\mathrm{AUC}_{t_{\mathrm{last}}\text{–}\infty} = \int_{t_{\mathrm{last}}}^{\infty} C_{\mathrm{last}}\, e^{-\lambda_z (t - t_{\mathrm{last}})}\, dt = C_{\mathrm{last}} \left[-\frac{e^{-\lambda_z (t - t_{\mathrm{last}})}}{\lambda_z}\right]_{t_{\mathrm{last}}}^{\infty} = C_{\mathrm{last}}\left(0 + \frac{1}{\lambda_z}\right) = \frac{C_{\mathrm{last}}}{\lambda_z} \qquad (2)$$
where C_last and λz are the last measurable nonzero plasma concentration
and the terminal slope on a log_e scale, respectively. One may
also use the predicted concentration at tlast if the observed concen-
tration deviates from the terminal regression line (Fig. 4).
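Equations 1 and 2 are simple to automate. The sketch below is a minimal Python illustration; the times, concentrations, and λz value are invented for the example, not taken from the chapter.

```python
def auc_linear(times, conc):
    """Linear trapezoidal AUC from times[0] to the last sample (Eq. 1)."""
    return sum((conc[i] + conc[i + 1]) / 2.0 * (times[i + 1] - times[i])
               for i in range(len(times) - 1))

def auc_with_tail(times, conc, lambda_z):
    """AUC(0-inf): trapezoidal AUC to t_last plus C_last/lambda_z (Eq. 2)."""
    return auc_linear(times, conc) + conc[-1] / lambda_z

t = [0, 1, 2, 4, 8]            # sampling times (h)
c = [0.0, 4.0, 3.0, 1.5, 0.4]  # plasma concentrations (mg/L)
print(round(auc_linear(t, c), 3))          # 13.8
print(round(auc_with_tail(t, c, 0.4), 3))  # 14.8
```

Production NCA software adds handling for below-quantification-limit samples and automated λz selection, which are omitted here.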
The λz parameter is graphically obtained from the terminal
slope of the semilogarithmic concentration–time curve as shown
in Fig. 4, with a minimum of 3–4 observations being required for
accurate estimation. The y-axis, ln(C), denotes the natural logarithm
(log_e) of the plasma concentration C.
The linear trapezoidal method for calculation of the area under
the first moment curve AUMC from 0 to time tlast is obtained from
$$\mathrm{AUMC}_{0\text{–}t_{\mathrm{last}}} = \sum_{i=1}^{n} \frac{t_i\, C_i + t_{i+1}\, C_{i+1}}{2}\,\Delta t \qquad (3)$$
Remembering that $\int x\, e^{-ax}\, dx = -\frac{x\, e^{-ax}}{a} - \frac{e^{-ax}}{a^2}$, the corresponding area under the first moment curve from time t_last to infinity, AUMC_{t_last–∞}, is computed as

$$\mathrm{AUMC}_{t_{\mathrm{last}}\text{–}\infty} = \int_{t_{\mathrm{last}}}^{\infty} t\, C\, dt = \int_{t_{\mathrm{last}}}^{\infty} t\, C_{\mathrm{last}}\, e^{-\lambda_z (t - t_{\mathrm{last}})}\, dt = C_{\mathrm{last}}\, e^{\lambda_z t_{\mathrm{last}}} \left[-\frac{t\, e^{-\lambda_z t}}{\lambda_z} - \frac{e^{-\lambda_z t}}{\lambda_z^2}\right]_{t_{\mathrm{last}}}^{\infty} = \frac{C_{\mathrm{last}}\, t_{\mathrm{last}}}{\lambda_z} + \frac{C_{\mathrm{last}}}{\lambda_z^2} \qquad (4)$$
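The first-moment analogues (Eqs. 3 and 4) follow the same pattern; again a minimal, illustrative sketch with made-up numbers:

```python
def aumc_linear(times, conc):
    """Linear trapezoidal AUMC from times[0] to the last sample (Eq. 3)."""
    return sum((times[i] * conc[i] + times[i + 1] * conc[i + 1]) / 2.0
               * (times[i + 1] - times[i]) for i in range(len(times) - 1))

def aumc_with_tail(times, conc, lambda_z):
    """AUMC(0-inf): Eq. 3 plus the Eq. 4 tail,
    C_last*t_last/lambda_z + C_last/lambda_z**2."""
    c_last, t_last = conc[-1], times[-1]
    return (aumc_linear(times, conc)
            + c_last * t_last / lambda_z
            + c_last / lambda_z ** 2)

t = [0, 1, 2, 4]
c = [0.0, 4.0, 3.0, 1.5]
print(aumc_linear(t, c))          # 19.0
print(aumc_with_tail(t, c, 0.5))  # 19 + 12 + 6 = 37.0
```

Note how much the tail (18 of 37 units here) contributes to AUMC compared with the AUC case, in line with the remark about extrapolation error in the higher moments.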
1.3. Computational Methods: Log-Linear Trapezoidal Rule

An alternative procedure that has been proposed is the log-linear
trapezoidal rule. The underlying assumption is that the plasma
concentrations decline mono-exponentially between two measured
concentrations. However, this method applies only for descending
data and fails when C_i = 0 or C_{i+1} = C_i. In these instances one
would revert to the linear trapezoidal rule. The principal difference
between the linear and the log-linear trapezoidal method is demonstrated
in Fig. 5.
Remember that when the concentrations decline exponentially
$$C_{i+1} = C_i \cdot e^{-K(t_{i+1} - t_i)} = C_i \cdot e^{-K \Delta t} \qquad (5)$$

where t_{i+1} − t_i is the time step Δt between two observations and K
is the elimination rate constant for a one-compartment system.
Otherwise, λz should be used as the slope. The above expression
when rearranged gives the elimination rate constant K:

$$K = \frac{\ln(C_i/C_{i+1})}{\Delta t} \qquad (6)$$
The AUC within the time interval Δt is the difference between
the concentrations divided by the slope K:

$$\mathrm{AUC}_{i\text{–}i+1} = \frac{C_i - C_{i+1}}{K} = \frac{C_i - C_{i+1}}{\ln(C_i/C_{i+1})}\,\Delta t \qquad (7)$$
Using the log-linear trapezoidal method from time zero to tn
$$\mathrm{AUC}_{0\text{–}t_n} = \sum_{i=1}^{n} \frac{C_i - C_{i+1}}{\ln(C_i/C_{i+1})}\,\Delta t \qquad (8)$$

while the corresponding equation for AUMC from time zero to t_n
with this method yields

$$\mathrm{AUMC}_{0\text{–}t_n} = \sum_{i=1}^{n} \left[\frac{t_i\, C_i - t_{i+1}\, C_{i+1}}{\ln(C_i/C_{i+1})}\,\Delta t - \frac{C_{i+1} - C_i}{\left[\ln(C_i/C_{i+1})\right]^2}\,\Delta t^2\right] \qquad (9)$$

Fig. 5. The principal difference between the linear (left) and the log-linear (right) trapezoidal methods. The shaded region represents the over-predicted area with the linear trapezoidal rule. Note that the log-linear approximation is only true if the decay is truly mono-exponential between t_i and t_{i+1}.
The extrapolated area under the zero moment curve from the
last sampling time to infinity, AUC_{t_last–∞}, is calculated as

$$\mathrm{AUC}_{t_{\mathrm{last}}\text{–}\infty} = \frac{C_{\mathrm{last}}}{\lambda_z} \qquad (10)$$

where C_last and λz are as defined earlier. The corresponding
area under the first moment curve from the last sampling time to infinity,
AUMC_{t_last–∞}, is

$$\mathrm{AUMC}_{t_{\mathrm{last}}\text{–}\infty} = \frac{C_{\mathrm{last}}\, t_{\mathrm{last}}}{\lambda_z} + \frac{C_{\mathrm{last}}}{\lambda_z^2} \qquad (11)$$
As previously pointed out, the linear trapezoidal method gives
approximate estimates of AUC during both the ascending and
descending parts of the concentration–time curve, although the
bias is usually negligible for the upswing. The log-linear trapezoidal
method may also give somewhat biased results, though to a lesser
extent. Some people argue that the log-linear trapezoidal method
may therefore be preferable for drugs with long half-lives relative to
the sampling interval. From a practical point of view this still needs
to be proven. However, our own experience is that the difference
between the two methods is negligible as long as a reasonable
sampling design has been used. We generally use a mixture of the
two methods, which means that the linear trapezoidal method is
applied for increasing and equal concentrations, e.g., at the peak or
a plateau, and the log-linear trapezoidal method for decreasing
concentrations. This is demonstrated in Fig. 6.
Fig. 6. NCA using a combination of the linear and log-linear trapezoidal methods. The linear method is used for consecutively increasing or consecutively equal concentrations. The log-linear method is used for decreasing concentrations.

Note that NCA is often used in crossover studies comparing two
formulations in 12–36 subjects. Thus, since the error associated
with an individual patient's AUC is generally small, the (average)
error associated with the average AUC for a formulation will generally
be negligible, regardless of the method used. The choice of
method is thus up to the discretion of the modeler, as long as one
can explain why a particular method provides a more accurate esti-
mate of AUC.
The linear trapezoidal method will work excellently in situa-
tions of zero-order kinetics since plasma concentrations decline
linearly with time. Hence, even large sampling intervals will be
acceptable. The log-linear trapezoidal rule may in some instances
be preferable within the first-order concentration range. The
linear method will then overpredict the areas particularly when half-
life is short relative to the sampling interval.
Direct integration of the function for the drug’s kinetics in
plasma is discussed under the introductory section on mono- and
multi-exponential models and will therefore not be further elabo-
rated here.
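The mixed rule described above (linear trapezoids for rising or equal concentrations, log-linear trapezoids per Eq. 7 for falling ones) can be sketched as follows; the data are invented for illustration:

```python
import math

def auc_lin_up_log_down(times, conc):
    """AUC to t_last: linear trapezoids where concentrations rise, are
    equal, or touch zero; log-linear trapezoids (Eq. 7) where they fall."""
    auc = 0.0
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        ci, cj = conc[i], conc[i + 1]
        if cj < ci and ci > 0 and cj > 0:
            auc += (ci - cj) / math.log(ci / cj) * dt  # log-linear (descending)
        else:
            auc += (ci + cj) / 2.0 * dt                # linear (ascending/equal/zero)
    return auc

t = [0, 1, 2]
c = [0.0, 4.0, 2.0]
# ascending segment: (0 + 4)/2 = 2.0; descending: (4 - 2)/ln(2) ≈ 2.885
print(round(auc_lin_up_log_down(t, c), 3))  # 4.885
```

The guard conditions reproduce the fallback described in Subheading 1.3: whenever the log-linear formula would be undefined, the linear rule is used instead.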

1.4. Strategies for Estimation of λz

When estimating λz, we recommend that data from each individual
are first plotted in a semilog diagram. Ideally, to obtain a reliable
estimate of the terminal slope, 3–4 half-lives would need to have
elapsed. However, sometimes this is not possible. A minimum
requirement is then to have 3–4 observations for the terminal
slope (Fig. 7). By means of log-linear regression of those observations,
the estimate of λz is obtained. This is then used for calculation
of the extrapolated area as shown below:
$$\mathrm{AUC}_{t_{\mathrm{last}}\text{–}\infty}(\text{observed}) = \frac{C_{\mathrm{last}}}{\lambda_z} \qquad (12)$$

or

$$\mathrm{AUC}_{t_{\mathrm{last}}\text{–}\infty}(\text{predicted}) = \frac{\hat{C}_{\mathrm{last}}}{\lambda_z} \qquad (13)$$
Fig. 7. The ideal situation (left) for estimation of the terminal slope λz. Another and perhaps more commonly encountered situation (right) is where one only has an indication of an additional slope.

Fig. 8. Impact on the extrapolated area of using observed terminal concentration versus predicted concentration. The shaded area from t_last to infinity symbolizes the overestimation that would result. Note that if the observed terminal concentration lies below the predicted terminal concentration, then the extrapolated area would be underestimated. The open circle is the predicted concentration at t_last. The last observation is not included in the regression.

In Fig. 8 the last observed concentration C_obs deviates somewhat
from the regression line. The extrapolated area, if based on
C_obs, would be disproportionately large as compared to the area
based on the predicted concentration.
The total area is obtained by summing the individual areas
obtained by means of the trapezoidal rule to the last time (tlast),
and adding the extrapolated area according to
$$\mathrm{AUC}_{\mathrm{total}} = \mathrm{AUC}_{0\text{–}\infty} = \mathrm{AUC}_{0\text{–}t_{\mathrm{last}}} + \mathrm{AUC}_{t_{\mathrm{last}}\text{–}\infty} \qquad (14)$$

The fraction of AUC_{t_last–∞} to AUC_{0–∞} is calculated as

$$\%\ \text{extrapolated area} = \frac{\mathrm{AUC}_{t_{\mathrm{last}}\text{–}\infty}}{\mathrm{AUC}_{0\text{–}\infty}} \times 100 \qquad (15)$$
The extrapolated area should ideally be as small as possible in
comparison to the total area. We believe that AUC_{t_last–∞} should not
exceed 20–25% of AUC_total, unless it is only used as a preliminary
estimate for further study refinement.
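As an illustration of this workflow, the sketch below estimates λz by ordinary least squares on ln C over user-chosen terminal points, then forms the extrapolated area and the percent extrapolated (Eqs. 12, 14, and 15). The data and the assumed AUC0–tlast are synthetic.

```python
import math

def lambda_z(times, conc):
    """Estimate lambda_z as minus the slope of ln(C) versus t over the
    supplied terminal points (simple least squares; >= 3 points advised)."""
    n = len(times)
    x, y = times, [math.log(c) for c in conc]
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
             / sum((xi - xbar) ** 2 for xi in x))
    return -slope

# Synthetic mono-exponential tail: C = 8 * exp(-0.3 t) -> lambda_z = 0.3
t_term = [4.0, 6.0, 8.0, 10.0]
c_term = [8.0 * math.exp(-0.3 * ti) for ti in t_term]
lz = lambda_z(t_term, c_term)
print(round(lz, 6))  # 0.3

auc_tail = c_term[-1] / lz                              # Eq. 12
auc_0_tlast = 20.0                                      # assumed, for illustration
pct = auc_tail / (auc_0_tlast + auc_tail) * 100         # Eq. 15
print(round(pct, 1))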

1.5. Pertinent Pharmacokinetic Estimates

Moment analysis has been widely used in recent years as a non-compartmental
approach to the estimation of clearance Cl, mean
residence time MRT, steady-state volume of distribution Vss, and
volume of distribution during the terminal phase Vz (also called Vdβ
for a bi-exponential system). A general treatment for the aforemen-
tioned parameters has been presented, which includes the possibil-
ity of input/exit from any compartment in a mammillary model
(2, 3). This approach also defines exit site-dependent and exit
site-independent parameters. We will, however, assume in the
following examples that input/output occurs to the central com-
partment. Assuming a simple case with a one-compartment bolus
system, the shape of the concentration–time and t·concentration–
time profiles will take the form depicted in Fig. 9.
Fig. 9. Comparison of shape of area under the zero moment curve AUC and area under the first moment (t·C) curve AUMC. The latter usually contains an extensive extrapolated area as compared to AUC.

The extrapolated area from the last sample at t_last to infinity is in
this case small. However, the corresponding area under the first
moment curve has an altogether different shape. Clearly, the extrapolated
area from the last sampling point to infinity will generally
contribute to a much larger extent under the first moment curve
as compared to the area under the zero moment curve.
Pharmacokinetics has moved almost completely away from parameterizing
elimination in terms of rate constants, with the more
physiologically relevant use of clearance now being widely recognized.
To put even more focus on clearance, Holford suggested
that AUC no longer be used as a pharmacokinetic parameter.
Clearance Cl, or clearance over bioavailability Cl/F (also denoted
Cl′), is easily computed from AUC and dose, and Cl and Cl/F
can immediately be interpreted in a physiological context. On the
other hand, AUC can be viewed as a parameter that confounds
clearance and dose, and that has no intrinsic merit. While we agree
with those ideas, AUC is still useful as a measure of exposure in
toxicological studies and when dose is unknown.
Clearance is calculated from the dose and the area under the
zero moment curve:
$$Cl = \frac{D_{\mathrm{iv}}}{\mathrm{AUC}_{0\text{–}\infty}} \qquad (16)$$
Oral clearance Cl′ or Cl/F is calculated from the oral dose and
the area under the zero moment curve:

$$Cl' = \frac{Cl}{F} = \frac{D_{\mathrm{po}}}{\mathrm{AUC}_{0\text{–}\infty}} \qquad (17)$$
Using the areas obtained from systemic (e.g., intravenous) and
extravascular (e.g., oral) dosing, the bioavailability F is calculated,
after dose normalization, according to

$$F = \frac{\mathrm{AUC}_{\mathrm{ev}}}{\mathrm{AUC}_{\mathrm{iv}}} \times \frac{D_{\mathrm{iv}}}{D_{\mathrm{ev}}} \qquad (18)$$
where AUCev and AUCiv denote area under the extravascular and
intravenous concentration–time profiles, respectively. D_ev and D_iv
are the respective extravascular and intravenous doses.
If the drug is given at a constant rate over a period of Tinf, then
one also needs to adjust MRT for the infusion time by means of
subtracting Tinf/2 (infusion time/2) as follows:
$$MRT = \frac{\mathrm{AUMC}_{0\text{–}\infty}}{\mathrm{AUC}_{0\text{–}\infty}} - \frac{T_{\mathrm{inf}}}{2} \qquad (19)$$
Tinf/2 originates from the average time a molecule stays in the
infusion set (e.g., syringe, catheter, line). Half of the dose is infused
when the piston has traveled half of the intended distance. Tinf/2 is
the mean input time, MIT. Similarly for first-order input,
$$MRT = \frac{\mathrm{AUMC}_{0\text{–}\infty}}{\mathrm{AUC}_{0\text{–}\infty}} - \frac{1}{K_a} \qquad (20)$$
Remember that Ka is the apparent first-order absorption rate
constant derived from plasma data. This parameter may also con-
tain processes parallel to the true absorption step of drug in the
gastrointestinal tract, e.g., chemical degradation (kd). Conse-
quently, the mean absorption time, MAT, is the sum of several
processes including absorption and chemical degradation:
$$MAT = \frac{1}{K_{a(\mathrm{apparent})}} = \frac{1}{K_{a(\mathrm{true})} + k_d} \qquad (21)$$
The MRT of the central compartment MRT(1) is the sum of
the inverse of the initial α and terminal β slopes corrected for the
inverse of the sum of the exit rate constants from the peripheral
compartment:

$$MRT_{\mathrm{iv}}(1) = \frac{1}{\alpha} + \frac{1}{\beta} - \frac{1}{E_2} \qquad (22)$$

Assuming that there is only one exit rate constant from the
peripheral compartment, which then is k_{21}, the MRT_iv is

$$MRT_{\mathrm{iv}} = \frac{1}{\alpha} + \frac{1}{\beta} - \frac{1}{k_{21}} \qquad (23)$$
The observed MRT after extravascular dosing becomes
$$\frac{\mathrm{AUMC}_{0\text{–}\infty,\mathrm{measured}}}{\mathrm{AUC}_{0\text{–}\infty,\mathrm{measured}}} = MRT + MIT \qquad (24)$$
which is the sum of the true MRT and MIT. MIT can also be
obtained from the input function according to Eq. 25 below:

$$MIT = \frac{\int_0^{\infty} (\text{input function}) \cdot t\, dt}{\int_0^{\infty} (\text{input function})\, dt} = \frac{\int_0^{\infty} (\text{input function}) \cdot t\, dt}{F \cdot \mathrm{Dose}} \qquad (25)$$
provided the input function is known,

A_gut = F · D_po · e^(-K_a·t),    (26)

and MIT can be derived:

MIT = ∫_0^∞ F · D_po · e^(-K_a·t) · t dt / ∫_0^∞ F · D_po · e^(-K_a·t) dt = (F · D_po / K_a^2) / (F · D_po / K_a) = 1 / K_a.    (27)
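Eq. 27 can be checked numerically: the mean input time of a first-order input is 1/K_a regardless of F and D_po. A small Python sketch (hypothetical parameter values):

```python
# Numerical check of Eq. 27: for a first-order input A_gut = F*Dpo*exp(-Ka*t),
# the mean input time works out to 1/Ka independent of F and Dpo.
import math

Ka, F, Dpo = 1.5, 0.7, 100.0      # hypothetical values
dt, T = 0.001, 20.0               # integration step and horizon (>> 1/Ka)

num = den = 0.0
t = 0.0
while t < T:
    a = F * Dpo * math.exp(-Ka * t)
    num += a * t * dt             # integral of (input function) * t
    den += a * dt                 # integral of the input function
    t += dt

mit = num / den
print(round(mit, 3))              # close to 1/Ka = 0.667
```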

The volume of distribution at steady state, V_ss, is computed as

V_ss = MRT · Cl = (AUMC_0^∞ / AUC_0^∞) · (D_iv / AUC_0^∞) = D_iv · AUMC_0^∞ / (AUC_0^∞)^2.    (28)

The volume of distribution during the terminal phase, V_z, is
computed as

V_z = Cl / λ_z = (D_iv / AUC_0^∞) · (1 / λ_z).    (29)
The corresponding volume for a bi-exponential function is
computed as

V_dβ = Cl / β = (D_iv / AUC_0^∞) · (1 / β).    (30)
The terminal half-life t_1/2z is readily estimated from the slope
λ_z as

t_1/2z = ln(2) / λ_z.    (31)

The half-life of the initial α-phase is

t_1/2α = ln(2) / α.    (32)

The half-life of the terminal β-phase of a bi-exponential function is

t_1/2β = ln(2) / β.    (33)

Note that the t_1/2z parameter is referred to as t_1/2β in a bi-exponential function and simply t_1/2 in a mono-exponential system.
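Eqs. 29 and 31 in practice start from a log-linear regression on the terminal samples to estimate λ_z. A Python sketch with hypothetical mono-exponential data (so the regression recovers the slope essentially exactly):

```python
# Terminal-slope estimation (lambda_z) by log-linear regression, then
# t1/2 = ln(2)/lambda_z (Eq. 31); Vz = Div/(AUC * lambda_z) follows (Eq. 29).
# Hypothetical i.v. bolus data following C(t) = 12*exp(-0.35*t).
import math

t = [4.0, 6.0, 8.0, 10.0, 12.0]               # terminal-phase samples
c = [12 * math.exp(-0.35 * ti) for ti in t]
y = [math.log(ci) for ci in c]                # log-linear transform

n = len(t)
tbar, ybar = sum(t) / n, sum(y) / n
slope = sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y)) \
        / sum((ti - tbar) ** 2 for ti in t)
lz = -slope                                   # lambda_z

t_half = math.log(2) / lz                     # Eq. 31
print(round(lz, 3), round(t_half, 3))
```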

1.6. NCA Approaches for Sparse Data

In some instances it may not be possible to obtain sufficient samples
from each subject so as to completely characterize the plasma
concentration–time curve. This may be due to the need to sacrifice the
animal to obtain the samples, general concerns over blood loss
(such as in human neonates or small rodents), or cost concerns.
In these situations it is necessary to pool the data from multiple
subjects to characterize the full time–plasma concentration curve.


Generally these approaches are recommended only when the data
are being collected from populations that do not exhibit extensive
subject-to-subject variation, such as in highly inbred strains of
animals.
One such approach is an extension of the NCA analysis for rich
data described previously, and enables one to derive an estimated
standard error (se) for AUC for sparse data (4–6). This procedure is
implemented in Phoenix® WinNonlin®.
Another approach that has been proposed involves nonlinear
mixed-effects modeling (also denoted population modeling).
In this instance a structural PK model is specified and fit to the data.
This approach has the advantage of incorporating covariates
(e.g., age, gender, body weight), that is, the ability to
model changes in clearance as a function of age or body weight, for example. It
also has the limitation that it may not be possible to adequately
identify the underlying structural model unless the sparse data can be
pooled with rich data from some other cohort (7).
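A Bailer-type calculation of this kind can be sketched as follows (Python; hypothetical destructive-sampling data, trapezoidal weights, and the variance formula Var(AUC) = Σ w_i²·s_i²/n_i as described in refs. 4–6):

```python
# Sketch of a Bailer-type estimate of AUC and its standard error from
# destructive (one-sample-per-animal) sampling: AUC = sum(w_i * mean_i),
# var(AUC) = sum(w_i^2 * s_i^2 / n_i), with trapezoidal weights w_i.
# The data below are hypothetical.
import math
from statistics import mean, variance

times = [0.0, 1.0, 2.0, 4.0, 8.0]
conc = [                      # one list of sacrificed animals per time point
    [0.1, 0.12, 0.09],
    [5.1, 4.7, 5.5],
    [4.0, 3.6, 4.2],
    [2.1, 2.4, 1.9],
    [0.6, 0.5, 0.7],
]

m = len(times)
w = [(times[1] - times[0]) / 2] \
    + [(times[i + 1] - times[i - 1]) / 2 for i in range(1, m - 1)] \
    + [(times[-1] - times[-2]) / 2]       # trapezoidal weights

auc = sum(wi * mean(ci) for wi, ci in zip(w, conc))
se = math.sqrt(sum(wi ** 2 * variance(ci) / len(ci)
                   for wi, ci in zip(w, conc)))
print(round(auc, 2), round(se, 3))
```

Note that `statistics.variance` is the sample variance (n - 1 denominator), which is what the Satterthwaite-style error propagation assumes.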

1.7. Suggested Reading

For further reading on basic pharmacokinetic principles, we refer
the reader to Benet (8), Benet and Galeazzi (9), Gibaldi and Perrier
(10), Nakashima and Benet (2, 3), Jusko (11), and Rowland and
Tozer (12). Houston (13) and Pang (14) provide excellent texts on
metabolite kinetics.
Benet and Galeazzi (9), Watari and Benet (15), and Nakashima
and Benet (3) have elaborated on the theory of NCA, while Gille-
spie (16) discussed the pros and cons of NCA versus compartmen-
tal models.

References

1. Gabrielsson J, Weiner D (2006) Pharmacokinetic and pharmacodynamic data analysis: concepts and applications, 4th edn. Swedish Pharmaceutical Press, Stockholm
2. Nakashima E, Benet LZ (1988) General treatment of mean residence time, clearance and volume parameters in linear mammillary models with elimination from any compartment. J Pharmacokinet Biopharm 16:475
3. Nakashima E, Benet LZ (1989) An integrated approach to pharmacokinetic analysis for linear mammillary systems in which input and exit may occur in/from any compartment. J Pharmacokinet Biopharm 17:673
4. Bailer AJ (1988) Testing for the equality of area under the curve when using destructive measurement techniques. J Pharmacokinet Biopharm 16:303
5. Yeh C (1990) Estimation and significance tests of area under the curve derived from incomplete blood sampling. In: ASA Proceedings of the Biopharmaceutical Section, p 4
6. Nedelman JR, Jia X (1998) An extension of Satterthwaite's approximation applied to pharmacokinetics. J Biopharm Stat 8(2):317
7. Hing JP, Woolfrey SG, Wright PMC (2001) Analysis of toxicokinetic data using NONMEM: impact of quantification limit and replacement strategies for censored data. J Pharmacokinet Pharmacodyn 28(5):465
8. Benet LZ (1972) General treatment of linear mammillary models with elimination from any compartment as used in pharmacokinetics. J Pharm Sci 61:536
9. Benet LZ, Galeazzi RL (1979) Noncompartmental determination of the steady-state volume of distribution. J Pharm Sci 68:1071
10. Gibaldi M, Perrier D (1982) Pharmacokinetics, 2nd edn, revised and expanded. Marcel Dekker, New York, NY
11. Jusko WJ (1992) Guidelines for collection and analysis of pharmacokinetic data. In: Evans WE, Schentag JJ, Jusko WJ (eds) Applied pharmacokinetics: principles of therapeutic drug monitoring, 3rd edn. Applied Therapeutics, Spokane, WA
12. Rowland M, Tozer T (2010) Clinical pharmacokinetics and pharmacodynamics: concepts and applications, 4th edn. Lippincott Williams & Wilkins, Maryland
13. Houston JB (1994) Kinetics of disposition of xenobiotics and their metabolites. Drug Metab Drug Interact 6:47
14. Pang KS (1985) A review of metabolic kinetics. J Pharmacokinet Biopharm 13(6):633
15. Watari N, Benet LZ (1989) Determination of mean input time, mean residence time, and steady-state volume of distribution with multiple drug inputs. J Pharmacokinet Biopharm 17(1):593
16. Gillespie WR (1991) Noncompartmental versus compartmental modeling in clinical pharmacokinetics. Clin Pharmacokinet 20:253
Chapter 17

Compartmental Modeling in the Analysis of Biological Systems

James B. Bassingthwaighte, Erik Butterworth, Bartholomew Jardine,
and Gary M. Raymond

Abstract
Compartmental models are composed of sets of interconnected mixing chambers or stirred tanks. Each
component of the system is considered to be homogeneous, instantly mixed, with uniform concentration.
The state variables are concentrations or molar amounts of chemical species. Chemical reactions, transmem-
brane transport, and binding processes, determined in reality by electrochemical driving forces and con-
strained by thermodynamic laws, are generally treated using first-order rate equations. This fundamental
simplicity makes them easy to compute since ordinary differential equations (ODEs) are readily solved
numerically and often analytically. While compartmental systems have a reputation for being merely
descriptive, they can be developed to levels providing realistic mechanistic features through refining the kinetics.
Generally, one is considering multi-compartmental systems for realistic modeling. Compartments can be
used as “black” box operators without explicit internal structure, but in pharmacokinetics compartments are
considered as homogeneous pools of particular solutes, with inputs and outputs defined as flows or solute
fluxes, and transformations expressed as rate equations.
Descriptive models providing no explanation of mechanism are nevertheless useful in modeling of many
systems. In pharmacokinetics (PK), compartmental models are in widespread use for describing the
concentration–time curve of a drug following administration. This gives a description of
how long it remains available in the body, and is a guide to defining dosage regimens, method of delivery,
and expectations for its effects. Pharmacodynamics (PD) requires more depth since it focuses on the
physiological response to the drug or toxin, and therefore stimulates a demand to understand how the
drug works on the biological system; having to understand drug response mechanisms then folds back on
the delivery mechanism (the PK part) since PK and PD are going on simultaneously (PKPD).
Many systems have been developed over the years to aid in modeling PKPD systems. Almost all have
solved only ODEs, while allowing considerable conceptual complexity in the descriptions of chemical
transformations, methods of solving the equations, displaying results, and analyzing systems behavior.
Systems for compartmental analysis include Simulation and Applied Mathematics, CoPasi (enzymatic
reactions), Berkeley Madonna (physiological systems), XPPaut (dynamical system behavioral analysis),
and a good many others. JSim, a system allowing the use of both ODEs and partial differential equations
(that describe spatial distributions), is used here. It is an open source system, meaning that it is available for
free and can be modified by users. It offers a set of features unique in breadth of capability that make model
verification surer and easier, and produces models that can be shared on all standard computer platforms.

Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_17, © Springer Science+Business Media, LLC 2012


Key words: Physiological and pharmacologic modeling, PKPD, Pharmacokinetics–pharmacodynamics, Compartmental systems, Systems biology, Physiome, JSim, CellML, SBML, Reproducible research, Unit checking, Verification, Validation, Ordinary and partial differential equations, Optimization, Confidence limits

1. Introduction

1.1. Overview of the Topic

Compartmental analysis implies the use of linear first-order differential
operators as analogs for describing the kinetics of drug distribution
and elimination from the body. Concentrations are measured in
accessible fluids, usually the plasma, and the concentration–time
curve is used to provide a measure of how long the drug concentration remains at a therapeutic level. This is the basis of pharmacokinetics (PK). The influences on efficacy and utility are summarized by the term ADME: administration, distribution, metabolism, and elimination. Drugs are given in chemically significant amounts;
they bind to enzymes, channels, receptors or transporters, changing
reaction rates and fluxes in concentration-dependent fashion. The
effect on the biological system is termed pharmacodynamics (PD).
Precise mathematical statements about the kinetics and the body’s
responses comprise the combination pharmacokinetics–pharmaco-
dynamics (PKPD).
Compartmental analysis had its historical start with the use of
tracers. Tracer-labeled compounds were used in order to determine
kinetics when the drug concentrations were too low to be measured
chemically. Radioactive tracers were given in such low concentra-
tions relative to those of native non-tracer mother substances that
the kinetics were in fact linear. Consider a reaction rate, k(C), that is
dependent upon the concentration of the mother substance of
concentration C(t):
Flux of mother substance = k(C) · C(t).    (1)

When tracer of concentration C′ is added to the system, then

Total flux, mother and tracer = k(C + C′) · [C(t) + C′(t)].    (2)

When C′ ≪ C, then the rate constant is determined solely by
C, as k(C + C′) ≈ k(C), and the rate constant is independent of
the tracer concentration:

Tracer flux of C′(t) = k(C) · C′(t),    (3)

where the flux is first order in C′ when the background non-tracer
mother substance concentration is constant. When only the tracer is
changing concentration, the k(C) is constant and the system is first
order and linear. In general then, one can look upon tracers in
compartmental systems as being linear, first-order systems, though
nowadays they can go far beyond that. The originators and later
proponents of compartmental analysis (Berman (1); Jacquez (2, 3);
Cobelli et al. (4)) used this simplification, but were always aware of
the greater possibilities of allowing nonlinear coefficients. Berman’s
classic 1963 article (1) provides much more than solutions to ordi-
nary differential equations (ODEs) for he outlines an important
philosophic approach to modeling in general. Jacquez’ books and
many articles, and the book by Cobelli et al. (4), give detailed
mathematical approaches and explicit applications. The desire to
use linear kinetics was not so much to avoid solving nonlinear
equations as it was to use linear algebra to solve the differential
equations. A system of linear differential equations can be solved
by matrix inversion and can provide the much desired analytical
solutions. As we shall see below, analytical solutions are still desired,
for they serve as verification that the numerical solutions produced
by modern simulation systems are correct in specific reduced cases,
and thereby imply that the nonlinear system solutions in that neigh-
borhood of parameter space are also correct. But, because most
biological phenomena are nonlinear, such that the rate coefficients
vary with the concentrations of one or usually more solutes, tem-
perature, and pH, we have to acknowledge right at the start that
using linear compartmental systems analysis is an approximation.
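The linearity argument of Eqs. 1–3 is easy to verify numerically. A Python sketch using a Michaelis–Menten flux (hypothetical parameters) shows that a trace addition barely perturbs k(C):

```python
# Numerical illustration of Eqs. 1-3: with a Michaelis-Menten flux
# v(C) = Vmax*C/(Km+C), the rate "constant" k(C) = v(C)/C depends on C,
# but a trace addition C' << C barely perturbs it, so the tracer flux
# is effectively linear: flux' ~ k(C)*C'. Parameter values are hypothetical.
Vmax, Km = 10.0, 2.0
C, Cp = 5.0, 0.005             # mother substance and tracer (C' << C)

def k(conc):
    return Vmax / (Km + conc)  # k(C) = v(C)/C for Michaelis-Menten kinetics

tracer_flux_exact = k(C + Cp) * Cp   # the tracer's share of the total flux
tracer_flux_linear = k(C) * Cp       # Eq. 3 approximation
rel_err = abs(tracer_flux_exact - tracer_flux_linear) / tracer_flux_exact
print(rel_err < 1e-3)          # True: the linear approximation is excellent
```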
Compartmental analysis was mostly descriptive, not mechanistic. It was “Black Box,” not attempting to define enzymatic reactions mechanistically but to describe the time course. “White Box” modeling is where the innards of the operational analysis attempt to
describe mechanism, not just the kinetics of a relationship. Never-
theless, the descriptive level was a success; in FDA reviews quanti-
tative descriptions are valuable, for they distinguish groups of
responses and allow categorization even when they cannot provide
a physiological interpretation. The plasma concentration–time data
are very useful in choosing methods of administration and in defining dosage regimens.
Modern molecular biology and emerging integrative multi-scale
modeling analysis likewise are changing the game. Personalized
medicine is pointedly mechanistic, with cell and molecular physiol-
ogy dominating in the strategies of Administration, Distribution,
Metabolism, and Elimination (ADME). Fortunately the huge
increase in the rates of acquisition of data, causing a demand for
detailed, informative but complex simulation analysis has been more
than compensated for by the increases in computational speed and in
improved software facilitating modeling analysis. Most importantly,
software sharing is now relatively easy, and of much increased impor-
tance since comprehensive models may take years in development.
Now, highly nonlinear complex systems, spatially distributed or
lumped, are handled with faster computation, and can include kinet-
ics and detailed physiological pharmacodynamics (the PD of
PKPD), which is the systematic analysis of the body’s responses to
the drug or toxin. Given the relevance of physiological transport
processes (diffusion, flow, transmembrane exchange, binding) in
both administration and distribution, a relatively new term, PBPK,
physiologically based pharmacokinetics, has arisen to recognize the
importance of incorporating anatomy and physiology into PK.
Drugs are toxins. It is only a matter of dosage. The struggle to
distinguish acceptable toxicity from unacceptable toxicity is the
central conflict, not just for cancer therapy but also in defining
drug usage in general. Aspirin, ibuprofen, oxycodone, sugar, and
water, all create problems when in excess. Distinguishing “Thera-
peutic Dose” from “Toxic Dose” depends on the drug and on the
particulars of the patient (size, age, body fat level, other drugs,
physiological state and past history, genetic heritage). Many sub-
stances in our environment augment the difficulties, adding other
sources of specific or general toxicity.
Drugs and toxins have many common features; for example the
lipid solubility that allows easy permeation of cell membranes, so
desirable in drugs, is the source of the problems with inhaled
toxicants. While the body has evolved a rather general system for
dealing with foreign toxicants, the P450 system in the liver that
handles hundreds of different chemicals, it is also good at degrad-
ing and excreting drugs as well. As a corollary, hepatotoxicity can be
a problem with drugs. Likewise, renal damage can be a risk from
those drugs excreted in the urine, like ibuprofen.
Computer modeling includes all phases of ADME. The
method of Administration by swallowing a pill is the commonest
but is not as fast as intravenous (i.v.) injection. Intravenous is the
method with the best defined administrative kinetics, followed by
intramuscular (i.m.). There are a host of other local injection types,
slow release subdermally, suppositories, inhalation, sublingual
absorbance, etc., all of which have different rates of drug delivery
into the circulation and to the target. Since the exposure of the
target to the drug is most often measured in terms of the AUC
(Area Under the Curve of the drug’s plasma concentration versus
time) the differences amongst methods of delivery are important.
The AUC is influenced by dilution in the circulation (part of
Distribution), degradation by hydrolysis or metabolism (the M of
ADME), and elimination or excretion (the E of ADME), so a
pharmacokinetic model must include all of these. See earlier chap-
ters on Fundamentals of ADME and on Modeling of Absorption.
In the neighborhood of the target (enzyme, receptor, channel,
transporter, or transcription regulator) much depends on physico-
chemical attributes of the drug or toxin. Does it bind to plasma
proteins? What is the affinity of the target proteins compared to that
of other competing proteins, or DNA or membrane-bound pro-
teins? What are the on- and off-rates of any competing binding
sites? What is the drug’s tissue/blood partition coefficient or its
solubility in body fat? One of the standard ways of getting clues on
the fate of the drug is to do whole body distribution studies on rats
to see where it is deposited at a succession of points in time. The
distribution sites of positron-labeled drug may be shown with high
resolution in reconstructed 3D imaging using MicroPET (Positron
Emission Tomography) systems. Such observations give data on
Distribution and Excretion and sometimes Metabolism. Since the
retention times in specific tissue locations influence, and may dom-
inate, the AUC, this information is crucial for optimizing effect on
the target while minimizing toxic side effects. Modeling analysis
then ideally provides specific information on all of these aspects of
the kinetics, and further provides the information essential for a
critical understanding of the pharmacodynamics. A by-product of a
good understanding of the PK is that it allows one to consider using
combined drug therapy wherein a second drug is used in advance to
protect a sensitive binding site by binding to it harmlessly and
preventing the toxicant drug from binding. Variants on this
theme might be combining therapy with a drug preventing the
activation of a receptor whose binding site accepts the drug of
interest.
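The protective pre-binding idea can be illustrated with a simple equilibrium competition calculation (Python; the occupancy expression is the standard competitive-binding isotherm, and all numbers are hypothetical):

```python
# Sketch of "protective pre-binding": equilibrium occupancy of a site by a
# toxicant T in the presence of a competing protectant P,
#   theta_T = (T/Kt) / (1 + T/Kt + P/Kp).
# All concentrations and dissociation constants are hypothetical.
def occupancy(t_conc, kt, p_conc, kp):
    return (t_conc / kt) / (1 + t_conc / kt + p_conc / kp)

unprotected = occupancy(1.0, 0.1, 0.0, 1.0)   # toxicant alone
protected = occupancy(1.0, 0.1, 50.0, 1.0)    # competing drug pre-bound
print(round(unprotected, 2), round(protected, 2))
```

A large excess of a weak, harmless competitor sharply reduces the toxicant's occupancy of the site, the effect described in the text.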

1.2. Model Types, Topologies, and Equation Types

1.2.1. In Terms of Input and Output Characteristics We May Classify Compartmental Systems as Closed or Open

1. Closed system: No sinks or sources; literally, all fluxes between
inside and outside are zero, and external driving forces are all
zero.
2. Open system: There are external sinks and/or sources for some
of the constituents in some of the compartments or cells. (Sinks
are defined as operations via which a substance vanishes from
the system; sources are operations generating a substance.)

1.2.2. In Terms of Interconnections Amongst Compartments or Cells, the Topology of the Network Is Useful in Defining Mathematical or Analytical Approaches

1. Catenary system: Two or more compartments arranged in
series.
2. Cyclic system: Three or more in series, with the last connected to
the first allowing a circulating flux.
3. Mammillary: Two or more peripheral compartments connected
to a central compartment and having no cyclic components, e.g.,
a blood compartment connected to each organ in the body.
Equating the circulatory system, for example, to a mammillary
compartmental system raises questions. “How can this be a rational
description?” Total circulatory mixing in humans requires many
minutes. Is the rate of solute escape from blood so slow that mixing
throughout the whole circulation is fast in comparison to the rate of
exchange within the organs? Solute-binding to plasma proteins
might so retard escape from the blood that the approximation is
adequate. Alternatively, consider the mouse, where the circulatory
mixing time is a few seconds, but permeabilities are similar to those
in humans, so equilibration in tissues is slow compared to circula-
tion times; here the idea that the system is mammillary is more
reasonable. The basic compartmental premises, instantaneous
mixing throughout the compartment, and therefore uniform internal compartmental concentrations, are almost never truly valid;
being alert to this implies the next step: evaluating the error due
to failure to fulfill the requirements.
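A minimal mammillary example, a central compartment exchanging with one peripheral compartment, can be written as a pair of first-order ODEs and integrated directly. A Python sketch with hypothetical rate constants and a hand-rolled RK4 step:

```python
# Minimal two-compartment mammillary model (central + one peripheral):
#   dA1/dt = -(k10 + k12)*A1 + k21*A2     (central, with elimination k10)
#   dA2/dt =  k12*A1 - k21*A2             (peripheral)
# Rate constants are hypothetical; integration is a hand-rolled RK4.
k10, k12, k21 = 0.1, 0.3, 0.2

def deriv(a):
    a1, a2 = a
    return (-(k10 + k12) * a1 + k21 * a2,   # central: elimination + exchange
            k12 * a1 - k21 * a2)            # peripheral: exchange only

def rk4_step(a, dt):
    d1 = deriv(a)
    d2 = deriv(tuple(x + dt / 2 * d for x, d in zip(a, d1)))
    d3 = deriv(tuple(x + dt / 2 * d for x, d in zip(a, d2)))
    d4 = deriv(tuple(x + dt * d for x, d in zip(a, d3)))
    return tuple(x + dt / 6 * (p + 2 * q + 2 * r + s)
                 for x, p, q, r, s in zip(a, d1, d2, d3, d4))

a, dt = (100.0, 0.0), 0.01             # 100-unit i.v. bolus into central pool
for _ in range(2400):                  # integrate to t = 24
    a = rk4_step(a, dt)

print(round(a[0], 2), round(a[1], 2))
```

The total amount remaining in the two pools falls below the dose because of the elimination term k10, the kind of mass-balance check a simulation system should make easy.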
In terms of types of equations, models may be expressed in
many forms, but physiological models set up for solving by numer-
ical methods are mainly in the form of ODEs, Differential Alge-
braic Equations (DAEs), and partial differential equations (PDEs).
First-order ODEs are central to compartmental modeling, and
may be linear and nonlinear, for example a Michaelis–Menten
equation. A set of first-order equations set up with a single variable
to the left of each equal sign is said to be in state variable form. See
the Background chapters on Linear Algebra and ODEs. ODEs and
DAEs may be mixed together in a model, where the DAEs define
variables used in the ODEs, either implicitly or explicitly. Seeking
solutions sometimes requires prior analysis, but solvers like JSim
and Matlab can handle many implicit forms. PDEs are becoming
more common in PKPD problems now that computation is fast.
Especially recently it is being recognized that there are usually
concentration gradients along the lengths of capillaries for con-
sumed substrates and for drugs during the uptake phase; the
existence of gradients violates the compartmental assumption of
uniform concentration (5, 6) and causes errors in the estimation of
permeabilities and conductance parameters of all sorts. One-
dimensional PDEs usually suffice for capillary–tissue exchange
since in well-perfused organs the radial distances between blood
and cells are so short that radial diffusional retardation is negligible
compared to membrane permeation across endothelial cells or
parenchymal cell membranes.
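A one-dimensional capillary PDE of the kind described can be sketched with the method of lines (Python; upwind differencing, hypothetical parameters, and the tissue return flux ignored for brevity):

```python
# Sketch: one-dimensional capillary transport with permeation loss,
#   dC/dt = -v*dC/dx - (PS/Vcap)*C,
# solved by upwind finite differences (method of lines, explicit Euler).
# Parameters are hypothetical; the tissue return flux is neglected.
N, L = 50, 0.1                 # grid points, capillary length (cm)
v, ps_over_v = 0.1, 1.0        # velocity (cm/s), PS/Vcap (1/s)
dx = L / N
dt = 0.4 * dx / v              # CFL-stable time step
cin = 1.0                      # constant inflow concentration

c = [0.0] * N
for _ in range(int(5.0 / dt)):             # run to near steady state (5 s)
    new = c[:]
    for i in range(N):
        upstream = cin if i == 0 else c[i - 1]
        new[i] = c[i] + dt * (-v * (c[i] - upstream) / dx
                              - ps_over_v * c[i])
    c = new

print(round(c[-1], 3))   # outflow concentration, reduced by permeation
```

The result approximates the plug-flow steady state C(L) = C_in·exp(-(PS/V_cap)·L/v), and the axial concentration gradient it exhibits is exactly what a single well-mixed compartment cannot represent.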

1.3. Distribution from the Site of Administration

Distribution, the D of ADME, is by convection, diffusion, and
permeation, and must precede the drug’s therapeutic and degradative
reactions at target sites or metabolic sites. Distribution includes the
reversible transfer of a drug between one compartment and another.
Some factors affecting drug distribution include regional blood flow
rates, volumes of interstitial and cellular spaces, molecular size, polar-
ity, and binding to serum proteins or other nontarget sites, forming
complexes. In using the AUC of plasma concentration versus time,
one should keep in mind that plasma protein-bound drug is having
no effect at the targeted site. Administration and Distribution are
best treated together as it is the combination of processes that
governs the effective concentration reaching the target.
Whatever the administration method (delivery on nanoparticles,
laser-induced controlled release, ultrasound enhancement of intra-
dermal diffusion, pill decomposition rates, location in gut, etc.), the
next phases of Distribution are physiological processes that are fairly
well understood (convection, permeation, diffusion, transport across
membranes or intracellular microtubular transport), but not
necessarily well characterized for the particular drug. Heterogeneities
in regional flows, capillary densities, and tissue composition may have
to be accounted for. These precede the reaction, binding, inhibition,
or receptor-mediated responses that compose the desired pathophys-
iological responses. Metabolic reactions, sequestration, uptake, and
excretion by epithelial cells (liver, kidney, saliva, skin, etc.), as by the
liver’s P450 system, or other reactions which inactivate (glucuronida-
tion, glycosylation) are all a part of the ADME model and have to be
considered in any PK modeling. For compounds which are degraded,
the metabolic products have to be assessed for long-term effects.
Compounds excreted by the liver or kidney become concentrated,
even 100-fold, in the process of elimination in urine or bile, and if the
drug is not conjugated or inactivated in some way there may be
damage to the excretory organ, compromising the organ function
and changing the pharmacokinetics after prolonged usage, and rais-
ing the AUC following each administration. Both renal glomerular
filtration and tubular secretion and hepatic biliary excretion create
high concentrations of the drug or metabolites.
At the level of the target, quantitative structure–activity relationship (QSAR) (sometimes quantitative structure–property relationship, QSPR) comes into play. This is the process by which
chemical structure is quantitatively correlated with biological or
chemical reactivity. Biological activity can be expressed quantita-
tively as in the concentration of a substance required to give a
certain biological response, but in the context of quantitative PK
analysis, one would prefer that the PD (pharmacodynamics or
biological response) also be expressed in mechanistic terms. Addi-
tionally, when the physicochemical properties are expressible in
numbers, one can formulate the quantitative structure-activity rela-
tionships in the form of a mathematical model. The mathematical
model may also predict the biological response to related chemical
compounds. But it is the concentration of the bound agent com-
plexed to the target site that determines the level and duration of
the response.
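A toy version of such a quantitative structure–activity model is an ordinary least-squares fit of activity against a single descriptor. A Python sketch with invented data (real QSAR uses many descriptors and proper validation):

```python
# Toy QSAR sketch: correlate a physicochemical descriptor (logP) with
# measured activity via ordinary least squares. The data are invented
# purely for illustration.
logp = [0.5, 1.0, 1.8, 2.5, 3.1, 3.9]
activity = [4.1, 4.6, 5.3, 6.0, 6.5, 7.4]      # e.g., pIC50 (hypothetical)

n = len(logp)
xbar, ybar = sum(logp) / n, sum(activity) / n
slope = sum((x - xbar) * (y - ybar) for x, y in zip(logp, activity)) \
        / sum((x - xbar) ** 2 for x in logp)
intercept = ybar - slope * xbar

def predict(x):
    """Predicted activity for a related compound with descriptor x."""
    return slope * x + intercept

print(round(slope, 2), round(intercept, 2))
```

Such a fitted model can then, as the text notes, predict the biological response of related chemical compounds, within the limits of the descriptor set.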
There are a good many useful measures in considering the PK
of ADME. These include, in addition to AUC (exposure), Cmax
(maximum concentration), Tmax (Time to Cmax), half-life, clearance
route and rate, volume of distribution, and bioavailability in
unbound form. In the steady state of long-term administration
one assesses the plasma concentrations, total accumulation, linear
or nonlinear PK, time-dependent changes in kinetics, and metabo-
lites, their identity, and their PK. It is the combination of PK and
PD that allows knowledgeable optimization of dosage regimens.
A well-developed PKPD model will account for most of these
considerations and thereby be predictive and advisory to the thera-
pist. A clear description of the kinetics allows the planning of
efficient dosage regimens.
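Several of these measures fall straight out of a sampled concentration–time profile. A Python sketch (hypothetical data):

```python
# Standard exposure measures from a sampled concentration-time profile:
# Cmax, Tmax, and AUC by the trapezoidal rule. Data are hypothetical.
t = [0, 0.5, 1, 2, 4, 8, 12]
c = [0.0, 3.2, 4.8, 4.1, 2.6, 1.0, 0.4]

cmax = max(c)                              # maximum concentration
tmax = t[c.index(cmax)]                    # time of maximum concentration
auc = sum((t[i + 1] - t[i]) * (c[i] + c[i + 1]) / 2
          for i in range(len(t) - 1))      # trapezoidal AUC over the samples
print(cmax, tmax, round(auc, 2))
```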
2. Software for Systems Description, Simulation, and Data Analysis

2.1. Common Software and Methods Used in the Field

Pharmacokinetic models have been written in virtually every
computer language that exists, and it is a field that has stimulated
the development of a large set of relatively specialized simulation
systems. A partial list of simulation software for compartmental
analysis goes back half a century:

SAAM, Simulation and Analysis Modeling, was the first, developed by
Mones Berman at NIH for analyzing tracer kinetics (http://depts.washington.edu/saam2/). It still exists as SAAMII.
SIMCON, a general simulation control system (7), now evolved
into JSim, was used to solve FORTRAN-based models of all
sorts.

XPPAUT, from Bard Ermentrout (http://www.math.pitt.edu/~bard/xpp/xpp.html). XPPAUT is particularly good for bifurcation analysis of dynamical systems.

Gepasi, now CoPasi, from Pedro Mendes (http://www.softpedia.com/progDownload/Gepasi-Download-167140.html; http://www.copasi.org/tiki-view_articles.php). CoPasi is especially good for enzymatic reactions and biochemical systems, allowing a menu of choices for reaction types.

Modelica, http://www.openmodelica.org/, is excellent for linking
operators and presenting the forms of model networks (http://www.ida.liu.se/labs/pelab/modelica/OpenSourceModelicaConsortium.html).

Jarnac is designed for symbolic or diagrammatic entry for biochemical and gene regulatory reactions (from Herbert Sauro (8, 9), http://sys-bio.org/jarnac/).

JSim: Developed from SIMCON and XSim (for X-windows Linux
systems, http://www.physiome.org/software/xsim/) into
JSim (10) (http://www.physiome.org/jsim/). JSim was developed by Erik Butterworth (11); it provides automated unit
balance checking and unit conversion, thus avoiding errors
due to inconsistency in the units used in the code.

NONMEM, nonlinear mixed-effects modeling, is a commercial
software package providing the capability to use a wide variety
of pharmacokinetic models. It is particularly designed for the
analysis of sparse data sets using combinations of single-patient
and population data (http://www.iconplc.com/nonmem).

BioSPICE (derived from SPICE, for biology: http://sourceforge.net/projects/biospice) is designed for molecular biology. See
also http://jigcell.cs.vt.edu/software.php for JigCell, based on
BioSPICE.

Cellular Open Resource (COR) is for a Windows environment:
http://cor.physiol.ox.ac.uk/, for cellular-level physiological
systems. PCEnv for physiological systems is being developed
from it.

Stella (http://www.iseesystems.com, a commercial system) for
networks of operators such as in compartmental systems.

StochSim (http://www.ebi.ac.uk/~lenov/stochsim.html) is written explicitly for the treatment of molecular interaction when
there are few molecules and interactions occur stochastically.
These simulation systems have one purpose in common: to
make the programming of models simpler and to facilitate the
analysis of experimental data in terms of the parameterized descrip-
tions of the kinetics. They vary considerably with respect to their
representation of physical/chemical mechanisms. Such modeling
and analysis systems do not displace FORTRAN, C, C++, and Java
as mainstream languages, but rather they replace the front-end
entry to formulate the models and interface them to data sets.
None of these have the general capabilities of Matlab or Mathema-
tica, nor do they attempt algorithmic manipulation as in Maple, but
are more directly tuned to the user’s needs, as will be described.

2.2. A Preferred Simulation System, JSim

JSim is perhaps the most general of these simulation analysis systems,
designed for the analysis of experimental data. It is built around a
“project file, .proj,” that may hold many data sets, several different
models, and results of multiple analyses. JSim handles not only the
ODEs around which traditional compartmental modeling is built,
but also DAEs, implicit functions, PDEs, and stochastic equations.
JSim, uniquely, and from its beginning in 1999, uses unit balance
checking and automated unit conversion. (Unit balance checking
assures that the units of the expressions on the left of the equal sign
are the same as those on the right. Automated unit conversion means
that when time is expressed in minutes a velocity expressed in cm/
s will be converted to cm/min by multiplying cm/s by 60 s/min.)
This pair of features is a great boon in programming since in the first
phase of compilation it automates the first stage of verification of the
model’s mathematical implementation by making sure that every
equation has unitary balance. The second phase of compilation parses
the details of the equations, and sequences them for efficient compu-
tation. The run-time code is compiled into Java, which now runs
almost as fast as FORTRAN and C. (On a cardiovascular–respiratory
system model JSim ran exactly 300 times faster than a Matlab–Simu-
link version of the identical model.) JSim’s advantages over the ODE-
based systems listed above are the following:
1. Runs on Linux, Macintosh, and Windows.
2. Is free and downloadable from www.physiome.org. On the
Macintosh it takes about 30 s to download and install, and
another 10 s to bring up a model.
400 J.B. Bassingthwaighte et al.

3. Is the only one that solves PDEs and offers an assortment of solvers
for both PDEs (three available now) and ODEs (eight available).
4. Imports and exports both SBML and CellML archival forms.
5. Provides sensitivity analysis of two types, relative and absolute.
6. Graphical output is immediately available during the simulation
run and setup in seconds.
7. Has seven built-in optimizers for excellent power in parameter
adjustment to fit data.
8. Provides the covariance matrix giving the correlation among
free parameters and estimates of parameter confidence limits.
9. Uses project files that allow the analysis of many experimental
data sets in one file.
10. Stores parameter sets so that individualized parameter sets for
each data set can be stored.
11. Allows the use of several models within one project file so that
competing hypotheses (models) can be compared and evaluated.
12. Is structured so that the front-end parameter control and
graphical user interface (GUI) can be framed explicitly for any
model.
13. Has linear and log line graphs, 2D contour plots representing
3D, and phase-plane plots.
14. Has “looping” capability, allowing discrete successive jumps of
the values of one or two parameters at a time in order to
explore system sensitivities visually and rapidly.
15. Uses a Mathematical Modeling Language, MML, in which one
writes the equations directly, for simultaneous solution, and in
which the order of the equations is not specified.
There are no special requirements for the JSim software or for
its methods of use for model building and exploring or use in
analysis with respect to hardware, computing platform, or operating
system. It has important limitations, not being a procedural lan-
guage but a declarative mathematical language. This means there is
no equivalent of a FORTRAN DO-loop (or GO TOs or jumps). It
cannot yet do matrix inversions (except through a special mecha-
nism), and is in a continuing state of development. JSim 2.0,
released in February 2011, is based on a new compiler providing
many new features described at nsr.bioeng.washington.edu/JSim/.
The features listed above, and others not listed, have been
implemented in JSim because the years of experience with a large
variety of models, with teaching graduate and undergraduate clas-
ses, and postdoctoral and faculty workshops have led to a detailed
understanding of how people use modeling in scientific research.
Experiment design, hypothesis testing, and system parameteriza-
tion are given priority in the conveniences provided.
17 Compartmental Modeling in the Analysis of Biological Systems 401

Thus JSim, since it is designed around the analysis of experimen-
tal data, is our preferred software and will be used for the compart-
mental modeling shown next.

3. Compartmental Modeling

3.1. The Modeling Process

The overall process in the experiment/model hypothesis iteration
loop of Platt (12) is as follows: (1) express the hypothesis in
quantitative terms, as a mathematical model, with units on every-
thing; (2) use the model to determine the best experiment that
might contradict the predictions of the model, or, better yet,
develop an alternative model that is seemingly as good but makes
different predictions, and then design the experiment that clearly
distinguishes between the models; and (3) do the experiment and
analyze the data. One of the two competing models, maybe both,
must be proven wrong, and so science is advanced.
The normal data analysis using models begins by putting the
data to be analyzed in a “project file,” modelname.proj, and dis-
playing them on the JSim plot-pages. The second stage, coding and
model verification in accord with standards (http://www.imagwiki.
nibib.nih.gov/mediawiki/index.php?title=Working_Group_10),
is building and testing the model, incorporating reference analytical
solutions if appropriate to verify the solutions as being mathemati-
cally accurate, and representing the equations. The “project file”
may contain two or more models so that alternative model forms
can be compared directly by examining the solutions (changing
parameters and rerunning, using “loops” to automatically change
parameters, using behavioral analysis, plotting in various forms
including phase-plane plots, contour plots). The verification stage
is to show that the model solutions are computed correctly, done by
testing different solvers, using different time-step sizes, and com-
paring with analytical solutions in special cases.
The validation stage is to test the fitting of the model solutions
against the experimental data. The word “validation” is truly opti-
mistic, because a good fit of the model solution to the data does not
really validate the model, but merely fails to invalidate it. It is the
failure of the model that leads to the scientific advances by forcing
new ideas to be incorporated. Nevertheless, fitting of the model to
the data provides characterization of the data, augments diagnostic
acuity, assesses progress of disease or evidence of successful therapy,
and is generally useful in reconciling the working hypothesis
(model) with observations.
The final phase is preparing the model so that it can be reproduced
by others. This is not only critical from a tutorial point of view but also
in fact is a requirement for any scientific publication. Anything that
cannot be reproduced is misleading and wastes time and money.
Reproducible models can be tested by others or used as building
blocks to advance the field when they pass muster.

3.2. A Simple Compartmental Model Implemented in JSim

The modeling code. For an introduction to JSim we use a two-
compartment closed system with passive exchange between the
compartments, and a conversion reaction of solute A to solute B
in either or both compartments. This model has analytic solutions
which could be used either to show the solutions or to provide
verification of the accuracy of the numerical solution, but since
these solutions run no faster than the numerical solutions, they
will not be used here. Detailed instruction in JSim use is available at
http://www.physiome.org/jsim/. This model is #246.
Many model programs are available at http://www.physiome.
org/Models. One can search to find a model similar to what one
might like to construct, e.g., from a tutorial list of compartmental
models: http://www.physiome.org/jsim/models/webmodel/NSR/
TUTORIAL/COMPARTMENTAL/index.html.
Open model #246, Comp2ExchReact, and you will be asked to
allow the display on your computer, wait a moment for it to be
compiled, and then click on “Source” at the bottom of the JSim
page to show the source code, Table 1 for the model shown in
Fig. 1. All the models on the Web site are archived to keep track of
model changes (previous versions can be found under the “Model
History” section on each model Web page). (Models edited over
the Web cannot be saved on the Physiome Web server, so simply
save what you want to your own directory. The JSim system can
likewise be downloaded directly and the model worked on from
your own computer.) The model for the code in Fig. 1 is dia-
grammed at the bottom of the figure; it is a two-compartment
model for two substances, A and B. Both substances can passively
move from one compartment to the other. A is irreversibly con-
verted to B in either or both compartments. After the title a short
description is provided. (Text enclosed by /* ... */ is ignored by
the compiler, as is comment text following // on any line.)
An important JSim feature is invoked next, the reading of a
units file, nsrunit, which allows automated unit balance checking
and automated unit conversion to common units during the
compilation phase; MKS, CGS, and English units can all be
used. (The unit conversion can be turned off, a feature useful
when importing models from CellML or SBML which sometimes
have unit conversion factors hidden as dimensionless factors in their
archived models, whereas they should be dimensioned, e.g., as
60 s/min.)

Fig. 1. JSim code for a two-compartment model in which a solute A can exchange across
a membrane between volume V1 and V2 and can react irreversibly to form solute B in
either compartment. This is available for download at www.physiome.org/jsim/models/
webmodel/NSR/Comp2ExchangeReaction/index.html and is model #246.

Table 1
Code for two-compartmental model with passive bidirectional exchange

The word "math" and the curly bracket designate the start of
the model code, written in JSim’s MML, which will be seen to be
merely the equations for the model. The “realDomain” defines the
independent variable as t, time in seconds, along with a starting and
ending time and a time interval for graphing the solutions.
The parameters of the model are assigned units. We have put
the units in physiological form in order to represent those for a
perfused tissue, so much per gram of tissue. In general it is practical
to avoid mistakes by using the same units as used for the
experimental studies in which the data are acquired. The volumes of
the two compartments are V1 and V2 ml/g. In this context, flow
per gram of tissue (F), clearances (Cl), and permeability-surface
area products (PS) all have the same units, ml/(g min). (The PS is
defined as permeability, P cm/min, times membrane surface area
per gram of tissue, S cm2/g.) The reaction is described as a clear-
ance, but here is given the symbol G, ml/(g min) for a gulosity or
consumption. The solute A is converted from A to B. (The termi-
nology used here is that used by the American Physiological Society
(13), where an extensive set of terms for transport, flow, and
electrophysiology are given.)
The model's variables are functions of the independent variable,
time; for example, the concentration of A is defined as a real number
and as a function of time by "real A(t)."
be written without the “(t),” but in JSim’s MML the use of the (t)
initially is required to establish it as a function of time. Other
languages such as Madonna and XPPaut do not demand this, but
JSim users find that this reduces errors. The MML is fairly similar to
that in those languages; the basic intention is just to write the
equations directly.
The ODEs, used to describe compartmental models, need to
be supplied with initial conditions, here provided as A10, A20,
B10, and B20, where the first subscript digit refers to the compartment
and the second is "0" to refer to the initial time t = t.min, which
can be negative or positive, but is usually t = 0.
In the ODEs, the derivative of A(t), which you would expect to
be dA/dt, is written A:t. The second derivative, d2A/dt2, would be
A:t:t. The equations are written in state variable form in this model,
for tidiness and simplicity, but this is not FORTRAN and one could
equally well write the equation as "V1*A1:t = A2*PSa - A1*(PSa + G1);".
In this latter form, the left-hand side, LHS, of the
equation is the rate of change of mass of solute A, (mol/g)/min =
mM * (ml/(g min)), for A1 in V1.
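For readers who want to see the structure outside JSim, here is a minimal Python sketch (not JSim MML) of the same two-compartment exchange-and-reaction equations, integrated with a hand-coded RK4 step. The parameter values and initial conditions here are illustrative stand-ins, not those archived in model #246; the closed-system mass balance that TotalC checks in the MML code is asserted at the end.

```python
# Illustrative Python sketch of the two-compartment exchange/reaction model.
# A1, A2 are concentrations of solute A in compartments 1 and 2; B1, B2 for B.
# A is exchanged with PSa, B with PSb; A -> B with clearances G1, G2.
# All parameter values below are hypothetical, chosen only to show structure.

def rhs(y, p):
    A1, A2, B1, B2 = y
    V1, V2, PSa, PSb, G1, G2 = p
    dA1 = (PSa * (A2 - A1) - G1 * A1) / V1
    dA2 = (PSa * (A1 - A2) - G2 * A2) / V2
    dB1 = (PSb * (B2 - B1) + G1 * A1) / V1
    dB2 = (PSb * (B1 - B2) + G2 * A2) / V2
    return [dA1, dA2, dB1, dB2]

def rk4(y, p, dt, nsteps):
    # classical fourth-order Runge-Kutta integration
    for _ in range(nsteps):
        k1 = rhs(y, p)
        k2 = rhs([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)], p)
        k3 = rhs([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)], p)
        k4 = rhs([yi + dt * ki for yi, ki in zip(y, k3)], p)
        y = [yi + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
    return y

# V1, V2 in ml/g; PSa, PSb, G1, G2 in ml/(g*min); concentrations in mM
p = (0.05, 0.15, 1.0, 1.0, 1.0, 0.0)
y0 = [1.0, 0.0, 0.0, 0.0]            # A1(0) = 1 mM, all else zero
yT = rk4(y0, p, dt=0.001, nsteps=5000)

def total_conc(y, p):
    # the analogue of TotalC: total mass divided by total volume
    A1, A2, B1, B2 = y
    V1, V2 = p[0], p[1]
    return (V1 * (A1 + B1) + V2 * (A2 + B2)) / (V1 + V2)

# The closed system conserves mass: TotalC is the same at every time step.
assert abs(total_conc(yT, p) - total_conc(y0, p)) < 1e-9
```

Because the system is closed and the reaction only moves mass from A to B, the volume-weighted total concentration is constant, which is exactly the check TotalC performs in the MML code.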

3.2.1. Unit Balance Checking

The acute observer will have noticed that the independent variable, t,
is in seconds: the differential equation therefore looks to have unbal-
anced units. It actually computes correctly since the phrase “unit
conversion on” at the top of the program preceding the model code
enlists the automated unit conversions so that in the compiled code
the multiplier of the left side, “60 s/min,” is inserted. Without unit
conversion on, the best way to handle this is to put the independent
variable t in minutes. In languages like Matlab there is no unit
checking. In huge projects like the Mars Climate Orbiter mission
(http://www.spaceref.com/news/viewpr.html?pid=2937), mixing
units from the European and American programs led to the crash
of the space vehicle and the termination of the billion dollar mission.
There would have been no problem in JSim as long as the units were
stated and unit conversion “on” (11). The nsrunit file may be viewed
by clicking "Debug" (left, bottom) to drop down a menu including
“View system units file.” The four ODEs in Fig. 1 are a complete
description of the system behavior after the initial moment. Follow-
ing them is an algebraic equation for TotalC, also a function of time,
which computes the total mass at each point in time divided by the total
volume, giving the average concentration. With the initial con-
ditions and volumes given, the result is 0.33333333 mM for every
time step. Run the model and then check the numbers for the
plotted variables by clicking on “Text” instead of “Graph” at the
bottom of the plot page, as in Fig. 2. The program end is marked by
a right curly bracket, beyond which one can put notes, comments,
key words, diagrams, references, etc.
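To make the unit-balance and automated-conversion ideas concrete, here is a minimal Python sketch (JSim's actual implementation is far more complete): a quantity carries a dimension map, addition refuses mismatched dimensions, and a stored factor such as 60 s/min is applied automatically when a time unit is normalized. The class and its conversion table are hypothetical illustrations, not JSim internals.

```python
# Minimal unit-balance/conversion sketch. CONV normalizes seconds to
# minutes with the factor 1/60 (i.e., 60 s = 1 min), mimicking the
# automated insertion of a 60 s/min multiplier described in the text.

CONV = {"s": ("min", 1.0 / 60.0)}

class Q:
    def __init__(self, value, units):
        # units is a map of unit name -> exponent, e.g. {"cm": 1, "s": -1}
        self.value = value
        self.units = {}
        for u, e in units.items():
            if u in CONV:                      # normalize to canonical unit
                base, f = CONV[u]
                self.value *= f ** e
                u = base
            self.units[u] = self.units.get(u, 0) + e

    def __add__(self, other):
        if self.units != other.units:          # the "unit balance" check
            raise ValueError("unbalanced units: %s vs %s"
                             % (self.units, other.units))
        return Q(self.value + other.value, dict(self.units))

v_cm_per_s = Q(2.0, {"cm": 1, "s": -1})        # 2 cm/s -> 120 cm/min
v_cm_per_min = Q(30.0, {"cm": 1, "min": -1})
total = v_cm_per_s + v_cm_per_min              # both now in cm/min
assert abs(total.value - 150.0) < 1e-9
```

Adding a quantity with different dimensions (say, mg/L to cm/min) raises an error at "compile" time, which is the first-phase verification the text describes.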

3.2.2. Graphical User Interface

The JSim GUI for simulation control is shown in Fig. 2. The left
panel has overlays for project contents: one or more models, data
sets, parameter sets, setups for solvers for ODEs and PDEs,
sensitivity analysis, optimization, and confidence limits. To start a
simulation run one clicks “RUN” at the top of the run time
window. On the right page one clicks on a “Message” panel for
error messages, or the plot pages (1_Conc or 2_ReacSite, names
chosen by the user), and at the page bottom choose “Graph” for
displaying the output graphically or “Text” for seeing the numerical
listings of the experimental data and the model solutions at each
time point.
There is an extensive introduction to JSim at www.physiome.
org/jsim giving precise detail to supplement this outline of usage.
The JSim MML code is pretty easy for a beginner to use since
it contains just the parameters with their units, the variables
followed by (t) to indicate their dependence on the independent
variable t for time, the initial conditions, and the equations for the
model. The most common mistakes are to misapply the uses of
commas and semicolons. Each of a sequence of events is usually
comma separated, and a string of them is closed with a semicolon.
The model code itself begins with the left curly bracket, “{” after
the word “math” and ends with the right curly bracket,“},”
after all of the equations have been written. Comments are pre-
ceded by a double slash, “//,” or alternatively can be preceded by
a “/*” followed by an “*/,” without the quote signs. Equations
end with a semicolon, including those in the initial condition
statement.

3.2.3. Exploring Parameter Influences Using the "LOOP" Mode of Operation

In loop mode, the user can choose to enter a sequence of values
(under Other Values) to explore model behavior widely, using
comma separation, e.g., 2, 3, 5, 8, etc., and in the "auto" mode
will do as many runs as there are values entered. One can also enter
arithmetic changes such as @*2 or @ + 3 or more complicated
expressions to indicate automatic changes in the starting value by
multiplication by 2 on each run, or the automatic addition of 3 on
each run, for a chosen number of runs (Fig. 3).

Fig. 2. Standard JSim Input/Output control and plot pages. Left panel: Runtime control: "Domain" is time, t, with tmin
starting time, tmax ending, and tdelta the time step for plotting. "Model Inputs" gives parameter values and initial conditions.
"Model Outputs" shows values of variables at the end of the run, at t = 60 s. The mass balance check, TotalC, is exact to
eight decimals. Right panel: The time courses of the concentrations of A and B in compartments 1 and 2 are shown for the
parameters and initial conditions shown in the left panel and in the code in Fig. 1. The user chooses the variables to plot,
and the colors and line type or point type. The title and labeling are user written and retained in the JSim project file. The
legend for the graphics output is automated.
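A rough Python analogue of the LOOP mode follows; the model function and parameter values are hypothetical stand-ins for a real simulation run, shown only to illustrate the explicit-list and "@*2"-style rules.

```python
# Sketch of JSim's LOOP mode: re-run a model while a chosen parameter
# steps through either an explicit list ("2, 3, 5, 8") or a rule such
# as "@*2" (double the current value on every run).

def run_model(G1):
    # hypothetical stand-in for a simulation run: some scalar output
    return G1 / (G1 + 1.0)

def loop(start, rule, nruns):
    """rule maps the current value to the next, e.g. '@*2' -> lambda v: v*2."""
    value, results = start, []
    for _ in range(nruns):
        results.append((value, run_model(value)))
        value = rule(value)
    return results

explicit = [run_model(v) for v in (2, 3, 5, 8)]    # "Other Values" list
doubling = loop(1.0, lambda v: v * 2, nruns=4)     # the "@*2" rule
assert [v for v, _ in doubling] == [1.0, 2.0, 4.0, 8.0]
```

Each (parameter, output) pair corresponds to one overlaid curve on the JSim plot page, which is what makes the sweep useful for rapid visual exploration of sensitivities.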

3.2.4. Sensitivity Analysis

Sensitivity analysis is available to provide a quantitative measure of
the effect of any chosen parameter on a particular variable. This
extends the information gained by "looping." The sensitivity func-
tion S(t) of a variable with respect to a parameter is calculated from the
change in a variable such as A(t) produced by a change in a parame-
ter value, P. The linear sensitivity function is

    S(t) = ∂A(t)/∂P,                                            (4)

or alternatively, the log sensitivity function is

    Slog(t) = (∂A(t)/A(t))/(∂P/P) = ∂ log(A(t))/∂ log P.        (5)
Fig. 3. Loop Mode Operation. The control panel for the looping operator is shown on the left. The starting values for G1 and
G2 are those shown in the code (Fig. 1) and their solutions are given by the solid lines, and are the same as in Fig. 2, with
conversion of A → B only in compartment 2. On the second run (dashed lines of same colors) the values entered by the
user under Other Values (left panel, top right) are used, in this case setting G1 = 2 and G2 = 0, so that the conversion of
A → B occurs now only in compartment 1 instead of in compartment 2.

Sensitivities are local linear approximations, and are calculated
by computing two solutions, A1(t) and A2(t), the second having P
changed by a small fraction, 0.001P or 0.01P, so that

    S(t) = (A1(t) - A2(t))/ΔP,                                  (6)

and using the definition of the derivative as ΔP goes to zero gives
the formal value.
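The finite-difference recipe of Eqs. (4)–(6) can be sketched in a few lines of Python; the exponential-decay model, parameter values, and 1% perturbation below are illustrative choices, not part of JSim.

```python
# Finite-difference sensitivity of a toy model A(t) = A0*exp(-k*t) to
# its rate constant k, using a 1% parameter perturbation (delta = 0.01),
# in the spirit of Eqs. (4)-(6).

import math

def model(t, k, A0=1.0):
    return A0 * math.exp(-k * t)

def sensitivity(t, k, delta=0.01):
    dP = delta * k
    # S(t) ~ [A(t; k + dP) - A(t; k)] / dP, the discrete form of dA/dk
    return (model(t, k + dP) - model(t, k)) / dP

t, k = 2.0, 0.5
S_num = sensitivity(t, k)
S_exact = -t * model(t, k)       # analytic dA/dk = -t * A0 * exp(-k*t)
assert abs(S_num - S_exact) < 1e-2
```

The numerical value approaches the analytic derivative as the perturbation shrinks, which is the limiting argument made after Eq. (6).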

3.2.5. Optimization in Data Fitting and Analysis

Optimization, either by manual parameter adjustment or by
automated methods, is the procedure of adjusting parameters of the
equations so that the solution provides a good fit to the data. The
evaluation of the goodness of fit by minimizing the sum of squares of
the differences between the model solution and the experimental data
is based on an implicit assumption that the differences are Gaussian
random noise. This is seldom correct, and it is important to appreciate that
the choice of the distance function is personal, i.e., it is up to the
investigator to characterize the noise in the data and to weight the
influence of individual data points. See next section.
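As a sketch of the weighted least-squares idea, here is a crude one-parameter fit in Python; the data points, weights, and exponential model are hypothetical, and JSim's seven built-in optimizers of course use far more refined searches over many parameters.

```python
# Minimize a weighted sum of squared residuals between a one-parameter
# model c(t) = exp(-k*t) and hypothetical data, with investigator-chosen
# weights emphasizing the late, low-concentration point.

import math

t_data = [0.0, 1.0, 2.0, 4.0]
c_data = [1.00, 0.61, 0.37, 0.14]    # hypothetical measurements
weights = [1.0, 1.0, 1.0, 4.0]       # up-weight the last point

def wsse(k):
    return sum(w * (c - math.exp(-k * t)) ** 2
               for t, c, w in zip(t_data, c_data, weights))

# crude scan over candidate rate constants; a real optimizer refines this
k_best = min((k / 1000.0 for k in range(1, 3000)), key=wsse)
assert 0.45 < k_best < 0.55          # data were generated near k = 0.5
```

The choice of weights is exactly the "personal" distance-function decision described above: changing them shifts which data points dominate the fitted parameter.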
4. Applications

4.1. A Compartmental Approach to Aspirin Kinetics

Aspirin is a very old drug used to reduce fever and inflammation;
only recently has its mechanism of action begun to be understood,
the first being its action as a blocker of prostaglandin formation. Its
kinetics have not been thoroughly worked out, so we present here
our analysis of three sets of representative data from three different
studies. Aspirin is acetylsalicylic acid, and its reactions are as follows:
    Acetylsalicylic acid → salicylic acid → salicyluric acid.          (7)
Aspirin, acetylsalicylic acid, is hydrolyzed quickly via a plasma
esterase to salicylate. Salicylate is pharmacologically active. Salicy-
late kinetics dominate the clearance. The modeling examines only
salicylate’s enzymatic conversion to product, where product is con-
sidered to be equivalent to excretion into the urine, a saturable
process. This may produce salicyluric acid or a glucuronate. The
model captures the kinetics of salicylate clearance over a 100-fold
range of concentrations through consideration of one enzymatic
reaction with parameters optimized to fit three very different data
sets taken from the referenced papers.
The second reaction to form the excretable product is slow and
is enzymatically facilitated. At high doses the enzyme becomes
saturated, i.e., the reaction is limited by the fact that the enzyme
is all in the form of the bound enzyme–substrate complex and
raising the salicylate concentration does not accelerate the reaction.
At low dosage the clearance is rapid; at medium or high therapeutic
dosage the clearances are slower; at near-lethal toxic levels clearance
is very slow and often requires treatment by infusion of alkaline salts
and sometimes dialysis. For this example we have taken the data
from three research studies. (Low dose data are from Fig. 1 Right of
Benedek (14) Dose Period 1 (squares). Medium dose data are from
Fig. 4 of Aarons (15) oral dosage, last nine points. High dose data
are from Fig. 1 of Prescott (16) Control.) These particular data
were chosen because the chemical methods and procedures
appeared to be excellent, the data covered many hours, and, while
we do not have the original data, conversion from the symbols in
the figures to numerical representation was accomplished with
good accuracy.
A reason for choosing aspirin as the subject for compartmental
analysis is that the time course of the clearance, mainly by loss into the
urine, is long compared to circulatory mixing times, so that the bio-
chemical processes appeared to limit the clearance, and they could
therefore be characterized. If the circulation and distribution times
were long compared to the reaction processes, the latter would not be
meaningfully determined from the observations. Another reason was
that a comparison among the different data sets suggested that the
clearance was enzymatically mediated: at high concentration the
diminution was almost linear, at low concentrations it was almost exponen-
tial, and at intermediate concentrations the rate of clearance appeared
to speed up with time. This fits the expectations for a saturable enzy-
matic process: at low concentrations well below the dissociation con-
stant for the enzyme substrate complex almost all of the enzyme is free
and available for the reaction and therefore the fractional transforma-
tion of substrate is at its highest. With all the enzyme free this is a first-
order process, a single exponential. In contrast at high concentrations
the enzyme is almost totally saturated and the conversion is at a
maximum rate independent of the concentration; this is zero order
kinetics, giving a linear diminution in concentration. Thus the shift
from zero order kinetics to what occurs at intermediate concentration,
a gradually increasing fractional rate of reaction, is what was suspected
by looking at the middle-level concentrations.
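The limiting behaviors just described can be checked numerically with an illustrative Michaelis–Menten rate law; the constants below are hypothetical, chosen only to show the two regimes.

```python
# Zero-order vs. first-order regimes of a saturable rate
# v = Vmax*C/(Km + C): well below Km the rate is ~ (Vmax/Km)*C
# (first order, exponential decline); far above Km it saturates
# at ~ Vmax (zero order, linear decline in concentration).

Vmax, Km = 10.0, 5.0     # hypothetical: mg/(L*h) and mg/L

def v(C):
    return Vmax * C / (Km + C)

low, high = 0.05, 500.0
assert abs(v(low) - (Vmax / Km) * low) / ((Vmax / Km) * low) < 0.02
assert abs(v(high) - Vmax) / Vmax < 0.02
```

At intermediate concentrations neither approximation holds, and the fractional clearance rate rises as the enzyme desaturates, which is the behavior suspected from the middle-dose data.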
Thus we hypothesize an enzyme conversion model for salicylic
acid clearance, and then test the surmise by attempting to simulta-
neously fit the three independent data sets using one set of para-
meters. To do this in one program and to optimize the fit to the
data for all three salicylate levels we coded three identical models in
the one program (Table 2). These are computed simultaneously in
order to fit the three independent data sets from the three research
reports using one common set of parameters, and automated optimi-
zation was used to minimize the set of differences between data and
model solutions.
The reactions, both the binding to the enzyme and the product
formation, are reversible:
    SA + E ⇌ SAE ⇌ E + P.                                       (8)

The first equilibrium has on-rate kon1 and off-rate koff1; the second
has forward rate koff2 and reverse rate kon2.

The equations and parameters are identical for the three dosage
levels. Table 2 gives the model code for the low dose, where the prefix L
distinguishes these model equations from those for the medium
dose, prefixed M, and the high dose, prefixed H, neither of which
is shown in Table 2.
In undertaking an analysis on a single enzymatic reaction we
lack knowledge of the exact mechanism. In addition the compart-
mental approximation is certainly questionable for whole body
studies. The hypothesis that a single enzymatic reaction dominates
the clearance would be strengthened if the model provides good fits
to the three data sets. There is no guidance from the literature on
the dissociation constant, KD, for our presumed enzymatic reac-
tion, so that we are neither constrained nor aided.
We chose to use a simple enzymatic reaction, one that allowed
characterizing the rates of binding and unbinding of substrate and
enzyme, and a rate of the forward reaction to yield the product. We
allow also a backward reaction, on the basis that all reactions are
thermodynamically reversible, at least in principle. This reaction
setup allows reduction to the simpler, commonly used Michaelis–
Menten reaction; this is accomplished by speeding up the binding
and unbinding reactions of salicylate to enzyme and eliminating the
reverse reaction from the product back to salicylate.

Table 2
Model code for salicylate clearance (model downloadable
from www.physiome.org/Models: search for model 280)

/* MODEL NUMBER 280
MODEL NAME: Aspirin
SHORT DESCRIPTION: Salicylic acid (SA) clearance for three different dose ranges is modeled as
an enzyme reaction. This table is abbreviated by omitting the code for the Mid and High dose
reactions.
*/
import nsrunit; unit conversion on;
math Aspirin {
// INDEPENDENT VARIABLE
realDomain t hour; t.min = 0; t.max = 16.0; t.delta = 0.05;
// PARAMETERS (SAME FOR ALL THREE MODELS)
real kon1 = 0.174 L*mg^(-1)/hour, // On rate for SA + enzyme
KD1 = 6.3 mg/L,           // Dissociation constant for SA enzyme complex
koff1 = KD1*kon1,         // Off rate for SA enzyme complex
kon2 = 0.003 L/(mg*hour), // On rate for Product + enzyme
KD2 = 250 mg/L,           // Dissociation constant for Product enzyme complex
koff2 = KD2*kon2,         // Forward rate to form Product from complex
Gp = 0.03 1/hour,         // Clearance rate from plasma
Etot = 10 mg/L;           // Total enzyme concn
// LOW DOSE MODEL: Data from Benedek (1995) Fig 1 Right Dose Period 1 (squares)
// LOW DOSE PARAMETER
real LSAtot = 8.2 mg/L,   // Total Low Dose concentration
// LOW DOSE MODEL VARIABLES
LSA(t) mg/L,              // Low dose SA
LSAE(t) mg/L,             // Low dose SA-enzyme complex
LE(t) mg/L,               // Low dose free enzyme
LP(t) mg/L;               // Low dose product
// LOW DOSE INITIAL CONDITIONS
when(t = t.min) {LSA = LSAtot; LSAE = 0; LP = 0;}
// LOW DOSE ORDINARY DIFFERENTIAL AND MASS BALANCE EQUATIONS
LSA:t = -kon1*LSA*LE + koff1*LSAE;
LSAE:t = kon1*LSA*LE - koff1*LSAE - koff2*LSAE + kon2*LE*LP;
LP:t = koff2*LSAE - kon2*LE*LP - Gp*LP;
LE = Etot - LSAE; // LSAtot = LSA + LSAE + LP; for an overall mass balance accounting
} // End of Low Dose Model. The omitted code for the medium and high dose models is identical
// but L is replaced with M or H. Copy and paste the Low Dose Model into the space preceding the
// right curly bracket, twice, and change the L to M for one copy, and L to H for the other; recompile.

Fig. 4. Data on salicylate clearance from the three laboratories are fitted simultaneously with a single enzyme model to
describe the clearance. Parameters are given in the model code in Table 2. Data are from Benedek (14) in the left panel,
Aarons (15) in the middle panel, and Prescott (16) in the right panel. Note that the concentration ranges are markedly
different. (http://www.physiome.org/jsim/models/webmodel/NSR/Aspirin/, model 280).

So for the first
level of testing all parameters are considered as open and adjustable,
including the initial values of the concentrations in the system. In
this analysis the system is considered to be a single well-stirred tank,
as if the circulation were instantaneously mixed. This gross over-
simplification also makes the assumption that the product either
goes directly into the urine or to some other location in the body
from which it does not return. Enzymatic conversion in the liver
followed by excretion into the bile would be equivalent kinetically
to conversion to a glucuronate followed by the circulation through
the blood and clearance in the kidney. Ideally one would measure
the urinary excretory rates simultaneously and model both the
plasma clearance rates and the urinary clearance. This was not
done here.
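The low-dose equations of Table 2 can also be integrated outside JSim; the following Python sketch uses the parameter values listed in Table 2 (time in hours, concentrations in mg/L) with a hand-coded RK4 solver. It is only a cross-check of the model structure, not a replacement for the archived model 280 and its solvers.

```python
# Low-dose salicylate model of Table 2 in plain Python (a sketch).
# Parameters are those listed in Table 2; time in hours, conc. in mg/L.

kon1, KD1 = 0.174, 6.3        # L/(mg*h), mg/L
koff1 = KD1 * kon1
kon2, KD2 = 0.003, 250.0
koff2 = KD2 * kon2
Gp, Etot, LSAtot = 0.03, 10.0, 8.2

def rhs(y):
    LSA, LSAE, LP = y
    LE = Etot - LSAE                               # free enzyme
    dLSA = -kon1 * LSA * LE + koff1 * LSAE
    dLSAE = kon1 * LSA * LE - koff1 * LSAE - koff2 * LSAE + kon2 * LE * LP
    dLP = koff2 * LSAE - kon2 * LE * LP - Gp * LP
    return [dLSA, dLSAE, dLP]

def step(y, dt):                                   # one classical RK4 step
    k1 = rhs(y)
    k2 = rhs([a + 0.5 * dt * b for a, b in zip(y, k1)])
    k3 = rhs([a + 0.5 * dt * b for a, b in zip(y, k2)])
    k4 = rhs([a + dt * b for a, b in zip(y, k3)])
    return [a + dt / 6.0 * (p + 2 * q + 2 * r + s)
            for a, p, q, r, s in zip(y, k1, k2, k3, k4)]

y = [LSAtot, 0.0, 0.0]        # when (t = t.min): LSA = LSAtot, rest zero
dt, t_end = 0.005, 16.0       # t.max = 16 h, as in Table 2
for _ in range(int(t_end / dt)):
    y = step(y, dt)

# substrate is largely cleared by 16 h; total SA species never exceeds LSAtot
assert y[0] < 1.0 and all(c >= 0 for c in y)
assert sum(y) <= LSAtot
```

Since the only loss from LSA + LSAE + LP is the plasma clearance term Gp*LP, their sum can only decrease from LSAtot, which is the mass-balance accounting noted in the comment of Table 2.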
The results are shown in Fig. 4 where the curves at the low,
middle, and high concentration ranges are shown to be fitted
reasonably well by the model. The high concentration data (right
panel) are fitted less well but do illustrate that the slope is
approximately linear, as expected, whereas at the low concentra-
tions (left panel) the curve is nearly exponential, as expected.

Fig. 5. Semilog plots of the data (symbols) fitted with the model solutions for the salicylic acid (LSA, MSA, and HSA, solid
lines). All the data in both Figs. 4 and 5 were fitted simultaneously with one parameter set for the enzyme as given in Table 2.
The Product concentrations (dashed lines for LP, MP, and HP) are merely predicted product concentrations. Since there are
no data on product concentrations, the assumption that there is no degradation of Product must lead to some overestima-
tion of its influence on the backward reaction.
A different view is provided by Fig. 5, a semilog plot of the
three sets of data each fitted by the model. The high concentration
curve (open circles, purple line) appears to be almost linear on this
plot, but the slope is shallow, and judgment based on Fig. 4 is
better. The high dose concentrations are very much higher than the
KD1 for substrate binding, estimated at 6.3 mg/l, so that there is
no doubt that the enzyme was almost saturated. In fact, with the
High Dose the concentrations are above apparent KD2 of 250 mg/l
for product binding, so there is significant reversal of the reaction.
At Mid Dose level (triangles, blue line) the slope diminishes as time
progresses, that is to say the fractional clearance is increasing as the
enzyme becomes less saturated. The Low Dose data (diamonds,
green line) are fitted well and have considerable curvature on the
semilog plot, the slope at late times being much diminished: this
leads to the idea that there is some tendency for retention at the low
concentrations, which could be either due to recirculation from
other parts of the body where the concentrations were initially
higher or due to "product pressure" to form salicylate by the
reverse reaction. The reversibility is governed by the KD2 for the
product, which here was about 250 mg/l. There must be even
more reverse flux with the middle and higher level concentrations,
HP and MP, a feature of product inhibition, and the estimate of this
KD2 is determined from both of these, even though the effect is
most evident from the curvature of the low salicylate dose, LSA.

Fig. 6. Three sets of linear sensitivity functions versus time. Solid lines are sensitivities to the initial zero-time
concentrations resulting from the doses. Long dashed lines are sensitivities to KD1: the sensitivities are all positive.
Dotted lines are sensitivities to KD2, the dissociation constant for the reverse reaction: the sensitivities are all negative.

4.1.1. Sensitivity Analysis

In Fig. 6 the linear or absolute sensitivity functions are shown
for initial concentrations and for the two dissociation constants.
The solid lines are sensitivities to the initial zero-time concentra-
tions resulting from the doses; most of the sensitivity is at the
earliest points. With the high dose the fractional clearance is so
low that the high sensitivity extends throughout the 16 h of the
study. For the middle dose, MSAtot, the sensitivity diminishes most
steeply as a function of time at around 10 h when the concentration
is close to KD1, the dissociation constant for substrate binding.
The long dashed lines are the sensitivities to KD1; these are all
positive, meaning that if KD1 were increased (decreasing the affinity
414 J.B. Bassingthwaighte et al.

of the enzyme for the substrate SA) the model solutions for all
three doses would be at higher levels and the rate of disappearance
would be diminished. Note that the time of peak sensitivity to KD1
is at early times for the low dose, at 10 h for the middle dose, and at
late times for the high dose. The dotted lines are sensitivities to KD2:
the sensitivities are all negative, meaning that if KD2 were increased
(decreasing the affinity of the enzyme for the product P) the model
solutions would be at lower levels and the rate of disappearance
would be increased because of reduced rates of reverse flux from
product to salicylate.
Technically the sensitivity calculations are set up by a special
mechanism: at the bottom of the left page is a button labeled
“Sensitivity.” Clicking on it takes one to the “Sensitivity Analysis
Configurator.” There in the leftmost column of the configurator
table one types in, or chooses from the drop-down menu, the
parameter for which one wants to find the sensitivity. By clicking
on the down arrowheads you bring up the choices. In the setup
provided on the Web site at www.physiome.org, etc., the three starting
values for the initial concentrations at t = 0 are listed: LSAtot,
MSAtot, and HSAtot. Next on the list are the dissociation constants
KD1 and KD2. Their current values are automatically displayed under
“value.” The calculations of each S(t) are made on the basis of the
parameter change of 1% set under “delta” at 0.01. The tick marks in
the OK column indicate that the calculation will be made as
described earlier, namely, that the standard solution will be calcu-
lated and then another solution calculated for each of the five para-
meters listed, with this 1% change in parameter value. The S(t) is the
difference in the solution at each time point from the standard
solution divided by the 1% change in the parameter value, Eq. 4.
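The S(t) computation just described, solve, perturb one parameter by the 1% delta, re-solve, and divide the difference by the fractional change, can be sketched generically. The one-compartment exponential decay below is only a stand-in model for illustration, not the salicylate system.

```python
import math

def solve(k, times, C0=10.0):
    # Stand-in "model solution": first-order decay C(t) = C0*exp(-k*t).
    return [C0 * math.exp(-k * t) for t in times]

def sensitivity(k, times, delta=0.01):
    """S(t) = [solution with k*(1+delta) minus standard solution]/delta,
    the finite-difference scheme described in the text (delta = 1%)."""
    base = solve(k, times)
    pert = solve(k * (1.0 + delta), times)
    return [(p - b) / delta for b, p in zip(base, pert)]

times = [0.0, 1.0, 2.0, 4.0, 8.0]
S = sensitivity(0.5, times)
# S[0] is zero (the initial condition does not depend on k) and the
# later values are negative: raising the rate constant lowers C(t).
```

The sign convention matches the discussion of Fig. 6: a parameter whose increase lowers the solution has a negative sensitivity.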

4.1.2. Optimization

This is the process of fitting the model solutions as closely as
possible to the data in order to guide one’s thinking about and
one’s use of the model. When the fit is very close, then one has a
descriptor of the fitted data sets, that is, the model and its parameter
set provide a record of that description. Descriptions of many
different studies, patient studies or experiments, allow comparisons
and possible classification into categories having specific distinctions.
Descriptive models are useful for diagnosis and possibly
for prognosis or choosing modes of therapy. (If the model
“explains” the data by defining the physical and chemical mechan-
isms, that is even better.)
When the fit is poor, then more exploration is needed. Was
automated optimization used? A typical set of trajectories of param-
eter values during an optimization run using SENSOP is shown in
Fig. 7. The values do not range widely; to assure one that they have
not settled into local minima we also used other optimizers that
search widely, e.g. simulated annealing. Try weighting the data
differently: a simple sum of squares minimization may not be

appropriate; it is almost never appropriate if the data range over more
than one order of magnitude. For example, when fitting a decay process that
is exponential, one can use a reciprocal weighting so that points are
weighted more evenly, or a weighting adjusted to the individual
result such as “1/exp(−t/(2 h))” for the low dose data set, and “1/
exp(−t/(8 h))” for the middle dose data. These choices of weight-
ing are to be typed, without the quotes, into the appropriate line
under Pwgt (point weighting) in the Data to Match Table on the
Optimizer Configuration Page.
In the same table there is opportunity to even up the weightings
for the three quite different curves by using Curve Weighting, Cwgt.
In this case we weighted Low Dose data with 140 times the weight of
the High Dose Data since the latter were about 140 times higher
concentrations. This evens up the contributions to the sum of
squares. Likewise the Middle dose data were weighted at 14, that
is, about ten times higher than the High Dose data. This combina-
tion of Point weighting within each data curve and Curve weighting
among the three data curves gives each point in the triple data set
about the same weight. This makes the calculation of a root mean
square error provided by the minimization of the sum of squares a
reasonable strategy. This is probably close to what one would do if
using “eyeball best fitting,” examining the fitting to all the points.
The plots in both Figs. 1 and 2 are valuable in this regard since they weight
the curves differently from the “eyeball” view. While one can argue that
eyeball fitting is just as valid as automated optimization,
the latter has the virtue that the weighting scheme is explicitly stated,
and that the weighted sum of squares can be reproducibly reported
when the weighting scheme is reported also.
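The combined point and curve weighting can be sketched as follows. The Cwgt values (140 and 1 for the low and high dose curves) and the exponential point weights follow the text, but whether JSim applies Cwgt to the residual or to its square is not specified here; this sketch applies it to the squared, point-weighted residual, which is our assumption.

```python
import math

def weighted_sse(curves):
    """curves: list of (times, data, model, pwgt, cwgt). pwgt is a
    point-weight function of t; cwgt is a per-curve scalar, applied
    here to the squared weighted residual (an assumption)."""
    sse = 0.0
    for times, data, model, pwgt, cwgt in curves:
        for t, d, m in zip(times, data, model):
            sse += cwgt * (pwgt(t) * (d - m)) ** 2
    return sse

times = [0.0, 2.0, 4.0, 8.0]
low = [1.0 * math.exp(-t / 2.0) for t in times]      # low-dose decay
high = [140.0 * math.exp(-t / 8.0) for t in times]   # ~140x higher

curves = [
    # point weight 1/exp(-t/2h) evens up the decaying low-dose points
    (times, low, low, lambda t: 1.0 / math.exp(-t / 2.0), 140.0),
    (times, high, high, lambda t: 1.0 / math.exp(-t / 8.0), 1.0),
]
perfect = weighted_sse(curves)   # zero for a perfect fit
```

With weights chosen this way, every point in the combined data set contributes on roughly the same scale, so the reported weighted sum of squares is reproducible once the weighting scheme is stated.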
There are situations where an optimizer fails to reach a good fit
because the sum of squares has settled in a local minimum. Opti-
mizers like Gridsearch and Simulated Annealing are designed to
cover a wide range in state space so that even the corners get
explored (Fig. 7). Others like SENSOP (17), GGOPT (18), and
NL2SOL (19) make excursions, jumping out of the locale to test
other regions for a better fit. To switch optimizers, simply pick
another one from the drop-down menu.
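The value of a wide-ranging scan before local refinement can be shown on a toy objective with a local and a global minimum. This is only a schematic of the grid-then-refine idea; SENSOP, GGOPT, NL2SOL, and simulated annealing each have their own, more sophisticated strategies.

```python
# Toy sum-of-squares surface: local minimum near k = 1, global minimum
# at k = 4. A purely local search started near k = 1 would stall there.
def objective(k):
    return (k - 1.0) ** 2 * (k - 4.0) ** 2 + 0.5 * (k - 4.0) ** 2

def grid_then_refine(lo, hi, n=41, steps=50):
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    k = min(grid, key=objective)          # coarse global scan
    step = (hi - lo) / n
    for _ in range(steps):                # simple local refinement
        for cand in (k - step, k + step):
            if objective(cand) < objective(k):
                k = cand
        step *= 0.7
    return k

k_best = grid_then_refine(0.0, 6.0)       # lands near the global minimum
```

The coarse scan guarantees every corner of the (one-dimensional) state space is visited before the refinement narrows in, which is the same insurance the text recommends when an optimizer may have settled into a local minimum.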

4.2. Multiple Sequential Dose Administration: Hepatic Function

Tests of clinical function evolve in a variety of ways, usually being
designed long after the function has been clearly understood. Here
is a counterexample, a case in which the idea of the clinical evaluation
from the administration of a drug was evident right at
the beginning. There was a coalescence of features that brought
this about.
For the estimation of cardiac output using the indicator dilu-
tion technique Mayo Clinic’s Earl Wood needed a dye that
absorbed light at a wavelength of 800 nm, the isosbestic point at
which oxyhemoglobin and reduced hemoglobin absorbed equally.
This would allow optical detection and quantitation of the dye

Fig. 7. Optimization: Trajectories of values for parameters being optimized, in this case the three initial concentrations and
the two dissociation constants during 100 trials of fitting the model solutions to the salicylate data. The optimizer used was
SENSOP (17); the staircase appearance of the plotted values arises because SENSOP reports the previous estimate of the parameter
vector as it calculates each new sensitivity function, one for each parameter being optimized, and then reports and plots
the new value of each parameter.

concentration independent of the blood oxygen level. The dye,


indocyanine green, was found by Fox in Kodak’s repository (20);
it was not toxic; its absorbance peaked at 805 nm; it bound to
albumin and so stayed in the circulation and did not color the body.
The densitometer measuring the absorbance was developed (21),
and a spin-off from it was an earpiece densitometer that could be
used to detect blood concentration noninvasively. In early experi-
ments to test the accuracy and reproducibility of the estimates of
cardiac output, dye was injected repeatedly as a bolus into a vein
and the arterial concentration–time curve recorded each time. The
repeated injections led to a rise in the background arterial concen-
tration, raising the question of whether the detector could
be calibrated accurately over a large range of concentrations
(22, 23). In the first studies, we found right away that the dye

[Fig. 8, lower panel diagram: injection CInject enters V1 (Lungs and Heart, Flow1 = C.O.); recirculation CRecirc returns from the parallel compartments V2 (Kidney and Head, Flow2), V3 (Liver and Gut, Flow3, with clearance G3), and V4 (Muscle, Flow4).]

Fig. 8. Indocyanine Green Injection and Clearance. Upper panel: Blood concentrations in a
14 kg dog with 22 successive intravenous injections of 2.5 mg ICG (vertical pips). Dashed
lines represent estimated single exponential decay after each series of injections. Data
are from Edwards et al. (22). Lower panel: Circulatory model for hepatic clearance of ICG,
a mammillary compartmental model. Model code, with all parameters and equations, is in
Table 3. Modeling results are in Fig. 9.

was excreted via the bile: the feces turned green! Within a few years
the Indocyanine Green clearance Test was used as a liver function
test; it was a valued test even before the mechanisms of its hepatic
excretion became known (24).
The data from one dog (Edwards, 1960), shown in Fig. 8
(top panel), invite analysis. After each series of injections a set of
blood samples were obtained as the concentration diminished: the
diminutions appeared as straight lines on the semilog plot shown in
Fig. 8, top. This suggests a first-order clearance, i.e., a constant
fraction of the dye was being removed per unit time. Now let us
develop a simple model, and then test it against the data.
The anatomy and physiology provide the framework for the
model. The dye distributes throughout the whole blood volume.

We hypothesize that it is removed at one point only, by the liver,


and is excreted into the bile. We have no data on biliary concentra-
tion versus time, except that the dye ends up in the lower bowel
within the duration of the experiment, 4 h. We model the quanti-
tative data: the doses and their time of injection, the blood con-
centrations at the particular times, and use the dog’s weight, 14 kg,
as a constraint. The cardiac output is known from the areas of the
dye–dilution curves: C.O. = Dose injected (mg)/Area under dye
curve (min·mg/l). These averaged 1.5 l/min, with a standard
deviation of about 10% (25).
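That dose-over-area calculation is easy to check numerically. The sampled dye curve below is illustrative, chosen to give roughly the reported 1.5 l/min; it is not data from the Edwards experiments.

```python
# Cardiac output by indicator dilution: C.O. = dose / area under the
# arterial concentration-time curve. The sampled curve is illustrative.
def auc(times, conc):
    """Trapezoidal area under the dye curve, min*mg/l."""
    return sum(0.5 * (conc[i] + conc[i + 1]) * (times[i + 1] - times[i])
               for i in range(len(times) - 1))

dose_mg = 2.5                                   # one ICG injection
times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]     # min
conc = [0.0, 1.0, 5.0, 6.0, 3.0, 1.0, 0.0]      # mg/l, illustrative bolus
co_l_per_min = dose_mg / auc(times, conc)
```

With these illustrative samples the area is 1.6 min·mg/l, giving a cardiac output of about 1.56 l/min, close to the 1.5 l/min average quoted above.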
A simple model of the whole body circulation is chosen as a
compromise, three compartments to represent fast, moderate, and
slow flows throughout the body and a single lumped compartment to
represent blood in the heart and lungs, as in Fig. 8, lower panel. A
mammillary model of this sort is far too crude to represent the
indicator dilution curves used to measure the cardiac output; this
requires higher temporal resolution (26, 27) and more precise mod-
els (28, 29). In this model there are up to three parameters for each of
the four compartments, flow and volume, and in compartment 3, a
consumption term G3. The shape of the transport function is fixed by
the assumption of a mixing chamber, so each has the impulse response
h(t) = (1/τ) exp(−t/τ), where the time constant τ = volume/flow.
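Two properties of that impulse response are worth checking numerically: its area is 1 (all tracer eventually washes out) and its mean transit time equals τ = volume/flow. The τ value below is illustrative, not a parameter from the model.

```python
import math

def h(t, tau):
    """Mixing-chamber impulse response (1/tau)*exp(-t/tau), 1/min."""
    return math.exp(-t / tau) / tau

tau = 0.8                        # min, illustrative volume/flow ratio
dt, T = 0.001, 20.0              # fine grid out to 25 time constants
ts = [i * dt for i in range(int(T / dt))]
area = sum(h(t, tau) * dt for t in ts)               # ~1.0
mean_transit = sum(t * h(t, tau) * dt for t in ts)   # ~tau
```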
The consumption influences the fraction leaving the compartment. If
we were to try to use the observed concentrations in Fig. 8 upper as
the input to the model we would see at once that we do not have
enough data: the points sampled are too sparse.
But the doses and their times of injection were defined pre-
cisely, so we use the sequence of doses as the input. We represent
the process of flow distribution and dilution of the injected ICG by
the set of four first-order differential equations, stirred tanks, as in
traditional compartmental analysis. Within each tank the concen-
tration, Ci, is given by

dCi/dt = (Fi (Cini − Ci) − Gi Ci)/Vi;   (9)
where the index subscript i denotes the compartment, F is its flow, V
is its volume, and G is the rate of consumption within the volume.
Cini is the concentration in the blood entering the volume, and
equals the concentration in the compartment just upstream. The
concentration Ci is the same as the concentration flowing out
because of the basic compartmental assumption that the tank is
stirred instantaneously from entrance to exit. Via the test of fitting
the model to the data we can assess whether or not this assumption is
refuted by the data; if the data are fitted, then we do not prove that
the assumption is correct, but only that it was not invalidated by the
data. (This philosophical point underlies every assumption in for-
mulating a model: a model can never be proven correct. It can only
be demonstrated as adequate for the situation, and then can be used
as a “working hypothesis,” yet to be disproven. As T. H. Huxley put

it: “The great tragedy of science: the slaying of a beautiful hypothesis


by an ugly fact.” The downfall leads to the advancement!)
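As a cross-check on the structure of Eq. 9, here is a compact numerical sketch of the four-compartment recirculating system, using Flow1 = 1.5 l/min, Vtot = 1.256 l, G3 = 63.7 ml/min and the volume and flow fractions of Table 3. The single bolus (rather than the 22-injection protocol), the forward-Euler stepping, and the step size are our simplifications, not the chapter's JSim setup.

```python
# Four stirred tanks with recirculation, per Eq. 9; parameters follow
# Table 3. Comp 1 = heart/lungs, 2 = kidney/head, 3 = liver/gut (the
# only clearance, G3), 4 = muscle (volume fraction 1 - 0.25 - 0.40 - 0.25).
Flow1, Vtot, dose = 1.5, 1.256, 2.5            # l/min, l, mg
Fr2, Fr3 = 0.33, 0.25
Fr4 = 1.0 - Fr2 - Fr3
F = [Flow1, Fr2 * Flow1, Fr3 * Flow1, Fr4 * Flow1]   # l/min
V = [0.25 * Vtot, 0.40 * Vtot, 0.25 * Vtot, 0.10 * Vtot]  # l
G = [0.0, 0.0, 0.0637, 0.0]                    # hepatic clearance, l/min

dt, t_end = 0.001, 30.0                        # min; crude forward Euler
C = [dose / V[0], 0.0, 0.0, 0.0]               # bolus mixed into comp 1
t = 0.0
while t < t_end:
    Crec = (C[1] * F[1] + C[2] * F[2] + C[3] * F[3]) / F[0]
    dC = [F[0] * (Crec - C[0]) / V[0]]
    for i in (1, 2, 3):
        dC.append((F[i] * (C[0] - C[i]) - G[i] * C[i]) / V[i])
    C = [c + dt * d for c, d in zip(C, dC)]
    t += dt
```

After the fast mixing transients, the total amount remaining decays with a time constant near Vtot/G3, about 19.7 min, consistent with the mono-exponential washout discussed below Fig. 9.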
The JSim code is given in Table 3. The equations for each
compartment are closely similar. The recirculated dye and newly
injected dye enter the heart and lung compartment with central
blood volume V1, and mix with the total cardiac output. The only
clearance, G3, is during passage through the liver where ICG is
highly extracted and secreted into the bile; G2 and G4 are available
for exploration, but are set to zero.
The modeling results are shown in Fig. 9. Each injection
resulted in a sharp rise in concentration; when closely spaced the
mean concentration rises in spite of the rapid clearance. In the
absence of injections the concentration diminishes almost mono-
exponentially as was suggested by the straight lines on the semilog
plot in Fig. 8 (upper). If the system were a perfect single mixing
chamber with first-order clearance the time constant for washout
would be the volume divided by the clearance, Vtot/G3, and in fact
this is very close. The test is to plot the theoretical concentration
diminution against the model and the data. The equation,
Test = 10 exp(−(t − 58)/(Vtot/G3));
is the red dashed line in Fig. 9, and it does fit the phase where there
are no injections after 64 min and before the next injection at
100 min. The multiplier, 10, and the delay, 58 min, in the equation
simply position the curve. Test curves with longer delays (130 and
195 min for the purple and blue dashes) fit the data for the later
phases without injections. The closeness of these fits suggests that
approximating the whole blood volume as a single mixing chamber
would give a fairly good fit too. But the exponential Test curves
decay a little too rapidly compared to the model function and the
data at the lowest concentrations. This systematically better fit of
the model compared to the single exponential curve emphasizes
that the circulation is not really instantaneously wholly mixed, and
that a complete washout curve must be multi-exponential.
With respect to the physiological state of the animal, the fact
that the same value for G3 fits the data throughout the 4-h study
says that the hepatic clearance was stable over this long period of
anesthesia. We can also conclude that there is no evidence for the
saturation of the clearance process over the range of ICG concen-
trations seen here, fairly high concentrations, nearing 10 mg/l.
This implies that the processes for ICG clearance are not only
very effective but must also have binding constants high compared
to the concentrations found here. This dye soon found use as a
clinical test of liver function (24), and is widely used clinically today
(30–32). The hepatocyte’s apical transporter has a very high capac-
ity so that in the normal liver the transhepatic extraction is nearly
100% and the dye clearance can be used to estimate hepatic blood
flow. The reason the extraction is so high is that there is an active
ATP-supported extrusion of the dye from the hepatocyte into

Table 3
Compartmental model for hepatic ICG clearance

/* MODEL NUMBER: 0103


MODEL NAME: Comp4ICG
SHORT DESCRIPTION: Four-compartment whole body model with recirculation:
Repeated injections and first-order hepatic clearance of Indocyanine Green dye. */
import nsrunit; unit conversion on;
math Comp4ICG {
// INDEPENDENT VARIABLE
realDomain t min; t.min = 0; t.max = 30; t.delta = 0.1;
// PARAMETERS
real Flow1 = 1.5 L/min, // Blood Flow through Heart/Lung = Cardiac Output
Dose = 2.5 mg, // Amt injected
Vtot = 1.256 L, // Total Blood Volume
// First Compartmental unit (Heart/lung)
Vfr1 = 0.25 dimensionless, // fraction of Vtot in Comp 1
V1 = Vfr1*Vtot, // Volume of blood 1
C10 = 0 mg/L, // Initial concentration
// Second Compartmental unit (Kidney and Head)
Fr2 = 0.33, // Fraction of flow through comp 2
Flow2 = Fr2*Flow1, // Flow through Comp 2
Vfr2 = 0.40, // fraction of Vtot in Comp 2
V2 = Vfr2*Vtot, // Volume of blood 2
C20 = 0 mg/L, // Initial concentration
G2 = 0 ml/min, // consumption 2
// Third Compartmental unit (Liver and Gut)
Fr3 = 0.25, // Fraction of Flow through Comp 3 (Liver)
Flow3 = Fr3*Flow1, // Flow through Comp 3
Vfr3 = 0.25, // fraction of Vtot in Comp 3
V3 = Vfr3*Vtot, // Volume of blood 3
C30 = 0 mg/L, // Initial concentration
G3 = 63.7 ml/min, // consumption 3
// Fourth Compartmental unit (Muscle)
Fr4 = 1 - Fr2 - Fr3, // Fraction of Flow through Comp 4
Flow4 = Fr4*Flow1, // Flow through Comp 4
Vfr4 = 1 - Vfr1 - Vfr2 - Vfr3, // fraction of Vtot in Comp 4
V4 = Vfr4*Vtot, // Volume of blood 4
C40 = 0 mg/L, // Initial concentration
G4 = 0 ml/min; // Consumption 4
// Total flow relationship: Flow4 = Flow1 - Flow2 - Flow3;
// EXTERNAL VARIABLE: The series of ICG injections, 2.5 mg each
extern real Cin1(t) 1/min; // 60 second Pulse injection @ 0.1 min
extern real Cin2(t) 1/min; // 60 second Pulse injection @ 30 min
extern real Cin3(t) 1/min; // 60 second Pulse injection @ 34 min
extern real Cin4(t) 1/min; // 60 second Pulse injection @ 39 min
extern real Cin5(t) 1/min; // 60 second Pulse injection @ 44 min
extern real Cin6(t) 1/min; // 60 second Pulse injection @ 49 min
extern real Cin7(t) 1/min; // 60 second Pulse injection @ 54 min
extern real Cin8(t) 1/min; // 60 second Pulse injection @ 64 min
extern real Cin9(t) 1/min; // 60 second Pulse injection @ 99 min
extern real Cin10(t) 1/min; // 60 second Pulse injection @ 109 min
extern real Cin11(t) 1/min; // 60 second Pulse injection @ 110 min
extern real Cin12(t) 1/min; // 60 second Pulse injection @ 114 min
extern real Cin13(t) 1/min; // 60 second Pulse injection @ 120 min
extern real Cin14(t) 1/min; // 60 second Pulse injection @ 124 min
extern real Cin15(t) 1/min; // 60 second Pulse injection @ 135 min
extern real Cin16(t) 1/min; // 60 second Pulse injection @ 165.9 min
extern real Cin17(t) 1/min; // 60 second Pulse injection @ 168.8 min
extern real Cin18(t) 1/min; // 60 second Pulse injection @ 174.1 min
extern real Cin19(t) 1/min; // 60 second Pulse injection @ 179.3 min
extern real Cin20(t) 1/min; // 60 second Pulse injection @ 186.9 min
extern real Cin21(t) 1/min; // 60 second Pulse injection @ 189.6 min
extern real Cin22(t) 1/min; // 60 second Pulse injection @ 200.8 min
// DEPENDENT VARIABLES
real Crecirc(t) mg/L, //Inflow + recirculation
Qtot(t) mg, // Total amt in 4 compartments
Cinsum(t) 1/min, // Sum of the string of pulse injections
// Comp1 Comp2 Comp3 Comp4
C1(t) mg/L, C2(t) mg/L, C3(t) mg/L, C4(t) mg/L, // concn in each region
Q1(t) mg, Q2(t) mg, Q3(t) mg, Q4(t) mg; // Quantity in each regions
// INITIAL CONDITIONS
when(t = t.min) {C1 = C10; C2 = C20; C3 = C30; C4 = C40;}
// INPUT CONCENTRATION FUNCTION
real Qinjrate(t) mg/min;
Cinsum = (Cin1+ Cin2 + Cin3 + Cin4 + Cin5 + Cin6 + Cin7 + Cin8+
Cin9+ Cin10+ Cin11+ Cin12+ Cin13+ Cin14+ Cin15+
Cin16+ Cin17+ Cin18+ Cin19+ Cin20+ Cin21+ Cin22);
Qinjrate = Dose*Cinsum;
Crecirc = (C2*Flow2+ C3*Flow3 + C4*Flow4)/Flow1;
// QUANTITY Retained at time t:
Qtot = V1*C1 + V2*C2 + V3*C3 + V4*C4; // Qtot is the total amount of ICG in the body at time t.
real Area = Dose*(integral(t = t.min to t.max, Cinsum)); // check amt injected
// ORDINARY DIFFERENTIAL EQUATIONS
V1*C1:t = Flow1*(Crecirc-C1) + Qinjrate; // Note that V1 can be left of the = sign
C2:t = Flow2*(C1-C2)/V2 - G2*C2/V2;
C3:t = Flow3*(C1-C3)/V3 - G3*C3/V3;
C4:t = Flow4*(C1-C4)/V4 - G4*C4/V4;
} // END of program
/*
DETAILED DESCRIPTION:
This is a whole body model composed of a central blood volume from which flows the whole cardiac
output. The C.O. is distributed into three organs labeled "kidney" for kidney and head, "liver" for
liver and intestines, and "muscle" for the rest of the body.
The four-compartment model has recirculation. The input function to compartment 1 (Heart and Lung)
is the sum of the recirculated indicator plus the series of injection pulses into V1, Qinjrate, each
injection at x mg/min for a short duration. Each injection, Cin#, is defined at run time using a separate
function generator. Clearance of the injected dye, indocyanine green, is hepatic extraction from the
blood via a saturable transporter on the hepatocyte sinusoidal membrane followed by ATP-dependent
excretion across the hepatocyte apical membrane into the bile, but is represented here by a passive first-
order loss, G3. This is adequate kinetically only at low concentrations of ICG, where the transporter is
mainly uncomplexed.
KEY WORDS: compartment, flow and exchange, mixing chamber, hepatic clearance, first-order
consumption, washout, organ, multi-organ, recirculation
REFERENCES:
Edwards AWT, Bassingthwaighte JB, Sutterer WF, and Wood EH. Blood level of indocyanine green in
the dog during multiple dye curves and its effect on instrumental calibration. Proc Staff Meet Mayo Clin 35:
747-751, 1960.
Edwards AWT, Isaacson J, Sutterer WF, Bassingthwaighte JB, and Wood EH. Indocyanine green
densitometry in flowing blood compensated for background dye. J Appl Physiol 18: 1294-1304,
1963.
REVISION HISTORY:
Author: BEJ 06jan11
Revised by: JBB 09jan11 to combine the function generators, fgens, to speed computation
COPYRIGHT AND REQUEST FOR ACKNOWLEDGMENT OF USE:
Copyright (C) 1999-2011 University of Washington. From the National Simulation Resource,
Director J. B. Bassingthwaighte, Department of Bioengineering, University of Washington, Seattle WA
98195-5061.
Academic use is unrestricted. Software may be copied as long as this copyright notice is included.
This software was developed with support from NIH grant HL073598.
Please cite this grant in any publication for which this software is used and send an e-mail with the
citation and, if possible, a PDF file of the paper to [email protected].
*/
Table 3 is an example of a complete set of MML code in the format used in general for JSim project files, having the same sequence and format as those used in the repository of models at www.physiome.org/Models.

the bile, thus keeping the intra-hepatocyte concentration low.


These features cannot be demonstrated at the low concentrations
seen in Figs. 8 and 9, but one would expect that the model would
have to be revised to include a saturable transporter if the concen-
trations were a lot higher.

5. Summary: The Processes Undertaken in Pharmacokinetics

In Subheading 4 we covered a standard approach to the steps in a
modeling analysis of data. The order of the steps depends a little on
the nature of the task. In the first model we performed no

Fig. 9. Model solution to fit the ICG data in a 14 kg dog. The triangles are the data shown in Fig. 8 (upper). The parameters
and initial conditions for the model are those given in the code in Table 3. The parameters are the result of optimization
using NL2SOL from Dennis and Schnabel (19); the total blood volume, Vtot, and the hepatic clearance, G3, were the only
free parameters. The 2.5 mg injection pulses are shown along the abscissa. The dashed lines are mono-exponentials with
time constant Vtot/G3. The model is #103 at http://www.physiome.org/jsim/models/webmodel/NSR/Comp4ICG/.

verification steps, and in the second the verification was done after
the fitting of the model to the data. This is clearly in the wrong
order: there is no point fitting a model to the data until it has been
demonstrated to be computed correctly, so in the list that follows,
the verification is done as soon as the draft model is constructed.
One cannot argue that the verification is not needed until the
model fits the data: a mathematically incorrect model might fit
the data, and after all that work the effort would be shown to be a
waste if the code had an error. Our failure to precede the data fitting
by a formal verification in these two cases is based on prior observa-
tions that JSim’s solutions for compartmental models provide four-
digit accuracy compared to analytical solutions.
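A minimal example of such a verification test: integrate dC/dt = −kC numerically and compare against the analytical solution C0 exp(−kt). The RK4 integrator below is a stand-in for JSim's solvers; four-digit (or better) agreement is the kind of check referred to above.

```python
import math

def rk4_decay(C0, k, dt, n):
    """Integrate dC/dt = -k*C by classical RK4 for n steps of size dt."""
    f = lambda c: -k * c
    C = C0
    for _ in range(n):
        k1 = f(C)
        k2 = f(C + 0.5 * dt * k1)
        k3 = f(C + 0.5 * dt * k2)
        k4 = f(C + dt * k3)
        C += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return C

C0, k, dt, n = 10.0, 0.5, 0.01, 400            # integrate to t = 4 min
numeric = rk4_decay(C0, k, dt, n)
analytic = C0 * math.exp(-k * n * dt)          # limiting-case solution
rel_err = abs(numeric - analytic) / analytic   # far below 1e-4
```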
Taking a listing of the steps to a detailed level:
- Ideally, design the experimental protocol to be the best test of the model.
- Gather supporting data; assess experimental accuracy.
- Obtain information on necessary parameters, a priori, and on possible constraints.
- Complete development of the model. List all the assumptions.

- Conduct simulations, compare the results with other methods of solution, perform verification tests, and find limiting cases for which there are analytical solutions.
- Determine how one should weight the data, for use in minimizing the sum of squares of differences between model solutions and data.
- Validate that the model is reasonable, that it provides good fits, and that the residual errors are not systematic or localized.
- Compare simulation results to experiments and/or results of other methods.
- Post-processing: analyze the results for consistency with respect to physical, chemical, and physiological expectations.
- Interpret the results scientifically. What does the model predict?
- Rethink the process. What are the weakest assumptions in the model? How might it be improved? The model is always wrong. Figure out new tests of it. Where might its predictions fail?
This overall process can be considered a success if:
- Observations hitherto unexplained now fit a rational working hypothesis.
- The essence of the phenomenology has been captured. (A descriptive success only, perhaps.)
- Diagrams and schema of interactions aid understanding of the model, the kinetic relationships, and the code.

6. Model Alternatives and Modifications: Interactive Hypothesis Revising

When the fit is not precise, outside of the limits of expectation
relative to the noise in the data, despite all the attempts, then
maybe the model is just wrong. Certainly it is not nicely descriptive,
let alone explanatory! Given the philosophical premise that all
models are wrong, in the sense of being incomplete, or incorrect
mechanistically, every failure is a stimulus to find an alternative
model. A most rewarding and successful strategy is that of Platt
(12): he proposes that right from the outset one should
have alternative hypotheses in mind, and that the experiment
should be designed to distinguish between these hypotheses.
“Strong Inference” is the title of his paper. We advocate that each
hypothesis be expressed in terms of a computational model, since
that means that it is described explicitly and is therefore testable.
The strategy pays off because at least one of the hypotheses
is proven wrong when the model fails to fit the data. Sometimes
both are wrong! Regroup, rethink!

The least squares approach, minimization of the sums of


squares of the differences between the data points and the model
function, is a blunt tool, without specific information with respect
to any misfitting. Displaying the residuals, plotting the point-by-
point differences as a function of time (or in general, of the
independent variable), is an excellent way to get insight into
what to do next: a series of points above or below zero reveals a
systematic misfitting. Comparing the time course of the deviations
of the data from the model with the sensitivity functions of the
various parameters could suggest that a particular parameter has
not been optimized well, but this is most unlikely when the
various options for optimization have been explored. It is more
likely that the model lacks a feature that is needed. Back to the
drawing board!
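The residual display can be reduced to a crude numeric screen: compute the point-by-point differences and look for long runs of one sign. The data below are invented solely to show the pattern left by a model that decays too fast; they are not from the chapter's studies.

```python
# Residual analysis: a long run of same-sign residuals flags
# systematic misfit, the pattern one looks for in a residual plot.
def residuals(data, model):
    return [d - m for d, m in zip(data, model)]

def longest_sign_run(res):
    best = run = 0
    prev = 0
    for r in res:
        s = (r > 0) - (r < 0)
        run = run + 1 if (s == prev and s != 0) else (1 if s != 0 else 0)
        prev = s
        best = max(best, run)
    return best

# A model that decays too fast leaves the late residuals all positive:
data  = [10.0, 6.0, 3.7, 2.3, 1.5, 1.0]
model = [10.0, 5.5, 3.0, 1.7, 0.9, 0.5]
res = residuals(data, model)
```

Well-fitted data with random noise would show short, alternating runs; the five consecutive positive residuals here are the numeric signature of the systematic misfitting described above.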
This is the usual iterative process: hypothesis in general terms,
develop the model that represents a clear precise hypothesis, if
possible design the model for an alternative hypothesis, design
the distinguishing experiment while taking into account the accu-
racy of the data to be acquired, and disprove one or both hypoth-
eses. The next step is to improve the model and repeat the series of
steps until a satisfactory level of synchrony between model and
experiment is achieved. This version of the model is usually then
designated the working hypothesis. No working hypothesis is to be
regarded as “the truth,” though it is useful for practical purposes.
And in fact it serves as the standing target to be disproved in order
to advance the science.

7. What to Do When the Compartmental Representation Is Not So Good?

What follows is a common example that applies in biophysics,
physiology, and pharmacology. It is usual that drugs and substrates
for metabolism and signaling molecules of molecular weight less
than 1,000 Da are partially extracted during single passage through
a capillary. Since capillaries are about a 1,000 mm long, but only
5 mm in diameter (as in Fig. 10), there is no possibility that they are
instantaneously stirred tanks with uniform concentration from end
to end: there must be gradients between capillary entrance and exit
for any solute that is exchanging between blood and tissue. If the
extraction is less than 5%, the gradient might be ignored, but for
solutes of interest to us here the steady-state extractions are
30–90%, and so affect the estimates of the permeabilities and
consumption rates. Consequently we now consider the computa-
tional differences between a stirred tank and an axially distributed
capillary–tissues exchange unit.

Fig. 10. A venule and capillaries on the epicardium of a dog heart cast with Microfil.
Capillary diameters are 5.6 ± 1.3 μm, average intercapillary distances are 17–19 μm,
and lengths are 800–1,000 μm. The distance between the long calibration lines is
100 μm. Modified from Bassingthwaighte et al. (33).

7.1. Capillary–Tissue Exchange: Convection, Permeation, Reaction, and Diffusion

A two-compartment system is here modified to incorporate flow, as
shown in Fig. 11 (left panel), thus identifying V1 as the vascular
region, the membrane as the capillary barrier, and V2 as the tissue.
The system is considered as a homogeneously perfused organ with
constant volumes and steady flow, F, in and out. Now, in order to
put it into the context of substrate delivery and metabolism, we
switch to standard physiological representation of the units, defin-
ing them per gram of organ mass. F, PS, and the consumption G
have units ml/(g min), and the volumes have units ml/g. This
notation normalizes flows, substrate use, etc. to be independent
of organ mass. (The model is www.physiome.org/jsim/models/
webmodel/NSR/Comp2FlowExchange/, #247.)
To keep the system simple so as to focus on the blood–tissue
exchange, the intratissue consumption is considered to be a first-
order process, as if the substrate concentration is far below the KD
for any enzymatic reaction.
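The left-panel system of Fig. 11 can be written out and integrated directly: V1 dC1/dt = F (Cin − C1) − PS (C1 − C2) and V2 dC2/dt = PS (C1 − C2) − G C2. The parameter magnitudes below are illustrative per-gram values of our choosing, not numbers from the chapter; they yield a steady-state extraction of about 33%, inside the 30–90% range quoted above.

```python
# Two-compartment flow-exchange unit (Fig. 11, left): capillary plasma
# (V1) exchanging across PS with a stagnant tissue region (V2) that
# consumes solute at first-order rate G. Illustrative parameters.
F, PS, G = 1.0, 1.0, 1.0           # ml/(g*min)
V1, V2 = 0.07, 0.20                # ml/g
Cin = 1.0                          # mM, constant inflow concentration

dt = 0.0005                        # min; well below V1/(F+PS) = 0.035
C1 = C2 = 0.0
for _ in range(int(60.0 / dt)):    # integrate to steady state
    dC1 = (F * (Cin - C1) - PS * (C1 - C2)) / V1
    dC2 = (PS * (C1 - C2) - G * C2) / V2
    C1, C2 = C1 + dt * dC1, C2 + dt * dC2

extraction = (Cin - C1) / Cin      # steady-state E = (Cin - Cout)/Cin
```

Because the single stirred tank has no axial gradient, this compartmental extraction will differ from the axially distributed model's for the same F, PS, and G, which is exactly the computational difference the section goes on to examine.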
In the right panel of Fig. 11 is the equivalent axially distributed
model that accounts for gradients in concentration along the length
of the capillary. It is more general, but reduces to an exactly analo-
gous model when the PS’s are set to zero, so there is no entry into
the cells. Other parameters for cellular permeation and reaction are
as follows: for passage across endothelial cell luminal membrane
(PSecl); endothelial cell abluminal membrane (PSeca); and paren-
chymal cell membrane (PSpc); G, intracellular consumption ml/
(g min) or metabolism of solute by endothelial cells (Gec) or by
parenchymal cells (Gpc). The Vs, ml/g, are volumes of distribution
in plasma (Vpl), endothelial cell (Vec), interstitial (Visf), and paren-
chymal cell (Vpc). With the cell permeabilities, PSecl, PSeca, and
17 Compartmental Modeling in the Analysis of Biological Systems 427

Fig. 11. Compartmental versus axially distributed models for capillary–tissue exchange. Exchange between capillary
plasma and interstitial fluid regions can be regarded as providing two conceptually similar but mathematically
distinguishable methods of representation. Left panel: Two-compartment stirred tank for the exchange of solute C between
flowing blood and surrounding stagnant tissue. Flow F, ml/(g min), carries in solute at concentration Cin, mM, and carries
out a concentration Cout = Cp, mM. The flux from capillary to ISF (compartment 2) is limited by the conductance PS, ml/(g min),
the permeability–surface area product of the membrane separating the two chambers, allowing bidirectional flux. G2, ml/
(g min), is a reaction rate for a transformation flux forming product at a rate G2·C2, μmol/(g min). Right panel: Axially
distributed model equivalent to the two-compartmental model when the solute does not enter the endothelial or
parenchymal cells. PSg, the conductance for permeation through the interendothelial clefts, is equivalent to the PS of
the compartmental model in the left panel. Right panel from Gorman et al. (18) with permission from the American
Physiological Society.

PSpc, set to zero, PSg is equivalent to the compartmental PS, and


Vpl and Visf are equivalent to V1 and V2. In each of the four regions
there is an axial dispersion coefficient, equivalent to a diffusion
coefficient, setting the rate of random spreading along the length.
The full version of this model, GENTEX, a general multispecies
model for blood–tissue exchange, is available at: www.physiome.
org/jsim/models/webmodel/NSR/gentex.proj.
Chinard (34) developed a technique to distinguish individual
processes involved in blood–tissue exchange and reaction: the
Multiple-Indicator Dilution (MID) technique (Fig. 12). He first
used it for the purpose of estimating the volumes of distribution for
sets of tracers of differing characteristics: the mean transit time
volume, Vmtt = F · t̄, where t̄ is the mean transit time through the
system. He did not estimate permeabilities as his studies were on
highly permeable solutes. Crone (35) analyzed the technique,
showing how it could be used to estimate PS from the outflow curves
for a simultaneous injection into the inflow of a solute and an
impermeable reference intravascular tracer, as shown in Fig. 12. The figure dia-
grams an experimental setup for examining the uptake of D-glucose in
an isolated perfused heart as by Kuikka et al. (36). L-Glucose, the
stereoisomer, serves as an extracellular, non-metabolized reference.
A more realistic diagram of a capillary–tissue exchange includes the
endothelial cells and interstitial fluid (ISF), as shown in Fig. 11.
To determine capillary permeability, the relevant reference sol-
ute is one that does not escape from the capillary blood during
single transcapillary passage; for example, albumin is the relevant
reference solute to determine the capillary permeability to glucose.
In this situation the albumin dispersion along the vascular space

Fig. 12. Schematic overview of experimental procedures underlying the application of the multiple-indicator dilution
technique to the investigation of multiple substrates passing through an isolated organ without recirculation of tracer. The
approach naturally extends also to their metabolites.

may be assumed to be the same as that of the glucose; thus


the shape of the albumin impulse response, halb(t), accounts for
the intravascular transport of all the solutes. (L-Glucose, an extra-
cellular reference tracer with the same molecular weight and diffu-
sivity as D-glucose, is the extracellular reference for D-glucose,
having the same capillary PSg and the same interstitial volume of
distribution, Visf. Having simultaneous data on such reference tra-
cers greatly reduces the degrees of freedom in estimating the para-
meters of interest for D-glucose.)
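Chinard's mean-transit-time estimate can be sketched numerically. The snippet below is illustrative only (the Erlang-shaped outflow curve and the flow value F = 1 ml/(g min) are invented for the example): it computes t̄ = ∫t·h(t)dt / ∫h(t)dt from a sampled dilution curve and forms Vmtt = F·t̄:

```python
import math

def mean_transit_time(times, h):
    # t_bar = integral(t*h(t)) / integral(h(t)), by a simple Riemann sum
    dt = times[1] - times[0]
    area = sum(h) * dt
    return sum(t * hi for t, hi in zip(times, h)) * dt / area

dt = 0.001
times = [i * dt for i in range(20000)]                 # 0 to 20 min
tau = 0.5
h = [t / tau**2 * math.exp(-t / tau) for t in times]   # synthetic outflow curve, mean 2*tau = 1 min
F = 1.0                                                # assumed flow, ml/(g min)
t_bar = mean_transit_time(times, h)
V_mtt = F * t_bar                                      # mean transit time volume, ml/g
```

With this synthetic curve the recovered t̄ is 1.0 min, so Vmtt = 1.0 ml/g.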

7.2. Model Equations for Tracer

The two diagrams in Fig. 11 look quite different, but the second
can be reduced to the compartmental model, as we will show
below. The essential difference is that the distributed model
accounts for concentration gradients along the capillary length.
Capillaries are about 1 mm long and 5 μm in diameter, an
aspect ratio of 200. Diffusional relaxation times scale with the
square of distance, and thus differ by a factor of about 200² between
radial and axial directions. Consequently,
considering the capillary as a stirred tank is unreasonable.
The stirred tank expressions account for the flow through
compartment 1, the permeation, and consumption terms G2 ml/
(g min) in the second compartment:
dC1/dt = −(PS/V1)·(C1 − C2) + (F/V1)·(Cin − C1),

dC2/dt = +(PS/V2)·(C1 − C2) − (G2/V2)·C2.    (10)
The use of these ODEs implies and builds into the calculations
a discontinuity between the concentration of solute in the inflow

and that in V1. Because V1 is assumed instantly mixed, there is no


gradient along the capillary and the tracer entering the tank is
immediately available to be washed out with the same probability
as any molecule dwelling in there for a longer time.
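As a purely illustrative rendering of Eq. 10 (not the JSim model; the parameter values are invented for the example), the sketch below integrates the two stirred-tank ODEs by explicit Euler for a constant-concentration inflow:

```python
def dcdt(c1, c2, cin, F=1.0, PS=1.0, V1=0.05, V2=0.15, G2=1.0):
    # Eq. 10: capillary compartment 1 exchanges with tissue compartment 2
    d1 = -(PS / V1) * (c1 - c2) + (F / V1) * (cin - c1)
    d2 = (PS / V2) * (c1 - c2) - (G2 / V2) * c2
    return d1, d2

def integrate(cin=1.0, t_end=5.0, dt=1e-4):
    """Explicit-Euler integration to (near) steady state."""
    c1 = c2 = 0.0
    for _ in range(int(t_end / dt)):
        d1, d2 = dcdt(c1, c2, cin)
        c1 += dt * d1
        c2 += dt * d2
    return c1, c2
```

At steady state the balances F·(Cin − C1) = PS·(C1 − C2) = G2·C2 hold; with F = PS = G2 = 1 and Cin = 1 this gives C1 = 2/3 and C2 = 1/3, which the integration reproduces.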
Alternatively, the capillary–tissue unit of Fig. 11, right, can be
reduced to two regions represented by PDEs that allow a continuous
gradient along the length of the capillary between entrance and
exit, like those of Krogh and Erlang (37), Sangren and Sheppard
(38), and Renkin (39). The present model improves upon these by incorporating
axial dispersion in both the capillary and extravascular regions.
Using the spatially distributed analogs for plasma, Cpl, or blood,
and extravascular tissue, Cisf, instead of the lumped variables C1 and
C2:
∂Cpl(x, t)/∂t = −(Fpl·L/Vpl)·∂Cpl/∂x − (PSg/Vpl)·(Cpl − Cisf) + Dx1·∂²Cpl/∂x²,

∂Cisf(x, t)/∂t = +(PSg/Visf)·(Cpl − Cisf) − (Gisf/Visf)·Cisf + Dx2·∂²Cisf/∂x²,    (11)
where Cpl and Cisf are spatially distributed functions of both x and t,
not just t. The axial position is denoted by x, where 0 < x < L, the
capillary length, cm. The analogy between this model, Eq. 11, and
the compartmental version in Eq. 10 is Fp = F, Vp = V1, Visf = V2,
and PSc = PS, the permeability–surface area product of the capillary wall,
but we retain the two sets of names in order to allow comparisons
between the estimated parameter values. The capillary length, L, is
arbitrarily set to an average value such as 0.1 cm; in the computa-
tions what is being used is the dimensionless fractional length, x/L.
PDEs require boundary conditions. At the capillary entrance, in
contrast to the compartmental model there is no discontinuity in the
concentration profile, but there is a requirement for matching the
diffusional and convective terms so that the influx is just the
convected mass, F·Cp(x = 0, t). The boundary conditions are written:

at the inlet to the capillary, x = xmin: (Fp·L/Vp)·(Cp − Cin) + Dp·dCp/dx = 0;    (12)

at the exit from the capillary, x = xmax: dCp/dx = 0, Cout = Cp.    (13)
The form of the inlet condition is important when the diffusion is
large, and we use it here for conceptual and practical accuracy. The
outflow concentration Cout is set equal to the concentration just inside
the exit, Cp(x = L, t), the same condition described by the ODEs.
The last term in each equation is the diffusion along the length
of the capillary–tissue regions; the use of an anatomically correct
length then makes using observed diffusion coefficients for Dp and
Dtiss, cm²/s, practical and meaningful. Gross exaggeration of the

diffusion coefficients can be used in the equations to turn the


distributed model into a de facto well-mixed, compartmental
model. With both diffusion coefficients set to infinity this model
behaves identically to the compartmental model of Eq. 10.
The negative sign on the flow term, the first term on the right
of the equal sign in Eq. 11, merits further explanation. Consider
the inflow to contain a bolus of solute: as it enters, the concentra-
tion at the capillary entrance rises. At this time, the slope of the
curve of concentration versus position x, ∂C/∂x, is negative, as
illustrated in the lower panel of Fig. 13 by the slope of C(x, t) for
the bolus shape at t = 1.5 s, at the capillary midpoint, x = 0.5 L.
The spatial slope always has the sign opposite to that of the temporal
derivative ∂C/∂t at the same point, hence the negative sign on the
term.
Functionally, therefore, Eq. 11 is analogous to Eq. 10. But
using the PDEs avoids the unrealistic discontinuity in the compart-
mental model at the entrance. Obnoxious, unrealistic discontinu-
ities at the entrances are the consequence of the instantaneous mixing
within a compartment. Using the PDE allows continuity in concen-
trations and concentration gradients along the capillary, and not
only in concentration but also in the properties of the system such
as axial gradients in transporter and enzyme densities that are
evident in the liver sinusoid. For the following analysis, all para-
meters are assumed spatially uniform so as to minimize the differ-
ence from the compartmental models. There are many ways of
representing axially distributed convecting systems, and two are
shown in Fig. 13. One solves PDEs using a PDE solver (there are
several choices within JSim), and here we used a Lagrangian
method (40–42). A compartmental type of alternative is to approx-
imate the capillary as a series of stirred tanks, each with the same
volume and PS, as diagrammed in the upper part of the figure. With
a large number of serial stirred tanks, the longitudinal concentra-
tion gradient is approximated increasingly accurately as the steps
from one to the next are small. The intravascular transport process
with serial stirred tanks is a Poisson process. In modeling, serial
stirred tanks are convenient because the number of tanks, Ntanks,
can be used as a free parameter. The relative dispersion RD over the
length of the tube is determined by Ntanks such that the RD of the
outflow curve induced during transit (RD equals the coefficient of
variation, the standard deviation divided by the mean transit time)
is the reciprocal of the square root of Ntanks, so that with 100 tanks
the RD is 10%.
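The 1/√Ntanks relation is easy to check numerically. The sketch below (illustrative code, not the JSim implementation) pushes a unit impulse through N identical stirred tanks in series by explicit Euler and computes the RD of the outflow curve from its first two moments:

```python
def tanks_rd(n, tau_total=1.0, dt=1e-3, t_end=4.0):
    """Relative dispersion (SD / mean transit time) of the outflow of n serial tanks."""
    tau = tau_total / n            # time constant of each tank
    r = dt / tau
    c = [0.0] * n
    c[0] = 1.0                     # impulse into the first tank (amplitude is arbitrary)
    m0 = m1 = m2 = 0.0
    for k in range(1, int(t_end / dt) + 1):
        for i in range(n - 1, 0, -1):      # update downstream tanks first
            c[i] += r * (c[i - 1] - c[i])
        c[0] -= r * c[0]
        t = k * dt
        h = c[-1]                  # outflow is proportional to the last tank's concentration
        m0 += h * dt
        m1 += h * t * dt
        m2 += h * t * t * dt
    mean = m1 / m0
    var = m2 / m0 - mean * mean
    return var ** 0.5 / mean
```

With 16 tanks the computed RD is close to 1/√16 = 0.25, and with 4 tanks close to 1/√4 = 0.5.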
Figure 13 (upper) shows that curves for the PDE solution and
for the Poisson process are essentially similar, so that the dispersion
coefficient Dp sufficed to create the same dispersion as occurred
with the Poisson process using 109 tanks. The choice of 109 tanks
is arbitrary, but large enough that a plot of C(Ntanks) versus N would appear
smooth. Because the capillary PS > 0, there is loss of solute as the

Fig. 13. Pulse responses in axially distributed models. The input function, Cin, is a pulse of duration 1.4 s. Upper panel:
Outflow concentration–time curves for a partial differential equation solution using a Lagrangian sliding fluid element
method and an intravascular dispersion coefficient, Dp = 2.6 × 10⁻⁵ cm²/s (gray curve), and for a serial stirred tank
algorithm representing a Poisson process with 109 stirred tanks (black curve almost superimposed on the gray one). Lower
panel: Intracapillary spatial profiles in the distributed model (using the PDEs) at a succession of times, 1.5, 2.0, 2.5, and
3.0 s. The pulse slides and disperses due to the diffusion while some solute is lost from the vascular space by permeation
of the capillary wall. Parameters were the same for the compartmental 109-tank Poisson model and the PDE: Fp = 1 ml/
(g min), PSC = 2 ml/(g min), and tissue volume Vtiss was set to 10 ml/g so that there was negligible tracer flux from tissue
back into the plasma space. (The model: http://www.physiome.org/jsim/models/webmodel/NSR/Anderson_JC_2007/FIGURES/Anderson_JC_2007_fig11/index.html) Figure from Anderson (6).

bolus progresses along the capillary. The permeative loss is the same
for both methods, with the result that the peak outflow concentra-
tions are similar. Figure 13 (lower) shows the shape of the bolus as a
function of position as it deforms continuously from its initial
square pulse at the entrance to the capillary. The diminution in
peak height is therefore due not only to the spreading but also to
the loss. This loss is reflected of course in the reduction in the areas.
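The Lagrangian sliding-fluid-element idea can be sketched in a few lines. The following is a simplified illustration, not the published LSFEA algorithm: it treats the tissue as an effectively infinite sink (one-way loss, as in the figure's parameter choice), slides the plasma column one segment downstream each time step, and then applies the permeative loss:

```python
import math

def lagrangian_outflow(n_seg=50, n_steps=200, Fp=1.0, Vp=0.05, PSg=2.0, cin=1.0):
    """Operator-split convection (shift) plus one-way permeative loss."""
    dt = (Vp / Fp) / n_seg                 # one segment of transit per time step
    decay = math.exp(-(PSg / Vp) * dt)     # per-step loss through the capillary wall
    c = [0.0] * n_seg
    out = []
    for _ in range(n_steps):
        out.append(c[-1])                  # concentration leaving the exit
        c = [cin] + c[:-1]                 # convection: slide one segment downstream
        c = [x * decay for x in c]         # exchange applied along the whole column
    return out
```

With PSg = 2 and Fp = 1 ml/(g min) the steady outflow is e^(−PSg/Fp) ≈ 0.135 of the inflow, the Crone–Renkin throughput fraction for a solute that cannot return from the tissue; with PSg = 0 the tube is a pure delay of one mean transit time.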

Fig. 14. Effect of reduction of Ntanks on the output C(t). Responses of the Nth-order
Poisson operator with Ntanks varied from 109 tanks in series down to 50, 20, 10, 5, 2,
and finally to a single mixing chamber, Ntanks = 1. The gray curve is the Lagrangian
solution to the PDEs as in Fig. 13. All of the Poisson operator outflow curves (black) have
the same mean transit time and the same parameters: Vp = 0.05 and Vtiss = 0.15 ml/g;
Fp = 1 ml/(g min), PSg = 1 ml/(g min). Figure from Anderson (6). Model is #46, run in
loop mode to change Ntanks (http://www.physiome.org/jsim/models/webmodel/NSR/Anderson_JC_2007/FIGURES/Anderson_JC_2007_fig12/index.html).

The grey curve touching the top of each spatial profile is the
theoretical curve from Crone (35) and Renkin (39), as in the model
of Sangren and Sheppard (38):

C(x) = 1 · e^(−PSg·x/(Fp·L)),    (14)

where x/L is the fractional distance along the capillary, the abscissa
in the lower panel.
Now that we know that the multi-compartmental serial tank
representation can give results approximating the normal PDE
representation, and that they differ basically only in the numerical
method used, the question becomes: “Which of the methods pro-
duces the correct assessment of the parameters with the greatest
efficiency?” The serial stirred tank model has the disadvantage that
the waveforms are seriously distorted by reducing the number of
stirred tanks, as is shown in Fig. 14.
While reducing the number of tanks in the stirred tank method
has a dramatic effect on the shapes of the outflow curves, the
problem is much less severe with the PDE representation, as
shown in Fig. 15. Solutions are shown for Ngrid = 109, 51, 21,
11, and 7 for two methods of solving PDEs, one using a robust
solver TOMS731 (43) and the other using a Lagrangian sliding
fluid element algorithm (42).

Fig. 15. Effect of reduction of Ngrid on the output C(t) with two PDE solvers. The green curve is the serial compartmental
model with 109 tanks. The red rectangle is the input function divided by 4. Black curves are the solutions to the PDE
at varied resolutions. Upper panel: Lagrangian sliding fluid element algorithm, LSFEA, with Ngrid = 121 (black solid line),
51 (short dashes superimposed on the black solid line), 21 (long dashes), and 11 (dotted). This algorithm, while
computationally fast, describes the outflow dilution curve as a series of square pulses; with a large number of segments the curves
appear smooth. With fewer segments, e.g., Ngrid = 21 (long dashes) or 11 (dotted line), the steps are obvious but the
approximation is reasonably good. Lower panel: TOMS731 solver. This slow, robust solver, like most PDE solvers,
broadens the solution somewhat with reduced Ngrid, but with Ngrid = 11 (dotted line) there is obvious spreading and
oscillation in the solutions (www.physiome.org Model #126).

Fig. 16. Outflow dilution curves for D-glucose, 2-deoxy-D-glucose, and albumin (dog expt
4048-6) fitted with the model. The deoxyglucose curve has been shifted downward by half
a logarithmic decade (ordinate values divided by √10) to display it separately from the
D-glucose curve. Parameter estimates for D- and deoxyglucose were PSc, 0.97 and 1.0;
PSpc = 0.7 and 0.5; Gpc = 0.01 and 0.05 ml/(g min); the volume ratios Visf/Vp = 6.5 and
Vpc/Vp = 13.3, the same for both glucoses. Coefficients of variation were 0.19 and
0.09. From Kuikka (36) with permission from the American Physiological Society. (Model is
at www.physiome.org: search on Kuikka, model 126.)

Computation times for the PDE solvers are almost propor-


tional to Ngrid, the number of spatial elements chosen for the
computation. LSFEA is very much faster than TOMS731, and is
also faster than the serial tanks model since fewer segments are
needed. The key point is that when the number of grid points is
reduced, the PDE solutions do not change shape as much as those of the
serial compartmental models. The result is that the PDEs are more
efficient for fitting the data and provide a realistic solution.
Accounting correctly for smooth intracapillary gradients is
important in the analysis of indicator dilution data for low-molecu-
lar-weight solutes using high-resolution techniques. Figure 16 is an
example of modeling of the uptake of glucoses in the dog heart.
The data from Kuikka (36) are outflow dilution curves from actively
contracting blood-perfused dog hearts. Albumin is the intravascu-
lar reference data curve defining the dispersion and delay from the
coronary inflow to the effluent veins for all of the tracers. The study
was on D-glucose, a normal substrate, and 2-deoxy-D-glucose, an
abnormal glucose that is transported into the cardiomyocyte, phos-
phorylated by hexokinase, and then cannot be further metabolized.
The model is that illustrated in Fig. 11 (right panel) but without
considering the negligible uptake by the endothelial cells. Para-
meters for the capillary wall permeation through the clefts and for

the cellular uptake are given in the legend. The quality of the data is
shown by the smoothness of the curves over time; the goodness of
fit is a testimonial to the quality of the model defined through
the PDEs.

8. Commentary

This presentation has emphasized issues important for new users of


compartmental systems to contemplate. This reduced coverage
neglects many important issues such as the identifiability of the
models, distinguishing one option from another, the numbers
and accuracy of the data points and their timing in obtaining
parameter estimates, the consequences of model verification test-
ing, and the basic principles of numerical methods and optimiza-
tion techniques. These are all covered in the more general
references listed in the Introduction.
We did, however, carefully express the models in physiological
terms, trying to emphasize recognition of the nature of the
processes of exchange and reaction. Thus Eq. 10 was deliberately not
written in the common parlance used in compartmental modeling:
dC1/dt = −k1·(C1 − C2) + k2·(Cin − C1),

dC2/dt = +k3·(C1 − C2) − k4·C2,    (15)

where k1 = PS/V1, k2 = F/V1, k3 = PS/V2, and k4 = G2/V2.
These equations look simpler, for it appears that there are fewer
parameters, namely, four, while the original equations appear to
have five: F, PS, V1 and V2, and G2. But they are deceptively simple,
quietly masking their true identities behind the fact that there are
really only four independent parameters in the original Eq. 10.
Putting τ = V1/F, γ = V2/V1, δ = G2/F, and ε = PS/F, we
have k1 = PS/V1 = ε/τ; k2 = F/V1 = 1/τ; k3 = PS/V2 = ε/(γτ);
and k4 = G2/V2 = δ/(γτ). This illustrates that there were
only four independent parameters in the original expressions, as expected,
remembering that the exponential response of a single compartmental
system has just one parameter, τ = V1/F, its mean transit
time. The point is that PS, G, F, and the Vs have real anatomic and
physiological meaning; they remind us that when the flow is
known, V1 is the relevant unknown, and it is usually constrained
by knowledge of the anatomy.
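The parameter redundancy is simple to verify in code. The sketch below is illustrative only (function names and numerical values are invented): it computes the magnitudes of k1–k4 once from the five physiological parameters and once from the four groups τ, γ, δ, ε, and confirms that the two routes agree.

```python
def k_from_physiology(F, PS, V1, V2, G2):
    # magnitudes of the rate coefficients in Eq. 15
    return PS / V1, F / V1, PS / V2, G2 / V2

def k_from_groups(tau, gamma, delta, eps):
    # tau = V1/F, gamma = V2/V1, delta = G2/F, eps = PS/F
    return eps / tau, 1.0 / tau, eps / (gamma * tau), delta / (gamma * tau)

F, PS, V1, V2, G2 = 1.0, 2.0, 0.05, 0.15, 0.4   # illustrative values, not fitted ones
groups = (V1 / F, V2 / V1, G2 / F, PS / F)       # (tau, gamma, delta, eps)
```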
In general, “data” should include more than just a set of con-
centration–time curves, for one must recognize the underlying
anatomy as data. Anatomic data constrain the estimates of water
space or physical volumes as shown by Yipintsoi (44) and Vinnakota

(45). Further, large comprehensive models can be used to great


advantage when built upon anatomic and physicochemical con-
straints combined into the physiological kinetics (46).
In this essay we have not emphasized the distinction between
tracer and non-tracer kinetics except in the Introduction in
Eqs. 1–3. We have however considered the nonlinearity of reactions
in Application 4A, Aspirin Kinetics, where the dependence of the
reaction flux on the concentration is evident. If the salicylate con-
centration were held constant at any particular value, then for a
tracer, the tracer flux would be determined entirely by the concen-
tration of the ambient non-tracer salicylate, and the system would
be reduced to a linear one with constant coefficients (47). Likewise,
when examining tracer fluxes in a situation where the mother substance
is varying, this should be taken into account by computing the dual
model, for mother and tracer simultaneously, a consideration all
too often forgotten in the common usage of compartmental analy-
sis. The warning is: when using tracers, measure the background
concentrations of mother substance often enough to assure its
constancy.
The traditional compartmental analysis therefore has its great-
est strength in situations where the overall system is in steady state
for all substances related to the tracer substance, for then the power
of linear systems analysis applies, convolution integration and sta-
tionarity apply, and matrix inversion can be used on sets of linear
ODEs. Then all is well set up for linear systems analysis.
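For such steady-state (linear, stationary) systems, the outflow is the convolution of the inflow with the system's transport function h(t). A minimal discrete sketch (illustrative only; the monoexponential h and the inputs are invented) with a check of the superposition property:

```python
import math

def convolve(cin, h, dt):
    """Cout(t) = integral of Cin(t') * h(t - t') dt', discretized."""
    out = [0.0] * len(cin)
    for i in range(len(cin)):
        out[i] = sum(cin[j] * h[i - j] for j in range(i + 1)) * dt
    return out

dt = 0.1
n = 100
h = [math.exp(-i * dt) for i in range(n)]      # illustrative monoexponential transport function
a = [1.0 if i < 5 else 0.0 for i in range(n)]  # a short square input
b = [0.5] * n                                  # a constant input
```

Superposition (the essence of linear systems analysis) means convolve(a + b) equals convolve(a) + convolve(b), and a unit impulse recovers h itself.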

References

1. Berman M (1963) The formulation and testing of models. Ann N Y Acad Sci 108:182–194
2. Jacquez JA (1972) Compartmental analysis in biology and medicine. Kinetics of distribution of tracer-labeled materials. Elsevier Publishing Co, Amsterdam, 237 pp
3. Jacquez JA (1996) Compartmental analysis in biology and medicine, 3rd edn. BioMedware, Ann Arbor, MI, 514 pp
4. Cobelli C, Foster D, Toffolo G (2000) Tracer kinetics in biomedical research. From data to model. Kluwer Academic, New York
5. Zierler KL (1981) A critique of compartmental analysis. Annu Rev Biophys Bioeng 10:531–562
6. Anderson JC, Bassingthwaighte JB (2007) Tracers in physiological systems modeling. Chapter 8 in Mathematical modeling in nutrition and agriculture. In: Mark D. Hanigan JN, Casey L Marsteller. Proceedings of the ninth international conference on mathematical modeling in nutrition, Roanoke, VA, 14–17 August 2006. Virginia Polytechnic Institute and State University, Blacksburg, VA, pp 125–159
7. Knopp TJ, Anderson DU, Bassingthwaighte JB (1970) SIMCON–Simulation control to optimize man-machine interaction. Simulation 14:81–86
8. Sauro HM, Fell DA (1991) SCAMP: a metabolic simulator and control analysis program. Math Comput Model 15:15–28
9. Sauro HM, Hucka M, Finney A, Bolouri H (2001) The systems biology workbench concept demonstrator: design and implementation. Available via the World Wide Web at http://www.cds.caltech.edu/erato/sbw/docs/detailed-design/
10. Raymond GM, Butterworth E, Bassingthwaighte JB (2003) JSIM: free software package for teaching physiological modeling and research. Exp Biol 280.5:102. (www.physiome.org/jsim)
11. Chizeck HJ, Butterworth E, Bassingthwaighte JB (2009) Error detection and unit conversion. Automated unit balancing in modeling interface systems. IEEE Eng Med Biol 28(3):50–58
12. Platt JR (1964) Strong inference. Science 146:347–353
13. Bassingthwaighte JB, Chinard FP, Crone C, Goresky CA, Lassen NA, Reneman RS, Zierler KL (1986) Terminology for mass transport and exchange. Am J Physiol Heart Circ Physiol 250:H539–H545
14. Benedek IH, Joshi AS, Pieniazek JH, King S-YP, Kornhauser DM (1995) Variability in the pharmacokinetics and pharmacodynamics of low dose aspirin in healthy male volunteers. J Clin Pharmacol 35:1181–1186
15. Aarons L, Hopkins K, Rowland M, Brossel S, Thiercelin JF (1989) Route of administration and sex differences in the pharmacokinetics of aspirin, administered as its lysine salt. Pharm Res 6:660–666
16. Prescott LF, Balali-Mood M, Critchley JAJH, Johnstone AF, Proudfoot AT (1982) Diuresis or urinary alkalinisation for salicylate poisoning? Br Med J 285:1383–1386
17. Chan IS, Goldstein AA, Bassingthwaighte JB (1993) SENSOP: a derivative-free solver for non-linear least squares with sensitivity scaling. Ann Biomed Eng 21:621–631
18. Glad T, Goldstein A (1977) Optimization of functions whose values are subject to small errors. BIT 17:160–169
19. Dennis JE, Schnabel RB (1983) Numerical methods for unconstrained optimization and nonlinear equations. Prentice-Hall, New York
20. Fox IJ, Brooker LGS, Heseltine DW, Essex HE, Wood EH (1957) A tricarbocyanine dye for continuous recording of dilution curves in whole blood independent of variations in blood oxygen saturation. Proc Staff Meet Mayo Clin 32:478
21. Edwards AWT, Isaacson J, Sutterer WF, Bassingthwaighte JB, Wood EH (1963) Indocyanine green densitometry in flowing blood compensated for background dye. J Appl Physiol 18:1294–1304
22. Edwards AWT, Bassingthwaighte JB, Sutterer WF, Wood EH (1960) Blood level of indocyanine green in the dog during multiple dye curves and its effect on instrumental calibration. Proc Staff Meet Mayo Clin 35:747–751
23. Bassingthwaighte JB (1966) Plasma indicator dispersion in arteries of the human leg. Circ Res 19:332–346
24. Hunton DB, Bollman JL, Hoffman HN (1961) The plasma removal of indocyanine green and sulfobromophthalein: effect of dosage and blocking agents. J Clin Invest 30(9):1648–1655 (PMCID PMC290858)
25. Bassingthwaighte JB, Edwards AWT, Wood EH (1962) Areas of dye-dilution curves sampled simultaneously from central and peripheral sites. J Appl Physiol 17:91–98
26. Stewart GN (1897) Researches on the circulation time and on the influences which affect it: IV. The output of the heart. J Physiol 22:159–183
27. Hamilton WF, Moore JW, Kinsman JM, Spurling RG (1932) Studies on the circulation. IV. Further analysis of the injection method, and of changes in hemodynamics under physiological and pathological conditions. Am J Physiol 99:534–551
28. Thompson HK, Starmer CF, Whalen RE, McIntosh HD (1964) Indicator transit time considered as a gamma variate. Circ Res 14:502–515
29. Bassingthwaighte JB, Ackerman FH, Wood EH (1966) Applications of the lagged normal density curve as a model for arterial dilution curves. Circ Res 18:398–415
30. Krenn CG, Krafft P, Schaefer B, Pokorny H, Schneider B, Pinsky MR, Steltzer H (2000) Effects of positive end-expiratory pressure on hemodynamics and indocyanine green kinetics in patients after orthotopic liver transplantation. Crit Care Med 28:1760–1765
31. Krenn CG, Pokorny H, Hoerauf K, Stark J, Roth E, Steltzer H, Druml W (2008) Non-isotopic tyrosine kinetics using an alanyl-tyrosine dipeptide to assess graft function in liver transplant recipients - a pilot study. Wien Klin Wochenschr 120(1–2):19–24
32. Kortgen A, Paxian M, Werth M, Recknagel P, Rauschfusz F, Lupp A, Krenn C, Muller D, Claus RA, Reinhart K, Settmacher U, Bauer M (2009) Prospective assessment of hepatic function and mechanisms of dysfunction in the critically ill. Shock 32(4):358–365
33. Bassingthwaighte JB, Yipintsoi T, Harvey RB (1974) Microvasculature of the dog left ventricular myocardium. Microvasc Res 7:229–249
34. Chinard FP, Vosburgh GJ, Enns T (1955) Transcapillary exchange of water and of other substances in certain organs of the dog. Am J Physiol 183:221–234
35. Crone C (1963) The permeability of capillaries in various organs as determined by the use of the indicator diffusion method. Acta Physiol Scand 58:292–305
36. Kuikka J, Levin M, Bassingthwaighte JB (1986) Multiple tracer dilution estimates of D- and 2-deoxy-D-glucose uptake by the heart. Am J Physiol Heart Circ Physiol 250:H29–H42
37. Krogh A (1919) The number and distribution of capillaries in muscles with calculations of the oxygen pressure head necessary for supplying the tissue. J Physiol (Lond) 52:409–415
38. Sangren WC, Sheppard CW (1953) A mathematical derivation of the exchange of a labeled substance between a liquid flowing in a vessel and an external compartment. Bull Math Biophys 15:387–394
39. Renkin EM (1959) Transport of potassium-42 from blood to tissue in isolated mammalian skeletal muscles. Am J Physiol 197:1205–1210
40. Bassingthwaighte JB (1974) A concurrent flow model for extraction during transcapillary passage. Circ Res 35:483–503
41. Bassingthwaighte JB, Wang CY, Chan IS (1989) Blood-tissue exchange via transport and transformation by endothelial cells. Circ Res 65:997–1020
42. Bassingthwaighte JB, Chan IS, Wang CY (1992) Computationally efficient algorithms for capillary convection-permeation-diffusion models for blood-tissue exchange. Ann Biomed Eng 20:687–725
43. TOMS: Association for Computing Machinery Transactions on Mathematical Software. http://www.netlib.org/toms/index.html
44. Yipintsoi T, Scanlon PD, Bassingthwaighte JB (1972) Density and water content of dog ventricular myocardium. Proc Soc Exp Biol Med 141:1032–1035
45. Vinnakota K, Bassingthwaighte JB (2004) Myocardial density and composition: a basis for calculating intracellular metabolite concentrations. Am J Physiol Heart Circ Physiol 286:H1742–H1749
46. Bassingthwaighte JB, Raymond GR, Ploger JD, Schwartz LM, Bukowski TR (2006) GENTEX, a general multiscale model for in vivo tissue exchanges and intraorgan metabolism. Phil Trans Roy Soc A Math Phys Eng Sci 364(1843):1423–1442. doi:10.1098/rsta.2006.1779
47. Bassingthwaighte JB, Goresky CA, Linehan JH (1998) Ch. 1 Modeling in the analysis of the processes of uptake and metabolism in the whole organ. In: Bassingthwaighte JB, Goresky CA, Linehan JH (eds) Whole organ approaches to cellular metabolism. Springer, New York, pp 3–27
Chapter 18

Physiologically Based Pharmacokinetic/Toxicokinetic Modeling

Jerry L. Campbell Jr., Rebecca A. Clewell, P. Robinan Gentry,
Melvin E. Andersen, and Harvey J. Clewell III

Abstract
Physiologically based pharmacokinetic (PBPK) models differ from conventional compartmental
pharmacokinetic models in that they are based to a large extent on the actual physiology of the organism.
The application of pharmacokinetics to toxicology or risk assessment requires that the toxic effects in a
particular tissue are related in some way to the concentration time course of an active form of the substance
in that tissue. The motivation for applying pharmacokinetics is the expectation that the observed effects of a
chemical will be more simply and directly related to a measure of target tissue exposure than to a measure of
administered dose. The goal of this work is to provide the reader with an understanding of PBPK modeling
and its utility as well as the procedures used in the development and implementation of a model to chemical
safety assessment using the styrene PBPK model as an example.

Key words: PBPK, Styrene, Pharmacokinetics

1. Introduction

Pharmacokinetics is the study of the time course for the absorption,


distribution, metabolism, and excretion of a chemical substance in a
biological system. Implicit in any application of pharmacokinetics
to toxicology or risk assessment is the assumption that the toxic
effects in a particular tissue can be related in some way to the
concentration time course of an active form of the substance in
that tissue. Moreover, except for pharmacodynamic differences
between animal species, it is expected that similar responses will
be produced at equivalent tissue exposures regardless of animal
species, exposure route, or experimental regimen (1–3). Of course
the actual nature of the relationship between tissue exposure and
response, particularly across species, may be quite complex, and
exceptions to the rule of tissue dose equivalence are numerous.

Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_18, # Springer Science+Business Media, LLC 2012

439
440 J.L. Campbell Jr. et al.

Nevertheless, the motivation for applying pharmacokinetics is the
expectation that the observed effects of a chemical will be more
simply and directly related to a measure of target tissue exposure
than to a measure of administered dose.

1.1. Compartmental Modeling

One of the first general descriptions of pharmacokinetic modeling
was presented by Teorell (4, 5). The model consisted of a number of
compartments representing specific tissues. The concentration of
chemical in each compartment was described by a mass balance equa-
tion in which the rate of change of the amount of chemical in a
compartment was determined from the rates at which the chemical
entered and left the compartment in the blood as well as, when
appropriate, the rate of clearance of the chemical in that compartment.
Unfortunately, in order to obtain an analytical solution of the resulting
system of differential equations, Teorell found it necessary to make a
number of simplifying assumptions. These assumptions led to a solu-
tion in the form of a sum of exponential terms, and thus the “classical”
compartmental modeling approach still used today was born.
Over the years, Teorell’s association of the model compartments
with specific tissues has to a large extent been lost, and compartmen-
tal modeling as currently practiced is largely an empirical exercise. In
this empirical approach, data on the time course of the chemical of
interest in blood (and perhaps other tissues, urine, etc.) are collected.
Based on the behavior of the data, a mathematical model is selected
which possesses a sufficient number of compartments (and therefore
parameters) to describe the data. The compartments do not in gen-
eral correspond to identifiable physiological entities but rather are
described in abstract terms. An example of a simple two-
compartment mathematical model of this type is shown in Fig. 1.
This particular model consists of a “central” compartment, character-
ized by concentrations measured in the blood (but not considered to
actually represent only the blood), and a “deep” compartment repre-
senting unspecified tissues communicating with the central compart-
ment as described by kinetic parameters, k12 and k21, which
themselves have no obvious physiological or biochemical interpreta-
tion. Similarly, the volume of the central compartment and the uptake
and clearance parameters (ka and ke) are empirically determined by the
analysis or fitting of experimental data sets.

[Figure: a "CENTRAL" compartment with first-order uptake (ka) and clearance (ke), exchanging with a "DEEP" compartment via rate constants k12 and k21.]
Fig. 1. Simple compartmental pharmacokinetic model.
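The dynamics of the two-compartment scheme in Fig. 1 can be sketched numerically. The Python sketch below is illustrative only: the rate constants are arbitrary placeholders rather than values fitted to any data set, and a simple Euler integration stands in for the analytical sum-of-exponentials solution.

```python
# Sketch of the empirical two-compartment model in Fig. 1.
# All parameter values are illustrative placeholders, not fitted constants.

def simulate(dose=100.0, v_central=10.0, ka=1.0, ke=0.3,
             k12=0.5, k21=0.2, dt=0.001, t_end=24.0):
    """Euler integration of first-order uptake (ka) into a central
    compartment, exchange with a deep compartment (k12, k21), and
    first-order clearance (ke) from the central compartment."""
    a_depot, a_central, a_deep = dose, 0.0, 0.0
    times, conc = [], []
    t = 0.0
    while t <= t_end:
        times.append(t)
        conc.append(a_central / v_central)   # observed blood concentration
        d_depot = -ka * a_depot
        d_central = ka * a_depot - (ke + k12) * a_central + k21 * a_deep
        d_deep = k12 * a_central - k21 * a_deep
        a_depot += d_depot * dt
        a_central += d_central * dt
        a_deep += d_deep * dt
        t += dt
    return times, conc
```

Fitting such a model means adjusting ka, ke, k12, and k21 until the simulated curve matches the observed blood time course; as the text notes, the fitted values carry no intrinsic physiological meaning.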


The advantage of this modeling approach is that there is no
limitation to fitting the model to the experimental data. If a
particular model is unable to describe the behavior of a particular data set,
additional compartments can be added until a successful fit is
obtained. Since the model parameters do not possess any intrinsic
meaning, they can be freely varied to obtain the best possible fit,
and different parameter values can be used for each data set in a
related series of experiments. Statistical tests can then be employed
to compare the values of the parameters used, for example at two
administered dose levels, in order to determine whether any appar-
ent differences are statistically significant (6). Once developed,
these models are useful for interpolation and limited extrapolation
of the concentration profiles which can be expected as experimental
conditions are varied. If the model parameters vary with dose, this
information can provide evidence for the presence of nonlinearities
in the animal system, such as saturable metabolism or binding. At
this point, however, one of the serious disadvantages of the empiri-
cal approach becomes evident. Since the compartmental model
does not possess a physiological structure, it is often not possible
to incorporate a description of these nonlinear biochemical pro-
cesses in a biologically appropriate context. For example, in the case
of inhalation of chemicals subject to high-affinity, low-capacity
metabolism in the liver, an important determinant of metabolic
clearance at low inhaled concentrations is the fact that only the
fraction of the chemical in the blood reaching the liver is available
for metabolism (1). Without a physiological structure it is not
possible to correctly describe the interaction between blood-
transport of the chemical to the metabolizing organ and the intrin-
sic clearance of the chemical by the organ.
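One standard way to capture this interaction, once a physiological structure is adopted, is the "well-stirred" liver approximation, in which apparent hepatic clearance is bounded by liver blood flow. This is a textbook relationship rather than one stated in this chapter, and the flow value used below is only a placeholder:

```python
def hepatic_clearance(q_liver, cl_intrinsic):
    """Well-stirred liver model: apparent clearance approaches the
    intrinsic clearance when metabolism is slow, but is capped by
    liver blood flow (q_liver) when metabolism is fast."""
    return q_liver * cl_intrinsic / (q_liver + cl_intrinsic)
```

For a high-affinity, low-capacity pathway at low concentrations, cl_intrinsic is large relative to q_liver, so clearance is limited by the blood flow delivering chemical to the liver, exactly the behavior an empirical compartmental model cannot represent in a biologically appropriate context.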

1.2. Physiologically Based Modeling

Physiologically based pharmacokinetic (PBPK) models differ from
conventional compartmental pharmacokinetic models in that they
are based to a large extent on the actual physiology of the organism.
Instead of compartments defined solely by the experimental data,
actual organ and tissue groups are described using weights and
blood flows obtained from the literature. Moreover, instead of
composite rate constants determined solely by fitting data,
measured physical–chemical and biochemical constants of the com-
pound can often be used. To the extent that the structure of the
model reflects the important determinants of the kinetics of the
chemical, PBPK models can predict the qualitative behavior of an
experimental time course data set without having been based
directly on it. Refinement of the model to incorporate additional
insights gained from comparison with experimental data yields a
model which can be used for quantitative extrapolation well beyond
the range of experimental conditions on which it was based. In
particular, a properly validated PBPK model can be used to perform
[Flowchart: Problem Identification → Literature Evaluation (Mechanisms of Toxicity; Biochemical Constants) → Model Formulation → Simulation → Compare to Kinetic Data → Refine Model / Validate Model → Design/Conduct Critical Experiments → Extrapolation to Humans]

Fig. 2. Flowchart of the biologically motivated PBPK modeling approach to chemical risk
assessment.

the high-to-low dose, dose-route, and interspecies extrapolations
necessary for estimating human risk on the basis of animal
toxicology studies (7–15).
A number of excellent reviews have been written on the subject of
PBPK modeling (16–20). The basic approach is illustrated in Fig. 2.
The process of model development begins with the definition of the
chemical exposure and toxic effect of concern, as well as the species
and target tissue in which it is observed. Literature evaluation involves
the integration of available information about the mechanism of
toxicity, the pathways of chemical metabolism, the nature of the
toxic chemical species (i.e., whether the parent chemical, a stable
metabolite, or a reactive intermediate produced during metabolism
is responsible for the toxicity), the processes involved in absorption,
transport and excretion, the tissue partitioning and binding charac-
teristics of the chemical and its metabolites, and the physiological
parameters (e.g., tissue weights and blood flow rates) for the species
of concern (i.e., the experimental species and the human). Using this
information, the investigator develops a PBPK model which expresses
mathematically a conception of the animal–chemical system. In the
model, the various time-dependent biological processes are described
as a system of simultaneous differential equations. A mathematical
model of this form can easily be written and exercised using
commonly available computer software (21). The specific structure
of the model is driven by the need to estimate the appropriate
measure of tissue dose under the various exposure conditions of
concern in both the experimental animal and the human. Before
the model can be used in human risk assessment it has to be
validated against kinetic, metabolic, and toxicity information and,
in many cases, refined based on comparison with the experimental
results. The model itself can frequently be used to help design
critical experiments to collect data needed for its own validation.
The chief advantage of a PBPK model over an empirical
compartmental description is its greater predictive power. Since
fundamental biochemical processes are described, dose extrapola-
tion over ranges where saturation of metabolism occurs is possible
(22). Since known physiological parameters are used, a different
species can be modeled by simply replacing the appropriate con-
stants with those for the species of interest, or by allometric scaling
(23–25). Similarly, the behavior for a different route of administra-
tion can be determined by adding equations which describe the
nature of the new input function (21, 26). The extrapolation from
one exposure scenario, say a single 6 h exposure, to another, e.g., a
repetitive 6 h exposure, 5 days a week for the life of the animal, is
relatively easy and only requires a little ingenuity in writing the
equations for the dosing regimen in the kinetic model (27, 28).
Since measured physical–chemical and biochemical parameters
are used, the behavior for a different chemical can quickly be esti-
mated by determining the appropriate constants. An important result
is the ability to reduce the need for extensive experiments with new
chemicals (12). The process of selecting the most informative experi-
mental data is also facilitated by the availability of a predictive phar-
macokinetic model (29). Perhaps the most desirable feature of a
physiologically based model is that it provides a conceptual frame-
work for employing the scientific method in which hypotheses can be
described in terms of biological processes, predictions can be made on
the basis of the description, and the hypothesis can be revised on the
basis of comparison with experimental data.
The trade-off against the greater predictive capability of
physiologically based models is the requirement for an increased
number of parameters and equations. However, values for many
of the parameters, particularly the physiological ones, are already
available in the literature (30–35), and in vitro techniques have
been developed for rapidly determining the compound-specific
parameters (36–39). An important advantage of PBPK models is
that they provide a biologically meaningful quantitative frame-
work within which in vitro data can be more effectively utilized
(40). There is even a prospect that predictive PBPK models can
someday be developed based almost entirely on data obtained
from in vitro studies.

Some of the best examples of successful PBPK modeling efforts
were performed to support the clinical use of chemotherapeutic
drugs, e.g., methotrexate (41) and cisplatin (42) (see (43) for a
review). There are also a large number of good examples of PBPK
models which describe the kinetics of important environmental
contaminants, including methylene chloride (8, 44, 45), trichloro-
ethylene (46–48), chloroform (49, 50), 2-butoxyethanol (51),
kepone (52), polybrominated biphenyls (53), polychlorinated
biphenyls (54) and dibenzofurans (55), dioxins (56, 57), lead
(58–62), arsenic (63, 64), methylmercury (65), atrazine (66, 67),
acrylonitrile (68–70), perchlorate (71–76), and BTEX components
(77–81). The U.S. EPA is currently compiling a compendium of
PBPK models including source code. This should be available
online within the next year.

2. Materials

Currently there exists a very diverse group of modeling software
packages that vary in both complexity and range of application.
Because of this diversity, there is a software package suitable for
every level of user from the expert to the first-time modeler. However,
not all modeling packages are created equal, and some of the more
user-friendly software can lack the capabilities of the more complex
programs. Consequently, no single software package available can
meet all needs of all users, and the diversity and complexity of the
programs can often make converting a model from one package to
another rather difficult. Table 1 provides a list of some of the available
software packages that may be useful for PBPK modeling (82).
An additional list of pharmacokinetic software is located at
http://boomer.org/pkin/soft.html. However, not all of the software
listed on this website is suitable for PBPK modeling.
The most commonly used software packages for PBPK modeling
have included the Advanced Continuous Simulation Language
(ACSL) (now acslX), Berkeley Madonna, MATLAB, MATLAB/
Simulink, ModelMaker, and SCoP. Table 2 provides a summary of
the features of each of these, followed by further information.
acslX is an updated version of the widely used ACSL software.
It has graphical as well as text interface with automatic linkage to
the integration algorithm. In particular, a Pharmacokinetic Toolkit
in the graphic code interface makes it possible to build PBPK
models by connecting predefined tissue code blocks. The software
allows for the use of discrete blocks and script files and automati-
cally sorts equations in the derivative block. The model may be
compiled into either C/C++ or Fortran, although C++ is now
the preferred compiler, and may be debugged interactively.

Table 1
Representative list of available software packages

Package | Source | Website

General-purpose high-level scientific computing software. These high-level programming language packages are very general modeling tools that are not specifically designed for PBPK modeling, but offer more complexity.
acslX | AEgis Technologies Group, Inc. | http://www.acslX.com
Berkeley Madonna | University of California at Berkeley | http://www.berkeleymadonna.com
GNU Octave | University of Wisconsin | http://www.octave.org
MATLAB/Simulink | The MathWorks, Inc. | http://www.mathworks.com
MLAB | Civilized Software, Inc. | http://www.civilized.com

Biomathematical modeling software. Packages that were specifically designed for modeling biological systems, some of which are user-friendly. Their usefulness in PBPK modeling is determined by their graphical interfaces, computational speed, and language flexibility; some provide mixed-effects (population) capabilities allowing for the analysis of sparse data sets.
ADAPT II | Biomedical Simulations Resource, USC | http://bmsr.usc.edu
MCSim | INERIS | http://toxi.ineris.fr/activites/toxicologie_quantitative/mcsim/mcsim.php
ModelMaker | ModelKinetix | http://www.modelkinetix.com
NONMEM | University of California at San Francisco and Globomax Service Group | http://www.globomaxservice.com
SAAM II | SAAM Institute, Inc. | http://www.saam.com
SCoP | Simulation Resources, Inc. | http://www.simresinc.com
Stella | High Performance Systems, Inc. | http://www.hps-inc.com
WinNonlin | Pharsight Corp. | http://www.pharsight.com
WinNonMix | Pharsight Corp. | http://www.pharsight.com

Toxicokinetic software. These packages were designed specifically for PBPK and PBTK modeling and are extremely flexible. They are based on modeling languages developed in the aerospace industry for modeling complex systems.
SimuSolv | Dow Chemical | Not maintained or subject to further development

Physiologically based custom-designed software. Custom-designed proprietary software programs, specifically for biomedical systems or applications, that provide a high level of biological detail but are not easily customized.
GastroPlus | Simulations Plus, Inc. | http://www.simulations-plus.com
Pathway Prism | Physiome Sciences, Inc. | http://www.physiome.com
PhysioLab | Entelos, Inc. | http://www.entelos.com
SimCYP | Simcyp, Ltd. | http://www.simcyp.com

Table 2
Comparison of modeling software features

Feature | acslX (c,d) | Berkeley Madonna (c,d) | MATLAB (c,d) | MATLAB/Simulink (c,d) | ModelMaker (c,d) | SCoP (d)
Graphical interface | Y | Y | N | Y | Y | N
Text interface | Y | Y | Y | N | Y | Y
Automatic linkage to integration algorithm | Y | Y | N | Y | Y | Y
Discrete blocks | Y | N | N | Y | Y | N
Scripting | Y | N | Y | Y | N | N
Code sorting | Y | Y | N | Y | Y | N
Choice of target language | Y | N | N | N | N | N
Interactive model debugging | Y | Y | N | Y | N | N
Optimization | Y | Y | Y (a) | Y | Y | Y
Sensitivity analysis | Y (b) | Y | Y (b) | Y (b) | Y | Y
Monte Carlo | Y (b) | Y (b) | Y (b) | Y (b) | Y | N
Units checking | N | N | N | N | N | Y
Database of physiological values | N | N | N | N | N | N
Compiled (faster) | Y | Y | Y (a,e) | N | N | Y
Interpreted (more convenient) | N | N | Y | Y | Y | N

(a) Extra cost
(b) Can perform through the use of user-developed model code or script files
(c) Must contact vendor for price; price may depend upon the type of license
(d) Student and/or academic licenses are available
(e) With separate compiler

An optimization program is also included. Sensitivity analysis and
Monte Carlo analysis can currently be conducted with script files,
and some capability built into the package allows these analyses to
be conducted with the aid of a GUI.
Berkeley Madonna has many of the same features as acslX;
however, it does not allow for the use of discrete blocks or script
files. It does currently have both an optimization and sensitivity
analysis feature, but does not have a built-in Monte Carlo capability.
MATLAB has a text interface, but not a graphical interface. It
does allow for the use of script files but not discrete blocks. It does
not sort the code in the model so the user must be careful regarding
the order of statements in the code. This can be problematic in
the case of PBPK models, which require
simultaneous solution of multiple differential equations, and can
complicate conversion from a software package that automatically
sorts the model equations. An optimization package is available
through an add-on toolbox, but sensitivity analysis and Monte
Carlo analysis must be performed through the use of script files.
A large variety of user-built model code blocks are available at the
MATLAB web site. The model code may be compiled through the
purchase and use of a MATLAB Compiler, but the user has no
choice of the target language used.
Simulink is an add-on to MATLAB that offers a graphic inter-
face but no text interface. The use of discrete blocks is also added
with the use of Simulink. Since Simulink uses only a graphical
interface, there is no code to be viewed. Simulink adds a graphical
debugger to MATLAB and an optimizer.
ModelMaker has many of the same features as acslX and
Berkeley Madonna. It has both a graphical and a text interface with
automatic linkage to the integration algorithm and allows for the
use of discrete blocks but not the use of script files. ModelMaker
also provides the capabilities for optimization, sensitivity analysis,
and Monte Carlo analysis.
SCoP has only a text interface with automatic linkage to the
integration algorithm. It does not allow for the use of discrete
blocks or script files and it does not automatically sort the code.
Optimization and sensitivity analysis capabilities are included, but
not Monte Carlo analysis capabilities.
An additional modeling software package, not included in
Table 2, is designed specifically to support Markov Chain Monte
Carlo (MCMC) simulation. This program, MCSim, has been used
to reestimate parameter distributions for PBPK models on the basis
of the agreement between model predictions and measured data
from kinetic studies. MCSim is available for free download from the
website listed in Table 1. Its proper use requires expertise in pro-
gramming and statistics.
Each of the software packages described above provides differ-
ent features that would recommend them for a particular user.
From the viewpoint of a risk assessor wanting to apply an existing
PBPK model, as well as for a model developer seeking to have his
model used in a risk assessment, the key requirement is the ability to
readily evaluate (verify) the model, reproduce (validate) the capa-
bility of the model to simulate key data sets, and document its
application for the risk assessment. The necessary documentation
includes a model definition (preferably code-based) that can be
reviewed to verify the mathematical correctness of the model, a
description of the parameters that should be used to run the model
for comparison with validation data, and a description of the
parameters used to calculate the dose metrics for the risk assess-
ment. The most important characteristics of a language for model
evaluation are verifiable code, self-documentation, and ease of use.
The feature that is most important with regard to self-
documentation is scripting, which allows the model developer to
create procedures consisting of sequences of commands that, for
example, set model parameters, run the model, and plot the model
predictions against the appropriate data set. Other features which
contribute to ease of model evaluation include viewable model
definition code, code sorting, and automatic linkage to integration
algorithms.
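The scripted set-parameters/run/compare procedure described above is language-independent; the following Python sketch mimics it with a deliberately trivial stand-in model (the model, parameter values, and "data" are all hypothetical, invented for illustration):

```python
import math

def run_model(params, times):
    """Trivial one-compartment stand-in for a PBPK simulation:
    C(t) = (dose / volume) * exp(-ke * t)."""
    d, v, ke = params["dose"], params["volume"], params["ke"]
    return [d / v * math.exp(-ke * t) for t in times]

def rmse(predicted, observed):
    """Root-mean-square error between model predictions and data."""
    n = len(observed)
    return (sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n) ** 0.5

# "Script" procedure: set model parameters, run the model, and
# evaluate the predictions against one (invented) validation data set.
params = {"dose": 100.0, "volume": 10.0, "ke": 0.25}
times = [0.0, 2.0, 4.0, 8.0]
observed = [10.0, 6.1, 3.7, 1.4]     # hypothetical observations
error = rmse(run_model(params, times), observed)
```

In acslX the same pattern would live in an m-script or command-file procedure, one per data set, which is what makes a model's use in a risk assessment reproducible and reviewable.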
The modeling language that has seen the most widespread use
in PBPK modeling is the ACSL, which is currently implemented as
acslX. The ACSL language has also served as the basis for a variety
of older packages, including SimuSolv (from Dow Chemical,
no longer supported), ACSL/Tox (from Pharsight, no longer
supported), and ERDM (currently used by the USEPA). The
automatic code sorting provided by ACSL allows code to be
grouped functionally (liver, lung, fat, etc.) rather than in program
order, greatly simplifying model development and improving
readability of the code. The graphic code block capability in
acslX is particularly attractive for PBPK modeling because it pro-
vides the ability to create a model by connecting functional units
(e.g., tissues) in a graphic environment, while at the same time
creating a model definition in ACSL code that can be reviewed for
model verification. The scripting capability, which permits both
MATLAB-like m-files and command file procedures, greatly expedites the comparison of the
model with multiple data sets that is generally required for
PBPK modeling of data from multiple species and routes of expo-
sure. The scripting capability also makes it possible to document
the use of a model in a risk assessment, since m-scripts or com-
mand file procedures can be written that set the model parameters
and run the model for each of the dose metrics required for the
risk assessment, as well as for each of the data sets used for model
evaluation/validation.
Berkeley Madonna provides a particularly intuitive, flexible
platform for model development, and has been very popular in
academic settings. The software provides automatic code sorting,
the ability to automatically convert between code and graphic
model descriptions, and automatic compilation, greatly simplifying
and expediting model development, debugging, and verification.
Conversion of models between ACSL and Berkeley Madonna is
relatively straightforward. However, the lack of a scripting capabil-
ity makes comparison of the model with data fairly cumbersome,
particularly in situations where a large number of data sets are being
modeled. The lack of scripting also makes it more difficult to
document the actual use of a model in a risk assessment and greatly
complicates model evaluation.

MATLAB is a very powerful and flexible software package that
is particularly attractive in research activities. Its main drawback is
that its use requires significant expertise in programming.
MATLAB/Simulink avoids this drawback by providing a graphical
interface, and is very popular with engineers in the automotive and
electronic industries. However, verification of a Simulink model by
a nonengineer is hampered by the lack of any model definition
code. That is, the model is specified only by a “wiring diagram”
that shows the connections between blocks built up from basic
mathematical functions (adders, multipliers, integrators, etc.).
Conversion of a model between MATLAB and Simulink, or
between one of these programs and another software package,
can be very difficult.
ModelMaker, which is popular in Europe, provides surprisingly
broad functionality at a relatively low price. Its only serious draw-
back is the lack of a scripting capability.
SCoP, which along with ACSL was one of the first languages to
be used for PBPK modeling, continues to be used due to its
familiarity and low cost. However, its lack of scripting or code
sorting, and its DOS-based, menu-driven run-time interface can
make model evaluation more difficult.

3. Methods

3.1. General Concepts

The methods will begin with a description of the seminal PBPK
model published by Ramsey and Andersen in 1984 and lead into
the elements necessary for successful model development, refinement,
and validation. The experience of Ramsey and Andersen
approach. In this case, blood and tissue time-course curves of
styrene had been obtained for rats exposed to four different con-
centrations of 80, 200, 600, and 1,200 ppm (83). Data were
obtained during a 6 h exposure period and for 18 h after cessation
of the exposure. The initial analysis of these data had been based on
a simple compartmental model, similar to the model shown in
Fig. 1, which had a zero-order input related to the amount of
styrene inhaled, a two-compartment description of the rat, and
linear metabolism in the central compartment. The compartmental
model was successful with lower concentrations but was unable to
account for the more complex behavior at higher concentrations
(note the different behavior of the data at the two concentrations
shown in Fig. 3).
In an attempt to provide a more successful description, a PBPK
model was developed with a realistic equilibration process for pul-
monary uptake and Michaelis–Menten saturable metabolism in the

Fig. 3. Model predictions (solid lines) and experimental blood styrene concentrations in
rats during and after 6 h exposures to 80 and 600 ppm styrene. The thick bars represent
the chamber air concentrations of styrene and are shown to highlight the nonlinearity of
the relationship between administered and internal concentrations. The model (Fig. 4)
contains sufficient biological realism to predict the very different behaviors observed at
the two concentrations.

liver. A diagram of the PBPK model that was used by Ramsey and
Andersen (1984) (22) to describe styrene inhalation in both rats
and humans is shown in Fig. 4. In this diagram, the boxes represent
tissue compartments and the lines connecting them represent
blood flows. The model contained several “lumped” tissue com-
partments: fat tissues, poorly perfused tissues (muscle, skin, etc.),
richly perfused tissues (viscera), and metabolizing tissues (liver).
The fat tissues were described separately from the other poorly
perfused tissues due to their much higher partition coefficient for
styrene, which leads to different kinetic properties, while the liver
was described separately from the other richly perfused tissues due
to its key role in the metabolism of styrene. Each of these tissue
groups was defined with respect to their blood flow, tissue volume,
and their ability to store (partition) the chemical of interest.
Although the model diagram in Fig. 4 shows a lung compartment,
a steady-state approximation for the equilibration of lung blood
with alveolar air was used in the mathematical formulation of the
model to eliminate the need for an actual lung tissue compartment.
This simple model structure, with realistic constants for the physi-
ological, partitioning, and metabolic parameters, very accurately
predicted the behavior of styrene in both fat and blood of the rat
at all concentrations. Fig. 3 compares the model-predicted time
course in the blood with the experimental data for the highest and
lowest exposure concentrations in the rat studies.
The structure of the PBPK model for styrene reflects the
generic mammalian architecture. Organs are arranged in a parallel
system of blood flows with total blood flow through the lungs.

QAlv QAlv
Alveolar Space
CInh CAlv
QT QT
Lung Blood
CVen CArt

QF
Fat Tissue Group
CVF CArt

QM
Muscle Tissue Group
CVM CArt

Richly Perfused QR
CVR Tissue Group CArt

Liver [Metabolizing QL
CVL Tissue Group] CArt

VMax
Metabolites
KM

Fig. 4. Diagram of a physiologically based pharmacokinetic model for styrene. In this
description, groups of tissues are defined with respect to their volumes, blood flows (Q),
and partition coefficients for the chemical. The uptake of vapor is determined by the
alveolar ventilation (QALV), cardiac output (QT), blood–air partition coefficient, and the
concentration gradient between arterial and venous pulmonary blood (CART and CVEN).
Metabolism is described in the liver with a saturable pathway defined by a maximum
velocity (Vmax) and affinity (Km). The mathematical description assumes equilibration
between arterial blood and alveolar air as well as between each of the tissues and the
venous blood exiting from that tissue.
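The structure in Fig. 4 can be condensed into a short numerical sketch. The code below follows the mass-balance logic of the caption (steady-state lung equilibration, flow-limited tissues, saturable liver metabolism), but every parameter value is an illustrative, rat-scale placeholder rather than a published styrene constant, except Km = 0.36 mg/L, which is quoted later in the text.

```python
# Sketch of a Ramsey-Andersen-style inhalation PBPK model (cf. Fig. 4).
# Parameter values are illustrative placeholders, not the fitted styrene
# constants (except km, quoted in the text as 0.36 mg/L).

def simulate_inhalation(ci=0.34, t_exposure=6.0, t_end=24.0, dt=0.0005):
    """ci: inhaled concentration in mg/L air (roughly 80 ppm styrene);
    returns the venous blood concentration time course over t_end hours."""
    qp, qc = 5.0, 5.0                    # alveolar ventilation, cardiac output (L/h)
    pb = 40.0                            # blood:air partition coefficient
    q = {"fat": 0.45, "muscle": 0.75, "rich": 2.55, "liver": 1.25}  # flows (L/h)
    v = {"fat": 0.02, "muscle": 0.17, "rich": 0.012, "liver": 0.01} # volumes (L)
    p = {"fat": 50.0, "muscle": 1.0, "rich": 3.5, "liver": 3.5}     # tissue:blood
    vmax, km = 3.0, 0.36                 # saturable liver metabolism (mg/h, mg/L)
    a = {k: 0.0 for k in q}              # amount of chemical in each tissue (mg)
    t, cv_blood = 0.0, []
    while t <= t_end:
        c = {k: a[k] / v[k] for k in q}                  # tissue concentrations
        cv = sum(q[k] * c[k] / p[k] for k in q) / qc     # mixed venous blood
        c_inh = ci if t <= t_exposure else 0.0
        ca = (qp * c_inh + qc * cv) / (qc + qp / pb)     # steady-state lung
        for k in q:
            da = q[k] * (ca - c[k] / p[k])               # flow-limited uptake
            if k == "liver":                             # Michaelis-Menten loss
                cvl = c[k] / p[k]
                da -= vmax * cvl / (km + cvl)
            a[k] += da * dt                              # Euler step
        cv_blood.append(cv)
        t += dt
    return cv_blood
```

Because the metabolic term saturates, this structure can produce the disproportionate rise in blood concentration with increasing exposure that the simple linear compartmental model could not reproduce.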

This model can easily be scaled up to examine styrene kinetics for
other mammalian species.
ments had also been conducted with human volunteers (84). In
order to model this data, the PBPK model parameters were changed
to human physiological values, the human blood–air partitioning
was determined from human blood samples, and the metabolism
was scaled allometrically so that capacity (Vmax) was related to basal
metabolic rate (body weight raised to the 0.7 power) and affinity
(Km) was the same in the human as in the rat, 0.36 mg/L. Ramsey
(84) measured both venous blood and exhaled air concentrations in
these human volunteers. Although the rat PBPK model was devel-
oped for blood and fat, not for exhaled air, the physiologically based
description automatically provides information on expected exhaled
air concentrations. It was straightforward then to predict expected

Fig. 5. Model predictions and experimental blood and exhaled air concentrations in human
volunteers during and after 6 h exposures to 80 ppm styrene. The model is identical to
that used for rats (Fig. 4). The model parameters have been changed to values
appropriate for humans on the basis of physiological and biochemical information, and
have not been adjusted to improve the fit to the experimental data.

Fig. 6. Model predictions and experimental exhaled air concentrations in human volunteers
following 1 h exposures to 51, 216, and 376 ppm styrene. The model is the same as in
Fig. 5.

exhaled air concentrations in humans and compare the predictions
with the concentrations measured during the experiments (Fig. 5).
A similar comparison of the model’s predictions with another
human data set from (85) also demonstrated the ability of the
PBPK structure to support extrapolation of styrene kinetics from
the rat to the human (Fig. 6).
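The allometric scaling described above can be sketched in a few lines of Python. The rat Vmax value below is a hypothetical placeholder (the chapter gives only the shared Km of 0.36 mg/L); the 0.7 exponent and the idea of holding Km constant across species follow the text.

```python
# Allometric scaling of metabolic capacity (Vmax) from rat to human:
# Vmax scales with body weight raised to the 0.7 power, while the
# affinity Km (0.36 mg/L) is kept the same in both species.
# The rat Vmax value is a placeholder, not taken from the chapter.

def scale_vmax(vmax_ref, bw_ref, bw_target, exponent=0.7):
    """Scale a metabolic capacity allometrically between species."""
    return vmax_ref * (bw_target / bw_ref) ** exponent

vmax_rat = 3.5                   # mg/h, hypothetical rat capacity
bw_rat, bw_human = 0.25, 70.0    # body weights in kg, illustrative
vmax_human = scale_vmax(vmax_rat, bw_rat, bw_human)
km = 0.36                        # mg/L, same in rat and human per the text
```

Because capacity scales with BW^0.7 while Km is unchanged, the human Vmax is much larger than the rat's but far less than proportional to body weight.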
18 Physiologically Based Pharmacokinetic/Toxicokinetic Modeling 453

3.2. Modeling Philosophy

This basic PBPK model for styrene has several tissue groups which were lumped according to their perfusion and partitioning characteristics. In the mathematical formulation, each of these several compartments is described by a single mass-balance differential equation.
It would be possible to describe individual tissues in each of the
lumped compartments, if necessary. This detail is usually unnecessary
unless some particular tissue in a lumped compartment is the target
tissue. One might, for example, want to separate brain from other
richly perfused tissues if the model were for a chemical that had a toxic
effect on the central nervous system (86–88). Other examples of
additional compartments include the addition of placental and mam-
mary compartments to model pregnancy and lactation (89–91). The
interactions of chemical mixtures can even be described by including
compartments for more than one chemical in the model (92–94).
Increasing the number of compartments does increase the number of
differential equations required to define the model. However, the
number of equations does not pose any problem due to the power of
modern desktop computers.
On the other hand, as the number of compartments in the
PBPK model increases, the number of input parameters increases
correspondingly. Each of these parameters must be estimated from
experimental data of some kind. Fortunately, the values of many of
these can be set within narrow limits from nonkinetic experiments.
The PBPK model can also help to define those experiments which
are needed to improve parameter estimates by identifying condi-
tions where the sensitivity of the model to the parameter is the
greatest (95). The demand that the PBPK fit a variety of data also
restricts the parameter values that will give a satisfactory fit to
experimental data. For example, the styrene model (described
above) was required to reproduce both the high and low concen-
tration behaviors, which appeared qualitatively different, using the
same parameter values. If one were independently fitting single
curves with a model, the different parameter values obtained
under different conditions would be relatively uninformative for
extrapolation.
As the renowned statistician George Box said, “All models are wrong, but some are useful.” Even a relatively complex description such as a PBPK model will sometimes fail to fit reliable experimental data. When this occurs, the investigator needs to think how
the model might be changed, i.e., what extra biological aspects
must be added to the physiological description to bring the predic-
tions in line with experimental observation? In the case of the work
with styrene cited above, continuous 24 h styrene exposures could
not be modeled with a time-independent maximum rate of metab-
olism, and induction of enzyme activity had to be included to yield
a satisfactory representation of the observed kinetic behavior (96).
When a PBPK model is unable to adequately describe kinetic
data, the nature of the discrepancy can provide the investigator with

additional insight into time dependencies in the system. This


insight can then be utilized to reformulate the biological basis of
the model and improve its fidelity to the data. The resulting model
may be more complicated, but it will still be useful if the pertinent
kinetic constants can be estimated for human tissues. Indeed, as
long as the model maintains its biological basis the additional
parameters can often be determined directly from separate experi-
ment, rather than estimated by fitting the model to kinetic data. As
the models become more complex, they necessarily contain larger
numbers of physiological, biochemical, and biological constants.
The crucial task during model development is to keep the descrip-
tion as simple as possible and to ensure the identifiability of new
parameters that are added to the model; every attempt should be
made to obtain or verify model parameters from experimental
studies separate from the modeling exercises themselves (97).
The following section explores some of the key issues associated
with the development of PBPK models. It is meant to provide a
general understanding of the basic design concepts and mathemat-
ical forms underlying the PBPK modeling process, and is not meant
to be a complete exposition of the PBPK modeling approach for all
possible cases. It must be understood that the specifics of the
approach can vary greatly for different types of chemicals, e.g.,
volatiles, nonvolatiles, and metals, and for different applications.
Model building is an art, and is best understood as an iterative
process in the spirit of the scientific method (97). The literature
articles cited in the introductory section include examples of suc-
cessful PBPK models for a wide variety of chemicals and provide a
wealth of insight into various aspects of the PBPK modeling pro-
cess. They should be consulted for further detail on the approach
for applying the PBPK methodology in specific cases.

3.3. Tissue Grouping

The first aspect of PBPK model development that will be discussed is
determining the extent to which the various tissues in the body may
be grouped together. Although tissue grouping is really just one
aspect of model design, which is discussed in the next section, it
provides a simple context for introducing the two alternative
approaches to PBPK model development: “lumping” and
“splitting” (Fig. 7). In the context of tissue grouping, the guiding
philosophy in the lumping approach can be stated as follows:
“Tissues which are pharmacokinetically and toxicologically indistin-
guishable may be grouped together.” In this approach, model
development begins with information at the greatest level of detail
that is practical, and decisions are made to combine physiological
elements (tissues and blood flows) to the extent justified by their
similarity. The common grouping of tissues into richly (or rapidly)
perfused and poorly (or slowly) perfused on the basis of their
perfusion rate (ratio of blood flow to tissue volume) is an example
of the lumping approach. The contrasting philosophy of splitting is

[Figure 7 shows the spectrum of model structures from a single Body compartment, through Body/Liver, Rapid/Slow/Liver, and Rapid/Slow/Liver/Fat, to all tissues and organs separate: lumping moves toward only a few grouped tissues, splitting toward all tissues and organs separate.]

Fig. 7. The role of lumping and splitting processes in PBPK model development.

as follows: “Tissues which are pharmacokinetically or toxicologically


distinct must be separated.” This approach starts with the simplest
reasonable model structure and increases the model’s complexity
only to the extent required to reproduce data on the chemical of
concern for the application of interest. Splitting requires the greater
initial investment in data collection and, if taken to the extreme,
could paralyze model development. Lumping, on the other hand, is
more efficient but runs a greater risk of overlooking chemical-
specific determinants of chemical disposition.
The description of fat tissue in the PBPK model of styrene
described in the previous section can be used to provide an example
of the different approach associated with the two philosophies. In
the splitting approach, which is the approach used by Ramsey and
Andersen (22), a single fat compartment was used initially with
volume, blood flow and partitioning parameters selected to repre-
sent all adipose tissues in the body. Clearly, there are actually a
number of distinguishable adipose tissues, including inguinal, peri-
renal, and brown fat, among others, which may have different
partitioning and kinetic characteristics for styrene. However, since
this single-compartment treatment provided an adequate descrip-
tion of the available data on the kinetics of styrene in the fat and
blood, no attempt was made to split the fat tissue group into
multiple compartments. For a more lipophilic chemical, polychlor-
otrifluoroethylene oligomer, on the other hand, it was not possible
to adequately reproduce fat and blood kinetic data using a single fat
compartment (98); therefore, the fat compartment was split into
two parts: “perirenal fat” and “other fat tissues,” resulting in an
acceptable simulation of the observed kinetic behavior.
The splitting process just described can be contrasted with a
lumping approach, in which the PBPK model would initially be
designed to include separate compartments for all physiologically
distinguishable fat tissues. Partition coefficients for each of the fat

tissues would be determined experimentally, and the volume and


blood flow for each fat tissue would be estimated. If it were then
determined that the kinetic characteristics of the various fat tissues
were not sufficiently different to justify retaining separate compart-
ments, they would be lumped together by appropriately combining
the individual parameter values (adding the volumes and blood
flows and averaging the partition coefficients).
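The parameter-combination rule just described (add the volumes and blood flows, average the partition coefficients) can be sketched as follows. A volume-weighted mean is one reasonable way to "average" the partition coefficients; all numbers are illustrative, not measured values.

```python
# Sketch of lumping several fat compartments into one: volumes and
# blood flows are added, and partition coefficients are averaged
# (here volume-weighted). All parameter values are illustrative.

def lump(compartments):
    """compartments: list of dicts with volume V (L), blood flow Q (L/h),
    and tissue-blood partition coefficient P."""
    V = sum(c["V"] for c in compartments)
    Q = sum(c["Q"] for c in compartments)
    P = sum(c["P"] * c["V"] for c in compartments) / V  # volume-weighted mean
    return {"V": V, "Q": Q, "P": P}

fat_tissues = [
    {"V": 0.02, "Q": 0.05, "P": 50.0},   # e.g., perirenal fat (hypothetical)
    {"V": 0.03, "Q": 0.04, "P": 45.0},   # other fat tissues (hypothetical)
]
lumped_fat = lump(fat_tissues)
```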

3.3.1. Criteria for Grouping Tissues

There are two alternative approaches for determining whether tissues are kinetically distinct or should be lumped together. In the first approach, the tissue rate-constants are compared. The rate-constant (kT) for a tissue is similar to the perfusion rate except that the partitioning characteristics of the tissue are also considered:
kT = QT/(PT · VT),

where QT = the blood flow to the tissue (L/h), PT = the tissue–blood partition coefficient for the chemical, VT = the volume of the tissue (L).
Thus the units of the tissue rate-constant are the same as for the perfusion rate, h⁻¹, but the rate-constant more accurately reflects
the kinetic characteristics of a tissue for a particular chemical. It was
the much smaller rate-constant for fat in the case of a lipophilic
chemical such as styrene that required the separation of the fat
compartment from the other poorly perfused tissues (muscle,
skin, etc.) in the PBPK model for styrene (22).
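As a minimal sketch of this criterion, the rate-constant formula can be evaluated for two hypothetical tissues; the parameter values are invented solely to show how a large partition coefficient gives fat a much smaller kT than muscle for a lipophilic chemical.

```python
# Tissue rate-constant kT = QT / (PT * VT): a perfusion rate corrected
# for partitioning. Parameter values are illustrative only, chosen to
# show why fat is kinetically distinct for a lipophilic chemical
# (large P, hence small kT).

def rate_constant(Q, P, V):
    """Return kT in 1/h for blood flow Q (L/h), tissue-blood partition
    coefficient P, and tissue volume V (L)."""
    return Q / (P * V)

k_muscle = rate_constant(Q=1.0, P=2.0, V=10.0)
k_fat    = rate_constant(Q=0.3, P=50.0, V=2.0)
```

Even though the perfusion rates differ only modestly, the rate-constants differ by more than an order of magnitude, which is the kind of disparity that justifies splitting fat from the other poorly perfused tissues.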
The second, less rigorous, approach for determining whether
tissues should be lumped together is simply to compare the perfor-
mance of the model with the tissues combined and separated. This
approach is essentially the reverse of the example given above for
splitting of the fat compartment. The reliability of this approach
depends on the availability of data under conditions where the
tissues being evaluated would be expected to have an observable
impact on the kinetics of the chemical. Sensitivity analysis can
sometimes be used to determine the appropriate conditions for
such a comparison (95).
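A normalized sensitivity coefficient of the kind such an analysis uses can be sketched by finite differences on a toy one-compartment model (not the styrene model); all function names and parameter values here are illustrative.

```python
# Minimal finite-difference sensitivity check: perturb one parameter
# by 1% and see the fractional change in a model output. The
# one-compartment IV-bolus model and all values are illustrative.

import math

def c_blood(t, dose, V, k):
    """Concentration after an IV bolus in a one-compartment model."""
    return (dose / V) * math.exp(-k * t)

def norm_sensitivity(param, delta=0.01, t=2.0):
    """% change in output per % change in one parameter."""
    base = {"dose": 10.0, "V": 5.0, "k": 0.3}
    y0 = c_blood(t, **base)
    perturbed = dict(base)
    perturbed[param] *= (1.0 + delta)
    y1 = c_blood(t, **perturbed)
    return ((y1 - y0) / y0) / delta

s_k = norm_sensitivity("k")   # analytically about -k*t = -0.6 here
s_v = norm_sensitivity("V")   # analytically about -1
```

Comparing such coefficients across sampling times shows when the output is most informative about each parameter, which is the basis for choosing experimental conditions.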

3.4. Model Design Principles

There is no easy rule for determining the structure and level of complexity needed in a particular modeling application. The wide
variability of PBPK model design for different chemicals can be seen
by comparing the diagram of the PBPK model for methotrexate
(41), shown in Fig. 8, with the diagram for the styrene PBPK
model shown in Fig. 3. Model elements which are important for a
volatile, lipophilic chemical such as styrene (lung, fat) do not need
to be considered in the case of a nonvolatile, water soluble com-
pound such as methotrexate. Similarly, while kidney excretion and
enterohepatic recirculation are important determinants of the
kinetics of methotrexate, only metabolism and exhalation are

[Figure 8 depicts compartments for plasma, liver, G.I. tract (with the gut lumen as a series of sub-compartments), kidney, and muscle, connected by blood flows (QL, QG, QK, QM), with biliary secretion, gut absorption, and elimination in feces and urine.]

Fig. 8. PBPK model for methotrexate (Bischoff et al. 1971).

significant for styrene. The decision of which elements to include in


the model structure for a specific chemical and application draws on
all of the modeler’s experience and knowledge of the animal–
chemical system.
The alternative approaches to tissue grouping discussed above are
actually just reflections of the two competing criteria which must be
balanced during model design: parsimony and plausibility. The prin-
ciple of parsimony simply states that a model should be as simple as
possible for the intended application (but no simpler). This
“splitting” philosophy is related to that of Occam’s Razor: “Entities
should not be multiplied unnecessarily.” That is, structures and para-
meters should not be included in the model unless they are needed to
support the application for which the model is being designed. For
example, if a model is developed to describe inhalation exposure to a
chemical over periods from hours to years, as in the case of the styrene
model discussed earlier, it is not necessary to describe transient,
breath-by-breath behavior of chemical uptake and exhalation in the
lung. On the other hand, if the model is being developed to predict
initial inhalation uptake of the chemical at times on the order of
minutes, this level of detail clearly might be justified (98, 99).
The desire for parsimony in model development is driven not
only by the desire to minimize the number of parameters whose
values must be identified, but also by the recognition that as the
number of parameters increases, the potential for unintended inter-
actions between parameters increases disproportionately. A gener-
ally accepted rule of software engineering warns that it is relatively
easy to design a computer program which is too complicated to be

completely comprehended by the human mind. As a model


becomes more complex, it becomes increasingly difficult to vali-
date, even as the level of concern for the trustworthiness of the
model should increase.
Countering the desire for model parsimony is the need for
plausibility of the model structure. As discussed in the introduc-
tion, it is the physiological and biochemical realism of PBPK mod-
els that gives them an advantage for extrapolation. The credibility
of a PBPK model’s predictions of kinetic behavior under conditions
different from those under which the model was validated rests to a
large extent on the correspondence of the model design to known
physiological and biochemical structures. In general, the ability of a
model to adequately simulate the behavior of a physical system
depends on the extent to which the model structure is homomor-
phic (having a one-to-one correspondence) with the essential fea-
tures determining the behavior of that system. For example, if the
model of styrene had not included a description of saturable metab-
olism, it would not have been able to adequately simulate the
kinetics of styrene at both low and high doses using a single
parameterization.

3.4.1. Model Identification

The process of model identification begins with the selection of those
model elements which the modeler considers to be minimum essen-
tial determinants of the behavior of the particular animal–chemical
system under study, from the viewpoint of the intended application of
the model. Comparison with appropriate data, relevant to the
intended purpose of the model, then can provide insights into defects
in the model which must be corrected either by reparameterization or
by changes to the model structure. Unfortunately, it is not always
possible to separate these two elements. In models of biological
systems, estimates of the values of model parameters will always be
uncertain, due both to biological variation and experimental error. At
the same time, the need for biological realism unavoidably results in
models that are “overparameterized”; that is, they contain more
parameters than can be identified from the kinetic data the model is
used to describe.
As an example of the interaction between model structure and
parameter identification, the two metabolic parameters, Vmax and
Km, in the model for styrene discussed earlier could both be identi-
fied relatively unambiguously in the case of the rat. Indeed, as
pointed out previously, the inclusion of capacity-limited metabolism
in the model was necessary in order to reproduce the available data
at both low and high exposure concentrations. In the case of the
human, however, data was not available at sufficiently high concen-
trations to saturate metabolism. Therefore, only the ratio, Vmax/
Km, would actually be identifiable. The use of the same model
structure, including a two-parameter description of metabolism, in
the human as in the rat was justified by the knowledge that similar

enzymatic systems are responsible for the metabolism of chemicals


such as styrene in both species. However, if the model were to be
used to extrapolate to higher concentrations in the human, the
potential impact of the uncertainty in the values of the individual
metabolic parameters would have to be carefully considered.
Model identification is the selection of a specific model struc-
ture from several alternatives, based on conformity of the models’
predictions to experimental observations. The practical reality of
model identification in the case of biological systems is that regard-
less of the complexity of the model there will always be some level
of “model error” (lack of homomorphism) which will result in
systematic discrepancies between the model and experimental
data. This model structural deficiency interacts with deficiencies in
the identifiability of the model parameters, potentially leading to
misidentification of the parameters or misspecification of struc-
tures. This most dangerous aspect of model identification is exacer-
bated by the fact that, in general, adding equations and parameters
to a model increases the model’s degrees of freedom, improving its
ability to reproduce data, regardless of the validity of the underlying
structure. Therefore, when a particular model structure improves
the agreement of the model with kinetic data, it can only be said
that the model structure is “consistent” with the kinetic data; it
cannot be said that the model structure has been “proved” by its
consistency with the data. In such circumstances, it is imperative
that the physiological or biochemical hypothesis underlying the
model structure is tested using nonkinetic data.

3.5. Elements of Model Structure

The process of selecting a model structure can be broken down into a number of elements associated with the different aspects of
uptake, distribution, metabolism, and elimination. In addition,
there are several general model structure issues that must be
addressed, including mass balance and allometric scaling. The fol-
lowing section treats each of these elements in turn.

3.5.1. Storage Compartments

Naturally, any tissues which are expected to accumulate significant quantities of the chemical or its metabolites need to be included in
the model structure. As discussed earlier, these storage tissues can
be grouped together to the extent that they have similar time
constants. Three storage compartments were included in the sty-
rene model described above: fat tissues, richly perfused tissues, and
poorly perfused tissues. The generic mass balance equation for
storage compartments such as these is (Fig. 9):


Fig. 9. Blood flow through a storage compartment.



dAT/dt = QT · CA − QT · CVT,

where AT = the mass of chemical in the tissue (mg), QT = the blood flow to (and from) the tissue (L/h), CA = the concentration of chemical in the arterial blood reaching the tissue (mg/L), CVT = the concentration of the chemical in the venous blood leaving the tissue (mg/L).
Thus this mass balance equation simply states that the change in
the amount of chemical in the tissue with respect to time (dAT/dt) is
equal to the difference between the amount of chemical entering the
tissue and the amount leaving the tissue. We can then calculate
the concentration of the chemical in the storage tissue (CT) from
the amount in the tissue and the tissue volume (VT):
CT = AT/VT.
In PBPK models, it is common to assume “venous equilibra-
tion”; that is, that in the time that it takes for the blood to perfuse
the tissue, the chemical is able to achieve its equilibrium distribution
between the tissue and blood. Therefore, the concentration of the
chemical in the venous blood can be related to the concentration in
the tissue by the equilibrium tissue–blood partition coefficient (PT):
CVT = CT/PT.
Therefore we obtain a differential equation in AT:
dAT/dt = QT · CA − QT · AT/(PT · VT).
If desired, we can reformulate this mass balance equation in
terms of concentration:
dAT/dt = d(CT · VT)/dt = CT · dVT/dt + VT · dCT/dt.
If (and only if) VT is constant (i.e., the tissue does not grow during the simulation), dVT/dt = 0, and:

dAT/dt = VT · dCT/dt,
so we have the alternative differential equation:
dCT/dt = QT · (CA − CT/PT)/VT.
This alternative mass balance formulation, in terms of concen-
tration rather than amount, is popular in the pharmacokinetic
literature. However, in the case of models with compartments
that change volume over time it is preferable to use the formulation
in terms of amounts in order to avoid the need for the additional
term reflecting the change in volume (CT · dVT/dt).
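The storage-compartment mass balance above can be integrated numerically. This simple Euler sketch uses illustrative parameter values and a constant arterial concentration; as the equations imply, the tissue concentration approaches PT · CA at steady state.

```python
# Euler integration of the storage-compartment mass balance
#   dAT/dt = QT*CA - QT*AT/(PT*VT)
# with a constant arterial concentration CA. All parameter values
# are illustrative. At steady state, CT = AT/VT approaches PT*CA.

def simulate_tissue(QT=1.0, PT=4.0, VT=2.0, CA=1.0, dt=0.01, t_end=200.0):
    AT = 0.0                                # mg of chemical in the tissue
    t = 0.0
    while t < t_end:
        dAT = QT * CA - QT * AT / (PT * VT) # mass balance (mg/h)
        AT += dAT * dt
        t += dt
    return AT / VT                          # tissue concentration CT = AT/VT

CT = simulate_tissue()
```

With these values the tissue time constant is 1/kT = PT·VT/QT = 8 h, so 200 h is ample time to reach the equilibrium value of 4.0 mg/L.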
Depending on the chemical, many different tissues can poten-
tially serve as important storage compartments. The use of a fat
storage compartment in the styrene model is typical of a lipophilic
chemical. The gut lumen can also serve as a storage site for chemi-
cals subject to enterohepatic recirculation, as in the case of

methotrexate. Important storage sites for metals, on the other


hand, can include the kidney, red blood cells, intestinal epithelial
cells, skin, bone, and hair. Transport to and from a storage com-
partment does not always occur via the blood, as described above;
for example, in some cases the storage is an intermediate step in an
excretion process (e.g., hair, intestinal epithelial cells). As with
methotrexate, it may also be necessary to use multiple compart-
ments in series, or other mathematical devices, to model plug flow
(i.e., a time delay between entry and exit from storage).

3.5.2. Blood Compartment

The description of the blood compartment can vary considerably
from one PBPK model to another depending on the role the blood
plays in the kinetics of the chemical being modeled. In some cases
the blood may be treated as a simple storage compartment, with a
mass balance equation describing the summation (S) of the venous
blood flows from the various tissues and the return of the total
arterial blood flow (QC) to the tissues, as well as any urinary
clearance (Fig. 10):
dAB/dt = Σ(QT · CT/PT) − QC · CB − KU · CB,

where AB = the amount of chemical in the blood (mg), QC = the total cardiac output (L/h), CB = the concentration of chemical in the blood (mg/L), KU = the urinary clearance (L/h).
For some chemicals, such as methotrexate, all of the chemical is
present in the plasma rather than the red blood cells, so plasma
flows and volumes are used instead of blood. For other chemicals it
may be necessary to model the red blood cells as a storage com-
partment in communication with the plasma via diffusion-limited
transport. Note that if the blood is an important storage compart-
ment for a chemical, it may be necessary to carefully evaluate data
on tissue concentrations, particularly the richly perfused tissues, to
determine whether chemical in the blood perfusing the tissue could
be contributing to the measured tissue concentration.
For still other chemicals, such as styrene, the amount of chemical
actually in the blood may be relatively unimportant. In this case,
instead of having a true blood compartment, a steady-state approxi-
mation can be used to estimate the concentration in the blood at any
time. Assuming the blood is at steady-state with respect to the tissues:
dAB/dt = 0.


Fig. 10. Blood flow to tissue and urinary clearance.



Therefore, solving the blood equation for the concentration:

CB = Σ(QT · CT/PT)/QC.
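The steady-state blood approximation can be sketched directly; the tissue flows, concentrations, and partition coefficients below are illustrative, not taken from any model in the chapter.

```python
# Steady-state blood approximation: CB = sum(QT * CT / PT) / QC.
# All tissue values are illustrative.

def blood_conc(tissues, QC):
    """tissues: iterable of (Q, C, P) tuples with blood flow Q (L/h),
    tissue concentration C (mg/L), and partition coefficient P;
    QC: total cardiac output (L/h)."""
    return sum(Q * C / P for Q, C, P in tissues) / QC

tissues = [
    (5.0, 2.0, 4.0),    # richly perfused (hypothetical values)
    (1.5, 1.0, 2.0),    # poorly perfused
    (0.5, 10.0, 50.0),  # fat: high concentration, high partitioning
]
CB = blood_conc(tissues, QC=7.0)
```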

3.5.3. Metabolism/Elimination

The liver is frequently the primary site of metabolism for a chemical. The following equation is an example of the mass balance equation for the liver in the case of a chemical which is metabolized by two pathways (Fig. 11):
dAL/dt = QL · (CA − CL/PL) − kF · CL · VL/PL − Vmax · (CL/PL)/(Km + CL/PL).
In this case, the first term on the right-hand side of the equa-
tion represents the mass flux associated with transport in the blood
and is identical to the case of the storage compartment described
previously. The second term describes metabolism by a linear (first-order) pathway with rate constant kF (h⁻¹) and the third term
represents metabolism by a saturable (Michaelis–Menten) pathway
with capacity Vmax (mg/h) and affinity Km (mg/L). If it were
desired to model a water soluble metabolite produced by the
saturable pathway, an equation for its formation and elimination
could be added to the model (Fig. 12):
dAM/dt = Rstoch · Vmax · (CL/PL)/(Km + CL/PL) − ke · AM,

CM = AM/VD,

where AM = the amount of metabolite in the body (mg), Rstoch = the stoichiometric yield of the metabolite times the ratio of its molecular weight to that of the parent chemical, ke = the rate


Fig. 11. Liver compartment metabolizing a chemical by two pathways.

Fig. 12. Metabolite formation and elimination compartment.



constant for the clearance of the metabolite from the body (h⁻¹),
CM ¼ the concentration of the metabolite in the plasma (mg/L),
VD ¼ the apparent volume of distribution for the metabolite (L).
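The liver and metabolite mass balances above can be combined into a single derivative function. This sketch assumes venous equilibration as in the text; every parameter value is illustrative rather than taken from the styrene or methotrexate models.

```python
# Liver and metabolite mass balances with a first-order pathway (kF)
# and a saturable Michaelis-Menten pathway (Vmax, Km), following the
# equations in the text. All parameter values are illustrative.

def derivatives(AL, AM, CA, QL=4.0, VL=0.5, PL=3.0,
                kF=0.1, Vmax=2.0, Km=0.36, Rstoch=0.8, ke=0.2):
    """Return (dAL/dt, dAM/dt) in mg/h for liver amount AL (mg),
    metabolite amount AM (mg), and arterial concentration CA (mg/L)."""
    CVL = (AL / VL) / PL                  # venous-equilibration assumption
    sat = Vmax * CVL / (Km + CVL)         # Michaelis-Menten rate (mg/h)
    dAL = QL * (CA - CVL) - kF * CVL * VL - sat
    dAM = Rstoch * sat - ke * AM
    return dAL, dAM
```

At high liver concentrations the saturable rate approaches Vmax, so metabolite formation is capped at Rstoch · Vmax, which is the behavior that lets such models reproduce qualitatively different low- and high-dose kinetics with one parameter set.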

3.5.4. Metabolite Compartments

In principle, the same considerations which drive decisions regarding the level of complexity of the PBPK model for the parent chemical
must also be applied for each of its metabolites, and their metabolites,
and so on. As in the case of the parent chemical, the first and most
important consideration is the purpose of the model. If the concern is
direct parent chemical toxicity and the chemical is detoxified by
metabolism, then there is no need for a description of metabolism
beyond its role in the clearance of the parent chemical. The models for
styrene and methotrexate discussed above are examples of parent
chemical models. Similarly, if reactive intermediates produced during
the metabolism of a chemical are responsible for its toxicity, as in the
case of methylene chloride, a very simple description of the metabolic
pathways might be adequate (8). The cancer risk assessment model for
methylene chloride described the rate of metabolism for two path-
ways: the glutathione conjugation pathway, which was considered
responsible for the carcinogenic effects, and the competing P450
oxidation pathway, which was considered protective.
On the other hand, if one or more of the metabolites are
considered to be responsible for the toxicity of a chemical, it may
be necessary to provide a more complete description of the kinetics
of the metabolites themselves. For example, in the case of teratoge-
nicity from all-trans-retinoic acid, both the parent chemical and
several of its metabolites are considered to be toxicologically active;
therefore, in developing the PBPK model for this chemical it was
necessary to include a fairly complete description of the metabolic
pathways (100). Fortunately, the metabolism of xenobiotic com-
pounds often produces metabolites which are relatively water solu-
ble, simplifying the description needed. In many cases, such as the
production of trichloroacetic acid from trichloroethylene (46–48),
a classical one-compartment description may be adequate for
describing the metabolite kinetics. An example of such a description
was provided earlier. In other cases, however, the description of the
metabolite (or metabolites) may have to be as complex as that of the
parent chemical. An example of such a case is the PBPK model for
parathion (88), in which the model for the active metabolite, para-
oxon, is actually more complex than that of the parent chemical.

3.5.5. Target Tissues

Typically, a PBPK model used in toxicology or risk assessment
applications will include compartments for any target tissues for
the toxic action of the chemical. The target tissue description may
in some cases need to be fairly complicated, including such features
as in situ metabolism, binding, and pharmacodynamic processes in
order to provide a realistic measure of biologically effective tissue
exposure (57). For example, whereas the lung compartment in the

styrene model was represented only by a steady-state description of


alveolar vapor exchange, the PBPK model for methylene chloride
that was applied to perform a cancer risk assessment (8) included a
two-part lung description in which alveolar vapor exchange was
followed by a lung tissue compartment with in situ metabolism.
This more complex lung compartment was required to describe the
dose–response for methylene chloride induced lung cancer, which
was assumed to result from the metabolism of methylene chloride in lung Clara cells.
In other cases, describing a separate compartment for the
target tissue may be unnecessary. For example, the styrene model
described above could be used to relate acute exposures associated
with neurological effects without the necessity of separating out a
brain compartment. Instead, the concentration or AUC of styrene
in the blood could be used as a metric, on the assumption that the
relationship between brain concentration and blood concentration
would be the same under all exposure conditions, routes, and
species, namely, that the concentrations would be related by the
brain–blood partition coefficient. In fact, this is probably a reason-
able assumption across different exposure conditions in a given
species. However, while tissue–air partition coefficients for volatile
lipophilic chemicals appear to be similar in dog, monkey, and man
(101), human blood–air partition coefficients appear to be roughly
half of those in rodents (102). Therefore, the human brain–blood partition coefficient would probably be about twice that in the rodent. Nevertheless, if the model were to be used for extrapolation from rodents
to humans, this difference could easily be factored into the analysis
as an adjustment to the blood metric, without the need to actually
add a brain compartment to the model.
A fundamental issue in determining the nature of the target tissue
description required is the need to identify the toxicologically active
form of the chemical. In some cases, a chemical may produce a toxic
effect directly, either through its reaction with tissue constituents (e.g.,
ethylene oxide) or through its binding to cellular control elements
(e.g., dioxin). Often, however, it is the metabolism of the chemical that
leads to its toxicity. In this case, toxicity may result primarily from
reactive intermediates produced during the process of metabolism
(e.g., chlorovinyl epoxide produced from the metabolism of vinyl
chloride) or from the toxic effects of stable metabolites (e.g., trichlor-
oacetic acid produced from the metabolism of trichloroethylene).
The specific nature of the relationship between tissue exposure
and response depends on the mechanism of toxicity, or mode of
action, involved. Some toxic effects, such as acute irritation or acute
neurological effects, may result primarily from the current concen-
tration of the chemical in the tissue. Other toxic effects, such as
tissue necrosis and cancer, may depend on both the concentration
and duration of the exposure. For developmental effects, the chem-
ical time course may also have to be convoluted with the window of
18 Physiologically Based Pharmacokinetic/Toxicokinetic Modeling 465
susceptibility for a particular gestational event. The selection of the
dose metric, that is, the active chemical form for which tissue
exposure should be determined and the nature of the measure
to be used—e.g., peak concentration (Cmax) or area under the
concentration–time profile (AUC)—is the most important step in
a pharmacokinetic analysis and a principal determinant of the struc-
ture and level of detail that will be required in the PBPK model.
3.5.6. Uptake Routes

Each of the relevant uptake routes for the chemical must be
described in the model. Often there are a number of possible
ways to describe a particular uptake process, ranging from simple
to complex. As with all other aspects of model design, the compet-
ing goals of parsimony and realism must be balanced in the selec-
tion of the level of complexity to be used. The following examples
are meant to provide an idea of the variety of model code which can
be required to describe the various possible uptake processes.
Intravenous Administration

AB0 = Dose × BW,

where AB0 = the amount of chemical in the blood at the beginning of the simulation (t = 0), Dose = administered dose (mg/kg), BW = animal body weight (kg);

or, in the case where a steady-state approximation has been used to eliminate the blood compartment:

CB = (QL × CVL + ··· + QF × CVF + kIV)/QC,

where

kIV = Dose × BW/tIV   (t < tIV)
    = 0               (t > tIV)

tIV = the duration of time over which the injection takes place (h).

In the latter case, the model code must be written with a "switch" to change the value of kIV to zero at t = tIV.
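In model code, this "switch" is simply a conditional on simulation time. A minimal Python sketch (the function and variable names are ours, not from the chapter):

```python
def k_iv(t, dose, bw, t_iv):
    """Infusion rate kIV (mg/h): Dose * BW / tIV during the injection,
    zero afterward.

    t: current time (h); dose: administered dose (mg/kg);
    bw: body weight (kg); t_iv: infusion duration (h).
    """
    return dose * bw / t_iv if t < t_iv else 0.0
```

For a 10 mg/kg dose infused into a 0.3-kg rat over 0.1 h, the rate is 30 mg/h during the infusion and 0 afterward.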

Drinking Water (Fig. 13)

k0 = Dose × BW/24

dAL/dt = QL × (CA − CL/PL) − kF × CL × VL/PL + k0,

where Dose = the daily ingestion rate of chemical in drinking water (mg/kg/day), and the liver compartment as shown includes only first-order metabolism.

Fig. 13. Ingestion of chemical through drinking water.
466 J.L. Campbell Jr. et al.

Oral Gavage

For a chemical which is not excreted in the feces (Fig. 14):

AST0 = Dose × BW
dAST/dt = −kA × AST
dAL/dt = QL × (CA − CL/PL) − kF × CL × VL/PL + kA × AST,

where Dose = the gavage dose (mg/kg), BW = the animal body weight (kg), AST0 = the amount of chemical in the stomach at the beginning of the simulation, AST = the amount of chemical in the stomach at any given time, kA = the oral absorption rate (h−1).

For a chemical which is excreted in the feces (Fig. 15):

AST0 = Dose × BW
dAST/dt = −kA × AST
dAI/dt = kA × AST − kI × AI − KF × AI/VI
dAL/dt = QL × (CA − CL/PL) − kF × CL × VL/PL + kI × AI,

where AI = the amount of chemical in the intestinal lumen (mg), kI = the rate constant for intestinal absorption (h−1), KF = the fecal clearance (L/h), VI = the volume of the intestinal lumen (L).

The rate of fecal excretion of the chemical is then:

dAF/dt = KF × AI/VI.
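The gavage equations with fecal excretion can be integrated with a simple Euler scheme. The following Python sketch is our own illustration (parameter names mirror the text; the arterial concentration is held at zero for simplicity, so venous loss from the liver leaves the system untracked):

```python
def simulate_gavage(dose, bw, ka, ki, kf_fecal, vi, ql, pl, vl, kf_met,
                    ca=0.0, t_end=24.0, dt=0.001):
    """Euler integration of the stomach/intestine/liver gavage equations.

    dose (mg/kg), bw (kg), ka and ki (1/h), kf_fecal = KF (L/h), vi (L),
    ql (L/h), pl = liver:blood partition coefficient, vl (L),
    kf_met = first-order metabolism rate (1/h), ca (mg/L).
    Returns amounts (mg) in stomach, intestine, liver, and feces at t_end.
    """
    a_st = dose * bw                 # AST0 = Dose * BW
    a_i = a_l = a_f = 0.0
    for _ in range(int(round(t_end / dt))):
        c_l = a_l / vl
        d_st = -ka * a_st
        d_i = ka * a_st - ki * a_i - kf_fecal * a_i / vi
        d_l = ql * (ca - c_l / pl) - kf_met * c_l * vl / pl + ki * a_i
        d_f = kf_fecal * a_i / vi
        a_st += d_st * dt
        a_i += d_i * dt
        a_l += d_l * dt
        a_f += d_f * dt
    return a_st, a_i, a_l, a_f
```

With a 1 mg/kg dose to a 0.3-kg animal, the stomach empties essentially completely over 24 h and a nonzero amount accumulates in the feces, as expected from the equations.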

Fig. 14. Chemical ingested through oral gavage and not excreted in the feces.

Fig. 15. Chemical being excreted through feces.
Fig. 16. Inhalation of chemical through the lung compartment.

Note, however, that this simple formulation does not consider the plug flow of the intestinal contents and will not reproduce the delay which actually occurs in the appearance of the chemical in the feces. Such a delay could be added using a delay function available in common simulation software, or multiple compartments could be used to simulate plug flow, as shown in the diagram of the methotrexate model (Fig. 8).
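The chain-of-compartments idea can be sketched as follows (a minimal illustration of our own, not code from the chapter): a bolus pushed through n identical well-mixed compartments appears at the outlet with a delay, rather than immediately.

```python
def transit_outflow(amount0, n, k, dt=0.001, t_end=50.0):
    """Push a bolus through n identical compartments in series, each
    emptying into the next at first-order rate k (1/h).

    Returns (times, outflow rates) for the last compartment. The outflow
    peaks near t = (n - 1)/k, mimicking the transit delay of plug flow,
    with mean transit time n/k.
    """
    a = [amount0] + [0.0] * (n - 1)
    times, rates = [], []
    t = 0.0
    for _ in range(int(round(t_end / dt))):
        flow = [k * x for x in a]
        for i in range(n):
            a[i] += ((flow[i - 1] if i > 0 else 0.0) - flow[i]) * dt
        times.append(t)
        rates.append(flow[-1])
        t += dt
    return times, rates
```

For n = 5 and k = 1/h, essentially all of the bolus eventually exits and the mean transit time is close to 5 h; a single compartment (n = 1) would instead show its maximum outflow at t = 0.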

Inhalation (Fig. 16)

dAAB/dt = QC × (CV − CA) + QP × (CI − CX),

where AAB = the amount of chemical in the alveolar blood (mg), QC = the total arterial blood flow (L/h), CV = the concentration of chemical in the pooled venous blood (mg/L), CA = the concentration of chemical in the alveolar (arterial) blood (mg/L), QP = the alveolar (not total pulmonary) ventilation rate (L/h), CI = the concentration of chemical in the inhaled air (mg/L), CX = the concentration of chemical in the exhaled air (mg/L).
Assuming the alveolar blood is at steady state with respect to the other compartments:

dAAB/dt = 0.

Also, assuming lung equilibration (i.e., that the blood in the alveolar region has reached equilibrium with the alveolar air prior to exhalation):

CX = CA/PB.

Substituting into the equation for the alveolar blood, and solving for CA:

CA = (QC × CV + QP × CI)/(QC + QP/PB).
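These algebraic relations are easy to check numerically; a small Python sketch (our own, with illustrative values) confirms that the resulting CA balances uptake from the air against gain by the blood:

```python
def alveolar_steady_state(qc, qp, pb, cv, ci):
    """Steady-state arterial (CA) and end-alveolar air (CX) concentrations
    from the lung equilibration equations.

    qc: total arterial blood flow (L/h); qp: alveolar ventilation (L/h);
    pb: blood:air partition coefficient; cv, ci: venous blood and inhaled
    air concentrations (mg/L).
    """
    ca = (qc * cv + qp * ci) / (qc + qp / pb)
    cx = ca / pb          # lung equilibration: CX = CA / PB
    return ca, cx
```

With QC = QP = 15 L/h, PB = 5, CV = 0, and CI = 1 mg/L, the uptake from air, QP × (CI − CX), exactly equals the blood-side gain, QC × (CA − CV), as the steady-state assumption requires.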
This steady-state approximation is used in the styrene model described earlier. Note that the rate of elimination of the chemical by exhalation is just QP × CX. The alveolar ventilation rate, QP, does not include the "dead-space" volume (the portion of the inhaled air which does not reach the alveolar region), and is therefore roughly 70% of the total respiratory rate. The concentration CX represents the "end-alveolar" air concentration; in order to estimate the average exhaled concentration (CEX), the dead-space contribution must be included:

CEX = 0.3 × CI + 0.7 × CX.

Fig. 17. Diffusion of chemical through the skin.

Dermal (Fig. 17)

dASK/dt = KP × ASFC × (CSFC − CSK/PSKV)/1,000 + QSK × (CA − CSK/PSKB),

where ASK = the amount of chemical in the skin (mg), KP = the skin permeability coefficient (cm/h), ASFC = the skin surface area (cm2), CSFC = the concentration of chemical on the surface of the skin (mg/L), CSK = the concentration of the chemical in the skin (mg/L), PSKV = the skin–vehicle partition coefficient (i.e., for the vehicle containing the chemical on the surface of the skin), QSK = the blood flow to the skin region (L/h), CA = the arterial concentration of the chemical (mg/L), PSKB = the skin–blood partition coefficient. (The factor of 1,000 converts cm3 to L.)

Note that when this compartment is added, the equation for the blood in the model must also be modified to add a term for the venous blood returning from the skin (+QSK × CSK/PSKB), and the blood flow and volume parameters for the slowly perfused tissue compartment must be reduced by the blood flow and volume of the skin.

3.5.7. Experimental Apparatus

In some cases, in addition to compartments describing the animal–chemical system, it may also be necessary to include model compartments that describe the experimental apparatus in which measurements were obtained. An example of such a case is modeling a closed-chamber gas uptake experiment. In a gas uptake experiment, several animals are maintained in a small, enclosed chamber while the air in the chamber is recirculated, with replenishment of oxygen and scrubbing of carbon dioxide. A small amount of a volatile chemical is then allowed to vaporize into the chamber, and the concentration of the chemical in the chamber air is monitored over time. In this design, any loss of the chemical from the chamber air reflects uptake into the animals (103). In order to simulate the change in the concentration in the chamber air as the chemical is taken up into the animals, an equation is required for the chamber itself (Fig. 18):
Fig. 18. The change in concentration in the chamber air as the chemical is absorbed.

Fig. 19. Saturable binding in the kidney.

dACH/dt = N × QP × (CX − CI)
CI = ACH/VCH,

where ACH = the amount of chemical in the chamber (mg), N = the number of animals in the chamber, CX = the concentration of chemical in the air exhaled by the animals (mg/L), QP = the alveolar ventilation rate for a single animal (L/h), CI = the chamber air concentration (mg/L), VCH = the volume of air in the chamber (L).
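The chamber balance alone is enough to show the characteristic decay of chamber concentration. The sketch below is ours and makes a deliberately crude simplifying assumption, clearly labeled: each animal exhales a fixed fraction r of the inhaled concentration. A real gas-uptake model would instead couple CX to the animal's full PBPK equations.

```python
def chamber_decay(c0, vch, n, qp, r, dt=0.01, t_end=6.0):
    """Closed-chamber air concentration after t_end hours.

    ASSUMPTION (for illustration only): each animal exhales a fixed
    fraction r of the inhaled concentration, so CX = r * CI.
    c0: initial chamber concentration (mg/L); vch: chamber air volume (L);
    n: number of animals; qp: alveolar ventilation per animal (L/h).
    """
    a = c0 * vch                            # ACH: amount in chamber air (mg)
    for _ in range(int(round(t_end / dt))):
        ci = a / vch
        a += n * qp * (r * ci - ci) * dt    # dACH/dt = N * QP * (CX - CI)
    return a / vch
```

Under this assumption the concentration decays exponentially with rate N × QP × (1 − r)/VCH; with 3 animals, QP = 1 L/h, r = 0.5, and a 10-L chamber, about 59% of the chemical is taken up over 6 h.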

3.5.8. Distribution/Transport

There are a number of issues associated with the description of the transport and distribution of the chemical that must be considered in the process of model design. Examples of a few of the more common ones are included here.

Binding

When there is evidence that saturable binding is an important determinant of the distribution of a chemical (such as an apparent dose dependence of the tissue partitioning), a description of binding can be added to the model. For example, in the case of saturable binding in the kidney (Fig. 19):

dAKT/dt = VK × dCKT/dt
        = QK × (CA − CKF/PKF) − ke × CKF/PKF,
where AKT = the total amount of chemical in the kidney (mg), VK = the volume of the kidney (L), CKT = the total concentration of chemical in the kidney (mg/L), QK = the blood flow to the kidney (L/h), CKF = the concentration of free (unbound) chemical in the kidney (mg/L), PKF = the kidney–blood partition coefficient for free chemical, ke = the urinary excretion rate constant (h−1).
The apparent complication in adding this equation to the model is that the total movement of chemical is needed for the mass balance, but the determinants of the kinetics are in terms of the free concentration. To solve for free in terms of total, we note that:

CKT = CKF + CKB,
where CKB = the concentration of bound chemical.
We can describe the saturable binding with an equation similar to that for saturable metabolism:

CKB = B × CKF/(KB + CKF),

where B = the binding capacity (mg/L), KB = the binding affinity (mg/L).

Substituting this equation in the previous one:

CKT = CKF + B × CKF/(KB + CKF).
Rewriting this equation to solve for the free concentration in terms of only the total concentration would result in a quadratic equation, the solution of which could be obtained with the quadratic formula. However, taking advantage of the iterative algorithm by which these PBPK models are exercised (as will be discussed later), it is not necessary to go to this effort. Instead, a much simpler implicit equation can be written for the free concentration (i.e., an equation in which the free concentration appears on both sides):

CKF = CKT/(1 + B/(KB + CKF)).
In an iterative algorithm, this equation can be solved at each
time step using the previous value of CKF to obtain the new value! A
new value of CKT is then obtained from the mass balance equation
for the kidney and the process is repeated.
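The same fixed-point idea can be isolated and tested on its own; this Python sketch (our own construction) iterates the implicit equation to convergence:

```python
def free_concentration(ckt, b, kb, tol=1e-10, max_iter=200):
    """Fixed-point iteration for the implicit free-concentration equation
    CKF = CKT / (1 + B/(KB + CKF)).

    ckt: total concentration (mg/L); b: binding capacity (mg/L);
    kb: binding affinity (mg/L).
    """
    ckf = ckt                   # start from the all-free guess
    for _ in range(max_iter):
        nxt = ckt / (1.0 + b / (kb + ckf))
        if abs(nxt - ckf) < tol:
            return nxt
        ckf = nxt
    return ckf
```

The converged value satisfies the original binding relation CKT = CKF + B × CKF/(KB + CKF), which is exactly what the quadratic formula would have delivered.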

Diffusion Limitation

Most of the PBPK models in the literature are flow-limited models; that is, they assume that the rate of tissue uptake of the chemical is limited only by the flow of the chemical to the tissue in the blood. While this assumption appears to be reasonable in general, for some chemicals and tissues uptake may instead be diffusion limited. Examples of tissues for which diffusion-limited transport has often been described include the skin, placenta, brain, and fat. The model compartments described thus far have all assumed flow-limited transport. If there is evidence that the movement of a chemical between the blood and a tissue is limited by diffusion, a two-compartment description of the tissue can be used, with a "shallow" exchange compartment in communication with the blood and a diffusion-limited "deep" compartment (Fig. 20):

dAS/dt = QS × (CA − CS) − KPA × (CS − CD/PD)
dAD/dt = KPA × (CS − CD/PD),
Fig. 20. A two-compartment model describing the movement of a chemical between the "shallow" and diffusion-limited "deep" compartment.

where AS = the amount of chemical in the shallow compartment (mg), QS = the blood flow to the shallow compartment (L/h), CS = the concentration of chemical in the shallow compartment (mg/L), KPA = the permeability–area product for diffusion-limited transport (L/h), CD = the concentration of chemical in the deep compartment (mg/L), PD = the tissue–blood partition coefficient, AD = the amount of chemical in the deep compartment (mg).
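The diffusion-limited behavior is easy to demonstrate numerically. In the sketch below (ours), the compartment volumes VS and VD are our additions, since the text writes the balances in amounts and converting to concentrations requires volumes:

```python
def diffusion_limited(ca, qs, kpa, pd, vs, vd, dt=0.01, t_end=600.0):
    """Euler integration of the shallow/deep equations under a constant
    arterial concentration ca (mg/L).

    qs: blood flow (L/h); kpa: permeability-area product (L/h);
    pd: tissue:blood partition coefficient; vs, vd: compartment volumes
    (L, our assumed addition).  Returns (CS, CD) at t_end.
    """
    cs = cd = 0.0
    for _ in range(int(round(t_end / dt))):
        flux = kpa * (cs - cd / pd)            # shallow -> deep exchange
        cs += (qs * (ca - cs) - flux) / vs * dt
        cd += flux / vd * dt
    return cs, cd
```

With a small permeability–area product, the shallow compartment equilibrates with arterial blood quickly while the deep compartment fills only slowly toward its equilibrium value PD × CA, which is the signature of diffusion limitation.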

3.6. Model Parameterization

Once the model structure has been determined, it still remains to identify the values of the input parameters in the model.

3.6.1. Physiological Parameters

Estimates of the various physiological parameters needed in PBPK models are available from a number of sources in the literature, particularly for the human, monkey, dog, rat, and mouse (30–35). Estimates for the same parameter often vary widely, however, due both to experimental differences and to differences in the animals examined (age, strain, activity). Ventilation rates and blood flow rates are particularly sensitive to the level of activity (31, 33). Data on some important tissues are relatively poor, particularly in the case of fat tissue. Table 3 shows typical values of a number of physiological parameters in several species.

3.6.2. Biochemical Parameters

For volatile liquids, the type of chemicals which are common environmental contaminants, tissue partition coefficients can be determined by a simple in vitro technique called vial equilibration (36, 38), and tissue metabolic constants by a modification of the same technique (37) or other in vitro methods (105). Alternatively, rapid in vivo approaches for determining metabolic constants can be used, based either on steady-state (96) or on gas uptake experiments (102–104, 106, 107). Determination of the total amount of chemical metabolized in a particular exposure situation can also be used to estimate metabolic parameters (109). In addition, determination of stable end-product metabolites after exposure can be a particularly attractive technique in some cases (108, 110). Similar approaches can be used with nonvolatile chemicals (39, 111) and metals (112).
Table 3
Typical physiological parameters for PBPK models

Species                              Mouse    Rat      Monkey   Human

Ventilation
  Alveolar (L/h, 1-kg animal)        29.0     15.0     15.0     15.0
Blood flows
  Total (L/h, 1-kg animal)           16.5     15.0     15.0     15.0
  Muscle (fraction)                  0.18     0.18     0.18     0.18
  Skin (fraction)                    0.07     0.08     0.06     0.06
  Fat (fraction)                     0.03     0.06     0.05     0.05
  Liver (arterial) (fraction)        0.035    0.03     0.065    0.07
  Gut (portal) (fraction)            0.165    0.18     0.185    0.19
  Other organs (fraction)            0.52     0.47     0.46     0.45
Tissue volumes
  Body weight (kg)                   0.02     0.3      4.0      80.0
  Body water (fraction)              0.65     0.65     0.65     0.65
  Plasma (fraction)                  0.04     0.04     0.04     0.04
  RBCs (fraction)                    0.03     0.03     0.03     0.03
  Muscle (fraction)                  0.34     0.36     0.048    0.33
  Skin (fraction)                    0.17     0.195    0.11     0.11
  Fat (fraction)                     0.10     0.07     0.05     0.21
  Liver (fraction)                   0.046    0.037    0.027    0.023
  Gut tissue (fraction)              0.031    0.033    0.045    0.045
  Other organs (fraction)            0.049    0.031    0.039    0.039
3.6.3. Allometry

The different types of physiological and biochemical parameters in a PBPK model are known to vary with body weight in different ways (23). Typically, the parameterization of PBPK models is simplified by assuming standard allometric scaling (33, 113), as shown in Table 4, where the scaling factors, b, can be used in the following equation:

Y = a × X^b,

where Y = the value of the parameter at a given body weight X (kg), and a = the scaled parameter value for a 1-kg animal.

While standard allometric scaling provides a useful starting point, or hypothesis, for cross-species scaling, it is not sufficiently accurate for some applications, such as risk assessment. In the case of the physiological parameters, the species-specific parameter values are generally available in the literature (30–35), and can be used directly in place of the allometric estimates. This is often not the case for the other model parameters, however. The use of the allometric scaling convention provides a useful way to estimate reasonable initial values for parameters across species. It also provides a reasonable method for intraspecies scaling.
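The scaling equation is a one-liner in code; this Python sketch (ours) applies it:

```python
def allometric(a, bw, b):
    """Y = a * BW**b: scale a parameter from its 1-kg reference value a
    to a body weight bw (kg) using the scaling power b (see Table 4)."""
    return a * bw ** b
```

For example, scaling an alveolar ventilation of 15 L/h (1-kg reference value, as in Table 3) to an 80-kg human with b = 0.75 gives roughly 400 L/h, while a parameter with b = 0 is left unchanged across species.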
Table 4
Standard allometric scaling for physiologically based pharmacokinetic model parameters

Parameter type (units)                         Scaling (power of body weight)

Volumes                                        1.0
Flows (volume per time)                        0.75
Ventilation (volume per time)                  0.75
Clearances (volume per time)                   0.75
Metabolic capacities (mass per time)           0.75
Metabolic affinities (mass per volume)         0
Partition coefficients (unitless)              0
First-order rate constants (inverse time)      −0.25
3.6.4. Parameter Optimization

In many cases, important parameter values needed for a PBPK model may not be available in the literature. In such cases it is necessary to measure them in new experiments, to estimate them by QSAR techniques, or to identify them by optimizing the fit of the model to an informative data set. Even in the case where an initial estimate of a particular parameter value can be obtained from other sources, it may be desirable to refine the estimate by optimization. For example, given the difficulty of obtaining accurate estimates of the fat volume in rodents, a more reliable estimate may be obtained by examining the impact of fat volume on the kinetic behavior of a lipophilic compound such as styrene. Of course, being able to uniquely identify a parameter from a kinetic data set rests on two key assumptions: (1) that the kinetic behavior of the compound under the conditions in which the data was collected is sensitive to the parameter being estimated, and (2) that other parameters in the model which could influence the observed kinetics have been determined by other means, and are held fixed during the estimation process.

The actual approach for conducting a parameter optimization can range from simple visual fitting, where the model is run with different values of the parameter until the best correspondence appears to be achieved, to the use of a quantitative mathematical algorithm. The most common algorithm used in optimization is the least-squares fit. To perform a least-squares optimization, the model is run to obtain a set of predictions at each of the times a data point was collected. The square of the difference between the model prediction and data point at each time is calculated and the
results for all of the data points are summed. The parameter being
estimated is then modified, and the sum of squares is recalculated.
This process is repeated until the smallest possible sum of squares
is obtained, representing the best possible fit of the model to
the data.
In a variation on this approach, the square of the difference at
each point is divided by the square of the prediction. This variation,
known as relative least squares, is preferable in the case of data with
an error structure which can be described by a constant coefficient
of variation (that is, a constant ratio of the standard deviation to the
mean). The former method, known as absolute least squares, is
preferable in the case of data with a constant variance. From a
practical viewpoint, the absolute least squares method tends to
give greater weight to the data at higher concentrations and results
in fits that look best when plotted on a linear scale, while the relative
least squares method gives greater weight to the data at lower
concentrations and results in fits that look best when plotted on a
logarithmic scale.
A generalization of this weighting concept is provided by the extended least squares method, available in a number of optimization packages including ACSL/Opt (MGA Software, Concord, MA). In the extended least squares algorithm, the heteroscedasticity parameter can be varied from 0 (for absolute weighting) to 2 (for relative weighting), or can be estimated from the data. In general, setting the heteroscedasticity parameter from knowledge of the error structure of the data is preferable to estimating it from a data set.
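The three weighting schemes collapse into a single objective function; a minimal Python sketch (our own, with gamma as the heteroscedasticity parameter):

```python
def weighted_ssq(predictions, observations, gamma):
    """Extended least squares objective: squared errors divided by
    prediction**gamma.

    gamma = 0 reproduces absolute least squares, gamma = 2 relative
    least squares; intermediate values interpolate between them.
    """
    return sum((obs - pred) ** 2 / pred ** gamma
               for pred, obs in zip(predictions, observations))
```

Note how the weighting shifts emphasis: for predictions [1, 10] and observations [2, 12], absolute weighting (gamma = 0) is dominated by the error at the higher concentration, while relative weighting (gamma = 2) is dominated by the error at the lower one.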
A common example of identifying PBPK model parameters by
fitting kinetic data is the estimation of tissue partition coefficients
from experiments in which the concentration of chemical in the
blood and tissues is reported at various time points. Using an
optimization approach, the predictions of the model for the time
course in the blood and tissues could be optimized with respect to
the data by varying the model’s partition coefficients. There is really
little difference in the strength of the justification for estimating the
partition coefficients in this way as opposed to estimating them
directly from the data (by dividing the tissue concentrations by
the simultaneous blood concentration). In fact, the direct estimates
would probably be used as initial estimates in the model when the
optimization was started.
A major difficulty in performing parameter optimization results
from correlations between the parameters. When it is necessary to
estimate parameters which are highly correlated, it is best to gener-
ate a contour plot of the objective function (sum of squares) or
confidence region over a reasonable range of values of the two
parameters. Generation of a contour plot with ACSL/Opt is rela-
tively straightforward. An example of a contour plot for two of the
metabolic parameters in the PBPK model for methylene chloride is
Fig. 21. Contour plot for correlated metabolic parameters in the PBPK model for methylene chloride.

shown in Fig. 21. The contours in the figure outline the joint
confidence region for the values of the two parameters, and the
fact that the confidence region is aligned diagonally reflects the
correlation between the two parameters.

3.7. Mass Balance Requirements

One of the most important mathematical considerations during model design is the maintenance of mass balance. Simply put, the
model should neither create nor destroy mass. This seemingly
obvious principle is often violated unintentionally during the pro-
cess of model development and parameterization. A common vio-
lation of mass balance, which typically leads to catastrophic results,
involves failure to exactly match the arterial and venous blood flows
in the model. As described above, the movement of chemical in the
blood (in units of mass per time) is described as the product of the
concentration of chemical in the blood (in units of mass per vol-
ume) times the flow rate of the blood (in units of volume per time).
Therefore, to maintain mass balance, the sum of the blood flows
leaving any particular tissue compartment must equal the sum of
the blood flows entering the compartment. In particular, to main-
tain mass balance in the blood compartment (regardless of whether
it is actually a compartment or just a steady-state equation), the
sum of the venous flows from the individual tissue compartments
must equal the total arterial blood flow leaving the heart:
Σ QT = QC.
Another obvious but occasionally overlooked aspect of main-
taining mass balance during model development is that if a model is
modified by splitting a tissue out of a lumped compartment, the
blood flow to the separated tissue (and its volume) must be sub-
tracted from that for the lumped compartment. Moreover, even
though a model may initially be designed with parameters that meet
the above requirements, mass balance may unintentionally be vio-
lated later if the parameters are altered during model execution. For
example, if the parameter for the blood flow to one compartment is
increased, the parameter for the overall blood flow must be
increased accordingly or an equivalent reduction must be made in
the parameter for the blood flow to another compartment. Partic-
ular care must be taken in this regard when the model is subjected
to sensitivity or uncertainty analysis; inadvertent violation of mass
balance during Monte Carlo sampling has led in the past to the
publication of erroneous sensitivity results (95).
A similar mass balance requirement must be met for transport
other than blood flow. For example, if the chemical is cleared by
biliary excretion, the elimination of chemical from the liver in the
bile must exactly match the appearance of chemical in the gut
lumen in the bile. Put mathematically, the same term for the
transport must appear in the equations for the two compartments,
but with opposite signs (positive vs. negative). For example, if the following equation were used to describe a liver compartment with first-order metabolism and biliary clearance:

dAL/dt = QL × (CA − CL/PL) − kF × CL × VL/PL − KB × CL,

where KB = the biliary clearance rate (L/h),

then the equation for the intestinal lumen would need to include the term +KB × CL.
As a model grows in complexity, it becomes increasingly difficult
to assure its mass balance by inspection. Therefore, it is a worthwhile
practice to check for mass balance by including an equation in the
model that adds up the total amount of chemical in each of the
model compartments, including metabolized and excreted chemi-
cal, for comparison with the administered or inhaled dose.
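Such a bookkeeping check is trivial to code; this Python sketch (our own) computes the fraction of the dose unaccounted for, which a correctly constructed model keeps near zero at every time step:

```python
def mass_balance_error(amounts, metabolized, excreted, dose_in):
    """Fraction of the administered (or inhaled) dose unaccounted for.

    amounts: amounts (mg) currently in each model compartment;
    metabolized, excreted: cumulative amounts removed (mg);
    dose_in: total chemical administered to date (mg).
    """
    total = sum(amounts) + metabolized + excreted
    return (dose_in - total) / dose_in
```

Embedding a call like this in the simulation loop (and asserting that the result stays below some small tolerance) catches violations such as mismatched blood flows as soon as they are introduced.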

3.8. Model Diagram

As described in the previous sections, the process of developing a
PBPK model begins by determining the essential structure of the
model based on the information available on the chemical’s toxicity,
mechanism of action, and pharmacokinetic properties. The results
of this step can usually be summarized by an initial model diagram,
such as those depicted in Figs. 3 and 8. In fact, in many cases a
well-constructed model diagram, together with a table of the input
parameter values and their definitions, is all that an accomplished
modeler should need in order to create the mathematical equations
defining a PBPK model. In general, there should be a one-to-one
correspondence of the boxes in the diagram to the mass balance
equations (or steady-state approximations) in the model. Similarly,
the arrows in the diagram correspond to the transport or
metabolism processes in the model. Each of the arrows connecting
the boxes in the diagram should correspond to one of the terms in
the mass balance equations for both of the compartments it con-
nects, with the direction of the arrow pointing from the compart-
ment in which the term is negative to the compartment in which it
is positive. Arrows only connected to a single compartment, which
represent uptake and excretion processes, are interpreted similarly.
The model diagram should be labeled with the names of the key
variables associated with the compartment or process represented
by each box and arrow. Interpretation of the model diagram is also
aided by the definition of the model input parameters in the
corresponding table. The definition and units of the parameters
can indicate the nature of the process being modeled (e.g.,
diffusion-limited vs. flow-limited transport, binding vs. partition-
ing, saturable vs. first-order metabolism, etc.).

3.9. Elements of Model Evaluation

One of the key advantages of PBPK models is their ability to perform extrapolations across species, routes of exposure, and
exposure conditions. The reliability of a particular PBPK model
for the purpose of extrapolation depends not only on the adequacy
of its structure, but also on the correctness of its parameterization.
The following section discusses some of the key issues associated
with evaluating the adequacy of a model to predict chemical kinet-
ics under conditions different from those for which experimental
data are available.

3.9.1. Model Documentation

In cases where a model previously developed by one investigator is
being evaluated for use in a different application by another inves-
tigator, adequate model documentation is critical for evaluation of
the model. The documentation for a PBPK model should include
sufficient information about the model so that an experienced
modeler could accurately reproduce its structure and parameteri-
zation. Usually the suitable documentation of a model will require
a combination of one or more “box and arrow” model diagrams
together with any equations which cannot be unequivocally
derived from the diagrams. Model diagrams should clearly differ-
entiate blood flow from other transport (e.g., biliary excretion) or
metabolism, and arrows should be used where the direction of
transport could be ambiguous. All tissue compartments, metabo-
lism pathways, routes of exposure, and routes of elimination should
be clearly and accurately presented. All equations should be dimen-
sionally consistent and in standard mathematical notation. Generic
equations (e.g., for tissue “i”) can help to keep the description brief
but complete. The values used for all model parameters should be
provided, with units. If any of the listed parameter values are based
on allometric scaling, a footnote should provide the body weight
used to obtain the allometric constant as well as the power of body
weight used in the scaling.
3.9.2. Model Validation

Internal validation consists of the evaluation of the mathematical
correctness of the model (114). It is best accomplished on the actual
model code, but if necessary can be performed on appropriate
documentation of the model structure and parameters, as described
above (assuming, of course, that the actual model code accurately
reflects the model documentation). A more important issue regards
the provision of evidence for external validation (sometimes referred
to as verification). The level of detail incorporated into a model is
necessarily a compromise between biological accuracy and parsi-
mony. The process of evaluating the sufficiency of the model for its
intended purpose, termed model verification, requires a demonstra-
tion of the ability of the model to predict the behavior of experi-
mental data different from that on which it was based.
Whereas a simulation is intended simply to reproduce the
behavior of a system, a model is intended to confirm a hypothesis
concerning the nature of the system (115). Therefore, model vali-
dation should demonstrate the ability of the model to predict the
behavior of the system under conditions which test the principal
aspects of the underlying hypothetical structure. While quantitative
tests of goodness of fit may often be a useful aspect of the verifica-
tion process, the more important consideration may be the ability
of the model to provide an accurate prediction of the general
behavior of the data in the intended application.
Where only some aspects of the model can be verified, it is
particularly important to assess the uncertainty associated with the
aspects which are untested. For example, a model of a chemical
and its metabolites which is intended for use in cross-species
extrapolation to humans would preferably be verified using data
in different species, including humans, for both the parent chemi-
cal and the metabolites. If only parent chemical data is available in
the human, the correspondence of metabolite predictions with
data in several animal species could be used as a surrogate, but this
deficiency should be carefully considered when applying the
model to predict human metabolism. One of the values of biolog-
ically based modeling is the identification of specific data which
would improve the quantitative prediction of toxicity in humans
from animal experiments.
In some cases it is necessary to use all of the available data to
support model development and parameterization. Unfortunately,
this type of modeling can easily become a form of self-fulfilling
prophecy: models are logically strongest when they fail, but
psychologically most appealing when they succeed (116). Under
these conditions, model verification can be particularly difficult, putting
an additional burden on the investigators to substantiate the trustwor-
thiness of the model for its intended purpose. Nevertheless, a com-
bined model development and verification can often be successfully
performed, particularly for models intended for interpolation, integra-
tion, and comparison of data rather than for true extrapolation.
18 Physiologically Based Pharmacokinetic/Toxicokinetic Modeling 479

3.9.3. Parameter Verification
In addition to verifying the performance of the model against
experimental data, the model should be evaluated in terms of the
plausibility of its parameters. This is particularly important in the
case of PBPK models, where the parameters generally possess
biological significance, and can therefore be evaluated for plausibil-
ity independent of the context of the model. The source of each
model input parameter value should be identified, whether it was
obtained from prior literature, determined directly by experiment,
or estimated by fitting a model output to experimental data. Param-
eter estimates derived independently of tissue time course or dose-
response data are preferred. To the extent feasible, the degree of
uncertainty regarding the parameter values should also be evalu-
ated. The empirically derived “Law of Reciprocal Certainty” states
that the more important the model parameter, the less certain will
be its value. In accordance with this principle, the most difficult,
and typically most important, parameter determination for PBPK
models is the characterization of the metabolism parameters.
When parameter estimation has been performed by optimizing
model output to experimental data, the investigator must assure
that the parameter is adequately identifiable from the data (114).
Due to the confounding effects of model error, overparameteriza-
tion, and parameter correlation, it is quite possible for an optimiza-
tion algorithm to obtain a better fit to a particular data set by
modifying a parameter which in fact should not be identified on
the basis of that data set. Also, when an automatic optimization
routine is employed it should be restarted with a variety of initial
parameter values to assure that the routine has not stopped at a
local optimum. These precautions are particularly important when
more than one parameter is being estimated simultaneously, since
the parameters in biologically based models are often highly corre-
lated, making independent estimation difficult. Estimates of param-
eter variance obtained from automatic optimization routines
should be viewed as lower bound estimates of true parameter
uncertainty since only a local, linearized variance is typically calcu-
lated. In characterizing parameter uncertainty, it is probably more
instructive to determine what ranges of parameter values are clearly
inconsistent with the data than to accept a local, linearized variance
estimate provided by the optimization algorithm.
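The restart precaution can be sketched in Python (illustrative only: the cost function below is a hypothetical stand-in for the model-versus-data sum of squares, and the naive step-halving search stands in for the automatic optimization routine):

```python
# Restarting a local optimizer from several initial values to avoid a local
# optimum. "cost" is hypothetical, standing in for the sum-of-squares error
# between model output and data; it has more than one local minimum.
import math

def cost(p):
    return (p - 2.0) ** 2 - 1.5 * math.cos(4.0 * p)

def local_minimize(p, step=0.1, tol=1e-6):
    """Naive local search: take improving steps, halving the step size
    whenever neither direction improves the fit."""
    while step > tol:
        if cost(p + step) < cost(p):
            p += step
        elif cost(p - step) < cost(p):
            p -= step
        else:
            step *= 0.5
    return p

starts = [0.0, 1.0, 2.0, 3.0, 4.0]            # a variety of initial values
fits = [local_minimize(p0) for p0 in starts]
best = min(fits, key=cost)
print(best, cost(best))
```

Only the multi-start best is trusted; a single run from an unlucky initial value can stall in a shallow local minimum.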
It is usually necessary for the investigator to repeatedly vary the
model parameters manually to obtain a sense of their identifiability
and correlation under various experimental conditions, although
some simulation languages include routines for calculating param-
eter sensitivity and covariance or for plotting confidence region
contours. Sensitivity analysis and Monte Carlo uncertainty analysis
techniques can serve as useful methods to estimate the impact of
input parameter uncertainty on the uncertainty of model outputs
(95, 117). However, care should be taken to avoid violation of mass
balance when parameters are varied by sensitivity or Monte Carlo
algorithms (95, 114), particularly where blood flows are affected.
480 J.L. Campbell Jr. et al.

3.9.4. Sensitivity Analysis
To the extent that a particular PBPK model correctly reflects the
physiological and biochemical processes underlying the pharmaco-
kinetics of a chemical, exercising the model can provide a means for
identifying the most important physiological and biochemical para-
meters determining the pharmacokinetic behavior of the chemical
under different conditions (95). The technique for obtaining this
information is known as sensitivity analysis and can be performed by
two different methods. Analytical sensitivity coefficients are defined
as the ratio of the change in a model output to the change in a model
parameter that produced it. To obtain a sensitivity coefficient by this
method, the model is run for the exposure scenario of interest using
the preferred values of the input parameters, and the resulting
output (e.g., hair concentration) is recorded. The model is then
run again with the value of one of the input parameters varied
slightly. Typically, a 1% change is appropriate. The ratio of the
resulting incremental change in the output to the change in the
input represents the sensitivity coefficient. For example, if a 1%
increase in an input parameter resulted in a 0.5% decrease in the
output, the sensitivity coefficient would be -0.5. Sensitivity
coefficients >1.0 in absolute value represent amplification of input error
and would be a cause for concern. An alternative approach is to
conduct a Monte Carlo analysis, as described below, and then to
perform a simple correlation analysis of the model outputs and input
parameters. Both methods have specific advantages. The analytical
sensitivity coefficient most accurately represents the functional rela-
tionship of the output to the specific input under the conditions
being modeled. The advantage of the correlation coefficients is that
they also reflect the impact of interactions between the parameters
during the Monte Carlo analysis.
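The perturbation calculation can be sketched in Python (illustrative only: "model_output" is a hypothetical closed-form function standing in for a full PBPK simulation run at the preferred parameter values, and the parameter names and values are invented):

```python
# Normalized sensitivity coefficient obtained by perturbing one input by 1%
# and recording the resulting fractional change in the output.

def model_output(vmax, km, dose):
    # Toy saturable-clearance output, a stand-in for a real model run.
    return vmax * dose / (km + dose) / (vmax + 1.0)

params = {"vmax": 8.4, "km": 0.36, "dose": 1.0}

def sensitivity(name, delta=0.01):
    base = model_output(**params)
    perturbed = dict(params)
    perturbed[name] *= 1.0 + delta               # vary one input by 1%
    changed = model_output(**perturbed)
    # (fractional change in output) / (fractional change in input)
    return ((changed - base) / base) / delta

for name in params:
    print(name, round(sensitivity(name), 3))
```

An input whose increase lowers the output yields a negative coefficient, as in the -0.5 example above.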

3.9.5. Uncertainty Analysis
There are a number of examples in the literature of evaluations of
the uncertainty associated with the predictions of a PBPK model
using the Monte Carlo simulation approach (10, 117, 118). In a
Monte Carlo simulation, a probability distribution for each of the
PBPK model parameters is randomly sampled, and the model is run
using the chosen set of parameter values. This process is repeated a
large number of times until the probability distribution for the
desired model output has been created. Generally speaking, 1,000
iterations or more may be required to ensure the reproducibility of
the mean and standard deviation of the output distributions as well
as the 1st through 99th percentiles. To the extent that the input
parameter distributions adequately characterize the uncertainty in
the inputs, and assuming that the parameters are reasonably inde-
pendent, the resulting output distribution will provide a useful
estimate of the uncertainty associated with the model outputs.
In performing a Monte Carlo analysis it is important to distinguish
uncertainty from variability. As it relates to the impact of
pharmacokinetics in risk assessment, uncertainty can be defined as
the possible error in estimating the “true” value of a parameter for a
representative (“average”) person. Variability, on the other hand,
should only be considered to represent true interindividual differ-
ences. Understood in these terms, uncertainty is a defect (lack of
certainty) which can typically be reduced by experimentation, and
variability is a fact of life which must be considered regardless of the
risk assessment methodology used. An elegant approach for sepa-
rately documenting the impact of uncertainty and variability is
“two-dimensional” Monte Carlo, in which distributions for both
uncertainty and variability are developed and multiple Monte Carlo
runs are used to convolute the two aspects of overall uncertainty.
Unfortunately, in practice it is often difficult to differentiate the
contribution of variability and uncertainty to the observed variation
in the reported measurements of a particular parameter (118).
Because of the physiological structure of a PBPK model, many of
its parameters are interdependent. For example, the blood flows
must add up to the total cardiac output and the tissue volumes
(including those not included in the model) must add up to
the body weight. Failure to account for the impact of Monte
Carlo sampling on these mass balances can produce erroneous
results (95, 117). In addition, some physiological parameters are
naturally correlated, such as cardiac output and respiratory ventila-
tion rate, and these correlations should be taken into account
during the Monte Carlo analysis (118).
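The sampling loop, including the renormalization needed to preserve mass balance, can be sketched in Python (everything here is illustrative: the distributions, the three lumped flow fractions, and the "output" function are invented stand-ins for a real PBPK model and its parameter database):

```python
# Monte Carlo uncertainty analysis that preserves mass balance: fractional
# flows are sampled independently, then renormalized so they always sum to
# the sampled cardiac output.
import math
import random

random.seed(1)

def sample_flows():
    """Sample cardiac output and organ blood flows, renormalized so the
    flows always sum to the sampled cardiac output (mass balance)."""
    qc = random.lognormvariate(math.log(14.0), 0.2)    # cardiac output (L/h)
    fracs = [random.lognormvariate(math.log(f), 0.3)   # liver, fat, remainder
             for f in (0.25, 0.09, 0.66)]
    total = sum(fracs)
    return qc, [qc * f / total for f in fracs]

def toy_output(qc, flows):
    # Stand-in for a full simulation run with the sampled parameter set.
    return 1.0 / (flows[0] + 0.1 * flows[1] + 0.01 * flows[2])

outputs = sorted(toy_output(*sample_flows()) for _ in range(1000))
p05, p50, p95 = outputs[50], outputs[500], outputs[950]
print(round(p05, 3), round(p50, 3), round(p95, 3))
```

The percentiles of the output distribution then summarize the impact of input uncertainty; correlated inputs (e.g., cardiac output and ventilation) would additionally need to be sampled jointly.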

3.9.6. Collection of Critical Data
As with model development, the best approach to model evaluation
is within the context of the scientific method. The most effective
way to evaluate a PBPK model is to exercise the model to generate a
quantitative hypothesis; that is, to predict the behavior of the
system of interest under conditions “outside the envelope” of the
data used to develop the model (at shorter/longer durations,
higher/lower concentrations, different routes, different species,
etc.). In particular, if there is an element of the model which
remains in question, the model can be exercised to determine the
experimental design under which the specific model element can
best be tested. For example, if there is uncertainty regarding
whether uptake into a particular tissue is flow or diffusion limited,
alternative forms of the model can be used to compare predicted
tissue concentration time courses under each of the limiting
assumptions under various experimental conditions. The experi-
mental design and sampling time which maximizes the difference
between the predicted tissue concentrations under the two assump-
tions can then serve as the basis for the actual experimental data
collection. Once the critical data has been collected, the same
model can also be used to support a more quantitative experimental
inference. In the case of the tissue uptake question just described,
not only can the a priori model predictions be compared with the
observed data to test the alternative hypotheses, but the model can
also be used a posteriori to estimate the quantitative extent of any
observed diffusion limitation (i.e., to estimate the relevant model
parameter by fitting the data). If, on the other hand, the model is
unable to reproduce the experimental data under either assump-
tion, it may be necessary to reevaluate other aspects of the model
structure. The key difference between research and analysis is the
iterative nature of the former. It has wisely been said, “If we knew
when we started what we had to do to finish, they’d call it search,
not research.”

4. Example

The previous sections have focused on the process of designing the
PBPK model structure needed for a particular application. At this
point the model consists of a number of mathematical equations:
differential equations describing the mass balance for each of the
compartments and algebraic equations describing other relation-
ships between model variables. The next step in model development
is the coding of the mathematical form of the model into a form
which can be executed on a computer. The discussion in this section
will be couched in terms of a particular software package, acslX.

4.1. Mathematical Formulation
Mathematically, a PBPK model is represented by a system of
simultaneous differential equations. The model compartments are
represented by the differential equations that describe the mass
balance for each one of the “state variables” in the model. There
may also be additional differential equations to calculate other
necessary model outputs, such as the area under the concentration curve (AUC) in a
particular compartment, which is simply the integral of the concen-
tration over time. The resulting system of equations is referred to as
simultaneous because the time courses of the chemical in the vari-
ous compartments are so interdependent that solving the equations
for any one of the compartments requires information on the
current status of all the other compartments; that is, the equations
for all of the compartments must be solved at the same time. The
equations are first-order in the sense that they contain only first
derivatives with respect to time; this describes their mathematical
form, not the pharmacokinetics (saturable metabolism, for example,
makes the equations nonlinear). This kind of mathematical problem,
in which a system is defined by the conditions at time zero together
with differential equations describing how it evolves over time, is
known as an initial value problem, and step-wise numerical
integration methods are used to obtain the simultaneous solution.
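To make the formulation concrete, here is a minimal sketch in Python (not part of the chapter's ACSL code; all parameter values are invented for illustration) of the mass balance for a single flow-limited tissue compartment, with the AUC carried as an extra state variable whose rate equation is simply the concentration:

```python
# Mass balance for one flow-limited tissue compartment, plus an AUC state.
# Q, V, P, and CA are illustrative values, not recommended parameters.

Q, V, P = 5.0, 2.0, 3.0   # blood flow (L/h), tissue volume (L), tissue/blood partition
CA = 1.0                  # arterial concentration (mg/L), held constant for the sketch

def rates(A):
    C = A / V             # tissue concentration (mg/L)
    CV = C / P            # venous concentration leaving the tissue (mg/L)
    dA = Q * (CA - CV)    # rate of change of amount: arterial in minus venous out
    dAUC = C              # the AUC state simply integrates the concentration
    return dA, dAUC

# Step-wise solution of the initial value problem (explicit Euler for clarity).
A, auc, t, dt = 0.0, 0.0, 0.0, 0.001
while t < 10.0:
    dA, dAUC = rates(A)
    A += dt * dA
    auc += dt * dAUC
    t += dt

print(round(A / V, 3), round(auc, 2))
```

At long times the tissue concentration approaches CA times the partition coefficient, and the AUC state accumulates the integral of concentration over the simulated interval.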
A number of numerical algorithms are available for solving such
problems. They all have in common that they are step-wise approx-
imations; that is, they begin with the conditions at time zero and
use the differential equations to predict how the system will change
over a small time step, resulting in an estimate of the conditions at a
slightly later time, which serves as the starting point for the next
time step. This iterative process is repeated as long as necessary to
simulate the experimental scenario.
The more sophisticated methods, such as the Gear algorithm
(named after the mathematician C. W. Gear, who developed it)
use a predictor–corrector approach, in which the corrector step
essentially amounts to “predicting backwards” after each step for-
ward, in order to check how closely the algorithm is able to repro-
duce the conditions at the previous time step. This allows the time
step to be increased automatically when the algorithm is
performing well, and to be shortened when it is having difficulty,
such as when conditions are changing rapidly. However, due to the
wide variation of the time constants (response times) for the various
physiological compartments (e.g., fat vs. richly perfused), PBPK
models often represent “stiff” systems. Stiff systems are character-
ized by state variables (compartments) with widely different time
constants, which cause difficulty for predictor–corrector algo-
rithms. The Gear algorithm was specifically designed to overcome
this difficulty. It is therefore generally recommended that the Gear
algorithm be used for executing PBPK models. An implementation
of the Gear algorithm is available in most of the advanced software
packages.
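The practical effect of stiffness can be shown with a deliberately simple Python sketch (illustrative only): two washout equations dC/dt = -k*C with widely separated time constants. A forward (explicit Euler) step sized for the slow compartment is unstable for the fast one, while a backward (implicit) step of the kind taken inside Gear-type stiff algorithms remains stable at any step size:

```python
# Two independent washout equations, dC/dt = -k*C, with time constants of
# 0.01 h (richly perfused-like) and 10 h (fat-like). The step size is chosen
# for the SLOW dynamics, which breaks the explicit method on the fast one.

k_fast, k_slow = 100.0, 0.1        # rate constants (1/h)
dt, C0, t_end = 0.05, 1.0, 1.0     # step suited only to the slow compartment

def explicit_euler(k):
    c = C0
    for _ in range(int(t_end / dt)):
        c = c + dt * (-k * c)      # forward step: unstable when k*dt > 2
    return c

def implicit_euler(k):
    c = C0
    for _ in range(int(t_end / dt)):
        c = c / (1.0 + k * dt)     # backward step: stable for any k*dt > 0
    return c

print(explicit_euler(k_fast))      # grows without bound at this step size
print(implicit_euler(k_fast))      # decays toward zero, as it should
```

This is why a variable-step stiff algorithm can take the large steps dictated by the slow compartments without being destabilized by the fast ones.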
Regardless of the specific algorithm selected, the essential
nature of the solution, as stated above, will be a step-wise approxi-
mation. However, all of the algorithms made available in computer
software are convergent; that is, they can stay arbitrarily close to the
true solution, given a small enough time step. On modern personal
computers, even large PBPK models can be run to more than
adequate accuracy in a reasonable timeframe.

4.2. Model Coding in ACSL
The following sections contain typical elements of the ACSL code
for a PBPK model, interspersed with comments, which will be
written in italics to differentiate them from the actual model
code. The first section describes the model definition file, which
by convention in ACSL is given a filename with the extension CSL.
The model used as an example in the following sections is a
simple, multiroute model for volatile chemicals, similar to the
styrene model discussed earlier, except that is also has the ability
to simulate closed-chamber gas uptake experiments.

4.3. Typical Elements in a Model File
An acslX source file follows the structure defined in the Standard for
Continuous Simulation Languages (just like there is a standard for
C++). Thus, for example, there will generally be an INITIAL block
defining the initial conditions followed by a DYNAMIC block which
contains DISCRETE and/or DERIVATIVE sub-blocks that define
the model. In addition, conventions which have been generally
adopted by the PBPK modeling community (most of which started
with John Ramsey at Dow Chemical during the development of the
styrene model) help to improve the readability of the code. The follow-
ing file shows typical elements of “Ramseyan code.”
The first line in the code must start with the word PROGRAM
(a remnant of ACSL’s derivation from FORTRAN).
PROGRAM
Lines starting with an exclamation point (and portions of lines to the
right of one) are ignored by the ACSL translator and can be used for
comments:
! Developed for ACSL Level 10
! by Harvey Clewell (KS Crump Group, ICF Kaiser Int’l., Ruston, LA)
! and Mel Andersen (Health Effects Research Laboratory, USEPA,
RTP, NC)
The first section of an ACSL source file is the INITIAL block,
which is used to define parameters and perform calculations that do
not need to be repeated during the course of the simulation:
INITIAL ! Beginning of preexecution section
Only parameters defined in a CONSTANT statement can be changed
during a session using the SET command:
LOGICAL CC ! Flag set to .TRUE. for closed chamber runs
! Physiological parameters (rat)
CONSTANT QPC = 14. ! Alveolar ventilation rate (L/hr)
CONSTANT QCC = 14. ! Cardiac output (L/hr)
CONSTANT QLC = 0.25 ! Fractional blood flow to liver
CONSTANT QFC = 0.09 ! Fractional blood flow to fat
CONSTANT BW = 0.22 ! Body weight (kg)
CONSTANT VLC = 0.04 ! Fraction liver tissue
CONSTANT VFC = 0.07 ! Fraction fat tissue
!—————Chemical specific parameters (styrene)
CONSTANT PL = 3.46 ! Liver/blood partition coefficient
CONSTANT PF = 86.5 ! Fat/blood partition coefficient
CONSTANT PS = 1.16 ! Slowly perfused tissue/blood partition
CONSTANT PR = 3.46 ! Richly perfused tissue/blood partition
CONSTANT PB = 40.2 ! Blood/air partition coefficient
CONSTANT MW = 104. ! Molecular weight (g/mol)
CONSTANT VMAXC = 8.4 ! Maximum velocity of metabolism (mg/hr-1kg)
CONSTANT KM = 0.36 ! Michaelis–Menten constant (mg/L)
CONSTANT KFC = 0. ! First order metabolism (/hr-1kg)
CONSTANT KA = 0. ! Oral uptake rate (/hr)
!—————Experimental parameters
CONSTANT PDOSE = 0. ! Oral dose (mg/kg)
CONSTANT IVDOSE = 0. ! IV dose (mg/kg)
CONSTANT CONC = 1000. ! Inhaled concentration (ppm)
CONSTANT CC = .FALSE. ! Default to open chamber
CONSTANT NRATS = 3. ! Number of rats (for closed chamber)
CONSTANT KLC = 0. ! First order loss from closed chamber (/hr)
CONSTANT VCHC = 9.1 ! Volume of closed chamber (L)
CONSTANT TINF = .01 ! Length of IV infusion (hr)
It is an understandable requirement in ACSL to define when to
stop and how often to report. The parameter for the reporting fre-
quency (“communication interval”) is assumed by the ACSL transla-
tor to be called CINT unless you tell it otherwise using the
CINTERVAL statement. The parameter for when to stop can be
called anything you want, as long as you use the same name in the
TERMT statement (see below), but the Ramseyan convention is
TSTOP:
CONSTANT TSTOP = 24. ! Length of experiment (hr)
The following parameter name is generally used to define the length of
inhalation exposures (the name LENGTH is also used by some):
CONSTANT TCHNG = 6. ! Length of inhalation exposure (hr)
The INITIAL block is a useful place to perform logical switching for
different model applications, in this case between the simulation of
closed-chamber gas uptake experiments and normal inhalation stud-
ies. It is also sometimes necessary to calculate initial conditions for one
of the integrals (“state variables”) in the model (the initial amount in
the closed chamber in this case):
IF (CC) RATS = NRATS ! Closed chamber simulation
IF (CC) KL = KLC
IF (.NOT.CC) RATS = 0. ! Open chamber simulation
IF (.NOT.CC) KL = 0.
! (Turn off chamber losses so concentration in chamber remains constant)
IF (PDOSE.EQ.0.0) KA = 0. ! If not oral dosing, turn off oral uptake
VCH = VCHC-RATS*BW ! Net chamber air volume (L)
AI0 = CONC*VCH*MW/24450. ! Initial amount in chamber (mg)
After all the constants have been defined, calculations using them
can be performed. In contrast to the DERIVATIVE block (of which
more later), the calculations in the INITIAL block are performed in
the order written, just like in FORTRAN, so a variable must be
defined before it can be used.
Note how allometric scaling is used for flows (QC, QP) and
metabolism (VMAX, KFC). Also note how the mass balance for the
blood flows and tissue volumes is maintained by the model code. Run-
time changes in the parameters for fat and liver are automatically
balanced by changes in the slowly and richly perfused compartments,
respectively. The fractional blood flows add to one, but the fractional
tissue volumes add up to only 0.91, allowing 9% of the body weight to
reflect nonperfused tissues:
!————Scaled parameters
QC = QCC*BW**0.74 ! Cardiac output
QP = QPC*BW**0.74 ! Alveolar ventilation
QL = QLC*QC ! Liver blood flow
QF = QFC*QC ! Fat blood flow
QS = 0.24*QC-QF ! Slowly perfused tissue blood flow
QR = 0.76*QC-QL ! Richly perfused tissue blood flow
VL = VLC*BW ! Liver volume
VF = VFC*BW ! Fat tissue volume
VS = 0.82*BW-VF ! Slowly perfused tissue volume
VR = 0.09*BW-VL ! Richly perfused tissue volume
VMAX = VMAXC*BW**0.7 ! Maximum rate of metabolism
KF = KFC/BW**0.3 ! First-order metabolic rate constant
DOSE = PDOSE*BW ! Oral dose
IVR = IVDOSE*BW/TINF ! Intravenous infusion rate
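The same balancing logic can be mirrored in Python as a quick check (an illustrative sketch, not part of the ACSL listing; the default argument values are the rat parameters above):

```python
# Mirror of the scaled-parameter section above: run-time changes in the liver
# and fat parameters are absorbed by the slowly and richly perfused
# compartments, so flows always sum to QC and perfused volumes to 0.91*BW.

def scaled_params(bw=0.22, qcc=14.0, qpc=14.0, qlc=0.25, qfc=0.09,
                  vlc=0.04, vfc=0.07):
    qc = qcc * bw ** 0.74                  # cardiac output (allometric scaling)
    qp = qpc * bw ** 0.74                  # alveolar ventilation
    ql, qf = qlc * qc, qfc * qc
    qs = 0.24 * qc - qf                    # slowly perfused takes the remainder
    qr = 0.76 * qc - ql                    # richly perfused takes the remainder
    vl, vf = vlc * bw, vfc * bw
    vs = 0.82 * bw - vf
    vr = 0.09 * bw - vl
    return {"QC": qc, "QP": qp, "QL": ql, "QF": qf, "QS": qs, "QR": qr,
            "VL": vl, "VF": vf, "VS": vs, "VR": vr, "BW": bw}

p = scaled_params()
flows = p["QL"] + p["QF"] + p["QS"] + p["QR"]
volumes = p["VL"] + p["VF"] + p["VS"] + p["VR"]
print(round(flows / p["QC"], 6), round(volumes / p["BW"], 6))
```

The flow ratio is exactly 1 and the perfused volume fraction is 0.91 no matter how the liver and fat parameters are changed at run time.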
An END statement is required to delineate the end of the initial
block:
END ! End of initial section
The next (and often last) section of an ACSL source file is the
DYNAMIC block, which contains all of the code defining what is to
happen during the course of the simulation:
DYNAMIC ! Beginning of execution section
ACSL possesses a number of different algorithms for performing
the simulation, which mathematically speaking consists of solving an
initial value problem for a system of simultaneous differential
equations. (Although it is easier to just refer to it as integrating.)
Available methods include Euler, Runge–Kutta, and Adams–Moulton,
but the tried and true choice of most PBPK modelers is the
Gear predictor–corrector, variable step-size algorithm for stiff systems,
which PBPK models often are. (Stiff, that is):
ALGORITHM IALG = 2 ! Use Gear integration algorithm
NSTEPS NSTP = 10 ! Number of integration steps in communication interval
MAXTERVAL MAXT = 1.0e9 ! Maximum integration step size
MINTERVAL MINT = 1.0e-9 ! Minimum integration step size
CINTERVAL CINT = 0.01 ! Communication interval
One of the structures which can be used in the DYNAMIC block is
called a DISCRETE block. The purpose of a DISCRETE block is to
define an event which is desired to occur at a specific time or under
specific conditions. The integration algorithm then keeps a lookout for
the conditions and executes the code in the DISCRETE block at the
proper moment during the execution of the model. An example of a
pair of discrete blocks which are used to control repeated dosing in
another PBPK model are shown here as an example:
DISCRETE DOSE1 ! Schedule events to turn exposure on and off daily
INTERVAL DOSINT = 24. ! Dosing interval
!(Set interval larger than TSTOP to prevent multiple exposure)
IF (T.GT.TMAX) GOTO NODOSE
IF (DAY.GT.DAYS) GOTO NODOSE
CONC = CI ! Start inhalation exposure
TOTAL = TOTAL + DOSE ! Administer oral dose
TDOSE = T ! Record time of dosing
SCHEDULE DOSE2 .AT. T + TCHNG ! Schedule end of exposure
NODOSE..CONTINUE
DAY = DAY + 1.
IF (DAY.GT.7.) DAY = 0.5
END ! of DOSE1
DISCRETE DOSE2
CONC = 0. ! End inhalation exposure
END ! of DOSE2
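The same on/off scheduling idea can be mimicked in plain Python (an illustrative sketch, not ACSL; the one-compartment uptake/clearance "model" and every rate constant here are invented for the example):

```python
# Daily 6-h exposures over 72 h: each "on" event has a matching "off" event
# scheduled TCHNG hours later, mimicking SCHEDULE DOSE2 .AT. T + TCHNG.

TCHNG = 6.0                                   # exposure length (h)
on_times = (0.0, 24.0, 48.0)                  # daily dosing times (h)
events = sorted([(t, "on") for t in on_times] +
                [(t + TCHNG, "off") for t in on_times])

conc_in, c, t, dt = 0.0, 0.0, 0.0, 0.01       # exposure level, body level, time, step
i = 0
peaks = []                                    # body level at each end of exposure
while t < 72.0:
    while i < len(events) and events[i][0] <= t:
        conc_in = 1.0 if events[i][1] == "on" else 0.0
        if conc_in == 0.0:
            peaks.append(c)
        i += 1
    c += dt * (0.5 * conc_in - 0.2 * c)       # toy uptake minus first-order clearance
    t += dt

print([round(pk, 3) for pk in peaks])
```

Because clearance between exposures is incomplete, the end-of-exposure levels creep upward from day to day.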
Within the DYNAMIC block, a group of statements defining a
system of simultaneous differential equations is put in a DERIVA-
TIVE block. If there is only one it does not have to be given a name:
DERIVATIVE ! Beginning of derivative definition block
The main function of the derivative block is to define the “state
variables” which are to be integrated. They are identified by the
INTEG function. For example, in the code below, AI is defined to be
a state variable which is calculated by integrating the equation
defining the variable RAI, using an initial value of AI0. For most
of the compartments, the initial value is zero.
!———————CI = Concentration in inhaled air (mg/L)
RAI = RATS*QP*(CA/PB-CI) - (KL*AI) ! Rate equation
AI = INTEG(RAI,AI0) ! Integral of RAI
CI = AI/VCH*CIZONE ! Concentration in air
CIZONE = RSW((T.LT.TCHNG).OR.CC,1.,0.)
CP = CI*24450./MW ! Chamber air concentration in ppm
Any experienced programmer would shudder at the code shown
above, because several variables appear to be used before they have been
calculated (for example, CIZONE is used to calculate CI and CI is
used to calculate RAI). However, within the derivative block, writing
code is almost too easy because the translator will automatically sort
the statements into the proper order for execution. That is, there is no
need to be sure that a variable is calculated before it is used.
The down side of the sorting is that you cannot be sure that two
statements will be calculated in the order you want just because you
place them one after the other. Also, because of the sorting (as well as
the way the predictor–corrector integration algorithm hops forward
and backward in time), IF statements will not work right. The RSW
function above works like an IF statement, setting CIZONE to 1.
whenever T (the default name for the time variable in ACSL) is Less
Than TCHNG, and setting CIZONE to 0. (and thus turning off the
exposure) whenever T is greater than or equal to TCHNG.
The following blocks of statements each define one of the compart-
ments in the model. These statements can be compared with the
mathematical equations described in the previous sections of the man-
ual. One of the advantages of models written in ACSL following the
Ramseyan convention is that they are easier to comprehend and
reasonably self-documenting.
!——MR = Amount remaining in stomach (mg)
RMR = -KA*MR
MR = DOSE*EXP(-KA*T)
Note that the stomach could have been defined as one of the state
variables:
MR = INTEG(RMR,DOSE)
But instead the exact solution for the simple integral has been used
directly.
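As a quick check (in Python, with invented values for KA and DOSE), the closed-form expression agrees with step-wise integration of the rate equation RMR = -KA*MR from the initial condition MR(0) = DOSE:

```python
# Verify that MR = DOSE*exp(-KA*T) solves dMR/dt = -KA*MR numerically.
import math

KA, DOSE = 1.4, 10.0            # illustrative values only
mr, t, dt = DOSE, 0.0, 1e-4
while t < 2.0:
    mr += dt * (-KA * mr)       # step-wise integration of RMR
    t += dt

exact = DOSE * math.exp(-KA * t)     # the closed-form solution used in the listing
print(round(mr, 4), round(exact, 4))
```

Using the exact solution avoids carrying an extra state variable through the integrator.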
Similarly, instead of defining the blood as a state variable, the
steady-state approximation is used:
!——CA = Concentration in arterial blood (mg/L)
CA = (QC*CV + QP*CI)/(QC + (QP/PB))
AUCB = INTEG(CA,0.)
!——AX = Amount exhaled (mg)
CX = CA/PB ! End-alveolar air concentration (mg/L)
CXPPM = (0.7*CX+0.3*CI)*24450./MW ! Average exhaled air concentration (ppm)
RAX = QP*CX
AX = INTEG(RAX,0.)
!——AS = Amount in slowly perfused tissues (mg)
RAS = QS*(CA-CVS)
AS = INTEG(RAS,0.)
CVS = AS/(VS*PS)
CS = AS/VS
!——AR = Amount in rapidly perfused tissues (mg)
RAR = QR*(CA-CVR)
AR = INTEG(RAR,0.)
CVR = AR/(VR*PR)
CR = AR/VR
!——AF = Amount in fat tissue (mg)
RAF = QF*(CA-CVF)
AF = INTEG(RAF,0.)
CVF = AF/(VF*PF)
CF = AF/VF
!——AL = Amount in liver tissue (mg)
RAL = QL*(CA-CVL)-RAM + RAO
AL = INTEG(RAL,0.)
CVL = AL/(VL*PL)
CL = AL/VL
AUCL = INTEG(CL,0.)
!——AM = Amount metabolized (mg)
RAM = (VMAX*CVL)/(KM + CVL) + KF*CVL*VL
AM = INTEG(RAM,0.)
!——AO = Total mass input from stomach (mg)
RAO = KA*MR
AO = DOSE-MR
!——IV = Intravenous infusion rate (mg/h)
IVZONE = RSW(T.GE.TINF,0.,1.)
IV = IVR*IVZONE
!——CV = Mixed venous blood concentration (mg/L)
CV = (QF*CVF + QL*CVL + QS*CVS + QR*CVR + IV)/QC
!——TMASS = mass balance (mg)
TMASS = AF + AL + AS + AR + AM + AX + MR
!——DOSEX = Net amount absorbed (mg)
DOSEX = AI + AO + IVR*TINF-AX
Last, but definitely not least, you have to tell ACSL when to stop:
TERMT(T.GE.TSTOP) ! Condition for terminating simulation
END ! End of derivative block
END ! End of dynamic section
Another kind of code section, the TERMINAL block, can also be
used here to execute statements that should only be calculated at the
end of the run.
END ! End of program
4.4. Model Evaluation
The following section discusses various issues associated with the
evaluation of a PBPK model. Once an initial model has been
developed, it must be evaluated on the basis of its conformance
with experimental data. In some cases, the model may be exercised
to predict conditions under which experimental data should be
collected in order to verify or improve model performance. Com-
parison of the resulting data with the model predictions may sug-
gest that revision of the model will be required. Similarly, a PBPK
model designed for one chemical or application may be adapted to
another chemical or application, requiring modification of the
model structure and parameters. It is imperative that revision or
modification of a model is conducted with the same level of rigor
applied during initial model development, and that structures are
not added to the model with no other justification than that they
improve the agreement of the model with a particular data set.
In addition to comparing model predictions to experimental
data, model evaluation includes assessing the plausibility of the
model input parameters, and the confidence which can be placed
in extrapolations performed by the model. This aspect of model
evaluation is particularly important in the case of applications in risk
assessment, where it is necessary to assess the uncertainty associated
with risk estimates calculated with the model.

4.5. Model Revision
An attempt to model the metabolism of allyl chloride (119) serves
as an excellent example of the process of model refinement and
validation. As mentioned earlier, in a gas uptake experiment several
animals are maintained in a small, enclosed chamber while the air in
the chamber is recirculated, with replenishment of oxygen and
scrubbing of carbon dioxide. A small amount of a volatile chemical
is then allowed to vaporize into the chamber, and the concentration
of the chemical in the chamber air is monitored over time. In this
design, any loss of the chemical from the chamber air reflects uptake
into the animals. After a short period of time during which the
chemical achieves equilibration with the animals’ tissues, any fur-
ther uptake represents the replacement of chemical removed from
the animals by metabolism. Analysis of gas uptake data with a PBPK
model has been used successfully to determine the metabolic para-
meters for a number of chemicals (103).
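A minimal Python sketch of this mass balance (illustrative only: a lumped one-compartment "animal" with invented parameter values, not the full PBPK model) shows the saturation behavior that makes gas uptake data informative about the metabolic constants:

```python
# Chamber air and a lumped "animal" compartment exchange chemical by
# ventilation; the animal clears it by saturable metabolism. All parameter
# values are invented for illustration, not taken from any real chemical.

VCH, VB, PB = 9.0, 0.5, 2.0        # chamber air (L), animal volume (L), blood/air partition
QP = 4.0                           # alveolar ventilation (L/h)
VMAX, KM = 1.0, 2.0                # metabolic capacity (mg/h) and affinity (mg/L)

def chamber_conc_after(c0, hours=6.0, dt=1e-3):
    ach, ab = c0 * VCH, 0.0        # mg in chamber air, mg in animal
    t = 0.0
    while t < hours:
        cch, cb = ach / VCH, ab / VB
        uptake = QP * (cch - cb / PB)          # air <-> animal exchange
        met = VMAX * cb / (KM + cb)            # saturable (Michaelis-Menten) loss
        ach += dt * (-uptake)
        ab += dt * (uptake - met)
        t += dt
    return ach / VCH

lo, hi = chamber_conc_after(0.5), chamber_conc_after(5.0)
print(round(lo, 3), round(hi, 3))
```

After the initial equilibration, the fractional rate of loss is faster at the low starting concentration, where metabolism is nearly first-order, than at the high one, where it is saturated; this concentration dependence across the family of curves is what allows Vmax and Km to be estimated.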
In an example of a successful gas uptake analysis, the study in (108)
described the closed chamber kinetics of methylene chloride using
a PBPK model which included two metabolic pathways: one satu-
rable, representing oxidation by Cytochrome P450 enzymes, and
one linear, representing conjugation with glutathione (Fig. 22).
As can be seen in this figure, there is a marked concentration
dependence of the observed rate of loss of this chemical from the
chamber. The initial decrease in chamber concentration in all of the
experiments results from the uptake of chemical into the animal
Fig. 22. Gas uptake experiment. Concentration (ppm) of methylene chloride in a closed,
recirculated chamber containing three Fischer 344 rats. Initial chamber concentrations
were (top to bottom) 3,000, 1,000, 500, and 100 ppm. Solid lines show the predictions of
the model for a Vmax of 4.0 mg/h/kg, a Km of 0.3 mg/L, and a first-order rate constant of
2.0/h/kg, while symbols represent the measured chamber atmosphere concentrations.

tissues. Subsequent uptake is a function of the metabolic clearance
in the animals, and the complex behavior reflects the transition
from partially saturated metabolism at higher concentrations to
linearity in the low concentration regime. The PBPK model is
able to reproduce this complex behavior with a single set of para-
meters because the model structure appropriately captures the
concentration dependence of the rate of metabolism.
A similar analysis of gas uptake experiments with allyl chloride
using the same model structure was less successful. The smooth
curves shown in Fig. 23 are the best fit that could be obtained to
the observed allyl chloride chamber concentration data assuming a
saturable pathway and a first-order pathway with parameters that
were independent of concentration. Using this model structure
there were large systematic errors associated with the predicted
curves. The model predictions for the highest initial concentration
were uniformly lower than the data, while the predictions for the
intermediate initial concentrations were uniformly higher than the
data. A much better fit could be obtained by setting the first-order
rate constant to a lower value at the higher concentration; this
approach would provide a better correspondence between the
data and the model predictions, but would not provide a basis for
extrapolating to different exposure conditions.
The nature of the discrepancy between the PBPK model and
the data for allyl chloride suggested the presence of a dose-
dependent limitation on metabolism not included in the model
structure. This indication was consistent with other experimental
evidence that the conjugative metabolism of allyl chloride depletes
glutathione, a necessary cofactor for the linear
492 J.L. Campbell Jr. et al.
Fig. 23. Model failure. Concentration (ppm) of allyl chloride in a closed, recirculated
chamber containing three Fischer 344 rats. Initial chamber concentrations were (top to
bottom) 5,000, 2,000, 1,000, and 500 ppm. Symbols represent the measured chamber
atmosphere concentrations. The curves represent the best result that could be obtained
from an attempt to fit all of the data with a single set of metabolic constants using the
same closed chamber model structure as in Fig. 22.

conjugation pathway. The conjugation pathway for reaction of
methylene chloride and glutathione regenerates glutathione, but
in the case of allyl chloride glutathione is consumed by the conju-
gation reaction. Therefore, to adequately reflect the biological basis
of the kinetic behavior, it was necessary to model the time depen-
dence of hepatic glutathione. To accomplish this, the mathematical
model of the closed chamber experiment was expanded to include a
more complete description of the glutathione-dependent pathway.
The expanded model structure used for this description (120)
included a zero-order production of glutathione and a first-order
consumption rate that was increased by reaction of the glutathione
with allyl chloride; glutathione resynthesis was inversely related to
the instantaneous glutathione concentration. This description
provided a much improved correspondence between the data and
predicted behavior (Fig. 24). Of course, the improvement in fit was
obtained at the expense of adding several new glutathione-related
parameters to the model. To ensure that the improved fit was not just
a consequence of the additional parameters providing more free-
dom to the model for fitting the uptake data, a separate test of the
hypothesis underlying the added model structure (depletion of
glutathione) was necessary. Therefore the expanded model was
also used to predict both allyl chloride and hepatic glutathione
concentrations following constant concentration inhalation expo-
sures. Model predictions for end-exposure hepatic glutathione con-
centrations compared very favorably with actual data obtained in
separate experiments (Table 5).
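The expanded glutathione description can be sketched as one extra differential equation. The code below is an illustrative reading of the structure described above: a synthesis term that is scaled up as glutathione falls below baseline (resynthesis inversely related to the instantaneous concentration), a basal first-order consumption, and an additional loss term proportional to both glutathione and the chamber concentration of allyl chloride. All rate constants are assumptions chosen for illustration, not the parameter values of the published model (120).

```python
# Illustrative glutathione (GSH) sub-model for the expanded description.
# All constants are assumptions for illustration only.
GSH0 = 7000.0   # baseline hepatic GSH (same units as Table 5)
K_SYN = 700.0   # basal synthesis rate, conc/h (chosen so baseline is steady)
K_LOSS = 0.1    # basal first-order consumption, /h
K_CONJ = 0.001  # conjugative consumption per ppm of chemical, /h per ppm

def gsh_course(ppm, t_exposure=6.0, t_recovery=6.0, dt=0.001):
    """Return (GSH at end of exposure, GSH after the recovery period)."""
    gsh, end_exposure = GSH0, GSH0
    n_exp = int(t_exposure / dt)
    for step in range(n_exp + int(t_recovery / dt)):
        c = ppm if step < n_exp else 0.0      # constant-concentration exposure
        synthesis = K_SYN * (GSH0 / gsh)      # resynthesis rises as GSH falls
        loss = (K_LOSS + K_CONJ * c) * gsh    # basal + conjugative consumption
        gsh += (synthesis - loss) * dt
        if step == n_exp - 1:
            end_exposure = gsh
    return end_exposure, gsh

end_exp, end_rec = gsh_course(1000.0)  # constant 1,000 ppm exposure
print(end_exp, end_rec)
```

With these assumptions a 1,000 ppm exposure depletes glutathione to a much lower quasi-steady level, which then recovers after the exposure ends, qualitatively mirroring Table 5 and Fig. 24.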
Table 5
Predicted glutathione depletion caused by inhalation exposure to allyl chloride

                          Depletion (mM)
Concentration (ppm)       Observed          Predicted
0                         7,080 ± 120       7,088 (a)
10                        7,290 ± 130       6,998
0                         7,230 ± 80        7,238 (a)
100                       5,660 ± 90        5,939
0                         7,340 ± 180       7,341 (a)
1,000                     970 ± 10          839
0                         6,890 ± 710       6,890 (a)
2,000                     464 ± 60          399

Note: Glutathione depletion data were graciously supplied by John Waechter, Dow Chemical Co., Midland, Michigan
(a) For the purpose of this comparison, the basal glutathione consumption rate in the model was adjusted to obtain rough agreement with the controls in each experiment. This basal consumption rate was then used to simulate the associated exposure
Fig. 24. Cofactor depletion. Symbols represent the same experimental data as in Fig. 23.
The curves show the predictions of the expanded model, which not only included
depletion of glutathione by reaction with allyl chloride, but also provided for regulation
of glutathione biosynthesis on the basis of the instantaneous glutathione concentration, as
described in the text.
To reiterate the key points of this example:
1. A PBPK model which had successfully described experimental
results for a number of chemicals was unable to reproduce
similar kinetic data on another chemical.
2. A hypothesis was developed that depletion of a necessary
cofactor was affecting metabolism. This hypothesis was
based on:
(a) The nature of the discrepancy between the model predic-
tions and the kinetic data.
(b) Other available information about the nature of the che-
mical’s biochemical interactions.
3. The code for the PBPK model was altered to include additional
mass balance equations describing the depletion of this cofac-
tor, and its resynthesis, as well as the resulting impact on
metabolism.
4. The modification to the model was then tested in two ways:
(a) By testing the ability of the new model structure to simulate
the kinetic data that the original model was unable to
reproduce.
(b) By testing the underlying hypothesis regarding cofactor
depletion against experimental data on glutathione deple-
tion from a separate experiment.
Both elements of testing the model, kinetic validation and mech-
anistic validation, are necessary to provide confidence in the model.
Unfortunately, there is a temptation to accept kinetic validation
alone, particularly when data for mechanistic validation are unavail-
able. It should be remembered, however, that the simple act of adding
equations and parameters to a model will, in itself, increase the
flexibility of the model to fit data. Therefore, every attempt should
be made to obtain additional experimental data to provide support
for the mechanistic hypothesis underlying the model structure.
References

1. Andersen ME (1981) Saturable metabolism and its relation to toxicity. Crit Rev Toxicol 9:105–150
2. Monro A (1992) What is an appropriate measure of exposure when testing drugs for carcinogenicity in rodents? Toxicol Appl Pharmacol 112:171–181
3. Andersen ME, Clewell HJ, Krishnan K (1995) Tissue dosimetry, pharmacokinetic modeling, and interspecies scaling factors. Risk Anal 15:533–537
4. Teorell T (1937) Kinetics of distribution of substances administered to the body. I. The extravascular mode of administration. Arch Int Pharmacodyn 57:205–225
5. Teorell T (1937) Kinetics of distribution of substances administered to the body. II. The intravascular mode of administration. Arch Int Pharmacodyn 57:226–240
6. O'Flaherty EJ (1987) Modeling: an introduction. National Research Council. In: Pharmacokinetics in risk assessment. Drinking water and health, vol 8. National Academy Press, Washington DC, pp 27–35
7. Clewell HJ, Andersen ME (1985) Risk assessment extrapolations and physiological modeling. Toxicol Ind Health 1(4):111–131
8. Andersen ME, Clewell HJ, Gargas ML, Smith FA, Reitz RH (1987) Physiologically based pharmacokinetics and the risk assessment for methylene chloride. Toxicol Appl Pharmacol 87:185–205
9. Gerrity TR, Henry CJ (1990) Principles of route-to-route extrapolation for risk assessment. Elsevier, New York
10. Clewell HJ, Jarnot BM (1994) Incorporation of pharmacokinetics in non-carcinogenic risk assessment: example with chloropentafluorobenzene. Risk Anal 14:265–276
11. Clewell HJ (1995) Incorporating biological information in quantitative risk assessment: an example with methylene chloride. Toxicology 102:83–94
12. Clewell HJ (1995) The application of physiologically based pharmacokinetic modeling in human health risk assessment of hazardous substances. Toxicol Lett 79:207–217
13. Clewell HJ, Gentry PR, Gearhart JM, Allen BC, Andersen ME (1995) Considering pharmacokinetic and mechanistic information in cancer risk assessments for environmental contaminants: examples with vinyl chloride and trichloroethylene. Chemosphere 31:2561–2578
14. Clewell HJ, Andersen ME (1996) Use of physiologically-based pharmacokinetic modeling to investigate individual versus population risk. Toxicology 111:315–329
15. Clewell HJ III, Gentry PR, Gearhart JM (1997) Investigation of the potential impact of benchmark dose and pharmacokinetic modeling in noncancer risk assessment. J Toxicol Environ Health 52:475–515
16. Himmelstein KJ, Lutz RJ (1979) A review of the application of physiologically based pharmacokinetic modeling. J Pharmacokinet Biopharm 7:127–145
17. Gerlowski LE, Jain RK (1983) Physiologically based pharmacokinetic modeling: principles and applications. J Pharm Sci 72:1103–1126
18. Fiserova-Bergerova V (1983) Modeling of inhalation exposure to vapors: uptake, distribution, and elimination, vol 1 and 2. CRC, Boca Raton
19. Bischoff KB (1987) Physiologically based pharmacokinetic modeling. National Research Council. In: Pharmacokinetics in risk assessment. Drinking water and health, vol 8. National Academy Press, Washington, DC, pp 36–61
20. Leung HW (1991) Development and utilization of physiologically based pharmacokinetic models for toxicological applications. J Toxicol Environ Health 32:247–267
21. Clewell HJ, Andersen ME (1986) A multiple dose-route physiological pharmacokinetic model for volatile chemicals using ACSL/PC. In: Cellier FD (ed) Languages for continuous system simulation. Society for Computer Simulation, San Diego, pp 95–101
22. Ramsey JC, Andersen ME (1984) A physiological model for the inhalation pharmacokinetics of inhaled styrene monomer in rats and humans. Toxicol Appl Pharmacol 73:159–175
23. Adolph EF (1949) Quantitative relations in the physiological constitutions of mammals. Science 109:579–585
24. Dedrick RL (1973) Animal scale-up. J Pharmacokinet Biopharm 1:435–461
25. Dedrick RL, Bischoff KB (1980) Species similarities in pharmacokinetics. Fed Proc 39:54–59
26. McDougal JN, Jepson GW, Clewell HJ, MacNaughton MG, Andersen ME (1986) A physiological pharmacokinetic model for dermal absorption of vapors in the rat. Toxicol Appl Pharmacol 85:286–294
27. Paustenbach DJ, Clewell HJ, Gargas ML, Andersen ME (1988) A physiologically based pharmacokinetic model for inhaled carbon tetrachloride. Toxicol Appl Pharmacol 96:191–211
28. Vinegar A, Seckel CS, Pollard DL, Kinkead ER, Conolly RB, Andersen ME (1992) Polychlorotrifluoroethylene (PCTFE) oligomer pharmacokinetics in Fischer 344 rats: development of a physiologically based model. Fundam Appl Toxicol 18:504–514
29. Clewell HJ, Andersen ME (1989) Improving toxicology testing protocols using computer simulations. Toxicol Lett 49:139–158
30. Bischoff KB, Brown RG (1966) Drug distribution in mammals. Chem Eng Prog Symp 62(66):33–45
31. Astrand P, Rodahl K (1970) Textbook of work physiology. McGraw-Hill, New York
32. International Commission on Radiological Protection (ICRP) (1975) Report of the task group on reference man. ICRP Publication 23
33. Environmental Protection Agency (EPA) (1988) Reference physiological parameters in pharmacokinetic modeling. EPA/600/6-88/004. Office of Health and Environmental Assessment, Washington, DC
34. Davies B, Morris T (1993) Physiological parameters in laboratory animals and humans. Pharm Res 10:1093–1095
35. Brown RP, Delp MD, Lindstedt SL, Rhomberg LR, Beliles RP (1997) Physiological parameter values for physiologically based pharmacokinetic models. Toxicol Ind Health 13(4):407–484
36. Sato A, Nakajima T (1979) Partition coefficients of some aromatic hydrocarbons and ketones in water, blood and oil. Br J Ind Med 36:231–234
37. Sato A, Nakajima T (1979) A vial equilibration method to evaluate the drug metabolizing enzyme activity for volatile hydrocarbons. Toxicol Appl Pharmacol 47:41–46
38. Gargas ML, Burgess RJ, Voisard DE, Cason GH, Andersen ME (1989) Partition coefficients of low-molecular-weight volatile chemicals in various liquids and tissues. Toxicol Appl Pharmacol 98:87–99
39. Jepson GW, Hoover DK, Black RK, McCafferty JD, Mahle DA, Gearhart JM (1994) A partition coefficient determination method for nonvolatile chemicals in biological tissues. Fundam Appl Toxicol 22:519–524
40. Clewell HJ (1993) Coupling of computer modeling with in vitro methodologies to reduce animal usage in toxicity testing. Toxicol Lett 68:101–117
41. Bischoff KB, Dedrick RL, Zaharko DS, Longstreth JA (1971) Methotrexate pharmacokinetics. J Pharm Sci 60:1128–1133
42. Farris FF, Dedrick RL, King FG (1988) Cisplatin pharmacokinetics: application of a physiological model. Toxicol Lett 43:117–137
43. Edginton AN, Theil FP, Schmitt W, Willmann S (2008) Whole body physiologically-based pharmacokinetic models: their use in clinical drug development. Expert Opin Drug Metab Toxicol 4:1143–1152
44. Andersen ME, Clewell HJ III, Gargas ML, MacNaughton MG, Reitz RH, Nolan R, McKenna M (1991) Physiologically based pharmacokinetic modeling with dichloromethane, its metabolite carbon monoxide, and blood carboxyhemoglobin in rats and humans. Toxicol Appl Pharmacol 108:14–27
45. Andersen ME, Clewell HJ, Mahle DA, Gearhart JM (1994) Gas uptake studies of deuterium isotope effects on dichloromethane metabolism in female B6C3F1 mice in vivo. Toxicol Appl Pharmacol 128:158–165
46. Fisher J, Gargas M, Allen B, Andersen M (1991) Physiologically based pharmacokinetic modeling with trichloroethylene and its metabolite, trichloroacetic acid, in the rat and mouse. Toxicol Appl Pharmacol 109:183–195
47. Fisher JW, Allen BC (1993) Evaluating the risk of liver cancer in humans exposed to trichloroethylene using physiological models. Risk Anal 13:87–95
48. Allen BC, Fisher J (1993) Pharmacokinetic modeling of trichloroethylene and trichloroacetic acid in humans. Risk Anal 13:71–86
49. Corley RA, Mendrala AL, Smith FA, Staats DA, Gargas ML, Conolly RB, Andersen ME, Reitz RH (1990) Development of a physiologically based pharmacokinetic model for chloroform. Toxicol Appl Pharmacol 103:512–527
50. Reitz RH, Mendrala AL, Corley RA, Quast JF, Gargas ML, Andersen ME, Staats DA, Conolly RB (1990) Estimating the risk of liver cancer associated with human exposures to chloroform using physiologically based pharmacokinetic modeling. Toxicol Appl Pharmacol 105:443–459
51. Johanson G (1986) Physiologically based pharmacokinetic modeling of inhaled 2-butoxyethanol in man. Toxicol Lett 34:23–31
52. Bungay PM, Dedrick RL, Matthews HB (1981) Enteric transport of chlordecone (Kepone) in the rat. J Pharmacokinet Biopharm 9:309–341
53. Tuey DB, Matthews HB (1980) Distribution and excretion of 2,2′,4,4′,5,5′-hexabromobiphenyl in rats and man: pharmacokinetic model predictions. Toxicol Appl Pharmacol 53:420–431
54. Lutz RJ, Dedrick RL, Tuey D, Sipes IG, Anderson MW, Matthews HB (1984) Comparison of the pharmacokinetics of several polychlorinated biphenyls in mouse, rat, dog, and monkey by means of a physiological pharmacokinetic model. Drug Metab Dispos 12(5):527–535
55. King FG, Dedrick RL, Collins JM, Matthews HB, Birnbaum LS (1983) Physiological model for the pharmacokinetics of 2,3,7,8-tetrachlorodibenzofuran in several species. Toxicol Appl Pharmacol 67:390–400
56. Leung HW, Ku RH, Paustenbach DJ, Andersen ME (1988) A physiologically based pharmacokinetic model for 2,3,7,8-tetrachlorodibenzo-p-dioxin in C57BL/6J and DBA/2J mice. Toxicol Lett 42:15–28
57. Andersen ME, Mills JJ, Gargas ML, Kedderis L, Birnbaum LS, Neubert D, Greenlee WF (1993) Modeling receptor-mediated processes with dioxin: implications for pharmacokinetics and risk assessment. Risk Anal 13(1):25–36
58. O'Flaherty EJ (1991) Physiologically based models for bone seeking elements. I. Rat skeletal and bone growth. Toxicol Appl Pharmacol 111:299–312
59. O'Flaherty EJ (1991) Physiologically based models for bone seeking elements. II. Kinetics of lead disposition in rats. Toxicol Appl Pharmacol 111:313–331
60. O'Flaherty EJ (1991) Physiologically based models for bone seeking elements. III. Human skeletal and bone growth. Toxicol Appl Pharmacol 111:332–341
61. O'Flaherty EJ (1993) Physiologically based models for bone seeking elements. IV. Kinetics of lead disposition in humans. Toxicol Appl Pharmacol 118:16–29
62. O'Flaherty EJ (1995) Physiologically based models for bone seeking elements. V. Lead absorption and disposition in childhood. Toxicol Appl Pharmacol 131:297–308
63. Mann S, Droz PO, Vahter M (1996) A physiologically based pharmacokinetic model for arsenic exposure. I. Development in hamsters and rabbits. Toxicol Appl Pharmacol 137:8–22
64. Mann S, Droz PO, Vahter M (1996) A physiologically based pharmacokinetic model for arsenic exposure. II. Validation and application in humans. Toxicol Appl Pharmacol 140:471–486
65. Farris FF, Dedrick RL, Allen PV, Smith JC (1993) Physiological model for the pharmacokinetics of methyl mercury in the growing rat. Toxicol Appl Pharmacol 119:74–90
66. McMullin TS, Hanneman WH, Cranmer BK, Tessari JD, Andersen ME (2007) Oral absorption and oxidative metabolism of atrazine in rats evaluated by physiological modeling approaches. Toxicology 240:1–14
67. Lin Z, Fisher JW, Ross MK, Filipov NM (2011) A physiologically based pharmacokinetic model for atrazine and its main metabolites in the adult male C57BL/6 mouse. Toxicol Appl Pharmacol 251:16–31
68. Kirman CR, Hays SM, Kedderis GL, Gargas ML, Strother DE (2000) Improving cancer dose-response characterization by using physiologically based pharmacokinetic modeling: an analysis of pooled data for acrylonitrile-induced brain tumors to assess cancer potency in the rat. Risk Anal 20:135–151
69. Sweeney LM, Gargas ML, Strother DE, Kedderis GL (2003) Physiologically based pharmacokinetic model parameter estimation and sensitivity and variability analyses for acrylonitrile disposition in humans. Toxicol Sci 71:27–40
70. Takano R, Murayama N, Horiuchi K, Kitajima M, Kumamoto M, Shono F, Yamazaki H (2010) Blood concentrations of acrylonitrile in humans after oral administration extrapolated from in vivo rat pharmacokinetics, in vitro human metabolism, and physiologically based pharmacokinetic modeling. Regul Toxicol Pharmacol 58:252–258
71. Clewell RA, Merrill EA, Robinson PJ (2001) The use of physiologically based models to integrate diverse data sets and reduce uncertainty in the prediction of perchlorate and iodide kinetics across life stages and species. Toxicol Ind Health 17:210–222
72. Clewell RA, Merrill EA, Yu KO, Mahle DA, Sterner TR, Mattie DR, Robinson PJ, Fisher JW, Gearhart JM (2003) Predicting fetal perchlorate dose and inhibition of iodide kinetics during gestation: a physiologically-based pharmacokinetic analysis of perchlorate and iodide kinetics in the rat. Toxicol Sci 73:235–255
73. Clewell RA, Merrill EA, Yu KO, Mahle DA, Sterner TR, Fisher JW, Gearhart JM (2003) Predicting neonatal perchlorate dose and inhibition of iodide uptake in the rat during lactation using physiologically-based pharmacokinetic modeling. Toxicol Sci 74:416–436
74. Merrill EA, Clewell RA, Gearhart JM, Robinson PJ, Sterner TR, Yu KO, Mattie DR, Fisher JW (2003) PBPK predictions of perchlorate distribution and its effect on thyroid uptake of radioiodide in the male rat. Toxicol Sci 73:256–269
75. Merrill EA, Clewell RA, Robinson PJ, Jarabek AM, Gearhart JM, Sterner TR, Fisher JW (2005) PBPK model for radioactive iodide and perchlorate kinetics and perchlorate-induced inhibition of iodide uptake in humans. Toxicol Sci 83:25–43
76. McLanahan ED, Andersen ME, Campbell JL, Fisher JW (2009) Competitive inhibition of thyroidal uptake of dietary iodide by perchlorate does not describe perturbations in rat serum total T4 and TSH. Environ Health Perspect 117:731–738
77. Haddad S, Charest-Tardif G, Tardif R, Krishnan K (2000) Validation of a physiological modeling framework for simulating the toxicokinetics of chemicals in mixtures. Toxicol Appl Pharmacol 167:199–209
78. Haddad S, Beliveau M, Tardif R, Krishnan K (2001) A PBPK modeling-based approach to account for interactions in the health risk assessment of chemical mixtures. Toxicol Sci 63:125–131
79. Dennison JE, Andersen ME, Yang RS (2003) Characterization of the pharmacokinetics of gasoline using PBPK modeling with a complex mixtures chemical lumping approach. Inhal Toxicol 15:961–986
80. Dennison JE, Andersen ME, Clewell HJ, Yang RSH (2004) Development of a physiologically based pharmacokinetic model for volatile fractions of gasoline using chemical lumping analysis. Environ Sci Technol 38:5674–5681
81. Campbell JL, Fisher JW (2007) A PBPK modeling assessment of the competitive metabolic interactions of JP-8 vapor with two constituents, m-xylene and ethylbenzene. Inhal Toxicol 19:265–273
82. Rowland M, Balant L, Peck C (2004) Physiologically based pharmacokinetics in drug development and regulatory science: a workshop report (Georgetown University, Washington, DC, May 29–30, 2002). AAPS PharmSci 6:56–67
83. Ramsey JC, Young JD (1978) Pharmacokinetics of inhaled styrene in rats and humans. Scand J Work Environ Health 4:84–91
84. Ramsey JC, Young JD, Karbowski R, Chenoweth MB, McCarty LP, Braun WH (1980) Pharmacokinetics of inhaled styrene in human volunteers. Toxicol Appl Pharmacol 53:54–63
85. Stewart RD, Dodd HC, Baretta ED, Schaffer AW (1968) Human exposure to styrene vapors. Arch Environ Health 16:656–662
86. Gearhart JM, Clewell HJ, Crump KS, Shipp AM, Silvers A (1995) Pharmacokinetic dose estimates of mercury in children and dose-response curves of performance tests in a large epidemiological study. Water Air Soil Pollut 80:49–58
87. Gearhart JM, Jepson GW, Clewell HJ, Andersen ME, Conolly RB (1990) Physiologically based pharmacokinetic and pharmacodynamic model for the inhibition of acetylcholinesterase by diisopropylfluorophosphate. Toxicol Appl Pharmacol 106:295–310
88. Gearhart JM, Jepson GW, Clewell HJ, Andersen ME, Conolly RB (1995) A physiologically based pharmacokinetic model for the inhibition of acetylcholinesterase by organophosphate esters. Environ Health Perspect 102(11):51–60
89. Fisher JW, Whittaker TA, Taylor DH, Clewell HJ, Andersen ME (1989) Physiologically based pharmacokinetic modeling of the pregnant rat: a multiroute exposure model for trichloroethylene and its metabolite, trichloroacetic acid. Toxicol Appl Pharmacol 99:395–414
90. Fisher JW, Whittaker TA, Taylor DH, Clewell HJ, Andersen ME (1990) Physiologically based pharmacokinetic modeling of the lactating rat and nursing pup: a multiroute exposure model for trichloroethylene and its metabolite, trichloroacetic acid. Toxicol Appl Pharmacol 102:497–513
91. Luecke RH, Wosilait WD, Pearce BA, Young JF (1994) A physiologically based pharmacokinetic computer model for human pregnancy. Teratology 49:90–103
92. Andersen ME, Gargas ML, Clewell HJ, Severyn KM (1987) Quantitative evaluation of the metabolic interactions between trichloroethylene and 1,1-dichloroethylene by gas uptake methods. Toxicol Appl Pharmacol 89:149–157
93. Mumtaz MM, Sipes IG, Clewell HJ, Yang RSH (1993) Risk assessment of chemical mixtures: biological and toxicologic issues. Fundam Appl Toxicol 21:258–269
94. Barton HA, Creech JR, Godin CS, Randall GM, Seckel CS (1995) Chloroethylene mixtures: pharmacokinetic modeling and in vitro metabolism of vinyl chloride, trichloroethylene, and trans-1,2-dichloroethylene in rat. Toxicol Appl Pharmacol 130:237–247
95. Clewell HJ, Lee T, Carpenter RL (1994) Sensitivity of physiologically based pharmacokinetic models to variation in model parameters: methylene chloride. Risk Anal 14:521–531
96. Andersen ME, Gargas ML, Ramsey JC (1984) Inhalation pharmacokinetics: evaluating systemic extraction, total in vivo metabolism and the time course of enzyme induction for inhaled styrene in rats based on arterial blood: inhaled air concentration ratios. Toxicol Appl Pharmacol 73:176–187
97. Andersen ME, Clewell HJ, Frederick CB (1995) Applying simulation modeling to problems in toxicology and risk assessment—a short perspective. Toxicol Appl Pharmacol 133:181–187
98. Vinegar A, Winsett DW, Andersen ME, Conolly RB (1990) Use of a physiologically based pharmacokinetic model and computer simulation for retrospective assessment of exposure to volatile toxicants. Inhal Toxicol 2:119–128
99. Vinegar A, Jepson GW (1996) Cardiac sensitization thresholds of halon replacement chemicals predicted in humans by physiologically-based pharmacokinetic modeling. Risk Anal 16:571–579
100. Clewell HJ, Andersen ME, Wills RJ, Latriano L (1997) A physiologically based pharmacokinetic model for retinoic acid and its metabolites. J Am Acad Dermatol 36:S77–S85
101. Fiserova-Bergerova V (1975) Biological—mathematical modeling of chronic toxicity. AMRL-TR-75-5. Aerospace Medical Research Laboratory, Wright-Patterson Air Force Base
102. Gargas ML, Andersen ME (1989) Determining kinetic constants of chlorinated ethane metabolism in the rat from rates of exhalation. Toxicol Appl Pharmacol 97:230–246
103. Gargas ML, Andersen ME, Clewell HJ (1986) A physiologically-based simulation approach for determining metabolic constants from gas uptake data. Toxicol Appl Pharmacol 86:341–352
104. Gargas ML, Clewell HJ, Andersen ME (1990) Gas uptake techniques and the rates of metabolism of chloromethanes, chloroethanes, and chloroethylenes in the rat. Inhal Toxicol 2:295–319
105. Reitz RH, Mendrala AL, Guengerich FP (1989) In vitro metabolism of methylene chloride in human and animal tissues: use in physiologically-based pharmacokinetic models. Toxicol Appl Pharmacol 97:230–246
106. Filser JG, Bolt HM (1979) Pharmacokinetics of halogenated ethylenes in rats. Arch Toxicol 42:123–136
107. Andersen ME, Gargas ML, Jones RA, Jenkins LH Jr (1980) Determination of the kinetic constants of metabolism of inhaled toxicant in vivo based on gas uptake measurements. Toxicol Appl Pharmacol 54:100–116
108. Gargas ML, Clewell HJ, Andersen ME (1986) Metabolism of inhaled dihalomethanes in vivo: differentiation of kinetic constants for two independent pathways. Toxicol Appl Pharmacol 82:211–223
109. Watanabe P, McGown G, Gehring P (1976) Fate of [14C]vinyl chloride after single oral administration in rats. Toxicol Appl Pharmacol 36:339–352
110. Gargas ML, Andersen ME (1982) Metabolism of inhaled brominated hydrocarbons: validation of gas uptake results by determination of a stable metabolite. Toxicol Appl Pharmacol 66:55–68
111. Lam G, Chen M, Chiou WL (1981) Determination of tissue to blood partition coefficients in physiologically-based pharmacokinetic studies. J Pharm Sci 71(4):454–456
112. O'Flaherty EJ (1995) PBPK modeling for metals. Examples with lead, uranium, and chromium. Toxicol Lett 82–83:367–372
113. Environmental Protection Agency (EPA) (1992) EPA request for comments on draft report of cross-species scaling factor for cancer risk assessment. Fed Reg 57:24152
114. Carson ER, Cobelli C, Finkelstein L (1983) The mathematical modeling of metabolic and endocrine systems: model formulation, identification, and validation. Wiley, New York
115. Rescigno A, Beck JS (1987) The use and abuse of models. J Pharmacokinet Biopharm 15:327–340
116. Yates FE (1978) Good manners in good modeling: mathematical models and computer simulations of physiological systems. Am J Physiol 234:R159–R160
117. Clewell HJ (1995) The use of physiologically based pharmacokinetic modeling in risk assessment: a case study with methylene chloride. In: Olin S, Farland W, Park C, Rhomberg L, Scheuplein R, Starr T, Wilson J (eds) Low-dose extrapolation of cancer risks: issues and perspectives. ILSI, Washington, DC
118. Allen BC, Covington TR, Clewell HJ (1996) Investigation of the impact of pharmacokinetic variability and uncertainty on risks predicted with a pharmacokinetic model for chloroform. Toxicology 111:289–303
119. Clewell HJ, Andersen ME (1994) Physiologically-based pharmacokinetic modeling and bioactivation of xenobiotics. Toxicol Ind Health 10:1–24
120. D'Souza RW, Francis WR, Andersen ME (1988) Physiological model for tissue glutathione depletion and increased resynthesis after ethylene dichloride exposure. J Pharmacol Exp Ther 245:563–568
Chapter 19

Interspecies Extrapolation

Elaina M. Kenyon

Abstract
Interspecies extrapolation encompasses two related but distinct topic areas that are germane to quantitative
extrapolation and hence computational toxicology—dose scaling and parameter scaling. Dose scaling is the
process of converting a dose determined in an experimental animal to a toxicologically equivalent dose in
humans using simple allometric assumptions and equations. In a hierarchy of quantitative extrapolation
approaches, this option is used when minimal information is available for a chemical of interest. Parameter
scaling refers to cross-species extrapolation of specific biological processes describing rates associated
with pharmacokinetic (PK) or pharmacodynamic (PD) events on the basis of allometric relationships.
These parameters are used in biologically based models of various types that are designed for not only
cross-species extrapolation but also for exposure route (e.g., inhalation to oral) and exposure scenario
(duration) extrapolation. This area also encompasses in vivo scale-up of physiological rates determined in
various experimental systems. Results from in vitro metabolism studies are generally most useful for
interspecies extrapolation purposes when integrated into a physiologically based pharmacokinetic (PBPK)
modeling framework. This is because PBPK models allow consideration and quantitative evaluation of
other physiological factors, such as binding to plasma proteins and blood flow to the liver, which may be as
influential as, or more influential than, metabolism in determining relevant dose metrics for risk assessment.
Key words: Scaling, Extrapolation, In vitro scale-up, Allometry, Cross-species, In vitro to in vivo extrapolation (IVIVE)

1. Introduction
Interspecies extrapolation, also referred to as cross-species scaling,
includes the vast topic of allometry. Allometry is the study of the
usual variation in measurable characteristics of anatomy and physi-
ology as a function of overall body size (1, 2). It has been an area of
active research interest for over a century, originally inspired by a
desire to explain the general observation that smaller mammals
have higher rates of metabolism and shorter life spans compared
to larger mammals (3–5). In more recent years, renewed interest in
the study of allometry has been based on the need to determine
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929, DOI 10.1007/978-1-62703-050-2_19, © Springer Science+Business Media, LLC 2012
toxicologically equivalent doses to extrapolate the results of toxicity
studies in experimental animals to equivalent doses in humans
(6–8). This process is pivotal to the development of health protec-
tive guidance values that are used to establish limits on pollutant
levels in various environmental media (9, 10).
Physiologically based pharmacokinetic (PBPK) models are
extensively used as a tool for multiple risk assessment applications
and are the preferred method to address cross-species dose extrap-
olation. The reliability of PBPK models is directly related to the
accuracy of the chemical-specific parameters used as model inputs.
These chemical-specific parameters include metabolic rate con-
stants, partition coefficients, diffusion constants, and parameters
describing dermal or gastrointestinal absorption rates (11). Meta-
bolic rate parameters are often estimated using data derived from
in vitro experimental techniques and then scaled up for in vivo use
in PBPK models. Estimation of metabolic parameters using in vitro
data is increasingly necessary due to the number of chemicals for
which data are needed, the trend towards minimizing laboratory
animal use, and very limited opportunity to collect data in human
subjects (12).
This chapter deals with two related but distinct topic areas that
are germane to quantitative interspecies extrapolation and hence
computational toxicology—dose scaling and parameter scaling.
Dose scaling is the process of converting a dose determined in an
experimental animal to a toxicologically equivalent dose in humans.
Parameter scaling refers to cross-species extrapolation of specific
biological processes describing rates associated with pharmacoki-
netic (PK) or pharmacodynamic (PD) events on the basis of allo-
metric relationships. These parameters are used in biologically
based models of various types that are designed for not only
cross-species extrapolation but also for exposure route (e.g., inha-
lation to oral) and exposure scenario (duration) extrapolation. This
area also encompasses in vivo scale-up of physiological rates deter-
mined using various in vitro experimental systems, i.e., in vitro to
in vivo extrapolation (IVIVE).

2. Materials

There are a range of tools to assist in the analysis of the types of data
to which allometric scaling and in vitro scale-up procedures may be
applied. Because the calculations themselves tend to be numerically
simple, many commercially available spreadsheet and graphical software packages (with curve-fitting capabilities) are suitable for
these types of analyses. These software packages are typically able to
run on most desktop or laptop personal computers.
19 Interspecies Extrapolation 503

Experimentally determined enzyme kinetic data are often used to estimate metabolism parameters that are scaled from the in vitro
to the in vivo situation for use in PBPK models. Graphical analyses
are often used as a means to easily and quickly estimate rate para-
meters such as KM, the Michaelis–Menten constant, and Vmax, the
maximum reaction velocity, for enzymes exhibiting saturable
(Michaelis–Menten) kinetic behavior. Graphical analyses using the classical Lineweaver–Burk plot, and more recently nonlinear regression analyses or alternative linear forms of the Michaelis–Menten equation (e.g., Eadie–Hofstee or Hanes–Woolf plots), can be accomplished with a variety of graphical packages specifically adapted for this purpose (e.g., SigmaPlot, SAAM II). Specific applications and
limitations of various methods for analysis of kinetic data are cov-
ered in depth in a number of texts (e.g., (13–15)).
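These graphical estimates can also be reproduced numerically with nothing more than a least-squares line fit. The sketch below is illustrative only (synthetic data with hypothetical units, not from any study cited here); it recovers KM and Vmax through the Hanes–Woolf linearization, S/v = S/Vmax + KM/Vmax:

```python
# Estimate KM and Vmax from substrate/velocity pairs via the
# Hanes-Woolf linearization: S/v = (1/Vmax)*S + KM/Vmax.

def hanes_woolf(s_vals, v_vals):
    """Fit S/v vs. S by ordinary least squares; return (KM, Vmax)."""
    x = s_vals
    y = [s / v for s, v in zip(s_vals, v_vals)]
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
            sum((xi - mx) ** 2 for xi in x)
    intercept = my - slope * mx
    vmax = 1.0 / slope      # slope of the line is 1/Vmax
    km = intercept * vmax   # intercept of the line is KM/Vmax
    return km, vmax

# Noiseless synthetic Michaelis-Menten data with KM = 5.0, Vmax = 100.0
KM_TRUE, VMAX_TRUE = 5.0, 100.0
s = [0.5, 1, 2, 5, 10, 20, 50]
v = [VMAX_TRUE * si / (KM_TRUE + si) for si in s]

km_est, vmax_est = hanes_woolf(s, v)
print(f"KM = {km_est:.2f}, Vmax = {vmax_est:.2f}")  # recovers 5.00 and 100.00
```

With error-free data the linearization recovers the generating parameters exactly; with real experimental noise the linear transforms weight errors unevenly, which is one reason nonlinear regression on the untransformed Michaelis–Menten equation is often preferred.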

3. Methods

3.1. Dose Scaling As used in this chapter, the term dose scaling is the process of
directly converting a dose determined in an experimental animal
model to a toxicologically equivalent dose in humans at a gross or
default level. The scientific basis for this procedure is the
generalized allometric equation

Y = a(BW)^b,    (1)
where Y is the physiological variable of interest, BW is body
weight, a is the y-intercept, and b is the slope of the line obtained
from a plot of log Y vs. log BW.
This relationship was originally studied in regard to energy
utilization or basal metabolism by Kleiber whose analyses suggested
that basal metabolism scales to the 3/4 (0.75) power of body
weight across species (3–5). This was in contrast to the generally
accepted “surface law” at that time (16), i.e., the concept that basal
metabolism scales across species according to body surface area or
the 2/3 power of body weight (3, 4, 16). A variety of theories have
been advanced to explain this allometric relationship, including
“elastic similarity” of skeletal and muscular structures (17), “the
fractal nature of energy distributing vascular networks” (18, 19),
and others (20, 21). It should be noted that agreement on the
scaling exponent, i.e., 0.75, is not universal and some analyses
have been published which suggest 2/3 is the more appropriate
value (22). Dodds et al. (23) performed a reanalysis of data from
the published literature and concluded that available information
did not allow one to distinguish between an exponent of 2/3 vs.
3/4 as being more predictive.

In practice, BW0.75 is more widely accepted as a “default” method for scaling oral doses across species for both cancer and
noncancer health effects compared to other exponents (2, 9, 10).
The meaning of default as used here applies to the situation where
insufficient chemical-specific information is available to use a more
data-informed approach (e.g., as outlined in ref. 24) or a PBPK
model. In this spectrum of approaches, an appropriately documen-
ted and evaluated PBPK model would be considered the optimal
choice for cross-species dose extrapolation (see next section for
discussion of interspecies scaling for PBPK model parameters). It
should be noted that for inhalation exposure, the US EPA has an
established framework for cross-species dosimetric adjustment that
does not depend on the availability of a PBPK model. This categor-
ical methodology is based on physical state (gas, particulate) as well
as reactivity and solubility for gases and regionally deposited dose
for particulates (25) and is conceptually analogous to the default
scaling of oral doses illustrated here (10).
For a default-type dosimetric adjustment, an oral dose (in mg/
kg/day) for an experimental animal is converted to a “human
equivalent dose” (HED) by multiplying the animal dose by a
dosimetric adjustment factor (DAF):

DAF = (BWa / BWh)^0.25.    (2)
Or the mathematically equivalent alternative,

DAF = (BWh / BWa)^-0.25,    (3)
where the subscripts “a” and “h” denote animal and human, respectively, and the 0.25 exponent results from the application of BW0.75 scaling to exposure in units of mg/kg/day (rather than mg/day) such that

BW^0.75 / BW^(1/1) = BW^-0.25.    (4)
Therefore the HED is calculated as
HED (mg/kg/day) = animal dose (mg/kg/day) × DAF.    (5)
The empirical and theoretical basis for the development of a
generalized cross-species scaling factor inclusive of pharmacokinetic
processes has been detailed elsewhere (6, 7, 26), including limitations and caveats which are described in greater detail in Notes 2–6 of this chapter.
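The default dose-scaling procedure of Eqs. (2) and (5) reduces to a few lines of code. A minimal sketch (the function names are ours):

```python
def dosimetric_adjustment_factor(bw_animal_kg, bw_human_kg=70.0):
    """DAF = (BWa / BWh)**0.25, per Eq. (2)."""
    return (bw_animal_kg / bw_human_kg) ** 0.25

def human_equivalent_dose(animal_dose_mg_kg_day, bw_animal_kg, bw_human_kg=70.0):
    """HED = animal dose x DAF, per Eq. (5)."""
    return animal_dose_mg_kg_day * dosimetric_adjustment_factor(
        bw_animal_kg, bw_human_kg)

# A 10 mg/kg/day dose in a 0.025 kg mouse scaled to a 70 kg human:
hed = human_equivalent_dose(10.0, 0.025)
print(round(dosimetric_adjustment_factor(0.025), 3), round(hed, 2))  # 0.137 1.37
```

For the 0.025 kg mouse this reproduces the DAF of 0.137 and HED of 1.37 mg/kg/day shown in Table 2.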

3.2. Parameter Scaling As used in this chapter, the term parameter scaling is the process of
scaling a physiological variable from an experimental animal value
to a human value for use in a PBPK model. Examples of parameters

typically scaled in PBPK models include alveolar ventilation rate, cardiac output, and rates describing absorption and metabolism
(27). This is a more specific or refined use (compared to dose
scaling) of the allometric relationship denoted by equation (1)
proposed by Kleiber (3, 4) to describe energy utilization.
Analyses have been conducted to derive equations to describe
the relationship of body weight to a variety of physiological char-
acteristics and functions (e.g., (18, 28)). Examples of the coefficients conforming to the equation form, Y = a(BW)^b, are shown
in Table 1. Examination of Table 1 reveals characteristics such as
blood volumes and organ weight increase in roughly direct propor-
tion to body weight. Rate processes, e.g., clearances and outputs,
typically scale in proportion to approximately (BW)0.75. When a
physiological parameter that varies in proportion to BW0.75 is normalized against a characteristic that varies directly (BW1/1), scaling will approximate BW^-0.25, i.e., BW^0.75/BW^(1/1) = BW^-0.25 (10).
On the basis of allometry, cross-species scaling by BW0.75 power
of body weight would be expected to yield reasonable initial estimates
for physiological quantities measured in volume per unit time (e.g.,
flows, ventilation, clearances) or mass per unit time (e.g., metabolic
capacity, i.e., Vmax). In the case of first-order rate processes (units of inverse time, e.g., h^-1), cross-species scaling on the basis of BW^-0.25 would be used.
values is generally used as a first approximation when appropriate
species-specific or chemical-specific parameter values are not available
(11). Numerical and statistical methods for parameter estimation and
optimization using pharmacokinetic data are discussed in greater
detail in Part VI of Computational Toxicology, Volume II.
In practice, species-specific parameter values are generally
available for physiological parameters such as blood flows and
organ volumes which also usually scale directly on the basis of
body weight (BW1/1). Chemical-specific parameters associated
with metabolism (e.g., Vmax) or binding (Bmax) that follow
Michaelis–Menten (saturable) kinetics are usually scaled by
BW0.75 when extrapolating from experimental animals to humans
using in vivo data (11).
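These conventions amount to two one-line conversions, sketched below (a minimal illustration; function names are ours). Capacities such as Vmax scale up by the 0.75 power of body weight, while first-order rate constants scale by the -0.25 power and therefore decrease from rodents to humans:

```python
def scale_capacity(vmax_animal, bw_animal_kg, bw_human_kg):
    """Scale a capacity term (e.g., Vmax in mass/time) by BW**0.75."""
    return vmax_animal * (bw_human_kg / bw_animal_kg) ** 0.75

def scale_first_order(k_animal, bw_animal_kg, bw_human_kg):
    """Scale a first-order rate constant (units 1/time) by BW**-0.25."""
    return k_animal * (bw_human_kg / bw_animal_kg) ** -0.25

# Hypothetical example: a rate constant of 1.0/h in a 0.25 kg rat
k_human = scale_first_order(1.0, 0.25, 70.0)
print(round(k_human, 3))  # 0.244 -- first-order rates slow with increasing size
```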
Alternatively, metabolic rates may be determined for the chem-
ical of interest in vitro and scaled to the in vivo case as described in
Subheading 3.3 and illustrated in Subheading 4.3 of this chapter.
Note that body weight scaling is not applied in the case of affinity
constants (e.g., KM, BM) measured in units of mass per volume
(e.g., mg/L, mM). Tissue partition coefficients (PC), a unitless,
chemical-specific measure of tissue solubility, are typically normal-
ized against the chemical-specific blood-to-air partition coefficient
for humans as illustrated in Subheading 4.2. The underlying basis
for this correction is that tissue lipid and water content are impor-
tant determinants of tissue solubility and tissue composition is
highly conserved across mammalian species (27).

Table 1
Selected physiological characteristics and their body weight (BW) scaling coefficients^(a,b)

| Physiological characteristic (Y) | Units of Y | a (y-intercept) | b (slope) |
| Basal O2 consumption | mL STP/h | 3.8 | 0.734 |
| Water intake | mL/h | 0.01 | 0.88 |
| Urine output | mL/h | 0.0064 | 0.82 |
| Ventilation rate | mL/h | 120 | 0.74 |
| Tidal volume | mL | 6.2 × 10^-3 | 1.01 |
| Urea clearance | mL/h | 1.59 | 0.72 |
| Inulin clearance | mL/h | 1.74 | 0.77 |
| Creatinine clearance | mL/h | 4.2 | 0.69 |
| Hippurate clearance | mL/h | 5.4 | 0.80 |
| Heartbeat duration | h | 1.19 × 10^-5 | 0.27 |
| Breath duration | h | 4.7 × 10^-5 | 0.28 |
| Peristaltic (gut beat) duration | h | 9.3 × 10^-5 | 0.31 |
| Total nitrogen output | g/h | 7.4 × 10^-5 | 0.735 |
| Endogenous nitrogen output | g/h | 4.2 × 10^-5 | 0.72 |
| Creatinine nitrogen output | g/h | 1.09 × 10^-6 | 0.9 |
| Sulfur output | g/h | 1.71 × 10^-6 | 0.74 |
| Kidneys weight | g | 0.0212 | 0.85 |
| Brain weight | g | 0.081 | 0.7 |
| Heart weight | g | 6.6 × 10^-3 | 0.98 |
| Lungs weight | g | 0.0124 | 0.99 |
| Liver weight | g | 0.082 | 0.87 |
| Thyroids weight | g | 2.2 × 10^-4 | 0.80 |
| Adrenals weight | g | 1.1 × 10^-3 | 0.92 |
| Pituitary weight | g | 1.3 × 10^-4 | 0.76 |
| Stomach weight | g | 0.112 | 0.94 |
| Blood weight | g | 0.055 | 0.99 |
| Number of nephrons | None | 2,600 | 0.62 |

^a The values correspond to the equation Y = a(BW)^b, where Y is the physiological variable of interest, BW is body weight, a is the y-intercept, and b is the slope of the line obtained from a plot of log Y vs. log BW
^b Adapted from (16, 26, 29)
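Table 1 can also be used predictively. The sketch below assumes BW is entered in grams (our inference; the magnitudes of the a coefficients are only consistent with gram-based regressions) and evaluates a few rows for a 70 kg human:

```python
def allometric(a, b, bw_grams):
    """Y = a * BW**b as in Table 1; BW assumed to be in grams (see lead-in)."""
    return a * bw_grams ** b

BW_HUMAN_G = 70_000  # 70 kg human

o2_ml_h = allometric(3.8, 0.734, BW_HUMAN_G)   # basal O2 consumption, mL STP/h
vent_ml_h = allometric(120, 0.74, BW_HUMAN_G)  # ventilation rate, mL/h
liver_g = allometric(0.082, 0.87, BW_HUMAN_G)  # liver weight, g

print(f"O2 ~{o2_ml_h / 60:.0f} mL/min, ventilation ~{vent_ml_h / 60000:.1f} L/min, "
      f"liver ~{liver_g:.0f} g")
```

The predictions (roughly 230 mL O2/min, 8 L/min ventilation, 1.3 kg liver) are in the right range for an adult human, a useful sanity check on the coefficients and their assumed units.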

3.3. In Vitro to In Vivo Scale-Up    Data from various in vitro systems are commonly used to estimate metabolic rate parameters that may then be scaled to the level of the
whole tissue and organism. Systems used include precision cut organ
slices, whole cell preparations (e.g., hepatocytes), subcellular tissue
fractions (i.e., microsomes and cytosol), and recombinantly expressed
enzymes. The performance, advantages, and limitations of these
systems have been compared and reviewed in a number of publica-
tions (e.g., (12, 30–32)). Figure 1 provides a schematic illustration of
the scaling procedures used with in vitro systems derived from liver
tissue for hepatocytes and subcellular fractions (microsomes and

Fig. 1. Schematic representation of experimental and computational steps necessary for IVIVE based on the use of hepatic subcellular fractions (microsomes, cytosol) or hepatocytes. Microsomal protein per gram of liver (MPPGL), cytosolic protein per gram of liver
(CPPGL), and hepatocytes per gram of liver (HPGL) are ideally determined in the specific
experiment, but default values may also be obtained from the literature. While IVIVE for
rate of metabolism is illustrated here, other parameters such as KM and intrinsic clearance
(Vmax/KM) are also estimated utilizing these experimental systems.

cytosol), the two most commonly used experimental systems. While the liver is generally the tissue of interest in metabolism studies,
conceptually the same principles apply to other tissues (e.g., kidney,
lung, gut) that can be metabolically active or toxicologically impor-
tant due to toxicity being caused by metabolism in the target organ.
The normalizing basis or units for rates of metabolism (Vmax)
reported in the literature will vary depending upon the experimental
system used. When using subcellular fractions, Vmax is typically
reported in units of mass per time per mg of microsomal or cytosolic
protein (e.g., nmol/min/mg protein). Specifically, it is the rate of
product formed (sometimes measured as disappearance of parent com-
pound) per unit time normalized to mg of microsomal protein or
cytosolic protein. To scale this rate to the whole liver, it is necessary
to know the mg of microsomal protein per gram of liver (MMPGL) or
mg of cytosolic protein per gram of liver (MCPGL) and the liver weight
(LW) in grams. The overall rate of the whole liver (LR) is given as
LR = Vmax (mass/time/mg protein) × MMPGL × LW (g).    (6)
To convert this rate (in units of mass/time/liver) for use in a
PBPK model, it is necessary to divide this figure by BW0.75 to yield

units of mass/time/kg. This figure is often referred to as VmaxC in the biological modeling literature. It is optimal to use figures for
MMPGL (or MCPGL) reported in the original source in which the
metabolism rate was reported. In practice, these data are often not
reported, thus necessitating the use of “default” or “typical” values
for MMPGL. Also, whole organ weights are generally estimated on
the basis of the average percentage of organ weight as a function of
body weight (12). For example, in humans, the liver is assumed to
be 2.6% of body weight. These figures vary depending upon the
species and their physiological status (age, gender, strain). Compi-
lations of physiological data, including organ weights as a percent-
age of body weight, are available (e.g., (33)).
When Vmax is derived from a recombinantly expressed enzyme
system (very often a cytochrome P450 isoenzyme or CYP, e.g., mass
of product/time/mass CYP protein), it is necessary to know the
mg of the particular enzyme (protein) per gram of liver (MEGL).
In this case, the overall rate of metabolism in the whole liver (LR) is
LR = Vmax (mass/time/mg enzyme protein) × MEGL × LW (g).    (7)

The same scaling procedure referenced above (division by BW0.75) is applied to derive a VmaxC for use in a PBPK model.
If Vmax is determined using hepatocytes, data are typically
expressed in units of product formed per unit time per million
cells (e.g., nmol/min/10^6 cells). In this instance, the scaling factor
needed to estimate the rate for the whole liver is hepatocellularity
per gram of liver (HPGL). Thus, the overall rate of metabolism for
the whole liver (LR) is
LR = Vmax (mass/time/10^6 cells) × HPGL (cells/g liver) × LW (g).    (8)
This calculation yields a rate for the whole liver in units of
mass/time which is converted to VmaxC for use in a PBPK model
by division by BW0.75 as described previously (12).
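Equations (6)–(8) share one structure: an in vitro rate times a per-gram-of-liver scaling factor times liver weight, then division by BW^0.75. A minimal sketch (function and variable names are ours; the assay numbers are hypothetical except for the MPPGL consensus value of 32 mg/g cited in Note 14):

```python
def whole_liver_rate(vmax_in_vitro, per_g_liver_factor, liver_weight_g):
    """LR per Eqs. (6)-(8): in vitro Vmax x (mg protein, pmol enzyme, or
    cells per g liver) x liver weight. Vmax must be normalized to the
    same basis as the scaling factor."""
    return vmax_in_vitro * per_g_liver_factor * liver_weight_g

def vmax_c(liver_rate, bw_kg):
    """Divide the whole-liver rate by BW**0.75 to obtain VmaxC."""
    return liver_rate / bw_kg ** 0.75

# Hypothetical microsomal assay: 0.8 nmol/min/mg protein,
# MPPGL = 32 mg protein/g liver, liver weight = 1,820 g
lr = whole_liver_rate(0.8, 32, 1820)  # nmol/min for the whole liver
print(f"LR = {lr:,.0f} nmol/min; VmaxC = {vmax_c(lr, 70):,.0f} nmol/min/kg")
```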
Although generally estimated from in vitro studies, the Michae-
lis–Menten constant, KM, is often used directly in PBPK models after
conversion to appropriate units as necessary. Usually studies in the
published literature report KM values that were determined in aque-
ous solution, but for use in a PBPK model it is more biologically
relevant and accurate to express KM in terms of the concentration in
venous blood at equilibrium with liver (assuming liver is the tissue of
interest). This is accomplished by dividing the KM determined in
aqueous suspension by the liver:blood partition coefficient for the
chemical that is used in the PBPK model, i.e.,
PBPK KM = aqueous suspension KM / liver:blood PC.    (9)

This procedure assumes that the concentration determined in the in vitro suspension adequately represents the concentration
in liver that results in the half-maximal rate of metabolism (34).
In all cases where metabolism parameters are determined
in vitro using tissues from experimental animals, these parameters
are extrapolated from the experimental animals to humans using
the procedures outlined in Subheading 3.2 and illustrated in Sub-
heading 4.2 of this chapter. While the above discussion has focused
on point estimates which may be thought of in the context of
measures of central tendency, the impact of interindividual varia-
bility in metabolic rate parameters on PBPK model predictions used
in risk assessment has been explored and illustrated in several
publications (35–37).

4. Examples

4.1. Dose Scaling Table 2 below illustrates the results of BW0.75 scaling of a hypo-
thetical dose of 10 mg/kg in each of the four species of experimen-
tal animal to a “toxicologically equivalent” dose in a 70 kg human
using equations (2) or (3) and (5) from Subheading 3.1. The
significant assumptions and limitations inherent in this procedure
are discussed in Subheading 5.1.

4.2. Parameter Scaling    Example 1—Cross-Species Extrapolation of Partition Coefficients: Tissue-to-blood partition coefficients for humans are typically estimated by dividing the tissue-to-air partition coefficient obtained
using rodent tissues by the blood-to-air partition coefficient using
human blood which is readily obtainable compared to human tissue.

Table 2
Use of the Dosimetric Adjustment Factor (DAF) in derivation
of a human equivalent dose (HED) from an oral animal
exposure

| Animal species | BWa (kg) | BWh (kg) | Animal dose (mg/kg/day) | DAF, Eq. (2) | DAF, Eq. (3) | HED (mg/kg/day), Eq. (5) |
| Mouse | 0.025 | 70 | 10 | 0.137 | 0.137 | 1.37 |
| Rat | 0.25 | 70 | 10 | 0.244 | 0.244 | 2.44 |
| Rabbit | 3.5 | 70 | 10 | 0.473 | 0.473 | 4.73 |
| Dog | 12 | 70 | 10 | 0.643 | 0.643 | 6.43 |
Note that Eqs. (2) and (3), i.e., DAF = (BWa/BWh)^0.25 and DAF = (BWh/BWa)^-0.25, yield the same result

Table 3
Calculation of human tissue-to-blood partition coefficients (PC) for toluene^a

| Tissue | Rat experimental tissue:air PC | Calculated human tissue:blood PC |
| Liver^b | 42.3 | 42.3/13.9 = 3.04 |
| Liver^c | 83.6 | 83.6/13.9 = 6.01 |
| Muscle^b | 27.8 | 27.8/13.9 = 2.00 |
| Muscle^c | 27.7 | 27.7/13.9 = 1.99 |
| Brain^b | 36.1 | 36.1/13.9 = 2.60 |
| Lung^b | 23.9 | 23.9/13.9 = 1.72 |
| Stomach^b | 55.1 | 55.1/13.9 = 3.96 |
| Skin^b | 22.9 | 22.9/13.9 = 1.65 |
^a Partition coefficients were measured using the vial equilibration technique. Thrall et al. (38) reported a human blood:air PC of 13.9
^b Reported in Thrall et al. (38)
^c Reported in Gargas et al. (39)

Human tissue:blood PC = animal tissue:air PC / human blood:air PC.    (10)
This is illustrated for the volatile organic chemical toluene in Table 3
below. Assumptions and limitations of this approach are discussed in
Subheading 5.2.
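Equation (10) applied to a few of the toluene values in Table 3 (a minimal check; the PCs are from Thrall et al. (38)):

```python
HUMAN_BLOOD_AIR_PC = 13.9  # human blood:air PC for toluene (38)

# Rat tissue:air PCs for toluene (38)
rat_tissue_air = {"liver": 42.3, "muscle": 27.8, "brain": 36.1, "lung": 23.9}

# Eq. (10): human tissue:blood PC = animal tissue:air PC / human blood:air PC
human_tissue_blood = {t: pc / HUMAN_BLOOD_AIR_PC
                      for t, pc in rat_tissue_air.items()}

for tissue, pc in human_tissue_blood.items():
    print(f"{tissue}: {pc:.2f}")  # liver 3.04, muscle 2.00, brain 2.60, lung 1.72
```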
Example 2—Cross-Species Scaling of Michaelis–Menten Vmax and
First-Order Rate (ka, kf) Constants: Michaelis–Menten rate parameters
can be estimated from in vivo experimental animal pharmacokinetic data
using a PBPK model. In this case, BW0.75 scaling is generally used, such that
Human Vmax (Vmaxh, in mass/time) = animal Vmax (mass/time/kg BW) × (BWh)^0.75.    (11)
In the case of first-order processes for absorption (ka) or metabolism (kf) that have units of reciprocal time (e.g., h^-1), scaling is to the -0.25 power such that (Table 4)

Human ka or kf (kah or kfh, in time^-1) = animal ka or kf × (BWa)^0.25 / (BWh)^0.25.    (12)

4.3. In Vitro to In Vivo Scale-Up    Example: Scaling Vmax and KM for Chloroform (CHCl3) from In Vitro Data Obtained Using Human Microsomes (34):

LR = in vitro Vmax × MEGL × LW
   = 5.24 pmol CHCl3/min/pmol CYP2E1 × 2,562 pmol CYP2E1/g liver × 1,820 g liver
   = 24,433,281.6 pmol CHCl3/min/whole liver.

Table 4
Examples of cross-species scaling of rate parameters from a variety of in vivo pharmacokinetic data

BDCM (40)
  Animal parameter: Vmaxa = 12.8 mg/h/kg
  Scaled human parameter: assuming a 70 kg human, Vmaxh = 12.8 × (70)^0.75 = 310 mg/h
  Data used for estimation: Br ion concentration in blood of rats at the end of 4-h continuous exposure to 50, 100, 150, 200, 400, 800, 1,200, 1,600, or 3,200 ppm bromodichloromethane (BDCM)

Benzene (41)
  Animal parameter: Vmaxa = 14.0 µmol/h/kg
  Scaled human parameter: assuming a 70 kg human, Vmaxh = 14.0 × (70)^0.75 = 339 µmol/h
  Data used for estimation: a set of curves for disappearance of benzene from a closed vapor uptake chamber for male B6C3F1 mice exposed to initial benzene concentrations of 440, 1,250, or 2,560 ppm

Methyl chloroform (42)
  Animal parameter: kf = 7.8/h for a 0.225 kg rat
  Scaled human parameter: assuming a 70 kg human, kfh = kfa × (BWa)^0.25/(BWh)^0.25 = 7.8 × (0.225)^0.25/(70)^0.25 = 1.86/h
  Data used for estimation: a set of curves for disappearance of methyl chloroform from a closed vapor uptake chamber for male F344 rats exposed to initial concentrations of 0.2, 1.0, 10, or 210 ppm methyl chloroform

Note: If Vmax estimated from in vivo animal data is given in units of mass/time in a study, then the conversion is Vmaxh = Vmaxa × (BWh)^0.75/(BWa)^0.75
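The three conversions in Table 4 can be verified in a few lines (all values from the table; the capacities scale by the 0.75 power, the first-order constant by the -0.25 power of body weight):

```python
BW_H = 70.0     # kg, assumed human body weight
BW_RAT = 0.225  # kg, rat in the methyl chloroform study

vmax_bdcm = 12.8 * BW_H ** 0.75           # BDCM, mg/h
vmax_benzene = 14.0 * BW_H ** 0.75        # benzene, umol/h
kf_human = 7.8 * (BW_RAT / BW_H) ** 0.25  # methyl chloroform, 1/h

print(round(vmax_bdcm), round(vmax_benzene), round(kf_human, 2))  # 310 339 1.86
```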

For this example the units of time, mass of chloroform, organ or body mass, and volume used in the PBPK model were h, mg, kg, and L, respectively. Thus to convert the above figure to units appropriate for the model (Table 5),

LR = 24,433,281.6 pmol CHCl3/min × 60 min/h × 119.4 pg CHCl3/pmol × 10^-9 mg/pg = 175 mg/h/liver.
Converting this LR to VmaxC,

VmaxC = 175 mg/h/liver ÷ (70 kg)^0.75 = 8.9 mg/h/kg.

Table 5
Data used for in vitro to in vivo extrapolation of Vmax and KM for CHCl3

| Item | Value and units | Source |
| In vitro Vmax | 5.24 pmol CHCl3/min/pmol CYP2E1 | (43) |
| In vitro KM | 18.27 µg/L | (43) |
| CYP2E1 concentration | 2,562 pmol CYP2E1/g liver | (35) |
| Body mass | 70 kg | Assumed value |
| Liver % of BW | 2.6% | (33) |
| Liver:blood PC | 1.6 | (34) |
| Time | 60 min/h | — |
| FW CHCl3 | 119.4 pg CHCl3/pmol | — |
| Liver weight (LW) | 70 kg × 0.026 = 1.82 kg (1,820 g) | Calculated value |

To convert the in vitro KM determined in aqueous solution to a value for use in the PBPK model,

PBPK KM = aqueous suspension KM / liver:blood PC
        = 18.27 µg/L / 1.6
        = 11.4 µg/L × (1 mg / 10^3 µg)
        = 0.012 mg/L.
An important consideration for IVIVE estimates of this type is how well they compare to estimates derived using other methods. For example, the VmaxC calculated above (8.9 mg/h/kg) compares well (within twofold) with other VmaxC values developed for chloroform of 7 mg/h/kg (44) and 15.7 mg/h/kg (45).
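The chloroform example can be chained end to end. The sketch below uses the Table 5 values (with the in vitro KM read as 18.27 µg/L, which is what makes the conversion through Eq. (9) to roughly 0.01 mg/L dimensionally consistent):

```python
# Values from Table 5
VMAX_IN_VITRO = 5.24    # pmol CHCl3/min/pmol CYP2E1
CYP2E1_PER_G = 2562     # pmol CYP2E1/g liver
LIVER_G = 1820          # g (2.6% of a 70 kg body weight)
FW_PG_PER_PMOL = 119.4  # pg CHCl3/pmol
KM_UG_L = 18.27         # aqueous-suspension KM
LIVER_BLOOD_PC = 1.6

# Whole-liver rate in pmol/min, then converted to mg/h
lr_pmol_min = VMAX_IN_VITRO * CYP2E1_PER_G * LIVER_G
lr_mg_h = lr_pmol_min * 60 * FW_PG_PER_PMOL * 1e-9  # 60 min/h, 1e-9 mg/pg

# Eq. (9): reference KM to venous blood leaving the liver, in mg/L
km_pbpk_mg_l = KM_UG_L / LIVER_BLOOD_PC / 1000  # 1,000 ug per mg

print(f"LR = {lr_mg_h:.0f} mg/h/liver, PBPK KM = {km_pbpk_mg_l:.4f} mg/L")
```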

5. Notes

1. A fundamental, but easy-to-overlook, principle if one is focused on numerical analysis is that the quality of any parameter estimate is only as reliable as the data on which it is based. For this
reason, the underlying biochemical, physiological, or toxico-
logical data used for parameter estimation or cross-species scal-
ing should be evaluated carefully for appropriateness of

experimental design and validity of methodology. The latter includes but is not limited to in vitro and in vivo experimental
techniques, analytical chemistry methodology, and statistical
analysis.
2. Dose Scaling - Pharmacokinetic dose scaling based on BW0.75 is a
scientifically sound default option when it is necessary to perform
cross-species scaling to estimate “toxicologically equivalent
doses” in the absence of sufficient chemical-specific data. If the
dose scaled is one that elicited a toxic response in the experimental
animal, this may be a function of either or both pharmacokinetic
and pharmacodynamic processes. Rhomberg and Lewandowski
(7) have noted that allometric relationships for cross-species scal-
ing of pharmacodynamic processes have received relatively little
study and thus represent an important source of uncertainty that
merits further study. Some analyses have demonstrated that there
are certain conditions under which default allometrically based
dose scaling may yield clearly erroneous estimates and other
instances where significant uncertainties exist.
3. Dose Scaling - Overall, BW0.75 scaling is most reliable for
situations in which the chemical itself or its stable metabolite
is the putative toxic agent and clearance follows first-order
processes, i.e., when the appropriate dose metric is area under
the curve (6, 46). Conversely, when the putative toxic agent is a
highly reactive chemical or a metabolite that reacts with cellular
components at the site of formation, BW0.75 scaling is less
reliable (8, 46, 47).
4. Dose Scaling - Scaling on the basis of delivered dose per unit
surface area is preferable to BW3/4 scaling in cases where toxic
effects are mediated by direct action of a chemical or its metabo-
lites on tissues at the site of first contact. These are often referred
to as “portal-of-entry” effects. This could apply to the respiratory
tract for inhalation exposures, skin in the case of dermal exposures,
and epithelial tissues lining the gastrointestinal tract for oral expo-
sures (10, 25).
5. Dose Scaling - Evidence also suggests that BW0.75 scaling is not
applicable in cases where toxicity is a consequence of an acute
high-level exposure that suddenly overwhelms normal physiolog-
ical processes without opportunity for repair or regeneration.
Analysis by Rhomberg and Wolff (48) suggests that this is the
case among small laboratory rodents for lethality as a consequence
of acute oral exposure. Further analysis by Burzala-Kowalczyk and
Jongbloed (49) extended this observation to larger species (e.g.,
monkeys, dogs, rabbits) and parenteral routes of exposure (e.g.,
intramuscular, intravenous, intraperitoneal). In both cases direct
body weight scaling (BW1/1) fits the data better than BW0.75
scaling (48, 49).

6. Dose Scaling - BW scaling across life stages can engender considerable uncertainty. Within species, dose scaling for
effects observed in mature organisms to immature organisms
is most reliable when there is mechanistic data on the physio-
logical processes affected with corresponding data on the matu-
rity of these systems in the life stage of interest. Some recent
analyses suggest that BW0.75 scaling is descriptive of various
toxicokinetic differences observed for pharmaceuticals across
ages in humans down to about 6 months of age (50–52).
Toxicodynamic differences across life stages are a largely unex-
plored research area (7). In addition, differences across species
in patterns of development can have considerable impact on
interspecies dose scaling from experimental animals to imma-
ture humans (53–55).
7. Parameter Scaling - The reliability of parameter scaling will be
highly dependent on the reliability of the experimental tech-
nique used to obtain the data and its applicability to the chemi-
cal in question. For example, for volatile chemicals the in vitro
vial equilibration technique is well established as an experimen-
tal method to obtain estimates of tissue partitioning in a variety
of media (39). Analogous in vitro experimental techniques for
estimating partitioning of nonvolatile chemicals (e.g., using
equilibrium dialysis or ultrafiltration) are fewer and not as
widely used (56, 57). It is also feasible to use in vivo pharmaco-
kinetic data to estimate partition coefficients for nonvolatile
chemicals, e.g., the area method of Gallo et al. (58), provided
that measurements are obtained under steady-state conditions.
8. Parameter Scaling - An important caveat in the interpretation
of all experimental data used to estimate tissue partitioning is
that the estimates themselves may be compromised if other
physiological effectors (e.g., metabolism, binding, presence of
residual blood) are not accounted for in the experimental
design or analysis of the data (59, 60). In the case of partition
coefficients, there are also a number of computational algo-
rithms that have been developed to estimate them on the basis
of tissue composition and physicochemical characteristics in
the absence of direct experimental data (e.g., (61, 62)). Appli-
cation and limitations for some of these approaches are dis-
cussed in Chapter 6.
9. Parameter Scaling - In the case where rate constants are esti-
mated from in vivo animal pharmacokinetic data, reliable
parameter estimation is only possible when the parameter is
sensitive to changes in the type of experimental endpoints
being measured. Vapor uptake data, typically a series of curves
over a range of initial starting concentrations measuring a
decline in air concentration in a chamber due to metabolism
by live rodents, are a good example of this. Vapor uptake data

are very sensitive to changes in Vmax, but typically much less so to changes in KM. Techniques, such as sensitivity analysis (see Chapters 17 and 18), are useful to determine whether a
particular type of data (response) is sufficiently sensitive (under
a given set of experimental conditions) to a change in the value
of a parameter to be useful to estimate that parameter.
10. Parameter Scaling - Another important aspect of metabolic
parameter estimation using in vivo data (that also applies to
in vitro data) is matching what is measured experimentally
to the scale or level of detail at which it is desired to describe
metabolism in the PBPK model. For example, experimental
techniques that measure the disappearance of a parent chemical
can provide a good estimate of overall metabolic clearance.
However, these same data would yield relatively less accurate
rate estimates of individual downstream metabolic reactions.
11. IVIVE - Reliable in vitro to in vivo scale-up requires accurate
scaling factors and thus a thorough understanding of the
biological and experimental factors that can influence their
estimation. This is particularly important if the parameter
being estimated from in vitro data is subject to variation that
may impact model predictions for pharmacokinetic responses
(i.e., dose metrics such as AUC) to be used in quantitative risk
assessment. The discussion here focuses on some more general
factors that impact IVIVE as well as the scaling factors them-
selves—microsomal protein (MPPGL) and cytosolic protein
(CPPGL) per gram of liver and HPGL.
12. IVIVE - It is a generally recognized principle that in vitro
pharmacokinetic measurements intended for extrapolation to
the level of the intact or whole organism (animal or human)
should be done under experimental conditions that mimic
the in vivo situation as closely as possible (35). Thus, it is critical
that the concentrations used for studies are within the range
of tissue concentrations that are either (1) observed in the
intact organism or (2) predicted on the basis of a kinetic
model under realistic exposure conditions. Another basic prin-
ciple is that the experimental conditions under which enzyme
activity was determined must be carefully evaluated in terms of
biological reality and the impact this may have on numerical
values reported. This will help ensure that methodology-
specific limitations are appropriately identified to distinguish
experimental artifacts from inherent biological characteristics.
This is necessary because experimental conditions are usually
designed to mitigate limiting factors, i.e., pH, buffering (ionic
strength), temperature, oxygen tension, cofactors and substrate
concentration, etc., are adjusted to optimize metabolic trans-
formation (12).

13. IVIVE - Ideally, scaling factors (MPPGL, CPPGL, and HPGL) used should be obtained from the same source (publication) or
matched to the same experimental model (e.g., a specific strain,
age, gender of rat) from which the enzymatic data are obtained.
In practice, such data are rarely provided in published manu-
scripts because either they were not directly relevant to the
goals of the original experiment or were not feasible to deter-
mine in the in vitro system used (e.g., vendor-purchased hepa-
tocytes). This necessitates the use of average or default scaling
factors from other sources and requires an understanding of
factors which may impact their value. In general, MPPGL
seems to exhibit greater variability compared to HPGL. For
example, Carlile et al. (31) reported that in rats treated with
different enzyme inducers (e.g., phenobarbital, dexametha-
sone), hepatocellularity was essentially unchanged compared
to untreated rats, whereas recovery of microsomal protein
varied by twofold. It is also worthy of note that scaling to the
level of the whole organism from other experimental systems,
e.g., using data from recombinantly expressed enzyme systems,
especially the cytochrome P450 enzymes, is complicated by
differences in lipid matrix in this in vitro system compared to
intact hepatocytes or microsomes. These issues are more fully
discussed in ref. 12.
14. IVIVE - Barter et al. (63) reviewed the literature with the goal
of establishing average consensus values for MPPGL and
HPGL in adult human liver and estimates of variability as well
as significant covariates. On the basis of a meta-analytic
approach, they recommended weighted mean values of
32 mg/g liver (95% CI 29–34 mg/g) and 99 × 10^6 cells/g liver (95% CI 74–131 × 10^6 cells/g liver) for MPPGL and
HPGL, respectively. The authors reported a weak, but statisti-
cally significant, negative correlation between age and both
MPPGL and HPGL, whereas no statistically significant relationship was reported for other covariates, such as gender,
smoking status, or alcohol consumption.
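The arithmetic of these scaling factors is straightforward. A minimal sketch, using the consensus MPPGL from Barter et al. (63) together with a hypothetical microsomal intrinsic clearance and an assumed liver weight (the latter two are illustrative values, not from the cited study):

```python
# Microsome-to-whole-liver scaling (IVIVE) sketch.
# MPPGL is the weighted mean from Barter et al. (63); the substrate CLint
# and the liver weight are hypothetical, illustrative inputs.

MPPGL = 32.0             # mg microsomal protein per g liver (ref. 63)
LIVER_WEIGHT_G = 1800.0  # assumed adult human liver weight, g

def scale_microsomal_clint(clint_ul_min_mg, mppgl=MPPGL, liver_g=LIVER_WEIGHT_G):
    """Scale microsomal intrinsic clearance (uL/min/mg protein)
    to whole-liver intrinsic clearance (L/h)."""
    ul_per_min = clint_ul_min_mg * mppgl * liver_g  # uL/min, whole liver
    return ul_per_min * 60.0 / 1.0e6                # convert to L/h

# Hypothetical substrate: CLint = 10 uL/min/mg microsomal protein
print(round(scale_microsomal_clint(10.0), 2))  # 34.56 (L/h)
```

The same pattern applies with HPGL when the in vitro data come from hepatocyte suspensions rather than microsomes.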
15. IVIVE - Some studies have been conducted with the goal of
evaluating the predictive value of different in vitro systems
compared to in vivo data for metabolic clearance. These studies
illustrate some of the issues which arise with various methodol-
ogies. For example, Houston (64) reported that for low-clearance drugs both hepatocyte and microsomal systems
were reasonably predictive of in vivo metabolic clearance in
rats. For high-clearance drugs, rat hepatocytes were also pre-
dictive of in vivo metabolic clearance whereas estimates based
on rat microsomal data tended to underestimate in vivo clear-
ance. Based on their review of the literature, these authors also
reported values of MPPGL and HPGL of 45 mg/gram liver and 1.35 × 10^8 hepatocytes/gram liver for the rat.
19 Interspecies Extrapolation 517
16. IVIVE - Andersson et al. (30) compared the metabolic clearance
of the antiarrhythmic drug, Almokalant, predicted from multiple
in vitro systems (microsomes, tissue slices, hepatocytes, and
cDNA-expressed P450 enzymes) with in vivo data from human
subjects; they reported that all the in vitro systems evaluated
under-predicted in vivo clearance. The authors suggested that
for drugs that are extensively conjugated (like Almokalant which
is predominantly glucuronidated) under-prediction could be
accounted for by differences in access to UDP-glucuronic acid,
latency of the enzyme in the endoplasmic reticulum, or hydrolysis
by other enzymes in the incubation system. In studies with Diaz-
epam (an antianxiety drug), Carlile et al. (31) presented evidence
that the under-prediction of in vivo clearance using rat liver micro-
somes was attributable to product inhibition, i.e., accumulation of
initially formed metabolites in the medium effectively inhibited
further metabolism of the parent compound. Other investigators
have demonstrated the importance of quantifying and accounting
for factors in the experimental system such as binding to proteins
or nonbiological components in the experimental system (e.g.,
(65–67)). Finally it should also be noted that results from in vitro
metabolism studies are most useful for extrapolation purposes
when integrated into a PBPK modeling framework. This is
because PBPK models allow consideration and quantitative eval-
uation of other physiological factors such as binding to plasma
proteins and consideration of blood flow to the liver which may be
as or more influential than metabolism in determining relevant
dose metrics for purposes of risk analysis (12, 67, 68).
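The interplay noted here between metabolism and hepatic blood flow can be illustrated with the classic well-stirred liver model; the parameter values below are hypothetical and are not taken from the cited studies:

```python
# Well-stirred liver model sketch: CLh = Q * fu * CLint / (Q + fu * CLint).
# All parameter values are hypothetical illustrations.

def hepatic_clearance(q_liver, fu, clint):
    """Hepatic clearance (L/h) from blood flow Q, unbound fraction fu,
    and whole-liver intrinsic clearance CLint (flows in L/h)."""
    return q_liver * fu * clint / (q_liver + fu * clint)

Q = 90.0   # assumed human hepatic blood flow, L/h
FU = 1.0   # unbound fraction in blood (no plasma protein binding assumed)

for clint in (10.0, 100.0, 1000.0):
    print(clint, round(hepatic_clearance(Q, FU, clint), 1))
# As CLint rises, CLh approaches Q: for high-clearance compounds, liver
# blood flow rather than metabolism controls the predicted clearance.
```

This is one reason why errors in in vitro intrinsic clearance can matter little for high-clearance compounds once the estimate is embedded in a PBPK framework.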

Disclaimer

This manuscript has been reviewed in accordance with the policy of
the National Health and Environmental Effects Research Labora-
tory, US Environmental Protection Agency, and approved for pub-
lication. Approval does not signify that the contents necessarily
reflect the views and policies of the Agency, nor does mention of
trade names or commercial products constitute endorsement or
recommendation for use.

References
1. Dedrick RL (1973) Animal scale-up. J Pharmacokinet Biopharm 1:435–461
2. U.S. EPA (U.S. Environmental Protection Agency) (1992) Draft report: a cross-species scaling factor for carcinogen risk assessment based on equivalence of mg/kg^(3/4)/day. Notice Fed Reg 57:24152–24173
3. Kleiber M (1932) Body size and metabolism. Hilgardia 6:315–353
4. Kleiber M (1947) Body size and metabolic rate. Physiol Rev 27:511–541
5. Kleiber M (1961) The fire of life: an introduction to animal energetics. Wiley, New York, NY
6. O’Flaherty EJ (1989) Interspecies conversion of kinetically equivalent doses. Risk Anal 9:587–598
7. Rhomberg LR, Lewandowski TA (2006) Methods for identifying a default cross-species scaling factor. Hum Ecol Risk Assess 12:1094–1127
8. Travis CC, White RK (1988) Interspecies scaling of toxicity data. Risk Anal 8:119–125
9. U.S. EPA (2005) Guidelines for carcinogen risk assessment. EPA/630/P-03/001F. Risk Assessment Forum, Washington, DC
10. U.S. EPA (2011) Harmonization in interspecies extrapolation: use of body weight^(3/4) as the default method in derivation of the oral reference dose. EPA/100/R11/0001. Risk Assessment Forum, Washington, DC
11. Clewell HJ, Reddy MB, Lave T, Andersen ME (2008) Physiologically based pharmacokinetic modeling. In: Gad SC (ed) Preclinical development handbook: ADME and biopharmaceutical properties. Wiley, New York, NY, pp 1167–1227
12. Lipscomb JC, Poet TS (2008) In vitro measurements of metabolism for application in pharmacokinetic modeling. Pharmacol Ther 118:82–103
13. Matthews JC (1993) Fundamentals of receptor, enzyme and transport kinetics. CRC, Boca Raton, FL
14. Cornish-Bowden A (1995) Analysis of enzyme kinetic data. Oxford University Press, Oxford
15. Cornish-Bowden A (2004) Fundamentals of enzyme kinetics, 3rd edn. Portland Press, London
16. Rubner M (1883) Über den Einfluss der Körpergrösse auf Stoff- und Kraftwechsel. Zeit Biol 19:536–562
17. McMahon TA (1975) Using body size to understand the structural design of animals: quadrupedal locomotion. J Appl Physiol 39:619–627
18. West GB, Brown JH, Enquist BJ (1997) A general model for the origin of allometric scaling laws in biology. Science 276:122–126
19. West GB, Woodruff WH, Brown JH (2002) Allometric scaling of metabolic rate from molecules and mitochondria to cells and mammals. Proc Natl Acad Sci U S A 99(Suppl 1):2473–2478
20. Banavar JR, Maritan A, Rinaldo A (1999) Size and form in efficient transportation networks. Nature 399:130–131
21. Bejan A (2000) Shape and structure, from engineering to nature. Cambridge University Press, Cambridge
22. White CR, Seymour RS (2003) Mammalian basal metabolic rate is proportional to body mass^(2/3). Proc Natl Acad Sci U S A 100:4046–4049
23. Dodds PS, Rothman DH, Weitz JS (2001) Re-examination of the “3/4-law” of metabolism. J Theor Biol 209:9–27
24. IPCS (International Programme on Chemical Safety) (2005) Guidance document for the use of data in development of chemical-specific adjustment factors (CSAFs) for interspecies differences and human variability in dose/concentration-response assessment. World Health Organization, Geneva
25. U.S. EPA (1994) Methods for derivation of inhalation reference concentrations and application of inhalation dosimetry. EPA/600/8-90/066F. Environmental Criteria and Assessment Office, Washington, DC
26. Boxenbaum H (1982) Interspecies scaling, allometry, physiological time, and the ground plan of pharmacokinetics. J Pharmacokinet Biopharm 10:201–227
27. Reddy MB, Yang RSH, Clewell HJ, Andersen ME (2005) Physiologically based pharmacokinetic modeling—science and applications. Wiley Interscience, Hoboken, NJ
28. Adolph EF (1949) Quantitative relations in the physiological constitutions of mammals. Science 109:579–585
29. Mordenti J (1986) Man versus beast: pharmacokinetic scaling in mammals. J Pharm Sci 75:1028–1040
30. Andersson TB, Sjoberg H, Hoffman K-J, Boobis AR, Watts P, Edwards RJ, Lake BG, Price RJ, Renwick AB, Gomez-Lechon MJ, Castell JV, Ingelman-Sundberg M, Hidestrand M, Goldfarb PS, Lewis DFV, Corcos L, Guillouzo A, Taavitsainen P, Pelkonen O (2001) An assessment of human liver-derived in vitro systems to predict the in vivo metabolism and clearance of almokalant. Drug Metab Dispos 29:712–720
31. Carlile DJ, Zomorodi K, Houston JB (1997) Scaling factors to relate drug metabolic clearance in hepatic microsomes, isolated hepatocytes and the intact liver—studies with induced livers involving diazepam. Drug Metab Dispos 25:903–911
32. Tang W, Wang RW, Lu AYH (2005) Utility of recombinant cytochrome P450 enzymes: a drug metabolism perspective. Curr Drug Metab 6:503–517
33. Brown RP, Delp MD, Lindstedt SL, Rhomberg LR, Beliles RP (1997) Physiological parameter values for physiologically based pharmacokinetic models. Toxicol Ind Health 13:407–484
34. U.S. EPA (Lipscomb JC, Kedderis GL) (2005) Use of physiologically based pharmacokinetic models to quantify the impact of human age and interindividual differences in physiology and biochemistry pertinent to risk: final report for cooperative agreement. ORD/NCEA, Cincinnati, OH. EPA/600/R-06-014A
35. Lipscomb JC, Teuschler LK, Swartout JC, Popken D, Cox T, Kedderis GL (2003) The impact of cytochrome P450 2E1-dependent metabolic variance on a risk relevant pharmacokinetic outcome in humans. Risk Anal 23:1221–1238
36. Lipscomb JC, Kedderis GL (2002) Incorporating human interindividual biotransformation variance in health risk assessment. Sci Total Environ 288:12–21
37. Lipscomb JC (2004) Evaluating the relationship between variance in enzyme expression and toxicant concentration in health risk assessment. Hum Ecol Risk Assess 10:39–55
38. Thrall KD, Gies RA, Muniz J, Woodstock AD, Higgins G (2002) Route-of-entry and brain tissue partition coefficients for common superfund contaminants. J Toxicol Environ Health Part A 65:2075–2086
39. Gargas ML, Burgess RJ, Voisard DE, Cason GH, Andersen ME (1989) Partition coefficients of low-molecular-weight volatile chemicals in various liquids and tissues. Toxicol Appl Pharmacol 98:87–99
40. Lilly PD, Andersen ME, Ross TM, Pegram RA (1997) Physiologically based estimation of in vivo rates of bromodichloromethane metabolism. Toxicology 124:141–152
41. Kenyon EM, Kraichely RE, Hudson KT, Medinsky MA (1996) Differences in rates of benzene metabolism correlate with observed genotoxicity. Toxicol Appl Pharmacol 136:649–656
42. Gargas ML, Andersen ME, Clewell HJ (1986) A physiologically based simulation approach for determining metabolic constants from gas uptake data. Toxicol Appl Pharmacol 86:341–352
43. Lipscomb JC, Barton H, Tornero-Velez R (2004) The metabolic rate constants and specific activity of human and rat hepatic cytochrome P450 2E1 toward chloroform. J Toxicol Environ Health 67:537–553
44. Delic JI, Lilly PD, MacDonald AJ, Loizou GD (2000) The utility of PBPK in the safety assessment of chloroform and carbon tetrachloride. Regul Toxicol Pharmacol 32:144–155
45. Corley RA, Mendrala AL, Smith FA et al (1990) Development of a physiologically based pharmacokinetic model for chloroform. Toxicol Appl Pharmacol 103:512–527
46. Beck BD, Clewell HJ III (2001) Uncertainty/safety factors in health risk assessment: opportunities for improvement. Hum Ecol Risk Assess 7:203–207
47. Travis CC (1990) Tissue dosimetry for reactive metabolites. Risk Anal 10:317–321
48. Rhomberg LR, Wolff SK (1998) Empirical scaling of single oral lethal doses across mammalian species based on a large database. Risk Anal 18:741–753
49. Burzala-Kowalczyk L, Jongbloed G (2011) Allometric scaling: analysis of LD50 data. Risk Anal 31:523–532
50. Ginsberg G, Hattis D, Sonawane B, Russ A, Banati P, Kozlak M, Smolenski S, Goble R (2002) Evaluation of child/adult pharmacokinetic differences from a database derived from the therapeutic drug literature. Toxicol Sci 66:185–200
51. Ginsberg G, Hattis D, Miller R, Sonawane B (2004) Pediatric pharmacokinetic data: implications for environmental risk assessment for children. Pediatrics 113(Suppl):973–983
52. Hattis D (2004) Role of dosimetric scaling and species extrapolation in evaluating risks across life stages. IV. Pharmacodynamic dosimetric considerations. Report to the U.S. Environmental Protection Agency under RFQ No DC-03-00009
53. Finlay BL, Darlington RB (1995) Linked regularities in the development and evolution of mammalian brains. Science 268:1578–1584
54. Renwick AG, Lazarus NR (1998) Human variability and noncancer risk assessment—an analysis of the default uncertainty factor. Regul Toxicol Pharmacol 27:3–20
55. Clancy B, Darlington RB, Finlay BL (2001) Translating developmental time across mammalian species. Neuroscience 105:7–17
56. Krishnan K, Andersen ME (1994) Physiologically based pharmacokinetic modeling in toxicology. In: Hayes AW (ed) Principles and methods of toxicology, 3rd edn. Raven Press, New York, NY, pp 149–188
57. Jepson GW, Hoover DK, Black RK, McCafferty JD, Mahle DA, Gearhart JM (1994) A partition coefficient determination method for nonvolatile chemicals in biological tissues. Fundam Appl Toxicol 22:519–524
58. Gallo JM, Lam FC, Perrier DG (1987) Area method for the estimation of partition coefficients for physiological pharmacokinetic models. J Pharmacokinet Biopharm 15:271–280
59. Teo SKO, Kedderis GL, Gargas ML (1994) Determination of tissue partition coefficients for volatile tissue-reactive chemicals: acrylonitrile and its metabolite 2-cyanoethylene oxide. Toxicol Appl Pharmacol 128:92–96
60. Khor SP, Mayersohn M (1991) Potential error in the measurement of tissue to blood distribution coefficients in physiological pharmacokinetic modeling: residual tissue blood. I. Theoretical considerations. Drug Metab Dispos 19:478–485
61. Poulin P, Krishnan K (1995) An algorithm for predicting tissue:blood partition coefficients of organic chemicals from n-octanol:water partition coefficient data. J Toxicol Environ Health 46:117–129
62. Poulin P, Krishnan K (1996) A mechanistic algorithm for predicting blood:air partition coefficients of organic chemicals with the consideration of reversible binding in hemoglobin. Toxicol Appl Pharmacol 136:131–137
63. Barter ZE, Bayliss MK, Beaune PH, Boobis AR, Carlile DJ, Edwards RJ, Houston JB, Lake BG, Lipscomb JC, Pelkonen OR, Tucker GT, Rostami-Hodjegan A (2007) Scaling factors for the extrapolation of in vivo metabolic drug clearance from in vitro data: reaching a consensus on values of human microsomal protein and hepatocellularity per gram of liver. Curr Drug Metab 8:33–45
64. Houston JB (1994) Utility of in vitro drug metabolism data in predicting in vivo metabolic clearance. Biochem Pharmacol 47:1469–1479
65. Blaauboer BJ (2010) Biokinetic modeling and in vitro-in vivo extrapolations. J Toxicol Environ Health Part B 13:242–252
66. Howgate EM, Yeo KR, Proctor NJ, Tucker GT, Rostami-Hodjegan A (2006) Prediction of in vivo drug clearance from in vitro data. I. Impact of inter-individual variability. Xenobiotica 36:473–497
67. Obach RS (1999) Prediction of human clearance of twenty-nine drugs from hepatic microsomal intrinsic clearance data: an examination of in vitro half-life approach and nonspecific binding to microsomes. Drug Metab Dispos 27:1350–1359
68. Kedderis GL (1997) Extrapolation of in vitro enzyme induction data to human in vivo. Chem-Biol Interact 107:109–121
Chapter 20

Population Effects and Variability


Jean Lou Dorne, Billy Amzal, Frédéric Bois, Amélie Crépet,
Jessica Tressou, and Philippe Verger

Abstract
Chemical risk assessment for human health requires a multidisciplinary approach through four steps: hazard identification, hazard characterization, exposure assessment, and risk characterization. Hazard identification
and characterization aim to identify the metabolism and elimination of the chemical (toxicokinetics) and the
toxicological dose–response (toxicodynamics) and to derive a health-based guidance value for safe levels of
exposure. Exposure assessment estimates human exposure as the product of the amount of the chemical in
the matrix consumed and the consumption itself. Finally, risk characterization evaluates the risk of the
exposure to human health by comparing the latter with the health-based guidance value. Recently, many
research efforts in computational toxicology have been put together to characterize population variability
and uncertainty in each of the steps of risk assessment to move towards more quantitative and transparent
risk assessment. This chapter focuses specifically on modeling population variability and effects for each step
of risk assessment in order to provide an overview of the statistical and computational tools available to
toxicologists and risk assessors. Three examples are given to illustrate the applicability of those tools:
derivation of pathway-related uncertainty factors based on population variability, exposure assessment of dioxins, and dose–response modeling of cadmium.

Key words: Population variability, Risk assessment, Dose–response modeling, Benchmark dose,
Toxicokinetics, Toxicodynamics, Physiologically based models, Cadmium, Dioxins, Pathway-related
uncertainty factors, Meta-analysis, Systematic review

1. Introduction

Population variability is a critical quantitative concept across the many disciplines that model systems in the biological, medical, environmental, and social sciences and in economics, and its characterization helps risk analysts and risk assessors in the decision-making
process. In the context of chemical risk assessment, a multidisci-
plinary approach at the cross road between medicine, biochemistry,
toxicology, ecology, analytical chemistry, applied mathematics,
and bioinformatics is applied to protect humans, animals, and the
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_20, # Springer Science+Business Media, LLC 2012
environment from chemical hazards. According to Regulation
(EC) No 178/2002 of the European Parliament and of the Council
of 28 January 2002, “Hazard” is defined as a biological, chemical,
or physical agent in, or condition of, food, and “Risk” is defined as a
function of the probability of an adverse health effect and the
severity of that effect, consequential to a hazard (1). The WHO
International Program on Chemical Safety (IPCS) within its pro-
gram dealing with the harmonization of chemical risk assessment
methodologies has defined hazard as “the inherent property of an
agent or situation having the potential to cause adverse effects when
an organism, system or (sub)population is exposed to that agent”
and risk as “the probability of an adverse effect in an organism,
system or (sub)population caused under specified circumstances by
exposure to an agent” (2). To qualify and quantify such hazard and
risk, the application of the four pillars of risk assessment, namely,
hazard identification and hazard characterization, exposure assess-
ment, risk characterization have enabled scientists and public health
agencies to protect consumers from adverse health effects that may
result from acute and chronic chemical exposure (3).
Hazard identification and hazard characterization can be
regarded as the toxicological dimension of chemical risk assess-
ment: the toxicokinetics (TK) translates a chemical external dose to
an internal dose leading to overall elimination from the body, i.e.,
absorption from the gastrointestinal tract, distribution in body
fluids/tissues, metabolism, and ultimately excretion in the urine/
feces. The toxicodynamics (TD) expresses the toxicity once the
toxic species, either as the parent compound or an active/toxic
metabolite reaches its target (receptor/cell/organ) to derive a
health-based guidance value (4).
Exposure assessment is defined as the qualitative and/or quan-
titative evaluation of the likely intake of biological, chemical or
physical agents via food as well as exposure from other sources if
relevant (2). Because of the absence of adequate biomarkers, the
external exposure is often the only information available to be
calculated and compared with toxicological thresholds. There is
an overall agreement that the exposure assessment should be a
stepwise approach, each step being more conservative than the
following one but also less costly in terms of time and resources.
In the field of food safety the exposure from food is often assumed
to be the unique source of exposure. In a vast majority of cases, its
assessment consists in a deterministic approach combining the
mean and high levels of occurrence for the chemical under consid-
eration with the consumption of an average and a high consumer
for the contaminated food. The deterministic approach is quite well
described in various guidelines (2, 5) and is used routinely for its
simplicity and its conservatism. However, it does not represent an
accurate or even a realistic picture of the true exposure over the
population. In particular it does not account either for the day to
day variability of the food consumption on long term or for the
distribution of hazard within foodstuffs.
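The deterministic calculation described above can be sketched in a few lines; all occurrence, consumption, and body-weight values below are hypothetical illustrations of the approach in the cited guidelines, not data from them:

```python
# Deterministic dietary exposure sketch: occurrence x consumption / body weight.
# All input values are hypothetical.

def dietary_exposure(conc_mg_kg, consumption_g_day, body_weight_kg=70.0):
    """Exposure in mg per kg body weight per day."""
    return conc_mg_kg * (consumption_g_day / 1000.0) / body_weight_kg

mean_conc, high_conc = 0.05, 0.20    # mg contaminant per kg food
mean_cons, high_cons = 100.0, 300.0  # g food consumed per day

print(f"{dietary_exposure(mean_conc, mean_cons):.2e}")  # average scenario
print(f"{dietary_exposure(high_conc, high_cons):.2e}")  # conservative scenario
```

Probabilistic approaches replace the point values above with distributions of occurrence and consumption, addressing exactly the day-to-day variability the deterministic method ignores.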
Risk characterization consists in comparing the exposure with
one or another threshold for safety concern. The exposure could
first be compared with the Health based guidance values (e.g.,
provisional tolerable weekly intake) defined as: The estimate of the
amount of a chemical in food or drinking-water, expressed on a body
weight basis, that can be ingested [weekly] over a lifetime without
appreciable health risk to the consumer. The derivation of health-
based guidance value can use, as the point of departure, the Bench-
mark Dose Limit (BMDL) which is the lower boundary of the
confidence interval on a dose of a substance associated with a
specified low incidence of risk, of a health effect (2). The BMDL
accounts for the uncertainty in the estimate of the dose–response
that is due to characteristics of the experimental design, such as
sample size. Such comparison allows setting up safe levels of expo-
sure and consequently safe levels of food consumption and/or haz-
ard occurrence. When health effects in humans are well
characterized, the exposure could also be compared with the level
at which these effects are likely to occur or with the benchmark dose.
Such type of comparison allows estimating a burden of disease
attributable to the hazard under consideration. In a number of
cases it is necessary to develop statistical methodologies to reconcile
the dietary exposure with the results of PB-TK models describing
the fate of toxic compounds within the body. Ultimately, consistent
outputs from the two exposure models (external and internal)
should allow a robust estimation of health effects.
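For a thresholded chemical, the comparison amounts to a simple ratio of estimated exposure to the guidance value; a sketch with hypothetical numbers:

```python
# Risk characterization sketch for a thresholded chemical: compare estimated
# exposure with a health-based guidance value. PTWI and exposure values are
# hypothetical illustrations.

ptwi = 2.5             # hypothetical PTWI, ug/kg bw per week
weekly_exposure = 1.8  # hypothetical estimated dietary exposure, ug/kg bw per week

hazard_quotient = weekly_exposure / ptwi
print(round(hazard_quotient, 2),
      "below guidance value" if hazard_quotient < 1.0 else "exceeds guidance value")
# 0.72 below guidance value
```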
Beyond classes of chemicals and the risk assessment pillars,
regulators have relied upon two basic mechanistic differences to
assess human health risks related to chemical exposure, i.e., whether
the chemicals are genotoxic carcinogens or nongenotoxic carcino-
gens. Such classification constitutes the basis for the derivation of
health-based guidance values in humans and has been reviewed in
detail elsewhere (3, 4, 6). In essence, genotoxic carcinogens and
their metabolites are assumed to act via a mode of action that
involves a direct and potentially irreversible DNA-covalent binding
with a linear dose–response relationship over a chronic to life time
exposure with no threshold or dose without a potential effect. To
deal with such genotoxic carcinogens, the Margin Of Exposure
(MOE) approach has been recently used by the World Health
Organization (WHO) and the European Food Safety Authority
(EFSA). The MOE is determined as the ratio of a defined point on
a dose–response curve for adverse effects obtained in animal experi-
ments (in the absence of human epidemiological data) and human
intake data. Two reference points describing the dose–response
relationship have been proposed, namely, the preferred Benchmark
Dose (BMD) and BMDL or the T25. For the interpretation of
MOEs, the Scientific Committee of EFSA considered that an
MOE of 10,000 or more, based on a BMDL10 derived from animal
cancer bioassay data and taking into account the uncertainties in the
interpretation, “would be of low concern from a public health point of
view and might reasonably be considered as a low priority for risk
management actions” (7). Recent examples of risk assessments per-
formed by the Joint FAO/WHO Expert Committee on Food addi-
tives (JECFA) and EFSA using this approach have included ethyl
carbamate, polycyclic aromatic hydrocarbons, and arsenic (8–11).
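The MOE calculation itself is a simple ratio; a sketch with hypothetical values for the reference point and intake (neither taken from the cited assessments):

```python
# Margin of exposure (MOE) sketch: reference point / estimated human intake.
# The BMDL10 and intake values are hypothetical. EFSA's Scientific Committee
# considered an MOE of 10,000 or more, based on a BMDL10 from animal cancer
# bioassay data, to be of low public-health concern (7).

def margin_of_exposure(bmdl10, intake):
    """Both arguments in mg/kg bw per day."""
    return bmdl10 / intake

bmdl10 = 0.5     # hypothetical BMDL10, mg/kg bw per day
intake = 2.0e-5  # hypothetical dietary intake, mg/kg bw per day

moe = margin_of_exposure(bmdl10, intake)
print(round(moe), moe >= 10_000)  # 25000 True
```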
In contrast, nongenotoxic carcinogens and their metabolites
are assumed to act via an epigenetic mode of action without cova-
lent binding to DNA. A threshold level of exposure is assumed, below which no significant effects are induced, implying that
homeostatic mechanisms can counteract biological perturbations
produced by low levels of intake, and that structural or functional
changes leading to adverse effects, that may include cancer, would
be observed only at higher intakes (4). For such thresholded tox-
icants, health-based guidance values are derived and have been
defined as “without appreciable health risk” when consumed
every day or weekly for a lifetime such as the “Acceptable and
Tolerable Daily Intakes (ADI/TDI)” or provisional tolerable
weekly intake (PTWI) in Europe and the “Reference dose” in the
United States. For thresholded chemicals with acute toxic effects,
the acute reference dose approach (ARfD) has been defined by
the Joint FAO/WHO Meeting on Pesticide Residues (JMPR) as
“an estimate of the amount a substance in food and/or drinking
water, normally expressed on a body weight basis, that can be
ingested in a period of 24 h or less without appreciable health risk
to the consumer on the basis of all known facts at the time of the
evaluation” (12). Recent risk assessments in which ARfDs have been derived for humans include those based on the consumption of shellfish contaminated with marine biotoxins or of mushrooms contaminated with nicotine (13, 14). These health-based guidance values are derived
using surrogates for the threshold such as the no-observed-adverse-
effect-level (NOAEL) or the BMD or BMDL from laboratory
animal species used in risk assessment (mice, rat, rabbit, dog) and
an uncertainty factor of 100-fold, to allow for interspecies differ-
ences (10) and human variability (10).
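The derivation from a surrogate threshold and the default 100-fold factor can be sketched as follows (the NOAEL value is hypothetical):

```python
# Health-based guidance value sketch: NOAEL / (interspecies x intraspecies
# uncertainty factors). The NOAEL below is hypothetical.

def guidance_value(noael, interspecies=10.0, intraspecies=10.0):
    """e.g., an ADI/TDI in mg/kg bw per day."""
    return noael / (interspecies * intraspecies)

noael = 5.0  # hypothetical NOAEL from a chronic animal study, mg/kg bw per day
print(guidance_value(noael))  # 0.05 (mg/kg bw per day)
```

A BMDL would be substituted for the NOAEL, and chemical-specific adjustment factors for the defaults, wherever the data allow.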
Over the years, the scientific basis for the uncertainty factors
has been challenged and considerable research efforts have aimed to
further split such factors into TK subfactors (4.0 for interspecies differences and 3.16 for human variability) and TD subfactors (2.5 for interspecies differences and 3.16 for human variability) to
ultimately replace them with chemical-specific adjustment factors
(CSAF) when chemical-specific data are available and physiologi-
cally based models can be developed (Fig. 1) (15, 16). Application
of such adjustment factors is a substantial refinement in quantitative
risk assessment as it integrates information specific to the chemicals
or the studied populations, even from limited data. This integration
                  Toxicokinetics                          Toxicodynamics
Interspecies      chemical-specific adjustment factor,    chemical-specific adjustment factor,
                  or species- and pathway-related         or process- and species-related
                  uncertainty factors,                    uncertainty factors,
                  or general default 4.0                  or general default 2.5
Interindividual   chemical-specific adjustment factor,    chemical-specific adjustment factor,
                  or pathway-related uncertainty          or process-related uncertainty
                  factors,                                factors,
                  or general default 3.16                 or general default 3.16

Fig. 1. Uncertainty factors, chemical-specific adjustment factors (CSAF), pathway-related uncertainty factors, and the general default uncertainty factors. Based on ref. 15.
is made through statistical analysis of specific data with ad hoc
modeling of the population variability. An example of adjustment
factor derivation from physiological-based model for both TK and
TD aspects is the recent risk assessment of cadmium performed by
EFSA (17, 18). Intermediate options using simpler models have
also been applied. For example, when the metabolic route is known
in humans, the default uncertainty factor of 3.16 can be replaced by
pathway-related uncertainty factors which have been developed
based on meta-analyses of human variability in TK for phase I
enzymes [cytochrome P-450 isoforms (CYP1A2, CYP2E1,
CYP2C9, CYP2C19, CYP2D6, CYP3A4), hydrolysis. . .], phase
II enzymes (glucuronidation, sulfation, N-acetylation), and renal
excretion using the pharmaceutical database. These pathway-
related uncertainty factors have been derived for subgroups of the
population for whom data were available such as healthy adults
from different ethnic backgrounds (including genetic polymorphisms in CYP2D6, CYP2C9, CYP2C19, and N-acetylation), infants, children and neonates, and the elderly (19).
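The arithmetic of this subfactor scheme, and of substituting a pathway-related factor for a default, can be sketched as follows; the pathway-related value used here is purely hypothetical:

```python
# Decomposition of the 100-fold default into TK and TD subfactors (15, 16),
# with a hypothetical pathway-related factor replacing the human TK default.

INTERSPECIES_TK, INTERSPECIES_TD = 4.0, 2.5  # product = 10
HUMAN_TK, HUMAN_TD = 3.16, 3.16              # product ~ 10

default_total = INTERSPECIES_TK * INTERSPECIES_TD * HUMAN_TK * HUMAN_TD
print(round(default_total))  # 100

# Hypothetical pathway-related factor, e.g., for a polymorphic metabolic route
# with wider human variability than the generic default:
pathway_tk = 4.5
refined_total = INTERSPECIES_TK * INTERSPECIES_TD * pathway_tk * HUMAN_TD
print(round(refined_total, 1))
```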
These examples, which are detailed further in this chapter, highlight that considerable efforts have been made to develop
population and hierarchical models using advanced statistical model-
ing to quantify population variability in the critical processes driving
the setting of health standards, namely, exposure, TK, and TD or
dose–response. This chapter focuses on such population models in
computational toxicology applied to TK, TD (dose–response), and
exposure assessment. The first part focuses on the basic description of
such models together with the computer software and packages
available. Mathematical and practical aspects to develop the models
are then presented, and three models are given as examples: the develop-
ment of pathway-related uncertainty factors, dose–response model-
ing of cadmium in humans, and probabilistic exposure assessment of
dioxins in humans. Finally, critical conclusions and limitations of this
growing discipline conclude the chapter.

2. Materials: Main Approaches and Software Packages

2.1. Hazard Identification and Characterization

2.1.1. Toxicokinetic Modeling

After a chemical compound penetrates into a living mammalian organism (following intentional administration or unintentional exposure), it is distributed to various tissues and organs by blood flow (20). Following its distribution to tissues, the substance can bind to various proteins and receptors, undergo metabolism, or can be eliminated unchanged. The concentration versus time profiles of the xenobiotic in different tissues, or the amount of metabolites
formed, are often used as surrogate markers of internal dose or
biological activity (21). Such information can be predicted with the
help of toxicokinetic (TK)/pharmacokinetic (PK) modeling for
which two general approaches have been applied:
- Noncompartmental analyses, consisting of a statistical regression analysis of TK measurements versus time and a number of covariates.
- Compartmental analyses, consisting of the modeling of TK profiles accounting for the distribution and metabolism of the toxic compounds throughout the body.
As for any modeling exercise, the choice of method depends on the objectives and scope of the analysis, the data available, and the resources and constraints attached to the analysis. Usually, noncompartmental approaches are used for exploratory analyses, the screening of potentially influential factors, and the determination of sample size for further TK studies. Conversely, compartmental analyses are more suitable for refined analyses and predictions.
The methods and related tools are further detailed and compared
below.

Compartmental TK Analysis

Compartmental analyses assume that the substance of interest distributes in the body as if it were made of homogeneous well-stirred compartments. Empirical compartmental models assign no particular meaning to the compartments identified (i.e., the substance behaves “as if” it were distributed in two or three compartments in the body, without looking for interpretations of what these compartments might be) (22). That type of analysis can only be performed usefully in data-rich situations and does not lend itself to the necessary interpolations and extrapolations. It is less and less used, and survives only in heavily regulated and slowly changing contexts. On the
20 Population Effects and Variability 527

Fig. 2. Schematic representation of a PBTK model for a woman. The various organs or tissues are linked by blood flow. In this model, exposure can be through the skin, the lung, or per os. Elimination occurs through the kidney, the GI tract, and the lung. The parameters involved are compartment volumes, blood flows, tissue affinity constants (or partition coefficients), and specific absorption, diffusion, and excretion rate constants. The whole life of the person can be described, with time-varying parameters. The model structure is not specific to a particular chemical (see http://www.gnu.org/software/mcsim/, also for a pregnant woman model).

contrary, physiologically based TK models (PBTK models) insist on assigning physiological meaning to the compartments they define.
Their parameter values can be determined on the basis of in vitro data,
in vivo data in humans or animals, quantitative structure–property
relationship (QSPR) models, or the scientific literature (basically
reporting and summarizing past experiments of the previous types)
(23). PBTK models have evolved into sophisticated models which try
to use realistic biological descriptions of the determinants of the
disposition of the substance in the body (24). Those models describe
the body as a set of compartments corresponding to specific organs or
tissues (e.g., adipose, bone, brain, gut, heart, kidney, liver, lung,
muscle, skin, and spleen, etc.) as illustrated by Fig. 2. Between
compartments, the transport of substances is dictated by various
physiological flows (blood, bile, pulmonary ventilation, etc.) or by
diffusion (20). Perfusion-rate-limited kinetics applies when the
tissue membrane presents no barrier to distribution. Generally, this
condition is likely to be met by small lipophilic substances. In contrast, permeability-rate-limited kinetics applies when the distribution of the
substance to a tissue is limited by the permeability of a compound
across the tissue membrane. That condition is more common with
polar compounds and large molecular structures. PBTK models may
therefore exhibit different degrees of complexity. In such a model, any
of the tissues can be a site of metabolism or excretion, if that is
biologically justified.
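As a minimal numerical sketch of perfusion-rate-limited distribution (not taken from the chapter; the flow, volume, and partition-coefficient values below are hypothetical), the concentration Ct in a single well-stirred tissue obeys dCt/dt = (Q/Vt)(Ca − Ct/P), where Q is tissue blood flow, Vt tissue volume, Ca the arterial concentration, and P the tissue:blood partition coefficient:

```python
# Sketch: perfusion-rate-limited uptake into one well-stirred tissue,
# dCt/dt = (Q/Vt) * (Ca - Ct/P). All parameter values are hypothetical.

def simulate_tissue(c_arterial, q=1.5, v_t=4.0, partition=3.0,
                    t_end=200.0, dt=0.001):
    """Euler integration of the tissue concentration Ct from Ct(0) = 0.

    q: tissue blood flow, v_t: tissue volume, partition: tissue:blood
    partition coefficient, c_arterial: (constant) arterial concentration.
    """
    c_t = 0.0
    for _ in range(int(t_end / dt)):
        c_t += dt * (q / v_t) * (c_arterial - c_t / partition)
    return c_t

# At equilibrium, the tissue:blood concentration ratio approaches the
# partition coefficient P.
c_eq = simulate_tissue(c_arterial=1.0)
print(round(c_eq, 2))  # -> 3.0
```

At steady state, Ca − Ct/P = 0, so the tissue concentration settles at P × Ca, which is why lipophilic substances with large partition coefficients accumulate in the corresponding tissues.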
Building a PBTK model requires gathering a large amount of data, which can be categorized into three groups:
- Model structure, which refers to the arrangement of tissues and organs included in the model.
- System data (physiological, anatomical, biochemical data).
- Compound-specific data.
Additional details on PBTK modeling and applications can be
found in (20, 25–27) or in this book (Bayesian inference-Chapter
25). Indeed, such descriptions of the body are approximate but a
balance has to be found between precision (which implies complex-
ity) and simplicity (for ease of use). Yet, the generic structure of a
PBTK model facilitates its application to any mammalian species as
long as the corresponding system data are used. Therefore, the same
structural model can be used for a human, a rat or a mouse, or for
various individuals of the same species. That is why such models are
widely used for interspecies extrapolations, and also for interindivid-
ual and intraindividual extrapolations.
Interindividual or intraindividual extrapolations refer to the
fact that a given exposure may induce different effects in the indi-
viduals of a population, and that the same individual may respond
differently to the same exposure at different times in his/her life-
time. These extrapolations are performed by setting parameter
values to those of the subpopulation or individual of interest, and
are mainly used to predict the differential effects of chemicals on
sensitive populations such as children, pregnant women, the
elderly, the obese, and the sick, taking into account genetic varia-
tion of key metabolic enzymes, etc. The toxicokinetic characteris-
tics of a compound can also be studied under special conditions,
such as physical activity. Understanding the extent and origins of
interindividual differences can be done predictively, on the basis of
Monte Carlo simulations for example. However, that requires a
large amount of system information, and in particular statistical
distributions for the systems parameters. An alternative is to esti-
mate the variability in parameters from experimental or clinical
data. This form of inference is well developed and known as popu-
lation or multilevel pharmacokinetic analyses (21).
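Such a predictive Monte Carlo analysis can be sketched as follows (pure Python; the lognormal medians and geometric standard deviation below are hypothetical illustration values, not population data):

```python
# Sketch: Monte Carlo prediction of interindividual variability in
# elimination half-life. The medians and geometric standard deviation
# of the lognormal parameter distributions are hypothetical.
import math
import random

random.seed(1)

def sample_half_life(cl_median=10.0, v_median=40.0, gsd=1.3):
    """Draw one individual's half-life (h) = ln(2) * V / CL.

    Clearance CL (L/h) and volume of distribution V (L) are sampled
    from lognormal distributions with the given geometric SD.
    """
    cl = cl_median * math.exp(random.gauss(0.0, math.log(gsd)))
    v = v_median * math.exp(random.gauss(0.0, math.log(gsd)))
    return math.log(2.0) * v / cl

half_lives = sorted(sample_half_life() for _ in range(10_000))
p05, p50, p95 = (half_lives[int(f * 10_000)] for f in (0.05, 0.50, 0.95))
print(f"half-life 5th/50th/95th percentiles: {p05:.1f}/{p50:.1f}/{p95:.1f} h")
```

The spread of the resulting percentiles is a simple summary of the predicted interindividual variability; in a real assessment, the parameter distributions would come from system data or be estimated by the population approaches described here.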
Besides such extrapolations, PBTK models are also used to perform interdose and interroute extrapolations. Interdose extrapolations are achieved by capturing both the linear and nonlinear
components of the biological processes (e.g., transport and metab-
olism) governing the kinetics of the chemical of interest. Interroute
variations can also be captured by PBTK models and used to extrapolate or predict the TK of a given compound from one route to
another (e.g., intravenous infusion, oral uptake, dermal absorption,
inhalation). The use of numerical integration also allows arbitrary
forms of time-varying inputs (e.g., air concentration) to be used.
Combinations of exposure routes can also be modeled.
Note that PBTK models have historically been used primarily
to estimate internal exposures following some defined external
exposure scenario. As with any model, PBTK models can be run
backwards (numerically at least) to reconstruct exposures from the
TK measurements. That application, even if obvious, is a bit more
recent (28).
The level of complexity of PBTK models (e.g.,
number of compartments or differential equations) needs to be
commensurate with the available data and the primary objective
of the analysis. In chemical risk assessment, the evaluation of inter-
individual variability may be of primary importance. In this case,
compartmental models may be reduced to a simpler one-
compartment model describing only the first-order magnitude of
TK variations over time, but integrating a subject-specific random
effect (typically lognormal) to describe the population variability.
Generally speaking, multicompartment PBTK models describe the metabolic pathways as a whole much better and allow the calculation of chemical concentrations in the main organs of the body. On the other hand, their numerous parameters may require a substantial amount of information, which makes any statistical evaluation more difficult. Such a model also generally requires
thorough sensitivity analysis and model validation. Alternatively, a
one-compartment model only focuses on the overall elimination of
the toxicant from the body, making rough and global assumptions
on the involved pathways. In case of poor prior knowledge on these
pathways, it allows a simplified and parsimonious description of
toxicant elimination, hence easier statistical evaluations (such as
the evaluation of population variability). However, in some cases
where the simple TK modeling assumptions are not met (like zero-
order absorption, dose linearity), such an approach could lead
to poor fits and inflated residuals. Moreover, by definition,
one-compartment models do not allow for the evaluation of toxi-
cant concentrations in each organ where it distributes. Typically,
risk assessors may face the choice between implementing an
extended PBTK model without population variability versus fitting
a one-compartment model with a subject-specific random effect to
enable the evaluation of interindividual variability of the overall
toxicant half-life. The choice will then be made depending on the available data and on the objective of the modeling analysis. A discussion of this choice, exemplified by the cadmium example, is detailed in ref. 17.
The simple one-compartment, first-order TK model can be used to relate external exposure to internal exposure (29, 30). Since one-compartment TK models are widely used, especially for chronic risk assessment, we detail their general form hereafter.
One-compartment TK models are represented by a differential
equation describing the change of the chemical body concentration
x over time:
dx/dt (t) = (I × ABS) − (k × x(t)),    k = ln(2)/HL    (1)
where I is the daily intake, ABS the fraction of chemical absorbed
after oral exposure (absorption fraction) and k the elimination rate
described by the biological half-life HL.
After a certain period of time, the chemical concentration in
body lipids settles to an equilibrium regime or steady-state. In this
regime, a balance is obtained between ingested dose of chemical
and its elimination. The differential Eq. 1 therefore equals 0, and the body burden at steady state can be simply calculated as a linear relation:
x(t) = (I × ABS)/k    (2)
This equation at steady-state is often used to relate external and
internal exposure (29, 30). Using the above equation requires that
the daily intake and the absorption fraction are constant over the
period of time needed to reach steady state (31). No variability in the chemical daily intake can be considered. However, for chemicals present in foods that are not consumed daily, such as methylmercury in fishery products, high variability between individual intakes over a lifetime can occur. Pinsky and Lorber (32) found that a TK model integrating a time-varying exposure profile produced more reliable predictions of body burden than the steady-state equation. Moreover, Eq. 2 cannot be used to relate external exposure to measurements on a population that has not reached steady state, as is the case in particular for chemicals with a long half-life.
For chemicals with a long half-life, the absorption time can be considered insignificant compared to the elimination time. In such a situation, there is effectively no absorption process, as in the case of an intravenous injection of the chemical. A simplified differential equation (33, 34), obtained by removing the intake term of Eq. 1, is thus used to define the elimination process between two intake times:
dx/dt (t) = −(k × x(t))    (3)
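Eqs. 1 and 2 can be checked numerically. The sketch below (the intake, absorption fraction, and half-life values are hypothetical) integrates Eq. 1 by Euler steps, verifies that the body burden approaches the steady-state value of Eq. 2, and illustrates the exponential decay of Eq. 3 between intakes:

```python
# Numerical sketch of Eqs. 1-3 (hypothetical intake, absorption
# fraction, and half-life; units are illustrative only).
import math

I = 2.0                 # daily intake (e.g., ug/day)
ABS = 0.5               # absorption fraction
HL = 20.0               # biological half-life (days)
k = math.log(2) / HL    # elimination rate (1/day), as in Eq. 1

# Euler integration of Eq. 1: dx/dt = I*ABS - k*x(t)
x, dt = 0.0, 0.01
for _ in range(int(10 * HL / dt)):   # simulate ~10 half-lives
    x += dt * (I * ABS - k * x)

x_ss = I * ABS / k      # steady-state body burden, Eq. 2
print(round(x, 1), round(x_ss, 1))   # simulated burden vs. Eq. 2

# Eq. 3: between intakes, dx/dt = -k*x, so x decays exponentially;
# after one half-life, half of the burden remains.
print(round(math.exp(-k * HL), 2))  # -> 0.5
```

After roughly ten half-lives the simulated burden is within a fraction of a percent of the steady-state value, which is why the steady-state approximation is reasonable only when intake is constant over a period long relative to the half-life.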
Any software package able to solve systems of differential equations can be used to build a PBTK model and run simulations of it. They include GNU Octave (http://www.gnu.org/software/octave), Scilab (http://www.scilab.org/), GNU MCSim (free software), Mathematica (http://www.wolfram.com/mathematica), Matlab (http://www.mathworks.com/products/matlab), and acslX (http://www.acslx.com). A list of software packages used for PBTK
modeling is detailed in Table 1. Most packages specifically developed for compartmental analysis were tailored for PBPK and applications in clinical pharmacology, though they can be used equally in the context of toxicological assessments. Only a small number of them allow user-specific integration of random effects to evaluate population variability in PK parameters. As a matter of fact, such nonlinear mixed-effect models require powerful algorithms for their statistical inference, such as simulation-based algorithms. This can be achieved using Bayesian inference and Markov chain Monte Carlo, as implemented, e.g., in GNU MCSim (http://www.gnu.org/software/mcsim), or using the SAEM algorithm, a stochastic version of the EM algorithm (35), for maximum likelihood estimation.
The Monolix application, developed as a free toolbox for Matlab, is a powerful and flexible user-friendly platform to implement SAEM-based estimation of PBPK and PBTK models with population variability. It is increasingly used, in particular in drug development and pharmacological research, as it combines flexibility, statistical performance, and reliability with minimal coding required from users. From this perspective, it shows a substantial advantage over the NONMEM software, the gold standard of pharmacological modeling used in the pharmaceutical world, which is mostly based on gradient optimization and first-order approximation of the likelihood.
Even more specialized software using pathway-specific information is available, whose developers have devoted considerable effort to building databases of parameter values for various species and human populations: notable examples are the Simcyp Simulator (http://www.simcyp.com), PK-Sim (http://www.pk-sim.com), GastroPlus (http://www.simulations-plus.com), and Cyprotex's CloePK (https://www.cloegateway.com).

Noncompartmental TK Analysis

Noncompartmental methods are in fact a misnomer, because they still consider the body as one compartment. They are simple, often nonparametric, methods that aim at estimating general kinetic properties of a substance (for example, the substance's bioavailability, total clearance, etc.) (36). Noncompartmental models are convenient for exploratory analyses and easy
to use; they produce estimates of common parameters of interest.
They can be used when very little information is available on the
Table 1
List of the main software packages for compartmental PBPK/PBTK modeling, in alphabetical order, with an indication of whether they include built-in features for analyzing population variability.

- acslX (http://www.acslxtreme.com/): Built-in compartmental modeling including pharmacodynamics, toxicity studies, and Monte Carlo simulation tools. Population variability: No
- Berkeley Madonna (www.berkeleymadonna.com): Generic powerful ODE solver and Monte Carlo simulations. Population variability: No
- Boomer/Multi-Forte (www.boomer.org): Estimates and simulates from compartmental models, using a range of possible algorithms including Bayesian inference. Population variability: No
- CloePK (https://www.cloegateway.com): PBPK predictive tools for animals and humans, including an open-source database on metabolic pathways. Population variability: No
- ERDEM (http://www.epa.gov/heasd/products/erdem/erdem.html): PBPK/PD solver and simulator for risk assessment used by the EPA, and exposure assessment using backwards simulations. Population variability: No
- GastroPlus (http://www.simulations-plus.com): Popular PBPK solver and simulator, including trial simulations. Population variability: No
- GNU Octave (http://www.gnu.org/software/octave): Generic ODE solver similar to Matlab. Population variability: No
- Matlab with MONOLIX (http://www.monolix.org/): Powerful PBPK solver integrating population variability, with a GUI interface for graphical analyses and predictions. It uses MCMC and SAEM algorithms. Population variability: Yes
- Mathematica with Biokmod (http://web.usal.es/~guillermo/biokmod/mathjournal.pdf): ODE solver specific to compartmental biokinetic systems; includes design analysis. Population variability: No
- MCSim (http://www.gnu.org/software/mcsim): Flexible PBPK solver with population variability; it gives examples of detailed generic PBPK models. Population variability: Yes
- NONMEM (http://www.iconplc.com): Gold standard of population PK compartmental analysis in the pharmaceutical industry. Population variability: Yes
- Phoenix NLME (formerly WinNonMix) (http://www.pharsight.com): Population PK modeling tool with similar algorithms as in NONMEM but with better graphical features. Population variability: Yes
- PKBugs (www.mrc-bsu.cam.ac.uk/bugs/.../pkbugs.shtml): Efficient and user-friendly Bayesian software built in WinBUGS, for the analysis of population PK models. Population variability: Yes
- PK-Sim (http://www.systems-biology.com/products/pk-sim.html): User-friendly tool for PBPK modeling and simulations in humans and animals; it includes a population database. Population variability: Yes
- PK solution (http://www.summitpk.com): Easy-to-use Excel-based analysis tool for noncompartmental and simple compartmental models. Population variability: No
- R with PKfit (http://cran.r-project.org/web/packages/PKfit/index.html): Free R routine for the analysis of compartmental models. Population variability: No
- SAAM II and Popkinetics (http://depts.washington.edu/saam2): Compartmental PK modeling software with population analysis using Popkinetics. Population variability: Yes
- S-Adapt (http://bmsr.usc.edu/Software/ADAPT/ADAPT.html): Compartmental PK/PD modeling platform including simple population models fitted with the EM algorithm. Population variability: Yes
- Simcyp (http://www.simcyp.com/): Major and powerful PBPK software suite with built-in PK and genetic databases for predictions in animals and humans. Population variability: Yes

metabolic pathways and when few data are to be analyzed. But they rely on assumptions, first and foremost linearity.
As the methods involved are mainly simple calculations or
statistical regressions, most standard statistical software packages
can be used. Population variability is then assessed using linear
mixed-effect models built into most statistical packages. Nevertheless, more specific computer-based tools are available; the main
ones are listed in Table 2.

2.1.2. Toxicodynamics and Dose–Response Modeling

Toxicodynamic (TD) modeling quantifies the plausible causal link between chemical exposure and health or environmental consequences. TD assessments are typically based on the development
of dose–response models or dose–effect relationships. Like for TK
models, a proper quantification of the interindividual variability of
dose–response is essential for quantitative risk assessment, and
more specifically for the derivation of health-based guidance
values.
TK/TD models provide a number of advantages over more
descriptive methods to analyze toxicity/safety data. In particular,
these models make better use of the available data as all effects
observations over time are used to estimate model parameters.
Table 2
List of the main software packages for noncompartmental TK modeling, in alphabetical order, with an indication of whether they include built-in features for analyzing population variability.

- ADAPT II (http://bmsr.usc.edu/Software/ADAPT/ADAPT.html): PKPD software with Monte Carlo simulation capacities. Population variability: No
- BE/BA for R (http://pkpd.kmu.edu.tw/bear/): Freeware using linear and ANOVA models; includes bioequivalence analysis and study design analysis. Population variability: No
- PK solution (http://www.summitpk.com): Easy-to-use Excel-based analysis tool for noncompartmental and simple compartmental models. Population variability: No
- WinNonLin (http://www.pharsight.com): Industry standard for noncompartmental analyses. Includes design analysis. Population variability: Yes

As a consequence, TK/TD models can also successfully be used to extrapolate and compare toxic effects between different exposure
scenarios. Finally, TK/TD models facilitate a mechanism-based
comparison or extrapolation of effects from different substances,
species, doses, and life stages.
Similarly to PBPK models in TK assessments, biologically based dose–response (BBDR) models have been developed for TD analyses in order to incorporate information on biological processes at the
cellular and molecular level to link external exposure to an adverse
effect. Although BBDR models provide a framework for testing
mechanistic hypotheses (37), their role in risk assessment, e.g., for
low-dose extrapolation, appears to be limited (38), as the population variability is then only expressed at the cellular level rather than at the subject level, without much improvement in toxicity predictivity.
In contrast with mechanism-based BBDR models, empirical
modeling can be used to describe the patterns relating adverse
effects to increasing exposure to a contaminant. Empirical
dose–response modeling hence attempts to find a simple mathe-
matical model that adequately describes this pattern. In general,
empirical models have minimal or even no direct linkage to the
underlying mechanisms driving such adverse effect. Instead, they
focus on flexible and often simpler mathematical forms that can fit a
broad spectrum of dose–response data. This mathematical form is then by nature strongly dependent on the dose range observed, so that extrapolation, e.g., to low doses, should be approached carefully in light of the information available on the underlying biological mechanisms of action. Furthermore, empirical models can generally not be
used to extrapolate hazard characterization of mixtures from
dose–response of individual compounds without a strong assump-
tion on how the individual dose–response curves combine into the
overall dose–response for the mixture (e.g., assuming effect addi-
tivity). Typical empirical models include linear, log-linear, Poisson,
and Hill models. Empirical models can be fitted using any standard statistical software and then used to estimate a point of departure.
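As a minimal illustration of empirical dose–response fitting (the dose and response data are hypothetical, and a coarse grid search stands in for a proper nonlinear least-squares routine), a Hill model E(d) = Emax × d^n / (ED50^n + d^n) can be fitted as follows:

```python
# Sketch: fitting an empirical Hill dose-response model by least
# squares over a coarse parameter grid (pure Python; the dose and
# response data below are hypothetical).

doses = [0.0, 1.0, 3.0, 10.0, 30.0, 100.0]
responses = [0.00, 0.09, 0.23, 0.50, 0.75, 0.91]   # hypothetical observations

def hill(d, emax, ed50, n):
    """Hill model E(d) = Emax * d^n / (ED50^n + d^n)."""
    return emax * d**n / (ed50**n + d**n) if d > 0 else 0.0

def sse(emax, ed50, n):
    """Sum of squared errors against the observed responses."""
    return sum((hill(d, emax, ed50, n) - r) ** 2
               for d, r in zip(doses, responses))

best = min(((emax, ed50, n)
            for emax in (0.9, 1.0, 1.1)
            for ed50 in (5.0, 10.0, 15.0, 20.0)
            for n in (0.5, 1.0, 1.5, 2.0)),
           key=lambda params: sse(*params))
print("Emax, ED50, n =", best)
```

The fitted curve is purely descriptive: as the text notes, nothing in the estimated parameters licenses extrapolation outside the observed dose range.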
In toxicological assessment where population variability needs
to be accounted for, the integration of TK and TD models into a
consistent mathematical description is essential. Indeed, misspeci-
fication of population variability in one of the TK and TD models
would then make the overall assessment inaccurate. Moreover, such
integrated approaches allow for correlating subject-specific TK and
TD parameters, hence avoiding over-estimation of population
variability in risk assessments.
There are very few dedicated software packages specific to population TD assessments. In general, such packages are integrated either within PK software or associated with a benchmark dose evaluation tool (see below). This is the case, e.g., for commercial packages like Toxtools, which allows for dose–response model evaluations among a set of other features.

Reference Dose Evaluation: NOAEL, LOAEL and BMD

Risk assessment of thresholded toxicants (typically noncarcinogenic contaminants) requires the evaluation of a health-based exposure limit value or Reference Dose (RfD), which is used for regulatory purposes. Exposures to chemicals below such an RfD will then generally be determined to be safe, and exposures above the RfD will generally be determined to be unsafe. Many variations are observed between risk assessment bodies, but a general approach to the evaluation of an RfD is:
- To evaluate the dose–effect or dose–response relationship.
- To calculate a point of departure from this TD characterization.
- To apply an uncertainty factor to the point of departure, in order to account for possible interspecies and/or interindividual variability.

Points of departure are typically one of these:
- No-observed-adverse-effect-level (NOAEL);
- Lowest-observed-adverse-effect-level (LOAEL); or
- Benchmark dose (BMD).
To determine these points of departure for a given compound, specific prospective toxicological (laboratory) studies can be conducted, generally on animals due to obvious ethical, cost, and
ducted, generally on animals due to obvious ethical, cost and
regulatory hurdles. Alternatively, retrospective analyses of human
data can also be performed, with the evaluation of interindividual
variability as far as possible. This variability estimate can support the
derivation of evidence-based uncertainty factors, as illustrated fur-
ther in the next sections of this chapter.
NOAEL is defined by the US EPA as the level of exposure at
which there is no statistically or biologically significant increase in
the risk or severity of any adverse effect in the exposed population
as compared to a control or nonexposed population. It corre-
sponds to the highest level of exposure without adverse effects.
Closely related is the LOAEL, defined as the lowest level of expo-
sure at which adverse effect or early precursor of adverse effect is
observed. An obvious advantage of using NOAEL/LOAEL is that
they are simple to evaluate based on published data. Even toxicological studies, typically involving ten identical rats per dose group, are rather straightforward to design and to analyze, essentially
because interindividual variability is by essence eliminated by the
animal study population and therefore ignored in the NOAEL/
LOAEL assessment. Using NOAEL/LOAEL as a basis for the
evaluation of RfD implies therefore that evidence-based uncer-
tainty factors can be derived to account for such population varia-
bility. In addition, a typical concern of NOAEL/LOAEL is the fact
that the uncertainty attached to their evaluation is usually under-
estimated and their values can be highly dependent on the study
designs and sample sizes. NOAEL/LOAEL values are generally larger with smaller sample sizes, which makes their use nonconservative in the case of sparse data.
As an alternative, the BMD approach has been developed and is increasingly used to better account for population variability and to properly assess estimation uncertainty using a statistical assessment of the dose–response model (see, e.g., refs. 39–41 for a recent overview of the state of the art). The BMD is defined as the dose needed to achieve an excess risk of a given adverse effect, usually defined as the level of a biomarker exceeding a given threshold, compared to the background exposure. If P(d) denotes the probability for an individual to
a given threshold. If P(d) denotes the probability for an individual to
reach the defined threshold for adverse effect, when exposed to
dose d, then the BMD can be defined in two different ways:
- As the dose leading to “Additional risk” of X% (often called benchmark response (BMR)):

P(BMD) = P(Background) + X%

where P(Background) stands for the probability of adverse effect at background exposure, and X% (BMR) for the additional prevalence.
Fig. 3. Hybrid approach used for BMD evaluation, using continuous data dichotomized into quantal data with a predefined cut-off.

- As the dose leading to “Extra risk” of X%:

P(BMD) = P(Background) + X% × (1 − P(Background))

where P(Background) stands for the probability of adverse effect at background exposure, and X% for the probability of observing the adverse effect at the BMD given it was not observed at background exposure.
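The two BMR definitions differ only in how the target response at the BMD is computed from the background risk; a small sketch (the background risk and BMR values are hypothetical):

```python
# Sketch: target response probability at the BMD under the two BMR
# definitions (background risk and BMR values are hypothetical).

def target_risk_additional(p_background, bmr):
    """'Additional risk': P(BMD) = P(Background) + BMR."""
    return p_background + bmr

def target_risk_extra(p_background, bmr):
    """'Extra risk': P(BMD) = P(Background) + BMR * (1 - P(Background))."""
    return p_background + bmr * (1.0 - p_background)

p0, bmr = 0.05, 0.10
print(round(target_risk_additional(p0, bmr), 3))  # -> 0.15
print(round(target_risk_extra(p0, bmr), 3))       # -> 0.145
```

The two definitions coincide when the background risk is zero and diverge as it grows, which is why the choice of definition should always be reported alongside a BMD.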
BMDs can be derived using a dose–response model evaluated on quantal data, corresponding to the proportion of the population reaching a given threshold effect at a given dose. This standard approach is the one implemented in the available software, such as the EPA BMDS software and the PROAST package (www.epa.gov/ncea/bmds and www.rivm.nl/proast). However, the so-called hybrid approach
(41–43) uses a dose–effect model describing the continuous rela-
tionship between dose and effect level, hence allowing for risk
calculation without dichotomizing the outcome, using all informa-
tion available from continuous data. Risks or prevalence can then be
derived with respect to any given biologically relevant threshold.
This approach is valid under the assumption that effect levels are
log-normally distributed over the population at a given dose. The
main idea of this hybrid approach is to model the population
variability around the mean dose–effect curve using a statistical
(log-normal) distribution at each given dose, as illustrated by
Fig. 3. Then, for any cut-off, one could derive the corresponding
prevalence and dose–response curve.
Once a dose–response model has been fitted to appropriate population data, the BMD can be derived according to:

BMD = exp{ m⁻¹[ log(cutoff) − s × Φ⁻¹(p) ] }

with:

p = (1 − BMR) × Φ( (log(cutoff) − background)/s ) for extra risk

p = Φ( (log(cutoff) − background)/s ) − BMR for additional risk

where s stands for the population standard deviation of the effect, BMR for the benchmark response, Φ for the cumulative distribution function of the standardized normal distribution, and m for the dose–effect function (on the log-dose scale).
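As an illustration only (not one of the chapter's case studies), the sketch below evaluates this hybrid BMD formula for a hypothetical linear model of mean log-effect versus log-dose, m(u) = a + b × u, using a pure-Python normal CDF and a bisection-based inverse; every parameter value is invented for the example:

```python
# Sketch of the hybrid BMD calculation. The dose-effect model, its
# parameters, the cut-off, and the BMR below are hypothetical.
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi_inv(p):
    """Inverse standard normal CDF by bisection (pure Python)."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical linear model for the mean log-effect: m(u) = a + b*u,
# where u = log(dose); s is the population SD of the log-effect.
a, b, s = 1.0, 0.4, 0.5
cutoff = math.exp(2.0)   # effect threshold defining "adverse"
bmr = 0.05               # benchmark response
background = a           # mean log-effect at background exposure

# "Extra risk": p = (1 - BMR) * Phi((log(cutoff) - background) / s)
p = (1.0 - bmr) * phi((math.log(cutoff) - background) / s)
# BMD = exp{ m^-1[ log(cutoff) - s * Phi^-1(p) ] }, with m^-1(y) = (y - a)/b
bmd = math.exp((math.log(cutoff) - s * phi_inv(p) - a) / b)
print(round(bmd, 2))
```

Substituting the resulting dose back into the model reproduces the extra-risk definition, P(BMD) = P0 + BMR × (1 − P0), which is a useful sanity check on any implementation.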
By construction, a BMD includes the population variability, which implies that no uncertainty factor will subsequently be required to account for it. Furthermore, since BMDs are the results
of a parametric statistical estimation procedure, a confidence inter-
val can be derived. The lower bound of such interval (BMDL) can
be chosen as a more conservative point of departure, hence
accounting for estimation uncertainty.
In general, BMD analysis requires that individual data be available for a number of dose groups, which may sometimes be a hurdle to the implementation of the BMD approach. However, dose–effect curve estimation in the context of a meta-analysis, where only aggregated data from the literature are available, can be carried out under additional assumptions (e.g., log-normality of population
distributions). This should also be accounted for in the statistical
modeling and in the use of adjustment factors for any BMD evalu-
ation. Examples and discussion of analysis of aggregated epidemio-
logical data for dose–response analysis can be found in (44). There
is generally not a unique dose–response model that can be chosen
to meet the purpose. Therefore, it is necessary to assess the sensi-
tivity of results with respect to the modeling assumption. This can
be achieved, e.g., by comparing results from different possible
models or using model averaging techniques (45).
Although the BMD approach provides a powerful framework
to derive health-based guidance values for both carcinogenic and
noncarcinogenic compounds, it is far from being systematically
implemented by regulatory bodies. One of the main hurdles is the
substantial need for statistical modeling expertise even if specific
software packages have been developed. Implementation and inter-
pretation of outputs still require an expert modeler to be involved
in the assessment. In a way, this just acknowledges that chemical risk assessment remains by nature a multidisciplinary task.
As usual, the choice between NOAEL/LOAEL and BMD
approaches should be made in relation to the data available, the resources available, and the objectives to be met. Table 3 summarizes
the main differences between NOAEL and BMD-based assess-
ments, while Table 4 lists the pros and cons for the two approaches.
Table 3
Summary table comparing requirements and characteristics of NOAEL versus BMD evaluations

- Data/design requirements. NOAEL: the NOAEL must be one of the doses used in the studies. BMD: precision on doses should be good (if not, it should be known and accounted for); small sample sizes reduce statistical power.
- Effect of sample size per dose. NOAEL: smaller sample sizes lead to a larger NOAEL. BMD: smaller sample sizes lead to a smaller BMD.
- Effect of number of dose groups (or number of publications). NOAEL: fewer doses lead to a larger NOAEL. BMD: fewer doses lead to a smaller BMD.
- Additional assumptions. NOAEL: need for an uncertainty factor in most cases. BMD: a dose–response model is needed (sensitivity analysis often required); need for an uncertainty factor in some cases.

Table 4
Summary table comparing pros and cons in the application of NOAEL versus BMD in toxicological risk assessment

PROS
NOAEL: easy/faster to understand and implement; intuitive appeal; does not require strong assumptions on dose–response models; more robust when rare effects are observed.
BMD: BMD calculated for any risk level decided upon; uncertainty accounted for in a robust manner; the whole dose–response curve accounted for; allows better adjustment for covariates/clusters.

CONS
NOAEL: the NOAEL must be one of the doses tested; fewer subjects/doses give a higher NOAEL (rewards poor designs); does not provide a measure of risk; uncertainty around the NOAEL is dealt with through uncertainty factors, often arbitrary or poorly robust.
BMD: data requirements not always achieved to allow modeling (especially with human epidemiological data); few software packages available; except for ideal cases, requires statistical/modeling expertise; less robust when doses are widely spread; less robust with complex or shallow dose–response curves.
540 J.L. Dorne et al.

The evaluation of BMD can be implemented by direct coding of
dose–response models in generic modeling platforms such as R,
Matlab, or S-Plus. Since the BMD methodology has been increasingly
used by risk assessors worldwide, national agencies have developed
their own software. This is in particular the case for the BMDS software
developed by the US EPA and the PROAST software developed by
the Dutch agency RIVM. Version 2.1.2 of the US EPA software
BMDS can be downloaded free of charge at http://www.epa.gov/
NCEA/bmds/, while version 26.0 of the RIVM software PROAST
can be obtained free of charge at http://www.rivm.nl/en/foodnu
tritionandwater/foodsafety/proast.jsp. PROAST runs within R and
hence requires preliminary knowledge of R commands.
The same exponential family of models can be fitted in BMDS
and in PROAST. However, some important features differentiate
these two software packages:
l BMDS uses a window-driven environment and is therefore
more user-friendly than PROAST, which requires basic knowl-
edge of the R software environment.
l PROAST uses the lognormal distribution as the default while
BMDS has the normal distribution as the default setting.
l BMDS does not at the moment allow for covariates to be
included in the analysis while PROAST does.
l BMDS is not suitable for studies with a large number of individ-
ual data points as there is a limit in the number of rows in the data
file; the software is therefore of limited use for human studies.
l The BMDS software gives the outcomes for each model of the
exponential family, leaving the user to select the “best” model,
while PROAST selects it automatically.
Both packages are still regularly being improved and these
features may substantially change with the upcoming versions.
Other routines are available for more advanced or more specific
analyses. Considering the relatively strong sensitivity of the BMD
estimate with respect to the modeling assumptions, it is worth
mentioning routines available for BMD evaluation with model averaging, as
averaging over a list of potential models can generally increase the
robustness of results (45). The Model Averaging for Dichotomous
Response Benchmark Dose (MADr-BMD) has been developed for
this purpose by the US NIOSH. MADr-BMD fits the same quantal
response models used in the BMDS software with the same estima-
tion algorithms, but generates BMDs based on model-averaged
dose–response. It is available free of charge as an open-source,
standalone package at http://www.jstatsoft.org/v26/i05. Note that
a Bayesian model averaging approach to BMD can also advantageously
be applied to a wider range of models, possibly including covariates
(46). However, no specific user-friendly built-in routines have been
developed to our knowledge, so that ad hoc Bayesian modeling
would need to be implemented, e.g., in WinBUGS or a similar generic
Bayesian package.
Among commercial alternatives, none really stands out from the
freeware packages already named. Note that Toxtools is similar to
the BMDS software, with more practical and user-friendly outputs,
and offers GEE models for repeated measurements to account,
e.g., for intraindividual variability.

2.2. Exposure Assessment

The exposure to chemicals may occur by consumption of food and
drinking water, inhalation of air, and ingestion of soil, so in theory
exposure modeling should cover all these routes. As an example, in
the European Union System for the Evaluation of Substances (47),
the overall human exposure from all sources could be estimated by
EUSES. In practice, such exposure modeling is strongly dependent
on the quality and accuracy of the available data and default assump-
tions. For example, for populations living close to a source of
pollution, the exposure assessment is based on a typical worst-case
scenario since all food items, water, and air are derived from the
vicinity of a point source. When dietary exposure assessment is
performed in isolation, quantitative data (i.e., food consumption
and chemical occurrence in food) are generally available allowing
more accurate estimates than those based on scenarios. Unfortunately,
similar data on the distribution of chemical occurrence in air
are rarely available; dietary exposure assessments therefore cannot
be compared with such exposure scenarios without underestimating
the contribution of exposure from food relative to other media.
As mentioned in the introduction, the deterministic dietary expo-
sure assessment represents the most common approach currently
used in the area of chemical risk from food. The practices are detailed
in the recent guidelines of the World Health Organization (2) and are
not within the scope of this paragraph. In a number of cases, the
deterministic assessment fails to conclude that exposure lies below the
threshold of safety concern. It is therefore necessary to develop
statistical methodologies to refine the exposure assessment and to bridge the gaps
between the external exposure and the potential health effects.
A further refinement of the deterministic approach is therefore
to use basic Monte-Carlo simulations. In such a stochastic model
the amounts of food consumed, the concentrations of the hazard in
food and the individual body weights of exposed populations are
assumed to arise from probability distributions. For this kind of
assessment, many software packages are commercially available, e.g.:
l @RISK software, Copyright © 2011 Palisade Corporation
http://www.palisade.com/risk/
l Crystal Ball
http://www.oracle.com/us/products/applications/crystalball/crystalball-066563.html
l Creme Software Ltd., Dublin
http://www.biotechnologyireland.com/pooled/profiles/BF_COMP/view.asp?Q=BF_COMP_45078
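The core of such a stochastic assessment is short enough to sketch directly; the distributions and parameter values below are purely illustrative assumptions, not recommended defaults:

```python
import random

random.seed(1)

def simulate_exposure(n=10_000):
    """Monte Carlo estimate of dietary exposure (chemical per kg body
    weight per day) from illustrative consumption, concentration, and
    body-weight distributions."""
    exposures = []
    for _ in range(n):
        consumption = random.lognormvariate(4.0, 0.5)    # g food/day (assumed)
        concentration = random.lognormvariate(-1.0, 0.8)  # mg chemical/g food (assumed)
        body_weight = max(30.0, random.gauss(70.0, 12.0))  # kg
        exposures.append(consumption * concentration / body_weight)
    return sorted(exposures)

exposures = simulate_exposure()
p50 = exposures[len(exposures) // 2]        # median exposure
p95 = exposures[int(0.95 * len(exposures))]  # high-percentile exposure
```

The high percentile, rather than the mean, is typically compared with the health-based guidance value.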
More interestingly, for chemicals with a long half-life, the burden
accumulated in the body can be completely different from the
exposure estimated from dietary intake at a fixed time or from
a mean intake over a lifetime. In that case, it is necessary to integrate
the TK of the chemical into the dietary exposure assessment. For this
last model, described below, no software or routines currently exist.

2.2.1. Dynamic Modeling of Dietary Exposure and Risk

The dynamic exposure process, mathematically described in ref. 48,
is determined by the accumulation phenomenon due to successive
dietary intakes and by the elimination process in between intakes.
Verger et al. (49) propose a Kinetic Dietary Exposure Model
(KDEM) which describes the dynamic evolution of the dietary
exposure over time. It is assumed that at each eating occasion Tn,
n ∈ N, the dynamic exposure process jumps with a size equal to the
chemical intake Un related to this eating occasion. Between intakes,
the exposure process decreases exponentially according to Eq. 3.
The value of the total body burden Xn+1 at intake time Tn+1 is
thus defined as:

X_{n+1} = X_n e^{-k\,\Delta T_{n+1}} + U_{n+1}, \quad n \in \mathbb{N}, \qquad X_0 = x_0    (4)

with \Delta T_{n+1} = T_{n+1} - T_n, n \geq 1, the time between two intakes
(interintake time), Un the intake at time Tn, ln(2)/k the half-life
of the compound, and x0 the initial body burden.
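Eq. 4 translates directly into a simple recursion. A minimal sketch (with illustrative intake, interintake, and half-life values):

```python
import math
import random

def kdem_trajectory(intakes, intervals, half_life, x0=0.0):
    """Iterate Eq. 4: X_{n+1} = X_n * exp(-k * dT_{n+1}) + U_{n+1},
    with k = ln(2) / half_life. Returns the body burden after each intake."""
    k = math.log(2) / half_life
    x = x0
    burdens = []
    for u, dt in zip(intakes, intervals):
        x = x * math.exp(-k * dt) + u
        burdens.append(x)
    return burdens

random.seed(0)
# Illustrative: roughly daily intakes of a chemical with a 30-day half-life
intakes = [random.expovariate(1.0) for _ in range(1000)]    # amount per intake
intervals = [random.expovariate(1.0) for _ in range(1000)]  # days between intakes
burdens = kdem_trajectory(intakes, intervals, half_life=30.0)
```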
The exposure process X of a single individual over lifetime is
not available. Indeed, chemical occurrences in food ingested by an
individual and his consumption behavior over a long period of time
have never been surveyed. Thus, computations of the exposure
process are conducted from chemical concentration data and
dietary data reported over a short period of time (described in
Subheading 20.3). Starting from a population-based model, expo-
sure process simulations over lifetime can be refined to approach an
individual-based model. To perform simulations, values for model
input variables are needed. The level of information that can be
included in the model depends on the available data (see Subheading 3.1,
Gathering Adequate Supporting Data for Risk Assessment). For some
variables, the information is available only at population level and
for others at individual level. For the latter, inter- and intraindivi-
dual variability can be included in the model. Each of these input
variables is further characterized and discussed hereafter.

2.2.2. Intake Time

Depending on the available information from the consumption survey
and on the consumption frequency of the contaminated products,
intake times can be the eating occasion, the day, the week, etc.
Another way is to consider, as in ref. (49), that the consumption
occasions occur randomly, so that interintake times are indepen-
dently drawn from the same probability distribution, such as a right-
censored exponential distribution (fitted on the observed interin-
take times).
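With such right censoring (observation ends before the next intake occurs), the exponential rate can still be estimated by maximum likelihood: the estimate is the number of uncensored interintake times divided by the total observed time. A sketch under these assumptions, with illustrative values:

```python
import random

def exp_rate_mle(times, censored):
    """MLE of the exponential rate from right-censored data.

    times[i] is either a fully observed interintake time (censored[i] False)
    or the time at which observation stopped (censored[i] True).
    rate_hat = (# uncensored observations) / (sum of all observed times)."""
    events = sum(1 for c in censored if not c)
    return events / sum(times)

random.seed(2)
true_rate = 0.5   # one intake every 2 days on average (assumed)
horizon = 7.0     # each subject observed for at most 7 days (assumed)
raw = [random.expovariate(true_rate) for _ in range(20_000)]
times = [min(t, horizon) for t in raw]
censored = [t > horizon for t in raw]
rate_hat = exp_rate_mle(times, censored)
```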

2.2.3. Chemical Intake

Chemical dietary intakes U are estimated by combining data on
consumed food quantities with data on chemical occurrence in food.
The intake Un of an individual at time Tn is therefore the sum, over
products, of the chemical concentration multiplied by the consumed
quantity, divided by body weight:

U_n = \frac{1}{W_n} \sum_{p=1}^{P} Q_p\, C_{n,p}

where Qp is the concentration in total chemical of the consumed
product p at time Tn, Cn,p is the consumed quantity of the product
p at time Tn, P is the total number of products containing the
chemical consumed at time Tn, and Wn is the body weight of the
individual at time Tn.
Probabilistic methods to compute dietary intakes are described
and compared with deterministic approaches in ref. 50. The use of
the different methods depends on the nature of the available data.
A larger amount of data permits better accounting for the varia-
bility in the intake assessment. Indeed, when empirical or para-
metric distributions are available, single or double Monte Carlo
simulations can be performed to combine chemical concentration
and food consumption.

2.2.4. Half-Life

The half-life is the time required for the total body burden of a
chemical to decrease by half in the absence of further intake. Many
studies on rats and humans have been conducted to determine half-
lives of various chemicals (33, 51, 52). The half-life value depends
on the toxicological properties of the chemical but also on the
individual’s personal characteristics. Variability between individual
half-lives can be integrated into the modeling using estimates of
specific half-lives for different population groups (children,
women, men). Some authors have characterized a linear relation
between half-life and personal characteristics, allowing individual
half-lives to be generated. Moreover, by changing an individual’s
personal characteristics with time, intraindividual variability of the
half-life over the lifetime can be integrated into the model. However,
the level of half-life variability that can be included in dynamic
exposure modeling depends on the available data. When possible,
the impact on the model output of using a population half-life or an
individual half-life varying with time has to be tested and discussed.

2.2.5. Initial Body Burden

Under the mathematical properties of the KDEM, in the steady-state
situation the dynamics of the exposure process no longer depend on
the initial value X0. Considering that the aim of the exposure
modeling is to estimate quantities at steady-state situation, there is
therefore no need to accurately define the initial body burden.
Thus, the convergence of the dynamic process to the steady-state
has to be checked before evaluating steady-state quantities
(cf. Subheading “Validation of Exposure Models”).
When the steady state is not reached within a reasonable horizon
of time compared to a human lifetime, the initial body burden is
required. It corresponds to the chemical burden present in
the body at the starting time of the exposure process. Often,
simulations are done over a lifetime, and the initial body burden
required is that of newborn children aged a few months.
The chemical body burden of newborn children comes from their
mother during pregnancy and breast-feeding. Given one-compart-
ment kinetics, a half-life on the order of several years, and a constant
exposure during breast-feeding, one may expect approximately linear
accumulation kinetics of chemicals in children's bodies. Therefore,
the body burden of newborn children can be easily estimated by combin-
ing data on daily breast milk consumption and data on the chemical
burden of breast milk. In both cases, the impact of X0 on the estimates of
quantities of interest has to be tested based on simulations starting
with different initial values.

2.2.6. Model Computation

In the steady-state situation, the exposure process Xt is governed by
a stationary distribution \mu(dx) = f(x)\,dx (48). This stationary
distribution is crucial to determine steady-state or long-term
quantities such as the steady-state mean exposure,

E_\mu[X] = \int x\,\mu(dx) = \lim_{T\to\infty} \frac{1}{T} \int_0^T X_t\,dt    (5)

In some rare circumstances, the steady-state distribution can be
analytically determined. For example, with exponential distribu-
tions for both intakes and interintake times, the resulting steady-
state distribution is a Gamma distribution (49). In most cases,
quantities of interest and the time to reach steady state are rather
determined through computer simulations. Computer simulations
are also useful to estimate quantities at times before steady state.
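For example, when both intakes and interintake times are exponential, taking expectations in Eq. 4 at stationarity (with E[e^{-kΔT}] = λ/(λ + k) for interintake rate λ) gives the mean body burden just after an intake as E[U](λ + k)/k, which a long simulation should reproduce. A sketch with illustrative parameter values:

```python
import math
import random

def simulate_post_intake_mean(k, lam, mean_intake, n=200_000, burn_in=10_000):
    """Simulate the KDEM recursion with exponential intakes and interintake
    times; return the long-run mean body burden just after an intake."""
    random.seed(3)
    x, total, count = 0.0, 0.0, 0
    for i in range(n):
        dt = random.expovariate(lam)               # interintake time
        u = random.expovariate(1.0 / mean_intake)  # intake amount
        x = x * math.exp(-k * dt) + u
        if i >= burn_in:                           # discard transient phase
            total += x
            count += 1
    return total / count

k, lam, mean_u = 0.1, 1.0, 1.0       # elimination rate, intake rate, mean intake
m_hat = simulate_post_intake_mean(k, lam, mean_u)
m_theory = mean_u * (lam + k) / k    # analytic mean of the embedded chain
```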

2.2.7. Simulation of Individual Exposure from Population Exposure Models

Verger et al. (49) describe a population-based approach of the KDEM
applied to the occurrence of methylmercury in fish and seafood
consumed by women aged over 15 years. The intakes and interintake
times are considered independent and drawn from two exponential
distributions fitted from the French national consumption survey
INCA (53). A fixed half-life is used to define the elimination process
between intakes over the lifetime. A trajectory of the exposure process is
computed by randomly selecting an intake and an interintake time from
their respective distributions. The corresponding body burden is cal-
culated from Eq. 4 using the previously selected values and the fixed
value of the half-life. Each trajectory is therefore computed from
intakes of the whole women population of INCA and represents the
dynamics of the exposure process of this population. In this way, the
estimated quantities are those of a “mean individual” computed
with information from the whole population.
Based on this population-based model, individual intakes can be
randomly selected from the population distribution without account-
ing for the change of intake levels with increasing age. However,
since food consumption behaviors, personal characteristics, and chem-
ical occurrence vary with age, simulations over a lifetime must integrate
the intake and half-life variability with time. Simulating individual
trajectories over time faces a lack of individual data. One solution can
be to compute simulations combining children's and adults' intakes,
according to the following procedure. For each intake time Tn over
lifetime, an intake value can be randomly selected from the intake
distribution given the age of the simulated individual at Tn. The time
window can be defined considering a sufficient number of observed
intakes to compute an exposure distribution. For example, if the time
window is of 3 years, the exposure distribution changes every 3 years
with the increasing intake time Tn. Intake time can be fixed to a day or
a week, or else randomly selected in the interintake distribution of the
corresponding time window.

2.3. Risk Characterization: The Case of Dynamic Modeling of Exposure from Food

Risk characterization is usually the step of risk assessment which
combines the outputs from hazard characterization and exposure
assessment into one final assessment.
In the case of the dynamic exposure model described above,
some more specific risk characterization can be undertaken. As a
matter of fact, an interesting risk measure is the probability that the
exposure process exceeds a threshold dref, given by:

P_\mu\{X > d_{ref}\} = \mu([d_{ref}, \infty)) = \lim_{T\to\infty} \frac{1}{T} \int_0^T \mathbb{I}\{X(t) \geq d_{ref}\}\, dt.

Another interesting quantity is the amount of chemical above
dref, calculated as:

E_\mu[X - d_{ref} \mid X > d_{ref}] = \frac{1}{\mu([d_{ref}, \infty))} \int_{d_{ref}}^{\infty} (x - d_{ref})\, \mu(dx).
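Both quantities can be estimated from a simulated trajectory; the sketch below estimates the exceedance probability as the long-run fraction of time spent above dref, using the fact that after an intake the burden x decays exponentially and so stays above dref for min(ΔT, ln(x/dref)/k) when x > dref. Parameter and distribution choices are illustrative assumptions:

```python
import math
import random

def exceedance_probability(k, lam, mean_intake, d_ref, n=100_000):
    """Estimate P{X > d_ref} as the long-run fraction of time above d_ref.

    After an intake the burden x decays as x * exp(-k * t); it stays above
    d_ref for min(dt, ln(x / d_ref) / k) whenever x > d_ref."""
    random.seed(4)
    x = 0.0
    total_time = time_above = 0.0
    for _ in range(n):
        dt = random.expovariate(lam)
        if x > d_ref:
            time_above += min(dt, math.log(x / d_ref) / k)
        total_time += dt
        x = x * math.exp(-k * dt) + random.expovariate(1.0 / mean_intake)
    return time_above / total_time

# With these illustrative parameters the time-average burden is about 10,
# so a low threshold is exceeded often and a high one rarely
p_low = exceedance_probability(k=0.1, lam=1.0, mean_intake=1.0, d_ref=5.0)
p_high = exceedance_probability(k=0.1, lam=1.0, mean_intake=1.0, d_ref=20.0)
```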

The threshold dref can be the extension, in a dynamic context, of
the tolerable daily intake (TI) of a chemical, i.e., the amount to which
consumers could be exposed throughout their life without health
risk. The tolerable intake can be a health-based guidance value
defined by international bodies (JECFA, EFSA), such as the
acceptable daily intake (ADI) or the provisional tolerable weekly
intake (PTWI) of the chemical of interest. According to the definition
of the tolerable intake, a Kinetic Tolerable Intake (KTI) can be built
(49). The KTI is constructed by considering that the tolerable intake
is the dose of chemical ingested per time unit and eliminated between two intake
times with the kinetic model (3). Under that condition, the
dynamic evolution of the process is

x_{n+1} = x_n e^{-k\,\Delta T_{n+1}} + TI.

At the steady state, the reference process stabilizes at a safe
level, obtained from the TI and the half-life:

\lim_{n\to\infty} x_n = \frac{TI}{1 - e^{-k}}.

Note that when k is close to 0, the function 1 - e^{-k} can be
approximated by k using a Taylor series, and thus the previous
equation reduces to Eq. (2).
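Taking unit interintake times, the recursion and its limit can be checked numerically, together with the Taylor approximation TI/k for small k; the TI and k values below are illustrative:

```python
import math

def kti_steady_level(ti, k, n=2000):
    """Iterate x_{n+1} = x_n * exp(-k) + TI for n steps (unit interintake
    times) and return the resulting level."""
    x = 0.0
    for _ in range(n):
        x = x * math.exp(-k) + ti
    return x

ti, k = 1.0, 0.01                   # tolerable intake per time unit, elimination rate
limit = kti_steady_level(ti, k)     # simulated steady level
exact = ti / (1.0 - math.exp(-k))   # analytic limit TI / (1 - e^{-k})
taylor = ti / k                     # small-k Taylor approximation
```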
Another value to define the threshold dref can be the body burden of
chemical (endpoint) corresponding to possible adverse health
effects. According to that definition, the probability of the exposure
process being above dref corresponds to an incidence of disease.
Endpoints are derived from dose–response relationships provided
by laboratory animal experiments. They can be the lowest chemical
dose which can induce an observed adverse effect (LOAEL) or the
dose associated with a given level of this effect (benchmark dose, BMD).
For example, a BMD related to reproductive toxicity can
be based on the percentage of decrease in sperm production. To
extrapolate the human dose–response relationship from the animal
dose–response, interspecies (variability between man and rat) and
intraspecies (variability between humans) factors have to be used.
Each factor can be defined by a single value or by a probabilistic distribution.

3. Methods: Practical Implementation and Best Practices for Quantitative Modeling of Population Variability

3.1. Gathering Adequate Supporting Data for Risk Assessment

Various methodologies have been developed to gather available
data for the quantification of population variability with respect to
hazard identification and characterization [toxicokinetics, toxico-
dynamics (dose–response)], occurrence and consumption (expo-
sure assessment) for a particular chemical.

3.1.1. Gathering Evidence for Chemical Hazard Characterization in Humans

Prior to any hazard characterization (dose-dependent TK or dose–
response modeling), the collection of relevant information is obvi-
ously a key step which essentially drives the approach to undertake for
the data modeling. In particular, it will condition the way population
variability will or can be accounted for. Data collection, either
prospectively through ad hoc studies or retrospectively based on
literature reviews or SR (54), is described hereafter, with a focus on
how such population variability can be best synthesized.
For hazards with long-term effects, the exposure should be
compared to the health-based guidance values (e.g., provisional
tolerable weekly intake, see above) or with the benchmark dose
level at which these effects are likely to occur. In practice the
comparisons are commonly done without considering the kinetics
of the substance, assuming its complete elimination within the
period of reference of the health based guidance value (e.g., 1 week
for a tolerable weekly intake). For chemicals with a long half-life
this assumption is not valid and the temporal evolution of
the [weekly] exposure should be compared dynamically with the
[weekly] consumption of the tolerable amount of the substance
which can then be called the kinetic tolerable intake or KTI (49). In
that case the model for dietary exposure should combine the dis-
tributions of consumption and occurrence with the distribution of
time intervals between exposure occasions, so that interindividual
variability in half-lives can be taken into account, allowing a move
from a population-based to an individual-based model.

Design and Analysis of Randomized Studies

Most human and animal randomized studies can be used for
toxicological analyses.
so that the analysis of primary outcomes is simply restricted to
group comparisons. Such randomized studies are typically designed
to assess a given effect in relation with chemical exposure. In the
context of risk assessment, these designs are therefore more suitable
for hazard identification rather than hazard characterization. The
simplest design would then be a two-arm study comparing exposed
rats versus nonexposed rats. This assumes that the exposure level is
predefined. Responses to exposure are then compared between the
two groups using typical statistical tests such as Fisher tests for
quantal responses. In cases where a range of doses needs to be
tested, multiple group comparisons can be analyzed with
ANOVA-type analyses.
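For a 2 × 2 quantal-response table, the one-sided Fisher exact p-value is a short hypergeometric computation; the sketch below uses hypothetical counts:

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]
    (a = exposed responders, b = exposed non-responders, c = control
    responders, d = control non-responders). Returns P(X >= a) under
    the hypergeometric null of no treatment effect."""
    row1, col1, n = a + b, a + c, a + b + c + d
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        p += comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)
    return p

# Hypothetical study: 3/4 exposed animals respond versus 1/4 controls
p_value = fisher_one_sided(3, 1, 1, 3)  # = 17/70
```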
By definition, the randomization structure is deemed to
balance all factors that could affect primary outcomes, so that no
adjustments for covariates are needed in the final group comparison
analysis, which can be performed by any standard statistical
package. As a consequence, the main statistical challenge lies in
the randomization structure which should account for the varia-
bility structure and clustering of the data to be collected.

Design and Analysis of Observational Studies

The variability structure of data is more critical in the analysis of
observational human data, as influential factors have generally not
been balanced by the study design. The list of potential covariates
affecting human data can be vast (e.g., age, gender, body weight,
etc.). Confounding factors often interfere with the outcome variable
of interest. For example, in the case of dose–response assessment,
socioeconomic factors may be confounded with exposure when high expo-
sure is correlated with poorer living conditions, which may also
increase the risk of ill health. This difficulty is commonly (partly)
addressed by including the confounding factors as covariates in a
multiple regression model.
An extreme particular case of design imbalance is the absence
of comparators, which includes the absence of a control group.
For example, in the case of exposure assessment, it can be that
zero exposure data are not available. Therefore the response
at zero exposure is, in fact, estimated by low-dose extrapolation
based on a dose–response model, resulting in uncertain (model-
dependent) estimates of the response in an unexposed population,
and hence the outcome of the analysis is likely to be strongly
model-dependent. Such weaknesses of the analysis need to be
acknowledged and possibly quantified in the discussion of the
results (e.g., using sensitivity analyses).
Although covariates may explain a part of interindividual varia-
tions, the variability structure present in human or environmental
data is often more complex at the large scale on which public health
or risk assessment questions are raised (including, e.g., variations
between population ethnic subgroups or temporal and spatial var-
iations). Moreover, some clusters often underlie such data, such as
the country, the slaughterhouse, or the field from where those data
come. Some of this complexity may or may not be captured and
handled by appropriate study design. Where the data are available,
hierarchical models can often help to account for such complexity
which is an important aspect to be evaluated in risk assessment.
A common particular case of variability pattern is the time and
space variation of the collected data, which often requires particu-
lar care in food and feed safety questions. Cross-sectional studies can
be problematic regarding time variations such as seasonal or peri-
odic effects or time trends. For example, in the case of hazard
characterization with long-term effects, exposure at the time of
the study may not reflect the long-term exposure. Similarly, cohort
studies may not capture the spatial variation. As a result, careful data
selection is necessary prior to any analysis. If data are not expected to
give an unbiased picture, conservative choices should be made (e.g.,
focusing on vulnerable subgroups). Epidemiological studies are
often conducted on a larger scale than experimental studies, hence
with much larger sample sizes allowing for data exclusion when
necessary to improve relevance and quality of data. The level of
precision and accuracy expected from the toxicological assessment
should be commensurate to the size of the time and space scale of
relevance.
Missing or censored data are classical statistical issues. They can
be of great importance in meta-analyses, especially for nonrando-
mized studies and for national surveys, where the proportion of
missing or censored data can be above 50% (e.g., up to more than
80% in chemical exposure studies) and where the missingness is
usually far from being at random. Statistical approaches have been
developed to handle such missing data (e.g., multiple imputation or
mixed effect models) and censored data (e.g., using adequate max-
imum likelihood approaches or Kaplan–Meier estimators). How-
ever, it is often useful for risk managers to compare results using
more naïve imputations based on worst- and best-case scenarios.
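For left-censored concentration data (non-detects), such a naïve worst- and best-case comparison simply substitutes the limit of detection (LOD) or zero for each censored value; a sketch with hypothetical numbers:

```python
def censoring_bounds(values, detected, lod):
    """Mean concentration under lower-bound (non-detects = 0) and
    upper-bound (non-detects = LOD) substitution.

    values[i] is the measured concentration when detected[i] is True
    and is ignored otherwise."""
    lower = [v if d else 0.0 for v, d in zip(values, detected)]
    upper = [v if d else lod for v, d in zip(values, detected)]
    n = len(values)
    return sum(lower) / n, sum(upper) / n

# Hypothetical survey: 3 detects, 5 non-detects, LOD = 0.1 mg/kg
values = [0.4, 0.25, 0.7, None, None, None, None, None]
detected = [v is not None for v in values]
values = [v if v is not None else 0.0 for v in values]
low_mean, high_mean = censoring_bounds(values, detected, lod=0.1)
```

If the risk conclusion is the same under both bounds, the censoring can be considered non-influential; otherwise the likelihood-based methods mentioned above are needed.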

Gathering Evidence from Multiple Published Studies

The systematic review (SR) methodology combined with a meta-
analysis can be applied in the context of chemical risk assessment,
when a number of studies for that particular chemical are available.
A thorough account of the SR methodology is beyond the aim of
this chapter and its potential applicability in risk assessment applied
to food and feed safety has been explored by EFSA in a guidance
document. However, a few basic concepts are worth mentioning.
SR has been developed by the Cochrane Collaboration and
applied in human medicine, epidemiology, and ecology for a num-
ber of evidence-based syntheses (cost-effectiveness of treatments,
reporting of side effects, relative risk of disease, meta-analysis of
biodiversity or abundance data in ecology). SR has been defined as
“an overview of existing evidence pertinent to a clearly formulated
question, which uses prespecified and standardized methods to
identify and critically appraise relevant research, and to collect,
report, and analyze data from the studies that are included in the
review” (54).
When applied to risk assessment, the starting point of an SR is
to identify the question type and how to frame the question. Using
the Cochrane Collaboration methodology, four key elements frame
the question, namely population, exposure, comparator, and out-
come. Such a question can be closed-framed or open-framed. Typi-
cally, SR is appropriate for a closed-framed question, since a
primary research study design can be envisaged to answer the
question.
Dose-dependent toxicokinetics (i.e., half-life, clearance) or
dose–response in humans or in a test species (rat, mouse, dog,
rabbit, etc.) can be taken as a generic example: the population can be
the human population or, in the absence of relevant human data, the
test species (rat, mouse, etc.); the exposure is the chemical; the
comparator would correspond to the different dose levels; and the
outcome can be either a toxicokinetic parameter (half-life, clearance,
Cmax, etc.) or toxicity in a specific target organ (liver, lung, kidney,
heart, bladder, etc.).
Ideally, once the primary studies in humans or test species referring
to the dose-dependent toxicokinetics or dose–response for a spe-
cific chemical have been collected, a meta-analysis can be performed
so that modeling of the population variability is possible (54).
Whether a question is suitable for systematic review or not does
not necessarily mean that a systematic review would be worthwhile
or practically feasible. Considerations include the following: priori-
tization of risk assessment model parameters for which refinement
of the parameter estimates is considered most critical; the quantity
and quality of available evidence; the source and potential confi-
dentiality of the evidence; the need for transparency and/or for
integrating conflicting results (54). General aspects of the method-
ology of meta-analysis are given below and specific examples with
regards to analysis of (1) toxicokinetic variability for a number of
metabolic routes in humans and (2) human toxicodynamics of
cadmium are described in Chapter 4.
Meta-analysis is a statistical procedure to review and summarize
data from several past studies. By combining information from
relevant studies, meta-analyses may provide more precise estimates
than those derived from the individual studies included within a
review. Typically, such compilation of literature data is used in risk
assessment for hazard characterization in most cases, but also some-
times to identify unknown or hard-to-study hazards such as for
Genetically Modified Organisms (55). Meta-analyses also facilitate
investigations of the consistency of evidence across studies and the
exploration of differences across studies.
Generally speaking, a proper meta-analysis should make it possible to:
l Adjust for all identified sources of bias, such as selection bias or
design bias.
l Account for all significant and measurable sources of variability.
l Weight each study according to the evidence it provides:
weighting of evidence is a classical issue of meta-analysis, and
one of its primary objectives. It includes weighting between
study designs and weighting between studies of the same
type, e.g., according to their sample size or to the precision
of the estimates each of them provides.
More specifically, a number of issues are worth being listed
when carrying out such analyses. Each of them may require a
dedicated statistical method to be handled in the analysis.
l The data collected often result from aggregation or averaging
of different individual values (e.g., regional averages or age
groups averages). Depending on the level of heterogeneity
and differences within the aggregated data, this aggregation
can translate into a simple precision problem or into a severe
fallacy issue in the results interpretation (e.g., for ecological
designs) if not accounted for by the statistical models (56).
• Heterogeneity between studies is also a typical issue to be
addressed in meta-analysis. Study heterogeneity can arise from
20 Population Effects and Variability 551

many sources, such as the use of different study populations, different environmental conditions, or different analytical techniques employed in different studies. In such cases, specific
statistical tools are needed to address the related specific issues,
such as the variation across studies or the publication bias (57,
58). These tools include, e.g., the use of random effect models
to account for interstudy variability. Statistical tests for hetero-
geneity have commonly been used especially in the case of
clinical trials (59), but they should be used and interpreted very cautiously in most other types of study: they are often too weak to detect heterogeneity in large-scale food safety problems, where variability sources are often large, numerous, and complex, and where data quality is often poor.
Although the chi-square test for homogeneity is often used in meta-analyses (60), its power is known to be very low (61), especially with a small number of studies. It is therefore good statistical practice to investigate the post hoc power of such a test to detect a difference of the size of the claimed effect. Moreover, the Higgins I² statistic usually complements the assessment of homogeneity by quantifying the inconsistency between studies (59). In general, the Higgins statistic is preferable, as it is less sensitive to the number of studies included.
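To make the heterogeneity statistics concrete, the sketch below computes Cochran's Q and the Higgins I² measure from a set of study effect estimates with known within-study variances; all numbers are invented for illustration.

```python
def cochran_q_i2(effects, variances):
    """Cochran's Q statistic and Higgins' I^2 for study effect estimates
    with known within-study variances (inverse-variance weighting)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I^2: fraction of total variation beyond what chance alone would give
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2

# Toy example: three study estimates with their within-study variances
q, i2 = cochran_q_i2([1.2, 0.8, 2.1], [0.10, 0.15, 0.20])
```

An I² near zero indicates that the observed between-study spread is compatible with sampling error alone; values toward one indicate substantial heterogeneity.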

3.1.2. Gathering Evidence for Exposure Assessment

To perform the probabilistic assessment of human dietary exposure, the basic available data are the distribution of food consumed and the distribution of hazard occurrence in food. Most of the data available are collected at the national level and are assumed to be representative for the country. For an international perspective, the various data sets should be combined. This chapter emphasizes this step and aims to describe the data preparation.

Food Consumption Data

Food consumption data are collected at the national level based on various survey methodologies such as food records or recalls (62). Moreover, the year of conduct, the population groups, and the age categories differ greatly between countries; it is therefore not possible to use these data directly for an international probabilistic assessment (63). The approach currently used consists in comparing national distributions and using the worst case, i.e., the highest
consumption for consumers only observed in one country, for the
comparison with health based guidance value. This means in prac-
tice that the percentage of risk or the percentage of individuals
exposed above the health based guidance values is extrapolated
from the national to the international population level and is likely to be considerably higher than in reality, depending on the ratio between the considered populations. For example, if the risk is
estimated for the 95th percentile of exposure in Netherlands
(16.6 million inhabitants), it represents about 828,000 people.
552 J.L. Dorne et al.

When extrapolated to the European Union (501 million inhabitants), it represents 25 million people. If the population at risk within the European Union were in fact only 5% of the Dutch population, this would correspond to a risk assessment at about the 99.8th percentile. This represents a large source of uncertainty in
the context of establishing an appropriate level of protection.
Possible improvements could be envisaged in the absence of harmonized data collection. First, national food consumption data based on individuals can be combined using a common food classification. A single distribution for the consumption of each food category could be simulated based on national data weighted as a function of each country's population. In that case, a single day should be used for each individual to increase the comparability between survey results (62). However, a number of assumptions should be made:
• All surveys were performed at the same time.
• Recalls and records provide similar results.
• The individuals in the survey fully represent the whole population of the country.
• The consumption of each food category is independent of that of the other food groups.
Another approach would consist in identifying similar dietary patterns across countries or regions. This would have the advantage of allowing exposure to be estimated based on various consumer behaviors rather than on citizenship of one country or another. Moreover, this approach would account for the dependency between the various food groups.
Clustering techniques are widely applied to reach this goal and to identify the similarities and differences in dietary patterns between countries and regions. As an example, the World Health Organization (WHO) developed this approach to describe the various diets around the world, resulting in 13 so-called cluster diets (64). In that case, the clustering was based on economic data collected by the Food and Agricultural Organization and known as the FAO Food Balance Sheets. A more recent study (65) applies nonnegative matrix factorization (NMF) techniques to food consumption data in order to understand these combinations. This paper proposes a different modeling of population
consumption that can be directly applied to the consumed food
quantities. Even though a very large number of different foods are
involved in individual consumption patterns, all possible food com-
binations are not observed in practice. Certain foods are preferen-
tially combined or substituted as a function of hedonic choices
and/or sociocultural habits. One may then realistically expect that
the vast majority of consumption data can be described by a few
patterns, which are linear combinations of consumption vectors of
specific foods. These underlying factors can be interpreted as latent
variables. Therefore, according to this modeling, an individual diet
must be seen as a linear superposition of several consumption
systems. When identified, the consumption systems could be used
for risk assessment and their optimal combination should be iden-
tified for risk/benefit analysis.
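As an illustration of the latent-pattern idea, the following sketch factors a small synthetic consumption matrix with Lee-Seung multiplicative updates; it is a toy stand-in for the NMF analysis of ref. 65, and all data are fabricated.

```python
import numpy as np

def nmf(V, k, iters=500, seed=0):
    """Factor a nonnegative consumption matrix V (individuals x foods) into
    W (individuals x patterns) and H (patterns x foods), V ~ W @ H, using
    Lee-Seung multiplicative updates (which preserve nonnegativity)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, m)) + 1e-3
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# Fabricated data: 6 individuals mixing 2 latent consumption patterns
rng = np.random.default_rng(1)
true_W = rng.random((6, 2))
true_H = np.array([[5.0, 1.0, 0.0, 0.0],    # pattern 1 combines foods 1-2
                   [0.0, 0.0, 3.0, 2.0]])   # pattern 2 combines foods 3-4
V = true_W @ true_H
W, H = nmf(V, 2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)  # relative reconstruction error
```

With real survey data, each row of H can be read as a consumption system (diet pattern) and each row of W as an individual's loading on those patterns.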

Occurrence Data

The main difference between occurrence and consumption data is that the food market is generally assumed to be global; therefore,
the country in which food is sampled is not considered to be the
only or even the main source of variability.
Because of the inherent variability between samples regarding
the occurrence of hazards in food, the uncertainty in the results is
likely to increase as the number of samples decreases. In such cases, the exposure assessment should therefore be questioned and, when necessary, a deterministic estimate of the range of exposure should be performed (66). The current chapter assumes a dataset with a sufficient number of samples (e.g., >50 samples) for each food category considered, allowing a probabilistic approach with a reasonable level of confidence.
After combining all analytical results together using a common
food classification, the whole dataset should be described (e.g., with
histograms/density plot/cumulative distributions) and analyzed
with a particular focus on its potential sources of heterogeneity
(country of origin, year of sampling, laboratory characteristics,
analytical techniques, etc.).
One of the main issues is therefore very often to handle con-
centration data described as being below the limit of reporting
(analytical limit of detection or limit of quantification). These data
are often known as nondetects, and the resulting occurrence distri-
bution is left-censored. As a first step, the impact of left-censored data can be addressed by imputing nondetects with values equal to zero and to the limit of reporting, corresponding respectively to a lower- and an upper-bound scenario. The effect of this substitution can be evaluated on the mean and/or the high percentiles. If the effect is negligible, then the exposure assessment should be based on substitution of nondetects by the limit of reporting (upper-bound approach). This approach has the disadvantage of hiding the variability between samples and overestimating the exposure, but it can be used in that particular case because of its low impact on the overall result.
In other cases, depending on the percentage of censored data,
parametric or nonparametric modeling could be used. When the
percentage of censoring is high (e.g., >50%), it has been observed in the literature (66, 67) that a parametric approach performs better than nonparametric ones. A set of candidate parametric models should be defined by inspecting the density plots. In recent guidelines, EFSA (66) proposes to select the best parametric model
on the basis of goodness-of-fit statistics (AIC or BIC) and to
implement the lack-of-fit test (Hollander–Proschan test) of the
selected model. When the percentage of censoring is lower and
when datasets are very heterogeneous with multiple limits of
reporting, the (nonparametric) Kaplan–Meier method can also be
used (50). This method spreads the weight of censored data, those
below limits of reporting, over all lower uncensored data and zero.
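The lower-/upper-bound substitution described above can be sketched as follows; here `None` marks a nondetect and `lor` is the limit of reporting (names chosen for illustration).

```python
def bound_scenarios(values, lor):
    """Lower-/upper-bound means for a set of concentrations in which None
    marks a nondetect below the limit of reporting (lor)."""
    lower = [v if v is not None else 0.0 for v in values]  # nondetects -> 0
    upper = [v if v is not None else lor for v in values]  # nondetects -> LOR
    return sum(lower) / len(lower), sum(upper) / len(upper)

# Five hypothetical measurements, two of them nondetects, LOR = 0.5
data = [0.8, None, 1.4, None, 2.0]
lo_mean, up_mean = bound_scenarios(data, 0.5)
# The gap between the two means indicates the impact of the censoring
```

If the gap between the two bounds is negligible relative to the health-based guidance value, the simple upper-bound approach suffices; otherwise, parametric or Kaplan-Meier methods should be considered.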
The number of chemical measurements in a specific product is
generally too small to estimate high percentiles of the occurrence
distribution. To improve estimation of such percentiles, the upper
tails of the chemical occurrence distribution can be modeled using
extreme value theory (67, 68). This method can also be applied directly to the exposure distribution (69).

3.2. Specific Good Modeling Practices

3.2.1. Population Model Building

The purpose of this section is not to detail general "best modeling practices" that would apply to any statistical modeling. Instead, we propose to emphasize the good practices that are specific to population models applied to chemical risk assessment. Note that a thorough dose–response model-building guidance document prepared by the WHO International Programme on Chemical Safety is available in ref. 70 for the purpose of risk assessment.
Generally speaking the key element in structuring a population
model is the definition of its stochastic component and how to
balance it with the deterministic one. This balance necessarily
depends on the objective of the modeling exercise. When individual predictions are of primary interest (e.g., when using the model to define a maximum tolerable dose of a compound for a specific individual), the deterministic structure should be developed to the largest extent allowed by the available data. Indeed, such a deterministic model can then better account for individual specificities and
covariates. Conversely, when the population predictions are of
primary interest (e.g., when using the model to assess the response
of 95th population percentile), the variability component becomes
essential and the stochastic structure should be developed to best
capture such population variability.
The choice between a population one-compartment model (with population variability of TK parameters) and larger PBTK
models (without population variability) with more compartments
to refine the description of metabolism is a typical illustration of
such balance between deterministic and stochastic structures as
exemplified in the cadmium example below.
The way to construct the population or hierarchical structure of
a stochastic model relies on the identification of the main patterns
found in the toxicological data to be used and of importance in the
risk assessment to be made. Those patterns can be captured or not
depending on the design of the data collection. A common way to
grasp and describe this variability structure is made by considering
the hierarchy of the different levels or scales at which data are
collected, starting from the wider scale (ecological or animal data) down to the individual scale (data from one individual), as exemplified by Fig. 4. Each level would then be described by a specific random effect in a regression model. This description is often meaningful from both statistical and biological standpoints.

Fig. 4. Hierarchical levels commonly found in toxicological data, from interspecies variability down to intraindividual variations. Such patterns should drive the variability component of statistical models used for the data analysis.
Accounting for interspecies variability applies to large-scale assessments (e.g., ecological evaluation) involving a large number of
species. Interstudy variability is accounted for in a context of
meta-analysis as it is often the case in risk assessment (see Subhead-
ing “Gathering Evidence from Multiple Published Studies”). Inter-
individual variability is the most common source of variability
captured by toxicological data and of primary importance. It can
sometimes be reduced or "explained" by covariate inclusion.
Finally, intraindividual variations such as interoccasion variability
or seasonal effects can be assessed in case of longitudinal or
repeated measurements over time.
The critical modeling assumption to be set at each hierarchical
level is obviously the population distribution chosen for each random effect. Biological assessments have been shown to be sensitive to this choice, which can therefore be a weak point in a quantitative toxicological or risk assessment. The standard choice
of lognormal distribution is usually applied when the main outcome
modeled is a chemical concentration. It is a good practice to plot the
empirical distributions of such outcomes to visualize and support
the modeling assumptions. Furthermore, it is essential to check that
residual variances are not correlated with the random effects. Oth-
erwise, it is likely that the distribution chosen is not adequate.
A way to include more determinism in a model is to refine the
underlying structural model by describing the biological mechanisms involved in more detail. In PBPK models, this translates into adding more compartments or into refining the differential equations describing the metabolism (e.g., by adding saturation effects or elimination lag times).
Another way to add more determinism in a population model is
to include more predictors or covariates. These can explain the
population variability, and therefore reduce the residual variability
component of the model. Covariate selection is therefore a critical
step in population modeling. When using frequentist inference and
the standard paradigm of statistical testing, there are two common
and widely used approaches to select covariates on statistical
grounds:
1. For nested models, which is typically the case for nonlinear mixed-effects regression models used in TK or TD assessments, likelihood ratio tests are particularly suited to compare models with versus without a set of covariates. Their comparison is then based on the difference of their respective objective functions (−2 × log-likelihood), which asymptotically follows a chi-square distribution with degrees of freedom equal to the difference in the number of model parameters. The NONMEM software reports such objective functions as a default output for each fitted model, allowing users to perform model comparisons.
2. For nonnested models, information criteria such as AIC or BIC can be used and seen as an extension of log-likelihood ratio tests. However, it should be noted that the degrees of freedom of fixed effects in the context of mixed-effect models are often debatable (71). The Deviance Information Criterion (DIC), used in a Bayesian framework, is therefore often more robust for comparing hierarchical models (72).
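As a minimal illustration of point 1, the sketch below computes the likelihood-ratio p-value for a single added covariate (1 degree of freedom, for which the chi-square survival function reduces to erfc(√(x/2))), plus an AIC comparison for the non-nested case; the log-likelihood values are invented.

```python
import math

def lrt_pvalue_1df(loglik_base, loglik_full):
    """Likelihood ratio test for one added covariate (1 df): the deviance
    drop is asymptotically chi-square(1), whose survival function is
    erfc(sqrt(x / 2))."""
    deviance_drop = 2.0 * (loglik_full - loglik_base)
    return math.erfc(math.sqrt(max(deviance_drop, 0.0) / 2.0))

def aic(loglik, n_params):
    """Akaike information criterion, for comparing non-nested models."""
    return 2.0 * n_params - 2.0 * loglik

# Invented log-likelihoods: one covariate raises logL from -250.0 to -247.0
p = lrt_pvalue_1df(-250.0, -247.0)   # deviance drop of 6.0, p ~ 0.014
```

At the conventional 5% level (or the stricter 1% level discussed below), a drop of 6.0 deviance units for one added parameter would justify keeping the covariate.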
In the area of population PK/TK, the forward/backward selec-
tion is often used when the number of potential covariates is so
large that all possible pair-wise model comparisons are not feasible
(e.g., when the number of potential covariates is above 5). This
empirical and pragmatic way of selecting a set of covariates does not
ensure theoretically full optimality of the final model but it gener-
ally performs well and provides close-to-optimal solutions. In this
context, prior to any investigation, a base model (with no covariate)
is first fitted to the data. Subsequently, covariates are screened using
a two-step forward–backward approach in which covariates are added to or removed from the base model based on likelihood ratio tests with a predetermined significance level. More specifically:
• The forward (selection) step consists in testing, for each single
factor independently, the effect on the main outcome. At the
end of this one-by-one selection process, a full model can be
built, including all selected factors.
• The backward (elimination) step consists in testing one-by-one
whether each selected factor could be removed or not from the
full model. At the end of this elimination process, a final model
can be built, which integrated all remaining significant factors.
One obvious theoretical flaw in this approach is the multiple
testing aspect: the final model building is based on the outcome of a
series of tests at a given significance level. For this reason, it is usually advised either to adjust for multiplicity (which can be tricky) or to set a low significance level for each test, e.g., 1% instead of the conventional 5%.
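A bare-bones sketch of the forward (selection) step described above; `fit` is an assumed callback that refits the model with a given covariate set and returns the likelihood-ratio p-value of the last added covariate. The toy p-values are fabricated.

```python
def forward_select(candidates, fit, alpha=0.01):
    """Greedy forward step: repeatedly add the candidate covariate with the
    smallest p-value below alpha; stop when no candidate qualifies."""
    selected, remaining = [], list(candidates)
    while remaining:
        pvals = {c: fit(selected + [c]) for c in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy scoring: pretend "age" and "weight" test significant, "smoking" does not
fake_p = {"age": 0.001, "weight": 0.005, "smoking": 0.4}
chosen = forward_select(["age", "weight", "smoking"],
                        lambda covs: fake_p[covs[-1]])
```

A backward elimination step would mirror this loop, removing one covariate at a time from the full model while the corresponding test remains nonsignificant.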
Aside from all statistical considerations, model structure can always be motivated and justified by biological rationales based, e.g., on
mechanism plausibility or on the relevance of the predictions
derived from the model in the dose range of interest.

3.2.2. Model Validation

Validation of Population TK/TD Models

Population models are typically evaluated with respect to how well the model simulations can reflect past and/or future observations, or more specifically a function of the observations of highest interest for the assessment, such as the 95th population percentile of a concentration.
The prediction error is often evaluated on the observations
compared either to individual or to population predictions. Popu-
lation predictions are obtained after setting all random effects to
zero, i.e., the first-order approximation PRED in NONMEM.
Individual predictions are obtained after setting random effects to
their subject-specific values as estimated by the model. These are
typically evaluated using post hoc Bayesian estimates as with the
IPRED command of NONMEM. Bayesian individual predictions
are straightforward outputs from any Bayesian software like Win-
BUGS or MCSim.
Another measure of the predictive performance can be derived
from evaluating the likelihood of the data given the model esti-
mated with fixed population parameters.
Model validation can be done using an external dataset or
alternatively using a cross-validation approach. In the latter case,
the dataset is split into two subsets: one for the fitting (the larger)
and one for the validation (the smaller). The split should be done
randomly, and could be repeated.
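The random fitting/validation split can be sketched as:

```python
import random

def split_dataset(records, fit_fraction=0.8, seed=42):
    """Randomly split records into a larger fitting subset and a smaller
    validation subset, as used when cross-validating a population model."""
    rng = random.Random(seed)
    shuffled = records[:]      # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * fit_fraction)
    return shuffled[:cut], shuffled[cut:]

# 100 hypothetical individuals, identified here simply by index
fit_set, val_set = split_dataset(list(range(100)))
# Repeating with different seeds yields the repeated random splits
```

The 80/20 proportion is illustrative; the text only requires that the fitting subset be the larger one.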
Critical validation steps of population models include graphical checks of residual error variances and of random effect distributions. Visual checks can be done using boxplots; residuals should be centered around zero, without systematic bias or patterns.
In the context of chemical risk assessment, sensitive assump-
tions in population modeling are typically:
• The choice of random effects' distributions
• The choice of dose–response models
Models can be compared using Akaike or Bayes information
criteria (AIC, BIC). The deviance information criterion (DIC) can
be considered as a generalization of AIC for hierarchical models.
The advantage of DIC over other criteria in the case of Bayesian
model selection is that the DIC is easily calculated from the samples
generated by a Markov chain Monte Carlo simulation. AIC and
BIC require calculating the maximum likelihood, which is not
readily available from the MCMC simulations. WinBUGS provides
DIC calculations in its standard tool menu.

Validation of Exposure Models

The KDEM model predicts for each simulated trajectory the mean body burden of chemical at steady-state. The predictions from
dietary exposure can be compared with internal exposure (i.e.,
biomarkers) to validate the model. Measurements on internal expo-
sures to chemical are usually sampled in urine, hair, blood, or breast
milk. The chemical body burden predicted by KDEM has to be converted into the same units as the measurements using conversion factors. For example, the predicted body burden has to be converted into a concentration in body fat for comparison with measurements in breast milk. Conversion factors such as the percentage of fat depend on the personal characteristics of the individuals, and their variability could also be included.
Often, internal measurements are not available for the popu-
lation for which the dynamic modeling of exposure process has
been computed, especially when the population of interest is the
whole population of a country. In that case, validation of the exposure process is done at the population level by comparing the distributions of predicted body burdens and internal measurements. When studies coupling biological measurements
with frequency consumption questionnaire have been carried out,
the interest is to link the body burden predicted from the ques-
tionnaire with the associated internal measurement for each indi-
vidual. Due to high uncertainty on past contamination and
consumption, for an individual the point estimate with one
trajectory can be far from its internal measurement value. In
such a case, for an individual, several trajectories can be simulated
under different scenarios regarding contamination and consump-
tion using probabilistic distributions. A confidence interval for the
predicted exposure can then be constructed and compared with
the internal exposure value. Sometimes, measurements have been performed on a specific population that has certainly not reached the steady-state situation; an example is pregnant women. Using a
well-defined initial body burden X0 and computing a sensitivity
analysis to this value, estimation of the body burden can however
be conducted based on the external exposure and compared with
internal measurements.
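The trajectory-based confidence interval described above can be sketched as follows; `simulate` stands for an assumed user-supplied function that draws one trajectory endpoint under sampled contamination and consumption scenarios, and the toy scenario is fabricated.

```python
import math
import random

def prediction_interval(simulate, n_traj=2000, level=0.95, seed=7):
    """Monte Carlo interval for a predicted body burden: `simulate` draws
    one trajectory endpoint per call from sampled scenarios."""
    rng = random.Random(seed)
    draws = sorted(simulate(rng) for _ in range(n_traj))
    lo = draws[int(n_traj * (1.0 - level) / 2.0)]
    hi = draws[int(n_traj * (1.0 + level) / 2.0) - 1]
    return lo, hi

# Toy scenario: multiplicative (lognormal) uncertainty around a median burden of 1
lo, hi = prediction_interval(lambda rng: math.exp(rng.gauss(0.0, 0.5)))
covered = lo <= 1.0 <= hi   # the interval should contain the median burden
```

In the validation setting of the text, the internal (biomarker) measurement for an individual would then be compared against such an interval rather than against a single point prediction.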

3.2.3. Numerical Considerations

Convergence of Fitting Algorithms of Population Models

Aside from specific model validation steps, the fitting of nonlinear mixed-effect models involves the implementation of computational methods such as stochastic or iterative algorithms. As a consequence, outputs from parameter estimation rely on valid convergence of such algorithms. In maximum likelihood estimation, fitting algorithms implemented in, e.g., NONMEM are based on first-order approximations of the likelihood. Those approximations may not be valid in case of highly skewed likelihoods and nonnormal random effects, or they may be highly sensitive to initial values. As a consequence, a good practice is to test a range of different approximation methods and initial values. To gain in flexibility and reliability, stochastic approximation EM algorithms (SAEM) have been developed and used in PBTK modeling with the MONOLIX software. Although SAEM is more powerful than gradient algorithms for nonlinear population models, only visual convergence checks are available in MONOLIX. Finally, for Bayesian inference, convergence of MCMC also needs to be checked, as for any Bayesian modeling exercise. WinBUGS provides ready-to-use tools for convergence checks such as the Gelman-Rubin statistic.

Convergence of KDEM Simulations

The time to reach the steady-state situation can be determined through simulations. For example, a set of 1,000 trajectories is computed for
several initial values X0. For each trajectory of the different sets, the mean exposure over increasing periods of time [0, T] is calculated using eq. 5. For each set, the mean of the 1,000 means is
calculated and plotted (49). After a period of time which corre-
sponds to the time to reach the steady-state situation, the trajec-
tories of the means converge to the steady-state mean exposure
value. This value corresponds to the mean body burden of the
population at steady-state situation. In the KDEM application to
occurrence of methylmercury (49), the steady-state distribution is
simply a Gamma distribution with parameters depending on the two exponential distributions used to model intake amounts and inter-intake times. Therefore, the steady-state mean exposure value is that of the Gamma distribution.
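A toy illustration of this convergence check, with exponential intake amounts and exponential inter-intake times (the actual KDEM computation via eq. 5 is not reproduced here): by renewal theory, the long-run mean exposure approaches the mean intake amount divided by the mean inter-intake time.

```python
import random

def mean_exposure(T, intake_mean=2.0, gap_mean=1.0, rng=None):
    """One trajectory of a toy dietary intake process: exponential
    inter-intake times and exponential intake amounts; returns the
    mean exposure over [0, T] (total intake divided by T)."""
    rng = rng or random.Random()
    t = rng.expovariate(1.0 / gap_mean)    # time of first intake
    total = 0.0
    while t <= T:
        total += rng.expovariate(1.0 / intake_mean)  # intake amount
        t += rng.expovariate(1.0 / gap_mean)         # next intake time
    return total / T

rng = random.Random(3)
# The mean over 1,000 long trajectories converges to the renewal-theory
# steady-state mean exposure, intake_mean / gap_mean = 2.0
long_run = sum(mean_exposure(500.0, rng=rng) for _ in range(1000)) / 1000
```

Plotting the mean of the 1,000 trajectory means against increasing T, as the text describes, shows the curve flattening at this steady-state value.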

4. Examples

Three examples are described below to explore population modeling in computational toxicology: (1) Systematic review and meta-
analysis of human variability data in toxicokinetics for the metabolic
routes and derivation of pathway-related uncertainty factors using
the pharmaceutical database, (2) population modeling of toxicoki-
netics and toxicodynamics of cadmium in humans using Bayesian
inference for aggregated data, and (3) exposure assessment in
human populations to dioxins using probabilistic estimates of expo-
sure and toxicokinetic parameters.

4.1. Example 1: Systematic Review and Meta-analysis of Human Variability Data in Toxicokinetics for Major Metabolic Routes and Derivation of Pathway-Related Uncertainty Factors

Systematic reviews and meta-analyses of TK variability for the major human metabolic routes, (1) phase I metabolism (CYP1A2, CYP2A6, CYP2C9, CYP2C19, CYP2D6, CYP2E1, CYP3A4, hydrolysis, alcohol dehydrogenase), (2) phase II metabolism (N-acetyltransferases, glucuronidation, glycine conjugation, sulfation), and (3) renal excretion, have been performed following a number of methodological steps using the pharmaceutical database. The purpose was to derive pathway-related uncertainty factors using human variability data in TK as an intermediate between default uncertainty factors and chemical-specific adjustment factors (i.e., based on PB-TK models) for chemical risk assessment (for more details on uncertainty factors see Subheading 20.1) (4, 6, 19, 73–75).
– Selection of probe substrates and systematic review of the TK
literature:
Probe substrates were identified by searching the literature
(MEDLINE, PUBMED and TOXLINE depending on the
pathway). The selection of probe substrates followed a number
of specific criteria: (1) oral absorption complete, (2) a single
pathway (phase I, phase II metabolism and renal excretion)
responsible for the elimination of the compound (60–100% of
the dose), (3) intravenous data used for compounds for which
absorption was variable. The specific metabolic pathway was
identified using quantitative biotransformation data from
in vitro (microsomes, cell lines and primary cell cultures) and
in vivo (urinary and fecal excretion) studies.
– Systematic review and meta-analysis of TK data.
Selection of TK studies and data ranking for population variability analysis
For each metabolic route, a systematic review of TK studies in humans was performed for each probe substrate selected for each pathway and for each subgroup of the population: general healthy adults (16–70 years) from different ethnic backgrounds (Caucasian, Asian, African), phenotyped subgroups for genetic polymorphisms (CYP2C9, CYP2D6, CYP2C19, NAT-2), and other subgroups of the population [elderly: healthy adults older than 70 years; children (>1 year to <16 years); infants (>1 month to <1 year); and neonates (<1 month)], using two different types of TK parameters: (1) TK parameters reflecting chronic exposure [metabolic and total clearances, area under the plasma concentration–time curve (AUC)], and (2) TK parameters reflecting acute exposure (Cmax).
A number of meta-analyses were then performed in order to
quantify human variability in TK for each compound, marker
(acute, chronic), subgroup of the population and metabolic route
in humans including phase I metabolism (CYP1A2, CYP2A6,
CYP2C9, CYP2C19, CYP2D6, CYP2E1, CYP3A4, hydrolysis,
alcohol dehydrogenase), phase II metabolism (N-acetyltrans-
ferases, glucuronidation, glycine conjugation, sulfation), and renal
excretion.
Variability in TK for each probe substrate/subgroup/parame-
ter was estimated using ranking of the parameters, as described
previously in ref. 19:
1. Data were selected from peer-reviewed studies using specific
and sensitive analytical methods (i.e., HPLC).
2. Data for the oral route were preferred to intravenous data for
the purpose of application in chemical risk assessment since
humans are exposed to environmental contaminants and nutri-
ents mainly in food and drinking water.
3. TK parameters were abstracted preferentially as metabolic or
total plasma clearance (CL) adjusted to body weight (ml/min/
kg), then as unadjusted metabolic or total plasma clearance (ml/min), and finally as the area under the plasma concentration–time curve (AUC).
4. AUC and Cmax values were corrected for body weight (mg/kg)
using the published mean adult body weight, or 70 kg (males),
60 kg (females), 65 kg (mixed males and females) for adults
when the weights were not reported.
5. Data were analyzed such that no individual would contribute
more than once to each analysis.
6. TK data were assumed to be linear at the doses studied and to follow
a log-normal distribution. However, data from individual
kinetic studies were usually reported as arithmetic means (X)
and standard deviations (SD) or coefficient of variation assum-
ing a normal distribution (CVN), and were transformed into
geometric mean (GM), geometric standard deviation (GSD)
and the corresponding coefficient of variation (CVLN) (73, 74)
using:

X
GM ¼ qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi (6)
ð1 þ CV 2N Þ
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
GSD ¼ exp ln(1 þ CV 2N Þ : (7)

The coefficient of variation for the normal distribution (CVN)


is given by
CVN = SD/X̄   (8)
The coefficient of variation for the log-normal distribution (CVLN) is given by its relationship with the appropriate measure
of variation (geometric standard deviation (GSD))
CVLN = √(exp{[ln(GSD)]²} − 1)   (9)

7. CVs are assumed to represent interindividual differences in the TK of the compound and its metabolic pathway (full oral absorp-
tion and >60% metabolized by this route) either after chronic
(clearance, AUC) or acute exposure (Cmax) rather than mea-
surement errors or random analytical errors since TK para-
meters are derived from multiple measurements.
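Equations 6–9 can be applied directly to a reported arithmetic mean and SD; for instance (with invented numbers):

```python
import math

def normal_to_lognormal(mean, sd):
    """Convert an arithmetic mean and SD to the geometric mean, geometric
    SD, and lognormal CV, following eqs. 6-9."""
    cv_n = sd / mean                                        # eq. 8
    gm = mean / math.sqrt(1.0 + cv_n ** 2)                  # eq. 6
    gsd = math.exp(math.sqrt(math.log(1.0 + cv_n ** 2)))    # eq. 7
    cv_ln = math.sqrt(math.exp(math.log(gsd) ** 2) - 1.0)   # eq. 9
    return gm, gsd, cv_ln

# A clearance reported as mean 10 ml/min/kg with SD 3 (invented numbers)
gm, gsd, cv_ln = normal_to_lognormal(10.0, 3.0)
```

Note that, by construction of eqs. 7–9, the lognormal CV recovered this way equals the normal-scale CV (SD/mean), which provides a quick consistency check on the transformation.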
Meta-analyses of TK studies
As described above, meta-analyses of the TK data were performed first to estimate population variability at the level of the compound
for each subgroup of the population, route of exposure (oral, intra-
venous) and TK parameter (AUCs/clearances or Cmax) using a
weighted mean approach (73). The individual TK studies reporting data for the same kinetic parameter, compound, and subgroup of the population were then combined using the weighted mean method described previously, with the number of subjects in each study as the weight (73, 74, 76, 77). The overall coefficients of variation
(CVN and CVLN) for each parameter/subgroup of the population
were then combined for all probe substrates for a particular pathway
as an average on the log-scale to derive the pathway-related varia-
bility for each metabolic pathway.
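This two-step pooling can be sketched in Python as follows (a minimal illustration with our own function names; the published method (73, 74) includes additional steps not shown here):

```python
import math

def pooled_cv(cvs, ns):
    """Combine study-level CVs for one parameter/compound/subgroup,
    weighting each study by its number of subjects."""
    return sum(cv * n for cv, n in zip(cvs, ns)) / sum(ns)

def pathway_cv(probe_cvs):
    """Average the pooled CVs of several probe substrates on the log
    scale (i.e., their geometric mean) to obtain the pathway-related
    variability for a metabolic pathway."""
    return math.exp(sum(math.log(cv) for cv in probe_cvs) / len(probe_cvs))
```

For example, pooling two studies of 10 and 30 subjects with CVs of 0.2 and 0.4 gives a weighted CV of 0.35.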
Published data related to subgroups of the population were also
analyzed to quantify differences in internal dose (for both means
and variability) compared to healthy adults:
The difference in mean level of internal dose (based on clearance,
AUC or Cmax) was expressed as the ratio of the GM for the healthy
to the GM for the subgroup. The GM ratio was expressed as the
magnitude of any increase in the internal dose in the subpopulation
compared to healthy adults (i.e., ratio of 2 would arise from a
twofold lower clearance or twofold higher AUC or Cmax in the
subgroup).
The difference in variability was expressed as the ratio of CVLN for
the subgroup to CVLN for healthy adults, i.e., the magnitude
of any increase in the variability of the subgroup compared to
healthy adults (a ratio of 2 would indicate a twofold greater variability
in the subgroup).
These differences in internal dose and variability for subgroups
of the population were also averaged on the log-scale to define
pathway-related differences in internal dose and variability.
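The two comparison ratios can be written out as follows (a sketch under the conventions described above; the function name and orientation of the ratios are ours):

```python
def subgroup_ratios(gm_healthy, gm_subgroup, cvln_healthy, cvln_subgroup):
    """Internal-dose ratio (healthy GM over subgroup GM, e.g., of clearance,
    so that a twofold lower subgroup clearance gives a ratio of 2) and
    variability ratio (oriented so that values > 1 indicate greater
    variability in the subgroup than in healthy adults)."""
    dose_ratio = gm_healthy / gm_subgroup
    variability_ratio = cvln_subgroup / cvln_healthy
    return dose_ratio, variability_ratio
```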
As an example of subgroup analysis, for polymorphic metabolic
routes such as CYP2D6, CYP2C19 and N-acetyltransferase
2 (NAT-2), the interphenotypic differences in internal dose were
20 Population Effects and Variability 563

determined in the different subgroups of the population under two


different scenarios depending on data availability (74):
1. TK data available only in nonphenotyped subgroups of the
population: TK data for different probe compounds of
CYP2D6, CYP2C19, and NAT-2 in nonphenotyped subgroups were
compared with the corresponding data for nonphenotyped
healthy adults.
2. TK data available in phenotyped subgroups of the population
[extensive metabolizers (EM), slow extensive metabolizers
(SEM) and poor metabolizers (PM) for CYP2D6 and CYP2C19; fast
acetylators (FA) and slow acetylators (SA) for NAT-2] were
compared with the corresponding data for EM (CYP2D6,
CYP2C19) or FA (NAT-2) healthy adults, respectively, in each
subgroup as an average on the log scale between the
compounds.

4.1.1. Derivation of Pathway-Related Uncertainty Factors

Pathway-related uncertainty factors for markers of chronic and
acute exposure were derived as the Z scores of the pathway-related
variability to cover the 95th, 97.5th and 99th centiles of the healthy
adult population and subgroups of the population (including
pathway-related ratio of internal dose and pathway-related varia-
bility in the specific subgroup). The pathway-related uncertainty
factors were calculated as the difference between each percentile
(95th, 97.5th and 99th centiles) of the subgroup compared with
healthy adults, without taking into account the incidence of the
subgroup in the overall population. These values assume that
higher circulating concentrations of the parent compound would
result in increased risk, i.e., that the parent compound is the toxic
chemical species (6, 19, 73–77).
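A hedged sketch of such a centile-based factor follows (assuming a log-normal internal-dose distribution; this is an illustration of the Z-score idea, not the exact published procedure):

```python
import math
from statistics import NormalDist

def pathway_uncertainty_factor(gm_ratio, cv_ln, centile=0.975):
    """Factor covering a given centile of a log-normal internal-dose
    distribution in a subgroup, scaled by the subgroup-to-healthy-adult
    GM ratio of internal dose."""
    gsd = math.exp(math.sqrt(math.log(1 + cv_ln ** 2)))
    z = NormalDist().inv_cdf(centile)   # Z score for the chosen centile
    return gm_ratio * math.exp(z * math.log(gsd))
```

With a GM ratio of 1 (healthy adults themselves) and CV_LN = 0.3, the 97.5th-centile factor is about 1.78; a subgroup with a twofold higher internal dose doubles it.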

4.2. Example 2: Population TK/TD of Cadmium Using Aggregated Data

This example is extracted from a recent human risk assessment of
cadmium in food by the European Food Safety Authority (18).
Cadmium is a widespread environmental pollutant that has
been shown to exert toxic effects on kidney and bones in humans
after long-term exposure. The recent health risk assessment of
Cadmium in food performed by the European Food Safety
Authority (EFSA), illustrates the whole process of TK/TD
model-based risk assessment. A population TK and dose-effect
models (TD) were developed and linked together in order to
evaluate a “safe dose” (“health-based guidance value”) for the
Caucasian population.

4.2.1. Toxicokinetics

Urinary cadmium (U-Cd) concentration is considered a good
biomarker of accumulated cadmium in the kidney, and diet is the main
source of cadmium among nonsmokers.
The TK assessment involved the comparison of an eight-
compartment toxicokinetic model and a one-compartment

population TK model, based on a cohort study of 680 Swedish


women, over a 20-year-long period.
Second, an alternative one-compartment model (78) was considered.
Such a simpler model focuses on kidney accumulation
and urinary excretion, making rough and global assumptions about
the other pathways. In the case of poor prior knowledge of the
physiological parameters involved in cadmium kinetics, it allows a
simplified and parsimonious description of cadmium excretion, hence
facilitating further statistical evaluations such as, for example, the
evaluation of population variability or the integration of intraindividual
variability.
The one-compartment model considered here is a standard first-
order elimination model with bolus administration; see ref. 22.
It can be described as follows: For a given intake of cadmium (d0)
at time 0, the accumulated amount of cadmium in the kidney at time
t is calculated as:
 
Cd_{kidney}(t; d_0) = f_k \, \frac{d_0 \, t_{1/2}}{\log(2)} \, \exp\left( -\frac{\log(2)\, t}{t_{1/2}} \right),
where t1/2 is the cadmium half-time and fk is a factor aggregating
several physiological and cadmium-related constants:
f_k = \frac{Abs \times frac_{kidney} \times coef_{cortex}}{Weight_{kidney}},
with Abs = gastrointestinal absorption coefficient (in %); frac_kidney
= fraction of cadmium transported to the kidney (in %); coef_cortex =
coefficient translating cadmium in the whole kidney into cadmium
in the kidney cortex; Weight_kidney = kidney weight (in g), assumed
to be proportional to body weight.
Repeated exposure to cadmium is considered as daily bolus
doses, thus:
Cd_{kidney}(t) = f_k \sum_i \frac{d_i \, t_{1/2}}{\log(2)} \, \exp\left( -\frac{\log(2)\,(t - i)}{t_{1/2}} \right).

The urinary cadmium concentration is assumed to be proportional
to the cadmium concentration in the kidney cortex; thus, the
urinary cadmium concentration (in µg/g creatinine) at day t is
obtained by:

Cd_{urine}(t) = f_u \times Cd_{kidney}(t),

where f_u is the ratio between cadmium in urine (µg/g creatinine)
and in the kidney cortex (mg/kg kidney cortex). Note that, in the case
where the dose per kg body weight is assumed to be constant over the
lifetime, the model can be simplified into the more classical
one in which the urinary cadmium concentration is simply proportional
to 1 - \exp[-(t/t_{1/2}) \log(2)].
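The one-compartment accumulation model can be transcribed directly (illustrative Python; the names follow the equations, with the index i as the day of each intake):

```python
import math

def cd_kidney(t, doses, t_half, f_k):
    """Accumulated cadmium in the kidney at day t from daily bolus
    intakes doses[i] given at days i = 0, 1, ... (sum of exponentials)."""
    return f_k * sum(
        d * t_half / math.log(2) * math.exp(-math.log(2) * (t - i) / t_half)
        for i, d in enumerate(doses) if i <= t
    )

def cd_urine(t, doses, t_half, f_k, f_u):
    """Urinary cadmium, proportional to the kidney-cortex burden."""
    return f_u * cd_kidney(t, doses, t_half, f_k)
```

For a single unit intake at day 0 with a 10-day half-life and f_k = 1, the kidney burden at day 10 is (10/log 2) × 0.5 ≈ 7.21.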

Fig. 5. Measured individual urinary cadmium concentrations (U-Cd) versus predicted
individual concentrations using the population one-compartment TK model.

A Bayesian approach was used to perform the statistical inference.
A uniform prior between 5 and 35 years was set for the individual
t_{1/2} in order to constrain the estimation to biologically
plausible values. An informative prior was set on (f_k × f_u) in order
to integrate the prior knowledge on the cadmium-related and physiological
parameters described previously in ref. 17. This prior was
set to a normal distribution (truncated at zero), centered on the
central value derived from the literature, with a CV of 30%
to cover the range of possible values for (f_k × f_u).
Based on the estimated parameters, Monte Carlo simulations
were run in order to predict the population variation in urinary
cadmium as a function of lifetime exposure, for a given daily intake.
The predicted urinary cadmium concentrations corresponding to a
daily cadmium intake of 0.3 µg/kg body weight over 70 years in the
50th, 95th, and 99th percentiles of the population are shown in
Fig. 5. The upper percentiles represent the individuals most at risk
for high urinary cadmium concentration, mainly because of long
retention time in the body.
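Such a Monte Carlo prediction can be sketched as follows (the sampled priors mirror those described above, but the central value of f_k·f_u is a placeholder, not the EFSA estimate, and the simplified steady-state form of the model is used):

```python
import math
import random

def simulate_ucd_percentiles(daily_dose, years, n=5000, seed=1):
    """Population percentiles of urinary Cd for a constant daily intake,
    sampling t_half ~ U(5, 35) years and fk*fu from a normal distribution
    truncated at zero with CV = 30% around a hypothetical central value."""
    rng = random.Random(seed)
    fkfu_central = 0.05  # placeholder central value, not the published one
    t_days = years * 365.0
    ucd = []
    for _ in range(n):
        t_half = rng.uniform(5.0, 35.0) * 365.0  # half-life in days
        fkfu = -1.0
        while fkfu <= 0.0:                        # truncation at zero
            fkfu = rng.gauss(fkfu_central, 0.30 * fkfu_central)
        # simplified form: proportional to 1 - exp(-(t/t_half) log 2)
        ucd.append(fkfu * daily_dose * t_half / math.log(2)
                   * (1.0 - math.exp(-math.log(2) * t_days / t_half)))
    ucd.sort()
    return ucd[n // 2], ucd[int(0.95 * n)], ucd[int(0.99 * n)]
```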
Based on the model, the population distribution of the daily
cadmium intake corresponding to a given level of urinary cadmium
can also be obtained. Thus, we calculated the population variation
in dietary cadmium intake corresponding to urinary cadmium
concentrations of 0.5, 1, 2 and 3 µg/g creatinine in a 50-year-old
individual with 70 kg body weight (Fig. 6). The functions show, for
each population percentile, the maximum dietary cadmium intake
allowed in order not to exceed the predefined urinary cadmium
concentrations. Thus, in order to remain below, e.g., 1 µg cadmium/g
creatinine in urine in 50% of the population by the age
of 50 (average urinary cadmium is 1 µg/g), the average daily

Fig. 6. Cumulative population distribution of daily Cd intake needed to achieve 0.5, 1, 2,
and 3 µg/g creatinine of U-Cd concentration.

dietary cadmium intake should not exceed 0.8 µg Cd/kg body
weight. The corresponding intake for the 95th percentile of the
population remaining below 1 µg/g is 0.4 µg Cd/kg body weight
per day.

4.2.2. Modeling Population Variability in the Dose–Response: Toxicodynamics and BMD Assessment

For cadmium renal effects, β2-microglobulinuria (β2-MG) was the
most commonly reported biomarker and was therefore chosen as
the biomarker of effect. On the basis of a systematic review of the
literature, 35 epidemiological studies that measured both biomarkers
of exposure and effect in urine were compiled into an aggregated
dataset made up of 165 groups of matching urinary cadmium
levels and β2-microglobulinuria.
A meta-analysis was then performed to determine the overall
relationship between urinary cadmium and β2-microglobulin, for
subjects over 50 years of age (with purely occupational studies
excluded) and for the whole population (with all studies).
The consolidated dataset of U-Cd dose groups versus β2-MG
effect groups is displayed in Fig. 7, illustrating dose and effect
population variability, as well as the interstudy variations. It exhibits
an S-shaped dose–effect relationship on the log scale of both U-Cd
and β2-MG, which can be described by a Hill model with equation:
Effect(d) = background + amplitude \times \frac{d^{\eta}}{d^{\eta} + ed_{50}^{\eta}},

where d stands for the dose, i.e., urinary cadmium (on the log scale);
“amplitude” corresponds to the difference between the two
plateaus of the S-shape; ed50 corresponds to the dose at which 50%
of the maximal effect is achieved; and η corresponds to the shape
parameter defining the steepness of the S curve.
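The Hill model transcribes directly (a sketch; in this illustration d must be positive, whereas the EFSA analysis applies the model to log-transformed urinary cadmium):

```python
def hill(d, background, amplitude, ed50, eta):
    """Hill dose-effect model: background + amplitude * d^eta / (d^eta + ed50^eta).
    eta is the shape parameter controlling the steepness of the S curve."""
    return background + amplitude * d ** eta / (d ** eta + ed50 ** eta)
```

At d = ed50 the response lies exactly halfway between the two plateaus, whatever the value of eta.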


Fig. 7. Scatter plot of data from all studies linking urinary cadmium to β2-microglobulinuria,
using a different color for each study and illustrating within-group variability. Each
dose group is represented by an ellipse on the log scale, with log(GSD) as radius.

Especially in such cases of large interstudy variability due to


many uncontrolled factors, data homogeneity cannot be assumed
and variation between studies should be accounted for (79). This is
usually done using a random study effect in the statistical model
(80, 81). Therefore, the group geometric means and variances were
meta-analyzed using a mixed-effect Hill dose-effect model, hence
accounting for study heterogeneity and the group sample sizes.
If Y_i^{(k)} stands for a measurement of, e.g., β2-MG for individual
(i) in study (k), then the empirical means and variances (S^{(k)2}) of
log(Y_i^{(k)}) (as derived from the recorded geometric means and standard
deviations in the data) are assumed to follow the statistical
distributions:

\frac{1}{n^{(k)}} \sum_i \log\left( Y_i^{(k)} \right) \sim N\left( m^{(k)}, \frac{s^2}{n^{(k)}} \right), \qquad S^{(k)2} \sim \frac{s^2}{n^{(k)}} \, \chi^2\left( n^{(k)} - 1 \right)

where n(k) is the sample size of study (k), m(k) is the population
subgroup mean effect (in log scale) and s2 is the interindividual
variance of the effect at a given dose. This statistical model naturally
accounts for interstudy and interindividual variability and weights
studies according to their sample sizes. It is valid under the assump-
tion that individual doses and effect levels are log-normally
distributed within each group, which is generally what was
reported in the original publications from which the grouped data were
collected. The population mean m^{(k)} was adjusted for ethnicity to
differentiate Caucasian from Asian data. The resulting model fit to
the data is reported in Fig. 8, showing that Asian subjects were
estimated to have more than twofold higher β2-microglobulinuria
than Caucasians at the same exposure (p < 0.01). Further


Fig. 8. Hill model fitted to the Caucasian data (open circles) versus the Asian data, using the complete dataset.

investigations are however required to evaluate possible sources of


bias and confounding issues which may temper this finding.
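The sampling assumptions behind this meta-analysis can be checked with a short simulation (a sketch only: each simulated study reports the mean and unbiased variance of the log-individual values, mimicking what the published groups report):

```python
import random
import statistics

def simulate_study_summaries(m, s, n, n_studies, seed=0):
    """Draw n_studies groups of n log-normal individuals (log-mean m,
    log-SD s) and return each group's mean and variance of the logs,
    as a study would report them."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(n_studies):
        logs = [rng.gauss(m, s) for _ in range(n)]
        means.append(statistics.fmean(logs))
        variances.append(statistics.variance(logs))
    return means, variances
```

Across many simulated studies the group means scatter around m with variance close to s²/n, consistent with the normal/chi-square model above.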
Based on the interindividual variability of effect (s^2) estimated
from this analysis, a benchmark dose was derived, first assuming no
variability of dose within the groups. This benchmark dose was defined
as the urinary cadmium level at which an extra 5% of the Caucasian
population at 50 years of age is expected to show a biomarker response
(e.g., β2-microglobulinuria greater than 300 µg/g creatinine in the
case of the EFSA assessment) as compared to the background exposure.
To account for estimation uncertainty, the lower bound of the
one-sided 95% confidence interval was used instead of a central estimate,
hence defining a BMDL rather than a BMD.
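The extra-risk definition of the BMD can be written out as follows (a sketch: `mean_log_effect` stands for any fitted dose-response for the mean log effect, e.g., a Hill model, and the 300 µg/g cutoff follows the EFSA assessment; the function names are ours):

```python
import math
from statistics import NormalDist

def extra_risk(dose, mean_log_effect, s, cutoff=math.log(300.0)):
    """Extra fraction of the population above the effect cutoff at `dose`,
    relative to background (dose 0), assuming log-normal interindividual
    variability with log-SD s around the mean dose-response."""
    nd = NormalDist()
    exceed = lambda m: 1.0 - nd.cdf((cutoff - m) / s)
    return exceed(mean_log_effect(dose)) - exceed(mean_log_effect(0.0))

def bmd(mean_log_effect, s, target=0.05, lo=0.0, hi=100.0, iters=60):
    """Bisection for the dose giving `target` extra risk; a BMDL would
    instead use the lower confidence bound of the fitted parameters."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if extra_risk(mid, mean_log_effect, s) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```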
The statistical model was fitted via Bayesian inference, as it is
particularly suited to fitting hierarchical models. The Bayesian setup also
offers an integrated and robust framework to derive BMDs and
BMDLs by Monte Carlo simulations using posterior samples. The
Bayesian evaluation was made using WinBUGS (version 1.4, (82)).
For each fitted model, three Monte Carlo Markov chains were
simultaneously run until convergence. Convergence was assessed
using the Gelman–Rubin test available in the WinBUGS software,
and by visual inspection of the chains. Posterior means were reported
as statistical estimates of model parameters. Prior distributions were
chosen as “noninformative,” i.e., flat normal distributions for mean
parameters and flat gamma distributions for precision parameters.

4.2.3. Final Risk Assessment and TDI Determination

The overall approach taken by EFSA for this assessment is
summarized in Fig. 9, showing how the various data, analyses
and outputs of analyses were sequenced and combined together

Fig. 9. Graphic representation of step-wise TK/TD assessment performed by EFSA to derive the final HBGV on dietary
cadmium.

to derive the final health-based guidance value (HBGV) on
dietary cadmium. To account for dose variations within groups,
the CONTAM Panel then corrected this benchmark dose by an
adjustment factor based on the estimated variance of exposure within
groups. After adjustment, the CONTAM Panel estimated that
1 µg/g creatinine of urinary cadmium was the critical exposure
level on which the health-based reference for cadmium dietary intake
should be based. Subsequently, the TK model described above
showed that a food intake of about 2.5 µg/kg body weight
(b.w.) per week would prevent 95% of the Caucasian population
from being above the threshold of 1 µg/g creatinine of urinary
cadmium at 50 years of age (18).
The dose–effect model showed that a urinary cadmium level
above 1 µg/g creatinine leads to an excess risk of 5% for the
Caucasian population of having β2-microglobulinuria above
the critical cut-off of 300 µg/g creatinine. The TK model showed
that a food intake of about 2.5 µg/kg body weight per week
would prevent 95% of the Caucasian population from being above
the threshold of 1 µg/g creatinine of urinary cadmium.

A thorough discussion of this EFSA approach, including a
critical review of the model assumptions and robustness, can be
found in ref. 83. Furthermore, recent BMD analyses using the
same biomarkers of dose and effect but with individual data confirmed
the findings of this TK/TD evaluation (84) in the Japanese
population.

4.3. Example 3: Population Exposure to Dioxins over Time Using Individual Data

In this section, the KDEM is applied to the French population's
exposure to dioxin and dioxin-like compounds (DLCs). Several
scenarios for the model input variables have been tested to assess
their impact on the model output.
In the case of dioxin and related compounds, the total dietary
intake estimation should be performed for a group of congeners
with similar toxicological properties. The TEF approach (51) is
based on an additive model and is used to convert concentrations
of each congener relative to the compound defined as the reference.
The converted concentrations are then summed to obtain the total
concentration expressed in toxic equivalents (TEQs).
Dietary exposure data are the individual exposures estimated in
ref. 85 by combining concentration data from French monitoring
programs (2001–2004) with individual consumption data provided
by the first French Individual Consumption Survey (53). The daily
exposure to total dioxins expressed in toxic equivalents (TEQs) was
estimated to be on average 1.8 and 2.8 pg TEQ98/kg b.w./day for
the adult and child populations, respectively. The 95th percentile
of exposure is 4.5 and 6.6 pg TEQ98/kg b.w./day for the adult and
child populations, respectively. The empirical exposure distributions
of adults and children are used as input values in the dynamic
exposure process.
Both the population-based model and the individual-based
model presented in Subheading 3.2 have been computed
using different values of the initial body burden X0. Variability between
congeners' half-lives is considered to be included in the TEFs, and
the half-life value of TCDD, the reference compound, is
used (51). Two scenarios, considering fixed or individual half-lives,
were tested for the population-based model. In the first scenario,
the half-life of the adult population was fixed to 7.1 years and that
of the children's population to 2.8 years (31). Some authors (86)
have observed for dioxins and dioxin-like compounds a linear
relationship between half-lives and personal characteristics such as
age, body fat, smoking, and breast-feeding. In the second scenario,
half-lives for each individual were estimated according to the
equation of ref. 86. Trajectories were simulated over 90 years. The
inter-intake time for both models was set to one week, and the time
window of the individual-based model was set to 3 years.
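The dynamic exposure process can be sketched as weekly bolus intakes with first-order elimination in between (an illustration only; the actual KDEM samples intakes from the empirical exposure distributions described above):

```python
import random

def exposure_trajectory(x0, years, half_life_yr, intake_sampler, rng):
    """One body-burden trajectory: each week the burden decays by one
    week's first-order elimination and jumps by a sampled intake."""
    decay = 0.5 ** (1.0 / (52.0 * half_life_yr))  # weekly decay factor
    x, traj = x0, [x0]
    for _ in range(int(years * 52)):
        x = x * decay + intake_sampler(rng)
        traj.append(x)
    return traj
```

With a constant unit intake and a 7.1-year half-life, the trajectory approaches the steady state 1/(1 − decay) after several half-lives, regardless of the starting point X0.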
Figure 10 shows the individual variability of the dynamic process
by plotting 30 exposure trajectories computed with the
individual-based model. The jumps of the dynamic exposure

Fig. 10. An example of 30 trajectories of the exposure process over 90 years computed with the individual-based model.

process at each intake time and its decrease between two intakes
can be observed.
The convergence to the steady state of the dynamic exposure
process performed with the population-based model using a fixed
half-life can be visually checked by plotting the mean exposure
trajectories for different initial body burdens (Fig. 11; see Subheading
3.2.3 on numerical considerations: convergence of fitting
algorithms of the population model). Simulations were conducted over
90 years for both the children and adult populations separately.
With increasing time, the trajectories converge to similar values of
body burden (between 3.8 and 4.2 ng TEQ/kg b.w. for the chil-
dren population).
The mean trajectories of the exposure process under the different
scenarios are compared in Fig. 12. To simulate mean trajectories
over a lifetime with the population-based model, the trajectories of
the children and the adults were combined, creating a break in the
trajectories at 15 years of age. Using individual half-lives leads to
lower predicted body burdens. Indeed, because the half-lives calculated
with the equation of ref. 86 are lower (values range between 0.6 and
2.4 years) than the fixed one for children aged over 15 years, the
exposure process decreases rapidly during the first years of life. The
individual-based model leads to lower mean exposures than those
computed with the population-based model during the first 64 years
of life. This phenomenon is due to the fact that the calculated half-lives
for individuals over 65 years are longer than the fixed half-life. The
trajectories of the mean exposure performed with the individual-

Fig. 11. Mean exposure trajectories for different initial body burden performed with the population-based model using a
fixed half-life. A mean exposure trajectory is represented by the mean of the computed mean exposures over increasing
intervals [0,T ] for a set of 1,000 trajectories.

Fig. 12. Mean exposure trajectories under the different scenarios (fixed vs. individual half-lives;
population-based vs. individual-based model). A mean exposure trajectory is represented by the
mean of the computed mean exposures over increasing intervals [0,T ] for a set of 1,000 trajectories.

based model for three different initial body burdens are close after a
period of around 30 years.
Testing the different scenarios shows that the dynamic
modeling of the exposure process is sensitive to the half-life values
and to the choice between the population-based and the individual-based
model. The individual-based model seems to be less sensitive to
the starting point X0. This phenomenon can be explained by the
fact that the exposure variability is lower within an age group than
across the whole population. Therefore, exposures computed
with two sets of 1,000 trajectories using the individual-based
model will be closer than those computed with the
population-based model. Note that for both models, the steady-state
situation is reached only after a very long time of exposure,
longer than a lifetime with the individual-based model.
To give an example of risk measures, a human BMD was extrapolated
from the rat by combining the animal BMD determined
in ref. 87 with an intraspecies factor modeled by a lognormal distribution
with a geometric mean of 1 and a geometric standard deviation of
2 (88). The BMD corresponds to a 5% decrease in sperm production.
The probabilities that the mean exposure body burden, calculated
over 40 and 70 years of life, exceeds the extrapolated BMD were
computed with the individual-based model. Over 40 years, the
estimated probability that the mean exposure exceeds the
BMD is 0.5%, with a 95% confidence interval, related to the variability
of the BMD, of [0.1; 1]%. As body burden increases with
age, over 70 years the estimated probability that the mean
exposure exceeds the BMD is 3.2%, with a 95% confidence
interval of [2.2; 4.3]%.
Another measure of risk is the percentage of time (expressed as a
probability) that the exposure process exceeds the BMD. This
probability is estimated for each individual trajectory, and a
distribution of that probability is therefore computed. The medians
estimated over 40 years and 70 years are 0%. This means that for
50% of the population the probability of exceeding the BMD is null.
The 97.5th percentile of the distribution is 7% and 49% for
40 years and 70 years of exposure, respectively. This latter result means
that 2.5% of the population is exposed above the lower BMDL during
around 50% of their lifetime over 70 years, and that the overshoot of
exposure is greatest between 40 and 70 years of age.
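Both risk measures can be computed from a set of simulated trajectories (a sketch with our own function name):

```python
import statistics

def risk_measures(trajectories, bmd):
    """Fraction of trajectories whose mean body burden exceeds the BMD,
    and the per-trajectory fraction of time spent above the BMD (sorted,
    so medians and upper percentiles can be read off directly)."""
    p_mean_exceeds = sum(statistics.fmean(t) > bmd for t in trajectories) / len(trajectories)
    time_above = sorted(sum(x > bmd for x in t) / len(t) for t in trajectories)
    return p_mean_exceeds, time_above
```

The median of `time_above` gives the "percentage of time above the BMD" figure cited above, and its 97.5th percentile the upper-tail figure.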

5. Notes/Conclusions and Future Perspectives
Generally speaking, there are limitations for the various in vitro
toxicokinetic assays which have an impact on the predictive accu-
racy of PB-TK models. Difficulties in predicting metabolism, der-
mal absorption, renal excretion and active transport are foremost
in that respect, and improvements will proceed at the pace at which
these problems are solved. For instance, checking the validity of PB-
TK models is much easier when such models have a stable and well-
documented physiological structure. In this context a number of
generic PB-TK models have been developed, i.e., Simcyp, Bayer
Technology Services, Cyprotex (https://www.cloegateway.com),
or Simulation Plus. However, modelers still face the need to
validate the QSAR submodels or in vitro assays used to assign a PB-TK
model's parameter values. Obviously, the quality of those inputs
conditions the validity of the PB-TK model using them. The
validation of those submodels and in vitro assays requires particular
attention and must follow the relevant procedures, combining sensitivity
and uncertainty analyses to understand and quantify which model
parameters or assumptions are the most critical to develop relevant,
stable and quantitatively informative toxicokinetic/toxicodynamic
models for risk assessment (89).

In any case, the major challenge will probably be the coupling


of PB-TK models to predictive toxicity models, at the cellular and
at the organ level. For example, liver models are being developed
(90, 91), but their predictive power is far from established for
chronic repeated dose toxicity.
Additionally, many improvements can still be implemented
within exposure models, such as the KDEM, to integrate the variability
of input variables in the modeling. As an example, in the
dynamic exposure modeling developed in ref. 49, the absorption
fraction (ABS) has been considered equal to 1. However,
experimental studies have shown that after oral exposure, the fraction
of the chemical absorbed can be variable and less than complete
(91, 92). This parameter could be included in the modeling
through the introduction of a probabilistic distribution reflecting
population variability, moving away from point estimates.
A potential challenge is the fact that modeling of exposure over
time faces a lack of data on past contamination and consumption.
Indeed, for a number of chemicals related to environmental and
food contamination, exposure has decreased over the past two/
three decades in some countries. In contrast, in countries for
which environmental and food contamination is not controlled,
such exposure can potentially increase and become a public health
concern. In terms of food consumption, behaviors have changed
with the evolution of taste and food cultural practices and past and
future histories of exposures to chemicals can be reconstructed by
combining data from studies past and current studies. For example,
some countries have surveyed food consumption or purchases over
several years and such data can be combined to simulate individual
consumption over years. TK models used in KDEM can be
extended to PB-TK models with additional compartments repre-
senting the different tissues or physiological regions of the body
(93–95). The distribution of the chemical between the different
tissues or regions is represented by differential equations. The
large number of parameters required in PB-TK models limits their
use in statistical modeling to predict internal exposure from external
exposure.

5.1. Overall Perspective Over the last decade, risk assessment methodologies have evolved
for Chemical Risk considerably and such evolution has been stimulated by the scien-
Assessment tific community as a whole, specialists in the areas of toxicology,
pharmacokinetic modeling, applied statistics, systems biology to
cite but a few, as well as international public health agencies moving
towards “evidence-based” quantitative approaches. Historically, an
important factor in this trend has been the rise of
bioinformatics in the postgenomic era, serving as an interface
between the biomedical sciences (molecular biology, pharmacol-
ogy, toxicology, and epidemiology), mathematics/statistics, and
computer sciences. Additionally, high-performance computational

platforms as tools have favored the development of such interdisciplinary
approaches, such as the integration of population variability data
from the molecular, cellular, clinical, and epidemiological scales
thus providing powerful tools to quantify interindividual and inter-
species variations in physiological processes and toxicological
responses (6).
Currently, another trend is the tremendous growth of the
application of systems biology using OMICs methodologies (e.g.,
genomics, proteomics, metabolomics, metabonomics), from both
experimental and modeling perspectives, to provide overall views on
gene, protein or metabolic profile changes in humans, animals, and
plants. These global views combined with appropriate population
modeling can provide not only an understanding of toxic mode of
action of a specific chemical in a specific species but opportunities to
develop specific new biomarkers with the associated population
variability. Together with systems biology, quantitative structure
activity relationship (QSAR) and molecular modeling, biologically
based or physiologically based models to quantify variability and
uncertainty have been central to these developments, to move
progressively towards data-based risk assessment instead of default
assumptions, which would potentially be used only in the case of a total absence of
data. Indeed, a number of conceptual, experimental, and modeling
tools are still under development and will, in the future, allow
physiological and toxicological knowledge to be integrated in a fully
quantitative manner together with population variability to move
towards fully integrated risk assessment. Again, combining knowl-
edge on mode of action (MOA) at the organ and cellular level
together with genome-wide and functional measurements (transcriptomics,
proteomics, metabolomics, etc.) can provide a global view of
the biological basis of the MOA. In practice, for contaminants, the
identification of epidemiological biomarkers in humans, when
available, reflecting target organ toxicity expressed in molecular
terms, can provide a reliable way to determine quantitative levels
of exposure with a high level of protection for human subpopula-
tions (health-based guidance value) and at the same time consider-
ably reduce in vivo animal toxicity testing (71). In addition,
intelligent or integrated hierarchical testing strategies are being
developed in Europe for the implementation of the legislation for
Registration, Evaluation, and Authorization of Chemical Sub-
stances (REACH). These strategies include the use of human
cells/tissues to potentially eliminate interspecies extrapolation,
increase efficiencies in testing and reduce the use of animals. Impor-
tantly, the validation of in vitro testing methods can specifically
address particular modes of action for specific toxicological end-
points.
Finally, large chemical databases dealing specifically with
physicochemical properties, toxicokinetics and toxicity in humans,
mammals and species used for ecological risk assessments (daphnia,

fish, bacteria, soil invertebrates) are being developed around the


world including the echem-portal of the OECD and the European
Chemical Agency under REACH, the chemical hazards database of
EFSA and the TOX 21 program of the National toxicology pro-
gram, to cite but a few. In the near future, these public, free-access
databases will constitute a considerable step forward for researchers
and risk assessors to exchange data, reduce uncertainty in the risk
assessment process through a deeper mechanistic understanding
and the integration of weight of evidence in toxicokinetics and
toxicodynamics, and integrate them at the individual and population
variability level using the appropriate statistical methodologies,
e.g., Bayesian methods, which are efficient at integrating complex
data and quantifying uncertainties in the data.
Another challenge is the integration of new methods and tools
to provide quantitative descriptors of variability and uncertainty
for the risk assessment of chemical mixtures (toxicokinetics,
toxicodynamics) in humans and the environment. A number of
frameworks are currently available to deal with the human risk
assessment of chemical mixtures. These methods will prove useful
to risk assessors in ecological and human risk assessment, from the
public or private sector, so that the relevant tools and modeling
techniques can provide science-based risk assessments that are more
transparent.

Acknowledgments

The views reflected in this review are the authors' only and do not
reflect the views of the European Food Safety Authority, the
Technological University of Compiègne, the French Agency for Food,
Environment, and Occupational Health Safety, the French
National Institute of Agronomical Research (INRA), or the
World Health Organization.

References
1. European Commission (EC) (2002) Regulation (EC) No 178/2002 of the European Parliament and of the Council laying down the general principles and requirements of food law, establishing the European Food Safety Authority and laying down procedures in matters of food safety. http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2002:031:0001:0024:EN:PDF
2. WHO (2009) Principles and methods for the risk assessment of chemicals in food, Environmental health criteria 240. http://www.who.int/foodsafety/chem/principles/en/index1.html
3. Svendsen C, Ragas AM, Dorne JLCM (2008) Contaminants in organic and conventional food: the missing link between contaminant levels and health effects. Book comparing organic vs non-organic food at the nutritional, microbiological and toxicological level. In: Givens DI et al (eds) Health benefits of organic foods: effects of the environment. Chapter 6, vol 119. CABI, Wallingford
4. Dorne JLCM, Bordajandi LR, Amzal B, Ferrari P, Verger P (2009) Combining analytical techniques, exposure assessment and biological effects for risk assessment of chemicals in food. Trends Anal Chem 28:695
5. Kroes R, Müller D, Lambe J, Löwik MR, van Klaveren J, Kleiner J, Massey R, Mayer S, Urieta I, Verger P, Visconti A (2002) Assessment of intake from the diet. Food Chem Toxicol 40(2–3):327–385
6. Dorne JCM (2010) Metabolism, variability and risk assessment. Toxicology 268(3):156–164
7. EFSA (European Food Safety Authority) (2005) Opinion of the Scientific Committee on a request from EFSA related to a harmonised approach for risk assessment of substances which are both genotoxic and carcinogenic. EFSA J 282:1–31. http://www.efsa.europa.eu/EFSA/Scientific_Opinion/sc_op_ej282_gentox_en3.pdf
8. FAO/WHO (Food and Agriculture Organisation of the United Nations/World Health Organization) (2006) Safety evaluation of certain contaminants in food. Prepared by the Sixty-fourth meeting of the Joint FAO/WHO Expert Committee on Food Additives (JECFA). FAO Food Nutr Pap 82:1–778
9. EFSA (European Food Safety Authority) (2007) Opinion of the Scientific Panel on Contaminants in the Food Chain on a request from the European Commission on ethyl carbamate and hydrocyanic acid in food and beverages. EFSA J 551:1–44. http://www.efsa.europa.eu/cs/BlobServer/Scientific_Opinion/Contam_ej551_ethyl_carbamate_en_rev.1.pdf?ssbinary=true
10. EFSA (European Food Safety Authority) (2008) Scientific Opinion of the Panel on Contaminants in the Food Chain on a request from the European Commission on Polycyclic Aromatic Hydrocarbons in Food. EFSA J 724:1–114. http://www.efsa.europa.eu/cs/BlobServer/Scientific_Opinion/contam_ej_724_PAHs_en,1.pdf?ssbinary=true
11. EFSA (European Food Safety Authority) (2009) Scientific opinion on arsenic in food. EFSA J 7(10):1051. http://www.efsa.europa.eu/en/efsajournal/doc/1351.pdf
12. JMPR (Joint FAO/WHO Meetings on Pesticide Residues) (2002) Report of the JMPR, FAO Plant Production and Protection Paper, 172, 4. FAO, Rome
13. EFSA (2009) Scientific opinion on marine biotoxins in shellfish—Palytoxin group. EFSA J 7(12):1293. http://www.efsa.europa.eu/en/efsajournal/doc/1393.pdf
14. EFSA (2009) Potential risks for public health due to the presence of nicotine in wild mushrooms. EFSA J RN-286:2–47. http://www.efsa.europa.eu/en/efsajournal/doc/286r.pdf
15. Renwick AG, Lazarus NR (1998) Human variability and noncancer risk assessment—an analysis of the default uncertainty factor. Regul Toxicol Pharmacol 27:3–20
16. WHO (2005) International Programme on Chemical Safety: chemical-specific adjustment. Factors for interspecies differences and human variability: guidance document for use of data in dose/concentration response assessment. World Health Organization, Geneva. http://www.who.int/ipcs/methods/harmonization/areas/uncertainty/en/index.html
17. Amzal B, Julin B, Vahter M, Johanson G, Wolk A, Åkesson A (2009) Population toxicokinetic modeling of cadmium for health risk assessment. Environ Health Perspect 117(8):1293–1301
18. EFSA (European Food Safety Authority) (2009) Scientific Opinion of the Panel on Contaminants in the Food Chain on a request from the European Commission on cadmium in food. EFSA J 980:1–139. http://www.efsa.europa.eu/cs/BlobServer/Scientific_Opinion/contam_op_ej980_cadmium_en_rev.1.pdf?ssbinary=true
19. Dorne JLCM, Walton K, Renwick AG (2005) Human variability in xenobiotic metabolism and pathway-related uncertainty factors for chemical risk assessment: a review. Food Chem Toxicol 43:203–216
20. Gerlowski LE, Jain RK (1983) Physiologically based pharmacokinetic modeling: principles and applications. J Pharm Sci 72:1103–1127
21. Bois F, Jamei M, Clewell HJ (2010) PBPK modelling of inter-individual variability in the pharmacokinetics of environmental chemicals. Toxicology 278:256–267
22. Gibaldi M, Perrier D (1982) Pharmacokinetics, 2nd edn, revised and expanded. Marcel Dekker, New York
23. Jamei M, Marciniak S, Feng KR, Barnett A, Tucker G, Rostami-Hodjegan A (2009) The Simcyp population-based ADME simulator. Expert Opin Drug Metab Toxicol 5:211–223
24. Bouvier d'Yvoire M, Prieto P, Blaauboer BJ, Bois FY, Boobis A, Brochot C, Coecke S, Freidig A, Gundert-Remy U, Hartung T, Jacobs MN, Lavé T, Leahy DE, Lennernäs H, Loizou GD, Meek B, Pease C, Rowland M, Spendiff M, Yang J, Zeilmaker M (2007) Physiologically-based kinetic modelling (PBK modelling): meeting the 3Rs agenda—the report and recommendations of ECVAM Workshop 63a. Altern Lab Anim 35:661–671
25. Edginton AN, Schmitt W, Willmann S (2006) Development and evaluation of a generic physiologically based pharmacokinetic model for children. Clin Pharmacokinet 45:1013–1034
26. Luecke RH, Pearce BA, Wosilait WD, Slikker W, Young JF (2007) Postnatal growth considerations for PBPK modeling. J Toxicol Environ Health A 70:1027–1037
27. Jones HM, Gardner IB, Watson KJ (2009) Modelling and PBPK simulation in drug discovery. AAPS J 11:155–166
28. Allen BC, Hack CE, Clewell HJ (2007) Use of Markov chain Monte Carlo analysis with a physiologically-based pharmacokinetic model of methylmercury to estimate exposures in US women of childbearing age. Risk Anal 27:947–959
29. Lorber M (2008) Exposure of Americans to polybrominated diphenyl ethers. J Expo Sci Environ Epidemiol 18(1):2–19
30. Fromme H, Korner W et al (2009) Human exposure to polybrominated diphenyl ethers (PBDE), as evidenced by data from a duplicate diet study, indoor air, house dust, and biomonitoring in Germany. Environ Int 35(8):1125–1135
31. US-EPA (2003) Exposure and human health reassessment of 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) and related compounds. National Academy of Sciences (NAS) review draft. Part III. EPA, Washington, DC
32. Pinsky PF, Lorber MN (1998) A model to evaluate past exposure to 2,3,7,8-TCDD. J Expo Anal Environ Epidemiol 8(2):187–206
33. Smith JC, Farris FF (1996) Methyl mercury pharmacokinetics in man: a reevaluation. Toxicol Appl Pharmacol 137(2):245–252
34. Albert I, Villeret G et al (2010) Integrating variability in half-lives and dietary intakes to predict mercury concentration in hair. Regul Toxicol Pharmacol 58(3):482–489
35. Delyon B, Lavielle M, Moulines E (1999) Convergence of a stochastic approximation version of the EM algorithm. Ann Stat 27(1):94–128
36. Rowland M, Benet LZ, Graham GG (1973) Clearance concepts in pharmacokinetics. J Pharmacokinet Biopharm 1:123–136
37. Shuey DL, Lau C, Logsdon TR, Zucker RM, Elstein KH, Narotsky MG, Setzer RW, Kavlock RJ, Rogers JM (1994) Biologically based dose-response modeling in developmental toxicology: biochemical and cellular sequelae of 5-fluorouracil exposure in the developing rat. Toxicol Appl Pharmacol 126(1):129–144
38. Crump KS, Chen C, Chiu WA, Louis TA, Portier CJ et al (2010) What role for biologically based dose–response models in estimating low-dose risk? Environ Health Perspect 118(5)
39. Crump KS (1984) A new method for determining allowable daily intakes. Fundam Appl Toxicol 4:854–871
40. Budtz-Jørgensen E, Keiding N, Grandjean P (2001) Benchmark dose calculation from epidemiological data. Biometrics 57:698–706
41. Sand S et al (2008) The current state of knowledge on the use of the benchmark dose concept in risk assessment. J Appl Toxicol 28(4):405–421
42. Crump KS (2002) Critical issues in benchmark calculations from continuous data. Crit Rev Toxicol 32:133–153
43. Suwazono Y et al (2006) Benchmark dose for cadmium-induced renal effects in humans. Environ Health Perspect 114(7):1072–1076
44. Ryan L (2008) Combining data from multiple sources, with applications to environmental risk assessment. Stat Med 27:698–710
45. Wheeler MW, Bailer AJ (2007) Properties of model-averaged BMDLs: a study of model averaging in dichotomous response risk estimation. Risk Anal 27:659–670
46. Morales KH, Ibrahim JG, Chen CJ, Ryan LM (2006) Bayesian model averaging with applications to benchmark dose estimation for arsenic in drinking water. J Am Stat Assoc 101(473):9–17
47. EC (2004) European Union System for the Evaluation of Substances 2.0 (EUSES 2.0). Prepared for the European Chemicals Bureau by the National Institute of Public Health and the Environment (RIVM), Bilthoven, The Netherlands (RIVM Report no. 601900005). Available via the European Chemicals Bureau. http://ecb.jrc.it
48. Bertail P, Clémençon S et al (2010) Statistical analysis of a dynamic model for dietary contaminant exposure. J Biol Dyn 4(2):212–234
49. Verger P, Tressou J, Clémençon S (2007) Integration of time as a description parameter in risk characterisation: application to methyl mercury. Regul Toxicol Pharmacol 49(1):25–30
50. Tressou J, Leblanc JCh, Feinberg M, Bertail P (2004) Statistical methodology to evaluate food exposure to a contaminant and influence of sanitary limits: application to ochratoxin A. Regul Toxicol Pharmacol 40(3):252–263
51. Van den Berg M, Birnbaum L et al (1998) Toxic equivalency factors (TEFs) for PCBs, PCDDs, PCDFs for humans and wildlife. Environ Health Perspect 106:775–792
52. Thuresson, Höglund et al (2000) In: Medicine and health policy. Marcel Dekker, New York
53. AFSSA (2009) Etude individuelle Nationale des consommations Alimentaires 2 (INCA 2) (2006-2007), Rapport AFSSA, 228p. http://www.anses.fr/Documents/PASER-Ra-INCA2.pdf
54. EFSA (2010) Application of systematic review methodology to food and feed safety assessments to support decision making. EFSA J 8(6):1637. http://www.efsa.europa.eu/en/efsajournal/doc/1637.pdf
55. Marvier M, McCreedy C, Regetz J, Kareiva P (2007) A meta-analysis of effects of Bt cotton and maize on nontarget invertebrates. Science 316(5830):1475–1477
56. Greenland S, Robins J (1994) Invited commentary: ecologic studies—biases, misconceptions, and counterexamples. Am J Epidemiol 139:747–760
57. Terrin N, Schmidt CH, Lau J, Olkin I (2003) Adjusting for publication bias in the presence of heterogeneity. Stat Med 22:2113–2212
58. Stangl D, Berry DA (eds) Meta-analysis
59. Higgins JP, Thompson SG, Deeks JJ, Altman DG (2003) Measuring inconsistency in meta-analyses. BMJ 327:557–560
60. Egger M et al (2001) Systematic reviews in health care. BMJ Books, London
61. Egger M, Smith GD, Schneider M, Minder C (1997) Bias in meta-analysis detected by a simple, graphical test. BMJ 315:629–634
62. Biro G, Hulshof K, Ovesen L, Amorim Cruz JA (2002) Selection of methodology to assess food intake. Eur J Clin Nutr 56(Suppl 2):S25–S32. doi:10.1038/sj/ejcn/1601426
63. Verger P, Ireland J, Møller A, Abravicius JA, De Henauw S, Naska A (2002) Improvement of comparability of dietary intake assessment using currently available individual food consumption surveys. Eur J Clin Nutr 56(Suppl 2):S1–S7. doi:10.1038/sj/ejcn/1601425
64. Wirfält E, Hedblad B, Gullberg B, Mattisson I, Andrén C RU, Janzon L, Berglund G (2001) Food patterns and components of the metabolic syndrome in men and women: a cross-sectional study within the Malmö diet and cancer cohort. Am J Epidemiol 154(12):1150–1159
65. Zetlaoui M, Feinberg M, Verger P, Clémencon S (2011) Extraction of food consumption systems by non-negative matrix factorization (NMF) for the assessment of food choices. Biometrics (in press). http://hal.archives-ouvertes.fr/docs/00/48/47/94/PDF/NMF_food.pdf
66. EFSA (2010) European Food Safety Authority; management of left-censored data in dietary exposure assessment of chemical substances. EFSA J 8(3):1557. http://www.efsa.europa.eu/en/efsajournal/doc/1557.pdf
67. Helsel DR (2005) Nondetects and data analysis. Wiley, New York
68. Kennedy MC, Roelofs VJ et al (2011) A hierarchical Bayesian model for extreme pesticide residues. Food Chem Toxicol 49(1):222–232
69. Tressou J, Bertail P et al (2003) Evaluation of food risk exposure using extreme value theory—application to heavy metals for sea products consumers. Toxicol Lett 144(Supplement 1):s190
70. WHO (2009) Principles for modelling dose-response for the risk assessment of chemicals. Environmental Health Criteria. http://www.who.int/tipcs/methods/harmonization/dose_response/en/
71. Spilke J, Piepho HP, Hu X (2005) A simulation study on tests of hypotheses and confidence intervals for fixed effects in mixed models for blocked experiments with missing data. J Agric Biol Environ Stat 10:374–389
72. Spiegelhalter DJ, Best NG et al (2002) Bayesian measures of model complexity and fit. J R Stat Soc Series B Stat Methodol 64:583–640
73. Dorne JLCM, Walton K, Renwick AG (2001) Uncertainty factors for chemical risk assessment: human variability in the pharmacokinetics of CYP1A2 probe substrates. Food Chem Toxicol 39:681–696
74. Dorne JLCM, Walton K, Slob W, Renwick AG (2002) Human variability in polymorphic CYP2D6 metabolism: is the kinetic default uncertainty factor adequate? Food Chem Toxicol 40:1633–1656
75. Dorne JLCM, Renwick AG (2005) The refinement of uncertainty/safety factors in risk assessment by the incorporation of data on toxicokinetic variability in humans. Toxicol Sci 86:20–26
76. Dorne JLCM, Walton K, Renwick AG (2003) Human variability in CYP3A4 metabolism and CYP3A4-related uncertainty factors for risk assessment. Food Chem Toxicol 41:201–224
77. Dorne JLCM, Walton K, Renwick AG (2003) Polymorphic CYP2C19 and N-acetylation: human variability in kinetics and pathway-related uncertainty factors. Food Chem Toxicol 41:225–245
78. Kjellström T (1971) A mathematical model for the accumulation of cadmium in human kidney cortex. Nord Hyg Tidskr 52:111–119
79. Sutton AJ, Higgins JPT (2008) Recent developments in meta-analysis. Stat Med 27:625–650
80. Berry D, Strangl DK (eds) (2001) Meta-analysis in medicine and health policy. Biostatistics, New York
81. Morales KH, Ryan LM (2005) Benchmark dose estimation based on epidemiologic cohort data. Environmetrics 16:435–447
82. Lunn DJ, Thomas A, Best N, Spiegelhalter D (2000) WinBUGS—a Bayesian modelling framework: concepts, structure and extensibility. Stat Comput 10:325–337
83. EFSA (2011) Comparison of the approaches taken by EFSA and JECFA to establish a HBGV for cadmium. http://www.efsa.europa.eu/en/efsajournal/doc/2006.pdf
84. Suwazono Y, Nogawa K, Uetani M et al (2011) Application of hybrid approach for estimating the benchmark dose of urinary cadmium for adverse renal effects in the general population of Japan. J Appl Toxicol 31(1):89–93
85. Tard A, Gallotti S, Leblanc JC, Volatier JL (2007) Dioxins, furans and dioxin-like PCBs: occurrence in food and dietary intake in France. Food Addit Contam 24(9):1007–1017
86. Milbrath MO, Wenger Y, Chang CW, Emond C, Garabrant D, Gillespie BW, Jolliet O (2009) Apparent half-lives of dioxins, furans, and polychlorinated biphenyls as a function of age, body fat, smoking status, and breast-feeding. Environ Health Perspect 117(3):417–425
87. Gray LE, Ostby JS et al (1997) A dose-response analysis of the reproductive effects of a single gestational dose of 2,3,7,8-tetrachlorodibenzo-p-dioxin in male Long Evans Hooded rat offspring. Toxicol Appl Pharmacol 146:11–20
88. Bokkers BGH, Zeilmaker MJ et al (2009) RIVM report on framework and integration methods. The application of animal toxicity data in risk-benefit analysis: 2,3,7,8-TCDD as an example
89. Bernillon P, Bois FY (2000) Statistical issues in toxicokinetic modeling: a Bayesian perspective. Environ Health Perspect 108(Suppl 5):883–893
90. Yan L, Sheihk-Bahaei S, Park S, Ropella GE, Hunt CA (2008) Predictions of hepatic disposition properties using a mechanistically realistic, physiologically based model. Drug Metab Dispos 36(4):759–768
91. Lerapetritou MG, Georgopoulos PG, Roth CM, Androulakis LP (2009) Tissue-level modeling of xenobiotic metabolism in liver: an emerging tool for enabling clinical translational research. Clin Transl Sci 2(3):228–237
92. McDonald TA (2005) Polybrominated diphenylether levels among United States residents: daily intake and risk of harm to the developing brain and reproductive organs. Integr Environ Assess Manag 1(4):343–354
93. Van der Molen GW, Kooijman SALM et al (1996) A generic toxicokinetic model for persistent lipophilic compounds in humans: an application to TCDD. Fundam Appl Toxicol 31(1):83–94
94. Verner MA, Ayotte P et al (2009) A physiologically based pharmacokinetic model for the assessment of infant exposure to persistent organic pollutants in epidemiologic studies. Environ Health Perspect 117(3):481–487
95. Lu C, Holbrook CM et al (2010) The implications of using a physiologically based pharmacokinetic (PBPK) model for pesticide risk assessment. Environ Health Perspect 118(1):125–130
Chapter 21

Mechanism-Based Pharmacodynamic Modeling


Melanie A. Felmlee, Marilyn E. Morris, and Donald E. Mager

Abstract
Pharmacodynamic modeling is based on a quantitative integration of pharmacokinetics, pharmacological
systems, and (patho-) physiological processes for understanding the intensity and time-course of drug
effects on the body. Application of such models to the analysis of meaningful experimental data allows for
the quantification and prediction of drug–system interactions for both therapeutic and adverse drug
responses. In this chapter, commonly used mechanistic pharmacodynamic models are presented with
respect to their important features, operable equations, and signature profiles. In addition, literature
examples showcasing the utility of these models to adverse drug events are highlighted. Common model
types that are covered include simple direct effects, biophase distribution, indirect effects, signal transduc-
tion, and irreversible effects.

Key words: Adverse drug effects, Exposure–response relationships, Mathematical modeling,


Pharmacodynamics, Pharmacokinetics

1. Introduction

Pharmacodynamics represents a broad discipline that seeks to identify


drug- and system-specific properties that regulate acute and long-
term biological responses to drugs. The term is typically used in the
context of therapeutic effects, whereas toxicology or toxicodynamics
relates to adverse drug reactions. In contrast to classical conceptuali-
zations whereby beneficial and adverse responses occur via distinct
mechanisms, it is increasingly clear that diseases and both types of
drug responses may emerge from perturbations of singular complex
interconnected networks (1). Thus, mechanism-based pharmacody-
namic models, by definition, should be multipurpose and readily
adapted to understand the extent and time-course of adverse drug
effects.
In the mid-1960s, Gerhard Levy was the first to mathematically
demonstrate a link between pharmacokinetics (factors controlling
drug exposure) and the rate of decline of in vivo pharmacological

Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_21, © Springer Science+Business Media, LLC 2012


responses (2, 3). Since that landmark discovery, pharmacodynamic


modeling has evolved into a quantitative field that aims to mathe-
matically characterize the temporal aspects of drug effects via emu-
lating mechanisms of action (4). The application of mathematical
models to describe drug–system interactions allows for the quanti-
fication and prediction of subsequent interactions within the sys-
tem. The major goals of pharmacodynamic modeling are to
integrate known system components, functions, and constraints,
generate and test competing hypotheses of drug mechanisms and
system responses under new conditions, and estimate system-
specific parameters that may be inaccessible (5). These models are
applicable to a wide range of disciplines within the biological
sciences including pharmacology and toxicology, wherein there is
a critical need to understand and predict desired and adverse
responses to xenobiotic exposure, which together define the clinical
utility or therapeutic index.
The main objectives of this chapter are to illustrate commonly
used mechanistic pharmacodynamic models, providing important
model features, operable equations, and signature profiles, as well
as examples of the application of these models to the analysis of
drug-induced adverse reactions.

2. Modeling Requirements

Useful pharmacodynamic models are based on plausible mathematical
and pharmacological exposure–response relationships. Basic
model components encompassing a range of pharmacodynamic
systems are illustrated in Fig. 1. For most drug effects, both
pharmacological mechanisms, often characterized by sensitivity-
grounded capacity-limited effector units, and physiological turn-
over processes need to be integrated with drug disposition when
constructing a PK/PD model.
The construction and evaluation of relevant PK/PD models
require suitable pharmacokinetic data, an appreciation for molecu-
lar and cellular mechanisms of pharmacological/toxicological
responses, and a range of quantitative experimental measurements
of meaningful biomarkers within the causal pathway between drug–
target interactions and clinical effects. Good experimental designs
are essential to ensure that sensitive and reproducible data are
collected. These data should cover a reasonably wide dose/concen-
tration range and appropriate study duration to ascertain net drug
exposure and the ultimate fate of the biomarkers or outcomes
under investigation. A wide range of systemic drug concentrations
is also typically required for the accurate and precise estimation of
pharmacodynamic parameters. In practice, studies should involve a
minimum of two to three dose levels to adequately estimate the

Fig. 1. Basic components of pharmacodynamic models. The time-course of drug concentrations in a relevant biological
fluid (e.g., plasma, Cp) or the biophase (Ce) is characterized by a mathematical function that serves to drive PD models. The
biosensor process involves the interaction between the drug and the pharmacologic target (R), and may be described
using various receptor-occupancy models, may require equations that consider the kinetics of the drug–receptor complex
formation and dissociation, or may encompass irreversible drug–target interactions. Many drugs act via indirect
mechanisms and the biosensor process may serve to stimulate or inhibit the production (kin) or loss (kout) of endogenous
mediators. These altered mediators may not represent the final observed drug effect (E) and further time-dependent
transduction processes may occur, thus requiring additional modeling components. System complexities such as drug
interactions, functional adaptation, changes with pathophysiology, and other factors may play a role in regulating drug
effects after acute and long-term drug exposure (adapted from ref. 33).

nonlinear parameters of most pharmacodynamic models with


simultaneous collection of concentration and response data. For
more complex systems (and therefore models), more extensive
datasets are required as these models typically incorporate multiple
nonlinear processes and pharmacodynamic endpoints. Models are
typically defined using ordinary differential equations and include
both drug- and system-specific parameters. This separation of terms
provides a platform for translational research, whereby relationships
with in vitro bioassays and preclinical experiments can be identified.
Once a structural model has been selected, unknown parameter
values can be estimated using nonlinear regression techniques. It is
beyond the scope of this chapter to review the vast array of software
programs and algorithms available, and the best tool and approach
will often be defined by the characteristics of the experimental data,
the familiarity of the end user with specific programs, and the goals
and objectives of the analysis. The type of model (e.g., data-driven
versus systems models), the nature of the biomarker (e.g., continu-
ous versus categorical), the degree of inter-subject variability, and
complexities within a dataset (e.g., missing variables, data above or
below a limit of quantification, and availability of covariates) are just
a few considerations when selecting an approach to develop and
qualify PK/PD models.
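The separation of drug- and system-specific parameters in an ODE-defined model can be made concrete with a short sketch. The indirect-response structure below is one of the model classes listed in the Abstract; the parameter values and the monoexponential driving function are our own illustrative assumptions, not values from the chapter.

```python
import numpy as np
from scipy.integrate import odeint

# Drug-specific parameters (illustrative assumptions)
IMAX, IC50 = 1.0, 10.0            # maximal inhibition and potency
# System-specific parameters (illustrative assumptions)
KIN, KOUT = 9.0, 0.3              # zero-order production, first-order loss
K, C0 = 0.12, 100.0               # monoexponential PK driving function

def cp(t):
    return C0 * np.exp(-K * t)    # fixed pharmacokinetic driving function

def drdt(r, t):
    """Indirect response model: drug inhibits production of response R."""
    inhibition = IMAX * cp(t) / (IC50 + cp(t))
    return KIN * (1.0 - inhibition) - KOUT * r

t = np.linspace(0.0, 96.0, 481)
r0 = KIN / KOUT                   # system baseline (steady state, no drug)
r = odeint(drdt, r0, t).ravel()

print(round(r0, 1))               # 30.0 (baseline)
```

Because the PK function, drug parameters, and system parameters are kept separate, each piece can be replaced independently, which is the platform for translational work described above.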

3. Practical Modeling Approaches

The first steps in any modeling endeavor are to define the objectives
of the analysis and to perform a careful graphical analysis of raw data.
Both efforts should facilitate selection of appropriate techniques
and conditions for model construction and evaluation. A good
graphical analysis (along with a priori knowledge of drug mechan-
isms) may be used to narrow down the number of structural models
being considered as a base model and also help in calculating initial
parameter estimates. Despite progress in computational algorithms,
good initial parameter estimates can reduce the likelihood of falling
into local minima and can also be used as a reality check when
compared to final parameter estimates or literature reported values.
Next, an appropriate drug/toxin pharmacokinetic/toxicokinetic
function is derived from fitting a model to concentration–time
profiles in relevant biological fluids. Depending on the complexity
of the pharmacodynamic model/system, the pharmacokinetic
model and associated parameters are often fixed to serve as a driving
function for the pharmacodynamic model relating drug exposure to
pharmacological/toxicological effects. Although simultaneous
PK/PD modeling is desirable, this can still be a formidable chal-
lenge for complex models. Objective model-fitting criteria (e.g.,
diagnostic and goodness-of-fit plots) are frequently compared to
select a final model, and a variety of techniques are available to verify
or qualify models, which can range in complexity depending on the
modeling approach (e.g., population versus pooled data). Ideally, an
external dataset, not used in the construction of the model, could be
used to determine whether the model is generalizable; however,
internal validation steps are far more common as most model-
builders will attempt to incorporate all available experimental data.
In any event, final models should reasonably recapitulate the data
used to derive the model, generate new insights and testable
hypotheses of factors controlling drug responses, and provide guid-
ance for subsequent decisions in drug discovery, development, and
pharmacotherapy. Subsequent sections will highlight commonly
used pharmacodynamic models with increasing degrees of complex-
ity, as well as provide literature examples on the application of such
models to the analysis of drug-induced adverse events.
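The workflow just described (graphical analysis, initial estimates, then nonlinear regression) can be illustrated with a minimal sketch. This is not code from the chapter: it simulates noisy concentration–response data from a known Emax model and recovers the parameters by nonlinear least squares, with initial estimates taken from the data as recommended above.

```python
import numpy as np
from scipy.optimize import curve_fit

def emax_model(c, e0, emax, ec50):
    """Simple Emax model (Eq. 1, stimulatory form)."""
    return e0 + emax * c / (ec50 + c)

rng = np.random.default_rng(0)
conc = np.logspace(-1, 3, 25)                   # wide concentration range
obs = emax_model(conc, 10.0, 100.0, 15.0) + rng.normal(0.0, 3.0, conc.size)

# Initial estimates from graphical analysis: baseline ~ lowest response,
# Emax ~ range of responses, EC50 ~ concentration near half-maximal effect
p0 = [obs.min(), obs.max() - obs.min(), 10.0]
popt, pcov = curve_fit(emax_model, conc, obs, p0=p0)
e0, emax, ec50 = popt
```

Comparing the recovered values against the simulated "truth" is the kind of reality check described above; with real data, the comparison is against literature-reported values instead.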

3.1. Simple Direct Effect Models

The Hill equation assumes that drug effects (E) are directly proportional
to receptor occupancy (i.e., linear transduction), assumes
that plasma drug concentrations are in rapid equilibrium with the
effect site, and represents a fundamental pharmacodynamic relationship (6):
E = E0 ± (Emax · Cp)/(EC50 + Cp)    (1)

10000 110
100
1000
90
100 80
Concentration

70

Effect
10 60
1 50
40
0.1 30
20
0.01
10
0.001 0
0 10 20 30 40 50 60 70 80 0 10 20 30 40 50 60 70 80
Time (hours) Time (hours)

Fig. 2. Simulated drug concentrations (left ) and response curves (right ) using a simple Emax model (Eq. 1). Drug
concentrations follow monoexponential disposition: Cp ¼ Coe(kt). Co was set to 10, 100, or 1,000 units to achieve
increasing dose levels. Parameter values were k ¼ 0.12/h, Emax ¼ 100 units, and EC50 ¼ 15 units.

This equation, also known as the Emax model, describes the


concentration–effect relationship in terms of a baseline effect or
E0 (if applicable), the maximum possible effect (Emax), and the
drug concentration producing half maximal effect (EC50). These
parameters can be visualized easily from a plot of effect versus log-
concentration where Emax is the plateau at relatively high concen-
trations and EC50 is the drug concentration associated with
E = 0.5 × Emax. Signature temporal profiles for simple direct effects
for a compound with monoexponential disposition are shown in
Fig. 2. The effect versus time curves appear saturated at high dose
levels, decline linearly and in parallel over a range of doses, and the
peak response time corresponds with the time of peak drug con-
centrations.
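The signature profiles of Fig. 2 can be reproduced in a few lines. The sketch below uses the parameter values from the Fig. 2 caption and assumes a baseline effect E0 of zero with a stimulatory drug effect (an assumption for illustration; the caption does not state the baseline).

```python
import numpy as np

K, EMAX, EC50 = 0.12, 100.0, 15.0        # values from the Fig. 2 caption
E0 = 0.0                                  # assumed baseline (not stated)

def effect(t, c0):
    cp = c0 * np.exp(-K * t)              # monoexponential disposition
    return E0 + EMAX * cp / (EC50 + cp)   # simple Emax model (Eq. 1)

t = np.linspace(0.0, 80.0, 161)
curves = {c0: effect(t, c0) for c0 in (10, 100, 1000)}

# At the highest dose the initial response is near-maximal (saturated)
print(round(curves[1000][0], 1))          # 98.5
```

Plotting these three curves reproduces the saturation at high dose and the parallel decline described above.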
If a sufficient range of concentrations is not achieved, or cannot
be obtained for safety reasons, the Hill equation can be reduced to
simpler functions. For concentrations significantly less than the
EC50, Cp in the denominator of Eq. 1 is negligible, and drug effect
is directly proportional to plasma drug concentrations:
E = E0 ± S · Cp;    (2)
with S as the slope of the relationship. When the effect is between
20 and 80% maximal, according to Eq. 1, the effect is directly
proportional to the log of drug concentrations:
E = E0 ± m · log Cp;    (3)
with m as the slope of the relationship. These reduced functions are
only valid within certain ranges of drug concentrations relative to
drug potency, and hence cannot be extrapolated to identify the
maximal pharmacodynamic effect of a compound.
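A quick numerical check makes these validity ranges concrete. Assuming E0 = 0 and the illustrative potency EC50 = 15 from Fig. 2 (our assumptions), the linear form of Eq. 2 with slope S = Emax/EC50 deviates from the full Hill equation by exactly Cp/EC50 in relative terms: about 10% at one-tenth of the EC50, but 100% at the EC50.

```python
EMAX, EC50 = 100.0, 15.0                  # illustrative values (as in Fig. 2)

def full(c):
    """Full Emax model (Eq. 1) with E0 = 0."""
    return EMAX * c / (EC50 + c)

def linear(c):
    """Linear approximation (Eq. 2) with slope S = Emax/EC50."""
    return (EMAX / EC50) * c

# Relative error of the linear form is exactly Cp/EC50
c_low = 0.1 * EC50
err_low = abs(linear(c_low) - full(c_low)) / full(c_low)   # 10% at 0.1*EC50
err_mid = abs(linear(EC50) - full(EC50)) / full(EC50)      # 100% at EC50
print(round(err_low, 3), round(err_mid, 3))                # 0.1 1.0
```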

Fig. 3. Direct effect model of tacrolimus-induced changes of QTc intervals in guinea pigs. The pharmacokinetic model
includes both plasma and ventricular myocardial drug concentrations (a), and the latter are associated with changes in QTc
according to Eq. 4 (b). The PK/PD relationship results in the time-course of changes in QTc (c). Reprinted from ref. 8 with
permission from Springer.

The full Hill equation, or sigmoid Emax model, incorporates a


curve-fitting parameter, γ, which describes the steepness of the
concentration–effect relationship:
E max  Cpg
E ¼ E0  : (4)
ECg50 þ Cpg
Initial estimates for this parameter can be determined using the
linear slope of the effect versus log-concentration plot:

m = (Emax·γ)/4.  (5)
As the Hill coefficient increases from 1 to 5, the concentration–
effect relationship becomes less graded, and values of 5 tend to result
in quantal or all-or-none types of effects. In contrast, values less than 1
produce very shallow slopes.
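The behavior of Eqs. 2–5 can be checked numerically. The sketch below (plain Python, with illustrative parameter values rather than values from any cited study) verifies the half-maximal effect at Cp = EC50, the near-proportional regime of Eq. 2, and the graphical initial estimate of γ from Eq. 5:

```python
import math

def hill_effect(cp, e0=0.0, emax=100.0, ec50=15.0, gamma=1.0):
    """Sigmoid Emax (Hill) model, Eq. 4; gamma = 1 recovers the simple Emax model."""
    if cp <= 0.0:
        return e0
    cg = cp ** gamma
    return e0 + emax * cg / (ec50 ** gamma + cg)

# At Cp = EC50 the effect is half-maximal for any gamma
print(hill_effect(15.0, gamma=1.0), hill_effect(15.0, gamma=3.0))

# For Cp << EC50 the response is nearly proportional to Cp (Eq. 2),
# with slope S approximately Emax/EC50
print(hill_effect(0.1) / 0.1)

# Initial estimate of gamma from the log-concentration slope m (Eq. 5)
m = 25.0                      # hypothetical fitted slope
gamma_init = 4.0 * m / 100.0  # gamma = 4*m/Emax
print(gamma_init)
```

The slope value `m` and the parameter defaults are hypothetical; in practice they would come from a fit of the observed effect–concentration data.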
Simple direct effect models have been utilized to characterize
the adverse effects of a number of drugs. Arrhythmias may occur as
a side effect of cardiac and noncardiac therapies, and an increasing
number of studies are conducted with QTc intervals as the toxico-
dynamic endpoint. QTc prolongation in response to citalopram (7)
and tacrolimus (8) has been modeled using a simple Emax function
(Fig. 3). The simple Emax model incorporating baseline measure-
ments of the dynamic endpoints was also used to model the cardio-
vascular toxicity of cocaine administration (9). The model
21 Mechanism-Based Pharmacodynamic Modeling 589


Fig. 4. Biophase model structure (top panel) and signature profiles for drug concentrations at the biophase (left bottom
panel) and pharmacological effects (right bottom panel). Response curves were simulated using Eqs. 1 and 6 driven by
drug concentrations following monoexponential disposition: Cp = C0·e^(−kt). C0 was set to 100 units. Parameter values were
k = 0.12/h, keo = 0.1 or 0.5/h, Emax = 100 units, and EC50 = 15 units.

reasonably described the effects of cocaine on multiple endpoints


including heart rate and systolic and diastolic blood pressure. Both
the Emax and sigmoid Emax models were evaluated for describing
methemoglobin formation from dapsone metabolites (10); how-
ever, fitting criteria were not evaluated to select the best model.

3.2. Biophase Distribution

In many cases, the in vivo pharmacological effects will lag behind
plasma drug concentrations. This results in the phenomenon of
hysteresis, or a temporal disconnect in effect versus concentration
plots. Distribution of drug to its site of action might represent a
rate-limiting process that may account for the delay in drug effect.
The term “biophase” was coined by Furchgott (11) to describe the
drug site of action, and a mathematical approach to linking plasma
concentrations and drug effect through a hypothetical effect com-
partment was popularized by Sheiner and colleagues (12) (Fig. 4,
top panel). Plasma drug concentrations are described using an
appropriate pharmacokinetic model, and the rate of change of
drug concentrations at the biophase (Ce) is defined as

dCe/dt = keo·Cp − keo·Ce,  (6)

with keo as a first-order distribution rate constant. Although separate


rate constants for production and loss were first proposed, they are
often set as the same term (keo) for identifiability purposes. The
amount of drug moving into and out of this compartment is
assumed to be negligible, and therefore does not influence the
pharmacokinetics of the drug. Biophase distribution is combined
with Eq. 1 or 4, with Ce from Eq. 6 replacing Cp to drive the
pharmacological effect. Figure 4 (bottom panels) illustrates the
signature profile of the biophase model (i.e., biophase concentra-
tion and effect profiles) for a drug exhibiting monoexponential
disposition. Peak drug effects are delayed relative to peak plasma
concentrations; however, the time to peak effect is observed at the
same time, independent of the dose level. The time to peak
drug effect is related to keo, with smaller values resulting in later
peak effects. Furthermore, for large dose levels, the slope of the
decline of effect is linear and parallel between 20 and 80% of
the maximum effect. Estimation of biophase model parameters
can be done sequentially by fitting the pharmacokinetics and then
fitting the biophase and pharmacodynamic parameters, or by simul-
taneously fitting all terms. The biophase model is only suitable for
describing delayed responses due to drug distribution. As it was the
first approach for describing such delayed drug responses, it has
been commonly misapplied to describe systems in which the rate-
limiting step is unrelated to drug distribution, resulting in poor
fitting and/or unrealistic parameter values.
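The keo-dependence of the peak-effect time in Fig. 4 can be reproduced with a few lines of forward-Euler integration of Eq. 6 driving Eq. 1 — a minimal sketch using the illustrative parameter values from the figure caption, not a production solver:

```python
import math

def biophase_peak(keo, k=0.12, c0=100.0, emax=100.0, ec50=15.0,
                  dt=0.01, t_end=80.0):
    """Integrate dCe/dt = keo*(Cp - Ce) (Eq. 6) and return the time of peak effect."""
    ce, t = 0.0, 0.0
    peak_t, peak_e = 0.0, 0.0
    for _ in range(int(t_end / dt)):
        cp = c0 * math.exp(-k * t)       # monoexponential disposition
        ce += dt * keo * (cp - ce)
        t += dt
        e = emax * ce / (ec50 + ce)      # Eq. 1 driven by Ce instead of Cp
        if e > peak_e:
            peak_e, peak_t = e, t
    return peak_t

# Smaller keo -> later peak effect, as in Fig. 4
print(biophase_peak(0.1), biophase_peak(0.5))
```

Because the effect is a monotonic function of Ce, the peak-effect time equals the peak time of Ce itself (about 9 h for keo = 0.1/h and 3.8 h for keo = 0.5/h with these values).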
The biophase model was implemented for describing
buprenorphine-induced respiratory depression in rats (13), and
the clinical prediction of transient increases in blood pressure
(14). Yassen and colleagues (13) utilized biophase distribution
combined with a sigmoidal Emax model to characterize changes in
ventilation following a range of dose levels of buprenorphine. In
contrast, increases in blood pressure resulting from a drug in clini-
cal development were described using the biophase model coupled
with a more complex pharmacodynamic relationship incorporating
changes from a blood pressure set point (14).

3.3. Indirect Response Models

Indirect response models represent a highly useful class of models
wherein reversible drug–receptor interactions serve to alter the
natural production or loss of biomarker response variables.
A model reflecting inhibition of production was first utilized to
characterize prothrombin activity in blood after oral warfarin
administration (15). Dayneka and colleagues (16) were the first to
formally propose four basic indirect response models whose struc-
tures are detailed in Fig. 5 (top panel). These models have been
used to investigate the pharmacodynamics of a wide range of drug
effects, and their mathematical properties have been well character-
ized (17, 18). The four basic models include inhibition of


Fig. 5. Indirect response model structure (top panel) and signature profiles for the four basic indirect response models
(middle and bottom panels). Response curves were simulated using Eqs. 7, 8, 9, and 10 driven by drug concentrations
following monoexponential disposition: Cp = C0·e^(−kt). C0 was set to 10, 100, or 1,000 units to achieve increasing doses.
Parameter values were k = 0.12/h, Imax = 1 unit (Models I and II), Smax = 10 units (Models III and IV), EC50 = 15 units,
kout = 0.25/h, and R0 = 100 units (kin = R0·kout).

production (Model I) or dissipation (Model II) of response or


stimulation of production (Model III) or dissipation of response
(Model IV), and are defined by the following differential equations:

3.3.1. Model I

dR/dt = kin·[1 − (Imax·Cp)/(IC50 + Cp)] − kout·R.  (7)

3.3.2. Model II

dR/dt = kin − kout·[1 − (Imax·Cp)/(IC50 + Cp)]·R.  (8)

3.3.3. Model III

dR/dt = kin·[1 + (Smax·Cp)/(SC50 + Cp)] − kout·R.  (9)

3.3.4. Model IV

dR/dt = kin − kout·[1 + (Smax·Cp)/(SC50 + Cp)]·R,  (10)
where kin is a zero-order production rate constant, kout is a first-order elimination rate constant, Imax and Smax are defined as the
maximum fractional factors of inhibition (0 < Imax ≤ 1) or stimulation (Smax > 0), and IC50 and SC50 are the drug concentrations
producing half-maximal inhibition or stimulation, analogous to the EC50.
Initial parameter estimates can be obtained from a graphical analysis
of PK/PD data as previously described (17, 18). Signature profiles
for these models in response to increasing dose levels are shown in
Fig. 5 (middle and bottom panels). Interestingly, the time to peak
responses are dose dependent, occurring at later times as the dose
level is increased. This phenomenon is easily explained as the inhi-
bition or stimulation effect will continue for larger doses, as drug
remains above the EC50 for longer times. The initial condition for
all models (R0) is kin/kout which may be set constant or fitted as a
parameter during model development. Ideally, a number of mea-
surements should be obtained prior to drug administration to
assess baseline conditions. Based on the determinants of R0, typi-
cally the baseline and one of the turnover parameters are estimated,
and the remaining rate constant is calculated as a function of the
two estimated terms. This reduces the number of parameters to be
estimated and maintains system stationarity.
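The four signature behaviors, including the dose-dependent shift in the time of peak response, can be illustrated with a simple Euler integration of Eqs. 7–10. This is a sketch using the illustrative parameter values from Fig. 5:

```python
import math

def idr_extremum(model, c0, k=0.12, imax=1.0, smax=10.0, c50=15.0,
                 kin=25.0, kout=0.25, dt=0.01, t_end=80.0):
    """Integrate one of the four basic indirect response models (Eqs. 7-10)
    and return (time, value) of the maximal departure from baseline."""
    r0 = kin / kout                  # baseline, R0 = kin/kout = 100 units
    r, t = r0, 0.0
    ext_t, ext_r = 0.0, r0
    for _ in range(int(t_end / dt)):
        cp = c0 * math.exp(-k * t)
        inh = 1.0 - imax * cp / (c50 + cp)
        stim = 1.0 + smax * cp / (c50 + cp)
        if model == 1:
            drdt = kin * inh - kout * r     # inhibition of production
        elif model == 2:
            drdt = kin - kout * inh * r     # inhibition of loss
        elif model == 3:
            drdt = kin * stim - kout * r    # stimulation of production
        else:
            drdt = kin - kout * stim * r    # stimulation of loss
        r += dt * drdt
        t += dt
        if abs(r - r0) > abs(ext_r - r0):
            ext_t, ext_r = t, r
    return ext_t, ext_r

# Model I: the nadir occurs later (and is deeper) as the dose increases (cf. Fig. 5)
print(idr_extremum(1, 10.0), idr_extremum(1, 1000.0))
```

Models I and IV drive the response below baseline, while Models II and III drive it above, matching the four panels of Fig. 5.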
The basic indirect response models can be extended to incor-
porate a precursor compartment (P). The following equations
represent a general set of precursor-dependent indirect response
models (Fig. 6, top panel) that were developed and characterized
by Sharma and colleagues (19):
dP/dt = k0·[1 ± H1(Cp)] − [ks + kp·{1 ± H2(Cp)}]·P,  (11)

dR/dt = kp·{1 ± H2(Cp)}·P − kout·R,  (12)
where k0 represents the zero-order rate constant for precursor
production, kp is a first-order rate constant for production of the
response variable, and kout is the first-order rate constant for dissi-
pation of response. H1 and H2 represent the inhibition or stimula-
tion of precursor production or production of response and are


Fig. 6. Multiple compartment indirect response models (top panel) and signature profiles for Models V and VI (bottom
panel). Response curves were simulated using Eqs. 11 and 12 driven by drug concentrations following monoexponential
disposition: Cp = C0·e^(−kt). C0 was set to 10, 100, or 1,000 units to achieve increasing doses. Parameter values were
k = 0.12/h, Imax = 1 unit, Smax = 10 units, EC50 = 15 units, k0 = 25 unit/h, kp = 0.5/h, and kout = 0.25/h.

analogous to the Imax and Smax functions presented in Eqs. 7


through 10. Stimulation or inhibition of kp is more commonly
observed than alterations in the production of precursor. The
signature profiles for models V and VI are shown in Fig. 6 (bottom
panels) and clearly demonstrate the rebound effect as drug washes
out of the system. The data requirements for these models are
similar to the basic indirect response models; however, sufficient
data are needed to adequately capture baseline, maximum, and
rebound effects, as well as the eventual gradual return to baseline
conditions. Responses should be evaluated for two to three doses,
with a sufficiently large dose to capture the maximum effect. The
response measurements for the large dose should be used to deter-
mine initial parameter estimates followed by simultaneous fitting of
all response data. Initial parameter estimates should be derived as
previously described (19).
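The rebound behavior of the precursor-dependent models can be demonstrated numerically from Eqs. 11 and 12. The sketch below makes simplifying assumptions for illustration: stimulation of kp only (H2 stimulatory, H1 = 0) and ks = 0, with parameter values loosely following Fig. 6:

```python
import math

def precursor_model(c0, k=0.12, smax=10.0, c50=15.0,
                    k0=25.0, kp=0.5, kout=0.25, dt=0.005, t_end=200.0):
    """Integrate Eqs. 11-12 (ks = 0, H1 = 0, stimulatory H2) and return
    (baseline, minimum response) to expose the post-washout rebound."""
    p = k0 / kp        # precursor steady state (50 units)
    r0 = k0 / kout     # response baseline (100 units)
    r, t, rmin = r0, 0.0, r0
    for _ in range(int(t_end / dt)):
        cp = c0 * math.exp(-k * t)
        h2 = smax * cp / (c50 + cp)
        dpdt = k0 - kp * (1.0 + h2) * p
        drdt = kp * (1.0 + h2) * p - kout * r
        p += dt * dpdt
        r += dt * drdt
        t += dt
        rmin = min(rmin, r)
    return r0, rmin

r0, rmin = precursor_model(100.0)
print(r0, rmin)
```

Stimulating kp transiently raises the response but depletes the precursor pool, so once the drug washes out the response dips below baseline (rmin < R0) before gradually returning — the rebound visible in Fig. 6.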
Indirect response models have been utilized to describe the
pharmacodynamic effects of a wide range of compounds that alter
the natural bioflux or turnover of endogenous substances or func-
tions. A basic indirect response model for erythropoietin was
extended to include multiple-compartments for describing the
turnover of red blood cells and carboplatin-induced anemia (20).

This model nicely illustrates the development of a more complex


model based on indirect mechanisms of drug action to simulta-
neously describe multiple in vivo processes.

3.4. Signal Transduction Models

Substantial time-delays in the observed pharmacodynamic response
may result from multiple time-dependent steps occurring between
drug–receptor binding and the ultimate pharmacological response.
A transit compartment approach can be utilized to describe a lag
between drug concentration and observed effects owing to time-
dependent signal transduction (21, 22). Assuming rapid receptor
binding, the following differential equation describes the rate of
change of the initial transit compartment (M1):
 
dM1/dt = (1/τ)·[(Emax·Cp)/(EC50 + Cp) − M1],  (13)
wherein the Emax model describes the drug–receptor interaction,
and τ is the mean transit time through this compartment.
Subsequent transit compartments may be added, and a general
equation for the ith compartment can be defined as
dMi/dt = (1/τ)·(Mi−1 − Mi).  (14)
Later compartments will show a clear delay in the onset of
response as well as substantial delays in achieving the maximum effect.
Model development for signal transduction systems typically includes
evaluating varied numbers of transit compartments and values for τ to
determine the combination that best describes the data.
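The delay introduced by each transit step (Eqs. 13 and 14) can be seen in a minimal Euler sketch; all parameter values here are illustrative:

```python
import math

def transit_peak(n, tau, k=0.12, c0=100.0, emax=100.0, ec50=15.0,
                 dt=0.01, t_end=120.0):
    """Integrate a chain of n transit compartments (Eqs. 13-14) and
    return the time of peak signal in the last compartment."""
    m = [0.0] * n
    t, peak_t, peak_m = 0.0, 0.0, 0.0
    for _ in range(int(t_end / dt)):
        cp = c0 * math.exp(-k * t)
        drive = emax * cp / (ec50 + cp)   # rapid-binding Emax drive (Eq. 13)
        prev = drive
        for i in range(n):
            old = m[i]                    # use the previous time step's value
            m[i] += dt * (prev - m[i]) / tau
            prev = old
        t += dt
        if m[-1] > peak_m:
            peak_m, peak_t = m[-1], t
    return peak_t

# Adding transit compartments delays the peak of the terminal signal
print(transit_peak(1, 4.0), transit_peak(4, 4.0))
```

Varying `n` and `tau` in this way mirrors the model-development strategy described above: the combination that best reproduces the observed onset and peak delay is retained.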
Chemotherapy-induced myelosuppression represents a classic
example of the use of a transit compartment modeling to describe
this adverse reaction to numerous chemotherapeutic agents (Fig. 7,
top panel). The structural model was proposed by Friberg and col-
leagues (23) to describe myelosuppression induced by irinotecan,
vinflunine (Fig. 7, bottom panel), and 2′-deoxy-2′-methylidenecytidine
for a range of dose levels and various dosing regimens. This same
structural model has been used to describe indisulam-induced mye-
losuppression (24), as well as the drug–drug interactions between
indisulam and capecitabine (25), and pemetrexed and BI2536 (26).

3.5. Irreversible Effect Models

A wide range of compounds, including anticancer drugs, antimicrobial drugs, and enzyme inhibitors, elicit irreversible effects.
A basic model for describing irreversible effects was developed by
Jusko and includes simple cell killing (27):
dR/dt = −k·C·R,  (15)
where R represents cells or receptors, C is either Cp or Ce, and k is a
second-order cell-kill rate constant. The initial condition for this

Fig. 7. Transit-compartment model of myelosuppression (top panel) including a proliferating progenitor pool (P), three
transit compartments (Mi), and a plasma neutrophil compartment (N). Drug effect is driven by plasma drug concentration
(Cp) and pharmacodynamic parameters (θ). An adaptive feedback function on the proliferation rate constant is governed by
the ratio of initial neutrophils to current neutrophil count, raised to a power coefficient (γ). The time-course of neutrophils
following vinflunine administration (arrows in bottom panel). Reprinted from ref. 23 with permission from the American
Society of Clinical Oncology.

equation is the initial number of cells present within the system


(R0), often represented as a survival fraction. This approach is only
applicable for non-proliferating cell populations, but may be
extended to incorporate cell growth (27):
dR/dt = ks·R − k·C·R,  (16)
with ks as an apparent first-order growth rate for proliferating cell
populations, such as malignant cells or bacteria. This growth rate
constant represents the net combination of natural growth and
degradation of the cellular population, and its initial estimate can
be determined from a control- or nondrug-treated cell population.
The model diagram and corresponding signature profiles are shown
in Fig. 8. The initial slope of the log survival fraction versus time
curve out to time, t, and the plasma drug AUC(0–t) can be used to
obtain an initial estimate for k (k = −ln SFt/AUC(0–t)), and the
initial condition for Eq. 16 is the total cell population at time zero.
In contrast to simple cell killing, the effect–time profiles are char-
acterized by an initial cell kill phase, followed by an exponential
growth phase, once drug concentrations are below an effective


Fig. 8. Structural model for irreversible effects (top panel) and signature profiles for
irreversible effect model with a proliferating cell population (bottom panel). Response
curves were simulated using Eq. 16 driven by drug concentrations following monoexponential
disposition: Cp = C0·e^(−kt). C0 was set to 0, 10, 100, or 1,000 units to achieve a
control population and increasing dose levels. Pharmacokinetic parameter was k = 0.12/h.
Pharmacodynamic parameters were k = 0.0005 units/h and ks = 0.03/h.

concentration (Fig. 8, bottom panel). Clearly the control group is


needed to properly characterize the exponential growth rate con-
stant in the untreated cell population.
The irreversible effect model can also be adapted to include the
turnover or production and loss of a biomarker:
dR/dt = kin − kout·R − k·C·R.  (17)
The initial condition for this model is the same as basic indirect
responses or kin/kout. The signature profiles for this model are
similar to the profiles for indirect response models I and IV
(Fig. 5). It is important to understand the mechanism of action of
the response that you are evaluating in order to determine which
model should be utilized.
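The kill-then-regrow signature of Eq. 16 and the graphical initial estimate of k can be sketched as follows, using the illustrative parameter values from Fig. 8:

```python
import math

def cell_kill(c0, k_pk=0.12, kkill=0.0005, ks=0.03, dt=0.01, t_end=80.0):
    """Integrate dR/dt = ks*R - k*Cp*R (Eq. 16) for the survival fraction;
    returns (final survival fraction, nadir)."""
    r, t, rmin = 1.0, 0.0, 1.0
    for _ in range(int(t_end / dt)):
        cp = c0 * math.exp(-k_pk * t)
        r += dt * (ks - kkill * cp) * r
        t += dt
        rmin = min(rmin, r)
    return r, rmin

control, _ = cell_kill(0.0)          # untreated: pure exponential growth
treated, nadir = cell_kill(1000.0)   # kill phase, then regrowth after washout
print(control, treated, nadir)

# Graphical initial estimate over the early decline: k ~ -ln(SF_t)/AUC(0-t)
t_early = 5.0
sf5, _ = cell_kill(1000.0, t_end=t_early)
auc5 = 1000.0 / 0.12 * (1.0 - math.exp(-0.12 * t_early))
k_init = -math.log(sf5) / auc5
print(k_init)
```

The recovered `k_init` is slightly below the simulated kkill because the early growth term ks·R inflates the apparent survival, which is why the estimate is only used to initialize the fit.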
Irreversible effect models are commonly used to describe the
cell killing action of chemotherapeutic agents and anti-infectives.
This model was also applied to evaluate the formation of methe-
moglobin following the administration of a range of antimalarial
agents (28). The final model characterized methemoglobin pro-
duction resulting from the formation of an active drug metabolite.

3.6. More Complex Models

The main focus of this chapter was the introduction of commonly
used mechanistic pharmacodynamic models that can be readily
applied to toxicokinetics and dynamics. However, a number of
mechanistic processes may be required to adequately describe the

drug–system interactions under investigation. Slow receptor binding,
tolerance phenomena, drug interactions, opposing drug
effects, and disease progression may add additional complexities
to the analysis of toxicodynamic data. For example, Houze and
colleagues (29) evaluated paraoxon-induced respiratory toxicity
and its reversal with pralidoxime (PRX) administration in rats via
the combination of multiple pharmacodynamic modeling compo-
nents. Initially, the time-course of paraoxon inactivation of in vitro
whole blood cholinesterase (WBChE) was modeled based on
enzyme inactivation:
 
dEA/dt = −[k·CPO/(EC50,PO + CPO)]·EA + kr·EI,  (18)
where EA is active enzyme, k is the maximal rate constant of enzyme
inactivation, CPO is paraoxon concentration, EC50,PO is the con-
centration of paraoxon that produces 50% of k, kr is a first-order
reactivation rate constant, and EI is the inactive enzyme pool. The
rate of change of the inactive enzyme (EI) was defined as
 
dEI/dt = [k·CPO/(EC50,PO + CPO)]·EA − (kr + kage)·EI,  (19)
where kage is a first-order rate constant of aging of inactive enzyme.
The reactivation of this in vitro system by PRX was modeled as an
indirect response, and Eqs. 18 and 19 were updated accordingly:
 
dEA/dt = −[k·CPO/(EC50,PO + CPO)]·EA
         + kr·[1 + (Emax·CPRX^h)/(EC50,PRX^h + CPRX^h)]·EI,  (20)

dEI/dt = [k·CPO/(EC50,PO + CPO)]·EA
         − kr·[1 + (Emax·CPRX^h)/(EC50,PRX^h + CPRX^h)]·EI − kage·EI.  (21)

Interestingly, the estimated potency of PRX was in agreement


with an empirical literature estimate. For the in vivo dynamics, a
fixed pharmacokinetic function for PRX was introduced, and an
empirical function was used to describe paraoxon-induced enzyme
inactivation, as plasma concentrations were unavailable. The esti-
mated parameters from the in vitro analysis were fixed (not identi-
fiable from in vivo data only), and the toxicodynamic biomarker,
expiratory time (TE), was linked to apparent active enzyme (EA)
according to the following nonlinear transfer function:

TE = TE0 + [Emax,TE·(E0/EA − 1)^n] / [E50^n + (E0/EA − 1)^n],  (22)

where TE0 is the baseline expiratory time, Emax,TE is the maximal


increase in TE, E0 is the baseline active enzyme (1 or 100%), E50 is
the corrected enzyme ratio resulting in 50% of Emax,TE, and n is a
sigmoidicity coefficient. Expiratory profiles and the transient anti-
dotal effect of PRX were described well, and this analysis highlights
the integration of several basic modeling approaches described in
this chapter. Further, the coupling of in vitro enzyme and in vivo
toxicodynamic data demonstrates the versatility and multi-scale
nature of the model.
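The coupled active/inactive enzyme pools of Eqs. 20 and 21 can be explored with a toy simulation. All parameter values below are invented for illustration and are not the estimates reported in ref. 29; paraoxon and PRX concentrations are held constant for simplicity:

```python
import math

def active_enzyme(c_prx, c_po=5.0, kmax=2.0, ec50_po=1.0, kr=0.1,
                  kage=0.05, emax=5.0, ec50_prx=10.0, h=1.0,
                  dt=0.001, t_end=24.0):
    """Integrate Eqs. 20-21 with constant paraoxon (c_po) and pralidoxime
    (c_prx) concentrations; returns the active enzyme fraction EA at t_end."""
    ea, ei = 1.0, 0.0
    inact = kmax * c_po / (ec50_po + c_po)   # paraoxon inactivation rate
    react = kr * (1.0 + emax * c_prx**h / (ec50_prx**h + c_prx**h))
    for _ in range(int(t_end / dt)):
        dea = -inact * ea + react * ei
        dei = inact * ea - react * ei - kage * ei   # aging is irreversible
        ea += dt * dea
        ei += dt * dei
    return ea

# PRX stimulates reactivation (kr), so more active enzyme survives and
# less is lost to aging
print(active_enzyme(c_prx=50.0), active_enzyme(c_prx=0.0))
```

With PRX present, enzyme spends less time in the inactive pool, so the irreversible aging loss (kage·EI) removes less total enzyme — the mechanism behind the transient antidotal effect described above.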
An additional theoretical example of mechanism-based analysis
of drug interactions was presented by Earp and colleagues (30),
who examined drug interactions utilizing indirect response models.
These more complex models typically consider multiple pharmaco-
dynamic endpoints which require individual data sets and stepwise
analysis for each endpoint. A corticosteroid model which considers
mRNA dynamics of the glucocorticoid receptor and hepatic tyro-
sine aminotransferase mRNA and activity is an example of simulta-
neously characterizing multiple pharmacodynamic endpoints using
an integration of basic modeling components (31).
The majority of mechanism-based pharmacodynamic models
describe continuous physiological response variables. However,
models are available for evaluating noncontinuous outcomes,
such as the probability of a specific event occurring. Such responses
are often more clinically relevant, and more research is needed to
combine continuous mechanistic PK/PD models with clinical out-
comes data. One example is the prediction of enoxaparin-induced
bleeding events in patients undergoing various therapeutic dosing
regimens (32). A population proportional-odds model was devel-
oped to predict the severity of bleeding event on an ordinal scale of
1–3 (32).

4. Prospectus

The future of mechanism-based pharmacodynamic modeling for


both therapeutic and adverse drug responses is promising for
model-based drug development and therapeutics, and many of
the basic modeling concepts in this chapter will likely continue
to represent key building components in more complex systems
models. A diverse array of models is available with a minimal number
of identifiable parameters to mimic mechanisms and the time-course
of therapeutic and adverse drug effects. However, new
21 Mechanism-Based Pharmacodynamic Modeling 599

methodologies will be needed to evolve these models further into


translational platforms and prospectively predictive models of drug
efficacy and safety. Network-based systems pharmacology models
have shown utility for understanding drug-induced adverse events
(1). Further research is needed to identify practical techniques for
bridging systems pharmacology and in vivo PK/PD models to
anticipate the clinical utility of new chemical entities from first
principles.

Acknowledgments

The authors thank Dr. William J. Jusko (University at Buffalo,


SUNY) for reviewing this chapter and providing insightful feed-
back. This work was supported by Grant No. GM57980 from the
National Institute of General Medical Sciences, Grant No. DA023223
from the National Institute on Drug Abuse, and Hoffmann-La
Roche Inc.

References

1. Berger SI, Iyengar R (2011) Role of systems pharmacology in understanding drug adverse events. Wiley Interdiscip Rev Syst Biol Med 3:129–135
2. Levy G (1964) Relationship between elimination rate of drugs and rate of decline of their pharmacologic effects. J Pharm Sci 53:342–343
3. Levy G (1966) Kinetics of pharmacologic effects. Clin Pharmacol Ther 7:362–372
4. Mager DE, Wyska E, Jusko WJ (2003) Diversity of mechanism-based pharmacodynamic models. Drug Metab Dispos 31:510–518
5. Yates FE (1975) On the mathematical modeling of biological systems: a qualified 'pro'. In: Vernberg FJ (ed) Physiological adaptation to the environment. Intext Educational Publishers, New York
6. Wagner JG (1968) Kinetics of pharmacologic response. I. Proposed relationships between response and drug concentration in the intact animal and man. J Theor Biol 20:173–201
7. Friberg LE, Isbister GK, Hackett LP, Duffull SB (2005) The population pharmacokinetics of citalopram after deliberate self-poisoning: a Bayesian approach. J Pharmacokinet Pharmacodyn 32:571–605
8. Minematsu T, Ohtani H, Yamada Y, Sawada Y, Sato H, Iga T (2001) Quantitative relationship between myocardial concentration of tacrolimus and QT prolongation in guinea pigs: pharmacokinetic/pharmacodynamic model incorporating a site of adverse effect. J Pharmacokinet Pharmacodyn 28:533–554
9. Laizure SC, Parker RB (2009) Pharmacodynamic evaluation of the cardiovascular effects after the coadministration of cocaine and ethanol. Drug Metab Dispos 37:310–314
10. Vage C, Saab N, Woster PM, Svensson CK (1994) Dapsone-induced hematologic toxicity: comparison of the methemoglobin-forming ability of hydroxylamine metabolites of dapsone in rat and human blood. Toxicol Appl Pharmacol 129:309–316
11. Furchgott RF (1955) The pharmacology of vascular smooth muscle. Pharmacol Rev 7:183–265
12. Sheiner LB, Stanski DR, Vozeh S, Miller RD, Ham J (1979) Simultaneous modeling of pharmacokinetics and pharmacodynamics: application to d-tubocurarine. Clin Pharmacol Ther 25:358–371
13. Yassen A, Kan J, Olofsen E, Suidgeest E, Dahan A, Danhof M (2007) Pharmacokinetic-pharmacodynamic modeling of the respiratory depressant effect of norbuprenorphine in rats. J Pharmacol Exp Ther 321:598–607
14. Stroh M, Addy C, Wu Y, Stoch SA, Pourkavoos N, Groff M, Xu Y, Wagner J, Gottesdiener K, Shadle C, Wang H, Manser K, Winchell GA, Stone JA (2009) Model-based decision making in early clinical development: minimizing the impact of a blood pressure adverse event. AAPS J 11:99–108
15. Nagashima R, O'Reilly RA, Levy G (1969) Kinetics of pharmacologic effects in man: the anticoagulant action of warfarin. Clin Pharmacol Ther 10:22–35
16. Dayneka NL, Garg V, Jusko WJ (1993) Comparison of four basic models of indirect pharmacodynamic responses. J Pharmacokinet Biopharm 21:457–478
17. Jusko WJ, Ko HC (1994) Physiologic indirect response models characterize diverse types of pharmacodynamic effects. Clin Pharmacol Ther 56:406–419
18. Sharma A, Jusko WJ (1998) Characteristics of indirect pharmacodynamic models and applications to clinical drug responses. Br J Clin Pharmacol 45:229–239
19. Sharma A, Ebling WF, Jusko WJ (1998) Precursor-dependent indirect pharmacodynamic response model for tolerance and rebound phenomena. J Pharm Sci 87:1577–1584
20. Woo S, Krzyzanski W, Jusko WJ (2008) Pharmacodynamic model for chemotherapy-induced anemia in rats. Cancer Chemother Pharmacol 62:123–133
21. Sun YN, Jusko WJ (1998) Transit compartments versus gamma distribution function to model signal transduction processes in pharmacodynamics. J Pharm Sci 87:732–737
22. Mager DE, Jusko WJ (2001) Pharmacodynamic modeling of time-dependent transduction systems. Clin Pharmacol Ther 70:210–216
23. Friberg LE, Henningsson A, Maas H, Nguyen L, Karlsson MO (2002) Model of chemotherapy-induced myelosuppression with parameter consistency across drugs. J Clin Oncol 20:4713–4721
24. Zandvliet AS, Schellens JH, Copalu W, Beijnen JH, Huitema AD (2009) Covariate-based dose individualization of the cytotoxic drug indisulam to reduce the risk of severe myelosuppression. J Pharmacokinet Pharmacodyn 36:39–62
25. Zandvliet AS, Siegel-Lakhai WS, Beijnen JH, Copalu W, Etienne-Grimaldi MC, Milano G, Schellens JH, Huitema AD (2008) PK/PD model of indisulam and capecitabine: interaction causes excessive myelosuppression. Clin Pharmacol Ther 83:829–839
26. Soto E, Staab A, Freiwald M, Munzert G, Fritsch H, Doge C, Troconiz IF (2010) Prediction of neutropenia-related effects of a new combination therapy with the anticancer drugs BI 2536 (a Plk1 inhibitor) and pemetrexed. Clin Pharmacol Ther 88:660–667
27. Jusko WJ (1971) Pharmacodynamics of chemotherapeutic effects: dose-time-response relationships for phase-nonspecific agents. J Pharm Sci 60:892–895
28. Fasanmade AA, Jusko WJ (1995) An improved pharmacodynamic model for formation of methemoglobin by antimalarial drugs. Drug Metab Dispos 23:573–576
29. Houze P, Mager DE, Risede P, Baud FJ (2010) Pharmacokinetics and toxicodynamics of pralidoxime effects on paraoxon-induced respiratory toxicity. Toxicol Sci 116:660–672
30. Earp J, Krzyzanski W, Chakraborty A, Zamacona MK, Jusko WJ (2004) Assessment of drug interactions relevant to pharmacodynamic indirect response models. J Pharmacokinet Pharmacodyn 31:345–380
31. Hazra A, Pyszczynski N, DuBois DC, Almon RR, Jusko WJ (2007) Modeling receptor/gene-mediated effects of corticosteroids on hepatic tyrosine aminotransferase dynamics in rats: dual regulation by endogenous and exogenous corticosteroids. J Pharmacokinet Pharmacodyn 34:643–667
32. Barras MA, Duffull SB, Atherton JJ, Green B (2009) Modelling the occurrence and severity of enoxaparin-induced bleeding and bruising events. Br J Clin Pharmacol 68:700–711
33. Jusko WJ, Ko HC, Ebling WF (1995) Convergence of direct and indirect pharmacodynamic response models. J Pharmacokinet Biopharm 23:5–8
INDEX

A Antifungal ...................................................................... 208


Antihistaminic ............................................................... 345
Absorption Antimalarial ....................................................36, 208, 588
brain ........................................................................... 29 Antimicrobial................................................................. 586
drug ........................................ 77, 284–286, 335, 337
Antioxidant.................................................................... 207
oral .........................74, 306, 335, 339, 458, 552, 554 Apoptosis ....................................................................... 222
passive ...................................................................... 284 Aquatic toxicity ............................................................. 120
rate ................................................300, 301, 307, 308,
Artificial
317, 378, 458, 494 neural networks .........................................89, 97, 111,
ACD/Labs ............................................... 79, 84, 98, 101, 178, 337–339, 341
102, 104, 105, 109, 110, 112, 113
Aryl hydrocarbon receptor, AhR.................................. 240
Acetaminophen ...................................282, 292–294, 306 Aspirin..................................................296, 386, 399–407
Acetylcholine ........................................................ 257–261 Assessment
Acetylcholinesterase .......................................................... 4
environmental.............................................1, 154, 162
Acetyltransferases ................................294, 552, 553, 555 exposure................................................ 522, 517, 518,
acslX .............................82, 444–446, 440, 474, 475, 523 524, 533–538, 540, 543–546, 552, 566–567
Adipose ................................................................. 447, 519 risk....................................................... 2, 11, 137, 154,
ADME, ADMET
183, 283, 439, 442, 443, 439, 440, 455, 456,
evaluation.............................................................28, 80 464, 473, 482, 494, 501, 507, 513–517, 521,
parameters ..................................... 143, 331, 335–343 522, 524–527, 530, 532, 537–547, 549, 552,
pharmacokinetic ............................................. 337–347
553, 555, 561–562, 567–569
prediction.....................................79, 80, 91, 334–343 AutoDock .........................................................31, 77, 153
profiling .......................................................... 329, 343
suite............................................................................ 80 B
ADMEWORKS .........................................................93, 95
Agent Bayesian
based .......................................................................... 85 classification ........................................... 217, 341, 355
Albumin ........................................................ 74, 135, 153, inference ............................... 520, 523, 524, 551, 560
339, 342, 408, 428, 434 model .......................................................24, 217, 355,
Alcohol.......................................................... 36, 108, 120, 359, 361, 532, 533, 550, 551
208, 294, 296, 508, 552, 553 statistics.................................................................... 356
Alignment Benchmark dose ......................................... 523, 527, 532,
methods ......................................................22, 74, 356 539, 560, 561
molecules .......................................................... 28, 356 Benzene ................................................................ 181, 503
Allometric ................................................... 340, 443, 451, Berkeley Madonna ....................... 82, 383, 444–448, 524
464, 465, 469, 478, 494, 495, 497, 505 Biliary excretion ..................................343, 389, 468, 469
AlogP ........................................................... 334, 355, 357 Binary
Alprazolam............................................................ 318, 319 classification ............................................................. 341
Amber ............................32, 77, 144, 243–245, 247, 254 nearest neighbor............................................. 338–341
AMPAC ........................................................... 78 QSAR.......................................................... 355
Androgen receptor ...................................... 238, 240, 352 Binding
ANOVA ................................................... 526, 539 affinity ............................................146, 149, 333, 462
Antibacterial .................................................................. 208 domain ..................................................................... 352
Antibiotics ...........................................201, 208, 354, 357 energy .................................................................. 32–35
Antibody ...........................................................73, 82, 209 ligand ...................................... 28, 141, 241, 252, 352
Anticancer...................................................................... 586 model ....................................................................... 461

Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2, # Springer Science+Business Media, LLC 2012

601
602 COMPUTATIONAL TOXICOLOGY
Index
Binding (cont.) receptor...................................................................... 14
plasma ...................................................................... 153 signaling..................................................................... 14
site ....................................................... 31, 75, 77, 149, CellML.................................................................. 392, 395
154, 155, 241, 243, 261, 295, 352, 354, 356, CellNetOptimizer ........................................................... 84
358, 361–363, 386, 387 Cellular
tissue ............................................................... 288, 461 compartments................................................. 388, 426
Bioaccumulation ........................................................... 389 systems ..................................................................... 426
Bioavailability ........................................... 25, 28, 29, 285, Cephalosporins....................................338, 340, 343–344
331, 333, 335, 337, 338, 340–343, 377, 389 CHARMM ................................................. 144, 243–245,
Bioconductor................................................................. 253 247, 254, 262, 266
Biomarker ...............................................14, 23, 522, 528, Chemaxon .......................................................78, 95, 104,
550, 556, 558, 560, 562, 568, 576, 577, 166, 170, 176, 177, 200
ChEMBL ..............................218, 219, 220, 221
BioSolveIT.................................................................77, 82 Chemical Entities of Biological Interest (ChEBI)
Bond identifier................................................................... 205
acceptor...........................................88, 100, 138, 152, ontology .................................................191, 200–203
177, 334, 353–356 web.................................................................. 205, 209
breaking ................................................. 139, 254, 335 Cheminformatics.............................................73, 82, 191,
contribution ............................................................ 113 193, 223, 224, 238
contribution method ..................................... 113, 115 ChemOffice ...........................................95, 107, 110, 115
donor .......................................................88, 100, 108, Chemometrics ............................................................... 220
111, 114, 138, 152, 177, 334, 353–355 Chemotherapeutic...............................293, 444, 586, 588
Bone............................................................. 453, 519, 555 ChemProp .......................................................94, 95, 104,
Boolean ........................................................ 194, 195, 197 107, 110, 112, 115
Bossa ChemSpider....................................................91, 191, 223
predictivity ...................................................... 122, 199 Classification
Brain barrier models................................... 338–340, 344, 355, 356
penetration .......................29, 80, 180, 331, 340, 341 molecular ....................................................13, 72, 146
permeability .............................. 29, 338–341 QSAR................................................................ 216, 355
tree .................................................334, 339, 344, 345
C Clearance
drug ...................................................... 237, 240, 293,
Caco-2
cells........................................................................... 335 295, 296, 339, 343, 508
permeability ........................................... 321, 322, 331 metabolite ................................................................ 237
model .............................................................. 297, 380
Cadmium .................................................... 211, 517, 518,
522, 542, 546, 551, 555–562 process ..................................................................... 411
Caffeine.............................................................89, 90, 293 rate .......................................... 24, 296, 342, 411, 440
ClogP ............................................................................. 181
Calcium.......................................................................... 258
Cancer Clonal growth ...................................................... 8, 14–15
risk................................................... 516, 524
analysis ........................................... 175–176, 339, 341
risk assessment................................................ 455, 456
Capillary...................... 388, 389, 425–432, 434 PBS..................................263, 264, 272, 273
Carbamazepine............................................ 286, 287, 293 Cmax.............................................................................. 554
Carcinogen, carcinogenic CODESSA .........................................................78, 92, 93,
100, 106, 108, 109, 111, 114
activity...................................................................... 455
effects ....................................................................... 455 Combinatorial chemistry ....................................... 24, 330
potency ........................................................................ 4 CoMFA. See Comparative Molecular Field
Analysis (CoMFA)
potential................................................................... 523
Cardiovascular ............................................. 209, 391, 580 Comparative Molecular Field Analysis
Cell (CoMFA) ........................... 25, 26, 138, 339, 356
Compartmental
cycle .................................................................. 48, 353
growth ................................................. 51, 60, 67, 587 absorption........................................77, 308, 309, 334
membrane......................................135, 386, 388, 427 analysis .................................................. 384, 385, 390,
permeability ............................................................... 12 400, 410, 436, 518–524
model .....................................................290, 299–302, properties................................................................... 91
369, 391–436, 440, 441, 518, 521, 523, QSAR........................................ 27, 28, 120, 193, 332
524, 525, 526 QSPR .........................................................88, 91, 113,
systems .................................. 384, 385, 387, 391, 435 117, 120, 332
CoMSIA .........................................................25, 138, 356 topological ..................... 97, 100, 108, 111, 178, 345
Conformational Developmental
dynamics ................................................ 241, 250, 252 effects ....................................................................... 456
energetic .................................................................. 139 toxicants................................................................... 134
search ........................................................31, 143, 144 toxicity ....................................................................... 84
space........................................ 33, 138, 166, 361, 362 Diazepam.............................................241, 242, 293, 509
Connectivity Dibenzofurans ............................................................... 444
index ............................................................... 344, 345 Dichlorodiphenyltrichloroethane
topochemical .................................................. 344, 345 (DDT)............................................................... 183
Consensus Dietary
predictions ..................................................89, 90, 331 exposure................................................ 523, 533, 534,
score .................................................32, 175, 363, 364 539, 543, 550, 562
COPASI ......................................................................... 390 exposure assessment....................................... 533, 534
COSMOS ........................................................................ 98 intake .............................................534, 535, 561, 562
Covariance ..................................................................... 471 Differential equations ....................................... 50, 52–54,
matrix .............................................................. 251, 392 57–62, 65, 67, 71, 82, 300, 301, 311, 385, 388,
CRAN ............................................................ 525 396, 410, 413, 431, 439, 440, 442, 445, 452,
Crystal structure..............................................22, 99, 144, 474, 475, 478, 479, 521–523, 548, 567, 577,
146, 148, 149, 153, 155, 192, 240, 242, 246, 583, 586
255, 354, 357–359, 362–364 Dioxin ......................................................... 101, 444, 456,
Cytochrome 518, 552, 562–566
metabolism ....................................285, 293, 316, 352 Discriminant analysis..................................................... 339
substrate................................ 241, 242, 255, 285, 293 DNA
Cytoscape......................................................................... 85 binding......................................................75, 352, 516
Cytotoxicity ................................................. 141, 215, 216 Docking
methods ............................31, 37, 156, 354, 355, 358
D molecular .............................................. 135, 136, 140,
143, 147–150, 153, 240, 358, 362–363
Database
chemical ................................................ 191, 193, 195, scoring ............................................32, 147, 154, 354,
206, 210, 211 358, 359, 364
simulations............................................................... 149
forest ............................................................... 101, 355
KEGG ............................................................... 23, 181 tools .................................................................. 77, 246
network.................................................................... 164 Dose
administered ..................................306, 440, 441, 457
search ..................................22, 29, 74, 166, 168, 195
software................................................................73, 78 extrapolation......................................... 443, 494, 496,
support.............................................................. 73, 191 521, 532, 540
metric .................................... 440, 457, 505, 507, 509
tree ........................................................................... 345
Decision ...........................................................83, 84, 103, pharmacokinetic ............................................... 81, 505
136, 183, 229, 339–341, 343, 344, 345, reference ........................................ 516, 527–533, 537
449, 513 response ................................................ 5, 12, 14, 456,
471, 517, 518, 523, 525–532, 538–541, 546,
Derek ............................................................................... 83
Dermal .................................................................. 135, 460 550, 558–562
Descriptors Dosimetry ..................................................................11–13
3D QSAR ............................................... 26–29, 135–141,
chemical ....................................................97, 101, 108
models............................................................. 113, 120 144, 150–153, 339
molecular ..................................................... 25, 29, 30, Dragon..................................................... 78, 92, 106, 177
Dragon descriptors.................................. 78, 92, 106, 177
80, 117, 138, 177, 178, 216, 332, 335, 343, 355,
361, 362, 363, 364 Drug
physicochemical...............................97, 106, 111, 113 absorption............................. 286, 335, 337, 339, 340
prediction................................................................... 97 binding..................................................................... 237
Drug (cont.) EPISuite...................................................... 78, 95, 96, 98,
clearance ............................................... 237, 240, 293, 101, 102, 107, 108, 110, 112, 113, 115, 183
295, 296, 339, 343, 508 Epoxide................................................................. 180, 456
databases ......................................................... 178, 210 Estrogenic.......................................................................... 4
development ............................................2, 8, 80, 240, Ethanol ........................................................ 293, 294, 296
330, 342, 346, 523, 590 Excel................................................................................. 81
distribution ........................... 283, 341, 384, 388, 582 Expert systems................................................................. 81
drug interactions .................................. 216, 225, 311, assessment............................................. 517, 518, 522,
331, 343, 352, 356, 577, 586, 589, 590 assessment............................................. 522, 517, 518,
induced toxicity ....................................................... 238 524, 533–538, 540, 543–545, 552, 566–567
metabolism ....................................... 28, 30, 216, 225, dose .......................................... 11, 12, 310, 314, 315,
237, 240, 291, 293, 343, 360 323, 534
plasma .......................... 289, 294, 578, 579, 581, 587 hazard ...................................................................... 219
receptor.................................................. 577, 582, 586 level .................................................... 4, 505, 539, 561
resistance.................................................................... 48 model ................................................ 10, 11, 523, 519,
safety ............................................................. 2, 11, 240 533–537, 550–551, 567
solubility .................................................................. 101 population ...................................................... 532–566
targets ................................... 144, 147, 192, 576, 577 response ................................................................... 576
DSSTox................................................154, 218, 219, 223 route....................................................... 439, 494, 521
Dynamical systems ........................................................ 390 scenario ...........................................11, 443, 472, 494,
521, 526, 533
E
F
Ecological risk ............................................................... 570
Ecotoxicity, ecotoxicology............................................ 183 Fat
Effectors............................... 14, 53, 57–63, 65, 506, 576 compartment .................................................. 447, 448
Electrotopological ................................................ 101, 109 tissue ..................................................... 442, 447, 448,
Elimination 451, 463, 476, 478, 481
chemical .......................................................... 459, 468 Fate and transport ....................................................... 9–10
drug ...............................................290, 291, 294, 300 Fingerprint................................................ 26, 27, 97, 142,
model ....................................................................... 297 147, 168, 177, 181, 196–199, 226, 341, 355,
process .................................. 389, 522, 534, 536, 549 356, 361, 362, 364
rate ........................................ 296, 300, 373, 522, 584 Food
Emax model ...................................................579–582, 586 additives ...............................................................2, 516
Endocrine consumption data........................................... 543–545
disruptors................................................................. 240 intake ............................................................... 10, 313,
system ...................................................................... 353 561, 562
Ensemble methods........................................................ 255 safety ........................................................................ 522
Enterocytes ..........................................306, 307, 309, 324 Force field ................................................... 13, 32, 34–36,
Environmental 136, 138–140, 144, 146, 147, 149, 243–247,
agents ......................................................................... 24 254, 261, 262, 264, 265, 270, 273
contaminants .................................353, 444, 463, 553 Forest
fate ..................................................... 13, 78, 183, 219 decision tree............................................................. 340
indicators ................................................................... 12 Formaldehyde................................................................ 162
pollutants ..................................................................... 2 Fortran ...................................................... 79, 82, 84, 244,
protection ................................. 2, 107, 154, 182, 361 390–392, 396, 444, 476, 478
toxicity .................................... 88, 162, 179, 183, 240 Fractal ................................................................... 338, 495
Enzyme Fragment based .................................................... 102, 168
complex........................................................... 209, 402 Functional
cytochrome ..............................................27, 241–242, groups ......................................................36, 136, 138,
285, 293, 316, 352, 482, 508 149, 166, 168, 171, 178, 180–182, 292
metabolites ..................................................... 293, 309 theory....................................................................... 140
receptors ................................................ 139, 238, 239 units ......................................................................... 440
substrates ........................................................ 285, 293 Fuzzy
transporters..................................................... 310, 322 adaptive .................................................................... 337
G I
Gastrointestinal ................................................... 209, 285, Immune
286, 291, 300, 306–308, 313, 316, 321, 335, cells.................................................. 52, 53, 57–60, 63,
378, 494, 505, 522, 556 65, 67, 69, 70
GastroPlus .........................................................24, 77, 80, model ................................................................... 60–62
308–311, 316, 317, 324, 336, 445, 523, 524 response ....................................................... 48, 60, 63,
Gene, genetic 65, 69, 70
algorithms........................................... 21, 31, 91, 106, Immunotoxicity............................................................. 237
114, 166, 340, 362 InChI ................................................. 119, 164, 165, 172,
expression networks .................................................. 14 174, 189, 190, 193, 196, 200, 206, 210
function .......................................................... 337, 340 Inducers ................................................................ 293, 508
networks .................................................................. 216 Information
neural networks ....................................................... 106 systems ............................................................ 163, 183
profiling ................................................................... 141 Ingestion...................................................... 285, 457, 533
regulatory systems..................................................... 84 Inhalation ................................................... 135, 308, 316,
Genotoxicity ..............................................................5, 216 386, 441, 442, 449, 459–460, 477, 479, 484,
Glomerular .................................................. 207, 294, 389 485, 494–496, 505, 521, 533
Glucuronidation ......................................... 292, 343, 517, Integrated risk information system (IRIS) .................. 218
552, 553, 388, 389 Interaction
Glutathione ................................ 293, 294, 455, 482–485 energy ....................................................32, 33, 36–37,
Graph 139, 142, 243, 249
model ......................................................................... 54 fields ..................................................... 78, 80, 81, 103
theory........................................................12, 162, 166 model .............................................................. 140, 143
network................................................................19, 75
H rules............................................................................ 32
Hazard Inter-individual variability ......................... 501, 521, 525,
assessment................................................................ 522 527, 528, 539, 559, 560
Interspecies
characterisation.............522, 518–533, 538–540, 542
identification......................... 522, 518–533, 538, 539 differences................................................................ 516
HazardExpert .................................................................. 83 extrapolation..........................................442, 493–509,
520, 568
Hepatic
clearance ............................................... 135, 295, 341, Intestinal
343, 409, 411, 412, 422, 423 absorption..................................................29, 80, 308,
331, 333, 335, 337, 341, 458
metabolism ..................................................... 289, 292
Hepatotoxic ................................................................... 217 permeability ............................................................. 335
hERG tract .......................................................................... 322
blockers.................................................................... 240
J
channel................................................... 135, 181, 243
potassium channels .............. 238, 240, 242–243, 257 Java......................................... 82, 85, 176, 193, 205, 391
Hierarchical JSim.............................................. 82, 388, 390–397, 411,
clustering ................................................................. 540 422, 423, 430
models................................... 517, 540, 548, 550, 560
HIV .................................................................................. 22 K
Homeostasis .................................................................. 352 Ketoconazole............................................... 293, 359, 360
Homology ..................................................................... 148 Kidney
models....................30, 148, 149, 240, 242, 243, 258
cortex ....................................................................... 556
Hormone k-nearest neighbor ............................................... 338–341
binding..................................................................... 153 KNIME.......................................................................... 156
receptor..........................................238, 240, 351, 352 Kow (octanol-water partition
hPXR
coefficient) ............................................. 88, 94–99
activation ........................................................ 354–356 Kyoto encyclopedia of genes and
agonists .......................................... 353, 355–359, 362 genomes (KEGG)
antagonists ..............................................353, 359–360
pathway ...................................................................... 76
L Maximum likelihood estimation
(MLE) ............................341, 523, 541, 550, 551
Langevin .....................................250, 259, 266, 271, 274 MCMC. See Markov chain Monte Carlo (MCMC)
Leadscope ............................................. 136, 218 MCSim.......................................................... 35, 439, 445,
Least squares......................................................56, 57, 79, 523, 524, 549, 550
106, 111, 138, 337–341, 425, 465, 466 Mercury ................................................................ 207, 211
Ligand Meta-analysis .............................................. 530, 539–543,
binding.................................... 28, 141, 241, 252, 352 547, 551–555, 558
complex................................. 32, 34, 77, 82, 246, 363 MetabolExpert ....................................................... 80, 336
interactions ........................................ 31–33, 147, 360 Metabolism
library..................................................... 30, 31, 33, 37 (bio)activation ........................ 24, 179, 352, 360, 387
receptor...........................................34, 135, 240, 353, drug ...........................................................28, 30, 216,
357, 363 225, 237, 240, 291, 293, 343, 360
screening .................................................................. 329 liver ..........................................................12, 216, 285,
Likelihood 286, 291, 293, 309, 343, 386, 441, 442, 443,
functions .................................................................. 548 454, 457, 468, 478, 499–502, 507–509
method .................................................................... 251 prediction..................................................... 12, 28, 30,
ratio .......................................................................... 548 81, 84, 87, 225, 226, 309, 333, 335, 342, 343,
Linear algebra ....................................................... 385, 388 443, 445, 450, 451, 465, 470, 483, 486, 501,
Linear discriminant analysis .......................................... 339 509, 518, 546
Lipinski, C.A. ......................................175, 333, 334, 342 rate ........................................................ 445, 455, 478,
Lipophilic...................................................... 94, 240, 295, 483, 499–501
447, 448, 452, 456, 465, 520 Metabolomics/metabonomics .................................3, 568
Liver MetaCore..........................................................81, 85, 336
enzyme..................................................................... 241 MetaDrug ............................................................... 81, 336
injury........................................................................ 216 Metal ............................................... 32, 75, 138, 152, 207
microsomes....................................................... 12, 509 METAPC ................................................................ 81, 338
tissue ..............................................343, 476, 481, 498 Metasite ..........................................................81, 226, 336
Logistic Meteor ..............................................................81, 84, 336
growth ................................................... 55, 56, 60, 67 Methanol ....................................................................... 296
regression................................................................. 339 Methemoglobin ................................................... 581, 588
Lognormal .................................................. 521, 529, 531, Methotrexate .............................................. 290, 444, 448,
547, 553, 554, 559, 566 449, 453, 455, 459
Log P .................................................................25, 29, 30, Metyrapone ................................................. 241, 242, 255
74, 88, 91, 94–100, 117, 118, 144, 309, 316, MEXAlert ............................................................... 81, 336
322, 334, 345, 362 Michaelis-Menten equation................................. 388, 495
Lowest observed adverse effect level Microglobulinuria ........................................558–560, 562
(LOAEL) .................................. 84, 527–533, 538 Microsomes .................................................. 12, 286, 331,
Lungs ..................................................288, 291, 409–412, 498, 499, 502, 508, 509, 552
422, 440, 442, 448, 449, 455, 456, 459, 498, Milk....................................................................... 536, 550
499, 502, 519, 541 Minitab ......................................................................79, 93
Missing data.......................................................... 314, 541
M
MLE. See Maximum likelihood estimation (MLE)
Madonna (Berkeley-Madonna) ............................ 82, 383, Modeling
395, 444–448, 524 homology..................................................... 21–23, 30,
Malarial ............................................................................ 22 135, 142, 143, 146, 148, 153, 240, 245, 246,
Mammillary .........................................376, 387, 409, 410 258, 259, 262
Markov chain Monte Carlo molecular ....................................................... 4, 12, 13,
(MCMC) ....................................... 439, 523, 524, 73, 79, 82, 95, 107, 110, 112, 134–137,
550, 551, 560 140–148, 151–152, 154, 568
Markup language .......................................................... 174 in vitro ......................................................... 12, 23, 29,
Mathematica ................................................ 391, 523, 524 140, 141, 216, 231, 308, 312, 313, 321, 323,
Matlab................................................................56, 79, 82, 331, 332, 334, 356, 360, 443, 494, 500, 509,
84, 93, 388, 391, 396, 444–449, 523, 524, 531 589, 590
COMPUTATIONAL TOXICOLOGY
Index 607
Models MOLPRO................................................ 96, 98, 101, 102
animal..........................................................1, 353, 495 Monte Carlo simulation .................................33, 71, 439,
biological activity.......................................25, 28, 138, 472, 520, 524, 526, 533, 535, 550, 557, 560
140, 150, 151, 518 Multi-agent systems ........................................................ 85
bone ................................................................ 453, 519 Multidimensional drug discovery ................................ 216
carcinogenicity................................. 5, 81, 83, 84, 183 Multidrug resistance ............................................ 153, 208
checking...................................................59, 193, 312, Multiscale................................................. 2, 238, 385, 590
384, 390, 391, 395, 396, 566 Multivariate
development ..............................................14, 69, 117, analysis ...........................................175, 332, 333, 337
139, 219, 231, 331, 440, 441, 442, 446, 447, regression................................................................. 338
449, 467, 470, 473, 474, 482, 584, 586 Mutagenicity
developmental ..........................................84, 330, 456 alerts......................................................................... 224
error ................................................................ 451, 471 prediction....................................................83, 84, 181
evaluation..................... 441, 442, 469–474, 482, 527 Myelosuppression................................................. 586, 587
fitting .................................... 311, 313–315, 321, 578 MySQL .......................................................................... 154
identification................................................... 450–451
intestinal .........................................................................
N
prediction.............................................. 184, 342, 439,
NAMD........................................................ 243–245, 255,
440, 442, 444, 465, 474, 482–484, 486, 256, 259, 263–265, 270, 273
501, 507 Nanoparticles....................................................4, 192, 388
refinement...................................................51, 70, 482 Nasal/pulmonary .......................................................... 307
reproductive........................................... 240, 352, 538
Nearest neighbor.................................................. 101, 148
selection ................................................................... 550 Nephrotoxicity .............................................................. 207
structure...................................................21, 148, 442, Nervous system .......................................... 257, 289, 290,
447, 449–463, 470, 474, 482–484, 486, 519,
342, 445
549, 581, 583 Network
uncertainty................................................... 5, 49, 451, gene............................................................................ 85
468, 470–473, 482, 523–525, 527, 528, 530,
KEGG ................................................ 23, 76, 181, 207
531, 540, 544, 550, 552, 560, 567–569 metabolic ..................................................12, 229, 333
validation ....................................................... 334, 470, neural .......................................................97, 106, 109,
521, 549–551 111, 170, 178, 334, 338, 339
MoKa ........................................78, 81, 95, 103, 105, 336
Neurotoxicity ................................................................ 207
Molecular Newborn........................................................................ 536
descriptor ..................................................... 12, 25, 29, Newton method .......................................... 238, 247, 249
30, 80, 117, 138, 177, 178, 216, 332, 335, 343,
Nicotine ......................................................................... 516
355, 361–364 Non
docking ....................................................77, 136, 140, bonded interactions .............................. 139, 249, 363
143, 146–150, 153, 155, 235, 240, 246, 356,
congeneric ............................................................... 337
genotoxic ........................................................ 516, 523
dynamics ............................................... 135, 136, 140, Noncancer risk assessment............................................ 455
143, 146–150, 153, 155, 156, 240, 246, 356,
Non-compartmental analysis ........................77, 369–380,
358, 362 384, 385, 390, 400, 410, 436, 523, 524
fragments ........................................................ 221, 252 NONMEM................................................. 390, 445, 523,
geometry......................................................... 106, 146 524, 548, 549, 551
mechanics .................................................... 32, 34, 36,
Nonspecific binding ...................................................... 342
135, 136, 139–141, 144, 146, 152, 155, 239, No-observed-adverse-effect-level
261, 337 (NOAEL).........................................516, 527–533
networks ........................................................... 82, 362
Nuclear receptor....................................29, 135, 153, 240
property ................................................................... 178 Nucleic acids..........................................37, 192, 244, 246
shape ........................................................................ 138 Numerical
similarity ......................................................... 138, 339
integration ............................................. 238, 249, 521
targets ............................................................. 139–141 methods ................................................. 388, 432, 435
Molfile........................................................... 89, 167, 172, Nutrition Examination Survey
189, 190, 192, 200, 204, 209 (NHANES)...................................................10, 11
O
Permeability.....................................................12, 29, 119,
144, 156, 285, 306–309, 313, 316, 321–323,
Objective function ...................................... 313, 466, 548 325, 331, 333, 335–342, 395, 427–429, 460,
Occam’s razor ......................................... 49, 51, 120, 449 463, 520
Occupational safety ........................................................... 3 brain barrier .................................................... 339–341
Ocular ............................................................................ 314 drug ...................................... 307, 316, 335, 337–342
Omics........................................................ 3, 4, 19, 20, 85, intestinal ........................................................... 29, 335
238, 568 in vitro .....................................................12, 308, 312,
Open MPI ...........................................263, 264, 272, 273 313, 321–323, 331, 519
OpenTox Framework..........................224, 225, 230, 232 Pesticide ...................................................... 101, 165, 182,
Optimization 183, 201, 207, 282, 310, 351, 352, 356, 358,
dosage .........................................................33, 69, 389 361, 516
methods ................................................................... 399 Pharmacogenomics ....................................................... 329
preclinical ......................................................... 24, 329 Pharmacophore ................................................. 25, 28–30,
Oral 37, 73, 82, 135–144, 151–153, 177, 178, 226,
absorption................................................74, 306, 335, 353, 354, 360, 361
339, 458, 552, 554 Phosphatidylcholine ............................................. 247, 259
dose ....................................................... 307, 316, 321, Physiome
322, 377, 477, 478, 479, 496 JSim models............................................................. 391
Organisation for Economic Co-operation and project ........................................................................ 82
Development (OECD) PhysProps ......................................... 78, 91, 96, 110, 112
guidelines................................................................. 123 Pitfalls ...................................................... 35, 73, 261, 320
QSAR toolbox.................................. 91, 96, 107, 110, pKa .....................................................................29, 30, 78,
112, 115 79, 81, 87, 94–96, 99, 102–105, 239, 246,
Organochlorine ............................................................. 182 321, 322
Orthologous.................................................................... 21 Plasma
Outlier .................................................119, 120, 177, 312 concentration ....................................... 282, 288, 289,
Overfitting ............................. 88, 89, 106, 114, 116, 120 309, 313–320, 322, 369–373, 375, 379, 380,
Oxidative stress................................................................ 13 385, 386, 388, 389, 552, 553, 581, 582, 589
protein binding .......................................80, 294, 295,
P
322, 331, 335, 337, 338, 340, 342
Paracetamol .......................................................... 190, 196 Pollution ................................................78, 222, 310, 533
Parameter Polychlorinated biphenyls
estimation ..............................................143–144, 471, (PCBs).....................................101, 106, 110, 444
497, 504, 506, 507, 551 Polycyclic aromatic hydrocarbons
scaling ....................................................494, 496–498, (PAHs) ............................................................... 516
501–502, 506–507 Polymorphism ............................................. 136, 517, 552
Paraoxon ............................................................... 455, 589 Pooled data.................................................................... 578
Parasite............................................................................. 22 Poorly perfused tissues ............................... 442, 448, 451
Partial least squares (PLS) ........................... 79, 106, 138, Population-based model .....534, 537, 562–565
338, 339, 341, 355 Portal vein ............................................................ 285, 309
Partition coefficient...........................................12, 78, 88, Posterior distribution........................................... 560–561
94–99, 113–115, 239, 288, 334, 386, 442, 443, Potassium channel..............................222, 238, 240–243,
447, 448, 452, 456, 460, 461, 463, 465, 466, 254, 257
476, 494, 497, 500–502, 506, 519 Predict
PASSI toolkit ................................................................... 85 absorption......................................................... 29, 335
Pathway ADME parameters ........................................ 331, 333,
analysis .................................................................81, 85 335–343
maps ......................................................................... 155 aqueous solubility .................................... 99, 101–102
Pattern recognition .............................................. 337, 341 biological activity.............................25, 138, 151, 518
Penicillins.............................................................. 295, 354 boiling point.................................... 94, 108–110, 178
Perchlorate..................................................................... 444 carcinogenicity......................................................... 183
Perfusion..................................................... 297, 445, 446, clearance .................................................................... 12
448, 519–520 CNS permeability............................................. 30, 335
cytochrome P450 .......................................... 4, 28, 30, Q
225, 237, 239, 241–242, 255, 285, 293, 316,
343, 352, 482, 500, 508, 517 QikProp ...........................12, 74, 80, 101, 102, 144, 336
developmental toxicity .............................................. 84 QSARpro ............................................................ 78, 92, 93
fate ...........................................................8–13, 78, 80, Quantum chemical descriptors......................97, 101, 108
81, 84, 134, 135, 183, 219, 284, 306, 343, 386,
R
523, 576
genotoxicity ............................................................. 216 R (Statistical software) ....................................79, 93, 323,
Henry constant...................................................94–96, 525, 527
113–115, 239 Random
melting point............................................... 91, 94–96, effects .............................................521, 543, 547, 549
100, 102, 105–108, 111, 178 forest ..................................................... 101, 173, 340,
metabolism ..............................................87, 329, 330, 341, 343, 344, 355
333, 335, 339, 340, 342, 343 Ranking..................................................31, 207, 552, 553
mutagenicity ..............................................83, 84, 181, Reabsorption ........................................................ 294–296
183, 224, 226 Reactive intermediates .................................................. 442
pharmacokinetic parameters ..................317, 329–346 Receptor
physicochemical properties...............................87–123 agonists ................................. 136, 137, 155, 353, 357
safety ......................................................................1, 11 AhR .......................................................................... 240
toxicity ................................................ 4, 30, 110, 184, antagonists............................ 136, 137, 155, 353, 360
238, 526 binding affinity ...................................... 146, 149, 333
PREDICTPlus.................................................78, 96, 107, mediated toxicity............................................ 351, 352
110, 112 Recirculation .............................................. 410, 412, 422,
Pregnancy ............................................................. 445, 536 428, 448, 452
Pregnane Xenobiotic receptors ........................... 351–361 Reconstructed enzyme network.......................... 169, 567
Prior distribution ................................................. 560–561 Reference dose (RfD) .........................516, 527–533, 537
Prioritization ........................................... 2, 136, 146, 542 Relational databases ............................154, 175, 177, 219
toxicity testing ......................................................... 1, 4 Renal clearance ............................................ 295, 339, 343
Procheck ............................................................... 251, 258 Reproductive toxicity .................................................... 538
ProChemist......................................................... 79, 93, 96 Rescaling........................................................................ 250
Progesterone ............................................... 203, 241, 255 Residual errors................................................56, 424, 549
Project Caesar.................................................................. 83 Respiratory ................................................... 13, 306, 316,
Propranolol........................................................... 316–318 391, 459, 473, 505, 582, 589
ProSa....................................................................... 80, 258 system .................................................... 306, 316, 391
Protein tract ...........................................................13, 306, 505
binding......................................................... 12, 77, 80, RetroMEX .............................................................. 81, 336
135, 155, 294, 295, 322, 331, 335, 337, 338, Reverse engineering ........................................................ 72
340, 342 Richly perfused tissues ........................442, 445, 451, 453
databank (PDB) ......................................22, 155, 173, Risk
245–247, 255–259, 354, 357, 362 analysis ............................................................ 509, 513
docking ............................................................. 31, 246 characterization ............................. 522, 523, 537–538
folding............................................................... 21, 239 estimation ............................................. 183, 215, 494,
interaction............................................ 21, 75, 76, 360 507, 523, 527, 528, 530, 532, 557
ligand ........................................................... 28, 31–33, Integrated Risk Information
77, 82, 141, 147, 246, 360, 363 System (IRIS) ................................................... 218
structure....................................................... 21–22, 30, management ............................................................ 516
144, 148, 245–246, 251, 257, 259, 262, 263, Risk/safety assessment
266, 269, 362 chemical ...............................154, 442, 513, 521, 522,
targets .......................................................... 21, 27, 28, 530, 533, 541, 546, 549, 552, 553, 567–569
37, 135, 144, 154, 245, 386 pharmaceutical...............................1, 2, 215, 330, 552
Proteomics.............................................3, 19, 21, 75, 568 screening .....................................................2, 154, 330
Prothrombin.................................................................. 582 testing ........................................................2, 182, 183,
Pulmonary .................................................... 13, 207, 307, 216, 219, 330, 526, 539, 568
314, 441, 443, 459, 519 Robustness..................................115, 118, 217, 532, 562
S
SMILES. See Simplified Molecular Input
Line Entry System (SMILES)
Saccharomyces cerevisiae ................................................ 25 SMiles ARbitrary Target Specification
Salicylic acid................................295, 296, 399, 400, 402 (SMARTS) ..............................103, 163, 172, 229
Sample size ................................................. 518, 523, 528, Smoking....................................................... 293, 508, 562
531, 540, 542, 559 Sodium............................... 135, 258, 260, 261, 273, 296
SBML.................................................................... 392, 395 Soil .................................................... 10, 13, 88, 533, 569
Scalability .............................................................. 244, 245 Solubility prediction.......................................99, 101, 102
Scaling Source-to-effect continuum ............................................. 8
factor ......................36, 216, 464, 496, 500, 507, 508 SPARC ...............................................................13, 74, 78,
procedure........................................................ 498, 500 96, 98, 101–105, 110, 112, 113, 115
SCOP ..................................................... 82, 441, 444–447 SPARQL .......................................................220–222, 224
Scoring function...................................... 31–33, 149, 363 Sparse data .................................. 379–380, 390, 445, 528
Screening Species
drug ......................................................................... 223 differences............................................. 136, 353, 439,
drug discovery ............................................. 24, 34, 37, 463, 506, 516, 520, 542
136, 142, 154, 223, 329–331, 333 extrapolation......................................... 443, 456, 469,
environmental chemicals............................................. 4 470, 496, 520, 526, 568
HTS ............................... 4, 30, 37, 73, 153, 238, 330 scaling ................................................... 313, 443, 464,
methods ............................................... 4, 37, 147, 357 494–498, 500–502, 504–506
protocols .................................................................. 141 specific.................................................... 464, 497, 568
Searchable toxicity database ......................................... 218 SPSS ...........................................................................79, 93
Secondary structure prediction ................ 21, 22, 75, 254 Statistica .............................................................. 79, 83, 93
Selectivity index........................................... 241, 243, 353 Stereoisomers .................................................28, 361, 427
Self-organizing maps..................................................... 334 Steroid...........................................................352, 356–359
Sensitivity Stomach ............................. 308, 458, 480, 481, 498, 502
analysis ..........................................392, 396, 398–399, Storage compartment .......................................... 451–454
414–415, 439, 446, 448, 471, 472, 507, 521, Stress response........................................................ 13, 138
531, 550 Structural
coefficient ................................................................ 472 alert .......................................................................... 224
Sequence similarity ................................................ 142, 168, 198
alignment ...................................................... 21, 22, 74 Structure-activity relationship (SAR)
homology................................................................. 148 analysis ..................................................................... 358
Serum albumin ......................................74, 135, 153, 339 methods ............................................................ 38, 358
Shellfish...........................................................10, 209, 516 model ....................................................................... 219
Signal transduction ................................................ 84, 586 Styrene ........................................................ 217, 441–445,
SIMCA............................................................................. 83 447–453, 455, 456, 459, 465, 475, 476
Simcyp.................................. 80, 311, 445, 523, 525, 566 Sub
Similarity cellular..............................................75, 155, 498, 499
analysis ....................................................................... 29 compartments.......................................................... 586
indices ............................................................. 138, 364 Substrate
search .....................................................168–169, 175, active site ........................................................ 240–242
195, 196, 198, 199, 209 binding.................................. 135, 240, 241, 412, 413
Simplified Molecular Input Line Entry inducers.................................................................... 293
System (SMILES)...............................77, 89, 110, inhibitor ................................................................... 293
162–165, 172, 174, 190, 191, 193, 206, Substructure
209, 362 searching ................................................ 164, 169, 198
Simulated annealing ............................................... 31, 407 similarity ..................................................30, 168, 169,
Sinusoids........................................................................ 430 175, 195, 196, 198, 222, 356
Size penalty.................................................................... 363 Sugar ..................................................................... 250, 386
Skin Sulfur dioxide .................................................................. 20
lesion .......................................................................... 14 Supersaturation ............................................................. 313
SMARTCyp ................................................................... 226 Support vector machine (SVM) ............................. 23, 30,
SMBioNet........................................................................ 84 97, 101, 334, 338–341, 355, 361
Surrogate endpoint ......................................................... 13 endocrine disruption...................................... 240, 352
Surveillance programs ..................................................... 65 endpoint ............................................ 84, 89, 118, 215
Switch ........................................................... 75, 200, 211, environmental..........................................88, 162, 179,
407, 426, 457 183, 240
SYBYL...................................................... 77, 93, 153, 164 estimates .................................................................. 505
Systems mechanism ............................................ 232, 237, 238,
biology ......................................................... 21, 23–24, 262, 468
75, 80, 216, 238, 567, 568 organic ................................................... 135, 181, 182
pharmacology ................................................... 81, 591 pathways ................................................................2, 14
toxicology ................................................12, 215–232, potential.................................................... 13, 178–179
567, 568, 575, 576 prediction..................................................... 4, 30, 118,
184, 238
T rodent carcinogenicity .............................................. 84
Tanimoto coefficient ................................... 198, 199, 364 screening .................................................................. 238
testing .......................................................... 1, 2, 4, 14,
Teratogenicity................................................................ 455
Tetrachlorodibenzo-p-dioxin (TCDD) ........................... 562 15, 182, 183, 219, 568
Tetrahymena pyriformis ................................................... 99 Toxicogenomics ................................................5, 81, 134,
136, 141, 142, 148, 152, 153
Theophylline............................................... 282, 293, 294,
318, 319 Toxicologically equivalent .................................. 494, 495,
Therapeutic 501, 505
doses ..................................................... 286, 287, 290, Toxicophore ........................................138, 152, 179–181
TOXNET....................................................................... 218
294, 295, 302, 386
index ........................................................................ 576 ToxPredict ..................................................................... 334
Thermodynamic properties .......................................... 261 ToxRefDB............................................................. 218, 219
Toxtree.......................................................................83, 84
Threshold value .........................................................30, 73
Thyroid ................................................................. 240, 498 Tracers.................................................................. 384, 390,
Tissue 427–436
Training sets ......................................................13, 25, 26,
dosimetry ................................................................... 11
grouping ......................................................... 446–449 27, 37, 83, 90–92, 97, 103, 106, 108, 109, 112,
partition coefficient ............................... 463, 466, 497 118–120, 122, 147, 149, 151, 177, 217, 229,
volumes............................................11, 464, 473, 478 337–341, 354–356, 361
Transcription factor.............................................. 240, 351
Tmax ............................................................. 299, 313, 320,
341, 345, 389, 397 Transcriptome .................................................... 3, 75, 568
Tolerable Transduction .................................. 24, 84, 577, 578, 586
Transit compartment ........................................... 586, 587
daily intake...................................................... 516, 537
weekly intake .................................................. 523, 539 Translational research ................................................... 577
Tolerance ....................................................................... 589 Transport
mechanisms ............................................................. 331
TopKat .................................................................... 84, 178
Topliss tree .................................................................... 122 models............................................................. 9, 10, 13
Topological proteins (transporters) ................................... 295, 342
Tree ..................................................................81–84, 103,
descriptor .................................................97, 100, 108,
111, 178, 345 136, 203, 208, 229, 334, 339–341, 343–345
index ........................................................ 344 self-organizing ......................................... 334
Topological Polar Surface Area Trichloroacetic acid..................................... 165, 455, 456
Tumor ............................................. 14, 48, 50–67, 69, 70
(TPSA) ...............................................30, 181, 362
Total clearance.....................................297, 341, 523, 552 Turnover ..............................................576, 584, 585, 588
ToxCast program .......................................................... 154 Tyrosine ......................................................................... 590
Toxic equivalency factors (TEF) .................................. 562
U
Toxicity/toxicological
chemical ...................................................87, 219, 237, UML ................................................................................ 82
238, 239, 455 Urinary cadmium concentration ......................... 556–558
database .......................................................... 225, 231 Urine........................................................... 294–296, 338,
drug ...................................... 237–240, 262, 288, 386 340, 386, 389, 399, 400, 411, 440, 498, 522,
DSSTox..........................................154, 218, 219, 223 550, 556, 558
V

Valacyclovir, 323–326
Validation
  external, 114, 122, 123, 337, 470
  internal, 106, 151, 217, 470, 578
  methods, 242, 356
  QSAR, 26, 115, 122–123, 149, 151, 356, 566, 568
  techniques, 26, 71–85, 123, 443, 578
van der Waals, 32, 36, 139, 247–249
Vapor pressure, 94–96, 108, 110–113, 115
Variability, 2, 10, 136, 314, 318, 335, 448, 473, 501, 508, 513–569, 577
Variable selection, 552
Vascular endothelial, 426, 427, 434
VCCLAB, 79, 96, 98, 101, 102, 104, 105
Venlafaxine, 293, 294
Vinyl chloride, 456
Virtual
  ADME tox, 29, 230, 231, 331, 334, 336, 342
  high throughput screening (vHTS), 30, 31, 33, 37, 77, 82
  libraries, 330
  screening, 24, 29–31, 37, 77, 147, 148, 155, 181, 357
  tissue, 4, 5, 15
VolSurf, 80, 92, 336, 355
Volume of distribution, 24, 288, 290, 299, 301, 333, 338–342, 376, 379, 389, 428, 455

W

Warfarin, 242, 293, 582
WinBUGS, 524, 533, 549–551, 560
WinNonLin, 24, 77, 380, 445, 526
WSKOWWIN, 101, 102, 183

X

XPPAUT, 383, 390, 395