2012 Computational Toxicology
Series Editor
John M. Walker
School of Life Sciences
University of Hertfordshire
Hatfield, Hertfordshire, AL10 9AB, UK
Volume I
Edited by
Brad Reisfeld
Chemical and Biological Engineering & School of Biomedical Engineering
Colorado State University, Colorado, USA
Arthur N. Mayeno
Chemical and Biological Engineering,
Colorado State University, Colorado, USA
Rapid advances in computer science, biology, chemistry, and other disciplines are enabling
powerful new computational tools and models for toxicology and pharmacology. These
computational tools hold tremendous promise for advancing applied and basic science,
from streamlining drug efficacy and safety testing to increasing the efficiency and effectiveness
of risk assessment for environmental chemicals. These approaches also offer the
potential to improve experimental design, reduce the overall number of experimental trials
needed, and decrease the number of animals used in experimentation.
Computational approaches are ideally suited to organize, process, and analyze the vast
libraries and databases of scientific information and to simulate complex biological phenomena.
For instance, they allow researchers to (1) investigate toxicological and pharmacological
phenomena across a wide range of scales of biological organization
(molecular ↔ cellular ↔ organism), (2) incorporate and analyze multiple biochemical
and biological interactions, (3) simulate biological processes and generate hypotheses
based on model predictions, which can be tested via targeted experimentation in vitro or
in vivo, (4) explore the consequences of inter- and intra-species differences and population
variability on the toxicology and pharmacology, and (5) extrapolate biological responses
across individuals, species, and a range of dose levels.
Despite the exceptional promise of computational approaches, there are presently very
few resources that focus on providing guidance on the development and practice of these
tools to solve problems and perform analyses in this area. This volume was conceived as
part of the Methods in Molecular Biology series to meet this need and to provide both
biomedical and quantitative scientists with essential background, context, examples, useful
tips, and an overview of current developments in the field. To this end, we present a
collection of practical techniques and software in computational toxicology, illustrated with
relevant examples drawn principally from the fields of environmental and pharmaceutical
sciences. These computational techniques can be used to analyze and simulate a myriad of
multi-scale biochemical and biological phenomena occurring in humans and other animals
following exposure to environmental toxicants or dosing with drugs.
This book (the first in a two-volume set) is organized into four parts, each covering a
methodology or topic, subdivided into chapters that provide background, theory, and
illustrative examples. Each part is generally self-contained, allowing the reader to start with
any part, although some knowledge of concepts from other parts may be assumed. Part I
introduces the field of computational toxicology and its current or potential applications.
Part II outlines the principal elements of mathematical and computational modeling, and
accepted best practices and useful guidelines. Part III discusses the use of computational
techniques and databases to predict chemical properties and toxicity, as well as the use of
molecular dynamics. Part IV delineates the elements and approaches to pharmacokinetic
and pharmacodynamic modeling, including non-compartmental and compartmental modeling,
modeling of absorption, prediction of pharmacokinetic parameters, physiologically
based pharmacokinetic modeling, and mechanism-based pharmacodynamic modeling;
chemical mixture and population effects, as well as interspecies extrapolation, are also
described and illustrated.
Contents
Preface
Contributors
PART I INTRODUCTION
1 What is Computational Toxicology?
Brad Reisfeld and Arthur N. Mayeno
2 Computational Toxicology: Application in Environmental Chemicals
Yu-Mei Tan, Rory Conolly, Daniel T. Chang, Rogelio Tornero-Velez,
Michael R. Goldsmith, Shane D. Peterson, and Curtis C. Dary
3 Role of Computational Methods in Pharmaceutical Sciences
Sandhya Kortagere, Markus Lill, and John Kerrigan
Index
List of Contributors
HERVÉ ABDI School of Behavioral and Brain Sciences, The University of Texas
at Dallas, Richardson, TX, USA
BILLY AMZAL LA-SER Europe Ltd, London, UK
MELVIN E. ANDERSEN The Hamner Institutes for Health Sciences,
Research Triangle Park, NC, USA
JAMES B. BASSINGTHWAIGHTE Department of Bioengineering, University
of Washington, Seattle, WA, USA
FRÉDÉRIC Y. BOIS Royallieu Research Center, Technological University
of Compiegne, Compiegne, France; INERIS, DRC/VIVA/METO,
Verneuil en Halatte, France
MICHAEL B. BOLGER Simulations Plus, Inc., Lancaster, CA, USA
ERIK BUTTERWORTH Department of Bioengineering, University of Washington, Seattle,
WA, USA
JERRY L. CAMPBELL JR. The Hamner Institutes for Health Sciences,
Research Triangle Park, NC, USA
DANIEL T. CHANG National Exposure Research Laboratory, US Environmental
Protection Agency, Research Triangle Park, NC, USA
XIAOLIN CHENG Oak Ridge National Laboratory, UT/ORNL Center
for Molecular Biophysics, Oak Ridge, TN, USA; Department of Biochemistry
and Cellular and Molecular Biology, University of Tennessee, Knoxville,
TN, USA
HARVEY J. CLEWELL III The Hamner Institutes for Health Sciences, Research Triangle
Park, NC, USA
REBECCA A. CLEWELL The Hamner Institutes for Health Sciences,
Research Triangle Park, NC, USA
JEAN PAUL COMET I3S laboratory, UMR 6070 CNRS, University of Nice-Sophia
Antipolis, Sophia Antipolis, France
RORY CONOLLY National Health and Environmental Effects Research Laboratory,
U.S. Environmental Protection Agency, Research Triangle Park, NC, USA
AMÉLIE CRÉPET French Agency for Food, Environment and Occupational Health Safety
(ANSES), Maisons-Alfort, France
CURTIS C. DARY National Exposure Research Laboratory, US Environmental Protection
Agency, Research Triangle Park, NC, USA
LISETTE G. DE PILLIS Department of Mathematics, Harvey Mudd College,
Claremont, CA, USA
JEAN LOU DORNE Emerging Risks Unit, European Food Safety Authority,
Parma, Italy
STEPHEN B. DUFFULL School of Pharmacy, University of Otago, Otago,
New Zealand
HARISH DUREJA M. D. University, Rohtak, India
Introduction
Chapter 1
Abstract
Computational toxicology is a vibrant and rapidly developing discipline that integrates information and
data from a variety of sources to develop mathematical and computer-based models to better understand
and predict adverse health effects caused by chemicals, such as environmental pollutants and pharmaceuticals.
Encompassing medicine, biology, biochemistry, chemistry, mathematics, computer science, engineering,
and other fields, computational toxicology investigates the interactions of chemical agents and
biological organisms across many scales (e.g., population, individual, cellular, and molecular). This multidisciplinary
field has applications ranging from hazard and risk prioritization of chemicals to safety screening
of drug metabolites, and has active participation and growth from many organizations, including government
agencies, not-for-profit organizations, private industry, and universities.
1. Introduction
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_1, © Springer Science+Business Media, LLC 2012
2. What is Computational Toxicology?
The US Environmental Protection Agency (EPA) defines Computational
Toxicology as “the application of mathematical and computer
models to predict adverse effects and to better understand the
single or multiple mechanisms through which a given chemical
induces harm.”
In a larger context, computational toxicology is an emerging
multidisciplinary field that combines knowledge of toxicity path-
ways with relevant chemical and biological data to inform the
development, verification, and testing of multi-scale computer-
based models that are used to gain insights into the mechanisms
through which a given chemical induces harm. Computational
toxicology also seeks to manage and detect patterns and interac-
tions in large biological and chemical data sets by taking advantage
of high-information-content data, novel biostatistical methods, and
computational power to analyze these data.
4. What Are the Major Fields Comprising Computational Toxicology?
Computational toxicology is highly interdisciplinary. Researchers in
the field have backgrounds and training in toxicology, biochemistry,
chemistry, environmental sciences, mathematics, statistics, medicine,
engineering, biology, computer science, and many other disciplines.
5. Who Uses Computational Toxicology?
A broad spectrum of international organizations are involved in the
development, application, and dissemination of knowledge, tools,
and data in computational toxicology. These include
• Government agencies in
  - The USA (EPA, Centers for Disease Control, Food and
    Drug Administration, National Institutes of Health,
    Agency for Toxic Substances and Disease Registry).
  - Europe (European Chemicals Agency, Institute for Health
    and Consumer Protection).
  - Canada (Health Canada, National Centre for Occupational
    Health and Safety Information).
  - Japan (National Institute of Health Sciences of Japan).
• US state agencies.
• Not-for-profit organizations.
• National laboratories.
• Nongovernment organizations.
• Military laboratories; private industry.
• Universities.
Abstract
This chapter provides an overview of computational models that describe various aspects of the source-to-health
effect continuum. Fate and transport models describe the release, transportation, and transformation of
chemicals from sources of emission throughout the general environment. Exposure models integrate the
microenvironmental concentrations with the amount of time an individual spends in these microenvironments
to estimate the intensity, frequency, and duration of contact with environmental chemicals. Physiologically
based pharmacokinetic (PBPK) models incorporate mechanistic biological information to predict
chemical-specific absorption, distribution, metabolism, and excretion. Values of parameters in PBPK models
can be measured in vitro, in vivo, or estimated using computational molecular modeling. Computational
modeling is also used to predict the respiratory tract dosimetry of inhaled gases and particulates [computational
fluid dynamics (CFD) models], to describe the normal and xenobiotic-perturbed behaviors of
signaling pathways, and to analyze the growth kinetics of preneoplastic lesions and predict tumor incidence
(clonal growth models).
Key words: Computational toxicology, Source-to-effect continuum, Fate and transport, Dosimetry,
Signaling pathway, Physiologically based pharmacokinetic model, Biologically based dose response
model, Clonal growth model, Virtual tissue
1. Overview
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_2, © Springer Science+Business Media, LLC 2012
Fig. 2. Literature searches performed to understand publication frequency of common modeling types used in environ-
mental computational toxicology.
2. Computational Models Along the Source-to-Health Effect Continuum
2.1. Fate and Transport
Fate and transport models describe the release, transportation, and
transformation of chemicals from sources of emission throughout
the general environment. Fate addresses persistence, dissipation,
and loss of chemical mass along the migration pathway; and transport
addresses mobility of a chemical along the migration pathway
(1). Based on their complexity, models of fate and transport can be
used for either “screening-level” or “higher-tiered” applications
(2). Screening-level models often use default input parameters
that tend to over-predict exposures (the preferred default approach
used in the absence of data). These models are suitable for obtain-
ing a first approximation or to screen out exposures that are not
likely to be of concern (3). Screening-level models have limited
spatial and temporal scope. Higher-tiered models are needed when
analyses require greater temporal and spatial resolution, but much
more information is required, such as site-specific data.
The processes that can be described in fate and transport models
include advection, dispersion, diffusion, equilibrium partitioning
between solid and fluid, biodegradation, and phase separation of
immiscible liquids (1). In general, fate and transport models require
information on physicochemical properties; mechanisms of release
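To make the processes listed above concrete, the following is a minimal sketch of how advection, dispersion, and first-order loss combine in a screening-style calculation. It uses the standard analytical solution for an instantaneous release into a one-dimensional channel; the function name, the parameter names, and all numerical values are illustrative assumptions and are not taken from the chapter or from any regulatory model.

```python
import math


def pulse_concentration(x, t, mass, area, velocity, dispersion, decay_rate):
    """Concentration at position x and time t after an instantaneous release.

    Assumes a 1-D channel with constant cross-sectional area, constant
    velocity, constant dispersion coefficient, and first-order loss.
    All inputs are hypothetical and must share consistent units.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    # Dispersion spreads the pulse into a Gaussian with variance 2*D*t.
    spread = math.sqrt(4.0 * math.pi * dispersion * t)
    # Advection moves the centre of the pulse to x = velocity * t.
    shape = math.exp(-((x - velocity * t) ** 2) / (4.0 * dispersion * t))
    # First-order loss (e.g. biodegradation) removes mass over time.
    remaining = math.exp(-decay_rate * t)
    return mass / (area * spread) * shape * remaining


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the shape of the profile.
    for x in (0.0, 5.0, 10.0, 15.0, 20.0):
        c = pulse_concentration(x, t=10.0, mass=1.0, area=1.0,
                                velocity=1.0, dispersion=0.5, decay_rate=0.01)
        print(f"x = {x:5.1f}  C = {c:.4e}")
```

A closed-form solution of this kind fits the screening-level role described earlier: it needs few inputs and no site-specific geometry. Higher-tiered analyses would typically replace it with a numerical model that accepts spatially varying parameters.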
2.2. Exposure
The outputs of a fate and transport model are concentrations to
which humans may be exposed. These predicted concentrations are
then used, in some cases, as surrogates for actual exposure (2).
Since these provisional estimates do not provide sufficient resolu-
tion about variation of exposure among individuals and by time and
location, they can also be used as inputs to exposure models.
Exposure models integrate the microenvironmental concentrations
with the amount of time an individual spends in these microenvironments
to provide qualitative and quantitative evaluations of the
intensity, frequency, and duration of contact with chemicals, and
sometimes, the resulting amount of chemicals that is actually
absorbed into the exposed organism. Exposure models vary considerably
in their complexity. Some models are deterministic and
generate site of contact-specific point estimates (e.g., dermal concentration
× contact time). Others are probabilistic, describing
spatial and temporal profiles of chemical concentrations in micro-
environments. Both deterministic and probabilistic models may
aggregate some or all of the major exposure pathways.
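The distinction between deterministic and probabilistic exposure models can be sketched in a few lines. The example below is only an illustration: the microenvironment names, concentrations, time budgets, and the choice of a truncated normal distribution for time allocation are all assumptions made here, not values or methods from the chapter.

```python
import random


def time_weighted_exposure(concentrations, hours):
    """Deterministic point estimate: sum of concentration x time.

    Both arguments map a microenvironment name to a value
    (e.g. ug/m3 and hours, giving ug/m3 * h).
    """
    if concentrations.keys() != hours.keys():
        raise ValueError("both inputs must cover the same microenvironments")
    return sum(concentrations[m] * hours[m] for m in concentrations)


def simulate_population(concentrations, mean_hours, n_people, rel_sd=0.2, seed=0):
    """Probabilistic sketch: vary each person's time budget at random.

    Time in each microenvironment is drawn from a normal distribution
    (an assumption), truncated at zero, then rescaled to a 24-h day.
    """
    rng = random.Random(seed)
    exposures = []
    for _ in range(n_people):
        raw = {m: max(0.0, rng.gauss(h, rel_sd * h)) for m, h in mean_hours.items()}
        total = sum(raw.values())
        if total == 0.0:
            raw, total = dict(mean_hours), sum(mean_hours.values())
        day = {m: 24.0 * h / total for m, h in raw.items()}
        exposures.append(time_weighted_exposure(concentrations, day))
    return exposures


if __name__ == "__main__":
    conc = {"home": 10.0, "work": 20.0, "outdoors": 5.0}   # hypothetical
    hrs = {"home": 14.0, "work": 8.0, "outdoors": 2.0}     # hypothetical
    print("point estimate:", time_weighted_exposure(conc, hrs))
    population = sorted(simulate_population(conc, hrs, n_people=1000))
    print("median:", population[len(population) // 2])
    print("95th percentile:", population[int(0.95 * len(population))])
```

The deterministic function returns a single number, whereas the simulation returns a distribution from which percentiles can be read. This sketch varies only behavior; a fuller probabilistic model would also sample the concentrations themselves.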
Probabilistic models can also be used to describe variability in
human behavior. Human activities contribute to exposure variabil-
ity, and at first glance appear to be arbitrary, yet patterns of behavior
are known to be representative of different age groups (e.g., hand-
to-mouth behavior among 3–5 year olds) and this information can
be used to better inform stochastic exposure models (4). A major
challenge in characterizing human activity is overcoming the cost of
collecting information. For example, food consumption question-
naires are important in dietary modeling (e.g., estimating chronic
arsenic exposure by shellfish consumption); however, the accuracy
in assessing chronic exposure is limited by the lack of longitudinal
information in surveys such as the Continuing Survey of
Food Intake by Individuals (CSFII) and the National Health and
Nutrition Examination Survey (NHANES) (5, 6). The recent
study of Song et al. (7) examined how much information is needed
in order to predict human behavior. The authors examined the
predictability of macro-scale human mobility over a span of 3
months based on cell phone use—comparing a continuous record
(e.g., hourly) of a user’s momentary location with a less expensive
measure of mobility. The authors found that there is a potential
2.4. Signaling Pathways
Signaling pathways such as the mitogen-activated protein kinase
(MAPK) pathway (40) consist of one or more receptors at the cell
surface that, when activated by their cognate ligands, transmit
signals to cytosolic effectors and also to the genome. The cytosolic
effects are rapid, occurring within seconds or minutes of receptor
activation, while the effects on gene expression take longer, with
changes in the associated protein levels typically occurring after one
or more hours. A number of computational models of signaling
pathways have been described (e.g., (41, 42)).
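The separation of time scales described above (cytosolic effects within seconds to minutes, protein-level changes after an hour or more) can be illustrated with a deliberately small two-equation model. This is a sketch under stated assumptions, not one of the published models cited: the two state variables, the rate constants, and the use of explicit Euler integration are all chosen here for illustration only.

```python
def simulate(k_act=0.05, k_deact=0.05, k_syn=1e-3, k_deg=1.0 / 7200.0,
             dt=0.5, t_end=6 * 3600.0):
    """Integrate a toy pathway after the receptor is switched on at t = 0.

    kinase  : active fraction of a cytosolic effector (fast, rates in 1/s)
    protein : level of a gene product driven by the kinase (slow)
    All rate constants are hypothetical.
    """
    kinase, protein, t = 0.0, 0.0, 0.0
    times, kinase_trace, protein_trace = [], [], []
    while t <= t_end:
        times.append(t)
        kinase_trace.append(kinase)
        protein_trace.append(protein)
        d_kinase = k_act * (1.0 - kinase) - k_deact * kinase
        d_protein = k_syn * kinase - k_deg * protein
        kinase += dt * d_kinase      # explicit Euler step
        protein += dt * d_protein
        t += dt
    return times, kinase_trace, protein_trace


def half_time(times, values, steady_state):
    """First time at which a trace reaches half of its steady-state value."""
    for t, v in zip(times, values):
        if v >= 0.5 * steady_state:
            return t
    return None


if __name__ == "__main__":
    times, kin, prot = simulate()
    kin_ss = 0.05 / (0.05 + 0.05)               # analytical steady state
    prot_ss = 1e-3 * kin_ss / (1.0 / 7200.0)
    print("kinase half-time (s):", half_time(times, kin, kin_ss))
    print("protein half-time (s):", half_time(times, prot, prot_ss))
```

With these assumed constants the kinase settles within seconds while the protein level takes more than an hour to reach half of its plateau, which mirrors the qualitative behavior described in the text.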
The National Research Council (NRC) report, Toxicity Testing
in the Twenty-First Century (18), introduced the concept of
“toxicity pathways.” Toxicity pathways were defined by the NAS
as “interconnected pathways composed of complex biochemical
interactions of genes, proteins, and small molecules that maintain
normal cellular function, control communication between cells,
and allow cells to adapt to changes in their environment” and
which, “when sufficiently perturbed, are expected to result in
adverse health effects are termed toxicity pathways” (43). The
adverse effect is the clinically evident effect on health and is
often referred to as the apical effect, denoting its placement at
the terminal end of the toxicity pathways. Although not much
work has been done to date, computational models of signaling
pathways are expected to be integral components of toxicity path-
way models.
2.5. BBDR/Clonal Growth
Cancer is a disease of cell division. In healthy tissue, the respective
rates of cellular division and death are tightly regulated, allowing
for either controlled growth or the maintenance of tissue size in
adulthood. When regulation of division and death rates is dis-
rupted, tumors can develop. (It should also be noted that embry-
onic development depends on tight regulation of division and
death rates, where dysregulation can result in malformations).
A number of computational models have been developed to
describe tumor incidence and the growth kinetics of preneoplastic
lesions. These vary from purely statistical models fit to incidence
data (44) to models that track time-dependent division and death
rates of cells in the various stages of multi-stage carcinogenesis (45).
These latter kinds of models provide insights into how different
kinds of toxic effects—e.g., direct reactivity with DNA versus cyto-
lethality—can differentially affect tumor development.
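A rough sense of why division and death rates matter can be had from the expected number of initiated cells in a two-stage setting. The sketch below solves dI/dt = mu1*N + (alpha - beta)*I with I(0) = 0 in closed form. It is a deterministic simplification written for illustration, not the stochastic model of reference (45), and the parameter names and values are assumptions. Mapping DNA reactivity to the initiation rate and cytolethality-driven regenerative proliferation to the division rate is likewise an assumption made here.

```python
import math


def mean_initiated_cells(t, normal_cells, initiation_rate, division_rate, death_rate):
    """Expected number of initiated cells at time t (deterministic mean).

    normal_cells     : number of normal target cells N (held constant)
    initiation_rate  : mu1, normal -> initiated, per cell per unit time
    division_rate    : alpha, division rate of initiated cells
    death_rate       : beta, death rate of initiated cells
    """
    net_growth = division_rate - death_rate
    influx = initiation_rate * normal_cells
    if abs(net_growth) < 1e-12:
        # No net clonal growth: initiated cells simply accumulate linearly.
        return influx * t
    # expm1 keeps precision when net_growth * t is small.
    return influx * math.expm1(net_growth * t) / net_growth


if __name__ == "__main__":
    base = dict(t=100.0, normal_cells=1e6, initiation_rate=1e-7,
                division_rate=0.10, death_rate=0.09)      # hypothetical
    print("baseline:          ", mean_initiated_cells(**base))
    print("initiation doubled:", mean_initiated_cells(**{**base, "initiation_rate": 2e-7}))
    print("division +10%:     ", mean_initiated_cells(**{**base, "division_rate": 0.11}))
```

In this simplified picture a change in the initiation rate scales the result linearly, whereas a change in the net growth rate acts through the exponent, which is one way to see how different kinds of toxic effects can influence tumor development differently.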
BBDR models represent the entire exposure—target site
dose—apical response continuum. These kinds of models require
large amounts of supporting data but have the capability to predict
both dose–response and time course for development of apical
effects as well as for some intermediate effects (e.g., (46)). This
latter capability is important as it provides the opportunity to use
data on biomarkers in support of model development. The
resources needed to develop such models are, unfortunately,
3. Virtual Tissues
Disclaimer
References
1. ASTM (1998) 1998 Annual book of ASTM standards: standard guide for remediation of ground water by natural attenuation at petroleum release sites (Designation: E 1943-98), vol 11.04. American Society for Testing and Materials, West Conshohocken, pp 875–917
2. Williams PRD, Hubbell WBJ, Weber E et al (2010) An overview of exposure assessment models used by the U.S. Environmental Protection Agency. In: Hanrahan G (ed) Modeling of pollutants in complex environmental systems, vol 2. ILM Publications, St Albans
3. US EPA (1992) Guidelines for exposure assessment. EPA/600/Z-92/001. US Environmental Protection Agency, Washington, DC
4. Zartarian VG, Xue J, Ozkaynak H, Dang W, Glen G, Smith L, Stallings C (2006) A probabilistic arsenic exposure assessment for children who contact CCA-treated playsets and decks, Part 1: model methodology, variability results, and model evaluation. Risk Anal 26(2):515–531
5. Glen G, Smith L, Isaacs K, Mccurdy T, Langstaff J (2008) A new method of longitudinal diary assembly for human exposure modeling. J Expo Sci Environ Epidemiol 18(3):299–311
6. Tran NL, Barraj L, Smith K, Javier A, Burke T (2004) Combining food frequency and survey data to quantify long-term dietary exposure: a methyl mercury case study. Risk Anal 24(1):19–30
7. Song C, Qu Z, Blumm N, Barabasi AL (2010) Limits of predictability in human mobility. Science 327(5968):1018–1021
8. US EPA (1997) Exposure factors handbook. US Environmental Protection Agency, Washington, DC. http://www.epa.gov/NCEA/pdfs/efh/front.pdf
9. Watanabe PG, Gehring PJ (1976) Dose-dependent fate of vinyl chloride and its possible relationship to oncogenicity in rats. Environ Health Perspect 17:145–152
10. Reddy MB, Yang RSH, Clewell HJ, Andersen ME (2005) Physiologically based pharmacokinetic modeling: science and applications. Wiley, Hoboken
11. Emmen HH, Hoogendijk EM, Klopping-Ketelaars WA, Muijser H, Duisterrnaat E, Ravensberg JC, Alexander DJ, Borkhataria D, Rusch GM, Schmit B (2000) Human safety and pharmacokinetics of the CFC alternative propellants HFC 134a (1,1,1,2-tetrafluoroethane) and HFC 227 (1,1,1,2,3,3,3-heptafluoropropane) following whole-body exposure. Regul Toxicol Pharmacol 32(1):22–35
12. Ernstgard L, Andersen M, Dekant W, Sjogren B, Johanson G (2010) Experimental exposure to 1,1,1,3,3-pentafluoropropane (HFC-245fa): uptake and disposition in humans. Toxicol Sci 113(2):326–336
13. Sexton K, Kleffman DE, Cailahan MA (1995) An introduction to the National Human Exposure Assessment Survey (NHEXAS) and related phase I field studies. J Expo Anal Environ Epidemiol 5(3):229–232
14. Shin BS, Hwang SW, Bulitta JB, Lee JB, Yang SD, Park JS, Kwon MC, do Kim J, Yoon HS, Yoo SD (2010) Assessment of bisphenol A exposure in Korean pregnant women by physiologically based pharmacokinetic modeling. J Toxicol Environ Health A 73(21–22):1586–1598
15. Wilson NK, Chuang JC, Morgan MK, Lordo RA, Sheldon LS (2007) An observational study of the potential exposures of preschool to pentachlorophenol, bisphenol-A, and nonylphenol at home and daycare. Environ Res 103(1):9–20
16. Andersen ME (2003) Toxicokinetic modeling and its applications in chemical risk assessment. Toxicol Lett 138(1–2):9–27
17. Rapaport DC (2004) The art of molecular dynamics simulation, 2nd edn. Cambridge University, New York
18. Obach RS (1999) Prediction of human clearance of twenty-nine drugs from hepatic microsomal intrinsic clearance data: an examination of in vitro half-life approach and nonspecific binding to microsomes. Drug Metab Dispos 27(11):1350–1359
19. Obach RS, Baxter JG, Liston TE, Silber BM, Jones BC, MacIntyre F, Rance DJ, Wastall P (1997) The prediction of human pharmacokinetic parameters from preclinical and in vitro metabolism data. J Pharmacol Exp Ther 283(1):46–58
20. Tornero-Velez R, Mirfazaelina A, Kim KB, Anand SS, Kim HJ, Haines WT, Bruckner JV, Fisher JW (2010) Evaluation of deltamethrin kinetics and dosimetry in the maturing rats using a PBPK model. Toxicol Appl Pharmacol 244(2):208–217
21. Böhm G (1996) New approaches in molecular structure prediction. Biophys Chem 59(1–2):1–32
22. Fielden MR, Matthews JB, Fertuck KC et al (2002) In silico approaches to mechanistic and predictive toxicology: an introduction to bioinformatics for toxicologists. Crit Rev Toxicol 32(2):67–112
23. Marrone TJ, Briggs JM, McCammon JA (1997) Structure-based drug design: computational advances. Annu Rev Pharmacol Toxicol 37:71–90
24. Leo A, Hansch C, Elkins D (1971) Partition coefficients and their uses. Chem Rev 71(6):525–616
25. Valko K (2002) Measurements and predictions of physicochemical properties. In: Darvas F, Dorman G (eds) High-throughput ADMETox estimation. Eaton Publishing, Westborough
26. Topliss JG (ed) (1983) Quantitative structure-activity relationships of drugs. Academic, New York
27. Cronin MTD, Dearden JC, Duffy JC, Edwards R, Manga N, Worth AP, Worgan ADP (2002) The importance of hydrophobicity and electrophilicity descriptors in mechanistically-based QSARs for toxicological endpoints. SAR QSAR Environ Res 13:167–176
28. Pratt WB, Taylor P (eds) (1990) Principles of drug action. Churchill-Livingstone, Inc, New York
29. Rabinowitz JR, Little S, Laws SC, Goldsmith R (2009) Molecular modeling for screening environmental chemicals for estrogenicity: use of the toxicant-target approach. Chem Res Toxicol 22(9):1594–1602
30. Allen MP, Tildesley DJ (2002) Computer simulations of liquids. Oxford University, New York
31. Car R, Parrinello M (1985) Unified approach for molecular dynamics and density-functional theory. Phys Rev Lett 55(22):2471–2474
32. Colombo MC, Guidoni L, Laio A, Magistrato A, Maurer P, Piana S, Röhrig U, Spiegel K, Sulpizi M, VandeVondele J, Zumstein M, Röthlisberger U (2002) Hybrid QM/MM Car-Parrinello simulations of catalytic and enzymatic reactions. CHIMIA 56(1–2):13–19
33. Geva E, Shi Q, Voth GA (2001) Quantum-mechanical reaction rate constants from centroid molecular dynamics simulations. J Chem Phys 115:9209–9222
34. Prezhdo OV, Rossky PJ (1997) Evaluation of quantum transition rates from quantum classical molecular dynamics simulation. J Chem Phys 107:5863
35. Truhlar DG, Garrett BC (1980) Variational transition-state theory. Acc Chem Res 13:440–448
36. Tuckerman M, Laasonen K, Sprik M, Parrinello M (1995) Ab initio molecular dynamics simulation of the solvation and transport of hydronium and hydroxyl ions in water. J Chem Phys 103:150–161
37. Wang H, Sun X, Miller WH (1998) Semiclassical approximations for the calculation of thermal rate constants for chemical reactions in complex molecular systems. J Phys Chem 108(23):9726–9736
38. Kimbell JS, Gross EA, Joyner DR, Godo MN, Morgan KT (1993) Application of computational fluid dynamics to regional dosimetry of inhaled chemicals in the upper respiratory tract of the rat. Toxicol Appl Pharmacol 121:253–263
39. Overton JH, Kimbell JS, Miller FJ (2001) Dosimetry modeling of inhaled formaldehyde: the human respiratory tract. Toxicol Sci 64:122–134
40. Roux PP, Blenis J (2004) ERK and p38 MAPK-activated protein kinases: a family of protein kinases with diverse biological functions. Microbiol Mol Biol Rev 68:320–344
41. Bhalla US, Prahlad RT, Iyengar R (2002) MAP kinase phosphatase as a locus of flexibility in a mitogen-activated protein kinase signaling network. Science 297:1018–1023
42. Hoffman A, Levchenko A, Scott ML, Baltimore D (2005) The IkB-NF-kB signaling module: temporal control and selective gene activation. Science 298:1241–1245
43. National Research Council (NRC) Committee on Toxicity and Assessment of Environmental Agents (2007) Toxicity testing in the twenty-first century: a vision and a strategy. National Academies, Washington, DC. ISBN 0-309-10989-2
44. Crump KS, Hoel DG, Langley CH, Peto R (1976) Fundamental carcinogenic processes and their implications for low dose risk assessment. Cancer Res 36:2937–2979
45. Moolgavkar SH, Dewanji A, Venzon DJ (1988) A stochastic two-stage model for cancer risk assessment. The hazard function and probability of tumor. Risk Anal 8:383–392
46. Conolly RB, Kimbell JS, Janszen D, Schlosser PM, Kalisak D, Preston J, Miller FJ (2004) Human respiratory tract cancer risks of inhaled formaldehyde: dose-response predictions derived from biologically-motivated computational modeling of a combined rodent and human dataset. Toxicol Sci 82:279–296
47. Adra S, Sun T, MacNeil S, Holcombe M, Smallwood R (2010) Development of a three-dimensional multiscale computational model of the human epidermis. PLoS One 5(1):e8511. doi:10.1371/journal.pone.0008511
48. Shah I, Wambaugh J (2010) Virtual tissues in toxicology. J Toxicol Environ Health 13(2–4):314–328
49. Wambaugh J, Shah I (2010) Simulating microdosimetry in a virtual hepatic lobule. PLoS Comput Biol 6(4):e1000756. doi:10.1371/journal.pcbi.1000756
Chapter 3
Abstract
Over the past two decades, computational methods have eased the financial and experimental burden of
the early drug discovery process. In silico methods have provided support in the form of databases,
data mining of large genomes, network analysis, and systems biology on the bioinformatics front, and
structure–activity relationships, similarity analysis, docking, and pharmacophore methods for lead design and
optimization. This review highlights some of the applications of bioinformatics and chemoinformatics
methods that have enriched the field of drug discovery. In addition, the review provides insights into
the use of free energy perturbation methods for efficiently computing binding energy. These in silico
methods are complementary and can be easily integrated into the traditional in vitro and in vivo methods to
test pharmacological hypotheses.
1. Introduction
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_3, © Springer Science+Business Media, LLC 2012
2. Bioinformatics
3. Protein Structure and Prediction
Protein structure prediction algorithms can be classified into three
categories, namely secondary structure prediction, ab initio structure
modeling, and homology modeling. Given that there are 20
amino acids, the number of possibilities for a given sequence of
amino acids that constitute the primary structure of the protein to
fold into a tertiary structure is astronomical. However, protein
folding in physiological conditions probably follows the path of
least complexity and is therefore highly efficient. To mimic this
process, it is prudent to first predict the secondary structural elements,
namely the alpha helix, beta sheet, and gamma turns, which are
4. Protein–Protein Interactions
It can be envisioned that the next paradigm in drug discovery will
be to design inhibitors of key protein–protein interactions (PPIs).
These PPIs may occur between host and pathogen or entirely
within a host or a pathogen. The feasibility of such drug
design has been shown recently by our group in designing small
molecule inhibitors of key PPIs of the malaria parasite
(41) and other infectious agents such as Toxoplasma gondii and
HIV (42, 43). However, the bottleneck in this design process lies
in identifying key PPIs, given that only a handful of crystal structures of
such complexes are currently available in the Protein Data Bank.
Understanding PPIs is also important from other pharmacological
and biochemical perspectives such as signaling, cellular
adhesion, enzyme kinetics, and pathways. Thus understanding,
5. Systems Biology and Protein Networks
PPIs can be regarded as the minimal subunits of protein networks. The
concept of systems biology deals with systematically understanding
biological processes by incrementally building up the networks
of interactions that underlie them (64, 65). These
networks then provide the molecular basis for understanding the etiology of
diseases and for rationally developing therapeutics that act on one or
more components of the protein network (66, 67). The systems
approach also helps to identify key targets and biomarkers and to
quantify potential side effects of drugs due to off-target interactions
(68, 69). In delineating these networks, experimental methods
such as microarrays work in close association with statistical methods
such as Bayesian network models to uncover new protein networks
(70, 71). In addition, the networks of lower organisms such
as S. cerevisiae (72, 73) and Drosophila melanogaster have been identified
(74, 75) and stored in databases such as KEGG (http://www.genome.jp/kegg/pathway.html)
and UniPathway (http://www.grenoble.prabi.fr/obiwarehouse/unipathway); they clearly serve as
models to understand and derive the networks of higher organisms.
6. Chemoinformatics
Analogous to bioinformatics, chemoinformatics is the field of science concerned with the management of chemical data using computational methods. This area has gained tremendous significance in the past few decades because of the large volumes of chemical data generated by combinatorial chemistry and high-throughput screening (84). These two techniques have set new trends in drug discovery, in which attrition rates are now being seriously taken into account at early stages. Chemoinformatics plays a major role in designing models for virtual screening, lead design, lead optimization, preclinical filtering schemes for drug-like properties, and ADMET and PK–PD modeling (85).
7. Classical QSAR
predictive quality of the model for new compounds that are not included in the training set. Leave-one-out cross-validation, a technique used in the past to provide this information, proved to be insufficient to measure predictive power (87–89). Leaving out larger groups during cross-validation, or scrambling tests (in which the activity data are randomly reassigned among the compounds of the dataset, and no QSAR model with comparable regression quality should be obtained for any reordering), have been shown to estimate the predictive quality of the QSAR model more reliably. As an ultimate test, however, the QSAR model should always be validated by its ability to predict compounds that were not included at any stage of the training process, called the external test set.
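The two validation ideas described above, leave-group-out cross-validation and the scrambling test, can be sketched as follows. This is a minimal illustration rather than a recommended protocol: it assumes a single descriptor, an ordinary least-squares line as the "QSAR model", and synthetic data; all function names and numbers are hypothetical.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(y_true, y_pred):
    """Coefficient of determination between observed and predicted values."""
    my = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def leave_group_out_q2(x, y, n_groups=5):
    """Cross-validated q2: each group is predicted by a model fit without it."""
    n = len(x)
    preds = [0.0] * n
    for g in range(n_groups):
        test = [i for i in range(n) if i % n_groups == g]
        train = [i for i in range(n) if i % n_groups != g]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = a + b * x[i]
    return r_squared(y, preds)

def scrambled_r2(x, y, n_trials=200, seed=0):
    """Refit after randomly reassigning activities; returns the r2 of each trial."""
    shuffler = random.Random(seed)
    scores = []
    for _ in range(n_trials):
        y_perm = y[:]
        shuffler.shuffle(y_perm)
        a, b = fit_line(x, y_perm)
        scores.append(r_squared(y_perm, [a + b * xi for xi in x]))
    return scores

# Synthetic data (assumption): one descriptor and a noisy linear activity.
noise = random.Random(42)
descriptor = [0.2 * i for i in range(30)]
activity = [1.0 + 0.8 * d + noise.gauss(0.0, 0.3) for d in descriptor]

a, b = fit_line(descriptor, activity)
r2 = r_squared(activity, [a + b * d for d in descriptor])
q2 = leave_group_out_q2(descriptor, activity)
scrambled = scrambled_r2(descriptor, activity)
print(f"r2={r2:.2f}  q2={q2:.2f}  mean scrambled r2={statistics.fmean(scrambled):.2f}")
```

A model that captures a real structure–activity relationship should give a q2 close to r2 and scrambled r2 values far below both. Neither check replaces prediction of an external test set.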
In parallel to Hansch and Fujita, Free and Wilson (90) derived QSAR models using indicator variables. Indicator variables $[a_{i_m}]_k$ describe the absence or presence of a chemical substituent $i_m$ (e.g., Cl, Br, I, Me) at position $m$ of a common ligand scaffold with values of 0 (absence) and 1 (presence):

\[ A_k = c_0 + \sum_{i_0} c_{i_0} [a_{i_0}]_k + \cdots + \sum_{i_N} c_{i_N} [a_{i_N}]_k \quad \text{for all ligands } k. \tag{2} \]

$N$ is the number of substitutions. Only one indicator variable in each sum of Eq. (2) can have a value of 1 for each ligand.
Although the original Free–Wilson type QSAR analysis displayed some shortcomings compared with Hansch-type QSAR models (e.g., activity predictions are only possible for new combinations of substituents already included in the training set; more degrees of freedom are necessary to describe every substitution), this QSAR scheme has become popular again with the advent of structural fingerprints or hashed keys (91, 92) (Fig. 2) describing the topology of the molecules in the data set.
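To make Eq. (2) concrete, the sketch below evaluates a Free–Wilson model for one ligand. The coefficient values, position labels (R1, R2), and function name are hypothetical and chosen only for illustration; in practice the coefficients would be obtained by regression against measured activities.

```python
def free_wilson_activity(c0, contributions, ligand):
    """Evaluate Eq. (2): baseline c0 plus one substituent contribution per position.

    contributions: {position: {substituent: coefficient}}
    ligand:        {position: substituent present at that position}
    """
    total = c0
    for position, substituent in ligand.items():
        # Exactly one indicator variable per position equals 1,
        # so exactly one coefficient per position is added.
        total += contributions[position][substituent]
    return total

# Hypothetical fitted coefficients for a scaffold with two substitution positions.
c0 = 5.0
contributions = {
    "R1": {"H": 0.0, "Cl": 0.4, "Me": -0.2},
    "R2": {"H": 0.0, "Br": 0.7},
}

predicted = free_wilson_activity(c0, contributions, {"R1": "Cl", "R2": "Br"})
print(round(predicted, 2))
```

The lookup fails with a KeyError for a substituent that never occurred in the training set, which mirrors the limitation noted above: predictions are possible only for new combinations of substituents that are already included.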
8. 3D-QSAR and Extensions
With the introduction of comparative molecular field analysis
(CoMFA) (93), for the first time structure–activity relationships
were based on the three-dimensional structure of the ligand mole-
cules (3D-QSAR). In 3D-QSAR, the ligands’ interaction with
chemical probes or the ligands’ property fields (such as electrostatic
fields) are mapped onto a surface or grid surrounding a series of
compounds. The values on the grid or surface points are utilized as
individual descriptors, which are usually grouped into a smaller
number of descriptors, for use in a regression. The quality of the
3D-QSAR model critically depends on the correct superposition of
Fig. 2. (a) A structural fingerprint for a chemical is generated by determining the frequency with which a specific fragment of a predefined library is present in the ligand. The frequencies of all fragments are stored as individual bits in a bit string. The individual bits are used as individual indicator variables. (b) In a hashed key, the fragments are generated on the fly for all ligands in the training set, and the frequency of occurrence of each fragment is mapped to a hash key of fixed length.
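The hashed key of Fig. 2b can be sketched as below. The example assumes that fragments have already been enumerated and are represented as strings; the fragment strings, the key length of 16, and the use of SHA-256 as the hash are illustrative assumptions, not the scheme used in refs. 91 and 92.

```python
import hashlib

def hashed_key(fragments, length=16):
    """Map fragment occurrences onto a fixed-length count vector."""
    key = [0] * length
    for fragment in fragments:
        # A cryptographic digest is used so the slot is reproducible across runs
        # (Python's built-in hash() is salted per process for strings).
        digest = hashlib.sha256(fragment.encode("utf-8")).digest()
        slot = int.from_bytes(digest[:4], "big") % length
        key[slot] += 1
    return key

# Hypothetical fragments generated on the fly for one ligand.
ligand_fragments = ["C-C", "C-O", "C=O", "C-C", "c1ccccc1", "C-N"]
key = hashed_key(ligand_fragments)
print(key)
```

Because the key length is fixed, different fragments can collide in the same slot. This is the price paid for not needing a predefined fragment library.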
9. Applications of QSAR
QSAR has become an integral component of pharmaceutical research for optimizing lead compounds. Whereas QSAR is widely used to identify ligands with high affinity for a given target protein, more recently QSAR methodology has been extended to predict pharmacokinetic properties, such as absorption, distribution, metabolism, and elimination (ADME) properties (104) or the oral bioavailability of compounds (105, 106), as well as the toxicity of drug candidates. Furthermore, in the context of the Registration, Evaluation, and Authorization of Chemicals (REACH) legislation of the European Union, the prediction of the toxic potential of environmental chemicals using QSAR has attracted public interest (107).
10. Pharmacophore-Based Modeling
Another ligand-based method that has found utility in the pharmaceutical industry is pharmacophore-based modeling. A pharmacophore can be defined as a molecular framework required for the biological activity of a compound or a set of compounds (108). The concept of
11. ADMET Modeling
One of the major applications of ligand-based methods is the prediction of absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) properties. A number of studies have utilized different ligand-based models (131–139). In addition, many studies have provided extensive comparisons of the programs designed to predict ADMET properties (140–142). ADMET properties can be described using a set of physicochemical properties, such as solubility, log P, log D, pKa, and polar surface area, that describe permeability, intestinal absorption, blood–brain barrier penetration, and excretion. Solubility is modeled as the logarithm of solubility (log S) using molecular descriptors that govern shape, size, interatomic forces, and polarity (143–147). Permeability is a measure of the compound’s bioavailability and is modeled using molecular descriptors that code for hydrophobicity and for steric and electronic properties of molecules (148–150). In addition to passive diffusion across cellular membranes, permeation can also occur through active transport by membrane-bound transporters and pumps (151, 152). In silico models of such active transport have been built using ligand-based methods (153, 154). Permeability across the blood–brain barrier is an associated
parameter that is computed for CNS drugs and, for other compounds, as an off-target filter. It is expressed as log BB, the logarithm of the ratio of the concentration of the drug in the brain to that in the blood (155, 156). Several in silico models have been proposed that utilize molecular descriptors such as log P, pKa, TPSA, and molecular weight with a variety of methods such as artificial neural networks (ANN), multiple linear regression (MLR), QSAR, support vector machines (SVM), and other statistical techniques (157–159). Similarly, several in silico models have been proposed to model drug metabolism and toxicity; these are reviewed in a recent report (160).
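As a small worked example of the log BB definition given above, the snippet below computes log BB from a pair of concentrations. The function name and the concentration values are hypothetical; the two concentrations only need to be in the same units.

```python
import math

def log_bb(conc_brain, conc_blood):
    """log BB = log10 of the brain-to-blood concentration ratio."""
    if conc_brain <= 0 or conc_blood <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_brain / conc_blood)

# Hypothetical compound that reaches twice the blood concentration in the brain.
value = log_bb(conc_brain=2.0, conc_blood=1.0)
print(round(value, 3))
```

Positive values indicate enrichment in the brain relative to blood, and negative values indicate the opposite. The models cited above predict this quantity from descriptors instead of measuring it.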
12. Structure-Based Methods
Structure-based methods, as the name suggests, rely on the three-dimensional (3D) structures of the target and the small molecule. The 3D structure of the target can be obtained by experimental methods such as X-ray crystallography or NMR, or by homology modeling. Several review articles provide additional details about the methods and utility of homology models (160–162). Here we discuss the utility of structure-based methods in virtual screening applications.
13. Virtual Screening
High-throughput screening (HTS) has become a common tool in drug discovery for identifying new hits that interact with a given biological target. Virtual screening technologies are applied to either substitute for or aid HTS. Both ligand-based methods, which use similarity to previously identified hits, and structure-based methods, which use existing protein structure information, can be used to perform virtual high-throughput screening (VHTS) to identify potentially active compounds. In ligand-based VHTS, one or several hits must be identified first, for example, from previous HTS experiments using a smaller subset of a ligand library or from previously published hits. Factors such as the set of molecular descriptors, the similarity measure, the size and diversity of the virtual ligand library, and the similarity threshold separating potentially active from inactive compounds are critical to the success of VHTS and must be carefully tuned. Techniques used in ligand-based VHTS include methods based on 2D and 3D similarity. Examples of such methods are substructure comparison (163), shape matching (164), and pharmacophore methods (165–167).
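A minimal sketch of ligand-based VHTS by 2D similarity is given below. It assumes that each compound is represented by the set of "on" bits of a structural fingerprint and uses the Tanimoto coefficient as the similarity measure. The compound names, bit sets, and the threshold of 0.7 are hypothetical; as noted above, the threshold must be tuned for each application.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0

def screen(query_bits, library, threshold=0.7):
    """Return (name, similarity) for library members at or above the threshold,
    ranked from most to least similar to the known hit."""
    scored = [(name, tanimoto(query_bits, bits)) for name, bits in library.items()]
    selected = [item for item in scored if item[1] >= threshold]
    return sorted(selected, key=lambda item: item[1], reverse=True)

# Hypothetical known hit and virtual library.
known_hit = {1, 2, 3, 4}
virtual_library = {
    "cmpd_A": {1, 2, 3, 4, 5},
    "cmpd_B": {1, 2, 9, 10},
    "cmpd_C": {1, 2, 3, 4},
}

hits = screen(known_hit, virtual_library)
print(hits)
```

Lowering the threshold retrieves more diverse compounds at the cost of more false positives. Raising it restricts the selection to close analogs of the known hit.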
14. Scoring Functions
Scoring functions are used to estimate the protein–ligand interaction energy of each docked pose. The pose with the most favorable interaction score represents the predicted bioactive pose and in principle can be used as a starting point for subsequent rational structure-based drug design. Scoring functions can be classified into three types, namely force field-based, knowledge-based, and empirical, according to the type of parameters used for scoring protein–ligand interactions. The force field method uses molecular mechanics energy terms to compute the internal energy of the ligand and the binding energy. However, entropic terms are generally omitted because they are computationally expensive to compute. Various scoring schemes are built on different force fields such as Amber (182), MMFF (183), and Tripos (184). In general, a force field-based scoring function consists of a van der Waals term approximated by a Lennard-Jones potential function and an electrostatics term in the form of a Coulombic potential with a distance-dependent dielectric function to reduce the effect of charge–charge interactions (180). Additional terms can be incorporated to increase the accuracy of the scoring function in cases where the contributions of water molecules or metal ions are distinctly known. Empirical scoring functions are devised to fit known experimental data derived from a number of protein–ligand complexes (185). Regression equations are derived from these known complexes, and the resulting regression coefficients are used to estimate the energetics of other protein–ligand complexes. Scoring schemes employing empirical methods include LUDI (185), Chemscore (186), and F-score (174). Knowledge-based scoring functions score simple pairwise atom interactions based on their environment. A set of known protein–ligand complexes is used to build the knowledge database about the types of interactions that can exist. Because of their simplicity, they are well suited to scoring large databases in relatively short times. Scoring functions that use knowledge-based methods include Drugscore (187), PMF (188), and SMOG (189).
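The general form of a force field-based score described above, a Lennard-Jones term plus a Coulombic term with a distance-dependent dielectric, can be sketched as follows. The choice of dielectric function (ε(r) = 4r), the 12-6 form of the Lennard-Jones potential, and all parameter values are assumptions for illustration and do not correspond to any specific program cited here.

```python
COULOMB_CONSTANT = 332.06  # kcal*Angstrom/(mol*e^2), common molecular mechanics units

def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy (kcal/mol) for one atom pair at distance r (Angstrom)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def coulomb_distance_dependent(r, qi, qj, dielectric_slope=4.0):
    """Coulomb energy with dielectric = dielectric_slope * r (assumed form)."""
    dielectric = dielectric_slope * r
    return COULOMB_CONSTANT * qi * qj / (dielectric * r)

def force_field_score(pairs):
    """Sum both terms over protein-ligand atom pairs.

    Each pair is a tuple (r, epsilon, sigma, qi, qj).
    """
    return sum(
        lennard_jones(r, eps, sig) + coulomb_distance_dependent(r, qi, qj)
        for r, eps, sig, qi, qj in pairs
    )

# Hypothetical atom pairs between a docked ligand and its binding pocket.
pose_pairs = [
    (3.8, 0.15, 3.4, 0.0, 0.0),    # apolar contact
    (2.9, 0.20, 3.0, -0.5, 0.4),   # polar contact
]
print(round(force_field_score(pose_pairs), 3))
```

Making the dielectric grow with distance damps long-range charge–charge interactions, which is the effect the distance-dependent dielectric is intended to have. Entropic contributions are absent from this sketch, as they are from most force field-based scores.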
Each of the scoring schemes mentioned has its advantages and disadvantages. Hence the concept of consensus scoring was introduced to limit the dependency on any one scheme (190). A number of publications in the literature describe comparative studies employing different docking and scoring schemes (168, 180). There is no set rule for combining scoring schemes; deriving a consensus score should be customized to every application to limit amplifying errors and to balance the right set of parameters that can be useful to identify the correctly
Fig. 3. Illustration of the unbound receptor plus unbound ligand in equilibrium with the
receptor–ligand complex (the bound state).
\[ \Delta G = G_B - G_A = -k_B T \ln \frac{Z_B}{Z_A}. \tag{5} \]
The equations below describe the relation to chemical equilibrium (Fig. 3). $K_A$ is the equilibrium constant for association of receptor (R) and ligand (L) to the complex (RL). The dissociation of the complex (RL) back to receptor (R) and ligand (L) is described by the dissociation constant $K_D$, which is also the inhibition constant $K_i$.

\[ R + L \rightleftharpoons RL, \qquad K_A = \frac{[RL]}{[R][L]} \]

\[ RL \rightleftharpoons R + L, \qquad K_D = K_i = \frac{[R][L]}{[RL]} \]
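As a worked illustration of how a dissociation constant translates into a binding free energy, the sketch below uses the standard thermodynamic relation ΔG° = RT ln(K_D / c°) with a 1 M standard state. This relation is assumed here and is not quoted from the equations above; the function name and example values are hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 1.98720e-3  # kcal/(mol*K)

def binding_free_energy_from_kd(kd_molar, temperature_kelvin=298.15):
    """Standard binding free energy (kcal/mol) from K_D in mol/L (1 M standard state)."""
    if kd_molar <= 0:
        raise ValueError("K_D must be positive")
    return GAS_CONSTANT_KCAL * temperature_kelvin * math.log(kd_molar)

# Hypothetical inhibitors with nanomolar and micromolar affinity.
dg_nanomolar = binding_free_energy_from_kd(1e-9)
dg_micromolar = binding_free_energy_from_kd(1e-6)
print(round(dg_nanomolar, 2), round(dg_micromolar, 2))
```

At room temperature each tenfold improvement in K_D corresponds to roughly 1.4 kcal/mol of additional binding free energy, which gives a sense of the accuracy a free energy method needs in order to rank ligands.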
17. Linear Interaction Energy (LIE)
Two methods, LIE and Molecular Mechanics-Poisson Boltzmann-Surface Area (MM-PBSA), are often referred to as “endpoint” methods because they neglect the intermediate states in the transition (192). The LIE method developed by Åqvist uses a scaling factor β, based on the linear response approximation, for the electrostatic component (197, 198), while estimating the van der Waals term using a scaling factor α (199). This approach only considers the endpoints: the bound ligand and the unbound or “free” ligand.

\[ \Delta G_{bind} = \beta \left( \langle V^{elec}_{bound} \rangle - \langle V^{elec}_{unbound} \rangle \right) + \alpha \left( \langle V^{vdw}_{bound} \rangle - \langle V^{vdw}_{unbound} \rangle \right) + \gamma. \tag{8} \]

The optional parameter γ is a constant term used to account for the medium when computing absolute free energies. The values of α, β, and γ depend on the nature of the binding pocket, the functional groups of the ligands (Table 1), the force field, and the solvent model (200–205). Several studies in the literature have proposed the use of LIE methods to study binding or to efficiently design small-molecule inhibitors; examples include antimalarials (206), the antibiofilm agent Dispersin B (207), HIV-1 reverse transcriptase inhibitors (208), glucose binding to insulin (209), BACE-1 inhibitors (210), tubulin (211), and CDK-2 inhibitors (212). The LIE method continues to evolve and grow as
Table 1
Optimal β parameters based on compound type (203)

Compound            β
Alcohols            0.37
1° Amides           0.41
1°, 2° Amines       0.39
Carboxylic acids    0.40
Cations             0.52
Anions              0.45
Other compounds     0.43
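The LIE estimate of Eq. (8) can be sketched in a few lines once interaction energies have been sampled for the bound and unbound ligand. The β values below are those of Table 1; the value α = 0.18, the default γ = 0, the dictionary keys, and the sampled energies are assumptions made for illustration only.

```python
from statistics import fmean

# Electrostatic scaling factors from Table 1, keyed by compound type.
BETA_BY_COMPOUND_TYPE = {
    "alcohol": 0.37,
    "primary_amide": 0.41,
    "primary_or_secondary_amine": 0.39,
    "carboxylic_acid": 0.40,
    "cation": 0.52,
    "anion": 0.45,
    "other": 0.43,
}

def lie_binding_free_energy(elec_bound, elec_unbound, vdw_bound, vdw_unbound,
                            beta, alpha=0.18, gamma=0.0):
    """Eq. (8): scaled differences of average ligand-surrounding energies.

    Each of the first four arguments is a sequence of energies (kcal/mol)
    sampled from simulations of the bound or unbound ligand.
    alpha=0.18 and gamma=0.0 are illustrative defaults, not values from the text.
    """
    d_elec = fmean(elec_bound) - fmean(elec_unbound)
    d_vdw = fmean(vdw_bound) - fmean(vdw_unbound)
    return beta * d_elec + alpha * d_vdw + gamma

# Hypothetical sampled energies for an alcohol-containing ligand.
dg_bind = lie_binding_free_energy(
    elec_bound=[-42.0, -40.0, -41.0],
    elec_unbound=[-35.0, -36.0, -34.0],
    vdw_bound=[-30.0, -31.0, -29.0],
    vdw_unbound=[-18.0, -17.0, -19.0],
    beta=BETA_BY_COMPOUND_TYPE["alcohol"],
)
print(round(dg_bind, 2))
```

Only two simulations are needed per ligand (bound and free), which is why the method is grouped with the endpoint approaches. The transferability of α, β, and γ between systems should be checked for each application.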
18. Conclusions
Acknowledgments
We would like to thank our collaborators for their views and comments on the manuscript and Bharat Kumar Stanam for his help in designing figures. SK is funded by an American Heart Association Scientist Development Grant.
References
1. Figeys D (2004) Combining different ‘omics’ technologies to map and validate protein–protein interactions in humans. Brief Funct Genomic Proteomic 2:357–365
2. Cusick ME, Klitgord N, Vidal M, Hill DE (2005) Interactome: gateway into systems biology. Hum Mol Genet 14(Spec No. 2):R171–R181
3. Chakravarti B, Mallik B, Chakravarti DN (2010) Proteomics and systems biology: application in drug discovery and development. Methods Mol Biol 662:3–28
4. Butcher EC, Berg EL, Kunkel EJ (2004) Systems biology in drug discovery. Nat Biotechnol 22:1253–1259
5. Cho CR, Labow M, Reinhardt M, van Oostrum J, Peitsch MC (2006) The application of systems biology to drug discovery. Curr Opin Chem Biol 10:294–302
6. Chen C, McGarvey PB, Huang H, Wu CH (2010) Protein bioinformatics infrastructure for the integration and analysis of multiple high-throughput “omics” data. Adv Bioinform 423589:19
7. Gund P, Maliski E, Brown F (2005) Editorial overview: whither the pharmaceutical industry? Curr Opin Drug Discov Dev 8:296–297
8. Brown FK (1998) Chemoinformatics: what is it and how does it impact drug discovery. Annu Rep Med Chem 33:9
9. Gasteiger J, Engel T (2004) Chemoinformatics: a textbook. Wiley, Weinheim
10. Hopkins AL (2008) Network pharmacology: the next paradigm in drug discovery. Nat Chem Biol 4:682–690
11. Metz JT, Hajduk PJ (2010) Rational approaches to targeted polypharmacology: creating and navigating protein-ligand interaction networks. Curr Opin Chem Biol 14:498–504
12. Morrow JK, Tian L, Zhang S (2010) Molecular networks in drug discovery. Crit Rev Biomed Eng 38:143–156
13. Scheibye-Alsing K, Hoffmann S, Frankel A, Jensen P, Stadler PF, Mang Y, Tommerup N, Gilchrist MJ, Nygard AB, Cirera S, Jorgensen CB, Fredholm M, Gorodkin J (2009) Sequence assembly. Comput Biol Chem 33:121–136
14. Huang X (2002) Bioinformatics support for genome sequencing projects. In: Lengauer T (ed) Bioinformatics—from genomes to drugs. Wiley-VCH Verlag GmbH, Weinheim
15. Mihara M, Itoh T, Izawa T (2010) SALAD database: a motif-based database of protein annotations for plant comparative genomics. Nucleic Acids Res 38:D835–D842
16. Katayama S, Kanamori M, Hayashizaki Y (2004) Integrated analysis of the genome and the transcriptome by FANTOM. Brief Bioinform 5:249–258
17. Blanchette M (2007) Computation and analysis of genomic multi-sequence alignments. Annu Rev Genomics Hum Genet 8:193–213
18. Mungall CJ, Misra S, Berman BP, Carlson J, Frise E, Harris N, Marshall B, Shu S, Kaminker JS, Prochnik SE, Smith CD, Smith E, Tupy JL, Wiel C, Rubin GM, Lewis SE (2002) An integrated computational pipeline and database to support whole-genome sequence annotation. Genome Biol 3:RESEARCH0081
19. Lewis SE, Searle SM, Harris N, Gibson M, Iyer V, Richter J, Wiel C, Bayraktaroglu L, Birney E, Crosby MA, Kaminker JS, Matthews BB, Prochnik SE, Smith CD, Tupy JL, Rubin GM, Misra S, Mungall CJ, Clamp ME (2002) Apollo: a sequence annotation editor. Genome Biol 3:RESEARCH0082
20. Pirovano W, Heringa J (2010) Protein secondary structure prediction. Methods Mol Biol 609:327–348
21. Cozzetto D, Tramontano A (2008) Advances and pitfalls in protein structure prediction. Curr Protein Pept Sci 9:567–577
22. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410
23. Pearson WR (1990) Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol 183:63–98
24. Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680
25. Lassmann T, Sonnhammer EL (2005) Kalign—an accurate and fast multiple sequence alignment algorithm. BMC Bioinform 6:298
26. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797
27. Notredame C, Higgins DG, Heringa J (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol 302:205–217
28. Tusnady GE, Simon I (1998) Principles governing amino acid composition of integral membrane proteins: application to topology prediction. J Mol Biol 283:489–506
29. Rost B, Liu J (2003) The PredictProtein server. Nucleic Acids Res 31:3300–3304
30. Cole C, Barber JD, Barton GJ (2008) The Jpred 3 secondary structure prediction server. Nucleic Acids Res 36:W197–W201
31. Guzzo AV (1965) The influence of amino-acid sequence on protein structure. Biophys J 5:809–822
32. Chou PY, Fasman GD (1974) Prediction of protein conformation. Biochemistry 13:222–245
33. Bonneau R, Baker D (2001) Ab initio protein structure prediction: progress and prospects. Annu Rev Biophys Biomol Struct 30:173–189
34. Simons KT, Kooperberg C, Huang E, Baker D (1997) Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol 268:209–225
35. Hardin C, Pogorelov TV, Luthey-Schulten Z (2002) Ab initio protein structure prediction. Curr Opin Struct Biol 12:176–181
36. Sali A, Blundell TL (1993) Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 234:779–815
37. Kriwacki RW, Wu J, Tennant L, Wright PE, Siuzdak G (1997) Probing protein structure using biochemical and biophysical methods. Proteolysis, matrix-assisted laser desorption/ionization mass spectrometry, high-performance liquid chromatography and size-exclusion chromatography of p21Waf1/Cip1/Sdi1. J Chromatogr A 777:23–30
38. Kasprzak AA (2007) The use of FRET in the analysis of motor protein structure. Methods Mol Biol 392:183–197
39. Takeda-Shitaka M, Takaya D, Chiba C, Tanaka H, Umeyama H (2004) Protein structure prediction in structure based drug design. Curr Med Chem 11:551–558
40. Wlodawer A, Erickson JW (1993) Structure-based inhibitors of HIV-1 protease. Annu Rev Biochem 62:543–585
41. Kortagere S, Welsh WJ, Morrisey JM, Daly T, Ejigiri I, Sinnis P, Vaidya AB, Bergman LW (2010) Structure-based design of novel small-molecule inhibitors of Plasmodium falciparum. J Chem Inf Model 50:840–849
42. Kortagere S, Mui E, McLeod R, Welsh WJ (2011) Rapid discovery of inhibitors of Toxoplasma gondii using hybrid structure-based computational approach. J Comput Aided Mol Des 25(5):403–411
43. Kortagere S, Madani N, Mankowski MK, Schön A, Zentner I, Swaminathan G, Princiotto A, Anthony K, Oza A, Sierra LJ, Passic SR, Wang X, Jones DM, Stavale E, Krebs FC, Martín-García J, Freire E, Ptak RG, Sodroski J, Cocklin S, Smith AB 3rd (2012) Inhibiting early-stage events in HIV-1 replication by small-molecule targeting of the HIV-1 capsid. J Virol 86(16):8472–8481
44. Pazos F, Valencia A (2008) Protein co-evolution, co-adaptation and interactions. EMBO J 27:2648–2655
45. Hu CD, Chinenov Y, Kerppola TK (2002) Visualization of interactions among bZIP and Rel family proteins in living cells using bimolecular fluorescence complementation. Mol Cell 9:789–798
46. Chien CT, Bartel PL, Sternglanz R, Fields S (1991) The two-hybrid system: a method to identify and clone genes for proteins that interact with a protein of interest. Proc Natl Acad Sci U S A 88:9578–9582
47. Selbach M, Mann M (2006) Protein interaction screening by quantitative immunoprecipitation combined with knockdown (QUICK). Nat Methods 3:981–983
48. Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dumpelfeld B, Edelmann A, Heurtier MA, Hoffman V, Hoefert C, Klein K, Hudak M, Michon AM, Schelder M, Schirle M, Remor M, Rudi T, Hooper S, Bauer A, Bouwmeester T, Casari G, Drewes G, Neubauer G, Rick JM, Kuster B, Bork P, Russell RB, Superti-Furga G (2006) Proteome survey reveals modularity of the yeast cell machinery. Nature 440:631–636
49. Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO (1999) Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci U S A 96:4285–4288
50. Dandekar T, Snel B, Huynen M, Bork P (1998) Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci 23:324–328
51. Tan SH, Zhang Z, Ng SK (2004) ADVICE: automated detection and validation of interaction by co-evolution. Nucleic Acids Res 32:W69–W72
52. Aloy P, Russell RB (2003) InterPreTS: protein interaction prediction through tertiary structure. Bioinformatics 19:161–162
53. Aytuna AS, Gursoy A, Keskin O (2005) Prediction of protein-protein interactions by combining structure and sequence conservation in protein interfaces. Bioinformatics 21:2850–2855
54. Bader GD, Donaldson I, Wolting C, Ouellette BF, Pawson T, Hogue CW (2001) BIND—the biomolecular interaction network database. Nucleic Acids Res 29:242–245
55. Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M (2006) BioGRID: a general repository for interaction datasets. Nucleic Acids Res 34:D535–D539
56. Xenarios I, Salwinski L, Duan XJ, Higney P, Kim SM, Eisenberg D (2002) DIP, the database of interacting proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res 30:303–305
57. Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, Surendranath V, Niranjan V, Muthusamy B, Gandhi TK, Gronborg M, Ibarrola N, Deshpande N, Shanker K, Shivashankar HN, Rashmi BP, Ramya MA, Zhao Z, Chandrika KN, Padma N, Harsha HC, Yatish AJ, Kavitha MP, Menezes M, Choudhury DR, Suresh S, Ghosh N, Saravana R, Chandran S, Krishna S, Joy M, Anand SK, Madavan V, Joseph A, Wong GW, Schiemann WP, Constantinescu SN, Huang L, Khosravi-Far R, Steen H, Tewari M, Ghaffari S, Blobe GC, Dang CV, Garcia JG, Pevsner J, Jensen ON, Roepstorff P, Deshpande KS, Chinnaiyan AM, Hamosh A, Chakravarti A, Pandey A (2003) Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res 13:2363–2371
58. Kerrien S, Alam-Faruque Y, Aranda B, Bancarz I, Bridge A, Derow C, Dimmer E, Feuermann M, Friedrichsen A, Huntley R, Kohler C, Khadake J, Leroy C, Liban A, Lieftink C, Montecchi-Palazzi L, Orchard S, Risse J, Robbe K, Roechert B, Thorneycroft D, Zhang Y, Apweiler R, Hermjakob H (2007) IntAct—open source resource for molecular interaction data. Nucleic Acids Res 35:D561–D565
59. Zanzoni A, Montecchi-Palazzi L, Quondam M, Ausiello G, Helmer-Citterich M, Cesareni G (2002) MINT: a molecular INTeraction database. FEBS Lett 513:135–140
60. Ogmen U, Keskin O, Aytuna AS, Nussinov R, Gursoy A (2005) PRISM: protein interactions by structural matching. Nucleic Acids Res 33:W331–W336
61. Keskin O, Ma B, Nussinov R (2005) Hot regions in protein–protein interactions: the organization and contribution of structurally conserved hot spot residues. J Mol Biol 345:1281–1294
62. Chen YC, Lo YS, Hsu WC, Yang JM (2007) 3D-partner: a web server to infer interacting partners and binding models. Nucleic Acids Res 35:W561–W567
63. Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, Chung S, Emili A, Snyder M, Greenblatt JF, Gerstein M (2003) A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science 302:449–453
64. Monk NA (2003) Unravelling nature’s networks. Biochem Soc Trans 31:1457–1461
65. Uetz P, Finley RL Jr (2005) From protein networks to biological systems. FEBS Lett 579:1821–1827
66. Schrattenholz A, Groebe K, Soskic V (2010) Systems biology approaches and tools for analysis of interactomes and multi-target drugs. Methods Mol Biol 662:29–58
67. Lowe JA, Jones P, Wilson DM (2010) Network biology as a new approach to drug discovery. Curr Opin Drug Discov Dev 13:524–526
68. Kell DB (2006) Systems biology, metabolic modelling and metabolomics in drug discovery and development. Drug Discov Today 11:1085–1092
69. Xie L, Bourne PE (2011) Structure-based systems biology for analyzing off-target binding. Curr Opin Struct Biol 21(2):189–199
70. Imoto S, Higuchi T, Goto T, Tashiro K, Kuhara S, Miyano S (2003) Combining microarrays and biological knowledge for estimating gene networks via Bayesian networks. Proc IEEE Comput Soc Bioinform Conf 2:104–113
71. Needham CJ, Manfield IW, Bulpitt AJ, Gilmartin PM, Westhead DR (2009) From gene expression to gene regulatory networks in Arabidopsis thaliana. BMC Syst Biol 3:85
72. Otero JM, Papadakis MA, Udatha DB, Nielsen J, Panagiotou G (2010) Yeast biological networks unfold the interplay of antioxidants, genome and phenotype, and reveal a novel regulator of the oxidative stress response. PLoS One 5:e13606
73. Teusink B, Westerhoff HV, Bruggeman FJ (2010) Comparative systems biology: from bacteria to man. Wiley Interdiscip Rev Syst Biol Med 2:518–532
74. Neumuller RA, Perrimon N (2010) Where gene discovery turns into systems biology: genome-scale RNAi screens in Drosophila. Wiley Interdiscip Rev Syst Biol Med 3:471–478
75. Bier E, Bodmer R (2004) Drosophila, an emerging model for cardiac disease. Gene 342:1–11
76. Gianchandani EP, Chavali AK, Papin JA (2010) The application of flux balance analysis in systems biology. Wiley Interdiscip Rev Syst Biol Med 2:372–382
77. Neves SR, Iyengar R (2009) Models of spatially restricted biochemical reaction systems. J Biol Chem 284:5445–5449
78. Czock D, Markert C, Hartman B, Keller F (2009) Pharmacokinetics and pharmacodynamics of antimicrobial drugs. Exp Opin Drug Metab Toxicol 5:475–487
79. Chien JY, Friedrich S, Heathman MA, de Alwis DP, Sinha V (2005) Pharmacokinetics/pharmacodynamics and the stages of drug development: role of modeling and simulation. AAPS J 7:E544–E559
80. Danhof M, de Jongh J, De Lange EC, Della Pasqua O, Ploeger BA, Voskuyl RA (2007) Mechanism-based pharmacokinetic-pharmacodynamic modeling: biophase distribution, receptor theory, and dynamical systems analysis. Annu Rev Pharmacol Toxicol 47:357–400
81. Paul Lee WN, Wahjudi PN, Xu J, Go VL (2010) Tracer-based metabolomics: concepts and practices. Clin Biochem 43:1269–1277
82. Chipman KC, Singh AK (2011) Using stochastic causal trees to augment Bayesian networks for modeling eQTL datasets. BMC Bioinform 12:7
83. Hou L, Wang L, Qian M, Li D, Tang C, Zhu Y, Deng M, Li F (2011) Modular analysis of the probabilistic genetic interaction network. Bioinformatics 27:853
84. Villar HO, Hansen MR (2009) Mining and visualizing the chemical content of large databases. Curr Opin Drug Discov Dev 12:367–375
85. Langer T, Hoffmann R, Bryant S, Lesur B (2009) Hit finding: towards ‘smarter’ approaches. Curr Opin Pharmacol 9:589–593
86. Fujita T, Hansch C (1967) Analysis of the structure-activity relationship of the sulfonamide drugs using substituent constants. J Med Chem 10(6):991–1000
87. Golbraikh A, Tropsha A (2002) Beware of q²! J Mol Graph Model 20:269–276
88. Kubinyi H (2002) High throughput in drug discovery. Drug Discov Today 7:707–709
89. Kubinyi H, Hamprecht FA, Mietzner T (1998) Three-dimensional quantitative similarity-activity relationships (3D QSiAR) from SEAL similarity matrices. J Med Chem 41:2553–2564
90. Free SM Jr, Wilson JW (1964) A mathematical contribution to structure-activity studies. J Med Chem 7:395–399
91. Brown RD, Martin YC (1996) Use of structure activity data to compare structure-based clustering methods and descriptors for use in compound selection. J Chem Inform Comput Sci 36:12
92. Brown RD, Martin YC (1997) The information content of 2D and 3D structural descriptors relevant to ligand-receptor binding. J Chem Inform Comput Sci 37:9
93. Cramer RD, Patterson DE, Bunce JD (1988) Comparative molecular-field analysis (Comfa). 1. Effect of shape on binding of steroids to carrier proteins. J Am Chem Soc 110:8
94. Bravi G, Gancia E, Mascagni P, Pegna M, 107. Worth AP, Bassan A, De Bruijn J, Gallegos
Todeschini R, Zaliani A (1997) MS-WHIM, Saliner A, Netzeva T, Patlewicz G, Pavan M,
new 3D theoretical descriptors derived from Tsakovska I, Eisenreich S (2007) The role of
molecular surface properties: a comparative the European Chemicals Bureau in promot-
3D QSAR study in a series of steroids. ing the regulatory use of (Q)SAR methods.
J Comput Aided Mol Des 11:79–92 SAR QSAR Environ Res 18:111–125
95. Belvisi L, Bravi G, Scolastico C, Vulpetti A, 108. Leach AR, Gillet VJ, Lewis RA, Taylor R
Salimbeni A, Todeschini R (1994) A 3D (2010) Three-dimensional pharmacophore
QSAR approach to the search for geometrical methods in drug discovery. J Med Chem
similarity in a series of nonpeptide angiotensin 53:539–558
II receptor antagonists. J Comput Aided Mol 109. Martin YC, Bures MG, Danaher EA, DeLaz-
Des 8:211–220 zer J, Lico I, Pavlik PA (1993) A fast new
96. Silverman BD, Platt DE (1996) Comparative approach to pharmacophore mapping and its
molecular moment analysis (CoMMA): 3D- application to dopaminergic and benzodiaze-
QSAR without molecular superposition. pine agonists. J Comput Aided Mol Des
J Med Chem 39:2129–2140 7:83–102
97. Hopfinger AJ, Wang S, Tokarski JS, Jin B, 110. Jones G, Willett P, Glen RC (1995) A genetic
Albuquerque M, Madhav PJ, Duraiswami C algorithm for flexible molecular overlay and
(1997) Construction of 3D-QSAR models pharmacophore elucidation. J Comput Aided
using the 4D-QSAR analysis formalism. Mol Des 9:532–549
J Am Chem Soc 119:15 111. Chang C, Swaan PW (2006) Computational
98. Vedani A, Briem H, Dobler M, Dollinger H, approaches to modeling drug transporters.
McMasters DR (2000) Multiple- Eur J Pharm Sci 27:411–424
conformation and protonation-state repre- 112. Patel Y, Gillet VJ, Bravi G, Leach AR (2002)
sentation in 4D-QSAR: the neurokinin-1 A comparison of the pharmacophore identifi-
receptor system. J Med Chem 43:4416–4427 cation programs: Catalyst, DISCO and GASP.
99. Lukacova V, Balaz S (2003) Multimode J Comput Aided Mol Des 16:653–681
ligand binding in receptor site modeling: 113. Ekins S, Johnston JS, Bahadduri P, D’Souza
implementation in CoMFA. J Chem Inf VM, Ray A, Chang C, Swaan PW (2005) In
Comput Sci 43:2093–2105 vitro and pharmacophore-based discovery of
100. Lill MA, Vedani A, Dobler M (2004) Raptor: novel hPEPT1 inhibitors. Pharm Res
combining dual-shell representation, 22:512–517
induced-fit simulation, and hydrophobicity 114. Chang C, Bahadduri PM, Polli JE, Swaan PW,
scoring in receptor modeling: application Ekins S (2006) Rapid identification of
toward the simulation of structurally diverse P-glycoprotein substrates and inhibitors.
ligand sets. J Med Chem 47:6174–6186 Drug Metab Dispos 34:1976–1984
101. Lill MA, Dobler M, Vedani A (2006) Predic- 115. Ekins S, Kim RB, Leake BF, Dantzig AH,
tion of small-molecule binding to cytochrome Schuetz EG, Lan LB, Yasuda K, Shepard RL,
P450 3A4: flexible docking combined with Winter MA, Schuetz JD, Wikel JH, Wrighton
multidimensional QSAR. ChemMedChem SA (2002) Application of three-dimensional
1:73–81 quantitative structure-activity relationships of
102. Vedani A, Dobler M, Lill MA (2005) Com- P-glycoprotein inhibitors and substrates. Mol
bining protein modeling and 6D-QSAR. Pharmacol 61:974–981
Simulating the binding of structurally diverse 116. Ekins S, Kim RB, Leake BF, Dantzig AH,
ligands to the estrogen receptor. J Med Chem Schuetz EG, Lan LB, Yasuda K, Shepard RL,
48:3700–3703 Winter MA, Schuetz JD, Wikel JH, Wrighton
103. Vedani A, Dobler M (2002) 5D-QSAR: the SA (2002) Three-dimensional quantitative
key for simulating induced fit? J Med Chem structure-activity relationships of inhibitors
45:2139–2149 of P-glycoprotein. Mol Pharmacol
104. Norinder U (2005) In silico modelling of 61:964–973
ADMET-a minireview of work from 2000 to 117. Bednarczyk D, Ekins S, Wikel JH, Wright SH
2004. SAR QSAR Environ Res 16:1–11 (2003) Influence of molecular structure on
105. Yoshida F, Topliss JG (2000) QSAR model substrate binding to the human organic cat-
for drug human oral bioavailability. J Med ion transporter, hOCT1. Mol Pharmacol
Chem 43:2575–2585 63:489–498
106. Martin YC (2005) A bioavailability score. 118. Chang C, Pang KS, Swaan PW, Ekins S
J Med Chem 48:3164–3170 (2005) Comparative pharmacophore
3 Role of Computational Methods in Pharmaceutical Sciences 45
modeling of organic anion transporting poly- method for efficient screening of ligands
peptides: a meta-analysis of rat Oatp1a1 and binding to G-protein coupled receptors.
human OATP1B1. J Pharmacol Exp Ther J Comput Aided Mol Des 20:789–802
314:533–541 131. Ekins S, Waller CL, Swaan PW, Cruciani G,
119. Suhre WM, Ekins S, Chang C, Swaan PW, Wrighton SA, Wikel JH (2000) Progress in
Wright SH (2005) Molecular determinants predicting human ADME parameters in
of substrate/inhibitor binding to the human silico. J Pharmacol Toxicol Methods
and rabbit renal organic cation transporters 44:251–272
hOCT2 and rbOCT2. Mol Pharmacol 132. Ekins S, Ring BJ, Grace J, McRobie-Belle DJ,
67:1067–1077 Wrighton SA (2000) Present and future
120. Ekins S, Swaan PW (2004) Computational in vitro approaches for drug metabolism.
models for enzymes, transporters, channels J Pharm Tox Methods 44:313–324
and receptors relevant to ADME/TOX. Rev 133. Ekins S, Ring BJ, Bravi G, Wikel JH,
Comp Chem 20:333–415 Wrighton SA (2000) Predicting drug-drug
121. Clement OO, Mehl AT (2000) HipHop: interactions in silico using pharmacophores:
pharmacophore based on multiple common- a paradigm for the next millennium. In:
feature alignments. IUL, San Diego, CA Guner OF (ed) Pharmacophore perception,
122. Evans DA, Doman TN, Thorner DA, Bodkin development, and use in drug design. IUL,
MJ (2007) 3D QSAR methods: phase and San Diego, pp 269–299
catalyst compared. J Chem Inf Model 134. Ekins S, Obach RS (2000) Three
47:1248–1257 dimensional-quantitative structure activity
123. Bahadduri PM, Polli JE, Swaan PW, Ekins S relationship computational approaches of pre-
(2010) Targeting drug transporters—com- diction of human in vitro intrinsic clearance.
bining in silico and in vitro approaches to J Pharmacol Exp Ther 295:463–473
predict in vivo. Methods Mol Biol 135. Ekins S, Bravi G, Binkley S, Gillespie JS, Ring
637:65–103 BJ, Wikel JH, Wrighton SA (2000) Three and
124. Ekins S, Ecker GF, Chiba P, Swaan PW four dimensional-quantitative structure activ-
(2007) Future directions for drug transporter ity relationship (3D/4D-QSAR) analyses of
modeling. Xenobiotica 37:1152–1170 CYP2C9 inhibitors. Drug Metab Dispos
125. Diao L, Ekins S, Polli JE (2010) Quantitative 28:994–1002
structure activity relationship for inhibition of 136. Ekins S, Bravi G, Wikel JH, Wrighton SA
human organic cation/carnitine transporter. (1999) Three dimensional quantitative struc-
Mol Pharm 7(6):2120–2131 ture activity relationship (3D-QSAR) analysis
126. Zheng X, Ekins S, Rauffman J-P, Polli JE of CYP3A4 substrates. J Pharmacol Exp Ther
(2009) Computational models for drug inhibi- 291:424–433
tion of the human apical sodium-dependent 137. Ekins S, Bravi G, Ring BJ, Gillespie TA, Gil-
bile acid transporter. Mol Pharm 6:1591–1603 lespie JS, VandenBranden M, Wrighton SA,
127. Diao L, Ekins S, Polli JE (2009) Novel inhi- Wikel JH (1999) Three dimensional-
bitors of human organic cation/carnitine quantitative structure activity relationship
transporter (hOCTN2) via computational analyses of substrates for CYP2B6. J Pharm
modeling and in vitro testing. Pharm Res Exp Ther 288:21–29
26:1890–1900 138. Ekins S, Bravi G, Binkley S, Gillespie JS, Ring
128. Gao Q, Yang L, Zhu Y (2010) Pharmaco- BJ, Wikel JH, Wrighton SA (1999) Three and
phore based drug design approach as a practi- four dimensional-quantitative structure activ-
cal process in drug discovery. Curr Comput ity relationship (3D/4D-QSAR) analyses of
Aided Drug Des 6:37–49 CYP2D6 inhibitors. Pharmacogenetics
9:477–489
129. Keri G, Szekelyhidi Z, Banhegyi P, Varga Z,
Hegymegi-Barakonyi B, Szantai-Kis C, 139. Ekins S, Bravi G, Binkley S, Gillespie JS, Ring
Hafenbradl D, Klebl B, Muller G, Ullrich A, BJ, Wikel JH, Wrighton SA (1999) Three and
Eros D, Horvath Z, Greff Z, Marosfalvi J, four dimensional-quantitative structure activ-
Pato J, Szabadkai I, Szilagyi I, Szegedi Z, ity relationship analyses of CYP3A4 inhibi-
Varga I, Waczek F, Orfi L (2005) Drug dis- tors. J Pharm Exp Ther 290:429–438
covery in the kinase inhibitory field using the 140. Lagorce D, Sperandio O, Galons H, Miteva
Nested Chemical Library technology. Assay MA, Villoutreix BO (2008) FAF-Drugs2: free
Drug Dev Technol 3:543–551 ADME/tox filtering tool to assist drug dis-
130. Kortagere S, Welsh WJ (2006) Development covery and chemical biology projects. BMC
and application of hybrid structure based Bioinform 9:396
46 S. Kortagere et al.
141. Villoutreix BO, Renault N, Lagorce D, maceutical research and development. Wiley,
Sperandio O, Montes M, Miteva MA (2007) Hoboken, NJ, pp 495–512
Free resources to assist structure-based virtual 154. Chang C, Ekins S, Bahadduri P, Swaan PW
ligand screening experiments. Curr Protein (2006) Pharmacophore-based discovery of
Pept Sci 8:381–411 ligands for drug transporters. Adv Drug Del
142. Ekins S (2007) Computational toxicology: Rev 58:1431–1450
risk assessment for pharmaceutical and envi- 155. Hamilton RD, Foss AJ, Leach L (2007)
ronmental chemicals. Wiley, Hoboken, NJ Establishment of a human in vitro model of
143. Wang J, Hou T (2009) Recent advances on in the outer blood-retinal barrier. J Anat
silico ADME modeling. Annu Rep Comput 211:707–716
Chem 5:101–127 156. Loscher W, Potschka H (2005) Drug resistance
144. Jorgensen WL, Duffy EM (2002) Prediction in brain diseases and the role of drug efflux
of drug solubility from structure. Adv Drug transporters. Nat Rev Neurosci 6:591–602
Deliv Rev 54:355–366 157. Abraham MH, Ibrahim A, Zhao Y, Acree WE
145. Wang J, Hou T, Xu X (2009) Aqueous solu- Jr (2006) A data base for partition of volatile
bility prediction based on weighted atom type organic compounds and drugs from blood/
counts and solvent accessible surface areas. plasma/serum to brain, and an LFER analysis
J Chem Inf Model 49:571–581 of the data. J Pharm Sci 95:2091–2100
146. Delaney JS (2005) Predicting aqueous solu- 158. Kortagere S, Chekmarev D, Welsh WJ, Ekins
bility from structure. Drug Discov Today S (2008) New predictive models for blood-
10:289–295 brain barrier permeability of drug-like mole-
147. Votano JR, Parham M, Hall LH, Kier LB, cules. Pharm Res 25:1836–1845
Hall LM (2004) Prediction of aqueous solu- 159. Zhang L, Zhu H, Oprea TI, Golbraikh A, Trop-
bility based on large datasets using several sha A (2008) QSAR modeling of the blood-
QSPR models utilizing topological structure brain barrier permeability for diverse organic
representation. Chem Biodivers compounds. Pharm Res 25:1902–1914
1:1829–1841 160. Kortagere S, Ekins S (2010) Troubleshooting
148. Hou TJ, Zhang W, Xia K, Qiao XB, Xu XJ computational methods in drug discovery.
(2004) ADME evaluation in drug discovery. J Pharmacol Toxicol Methods 61:67–75
5. Correlation of Caco-2 permeation with 161. Grant MA (2009) Protein structure predic-
simple molecular properties. J Chem Inf tion in structure-based ligand design and vir-
Comput Sci 44:1585–1600 tual screening. Comb Chem High
149. Jung E, Kim J, Kim M, Jung DH, Rhee H, Throughput Screen 12:940–960
Shin JM, Choi K, Kang SK, Kim MK, Yun 162. Sjogren B, Blazer LL, Neubig RR (2010)
CH, Choi YJ, Choi SH (2007) Artificial neu- Regulators of G protein signaling proteins as
ral network models for prediction of intestinal targets for drug discovery. Prog Mol Biol
permeability of oligopeptides. BMC Bioin- Transl Sci 91:81–119
form 8:245 163. Willett P (2003) Similarity-based approaches
150. Thomas VH, Bhattachar S, Hitchingham L, to virtual screening. Biochem Soc Trans
Zocharski P, Naath M, Surendran N, Stoner 31:603–606
CL, El-Kattan A (2006) The road map to oral 164. Ebalunode JO, Zheng W (2010) Molecular
bioavailability: an industrial perspective. Exp shape technologies in drug discovery: meth-
Opin Drug Metab Toxicol 2:591–608 ods and applications. Curr Top Med Chem
151. Zheng X, Ekins S, Raufman JP, Polli JE 10:669–679
(2009) Computational models for drug inhi- 165. Horvath D (2011) Pharmacophore-based vir-
bition of the human apical sodium-dependent tual screening. Methods Mol Biol (Clifton,
bile acid transporter. Mol Pharm NJ) 672:261–298
6:1591–1603
166. Yang SY (2010) Pharmacophore modeling
152. Varma MV, Ambler CM, Ullah M, Rotter CJ, and applications in drug discovery: challenges
Sun H, Litchfield J, Fenner KS, El-Kattan AF and recent advances. Drug Discov Today
(2010) Targeting intestinal transporters for 15:444–450
optimizing oral drug absorption. Curr Drug
Metab 11:730–742 167. Ebalunode JO, Zheng W, Tropsha A (2011)
Application of QSAR and shape pharmaco-
153. Chang C, Swaan PW (2006) Computer opti- phore modeling approaches for targeted
mization of biopharmaceutical properties. In: chemical library design. Methods Mol Biol
Ekins S (ed) Computer applications in phar- (Clifton, NJ) 685:111–133
3 Role of Computational Methods in Pharmaceutical Sciences 47
195. Beutler TC, Mark AE, Vanschaik RC, Gerber 206. Orrling KM, Marzahn MR, Gutierrez-de-
PR, van Gunsteren WF (1994) Avoiding sin- Teran H, Aqvist J, Dunn BM, Larhed M
gularities and numerical instabilities in free- (2009) a-Substituted norstatines as the
energy calculations based on molecular simu- transition-state mimic in inhibitors of multi-
lations. Chem Phys Lett 222:529–539 ple digestive vacuole malaria aspartic pro-
196. Zacharias M, Straatsma TP, McCammon JA teases. Bioorg Med Chem 17:5933–5949
(1994) Separation-shifted scaling, a new scal- 207. Kerrigan JE, Ragunath C, Kandra L,
ing method for Lenard-Jones interactions in Gyemant G, Liptak A, Janossy L, Kaplan JB,
thermodynamic integration. J Chem Phys Ramasubbu N (2008) Modeling and
100:9025–9031 biochemical analysis of the activity of antibio-
197. Jorgensen W, Chandrasekhar J, Madura J, film agent Dispersin B. Acta Biol Hung
Klein M (1983) Comparison of simple poten- 59:439–451
tial functions for simulating liquid water. 208. Zhou R, Frienser RA, Ghosh A, Rizzo RC,
J Chem Phys 79:926–935 Jorgensen WL, Levy RM (2001) New linear
198. Berendsen HJ, Postma JP, van Gunsteren WF, interaction method for binding affinity
Hermans J (1981) Interaction models for calculations using a continuum solvent
water in relation to protein hydration. In: Pull- model. J Phys Chem B 105:10388–10397
man B (ed) Intermolecular forces. D. Reidel 209. Zoete V, Meuwly M, Karplus M (2004)
Publishing Co., Dordrecht, pp 331–342 Investigation of glucose binding sites on insu-
199. Åqvist J, Medina C, Samuelsson JE (1994) A lin. Proteins 55:568–581
new method for predicting binding affinity in 210. Liu S, Zhou LH, Wang HQ, Yao ZB (2010)
computer-aided drug design. Protein Eng Superimposing the 27 crystal protein/inhibi-
7:385–391 tor complexes of beta-secretase to calculate
200. Åqvist J, Hansson T (1996) On the validity of the binding affinities by the linear interaction
electrostatic linear response in polar solvents. energy method. Bioorg Med Chem Lett
J Phys Chem 100:9512–9521 20:6533–6537
201. Hansson T, Marelius J, Åqvist J (1998) 211. Alam MA, Naik PK (2009) Applying linear
Ligand binding affinity prediction by linear interaction energy method for binding affinity
interaction energy methods. J Comput calculations of podophyllotoxin analogues
Aided Mol Des 12:27–35 with tubulin using continuum solvent model
202. Almlöf M, Carlsson J, Åqvist J (2007) and prediction of cytotoxic activity. J Mol
Improving the accuracy of the linear interac- Graph Model 27:930–943
tion energy method for solvation free ener- 212. Alzate-Morales JH, Contreras R, Soriano A,
gies. J Chem Theory Comput 3:2162–2175 Tunon I, Silla E (2007) A computational
203. Almlöf, M. (2007) Computational Methods study of the protein-ligand interactions in
for Calculation of Ligand-Receptor Binding CDK2 inhibitors: using quantum mechan-
Affinities Involving Protein and Nucleic Acid ics/molecular mechanics interaction energy
Complexes, In Cell and Molecular Biology, as a predictor of the biological activity. Bio-
p 53, Uppsala University, Uppsala, Sweden phys J 92:430–439
204. Almlof M, Brandsdal BO, Aqvist J (2004) Bind- 213. Wolber G, Langer T (2005) LigandScout:
ing affinity prediction with different force fields: 3-D pharmacophores derived from protein-
examination of the linear interaction energy bound ligands and their use as virtual screen-
method. J Comput Chem 25:1242–1254 ing filters. J Chem Inf Model 45:160–169
205. Jorgensen WL, Maxwell DS, Tirado-Rives J 214. Tan L, Batista J, Bajorath J (2010) Computa-
(1996) Development and testing of the OPLS tional methodologies for compound database
all-atom force field on conformational ener- searching that utilize experimental protein-
getics and properties of organic liquids. J Am ligand interaction information. Chem Biol
Chem Soc 118:11225–11236 Drug Des 76:191–200
Part II
Abstract
Mathematical modeling is a vehicle that allows for explanation and prediction of natural phenomena. In this
chapter we present guidelines and best practices for developing and implementing mathematical models,
using cancer growth, chemotherapy, and immunotherapy modeling as examples.
1. Introduction and Overview
Mathematics is a concise language that encourages clarity of com-
munication. Mathematical modeling is a process that makes use of
the power of mathematics as a language and a tool to develop
helpful descriptions of natural phenomena. Mathematical models
of biological and medical processes are useful for a number of
reasons. These include the following: clarity in communication;
safe hypothesis testing; predictions; treatment personalization;
new treatment protocol testing.
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_4, © Springer Science+Business Media, LLC 2012
52 L.G. de Pillis and A.E. Radunskaya
1.2. Safe Hypothesis Testing
A useful mathematical model may allow one to test the possible
mechanisms behind certain observed phenomena. For example, the
reasons some patients go into remission from cancer and never
relapse, while others do relapse, are not fully understood. A mathe-
matical model, however, can allow us to test the hypothesis that the
strength of a patient’s immune system plays a significant role in
whether or not a patient will experience relapse.
1.5. New Treatment Protocol Testing
A useful mathematical model may allow one to test new medical
interventions: a variety of hypothetical interventions can be ana-
lyzed through the formal model much more quickly, inexpensively
and safely than can be done using clinical trials.
2. Modeling Philosophy
There are two main approaches to developing a mathematical
model from the ground up. One is to start with the most compli-
cated model that includes everything and pare it down. Alterna-
tively, in the Occam’s Razor approach, one starts with the simplest
model possible and then builds it up as necessary. We recommend
using the Occam’s Razor approach, only adding complexity when
the simple model is not sufficient to achieve the desired results.
Paul Sanchez (1) suggests keeping in mind the following
guidelines when developing a mathematical model.
• Start with the simplest model possible that captures the essen-
  tial features of the system of interest.
• Build in small steps: once your simple model is working, add
  features incrementally to make it more realistic. Be sure to only
  add one thing at a time, and test the model after each addition.
• Keep only those additional features that actually improve the
  model: is the more complicated model more useful in answer-
  ing the questions that you need addressed?
• Always compare incremental improvements with previous, sim-
  pler versions of the model. Do not hesitate to go back to earlier
  models, or to start over, considering a different approach.
We can summarize these guidelines in the Goldilocks Principle: a
mathematical model should be not too complicated, but not too
simple either. There is always a trade-off between complexity and
tractability. On the one hand, a highly complicated model can have
many variables and many parameters, making it appear more realistic.
It could be difficult, however, if not impossible, to estimate the large
number of parameters. Each parameter estimate introduces some new
degree of uncertainty, thus potentially diminishing the usefulness of
the model. In addition, it is typically very difficult to mathematically
analyze a system with a large number of variables. On the other hand,
a simpler model might be analytically tractable but too unrealistic: the
simple model may not be able to answer the question of interest.
There is an art to modeling: there is not necessarily one correct
model. Part of this art is to determine which elements are impor-
tant to have in the model, and which elements we can ignore.
Keeping these philosophical guidelines in mind, we view the
modeling process in terms of the following five-step approach.
3. Example: Implementation of the Modeling Process
In this section we illustrate the five-step modeling process by using
it to develop a mathematical model of tumor response to the
immune system. We then extend this model to study the effects of
chemotherapy on the system.
4 Best Practices in Mathematical Modeling 55
[Figure 1: flow diagram linking the Real World and the Model World. Steps shown: 1. Ask the question; 2. Select the modeling approach (Occam's Razor*); 3. Formulate Model World Problem (Mathematical Model: Equations); 4. Analysis, Solutions, Numerics. Validate; 5. Answer the question. A loop labeled "if necessary" leads back to earlier steps.]
*Occam's Razor: "Entia non sunt multiplicanda praeter necessitatem" ("Things should not be multiplied without good reason").
Fig. 1. The Modeling Process. This diagram shows the five steps of the modeling process,
with possible loops to illustrate successive model refinement.
[Figure 2: two panels, "Discrete Time" (left) and "Continuous Time" (right), each plotting Population against Time.]
Fig. 2. Discrete versus continuous models. In a discrete model, the system is defined at a finite set of points (left panel),
while in a continuous model, the system is defined on a continuous interval of time points (right panel).
$$\frac{dE}{dt} = F_1(E, T)$$
$$\frac{dT}{dt} = F_2(E, T),$$
where E denotes the immune cell populations, and T denotes the
tumor cell population.
Remark: Although we focus here on differential equation models,
these are appropriate only when a continuous description of the
variables is reasonable (many cells, many time points). Also,
ordinary differential equations (ODEs) often assume that the
populations are well-mixed. For an example of a model of
tumor-immune interactions that has discrete and stochastic elements,
see ref. (2). In the more complicated model described in the
referenced paper, spatial variation and a variety of tumor-immune
interactions can be explored, at the cost of analytical tractability.
The general modeling philosophy is independent of the choice of
model type (continuous versus discrete), but we emphasize that the
choice of model must be informed by the question being asked, as
well as the analytical tools at hand.
Step 3: Formulate the Model.
Applying our philosophy of starting with the simplest possible
model, we choose to track two cell populations: effector cells and
tumor cells. Thus, our model contains two dependent variables:
T(t) = Tumor cells (number of cells or density),
E(t) = Immune cells that kill tumor cells (Effectors) (number of
cells or density),
and one independent variable: t (time).
The growth of each cell population can be divided into two
components: population growth in isolation, and competitive
interactions between populations. We initially focus on the first
component, and begin by examining how each cell population
might grow if it were isolated from the other.
3.1. Modeling Tumor Growth
The simplest growth process involves the cells dividing at a constant
rate, giving the differential equation:
$$\frac{dT}{dt} = kT, \quad k > 0.$$
This equation has the solution:
$$T(t) = T_0 e^{kt},$$
where $T_0 = T(0)$ is the initial tumor population, the number of
tumor cells, at time zero. A tumor large enough to be detected
clinically is approximately 7 mm in diameter. This corresponds to a
population of approximately $10^8$ cells. Exponential growth implies
that, with a doubling time of 2 days, the tumor population would
grow to $10^{11}$ cells (ten doublings, a factor of $2^{10} \approx 10^3$)
and weigh 1 kg, in only 20 days! Thus, exponential
growth seems physically unrealistic, even if the growth rate were
much smaller, say, a doubling every 2 weeks.
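The arithmetic behind this estimate can be checked with a short sketch. It assumes only the closed-form solution above and the relation k = ln 2 / (doubling time); the function name is hypothetical and chosen for illustration.

```python
import math

def exponential_population(t0_cells, doubling_time_days, t_days):
    """Closed-form solution T(t) = T0 * exp(k t), with k = ln(2) / doubling time."""
    k = math.log(2) / doubling_time_days
    return t0_cells * math.exp(k * t_days)

T0 = 1e8  # cells in a just-detectable tumor (value quoted in the text)

# Doubling every 2 days for 20 days: ten doublings.
fast = exponential_population(T0, doubling_time_days=2, t_days=20)

# Doubling every 2 weeks: the same ten doublings simply take 140 days.
slow = exponential_population(T0, doubling_time_days=14, t_days=140)

print(f"{fast:.3e} cells after 20 days")
print(f"{slow:.3e} cells after 140 days")
```

A slower doubling time only postpones the same unbounded growth, which is why a self-limiting growth law is needed.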
Experiments show that tumor cells grow exponentially when
the population is small, but growth slows down when the popula-
tion is large. Figure 3 compares exponential growth and self-
limiting growth on the same graph.
Figure 4 shows a graph plotting tumor growth rate (dT/dt) as a
function of tumor size (T).
[Figure 3: time-evolution of the two growth models on the same axes, with annotations "Exponential growth regime," "Slowing growth," and "Growth stops."]
Fig. 3. Exponential versus self-limiting growth. A plot of the time-evolution of two models of population growth on the same
axes. In the initial time period, both graphs look the same: this is the initial exponential growth phase of the self-limiting
growth model. By time t = 6, the graphs of the two models begin to separate, and by time t = 15 the graph of the self-
limiting growth model has leveled off, indicating that growth has stopped.
[Figure 4: growth rate dT/dt (vertical axis) plotted against tumor size T (× $10^6$ cells, horizontal axis).]
Fig. 4. Self-limiting growth. This graph of growth rate (dT/dt) against population (T, in
units of $10^6$ cells) shows one self-limiting growth model, the logistic model (see Table 1:
population growth models). Note that, for small population values, growth rate increases
as the population increases, but as the population increases beyond T = 250, growth rate
begins to decrease until it reaches zero at the carrying capacity, T = 500. This corre-
sponds to the value at which the population levels off in Fig. 3. For values of the
population, T, larger than 500, dT/dt is negative, indicating that a population larger than
the carrying capacity will decrease until it reaches 500. Models of growth may be
developed starting with an empirically derived graph, similar to this one, of the rate of
growth against the population.
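The shape described in this caption can be reproduced with a minimal sketch of the logistic growth rate. The carrying capacity of 500 (in units of 10^6 cells) comes from the caption; the rate constant `a` is a hypothetical value chosen only for illustration, and the function name is likewise an assumption.

```python
def logistic_rate(T, a, K):
    """Logistic growth rate dT/dt = a*T*(1 - T/K); equivalent to a*T*(1 - b*T) with b = 1/K."""
    return a * T * (1.0 - T / K)

a = 0.16   # hypothetical per-capita growth rate (1/day), illustration only
K = 500.0  # carrying capacity in units of 10^6 cells, as in Fig. 4

# Tabulate the growth rate over the range of tumor sizes shown in the figure.
rates = {T: logistic_rate(T, a, K) for T in range(0, 601, 50)}
peak = max(rates, key=rates.get)

for T, rate in rates.items():
    print(f"T = {T:3d}   dT/dt = {rate:7.2f}")
print("growth rate is largest at T =", peak)
```

The rate rises up to half the carrying capacity, falls to zero at the carrying capacity, and is negative beyond it, matching the three regimes the caption describes.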
Table 1
Growth laws
Fig. 5. A comparison of four growth laws. Data from ref. (3), which describes three different mouse experiments (marked
as “Data set 1,” “Data set 2,” and “Data set 3,” respectively), is used to fit four different growth laws. The model result is
shown in solid curves, while the data points are shown by filled squares. In each case, the parameters of the models are
chosen to minimize the least squares distance from the model’s predicted values to the data. Residuals showing the
difference between the predicted values and the data are shown as bars below the graphs in each case. Note that the first
data set has more time points than the other two, so that the last three residuals are due to differences coming only from
the first data set. The two models shown in the left column, the power law and the Gompertz models, show larger residuals
than the two models depicted on the right, the logistic and the von Bertalanffy models. In this sense, the two models on the
right are “better” than the two on the left. Using the principle of “parameter parsimony,” which indicates that the model
with the fewest parameters is preferable, the logistic model is “better” than the von Bertalanffy in predicting the outcome
of the experiments represented by these data.
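The comparison described in this caption, choosing parameters to minimize the sum of squared residuals and then comparing models, can be sketched as follows. This is an illustration under stated assumptions: the "data" are synthetic points generated from a logistic curve (not the mouse data of ref. (3)), only two growth laws are compared, and a brute-force grid search stands in for whatever optimizer was actually used.

```python
import math

def logistic(t, T0, a, K):
    """Closed-form logistic curve with initial size T0, rate a, carrying capacity K."""
    return K / (1.0 + (K / T0 - 1.0) * math.exp(-a * t))

def exponential(t, T0, k):
    """Exponential growth curve T0 * exp(k t)."""
    return T0 * math.exp(k * t)

def sum_sq(model, params, times, data):
    """Sum of squared residuals between model predictions and data."""
    return sum((model(t, *params) - y) ** 2 for t, y in zip(times, data))

def grid_fit(model, grid, times, data):
    """Brute-force least squares: return (smallest SSR, best parameters) over a grid."""
    return min((sum_sq(model, p, times, data), p) for p in grid)

# Synthetic measurements (hypothetical): a logistic curve sampled every 3 days.
times = list(range(0, 31, 3))
data = [logistic(t, 5.0, 0.4, 500.0) for t in times]

rate_grid = [0.05 * i for i in range(1, 21)]   # 0.05 ... 1.00 per day
cap_grid = [100.0 * i for i in range(1, 11)]   # 100 ... 1000

exp_ssr, exp_best = grid_fit(exponential, [(5.0, k) for k in rate_grid], times, data)
log_ssr, log_best = grid_fit(
    logistic, [(5.0, a, K) for a in rate_grid for K in cap_grid], times, data
)

print("exponential: SSR =", exp_ssr, "best parameters =", exp_best)
print("logistic:    SSR =", log_ssr, "best parameters =", log_best)
```

On self-limiting data the exponential law leaves large residuals however its rate is chosen, which is the kind of evidence the residual bars in the figure provide.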
3.2. Modeling the Growth of Effector Cells
Cytotoxic effector cells include those immune cells that are capable
of killing tumor cells, for example Natural Killer (NK) cells or
Killer T Cells (CTL); we combine all cytotoxic effector cells into
one population, referred to as Effector Cells, denoted by E.
If we assume that there is a constant source of effector cells
providing cells at a fixed rate of s cells per unit time, and that the
fraction of these cells that die off per unit time is another constant,
d, we get the differential equation:
$$\frac{dE}{dt} = s - dE \quad \text{with } s, d > 0.$$
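As a quick check on this equation (a worked step that is not part of the original text), the equilibrium and the explicit solution follow directly. Here $E_0 = E(0)$ and $E^*$ are notation introduced only for this illustration.

$$\frac{dE}{dt} = 0 \;\Longrightarrow\; E^* = \frac{s}{d}, \qquad E(t) = \frac{s}{d} + \left(E_0 - \frac{s}{d}\right) e^{-dt}.$$

Differentiating gives $\frac{dE}{dt} = -d\left(E_0 - \frac{s}{d}\right)e^{-dt} = -d\left(E - \frac{s}{d}\right) = s - dE$, as required. In the absence of a tumor, the effector population therefore relaxes to the level $s/d$ from any starting value.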
3.3. Modeling Tumor and Immune Interactions
We now need to add to the immune cell equation a term that
represents how the production of tumor-specific effector cells
responds to the presence of the tumor. This function should incorporate
the recognition of antigen by the immune system, and could
take several different forms. The most common function describing
the interaction of two populations follows the Law of Mass
Action, which assumes that the effect of the interaction is directly
proportional to the product of the two populations. For example, if
the effector cells were stimulated by the tumor cells according to
the Law of Mass Action, the equation would take on the following
form:
$$\frac{dE}{dt} = s - dE + rET \quad \text{with } s, d, r > 0.$$
Note, however, that the term rET implies that the larger the
tumor, the greater the response of the immune system, and that this
response could get infinitely large. The rate of production of
tumor-specific effector cells is difficult to measure experimentally;
however, it is known that the effector response saturates and does
not grow indefinitely. Therefore, the response function should be
an increasing function of the number of tumor cells, but should be
bounded above by some constant. One such function has a
Michaelis–Menten form, where we replace rET by rET/(σ + T), and the
equation becomes:
$$\frac{dE}{dt} = s - dE + \frac{rET}{\sigma + T} \quad \text{with } s, d, r, \sigma > 0.$$
We now must include the destructive effect that interactions
between tumor cells and immune cells have on the cell populations.
Biologically, we assume that tumor cells can be killed by effector
cells while, in the process, the effector cells are deactivated follow-
ing interactions with tumor cells. In this case, a mass action law
makes sense, since the maximum number of cells that can be killed
under this law will not exceed the total population. Putting these
terms into the differential equations for effector cell and tumor
growth gives the system of equations:
$$\frac{dE}{dt} = s - dE + \frac{rET}{\sigma + T} - c_1 ET \tag{1}$$
$$\frac{dT}{dt} = aT(1 - bT) - c_2 ET, \tag{2}$$
where $c_1$ and $c_2$ are also positive constants.
Note that the following assumptions are now included in the
two-dimensional model:
• Effectors have a constant source.
• Effectors are recruited by tumor cells.
3.4. Dimensional Analysis
Once the model has been developed, it is very important to make
sure that the units for the parameters and populations ("state
variables") are balanced. Inappropriately mixing units is a common
mistake that beginning modelers make. The first step in checking
whether the equations are balanced is to specify the units of each
model parameter. For example, suppose we want to examine the
units in the Michaelis–Menten term. On the left, the units of dE/dt
are cells per time, or #cells/day, since E represents the number of
effector cells, and t represents time. Each term of the equation on
the right must then work out to #cells/day. Looking at the
Michaelis–Menten term in Eq. 1, with the units in parentheses, and only
the units of r and σ as yet to be determined, we have:
$$\frac{dE\,(\#\text{cells})}{dt\,(\text{day})} = \cdots + \frac{r\,(?)\,E\,(\#\text{cells})\,T\,(\#\text{cells})}{\sigma\,(?) + T\,(\#\text{cells})} - \cdots \tag{3}$$
We can now determine the proper units for the parameters r
and σ in the Michaelis–Menten term. First, it is clear that σ
must be in units of cells, that is, σ(?) = σ(#cells), for the
denominator to make sense (we add number of cells to number
of cells). Second, the parameter r must be in units of 1/day: since
the left-hand sides of Eqs. 1 and 3 are in units of (#cells/day), each
additive term on the right-hand side of those equations must also be
in units of (#cells/day). Thus, with r in units of 1/day, the
Michaelis–Menten term units work out to be (1/day)(#cells)$^2$/
(#cells) = (#cells/day). The rest of the parameter units can be
worked out in a similar fashion, giving the units for all of the
parameters, which are shown in Table 2.
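This bookkeeping can be automated. The sketch below represents a unit as a dictionary of exponents (for example, cells per day is `{"cells": 1, "day": -1}`) and checks the Michaelis–Menten term; the helper functions are hypothetical and written only for this illustration, and `sigma` denotes the constant in the denominator of that term.

```python
def mul(*units):
    """Multiply quantities: add the exponents of each base unit."""
    out = {}
    for u in units:
        for base, power in u.items():
            out[base] = out.get(base, 0) + power
    return {base: p for base, p in out.items() if p != 0}

def div(num, den):
    """Divide quantities: subtract the exponents of the denominator."""
    return mul(num, {base: -p for base, p in den.items()})

def add(u1, u2):
    """Addition is only meaningful between identical units."""
    if u1 != u2:
        raise ValueError(f"cannot add {u1} and {u2}")
    return u1

CELLS = {"cells": 1}
PER_DAY = {"day": -1}
CELLS_PER_DAY = mul(CELLS, PER_DAY)

# Michaelis-Menten term r*E*T / (sigma + T), with r in 1/day and sigma in cells.
mm_units = div(mul(PER_DAY, CELLS, CELLS), add(CELLS, CELLS))
print("Michaelis-Menten term:", mm_units)

# Units that c1 must carry so that c1*E*T comes out in cells/day.
c1_units = div(CELLS_PER_DAY, mul(CELLS, CELLS))
print("c1:", c1_units)
```

The same helpers flag a mistake immediately: trying to add a quantity in cells to one in 1/day raises an error instead of silently producing a meaningless number.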
In summary, the differential equations model has one equation
per variable. According to the mass balance principle, each equation
gives the total change in the variable, which is equal to the amount in
minus the amount out. Positive terms represent the amount going
in, while negative terms represent the amount going out. The form of each
term is derived through biological considerations such as knowing
there is a constant source of immune cells, and empirical observa-
tions, such as knowing that tumor growth is self-limiting. Figure 6
summarizes the elements in the formal mathematical model.
Step 4: Solve the System.
In general we cannot find an explicit function or formula for the
solution (also known as a closed form solution) for a mathematical
Table 2
Description of parameters and their units
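[Figure 6: Model elements. The two model equations, with the mass action, Michaelis–Menten, and logistic growth terms highlighted.]
$$\frac{dE}{dt} = s + \frac{rET}{\sigma + T} - c_1 ET - dE$$
$$\frac{dT}{dt} = aT(1 - bT) - c_2 ET$$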
Fig. 6. Elements of the tumor-immune model. This figure shows the equations of the
tumor immune model, with key functional elements highlighted. The negative mass action
terms are inside a rectangular border, indicating that we assume that any cell is equally
likely to interact with any other cell. The logistic growth term in the second differential
equation describing tumor growth is shown inside a hexagonal border; this term indicates
our assumption that tumor growth is self-limiting, even in the absence of immune cells.
The recruitment term in the differential equation describing immune cell growth is shown
circled by an oval. The Michaelis–Menten form of this term indicates our assumption that
immune cell production in response to the presence of a tumor saturates at the level given
by the parameter, r. Each term in a model is developed using knowledge of the system
being modeled, for example the dynamics of cell growth and the kinetics of the immune
response, as well as available experimental data showing the shape of growth and
response curves.
Fig. 7. A phase portrait of the tumor-immune model. This figure shows orbits, the result of
numerical solutions to the tumor-immune model given in Eqs. 1 and 2. For this set of
parameter values (given in Table 3), there are three equilibria, indicated by asterisks. Two
equilibria are stable, so that nearby solutions are attracted to them. One of these
represents a relatively large tumor size near the carrying capacity of 5 × 10⁸ cells,
while the other represents a relatively small tumor size of approximately 8.5 × 10⁵ cells.
The third equilibrium is unstable, but the orbits that converge to it, indicated by dark lines,
serve to separate the two basins of attraction of the stable equilibria. This figure also
depicts several representative orbits, or solutions of the model. Some of these orbits
move up towards the large-tumor stable equilibrium, while the others spiral towards the
small-tumor stable equilibrium. This phase portrait captures all possible behaviors of the
model.
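The equilibria in a phase portrait of this kind can also be located numerically. The sketch below is one possible approach, assuming the model of Eqs. 1 and 2 in the form shown in Fig. 6 and the Table 3 values for Figs. 7 and 8; the function names and the label `half_sat` are illustrative. For T > 0, setting dT/dt = 0 gives E in terms of T; substituting that into dE/dt leaves one equation in T, whose sign changes are bracketed on a grid and refined by bisection. Stability is then judged from the trace and determinant of the 2 × 2 Jacobian. The tumor-free state T = 0 is not part of this scan.

```python
# Table 3 values (Figs. 7 and 8); units: 10^6 cells and 10^2 days.
s, half_sat, r, c1, d = 0.1181, 20.19, 1.131, 0.00311, 0.3743
a, b, c2 = 1.636, 0.002, 1.0

def effector_on_tumor_nullcline(T):
    """E that makes dT/dt = 0 when T > 0, from a(1 - bT) = c2*E."""
    return a * (1.0 - b * T) / c2

def residual(T):
    """dE/dt evaluated on the tumor nullcline; zero at an equilibrium."""
    E = effector_on_tumor_nullcline(T)
    return s + E * (r * T / (half_sat + T) - c1 * T - d)

def bisect(f, lo, hi, tol=1e-10):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    lo_positive = f(lo) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def is_stable(E, T):
    """For a 2x2 Jacobian, stable if and only if trace < 0 and determinant > 0."""
    j11 = r * T / (half_sat + T) - c1 * T - d
    j12 = r * E * half_sat / (half_sat + T) ** 2 - c1 * E
    j21 = -c2 * T
    j22 = a - 2.0 * a * b * T - c2 * E
    return (j11 + j22) < 0 and (j11 * j22 - j12 * j21) > 0

# Scan 0 < T < 1/b for sign changes of the residual, then refine each bracket.
grid = [0.1 * k for k in range(1, 5000)]
equilibria = []
for lo, hi in zip(grid, grid[1:]):
    if (residual(lo) > 0) != (residual(hi) > 0):
        T_eq = bisect(residual, lo, hi)
        E_eq = effector_on_tumor_nullcline(T_eq)
        equilibria.append((E_eq, T_eq, is_stable(E_eq, T_eq)))

for E_eq, T_eq, stable in equilibria:
    print(f"E = {E_eq:.3f}, T = {T_eq:.2f}, stable = {stable}")
```

The printed values are in the scaled units of Table 3. A grid scan followed by bisection is chosen here only because it needs no external libraries; a general-purpose root finder would serve equally well.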
3.5. Interpreting the Solution: The Phase Portrait
Figures 7 and 8 show a numerical solution of the model given in
Eqs. 1 and 2. Figure 7 shows a Phase Portrait, which is a graph
showing the evolution of the state variables of the system, in this
case the effector cell population, E(t), on the horizontal axis, and
the tumor cell population, T(t), plotted on the vertical axis. The
phase portrait does not show time explicitly, but rather it shows
how the two state variables of the system change over time with
respect to each other. A phase portrait is a compact way of display-
ing all possible behaviors of the system at once. An orbit of the
differential equation is a solution plotted as a curve in the phase
plane. The graph shows six orbits (thin lines), each going to one of three
equilibria, or steady states, marked by asterisks in Fig. 7. The equilibria
are found by setting the differential equations, Eqs. 1 and 2,
[Figure 8: two panels plotting Number of Cells (logarithmic axis, 10⁻¹ to 10²) against Time (0 to 50) for Effector Cells and Tumor Cells.]
Fig. 8. Tumor-immune solution curves. This figure shows two distinct solutions of the tumor immune model using the same
parameter values as in Fig. 7, (see Table 3). On the left, the solution converges to the large tumor equilibrium, while the
graphs on the right show a solution converging to the smaller tumor equilibrium. The initial tumor population values for the
two solutions are the only thing that differs: on the left, T(0) = 10, while on the right T(0) = 1 (in units of 10⁶ cells).
3.6. Reexamine Model Parameters and/or Assumptions
At this stage, before abandoning the model as inadequate, we
choose to test other parameter ranges. In fact, with one change in
a parameter value, we can get the “creeping through” phenomenon.
The new phase portrait is shown in Fig. 9. Note that in this
simulation, the value of s, the constant source rate of immune cells,
has been decreased by roughly a factor of 10 (from 0.1181 to 0.01),
making the overall immune response particularly sensitive to the
strength of the adaptive immune
response. The parameter values are given in Table 3. Two solutions
are shown in Fig. 9. The initial values of the immune and tumor cell
populations are close to each other (the points are labeled on the
graph), but one initial value results in a tumor that is small for a
time, i.e., dormant, then increases in size, gradually reaching the
lower tumor equilibrium in an oscillatory manner. The other initial
value results in a dormant phase followed by aggressive growth
towards the large tumor equilibrium. Solution curves showing the
tumor populations over time are shown in Fig. 10. In this current
model, we now see both tumor dormancy and aggressive growth,
so we have answered the question in the affirmative: the immune
response can explain both tumor dormancy and creeping through.
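A minimal way to reproduce this kind of experiment is to integrate the two equations from the two nearby initial conditions and compare the outcomes. The sketch below uses a hand-written classical Runge–Kutta (RK4) step so that it has no dependencies; the function names, the step size, and the label `half_sat` are assumptions, while the parameter values, the initial conditions, and the time horizon of 35 (in units of 10² days) are those given in Table 3 and Fig. 10. Because the two starting points lie close to the boundary between basins of attraction, the outcome is sensitive to the integrator and step size, so the sketch prints the final tumor sizes rather than asserting them.

```python
# Table 3 values for Figs. 9 and 10; units: 10^6 cells and 10^2 days.
P = dict(s=0.01, half_sat=20.19, r=1.131, c1=0.00311, d=0.3743,
         a=1.636, b=0.002, c2=1.0)

def rhs(E, T, p=P):
    """Right-hand sides of the two-population model."""
    dE = (p["s"] + p["r"] * E * T / (p["half_sat"] + T)
          - p["c1"] * E * T - p["d"] * E)
    dT = p["a"] * T * (1.0 - p["b"] * T) - p["c2"] * E * T
    return dE, dT

def rk4_step(E, T, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = rhs(E, T)
    k2 = rhs(E + 0.5 * h * k1[0], T + 0.5 * h * k1[1])
    k3 = rhs(E + 0.5 * h * k2[0], T + 0.5 * h * k2[1])
    k4 = rhs(E + h * k3[0], T + h * k3[1])
    E_new = E + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
    T_new = T + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return E_new, T_new

def simulate(E0, T0, t_end=35.0, h=0.01):
    """Return the list of (t, E, T) samples from t = 0 to t_end."""
    n_steps = int(round(t_end / h))
    E, T = E0, T0
    path = [(0.0, E, T)]
    for k in range(1, n_steps + 1):
        E, T = rk4_step(E, T, h)
        path.append((k * h, E, T))
    return path

# Two nearby initial effector populations, same initial tumor population.
for E0 in (3.5, 3.6):
    t, E, T = simulate(E0, 300.0)[-1]
    print(f"E(0) = {E0}: T({t:.0f}) = {T:.2f}")
```

Rerunning with a smaller step size is a quick check that the qualitative outcome is a property of the model rather than of the discretization.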
Step 5, continued: Can We Use the Mathematical Model to Answer
Other Questions? Now that we have a mathematical model of tumor
growth with the immune response, we can return to our original
question: “What is the best way to administer chemotherapy?” In
particular, how can we determine the optimal doses and timings for
treatments? It is probably impossible to find the “best” treatment, so
Fig. 9. Creeping-through and dormancy. This figure shows a phase portrait of the tumor-immune system, similar to Fig. 7.
However, the parameter representing the constant immune source rate is smaller in this numerical solution of the model,
resulting in a phase portrait with a different structure. There is a fourth, unstable equilibrium (marked “H”), in addition to
the three equilibria shown in Fig. 7. Moreover, the shapes of the basins of attraction of the two stable equilibria (“D” and
“B”) are different, giving rise to the possibility of an observed phenomenon known as creeping through. In this scenario, a
tumor decreases in size and remains small for some time (dormancy), but then begins to regrow until it reaches a
dangerously large size. This phase portrait shows that, depending on the initial conditions, a tumor can exhibit dormancy
followed by oscillating growth and shrinking until it approaches the smaller-tumor equilibrium (labeled “B”), or it can
experience shrinkage, followed by aggressive growth, or creeping-through, to the larger-tumor equilibrium (labeled “D”).
These two solutions are graphed as a function of time in Fig. 10. Parameter values are given in Table 3.
Table 3
Parameter values used in the simulations
Units: c = 10⁶ cells, d = 10² days
Parameter name | Figures 7 and 8 | Figures 9 and 10 | Units
s | 0.1181 | 0.01 | c/d
σ | 20.19 | 20.19 | c
r | 1.131 | 1.131 | 1/d
c1 | 0.00311 | 0.00311 | 1/(cd)
d | 0.3743 | 0.3743 | 1/d
a | 1.636 | 1.636 | 1/d
b | 0.002 | 0.002 | 1/c
c2 | 1 | 1 | 1/(cd)
E(0) | 0.00001 | 3.5 and 3.6 | c
T(0) | 10 and 1 | 300 | c
This table lists all of the parameter values used to numerically solve the system given by Eqs. 1 and 2. These
solutions are shown in Figs. 7 through 10.
[Figure 10: tumor cell population (axis from 0 to 600) against Time (days × 10², from 0 to 35) for two solutions, labeled “Tumor dormancy” and “Dormancy then creeping through.”]
Fig. 10. Creeping-through solutions. This figure shows two solutions that are also shown
in the phase portrait in Fig. 9. Only the values of the tumor populations over time are
shown in this figure, so that the two initial conditions look identical, with T(0) = 300
(× 10⁶ cells). However, the initial effector cell populations differed, with
E(0) = 3.5 × 10⁶ cells (solid line) for the solution that converges to the small tumor
equilibrium (“dormancy”), and E(0) = 3.6 × 10⁶ cells (dashed line) for the solution that
converges to the large tumor equilibrium. The fact that a larger initial immune cell
population results in a worse outcome is a nonintuitive result of the structure of the
phase portrait. To explain this phenomenon: if the initial immune population is large, the
immune system is able to control the tumor for a while (for approximately 900 days, or
2½ years), but during this time the immune response declines to a point at which the
tumor is able to escape the immune surveillance entirely. A similar scenario ensues with a
slightly smaller initial effector cell population, but the tumor population does not get quite
as small, and hence the immune response is just a bit larger, preventing the tumor from
escaping to the large tumor equilibrium. Parameter values are given in Table 3.
[Figure 11: plot of F(u) = k(1 − e^(−u)) against u = Amount of Drug, for u from 0 to 5; saturation level = k.]
Fig. 11. Drug kill rate. This figure shows the graph of a function describing the fractional
cell kill as a function of the amount of drug at the tumor site. The function was chosen to
conform with the assumptions that (1) the kill rate increases with the amount of drug
present, and (2) the kill rate saturates at a fixed level, in this case given by the parameter
k = 0.47, indicated by the dashed line.
the equations to describe the interaction between the drug and the
cells. We note, however, that our choice of modeling approach
implies certain assumptions. For example, since we only describe
total populations, all interactions are assumed to be homogeneous.
In particular, we assume that the drug at the tumor site reaches all
tumor cells with equal likelihood, and that all cells are affected in
the same way. Since we want to monitor toxicity, we introduce
another cell population that we denote by N(t) for “normal”
cells. This population will be adversely affected by the drug, and
we consider a treatment to be “not too toxic” if this normal cell
population is maintained above a certain prespecified amount.
Step 3: Formulate the Model.
As before, we use the mass conservation principle to develop the
formal model for the extended model:
The change in population over time is equal to “amount
in” − “amount out.”
Applying this to the amount of drug, we assume that the drug
goes in at a rate that we can control. Incorporating the time it takes
for the drug to be transported from the injection site to the tumor
site, we denote the rate of influx of the drug to the tumor site by a
function of time, v(t). We assume that the drug decays exponentially,
at a rate proportional to the existing amount:
$$\frac{du}{dt} = v(t) - d_2 u.$$
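Because this equation is linear, it can be advanced exactly over any interval on which v(t) is constant. The sketch below illustrates that with a pulsed influx resembling the bolus protocol discussed later in the chapter (one bolus every 3 days for 90 days). The decay rate, pulse width, pulse height, and step size are illustrative assumptions, not values from the chapter.

```python
import math

def advance_drug(u, v, d2, h):
    """Exact solution of du/dt = v - d2*u over a step h with v held constant."""
    decay = math.exp(-d2 * h)
    return u * decay + (v / d2) * (1.0 - decay)

# Illustrative settings; none of these numbers come from the chapter.
D2 = 1.0                            # drug decay rate, per day
STEPS_PER_DAY = 20                  # step size h = 0.05 day
PULSE_STEPS = 5                     # each bolus lasts 0.25 day
PERIOD_STEPS = 3 * STEPS_PER_DAY    # one bolus every 3 days
STOP_STEP = 90 * STEPS_PER_DAY      # dosing ends after 90 days
HEIGHT = 1.0                        # influx rate while a bolus is delivered

def influx(k):
    """Pulsed v(t), indexed by integer step k to avoid floating-point modulo."""
    return HEIGHT if k < STOP_STEP and k % PERIOD_STEPS < PULSE_STEPS else 0.0

h = 1.0 / STEPS_PER_DAY
u, levels = 0.0, []
for k in range(100 * STEPS_PER_DAY):    # follow the drug for 100 days
    u = advance_drug(u, influx(k), D2, h)
    levels.append(u)

total_dose = sum(influx(k) * h for k in range(100 * STEPS_PER_DAY))
print(f"peak drug level: {max(levels):.3f}, total dose: {total_dose:.2f}")
```

Tracking the total dose alongside u(t) is useful when comparing schedules, since two schedules can deliver the same total amount of drug while producing very different drug levels at the tumor site.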
To model the interaction between the drug and the tumor cells,
we use a mass action term of the form F(u)T, where F(u) is a
saturating function of the amount of drug, u(t):
$$F(u) = k\left(1 - e^{-u}\right).$$
Figure 11 shows a graph of the function F(u).
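The two assumptions behind this choice, that the kill rate increases with the amount of drug and that it saturates, are easy to check numerically. The short sketch below tabulates F(u) with k = 0.47, the saturation level quoted for Fig. 11; the function name `kill_fraction` is an assumption.

```python
import math

def kill_fraction(u, k=0.47):
    """F(u) = k(1 - exp(-u)); k is the saturation level."""
    return k * (1.0 - math.exp(-u))

for u in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(f"u = {u:3.1f}  F(u) = {kill_fraction(u):.4f}")
```

With this functional form, half of the saturation level is reached at u = ln 2, about 0.69 in the scaled drug units.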
We assume that the drug is toxic to all cells, with the toxicity
varying according to cell type. The parameter k is varied for each
cell type to indicate the different levels of toxicity. To model the
normal cell population, we use a formal model similar to the
differential equation describing tumor cell growth, using a logistic
growth law to describe the “amount in.” To model the “amount
out,” in addition to the drug kill rate, a mass action term is used to
describe competition with tumor cells. Finally, since we assume that
normal cells and tumor cells compete, we add another competition
term to reflect this in the equation for the tumor cells. This gives
the following set of four differential equations for the extended
model:
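$$\frac{dE}{dt} = s - dE + \frac{rET}{\sigma + T} - c_1 ET - k_1\left(1 - e^{-u}\right)E \qquad (4)$$
$$\frac{dT}{dt} = a_1 T(1 - b_1 T) - c_2 ET - c_3 NT - k_2\left(1 - e^{-u}\right)T \qquad (5)$$
$$\frac{dN}{dt} = a_2 N(1 - b_2 N) - c_4 NT - k_3\left(1 - e^{-u}\right)N \qquad (6)$$
$$\frac{du}{dt} = v(t) - d_2 u. \qquad (7)$$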
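The four equations translate directly into a right-hand-side function that any numerical integrator can use. The sketch below is an illustration under stated assumptions rather than the authors' implementation: every numerical value in `ExtendedParams` is a placeholder chosen only to make the example run, the names `ExtendedParams`, `extended_rhs`, and `half_sat` are hypothetical, and each drug term is written in the mass action form F(u) times the population, as described in the text. The state passed in at the end uses the initial values shown in the legend of Fig. 12 (I0 = 0.15, T0 = 0.25, N0 = 1).

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class ExtendedParams:
    # All values below are illustrative placeholders, not from the chapter.
    s: float = 0.3          # constant source of effector cells
    r: float = 0.01         # recruitment saturation level
    half_sat: float = 0.3   # constant in the Michaelis-Menten denominator
    c1: float = 1.0
    d: float = 0.2
    a1: float = 1.5
    b1: float = 1.0
    c2: float = 0.5
    c3: float = 1.0
    a2: float = 1.0
    b2: float = 1.0
    c4: float = 1.0
    k1: float = 0.2         # drug toxicity to effector cells
    k2: float = 0.3         # drug toxicity to tumor cells
    k3: float = 0.1         # drug toxicity to normal cells
    d2: float = 1.0         # drug decay rate

def extended_rhs(E, T, N, u, v, p):
    """Return (dE/dt, dT/dt, dN/dt, du/dt) for drug influx rate v."""
    kill = 1.0 - math.exp(-u)   # shared saturating factor, F(u)/k
    dE = (p.s - p.d * E + p.r * E * T / (p.half_sat + T)
          - p.c1 * E * T - p.k1 * kill * E)
    dT = (p.a1 * T * (1.0 - p.b1 * T) - p.c2 * E * T
          - p.c3 * N * T - p.k2 * kill * T)
    dN = p.a2 * N * (1.0 - p.b2 * N) - p.c4 * N * T - p.k3 * kill * N
    du = v - p.d2 * u
    return dE, dT, dN, du

params = ExtendedParams()
print(extended_rhs(0.15, 0.25, 1.0, 0.0, 0.0, params))
```

A quick sanity check on any such function is that the drug-free, tumor-free state (E = s/d, T = 0, N = 1/b2, u = 0, v = 0) gives zero for all four derivatives.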
Step 4: Solve the System.
Our question is now turned into the following mathematical
problem that we must solve. We must find the function, v(t), that
will minimize the value of the tumor variable, T, while maintaining
the normal cell variable, N, above a prescribed level. A mathemati-
cal solution to this problem can be obtained using optimization
techniques based on Pontryagin’s Maximum Principle. For more
details see ref. (5). The results are shown in Fig. 12, which
compares treatments on two hypothetical individuals, one whose
initial immune population is 0.15 (in units of 10⁵ cells), and
the other with a slightly smaller immune cell population initially:
E(0) = 0.1 = 10⁴ cells. The top row of Fig. 12 shows simulations
of the model when no chemotherapy treatment is given: the
tumor population in both “patients” increases to a dangerous
level. The middle row of Fig. 12 shows the results from the
model if v(t), the function describing the drug dosing, mimics
that of a (hypothetical) “traditional” chemotherapy protocol,
where drug is administered in bolus injections regularly every
3 days for 90 days. We note that this regular, pulsed therapy does
eliminate the tumor when the initial immune cell count is 0.15, but
is unable to control tumor growth in the right-hand panel, when
the initial immune cell count is only 0.1. The bottom row of Fig. 12
shows the numerical solution to the optimization problem, which
indicates that the function v(t) should be a step function that
changes irregularly from the minimum (zero) to the maximum
dosage. With this treatment schedule, tumor growth is controlled
in both cases. We note that the total amount of drug given is
identical in the treatment scenarios depicted in the middle and bottom
rows of Fig. 12. Furthermore, the value of the normal cells never
falls below 0.75, which, in these scaled variables, is three-fourths of
its usual level.
[Figure 12: three rows of two panels each, plotting Number of Cells against Time in Days (0 to 150). Legend entries: Immune, Tumor, Normal, Drug Dose. Left column: I0 = E(0) = 0.15; right column: I0 = E(0) = 0.10. Both columns: T0 = 0.25, N0 = 1; s = 0.33, R = 0.01, A = 0.3. The top row is labeled “No Chemotherapy.”]
Step 5: Answer the Question.
We have seen that the mathematical model, which reproduced the
clinically observed phenomena of tumor dormancy and “creeping
through,” can be used to study the effect of chemotherapy treat-
ments. Using optimization theory, treatment protocols can be
identified that, according to the model, improve the treatment
outcomes, in the sense that the tumor volume is smaller at the
end of a prescribed amount of time, while normal cells never get
below three-fourths of their usual levels.
4. Conclusion
References
1. Sanchez PJ (2006) As simple as possible, but no simpler: a gentle introduction to simulation modeling. Proceedings of the 2006 winter simulation conference, pp 2–10
2. de Pillis LG, Mallett D, Radunskaya AE (2006) Spatial tumor-immune modeling. Comput Math Model Med 7(2):159–176
3. Diefenbach A, Jensen ER, Jamieson A, Raulet DH (2001) Rae1 and H60 ligands of the NKG2D receptor stimulate tumour immunity. Nature 413:165–171
4. Borrelli R, Coleman C (2004) Differential equations: a modeling perspective. Wiley, New York, NY
5. de Pillis LG, Radunskaya AE (2001) A mathematical tumor model with immune resistance and drug therapy: an optimal control approach. J Theor Med 3:79–100
6. de Pillis LG, Radunskaya AE, Wiseman CL (2005) A validated mathematical model of cell-mediated immune response to tumor growth. Cancer Res 65:7950–7958
7. de Pillis LG, Radunskaya AE (2006) Some promising approaches to tumor-immune modeling. In: Gumel A, Castillo-Chavez C, Mickens R, Clemence DP (eds) Mathematical studies on human disease dynamics: emerging paradigms and challenges. AMS Contemporary Mathematics Series
8. Cooke K (1981) On the construction and evaluation of mathematical models. In: Sabloff JA (ed) Simulations in Archaeology. University of New Mexico Press, Albuquerque, NM
Chapter 5
Abstract
This chapter lists some of the software and tools that are used in computational toxicology, as presented in
this volume.
1. Introduction
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_5, © Springer Science+Business Media, LLC 2012
76 A.N. Mayeno and B. Reisfeld
Table 1
List of software packages, tools, and companies presented in this volume
Name/Developer | Description/URL
Chapter 1.2
MOE (Molecular Operating Environment) | Software package contains Structure-Based Design; Pharmacophore Discovery; Protein & Antibody Modeling; Molecular Modeling & Simulations; Cheminformatics & (HTS) Quantitative Structure/Activity Relationship (QSAR); and Medicinal Chemistry Applications.
Chemical Computing Group, Inc. | http://www.chemcomp.com/software.htm
QikProp | Package to predict parameters such as octanol/water and water/gas logP, logS, logBB, overall CNS activity, Caco-2 and MDCK cell permeabilities, human oral absorption, log Khsa for human serum albumin binding, and log IC50 for HERG K+-channel blockage.
Schrödinger, LLC | http://www.schrodinger.com/products/14/17/
SPARC^a | Predictive modeling system that calculates a large number of physical/chemical parameters from molecular structure and basic information about the environment.
US EPA, University of Georgia | http://www.epa.gov/extrmurl/research/projects/sparc/index.html; http://archemcalc.com/index.php
OpenEye | Company that offers various software packages, libraries, and toolkits.
OpenEye Scientific Software, Inc. | www.eyesopen.com
Chapter 1.3
BLAST^a | Sequence database search methods: finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
National Center for Biotechnology Information | http://blast.ncbi.nlm.nih.gov
FASTA^a | Sequence database search methods: The FASTA programs find regions of local or global (new) similarity between Protein or DNA sequences, either by searching Protein or DNA databases or by identifying local duplications within a sequence. Other programs provide information on the statistical significance of an alignment. Like BLAST, FASTA can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
William R. Pearson and the University of Virginia | http://fasta.bioch.virginia.edu/fasta_www2/fasta_list2.shtml
ClustalW^a | Multiple sequence alignment methods.
Ch.EMBnet.org | http://www.ch.embnet.org/index.html
Kalign^a | Multiple sequence alignment methods: a method employing the Wu-Manber string-matching algorithm.
European Bioinformatics Institute | http://www.ebi.ac.uk/Tools/msa/kalign/
MUSCLE^a | Multiple sequence alignment method.
Robert C. Edgar | http://www.drive5.com/muscle/
T-Coffee^a | Multiple sequence alignment methods: A collection of tools for Computing, Evaluating, and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences, and Structures.
Center for Genomic Regulation (CRG) | http://www.tcoffee.org/
5 Tools and Techniques 79
PredictProtein^b | Secondary structure prediction methods: PredictProtein integrates feature prediction for secondary structure, solvent accessibility, transmembrane helices, globular regions, coiled-coil regions, structural switch regions, B-values, disorder regions, intra-residue contacts, protein–protein and protein–DNA binding sites, subcellular localization, domain boundaries, beta-barrels, cysteine bonds, metal binding sites, and disulfide bridges.
ROSTLAB.ORG | http://www.predictprotein.org/
JPred^a | Secondary structure prediction method.
Geoff Barton, Bioinformatics and Computational Biology Research, University of Dundee, Scotland, UK | http://www.compbio.dundee.ac.uk/Software/JPred/jpred.html
ExPASy-tools^a | ExPASy is the new SIB Bioinformatics Resource Portal which provides access to scientific databases and software tools in different areas of life sciences including proteomics, genomics, phylogeny, systems biology, evolution, population genetics, transcriptomics, etc.
Swiss Institute of Bioinformatics | http://expasy.org/tools/
PSI-BLAST^a | PSI-BLAST is similar to NCBI BLAST except that it uses position-specific scoring matrices derived during the search; this tool is used to detect distant evolutionary relationships.
European Bioinformatics Institute | http://www.ebi.ac.uk/Tools/sss/psiblast/
BLAST^a | The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
National Center for Biotechnology Information | http://blast.ncbi.nlm.nih.gov/Blast.cgi
BIND^a | Biomolecular Interaction Network Database (BIND) as a Web database. BIND offers methods common to related biology databases and specializations for its protein interaction data.
Christopher Hogue’s Research Lab | http://www.blueprint.org/
BioGRID^a | Biological General Repository for Interaction Datasets (BioGRID) database was developed to house and distribute collections of protein and genetic interactions from major model organism species.
TyersLab.com | http://www.thebiogrid.org
DIP^a | The DIP™ database catalogs experimentally determined interactions between proteins. It combines information from a variety of sources to create a single, consistent set of protein–protein interactions.
Regents of the University of California and David Eisenberg | http://dip.doe-mbi.ucla.edu/dip/Main.cgi
HPRD^b | Human Protein Reference Database (HPRD) is a protein database accessible through the Internet.
Institute of Bioinformatics in Bangalore, India, and the Pandey lab at Johns Hopkins University | http://www.hprd.org/
IntAct^a | IntAct provides a freely available, open source database system and analysis tools for protein interaction data.
European Bioinformatics Institute (EBI) | http://www.ebi.ac.uk/intact/
MINT^a | MINT, the Molecular INTeraction database. MINT focuses on experimentally verified protein–protein interactions mined from the scientific literature by expert curators.
University of Rome Tor Vergata and IRCCS Fondazione Santa Lucia | http://mint.bio.uniroma2.it/mint/Welcome.do
KEGG | KEGG (Kyoto Encyclopedia of Genes and Genomes) is a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. The PATHWAY database records networks of molecular interactions in the cells, and variants of them specific to particular organisms.
Kanehisa Laboratories at Kyoto University and the University of Tokyo | http://www.genome.jp/kegg/pathway.html
UniPathway^a | UniPathway is a curated resource of metabolic pathways for the UniProtKB/Swiss-Prot knowledge base.
Swiss Institute of Bioinformatics (SIB), French National Institute for Research in Computer Science and Control (INRIA Rhone-Alpes), HELIX/BAMBOO group, Laboratoire d’Ecologie Alpine, Pôle Rhône-Alpin de Bioinformatique | http://www.grenoble.prabi.fr/obiwarehouse/unipathway
GastroPlus | Advanced software program that simulates the absorption, pharmacokinetics, and pharmacodynamics for drugs in human and preclinical species. The underlying model is the Advanced Compartmental Absorption and Transit (ACAT) model.
Simulations-Plus, Inc. | http://www.simulations-plus.com/
WinNonlin | WinNonlin is the industry standard for pharmacokinetic, pharmacodynamic, and noncompartmental analysis. In addition to its extensive library of built-in PK, PD, and PK/PD models, WinNonlin supports custom, user-defined models to address any kind of data.
Pharsight Inc. | http://www.pharsight.com
DOCK^b | DOCK addresses the problem of “docking” molecules to each other. In general, “docking” is the identification of the low-energy binding modes of a small molecule, or ligand, within the active site of a macromolecule, or receptor, whose structure is known.
Kuntz Lab program | http://dock.compbio.ucsf.edu/
FlexX | Docking software: Binding mode prediction (predicts the geometry of the protein–ligand complex) and Virtual high-throughput screening (vHTS).
BioSolveIT | http://www.biosolveit.de/FlexX/
Glide | Docking software: full spectrum of speed and accuracy from high-throughput virtual screening of millions of compounds to extremely accurate binding mode predictions.
Schrödinger, LLC | http://www.schrodinger.com/products/14/5/
AutoDock^a | A suite of automated docking tools. It is designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure.
The Scripps Research Institute | http://autodock.scripps.edu/
GOLD | Protein–Ligand Docking: a program for calculating the docking modes of small molecules in protein binding sites; provided as part of the GOLD Suite.
University of Sheffield, GlaxoSmithKline plc and CCDC | http://www.ccdc.cam.ac.uk/products/life_sciences/gold/
Amber | A suite of programs that allow users to carry out molecular dynamics simulations, particularly on biomolecules.
Multiple collaborators: http://ambermd.org/contributors | http://ambermd.org/
Tripos | Software company offering a variety of packages, including SYBYL-X Suite, Muse, and D360.
Tripos | http://tripos.com/
Chapter 3.1
SMILES | A compact machine- and human-readable chemical nomenclature.
Daylight Chemical Information Systems Inc. | www.daylight.com/dayhtml_tutorials/languages/smiles/index.html
QSAR Toolbox^a | Software to fill gaps in (eco-)toxicity data needed for assessing the hazards of chemicals.
Organisation for Economic Co-operation and Development & European Chemicals Agency | http://www.qsartoolbox.org/index.html
CODESSA | An advanced, full-featured QSAR program that ties information from AMPAC™ to experimental data.
SemiChem Inc. | http://www.semichem.com/
MoKa | A novel approach for in silico computation of pKa values; trained using a very diverse set of more than 25,000 pKa values, it provides accurate and fast calculations using an algorithm based on descriptors derived from GRID molecular interaction fields.
Molecular Discovery | http://www.moldiscovery.com/index.php
COSMO-RS | A program for the quantitative calculation of solvation mixture thermodynamics based on quantum chemistry.
COSMOlogic | http://www.cosmologic.de/index.php
SPARC^a | Predictive modeling system that calculates a large number of physical/chemical parameters from molecular structure and basic information about the environment.
US EPA, University of Georgia | http://www.epa.gov/extrmurl/research/projects/sparc/index.html; http://archemcalc.com/index.php
PREDICTPlus | Prediction of Thermodynamic and Transport Properties.
Dragon Technology Inc. | http://www.mwsoftware.com/dragon/
PhysProps | Thermodynamic/chemical physical property database and property estimation tool.
G&P Engineering Software | http://www.gpengineeringsoft.com/
ChemDBsoft | Searches databases by structure and substructure.
Chemistry Database Software | http://www.chemdbsoft.com/
EPI Suite^a | A Windows®-based suite of physical/chemical property and environmental fate estimation programs developed by the EPA’s Office of Pollution Prevention and Toxics and Syracuse Research Corporation (SRC).
EPA | http://www.epa.gov/opptintr/exposure/pubs/episuite.htm
CSLog D | Prediction gives the log of the water/octanol partition coefficient for charged molecules.
ChemSilico | http://www.chemsilico.com/index.html
CSLog P | Prediction gives the log of the water/octanol partition coefficient for neutral molecules.
ChemSilico | http://www.chemsilico.com/index.html
Marvin Suite | A collection of tools for drawing, displaying, and characterizing chemical structures, substructures, and reactions.
ChemAxon | http://www.chemaxon.com/
Physicochemical & A complete array of tools for the prediction of molecular physical properties
ADMET Prediction from structure. The ability to train allows for the inclusion of novel chemical
Software space in many modules. The value of predictions has also been extended to
include a tool for property-based structure design.
ACD/Labs http://www.acdlabs.com/home/
Molconn-Z The standard program for generation of Molecular Connectivity, Shape, and
Information Indices for QSAR Analyses.
eduSoft http://www.edusoft-lc.com/molconn/
Jaguar A ab initio quantum chemistry package for the calculation of molecular
properties, including NMR, IR, pKa, partial charges, multipole moments,
polarizabilities, molecular orbitals, electron density, electrostatic potential,
Mulliken population, and NBO analysis.
Schrödinger http://www.schrodinger.com/
Statistica A comprehensive package for data analysis, data management, data
visualization, and data mining procedures.
StatSoft http://www.statsoft.com/#
SPSS Software Advanced mathematical and statistical packages to extract predictive
knowledge that when deployed into existing processes makes them adaptive
to improve outcomes.
IBM http://www.spss.com/
Scigress Explorer A computational chemistry package to calculate molecular properties and
energy values. The software provides insight into chemical structure,
properties, and reactivity.
SCUBE http://www.scubeindia.com/scigress_bio.html
ProChemist A molecular modeling package to simulate on screen the physical-chemical
properties of organic molecules, design graphically new structures, and
predict their behavior.
Cadcom http://pro.chemist.online.fr/
SAS A widely used, comprehensive statistical analysis software package.
SAS Institute Inc. http://www.sas.com/
PSPPa A program for statistical analysis of sampled data; similar to SPSS; free.
http://savannah.gnu. http://www.gnu.org/software/pspp/
org/projects/pspp
NCSS A statistical and power analysis software.
NCSS http://www.ncss.com/
Minitab A leading statistical software.
Minitab http://www.minitab.com/
MATLAB A high-level language and interactive environment that enables you to
perform computationally intensive tasks faster than with traditional
programming languages such as C, C++, and Fortran.
MathWorks http://www.mathworks.com/
ADMET Predictor A software for advanced predictive modeling of ADMET properties.
Simulations Plus http://www.simulations-plus.com/Default.aspx
PLSRa Performs model construction and prediction of activity/property using the
Partial Least Squares (PLS) regression technique.
VCCLab http://www.vcclab.org/lab/pls/
Chapter 3.2
Coota For macromolecular model building, model completion, and validation,
particularly suitable for protein modeling using X-ray data.
Paul Emsley et al. www.biop.ox.ac.uk/coot/
Chapter 3.6
ProSa 2003b Software tool for the analysis of 3D structures of proteins.
CAME http://www.came.sbg.ac.at/prosa_details.php
Chapter 4.2
GastroPlus An advanced software program that simulates the absorption,
pharmacokinetics, and pharmacodynamics for drugs in human and
preclinical species.
Simulations Plus www.simulations-plus.com
PK-Sim A software tool for summary and evaluation of preclinical and clinical
experimental results on absorption, distribution, metabolism, and excretion
(ADME) by means of whole-body physiologically based pharmacokinetic
(PBPK) modeling and simulation.
Bayer Technology http://www.systems-biology.com/products/pk-sim.html
SimCYP Population-based Simulator for drug development through the modeling and
simulation of pharmacokinetics and pharmacodynamics in virtual
populations.
Simcyp Limited http://www.simcyp.com/
Chapter 4.3
ADME Suite A collection of software modules that provide predictions relating to the
pharmacokinetic profiling of compounds, specifically their ADME
properties.
ACD/Labs www.acdlabs.com
ADME Descriptors ADMET Descriptors in Discovery Studio® include models for intestinal
absorption, aqueous solubility, blood–brain barrier penetration, plasma
protein binding, cytochrome P450 2D6 inhibition, and hepatotoxicity.
Accelrys www.accelrys.com
Cloe PK A software system that uses PBPK modeling for pharmacokinetic prediction.
Cyprotex www.cyprotex.com
GastroPlus An advanced software program that simulates the absorption,
pharmacokinetics, and pharmacodynamics for drugs in human and
preclinical species.
Simulations Plus www.simulations-plus.com
META Program for metabolic pathways prediction of molecules, using provided
dictionaries.
Multicase www.multicase.com
MetabolExpert Program for prediction of the metabolic fate of a compound in the drug
discovery process or during the dispositional research phase.
Compudrug www.compudrug.com
QikProp Software for rapid ADME predictions of drug candidates.
Schrödinger www.schrodinger.com
Volsurf+ Software that creates 128 molecular descriptors from 3D Molecular
Interaction Fields (MIFs) produced by Molecular Discovery’s software
GRID, which are particularly relevant to ADME prediction.
Molecular Discovery www.moldiscovery.com
MetaCore An integrated knowledge database and software suite for pathway analysis of
experimental data and gene lists.
GeneGo www.genego.com
MetaDrug A unique systems pharmacology platform designed for evaluation of biological
effects of small molecule compounds on the human body, with pathway
analysis and other bioinformatics applications from toxicogenomics to
translational medicine.
GeneGo www.genego.com
Metasite A computational procedure that predicts metabolic transformations related to
cytochrome-mediated reactions in phase I metabolism.
Molecular Discovery www.moldiscovery.com
MEXAlert A quick and sensitive tool for indicating possibilities of first-pass metabolism.
Compudrug www.compudrug.com
RetroMEX RetroMex predicts the structure of the retro-metabolites, collects them in
compound databases, and presents the results in tree format. Retro-
metabolic drug design encompasses a series of concepts, e.g., prodrugs, soft
drugs, and chemical drug-delivery systems.
Compudrug www.compudrug.com
MoKa A novel approach for in silico computation of pKa values; trained using a very
diverse set of more than 25,000 pKa values, it provides accurate and fast
calculations using an algorithm based on descriptors derived from GRID
molecular interaction fields.
Molecular Discovery www.moldiscovery.com
MCASE/MC4PC Windows-based Structure–Activity Relationship (SAR) automated expert
system: the program will automatically evaluate the dataset and try to
identify the structural features responsible for activity (biophores). It then
creates organized dictionaries of these biophores and develops ad hoc local
QSAR correlations that can be used to predict the activity of unknown
molecules.
Multicase www.multicase.com
METAPC Windows-based Metabolism and Biodegradation Expert System: uses
provided dictionaries to create metabolic paths of molecules submitted to it.
Each product is tested in silico for carcinogenicity.
Multicase www.multicase.com
ADMET Predictor A software for advanced predictive modeling of ADMET properties.
Simulations Plus http://www.simulations-plus.com/Default.aspx
Meteor The program uses expert knowledge rules in metabolism to predict the
metabolic fate of chemicals and the predictions are presented in metabolic
trees.
LHASA www.lhasalimited.org
PK Solutions An automated Excel-based program that does single- and multiple-dose
pharmacokinetic data analysis of concentration–time data from biological
samples (blood, serum, plasma, lymph, etc.) following intravenous or
extravascular routes of administration.
Summit PK www.summitpk.com
PK-Sim A software tool for summary and evaluation of preclinical and clinical
experimental results on ADME by means of whole-body PBPK modeling
and simulation.
Bayer Technology http://www.systems-biology.com/products/pk-sim.html
Chapter 4.4
Discovery Studio A software suite of life science molecular design solutions for computational
chemists and computational biologists.
Accelrys http://accelrys.com/products/discovery-studio/index.html
MOE Software package contains Structure-Based Design; Pharmacophore
Discovery; Protein & Antibody Modeling; Molecular Modeling &
Simulations; Cheminformatics & (HTS) QSAR; and Medicinal Chemistry
Applications.
Chemical Computing Group, Inc. http://www.chemcomp.com/software.htm
CORINA A fast and powerful 3D structure generator for small and medium-sized,
typically drug-like molecules.
Molecular Networks http://www.molecular-networks.com/products/corina
FlexX Docking software: Binding mode prediction (predicts the geometry of the
protein–ligand complex) and vHTS.
BioSolveIT http://www.biosolveit.de/FlexX/
Chapter 4.6
Jsimb A Java-based simulation system for building quantitative numeric models and
analyzing them with respect to experimental reference data.
Physiome Project http://www.physiome.org/jsim
Chapter 4.7
acslX A modeling, execution, and analysis environment for continuous dynamic
systems and processes.
The AEgis Technologies Group, Inc. http://www.acslx.com/
Berkeley Madonna General-purpose differential equation solver.
Robert I. Macey & George F. Oster http://www.berkeleymadonna.com
MATLAB A high-level language and interactive environment that enables you to
perform computationally intensive tasks faster than with traditional
programming languages such as C, C++, and Fortran.
MathWorks http://www.mathworks.com/
ModelMaker Two-way class tree oriented productivity, refactoring and UML-style CASE
tool. Native Refactoring and UML 2.0 modeling for Delphi and C#.
ModelMaker Tools http://www.modelmakertools.com
SCOP Database and tool to provide a detailed and comprehensive description of the
structural and evolutionary relationships between all proteins whose
structure is known.
SCOP http://scop.mrc-lmb.cam.ac.uk/scop
Chapter 5.1
QSAR Toolboxa Software for grouping chemicals into categories and filling gaps in (eco)
toxicity data needed for assessing the hazards of chemicals.
OECD (Organisation for Economic Co-operation and Development) & European Chemicals Agency http://www.qsartoolbox.org/index.html
Statistica A comprehensive package for data analysis, data management, data
visualization, and data mining procedures.
StatSoft http://www.statsoft.com/#
SIMCA-P A tool for scientists, researchers, product developers, engineers, and others
who have huge datasets.
Umetrics http://www.umetrics.com/simca
Chapter 5.4
Toxtreea A full-featured and flexible user-friendly open source application, which is able
to estimate toxic hazard by applying a decision tree approach.
Ideaconsult Ltd. http://toxtree.sourceforge.net/
OncoLogica A desktop computer program that evaluates the likelihood that a chemical may
cause cancer.
EPA http://www.epa.gov/oppt/sf/pubs/oncologic.htm
QSAR Toolboxa Software for grouping chemicals into categories and filling gaps in (eco)
toxicity data needed for assessing the hazards of chemicals.
OECD (Organisation for Economic Co-operation and Development) & European Chemicals Agency http://www.qsartoolbox.org/index.html
OpenToxa An interoperable predictive toxicology framework which may be used as an
enabling platform for the creation of predictive toxicology applications.
OpenTox http://www.opentox.org
Chapter 5.6
Caesara An EC-funded project (Project no. 022674—SSPI), which was specifically
dedicated to develop QSAR models for the REACH legislation.
Istituto di Ricerche Farmacologiche Mario Negri http://www.caesar-project.eu
Derek Nexus Expert knowledge base system that predicts whether a chemical is toxic in
humans, other mammals, and bacteria.
Lhasa Limited https://www.lhasalimited.org
HazardExpert A software tool for initial estimation of toxic symptoms of organic compounds
in humans and in animals.
Compudrug http://www.compudrug.com
Lazara An open source software program that makes predictions of toxicological
endpoints (e.g., mutagenicity, rodent and hamster carcinogenicity, maximum
recommended daily dose) by analyzing structural fragments in a training set.
In Silico Toxicology gmbh http://lazar.in-silico.de
Meteor The program uses expert knowledge rules in metabolism to predict the
metabolic fate of chemicals and the predictions are presented in metabolic
trees.
LHASA www.lhasalimited.org
QSAR Toolboxa Software for grouping chemicals into categories and filling gaps in (eco)
toxicity data needed for assessing the hazards of chemicals.
OECD (Organisation for Economic Co-operation and Development) & European Chemicals Agency http://www.qsartoolbox.org/index.html
OncoLogica A desktop computer program that evaluates the likelihood that a chemical may
cause cancer.
EPA http://www.epa.gov/oppt/sf/pubs/oncologic.htm
TOPKAT Discovery Studio tool that makes predictions of a range of toxicological
endpoints, including mutagenicity, developmental toxicity, rodent
carcinogenicity, rat chronic Lowest Observed Adverse Effect Level
(LOAEL), rat Maximum Tolerated Dose, and rat oral LD50.
Accelrys http://accelrys.com
ACD/Tox Suite A collection of software modules that predict probabilities for basic toxicity
endpoints.
ACD/Labs http://www.acdlabs.com/home
Toxtreea A full-featured and flexible user-friendly open source application, which is able
to estimate toxic hazard by applying a decision tree approach.
Ideaconsult Ltd. http://toxtree.sourceforge.net/
Chapter 6.2
CellNetOptimizera A MATLAB toolbox for creating logic-based models of signal transduction
networks, and training them against high-throughput biochemical data.
Saez-Rodriguez Group http://www.ebi.ac.uk/saezrodriguez/software.html#CellNetOptimizer
Software
MATLAB A high-level language and interactive environment that enables you to
perform computationally intensive tasks faster than with traditional
programming languages such as C, C++, and Fortran.
MathWorks http://www.mathworks.com/
Chapter 6.3
SMBioNet A tool for modeling genetic regulatory systems, based on the multivalued
logical formalism of René Thomas and the Computational Tree Logic
(CTL).
Laboratoire I3S— http://www.i3s.unice.fr/~richard/smbionet
CNRS & Université
de Nice
Chapter 6.4
Pajeka A program, for Windows, for analysis and visualization of large networks
having some thousands or even millions of vertices.
Vladimir Batagelj and Andrej Mrvar http://vlado.fmf.uni-lj.si/pub/networks/pajek/
Cytoscapea An open source software platform for visualizing complex networks and
integrating these with any type of attribute data.
Cytoscape Consortium http://www.cytoscape.org
Chapter 7.1
MetaCore An integrated knowledge database and software suite for pathway analysis of
experimental data and gene lists.
GeneGo www.genego.com
GoMinera A tool for biological interpretation of “omic” data—including data from gene
expression microarrays.
NCI/LMP Genomics and Bioinformatics group http://discover.nci.nih.gov/gominer/index.jsp
Chapter 9.3
PASSI Toolkita The PTK tool is a Rational Rose plug-in that offers support for PASSI (a
Process for Agent Societies Specification and Implementation), a step-by-
step requirement-to-code methodology for designing and developing
multi-agent societies.
mcossentino, mr_lombardo, sirtoy http://sourceforge.net/projects/ptk
MASONa A fast discrete-event multi-agent simulation library core in Java, designed to be
the foundation for large custom-purpose Java simulations.
George Mason University’s Evolutionary Computation Laboratory and the GMU Center for Social Complexity http://cs.gmu.edu/~eclab/projects/mason
NetLogoa A multi-agent programmable modeling environment.
Uri Wilensky http://ccl.northwestern.edu/netlogo
SeSAma A generic environment for modeling and experimenting with agent-based
simulation.
University of Würzburg http://www.simsesam.de
a Open source or free access
b Free for noncommercial and/or academic use
Packages and tools are listed based on their occurrence in a chapter; as such, some listings are repeated.
Descriptions are generally excerpts from information obtained at the listed URLs. This list is not comprehensive;
if a package, tool, or company mentioned in a chapter is not listed, it is due to an inability to locate
the appropriate URL or an oversight by the authors.
Part III
Abstract
Physicochemical properties are key factors in controlling the interactions of xenobiotics with living organisms.
Computational approaches to toxicity prediction therefore generally rely to a very large extent on the
physicochemical properties of the query compounds. Consequently it is important that reliable in silico
methods are available for the rapid calculation of physicochemical properties. The key properties are partition
coefficient, aqueous solubility, and pKa and, to a lesser extent, melting point, boiling point, vapor pressure,
and Henry’s law constant (air–water partition coefficient). The calculation of each of these properties from
quantitative structure–property relationships (QSPRs) and from available software is discussed in detail, and
recommendations are made. Finally, detailed consideration is given to guidelines for the development of QSPRs
and QSARs.
1. Introduction
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_6, © Springer Science+Business Media, LLC 2012
94 J.C. Dearden
Table 1
Software predictions of aqueous solubilitya
2. Methods
2.1. QSPR Development
If one wishes to develop a QSPR, there are three main steps to be
followed. Firstly, it is necessary to obtain experimental values of the
property of interest, for a sufficiently large number of chemicals.
“How large?” is an open question; QSPRs have been published
based on training sets ranging from a few to several thousand chemicals.
The more chemicals that are in the training set, the more robust will
be the QSPR—provided that the experimental data are accurate,
and that the chemicals selected are consistent with the purpose for
which the QSPR is to be developed (13). For example, if one
wished to develop a QSPR for the prediction of aqueous solubility
of aliphatic amines, it would be wrong to include data for aromatic
amines. A number of published papers (14, 15) have discussed the
selection of chemicals for QSPR development. Table 2 lists some
sources of physicochemical property data that could be useful for
QSPR modeling, and more are given by Wagner (27).
If a QSPR applicable to a diverse range of chemicals is required,
then of course the training set data should also be diverse. The
actual diversity is often limited by the availability of data, and care
should be taken in the application of the QSPR. Oyarzabal et al.
(28) have recently described a novel approach to minimize the
problems of using data from various sources and of data sets with
restricted coverage of chemical space.
Secondly, one has to obtain values for descriptors that will
model the specified property well (29, 30). One of two
approaches can be taken here. If one believes that one or two
descriptors will serve, then those are all that may be required.
For example, Yalkowsky and coworkers (31) have found that
6 Prediction of Physicochemical Properties 97
Table 2
Some sources of data for modeling of physicochemical properties
Database Reference
Aquasol (16)
Benchware Discovery 360 (17)
Chem. & Physical Properties Database (18)
Chemical Database Service (19)
ChemSpider (20)
Crossfire (21)
OCHEM (22)
OECD eChemPortal (23)
OECD Toolbox (24)
OSHA (25)
PhysProp (26)
aqueous solubility can often be modeled quite well with log P and
melting point. Generally, however, one does not know a priori
which descriptor(s) will best model a given property, in which
case the usual approach is to assemble a large pool of descriptors,
drawn from the thousands that are now available. It should also be
noted that whilst experimentally measured
descriptor values are generally more accurate than are calculated
values, the use of the latter means that one can use a QSPR to
predict the requisite properties of chemicals not yet synthesized or
available (Table 3).
The third step is to use a statistical method that will select from
the pool the “best” descriptor(s) based on appropriate statistical
criteria, and will generate a QSPR based on those descriptors (42,
43). Typical techniques include stepwise regression and genetic
algorithms, and most commercially available statistical packages
include these (Table 4).
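The descriptor-selection step can be illustrated with a minimal sketch of forward stepwise selection. It is a simplified stand-in for what the statistical packages do: descriptors are admitted on the basis of a gain in R² rather than an F-to-enter test, and the descriptor pool, the threshold `min_gain`, and the function names are all assumptions made for illustration, using synthetic data only.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with intercept; returns coefficients and residual sum of squares."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return coef, float(resid @ resid)

def forward_select(X, y, max_terms=3, min_gain=0.01):
    """Greedy forward selection: repeatedly add the descriptor that most improves R^2."""
    ss_tot = float(((y - y.mean()) ** 2).sum())
    chosen, best_r2 = [], 0.0
    while len(chosen) < max_terms:
        trials = []
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            _, rss = fit_ols(X[:, chosen + [j]], y)
            trials.append((1.0 - rss / ss_tot, j))
        if not trials:
            break
        r2, j = max(trials)
        if r2 - best_r2 < min_gain:   # stop when the improvement is negligible
            break
        chosen.append(j)
        best_r2 = r2
    return chosen, best_r2

# Synthetic pool of 8 descriptors; only columns 1 and 4 carry signal (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = 2.0 * X[:, 1] - 1.5 * X[:, 4] + rng.normal(scale=0.1, size=60)
selected, r2 = forward_select(X, y)
print(selected, round(r2, 3))
```

Greedy selection of this kind can miss descriptor combinations that are only useful jointly, which is one reason genetic algorithms are offered as an alternative.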
Tetko et al. (60) have discussed the accuracy of ADMET pre-
dictions, and emphasized the importance of accuracy in order to
avoid filtering out promising series of compounds because of, for
example, wrongly predicted log P or aqueous solubility.
Even if a good QSPR is obtained, it may not be a good
predictor of the property in question for compounds not in the
training set. Hence some measure of predictive ability is required.
The best way for predictivity to be assessed is to use the QSPR to
predict the property in question for a number of compounds that
Table 3
Some sources of descriptors for modeling of physicochemical properties
Database Reference
ADAPT (32)
Almond (33)
CODESSA (34)
C-QSAR (35)
Discovery Studio (36)
Dragon (37)
eDragon (38)
MOE (39)
MOLCONN-Z (40)
OCHEM (22)
QSARpro (41)
Volsurf (33)
were not used in the training set, but for which the measured value
of the property is known; such a set of compounds is called a test
set. The test set compounds must be reasonably similar to those of
the training set; that is, they must lie within the applicability
domain of the QSPR. This is often achieved by dividing the total
number of compounds into two groups; the larger group forms the
training set, and the smaller group (typically 5–50% of the total)
forms the test set. If the standard error for the test set is much larger
than that for the training set, then the QSPR does not have good
predictivity, and it should not be used for predictive purposes.
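A minimal sketch of this external validation follows, assuming a linear QSPR fitted by ordinary least squares on synthetic descriptor data. The 20% test fraction, the data, and the helper `standard_error` are illustrative assumptions, not values taken from the chapter.

```python
import numpy as np

def standard_error(y_obs, y_pred, n_params=0):
    """Root-mean-square residual, with an optional degrees-of-freedom correction."""
    resid = np.asarray(y_obs) - np.asarray(y_pred)
    return float(np.sqrt((resid ** 2).sum() / (len(resid) - n_params)))

rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=(n, 2))                       # two synthetic descriptors
y = 0.5 - 1.0 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.2, size=n)

# Random split: 20% of the compounds are held out as the test set.
idx = rng.permutation(n)
n_test = int(0.2 * n)
test, train = idx[:n_test], idx[n_test:]

# Fit the QSPR on the training set only.
A_train = np.column_stack([np.ones(len(train)), X[train]])
coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)

# Compare the standard error on the training set with that on the test set.
A_test = np.column_stack([np.ones(len(test)), X[test]])
s_train = standard_error(y[train], A_train @ coef, n_params=3)
s_test = standard_error(y[test], A_test @ coef)
ratio = s_test / s_train
print(round(s_train, 3), round(s_test, 3), round(ratio, 2))
```

A ratio well above 1 would suggest poor predictivity; what counts as "much larger" is a judgment that the text leaves to the modeler.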
If the total number of compounds is small, then it may not be
practicable to split it into training and test sets. In that case a
procedure called internal cross-validation can be used, whereby
each compound in turn is deleted from the training set, the
QSPR is developed with the remaining compounds, and is used
to predict the property value of the omitted compound. That
compound is then returned to the training set and a second com-
pound is deleted, and so on until every compound has been left out
in turn. A cross-validated R² value, called Q², is then calculated,
which is an indicator of the internal predictivity of the QSPR. It is,
however, not considered to be as good an indicator as is obtained
using an external test set. Walker et al. (61) have proposed that an
indicator of good predictivity is that Q² should not be more than
Table 4
Statistical packages for QSPR modeling of physicochemical properties
Software Reference
ADMET Modeler (44)
ADMEWORKS ModelBuilder (45)
ASNN (46)
Cerius2 (47)
CODESSA (34)
C-QSAR (35)
GENSTATa (48)
MATLABa (49)
Minitaba (50)
MOE (39)
NCSSa (51)
OCHEM (22)
Pentacle (33)
Pipeline Pilot (36)
PNN (46)
PredictionBase (52)
ProChemist (53)
PSPPa (54)
QSARpro (41)
SASa (55)
Scigress Explorer (56)
SPSSa (57)
Statisticaa (58)
Strike (59)
SYBYL-X (17)
a General-purpose statistical software
0.3 lower than R², whilst Eriksson et al. (62) have proposed a
minimum acceptable value of 0.5 for Q².
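The leave-one-out procedure and the two acceptance criteria can be sketched as follows. The sketch assumes a linear QSPR and the common definition Q² = 1 − PRESS/SS_total, where PRESS is the sum of squared errors for the omitted compounds; other definitions of Q² exist, and the data here are synthetic.

```python
import numpy as np

def r2_and_q2(X, y):
    """Return the fitted R^2 and the leave-one-out Q^2 for a linear model with intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    ss_tot = float(((y - y.mean()) ** 2).sum())

    # R^2 from the model fitted to all compounds.
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(((y - A @ coef) ** 2).sum())

    # PRESS: refit with each compound omitted in turn, then predict that compound.
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        c, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += float((y[i] - A[i] @ c) ** 2)
    return 1.0 - rss / ss_tot, 1.0 - press / ss_tot

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))                      # small synthetic data set
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.3, size=30)

r2, q2 = r2_and_q2(X, y)
walker_ok = (r2 - q2) <= 0.3      # criterion attributed to Walker et al. (61)
eriksson_ok = q2 >= 0.5           # criterion attributed to Eriksson et al. (62)
print(round(r2, 3), round(q2, 3), walker_ok, eriksson_ok)
```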
2.2. Published QSPRs
Many thousands of QSPRs are available in the open literature.
A large number of these have been referenced in review papers
Table 5
Some QSAR/QSPR databases
Database Reference
C-QSAR (35)
JRC QMRF Database (65)
Danish QSAR Database (66)
OCHEM (22)
(for example (10, 63, 64)), and many more can be found with the
use of a search engine. For example, entering “QSPR pKa” into Google
Scholar produced 830 hits.
Compilations of QSARs and QSPRs are also available in a
number of databases, some of which are listed in Table 5.
2.3. QSPR Software
There are now numerous software packages available for the prediction
of physicochemical properties. These vary in their performance
and in the range of properties that they predict. They almost
all use QSPR modeling approaches for their predictions. Some are
available free of charge, and some are very expensive. Most give
some indication of their performance, but what is lacking in general
is independent comparative assessment of performance. Many of
these software packages are listed in Table 6. The ChemProp soft-
ware (70) is unique in that it selects the best QSPR or software
program, from those it holds, for the prediction of a given property
of a given compound.
3. Prediction of Selected Physicochemical Properties
As pointed out above, it is beyond the scope of this chapter to consider
all physicochemical properties that might play a part in modeling
toxicity. Those considered most important are octanol–water parti-
tion coefficient, aqueous solubility, pKa, melting point, boiling point,
vapor pressure, and Henry’s law constant (air–water partition coeffi-
cient), and they are discussed below.
Table 6
Software  Log P  Log D  Aqueous solubility  pKa  Melting point  Boiling point  Vapor pressure  Henry’s law constant  Availability  References
Absolv ✓ ✓ ✓ ✓ ✓ Purchase (67)
ACD/PhysChem Suite ✓ ✓ ✓ ✓ ✓ ✓ Purchase (67)
ADMET Predictor ✓ ✓ ✓ ✓ Purchase (44)
ADMEWORKS Predictor ✓ Purchase (45)
ChemAxon ✓ ✓ ✓ Purchase (68)
ChemOffice ✓ ✓ ✓ ✓ Purchase (69)
ChemProp ✓ ✓ ✓ ✓ ✓ ✓ Free online (70)
ChemSilico ✓ ✓ ✓ Purchase (71)
C log P ✓ Purchase (35, 72)
Episuite ✓ ✓ ✓ ✓ ✓ ✓ Free download (73)
MOE ✓ Purchase (39)
Molecular Modeling Pro ✓ ✓ ✓ ✓ ✓ Purchase (74)
MoKa ✓ Purchase (33)
Molinspiration ✓ Free online (75)
Table 7
Log P predictions from ten software packages for a 138-chemical test set (103)
Software                    % of chemicals with log P prediction error ≤ 0.5 log unit    r²       s
QMPRPlusa (44)              94.2                                                         0.965    0.272
ACD/Labs (67)               93.5                                                         0.965    0.271
ChemSilico (71)             93.5                                                         0.958    0.297
ProPred (80)                89.9                                                         0.945    0.342
A log P (75)                89.1                                                         0.948    0.332
KOWWIN (Episuite) (73)      89.1                                                         0.947    0.335
SPARC (81)                  88.5                                                         0.941    0.330
C log P (35)                88.4                                                         0.961    0.287
Prolog Pb (77)              86.2                                                         0.949    0.329
MOLPRO (76)                 81.1                                                         0.847    0.568
a Now ADMET Predictor
b Now in PALLAS
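For readers who wish to benchmark a package in the same way, the three statistics reported in Table 7 can be computed from paired measured and predicted values, as in the sketch below. It assumes that r² is the squared Pearson correlation and that s is taken as the root-mean-square error; the cited study may have used a degrees-of-freedom-corrected standard error instead. The four data points are invented for illustration.

```python
import numpy as np

def prediction_stats(observed, predicted, tol=0.5):
    """Percentage of predictions within tol log units, squared correlation, and RMS error."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = pred - obs
    pct_within = 100.0 * np.mean(np.abs(err) <= tol)
    r2 = np.corrcoef(obs, pred)[0, 1] ** 2
    s = np.sqrt(np.mean(err ** 2))
    return float(pct_within), float(r2), float(s)

# Hypothetical measured and predicted log P values for four compounds.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.2, 2.0, 3.9, 4.1]
pct, r2, s = prediction_stats(measured, predicted)
print(pct, round(r2, 3), round(s, 3))
```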
3.2. Aqueous Solubility
Aqueous solubility depends not only on the affinity of a solute for
water, but also on its affinity for its own crystal structure. Molecules
that are strongly bound in their crystal lattice require considerable
energy to remove them. This also means that such compounds have
high melting points, and high-melting compounds generally have
poor solubility in any solvent. Note that solubility can vary consid-
erably with temperature, and it is important that solubility data are
reported at a given temperature.
Removal of a molecule from its crystal lattice also means an
increase in entropy, and this can be difficult to model accurately. For
this reason, as well as the fact that the experimental error on
solubility measurements is estimated to be about 0.6 log unit
(108), the prediction of aqueous solubility is not as accurate as is
the prediction of partition coefficient. Nevertheless, many papers
(10) and a book (109) have been published on the prediction of
aqueous solubility, as well as a number of reviews (10, 87,
110–112).
There are also a number of commercial software programs
available for that purpose (10, 113). Livingstone (90) has discussed
the reliability of aqueous solubility predictions from both QSPRs
and commercial software. It should be noted that there are various
ways that aqueous solubilities can be reported: in pure water, at a
specified pH, at a specified ionic strength, as the undissociated
species (intrinsic solubility), or in the presence of other solvents
or solutes. Solubilities are also reported in different units, for exam-
ple g/100 mL, mol/L, and mole fraction. The use of mol/L is
recommended, as this provides a good basis for comparison.
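Converting reported values to mol/L before modeling is straightforward when the molar mass is known. The sketch below handles the g/100 mL case; the compound (molar mass 150 g/mol, solubility 0.30 g/100 mL) is hypothetical, and mole-fraction data would additionally require the solution density or a dilute-solution approximation.

```python
import math

def g_per_100ml_to_mol_per_l(s_g_per_100ml, molar_mass):
    """Convert a solubility in g/100 mL to mol/L (molar mass in g/mol)."""
    return s_g_per_100ml * 10.0 / molar_mass

def log_s(s_mol_per_l):
    """Base-10 logarithm of a molar solubility, the usual QSPR response variable."""
    return math.log10(s_mol_per_l)

# Hypothetical compound: 0.30 g/100 mL, molar mass 150 g/mol.
s_molar = g_per_100ml_to_mol_per_l(0.30, 150.0)
print(s_molar, round(log_s(s_molar), 3))
```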
The aqueous solubility (S) of a chemical could be described as
its hydrophilicity, and so one could perhaps expect an inverse
relationship between aqueous solubility and hydrophobicity
(as measured by partition coefficient). This is in fact so for organic
liquids (114), but does not hold well for solids, because the ther-
modynamics of melting means that there are significant enthalpy
It can be seen from Eq. 6 that the main factors controlling aqueous
solubility are hydrogen bond acceptor ability and molecular size.
The $\sum\alpha^{H}\sum\beta^{H}$ term models intramolecular hydrogen bonding,
which lowers aqueous solubility. Increased molecular size also low-
ers solubility, since a larger cavity has to be created in water to
accommodate a larger solute molecule.
Katritzky et al. (108) used their CODESSA descriptors to
model the aqueous solubilities of a large diverse set of organic
chemicals:
$$\log S_{aq} = -16.1\,Q_{min} - 0.113\,N_{el} + 2.55\,\mathrm{FHDSA}(2) + 0.781\,\mathrm{ABO}(N) + 0.328\,{}^{0}\mathrm{SIC} - 0.0143\,\mathrm{RNCS} - 0.882, \tag{7}$$
n = 411, R² = 0.879, and s = 0.573,
where Qmin = most negative partial charge, Nel = number of
electrons, FHDSA(2) = fractional hydrogen bond donor area,
ABO(N) = average bond order of nitrogen atoms, 0SIC = an
information content topological descriptor, and RNCS = relative
negatively charged surface area. The CODESSA software is
available from SemiChem Inc. (34).
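Once the six descriptors have been calculated, applying Eq. 7 is a single weighted sum. The sketch below assumes the coefficient signs written in the function body, which should be checked against the original publication (108) before use, and the descriptor values passed to it are hypothetical rather than CODESSA output for a real compound.

```python
def log_s_aq_eq7(q_min, n_el, fhdsa2, abo_n, sic0, rncs):
    """Evaluate Eq. 7 (assumed signs) for one compound; returns log10 of solubility."""
    return (-16.1 * q_min        # most negative partial charge
            - 0.113 * n_el       # number of electrons
            + 2.55 * fhdsa2      # fractional hydrogen bond donor area
            + 0.781 * abo_n      # average bond order of nitrogen atoms
            + 0.328 * sic0       # information content topological descriptor
            - 0.0143 * rncs      # relative negatively charged surface area
            - 0.882)             # intercept

# Hypothetical descriptor values, for illustration only.
value = log_s_aq_eq7(q_min=-0.4, n_el=50, fhdsa2=0.1, abo_n=0.0, sic0=1.5, rncs=10.0)
solubility = 10.0 ** value
print(round(value, 3), round(solubility, 3))
```

With a reported standard error of 0.573 log unit, a prediction from such a model carries an uncertainty of roughly a factor of four in the solubility itself.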
Table 8
Aqueous solubility (S) predictions from ten software packages for a 113-chemical test set (129)
Software                      % of chemicals with log S prediction error ≤ 0.5 log unit    r²       s
ChemSilico (71)               79.7                                                         0.951    0.451
WATERNT (Episuite) (73)       79.6                                                         0.954    0.437
VCCLAB (46)                   77.0                                                         0.943    0.487
QMPRPlusa (44)                74.3                                                         0.939    0.501
ACD/Labs (67)                 72.6                                                         0.940    0.498
WSKOWWIN (Episuite) (73)      69.9                                                         0.923    0.562
SPARC (81)                    68.1                                                         0.853    0.779
ABSOLV (36)                   61.9                                                         0.888    0.680
QikProp (59)                  55.7                                                         0.867    0.742
MOLPRO (76)                   50.4                                                         0.766    0.984
a Now ADMET Predictor
Table 9
Aqueous solubility (S) predictions from ten software packages for a 122-chemical test set of drugs (10)
Software                      % of chemicals with log S prediction error ≤ 0.5 log unit    r²      s
Admensaa (83)                 72.1                                                         0.76    0.65
ADMET Predictor (44)          64.8                                                         0.82    0.47
MOLPRO (76)                   62.3                                                         0.44    1.22
ChemSilico (71)               59.8                                                         0.67    0.73
ACD/Labs (67)                 59.0                                                         0.73    0.66
VCCLAB (46)                   51.6                                                         0.67    0.73
QikProp (59)                  47.6                                                         0.57    0.97
PredictionBase (52)           46.7                                                         0.48    1.07
SPARC (81)                    42.9                                                         0.73    0.96
WSKOWWIN (Episuite) (73)      41.0                                                         0.51    1.17
a Now known as StarDrop
3.3. pKa
Within a congeneric series of chemicals, pKa is often closely correlated
with the Hammett substituent constant, and this is the basis
for a number of attempts at pKa prediction. Harris and Hayes
(131) and Livingstone (90) have reviewed the published literature
in this area.
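The Hammett approach can be illustrated with substituted benzoic acids, for which the relationship is usually written pKa = pKa(parent) − ρΣσ. The sketch below assumes the commonly tabulated values for this reference series (parent pKa of about 4.20 in water at 25 °C, ρ = 1.00 by definition) and a few widely quoted σ constants; none of these numbers is taken from this chapter, and other series require their own ρ.

```python
def hammett_pka(pka_parent, rho, sigmas):
    """Estimate pKa from the parent pKa, the reaction constant rho, and substituent sigmas."""
    return pka_parent - rho * sum(sigmas)

# Commonly tabulated values (assumptions for illustration, not from this chapter).
PKA_BENZOIC_ACID = 4.20
RHO_BENZOIC_ACIDS = 1.00
SIGMA = {"p-NO2": 0.78, "m-Cl": 0.37, "p-OCH3": -0.27}

for name, sigma in SIGMA.items():
    estimate = hammett_pka(PKA_BENZOIC_ACID, RHO_BENZOIC_ACIDS, [sigma])
    print(name, round(estimate, 2))
```

Electron-withdrawing substituents (positive σ) lower the estimated pKa, whereas electron-donating substituents raise it.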
Table 10
Some software predictions of pKa by Dearden et al. (146)
and Liao and Nicklaus (147)
an MAE of 0.56 pKa unit for a test set of 2,143 diverse chemicals.
ChemSilico’s pKa predictor was reported to have an MAE of
0.99 pKa unit for a test set of 665 diverse chemicals, many of
them multiprotic. However, this module does not appear currently
to be available, although it is stated to be used in ChemSilico’s
log D predictor (71). The ChemProp software (70) uses a novel
approach whereby, for a given compound, the best prediction
method is selected based on prediction errors for structurally simi-
lar compounds (145).
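The general idea of choosing a prediction method according to its errors on similar compounds can be sketched as follows. This is not the ChemProp algorithm, whose details are not given here; the feature sets, the Tanimoto similarity measure, the value of k, the method names "A" and "B", and all numerical values are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of structural features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def pick_method(query_fp, reference, k=3):
    """Return the method with the lowest mean absolute error on the k most similar references.

    reference: list of dicts with keys 'fp' (set of features), 'observed' (float),
    and 'predicted' (dict mapping method name to predicted value).
    """
    nearest = sorted(reference, key=lambda r: tanimoto(query_fp, r["fp"]), reverse=True)[:k]
    methods = nearest[0]["predicted"].keys()
    mae = {m: sum(abs(r["predicted"][m] - r["observed"]) for r in nearest) / len(nearest)
           for m in methods}
    return min(mae, key=mae.get), mae

# Hypothetical reference compounds with measured pKa and predictions from methods "A" and "B".
reference = [
    {"fp": {"COOH", "aromatic"}, "observed": 4.2, "predicted": {"A": 4.3, "B": 5.0}},
    {"fp": {"COOH", "aromatic", "Cl"}, "observed": 3.8, "predicted": {"A": 3.9, "B": 4.6}},
    {"fp": {"NH2", "aliphatic"}, "observed": 10.6, "predicted": {"A": 9.0, "B": 10.5}},
    {"fp": {"NH2", "aliphatic", "OH"}, "observed": 9.5, "predicted": {"A": 8.1, "B": 9.6}},
]

best, errors = pick_method({"COOH", "aromatic", "NO2"}, reference, k=2)
print(best, errors)
```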
Dearden et al. (146) tested the performance of ten available
software programs that calculate pKa values. Some of these pro-
grams will calculate pKa values of all ionizable sites. However, the
test set of 665 chemicals that they used, which was kindly supplied
by ChemSilico Inc. and used by them as their test set, had measured
pKa values only for the prime ionization site in each molecule.
There were doubts about the correct structures of 11 of the test
set chemicals, and so the programs were tested on 654 chemicals.
Some of the software companies kindly ran our compounds
through their software in-house. The results are given in Table 10.
It should be noted that the ACD/pKa predictions were incorrectly
reported by Dearden et al. (146), for which the author apologizes.
The ACD/pKa predictions given in Table 10 are correct.
3.4. Melting Point
Melting point is an important property for two main reasons.
Firstly, it indicates whether a chemical will be solid or liquid at
particular temperatures, which will dictate how it is handled. Sec-
ondly, it is used in the GSE (117) to predict aqueous solubility.
The melting point of a crystalline compound is controlled
largely by two factors—intermolecular interactions and molecular
symmetry. For example, 3-nitrophenol, which can hydrogen-bond
via its OH group, melts at 97 °C, whereas its methyl derivative,
3-nitroanisole, which cannot hydrogen-bond with itself, melts at
39 °C. The symmetrical 1,4-dichlorobenzene melts at 53 °C, whilst
its less-symmetrical 1,3-isomer melts at −25 °C. These and other
effects have been discussed in detail by Dearden (151).
There have been many attempts to predict the melting point of
organic chemicals, and these have been reviewed by Horvath (152),
Reinhard and Drefahl (87), Dearden (63, 153), and Tesconi and
Yalkowsky (154). It may be noted that in 1884 Mills (155) developed
a QSPR based on carbon chain length for melting points of
homologous series of compounds that was accurate to 2°:
$$\mathrm{MP} = \frac{\beta\,(x - c)}{1 + \gamma\,(x - c)} \tag{9}$$
112 J.C. Dearden
Table 11
Some software predictions of melting points
of a 96-compound test set (63)
Considering the size and diversity of the data sets, the statistics
are quite good. However, the methodology used was complex, and
could not readily be applied.
The group contribution approach to melting point prediction
was first used by Joback and Reid (166). Simamora and Yalkowsky
(167) modeled the melting points of a diverse set of 1,690 aromatic
compounds using a total of 41 group contributions and four intra-
molecular hydrogen bonding terms, and found a standard error of
37.5°. Constantinou and Gani (168) used two levels of group
contributions to model the melting points of 312 diverse chemicals,
and obtained a mean absolute error of prediction of 14.0°,
compared with an MAE of 22.6° for the Joback and Reid method.
Marrero and Gani (169) extended this approach to predict the
melting points of 1,103 diverse chemicals with a standard error of
25.3°. Tu and Wu (170) used group contributions to predict
melting points of 1,310 diverse chemicals with an MAE of 8.2%.
There are several software programs that predict melting point
(see Table 6); they all use one or more group contribution
approaches. Dearden (63) used a 96-compound test set to compare
the performances of three of these programs. Episuite (73) calcu-
lates melting point by two methods, that of Joback and Reid (166)
and that of Gold and Ogle (171), and takes their mean. ChemOf-
fice (69) uses the method of Joback and Reid (166), and ProPred
(80) uses the Gani approach (168, 169). The results are given in
Table 11.
An ECETOC report (113) mentions a 1999 US Environmen-
tal Protection Agency (EPA) test of the performance of the Episuite
MPBPVP module; for two large, diverse test sets the performance
was as follows: (1) n = 666, r² = 0.73, MAE = 45°; (2)
n = 1,379, r² = 0.71, MAE = 44°.
Molecular Modeling Pro uses the Joback and Reid (166)
method, so its performance should be the same as that of ChemOf-
fice. Four other programs, Absolv (67), ChemProp (70), OECD
Toolbox (24), and PREDICTPlus (79), also predict melting point.
It can be seen that there is little to choose between the pro-
grams in terms of accuracy of prediction. They can all operate in
3.5. Boiling Point
Boiling point (Tb) is an important property since it is an indicator
of volatility, and can be used to predict vapor pressure. From the
Clausius–Clapeyron equation, the logarithm of vapor pressure varies
linearly with the reciprocal of absolute temperature, so a chemical
with a higher boiling point generally has a lower vapor pressure at a
given temperature. Boiling point also indicates
whether a chemical is gaseous or liquid at a given temperature.
Lyman (172) has discussed seven recommended methods for
the prediction of boiling point. The methods are based on physico-
chemical and structural properties and group contributions.
Perhaps the simplest of those methods is that of Banks (173),
who developed the following QSPR:
$$\log T_b(\mathrm{K}) = 2.98 - \frac{4}{\sqrt{\mathrm{MW}}} \tag{10}$$
where MW = molecular weight. No statistics were given for this
QSPR.
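Equation 10 is simple enough to evaluate directly. The following minimal sketch assumes that the logarithm is base 10 and that MW is in g/mol; the function name and the example molecular weight are hypothetical and serve only to show the arithmetic.

```python
import math


def banks_boiling_point(mw):
    """Boiling point in K from Eq. 10: log10(Tb) = 2.98 - 4 / sqrt(MW)."""
    if mw <= 0:
        raise ValueError("molecular weight must be positive")
    return 10 ** (2.98 - 4.0 / math.sqrt(mw))


# Hypothetical chemical with MW = 100 g/mol
print(round(banks_boiling_point(100.0), 1), "K")
```

Because molecular weight is the only input, the equation cannot distinguish isomers, which is one reason the more elaborate methods discussed in this section exist.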
Rechsteiner (174), Reinhard and Drefahl (87), and Dearden
(63) have reviewed the QSPR prediction of boiling point.
Many studies of boiling point prediction have dealt with spe-
cific chemical classes, and very good correlations have generally
been obtained. In 1884 Mills (155) modeled the boiling points of
a number of homologous series with QSPRs based on carbon chain
length, and claimed accuracy to within about 2° (see Eq. 9). Ivanciuc
et al. (175) used four topological descriptors to model the
boiling points of 134 alkanes with a standard error of 2.7°, whilst
Gironés et al. (176) used only one quantum chemical descriptor
(electron–electron repulsion energy) to model the boiling points of
15 alcohols with a standard error of 5.6°.
Models based on diverse training sets are, however, more
widely applicable. Katritzky et al. (177) used four CODESSA
descriptors to model the boiling points of 298 diverse organic
compounds:
$$T_b(\mathrm{K}) = 67.4\,G_I^{1/3} + 21{,}540\,\mathrm{HDSA}(2) + 140.4\,\delta_{\max} + 17.5\,N_{\mathrm{Cl}} - 151.3 \tag{11}$$
n = 298, R² = 0.973, and s = 12.4°,
Table 12
Some software predictions of boiling points of a
100-compound test set (63)
the ACD/Labs software gives by far the best predictions, but has to
be purchased. SPARC is freely accessible, but operates only in
manual mode, with SMILES input. Episuite can be freely down-
loaded, but its standard error of prediction was more than twice
that of SPARC. ECETOC (113) quotes the US EPA testing of the
MPBPVP module of the Episuite software; two very large diverse
test sets yielded the following: n = 4,426, MAE = 15.5°;
n = 6,584, MAE = 20.4°. PREDICTPlus is claimed to have an
MAE of 12.9°. These results are comparable with those of Dearden
given above. Five other software programs, Absolv (67), Chem-
Prop (70), PhysProps (78), OECD Toolbox (24), and PREDICT-
Plus (79), also predict boiling point.
It is recommended that at least two predictions be obtained,
and their average used.
3.6. Vapor Pressure
The vapor pressure (VP) of a chemical controls its release into the
atmosphere, and thus is an important factor in the environmental
distribution of chemicals. Vapor pressure is highly temperature
dependent. Most literature values are at ambient temperature, but
some QSPRs allow predictions over a range of temperatures. The
variation of vapor pressure with temperature is given by the
Clausius–Clapeyron equation:
$$\ln\frac{VP_2}{VP_1} = -\frac{L}{R}\left(\frac{1}{T_2} - \frac{1}{T_1}\right) \tag{13}$$
where L = latent heat of vaporization, and R = universal gas constant.
If the latent heat of vaporization is high, vapor pressure changes
markedly with temperature, which is why some chemicals (e.g.,
PCBs) deposit out in polar regions.
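The temperature dependence can be made concrete with a short sketch of the Clausius–Clapeyron relation in its standard integrated form, ln(VP2/VP1) = −(L/R)(1/T2 − 1/T1), with L assumed constant over the temperature interval. The function name, the two latent heats, and the temperatures are hypothetical values chosen only to show that a higher latent heat gives a much steeper fall in vapor pressure on cooling.

```python
import math

R = 8.314462618  # universal gas constant, J/(mol K)


def vapor_pressure_at(vp1, t1, t2, latent_heat):
    """Extrapolate vapor pressure from T1 to T2 (both in K).

    Uses ln(VP2/VP1) = -(L/R) * (1/T2 - 1/T1), with L (J/mol) assumed constant.
    The result has the same units as vp1.
    """
    return vp1 * math.exp(-(latent_heat / R) * (1.0 / t2 - 1.0 / t1))


# Two hypothetical chemicals with the same VP at 25 °C but different latent heats
for latent_heat in (30_000.0, 80_000.0):
    ratio = vapor_pressure_at(1.0, 298.15, 253.15, latent_heat)
    print(f"L = {latent_heat:.0f} J/mol: VP(-20 °C)/VP(25 °C) = {ratio:.2e}")
```

Treating L as constant is itself an approximation; over wide temperature ranges it would need to be replaced by a temperature-dependent term.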
Table 13
Some software predictions of vapor pressures
at 25 °C of a 100-compound test set (63)
3.7. Henry’s Law Constant (Air–Water Partition Coefficient)
The air–water partition coefficient is important in the distribution
of chemicals between the atmosphere and water in the environment.
The prediction of Henry’s law constant (H) has been
reviewed by Schüürmann and Rothenbacher (200), Schwarzenbach
et al. (111), Reinhard and Drefahl (87), Mackay et al. (201),
and Dearden and Schüürmann (64).
One simple way of calculating H is to use the ratio of vapor
pressure and aqueous solubility (VP/Cw). It is not a highly accurate
method, but neither is the measurement of H, especially for
chemicals with very high or very low H values. VP/Cw can be
converted to the dimensionless form of H (ratio of concentrations
in air and water, Ca/Cw, or Kaw) by the following equation, which is
valid for 25 °C:
$$\frac{C_a}{C_w} = 40.874\,\frac{VP}{C_w} \tag{16}$$
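The factor 40.874 in Eq. 16 is numerically equal to 1/(RT) at 298.15 K when R is expressed in m³ atm/(mol K). The sketch below is built on that reading, which implies VP in atm and aqueous solubility in mol/m³; these units are an inference from the number, not something stated in the text, and the function name and example values are hypothetical.

```python
R_ATM_M3 = 8.205736e-5  # gas constant, m^3 atm / (mol K)


def kaw_from_vp_and_solubility(vp_atm, cw_mol_per_m3, temp_k=298.15):
    """Dimensionless air-water partition coefficient, Kaw = (VP/Cw) / (R T).

    Assumes VP in atm and aqueous solubility in mol/m^3 (inferred units).
    """
    return (vp_atm / cw_mol_per_m3) / (R_ATM_M3 * temp_k)


# Conversion factor at 25 °C, for comparison with the 40.874 of Eq. 16
print(round(1.0 / (R_ATM_M3 * 298.15), 3))

# Hypothetical chemical: VP = 1e-3 atm, aqueous solubility = 10 mol/m^3
print(kaw_from_vp_and_solubility(1e-3, 10.0))
```

Writing the factor as 1/(RT) rather than a fixed constant also makes clear why the equation holds only at 25 °C.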
Most prediction methods for H use a group or bond contribu-
tion approach, although some have used physicochemical proper-
ties (202). The group and bond contribution methods were first
used by Hine and Mookerjee (203), who obtained, for a set of 263
diverse simple organic chemicals, a standard deviation of 0.41 log
unit for the group contribution method and one of 0.42 for the
bond contribution method. Cabani et al. (204) claimed an
improvement in the group contribution method over that of
Hine and Mookerjee, whilst Meylan and Howard (205) extended
the bond contribution method and obtained, for a set of 345
diverse chemicals, a standard error of 0.34 log unit. Their method,
together with a group contribution method, is incorporated in the
HENRYWIN module of the Episuite software (73).
Several workers have used physicochemical and/or structural
descriptors to model H.
Nirmalakhandan and Speece (206) developed a QSPR using a
polarizability descriptor, a molecular connectivity term, and an
indicator variable for hydrogen bonding. However, Schüürmann
and Rothenbacher (200) found it to have poor predictive power.
Russell et al. (207) used their ADAPT software to develop a
5-descriptor model of log Kaw for a relatively small but diverse data
set:
$$\log K_{aw} = 0.547\,\mathrm{NHEAVY} + 0.0402\,\mathrm{WPSA} + 0.0360\,\mathrm{RNCS} + 10.1\,\mathrm{QHET} - 215\,\mathrm{QRELSQ} + 0.73 \tag{17}$$
n = 63, R² = 0.956, and s = 0.375,
4. Guidelines for Developing QSARs and QSPRs
A number of publications have offered guidelines on how to
develop QSARs and QSPRs (11, 61, 213–215). In March 2002 a
meeting of QSAR/QSPR experts was held in Setúbal, Portugal, to
formulate a set of guidelines for the validation of QSARs/QSPRs,
in particular for regulatory purposes. Six guidelines were drawn up,
which were later adopted by the OECD (216) and modified to five.
The guidelines are the following:
A valid QSAR/QSPR should have:
1. A defined endpoint.
2. An unambiguous algorithm.
3. A defined domain of applicability.
4. Appropriate measures of goodness of fit, robustness, and pre-
dictivity.
5. A mechanistic interpretation, if possible.
The guidelines are now known as the OECD Principles for the
Validation of (Q)SARs, although they are intended to apply to
QSPRs also. The OECD has also provided a checklist to provide
guidance on the interpretation of the principles (217).
Table 14
Types of error in QSAR/QSPR development and use
(from Dearden et al. (218), by kind permission of Taylor
& Francis Ltd., publishers (www.informaworld.com))
No.  Type of error                                          Relevant OECD principle(s)
1    Failure to take account of data heterogeneity          1
2    Use of inappropriate endpoint data                     1
3    Use of collinear descriptors                           2, 4, 5
4    Use of incomprehensible descriptors                    2, 5
5    Error in descriptor values                             2
6    Poor transferability of QSAR/QSPR                      2
7    Inadequate/undefined applicability domain              3
8    Unacknowledged omission of data points                 3
9    Use of inadequate data                                 3
10   Replication of compounds in data set                   3
11   Too narrow a range of endpoint values                  3
12   Over-fitting of data                                   4
13   Use of excessive number of descriptors in a QSAR/QSPR  4
14   Lack of/inadequate statistics                          4
15   Incorrect calculation                                  4
16   Lack of descriptor auto-scaling                        4
17   Misuse/misinterpretation of statistics                 4
18   No consideration of distribution of residuals          4
19   Inadequate training/test set selection                 4
20   Failure to validate a QSAR/QSPR correctly              4
21   Lack of mechanistic interpretation                     5
4.1. Heterogeneous Descriptors
There is a temptation, especially if data are scarce, to use values that
are not strictly comparable. For example, aqueous solubilities can
be measured in pure water, as undissociated species, at a given pH,
or at different temperatures. It is important, in the development of
a QSPR, to use data that were obtained under the same conditions,
and if possible using the same protocol. Failure to do so will result
in a less than satisfactory QSPR.
4.2. Inappropriate Endpoint Data
The values of the property of interest must be in molar units, and
not weight units. This is an important matter, but one that is
frequently not recognized. Again using aqueous solubility as an
example, values should be in units of mol/L, and not g/L. The
reason is that the effect of a chemical (be it physicochemical or
biological) is determined by the number of molecules present, and
not by how much they weigh. Consider two chemicals, A with a
molecular weight of 100 and B with a molecular weight of 200.
Both are found to have an aqueous solubility of 100 mg/L. How-
ever, the molar solubility of A is 1 (i.e., 100/100) mmol/L, whilst
that of B is 0.5 (i.e., 100/200) mmol/L, so B is really only half as
soluble as A.
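The conversion in this example can be written as a one-line function. The sketch below reproduces the two chemicals A and B from the text; the function name is hypothetical.

```python
def mg_per_l_to_mmol_per_l(solubility_mg_per_l, molecular_weight):
    """Convert a mass-based solubility (mg/L) to molar units (mmol/L).

    Dividing mg/L by g/mol gives mmol/L directly.
    """
    return solubility_mg_per_l / molecular_weight


# Chemicals A and B from the text: same mass solubility, different molecular weights
for name, mw in (("A", 100.0), ("B", 200.0)):
    print(name, mg_per_l_to_mmol_per_l(100.0, mw), "mmol/L")
```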
4.4. Incomprehensible Descriptors
There are now thousands of molecular descriptors available for use
in QSPR and QSAR model development, and many of them have
no clear physicochemical meaning. Whilst it is not essential for
descriptors to be clearly understood, it is helpful and satisfying if
they are, as well as aiding in the interpretation of the model. It must
nevertheless be recognized that the existence of a correlation,
however good, is not a guarantee of causality.
4.6. Poor Transferability of QSARs and QSPRs
One of the main values of a QSAR or a QSPR is that it can be used
by others for predictive purposes. Hence it has to be transferable
and reproducible. Unfortunately, for various reasons (e.g., lack of
availability of software) this is often not the case. Hartung et al.
(220) have suggested the following criteria for transferability of a
QSAR or a QSPR to a different operator:
(a) Descriptor values can be reproduced.
(b) Model definition can be confirmed.
(c) Goodness of fit and statistical robustness can be confirmed.
(d) Reproducibility of predictions can be confirmed.
(e) An assessment is given of the adequacy of documentation on
the development and application of the model.
4.7. Inadequate/Undefined Applicability Domain
The applicability domain (AD) of a QSAR or a QSPR has been
defined as: “the response and chemical structure space in which the
model makes predictions with a given reliability” (221, 222). It is
permissible to use a QSAR/QSPR to make predictions a little way
outside its AD, but one should have less confidence in the accuracy
of such predictions. For example, if a given toxicity endpoint cor-
related well with log P, and the log P range of the training set
chemicals was 0–6, one could not expect an accurate toxicity pre-
diction for a chemical with a log P value of 9.
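The log P example corresponds to the simplest possible applicability domain check, a range test on a single descriptor. The sketch below illustrates only that idea and is not a full AD method, since the definition quoted above also covers response and structural space; the function name, the tolerance parameter, and the training-set values are hypothetical.

```python
def within_descriptor_range(value, training_values, tolerance=0.0):
    """Return True if value lies inside the training-set range.

    The range may be widened by a small tolerance on each side to allow
    predictions "a little way outside" the domain.
    """
    low, high = min(training_values), max(training_values)
    return (low - tolerance) <= value <= (high + tolerance)


# Hypothetical training-set log P values spanning 0-6
training_log_p = [0.0, 1.2, 2.5, 3.1, 4.8, 6.0]
for query in (3.0, 6.3, 9.0):
    inside = within_descriptor_range(query, training_log_p, tolerance=0.5)
    print(f"log P = {query}: within domain = {inside}")
```

How large a tolerance is acceptable is a judgment call; the text says only that confidence should fall as predictions move outside the domain.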
At present, very few published QSAR/QSPR papers give an
indication of AD, although if descriptor values are given one can see
what ranges of descriptor values were used in the training set. In
addition, of course, one should not use a QSAR or a QSPR to make
predictions for a chemical that is not structurally similar to at least
some of the chemicals in the training set. Again, this guideline is not
always adhered to.
4.8. Unacknowledged Omission of Data Points
Data used for the development of a QSAR/QSPR are often taken
from published literature, and one may, for a number of reasons,
wish to use only a selection of those available. If some data are
omitted, that must be stated, with reasons (e.g., to keep the train-
ing set to a reasonable number of chemicals, or to examine a
particular class of chemicals). However, it is not uncommon to
find that data have been pruned without a reason being given, or
even without a mention that data have been omitted. Probably the
main reason for omission of data is that the omitted chemicals were
found to be outliers; that is, their property of interest was not well
predicted by the QSAR/QSPR. If this is the case, it must be stated
clearly, and preferably a reason should be given (e.g., the omitted
chemicals were the only ones that were strongly dissociated).
4.9. Use of Inadequate Data
Inadequacy of data can occur in a number of ways. It can include
heterogeneity (Subheading 4.1), inappropriate data (Subheading 4.2),
an undefined applicability domain (Subheading 4.7),
and omission of data (Subheading 4.8). Another common problem
is the accuracy of data, which is often very difficult to determine.
Sometimes one finds incorrect or inadequately defined chemi-
cal names. For example, a QSAR study of skin absorption (223)
listed 4-chlorocresol and chloroxylenol in the training set used. The
former has two isomers (4-chloro-2-cresol and 4-chloro-3-cresol),
whilst chloroxylenol has 18 isomers.
A recent study (224) found that the proportion of incorrect chemical
structures in a number of public and private databases ranged from 0.1 to
3.4%, and observed that even slight structural errors could cause
pronounced changes in the accuracy of QSAR/QSPR predictions.
It should also be noted that it is unacceptable to use predicted
values of the property of interest, when developing a QSAR or a
QSPR, as one is then making predictions about predictions. An
example is a study of skin permeability of 114 chemicals, 63 of
which had calculated permeability values (225).
4.11. Too Narrow a Range of Endpoint Values
The greater the range of endpoint values used in the development
of a QSPR model, the better is its predictivity (229). It is recommended
that a range of endpoint values of at least 1.0 log unit be
used, if possible, for a good QSAR/QSPR model to be developed
(229). Sometimes, of course, that cannot be achieved, perhaps
through lack of availability of sufficient data, or through sheer
impossibility. For example, a QSPR for the melting points of sub-
stituted anilines used chemicals with a melting range of
244.5–461.5 K, or 0.276 log unit (151).
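The range quoted for the aniline melting points can be checked with a short calculation. The sketch assumes the range is expressed as log10(max/min) of strictly positive endpoint values; the function name is hypothetical.

```python
import math


def endpoint_log_range(values):
    """Span of strictly positive endpoint values in log10 units."""
    return math.log10(max(values) / min(values))


span = endpoint_log_range([244.5, 461.5])  # melting points in K, from the text
print(f"{span:.3f} log unit; meets the 1.0 log unit guideline: {span >= 1.0}")
```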
4.13. Use of Excessive Number of Descriptors in a QSAR/QSPR
QSPRs with a large number of descriptors are difficult to interpret
(233). Dearden et al. (218) recommended a maximum of five or six
descriptors as a general rule, largely on the grounds of understanding.
The principle of Occam’s razor is apposite here: “Entia non
sunt multiplicanda praeter necessitatem” (“One should not
increase beyond what is necessary the number of entities required
to explain anything”). However, occasionally QSARs/QSPRs are
developed with a large number of descriptors, as for example the
use of 55 descriptors to model the aqueous solubility of 1,050
chemicals (234). This practice should be avoided because of the
difficulty of comprehension and the risk of over-fitting (see Sub-
heading 4.12).
4.14. Lack of/Inadequate Statistics
The statistics provided with a QSAR/QSPR are an indication of
how well the model fits the training set data, and how predictive the
model is. Many QSAR and QSPR models are still published without
full statistics, which means that it is difficult, if not impossible,
to judge the validity of a model. Dearden et al. (218) have recommended
that the following statistical indicators be included with
each published QSAR/QSPR: n (number of chemicals in the training
set); r² or R² (coefficient of determination or squared correlation
coefficient, lower case for a 1-descriptor model and upper case
for a multi-descriptor model); q² or Q² (squared cross-validated
correlation coefficient, an internal indicator of predictivity); R²adj
(squared correlation coefficient adjusted for degrees of freedom,
which allows comparison between QSARs and QSPRs containing
different numbers of descriptors); s (standard error of the estimate)
or RMSE (very similar to standard error of the estimate, especially
4.15. Incorrect Calculation
It is the duty of authors to ensure that calculations that are made to
obtain their results are as accurate as possible. Editors and manuscript
reviewers expect that to be the case, and it is often impossible
to check accuracy because insufficient data are supplied. There is no
doubt that incorrectly calculated QSARs and QSPRs have been
published, but no one has investigated this problem. A few
instances have come to light. Dearden et al. (218) reported a case
in which a 5-descriptor QSAR was published with R² = 0.958,
s = 0.14, and F = 79. Because all descriptor values were given, it
was possible to recalculate the QSAR, and the statistics were found
to be R² = 0.298, s = 0.56, and F = 1.4.
It is recommended that all calculations be double-checked,
preferably by two different people, before a QSAR or a QSPR is
published.
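One way to make such a double-check routine is to recompute the headline statistics from the observed and fitted values. The sketch below does this for R², adjusted R², s, and F, assuming an ordinary least-squares model with an intercept; the function name and the data are hypothetical and are not taken from the case described above.

```python
import math


def regression_statistics(observed, fitted, n_descriptors):
    """Recompute R2, adjusted R2, s and F from observed and fitted values.

    Assumes an ordinary least-squares model with an intercept, so that the
    residual degrees of freedom are n - n_descriptors - 1.
    """
    n = len(observed)
    dof = n - n_descriptors - 1
    mean_obs = sum(observed) / n
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return {
        "n": n,
        "R2": r2,
        "R2_adj": 1.0 - (1.0 - r2) * (n - 1) / dof,
        "s": math.sqrt(ss_res / dof),
        "F": ((ss_tot - ss_res) / n_descriptors) / (ss_res / dof),
    }


# Hypothetical observed and fitted endpoint values for a 1-descriptor model
stats = regression_statistics([1.0, 2.0, 3.0, 4.0, 5.0],
                              [1.1, 1.9, 3.2, 3.8, 5.0], n_descriptors=1)
print({key: round(value, 3) for key, value in stats.items()})
```

A recalculation of this kind needs only the observed and fitted values, which is a further argument for publishing them alongside the model.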
4.17. Misuse or Misinterpretation of Statistics
Not many QSAR/QSPR practitioners are statisticians, and few
statisticians have experience of QSAR/QSPR modeling. Nevertheless,
statistics is an essential QSAR/QSPR tool. It is therefore not
surprising to find that it is sometimes misused, or its results mis-
interpreted. Livingstone’s book (42) is very helpful in this respect,
and a survey of QSAR/QSPR statistics available on the Internet
(235) gives useful guidance. Useful Web sites are those of the
Scripps Institute (236) and QSAR World (237).
Two cases involving the misuse or misinterpretation of statistics
(231, 232) have already been mentioned in Subheading 4.12.
It behoves QSAR/QSPR workers to familiarize themselves
sufficiently with the requisite statistics, or to enlist the help of a
knowledgeable statistician, in order to ensure that the statistical
techniques and results that they use are valid.
4.19. Inadequate Training/Test Set Selection
Data for QSAR/QSPR modeling are usually divided into training
and test sets either randomly or by ordering the chemicals according
to endpoint values and then selecting every nth chemical for the
test set. These approaches have, however, been shown to be subop-
timal (14, 238). The training set should cover a good range of
endpoint values and (for diverse data sets) have an adequate cover-
age of the requisite chemical space. The test set chemicals should
also cover a good range of endpoint values, be sufficiently diverse in
nature, and be similar (but not too similar) to training set chemi-
cals. Various techniques (14, 15, 239, 240) have been proposed for
the rational selection of training and test sets.
It is essential that proper attention is paid to training and test
set selection when preparing to develop a QSAR/QSPR model. It
is recognized, however, that a drawback, often a serious one, is the
lack of satisfactory and appropriate data.
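For reference, the simple "every nth chemical" scheme mentioned at the start of this subsection can be sketched as follows. It is shown only to make the procedure concrete; as noted above, it has been reported to be suboptimal, and the rational selection techniques cited (14, 15, 239, 240) are not implemented here. The function name and the data set are hypothetical.

```python
def every_nth_split(chemicals, endpoint, n=5):
    """Order chemicals by endpoint value and send every nth one to the test set."""
    ranked = sorted(chemicals, key=endpoint)
    test = ranked[n - 1::n]
    training = [c for i, c in enumerate(ranked) if (i + 1) % n != 0]
    return training, test


# Hypothetical data set of (name, endpoint value) pairs
values = [3.2, 0.5, 4.1, 1.8, 2.7, 5.0, 0.9, 3.6, 2.2, 4.4]
data = [(f"chem{i}", value) for i, value in enumerate(values)]
training, test = every_nth_split(data, endpoint=lambda c: c[1], n=5)
print("test set:", [name for name, _ in test])
```

In this scheme the lowest endpoint values always go to the training set and the test set depends only on endpoint rank, so coverage of chemical space is not considered at all.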
4.21. Lack of Mechanistic Interpretation
The documentation for the OECD Principles for the Validation of
(Q)SARs (216) states: “It is recognised that it is not always possible,
from a scientific viewpoint, to provide a mechanistic interpretation of
a given (Q)SAR (Principle 5), or that there even be multiple mechanistic
interpretations of a given model. The absence of a mechanistic
interpretation for a model does not mean that a model is not potentially
useful in the regulatory context. The intent of Principle 5 is not
to reject models that have no apparent mechanistic basis, but to ensure
that some consideration is given to the possibility of a mechanistic
association between the descriptors used in a model and the endpoint
being predicted, and to ensure that this association is documented.”
The OECD guidance on the Principles for the Validation of
(Q)SARs/(Q)SPRs (217) recommends that the following ques-
tions be asked regarding the mechanistic basis of a QSAR/QSPR:
1. Do the descriptors have a physicochemical interpretation that is
consistent with a known mechanism?
2. Can any literature references be cited in support of the pur-
ported mechanistic basis of the QSAR/QSPR?
If the answers to both questions are positive, one may have
some confidence in the proposed mechanism of action. If the
answer to one or both questions is negative, then the level of
confidence will be lower. In all cases, it must be remembered that
the existence of a correlation does not imply causality.
Johnson (245) recently commented that QSAR has devolved
into a kind of logical fallacy: cum hoc ergo propter hoc (with this,
therefore because of this). He also stated: “rarely, if ever, are any
designed experiments presented to test or challenge the interpreta-
tion of the descriptors . . . Statistical methodologies should be a tool
of QSAR but instead have often replaced the craftsman tools of our
trade—rational thought, controlled experiments, and personal
observation.” The QSAR/QSPR practitioner would do well to
take those strictures to heart, as well as the recommendations
made in a recent extensive review of QSPR prediction of
physicochemical properties (246).
References
1. van de Waterbeemd H (2009) Improving compound quality through in vitro and in silico profiling. Chem Biodivers 6:1760–1766
2. Cronin MTD, Livingstone DJ (2004) Calculation of physicochemical properties. In: Cronin MTD, Livingstone DJ (eds) Predicting chemical toxicity and fate. CRC, Boca Raton, FL, pp 31–40
3. Fisk PR, McLaughlin L, Wildey RJ (2004) Good practice in physicochemical property prediction. In: Cronin MTD, Livingstone DJ (eds) Predicting chemical toxicity and fate. CRC, Boca Raton, FL, pp 41–59
4. Webb TH, Morlacci LA (2010) Calculation of physico-chemical and environmental fate properties. In: Cronin MTD, Madden JC (eds) In silico toxicology: principles and applications. RSC Publishing, Cambridge, pp 118–147
5. Dearden JC (2004) QSAR modeling of bioaccumulation. In: Cronin MTD, Livingstone DJ (eds) Predicting chemical toxicity and fate. CRC, Boca Raton, FL, pp 333–355
6. Dearden JC (2004) QSAR modeling of soil sorption. In: Cronin MTD, Livingstone DJ (eds) Predicting chemical toxicity and fate. CRC, Boca Raton, FL, pp 357–371
7. Schüürmann G, Ebert R-U, Nendza M et al (2007) Predicting fate-related physicochemical properties. In: van Leeuwen CJ, Vermeire TG (eds) Risk assessment of chemicals: an introduction, 2nd edn. Springer, Dordrecht, pp 375–426
8. Abraham MH, Chadha HS, Mitchell RC (1994) Hydrogen bonding. 32. An analysis of water–octanol and water–cyclohexane partitioning and the Δlog P parameter of Seiler. J Pharm Sci 83:1085–1100
9. Mannhold R, Poda GI, Ostermann C et al (2009) Calculation of molecular lipophilicity: state-of-the-art and comparison of log P methods on more than 96,000 compounds. J Pharm Sci 98:861–893
10. Dearden JC (2006) In silico prediction of aqueous solubility. Exp Opin Drug Discov 1:31–52
11. Livingstone DJ (2004) Building QSAR models: a practical guide. In: Cronin MTD, Livingstone DJ (eds) Predicting chemical toxicity and fate. CRC, Boca Raton, FL, pp 151–170
12. SMILES: www.daylight.com/dayhtml_tutorials/languages/smiles/index.html
13. Nendza M, Aldenberg T, Benfenati E et al (2010) Data quality assessment for in silico methods: a survey of approaches and needs. In: Cronin MTD, Madden JC (eds) In silico toxicology: principles and applications. RSC Publishing, Cambridge, pp 59–117
14. Leonard JT, Roy K (2006) On selection of training and test sets for the development of predictive QSAR models. QSAR Comb Sci 25:235–251
15. Golbraikh A, Shen M, Xiao Z et al (2003) Rational selection of training and test sets for the development of validated QSAR models. J Comput Aided Mol Des 17:241–253
16. Aquasol: www.pharmacy.arizona.edu/outreach/aquasol/
17. Tripos: www.tripos.com
18. Chemical & Physical Properties Database: www.dep.state.pa.us/physicalproperties/CPP_search.htm
19. Chemical Database Service: cds.dl.ac.uk
20. ChemSpider: www.chemspider.com
21. Crossfire: info.crossfiredatabases.com
22. OCHEM: www.ochem.eu
23. OECD eChemPortal: www.echemportal.org
24. OECD QSAR Toolbox: www.qsartoolbox.org
25. OSHA: www.osha.gov/web/dep/chemicaldata/
26. PhysProp: www.syrres.com/what-we-d0/product.aspx?id=133
27. Wagner AB (2001) Finding physical properties of chemicals: a practical guide for scientists, engineers, and librarians. Sci Technol Lib 21(3/4):27–45
28. Oyarzabal J, Pastor J, Howe TJ (2009) Optimizing the performance of in silico ADMET general models according to local requirements: MARS approach. Solubility estimations as case study. J Chem Inf Model 49:2837–2850
29. Dearden JC (1990) Physico-chemical descriptors. In: Karcher W, Devillers J (eds) Practical applications of quantitative structure–activity relationships (QSARs) in environmental chemistry and toxicology. Kluwer Academic, Dordrecht, pp 25–59
30. Maran U, Sild S, Tulp I et al (2010) Molecular descriptors from two-dimensional chemical structure. In: Cronin MTD, Madden JC (eds) In silico toxicology: principles and applications. RSC Publishing, Cambridge, pp 148–192
31. Ran YQ, Jain N, Yalkowsky SH (2001) Prediction of aqueous solubility of organic compounds by the general solubility equation (GSE). J Chem Inf Comput Sci 41:1208–1217
89. Mannhold R, van de Waterbeemd H (2001) Substructure and whole molecule approaches for calculating log P. Comput Aided Mol Des 15:337–354
90. Livingstone DJ (2003) Theoretical property predictions. Curr Top Med Chem 3:1171–1192
91. Klopman G, Zhu H (2005) Recent methodologies for the estimation of n-octanol/water partition coefficients and their use in the prediction of membrane transport properties of drugs. Mini Rev Med Chem 5:127–133
92. Fujita T, Iwasa J, Hansch C (1964) A new substituent constant, π, derived from partition coefficients. J Am Chem Soc 86:5175–5180
93. Nys GG, Rekker RF (1973) Statistical analysis of a series of partition coefficients with special reference to the predictability of folding of drug molecules. Introduction of hydrophobic fragmental constants (f values). Chim Ther 8:521–535
94. Rekker RF (1977) The hydrophobic fragmental constant. Elsevier, Amsterdam
95. Leo A, Jow PYC, Silipo C et al (1975) Calculation of hydrophobic constant (log P) from π and f values. J Med Chem 18:865–868
96. Bodor N, Gabanyi NZ, Wong C-K (1989) A new method for the estimation of partition coefficient. J Am Chem Soc 111:3783–3786
97. Klopman G, Wang S (1991) A computer automated structure evaluation (CASE) approach to calculation of partition coefficient. J Comput Chem 12:1025–1032
98. Ghose AK, Pritchett A, Crippen GM (1988) Atomic physicochemical parameters for three dimensional structure directed quantitative structure–activity relationships: III. Modeling hydrophobic interactions. J Comput Chem 9:80–90
99. Liu R, Zhou D (2008) Using molecular fingerprint as descriptors in the QSPR study of lipophilicity. J Chem Inf Model 48:542–549
100. Chen H-F (2009) In silico log P prediction for a large data set with support vector machines, radial basis neural networks and multiple linear regression. Chem Biol Drug Des 74:142–147
101. Tetko IV, Tanchuk VYu, Villa AEP (2001) Prediction of n-octanol/water partition coefficients from PHYSPROP database using artificial neural networks and E-state indices. J Chem Inf Comput Sci 41:1407–1421
102. Hall LH, Kier LB (1999) Molecular structure description: the electrotopological state. Academic, New York, NY
103. Dearden JC, Netzeva TI, Bibby R (2003) A comparison of commercially available software for the prediction of partition coefficient. In: Ford M, Livingstone D, Dearden J et al (eds) Designing drugs and crop protectants: processes, problems and solutions. Blackwell, Oxford, pp 168–169
104. Sakuratani Y, Kasai K, Noguchi Y et al (2007) Comparison of predictivities of log P calculation models based on experimental data for 134 simple organic compounds. QSAR Comb Sci 26:109–116
105. COSMOlogic: www.cosmologic.de
106. Varnek A, Fourches D, Solov’ev VP et al (2004) “In silico” design of new uranyl extractants based on phosphoryl-containing podands: QSPR studies, generation and screening of virtual combinatorial library, and experimental tests. J Chem Inf Comput Sci 44:1365–1382
107. Dearden JC, Cronin MTD, Schultz TW et al (1995) QSAR study of the toxicity of nitrobenzenes to Tetrahymena pyriformis. Quant Struct Act Relat 14:427–432
108. Katritzky AR, Wang Y, Sild S et al (1998) QSPR studies on vapor pressure, aqueous solubility, and the prediction of air–water partition coefficients. J Chem Inf Comput Sci 38:720–725
109. Yalkowsky SH, Banerjee S (1992) Aqueous solubility: methods of estimation for organic compounds. Dekker, New York, NY
110. Mackay D (2000) Solubility in water. In: Boethling RS, Mackay D (eds) Handbook of property estimation methods for chemicals: environmental and health sciences. Lewis, Boca Raton, FL, pp 125–139
111. Schwarzenbach RP, Gschwend PM, Imboden DM (1993) Environmental organic chemistry. Wiley, New York, NY
112. Johnson SR, Zheng W (2006) Recent progress in the computational prediction of aqueous solubility and absorption. AAPS J 8:E27–E40
113. ECETOC Technical Report No. 89 (2003) (Q)SARs: evaluation of the commercially available software for human health and environmental endpoints with respect to chemical management applications. ECETOC, Brussels
114. Hansch C, Quinlan JE, Lawrence GL (1968) The linear free energy relationship between partition coefficients and aqueous solubility of organic liquids. J Org Chem 33:347–350
115. Yalkowsky SH, Valvani SC (1980) Solubility and partitioning I: solubility of nonelectrolytes in water. J Pharm Sci 69:912–922
6 Prediction of Physicochemical Properties 133
116. Hughes LD, Palmer DS, Nigsch F et al 129. Dearden JC, Netzeva TI, Bibby R (2003) A
(2008) Why are some properties more diffi- comparison of commercially available soft-
cult to predict than others? A study of QSPR ware for the prediction of aqueous solubility.
models of solubility, melting point, and log P. In: Ford M, Livingstone D, Dearden J et al
J Chem Inf Model 48:220–232 (eds) Designing drugs and crop protectants:
117. Sanghvi T, Jain N, Yang G et al (2003) Esti- processes, problems and solutions. Blackwell,
mation of aqueous solubility by the general Oxford, pp 169–171
solubility equation (GSE) the easy way. QSAR 130. Dearden JC. Unpublished information
Comb Sci 22:258–262 131. Harris JC, Hayes MJ (1990) Acid dissociation
118. Abraham MH, Le J (1999) The correlation constant. In: Lyman WJ, Reehl WF, Rosen-
and prediction of the solubility of compounds blatt DH (eds) Handbook of chemical prop-
in water using an amended solvation energy erty estimation methods. American Chemical
relationship. J Pharm Sci 88:868–880 Society, Washington, DC, pp 6.1–6.28
119. Votano JR, Parham M, Hall LH et al (2004) 132. Brown TN, Mora-Diez N (2006) Computa-
Prediction of aqueous solubility based on tional determination of aqueous pKa values of
large datasets using several QSPR models uti- protonated benzimidazoles (Part 2). J Phys
lizing topological structure representation. Chem B 110:20546–20554
Chem Biodivers 11:1829–1841 133. Kaschula CH, Egan TJ, Hunter R et al (2002)
120. Raevsky OA, Raevskaja OE, Schaper K-J Structure–activity relationships in 4-
(2004) Analysis of water solubility data on aminoquinoline antiplasmodials. The role of
the basis of HYBOT descriptors. Part 3. Sol- the group at the 7-position. J Med Chem
ubility of solid neutral chemicals and drugs. 45:3531–3539
QSAR Comb Sci 23:327–343 134. Soriano E, Cerdan S, Ballesteros P (2004)
121. Klopman G, Zhu H (2001) Estimation of the Computational determination of pK(a)
aqueous solubility of organic molecules by the values. A comparison of different theoretical
group contribution approach. J Chem Inf approaches and a novel procedure. J Mol
Comput Sci 41:439–445 Struct Theochem 684:121–128
122. Palmer DS, O’Boyle NM, Glen RC et al 135. Klopman G, Fercu D (1994) Application of
(2007) Random forest models to predict the multiple computer automated structure
aqueous solubility. J Chem Inf Model 47: evaluation methodology to a quantitative
150–158 structure–activity relationship study of acidity.
123. Lind P, Maltseva T (2003) Support vector J Comput Chem 15:1041–1050
machines for the estimation of aqueous solu- 136. Klamt A, Eckert F, Diedenhofen M et al
bility. J Chem Inf Comput Sci 43:1855–1859 (2003) First principles calculations of aque-
124. Duchowicz PR, Talevi A, Bruno-Blanch LE ous pK(a) values for organic and inorganic
et al (2008) New QSPR study for the predic- acids using COSMO-RS reveal an inconsis-
tion of aqueous solubility of drug-like com- tency in the slope of the pK(a) scale. J Phys
pounds. Bioorg Med Chem 16:7944–7955 Chem A 107:9380–9386
125. Duchowicz PR, Castro EA (2009) QSPR 137. Eckert F, Klamt A (2006) Accurate prediction
studies on aqueous solubilities of drug-like of basicity in aqueous solution with COSMO-
compounds. Int J Mol Sci 10:2558–2577 RS. J Comput Chem 27:11–19
126. Huuskonen J, Livingstone DJ, Manallack DT 138. Lee AC, Yu J-Y, Crippen GM (2008) pKa
(2008) Prediction of drug solubility from prediction of monoprotic small molecules
molecular structure using a drug-like training the SMARTS way. J Chem Inf Model
set. SAR QSAR Environ Res 19:191–212 48:2042–2053
127. Yang G-Y, Yu J, Wang Z-Y et al (2007) QSPR 139. Milletti F, Storchi L, Sforna G et al (2007)
study on the aqueous solubility (lgS(w)) New and original pKa prediction method
and n-octanol/water partition coefficients using GRID molecular interaction fields.
(lgK(ow)) of polychlorinated dibenzo-p- J Chem Inf Model 47:2172–2181
dioxins (PCDDs). QSAR Comb Sci 26: 140. Cruciani G, Milletti F, Storchi L et al (2009)
352–357 In silico prediction and ADME profiling.
128. Wei X-Y, Ge Z-G, Wang Z-Y et al (2007) Chem Biodivers 6:1812–1821
Estimation of aqueous solubility (lgS(w)) 141. Parthasarathi R, Padmanabhan J, Elango M
of all polychlorinated biphenyl (PCB) conge- et al (2006) pKa prediction using group phi-
ners by density function theory and position licity. J Phys Chem A 110:6540–6544
of Cl substitution (N-PCS) method. Chinese 142. Tsantili-Kakoulidou A, Panderi I, Csizmadia
J Struct Chem 26:519–528 F et al (1997) Prediction of distribution
134 J.C. Dearden
169. Marrero J, Gani R (2001) Group-contribution 181. Hall LH, Story CT (1996) Boiling point and
based estimation of pure component proper- critical temperature of a heterogeneous data
ties. Fluid Phase Equil 183–184:183–208 set. QSAR with atom type electrotopological
170. Tu C-H, Wu Y-S (1996) Group-contribution state indices using artificial neural networks.
estimation of normal freezing points of J Chem Inf Comput Sci 36:1004–1014
organic compounds. J Chin Inst Chem Eng 182. Kier LB, Hall LH (1999) Molecular structure
27:323–328 description: the electrotopological state. Aca-
171. Gold PI, Ogle GJ (1969) Estimating thermo- demic, San Diego, CA
physical properties of liquids. Part 4—Boil- 183. Stein SE, Brown RL (1994) Estimation of
ing, freezing and triple-point temperatures. normal boiling points from group contribu-
Chem Eng 76:119–122 tions. J Chem Inf Comput Sci 34:581–587
172. Lyman WJ (2000) Boiling point. In: Boethl- 184. Labute P (2000) A widely applicable set of
ing RS, Mackay D (eds) Handbook of prop- descriptors. J Mol Graph Model 18:464–477
erty estimation methods for chemicals: 185. Ericksen D, Wilding WV, Oscarson JL et al
environmental and health sciences. Lewis, (2002) Use of the DIPPR database for devel-
Boca Raton, FL, pp 29–51 opment of QSPR correlations: normal boiling
173. Banks WH (1939) Considerations of a vapour point. J Chem Eng Data 47:1293–1302
pressure-temperature equation, and their 186. Grain CF (1990) Vapor pressure. In: Lyman
relation to Burnop’s boiling point function. WJ, Reehl WF, Rosenblatt DH (eds) Hand-
J Chem Soc 292–295 book of chemical property estimation meth-
174. Rechsteiner CE (1990) Boiling point. In: ods. American Chemical Society, Washington,
Lyman WJ, Reehl WF, Rosenblatt DH (eds) DC, pp 14.1–14.20
Handbook of chemical property estimations 187. Delle Site A (1996) The vapor pressure of
methods. American Chemical Society, environmentally significant organic chemi-
Washington, DC, pp 12.1–12.55 cals: a review of methods and data at ambient
175. Ivanciuc O, Ivanciuc T, Cabrol-Bass D et al temperature. J Phys Chem Ref Data 26:
(2000) Evaluation in quantitative structure– 157–193
property relationship models of structural 188. Sage ML, Sage GW (2000) Vapor pressure.
descriptors derived from information-theory In: Boethling RS, Mackay D (eds) Handbook
operators. J Chem Inf Comput Sci 40: of property estimation methods for chemicals:
631–643 environmental and health sciences. Lewis,
176. Gironés X, Amat L, Robert D et al (2000) Boca Raton, FL, pp 53–65
Use of electron–electron repulsion energy as a 189. Katritzky AR, Slavov SH, Dobchev DA et al
molecular descriptor in QSAR and QSPR (2007) Rapid QSPR model development
studies. J Comput Aided Mol Des 14: technique for prediction of vapor pressure of
477–485 organic compounds. Comput Chem Eng
177. Katritzky AR, Mu L, Lobanov VS et al (1996) 31:1123–1130
Correlation of boiling points with molecular 190. Liang CK, Gallagher DA (1998) QSPR pre-
structure. 1. A training of 298 diverse organ- diction of vapor pressure from solely
ics and a test set of 9 simple inorganics. J Phys theoretically-derived descriptors. J Chem Inf
Chem 100:10400–10407 Comput Sci 38:321–324
178. Sola D, Ferri A, Banchero M et al (2008) 191. Tu C-H (1994) Group-contribution method
QSPR prediction of N-boiling point and crit- for the estimation of vapor pressures. Fluid
ical properties of organic compounds and Phase Equil 99:105–120
comparison with a group-contribution 192. Öberg T, Liu T (2008) Global and local PLS
method. Fluid Phase Equil 263:33–42 regression models to predict vapor pressure.
179. Wessel MD, Jurs PC (1995) Prediction of QSAR Comb Sci 27:273–279
normal boiling points for a diverse set of 193. Basak SC, Mills D (2009) Predicting the
industrially important organic compounds vapour pressure of chemicals from structure:
from molecular structure. J Chem Inf Com- a comparison of graph theoretic versus quan-
put Sci 35:841–850 tum chemical descriptors. SAR QSAR Envi-
180. Basak SC, Mills D (2001) Use of mathemati- ron Res 20:119–132
cal structural invariants in the development of 194. Goll ES, Jurs PC (1999) Prediction of vapor
QSPR models. Commun Math Comput pressures of hydrocarbons and halohydrocar-
Chem 44:15–30 bons from molecular structure with a
136 J.C. Dearden
application and interpretation of QSPR mod- 245. Johnson SR (2008) The trouble with QSAR
els. QSAR Comb Sci 22:69–77 (or how I learned to stop worrying and
243. Benigni R, Bossa C (2008) Predictivity of embrace fallacy). J Chem Inf Model 48:25–26
QSAR. J Chem Inf Model 48:971–980 246. Katritzky AR, Kuanar M, Slavov S et al (2010)
244. Dearden JC, Hewitt M, Geronikaki AA et al Quantitative correlation of physical and
(2009) QSAR investigation of new cogni- chemical properties with chemical structure:
tion enhancers. QSAR Comb Sci 28: utility for prediction. Chem Rev
1123–1129 110:5714–5789
Chapter 7
Abstract
Computational molecular models of chemicals interacting with biomolecular targets provide toxicologists
with a valuable, affordable, and sustainable source of in silico molecular-level information that augments,
enriches, and complements in vitro and in vivo efforts. From a molecular biophysical ansatz, we describe
how 3D molecular modeling methods used to numerically evaluate the classical pair-wise potential at the
chemical/biological interface can inform mechanism of action and the dose–response paradigm of modern
toxicology. With an emphasis on molecular docking, 3D-QSAR, and pharmacophore/toxicophore
approaches, we demonstrate how these methods can be integrated with chemoinformatic and toxicogenomic
efforts into a tiered computational toxicology workflow. We describe generalized protocols in which
3D computational molecular modeling is used to enhance our ability to predict and model the most
relevant toxicokinetic, metabolic, and molecular toxicological endpoints, thereby accelerating the
computational toxicology-driven basis of modern risk assessment while providing a starting point for rational
sustainable molecular design.
Key words: Docking, Molecular model, Virtual ligand screening, Virtual screening, Enrichment,
Toxicity, Toxicoinformatics, Discovery, Prediction, 3D-QSAR, Toxicophore, Toxicant, In silico,
Pharmacophore
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_7, © Springer Science+Business Media, LLC 2012
140 M.R. Goldsmith et al.
1. Introduction
Historically, molecular modeling methods, which stem from theoretical and computational chemistry,
comprise an ensemble of well-developed and thoroughly vetted computational approaches used to investigate
molecular-level processes and phenomena, including but not limited to molecular structure, chemical catalysis,
geochemistry, interfacial chemistry, nanotechnology, conformational analysis, stereoselectivity, enzyme
biochemistry, chemical reaction dynamics, solvation, molecular aggregation, and molecular design.
7 Informing Mechanistic Toxicology with Computational Molecular Models 141
Table 1
This table shows the overlap between macroscopic and mechanistic toxicology,
examples of targets for which pair-wise ligand/target interactions are most often
sought after, and the molecular modeling methods used to inform the toxicological
questions. Toxicology research streams (toxicokinetics/dynamics), specific
toxicology-related processes (ADME/T), examples of toxicologically related
biological macromolecules implicated in specific processes and the three-
dimensional Computational Molecular Modeling methods (3D-CAMM) elaborated
in this text
Fig. 1. Point of departure from (a) 1D chemical SMILES notation to 3D representation, with atom types and specific
coordinates spatially defined. (d–f) The three major classes of molecular modeling methods used to evaluate ligand/target
interactions.
their specific connectivity) and are the only viable alternatives for
reliable a priori estimates for risk assessment.
1.2. Exploring Ligand:Target Interactions Implicitly: 3D-QSAR and Pharmacophores
Unlike specific models of both biological macromolecules and small-molecule ligands, both 3D-QSAR
and pharmacophore methods address the fundamental chemical/biological aspects of pair-wise
interactions implicitly. Both deal with the explicit (i.e., full 3D) structure of a chemical of interest,
and both require a training/test set of chemicals with known activities for a given target and a
given mode of action (i.e., agonist or antagonist, substrate or inhibitor). However, neither
pharmacophore nor 3D-QSAR approaches can provide specific molecular-level detail about the
interactions between atoms of the macromolecule and atoms of the ligands that give rise to said
activity. To obtain such detail, pair-wise interactions between the ligand and the target molecule
must be spatially defined. Nonetheless, both methods are a step in the right direction from
traditional 2D-QSAR since inherently both 3D-QSAR and
pharmacophore models have the ability to discriminate activity
1.3. Modeling Explicit Pair-Wise Interaction Potential of Ligand–Target: Molecular Mechanics, Empirical Scoring, and the Need for Structurally Informed Molecular Models
In the specific case of modeling ligand/target interactions for virtual ligand screening as applied
to toxicology, certain methods for evaluating pair-wise interaction energy are too computationally
expensive and scale poorly with system size. Quantum mechanical (QM) methods, sometimes referred to
as quantum chemical or electronic structure theory methods, are highly accurate, but their
computational demand makes them impractical for almost all of the interaction partners and
processes listed in Table 1, with the possible exception of the bond breaking/making processes
inherent in metabolic reactions or irreversible binding. Although the principal focus of in silico
methods to estimate metabolic rate constants has been quantum mechanical (24, 25), the majority of
the pair-wise interactions a toxicologist will need to evaluate are ligand/macromolecular target
interactions. These are composed of both bonded (ligand and target “self-energy”) and nonbonded
interactions between a small molecule and a target biological macromolecule (i.e., receptor or
enzyme), for which structural optimization routines are adequate within a classical physics
formalism or, more specifically, within a molecular mechanics (MM) framework in which the smallest
units of relevance are atoms (not electrons, as in the case of QM approaches).
The classical physics approach to modeling molecules rests on the assumptions of molecular
mechanics, which makes use of atom-specific functions, or force field parameters, that have been
derived from a variety of experiments or high-level theoretical calculations (i.e., ab initio or
semi-empirical QM). These parameters enter atom-specific terms that describe all bonded and
nonbonded interactions (conformational energy as a function of dihedral angles, bond angles, and
bond lengths; intramolecular electrostatic interactions; and van der Waals, or dispersion, forces)
in Cartesian space. The terms are ultimately integrated over all space of the individual molecule
or ligand/biomolecule complex to estimate the “intermolecular” interaction energy. The “pair-wise
interaction potential” between a ligand and a macromolecular target is provided in a
simplified form in Fig. 2.
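As a rough illustration of the nonbonded part of such a pair-wise potential, the following Python sketch sums a Lennard-Jones 12-6 term and a Coulomb term over every ligand/target atom pair. It is a minimal sketch under stated assumptions, not the chapter's method: the function name, the single shared well depth and minimum-energy distance, and the Coulomb constant (about 332.06 kcal Å/(mol e²), whose exact value varies slightly between packages) are illustrative choices, and real force fields assign parameters per atom type and also include bonded terms.

```python
import math

# Assumed Coulomb constant in kcal*Angstrom/(mol*e^2); packages differ in later digits.
COULOMB_K = 332.06

def nonbonded_energy(ligand, target, epsilon=0.1, r_min=3.5, dielectric=1.0):
    """Sum Lennard-Jones 12-6 and Coulomb terms over all ligand/target atom pairs.

    Each atom is a tuple (x, y, z, partial_charge) with coordinates in Angstrom.
    epsilon (well depth, kcal/mol) and r_min (distance of the energy minimum)
    are single illustrative values shared by all atoms (an assumption).
    """
    total = 0.0
    for (x1, y1, z1, q1) in ligand:
        for (x2, y2, z2, q2) in target:
            r = math.dist((x1, y1, z1), (x2, y2, z2))
            ratio6 = (r_min / r) ** 6
            # van der Waals term: minimum of -epsilon at r == r_min
            lennard_jones = epsilon * (ratio6 * ratio6 - 2.0 * ratio6)
            # Electrostatic term between the two partial charges
            coulomb = COULOMB_K * q1 * q2 / (dielectric * r)
            total += lennard_jones + coulomb
    return total

# Hypothetical two-atom example: one ligand atom, one target atom, 3.5 Angstrom apart.
neutral_pair = nonbonded_energy([(0.0, 0.0, 0.0, 0.0)], [(3.5, 0.0, 0.0, 0.0)])
charged_pair = nonbonded_energy([(0.0, 0.0, 0.0, 0.4)], [(3.5, 0.0, 0.0, -0.4)])
```

For the neutral pair only the van der Waals term contributes, so the energy equals the well depth with a negative sign; opposite partial charges lower the energy further.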
It is well known that the free energy of a reaction or of complex formation is related to
reaction rate (i.e., chemical kinetic) variables. As shown in Eq. 2 of Fig. 2, if one has a
method for capturing the interaction free energy of complex formation
(or association) of a ligand/target complex, this thermodynamic
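1. $\Delta E_{L:T} = E_{L:T} - (E_{L} + E_{T})$
2. $\Delta E_{L:T} \approx \Delta G = \Delta H - T\Delta S = -RT \ln K_a$
3. $K_D^{L:T} \approx K_i$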
Fig. 2. Fundamental classical expressions evaluated in pair-wise interaction modeling between ligand (L) and
macromolecular target (T), and the resulting affinity of the complex (L:T). (1) The interaction energy, expressed as the
difference between the bonded/nonbonded atomic potential of the complex and the energies of the individual
partners (L, T). (2) The approximation that the energy function is related to the free energy of a system, the thermodynamic
representation in terms of enthalpy (H) and entropy (S), and the thermodynamic interpretation of transition-state theory and
molecular driving forces for association (Ka). (3) The relationship between the affinity (or dissociation) of the complex and
the toxicologist's metric of the inhibition constant (Cheng-Prusoff).
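The thermodynamic relationships named in the Fig. 2 caption can be made concrete with a short worked example. The Python sketch below is illustrative only: it uses the standard relation ΔG = −RT ln Ka = RT ln KD (1 M standard state) and the Cheng-Prusoff relation for competitive inhibition, Ki = IC50 / (1 + [S]/Km). The function names, the default temperature of 298.15 K, and the numerical inputs are assumptions introduced here, not values from the chapter.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def kd_from_delta_g(delta_g, temperature=298.15):
    """Dissociation constant (mol/L) from a binding free energy in kcal/mol."""
    return math.exp(delta_g / (R_KCAL * temperature))

def delta_g_from_kd(kd, temperature=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    return R_KCAL * temperature * math.log(kd)

def ki_cheng_prusoff(ic50, substrate_conc, km):
    """Cheng-Prusoff for competitive inhibition: Ki = IC50 / (1 + [S]/Km)."""
    return ic50 / (1.0 + substrate_conc / km)

# Hypothetical inputs: a computed binding free energy and a measured IC50.
kd = kd_from_delta_g(-8.2)                    # close to 1 micromolar at 298 K
ki = ki_cheng_prusoff(2.0e-6, 1.0e-5, 1.0e-5)  # [S] == Km halves the IC50
```

A binding free energy near −8.2 kcal/mol corresponds to a KD close to 1 µM at room temperature, which shows how a computed energy can be placed on the same scale as an experimentally derived inhibition constant.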
Fig. 3. Computational toxicology modeling workflow showcasing the in silico, in vitro, and in vivo integration of data and
models within an informatics framework.
1.4. The Use of Molecular Modeling in Computational Toxicology: The Integrated Modeling Workflow to In Silico Chemical Genomics
Although we have provided an overview of the most popular and useful aspects of 3D-CAMM that could
be used to inform mechanistic toxicology, we need to understand how they fit into the
computational toxicology framework. To show how and when these methods are applied in practice,
and by whom, we have devised a workflow (Fig. 3) that highlights some of these components and how
they may complement experimental high-throughput screening protocols. The objective is to enrich
the understanding of chemical/biological interactions through toxicogenomic inquiry. This is
achieved by a tiered in silico approach (filters → 2D-QSAR → pharmacophore → docking/3D-QSAR)
that is tightly coupled to experimental in vitro screening efforts
(i.e., protein ligand binding assays, transient activation assays,
gene expression profiling, cytotoxicity assays, etc.) to encode a
2. Materials
Table 2
Several comprehensive lists of software, tools, and data
resources available on the WWW that provide access
to various commercial and open-source software packages,
in addition to open-access database resources
3. Methods
Fig. 4. Specific steps in 3D-CAMM (computer-assisted molecular modeling) workflows, chemical/biological knowledge-based
boundary conditions (top box), ligand-based approaches (left box), structure-guided methods (central box), structural
biological target inventories, and types of questions related to toxicologically relevant target–target
extrapolations one can inform from structure-based approaches (dark gray boxes).
Table 3
A comprehensive set of combined reviews and/or methods papers for various
aspects of the 3D molecular modeling methods discussed in this chapter. We urge
the reader to familiarize themselves with each of the steps associated with their
modeling method chosen and the particular toxicological data gaps they may wish
to address
3.3. Molecular Docking
1. With an optimized target structure database and an optimized ligand database, one can perform
molecular docking experiments in which the pair-wise interactions (bonded and nonbonded terms)
between the ligand and the macromolecular target are systematically evaluated. The poses resulting
from such a docking “run” can each be scored individually on the basis of a training set of
chemicals with known binding affinity data for a given target.
2. A binding site is identified (a co-crystallized ligand site or a rationally selected site) and
each ligand is allowed to interact with the macromolecular target, where sampling and docking
trajectories are subject to the force field approximations. Each individual “pose” is scored or
captured for subsequent analysis.
3. Each of the poses is systematically scored using pair-wise interaction potentials that are
derived either from classical physics approaches (i.e., the force field approximation) or from
empirical scoring functions that have been optimized to reproduce experimental in vitro binding
affinities (trained scoring functions).
4. The results are subsequently validated for their ability to enrich MOA data or rank-order
chemical binding for a known target. Another common validation protocol, which has less to do with
binding affinity and more to do with pose analysis, is the ability of the docking algorithm to
reproduce the original co-crystallized ligand in the same geometry. Methods that minimize the RMSD
between the known pose and the docked pose are considered optimal. This ability to reproduce
experimental crystal structures is termed “pose fidelity” (43, 44, and references therein).
5. Docking “experiments” can form the basis of a numerically continuous complementarity evaluation
of ligand/target complexes (unlike experimental assays, which rely on binding stronger than a probe
chemical threshold or limit of detection, and otherwise return an “NA” or blank result). Since one
can take the top and bottom rank-ordered chemicals for a target and deduce chemoinformatic filters
(i.e., intrinsic functional property or functional group profiling), one could conceivably perform
what is known as “progressive docking,” a hybrid QSAR/docking approach for accelerating in silico
high-throughput screening, in which filters from molecular docking simulations are used to create
“structure-guided” filters for subsequent chemicals (45).
6. Details about assumptions and expectations from structure-
based virtual ligand screening models, and the very nature of
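The pose-fidelity check described in step 4 amounts to computing the root-mean-square deviation (RMSD) between the co-crystallized ligand coordinates and each docked pose. The Python sketch below is a minimal illustration under stated assumptions: both poses list the same atoms in the same order, they are already expressed in the same coordinate frame (no superposition is performed), and the function names and example coordinates are hypothetical.

```python
import math

def pose_rmsd(reference, docked):
    """RMSD (Angstrom) between two poses given as lists of (x, y, z) tuples.

    Assumes identical atom ordering and a shared coordinate frame.
    """
    if not reference or len(reference) != len(docked):
        raise ValueError("poses must be non-empty and contain the same atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(reference, docked))
    return math.sqrt(squared / len(reference))

def best_pose(reference, poses):
    """Return (index, rmsd) of the docked pose closest to the reference pose."""
    rmsds = [pose_rmsd(reference, pose) for pose in poses]
    index = min(range(len(rmsds)), key=rmsds.__getitem__)
    return index, rmsds[index]

# Hypothetical two-atom ligand: crystal pose and two docked poses.
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked_poses = [
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],  # shifted by 2 Angstrom along z
    [(0.0, 0.5, 0.0), (1.0, 0.5, 0.0)],  # shifted by 0.5 Angstrom along y
]
index, rmsd = best_pose(crystal, docked_poses)
```

Here the second pose is selected because it lies closer to the crystal geometry. For symmetric ligands a real implementation would also need to consider equivalent atom orderings, which this sketch ignores.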
[Figure (flowchart): boxes labeled Compound selection, Data set, Division, Training set, Test set, and 3D-QSAR model; decision node “Sufficiently predictive?” with Yes/No branches.]
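The flowchart labels above (division of a data set into training and test sets, a 3D-QSAR model, and a “sufficiently predictive?” decision) suggest the following loop, sketched here in Python. This is a hedged illustration rather than the chapter's protocol: a one-descriptor least-squares line stands in for a real 3D-QSAR model, the every-fourth-compound split is an arbitrary division rule, and the 0.6 acceptance threshold and all function names are assumptions introduced for the example.

```python
def split_dataset(records, test_every=4):
    """Deterministic division: every test_every-th record goes to the test set."""
    training = [r for i, r in enumerate(records) if (i + 1) % test_every != 0]
    test = [r for i, r in enumerate(records) if (i + 1) % test_every == 0]
    return training, test

def fit_line(training):
    """Ordinary least squares for activity = slope * descriptor + intercept."""
    n = len(training)
    mean_x = sum(x for x, _ in training) / n
    mean_y = sum(y for _, y in training) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in training)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in training)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predictive_r2(model, training, test):
    """External predictive r^2: 1 - PRESS / SS, with SS taken about the training mean."""
    slope, intercept = model
    mean_train = sum(y for _, y in training) / len(training)
    press = sum((y - (slope * x + intercept)) ** 2 for x, y in test)
    ss = sum((y - mean_train) ** 2 for _, y in test)
    return 1.0 - press / ss

def is_sufficiently_predictive(r2, threshold=0.6):
    """Decision node of the flowchart; the threshold is an assumed value."""
    return r2 >= threshold

# Hypothetical data set of (descriptor, activity) pairs lying on a perfect line.
data_set = [(float(x), 2.0 * x + 1.0) for x in range(8)]
training_set, test_set = split_dataset(data_set)
model = fit_line(training_set)
r2 = predictive_r2(model, training_set, test_set)
accepted = is_sufficiently_predictive(r2)
```

If the decision comes back negative, the flowchart's “No” branch implies returning to an earlier step (for example, compound selection or data set division) before refitting.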
3.5. Pharmacophore/Toxicophore Elucidation
1. Given the optimized ligand geometries and knowledge of the specific target with which these
chemicals interact, it is possible to superimpose the various ligands, or conformations of the
various ligands, to recreate or infer the optimal features for a specific experimental mode of
action.
2. Negative data are especially useful in pharmacophore models, as they allow one to rule out
“impossible” superpositions (hence better molecular-level boundary conditions) and increase the
accuracy with which models predict “hits” or “non-hits.”
3. Flexible alignment or ligand superposition is preceded by geometry optimization and
conformation enumeration of each of the 3D geometries. These can be carried out by performing
either stochastic or deterministic molecular simulations within a molecular mechanics framework to
systematically alter the dihedral angles of the chemicals of interest and locate multiple
low-energy conformers or rotamers.
4. Once alignment procedures between chemicals have been completed, one often finds common
molecular features into which the ligand structure can be reduced (i.e., hydrogen bond donor,
hydrogen bond acceptor, hydrophobic contact, aromatic contacts, metal interactions, cationic
interactions, and anionic interactions). These features can include exclusion volumes or cavities
that “wrap” the outer volume of a set of known ligands. For any given chemical, if a conformation
falls within the cavity and the spatial relationships between features are either completely or
partially satisfied, one would identify it as a “potential hit.”
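The screening logic of step 4 can be sketched in a few lines of Python. The sketch assumes that the conformer has already been aligned to the pharmacophore frame, represents the cavity as a single sphere (a strong simplification of a shape that “wraps” known ligands), and treats a feature as satisfied when a conformer feature of the same type lies within a distance tolerance. The function name, the tolerance of 1.5 Å, and the example coordinates are all hypothetical.

```python
import math

def screen_conformer(features, atoms, pharmacophore,
                     cavity_center, cavity_radius, tolerance=1.5):
    """Classify one aligned conformer against a pharmacophore model.

    features:      list of (feature_type, (x, y, z)) found on the conformer
    atoms:         list of (x, y, z) for all conformer atoms
    pharmacophore: list of (feature_type, (x, y, z)) required by the model
    """
    # Cavity test: every atom must fall inside the (spherical) cavity.
    if any(math.dist(atom, cavity_center) > cavity_radius for atom in atoms):
        return "non-hit"
    # Count model features matched by a same-type conformer feature nearby.
    matched = sum(
        1 for kind, point in pharmacophore
        if any(k == kind and math.dist(p, point) <= tolerance for k, p in features)
    )
    if matched == 0:
        return "non-hit"
    if matched == len(pharmacophore):
        return "potential hit (complete)"
    return "potential hit (partial)"

# Hypothetical two-feature model and one aligned conformer.
model = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (4.0, 0.0, 0.0))]
conformer_features = [("donor", (0.2, 0.0, 0.0)), ("acceptor", (4.1, 0.3, 0.0))]
conformer_atoms = [(0.2, 0.0, 0.0), (4.1, 0.3, 0.0), (2.0, 1.0, 0.0)]
result = screen_conformer(conformer_features, conformer_atoms, model,
                          cavity_center=(2.0, 0.0, 0.0), cavity_radius=5.0)
```

In practice each chemical would be screened over all of its enumerated low-energy conformers, and a single matching conformer would be enough to flag the chemical for follow-up.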
4. Examples
Table 4
Selected research papers that exemplify 3D CAMM to inform
mechanistic toxicology
5. Notes
Acknowledgments
References
1. Voutchkova A, Osimitz T, Anastas P (2010) Toward a comprehensive molecular design framework for reduced hazard. Chem Rev 110:5845–5882
2. Rusyn I, Daston G (2010) Computational toxicology: realizing the promise of the toxicity testing in the 21st century. Environ Health Perspect 118:1047–1050
3. Rabinowitz J, Goldsmith M, Little S, Pasquinelli M (2008) Computational molecular modeling for evaluating the toxicity of environmental chemicals: prioritizing bioassay requirements. Environ Health Perspect 116:573–577
4. Allinger N, Burkert U (1982) Molecular mechanics. American Chemical Society, Washington, DC
5. Dix D, Houck K (2007) The ToxCast program for prioritizing toxicity testing of environmental chemicals. Toxicol Sci 95:5–12
6. Villoutreix B, Renault N, Lagorce D, Sperandio O, Montes M, Miteva M (2007) Free resources to assist structure-based virtual ligand screening experiments. Curr Protein Pept Sci 8:381–411
7. Ponder J, Case D (2003) Force fields for protein simulations. Adv Protein Chem 66:27–85
8. Pearlman D, Case D, Caldwell J, Ross W, Cheatham T, DeBolt S, Ferguson D, Seibel G, Kollman P (1995) AMBER, a package of computer programs for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to simulate the structural and energetic properties of molecules. Comput Phys Commun 91:1–41
9. MacKerell A, Brooks B, Brooks C, Nilsson L, Roux B, Won Y, Karplus M (1998) CHARMM: the energy function and its parameterization with an overview of the program. In: Schleyer PVR et al (eds) The encyclopedia of computational chemistry. Wiley, Chichester
10. Case D, Cheatham T, Darden T, Gohlke H, Luo R, Merz K, Onufriev A, Simmerling C, Wang B, Woods R (2005) The AMBER biomolecular simulation programs. J Comput Chem 26:1668–1688
11. Brooks B, Brooks C, Mackerell A, Nilsson L, Petrella R, Roux B, Won Y, Archontis C, Bartels S, Caflish B, Caves L, Cui Q, Dinner A, Feig M, Fischer S, Gao J, Hodoscek M, Im W, Kuczera K, Lazaridi T, Ma J, Ovchinnikov V, Paci E, Pastor R, Post C, Pu J, Schaefer M, Tidor B, Venable T, Woodcock H, Wu X, Yah W, York D, Karplus M (2009) CHARMM: the biomolecular simulation program. J Comput Chem 30:1545–1615
12. Brooks B, Bruccoleri R, Olafson B, States D, Swaminathan S, Karplus M (1983) CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J Comput Chem 4:187–217
13. Allinger N, Yuh Y, Lii J (1989) Molecular mechanics: the MM3 force field for hydrocarbons. J Am Chem Soc 111:8551–8566
14. Leo A, Hansch C, Elkins D (1971) Partition coefficients and their uses. Chem Rev 71:525–616
15. Lipinski C, Lombardo F, Dominy B, Feeney P (1997) Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev 23:3–25
16. Wermuth C, Ganellin C, Lindberg P, Mitscher L (1998) Glossary of terms used in medicinal chemistry. Pure Appl Chem 70:1129–1143
17. Kubinyi H (2002) From narcosis to hyperspace: the history of QSAR. Quant Struct Act Relat 21:348–356
18. Wold S, Ruhe A, Wold H, Dunn W (1984) The collinearity problem in linear regression: the partial least squares (PLS) approach to generalized inverses. SIAM J Sci Stat Comput 5:735–743
19. Cramer R, Patterson D, Bunce J (1988) Comparative Molecular Field Analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins. J Am Chem Soc 110:5959–5967
20. Klebe G, Abraham U, Mietzner T (1994) Molecular similarity indices in a comparative analysis (CoMSIA) of drug molecules to correlate and predict their biological activity. J Med Chem 37:4130–4146
21. Pastor M, Cruciani G, McLay I, Pickett S, Clementi S (2000) GRid-INdependent descriptors (GRIND): a novel class of alignment-independent three-dimensional molecular descriptors. J Med Chem 43:3233–3243
22. Norinder U (1996) 3D-QSAR investigation of the Tripos benchmark steroids and some protein-tyrosine kinase inhibitors of styrene type using the TDQ approach. J Chemom 10:533–545
23. Kurogi Y, Guner O (2001) Pharmacophore modelling and three-dimensional database searching for drug design using catalyst. Curr Med Chem 8:1035–1055
24. Park J, Harris D (2003) Construction and assessment of models of CYP2E1: predictions of metabolism from docking, molecular dynamics and density functional theoretical calculations. J Med Chem 46:1645–1660
25. Jones J, Mysinger M, Korzekwa K (2002) Computational models for cytochrome P450: a predictive electronic model for aromatic oxidation and hydrogen atom abstraction. Drug Metab Dispos 30:7–12
26. Cheng Y, Prusoff W (1973) Relationship between the inhibition constant (Ki) and the concentration of inhibitor which causes 50 per cent inhibition (I50) of an enzymatic reaction. Biochem Pharmacol 22:3099–3108
27. MOE. Chemical Computing Group. Montreal, Quebec, Canada
28. Schrodinger, Inc. New York, NY
29. Cheatham T, Young M (2001) Molecular dynamics simulation of nucleic acids: successes, limitations and promise. Biopolymers 56:232–256
30. Roterman I, Lambert M, Gibson K, Scheraga H (1989) A comparison of the CHARMM, AMBER and ECEPP potentials for peptides. 2. Phi-Psi maps for n-acetyl alanine N′-methyl amide: comparisons, contrasts and simple experimental tests. J Biomol Struct Dyn 7:421–453
31. Roterman I, Gibson K, Scheraga H (1989) A comparison of the CHARMM, AMBER and ECEPP potential for peptides. 1. Conformational predictions for the tandemly repeated peptide (Asn-Ala-Asn-Pro)9. J Biomol Struct Dyn 7:391–419
32. Gundertofte K, Liljefors T, Norrby P, Petterson I (1996) A comparison of conformational energies calculated by several molecular mechanics methods. J Comput Chem 17:429–449
33. Jorgensen W, Maxwell D, Tirado-Rives J (1996) Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. J Am Chem Soc 118:11225–11236
34. Jorgensen W, Tirado-Rives J (1988) The OPLS potential functions for proteins: energy minimizations for crystals of cyclic-peptides and crambin. J Am Chem Soc 110:1657–1666
35. Halgren T (1996) Merck molecular force field. I. Basis, form, scope parameterization and performance of MMFF94. J Comput Chem 17:490–519
36. Chen Y, Zhi D (2001) Ligand–protein inverse docking and its potential use in the computer search of protein targets of a small molecule. Proteins 43:217–226
37. Ellis L, Hou B, Kang W, Wackett L (2003) The University of Minnesota Biocatalysis/Biodegradation Database: post-genomic data mining. Nucleic Acids Res 31:262–265
38. MetaPrint2d http://www-metaprint2d.ch.cam.ac.uk/metaprint2d
39. Bologa C, Olah M, Oprea T (2005) Chemical database preparation for compound acquisition or virtual screening. Methods Mol Biol 316:375
40. Accelrys Discovery Suite, Accelrys, Inc. San Diego, CA
41. Sybyl. Tripos, Inc. St. Louis, MO
44. Cross J, Thompson D, Rai B, Baber J, Fan K, Hu Y, Humblet C (2009) Comparison of several molecular docking programs: pose prediction and virtual screening accuracy. J Chem Inf Model 49:1455–1474
45. Cherkasov A, Fuqiang B, Li Y, Fallahi M, Hammond G (2006) Progressive docking: a hybrid QSAR/Docking approach for accelerating in silico high throughput screening. J Med Chem 49:7466–7478
46. Peterson S (2007) Improved CoMFA modeling by optimization of settings: toward the design of inhibitors of the HCV NS3 protease. Uppsala University, Uppsala
47. Norinder U (1998) Recent progress in CoMFA methodology and related techniques. Perspect Drug Discov Des 12/13/14:25–39
48. Kim K, Grecco G, Novellino E (1998) A critical review of recent CoMFA applications. Perspect Drug Discov Des 12/13/14:257–315
49. Rosen J, Lovgren A, Kogej T, Muresan S, Gottfries J, Backlund A (2009) ChemGPS-NPWeb: chemical space navigation tool. J Comput Aided Mol Des 23:253–259
50. Larsson J, Gottfries J, Muresan S, Backlund A (2007) ChemGPS-NP: tuned for navigation in biologically relevant chemical space. J Nat Prod 70:789–794
51. Ekins S et al (2002) Three-dimensional quantitative structure-activity relationships of inhibitors of P-glycoprotein. Mol Pharmacol 61:964
52. Thorsteinson N, Ban F, Santos-Filho O, Tabaei S, Miguel-Queralt S, Underhill C, Cherkasov A, Hammond G (2009) In silico identification of anthropogenic chemicals as ligands of zebrafish sex hormone binding globulin. Toxicol Appl Pharmacol 234:47–57
53. Perry J, Goldsmith M, Peterson M, Beratan D, Wozniak G, Ruker F, Simon J (2004) Structure of the ochratoxin A binding site within human serum albumin. J Phys Chem B 108:16960–16964
54. Aureli L, Cruciani G, Cesta M, Anacardio R, De Simone L, Moriconi A (2005) Predicting human serum albumin affinity of interleukin-8 (CXCL8) inhibitors by 3D-QSPR approach. J Med Chem 48:2469–2479
42. Schwede T, Sali A, Honig B, Levitt M, Berman H, Jones D, Brenner S, Burley S, Das R, Dokholyan N, Dunbrack R, Fidelis K, Fiser A, Godzik A, Huang Y, Humblet C, Jacobsen M, Joachimiak A, Krystek S, Kortemme T, Krysh-
tafovych A, Montelione G, Moult J, Murray D, 55. Ekins S, de Groot M, Jones J (2001) Pharma-
Sanchez R, Sosinick T, Standley D, Stouch T, cophore and three-dimensional quantitative
Vajda S, Vasquez M, Westbrook J, Wilson I structure activity relationship methods for
(2009) Outcome of a workshop on applica- modeling cytochrome P450 active sites. Drug
tions of protein models in biomedical research. Metab Dispos 29:936–944
Structure 17:151–159 56. Ekins S, Erickson J (2002) A pharmacophore
43. Irwin J (2008) Community benchmarks for for human pregnane X receptor ligands. Drug
virtual screening. J Comput Aided Mol Des Metab Dispos 30:96–99
22:193–199
7 Informing Mechanistic Toxicology with Computational Molecular Models 165
Abstract
Efficient storage and retrieval of chemical structures is one of the most important prerequisites for
solving any computational problem in the life sciences. Several resources, including research publications,
textbooks, and articles, are available on chemical structure representation. Chemical substances may share
the same molecular formula yet differ in structural formula and conformation, and the skeleton framework
(scaffold) and functional groups of a molecule convey its various characteristics. Today, with the aid of
sophisticated mathematical models and informatics tools, it is possible to design a molecule of interest with
specified characteristics for applications in pharmaceuticals, agrochemicals, biotechnology,
nanomaterials, petrochemicals, and polymers. This chapter discusses both traditional and current
state-of-the-art representations of chemical structures and their applications in chemical information
management and in bioactivity- and toxicity-based predictive studies.
Key words: Linear molecular representations, 2D and 3D representation, Chemical fragments and
fingerprints, Chemical scaffolds, Toxicophores, Pharmacophores, Molecular similarity and diversity,
Molecular descriptors, Structure activity relationship studies
1. Introduction
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_8, © Springer Science+Business Media, LLC 2012
168 M. Karthikeyan and R. Vyas
2. Materials
2.1. Linear Representation
Linear line notation is easily accessible to chemists and flexible enough to allow chemical
notation to be interpreted and generated much like natural language. Alphanumeric,
string-based rules for encoding chemical structures were developed through the pioneering
contributions of Wiswesser, Morgan, Weininger, and Dyson, and were eventually applied in
machine description. In 1949, William Wiswesser introduced a line notation system based on
the Berzelian symbols together with structural and connectivity features. This system, named
Wiswesser Line Notation (WLN), was used online for structure and substructure searches (20).
Unfortunately, WLN was not widely adopted because of the complexity of its specification and
the inflexible rules associated with it. The development of SMILES notation had a significant
effect on compact storage in chemical information systems and led to the modern form of
representing chemical structures. This line notation system has several advantages over the
older systems in terms of compactness, simplicity, uniqueness, and human readability. David
Weininger developed the DEPICT program to interpret SMILES into a molecular structure (21).
Detailed descriptions of many advanced versions of SMILES, such as USMILES, SMARTS,
STRAPS, and CHUCKLES, are available at the Daylight website (22).
SMiles ARbitrary Target Specification (SMARTS) is basically an
extension of SMILES used for describing molecular patterns as
[Figure: Chemical structure representation methods. Labels in the diagram: abstract representation; automatic representation; general representation; line notations (SMILES, WLN, ROSDAL, SLN); connection table; Markush generic (patents); matrices (2D/3D); substructures used in searching; fragment coding; hash codes; 2D fingerprints; barcoding; RF representation; OCR; vision.]
Table 1
Various notations of 3-(p-chlorophenyl)-1,1-dimethylurea trichloroacetate
• CAS No: 140-41-0
• Other names:
  • EPA Pesticide Chemical Code 035502
  • GC-2996
  • LS-12938
  • Caswell No. 583A
  • PubChem: 8799
  • Urox
  • Monuron trichloroacetate
  • 3-(p-CHLOROPHENYL)-1,1-DIMETHYLUREA TRICHLOROACETATE
  • Acetic acid, trichloro-, compd. with 3-(p-chlorophenyl)-1,1-dimethylurea (1:1)
  • Urea, 3-(p-chlorophenyl)-1,1-dimethyl-, cmpd. with trichloroacetic acid (1:1)
  • Acetic acid, trichloro-, compd. with N'-(4-chlorophenyl)-N,N-dimethylurea (1:1) (9CI)
  • Trichloroacetic acid compound with 3-(p-chlorophenyl)-1,1-dimethylurea (1:1) (8CI)
• Wiswesser Line Notation (WLN): GR DMVRN1&1 &GXGGVO
• Canonical SMILES: CN(C)C(=O)NC1=CC=C(C=C1)Cl.C(=O)(C(Cl)(Cl)Cl)O
• InChI: InChI=1S/C9H11ClN2O.C2HCl3O2/c1-12(2)9(13)11-8-5-3-7(10)4-6-8;3-2(4,5)1(6)7/h3-6H,1-2H3,(H,11,13);(H,6,7)
• InChIKey: DUQGREMIROGTTD-UHFFFAOYSA-N
[Structure diagram: 3-(p-chlorophenyl)-1,1-dimethylurea together with trichloroacetic acid]
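The identifiers in Table 1 are plain strings that a program can take apart directly. As a minimal sketch (not part of the original chapter), the Python below splits the canonical SMILES into its two components at the "." separator and counts the elements in the formula layer of the InChI. The SMILES is written with the standard "=" double-bond symbol. The function names are illustrative, and the formula parser assumes simple element-count formulas without charges, isotopes, or component multipliers.

```python
import re
from collections import Counter

SMILES = "CN(C)C(=O)NC1=CC=C(C=C1)Cl.C(=O)(C(Cl)(Cl)Cl)O"
INCHI = ("InChI=1S/C9H11ClN2O.C2HCl3O2/c1-12(2)9(13)11-8-5-3-7(10)4-6-8;"
         "3-2(4,5)1(6)7/h3-6H,1-2H3,(H,11,13);(H,6,7)")


def split_components(smiles):
    """Split a multi-component SMILES at the '.' disconnection symbol."""
    return smiles.split(".")


def inchi_formulas(inchi):
    """Return the per-component formulas from the InChI formula layer."""
    layers = inchi.split("/")
    return layers[1].split(".")  # layers[0] is the 'InChI=1S' prefix


def count_elements(formula):
    """Count atoms per element in a simple formula such as 'C9H11ClN2O'."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return dict(counts)


if __name__ == "__main__":
    for part, formula in zip(split_components(SMILES), inchi_formulas(INCHI)):
        print(part, formula, count_elements(formula))
```

Both notations describe the same two-component substance, so the number of SMILES components matches the number of formulas in the InChI.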
Marvin 08081111502D
20 19 0 0 0 0 999 V2000
1 2 1 0 0 0 0
2 3 1 0 0 0 0
2 4 1 0 0 0 0
4 5 2 0 0 0 0
4 6 1 0 0 0 0
6 7 1 0 0 0 0
7 8 2 0 0 0 0
8 9 1 0 0 0 0
9 10 2 0 0 0 0
10 11 1 0 0 0 0
11 12 2 0 0 0 0
7 12 1 0 0 0 0
10 13 1 0 0 0 0
14 15 2 0 0 0 0
14 16 1 0 0 0 0
16 17 1 0 0 0 0
16 18 1 0 0 0 0
16 19 1 0 0 0 0
14 20 1 0 0 0 0
M END
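Each line of the bond block above gives the first atom, the second atom, and the bond type. A minimal Python sketch, assuming whitespace-separated fields as printed here (real V2000 molfiles use fixed-width columns), turns the block into an adjacency list and finds the connected components. The function names are illustrative and not taken from any toolkit.

```python
from collections import defaultdict

# Bond lines copied from the connection table above: atom1 atom2 bond_type ...
BOND_BLOCK = """\
1 2 1 0 0 0 0
2 3 1 0 0 0 0
2 4 1 0 0 0 0
4 5 2 0 0 0 0
4 6 1 0 0 0 0
6 7 1 0 0 0 0
7 8 2 0 0 0 0
8 9 1 0 0 0 0
9 10 2 0 0 0 0
10 11 1 0 0 0 0
11 12 2 0 0 0 0
7 12 1 0 0 0 0
10 13 1 0 0 0 0
14 15 2 0 0 0 0
14 16 1 0 0 0 0
16 17 1 0 0 0 0
16 18 1 0 0 0 0
16 19 1 0 0 0 0
14 20 1 0 0 0 0
"""


def parse_bonds(block):
    """Read (atom1, atom2, bond_type) from each bond line."""
    bonds = []
    for line in block.strip().splitlines():
        a, b, order = (int(field) for field in line.split()[:3])
        bonds.append((a, b, order))
    return bonds


def adjacency(bonds):
    """Map each atom index to {neighbour index: bond type}."""
    adj = defaultdict(dict)
    for a, b, order in bonds:
        adj[a][b] = order
        adj[b][a] = order
    return adj


def components(adj):
    """Group atoms into connected components by depth-first traversal."""
    seen, parts = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, part = [start], set()
        while stack:
            atom = stack.pop()
            if atom in part:
                continue
            part.add(atom)
            stack.extend(n for n in adj[atom] if n not in part)
        seen |= part
        parts.append(sorted(part))
    return parts


if __name__ == "__main__":
    bonds = parse_bonds(BOND_BLOCK)
    print(len(bonds), components(adjacency(bonds)))
```

The 19 bonds fall into two components, atoms 1 to 13 and atoms 14 to 20, which is consistent with the two-component substance in Table 1.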
3. Methods
3.1. Strategies Used in Searching Chemical Databases
To find out whether a molecule has already been published and studied in a chemical
or biological context, a scientific literature search is conducted using exact-structure,
similar-structure, substructure, and hyperstructure queries.
3.1.1. Fragment and Fingerprint Based Search Strategies
In the fragment-based approach to searching a database for similar structures, the query
molecule (graph) is broken into logical fragments (subgraphs such as functional groups and
rings) (34). The retrieved hit structures include all those that contain the substructures
(fragments) present in the query structure.
Another approach is the fingerprint-based approach, which describes the molecule partially,
as a set of fragments, rather than as a well-defined substructure (35). It essentially treats a
molecule as a binary array in which each element (1 or 0) represents the presence or absence
of a particular fragment as TRUE or FALSE: a given bit is set to 1 (True) if a particular
structural feature is present and to 0 (False) if it is absent. Hashed fingerprints are composed
of bits of molecular information (fragments) such as types of rings, functional groups, and
other types of molecular and atomic data. Comparing fingerprints allows one to determine the
similarity between two molecules A and B, as shown in Fig. 3.
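The bit-setting idea can be sketched in a few lines of Python. This is an illustration only: the fragment names are hypothetical placeholders rather than the output of a real fragmentation routine, the 64-bit length is arbitrary, and CRC32 stands in for whatever hash function a production fingerprint would use.

```python
import zlib


def hashed_fingerprint(fragments, n_bits=64):
    """Set one bit per fragment; bit position = CRC32(fragment) mod n_bits."""
    bits = [0] * n_bits
    for fragment in fragments:
        position = zlib.crc32(fragment.encode("utf-8")) % n_bits
        bits[position] = 1  # 1 = feature present; untouched bits stay 0
    return bits


if __name__ == "__main__":
    # Hypothetical fragment lists for two molecules.
    query = hashed_fingerprint(["benzene ring", "urea", "aryl chloride"])
    other = hashed_fingerprint(["benzene ring", "carboxylic acid"])
    print("".join(map(str, query)))
    print("".join(map(str, other)))
```

Because several fragments can hash to the same position, a set bit indicates only that at least one of the fragments mapping to it is present, which is why a fingerprint is a partial description of the molecule.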
3.1.2. Similarity Search
The fingerprints of two structures can be compared and a score computed for their
similarity. Similarity coefficients and distance metrics such as Tanimoto, cosine, and
Euclidean distance help to filter relevant molecules rapidly from millions of entries (36).
Takahashi developed an approach for automatic identification of molecular
Molecule A 1000100011100010100001000000100101 a = 11
Molecule B 0001100001000010010001000000100101 b = 9
Similarity (A and B) 0000100001000010000001000000100101 c = 7
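The counts a, b, and c in the comparison above are all that the common coefficients need. The following Python sketch recomputes them from the two bit strings and applies the standard binary forms of the Tanimoto, cosine, and Euclidean measures; the function names are illustrative.

```python
from math import sqrt

# Bit strings copied from the comparison of molecules A and B above.
A = "1000100011100010100001000000100101"
B = "0001100001000010010001000000100101"


def bit_counts(fp_a, fp_b):
    """Return (a, b, c): bits set in A, bits set in B, bits set in both."""
    a = fp_a.count("1")
    b = fp_b.count("1")
    c = sum(1 for x, y in zip(fp_a, fp_b) if x == y == "1")
    return a, b, c


def tanimoto(a, b, c):
    """Shared bits divided by bits set in either fingerprint."""
    return c / (a + b - c)


def cosine(a, b, c):
    """Shared bits divided by the geometric mean of the two bit counts."""
    return c / sqrt(a * b)


def euclidean(a, b, c):
    """Distance between the bit vectors: square root of the differing bits."""
    return sqrt(a + b - 2 * c)


if __name__ == "__main__":
    a, b, c = bit_counts(A, B)
    print(a, b, c, tanimoto(a, b, c), cosine(a, b, c), euclidean(a, b, c))
```

For these two fingerprints the counts are a = 11, b = 9, and c = 7, so the Tanimoto coefficient is 7/13, about 0.54. The first two functions divide by zero for empty fingerprints, a case a real implementation would need to handle.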
3.1.3. Hash Coding
Hash coding of molecules is an efficient way of storing and searching substructures, in
which a molecule is assigned a unique ID key that maps directly to the address of the
compound in a computer system. There is, however, information loss in this approach, as the
source molecule cannot be reconstructed from the hash keys (38). In a report by Wipke et al.,
four different hash functions were used effectively for the rapid storage and retrieval of
chemical structures (39). Hodes and Feldman developed a novel file organization for
substructure searching via hash coding (40).
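A minimal Python sketch of the idea follows. It is not one of the hash functions from the cited reports: it assumes each structure is already available as a canonical string (for example, a canonical SMILES), uses SHA-1 purely as a convenient stand-in, and the class and function names are hypothetical.

```python
import hashlib


def hash_key(canonical_smiles, n_buckets=1024):
    """Map a canonical structure string to a bucket address."""
    digest = hashlib.sha1(canonical_smiles.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_buckets


class StructureTable:
    """Toy structure store addressed by hash key."""

    def __init__(self, n_buckets=1024):
        self.n_buckets = n_buckets
        self.buckets = {}

    def add(self, smiles, record):
        key = hash_key(smiles, self.n_buckets)
        self.buckets.setdefault(key, []).append((smiles, record))

    def lookup(self, smiles):
        key = hash_key(smiles, self.n_buckets)
        # Different structures may share a key, so compare the stored strings.
        for stored, record in self.buckets.get(key, []):
            if stored == smiles:
                return record
        return None


if __name__ == "__main__":
    table = StructureTable()
    table.add("CN(C)C(=O)NC1=CC=C(C=C1)Cl", "monuron")
    print(table.lookup("CN(C)C(=O)NC1=CC=C(C=C1)Cl"))
```

The key alone cannot be converted back into a structure, which is the information loss noted above; the sketch therefore keeps the original string next to each record and uses it to resolve collisions.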
3.1.5. Abstract Representation of Molecules
One of the major challenges confronting the pharmaceutical industry is how to find a
molecule, or a list of molecules, with desired properties that has not yet been published. One
approach is to generate all possible chemicals for a given formula and enumerate them
exhaustively. This is impractical, however, and comparing each generated entry against the
millions of entries already known is another impossible task. The only solution is to map
what is known from the literature, which indirectly indicates what is “not yet reported.”
That does not necessarily imply that “those entries are not yet identified.” It is known in
the business context that several thousand new molecules with desired properties are kept as
trade secrets and superficially protected by patents as generic structures. On occasion,
molecules are represented as Markush structures in a generic context to cover a family of
molecular structures that sometimes runs beyond millions. Markush structures are generic
structures used in patent databases such as MARPAT, maintained by Chemical Abstracts
Service, for protecting intellectual chemical information in patents (42) (Fig. 4).
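How quickly a generic structure expands can be seen with a small Python sketch. The core template and the R-group lists below are hypothetical, and the substitution is plain string templating on SMILES-like text with no chemical validation; it only illustrates the combinatorics.

```python
from itertools import product
from math import prod

# Hypothetical generic structure: a para-disubstituted benzene core.
CORE = "{R1}c1ccc(cc1){R2}"
R_GROUPS = {
    "R1": ["C", "CC", "Cl", "F"],
    "R2": ["O", "N", "C(=O)O"],
}


def family_size(r_groups):
    """Number of specific structures covered by the generic structure."""
    return prod(len(options) for options in r_groups.values())


def enumerate_family(core, r_groups):
    """Lazily yield one specific structure per combination of R-groups."""
    names = list(r_groups)
    for combo in product(*(r_groups[name] for name in names)):
        yield core.format(**dict(zip(names, combo)))


if __name__ == "__main__":
    print(family_size(R_GROUPS))
    for member in enumerate_family(CORE, R_GROUPS):
        print(member)
```

With four and three alternatives the family has 12 members; a generic structure with five variable positions and 20 alternatives at each would already cover 20^5 = 3,200,000 structures, which is why full enumeration soon becomes impractical.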
3.2. Chemical Drawing Tools
Several tools are commonly available for generating chemical structures and storing them in
standard file formats. The most popular in academia and industry are MarvinSketch from
ChemAxon and ChemDraw from CambridgeSoft. Other tools serve specific purposes, for
example ChemSketch for integration with analytical data (1H NMR) and ISIS Draw/ISIS Base
for inventory management (45). With these drawing tools one can easily drag and drop
pre-built templates to build complex chemical structures. Conversion of 2D structures to 3D
is also possible through these tools. The quality of the 3D structure, however, depends on the
methods used within the system. The best 3D model is the one that is comparable with X-ray
crystallographic data. Corina, a program developed by Gasteiger et al., is very fast in
computing 3D coordinates for 2D structures. Comparable or even better algorithms are
implemented in ChemAxon tools. With a 3D structure it is possible to calculate the energy of
the molecule, its volume, interatomic charge distribution, and other three-dimensional
descriptors required for QSAR-based predictive studies.
A simple way to generate a 3D structure is to use Marvin View:
• Create and open a molecule in the Marvin View tool
• Then go to Edit and clean the structure in 3D
• The output will be the 3D structure of the molecule, as shown in Fig. 5
3.3. Current Trends in Chemical Structure Representation
The traditional methods of chemical structure representation and manipulation can be time
consuming and tedious, entailing long hours of searching. In this section we discuss the
emerging technologies being developed to handle chemical structures for automation and
inventory management. Neural network-based chemical structure indexing is a technique in
which the chemical structure is presented as an image to pulse-coupled neural networks
(PCNN) to produce binary barcodes of chemical structures (46).
Karthikeyan et al., in their work on encoding of chemical structures, reported that chemical
structures can be encoded and read as 2D barcodes (PDF417 format) in a fully automated
fashion (47). A typical linear barcode consists of a set of black bars of varying width
separated by white spaces, encoding alphanumeric characters.
To reduce the amount of data that has to be encoded on the
8 Chemical Structure Representations and Applications in Computational Toxicity 177
45 52 0 0 0 0 999 V2000
6.1237 0.8510 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
.......... (Lines removed for brevity) ............
-0.8298 7.1304 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 4 0 0 0 0
............ (Lines removed for brevity) ............
43 44 1 0 0 0 0
M END
(f) Gaussian Cube file with atom data
No surface data provided
45 0.000000 0.000000 0.000000
1 1.889726 0.000000 0.000000
............. (Lines removed for brevity) ............
6 6.000000 -2.927110 25.152404 0.000000
0.00000E00
(g) PDB Format
HEADER PROTEIN 13-JUL-11 NONE
TITLE NULL
COMPND MOLECULE: Structure #1
SOURCE NULL
KEYWDS NULL
EXPDTA NULL
AUTHOR Marvin
REVDAT 1 13-JUL-11 0
HETATM 1 C UNK 0 11.431 1.589 0.000 0.00 0.00 C + 0
............. (Lines removed for brevity) ..............
HETATM 45 C UNK 0 -1.549 13.310 0.000 0.00 0.00 C + 0
CONECT 1 2 6 23
............ (Lines removed for brevity) ............
CONECT 45 42
MASTER 0 0 0 0 0 0 0 0 45 0 104 0
END
(h) XYZ format
45
Structure #1
C 11.43091 1.58853 0.00000
......... (Lines removed for brevity) ..............
C -1.54896 13.31008 0.00000
Software programs such as Open Babel can interconvert molecules among more than 50
standard file formats required by various computational chemistry and chemoinformatics
programs (50). Dalby et al., in a classic paper, discussed the file formats developed at
Molecular Design Limited (MDL) for storing and managing chemical structures, along with
their interrelations (51).
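Of the formats listed above, XYZ is the simplest to read: an atom count, a title line, and then one line per atom with the element symbol and three coordinates. The Python sketch below parses and rewrites that layout. It is a hand-written illustration rather than the Open Babel API, and the three-atom water example with its coordinates is invented for the demonstration.

```python
# Illustrative three-atom example; the coordinates are not from the chapter.
XYZ_TEXT = """\
3
water (illustrative coordinates)
O 0.00000 0.00000 0.00000
H 0.75700 0.58600 0.00000
H -0.75700 0.58600 0.00000
"""


def parse_xyz(text):
    """Return (title, [(symbol, x, y, z), ...]) from XYZ-format text."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])          # first line: number of atoms
    title = lines[1].strip()         # second line: free-text title
    atoms = []
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError("atom count does not match the header line")
    return title, atoms


def to_xyz(title, atoms):
    """Write atoms back out in the same layout."""
    rows = [str(len(atoms)), title]
    rows += [f"{s} {x:.5f} {y:.5f} {z:.5f}" for s, x, y, z in atoms]
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    title, atoms = parse_xyz(XYZ_TEXT)
    print(title, len(atoms))
    print(to_xyz(title, atoms))
```

XYZ carries no bond information, so converting it to a connection-table format requires a program to perceive the bonds from interatomic distances.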
RF tagging is a technology complementary to the barcode representation of
molecules and is a commonly used technique in
3.4. Handling Chemical Structure Information
Storing a chemical structure in a proper file format is very important. For example,
structures (html) stored as GIF, JPG, BMP, and PNG images look alike but may not be
compatible with database transfer or computer processing. In the late 1970s, Blake et al.
developed methods for processing chemical structure information using a light pen to create
high-quality structures, which were used in Chemical Abstracts volumes (60). A few years
later, the structure formatting guidelines for Chemical Abstracts publications appeared in a
paper by Alan Goodson (61). Another interesting parallel development was the use of SNN
(Structure–Nomenclature Notation), in which the molecule is split into fragments by
structure-determining vertices that are linked by special signs, for use in the Beilstein
system (62). Strokov developed a compact code for storing structural formulas and performing
substructure and similarity searches, and applied it to a set of 50,000 structures (63). A
modular architecture for chemical structure elucidation, called the mosaic or artel
architecture, was also developed (64). Hagadone and Schulz used a relational database system
in conjunction with a chemical software component to create chemical databases with
enhanced retrieval capabilities (65).
3.5. Cluster Analysis and Classification of Chemical Structures
Clustering is a process of finding the common features within a diverse class of compounds,
and it requires multivariate analysis methods. In clustering, the consensus score and the
distance between sets of compounds can be measured easily through mean or Euclidean
distance measures. This score reflects the similarity or dissimilarity between classes of
compounds and helps to identify potentially active or toxic substances through predictive
studies. Peter Willett carried out a comparative study of various clustering algorithms for
classifying chemical structures (66). An excellent review by Barnard and Downs illustrates
the methods useful for clustering files of chemical structures (67). The Jarvis–Patrick
algorithm is useful for clustering chemical structures on the basis of 2D fragment
descriptors (68).
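The Jarvis–Patrick idea is that two items belong to the same cluster when each is among the other's nearest neighbours and they share enough neighbours in common. The Python sketch below is a simplified variant written for illustration: it uses Euclidean distance on short numeric vectors (hypothetical stand-ins for descriptor vectors), excludes an item from its own neighbour list, and merges qualifying pairs with a union-find structure. The parameter names k and k_min are assumptions of this sketch.

```python
from math import dist


def nearest_neighbours(points, k):
    """For each point, the set of indices of its k nearest other points."""
    table = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        table.append(set(others[:k]))
    return table


def jarvis_patrick(points, k, k_min):
    """Cluster points that are mutual neighbours sharing >= k_min neighbours."""
    nn = nearest_neighbours(points, k)
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in nn[i]:
            if i in nn[j] and len(nn[i] & nn[j]) >= k_min:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    # Two well-separated groups of hypothetical two-component descriptors.
    points = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11)]
    print(jarvis_patrick(points, k=3, k_min=2))
```

For chemical structures the Euclidean distance would typically be replaced by a fingerprint-based dissimilarity such as one minus the Tanimoto coefficient.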
Lipinski rule of five is one such example where the similar
3.7.1. Structure Correlation with Toxicity
Here it is pertinent to discuss in detail the role of molecular structures and functional
groups in the context of toxicity. Toxicity is another important parameter that needs to be
assessed from molecular structures. TOPKAT is one such program, which predicts several
toxicology-related data from given molecular structures based on the presence of selected
structural patterns (78). The basic premise of the field is that molecular structure is
related to the potential toxicity of a compound. This structure-based modeling
[Figure: 32 numbered chemical substructure diagrams (1–32); the drawings are not recoverable from the extracted text.]
3.7.2. Virtual Library Enumeration
To design a better lead molecule, one has to perform a sequence of steps: collecting
molecular data with known bioactivity, analyzing those chemical structures to extract
significant features related to the activity of interest, and rebuilding new molecules with
promising and favorable bioactivity profiles.
A virtual library of diverse molecules that have not yet been synthesized can be enumerated
from a set of scaffolds and functional groups by combinatorial means. Here the scaffold
represents a molecule containing at least one ring, or several rings connected by linker
atoms. Scaffolds can be generated from complex molecular structures by systematic
disconnection of functional groups connected by single bonds. Using this approach, a scaffold
translator (chemscreener and scaffoldpedia) was built and applied to a large-scale data set of
PubChem compounds containing over 12 million ring structures to generate a library of about
1 million scaffolds in a distributed computing environment (82). From this study, it was
observed that over 300,000 molecules contain the common scaffold of a “benzene” ring. The
scaffolds and functional groups generated could be further enumerated to build a virtual
library of diverse organic molecules. An alternative approach, namely “lead hopping,” is also
available to replace a common scaffold with chemically and spatially equivalent core
fragments (83).
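One simple way to approximate scaffold extraction is to strip terminal atoms repeatedly until only rings and the linkers between them remain. The Python sketch below does this on a plain bond list. It is a simplification chosen for illustration and is not the chemscreener or scaffoldpedia method: it ignores bond order, so exocyclic double-bonded atoms are removed as well. The atom numbering follows the connection table given earlier for the compound in Table 1.

```python
def prune_to_scaffold(bonds):
    """Iteratively remove atoms with at most one neighbour; return what is left."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    terminal = [atom for atom, nbrs in adj.items() if len(nbrs) <= 1]
    while terminal:
        atom = terminal.pop()
        if atom not in adj:
            continue  # already removed via another path
        for nbr in adj.pop(atom):
            adj[nbr].discard(atom)
            if len(adj[nbr]) <= 1:
                terminal.append(nbr)  # neighbour has become terminal
    return sorted(adj)


# Atom pairs for the two components of the compound in Table 1.
UREA_PART = [(1, 2), (2, 3), (2, 4), (4, 5), (4, 6), (6, 7), (7, 8), (8, 9),
             (9, 10), (10, 11), (11, 12), (7, 12), (10, 13)]
ACID_PART = [(14, 15), (14, 16), (16, 17), (16, 18), (16, 19), (14, 20)]

if __name__ == "__main__":
    print(prune_to_scaffold(UREA_PART))  # ring atoms only
    print(prune_to_scaffold(ACID_PART))  # acyclic, so nothing remains
```

For the urea component the procedure leaves the six ring atoms 7 to 12, that is, the benzene scaffold, while the acyclic acid component has no scaffold under this definition.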
Designing a better molecule is just one aspect of research; getting it synthesized and
biologically evaluated is the most critical component in drug discovery programs. It is
therefore necessary to rank molecules by a synthetic accessibility score, which represents
the ease of synthesizing the required molecule by the shortest possible route with high
yield (84). Reactivity pattern fingerprints are being developed to characterize molecules as
either reactants or products based on the composition of their functional groups (85). The
emerging trend in mapping the reactivity pattern of a molecule is illustrated by the
following example: molecule “A” from the KEGG database, with a Reactant Like Score (RLS) of
52:22, contains more reactive groups; it is likely to be a reactant in 52 reactions, whereas
it is likely to be a product in 22 known reactions.
The converse is true for another molecule, “B,” with a Product Like Score (PLS) of 13:5: it
is likely to be less reactive and could be synthesized through 13 different reactions, but it
is likely to undergo just 5 types of reactions based on its functional groups (Fig. 11). With
the help of these PLS and RLS values, one can rapidly prioritize and characterize compounds
as either reactants or products through in silico analysis and design efficient synthetic
routes for them.
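The two scores can be read as a pair of counts: the number of reactions in which a molecule is likely to act as a reactant and the number in which it is likely to be a product. The Python sketch below turns such a pair into a label. The decision rule (the larger count wins) and the function name are assumptions made for illustration; the chapter does not state how the scores are derived.

```python
def reactivity_profile(as_reactant, as_product):
    """Label a molecule from its reactant and product reaction counts."""
    if as_reactant > as_product:
        # Reactant Like Score is written reactant count : product count.
        return "reactant-like", f"RLS {as_reactant}:{as_product}"
    if as_product > as_reactant:
        # Product Like Score is written product count : reactant count.
        return "product-like", f"PLS {as_product}:{as_reactant}"
    return "undetermined", f"{as_reactant}:{as_product}"


if __name__ == "__main__":
    print(reactivity_profile(52, 22))  # molecule A in the text
    print(reactivity_profile(5, 13))   # molecule B in the text
```

Sorting a compound list by these labels is one simple way to separate likely starting materials from likely synthetic targets.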
[Figure: Substances. Labels in the diagram: EPA, TSCA, REACH; pesticides, fungicides, persistent organic substances (POS; DDT, endosulfan), explosives (RDX), controlled substances; DSSTOX, RTECS, MSDS, TOXNET, NIOSTIC.]
4. Notes
References
1. Ash JE, Warr WA, Willett P (1991) Chemical structure systems. Ellis Horwood, Chichester
2. http://pubchem.ncbi.nlm.nih.gov/. Accessed 11 July 2012
3. http://www.ncbi.nlm.nih.gov/sites/entrez?db=pccompound&term=formaldehyde. Accessed 11 July 2012
4. http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=712. Accessed 11 July 2012
5. http://pubchem.ncbi.nlm.nih.gov/search/search.cgi#. Accessed 10 July 2012
6. http://www.genome.jp/kegg/. Accessed 11 July 2012
7. https://www.ebi.ac.uk/chembldb/. Accessed 11 July 2012
8. http://www.drugbank.ca/. Accessed 11 July 2012
9. http://www.epa.gov/ncct/dsstox/index.html. Accessed 11 July 2012
10. http://www.cdc.gov/niosh/rtecs/. Accessed 11 July 2012
11. http://www.cdc.gov/niosh/database.html. Accessed 11 July 2012
12. http://www.msds.com/. Accessed 11 July 2012
13. http://www.cas.org. Accessed 11 July 2012
14. http://beilstein.com. Accessed 11 July 2012
15. https://scifinder.cas.org. Accessed 11 July 2012
16. Gasteiger J, Engel T (eds) (2003) Chemoinformatics: a textbook. Wiley-VCH, Weinheim
17. Quadrelli L, Bareggi V, Spiga S (1978) A new linear representation of chemical structures. J Chem Inf Comput Sci 18:37–40
18. www.ccl.net/cca/documents/molecular-modeling/node3.html. Accessed 11 July 2012
19. Wendy AW (2011) Representation of chemical structures. Vol 1. Wiley, New York. doi:10.1002/wcms.36
20. Fritts L, Schwind E, Margaret M (1982) Using the Wiswesser line notation (WLN) for online, interactive searching of chemical structures. J Chem Inf Comput Sci 22:106–9
21. Weininger D (1990) SMILES Graphical depiction of chemical structures. J Chem Inf Comput Sci 30:237–43
22. www.daylight.com/dayhtml/doc/theory/theory.smarts.html. Accessed 11 July 2012
23. Ash S, Malcolm AC, Homer RW, Hurst T, Smith GB (1997) SYBYL Line Notation (SLN): A versatile language for chemical structure representation. J Chem Inf Comput Sci 37:71–79
24. McNaught A (2006) The IUPAC International Chemical Identifier: InChI. Chemistry International (IUPAC) 28 (6). http://www.iupac.org/publications/ci/2006/2806/4_tools.html. Accessed 11 July 2012
25. Grave KD, Costa F (2010) Molecular graph augmentation with rings and functional groups. J Chem Inf Model 50:1660–8
26. www.lohninger.com/helpcsuite/connection_table.htm. Accessed 11 July 2012
27. Wipke WT, Dyott TM (1974) SEMA the stereochemically extended algorithm. J Am Chem Soc 96:4834
28. Christie BD, Leland BA, Nourse JG (1993) Structure searching in chemical databases by direct lookup methods. J Chem Inf Comput Sci 33:545–7
29. Helmut B (1982) Stereochemical structure code for organic chemistry. J Chem Inf Comput Sci 22:215–22
30. Bath PA, Poirrette AR, Willett P, Allen FH (1994) Similarity searching in files of three-dimensional chemical structures: Comparison of fragment-based measures of shape similarity. J Chem Inf Comput Sci 34:141–7
31. http://www.molecular-networks.com. Accessed 11 July 2012
32. Nicklaus MC, Milne GW, Zaharevitz D (1993) ChemX and Cambridge Comparison of computer generated chemical structures with X ray crystallographic data. J Chem Inf Comput Sci 33:639–46
33. Clark DE, Jones G, Willet P et al (1994) Pharmacophoric pattern matching in files of three dimensional chemical structure: Comparison of conformational searching algorithms for flexible searching. J Chem Inf Comput Sci 34:197–206
34. Bond VL, Bowman CM, Davison LC et al (1979) On-line storage and retrieval of chemical information Substructure and biological activity screening. J Chem Inf Comput Sci 19:231–4
35. Wang Y, Bajorath J (2010) Advanced Fingerprint methods for similarity searching: balancing molecular complexity effects. Comb Chem High Throughput Screen 13:220–228
36. Andrew Leach (2007) An introduction to chemoinformatics Springer
37. Takahashi Y, Sukekawa M, Sasaki S (1992) Automatic identification of molecular similarity using reduced-graph representation of chemical structure. J Chem Inf Comput Sci 32:639–43
38. Zupan J (1989) Algorithms for chemists. Wiley, Chichester, UK
39. Wipke WT, Krishnan S, Ouchi GI (1978) Hash functions for rapid storage and retrieval of chemical structures. J Chem Inf Comput Sci 18:32–7
40. Hodes L, Feldman A (1978) An efficient design for chemical structure searching. II. The file organization. J Chem Inf Comput Sci 18:96–100
41. Brown R, Downs D, Geoffrey M et al (1992) Hyperstructure model for chemical structure handling: generation and atom-by-atom searching of hyperstructures. J Chem Inf Comput Sci 32:522–31
42. Barnard JM, Lynch MF, Welford SM (1982) Computer storage and retrieval of generic structures in chemical patents. 4. An extended connection table representation for generic structures. J Chem Inf Comput Sci 22:160–4
43. Barnard JM, Lynch MF, Welford SM (1981) Computer storage and retrieval of generic chemical structures in patents.; GENSAL, a formal language for the description of generic chemical structures. J Chem Inf Comput Sci 21:151–61
44. Kaback SM (1980) Chemical structure searching in Derwent’s World Patents Index. J Chem Inf Comput Sci 20:1–6
45. www.chemaxon.com. Accessed 11 July 2012
46. Rughooputh SDDV, Rughooputh HCS (2001) Neural network based chemical structure indexing. J Chem Inf Comput Sci 41:713–717
47. Karthikeyan M, Bender A (2005) Encoding and decoding graphical chemical structures as two-dimensional (PDF417) barcodes. J Chem Info Model 45:572–580
48. Karthikeyan M, Krishnan S, Steinbeck C (2002) Text based chemical information locator from Internet (CILI) using commercial barcodes, 223rd American Chemical Society Meeting - Orlando, Florida, USA
49. Karthikeyan M, Uzagare D, Krishnan S. Compressed Chemical Markup Language for compact storage and inventory applications, 225th ACS Meeting - New Orleans, March 23–27, 2003. CG ACS
50. Guha R, Howard MT, Hutchinson GR et al (2006) The Blue Obelisk—interoperability in chemical informatics. J Chem Inf Model 46:991–998
51. Dalby A, Nourse JG, Hounshell WD et al (1992) Description of several chemical structure used by computer programs developed at molecular design limited. J Chem Inf Comput Sci 32:244–55
52. Xiai XY, Li RS (2000) Solid phase combinatorial synthesis using microkan reactors, Rf tagging and directed sorting. Biotech Bioeng 71:41–50
53. en.wikipedia.org/wiki/Optical_character_recognition
54. Valko AT, Peter JA (2009) CLiDE Pro: The Latest Generation of CLiDE, a Tool for Optical Chemical Structure Recognition. J Chem Info Model 49:780–787
55. cactus.nci.nih.gov/osra/
56. infochem.de/mining/chemocr.shtml
57. Murray-Rust P, Rzepa HS (2003) Chemical Markup, XML and the world wide web CML Schema. J Chem Inf Comput Sci 43:757–72
58. http://www.molinspiration.com/jme/index.html. Accessed 11 July 2012
59. http://moltable.ncl.res.in. Accessed 11 July 2012
60. Blake JE, Farmer NA, Haines RC (1977) An interactive computer graphics system for processing chemical structure diagrams. J Chem Inf Comput Sci 17:223–8
61. Goodson AL (1980) Graphical representation of chemical structures in Chemical Abstracts Service publication. J Chem Inf Comput Sci 20:212–17
62. Walentowski R (1980) Unique unambiguous representation of chemical structures by computerization of a simple notation. J Chem Inf Comput Sci 23:181–92
63. Strokov I (1995) Compact code for chemical structure storage and retrieval. J Chem Inf Comput Sci 35:939–44
64. Strokov I (1996) A new modular architecture for chemical structure elucidation systems. J Chem Inf Comput Sci 36:741–745
65. Hagadone TR, Schulz MW (1995) Capturing Chemical Structure Information in a Relational Database System: The Chemical Software Component Approach. J Chem Inf Comput Sci 35:879–84
66. Willett Peter J (1984) Evaluation of relocation clustering algorithms for the automatic classification of chemical structures. J Chem Inf Comput Sci 24:29–33
67. Barnard JM, Downs GM (1992) Clustering of chemical structures on the basis of two-dimensional similarity measures. J Chem Inf Comput Sci 32:644–9
68. Gu Q, Xu J, Gu L (2010) Selecting diversified compound to build a tangible library for biological and biochemical assay. Molecules 15:5031–44
69. www.chemaxon.com/jchem/doc/user/LibMCS.html. Accessed 11 July 2012
70. Karthikeyan M, Krishnan S, Pandey AK (2006) Harvesting chemical information from the internet using a distributed approach: Chem Extreme. J Chem Inf Model 46:452–461
71. www.moleculardescriptors.eu/softwares/softwares.htm. Accessed 11 July 2012
72. Horvath D (2011) Pharmacophore-based virtual screening In: Bajorath J (ed) Chemoinformatics and Computational Chemical Biology, Methods in Molecular Biology, Humana Press 672:261–298
73. Yang SY (2010) Pharmacophore modeling and applications in drug discovery challenges and recent advances. Drug Discov Today 15:444–50
74. Adamson GW, Bawden D (1981) Comparison of hierarchical cluster analysis techniques for automatic classification of chemical structures. J Chem Inf Comput Sci 21:204–9
75. Balaban AT, Kier LB, Joshi N (1992) Correlations between chemical structure and physicochemical properties like boiling points. J Chem Inf Comput Sci 32:237–44
76. Helge V, Volkmar M (1996) Prediction of material properties from chemical structures. The clearing temperature of nematic liquid crystals derived from their chemical structures by artificial neural Networks. J Chem Inf Comput Sci 36:1173–1177
77. Karthikeyan M, Glen RC, Bender A (2005) General melting point prediction based on a diverse compound data set and artificial neural networks. J Chem Inf Model 45:581–90
78. http://accelrys.com/solutions/scientific-need/predictive-toxicology.html. Accessed 11 July 2012
79. Hakimelahi GH, Khodarahmi GA (2005) The identification of toxicophores for the prediction of mutagenicity, hepatotoxicity and cardiotoxicity. JICS 2:244–267
80. Garg D, Gandhi T, Gopi Mohan C (2008) Exploring QSTR and toxicophore of HERG K+ channel blockers using GFA and HypoGen techniques. J Mol Graph Model 26:966–76
81. Hughes JD, Blagg J, Da P et al (2008) Physicochemical drug properties associated with in vivo toxicological outcomes. Bioorg Med Chem Lett 18:4872–4875
82. Karthikeyan M, Krishnan S, Pandey AK, Bender A, Tropsha A (2008) Distributed chemical computing using Chemstar: an open source Java Remote Method Invocation architecture applied to large scale molecular data from Pubchem. J Chem Info Comput Sci 48:691–703
83. Maass P (2007) Recore: A fast and versatile method for scaffold hopping based on small molecule crystal structure conformations. J Chem Inf Model 47:390–9
84. Ertl P, Schuffenhauer A (2009) Estimation of synthetic accessibility score of drug like molecules based on molecular complexity and fragment contributions. J Cheminform 1:8
85. Melvin JYu (2011) Natural product like virtual libraries: Recursive atom based enumeration. J Chem Inf Model 51:541–557
86. Yang H, Jiang Z, Shi S (2006) Aromatic compounds biodegradation under anaerobic conditions and their QSBR models. Sci Total Environ 358:265–76
87. http://www.epa.ie/whatwedo/monitoring/reach/. Accessed 11 July 2012
88. Zhang X, Brown TN, Wania F (2010) Assessment of chemical screening outcomes based on different partitioning property estimation methods. Environ Int 36:514–20
89. Qiu X, Zhu T, Wang FHJ (2008) Air water gas exchange of organochlorine pesticides in Taihu lake China. Environ Sci Technol 42:1928–32
Chapter 9
Abstract
Chemical compounds participate in all the processes of life. Understanding the complex interactions of
small molecules such as metabolites and drugs with the biological macromolecules that consume and
produce them is key to gaining a wider biological understanding in a systemic context. Chemical property databases
collect information on the biological effects and physicochemical properties of chemical entities. Accessing
and using such databases is key to understanding the chemistry of toxic molecules. In this chapter, we
present methods to search, understand, download, and manipulate the wealth of information available in
public chemical property databases, with particular focus on the database of Chemical Entities of Biological
Interest (ChEBI).
Key words: Chemistry, Databases, Ontology, ChEBI, Chemical graph, Cheminformatics, Chemical
properties, Structure search, Chemical nomenclature
1. Introduction
Small molecules are the chemical entities which are not directly
encoded by the genome, but which nevertheless are essential
participants in all the processes of life. They include many of the
vitamins, minerals, and nutritional substances which we consume
daily in the form of food, the bioactive content of most medicinal
preparations we use for the treatment of diseases, the neurotrans-
mitters which modulate our mood and experience, and the meta-
bolites which are transformed and created in a multitude of
complex pathways within our cells.
Understanding the complex interactions of small molecules
such as metabolites and drugs with the biological macromolecules
that consume and produce them is key to gaining a wider biological
understanding in a systemic context, such as can enable the predic-
tion of the therapeutic—and toxic—effects of novel chemical sub-
stances. Ultimately, all therapeutic and harmful effects of chemical
substances depend on the constitution, shape, and chemical
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_9, © Springer Science+Business Media, LLC 2012
194 J. Hastings et al.
2. Materials
2.1. Describing Chemical Structures with Chemical Graphs
The shape and properties of chemical entities depend to a large
extent on the molecular structure of the entity, that is, the
arrangement of atoms and bonds. Chemical graphs are a way of
representing these core elements in a concise formalism. The chem-
ical (molecular) graph describes the atomic connectivity within a
molecule by using labelled nodes for the atoms or groups within
the molecule, and labelled edges for the (usually covalent) bonds
between the atoms or groups (2).
The graph, strictly speaking, encodes only the constitution of
molecules, i.e., their constituent atoms and bonds. However, the
representational formalism is usually extended to include other
information such as idealised 2D or 3D coordinates for the
atoms, and bond order and type (single, double, triple, or aro-
matic). The chemical graph formalism is also accompanied by a
standard for diagrammatic representation. Figure 1 illustrates an
example of a chemical graph, 2D and 3D coordinates, and the
corresponding 2D and 3D visualisations.
Hydrogen nodes, and the edges linking them to their nearest
neighbouring atoms, are not explicitly displayed in Fig. 1, and
carbon nodes, which form the vertices of the illustration, are not
explicitly labelled as such. These representational economies are
possible because carbon and hydrogen are so prevalent in organic
chemistry; they were introduced for clarity of depiction and
efficiency of storage. Hydrogen-suppressed graphs of this form are
called skeleton graphs.
Chemical graphs simplify the complex nature of chemical entities
and allow for powerful visualisations, which assist the work of both
bench and computational chemists. They also enable many useful
predictions to be made about those physical and chemical properties
of molecules, which are based on connectivity. For these reasons, most
chemical property databases make use of the chemical graph formal-
ism for the storage of structural information about chemical entities.
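As a minimal sketch of the formalism described above, the skeleton graph of cyclohexane can be held as a dictionary of labelled nodes and a dictionary of labelled edges. The standard valences and the helper names used here are illustrative assumptions, not part of any particular database or toolkit, and the valence rule only holds for neutral, ordinary organic molecules.

```python
from collections import Counter

# Assumed standard valences; valid only for neutral, ordinary organic molecules.
VALENCE = {"C": 4, "N": 3, "O": 2}

# Hydrogen-suppressed (skeleton) graph of cyclohexane:
# node labels are element symbols, edge labels are bond orders.
atoms = {1: "C", 2: "C", 3: "C", 4: "C", 5: "C", 6: "C"}
bonds = {(1, 2): 1, (2, 3): 1, (3, 4): 1, (4, 5): 1, (5, 6): 1, (6, 1): 1}


def implicit_hydrogens(atoms, bonds):
    """Return {atom number: count of suppressed hydrogens} from unused valence."""
    used = Counter()
    for (a, b), order in bonds.items():
        used[a] += order
        used[b] += order
    return {i: VALENCE[symbol] - used[i] for i, symbol in atoms.items()}


def molecular_formula(atoms, bonds):
    """Formula with C first, then H, then other elements alphabetically."""
    counts = Counter(atoms.values())
    counts["H"] += sum(implicit_hydrogens(atoms, bonds).values())
    others = sorted(e for e in counts if e not in ("C", "H") and counts[e])
    order = [e for e in ("C", "H") if counts[e]] + others
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


print(implicit_hydrogens(atoms, bonds))
print(molecular_formula(atoms, bonds))
```

Because the hydrogens are recovered from connectivity alone, the stored graph needs only six nodes and six edges, which is the storage economy that skeleton graphs provide.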
9 Accessing and Using Chemical Property Databases 195
Fig. 1. The chemical graph (connection table) for a simple cyclohexane molecule, together with idealised 2D and 3D
coordinates and visualisations (the 3D coordinates and visualisation show the chair conformer of cyclohexane).
2.2. Exchanging Data on Chemicals
The most common format used to encode and exchange chemical
graphs is the MOLfile format owned by Elsevier’s MDL (3). The
MOLfile is a flat ASCII text file with a specific structured format,
consisting of an atom table, describing the atoms contained in the
chemical entity, and a bond table, describing the bonds between
the atoms. Both the atom table and the bond table are extended
with additional properties including the isotope and charge of
the individual atoms, and the bond order and type of the bonds.
The MOLfile representation of the cyclohexane molecule is illu-
strated in Fig. 2.
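The atom table and bond table layout can be sketched with a small writer and reader for the fixed-width V2000 connection table. This is a simplified illustration under stated assumptions: only element symbols, coordinates, and bond orders are handled, all other columns are written as zero, and the function names are hypothetical.

```python
import math

ATOM_TAIL = " 0" + "  0" * 11  # mass difference plus eleven unused property columns


def write_molfile(name, atoms, bonds):
    """atoms: [(symbol, x, y, z)]; bonds: [(i, j, order)] with 1-based atom numbers."""
    lines = [name, "  sketch", ""]  # three header lines
    lines.append(f"{len(atoms):3d}{len(bonds):3d}" + "  0" * 8 + "999 V2000")
    for symbol, x, y, z in atoms:
        lines.append(f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3}" + ATOM_TAIL)
    for i, j, order in bonds:
        lines.append(f"{i:3d}{j:3d}{order:3d}  0")
    lines.append("M  END")
    return "\n".join(lines) + "\n"


def read_molfile(text):
    """Recover the atom table and bond table from the fixed-width columns."""
    lines = text.splitlines()
    n_atoms, n_bonds = int(lines[3][0:3]), int(lines[3][3:6])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        atoms.append((line[31:34].strip(), x, y, z))
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds


# Cyclohexane drawn as a regular hexagon (idealised 2D coordinates).
ring_atoms = [("C", math.cos(k * math.pi / 3), math.sin(k * math.pi / 3), 0.0)
              for k in range(6)]
ring_bonds = [(k + 1, (k + 1) % 6 + 1, 1) for k in range(6)]

molfile_text = write_molfile("cyclohexane", ring_atoms, ring_bonds)
print(molfile_text)
parsed_atoms, parsed_bonds = read_molfile(molfile_text)
```

The bond table refers to atoms by their position in the atom table, so renumbering the atoms changes the file even though the molecule is unchanged.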
The content of a MOLfile depends on the way in which the
chemical structure is drawn. For this reason, it is not possible to
efficiently check whether two representations are of the same chem-
ical based on the MOLfile, and therefore a canonical (i.e., the same
regardless of which way the molecule is drawn) and unique repre-
sentation of the molecule is needed for efficient identification. An
international standard for identification of chemical entities is the
IUPAC International Chemical Identifier (InChI) code. The InChI
is a nonproprietary, structured textual identifier for chemical enti-
ties which is generated by an algorithm from a MOLfile represen-
tation of the chemical entity (4). The generated InChI identifier is
not intended to be read and understood by humans, but is particu-
larly useful for computational matching of chemical entities, such as
when a computer has to align two different sets of data derived
Fig. 2. The MOLfile format for a cyclohexane molecule. The carbon atoms appear in the atom table and in the bond table
the bonds between atoms are listed, with line numbers from the atom table representing the atoms participating in the
bonds. Additional property columns allow for representation of many more features such as charge and stereochemistry.
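The need for a canonical representation can be illustrated with a toy example. The sketch below is emphatically not the InChI algorithm: it is a brute-force relabelling, practical only for a handful of atoms, that tries every atom numbering and keeps the lexicographically smallest description. Two drawings of the same molecule then yield the same result regardless of atom order.

```python
from itertools import permutations


def canonical_form(symbols, bonds):
    """symbols: list of element symbols; bonds: [(i, j, order)], 0-based.

    Brute force over all numberings (factorial cost), for illustration only.
    """
    n = len(symbols)
    best = None
    for perm in permutations(range(n)):  # perm[old_index] = new_index
        relabelled = [None] * n
        for old, new in enumerate(perm):
            relabelled[new] = symbols[old]
        edges = sorted(
            (min(perm[i], perm[j]), max(perm[i], perm[j]), order)
            for i, j, order in bonds
        )
        candidate = (tuple(relabelled), tuple(edges))
        if best is None or candidate < best:
            best = candidate
    return best


# Heavy-atom skeletons; hydrogens are suppressed.
ethanol_drawn_one_way = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
ethanol_drawn_another_way = (["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])
dimethyl_ether = (["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])

print(canonical_form(*ethanol_drawn_one_way) == canonical_form(*ethanol_drawn_another_way))
print(canonical_form(*ethanol_drawn_one_way) == canonical_form(*dimethyl_ether))
```

Production identifiers rely on far more efficient canonical numbering schemes and also normalise features such as charge and stereochemistry, which this sketch ignores.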
2.3. Repositories of Chemical Property Data
Until fairly recently, the bulk of chemical data was only available
through proprietary chemical databases such as Chemical Abstracts
Service (CAS), provided by the American Chemical Society (6), and
Beilstein which since 2007 has been provided by Elsevier (7).
However, in recent years this has been changing, with more and
more chemical data being brought into the public domain. This
change has been brought about partly through the efforts of the
bioinformatics community, which needed access to chemistry data
to support systems-wide integrative research, and partly through
the joint efforts of pharmaceutical companies to reduce the expense
of pre-competitive research, since pharmaceutical companies had
historically each maintained their own database of chemicals for
pre-competitive research (8).
In 2004, two complementary open access databases were
initiated by the bioinformatics community: ChEBI (1) and
PubChem (9). PubChem serves as an automated repository of the
structures and biological activities of small molecules, containing 29
million compound structures. ChEBI is a manually annotated
database of small molecules containing around 620,000 entities. Both
resources provide chemical structures and additional useful infor-
mation such as names and calculated chemical properties. ChEBI
additionally provides a chemical ontology, in which a structure-
based and role-based classification for the chemical entities is
provided. Additional publicly available resources for chemistry
information which are becoming more widely used are ChemSpider
(10), which provides an integrated cheminformatics search plat-
form across many publicly available databases, and the Wikipedia
Chemistry pages (11). Many smaller databases also exist, often
dedicated to particular topic areas or types of chemicals. For a full
listing of publicly available chemistry data sources, see (12), and for
a discussion see (13). Table 1 gives a brief listing of some of the
publicly available databases for chemical structures and properties.
2.4. Software Libraries
Manipulating chemical data and generating chemical properties
programmatically requires the use of a cheminformatics software
library which is able to perform standard transformations and
provides implementations for common algorithms. The Chemistry
Table 1
A listing of several chemical property databases which are publicly available
(i.e., not commercial)
3. Methods
3.1. Searching for Entities
3.1.1. Searching Using Text and Property Values
The first entry point in accessing chemical data is the search
interface provided by the database one is accessing. Generally, such
databases provide a “simple” search as a first entry point, which
takes the search string and searches across all data fields in the
database to bring back candidate hits. For example, ChEBI provides
a search box in the centre of the ChEBI front page and in the
top right of every entry page. The search query may be any data
associated with an entity, such as names, synonyms, formulae, CAS
or Beilstein Registry numbers, or InChIs.
When searching any database with free text, it is important to
know the wildcard character, which allows one to match part of a
word or phrase. For the ChEBI database, the wildcard character is
“*”. A wildcard character allows one to find compounds by typing
in a partial name. The search engine will then try to find names
matching the pattern one has specified.
• To match words starting with a search term, add the wildcard
character to the end of the search term. For example, searching
for aceto* will find compounds such as acetochlor, acetophenazine,
and acetophenazine maleate.
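The behaviour of a trailing wildcard can be imitated locally with Python's standard fnmatch module. This is only a sketch of the matching idea over a small invented name list; it does not reproduce the ChEBI search engine, and fnmatch additionally gives special meaning to "?" and "[...]", which may differ from a database's own syntax.

```python
from fnmatch import fnmatchcase

# Illustrative name list; not an extract from any database.
names = ["acetochlor", "acetophenazine", "acetophenazine maleate",
         "phenol", "paracetamol"]


def wildcard_search(pattern, names):
    """Return the names matching a '*' wildcard pattern, ignoring case."""
    pattern = pattern.lower()
    return [name for name in names if fnmatchcase(name.lower(), pattern)]


print(wildcard_search("aceto*", names))
print(wildcard_search("*phenazine", names))
```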
Fig. 3. The ChEBI advanced search showing textual and chemical property-based search options. Multiple search options
may be specified and combined with the logical operators AND, OR, and BUT NOT. Text and property-based searches may
be combined with chemical structure searches.
Fig. 4. The JChemPaint applet for drawing chemical structures inside of web pages for performing chemical structure-
based searches. The bond selection utility is on the left and the atom selection along the bottom. The top menu includes
file tools and utilities, and along the right hand menu are common structures and an extended structure template library.
Structure search types are “Identity,” “Substructure,” and “Similarity”.
Table 2
Bitmaps for individual structural patterns and the resulting combined fingerprint for a water molecule in a simplified fingerprinting example which consists only of a bitmap of length 10
Pattern   Bitmap
O         0010000000
HO        1010000000
OH        0000100010
HOH       0000000101
Result:   1010110111
Table 3
Bitmaps for individual structural patterns and the resulting combined fingerprint for hydroxide, which is a substructure of the water molecule
Pattern   Bitmap
O         0010000000
OH        0000100010
Result:   0010110010
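Fingerprints such as those in Tables 2 and 3 are typically used as a fast pre-filter for substructure search: a candidate can only contain the query if every bit set in the query fingerprint is also set in the candidate fingerprint. The sketch below applies that test to the two "Result" bitmaps from the tables; the function names are illustrative assumptions.

```python
def to_bits(bitmap):
    """Convert a bitmap string such as '0010110010' to an integer."""
    return int(bitmap, 2)


water = to_bits("1010110111")      # Result row of Table 2
hydroxide = to_bits("0010110010")  # Result row of Table 3


def may_contain(candidate_fp, query_fp):
    """True if all bits of the query are also set in the candidate."""
    return candidate_fp & query_fp == query_fp


print(may_contain(water, hydroxide))  # hydroxide passes the screen against water
print(may_contain(hydroxide, water))  # water cannot be a substructure of hydroxide
```

Passing the screen is necessary but not sufficient, because different patterns can set the same bit; candidates that pass are normally confirmed with a full graph-matching step.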
Table 4
The calculation of a Tanimoto similarity coefficient between two chemical structures as represented by fingerprints
Object                 Fingerprint
Object A               0010110010
Object B               1010110001
c (bits on in both)    3
a (bits on in Obj A)   4
b (bits on in Obj B)   5
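Using the quantities listed in Table 4, the Tanimoto coefficient is conventionally computed as T = c / (a + b - c). A short sketch, with a hypothetical function name, reproduces the counts in the table and the resulting coefficient:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two equal-length bitmap strings."""
    a = fp_a.count("1")                       # bits on in A
    b = fp_b.count("1")                       # bits on in B
    c = sum(x == "1" and y == "1" for x, y in zip(fp_a, fp_b))  # bits on in both
    return c / (a + b - c)


object_a = "0010110010"
object_b = "1010110001"
print(tanimoto(object_a, object_b))  # 3 / (4 + 5 - 3) = 0.5
```

A value of 1 means the two fingerprints are identical and 0 means they share no set bits; similarity searches usually return the entries whose coefficient exceeds a chosen threshold.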
3.2. Viewing a Typical Database Entry
After a successful search for an entity in a chemical property
database, the user will be presented with an entry page for that entity.
The entry page usually displays a visual representation of the struc-
ture of the entity together with useful information such as names
and synonyms, calculated properties, and classification information.
ChEBI database entries contain the following:
• A unique, unambiguous, recommended ChEBI name and an
associated stable unique identifier
Table 5
Some examples of trivial and systematic
names for ChEBI entities
3.3. Understanding Chemical Classification
Annotation of data is essential for capturing and transmitting the
knowledge associated with data in databases. Annotations are often
captured in the form of free text, which is easy for a human
audience to read and understand, but is difficult for computers to
parse; can vary in quality from database to database; and can use
different terminology to mean the same thing (even within the
same database, if for example different human annotators used
different terminology). A common way of organising the
terminology used in annotation is as an ontology, which consists
of a structured vocabulary with explicit semantics attached to the
relationships between the terms.
ChEBI provides such an ontology in the domain of chemistry.
Terms describing structure-based chemical classes are organised in
a structure-based hierarchy, and function-based classes, including
terms for bioactivity, are organised in a function-based hierarchy.
The ChEBI ontology is an ontology for biologically interesting
chemistry. It consists of three sub-ontologies, namely:
• Chemical entity, in which molecular entities or parts thereof and
chemical substances are classified according to their structural
features;
• Role, in which entities are classified on the basis of their role within
a biological context, e.g., as antibiotics, antiviral agents, coenzymes,
or enzyme inhibitors, or on the basis of their intended use
by humans, e.g., as pesticides, detergents, healthcare products,
and fuels;
• Subatomic particle, in which particles smaller than atoms are
classified.
Fig. 5. A high-level illustration of the ChEBI ontology for the entity R-adrenaline. The entity is given a structure-based
classification as “catecholamines” in the chemical entity ontology, and role classification both in terms of an application
which the entity is used for, “vasodilator agent,” and in terms of the biological role, “hormone”.
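The way such classifications can be queried is sketched below with a toy ontology fragment. Only the three classifications named in the caption of Fig. 5 are taken from the chapter; the parent links, which skip any intermediate classes, and the function names are simplifying assumptions made for illustration.

```python
# Toy "is a" hierarchy; intermediate classes are deliberately omitted (assumption).
IS_A = {
    "(R)-adrenaline": ["catecholamines"],
    "catecholamines": ["chemical entity"],
    "hormone": ["biological role"],
    "vasodilator agent": ["application"],
    "biological role": ["role"],
    "application": ["role"],
}

# "has role" annotations linking a chemical entity to terms in the role ontology.
HAS_ROLE = {"(R)-adrenaline": ["hormone", "vasodilator agent"]}


def ancestors(term):
    """All classes reachable from term by following 'is a' links."""
    seen, stack = [], list(IS_A.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(IS_A.get(parent, []))
    return seen


def has_role(entity, role):
    """True if the entity is annotated with the role or with a subclass of it."""
    return any(r == role or role in ancestors(r) for r in HAS_ROLE.get(entity, []))


print(ancestors("(R)-adrenaline"))
print(has_role("(R)-adrenaline", "biological role"))
```

Because the relationships carry explicit semantics, a query for a general class such as "biological role" also retrieves entities annotated only with a more specific term such as "hormone".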
3.4. Programmatically Accessing and Manipulating Chemical Data
While web interfaces provide an easy point of entry for explorative
research surrounding the data in chemical property databases and
allow access to powerful searches, driving any large-scale
computational research ultimately requires being able to
download the data, in a computationally accessible format, to the
local machine. The primary format for downloading chemical data
is the SDF format, which consists of a collection of MOLfile
resources, together with custom properties.
In ChEBI, targeted results of chemical searches can be downloaded
directly from the search results page by clicking the icon
entitled “Export your search results” and selecting the SDF file
format. The full database, on the other hand, can be downloaded
directly from the Downloads page available at
http://www.ebi.ac.uk/chebi/downloadsForward.do. To download the
chemical structures (and structure-related properties), an SDF file
download is available. The data is provided in two flavours, as follows.
• The Chebi_lite.sdf file contains only the chemical structure, ChEBI
identifier, and ChEBI Name.
• The Chebi_complete.sdf file contains all the chemical structures and
associated information. Note that it excludes any ontological
information: ontological classes do not contain a structure and
so cannot be represented in an SDF file. To download the
ontology classification, the popular ontology formats OBO
(18) and OWL (19) are available.
Once one has an SDF file locally on the machine, several
publicly available tools facilitate manipulation of the file data, for
example, the Chemistry Development Kit. Figure 6 shows a brief
snippet of code necessary to extract data from an SDF file using
version 1.0.2 of the CDK.
An alternative to parsing and interpreting the full downloadable
data file is to make use of the web service facility to programmatically
retrieve exactly the entries needed in real time. ChEBI provides a web
service implemented in SOAP (Simple Object Access Protocol)
(http://www.w3.org/TR/soap/), which allows programmatic
retrieval of all the data in the database based on a search facility
which mimics the search interface available online. The WSDL
Fig. 6. A code snippet used to parse and extract the data from an SDF file using the Chemistry Development Kit.
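Independently of any toolkit, the structure of an SDF file can be sketched in a few lines of plain Python: each record is a MOLfile block ending in "M  END", followed by named data items, and records are separated by a line containing "$$$$". This is not the CDK code of Fig. 6. The field names "ChEBI ID" and "ChEBI Name", the record contents, and the zeroed coordinates are illustrative assumptions.

```python
import re

# Two illustrative records; coordinates are zeroed for brevity.
SAMPLE_SDF = """\
mercury
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 Hg  0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <ChEBI ID>
CHEBI:16170

> <ChEBI Name>
mercury

$$$$
phenol
  sketch

  7  7  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0
  2  3  1  0
  3  4  2  0
  4  5  1  0
  5  6  2  0
  6  1  1  0
  1  7  1  0
M  END
> <ChEBI ID>
CHEBI:15882

> <ChEBI Name>
phenol

$$$$
"""


def parse_sdf(text):
    """Return a list of {'molblock': str, 'data': {field: value}} records."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() != "$$$$":
            current.append(line)
            continue
        end = next(i for i, l in enumerate(current) if l.startswith("M  END"))
        data, field = {}, None
        for item in current[end + 1:]:
            header = re.match(r"^>.*?<(.+?)>", item)
            if header:                      # start of a named data item
                field = header.group(1)
                data[field] = []
            elif item.strip() == "":        # blank line closes the item
                field = None
            elif field is not None:
                data[field].append(item)
        records.append({"molblock": "\n".join(current[:end + 1]),
                        "data": {k: "\n".join(v) for k, v in data.items()}})
        current = []
    return records


sdf_records = parse_sdf(SAMPLE_SDF)
for record in sdf_records:
    print(record["data"]["ChEBI ID"], record["data"]["ChEBI Name"])
```

For real work a cheminformatics library is preferable, since it also interprets the structure block; the sketch only shows why the format is straightforward to split and index.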
4. Examples
4.1. Understanding Ranking in a Simple Search for Mercury (CHEBI:16170)
Heavy metals (trace metals at least five times denser than water) are
generally systemic toxins with neurotoxic, nephrotoxic, fetotoxic, and
teratogenic effects (http://tuberose.com/Heavy_Metal_Toxicity.html).
Mercury, one such metal, is a well-known environmental
toxin. Its most common organic form is methylmercury, to which
exposure occurs through consumption of seafood (20). Exposure
to elemental metallic mercury occurs from dental amalgam
restorations (21). Mercury is also found in batteries, fluorescent and
mercury lamps, vaccines, and dental amalgams; its effects include
pulmonary, cutaneous, and neurological disorders as well as
immunological glomerular disease (22).
When searching ChEBI for “mercury” using the simple search
box, the search retrieves (at the current release) 35 search results.
These results include all the entities in which the word “mercury”
appears in any associated data field. The results are ranked, that is,
sorted by relevance, depending on where the search term appeared
in the searched entity. A search hit in the primary name, for exam-
ple, is promoted (receives a higher search score) above a search hit
in the cross-references or other associated data. The first five results
include all the forms of pure mercury in ChEBI, while the remain-
der of the search results include molecules and salts containing
mercury as part of a larger species. This ranking is designed to assist
the user in discovering the most relevant information more easily.
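The idea of field-dependent ranking can be sketched as a weighted score, in which a hit in the primary name counts for more than a hit in the synonyms, which in turn counts for more than a hit in the cross-references. The weights, the records, and the function names below are invented for illustration and do not describe the scoring actually used by ChEBI.

```python
# Hypothetical field weights: a higher weight promotes a hit in that field.
FIELD_WEIGHTS = {"name": 3.0, "synonyms": 2.0, "cross_references": 1.0}

# Invented toy records, not database content.
entries = [
    {"name": "thiomersal", "synonyms": [],
     "cross_references": ["note on mercury-containing preservatives"]},
    {"name": "calomel", "synonyms": ["mercury(I) chloride"], "cross_references": []},
    {"name": "mercury(0)", "synonyms": ["quicksilver"], "cross_references": []},
    {"name": "phenol", "synonyms": ["hydroxybenzene"], "cross_references": []},
]


def score(entry, term):
    """Sum the weights of every field in which the search term appears."""
    term = term.lower()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        values = entry.get(field, [])
        if isinstance(values, str):
            values = [values]
        if any(term in value.lower() for value in values):
            total += weight
    return total


def ranked_search(entries, term):
    """Names of matching entries, most relevant first."""
    hits = [(score(entry, term), entry["name"]) for entry in entries]
    hits = [hit for hit in hits if hit[0] > 0]
    return [name for _, name in sorted(hits, key=lambda hit: -hit[0])]


print(ranked_search(entries, "mercury"))
```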
4.3. Exploring the Ontology Structural Relationships with Phenol (ChEBI:15882)
Phenol is an aromatic alcohol with the chemical formula C6H5OH,
used commonly in molecular biology as an organic solvent in bacterial
plasmid extraction protocols. It is also used in medical treatments,
petroleum refineries and the production of glue, fibre, and
nylon (30). Although only mildly acidic, it can cause burns (31) and
is also associated with cardiac dysfunction in certain contexts (32).
The ontology for phenol contains several structural relation-
ships (conjugate base/acid; has functional parent).
4.4. Exploring the Role Ontology with Bafilomycin A (ChEBI:22689)
Bafilomycin A1 is the most commonly used of the bafilomycins, a
family of toxic macrolide antibiotics derived from Streptomyces
griseus. It has a wide range of biological actions, including
antibacterial, antifungal, antineoplastic, immunosuppressive, and
antimalarial activities, as well as a tendency to reduce multidrug
resistance (33).
In ChEBI, several roles are annotated to this entity, including
the biological role “toxin,” due to its toxic properties, and the
application “fungicide.” By clicking the link to the target role in
the ontology and selecting “tree view” in the ontology viewer, it is
possible to view the role hierarchy for the role terms and to navigate
to other chemicals which have been annotated with those roles.
4.5. Interacting with 2D and 3D Structural Representations of Fusicoccin (ChEBI:51015)
Fusicoccin is a polycyclic organic compound whose structure contains
three fused carbon rings and another ring containing an oxygen
atom and five carbons. A phytotoxin produced by the fungus
Fusicoccum amygdali, it causes membrane hyperpolarisation and proton
extrusion in plants (http://en.wikipedia.org/wiki/Fusicoccin). The
toxin acts by stimulating the H+-ATPases of the guard cells,
leading to plasma membrane hyperpolarisation and the irreversible
opening of stomata. The ensuing wilting of the leaves is
followed by cell death (34). Fusicoccin has also been shown to
cause cell death in cultured cells of the sycamore (35).
When viewing the ChEBI entry page for Fusicoccin, the default
structural illustration is the schematic 2D diagram; however, clicking
“More structures” reveals a 3D structure associated with the
entry page. Selecting the checkbox “applet” loads the MarvinView
applet, which allows for interactive exploration of the structure,
including rotation in three dimensions.
5. Notes
5.1. Cross-Database Integration and Chemical Identity
When searching public chemical databases and, in particular, when
searching across multiple databases with a view to collecting as
complete a view as possible of the available information for a
particular chemical entity, redundancy of chemical information is
a large problem, as multiple records proliferate across public
databases. PubChem is a database which accepts depositions of chemical
data from a vast array of public databases and thereafter
performs structure-based integration to provide a unified compound
detail page for each separately identifiable chemical entity.
Ideally, the user would like to see one record for each distinct
chemical entity and only one such record.
However, the process of deciphering which chemical entities in
different data sources are the same and which are not is a challenging
one, which is not yet fully solved by the wider community. The
InChI code is intended to solve the problem by providing a standard
identifier which can be used for data integration. It goes a long
way towards achieving this goal, but falls short of a full solution for
several reasons, each of which is an active area of discussion and
research within the chemical database community.
One problem with the InChI as a tool for solving all the data
integration issues facing the public compound databases is that,
although it is intended to resolve to the same code regardless of the
way in which a chemical entity is drawn, it cannot resolve
differences in what is depicted where differences in drawing
represent real chemical distinctions. For example,
salts may be depicted in different ways in chemical drawing
software. One possibility is to depict the molecular structure of the salt
as a bonded whole with the ionised bond drawn as charge separated.
Another possibility is to draw the two ions as distinct units,
disconnected from a graph perspective, with explicit charges. Yet
another possibility includes several copies of the charged ions in
order to illustrate the final quantitative ratio of the different ions
within the salt. Each of these depictions inevitably results in a
different InChI code, rendering programmatic unification more
difficult.
Another reason that the InChI is not able to fully resolve data
integration issues across different resources is that the basic unit of
identity for a chemical entity is regarded differently in different
resources where there is a different application context. For exam-
ple, a chemical entity database such as ChEBI regards different salts
as different chemical entities, while a drug database such as Drug-
Bank regards all different salts of the same active ingredient as the
same drug entity. Integration between such resources is a matter of
maintaining one-to-many, and in some cases even many-to-many,
5.2. Calculated Properties and Experimentally Derived Properties
Many properties provided by public chemical databases are calculated
directly from the chemical structure. For example, the formula,
molecular weight, and overall charge are generally calculated directly
from the chemical structure. Unfortunately, some of the properties of
interest to toxicology cannot be calculated directly from the chemical
Table 6
Some examples of chemical properties of interest

Chemical property: Hydrogen bond index
Importance of property: H-bonding (the formation of weak intermolecular bonds between a hydrogen carrying a partial positive charge on one molecule and a partially negatively charged N, O, or F on another molecule) plays a pivotal role in biological systems, contributing to the structures and interactions of molecules such as carbohydrates, nucleotides, and amino acids. The transience of hydrogen bonds means that they can be switched on or off with energy values that lie within the range of thermal fluctuations of life temperatures (1)

Chemical property: Reactivity with water
Importance of property: The bodies of all living organisms are chiefly (70–80%) water, the unique, life-sustaining properties of which arise from H-bonding

Chemical property: pH tendencies
Importance of property: pH, a measure of hydrogen ion concentration, is a critical factor controlling life processes such as enzyme function. For example, a pH change of as little as 0.2 from the normal value of 7.4 in humans is life-threatening. pH levels similarly determine the functions of enzymes and other molecules in the wide variety of living systems

Chemical property: Toxicity
Importance of property: Toxins (substances capable of causing harm to living organisms) include heavy metals such as mercury, lead, cadmium, and aluminium; biotoxins, produced by living cells and organisms; xenobiotics, substances that are not a normal component of the organism in question; or food preservatives and cosmetics
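In contrast to experimentally derived properties such as those listed in Table 6, properties like molecular weight and overall charge follow directly from the structure. A minimal sketch, assuming rounded standard atomic masses for a handful of elements and hypothetical function names, computes both for phenol (C6H5OH, i.e., C6H6O) and for its conjugate base:

```python
import re

# Rounded standard atomic masses; an assumption covering only a few elements.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def parse_formula(formula):
    """Turn a simple formula such as 'C6H6O' into {'C': 6, 'H': 6, 'O': 1}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_weight(formula):
    """Sum of atomic masses weighted by the atom counts in the formula."""
    return sum(ATOMIC_MASS[s] * n for s, n in parse_formula(formula).items())


def overall_charge(formal_charges):
    """Net charge as the sum of per-atom formal charges."""
    return sum(formal_charges)


phenol_weight = molecular_weight("C6H6O")
# Phenolate (the conjugate base): seven heavy atoms, -1 formal charge on oxygen.
phenolate_charge = overall_charge([0, 0, 0, 0, 0, 0, -1])
print(round(phenol_weight, 3), phenolate_charge)
```

Properties such as toxicity have no comparable closed-form calculation and must be measured or estimated by predictive models.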
References
1. de Matos P, Alcántara R, Dekker A, Ennis M, Hastings J, Haug K, Spiteri I, Turner S, Steinbeck C (2010) Nucl Acids Res 38:D249–D254
2. Trinajstic N (1992) Chemical graph theory. CRC Press, Florida, USA
3. MDL (2010) http://www.mdl.com/company/about/history.jsp. Last accessed December 2010
4. IUPAC The IUPAC International Chemical Identifier (InChI) (2010) http://www.iupac.org/inchi/. Last accessed July 2012
5. Daylight, inc. (2010) http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html. Last accessed July 2012
6. CAS Chemical Abstracts Service (2010) http://www.cas.org/. Last accessed July 2012
7. Beilstein Crossfire Beilstein Database (2010) http://info.crossfiredatabases.com/beilsteinacquisitionpressreleasemarch607.pdf. Last accessed December 2010
8. Marx V (2009) GenomeWeb BioInform News http://www.genomeweb.com/informatics/tear-down-firewall-pharma-scientists-call-pre-competitive-approach-bioinformatic?page=show. Last accessed July 2012
9. Sayers E (2005) PubChem: An Entrez Database of Small Molecules. NLM Tech Bull (342):e2
10. Williams A (2008) Chemistry International 30(1):1
11. Wikipedia Chemistry (2010) http://en.wikipedia.org/wiki/Chemistry. Last accessed July 2012
12. CHEMBIOGRID CHEMBIOGRID: chemistry databases on the web (2010) http://www.chembiogrid.org/related/resources/about.html. Last accessed July 2012
13. Williams AJ (2009) Drug Discov Today 13:495–501
14. Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E (2003) J Chem Inf Comput Sci 43(2):493–500
15. Steinbeck C, Hoppe C, Kuhn S, Floris M, Guha R, Willighagen EL (2006) Curr Pharm Des 12:2111–2120
16. Krause S, Willighagen EL, Steinbeck C (2000) Molecules 5:93–98
17. Rijnbeek M, Steinbeck C (2009) J Cheminformatics 1(1):17
18. The Gene Ontology Consortium The OBO language, version 1.2 (2010) http://www.geneontology.org/GO.format.obo-1_2.shtml. Last accessed July 2012
19. Smith MK, Welty C, McGuinness DL (2010) The web ontology language http://www.w3.org/TR/owl-guide/. Last accessed July 2012
20. Berry MJ, Ralston NV (2008) Ecohealth 5:456–459
21. Guzz G, Fogazzi GB, Cantù M, Minoia C, Ronchi A, Pigatto PD, Severi G (2008) J Environ Pathol Toxicol Oncol 27:147–155
22. Haley BE (2005) Medical Veritas 2:535–542
23. Freedman BJ (1980) Br J Dis Chest 74:128–34
24. Longo BM, Yang W, Green JB, Crosby FL, Crosby VL (2010) Toxicol Environ Health A 73:1370–8
25. Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita K, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M (2006) Nucleic Acids Res 34:D354–357
26. PDB The world wide protein data bank (2010) http://www.wwpdb.org/. Last accessed July 2012
27. The UniProt Consortium (2010) Nucleic Acids Res 38:D142–D148
28. Steinbeck C, Krause S, Kuhn S (2003) J Chem Inf Comput Sci 43(6):1733–1739
29. Wittig U, Golebiewski M, Kania R, Krebs O, Mir S, Weidemann A, Anstein S, Saric J, Rojas I (2006) In: Proceedings of the 3rd international workshop on data integration in the life sciences 2006 (DILS’06), Hinxton, UK, pp 94–103
30. Aşkin H, Uysal H, Altun D (2007) Toxicol Ind Health 23:591–8
31. Lin TM, Lee SS, Lai CS, Lin SD (2006) Burns 4:517–21
32. Warner MA, Harper JV (1985) Anesthesiology 62:366–7
33. van Schalkwyk DA, Chan XW, Misiano P, Gagliardi S, Farina C, Saliba KJ (2010) Biochem Pharmacol 79:1291–9
34. Lanfermeijer FC, Prins H (1994) Plant Physiol 104:1277–1285
35. Malerba M, Contran N, Tonelli M, Crosti P, Cerana R (2008) Physiol Plant 133:449–57
36. Mattei C, Wen PJ, Nguyen-Huu TD, Alvarez M, Benoit E, Bourdelais AJ, Lewis RJ, Baden DG, Molgó J, Meunier FA (2008) PLoS One 3:e3448
37. Nguyen-Huu TD, Mattei C, Wen PJ, Bourdelais AJ, Lewis RJ, Benoit E, Baden DG, Molgó J, Meunier FA (2010) Toxicon 56:792–6
38. Lehane L, Lewis RJ (2000) Int J Food Microbiol 61:91–125
39. Long SA, Quan C, Van de Water J, Nantz MH, Kurth MJ, Barsky D, Colvin ME, Lam KS, Coppel RL, Ansari A, Gershwin ME (2001) J Immunol 167:2956–63
40. Amano K, Leung PS, Xu Q, Marik J, Quan C, Kurth MJ, Nantz MH, Ansari AA, Lam KS, Zeniya M, Coppel RL, Gershwin ME (2004) J Immunol 172:6444–52
Chapter 10
Abstract
Toxicity data are expensive to generate, are increasingly seen as precompetitive, and are frequently
used to build computational models in a discipline known as computational toxicology. Repositories of
chemical property data support computational toxicologists by providing access to data on potential
toxicity issues with compounds and by enabling the construction of structure–toxicity relationships and
associated prediction models. These relationships are built with mathematical, statistical, and
computational modeling approaches and can be used to understand the mechanisms by which chemicals
cause harm and, ultimately, to predict adverse effects of these chemicals on human health and/or
the environment. Such approaches are of value because they offer an opportunity to prioritize chemicals
for testing. An increasing amount of the data used by computational toxicologists is being published in
the public domain and, in parallel, Open Source software for generating computational models is becoming
more widely available. This chapter provides an overview of the types of data and software available and
how these may be used to produce predictive toxicology models for the community.
1. Introduction
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_10, © Springer Science+Business Media, LLC 2012
222 A.J. Williams et al.
achieved using in-house data for the end point of interest or using
freely available or commercial data sets for modeling the property.
At a deeper level, the absence of appreciable genotoxicity or cardiotoxicity
may be an important determinant of whether a molecule is approved for
human use by government regulatory authorities. As assays become more
specific, the sources of data are likely to become more restricted.
Computational toxicology models could be used early in the R&D process so
that compounds with lower risk may be progressed as part of a
multidimensional process that also considers other properties of molecules (4).
The application of an array of computational models to fields such as green
chemistry has also recently been reviewed (5). Such methods can save money
and avoid the physical testing of compounds (and the concomitant use of
animals or other tissue), which may be undesirable or impractical at high
volume.
In the past 5 years, 2D-ligand-based approaches have been
increasingly used along with sophisticated algorithms and networks
to form a systems-biology approach for toxicology (6–10). These
studies commonly require compounds with toxicity data, together with
molecular descriptors that are generated or retrieved from databases.
Several of these recent approaches have described how machine
learning methods can be used for modeling binary or continuous
data relevant to toxicology. Examples of our own activities in this
domain include studying drug-induced liver injury using data from a
previously published study (11) and collaborating with Pfizer to
generate models for the time-dependent inhibition of CYP3A4 (12).
Machine learning models for predicting cytotoxicity, built on combined
data from many different in-house assays, have also been published
separately by Pfizer (13).
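To make the idea of modeling binary toxicity data from molecular descriptors
concrete, the following is a minimal sketch of a Bernoulli naive Bayes
classifier over binary fingerprint bits. It is an illustration only and is not
the method used in the studies cited above; the fingerprint bits, the class
labels, and the six toy compounds are invented for the example.

```python
import math

def train_bernoulli_nb(fingerprints, labels, n_bits):
    """Fit per-class bit frequencies with Laplace (add-one) smoothing."""
    model = {}
    for cls in set(labels):
        rows = [fp for fp, y in zip(fingerprints, labels) if y == cls]
        log_prior = math.log(len(rows) / len(labels))
        # P(bit is set | class), smoothed so no probability is exactly 0 or 1
        p_on = [(sum(bit in fp for fp in rows) + 1) / (len(rows) + 2)
                for bit in range(n_bits)]
        model[cls] = (log_prior, p_on)
    return model

def predict(model, fp):
    """Return the class with the highest log posterior for one fingerprint."""
    def score(cls):
        log_prior, p_on = model[cls]
        return log_prior + sum(
            math.log(p) if bit in fp else math.log(1.0 - p)
            for bit, p in enumerate(p_on))
    return max(model, key=score)

# Toy data (invented): each compound is the set of fingerprint bits that are on.
train_fps = [{0, 1}, {0, 2}, {0, 1, 2}, {3}, {3, 4}, {4}]
train_labels = ["toxic", "toxic", "toxic",
                "nontoxic", "nontoxic", "nontoxic"]
model = train_bernoulli_nb(train_fps, train_labels, n_bits=5)

print(predict(model, {0, 1}))
print(predict(model, {3, 4}))
```

In practice the fingerprints would come from a cheminformatics toolkit and the
labels from a curated toxicity database; continuous end points would call for
a regression method rather than a classifier.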
A few recent examples illustrate how databases of toxicity information
can be used or created to enable computational modeling. Several
quantitative structure–activity relationship (QSAR) or machine learning
methods have been reported for predicting hepatotoxicity (14, 15) or
drug–drug interactions (12, 16–18). Drug metabolism in the liver can
convert some drugs into highly reactive intermediates (19–22) and hence
cause drug-induced liver injury (DILI). DILI is the number one reason
why drugs are not approved or are withdrawn from the market after
approval (23). Idiosyncratic liver injury is much harder to predict
from preclinical in vitro or in vivo studies, so such problems frequently
become apparent only once a drug reaches large patient populations in
the clinic, which is generally too late for the drug developer to
identify an alternative and safer drug molecule. One study assembled a
list of approximately 300 drugs and chemicals, with a classification
scheme based on human clinical data for hepatotoxicity, for the purpose
of evaluating an in vitro testing methodology based on cellular imaging
of primary human hepatocyte cultures (24). It was found that the
100-fold Cmax scaling factor
10 Accessing, Using, and Creating Chemical Property Databases. . . 223
2. Public Domain Databases
Computational toxicologists have a choice when accessing data for
the purpose of reference or to create models. They can source data
in house, from collaborators, from commercial sources [e.g., data
and computational models from Leadscope (http://www.leadscope.com),
Accelrys (http://accelrys.com), LHASA (https://www.lhasalimited.org),
and Aureus (http://www.aureus-sciences.com/aureus/web/guest/adme-overview)]
or, as is increasingly common, from online resources (28). Online
resources should, in general, be expected to be more favorable to
scientists as the data are available at no cost and can contain a
broader range of information. However, all structure-based databases
can be prone to error, and care needs to be taken when choosing data
from such sources. This is especially true of public domain
structure-based data sources, as has been discussed elsewhere (29).
Online resources hosting toxicity data include the multiple
databases integrated via the TOXicology data NETwork, TOXNET
(http://toxnet.nlm.nih.gov), the Distributed Structure-Searchable
Toxicity (DSSTox) database hosted by the EPA (30), the Hazardous
Substances Data Bank (http://www.nlm.nih.gov/pubs/factsheets/hsdbfs.html),
ACToR (31), ChEMBL (32), and ToxRefDB
(http://www.epa.gov/ncct/toxrefdb/), to name just a few. These
are discussed in more detail below.
TOXNET, the TOXicology data NETwork, is a cluster of
databases covering toxicology, hazardous chemicals, environmental
health, and related areas. It is managed by the Toxicology and
Environmental Health Information Program (TEHIP) in the Divi-
sion of Specialized Information Services (SIS) of the National
Library of Medicine (NLM). It is a central portal integrating a
series of databases related to toxicology. These are Integrated Risk
Information System (IRIS), International Toxicity Estimates for
Fig. 1. The Snorql graphical user interface showing the SPARQL query and the seven most frequent toxicity measurements
recorded in ChEMBL.
Fig. 2. A screenshot of the Bioclipse wizard used to create a local data set for substructure mining. The wizard allows the
user to select two protein families that will be compared. The biological activity, IC50, is selected in this example and is
dynamically discovered in the online database.
3. Utilizing Databases for Computational Toxicity
Computational access to the databases discussed in this section has
not been formalized and, at best, one can download the full data
from a download page. Recently, however, the OpenTox project
proposed a standardized API for accessing data sets (59). The
resulting OpenTox Framework describes the use of a number of
technologies to implement this API, specifically RESTful services to
facilitate web access and the Resource Description Framework (RDF)
as an open standard for communication. This combination makes it easy
for third-party software projects to integrate databases exposed via
the framework. Bioclipse (60) was recently extended with a number of
scripting extensions to interact with OpenTox servers, and Table 1
shows two approaches to accessing toxicological data sets made
available via the OpenTox Framework (38). The first approach lists all
data sets and takes the first (index = 0) to be saved in the standard
SDF file format. The second approach uses an OpenTox ontology server
and queries the underlying RDF data directly with the SPARQL
query language. The SPARQL request shown in the listing queries for
data sets related to mutagenicity. At the time of writing
only one data set was available, but this will hopefully change as
the community embraces and adopts the OpenTox Framework.
Figure 3 shows how this data set was subsequently downloaded as
Table 1. JavaScript scripts that show two approaches to how Bioclipse can retrieve information
from OpenTox servers
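The two approaches described for Table 1 (listing data sets and saving the
first one as SDF, and querying an ontology server with SPARQL) can also be
sketched outside Bioclipse. The Python sketch below only builds the HTTP
requests and does not send them. The server address, the `/dataset` path, the
`ot:` namespace URI, and the use of `dc:title` to find mutagenicity data sets
are assumptions made for illustration and should be checked against the
server actually used.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "https://opentox.example.org"  # hypothetical server, not a real service

def dataset_list_request(base=BASE):
    """Approach 1, step 1: ask for all data set URIs as a plain URI list."""
    return Request(f"{base}/dataset", headers={"Accept": "text/uri-list"})

def pick_dataset(uri_list_text, index=0):
    """Approach 1, step 2: take one URI (index = 0 is the first) from the list."""
    uris = [line.strip() for line in uri_list_text.splitlines() if line.strip()]
    return uris[index]

def dataset_sdf_request(dataset_uri):
    """Approach 1, step 3: request the chosen data set in SDF format."""
    return Request(dataset_uri, headers={"Accept": "chemical/x-mdl-sdfile"})

# Approach 2: a SPARQL query for data sets whose title mentions mutagenicity.
# The prefixes and predicates are assumptions about how the data are modeled.
MUTAGENICITY_QUERY = """
PREFIX ot: <http://www.opentox.org/api/1.1#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?dataset ?title WHERE {
  ?dataset a ot:Dataset ;
           dc:title ?title .
  FILTER regex(?title, "mutagen", "i")
}
"""

def sparql_request(endpoint, query=MUTAGENICITY_QUERY):
    """Encode the query as a GET request, as in the SPARQL protocol."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

first = pick_dataset(f"{BASE}/dataset/1\n{BASE}/dataset/2\n", index=0)
print(dataset_sdf_request(first).full_url)
print(sparql_request(f"{BASE}/sparql").full_url[:60])
```

Passing any of these `Request` objects to `urllib.request.urlopen` would
perform the actual call against a live server.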
4. Utilizing Databases for Prediction of Metabolic Sites
The toxicity of compounds is not always directly caused by the
compound itself, but may also be due to metabolism. Public databases
related to drug metabolism are rare. The lack of such databases
makes methods that predict metabolism, or just the likelihood that
structures are metabolized, all the more relevant. Predicted
metabolites can be screened against toxicity databases and existing
predictive models, while the likelihood that a drug can undergo
metabolism can be used in the overall analysis of a compound's
toxicity. The cytochrome P450 (CYP) family of heme-thiolate
enzymes is responsible for the majority of drug–drug interactions
and metabolism-dependent toxicity issues (61, 62). The identification
of the site-of-metabolism (SOM) for CYPs is an important
Fig. 3. A screenshot of a Bioclipse script that downloads a data set with mutagenicity information from an OpenTox server
and shows the resulting set of data in a molecules table view.
Fig. 4. A screenshot of the Bioclipse workbench with the MetaPrint2D method predicting the SOM for a set of drugs. In
the interface, individual atoms are colored according to their likelihood of metabolism (red = high, yellow = medium,
and green = low). Please see the online color figure for the color coding in the interface.
5. Standardizing QSAR Experiments
Setting up data sets for QSAR analysis is not without complications.
There are numerous software tools available for the calculation of
descriptors and the conversion of chemical structures into a form
which can be used in statistical and machine-learning methods.
These software packages are generally incompatible, have proprietary
file formats, or are available only as standalone applications. A lack of
standardization in terms of descriptors and the associated descriptor
implementations has likely contributed to the poor quality of the
supplemental data for published QSAR models and has made it an
often impossible task to reproduce the construction of the data set
and hence the entire analysis.
Fig. 5. A screenshot from Bioclipse with a script calling the MetaPrint2D method and outputting the predicted likelihood for
the SOM per atom.
QSAR-ML (69) is a new, open XML-based file format for exchanging
QSAR data sets that builds on the Blue Obelisk descriptor ontology (70).
The ontology provides an extensible manner of uniquely defining
descriptors for use in QSAR experiments. The exchange format supports
multiple versioned implementations of these descriptors. As a result,
the setup of a data set described in QSAR-ML can be reproduced exactly
by others. A Bioclipse plug-in is available for working with
Fig. 6. A screenshot from Bioclipse showing the QSAR plug-ins working graphically with QSAR datasets and complying with
the QSAR-ML standard for interoperable QSAR datasets.
Fig. 7. A schematic indicating the release of ADME/Tox models from commercial descriptors and algorithms, and how it
may facilitate data sharing.
same goal for QSAR studies but in order to meet this demand a
database for sharing QSAR-ML-compliant experiments must first
be developed.
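The central idea of this section, that each descriptor in a data set is
identified by an ontology entry together with a specific, versioned
implementation, can be illustrated with a short sketch. The element and
attribute names below are invented for the example and are not the QSAR-ML
schema; the ontology identifier, provider name, and version string are
likewise placeholders.

```python
import xml.etree.ElementTree as ET

def build_dataset_xml(structures, descriptors):
    """Serialize structures plus fully identified descriptors to XML.

    The layout is hypothetical; it only shows what has to be recorded so
    that another group can recompute the same descriptor values.
    """
    root = ET.Element("qsarDataset")
    desc_list = ET.SubElement(root, "descriptors")
    for d in descriptors:
        el = ET.SubElement(desc_list, "descriptor",
                           {"ontologyId": d["ontology_id"]})
        # The implementation and its version pin down the exact algorithm.
        ET.SubElement(el, "implementation",
                      {"provider": d["provider"], "version": d["version"]})
    struct_list = ET.SubElement(root, "structures")
    for s in structures:
        ET.SubElement(struct_list, "structure",
                      {"id": s["id"], "smiles": s["smiles"]})
    return ET.tostring(root, encoding="unicode")

xml_text = build_dataset_xml(
    structures=[{"id": "mol1", "smiles": "CCO"}],
    descriptors=[{"ontology_id": "example:atomCount",   # placeholder ID
                  "provider": "exampleToolkit",         # placeholder name
                  "version": "1.0.0"}],
)
print(xml_text)
```

Recording the implementation version alongside the ontology identifier is
what allows two laboratories using different toolkits to detect when their
descriptor values are not directly comparable.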
6. Conclusions
Acknowledgments
References
1. Helma C (ed) (2005) Predictive toxicology. Taylor and Francis, Boca Raton
2. Cronin MTD, Livingstone DJ (2004) Predicting chemical toxicity and fate. CRC, Boca Raton
3. Ekins S (2007) Computational toxicology: risk assessment for pharmaceutical and environmental chemicals. Wiley, Hoboken
4. Ekins S, Boulanger B, Swaan PW, Hupcey MAZ (2002) Towards a new age of virtual ADME/TOX and multidimensional drug discovery. J Comput Aided Mol Des 16:381–401
5. Voutchkova AM, Osimitz TG, Anastas PT (2010) Toward a comprehensive molecular design framework for reduced hazard. Chem Rev 110:5845–5882
6. Ekins S, Giroux C (2006) Computers and systems biology for pharmaceutical research and development. In: Ekins S (ed) Computer applications in pharmaceutical research and development. John Wiley, Hoboken, pp 139–165
7. Ekins S, Bugrim A, Brovold L, Kirillov E, Nikolsky Y, Rakhmatulin EA, Sorokina S, Ryabov A, Serebryiskaya T, Melnikov A, Metz J, Nikolskaya T (2006) Algorithms for network analysis in systems-ADME/Tox using the MetaCore and MetaDrug platforms. Xenobiotica 36(10–11):877–901
8. Ekins S (2006) Systems-ADME/Tox: resources and network approaches. J Pharmacol Toxicol Methods 53:38–66
9. Nikolsky Y, Ekins S, Nikolskaya T, Bugrim A (2005) A novel method for generation of signature networks as biomarkers from complex high throughput data. Toxicol Lett 158:20–29
10. Ekins S, Nikolsky Y, Nikolskaya T (2005) Techniques: application of systems biology to absorption, distribution, metabolism, excretion, and toxicity. Trends Pharmacol Sci 26:202–209
11. Ekins S, Williams AJ, Xu JJ (2010) A predictive ligand-based Bayesian model for human drug induced liver injury. Drug Metab Dispos 38:2302–2308
12. Zientek M, Stoner C, Ayscue R, Klug-McLeod J, Jiang Y, West M, Collins C, Ekins S (2010) Integrated in silico-in vitro strategy for addressing cytochrome P450 3A4 time-dependent inhibition. Chem Res Toxicol 23:664–676
13. Langdon SR, Mulgrew J, Paolini GV, van Hoorn WP (2010) Predicting cytotoxicity from heterogeneous data sources with Bayesian learning. J Cheminform 2:11
14. Clark RD, Wolohan PR, Hodgkin EE, Kelly JH, Sussman NL (2004) Modelling in vitro hepatotoxicity using molecular interaction fields and SIMCA. J Mol Graph Model 22:487–497
15. Cheng A, Dixon SL (2003) In silico models for the prediction of dose-dependent human hepatotoxicity. J Comput Aided Mol Des 17:811–823
16. Ung CY, Li H, Yap CW, Chen YZ (2007) In silico prediction of pregnane X receptor activators by machine learning approaches. Mol Pharmacol 71:158–168
17. Marechal JD, Yu J, Brown S, Kapelioukh I, Rankin EM, Wolf CR, Roberts GC, Paine MJ, Sutcliffe MJ (2006) In silico and in vitro screening for inhibition of cytochrome P450 CYP3A4 by co-medications commonly used by patients with cancer. Drug Metab Dispos 34:534–538
18. Ekins S, Waller CL, Swaan PW, Cruciani G, Wrighton SA, Wikel JH (2000) Progress in predicting human ADME parameters in silico. J Pharmacol Toxicol Methods 44:251–272
19. Boelsterli UA, Ho HK, Zhou S, Leow KY (2006) Bioactivation and hepatotoxicity of nitroaromatic drugs. Curr Drug Metab 7:715–727
20. Kassahun K, Pearson PG, Tang W, McIntosh I, Leung K, Elmore C, Dean D, Wang R, Doss G, Baillie TA (2001) Studies on the metabolism of troglitazone to reactive intermediates in vitro and in vivo. Evidence for novel biotransformation pathways involving quinone methide formation and thiazolidinedione ring scission. Chem Res Toxicol 14:62–70
21. Walgren JL, Mitchell MD, Thompson DC (2005) Role of metabolism in drug-induced idiosyncratic hepatotoxicity. Crit Rev Toxicol 35:325–361
22. Park BK, Kitteringham NR, Maggs JL, Pirmohamed M, Williams DP (2005) The role of metabolic activation in drug-induced hepatotoxicity. Annu Rev Pharmacol Toxicol 45:177–202
23. Schuster D, Laggner C, Langer T (2005) Why drugs fail: a study on side effects in new chemical entities. Curr Pharm Des 11:3545–3559
24. Xu JJ, Henstock PV, Dunn MC, Smith AR, Chabot JR, de Graaf D (2008) Cellular imaging predictions of clinical drug-induced liver injury. Toxicol Sci 105:97–105
25. Xia XY, Maliski EG, Gallant P, Rogers D (2004) Classification of kinase inhibitors using a Bayesian model. J Med Chem 47:4463–4470
26. Bender A (2005) Studies on molecular similarity. Ph.D. Thesis, University of Cambridge, Cambridge
27. Williams AJ, Ekins S (2012) A quality alert for chemistry databases. Towards a gold standard: regarding quality in public domain chemistry databases and approaches to improving the situation. Drug Discov Today 17(13–14):685–701. Submitted for publication
28. Judson R (2010) Public databases supporting computational toxicology. J Toxicol Environ Health 13:218–231
29. Williams AJ, Tkachenko V, Lipinski C, Tropsha A, Ekins S (2009) Free online resources enabling crowd-sourced drug discovery. Drug Discov World 10(Winter):33–38
30. Richard AM, Williams CR (2002) Distributed structure-searchable toxicity (DSSTox) public database network: a proposal. Mutat Res 499:27–52
31. Judson R, Richard A, Dix D, Houck K, Elloumi F, Martin M, Cathey T, Transue TR, Spencer R, Wolf M (2008) ACToR: aggregated computational toxicology resource. Toxicol Appl Pharmacol 233:7–13
32. Overington J (2009) ChEMBL. An interview with John Overington, team leader, chemogenomics at the European Bioinformatics Institute Outstation of the European Molecular Biology Laboratory (EMBL-EBI). Interview by Wendy A. Warr. J Comput Aided Mol Des 23:195–198
33. Richard AM (2006) DSSTox web site launch: improving public access to databases for building structure-toxicity prediction models. Preclinica 2:103–108
34. Kortagere S, Krasowski MD, Reschly EJ, Venkatesh M, Mani S, Ekins S (2010) Evaluation of computational docking to identify pregnane receptor agonists in the ToxCast™ database. Environ Health Perspect 118:1412–1417
35. Sanderson K (2011) It's not easy being green. Nature 469:18–20
36. Carroll JJ, Klyne G (2004) Resource description framework (RDF): concepts and abstract syntax. Tech rep, W3C
37. Prud'hommeaux E, Seaborne A (2008) SPARQL query language for RDF, W3C recommendation
38. Willighagen EL, Alvarsson J, Andersson A, Eklund M, Lampa S, Lapins M, Spjuth O, Wikberg J (2011) Linking the resource description framework to cheminformatics and proteochemometrics. J Biomed Semantics 2(Suppl 1):S1–S6
39. Chen B, Dong X, Jiao D, Wang H, Zhu Q, Ding Y, Wild DJ (2010) Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data. BMC Bioinformatics 11:255
40. Ansell P (2011) Model and prototype for querying multiple linked scientific datasets. Future Generat Comput Syst 27:329–333
41. Belleau F, Nolin MA, Tourigny N, Rigault P, Morissette J (2008) Bio2RDF: towards a mashup to build bioinformatics knowledge systems. J Biomed Inform 41:706–716
42. Prud'hommeaux E (2007) Case study: FeDeRate for drug research. Tech Rep: 4–7
43. Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Bryant SH (2009) PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res 37:W623–W633
44. Crumb WJ Jr, Ekins S, Sarazan D, Wikel JH, Wrighton SA, Carlson C, Beasley CM (2006) Effects of antipsychotic drugs on Ito, INa, Isus, IK1, and hERG: QT prolongation, structure activity relationship, and network analysis. Pharm Res 23:1133–1143
45. Su BH, Shen MY, Esposito EX, Hopfinger AJ, Tseng YJ (2010) In silico binary classification QSAR models based on 4D-fingerprints and MOE descriptors for prediction of hERG blockage. J Chem Inf Model 50:1304–1318
46. Li Q, Jorgensen FS, Oprea T, Brunak S, Taboureau O (2008) hERG classification model based on a combination of support vector machine method and GRIND descriptors. Mol Pharm 5:117–127
47. Thai KM, Ecker GF (2009) Similarity-based SIBAR descriptors for classification of chemically diverse hERG blockers. Mol Divers 13:321–336
48. Ekins S, Williams AJ, Krasowski MD, Freundlich JS (2011) In silico repositioning of approved drugs for rare and neglected diseases. Drug Discov Today 16(7–8):298–310
49. Strachan RT, Ferrara G, Roth BL (2006) Screening the receptorome: an efficient approach for drug discovery and target validation. Drug Discov Today 11:708–716
50. O'Connor KA, Roth BL (2005) Finding new tricks for old drugs: an efficient route for public-sector drug discovery. Nat Rev Drug Discov 4:1005–1014
51. Roth BL, Lopez E, Beischel S, Westkaemper RB, Evans JM (2004) Screening the receptorome to discover the molecular targets for plant-derived psychoactive compounds: a novel approach for CNS drug discovery. Pharmacol Ther 102:99–110
52. Keiser MJ, Setola V, Irwin JJ, Laggner C, Abbas AI, Hufeisen SJ, Jensen NH, Kuijer MB, Matos RC, Tran TB, Whaley R, Glennon RA, Hert J, Thomas KL, Edwards DD, Shoichet BK, Roth BL (2009) Predicting new molecular targets for known drugs. Nature 462:175–181
53. Setola V, Dukat M, Glennon RA, Roth BL (2005) Molecular determinants for the interaction of the valvulopathic anorexigen norfenfluramine with the 5-HT2B receptor. Mol Pharmacol 68:20–33
54. Rothman RB, Baumann MH, Savage JE, Rauser L, McBride A, Hufeisen SJ, Roth BL (2000) Evidence for possible involvement of 5-HT(2B) receptors in the cardiac valvulopathy associated with fenfluramine and other serotonergic medications. Circulation 102:2836–2841
55. Chekmarev DS, Kholodovych V, Balakin KV, Ivanenkov Y, Ekins S, Welsh WJ (2008) Shape signatures: new descriptors for predicting cardiotoxicity in silico. Chem Res Toxicol 21:1304–1314
56. Zhu H, Tropsha A, Fourches D, Varnek A, Papa E, Gramatica P, Oberg T, Dao P, Cherkasov A, Tetko IV (2008) Combinatorial QSAR modeling of chemical toxicants tested against Tetrahymena pyriformis. J Chem Inf Model 48:766–784
57. Ekins S, Williams AJ (2010) Precompetitive preclinical ADME/Tox data: set it free on the web to facilitate computational model building to assist drug development. Lab Chip 10:13–22
58. Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, Gautam B, Hassanali M (2008) DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res 36:D901–D906
59. Hardy B, Douglas N, Helma C, Rautenberg M, Jeliazkova N, Jeliazkov V, Nikolova I, Benigni R, Tcheremenskaia O, Kramer S, Girschick T, Buchwald F, Wicker J, Karwath A, Gutlein M, Maunz A, Sarimveis H, Melagraki G, Afantitis A, Sopasakis P, Gallagher D, Poroikov V, Filimonov D, Zakharov A, Lagunin A, Gloriozova T, Novikov S, Skvortsova N, Druzhilovsky D, Chawla S, Ghosh I, Ray S, Patel H, Escher S (2010) Collaborative development of predictive toxicology applications. J Cheminform 2:7
60. Spjuth O, Alvarsson J, Berg A, Eklund M, Kuhn S, Masak C, Torrance G, Wagener J, Willighagen EL, Steinbeck C, Wikberg JE
Molecular Dynamics
Xiaolin Cheng and Ivaylo Ivanov
Abstract
Molecular dynamics (MD) simulation holds the promise of revealing the mechanisms of biological
processes in their ultimate detail. It is carried out by computing the interaction forces acting on each
atom and then propagating the velocities and positions of the atoms by numerical integration of Newton's
equations of motion. In this review, we present an overview of how MD simulations can be conducted to
address computational toxicity problems. The case studies cover a standard MD simulation performed
to investigate the overall flexibility of a cytochrome P450 (CYP) enzyme and a set of more advanced MD
simulations to examine the barrier to ion conduction in a human α7 nicotinic acetylcholine receptor
(nAChR).
Key words: Molecular dynamics, Force field, Toxicity, Free energy, Enhanced sampling
1. Introduction
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_11, © Springer Science+Business Media, LLC 2012
244 X. Cheng and I. Ivanov
MD, which shed light on the underlying mechanisms that would
otherwise be difficult to obtain from experiments. The second way
is to use simulation simply as a means of sampling. Then, in a
statistical mechanics framework, a variety of equilibrium and kinetic
properties of the systems can be derived and compared with
experiments. Therefore, MD simulation not only allows direct
visualization of the dynamics of biomolecules but also helps elucidate the
underlying molecular mechanisms (driving forces) of the observed
behavior. Since the 1970s, the method of MD simulation has
gained popularity in biochemistry and biophysics (21–24). As
simulation capability has increased in complexity and scale, MD has
been widely used as a computational microscope to investigate the
molecular details of many complex biological processes, with
applications to protein folding (25), enzymatic catalysis (26, 27),
molecular machines (28), etc. More recently, MD simulation has also
been used to address drug toxicity problems in which the dynamic
nature of proteins is essential.
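As a minimal illustration of what "numerical integration of Newton's
equations of motion" means in practice, the sketch below propagates a single
particle on a one-dimensional harmonic spring with the velocity Verlet
scheme, a commonly used MD integrator. The system, the unit-free parameters,
and the time step are chosen for the example only; production MD codes apply
the same update to every atom using forces from a molecular force field.

```python
import math

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Propagate position x and velocity v; returns the list of (x, v) states."""
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # forces at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append((x, v))
    return trajectory

K, MASS, DT = 1.0, 1.0, 0.01               # toy spring constant, mass, time step

def total_energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * K * x * x

trajectory = velocity_verlet(x=1.0, v=0.0, force=lambda x: -K * x,
                             mass=MASS, dt=DT, n_steps=1000)

x_end, v_end = trajectory[-1]
print("final position:", x_end, "analytic:", math.cos(1000 * DT))
print("energy drift:", total_energy(x_end, v_end) - total_energy(1.0, 0.0))
```

The near-constant total energy is the property that makes this family of
integrators suitable for long simulations used as a means of sampling.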
are able to bind to hERG and then block the ion flow through the
channel. MD simulations can be particularly valuable for
membrane-bound hERG potassium channels as experimental char-
acterization of their structural dynamics is very challenging.
1.3. How, When, and by Whom These Techniques or Tools Are Used in Practice
1.3.1. Cytochrome P450 Enzymes
MD simulations have been extensively employed to investigate the
conformational dynamics of cytochrome P450 enzymes, which is
thought to play an important role in ligand binding and catalysis.
Meharenna et al. have performed high-temperature MD simulations
to probe the structural basis for enhanced stability in thermally
stable cytochrome P450. The comparison of the MD trajectories at
500 K suggests that the tight nonpolar interactions involving Tyr26
and Leu308 in the Cys ligand loop are responsible for the enhanced
stability in CYP119, the most thermally stable P450 known (48).
Using MD simulations at normal and high temperatures, Skopalík
et al. have studied the flexibility and malleability of three
microsomal cytochromes: CYP3A4, CYP2C9, and CYP2A6. MD simulations
reveal flexibility differences between these three cytochromes,
which appear to correlate with their substrate preferences (49).
Hendrychová et al. have employed MD simulations and
spectroscopy experiments to probe the flexibility and malleability of
five forms of human liver CYP enzymes, and have demonstrated
consistently from different techniques that CYP2A6 and CYP1A2
have the least malleable active sites while CYP2D6, CYP2C9, and
CYP3A4 exhibit considerably greater degrees of flexibility (50).
Lampe et al. have utilized MD simulations in conjunction with
two-dimensional heteronuclear single quantum coherence NMR
spectroscopy to examine substrate and inhibitor binding to
CYP119, a P450 from Sulfolobus acidocaldarius. Their results
suggest that tightly binding hydrophobic ligands tend to lock the
enzyme into a single conformational substate, whereas weakly
binding low-affinity ligands bind loosely in the active site, resulting
in a distribution of localized conformers. Their MD simulation
results further show that the ligand-free enzyme samples
ligand-bound conformations of the enzyme, thus suggesting that ligand
binding proceeds through conformational selection rather than
induced fit (51). By means of MD simulations, Park et al. have
examined the differences in structural and dynamic properties
between CYP3A4 in the resting form and its complexes with the
substrate progesterone and the inhibitor metyrapone (52).
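Flexibility differences of the kind discussed above are often summarized from
a trajectory as a per-atom root-mean-square fluctuation (RMSF). The sketch
below shows that calculation on a two-frame toy trajectory; it is a generic
illustration, not the analysis used in the cited studies, and it assumes the
frames have already been superimposed on a common reference so that overall
rotation and translation do not contribute.

```python
import math

def rmsf(trajectory):
    """Per-atom RMSF from a list of frames.

    Each frame is a list of (x, y, z) tuples, one per atom, and all frames
    are assumed to be pre-aligned to the same reference structure.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i
        mean = [sum(frame[i][d] for frame in trajectory) / n_frames
                for d in range(3)]
        # Mean squared deviation of atom i from its average position
        msd = sum(sum((frame[i][d] - mean[d]) ** 2 for d in range(3))
                  for frame in trajectory) / n_frames
        result.append(math.sqrt(msd))
    return result

# Toy trajectory: atom 0 is rigid, atom 1 swings between x = +1 and x = -1.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
]
fluctuations = rmsf(toy_trajectory)
print(fluctuations)
```

Comparing such profiles residue by residue is one simple way to contrast a
rigid active site with a more malleable one.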
The dynamics of the substrate binding site has also been the
subject of extensive MD studies because of its crucial role in
understanding the specificity and selectivity of the enzyme towards
different substrates. Diazepam is metabolized by CYP3A4 with
sigmoidal kinetics, which has been speculated to be
caused by the cooperative binding of two substrates in the active
site. Fishelovitch et al. have performed MD simulations of the
substrate-free CYP3A4 and the enzymes with one and two
2. Materials
2.1. Common Software and Methods Used in the Field
Many MD packages have been developed over the years, including
CHARMM (63), AMBER (64), GROMOS (65), GROMACS
(66), TINKER (67), MOLDY (68), DL_POLY (69), NAMD
(70), LAMMPS (71), and Desmond (72). Some of them have
their associated force fields, while others only provide an MD
engine and require compatible force fields for running a simulation.
For biomolecular simulation, CHARMM, AMBER, GROMOS,
NAMD, and GROMACS are most widely used. CHARMM and
AMBER have enjoyed the longest history of continuous
development and offer a wide range of functionalities for advanced
2.2. Special Requirements for the Software and Methods (e.g., Hardware, Computing Platform, and Operating System)
Most MD programs run under Unix/Linux-like operating
systems. As MD simulations of biomolecules are computationally
intensive, production runs are often performed on high-end parallel
platforms or commodity clusters of many processors. Recently,
with the emergence of special-purpose processors such as
field-programmable gate arrays (FPGAs) and graphics processing units
(GPUs) designed to speed up computing-intensive portions of
applications, several MD codes have been adapted and ported to
run on these platforms as well (77–79). Fortran or C/C++ compilers
are required since most MD codes are written in Fortran or C/C++.
Special parallel programming libraries are also required to run
MD in parallel on multiple CPU cores or multiple nodes on a
network, e.g., the MPI library for distributed memory systems, and
POSIX threads and OpenMP for shared memory systems. Most MD
codes use MPI, a message-passing application programming interface,
while NAMD uses Charm++ parallel objects for good performance
on a wide variety of underlying hardware platforms. FFTW, a C
subroutine library for computing the discrete Fourier transform, is
extensively employed by state-of-the-art MD codes for treating
long-range electrostatic interactions with the particle mesh Ewald (PME)
(80) or particle–particle particle–mesh (P3M) (81) methods.
Precompiled libraries, e.g., special math libraries or scripting language
libraries, may also be required by some MD programs, such as the Tcl
library used in both NAMD (70) and VMD (82).
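The reason a Fourier transform library appears in this list is the Ewald
splitting that underlies PME and P3M: the Coulomb kernel 1/r is written as a
rapidly decaying short-range part, summed directly within a cutoff, plus a
smooth long-range part that is evaluated on a grid in reciprocal space. The
sketch below checks this identity numerically; the splitting parameter
`beta` and the distances are arbitrary values chosen for illustration.

```python
import math

def ewald_split(r, beta):
    """Split 1/r into short-range and long-range parts.

    short = erfc(beta * r) / r decays quickly and is summed in real space;
    long  = erf(beta * r) / r is smooth and suited to a Fourier-space sum.
    """
    short = math.erfc(beta * r) / r
    long_range = math.erf(beta * r) / r
    return short, long_range

BETA = 0.35  # illustrative splitting parameter (inverse length units)

for r in (1.0, 5.0, 10.0):
    short, long_range = ewald_split(r, BETA)
    print(f"r={r:5.1f}  short={short:.3e}  long={long_range:.3e}  "
          f"sum={short + long_range:.6f}  1/r={1.0 / r:.6f}")
```

Because the short-range term is negligible beyond a modest cutoff, only the
smooth term needs the grid-based treatment for which FFTW is used.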
2.3. Preferred Software and Why
Several MD codes are currently used by the biomolecular simulation
community, and each has its strengths and weaknesses. The choice
of MD software for a simulation depends on many
factors, such as the force field compatibility, the speed/scalability,
the support for simulation setup and post-simulation analysis, and
11 Molecular Dynamics 251
3. Methods
3.1. Molecule Building
3.1.1. Initial Coordinates
An MD simulation starts with a 3D structure as the initial
configuration of the system. This structure can be an NMR or X-ray structure
obtained from the Brookhaven Protein Databank (PDB;
http://www.rcsb.org/pdb/). If no experimentally determined structure is
available, an atomic-resolution model of the "target" protein can be
constructed from its amino acid sequence by homology modeling.
Homology modeling can produce high-quality structural models
when the target and the template, a homologous protein with an
experimentally determined structure, are closely related. The
choice of an initial configuration must be made carefully as it
can influence the results of the simulation. When multiple PDB
entries are available, which structure to choose usually depends on
the quality of the structure, the state in which the structure was
captured, and the experimental conditions under which the structure
was determined. It is important to choose a configuration in a state
that best represents what one wishes to simulate.
3.1.2. Prune Protein Structure
With a 3D structure in hand, a few things still need to be sorted out
before a simulation can start. (1) Remove redundant atoms: X-ray
structures may be captured in a multimeric state; NMR may yield an
ensemble of conformations; multiple conformations may exist for some
flexible side chains; extra chemical agents may have been added to
facilitate the structure determination. All these redundant atoms
should be removed prior to further structure processing. (2) Add
missing atoms: depending on the quality of a PDB structure, some
coordinates may be missing. It is important to check whether the
missing coordinates are relevant to the question to be addressed.
First, for proteins whose active form is multimeric, it is necessary
to construct the multimer structures from
252 X. Cheng and I. Ivanov
3.1.3. Molecular Structure and/or Topology File
Given a refined PDB structure of the protein or protein complex, the
next step is to generate the topology or parameter files for the
system. The topology file contains the geometrical information of the
system, e.g., bonds, angles, dihedral angles, and the interaction
list. Sometimes topology files are combined with parameter files and
thus may also contain the force field parameters, i.e., the
functional forms and parameter sets used to describe the potential
energy of the system. Various force fields have been developed for
different types of biomacromolecules, including proteins, nucleic
acids, lipids, and carbohydrates. The choice of an appropriate force
field is of substantial importance and will depend on the nature of
the system (problem) of interest. In general, the chosen force field
should be compatible with the MD engine, and the force fields for
different components of the system should be consistent with each
other. Most MD programs provide auxiliary utility programs for
generating topology and parameter files from PDB files. The procedure
is straightforward except for a few potentially confusing items; for
instance, some special treatments (or patches) may be
3.1.4. Solvate the System
To simulate a biological system in an aqueous solution, a choice
should be made between explicit and implicit solvent models. However,
only explicit solvent models (such as the TIP3P and SPC/E water
models) will be discussed in this review. It is advisable to keep
crystal waters, especially those located in the active site or the
interior of the protein, which often play a structural or functional
role. When necessary, additional water molecules can be placed inside
or around the protein using programs such as DOWSER
(http://hekto.med.unc.edu:8080/HERMANS/software/DOWSER/). The system
is then solvated with a pre-equilibrated water box. Ions (usually
Na+, K+, Cl−) are added to neutralize the system and to reach a
desired ionic concentration. For membrane-associated systems,
proteins need to be inserted into a pre-equilibrated lipid bilayer.
The orientation and position of proteins in the membrane can be
determined with the online server OPM (http://opm.phar.umich.edu/),
together with experiments and the modeler’s intuition. The lipid
composition is another issue worthy of consideration, as accumulating
evidence has shown that it can have a significant and differential
impact on the function of membrane-bound proteins. The CHARMM force
field supports six types of lipids:
1,2-dipalmitoyl-sn-phosphatidylcholine (DPPC),
1,2-dimyristoyl-sn-phosphatidylcholine (DMPC),
1,2-dilauroyl-sn-phosphatidylcholine (DLPC),
1-palmitoyl-2-oleoyl-sn-phosphatidylcholine (POPC),
1,2-dioleoyl-sn-phosphatidylcholine (DOPC), and
1-palmitoyl-2-oleoyl-sn-phosphatidylethanolamine (POPE), while VMD
provides two types of pre-equilibrated membrane patches, POPC and
POPE. After everything is assembled, the new structure/topology files
can be built, followed by several rounds of energy minimization to
remove bad van der Waals contacts.
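The number of ions needed to neutralize a system and reach a target
salt concentration can be estimated from the box volume. The sketch
below is illustrative only: the function name ions_for_concentration
is hypothetical, monovalent ions are assumed, and the full box volume
is used rather than the solvent-accessible volume, so setup tools may
report somewhat different counts. The box dimensions in the usage
note are the cell basis vectors listed in Box 4.

```python
AVOGADRO = 6.02214076e23            # particles per mole
LITERS_PER_CUBIC_ANGSTROM = 1e-27   # 1 Å^3 = 1e-30 m^3 = 1e-27 L


def ions_for_concentration(box_angstrom, molar, net_charge=0):
    """Estimate (cations, anions) for a rectangular box.

    box_angstrom: (lx, ly, lz) box edge lengths in Å.
    molar: target salt concentration in mol/L.
    net_charge: net charge of the solute in units of e; extra
        counter-ions are added so that the total charge is zero.
    """
    lx, ly, lz = box_angstrom
    volume_liters = lx * ly * lz * LITERS_PER_CUBIC_ANGSTROM
    # Number of cation/anion pairs that gives the target concentration.
    pairs = round(molar * volume_liters * AVOGADRO)
    cations = pairs + max(0, -net_charge)   # neutralize a negative solute
    anions = pairs + max(0, net_charge)     # neutralize a positive solute
    return cations, anions
```

For a 64.70 × 93.59 × 82.47 Å box at 0.1 M, this estimate gives about
30 ion pairs, plus the counter-ions needed to cancel any net solute
charge.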
and for many systems similar to those contained in this thesis, the
validity of the hypothesis has been confirmed. The ergodic hypoth-
esis provides the rationale for the molecular dynamics method and a
practical recipe that allows ensemble averages to be determined
from time averaging over dynamical trajectories.
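The practical content of the ergodic hypothesis can be illustrated
with a toy model. The sketch below is not an MD simulation: it
assumes a hypothetical two-state system (states A and B with fixed
per-step switching probabilities) and compares the fraction of time
a single long trajectory spends in state A with the stationary
(ensemble) probability of A. All names and parameter values are
illustrative.

```python
import random


def time_average_occupancy(p_ab, p_ba, n_steps, seed=0):
    """Fraction of steps one long trajectory spends in state A."""
    rng = random.Random(seed)
    state = "A"
    visits_a = 0
    for _ in range(n_steps):
        if state == "A":
            if rng.random() < p_ab:   # A -> B transition
                state = "B"
        else:
            if rng.random() < p_ba:   # B -> A transition
                state = "A"
        visits_a += state == "A"
    return visits_a / n_steps


def ensemble_occupancy(p_ab, p_ba):
    """Stationary probability of state A from detailed balance."""
    return p_ba / (p_ab + p_ba)
```

With p_ab = 0.1 and p_ba = 0.3, the ensemble probability of A is 0.75,
and the time average over a few hundred thousand steps approaches
this value. A trajectory that is too short to cross between the
states many times would not.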
3.2.2. Interaction Treatment and Integration Method
How an MD simulation will be run is controlled by a set of input
parameters contained in a configuration file, such as the number of
steps and the temperature. The main options and values can be divided
into three categories: (1) interaction energy treatment; (2)
integration method; (3) ensemble specification. Additional sets of
parameters may be used by advanced simulation techniques, e.g.,
enhanced sampling and free energy simulations. Most explicit solvent
MD simulations employ periodic boundary conditions to avoid boundary
artifacts. The most time-consuming part of a simulation is the
calculation of the nonbonded terms in the potential energy function,
e.g., the electrostatic and van der Waals forces. In principle, the
nonbonded energy terms between every pair of atoms should be
evaluated; in this case, the number of operations increases as the
square of the number of atoms for a pairwise model (N²). To speed up
the computation, the nonbonded interactions, e.g., the electrostatic
and van der Waals forces, are truncated if two atoms are separated by
more than a predefined cutoff distance. The long-ranged electrostatic
interactions are typically treated with FFT-based PME (80) or
particle–particle particle–mesh (P3M) (81) methods, which reduce the
computational complexity from N² to N log N. MD simulation involves
the numerical integration of Newton’s equations of motion in finite
time steps that must be small enough to avoid discretization errors.
Typical time steps used in MD are on the order of 1 fs (i.e., shorter
than the period of the fastest vibrations in biomolecular systems).
This value may be increased by using constraint algorithms such as
SHAKE (86), which freeze the fastest vibrations (e.g., those of bonds
involving hydrogens). Multiple-time-step methods are also available,
which allow for extended times between updates of slowly varying
long-range forces (87). The total simulation duration should be
chosen to be long enough to reach biologically relevant time scales
or allow sufficient sampling (barrier crossing), and should also
account for the available computational resources so that the
calculation can finish within a reasonable wall-clock time.
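The link between the time step and discretization error can be seen
in a minimal example. The sketch below assumes a one-dimensional
harmonic "bond" in reduced units (force constant and mass equal to
one) and integrates it with the velocity Verlet scheme, a common MD
integrator; it is not taken from any particular MD package, and the
function names are illustrative. It reports the largest relative
deviation of the total energy along the trajectory for a given time
step.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance position x and velocity v by n_steps of size dt."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
    return x, v


def harmonic_energy_error(dt, total_time=50.0, k=1.0, mass=1.0):
    """Largest relative deviation of the total energy for time step dt."""
    def force(x):
        return -k * x

    x, v = 1.0, 0.0
    e0 = 0.5 * mass * v * v + 0.5 * k * x * x
    worst = 0.0
    for _ in range(int(round(total_time / dt))):
        x, v = velocity_verlet(x, v, force, mass, dt, 1)
        e = 0.5 * mass * v * v + 0.5 * k * x * x
        worst = max(worst, abs(e - e0) / e0)
    return worst
```

In this toy system the energy error grows roughly with the square of
the time step, which is why the step must stay well below the period
of the fastest motion in the system.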
3.2.3. Temperature and Pressure Control
MD simulation is often performed in one of the following three
thermodynamic ensembles: microcanonical (NVE), canonical (NVT), and
isothermal–isobaric (NPT). In the NVE ensemble, the number of
particles (N), the volume (V), and the total energy (E) of the system
are held constant. In the canonical ensemble, N, V, and the
temperature (T) are constant, where the temperature is maintained
through a thermostat. In the NPT ensemble that corresponds
3.4. Validating the Results
The simulation results should be validated at two levels: first, to
assess whether the simulation was conducted properly; second, to
assess whether the model underlying the simulation sufficiently
describes the problem to be probed. A variety of MD outputs can
provide hints about whether the simulation was conducted properly,
including the time-dependent thermodynamic quantities (i.e.,
temperature, pressure, and volume), their fluctuations, and the
distribution of velocities in the system. For example, one would
expect the total energy to be conserved in an NVE ensemble
simulation, while any significant energy drift indicates a possible
problem with either the integration algorithm or the interaction
force evaluation. Structural features of the system can be validated
by visualization to rule out any unphysical (inappropriate) changes,
contacts, or assembly; computer programs, e.g., Verify3D
(http://nihserver.mbi.ucla.edu/Verify_3D/), Procheck phi/psi angle
check (http://www.ebi.ac.uk/thornton-srv/software/PROCHECK/),
WHAT_CHECK Packing 2 (http://swift.cmbi.ru.nl/gv/whatcheck/),
Prosa2003 (http://www.came.sbg.ac.at/prosa_details.php), ModFOLD
(http://www.reading.ac.uk/
3.7. Improving the Model
MD simulations suffer from several drawbacks, which are due to the
empirical force fields, the limited simulation length and size, and
the way the simulation models are built. The classical mechanical
force field functions and parameters are derived from both
experimental work and high-level quantum mechanical calculations
based on numerous approximations (103, 104). Limitations in
current force fields, such as inaccurate conformational preferences
for small proteins and peptides in aqueous solutions have been
known for years, which have led to a number of attempts to
improve these parameters. Recent re-parameterization of the dihe-
dral terms in AMBER (105) and the so-called CMAP correction in
CHARMM (106) have significantly improved the accuracy of
empirical force fields for protein secondary structure predictions.
Moreover, many existing force fields are based on fixed charge models
that do not account for electronic polarization of the environment,
although more sophisticated and expensive polarizable models have
been shown to be necessary for an accurate description of some
molecular properties. A few polarizable force field models have been
developed over the past several years (107–110). Recent systematic
validations of the AMOEBA force field have shown significant
improvements over fixed charge models for a variety of structural,
thermodynamic, and dynamic properties (108). An increasing use of
this next generation of force field models in biomolecular
simulations is anticipated within the next few years. Furthermore,
classical force fields are based on the assumption that quantum effects
play a minor role or can be separated from classical Newtonian
dynamics. A proper description of charge/electron transfer process
or chemical bond breaking/forming requires quantum treatment
that can be incorporated into simulations in different ways.
Another way of improving the model is to extend the time
scales spanned by MD simulations. Currently attainable time scales
are still about 3–4 orders of magnitude shorter than most biologi-
cally relevant ones. Methodologically, two general ways exist to
extend the time scale of an MD simulation: to make each integra-
tion step faster (mainly) through parallelization, or to improve the
exploration of phase space via enhanced sampling techniques.
In recent years, tremendous efforts have been focused on improving
the parallel efficiency of MD codes so that more CPU processors can
be used, which has enabled many μs and even ms MD simulations of
biological systems. One example is the use of the special-purpose
computer Anton to reach ms simulations of an Abl kinase (111), an
NhaA antiporter (112), and a potassium channel (113). As computing
power continues to increase, we expect the next generation of
computer systems to significantly expand our capability to simulate
more complex and realistic systems for longer
times. However, the increase in computing power alone will not be
sufficient, and the development of more efficient and robust
enhanced sampling techniques will also be required to address
many challenging thermodynamic and kinetic problems in biology.
4. Examples
4.1. Cytochrome P450: A Simple Problem
We will first show how to carry out a standard MD simulation of
cytochrome P450 to investigate the overall flexibility of the
protein. Three crystal structures of CYP3A4, unliganded, bound to the
inhibitor metyrapone, and bound to the substrate progesterone, have
been determined (119). Comparison of the three structures revealed
little conformational change associated with the binding of ligand.
It is therefore interesting to investigate whether any dynamical
differences exist among the three structures. Here, we demonstrate
how the protein dynamics can be probed by MD simulations using the
NAMD package (70). The input and configuration files will be prepared
with VMD (82) and TCL scripts.
4.1.1. Building a Structural Model of CYP3A4
Our simulation will start from an X-ray crystal structure of CYP3A4,
which is an unliganded CYP3A4 soluble domain captured at 2.80 Å
resolution (119). The PDB file of CYP3A4 can be downloaded from the
PDB database (PDB entry 1w0e). Given a PDB structure, the next step
is to generate the PSF and PDB files using VMD and the psfgen plugin.
The script protein.tcl contains the detailed steps for the process,
which can be executed by typing in a Linux terminal,
vmd -dispdev text -e protein.tcl > protein.log
4.1.2. Solvating and Ionizing the System
Now we will use VMD’s Solvate plugin to solvate the protein. Solvate
places the solute in a box of pre-equilibrated waters of a specified
size, and then removes waters that are within a certain cutoff
distance from the solute. After solvation, we will use VMD’s
Autoionize plugin to add ions to neutralize the system, which is
important for Ewald-based long-range electrostatic methods such as
particle mesh Ewald (PME) to work properly. It can also create a
desired ionic concentration. Autoionize works by randomly replacing
water molecules with ions.
vmd -dispdev text -e solvate.tcl > solvate.log
vmd -dispdev text -e ionize.tcl > ionize.log
4.1.3. Running a Simulation of CYP3A4
The solvated system comprises about 40,000 atoms. We will run the
simulations on a local Linux cluster using 32 cores (four nodes with
eight cores each). We will first minimize the system for 2,000 steps
to remove bad contacts. The minimization run can be executed by
submitting to the PBS batch scheduler with the command,
qsub runmin.pbs
The configuration file for minimization is min.namd, also
shown below.
After minimization, an MD simulation starting from the mini-
mized structure will be run by submitting to the PBS batch sched-
uler with the command,
qsub runmd.pbs
The PBS run script runmd.pbs is similar to runmin.pbs. The
configuration file for the MD simulation md.namd is shown below.
4.1.4. Analyze the Results
The root mean square deviation (RMSD) is a frequently used measure of
the differences between the structures sampled during the simulation
and the reference structure. Using a simulation trajectory as input,
RMSD can be calculated with VMD > Extensions > Analysis > RMSD
Trajectory Tool for any selected atoms. Structure alignment will
usually be performed first to remove the translational and rotational
movements. Backbone RMSD as a function of time for CYP3A4 is shown in
Fig. 1. Overall, the CYP3A4 structure appears quite stable, with the
RMSD quickly reaching a plateau of 1.8 Å after about 0.4 ns of
simulation.
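As a reminder of what the RMSD measures, the sketch below computes it
for two sets of matching atom coordinates. It is a simplified
illustration, not the VMD implementation: only the translational
motion is removed (by superimposing the centroids), and the
rotational fit that a full structure alignment performs is omitted.
The function names are hypothetical.

```python
import math


def _centered(coords):
    """Shift coordinates so that their centroid is at the origin."""
    n = len(coords)
    center = [sum(p[k] for p in coords) / n for k in range(3)]
    return [tuple(p[k] - center[k] for k in range(3)) for p in coords]


def rmsd_translation_only(coords, reference):
    """RMSD between two equally ordered atom lists after centroid overlap.

    No rotational fitting is done, so this is an upper bound on the
    RMSD obtained after an optimal rigid-body alignment.
    """
    if not coords or len(coords) != len(reference):
        raise ValueError("both structures need the same, nonzero atom count")
    a = _centered(coords)
    b = _centered(reference)
    total = sum(
        sum((p[k] - q[k]) ** 2 for k in range(3)) for p, q in zip(a, b)
    )
    return math.sqrt(total / len(coords))
```

A rigidly translated copy of the reference gives an RMSD of zero,
whereas any internal distortion gives a positive value.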
The root mean square fluctuation (RMSF) measures the movement of a
subset of atoms with respect to the average structure over the entire
simulation. RMSF indicates the flexibility of different regions of a
protein, which can be related to crystallographic B factors. Figure
2a illustrates the RMSFs of the Cα atoms from the simulation (red
line) in comparison to those (black line)
Fig. 2. RMSFs of the Cα atoms from the MD simulation as compared to
the experimental data (black line), which were calculated from the
B-factors of CYP3A4 (PDB code: 1w0e) using
$\mathrm{RMSF} = \sqrt{(3/8)\,B_{\mathrm{factor}}}\,/\,\pi$. The
computed RMSF values are color-coded onto a cartoon representation of
the protein structure, with red corresponding to the most mobile
region and blue corresponding to the most stable region.
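The two quantities compared in Fig. 2 can be sketched in a few lines.
The first function below applies the B-factor relation given in the
caption; the second computes per-atom RMSFs from a list of trajectory
frames that are assumed to have been aligned already. Both function
names are hypothetical, and the code is an illustration rather than
the analysis script used for the figure.

```python
import math


def rmsf_from_bfactor(b_factor):
    """RMSF in Å from an isotropic B-factor in Å^2: sqrt((3/8) B) / pi."""
    return math.sqrt(3.0 * b_factor / 8.0) / math.pi


def rmsf_from_frames(frames):
    """Per-atom RMSF about the average position.

    frames: list of frames, each a list of (x, y, z) tuples with the
    same atom order; the frames are assumed to be pre-aligned.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Average position of atom i over the trajectory.
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean square fluctuation about that average position.
        msf = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msf))
    return result
```

For example, a B-factor of 8π²/3 Å² (about 26.3 Å²) corresponds to an
RMSF of 1 Å.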
Fig. 3. Structure of the nicotinic acetylcholine receptor with the five subunits highlighted
in different colors; chloride ions are shown in green, sodium ions in yellow, the head
group region of the lipid bilayer in dark blue.
4.2.2. Minimization and Equilibration
Now that we have prepared all of our input files, we can start to run
an MD simulation of the system. However, because the protein
structure is built by homology modeling and the final system consists
of multiple components (protein, membrane, and water), we will use a
more sophisticated equilibration procedure to relax the system. The
entire equilibration protocol consists of the following stages:
minimization with fixed backbone atoms for 2,000 steps; minimization
with restrained Cα atoms for 2,000 steps; Langevin dynamics with
harmonic positional restraints on the Cα atoms for 100,000 steps (to
heat the system to the target temperature); constant pressure
dynamics (the ratio of the unit cell in the x–y plane is fixed) with
decreasing positional restraints on the Cα atoms in five steps. The
equilibration job will be submitted using
qsub runeq.pbs
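The length of this protocol follows directly from the step counts.
The sketch below tabulates the dynamics stages as they appear in
equil.namd (Box 9), i.e., the heating run followed by five
constant-pressure runs with restraint scaling factors of 3.0, 1.0,
0.5, 0.25, and 0, and converts the step counts into simulated time
for the 1 fs time step used there. The variable and function names
are illustrative; the scaling factor of the heating stage is not set
explicitly in the file and is left as None.

```python
# (stage label, restraint scaling factor, number of MD steps),
# transcribed from the run commands in equil.namd (Box 9).
STAGES = [
    ("heat", None, 100_000),
    ("equil_ca1", 3.0, 200_000),
    ("equil_ca2", 1.0, 200_000),
    ("equil_ca3", 0.5, 200_000),
    ("equil_ca4", 0.25, 200_000),
    ("equil_ca5", 0.0, 1_000_000),
]


def total_time_ns(stages, timestep_fs=1.0):
    """Total simulated time in ns for a list of (label, scale, steps)."""
    total_steps = sum(steps for _, _, steps in stages)
    return total_steps * timestep_fs * 1e-6   # 1 fs = 1e-6 ns


def restrained_time_ns(stages, timestep_fs=1.0):
    """Simulated time during which the restraints are still active."""
    active = [s for s in stages if s[1] is None or s[1] > 0.0]
    return total_time_ns(active, timestep_fs)
```

With these values the dynamics part of the equilibration amounts to
1.9 ns, of which the last 1.0 ns is run without positional
restraints.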
4.2.3. Run a Free Energy Simulation Using the Adaptive Biasing Force (ABF) Method
Brute-force simulation of ion translocation through the nAChR channel
is still a daunting task, often requiring specialized computer
hardware. So here we will focus on understanding the intrinsic
properties of the nAChR channel pore by computing the potential of
mean force (PMF), i.e., the free energy profile associated with the
systematic forces experienced by a sodium ion inside the channel. The
adaptive biasing force (ABF) method as implemented in the NAMD
package by Chipot and Henin will be used to construct the PMF (125).
Details of this method have been given elsewhere. Briefly, a reaction
coordinate x has to be selected. The average forces acting along x
are accumulated in bins, providing an estimate of the free energy
derivative as the simulation progresses. Application of the biasing
forces (the negative of the average forces) then allows the system to
diffuse freely along x. In the following simulation, the reaction
coordinate will be chosen as the normal to the bilayer surface (z).
The simulations will be carried out in ten windows of length 5 Å
along this direction, which should be sufficient to cover the entire
length of the transmembrane domain region of nAChR. The PBS
submission script and the corresponding ABF simulation configuration
file for one representative window are given below.
qsub runabf1.pbs
Fig. 4. Potentials of mean force for translocation of Na+ ions (red)
and Cl− ions (blue) in nAChR. Positions of M2 pore-lining residues
are shown with gray lines and labeled at the top of the graphs.
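The bookkeeping behind the ABF estimate can be sketched as follows.
This is a post-processing illustration only, not the NAMD/colvars
implementation: it assumes that samples of the reaction coordinate
and of the instantaneous force along it are already available, bins
them, averages the force in each bin, and integrates the negative
mean force to obtain a free energy profile. The function name and
argument layout are hypothetical, and no biasing force is applied
here.

```python
def pmf_from_force_samples(samples, lower, upper, width):
    """Estimate a PMF from (x, force_along_x) samples.

    Returns (edges, pmf), where pmf[i] is the free energy at edges[i]
    relative to the lower boundary. Bins without samples are treated
    as having zero mean force.
    """
    n_bins = int(round((upper - lower) / width))
    force_sum = [0.0] * n_bins
    count = [0] * n_bins
    for x, force in samples:
        if lower <= x < upper:
            i = min(int((x - lower) / width), n_bins - 1)
            force_sum[i] += force
            count[i] += 1
    edges = [lower + i * width for i in range(n_bins + 1)]
    pmf = [0.0]
    for i in range(n_bins):
        mean_force = force_sum[i] / count[i] if count[i] else 0.0
        # dA/dx = -<F>, integrated across one bin of the given width.
        pmf.append(pmf[-1] - mean_force * width)
    return edges, pmf
```

For force samples drawn from a harmonic well, F = -x, the integrated
profile reproduces the parabola x²/2 up to an additive constant.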
4.2.4. Analyze the Results
Combining the PMFs from all the windows by optimally matching the
overlapping regions of adjacent windows produces the final PMF
displayed in Fig. 4 (red line). We only briefly summarize the main
findings below and refer interested readers to ref. (126) for the
detailed analysis of the PMF along with other calculations. The PMF
for sodium inside the nAChR pore features two distinct areas of ion
stabilization toward the extracellular end, corresponding to two
distinct sets of negatively charged residues, D27′ and E20′. In both
positions a sodium ion is stabilized by ~2 kcal/mol. Multiple ions
can be accommodated at position D27′ due to the large pore radius at
that position of the lumen. The PMF reaches an overall maximum at
z ~ 0 Å. In this region the M2 helices expose primarily hydrophobic
residues toward the interior of the receptor (Leu9′, Val13′, Phe14′,
and Leu16′). This result implicates a hydrophobic nature of the gate.
The effective free energy of sodium in the entire intracellular
region of the pore (z between 0 and 20 Å) remains largely unfavorable
compared to bulk solvent and goes through several minor peaks and
troughs. Overall, the computed PMF provides a detailed thermodynamic
description of a sodium ion inside the nAChR channel, including the
equilibrium sodium distribution and the locations of ion binding
sites and barriers. Moreover, when combined with a macroscopic or
semi-microscopic diffusional theory, the PMF can be used to calculate
the ionic current, which is then directly comparable to
single-channel conductance measurements.
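Because each window yields a PMF that is defined only up to an
additive constant, the windows have to be shifted onto a common
reference before they are joined. The sketch below shows one simple
way of doing this, assuming that adjacent windows share grid points:
each new window is shifted by the mean difference over the shared
points (the least-squares optimal constant shift) and the overlapping
values are averaged. This is an illustration under those assumptions,
not necessarily the matching procedure used to produce Fig. 4, and
the function name is hypothetical.

```python
def stitch_windows(windows):
    """Join per-window PMFs into one profile.

    windows: list of dicts mapping a grid point to a PMF value, ordered
    along the reaction coordinate; adjacent windows must share at
    least one grid point. Returns a sorted list of (point, value).
    """
    combined = dict(windows[0])
    for window in windows[1:]:
        shared = [x for x in window if x in combined]
        if not shared:
            raise ValueError("adjacent windows must overlap")
        # Constant shift that best matches the overlap region.
        offset = sum(combined[x] - window[x] for x in shared) / len(shared)
        for x, value in window.items():
            shifted = value + offset
            if x in combined:
                combined[x] = 0.5 * (combined[x] + shifted)
            else:
                combined[x] = shifted
    return sorted(combined.items())
```

The quality of the match in the overlap regions is itself a useful
convergence check: large residual differences after shifting suggest
that one of the windows has not been sampled sufficiently.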
5. Notes
6. Sample Input Files
Box 1
TCL script for building the protein structure
protein.tcl
##############################
# Script to build the protein structure of 1W0E
# STEP 1: Build Protein
package require psfgen
# Use the specified CHARMM27 topology file.
topology /home/xc3/toppar/top_all27_prot_lipid.rtf
alias residue HIS HSD
alias atom ILE CD1 CD
# Build one segment
segment PROT {
first ACE
last CT3
pdb 1w0e-prot.pdb
}
# Load the coordinates for each segment.
coordpdb 1w0e-prot.pdb PROT
# Guess the positions of missing atoms.
guesscoord
# Write out coor and psf file
writepdb protein.pdb
writepsf protein.psf
mol load psf protein.psf pdb protein.pdb
quit
Box 2
TCL scripts for solvating the protein structure
solvate.tcl
##############################
# STEP 2: Solvate Protein
package require solvate
solvate protein.psf protein.pdb -t 10 -o solvated
quit
ionize.tcl
##############################
# STEP 3: Ionize Protein
package require autoionize
autoionize -psf solvated.psf -pdb solvated.pdb -is 0.1
quit
Box 3
PBS script to run an energy minimization
runmin.pbs
##############################
## Example PBS script to run a minimization on the linux cluster
#PBS -S /bin/bash
#PBS -j oe
#PBS -m ae
#PBS -N test
#PBS -l nodes=4:ppn=8,walltime=24:00:00
#PBS -V
#PBS -q md
source /share/apps/mpi/gcc/openmpi-1.2.8/bin/mpivars.sh
cd /home/xc3/data7/CYT450
/share/apps/mpi/gcc/openmpi-1.2.8/bin/mpiexec -np 32 \
  /share/apps/namd/NAMD_2.7b1_Source/Linux-amd64-MPI.arch/namd2 min.namd \
  > min.log
Box 4
NAMD configuration file for running an energy minimization
min.namd
##############################
# minimization for 2000 steps
# molecular system
coordinates ionized.pdb
structure ionized.psf
firsttimestep 0
temperature 0
minimization on
numsteps 2000
# force field
paratypecharmm on
parameters par_all27_prot.prm
exclude scaled1-4
1-4scaling 1.0
switching on
switchdist 8.5
cutoff 10
pairlistdist 12
#PME stuff
cellOrigin 57.56 77.37 10.48
cellBasisVector1 64.70 00.00 00.00
cellBasisVector2 00.00 93.59 00.00
cellBasisVector3 00.00 00.00 82.47
PME on
PmeGridsizeX 64
PmeGridsizeY 96
PmeGridsizeZ 81
margin 5
# output
outputname min
outputenergies 1000
outputtiming 1000
restartname min_restart
restartfreq 1000
restartsave no
Box 5
PBS script to run an MD simulation
runmd.pbs
##############################
## Example PBS script to run an MD simulation on the linux cluster
#PBS -S /bin/bash
#PBS -j oe
#PBS -m ae
#PBS -N test
#PBS -l nodes=4:ppn=8,walltime=24:00:00
#PBS -V
#PBS -q md
source /share/apps/mpi/gcc/openmpi-1.2.8/bin/mpivars.sh
cd /home/xc3/data7/CYT450
/share/apps/mpi/gcc/openmpi-1.2.8/bin/mpiexec -np 32 \
  /share/apps/namd/NAMD_2.7b1_Source/Linux-amd64-MPI.arch/namd2 md.namd \
  > md.log
Box 6
NAMD configuration file for running an MD simulation
md.namd
##############################
# run md for 2000000 steps
# molecular system
coordinates ionized.pdb
structure ionized.psf
bincoordinates min_restart.coor
binvelocities min_restart.vel
extendedSystem min_restart.xsc
firsttimestep 0
numsteps 2000000
# force field
paratypecharmm on
parameters par_all27_prot_lipid.prm
exclude scaled1-4
1-4scaling 1.0
switching on
switchdist 8.5
cutoff 10
pairlistdist 12
# integrator
timestep 1.0
stepspercycle 20
nonbondedFreq 1
#PME stuff
cellOrigin 57.56 77.37 10.48
cellBasisVector1 64.70 00.00 00.00
cellBasisVector2 00.00 93.59 00.00
cellBasisVector3 00.00 00.00 82.47
PME on
PmeGridsizeX 64
PmeGridsizeY 96
PmeGridsizeZ 81
margin 5
# output
outputname md
outputenergies 1000
outputtiming 1000
dcdfreq 1000
wrapAll on
wrapNearest on
restartname md_restart
restartfreq 1000
restartsave no
Box 7
TCL script files for building a protein structure embedded
in membrane
protein.tcl
##############################
# STEP 1: Build Protein
# Script to build the protein structure of GA
package require psfgen
# Use the specified CHARMM27 topology file.
topology top_all27_prot_lipid.inp
alias atom ILE CD1 CD
alias residue HIS HSD
# Build two segments, one for each Chain.
segment GA1 {
first ACE
last CT3
pdb Chain1.pdb
}
segment GA2 {
first ACE
last CT3
pdb Chain2.pdb
}
segment GA3 {
first ACE
last CT3
pdb Chain3.pdb
}
segment GA4 {
first ACE
last CT3
pdb Chain4.pdb
}
segment GA5 {
first ACE
last CT3
pdb Chain5.pdb
}
segment GA6 {
first ACE
last CT3
pdb Chain6.pdb
}
segment GA7 {
first ACE
last CT3
pdb Chain7.pdb
}
segment GA8 {
first ACE
last CT3
pdb Chain8.pdb
}
segment GA9 {
first ACE
last CT3
pdb Chain9.pdb
}
segment GA10 {
first ACE
last CT3
pdb Chain10.pdb
}
segment GA11 {
auto none
pdb Chain11.pdb
}
# Add patches, for example disulphide bridges.
patch DISU GA1:128 GA1:142
patch DISU GA1:190 GA1:191
patch DISU GA3:495 GA3:509
patch DISU GA3:557 GA3:558
patch DISU GA5:862 GA5:876
patch DISU GA5:924 GA5:925
patch DISU GA7:1229 GA7:1243
patch DISU GA7:1291 GA7:1292
patch DISU GA9:1596 GA9:1610
patch DISU GA9:1658 GA9:1659
# Load the coordinates for each segment.
coordpdb Chain1.pdb GA1
membrane.tcl
##############################
# STEP 2: Building a Membrane Patch
package require membrane
membrane -l popc -x 120 -y 120
combine.tcl
##############################
# STEP 3: Combine Protein and Membrane
##!/usr/local/bin/vmd
# need psfgen module and topology
package require psfgen
topology top_all27_prot_lipid.inp
# load structures
resetpsf
readpsf membrane.psf
coordpdb membrane.pdb
readpsf protein.psf
coordpdb protein.pdb
# write temporary structure
set temp "temp"
writepsf $temp.psf
writepdb $temp.pdb
# reload full structure (do NOT resetpsf!)
mol load psf $temp.psf pdb $temp.pdb
# select and delete lipids that overlap protein:
# any atom to any atom distance under 0.8A
set sellip [atomselect top "resname POPC"]
Box 8
TCL script files for solvating the membrane protein
structure
solvate.tcl
##############################
# STEP 4: Solvate Protein
package require solvate
solvate protein.psf protein.pdb -z 10 -o solvated
quit
ionize.tcl
##############################
# STEP 5: Ionize Protein
package require autoionize
Box 9
NAMD configuration file for running an MD equilibration
equil.namd
##############################
# STEP 7: Minimization and Equilibration
# molecular system
coordinates ionized_porewat.pdb
structure ionized_porewat.psf
temperature 0
# force field
paratypecharmm on
parameters par_all27_prot_lipid.prm
exclude scaled1-4
1-4scaling 1.0
switching on
switchdist 8.5
cutoff 10
pairlistdist 12
# integrator
timestep 1.0
stepspercycle 20
nonbondedFreq 1
#PME stuff
cellOrigin 11.53 12.51 -30.64
cellBasisVector1 123.90 000.00 000.00
cellBasisVector2 000.00 123.25 000.00
cellBasisVector3 000.00 000.00 136.78
PME on
PmeGridsizeX 128
PmeGridsizeY 128
PmeGridsizeZ 144
margin 5
# output
outputname eq
outputenergies 1000
outputtiming 1000
dcdfreq 1000
dcdfile eq.dcd
wrapAll on
wrapNearest on
fixedAtoms on
fixedAtomsForces on
fixedAtomsFile fix_backbone.pdb
fixedAtomsCol B
constraints on
consRef restrain_ca.pdb
consKFile restrain_ca.pdb
consKCol B
langevin on
langevinDamping 10
langevinTemp 310
langevinHydrogen no
langevinPiston on
langevinPistonTarget 1.01325
langevinPistonPeriod 200
langevinPistonDecay 100
langevinPistonTemp 310
useGroupPressure yes # smaller fluctuations
useFlexibleCell yes # allow dimensions to fluctuate independently
useConstantRatio yes # fix shape in x-y plane
# run one step to get into scripting mode
minimize 0
# turn off until later
langevinPiston off
# minimize nonbackbone atoms
minimize 2000
output min_fix
# min all atoms
fixedAtoms off
minimize 2000
output min_all
# heat with CAs restrained
run 100000
output heat
# equilibrate volume with CAs restrained
langevinPiston on
constraintScaling 3.0
output equil_ca1
run 200000
constraintScaling 1.0
output equil_ca2
run 200000
constraintScaling 0.5
output equil_ca3
run 200000
constraintScaling 0.25
output equil_ca4
run 200000
constraintScaling 0
output equil_ca5
run 1000000
Box 10
PBS script for running an MD equilibration
runeq.pbs
##############################
## Example PBS script to run an MD equilibration on the linux cluster
#PBS -S /bin/bash
#PBS -j oe
#PBS -m ae
#PBS -N test
#PBS -l nodes=12:ppn=8,walltime=24:00:00
#PBS -V
#PBS -q md
source /share/apps/mpi/gcc/openmpi-1.2.8/bin/mpivars.sh
cd /home/xc3/data7/nAChR
/share/apps/mpi/gcc/openmpi-1.2.8/bin/mpiexec -np 96 \
  /share/apps/namd/NAMD_2.7b1_Source/Linux-amd64-MPI.arch/namd2 equil.namd \
  > equil.log
Box 11
PBS script for running an MD production
runabf1.pbs
##############################
## Example PBS script to run an ABF simulation on the linux cluster
#PBS -S /bin/bash
#PBS -j oe
#PBS -m ae
#PBS -N abf1
#PBS -l nodes=12:ppn=8,walltime=24:00:00
#PBS -V
#PBS -q md
source /share/apps/mpi/gcc/openmpi-1.2.8/bin/mpivars.sh
cd /home/xc3/data7/nAChR
/share/apps/mpi/gcc/openmpi-1.2.8/bin/mpiexec -np 96 \
  /share/apps/namd/NAMD_2.7b1_Source/Linux-amd64-MPI.arch/namd2 abf1.namd \
  > abf1.log
Box 12
NAMD configuration file for running an ABF MD simulation
abf1.namd
##############################
# STEP 7: ABF Simulation – window 1
# molecular system
# start from slightly modified equilibrated configuration
# the position of the permeating sodium ion is modified to be located
# within the biasing window
coordinates ionized_porewat-abf1.pdb
structure ionized_porewat.psf
bincoordinates eq_restart.coor
binvelocities eq_restart.vel
extendedSystem eq_restart.xsc
firsttimestep 0
temperature 310
numsteps 2000000
# force field
paratypecharmm on
parameters par_all27_prot_lipid.prm
exclude scaled1-4
1-4scaling 1.0
switching on
switchdist 8.5
cutoff 10
pairlistdist 12
# integrator
timestep 1.0
stepspercycle 20
nonbondedFreq 1
#PME stuff
cellOrigin 11.53 12.51 -30.64
cellBasisVector1 123.90 000.00 000.00
cellBasisVector2 000.00 123.25 000.00
cellBasisVector3 000.00 000.00 136.78
PME on
PmeGridsizeX 128
PmeGridsizeY 128
PmeGridsizeZ 144
# output
outputname abf1
outputenergies 1000
outputtiming 1000
dcdfreq 1000
dcdfile abf1.dcd
wrapAll on
wrapNearest on
restartname abf1_restart
restartfreq 1000
restartsave no
# restraints are applied to six Cα atoms on each subunit
# (three at the extracellular end and three at the intracellular end
# of the M2 helices)
constraints on
consRef restrain_ref.pdb
consKFile restrain_ref.pdb
consKCol B
langevin on
langevinDamping 5
langevinTemp 310
langevinHydrogen on
langevinPiston on
langevinPistonTarget 1.01325
langevinPistonPeriod 200
langevinPistonDecay 500
langevinPistonTemp 310
useGroupPressure yes
useFlexibleCell yes
useConstantArea yes
# ABF SECTION
colvars on
colvarsConfig Distance.in
Distance.in
##############################
Colvarstrajfrequency 2000
Colvarsrestartfrequency 20000
colvar {
name COMDistance
width 0.1
lowerboundary -25.0
upperboundary 20.0
lowerwallconstant 10.0
upperwallconstant 10.0
# distance along z axis between the ion and the 30 reference atoms
distanceZ {
group1 {
atomnumbers { 245701 }
}
group2 {
atomnumbers { 58205 58216 58223 58687 58705 58717
64078 64089 64096 64560 64577 64590
69951 69962 69969 70433 70450 70463
75824 75835 75842 76306 76323 76336
81697 81708 81715 82179 82196 82209 }
}
}
}
abf {
colvars COMDistance
fullSamples 800
hideJacobian
}
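As a quick sanity check on the settings in Box 12, the short sketch below derives a few quantities implied by the configuration: the simulated time per window, the number of saved trajectory frames, and the number of ABF bins along the collective variable. The numbers are copied from the box; the Python variable names are illustrative only and are not part of NAMD or the Colvars module.

```python
# Values copied from Box 12; variable names are illustrative only.
timestep_fs = 1.0            # "timestep" (fs)
numsteps = 2_000_000         # "numsteps"
dcdfreq = 1000               # "dcdfreq" (steps between saved frames)
lower, upper = -25.0, 20.0   # "lowerboundary" / "upperboundary" (Angstrom)
width = 0.1                  # colvar "width", i.e. the ABF bin size (Angstrom)

simulated_ns = numsteps * timestep_fs * 1e-6   # fs -> ns
n_frames = numsteps // dcdfreq                 # frames written to abf1.dcd
n_bins = round((upper - lower) / width)        # bins along COMDistance

print(f"{simulated_ns:.1f} ns, {n_frames} frames, {n_bins} ABF bins")
```

With fullSamples set to 800, each of these bins must collect 800 force samples before the full biasing force is applied there, which is one reason the sampled range is kept narrow.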
Part IV
Abstract
In clinical toxicology, a better understanding of the pharmacokinetics of drugs may be useful in both
risk assessment and formulating treatment guidelines for patients. Pharmacokinetics describes the time
course of drug concentrations and is a driver for the time course of drug effects. In this chapter
pharmacokinetics is described from a mathematical modeling perspective as applied to clinical toxicology.
The pharmacokinetics of drugs are described using a combination of input and disposition (distribution
and elimination) phases. A description of the time course of the input and disposition of drugs in
overdose provides a basis for understanding the time course of their effects. Relevant clinical
toxicology examples are provided to explain various pharmacokinetic principles. Throughout this chapter
we have taken a pragmatic approach to understanding and interpreting the time course of drug effects.
Key words: Pharmacokinetics, Clinical toxicology, Input, Disposition, Clearance, Volume of
distribution, Compartmental models
1. Introduction
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_12, © Springer Science+Business Media, LLC 2012
290 P. Vajjah et al.
Fig. 1. A primary purpose of pharmacokinetics is to add time into the dose/concentration–effect relationship.
1.1. Toxic Dose
In this chapter we take a broad view of the word drug: a drug is any exogenously administered
chemical that elicits an effect on the body. From this perspective a drug can have therapeutic or
toxic actions depending on its concentration in the target tissues, which is a function of dose
and time. Under this definition, the term toxicokinetics, which is often used in the discipline of
toxicology, is simply pharmacokinetics. We therefore use the term PK to refer to the
concentration–time course of any drug in the body, whether the drug is used therapeutically or
taken in inadvertent or deliberate self-poisoning.
2. Materials
2.1. Pharmacokinetics
Traditionally the PK of drugs has been described using ADME (Absorption, Distribution,
Metabolism, and Excretion) principles. The acronym ADME suggests a serial approach to PK in which
absorption occurs first, followed by distribution, then metabolism, and then excretion. However,
these processes are not temporally discrete: they occur simultaneously, even though one of them
may predominate at any given time. For example, the distribution, metabolism, and excretion of a
drug usually continue during the so-called absorption phase. Another way to consider PK processes
is to categorize them into two components:
1. Input which describes the time course of drug movement from
the site of administration to the site of measurement.
2. Disposition which describes the time course of drug distribution
and elimination from the site of measurement.
Figures 2 and 3 show the input and disposition phases of the
drug that is administered via an extravascular route. Here we
Fig. 2. The relationship between input and disposition rates over time.
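The interplay between input and disposition rates can be sketched numerically. The example below assumes the simplest possible case, a one-compartment model with first-order input (rate constant ka) and first-order elimination (rate constant k), so that disposition reduces to elimination alone. The model choice, the function names, and the parameter values are illustrative assumptions, not taken from the chapter.

```python
import math

def input_rate(t, dose, ka, f=1.0):
    """Rate at which drug reaches the site of measurement (amount per hour)."""
    return f * dose * ka * math.exp(-ka * t)

def amount_in_body(t, dose, ka, k, f=1.0):
    """Amount at the site of measurement at time t (requires ka != k)."""
    return f * dose * ka / (ka - k) * (math.exp(-k * t) - math.exp(-ka * t))

def disposition_rate(t, dose, ka, k, f=1.0):
    """Rate of loss from the site of measurement (amount per hour)."""
    return k * amount_in_body(t, dose, ka, k, f)

def time_of_peak(ka, k):
    """Time at which the input rate equals the disposition rate."""
    return math.log(ka / k) / (ka - k)

# Hypothetical values: 100 mg dose, ka = 1.0 /h, k = 0.1 /h.
DOSE, KA, K = 100.0, 1.0, 0.1
for t in (0.5, time_of_peak(KA, K), 10.0):
    print(f"t = {t:5.2f} h  input = {input_rate(t, DOSE, KA):7.3f}  "
          f"disposition = {disposition_rate(t, DOSE, KA, K):7.3f} mg/h")
```

Before the peak the input rate exceeds the disposition rate and the amount in the body rises; after the peak the reverse holds. The two rates are equal exactly at the time of peak concentration.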
2.2. Input
Input can be defined as the process by which unchanged drug travels from the site of
administration to the site of measurement within the body. In the case of an intravenous bolus
dose, the entire amount of unchanged drug is available instantaneously in the body. In the case of
an extravascular route of administration, the input process usually involves more than one
mechanism, and there may be several sites where the drug can be irreversibly eliminated during
input; hence absorption may be incomplete and variable in rate.
When drugs are dosed orally, loss may occur for biological reasons and/or because of poor
physicochemical properties of the drug. Biological reasons include degradation of the drug in the
gastrointestinal tract (e.g., insulin), pre-systemic metabolism by enzymes present in the
gastrointestinal tract wall (e.g., cyclosporine), and transporters in the gastrointestinal tract
wall that provide a counter flux back into the gastrointestinal tract (e.g., vincristine) (5).
Most drugs that are substrates of the enzyme CYP3A4 (a cytochrome P450 enzyme) may undergo
pre-systemic metabolism because CYP3A4 is present in the gut wall (6). A key example of a drug
that undergoes pre-systemic metabolism through CYP3A4 is cyclosporine, which often has highly
erratic PK profiles as a result (7). Terbutaline (8) also undergoes pre-systemic sulphation. Drugs
that are substrates of P-glycoprotein (P-gp) can also have poor bioavailability. P-gp is a
glycoprotein present in the gut wall, where its major function is to transport drug back into the
gut. Examples of P-gp substrates include digoxin (9) and fexofenadine (10).
Physicochemical properties such as poor solubility and/or poor permeability may also lead to
loss of drug. Some drugs, like glibenclamide, have low solubility in the gastrointestinal fluids
and hence low bioavailability (11). Others, like cimetidine, have high solubility but low
permeability (11). Once the drug has passed through the gastrointestinal tract wall and entered
the portal vein, it may undergo first-pass metabolism in the liver. Figure 4 represents the
absorption of the drug via the oral route.
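Because the barriers described above act in sequence, their combined effect on oral bioavailability is often approximated as a product of the fractions that survive each step. The sketch below uses that common textbook approximation; the function name, the symbols (fraction absorbed, gut-wall extraction, hepatic extraction), and the numbers are assumptions for illustration and do not come from the chapter. Efflux by transporters such as P-gp is lumped into the fraction absorbed here.

```python
def oral_bioavailability(f_absorbed, e_gut, e_hepatic):
    """Fraction of an oral dose reaching the systemic circulation unchanged.

    f_absorbed : fraction that crosses into the gut wall (after degradation,
                 poor solubility/permeability, and transporter efflux)
    e_gut      : fraction extracted by gut-wall metabolism (e.g., CYP3A4)
    e_hepatic  : fraction extracted by first-pass metabolism in the liver
    """
    for name, value in (("f_absorbed", f_absorbed),
                        ("e_gut", e_gut),
                        ("e_hepatic", e_hepatic)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1, got {value}")
    return f_absorbed * (1.0 - e_gut) * (1.0 - e_hepatic)

# Hypothetical drug: 80% absorbed, 50% gut-wall extraction, 30% hepatic extraction.
print(f"F = {oral_bioavailability(0.8, 0.5, 0.3):.2f}")
```

The multiplicative form shows why a drug can be well absorbed yet poorly bioavailable: a moderate loss at each of several barriers compounds into a large overall loss.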
2.2.1. Input and Drug Overdose
In most studies of overdose the input phase may be difficult to characterize. This is mainly
because the patient does not take the overdose in the hospital, and limited information is
available about the dose and the time at which the patient ingested it (12). In addition, there
is a lag between the time of ingestion and the time at which the first plasma sample can be
collected, and the input phase of the drug may be partially or fully complete by this time.
Fig. 4. Barriers that a drug encounters when given via the oral route. The drug can be degraded or poorly absorbed and
eliminated in the feces, metabolized by intestinal microsomes, impeded by transporters in the intestinal wall that may
inhibit its passage through the gastrointestinal tract (GIT) wall, or removed by first-pass metabolism in the liver.
Hence, the data collected may not provide much information about the input phase of the drug.
Methods have been developed to account for this missing information and for some of the
associated uncertainty, which is particularly important for estimating the disposition parameters
(e.g., clearance) (2, 12, 13).
It is often assumed that drug absorption takes longer when the drug is taken in overdose.
However, in a number of pharmacokinetic studies of drugs in overdose the absorption appears to be
rapid and complete, similar to the pharmacokinetics of therapeutic doses of the same drugs
(2, 12–14). The input phase for drugs in overdose may therefore be similar to that for therapeutic
doses in many cases. In some cases there is clearly prolonged absorption, and this is assumed to be
due to the tablets aggregating to form what are known as pharmacobezoars (15). Carbamazepine is an
example of a drug that has a prolonged input phase in overdose (16). This can be seen in Fig. 5,
which suggests ongoing absorption for up to 24 h, with increasing or plateauing drug
concentrations. This may be due to the formation of pharmacobezoars, or carbamazepine may affect
its own absorption, perhaps by reducing gastrointestinal motility.
2.3. Disposition
Disposition can be defined as the process by which the drug moves to and from the site of
measurement. Once absorbed, the drug molecules are distributed to the tissues of the body,
including the organs of elimination, leading to a decrease in concentration of the drug at the site
of measurement. The decrease in the blood concentration could be due to reversible loss of drug
from the blood to the tissues, defined as distribution, or the irreversible loss of drug
12 Introduction to Pharmacokinetics in Clinical Toxicology 295
Fig. 5. Observed concentrations, dose-normalized to the therapeutic dose of 200 mg, versus time
from the reported time of administration for eight patients who took overdoses of normal-release
carbamazepine, with doses ranging from 3 to 36 g. Samples from the same dose event are connected.
2.4. Distribution
Distribution can be defined as the reversible transfer of drug from the central compartment (blood)
to the different tissues and is schematically represented in Fig. 6. The rate and extent of
distribution of a
Fig. 7. Drug concentration–time curves for seven patients with amitriptyline overdoses,
redrawn from Figure 3 in Hulten et al. (18). The shaded area is the therapeutic range of
0.8–2 mM.
Fig. 8. Plasma lithium concentration (black squares) and QT interval (gray circles) versus time for a female patient with
chronic lithium toxicity. The apparent half-life of lithium is 24.8 h assuming a one-compartmental model.
the clinical effects and this is shown by the similar decrease in the QT interval as the lithium
concentration declines (Fig. 8). In contrast, toxicity occurs rarely in acute lithium overdose
because there is only transient exposure to high plasma lithium concentrations before the drug is
distributed throughout the body. Uptake into the central nervous system is slow, so central nervous
system toxicity requires persistently high concentrations (above the upper level of the
therapeutic range) (20). In this setting, the distribution of lithium into the brain takes longer
than its elimination, the opposite of tricyclic antidepressants, where distribution to the brain
is rapid and elimination is slow.
The relationship between the first-order rate constant, clearance, and apparent volume of
distribution is illustrated in Fig. 8, where lithium shows mono-exponential decay with a
first-order elimination rate constant of 0.028/h and an elimination half-life of 24.8 h.
This apparent elimination half-life is similar to that determined in pharmacokinetic studies
of therapeutic lithium use (21).
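The quoted half-life follows directly from the rate constant. For a one-compartment model with first-order elimination, the half-life is ln 2 divided by the rate constant, and the rate constant is the ratio of clearance to volume of distribution. The sketch below checks the arithmetic for the value of 0.028/h given in the text. The function names are illustrative, and the clearance and volume used in the last lines are hypothetical values chosen only to reproduce that rate constant, not lithium parameters from the chapter.

```python
import math

# First-order elimination rate constant reported in the text (per hour).
K_ELIM = 0.028

def half_life(k):
    """Elimination half-life (h) for a first-order rate constant k (1/h)."""
    return math.log(2) / k

def concentration(t, c0, k):
    """Mono-exponential decay: C(t) = C0 * exp(-k * t)."""
    return c0 * math.exp(-k * t)

def rate_constant(clearance, volume):
    """k = CL / V for a one-compartment model (CL in L/h, V in L)."""
    return clearance / volume

t_half = half_life(K_ELIM)
print(f"half-life = {t_half:.1f} h")   # prints "half-life = 24.8 h"

# Hypothetical CL and V that give the same rate constant.
print(f"k = {rate_constant(0.7, 25.0):.3f} /h")
```

Because the rate constant is a ratio, the same half-life can arise from many combinations of clearance and volume; a half-life alone does not identify either parameter.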
Methotrexate is another example of a drug which is similar to lithium in that its cellular
uptake is slower than its renal elimination. An acute oral overdose of methotrexate has never been
reported to cause severe toxicity, whereas taking a therapeutic dose of methotrexate daily instead
of weekly can be life-threatening (22).
2.5. Elimination
Elimination is the irreversible loss of the drug from the site of measurement, which occurs by
excretion of unchanged drug, mainly through the kidneys, or conversion of drug into a metabolite
via various metabolic pathways, mainly in the liver. Some drugs are
excreted through bile after being metabolized in the liver, usually
after phase II conjugation reactions. Uncommonly some drugs,
mostly volatiles, may also be excreted through the lungs. We limit
our discussion here to the two most common methods of drug
elimination, namely the renal and the hepatic routes. Figure 9
replicates the distribution pathways in Fig. 6 and adds the major
elimination pathways.
2.5.1. Hepatic Elimination
Metabolism is the predominant elimination mechanism for about 75% of drugs. The majority of drug
metabolism occurs in the liver, but it can occur at other sites, including the gastrointestinal
tract, the kidneys, and the lungs. Drug metabolites are not always inactive but are generally less
active than the parent. There are special circumstances in which the metabolite is active while
the parent is inactive. It is argued that codeine is one such case: codeine itself is thought to be
inactive, and morphine, a metabolite, is active. In some cases drugs are designed specifically for
the parent to be inactive and the metabolite active. These are termed prodrugs, in which the
chemical structure is modified so that absorption occurs more readily, e.g., dabigatran is given
as the inactive dabigatran etexilate (23), and mycophenolate is given as the inactive
mycophenolate mofetil (24). There are also less common circumstances in which both the parent and
the metabolite are active and the metabolite itself is sometimes given as the active agent, e.g.,
Fig. 10. Metabolic pathways for acetaminophen excluding the small amount excreted unchanged by the kidneys.
Table 1
Examples of in vivo substrates, inhibitors, and inducers of various CYP isozymes
relevant to clinical toxicology. This is not an exhaustive list
2.5.2. Metabolism and Drug Overdose
It is generally thought that in most overdoses these metabolic pathways are saturated. However,
there is little evidence to support saturation in overdose, and recent pharmacokinetic studies of
citalopram, quetiapine, and venlafaxine in overdose suggest that metabolism is not saturated for
these drugs despite drug concentrations 10- to 100-fold those seen with therapeutic use (12–14).
Common examples of drugs where saturation occurs include ethanol, theophylline, and phenytoin;
however, saturation in these cases occurs with therapeutic doses, although for theophylline this is
not noticeable clinically.
In the case of acetaminophen overdose, toxicity is due to the
formation of the toxic metabolite NAPQI and there is some evi-
dence that inhibition of CYP2E1 by ethanol decreases the forma-
tion of this metabolite and therefore toxicity, and conversely
chronic alcohol use induces metabolism (28, 29). Although there
is saturation of the sulfation pathway in acetaminophen overdose
due to depletion of sulfate (30, 31), it is unclear if this changes the
overall pharmacokinetics in overdose.
2.5.3. Renal Elimination
The renal elimination of drugs is usually referred to as drug
excretion and may involve the parent drug or its metabolites.
Excretion of drug and metabolites into the urine involves three main
mechanisms: glomerular filtration, active tubular secretion, and
tubular reabsorption. About 25% of all drugs undergo renal elimination
as unchanged drug.
Renal elimination depends on various factors including the
lipophilicity of the drug, plasma protein binding and the plasma
drug concentration. The nephron is the basic anatomical unit of
the kidney that is responsible for renal elimination of drugs. Like
any substance eliminated by the kidneys, including endogenous
substances, drugs can be eliminated by one of four major pro-
cesses which occur at different sites along the nephron: glomeru-
lar filtration, active secretion, passive diffusion, and active
reabsorption.
About 25% of cardiac output goes to the kidney and about 10%
of it is filtered through the kidney. The glomerular filtration of a
drug will depend on the glomerular blood flow and the concentra-
tion of unbound drug. Drugs that are bound to plasma proteins are
not filtered. The glomerular filtration rate (GFR) is usually approxi-
mated by the clearance of creatinine, which is a catabolic product of
amino acid metabolism in the muscle. Changes in the plasma
protein binding of drugs, such as saturation of protein binding or
changes in pH, will affect filtration.
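As a rough numerical illustration of the point that only unbound drug is filtered, filtration clearance is commonly approximated as the unbound fraction multiplied by GFR. The sketch below assumes that relationship and ignores secretion and reabsorption; the function name, the GFR of 120 mL/min, and the unbound fractions are illustrative assumptions, not values from this chapter.

```python
def filtration_clearance(fraction_unbound, gfr_ml_min):
    """Approximate clearance by glomerular filtration alone (mL/min).

    Assumes only unbound drug is filtered; secretion and reabsorption
    are deliberately ignored in this sketch.
    """
    if not 0.0 <= fraction_unbound <= 1.0:
        raise ValueError("fraction_unbound must lie between 0 and 1")
    return fraction_unbound * gfr_ml_min


# Illustrative values only: a highly bound drug versus a weakly bound one.
GFR = 120.0  # mL/min, assumed value for illustration
for fu in (0.05, 0.9):
    cl_f = filtration_clearance(fu, GFR)
    print(f"fu = {fu:.2f}: filtration clearance ~ {cl_f:.0f} mL/min")
```

Under this approximation, a change in protein binding (for example, saturation of binding at high concentrations) changes filtration clearance in direct proportion to the unbound fraction.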
12 Introduction to Pharmacokinetics in Clinical Toxicology 303
2.5.4. Renal Elimination and Drug Overdose
Understanding renal elimination is important for both acute overdose
and poisoning with drugs that are mainly or completely eliminated by
the kidneys. Chronic poisoning by drugs that are renally eliminated
will occur in patients with abnormal renal function or acute renal
failure; examples include digoxin toxicity, lithium toxicity, and
metformin poisoning.
There are a number of drugs for which renal elimination becomes more
important in overdose than at therapeutic doses. An example
of this is salicylate poisoning, where the metabolic pathways in the
liver are saturated in overdose, leaving renal clearance of salicylic
acid as the major elimination pathway (33). This explains why the
apparent half-life of elimination of salicylate increases from 2–4 h
at therapeutic doses (due to hepatic clearance) to approximately
20 h in overdose, when elimination is mainly renal (34).
Fig. 11. Two patients with acute aspirin (acetylsalicylic acid) overdoses. The first patient
(filled circles; thick line) ingested 36 g, was not treated with sodium bicarbonate and had
a half-life of elimination of 29.8 h. The second patient (open circles; dashed line) ingested
15 g, was treated with a loading dose and infusion of bicarbonate and had a half-life of
5.2 h. The observed concentrations (filled and open circles) have been fitted to a one-
compartment model with first-order input.
2.5.5. Clearance
The clearance of a drug from various tissues occurs in parallel, as
shown in Fig. 9, so the total body clearance of the drug (CL) is equal
to the sum of the clearances of the individual tissues:
$$CL = CL_h + CL_g + CL_r. \tag{4}$$
Clearance is a constant that describes the relationship between
drug concentration (C) in the body and the rate of elimination of the
drug from the body and has units of volume per time. It is, however,
3. Methods
3.1. Basic Model for PK
Given the many processes involved in the input and disposition of
drugs, the mathematical models used to describe the PK of drugs
can be complicated. Although the processes can sometimes be
exceedingly complex, as depicted in Fig. 9, a simple one-compartment
model with first-order input fortunately provides a reasonable
description of the time course for many drugs, both therapeutically
and in overdose (Fig. 12). More complex PK models may, of course,
be necessary.
In Fig. 12 we see that all the tissues of the body are lumped
together as a single homogeneous compartment, the “body” (dis-
cussed in the next section). The arrows indicate the movement of
the drug. The figure shows that part of the drug may be excreted
Fig. 12. A one-compartment PK model with first-order input that can be used to describe
the time course of drug concentration. Although two routes of elimination are illustrated,
the mathematical model describes total clearance.
3.2. Rate of Movement of Drug Around the Body
Most drug movement processes are due to passive diffusion; therefore,
the driving factor is concentration. This means that the higher
the concentration, the greater the rate at which the drug diffuses
across a membrane. This concentration-proportional rate is termed
first order, which holds for most drugs. If the rate of movement is
constant and independent of concentration, then it is said to be
zero order. If the rate of movement of drug is saturable (i.e., it
changes from apparent first order to apparent zero order as
concentration increases), then it is termed mixed order and is most
commonly described by a Michaelis–Menten process.
Note that all the equations shown below are in terms of amount (A)
but can be written in terms of concentrations (C).
Zero-order process:
$$\frac{dA}{dt} = -k_0 \tag{8}$$
A zero-order process is described by a constant rate (mass per
time) alone and is independent of the amount of drug to be
transferred.
First-order process:
$$\frac{dA}{dt} = -k \cdot A \tag{9}$$
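The three kinds of process described in this section can be compared side by side in a short sketch. It assumes each process is a loss from the compartment (hence the negative sign) and uses the standard Michaelis–Menten form for the mixed-order case; the function names and the parameters `vmax` and `km` (expressed here in amount units) are illustrative assumptions rather than notation taken from the chapter.

```python
def zero_order_rate(amount, k0):
    """Constant rate k0 (mass/time), independent of the amount present."""
    return -k0 if amount > 0 else 0.0


def first_order_rate(amount, k):
    """Rate proportional to the amount present (k in 1/time)."""
    return -k * amount


def mixed_order_rate(amount, vmax, km):
    """Michaelis-Menten rate: approximately first order when amount << km,
    approaching the constant rate vmax when amount >> km."""
    return -vmax * amount / (km + amount)


# Hypothetical parameter values, chosen only to show the change in behaviour.
for amount in (0.1, 10.0, 1000.0):
    print(
        f"A = {amount:7.1f}  "
        f"zero: {zero_order_rate(amount, 2.0):8.3f}  "
        f"first: {first_order_rate(amount, 0.5):9.3f}  "
        f"mixed: {mixed_order_rate(amount, 4.0, 2.0):8.3f}"
    )
```

At small amounts the mixed-order rate tracks a first-order process with rate constant `vmax / km`; at large amounts it levels off at `vmax`, which is the saturation behaviour described in the text.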
3.3. Describing the Pharmacokinetics of Drugs
Three different approaches can be used to describe the PK of drugs:
1. Compartmental pharmacokinetic models
2. Non-compartmental pharmacokinetic models
3. Physiologically based pharmacokinetic (PBPK) models
In the compartmental approach the body is divided into one or
more compartments, the number of which is dictated by the data.
PBPK involves the investigator defining a set of compartments
based on physiology. The similarity of the behavior of the drug in
various compartments is assessed based on data sampled
from each of the compartments (shown in Figs. 6 and 9).
The non-compartmental approach (NCA) does not require the
assumption of any compartments for the purpose of analysis and
includes the summary variables peak or maximum drug concentration
(Cmax), time to peak drug concentration (Tmax), and area under the
curve (AUC). However, weak assumptions are required if the
non-compartmental summary variables are converted into parameter
values such as CL and V.
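The non-compartmental summary variables can be read directly off a sampled concentration–time profile. The following is a minimal sketch, assuming the linear trapezoidal rule for AUC up to the last sample (the chapter does not specify an integration rule) and using made-up data; the function name is hypothetical.

```python
def nca_summary(times, concentrations):
    """Return (Cmax, Tmax, AUC from first to last sample).

    No compartments are assumed; AUC uses the linear trapezoidal rule.
    """
    if len(times) != len(concentrations) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    cmax = max(concentrations)
    tmax = times[concentrations.index(cmax)]  # first time the peak is observed
    auc = 0.0
    for i in range(1, len(times)):
        # Area of the trapezoid between consecutive samples.
        width = times[i] - times[i - 1]
        auc += width * (concentrations[i] + concentrations[i - 1]) / 2.0
    return cmax, tmax, auc


# Made-up observations (time in h, concentration in mg/L).
times_h = [0.0, 1.0, 2.0, 4.0, 8.0]
conc_mg_l = [0.0, 4.0, 6.0, 4.0, 1.0]
cmax, tmax, auc = nca_summary(times_h, conc_mg_l)
print(f"Cmax = {cmax} mg/L, Tmax = {tmax} h, AUC(0-8 h) = {auc} mg*h/L")
```

Converting such an AUC into a clearance (for example as dose divided by AUC) would additionally require extrapolating the curve beyond the last sample and knowing the fraction of the dose absorbed, which is where the weak assumptions mentioned above enter.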
3.3.2. The One-Compartment PK Model
The one-compartment model is the simplest of the models used to
describe the PK of drugs. This model assumes that all the tissues in
the body are lumped together into a single kinetically homogeneous
unit. In the pharmacokinetic model below, the whole body, except
the gut, is assumed to be a single compartment. A schematic of a
one-compartment PK model is given in Fig. 13.
When a dose D is administered extravascularly, it has to be
absorbed through a biological barrier to enter the central compart-
ment (blood) where it becomes systemically available. The process
itself is complex and is determined by factors such as the route of
administration, the amount administered, the formulation, and the
physicochemical properties of the drug. This complex input process
is often assumed to be a first-order input process governed by a
single parameter ka, the rate constant of absorption. This assump-
tion of a first-order input process is not a requirement and the
absorption rate may be described by any number of processes,
including a zero-order input process (which would describe an
intravenous infusion) or a mixture of these processes.
For this first-order input one-compartment model, the rate of
change of the amount of drug over time (t) in both the absorption site
and the body can be represented as a set of ordinary differential
equations given by:
$$\frac{dA(1)}{dt} = -k_a \cdot A(1) \tag{13}$$
$$\frac{dA(2)}{dt} = (k_a \cdot A(1)) - (CL/V \cdot A(2)), \tag{14}$$
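To make the pair of differential equations concrete, the sketch below integrates them numerically, reading Eq. (13) as first-order loss from the absorption site and Eq. (14) as that same input minus first-order elimination at rate CL/V. The fixed-step Runge–Kutta integrator, the closed-form comparison, the assumption that the whole dose is absorbed, and all parameter values are illustrative choices, not part of the chapter.

```python
import math


def derivatives(a1, a2, ka, cl, v):
    """Right-hand sides of Eqs. (13) and (14)."""
    da1 = -ka * a1                 # loss from the absorption site
    da2 = ka * a1 - (cl / v) * a2  # input from absorption minus elimination
    return da1, da2


def simulate(dose, ka, cl, v, t_end, dt=0.01):
    """Integrate both amounts from t = 0 to t_end with fixed-step RK4."""
    a1, a2 = dose, 0.0  # the whole dose starts at the absorption site
    for _ in range(round(t_end / dt)):
        k1 = derivatives(a1, a2, ka, cl, v)
        k2 = derivatives(a1 + 0.5 * dt * k1[0], a2 + 0.5 * dt * k1[1], ka, cl, v)
        k3 = derivatives(a1 + 0.5 * dt * k2[0], a2 + 0.5 * dt * k2[1], ka, cl, v)
        k4 = derivatives(a1 + dt * k3[0], a2 + dt * k3[1], ka, cl, v)
        a1 += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        a2 += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return a1, a2


def analytical_a2(dose, ka, cl, v, t):
    """Closed-form amount in the body; valid only when ka != CL/V."""
    k = cl / v
    return dose * ka / (ka - k) * (math.exp(-k * t) - math.exp(-ka * t))


# Hypothetical values: dose in mg, ka in 1/h, CL in L/h, V in L.
a1, a2 = simulate(dose=100.0, ka=1.0, cl=10.0, v=50.0, t_end=5.0)
print(f"numerical A(2) at 5 h:  {a2:.4f} mg")
print(f"analytical A(2) at 5 h: {analytical_a2(100.0, 1.0, 10.0, 50.0, 5.0):.4f} mg")
print(f"concentration at 5 h:   {a2 / 50.0:.4f} mg/L")
```

Dividing the amount in the body by V gives the concentration, which is the quantity usually compared with observations.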
Fig. 13. (a) Schematic of a one-compartment PK model with intravenous bolus adminis-
tration, showing all the tissues in the body. CL is the clearance and V is the volume of
distribution of the drug. (b) Schematic of a one-compartment PK model with intravenous
bolus administration. All the tissues are lumped together into a single homogeneous
compartment. Both are the same; however, (a) shows the lumping assumptions.
Fig. 14. Observed concentration–time data from a female patient ingesting 14 g of
amisulpride fitted with a first-order one-compartment model with parameter estimates
of ka = 0.173/h, V = 179 L, and CL = 31.3 L/h, and a half-life of 4 h.
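The half-life quoted in the caption of Fig. 14 can be checked against the reported CL and V, assuming the usual one-compartment relationship between the elimination rate constant (CL/V) and half-life. The function names below are illustrative.

```python
import math


def elimination_rate_constant(cl, v):
    """First-order elimination rate constant k = CL / V (1/h)."""
    return cl / v


def half_life(cl, v):
    """Elimination half-life t1/2 = ln(2) * V / CL (h)."""
    return math.log(2) * v / cl


CL = 31.3  # L/h, from the Fig. 14 caption
V = 179.0  # L, from the Fig. 14 caption
print(f"k    = {elimination_rate_constant(CL, V):.3f} /h")
print(f"t1/2 = {half_life(CL, V):.2f} h")
```

The computed half-life is approximately 3.96 h, consistent with the 4 h given in the caption.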
4. Notes
References
1. Dawson AH, Whyte IM (2001) Therapeutic drug monitoring in drug overdose. Br J Clin Pharmacol 52(Suppl 1):97S–102S
2. Isbister GK (2010) How do we use drug concentration data to improve the treatment of overdose patients? Ther Drug Monit 32:300–304
3. Isbister GK, Friberg LE, Duffull SB (2006) Application of pharmacokinetic-pharmacodynamic modelling in management of QT abnormalities after citalopram overdose. Intensive Care Med 32:1060–1065
4. Friberg LE, Isbister GK, Duffull SB (2006) Pharmacokinetic-pharmacodynamic modelling of QT interval prolongation following citalopram overdoses. Br J Clin Pharmacol 61:177–190
5. Martinez MN, Amidon GL (2002) A mechanistic approach to understanding the factors affecting drug absorption: a review of fundamentals. J Clin Pharmacol 42:620–643
6. Lin JH, Chiba M, Baillie TA (1999) Is the role of the small intestine in first-pass metabolism overemphasized? Pharmacol Rev 51:135–158
7. Hoppu K, Koskimies O, Holmberg C, Hirvisalo EL (1991) Evidence for pre-hepatic metabolism of oral cyclosporine in children. Br J Clin Pharmacol 32:477–481
8. Pacifici GM, Eligi M, Giuliani L (1993) (+) and (−) terbutaline are sulphated at a higher rate in human intestine than in liver. Eur J Clin Pharmacol 45:483–487
9. Tanigawara Y, Okamura N, Hirai M, Yasuhara M, Ueda K, Kioka N, Komano T, Hori R (1992) Transport of digoxin by human P-glycoprotein expressed in a porcine kidney epithelial cell line (LLC-PK1). J Pharmacol Exp Ther 263:840–845
10. Cvetkovic M, Leake B, Fromm MF, Wilkinson GR, Kim RB (1999) OATP and P-glycoprotein transporters mediate the cellular uptake and excretion of fexofenadine. Drug Metab Dispos 27:866–871
11. Lindenberg M, Kopp S, Dressman JB (2004) Classification of orally administered drugs on the World Health Organization Model list of Essential Medicines according to the biopharmaceutics classification system. Eur J Pharm Biopharm 58:265–278
12. Friberg LE, Isbister GK, Hackett LP, Duffull SB (2005) The population pharmacokinetics of citalopram after deliberate self-poisoning: a Bayesian approach. J Pharmacokinet Pharmacodyn 32:571–605
13. Isbister GK, Friberg LE, Hackett LP, Duffull SB (2007) Pharmacokinetics of quetiapine in overdose and the effect of activated charcoal. Clin Pharmacol Ther 81:821–827
14. Kumar VV, Oscarsson S, Friberg LE, Isbister GK, Hackett LP, Duffull SB (2009) The effect of decontamination procedures on the pharmacokinetics of venlafaxine in overdose. Clin Pharmacol Ther 86:403–410
15. Buckley NA, Dawson AH, Reith DA (1995) Controlled release drugs in overdose: clinical considerations. Drug Saf 12:73–84
16. Brahmi N, Kouraichi N, Thabet H, Amamou M (2006) Influence of activated charcoal on the pharmacokinetics and the clinical features of carbamazepine poisoning. Am J Emerg Med 24:440–443
17. Hulten BA, Heath A, Knudsen K, Nyberg G, Starmark JE, Martensson E (1992) Severe amitriptyline overdose: relationship between toxicokinetics and toxicodynamics. J Toxicol Clin Toxicol 30:171–179
18. Hulten BA, Heath A, Knudsen K, Nyberg G, Svensson C, Martensson E (1992) Amitriptyline and amitriptyline metabolites in blood and cerebrospinal fluid following human overdose. J Toxicol Clin Toxicol 30:181–201
19. Meineke I, Schmidt W, Nottrott M, Schroder T, Hellige G, Gundert-Remy U (1997) Modelling of non-linear pharmacokinetics in sheep after short-term infusion of cardiotoxic doses of imipramine. Pharmacol Toxicol 80:266–271
20. Waring WS (2006) Management of lithium toxicity. Toxicol Rev 25:221–230
21. Sproule BA, Hardy BG, Shulman KI (2000) Differential pharmacokinetics of lithium in elderly patients. Drugs Aging 16:165–177
22. Balit CR, Daly FFS, Little M, Murray L (2006) Oral methotrexate overdose. Clin Toxicol 44:1
23. Sanford M, Plosker GL (2008) Dabigatran etexilate. Drugs 68:1699–1709
24. Goldblum R (1993) Therapy of rheumatoid arthritis with mycophenolate mofetil. Clin Exp Rheumatol 11(Suppl 8):S117–S119
25. Watson CP, Vernich L, Chipman M, Reed K (1998) Nortriptyline versus amitriptyline in postherpetic neuralgia: a randomized trial. Neurology 51:1166–1171
26. Tashkin DP, Brik A, Gong H Jr (1987) Cetirizine inhibition of histamine-induced bronchospasm. Ann Allergy 59:49–52
27. Manyike PT, Kharasch ED, Kalhorn TF, Slattery JT (2000) Contribution of CYP2E1 and CYP3A to acetaminophen reactive metabolite formation. Clin Pharmacol Ther 67:275–282
28. Schmidt LE, Dalhoff K, Poulsen HE (2002) Acute versus chronic alcohol consumption in acetaminophen-induced hepatotoxicity. Hepatology 35:876–882
29. Thummel KE, Slattery JT, Ro H, Chien JY, Nelson SD, Lown KE, Watkins PB (2000) Ethanol and production of the hepatotoxic metabolite of acetaminophen in healthy adults. Clin Pharmacol Ther 67:591–599
30. Levy G, Galinsky RE, Lin JH (1982) Pharmacokinetic consequences and toxicologic implications of endogenous cosubstrate depletion. Drug Metab Rev 13:1009–1020
31. Gelotte CK, Auiler JF, Lynch JM, Temple AR, Slattery JT (2007) Disposition of acetaminophen at 4, 6, and 8 g/day for 3 days in healthy young adults. Clin Pharmacol Ther 81:840–848
32. Tirona RG, Kim RB (2002) Pharmacogenomics of organic anion-transporting polypeptides (OATP). Adv Drug Deliv Rev 54:1343–1352
33. Levy G, Tsuchiya T (1972) Salicylate accumulation kinetics in man. N Engl J Med 287:430–432
34. Done AK (1960) Salicylate intoxication. Significance of measurements of salicylate in blood in cases of acute ingestion. Pediatrics 26:800–807
35. Cumming G, Dukes DC, Widdowson G (1964) Alkaline diuresis in treatment of aspirin poisoning. Br Med J 2:1033–1036
36. Rosenzweig P, Canal M, Patat A, Bergougnan L, Zieleniuk I, Bianchetti G (2002) A review of the pharmacokinetics, tolerability and pharmacodynamics of amisulpride in healthy volunteers. Hum Psychopharmacol 17:1–13
Chapter 13
Modeling of Absorption
Walter S. Woltosz, Michael B. Bolger, and Viera Lukacova
Abstract
Absorption takes place when a compound enters an organism, which occurs as soon as the molecules enter
the first cellular bilayer(s) in the tissue(s) to which it is exposed. At that point, the compound is no longer
part of the environment (which includes the alimentary canal for oral exposure), but has become part of the
organism. If absorption is prevented or limited, then toxicological effects are also prevented or limited.
Thus, modeling absorption is the first step in simulating/predicting potential toxicological effects. Simula-
tion software used to model absorption of compounds of various types has advanced considerably over the
past 15 years. There can be strong interactions between absorption and pharmacokinetics (PK), requiring
state-of-the-art simulation programs that combine absorption with either compartmental
pharmacokinetics or physiologically based pharmacokinetics (PBPK). Pharmacodynamic (PD) models for
therapeutic and adverse effects are also often linked to the absorption and PK simulations, providing PK/
PD or PBPK/PD capabilities in a single package. These programs simulate the interactions among a variety
of factors including the physicochemical properties of the molecule of interest, the physiologies of the
organisms, and in some cases, environmental factors, to produce estimates of the time course of absorption
and disposition of both toxic and nontoxic substances, as well as their pharmacodynamic effects.
1. Introduction
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_13, © Springer Science+Business Media, LLC 2012
314 W.S. Woltosz et al.
where Vdonor is the volume of fluid on the donor side of the mem-
brane, Cdonor is the concentration on the donor side, and Cacceptor is
the concentration on the acceptor side of the membrane. Notice
that the product Vdonor Cdonor is equal to Msoln in (1). Notice also
that if the two concentration terms become equal, absorption stops.
In fact, if Cacceptor becomes greater than Cdonor, the absorption rate
would be negative and molecules would move back into the donor
fluid. The presence of transporter proteins can override the concen-
tration gradient and result in molecules moving from low concen-
tration to high concentration against the gradient. Modern
simulation software includes the ability to model the effects of
transporters that both help (influx) and hinder (efflux) absorption
of molecules.
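The behaviour described above (absorption stopping when the two concentrations are equal and reversing when the acceptor concentration exceeds the donor concentration) can be sketched for purely passive transport. The equations referred to as (1) and (2) are not reproduced in this excerpt, so the form used below, a rate equal to ka multiplied by Vdonor and by the concentration difference, is an assumption inferred from the surrounding description; transporter effects are not modeled, and the function name and parameter values are hypothetical.

```python
def absorption_rate(ka, v_donor, c_donor, c_acceptor):
    """Net passive absorption rate (mass/time).

    Assumed form: ka * Vdonor * (Cdonor - Cacceptor). Positive values mean
    net movement into the acceptor side; negative values mean net movement
    back into the donor fluid.
    """
    return ka * v_donor * (c_donor - c_acceptor)


ka = 0.5        # 1/h, hypothetical absorption rate coefficient
v_donor = 0.25  # L, hypothetical donor fluid volume
c_donor = 2.0   # mg/L, hypothetical donor concentration

for c_acceptor in (0.0, 2.0, 3.0):
    rate = absorption_rate(ka, v_donor, c_donor, c_acceptor)
    print(f"C_acceptor = {c_acceptor:.1f} mg/L -> rate = {rate:+.3f} mg/h")
```

With an acceptor concentration of zero, the expression reduces to ka multiplied by the dissolved mass on the donor side (Vdonor × Cdonor), matching the statement that this product equals Msoln.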
The absorption rate coefficient, ka, in (1) and (2) is influenced
by different factors depending on the tissue into which absorption
is taking place. The value of ka to input into a simulation can be
obtained from in vitro cell culture or artificial membrane experi-
ments, from a variety of animal in situ experiments, or from an
in silico prediction based on the molecular structure (18).
The most widely used absorption model for intestinal absorp-
tion in the pharmaceutical industry is the one incorporated into
GastroPlus and known as the advanced compartmental absorption
and transit (ACAT™) model (15). This model is based on the
original compartmental absorption and transit (CAT) model devel-
oped by Yu (18). Figure 2 shows a diagram of the ACAT model in
GastroPlus. A total of nine compartments are used to represent the
gastrointestinal tract for humans and animals: stomach, six small
intestine compartments, caecum, and colon. Each of these com-
partments will have a mean transit time, pH, permeability,
13 Modeling of Absorption 317
2. Materials
3. Methods
3.1. Fitting Model Parameters
Regardless of the quantity and quality of data available during
software development, fitting of one or more model parameters will be
required in some, if not most, instances. Absorption simulation
software should provide for robust numerical optimization of such
parameters from as many types of experimental data as possible. The user
should be provided a choice of optimization algorithms, objective
functions, and weighting functions and should be able to set
constraints on both fitted (optimized) parameters and on various results
of the absorption simulation, such as maximum concentration in
plasma (Cmax) or other tissues, time of maximum concentration
(Tmax), area under the plasma concentration–time curve (AUC),
fraction absorbed, and fraction bioavailable.
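The ingredients listed above (an objective function, a weighting function, and constraints on the fitted parameter) can be illustrated with a deliberately simple sketch. It fits a single absorption rate constant to synthetic data using a one-compartment model with first-order input, a weighted sum of squared residuals, and a bounded grid search. All of these are stand-in assumptions chosen for brevity; they do not describe how GastroPlus or any other package performs optimization, and every name and value is hypothetical.

```python
import math


def concentration(t, dose, ka, cl, v):
    """One-compartment model with first-order input.

    Assumes complete absorption and ka != cl / v.
    """
    k = cl / v
    return dose * ka / (v * (ka - k)) * (math.exp(-k * t) - math.exp(-ka * t))


def weighted_sse(ka, times, observed, dose, cl, v, weight):
    """Objective function: weighted sum of squared residuals."""
    return sum(
        weight(obs) * (obs - concentration(t, dose, ka, cl, v)) ** 2
        for t, obs in zip(times, observed)
    )


def fit_ka(times, observed, dose, cl, v, lower, upper,
           weight=lambda obs: 1.0, steps=1500):
    """Grid search for ka under the constraint lower <= ka <= upper."""
    best_ka, best_sse = None, math.inf
    for i in range(steps + 1):
        ka = lower + (upper - lower) * i / steps
        if abs(ka - cl / v) < 1e-9:
            continue  # the closed form is undefined when ka equals cl / v
        sse = weighted_sse(ka, times, observed, dose, cl, v, weight)
        if sse < best_sse:
            best_ka, best_sse = ka, sse
    return best_ka


# Synthetic, noise-free "observations" generated with ka = 1.2 /h.
times = [0.5, 1.0, 2.0, 4.0, 8.0]
observed = [concentration(t, 100.0, 1.2, 10.0, 50.0) for t in times]

uniform = fit_ka(times, observed, 100.0, 10.0, 50.0, lower=0.5, upper=2.0)
proportional = fit_ka(times, observed, 100.0, 10.0, 50.0, lower=0.5, upper=2.0,
                      weight=lambda obs: 1.0 / obs ** 2)
print(f"uniform weighting:      ka = {uniform:.3f} /h")
print(f"1/observed^2 weighting: ka = {proportional:.3f} /h")
```

With noise-free data both weighting schemes recover the same value; with real data the choice of weighting changes which observations dominate the fit, which is why offering that choice to the user matters.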
Scaling data from in vitro experiments or in vivo measurements
in different species to human is a frequent challenge. For example,
permeability in the gastrointestinal tract of rat or dog can be
significantly different from that in human, even for compounds
that obey simple passive diffusion (i.e., no transporters involved).
Rat permeabilities tend to be around 3–4 times lower than human
but are very well correlated after being scaled, while dog
permeabilities tend to be around four times higher. These are gross
approximations, however, and the actual reported ranges are much wider,
with rat permeabilities as much as 15 times lower than human
(31). It is also important to keep in mind the relevance of in vitro
values to the in vivo situation. For example, aqueous solubility will
generally not describe adequately dissolution of a compound in the
gastrointestinal tract (14). The concentration of bile salts in vivo
needs to be accounted for (32), together with the fact that it
changes in different regions of the gastrointestinal tract as well as
in relation to food intake (33). Of course, the effect of bile salts on
[Figure: legend entries Subject A, Subject B, Subject C, Average]
[Figure: legend entries NoGFJ, GFJ]
4. Examples
Fig. 6. Propranolol plasma concentration–time simulation results (line = simulation, squares = observations).
Fig. 8. Plasma concentration–time profiles for alprazolam (ALP) and theophylline (TPL) in four rats from Metsugi. Reprinted
from (38) with permission from Springer.
5. Notes
Our rule is that if you have to change the model parameters for different
doses, you don’t have the right model; you need one where the
dose amount is automatically accounted for so that the same model
can be used for all doses.
Failing to have a simulation plan. What are the goals of the absorp-
tion/PK simulation study? What are the next decisions that need to
be made to take the project forward? How are absorption/PK
simulation results expected to affect those decisions?
Examples of the purposes for running absorption/PK simulations include:
Analyzing animal data to assess a drug’s behavior.
Estimating first dose in human.
Fitting models to try to understand unusual observations.
Testing theories to decide what steps to take next in animal or human studies.
Performing in vitro–in vivo correlations.
Each of these can involve different approaches to the simulation
study. For example, if the goal is to develop an in vitro–in vivo
correlation for a controlled release formulation, the first step should
be to develop the pharmacokinetic and absorption models from
data for iv and immediate release oral doses. Once those model
parameters are set, then the only remaining factor for the con-
trolled release formulation should be how it releases in vivo,
which might be quite different from the dissolution–time data
from an in vitro experiment. With modern IVIVC methods, the
in vivo release can be fitted (“deconvoluted”) to best match the
simulated Cp–time curve to the observed Cp–time data. This
allows direct comparison of the in vivo and in vitro release/disso-
lution–time profiles. A study of this type would have a different
simulation plan than one designed to estimate first dose in human.
Always develop a plan for what the simulation study is intended
to accomplish, and always organize and examine the data prior to
running simulations. It doesn’t take long, and you will save time
and frustration later.
Incorrect or incomplete inputs. Simulation software is not
intelligent; it does what you tell it to do. If you input water solubility
at 25 °C, then you are simulating a gastrointestinal tract filled with
25 °C water, so don’t blame the software if the solubility is too low
and absorption is less than observed. If you don’t provide a
complete picture of ionization (all pKas), then solubility-versus-pH and
logD-versus-pH will be wrong, and your results might differ
dramatically from what they should be. Figure 9 shows the difference in
predicted absorption for a low-solubility monoprotic acid using
pKa values of 4, 4.5, and 5; at this pH, even small changes are
critical. If you input Caco-2 permeability as human intestinal
permeability without correcting it, you will get very little absorp-
tion, because Caco-2 permeabilities are typically 2–3 orders of
magnitude lower than in vivo human permeability. If in vitro
metabolism measurements are not properly scaled to the simulated
species, body size, and enzyme expression levels, then the software
will simulate something different. The old computer adage “gar-
bage in—garbage out” applies!
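The sensitivity of solubility to the pKa input, noted above for a low-solubility monoprotic acid with pKa values of 4, 4.5, and 5, can be illustrated with the Henderson–Hasselbalch relationship. This is a minimal sketch that ignores salt solubility limits and bile salt solubilization; the intrinsic solubility value and the function name are hypothetical.

```python
def monoprotic_acid_solubility(intrinsic_solubility, pka, ph):
    """Total solubility of a monoprotic acid at a given pH.

    Henderson-Hasselbalch form: S = S0 * (1 + 10 ** (pH - pKa)), where S0
    is the solubility of the un-ionized species.
    """
    return intrinsic_solubility * (1.0 + 10.0 ** (ph - pka))


S0 = 0.01  # mg/mL, hypothetical intrinsic solubility
for pka in (4.0, 4.5, 5.0):
    row = ", ".join(
        f"pH {ph:.1f}: {monoprotic_acid_solubility(S0, pka, ph):.3f}"
        for ph in (4.0, 5.0, 6.0)
    )
    print(f"pKa {pka:.1f} -> {row} (mg/mL)")
```

At pH 5, moving the pKa from 5 to 4 raises the predicted solubility from 2 to 11 times the intrinsic value, which shows how a one-unit error in pKa can propagate into a large error in simulated dissolution and absorption.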
Inputs to absorption/PK simulation software should represent
in vivo conditions to the maximum possible extent. This will include,
among others, ionization constants (pKas—ALL of them for multi-
protic compounds!), solubility versus pH in media that best repre-
sent in vivo conditions, log P (or log D at a specified pH—be sure
you understand the difference!), permeability versus region in the
intestinal tract, plasma protein binding, blood–plasma concentration
ratio, enzyme and transporter expression levels in various tissues, and
Vmax and Km values for metabolism and transport. Generally, Vmax
values for transporters will not be available and will need to be fitted;
however, Km values can often be gleaned from in vitro data. For oral
doses, in addition to physiological and physicochemical inputs,
the simulation program will also need an accurate description of
the formulation(s) to be simulated—particle size distributions,
Fig. 10. Nonlinear dose-dependence of valacyclovir. Reprinted from (39) with permission from Macmillan Publishers Ltd.
Fig. 11. Degradation of valacyclovir as a function of pH. Reprinted from (40) with
permission from John Wiley & Sons, Inc.
Fig. 12. Concentration dependence of valacyclovir effective permeability (Km = 1.22 mM).
Reprinted from (40) with permission from John Wiley & Sons, Inc.
tool can cause more damage than good, so if the user is not
provided appropriate time and resources to be good at what he/
she is expected to do, then the organization should consider out-
sourcing for the required expertise. We have seen all too often that
inexperienced users blame the software when things are going awry,
only to discover that inputs did not represent reality, or the
approach used modeling methods that were too simplified for the
problem at hand.
References
1. Swaan PW, Marks GJ, Ryan FM et al (1994) Determination of transport rates for arginine and acetaminophen in rabbit intestinal tissues in vitro. Pharm Res 11(2):283–287
2. Slattery JT, Levy G (1979) Acetaminophen kinetics in acutely poisoned patients. Clin Pharmacol Ther 25(2):184–195
3. Clements JA, Heading RC, Nimmo WS et al (1978) Kinetics of acetaminophen absorption and gastric emptying in man. Clin Pharmacol Ther 24(4):420–431
4. Hogben CAM, Tocco DJ, Brodie BB et al (1959) On the mechanism of intestinal absorption of drugs. J Pharmacol Exp Ther 125:275–282
5. Tubic M, Wagner D, Spahn-Langguth H et al (2006) In silico modeling of non-linear drug absorption for the P-gp substrate talinolol and of consequences for the resulting pharmacodynamic effect. Pharm Res 23(8):1712–1720
6. Bolger MB, Lukacova V, Woltosz WS (2009) Simulations of the nonlinear dose dependence for substrates of influx and efflux transporters in the human intestine. AAPS J 11(2):353–363
7. Swaan PW (1998) Recent advances in intestinal macromolecular drug delivery via receptor-mediated transport pathways. Pharm Res 15(6):826–834
8. Palm K, Luthman K, Ros J et al (1999) Effect of molecular charge on intestinal epithelial drug transport: pH-dependent transport of cationic drugs. J Pharmacol Exp Ther 291(2):435–443
9. Adson A, Raub TJ, Burton PS et al (1994) Quantitative approaches to delineate paracellular diffusion in cultured epithelial cell monolayers. J Pharm Sci 83(11):1529–1536
10. Schiller C, Frohlich CP, Giessmann T et al (2005) Intestinal fluid volumes and transit of dosage forms as assessed by magnetic resonance imaging. Aliment Pharmacol Ther 22:971–979
11. Wilson JP (1967) Surface area of the small intestine in man. Gut 8:618–621
12. Fordtran JS, Rector FC Jr, Ewton MF et al (1965) Permeability characteristics of the human small intestine. J Clin Invest 44(12):1935–1944
13. Billich CO, Levitan R (1969) Effects of sodium concentration and osmolality on water and electrolyte absorption from the intact human colon. J Clin Invest 48(7):1336–1347
14. Parrott N, Lukacova V, Fraczkiewicz G et al (2009) Predicting pharmacokinetics of drugs using physiologically based modeling-application to food effects. AAPS J 11(1):45–53
15. Agoram B, Woltosz WS, Bolger MB (2001) Predicting the impact of physiological and biochemical processes on oral drug bioavailability. Adv Drug Deliv Rev 50(Suppl 1):S41–S67
16. Bolger MB, Agoram B, Fraczkiewicz R et al (2003) Simulation of absorption, metabolism, and bioavailability. In: Waterbeemd HVD, Lennernäs H, Artursson P (eds) Drug bioavailability. Estimation of solubility, permeability and bioavailability. Wiley, New York
17. Avdeef A (2001) Physicochemical profiling (solubility, permeability and charge state). Curr Top Med Chem 1(4):277–351
18. Yu LX, Amidon GL (1999) A compartmental absorption and transit model for estimating oral drug absorption. Int J Pharm 186(2):119–125
19. Qiu Y, Kuo CH, Zappi ME (2001) Performance and simulation of ozone absorption and reactions in a stirred-tank reactor. Environ Sci Technol 35(1):209–215
20. Bogdanffy MS, Mathison BH, Kuykendall JR et al (1997) Critical factors in assessing risk from exposure to nasal carcinogens. Mutat Res 380(1–2):125–141
21. Fasano WJ, McDougal JN (2008) In vitro dermal absorption rate testing of certain chemicals of interest to the Occupational Safety and Health Administration: summary and evaluation of USEPA’s mandated testing. Regul Toxicol Pharmacol 51(2):181–194
22. Nohynek GJ, Dufour EK, Roberts MS (2008) Nanotechnology, cosmetics and the skin: is there a health risk? Skin Pharmacol Physiol 21(3):136–149
23. Mansour SA, Gad MF (2010) Risk assessment of pesticides and heavy metals contaminants in vegetables: a novel bioassay method using Daphnia magna Straus. Food Chem Toxicol 48(1):377–389
24. Bohus E, Coen M, Keun HC et al (2008) Temporal metabonomic modeling of l-arginine-induced exocrine pancreatitis. J Proteome Res 7(10):4435–4445
25. Andersen ME, Krishnan K (1994) Physiologically based pharmacokinetics and cancer risk assessment. Environ Health Perspect 102(Suppl 1):103–108
26. Dobrev ID, Andersen ME, Yang RSH (2002) In silico toxicology: simulating interaction thresholds for human exposure to mixtures of trichloroethylene, tetrachloroethylene, and 1,1,1-trichloroethane. Environ Health Perspect 110:1031–1039
27. Vinegar A, Jepson GW, Cisneros M et al (2000) Setting safe acute exposure limits for halon replacement chemicals using physiologically based pharmacokinetic modeling. Inhal Toxicol 12(8):751–763
28. Rao HV, Ginsberg GL (1997) A physiologically-based pharmacokinetic model assessment of methyl t-butyl ether in groundwater for bathing and showering determination. Risk Anal 17(5):583–598
29. Yang Y, Xu X, Georgopoulos P (2010) A Bayesian population PBPK model for multiroute chloroform exposure. J Expo Sci Environ Epidemiol 20(4):326–341
30. Campbell A (2009) Development of PBPK model of molinate and molinate sulfoxide in rats and humans. Regul Toxicol Pharmacol 53(3):195–204
31. Fagerholm U, Johansson M, Lennernas H (1996) Comparison between permeability coefficients in rat and human jejunum. Pharm Res 13(9):1336–1342
32. Sugano K (2009) Computational oral absorption simulation for Low-solubility compounds. Chem Biodivers 6:2014–2029
33. Porter CJH, Trevaskis NL, Charman WN (2007) Lipids and lipid-based formulations: optimizing the oral delivery of lipophilic drugs. Nat Rev Drug Discov 6:231–248
34. Davies NM, Feddah MR (2003) A novel method for assessing dissolution of aerosol inhaler products. Int J Pharm 255(1–2):175–187
35. Son YJ, McConville JT (2009) Development of a standardized dissolution test method for inhaled pharmaceutical formulations. Int J Pharm 2009(382):1–2
36. Kupferschmidt HH, Fattinger KE, Ha HR et al (1998) Grapefruit juice enhances the bioavailability of the HIV protease inhibitor saquinavir in man. Br J Clin Pharmacol 45(4):355–359
37. Chilvers MA, O’Callaghan C (2000) Local mucociliary defence mechanisms. Paediatr Respir Rev 1(1):27–34
38. Metsugi Y, Miyaji Y, Ogawara K et al (2008) Appearance of double peaks in plasma concentration–time profile after oral administration depends on gastric emptying profile and weight function. Pharm Res 25(4):886–895
39. Weller S, Blum MR, Doucette M et al (1993) Pharmacokinetics of the acyclovir pro-drug valacyclovir after escalating single- and multiple-dose administration to normal volunteers. Clin Pharmacol Ther 54(6):595–605
40. Sinko PJ, Balimane PV (1998) Carrier-mediated intestinal absorption of valacyclovir, the L-valyl ester prodrug of acyclovir: 1. Interactions with peptides, organic anions and organic cations in rats. Biopharm Drug Dispos 19:209–217
41. Giacomini KM, Huang SM, Tweedie DJ et al (2010) Membrane transporters in drug development. Nat Rev Drug Discov 9(3):215–236
Chapter 14
Abstract
In silico tools developed specifically for predicting pharmacokinetic parameters are of particular interest
to the pharmaceutical industry because they allow inappropriate molecules to be discarded at an early
stage of drug development, saving vital resources and valuable time.
The ultimate goal of in silico models of absorption, distribution, metabolism, and excretion (ADME)
properties is the accurate prediction of the in vivo pharmacokinetics of a potential drug molecule in man
while it exists only as a virtual structure. This chapter briefly describes various types of in silico models
developed for the prediction of ADME parameters such as oral absorption, bioavailability, plasma protein
binding, tissue distribution, clearance, and half-life.
1. Introduction
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_14, © Springer Science+Business Media, LLC 2012
338 A.K. Madan and H. Dureja
Fig. 1. Reasons for failure in drug development. Reproduced from ref. 6 with permission
from Elsevier Limited, UK.
Fig. 2. The linkage between the data generation, databases, and model building.
Reproduced from ref. 14 with permission from Elsevier Limited, UK.
than are in vitro tests, with the distinct advantage that far less
investment in technology, resources, and time is needed (17). An
amalgamation of in vitro experiments and in silico modeling will
dramatically increase the insight and knowledge about the relevant
physiological and pharmacological processes in drug discovery (3).
The linkage between data generation and model building has been
illustrated in Fig. 2 (14).
For data modeling, quantitative structure–property relationship
(QSPR) approaches are generally applied. Based on appropriate
descriptors, QSPR methods ranging from simple linear regression to
modern multivariate analysis and machine-learning techniques are
now extensively applied to the analysis of ADME data. Data
modeling can be applied efficiently to large numbers of molecules
but requires a significant quantity of high-quality data to derive a
relationship between the structures and the modeled property
(11). Quantitative structure–activity relationship (QSAR) modeling
is a well-known and established discipline in which physicochemical
and molecular descriptors are correlated with the bioassay drug
concentrations eliciting a standard pharmacological response, such
as EC50 or IC50. The extension of the QSAR approach to
pharmacokinetic parameters is similarly referred to as quantitative
structure–pharmacokinetic relationship (QSPKR or QSPkR) modeling.
A less widely used technique, albeit just as empirical as the
QSAR approach, is the direct mapping of molecular descriptors
to the time course of plasma levels following administration of
a drug by various routes (18). Pharmacokinetics is the study of the
time course of a drug within the body, incorporating the processes
of ADME. Pharmacokinetics can be defined as the study of the time
14 Prediction of Pharmacokinetic Parameters 341
2. Materials and Methods
The first phase of ADME computational models began in the
1960s with the development of classical QSAR by Hansch, who
showed that quantitative relationships could be developed between
the lipophilicity of drugs and metabolic parameters such as
microsomal hydroxylation, demethylation, CYP450–CYP420 conversion,
and duration of drug action (20). The simplest ADME-related
filter for shortlisting potential drug molecules is arguably the
“rule of five” proposed by Lipinski et al. in 1997 (21). According to
Lipinski, it is much easier to optimize pharmacokinetic properties
at the initial stage and to optimize receptor binding affinity at a later
stage of the drug discovery process (22). Although the “rule of five”
may be too simple an approach, it has stimulated considerable
interest in the development of fast, generally applicable
filters for ADME. It is now regarded as a “rule of thumb”
or guide rather than a definitive cutoff. However, the “rule of five” and
other general rules simply lay down the minimum criteria for a
molecule to be drug-like. The values of the desired characteristics
proposed by various researchers in drug-like rules are compiled
in Table 1 (21, 23, 24). It is relatively easy for a molecule to
satisfy the “rule of five,” but doing so gives no certainty that it will
lead to a drug. As a matter of fact, 68.7% of the 2.4 million compounds
in the Available Chemicals Directory (ACD) Screening Database and 55%
of the 240,000 compounds in ACD have no violation of the
“rule of five” at all (11). Therefore, much more stringent criteria need
to be laid down to discriminate drug-like molecules from others.
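As an illustration of how such a filter works, the following minimal Python sketch counts
violations of the four commonly cited “rule of five” criteria (molecular weight no greater
than 500, calculated log P no greater than 5, no more than 5 hydrogen-bond donors, and no
more than 10 hydrogen-bond acceptors). The criteria are stated here from the general
literature on Lipinski's rule rather than from Table 1. The function name, the descriptor
container, and the example values are hypothetical, and the descriptor values are assumed
to have been computed beforehand by a cheminformatics package.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one molecule (illustrative container)."""
    mol_weight: float      # molecular weight, Da
    log_p: float           # calculated octanol/water log P
    h_bond_donors: int     # number of hydrogen-bond donors
    h_bond_acceptors: int  # number of hydrogen-bond acceptors


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    checks = [
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


# Hypothetical descriptor values, for illustration only.
small_polar = Descriptors(mol_weight=180.2, log_p=1.2,
                          h_bond_donors=1, h_bond_acceptors=4)
large_lipophilic = Descriptors(mol_weight=720.9, log_p=6.3,
                               h_bond_donors=2, h_bond_acceptors=12)

for name, d in [("small_polar", small_polar),
                ("large_lipophilic", large_lipophilic)]:
    print(name, "violations:", rule_of_five_violations(d))
```

Because most screening compounds pass such a filter, a violation count is best treated as
a coarse screen rather than as a predictor of drug-likeness.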
Various QSAR approaches ranging from simple linear regression
to modern multivariate analysis techniques are now being applied to
the analysis of ADME data. Data mining and machine-learning tech-
niques which were originally developed and used in other fields are
now being successfully used for this purpose. These techniques
Table 1
Drug-like rules proposed by various researchers
Characteristics of drugs
2.1. Models for Prediction of ADME Parameters
Various in silico models have been developed for the prediction of
pharmacokinetic parameters, which include oral absorption,
bioavailability, plasma protein binding (PPB), tissue distribution,
clearance, half-life, etc. Some of these predictive models have been
exemplified in Table 3.
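Many of the models listed in Table 3 are regression models that relate one or more
molecular descriptors to a pharmacokinetic parameter. A minimal sketch of the simplest
case, an ordinary least-squares fit of a single descriptor, is given below. The descriptor,
the response values, and the function name are hypothetical and are not taken from any of
the cited studies.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Made-up training data: a descriptor (e.g., calculated log P) and a
# log-transformed pharmacokinetic parameter for six hypothetical compounds.
descriptor = [0.5, 1.0, 1.8, 2.4, 3.1, 3.9]
log_param = [0.21, 0.42, 0.70, 0.98, 1.19, 1.55]

slope, intercept, r2 = fit_line(descriptor, log_param)
print(f"slope={slope:.3f} intercept={intercept:.3f} r2={r2:.3f}")

# Predict the parameter for a new compound from its descriptor value.
new_descriptor = 2.0
print("predicted:", round(slope * new_descriptor + intercept, 3))
```

A fit of this kind says nothing about predictive ability on its own; the data sets in
Table 3 are therefore usually split into training and test (or validation) sets.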
Absorption: The first step in the drug absorption process is the
disintegration of the tablet or capsule and the subsequent dissolution
of the drug. Poor biopharmaceutical properties, i.e., poor aqueous
solubility and a slow dissolution rate, can lead to poor and delayed oral
absorption and hence low oral bioavailability. Because low
solubility is detrimental to good and complete oral absorption,
early consideration of this property is of great
significance in the drug discovery process (9). Prediction of intestinal
absorption is a major goal in the design, optimization, and selection
of suitable candidates for development as oral drugs (28). Most of
the computational approaches currently used to predict
absorption are based on the assumption that absorption is passive
and can be predicted from various molecular descriptors of the
compound. No account is taken of active transport processes,
including both uptake and efflux transporters. At present, it is not
known how many compounds are actively transported
in the gut (2). Drug absorption following oral administration is
difficult to predict because of complex drug-specific parameters
and physiological processes. These include drug release from the
dosage form and dissolution, aqueous solubility, gastrointestinal
(GI) motility and contents, pH, GI blood flow, active or passive
transport systems, and pre-systemic and first-pass metabolism (18).
Permeability: Human intestinal permeability (important for the
absorption of oral drugs) and blood–brain barrier (BBB) permeability
(important for the distribution of CNS-active agents and the toxicity
of non-CNS drugs) constitute important pharmacokinetic parameters (93).
The hydrogen-bonding capacity of a drug solute is generally
recognized as an important determinant of permeability. To
penetrate a membrane, a drug molecule needs to break its
hydrogen bonds with the aqueous environment. The more
potential hydrogen bonds a molecule has, the higher the
bond-breaking cost. As a consequence, high hydrogen-bonding
potential is an unfavorable property that is often related to reduced
permeability and absorption (9). Caco-2 cell permeability data still
constitute a major target property for modeling, in spite of
substantial inter- and even intra-laboratory variability in the data (94).
The BBB represents a significant barrier towards entry of drugs into
Table 2
Some examples of commercially available software/programs for prediction of ADME parameters
Table 3
Examples of predictive models for ADME
Pharmacokinetic parameter | Data set | Statistical/modeling technique | References
Oral bioavailability | 188 Non-congeneric organic medicinals | Fuzzy adaptive least square | (26)
Intestinal drug absorption | 6 β-Adrenoreceptor antagonists | Regression | (27)
Human intestinal absorption | 67 Drugs and drug-like compounds (training set); 9 drugs (cross-validation set); and 10 drugs (external cross-validation set) | Artificial neural network | (28)
Tissue distribution | 9 n-Alkyl-5-ethyl barbituric acids | Regression | (29)
Human jejunal permeability | 22 Structurally diverse compounds (training set) and 34 compounds (external validation set) | Multivariate data analysis | (30)
Intestinal absorption | 20 Drugs (training set); 74 drugs (prediction set) | Multilinear regression | (31)
Intestinal membrane permeability | 10 Non-peptide endothelin receptor antagonists | Rule of five, molecular mechanics, and quantum mechanics | (32)
Intestinal absorption | 20 Molecules | Artificial neural network based correlation (hashkey) model | (33)
Intestinal absorption | 234 Compounds | Statistical pattern recognition model | (34)
Oral bioavailability | 591 Structurally diverse compounds | Step-wise regression | (35)
Human intestinal absorption | 38 Drugs (training set) and 131 drugs (prediction set) | Multilinear regression | (36)
Plasma protein binding | 95 Diverse compounds | Genetic function approximation | (37)
Oral bioavailability | 21 Drugs and drug candidates | Graphical classification model | (38)
Oral bioavailability | 1,100 Drug candidates | Regression | (39)
Clearance, volume of distribution, fractal clearance, and fractal volume | 272 Structurally unrelated drugs | Principal component analysis and projection to latent structures | (40)
Human clearance | 68 Drugs | Multiple linear regression, partial least squares method, and artificial neural network | (41)
Blood–brain barrier permeability | 324 Drugs and drug-like molecules | Neural network and support vector machine | (42)
Metabolic stability | 631 Diverse chemicals (training set) and 107 chemicals (validation set) | k-Nearest neighbor | (43)
Human intestinal absorption | 1,000 Drug-like compounds | Recursive partitioning analyses | (44)
Aqueous solubility, plasma protein binding, and human volume of distribution at steady state | 202, 226, and 204 compounds (training set) and 442, 94, and 124 compounds (test set), respectively | Linear regression analysis | (45)
Clearance (oral) | 87 Drugs | Multivariate regression analyses, multiple linear regression, and partial least squares analysis | (46)
Half-life, renal, and total body clearance, fraction excreted in urine, volume of distribution, and fraction bound to plasma proteins | 20 Cephalosporins | Artificial neural network | (47)
Human intestinal absorption | 82 Compounds (training set) and 127 drugs (prediction set) and 109 drugs (test set) | Topological substructural approach (TOPS-MODE) | (48)
Clearances, fraction bound to plasma proteins and volume of distribution | 62 Structurally diverse compounds | Artificial neural network | (49)
Blood–brain barrier penetration | 150 Chemically diverse compounds | 4D-molecular similarity and cluster analysis | (50)
Oral absorption | 1,260 Compounds | Classification regression trees | (51)
Blood–brain barrier penetration | 415 Drugs | Logistic regression, linear discriminant analysis, k-nearest neighbor, decision tree, probabilistic neural network, and support vector machine | (52)
Human serum albumin affinity | 37 Structurally related interleukin-8 inhibitors | 3D-QSAR | (53)
Metabolism | 42 Derivatives | Comparative molecular field analysis | (54)
p-Glycoprotein inhibitors | Series of 1,4-dihydropyridines and pyridines | Forward inclusion coupled with multiple regression analysis and partial least square regression | (55)
Permeability | 20 Drugs | Partial least square method | (56)
Steady-state volume of distribution | 199 Compounds (human data) and 2,086 compounds (rat data) | Bayesian neural networks, classification and regression trees, and partial least squares | (57)
Oral drug absorption | 22 Structurally diverse drugs (training set) and 169 drugs (prediction set) | Partial least square analysis | (58)
Blood–brain barrier permeability | 191 Drugs (training set) and 50 drugs (test set) | Artificial neural network | (59)
Renal clearance | 130 Diverse compounds (training set) and 20 compounds (test set) | Principal component analysis and partial least squares analysis | (60)
Drug clearance | 398 Compounds (training set) and 105 compounds (validation set) | General regression neural network, support vector regression, and k-nearest neighbor | (61)
Blood–brain barrier permeability | 28 Structurally diverse compounds (training set); 31 compounds (validation set) and 31 compounds (cross-validation set) | Moving average analysis based classification models | (62, 63)
Human intestinal absorption | 480 Structurally diverse drug-like molecules (training set) and 98 molecules (test set) | Support vector machine based classification model | (64)
Human oral intestinal drug absorption | 164 Compounds (training set) and 24 compounds (test set) | Membrane-interaction QSAR analysis | (65)
Blood–brain barrier permeability | 136 Compounds | Approximate similarity matrices and partial least square | (66)
Plasma protein binding | 686 Compounds | Partial least square regression | (67)
Oral bioavailability | 30 Compounds | Regression | (68)
Oral bioavailability | 768 Chemical compounds | Correlation | (69)
Oral bioavailability | 250 Structurally diverse molecules (training set) and 52 molecules (test set) | Hologram-QSAR | (70)
Metabolism, tissue distribution, bioavailability | 50 Structurally diverse compounds | Correlation | (71)
Human intestinal absorption | 455 Compounds (training set) and 193 compounds (test set) | Genetic function approximation technique | (72)
Blood–brain barrier penetration | 78 Compounds (training set) and 25 compounds (test set) | Multilinear regression | (73)
Blood–brain barrier permeability | 159 Compounds (training set) and 99 drugs (external test set-1) and 267 organic compounds (external test set-2) | k-Nearest neighbors and support vector machine | (74)
Oral clearance | 24 Compounds | Allometric approaches | (75)
Plasma protein binding and oral bioavailability | 692 Compounds | Support vector machine combined with genetic algorithm | (76)
Half-life, renal, and total body clearance, fraction excreted in urine, volume of distribution, and fraction bound to plasma proteins | 20 Cephalosporins | Random forest; decision tree; and moving average analysis | (77)
Tmax | 28 Structurally diverse antihistamines | Decision tree and moving average analysis | (78)
Blood–brain barrier permeability | 280 Compounds | Regression | (79)
Buccal permeability | 15 Drugs (training set) and 13 compounds (test set) | Multiple linear regression and maximum likelihood estimations | (80)
Human volume of distribution at steady state | 669 Drug compounds (training set) and 29 compounds (test set) | Linear and nonlinear statistical techniques, partial least squares, and random forest | (81)
Clearance | 20,000 Unique compounds | Bayesian classification and extended connectivity fingerprints | (82)
Hepatic clearance | 64 Drugs (training set) and 22 drugs (test set) | Multilinear regression analysis | (83)
Hepatic clearance | 33 Drugs | Multilinear regression analysis | (84)
Drug distribution | 93 Drugs | Partial least square and artificial neural network | (85)
Hepatic metabolic clearance | 27,697 Compounds | Binary classification model | (86)
Hepatic clearance | 50 Drugs | Multilinear regression analysis | (87)
Bioavailability | 75 Compounds | Cluster analysis | (88)
Blood–brain barrier penetration | 193 Compounds | Genetic approximation based regression model | (89)
Blood–brain barrier penetration and human intestinal absorption | 1,093 Compounds (for BBB) and 480 compounds (for HIA) | Substructure pattern recognition and support vector machine | (90)
Clearance (total) | 370 Compounds (training set) and 92 compounds (test set) | k-Nearest neighbors | (91)
Human hepatocyte intrinsic clearance | 71 Drugs (training set); 18 drugs (test set-1); and 112 drugs (test set-2) | Artificial neural network | (92)
3. Example(s)
Fig. 3. The decision tree for distinguishing low value (A) from high value (B). (1) t1/2; (2) CL; (3) CLR; (4) fe; (5) V; (6) fb
(A1, molecular connectivity topochemical index; A2, eccentric adjacency topochemical index; A5, eccentric connectivity
topochemical index; A11, eccentric adjacency index). Reproduced from ref. 77 with permission from Austrian Pharmacists’
Publishing House, Austria.
3.2. Prediction of Tmax of Antihistamines (78)
Models were developed for the prediction of a physicochemical property
(octanol/water partition coefficient, log P), a critical pharmacokinetic
parameter (time to reach the maximum level of drug in the
bloodstream, Tmax), and a toxicological property (lethal dose,
LD50) of structurally diverse antihistaminic compounds using decision
tree and moving average analysis. A decision tree was constructed
for each property to determine the significance of
topological descriptors. Single topological descriptor based models
were developed using moving average analysis. The decision tree learned
the information from the input data with an accuracy of >94% and
subsequently predicted the cross-validated (tenfold) data with an
accuracy of up to 71%. The classification of Tmax using the decision tree
is shown in Fig. 4. The accuracy of prediction of the single-index-based
models using moving average analysis varied from ~63 to 80% (78).
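The gap between the accuracy on the input data (>94%) and the tenfold cross-validated
accuracy (up to 71%) reflects how the two figures are obtained: the first is measured on
the data used to build the model, the second on compounds held out in turn. The sketch
below illustrates k-fold cross-validation for a deliberately simple single-descriptor
threshold classifier. It is not the decision tree or the moving average analysis of ref. 78,
and the descriptor values, class labels, and function names are hypothetical.

```python
def train_stump(xs, ys):
    """Pick the threshold on one descriptor that best separates the classes.

    The rule predicts "B" (high) when x > threshold, otherwise "A" (low).
    """
    candidates = sorted(set(xs))
    best_t, best_correct = candidates[0], -1
    for t in candidates:
        correct = sum(("B" if x > t else "A") == y for x, y in zip(xs, ys))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def accuracy(threshold, xs, ys):
    """Fraction of samples classified correctly by the threshold rule."""
    hits = sum(("B" if x > threshold else "A") == y for x, y in zip(xs, ys))
    return hits / len(xs)


def k_fold_accuracy(xs, ys, k):
    """Mean held-out accuracy over k folds (fold i holds out items i, i+k, ...)."""
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(xs), k))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        test_x = [x for i, x in enumerate(xs) if i in test_idx]
        test_y = [y for i, y in enumerate(ys) if i in test_idx]
        threshold = train_stump(train_x, train_y)
        scores.append(accuracy(threshold, test_x, test_y))
    return sum(scores) / k


# Hypothetical topological-index values and Tmax classes (A = low, B = high).
index_values = [1.1, 1.4, 1.9, 2.3, 2.6, 3.0, 3.2, 3.7, 4.1, 4.5,
                1.3, 2.0, 2.8, 3.4, 3.9, 1.7, 2.5, 3.1, 4.3, 2.2]
classes = ["A", "A", "A", "A", "B", "B", "A", "B", "B", "B",
           "A", "A", "B", "B", "B", "A", "A", "B", "B", "B"]

t_all = train_stump(index_values, classes)
print("training accuracy:", accuracy(t_all, index_values, classes))
print("10-fold CV accuracy:", k_fold_accuracy(index_values, classes, k=10))
```

The cross-validated figure is the more realistic estimate of how a model will perform on
compounds it has not seen.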
Fig. 4. A decision tree for distinguishing between low Tmax and high Tmax. A1, molecular connectivity topochemical index;
A2, eccentric adjacency topochemical index; A, low Tmax; B, high Tmax. Reproduced from ref. 78 with permission from
Inderscience Enterprises Limited, Switzerland.
4. Conclusion
In silico tools have not only accelerated the drug discovery process but
have also led to a significant reduction in time, animal sacrifice, and
expenditure. In silico tools developed specifically for the prediction of
ADME characteristics are of particular interest to the pharmaceutical
industry because they allow inappropriate molecules to be discarded
at an early stage of drug development, saving vital resources and
valuable time. A well-planned, systematic, and integrated in silico,
in vitro, and in vivo approach can discard inappropriate molecules at
an early stage and markedly accelerate the drug discovery process
at reduced cost.
References
1. Miteva MA, Violas S, Montes M et al (2006) FAF-drugs: free ADME/tox filtering of compound collections. Nucleic Acids Res 34:W738–W744
2. Boobis A, Gundert-Remy U, Kremers P et al (2002) In silico prediction of ADME and pharmacokinetics: report of an expert meeting organised by COST B15. Eur J Pharm Sci 17:183–193
3. Huisinga W, Telgmann R, Wulkow M (2006) The virtual lab approach to pharmacokinetics: design principles and concepts. Drug Discov Today 11:800–805
4. Hodgson J (2001) ADMET—turning chemicals into drugs. Nat Biotechnol 19:722–726
5. Grass GM, Sinko PJ (2002) Physiologically-based pharmacokinetic simulation modeling. Adv Drug Deliv Rev 54:433–451
6. Kennedy T (1997) Managing the drug discovery/development interface. Drug Disc Today 2:436–444
7. Spalding DJM, Harker AJ, Bayliss MK (2000) Combining high-throughput pharmacokinetic screens at the hits-to-leads stage of drug discovery. Drug Disc Today 5:S70–S76
8. Butina D, Segall MD, Frankcombe K (2002) Predicting ADME properties in silico: methods and models. Drug Disc Today 7:S83–S88
9. van Waterbeemd H, Gifford E (2003) ADMET in silico modelling: towards prediction paradise? Nat Rev Drug Disc 2:192–204
10. Hou T, Xu X (2004) Recent development and application of virtual screening in drug discovery: an overview. Curr Pharm Des 10:1011–1033
11. Hou T, Wang J, Zhang W et al (2006) Recent advances in computational prediction of drug absorption and permeability in drug discovery. Curr Med Chem 13:2653–2667
12. Li AP (2001) Screening for human ADME/Tox drug properties in drug discovery. Drug Disc Today 6:357–366
13. Paul Y, Dhake AS, Singh B (2009) In silico quantitative structure pharmacokinetic relationship modeling of quinolones: apparent volume of distribution. Asian J Pharm 3:202–207
14. Ekins S, Waller CL, Swaan PW et al (2000) Progress in predicting human ADME parameters in silico. J Pharm Toxicol Methods 44:251–272
15. Goodwin JT, Clark DE (2005) In silico predictions of blood–brain barrier penetration: considerations to “keep in mind”. J Pharm Exp Ther 315:477–483
16. Linnankoski J, Ranta V-P, Yliperttula M et al (2008) Passive oral drug absorption can be predicted more reliably by experimental than computational models—fact or myth. Eur J Pharm Sci 34:129–139
17. Modi S (2004) Positioning ADMET in silico tools in drug discovery. Drug Disc Today 9:14–15
18. Mager DE (2006) Quantitative structure–pharmacokinetic/pharmacodynamic relationships. Adv Drug Del Rev 58:1326–1356
19. Gunaratna C (2001) Drug metabolism and pharmacokinetics in drug discovery: a primer for bioanalytical chemists, part II. Curr Sep 19:87–92
20. Hansch C (1972) Quantitative relationships between lipophilic character and drug metabolism. Drug Metab Rev 1:1–14
21. Lipinski CA, Lombardo F, Dominy BW et al (1997) Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev 23:3–25
22. Lipinski CA (2000) Druglike properties and the causes of poor solubility and poor permeability. J Pharm Toxicol Methods 44:235–249
23. Ghose AK, Viswanadhan VN, Wendoloski JJ (1999) A knowledge-based approach in designing combinatorial or medicinal chemistry libraries for drug discovery. 1. A qualitative and quantitative characterization of known drug databases. J Comb Chem 1:55–68
24. Wenlock MC, Austin RP, Barton P, Davis AM, Leeson PD (2003) A comparison of physiochemical property profiles of development and marketed oral drugs. J Med Chem 46:1250–1256
25. Norinder U, Bergström CAS (2006) Prediction of ADMET properties. ChemMedChem 1:920–937
26. Hirono S, Nakagome I, Hirano H et al (1994) Non-congeneric structure–pharmacokinetic property correlation studies using fuzzy adaptive least-squares: oral bioavailability. Biol Pharm Bull 17:306–309
27. Palm K, Luthman K, Ungel AL et al (1996) Correlation of drug absorption with molecular surface properties. J Pharm Sci 85:32–39
28. Wessel MD, Jurs PC, Tolan JW et al (1998) Prediction of human intestinal absorption of drug compounds from molecular structure. J Chem Inf Comput Sci 38:726–735
29. Nestorov I, Aarons L, Rowland M (1998) Quantitative structure-pharmacokinetics relationships II. Mechanistically based model for the relationship between the tissue distribution parameters and the lipophilicity of the compounds. J Pharmacokinet Biopharm 26:521–545
30. Winiwarter S, Bonham NM, Ax F et al (1998) Correlation of human jejunal permeability (in vivo) of drugs with experimentally and theoretically derived parameters. A multivariate data analysis approach. J Med Chem 41:4939–4949
31. Clark DE (1999) Rapid calculation of polar molecular surface area and its application to the prediction of transport phenomena. 1. Prediction of intestinal absorption. J Pharm Sci 88:807–814
32. Stenberg P, Luthman K, Ellens H et al (1999) Prediction of the intestinal absorption of endothelin receptor antagonists using three theoretical methods of increasing complexity. Pharm Res 16:1520–1526
33. Ghuloum AM, Sage CR, Jain AN (1999) Molecular hashkeys: a novel method for molecular characterization and its application for predicting important pharmaceutical properties of molecules. J Med Chem 42:1739–1748
34. Egan WJ, Merz KM, Baldwin JJ (2000) Prediction of drug absorption using multivariate statistics. J Med Chem 43:3867–3877
35. Andrews CW, Bennett L, Yu LX (2000) Predicting human oral bioavailability of a compound: development of a novel quantitative structure–bioavailability relationship. Pharm Res 17:639–644
36. Zhao YH, Le J, Abraham MH et al (2001) Evaluation of human intestinal absorption data and subsequent derivation of a quantitative structure–activity relationship (QSAR) with the Abraham descriptors. J Pharm Sci 90:749–784
37. Colmenarejo G, Alvarez-Pedraglio A, Lavandera JL (2001) Chemoinformatic models to predict binding affinities to human serum albumin. J Med Chem 44:4370–4378
38. Mandagere AK, Thompson TN, Hwang KK (2002) A graphical model for estimating oral bioavailability of drugs in humans and other species from their Caco-2 permeability and in vitro liver enzyme metabolic stability rates. J Med Chem 45:304–311
39. Veber DF, Johnson SR, Cheng H-Y et al (2002) Molecular properties that influence the oral bioavailability of drug candidates. J Med Chem 45:2615–2623
40. Karalis V, Tsantili-Kakoulidou A, Macheras P (2002) Multivariate statistics of disposition pharmacokinetic parameters for structurally unrelated drugs used in therapeutics. Pharm Res 19:1827–1834
41. Wajima T, Fukumura K, Yano Y et al (2002) Prediction of human clearance from animal data and molecular structural parameters using multivariate regression analysis. J Pharm Sci 91:2489–2499
42. Doniger S, Hofmann T, Yeh J (2002) Predicting CNS permeability of drug molecules: comparison of neural network and support vector machine algorithms. J Comput Biol 9:849–864
43. Shen M, Xiao YD, Golbraikh A et al (2003) Development and validation of k-nearest-neighbor QSPR models of metabolic stability of drug candidates. J Med Chem 46:3013–3020
44. Zmuidinavicius D, Didziapetris R, Japertas P et al (2003) Classification structure–activity relations (C-SAR) in prediction of human intestinal absorption. J Pharm Sci 92:621–633
45. Lobell M, Sivarajah V (2003) In silico prediction of aqueous solubility, human plasma protein binding and volume of distribution of compounds from calculated pKa and AlogP98 values. Mol Divers 7:69–87
46. Wajima T, Fukumura K, Yano Y et al (2003) Prediction of human pharmacokinetics from animal data and molecular structural parameters using multivariate regression analysis: oral clearance. J Pharm Sci 92:2427–2440
47. Turner JV, Maddalena DJ, Cutler DJ et al (2003) Multiple pharmacokinetic parameter prediction for a series of cephalosporins. J Pharm Sci 92:552–559
48. Pérez MA, Sanz MB, Torres LR et al (2004) A topological sub-structural approach for predicting human intestinal absorption of drugs. Eur J Med Chem 39:905–916
49. Turner JV, Maddalena DJ, Cutler DJ (2004) Pharmacokinetic parameter prediction from drug structure using artificial neural networks. Int J Pharm 270:209–219
50. Pan D, Iyer M, Liu J et al (2004) Constructing optimum blood brain barrier QSAR models using a combination of 4D-molecular similarity measures and cluster analysis. J Chem Inf Comput Sci 44:2083–2098
51. Bai JP, Utis A, Crippen G et al (2004) Use of classification regression tree in predicting oral absorption in humans. J Chem Inf Comput Sci 44:2061–2069
52. Li H, Yap CW, Ung CY et al (2005) Effect of selection of molecular descriptors on the prediction of blood–brain barrier penetrating and nonpenetrating agents by statistical learning methods. J Chem Inf Model 45:1376–1384
53. Aureli L, Cruciani G, Cesta MC et al (2005) Predicting human serum albumin affinity of interleukin-8 (CXCL8) inhibitors by 3D-QSPR approach. J Med Chem 48:2469–2479
54. Rahnasto M, Raunio H, Poso A et al (2005) Quantitative structure–activity relationship analysis of inhibitors of the nicotine metabolizing CYP2A6 enzyme. J Med Chem 48:440–449
55. Zhou XF, Shao Q, Coburn RA et al (2005) Quantitative structure–activity relationship and quantitative structure–pharmacokinetics relationship of 1,4-dihydropyridines and pyridines as multidrug resistance modulators. Pharm Res 22:1989–1996
56. Jung SJ, Choi SO, Um SY et al (2006) Prediction of the permeability of drugs through study on quantitative structure–permeability relationship. J Pharm Biomed Anal 41:469–475
57. Gleeson MP, Waters NJ, Paine SW et al (2006) In silico human and rat Vss quantitative structure–activity relationship models. J Med Chem 49:1953–1963
58. Linnankoski J, Mäkelä JM, Ranta VP et al (2006) Computational prediction of oral drug absorption based on absorption rate constants in humans. J Med Chem 49:3674–3681
59. Garg P, Verma J (2006) In silico prediction of blood–brain barrier permeability: an artificial neural network model. J Chem Inf Model 46:289–297
60. Doddareddy MR, Cho YS, Koh HY et al (2006) In silico renal clearance model using classical Volsurf approach. J Chem Inf Model 46:1312–1320
61. Yap CW, Li ZR, Chen YZ (2006) Quantitative structure–pharmacokinetic relationships for drug clearance by using statistical learning methods. J Mol Graph Model 24:383–395
62. Dureja H, Madan AK (2006) Topochemical models for the prediction of permeability through blood brain barrier. Int J Pharm 323:27–33
63. Dureja H, Madan AK (2007) Validation of topochemical models for the prediction of permeability through blood brain barrier. Acta Pharm 57:451–467
64. Hou T, Wang J, Li Y (2007) ADME evaluation in drug discovery. 8. The prediction of human intestinal absorption by a support vector machine. J Chem Inf Model 47:2408–2415
65. Iyer M, Tseng YJ, Senese CL et al (2007) Prediction and mechanistic interpretation of human oral drug absorption using MI-QSAR analysis. Mol Pharm 4:218–231
66. Cuadrado MU, Ruiz IL, Gómez-Nieto MA (2007) QSAR models based on isomorphic and nonisomorphic data fusion for predicting the blood brain barrier permeability. J Comput Chem 28:1252–1260
67. Gleeson MP (2007) Plasma protein binding affinity and its relationship to molecular structure: an in-silico analysis. J Med Chem 50:101–112
68. Li C, Liu T, Cui X et al (2007) Development of in vitro pharmacokinetic screens using Caco-2, human hepatocyte, and Caco-2/human hepatocyte hybrid systems for the prediction of oral bioavailability in humans. J Biomol Screen 12:1084–1091
69. Hou T, Wang J, Zhang W et al (2007) ADME evaluation in drug discovery. 6. Can oral bioavailability in humans be effectively predicted by simple molecular property-based rules? J Chem Inf Model 47:460–463
70. Moda TL, Montanari CA, Andricopulo AD (2007) Hologram QSAR model for the prediction of human oral bioavailability. Bioorg Med Chem 15:7738–7745
14 Prediction of Pharmacokinetic Parameters 357
71. De Buck SS, Sinha VK, Fenu LA et al (2007) The 84. Emoto C, Murayama N, Rostami-Hodjegan A
prediction of drug metabolism, tissue distribu- et al (2009) Utilization of estimated physico-
tion, and bioavailability of 50 structurally diverse chemical properties as an integrated part of
compounds in rat using mechanism-based predicting hepatic clearance in the early drug-
absorption, distribution, and metabolism pre- discovery stage: impact of plasma and micro-
diction tools. Drug Metab Dispos 35:649–659 somal binding. Xenobiotica 39:227–235
72. Hou T, Wang J, Zhang W et al (2007) ADME 85. Paixão P, Gouveia LF, Morais JA (2009) Pre-
evaluation in drug discovery. 7. Prediction of diction of drug distribution within blood. Eur J
oral absorption by correlation and classifica- Pharm Sci 36:544–554
tion. J Chem Inf Model 47:208–218 86. Chang C, Duignan DB, Johnson KD (2009)
73. Fu XC, Wang GP, Shan HL et al (2008) Pre- The development and validation of a computa-
dicting blood–brain barrier penetration from tional model to predict rat liver microsomal
molecular weight and number of polar atoms. clearance. J Pharm Sci 98:2857–2867
Eur J Pharm Biopharm 70:462–466 87. Li H, Sun J, Sui X et al (2009) First-principle,
74. Zhang L, Zhu H, Oprea TI et al (2008) QSAR structure-based prediction of hepatic metabolic
modeling of the blood–brain barrier perme- clearance values in human. Eur J Med Chem
ability for diverse organic compounds. Pharm 44:1600–1606
Res 25:1902–1914 88. Grabowski T, Jaroszewski JJ (2009) Bioavail-
75. Sinha VK, De Buck SS, Fenu LA et al (2008) ability of veterinary drugs in vivo and in silico.
Predicting oral clearance in humans: how close J Vet Pharmacol Ther 32:249–257
can we get with allometry? Clin Pharmacokinet 89. Fan Y, Unwalla R, Denny RA et al (2010)
47:35–45 Insights for predicting blood–brain barrier
76. Ma CY, Yang SY, Zhang H et al (2008) Predic- penetration of CNS targeted molecules using
tion models of human plasma protein binding QSPR approaches. J Chem Inf Model
rate and oral bioavailability derived by using 50:1123–1133
GA-CG-SVM method. J Pharm Biomed Anal 90. Shen J, Cheng F, Xu Y et al (2010) Estimation
47:677–682 of ADME properties with substructure pattern
77. Dureja H, Gupta S, Madan AK (2008) Topo- recognition. Chem Inf Model 50:1034–1041
logical models for prediction of pharmacoki- 91. Yu MJ (2010) Predicting total clearance in
netic parameters of cephalosporins using humans from chemical structure. J Chem Inf
random forest, decision tree and moving aver- Model 50:1284–1295
age analysis. Sci Pharm 76:377–394 92. Paixão P, Gouveia LF, Morais JA (2010) Pre-
78. Dureja H, Gupta S, Madan AK (2009) Deci- diction of the in vitro intrinsic clearance deter-
sion tree derived topological models for predic- mined in suspensions of human hepatocytes by
tion of physico-chemical, pharmacokinetic and using artificial neural networks. Eur J Pharm
toxicological properties of antihistaminic Sci 39:310–321
drugs. Int J Comput Biol Drug Des 2:353–370 93. Kharkar PS (2010) Two-dimensional (2D) in
79. Lanevskij K, Japertas P, Didziapetris R et al silico models for absorption, distribution,
(2009) Ionization-specific prediction of metabolism, excretion and toxicity
blood–brain permeability. J Pharm Sci (ADME/T) in drug discovery. Curr Top Med
98:122–134 Chem 10:116–126
80. Kokate A, Li X, Williams PJ et al (2009) In 94. Lombardo F, Gifford E, Shalaeva MY (2003)
silico prediction of drug permeability across In silico ADME prediction: data, models, facts
buccal mucosa. Pharm Res 26:1130–1139 and myths. Mini Rev Med Chem 3:861–875
81. Berellini G, Springer C, Waters NJ et al (2009) 95. Liu X, Tu M, Kelly RS et al (2004) Develop-
In silico prediction of volume of distribution in ment of a computational approach to predict
human using linear and nonlinear models on a blood–brain barrier permeability. Drug Metab
669 compound data set. J Med Chem Dispos 32:132–139
52:4488–4495 96. Sui X, Sun J, Wu X et al (2008) Predicting the
82. McIntyre TA, Han C, Davis CB (2009) Predic- volume of distribution of drugs in humans.
tion of animal clearance using naı̈ve Bayesian Curr Drug Metab 9:574–580
classification and extended connectivity finger- 97. Ekins S, Boulanger B, Swaan PW et al (2002)
prints. Xenobiotica 39:487–494 Towards a new age of virtual ADME/TOX and
83. Li H, Sun J, Sui X, Yan Z et al (2009) multidimensional drug discovery. J Comput
Structure-based prediction of the nonspecific Aided Mol Des 16:381–401
binding of drugs to hepatic microsomes.
AAPS J 11:364–370
Chapter 15
Abstract
The human pregnane X receptor (PXR) is a ligand-dependent transcription factor that can be activated by
structurally diverse agonists including steroid hormones, bile acids, herbal drugs, and prescription medica-
tions. PXR regulates the transcription of several genes involved in xenobiotic detoxification and apoptosis.
Activation of PXR has the potential to initiate adverse effects by altering drug pharmacokinetics or
perturbing physiological processes. Hence, more reliable prediction of PXR activators would be valuable
for pharmaceutical drug discovery to avoid potential toxic effects. Ligand- and protein structure-based
computational models for PXR activation have been developed in several studies. There has been limited
success with structure-based modeling approaches to predict human PXR activators, which can be
attributed to the large and promiscuous binding site of this protein. Slightly better success has been achieved with
ligand-based modeling methods including quantitative structure–activity relationship (QSAR) analysis,
pharmacophore modeling and machine learning that use appropriate descriptors to account for the diversity
of the ligand classes that bind to PXR. These combined computational approaches using molecular shape
information may assist scientists to more confidently identify PXR activators. This chapter reviews the
various ligand- and structure-based methods undertaken to date and their results.
Key words: Agonists, Alignment methods, Antagonists, Bayesian classification, Docking and Scoring,
Pharmacophore, Pregnane Xenobiotic receptors, QSAR, Support vector machines
1. Introduction
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_15, # Springer Science+Business Media, LLC 2012
360 S. Kortagere et al.
Fig. 1. General architecture of the nuclear hormone receptor family. DBD represents the DNA binding domain, LBD
represents the ligand binding domain, and AF2 represents the activation function region that binds to coactivator.
2. Results and Discussion
2.1. Ligand-Based Modeling of PXR
Human PXR agonist pharmacophore models have been shown to possess hydrophobic,
hydrogen bond acceptor, and hydrogen bond donor features (Table 1), consistent with
the crystallographic structures of hPXR ligand–receptor complexes (17, 36–38).
Several predictive computational models for hPXR have been developed to define key
binding features of ligands (27, 28, 39). Most
Table 1
hPXR pharmacophore features and how to avoid being an agonist
2.2. Structure-Based Modeling of hPXR
At the time of writing, there are five high-resolution crystal structures of the
hPXR LBD in complex with a variety of ligands (17, 36–38) available in the PDB (and
a sixth structure to be deposited (57)). The structures have provided atomic-level
details that have led to a greater understanding of the LBD and the structural
features involved in ligand–receptor interactions (27, 28, 39). The cocrystallized
ligands include the natural products hyperforin (active component of the herbal
antidepressant St. John’s wort) and colupulone (from hops), the steroid
17β-estradiol, and the synthetic compounds SR12813, T-0901317, and the antibiotic
rifampicin (Fig. 2). These ligands span a range of molecular sizes (M.Wt range
272.38–713.81 Da, mean 487.58 ± 147.25 Da) and are predicted to be generally
hydrophobic (calculated ALogP (43) 3.54–10.11, mean 5.54 ± 2.41). The cavernous
hPXR ligand binding pocket (LBP), with a volume >1,350 Å³, accepts molecules of
these widely varying dimensions and chemical properties, and is likely capable of
binding small molecules in multiple orientations (44). This complicates overall
prediction of whether a small molecule is likely to be classified as an hPXR
agonist using traditional structure-based virtual screening methods like docking
that treat the receptor as rigid for purposes of modeling (25, 45). With regard to
this, we have previously shown that the widely used structure-based docking methods
Fig. 2. Two-dimensional structures of ligands that are cocrystallized with hPXR; 444 is T0901317, EST is 17β-estradiol,
COL is colupulone, HYF is hyperforin, RFP is rifampicin, and SRL is SR12813.
Fig. 3. Structural comparison of the binding modes of (a) hyperforin and (b) a steroid compound in the hPXR LBD.
The 2D interaction diagram was generated using the LIGX module of the program MOE.
2.3. hPXR Antagonists
There have been relatively few attempts to understand or develop in silico models
of antagonism of hPXR (50). One computational approach focused on the LBD using the
crystal structure of hPXR bound to T-0901317 (37), but this proved difficult (31).
The list of hPXR antagonists is, however, steadily growing and even includes some
compounds first characterized as weak hPXR agonists (15). For example, the
antagonists ketoconazole (33), fluconazole, and enilconazole (51) have all been
shown to inhibit the activation of
3. Conclusions
Long-range motions of the hPXR LBD (17) and the potential for ligands to bind in
multiple conformations within the LBD of hPXR may be responsible for the
promiscuity of the hPXR LBD. Thus, developing computational models to predict the
binding mode and affinity of ligands for hPXR is challenging. In this study we have
summarized the utility of ligand-based and structure-based models to predict
agonists and antagonists of hPXR. Each of these methods has its inherent advantages
and limitations, especially when used on flexible proteins (22). Among the ligand-based
methods, some of them are based on ligand alignment while others
are independent of alignments. The performance of alignment-
15 Ligand- and Structure-Based Pregnane X Receptor Models 369
Acknowledgments
Appendix
References
7. Lee EJ, Lean CB, Limenta LM (2009) Role of membrane transporters in the safety profile of drugs. Expert Opin Drug Metab Toxicol 5:1369–1383
8. Giguere V (1999) Orphan nuclear receptors: from gene to function. Endocr Rev 20:689–725
9. Ashby J, Houthoff E, Kennedy SJ, Stevens J, Bars R, Jekat FW, Campbell P, Van Miller J, Carpanini FM, Randall GL (1997) The challenge posed by endocrine-disrupting chemicals. Environ Health Perspect 105:164–169
10. Kelce WR, Gray LE, Wilson EM (1998) Antiandrogens as environmental endocrine disruptors. Reprod Fertil Dev 10:105–111
11. Melnick R, Lucier G, Wolfe M, Hall R, Stancel G, Prins G, Gallo M, Reuhl K, Ho SM, Brown T, Moore J, Leakey J, Haseman J, Kohn M (2002) Summary of the National Toxicology Program’s report of the endocrine disruptors low-dose peer review. Environ Health Perspect 110:427–431
12. Goldman JM, Laws SC, Balchak SK, Cooper RL, Kavlock RJ (2000) Endocrine-disrupting chemicals: prepubertal exposures and effects on sexual maturation and thyroid activity in the female rat. A focus on the EDSTAC recommendations. Crit Rev Toxicol 30:135–196
13. Ekins S, Chang C, Mani S, Krasowski MD, Reschly EJ, Iyer M, Kholodovych V, Ai N, Welsh WJ, Sinz M, Swaan PW, Patel R, Bachmann K (2007) Human pregnane X receptor antagonists and agonists define molecular requirements for different binding sites. Mol Pharmacol 72:592–603
14. Ekins S, Kholodovych V, Ai N, Sinz M, Gal J, Gera L, Welsh WJ, Bachmann K, Mani S (2008) Computational discovery of novel low micromolar human pregnane X receptor antagonists. Mol Pharmacol 74:662–672
15. Biswas A, Mani S, Redinbo MR, Krasowski MD, Li H, Ekins S (2009) Elucidating the ‘Jekyll and Hyde’ nature of PXR: the case for discovering antagonists or allosteric antagonists. Pharm Res 26:1807–1815
16. Mnif W, Pascussi JM, Pillion A, Escande A, Bartegi A, Nicolas JC, Cavailles V, Duchesne MJ, Balaguer P (2007) Estrogens and antiestrogens activate PXR. Toxicol Lett 170:19–29
17. Teotico DG, Bischof JJ, Peng L, Kliewer SA, Redinbo MR (2008) Structural basis of human pregnane X receptor activation by the hops constituent colupulone. Mol Pharmacol 74:1512–1520
18. Wada T, Gao J, Xie W (2009) PXR and CAR in energy metabolism. Trends Endocrinol Metab 20:273–279
19. Gong H, Guo P, Zhai Y, Zhou J, Uppal H, Jarzynka MJ, Song WC, Cheng SY, Xie W (2007) Estrogen deprivation and inhibition of breast cancer growth in vivo through activation of the orphan nuclear receptor liver X receptor. Mol Endocrinol 21:1781–1790
20. Rotroff DM, Beam AL, Dix DJ, Farmer A, Freeman KM, Houck KA, Judson RS, LeCluyse EL, Martin MT, Reif DM, Ferguson SS (2010) Xenobiotic-metabolizing enzyme and transporter gene expression in primary cultures of human hepatocytes modulated by ToxCast chemicals. J Toxicol Environ Health B Crit Rev 13:329–346
21. Tirona RG, Leake BF, Podust LM, Kim RB (2004) Identification of amino acids in rat pregnane X receptor that determine species-specific activation. Mol Pharmacol 65:36–44
22. Kortagere S, Krasowski MD, Reschly EJ, Venkatesh M, Mani S, Ekins S (2010) Evaluation of computational docking to identify pregnane X receptor agonists in the ToxCast database. Environ Health Perspect 118:1412–1417
23. Ekins S, Kortagere S, Iyer M, Reschly EJ, Lill MA, Redinbo M, Krasowski MD (2009) Challenges predicting ligand-receptor interactions of promiscuous proteins: the nuclear receptor PXR. PLoS Comput Biol 5:e1000594
24. Yasuda K, Ranade A, Venkataramanan R, Strom S, Chupka J, Ekins S, Schuetz E, Bachmann K (2008) A comprehensive in vitro and in silico analysis of antibiotics that activate pregnane X receptor and induce CYP3A4 in liver and intestine. Drug Metab Dispos 36:1689–1697
25. Khandelwal A, Krasowski MD, Reschly EJ, Sinz MW, Swaan PW, Ekins S (2008) Machine learning methods and docking for predicting human pregnane X receptor activation. Chem Res Toxicol 21:1457–1467
26. Ekins S, Andreyev S, Ryabov A, Kirillov E, Rakhmatulin EA, Sorokina S, Bugrim A, Nikolskaya T (2006) A combined approach to drug metabolism and toxicity assessment. Drug Metab Dispos 34:495–503
27. Bachmann K, Patel H, Batayneh Z, Slama J, White D, Posey J, Ekins S, Gold D, Sambucetti L (2004) PXR and the regulation of apoA1 and HDL-cholesterol in rodents. Pharmacol Res 50:237–246
28. Ekins S, Erickson JA (2002) A pharmacophore for human pregnane-X-receptor ligands. Drug Metab Dispos 30:96–99
29. Pan Y, Li L, Kim G, Ekins S, Wang H, Swaan PW (2011) Identification and validation of novel hPXR activators amongst prescribed drugs via ligand-based virtual screening. Drug Metab Dispos 39(2):337–44
30. Gao YD, Olson SH, Balkovec JM, Zhu Y, Royo I, Yabut J, Evers R, Tan EY, Tang W, Hartley DP, Mosley RT (2007) Attenuating pregnane X receptor (PXR) activation: a molecular modelling approach. Xenobiotica 37:124–138
31. Lemaire G, Benod C, Nahoum V, Pillon A, Boussioux AM, Guichou JF, Subra G, Pascussi JM, Bourguet W, Chavanieu A, Balaguer P (2007) Discovery of a highly active ligand of human pregnane x receptor: a case study from pharmacophore modeling and virtual screening to “in vivo” biological activity. Mol Pharmacol 72:572–581
32. Schuster D, Langer T (2005) The identification of ligand features essential for PXR activation by pharmacophore modeling. J Chem Inf Model 45:431–439
33. Huang H, Wang H, Sinz M, Zoeckler M, Staudinger J, Redinbo MR, Teotico DG, Locker J, Kalpana GV, Mani S (2007) Inhibition of drug metabolism by blocking the activation of nuclear receptors by ketoconazole. Oncogene 26:258–268
34. Lill MA, Dobler M, Vedani A (2005) In silico prediction of receptor-mediated environmental toxic phenomena-application to endocrine disruption. SAR QSAR Environ Res 16:149–169
35. Lin YS, Yasuda K, Assem M, Cline C, Barber J, Li CW, Kholodovych V, Ai N, Chen JD, Welsh WJ, Ekins S, Schuetz EG (2009) The major human pregnane X receptor (PXR) splice variant, PXR.2, exhibits significantly diminished ligand-activated transcriptional regulation. Drug Metab Dispos 37:1295–1304
36. Watkins RE, Davis-Searles PR, Lambert MH, Redinbo MR (2003) Coactivator binding promotes the specific interaction between ligand and the pregnane X receptor. J Mol Biol 331:815–828
37. Xue Y, Chao E, Zuercher WJ, Willson TM, Collins JL, Redinbo MR (2007) Crystal structure of the PXR-T1317 complex provides a scaffold to examine the potential for receptor antagonism. Bioorg Med Chem 15:2156–2166
38. Chrencik JE, Orans J, Moore LB, Xue Y, Peng L, Collins JL, Wisely GB, Lambert MH, Kliewer SA, Redinbo MR (2005) Structural disorder in the complex of human pregnane X receptor and the macrolide antibiotic rifampicin. Mol Endocrinol 19:1125–1134
39. Schuster D, Laggner C, Steindl TM, Palusczak A, Hartmann RW, Langer T (2006) Pharmacophore modeling and in silico screening for new P450 19 (aromatase) inhibitors. J Chem Inf Model 46:1301–1311
40. Khandelwal A, Krasowski MD, Reschly EJ, Sinz M, Swaan PW, Ekins S (2007) A comparative analysis of quantitative structure activity relationship methods and docking for human pregnane X receptor activation, submitted
41. Jacobs MN (2004) In silico tools to aid risk assessment of endocrine disrupting chemicals. Toxicology 205:43–53
42. Hassan M, Brown RD, Varma-O’brien S, Rogers D (2006) Cheminformatics analysis and learning in a data pipelining environment. Mol Divers 10:283–299
43. Ghose AK, Viswanadhan VN, Wendoloski JJ (1998) Prediction of hydrophobic (lipophilic) properties of small organic molecules using fragmental methods: an analysis of ALOGP and CLOGP methods. J Phys Chem 102:3762–3772
44. Watkins RE, Wisely GB, Moore LB, Collins JL, Lambert MH, Williams SP, Willson TM, Kliewer SA, Redinbo MR (2001) The human nuclear xenobiotic receptor PXR: structural determinants of directed promiscuity. Science 292:2329–2333
45. Kortagere S, Chekmarev D, Welsh WJ, Ekins S (2009) Hybrid scoring and classification approaches to predict human pregnane X receptor activators. Pharm Res 26:1001–1011
46. Ekins S, Kortagere S, Iyer M, Reschly EJ, Lill MA, Redinbo MR, Krasowski MD (2009) Challenges predicting ligand-receptor interactions of promiscuous proteins: the nuclear receptor PXR. PLoS Comput Biol 5:e1000594
47. Judson RS, Houck KA, Kavlock RJ, Knudsen TB, Martin MT, Mortensen HM, Reif DM, Rotroff DM, Shah I, Richard AM, Dix DJ (2010) In vitro screening of environmental chemicals for targeted testing prioritization: the ToxCast project. Environ Health Perspect 118:485–492
48. Martin MT, Dix DJ, Judson RS, Kavlock RJ, Reif DM, Richard AM, Rotroff DM, Romanov S, Medvedev A, Poltoratskaya N, Gambarian M, Moeser M, Makarov SS, Houck KA (2010) Impact of environmental chemicals on key transcription regulators and correlation to toxicity end points within EPA’s ToxCast program. Chem Res Toxicol 23:578–590
49. Knudsen TB, Martin MT, Kavlock RJ, Judson RS, Dix DJ, Singh AV (2009) Profiling the activity of environmental chemicals in prenatal developmental toxicity studies using the U.S. EPA’s ToxRefDB. Reprod Toxicol 28:209–219
50. Mani S, Huang H, Sundarababu S, Liu W, Kalpana G, Smith AB, Horwitz SB (2005) Activation of the steroid and xenobiotic receptor (human pregnane X receptor) by nontaxane microtubule-stabilizing agents. Clin Cancer Res 11:6359–6369
51. Chen Y, Tang Y, Wang MT, Zeng S, Nie D (2007) Human pregnane X receptor and resistance to chemotherapy in prostate cancer. Cancer Res 67:10361–10367
52. Reynolds CH, Tounge BA, Bembenek SD (2008) Ligand binding efficiency: trends, physical basis, and implications. J Med Chem 51:2432–2438
53. Ekins S, Reschly EJ, Hagey LR, Krasowski MD (2008) Evolution of pharmacologic specificity in the pregnane X receptor. BMC Evol Biol 8:103
54. Krasowski MD, Yasuda K, Hagey LR, Schuetz EG (2005) Evolution of the pregnane x receptor: adaptation to cross-species differences in biliary bile salts. Mol Endocrinol 19:1720–1739
55. Gasteiger JAMM (1980) Iterative partial equalization of orbital electronegativity - a rapid access to atomic charges. Tetrahedron 36:10
56. Kortagere S, Welsh WJ (2006) Development and application of hybrid structure based method for efficient screening of ligands binding to G-protein coupled receptors. J Comput Aided Mol Des 20:789–802
57. Xue Y, Moore LB, Orans J, Peng L, Bencharit S, Kliewer SA, Redinbo MR (2007) Crystal structure of the pregnane X receptor-estradiol complex provides insights into endobiotic recognition. Mol Endocrinol 21:1028–1038
58. Jones G, Willett P, Glen RC, Leach AR, Taylor R (1997) Development and validation of a genetic algorithm for flexible docking. J Mol Biol 267:727–748
59. Kogej T, Engkvist O, Blomberg N, Muresan S (2006) Multifingerprint based similarity searches for targeted class compound selection. J Chem Inf Model 46:1201–1213
Chapter 16
Non-compartmental Analysis
Johan Gabrielsson and Daniel Weiner
Abstract
When analyzing pharmacokinetic data, one generally employs either model fitting using nonlinear
regression analysis or non-compartmental analysis techniques (NCA). The method one actually employs
depends on what is required from the analysis. If the primary requirement is to determine the degree of
exposure following administration of a drug (such as AUC), and perhaps the drug’s associated pharmaco-
kinetic parameters, such as clearance, elimination half-life, Tmax, Cmax, etc., then NCA is generally the
preferred methodology to use in that it requires fewer assumptions than model-based approaches. In this
chapter we cover NCA methodologies, which utilize application of the trapezoidal rule for measurements of
the area under the plasma concentration–time curve. This method, which generally applies to first-order
(linear) models (although it is often used to assess if a drug’s pharmacokinetics are nonlinear when several
dose levels are administered), has few underlying assumptions and can readily be automated.
In addition, because sparse data sampling methods are often utilized in toxicokinetic (TK) studies, NCA
methodology appropriate for sparse data is also discussed.
1. Non-compartmental Analysis
1.1. Non-compartmental Versus Regression Analysis
Most current approaches to characterize a drug’s kinetics involve non-compartmental
analysis, denoted NCA, and nonlinear regression analysis (1). The advantages of the
regression analysis approach are the disadvantages of the non-compartmental
approach and vice versa. NCA does not require the assumption of a specific
compartmental model for either drug or metabolite. The method used involves
application of the trapezoidal rule for measurements of the area under a plasma
concentration–time curve. This method, which applies to first-order (linear)
models, is rather assumption free and can readily be automated. Figure 1 gives a
schematic picture of the NCA and nonlinear regression approaches. As can be seen,
NCA deals with sums of areas whereas regression
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_16, # Springer Science+Business Media, LLC 2012
Fig. 1. Comparison of NCA (left) and nonlinear regression modeling (right). Ka, K, and V in the right-hand panel indicate the
model parameters to be estimated by regressing the model to data.
1.2. Computational Methods: Linear Trapezoidal Rule
The areas can be calculated by means of either the linear trapezoidal rule or the
log-linear trapezoidal rule. The total area is then measured by summing the
incremental area of each trapezoid (Fig. 2).
The magnitude of the error associated with the estimated area depends on the width
of the trapezoid and the curvature of the true profile. This is due to the fact
that the linear trapezoidal rule overestimates the area during the descending
phase, assuming elimination is first-order, and underestimates the area during the
ascending part of the curve (Fig. 3). This over/underestimation error will be more
pronounced if the sampling interval $\Delta t$ is large in relation to the
half-life.
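To give a feel for how the overestimation grows with the sampling interval, the following minimal Python sketch compares the linear trapezoid with the exact area for a single interval of a mono-exponential decline. The function name and the chosen interval lengths are illustrative assumptions, not part of the original text; the exact area used for comparison is the standard integral of an exponential over one interval.

```python
import math

def linear_trapezoid_bias(dt_over_half_life):
    """Ratio of linear-trapezoid area to exact area for one interval of a
    mono-exponential decline, as a function of dt / half-life.
    (Hypothetical helper, for illustration only.)"""
    x = math.log(2) * dt_over_half_life        # x = K * dt
    c_start, c_end = 1.0, math.exp(-x)         # concentrations at both ends
    trapezoid = 0.5 * (c_start + c_end) * x    # trapezoid area, in units of 1/K
    exact = c_start - c_end                    # exact area (C_i - C_{i+1}) / K, same units
    return trapezoid / exact

for ratio in (0.25, 0.5, 1.0, 2.0):
    bias = 100.0 * (linear_trapezoid_bias(ratio) - 1.0)
    print(f"dt = {ratio:4.2f} half-lives: overestimate = {bias:5.2f} %")
```

Under these assumptions the overestimate is roughly 4% when the sampling interval equals one half-life, and it shrinks quickly as the interval becomes shorter.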
Fig. 2. Graphical presentation of the linear trapezoidal rule. $\mathrm{AUC}_{t_i}^{t_{i+1}}$ is the area between
$t_i$ and $t_{i+1}$. $C_i$ and $C_{i+1}$ are the corresponding plasma concentrations, and $\Delta t$ is the time
interval. Note that $\Delta t$ may differ for different trapeziums.
Fig. 3. Concentration versus time during and after a constant rate infusion. The shaded
area represents underestimation of the area during ascending concentrations and
overestimation of the area during descending concentrations. By decreasing the time
step ($\Delta t$) between observations, this under- or overestimation of the area is minimized.
Using the linear trapezoidal method for calculation of the area under the zero
moment curve AUC from 0 to time $t_{last}$, we have

$$\mathrm{AUC}_0^{t_{last}} = \sum_{i=1}^{n} \frac{C_i + C_{i+1}}{2}\,\Delta t, \tag{1}$$

where $\Delta t = t_{i+1} - t_i$ and $t_{last}$ denotes the time of the last
measurable concentration. Unless one has sampled long enough in time so that
concentrations are negligible, the AUC as defined above will underestimate the true
AUC. Therefore it may be necessary to extrapolate the curve out to $t$ equal to
infinity (1). The extrapolated area under the zero moment curve from the last
sampling time to infinity, $\mathrm{AUC}_{t_{last}}^{\infty}$, is calculated as

$$\mathrm{AUC}_{t_{last}}^{\infty} = \int_{t_{last}}^{\infty} C_{last}\, e^{-\lambda_z (t - t_{last})}\,dt = C_{last}\left[\frac{e^{-\lambda_z (t - t_{last})}}{-\lambda_z}\right]_{t_{last}}^{\infty} = C_{last}\left[0 - \left(-\frac{1}{\lambda_z}\right)\right] = \frac{C_{last}}{\lambda_z}, \tag{2}$$

where $C_{last}$ and $\lambda_z$ are the last measurable nonzero plasma
concentration and the terminal slope on a $\log_e$ scale, respectively. One may
also use the predicted concentration at $t_{last}$ if the observed concentration
deviates from the terminal regression line (Fig. 4).
The $\lambda_z$ parameter is graphically obtained from the terminal slope of the
semilogarithmic concentration–time curve as shown in Fig. 4, with a minimum of 3–4
observations being required for accurate estimation. The Y axis ln(C) denotes the
natural logarithm ($\log_e$) of the plasma concentration C.
Fig. 4. Semilog plot demonstrating the estimation of $\lambda_z$. The terminal data points are fit by
log-linear regression to estimate the slope.
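The linear trapezoidal sum (Eq. 1) and the extrapolated area $C_{last}/\lambda_z$ (Eq. 2) can be sketched in a few lines of Python. The function names and the concentration data below are hypothetical and chosen only for illustration; the data follow an assumed mono-exponential decline so that the result can be compared with the known total area $C_0/\lambda_z$.

```python
import math

def auc_linear(times, concs):
    """AUC from the first to the last sample by the linear trapezoidal rule (Eq. 1)."""
    return sum(0.5 * (concs[i] + concs[i + 1]) * (times[i + 1] - times[i])
               for i in range(len(times) - 1))

def auc_extrapolated(c_last, lambda_z):
    """Area from t_last to infinity, C_last / lambda_z (Eq. 2)."""
    return c_last / lambda_z

# Hypothetical IV bolus profile: C = 10 * exp(-0.2 * t)
lambda_z = 0.2
times = [0.0, 1.0, 2.0, 4.0, 8.0, 12.0]
concs = [10.0 * math.exp(-lambda_z * t) for t in times]

auc_0_tlast = auc_linear(times, concs)
auc_0_inf = auc_0_tlast + auc_extrapolated(concs[-1], lambda_z)
print("AUC(0-tlast):", auc_0_tlast)
print("AUC(0-inf):  ", auc_0_inf, "(true value for this profile:", 10.0 / lambda_z, ")")
```

In this sketch $\lambda_z$ is taken as known; in practice it would be estimated from the terminal data points, as the text describes. Because every interval here is on the descending limb, the linear rule gives a total slightly above the true area.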
The linear trapezoidal method for calculation of the area under the first moment
curve AUMC from 0 to time $t_{last}$ is obtained from

$$\mathrm{AUMC}_0^{t_{last}} = \sum_{i=1}^{n} \frac{t_i C_i + t_{i+1} C_{i+1}}{2}\,\Delta t. \tag{3}$$
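Remembering that $\int x\,e^{ax}\,dx = \frac{x\,e^{ax}}{a} - \frac{e^{ax}}{a^2}$, the
corresponding area under the first moment curve from time $t_{last}$ to infinity,
$\mathrm{AUMC}_{t_{last}}^{\infty}$, is computed as

$$\mathrm{AUMC}_{t_{last}}^{\infty} = \int_{t_{last}}^{\infty} t\,C\,dt = \int_{t_{last}}^{\infty} t\,C_{last}\,e^{-\lambda_z (t - t_{last})}\,dt = \frac{C_{last}\,t_{last}}{\lambda_z} + \frac{C_{last}}{\lambda_z^2}. \tag{4}$$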
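The first-moment calculation can be sketched in the same way as the AUC. The code below applies the linear trapezoidal rule to the $t \cdot C$ values (Eq. 3) and adds the extrapolated tail $C_{last} t_{last}/\lambda_z + C_{last}/\lambda_z^2$, the expression the chapter gives for the first-moment area beyond $t_{last}$. Function names and data are hypothetical, and $\lambda_z$ is assumed known.

```python
import math

def aumc_linear(times, concs):
    """AUMC up to the last sample by the linear trapezoidal rule (Eq. 3)."""
    return sum(0.5 * (times[i] * concs[i] + times[i + 1] * concs[i + 1])
               * (times[i + 1] - times[i])
               for i in range(len(times) - 1))

def aumc_extrapolated(c_last, t_last, lambda_z):
    """First-moment area from t_last to infinity:
    C_last * t_last / lambda_z + C_last / lambda_z**2."""
    return c_last * t_last / lambda_z + c_last / lambda_z ** 2

# Hypothetical IV bolus profile: C = 10 * exp(-0.2 * t)
lambda_z = 0.2
times = [0.0, 1.0, 2.0, 4.0, 8.0, 12.0]
concs = [10.0 * math.exp(-lambda_z * t) for t in times]

aumc_0_inf = (aumc_linear(times, concs)
              + aumc_extrapolated(concs[-1], times[-1], lambda_z))
print("AUMC(0-inf):", aumc_0_inf, "(true value for this profile:", 10.0 / lambda_z ** 2, ")")
```

For this illustrative profile the extrapolated tail is a much larger share of the total than it is for the AUC, which is why the quality of the $\lambda_z$ estimate matters more for first-moment quantities.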
1.3. Computational Methods: Log-Linear Trapezoidal Rule
An alternative procedure that has been proposed is the log-linear trapezoidal rule.
The underlying assumption is that the plasma concentrations decline
mono-exponentially between two measured concentrations. However, this method
applies only for descending data and fails when $C_i = 0$ or $C_{i+1} = C_i$. In
these instances one would revert to the linear trapezoidal rule. The principal
difference between the linear and the log-linear trapezoidal method is demonstrated
in Fig. 5.
Remember that when the concentrations decline exponentially

$$C_{i+1} = C_i\,e^{-K (t_{i+1} - t_i)} = C_i\,e^{-K \Delta t}, \tag{5}$$

where $t_{i+1} - t_i$ is the time step $\Delta t$ between two observations and $K$
is the elimination rate constant for a one-compartment system. Otherwise,
$\lambda_z$ should be used as the slope. The above expression when rearranged gives
the elimination rate constant $K$:

$$K = \frac{\ln(C_i / C_{i+1})}{\Delta t}. \tag{6}$$

The AUC within the time interval $\Delta t$ is the difference between the
concentrations divided by the slope $K$:

$$\mathrm{AUC}_i^{i+1} = \frac{C_i - C_{i+1}}{K} = \frac{C_i - C_{i+1}}{\ln(C_i / C_{i+1})}\,\Delta t. \tag{7}$$
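The step from the exponential decline to the interval area is a direct integration over one sampling interval. A short sketch, using only the symbols already defined ($C_i$, $C_{i+1}$, $K$, $\Delta t$):

```latex
\int_{t_i}^{t_{i+1}} C_i\, e^{-K (t - t_i)}\, dt
  = \frac{C_i}{K}\left(1 - e^{-K \Delta t}\right)
  = \frac{C_i - C_i\, e^{-K \Delta t}}{K}
  = \frac{C_i - C_{i+1}}{K}
```

Substituting $K = \ln(C_i / C_{i+1}) / \Delta t$ then gives the log-linear trapezoid $\frac{C_i - C_{i+1}}{\ln(C_i / C_{i+1})}\,\Delta t$.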
Using the log-linear trapezoidal method from time zero to $t_n$

$$\mathrm{AUC}_0^{t_n} = \sum_{i=1}^{n} \frac{C_i - C_{i+1}}{\ln(C_i / C_{i+1})}\,\Delta t, \tag{8}$$

and

$$\mathrm{AUMC}_0^{t_n} = \sum_{i=1}^{n} \left[ \frac{t_i C_i - t_{i+1} C_{i+1}}{\ln(C_i / C_{i+1})}\,\Delta t - \frac{C_{i+1} - C_i}{\left[\ln(C_i / C_{i+1})\right]^2}\,\Delta t^2 \right]. \tag{9}$$

Fig. 5. The principal difference between the linear (left) and the log-linear (right)
trapezoidal methods. The shaded region represents the over-predicted area with the
linear trapezoidal rule. Note that the log-linear approximation is only true if the decay is
truly mono-exponential between $t_i$ and $t_{i+1}$.
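The log-linear sums for AUC and AUMC (Eqs. 8 and 9) can be sketched as follows. The function name and data are hypothetical; the sketch assumes strictly decreasing, positive concentrations, since the log-linear rule is not defined otherwise.

```python
import math

def auc_aumc_loglinear(times, concs):
    """AUC and AUMC up to the last sample by the log-linear trapezoidal rule.
    Assumes strictly decreasing, positive concentrations (illustrative sketch)."""
    auc = 0.0
    aumc = 0.0
    for i in range(len(times) - 1):
        if not (concs[i] > concs[i + 1] > 0.0):
            raise ValueError("log-linear rule needs strictly decreasing, positive data")
        dt = times[i + 1] - times[i]
        log_ratio = math.log(concs[i] / concs[i + 1])   # ln(C_i / C_{i+1}) = K * dt
        auc += (concs[i] - concs[i + 1]) / log_ratio * dt
        aumc += ((times[i] * concs[i] - times[i + 1] * concs[i + 1]) / log_ratio * dt
                 - (concs[i + 1] - concs[i]) / log_ratio ** 2 * dt ** 2)
    return auc, aumc

# Hypothetical mono-exponential profile: C = 10 * exp(-0.2 * t)
times = [0.0, 1.0, 2.0, 4.0, 8.0, 12.0]
concs = [10.0 * math.exp(-0.2 * t) for t in times]
auc, aumc = auc_aumc_loglinear(times, concs)
print("log-linear AUC(0-tn): ", auc)
print("log-linear AUMC(0-tn):", aumc)
```

For data that are truly mono-exponential, as assumed here, the log-linear sums reproduce the analytical areas up to rounding, which makes this a convenient self-check of an implementation.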
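The extrapolated area under the zero moment curve from the last sampling time to
infinity, $\mathrm{AUC}_{t_{last}}^{\infty}$, is calculated as

$$\mathrm{AUC}_{t_{last}}^{\infty} = \frac{C_{last}}{\lambda_z}, \tag{10}$$

where $C_{last}$ and $\lambda_z$ are as defined earlier. The corresponding area
under the first moment curve from time $t_{last}$ to infinity,
$\mathrm{AUMC}_{t_{last}}^{\infty}$, is

$$\mathrm{AUMC}_{t_{last}}^{\infty} = \frac{C_{last}\,t_{last}}{\lambda_z} + \frac{C_{last}}{\lambda_z^2}. \tag{11}$$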
As previously pointed out, the linear trapezoidal method gives
approximate estimates of AUC during both the ascending and
descending parts of the concentration–time curve, although the
bias is usually negligible for the upswing. The log-linear trapezoidal
method may also give somewhat biased results, though to a lesser
extent. Some people argue that the log-linear trapezoidal method
may therefore be preferable for drugs with long half-lives relative to
the sampling interval. From a practical point of view this still needs
to be proven. However, our own experience is that the difference
between the two methods is negligible as long as a reasonable
sampling design has been used. We generally use a mixture of the
two methods, which means that the linear trapezoidal method is
applied for increasing and equal concentrations, e.g., at the peak or
a plateau, and the log-linear trapezoidal method for decreasing
concentrations. This is demonstrated in Fig. 6.
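The mixed rule described above (linear for increasing or equal concentrations, log-linear for decreasing ones) can be sketched as a single loop. The function name and the concentration profile are hypothetical; as an additional assumption, an interval ending in a zero concentration falls back to the linear rule, since the logarithm is undefined there.

```python
import math

def auc_linear_up_log_down(times, concs):
    """AUC using the linear rule for rising or equal concentrations and the
    log-linear rule for falling, positive concentrations (illustrative sketch)."""
    auc = 0.0
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        c0, c1 = concs[i], concs[i + 1]
        if 0.0 < c1 < c0:
            # Decreasing interval: log-linear trapezoid
            auc += (c0 - c1) / math.log(c0 / c1) * dt
        else:
            # Increasing, equal, or zero-ending interval: linear trapezoid
            auc += 0.5 * (c0 + c1) * dt
    return auc

# Hypothetical extravascular profile with an absorption phase and a short plateau
times = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 12.0]
concs = [0.0, 3.0, 4.5, 4.5, 3.0, 1.2, 0.5]
print("AUC(0-tlast), mixed rule:", auc_linear_up_log_down(times, concs))
```

On the descending intervals the log-linear trapezoid is never larger than the linear one, so for a profile like this the mixed estimate is slightly below a purely linear estimate.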
Note that NCA is often used in crossover studies comparing two
formulations and 12–36 subjects. Thus, since the error associated
Fig. 6. NCA using a combination of the linear and log-linear trapezoidal methods. The
linear method is used for consecutively increasing or consecutively equal concentrations.
The log-linear method is used for decreasing concentrations.
1.4. Strategies for Estimation of $\lambda_z$
When estimating $\lambda_z$, we recommend that data from each individual are first
plotted in a semilog diagram. Ideally, to obtain a reliable estimate of the
terminal slope, 3–4 half-lives would need to have elapsed. However, sometimes this
is not possible. A minimum requirement is then to have 3–4 observations for the
terminal slope (Fig. 7). By means of log-linear regression of those observations,
the estimate of $\lambda_z$ is obtained. This is then used for calculation of the
extrapolated area as shown below:

$$\mathrm{AUC}_{t_{last}}^{\infty}(\text{observed}) = \frac{C_{last}}{\lambda_z} \tag{12}$$

or

$$\mathrm{AUC}_{t_{last}}^{\infty}(\text{predicted}) = \frac{\hat{C}_{last}}{\lambda_z}. \tag{13}$$
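A minimal sketch of this procedure in Python: an ordinary least-squares fit of ln(C) against time on the terminal points gives $\lambda_z$ as the negative slope, and the extrapolated area is then computed from either the observed or the regression-predicted last concentration (Eqs. 12 and 13). The function name, the number of terminal points, and the data are assumptions made for illustration; in particular, this sketch includes the last observation in the regression, which is a choice and not a requirement.

```python
import math

def fit_terminal_phase(times, concs, n_points=3):
    """Least-squares fit of ln(C) versus t on the last n_points observations.
    Returns (lambda_z, intercept), where intercept is ln(C) at t = 0.
    (Hypothetical helper, for illustration only.)"""
    t = times[-n_points:]
    y = [math.log(c) for c in concs[-n_points:]]
    t_mean = sum(t) / n_points
    y_mean = sum(y) / n_points
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sxy / sxx
    return -slope, y_mean - slope * t_mean

# Hypothetical terminal-phase data with some scatter
times = [4.0, 6.0, 8.0, 12.0]
concs = [4.5, 3.0, 2.0, 1.1]

lambda_z, intercept = fit_terminal_phase(times, concs, n_points=4)
c_last_pred = math.exp(intercept - lambda_z * times[-1])   # predicted C at t_last
auc_extrap_observed = concs[-1] / lambda_z                 # Eq. 12
auc_extrap_predicted = c_last_pred / lambda_z              # Eq. 13
print("lambda_z:", lambda_z)
print("extrapolated AUC (observed): ", auc_extrap_observed)
print("extrapolated AUC (predicted):", auc_extrap_predicted)
```

The two extrapolated areas differ by exactly the ratio of the observed to the predicted last concentration, so the choice between them matters most when the last sample is noisy.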
In Fig. 8 the last observed concentration Cobs deviates some-
what from the regression line. The extrapolated area, if based on
Fig. 7. The ideal situation (left) for estimation of the terminal slope $\lambda_z$. Another and
perhaps more commonly encountered situation (right) is where one only has an indication
of an additional slope.
384 J. Gabrielsson and D. Weiner
Fig. 8. Impact on the extrapolated area of using observed terminal concentration versus
predicted concentration. The shaded area from tlast to infinity symbolizes the
overestimation that would result. Note that if the observed terminal concentration lies
below the predicted terminal concentration, then the extrapolated area would be
underestimated. The open circle is the predicted concentration at tlast. The last
observation is not included in the regression.
1.5. Pertinent Pharmacokinetic Estimates
Moment analysis has been widely used in recent years as a
non-compartmental approach to the estimation of clearance Cl, mean
residence time MRT, steady-state volume of distribution Vss, and
volume of distribution during the terminal phase Vz (also called Vdβ
for a bi-exponential system). A general treatment for the
aforementioned parameters has been presented, which includes the
possibility of input/exit from any compartment in a mammillary model
(2, 3). This approach also defines exit site-dependent and exit
site-independent parameters. We will, however, assume in the
following examples that input/output occurs to the central
compartment. Assuming a simple case with a one-compartment bolus
system, the shape of the concentration–time and t·concentration–time
profiles will take the form depicted in Fig. 9.
The extrapolated area from the last sample at tlast to infinity is in
this case small. However, the corresponding area under the first
Fig. 9. Comparison of the shape of the area under the zero moment curve, AUC, and the area under the first moment (tC) curve, AUMC.
The latter usually contains an extensive extrapolated area as compared to AUC.
where AUCev and AUCiv denote area under the extravascular and
intravenous concentration–time profiles, respectively. Dev and Div
are the respective extravascular and intravenous doses.
If the drug is given at a constant rate over a period of Tinf, then
one also needs to adjust MRT for the infusion time by means of
subtracting Tinf/2 (infusion time/2) as follows:
$$\mathrm{MRT} = \frac{\mathrm{AUMC}_{0}^{\infty}}{\mathrm{AUC}_{0}^{\infty}} - \frac{T_{\mathrm{inf}}}{2}. \tag{19}$$
Tinf/2 originates from the average time a molecule stays in the
infusion set (e.g., syringe, catheter, line). Half of the dose is infused
when the piston has traveled half of the intended distance. Tinf/2 is
the mean input time, MIT. Similarly for first-order input,
$$\mathrm{MRT} = \frac{\mathrm{AUMC}_{0}^{\infty}}{\mathrm{AUC}_{0}^{\infty}} - \frac{1}{K_a}. \tag{20}$$
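The two corrections in Eqs. 19 and 20 can be checked numerically. The sketch below is illustrative only: it assumes a one-compartment model with hypothetical rate constants, integrates the zero and first moments with a fine trapezoidal grid, and subtracts the mean input time. In both cases the corrected MRT should come out close to 1/K.

```python
import math

K = 0.1       # elimination rate constant, 1/h (hypothetical)
KA = 1.0      # first-order absorption rate constant, 1/h (hypothetical)
T_INF = 2.0   # infusion duration, h (hypothetical)

def moments(curve, t_end, n_steps=100_000):
    """Trapezoidal AUC and AUMC of curve(t) on [0, t_end]."""
    dt = t_end / n_steps
    auc = aumc = 0.0
    t_prev, c_prev = 0.0, curve(0.0)
    for i in range(1, n_steps + 1):
        t = i * dt
        c = curve(t)
        auc += 0.5 * (c_prev + c) * dt
        aumc += 0.5 * (t_prev * c_prev + t * c) * dt
        t_prev, c_prev = t, c
    return auc, aumc

def infusion_curve(t):
    """Constant-rate infusion (unit rate per unit volume) for T_INF hours."""
    if t <= T_INF:
        return (1.0 - math.exp(-K * t)) / K
    c_end = (1.0 - math.exp(-K * T_INF)) / K
    return c_end * math.exp(-K * (t - T_INF))

def oral_curve(t):
    """First-order input and elimination, unit dose per unit volume."""
    return KA / (KA - K) * (math.exp(-K * t) - math.exp(-KA * t))

def mrt_after_input(curve, mean_input_time, t_end=300.0):
    """Observed AUMC/AUC minus the mean input time."""
    auc, aumc = moments(curve, t_end)
    return aumc / auc - mean_input_time

mrt_infusion = mrt_after_input(infusion_curve, T_INF / 2.0)  # Eq. 19
mrt_oral = mrt_after_input(oral_curve, 1.0 / KA)             # Eq. 20
print(mrt_infusion, mrt_oral, 1.0 / K)
```

The integration limit of 300 h is chosen so that the truncated tail is negligible for these particular constants; with real data the tail must be extrapolated as described in Subheading 1.4.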
Remember that Ka is the apparent first-order absorption rate
constant derived from plasma data. This parameter may also contain
processes parallel to the true absorption step of drug in the
gastrointestinal tract, e.g., chemical degradation (kd).
Consequently, the mean absorption time, MAT, is the sum of several
processes including absorption and chemical degradation:
$$\mathrm{MAT} = \frac{1}{K_{a(\mathrm{apparent})}} = \frac{1}{K_{a(\mathrm{true})} + k_d}. \tag{21}$$
The MRT of the central compartment MRT(1) is the sum of
the inverse of the initial α and terminal β slopes corrected for the
inverse of the sum of the exit rate constants from the peripheral
compartment:
$$\mathrm{MRT}_{iv}(1) = \frac{1}{\alpha} + \frac{1}{\beta} - \frac{1}{E_2}. \tag{22}$$
Assuming that there is only one exit rate constant from the
peripheral compartment, which then is k21, the MRTiv is
$$\mathrm{MRT}_{iv} = \frac{1}{\alpha} + \frac{1}{\beta} - \frac{1}{k_{21}}. \tag{23}$$
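Equation 23 can be compared with the moment definition of MRT for a concrete case. The sketch below assumes a two-compartment mammillary model with intravenous bolus input and elimination from the central compartment only; the micro-constants k10, k12, and k21 and their values are hypothetical. It uses the closed-form moments of a bi-exponential curve rather than numerical integration.

```python
import math

def macro_constants(k10, k12, k21):
    """Return (alpha, beta, a, b) for C(t) = a*exp(-alpha*t) + b*exp(-beta*t).

    The coefficients a and b are scaled to a unit initial concentration
    in the central compartment.
    """
    total = k10 + k12 + k21
    root = math.sqrt(total * total - 4.0 * k10 * k21)
    alpha = 0.5 * (total + root)
    beta = 0.5 * (total - root)
    a = (alpha - k21) / (alpha - beta)
    b = (k21 - beta) / (alpha - beta)
    return alpha, beta, a, b

def mrt_from_moments(alpha, beta, a, b):
    """AUMC/AUC using the analytical moments of a bi-exponential curve."""
    auc = a / alpha + b / beta
    aumc = a / alpha ** 2 + b / beta ** 2
    return aumc / auc

def mrt_from_slopes(alpha, beta, k21):
    """Right-hand side of Eq. 23."""
    return 1.0 / alpha + 1.0 / beta - 1.0 / k21

k10, k12, k21 = 0.3, 0.2, 0.1   # 1/h, hypothetical
alpha, beta, a, b = macro_constants(k10, k12, k21)
print(mrt_from_moments(alpha, beta, a, b), mrt_from_slopes(alpha, beta, k21))
```

Under these assumptions the two routes agree, and both equal (k12 + k21)/(k10·k21).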
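The observed MRT after extravascular dosing becomes
$$\frac{\mathrm{AUMC}_{0}^{\infty}(\text{measured})}{\mathrm{AUC}_{0}^{\infty}(\text{measured})} = \mathrm{MRT} + \mathrm{MIT}, \tag{24}$$
which is the sum of the true MRT and MIT. MIT can also be
obtained from the input function according to Eq. 25 below:
$$\mathrm{MIT} = \frac{\int_{0}^{\infty} \text{input function} \cdot t \, dt}{\int_{0}^{\infty} \text{input function} \, dt} = \frac{\int_{0}^{\infty} \text{input function} \cdot t \, dt}{F \cdot \mathrm{Dose}}, \tag{25}$$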
1.6. NCA Approaches for Sparse Data
In some instances it may not be possible to obtain sufficient samples
from each subject so as to completely characterize the plasma
concentration–time curve. This may be due to the need to sacrifice the
animal to obtain the samples, general concerns over blood loss
(such as in human neonates or small rodents), or cost concerns.
In these situations it is necessary to pool the data from multiple
1.7. Suggested Reading
For further reading on basic pharmacokinetic principles, we refer
the reader to Benet (8), Benet and Galeazzi (9), Gibaldi and Perrier
(10), Nakashima and Benet (2, 3), Jusko (11), and Rowland and
Tozer (12). Houston (13) and Pang (14) provide excellent texts on
metabolite kinetics.
Benet and Galeazzi (9), Watari and Benet (15), and Nakashima
and Benet (3) have elaborated on the theory of NCA, while Gillespie
(16) discussed the pros and cons of NCA versus compartmental
models.
References
1. Gabrielsson J, Weiner D (2006) Pharmacokinetic and pharmacodynamic data analysis: concepts and applications, 4th edn. Swedish Pharmaceutical Press, Stockholm
2. Nakashima E, Benet LZ (1988) General treatment of mean residence time, clearance and volume parameters in linear mammillary models with elimination from any compartment. J Pharmacokinet Biopharm 16:475
3. Nakashima E, Benet LZ (1989) An integrated approach to pharmacokinetic analysis for linear mammillary systems in which input and exit may occur in/from any compartment. J Pharmacokinet Biopharm 17:673
4. Bailer AJ (1988) Testing for the equality of area under the curve when using destructive measurement techniques. J Pharmacokinet Biopharm 16:303
5. Yeh C (1990) Estimation and significance tests of area under the curve derived from incomplete blood sampling. In: ASA Proceedings of the Biopharmaceutical Section, p 4
6. Nedelman JR, Jia X (1998) An extension of Satterthwaite’s approximation applied to pharmacokinetics. J Biopharm Stat 8(2):317
7. Hing JP, Woolfrey SG, Wright PMC (2001) Analysis of toxicokinetic data using NONMEM: impact of quantification limit and replacement strategies for censored data. J Pharmacokinet Pharmacodyn 28(5):465
8. Benet LZ (1972) General treatment of linear mammillary models with elimination from any compartment as used in pharmacokinetics. J Pharm Sci 61:536
9. Benet LZ, Galeazzi RL (1979) Noncompartmental determination of the steady-state volume of distribution. J Pharm Sci 48:1071
10. Gibaldi M, Perrier D (1982) Pharmacokinetics. Revised and expanded, 2nd edn. Marcel Dekker Inc., New York, NY
11. Jusko WJ (1992) Guidelines for collection and analysis of pharmacokinetic data. In: Evans WE, Schentag JJ, Jusko WJ (eds) Applied pharmacokinetics: principles of therapeutic drug monitoring, 3rd edn. Applied Therapeutics, Spokane, WA
12. Rowland M, Tozer T (2010) Clinical pharmacokinetics and pharmacodynamics: concepts and applications, 4th edn. Lippincott Williams & Wilkins, Maryland
13. Houston JB (1994) Kinetics of disposition of xenobiotics and their metabolites. Drug Metab Drug Interact 6:47
14. Pang KS (1985) A review of metabolic kinetics. J Pharmacokinet Biopharm 13(6):633
15. Watari N, Benet LZ (1989) Determination of mean input time, mean residence time, and steady-state volume of distribution with multiple drug inputs. J Pharmacokinet Biopharm 17(1):593
16. Gillespie WR (1991) Noncompartmental versus compartmental modeling in clinical pharmacokinetics. Clin Pharmacokinet 20:253
Chapter 17
Abstract
Compartmental models are composed of sets of interconnected mixing chambers or stirred tanks. Each
component of the system is considered to be homogeneous, instantly mixed, with uniform concentration.
The state variables are concentrations or molar amounts of chemical species. Chemical reactions,
transmembrane transport, and binding processes, determined in reality by electrochemical driving forces
and constrained by thermodynamic laws, are generally treated using first-order rate equations. This
fundamental simplicity makes them easy to compute since ordinary differential equations (ODEs) are
readily solved numerically and often analytically. While compartmental systems have a reputation for being
merely descriptive, they can be developed to levels providing realistic mechanistic features through refining
the kinetics. Realistic modeling generally calls for multi-compartmental systems. Compartments can be
used as “black box” operators without explicit internal structure, but in pharmacokinetics compartments are
considered as homogeneous pools of particular solutes, with inputs and outputs defined as flows or solute
fluxes, and transformations expressed as rate equations.
Descriptive models providing no explanation of mechanism are nevertheless useful in modeling of many
systems. In pharmacokinetics (PK), compartmental models are in widespread use for describing the
concentration–time curves of a drug following administration. This gives a description of
how long the drug remains available in the body, and is a guide to defining dosage regimens, method of
delivery, and expectations for its effects. Pharmacodynamics (PD) requires more depth since it focuses on the
physiological response to the drug or toxin, and therefore stimulates a demand to understand how the
drug works on the biological system; having to understand drug response mechanisms then folds back on
the delivery mechanism (the PK part) since PK and PD are going on simultaneously (PKPD).
Many systems have been developed over the years to aid in modeling PKPD systems. Almost all have
solved only ODEs, while allowing considerable conceptual complexity in the descriptions of chemical
transformations, methods of solving the equations, displaying results, and analyzing systems behavior.
Systems for compartmental analysis include Simulation and Applied Mathematics, CoPasi (enzymatic
reactions), Berkeley Madonna (physiological systems), XPPaut (dynamical system behavioral analysis),
and a good many others. JSim, a system allowing the use of both ODEs and partial differential equations
(that describe spatial distributions), is used here. It is an open source system, meaning that it is available for
free and can be modified by users. It offers a set of features unique in breadth of capability that make model
verification surer and easier, and produces models that can be shared on all standard computer platforms.
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_17, © Springer Science+Business Media, LLC 2012
392 J.B. Bassingthwaighte et al.
1. Introduction
1.1. Overview of the Topic
Compartmental analysis implies the use of linear first-order differential
operators as analogs for describing the kinetics of drug distribution
and elimination from the body. Concentrations are measured in
accessible fluids, usually the plasma, and the concentration–time
curve is used to provide a measure of how long the drug concentration
remains at a therapeutic level. This is the basis of pharmacokinetics
(PK). The influences on efficacy and utility are considered by
the term ADME, administration, distribution, metabolism, and
elimination. Drugs are given in chemically significant amounts;
they bind to enzymes, channels, receptors or transporters, changing
reaction rates and fluxes in concentration-dependent fashion. The
effects on the biological system are termed pharmacodynamics (PD).
Precise mathematical statements about the kinetics and the body’s
responses comprise the combination
pharmacokinetics–pharmacodynamics (PKPD).
Compartmental analysis had its historical start with the use of
tracers. Tracer-labeled compounds were used in order to determine
kinetics when the drug concentrations were too low to be measured
chemically. Radioactive tracers were given in such low concentrations
relative to those of native non-tracer mother substances that
the kinetics were in fact linear. Consider a reaction rate, k(C), that is
dependent upon the concentration of the mother substance of
concentration C(t):
$$\text{Flux of mother substance} = k(C)\,C(t). \tag{1}$$
When tracer of concentration C′ is added to the system, then
$$\text{Total flux, mother and tracer} = k(C + C')\,[C(t) + C'(t)]. \tag{2}$$
When C′ << C, then the rate constant is determined solely by
C, as k(C + C′) ≈ k(C), and the rate constant is independent of
the tracer concentration:
$$\text{Tracer flux of } C'(t) = k(C)\,C'(t), \tag{3}$$
where the flux is first order in C′ when the background non-tracer
mother substance concentration is constant. When only the tracer is
changing concentration, the k(C) is constant and the system is first
order and linear. In general then, one can look upon tracers in
compartmental systems as being linear, first-order systems, though
nowadays they can go far beyond that. The originators and later
proponents of compartmental analysis (Berman (1); Jacquez (2, 3);
17 Compartmental Modeling in the Analysis of Biological Systems 393
Cobelli et al. (4)) used this simplification, but were always aware of
the greater possibilities of allowing nonlinear coefficients. Berman’s
classic 1963 article (1) provides much more than solutions to ordi-
nary differential equations (ODEs) for he outlines an important
philosophic approach to modeling in general. Jacquez’ books and
many articles, and the book by Cobelli et al. (4), give detailed
mathematical approaches and explicit applications. The desire to
use linear kinetics was not so much to avoid solving nonlinear
equations as it was to use linear algebra to solve the differential
equations. A system of linear differential equations can be solved
by matrix inversion and can provide the much desired analytical
solutions. As we shall see below, analytical solutions are still desired,
for they serve as verification that the numerical solutions produced
by modern simulation systems are correct in specific reduced cases,
and thereby imply that the nonlinear system solutions in that neigh-
borhood of parameter space are also correct. But, because most
biological phenomena are nonlinear, such that the rate coefficients
vary with the concentrations of one or usually more solutes, tem-
perature, and pH, we have to acknowledge right at the start that
using linear compartmental systems analysis is an approximation.
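The tracer argument of Eqs. 1–3 can be illustrated numerically. The sketch below assumes, purely for illustration, a saturable rate coefficient of the form k(C) = Vmax/(Km + C); the function names and parameter values are hypothetical. With a large background of mother substance, doubling a trace amount doubles the tracer flux (first-order behavior). Without the background, the same doubling gives a smaller than twofold increase.

```python
def rate_coefficient(c_total, vmax=10.0, km=2.0):
    """Concentration-dependent rate coefficient k(C), saturable form."""
    return vmax / (km + c_total)

def tracer_flux(c_mother, c_tracer):
    """Tracer share of the total flux: k(C + C') * C'."""
    return rate_coefficient(c_mother + c_tracer) * c_tracer

# Trace amounts on a constant background: flux is proportional to C'.
ratio_with_background = tracer_flux(5.0, 2e-4) / tracer_flux(5.0, 1e-4)

# No background, chemically significant amounts: the system is nonlinear.
ratio_without_background = tracer_flux(0.0, 10.0) / tracer_flux(0.0, 5.0)

print(ratio_with_background, ratio_without_background)
```

The first ratio is essentially 2, the second is well below 2, which is the sense in which a linear compartmental description is exact for tracers but only an approximation for the mother substance.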
Compartmental analysis was mostly descriptive, not mechanistic.
It was “Black Box,” not attempting to define enzymatic reactions
mechanistically, but to describe the time course. “White Box”
modeling is where the innards of the operational analysis attempt to
describe mechanism, not just the kinetics of a relationship.
Nevertheless, the descriptive level was a success; in FDA reviews
quantitative descriptions are valuable, for they distinguish groups of
responses and allow categorization even when they cannot provide
a physiological interpretation. The plasma concentration–time data
are very useful in choosing methods of administration and in
defining dosage regimens.
Modern molecular biology and emerging integrative multi-scale
modeling analysis likewise are changing the game. Personalized
medicine is pointedly mechanistic, with cell and molecular physiology
dominating in the strategies of Administration, Distribution,
Metabolism, and Elimination (ADME). Fortunately the huge
increase in the rates of acquisition of data, causing a demand for
detailed, informative but complex simulation analysis, has been more
than compensated for by the increases in computational speed and by
improved software facilitating modeling analysis. Most importantly,
software sharing is now relatively easy, and of much increased
importance since comprehensive models may take years in development.
Now, highly nonlinear complex systems, spatially distributed or
lumped, are handled with faster computation, and can include kinetics
and detailed physiological pharmacodynamics (the PD of
PKPD), which is the systematic analysis of the body’s responses to
the drug or toxin. Given the relevance of physiological transport
processes (diffusion, flow, transmembrane exchange, binding) in
1.2. Model Types, Topologies, and Equation Types
1.2.1. In Terms of Input and Output Characteristics We May Classify Compartmental Systems as Closed or Open
1. Closed system: No sinks or sources; literally, all fluxes between
inside and outside are zero, and external driving forces are all
zero.
2. Open system: There are external sinks and/or sources for some
of the constituents in some of the compartments or cells. (Sinks
are defined as operations via which a substance vanishes from
the system; sources are operations generating a substance.)
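The distinction between closed and open systems shows up directly in the mass balance. The following minimal sketch uses a hypothetical two-compartment exchange model in terms of amounts q1 and q2, with an optional first-order sink and a constant source acting on compartment 1. All names and values are assumptions for illustration, and simple forward Euler stepping is used for brevity.

```python
def simulate(k12, k21, k_sink=0.0, source_rate=0.0,
             q1=1.0, q2=0.0, t_end=10.0, dt=0.001):
    """Integrate the two-compartment amounts with forward Euler steps."""
    for _ in range(round(t_end / dt)):
        # Both derivatives are evaluated before either amount is updated.
        dq1 = -k12 * q1 + k21 * q2 - k_sink * q1 + source_rate
        dq2 = k12 * q1 - k21 * q2
        q1 += dq1 * dt
        q2 += dq2 * dt
    return q1, q2

closed = simulate(k12=0.5, k21=0.2)                 # no sink, no source
open_sink = simulate(k12=0.5, k21=0.2, k_sink=0.3)  # first-order sink
open_source = simulate(k12=0.5, k21=0.2, source_rate=0.1)

print(sum(closed), sum(open_sink), sum(open_source))
```

In the closed case the total amount stays at its initial value; a sink lowers it and a source raises it. A mass balance check of this kind is a cheap first test of any compartmental code.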
2. Software for Systems Description, Simulation, and Data Analysis
2.1. Common Software and Methods Used in the Field
Pharmacokinetic models have been written in virtually every
computer language that exists, and it is a field that has stimulated
the development of a large set of relatively specialized simulation
systems. A partial list of simulation software for compartmental
analysis goes back half a century:
SAAM, Simulation, Analysis, and Modeling, was the first, developed by
Mones Berman at NIH for analyzing tracer kinetics
(http://depts.washington.edu/saam2/). It exists still as SAAMII.
SIMCON, a general simulation control system (7), now evolved
into JSim, was used to solve FORTRAN-based models of all
sorts.
XPPAUT, from Bard Ermentrout
(http://www.math.pitt.edu/~bard/xpp/xpp.html). XPPAUT is particularly
good for bifurcation analysis of dynamical systems.
Gepasi, now CoPasi, from Pedro Mendes
(http://www.softpedia.com/progDownload/Gepasi-Download-167140.html;
http://www.copasi.org/tiki-view_articles.php). CoPasi is especially
good for enzymatic reactions and biochemical systems,
allowing a menu of choices for reaction types.
Modelica, http://www.openmodelica.org/, is excellent for linking
operators and presenting the forms of model networks
(http://www.ida.liu.se/labs/pelab/modelica/OpenSourceModelicaConsortium.html).
Jarnac is designed for symbolic or diagrammatic entry for biochemical
and gene regulatory reactions (from Herbert Sauro (8, 9),
http://sys-bio.org/jarnac/).
JSim: Developed from SIMCON and XSim (for X-windows Linux
systems, http://www.physiome.org/software/xsim/) into
JSim (10) (http://www.physiome.org/jsim/). JSim was developed
by Erik Butterworth (11); it provides automated unit
balance checking and unit conversion, thus avoiding errors
due to inconsistency in the units used in the code.
NONMEM, nonlinear mixed effects modeling, is a commercial
software package providing the capability to use a wide variety
of pharmacokinetic models. It is particularly designed for the
analysis of sparse data sets using combinations of single patient
and population data (http://www.iconplc.com/nonmem).
BioSPICE (derived from SPICE, for biology:
http://sourceforge.net/projects/biospice) is designed for molecular
biology. See also http://jigcell.cs.vt.edu/software.php for JigCell,
based on BioSPICE.
2.2. A Preferred Simulation System, JSim
JSim is perhaps the most general of these simulation analysis systems,
designed for the analysis of experimental data. It is built around a
“project file, .proj,” that may hold many data sets, several different
models, and results of multiple analyses. JSim handles not only the
ODEs around which traditional compartmental modeling is built,
but also DAEs, implicit functions, PDEs, and stochastic equations.
JSim, uniquely, and from its beginning in 1999, uses unit balance
checking and automated unit conversion. (Unit balance checking
assures that the units of the expressions on the left of the equal sign
are the same as those on the right. Automated unit conversion means
that when time is expressed in minutes a velocity expressed in cm/s
will be converted to cm/min by multiplying cm/s by 60 s/min.)
This pair of features is a great boon in programming since in the first
phase of compilation it automates the first stage of verification of the
model’s mathematical implementation by making sure that every
equation has unit balance. The second phase of compilation parses
the details of the equations, and sequences them for efficient
computation. The run-time code is compiled into Java, which now runs
almost as fast as FORTRAN and C. (On a cardiovascular–respiratory
system model JSim ran exactly 300 times faster than a Matlab–Simulink
version of the identical model.) JSim’s advantages over the ODE-based
systems listed above are the following:
1. Runs on Linux, Macintosh, and Windows.
2. Is free and downloadable from www.physiome.org. On the
Macintosh it takes about 30 s to download and install, and
another 10 s to bring up a model.
3. Is the only one that solves PDEs and offers an assortment of solvers
for both PDEs (three available now) and ODEs (eight available).
4. Imports and exports both SBML and CellML archival forms.
5. Provides sensitivity analysis of two types, relative and absolute.
6. Makes graphical output available immediately during the simulation
run, with setup taking only seconds.
7. Has seven built-in optimizers for excellent power in parameter
adjustment to fit data.
8. Provides the covariance matrix giving the correlation among
free parameters and estimates of parameter confidence limits.
9. Uses project files that allow the analysis of many experimental
data sets in one file.
10. Stores parameter sets so that individualized parameter sets for
each data set can be stored.
11. Allows the use of several models within one project file so that
competing hypotheses (models) can be compared and evaluated.
12. Is structured so that the front-end parameter control and
graphical user interface (GUI) can be framed explicitly for any
model.
13. Has linear and log line graphs, 2D contour plots representing
3D, and phase-plane plots.
14. Has “looping” capability, allowing discrete successive jumps of
the values of one or two parameters at a time in order to
explore system sensitivities visually and rapidly.
15. Uses a Mathematical Modeling Language, MML, in which one
writes the equations directly, for simultaneous solution, and in
which the order of the equations is not specified.
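The unit balance checking and automated unit conversion described before this list can be mimicked in a few lines outside JSim. The sketch below is not JSim code and does not reflect how JSim implements the feature; the `Unit` class and helper names are hypothetical. Each unit carries a scale factor to base units (cm, s) and the exponents of length and time, so that a dimensional mismatch is detected and a cm/s to cm/min conversion yields the factor 60 mentioned in the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    scale: float   # factor converting this unit to base units (cm, s)
    length: int    # exponent of length
    time: int      # exponent of time

CM = Unit(1.0, 1, 0)
SEC = Unit(1.0, 0, 1)
MIN = Unit(60.0, 0, 1)

def divide(a, b):
    """Unit of the quotient a/b."""
    return Unit(a.scale / b.scale, a.length - b.length, a.time - b.time)

def convert(value, src, dst):
    """Convert value from src to dst, refusing dimensionally unbalanced pairs."""
    if (src.length, src.time) != (dst.length, dst.time):
        raise ValueError("unit imbalance: dimensions differ")
    return value * src.scale / dst.scale

CM_PER_S = divide(CM, SEC)
CM_PER_MIN = divide(CM, MIN)

velocity_cm_per_min = convert(2.0, CM_PER_S, CM_PER_MIN)
print(velocity_cm_per_min)   # 2 cm/s expressed in cm/min
```

Attempting `convert(1.0, CM, SEC)` raises an error, which is the analog of an equation whose two sides do not have the same units.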
There are no special requirements for the JSim software or for
its methods of use for model building and exploring or use in
analysis with respect to hardware, computing platform, or operating
system. It has important limitations, not being a procedural language
but a declarative mathematical language. This means there is
no equivalent of a FORTRAN DO-loop (or GO TOs or jumps). It
cannot yet do matrix inversions (except through a special mechanism),
and is in a continuing state of development. JSim 2.0,
released in February 2011, is based on a new compiler providing
many new features described at nsr.bioeng.washington.edu/JSim/.
The features listed above, and others not listed, have been
implemented in JSim because years of experience with a large
variety of models, with teaching graduate and undergraduate classes,
and with postdoctoral and faculty workshops have led to a detailed
understanding of how people use modeling in scientific research.
Experiment design, hypothesis testing, and system parameterization
are given priority in the conveniences provided.
3. Compartmental Modeling
3.1. The Modeling Process
The overall process in the experiment/model hypothesis iteration
loop of Platt (12) is as follows: (1) express the hypothesis in
quantitative terms, as a mathematical model, with units on everything;
(2) use the model to determine the best experiment that
might contradict the predictions of the model, or, better yet,
develop an alternative model that is seemingly as good but makes
different predictions, and then design the experiment that clearly
distinguishes between the models; and (3) do the experiment and
analyze the data. One of the two competing models, maybe both,
must be proven wrong, and so science is advanced.
The normal data analysis using models begins by putting the
data to be analyzed in a “project file,” modelname.proj, and
displaying them on the JSim plot-pages. The second stage, coding and
model verification in accord with standards
(http://www.imagwiki.nibib.nih.gov/mediawiki/index.php?title=Working_Group_10),
is building and testing the model, incorporating reference analytical
solutions if appropriate to verify the solutions as being
mathematically accurate, and representing the equations. The “project file”
may contain two or more models so that alternative model forms
can be compared directly by examining the solutions (changing
parameters and rerunning, using “loops” to automatically change
parameters, using behavioral analysis, plotting in various forms
including phase-plane plots, contour plots). The verification stage
is to show that the model solutions are computed correctly, done by
testing different solvers, using different time-step sizes, and
comparing with analytical solutions in special cases.
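The verification steps named above (different solvers, different step sizes, comparison with an analytical solution) can be sketched for the simplest special case. The example assumes a closed two-compartment system with equal volumes and symmetric exchange, for which C1(t) = 0.5·[1 + exp(−2Kt)] when C1(0) = 1 and C2(0) = 0. The rate constant and the hand-written Euler and fourth-order Runge–Kutta steppers are illustrative stand-ins, not the solvers supplied with JSim.

```python
import math

K = 0.5   # exchange rate constant, 1/min (hypothetical)

def deriv(c):
    """dC1/dt and dC2/dt for symmetric exchange between equal volumes."""
    flux = K * (c[0] - c[1])
    return (-flux, flux)

def euler_step(c, dt):
    d = deriv(c)
    return tuple(ci + dt * di for ci, di in zip(c, d))

def rk4_step(c, dt):
    def shifted(k, h):
        return tuple(ci + h * ki for ci, ki in zip(c, k))
    k1 = deriv(c)
    k2 = deriv(shifted(k1, dt / 2.0))
    k3 = deriv(shifted(k2, dt / 2.0))
    k4 = deriv(shifted(k3, dt))
    return tuple(ci + dt * (a + 2.0 * b + 2.0 * g + d) / 6.0
                 for ci, a, b, g, d in zip(c, k1, k2, k3, k4))

def max_error(step, dt, t_end=4.0):
    """Largest deviation of C1 from the analytical solution on the grid."""
    c = (1.0, 0.0)
    worst = 0.0
    for i in range(1, round(t_end / dt) + 1):
        c = step(c, dt)
        exact = 0.5 * (1.0 + math.exp(-2.0 * K * i * dt))
        worst = max(worst, abs(c[0] - exact))
    return worst

for name, step in (("Euler", euler_step), ("RK4", rk4_step)):
    print(name, max_error(step, 0.1), max_error(step, 0.05))
```

Halving the step roughly halves the Euler error, as expected for a first-order method, while the fourth-order error is already far smaller at the coarser step. Agreement of this kind in a reduced case lends confidence to the numerical solution of the full model.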
The validation stage is to test the fitting of the model solutions
against the experimental data. The word “validation” is truly opti-
mistic, because a good fit of the model solution to the data does not
really validate the model, but merely fails to invalidate it. It is the
failure of the model that leads to the scientific advances by forcing
new ideas to be incorporated. Nevertheless, fitting of the model to
the data provides characterization of the data, augments diagnostic
acuity, assesses progress of disease or evidence of successful therapy,
and is generally useful in reconciling the working hypothesis
(model) with observations.
The final phase is preparing the model so that it can be reproduced
by others. This is not only critical from a tutorial point of view but also
in fact is a requirement for any scientific publication. Anything that
3.2. A Simple Compartmental Model Implemented in JSim
The modeling code. For an introduction to JSim we use a
two-compartment closed system with passive exchange between the
compartments, and a conversion reaction of solute A to solute B
in either or both compartments. This model has analytic solutions
which could be used either to show the solutions or to provide
verification of the accuracy of the numerical solution, but since
these solutions run no faster than the numerical solutions, they
will not be used here. Detailed instruction in JSim use is available at
http://www.physiome.org/jsim/. This model is #246.
Many model programs are available at http://www.physiome.org/Models.
One can search to find a model similar to what one
might like to construct, e.g., from a tutorial list of compartmental
models:
http://www.physiome.org/jsim/models/webmodel/NSR/TUTORIAL/COMPARTMENTAL/index.html.
Open model #246, Comp2ExchReact, and you will be asked to
allow the display on your computer. Wait a moment for it to be
compiled, and then click on “Source” at the bottom of the JSim
page to show the source code (Table 1) for the model shown in
Fig. 1. All the models on the Web site are archived to keep track of
model changes (previous versions can be found under the “Model
History” section on each model Web page). (Models edited over
the Web cannot be saved on the Physiome Web server, so simply
save what you want to your own directory. The JSim system can
likewise be downloaded directly and the model worked on from
your own computer.) The model for the code in Fig. 1 is diagrammed
at the bottom of the figure; it is a two-compartment
model for two substances, A and B. Both substances can passively
move from one compartment to the other. A is irreversibly converted
to B in either or both compartments. After the title a short
description is provided. (Text enclosed by /* . . .*/ is ignored by
the compiler, as is comment text following // on any line.)
An important JSim feature is invoked next, the reading of a
units file, nsrunit, which allows automated unit balance checking
and automated unit conversion to common units during the
compilation phase; MKS, CGS, and English units can all be
used. (The unit conversion can be turned off, a feature useful
Fig. 1. JSim code for a two-compartment model in which a solute A can exchange across
a membrane between volume V1 and V2 and can react irreversibly to form solute B in
either compartment. This is available for download at
www.physiome.org/jsim/models/webmodel/NSR/Comp2ExchangeReaction/index.html and is model #246.
Table 1
Code for two-compartmental model with passive bidirectional exchange
3.2.1. Unit Balance Checking
The acute observer will have noticed that the independent variable, t,
is in seconds: the differential equation therefore looks to have
unbalanced units. It actually computes correctly since the phrase “unit
conversion on” at the top of the program preceding the model code
enlists the automated unit conversions so that in the compiled code
the multiplier of the left side, “60 s/min,” is inserted. Without unit
conversion on, the best way to handle this is to put the independent
variable t in minutes. In languages like Matlab there is no unit
checking. In huge projects like the Mars Climate Orbiter mission
(http://www.spaceref.com/news/viewpr.html?pid=2937), mixing
English and metric units between two engineering teams led to the loss
of the space vehicle and the termination of the mission.
There would have been no problem in JSim as long as the units were
stated and unit conversion “on” (11). The nsrunit file may be viewed
3.2.2. Graphical User Interface
The JSim GUI for simulation control is shown in Fig. 2. The left
panel has overlays for project contents: one or more models, data
sets, parameter sets, setups for solvers for ODEs and PDEs,
sensitivity analysis, optimization, and confidence limits. To start a
simulation run one clicks “RUN” at the top of the run time
window. On the right page one clicks on a “Message” panel for
error messages, or the plot pages (1_Conc or 2_ReacSite, names
chosen by the user), and at the page bottom chooses “Graph” for
displaying the output graphically or “Text” for seeing the numerical
listings of the experimental data and the model solutions at each
time point.
There is an extensive introduction to JSim at www.physiome.org/jsim
giving precise detail to supplement this outline of usage.
The JSim MML code is pretty easy for a beginner to use since
it contains just the parameters with their units, the variables
followed by (t) to indicate their dependence on the independent
variable t for time, the initial conditions, and the equations for the
model. The most common mistakes are to misapply the uses of
commas and semicolons. Each of a sequence of events is usually
comma separated, and a string of them is closed with a semicolon.
The model code itself begins with the left curly bracket, “{,” after
the word “math” and ends with the right curly bracket, “},”
after all of the equations have been written. Comments are
preceded by a double slash, “//,” or alternatively can be enclosed
between “/*” and “*/,” without the quote signs. Equations
end with a semicolon, including those in the initial condition
statement.
3.2.3. Exploring Parameter Influences Using the “LOOP” Mode of Operation
In loop mode, the user can choose to enter a sequence of values
(under Other Values) to explore model behavior widely, using
comma separation, e.g., 2, 3, 5, 8, etc., and in the “auto” mode
JSim will do as many runs as there are values entered. One can also
enter arithmetic changes such as @*2 or @ + 3 or more complicated
expressions to indicate automatic changes in the starting value by
406 J.B. Bassingthwaighte et al.
Fig. 2. Standard JSim Input/Output control and plot pages. Left panel: Runtime control: “Domain” is time, t, with tmin the
starting time, tmax the ending time, and tdelta the time step for plotting. “Model Inputs” gives parameter values and initial conditions.
“Model Outputs” shows values of variables at the end of the run, at t = 60 s. The mass balance check, TotalC, is exact to
eight decimals. Right panel: The time courses of the concentrations of A and B in compartments 1 and 2 are shown for the
parameters and initial conditions shown in the left panel and in the code in Fig. 1. The user chooses the variables to plot,
and the colors and line type or point type. The title and labeling are user written and retained in the JSim project file. The
legend for the graphics output is automated.
Fig. 3. Loop Mode Operation. The control panel for the looping operator is shown on the left. The starting values for G1 and
G2 are those shown in the code (Fig. 1) and their solutions are given by the solid lines, and are the same as in Fig. 2, with
conversion of A → B only in compartment 2. On the second run (dashed lines of same colors) the values entered by the
user under Other Values (left panel, top right) are used, in this case setting G1 = 2 and G2 = 0, so that the conversion of
A → B occurs now only in compartment 1 instead of in compartment 2.
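The looping idea can be sketched outside JSim. The Python fragment below is only an illustration under stated assumptions: it treats “@” as the parameter value used in the previous run, and the helper name loop_values is hypothetical, not part of JSim. JSim's own evaluation of loop expressions may differ in detail.

```python
def loop_values(start, update, n_runs):
    """Return the parameter values used in n_runs successive runs.

    `update` maps the previous value to the next one, mimicking an
    expression such as @*2 or @+3 (assumed meaning of "@": the value
    used in the previous run).
    """
    values = [start]
    for _ in range(n_runs - 1):
        values.append(update(values[-1]))
    return values


# Explicit list, as entered under "Other Values": one run per value.
explicit = [2, 3, 5, 8]

# Arithmetic changes: doubling (@*2) and adding three (@+3).
doubling = loop_values(1.0, lambda v: v * 2, 4)
adding = loop_values(1.0, lambda v: v + 3, 4)

print(doubling)
print(adding)
```

Each value in such a list would drive one model run, so that the family of solutions can be overlaid on a single plot as in Fig. 3.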
4. Applications
4.1. A Compartmental Approach to Aspirin Kinetics
Aspirin is a very old drug used to reduce fever and inflammation;
only recently has its mechanism of action begun to be understood,
the first being its action as a blocker of prostaglandin formation. Its
kinetics have not been thoroughly worked out, so we present here
our analysis of three sets of representative data from three different
studies. Aspirin is acetylsalicylic acid, and its reactions are as follows:
Acetylsalicylic acid → salicylic acid → salicyluric acid. (7)
Aspirin, acetylsalicylic acid, is hydrolyzed quickly via a plasma
esterase to salicylate. Salicylate is pharmacologically active. Salicy-
late kinetics dominate the clearance. The modeling examines only
salicylate’s enzymatic conversion to product, where product is con-
sidered to be equivalent to excretion into the urine, a saturable
process. This may produce salicyluric acid or a glucuronate. The
model captures the kinetics of salicylate clearance over a 100-fold
range of concentrations through consideration of one enzymatic
reaction with parameters optimized to fit three very different data
sets taken from the referenced papers.
The second reaction to form the excretable product is slow and
is enzymatically facilitated. At high doses the enzyme becomes
saturated, i.e., the reaction is limited by the fact that the enzyme
is all in the form of the bound enzyme–substrate complex and
raising the salicylate concentration does not accelerate the reaction.
At low dosage the clearance is rapid; at medium or high therapeutic
dosage the clearances are slower; at near-lethal toxic levels clearance
is very slow and often requires treatment by infusion of alkaline salts
and sometimes dialysis. For this example we have taken the data
from three research studies. (Low dose data are from Fig. 1 Right of
Benedek (14) Dose Period 1 (squares). Medium dose data are from
Fig. 4 of Aarons (15) oral dosage, last nine points. High dose data
are from Fig. 1 of Prescott (16) Control.) These particular data
were chosen because the chemical methods and procedures
appeared to be excellent, the data covered many hours, and, while
we do not have the original data, conversion from the symbols in
the figures to numerical representation was accomplished with
good accuracy.
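The effect of saturation on clearance can be illustrated with a deliberately simplified sketch. The Python fragment below assumes an irreversible Michaelis–Menten rate law, which is not the model of Table 2 (that model treats binding, unbinding, and a reverse reaction explicitly); the names mm_rate and fractional_clearance and the values of VMAX and KM are hypothetical and chosen only for illustration.

```python
def mm_rate(c, vmax, km):
    """Irreversible Michaelis-Menten flux (concentration per unit time)."""
    return vmax * c / (km + c)


def fractional_clearance(c, vmax, km):
    """Fraction of the substrate removed per unit time, rate / c."""
    return mm_rate(c, vmax, km) / c


VMAX = 1.0   # hypothetical maximal rate, concentration units per hour
KM = 1.0     # hypothetical half-saturation concentration

# Spanning a wide range of concentrations: the absolute rate approaches
# VMAX, so the fraction cleared per hour falls as the dose rises.
for dose in (0.1, 1.0, 10.0, 100.0):
    print(dose, round(mm_rate(dose, VMAX, KM), 4),
          round(fractional_clearance(dose, VMAX, KM), 4))
```

At low concentration the fractional clearance approaches VMAX/KM (first-order behavior); at high concentration the rate is capped at VMAX, which is the qualitative pattern described above for the low, medium, and toxic dose ranges.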
A reason for choosing aspirin as the subject for compartmental
analysis is that the time course of the clearance, mainly by loss into the
urine, is long compared to circulatory mixing times, so that the bio-
chemical processes appeared to limit the clearance, and they could
therefore be characterized. If the circulation and distribution times
were long compared to the reaction processes, the latter would not be
meaningfully determined from the observations. Another reason was
that a comparison among the different data sets suggested that the
The equations and parameters are identical for the three dosage
levels. Table 2 gives the model code for the low dose, where the
prefix L distinguishes this model's equations from those for the
medium dose, prefixed M, and the high dose, prefixed H, neither of
which is shown in Table 2.
In undertaking an analysis on a single enzymatic reaction we
lack knowledge of the exact mechanism. In addition the compart-
mental approximation is certainly questionable for whole body
studies. The hypothesis that a single enzymatic reaction dominates
the clearance would be strengthened if the model provides good fits
to the three data sets. There is no guidance from the literature on
the dissociation constant, KD, for our presumed enzymatic reac-
tion, so that we are neither constrained nor aided.
We chose to use a simple enzymatic reaction, one that allowed
characterizing the rates of binding and unbinding of substrate and
enzyme, and a rate of the forward reaction to yield the product. We
allow also a backward reaction, on the basis that all reactions are
Table 2
Model code for salicylate clearance (model downloadable
from www.physiome.org/Models: search for model 280)
Fig. 4. Data on salicylate clearance from the three laboratories are fitted simultaneously with a single enzyme model to
describe the clearance. Parameters are given in the model code in Table 2. Data are from Benedek (14) in the left panel,
Aarons (15) in the middle panel, and Prescott (16) in the right panel. Note that the concentration ranges are markedly
different. (http://www.physiome.org/jsim/models/webmodel/NSR/Aspirin/ model 280).
reverse reaction from the product back to salicylate. So for the first
level of testing all parameters are considered as open and adjustable,
including the initial values of the concentrations in the system. In
this analysis the system is considered to be a single well-stirred tank,
as if the circulation were instantaneously mixed. This gross over-
simplification also makes the assumption that the product either
goes directly into the urine or to some other location in the body
from which it does not return. Enzymatic conversion in the liver
followed by excretion into the bile would be equivalent kinetically
to conversion to a glucuronate followed by the circulation through
the blood and clearance in the kidney. Ideally one would measure
the urinary excretory rates simultaneously and model both the
plasma clearance rates and the urinary clearance. This was not
done here.
The results are shown in Fig. 4 where the curves at the low,
middle, and high concentration ranges are shown to be fitted
reasonably well by the model. The high concentration data (right
panel) are fitted less well but do illustrate that the slope is
Fig. 5. Semilog plots of the data (symbols) fitted with the model solutions for the salicylic acid (LSA, MSA, and HSA, solid
lines). All the data in Fig. 5 were fitted simultaneously with one parameter set for the enzyme as given in Table 2.
The Product concentrations (dashed lines for LP, MP, and HP) are merely predicted product concentrations. Since there are
no data on product concentrations, the assumption that there is no degradation of Product must lead to some overestimation
of its influence on the backward reaction.
Fig. 6. Three sets of linear sensitivity functions versus time. Solid lines are sensitivities to the initial zero-time
concentrations resulting from the doses. Long dashed lines are sensitivities to KD1: the sensitivities are all positive.
Dotted lines are sensitivities to KD2, the dissociation constant for the reverse reaction: the sensitivities are all negative.
4.1.1. Sensitivity Analysis
In Fig. 6 the linear or absolute sensitivity functions are shown
for initial concentrations and for the two dissociation constants.
The solid lines are sensitivities to the initial zero-time concentra-
tions resulting from the doses; most of the sensitivity is at the
earliest points. With the high dose the fractional clearance is so
low that the high sensitivity extends throughout the 16 h of the
study. For the middle dose, MSAtot, the sensitivity diminishes most
steeply as a function of time at around 10 h when the concentration
is close to KD1, the dissociation constant for substrate binding.
The long dashed lines are the sensitivities to KD1; these are all
positive, meaning that if KD1 were increased (decreasing the affinity
of the enzyme for the substrate SA) the model solutions for all
three doses would be at higher levels and the rate of disappearance
would be diminished. Note that the time of peak sensitivity to KD1
is at early times for the low dose, at 10 h for the middle dose, and at
late times for the high dose. The dotted lines are sensitivities to KD2:
the sensitivities are all negative, meaning that if KD2 were increased
(decreasing the affinity of the enzyme for the product P) the model
solutions would be at lower levels and the rate of disappearance
would be increased because of reduced rates of reverse flux from
product to salicylate.
Technically the sensitivity calculations are set up by a special
mechanism: at the bottom of the left page is a button labeled
“Sensitivity.” Clicking on it takes one to the “Sensitivity Analysis
Configurator.” There, in the leftmost column of the configurator
table, one types in, or chooses from the drop-down menu, the
parameter for which one wants to find the sensitivity. Clicking
on the down arrowheads brings up the choices. In the setup
provided on the Web site at www.physiome.org, the three starting
values for the initial concentrations at t = 0 are listed: LSAtot,
MSAtot, and HSAtot. Next on the list are the dissociation constants
KD1 and KD2. Their current values are automatically displayed under
“value.” Each S(t) is calculated on the basis of a parameter change
of 1%, set under “delta” at 0.01. The tick marks in
the OK column indicate that the calculation will be made as
described earlier, namely, that the standard solution will be
calculated and then another solution calculated for each of the five
parameters listed, with this 1% change in parameter value. S(t) is the
difference between the perturbed solution and the standard solution
at each time point, divided by the 1% change in the parameter value (Eq. 4).
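The finite-difference calculation just described can be sketched in a few lines. The Python fragment below is an illustration only: it uses a first-order washout as a stand-in for the salicylate model, and the function names and parameter values are hypothetical. It follows the recipe in the text: compute the standard solution, perturb one parameter by the fraction delta = 0.01, and divide the difference by the parameter change.

```python
import math


def model(t, c0, k):
    """Stand-in model: first-order washout, C(t) = c0 * exp(-k t)."""
    return c0 * math.exp(-k * t)


def sensitivity(times, params, name, delta=0.01):
    """Linear sensitivity S(t) = (C_perturbed - C_standard) / dp,
    where dp = delta * p is a fractional change in parameter `name`."""
    dp = delta * params[name]
    perturbed = dict(params, **{name: params[name] + dp})
    return [(model(t, **perturbed) - model(t, **params)) / dp for t in times]


times = [0.0, 1.0, 2.0, 5.0, 10.0]
params = {"c0": 1.0, "k": 0.3}   # hypothetical values

s_c0 = sensitivity(times, params, "c0")   # positive, largest at t = 0
s_k = sensitivity(times, params, "k")     # negative: faster loss lowers C(t)

for t, a, b in zip(times, s_c0, s_k):
    print(t, round(a, 4), round(b, 4))
```

As in Fig. 6, the sign of S(t) shows the direction in which the solution moves when the parameter is increased, and the time of peak magnitude shows where the data are most informative about that parameter.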
4.1.2. Optimization
This is the process of fitting the model solutions as closely as
possible to the data in order to guide one’s thinking about and
one’s use of the model. When the fit is very close, then one has a
descriptor of the fitted data sets, that is, the model and its parameter
set provide a record of that description. Descriptions of many
different studies, whether patient studies or experiments, allow
comparisons and possible classification into categories having
specific distinctions. Descriptive models are useful for diagnosis
and possibly for prognosis or choosing modes of therapy. (If the
model “explains” the data by defining the physical and chemical
mechanisms, that is even better.)
When the fit is poor, then more exploration is needed. Was
automated optimization used? A typical set of trajectories of param-
eter values during an optimization run using SENSOP is shown in
Fig. 7. The values do not range widely; to assure one that they have
not settled into local minima we also used other optimizers that
search widely, e.g. simulated annealing. Try weighting the data
differently: a simple sum of squares minimization may not be
4.2. Multiple Sequential Dose Administration: Hepatic Function
Tests of clinical function evolve in a variety of ways, usually being
designed long after the function has been clearly understood. Here
is a counterexample, a case in which the idea of a clinical evaluation
based on the administration of a drug was evident right at
the beginning. There was a coalescence of features that brought
this about.
For the estimation of cardiac output using the indicator dilu-
tion technique Mayo Clinic’s Earl Wood needed a dye that
absorbed light at a wavelength of 800 nm, the isosbestic point at
which oxyhemoglobin and reduced hemoglobin absorbed equally.
This would allow optical detection and quantitation of the dye
Fig. 7. Optimization: Trajectories of values for parameters being optimized, in this case the three initial concentrations and
the two dissociation constants during 100 trials of fitting the model solutions to the salicylate data. The optimizer used was
SENSOP (17); the staircase nature of the plotted values arises because SENSOP reports the previous estimate of the parameter
vector as it calculates each new sensitivity function, one for each parameter being optimized, and then reports and plots
the new value of each parameter.
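[Fig. 8, lower panel, diagram labels: injection (CInject) and recirculation (CRecirc) enter compartment 1, Lungs and Heart (V1, C1, Flow1 = C.O.), which feeds three compartments: Kidney and Head (V2, C2, Flow2), Liver and Gut (V3, C3, Flow3, with clearance G3), and Muscle (V4, C4, Flow4).]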
Fig. 8. Indocyanine Green Injection and Clearance. Upper panel: Blood concentrations in a
14 kg dog with 22 successive intravenous injections of 2.5 mg ICG (vertical pips). Dashed
lines represent estimated single exponential decay after each series of injections. Data
are from Edwards et al. (22). Lower panel: Circulatory model for hepatic clearance of ICG,
a mammillary compartmental model. Model code, with all parameters and equations, is in
Table 3. Modeling results are in Fig. 9.
was excreted via the bile: the feces turned green! Within a few years
the indocyanine green clearance test was used as a liver function
test; it was a valued test even before the mechanisms of its hepatic
excretion became known (24).
The data from one dog (Edwards, 1960), shown in Fig. 8
(top panel), invite analysis. After each series of injections a set of
blood samples was obtained as the concentration diminished: the
diminutions appeared as straight lines on the semilog plot shown in
Fig. 8, top. This suggests a first-order clearance, i.e., a constant
fraction of the dye was being removed per unit time. Now let us
develop a simple model, and then test it against the data.
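The inference from a straight line on a semilog plot to a first-order rate constant can be made concrete. The following Python sketch fits ln C against t by ordinary least squares; the slope, with sign reversed, is the fraction removed per unit time. The samples are synthetic and noise-free (they are not the Edwards data), and the function name first_order_rate is hypothetical.

```python
import math


def first_order_rate(times, concs):
    """Least-squares slope of ln(C) versus t; returns k in C = C0 exp(-k t)."""
    n = len(times)
    logs = [math.log(c) for c in concs]
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return -sxy / sxx


# Synthetic samples generated with k = 0.05 per minute (illustrative only).
times = [0.0, 5.0, 10.0, 15.0, 20.0]
concs = [4.0 * math.exp(-0.05 * t) for t in times]

k = first_order_rate(times, concs)
half_time = math.log(2) / k

print(k, half_time)
```

With real data the points scatter about the line, and the residuals from this fit give a first indication of whether a single first-order process is an adequate description.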
The anatomy and physiology provide the framework for the
model. The dye distributes throughout the whole blood volume.
Table 3
Compartmental model for hepatic ICG clearance
The four-compartment model has recirculation. The input function to compartment 1 (Heart and Lung)
is the sum of the recirculated indicator plus the series of injection pulses into V1, Qinjrate, each
injection at x mg/min for a short duration. Each injection, Cin#, is defined at run time using a separate
function generator. Clearance of the injected dye, indocyanine green, is hepatic extraction from the
blood via a saturable transporter on the hepatocyte sinusoidal membrane followed by ATP-dependent
excretion across the hepatocyte apical membrane into the bile, but is represented here by a passive first-order
loss, G3. This is adequate kinetically only at low concentrations of ICG, where the transporter is
mainly uncomplexed. */
KEY WORDS: compartment, flow and exchange, mixing chamber, hepatic clearance, first-order
consumption, washout, organ, multi-organ, recirculation
REFERENCES:
Edwards AWT, Bassingthwaighte JB, Sutterer WF, and Wood EH. Blood level of indocyanine green in
the dog during multiple dye curves and its effect on instrumental calibration. Proc S M Mayo Clin 35:
747-751, 1960.
Edwards AWT, Isaacson J, Sutterer WF, Bassingthwaighte JB, and Wood EH. Indocyanine green
densitometry in flowing blood compensated for background dye. J Appl Physiol 18: 1294-1304,
1963.
REVISION HISTORY:
Author: BEJ 06jan11
Revised by: JBB 09jan11 to combine the function generators, fgens, to speed computation
COPYRIGHT AND REQUEST FOR ACKNOWLEDGMENT OF USE:
Copyright (C) 1999-2011 University of Washington. From the National Simulation Resource,
Director J. B. Bassingthwaighte, Department of Bioengineering, University of Washington, Seattle WA
98195-5061.
Academic use is unrestricted. Software may be copied as long as this copyright notice is included.
This software was developed with support from NIH grant HL073598.
Please cite this grant in any publication for which this software is used and send an e-mail with the
citation and, if possible, a PDF file of the paper to [email protected].
*/ Table 3 is an example of a complete set of MML code in the format used in general for JSim project files, having the
same sequence and format as those used in the repository of models at www.physiome.org/Models.
Fig. 9. Model solution to fit the ICG data in a 14 kg dog. The triangles are the data shown in Fig. 8 (upper). The parameters
and initial conditions for the model are those given in the code in Table 3. The parameters are the result of optimization
using NL2SOL from Dennis and Schnabel (19); the total blood volume, Vtot, and the hepatic clearance, G3, were the only
free parameters. The 2.5 mg injection pulses are shown along the abscissa. The dashed lines are mono-exponentials with
time constant Vtot/G3. The model is #103 at http://www.physiome.org/jsim/models/webmodel/NSR/Comp4ICG/.
5. Summary: The Processes Undertaken in Pharmacokinetics
In Subheading 4 we covered a standard approach to the steps in a
modeling analysis of data. The order of the steps depends a little on
the nature of the task. In the first model we performed no
verification steps, and in the second the verification was done after
the fitting of the model to the data. This is clearly in the wrong
order: there is no point fitting a model to the data until it has been
demonstrated to be computed correctly, so in the list that follows,
the verification is done as soon as the draft model is constructed.
One cannot argue that the verification is not needed until the
model fits the data: a mathematically incorrect model might fit
the data, and after all that work the effort would be shown to be a
waste if the code had an error. Our failure to precede the data fitting
by a formal verification in these two cases is based on prior observa-
tions that JSim’s solutions for compartmental models provide four-
digit accuracy compared to analytical solutions.
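A verification step of this kind is simple to carry out whenever an analytical solution exists. The Python sketch below is an illustration under assumptions, not JSim's procedure: it integrates a single-compartment washout, dC/dt = -kC, with a classical fourth-order Runge-Kutta scheme and compares the result with the analytical solution C0·exp(-kt). The function name and the test values are hypothetical.

```python
import math


def rk4_washout(c0, k, t_end, dt):
    """Integrate dC/dt = -k C with the classical fourth-order
    Runge-Kutta method and return C(t_end)."""
    steps = round(t_end / dt)
    c = c0
    for _ in range(steps):
        k1 = -k * c
        k2 = -k * (c + 0.5 * dt * k1)
        k3 = -k * (c + 0.5 * dt * k2)
        k4 = -k * (c + dt * k3)
        c += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return c


C0, K, T_END, DT = 1.0, 0.2, 10.0, 0.1   # hypothetical test values

numerical = rk4_washout(C0, K, T_END, DT)
analytical = C0 * math.exp(-K * T_END)
relative_error = abs(numerical - analytical) / analytical

# Accept the code only if it agrees with the analytical solution
# to at least four significant digits.
verified = relative_error < 1e-4
print(numerical, analytical, relative_error, verified)
```

The same pattern applies to any model: choose a limiting case with a known solution, run the code on that case, and proceed to data fitting only when the agreement is within the accuracy required.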
Listing the steps at a more detailed level:
• Ideally, design the experimental protocol to be the best test of
the model.
• Gather supporting data; assess experimental accuracy.
• Obtain information on necessary parameters, a priori, and on
possible constraints.
• Complete development of the model. List all the assumptions.
6. Model Alternatives and Modifications: Interactive Hypothesis Revising
When the fit is not precise, outside of the limits of expectation
relative to the noise in the data, despite all the attempts, then
maybe the model is just wrong. Certainly it is not nicely descriptive,
let alone explanatory! Given the philosophical premise that all
models are wrong, in the sense of being incomplete, or incorrect
mechanistically, every failure is a stimulus to find an alternative
model. A most rewarding and successful strategy is that of Platt
(12): he proposes that right from the outset one should
have alternative hypotheses in mind, and that the experiment
should be designed to distinguish between these hypotheses.
“Strong Inference” is the title of his paper. We advocate that each
hypothesis be expressed in terms of a computational model, since
that means that it is described explicitly and is therefore testable.
The strategy pays off because at least one of the hypotheses
is proven wrong when the model fails to fit the data. Sometimes
both are wrong! Regroup, rethink!
7. What to Do When the Compartmental Representation Is not so Good?
What follows is a common example that applies in biophysics,
physiology, and pharmacology. It is usual that drugs and substrates
for metabolism and signaling molecules of molecular weight less
than 1,000 Da are partially extracted during single passage through
a capillary. Since capillaries are about 1,000 µm long, but only
5 µm in diameter (as in Fig. 10), there is no possibility that they are
instantaneously stirred tanks with uniform concentration from end
to end: there must be gradients between capillary entrance and exit
for any solute that is exchanging between blood and tissue. If the
extraction is less than 5%, the gradient might be ignored, but for
solutes of interest to us here the steady-state extractions are
30–90%, and so affect the estimates of the permeabilities and
consumption rates. Consequently we now consider the computa-
tional differences between a stirred tank and an axially distributed
capillary–tissues exchange unit.
Fig. 10. A venule and capillaries on the epicardium of a dog heart casted with microfil.
Capillary diameters are 5.6 ± 1.3 µm, average intercapillary distances are 17–19 µm,
and lengths are 800–1,000 µm. The distance between the long calibration lines is
100 µm. Modified from Bassingthwaighte et al. (33).
Fig. 11. Compartmental versus axially distributed models for capillary–tissue exchange. Exchange between capillary
plasma and interstitial fluid regions can be regarded as providing two conceptually similar but mathematically distinguishable
methods of representation. Left panel: Two-compartment stirred tank for the exchange of solute C between flowing
blood and surrounding stagnant tissue. Flow F, ml/(g min), carries in solute at concentration Cin, mM, and carries out a
concentration Cout = Cp, mM. The flux from capillary to ISF (compartment 2) is limited by the conductance PS, ml/(g min),
the permeability-surface area product of the membrane separating the two chambers, allowing bidirectional flux. G2, ml/
(g min), is a reaction rate for a transformation flux forming product at a rate G2·C2, µmol/(g min). Right panel: Axially
distributed model equivalent to the two-compartmental model when the solute does not enter the endothelial or
parenchymal cells. PSg, the conductance for permeation through the interendothelial clefts, is equivalent to the PS of
the compartmental model in the left panel. Right panel from Gorman et al. (18) with permission from the American
Physiological Society.
Fig. 12. Schematic overview of experimental procedures underlying the application of the multiple-indicator dilution
technique to the investigation of multiple substrates passing through an isolated organ without recirculation of tracer. The
approach naturally extends also to their metabolites.
7.2. Model Equations for Tracer
The two diagrams in Fig. 11 look quite different, but the second
can be reduced to the compartmental model, as we will show
below. The essential difference is that the distributed model
accounts for concentration gradients along the capillary length.
Capillaries are about 1 mm long, and are 5 µm in diameter, an
aspect ratio of 200. Diffusional relaxation times thus differ by a
factor of 200 between radial and axial directions. Consequently,
considering the capillary as a stirred tank is unreasonable.
The stirred tank expressions account for the flow through
compartment 1, the permeation, and consumption terms G2 ml/
(g min) in the second compartment:
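\[
\begin{aligned}
\frac{dC_1}{dt} &= -\frac{PS}{V_1}\,(C_1 - C_2) + \frac{F}{V_1}\,(C_{in} - C_1),\\
\frac{dC_2}{dt} &= +\frac{PS}{V_2}\,(C_1 - C_2) - \frac{G_2}{V_2}\,C_2.
\end{aligned}
\tag{10}
\]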
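The behavior of the two stirred-tank equations can be checked numerically. The Python sketch below assumes the usual mass-balance signs for Eq. 10 (permeation leaves compartment 1 and enters compartment 2, flow delivers Cin and washes out C1, and consumption removes solute from compartment 2). It uses a simple forward-Euler step rather than any JSim solver; the volumes, flow, and PS are borrowed from the legend of Fig. 14, while G2 = 1 ml/(g min) and the function name are hypothetical.

```python
def simulate_two_tank(F, PS, G2, V1, V2, c_in, t_end, dt=0.001):
    """Forward-Euler integration of the two stirred-tank equations.

    dC1/dt = -(PS/V1)*(C1 - C2) + (F/V1)*(c_in - C1)
    dC2/dt = +(PS/V2)*(C1 - C2) - (G2/V2)*C2
    Both compartments start empty; c_in is held constant.
    """
    c1 = c2 = 0.0
    for _ in range(round(t_end / dt)):
        dc1 = -(PS / V1) * (c1 - c2) + (F / V1) * (c_in - c1)
        dc2 = (PS / V2) * (c1 - c2) - (G2 / V2) * c2
        c1 += dt * dc1
        c2 += dt * dc2
    return c1, c2


# Flow and PS in ml/(g min), volumes in ml/g, concentrations in mM.
F, PS, G2, V1, V2, C_IN = 1.0, 1.0, 1.0, 0.05, 0.15, 1.0

c1, c2 = simulate_two_tank(F, PS, G2, V1, V2, C_IN, t_end=5.0)

# Steady state from setting both derivatives to zero.
c1_ss = F * C_IN / (F + G2 * PS / (PS + G2))
c2_ss = PS * c1_ss / (PS + G2)

print(c1, c1_ss, c2, c2_ss)
```

At steady state the extraction, (Cin - C1)/Cin, is one third for these values; comparing the simulated and algebraic steady states is itself a small verification test of the code.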
The use of these ODEs implies and builds into the calculations
a discontinuity between the concentration of solute in the inflow
Fig. 13. Pulse responses in axially distributed models. The input function, Cin, is a pulse of duration 1.4 s. Upper panel:
Outflow concentration–time curves for a partial differential equation solution using a Lagrangian sliding fluid element
method and an intravascular dispersion coefficient, Dp = 2.6 × 10⁻⁵ cm²/s (gray curve), and for a serial stirred tank
algorithm representing a Poisson process with 109 stirred tanks (black curve almost superimposed on the gray one). Lower
panel: Intracapillary spatial profiles in the distributed model (using the PDEs) at a succession of times, 1.5, 2.0, 2.5, and
3.0 s. The pulse slides and disperses due to the diffusion while some solute is lost from the vascular space by permeation
of the capillary wall. Parameters were the same for the compartmental 109 tank Poisson model and the PDE: Fp = 1 ml/
(g min), PSC = 2 ml/(g min), and tissue volume Vtiss was set to 10 ml/g so that there was negligible tracer flux from tissue
back into the plasma space. (The model: http://www.physiome.org/jsim/models/webmodel/NSR/Anderson_JC_2007/FIGURES/Anderson_JC_2007_fig11/index.html)
Figure from Anderson (6).
bolus progresses along the capillary. The permeative loss is the same
for both methods, with the result that the peak outflow concentra-
tions are similar. Figure 13 (lower) shows the shape of the bolus as a
function of position as it deforms continuously from its initial
square pulse at the entrance to the capillary. The diminution in
peak height is therefore due not only to the spreading but also to
the loss. This loss is reflected of course in the reduction in the areas.
Fig. 14. Effect of reduction of Ntanks on the output C(t). Responses of the Nth order
Poisson operator with Ntanks varied from 109 tanks in series down to 50, 20, 10, 5, 2,
and finally to a single mixing chamber, Ntanks = 1. The gray curve is the Lagrangian
solution to the PDEs as in Fig. 13. All of the Poisson operator outflow curves (black) have
the same mean transit time, and the same parameters: Vp = 0.05 and Vtiss = 0.15 ml/g;
Fp = 1 ml/(g min), PSg = 1 ml/(g min). Figure from Anderson (6). Model is #46, running
loop mode to change Ntanks
(http://www.physiome.org/jsim/models/webmodel/NSR/Anderson_JC_2007/FIGURES/Anderson_JC_2007_fig12/index.html).
The gray curve touching the top of each spatial profile is the
theoretical curve from Crone (35) and Renkin (39), as in the model
of Sangren and Sheppard (38):
\[
C(x) = 1 \cdot e^{-\frac{PS_g}{F_p}\,\frac{x}{L}}, \tag{14}
\]
where x/L is the fractional distance along the capillary, the abscissa
in the lower panel.
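The theoretical profile is easily evaluated. The Python sketch below assumes that Eq. 14 is the decaying exponential of the Crone–Renkin treatment, with unit concentration at the capillary entrance and no return flux from tissue, so that the extraction at the outflow is 1 - exp(-PS/F). The function names are hypothetical; the values Fp = 1 and PS = 2 ml/(g min) are those given in the legend of Fig. 13.

```python
import math


def capillary_profile(x_over_l, ps, flow):
    """Relative intracapillary concentration at fractional position x/L,
    assuming unidirectional loss across the capillary wall."""
    return math.exp(-ps * x_over_l / flow)


def extraction(ps, flow):
    """Fraction of solute lost in a single passage, 1 - C(L)/C(0)."""
    return 1.0 - capillary_profile(1.0, ps, flow)


FP = 1.0   # plasma flow, ml/(g min), from the legend of Fig. 13
PS = 2.0   # permeability-surface area product, ml/(g min), same source

for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(frac, round(capillary_profile(frac, PS, FP), 4))

print(round(extraction(PS, FP), 4))
```

For these values the single-pass extraction is about 86%, well above the few percent at which the end-to-end gradient could reasonably be neglected.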
Now that we know that the multi-compartmental serial tank
representation can give results approximating the normal PDE
representation, and that they differ basically only in the numerical
method used, the question becomes: “Which of the methods pro-
duces the correct assessment of the parameters with the greatest
efficiency?” The serial stirred tank model has the disadvantage that
the waveforms are seriously distorted by reducing the number of
stirred tanks, as is shown in Fig. 14.
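The dependence of the waveform on the number of tanks can be quantified. The Python sketch below assumes N identical stirred tanks in series with no exchange, whose impulse response is a gamma density with mean transit time held fixed; its relative dispersion (SD divided by mean) is then 1/sqrt(N). The function names, the unit mean transit time, and the integration settings are hypothetical choices for illustration, not the code of the referenced models.

```python
import math


def tanks_in_series(t, n_tanks, t_mean):
    """Impulse response of n_tanks equal stirred tanks in series
    with overall mean transit time t_mean (a gamma density)."""
    if t <= 0.0:
        return 0.0
    rate = n_tanks / t_mean
    log_h = (n_tanks * math.log(rate) + (n_tanks - 1) * math.log(t)
             - rate * t - math.lgamma(n_tanks))
    return math.exp(log_h)


def moments(n_tanks, t_mean, dt=0.001, t_max=20.0):
    """Numerical area, mean transit time, and relative dispersion (SD/mean)."""
    area = first = second = 0.0
    for i in range(1, int(t_max / dt)):
        t = i * dt
        h = tanks_in_series(t, n_tanks, t_mean)
        area += h * dt
        first += t * h * dt
        second += t * t * h * dt
    mean = first / area
    sd = math.sqrt(second / area - mean * mean)
    return area, mean, sd / mean


# Same series of tank numbers as in Fig. 14; mean transit time fixed.
for n in (1, 2, 5, 10, 20, 50, 109):
    area, mean, rd = moments(n, t_mean=1.0)
    print(n, round(mean, 3), round(rd, 3))
```

The mean transit time is unchanged as N is reduced, but the relative dispersion grows from about 0.1 at N = 109 to 1 for a single mixing chamber, which is the distortion of the waveform seen in Fig. 14.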
While reducing the number of tanks in the stirred tank method
has a dramatic effect on the shapes of the outflow curves, the
problem is much less severe with the PDE representation, as
shown in Fig. 15. Solutions are shown for Ngrid = 109, 51, 21,
11, and 7 for two methods of solving PDEs, one using a robust
solver TOMS731 (43) and the other using a Lagrangian sliding
fluid element algorithm (42).
Fig. 15. Effect of reduction of Ngrid on the output C(t) with two PDE solvers. The green curve is the serial compartmental
model with 109 tanks. The red rectangle is the input function divided by 4. Black curves are the solutions to the PDE
at varied resolutions. Upper panel: Lagrangian sliding fluid element algorithm, LSFEA, with Ngrid = 121 black solid line,
51 short dashes superimposed on the black solid line, 21 long dashes, and 11 dotted. This algorithm, while computationally
fast, describes the outflow dilution curve as a series of square pulses; with a large number of segments the curves
appear smooth. With fewer segments, e.g., Ngrid = 21 (long dashes) or 11 (dotted line), the steps are obvious but the
approximation is reasonably good. Lower panel: TOMS731 solver. This slow, robust solver, like most PDE solvers,
broadens the solution somewhat with reduced Ngrid but with Ngrid = 11 (dotted line) there is obvious spreading and
oscillation in the solutions (www.physiome.org Model # 126).
Fig. 16. Outflow dilution curves for D-glucose, 2-deoxy-D-glucose, and albumin (dog expt
4048-6) fitted with the model. The deoxyglucose curve has been shifted downward by half
a logarithmic decade (ordinate values divided by 10^(1/2)) to display it separately from the
D-glucose curve. Parameter estimates for D- and deoxyglucose were PSc = 0.97 and 1.0;
PSpc = 0.7 and 0.5; Gpc = 0.01 and 0.05 ml/(g min); the volume ratio Visf/Vp = 6.5 and
Vpc/Vp = 13.3 ml/g, the same for both glucoses. Coefficients of variation were 0.19 and
0.09. From Kuikka (36) with permission from the American Physiological Society (Model is
at www.physiome.org: Search on Kuikka, model 126).
the cellular uptake are given in the legend. The quality of the data is
shown by the smoothness of the curves over time; the goodness of
fit is a testimonial to the quality of the model defined through
the PDEs.
8. Commentary
References
1. Berman M (1963) The formulation and testing of models. Ann N Y Acad Sci 108:182–194
2. Jacquez JA (1972) Compartmental analysis in biology and medicine. Kinetics of distribution of tracer-labeled materials. Elsevier Publishing Co, Amsterdam, 237 pp
3. Jacquez JA (1996) Compartmental analysis in biology and medicine, 3rd edn. BioMedware, Ann Arbor, MI, 514 pp
4. Cobelli C, Foster D, Toffolo G (2000) Tracer kinetics in biomedical research. From data to model. Kluwer Academic, New York
5. Zierler KL (1981) A critique of compartmental analysis. Annu Rev Biophys Bioeng 10:531–562
6. Anderson JC, Bassingthwaighte JB (2007) Tracers in physiological systems modeling. Chapter 8 Mathematical modeling in nutrition and agriculture. In: Mark D. Hanigan JN, Casey L Marsteller. Proceedings of the ninth international conference on mathematical modeling in nutrition, Roanoke, VA, 14–17 August 2006, Virginia Polytechnic Institute and State University Blacksburg, VA, pp 125–159
7. Knopp TJ, Anderson DU, Bassingthwaighte JB (1970) SIMCON–Simulation control to optimize man-machine interaction. Simulation 14:81–86
8. Sauro HM, Fell DA (1991) SCAMP: a metabolic simulator and control analysis program. Math Comput Model 15:15–28
9. Sauro HM, Hucka M, Finney A, Bolouri H (2001) The systems biology workbench concept demonstrator: design and implementation. Available via the World Wide Web at http://www.cds.caltech.edu/erato/sbw/docs/detailed-design/
10. Raymond GM, Butterworth E, Bassingthwaighte JB (2003) JSIM: free software package for teaching physiological modeling and research. Exp Biol 280.5:102. (www.physiome.org/jsim)
11. Chizeck HJ, Butterworth E, Bassingthwaighte JB (2009) Error detection and unit conversion. Automated unit balancing in modeling interface systems. IEEE Eng Med Biol 28(3):50–58
12. Platt JR (1964) Strong inference. Science 146:347–353
13. Bassingthwaighte JB, Chinard FP, Crone C, Goresky CA, Lassen NA, Reneman RS, Zierler KL (1986) Terminology for mass transport and exchange. Am J Physiol Heart Circ Physiol 250:H539–H545
14. Benedek IH, Joshi AS, Pieniazek JH, King S-YP, Kornhauser DM (1995) Variability in the pharmacokinetics and pharmacodynamics of low dose aspirin in healthy male volunteers. J Clin Pharmacol 35:1181–1186
15. Aarons L, Hopkins K, Rowland M, Brossel S, Thiercelin JF (1989) Route of administration and sex differences in the pharmacokinetics of aspirin, administered as its lysine salt. Pharm Res 6:660–666
16. Prescott LF, Balali-Mood M, Critchley JAJH, Johnstone AF, Proudfoot AT (1982) Diuresis or urinary alkalinisation for salicylate poisoning? Br Med J 285:1383–1386
17. Chan IS, Goldstein AA, Bassingthwaighte JB (1993) SENSOP: a derivative-free solver for non-linear least squares with sensitivity scaling. Ann Biomed Eng 21:621–631
18. Glad T, Goldstein A (1977) Optimization of functions whose values are subject to small errors. BIT 17:160–169
19. Dennis JE, Schnabel RB (1983) Numerical methods for unconstrained optimization and nonlinear equations. Prentice-Hall, New York
26. Stewart GN (1897) Researches on the circulation time and on the influences which affect it: IV. The output of the heart. J Physiol 22:159–183
27. Hamilton WF, Moore JW, Kinsman JM, Spurling RG (1932) Studies on the circulation. IV. Further analysis of the injection method, and of changes in hemodynamics under physiological and pathological conditions. Am J Physiol 99:534–551
28. Thompson HK, Starmer CF, Whalen RE, McIntosh HD (1964) Indicator transit time considered as a gamma variate. Circ Res 14:502–515
29. Bassingthwaighte JB, Ackerman FH, Wood EH (1966) Applications of the lagged normal density curve as a model for arterial dilution curves. Circ Res 18:398–415
30. Krenn CG, Krafft P, Schaefer B, Pokorny H, Schneider B, Pinsky MR, Steltzer H (2000) Effects of positive end-expiratory pressure on hemodynamics and indocyanine green kinetics in patients after orthotopic liver transplantation. Crit Care Med 28:1760–1765
31. Krenn CG, Pokorny H, Hoerauf K, Stark J, Roth E, Steltzer H, Druml W (2008) Non-isotopic tyrosine kinetics using an alanyl-tyrosine dipeptide to assess graft function in liver transplant recipients - a pilot study. Wien
20. Fox IJ, Brooker LGS, Heseltine DW, Essex Klin Wochenschr 120(1–2):19–24
HE, Wood EH (1957) A tricarbocyanine dye 32. Kortgen A, Paxian M, Werth M, Recknagel P,
for continuous recording of dilution curves in Rauschfusz F, Lupp A, Krenn C, Muller D,
whole blood independent of variations in Claus RA, Reinhart K, Settmacher U, Bauer
blood oxygen saturation. Proc Staff Meet M (2009) Prospective assessment of hepatic
Mayo Clin 32:478 function and mechanisms of dysfunction in
21. Edwards AWT, Isaacson J, Sutterer WF, Bas- the critically ill. Shock 32(4):358–365
singthwaighte JB, Wood EH (1963) Indocya- 33. Bassingthwaighte JB, Yipintsoi T, Harvey RB
nine green densitometry in flowing blood (1974) Microvasculature of the dog left ventric-
compensated for background dye. J Appl ular myocardium. Microvasc Res 7:229–249
Physiol 18:1294–1304 34. Chinard FP, Vosburgh GJ, Enns T (1955)
22. Edwards AWT, Bassingthwaighte JB, Sutterer Transcapillary exchange of water and of other
WF, Wood EH (1960) Blood level of indocya- substances in certain organs of the dog. Am J
nine green in the dog during multiple dye Physiol 183:221–234
curves and its effect on instrumental calibra- 35. Crone C (1963) The permeability of capillaries
tion. Proc Staff Meet Mayo Clin 35:747–751 in various organs as determined by the use of
23. Bassingthwaighte JB (1966) Plasma indicator the indicator diffusion method. Acta Physiol
dispersion in arteries of the human leg. Circ Scand 58:292–305
Res 19:332–346 36. Kuikka J, Levin M, Bassingthwaighte JB
24. Hunton DB, Bollman JL, Hoffman HN (1986) Multiple tracer dilution estimates of D-
(1961) The plasma removal of indocyanine and 2-deoxy-D-glucose uptake by the heart.
green and sulfobromophthalein: effect of dos- Am J Physiol Heart Circ Physiol 250:
age and blocking agents. J Clin Invest 30 H29–H42
(9):1648–1655 (PMCID PMC290858) 37. Krogh A (1919) The number and distribution
25. Bassingthwaighte JB, Edwards AWT, Wood of capillaries in muscles with calculations of the
EH (1962) Areas of dye-dilution curves sam- oxygen pressure head necessary for supplying
pled simultaneously from central and periph- the tissue. J Physiol (Lond) 52:409–415
eral sites. J Appl Physiol 17:91–98
438 J.B. Bassingthwaighte et al.
38. Sangren WC, Sheppard CW (1953) A mathe- 44. Yipintsoi T, Scanlon PD, Bassingthwaighte JB
matical derivation of the exchange of a labeled (1972) Density and water content of dog
substance between a liquid flowing in a vessel ventricular myocardium. Proc Soc Exp Biol
and an external compartment. Bull Math Bio- Med 141:1032–1035
phys 15:387–394 45. Vinnakota K, Bassingthwaighte JB (2004)
39. Renkin EM (1959) Transport of potassium-42 Myocardial density and composition: a basis
from blood to tissue in isolated mammalian for calculating intracellular metabolite concen-
skeletal muscles. Am J Physiol 197:1205–1210 trations. Am J Physiol Heart Circ Physiol 286:
40. Bassingthwaighte JB (1974) A concurrent flow H1742–H1749
model for extraction during transcapillary 46. Bassingthwaighte JB, Raymond GR, Ploger
passage. Circ Res 35:483–503 JD, Schwartz LM, Bukowski TR (2006)
41. Bassingthwaighte JB, Wang CY, Chan IS GENTEX, a general multiscale model for
(1989) Blood-tissue exchange via transport in vivo tissue exchanges and intraorgan
and transformation by endothelial cells. Circ metabolism. Phil Trans Roy Soc A Math Phys
Res 65:997–1020 Eng Sci 364(1843):1423–1442. doi:
42. Bassingthwaighte JB, Chan IS, Wang CY 10.1098/rsta.2006.1779
(1992) Computationally efficient algorithms 47. Bassingthwaighte JB, Goresky CA, Linehan JH
for capillary convection-permeation-diffusion (1998) Ch. 1 Modeling in the analysis of the
models for blood-tissue exchange. Ann processes of uptake and metabolism in the
Biomed Eng 20:687–725 whole organ. In: Bassingthwaighte JB, Goresky
43. TOMS./TOMS. Association of computing CA, Linehan JH (eds) Whole organ approaches
machinery: transactions on mathematical soft- to cellular metabolism. Springer, New York,
ware. http://www.netlib.org/toms/index. pp 3–27
html
Chapter 18
Abstract
Physiologically based pharmacokinetic (PBPK) models differ from conventional compartmental
pharmacokinetic models in that they are based to a large extent on the actual physiology of the organism.
The application of pharmacokinetics to toxicology or risk assessment requires that the toxic effects in a
particular tissue are related in some way to the concentration time course of an active form of the substance
in that tissue. The motivation for applying pharmacokinetics is the expectation that the observed effects of a
chemical will be more simply and directly related to a measure of target tissue exposure than to a measure of
administered dose. The goal of this work is to provide the reader with an understanding of PBPK modeling
and its utility, as well as of the procedures used to develop a model and apply it to chemical
safety assessment, using the styrene PBPK model as an example.
1. Introduction
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_18, © Springer Science+Business Media, LLC 2012
440 J.L. Campbell Jr. et al.
[Figure: compartmental model diagram. Labels: ka, ke, uptake, clearance, CENTRAL, DEEP, k12, k21.]
[Fig. 2 flowchart boxes: Problem Identification; Literature Evaluation; Model Formulation; Simulation; Compare to Kinetic Data; Refine Model; Validate Model.]
Fig. 2. Flowchart of the biologically motivated PBPK modeling approach to chemical risk
assessment.
2. Materials
Table 1
Representative list of available software packages
Table 2
Comparison of modeling software features
parameters used to calculate the dose metrics for the risk assess-
ment. The most important characteristics of a language for model
evaluation are verifiable code, self-documentation, and ease of use.
The feature that is most important with regard to self-documentation
is scripting, which allows the model developer to
create procedures consisting of sequences of commands that, for
example, set model parameters, run the model, and plot the model
predictions against the appropriate data set. Other features which
contribute to ease of model evaluation include viewable model
definition code, code sorting, and automatic linkage to integration
algorithms.
The modeling language that has seen the most widespread use
in PBPK modeling is the Advanced Continuous Simulation Language (ACSL), which is currently implemented as
acslX. The ACSL language has also served as the basis for a variety
of older packages, including SimuSolv (from Dow Chemical,
no longer supported), ACSL/Tox (from Pharsight, no longer
supported), and ERDM (currently used by the USEPA). The
automatic code sorting provided by ACSL allows code to be
grouped functionally (liver, lung, fat, etc.) rather than in program
order, greatly simplifying model development and improving
readability of the code. The graphic code block capability in
acslX is particularly attractive for PBPK modeling because it pro-
vides the ability to create a model by connecting functional units
(e.g., tissues) in a graphic environment, while at the same time
creating a model definition in ACSL code that can be reviewed for
model verification. The scripting capability, which supports
MATLAB-like m-files as well as command file procedures, greatly expedites the comparison of the
model with multiple data sets that is generally required for
PBPK modeling of data from multiple species and routes of exposure.
The scripting capability also makes it possible to document
the use of a model in a risk assessment, since m-scripts or com-
mand file procedures can be written that set the model parameters
and run the model for each of the dose metrics required for the
risk assessment, as well as for each of the data sets used for model
evaluation/validation.
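As a minimal sketch of the kind of scripted procedure described above, the following Python script sets the model parameters, runs the model for each data set, and reports how well the predictions agree with the data. It is written in Python rather than as an acslX m-file, and a one-compartment function stands in for the PBPK model. The function names, parameter values, and data are all illustrative assumptions, not taken from any software package or study.

```python
import math

def run_model(params, dose, times):
    """Hypothetical stand-in model: one-compartment bolus kinetics.

    Returns predicted concentrations (mg/L) at the requested times (h).
    A real PBPK model would be called here instead.
    """
    k_e = params["k_e"]  # elimination rate constant (1/h), assumed value
    v_d = params["v_d"]  # volume of distribution (L), assumed value
    return [dose / v_d * math.exp(-k_e * t) for t in times]

def sum_squared_log_error(predicted, observed):
    """Simple goodness-of-fit measure on log-transformed concentrations."""
    return sum((math.log(p) - math.log(o)) ** 2
               for p, o in zip(predicted, observed))

# One parameter set is used for every data set (all values illustrative).
PARAMS = {"k_e": 0.5, "v_d": 2.0}

# Each entry documents one model run: dose (mg), sampling times (h), data (mg/L).
DATA_SETS = {
    "low dose": {"dose": 10.0, "times": [0.5, 1.0, 2.0, 4.0],
                 "observed": [3.9, 3.0, 1.8, 0.7]},
    "high dose": {"dose": 100.0, "times": [0.5, 1.0, 2.0, 4.0],
                  "observed": [39.0, 30.5, 18.4, 6.8]},
}

def evaluate_all(params, data_sets):
    """Run the model for every data set and return the fit error of each."""
    results = {}
    for name, ds in data_sets.items():
        predicted = run_model(params, ds["dose"], ds["times"])
        results[name] = sum_squared_log_error(predicted, ds["observed"])
    return results

if __name__ == "__main__":
    for name, err in evaluate_all(PARAMS, DATA_SETS).items():
        print(f"{name}: sum of squared log error = {err:.4f}")
```

Because the parameter values and the list of data sets live in the script itself, rerunning it reproduces every comparison, which is the documentation benefit that scripting provides.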
Berkeley Madonna provides a particularly intuitive, flexible
platform for model development, and has been very popular in
academic settings. The software provides automatic code sorting,
the ability to automatically convert between code and graphic
model descriptions, and automatic compilation, greatly simplifying
and expediting model development, debugging, and verification.
Conversion of models between ACSL and Berkeley Madonna is
relatively straightforward. However, the lack of a scripting capabil-
ity makes comparison of the model with data fairly cumbersome,
particularly in situations where a large number of data sets are being
modeled. The lack of scripting also makes it more difficult to
document the actual use of a model in a risk assessment and greatly
complicates model evaluation.
18 Physiologically Based Pharmacokinetic/Toxicokinetic Modeling 449
3. Methods
3.1. General Concepts
This section begins with a description of the seminal PBPK
model published by Ramsey and Andersen in 1984 and leads into
the elements necessary for successful model development, refinement,
and validation. The experience of Ramsey and Andersen
serves as a useful example of the advantages of the PBPK modeling
approach. In this case, blood and tissue time-course curves of
styrene had been obtained for rats exposed to four different con-
centrations of 80, 200, 600, and 1,200 ppm (83). Data were
obtained during a 6 h exposure period and for 18 h after cessation
of the exposure. The initial analysis of these data had been based on
a simple compartmental model, similar to the model shown in
Fig. 1, which had a zero-order input related to the amount of
styrene inhaled, a two-compartment description of the rat, and
linear metabolism in the central compartment. The compartmental
model was successful with lower concentrations but was unable to
account for the more complex behavior at higher concentrations
(note the different behavior of the data at the two concentrations
shown in Fig. 3).
In an attempt to provide a more successful description, a PBPK
model was developed with a realistic equilibration process for pul-
monary uptake and Michaelis–Menten saturable metabolism in the
[Fig. 3 plot: panel label "Rat"; curve label "80 ppm"; x-axis "Hours".]
Fig. 3. Model predictions (solid lines) and experimental blood styrene concentrations in
rats during and after 6 h exposures to 80 and 600 ppm styrene. The thick bars represent
the chamber air concentrations of styrene and are shown to highlight the nonlinearity of
the relationship between administered and internal concentrations. The model (Fig. 4)
contains sufficient biological realism to predict the very different behaviors observed at
the two concentrations.
liver. A diagram of the PBPK model that was used by Ramsey and
Andersen (1984) (22) to describe styrene inhalation in both rats
and humans is shown in Fig. 4. In this diagram, the boxes represent
tissue compartments and the lines connecting them represent
blood flows. The model contained several “lumped” tissue com-
partments: fat tissues, poorly perfused tissues (muscle, skin, etc.),
richly perfused tissues (viscera), and metabolizing tissues (liver).
The fat tissues were described separately from the other poorly
perfused tissues due to their much higher partition coefficient for
styrene, which leads to different kinetic properties, while the liver
was described separately from the other richly perfused tissues due
to its key role in the metabolism of styrene. Each of these tissue
groups was defined with respect to its blood flow, tissue volume,
and ability to store (partition) the chemical of interest.
Although the model diagram in Fig. 4 shows a lung compartment,
a steady-state approximation for the equilibration of lung blood
with alveolar air was used in the mathematical formulation of the
model to eliminate the need for an actual lung tissue compartment.
This simple model structure, with realistic constants for the physi-
ological, partitioning, and metabolic parameters, very accurately
predicted the behavior of styrene in both fat and blood of the rat
at all concentrations. Fig. 3 compares the model-predicted time
course in the blood with the experimental data for two of the
exposure concentrations (80 and 600 ppm) in the rat studies.
The structure of the PBPK model for styrene reflects the
generic mammalian architecture. Organs are arranged in a parallel
system of blood flows with total blood flow through the lungs.
[Fig. 4 diagram (caption not recovered). Labels: Alveolar Space (QAlv, CInh, CAlv); Lung Blood (QT, CVen, CArt); Fat Tissue Group (QF, CVF, CArt); Muscle Tissue Group (QM, CVM, CArt); Richly Perfused Tissue Group (QR, CVR, CArt); Liver [Metabolizing Tissue Group] (QL, CVL, CArt); Metabolites (VMax, KM).]
[Fig. 5 plot: curve labels "Blood" and "Exhaled Air"; x-axis "Hours".]
Fig. 5. Model predictions and experimental blood and exhaled air concentrations in human
volunteers during and after 6 h exposures to 80 ppm styrene. The model is identical to
that used for rats (Fig. 4). The model parameters have been changed to values
appropriate for humans on the basis of physiological and biochemical information, and
have not been adjusted to improve the fit to the experimental data.
[Fig. 6 plot: y-axis "Styrene Concentration (mg/L)"; curve labels 376, 216, and 51; x-axis "Hours".]
Fig. 6. Model predictions and experimental exhaled air concentrations in human volunteers
following 1 h exposures to 51, 216, and 376 ppm styrene. The model is the same as
in Fig. 5.
3.2. Modeling Philosophy
This basic PBPK model for styrene has several tissue groups which
were lumped according to their perfusion and partitioning characteristics.
In the mathematical formulation, each of these several compartments
is described by a single mass-balance differential equation.
It would be possible to describe individual tissues in each of the
lumped compartments, if necessary. This detail is usually unnecessary
unless some particular tissue in a lumped compartment is the target
tissue. One might, for example, want to separate brain from other
richly perfused tissues if the model were for a chemical that had a toxic
effect on the central nervous system (86–88). Other examples of
additional compartments include the addition of placental and mam-
mary compartments to model pregnancy and lactation (89–91). The
interactions of chemical mixtures can even be described by including
compartments for more than one chemical in the model (92–94).
Increasing the number of compartments does increase the number of
differential equations required to define the model. However, the
number of equations does not pose any problem due to the power of
modern desktop computers.
On the other hand, as the number of compartments in the
PBPK model increases, the number of input parameters increases
correspondingly. Each of these parameters must be estimated from
experimental data of some kind. Fortunately, the values of many of
these can be set within narrow limits from nonkinetic experiments.
The PBPK model can also help to define those experiments which
are needed to improve parameter estimates by identifying condi-
tions where the sensitivity of the model to the parameter is the
greatest (95). The demand that the PBPK model fit a variety of data also
restricts the parameter values that will give a satisfactory fit to
experimental data. For example, the styrene model (described
above) was required to reproduce both the high and low concen-
tration behaviors, which appeared qualitatively different, using the
same parameter values. If one were independently fitting single
curves with a model, the different parameter values obtained
under different conditions would be relatively uninformative for
extrapolation.
As the renowned statistician George Box has said, “All models
are wrong, but some are useful.” Even a relatively complex description
such as a PBPK model will sometimes fail to fit reliable experimental
data. When this occurs, the investigator needs to consider how
the model might be changed, i.e., what extra biological aspects
must be added to the physiological description to bring the predic-
tions in line with experimental observation? In the case of the work
with styrene cited above, continuous 24 h styrene exposures could
not be modeled with a time-independent maximum rate of metab-
olism, and induction of enzyme activity had to be included to yield
a satisfactory representation of the observed kinetic behavior (96).
When a PBPK model is unable to adequately describe kinetic
data, the nature of the discrepancy can provide the investigator with
3.3. Tissue Grouping
The first aspect of PBPK model development that will be discussed is
determining the extent to which the various tissues in the body may
be grouped together. Although tissue grouping is really just one
aspect of model design, which is discussed in the next section, it
provides a simple context for introducing the two alternative
approaches to PBPK model development: “lumping” and
“splitting” (Fig. 7). In the context of tissue grouping, the guiding
philosophy in the lumping approach can be stated as follows:
“Tissues which are pharmacokinetically and toxicologically indistin-
guishable may be grouped together.” In this approach, model
development begins with information at the greatest level of detail
that is practical, and decisions are made to combine physiological
elements (tissues and blood flows) to the extent justified by their
similarity. The common grouping of tissues into richly (or rapidly)
perfused and poorly (or slowly) perfused on the basis of their
perfusion rate (ratio of blood flow to tissue volume) is an example
of the lumping approach. The contrasting philosophy of splitting is
[Fig. 7 diagram labels: Lumping; Splitting; Body; Body / Liver.]
Fig. 7. The role of lumping and splitting processes in PBPK model development.
3.3.1. Criteria for Grouping Tissues
There are two alternative approaches for determining whether
tissues are kinetically distinct or should be lumped together. In
the first approach, the tissue rate-constants are compared. The
rate-constant (kT) for a tissue is similar to the perfusion rate except
that the partitioning characteristics of the tissue are also considered:
$$k_T = \frac{Q_T}{P_T V_T},$$
where $Q_T$ = the blood flow to the tissue (L/h), $P_T$ = the tissue–blood
partition coefficient for the chemical, and $V_T$ = the volume of the
tissue (L).
Thus the units of the tissue rate-constant are the same as for the
perfusion rate, $\mathrm{h}^{-1}$, but the rate-constant more accurately reflects
the kinetic characteristics of a tissue for a particular chemical. It was
the much smaller rate-constant for fat in the case of a lipophilic
chemical such as styrene that required the separation of the fat
compartment from the other poorly perfused tissues (muscle,
skin, etc.) in the PBPK model for styrene (22).
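To make the comparison of tissue rate-constants concrete, the short Python sketch below evaluates kT = QT/(PT VT) for three tissue groups. The blood flows, partition coefficients, and volumes are hypothetical values chosen only to resemble a lipophilic chemical in a small rodent. They are assumptions for illustration, not the parameters of the styrene model.

```python
def tissue_rate_constant(q_t, p_t, v_t):
    """k_T = Q_T / (P_T * V_T), in 1/h when Q_T is in L/h and V_T is in L."""
    return q_t / (p_t * v_t)

# Hypothetical values (illustrative assumptions, not from the styrene model).
TISSUES = {
    #                  Q_T (L/h), P_T (unitless), V_T (L)
    "fat":             (0.45, 50.0, 0.027),
    "poorly perfused": (0.75, 1.0, 0.22),
    "richly perfused": (2.5, 2.0, 0.015),
}

def rate_constants(tissues):
    """Return a dict mapping each tissue name to its rate constant (1/h)."""
    return {name: tissue_rate_constant(*vals) for name, vals in tissues.items()}

if __name__ == "__main__":
    for name, k_t in rate_constants(TISSUES).items():
        q_t, _, v_t = TISSUES[name]
        # The perfusion rate ignores partitioning; k_T accounts for it.
        print(f"{name}: perfusion rate = {q_t / v_t:.2f} 1/h, "
              f"k_T = {k_t:.3f} 1/h")
```

With these assumed numbers the perfusion rate of fat is not the smallest of the three, yet its rate-constant is, because the large partition coefficient appears in the denominator. This is the kind of difference that would argue for keeping fat as a separate compartment.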
The second, less rigorous, approach for determining whether
tissues should be lumped together is simply to compare the perfor-
mance of the model with the tissues combined and separated. This
approach is essentially the reverse of the example given above for
splitting of the fat compartment. The reliability of this approach
depends on the availability of data under conditions where the
tissues being evaluated would be expected to have an observable
impact on the kinetics of the chemical. Sensitivity analysis can
sometimes be used to determine the appropriate conditions for
such a comparison (95).
3.4. Model Design Principles
There is no easy rule for determining the structure and level of
complexity needed in a particular modeling application. The wide
variability of PBPK model design for different chemicals can be seen
by comparing the diagram of the PBPK model for methotrexate
(41), shown in Fig. 8, with the diagram for the styrene PBPK
model shown in Fig. 4. Model elements which are important for a
volatile, lipophilic chemical such as styrene (lung, fat) do not need
to be considered in the case of a nonvolatile, water soluble com-
pound such as methotrexate. Similarly, while kidney excretion and
enterohepatic recirculation are important determinants of the
kinetics of methotrexate, only metabolism and exhalation are
[Fig. 8 diagram (caption not recovered). Labels: Plasma; Liver (QL - QG); G.I. Tract (QG); Gut Lumen (C1, C2, C3, C4; r1, r2, r3; τ); Feces; Kidney (QK); Urine; Muscle (QM).]
3.4.1. Model Identification
The process of model identification begins with the selection of those
model elements which the modeler considers to be minimum essen-
tial determinants of the behavior of the particular animal–chemical
system under study, from the viewpoint of the intended application of
the model. Comparison with appropriate data, relevant to the
intended purpose of the model, then can provide insights into defects
in the model which must be corrected either by reparameterization or
by changes to the model structure. Unfortunately, it is not always
possible to separate these two elements. In models of biological
systems, estimates of the values of model parameters will always be
uncertain, due both to biological variation and experimental error. At
the same time, the need for biological realism unavoidably results in
models that are “overparameterized”; that is, they contain more
parameters than can be identified from the kinetic data the model is
used to describe.
As an example of the interaction between model structure and
parameter identification, the two metabolic parameters, Vmax and
Km, in the model for styrene discussed earlier could both be identi-
fied relatively unambiguously in the case of the rat. Indeed, as
pointed out previously, the inclusion of capacity-limited metabolism
in the model was necessary in order to reproduce the available data
at both low and high exposure concentrations. In the case of the
human, however, data were not available at sufficiently high concentrations
to saturate metabolism. Therefore, only the ratio, Vmax/Km,
would actually be identifiable, because at concentrations well below Km the saturable rate is approximately proportional to Vmax/Km. The use of the same model
structure, including a two-parameter description of metabolism, in
the human as in the rat was justified by the knowledge that similar
3.5. Elements of Model Structure
The process of selecting a model structure can be broken down into
a number of elements associated with the different aspects of
uptake, distribution, metabolism, and elimination. In addition,
there are several general model structure issues that must be
addressed, including mass balance and allometric scaling. The fol-
lowing section treats each of these elements in turn.
3.5.1. Storage Compartments
Naturally, any tissues which are expected to accumulate significant
quantities of the chemical or its metabolites need to be included in
the model structure. As discussed earlier, these storage tissues can
be grouped together to the extent that they have similar time
constants. Three storage compartments were included in the sty-
rene model described above: fat tissues, richly perfused tissues, and
poorly perfused tissues. The generic mass balance equation for
storage compartments such as these is (Fig. 9):
[Fig. 9 diagram labels: Tissue; QT, CA (inflow); QT, CVT (outflow).]
$$\frac{dA_T}{dt} = Q_T C_A - Q_T C_{VT},$$
where $A_T$ = the mass of chemical in the tissue (mg), $Q_T$ = the blood
flow to (and from) the tissue (L/h), $C_A$ = the concentration of
chemical in the arterial blood reaching the tissue (mg/L), and $C_{VT}$ = the
concentration of the chemical in the venous blood leaving the tissue
(mg/L).
Thus this mass balance equation simply states that the change in
the amount of chemical in the tissue with respect to time (dAT/dt) is
equal to the difference between the amount of chemical entering the
tissue and the amount leaving the tissue. We can then calculate
the concentration of the chemical in the storage tissue (CT) from
the amount in the tissue and the tissue volume (VT):
$$C_T = \frac{A_T}{V_T}.$$
In PBPK models, it is common to assume “venous equilibra-
tion”; that is, that in the time that it takes for the blood to perfuse
the tissue, the chemical is able to achieve its equilibrium distribution
between the tissue and blood. Therefore, the concentration of the
chemical in the venous blood can be related to the concentration in
the tissue by the equilibrium tissue–blood partition coefficient (PT):
$$C_{VT} = \frac{C_T}{P_T}.$$
Therefore we obtain a differential equation in $A_T$:
$$\frac{dA_T}{dt} = Q_T C_A - \frac{Q_T A_T}{P_T V_T}.$$
If desired, we can reformulate this mass balance equation in
terms of concentration:
$$\frac{dA_T}{dt} = \frac{d(C_T V_T)}{dt} = C_T \frac{dV_T}{dt} + V_T \frac{dC_T}{dt}.$$
If (and only if) $V_T$ is constant (i.e., the tissue does not grow
during the simulation), $dV_T/dt = 0$, and:
$$\frac{dA_T}{dt} = V_T \frac{dC_T}{dt},$$
so we have the alternative differential equation:
$$\frac{dC_T}{dt} = \frac{Q_T \left(C_A - C_T/P_T\right)}{V_T}.$$
This alternative mass balance formulation, in terms of concen-
tration rather than amount, is popular in the pharmacokinetic
literature. However, in the case of models with compartments
that change volume over time it is preferable to use the formulation
in terms of amounts in order to avoid the need for the additional
term reflecting the change in volume (CT dVT/dt).
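The storage-compartment balance in terms of amount can be checked numerically with a few lines of code. The Python sketch below integrates dAT/dt = QT CA − QT AT/(PT VT) with a hand-written fourth-order Runge-Kutta step and compares the result with the closed-form solution that applies when the arterial concentration is constant and the tissue starts empty. The constant arterial concentration, the parameter values, and the function names are assumptions made for illustration only.

```python
import math

def storage_rate(a_t, c_a, q_t, p_t, v_t):
    """dA_T/dt = Q_T*C_A - Q_T*A_T/(P_T*V_T), in mg/h."""
    return q_t * c_a - q_t * a_t / (p_t * v_t)

def integrate_amount(c_a, q_t, p_t, v_t, t_end, n_steps=1000):
    """Integrate the amount in the tissue from A_T(0) = 0 with classical RK4.

    c_a is held constant here, which is a simplifying assumption.
    """
    h = t_end / n_steps
    a_t = 0.0
    for _ in range(n_steps):
        k1 = storage_rate(a_t, c_a, q_t, p_t, v_t)
        k2 = storage_rate(a_t + 0.5 * h * k1, c_a, q_t, p_t, v_t)
        k3 = storage_rate(a_t + 0.5 * h * k2, c_a, q_t, p_t, v_t)
        k4 = storage_rate(a_t + h * k3, c_a, q_t, p_t, v_t)
        a_t += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return a_t

def analytic_amount(c_a, q_t, p_t, v_t, t):
    """Closed-form solution for constant C_A and A_T(0) = 0."""
    k_t = q_t / (p_t * v_t)
    return p_t * v_t * c_a * (1.0 - math.exp(-k_t * t))

if __name__ == "__main__":
    # Illustrative (assumed) values: C_A in mg/L, Q_T in L/h, V_T in L.
    c_a, q_t, p_t, v_t = 1.0, 0.45, 50.0, 0.027
    for t in (1.0, 6.0, 24.0):
        a_num = integrate_amount(c_a, q_t, p_t, v_t, t)
        a_exact = analytic_amount(c_a, q_t, p_t, v_t, t)
        print(f"t = {t:5.1f} h: A_T = {a_num:.5f} mg (numerical), "
              f"{a_exact:.5f} mg (analytic), C_T = {a_num / v_t:.3f} mg/L")
```

The tissue concentration is recovered from the amount only at output time (CT = AT/VT), so the same integration loop would still apply if the volume were made time-dependent.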
Depending on the chemical, many different tissues can poten-
tially serve as important storage compartments. The use of a fat
storage compartment in the styrene model is typical of a lipophilic
chemical. The gut lumen can also serve as a storage site for chemi-
cals subject to enterohepatic recirculation, as in the case of
3.5.2. Blood Compartment
The description of the blood compartment can vary considerably
from one PBPK model to another depending on the role the blood
plays in the kinetics of the chemical being modeled. In some cases
the blood may be treated as a simple storage compartment, with a
mass balance equation describing the summation ($\Sigma$) of the venous
blood flows from the various tissues and the return of the total
arterial blood flow ($Q_C$) to the tissues, as well as any urinary
clearance (Fig. 10):
$$\frac{dA_B}{dt} = \sum_T \frac{Q_T C_T}{P_T} - Q_C C_B - K_U C_B,$$
where $A_B$ = the amount of chemical in the blood (mg), $Q_C$ = the
total cardiac output (L/h), $C_B$ = the concentration of chemical in
the blood (mg/L), and $K_U$ = the urinary clearance (L/h).
For some chemicals, such as methotrexate, all of the chemical is
present in the plasma rather than the red blood cells, so plasma
flows and volumes are used instead of blood. For other chemicals it
may be necessary to model the red blood cells as a storage com-
partment in communication with the plasma via diffusion-limited
transport. Note that if the blood is an important storage compart-
ment for a chemical, it may be necessary to carefully evaluate data
on tissue concentrations, particularly the richly perfused tissues, to
determine whether chemical in the blood perfusing the tissue could
be contributing to the measured tissue concentration.
For still other chemicals, such as styrene, the amount of chemical
actually in the blood may be relatively unimportant. In this case,
instead of having a true blood compartment, a steady-state approxi-
mation can be used to estimate the concentration in the blood at any
time. Assuming the blood is at steady-state with respect to the tissues:
$$\frac{dA_B}{dt} = 0.$$
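As a worked step, offered as a sketch that uses only the blood mass balance given earlier in this section and keeps the urinary clearance term, setting the left-hand side of that balance to zero and solving for the blood concentration gives:

```latex
0 = \sum_T \frac{Q_T C_T}{P_T} - Q_C C_B - K_U C_B
\quad\Longrightarrow\quad
C_B = \frac{\sum_T Q_T C_T / P_T}{Q_C + K_U}.
```

If there is no urinary clearance ($K_U = 0$) and the tissue blood flows sum to the cardiac output, this reduces to the flow-weighted average of the venous concentrations leaving the tissues.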
[Fig. 10 diagram labels: Blood; QC; CB; KU.]
3.5.3. Metabolism/Elimination
The liver is frequently the primary site of metabolism for a chemical.
The following equation is an example of the mass balance
equation for the liver in the case of a chemical which is metabolized
by two pathways (Fig. 11):
$$\frac{dA_L}{dt} = Q_L \left(C_A - \frac{C_L}{P_L}\right) - k_F \frac{C_L V_L}{P_L} - \frac{V_{max}\, C_L/P_L}{K_m + C_L/P_L}.$$
In this case, the first term on the right-hand side of the equa-
tion represents the mass flux associated with transport in the blood
and is identical to the case of the storage compartment described
previously. The second term describes metabolism by a linear (first-order)
pathway with rate constant $k_F$ ($\mathrm{h}^{-1}$) and the third term
represents metabolism by a saturable (Michaelis–Menten) pathway
with capacity Vmax (mg/h) and affinity Km (mg/L). If it were
desired to model a water soluble metabolite produced by the
saturable pathway, an equation for its formation and elimination
could be added to the model (Fig. 12):
$$\frac{dA_M}{dt} = R_{stoch} \frac{V_{max}\, C_L/P_L}{K_m + C_L/P_L} - k_e A_M,$$
$$C_M = \frac{A_M}{V_D},$$
where $A_M$ = the amount of metabolite in the body (mg), $R_{stoch}$ =
the stoichiometric yield of the metabolite times the ratio of its
molecular weight to that of the parent chemical, $k_e$ = the rate
constant for the clearance of the metabolite from the body ($\mathrm{h}^{-1}$),
$C_M$ = the concentration of the metabolite in the plasma (mg/L), and
$V_D$ = the apparent volume of distribution for the metabolite (L).
[Fig. 11 diagram labels: Liver; QL; CA; CVL; kF; VMax, KM.]
[Fig. 12 diagram labels: VMax, KM; Metabolite; ke.]
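The liver and metabolite balances can be sketched in code as a single rate function. In the Python example below, the liver concentration is computed as CL = AL/VL by analogy with the storage compartments, which is an assumption, and all parameter values and names are hypothetical.

```python
def liver_rates(a_l, a_m, c_a, p):
    """Return (dA_L/dt, dA_M/dt) in mg/h for the liver and metabolite balances.

    a_l: amount of parent chemical in the liver (mg)
    a_m: amount of metabolite in the body (mg)
    c_a: arterial blood concentration of parent chemical (mg/L)
    p:   dict of parameters supplied by the caller
    """
    c_l = a_l / p["v_l"]            # liver concentration (assumed C_L = A_L/V_L)
    c_vl = c_l / p["p_l"]           # venous concentration, C_L / P_L
    flow = p["q_l"] * (c_a - c_vl)  # net delivery by blood flow
    linear = p["k_f"] * c_vl * p["v_l"]                # first-order pathway
    saturable = p["v_max"] * c_vl / (p["k_m"] + c_vl)  # Michaelis-Menten pathway
    da_l = flow - linear - saturable
    da_m = p["r_stoch"] * saturable - p["k_e"] * a_m
    return da_l, da_m

def metabolite_concentration(a_m, v_d):
    """C_M = A_M / V_D, in mg/L."""
    return a_m / v_d

# Illustrative (assumed) parameter values; not taken from any published model.
EXAMPLE_PARAMS = {
    "q_l": 1.0, "v_l": 0.01, "p_l": 2.0,    # L/h, L, unitless
    "k_f": 0.5, "v_max": 3.0, "k_m": 0.4,   # 1/h, mg/h, mg/L
    "r_stoch": 1.2, "k_e": 0.1,             # unitless, 1/h
}

if __name__ == "__main__":
    da_l, da_m = liver_rates(0.008, 0.0, 0.4, EXAMPLE_PARAMS)
    print(f"dA_L/dt = {da_l:.4f} mg/h, dA_M/dt = {da_m:.4f} mg/h")
```

Keeping the saturable rate in one variable and reusing it in both balances ensures that the parent chemical lost by that pathway is the same quantity that feeds metabolite formation, scaled by Rstoch.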
3.5.4. Metabolite Compartments
In principle, the same considerations which drive decisions regarding
the level of complexity of the PBPK model for the parent chemical
must also be applied for each of its metabolites, and their metabolites,
and so on. As in the case of the parent chemical, the first and most
important consideration is the purpose of the model. If the concern is
direct parent chemical toxicity and the chemical is detoxified by
metabolism, then there is no need for a description of metabolism
beyond its role in the clearance of the parent chemical. The models for
styrene and methotrexate discussed above are examples of parent
chemical models. Similarly, if reactive intermediates produced during
the metabolism of a chemical are responsible for its toxicity, as in the
case of methylene chloride, a very simple description of the metabolic
pathways might be adequate (8). The cancer risk assessment model for
methylene chloride described the rate of metabolism for two path-
ways: the glutathione conjugation pathway, which was considered
responsible for the carcinogenic effects, and the competing P450
oxidation pathway, which was considered protective.
On the other hand, if one or more of the metabolites are
considered to be responsible for the toxicity of a chemical, it may
be necessary to provide a more complete description of the kinetics
of the metabolites themselves. For example, in the case of teratoge-
nicity from all-trans-retinoic acid, both the parent chemical and
several of its metabolites are considered to be toxicologically active;
therefore, in developing the PBPK model for this chemical it was
necessary to include a fairly complete description of the metabolic
pathways (100). Fortunately, the metabolism of xenobiotic com-
pounds often produces metabolites which are relatively water solu-
ble, simplifying the description needed. In many cases, such as the
production of trichloroacetic acid from trichloroethylene (46–48),
a classical one-compartment description may be adequate for
describing the metabolite kinetics. An example of such a description
was provided earlier. In other cases, however, the description of the
metabolite (or metabolites) may have to be as complex as that of the
parent chemical. An example of such a case is the PBPK model for
parathion (88), in which the model for the active metabolite, para-
oxon, is actually more complex than that of the parent chemical.
3.5.5. Target Tissues
Typically, a PBPK model used in toxicology or risk assessment
applications will include compartments for any target tissues for
the toxic action of the chemical. The target tissue description may
in some cases need to be fairly complicated, including such features
as in situ metabolism, binding, and pharmacodynamic processes in
order to provide a realistic measure of biologically effective tissue
exposure (57). For example, whereas the lung compartment in the
3.5.6. Uptake Routes
Each of the relevant uptake routes for the chemical must be
described in the model. Often there are a number of possible
ways to describe a particular uptake process, ranging from simple
to complex. As with all other aspects of model design, the compet-
ing goals of parsimony and realism must be balanced in the selec-
tion of the level of complexity to be used. The following examples
are meant to provide an idea of the variety of model code which can
be required to describe the various possible uptake processes.
Oral Gavage
For a chemical which is not excreted in the feces (Fig. 14):
$$A_{ST0} = \mathrm{Dose} \cdot \mathrm{BW}$$
$$\frac{dA_{ST}}{dt} = -k_A\,A_{ST}$$
$$\frac{dA_L}{dt} = Q_L\,(C_A - C_L/P_L) - k_F\,C_L V_L/P_L + k_A\,A_{ST},$$
where Dose = the gavage dose (mg/kg), BW = the animal body
weight (kg), $A_{ST0}$ = the amount of chemical in the stomach at the
beginning of the simulation, $A_{ST}$ = the amount of chemical in the
stomach at any given time, $k_A$ = the oral absorption rate (h$^{-1}$).
For a chemical which is excreted in the feces (Fig. 15):
$$A_{ST0} = \mathrm{Dose} \cdot \mathrm{BW}$$
$$\frac{dA_{ST}}{dt} = -k_A\,A_{ST}$$
$$\frac{dA_I}{dt} = k_A\,A_{ST} - k_I\,A_I - K_F\,A_I/V_I$$
$$\frac{dA_L}{dt} = Q_L\,(C_A - C_L/P_L) - k_F\,C_L V_L/P_L + k_I\,A_I,$$
where $A_I$ = the amount of chemical in the intestinal lumen (mg),
$k_I$ = the rate constant for intestinal absorption (h$^{-1}$), $K_F$ = the
fecal clearance (L/h), $V_I$ = the volume of the intestinal lumen (L).
The rate of fecal excretion of the chemical is then:
$$\frac{dA_F}{dt} = K_F\,A_I/V_I.$$
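The stomach, intestinal lumen, and fecal excretion equations can be sketched in Python as shown below. This is an illustrative sketch only: the liver compartment is replaced by a running total of the absorbed amount (the $k_I A_I$ term), the integration is a simple Euler scheme, and all names and numerical values are hypothetical.

```python
def simulate_gavage(dose, bw, k_a, k_i, k_f, v_i, hours, dt=0.001):
    """Euler integration of the stomach, intestinal lumen, and fecal equations.

    Returns (A_ST, A_I, absorbed, A_F) in mg, where 'absorbed' is the cumulative
    k_I*A_I input that would enter the liver equation.
    """
    a_st = dose * bw                      # A_ST0 = Dose * BW
    a_i = absorbed = a_f = 0.0
    for _ in range(int(round(hours / dt))):
        r_stomach = k_a * a_st            # stomach -> intestinal lumen (mg/h)
        r_absorb = k_i * a_i              # intestinal lumen -> liver (mg/h)
        r_feces = k_f * a_i / v_i         # intestinal lumen -> feces (mg/h)
        a_st -= dt * r_stomach
        a_i += dt * (r_stomach - r_absorb - r_feces)
        absorbed += dt * r_absorb
        a_f += dt * r_feces
    return a_st, a_i, absorbed, a_f


# Hypothetical values: 10 mg/kg dose to a 0.25 kg animal
a_st, a_i, absorbed, a_f = simulate_gavage(
    dose=10.0, bw=0.25, k_a=1.0, k_i=0.5, k_f=0.001, v_i=0.01, hours=48.0)
total = a_st + a_i + absorbed + a_f       # should equal Dose * BW at all times
```

Because every transfer term is subtracted from one compartment and added to another, the total stays equal to the administered amount, and the fraction eventually absorbed is $k_I/(k_I + K_F/V_I)$.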
Fig. 14. Chemical ingested through oral gavage and not excreted in the feces.
Fig. 18. The change in concentration in the chamber air as the chemical is absorbed.
$$\frac{dA_{CH}}{dt} = N\,Q_P\,(C_X - C_I)$$
$$C_I = A_{CH}/V_{CH},$$
where $A_{CH}$ = the amount of chemical in the chamber (mg), $N$ = the
number of animals in the chamber, $C_X$ = the concentration of chemical
in the air exhaled by the animals (mg/L), $Q_P$ = the alveolar
ventilation rate for a single animal (L/h), $C_I$ = the chamber air
concentration (mg/L), $V_{CH}$ = the volume of air in the chamber (L).
3.5.8. Distribution/Transport
There are a number of issues associated with the description of the
transport and distribution of the chemical that must be considered
in the process of model design. Examples of a few of the more
common ones are included here.
Diffusion Limitation
Most of the PBPK models in the literature are flow-limited models;
that is, they assume that the rate of tissue uptake of the chemical is
limited only by the flow of the chemical to the tissue in the blood.
While this assumption appears to be reasonable in general, for some
chemicals and tissues uptake may instead be diffusion-limited.
Examples of tissues for which diffusion-limited transport has
often been described include the skin, placenta, brain, and fat.
The model compartments described thus far have all assumed
flow-limited transport. If there is evidence that the movement of
a chemical between the blood and a tissue is limited by diffusion, a
two-compartment description of the tissue can be used with a
“shallow” exchange compartment in communication with the
blood and a diffusion-limited “deep” compartment (Fig. 20):
$$\frac{dA_S}{dt} = Q_S\,(C_A - C_S) - K_{PA}\,(C_S - C_D/P_D)$$
$$\frac{dA_D}{dt} = K_{PA}\,(C_S - C_D/P_D)$$
Fig. 20. A two-compartment model describing the movement of a chemical between the
“shallow” and diffusion-limited “deep” compartment.
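The two-compartment, diffusion-limited description can be sketched in Python as follows. The symbol definitions are assumptions, since they are not given with the equations here: $A_S$ and $A_D$ are taken as the amounts in the shallow and deep compartments, $C_S = A_S/V_S$ and $C_D = A_D/V_D$ as the corresponding concentrations, $K_{PA}$ as a permeability-area product (L/h), and $P_D$ as the deep-compartment partition coefficient. All numerical values are hypothetical.

```python
def diffusion_limited_rates(a_s, a_d, c_a, q_s, k_pa, p_d, v_s, v_d):
    """Rates of change (mg/h) for the shallow and deep sub-compartments."""
    c_s = a_s / v_s                        # shallow concentration (mg/L), assumed A_S/V_S
    c_d = a_d / v_d                        # deep concentration (mg/L), assumed A_D/V_D
    exchange = k_pa * (c_s - c_d / p_d)    # diffusion-limited transfer, shallow -> deep
    da_s = q_s * (c_a - c_s) - exchange    # blood-flow delivery minus transfer to deep
    da_d = exchange                        # same term, opposite sign: mass is conserved
    return da_s, da_d


def simulate_tissue(c_a, hours, dt=0.001, **p):
    """Euler integration at a constant arterial concentration c_a (an assumption)."""
    a_s = a_d = 0.0
    for _ in range(int(round(hours / dt))):
        da_s, da_d = diffusion_limited_rates(a_s, a_d, c_a, **p)
        a_s += dt * da_s
        a_d += dt * da_d
    return a_s, a_d


p = dict(q_s=1.0, k_pa=0.1, p_d=5.0, v_s=0.02, v_d=0.2)   # hypothetical values
a_s, a_d = simulate_tissue(c_a=2.0, hours=200.0, **p)
c_s_end, c_d_end = a_s / p["v_s"], a_d / p["v_d"]
```

At long times the shallow concentration approaches $C_A$ and the deep concentration approaches $P_D C_A$; a small $K_{PA}$ only slows the approach to that equilibrium, which is the signature of diffusion limitation.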
3.6. Model Parameterization
Once the model structure has been determined, it still remains to
identify the values of the input parameters in the model.
3.6.2. Biochemical Parameters
For volatile liquids, the type of chemicals which are common
environmental contaminants, tissue partition coefficients can be
determined by a simple in vitro technique called vial equilibration (36, 38)
and tissue metabolic constants by a modification of the same tech-
nique (37) or other in vitro methods (105). Alternatively, rapid
in vivo approaches for determining metabolic constants can be used
either based on steady-state (96) or gas uptake experiments
(102–104, 106, 107). Determination of the total amount of chemical
metabolized in a particular exposure situation can also be used to
estimate metabolic parameters (109). In addition, determination of
stable end-product metabolites after exposure can be a particularly
attractive technique in some cases (108, 110). Similar approaches can
be used with nonvolatile chemicals (39, 111) and metals (112).
Table 3. Typical physiological parameters for PBPK models
Table 4. Standard allometric scaling for physiologically based pharmacokinetic model parameters
3.6.4. Parameter Optimization
In many cases, important parameter values needed for a PBPK
model may not be available in the literature. In such cases it is
necessary to measure them in new experiments, to estimate them
by QSAR techniques, or to identify them by optimizing the fit of
the model to an informative data set. Even in the case where an
initial estimate of a particular parameter value can be obtained from
other sources, it may be desirable to refine the estimate by optimi-
zation. For example, given the difficulty of obtaining accurate
estimates of the fat volume in rodents, a more reliable estimate
may be obtained by examining the impact of fat volume on the
kinetic behavior of a lipophilic compound such as styrene. Of
course, being able to uniquely identify a parameter from a kinetic
data set rests on two key assumptions: (1) that the kinetic behavior
of the compound under the conditions in which the data was
collected is sensitive to the parameter being estimated, and (2)
that other parameters in the model which could influence the
observed kinetics have been determined by other means, and are
held fixed during the estimation process.
The actual approach for conducting a parameter optimization
can range from simple visual fitting, where the model is run with
different values of the parameter until the best correspondence
appears to be achieved, to the use of a quantitative mathematical
algorithm. The most common algorithm used in optimization is the
least-squares fit. To perform a least-squares optimization, the
model is run to obtain a set of predictions at each of the times a
data point was collected. The square of the difference between the
model prediction and data point at each time is calculated and the
results for all of the data points are summed. The parameter being
estimated is then modified, and the sum of squares is recalculated.
This process is repeated until the smallest possible sum of squares
is obtained, representing the best possible fit of the model to
the data.
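The least-squares procedure just described can be sketched in Python for a single parameter. In this minimal illustration a first-order decline stands in for a full PBPK model run, the "data" are synthetic and noise-free, and a golden-section search is used to modify the parameter; these choices, and all names and values, are assumptions made for the example rather than the method used by any particular package.

```python
import math


def model(t, k, c0=10.0):
    """Hypothetical stand-in for a PBPK model run: first-order decline (mg/L)."""
    return c0 * math.exp(-k * t)


def sum_of_squares(k, times, data):
    """Sum over all data points of (prediction - observation)**2."""
    return sum((model(t, k) - y) ** 2 for t, y in zip(times, data))


def fit_golden(times, data, lo, hi, tol=1e-8):
    """Golden-section search for the k in [lo, hi] that minimizes the sum of squares."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - g * (b - a)
        d = a + g * (b - a)
        if sum_of_squares(c, times, data) < sum_of_squares(d, times, data):
            b = d                          # minimum lies in [a, d]
        else:
            a = c                          # minimum lies in [c, b]
    return 0.5 * (a + b)


times = [0.5, 1.0, 2.0, 4.0, 8.0]
data = [model(t, 0.3) for t in times]      # synthetic observations generated with k = 0.3
k_hat = fit_golden(times, data, lo=0.01, hi=2.0)
```

Because the synthetic data were generated with k = 0.3, the search should return a value very close to 0.3; with real data the minimum sum of squares would be greater than zero.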
In a variation on this approach, the square of the difference at
each point is divided by the square of the prediction. This variation,
known as relative least squares, is preferable in the case of data with
an error structure which can be described by a constant coefficient
of variation (that is, a constant ratio of the standard deviation to the
mean). The former method, known as absolute least squares, is
preferable in the case of data with a constant variance. From a
practical viewpoint, the absolute least squares method tends to
give greater weight to the data at higher concentrations and results
in fits that look best when plotted on a linear scale, while the relative
least squares method gives greater weight to the data at lower
concentrations and results in fits that look best when plotted on a
logarithmic scale.
A generalization of this weighting concept is provided by the
extended least squares method, available in a number of optimiza-
tion packages including ACSL/Opt (MGA Software, Concord,
MA). In the extended least squares algorithm, the heteroscedasti-
city parameter can be varied from 0 (for absolute weighting) to
2 (for relative weighting), or can be estimated from the data. In
general, setting the heteroscedasticity parameter from knowledge
of the error structure of the data is preferable to estimating it from a
data set.
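The effect of the weighting can be illustrated with a short Python sketch in which each squared residual is divided by the prediction raised to a power gamma, so that gamma = 0 reproduces absolute least squares and gamma = 2 reproduces relative least squares. This shows the weighting only; it is an assumption-laden simplification, since implementations that estimate the heteroscedasticity parameter from the data typically add a variance term to the objective, which is omitted here. The function name and data values are hypothetical.

```python
def weighted_sum_of_squares(predictions, data, gamma=0.0):
    """Sum of squared residuals, each divided by prediction**gamma.

    gamma = 0 gives absolute least squares; gamma = 2 gives relative least squares.
    """
    return sum((p - y) ** 2 / p ** gamma for p, y in zip(predictions, data))


pred = [100.0, 10.0, 1.0]
obs = [110.0, 11.0, 1.1]                   # every point is 10% above its prediction

absolute_terms = [(p - y) ** 2 for p, y in zip(pred, obs)]           # about 100, 1, 0.01
relative_terms = [(p - y) ** 2 / p ** 2 for p, y in zip(pred, obs)]  # about 0.01 each
```

With absolute weighting the highest concentration dominates the objective, whereas with relative weighting the three points contribute equally, in line with the behavior described above.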
A common example of identifying PBPK model parameters by
fitting kinetic data is the estimation of tissue partition coefficients
from experiments in which the concentration of chemical in the
blood and tissues is reported at various time points. Using an
optimization approach, the predictions of the model for the time
course in the blood and tissues could be optimized with respect to
the data by varying the model’s partition coefficients. There is really
little difference in the strength of the justification for estimating the
partition coefficients in this way as opposed to estimating them
directly from the data (by dividing the tissue concentrations by
the simultaneous blood concentration). In fact, the direct estimates
would probably be used as initial estimates in the model when the
optimization was started.
A major difficulty in performing parameter optimization results
from correlations between the parameters. When it is necessary to
estimate parameters which are highly correlated, it is best to gener-
ate a contour plot of the objective function (sum of squares) or
confidence region over a reasonable range of values of the two
parameters. Generation of a contour plot with ACSL/Opt is rela-
tively straightforward. An example of a contour plot for two of the
metabolic parameters in the PBPK model for methylene chloride is
shown in Fig. 21. The contours in the figure outline the joint
confidence region for the values of the two parameters, and the
fact that the confidence region is aligned diagonally reflects the
correlation between the two parameters.
Fig. 21. Contour plot for correlated metabolic parameters in the PBPK model for methylene
chloride.
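The origin of such a diagonal region can be illustrated with a small Python sketch that evaluates the sum of squares on a grid of two metabolic parameters. The example is hypothetical and is not the methylene chloride analysis: it uses a saturable rate expression with synthetic data collected only at concentrations well below $K_m$, where the data determine little more than the ratio $V_{max}/K_m$.

```python
def rate(c, v_max, k_m):
    """Saturable (Michaelis-Menten) rate of metabolism."""
    return v_max * c / (k_m + c)


concs = [0.01, 0.02, 0.05, 0.1]                 # all well below K_m (hypothetical)
data = [rate(c, 10.0, 5.0) for c in concs]      # synthetic data: V_max = 10, K_m = 5


def sse(v_max, k_m):
    """Objective function (sum of squares) for one pair of parameter values."""
    return sum((rate(c, v_max, k_m) - y) ** 2 for c, y in zip(concs, data))


v_grid = [5.0 + 0.5 * i for i in range(21)]     # V_max from 5 to 15
k_grid = [2.5 + 0.25 * j for j in range(21)]    # K_m from 2.5 to 7.5
surface = [[sse(v, k) for k in k_grid] for v in v_grid]

# For each V_max, locate the K_m on the grid that gives the smallest sum of squares
valley = [(v, k_grid[min(range(len(k_grid)), key=lambda j: row[j])])
          for v, row in zip(v_grid, surface)]
```

The `valley` list traces a diagonal line along which $V_{max}/K_m$ is constant; passing `surface` to any contour-plotting routine would display the same elongated region.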
3.7. Mass Balance Requirements
One of the most important mathematical considerations during
model design is the maintenance of mass balance. Simply put, the
model should neither create nor destroy mass. This seemingly
obvious principle is often violated unintentionally during the pro-
cess of model development and parameterization. A common vio-
lation of mass balance, which typically leads to catastrophic results,
involves failure to exactly match the arterial and venous blood flows
in the model. As described above, the movement of chemical in the
blood (in units of mass per time) is described as the product of the
concentration of chemical in the blood (in units of mass per vol-
ume) times the flow rate of the blood (in units of volume per time).
Therefore, to maintain mass balance, the sum of the blood flows
leaving any particular tissue compartment must equal the sum of
the blood flows entering the compartment. In particular, to main-
tain mass balance in the blood compartment (regardless of whether
it is actually a compartment or just a steady-state equation), the
sum of the venous flows from the individual tissue compartments
must equal the total arterial blood flow leaving the heart:
$$\sum Q_T = Q_C.$$
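A simple automated check of this flow balance can be sketched in Python. The function name, tolerance, tissue names, and flow fractions below are all hypothetical and serve only to show the idea.

```python
def check_flow_balance(tissue_flows, q_c, rel_tol=1e-9):
    """Return True if the tissue blood flows sum to the cardiac output within rel_tol."""
    total = sum(tissue_flows.values())
    return abs(total - q_c) <= rel_tol * abs(q_c)


q_c = 5.4                                          # total cardiac output (L/h), hypothetical
fractions = {"fat": 0.09, "liver": 0.25, "rapid": 0.51, "slow": 0.15}
flows = {name: f * q_c for name, f in fractions.items()}

balanced = check_flow_balance(flows, q_c)
# Increasing one flow without adjusting the others breaks the balance
unbalanced = check_flow_balance(dict(flows, fat=flows["fat"] * 1.5), q_c)
```

Running such a check every time parameters are changed is an inexpensive safeguard against the violations described in this section.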
Another obvious but occasionally overlooked aspect of main-
taining mass balance during model development is that if a model is
modified by splitting a tissue out of a lumped compartment, the
blood flow to the separated tissue (and its volume) must be sub-
tracted from that for the lumped compartment. Moreover, even
though a model may initially be designed with parameters that meet
the above requirements, mass balance may unintentionally be vio-
lated later if the parameters are altered during model execution. For
example, if the parameter for the blood flow to one compartment is
increased, the parameter for the overall blood flow must be
increased accordingly or an equivalent reduction must be made in
the parameter for the blood flow to another compartment. Partic-
ular care must be taken in this regard when the model is subjected
to sensitivity or uncertainty analysis; inadvertent violation of mass
balance during Monte Carlo sampling has led in the past to the
publication of erroneous sensitivity results (95).
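One possible way to preserve the flow balance during Monte Carlo sampling is to sample the fractional flows independently and then rescale them so that they again sum to the cardiac output. The Python sketch below illustrates this; it is an assumed remedy shown for illustration, not necessarily the approach taken in the cited work, and the rescaling slightly alters the sampled distributions. All names and values are hypothetical.

```python
import random


def sample_balanced_flows(mean_fractions, cv, q_c, rng):
    """Sample fractional flows independently, then rescale them to sum to q_c."""
    raw = {name: max(rng.gauss(mu, cv * mu), 1e-6)   # truncate to keep flows positive
           for name, mu in mean_fractions.items()}
    total = sum(raw.values())
    return {name: q_c * value / total for name, value in raw.items()}


mean_fractions = {"fat": 0.09, "liver": 0.25, "rapid": 0.51, "slow": 0.15}
rng = random.Random(0)                               # fixed seed for reproducibility
draws = [sample_balanced_flows(mean_fractions, cv=0.3, q_c=5.4, rng=rng)
         for _ in range(1000)]
worst_imbalance = max(abs(sum(d.values()) - 5.4) for d in draws)
```

Every sampled parameter set then satisfies the flow balance to within rounding error.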
A similar mass balance requirement must be met for transport
other than blood flow. For example, if the chemical is cleared by
biliary excretion, the elimination of chemical from the liver in the
bile must exactly match the appearance of chemical in the gut
lumen in the bile. Put mathematically, the same term for the
transport must appear in the equations for the two compartments,
but with opposite signs (positive vs. negative). For example, if the
following equation were used to describe a liver compartment with
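first-order metabolism and biliary clearance:
$$\frac{dA_L}{dt} = Q_L\,(C_A - C_L/P_L) - k_F\,C_L V_L/P_L - K_B\,C_L,$$
then the equation for the gut lumen compartment must include the matching term $+K_B C_L$.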
3.8. Model Diagram
As described in the previous sections, the process of developing a
PBPK model begins by determining the essential structure of the
model based on the information available on the chemical’s toxicity,
mechanism of action, and pharmacokinetic properties. The results
of this step can usually be summarized by an initial model diagram,
such as those depicted in Figs. 3 and 8. In fact, in many cases a
well-constructed model diagram, together with a table of the input
parameter values and their definitions, is all that an accomplished
modeler should need in order to create the mathematical equations
defining a PBPK model. In general, there should be a one-to-one
correspondence of the boxes in the diagram to the mass balance
equations (or steady-state approximations) in the model. Similarly,
the arrows in the diagram correspond to the transport or
3.9. Elements of Model Evaluation
One of the key advantages of PBPK models is their ability to
perform extrapolations across species, routes of exposure, and
exposure conditions. The reliability of a particular PBPK model
for the purpose of extrapolation depends not only on the adequacy
of its structure, but also on the correctness of its parameterization.
The following section discusses some of the key issues associated
with evaluating the adequacy of a model to predict chemical kinet-
ics under conditions different from those for which experimental
data are available.
3.9.1. Model Documentation
In cases where a model previously developed by one investigator is
being evaluated for use in a different application by another inves-
tigator, adequate model documentation is critical for evaluation of
the model. The documentation for a PBPK model should include
sufficient information about the model so that an experienced
modeler could accurately reproduce its structure and parameteri-
zation. Usually the suitable documentation of a model will require
a combination of one or more “box and arrow” model diagrams
together with any equations which cannot be unequivocally
derived from the diagrams. Model diagrams should clearly differ-
entiate blood flow from other transport (e.g., biliary excretion) or
metabolism, and arrows should be used where the direction of
transport could be ambiguous. All tissue compartments, metabo-
lism pathways, routes of exposure, and routes of elimination should
be clearly and accurately presented. All equations should be dimen-
sionally consistent and in standard mathematical notation. Generic
equations (e.g., for tissue “i”) can help to keep the description brief
but complete. The values used for all model parameters should be
provided, with units. If any of the listed parameter values are based
on allometric scaling, a footnote should provide the body weight
used to obtain the allometric constant as well as the power of body
weight used in the scaling.
3.9.2. Model Validation
Internal validation consists of the evaluation of the mathematical
correctness of the model (114). It is best accomplished on the actual
model code, but if necessary can be performed on appropriate
documentation of the model structure and parameters, as described
above (assuming, of course, that the actual model code accurately
reflects the model documentation). A more important issue regards
the provision of evidence for external validation (sometimes referred
to as verification). The level of detail incorporated into a model is
necessarily a compromise between biological accuracy and parsi-
mony. The process of evaluating the sufficiency of the model for its
intended purpose, termed model verification, requires a demonstra-
tion of the ability of the model to predict the behavior of experi-
mental data different from that on which it was based.
Whereas a simulation is intended simply to reproduce the
behavior of a system, a model is intended to confirm a hypothesis
concerning the nature of the system (115). Therefore, model vali-
dation should demonstrate the ability of the model to predict the
behavior of the system under conditions which test the principal
aspects of the underlying hypothetical structure. While quantitative
tests of goodness of fit may often be a useful aspect of the verifica-
tion process, the more important consideration may be the ability
of the model to provide an accurate prediction of the general
behavior of the data in the intended application.
Where only some aspects of the model can be verified, it is
particularly important to assess the uncertainty associated with the
aspects which are untested. For example, a model of a chemical
and its metabolites which is intended for use in cross-species
extrapolation to humans would preferably be verified using data
in different species, including humans, for both the parent chemi-
cal and the metabolites. If only parent chemical data is available in
the human, the correspondence of metabolite predictions with
data in several animal species could be used as a surrogate, but this
deficiency should be carefully considered when applying the
model to predict human metabolism. One of the values of biolog-
ically based modeling is the identification of specific data which
would improve the quantitative prediction of toxicity in humans
from animal experiments.
In some cases it is necessary to use all of the available data to
support model development and parameterization. Unfortunately,
this type of modeling can easily become a form of self-fulfilling
prophecy: models are logically strongest when they fail, but
psychologically most appealing when they succeed (116). Under
these conditions, model verification can be particularly difficult, putting
an additional burden on the investigators to substantiate the trustwor-
thiness of the model for its intended purpose. Nevertheless, a com-
bined model development and verification can often be successfully
performed, particularly for models intended for interpolation, integra-
tion, and comparison of data rather than for true extrapolation.
3.9.3. Parameter Verification
In addition to verifying the performance of the model against
experimental data, the model should be evaluated in terms of the
plausibility of its parameters. This is particularly important in the
case of PBPK models, where the parameters generally possess
biological significance, and can therefore be evaluated for plausibil-
ity independent of the context of the model. The source of each
model input parameter value should be identified, whether it was
obtained from prior literature, determined directly by experiment,
or estimated by fitting a model output to experimental data. Param-
eter estimates derived independently of tissue time course or dose-
response data are preferred. To the extent feasible, the degree of
uncertainty regarding the parameter values should also be evalu-
ated. The empirically derived “Law of Reciprocal Certainty” states
that the more important the model parameter, the less certain will
be its value. In accordance with this principle, the most difficult,
and typically most important, parameter determination for PBPK
models is the characterization of the metabolism parameters.
When parameter estimation has been performed by optimizing
model output to experimental data, the investigator must assure
that the parameter is adequately identifiable from the data (114).
Due to the confounding effects of model error, overparameteriza-
tion, and parameter correlation, it is quite possible for an optimiza-
tion algorithm to obtain a better fit to a particular data set by
modifying a parameter which in fact should not be identified on
the basis of that data set. Also, when an automatic optimization
routine is employed it should be restarted with a variety of initial
parameter values to assure that the routine has not stopped at a
local optimum. These precautions are particularly important when
more than one parameter is being estimated simultaneously, since
the parameters in biologically based models are often highly corre-
lated, making independent estimation difficult. Estimates of param-
eter variance obtained from automatic optimization routines
should be viewed as lower bound estimates of true parameter
uncertainty since only a local, linearized variance is typically calcu-
lated. In characterizing parameter uncertainty, it is probably more
instructive to determine what ranges of parameter values are clearly
inconsistent with the data than to accept a local, linearized variance
estimate provided by the optimization algorithm.
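The recommendation to restart an optimization from a variety of initial values can be illustrated with the Python sketch below. The objective function is a hypothetical curve with two minima standing in for a sum of squares, and the local search is a deliberately crude pattern search; neither is taken from any particular optimization package.

```python
def local_search(f, x0, step=0.5, tol=1e-8):
    """Crude 1-D pattern search: move while it improves, otherwise halve the step."""
    x, fx = x0, f(x0)
    while step > tol:
        for cand in (x - step, x + step):
            fc = f(cand)
            if fc < fx:
                x, fx = cand, fc
                break
        else:
            step *= 0.5                    # no improvement in either direction
    return x, fx


def multistart(f, starts):
    """Run the local search from each start; return the best result and all results."""
    results = [local_search(f, x0) for x0 in starts]
    return min(results, key=lambda r: r[1]), results


def objective(x):
    """Hypothetical objective with a local minimum near x = 1 and a lower one near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


best, all_results = multistart(objective, starts=[-2.0, 0.5, 2.0])
```

In this example the searches started at 0.5 and 2.0 stop at the local optimum near x = 1, and only the search started at -2.0 finds the lower minimum; a single run from an unlucky starting value would have given no warning of this.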
It is usually necessary for the investigator to repeatedly vary the
model parameters manually to obtain a sense of their identifiability
and correlation under various experimental conditions, although
some simulation languages include routines for calculating param-
eter sensitivity and covariance or for plotting confidence region
contours. Sensitivity analysis and Monte Carlo uncertainty analysis
techniques can serve as useful methods to estimate the impact of
input parameter uncertainty on the uncertainty of model outputs
(95, 117). However, care should be taken to avoid violation of mass
3.9.4. Sensitivity Analysis
To the extent that a particular PBPK model correctly reflects the
physiological and biochemical processes underlying the pharmaco-
kinetics of a chemical, exercising the model can provide a means for
identifying the most important physiological and biochemical para-
meters determining the pharmacokinetic behavior of the chemical
under different conditions (95). The technique for obtaining this
information is known as sensitivity analysis and can be performed by
two different methods. Analytical sensitivity coefficients are defined
as the ratio of the change in a model output to the change in a model
parameter that produced it. To obtain a sensitivity coefficient by this
method, the model is run for the exposure scenario of interest using
the preferred values of the input parameters, and the resulting
output (e.g., hair concentration) is recorded. The model is then
run again with the value of one of the input parameters varied
slightly. Typically, a 1% change is appropriate. The ratio of the
resulting incremental change in the output to the change in the
input represents the sensitivity coefficient. For example, if a 1%
increase in an input parameter resulted in a 0.5% decrease in the
output, the sensitivity coefficient would be -0.5. Sensitivity
coefficients >1.0 in absolute value represent amplification of input error
and would be a cause for concern. An alternative approach is to
conduct a Monte Carlo analysis, as described below, and then to
perform a simple correlation analysis of the model outputs and input
parameters. Both methods have specific advantages. The analytical
sensitivity coefficient most accurately represents the functional rela-
tionship of the output to the specific input under the conditions
being modeled. The advantage of the correlation coefficients is that
they also reflect the impact of interactions between the parameters
during the Monte Carlo analysis.
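The calculation of a sensitivity coefficient by a 1% perturbation can be sketched in Python as follows. A simple steady-state expression stands in for a full PBPK model run; the model, parameter names, and values are hypothetical, and the coefficient is expressed as the fractional change in the output divided by the fractional change in the input.

```python
def sensitivity_coefficient(model, params, name, delta=0.01):
    """(Fractional change in output) / (fractional change in one input parameter)."""
    base = model(**params)
    perturbed = dict(params, **{name: params[name] * (1.0 + delta)})
    return ((model(**perturbed) - base) / base) / delta


def steady_state_conc(dose_rate, clearance):
    """Hypothetical stand-in for a PBPK model output: dose rate / clearance (mg/L)."""
    return dose_rate / clearance


params = {"dose_rate": 10.0, "clearance": 2.0}
s_dose = sensitivity_coefficient(steady_state_conc, params, "dose_rate")
s_cl = sensitivity_coefficient(steady_state_conc, params, "clearance")
```

Here the output is directly proportional to the dose rate, giving a coefficient of about 1, and inversely proportional to the clearance, giving a coefficient of about -0.99; the negative sign indicates that the output falls as the parameter rises.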
3.9.5. Uncertainty Analysis
There are a number of examples in the literature of evaluations of
the uncertainty associated with the predictions of a PBPK model
using the Monte Carlo simulation approach (10, 117, 118). In a
Monte Carlo simulation, a probability distribution for each of the
PBPK model parameters is randomly sampled, and the model is run
using the chosen set of parameter values. This process is repeated a
large number of times until the probability distribution for the
desired model output has been created. Generally speaking, 1,000
iterations or more may be required to ensure the reproducibility of
the mean and standard deviation of the output distributions as well
as the 1st through 99th percentiles. To the extent that the input
parameter distributions adequately characterize the uncertainty in
the inputs, and assuming that the parameters are reasonably inde-
pendent, the resulting output distribution will provide a useful
estimate of the uncertainty associated with the model outputs.
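The Monte Carlo procedure can be sketched in Python using only the standard library. As before, a simple steady-state expression stands in for the PBPK model, and the independent lognormal input distributions are assumptions chosen for illustration; all names and values are hypothetical.

```python
import random
import statistics


def steady_state_conc(dose_rate, clearance):
    """Hypothetical stand-in for a PBPK model output (mg/L)."""
    return dose_rate / clearance


def monte_carlo(model, samplers, n_iter=1000, seed=0):
    """Draw every parameter from its distribution, run the model, collect the outputs."""
    rng = random.Random(seed)              # fixed seed so the run is reproducible
    outputs = []
    for _ in range(n_iter):
        draw = {name: sampler(rng) for name, sampler in samplers.items()}
        outputs.append(model(**draw))
    return outputs


# Assumed input distributions (independent lognormals), for illustration only
samplers = {
    "dose_rate": lambda rng: 10.0 * rng.lognormvariate(0.0, 0.1),
    "clearance": lambda rng: 2.0 * rng.lognormvariate(0.0, 0.3),
}
outputs = monte_carlo(steady_state_conc, samplers, n_iter=1000, seed=0)
mean_out = statistics.mean(outputs)
sd_out = statistics.stdev(outputs)
percentiles = statistics.quantiles(outputs, n=100)   # 99 cut points: 1st to 99th
p01, p50, p99 = percentiles[0], percentiles[49], percentiles[98]
```

Repeating the run with different seeds and comparing the resulting means, standard deviations, and extreme percentiles gives a practical indication of whether the number of iterations is sufficient.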
3.9.6. Collection of Critical Data
As with model development, the best approach to model evaluation
is within the context of the scientific method. The most effective
way to evaluate a PBPK model is to exercise the model to generate a
quantitative hypothesis; that is, to predict the behavior of the
system of interest under conditions “outside the envelope” of the
data used to develop the model (at shorter/longer durations,
higher/lower concentrations, different routes, different species,
etc.). In particular, if there is an element of the model which
remains in question, the model can be exercised to determine the
experimental design under which the specific model element can
best be tested. For example, if there is uncertainty regarding
whether uptake into a particular tissue is flow or diffusion limited,
alternative forms of the model can be used to compare predicted
tissue concentration time courses under each of the limiting
assumptions under various experimental conditions. The experi-
mental design and sampling time which maximizes the difference
between the predicted tissue concentrations under the two assump-
tions can then serve as the basis for the actual experimental data
collection. Once the critical data has been collected, the same
model can also be used to support a more quantitative experimental
4. Example
4.2. Model Coding in ACSL
The following sections contain typical elements of the ACSL code
for a PBPK model, interspersed with comments, which will be
written in italics to differentiate them from the actual model
code. The first section describes the model definition file, which
by convention in ACSL is given a filename with the extension CSL.
The model used as an example in the following sections is a
simple, multiroute model for volatile chemicals, similar to the
styrene model discussed earlier, except that it also has the ability
to simulate closed-chamber gas uptake experiments.
4.3. Typical Elements in a Model File
An acslX source file follows the structure defined in the Standard for
Continuous Simulation Languages (just like there is a standard for
C++). Thus, for example, there will generally be an INITIAL block
!-----Experimental parameters
CONSTANT PDOSE = 0. ! Oral dose (mg/kg)
CONSTANT IVDOSE = 0. ! IV dose (mg/kg)
CONSTANT CONC = 1000. ! Inhaled concentration (ppm)
CONSTANT CC = .FALSE. ! Default to open chamber
CONSTANT NRATS = 3. ! Number of rats (for closed chamber)
CONSTANT KLC = 0. ! First order loss from closed chamber (/hr)
CONSTANT VCHC = 9.1 ! Volume of closed chamber (L)
CONSTANT TINF = .01 ! Length of IV infusion (hr)
It is an understandable requirement in ACSL to define when to
stop and how often to report. The parameter for the reporting fre-
quency (“communication interval”) is assumed by the ACSL transla-
tor to be called CINT unless you tell it otherwise using the
CINTERVAL statement. The parameter for when to stop can be
called anything you want, as long as you use the same name in the
TERMT statement (see below), but the Ramseyan convention is
TSTOP:
CONSTANT TSTOP = 24. ! Length of experiment (hr)
The following parameter name is generally used to define the length of
inhalation exposures (the name LENGTH is also used by some):
CONSTANT TCHNG = 6. ! Length of inhalation exposure (hr)
The INITIAL block is a useful place to perform logical switching for
different model applications, in this case between the simulation of
closed-chamber gas uptake experiments and normal inhalation stud-
ies. It is also sometimes necessary to calculate initial conditions for one
of the integrals (“state variables”) in the model (the initial amount in
the closed chamber in this case):
IF (CC) RATS = NRATS ! Closed chamber simulation
IF (CC) KL = KLC
IF (.NOT.CC) RATS = 0. ! Open chamber simulation
IF (.NOT.CC) KL = 0.
! (Turn off chamber losses so concentration in chamber remains constant)
IF (PDOSE.EQ.0.0) KA = 0. ! If not oral dosing, turn off oral uptake
VCH = VCHC-RATS*BW ! Net chamber air volume (L)
AI0 = CONC*VCH*MW/24450. ! Initial amount in chamber (mg)
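The unit conversion in the last two statements can be checked with a short Python sketch. The factor 24450 is the molar volume of an ideal gas in mL at 25 °C and 1 atm, so ppm multiplied by MW/24450 gives mg/L. The body weight (0.25 kg) and the use of the molecular weight of styrene (104.15 g/mol) are assumptions made for this example, and the function names are hypothetical.

```python
MOLAR_VOLUME_ML = 24450.0     # mL of ideal gas per mole at 25 C and 1 atm


def ppm_to_mg_per_l(conc_ppm, mw):
    """Convert an air concentration from ppm (by volume) to mg/L."""
    return conc_ppm * mw / MOLAR_VOLUME_ML


def initial_chamber_amount(conc_ppm, vchc, n_rats, bw, mw):
    """Initial amount in the chamber (mg), mirroring the VCH and AI0 statements."""
    vch = vchc - n_rats * bw  # net air volume (L); 1 kg of animal is taken to displace 1 L
    return ppm_to_mg_per_l(conc_ppm, mw) * vch


# CONC, VCHC, and NRATS as in the listing; BW and MW are assumed values
ai0 = initial_chamber_amount(conc_ppm=1000.0, vchc=9.1, n_rats=3, bw=0.25, mw=104.15)
```

For these values the net chamber volume is 8.35 L and the initial amount is about 35.6 mg.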
After all the constants have been defined, calculations using them
can be performed. In contrast to the DERIVATIVE block (of which
more later), the calculations in the INITIAL block are performed in
CS = AS/VS
!----AR = Amount in rapidly perfused tissues (mg)
RAR = QR*(CA-CVR)
AR = INTEG(RAR,0.)
CVR = AR/(VR*PR)
CR = AR/VR
!----AF = Amount in fat tissue (mg)
RAF = QF*(CA-CVF)
AF = INTEG(RAF,0.)
CVF = AF/(VF*PF)
CF = AF/VF
!----AL = Amount in liver tissue (mg)
RAL = QL*(CA-CVL)-RAM + RAO
AL = INTEG(RAL,0.)
CVL = AL/(VL*PL)
CL = AL/VL
AUCL = INTEG(CL,0.)
!----AM = Amount metabolized (mg)
RAM = (VMAX*CVL)/(KM + CVL) + KF*CVL*VL
AM = INTEG(RAM,0.)
!----AO = Total mass input from stomach (mg)
RAO = KA*MR
AO = DOSE-MR
!----IV = Intravenous infusion rate (mg/h)
IVZONE = RSW(T.GE.TINF,0.,1.)
IV = IVR*IVZONE
!----CV = Mixed venous blood concentration (mg/L)
CV = (QF*CVF + QL*CVL + QS*CVS + QR*CVR + IV)/QC
!----TMASS = mass balance (mg)
TMASS = AF + AL + AS + AR + AM + AX + MR
!----DOSEX = Net amount absorbed (mg)
DOSEX = AI + AO + IVR*TINF-AX
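Each tissue block in the listing has the same flow-limited form: the rate of change of the amount in the tissue equals blood flow times the difference between arterial and venous concentrations. As a rough sketch of what INTEG does for one such compartment, the Python code below integrates the fat equations with a fixed-step fourth-order Runge-Kutta scheme. It assumes, purely for illustration and unlike the full model, that the arterial concentration CA is held constant, which allows a comparison with the closed-form solution. The function name and all parameter values are hypothetical, and the integration algorithm actually used by ACSL may differ.

```python
import math


def simulate_fat(ca, qf, vf, pf, tstop, dt=0.01):
    """Integrate RAF = QF*(CA - CVF), CVF = AF/(VF*PF), starting from AF(0) = 0."""

    def raf(af):
        cvf = af / (vf * pf)       # venous blood leaving fat (mg/L)
        return qf * (ca - cvf)     # rate of change of amount in fat (mg/hr)

    af = 0.0
    steps = int(round(tstop / dt))
    for _ in range(steps):
        # Classical fourth-order Runge-Kutta step
        k1 = raf(af)
        k2 = raf(af + 0.5 * dt * k1)
        k3 = raf(af + 0.5 * dt * k2)
        k4 = raf(af + dt * k3)
        af += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return af


# Hypothetical values: CA in mg/L, QF in L/hr, VF in L, PF unitless, time in hr
ca, qf, vf, pf, tstop = 1.0, 0.5, 0.02, 50.0, 24.0
af_numeric = simulate_fat(ca, qf, vf, pf, tstop)

# With constant CA the equation is linear and has a closed-form solution
af_exact = pf * vf * ca * (1.0 - math.exp(-qf * tstop / (vf * pf)))
print(af_numeric, af_exact)
```

At long times the amount approaches PF*VF*CA, that is, the tissue concentration approaches the partition coefficient times the arterial concentration, which is a convenient sanity check on any flow-limited compartment.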
Last, but definitely not least, you have to tell ACSL when to stop:
TERMT(T.GE.TSTOP) ! Condition for terminating simulation
END ! End of derivative block
END ! End of dynamic section
Another kind of code section, the TERMINAL block, can also be
used here to execute statements that should only be calculated at the
end of the run.
END ! End of program
4.4. Model Evaluation
The following section discusses various issues associated with the
evaluation of a PBPK model. Once an initial model has been
developed, it must be evaluated on the basis of its conformance
with experimental data. In some cases, the model may be exercised
to predict conditions under which experimental data should be
collected in order to verify or improve model performance.
Comparison of the resulting data with the model predictions may
suggest that revision of the model will be required. Similarly, a PBPK
model designed for one chemical or application may be adapted to
another chemical or application, requiring modification of the
model structure and parameters. It is imperative that revision or
modification of a model is conducted with the same level of rigor
applied during initial model development, and that structures are
not added to the model with no other justification than that they
improve the agreement of the model with a particular data set.
In addition to comparing model predictions to experimental
data, model evaluation includes assessing the plausibility of the
model input parameters, and the confidence which can be placed
in extrapolations performed by the model. This aspect of model
evaluation is particularly important in the case of applications in risk
assessment, where it is necessary to assess the uncertainty associated
with risk estimates calculated with the model.
4.5. Model Revision
An attempt to model the metabolism of allyl chloride (119) serves
as an excellent example of the process of model refinement and
validation. As mentioned earlier, in a gas uptake experiment several
animals are maintained in a small, enclosed chamber while the air in
the chamber is recirculated, with replenishment of oxygen and
scrubbing of carbon dioxide. A small amount of a volatile chemical
is then allowed to vaporize into the chamber, and the concentration
of the chemical in the chamber air is monitored over time. In this
design, any loss of the chemical from the chamber air reflects uptake
into the animals. After a short period of time during which the
chemical achieves equilibration with the animals’ tissues, any
further uptake represents the replacement of chemical removed from
the animals by metabolism. Analysis of gas uptake data with a PBPK
model has been used successfully to determine the metabolic
parameters for a number of chemicals (103).
In an example of a successful gas uptake analysis (108), the
closed chamber kinetics of methylene chloride were described using
a PBPK model that included two metabolic pathways: one saturable,
representing oxidation by cytochrome P450 enzymes, and
one linear, representing conjugation with glutathione (Fig. 22).
As can be seen in this figure, there is a marked concentration
dependence of the observed rate of loss of this chemical from the
chamber. The initial decrease in chamber concentration in all of the
experiments results from the uptake of chemical into the animal
18 Physiologically Based Pharmacokinetic/Toxicokinetic Modeling 491
Fig. 22. Gas uptake experiment. Concentration (ppm) of methylene chloride in a closed,
recirculated chamber containing three Fischer 344 rats. Initial chamber concentrations
were (top to bottom) 3,000, 1,000, 500, and 100 ppm. Solid lines show the predictions of
the model for a Vmax of 4.0 mg/h/kg, a Km of 0.3 mg/L, and a first-order rate constant of
2.0/h/kg, while symbols represent the measured chamber atmosphere concentrations.
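The two-pathway description of metabolism corresponds to the rate expression RAM = (VMAX*CVL)/(KM + CVL) + KF*CVL*VL in the code listing. The Python sketch below evaluates both terms over a range of liver venous concentrations to show one reason the apparent rate of loss depends on concentration: the saturable term levels off near VMAX while the first-order term keeps growing. The Vmax, Km, and first-order values are those quoted in the caption of Fig. 22, applied here, purely as an assumption for illustration, to a hypothetical 1 kg animal without any allometric scaling; the liver volume and the concentrations are likewise hypothetical.

```python
def metabolism_rates(cvl, vmax, km, kf, vl):
    """Return (saturable, first_order) metabolism rates in mg/hr."""
    saturable = vmax * cvl / (km + cvl)   # Michaelis-Menten term
    first_order = kf * cvl * vl           # linear term
    return saturable, first_order


VMAX, KM, KF = 4.0, 0.3, 2.0   # Fig. 22 caption values, assumed 1 kg animal
VL = 0.04                      # hypothetical liver volume (L)

for cvl in (0.03, 0.3, 3.0, 30.0):   # hypothetical liver venous conc. (mg/L)
    sat, lin = metabolism_rates(cvl, VMAX, KM, KF, VL)
    frac = sat / (sat + lin)
    print(f"CVL={cvl:6.2f} mg/L  saturable={sat:.3f} mg/hr  "
          f"first-order={lin:.3f} mg/hr  saturable fraction={frac:.2f}")
```

When CVL equals KM the saturable pathway runs at half of VMAX, which is a quick check on the parameter values passed in.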
Fig. 23. Model failure. Concentration (ppm) of allyl chloride in a closed, recirculated
chamber containing three Fischer 344 rats. Initial chamber concentrations were (top to
bottom) 5,000, 2,000, 1,000, and 500 ppm. Symbols represent the measured chamber
atmosphere concentrations. The curves represent the best result that could be obtained
from an attempt to fit all of the data with a single set of metabolic constants using the
same closed chamber model structure as in Fig. 22.
Table 5
Predicted glutathione depletion caused by inhalation
exposure to allyl chloride
Depletion (mM)
Fig. 24. Cofactor depletion. Symbols represent the same experimental data as in Fig. 23.
The curves show the predictions of the expanded model, which not only included
depletion of glutathione by reaction with allyl chloride, but also provided for regulation
of glutathione biosynthesis on the basis of the instantaneous glutathione concentration, as
described in the text.
References
8. Andersen ME, Clewell HJ, Gargas ML, Smith FA, Reitz RH (1987) Physiologically based pharmacokinetics and the risk assessment for methylene chloride. Toxicol Appl Pharmacol 87:185–205
9. Gerrity TR, Henry CJ (1990) Principles of route-to-route extrapolation for risk assessment. Elsevier, New York
10. Clewell HJ, Jarnot BM (1994) Incorporation of pharmacokinetics in non-carcinogenic risk assessment: Example with chloropentafluorobenzene. Risk Anal 14:265–276
11. Clewell HJ (1995) Incorporating biological information in quantitative risk assessment: an example with methylene chloride. Toxicology 102:83–94
12. Clewell HJ (1995) The application of physiologically based pharmacokinetic modeling in human health risk assessment of hazardous substances. Toxicol Lett 79:207–217
13. Clewell HJ, Gentry PR, Gearhart JM, Allen BC, Andersen ME (1995) Considering pharmacokinetic and mechanistic information in cancer risk assessments for environmental contaminants: examples with vinyl chloride and trichloroethylene. Chemosphere 31:2561–2578
14. Clewell HJ, Andersen ME (1996) Use of physiologically-based pharmacokinetic modeling to investigate individual versus population risk. Toxicology 111:315–329
15. Clewell HJ III, Gentry PR, Gearhart JM (1997) Investigation of the potential impact of benchmark dose and pharmacokinetic modeling in noncancer risk assessment. J Toxicol Environ Health 52:475–515
16. Himmelstein KJ, Lutz RJ (1979) A review of the application of physiologically based pharmacokinetic modeling. J Pharmacokinet Biopharm 7:127–145
17. Gerlowski LE, Jain RK (1983) Physiologically based pharmacokinetic modeling: principles and applications. J Pharm Sci 72:1103–1126
18. Fiserova-Bergerova V (1983) Modeling of inhalation exposure to vapors: uptake distribution and elimination, vol 1 and 2. CRC, Boca Raton
19. Bischoff KB (1987) Physiologically based pharmacokinetic modeling. National Research Council. In: Pharmacokinetics in Risk Assessment. Drinking water and health, vol 8. National Academy Press, Washington, DC, pp. 36–61
20. Leung HW (1991) Development and utilization of physiologically based pharmacokinetic models for toxicological applications. J Toxicol Environ Health 32:247–267
21. Clewell HJ, Andersen ME (1986) A multiple dose-route physiological pharmacokinetic model for volatile chemicals using ACSL/PC. In: Cellier FD (ed) Languages for continuous system simulation. Society for Computer Simulation, San Diego, pp 95–101
22. Ramsey JC, Andersen ME (1984) A physiological model for the inhalation pharmacokinetics of inhaled styrene monomer in rats and humans. Toxicol Appl Pharmacol 73:159–175
23. Adolph EF (1949) Quantitative relations in the physiological constitutions of mammals. Science 109:579–585
24. Dedrick RL (1973) Animal scale-up. J Pharmacokinet Biopharm 1:435–461
25. Dedrick RL, Bischoff KB (1980) Species similarities in pharmacokinetics. Fed Proc 39:54–59
26. McDougal JN, Jepson GW, Clewell HJ, MacNaughton MG, Andersen ME (1986) A physiological pharmacokinetic model for dermal absorption of vapors in the rat. Toxicol Appl Pharmacol 85:286–294
27. Paustenbach DJ, Clewell HJ, Gargas ML, Andersen ME (1988) A physiologically based pharmacokinetic model for inhaled carbon tetrachloride. Toxicol Appl Pharmacol 96:191–211
28. Vinegar A, Seckel CS, Pollard DL, Kinkead ER, Conolly RB, Andersen ME (1992) Polychlorotrifluoroethylene (PCTFE) oligomer pharmacokinetics in Fischer 344 rats: development of a physiologically based model. Fundam Appl Toxicol 18:504–514
29. Clewell HJ, Andersen ME (1989) Improving toxicology testing protocols using computer simulations. Toxicol Lett 49:139–158
30. Bischoff KB, Brown RG (1966) Drug distribution in mammals. Chem Eng Prog Symp 62(66):33–45
31. Astrand P, Rodahl K (1970) Textbook of work physiology. McGraw-Hill, New York
32. International Commission on Radiological Protection (ICRP) (1975) Report of the task group on reference man. ICRP Publication 23
33. Environmental Protection Agency (EPA) (1988) Reference physiological parameters in pharmacokinetic modeling. EPA/600/6-88/004. Office of Health and Environmental Assessment, Washington, DC
34. Davies B, Morris T (1993) Physiological parameters in laboratory animals and humans. Pharm Res 10:1093–1095
35. Brown RP, Delp MD, Lindstedt SL, Rhomberg LR, Beliles RP (1997) Physiological parameter values for physiologically based pharmacokinetic models. Toxicol Ind Health 13(4):407–484
36. Sato A, Nakajima T (1979) Partition coefficients of some aromatic hydrocarbons and ketones in water, blood and oil. Br J Ind Med 36:231–234
37. Sato A, Nakajima T (1979) A vial equilibration method to evaluate the drug metabolizing enzyme activity for volatile hydrocarbons. Toxicol Appl Pharmacol 47:41–46
38. Gargas ML, Burgess RJ, Voisard DE, Cason GH, Andersen ME (1989) Partition coefficients of low-molecular-weight volatile chemicals in various liquids and tissues. Toxicol Appl Pharmacol 98:87–99
39. Jepson GW, Hoover DK, Black RK, McCafferty JD, Mahle DA, Gearhart JM (1994) A partition coefficient determination method for nonvolatile chemicals in biological tissues. Fundam Appl Toxicol 22:519–524
40. Clewell HJ (1993) Coupling of computer modeling with in vitro methodologies to reduce animal usage in toxicity testing. Toxicol Lett 68:101–117
41. Bischoff KB, Dedrick RL, Zaharko DS, Longstreth JA (1971) Methotrexate pharmacokinetics. J Pharm Sci 60:1128–1133
42. Farris FF, Dedrick RL, King FG (1988) Cisplatin pharmacokinetics: application of a physiological model. Toxicol Lett 43:117–137
43. Edginton AN, Theil FP, Schmitt W, Willmann S (2008) Whole body physiologically-based pharmacokinetic models: their use in clinical drug development. Expert Opin Drug Metab Toxicol 4:1143–1152
44. Andersen ME, Clewell HJ III, Gargas ML, MacNaughton MG, Reitz RH, Nolan R, McKenna M (1991) Physiologically based pharmacokinetic modeling with dichloromethane, its metabolite carbon monoxide, and blood carboxyhemoglobin in rats and humans. Toxicol Appl Pharmacol 108:14–27
45. Andersen ME, Clewell HJ, Mahle DA, Gearhart JM (1994) Gas uptake studies of deuterium isotope effects on dichloromethane metabolism in female B6C3F1 mice in vivo. Toxicol Appl Pharmacol 128:158–165
46. Fisher J, Gargas M, Allen B, Andersen M (1991) Physiologically based pharmacokinetic modeling with trichloroethylene and its metabolite, trichloroacetic acid, in the rat and mouse. Toxicol Appl Pharmacol 109:183–195
47. Fisher JW, Allen BC (1993) Evaluating the risk of liver cancer in humans exposed to trichloroethylene using physiological models. Risk Anal 13:87–95
48. Allen BC, Fisher J (1993) Pharmacokinetic modeling of trichloroethylene and trichloroacetic acid in humans. Risk Anal 13:71–86
49. Corley RA, Mendrala AL, Smith FA, Staats DA, Gargas ML, Conolly RB, Andersen ME, Reitz RH (1990) Development of a physiologically based pharmacokinetic model for chloroform. Toxicol Appl Pharmacol 103:512–527
50. Reitz RH, Mendrala AL, Corley RA, Quast JF, Gargas ML, Andersen ME, Staats DA, Conolly RB (1990) Estimating the risk of liver cancer associated with human exposures to chloroform using physiologically based pharmacokinetic modeling. Toxicol Appl Pharmacol 105:443–459
51. Johanson G (1986) Physiologically based pharmacokinetic modeling of inhaled 2-butoxyethanol in man. Toxicol Lett 34:23–31
52. Bungay PM, Dedrick RL, Matthews HB (1981) Enteric transport of chlordecone (Kepone) in the rat. J Pharmacokinet Biopharm 9:309–341
53. Tuey DB, Matthews HB (1980) Distribution and excretion of 2,2′,4,4′,5,5′-hexabromobiphenyl in rats and man: pharmacokinetic model predictions. Toxicol Appl Pharmacol 53:420–431
54. Lutz RJ, Dedrick RL, Tuey D, Sipes IG, Anderson MW, Matthews HB (1984) Comparison of the pharmacokinetics of several polychlorinated biphenyls in mouse, rat, dog, and monkey by means of a physiological pharmacokinetic model. Drug Metab Dispos 12(5):527–535
55. King FG, Dedrick RL, Collins JM, Matthews HB, Birnbaum LS (1983) Physiological model for the pharmacokinetics of 2,3,7,8-tetrachlorodibenzofuran in several species. Toxicol Appl Pharmacol 67:390–400
56. Leung HW, Ku RH, Paustenbach DJ, Andersen ME (1988) A physiologically based pharmacokinetic model for 2,3,7,8-tetrachlorodibenzo-p-dioxin in C57BL/6J and DBA/2J mice. Toxicol Lett 42:15–28
57. Andersen ME, Mills JJ, Gargas ML, Kedderis L, Birnbaum LS, Norbert D, Greenlee WF (1993) Modeling receptor-mediated processes with dioxin: implications for pharmacokinetics and risk assessment. Risk Anal 13(1):25–36
58. O’Flaherty EJ (1991) Physiologically based models for bone seeking elements. I. Rat skeletal and bone growth. Toxicol Appl Pharmacol 111:299–312
59. O’Flaherty EJ (1991) Physiologically based models for bone seeking elements. II. Kinetics of lead disposition in rats. Toxicol Appl Pharmacol 111:313–331
60. O’Flaherty EJ (1991) Physiologically based models for bone seeking elements. III. Human skeletal and bone growth. Toxicol Appl Pharmacol 111:332–341
61. O’Flaherty EJ (1993) Physiologically based models for bone seeking elements. IV. Kinetics of lead disposition in humans. Toxicol Appl Pharmacol 118:16–29
62. O’Flaherty EJ (1995) Physiologically based models for bone seeking elements. V. Lead absorption and disposition in childhood. Toxicol Appl Pharmacol 131:297–308
63. Mann S, Droz PO, Vahter M (1996) A physiologically based pharmacokinetic model for arsenic exposure. I. Development in hamsters and rabbits. Toxicol Appl Pharmacol 137:8–22
64. Mann S, Droz PO, Vahter M (1996) A physiologically based pharmacokinetic model for arsenic exposure. II. Validation and application in humans. Toxicol Appl Pharmacol 140:471–486
65. Farris FF, Dedrick RL, Allen PV, Smith JC (1993) Physiological model for the pharmacokinetics of methyl mercury in the growing rat. Toxicol Appl Pharmacol 119:74–90
66. McMullin TS, Hanneman WH, Cranmer BK, Tessari JD, Andersen ME (2007) Oral absorption and oxidative metabolism of atrazine in rats evaluated by physiological modeling approaches. Toxicology 240:1–14
67. Lin Z, Fisher JW, Ross MK, Filipov NM (2011) A physiologically based pharmacokinetic model for atrazine and its main metabolites in the adult male C57BL/6 mouse. Toxicol Appl Pharmacol 251:16–31
68. Kirman CR, Hays SM, Kedderis GL, Gargas ML, Strother DE (2000) Improving cancer dose-response characterization by using physiologically based pharmacokinetic modeling: an analysis of pooled data for acrylonitrile-induced brain tumors to assess cancer potency in the rat. Risk Anal 20:135–151
69. Sweeney LM, Gargas ML, Strother DE, Kedderis GL (2003) Physiologically based pharmacokinetic model parameter estimation and sensitivity and variability analyses for acrylonitrile disposition in humans. Toxicol Sci 71:27–40
70. Takano R, Murayama N, Horiuchi K, Kitajima M, Kumamoto M, Shono F, Yamazaki H (2010) Blood concentrations of acrylonitrile in humans after oral administration extrapolated from in vivo rat pharmacokinetics, in vitro human metabolism, and physiologically based pharmacokinetic modeling. Regul Toxicol Pharmacol 58:252–258
71. Clewell RA, Merrill EA, Robinson PJ (2001) The use of physiologically based models to integrate diverse data sets and reduce uncertainty in the prediction of perchlorate and iodide kinetics across life stages and species. Toxicol Ind Health 17:210–222
72. Clewell RA, Merrill EA, Yu KO, Mahle DA, Sterner TR, Mattie DR, Robinson PJ, Fisher JW, Gearhart JM (2003) Predicting fetal perchlorate dose and inhibition of iodide kinetics during gestation: a physiologically-based pharmacokinetic analysis of perchlorate and iodide kinetics in the rat. Toxicol Sci 73:235–255
73. Clewell RA, Merrill EA, Yu KO, Mahle DA, Sterner TR, Fisher JW, Gearhart JM (2003) Predicting neonatal perchlorate dose and inhibition of iodide uptake in the rat during lactation using physiologically-based pharmacokinetic modeling. Toxicol Sci 74:416–436
74. Merrill EA, Clewell RA, Gearhart JM, Robinson PJ, Sterner TR, Yu KO, Mattie DR, Fisher JW (2003) PBPK predictions of perchlorate distribution and its effect on thyroid uptake of radioiodide in the male rat. Toxicol Sci 73:256–269
75. Merrill EA, Clewell RA, Robinson PJ, Jarabek AM, Gearhart JM, Sterner TR, Fisher JW (2005) PBPK model for radioactive iodide and perchlorate kinetics and perchlorate-induced inhibition of iodide uptake in humans. Toxicol Sci 83:25–43
76. McLanahan ED, Andersen ME, Campbell JL, Fisher JW (2009) Competitive inhibition of thyroidal uptake of dietary iodide by perchlorate does not describe perturbations in rat serum total T4 and TSH. Environ Health Perspect 117:731–738
77. Haddad S, Charest-Tardif G, Tardif R, Krishnan K (2000) Validation of a physiological modeling framework for simulating the toxicokinetics of chemicals in mixtures. Toxicol Appl Pharmacol 167:199–209
78. Haddad S, Beliveau M, Tardif R, Krishnan K (2001) A PBPK modeling-based approach to account for interactions in the health risk assessment of chemical mixtures. Toxicol Sci 63:125–131
79. Dennison JE, Andersen ME, Yang RS (2003) Characterization of the pharmacokinetics of gasoline using PBPK modeling with a complex mixtures chemical lumping approach. Inhal Toxicol 15:961–986
80. Dennison JE, Andersen ME, Clewell HJ, Yang RSH (2004) Development of a
Interspecies Extrapolation
Elaina M. Kenyon
Abstract
Interspecies extrapolation encompasses two related but distinct topic areas that are germane to quantitative
extrapolation and hence computational toxicology—dose scaling and parameter scaling. Dose scaling is the
process of converting a dose determined in an experimental animal to a toxicologically equivalent dose in
humans using simple allometric assumptions and equations. In a hierarchy of quantitative extrapolation
approaches, this option is used when minimal information is available for a chemical of interest. Parameter
scaling refers to cross-species extrapolation of specific biological processes describing rates associated
with pharmacokinetic (PK) or pharmacodynamic (PD) events on the basis of allometric relationships.
These parameters are used in biologically based models of various types that are designed for not only
cross-species extrapolation but also for exposure route (e.g., inhalation to oral) and exposure scenario
(duration) extrapolation. This area also encompasses in vivo scale-up of physiological rates determined in
various experimental systems. Results from in vitro metabolism studies are generally most useful for
interspecies extrapolation purposes when integrated into a physiologically based pharmacokinetic (PBPK)
modeling framework. This is because PBPK models allow consideration and quantitative evaluation of
other physiological factors, such as binding to plasma proteins and blood flow to the liver, which may be as
or more influential than metabolism in determining relevant dose metrics for risk assessment.
Key words: Scaling, Extrapolation, In vitro scale-up, Allometry, Cross-species, In vitro to in vivo
extrapolation (IVIVE)
1. Introduction
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_19, © Springer Science+Business Media, LLC 2012
2. Materials
A range of tools is available to assist in the analysis of the types
of data to which allometric scaling and in vitro scale-up procedures
may be applied. Because the calculations themselves tend to be
numerically simple, many commercially available spreadsheet and
graphing software packages (with curve-fitting capabilities) are
suitable for these types of analyses. These software packages
typically run on most desktop or laptop personal computers.
3. Methods
3.1. Dose Scaling
As used in this chapter, the term dose scaling is the process of
directly converting a dose determined in an experimental animal
model to a toxicologically equivalent dose in humans at a gross or
default level. The scientific basis for this procedure is the
generalized allometric equation
$$Y = a\,(\mathrm{BW})^{b}, \qquad (1)$$
where Y is the physiological variable of interest, BW is body
weight, log a is the y-intercept, and b is the slope of the line
obtained from a plot of log Y vs. log BW.
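Because Eq. (1) becomes a straight line after taking logarithms, log Y = log a + b log BW, the coefficients can be estimated by ordinary least squares on log-transformed data. The sketch below does this in plain Python. The function name fit_allometric and the data points (generated from an assumed a = 3.0 and b = 0.75) are hypothetical and serve only to show the mechanics; real data would scatter around the fitted line.

```python
import math


def fit_allometric(bw, y):
    """Least-squares fit of log10(Y) = log10(a) + b*log10(BW); returns (a, b)."""
    lx = [math.log10(v) for v in bw]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((x - mx) ** 2 for x in lx)
    sxy = sum((x - mx) * (yv - my) for x, yv in zip(lx, ly))
    b = sxy / sxx                  # slope of the log-log line
    a = 10.0 ** (my - b * mx)      # back-transform the intercept log10(a)
    return a, b


# Hypothetical body weights (kg) and a synthetic variable Y = 3.0 * BW**0.75
body_weights = [0.025, 0.25, 3.5, 12.0, 70.0]
values = [3.0 * w ** 0.75 for w in body_weights]
a, b = fit_allometric(body_weights, values)
print(a, b)
```

Fitting on the log scale weights relative rather than absolute deviations, which suits data spanning several orders of magnitude in body weight.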
This relationship was originally studied in regard to energy
utilization or basal metabolism by Kleiber whose analyses suggested
that basal metabolism scales to the 3/4 (0.75) power of body
weight across species (3–5). This was in contrast to the generally
accepted “surface law” at that time (16), i.e., the concept that basal
metabolism scales across species according to body surface area or
the 2/3 power of body weight (3, 4, 16). A variety of theories have
been advanced to explain this allometric relationship, including
“elastic similarity” of skeletal and muscular structures (17), “the
fractal nature of energy distributing vascular networks” (18, 19),
and others (20, 21). It should be noted that agreement on the
scaling exponent, i.e., 0.75, is not universal and some analyses
have been published which suggest 2/3 is the more appropriate
value (22). Dodds et al. (23) performed a reanalysis of data from
the published literature and concluded that available information
did not allow one to distinguish between an exponent of 2/3 vs.
3/4 as being more predictive.
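To give a sense of how much the choice of exponent matters in practice, the short calculation below compares the whole-body scaling factor between a 0.025 kg mouse and a 70 kg human under the two candidate exponents. The body weights are illustrative values assumed here, not values taken from the cited analyses, and the function name is hypothetical.

```python
def scaling_factor(bw_from, bw_to, exponent):
    """Ratio Y_to / Y_from implied by Y = a * BW**exponent (a cancels out)."""
    return (bw_to / bw_from) ** exponent


mouse_bw, human_bw = 0.025, 70.0   # kg, illustrative values
f_three_quarters = scaling_factor(mouse_bw, human_bw, 3.0 / 4.0)
f_two_thirds = scaling_factor(mouse_bw, human_bw, 2.0 / 3.0)
print(f_three_quarters, f_two_thirds, f_three_quarters / f_two_thirds)
```

The ratio of the two factors equals the body-weight ratio raised to the 1/12 power, so the disagreement grows only slowly with the span of body weights; for the mouse-to-human case it is roughly a factor of two.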
3.2. Parameter Scaling
As used in this chapter, the term parameter scaling is the process of
scaling a physiological variable from an experimental animal value
to a human value for use in a PBPK model. Examples of parameters
Table 1
Selected physiological characteristics and their body
weight (BW) scaling coefficients^{a,b}
3.3. In Vitro to In Vivo Scale-Up
Data from various in vitro systems are commonly used to estimate
metabolic rate parameters that may then be scaled to the level of the
whole tissue and organism. Systems used include precision-cut organ
slices, whole cell preparations (e.g., hepatocytes), subcellular tissue
fractions (i.e., microsomes and cytosol), and recombinantly expressed
enzymes. The performance, advantages, and limitations of these
systems have been compared and reviewed in a number of
publications (e.g., (12, 30–32)). Figure 1 provides a schematic illustration of
the scaling procedures used with in vitro systems derived from liver
tissue for hepatocytes and subcellular fractions (microsomes and
4. Examples
4.1. Dose Scaling
Table 2 below illustrates the results of BW^0.75 scaling of a
hypothetical dose of 10 mg/kg in each of the four species of
experimental animal to a “toxicologically equivalent” dose in a 70 kg human
using equations (2) or (3) and (5) from Subheading 3.1. The
significant assumptions and limitations inherent in this procedure
are discussed in Subheading 5.1.
Table 2
Use of the Dosimetric Adjustment Factor (DAF) in derivation
of a human equivalent dose (HED) from an oral animal
exposure

Animal    BWa     BWh    Animal dose    DAF       DAF       HED (mg/kg/day)
species   (kg)    (kg)   (mg/kg/day)    Eq. (2)   Eq. (3)   Eq. (5)
Mouse     0.025   70     10             0.137     0.137     1.37
Rat       0.25    70     10             0.244     0.244     2.44
Rabbit    3.5     70     10             0.473     0.473     4.73
Dog       12      70     10             0.643     0.643     6.43

Note that Eqs. (2) and (3), i.e., $\mathrm{DAF} = (\mathrm{BW}_a/\mathrm{BW}_h)^{0.25}$ and
$\mathrm{DAF} = (\mathrm{BW}_h/\mathrm{BW}_a)^{-0.25}$, yield the same result
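The entries in Table 2 can be reproduced with a few lines of Python. The sketch assumes that the DAF is the animal-to-human body weight ratio raised to the 0.25 power and that the HED is the animal dose multiplied by the DAF, which is the relationship implied by the tabulated values; the function names are hypothetical.

```python
def daf(bw_animal, bw_human, exponent=0.25):
    """Dosimetric adjustment factor, (BWa/BWh)**0.25 for BW^0.75 scaling."""
    return (bw_animal / bw_human) ** exponent


def hed(animal_dose, bw_animal, bw_human=70.0):
    """Human equivalent dose (mg/kg/day) = animal dose (mg/kg/day) * DAF."""
    return animal_dose * daf(bw_animal, bw_human)


# Body weights (kg) as listed in Table 2
species_bw = {"Mouse": 0.025, "Rat": 0.25, "Rabbit": 3.5, "Dog": 12.0}
for name, bw in species_bw.items():
    print(f"{name:7s} DAF={daf(bw, 70.0):.3f}  HED={hed(10.0, bw):.2f} mg/kg/day")
```

Writing the factor as the inverse ratio raised to the power -0.25 gives the same number, which is why the two DAF columns in the table are identical.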
Table 3
Calculation of human tissue-to-blood partition coefficients
(PC) for toluene^{a}
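4.3. In Vitro to In Vivo Scale-Up
Example: Scaling Vmax and KM for Chloroform (CHCl3) from In
Vitro Data Obtained Using Human Microsomes (34):
$$\mathrm{LR} = \text{in vitro } V_{max} \times \mathrm{MEGL} \times \mathrm{LW} = 5.24\ \frac{\text{pmol CHCl}_3}{\text{min}\cdot\text{pmol CYP2E1}} \times 2{,}562\ \frac{\text{pmol CYP2E1}}{\text{g liver}} \times 1{,}820\ \text{g liver} = 24{,}433{,}281.6\ \text{pmol CHCl}_3/\text{min}/\text{whole liver}.$$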
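The scale-up arithmetic above can be written as a small Python function so that the units are easy to follow step by step. The function name is hypothetical, and the final conversion to mg/hr (using 119.38 g/mol as the molecular weight of chloroform) is an added illustration of how such a value might be prepared for use in a PBPK model, not a step taken from the cited report.

```python
def scale_vmax_to_liver(vmax_in_vitro, enzyme_per_g_liver, liver_weight_g):
    """Scale an in vitro Vmax (pmol/min/pmol CYP2E1) to the whole liver (pmol/min)."""
    return vmax_in_vitro * enzyme_per_g_liver * liver_weight_g


# Values from the worked example: in vitro Vmax, CYP2E1 content, liver weight
vmax_liver = scale_vmax_to_liver(5.24, 2562.0, 1820.0)   # pmol CHCl3/min/liver

# Illustrative unit conversion (an assumption of this sketch)
MW_CHCL3 = 119.38                      # g/mol
PMOL_TO_MG = MW_CHCL3 * 1e-12 * 1e3    # mg of chloroform per pmol
vmax_mg_per_hr = vmax_liver * 60.0 * PMOL_TO_MG

print(f"{vmax_liver:,.1f} pmol/min/whole liver")
print(f"{vmax_mg_per_hr:.1f} mg/hr/whole liver")
```

Because KM is a concentration rather than a rate, it is not multiplied by the enzyme content or liver weight in this kind of calculation.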
Table 4
Examples of cross-species scaling of rate parameters from a variety
of in vivo pharmacokinetic data
Table 5
Data used for in vitro to in vivo extrapolation
of Vmax and KM for CHCl3
5. Notes
Disclaimer
References
1. Dedrick RL (1973) Animal scale-up. J Pharmacokinet Biopharm 1:435–461
2. U.S. EPA (U.S. Environmental Protection Agency) (1992) Draft report: a cross-species scaling factor for carcinogen risk assessment based on equivalence of mg/kg^(3/4)/day. Notice Fed Reg 57:24152–24173
3. Kleiber M (1932) Body size and metabolism. Hilgardia 6:315–353
4. Kleiber M (1947) Body size and metabolic rate. Physiol Rev 27:511–541
5. Kleiber M (1961) The fire of life: an introduction to animal energetics. Wiley, New York, NY
6. O’Flaherty EJ (1989) Interspecies conversion of kinetically equivalent doses. Risk Anal 9:587–598
7. Rhomberg LR, Lewandowski TA (2006) Methods for identifying a default cross-species scaling factor. Hum Ecol Risk Assess 12:1094–1127
8. Travis CC, White RK (1988) Interspecies scaling of toxicity data. Risk Anal 8:119–125
9. U.S. EPA (2005) Guidelines for carcinogen risk assessment. EPA/630/P-03/001F Risk Assessment Forum, Washington, DC
10. U.S. EPA (2011) Harmonization in interspecies extrapolation: use of body weight^(3/4) as the default method in derivation of the oral reference dose. EPA/100/R11/0001 Risk Assessment Forum, Washington, DC
11. Clewell HJ, Reddy MB, Lave T, Andersen ME (2008) Physiologically based pharmacokinetic modeling. In: Gad SC (ed) Preclinical development handbook: ADME and biopharmaceutical properties. Wiley, New York, NY, pp 1167–1227
12. Lipscomb JC, Poet TS (2008) In vitro measurements of metabolism for application in pharmacokinetic modeling. Pharmacol Ther 118:82–103
13. Matthews JC (1993) Fundamentals of receptor, enzyme and transport kinetics. CRC, Boca Raton, FL
14. Cornish-Bowden A (1995) Analysis of enzyme kinetic data. Oxford University Press, Oxford
15. Cornish-Bowden A (2004) Fundamentals of enzyme kinetics, 3rd edn. Portland Press, London
16. Rubner M (1883) Über den Einfluss der Körpergrösse auf Stoff- und Kraftwechsel. Zeit Biol 19:536–562
17. McMahon TA (1975) Using body size to understand the structural design of animals: quadrupedal locomotion. J Appl Physiol 39:619–627
18. West GB, Brown JH, Enquist BJ (1997) A general model for the origin of allometric scaling laws in biology. Science 276:122–126
19. West GB, Woodruff WH, Brown JH (2002) Allometric scaling of metabolic rate from molecules and mitochondria to cells and mammals. Proc Natl Acad Sci U S A 99(Suppl 1):2473–2478
20. Banavar JR, Maritan A, Rinaldo A (1999) Size and form in efficient transportation networks. Nature 399:130–131
21. Bejan A (2000) Shape and structure, from engineering to nature. Cambridge University Press, Cambridge
22. White CR, Seymour RS (2003) Mammalian basal metabolic rate is proportional to body mass^(2/3). Proc Natl Acad Sci U S A 100:4046–4049
23. Dodds PS, Rothman DH, Weitz JS (2001) Re-examination of the “3/4-law” of metabolism. J Theor Biol 209:9–27
24. IPCS (International Programme on Chemical Safety) (2005) Guidance document for the use of data in development of chemical-specific adjustment factors (CSAFs) for interspecies differences and human variability in dose/concentration-response assessment. World Health Organization, Geneva
25. U.S. EPA (1994) Methods for derivation of inhalation reference concentrations and application of inhalation dosimetry. EPA/600/8-90/066F. Environmental Criteria and Assessment Office, Washington, DC
26. Boxenbaum H (1982) Interspecies scaling, allometry, physiological time, and the ground plan of pharmacokinetics. J Pharmacokinet Biopharm 10:201–227
27. Reddy MB, Yang RSH, Clewell HJ, Andersen ME (2005) Physiologically based pharmacokinetic modeling: science and applications. Wiley Interscience, Hoboken, NJ
28. Adolph EF (1949) Quantitative relations in the physiological constitutions of mammals. Science 109:579–585
29. Mordenti J (1986) Man versus beast: pharmacokinetic scaling in mammals. J Pharm Sci 75:1028–1040
30. Andersson TB, Sjoberg H, Hoffman K-J, Boobis AR, Watts P, Edwards RJ, Lake BJ, Price RJ, Renwick AB, Gomez-Lechon MJ, Castell JV, Ingelman-Sundberg M, Hidestrand M, Goldfarb PS, Lewis DFV, Corcos L, Guillouzo A, Taavitsainen P, Pelkonen O (2001) An assessment of human liver-derived in vitro systems to predict the in vivo metabolism and clearance of almokalant. Drug Metab Dispos 29:712–720
31. Carlile DJ, Zomorodi K, Houston JB (1997) Scaling factors to relate drug metabolic clearance in hepatic microsomes, isolated hepatocytes and the intact liver: studies with induced livers involving diazepam. Drug Metab Dispos 25:903–911
32. Tang W, Wang RW, Lu AYH (2005) Utility of recombinant cytochrome P450 enzymes: a drug metabolism perspective. Curr Drug Metab 6:503–517
33. Brown RP, Delp MD, Lindstedt SL, Rhomberg LR, Beliles RP (1997) Physiological parameter values for physiologically based pharmacokinetic models. Toxicol Ind Health 13:407–484
34. U.S. EPA (Lipscomb JC, Kedderis GL) (2005) Use of physiologically based pharmacokinetic models to quantify the impact of human age and interindividual differences in physiology and biochemistry pertinent to risk: final report for cooperative agreement ORD/NCEA Cincinnati, OH EPA/600/R-06-014A
35. Lipscomb JC, Teuschler LK, Swartout JC, Popken D, Cox T, Kedderis GL (2003) The impact of cytochrome P450 2E1-dependent metabolic variance on a risk relevant pharmacokinetic outcome in humans. Risk Anal 23:1221–1238
36. Lipscomb JC, Kedderis GL (2002) Incorporating human interindividual biotransformation variance in health risk assessment. Sci Total Environ 288:12–21
37. Lipscomb JC (2004) Evaluating the relationship between variance in enzyme expression and toxicant concentration in health risk assessment. Hum Ecol Risk Assess 10:39–55
38. Thrall KD, Gies RA, Muniz J, Woodstock AD, Higgins G (2002) Route-of-entry and brain tissue partition coefficients for common superfund contaminants. J Toxicol Environ Health Part A 65:2075–2086
39. Gargas ML, Burgess RJ, Voisard DE, Cason GH, Andersen ME (1989) Partition coefficients of low-molecular-weight volatile chemicals in various liquids and tissues. Toxicol Appl Pharmacol 98:87–99
40. Lilly PD, Andersen ME, Ross TM, Pegram RA (1997) Physiologically based estimation of in vivo rates of bromodichloromethane metabolism. Toxicology 124:141–152
41. Kenyon EM, Kraichely RE, Hudson KT, Medinsky MA (1996) Differences in rates of benzene metabolism correlate with observed genotoxicity. Toxicol Appl Pharmacol 136:649–656
42. Gargas ML, Andersen ME, Clewell HJ (1986) A physiologically based simulation approach for determining metabolic constants from gas uptake data. Toxicol Appl Pharmacol 86:341–352
43. Lipscomb JC, Barton H, Tornero-Velez R (2004) The metabolic rate constants and specific activity of human and rat hepatic cytochrome P450 2E1 toward chloroform. J Toxicol Environ Health 67:537–553
47. Travis CC (1990) Tissue dosimetry for reactive metabolites. Risk Anal 10:317–321
48. Rhomberg LR, Wolff SK (1998) Empirical scaling of single oral lethal doses across mammalian species based on a large database. Risk Anal 18:741–753
49. Burzala-Kowalczyk L, Jongbloed G (2011) Allometric scaling: analysis of LD50 data. Risk Anal 31:523–532
50. Ginsberg G, Hattis D, Sonawane B, Russ A, Banati P, Kozlak M, Smolenski S, Goble R (2002) Evaluation of child/adult pharmacokinetic differences from a database derived from the therapeutic drug literature. Toxicol Sci 66:185–200
51. Ginsberg G, Hattis D, Miller R, Sonawane B (2004) Pediatric pharmacokinetic data: implications for environmental risk assessment for children. Pediatrics 113(Suppl):973–983
52. Hattis D (2004) Role of dosimetric scaling and species extrapolation in evaluating risks across life stages IV pharmacodynamic dosimetric considerations. Report to the U.S. Environmental Protection Agency under RFQ No DC-03-00009
53. Finlay BL, Darlington RB (1995) Linked regularities in the development and evolution of mammalian brains. Science 268:1578–1584
54. Renwick AG, Lazarus NR (1998) Human variability and noncancer risk assessment: an analysis of the default uncertainty factor. Regul Toxicol Pharmacol 27:3–20
55. Clancy B, Darlington RB, Finlay BL (2001) Translating developmental time across mammalian species. Neuroscience 105:7–17
56. Krishnan K, Andersen ME (1994) Physiologically based pharmacokinetic modeling in toxicology. In: Hayes AW (ed) Principles and methods of toxicology, 3rd edn. Raven Press, New York, NY, pp 149–188
57. Jepson GW, Hoover DK, Black RK, McCafferty JD, Mahle DA, Gearhart JM (1994) A partition coefficient determination method for nonvolatile chemicals in biological tissues. Fundam Appl Toxicol 22:519–524
44. Delic JI, Lilly PD, MacDonald AJ, Loizou GD 58. Gallo JM, Lam FC, Perrier DG (1987) Area
(2000) The utility of PBPK in the safety assess- method for the estimation of partition coeffi-
ment of chloroform and carbon tetrachloride. cients for physiological pharmacokinetic mod-
Reg Toxicol Pharmacol 32:144–155 els. J Pharmacokinet Biopharm 15:271–280
45. Corley RA, Mendrala AL, Smith FA et al 59. Teo SKO, Kedderis GL, Gargas ML (1994)
(1990) Development of a physiologically Determination of tissue partition coefficients
based pharmacokinetic model for chloroform. for volatile tissue-reactive chemicals: acryloni-
Toxicol Appl Pharmacol 103:512–527 trile and its metabolite 2-cyanoethylene oxide.
46. Beck BD, Clewell HJ III (2001) Uncertainty/ Toxicol Appl Pharmacol 128:92–96
safety factors in health risk assessment: oppor- 60. Khor SP, Mayersohn M (1991) Potential error in
tunities for improvement. Hum Ecol Risk the measurement of tissue to blood distribution
Assess 7:203–207
520 E.M. Kenyon
Abstract
Chemical risk assessment for human health requires a multidisciplinary approach through four steps: hazard
identification, hazard characterization, exposure assessment, and risk characterization. Hazard identification
and characterization aim to identify the metabolism and elimination of the chemical (toxicokinetics) and the
toxicological dose–response (toxicodynamics) and to derive a health-based guidance value for safe levels of
exposure. Exposure assessment estimates human exposure as the product of the amount of the chemical in
the matrix consumed and the consumption itself. Finally, risk characterization evaluates the risk of the
exposure to human health by comparing it with the health-based guidance value. Recently, many
research efforts in computational toxicology have been brought together to characterize population variability
and uncertainty in each step of risk assessment, moving towards more quantitative and transparent
risk assessment. This chapter focuses specifically on modeling population variability and effects for each step
of risk assessment in order to provide an overview of the statistical and computational tools available to
toxicologists and risk assessors. Three examples are given to illustrate the applicability of those tools:
derivation of pathway-related uncertainty factors based on population variability, exposure to dioxins, and
dose–response modeling of cadmium.
Key words: Population variability, Risk assessment, Dose–response modeling, Benchmark dose,
Toxicokinetics, Toxicodynamics, Physiologically based models, Cadmium, Dioxins, Pathway-related
uncertainty factors, Meta-analysis, Systematic review
1. Introduction
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_20, # Springer Science+Business Media, LLC 2012
522 J.L. Dorne et al.
are then given and three models are given as examples: the develop-
ment of pathway-related uncertainty factors, dose–response model-
ing of cadmium in humans, and probabilistic exposure assessment of
dioxins in humans. Finally, critical conclusions and limitations of this
growing discipline conclude the chapter.
2. Materials: Main Approaches and Software Packages
2.1. Hazard Identification and Characterization
2.1.1. Toxicokinetic Modeling
After a chemical compound penetrates into a living mammalian
organism (following intentional administration or unintentional
exposure), it is distributed to various tissues and organs by blood
flow (20). Following its distribution to tissues, the substance can
bind to various proteins and receptors, undergo metabolism, or
be eliminated unchanged. The concentration versus time profiles of
the xenobiotic in different tissues, or the amount of metabolites
formed, are often used as surrogate markers of internal dose or
biological activity (21). Such information can be predicted with the
help of toxicokinetic (TK)/pharmacokinetic (PK) modeling, for
which two general approaches have been applied:
• Noncompartmental analyses, consisting of a statistical regression
analysis of TK measurements versus time and a number of
covariates.
• Compartmental analyses, consisting of the modeling of TK
profiles accounting for the distribution and metabolism of the
toxic compounds throughout the body.
As for any modeling exercise, the choice of method depends
on the objectives and scope of the analysis, the data available, and
the resources and constraints attached to the analysis. Usually,
noncompartmental approaches are used for exploratory analyses, the
screening of potentially influential factors, and the determination of
sample size for further TK studies. Conversely, compartmental
analyses are more suitable for refined analyses and predictions.
The methods and related tools are further detailed and compared
below.
Compartmental TK Analysis
Compartmental analyses assume that the substance of interest
distributes in the body as if it were made of homogeneous well-stirred
compartments. Empirical compartmental models assign no particular
meaning to the compartments identified (i.e., the substance
behaves “as if” it was distributed in two or three compartments in
the body, without looking for interpretations of what these
compartments might be) (22). That type of analysis can only be performed
usefully in data-rich situations and does not lend itself to necessary
interpolations and extrapolations. It is used less and less, and survives
only in heavily regulated and slowly changing contexts. On the
20 Population Effects and Variability 527
[Figure 2 diagram: compartments for lungs, skin, brain, fat, heart, muscle, marrow, adrenals, thyroid, breast, gonads, kidneys, spleen, pancreas, stomach, stomach lumen, gut, gut lumen, liver, and others, connected by arterial and venous blood.]
Fig. 2. Schematic representation of a PBTK model for a woman. The various organs
or tissues are linked by blood flow. In this model, exposure can be through the skin, the
lung or per os. Elimination occurs through the kidney, the GI tract, and the lung.
The parameters involved are compartment volumes, blood flows, tissue affinity constants
(or partition coefficients), and specific absorption, diffusion and excretion rate constants.
The whole life of the person can be described, with time-varying parameters. The model
structure is not specific to a particular chemical (see http://www.gnu.org/software/mcsim/,
also for a pregnant woman model).
$$\frac{dx}{dt}(t) = -k\,x(t). \qquad (3)$$
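As a minimal illustration of Eq. 3 (a sketch that is not part of the original chapter), the Python snippet below integrates $dx/dt = -k\,x(t)$ with a simple explicit Euler scheme and compares the result with the analytical solution $x(t) = x_0 e^{-kt}$. The rate constant, initial amount, and number of steps are arbitrary values chosen only for illustration.

```python
import math

def analytical_burden(x0, k, t):
    """Closed-form solution of dx/dt = -k x(t) with x(0) = x0."""
    return x0 * math.exp(-k * t)

def euler_burden(x0, k, t_end, n_steps):
    """Explicit Euler integration of dx/dt = -k x(t) from 0 to t_end."""
    dt = t_end / n_steps
    x = x0
    for _ in range(n_steps):
        x += dt * (-k * x)
    return x

# Hypothetical values chosen only for illustration
k = 0.1                       # elimination rate constant (1/day)
x0 = 100.0                    # initial amount (arbitrary units)
half_life = math.log(2) / k   # time needed to halve the amount

exact = analytical_burden(x0, k, half_life)
approx = euler_burden(x0, k, half_life, 10_000)
print(f"half-life = {half_life:.2f} days")
print(f"exact = {exact:.3f}, Euler = {approx:.3f}")
```

After one half-life the amount has dropped to half of its initial value, which gives a quick sanity check on any numerical solver before it is applied to a full PBTK system.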
Any software package able to solve systems of differential
equations can be used to build a PBTK model and run simulations
of it. These include GNU Octave (http://www.gnu.org/software/octave),
Scilab (http://www.scilab.org/), GNU MCSim (free software),
Mathematica (http://www.wolfram.com/mathematica),
Matlab (http://www.mathworks.com/products/matlab), and acslX
(http://www.acslx.com). Software packages used for PBTK
modeling are listed in Table 1. Most of the packages specifically
developed for compartmental analysis were tailored for PBPK and
applications in clinical pharmacology, though they can equally be
used in the context of toxicological assessments. Only a small
number of them allow user-specific integration of random effects to
evaluate population variability in PK parameters. Such nonlinear
mixed-effect models require powerful algorithms for
their statistical inference, such as simulation-based algorithms. This
can be achieved using Bayesian inference and Markov chain Monte
Carlo (MCMC) as implemented, e.g., in GNU MCSim
(http://www.gnu.org/software/mcsim) or using the SAEM algorithm, a stochastic
version of the EM algorithm (35), for maximum likelihood estimation.
The Monolix application, developed as a free toolbox for Matlab, is
a powerful, flexible, and user-friendly platform to implement SAEM-
based estimation of PBPK and PBTK models with population
variability. It is increasingly used, in particular in drug development
and pharmacological research, as it combines flexibility, statistical
performance, and reliability with minimal coding required from
users. From this perspective, it shows a substantial advantage compared
to the NONMEM software, the gold standard of pharmacological
modeling used in the pharmaceutical world, which is mostly based on
gradient optimization and first-order approximation of the likelihood.
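To make the idea of population variability in PK parameters concrete, the following sketch propagates interindividual variability in an elimination rate constant through a one-compartment model by forward Monte Carlo simulation. It does not implement SAEM or MCMC estimation; the lognormal distribution, its parameters, and the function name `simulate_population` are assumptions made purely for illustration.

```python
import math
import random
import statistics

def simulate_population(n_subjects, k_median, gsd, dose, t, seed=1):
    """Draw one elimination constant per subject from an assumed lognormal
    distribution and return the amount remaining at time t."""
    rng = random.Random(seed)
    mu = math.log(k_median)      # mean of log(k)
    sigma = math.log(gsd)        # standard deviation of log(k)
    amounts = []
    for _ in range(n_subjects):
        k_i = rng.lognormvariate(mu, sigma)
        amounts.append(dose * math.exp(-k_i * t))
    return amounts

# Hypothetical population: median k = 0.1/day, geometric SD = 1.5
amounts = simulate_population(5000, k_median=0.1, gsd=1.5, dose=100.0, t=7.0)
q = statistics.quantiles(amounts, n=20)   # cut points at 5%, 10%, ..., 95%
print(f"median amount at day 7 = {statistics.median(amounts):.1f}")
print(f"5th to 95th percentile = {q[0]:.1f} to {q[-1]:.1f}")
```

In a population model fitted with the tools named above, the distribution of `k_i` would be estimated from data rather than fixed in advance as it is here.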
Even more specialized software using pathway-specific information
is available, and its developers have devoted considerable effort to building
databases of parameter values for various species and human
populations: notable examples are Simcyp Simulator
(http://www.simcyp.com), PK-Sim (http://www.pk-sim.com), GastroPlus
(http://www.simulations-plus.com), and Cyprotex’s CloePK
(https://www.cloegateway.com).
Table 1
List of the main software packages for compartmental PBPK/PBTK modeling,
in alphabetic order, with the indication that they include built-in features
for analyzing population variability or not
Software (Web site) | Description | Built-in population variability
acslX (http://www.acslxtreme.com/) | Built-in compartmental modeling including pharmacodynamics, toxicity studies, Monte Carlo simulation tools | No
Berkeley Madonna (www.berkeleymadonna.com) | Generic powerful ODE solver and Monte Carlo simulations | No
Boomer/Multi-Forte (www.boomer.org) | Estimate and simulate from compartmental models, using a range of possible algorithms including Bayesian inference | No
CloePK (https://www.cloegateway.com) | PBPK predictive tools for animals and humans including open-source database on metabolic pathways | No
ERDEM (http://www.epa.gov/heasd/products/erdem/erdem.html) | PBPK/PD solver and simulator for risk assessment used by EPA and exposure assessment using backwards simulations | No
GastroPlus™ (http://www.simulations-plus.com) | Popular PBPK solver and simulator, including trial simulations | No
GNU Octave (http://www.gnu.org/software/octave) | Generic ODE solver similar to Matlab | No
Matlab with MONOLIX (http://www.monolix.org/) | Powerful PBPK solver integrating population variability, with GUI for graphical analyses and predictions. It uses MCMC and SAEM algorithms | Yes
Mathematica with Biokmod (http://web.usal.es/~guillermo/biokmod/mathjournal.pdf) | ODE solver specific to compartmental biokinetic systems, includes design analysis | No
MCSim (http://www.gnu.org/software/mcsim) | Flexible PBPK solver with population variability; it gives examples of detailed generic PBPK models | Yes
NONMEM (http://www.iconplc.com) | Gold standard of population PK compartmental analysis in the pharmaceutical industry | Yes
Phoenix® NLME™ (formerly WinNonMix) (http://www.pharsight.com) | Population PK modeling tool with similar algorithms as in NONMEM but with better graphical features | Yes
PKBugs (www.mrc-bsu.cam.ac.uk/bugs/. . ./pkbugs.shtml) | Efficient and user-friendly Bayesian software built in WinBUGS, for analysis of population PK models | Yes
PK-Sim (http://www.systems-biology.com/products/pk-sim.html) | User-friendly tool for PBPK modeling and simulations in humans and animals; it includes a population database | Yes
PK solution (http://www.summitpk.com) | Easy-to-use Excel-based analysis tool for noncompartmental and simple compartmental models | No
R with PKfit (http://cran.r-project.org/web/packages/PKfit/index.html) | Free R routine for the analysis of compartmental models | No
SAAM II and Popkinetics (http://depts.washington.edu/saam2) | Compartmental modeling PK software with population analysis using Popkinetics | Yes
S-Adapt (http://bmsr.usc.edu/Software/ADAPT/ADAPT.html) | Compartmental PK/PD modeling platform including simple population models fitted with the EM algorithm | Yes
Simcyp (http://www.simcyp.com/) | Major and powerful PBPK software suite with built-in PK and genetic database for predictions in animals and humans | Yes
metabolic pathways and few data are to be analyzed. But they have
assumptions, first and foremost linearity.
As the methods involved are mainly simple calculations or
statistical regressions, most standard statistical software packages
can be used. Population variability is then assessed using linear
mixed-effect models, which are built into most statistical packages.
Nevertheless, more specific computer-based tools are available; the main
ones are listed in Table 2.
2.1.2. Toxicodynamics and Dose–Response Modeling
Toxicodynamic (TD) modeling quantifies the plausible causal link
between chemical exposure and health or environmental
consequences. TD assessments are typically based on the development
of dose–response models or dose–effect relationships. As for TK
models, a proper quantification of the interindividual variability of
dose–response is essential for quantitative risk assessment, and
more specifically for the derivation of health-based guidance
values.
TK/TD models provide a number of advantages over more
descriptive methods to analyze toxicity/safety data. In particular,
these models make better use of the available data as all effects
observations over time are used to estimate model parameters.
Table 2
List of the main software packages for noncompartmental
TK modeling, in alphabetic order, with the indication that
they include built-in features for analyzing population
variability or not
Software (Web site) | Main features | Population variability
ADAPT II (http://bmsr.usc.edu/Software/ADAPT/ADAPT.html) | PKPD software with Monte Carlo simulations capacities | No
BE/BA for R (http://pkpd.kmu.edu.tw/bear/) | Freeware using linear and ANOVA models, include bioequivalence analysis and study design analysis | No
PK solution (http://www.summitpk.com) | Easy-to-use Excel-based analysis tool for noncompartmental and simple compartmental models | No
WinNonLin (http://www.pharsight.com) | Industry standard for noncompartmental analyses. Includes design analysis | Yes
Fig. 3. Hybrid approach used for BMD evaluation, using continuous data dichotomized into
quantal data with a predefined cut-off.
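with:
$$p = (1 - \mathrm{BMR})\,\Phi\!\left(\frac{\log(\mathrm{cutoff}) - \mathrm{background}}{\sigma}\right) \quad \text{for extra risk},$$
$$p = \Phi\!\left(\frac{\log(\mathrm{cutoff}) - \mathrm{background}}{\sigma}\right) - \mathrm{BMR} \quad \text{for additional risk},$$
where $\sigma$ stands for the population standard deviation of effect, BMR for the
benchmark response, $\Phi$ for the cumulative distribution function of the
standardized normal distribution, and $\mu$ for the dose–effect function.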
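The hybrid calculation can be sketched in a few lines of Python. The sketch assumes that the BMD is the dose at which the probability of a response below the cut-off equals $p$, i.e. $\Phi\big((\log(\mathrm{cutoff}) - \mu(\mathrm{BMD}))/\sigma\big) = p$, and that the dose–effect function is linear on the log scale, $\mu(d) = \mathrm{background} + \mathrm{slope} \cdot d$. Both the linear form and all numerical inputs are hypothetical choices for illustration, not values from the chapter.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution

def target_probability(log_cutoff, background, sigma, bmr, risk="extra"):
    """Probability p of a response below the cut-off at the BMD."""
    p0 = PHI.cdf((log_cutoff - background) / sigma)
    if risk == "extra":
        return (1.0 - bmr) * p0
    if risk == "additional":
        return p0 - bmr
    raise ValueError("risk must be 'extra' or 'additional'")

def bmd_linear(log_cutoff, background, slope, sigma, bmr, risk="extra"):
    """BMD for an assumed linear dose-effect function
    mu(d) = background + slope * d on the log scale."""
    p = target_probability(log_cutoff, background, sigma, bmr, risk)
    mu_at_bmd = log_cutoff - sigma * PHI.inv_cdf(p)
    return (mu_at_bmd - background) / slope

# Hypothetical inputs, for illustration only
log_cutoff = math.log(300.0)
background, slope, sigma, bmr = math.log(100.0), 0.5, 0.6, 0.05
bmd = bmd_linear(log_cutoff, background, slope, sigma, bmr)
print(f"BMD (extra risk of {bmr:.0%}) = {bmd:.3f} dose units")
```

With a nonlinear dose–effect function the last step would require a numerical root finder instead of the closed-form inversion used here.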
By construction, a BMD includes the population variability,
which implies that no uncertainty factors will subsequently be
required to account for it. Furthermore, since BMDs are the results
of a parametric statistical estimation procedure, a confidence interval
can be derived. The lower bound of such an interval (BMDL) can
be chosen as a more conservative point of departure, hence
accounting for estimation uncertainty.
In general, BMD analysis requires that individual data be available
for a number of dose groups, which may sometimes be a hurdle to
the implementation of the BMD approach. However, dose–effect
curve estimation in the context of meta-analysis, where
only aggregated data from the literature are available, can be carried out
under additional assumptions (e.g., log-normality of population
distributions). This should also be accounted for in the statistical
modeling and in the use of adjustment factors for any BMD evaluation.
Examples and discussion of the analysis of aggregated epidemiological
data for dose–response analysis can be found in (44). There
is generally no unique dose–response model that can be chosen
to meet the purpose. Therefore, it is necessary to assess the sensitivity
of results with respect to the modeling assumptions. This can
be achieved, e.g., by comparing results from different possible
models or using model averaging techniques (45).
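One simple way to combine results from several candidate models is to weight each model's BMD estimate by its Akaike weight. The sketch below illustrates that idea only; the use of AIC-based weights is an assumption (the averaging scheme of ref. 45 may differ), and the model names, BMD estimates, and AIC values are hypothetical.

```python
import math

def akaike_weights(aic_values):
    """Convert AIC values into normalized model weights."""
    best = min(aic_values)
    raw = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(raw)
    return [r / total for r in raw]

def averaged_bmd(bmds, aic_values):
    """Weighted average of model-specific BMD estimates."""
    weights = akaike_weights(aic_values)
    return sum(w * b for w, b in zip(weights, bmds))

# Hypothetical fits of three candidate dose-response models: (BMD, AIC)
fits = {"log-logistic": (1.8, 210.4), "Weibull": (2.3, 211.0), "probit": (1.5, 214.9)}
bmds = [bmd for bmd, _ in fits.values()]
aics = [aic for _, aic in fits.values()]
for name, w in zip(fits, akaike_weights(aics)):
    print(f"{name}: weight = {w:.3f}")
print(f"model-averaged BMD = {averaged_bmd(bmds, aics):.3f}")
```

Reporting the individual estimates alongside the weighted average keeps the sensitivity to the modeling assumption visible, which is the purpose of the comparison.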
Although the BMD approach provides a powerful framework
to derive health-based guidance values for both carcinogenic and
noncarcinogenic compounds, it is far from being systematically
implemented by regulatory bodies. One of the main hurdles is the
substantial need for statistical modeling expertise, even though specific
software packages have been developed. Implementation and
interpretation of outputs still require an expert modeler to be involved
in the assessment. In a way, this just acknowledges that chemical risk
assessment remains by nature a multidisciplinary task.
As usual, the choice between NOAEL/LOAEL and BMD
approaches should be made in relation to the data available, the
resources available, and the objectives to be met. Table 3 summarizes
the main differences between NOAEL and BMD-based assessments,
while Table 4 lists the pros and cons for the two approaches.
Table 3
Summary table comparing requirements and characteristics of NOAEL
versus BMD evaluations
Table 4
Summary table comparing pros and cons in the application of NOAEL
versus BMD in toxicological risk assessment
Summary table | NOAEL | BMD
PROS | Easy/faster to understand and implement | BMD calculated for any risk level decided upon
 | Intuitive appeal | Uncertainty accounted for in a robust manner
 | Does not require strong assumptions on dose–response models | The whole dose–response curve accounted for
 | More robust when rare effects are observed | Allows better adjustment for covariates/clusters
CONS | NOAEL must be one of the doses tested | Data requirements not always achieved to allow modeling (especially with human epidemiological data)
 | Fewer subjects/doses give higher NOAEL (rewards poor designs) | Few software packages available
 | Does not provide a measure of risk | Except for ideal cases, requires statistical/modeling expertise
 | Uncertainty around NOAEL is dealt with using uncertainty factors, often arbitrary or poorly robust | Less robust when doses are widely spread
 | Less robust with complex or shallow dose–response |
2.2. Exposure Assessment
Exposure to chemicals may occur through consumption of food and
drinking water, inhalation of air, and ingestion of soil, so in theory
exposure modeling should cover all these routes. As an example,
the European Union System for the Evaluation of Substances
(EUSES) (47) can estimate the overall human exposure from all sources.
In practice, such exposure modeling depends strongly on
the quality and accuracy of the available data and on default assumptions.
For example, for populations living close to a source of
pollution, the exposure assessment is based on a typical worst-case
scenario, since all food items, water, and air are assumed to be derived from the
vicinity of a point source. When dietary exposure assessment is
performed in isolation, quantitative data (i.e., food consumption
and chemical occurrence in food) are generally available, allowing
more accurate estimates than those based on scenarios. Unfortunately,
similar data on the distribution of chemical occurrence in air
are rarely available, and therefore dietary exposure assessment should not
be compared directly with exposure scenarios, in order to avoid an underestimation of
the contribution of exposure from food relative to other media.
As mentioned in the introduction, deterministic dietary exposure
assessment is the most common approach currently
used in the area of chemical risk from food. The practices are detailed
in the recent guidelines of the World Health Organization (2) and are
outside the scope of this section. In a number of cases, the deterministic
assessment fails to conclude that exposure is below the threshold
of safety concern. It is therefore necessary to develop statistical
methodologies to refine the exposure assessment and to bridge the gaps
between the external exposure and the potential health effects.
A further refinement of the deterministic approach is therefore
to use basic Monte Carlo simulations. In such a stochastic model,
the amounts of food consumed, the concentrations of the hazard in
food, and the individual body weights of exposed populations are
assumed to arise from probability distributions. For this kind of
assessment, many software packages are commercially available, e.g.:
• @RISK software, Copyright © 2011 Palisade Corporation,
http://www.palisade.com/risk/
• Crystal Ball,
http://www.oracle.com/us/products/applications/crystalball/crystalball-066563.html
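The same kind of basic Monte Carlo simulation can also be sketched with a general-purpose language. In the Python example below, consumption, concentration, and body weight are drawn from probability distributions and combined into an individual exposure. The distribution families and every parameter value are assumptions made for illustration, not survey-derived inputs.

```python
import random
import statistics

def simulate_exposures(n, seed=42):
    """Draw n individual exposures (ug/kg bw/day) by sampling consumption,
    concentration and body weight from assumed distributions."""
    rng = random.Random(seed)
    exposures = []
    for _ in range(n):
        consumption = rng.lognormvariate(4.0, 0.5)       # g food/day
        concentration = rng.lognormvariate(-3.0, 0.8)    # ug chemical/g food
        body_weight = max(rng.gauss(70.0, 12.0), 30.0)   # kg, floored at 30
        exposures.append(consumption * concentration / body_weight)
    return exposures

exposures = simulate_exposures(20_000)
cuts = statistics.quantiles(exposures, n=100)   # 1st to 99th percentiles
print(f"mean   = {statistics.fmean(exposures):.4f}")
print(f"median = {cuts[49]:.4f}")
print(f"P95    = {cuts[94]:.4f}")
```

Upper percentiles such as the 95th are typically the quantities compared with a health-based guidance value, which is why the full distribution is simulated rather than a single point estimate.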
2.2.1. Dynamic Modeling of Dietary Exposure and Risk
The dynamic exposure process, mathematically described in ref. 48,
is determined by the accumulation phenomenon due to successive
dietary intakes and by the elimination process between intakes.
Verger et al. (49) propose a Kinetic Dietary Exposure Model
(KDEM), which describes the dynamic evolution of the dietary
exposure over time. It is assumed that at each eating occasion $T_n$,
$n \in \mathbb{N}$, the dynamic exposure process jumps with a size equal to the
chemical intake $U_n$ related to this eating occasion. Between intakes,
the exposure process decreases exponentially according to Eq. 3.
The value of the total body burden $X_{n+1}$ at intake time $T_{n+1}$ is
thus defined as:
$$X_{n+1} = X_n e^{-k \Delta T_{n+1}} + U_{n+1}, \quad n \in \mathbb{N}, \quad X_0 = x_0, \qquad (4)$$
where $\Delta T_{n+1} = T_{n+1} - T_n$, $n > 1$, is the time between two intakes
(interintake time), $U_n$ is the intake at time $T_n$, $\ln(2)/k$ is the
half-life of the compound, and $x_0$ is the initial body burden.
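The recursion in Eq. 4 translates directly into a short loop. The sketch below applies it to a given sequence of intake times and intakes; it assumes that the process starts at time 0 with body burden `x0`, and the intake values and the 44-day half-life are hypothetical.

```python
import math

def body_burden_path(intake_times, intakes, half_life, x0=0.0):
    """Apply Eq. 4: X[n+1] = X[n] * exp(-k * dT[n+1]) + U[n+1]."""
    k = math.log(2) / half_life       # elimination rate from the half-life
    burdens = [x0]
    previous_time = 0.0               # assumed starting time T_0 = 0
    for t, u in zip(intake_times, intakes):
        decay = math.exp(-k * (t - previous_time))
        burdens.append(burdens[-1] * decay + u)
        previous_time = t
    return burdens

# Hypothetical weekly intakes (ug/kg bw) and a 44-day half-life
times = [7.0, 14.0, 21.0, 28.0]
doses = [1.0, 0.5, 2.0, 1.0]
path = body_burden_path(times, doses, half_life=44.0, x0=0.0)
print([round(x, 3) for x in path])
```

Each element of `path` is the body burden immediately after an intake, so the list shows how successive intakes accumulate faster than they are eliminated when the half-life is long relative to the interintake time.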
The exposure process X of a single individual over a lifetime is
not available. Indeed, chemical occurrences in food ingested by an
individual and his or her consumption behavior over a long period of time
have never been surveyed. Thus, computations of the exposure
process are conducted from chemical concentration data and
dietary data reported over a short period of time (described in
Subheading 20.3). Starting from a population-based model,
exposure process simulations over a lifetime can be refined to approach an
individual-based model. To perform simulations, values for model
input variables are needed. The level of information that can be
included in the model depends on the available data (see Subheading 3.1,
Gathering Adequate Supporting Data for Risk Assessment). For some
variables, the information is available only at population level and
for others at individual level. For the latter, inter- and intraindividual
variability can be included in the model. Each of these input
variables is further characterized and discussed hereafter.
2.2.2. Intake Time
Depending on the available information from consumption survey
and on the consumption frequency of the contaminated products,
intake times can be the eating occasion, the day, the week, etc.
Another way is to consider as in ref. (49) that the consumption
2.2.3. Chemical Intake
Chemical dietary intakes U are estimated by combining data on
consumed food quantities with data on chemical occurrence in food.
The intake $U_n$ of an individual at time $T_n$ is therefore the sum, over
products p, of the chemical concentration multiplied by the consumed
quantity of product p, divided by body weight:
$$U_n = \frac{\sum_{p=1}^{P} Q_p\, C_{n,p}}{W_n},$$
where $Q_p$ is the concentration in total chemical of the consumed
product p at time $T_n$, $C_{n,p}$ is the consumed quantity of the product
p at time $T_n$, P is the total number of products containing the
chemical consumed at time $T_n$, and $W_n$ is the body weight of the
individual at time $T_n$.
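A worked example of this intake formula is given below as a small Python function. The three products, their concentrations and consumed quantities, and the body weight are hypothetical values used only to show the arithmetic.

```python
def chemical_intake(concentrations, quantities, body_weight):
    """Intake U_n: sum over products of concentration x consumed
    quantity, divided by body weight."""
    if len(concentrations) != len(quantities):
        raise ValueError("one concentration is needed per product")
    total = sum(q * c for q, c in zip(concentrations, quantities))
    return total / body_weight

# Hypothetical eating occasion with three contaminated products
Q = [0.12, 0.05, 0.30]    # concentration Q_p (ug chemical per g of product)
C = [150.0, 80.0, 20.0]   # consumed quantity C_{n,p} (g)
W = 60.0                  # body weight W_n (kg)
U = chemical_intake(Q, C, W)
print(f"U_n = {U:.3f} ug/kg bw")
```

Here the total chemical amount is 18 + 4 + 6 = 28 ug, so the intake is 28/60, about 0.467 ug/kg bw.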
Probabilistic methods to compute dietary intakes are described
and compared with deterministic approaches in ref. 50. The choice of
method depends on the nature of the available data. A larger
amount of data allows variability to be better accounted for
in the intake assessment. Indeed, when empirical or parametric
distributions are available, single or double Monte Carlo
simulations can be performed to combine chemical concentration
and food consumption.
2.2.4. Half-Life
The half-life is the time required for the total body burden of a
chemical to decrease by half in the absence of further intake. Many
studies on rats and humans have been conducted to determine
half-lives of various chemicals (33, 51, 52). The half-life value depends
on the toxicological properties of the chemical but also on the
individual’s personal characteristics. Variability between individual
half-lives can be integrated in the modeling by using half-life estimates
specific to the different population groups (children,
women, men). Some authors have characterized a linear relation
between half-life and personal characteristics, allowing individual
half-lives to be generated. Moreover, because an individual’s personal
characteristics change with time, intraindividual variability of half-life over the
lifetime can be integrated in the model. However, the level of half-life
variability that can be included in dynamic exposure modeling
depends on the available data. When possible, the impact on
model output of using a population half-life or an individual
half-life varying with time has to be tested and discussed.
2.2.5. Initial Body Burden
By the mathematical properties of the KDEM, at steady state
the dynamics of the exposure process no longer depend on its initial
value $X_0$. Considering that the aim of the exposure
2.2.7. Simulation of Individual Exposure from Population Exposure Models
Verger et al. (49) describe a population-based approach of the KDEM
applied to the occurrence of methylmercury in fish and seafood
consumed by women aged over 15 years. The intakes and interintake
times are considered independent and drawn from two exponential
distributions fitted to the French national consumption survey
INCA (53). A fixed half-life is used to define the elimination process
between intakes over the lifetime. A trajectory of the exposure process is
computed by randomly selecting an intake and an interintake time from
their respective distributions. The corresponding body burden is
calculated from Eq. 4 using the previously selected values and the fixed
value of the half-life. Each trajectory is therefore computed from
2.3. Risk Characterization: The Case of Dynamic Modeling of Exposure from Food
Risk characterization is usually the step of risk assessment that
combines outputs from hazard characterization and exposure assessment into one
final assessment.
In the case of the dynamic exposure model described above,
a more specific risk characterization can be undertaken. An
interesting risk measure is the probability that the
exposure process exceeds a threshold $d_{\mathrm{ref}}$, given by:
$$P_\mu\{X > d_{\mathrm{ref}}\} = \mu([d_{\mathrm{ref}}, \infty)) = \lim_{T \to \infty} \frac{1}{T} \int_0^T I\{X(t) \ge d_{\mathrm{ref}}\}\, dt.$$
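This long-run fraction of time above the threshold can be estimated from one long simulated trajectory. The sketch below draws intakes and interintake times from exponential distributions, applies first-order elimination between intakes, and accumulates the time spent at or above $d_{\mathrm{ref}}$ exactly within each interval. The function name, the burn-in length, and all numerical inputs are assumptions for illustration.

```python
import math
import random

def fraction_time_above(d_ref, half_life, mean_intake, mean_gap,
                        n_intakes=50_000, burn_in=5_000, seed=7):
    """Estimate the long-run fraction of time the body burden X(t)
    spends at or above d_ref along one simulated trajectory."""
    rng = random.Random(seed)
    k = math.log(2) / half_life
    x = 0.0
    time_above = total_time = 0.0
    for n in range(n_intakes):
        x += rng.expovariate(1.0 / mean_intake)   # jump at intake time
        gap = rng.expovariate(1.0 / mean_gap)     # time to next intake
        if n >= burn_in:                          # discard transient phase
            if x >= d_ref:
                # X decays as x*exp(-k*s) and reaches d_ref at s = ln(x/d_ref)/k
                time_above += min(gap, math.log(x / d_ref) / k)
            total_time += gap
        x *= math.exp(-k * gap)                   # elimination until next intake
    return time_above / total_time

# Hypothetical inputs: 44-day half-life, one intake every 3 days on average
for d_ref in (5.0, 20.0, 40.0):
    p = fraction_time_above(d_ref, 44.0, mean_intake=1.0, mean_gap=3.0)
    print(f"d_ref = {d_ref:5.1f}: estimated P(X >= d_ref) = {p:.3f}")
```

Because the body burden only decreases between intakes, the time above the threshold within an interval can be computed in closed form, which avoids discretizing time.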
times with the kinetic model (3). Under that condition, the
dynamic evolution of the process is
$$x_{n+1} = x_n e^{-k \Delta T_{n+1}} + TI.$$
At steady state, the reference process stabilizes at a safe
level, obtained from the TI and the half-life:
$$\lim_{n \to \infty} x_n = \frac{TI}{1 - e^{-k}}.$$
Note that when $k$ is close to 0, the function $1 - e^{-k}$ can be
approximated by $k$ using its Taylor series, and thus the previous
equation reduces to Eq. (2).
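The steady-state limit and its small-$k$ approximation can be checked numerically. The sketch below assumes unit interintake times ($\Delta T_{n+1} = 1$), which is what the closed form $TI/(1 - e^{-k})$ implies, and uses a hypothetical TI and half-life.

```python
import math

def steady_state_burden(ti, k):
    """Closed-form limit TI / (1 - exp(-k)) for unit interintake times."""
    return ti / (1.0 - math.exp(-k))

def iterate_burden(ti, k, n_steps, x0=0.0):
    """Iterate x[n+1] = x[n] * exp(-k) + TI with unit interintake time."""
    x = x0
    for _ in range(n_steps):
        x = x * math.exp(-k) + ti
    return x

ti = 1.0                      # hypothetical tolerable intake per time unit
k = math.log(2) / 2000.0      # hypothetical half-life of 2000 time units
limit = steady_state_burden(ti, k)
print(f"closed form       : {limit:.2f}")
print(f"after 40000 steps : {iterate_burden(ti, k, 40_000):.2f}")
print(f"small-k approx    : {ti / k:.2f}")
```

For a long half-life the three values nearly coincide, which illustrates why replacing $1 - e^{-k}$ by $k$ is acceptable when $k$ is close to 0.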
Another way to define the threshold $d_{\mathrm{ref}}$ is as the body burden of
chemical (endpoint) corresponding to possible adverse health
effects. Under that definition, the probability that the exposure
process is above $d_{\mathrm{ref}}$ corresponds to an incidence of disease.
Endpoints are derived from dose–response relationships provided
by laboratory animal experiments. They can be the lowest chemical
dose that induces an observed adverse effect (LOAEL) or the
dose associated with a given level of this effect (benchmark dose, BMD).
For example, a BMD related to reproductive toxicity can
be the dose associated with a given percentage decrease in sperm production. To extrapolate
the human dose–response relationship from the animal dose–response,
interspecies (variability between man and rat) and intraspecies
(variability between humans) factors have to be used. Each factor
can be defined by a single value or by a probability distribution.
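The difference between single-valued and probabilistic factors can be illustrated with a short sketch. The default values of 10 for each factor follow common regulatory convention rather than anything stated in this section, and the lognormal distributions and their parameters are purely hypothetical.

```python
import random
import statistics

def human_threshold_point(animal_dose, uf_inter=10.0, uf_intra=10.0):
    """Divide an animal point of departure by two single-valued factors."""
    return animal_dose / (uf_inter * uf_intra)

def human_threshold_distribution(animal_dose, n=10_000, seed=3):
    """Same calculation with each factor drawn from an assumed lognormal."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        uf_inter = rng.lognormvariate(1.0, 0.6)   # hypothetical parameters
        uf_intra = rng.lognormvariate(1.0, 0.5)   # hypothetical parameters
        draws.append(animal_dose / (uf_inter * uf_intra))
    return draws

bmd_animal = 50.0   # hypothetical animal BMD (mg/kg bw/day)
point = human_threshold_point(bmd_animal)
draws = human_threshold_distribution(bmd_animal)
p5 = statistics.quantiles(draws, n=20)[0]   # 5th percentile
print(f"single-value factors          : {point:.2f}")
print(f"probabilistic, 5th percentile : {p5:.2f}")
```

A lower percentile of the simulated distribution plays the role of the protective threshold in the probabilistic case, while the single-value calculation gives one number with no statement of its coverage.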
3. Methods: Practical Implementation and Best Practices for Quantitative Modeling of Population Variability
3.1. Gathering Adequate Supporting Data for Risk Assessment
Various methodologies have been developed to gather available
data for the quantification of population variability with respect to
hazard identification and characterization [toxicokinetics,
toxicodynamics (dose–response)], occurrence and consumption
(exposure assessment) for a particular chemical.
Design and Analysis of Randomized Studies
Most human and animal randomized studies can be used for
toxicological analyses. By nature, randomized studies are designed
so that the analysis of primary outcomes is simply restricted to
group comparisons. Such randomized studies are typically designed
to assess a given effect in relation to chemical exposure. In the
context of risk assessment, these designs are therefore more suitable
for hazard identification than for hazard characterization. The
simplest design would then be a two-arm study comparing exposed
rats versus nonexposed rats. This assumes that the exposure level is
predefined. Responses to exposure are then compared between the
two groups using typical statistical tests such as Fisher tests for
quantal responses. In cases where a range of doses needs to be
tested, multiple group comparisons can be analyzed with
ANOVA-type analyses.
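For a two-arm study with a quantal response, the Fisher exact test can be computed directly from the hypergeometric distribution. The pure-Python sketch below uses the common two-sided convention of summing the probabilities of all tables that are no more likely than the observed one; the function name and the example counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (exposed, control); columns are outcomes
    (responders, non-responders)."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x responders in the first row
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Hypothetical two-arm study: 7/10 exposed rats respond vs 1/10 controls
p_value = fisher_exact_two_sided(7, 3, 1, 9)
print(f"two-sided p = {p_value:.4f}")
```

In practice any standard statistical package provides this test; the explicit version is shown only to make the calculation transparent.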
By definition, the randomization structure is deemed to
balance all factors that could affect primary outcomes, so that no
adjustments for covariates are needed in the final group comparison
analysis, which can be performed by any standard statistical
package. As a consequence, the main statistical challenge lies in
the randomization structure which should account for the varia-
bility structure and clustering of the data to be collected.
Design and Analysis of Observational Studies
The variability structure of data is more critical in the analysis of
observational human data, as influential factors have generally not
been balanced by the study design. The list of potential covariates
affecting human data can be vast (e.g., age, gender, body weight,
etc.). Confounding factors often interfere with the outcome variable
548 J.L. Dorne et al.
Gathering Evidence from Multiple Published Studies
The systematic review (SR) methodology combined with a meta-analysis
can be applied in the context of chemical risk assessment when a
number of studies are available for the chemical of interest. A
thorough account of the SR methodology is beyond the scope of this
chapter; its potential applicability to risk assessment in food and
feed safety has been explored by EFSA in a guidance document.
However, a few basic concepts are worth mentioning. SR has been
developed by the Cochrane Collaboration and applied in human
medicine, epidemiology, and ecology for a number of evidence-based
syntheses (cost-effectiveness of treatments, reporting of side
effects, relative risk of disease, meta-analysis of biodiversity or
abundance data in ecology). SR has been defined as
“an overview of existing evidence pertinent to a clearly formulated
question, which uses prespecified and standardized methods to
identify and critically appraise relevant research, and to collect,
report, and analyze data from the studies that are included in the
review” (54).
When applied to risk assessment, the starting point of an SR is
to identify the question type and how to frame the question. In the
Cochrane Collaboration methodology, four key elements frame the
question: population, exposure, comparator, and outcome. Such a
question can be close-framed or open-framed. SR is typically
appropriate for a close-framed question, since a primary research
study design can be envisaged to answer it.
Dose-dependent toxicokinetics (i.e., half-life, clearance) or
dose–response in humans or in a test species (rat, mouse, dog,
rabbit, etc.) can be taken as a generic example. The population can
be the human population or, in the absence of relevant human data,
the test species (rat, mouse, etc.); the exposure is the chemical;
the comparator corresponds to different dose levels; and the outcome
can be either a toxicokinetic parameter (half-life, clearance, Cmax,
etc.) or toxicity in a specific target organ (liver, lung, kidney,
heart, bladder, etc.). Ideally, once the primary studies in humans
or test species referring to the dose-dependent toxicokinetics or
dose–response for a specific chemical have been collected, a
meta-analysis can be performed so that modeling of the population
variability is possible (54).
3.1.2. Gathering Evidence for Exposure Assessment
To perform the probabilistic assessment of human dietary exposure,
the basic data required are the distribution of food consumed and
the distribution of hazard occurrence in food. Most of the available
data are collected at national level and are assumed to be
representative of the country. For an international perspective, the
various data sets should be combined. This chapter emphasizes this
step and aims to describe the data preparation.
Food Consumption Data
Food consumption data are collected at national level based on
various survey methodologies such as food records or recalls (62).
Moreover, the year of conduct, the population groups, and the age
categories differ greatly between countries; therefore, it is not
possible to use these data directly for an international
probabilistic assessment (63). The approach currently used consists
in comparing national distributions and using the worst case, i.e.,
the highest consumption for consumers only observed in one country,
for the comparison with the health-based guidance value. This means
in practice that the percentage of risk, or the percentage of
individuals exposed above the health-based guidance value, is
extrapolated from the national to the international population level
and is likely to be considerably higher than in reality, depending
on the ratio between the considered populations. For example, if the
risk is estimated for the 95th percentile of exposure in the
Netherlands (16.6 million inhabitants), it represents about 828,000
people.
Occurrence Data
The main difference between occurrence and consumption data is
that the food market is generally assumed to be global and,
therefore, the country in which food is sampled is not considered to
be the only or even the main source of variability.
Because of the inherent variability between samples regarding
the occurrence of hazards in food, the uncertainty in the results is
likely to increase when the number of samples decreases. Therefore,
in such cases, exposure assessment should be questioned and, when
necessary, a deterministic estimate of the range of exposure should
be performed (66). The current chapter assumes a dataset with a
sufficient number of samples (e.g., >50 samples) for each food
category to be considered, allowing a probabilistic approach with a
reasonable level of confidence.
After combining all analytical results using a common food
classification, the whole dataset should be described (e.g., with
histograms, density plots, or cumulative distributions) and analyzed
with a particular focus on its potential sources of heterogeneity
(country of origin, year of sampling, laboratory characteristics,
analytical techniques, etc.).
A frequent issue is how to handle concentration data reported as
being below the limit of reporting (analytical limit of detection or
limit of quantification). These data are often known as nondetects,
and the resulting occurrence distribution is left-censored. As a
first step, the impact of left-censored data can be assessed by
imputing nondetects with values equal to zero (lower-bound scenario)
and to the limit of reporting (upper-bound scenario). The effect of
the substitution method can be evaluated on the mean and/or the high
percentiles. If the effect is negligible, the exposure assessment
should be based on substitution of nondetects by the limit of
reporting (upper-bound approach). This approach has the disadvantage
of hiding the variability between samples and overestimating the
exposure, but it can be used in that particular case because of its
low impact on the overall result.
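The lower-bound and upper-bound substitution scenarios can be sketched as follows. This is a minimal illustration assuming a single limit of reporting (LOR) for all samples and nondetects encoded as None; the concentrations, the LOR, and the function names are hypothetical.

```python
from statistics import mean, quantiles

def substitute_nondetects(values, lor, scenario):
    """Replace nondetects (None) by 0 (lower bound) or by the LOR (upper bound)."""
    fill = {"lower": 0.0, "upper": lor}[scenario]
    return [fill if v is None else v for v in values]

def summarize(values):
    # 95th percentile: last of the 19 cut points that split the data into 20 groups
    return mean(values), quantiles(values, n=20, method="inclusive")[-1]

# Hypothetical occurrence data (e.g., in µg/kg); None marks a result below the LOR
lor = 0.5
samples = [None, 0.8, 1.2, None, 2.5, 0.9, None, 4.1, 1.7, 0.6]

lb_mean, lb_p95 = summarize(substitute_nondetects(samples, lor, "lower"))
ub_mean, ub_p95 = summarize(substitute_nondetects(samples, lor, "upper"))
print(f"mean: lower bound = {lb_mean:.3f}, upper bound = {ub_mean:.3f}")
print(f"P95:  lower bound = {lb_p95:.3f}, upper bound = {ub_p95:.3f}")
```

Comparing the two scenarios on the mean and on a high percentile gives a quick indication of whether censoring matters for the assessment at hand. With sample-specific limits of reporting, the upper-bound fill value would have to be taken per sample.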
In other cases, depending on the percentage of censored data,
parametric or nonparametric modeling could be used. When the
percentage of censoring is high (e.g., >50%), it has been observed
in the literature (66, 67) that a parametric approach performs
better than nonparametric ones. A set of candidate parametric
models should be defined by inspecting the density plots. In recent
guidelines, EFSA (66) proposes to select the best parametric model
3.2. Specific Good Modeling Practices
3.2.1. Population Model Building
The purpose of this section is not to detail general “best modeling
practices” that would apply to any statistical modeling. Instead, we
propose to emphasize the good practices that are specific to
population models applied to chemical risk assessment. Note that a
thorough dose–response model-building guidance document prepared
by the WHO International Programme on Chemical Safety is available
in ref. 70 for the purpose of risk assessment.
Generally speaking, the key element in structuring a population
model is the definition of its stochastic component and how to
balance it with the deterministic one. This balance necessarily
depends on the objective of the modeling exercise. When individual
predictions are of primary interest (e.g., when using the model
to define a maximum tolerable dose of a compound for a specific
individual), the deterministic structure should be developed to the
largest extent allowed by the data available. Indeed, such a
deterministic model can better account for individual specificities
and covariates. Conversely, when population predictions are of
primary interest (e.g., when using the model to assess the response
of the 95th population percentile), the variability component
becomes essential and the stochastic structure should be developed
to best capture such population variability.
The choice between a population one-compartment model (with
population variability of TK parameters) and a larger PBTK model
(without population variability), with more compartments to refine
the description of metabolism, is a typical illustration of such a
balance between deterministic and stochastic structures, as
exemplified in the cadmium example below.
The way to construct the population or hierarchical structure of
a stochastic model relies on identifying the main patterns that are
found in the toxicological data to be used and that are important
for the risk assessment to be made. Those patterns may or may not be
captured, depending on the design of the data collection. A common
way to grasp and describe this variability structure is to consider
the hierarchy of the different levels or scales at which data are
20 Population Effects and Variability 555
3.2.2. Model Validation
Validation of Population TK/TD Models
Population models are typically evaluated with respect to how the
model simulations can reflect past and/or future observations, or
more specifically a function of observations of highest interest
for the assessment such as the 95th population percentile of a
concentration.
The prediction error is often evaluated on the observations
compared either to individual or to population predictions. Popu-
lation predictions are obtained after setting all random effects to
zero, i.e., the first-order approximation PRED in NONMEM.
Individual predictions are obtained after setting random effects to
their subject-specific values as estimated by the model. These are
typically evaluated using post hoc Bayesian estimates as with the
IPRED command of NONMEM. Bayesian individual predictions
are straightforward outputs from any Bayesian software like Win-
BUGS or MCSim.
Another measure of the predictive performance can be derived
from evaluating the likelihood of the data given the model esti-
mated with fixed population parameters.
Model validation can be done using an external dataset or
alternatively using a cross-validation approach. In the latter case,
the dataset is split into two subsets: one for the fitting (the larger)
and one for the validation (the smaller). The split should be done
randomly, and could be repeated.
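The repeated random split described above can be sketched as follows. The 80/20 proportion, the number of repeats, and the subject identifiers are hypothetical choices made for illustration only; the model fitting and scoring steps are left as comments because they depend on the software used.

```python
import random

def random_splits(records, fit_fraction=0.8, repeats=5, seed=0):
    """Yield (fit_set, validation_set) pairs from repeated random splits."""
    rng = random.Random(seed)
    n_fit = int(round(fit_fraction * len(records)))
    for _ in range(repeats):
        shuffled = records[:]      # copy so that the input order is left untouched
        rng.shuffle(shuffled)
        yield shuffled[:n_fit], shuffled[n_fit:]

# Hypothetical subject identifiers standing in for individual TK/TD records
subjects = [f"subject_{i:02d}" for i in range(20)]
for fit_set, validation_set in random_splits(subjects):
    # Fit the population model on fit_set, then score its predictions
    # against the observations in validation_set
    print(len(fit_set), len(validation_set))
```

In this sketch the split is made at the level of subjects rather than individual observations, so that all measurements from one subject fall in the same subset; this is a design choice that seems natural for hierarchical data, not a requirement stated in the text.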
Critical validation steps of population models include graphical
checks of the residual error variances and of the random effect
distributions. Visual checks can be done using boxplots; residuals
and random effects should be centered around zero, without
systematic bias or patterns.
In the context of chemical risk assessment, sensitive assump-
tions in population modeling are typically:
– The choice of random effects’ distributions
– The choice of dose–response models
Validation of Exposure Models
The KDEM model predicts, for each simulated trajectory, the mean
body burden of the chemical at steady state. The predictions from
dietary exposure can be compared with internal exposure (i.e.,
biomarkers) to validate the model. Internal exposure to chemicals
is usually measured in urine, hair, blood, or breast milk. The body
burden predicted by KDEM has to be converted to the same unit as
the measurements using conversion factors. For example, the
predicted body burden has to be converted to a concentration in
body fat to be compared with measurements in breast milk.
Conversion factors such as the percentage of fat depend on the
personal characteristics of the individuals, and their variability
could also be included.
Often, internal measurements are not available for the population
for which the dynamic modeling of the exposure process has been
computed, especially when the population of interest is the whole
population of a country. In that case, validation of the exposure
process is done at population level by comparing the distributions
of predicted body burden and of internal measurements. When studies
coupling biological measurements with a consumption frequency
questionnaire have been carried out, the interest is to link the
body burden predicted from the questionnaire with the associated
internal measurement for each individual. Because of the high
uncertainty on past contamination and consumption, the point
estimate obtained for an individual with one trajectory can be far
from the internal measurement value. In such a case, several
trajectories can be simulated for that individual under different
scenarios regarding contamination and consumption, using
probabilistic distributions. A confidence interval for the
predicted exposure can then be constructed and compared with the
internal exposure value. Sometimes, measurements have been
performed on a specific population that has certainly not reached
steady state, for example pregnant women. Using a well-defined
initial body burden X0 and a sensitivity analysis on this value,
the body burden can nevertheless be estimated from the external
exposure and compared with internal measurements.
3.2.3. Numerical Considerations
Convergence of Fitting Algorithms of Population Models
Aside from specific model validation steps, the fitting of nonlinear
mixed-effect models involves computational methods such as
stochastic or iterative algorithms. As a consequence, outputs from
parameter estimation rely on valid convergence of such algorithms.
In maximum likelihood estimation, fitting algorithms implemented
in, e.g., NONMEM are based on first-order approximations of the
likelihood. Those approximations may not be valid in case of highly
skewed likelihoods and nonnormal random effects, or they may be
highly sensitive to initial values. As a consequence, a good
practice is to test a range of different approximation methods and
initial values. To gain flexibility and reliability, stochastic
approximation EM (SAEM) algorithms have been developed and used in
PBTK modeling with the MONOLIX software. Although SAEM is more
powerful than gradient algorithms for nonlinear population models,
only a visual convergence check is available in MONOLIX. Finally,
for Bayesian inference, convergence of the MCMC chains also needs to
be checked, as for any Bayesian modeling exercise. WinBUGS provides
ready-to-use tools for convergence checks such as the Gelman–Rubin
statistic.
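For readers working outside WinBUGS, the Gelman–Rubin statistic can be computed directly from the sampled chains. The sketch below implements the classic form of the potential scale reduction factor for one scalar parameter and equal-length chains; it is not necessarily the exact variant implemented in WinBUGS, and the simulated chains are purely illustrative.

```python
import random
from statistics import mean, variance

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for equal-length chains."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean(variance(c) for c in chains)      # W: mean within-chain variance
    between = n * variance(chain_means)             # B: between-chain variance
    pooled = (n - 1) / n * within + between / n     # pooled variance estimate
    return (pooled / within) ** 0.5

rng = random.Random(1)
# Three chains sampling the same distribution (well mixed)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
# Three chains stuck around different values (not converged)
stuck = [[rng.gauss(shift, 1.0) for _ in range(2000)] for shift in (0.0, 2.0, 4.0)]
print(f"R-hat, mixed chains: {gelman_rubin(mixed):.3f}")
print(f"R-hat, stuck chains: {gelman_rubin(stuck):.3f}")
```

Values close to 1 suggest that the chains have mixed; values clearly above 1 indicate that the chains still disagree and that longer runs or different starting values are needed.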
4. Examples
4.1. Example 1: Systematic Review and Meta-analysis of Human Variability Data in Toxicokinetics for Major Metabolic Routes and Derivation of Pathway-Related Uncertainty Factors
Systematic reviews and meta-analyses of TK variability for the major
human metabolic routes, namely (1) phase I metabolism (CYP1A2,
CYP2A6, CYP2C9, CYP2C19, CYP2D6, CYP2E1, CYP3A4, hydrolysis,
alcohol dehydrogenase), (2) phase II metabolism
(N-acetyltransferases, glucuronidation, glycine conjugation,
sulfation), and (3) renal excretion, have been performed following
a number of methodological steps using the pharmaceutical database.
The purpose was to derive pathway-related uncertainty factors using
human variability data in TK as an intermediate between default
uncertainty factors and chemical-specific adjustment factors (i.e.,
based on PB-TK models) for chemical risk assessment (for more
details on uncertainty factors see Subheading 20.1) (4, 6, 19,
73–75).
– Selection of probe substrates and systematic review of the TK
literature:
Probe substrates were identified by searching the literature
(MEDLINE, PUBMED and TOXLINE depending on the
pathway). The selection of probe substrates followed a number
of specific criteria: (1) oral absorption complete, (2) a single
pathway (phase I, phase II metabolism and renal excretion)
responsible for the elimination of the compound (60–100% of
the dose), (3) intravenous data used for compounds for which
absorption was variable. The specific metabolic pathway was
identified using quantitative biotransformation data from
in vitro (microsomes, cell lines and primary cell cultures) and
in vivo (urinary and fecal excretion) studies.
– Systematic review and meta-analysis of TK data.
Selection of TK studies and data ranking for population varia-
bility analysis
For each metabolic route, a systematic review of TK studies in
humans was performed for each probe substrate selected for each
pathway and for each subgroup of the population: general healthy
adults (16–70 years) from different ethnic backgrounds (Caucasian,
Asian, African) and with genetic polymorphisms (CYP2C9, CYP2D6,
CYP2C19, NAT-2), and other subgroups of the population [elderly:
healthy adults older than 70 years; children (>1 year to <16 years);
infants (>1 month to <1 year); and neonates (<1 month)]. The effects
of genetic polymorphisms, age (elderly, children, infants,
neonates), and ethnicity (Caucasian, Asian, African) were assessed
using two different types of TK parameters: (1) TK parameters
reflecting chronic exposure [metabolic and total clearances, area
under the plasma concentration–time curve (AUC)] and (2) TK
parameters reflecting acute exposure (Cmax).
A number of meta-analyses were then performed in order to
quantify human variability in TK for each compound, marker
(acute, chronic), subgroup of the population and metabolic route
$$\mathrm{GM} = \frac{X}{\sqrt{1 + CV_N^{2}}} \qquad (6)$$
$$\mathrm{GSD} = \exp\left(\sqrt{\ln\left(1 + CV_N^{2}\right)}\right) \qquad (7)$$
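Equations 6 and 7 can be applied as in the following minimal sketch, which assumes that X denotes the arithmetic mean of the TK parameter and CV_N its coefficient of variation on the normal scale, expressed as a fraction. The function name and the input values are hypothetical.

```python
from math import exp, log, sqrt

def lognormal_summary(arith_mean, cv):
    """Return (GM, GSD) from an arithmetic mean and a normal-scale CV (fraction)."""
    gm = arith_mean / sqrt(1.0 + cv ** 2)      # Eq. 6
    gsd = exp(sqrt(log(1.0 + cv ** 2)))        # Eq. 7
    return gm, gsd

# Hypothetical clearance: mean of 100 (arbitrary units) with a CV of 50%
gm, gsd = lognormal_summary(100.0, 0.5)
print(f"GM = {gm:.2f}, GSD = {gsd:.3f}")
```

These conversions follow from the moments of the lognormal distribution: with sigma = ln(GSD), the arithmetic mean equals GM × exp(sigma²/2), which provides a simple consistency check on the computed values.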
4.1.1. Derivation of Pathway-Related Uncertainty Factors
Pathway-related uncertainty factors for markers of chronic and
acute exposure were derived as the Z scores of the pathway-related
variability to cover the 95th, 97.5th and 99th centiles of the healthy
adult population and subgroups of the population (including
pathway-related ratio of internal dose and pathway-related
variability in the specific subgroup). The pathway-related uncertainty
factors were calculated as the difference between each percentile
(95th, 97.5th and 99th centiles) of the subgroup compared with
healthy adults, without taking into account the incidence of the
subgroup in the overall population. These values assume that
higher circulating concentrations of the parent compound would
result in increased risk, i.e., that the parent compound is the toxic
chemical species (6, 19, 73–77).
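One possible reading of this derivation is sketched below. It assumes that the TK parameter is lognormally distributed, so that the factor covering a given centile is GSD raised to the corresponding standard normal Z score, optionally multiplied by the subgroup-to-adult ratio of internal dose. Both the functional form and the input values are assumptions made for illustration and are not taken from the cited analyses.

```python
from statistics import NormalDist

def pathway_uf(gsd, centile, internal_dose_ratio=1.0):
    """Factor covering the given centile of a lognormal TK parameter.

    internal_dose_ratio is a hypothetical subgroup/healthy-adult ratio of
    geometric means; it is 1.0 for healthy adults themselves.
    """
    z = NormalDist().inv_cdf(centile)     # e.g., about 1.645 for the 95th centile
    return internal_dose_ratio * gsd ** z

# Hypothetical pathway with GSD = 1.5 in healthy adults
for centile in (0.95, 0.975, 0.99):
    print(f"{centile:.1%} centile: factor = {pathway_uf(1.5, centile):.2f}")

# Hypothetical subgroup with a 1.3-fold higher internal dose and GSD = 1.6
print(f"subgroup, 95th centile: {pathway_uf(1.6, 0.95, internal_dose_ratio=1.3):.2f}")
```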
4.2. Example 2: Population TK/TD of Cadmium Using Aggregated Data
This example is extracted from a recent human risk assessment of
cadmium in food by the European Food Safety Authority (EFSA) (18).
Cadmium is a widespread environmental pollutant that has been shown
to exert toxic effects on kidney and bones in humans after long-term
exposure. The EFSA assessment illustrates the whole process of
TK/TD model-based risk assessment. A population TK model and a
dose–effect (TD) model were developed and linked together in order
to evaluate a “safe dose” (“health-based guidance value”) for the
Caucasian population.
4.2.2. Modeling Population Variability in the Dose–Response: Toxicodynamics and BMD Assessment
For cadmium renal effects, β2-microglobulinuria (β2-MG) was the
most commonly reported biomarker and was therefore chosen as
biomarker of effect. On the basis of a systematic review of the
literature, 35 epidemiological studies that measured both
biomarkers of exposure and effect in urine were compiled into an
aggregated dataset made up of 165 groups of matching urinary cadmium
levels and β2-microglobulinuria.
A meta-analysis was then performed to determine the overall
relationship between urinary cadmium and β2-microglobulin for
subjects over 50 years of age (with purely occupational studies
excluded) and for the whole population with all studies.
The consolidated dataset of U-Cd dose groups versus β2-MG
effect groups is displayed in Fig. 7, illustrating dose and effect
population variability, as well as the interstudy variations. It exhibits
an S-shaped dose–effect relationship in the log-scale of both U-Cd
and β2-MG, which can be described by a Hill model, with equation:
$$\mathrm{Effect}(d) = \mathrm{background} + \mathrm{amplitude} \times \frac{d^{\eta}}{d^{\eta} + \mathrm{ed50}^{\eta}}$$
where d stands for the dose, i.e., urinary cadmium (in log scale),
“amplitude” corresponds to the difference between the two
plateaus of the S-shape, ed50 corresponds to the dose where 50%
of the maximal effect is achieved, and η corresponds to the shape
parameter defining the steepness of the S curve.
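The Hill model can be evaluated as in the sketch below. It assumes the standard Hill form, in which the dose and ed50 are both raised to the shape parameter (called eta here, a name chosen for illustration); the parameter values are hypothetical and are not the estimates obtained in the EFSA assessment.

```python
def hill_effect(d, background, amplitude, ed50, eta):
    """Hill dose-effect curve: background + amplitude * d^eta / (d^eta + ed50^eta).

    d and ed50 must be expressed on the same (positive) scale.
    """
    return background + amplitude * d ** eta / (d ** eta + ed50 ** eta)

# Hypothetical parameters on an arbitrary positive dose scale
params = dict(background=2.0, amplitude=3.0, ed50=5.0, eta=4.0)
for dose in (0.5, 2.0, 5.0, 10.0, 50.0):
    print(f"d = {dose:5.1f} -> effect = {hill_effect(dose, **params):.3f}")
```

By construction, the effect equals the background plus half the amplitude at d = ed50, and larger values of eta give a steeper transition between the two plateaus.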
Fig. 7. Scatter plot of data from all studies linking urinary cadmium
(x-axis, µg/g creatinine) to β2-microglobulinuria, using a different
color for each study and illustrating within-group variability. Each
dose group is represented by an ellipse on the log scale, with
log(GSD) as radius.
where n(k) is the sample size of study (k), μ(k) is the population
subgroup mean effect (in log scale), and σ² is the interindividual
variance of the effect at a given dose. This statistical model
naturally accounts for interstudy and interindividual variability
and weights studies according to their sample sizes. It is valid
under the assumption that individual doses and effect levels are
log-normally distributed within each group, which was generally what
was reported in the original publications from which the grouped
data were collected. The population mean μ(k) was adjusted for
ethnicity to differentiate Caucasian from Asian data. The resulting
model fit to the data is shown in Fig. 8, indicating that Asian
subjects were estimated to have more than twofold higher
β2-microglobulinuria than Caucasians with the same exposure
(p < 0.01). Further
Fig. 8. Hill model fitted to the Caucasian data (open circles) versus the Asian data, using the complete dataset.
(x-axis: urinary cadmium, µg/g creatinine; y-axis: beta2-microglobulin, µg/g creatinine; both on log scales.)
4.2.3. Final Risk Assessment and TDI Determination
The overall approach taken by EFSA for this assessment is
summarized in Fig. 9, showing how the various data, analyses
and outputs of analyses were sequenced and combined together
Fig. 9. Graphic representation of step-wise TK/TD assessment performed by EFSA to derive the final HBGV on dietary
cadmium.
4.3. Example 3: Population Exposure to Dioxins over Time Using Individual Data
In this section, KDEM is applied to the exposure of the French
population to dioxins and dioxin-like compounds (DLCs). Several
scenarios for the model input variables have been run to test their
impact on the model output.
In the case of dioxins and related compounds, the total dietary
intake estimation should be performed for a group of congeners
with similar toxicological properties. The toxic equivalency factor
(TEF) approach (51) is based on an additive model: TEFs are used to
convert the concentration of each congener relative to the compound
defined as the reference. The converted concentrations are then
summed to obtain the total concentration expressed in toxic
equivalents (TEQs).
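The additive TEF calculation amounts to a weighted sum, as in the following sketch. The congener names, TEF values, and concentrations are placeholders chosen for illustration only; an actual assessment would use the published TEF scheme referred to in the text (51).

```python
def total_teq(concentrations, tefs):
    """Sum of congener concentrations weighted by their TEFs (additive model)."""
    return sum(conc * tefs[name] for name, conc in concentrations.items())

# Placeholder values, for illustration only (the reference compound has TEF = 1)
tefs = {"reference": 1.0, "congener_A": 0.1, "congener_B": 0.01}
sample_pg_per_g = {"reference": 0.2, "congener_A": 1.5, "congener_B": 12.0}

teq = total_teq(sample_pg_per_g, tefs)
print(f"total concentration = {teq:.2f} pg TEQ/g")
```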
Dietary exposure data are the individual exposures estimated by
(85) by combining concentration data from French monitoring
programs (2001–2004) with individual consumption data provided
by the first French Individual Consumption Survey (53). The daily
exposure to total dioxins expressed in toxic equivalents (TEQs) was
estimated to be on average 1.8 and 2.8 pg TEQ98/kg b.w./day for
the adult and child populations, respectively. The 95th percentile
of exposure is 4.5 and 6.6 pg TEQ98/kg b.w./day for the adult and
child populations, respectively. The empirical exposure
distributions of adults and children are used as input values in
the dynamic exposure process.
Both the population-based model and the individual-based
model presented in Subheading 3.2 have been computed using
different values of the initial body burden X0. Variability between
congeners’ half-lives is considered to be included in the TEFs, and
the half-life of TCDD, which is the reference compound, is used
(51). Two scenarios considering fixed or individual half-lives
were tested for the population-based model. In the first scenario,
the half-life was fixed to 7.1 years for the adult population and
to 2.8 years for the child population (31). Some authors (86)
have observed for dioxins and dioxin-like compounds a linear
relationship between half-lives and personal characteristics such as
age, body fat, smoking, and breast-feeding. In the second scenario,
half-lives for each individual were estimated according to the
equation of (86). Trajectories were simulated over 90 years. The
interintake time for both models was set to 1 week and the time
window of the individual-based model was set to 3 years.
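The fixed half-life scenario can be illustrated with the simplified sketch below. It assumes first-order elimination between weekly intakes, i.e., X(n+1) = X(n) × exp(−ln 2 × Δt / half-life) + U(n+1), with intakes U drawn at random from an empirical list. This recursion, the function name, and the weekly intake values are assumptions made for illustration and do not reproduce the KDEM implementation used for the results reported here.

```python
import math
import random

def simulate_body_burden(x0, weekly_intakes, half_life_years, years=90, seed=0):
    """Simulate one body burden trajectory with weekly intakes."""
    rng = random.Random(seed)
    dt = 1.0 / 52.0                                       # one week, in years
    decay = math.exp(-math.log(2.0) / half_life_years * dt)
    x = x0
    path = [x]
    for _ in range(int(years * 52)):
        # First-order elimination over one week, then a new intake is added
        x = x * decay + rng.choice(weekly_intakes)
        path.append(x)
    return path

# Hypothetical weekly intakes (pg TEQ/kg b.w.), standing in for the empirical
# exposure distribution; adult half-life of 7.1 years as in the first scenario
intakes = [7.0, 10.5, 12.6, 15.4, 31.5]
for x0 in (0.0, 5000.0):
    path = simulate_body_burden(x0, intakes, half_life_years=7.1)
    print(f"X0 = {x0:7.1f} -> body burden after 90 years = {path[-1]:.0f}")
```

With a constant intake U, this recursion converges to U / (1 − exp(−ln 2 × Δt / half-life)), which gives a quick check of the simulated steady state.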
Figure 10 shows the individual variability of the dynamic process
by plotting 30 exposure trajectories computed with the
individual-based model. The jumps of the dynamic exposure
Fig. 10. An example of 30 trajectories of the exposure process over 90 years computed with the individual-based model.
Fig. 11. Mean exposure trajectories for different initial body burden performed with the population-based model using a
fixed half-life. A mean exposure trajectory is represented by the mean of the computed mean exposures over increasing
intervals [0,T ] for a set of 1,000 trajectories.
Fig. 12. Mean exposure trajectories for different initial body burden performed with the population-based model using a
fixed half-life. A mean exposure trajectory is represented by the mean of the computed mean exposures over increasing
intervals [0,T ] for a set of 1,000 trajectories.
based model for three different initial body burdens are close after a
period of around 30 years.
Testing the different scenarios shows that the dynamic modeling of
the exposure process is sensitive to the half-life values and to
the choice between the population-based and the individual-based
model. The individual-based model seems to be less sensitive to
the starting point X0. This phenomenon can be explained by the
fact that the exposure variability is lower within an age group than
among the whole population. Therefore, exposures computed with two
sets of 1,000 trajectories using the individual-based model will be
closer than those computed with the population-based model. Note
that for both models, the steady-state situation is reached only
after a very long time of exposure, longer than a lifetime with the
individual-based model.
To give an example of risk measures, a human BMD was extrapolated
from the rat by combining the animal BMD determined by (87) with an
intraspecies factor modeled by a lognormal distribution
with a geometric mean of 1 and a geometric standard deviation of
5. Notes/Conclusions and Future Perspectives
Generally speaking, the various in vitro toxicokinetic assays have
limitations that affect the predictive accuracy of PB-TK models.
Difficulties in predicting metabolism, dermal absorption, renal
excretion, and active transport are foremost in that respect, and
improvements will proceed at the pace at which these problems are
solved. For instance, checking the validity of PB-TK models is much
easier when such models have a stable and well-documented
physiological structure. In this context, a number of generic PB-TK
models have been developed, e.g., by Simcyp, Bayer Technology
Services, Cyprotex (https://www.cloegateway.com), or Simulations
Plus. However, modelers still face the need to validate the QSAR
submodels or in vitro assays used to assign a PB-TK model’s
parameter values. Obviously, the quality of those inputs conditions
the validity of the PB-TK model using them. The validation of those
submodels and in vitro assays requires particular attention and
should follow the relevant procedures, combining sensitivity and
uncertainty analyses to understand and quantify which model
parameters or assumptions are the most critical for developing
relevant, stable, and quantitatively informative
toxicokinetic/toxicodynamic models for risk assessment (89).
5.1. Overall Perspective for Chemical Risk Assessment
Over the last decade, risk assessment methodologies have evolved
considerably. This evolution has been stimulated by the scientific
community as a whole (specialists in toxicology, pharmacokinetic
modeling, applied statistics, and systems biology, to cite but a
few) as well as by international public health agencies moving
towards “evidence-based” quantitative approaches. Historically, an
important factor in this trend has been the rise of bioinformatics
in the postgenomic era, serving as an interface between the
biomedical sciences (molecular biology, pharmacology, toxicology,
and epidemiology), mathematics/statistics, and computer sciences.
Additionally, high-performance computational
Acknowledgments
The views reflected in this review are the authors’ only and do not
reflect the views of the European Food Safety Authority, the
Technological University of Compiegne, the French Agency for Food,
Environment, and Occupational Health Safety, the French
National Institute of Agronomical Research (INRA), or the
World Health Organization.
References
1. European Commission (EC) (2002) Regulation (EC) No 178/2002 of the European Parliament and of the Council laying down the general principles and requirements of food law, establishing the European Food Safety Authority and laying down procedures in matters of food safety. http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2002:031:0001:0024:EN:PDF
2. WHO (2009) Principles and methods for the risk assessment of chemicals in food, Environmental health criteria 240. http://www.who.int/foodsafety/chem/principles/en/index1.html
3. Svendsen C, Ragas AM, Dorne JLCM (2008) Contaminants in organic and conventional food: the missing link between contaminant levels and health effects. Book comparing organic vs non-organic food at the nutritional, microbiological and toxicological level. In: Givens DI et al (eds) Health benefits of organic
modelling): meeting the 3Rs agenda – the report and recommendations of ECVAM Workshop 63a. Altern Lab Anim 35:661–671
25. Edginton AN, Schmitt W, Willmann S (2006) Development and evaluation of a generic physiologically based pharmacokinetic model for children. Clin Pharmacokinet 45:1013–1034
26. Luecke RH, Pearce BA, Wosilait WD, Slikker W, Young JF (2007) Postnatal growth considerations for PBPK modeling. J Toxicol Environ Health A 70:1027–1037
27. Jones HM, Gardner IB, Watson KJ (2009) Modelling and PBPK simulation in drug discovery. AAPS J 11:155–166
28. Allen BC, Hack CE, Clewell HJ (2007) Use of Markov chain Monte Carlo analysis with a physiologically-based pharmacokinetic model of methylmercury to estimate exposures in US women of childbearing age. Risk Anal 27:947–959
29. Lorber M (2008) Exposure of Americans to polybrominated diphenyl ethers. J Expo Sci Environ Epidemiol 18(1):2–19
30. Fromme H, Korner W et al (2009) Human exposure to polybrominated diphenyl ethers (PBDE), as evidenced by data from a duplicate diet study, indoor air, house dust, and biomonitoring in Germany. Environ Int 35(8):1125–1135
31. US-EPA (2003) Exposure and human health reassessment of 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) and related compounds National Academy Sciences (NAS) review draft. Part III. EPA, Washington, DC
32. Pinsky PF, Lorber MN (1998) A model to evaluate past exposure to 2,3,7,8-TCDD. J Expo Anal Environ Epidemiol 8(2):187–206
33. Smith JC, Farris FF (1996) Methyl mercury
38. Crump KS, Chen C, Chiu WA, Louis TA, Portier CJ et al (2010) What role for biologically based dose–response models in estimating low-dose risk? Environ Health Perspect 118(5)
39. Crump KS (1984) A new method for determining allowable daily intakes. Fundam Appl Toxicol 4:854–871
40. Budtz-Jørgensen E, Keiding N, Grandjean P (2001) Benchmark dose calculation from epidemiological data. Biometrics 57:698–706
41. Sand S et al (2008) The current state of knowledge on the use of the benchmark dose concept in risk assessment. J Appl Toxicol 28(4):405–421
42. Crump KS (2002) Critical issues in benchmark calculations from continuous data. Crit Rev Toxicol 32:133–153
43. Suwazono Y et al (2006) Benchmark dose for cadmium-induced renal effects in humans. Environ Health Perspect 114(7):1072–1076
44. Ryan L (2008) Combining data from multiple sources, with applications to environmental risk assessment. Stat Med 27:698–710
45. Wheeler MW, Bailer AJ (2007) Properties of model-averaged BMDLs: a study of model averaging in dichotomous response risk estimation. Risk Anal 27:659–670
46. Morales KH, Ibrahim JG, Chen CJ, Ryan LM (2006) Bayesian model averaging with applications to benchmark dose estimation for arsenic in drinking water. J Am Stat Assoc 101(473):9–17
47. EC (2004) European Union System for the Evaluation of Substances 2.0 (EUSES 2.0). Prepared for the European Chemicals Bureau by the National Institute of Public Health and the Environment (RIVM), Bilthoven, The Netherlands (RIVM Report no. 601900005).
pharmacokinetics in man: a reevaluation. Tox- Available via the European Chemicals Bureau.
icol Appl Pharmacol 137(2):245–252 http://ecb.jrc.it
34. Albert I, Villeret G et al (2010) Integrating 48. Bertail P, Clémençon S et al (2010) Statistical
variability in half-lives and dietary intakes to analysis of a dynamic model for dietary contam-
predict mercury concentration in hair. Regul inant exposure. J Biol Dyn 4(2):212–234
Toxicol Pharmacol 58(3):482–489 49. Verger P, Tressou J, Clémençon S (2007) Inte-
35. Delyon B, Lavielle M, Moulines E (1999) Con- gration of time as a description parameter in
vergence of a stochastic approximation version risk characterisation: application to methyl
of the EM algorithm. Ann Stat 27(1):94–128 mercury. Regul Toxicol Pharmacol 49
36. Rowland M, Benet LZ, Graham GG (1973) (1):25–30
Clearance concepts in pharmacokinetics. 50. Tressou J, Leblanc JCh, Feinberg M, Bertail P
J Pharmacokinet Biopharm 1:123–136 (2004) Statistical methodology to evaluate
37. Shuey DL, Lau C, Logsdon TR, Zucker RM, food exposure to a contaminant and influence
Elstein KH, Narotsky MG, Setzer RW, Kavlock of sanitary limits: application to ochratoxin A.
RJ, Rogers JM (1994) Biologically based dose- Regul Toxicol Pharmacol 40(3):252–263
response modeling in developmental toxicol- 51. Van den Berg M, Birnbaum L et al (1998)
ogy: biochemical and cellular sequelae of 5- Toxic equivalency factors (TEFs) for PCBs,
fluorouracil exposure in the developing rat. PCDDs, PCDFs for humans and wildlife. Envi-
Toxicol Appl Pharmacol 126(1):129–144 ron Health Perspect 106:775–792
580 J.L. Dorne et al.
52. Thuresson, Höglund et al (2000) In: Medicine exposure assessment of chemical substances.
and health policy. New York: Marcel Dekker EFSA J 8(3):1557. http://www.efsa.europa.
53. AFSSA (2009) Etude individuelle Nationale eu/en/efsajournal/doc/1557.pdf
des consommations Alimentaires 2 (INCA 2) 67. Helsel DR (2005) Nondetects and data analy-
(2006-2007), Rapport AFSSA, 228p, http:// sis. Wiley, New York
www.anses.fr/Documents/PASER-Ra- 68. Kennedy MC, Roelofs VJ et al (2011) A hierar-
INCA2.pdf chical Bayesian model for extreme pesticide
54. EFSA (2010) Application of systematic review residues. Food Chem Toxicol 49(1):222–232
methodology to food and feed safety assess- 69. Tressou J, Bertail P et al (2003) 709 Evaluation
ments to support decision making. EFSA J 8 of food risk exposure using extreme value
(6):1637. http://www.efsa.europa.eu/en/ theory-application to heavy metals for sea pro-
efsajournal/doc/1637.pdf ducts consumers. Toxicol Lett 144(Supple-
55. Marvier M, McCreedy C, Regetz J, Kareiva P ment 1):s190
(2007) A meta-analysis of effects of Bt cotton 70. WHO (2009) Principles for modelling dose-
and maize on nontarget invertebrates. Science response for the risk assessment of chemicals.
316(5830):1475–1477 Environmental Health Criteria. http://www.
56. Greenland S, Robins J (1994) Invited com- who.int/tipcs/methods/harmonization/
mentary: ecologic studies—biases, misconcep- dose_response/en/
tions, and counterexamples. Am J Epidemiol 71. Spilke J, Piepho HP, Hu X (2005) A simulation
139:747–60 study on tests of hypotheses and confidence
57. Terrin N, Schmidt CH, Lau J, Olkin I (2003) intervals for fixed effects in mixed models for
Adjusting for publication bias in the presence blocked experiments with missing data. J Agric
of heterogeneity. Stat Med 22:2113–2212 Biol Environ Stat 10:374–389
58. Stangl D, Berry DA (eds) Meta-analysis 72. Spiegelhalter DJ, Best NG et al (2002) Bayes-
59. Higgins JP, Thompson SG, Deeks JJ, Altman ian measures of model complexity and fit. J R
DG (2003) Measuring inconsistency in meta- Stat Soc Series B Stat Methodol 64:583–640
analyses. BMJ 327:557–560 73. Dorne JLCM, Walton K, Renwick AG (2001)
60. Egger M et al (2001) Systematic reviews in Uncertainty factors for chemical risk assess-
health care. BMJ books, London ment: human variability in the pharmacokinet-
61. Egger M, Smith GD, Schneider M, Minder C ics of CYP1A2 probe substrates. Food Chem
(1997) Bias in meta-analysis detected by a sim- Toxicol 39:681–696
ple, graphical test. BMJ 315:629–634 74. Dorne JLCM, Walton K, Slob W, Renwick AG
62. Biro G, Hulshof K, Ovesen L, Amorim Cruz (2002) Human variability in polymorphic
JA (2002) Selection of methodology to assess CYP2D6 metabolism: is the kinetic default
food intake. Eur J Clin Nutr 56(Suppl 2): uncertainty factor adequate? Food Chem Tox-
S25–S32. doi:10.1038/sj/ejcn/1601426 icol 40:1633–1656
63. Verger P, Ireland J, Møller A, Abravicius JA, De 75. Dorne JLCM, Renwick AG (2005) The refine-
Henauw S, Naska A (2002) Improvement of ment of uncertainty/safety factors in risk
comparability of dietary intake assessment assessment by the incorporation of data on
using currently available individual food con- toxicokinetic variability in humans. Toxicol Sci
sumption surveys. Eur J Clin Nutr 56(Suppl 86:20–26
2):S1–S7. doi:10.1038/sj/ejcn/1601425 76. Dorne JLCM, Walton K, Renwick AG (2003)
64. Wirf€alt E, Hedblad B, Gullberg B, Mattisson I, Human variability in CYP3A4 metabolism and
AndrénC RU, Janzon L, Berglund G (2001) CYP3A4-related uncertainty factors for risk
Food patterns and components of the metabolic assessment. Food Chem Toxicol 41:201–224
syndrome in men and women: A cross-sectional 77. Dorne JLCM, Walton K, Renwick AG (2003)
study within the Malmö diet and cancer cohort. Polymorphic CYP2C19 and N-acetylation:
Am J Epidemiol 154(12):1150–1159 human variability in kinetics and pathway-
65. Zetlaoui M, Feinberg M, Verger P, Clémencon related uncertainty factors. Food Chem
S (2011) Extraction of food consumption sys- Toxicol 41:225–245
tems by non-negative matrix factorization 78. Kjellström T (1971) A mathematical model for
(NMF) for the assessment of food choices. the accumulation of cadmium in human kidney
Biometrics (in press). http://hal.archives- cortex. Nord Hyg Tidskr 52:111–119
ouvertes.fr/docs/00/48/47/94/PDF/NMF_ 79. Sutton AJ, Higgins JPT (2008) Recent devel-
food.pdf opments in meta-analysis. Stat Med
66. EFSA (2010) European Food Safety Authority; 27:625–650
management of left-censored data in dietary
20 Population Effects and Variability 581
80. Berry D, Strangl DK (eds) (2001) Meta- gration methods. The application of animal
analysis in medicine and health policy. Biosta- toxicity data in risk-benefit analysis:
tistics, New York 2,3,7,8-TCDD as an example.
81. Morales KH, Ryan LM (2005) Benchmark 89. Bernillon P, Bois FY (2000) Statistical issues in
dose estimation based on epidemiologic cohort toxicokinetic modeling: a Bayesian perspective.
data. Environmetrics 16:435–447 Environ Health Perspect 108(Suppl
82. Lunn DJ, Thomas A, Best N, Spiegelhalter D 5):883–893
(2000) WinBUGS—a Bayesian modelling 90. Yan L, Sheihk-Bahaei S, Park S, Ropella GE,
framework: concepts, structure and extensibil- Hunt CA (2008) Predictions of hepatic dispo-
ity. Stat Comput 10:325–337 sition properties using a mechanistically realis-
83. EFSA (2011) Comparison of the approaches tic, physiologically based model. Drug Metab
taken by EFSA and JECFA to establish a Dispos 36(4):759–768
HBGV for cadmium. http://www.efsa.europa. 91. Lerapetritou MG, Georgopoulos PG, Roth
eu/en/efsajournal/doc/2006.pdf. CM, Androulakis LP (2009) Tissue-level mod-
84. Suwazono Y, Nogawa K, Uetani M et al (2011) eling of xenobiotic metabolism in liver: an
Application of hybrid approach for estimating emerging tool for enabling clinical translational
the benchmark dose of urinary cadmium for research. Clin Transl Sci 2(3):228–237
adverse renal effects in the general population 92. McDonald TA (2005) Polybrominated diphe-
of Japan. J Appl Toxicol 31(1):89–93 nylether levels among United States residents:
85. Tard A, Gallotti S, Leblanc JC, Volatier JL daily intake and risk of harm to the developing
(2007) Dioxins, furans and dioxin-like PCBs: brain and reproductive organs. Integr Environ
occurrence in food and dietary intake in Assess Manag 1(4):343–354
France. Food Addit Contam 24(9):1007–1017 93. Van der Molen GW, Kooijman SALM et al
86. Milbrath MO, Wenger Y, Chang CW, Emond (1996) A generic toxicokinetic model for per-
C, Garabrant D, Gillespie BW, Jolliet O (2009) sistent lipophilic compounds in humans: an
Apparent half-lives of dioxins, furans, and poly- application to TCDD. Fundam Appl Toxicol
chlorinated biphenyls as a function of age, body 31(1):83–94
fat, smoking status, and breast-feeding. Envi- 94. Verner MA, Ayotte P et al (2009) A physiolog-
ron Health Perspect 117(3):417–425 ically based pharmacokinetic model for the
87. Gray LE, Ostby JS et al (1997) A dose- assessment of infant exposure to persistent
response analysis of the reproductive effects of organic pollutants in epidemiologic studies.
a single gestational dose of 2,3,7,8-tetrachlor- Environ Health Perspect 117(3):481–487
odibenzo-p-dioxin in male Long Evans 95. Lu C, Holbrook CM et al (2010) The implica-
Hooded rat offspring. Toxicol Appl Pharmacol tions of using a physiologically based pharma-
146(11–20) cokinetic (PBPK) model for pesticide risk
88. Bokkers, B. G. H., M. J. Zeilmaker, et al. assessment. Environ Health Perspect 118
(2009). RIVM report on framework and inte- (1):125–130
Chapter 21
Abstract
Pharmacodynamic modeling is based on a quantitative integration of pharmacokinetics, pharmacological
systems, and (patho-) physiological processes for understanding the intensity and time-course of drug
effects on the body. Application of such models to the analysis of meaningful experimental data allows for
the quantification and prediction of drug–system interactions for both therapeutic and adverse drug
responses. In this chapter, commonly used mechanistic pharmacodynamic models are presented with
respect to their important features, operable equations, and signature profiles. In addition, literature
examples showcasing the utility of these models to adverse drug events are highlighted. Common model
types that are covered include simple direct effects, biophase distribution, indirect effects, signal transduc-
tion, and irreversible effects.
1. Introduction
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2_21, © Springer Science+Business Media, LLC 2012
584 M.A. Felmlee et al.
2. Modeling Requirements
Useful pharmacodynamic models are based on plausible mathemat-
ical and pharmacological exposure–response relationships. Basic
model components encompassing a range of pharmacodynamic
systems are illustrated in Fig. 1. For most drug effects, both
pharmacological mechanisms, often characterized by sensitivity-
grounded capacity-limited effector units, and physiological turn-
over processes need to be integrated with drug disposition when
constructing a PK/PD model.
The construction and evaluation of relevant PK/PD models
require suitable pharmacokinetic data, an appreciation for molecu-
lar and cellular mechanisms of pharmacological/toxicological
responses, and a range of quantitative experimental measurements
of meaningful biomarkers within the causal pathway between drug–
target interactions and clinical effects. Good experimental designs
are essential to ensure that sensitive and reproducible data are
collected. These data should cover a reasonably wide dose/concen-
tration range and appropriate study duration to ascertain net drug
exposure and the ultimate fate of the biomarkers or outcomes
under investigation. A wide range of systemic drug concentrations
is also typically required for the accurate and precise estimation of
pharmacodynamic parameters. Typically studies should involve a
minimum of two to three doses to adequately estimate the
21 Mechanism-Based Pharmacodynamic Modeling 585
Fig. 1. Basic components of pharmacodynamic models. The time-course of drug concentrations in a relevant biological
fluid (e.g., plasma, Cp) or the biophase (Ce) is characterized by a mathematical function that serves to drive PD models. The
biosensor process involves the interaction between the drug and the pharmacologic target (R), and may be described
using various receptor-occupancy models, may require equations that consider the kinetics of the drug–receptor complex
formation and dissociation, or may encompass irreversible drug–target interactions. Many drugs act via indirect
mechanisms and the biosensor process may serve to stimulate or inhibit the production (kin) or loss (kout) of endogenous
mediators. These altered mediators may not represent the final observed drug effect (E) and further time-dependent
transduction processes may occur, thus requiring additional modeling components. System complexities such as drug
interactions, functional adaptation, changes with pathophysiology, and other factors may play a role in regulating drug
effects after acute and long-term drug exposure (adapted from ref. 33).
3. Practical Modeling Approaches
The first steps in any modeling endeavor are to define the objectives
of the analysis and to perform a careful graphical analysis of raw data.
Both efforts should facilitate selection of appropriate techniques
and conditions for model construction and evaluation. A good
graphical analysis (along with a priori knowledge of drug mechan-
isms) may be used to narrow down the number of structural models
being considered as a base model and also help in calculating initial
parameter estimates. Despite progress in computational algorithms,
good initial parameter estimates can reduce the likelihood of falling
into local minima and can also be used as a reality check when
compared to final parameter estimates or literature reported values.
Next, an appropriate drug/toxin pharmacokinetic/toxicokinetic
function is derived from fitting a model to concentration–time
profiles in relevant biological fluids. Depending on the complexity
of the pharmacodynamic model/system, the pharmacokinetic
model and associated parameters are often fixed to serve as a driving
function for the pharmacodynamic model relating drug exposure to
pharmacological/toxicological effects. Although simultaneous
PK/PD modeling is desirable, this can still be a formidable chal-
lenge for complex models. Objective model-fitting criteria (e.g.,
diagnostic and goodness-of-fit plots) are frequently compared to
select a final model, and a variety of techniques are available to verify
or qualify models, which can range in complexity depending on the
modeling approach (e.g., population versus pooled data). Ideally, an
external dataset, not used in the construction of the model, could be
used to determine whether the model is generalizable; however,
internal validation steps are far more common as most model-
builders will attempt to incorporate all available experimental data.
In any event, final models should reasonably recapitulate the data
used to derive the model, generate new insights and testable
hypotheses of factors controlling drug responses, and provide guid-
ance for subsequent decisions in drug discovery, development, and
pharmacotherapy. Subsequent sections will highlight commonly
used pharmacodynamic models with increasing degrees of complex-
ity, as well as provide literature examples on the application of such
models to the analysis of drug-induced adverse events.
3.1. Simple Direct Effect Models
The Hill equation assumes that drug effects (E) are directly
proportional to receptor occupancy (i.e., linear transduction), assumes
that plasma drug concentrations are in rapid equilibrium with the
effect site, and represents a fundamental pharmacodynamic
relationship (6):
$$E = E_0 \pm \frac{E_{max}\,C_p}{EC_{50} + C_p}. \tag{1}$$
[Figure 2 plots: left panel, Concentration versus Time (hours), logarithmic concentration axis; right panel, Effect versus Time (hours).]
Fig. 2. Simulated drug concentrations (left) and response curves (right) using a simple Emax model (Eq. 1). Drug
concentrations follow monoexponential disposition: $C_p = C_0 e^{-kt}$. $C_0$ was set to 10, 100, or 1,000 units to achieve
increasing dose levels. Parameter values were k = 0.12/h, Emax = 100 units, and EC50 = 15 units.
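The simulation described in the Fig. 2 caption can be sketched in a few lines. This is a minimal illustration, not the authors' code: it uses the caption's parameter values, takes the stimulatory form of Eq. 1, and assumes a baseline E0 of zero (the caption does not state E0). The function names are hypothetical.

```python
import math

def plasma_conc(t, c0, k=0.12):
    """Monoexponential disposition, Cp = C0 * exp(-k * t)."""
    return c0 * math.exp(-k * t)

def emax_effect(cp, e0=0.0, emax=100.0, ec50=15.0):
    """Simple Emax model (stimulatory form of Eq. 1); e0 = 0 is an assumption."""
    return e0 + emax * cp / (ec50 + cp)

def time_to_half_max(c0, k=0.12, ec50=15.0):
    """Time at which Cp has fallen to EC50, so the drug effect equals Emax/2."""
    return max(0.0, math.log(c0 / ec50) / k)

if __name__ == "__main__":
    for c0 in (10.0, 100.0, 1000.0):
        effects = [emax_effect(plasma_conc(t, c0)) for t in range(0, 81, 10)]
        print(c0, [round(e, 1) for e in effects])
```

Because the effect is tied directly to Cp, it peaks at time zero for every dose; a larger dose mainly prolongs the time spent near Emax rather than raising the peak proportionally.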
Fig. 3. Direct effect model of tacrolimus-induced changes of QTc intervals in guinea pigs. The pharmacokinetic model
includes both plasma and ventricular myocardial drug concentrations (a), and the latter are associated with changes in QTc
according to Eq. 4 (b). The PK/PD relationship results in the time-course of changes in QTc (c). Reprinted from ref. 8 with
permission from Springer.
[Figure 4 plots: left panel, Concentration versus Time (hours); right panel, Effect versus Time (hours); curves shown for keo = 0.1 and keo = 0.5.]
Fig. 4. Biophase model structure (top panel) and signature profiles for drug concentrations at the biophase (left bottom
panel) and pharmacological effects (right bottom panel). Response curves were simulated using Eqs. 1 and 6 driven by
drug concentrations following monoexponential disposition: $C_p = C_0 e^{-kt}$. $C_0$ was set to 100 units. Parameter values were
k = 0.12/h, keo = 0.1 or 0.5/h, Emax = 100 units, and EC50 = 15 units.
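The biophase profiles summarized in the Fig. 4 caption follow from the effect-compartment equation dCe/dt = keo·(Cp − Ce) (Eq. 6). The sketch below uses its closed-form solution for monoexponential Cp. It assumes Ce(0) = 0 and E0 = 0, neither of which is stated in the caption, and the function names are hypothetical.

```python
import math

def cp(t, c0=100.0, k=0.12):
    """Plasma concentration, Cp = C0 * exp(-k * t)."""
    return c0 * math.exp(-k * t)

def ce(t, keo, c0=100.0, k=0.12):
    """Closed-form solution of dCe/dt = keo * (Cp - Ce) with Ce(0) = 0 (needs keo != k)."""
    return keo * c0 / (keo - k) * (math.exp(-k * t) - math.exp(-keo * t))

def t_peak(keo, k=0.12):
    """Time of maximum biophase concentration, where dCe/dt = 0 and Ce = Cp."""
    return math.log(keo / k) / (keo - k)

def effect(conc, emax=100.0, ec50=15.0):
    """Emax model driven by the biophase concentration (E0 = 0 assumed)."""
    return emax * conc / (ec50 + conc)

if __name__ == "__main__":
    for keo in (0.1, 0.5):
        tp = t_peak(keo)
        print(f"keo={keo}: peak at {tp:.2f} h, peak effect {effect(ce(tp, keo)):.1f}")
```

A smaller keo delays and lowers the peak effect even though the plasma profile is unchanged, which is the hysteresis signature of biophase distribution.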
3.2. Biophase Distribution
In many cases, the in vivo pharmacological effects will lag behind
plasma drug concentrations. This results in the phenomenon of
hysteresis, or a temporal disconnect in effect versus concentration
plots. Distribution of drug to its site of action might represent a
rate-limiting process that may account for the delay in drug effect.
The term “biophase” was coined by Furchgott (11) to describe the
drug site of action, and a mathematical approach to linking plasma
concentrations and drug effect through a hypothetical effect
compartment was popularized by Sheiner and colleagues (12) (Fig. 4,
top panel). Plasma drug concentrations are described using an
appropriate pharmacokinetic model, and the rate of change of
drug concentrations at the biophase (Ce) is defined as
$$\frac{dC_e}{dt} = k_{eo}\,C_p - k_{eo}\,C_e, \tag{6}$$
3.3. Indirect Response Models
Indirect response models represent a highly useful class of models
wherein reversible drug–receptor interactions serve to alter the
natural production or loss of biomarker response variables.
A model reflecting inhibition of production was first utilized to
characterize prothrombin activity in blood after oral warfarin
administration (15). Dayneka and colleagues (16) were the first to
formally propose four basic indirect response models whose struc-
tures are detailed in Fig. 5 (top panel). These models have been
used to investigate the pharmacodynamics of a wide range of drug
effects, and their mathematical properties have been well character-
ized (17, 18). The four basic models include inhibition of
[Figure 5 plots: four panels (I, II, III, and IV), each showing Response versus Time (hours).]
Fig. 5. Indirect response model structure (top panel) and signature profiles for the four basic indirect response models
(middle and bottom panels). Response curves were simulated using Eqs. 7, 8, 9, and 10 driven by drug concentrations
following monoexponential disposition: $C_p = C_0 e^{-kt}$. $C_0$ was set to 10, 100, or 1,000 units to achieve increasing doses.
Parameter values were k = 0.12/h, Imax = 1 unit (Models I and II), Smax = 10 units (Models III and IV), EC50 = 15 units,
kout = 0.25/h, and R0 = 100 units (kin = R0·kout).
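3.3.1. Model I
$$\frac{dR}{dt} = k_{in}\left(1 - \frac{I_{max}\,C_p}{IC_{50} + C_p}\right) - k_{out}\,R. \tag{7}$$
3.3.2. Model II
$$\frac{dR}{dt} = k_{in} - k_{out}\left(1 - \frac{I_{max}\,C_p}{IC_{50} + C_p}\right)R. \tag{8}$$
3.3.4. Model IV
$$\frac{dR}{dt} = k_{in} - k_{out}\left(1 + \frac{S_{max}\,C_p}{SC_{50} + C_p}\right)R, \tag{10}$$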
where kin is a zero-order production rate constant, kout is a first-
order elimination rate constant, Imax and Smax are defined as the
maximum fractional factors of inhibition (0 < Imax ≤ 1) or
stimulation (Smax > 0), and IC50 and SC50 are defined analogously to
the EC50 (the drug concentrations producing 50% of Imax or Smax).
Initial parameter estimates can be obtained from a graphical analysis
of PK/PD data as previously described (17, 18). Signature profiles
for these models in response to increasing dose levels are shown in
Fig. 5 (middle and bottom panels). Interestingly, the time to peak
response is dose dependent, occurring at later times as the dose
level is increased. This phenomenon is easily explained: the
inhibition or stimulation effect persists longer for larger doses
because drug concentrations remain above the EC50 for longer times.
The initial condition for
all models (R0) is kin/kout, which may be set constant or fitted as a
parameter during model development. Ideally, a number of mea-
surements should be obtained prior to drug administration to
assess baseline conditions. Based on the determinants of R0, typi-
cally the baseline and one of the turnover parameters are estimated,
and the remaining rate constant is calculated as a function of the
two estimated terms. This reduces the number of parameters to be
estimated and maintains system stationarity.
The basic indirect response models can be extended to incor-
porate a precursor compartment (P). The following equations
represent a general set of precursor-dependent indirect response
models (Fig. 6, top panel) that were developed and characterized
by Sharma and colleagues (19):
$$\frac{dP}{dt} = k_0\{1 \pm H_1(C_p)\} - \left(k_s + k_p\{1 \pm H_2(C_p)\}\right)P, \tag{11}$$
$$\frac{dR}{dt} = k_p\{1 \pm H_2(C_p)\}\,P - k_{out}\,R, \tag{12}$$
where k0 represents the zero-order rate constant for precursor
production, kp is a first-order rate constant for production of the
response variable, and kout is the first-order rate constant for dissi-
pation of response. H1 and H2 represent the inhibition or stimula-
tion of precursor production or production of response and are
[Figure 6 plots: two panels (V and VI), each showing Response versus Time (hours).]
Fig. 6. Multiple compartment indirect response models (top panel) and signature profiles for Models V and VI (bottom
panel). Response curves were simulated using Eqs. 11 and 12 driven by drug concentrations following monoexponential
disposition: $C_p = C_0 e^{-kt}$. $C_0$ was set to 10, 100, or 1,000 units to achieve increasing doses. Parameter values were
k = 0.12/h, Imax = 1 unit, Smax = 10 units, EC50 = 15 units, k0 = 25 unit/h, kp = 0.5/h, and kout = 0.25/h.
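A precursor-dependent model of the kind given by Eqs. 11 and 12 can be explored with a short two-state simulation. The sketch below is illustrative only and rests on several assumptions that this section does not confirm: the drug is taken to inhibit kp (H2 = Imax·Cp/(IC50 + Cp) with the minus sign, H1 = 0), ks is set to zero because the caption lists no value for it, and the system starts at its drug-free steady state. It is not claimed to correspond to Model V or Model VI specifically, and the function names are hypothetical.

```python
import math

K_EL, K0, KP, K_OUT = 0.12, 25.0, 0.5, 0.25   # values from the Fig. 6 caption
KS = 0.0                                       # assumption: no value is given for ks
IMAX, IC50 = 1.0, 15.0
P0 = K0 / (KS + KP)                            # drug-free steady state of the precursor
R0 = KP * P0 / K_OUT                           # drug-free steady state of the response

def h2(t, c0):
    """Assumed inhibitory drug function acting on kp."""
    c = c0 * math.exp(-K_EL * t)
    return IMAX * c / (IC50 + c)

def rhs(t, state, c0):
    """Eqs. 11 and 12 with H1 = 0 and inhibition of kp."""
    p, r = state
    kp_eff = KP * (1.0 - h2(t, c0))
    return (K0 - (KS + kp_eff) * p, kp_eff * p - K_OUT * r)

def rk4_step(t, state, dt, c0):
    """One classical fourth-order Runge-Kutta step for the two-state system."""
    k1 = rhs(t, state, c0)
    k2 = rhs(t + dt / 2, tuple(s + dt / 2 * k for s, k in zip(state, k1)), c0)
    k3 = rhs(t + dt / 2, tuple(s + dt / 2 * k for s, k in zip(state, k2)), c0)
    k4 = rhs(t + dt, tuple(s + dt * k for s, k in zip(state, k3)), c0)
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def simulate(c0, t_end=80.0, dt=0.02):
    """Return the response time-course R(t) on a uniform grid."""
    state, t = (P0, R0), 0.0
    resp = [R0]
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(t, state, dt, c0)
        t += dt
        resp.append(state[1])
    return resp

if __name__ == "__main__":
    for c0 in (10.0, 100.0, 1000.0):
        resp = simulate(c0)
        print(c0, round(min(resp), 1), round(max(resp), 1))
```

Under these assumptions the response first falls below baseline and then overshoots it, because precursor accumulates while its conversion is inhibited and is released once the drug has cleared.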
3.5. Irreversible Effect Models
A wide range of compounds, including anticancer drugs,
antimicrobial drugs, and enzyme inhibitors, elicit irreversible effects.
A basic model for describing irreversible effects was developed by
Jusko and includes simple cell killing (27):
$$\frac{dR}{dt} = -k\,C\,R, \tag{15}$$
where R represents cells or receptors, C is either Cp or Ce, and k is a
second-order cell-kill rate constant. The initial condition for this
Fig. 7. Transit-compartment model of myelosuppression (top panel) including a proliferating progenitor pool (P), three
transit compartments (Mi), and a plasma neutrophil compartment (N). Drug effect is driven by plasma drug concentration
(Cp) and pharmacodynamic parameters (θ). An adaptive feedback function on the proliferation rate constant is governed by
the ratio of initial neutrophils to current neutrophil count, raised to a power coefficient (γ). The bottom panel shows the
time-course of neutrophils following vinflunine administration (arrows). Reprinted from ref. 23 with permission from the
American Society of Clinical Oncology.
[Figure 8: structural diagram with labels ks, k × Cp, and R; plot of Survival Fraction versus Time (hours).]
Fig. 8. Structural model for irreversible effects (top panel) and signature profiles for
irreversible effect model with a proliferating cell population (bottom panel). Response
curves were simulated using Eq. 16 driven by drug concentrations following
monoexponential disposition: $C_p = C_0 e^{-kt}$. $C_0$ was set to 0, 10, 100, or 1,000 units to achieve a
control population and increasing dose levels. Pharmacokinetic parameter was k = 0.12/h.
Pharmacodynamic parameters were k = 0.0005 units/h, ks = 0.03/h.
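The survival curves described in the Fig. 8 caption can be approximated with a closed-form expression. Eq. 16 is not reproduced in this section, so the sketch below assumes the form suggested by the structural labels of Fig. 8, dR/dt = (ks − k·Cp)·R, which reduces to simple cell killing when ks = 0. The pharmacokinetic and cell-kill constants share the symbol k in the caption; they are renamed K_EL and K_KILL here, and all names are hypothetical.

```python
import math

K_EL = 0.12       # pharmacokinetic elimination rate constant (1/h)
K_KILL = 0.0005   # cell-kill rate constant k from the Fig. 8 caption
K_S = 0.03        # first-order growth rate constant ks (1/h)

def survival_fraction(t, c0, k_s=K_S):
    """R(t)/R(0) for dR/dt = (k_s - K_KILL * Cp) * R with Cp = c0 * exp(-K_EL * t)."""
    auc = c0 * (1.0 - math.exp(-K_EL * t)) / K_EL   # integral of Cp from 0 to t
    return math.exp(k_s * t - K_KILL * auc)

def nadir_time(c0):
    """Time at which growth balances killing (K_S = K_KILL * Cp); 0 if killing never dominates."""
    if K_KILL * c0 <= K_S:
        return 0.0
    return math.log(K_KILL * c0 / K_S) / K_EL

if __name__ == "__main__":
    for c0 in (0.0, 10.0, 100.0, 1000.0):
        tn = nadir_time(c0)
        print(c0, round(tn, 2), f"{survival_fraction(tn, c0):.4f}")
```

With these values the lowest dose never produces net killing, whereas higher doses yield a nadir followed by regrowth once Cp falls below ks/k = 60 units.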
3.6. More Complex Models
The main focus of this chapter was the introduction of commonly
used mechanistic pharmacodynamic models that can be readily
applied to toxicokinetics and dynamics. However, a number of
mechanistic processes may be required to adequately describe the
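$$\frac{dE_I}{dt} = \frac{k\,C_{PO}}{EC_{50,PO} + C_{PO}}\,E_A - k_r\left(1 + \frac{E_{max}\,C_{PRX}^{h}}{EC_{50,PRX}^{h} + C_{PRX}^{h}}\right)E_I - (k_{age}\,E_I). \tag{21}$$
$$T_E = T_{E0} + \frac{E_{max,T_E}\left(\frac{E_{A0}}{E_A} - 1\right)^{n}}{E_{50}^{n} + \left(\frac{E_{A0}}{E_A} - 1\right)^{n}}, \tag{22}$$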
4. Prospectus
Acknowledgments
References
1. Berger SI, Iyengar R (2011) Role of systems pharmacology in understanding drug adverse events. Wiley Interdiscip Rev Syst Biol Med 3:129–135
2. Levy G (1964) Relationship between elimination rate of drugs and rate of decline of their pharmacologic effects. J Pharm Sci 53:342–343
3. Levy G (1966) Kinetics of pharmacologic effects. Clin Pharmacol Ther 7:362–372
4. Mager DE, Wyska E, Jusko WJ (2003) Diversity of mechanism-based pharmacodynamic models. Drug Metab Dispos 31:510–518
5. Yates FE (1975) On the mathematical modeling of biological systems: a qualified ‘pro’. In: Vernberg FJ (ed) Physiological adaptation to the environment. Intext Educational Publishers, New York
6. Wagner JG (1968) Kinetics of pharmacologic response. I. Proposed relationships between response and drug concentration in the intact animal and man. J Theor Biol 20:173–201
7. Friberg LE, Isbister GK, Hackett LP, Duffull SB (2005) The population pharmacokinetics of citalopram after deliberate self-poisoning: a Bayesian approach. J Pharmacokinet Pharmacodyn 32:571–605
8. Minematsu T, Ohtani H, Yamada Y, Sawada Y, Sato H, Iga T (2001) Quantitative relationship between myocardial concentration of tacrolimus and QT prolongation in guinea pigs: pharmacokinetic/pharmacodynamic model incorporating a site of adverse effect. J Pharmacokinet Pharmacodyn 28:533–554
9. Laizure SC, Parker RB (2009) Pharmacodynamic evaluation of the cardiovascular effects after the coadministration of cocaine and ethanol. Drug Metab Dispos 37:310–314
10. Vage C, Saab N, Woster PM, Svensson CK (1994) Dapsone-induced hematologic toxicity: comparison of the methemoglobin-forming ability of hydroxylamine metabolites of dapsone in rat and human blood. Toxicol Appl Pharmacol 129:309–316
11. Furchgott RF (1955) The pharmacology of vascular smooth muscle. Pharmacol Rev 7:183–265
12. Sheiner LB, Stanski DR, Vozeh S, Miller RD, Ham J (1979) Simultaneous modeling of pharmacokinetics and pharmacodynamics: application to d-tubocurarine. Clin Pharmacol Ther 25:358–371
13. Yassen A, Kan J, Olofsen E, Suidgeest E, Dahan A, Danhof M (2007) Pharmacokinetic-pharmacodynamic modeling of the respiratory depressant effect of norbuprenorphine in rats. J Pharmacol Exp Ther 321:598–607
14. Stroh M, Addy C, Wu Y, Stoch SA, Pourkavoos N, Groff M, Xu Y, Wagner J, Gottesdiener K, Shadle C, Wang H, Manser K, Winchell GA, Stone JA (2009) Model-based decision making in early clinical development: minimizing the impact of a blood pressure adverse event. AAPS J 11:99–108
15. Nagashima R, O’Reilly RA, Levy G (1969) Kinetics of pharmacologic effects in man: the anticoagulant action of warfarin. Clin Pharmacol Ther 10:22–35
16. Dayneka NL, Garg V, Jusko WJ (1993) Comparison of four basic models of indirect pharmacodynamic responses. J Pharmacokinet Biopharm 21:457–478
17. Jusko WJ, Ko HC (1994) Physiologic indirect response models characterize diverse types of pharmacodynamic effects. Clin Pharmacol Ther 56:406–419
18. Sharma A, Jusko WJ (1998) Characteristics of indirect pharmacodynamic models and applications to clinical drug responses. Br J Clin Pharmacol 45:229–239
19. Sharma A, Ebling WF, Jusko WJ (1998) Precursor-dependent indirect pharmacodynamic response model for tolerance and rebound phenomena. J Pharm Sci 87:1577–1584
20. Woo S, Krzyzanski W, Jusko WJ (2008) Pharmacodynamic model for chemotherapy-induced anemia in rats. Cancer Chemother Pharmacol 62:123–133
21. Sun YN, Jusko WJ (1998) Transit compartments versus gamma distribution function to model signal transduction processes in pharmacodynamics. J Pharm Sci 87:732–737
22. Mager DE, Jusko WJ (2001) Pharmacodynamic modeling of time-dependent transduction systems. Clin Pharmacol Ther 70:210–216
23. Friberg LE, Henningsson A, Maas H, Nguyen L, Karlsson MO (2002) Model of chemotherapy-induced myelosuppression with parameter consistency across drugs. J Clin Oncol 20:4713–4721
24. Zandvliet AS, Schellens JH, Copalu W, Beijnen JH, Huitema AD (2009) Covariate-based dose individualization of the cytotoxic drug indisulam to reduce the risk of severe myelosuppression. J Pharmacokinet Pharmacodyn 36:39–62
25. Zandvliet AS, Siegel-Lakhai WS, Beijnen JH, Copalu W, Etienne-Grimaldi MC, Milano G, Schellens JH, Huitema AD (2008) PK/PD model of indisulam and capecitabine: interaction causes excessive myelosuppression. Clin Pharmacol Ther 83:829–839
26. Soto E, Staab A, Freiwald M, Munzert G, Fritsch H, Doge C, Troconiz IF (2010) Prediction of neutropenia-related effects of a new combination therapy with the anticancer drugs BI 2536 (a Plk1 inhibitor) and pemetrexed. Clin Pharmacol Ther 88:660–667
27. Jusko WJ (1971) Pharmacodynamics of chemotherapeutic effects: dose-time-response relationships for phase-nonspecific agents. J Pharm Sci 60:892–895
28. Fasanmade AA, Jusko WJ (1995) An improved pharmacodynamic model for formation of methemoglobin by antimalarial drugs. Drug Metab Dispos 23:573–576
29. Houze P, Mager DE, Risede P, Baud FJ (2010) Pharmacokinetics and toxicodynamics of pralidoxime effects on paraoxon-induced respiratory toxicity. Toxicol Sci 116:660–672
30. Earp J, Krzyzanski W, Chakraborty A, Zamacona MK, Jusko WJ (2004) Assessment of drug interactions relevant to pharmacodynamic indirect response models. J Pharmacokinet Pharmacodyn 31:345–380
31. Hazra A, Pyszczynski N, DuBois DC, Almon RR, Jusko WJ (2007) Modeling receptor/gene-mediated effects of corticosteroids on hepatic tyrosine aminotransferase dynamics in rats: dual regulation by endogenous and exogenous corticosteroids. J Pharmacokinet Pharmacodyn 34:643–667
32. Barras MA, Duffull SB, Atherton JJ, Green B (2009) Modelling the occurrence and severity of enoxaparin-induced bleeding and bruising events. Br J Clin Pharmacol 68:700–711
33. Jusko WJ, Ko HC, Ebling WF (1995) Convergence of direct and indirect pharmacodynamic response models. J Pharmacokinet Biopharm 23:5–8
INDEX
Brad Reisfeld and Arthur N. Mayeno (eds.), Computational Toxicology: Volume I, Methods in Molecular Biology, vol. 929,
DOI 10.1007/978-1-62703-050-2, © Springer Science+Business Media, LLC 2012
Binding (cont.)
  plasma ..... 153
  site ..... 31, 75, 77, 149, 154, 155, 241, 243, 261, 295, 352, 354, 356, 358, 361–363, 386, 387
  tissue ..... 288, 461
Bioaccumulation ..... 389
Bioavailability ..... 25, 28, 29, 285, 331, 333, 335, 337, 338, 340–343, 377, 389
Bioconductor ..... 253
Biomarker ..... 14, 23, 522, 528, 550, 556, 558, 560, 562, 568, 576, 577, 582, 588, 589
BioSolveIT ..... 77, 82
Bond
  acceptor ..... 88, 100, 138, 152, 177, 334, 353–356
  breaking ..... 139, 254, 335
  contribution ..... 113
  contribution method ..... 113, 115
  donor ..... 88, 100, 108, 111, 114, 138, 152, 177, 334, 353–355
Bone ..... 453, 519, 555
Boolean ..... 194, 195, 197
Bossa
  predictivity ..... 122, 199
Brain barrier
  penetration ..... 29, 80, 180, 331, 340, 341
  permeability ..... 29, 338–341

C

Caco
  cells ..... 335
  permeability ..... 321, 322, 331
Cadmium ..... 211, 517, 518, 522, 542, 546, 551, 555–562
Caffeine ..... 89, 90, 293
Calcium ..... 258
Cancer
  risk ..... 524, 516
  risk assessment ..... 455, 456
Capillary ..... 388, 389, 425–432, 434
Carbamazepine ..... 286, 287, 293
Carcinogen, carcinogenic
  activity ..... 455
  effects ..... 455
  potency ..... 4
  potential ..... 523
Cardiovascular ..... 209, 391, 580
Cell
  cycle ..... 48, 353
  growth ..... 51, 60, 67, 587
  membrane ..... 135, 386, 388, 427
  permeability ..... 12
  receptor ..... 14
  signaling ..... 14
CellML ..... 392, 395
CellNetOptimizer ..... 84
Cellular
  compartments ..... 388, 426
  systems ..... 426
Cephalosporins ..... 338, 340, 343–344
CHARMM ..... 144, 243–245, 247, 254, 262, 266
Chemaxon ..... 78, 95, 104, 166, 170, 176, 177, 200
Chembl ..... 218, 219, 220, 221
Chemical Entities of Biological Interest (ChEBI)
  identifier ..... 205
  ontology ..... 191, 200–203
  web ..... 205, 209
Cheminformatics ..... 73, 82, 191, 193, 223, 224, 238
ChemOffice ..... 95, 107, 110, 115
Chemometrics ..... 220
Chemotherapeutic ..... 293, 444, 586, 588
ChemProp ..... 94, 95, 104, 107, 110, 112, 115
ChemSpider ..... 91, 191, 223
Classification
  models ..... 338–340, 344, 355, 356
  molecular ..... 13, 72, 146
  qsar ..... 216, 355
  tree ..... 334, 339, 344, 345
Clearance
  drug ..... 237, 240, 293, 295, 296, 339, 343, 508
  metabolite ..... 237
  model ..... 297, 380
  process ..... 411
  rate ..... 24, 296, 342, 411, 440
ClogP ..... 181
Clonal growth ..... 8, 14–15
Cluster
  analysis ..... 175–176, 339, 341
  pbs ..... 263, 264, 272, 273
Cmax ..... 554
CODESSA ..... 78, 92, 93, 100, 106, 108, 109, 111, 114
Combinatorial chemistry ..... 24, 330
CoMFA. See Comparative Molecular Field Analysis (CoMFA)
Comparative Molecular Field Analysis (CoMFA) ..... 25, 26, 138, 339, 356
Compartmental
  absorption ..... 77, 308, 309, 334
  analysis ..... 384, 385, 390, 400, 410, 436, 518–524
  model ..... 290, 299–302, 369, 391–436, 440–441, 441, 518, 521, 523, 524, 525, 526
  systems ..... 384, 385, 387, 391, 435
CoMSIA ..... 25, 138, 356
Conformational
  dynamics ..... 241, 250, 252
  energetic ..... 139
  search ..... 31, 143, 144
  space ..... 33, 138, 166, 361, 362
Connectivity
  index ..... 344, 345
  topochemical ..... 344, 345
Consensus
  predictions ..... 89, 90, 331
  score ..... 32, 175, 363, 364
COPASI ..... 390
COSMOS ..... 98
Covariance ..... 471
  matrix ..... 251, 392
CRAN ..... 525
Crystal structure ..... 22, 99, 144, 146, 148, 149, 153, 155, 192, 240, 242, 246, 255, 354, 357–359, 362–364
Cytochrome
  metabolism ..... 285, 293, 316, 352
  substrate ..... 241, 242, 255, 285, 293
Cytoscape ..... 85
Cytotoxicity ..... 141, 215, 216

D

Database
  chemical ..... 191, 193, 195, 206, 210, 211
  forest ..... 101, 355
  KEGG ..... 23, 181
  network ..... 164
  search ..... 22, 29, 74, 166, 168, 195
  software ..... 73, 78
  support ..... 73, 191
  tree ..... 345
Decision ..... 83, 84, 103, 136, 183, 229, 339–341, 343, 344, 345, 449, 513
Derek ..... 83
Dermal ..... 135, 460
Descriptors
  chemical ..... 97, 101, 108
  models ..... 113, 120
  molecular ..... 25, 29, 30, 80, 117, 138, 177, 178, 216, 332, 335, 343, 355, 361, 362, 363, 364
  physicochemical ..... 97, 106, 111, 113
  prediction ..... 97
  properties ..... 91
  QSAR ..... 27, 28, 120, 193, 332
  QSPR ..... 88, 91, 113, 117, 120, 332
  topological ..... 97, 100, 108, 111, 178, 345
Developmental
  effects ..... 456
  toxicants ..... 134
  toxicity ..... 84
Diazepam ..... 241, 242, 293, 509
Dibenzofurans ..... 444
Dichlorodiphenyltrichloroethane (DDT) ..... 183
Dietary
  exposure ..... 523, 533, 534, 539, 543, 550, 562
  exposure assessment ..... 533, 534
  intake ..... 534, 535, 561, 562
Differential equations ..... 50, 52–54, 57–62, 65, 67, 71, 82, 300, 301, 311, 385, 388, 396, 410, 413, 431, 440, 442, 439, 445, 452, 474, 475, 478, 479, 521–523, 548, 567, 577, 583, 586
Dioxin ..... 101, 444, 456, 518, 552, 562–566
Discriminant analysis ..... 339
DNA
  binding ..... 75, 352, 516
Docking
  methods ..... 31, 37, 156, 354, 355, 358
  molecular ..... 135, 136, 140, 143, 147–150, 153, 240, 358, 362–363
  scoring ..... 32, 147, 154, 354, 358, 359, 364
  simulations ..... 149
  tools ..... 77, 246
Dose
  administered ..... 306, 440, 441, 457
  extrapolation ..... 443, 494, 496, 521, 532, 540
  metric ..... 440, 457, 505, 507, 509
  pharmacokinetic ..... 81, 505
  reference ..... 516, 527–533, 537
  response ..... 5, 12, 14, 456, 471, 523, 517, 518, 525–532, 538–541, 546, 550, 558–562
Dosimetry ..... 11–13
3D QSAR ..... 26–29, 135–141, 144, 150–153, 339
Dragon ..... 78, 92, 106, 177
Dragon descriptors ..... 78, 92, 106, 177
Drug
  absorption ..... 286, 335, 337, 339, 340
  binding ..... 237
Drug (cont.)
  clearance ..... 237, 240, 293, 295, 296, 339, 343, 508
  databases ..... 178, 210
  development ..... 2, 8, 80, 240, 330, 342, 346, 523, 590
  distribution ..... 283, 341, 384, 388, 582
  drug interactions ..... 216, 225, 311, 331, 343, 352, 356, 577, 586, 589, 590
  induced toxicity ..... 238
  metabolism ..... 28, 30, 216, 225, 237, 240, 291, 293, 343, 360
  plasma ..... 289, 294, 578, 579, 581, 587
  receptor ..... 577, 582, 586
  resistance ..... 48
  safety ..... 2, 11, 240
  solubility ..... 101
  targets ..... 144, 147, 192, 576, 577
DSSTox ..... 154, 218, 219, 223
Dynamical systems ..... 390

E

Ecological risk ..... 570
Ecotoxicity, ecotoxicology ..... 183
Effectors ..... 14, 53, 57–63, 65, 506, 576
Electrotopological ..... 101, 109
Elimination
  chemical ..... 459, 468
  drug ..... 290, 291, 294, 300
  model ..... 297
  process ..... 389, 522, 534, 536, 549
  rate ..... 296, 300, 373, 522, 584
Emax model ..... 579–582, 586
Endocrine
  disruptors ..... 240
  system ..... 353
Ensemble methods ..... 255
Enterocytes ..... 306, 307, 309, 324
Environmental
  agents ..... 24
  contaminants ..... 353, 444, 463, 553
  fate ..... 13, 78, 183, 219
  indicators ..... 12
  pollutants ..... 2
  protection ..... 2, 107, 154, 182, 361
  toxicity ..... 88, 162, 179, 183, 240
Enzyme
  complex ..... 209, 402
  cytochrome ..... 27, 241–242, 285, 293, 316, 352, 482, 508
  metabolites ..... 293, 309
  receptors ..... 139, 238, 239
  substrates ..... 285, 293
  transporters ..... 310, 322
EPISuite ..... 78, 95, 96, 98, 101, 102, 107, 108, 110, 112, 113, 115, 183
Epoxide ..... 180, 456
Estrogenic ..... 4
Ethanol ..... 293, 294, 296
Excel ..... 81
Expert systems ..... 81
Exposure
  assessment ..... 522, 517, 518, 524, 533–538, 540, 543–545, 552, 566–567
  dose ..... 11, 12, 310, 314, 315, 323, 534
  hazard ..... 219
  level ..... 4, 505, 539, 561
  model ..... 10, 11, 523, 519, 533–537, 550–551, 567
  population ..... 532–566
  response ..... 576
  route ..... 439, 494, 521
  scenario ..... 11, 443, 472, 494, 521, 526, 533

F

Fat
  compartment ..... 447, 448
  tissue ..... 442, 447, 448, 451, 478, 481, 463476
Fate and transport ..... 9–10
Fingerprint ..... 26, 27, 97, 142, 147, 168, 177, 181, 196–199, 226, 341, 355, 356, 361, 362, 364
Food
  additives ..... 2, 516
  consumption data ..... 543–545
  intake ..... 10, 313, 561, 562
  safety ..... 522
Force field ..... 13, 32, 34–36, 136, 138–140, 144, 146, 147, 149, 243–247, 254, 261, 262, 264, 265, 270, 273
Forest
  decision tree ..... 340
Formaldehyde ..... 162
Fortran ..... 79, 82, 84, 244, 390–392, 396, 444, 476, 478
Fractal ..... 338, 495
Fragment based ..... 102, 168
Functional
  groups ..... 36, 136, 138, 149, 166, 168, 171, 178, 180–182, 292
  theory ..... 140
  units ..... 440
Fuzzy
  adaptive ..... 337
G

Gastrointestinal ..... 209, 285, 286, 291, 300, 306–308, 313, 316, 321, 335, 378, 494, 505, 522, 556
GastroPlus ..... 24, 77, 80, 308–311, 316, 317, 324, 336, 445, 523, 524
Gene, genetic
  algorithms ..... 21, 31, 91, 106, 114, 166, 340, 362
  expression networks ..... 14
  function ..... 337, 340
  networks ..... 216
  neural networks ..... 106
  profiling ..... 141
  regulatory systems ..... 84
Genotoxicity ..... 5, 216
Glomerular ..... 207, 294, 389
Glucuronidation ..... 292, 343, 517, 552, 553, 3889
Glutathione ..... 293, 294, 455, 482–485
Graph
  model ..... 54
  theory ..... 12, 162, 166

H

Hazard
  assessment ..... 522
  characterisation ..... 522, 518–533, 538–540, 542
  identification ..... 522, 518–533, 538, 539
HazardExpert ..... 83
Hepatic
  clearance ..... 135, 295, 341, 343, 409, 411, 412, 422, 423
  metabolism ..... 289, 292
Hepatotoxic ..... 217
Herg
  blockers ..... 240
  channel ..... 135, 181, 243
  potassium channels ..... 238, 240, 242–243, 257
Hierarchical
  clustering ..... 540
  models ..... 517, 540, 548, 550, 560
HIV ..... 22
Homeostasis ..... 352
Homology ..... 148
  models ..... 30, 148, 149, 240, 242, 243, 258
Hormone
  binding ..... 153
  receptor ..... 238, 240, 351, 352
hPXR
  activation ..... 354–356
  agonists ..... 353, 355–359, 362
  antagonists ..... 353, 359–360

I

Immune
  cells ..... 52, 53, 57–60, 63, 65, 67, 69, 70
  model ..... 60–62
  response ..... 48, 60, 63, 65, 69, 70
Immunotoxicity ..... 237
InChI ..... 119, 164, 165, 172, 174, 189, 190, 193, 196, 200, 206, 210
Inducers ..... 293, 508
Information
  systems ..... 163, 183
Ingestion ..... 285, 457, 533
Inhalation ..... 135, 308, 316, 386, 441, 442, 449, 459–460, 477, 479, 484, 485, 494–496, 505, 521, 533
Integrated risk information system (IRIS) ..... 218
Interaction
  energy ..... 32, 33, 36–37, 139, 142, 243, 249
  fields ..... 78, 80, 81, 103
  model ..... 140, 143
  network ..... 19, 75
  rules ..... 32
Inter-individual variability ..... 501, 521, 525, 527, 528, 539, 559, 560
Interspecies
  differences ..... 516
  extrapolation ..... 442, 493–509, 520, 568
Intestinal
  absorption ..... 29, 80, 308, 331, 333, 335, 337, 341, 458
  permeability ..... 335
  tract ..... 322

J

Java ..... 82, 85, 176, 193, 205, 391
JSim ..... 82, 388, 390–397, 411, 422, 423, 430

K

Ketoconazole ..... 293, 359, 360
Kidney
  cortex ..... 556
k-nearest neighbor ..... 338–341
KNIME ..... 156
Kow (octanol-water partition coefficient) ..... 88, 94–99
Kyoto encyclopedia of genes and genomes (KEGG)
  pathway ..... 76
L

Langevin ..... 250, 259, 266, 271, 274
Leadscope ..... 136, 218
Least squares ..... 56, 57, 79, 106, 111, 138, 337–341, 425, 465, 466
Ligand
  binding ..... 28, 141, 241, 252, 352
  complex ..... 32, 34, 77, 82, 246, 363
  interactions ..... 31–33, 147, 360
  library ..... 30, 31, 33, 37
  receptor ..... 34, 135, 240, 353, 357, 363
  screening ..... 329
Likelihood
  functions ..... 548
  method ..... 251
  ratio ..... 548
Linear algebra ..... 385, 388
Linear discriminant analysis ..... 339
Lipinski, C.A. ..... 175, 333, 334, 342
Lipophilic ..... 94, 240, 295, 447, 448, 452, 456, 465, 520
Liver
  enzyme ..... 241
  injury ..... 216
  microsomes ..... 12, 509
  tissue ..... 343, 476, 481, 498
Logistic
  growth ..... 55, 56, 60, 67
  regression ..... 339
Lognormal ..... 521, 529, 531, 547, 553, 554, 559, 566
Log P ..... 25, 29, 30, 74, 88, 91, 94–100, 117, 118, 144, 309, 316, 322, 334, 345, 362
Lowest observed adverse effect level (LOAEL) ..... 84, 527–533, 538
Lungs ..... 288, 291, 409–412, 422, 440, 442, 448, 449, 455, 456, 459, 498, 499, 502, 519, 541

M

Madonna (Berkeley-Madonna) ..... 82, 383, 395, 444–448, 524
Malarial ..... 22
Mammillary ..... 376, 387, 409, 410
Markov chain Monte Carlo (MCMC) ..... 439, 523, 524, 550, 551, 560
Markup language ..... 174
Mathematica ..... 391, 523, 524
Matlab ..... 56, 79, 82, 84, 93, 388, 391, 396, 444–449, 523, 524, 531
Maximum likelihood estimation (MLE) ..... 341, 523, 541, 550, 551
MCMC. See Markov chain Monte Carlo (MCMC)
MCSim ..... 35, 445, 439, 523, 524, 549, 550
Mercury ..... 207, 211
Meta-analysis ..... 530, 539–543, 547, 551–555, 558
MetabolExpert ..... 80, 336
Metabolism
  (bio)activation ..... 24, 179, 352, 360, 387
  drug ..... 28, 30, 216, 225, 237, 240, 291, 293, 343, 360
  liver ..... 12, 216, 285, 286, 291, 293, 309, 343, 386, 441, 442, 443, 454, 457, 468, 478, 499–502, 507–509
  prediction ..... 12, 28, 30, 81, 84, 87, 225, 226, 309, 333, 335, 342, 343, 443, 445, 450, 451, 465, 470, 483, 486, 501, 509, 518, 546
  rate ..... 445, 455, 478, 483, 499–501
Metabolomics/metabonomics ..... 3, 568
MetaCore ..... 81, 85, 336
MetaDrug ..... 81, 336
Metal ..... 32, 75, 138, 152, 207
METAPC ..... 81, 338
Metasite ..... 81, 226, 336
Meteor ..... 81, 84, 336
Methanol ..... 296
Methemoglobin ..... 581, 588
Methotrexate ..... 290, 444, 448, 449, 453, 455, 459
Metyrapone ..... 241, 242, 255
MEXAlert ..... 81, 336
Michaelis-Menten equation ..... 388, 495
Microglobulinuria ..... 558–560, 562
Microsomes ..... 12, 286, 331, 498, 499, 502, 508, 509, 552
Milk ..... 536, 550
Minitab ..... 79, 93
Missing data ..... 314, 541
MLE. See Maximum likelihood estimation (MLE)
Modeling
  homology ..... 21–23, 30, 135, 142, 143, 146, 148, 153, 240, 245, 246, 258, 259, 262
  molecular ..... 4, 12, 13, 73, 79, 82, 95, 107, 110, 112, 134–137, 140–148, 151–152, 154, 568
  in vitro ..... 12, 23, 29, 140, 141, 216, 231, 308, 312, 313, 321, 323, 331, 332, 334, 356, 360, 443, 494, 500, 509, 589, 590
Models
  animal ........ 1, 353, 495
  biological activity ........ 25, 28, 138, 140, 150, 151, 518
  bone ........ 453, 519
  carcinogenicity ........ 5, 81, 83, 84, 183
  checking ........ 59, 193, 312, 384, 390, 391, 395, 396, 566
  development ........ 14, 69, 117, 139, 219, 231, 331, 442, 440, 441, 446, 447, 449, 467, 470, 473, 474, 482, 584, 586
  developmental ........ 84, 330, 456
  error ........ 451, 471
  evaluation ........ 441, 442, 469–474, 482, 527
  fitting ........ 311, 313–315, 321, 578
  identification ........ 450–451
  intestinal ........
  prediction ........ 184, 342, 439, 440, 442, 444, 465, 474, 482–484, 486, 501, 507
  refinement ........ 51, 70, 482
  reproductive ........ 240, 352, 538
  selection ........ 550
  structure ........ 21, 148, 442, 447, 449–463, 470, 474, 482–484, 486, 519, 549, 581, 583
  uncertainty ........ 5, 49, 451, 468, 470–473, 482, 523–525, 527, 528, 530, 531, 540, 544, 550, 552, 560, 567–569
  validation ........ 334, 470, 521, 549–551
MoKa ........ 78, 81, 95, 103, 105, 336
Molecular
  descriptor ........ 12, 25, 29, 30, 80, 117, 138, 177, 178, 216, 332, 335, 343, 355, 361–364
  docking ........ 77, 136, 140, 143, 146–150, 153, 155, 235, 240, 246, 356, 358, 362
  dynamics ........ 135, 136, 140, 143, 146–150, 153, 155, 156, 240, 246, 356, 358, 362
  fragments ........ 221, 252
  geometry ........ 106, 146
  mechanics ........ 32, 34, 36, 135, 136, 139–141, 144, 146, 152, 155, 239, 261, 337
  networks ........ 82, 362
  property ........ 178
  shape ........ 138
  similarity ........ 138, 339
  targets ........ 139–141
Molfile ........ 89, 167, 172, 189, 190, 192, 200, 204, 209
MOLPRO ........ 96, 98, 101, 102
Monte Carlo simulation ........ 33, 71, 439, 472, 520, 524, 526, 533, 535, 550, 557, 560
Multi-agent systems ........ 85
Multidimensional drug discovery ........ 216
Multidrug resistance ........ 153, 208
Multiscale ........ 2, 238, 385, 590
Multivariate
  analysis ........ 175, 332, 333, 337
  regression ........ 338
Mutagenicity
  alerts ........ 224
  prediction ........ 83, 84, 181
Myelosuppression ........ 586, 587
MySQL ........ 154
N
NAMD ........ 243–245, 255, 256, 259, 263–265, 270, 273
Nanoparticles ........ 4, 192, 388
Nasal/pulmonary ........ 307
Nearest neighbor ........ 101, 148
Nephrotoxicity ........ 207
Nervous system ........ 257, 289, 290, 342, 445
Network
  gene ........ 85
  KEGG ........ 23, 76, 181, 207
  metabolic ........ 12, 229, 333
  neural ........ 97, 106, 109, 111, 170, 178, 334, 338, 339
Neurotoxicity ........ 207
Newborn ........ 536
Newton method ........ 238, 247, 249
Nicotine ........ 516
Non
  bonded interactions ........ 139, 249, 363
  congeneric ........ 337
  genotoxic ........ 523, 516
Noncancer risk assessment ........ 455
Non-compartmental analysis ........ 77, 369–380, 384, 385, 390, 400, 410, 436, 523, 524
NONMEM ........ 390, 445, 523, 524, 548, 549, 551
Nonspecific binding ........ 342
No-observed-adverse-effect-level (NOAEL) ........ 516, 527–533
Nuclear receptor ........ 29, 135, 153, 240
Nucleic acids ........ 37, 192, 244, 246
Numerical
  integration ........ 238, 249, 521
  methods ........ 388, 432, 435
Nutrition Examination Survey (NHANES) ........ 10, 11
O
Objective function ........ 313, 466, 548
Occam’s razor ........ 49, 51, 120, 449
Occupational safety ........ 3
Ocular ........ 314
Omics ........ 3, 4, 19, 20, 85, 238, 568
Open MPI ........ 263, 264, 272, 273
OpenTox Framework ........ 224, 225, 230, 232
Optimization
  dosage ........ 33, 69, 389
  methods ........ 399
  pre clinical ........ 24, 329
Oral
  absorption ........ 74, 306, 335, 339, 458, 552, 554
  dose ........ 307, 316, 321, 322, 377, 477, 478, 479, 496
Organisation for Economic Co-operation and Development (OECD)
  guidelines ........ 123
  QSAR toolbox ........ 91, 96, 107, 110, 112, 115
Organochlorine ........ 182
Orthologous ........ 21
Outlier ........ 119, 120, 177, 312
Overfitting ........ 88, 89, 106, 114, 116, 120
Oxidative stress ........ 13
P
Paracetamol ........ 190, 196
Parameter
  estimation ........ 143–144, 471, 497, 504, 506, 507, 551
  scaling ........ 494, 496–498, 501–502, 506–507
Paraoxon ........ 455, 589
Parasite ........ 22
Partial least squares (PLS) ........ 79, 106, 138, 338, 339, 341, 355
Partition coefficient ........ 12, 78, 88, 94–99, 113–115, 239, 288, 334, 386, 442, 443, 447, 448, 452, 456, 460, 461, 463, 465, 466, 476, 494, 497, 500–502, 506, 519
PASSI toolkit ........ 85
Pathway
  analysis ........ 81, 85
  maps ........ 155
Pattern recognition ........ 337, 341
Penicillins ........ 295, 354
Perchlorate ........ 444
Perfusion ........ 297, 445, 446, 448, 519–520
Permeability ........ 12, 29, 119, 144, 156, 285, 306–309, 313, 316, 321–323, 325, 331, 333, 335–342, 395, 427–429, 460, 463, 520
  brain barrier ........ 339–341
  drug ........ 307, 316, 335, 337–342
  intestinal ........ 29, 335
  in vitro ........ 12, 308, 312, 313, 321–323, 331, 519
Pesticide ........ 101, 165, 182, 183, 201, 207, 282, 310, 351, 352, 356, 358, 361, 516
Pharmacogenomics ........ 329
Pharmacophore ........ 25, 28–30, 37, 73, 82, 135–144, 151–153, 177, 178, 226, 353, 354, 360, 361
Phosphatidylcholine ........ 247, 259
Physiome
  JSim models ........ 391
  project ........ 82
PhysProps ........ 78, 91, 96, 110, 112
Pitfalls ........ 35, 73, 261, 320
pKa ........ 29, 30, 78, 79, 81, 87, 94–96, 99, 102–105, 239, 246, 321, 322
Plasma
  concentration ........ 282, 288, 289, 309, 313–320, 322, 369–373, 375, 379, 380, 385, 386, 388, 389, 552, 553, 581, 582, 589
  protein binding ........ 80, 294, 295, 322, 331, 335, 337, 338, 340, 342
Pollution ........ 78, 222, 310, 533
Polychlorinated biphenyls (PCBs) ........ 101, 106, 110, 444
Polycyclic aromatic hydrocarbons (PAHs) ........ 516
Polymorphism ........ 136, 517, 552
Pooled data ........ 578
Poorly perfused tissues ........ 442, 448, 451
Population based model ........ 534, 537, 562–565
Portal vein ........ 285, 309
Posterior distribution ........ 560–561
Potassium channel ........ 222, 238, 240–243, 254, 257
Predict
  absorption ........ 29, 335
  ADME parameters ........ 331, 333, 335–343
  aqueous solubility ........ 99, 101–102
  biological activity ........ 25, 138, 151, 518
  boiling point ........ 94, 108–110, 178
  carcinogenicity ........ 183
  clearance ........ 12
  CNS permeability ........ 30, 335
  cytochrome P450 ........ 4, 28, 30, 225, 237, 239, 241–242, 255, 285, 293, 316, 343, 352, 482, 500, 508, 517
  developmental toxicity ........ 84
  fate ........ 8–13, 78, 80, 81, 84, 134, 135, 183, 219, 284, 306, 343, 386, 523, 576
  genotoxicity ........ 216
  Henry constant ........ 94–96, 113–115, 239
  melting point ........ 91, 94–96, 100, 102, 105–108, 111, 178
  metabolism ........ 87, 329, 330, 333, 335, 339, 340, 342, 343
  mutagenicity ........ 83, 84, 181, 183, 224, 226
  pharmacokinetic parameters ........ 317, 329–346
  physicochemical properties ........ 87–123
  safety ........ 1, 11
  toxicity ........ 4, 30, 110, 184, 238, 526
PREDICTPlus ........ 78, 96, 107, 110, 112
Pregnancy ........ 445, 536
Pregnane Xenobiotic receptors ........ 351–361
Prior distribution ........ 560–561
Prioritization ........ 2, 136, 146, 542
  toxicity testing ........ 1, 4
Procheck ........ 251, 258
ProChemist ........ 79, 93, 96
Progesterone ........ 203, 241, 255
Project Caesar ........ 83
Propranolol ........ 316–318
ProSa ........ 80, 258
Protein
  binding ........ 12, 77, 80, 135, 155, 294, 295, 322, 331, 335, 337, 338, 340, 342
  databank (PDB) ........ 22, 155, 173, 245–247, 255–259, 354, 357, 362
  docking ........ 31, 246
  folding ........ 21, 239
  interaction ........ 21, 75, 76, 360
  ligand ........ 28, 31–33, 77, 82, 141, 147, 246, 360, 363
  structure ........ 21–22, 30, 144, 148, 245–246, 251, 257, 259, 262, 263, 266, 269, 362
  targets ........ 21, 27, 28, 37, 135, 144, 154, 245, 386
Proteomics ........ 3, 19, 21, 75, 568
Prothrombin ........ 582
Pulmonary ........ 13, 207, 307, 314, 441, 443, 459, 519
Q
QikProp ........ 12, 74, 80, 101, 102, 144, 336
QSARpro ........ 78, 92, 93
Quantum chemical descriptors ........ 97, 101, 108
R
R (Statistical software) ........ 79, 93, 323, 525, 527
Random
  effects ........ 521, 543, 547, 549
  forest ........ 101, 173, 340, 341, 343, 344, 355
Ranking ........ 31, 207, 552, 553
Reabsorption ........ 294–296
Reactive intermediates ........ 442
Receptor
  agonists ........ 136, 137, 155, 353, 357
  AhR ........ 240
  antagonists ........ 136, 137, 155, 353, 360
  binding affinity ........ 146, 149, 333
  mediated toxicity ........ 351, 352
Recirculation ........ 412, 410, 422, 428, 448, 452
Reconstructed enzyme network ........ 169, 567
Reference dose (RfD) ........ 516, 527–533, 537
Relational databases ........ 154, 175, 177, 219
Renal clearance ........ 295, 339, 343
Reproductive toxicity ........ 538
Rescaling ........ 250
Residual errors ........ 56, 424, 549
Respiratory ........ 13, 306, 316, 391, 459, 473, 505, 582, 589
  system ........ 306, 316, 391
  tract ........ 13, 306, 505
RetroMEX ........ 81, 336
Reverse engineering ........ 72
Richly perfused tissues ........ 442, 445, 451, 453
Risk
  analysis ........ 509, 513
  characterization ........ 522, 523, 537–538
  estimation ........ 183, 215, 494, 507, 523, 527, 528, 530, 532, 557
  Integrated Risk Information System (IRIS) ........ 218
  management ........ 516
Risk/safety assessment
  chemical ........ 154, 442, 513, 522, 521, 530, 533, 541, 546, 549, 552, 553, 567–569
  pharmaceutical ........ 1, 2, 215, 330, 552
  screening ........ 2, 154, 330
  testing ........ 2, 182, 183, 216, 219, 330, 526, 539, 568
Robustness ........ 115, 118, 217, 532, 562
S
Saccharomyces cerevisiae ........ 25
Salicylic acid ........ 295, 296, 399, 400, 402
Sample size ........ 523, 518, 528, 531, 540, 542, 559
SBML ........ 392, 395
Scalability ........ 244, 245
Scaling
  factor ........ 36, 216, 464, 496, 500, 507, 508
  procedure ........ 498, 500
SCOP ........ 82, 444–447, 441
Scoring function ........ 31–33, 149, 363
Screening
  drug ........ 223
  drug discovery ........ 24, 34, 37, 136, 142, 154, 223, 329–331, 333
  environmental chemicals ........ 4
  HTS ........ 4, 30, 37, 73, 153, 238, 330
  methods ........ 4, 37, 147, 357
  protocols ........ 141
Searchable toxicity database ........ 218
Secondary structure prediction ........ 21, 22, 75, 254
Selectivity index ........ 241, 243, 353
Self-organizing maps ........ 334
Sensitivity
  analysis ........ 392, 396, 398–399, 414–415, 446, 439, 448, 471, 472, 507, 521, 531, 550
  coefficient ........ 472
Sequence
  alignment ........ 21, 22, 74
  homology ........ 148
Serum albumin ........ 74, 135, 153, 339
Shellfish ........ 10, 209, 516
Signal transduction ........ 84, 586
SIMCA ........ 83
Simcyp ........ 80, 311, 445, 523, 525, 566
Similarity
  analysis ........ 29
  indices ........ 138, 364
  search ........ 168–169, 175, 195, 196, 198, 199, 209
Simplified Molecular Input Line Entry System (SMILES) ........ 77, 89, 110, 162–165, 172, 174, 190, 191, 193, 206, 209, 362
Simulated annealing ........ 31, 407
Sinusoids ........ 430
Size penalty ........ 363
Skin
  lesion ........ 14
SMARTCyp ........ 226
SMBioNet ........ 84
SMILES. See Simplified Molecular Input Line Entry System (SMILES)
SMiles ARbitrary Target Specification (SMARTS) ........ 103, 163, 172, 229
Smoking ........ 293, 508, 562
Sodium ........ 135, 258, 260, 261, 273, 296
Soil ........ 10, 13, 88, 533, 569
Solubility prediction ........ 99, 101, 102
Source-to-effect continuum ........ 8
SPARC ........ 13, 74, 78, 96, 98, 101–105, 110, 112, 113, 115
SPARQL ........ 220–222, 224
Sparse data ........ 379–380, 390, 445, 528
Species
  differences ........ 136, 353, 439, 463, 506, 516, 520, 542
  extrapolation ........ 443, 456, 469, 470, 496, 520, 526, 568
  scaling ........ 313, 443, 464, 494–498, 500–502, 504–506
  specific ........ 464, 497, 568
SPSS ........ 79, 93
Statistica ........ 79, 83, 93
Stereoisomers ........ 28, 361, 427
Steroid ........ 352, 356–359
Stomach ........ 308, 458, 480, 481, 498, 502
Storage compartment ........ 451–454
Stress response ........ 13, 138
Structural
  alert ........ 224
  similarity ........ 142, 168, 198
Structure-activity relationship (SAR)
  analysis ........ 358
  methods ........ 38, 358
  model ........ 219
Styrene ........ 217, 441–445, 447–453, 455, 456, 459, 465, 475, 476
Sub
  cellular ........ 75, 155, 498, 499
  compartments ........ 586
Substrate
  active site ........ 240–242
  binding ........ 135, 240, 241, 412, 413
  inducers ........ 293
  inhibitor ........ 293
Substructure
  searching ........ 164, 169, 198
  similarity ........ 30, 168, 169, 175, 195, 196, 198, 222, 356
Sugar ........ 250, 386
Sulfur dioxide ........ 20
Supersaturation ........ 313
Support vector machine (SVM) ........ 23, 30, 97, 101, 334, 338–341, 355, 361
Surrogate endpoint ........ 13
Surveillance programs ........ 65
Switch ........ 75, 200, 211, 407, 426, 457
SYBYL ........ 77, 93, 153, 164
Systems
  biology ........ 21, 23–24, 75, 80, 216, 238, 567, 568
  pharmacology ........ 81, 591
  toxicology ........ 12, 215–232, 567, 568, 575, 576
T
Tanimoto coefficient ........ 198, 199, 364
Teratogenicity ........ 455
Tetrachlorodibenzo dioxin (TCDD) ........ 562
Tetrahymena pyriformis ........ 99
Theophylline ........ 282, 293, 294, 318, 319
Therapeutic
  doses ........ 286, 287, 290, 294, 295, 302, 386
  index ........ 576
Thermodynamic properties ........ 261
Threshold value ........ 30, 73
Thyroid ........ 240, 498
Tissue
  dosimetry ........ 11
  grouping ........ 446–449
  partition coefficient ........ 463, 466, 497
  volumes ........ 11, 464, 473, 478
Tmax ........ 299, 313, 320, 341, 345, 389, 397
Tolerable
  daily intake ........ 516, 537
  weekly intake ........ 523, 539
Tolerance ........ 589
TopKat ........ 84, 178
Topliss tree ........ 122
Topological
  descriptor ........ 97, 100, 108, 111, 178, 345
  index ........ 344
Topological Polar Surface Area (TPSA) ........ 30, 181, 362
Total clearance ........ 297, 341, 523, 552
ToxCast program ........ 154
Toxic equivalency factors (TEF) ........ 562
Toxicity/toxicological
  chemical ........ 87, 219, 237, 238, 239, 455
  database ........ 225, 231
  drug ........ 237–240, 262, 288, 386
  DSSTox ........ 154, 218, 219, 223
  endocrine disruption ........ 240, 352
  endpoint ........ 84, 89, 118, 215
  environmental ........ 88, 162, 179, 183, 240
  estimates ........ 505
  mechanism ........ 232, 237, 238, 262, 468
  organic ........ 135, 181, 182
  pathways ........ 2, 14
  potential ........ 13, 178–179
  prediction ........ 4, 30, 118, 184, 238
  rodent carcinogenicity ........ 84
  screening ........ 238
  testing ........ 1, 2, 4, 14, 15, 182, 183, 219, 568
Toxicogenomics ........ 5, 81, 134, 136, 141, 142, 148, 152, 153
Toxicologically equivalent ........ 494, 495, 501, 505
Toxicophore ........ 138, 152, 179–181
TOXNET ........ 218
ToxPredict ........ 334
ToxRefDB ........ 218, 219
Toxtree ........ 83, 84
Tracers ........ 384, 390, 427–436
Training sets ........ 13, 25, 26, 27, 37, 83, 90–92, 97, 103, 106, 108, 109, 112, 118–120, 122, 147, 149, 151, 177, 217, 229, 337–341, 354–356, 361
Transcription factor ........ 240, 351
Transcriptome ........ 3, 75, 568
Transduction ........ 24, 84, 577, 578, 586
Transit compartment ........ 586, 587
Translational research ........ 577
Transport
  mechanisms ........ 331
  models ........ 9, 10, 13
  proteins (transporters) ........ 295, 342
Tree ........ 81–84, 103, 136, 203, 208, 229, 334, 339–341, 343–345
  self organizing ........ 334
Trichloroacetic acid ........ 165, 455, 456
Tumor ........ 14, 48, 50–67, 69, 70
Turnover ........ 576, 584, 585, 588
Tyrosine ........ 590
U
UML ........ 82
Urinary cadmium concentration ........ 556–558
Urine ........ 294–296, 338, 340, 386, 389, 399, 400, 411, 440, 498, 522, 550, 556, 558
V
Valacyclovir ........ 323–326
Validation
  external ........ 114, 122, 123, 337, 470
  internal ........ 106, 151, 217, 470, 578
  methods ........ 242, 356
  QSAR ........ 26, 115, 122–123, 149, 151, 356, 566, 568
  techniques ........ 26, 71–85, 123, 443, 578
van der Waals ........ 32, 36, 139, 247–249
Vapor pressure ........ 94–96, 108, 110–113, 115
Variability ........ 2, 10, 136, 314, 318, 335, 448, 473, 501, 508, 513–569, 577
Variable selection ........ 552
Vascular endothelial ........ 426, 427, 434
VCCLAB ........ 79, 96, 98, 101, 102, 104, 105
Venlafaxine ........ 293, 294
Vinyl chloride ........ 456
Virtual
  ADME tox ........ 29, 230, 231, 331, 334, 336, 342
  high throughput screening (vHTS) ........ 30, 31, 33, 37, 77, 82
  libraries ........ 330
  screening ........ 24, 29–31, 37, 77, 147, 148, 155, 181, 357
  tissue ........ 4, 5, 15
VolSurf ........ 80, 92, 336, 355
Volume of distribution ........ 24, 288, 290, 299, 301, 333, 338–342, 376, 379, 389, 428, 455
W
Warfarin ........ 242, 293, 582
WinBUGS ........ 524, 533, 549–551, 560
WinNonLin ........ 24, 77, 380, 445, 526
WSKOWWIN ........ 101, 102, 183
X
XPPAUT ........ 383, 390, 395