QSAR Seminar Report Final
POLLUTANTS
by
Srivastav Ranganathan
Seminar Guide/Supervisor:
Prof.Sumathi Suresh
2010
Declaration
I declare that this written submission represents my ideas in my own words and where others' ideas or
words have been included, I have adequately cited and referenced the original sources. I also declare
that I have adhered to all principles of academic honesty and integrity and have not misrepresented or
fabricated or falsified any idea/data/fact/source in my submission. I understand that any violation of
the above will be cause for disciplinary action by the Institute and can also evoke penal action from
the sources which have thus not been properly cited or from whom proper permission has not been
taken when needed.
Signature:
Name of Student:
Srivastav Ranganathan
Roll No : 10318007
1 Introduction
5 QSAR in Environmental Toxicology: Case Study on `Estrogenic Activity of Anthraquinones`
CHAPTER 1:
AN INTRODUCTION TO ENVIRONMENTAL TOXICOLOGY AND
QSAR
Human civilization has made rapid progress in the past century, and the technological
advances made with each passing decade have been substantial. These advances, coupled
with a rapidly growing population, have led to the introduction of man-made chemicals and
materials into the environment, many of which have disrupted the functioning of the
ecosystem to an extent that might eventually be detrimental to its intricate balance.
Although natural systems have a buffer to protect themselves against the human-introduced
toxic substances, the rate and amounts at which the toxic substances are released into the
environment do not allow the systems in nature to acclimatize and develop defence
mechanisms against these toxins. Hence, there exists a need to do an intensive assessment of
the potential toxic effects of the chemicals prior to their release into the surrounding
environment.
Environmental toxicology is the branch of science that deals with the impact of pollutants
and chemicals on the structure and functioning of ecosystems. It involves a
multidisciplinary approach which requires knowledge of molecular biology, ecology,
chemistry, genetics, biochemistry, mathematics, computational modelling and many other
fields to assess the eventual fate and toxic effects of a chemical or pollutant on the
ecosystem components (Table 1).
Table 1: List of subject areas which contribute to an assessment of the fate and toxic effects
of pollutants in the environment
Ecology, Risk and Impact Assessment: basis to study the impact of a pollutant once it
gains entry into the environment.
The seminar report has been organized into the following chapters :
Chapter 1 deals with an introduction to QSAR studies and its place in the broad
toxicological framework.
Chapter 2 discusses QSAR modeling and the various steps involved in developing a
QSAR model.
Chapter 3 discusses some special cases and modifications to QSAR modeling with
respect to chronic toxicity and to cases where the metabolites are more toxic than
the parent compound.
Chapter 4 discusses how the pollutants are classed into various groups in order to be
used as a training set for QSAR, based upon their chemical structure or biological
activities.
Chapter 5 summarizes a case study for predicting the interaction of anthraquinone
model compounds with estrogen receptor for exertion of estrogenic activity.
AN INTRODUCTION TO QSAR
The basic concept of QSAR is that chemicals which are similar in nature will behave
similarly in biological systems. However, the classification of chemicals as similar or
different and the choice of properties to decide the same are of key importance in QSAR.
Similarity must thus be described in relation to specific contexts and must take specific
attributes into consideration as well. For example, stereo-isomers may seem very similar in
structure but differ significantly in activity in biological systems.
QSAR uses the chemical and computational modeling approach to extrapolate the effects of
tested compounds to untested compounds which are similar in nature. Such models have been
successful in the estimation of toxicological endpoints like carcinogenicity, mutagenicity and
endocrine disrupting activity.
QSAR application in the domain of toxicological studies can be broadly divided into the
following categories:
The European Union legislation has put forth the requirement to implement QSAR in order to
assess the toxicity of chemicals according to the Registration, Evaluation, Authorisation and
Restriction of Chemicals (REACH) program (EU 2006).
1.2) The need for Computational methods and QSAR for toxicological assessment
According to USEPA and OECD reports, the current inventory of commercial and
industrial chemicals exceeds 160,000 and the number is growing at a rate of 3000 new
chemicals every year. The range of chemicals includes those from pharmaceuticals,
cosmetics and personal healthcare products, industrial chemicals and pesticides.
In addition to the fact that the number of chemicals is growing at an alarming rate, the
more worrisome fact is that the traditional toxicological testing assays only achieve
testing of about 500 chemicals every year. Thus, related data for environmental
effects and fate exists for only 20% of the chemicals.
The time lag between testing of chemicals and their widespread use could lead to
irreparable damage to the ecosystem by the time the harmful effects and toxicity
levels of certain chemicals are known and regulatory norms are established. This
poses a unique challenge in front of environmental toxicologists to look at non-testing
alternatives which would help us in prediction of toxicological endpoints of chemicals
at a faster rate and prioritization according to the toxicities predicted by these
methods.
Laboratory testing of chemicals in animals is time consuming and expensive.
Screening-level assessments can cost $1-5M, while comprehensive risk
assessments can cost more than $60M in testing and analysis (USEPA).
In vivo experiments conducted in lab animals may also be of limited relevance
to human beings because of species-to-species differences.
In vitro studies consist of administration of the pollutant to cells cultured in vitro.
However, these results may not be reliable because of various other factors which come
into play when the entire biological system is taken into account.
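The gap between chemical introduction and testing can be made concrete with a little
arithmetic. The sketch below uses only the figures quoted above and assumes, purely for
illustration, that the rates stay constant and that the 20% coverage applies to the whole
inventory; the function name is hypothetical.

```python
# Figures quoted in this section; constant rates are an illustrative assumption.
INVENTORY = 160_000        # commercial and industrial chemicals
NEW_PER_YEAR = 3_000       # new chemicals introduced each year
TESTED_PER_YEAR = 500      # chemicals covered by traditional assays each year
FRACTION_WITH_DATA = 0.20  # share with environmental effects/fate data


def untested_after(years: int) -> int:
    """Chemicals lacking data after `years`, assuming constant rates."""
    untested_now = round(INVENTORY * (1 - FRACTION_WITH_DATA))
    return untested_now + years * (NEW_PER_YEAR - TESTED_PER_YEAR)


if __name__ == "__main__":
    for years in (0, 10, 20):
        print(years, untested_after(years))
```

Under these assumptions the backlog grows every year, because more chemicals are introduced
than are tested, which is the motivation for non-testing alternatives such as QSAR.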
As observed from figure 1, QSAR studies draw on data from previously performed
toxicological studies using traditional assays and in turn provide data to prioritize in vitro
assays and animal studies. The data from the available studies have to be coherently
maintained and organized along with related biochemical and biological information.
The predictive results from QSAR models then contribute to the setting of new, safer
regulatory norms by regulatory agencies.
Insight from QSAR models into the toxicological profiles of chemicals enables industries
to focus on identifying and developing alternatives to compounds with more toxic
profiles. Thus, it is a mutual, inter-connected relationship in which industry, academia
(academic research) and regulatory agencies have to work in tandem to set new acceptable
norms and thereby minimize the impact of chemicals on the environment.
Fig 1. The path of work flow for the use of drug and chemical toxicity databases and models,
starting from the source of data to the goal of predicting environmental health effects
(Valerio Jr, 2009)
CHAPTER 2:
PRINCIPLES FOR DEVELOPING ENVIRONMENTAL QSARS
Note: Although the steps are described in a sequential order above, a considerable amount
of trial and error is involved in selecting the molecular descriptor and endpoint. Thus,
depending on the nature of the correlation, the procedure involves going back and forth in
the above schematic in order to optimize the choice of dataset, endpoint or descriptor.
Selection of a training set or dataset of chemicals goes hand in hand with the choice of
molecular descriptor. The dataset to be chosen depends on the biological endpoint to
be modeled. It is assembled by grouping chemicals with known biological endpoints and
molecular descriptors on the basis of different parameters.
i) One such approach is grouping the chemicals on the basis of their biological
targets, e.g. protein targets like enzymes, hormones etc.
ii) The other approach is grouping the chemicals on the basis of their
structures, e.g. classification into chemical classes like
carbamates, polyaromatic hydrocarbons, polychlorinated biphenyls etc.
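The two grouping approaches can be sketched as a simple partition of a dataset by one of
two keys. The example below is a minimal illustration: the chemicals and their modes of
action are taken from the summary table in Chapter 4 of this report, while the record
layout and the function name `group_by` are assumptions made for the sketch.

```python
from collections import defaultdict

# Chemicals and endpoints as listed in the Chapter 4 summary table.
CHEMICALS = [
    {"name": "Aldrin", "chem_class": "organochlorine", "endpoint": "carcinogen and EDC"},
    {"name": "Dieldrin", "chem_class": "organochlorine", "endpoint": "carcinogen and EDC"},
    {"name": "Malathion", "chem_class": "organophosphate",
     "endpoint": "cholinesterase inhibition"},
    {"name": "Profenofos", "chem_class": "organophosphate",
     "endpoint": "cholinesterase inhibition"},
]


def group_by(records, key):
    """Partition records into candidate training sets keyed by `key`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["name"])
    return dict(groups)


by_structure = group_by(CHEMICALS, "chem_class")  # approach ii)
by_endpoint = group_by(CHEMICALS, "endpoint")     # approach i)
print(by_structure)
print(by_endpoint)
```

For this small sample the two partitions coincide; for larger and more diverse datasets
they generally differ, which is why the choice of grouping matters.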
This relationship was developed from a variety of aquatic organisms and its dataset included
51 organic compounds. The choice of a well defined biological endpoint is of immense
significance in developing a reliable QSAR model. The data must be obtained from
standardized test results. The most commonly used endpoints in environmental toxicology
are LC50, LD50 or EC50 values, which indicate the concentration or dose that is lethal to
(LC50, LD50) or produces a defined effect in (EC50) 50% of the test population. These
values are generally expressed in molar concentrations. The data are then converted to a
logarithmic scale in order to avoid issues associated with regression analysis. While
choosing the training set and using biological data for endpoints, it must be ensured that:
Another cause of variability is the presence of impurities in the tested chemical. If
impurities which have a synergistic or toxicity-lowering effect on the compound in
question are present, there would be major variance in the data for these endpoints; such
data must therefore be avoided.
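The conversion of an endpoint to molar units and then to a logarithmic scale, mentioned
above, can be sketched as follows. This is a minimal illustration, not a prescribed
procedure: the function names and the example compound (LC50 of 10 mg/L, molar mass of
100 g/mol) are hypothetical.

```python
import math


def to_molar(conc_mg_per_l: float, molar_mass_g_per_mol: float) -> float:
    """Convert a mass concentration (mg/L) to a molar concentration (mol/L)."""
    return (conc_mg_per_l / 1000.0) / molar_mass_g_per_mol


def p_endpoint(conc_mol_per_l: float) -> float:
    """Negative log10 of a molar endpoint (e.g. pLC50); larger means more toxic."""
    if conc_mol_per_l <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(conc_mol_per_l)


# Hypothetical compound, for illustration only.
lc50_molar = to_molar(10.0, 100.0)
print(p_endpoint(lc50_molar))
```

Working with the negative logarithm keeps endpoint values that span several orders of
magnitude on a comparable scale before regression.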
The second step involves choosing a particular property which is linked to the chemical
structure of the compounds. In order to choose a molecular descriptor on which the model
would be based, one has to know how the descriptors are linked to the chosen biological
endpoint or to the chemical behaviour of the compound. Hence, it is the structural
representation part of QSAR.
The values of the molecular descriptors are arranged in a particular order and their
relationship to the numerical value of the biological endpoint is observed. If no trend is
observed, the molecular descriptor is not related to the biological endpoint and thus cannot
be used to model it. Molecular descriptors are generally measured behaviours of the
compounds, expressed numerically. QSAR models in environmental toxicology generally use
molecular descriptors that are linked to experimentally measurable physicochemical
properties of the chemicals. Studies have suggested that biological responses to various
chemicals are linked to their hydrophobicity, electronic properties, steric effects etc. As
an example, hydrophobicity is commonly quantified as the ratio of the concentration of the
compound in octanol to its concentration in water after it has partitioned between the two
phases (the octanol-water partition coefficient, Kow, usually reported as log P). It gives
an indication of the tendency of the compound to accumulate in lipid deposits of the body
or to cross lipid membranes in order to exert a biological effect. In general, an increase
in hydrophobicity manifests itself as an increase in the ability of the compound to cross
the cell membrane, thereby exerting a greater biological response. This holds only up to a
point, and the fate of a pollutant in a biological system depends on its hydrophobicity
because:
i.) Compounds that are too hydrophobic will not show any solubility in the aqueous
phase.
ii.) If the hydrophobicity is extremely high, the compound would get trapped in fat
deposits and never reach the target site.
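The hydrophobicity descriptor and the check for a trend between descriptor and endpoint
can be sketched in a few lines. This is an illustrative sketch only: the function names are
assumptions, the Pearson correlation is one simple way of checking for a linear trend, and
the paired values are made up.

```python
import math


def log_p(conc_octanol: float, conc_water: float) -> float:
    """log10 of the octanol-water partition coefficient (Kow)."""
    return math.log10(conc_octanol / conc_water)


def pearson_r(xs, ys):
    """Pearson correlation between descriptor values and endpoint values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up descriptor/endpoint pairs, for illustration only.
log_p_values = [1.0, 2.0, 3.0, 4.0]
p_lc50_values = [2.1, 2.9, 4.2, 4.8]
print(pearson_r(log_p_values, p_lc50_values))
```

A correlation close to zero would suggest that the descriptor is not linearly related to
the endpoint; a non-linear trend, such as the fall-off at very high hydrophobicity
described above, would need a different check.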
Many algorithms are available in the literature for calculating molecular descriptors.
These can be readily computed for existing, new and in-development chemicals and, when
judiciously combined, give a multivariate description of the molecules.
QSAR uses data from a variety of sources, not all of which may have been acquired using a
uniform protocol, and this might cause problems in statistical assessment. The difficulty
of measuring certain toxicological (biological) endpoints accurately might cause additional
statistical difficulty. Hence external validation methods, such as the use of an external
test set, are routinely used to validate the model. Other statistical measures applied to
assess model reliability are the RMSE (root mean square error) and the squared correlation
coefficient (R²), which are internal validation methods.
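The validation statistics named above can be sketched with a one-descriptor linear model.
The example is a minimal sketch under stated assumptions: a single descriptor, an ordinary
least-squares fit, and made-up training and external test values; the function names are
hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def rmse(observed, predicted):
    """Root mean square error between observed and predicted endpoints."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted))
                     / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination; assumes observed values are not all equal."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up training set and external test set (descriptor, endpoint).
train_x, train_y = [1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 2.9, 4.1, 5.0, 6.0]
test_x, test_y = [1.5, 3.5], [2.4, 4.6]

slope, intercept = fit_line(train_x, train_y)
train_pred = [slope * x + intercept for x in train_x]
test_pred = [slope * x + intercept for x in test_x]
print("internal:", rmse(train_y, train_pred), r_squared(train_y, train_pred))
print("external:", rmse(test_y, test_pred))
```

Statistics computed on the training set describe goodness of fit, while the error on the
external test set gives a more honest indication of predictive ability.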
The choice of training set is considered to be the most important factor in deciding the
accuracy of a QSAR model. The choice of a dataset depends on the type of predictive model
which is desired. The primary aim is to ensure that the chosen chemicals are representative
in terms of biological activity or chemical structure.
[Flowsheet figure; legible label: Assessment of Outliers]
Fig 4: Choice of training set and a flowsheet governing the choice.
The flowsheet shown in figure 4 represents how the choice of dataset influences the model
accuracy. Once a dataset on which the model is to be built has been chosen and the
relationship/correlation between the chemical and biological aspects has been established,
the model is tested using a test set of chemicals whose toxicological endpoints are known,
and the quality of prediction is derived from this test. If the accuracy of the prediction
is not up to the desired level, it indicates that some of the data used were inaccurate or
that a revision of the molecular descriptor choice is necessary. A chemical acting as an
outlier is eliminated only when it has been clearly identified as having a biological
activity altogether different from that of the training set.
In the course of developing most QSAR models, a major hurdle is that the predictions for
some chemicals of the training set are poor and inaccurate. Such a situation is known as
the problem of statistical outliers. Understanding the behaviour of such outliers is of
great help in gaining a better insight into the mechanisms of toxicity at a more
fundamental level, such as the biochemical level, and also helps to reassess any errors in
data choice made in the previous steps.
Possible reasons for a statistical outlier include:
Some of the endpoint values of the training set are inaccurate, and/or the protocol
used to acquire these endpoints was not standardized.
The chemical which is a statistical outlier exhibits a biochemical mode of action
which is different from that of the training set.
Metabolic products of the chemical, and not the parent molecule itself, might be
responsible for the toxicity.
The choice of descriptor was not suitable or not sufficiently related to the
biological endpoint.
Hence, outliers help to revisit the choice of parameters in the initial steps and improve the
overall quality of the QSAR model chosen.
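One common way of flagging candidate statistical outliers is to look for chemicals whose
residual (observed minus predicted endpoint) is unusually large relative to the spread of
all residuals. The sketch below assumes this standardized-residual approach and a cut-off
of two standard deviations; both are illustrative choices rather than the method used in
the studies discussed here, and the data are made up.

```python
import math


def flag_outliers(observed, predicted, threshold=2.0):
    """Return indices whose standardized residual exceeds `threshold`."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    n = len(residuals)
    mean = sum(residuals) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in residuals) / (n - 1))
    if sd == 0:
        return []  # all residuals identical: nothing stands out
    return [i for i, r in enumerate(residuals) if abs(r - mean) / sd > threshold]


# Made-up observed and predicted endpoints; the last chemical is poorly predicted.
predicted = [0.0] * 10
observed = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.0, 3.0]
print(flag_outliers(observed, predicted))
```

A flagged chemical is a prompt to check the reasons listed above, not an automatic
justification for removing it from the training set.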
CHAPTER 3:
QSAR STUDIES FOR NON-TYPICAL CASES : CHRONIC TOXICITY
AND ACTIVE METABOLITE ACTIVITY
QSAR models generally work reliably when predicting short-term effects and acute toxicity
related to the chemical and its structure. However, there exist certain long-term effects
which are caused by prolonged exposure of a biological system to a chemical and which are
influenced by multiple biological factors in addition to the properties of the chemical
itself. Also, at a particular dose of a toxicant, multiple modes of carcinogenicity or
mutagenicity might be active, and such effects cannot be taken into account reliably by
QSAR models.
However, one approach which offers promise in overcoming this limitation is the use of a
technique called the Adverse Outcome Pathway (AOP). This approach utilizes the
information available from recent advances in bioinformatics, toxicogenomics and
systems biology to predict adverse effects with greater reliability.
Fig 5. Prediction of chronic toxicity and multiple modes of action using QSAR
Adverse Outcome Pathways (AOPs) study the effects at two levels:
In comparison to the classical approach of QSAR modeling, where the toxicity is predicted
from relationships between a molecular descriptor and a biological endpoint in the chosen
chemicals of the training set, the AOP approach uses all available information from both the
biological and chemical domains of knowledge to predict the toxic outcome of a compound
with greater confidence.
The initial steps involve the classical QSAR approach to study the chemical structural aspects
of the compound and also to predict its metabolic products which might be biologically
active. The succeeding steps then take into account the available biological macromolecular
and biochemical information databases in order to model the toxic effects based upon
the biological target, the effects of the interaction with that target and the eventual
biological endpoints.
An example of such an Adverse Outcome Pathway is described below for a toxicity study
on the reactive toxicity of moderate electrophiles:
The various interactions and known reactions between the moderate electrophile (e.g. an
ester) and the biological system are taken into account. The first step is to predict
the fate of the parent compound. One such fate is the formation of a Schiff's base, an
enzyme-catalyzed reaction leading to the formation of imines.
One of the possible interactions is the binding of the active metabolic product of the
electrophile to DNA, leading to the formation of DNA adducts.
Another commonly known effect is the interaction with glutathione (GSH), which is a
protective molecule in the cells, protecting them from the effects of free radicals. Each of
these pathways would lead to a different endpoint and would thus necessitate the development
of models to predict the different toxicity endpoints. Thus, AOP helps to integrate
the information from biological pathways and macromolecular data with the
chemical structural information to give a more accurate account of toxicity as well as a
picture of the toxicity endpoints to be modeled.
The basic QSAR approach is quite reliable when it comes to predicting the effects of
chemicals whose parent compound is responsible for the effect. However, in many cases the
parent compound might be almost non-toxic and is metabolized by transformation
mechanisms in the biological systems. The products of these reactions, called active
metabolites, are responsible for the toxic effects. Predicting metabolic activation of
compounds is a limitation of QSAR. However, this limitation can be overcome by
maintaining an ordered biochemical database of reactions and transformation pathways which
helps to predict the metabolic products. Once the metabolic products of a given
compound have been predicted, QSAR can be applied to each of them to identify the
most toxic form of the chemical.
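The two-stage idea described above (predict the metabolites from a pathway database, then
apply QSAR to each candidate) can be sketched as follows. Everything specific in this
sketch is hypothetical: the compound names, the pathway table and the toy scores merely
stand in for a curated biochemical database and a fitted QSAR model.

```python
from typing import Callable, Dict, List, Tuple


def most_toxic_form(
    parent: str,
    metabolite_table: Dict[str, List[str]],
    predict_toxicity: Callable[[str], float],
) -> Tuple[str, Dict[str, float]]:
    """Score the parent and its predicted metabolites; return the highest scorer."""
    candidates = [parent] + metabolite_table.get(parent, [])
    scores = {name: predict_toxicity(name) for name in candidates}
    return max(scores, key=scores.get), scores


# Hypothetical pathway table and toy toxicity scores, for illustration only.
PATHWAYS = {"parent_X": ["metabolite_A", "metabolite_B"]}
TOY_SCORES = {"parent_X": 0.1, "metabolite_A": 0.4, "metabolite_B": 0.9}

worst, scores = most_toxic_form("parent_X", PATHWAYS, TOY_SCORES.__getitem__)
print(worst, scores)
```

Keeping the pathway lookup and the toxicity model as separate inputs mirrors the division
of labour in the text: the database supplies candidate structures and the QSAR model ranks
them.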
Figure 7 represents an example of such a case, where the different active metabolites formed
as a result of 2-acetylaminofluorene metabolism are predicted for their toxic effects using
QSAR. This approach first makes use of information about the various metabolic and
biochemical pathways in the biological systems into which the compound in question might
enter and thereby undergo transformation reactions. The active and inactive products formed
in the reaction pathway are predicted from the knowledge of these pathways in the existing
literature and in biological pathway databases like NCBI's BioSystems.
EXAMPLE OF ACTIVE METABOLITE SELECTION
[Figure 7: simulated metabolism of 2-acetylaminofluorene and the resulting activated
metabolites (X = H, OH); chemical structures not recoverable]
CHAPTER 4:
CLASSIFICATION OF CHEMICALS INTO CLASSES FOR CHOICE OF
TRAINING SET
4.1) Congeneric QSAR (Training set pollutants chosen according to chemical structure)
When QSAR modelling techniques were at an early phase, it was noticed
that xenobiotics belonging to the same chemical class acted through a similar biological
mechanism to produce toxic effects. Subsequently, many QSAR models were developed
which were based on specific groups of chemicals like chlorobenzenes, chlorophenols etc.
Depending on the structure of the compound for which the prediction is to be
made, a dataset of structurally similar chemicals is chosen for modelling purposes.
Similarity in this case is strictly defined by chemical structural parameters. The list of
chemical classes is extensive. However, a few examples of chemicals grouped according to
their structures are as follows:
Organochlorine pesticides (OCPs) are generally used in agriculture as potent insecticides.
Some OCPs, such as aldrin, dieldrin, heptachlor, DDT and HCH, have been listed as
persistent organic pollutants (POPs) by the USEPA, and their use has been restricted or
totally banned. When absorbed into the body, chlorinated hydrocarbons are not metabolized
rapidly and are stored in the fatty tissues. OCPs are persistent in the environment owing
to their high stability and lipophilic nature, which in turn leads to their accumulation in
food chain components (Watanabe, 2005). The concentration of these compounds declines very
slowly even after the source of contamination has been eliminated. These compounds are
biomagnified at higher trophic levels, and hence elevated contamination is detected in the
human body. Mild cases of poisoning are characterized by headache, dizziness,
gastrointestinal disturbances, numbness and general weakness, apprehension and
hyperirritability.
4.1.2) Organophosphates
These are another class of compounds, used as insecticides or chemical warfare
agents. They are less persistent than organochlorines. The toxicity of compounds
of this group can be linked to their structure: the double-bonded O or S (see figure) in
addition to the groups surrounding the phosphate. The most toxic compounds of this
group have been observed to have short phosphonate groups with a fluoride or cyano group
surrounding the phosphate. Replacement of sulphur by oxygen through metabolic activity, or
other modes of transformation, converts the compound to a more toxic species. These
compounds have been observed to bind to the amino acid serine, thus reducing the catalytic
activity of enzymes by blocking their active site. Another mode of toxicity of these
compounds is their tendency to bind to acetylcholinesterase, a key enzyme of the Central
Nervous System (CNS) that is essential for the normal functioning of the nervous system.
Inhibition of cholinesterase activity prevents neural signals from being transmitted from
the brain to various parts of the body. Symptoms of this inhibition include excess
salivation, difficulty in breathing, blurred vision, cramps, nausea and vomiting, rapid or
slow heart rate, headache, weakness and giddiness. Organophosphates are also known to cause
reproductive and endocrine damage.
Typically, acetylcholine is released during the transmission of a nerve impulse in order to
excite the receiving neurons. Acetylcholine is rapidly broken down by
acetylcholinesterase after the initial binding of this substrate to a serine residue in the
active site of the enzyme. However, when an organophosphate binds to the serine, it leads to
irreversible blockage of the active site. A covalent bond between serine and phosphate is
formed with the loss of fluoride or other groups. The next step is irreversible binding at a
glutamyl residue of the enzyme, called ageing of the protein, which is accompanied by loss
of activity.
The EPA and WHO have identified and classified 16 PAHs as priority pollutants (see figure
10). The European Community Directive 98/83/CE states a maximum value of 0.1 µg L−1 for
PAHs in drinking water, expressed as the sum of benzo(b)fluoranthene,
benzo(k)fluoranthene, benzo(ghi)perylene and indeno(1,2,3-cd)pyrene. As far as Italian
legislation is concerned, a limit of 0.01 µg L−1 has also been set for benzo(a)pyrene
(European Community Directive 98/83/CE).
Synthetic pyrethroids are synthetic derivatives and analogues of pyrethrin, a plant extract
obtained from the chrysanthemum flower. Pyrethroids are designed to be more toxic to target
insects and to have longer breakdown times. Thus, pyrethroids are very
persistent, with adverse biological effects. These chemicals are designed to rapidly
penetrate insects and paralyze their nervous system. The synthetic pyrethroids are generally
ketoalcoholic esters of pyrethroic acids. Traditional toxicity assays have shown that
pyrethroids have mild irritant activity, although they are not very easily absorbed through
the skin. Studies have shown that pyrethroids are highly neurotoxic following oral
administration. Pyrethroids act by interfering with the ionic conductance of nerve
membranes. World Health Organization reports suggest that pyrethroids act on axons in the
central and peripheral nervous system by interacting with sodium ion channels
(Soderlund, 2002).
Pyrethroids have been found to be extremely toxic to aquatic organisms like lake trout,
bluegill etc. (Go et al, 1999). Their LC50 values have been found to be as low as 1 part per
billion (ppb), which is very similar to the levels toxic to target organisms like
mosquitoes, blackfly etc. Adverse impacts in lobsters and zooplankton include damage to
gills and behavioural changes. Indirect effects on birds arise from the threat to their food
supply. Insectivorous bird populations are the most badly affected (Fishel, 2005).
Pyrethroids are formulated in combination with chemicals called synergists, which increase
their stability and persistence and hence their toxic potency. Synergists are often oils or
petroleum distillates. One such synergist is PBO (piperonyl butoxide), which inhibits liver
enzymes like hepatic microsomal oxidase and thus interferes with the detoxification
mechanism of the liver. Hence the toxic effects in mammals are more marked for pyrethroids
in combination with synergists than for pyrethroids alone.
PCBs are mixtures of up to 209 chemicals which are of industrial origin. The use of PCBs as
industrial coolants, as lubricants in transformers and capacitors, in plastics and as paint
plasticizers are the common sources of PCB pollution. 130 such PCBs are commonly used in
industry. PCBs have also been classified as Persistent Organic Pollutants by the USEPA.
According to a WHO report in the year 2003 (Geneva, 2003), around 2 x 10^8 kg of PCBs
existed in environmental matrices at that time. Adsorption and subsequent sedimentation
immobilize PCBs in aquatic systems for a long time. Biodegradability is related to the
degree of chlorination of a specific PCB. Higher chlorination leads to higher persistence
and lower biodegradability rates. Thus, PCBs accumulate in the environment and cause
environmental problems. The bioconcentration factor of PCBs, which is the ratio of the
concentration in biological systems to the concentration in water, increases with increased
chlorination. Physicochemically, they have low water solubilities and are highly soluble in
organic solvents. PCBs have been found to accumulate in lipid organs and polar lipid content
like phospholipids. PCBs have been reported to biomagnify in aquatic food chains and at
higher trophic levels.
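The bioconcentration factor defined above is a simple ratio, and it is often handled on a
logarithmic scale. The sketch below is a minimal illustration with made-up concentrations;
the function names are hypothetical, and the units in the comments (mg/kg in tissue, mg/L
in water) are an assumption.

```python
import math


def bioconcentration_factor(conc_organism: float, conc_water: float) -> float:
    """Ratio of concentration in the organism to concentration in water."""
    if conc_water <= 0:
        raise ValueError("water concentration must be positive")
    return conc_organism / conc_water


def log_bcf(conc_organism: float, conc_water: float) -> float:
    """log10 of the bioconcentration factor."""
    return math.log10(bioconcentration_factor(conc_organism, conc_water))


# Made-up measurements: 500 mg/kg in tissue, 0.5 mg/L in water.
print(bioconcentration_factor(500.0, 0.5), log_bcf(500.0, 0.5))
```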
Following is a summary of some pollutants, their mode of action, some physicochemical
properties and permissible limits (Table 1):
Chemical Class | Representative Chemical | Uses | Biological Endpoint | Maximum Permissible Levels | Log P | Vapor Pressure | LD50
Organochlorines | Aldrin | Insecticide | Carcinogen and EDC | 0.03 µg/l (WHO water quality norms) | 7.4 | 2.31 x 10^-5 mm Hg at 20 C | 33 mg/kg body weight (guinea pigs)
Organochlorines | Dieldrin | Pesticide | Carcinogen and EDC | 0.03 µg/l (WHO water quality norms) | 6.2 | 1.78 x 10^-7 mm Hg at 20 C | 37 mg/kg body weight (guinea pigs)
Organochlorines | DDT | Insecticide | Reproductive defects, bioaccumulation, EDCs | Banned (except for a few countries) | 6.9 | 1.5 x 10^-7 mm Hg at 20 C | 100 mg/kg body weight (rats)
Organophosphates | Malathion | Pesticide | Cholinesterase inhibition, neurotoxic | 5 ppm (EU norms) | 2.89 | 5.3 mPa at 30 C | 1375 mg/kg in rats
Organophosphates | Profenofos | Pesticide | Cholinesterase inhibition, neurotoxic | 0.05 ppm (EU norms) | 4.44 | 3.5 x 10^-4 mm Hg at 25 C | 358 mg/kg
PCBs | Aroclor | Electrical equipment manufacturing | Reproductive toxicity, bioaccumulation | 10 ppm (EU norms) | 6.3 | 0.9-2.5 Pa at 20 C | 2 to 10 g/kg body weight of rats
PAHs | Naphthalene | Coal tar and industrial waste | Carcinogen | 1.1 µg/l (WHO limit) | 3.01-3.45 | 0.082 mm Hg at 25 C | 533 mg/kg in rats
Note:
Physicochemical properties like Kow (log P) and vapor pressure are very
commonly used molecular descriptors in QSAR studies.
LD50 values are commonly modelled endpoints in QSAR studies.
The sources of information for the permissible limits in the above table are the
USEPA, Pesticide Action Network UK and the World Health Organization.
4.2) Limitations of Congeneric models
Studies have suggested that models based on modes of action are more accurate than those
based strictly on chemical structures. It has also been observed that many inert chemicals
act by nonspecific modes of action such as narcosis. Much of this can be attributed to the
lipophilicity of these inert chemicals, which may not share any chemical similarity. Another
example of how a diverse set of chemicals can act by the same or a similar mode of action
is provided by compounds which act by uncoupling oxidative phosphorylation in the
mitochondria. These compounds belong to multiple classes of chemicals like phenols,
anilines, pyridines etc.
Thus, classification of chemicals based upon their modes of action, and models based on such
training sets, have gained more importance.
4.3) Classification of Pollutants based upon the modes of action and biological targets is
as follows:
This approach is also known as the target-based prediction approach. Target prediction
originated in the field of drug design, for developing multi-target drugs and for studying
the non-target effects of such drugs. The approach was then extended to the field of
toxicology to study acute responses to chemicals. All of these methods share the common goal
of establishing links between the chemical structure of a compound and potential protein
targets in biological systems. The concept by which side effects and non-target effects of
drugs were studied is also used in finding target molecules for chemicals. Efforts are also
under way to integrate protein structural data and toxicological data to improve predictive
models. One example of how this integration could be achieved was demonstrated by Chen and
co-workers (2001), who devised a technique called INVDOCK (ligand protein inverse
docking). This technique uses an algorithm to find the macromolecular binding site most
suitable to accommodate the compound in question. The limitation of this method is the
relatively low number of protein structures currently available. However, with
more and more 3D structures of proteins being resolved with every passing year, better
assessment and target prediction linked to chemical structure could be achieved.
Following are a few classes of pollutants based upon their modes of action and
biological endpoint:
Endocrine disrupting chemicals (EDCs) are compounds which interfere with hormone
biosynthesis, metabolism or activity, thereby causing adverse effects on the homeostatic
mechanisms and reproductive systems of higher organisms. According to Kandarakis et al.
(2009), EDCs have effects on male and female reproduction, breast development and
cancer, prostate cancer, neuro-endocrinology, thyroid, metabolism and obesity, and
cardiovascular endocrinology. Endocrine disruptors act by multiple pathways, which makes it
challenging to detect these compounds and to predict their endocrine disrupting activity.
Another complicating factor is that in certain cases, the parent compound may not show any
endocrine disrupting activity but its metabolic products may show biological activity.
Different kinds of EDCs may also produce similar biological effects. Endocrine systems
do not show any immunity or resistance against these chemicals because of their structural
similarity to endocrine hormones, shared receptor sites and their ability to bind to the
enzymes involved in metabolizing them. The earliest recorded examples of the ill effects of
EDCs were the thinning of eggshells of fish-eating birds and the impairment of reproductive
processes in birds like seagulls. Some EDCs have been detected in commonly used products
such as personal-care products like soaps and cosmetics (some contain nonylphenol
compounds and parabens), industrial by-products, plastics (phthalates) and pesticides. When
these products are used, disposed of, or excreted by people or animals, they typically end up
in either stormwater or wastewater. While wastewater treatment processes can remove a
significant amount of these compounds, small concentrations of some are discharged into
surface waters. Below are a few examples of EDCs along with their targets:
EDCs may not only affect the target organism but also impact future generations by
modifying factors which affect gene expression, e.g. DNA methylation and histone
acetylation (Anway, Skinner 2006).
Anthraquinones are also known for their xeno-estrogen activities which lead to unregulated
activation of estrogen receptors thereby disrupting the balance of endocrine hormone
functioning. The anthraquinones exert biological effects by their interaction with estrogen
receptor, mainly ERα1 which is predominantly expressed in uterine, kidney and ovarian cells.
The vast number of xenoestrogens makes it necessary to develop computational models
which enable screening and prediction of their estrogen-mimicking tendencies.
In order to develop a QSAR model for predicting the xenoestrogenic activity of
anthraquinones, the primary need is to understand the interaction between the xenobiotics
and the estrogen receptor. In this study, this interaction was studied using a computational
approach called molecular docking.
Molecular docking is commonly used to study the interaction between a biomolecular target
such as a receptor protein and a drug/pollutant molecule. The associations between
biologically relevant molecules such as proteins, nucleic acids and various chemical
molecules play an important role in the transduction of biological signals. Furthermore, the
relative orientation of the two interacting partners may affect the pattern and type of signal
produced (e.g., agonism vs antagonism). Docking is therefore useful for predicting both the
strength and type of signal produced when a pollutant molecule is bound to a target, which
in this case study is a receptor protein. Other studies that used docking to study the
interaction between ligands and receptors include that of Celik et al. (2008), who found that
some PCBs, pesticides and plasticizers could interact with the sterol binding site of the
estrogen receptor.
The in-vitro experiments included the use of 20 Anthraquinone (AQ) model compounds to
determine estrogenic activity using a yeast cell based assay. Molecular docking was used to
define an interaction model between the ligand and the estrogen receptor. By observing the
ligand-receptor interactions, appropriate molecular descriptors were chosen for the purpose of
QSAR modelling.
Techniques used:
In-vitro Assay (Recombinant Yeast-based Assay)
20 anthraquinones were selected on the basis of their occurrence in environmental and
biological matrices. The DNA sequence of the estrogen receptor ERα and a reporter gene,
lac-Z (encoding the enzyme β-galactosidase), were integrated into the yeast genome. The
yeast strain used here was Saccharomyces cerevisiae.
The principle behind this assay is that once the DNA construct is integrated into the yeast
genome, the yeast expresses β-galactosidase activity in the presence of an estrogen,
i.e. whenever an estrogenic molecule binds to the ERα receptor, the lac-Z reporter gene leads
to production of the β-galactosidase enzyme. The amount of β-galactosidase produced is thus
proportional to the estrogenic activity.
Where,
D = dilution factor,
V = volume of the culture,
ODtest = optical density measured for the enzymatic reaction supernatant at 420 nm,
ODblank = optical density measured for the blank at 420 nm,
t = incubation time
The EC50 (half maximal effective concentration) was calculated from the dose response
curves. Estrogenic activities of the chosen anthraquinones were thus expressed in terms of
relative potency (RP). Relative Potency of the anthraquinone was calculated using the
expression :
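A minimal sketch of how an EC50 might be read off a measured dose-response curve is given here, using linear interpolation on the log-concentration axis rather than a full curve fit. The concentrations and responses are invented for illustration. The relative-potency ratio used in `relative_potency` (EC50 of a reference estrogen divided by EC50 of the test compound) is a commonly used convention and is an assumption here, not necessarily the expression used in the study.

```python
import math


def ec50_by_interpolation(concs, responses):
    """Estimate EC50 by linear interpolation on log10(concentration).

    concs must be in ascending order; responses are assumed to rise
    monotonically from a baseline to a plateau.
    """
    bottom, top = min(responses), max(responses)
    half = bottom + 0.5 * (top - bottom)  # half-maximal response level
    points = list(zip(concs, responses))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        if r_lo <= half <= r_hi and r_hi != r_lo:
            frac = (half - r_lo) / (r_hi - r_lo)
            log_ec50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ec50
    raise ValueError("response never crosses the half-maximal level")


def relative_potency(ec50_reference, ec50_test):
    # Assumed convention: a more potent test compound (smaller EC50) gives a larger RP.
    return ec50_reference / ec50_test


# Invented dose-response data (concentration in uM, response in % of maximum).
concs = [0.1, 1.0, 10.0, 100.0]
responses = [0.0, 20.0, 80.0, 100.0]

ec50_test = ec50_by_interpolation(concs, responses)
log_rp = math.log10(relative_potency(0.001, ec50_test))  # 0.001 uM reference is hypothetical
```

In practice a sigmoidal (Hill-type) curve is usually fitted to the full data set; interpolation is used here only to keep the example short.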
In-silico method:
A molecular docking study was used to predict the binding of the various anthraquinone model
compounds to the ERα receptor. CDOCKER was the docking algorithm used to study the
binding of AQs to the receptor. For the purpose of docking, the 3D crystal structure
(coordinates) of ERα was obtained from the Protein Data Bank (PDB), USA. The structure of
the ligand and random conformations of the ligand were generated using a molecular
dynamics method. The CDOCKER interaction energy between the AQs and ERα (Ebinding)
was then calculated. The electrostatic parameters for the ligand-binding site were also
calculated using the same docking software.
i.) Ability of the AQs to penetrate the biological membrane and reach the target site
ii.) Interactions between the AQs and the receptor.
Using the information gained from the docking studies, a total of 15 parameters were selected
to characterize the process.
i) The log Kow values were chosen in order to characterize the ability of the AQs to
cross the membrane. Parameters like molecular volume (V), average molecular
polarizability (α) and the polarizability term were also selected because of their
correlation with log Kow.
ii) The parameters which were chosen from the docking studies in order to
characterize molecular interactions were as follows
Energy of the highest occupied molecular orbital (EHOMO)
Energy of the lowest unoccupied molecular orbital (ELUMO)
Charge on the most positive hydrogen atom in the molecule (qH+)
The most negative formal charge in the molecule (q-)
Electrophilicity index (ω)
Most positive and most negative values of molecular surface potentials
(Vsmax and Vsmin)
Averages of positive and negative potentials on the molecular surface
(Vδ+, Vδ-)
The relationship between the above parameters and the interacting molecules is the basis of
the binding affinity of these estrogen derivatives. EHOMO, ELUMO, qH+ and q- are all used to
characterize the electron donating/accepting nature of the molecule. Also, the electrophilicity
index measures the ability of a compound to accept electrons. It has been observed that in
many cases, the binding affinity directly correlates to the electrophilicity. Surface potentials
describe the charge distribution on the molecule.
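The electrophilicity index can be estimated from the frontier orbital energies. The sketch below assumes the widely used definition ω = μ²/(2η), with the chemical potential μ approximated as (EHOMO + ELUMO)/2 and the hardness η as (ELUMO − EHOMO). Some authors include a factor of 1/2 in η, and the exact convention used in the case study is not stated here, so this choice is an assumption. The orbital energies in the example are invented.

```python
def electrophilicity_index(e_homo: float, e_lumo: float) -> float:
    """Electrophilicity index from frontier orbital energies (same energy unit in and out)."""
    mu = 0.5 * (e_homo + e_lumo)  # chemical potential (approximation)
    eta = e_lumo - e_homo         # chemical hardness (convention without the 1/2 factor)
    if eta <= 0:
        raise ValueError("ELUMO must be higher than EHOMO")
    return mu ** 2 / (2.0 * eta)


# Invented orbital energies in eV, for illustration only.
omega = electrophilicity_index(e_homo=-6.0, e_lumo=-2.0)
```

A lower-lying LUMO at a fixed HOMO raises ω in this definition, which matches the interpretation of ω as a measure of the ability to accept electrons.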
Above all, Ebinding was found to be the most important parameter in characterizing binding
affinity. All these parameters were calculated using the Gaussian 09 program.
QSAR modeling
The 20 AQs were randomly divided into 2 sets, one for training the model (training set) and
the other for testing it (validation set) as shown in table 3.
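A random division of this kind can be sketched as follows. The compound labels, the validation-set size of 5 and the seed are assumptions made for illustration; the actual assignment of compounds is the one given in table 3.

```python
import random


def split_dataset(compound_ids, n_validation, seed=0):
    """Randomly split compounds into a training set and a validation set."""
    rng = random.Random(seed)  # fixed seed so that the split is reproducible
    shuffled = list(compound_ids)
    rng.shuffle(shuffled)
    validation = sorted(shuffled[:n_validation])
    training = sorted(shuffled[n_validation:])
    return training, validation


# Hypothetical labels AQ01 ... AQ20 for the 20 anthraquinones.
compounds = [f"AQ{i:02d}" for i in range(1, 21)]
training_set, validation_set = split_dataset(compounds, n_validation=5, seed=42)
```

Fixing the seed makes the split repeatable, which matters when the same validation compounds must be held out across several candidate models.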
The binding free energies (Ebinding) calculated from the docking study are listed in table 3.
Analysis of the relationship between log RP and Ebinding showed a simple linear
relationship, indicating that binding to ERα was the key step in exerting the estrogenic
activity. It must be noted that in this case log RP is the biological endpoint and Ebinding is
the molecular descriptor. However, Ebinding alone was not a good predictor of log RP.
Hence, a multi-parameter model was developed using the parameters found to be essential
to the binding interactions between the molecule and the receptor, and thus to its activity.
Fig 13. A plot showing linear relationship between Ebinding and logRP
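A single-descriptor relationship of this kind can be examined with an ordinary least-squares line. The sketch below is a generic illustration; the Ebinding and log RP values are invented (they are not the values of table 3) and were chosen to lie exactly on a line so that the result is easy to check.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: a more negative binding energy goes with a higher log RP.
e_binding = [-50.0, -45.0, -40.0, -35.0]
log_rp_obs = [-3.0, -3.5, -4.0, -4.5]

slope, intercept = fit_line(e_binding, log_rp_obs)
```

With real data the points scatter around the line, and the size of that scatter is what decides whether a single descriptor is sufficient or further descriptors are needed.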
Finally, an optimal QSAR model was developed. The model was of the form :
$\log RP = -8.08 + 4.51\,\pi_1 - 1.84 \times 10^{-2}\,E_{binding} + 1.36 \times 10^{-2}\,\alpha - 6.70 \times 10^{-1}\,q^{-} - 6.82\,V_s^{-}$
The predicted log RP values are listed in table 3. The R² value of the QSAR model was 0.85,
indicating a reliable model. The predicted log RP values were close to the observed values for
both the validation and training sets.
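The model equation and the R² statistic can be written out as two small functions. The coefficients are those quoted in the model equation; reading its last two terms as q⁻ and Vs⁻ is an interpretation of the printed formula. The descriptor values in the example are invented and their units are not specified here, so the numerical output is illustrative only.

```python
def predict_log_rp(pi1, e_binding, alpha, q_minus, vs_minus):
    """Evaluate the five-descriptor QSAR equation for log RP."""
    return (-8.08
            + 4.51 * pi1
            - 1.84e-2 * e_binding
            + 1.36e-2 * alpha
            - 6.70e-1 * q_minus
            - 6.82 * vs_minus)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Invented descriptor values for a single hypothetical compound.
example_log_rp = predict_log_rp(pi1=1.0, e_binding=-50.0, alpha=100.0,
                                q_minus=-0.5, vs_minus=0.0)

# Invented observed/predicted pairs to show how R^2 is computed.
example_r2 = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 2.0])
```

Computing R² separately for the training and validation sets is what supports the statement that predictions are close to observations in both.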
The training set was also representative, and hence the model can be used to predict the
estrogenic activity of other anthraquinones that are not part of this dataset.
The case study used both protein structural data as well as the electronic data of the ligand in
order to characterize the ligand-receptor interaction. Hydrogen bonding, hydrophobic and π-π
interactions between the ligand and the receptor govern the estrogenic activity of the AQs.
Hence, comprehension of binding interactions is of great importance in developing QSAR
models based on toxicity mechanisms. The study also demonstrates how protein structural
data and toxicity data can be used in a complementary manner to develop a robust QSAR
model.
The study also shows that these anthraquinones have an EC50 of 2-9 µM, which is
comparable to the EC50 of the estradiol agonist ethinylestradiol. This suggests that the
presence of anthraquinone residues in drinking water could lead to unregulated estrogen
receptor activation through binding to ERα. ERα receptor activation has been reported to be a
major cause of stimulation of breast cancer cells. Binding of anthraquinones to ERα and
unregulated activation of the receptor could also be detrimental to aquatic organisms
and affect their reproductive balance by causing infertility in male organisms (Poola I. Yui
Q. 2007).
REFERENCES:
Anway M.D., Skinner M.K., Uzumcu M., Cupp A.S. (2006) Epigenetic
Transgenerational Actions of Endocrine Disruptors. Science 308, 1466-1469
Benigni R., Netzeva T.I., Benfenati E. (2007) The expanding role of predictive
toxicology: an update on the (Q)SAR models for mutagens and carcinogens. Journal
of Environmental Science and Health. Rev 25, 53–97
Cao Q., Garib V., Yu Q., Connell D.W., Campitelli M., (2009), Quantitative
structure–property relationships (QSPR) for steroidal compounds of environmental
importance. Chemosphere 76, 453–459
Jensen G.E., Niemela J.R., Wedebye E.B., Nikolov N.G. (2008) QSAR models for
reproductive toxicology and endocrine disruption in regulatory use: a preliminary
investigation. SAR and QSAR in Environmental Research 19 (7–8), 631–641
Kandarakis E.D., Bourguignon J.P., Guidice L.C., Hausser R., Prins G.S., Soto
A.M., Gore A.C., Zoeller R.T. (2009) Endocrine-Disrupting Chemicals: An
Endocrine Society Scientific Statement. Endocrine Reviews. 30(4), 293-342
Li F., Li X., Shao J., Chi P., Chen J., Wang Z. (2010) Activity of Anthraquinone
Derivatives: In Vitro and In Silico Studies. Chem. Res. Toxicol. 23, 1349-1355
Liu H.X., Papa E., Gramatica P. (2006) QSAR prediction of estrogen activity for a
large set of diverse chemicals under the guidance of OECD principles. Chem. Res.
Toxicol. 19, 1540–1548
Pasha F.A., Srivastava H.K., Singh, P.P.(2005) QSAR study of estrogens with the
help of PM3-based descriptors. Int. J. Quantum Chem. 104, 87–100.
Valerio Jr. (2009) In silico toxicology for the pharmaceutical sciences. Toxicology
and Applied Pharmacology 241, 356–370
URLs
USEPA (2001): Guidance for Reporting Toxic Chemicals: Pesticides and Other
Persistent Bioaccumulative Toxic (PBT) Chemicals.
www.epa.gov/tri/lawsandregs/pbt/pbtrule.htm, as on 22nd Oct 2010
Books Referred :
De Villiers J. (2009) Endocrine Disruption Modeling. CRC Press (Taylor & Francis
Group 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742,
USA)
Ekins S. (2007), Applying computers to toxicology assessment: Environmental.
Computational Toxicology: Risk Assessment of Pharmaceuticals and
Environmental Chemicals. Wiley-Interscience (John Wiley & Sons, Inc., 111 River
Street, Hoboken, NJ 07030, USA)