Toxicology For The Twenty-First Century
Synthetic chemicals have been components of consumer products for just over a century. A system for identifying which chemicals pose a danger to individuals and the environment was first put in place about 80 years ago. But after several productive decades, in which a patchwork of testing approaches was formed, fewer and fewer of the latest scientific developments were incorporated. The system of regulatory toxicology fell asleep, much like the fairy-tale character Snow White when she bit into the poisonous apple. In the case of toxicology, the poison was international guidelines. This international harmonization was tempting because it allowed manufacturers and suppliers to use fewer resources, and it overcame barriers to trade in global markets. But implementing these guidelines came at a price: the slow and complicated international consensus process hindered self-criticism and modernization of the field of toxicology.

There is almost no other scientific field in which the core experimental protocols have remained nearly unchanged for more than 40 years. Yet consumers continually increase their expectations about the safety of products. One recent effect of this was the instigation of the largest safety assessment of chemicals that has ever been carried out: the European Union introduced the regulation known as Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) by legislation in 2007. Whereas new chemicals have been systematically evaluated in the European Union and the United States for about a quarter of a century, the safety of any chemicals produced before 1981 (which includes 97% of the major chemicals in use, and more than 99% of chemicals produced by volume) has not necessarily been properly addressed. In fact, it is estimated that data for 86% of the chemicals are lacking, and the REACH process seeks to redress this. The regulation affects 27,000 companies, which are required to provide information on the toxic properties and uses of 30,000 chemicals, after a pre-registration phase in 2008. But REACH might turn out to be like the prince whose kiss awoke Snow White after a long sleep, rousing toxicology at last.

Defining the problem

So what is wrong with the current approach to toxicology testing? An ideal study to understand whether an agent is harmful to humans would require an extremely large number of human subjects who are representative of the diversity of humans and who are unknowingly exposed to the agent under realistic conditions. All possible effects should then be assessed. If there is any deviation from these experimental conditions, which are unrealistic and unethical, the study will provide only an approximation of the real situation: it is a model. The crucial question therefore is how useful are the current models, which are mostly animal models, and how incorrect are they? Given that about €10 billion (US$14 billion) is spent on animal experimentation worldwide every year (about €2 billion of which is for toxicological studies), and given that more than 100 million experimental animals are used¹ and that products worth €5.6 trillion are regulated by such testing, the question is certainly appropriate. It encompasses four main issues.

The first issue is the extent to which animal models reflect human responses. It is clear that the use of animals has limitations²: we are not 70 kg rats; we take up substances differently; we metabolize them differently; we live longer (allowing certain diseases to develop and prompting evolutionary adaptations to protect against them); and we are exposed to a multitude of environmental factors. However, few studies have systematically measured the accuracy of animal models. In one example, results from animal models were compared with information from poison centres: comparing the dose of a chemical that is lethal to 50% (LD50) of rats tested and the lethal concentration of the same chemical in the blood of humans showed a rather poor correlation (coefficient of correlation of 0.56; unpublished observations from an international validation study³). Similarly, in another study, 40% of the chemicals that irritated the skin of rabbits were found not to be irritants in the skin ‘patch test’ in humans⁴.

Given the overall lack of data, this problem can be considered in more general terms by looking at how one species models for another. In several animal species, similar experiments with the same agents have been carried out, and there is no reason to assume that, for example, mice, rats and rabbits predict each other’s response to a lesser extent than they predict that of humans. Typical results from such studies show agreement between animal species for 53–60% of chemicals⁵,⁶.

Similar results have also been obtained for pharmaceuticals (as opposed to chemicals) that have been tested in humans. In one study, 43% of toxic effects in humans were correctly predicted by tests in rodents, and 63% by tests when non-rodent animals were also included⁷. It is clear therefore that many adverse effects are not uncovered by such traditional tests. This is also evident in data from the pharmaceutical industry, showing that 20% of the failure of drug candidates occurs as a result of toxicity only after the drugs have been administered to humans in clinical trials⁸. And it is estimated that 6.7% of hospitalized patients experience unexpected adverse reactions to drugs (1 in 20 of which are fatal)⁹, showing the limitations of anticipating toxic effects from preclinical animal studies. To improve the toxicity assessment, tests are often carried out in two animal species: usually substances that show no toxic effect in one species are then tested in another species to improve the likelihood of finding any toxic properties. This increases the sensitivity of testing (that is, it increases the proportion of toxic substances that are found) but at the cost of increasing the number of false positives (when non-toxic chemicals seem to be toxic in the tests carried out).

The second key issue facing animal testing relates to the study design, particularly to the highly precautionary (conservative) approach that is taken at present. To limit costs and animal numbers, toxicity testing is typically carried out with the maximum dose of the chemical that can be tolerated, which has previously been determined. Such doses can be more than 1,000-fold higher than the doses intended for humans (in terms of milligrams per kilogram body weight, for example). This strategy yields many false positives and further diminishes the correlation between findings in animal models and humans¹⁰.
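
The trade-off in two-species testing described above (more toxicants found, at the cost of more false positives) can be made concrete with a little arithmetic. The sketch below is illustrative only. It borrows the figures this article quotes later for reproductive toxicity testing under REACH (5,500 chemicals, about 2.5% true toxicants, about 60% concordance) and makes the simplifying assumption that one concordance value serves both as the chance of detecting a true toxicant and as the chance of clearing a non-toxicant. The function name `two_species_screen` and its result keys are hypothetical.

```python
from fractions import Fraction


def two_species_screen(n_chemicals, prevalence, concordance):
    """Count true and false positives when first-round negatives are retested.

    Assumption for illustration: 'concordance' is used both as the chance
    that a true toxicant tests positive and as the chance that a
    non-toxicant tests negative.
    """
    toxic = round(n_chemicals * prevalence)
    nontoxic = n_chemicals - toxic

    # First species: every chemical is tested once.
    found_1 = round(toxic * concordance)
    false_pos_1 = round(nontoxic * (1 - concordance))
    negatives_1 = n_chemicals - found_1 - false_pos_1
    missed_1 = toxic - found_1

    # Second species: only the first-round negatives are retested.
    found_2 = round(missed_1 * concordance)
    # As in the article's own calculation, the error rate is applied to all
    # first-round negatives, including the few toxicants that were missed.
    false_pos_2 = round(negatives_1 * (1 - concordance))

    found = found_1 + found_2
    false_pos = false_pos_1 + false_pos_2
    return {
        "toxic": toxic,
        "found": found,
        "false_positives": false_pos,
        "sensitivity": found / toxic,
        "false_positive_share": false_pos / n_chemicals,
    }


# Exact fractions avoid floating-point surprises when rounding counts.
result = two_species_screen(5500, Fraction(25, 1000), Fraction(60, 100))
print(result["found"], "of", result["toxic"], "toxicants found")
print(result["false_positives"], "false positives")
print(f"sensitivity {result['sensitivity']:.0%}, "
      f"false positives {result['false_positive_share']:.0%} of all chemicals tested")
```

Under these assumptions the sketch gives 116 of 138 toxicants found (84%) and 3,454 false positives (63% of the chemicals tested), the same figures the article derives for the REACH scenario. A stricter model would apply the second-round error rate only to the truly non-toxic negatives, which lowers the false-positive count slightly.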
© 2009 Macmillan Publishers Limited. All rights reserved
NATURE|Vol 460|9 July 2009 HORIZONS
The third issue is the testing of multiple end points, which also contributes to false-positive results. When enough end points are studied, positive associations will always be found. This is elegantly illustrated by a study that searched for correlations between disease and zodiac sign in the health records of 10 million residents of Ontario, Canada¹¹: those born under the sign of Leo had a significantly higher probability of developing a gastric haemorrhage than individuals of other zodiac signs, and Sagittarians had far more fractures of the humerus over the period analysed. The explanation for this is simple: a total of 223 medical conditions were studied in a single population, and examining so many variables inevitably results in some extreme clustering of random results. Similarly, in toxicological studies, a large number of end points are measured: about 40 in repeat-dose toxicity studies (long-term studies in which animals are exposed to a chemical for a month to a year, and the effects on many organs are studied); and 80 in reproductive toxicity studies (in which adverse effects on the reproductive system, from fertility to embryonic malformations, are analysed). Unavoidably, some end points will be positive, and the group sizes used are too small to allow statistical correction for this.

Given that the cost of current tests is several hundred thousand euros per substance, large increases in group sizes are unrealistic, in addition to this being undesirable from the perspective of animal welfare. Therefore, all positive results have to be recorded as true positive results. However, this is less undesirable in the risk-assessment process than one might think, because the positive findings are simply used to establish the ‘lowest observed effect level’ (that is, the smallest amount of a substance that causes an observable change in the organism being studied). But, because the maximum tolerated dose is being used, there is usually a large safety margin (typically a factor of 100), so the substance could still be used even if it seems to be toxic at high doses (with the exception of chemicals observed to have tumour-inducing properties, as these effects are generally thought to be relevant at much lower doses). It is thus often not important whether a positive result is an artefact. It is relevant, however, to those who later need to reproduce the presumed organ toxicity to validate an alternative approach, because false-positive results are difficult to reproduce when a different test is used. In addition, whether or not a positive result is false is unlikely to be noticed, because most regulatory tests are carried out only once¹² and because toxicological studies are often not reported publicly¹³. So the self-corrective mechanisms of science are not in place: there is no cross-referencing between similar studies in different laboratories.

The fourth issue concerns the prevalence of chemical effects on health. In other words, how many chemicals actually have hazardous properties¹⁴? Despite the use of highly precautionary tests, more than 87% of chemicals registered as new chemicals over the last 25 years are not acutely toxic in current tests; 93% of them do not irritate the skin¹⁵; and only 2–3% impair the reproductive cycle¹⁶.

So toxicological studies search for a rare hazard with imperfect models. What are the consequences of this?

False-positive issue

Take, for example, reproductive toxicity testing under the REACH legislation. All chemicals that were marketed before 1981 and are produced at more than 100 tonnes per year in the European Union will be subject to testing: about 5,500 of the 30,000 chemicals covered by REACH. It is estimated¹⁶ that about 2.5% of these (138 substances) are true reproductive toxicants in humans (Fig. 1), and the goal of toxicological testing is to identify these. The reported concordance between species is about 60% for reproductive toxicity testing, using the two-generation study in rats (in which toxic effects are followed not only in the offspring of exposed rats but also, after further mating, in the next generation). Between animals and humans, however, this concordance might be even lower, owing to the high-dose, precautionary approach. So, when testing 5,500 chemicals with a test that is 60% accurate, 83 of the 138 reproductive toxicants will be found, but about 2,145 substances (almost 40%) will yield a false-positive result. The standard procedure would then be to test the apparently non-toxic substances in another animal species. Given the same accuracy, in rabbits or mice, 40% of the 3,272 substances that showed negative results in the first test (1,309 chemicals) will test as false positives. At the same time, 60% of the 55 true toxicants (33 chemicals) that were missed in the first test, in rats, will be found.

In total, 116 of the 138 true reproductive toxicants (84%) will be found, and 3,454 non-toxic chemicals will be found to be toxic (a total of 63% false-positive findings). These results might therefore restrict the use of a large number of these substances, which are subject to testing because they are produced in the highest quantities in Europe¹⁷. This scenario might be difficult to believe, but an analysis of reproductive toxicity studies for chemicals between 1981 and 2007 confirms this¹⁶: in 27 years, 72 chemicals reached a production volume that triggered reproductive toxicity tests. Of these, 41 (57%) tested positive, as the above calculation (of 63%) would suggest.

Figure 1 | The consequences of searching for rare hazards using imperfect tests. For reproductive toxicity testing, the concordance between animal species is about 60%. So roughly 40% of non-toxic chemicals will yield false-positive results. This problem is compounded by the standard practice of testing these chemicals that yield negative results in a second species to increase the number of hazardous substances identified. [Panel labels: First species test (2–3% of chemicals are reproductive toxicants; 60%; 40% false positive). Second species test, cumulative final result (84%; 63% false positive). Key: Positive, Negative, Undetected toxicant.]

There are several caveats though. The above scenario might be too pessimistic because the correlations between species are biased by the inclusion of more chemicals that test positive in at least one species (because, in the past, a second test was often carried out to challenge the result). In addition, triggers other than production volume might have indicated the need for testing a substance: that is, if substances are tested because they are suspected of being toxic to the reproductive cycle, then this biases the number of positive results in the database.

Nonetheless, it is unlikely that we can afford to falsely assign a large proportion of high-production-volume chemicals as reproductive toxicants. This will unnecessarily restrict the use of many substances, require large and expensive efforts to replace chemicals that are widely used, and create unnecessary fears in consumers about previous exposure. It might also prompt a situation similar to that for pharmaceuticals: if such results are obtained for a drug that is in the late stages of development (when it is already certain that the drug has financial value), then large amounts of toxicological work are required to determine whether the animal studies are in any way relevant to humans so that a valuable substance can be saved.

Another important issue is that the tests for each chemical require an average of 3,200 animals for a single two-generation test¹⁷ (a total of 17.6 million animals for 5,500 substances), and the current REACH testing guidance
for industry does not include much scope for waivers or alternatives. Even if the use of alternatives to animal studies, such as cell-culture-based testing, were feasible, such methods do not have fewer limitations¹⁸, except for ethical ones. And particularly in the field of reproductive toxicity, alternative methods are only being developed¹⁹, and the cost–benefit ratio of using these for large-scale screening programmes still needs to be established.

Towards a solution

It is unlikely that researchers will suddenly produce new tools and design new methods with great accuracy. The solution to using fewer animals and making better predictions in the mid-term is to design integrated testing strategies. At present, the typical process is to use a default animal test and then, in some cases, to use cell-culture and computer-based methods to define the mode of action of the toxin and to interpret and balance the results further. But the best opportunity to improve regulatory toxicology lies in strategies in which optimal use is first made of all existing information about a substance and structurally similar substances, and then information is gained by approaches that do not involve animal testing, leading to targeted animal testing only if necessary. Such strategies will ideally include decision points that depend on interim results. An example of such a strategy is shown in Fig. 2.

Figure 2 | Integrated testing strategy for eye and skin toxicity. This strategy from the REACH guidelines for industry is one of the first examples of an integrated testing strategy. The sequence includes decision points and involves assessing the existing information and then carrying out various in vitro tests, with animal tests being used only as a last resort (in vitro tests for eye irritation are being validated at present). [Flowchart labels: Retrieve existing information (peroxides; flammable in air or water; pH <2 or >11.5; chemical and physical data; data from human studies; data from animal studies; acutely toxic dermal-route test data; evidence from repeat-dose studies; quantitative structure–activity relationship (QSAR) or read across; data from accepted and validated in vitro tests). Corrosive or irritant: no test. Weight of evidence judgement. Enough evidence? (Yes/No). Quantitative information: lowest observed effect levels. Hazard information: consider classification and labelling. Generate new data: In vitro skin corrosion? In vitro skin irritation? In vitro eye corrosion? In vivo eye irritation? (each Yes/No). Assessment of risk for humans.]

The simplest testing strategy would combine two different approaches, such as a screening approach (a method to identify ‘suspicious’ substances with less effort and allowing false-positive results) and a confirmatory one (which may be more sophisticated and specifically identifies hazards with higher certainty). All substances that test positive during the screening approach or another prioritization step would enter the confirmatory stage, which would consist of, for example, a battery of mechanistic tests examining relevant pathways of toxicity. Instead of testing a large number of substances that includes few true toxicants by using one definitive test, this new approach would increase the number of true positives entering the confirmatory stage by creating a subset of suspicious substances, offering more evidence about whether a chemical is hazardous than the screening test alone. Alternatively, analysing which end points (for example, which of the up to 80 end points measured in a reproductive toxicity study) actually lead to classification as toxic or non-toxic in a particular animal test might allow researchers to identify the end points for which dedicated tests are required¹⁶.

Despite the advantages of such a change in approach, several difficulties are apparent. It would first require acknowledging and analysing the limitations of the current approach. One central problem here is that the current system is convenient for the key players: namely, the regulators and the regulated industry. At present, a clearly defined set of tests, which make predictable demands on time and costs, is carried out. Then, the books are closed, and industries’ liability is minimized. Every integrated testing strategy with decision points in its course will bring this simple procedure to an end, making the uncertainties more evident, as well as the fact that only the probability of a particular hazard is assessed. Any shortcomings identified in current practices will mean that products need to be examined further and will open up liabilities again.

This situation is similar to that for pharmaceuticals. In fact, the current risk-assessment methodologies for chemicals are derived from those for preclinical studies of pharmaceuticals. However, for pharmaceuticals, there are two further steps in the process: clinical trials in humans and post-marketing surveillance (in which data are collected after a drug has been released onto the market). A considerable proportion of drug candidates (8–30%) fail because of safety problems in humans²⁰, despite having passed the entire toxicological programme of animal testing. Many of these safety issues are minor, for example nausea or a transient increase in the concentration of liver enzymes, but major chronic effects are not assessed at this stage. In addition, biologically active substances such as drugs often produce side effects as a result of their intended actions on human physiology (an effect known as ‘excess pharmacology’); this is less of a problem for other areas of chemical use, in which the chemicals are not usually intended to affect the human body. But even though drugs undergo additional trials in human volunteers and patients, in my opinion there is always a need to follow up products after marketing, as illustrated by the anti-inflammatory drug Vioxx. Similarly, the possible hazards of chemicals in consumer products will probably need to be followed up more intensively after marketing.

Today, the pharmaceutical field is again driving changes in safety testing. With human proteins or antibodies (collectively known as biologicals) making up about half of the new drugs entering the market, classical toxicology is largely useless, because these proteins mostly have species-specific actions and animals raise antibodies to them, limiting the value of animal testing. This has created pressure to develop human-cell-based models for these biologicals, and other areas of toxicology will benefit from this. The inadequacy of current methods is also evident for new products such as genetically modified food and animal feed²¹, functional food (food with intended health effects), and nanoparticles²², creating an additional demand for new testing methods. Similarly, current methods are not tailored to assess the risk of acute poisonings associated with chemical accidents, or biological or chemical weapons²³.

REACH will also be a key instigator of change. This is partly because unexpected positive test results for important chemicals will trigger a review of the approaches: it is unlikely that important chemicals with decades of use will be abandoned easily, without raising doubts about the assessment. In addition, the legislation itself already represents a revolution in safety-assessment practices. Over the past three decades, internationally agreed (animal) testing guidelines have set out precisely how data must
be obtained, whereas REACH calls for the integrated use of all methodologies and for the use of animals as a last resort (with certain obstacles in place). So REACH calls for more flexibility and for tailored approaches. In terms of REACH, the test guidance for industry that has been developed in the past three years guides scientists through the combined use of existing data, and in silico (computer-based), in vitro and in vivo approaches. The greatest challenge will be to standardize these approaches in test guidelines and to reach international agreement on them. It is reasonable to assume that at least five times more guidelines will be necessary to accommodate the new approaches, an enormous challenge to the regulatory community.

But the challenge goes one step further: for each new method, test guidelines need to be not only agreed but also implemented. An interesting test case is the local lymph-node assay, which is used to predict whether topical application of a chemical to the skin will induce an allergic response. In 2002, the assay was internationally agreed by the Organisation for Economic Co-operation and Development (OECD) as the preferred animal model for studying skin allergies, but it has been seldom used until recently. Since 2002, fewer than 10% of new chemicals have been tested in this way, as indicated by notifications to European regulatory bodies. Applying a new method is hindered by, on the one hand, tradition and established practices and, on the other hand, obstacles such as the absence of international agreements with countries in important economic markets (for example, Brazil, Russia and China have not yet necessarily accepted the new OECD approaches).

International companies tend to use the traditional test until the last important market has accepted the new approach. So the banning of the original test method when alternatives become available is the prime opportunity to force a change. The OECD has banned only one test so far, however: the classical LD50 test, which required 45 rats for testing each substance, was abandoned in 2000, when three validated alternatives were introduced, requiring only 8 to 15 animals to test one substance. In other cases, the traditional animal tests have not been banned or modified when alternatives were introduced, so the original tests can still be carried out for regulatory purposes if justification is provided. But when a new approach does not suit all needs (that is, it is not appropriate for all chemicals or accepted by all member states), it is difficult to remove the traditional guidelines. The regulators must then urge that the new approach be used, to reinforce its implementation. For this to work, the advantages of the new test or the shortcomings of the old test need to be made evident, and to be credible, this assessment must have a sound and objective basis. The problem is that established practices have become intertwined with scientific insights during the decades in which toxicological tests have been shaped, and political compromises around such tests have been made.

Clinical medicine has a similar problem in that diagnostic and therapeutic approaches need to be objectively appraised so that the best decision can be made for each patient. Here too, new scientific approaches are interwoven with traditions, financial compromises in terms of health care, and so on. In the past couple of decades, the most important development in this area has been the evidence-based health-care movement, steered by the Cochrane collaboration²⁴. Using structured reviews, consensus processes and meta-analyses, a series of 5,000 guidance documents has been developed. These provide the best available consolidation of the evidence in a particular field.

It is tempting to translate this evidence-based approach to toxicology²⁵, and a similar movement has been initiated. A realistic assessment of the methods used in toxicological studies will help to improve these tools and to integrate them into testing strategies. At the same time, it will be important to find ways to combine information from various studies, both systematically and quantitatively. The difficulties entailed are illustrated by the results of 29 independent risk assessments of the industrial solvent trichloroethylene: 6 studies deemed it non-carcinogenic; 10 found it to be carcinogenic in animals but unlikely to be carcinogenic in humans; 9 found it a plausible carcinogen in humans but with negative epidemiological findings; and 4 found it a plausible carcinogen in humans, with positive epidemiology²⁶.

Future visions

So it is clear that the current system of testing needs to change. Moreover, the individual testing tools have limitations and are inadequate for toxicology in the twenty-first century. To resolve this, I propose a three-step solution (Fig. 3). First, the limitations of the current tools need to be objectively assessed, and a better understanding of their uses is needed (for example, we need to analyse the prevalence of particular hazards because appropriate test strategies depend strongly on whether the hazard is rare or frequent). Second, in the mid-term, the various approaches need to be integrated into testing strategies, making the best use of the existing methods by combining them strategically. And, third, an entirely new system is urgently needed and should be built from scratch, using modern methods.

Figure 3 | Towards a new toxicology. The toxicology community needs to take three main steps to arrive at a new system of toxicology. [Diagram steps: Assess current tools and their limitations; Integrate various approaches into testing strategies; Design new methods and construct new system.]

The basis for such a new system has emerged over the past two decades: advances in cell-culture techniques have enabled biological phenomena to be studied in vitro, unlike when toxicological experiments were first designed. In fact, most data generated in the life sciences now originate from studies of in vitro systems. This change in experimental approach required not only the accumulation of experience in these new techniques but also the provision of standardized equipment, materials and training. Early cell-culture-based experiments were relatively simple, but they evolved rapidly, with many researchers now using three-dimensional (‘organotypic’) cultures that resemble organs in structure and function. Even one of the last big challenges in cell culture, the lack of availability of primary human cells (usually sourced only from surgically removed tissues, with the notable exception of blood cells), is now increasingly being overcome by isolating or generating human stem cells, from which most of the cell types in the body can be produced²⁷,²⁸.

The avenue now opening for designing a new regulatory toxicology originates from the combination of bioinformatics and biotechnological approaches that yield huge amounts of information²⁹,³⁰. Three important technologies developed during the past decade have entered the field of toxicology³¹,³²: ‘omics’ technologies (such as genomic and proteomic analyses), imaging techniques and robotized testing platforms. The testing platforms allow high throughput of samples, enabling large numbers of substances to be tested under standardized conditions. Omics technologies and imaging methods compile enormous sets of information about a single compound. Together, the three technologies not only allow researchers to ‘fish’ for new biological markers of specific toxic effects but also increasingly allow the deduction of patterns (or signatures) that are characteristic of certain toxic effects. By also harnessing advances in bioinformatics and in silico modelling, this information can be mined and then integrated with knowledge from other areas of the life sciences³³. Such integration of information will be particularly important for investigating cellular pathways and should allow the cross-fertilization of ideas between toxicology and basic science³⁴.

The combination of biochemical knowledge of cellular pathways with genomics, proteomics and metabonomics (the study of metabolic responses to environmental factors, drugs and diseases) is already advancing as systems biology, and systems toxicology is a new sub-branch of this field.

Such a systems approach was put forward as a toxicology for the twenty-first century in a 2007 report by the US National Academy of Sciences on behalf of the Environmental Protection