Published Online October 24, 2009
Mathematical modelling of infectious diseases
M. J. Keeling* and L. Danon
Biological Sciences, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, UK
Introduction: Mathematical models allow us to extrapolate from current information about the state and progress of an outbreak, to predict the future and, most importantly, to quantify the uncertainty in these predictions. Here, we illustrate these principles in relation to the current H1N1 epidemic. Sources of data: Many sources of data are used in mathematical modelling, with some forms of model requiring vastly more data than others. However, a good estimation of the number of cases is vitally important. Areas of agreement: Mathematical models, and the statistical tools that underpin them, are now a fundamental element in planning control and mitigation measures against any future epidemic of an infectious disease. Well-parameterized mathematical models allow us to test a variety of possible control strategies in computer simulations before applying them in reality. Areas of controversy: The interaction between modellers and public-health practitioners and the level of detail needed for models to be of use. Growing points: The need for stronger statistical links between models and data. Areas timely for developing research: Greater appreciation by the medical community of the uses and limitations of models and a greater appreciation by modellers of the constraints on public-health resources.
Keywords: modelling/swine flu/prediction/uncertainty
Introduction
Accepted: September 24, 2009. *Correspondence to: Prof. M. J. Keeling, Biological Sciences, University of Warwick, Gibbet Hill Road, Coventry CV4 7AL, UK. E-mail: m.j.keeling@warwick.ac.uk
The progress of an epidemic through the population is highly amenable to mathematical modelling. The first attempt to model, and hence predict or explain, epidemic patterns dates back over 100 years,1 although it was the work of Kermack and McKendrick2 that established the basic foundations of the subject. These early models, and many subsequent revisions and improvements,3,4 operated on the principle that individuals can be classified by their epidemiological status: most simply, as susceptible to the infection, infected and therefore infectious, or recovered and hence no longer infectious. (We stress that
© The Author 2009. Published by Oxford University Press. All rights reserved.
British Medical Bulletin 2009; 92: 33–42. DOI:10.1093/bmb/ldp038
this classification is based upon an individual's ability to host and transmit a pathogen, and may be relatively unconnected to their medical status.) In this review, we focus on how such models can be used to predict the future outcome of an epidemic process (or the impact of control measures); however, models may also have a more theoretical use as explanatory tools, elucidating fundamental principles of transmission and the factors driving epidemic behaviour.
The so-called SIR model is one of the simplest and most fundamental of all epidemiological models. It is based upon calculating the proportion of the population in each of the three classes (susceptible, infected and recovered) and determining the rates of transition between these classes (Fig. 1). In the simplest model of a single epidemic, births and deaths can often be ignored, and so only two transitions are possible: infection (moving individuals from the susceptible to the infected class) and recovery (moving individuals from the infected to the recovered class). It is generally assumed (and supported by epidemic data) that the per capita rate at which a given susceptible individual becomes infected is proportional to the prevalence of infection in the population;5 for simplicity, it is often also assumed that infected individuals recover at a constant rate.2 To make progress even with this simple model, modellers must estimate two parameters: the proportionality constant for infection and the recovery rate. This illustrates the fundamental relationship between models and statistics; without a good
Fig. 1 From left to right: a pictorial representation of the flow of individuals between classes in the SIR model. The basic differential equations for the SIR model which give the rate of change of the proportion in each class (negative values reflect flows out of a class, whereas positive values reflect flows into the class). The result of numerically solving the SIR model, showing how the proportion of susceptible, infected and recovered individuals in the population is predicted to change over time.
statistical estimation of parameters from epidemiological data, models cannot be used as a predictive tool; they can only illustrate general concepts. This interplay between models and statistics is something we shall return to later.
Once the recovery and transmission parameters have been estimated, the SIR model predicts an epidemic that follows recognized patterns: the number of cases initially increases exponentially until the proportion of susceptibles in the population has been sufficiently depleted that the growth rate slows; this process continues until the epidemic can no longer be sustained and the number of cases drops, eventually leading to extinction of the infection.
The simple SIR model (Fig. 1) produces three general predictions that have important public-health implications and are supported by a range of more complex models.3,4
(i) The fundamental parameter that governs the epidemic behaviour is the basic reproductive ratio, R0, which is defined as the average number of secondary cases produced by a single infectious individual in a totally susceptible population.3 Values of R0 < 1 mean that any epidemic is doomed to rapid failure as the chain of transmission cannot be maintained, whereas values of R0 > 1 mean that an epidemic is possible. (For the current H1N1 pandemic in the UK, it is estimated that R0 ≈ 1.4; note that R0 depends on both the infection and the population.)
(ii) At the end of an epidemic, a proportion of the population remains susceptible2 (has not been infected). This proportion becomes very small for even moderately large values of R0, but for R0 ≈ 1.4, we would only expect 51% of the population to get infected. (More complex models change this precise value, but not the general concept.)
(iii) Vaccination operates by reducing the pool of susceptible individuals, and when this is reduced sufficiently, an infectious disease cannot spread within the population. Most importantly, it is not necessary to vaccinate everyone to prevent an epidemic; immunizing someone not only protects that person but confers some protection to the population in general. The classic result is that to eradicate an endemic infection or prevent a novel pandemic, a proportion 1 − 1/R0 of the population needs to be successfully immunized;3 so for the current pandemic, we would need to immunize 29% of the population. (More complex models show that this value can be reduced if vaccination is carefully targeted.3,4)
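The three predictions above can be checked numerically with a minimal sketch of the SIR model. The code below assumes the standard form of the equations (dS/dt = −βSI, dI/dt = βSI − γI, dR/dt = γI, with R0 = β/γ), a fixed-step Runge-Kutta integrator, and an illustrative recovery rate of 0.5 per day; the function names, the recovery rate and the initial seed of infection are assumptions made for illustration and are not taken from the analysis behind Fig. 1.

```python
def simulate_sir(r0, gamma=0.5, i0=1e-6, dt=0.1, t_max=500.0):
    """Integrate the SIR equations with a fixed-step fourth-order Runge-Kutta scheme.

    S, I and R are proportions of the population. gamma (per day) and i0 are
    illustrative assumptions; beta is set so that beta / gamma equals r0.
    Returns a list of (t, S, I, R) tuples.
    """
    beta = r0 * gamma

    def deriv(s, i):
        new_infections = beta * s * i          # flow from S to I
        return -new_infections, new_infections - gamma * i

    s, i, t = 1.0 - i0, i0, 0.0
    trajectory = [(t, s, i, 1.0 - s - i)]
    while t < t_max:
        k1s, k1i = deriv(s, i)
        k2s, k2i = deriv(s + 0.5 * dt * k1s, i + 0.5 * dt * k1i)
        k3s, k3i = deriv(s + 0.5 * dt * k2s, i + 0.5 * dt * k2i)
        k4s, k4i = deriv(s + dt * k3s, i + dt * k3i)
        s += dt * (k1s + 2.0 * k2s + 2.0 * k3s + k4s) / 6.0
        i += dt * (k1i + 2.0 * k2i + 2.0 * k3i + k4i) / 6.0
        t += dt
        trajectory.append((t, s, i, 1.0 - s - i))   # R follows from S + I + R = 1
    return trajectory


def critical_vaccination_fraction(r0):
    """Proportion that must be immunized to prevent spread: 1 - 1/R0 (zero if R0 <= 1)."""
    return max(0.0, 1.0 - 1.0 / r0)


if __name__ == "__main__":
    final_r = simulate_sir(1.4)[-1][3]
    print(f"proportion infected by the end of the epidemic: {final_r:.3f}")
    print(f"critical vaccination fraction: {critical_vaccination_fraction(1.4):.3f}")
```

With R0 = 1.4 the final proportion infected should land close to the 51% quoted in (ii), and the vaccination threshold close to the 29% quoted in (iii); with R0 below 1 the seeded infection simply dies away. The final size does not depend on the assumed recovery rate, only on R0.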
We now introduce an alternative approach to modelling the progress of an epidemic, before considering extensions of the SIR model that increase its realism and predictive accuracy. Given the recent increase in computational power, it is now feasible to develop an individual-based model for relatively large populations.6 Here, the concept is to describe the status and interactions of each person in the population, rather than trying to estimate the number of people with a particular status. This change from a population-level to an individual-level
perspective is incredibly powerful and allows a wide range of biologically and socially realistic assumptions to be included.7 The difficulty with such individual-based approaches is threefold. First, we currently have a very limited understanding of the behaviour of individuals and the range of variability, and while recent work using diary-based studies8 or mobile phones9,10 aims to dissect the interactions that could lead to disease transmission, it is still unclear how well these data describe the interactions of individuals with symptoms. Secondly, during an epidemic, the vast majority of the data collected are at the population scale (such as estimates of the number of cases in a region), which is ideal for parameterizing population-scale models (such as the SIR model) but more statistically challenging to use in an individual-level approach.6 Finally, due to the complexity and computational costs of individual-level models, it may be difficult to obtain general insights or to assess the implications of particular underlying assumptions. For these reasons, population-scale models based on the SIR paradigm are most often used for short-term public-health predictions, whereas individual-level models are more commonly used as planning tools.
Although the SIR model provides a simple and generic framework for understanding and predicting epidemiological dynamics, a number of modifications are possible which increase the model's realism but also increase the number of parameters that have to be estimated.4 We consider these in the chronological order in which they were developed, focusing on the new insights that are provided and the reasons why such extra features were included.
Age structure
Largely prompted by the implementation of mass-vaccination control programmes against a range of childhood infections, mathematical models began to structure the population by age.11 This has two main implications that interact: first, older individuals are more likely to have been exposed to infection (simply because they have been around for longer), and secondly, people tend to preferentially mix with others of a similar age, a principle known as assortativity. The vast majority of this age-structured modelling was performed for measles, where the mixing between school children drives the epidemic process, and so the school holidays have a dramatic impact.11,12 This work has strong resonance with the current modelling and statistical analysis of the H1N1 pandemic, due to the age-dependent susceptibility that has been recorded (with young children being much more susceptible than adults) and due to the role that school closures and school holidays may play in limiting epidemic spread.
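To make the effect of assortative mixing concrete, the sketch below computes R0 for a population split into children and adults, using the dominant eigenvalue of a two-group next-generation matrix. This is a standard construction rather than the specific models of the cited studies; the population fractions, transmission rates and recovery rate are hypothetical values chosen only for illustration, and the function names are our own.

```python
import math


def next_generation_matrix(beta, pop_frac, gamma):
    """K[i][j]: expected infections in group i caused by one infected individual in group j.

    beta[i][j] is the transmission rate to group i from group j, pop_frac[i] the
    proportion of the population in group i, and 1/gamma the mean infectious period.
    """
    n = len(pop_frac)
    return [[beta[i][j] * pop_frac[i] / gamma for j in range(n)] for i in range(n)]


def r0_two_groups(beta, pop_frac, gamma):
    """R0 as the dominant eigenvalue of the 2x2 next-generation matrix."""
    (a, b), (c, d) = next_generation_matrix(beta, pop_frac, gamma)
    trace, det = a + d, a * d - b * c
    # For a non-negative matrix the discriminant (a - d)^2 + 4bc is never negative.
    return 0.5 * (trace + math.sqrt(trace * trace - 4.0 * det))


if __name__ == "__main__":
    pop_frac = [0.2, 0.8]                     # hypothetical: children, adults
    gamma = 0.5                               # hypothetical recovery rate per day
    term_time = [[3.0, 0.3], [0.3, 0.6]]      # hypothetical, strongly assortative
    holidays = [[1.5, 0.3], [0.3, 0.6]]       # child-to-child mixing halved
    print("R0 in term time:", round(r0_two_groups(term_time, pop_frac, gamma), 3))
    print("R0 in holidays: ", round(r0_two_groups(holidays, pop_frac, gamma), 3))
```

When every entry of beta is equal, the calculation collapses to the single-population value beta/gamma, which is a useful sanity check. Reducing only the child-to-child term, as a crude stand-in for school closure, lowers R0 for the whole population.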
Stochasticity
The persistence of infections, particularly childhood infections, within a population prompted the study of stochastic models, in which the number of individuals in any class is always an integer (whole number) and events happen at random but with a given underlying probability that is based on the associated deterministic model. These stochastic models generate different epidemics on each realization and thus capture the variability in the epidemic profile. Apart from this obvious variability, two major results arise from this stochastic approach. First, focusing on measles in cities in England and the USA, early studies established the existence of a critical population size, below which an infectious disease is unable to persist without reintroduction.13 Secondly, even when R0 > 1, the success of an epidemic is not guaranteed; chance events can lead to the early extinction of any outbreak. Following a single introduction of disease, the chance of extinction (without generating a major epidemic) is given by 1/R0 and so decreases rapidly as R0 increases.14 For the H1N1 outbreak, where the probability of failure from a single introduction is 71%, stochastic effects played an important role in the initial stages of the epidemic.
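The 1/R0 extinction result can be illustrated with a short simulation. The sketch below treats the early outbreak as a simple birth-death process, assuming that susceptible depletion is negligible and that infectious periods are exponentially distributed, so that each event is an infection with probability R0/(1 + R0) and a recovery otherwise. The threshold of 50 concurrent infections used to declare a major epidemic, the number of runs and the seed are arbitrary choices, and the function names are our own.

```python
import random


def outbreak_goes_extinct(r0, rng, major_threshold=50):
    """Follow the number of infected individuals after a single introduction.

    Each event is either a new infection (probability r0 / (1 + r0)) or a
    recovery. Returns True if the infection dies out before the number of
    concurrent infections reaches major_threshold (an arbitrary cut-off).
    """
    infected = 1
    p_infection = r0 / (1.0 + r0)
    while 0 < infected < major_threshold:
        if rng.random() < p_infection:
            infected += 1      # transmission event
        else:
            infected -= 1      # recovery event
    return infected == 0


def extinction_probability(r0, runs=10000, seed=1):
    """Monte Carlo estimate of the chance that a single introduction fails."""
    rng = random.Random(seed)
    extinct = sum(outbreak_goes_extinct(r0, rng) for _ in range(runs))
    return extinct / runs


if __name__ == "__main__":
    estimate = extinction_probability(1.4)
    print(f"simulated extinction probability: {estimate:.3f}")
    print(f"theoretical value 1/R0:           {1.0 / 1.4:.3f}")
```

For R0 = 1.4 the simulated proportion of failed introductions should sit close to the 71% quoted above, up to Monte Carlo error.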
Risk structure
With the growth of the HIV epidemic in the late 1980s and early 1990s, considerable attention was focused on understanding the spread of this and other sexually transmitted infections (STIs).15 Clearly for STIs, the dominant risk factor for becoming infected is the number of sexual partners (and unprotected sex acts), and it was therefore seen as vital to structure the population into multiple risk groups.3 Such risk-structured models highlighted how various sections of the community were at far greater risk than others, due both to their behaviour and to their increased interaction with other high-risk individuals. For H1N1, there are clear comparisons between age- and risk-structured models (as age is itself a risk factor); however, other risk groups could be considered: for example, health-care workers could be modelled as a high-risk group due to their potentially greater contact with infected individuals.
Infectious distributions
The chronic nature of HIV drew attention to the within-host dynamics and distributions of infectious periods.16 The challenge is to accurately
capture the impact of within-host processes at the population level. Changes in viral load, which correspond to varying infectivity, are generally modelled by movement between multiple infection states. The speed with which an epidemic spreads through a population depends on the generation time (the length of time between successive generations of infected individuals), which is defined by the infectiousness profile. Therefore, including variability in infectiousness affects predictions about the speed of epidemic spread, the impact of stochasticity, the value of R0 and the impact of control measures.17,18
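A simple way to see why the infectiousness profile matters is to ask what value of R0 is implied by a given early growth rate under different model assumptions. The sketch below uses two standard textbook relationships: R0 = 1 + rD for an SIR model with exponentially distributed infectious period of mean D, and R0 = (1 + rL)(1 + rD) when an exponentially distributed latent period of mean L is added (an SEIR model). The growth rate and the period lengths are hypothetical values for illustration only, not estimates for H1N1.

```python
def r0_from_growth_sir(growth_rate, infectious_period):
    """R0 implied by exponential growth rate r in an SIR model: 1 + r * D."""
    return 1.0 + growth_rate * infectious_period


def r0_from_growth_seir(growth_rate, latent_period, infectious_period):
    """R0 implied by growth rate r when a latent class is added: (1 + r*L)(1 + r*D)."""
    return (1.0 + growth_rate * latent_period) * (1.0 + growth_rate * infectious_period)


if __name__ == "__main__":
    r = 0.2   # hypothetical early growth rate, per day
    d = 2.0   # hypothetical mean infectious period, days
    l = 2.0   # hypothetical mean latent period, days
    print("R0 assuming SIR: ", r0_from_growth_sir(r, d))
    print("R0 assuming SEIR:", r0_from_growth_seir(r, l, d))
```

The same observed growth rate therefore corresponds to a larger R0 once a latent period is included, which in turn changes the predicted final size and the vaccination coverage needed for control.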
Spatial structure
A clear failing of the SIR models is the inability to describe any spatial aspects of the spread of disease. The Foot and Mouth Disease epidemic of 2001 highlighted the importance of spatially explicit modelling as transmission between farms was a highly localized process.19,20 Such models pointed to the local depletion of susceptibles as a mechanism for slowing epidemic spread compared with a fully mixed population, and the potential for locally targeted measures to control and contain an outbreak.6 Although many of these concepts do translate to the current H1N1 outbreak, the distance moved by people generally reduces these spatial effects and leads to relatively synchronized epidemics across the whole of the UK.
The basic SIR model and the extra features described above are all part of a general modelling framework that could be applied to a range of directly transmitted infectious pathogens. To make models that are specific to influenza, or specific to the current H1N1 pandemic in the UK, requires that the models are carefully parameterized to match available data, and that this parameterization reflects both statistical uncertainty and uncertainty in the data itself. Three main data sources are available in the UK, each of which provides important insights into particular elements of epidemiological dynamics and each of which has associated difficulties.
The first few hundred (FF100) database was compiled (as the name suggests) from detailed data gathered on the first few hundred cases (Fig. 2). In fact, information exists on cases of H1N1 that were laboratory confirmed together with contacts that were successfully traced in an effort to control the initial outbreak. Such data provide the only reliable estimates of individual-level quantities such as the basic reproductive ratio (average number of secondary cases produced per identified case), and the delays between infection, subsequent transmission and the onset of symptoms.
However, several problems exist with the interpretation of this information, primarily due to the nature of contact tracing itself.
Fig. 2 (A) Examples of the type of information that can be gained from the FF100 data; black ovals represent individuals testing positive, white ovals represent individuals testing negative and the arrows show the direction of tracing. As exemplified in the left-hand figure, knowing the average number of infected contacts per source case allows us to calculate the basic reproductive ratio. The right-hand figure shows how information on the date of onset of symptoms and the date of contact can allow us to estimate incubation, latent and infectious periods. (B) Examples of the type of network data available from the FF100 scheme; black rectangles represent initial source infections, while ovals represent traced individuals; black ovals represent individuals testing positive, white ovals represent individuals testing negative. Lines show routes of tracing. The number of secondary infected individuals from each primary case provides a measure of the basic reproductive ratio.
It is implicitly assumed that all secondary cases have been successfully traced and that we can successfully identify the infecting individual for each case. In addition, initial cases are usually only identified once symptoms arise, so infection pathways often have to be inferred retrospectively. Finally, these data were collected at a time when the UK was prophylactically administering anti-virals to family members and other close contacts; therefore, there is some uncertainty in how these data translate to the current situation.
The Qflu database contains information from patients contacting their GP and being diagnosed with influenza-like illness (ILI). Qflu operates from around 3300 general practices spread throughout the UK covering a total population of almost 22 million potential patients. Qflu therefore provides an age- and regionally stratified picture of the unfolding epidemic at the population scale and was the main data source until 23 July when the National Pandemic Flu Service came into operation in England. The naive approach would be to assume that the number of ILI cases accurately reflects the number of cases in the UK; however, three main biases disrupt this ideal. First, not everyone who is ill with the current H1N1 pandemic contacts his or her GP. Secondly, not everyone who is diagnosed with an ILI actually has H1N1 infection; only 30–40% of swabs taken at sentinel GP surgeries are positive for H1N1, although the sensitivity of this methodology is unknown. Finally, there may be strong age-related biases in consulting a GP, with younger children more likely to be taken as a precautionary measure.
On 23 July, the National Pandemic Flu Service (NPFS) began operation in England. The majority of those symptomatic for influenza were requested to call this service or visit the web site, whereas pregnant women, children under one and people living in Wales and Scotland were asked to continue to contact their GP.
Again, difficulties and biases exist with these data. In addition to the points made above with respect to Qflu, there is also the extreme difficulty of matching the number of cases reported before and after the start of NPFS, as this was coincident with the start of school holidays in many areas and a decline in the epidemic.
The main difficulty with these multiple data sources is the inability to accurately estimate the number of true H1N1 cases at any point in time, with often a 4-fold difference between maximum and minimum estimates. This uncertainty is translated into an inability to predict the proportion of cases that require hospital treatment (as although we know the current number in hospital, we do not accurately know the total number of cases in the population), and by similar reasoning, we are unable to accurately predict the expected case fatalities.
Figure 3 illustrates two further difficulties in predicting the course of any outbreak from early case-reporting data. In Figure 3A, the initial proportion of the population who are susceptible to infection is
Fig. 3 Examples of the uncertainty in predicting the future course of an outbreak from early epidemic data. (A) The proportion of the population initially susceptible to the infection is unknown; results are from a simple SIR model parameterized to match both the observed growth rate and the observed basic reproductive ratio (R0 ≈ 1.4). (B) We use an age-structured model of children and adults parameterized such that all epidemics have the same early growth, basic reproductive ratio and ratio of cases in adults and children; here, uncertainty enters in the relative transmission rates within and between the two age-classes.
unknown (and varied from 50% to 100%), leading to a range of possible outcomes, all of which would match the early growth in the number of cases. In Figure 3B, we use a simple age-structured model of children and adults but assume that the relative mixing between these age-groups is unknown; again, for parameters that match the early growth and the relative number of cases in children and adults, a wide variety of predicted outcomes are possible.
However, despite these difficulties, mathematical modelling supported by solid statistical analysis does produce many useful predictions about the current pandemic. Most importantly, mathematical models allow us to rigorously quantify our uncertainty in the epidemic to date and to extend this uncertainty into predictions about the future. Therefore, although current data preclude accurate prediction of the epidemic, recognizing and quantifying uncertainty allows us to develop plausible worst-case scenarios to aid public-health planning in the months ahead. In addition, despite the uncertainty, models for the current H1N1 pandemic can be used to explore the impact of control measures; for example, given the range of plausible models that agree with the currently available data, we can ask whether there are any scenarios in which vaccine should not be targeted to the high-risk individuals, or whether there are scenarios in which the predicted autumn/winter epidemic will exceed health-service capacity. It is in this role that models, of various levels of complexity, could have the greatest public-health benefit.
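The Figure 3A scenario discussed above can be sketched with the SIR final-size relationship. The code below rests on one possible reading of that figure, which is an assumption on our part: the reproductive ratio of 1.4 fitted to early data is taken to be the effective value in a population where only a proportion S0 is initially susceptible, so every value of S0 reproduces the same early growth. Under that assumption the attack fraction w among initial susceptibles solves w = 1 − exp(−1.4 w), and the proportion of the whole population infected is S0 × w. The function names and the bisection tolerance are illustrative.

```python
import math


def attack_fraction(r_eff, tol=1e-12):
    """Non-zero root of w = 1 - exp(-r_eff * w), found by bisection.

    w is the fraction of the initially susceptible population that is
    eventually infected; it is zero when r_eff <= 1.
    """
    if r_eff <= 1.0:
        return 0.0
    lo, hi = 1e-6, 1.0
    # f(w) = 1 - exp(-r_eff * w) - w is positive just above zero and negative at w = 1.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if -math.expm1(-r_eff * mid) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def total_infected(s0, r_eff=1.4):
    """Proportion of the whole population infected if a proportion s0 starts susceptible."""
    return s0 * attack_fraction(r_eff)


if __name__ == "__main__":
    for s0 in (0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
        print(f"initially susceptible {s0:.0%}: infected {total_infected(s0):.1%}")
```

Across the 50% to 100% range of initial susceptibility considered in the text, the predicted proportion of the population infected roughly doubles even though the early growth is identical, which is the kind of irreducible uncertainty that early case data alone cannot resolve.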
Acknowledgements
This research was supported by the Medical Research Council. We thank Thomas House for his very helpful comments on the manuscript, and Sam Mason for his help visualizing the FF100 data for Figure 2.
Funding
This research was supported by the Medical Research Council.
References
1 Hamer W (1906) Epidemic diseases in England: the evidence of variability and of persistency of type. Lancet, 1, 733–739.
2 Kermack WO, McKendrick AG (1927) Contribution to the mathematical theory of epidemics. Proc R Soc Lond A, 115, 700–721.
3 Anderson RM, May RM (1991) Infectious Diseases of Humans. Oxford University Press, Oxford, UK.
4 Keeling MJ, Rohani P (2008) Modeling Infectious Diseases. Princeton University Press, New Jersey, USA.
5 Begon M, Turner J (2002) A clarification of transmission terms in host-microparasite models: numbers, densities and areas. Epidemiol Infect, 129, 147–153.
6 Riley S (2007) Large-scale spatial-transmission models of infectious disease. Science, 316, 1298–1301.
7 Ferguson NM, Cummings DA, Fraser C, Cajka JC, Cooley PC, Burke DS (2006) Strategies for mitigating an influenza pandemic. Nature, 442, 448–452.
8 Mossong J, Hens N, Jit M et al. (2008) Social contacts and mixing patterns relevant to the spread of infectious diseases. PLoS Med, 5, 381–391.
9 González M, Hidalgo C, Barabási A-L (2008) Understanding individual human mobility patterns. Nature, 453, 779–782.
10 Eagle N, Pentland A (2009) Eigenbehaviors: identifying structure in routine. Behav Ecol Sociobiol, 63, 1057–1066.
11 Schenzle D (1984) An age-structured model of pre- and post-vaccination measles transmission. IMA J Math Appl Med Biol, 1, 169–191.
12 Bolker BM (1993) Chaos and complexity in measles models: a comparative numerical study. IMA J Math Appl Med Biol, 10, 83–95.
13 Bartlett MS (1957) Measles periodicity and community size. J R Stat Soc A, 120, 48–70.
14 Bartlett MS (1956) Deterministic and stochastic models for recurrent epidemics. Proc Third Berkeley Symp Math Stat Prob, 4, 81–108.
15 May RM, Anderson RM (1987) Transmission dynamics of HIV-infection. Nature, 326, 137–142.
16 Nowak M, May RM (2005) Virus Dynamics. Oxford University Press, Oxford, UK.
17 Lloyd-Smith JO, Galvani AP, Getz WM (2003) Curtailing transmission of severe acute respiratory syndrome within a community and its hospital. Proc R Soc Lond B, 270, 1979–1989.
18 Wearing HJ, Rohani P, Keeling MJ (2005) Appropriate models for the management of infectious diseases. PLoS Med, 2, 621–627.
19 Ferguson NM, Donnelly CA, Anderson RM (2001) Transmission intensity and impact of control policies on the foot and mouth epidemic in Great Britain. Nature, 413, 542–548.
20 Keeling MJ, Woolhouse ME, Shaw DJ et al. (2001) Dynamics of the 2001 UK foot and mouth epidemic: stochastic dispersal in a heterogeneous landscape. Science, 294, 813–817.