Rsif 2020 1006

This research presents a dynamic ensemble learning approach to forecast dengue fever epidemic years in Brazil from weather patterns and population susceptibility cycles. Using weather data alone, the method correctly forecast 81% of epidemic years across 20 municipalities from 2012 to 2017; adding empirically observed susceptibility cycles improved the detection of non-epidemic years. Because the framework requires only daily temperature and precipitation and a yearly record of dengue incidence, it could be extended to other locations to support early warning of dengue outbreaks.
A dynamic, ensemble learning approach to forecast dengue fever epidemic years in Brazil using weather and population susceptibility cycles

royalsocietypublishing.org/journal/rsif

Research

Sarah F. McGough1,2, Leonardo Clemente1,3, J. Nathan Kutz4 and Mauricio Santillana1,2,5

1 Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA 02115, USA
2 Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA
3 Tecnológico de Monterrey, 64849 Monterrey, Nuevo León, Mexico
4 Department of Applied Mathematics, University of Washington, Seattle, WA 98195, USA
5 Department of Pediatrics, Harvard Medical School, Harvard University, Boston, MA 02115, USA

LC, 0000-0001-8939-8841

Cite this article: McGough SF, Clemente L, Kutz JN, Santillana M. 2021 A dynamic, ensemble learning approach to forecast dengue fever epidemic years in Brazil using weather and population susceptibility cycles. J. R. Soc. Interface 18: 20201006. https://doi.org/10.1098/rsif.2020.1006
Received: 11 December 2020
Accepted: 19 May 2021

Subject Category: Life Sciences–Physics interface
Subject Areas: computational biology, biomathematics
Keywords: dengue, forecasting, ensemble

Transmission of dengue fever depends on a complex interplay of human, climate and mosquito dynamics, which often change in time and space. It is well known that its disease dynamics are highly influenced by multiple factors including population susceptibility to infection as well as by microclimates: small-area climatic conditions which create environments favourable for the breeding and survival of mosquitoes. Here, we present a novel machine learning dengue forecasting approach, which, dynamically in time and space, identifies local patterns in weather and population susceptibility to make epidemic predictions at the city level in Brazil, months ahead of the occurrence of disease outbreaks. Weather-based predictions are improved when information on population susceptibility is incorporated, indicating that immunity is an important predictor neglected by most dengue forecast models. Given the generalizability of our methodology to any location or input data, it may prove valuable for public health decision-making aimed at mitigating the effects of seasonal dengue outbreaks in locations globally.
Authors for correspondence:
Sarah F. McGough, e-mail: [email protected]
Mauricio Santillana, e-mail: [email protected]

Electronic supplementary material is available online at https://doi.org/10.6084/m9.figshare.c.5448568.

© 2021 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.

1. Introduction
Owing to emerging sensor technologies and computational advances, the last decade has seen significant strides in the way data are generated and collected, resulting in large volumes of complex information known as ‘big data’. The recent availability of these data has opened up the possibility of new and complementary avenues for epidemic monitoring that leverage diverse data modalities such as satellite imagery [1,2], Internet search engine activity [3,4], social media [5], mobile phones [6,7], genomics [8,9] and disease surveillance databases [10,11]. This has opened up opportunities to posit and explore more hypotheses for characterizing the causes and outcomes of disease transmission, population behaviour, environmental conditions and other potential indicators of population health. Exploiting these relationships to generate reliable prospective forecasts would benefit health systems by allowing early mobilization of resources for the prevention of morbidities and deaths in the face of public health threats. A major challenge in disease forecasting is developing algorithms that can autonomously and continuously learn from these complex and ever-changing dynamical systems, uncovering patterns and signals with little human effort. Machine learning algorithms are ideally suited for such tasks. Indeed, they are having a profound impact across a wide range of application fields because of their ability to aid in learning and discovery.
Figure 1. Ensemble forecast workflow. (a) To predict next year’s epidemic status, we extract features from a daily time series of temperature (K) and precipitation (mm) over a defined (t0, p) time interval and for each year in the training period. (b) We produce an array of features corresponding to the mean value of temperature and precipitation over the (t0, p) interval and (c) train an SVM to classify next year’s epidemic status. (d) This process is repeated for all 432 (t0, p) intervals, and the top 11 models are automatically selected to (e) contribute to a majority voting system based on historical out-of-sample accuracy.
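As an illustration of steps (a), (b) and (d) of the workflow, the following minimal Python sketch enumerates a grid of (t0, p) windows and computes the mean temperature and mean precipitation over each one. It is not the authors' code. The grid layout is an assumption: the text gives only the total of 432 intervals, start dates from June onwards and period lengths of 10 to 95 days, so the sketch uses 24 start dates spaced 5 days apart from 1 June and 18 period lengths in 5-day steps, which is one combination that gives 432 windows ending within the calendar year. The function names and the toy weather series are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean

# ASSUMPTION: 24 start dates x 18 period lengths = 432 windows. The source
# states only the total and the ranges, not how the grid is laid out.
PERIOD_LENGTHS = list(range(10, 100, 5))   # 10, 15, ..., 95 days (18 values)
N_START_DATES = 24
START_STEP_DAYS = 5                        # hypothetical spacing from 1 June


def window_grid(year):
    """Return every (t0, p) pair for one calendar year."""
    first = date(year, 6, 1)
    starts = [first + timedelta(days=START_STEP_DAYS * i) for i in range(N_START_DATES)]
    return [(t0, p) for t0 in starts for p in PERIOD_LENGTHS]


def extract_features(daily, t0, p):
    """Mean temperature (K) and mean precipitation (mm) over the p days from t0.

    `daily` maps a date to a (temperature, precipitation) tuple.
    """
    days = [t0 + timedelta(days=i) for i in range(p)]
    temps = [daily[d][0] for d in days]
    rains = [daily[d][1] for d in days]
    return mean(temps), mean(rains)


# Toy daily series for one year (invented): constant 300 K, 2 mm every other day.
toy = {}
d = date(2000, 1, 1)
while d.year == 2000:
    toy[d] = (300.0, 2.0 if d.toordinal() % 2 == 0 else 0.0)
    d += timedelta(days=1)

grid = window_grid(2000)
features = {(t0, p): extract_features(toy, t0, p) for t0, p in grid}
```

Each entry of `features` is the two-number input that one window-specific classifier would receive for that year; repeating the extraction for every training year gives the feature array of panel (b).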

One such complex system is the interplay of human, climate and mosquito dynamics that give rise to the transmission of mosquito-borne diseases such as dengue. Dengue fever, a viral mosquito-borne disease transmitted predominately by the Aedes aegypti and Aedes albopictus mosquitoes, infects an estimated 390 million people per year, with nearly half the world’s population living at risk of infection [12]. The global burden of dengue has doubled every 10 years over the last three decades [13], and the disease is projected to expand its latitude range as global temperatures increase and create new suitable habitats for the Aedes mosquitoes among previously unexposed human populations [14]. Short-term climate conditions, particularly temperature and precipitation, can create favourable conditions for the breeding and survival of Aedes mosquitoes that may increase the transmission of the dengue fever virus in humans. Distinct ranges of temperature and precipitation have been observed to have an influence on the extrinsic incubation period [15,16], mosquito maturation rate [17], length of larval hatch time [18], survival rate [19] and biting rate [20]. However, the relationships that govern these parameters and give rise to dengue transmission are complex and dynamic, changing over time and across geographies. Moreover, multi-year cycles of dengue fever outbreaks, caused by one or more circulating dengue fever serotypes (DENV I, II, III, IV) and short-term immunity conferred after infection, add an important layer of complexity to prediction [21].

The dengue forecasting literature lacks a systematic, self-adaptive and generalizable framework capable of identifying weather and population susceptibility patterns that may be predictive of dengue fever outbreaks, particularly at the city level. Vector-borne diseases commonly exhibit spatial heterogeneity, a result of spatial variation in vector habitat, weather patterns and human control actions [22–25]. For developing forecast systems, this feature implies a trade-off between model consistency and spatial resolution. As a consequence, most studies to date focus on producing ad hoc predictions for a single location, ranging from the national to the city level [26–28], while others build and evaluate multiple modelling strategies per study site in efforts to manually identify relationships between weather patterns and dengue incidence over different geographies and temporal windows [29,30]. Both approaches highlight the difficulty in producing forecast models that are viable in diverse settings. By contrast, data-driven techniques demonstrate promise by learning from multi-scale, complex systems and automatically adapting to new information. A recent descriptive study showed the promise of a data-driven approach in identifying weather patterns with meaningful signals for dengue fever outbreaks [31]. Specifically, that study identified temperature and frequency of precipitation as key features in forecasting dengue outbreaks by extracting windowed time intervals for different cities that were highly predictive. Motivated by such learning algorithms, we build upon this data-driven strategy to develop a richer, supervised forecasting algorithm.

2. Results

2.1. Exploiting weather signals to create a data-driven forecast system
We obtained data on both annual dengue fever cases (Brazilian Ministry of Health) for 2001–2017 and on daily temperature and precipitation (GMAO-NASA) for 2000–2016, for 20 dengue-endemic municipalities (figure 1; electronic supplementary material, table S1) in Brazil. Weather patterns were extracted and analysed across hundreds of partially overlapping time intervals collectively spanning the last seven months of a given year, a time period that typically precedes the onset of epidemic outbreaks in Brazil. Each of these patterns was then assessed for its ability to predict an outbreak year (defined as a year in which the number of cases exceeds 100 per 100 000 persons) for the subsequent year. Retrospective and fully out-of-sample forecasts, trained on a yearly expanding window, were produced for 10 years (2008–2017) and for each time interval using support vector machines (SVMs), a binary classifier. Every year, the time intervals with high historical predictive power were automatically selected and evaluated in the upcoming year to produce out-of-sample predictions for the subsequent dengue season (figure 1). An ensemble approach was then implemented to determine, in a completely out-of-sample fashion (using the first 4 years of out-of-sample predictions to inform ensemble model selection), the system’s final prediction: whether a year would be epidemic or not for the next 6 years (2012–2017).

This system, which autonomously identifies and exploits the predictions of multiple time windows during the calendar year, makes it possible to identify temporally similar regions of highly predictive periods of the year preceding dengue outbreaks, here referred to as ‘weather signatures’. Weather signatures represent time windows across years that show strong influence
Figure 2. The 10 year (2008–2017) out-of-sample forecast accuracy (%) for each time window of temperature and precipitation, by municipality. The x-axis (t0) indicates the start date of the time interval, and the y-axis (p) indicates the length of the time interval from which weather data were gathered (10–95 days). Models achieving at least 7/10 correct out-of-sample forecasts are shown in shades of yellow. Municipalities are ordered by decreasing ensemble prediction accuracy; that is, the proportion of years correctly forecast by the ensemble method over the years 2012–2017.
Panels, in order: São Gonçalo, Santa Cruz do Capibaribe, Juazeiro do Norte, Jí-Paraná, Rondonópolis, Manaus, São Luís, Barra Mansa, Eunápolis, Sertãozinho, Belo Horizonte, Parnaíba, São Vicente, Barretos, Aracajú, Guarujá, Três Lagoas, Maranguape, Barueri, Rio de Janeiro.
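The per-window accuracies summarized in figure 2 come from an expanding-window, out-of-sample evaluation. The sketch below shows one way such an evaluation could be organized for a single (t0, p) window; it is an illustration under stated assumptions rather than the authors' implementation. To stay dependency-free it replaces the SVM with a nearest-centroid classifier, which is a stand-in only. The outbreak threshold (more than 100 cases per 100 000 persons) and the forecast years (2008–2017) are taken from the text; the function names and the toy data are hypothetical.

```python
def is_epidemic(cases, population):
    """Outbreak year: more than 100 cases per 100 000 persons."""
    return cases * 100_000 / population > 100


def centroid_classifier(train_X, train_y):
    """STAND-IN for the SVM: predict the class whose mean feature vector is nearest."""
    def centroid(label):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        if not rows:
            return None
        return [sum(col) / len(rows) for col in zip(*rows)]

    cents = {label: centroid(label) for label in (0, 1)}
    available = {k: c for k, c in cents.items() if c is not None}

    def predict(x):
        def dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(available, key=lambda k: dist(available[k]))

    return predict


def expanding_window_accuracy(features, labels, first_forecast=2008, last_forecast=2017):
    """features[y]: weather features of year y; labels[y]: epidemic status (0/1) of year y.

    For each forecast year Y, train on pairs (features[y - 1], labels[y]) with y < Y,
    then predict labels[Y] from features[Y - 1]. Year Y itself is never used in training.
    """
    hits = 0
    years = range(first_forecast, last_forecast + 1)
    for Y in years:
        train_years = [y for y in sorted(labels) if y < Y and (y - 1) in features]
        X = [features[y - 1] for y in train_years]
        t = [labels[y] for y in train_years]
        predict = centroid_classifier(X, t)
        hits += predict(features[Y - 1]) == labels[Y]
    return hits / len(years)


# Toy example (invented numbers): warm, wet years are followed by epidemic years.
toy_features = {y: (302.0, 4.0) if y % 2 == 0 else (299.0, 3.5) for y in range(2000, 2017)}
toy_labels = {y: int((y - 1) % 2 == 0) for y in range(2001, 2018)}
accuracy = expanding_window_accuracy(toy_features, toy_labels)
keep_window = accuracy >= 0.7   # figure 2 highlights windows with at least 7/10 correct
```

Running the same loop for every window in the grid would give one accuracy value per (t0, p) cell, which is the quantity the figure displays for each municipality.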

(predictive power) on the incidence of dengue in a subsequent year. We observed that cities where our methodology led to higher prediction accuracy tended to have clear and robust weather signatures over the years, while cities where our approach was not strongly predictive did not exhibit consistent and robust weather patterns (figures 2 and 3a). Further, we observed that strong weather signatures in our sample of cities often corresponded with or preceded important alternating tropical seasons, such as rainy and dry seasons.

2.2. Weather-based forecasting performance
Using weather data (temperature and frequency of precipitation) alone to predict annual dengue outbreaks, our approach correctly forecast 81% of all epidemic years across 20 municipalities in Brazil between 2012 and 2017 (table 1, figure 3). For reference and as a baseline, the frequency of epidemic and non-epidemic years was 60% and 40%, respectively; thus, a naive approach that predicts that all years are epidemic (the class majority) would achieve an overall accuracy of 60%. Our approach identified only 58% of non-epidemic years correctly. This resulted in an overall accuracy of approximately 72%. Our approach significantly exceeded (p = 0.005) the predictive power of a naive predictor.

2.3. Incorporating empirically observed dengue susceptibility cycles
The previously described weather-based ensemble approach ignores important factors that may influence the emergence of epidemic outbreaks from year to year, such as the population susceptibility to being infected with the virus. Specifically, endemic transmission of dengue fever is typically distinguished by periodic outbreak cycles of around 3–4 years. These outbreak cycles are thought to occur as a result of (i) an exhaustion of the susceptible population after an outbreak and (ii) short-term cross-immunity to other circulating DENV serotypes after infection [21], although the cycles can also be complicated by increased severity of a second infection [32]. Both factors result in a depletion of the population vulnerable to infection and act as barriers to subsequent outbreaks. Independent of climate variability over the years, we expect some preservation of these susceptibility cycles.

Inspired by this phenomenon, we implemented a data-driven hidden Markov model by empirically computing the frequency of transitioning between multiple sequences of epidemic and non-epidemic years (described in detail in the electronic supplementary material). Given the previously observed sequence of consecutive outbreak and non-outbreak years (dengue fever cycles), the Markov model computes the probability of the next year being an outbreak or a non-outbreak year. This acts as a proxy for dengue fever susceptibility in the population, as it accounts for the cyclical nature of outbreaks that may be influenced by, for example, a depletion of the susceptible population following multiple years of high dengue activity. The approach is implemented as follows: if the weather-based approach makes a prediction with low probability, a decision rule is implemented to automatically override the weather-based prediction if the hidden Markov
Figure 3. Weather-based prediction results for 120 municipality years. (a) Annual out-of-sample forecasts of outbreak status (epidemic/non-epidemic) for 20 Brazilian municipalities from 2012 to 2017, shaded by the mean posterior probability of the true outbreak status. Correct forecasts are indicated by a plus (+) sign, and cells with light shading indicate that the model predicted the class with low probability. Municipalities are ordered by decreasing ensemble prediction accuracy; that is, the proportion of years correctly forecast by the ensemble method over the years 2012–2017. (b) The number of total epidemic and non-epidemic years correctly forecast across 20 municipalities, by year. The dashed white line indicates the number correctly forecast after the incorporation of empirically observed dengue cycles. (c) The mean posterior class probability across municipalities, by year and epidemic status.
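The susceptibility-cycle correction referred to in panel (b) can be illustrated with a small sketch. The version below is a plain empirical Markov chain over observed epidemic statuses combined with a simple override rule; it is a stand-in for illustration and may differ from the authors' model, whose details are in their electronic supplementary material. Two quantities are assumptions introduced here and do not come from the text: the history length of three years and the probability threshold of 0.6 below which a weather forecast is treated as low-confidence. All function names and the city histories are hypothetical.

```python
from collections import Counter, defaultdict

HISTORY = 3            # ASSUMPTION: condition on the previous three years
LOW_CONFIDENCE = 0.6   # ASSUMPTION: weather probabilities below this may be overridden


def transition_frequencies(sequences, k=HISTORY):
    """Count how often each k-year pattern was followed by an epidemic (1) or not (0)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(k, len(seq)):
            counts[tuple(seq[i - k:i])][seq[i]] += 1
    return counts


def cycle_probability(counts, recent, k=HISTORY):
    """Empirical P(next year is epidemic | last k years); None if the pattern is unseen."""
    seen = counts.get(tuple(recent[-k:]))
    if not seen:
        return None
    return seen[1] / (seen[0] + seen[1])


def combined_forecast(weather_pred, weather_prob, counts, recent):
    """Keep the weather forecast unless it is weak and the cycle model is more confident."""
    p_epi = cycle_probability(counts, recent)
    if p_epi is None or weather_prob >= LOW_CONFIDENCE:
        return weather_pred
    cycle_pred = int(p_epi >= 0.5)
    cycle_prob = max(p_epi, 1 - p_epi)
    return cycle_pred if cycle_prob > weather_prob else weather_pred


# Invented histories for three cities (1 = epidemic year, 0 = non-epidemic year).
histories = [
    [1, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 0, 1, 1],
    [1, 1, 1, 0, 1, 0, 1, 1],
]
counts = transition_frequencies(histories)
overridden = combined_forecast(1, 0.55, counts, recent=[1, 1, 1])
kept = combined_forecast(1, 0.90, counts, recent=[1, 1, 1])
```

In these invented histories, three consecutive epidemic years are always followed by a non-epidemic year, so a weak epidemic forecast (probability 0.55) is overturned while a confident one (0.90) is kept.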

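The weather-only rates reported for the 120 municipality years can be checked with simple arithmetic. The sketch below does this under two assumptions that are not stated in the text: the integer counts of correct forecasts (58 and 28) are inferred as the nearest values that reproduce the reported 81% and 58%, and the p-value is computed with a one-sided exact binomial test against the no-information rate, which is one common way to obtain such a figure and may not be the test the authors used.

```python
from math import comb

N = 120                       # 20 municipalities x 6 forecast years
n_epidemic = round(0.60 * N)  # 72 epidemic municipality years
n_non = N - n_epidemic        # 48 non-epidemic municipality years

# INFERRED counts: nearest integers that reproduce the reported 81% and 58%.
true_pos = 58                 # epidemic years forecast correctly
true_neg = 28                 # non-epidemic years forecast correctly

sensitivity = true_pos / n_epidemic
specificity = true_neg / n_non
accuracy = (true_pos + true_neg) / N
no_information_rate = max(n_epidemic, n_non) / N


def binomial_upper_tail(successes, trials, p):
    """Exact P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, k) * p**k * (1 - p) ** (trials - k)
               for k in range(successes, trials + 1))


p_value = binomial_upper_tail(true_pos + true_neg, N, no_information_rate)
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} "
      f"accuracy={accuracy:.3f} p={p_value:.4f}")
```

With these inferred counts, 86 of 120 forecasts are correct, which matches the reported overall accuracy of roughly 72%.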
model (based on the pattern of consecutive outbreaks and non-outbreaks in years prior) predicts a more likely scenario. In this way, the ‘cycles’ of dengue fever outbreak susceptibility are incorporated into our otherwise agnostic weather-based approach.

2.4. Combining dengue cycles with weather patterns improves forecasts
Compared with the exclusively weather-based approach, incorporating these empirically observed dengue cycles into our system improved our ability to predict non-epidemic years by approximately 20% (specificity = 69%) and increased overall accuracy to 74.2% (table 1). Specifically, the additional decision rule replaced seven epidemic forecasts with non-epidemic forecasts, of which five were correct (figure 3b). The majority of these cases belonged to cities which had experienced three consecutive epidemic years leading up to the prediction.

Table 1. Performance of weather-based out-of-sample forecasts across 120 municipality years in Brazil, with and without consideration for DENV susceptibility cycles.

evaluation metric                            weather      weather + DENV cycle
accuracy                                     71.70%       75%
hit rate (sensitivity)                       81%          78%
non-epidemic detection rate (specificity)    58%          71%
no-information rate                          60%          60%
P (accuracy > no-information rate)           p = 0.005    p = 0.0004

Overall, the combined approach (weather-based plus dengue cycles) was dominantly driven by weather patterns and informed by the decision rule only in a few cases when historical data showed a very strong likelihood of either an epidemic or a non-epidemic year happening. Thus, the decision rule to favour the Markov model acts as an ‘expert opinion’ for situations in which there is clear evidence that a given predicted outbreak scenario (even if suggested by the weather patterns) is unlikely. Our specific finding, that the dengue cycles were used exclusively to overturn epidemic forecasts, suggests that while the weather conditions in those locations and years were identified to be conducive to an outbreak, there was stronger evidence that the population may have had low susceptibility to infection (thus avoiding an outbreak), based on multiple consecutive preceding years of high disease incidence.

2.5. Model performance by year
The success of our combined epidemic forecasts varied by year, reflecting the difficulty of forecasting disease activity relying only on weather patterns and the empirically extracted susceptibility cycles. During the last three years of the time series (2015–2017), epidemics were predicted by the weather-only models with at least 80% accuracy, with 100% of the 13 outbreaks in 2016 correctly forecast (figure 3b,c). Conversely, non-epidemic years during 2013–2014 were particularly difficult to predict, with only one-third and one-half of cities correctly forecasting non-epidemics for these years, respectively. The most successful non-epidemic predictions occurred in 2012, for which six out of eight non-epidemics (75%) were predicted correctly. Overall, 2015 and 2016 were the most successfully classified years, with 80% and 85% of municipalities correctly classified
Figure 4. Periods of the year selected into the ensemble forecast model for 2012–2017, by municipality. The x-axis (t0) indicates the start date of the time interval, and the y-axis (p) indicates the length of the time interval from which weather data were gathered (10–95 days). Municipalities with smaller and brighter yellow centres are those which exhibit the highest consistency in the predictive performance of weather patterns. Municipalities are ordered by decreasing ensemble prediction accuracy; that is, the proportion of years correctly forecast by the ensemble method over the years 2012–2017.
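The selection of windows into the ensemble shown in figure 4, together with the top-11 majority vote of figure 1e, can be sketched as follows. The paper states that a window is selected when both its own historical out-of-sample performance and that of its calendar neighbours are high, but gives no formula here, so the scoring rule below (an equal-weight average of a grid cell and its four adjacent cells) is an assumption, as are the function names and the invented accuracy grid.

```python
TOP_K = 11  # number of windows that vote, as stated in the figure 1 caption


def neighbourhood_score(acc, i, j):
    """Average a window's historical accuracy with its immediate grid neighbours.

    ASSUMPTION: neighbours are the adjacent cells in start date (i) and period
    length (j), all weighted equally. The source does not give the exact rule.
    """
    cells = [(i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    vals = [acc[a][b] for a, b in cells if 0 <= a < len(acc) and 0 <= b < len(acc[0])]
    return sum(vals) / len(vals)


def select_windows(acc, k=TOP_K):
    """Return the k grid cells with the highest neighbourhood score."""
    scored = [(neighbourhood_score(acc, i, j), i, j)
              for i in range(len(acc)) for j in range(len(acc[0]))]
    scored.sort(reverse=True)
    return [(i, j) for _, i, j in scored[:k]]


def majority_vote(predictions):
    """Final forecast: 1 (epidemic) if more than half of the binary votes are 1."""
    return int(sum(predictions) > len(predictions) / 2)


# Toy 24 x 18 accuracy grid (invented): one cluster of strong windows.
acc = [[0.5] * 18 for _ in range(24)]
for i in range(8, 12):
    for j in range(4, 8):
        acc[i][j] = 0.9
acc[20][15] = 1.0   # an isolated lucky window with weak neighbours

chosen = select_windows(acc)
votes = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1]   # one forecast per selected window
final = majority_vote(votes)
```

Scoring a window together with its neighbours favours compact clusters of consistently accurate windows over isolated cells that may be accurate by chance, which is the behaviour the caption describes as small, bright centres. Using an odd number of voters avoids ties.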

as epidemics or non-epidemics, respectively, while 2014 and 2017 were the most difficult years to predict, with 45% and 35% of municipalities misclassified, respectively. Incorporating information on the dengue cycles helped detect an additional non-epidemic in 2012 and 2015, and an additional three non-epidemics in 2017 (figure 3b).

2.6. Quantifying the strength of predictions
Because our forecast system produces deterministic binary predictions (epidemic/non-epidemic year) using local-in-time SVM classifiers, a natural question is how to quantify the conviction (or confidence) of each prediction. It is important to note that the number of observations per city is small (n = 17), and, thus, a rigorous probabilistic approach to quantifying conditional probabilities of success is not feasible. However, in the interest of better communicating to public health officials the reliability of our predictions in a given location and time period, as well as identifying the determinants of success of our prediction system if one were to extend our predictive approach to new locations, we explored simple ways to characterize the accuracy and conviction of predictions. We did this based on both the historical performance of the selected ensemble generating the prediction and the performance of the weather-based classifiers themselves.

Our prediction system combines the output of a collection of local-in-time binary classifiers that use different time periods (characterized by an initial point in time, t0, and a window length, p), prior to the typical date of the onset of dengue outbreaks, as predictors. For each city and each year, the combination of these outputs is calculated using a voting system that only considers time windows that have consistently exhibited the highest historical out-of-sample prediction performance among all other time windows of the calendar year. In our framework, time windows are automatically selected into the forecasting ensemble if (i) their own historical out-of-sample performance is high and (ii) the historical performance of their calendar neighbours, that is, models using temporally nearby time windows as predictors, is high as well.

Consequently, we computed metrics of ensemble accuracy and strength (or confidence) by quantifying both of these elements. We found that, in cities where the predictive performance of our approach is highest (electronic supplementary material, figure S2), the successful individual classifiers that contribute to our final prediction use as input temporal regions that are clustered around one another (as shown in figure 4), suggesting that the presence of temporally consistent weather patterns can be thought of as an indicator of the success of our methodology.

It is important to note that models with high historical prediction performance may still lead to poor outcomes if the weather data for the year of (out-of-sample) forecast do not clearly belong to an epidemic or non-epidemic class, as learned by the individual classifiers, and/or if its weather patterns happen to ‘look like’ those appearing historically in the opposite class.

In order to further assess the individual strength or conviction of each individual classifier, we estimated whether the separability or difference between the two classes
(epidemic versus non-epidemic) was well captured by the classifier by extracting calibrated posterior probabilities of each SVM model using Platt’s scaling [33]. The posterior probability reflects the distance to the separation boundary distinguishing epidemic and non-epidemic years on the basis of weather. Thus, a higher probability represents how strongly the weather patterns of the prediction year aligned with those experienced by prior outbreak or non-outbreak years. We observed that, in general, the probabilities were moderately calibrated, i.e. roughly 80% of predictions made with 0.8 probability were epidemics (electronic supplementary material, figure S3); however, the small sample size (i.e. six out-of-sample years for each of the 20 cities) limits the ability to interpret this feature appropriately. We found that this measure of separability was not a particularly good indicator of accuracy; that is, our approach failed even in scenarios with high separability. Several factors may be driving this finding, including insufficient training data and the influence of factors beyond weather (e.g. sociodemographic characteristics, land use) on outbreaks; we elaborate further in the electronic supplementary material.

Both approaches to characterize the confidence of our predictions (quantifying ensemble strength and quantifying the separability of the data) highlight separate limitations of our modelling framework. First, we expect that both a greater variety of environmental variables (e.g. humidity, vegetation and standing water) and non-environmental variables (e.g. human activity and public health interventions) will contribute to more accurate predictions by considering broader factors that contribute to dengue fever activity in a given location. Second, the robustness of our predictions was limited by a short time series of annual information, which may not be sufficient to detect clear differences in epidemic and non-epidemic years on the basis of weather alone. Nonetheless, our reproducible modelling framework can easily be extended to accommodate additional predictors and longer time series, and thus we highlight these as limitations of only the present case study, with potential for improved performance in other data settings.

3. Discussion
Here, we have presented a novel approach to forecasting dengue fever outbreak years in Brazil at its smallest administrative unit, the city level, using a single, dynamic and flexible modelling framework that uses only two weather variables and historical information on yearly dengue activity. Our approach automatically learns from weather and population susceptibility patterns of any inputted yearly time series of dengue incidence and leverages the best historical predictions to generate an ensemble forecast. We find that complementing our weather-based statistical approach with observed 3–4 year cycles of dengue fever outbreaks (as a proxy for population susceptibility) is key for our models to achieve higher accuracy and improve substantially in predicting non-epidemic years. These forecasts may provide timely information on dengue fever activity to policymakers months ahead of outbreak seasons. Further, our entirely data-driven models show an ability to learn from complex relationships between dengue epidemics and climatic conditions and identify, in vastly different locations, potentially relevant weather patterns with likely biological significance. Importantly, these models can be immediately extended to other locations, requiring no location-specific manipulation or inputs aside from a globally available time series of daily temperature and precipitation as well as a complete yearly record of dengue incidence.

Using weather information only, our models seek to characterize and exploit the predictive ability of distinct weather patterns preceding outbreak years. Because our framework automatically identifies the time periods for which weather patterns produce strong signals, it was possible to identify temporal weather signatures in multiple locations with vastly different ecosystems and geographical locations. For this, we observed that cities with better overall prediction accuracy had stronger weather signatures, suggesting perhaps some biological consistency. For example, the southeastern municipality of Barra Mansa (five out of six ensemble years predicted correctly) exhibited strong signals from time windows spanning the first half of the city’s rainy season, in October through December of each year. Further north, the hot, wet and humid municipality of Manaus (five out of six ensemble years predicted correctly), situated at the mouth of the Amazon, appeared to show two distinct weather signatures straddling the driest month of the year, August. These patterns, generated from 10 years of out-of-sample model predictions, suggest that, in different regions of Brazil, the weather may affect dengue transmission differently and at different times of the year. However, in locations where weather-based predictions were less successful, these signatures were not distinct; for instance, Rio de Janeiro (three out of six ensemble years predicted correctly) showed no clear temporal trend. In cities such as these, we might expect to see a lower influence of weather patterns on transmission than with other predictors (e.g. sociodemographics, policy, population behaviour, human land use, vector abundance). We did not find clear patterns by geography, population density or municipality size. We believe this work should catalyse important research both on the local influence of weather patterns on dengue outbreaks and on the extent to which other factors drive outbreaks in these locations. Moreover, this data-driven approach may help generate hypotheses on the relevance of multiple factors that may influence the dynamics of seasonal dengue outbreaks.

Even weather conditions that appear highly suitable for an outbreak (or none), based on historical information, may be challenged by other factors that limit (or encourage) transmission of dengue. A key strength of our approach is the incorporation of empirically observed information on dengue fever susceptibility cycles, to correct for potential short-term immunity that results from previous exposure to the dengue virus. We found that these susceptibility cycles were critical to the performance of models, particularly those which identified weather patterns suitable for a dengue outbreak in a year with potentially low population susceptibility to infection. For instance, this approach correctly identified three additional non-epidemics in 2017 compared with weather patterns alone, supporting the discourse on the unusually low dengue activity seen in Brazil in 2017 [34]. Still, our models missed half (6/12) of non-epidemics in 2014, which was predicted by experts to be a low transmission year because of the immunity provided by a large outbreak in 2013 with no changes in circulating DENV serotypes [34,35]. Thus, incorporating information on specific circulating serotypes could be used to better
detect changes in population immunity and enhance our approach. Empirical and modelling-based seroprevalence studies may aid with this component, though this surveillance information is more challenging to routinely acquire [36]. Regardless, here we highlight the importance of incorporating mechanistic processes of disease transmission into data-driven approaches that may be otherwise blinded to them.
Our approach achieved an overall accuracy of 75%, which we believe is promising considering the difficulties in predicting the target. To put our results in context, we visited other benchmarks in the dengue prediction literature. While most dengue forecast models predict a continuous outcome such as total incidence (rendering comparisons of performance metrics not possible), we do find that dengue weather-based predictions achieve overall lower accuracy than other comparator models and achieve varied performance across distinct geographical regions, for example in the work of Lauer et al. [30] and Johansson et al. [29]. To the latter point, we find similarities with our work in that weather-based predictions performed well in some Brazilian municipalities, but not others. In another study that predicted a comparable binary outcome, weekly outbreak status, in Malaysian districts using weather information such as temperature and rainfall, the authors found an overall 70% accuracy using an SVM classifier [37], though noted that weather variables were not the most predictive in the model.
Because dengue transmission is driven by multiple complex socioecological and biological factors, we expect our models to capture only a portion of the epidemiological triangle. Here, we show the performance of two simple and relevant weather indicators of dengue fever, but the incorporation of additional weather features (i.e. humidity, vegetation and soil water absorption) combined with a feature selection step may lead to improved accuracy of forecasts, by considering more complex weather conditions preceding dengue outbreaks. However, in initial exploratory analysis, we did not find that other weather factors such as humidity or soil absorption outperformed temperature and precipitation alone, confirming the findings of [31,38] that factors other than temperature and precipitation may have little influence on dengue outbreaks. We also demonstrate the robustness of this approach by replicating the study using an alternative feature extraction method, singular value decomposition, with similar results (electronic supplementary material, figure S4). Nonetheless, we show that weather predictions fail in some cities, for example Rio de Janeiro (discussed above), where non-climatic factors may be influential in dengue outbreaks. For example, social factors including socioeconomic conditions [39], population mobility dynamics [40] and public health and infrastructure [41], as well as mosquito factors such as vector abundance [42], are known contributors to dengue transmission. These variables may contribute to a more complete understanding of dengue fever in Brazil. Our work shows that weather- and susceptibility-based models can contribute valuable information to larger ensemble approaches that leverage a collection of mobility, sociodemographic, epidemiological, climatic and biological information. Future work should explore the incorporation of these comprehensive data into a single modelling approach.
Our approach also demonstrates the feasibility (and limitations) of predicting in a ‘small data’ setting, wherein only 17 outcome data points were available in total for training and out-of-sample predictions (each representing annual outbreak status between 2001 and 2017). We chose a short training period (initial 7 years) to maximize the number of out-of-sample ensemble predictions, but ultimately it is difficult to establish strong climatic distinctions between outbreak and non-outbreak years in the data with so few samples. Thus, we anticipate improvement in performance for settings that have multiple decades of data, which would allow for longer training periods, improved separability in the data and more stable identification of dengue susceptibility cycles, all improving the quality, robustness and accuracy of predictions. In addition, where epidemiological data are available at finer temporal resolutions (e.g. weekly, monthly), this prediction problem could leverage more classical time-series approaches (such as SARIMA models) that incorporate adjustments for seasonality and trends, for example, as was done in [29]. Future studies should compare our approach with time-series-based methods wherever data are available to do so. Finally, our approach, which spans two decades and 20 locations, is limited by reporting heterogeneities in space and time. Brazil’s centralized compulsory notification system, SINAN (Information System for Notifiable Diseases), has experienced software and reporting standards changes over the last two decades, giving rise to potential discrepancies in disease reporting at temporal change points. In addition, the case notification data in SINAN originate from data collected at health facilities via epidemiological disease surveillance reporting forms, and despite well-centralized reporting standards, differences in reporting may exist between locations. However, dengue is a compulsory reportable disease in Brazil and receives a large number of reports nationwide each year (e.g. 1.7 million cases reported in 2015), and reporting is thought to accurately represent the overall trend of dengue in Brazil [43]. Because we further reduce the number of case reports to a binary outbreak status (epidemic/non-epidemic), our dependent variable may be less susceptible to these issues. Nonetheless, reporting heterogeneities are an inherent limitation to work like this.
Ultimately, this framework provides a simple, reproducible method of predicting dengue fever outbreak years in a wide range of locations. Given that the global and economic burden of dengue is placed at an estimated 390 million infections and US$8.9 billion per year [12,44], optimizing resource allocation for disease prevention is critical. However, control of the Aedes mosquito requires weeks or months before effects are seen on the vector population, so predicting dengue outbreaks up to several months before their onset is ideal. Our reproducible approach, which uses globally available data with the daily resolution, is intended to serve as a supervised learning framework to produce early outbreak warnings in any desired context, resulting in more efficient resource mobilization, budgeting and prevention campaigns. Moreover, the flexible approach can be extended to include other variables thought to be predictive of dengue outbreaks. Developing transparent early warning systems at the local level is emerging as a top global health priority, making our contribution both timely and impactful.

4. Material and methods
4.1. Study design
We developed a single, flexible modelling framework capable of identifying potentially useful weather patterns to predict dengue
fever and used this to forecast annual outbreak status (epidemic/non-epidemic).
Our workflow, outlined in figure 1, combines elements from signal processing/spectral analysis, machine learning and ensemble modelling to achieve robust, data-driven epidemic forecasts that do not require any prior knowledge of the system (i.e. climatic influences on dengue transmission). Our research question is inherently one of time-series classification, to forecast epidemic versus non-epidemic years of dengue fever. The workflow begins with a time series of hourly and daily weather information, which serve as inputs to a collection of classifiers that contribute to ensemble-based epidemic predictions. Our approach can be described in five steps.

1. Signal preprocessing: for a time series of weather data, define time intervals of varying sizes (10–95 days across the last seven months of the calendar year) and use a windowing technique [31] to include information within several days of the interval. In contrast with [31], there are no deleterious effects due to missing temperature data since the data are acquired via satellite instead of ground measurements.
2. Time-series feature extraction: extract a simple summary measure for two weather variables with known influence on mosquito-borne disease dynamics, temperature and frequency of precipitation. Although more variables can be considered, they have little influence on the predictive power in comparison with the two selected [31].
3. Independent model training and prediction: train a collection of independent SVM classifiers on historical information from each unique time interval, and generate an out-of-sample epidemic prediction for the following year. Although SVM was used in [31], we provide here a richer out-of-sample prediction scheme for forecasting.
4. Model selection: choose the best 11 models, representing strongly predictive periods of the year preceding outbreaks, based on (i) historical out-of-sample prediction accuracy and (ii) out-of-sample performance of neighbouring time intervals.
5. Ensemble prediction: determine a final out-of-sample epidemic forecast by a majority vote of the selected top models.

To potentially enhance the performance of this exclusively weather-based approach, we implemented a post hoc step incorporating empirical information on 3- and 4-year dengue fever cycles as a proxy for population susceptibility to infection.

6. Dengue cycles: implement a decision rule governed by the second- and third-order Markov transition probabilities, reflecting the transition between consecutive sequences of epidemic and non-epidemic states.

We applied our approach to 20 cities in Brazil spanning large geographical and population ranges (electronic supplementary material, figure S1 and table S1). We used as input a historical time series spanning 17 years and consisting of information on dengue case reports (number, annual) and two weather variables: 2 m air temperature (kelvin, daily) and precipitation (kg m⁻², hourly). We describe data sources, acquisition and processing in the electronic supplementary material. After an initial training period of 7 years, we generated 10 years of out-of-sample epidemic predictions for each of the independent models using a 1 year expanding training window (step 3). We used the first 4 years of out-of-sample predictions to inform ensemble model selection (step 4) and produced ensemble-based predictions for the remaining 6 years (step 5).

4.2. Signal preprocessing
Using a daily time series of weather data to forecast dengue fever epidemic status requires identifying the most predictive period(s) of the calendar year during which weather information contains a strong signal for subsequent dengue fever outbreaks. In order to construct a single framework that can automatically identify important weather signals in multiple different locations with vastly different ecosystems and weather patterns, we allow the data to inform the choice of time intervals. Our algorithm achieves this by scanning over multiple, partially overlapping time intervals across the calendar year, and building hundreds of models on these different intervals in order to select those with the strongest signals.
Each time interval is defined by a start date, t0, between early June and late September, and a period length, p, of between 10 and 95 days. The combination of each (t0, p) produces multiple, partially overlapping intervals spanning the last seven months of the calendar year.
Borrowing from spectral analysis and wavelet decomposition, we use a windowing-inspired approach to better capture signals within the time intervals. Windowing is typically used to improve signal clarity, and here we apply a rectangular ‘range’ as described in [31] to incorporate the information in the days both within and around each time interval. We define a rectangle of 5 × 6, indicating that, for every defined (t0, p) time interval, the algorithm collects information from five consecutive start dates, t0, t0 + 1, …, t0 + 4, spanning six consecutive period lengths, p, p + 1, …, p + 5. Each time interval and weather variable, then, is summarized by 30 data points, each capturing slightly different temporal slices from the time series. This process effectively adds a bit of redundant information to the model-building process (to which our learning algorithm, the SVM, is in general robust) in order to pick up signals in the data that may not be captured by applying an arbitrary ‘start’ and ‘end’ cut-off to the data.

4.3. Time-series feature extraction
Time-series data must be transformed into appropriate inputs in order to be used in supervised learning models. This process, called time-series feature extraction, involves computing summary features of the time series, which can range from simple means to complex wavelet transforms. To test the feasibility of our approach using only simple summary features, we extracted the following features within each (t0, p) time interval based on the findings of [31]: (i) the arithmetic mean of daily temperature and (ii) mean precipitation frequency, with the frequency defined as the time interval (in days) between peaks (local maxima) of daily precipitation. In the electronic supplementary material, we present an alternative method of feature extraction using singular value decomposition.

4.4. Independent model training and prediction
The goal of our independent model-building step is to identify dynamically, through the continually updating performance of a collection of models, the periods of the year that are most predictive of annual dengue outbreaks, in order to exploit a small number of them to generate forecasts.
To forecast outbreak years, we trained a collection of SVM classifiers on an initial 7 year training period and produced annual forecasts incorporating the most recently available weather information using a dynamic, 1 year expanding training window. A unique SVM was trained for each of the (t0, p) time intervals, resulting in a total of 432 independent models trained per year. Each model generated out-of-sample predictions for the remaining 10 years of data. Predictions were made by classifying the 30 out-of-sample data points corresponding to the weather information preceding the target year, and taking a majority vote. In order to handle highly nonlinear relationships between weather variables, both radial basis function and sigmoid kernels were used and evaluated for performance and show results for
the best respective kernel in each city. We tuned model parameters (gamma, soft margin cost function and coefficient) using 10-fold cross-validation.
SVMs, a supervised learning method for classification, were used because of their flexibility in the face of complex, nonlinear decision boundaries and their robustness to overfitting and outliers. The property that underpins these advantages is known as the ‘large-margin classifier’. SVMs are also known for their good performance in high-dimensional feature space, which is advantageous for the scale-up of the model to include dozens more predictors.

4.5. Model selection
From the resulting collection of 432 models, the best-performing models (n = 11) were selected each year based on (i) historical out-of-sample prediction accuracy (per cent of outbreak forecasts correct) and (ii) out-of-sample prediction accuracy of neighbouring models (representing similar time intervals). These models thus represent strongly predictive periods of the year preceding outbreaks, and the algorithm rewards the high performance of similar temporal windows over the high performance of a time window whose neighbours exhibit poor prediction tendencies. Because the model-building process is dynamic, resulting in a new collection of models each year with continually updating performance measures, the selection of the 11 models changes from year to year.
In order to get a sense of the out-of-sample performance of the 432 models, we allowed all models to generate 4 years of out-of-sample predictions before the top 11 models were selected based on this prediction accuracy. As a result, the ensemble approach, which exploited the predictions of the top 11 models, was used for the final 6 years of out-of-sample predictions.

4.6. Ensemble prediction
Ensemble learning helps improve machine learning algorithms by combining the results of multiple trained predictors in order to generate a single, robust prediction. In our approach, we combine the results from the strongest-performing models, which represent the most highly predictive time periods preceding dengue outbreaks. While there is an abundance of ensembling methods in machine learning, we use a simple majority vote of the 11 models to decide a single forecast. These single forecasts were produced for the last 6 years of the 17 year dataset, representing the culmination of a prediction process that involves: 7 year initial training period, 4 year out-of-sample model calibration period and 6 year out-of-sample ensemble prediction period. Across 20 Brazilian municipalities, this scheme produced 120 municipality years of out-of-sample ensemble predictions.

4.7. Dengue cycles
Our weather-based ensemble approach remains agnostic to the relationship between weather patterns and dengue outbreaks, instead allowing the data to drive model selection and predictions. However, endemic transmission of dengue fever is typically distinguished by periodic outbreak cycles of around 3–4 years. These outbreak cycles are thought to occur as a result of (i) an exhaustion of susceptibles after an outbreak and (ii) short-term cross-immunity to other circulating DENV serotypes after infection [21]. Both factors result in a depletion of the population vulnerable to infection and act as barriers to subsequent outbreaks. Independent of climate variability over the years, we expect some preservation of these cycles.
Consequently, we implemented a ‘decision rule’ in the model based on the observed transitions between epidemic and non-epidemic years across 51 Brazilian municipalities meeting endemic inclusion criteria (electronic supplementary material). Across these municipalities, we computed the mean second- and third-order Markov transition probabilities, representing the probability of transition from one outbreak state (epidemic/non-epidemic) to the opposite outbreak state (non-epidemic/epidemic) after 2 and 3 consecutive years, respectively. Thus, we obtained the transition probabilities corresponding to the following 3 and 4 year cycles: 001, 110, 0001 and 1110 (0 = non-epidemic year, 1 = epidemic year). Transition probabilities were computed based only on the first 11 years of data; that is, the years preceding the six out-of-sample ensemble predictions.
Our decision rule acts as a surrogate ‘expert opinion’, overturning the ensemble prediction if the probability of a specific Markov transition to an epidemic or non-epidemic status (based on the data from previous years) exceeded the per cent of model votes (out of 11 votes). For example, if the ensemble predicts an epidemic year to succeed two epidemic years with seven votes, the corresponding ‘strength’ of that vote is 63% (7/11), which is weaker than the corresponding observed second-order transition probability for a non-epidemic year to follow two epidemic years (0.71). In this case, the model vote would be overridden to predict a non-epidemic year instead of an epidemic year.
We compared the performance of predictions based solely on weather patterns with those which incorporate additional empirical data from outbreak cycles.

Data availability. All data needed to evaluate the conclusions in the paper are contained in the paper and/or the electronic supplementary material. The epidemiological data used in this study are available from the Brazilian Ministry of Health. Meteorological data (MERRA-2) are available through the Global Modeling and Assimilation Office (GMAO) at NASA Goddard Space Flight Center. The yearly dengue activity binary classification of the Brazilian municipalities used in this study and meteorological data are available through https://github.com/LeonardoClemente/SupplementaryMaterialsBrazilBinary.
Authors’ contributions. S.F.M., J.N.K. and M.S. conceived the study; S.F.M., L.C., J.N.K. and M.S. formulated the experimental design; S.F.M. collected the data; S.F.M. and L.C. analysed the data; all authors discussed results, contributed to manuscript preparation and reviewed the manuscript.
Competing interests. We declare we have no competing interests.
Funding. M.S. and L.C. thank the Johnson and Johnson Foundation and the Johnson and Johnson Global Public Health R&D Unit for providing institutional research funds to partially support this work. M.S. was partially supported by the National Institute of General Medical Sciences of the National Institutes of Health under award no. R01GM130668.
Disclaimer. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
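
The majority vote of section 4.6 and the decision rule of section 4.7 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the dictionary layout for the transition probabilities, and the choice to prefer the third-order probability over the second-order one when both apply are all hypothetical. Only the 0.71 value and the 7-of-11 vote come from the worked example in the text.

```python
from typing import Dict, Sequence, Tuple


def majority_vote(votes: Sequence[int]) -> Tuple[int, float]:
    """Return (prediction, strength) for binary votes (1 = epidemic, 0 = non-epidemic)."""
    n = len(votes)
    n_epidemic = sum(votes)
    prediction = 1 if 2 * n_epidemic > n else 0  # ties cannot occur with 11 votes
    winners = n_epidemic if prediction == 1 else n - n_epidemic
    return prediction, winners / n


def apply_decision_rule(
    votes: Sequence[int],
    history: Sequence[int],
    flip_prob: Dict[Tuple[int, ...], float],
) -> int:
    """Override the vote when an observed transition probability exceeds its strength.

    flip_prob maps a run of identical past states, e.g. (1, 1), to the observed
    probability that the next year is in the opposite state.
    """
    prediction, strength = majority_vote(votes)
    for order in (3, 2):  # assumption: use the longest matching run first
        run = tuple(history[-order:])
        if len(run) < order or len(set(run)) != 1:
            continue  # the last `order` years are not all in the same state
        p_flip = flip_prob.get(run)
        if p_flip is None:
            continue  # no probability available for this run
        if prediction == run[0] and p_flip > strength:
            return 1 - prediction  # the cycle evidence outweighs the vote
        break
    return prediction


# Worked example from the text: seven of 11 models vote "epidemic" after two
# epidemic years; 7/11 is weaker than 0.71, so the forecast is overturned.
votes = [1] * 7 + [0] * 4
forecast = apply_decision_rule(votes, history=[0, 1, 1], flip_prob={(1, 1): 0.71})
print(forecast)  # 0 (non-epidemic)
```

With nine of 11 epidemic votes the strength (9/11) exceeds 0.71, so the same call would leave the epidemic forecast unchanged.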

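
The windowing of section 4.2 and the two summary features of section 4.3 can be illustrated with a short Python sketch. It is a sketch under assumptions rather than the study's code: the function names are hypothetical, integer day offsets stand in for calendar dates, a peak is taken to be a strict interior local maximum, and a slice with fewer than two peaks falls back to its own length as the peak spacing. The input series is synthetic.

```python
import numpy as np


def peak_days(series):
    """Indices of strict interior local maxima of a 1-D series."""
    x = np.asarray(series, dtype=float)
    if x.size < 3:
        return np.array([], dtype=int)
    mid = x[1:-1]
    return np.flatnonzero((mid > x[:-2]) & (mid > x[2:])) + 1


def slice_features(temp, precip, start, length):
    """Mean temperature and mean gap (in days) between precipitation peaks."""
    peaks = peak_days(precip[start:start + length])
    # Assumption: with fewer than two peaks, fall back to the slice length.
    gap = float(np.diff(peaks).mean()) if peaks.size >= 2 else float(length)
    return float(np.mean(temp[start:start + length])), gap


def windowed_samples(temp, precip, t0, p, n_starts=5, n_lengths=6):
    """Feature rows for the 5 x 6 rectangle of slices around (t0, p)."""
    return np.array([slice_features(temp, precip, t0 + i, p + j)
                     for i in range(n_starts) for j in range(n_lengths)])


# Synthetic daily series for illustration only (not study data).
temp = np.full(200, 300.0)   # kelvin
precip = np.zeros(200)
precip[::4] = 1.0            # a rain peak every fourth day
samples = windowed_samples(temp, precip, t0=10, p=20)
print(samples.shape)         # (30, 2)
```

Each of the 30 rows is one training or prediction sample for the classifier attached to that (t0, p) interval, which is how a single yearly label comes to be represented by 30 slightly shifted views of the same weather window.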
References
1. Ford TE, Colwell RR, Rose JB, Morse SS, Rogers DJ, Yates TL. 2009 Using satellite images of environmental changes to predict infectious disease outbreaks. Emerg. Infect. Dis. 15, 1341–1346. (doi:10.3201/eid/1509.081334)
2. Sewe MO, Tozan Y, Ahlm C, Rocklöv J. 2017 Using remote sensing environmental data to forecast malaria incidence at a rural district hospital in Western Kenya. Sci. Rep. 7, 2589. (doi:10.1038/s41598-017-02560-z)
3. McGough SF, Brownstein JS, Hawkins JB, Santillana M. 2017 Forecasting Zika incidence in the 2016 Latin America outbreak combining traditional disease surveillance with search, social media, and news report data. PLoS Negl. Trop. Dis. 11, e0005295. (doi:10.1371/journal.pntd.0005295)
4. Yang S, Santillana M, Kou SC. 2015 Accurate estimation of influenza epidemics using Google search data via ARGO. Proc. Natl Acad. Sci. USA 112, 14 473–14 478. (doi:10.1073/pnas.1515373112)
5. Marques-Toledo CdA, Degener CM, Vinhal L, Coelho G, Meira W, Codeço CT, Teixeira MM. 2017 Dengue prediction by the web: tweets are a useful tool for estimating and forecasting Dengue at country and city level. PLoS Negl. Trop. Dis. 11, e0005729. (doi:10.1371/journal.pntd.0005729)
6. Bengtsson L, Gaudart J, Lu X, Moore S, Wetter E, Sallah K, Rebaudet S, Piarroux R. 2015 Using mobile phone data to predict the spatial spread of cholera. Sci. Rep. 5, 8923. (doi:10.1038/srep08923)
7. Kramer AM, Pulliam JT, Alexander LW, Park AW, Rohani P, Drake JM. 2016 Spatial spread of the West Africa Ebola epidemic. R. Soc. Open Sci. 3, 160294. (doi:10.1098/rsos.160294)
8. Zhu Z, Chan JF-W, Tee K-M, Choi GK-Y, Lau SK-P, Woo PC-Y, Tse H, Yuen K-Y. 2016 Comparative genomic analysis of pre-epidemic and epidemic Zika virus strains for virological factors potentially associated with the rapidly expanding epidemic. Emerg. Microbes Infect. 5, e22. (doi:10.1038/emi.2016.48)
9. Dudas G et al. 2017 Virus genomes reveal factors that spread and sustained the Ebola epidemic. Nature 544, 309–315. (doi:10.1038/nature22040)
10. Reich NG et al. 2019 A collaborative multiyear, multimodel assessment of seasonal influenza forecasting in the United States. Proc. Natl Acad. Sci. USA 116, 3146–3154. (doi:10.1073/pnas.1812594116)
11. Buczak AL, Baugher B, Moniz LJ, Bagley T, Babin SM, Guven E. 2018 Ensemble method for dengue prediction. PLoS ONE 13, e0189988. (doi:10.1371/journal.pone.0189988)
12. Bhatt S et al. 2013 The global distribution and burden of dengue. Nature 496, 504–507. (doi:10.1038/nature12060)
13. Stanaway JD et al. 2016 The global burden of dengue: an analysis from the global burden of disease study 2013. Lancet Infect. Dis. 16, 712–723. (doi:10.1016/S1473-3099(16)00026-8)
14. Morin CW, Comrie AC, Ernst K. 2013 Climate and dengue transmission: evidence and implications. Environ. Health Perspect. 121, 1264–1272. (doi:10.1289/ehp.1306556)
15. Tjaden NB, Thomas SM, Fischer D, Beierkuhnlein C. 2013 Extrinsic incubation period of dengue: knowledge, backlog, and applications of temperature dependence. PLoS Negl. Trop. Dis. 7, e2207. (doi:10.1371/journal.pntd.0002207)
16. Rohani A, Wong YC, Zamre I, Lee HL, Zurainee MN. 2009 The effect of extrinsic incubation temperature on development of dengue serotype 2 and 4 viruses in Aedes aegypti (L.). Southeast Asian J. Trop. Med. Public Health 40, 942–950.
17. Liu Z, Zhang Z, Lai Z, Zhou T, Jia Z, Gu J, Wu K, Chen X-G. 2017 Temperature increase enhances Aedes albopictus competence to transmit dengue virus. Front. Microbiol. 8, 2337. (doi:10.3389/fmicb.2017.02337)
18. Byttebier B, De Majo MS, Fischer S. 2014 Hatching response of Aedes aegypti (Diptera: Culicidae) eggs at low temperatures: effects of hatching media and storage conditions. J. Med. Entomol. 51, 97–103. (doi:10.1603/ME13066)
19. Barry W, Alto DB. 2013 Temperature and dengue virus infection in mosquitoes: independent effects on the immature and adult stages. Am. J. Trop. Med. Hyg. 88, 497. (doi:10.4269/ajtmh.12-0056)
20. Scott TW, Morrison AC, Lorenz LH, Clark GG, Strickman D, Kittayapong P, Zhou H, Edman JD. 2000 Longitudinal studies of Aedes aegypti (Diptera: Culicidae) in Thailand and Puerto Rico: population dynamics. J. Med. Entomol. 37, 77–88. (doi:10.1603/0022-2585-37.1.77)
21. Adams B, Holmes EC, Zhang C, Mammen Jr MP, Nimmannitya S, Kalayanarooj S, Boots M. 2006 Cross-protective immunity can account for the alternating epidemic pattern of dengue virus serotypes circulating in Bangkok. Proc. Natl Acad. Sci. USA 103, 14 234–14 239. (doi:10.1073/pnas.0602768103)
22. Mbogo CM et al. 2003 Spatial and temporal heterogeneity of Anopheles mosquitoes and Plasmodium falciparum transmission along the Kenyan coast. Am. J. Trop. Med. Hyg. 68, 734–742. (doi:10.4269/ajtmh.2003.68.734)
23. Acevedo MA, Prosper O, Lopiano K, Ruktanonchai N, Caughlin TT, Martcheva M, Osenberg CW, Smith DL. 2015 Spatial heterogeneity, host movement and mosquito-borne disease transmission. PLoS ONE 10, e0127552. (doi:10.1371/journal.pone.0127552)
24. Torres-Sorando L, Rodríguez DJ. 1997 Models of spatio-temporal dynamics in malaria. Ecol. Model. 104, 231–240. (doi:10.1016/S0304-3800(97)00135-X)
25. Teurlai M et al. 2015 Socio-economic and climate factors associated with dengue fever spatial heterogeneity: a worked example in New Caledonia. PLoS Negl. Trop. Dis. 9, e0004211. (doi:10.1371/journal.pntd.0004211)
26. Descloux E et al. 2012 Climate-based models for understanding and forecasting dengue epidemics. PLoS Negl. Trop. Dis. 6, e1470. (doi:10.1371/journal.pntd.0001470)
27. Guo P et al. 2017 Developing a dengue forecast model using machine learning: a case study in China. PLoS Negl. Trop. Dis. 11, e0005973. (doi:10.1371/journal.pntd.0005973)
28. Chuang T-W, Chaves LF, Chen P-J. 2017 Effects of local and regional climatic fluctuations on dengue outbreaks in southern Taiwan. PLoS ONE 12, e0178698. (doi:10.1371/journal.pone.0178698)
29. Johansson MA, Reich NG, Hota A, Brownstein JS, Santillana M. 2016 Evaluating the performance of infectious disease forecasts: a comparison of climate-driven and seasonal dengue forecasts for Mexico. Sci. Rep. 6, 33707. (doi:10.1038/srep33707)
30. Lauer SA et al. 2018 Prospective forecasts of annual dengue hemorrhagic fever incidence in Thailand, 2010–2014. Proc. Natl Acad. Sci. USA 115, E2175–E2182. (doi:10.1073/pnas.1714457115)
31. Stolerman L, Maia P, Kutz JN. 2016 Data-driven forecast of dengue outbreaks in Brazil: a critical assessment of climate conditions for different capitals. (https://arxiv.org/abs/1701.00166 [q-bio.QM])
32. Guzman MG, Alvarez M, Halstead SB. 2013 Secondary infection as a risk factor for dengue hemorrhagic fever/dengue shock syndrome: an historical perspective and role of antibody-dependent enhancement of infection. Arch. Virol. 158, 1445–1459. (doi:10.1007/s00705-013-1645-3)
33. Platt J. 1999 Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv. Large Margin Classifiers 10, 61–74.
34. van Panhuis WG, Hyun S, Blaney K, Marques Jr ETA, Coelho GE, Siqueira Jr JB, Tibshirani R, da Silva Jr JB, Rosenfeld R. 2014 Risk of dengue for tourists and teams during the World Cup 2014 in Brazil. PLoS Negl. Trop. Dis. 8, e3063. (doi:10.1371/journal.pntd.0003063)
35. Massad E et al. 2014 Risk of symptomatic dengue for foreign visitors to the 2014 FIFA World Cup in Brazil. Mem. Inst. Oswaldo Cruz 109, 394–397. (doi:10.1590/0074-0276140133)
36. Honório NA et al. 2009 Spatial evaluation and modeling of dengue seroprevalence and vector density in Rio de Janeiro, Brazil. PLoS Negl. Trop. Dis. 3, e545. (doi:10.1371/journal.pntd.0000545)
37. Salim NAM, Wah YB, Reeves C, Smith M, Yaacob WFW, Mudin RN, Dapari R, Sapri NNFF, Haque U. 2021 Prediction of dengue outbreak in Selangor Malaysia using machine learning techniques. Sci. Rep. 11, 939. (doi:10.1038/s41598-020-79193-2)
38. Stolerman LM, Maia PD, Kutz JN. 2019 Forecasting dengue fever in Brazil: an assessment of climate conditions. PLoS ONE 14, e0220106. (doi:10.1371/journal.pone.0220106)
39. Farinelli EC, Baquero OS, Stephan C, Chiaravalloti-Neto F. 2018 Low socioeconomic condition and the risk of dengue fever: a direct relationship. Acta Trop. 180, 47–57. (doi:10.1016/j.actatropica.2018.01.005)
40. Stoddard ST et al. 2013 House-to-house human movement drives dengue virus transmission. Proc. Natl Acad. Sci. USA 110, 994–999. (doi:10.1073/pnas.1213349110)
41. Akanda AS, Johnson K. 2018 Growing water insecurity and dengue burden in the Americas. Lancet Planet Health 2, e190–e191. (doi:10.1016/S2542-5196(18)30063-9)
42. Fustec B et al. 2020 Complex relationships between Aedes vectors, socio-economics and dengue transmission: lessons learned from a case-control study in northeastern Thailand. PLoS Negl. Trop. Dis. 14, e0008703. (doi:10.1371/journal.pntd.0008703)
43. Barbosa JR, Barrado JCdS, Zara ALdSA, Siqueira JB. 2015 Evaluation of the dengue epidemiological surveillance system data quality, positive predictive value, timeliness and representativeness, Brazil, 2005–2009. Epidemiol. Serv. Saúde 24, 49–58. (doi:10.5123/S1679-49742015000100006)
44. Shepard DS, Undurraga EA, Halasa YA, Stanaway JD. 2016 The global economic burden of dengue: a systematic analysis. Lancet Infect. Dis. 16, 935–941. (doi:10.1016/S1473-3099(16)00146-8)
