1. Introduction
Dengue fever (DF) is a potentially life-threatening viral disease transmitted by Aedes spp. mosquitoes, with an estimated disease burden of over 100 million infections a year [1,2]. The mosquito vectors breed in small bodies of stagnant water, particularly in water storage containers around homes. More recently, the range of the disease has expanded geographically [3], resulting in increasing risk of disease in 129 countries. Early detection and management are key to preventing mortality [1].
Saudi Arabia has one of the largest DF burdens in the Middle East. The first documented case appeared in Jeddah in late 1993 [4]. By March 1994, the Disease Control Division had initiated a dengue surveillance system that recorded 289 cases that year [5]. Sporadic outbreaks occurred in ensuing years, with no more than 15 cases annually [6]. However, between 2004 and 2015, significantly larger outbreaks occurred, primarily during the rainy season, and extended beyond Jeddah into the nearby cities of Makkah, Al-Madinah, Jizan, and Najran, leading the Saudi Ministry of Health to declare DF endemic in the western region of Saudi Arabia [7]. In 2022, 3647 cases of DF were reported with an incidence rate of 11 per 100,000 person-years [8].
The dramatic increase in DF globally over the past 50 years has been attributed to increased urbanization, migration, erratic water supplies, and geographically expanding vector populations associated with climate change, among other factors [7,9]. DF transmission generally follows a seasonal pattern and is highly sensitive to temperature, rainfall, and humidity [9,10,11,12,13]. Temperature influences both the physiology and behavior of the vectors and the viral replication rate [9,14,15]; several statistical models have successfully predicted these relationships [9,16,17,18,19]. DF may also be governed by seasonal precipitation, as rainfall provides pockets of stagnant water around dwellings [11]. Although humid conditions generally coincide with rainfall, ambient humidity alone is often enough to create the necessary conditions for Aedes aegypti proliferation by increasing the longevity of female mosquitoes and preventing the desiccation of mosquito ova. Hales et al. (2002) found that average annual vapor pressure was the strongest predictor of DF distribution [20]. Favorable weather conditions can also help imported cases of DF become local epidemics [21].
Climatologic and population factors likely contribute to DF’s epidemiology in Saudi Arabia. Although the country is arid, conditions in some areas have allowed DF to become endemic, with a seasonal pattern peaking in the wetter spring (March–May) and a smaller second peak in November and December [22]. This pattern is likely related to the seasonal abundance of the mosquito vectors [23]. The second peak might be attributed to the lower temperatures that are optimal for DF transmission [23]. Population makeup also plays a role. A third of the country’s approximately 30 million residents are foreign workers [24]. The country hosts over 8 million visiting Muslim pilgrims annually in Makkah, arriving primarily through Jeddah [25], where the climate was particularly favorable for the introduction and emergence of dengue virus (DENV) and where persistence remains high [7]. Inter-regional population movement, particularly during the annual Hajj and Umrah pilgrimages, increases disease importation risk [26]. Between 1.5 and 2.5 million pilgrims from over 180 different countries participate in the week-long Hajj [27], most from countries where DF is endemic [28,29]. Millions of pilgrims also travel to Makkah to perform the Umrah pilgrimage. These mass gathering events further drive DENV serotype mixing and transmission [22,26,28]. As the Hajj falls in the 12th month of the Islamic lunar calendar, its timing shifts across the seasons from year to year, which complicates DF transmission dynamics [27].
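The size of this calendar shift can be illustrated with a short calculation. The sketch below is illustrative only: it assumes a mean lunar month of about 29.53 days and a mean solar year of about 365.24 days, and the constant and function names are hypothetical rather than taken from the study. It is not an official calendar computation.

```python
# Illustrative arithmetic only: approximate mean lengths, not an
# astronomical or official (e.g., Umm al-Qura) calendar computation.
SYNODIC_MONTH_DAYS = 29.530588  # assumed mean length of one lunar month
SOLAR_YEAR_DAYS = 365.2422      # assumed mean length of one solar year


def lunar_year_days() -> float:
    """Mean length in days of a 12-month lunar year."""
    return 12 * SYNODIC_MONTH_DAYS


def annual_drift_days() -> float:
    """Days by which a lunar-calendar event moves earlier each solar year."""
    return SOLAR_YEAR_DAYS - lunar_year_days()


def years_for_full_cycle() -> float:
    """Solar years needed for the event to drift through all seasons once."""
    return SOLAR_YEAR_DAYS / annual_drift_days()


if __name__ == "__main__":
    print(f"Lunar year: {lunar_year_days():.2f} days")
    print(f"Drift: {annual_drift_days():.2f} days earlier per solar year")
    print(f"Full seasonal cycle: {years_for_full_cycle():.1f} solar years")
```

Under these assumptions, the Hajj moves roughly 11 days earlier each solar year and returns to the same season after about 33 to 34 years.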
Despite rich evidence linking weather and climate with DF globally, linkages in Saudi Arabia remain largely unexplored. Further understanding of interactions between weather and demographic factors is needed to anticipate the possible impacts of climate change on dengue incidence [11]. Additionally, understanding the role of pilgrims in the original and continuous importation of dengue virus would improve health system preparedness [7]. The World Health Organization (WHO) has emphasized the importance of identifying the factors, particularly weather variables, that may act as leading indicators of DF outbreaks. Predictive models in other locations highlight the importance of these indicators and the potential of predictive modeling to minimize the burden of DF [30]. Recently, Siddiq et al. attempted to predict the geospatial clustering of DF in Jeddah. They used annual and monthly weather variables and environmental variables but did not incorporate any population factors into their models [31].
The objectives of this study are (1) to examine and quantify the relationships between weather, pilgrimage events, and DF in Saudi Arabia, (2) to determine the best statistical modeling approach for DF prediction there, and (3) to utilize this information to create a predictive model for DF incidence.
Research efforts investigating the factors responsible for DF emergence and spread within Saudi Arabia are limited, possibly due to a lack of consistent publicly available datasets. We were able to obtain electronic DF data from three cities in Saudi Arabia for a period of 10 years. To our knowledge, this is the first research effort utilizing this rich resource. This study is also one of the first attempts to predict DF incidence in the Arabian Peninsula using an empirical model. DF in this region presents a unique context as the area is non-tropical and known for its arid climate. It further poses a novel question pertaining to the effect of hosting the Hajj and Umrah pilgrimages on DF epidemiology.
We evaluate the weather variables that can be used to predict DF in Saudi Arabia based on lagged observations and compare three different modeling approaches previously utilized in other geographic regions.
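To make the idea of lagged observations concrete, the following minimal sketch pairs a weather series with case counts a fixed number of weeks later and computes the Pearson correlation at each lag. The function names, the weekly time step, and the example data are illustrative assumptions; this is not the study's code or data.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def lagged_pairs(weather: Sequence[float], cases: Sequence[float],
                 lag: int) -> Tuple[List[float], List[float]]:
    """Pair the weather value at week t - lag with the case count at week t."""
    if len(weather) != len(cases):
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(cases):
        raise ValueError("lag must satisfy 0 <= lag < series length")
    if lag == 0:
        return list(weather), list(cases)
    return list(weather[:-lag]), list(cases[lag:])


def lag_correlations(weather: Sequence[float], cases: Sequence[float],
                     max_lag: int) -> List[Tuple[int, float]]:
    """Correlation between lagged weather and cases for lags 0..max_lag."""
    results = []
    for lag in range(max_lag + 1):
        w, c = lagged_pairs(weather, cases, lag)
        results.append((lag, pearson(w, c)))
    return results


if __name__ == "__main__":
    # Synthetic example: cases simply echo the weather series two weeks later.
    temperature = [20.0, 22.0, 25.0, 27.0, 30.0, 28.0,
                   26.0, 23.0, 21.0, 24.0, 29.0, 31.0]
    weekly_cases = [0.0, 0.0] + temperature[:-2]
    for lag, r in lag_correlations(temperature, weekly_cases, max_lag=4):
        print(f"lag {lag}: r = {r:.3f}")
```

In this synthetic example the correlation is highest at a lag of two weeks, because the case series was constructed as a two-week echo of the weather series.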
4. Discussion
DF ecology in the Arabian Peninsula has not been well described and has two unusual elements: the region’s aridity [4] and the unique large annual religious pilgrimages that bring in people from other endemic regions [7]. This is one of the first model-based investigations of DF epidemiology in Saudi Arabia. The ultimate goal of this work is to develop a predictive model that could facilitate early warning and intervention to reduce future infections. While the random forest (RF) model performed best overall, both the autoregressive integrated moving average (ARIMA) and Poisson regression models offer insight into the environmental and social factors affecting the epidemic and allow us to examine biologic plausibility and other factors that the black-box RF model can obscure. Our findings suggest that our predictive models have sufficient skill to be used in prevention and control efforts.
The seasonal distribution of DF in our dataset has previously been described locally [5,11,23] and globally [9,10,32] and, as previously mentioned, largely reflects the effect of weather on vector life cycle dynamics. The slight dips in the trend at 2–3 years (Figure 3) have previously been discussed in the literature. Jayaraj et al. (2019) explained this phenomenon of ebb and flow in DF epidemiology by the replacement of the dominant circulating viral serotype with another serotype, resulting in a process of virus extinction and reinvasion termed “clade replacement” [33], consistent with Saudi Arabia’s experience.
We found a moderately strong association with temperature variables, which is supported by the literature. Temperature acts on multiple components of the ecologic pathway, including viral replication, mosquito oviposition, and larval development and density, with higher temperatures favoring these processes [1,9,34]. Wu et al. (2009) contend that minimum temperature is the most critical for mosquito survival and development [35]. The literature also suggests that average temperatures between 20 and 30 °C are most suitable for Ae. aegypti population growth [1,5,9,11]. Morin et al. (2013) emphasize that this association needs to be considered in the context of the local climate. For at least a third of the year, average temperatures in this region are over 30 °C and can reach up to 40 °C [36]. This might explain the shift we see in the relationship between temperature and dengue incidence with increasing lag. At lower temperatures, the relationship between temperature and DF cases is positive; however, as temperatures continue to rise past ~32 °C, conditions become detrimental to the mosquito [16,17,18,19,35], reversing the direction of the relationship.
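One common way to let a regression model represent such a change in direction is a piecewise-linear (hinge) term with a knot near the threshold. The sketch below is illustrative only: the 32 °C knot is the approximate value quoted above, while the function names and coefficients are hypothetical and are not estimates from this study.

```python
from math import exp
from typing import Tuple

KNOT_C = 32.0  # approximate threshold from the text; treated as an assumption


def hinge_features(temp_c: float, knot: float = KNOT_C) -> Tuple[float, float]:
    """Split a temperature into the part up to the knot and the excess above it."""
    below = min(temp_c, knot)
    above = max(temp_c - knot, 0.0)
    return below, above


def expected_cases(temp_c: float,
                   intercept: float = -1.0,
                   slope_below: float = 0.10,
                   slope_above: float = -0.30) -> float:
    """Log-linear mean with a separate slope on each side of the knot.

    The default coefficients are hypothetical and chosen only so that the
    curve rises up to the knot and falls beyond it.
    """
    below, above = hinge_features(temp_c)
    return exp(intercept + slope_below * below + slope_above * above)


if __name__ == "__main__":
    for t in (24.0, 28.0, 32.0, 36.0, 40.0):
        print(f"{t:.0f} C -> expected cases {expected_cases(t):.2f}")
```

With a positive slope below the knot and a negative slope above it, the modeled mean rises with temperature up to the knot and declines beyond it, which is the qualitative shape described in the paragraph above.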
The relationship with humidity variables followed a similar pattern. While collinearity between temperature and humidity likely contributed to the association, humidity also plays an independent role, as it is associated with increased mosquito feeding, survival, and egg development [9]. Lab studies have shown that although higher humidity generally favors the mosquito life cycle, higher temperatures and moderate humidity levels (28 °C and 50 to 55% relative humidity (RH)) are better suited to the vector than environments of very high RH and slightly lower temperature (25 °C and 85 to 90%) [37]. In studies investigating DF in Guangzhou, China, both Wu et al. (2018) and Xiang et al. (2017) found that very high RH has a negative relationship with DF incidence [17,36]. Observed variability in the DF–humidity relationship has been explained in part by climatic differences. For example, in tropical regions like Indonesia where humidity is very high year round (70–80%), no significant association was observed, whereas areas with more moderate humidity reported significant positive associations [17].
Although some studies have reported an association between precipitation and DF [33], it is debatable whether this factor is significant in urban areas where the primary vector breeding habitats may be in indoor containers [32]. The weaker rainfall association we found is likely attributable to rain’s rarity in this region. Water storage behaviors in response to water shortages are more likely to influence mosquito breeding habitats [5,7]. Unfortunately, we do not have access to any water storage data for the region.
The positive correlation between Ramadan and DF is likely due to crowding and increased movement in the Jeddah/Makkah region, driven by the sharp increase in the number of domestic pilgrims during the holy month of Ramadan, the most common pilgrimage time. In 2016, the number of domestic Umrah pilgrims was 16.5 million, nearly half of whom visited during Ramadan [25].
Curiously, like Siddiq et al. [31], we did not find an association between DF and the Hajj timing. There are several potential explanations. First, reporting may decrease during the Hajj, as local health resources are focused on the large influx of visitors. As DF typically presents with mild non-specific symptoms, this may lead to fewer health center visits and thus less reporting during this busy time. Second, active DF cases in Hajj pilgrims may be identified and isolated by health screenings before and upon entering the country, including screening by thermal cameras at Jeddah international airport [38]. Similarly, prospective pilgrims who are ill may opt out, as the Hajj pilgrimage is physically demanding and unlikely to be attempted by someone who is ill. In addition, the virus extinction and reinvasion concept described earlier could also contribute to the negative correlation between DF cases and the number of pilgrims the previous year. Finally, it is our hypothesis that the negative association with the timing of pilgrimage events is most likely an artifact of the seasonality of these events. Over the last 10 years, the Hajj has fallen in early fall, when DF incidence is historically low. Ramadan has also failed to coincide with the peak DF season in the last 10 years. By 2025, the Ramadan and Hajj events will take place between March and June. Notably, DF first emerged in 1993, when these two holy events also took place during the spring. Additional data, including viral serotyping, and further analyses, such as hindcasting to the period of DF emergence in the region, would be required to further evaluate causal mechanisms linking the pilgrimages and DF incidence over the past 25 years.
While the RF model had the highest predictive ability overall, both the ARIMA and Poisson models also contributed to our analysis by providing clues regarding the various environmental and social factors impacting DF epidemiology in the region. Poisson regression has been standard for studying the impact of weather on DF but has been supplanted by other approaches in recent years. In this study, the Poisson model performed well overall but was not able to capture the magnitude of DF peaks. ARIMA models are also commonly used [32,39] and, while ideal for tackling large datasets, are also known for their sensitivity to outlier data points and poor handling of missing values and multicollinearity [32]. Here, the ARIMA model performed very well with Group 2 covariates but less so when the number of cases the previous week was not included as a variable, which is unsurprising given ARIMA’s reliance on historical data. This is an issue when attempting forecasts in places where there are limited or no surveillance data.
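For readers unfamiliar with the Poisson approach, the following bare-bones sketch fits a Poisson regression of counts on a single covariate with a log link, using Newton's method on the log-likelihood. It is an illustration under simplifying assumptions (one centered covariate, synthetic data, hypothetical function names) and not the model specification used in this study.

```python
from math import exp, log
from typing import Sequence, Tuple


def fit_poisson(x: Sequence[float], y: Sequence[float],
                iterations: int = 50, tol: float = 1e-10) -> Tuple[float, float]:
    """Fit E[y] = exp(b0 + b1 * x) by Newton's method; returns (b0, b1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    mean_y = sum(y) / n
    if mean_y <= 0:
        raise ValueError("counts must have a positive mean")
    b0, b1 = log(mean_y), 0.0  # start from the intercept-only fit
    for _ in range(iterations):
        mu = [exp(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood with respect to (b0, b1).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Entries of the (symmetric) negative Hessian.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        if det <= 0:
            raise ValueError("covariate has no variation")
        # Newton step: solve the 2x2 system (negative Hessian) * d = gradient.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


if __name__ == "__main__":
    # Synthetic data: temperature centered at 28 C (an arbitrary choice)
    # keeps the exponent small and the iteration well behaved.
    temps = [24.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0]
    centered = [t - 28.0 for t in temps]
    counts = [2, 3, 5, 6, 9, 11, 16]
    b0, b1 = fit_poisson(centered, counts)
    print(f"intercept = {b0:.3f}, slope per degree C = {b1:.3f}")
```

Models with several covariates, lags, and offsets are normally fitted with a statistics package rather than hand-written code; the sketch is only meant to show what the fitted coefficients represent, namely the change in log expected counts per unit change in the covariate.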
The RF model’s overall superior predictive ability, with or without the number of cases the previous week, likely derives from the approach’s ability to handle outlier data [32] and better capture non-linear relationships [40]. In assessing DF prediction methods, researchers have emphasized the superiority of tree-based and support vector regression (SVR) machine-learning models compared to those utilizing linear regression [32,41]. In China, Guo et al. found SVR to be the most accurate [41], and Carvajal et al. demonstrated the advantage of an RF approach compared to a variety of other models [32]. A study in Jeddah found that machine-learning methods with environmental and weather variables were adept at predicting DF outbreak locations [31]. Tree-based methods have also been utilized to project the geospatial expansion of the disease vector under varying climate change scenarios. Machine-learning methods are particularly suited to investigating questions where, in spite of accumulating large amounts of data, many theoretical knowledge gaps persist [40]. Although the RF approach has been shown to be promising in DF prediction, the complex role that several environmental and population factors play in disease incidence leads to differing findings on the relationship between climate and DF in various locations [30].
All of the approaches struggled with some aspect of the relationship between weather and DF, particularly epidemic peaks, likely due to several factors. First, the actual relationships may vary over time. For instance, Xiang et al. (2017) described the relationship between weather and DF as linear up to a specific threshold, beyond which the association is less straightforward [17,37]. Second, some plausible drivers are unobserved, e.g., urban microclimate conditions [40]. Lastly, there may be other contributing factors not included in our model whose effect is more profound during the peak of the epidemic. This is supported by the fact that even the nonlinear RF model struggled to accurately represent the magnitude of the epidemic during the seasonal peak.
Our study has several limitations. The first is missing DF count data. We found 19 missing days, from 2 May to 19 May 2018, in the Jeddah dataset (Figure 1). This likely influenced the magnitude of the correlation between the observed DF cases and the cases predicted by each model, but had no bearing on the comparison between the models. We also suspect significant under-reporting, observed in many countries [10], due to asymptomatic cases, misdiagnosis of mild DF cases, or changes in reporting standards or rates of DF testing over the study period. Additionally, as noted, we have no data on other factors known to affect dengue ecology, such as water storage, household density, and the prevalence of window screens and air conditioning, that might affect the extent of suitable habitat or transmission dynamics. Lastly, our findings may not be generalizable, as statistical models are usually highly location-specific [39].
5. Conclusions
DF, endemic in the Arabian Peninsula, has a complex ecology that is strongly affected by local environmental and social factors [7]. Local virus serotypes, immunity patterns, population demographics and movement, and intervention programs affect DF epidemiology [30]. DF ecology in Saudi Arabia was not well characterized prior to our study. We found that temperature, humidity, and, to a much lesser extent, rainfall affect DF incidence there. Additionally, the two main pilgrimages involving the city of Makkah might also play a role in DF incidence, but how and to what extent remains unclear.
We found that a nonlinear machine-learning approach had better prediction accuracy, particularly in the absence of accurate surveillance data. These models could have varying applications depending on how far in advance they predict. For example, the ability to predict disease incidence two or three months in advance potentially allows for primary prevention interventions, such as vector control, including eliminating mosquito breeding habitats in the form of household water containers, whereas predicting the disease a week or two in advance gives medical personnel time to prepare for the influx of patients.
Further investigation is needed to better understand the role various environmental and population factors play in DF incidence in this sparsely studied geographic area and to better prepare the region’s healthcare system to anticipate and intervene to reduce the spread of this disease.