Bayesian Classifier Models for Forecasting COVID-19 Related Targets Using Epidemiological and Demographic Data


Pedro Romero-Martínez¹,* and Christopher R. Stephens¹,²
¹ Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Ciudad de México, México.
² Instituto de Ciencias Nucleares, Universidad Nacional Autónoma de México, Ciudad de México, México.
* Address correspondence to: [email protected]

Abstract

This paper proposes using Bayesian classifiers to predict, in space and time, COVID-19 related targets such as infections, hospitalizations, intubations and deaths. To this end, Bayesian classifiers were developed and applied across a spatial grid in which each cell represents a municipality in Mexico. The models use open-access epidemiological data for 2020 and 2021 published by the Mexican government’s epidemiology agency and sociodemographic data from the 2020 national census of Mexico. Specifically, the COVID-19 related targets are derived from the epidemiological data, and the predictive features used in the model are extracted from the socio-demographic and socio-economic data. Continuous variables from both datasets were discretized and represented as a finite set of presence-absence variables. These Bayesian models assign a “correlation” measure, known as the score, to each variable with respect to the COVID-19 target. This allows us to identify the profiles of municipalities that are conducive to COVID-19 related targets. The models generate two types of outcomes: (1) spatiotemporal predictions of the abundance of COVID-19 targets; in particular, the framework uses the available training period data to predict the municipalities that fall within the top 10% for a given validation period; (2) predictions of the number of individuals belonging to a given COVID-19 target for each municipality in a defined validation period.

1 Introduction
The most recent pandemic was caused by the SARS-CoV-2 virus. Between the first cases in December 2019 and November 2022, according to the World Health Organization (WHO) [1], the disease infected more than 634.5 million people and caused 6.5 million deaths worldwide. The prevention and control of pandemics are of utmost importance from both the public health and scientific perspectives. Furthermore, the pandemic has shown itself to be a Complex Adaptive System (CAS), as its evolution is contingent upon multiple factors that have changed and adapted over time, as has the pathogen itself.

One of the most important disciplines with which to study the pandemic is epidemiology: “the systematic study of the distribution, causes and determinants (factors) of epidemiological states, risks or health-related events in specific populations, as in a geographical area, and its application to public health problems” [2]. The determinants play a crucial role in addressing the most relevant questions about a health phenomenon: when, where, why, who, what, how, etc. Therefore, epidemiology is a research discipline with an important public health component and a quantitative discipline encompassing descriptive and predictive perspectives. According to [3], “epidemiological intelligence is defined as the systematic compilation, analysis and communication of information aimed at detecting, verifying, evaluating and investigating events and risks for the public health, with the purpose of issuing an early alert”. In this context, it becomes crucial for decision makers to have models of various aspects of the pandemic, since interpreting the outcomes of these models in the real world can lead to actionable insights.

According to official Mexican government data, the COVID-19 pandemic had resulted in over 7 million infected people and more than 300 thousand deaths in Mexico as of November 2022 [4]. It has become the most extensively documented pandemic in world history, primarily owing to advances in data collection, processing and storage capabilities achieved in recent years. In Mexico, the Ministry of Health implemented a surveillance system for infections, which publishes daily the records obtained from a national network of hospitals. This database includes demographic data, comorbidities, clinical conditions and spatiotemporal attributes. Moreover, there are public datasets, such as the 2020 national census of Mexico, that can be included in the models as potential risk factors by processing them as presence-absence variables, as described later.

In this work, Bayesian classifier models are generated to predict the number of individuals belonging to COVID-19 related targets, such as infected people and deaths. Bayesian models are computationally inexpensive, transparent and readily interpretable, and they have shown good performance in a wide variety of problems [5] [6] [7]; these are the main reasons for applying them. Unlike traditional SIRS-type epidemiological models, Bayesian classifier models enable the incorporation of a large number of variables, thereby capturing the high degree of multifactoriality of the pandemic.

2 Other models
2.1 Differential equations models
In the 20th century, compartmental models were proposed for analyzing epidemics. They consist of an initial value problem, which involves ordinary differential equations (ODE) and initial conditions. Although they are mathematically elementary, they help to develop the intuition needed for more sophisticated models. These SI(R)(S) models divide the population into groups, where the number of people in each group is time dependent: $S(t)$ is the number of susceptibles, $I(t)$ is the number of infected and $R(t)$ is the number of recovered. The equations contain some known parameters, such as the mortality rate $\mu$, the contact rate $\lambda$ and the recovery rate $\gamma$. Some models account for births and deaths in the population by adding the term $\mu N$ to the change in the susceptible group and subtracting a proportional amount from each group.

In 1927, Kermack and McKendrick proposed the SIR model for modelling specific epidemics in which individuals become immune upon recovery

$$\frac{dS}{dt} = -\lambda I S + \mu N - \mu S, \qquad \frac{dI}{dt} = \lambda I S - \gamma I - \mu I, \qquad \frac{dR}{dt} = \gamma I - \mu R \tag{1}$$

where $S(0) = S_0 > 0$, $I(0) = I_0 > 0$, $R(0) = R_0 > 0$ and $S(t) + I(t) + R(t) = N$. However, there are certain diseases, such as COVID-19, in which individuals do not develop total immunity upon recovery. For such cases, we have the SIS model

$$\frac{dS}{dt} = -\lambda I S + \gamma I + \mu N - \mu S, \qquad \frac{dI}{dt} = \lambda I S - \gamma I - \mu I \tag{2}$$
where $S(0) = S_0 > 0$, $I(0) = I_0 > 0$ and $S(t) + I(t) = N$. These types of models have been extensively studied, as seen in [8]. In the context of COVID-19, numerous works have modeled the outbreak in different places, as evidenced in [9] [10] [11]. Furthermore, new versions of these models have been developed by incorporating additional epidemiological states and transition rates between different groups [12]. Other works have identified deficiencies in the SI(R)(S) models. In [13], for example, the authors used the SIR model to predict COVID-19 cases and deaths in the Isfahan province of Iran and found significant disparities between the long-term forecasts and the actual cases and deaths. Another common criticism of SI(R)(S) models is that they do not consider the multifactorial nature of a complex phenomenon such as an epidemic. For instance, these models do not incorporate factors beyond the simplified susceptible, infected, etc. states, such as social, cultural, demographic, economic, ecological, geographical and other factors.
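To make the structure of the SIR system (1) concrete, the following minimal sketch integrates it numerically with a classical fourth-order Runge-Kutta scheme. The function names, the step size and all parameter values are illustrative assumptions; they are not taken from any of the cited works and are not fitted to data.

```python
def sir_rhs(state, lam, gamma, mu, n):
    """Right-hand side of system (1): returns (dS/dt, dI/dt, dR/dt)."""
    s, i, r = state
    ds = -lam * i * s + mu * n - mu * s
    di = lam * i * s - gamma * i - mu * i
    dr = gamma * i - mu * r
    return (ds, di, dr)


def rk4_step(state, dt, lam, gamma, mu, n):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(k, factor):
        return tuple(x + factor * dt * dx for x, dx in zip(state, k))

    k1 = sir_rhs(state, lam, gamma, mu, n)
    k2 = sir_rhs(shifted(k1, 0.5), lam, gamma, mu, n)
    k3 = sir_rhs(shifted(k2, 0.5), lam, gamma, mu, n)
    k4 = sir_rhs(shifted(k3, 1.0), lam, gamma, mu, n)
    return tuple(
        x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_sir(s0, i0, r0, lam, gamma, mu, days, dt=0.05):
    """Integrate system (1) from t = 0 to t = days; returns the list of states."""
    n = s0 + i0 + r0  # total population N, conserved by system (1)
    state = (float(s0), float(i0), float(r0))
    trajectory = [state]
    for _ in range(int(round(days / dt))):
        state = rk4_step(state, dt, lam, gamma, mu, n)
        trajectory.append(state)
    return trajectory


# Hypothetical parameters, chosen only for illustration (not fitted to data).
trajectory = simulate_sir(s0=989, i0=10, r0=1, lam=0.0005, gamma=0.1, mu=0.0001, days=120)
s_end, i_end, r_end = trajectory[-1]
print(f"S={s_end:.1f}, I={i_end:.1f}, R={r_end:.1f}, total={s_end + i_end + r_end:.1f}")
```

Because the three right-hand sides of (1) sum to $\mu N - \mu (S + I + R) = 0$, the total population stays constant along the trajectory, which gives a quick sanity check on any implementation.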

2.2 Machine learning models

Thanks to developments in computing and data storage capabilities in recent decades, applications of machine learning have proliferated across a variety of fields and disciplines, and several studies have used machine learning models to predict COVID-19 targets. Deep learning models learn patterns using neural networks with multiple layers of neurons. A research group from the Georgia Institute of Technology developed a deep learning model called DeepCOVID [14], aimed at making predictions about COVID-19 for each state in the USA. This deep learning framework used many data sources, such as COVID-19 epidemiological data, COVID-19 tests, digital thermometer readings, mobility, social distancing measurements and viral load measurements. DeepCOVID was one of the first purely data-driven deep learning models, and it performed very well in the short term and in capturing trends. Another machine learning approach treats the COVID-19 cases and deaths for a given place as time series and applies attention mechanism models, which weight specific elements in the processing stage, as seen in [15].

In addition to the machine learning and SI(R)(S) models, some studies have presented hybrid models, combining the dynamics of compartmental models with machine learning techniques. For instance, in [10], interpretable encoders were used to incorporate covariates. Also, in [16], a variation of SI(R)(S) is trained using weighted least squares. The main criticisms of deep learning models and of some hybrid models are their “black box” nature and their computational expense: running them requires special hardware such as GPUs, which presents a challenge for generating real-time predictions.

3 Bayesian classifier models

The general approach in this work is to employ a Bayesian framework, where the main objective is to estimate the conditional probability $P(C|X)$ for a given target class $C$, conditioned on a vector of attributes $X = (X_1, X_2, \ldots, X_m)$. The general Bayesian approach possesses several advantages, as exemplified by Bayes’ theorem

$$P(C|X) = \frac{P(X|C)\,P(C)}{P(X)} \tag{3}$$

which relates the conditional probability $P(C|X)$, also known in this context as the posterior probability, to the likelihood function $P(X|C)$, the evidence function $P(X)$ and the prior probability $P(C)$. $P(C|X)$ is referred to as the posterior probability because it can be interpreted as a probability after the inclusion of the data associated with $X$, providing a better estimate than the prior probability $P(C)$. Naturally, Bayes’ theorem incorporates the phenomenon of adaptation, as the posterior probability can be recalculated when new information $X'$ becomes available, according to

$$P(C|X', X) = \frac{P(X'|X, C)\,P(C|X)}{P(X'|X)} \tag{4}$$

which shows how the previous posterior probability, acting as a new prior, is updated. Another advantage of employing the Bayesian approach is that it provides a natural framework for analyzing causality [17].
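The update rule (4) can be illustrated with a short numerical sketch for a binary target. The probabilities below are hypothetical, and the sketch additionally assumes that $X'$ is independent of $X$ given the class, so that $P(X'|X, C) = P(X'|C)$; that simplification is introduced only for the example.

```python
def posterior(prior, likelihood_c, likelihood_not_c):
    """Bayes' theorem (3) for a binary target: returns P(C | evidence)."""
    evidence = likelihood_c * prior + likelihood_not_c * (1.0 - prior)
    return likelihood_c * prior / evidence


# Hypothetical numbers, for illustration only.
p_prior = 0.10                          # P(C)
p_after_x = posterior(p_prior, 0.60, 0.20)       # P(C|X), using P(X|C) and P(X|not C)
# The previous posterior now plays the role of the prior, as in (4).
p_after_x_and_x_new = posterior(p_after_x, 0.70, 0.30)

print(p_after_x, p_after_x_and_x_new)
```

Under the stated independence assumption, updating in two steps gives the same result as a single update with the joint likelihoods $0.60 \times 0.70$ and $0.20 \times 0.30$.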

3.1 Naive Bayes

Since $P(C|X)$ and $P(X|C)$ cannot be approximated directly in a frequentist sense, a method for estimating them is needed. One well-known, tested and simple approximation is the so-called Naive Bayes method. It assumes that the variables $X = (X_1, X_2, \ldots, X_m)$ are independent given the class, thus

$$P(X|C) = \prod_{i=1}^{m} P(X_i|C), \qquad P(X|\bar{C}) = \prod_{i=1}^{m} P(X_i|\bar{C}) \tag{5}$$

where $\bar{C}$ is the set complement of $C$. Combining equations (3) and (5) with the following approximation for the evidence function

$$P(X) = \prod_{i=1}^{m} P(X_i|C)\,P(C) + \prod_{i=1}^{m} P(X_i|\bar{C})\,P(\bar{C}) \tag{6}$$

gives

$$P(C|X) = \frac{\prod_{i=1}^{m} P(X_i|C)\,P(C)}{\prod_{i=1}^{m} P(X_i|C)\,P(C) + \prod_{i=1}^{m} P(X_i|\bar{C})\,P(\bar{C})} \tag{7}$$

At this point, the score function $S(C, X)$ is introduced, which is a monotone function of $P(C|X)$ and can be interpreted as the logarithm of the odds of $C$ against its complement $\bar{C}$

$$S(C, X) = \ln\left(\frac{P(C|X)}{P(\bar{C}|X)}\right) = \ln\left(\frac{P(C)}{P(\bar{C})}\right) + \sum_{i=1}^{m} \ln\left(\frac{P(X_i|C)}{P(X_i|\bar{C})}\right) = s_0 + \sum_{i=1}^{m} s_i(X) \tag{8}$$

where $s_0 := \ln\left(P(C)/P(\bar{C})\right)$ and $s_i(X) := \ln\left(P(X_i|C)/P(X_i|\bar{C})\right)$ for $1 \le i \le m$. The function $S(C, X)$ can be interpreted as a classifier, indicating that a record with profile $X$ belongs to the target class $C$ if $S(C, X) > 0$ and to the class $\bar{C}$ if $S(C, X) < 0$.
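The relation between the score (8) and the posterior (7) can be checked numerically. The sketch below uses hypothetical probabilities and illustrative function names; it computes $s_0$ and the contributions $s_i$, and recovers $P(C|X)$ from the score through the logistic function, which follows from $S = \ln\left(P(C|X)/(1 - P(C|X))\right)$.

```python
import math


def naive_bayes_score(prior_c, likelihoods_c, likelihoods_not_c):
    """Score (8): returns (S, s_0, [s_1, ..., s_m])."""
    s0 = math.log(prior_c / (1.0 - prior_c))
    contributions = [
        math.log(p / q) for p, q in zip(likelihoods_c, likelihoods_not_c)
    ]
    return s0 + sum(contributions), s0, contributions


def posterior_from_score(score):
    """Invert S = ln(P / (1 - P)) to recover P(C|X)."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical values: P(C) = 0.1 and two variables present in the profile X.
score, s0, contributions = naive_bayes_score(0.10, [0.60, 0.70], [0.20, 0.30])
print(score, posterior_from_score(score))
```

In this example both variables contribute positively to the score, yet the total remains negative because the prior term $s_0$ dominates, so the profile would be assigned to $\bar{C}$.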

3.2 Generalized Naive Bayes

The Naive Bayes method is based on a strong assumption: the likelihood function can be completely decomposed, as shown in (5). Despite this assumption, the Naive Bayes method has proven to be robust and surprisingly accurate, as demonstrated in [5]. However, the method can be generalized by employing an alternative factorization to (5) that accounts for correlations among the variables $X = (X_1, X_2, \ldots, X_m)$. Let $\xi$ be a partition of $X$, that is, $\xi = \{\xi_1, \ldots, \xi_k\}$ where each $\xi_j$ is a subset of $X$, $\{X_1, \ldots, X_m\} = \cup_{j=1}^{k} \xi_j$ and $\xi_i \cap \xi_j = \emptyset$ for $i \neq j$. In particular, defining $\xi_j = \{X_j\}$ for $1 \le j \le m$, the partition $\xi = \{\xi_1, \ldots, \xi_m\}$ represents the Naive Bayes approximation. Given a partition $\xi$, the likelihood function factorization (5) can be generalized as

$$P(X|C) = \prod_{i=1}^{k} P(\xi_i|C), \qquad \xi_i \in \xi \tag{9}$$

which, in general, differs from the Naive Bayes factorization. Analogously to (7), using (9) instead of (5),

$$P(C|X) = \frac{\prod_{i=1}^{k_\xi} P(\xi_i|C)\,P(C)}{\prod_{i=1}^{k_\xi} P(\xi_i|C)\,P(C) + \prod_{i=1}^{k_\eta} P(\eta_i|\bar{C})\,P(\bar{C})} \tag{10}$$

where $k_\xi = k$ is the number of elements of $\xi$ and $\eta = \{\eta_1, \ldots, \eta_{k_\eta}\}$ is a partition that may differ from $\xi$. Finally, the score function is generalized as

$$S(C, X) = \ln\left(\frac{P(C)}{P(\bar{C})}\right) + \sum_{i=1}^{k_\xi} \ln\left(P(\xi_i|C)\right) - \sum_{i=1}^{k_\eta} \ln\left(P(\eta_i|\bar{C})\right) = s_0 + \sum_{i=1}^{k_\xi} S^{C}(\xi_i) - \sum_{i=1}^{k_\eta} S^{\bar{C}}(\eta_i) \tag{11}$$

where $S^{C}(\xi_i) := \ln\left(P(\xi_i|C)\right)$ and $S^{\bar{C}}(\eta_i) := \ln\left(P(\eta_i|\bar{C})\right)$. Selecting $\eta = \xi$ in (10) gives

$$S(C, X) = \ln\left(\frac{P(C)}{P(\bar{C})}\right) + \sum_{i=1}^{k} \ln\left(\frac{P(\xi_i|C)}{P(\xi_i|\bar{C})}\right) \tag{12}$$

which is a natural generalization of the Naive Bayes classifier.
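A small sketch can show why the choice of partition in (12) matters. The toy data below are hypothetical: the class is present exactly when one, and only one, of two binary variables is present, so neither variable is informative on its own. Block probabilities are estimated by counting, with a smoothing term over the $2^{|\xi_i|}$ joint states of a block; that smoothing choice and all names are assumptions made for the illustration, not the authors' procedure.

```python
import math


def block_term(rows, labels, block, profile, alpha=1.0):
    """ln(P(xi|C) / P(xi|not C)) for one block of variable indices, by counting."""
    observed = tuple(profile[j] for j in block)
    n_c = sum(labels)
    n_not_c = len(labels) - n_c
    hits_c = sum(
        1 for row, y in zip(rows, labels)
        if y == 1 and tuple(row[j] for j in block) == observed
    )
    hits_not_c = sum(
        1 for row, y in zip(rows, labels)
        if y == 0 and tuple(row[j] for j in block) == observed
    )
    n_states = 2 ** len(block)  # joint states of a block of binary variables
    p_c = (hits_c + alpha) / (n_c + alpha * n_states)
    p_not_c = (hits_not_c + alpha) / (n_not_c + alpha * n_states)
    return math.log(p_c / p_not_c)


def generalized_score(rows, labels, partition, profile, alpha=1.0):
    """Score (12) for a profile, given a partition (list of index tuples)."""
    n_c = sum(labels)
    s0 = math.log(n_c / (len(labels) - n_c))
    return s0 + sum(block_term(rows, labels, b, profile, alpha) for b in partition)


# Hypothetical data: class 1 exactly when one of the two variables is present.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5
labels = [0, 1, 1, 0] * 5

naive = generalized_score(rows, labels, [(0,), (1,)], profile=(1, 0))
joint = generalized_score(rows, labels, [(0, 1)], profile=(1, 0))
print(naive, joint)
```

With the singleton partition (the Naive Bayes case) the score of the profile is zero, whereas treating the two variables as one block yields a clearly positive score, since the block captures their interaction.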

4 Spatial cells ensemble

To calculate the score contributions we must have a statistical ensemble with which counts of $N_C$, $N_{X_i}$ and $N_{CX_i}$ can be made. We will consider two types of ensemble, starting with an ensemble of spatial cells, in the present case municipalities. Let $R$ be a region in the two-dimensional plane, such as the surface delimited by Mexico on the map. Suppose that $M = \{c_i\}_{i=1}^{N}$ is a partition of $R$, that is, a set of subregions where $c_i \cap c_j = \emptyset$ for any $i \neq j$ and whose union is equal to $R$. $M$ is called a mesh and its elements $c_i$ are the cells. The set of municipalities in Mexico is a mesh for the region delimited by Mexico. A function $X_j : M \to \{0, 1\}$ is called a presence-absence variable, and we say that $X_j$ occurs in the cell $c_i$ if $X_j(c_i) = 1$. For a given mesh $M$ and a set of presence-absence variables $X = \{X_1, \ldots, X_m\}$, a target class is a subset $C$ of $M$. In this context, the Naive Bayes approximation (8) can be rewritten as

$$S(C, X) = \ln\left(\frac{N_C}{N - N_C}\right) + \sum_{i=1}^{m} \ln\left(\frac{N_{CX_i}/N_C}{(N_{X_i} - N_{CX_i})/(N - N_C)}\right) \tag{13}$$

because $P(C) = N_C/N$, $P(\bar{C}) = (N - N_C)/N$, $P(X_i|C) = N_{CX_i}/N_C$ and $P(X_i|\bar{C}) = (N_{X_i} - N_{CX_i})/(N - N_C)$, where $N$ is the total number of cells, $N_C$ represents the number of cells belonging to the target class $C$, $N_{X_i}$ is the number of cells where $X_i$ occurs and $N_{CX_i}$ indicates the number of cells where both $C$ and $X_i$ co-occur. Clearly, if $N_C = 0$ or $N_{CX_i} = 0$ the score $S(C, X)$ is undefined. To avoid this possibility a standard Laplace term $\alpha$ is applied [18]

$$S(C, X) = \ln\left(\frac{N_C}{N - N_C}\right) + \sum_{i=1}^{m} \ln\left(\frac{(N_{CX_i} + \alpha)/(N_C + 2\alpha)}{(N_{X_i} - N_{CX_i} + \alpha)/(N - N_C + 2\alpha)}\right) \tag{14}$$

There are several target classes related to COVID-19 that can be predicted using the ensemble of cells, for example, the top 10% of cells with the highest number of COVID-19 cases during a training period. The Naive Bayes model assigns the score $s_j$ to the variable $X_j$, and expression (14) then gives the score of each cell. The score of a cell can be interpreted as a measure of correlation with the target class: cells with higher scores are more likely to belong to the target class. In the previous example, the cells with the highest scores during the training period are the most likely to belong to the top 10% with the highest number of COVID-19 cases in the subsequent period.
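As a rough illustration of how the smoothed score (14) is obtained from counts, the sketch below computes the contribution of one presence-absence variable and the score of a cell. The counts are hypothetical, the function names are illustrative, and the sum is taken over the variables that occur in the cell, which is an assumption about how the profile $X$ of a cell is formed.

```python
import math


def variable_score(n, n_c, n_x, n_cx, alpha=1.0):
    """Smoothed contribution of one presence-absence variable, as in (14)."""
    p_x_given_c = (n_cx + alpha) / (n_c + 2.0 * alpha)
    p_x_given_not_c = (n_x - n_cx + alpha) / (n - n_c + 2.0 * alpha)
    return math.log(p_x_given_c / p_x_given_not_c)


def cell_score(n, n_c, present_counts, alpha=1.0):
    """Score of a cell; present_counts lists (N_X, N_CX) for variables with X(c) = 1."""
    s0 = math.log(n_c / (n - n_c))
    return s0 + sum(
        variable_score(n, n_c, n_x, n_cx, alpha) for n_x, n_cx in present_counts
    )


# Hypothetical counts: 1000 cells, 100 of them in the target class (top 10%).
# A variable occurs in 200 cells, 60 of which are in the target class.
print(variable_score(1000, 100, 200, 60))
# Without smoothing, N_CX = 0 would give ln(0); with alpha = 1 the score is finite.
print(variable_score(1000, 100, 200, 0))
print(cell_score(1000, 100, [(200, 60), (150, 5)]))
```

Setting `alpha=0.0` reproduces the unsmoothed expression (13) whenever all counts are nonzero.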

In order to capture changes over time, three periods of equal length are considered: (1) the first period $t - 1$, (2) the training period $t$ and (3) the validation period $t + 1$. For a given target class $C$, such as the top 10% of cells with the highest number of deaths, two special types of target classes $\hat{C}$ are defined as

1. Improvement: Cells that belong to $C$ during $t - 1$ and do not belong to $C$ during $t$.

2. Deterioration: Cells that do not belong to $C$ during $t - 1$ and belong to $C$ during $t$.

By using the target class $\hat{C}$ and the presence-absence variables of the training period in the Naive Bayes method, it is possible to anticipate improvement or deterioration of the target class in the validation period by identifying the cells with the highest scores.
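The two transition classes reduce to set differences between the target class in consecutive periods. The following sketch uses hypothetical cell identifiers and counts; the helper `top_fraction` and its tie-breaking (cells with equal counts keep their input order) are assumptions made for the example.

```python
def top_fraction(counts, fraction=0.10):
    """Return the set of cells in the top `fraction` by count (ties: input order)."""
    k = max(1, round(len(counts) * fraction))
    ranked = sorted(counts, key=counts.get, reverse=True)
    return set(ranked[:k])


def transition_classes(counts_prev, counts_train, fraction=0.10):
    """Return (improvement, deterioration) between periods t - 1 and t."""
    c_prev = top_fraction(counts_prev, fraction)
    c_train = top_fraction(counts_train, fraction)
    improvement = c_prev - c_train      # in C during t - 1, not in C during t
    deterioration = c_train - c_prev    # not in C during t - 1, in C during t
    return improvement, deterioration


# Hypothetical death counts for ten cells in periods t - 1 and t.
counts_prev = {"a": 50, "b": 40, "c": 5, "d": 4, "e": 3, "f": 2, "g": 1, "h": 0, "i": 0, "j": 0}
counts_train = {"a": 60, "b": 3, "c": 45, "d": 4, "e": 3, "f": 2, "g": 1, "h": 0, "i": 0, "j": 0}

improvement, deterioration = transition_classes(counts_prev, counts_train, fraction=0.2)
print(improvement, deterioration)
```

Either set can then be used as the target class $\hat{C}$ when counting $N_C$ and $N_{CX_i}$ for the scores.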

5 Population ensemble

In the population ensemble the fundamental element is not the cell but the “person”. Let $N_i$ represent the population of the cell $c_i \in M$. If $M$ is the set of municipalities in Mexico, then $N_i$ is the population of the municipality $c_i$. In this context, the target classes are defined in terms of individuals, such as those infected or killed by COVID-19. The population ensemble size coincides with the total population $N = \sum_i N_i$, and the presence-absence variables are based on the combined populations of the cells. The population ensemble enables us to predict the number of individuals in the target class by assigning a score to each individual using expression (14), where $N_C$ represents the number of people belonging to the target class $C$ and $N_{CX_i}$ indicates the number of people belonging to $C$ and possessing the attribute $X_i$. The higher the score of an individual, the more likely it is that the individual belongs to the target class.

Although for reasons of privacy it is not possible to create models with socio-demographic and socio-economic variables documented for each individual over the whole population of Mexico, there are documented and publicly available variables defined over the set of municipalities of Mexico. In order to extend the use of the cell-defined (municipality-defined) variables $X_j$ to the entire population, we define the function $\hat{X}_j$ such that $\hat{X}_j = 1$ for individuals that are part of the population of any cell $c_i$ that satisfies $X_j(c_i) = 1$. For simplicity, the variables $\hat{X}_j$ will be denoted simply by $X_j$. Because the variables used to make predictions are defined over the cells, every individual within a given cell inherits the attributes of that cell and is therefore assigned the same score for a given variable.

To determine the probability for each individual in the population ensemble, the scores calculated for individuals are used. The population is ranked by individual score and divided into $d$ equally sized sub-lists $I_k$; the probability for each sub-list is calculated as follows

$$p_{I_k} = \frac{\text{number of individuals belonging to the target class } C \text{ within } I_k}{\text{number of individuals within } I_k} \tag{15}$$
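Expression (15) can be sketched as follows for a small hypothetical population. Each individual is represented as a pair (score, membership flag); the function name is illustrative, and letting the last sub-list absorb any remainder when the population is not divisible by $d$ is an assumption of this example.

```python
def sublist_probabilities(individuals, d):
    """Rank (score, in_target) pairs by score and apply (15) to d sub-lists."""
    ranked = sorted(individuals, key=lambda rec: rec[0], reverse=True)
    size = len(ranked) // d
    probabilities = []
    for k in range(d):
        start = k * size
        # The last sub-list absorbs the remainder (assumption for uneven splits).
        end = (k + 1) * size if k < d - 1 else len(ranked)
        chunk = ranked[start:end]
        probabilities.append(sum(flag for _, flag in chunk) / len(chunk))
    return probabilities


# Hypothetical population of ten individuals: (score, 1 if in target class else 0).
individuals = [(10, 1), (9, 1), (8, 0), (7, 1), (6, 0),
               (5, 0), (4, 1), (3, 0), (2, 0), (1, 0)]
print(sublist_probabilities(individuals, d=2))
```

In practice every individual in a cell shares the cell's score, so the ranking can be carried out over cells weighted by population rather than over explicit individual records.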

Just as in the cells ensemble, the score depends on the period. Let $(S_i^{t-1}, p_i^{t-1})$ and $(S_i^{t}, p_i^{t})$ be the scores and probabilities for each cell during the first and training periods, respectively. The probability for each individual in the cell $c_i$ is computed in two ways:
• Additive prediction: Let $f$ be a regression model for the data $(S_i^{t}, p_i^{t})$, then define $\Delta p_i^{t} := f\left(S_i^{t} - S_i^{t-1}\right)$. The probability for each cell $c_i$ in the validation period is given by $p_i^{t+1} := p_i^{t} + \Delta p_i^{t}$.
• Multiplicative prediction: $p_i^{t+1} := \dfrac{\#C_i^{t}}{\#C_i^{t-1}}\, p_i^{t}$.

Here, $\#C_i^{t}$ represents the number of individuals in the target class within the cell $c_i$ during the period $t$. For both types of predictions, $\#C_i^{t+1} = p_i^{t+1} N_i$.
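The two prediction rules can be written compactly as below. This is only a sketch: the regression model $f$ is passed in as an arbitrary callable (here a hypothetical linear function), the numbers are invented, and raising an error when $\#C_i^{t-1} = 0$ is an assumption, since the text does not say how that case is handled.

```python
def additive_prediction(p_t, s_t, s_prev, f):
    """p^{t+1} = p^t + f(S^t - S^{t-1}), with f a fitted regression model."""
    return p_t + f(s_t - s_prev)


def multiplicative_prediction(p_t, count_t, count_prev):
    """p^{t+1} = (#C^t / #C^{t-1}) * p^t."""
    if count_prev == 0:
        # Undefined ratio; how to treat this case is left open in the text.
        raise ValueError("count_prev must be positive")
    return (count_t / count_prev) * p_t


def predicted_count(p_next, population):
    """#C^{t+1} = p^{t+1} * N_i."""
    return p_next * population


# Hypothetical cell: N_i = 100000, 10 cases in t - 1, 20 cases in t, p^t = 0.0002.
p_mult = multiplicative_prediction(0.0002, count_t=20, count_prev=10)
print(predicted_count(p_mult, 100000))

# Hypothetical linear stand-in for the regression model f.
p_add = additive_prediction(0.0002, s_t=52.0, s_prev=50.0, f=lambda ds: 1e-5 * ds)
print(predicted_count(p_add, 100000))
```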

6 Model validation

6.1 Spatial validation
Given a training period $t$ and a cells ensemble, the ensemble is randomly divided into two subsets: the training set and the validation set. The Bayesian model is trained on the training set, computing a score $s_j$ for each presence-absence variable $X_j$ during the training period. The score of each cell in the validation set is then calculated from the variable scores $s_j$. Certain cells may not have any variables with a calculated score associated with them; such cells are called nulls. Spatial validation aims to measure the model’s ability to identify the validation cells in the target class. This is assessed using the recall, defined as $TP/(TP + FN)$, in each sub-list $I_k$, where $TP$ is the number of true positives in the sub-list $I_k$, $FN$ is the number of false negatives, and the equally sized sub-lists are obtained by ranking the validation cells by score.

6.2 Temporal validation

Let $t$ and $t + 1$ be the training and validation periods, respectively. The objective of temporal validation in the cells ensemble is to measure the performance of the predictions over time. As in spatial validation, the recall is analyzed for each sub-list $I_k$, obtained by ranking the entire mesh by score, and compared with the real data in the validation period. In this type of validation, the $TP$ are the cells that belong both to the target class during the validation period and to $I_k$, and the $FN$ are the false negatives.
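A possible reading of the recall per sub-list is sketched below. It assumes that, for a sub-list $I_k$, the false negatives are the target-class cells that fall outside $I_k$, so that the recalls over all sub-lists add up to one; this interpretation, the function name and the data are assumptions for illustration.

```python
def recall_by_sublist(scores, target, d):
    """Recall TP / (TP + FN) for each of d equally sized sub-lists ranked by score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    size = len(ranked) // d
    total_positives = sum(1 for cell in ranked if cell in target)
    recalls = []
    for k in range(d):
        start = k * size
        end = (k + 1) * size if k < d - 1 else len(ranked)
        tp = sum(1 for cell in ranked[start:end] if cell in target)
        fn = total_positives - tp  # target cells outside this sub-list (assumption)
        recalls.append(tp / (tp + fn) if total_positives else 0.0)
    return recalls


# Hypothetical scores for ten cells and the cells observed in the target class.
scores = {f"c{i}": 10 - i for i in range(10)}
target = {"c0", "c1", "c5"}
print(recall_by_sublist(scores, target, d=2))
```

A model that ranks well concentrates the recall in the first sub-lists; here two of the three target cells fall in the top half of the ranking.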

7 Data processing

The data necessary to train the Bayesian models include the target classes $C$ for the specified periods, the presence-absence variables $X_j$ and the mesh $M$ over the region $R$. This work takes Mexico as the region, the years 2020 and 2021 as the time span and the set of municipalities in Mexico as the mesh. The presence-absence variables are derived from the processed variables of the 2020 national census of Mexico, while the target classes pertain to the epidemiological states of infection and death caused by COVID-19.

The epidemiological states are obtained from the open COVID-19 database of the epidemiology agency of the Mexican government. This database is generated by the COVID-19 surveillance system, which publishes daily records reported by the hospital network in the country. In addition to capturing whether an individual is infected or not, it includes demographic profiles, comorbidity data, other clinical conditions, and spatial-temporal information at the daily and municipal level. For a given training period and target class, the open COVID-19 database provides the municipality information for each record that belongs to the target class.

The presence-absence variables are derived from the 2020 national census database of the Mexican government. The census database contains 180 variables with population and housing characteristics for different geographical levels. In particular, this study uses data at the municipal level. All census variables are integer-valued variables defined over the mesh of municipalities, and they are processed to generate presence-absence variables. Let $X$ be an integer-valued variable and $d$ an integer greater than 0. It is possible to obtain $d$ presence-absence variables from $X$ as follows. Since $X$ is defined over $M$, its range is finite. Therefore, by sorting the range, it is possible to divide it into $d$ equally sized sub-ranges $(r_{j-1}, r_j]$. Each sub-range defines a presence-absence variable $X_j$ as follows: for every $c_i \in M$, $X_j(c_i) = 1$ if $r_{j-1} < X(c_i) \le r_j$, and $X_j(c_i) = 0$ otherwise. This data processing transforms every integer or real-valued variable into $d$ presence-absence variables.
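The discretization step can be sketched as follows. The sketch interprets "equally sized" as sub-ranges containing equal numbers of cells (quantile-style bins) and sets $r_0 = -\infty$ so that the smallest value is covered; both choices, along with the function name and the data, are assumptions, since equal-width bins would also fit the description.

```python
def presence_absence(values, d):
    """Split a cell-valued variable into d presence-absence variables.

    values: dict mapping cell -> numeric value of X.
    Returns (edges, variables), where edges = [r_0, ..., r_d] and
    variables[j][cell] = 1 if r_j < X(cell) <= r_{j+1}, else 0.
    """
    ordered = sorted(values.values())
    n = len(ordered)
    edges = [float("-inf")]  # r_0 chosen so that the minimum is included
    for j in range(1, d + 1):
        # Upper edge r_j: the value at the ceil(j * n / d)-th sorted position.
        index = (j * n + d - 1) // d - 1
        edges.append(ordered[index])
    variables = []
    for j in range(d):
        lower, upper = edges[j], edges[j + 1]
        variables.append(
            {cell: 1 if lower < x <= upper else 0 for cell, x in values.items()}
        )
    return edges, variables


# Hypothetical census variable over eight cells.
values = {f"c{i}": i for i in range(1, 9)}
edges, variables = presence_absence(values, d=4)
print(edges)
print([sum(v.values()) for v in variables])
```

Each cell activates exactly one of the $d$ variables; when many cells share the same value, some sub-ranges may contain more cells than others or be empty.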

8 Results
Several models have been generated for different configurations. In the population ensemble, the target classes considered were infection or death by COVID-19 for different age groups: 60 years and older, 50-59 years, 40-49 years, 30-39 years, and 18-29 years. Each model had a training period of 30 days.

State             Municipality        N_i     #C_i^t  S_i^t   Predicted #C_i^{t+1}  Observed #C_i^{t+1}
Ciudad de México  Gustavo A. Madero   171225  21      55.483  52.26                 36
Ciudad de México  Iztapalapa          281800  31      52.954  84.65                 57
Ciudad de México  Tlalpan             107280  13      51.268  40.54                 14
Ciudad de México  Iztacalco           61842   14      50.906  33.56                 15
México            Cuautitlán Izcalli  84377   8       50.848  39.24                 17

Table 1: Predictions for the municipalities with the highest scores resulting from the model targeting deaths by COVID-19 in the population aged 30 to 39 years. The first, training and validation periods are November 2020, December 2020 and January 2021, respectively.
Figure 1: Scatter plot showing the predicted values of $\#C_i^{t+1}$ versus the observed values of $\#C_i^{t+1}$ for the predictions of the model in Table 1. The $R^2$ value is 0.8611.

Figure 2: Scatter plot of the predicted values of $\#C_i^{t+1}$ versus the observed values of $\#C_i^{t+1}$ for the predictions in Table 2, with an $R^2$ value of 0.9393.
State             Municipality         N_i     #C_i^t  S_i^t   Predicted #C_i^{t+1}  Observed #C_i^{t+1}
Ciudad de México  Álvaro Obregón       122319  2526    83.976  3630.63               2957
Ciudad de México  Gustavo A. Madero    203469  2488    83.107  5365.26               3416
Ciudad de México  Tlalpan              108894  1724    81.535  2557.30               2218
Ciudad de México  Venustiano Carranza  78964   1135    79.815  1998.50               1153
Ciudad de México  Coyoacán             126592  1416    79.397  3199.82               1615

Table 2: Predictions for the municipalities with the highest scores resulting from the model targeting deaths by COVID-19 in the population aged 60 years and older. The first, training and validation periods are November 2020, December 2020 and January 2021, respectively.

9 Conclusions and discussion

1. While some of the models developed in the literature have incorporated variables from various domains (demographic, hospital infrastructure, mobility, social contact measures, etc.), the number of variables has been limited. Considering the complexity of the COVID-19 pandemic, which depends on numerous factors, it is important to include as many variables from relevant domains as possible to model reality accurately.

2. Unlike the SI(R)(S) models, the Bayesian approach allows variables other than the time series of infected and deceased individuals to be considered when making predictions.

3. In general, the reviewed literature agrees that the generated predictions are intended to support public health decision-makers in formulating more informed policies. However, very few models provide a measure of the factors most correlated with the COVID-19 target class (infected, hospitalized, deceased, etc.), which would give more specific guidance on the actions to be taken.

4. Regarding computational resources, some models, such as neural networks, require specific computing infrastructure such as Graphics Processing Units (GPUs) to generate real-time predictions, owing to the calculations involved in model training. In contrast, the approach proposed in this study does not require specific hardware and has a reasonable training time.

10 Acknowledgments
We would like to express our sincere gratitude to PAPIIT, a support program for research and technological innovation in UNAM. PAPIIT has provided support to several projects belonging to the Chilam laboratory, and this paper is a result of the work and research conducted in the Chilam laboratory.

References
1. Organization WH. WHO Coronavirus (COVID-19) Dashboard. Last accessed 29 June 2023. 2022. url: https://covid19.who.int.
2. Dicker R, Coronado F, Koo D, and Parrish RG. Principles of Epidemiology in Public Health Practice. 3rd ed. U.S. DEPARTMENT OF HEALTH and HUMAN SERVICES, 2012.
3. Salud S de. MANUAL DE OPERACIÓN PARA LAS UNIDADES DE INTELIGENCIA EPIDEMIOLÓGICA Y SANITARIA. Last accessed 28 April 2023. 2021. url: https://epidemiologia.salud.gob.mx/gobmx/salud/documentos/manuales/39_Manual_UIES.pdf.
4. México G de. Covid-19 México. Last accessed 21 November 2022. 2022. url: https://datos.covid-19.conacyt.mx.
5. Stephens CR, Huerta HF, and Linares AR. When is the Naive Bayes approximation not so naive? 2018.
6. Stephens CR, Sierra-Alcocer R, González-Salazar C, et al. SPECIES: A platform for the exploration of ecological data. Ecology and Evolution 2019.
7. Stephens CR, González-Salazar C, and Romero-Martínez P. Does a Respiratory Virus Have an Ecological Niche, and If So, Can It Be Mapped? Yes and Yes. Tropical Medicine and Infectious Disease 2023;8.
8. Satsuma J, Willox R, Ramani A, Grammaticos B, and Carstea A. Extending the SIR epidemic model. Physica A: Statistical Mechanics and its Applications 2004;336:369–75.
9. Anastassopoulou C, Russo L, Tsakris A, and Siettos C. Data-based analysis, modelling and forecasting of the COVID-19 outbreak. PLoS ONE 2020.
10. Arik SO, Li CL, Yoon J, et al. Interpretable Sequence Learning for COVID-19 Forecasting. 2021. arXiv: 2008.00646 [cs.LG].
11. Chen YC, Lu PE, Chang CS, and Liu TH. A Time-Dependent SIR Model for COVID-19 With Undetectable Infected Persons. IEEE Transactions on Network Science and Engineering 2020;7:3279–94.
12. Acuña-Zegarra MA, Santana-Cibrian M, Hernandez-Vela CER, Mena RH, and Velasco-Hernández JX. A retrospective analysis of COVID-19 non-pharmaceutical interventions for Mexico and Peru: a modeling study. medRxiv 2022.
13. Moein S, Nickaeen N, Roointan A, et al. Inefficiency of SIR models in forecasting COVID-19 epidemic: a case study of Isfahan. Scientific Reports 2021.
14. Rodríguez A, Tabassum A, Cui J, et al. DeepCOVID: An Operational Deep Learning-driven Framework for Explainable Real-time COVID-19 Forecasting. Proceedings of the AAAI Conference on Artificial Intelligence 2021;35:15393–400.
15. Jin X, Wang YX, and Yan X. Inter-Series Attention Model for COVID-19 Forecasting. 2021. arXiv: 2010.13006 [cs.LG].
16. Srivastava A and Prasanna VK. Learning to Forecast and Forecasting to Learn from the COVID-19 Pandemic. 2020. arXiv: 2004.11372 [q-bio.PE].
17. Neuberg LG. CAUSALITY: MODELS, REASONING, AND INFERENCE, by Judea Pearl, Cambridge University Press, 2000. Econometric Theory 2003;19:675–85.
18. Langou J. Translation and modern interpretation of Laplace’s Théorie Analytique des Probabilités, pages 505-512, 516-520. 2009. arXiv: 0907.4695 [math.NA].
