A Data Driven Model Approach
Original Article
a Center for Telemedicine and Telepharmacy, University of Camerino, Camerino, Italy
b Research Department, International Medical Radio Center Foundation (C.I.R.M.), Rome, Italy
Received 1 April 2020; received in revised form 5 April 2020; accepted 6 April 2020
KEYWORDS: COVID-19 outbreak; Forecasting; ARIMA; Italian population; Lock down

Abstract

Background: As of 31 March 2020, 105,792 COVID-19 cases had been confirmed in Italy, including 15,726 deaths, which shows how severely the epidemic has affected the country. After the lockdown was announced in Italy on 9 March 2020, the situation began to stabilize in the last days of March. In view of this, it is important to forecast the evolution of COVID-19 in Italy and the possible effects if this lockdown were to continue for another 60 days.
Methods: COVID-19 patient data were extracted from the Italian Health Ministry website, including registered and recovered cases from mid-February to the end of March. A seasonal ARIMA forecasting package in the R statistical environment was used.
Results: Predictions were made with 93.75% accuracy for the registered case model and 84.4% accuracy for the recovered case model. The forecast number of infected patients could reach 182,757, and the number of recovered cases could reach 81,635, by the end of May.
Conclusions: Through data-driven model analysis, this study highlights the importance of country lockdown and self-isolation in controlling disease transmissibility among the Italian population. Our findings suggest that a decrease of nearly 35% in registered cases and a growth of 66% in recovered cases will be possible.
Copyright © 2020, Taiwan Society of Microbiology. Published by Elsevier Taiwan LLC. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
* Corresponding author. E-health and Telemedicine center, University of Camerino, Via Madonna delle carceri 9, Camerino, 62032, Italy.
E-mail addresses: [email protected] (N. Chintalapudi), [email protected] (G. Battineni), francesco.amenta@unicam.it (F. Amenta).
https://doi.org/10.1016/j.jmii.2020.04.004
1684-1182/Copyright © 2020, Taiwan Society of Microbiology. Published by Elsevier Taiwan LLC. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Please cite this article as: Chintalapudi N et al., COVID-19 virus outbreak forecasting of registered and recovered cases after sixty day lockdown in Italy: A data driven model approach, Journal of Microbiology, Immunology and Infection, https://doi.org/10.1016/j.jmii.2020.04.004
Figure 1. Progression of total registered cases (left) and total recovered cases (right) of COVID-19 in Italy (from mid-February to the end of March).
    # Install (once) and load the packages used for data import and forecasting
    install.packages("forecast")
    library(forecast)
    library(readxl)
    # Read the Italian case data and inspect it
    Italycovid19 <- read_excel("Italycovid19.xlsx")
    View(Italycovid19)
    # Daily series, one observation per day; day 1 is 15 February 2020
    tsregistered <- ts(Italycovid19$`daily registered Cases`, frequency = 1, start = 1)
    tsrecovered <- ts(Italycovid19$`daily recovered Cases`, frequency = 1, start = 1)
    plot(tsregistered)
    plot(tsrecovered)

Patient data for 45 days, from 15 February 2020 (i.e., when the serious outbreak was about to begin) to 31 March 2020, were considered at a one-day frequency (Fig. 2).

The plots revealed that the trend in cases registered at Italian hospitals was going upwards, and the peak number of corona cases was registered in the last two weeks of March (Fig. 3). This might be because many people travelled to their home regions by public transport before the lockdown was officially announced. Through this migration of people, the virus could have spread, with symptoms appearing during or after the incubation period. In view of this, we conducted a simple forecast of COVID cases assuming that the same trend continued for two months. We applied the ‘AUTOARIMA’ package in R to evaluate the values of (p, d, q) and forecast the reproduction of infected cases. Two ARIMA models, of COVID-19 daily registered and recovered cases, were designed. The residuals of these two models were plotted to understand the case variance, and statistical analysis was performed using ‘R’ version 1.2.5.

Figure 3. Weekly box plot diagram of infected Italians of COVID-19.

Results

To fit the data to an ARIMA model and develop a COVID-19 model for both registered and recovered cases, we performed the commands mentioned below.
Figure 4. Prediction and confidence intervals (CI) of the registered case model (graph A) and the recovered case model (graph B) (Black line: actual data, Blue line: 60-day forecast, Gray zone: 80% CI, White zone: 95% CI).
Figure 5. Probability plots of registered cases (left), and recovered cases (right).
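The accuracy figures reported for the two models are derived from the mean absolute prediction error (MAPE) as Acc % = 100 - MAPE × 100. The following is a minimal Python sketch of that arithmetic, not the authors' R implementation. The function names `mape` and `accuracy_percent` and the sample values are assumptions made for illustration only, and MAPE is taken here as a fraction (for example, 0.0625 rather than 6.25).

```python
def mape(actual, predicted):
    """Mean absolute prediction error, returned as a fraction of the actual values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be non-zero to compute a relative error")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return total / len(actual)


def accuracy_percent(actual, predicted):
    """Accuracy as defined in the text: Acc % = 100 - MAPE * 100."""
    return 100 - mape(actual, predicted) * 100


# Hypothetical observed and fitted case counts (not data from the study)
observed = [100, 200, 400, 800]
fitted = [90, 210, 380, 840]
print(round(accuracy_percent(observed, fitted), 2))
```

Under this definition, the reported accuracy of 93.75% corresponds to a MAPE of 0.0625, and 84.4% corresponds to a MAPE of 0.156.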
The 60-day COVID-19 forecasting graphs of registered and recovered cases (Fig. 4) and normalized QQ plots [9] were computed (Fig. 5). Table 1 presents the model outcomes and accuracy parameters.

The probabilities of new positive cases and recovered cases in Italy for the next two months were computed from the available data. It is evident from Fig. 4 that the 60-day forecast of infected cases might rise within the range of 105,732–182,757, and recovered cases could increase within the range of 16,742–81,635, with a CI of 80–95%. The regressive distribution of patient cases in the two plots was observed to estimate the fitting accuracy. The model validation was assessed by prediction errors.

Based on the ARIMA model accuracy evaluation of the COVID-19 Italian epidemic data over the mentioned time period, we considered the mean absolute prediction error (MAPE) parameter. The accuracy (Acc) is defined in equation (2):

$$\mathrm{Acc}\,\% = 100 - \mathrm{MAPE} \times 100 \tag{2}$$

The ARIMA(1,2,0) model of registered cases and the ARIMA(3,2,0) model of recovered cases were validated with accuracies of 93.75% and 84.4%, respectively.

Discussions

We used existing COVID-19 epidemic data of Italian patients to evaluate the probable number of infected and recovered patients after a 60-day country lockdown. A simple automatic forecasting package (AUTOARIMA) of ‘R’
was applied to conduct predictive modelling [11]. Our data-driven model analysis highlights the necessity of country lockdown and self-isolation to control disease transmissibility among the Italian population at the moment.

At present, Italy is becoming the worst center of the corona epidemic outbreak. On 3 March 2020, 11 towns in North Italy announced quarantine after 17 deaths and 650 positive cases [12]. Unfortunately, because many Italian citizens continued their daily routines irrespective of the outbreak, the epidemic spread all over the country. After about one week, the Italian government announced more than 9000 positive cases with 97 deaths [13]. On 9 March 2020 the Italian prime minister announced a country lockdown and passed strict regulations to close malls, educational institutions, and sport events in order to stop infection among other citizens. As mentioned, an extreme characteristic of COVID-19 is that it does not show immediate symptoms during the incubation time.

After Italy's lockdown, government officials made sure that people stayed at home. All national administration websites encouraged companies to offer free online services. Educational institutions and universities adopted e-learning methods, and any data or publications on COVID-19 were made available for free to the general public. The COVID-19 response team is also conducting screening tests for people domiciled or staying long-term in highly hit areas such as the northern Italian provinces. Hospitals and medical centers are successfully handling patient flow to local hospitals and addressing individual issues about bed facilities, overcrowding in emergency departments, and patient transfer to other specialized facilities [14].

All these critical circumstances were considered to understand what exactly happened between the lockdown announcement (9 March 2020) and the end of the incubation period (possibly 23 March 2020). Fig. 6 shows the residual plot of positive COVID-19 cases during the given period. From the plot, it is clear that the trend of the first two weeks seems normal, and that after 3 March 2020 a huge spike in case variance can be observed (i.e., 24 to 26 days after quarantine had begun).

Figure 6. Residual plot of positive registered cases.

One positive sign of this COVID-19 epidemic in Italy is that, after isolation was established, there has been a significant growth in the number of recovered cases, particularly in the last weeks of March (Fig. 7). This could be because of the increased availability of medical devices, medications and health professionals in the most affected areas, which might help lower pandemic rates.

Figure 7. Residual plot of recovered cases.

At present, Italian citizens are also taking more preventive measures and maintaining social distancing to control the speed of infection. As a result, disease transmission is expected to be reduced in the near future. Preliminary results of this study suggest that if the Italian government and citizens continue the quarantine for another two months, there could be a chance of a lower rate of infective cases. Predictions indicated that another 78,701 infected cases might be registered in 60 days, which is lower than in the last 45 days.

ARIMA models can forecast simple ups and downs and are more predictive than regressive models when there is no change in the overall trend. This is because ARIMA can only look back at the data of the dependent variables (i.e., registered and recovered cases) [15]. This represents a primary limitation of this study. Secondly, owing to unwillingness to be hospitalized, some confirmed cases are not ready to inform the medical authorities. This could affect the natural transmission of the disease to family members, which will also affect the study outcomes. Finally, the data used were retrieved from official Italian Health Ministry websites; any delay or mismatch in data reporting could result in incorrect forecasting.

COVID-19 is a severe pandemic that all countries are facing. As a result, about half of the global population went into lockdown. At present, Italy is facing a serious epidemic in terms of positive cases and mortality rates. We estimated an increase in the number of registered cases and recovered cases if the present lockdown continues for another two months. Results of this study indicate that a decay of nearly 35% in positive cases and a growth of 66% in recovered cases could be possible.

In addition, the present government is taking some serious containment measures, such as suspending training sessions of sports persons, both professional and non-professional. All emergency measures remained the same, including prohibiting natural persons from moving by public and private means of transport. Prevention measures such as hand washing, mask wearing and disinfection were advertised continuously through national media, which largely influences the reproductive number of coronavirus cases. The future of COVID-19 diffusion in Italy will largely depend on government regulations and the motivation of individual citizens to self-isolate.

Author contributions

NC: Data analysis, methods, results and study design; GB: Manuscript preparation and statistical analysis; FA: Final revision and study approval.
Declaration of Competing Interest

The authors have no conflicts of interest.

This work was supported by institutional funding of the University of Camerino, Italy. Dr Nalini Chintalapudi and Dr Gopi Battineni were recipients of PhD bursaries from the University of Camerino.

References

1. Chen Z-L, Zhang Q, Lu Y, Guo ZM, Zhang X, Zhang WJ, et al. Distribution of the COVID-19 epidemic and correlation with population emigration from Wuhan, China. Chin Med J (Engl). 2020. https://doi.org/10.1097/cm9.0000000000000782.
2. Peiris JSM, Poon LLM. Severe Acute Respiratory Syndrome (SARS). In: Encyclopedia of virology; 2008. https://doi.org/10.1016/B978-012374410-4.00780-9.
3. Situation reports. https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports. [Accessed 31 March 2020].
4. Which countries are under lockdown - and is it working?. https://www.telegraph.co.uk/news/2020/03/29/lockdown-countries/. [Accessed 31 March 2020].
5. Delamater PL, Street EJ, Leslie TF, Yang YT, Jacobsen KH. Complexity of the basic reproduction number (R0). Emerg Infect Dis 2019. https://doi.org/10.3201/eid2501.171901.
6. Grasselli G, Pesenti A, Cecconi M. Critical care utilization for the COVID-19 outbreak in Lombardy, Italy. JAMA. 2020. https://doi.org/10.1001/jama.2020.4031.
7. Tuite AR, Ng V, Rees E, Fisman D. Correspondence Estimation of COVID-19 outbreak size in Italy. Lancet Infect Dis 2020. https://doi.org/10.1016/S1473-3099(20)30227-9.
8. Kalpakis K, Gada D, Puttagunta V. Distance measures for effective clustering of ARIMA time-series. In: Proceedings - IEEE international conference on data mining. ICDM.; 2001. https://doi.org/10.1109/icdm.2001.989529.
9. Loy A, Follett L, Hofmann H. Variations of Q–Q plots: the power of our eyes! Am Stat 2016. https://doi.org/10.1080/00031305.2015.1077728.
10. Shumway RH, Stoffer DS. ARIMA models. In: Time series: a data analysis approach using R; 2019. https://doi.org/10.1201/9780429273285-5.
11. Zhang PG. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 2003. https://doi.org/10.1016/S0925-2312(01)00702-0.
12. Italy struggled to convince citizens of coronavirus crisis. What can Europe learn? | World news | The Guardian. https://www.theguardian.com/world/2020/mar/23/a-warning-to-europe-italy-struggle-to-convince-citizens-of-coronavirus-crisis. [Accessed 31 March 2020].
13. Novel coronavirus. http://www.salute.gov.it/portale/nuovocoronavirus/homeNuovoCoronavirus.jsp?lingua=english. [Accessed 31 March 2020].
14. Spina S, Marrazzo F, Migliari M, Stucchi R, Sforza A, Fumagalli R. The response of Milan's emergency medical system to the COVID-19 outbreak in Italy. Lancet 2020. https://doi.org/10.1016/S0140-6736(20)30493-1.
15. Christodoulos C, Michalakelis C, Varoutas D. Forecasting with limited data: combining ARIMA and diffusion models. Technol Forecast Soc Change 2010. https://doi.org/10.1016/j.techfore.2010.01.009.
16. Hipel KW, McLeod AI. Time series modelling of water resources and environmental systems. Time Ser Model water Resour Environ Syst 1994. https://doi.org/10.1016/0022-1694(95)90010-1.

The autoregressive integrated moving average (ARIMA) model aims to capture the autocorrelation in a series and is generally used for forecasting.

An ARIMA model can be completely summarized by three parameters: p, the number of autoregressive terms; d, the number of non-seasonal differences; and q, the number of moving average terms. These three parameters (p, d, q) define the ARIMA model, which is therefore also called the ‘ARIMA (p, d, q)’ model. There are two types of ARIMA models: generalized random walk models (i.e., fine-tuned to discard all residual correlations) and generalized exponential smoothing models (i.e., which can incorporate long-term trends and seasonality).

The mathematical definitions are explained below. Let ‘B’ be the backshift operator, which shifts the observation it multiplies backward in time by one interval. For any time series Z at any period t,

$$B Z_t = Z_{t-1}, \quad \text{and for } n \text{ powers of } B: \quad B^n Z_t = Z_{t-n}$$

ARIMA is a joint model of two individual models (autoregressive AR(p) and moving average MA(q)) integrated by the difference variable I(d). In ARIMA models, a non-stationary time series is made stationary by applying finite differencing to the data points.

The general multiplicative ARIMA/SARIMA framework can be written:

$$\Phi_P(B^s)\,\phi_p(B)\,(1 - B)^d\,(1 - B^s)^D Z_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t \tag{1}$$

where B is the backshift operator, and

$$\Phi_P(B^s) = 1 - \Phi_1 B^s - \Phi_2 B^{2s} - \cdots - \Phi_P B^{Ps} \tag{2}$$

$$\phi_p(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p \tag{3}$$

$$\theta_q(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q \tag{4}$$

$$\Theta_Q(B^s) = 1 - \Theta_1 B^s - \Theta_2 B^{2s} - \cdots - \Theta_Q B^{Qs} \tag{5}$$

The general setting in equation (1) can also be expressed as ARIMA(p, d, q) × (P, D, Q).

In ARIMA(2,1,3), we have p = 2, d = 1, q = 3, s = 0, and its mathematical structure can be shown as:

$$(1 - \phi_1 B - \phi_2 B^2)(1 - B) Z_t = (1 - \theta_1 B - \theta_2 B^2 - \theta_3 B^3)\,\varepsilon_t \tag{6}$$

Similarly, the structure of ARIMA(1,0,1)(0,1,1)_12, where (p = 1, d = 0, q = 1; P = 0, D = 1, Q = 1, S = 12), is:

$$(1 - \phi_1 B)(1 - B^{12}) Z_t = (1 - \theta_1 B)(1 - \Theta_1 B^{12})\,\varepsilon_t \tag{7}$$

The mathematical formulation of the ARIMA (p, d, q) model with lag polynomials is defined as [16]:

$$\phi(L)(1 - L)^d y_t = \theta(L)\,\varepsilon_t \tag{8}$$
For multiple lag polynomials,

$$\left[1 - \sum_{i=1}^{p} \phi_i L^i\right](1 - L)^d y_t = \left[1 + \sum_{j=1}^{q} \theta_j L^j\right]\varepsilon_t \tag{9}$$

The difference integer d controls the level of differencing; d = 1 is usually sufficient in most cases, and if d = 0 the model reduces to an ARMA (p, q) model.
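Both models validated in this study, ARIMA(1,2,0) and ARIMA(3,2,0), use d = 2, so the role of the differencing operator (1 - B)^d can be illustrated with a short sketch. The Python code below is a dependency-free illustration and not the R workflow used in the study. It applies (1 - B)^d by repeated first differencing and produces point forecasts from an ARIMA(1,2,0) structure, (1 - φ₁B)(1 - B)² Z_t = ε_t, with future shocks set to zero. The function names, the coefficient value `phi`, and the sample series are hypothetical and chosen only for illustration.

```python
def difference(series, d=1):
    """Apply (1 - B)^d to a series by taking first differences d times."""
    out = list(series)
    for _ in range(d):
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


def forecast_arima_120(series, phi, steps):
    """Point forecasts for (1 - phi*B)(1 - B)^2 Z_t = e_t, future shocks set to zero."""
    if len(series) < 3:
        raise ValueError("at least three observations are needed for d = 2")
    level = series[-1]                           # last observation Z_T
    slope = series[-1] - series[-2]              # last first difference
    accel = slope - (series[-2] - series[-3])    # last second difference W_T
    forecasts = []
    for _ in range(steps):
        accel = phi * accel      # AR(1) recursion on the twice-differenced series
        slope = slope + accel    # undo the second differencing
        level = level + slope    # undo the first differencing
        forecasts.append(level)
    return forecasts


# Hypothetical cumulative counts (not data from the study)
cumulative = [3, 20, 79, 132, 229, 322]
print(difference(cumulative, d=1))
print(difference(cumulative, d=2))
print(forecast_arima_120(cumulative, phi=0.5, steps=3))
```

Setting `phi` to zero reduces the forecast to a straight-line extrapolation of the last observed daily increase, which shows how strongly a d = 2 model leans on the most recent trend.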
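As a worked illustration, the ARIMA(2,1,3) structure $(1 - \phi_1 B - \phi_2 B^2)(1 - B) Z_t = (1 - \theta_1 B - \theta_2 B^2 - \theta_3 B^3)\,\varepsilon_t$ can be expanded with the backshift rule $B^n Z_t = Z_{t-n}$ into an explicit recursion. The intermediate series $W_t$ is introduced here only for the derivation and is not part of the original notation.

```latex
\begin{align*}
W_t &= (1 - B) Z_t = Z_t - Z_{t-1} \\
W_t &= \phi_1 W_{t-1} + \phi_2 W_{t-2}
       + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \theta_3 \varepsilon_{t-3} \\
Z_t &= (1 + \phi_1) Z_{t-1} + (\phi_2 - \phi_1) Z_{t-2} - \phi_2 Z_{t-3}
       + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \theta_3 \varepsilon_{t-3}
\end{align*}
```

The last line follows from multiplying out $(1 - \phi_1 B - \phi_2 B^2)(1 - B) = 1 - (1 + \phi_1) B + (\phi_1 - \phi_2) B^2 + \phi_2 B^3$, so a single level of differencing adds one extra lag of $Z_t$ to the autoregressive part.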