A Machine Learning Forecasting Model For COVID-19 Pandemic in India
https://doi.org/10.1007/s00477-020-01827-8
ORIGINAL PAPER
Abstract
Coronavirus disease (COVID-19) is an infectious disease caused by a new virus. The disease causes a respiratory illness (like influenza) with symptoms such as cold, cough and fever and, in more severe cases, difficulty in breathing. COVID-19 has been recognized as a worldwide pandemic, and several studies are being conducted using different mathematical models to predict the likely progression of this epidemic. These mathematical models depend on different factors, and their analyses are subject to potential bias. Here, we present a model that could be useful for predicting the spread of COVID-19. We have applied linear regression, multilayer perceptron and vector autoregression methods to the COVID-19 Kaggle data to predict the epidemiological pattern of the disease and the rate of COVID-19 cases in India. We forecast the potential trends of COVID-19 effects in India based on data gathered from Kaggle. The available data on confirmed, death and recovered cases across India over time help in predicting and forecasting the near future. For further assessment and future perspectives, case definitions and data collection must be maintained continuously.
Keywords COVID-19 · Prediction · Linear regression (LR) · Multilayer perceptron (MLP) · Vector autoregression (VAR)
Stochastic Environmental Research and Risk Assessment
website of Kaggle². Weka 3.8.4³ and Orange⁴ are used to analyse the data. LR, MLP and VAR are applied to the Kaggle dataset, which has 80 instances, to forecast the future effects of the COVID-19 pandemic in India. Because of this infectious disease, forecasting is the need of the hour: it helps to devise better strategies for tackling this crucial period across the globe. As documented by Visual Capitalist, the human race has survived several outbreaks caused by microbes that were invisible and seemingly invincible. COVID-19 is the current threat in the highly sophisticated twenty-first century. Figure 1 is a snapshot from Visual Capitalist.⁵

Artificial intelligence (AI) can assist us in handling the problems raised by the COVID-19⁶ pandemic. However, it is not the technology alone that will make the difference, but rather the knowledge and creativity of the people who use it. Indeed, the COVID-19 crisis will probably expose some of the key shortcomings of AI. Machine learning (ML), the current form of AI, works by recognizing patterns in historical training data. Humans have an advantage over AI here: we can take lessons from one situation and apply them to novel circumstances, drawing on our abstract knowledge to make the best guesses about what may work or what may happen. AI systems, in contrast, need to learn from scratch whenever the setting or task changes even slightly.

The COVID-19 crisis will therefore highlight something that has always been true about AI: it is a tool, and the value of its use in any situation is determined by the people who design and use it. In the present crisis, human action and innovation will be especially critical in harnessing the power of what AI can do. One way to deal with the novel-situation problem is to gather new training data under current conditions. For human decision-makers and AI systems alike, each new piece of information about our present circumstances is especially valuable in informing our decisions going forward. The more effective we are at sharing information, the sooner our situation stops being novel and we can start to see a way forward.

² https://www.kaggle.com/imdevskp/corona-virus-report/data.
³ https://sourceforge.net/projects/weka/.
⁴ https://orange.biolab.si/.
⁵ https://www.visualcapitalist.com/history-of-pandemics-deadliest/.
⁶ https://www.weforum.org/agenda/2020/03/covid-19-crisis-artificial-intelligence-creativity/.

2 Related work

Sujatha and Chatterjee (2020) proposed a model for predicting the spread of COVID-19. They applied linear regression, multilayer perceptron and vector autoregression models to the COVID-19 Kaggle data to forecast the epidemiological pattern of the disease and the rate of COVID-19 cases in India. Yang et al. (2020) introduced a dynamic SEIR model for predicting the peaks and sizes of the COVID-19 epidemic. They also used an AI model trained on past SARS data, which shows promise for predicting future epidemics. Barstugan et al. (2020) presented early-stage detection of COVID-19 (the name given by the World Health Organization, WHO) using machine learning methods applied to abdominal computed tomography (CT) images. Elmousalami and Hassanien (2020) compared day-level forecasting models for COVID-19 cases using time series models and mathematical formulation. Rizk-Allah and Hassanien (2020) introduced a new forecasting model to analyse and forecast the CS of COVID-19 for the coming days, based on data reported since 22 Jan 2020. Rezaee et al. (2020) introduced a hybrid approach based on linguistic FMEA, a fuzzy inference system and a fuzzy data envelopment analysis model. Their approach computes a novel score that overcomes some shortcomings of the risk priority number (RPN) and prioritizes health, safety and environment (HSE) risks. Navares et al. (2018) presented a solution to the problem of forecasting daily hospital admissions in Madrid due to circulatory and respiratory causes, based on biometeorological indicators. Cui and Singh (2017) developed and applied the minimum relative entropy (MRE) theory for monthly streamflow prediction, with spectral power as a random variable. Torky and Hassanien (2020) introduced a blockchain-integrated framework that uses the peer-to-peer, time-stamping and decentralized-storage features of blockchain to build a new system for verifying and detecting unknown infected cases of COVID-19. Ezzat and Ella (2020) proposed a novel approach called GSA-DenseNet121-COVID-19, based on a hybrid CNN architecture tuned with an optimization algorithm.

⁷ https://www.stat.yale.edu/Courses/1997-98/101/linreg.htm.

3 Methods and materials

In statistics, linear regression⁷ (LR) is a linear approach to modelling the relationship between a dependent variable and one or more independent variables. LR was the first type of regression analysis to be studied rigorously and to be used extensively in practical applications (Yan and Su 2009). LR models the relationship between two variables by fitting a linear equation to observed data. One variable is considered the independent variable and the other the dependent variable. An LR line has an equation of the form:

$$Y = bX + a \tag{1}$$
where $X$ is the independent variable and $Y$ is the dependent variable. The slope of the line is $b$, and $a$ is the intercept (the value of $Y$ when $X = 0$).

A multilayer perceptron⁸ (MLP) is a type of feedforward artificial neural network (FANN). The term MLP is used ambiguously. Sometimes it loosely denotes any FANN, and sometimes it strictly refers to networks composed of multiple layers of perceptrons. An MLP⁹ is generally used for complex problems. The output of a single MLP unit is:

$$y = \varphi\left(\sum_{i=1}^{n} w_i x_i + b\right) = \varphi\left(\mathbf{w}^{T}\mathbf{x} + b\right) \tag{2}$$

where $\mathbf{w}$ is the vector of weights, $\mathbf{x}$ is the vector of inputs, $b$ is the bias and $\varphi$ is the non-linear activation function.

A vector autoregression¹⁰ (VAR) is a forecasting algorithm used when two or more time series influence each other, i.e., the relationship between the time series involved is bi-directional. The formula for VAR is:

$$Y_t = \alpha + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \cdots + \beta_p Y_{t-p} + \epsilon_t \tag{3}$$

where $\alpha$ is the intercept (a constant) and $\beta_1, \beta_2, \ldots, \beta_p$ are the coefficients of the lags of $Y$ up to order $p$. Order $p$ means that up to $p$ lags of $Y$ are used as the predictors in the equation. The term $\epsilon_t$ is the error, which is assumed to be white noise.

⁸ https://en.wikipedia.org/wiki/Multilayer_perceptron.
⁹ https://missinglink.ai/guides/neural-network-concepts/perceptrons-and-multi-layer-perceptrons-the-artificial-neuron-at-the-coreofdeep-learning/.
¹⁰ https://www.machinelearningplus.com/time-series/vector-autoregression-examples-python/.

4 Experimental results

The boxplots in Fig. 2 show the structure of the data (date, confirmed, recovered and death counts), and it is clear that the cases are still at a very early stage. According to the WHO, at the time of writing India was in the second phase, with very few cases, and forecasting these cases is the work required at this juncture (Tareen et al. 2019).

The sieve diagram visualizes the dataset together with the sieve rank. In Fig. 3, attributes with a strong relationship are shown in dark shades. The interestingness of each pair of attributes is represented by its contingency table, which makes this a very graphical way of visualizing frequencies.

Correlation plays a major role in finding the dependencies among the features of the dataset. Our dataset covers the confirmed, recovered and death cases of the COVID-19 outbreak in India over a period of around 2 months. The Spearman correlation shows that the risk of infection rises strongly with the progression of days (date), with a correlation value of 0.949. Figure 4 compares the Pearson and Spearman correlations. Notably, the date attribute holds a high level of importance, which is why social-distancing measures have been taken globally (Mu et al. 2018; Gautheir 2001). Transmission normally happens through contact with an infected person, and a handshake is the biggest risk in the case of COVID-19. Correlation indicates the impact of the disease and the countermeasures that need to be considered. Across the globe, national leaders are trying various trial-and-error methods to combat the severity of the disease.

Forecasting provides relevant and consistent input about past, present and future events using statistical and scientific approaches, and it supports strong decision making from all perspectives. Forecasting methods are broadly classified into qualitative and quantitative approaches. The steps involved in forecasting are decisive for the task:

- gaining an initial understanding of the problem through complete analysis and building a strong foundation;
- collecting data based on these two steps;
- estimating the future;
- comparing actual and estimated values, followed by follow-up actions.

Forecasting is applied in economic and sales prediction, budgeting, census and stock market analysis, yield projections and many more fields. The medical field is also a potential area for deploying forecasting and prediction to serve the people in need (Hajirahimi and Khashei 2019; Yamana and Shaman 2019). Our work applied linear regression, multilayer perceptron and VAR models to the time series dataset to provide the forecast.

The VAR model is well suited to the analysis of multivariate time series. It helps in inference and policy analysis, and it is widely used in practical forecasting scenarios because of its superior forecasting performance. Technically, a VAR is an m-equation, m-variable model in which each variable is explained by its own past values and the past values of the other variables. The first VAR parameter to set is the maximum autoregression order. Information criteria that help to optimize the autoregressive order are Akaike's information criterion (AIC), the Bayesian information criterion (BIC), the Hannan-Quinn criterion (HQ) and the final prediction error (FPE). Further settings are the trend (constant, linear or quadratic), the number of forecast steps ahead and the confidence intervals (Billio et al. 2019; Portet 2020; Zhang and Krieger 1993). The formulas for calculating AIC, BIC and HQ are as follows:
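As a hedged illustration of these criteria, the sketch below uses common textbook forms for a VAR with an intercept, which may differ from the exact expressions used in the paper. Let m be the number of series, p the lag order, T the effective sample size and Σ̂(p) the maximum-likelihood residual covariance. The criteria are then:

- AIC(p) = ln|Σ̂(p)| + 2pm²/T
- BIC(p) = ln|Σ̂(p)| + (ln T)·pm²/T
- HQ(p) = ln|Σ̂(p)| + (2 ln ln T)·pm²/T
- FPE(p) = ((T + mp + 1)/(T - mp - 1))^m·|Σ̂(p)|

The NumPy sketch fits every candidate order by ordinary least squares on a common sample and keeps the order with the lowest AIC. The function names and the synthetic two-series data are illustrative assumptions, not the Kaggle data or Orange's implementation.

```python
import numpy as np


def lagged_design(y: np.ndarray, p: int, start: int):
    """Regressors [1, y_{t-1}, ..., y_{t-p}] and targets y_t for t = start..T-1."""
    rows = [np.concatenate([[1.0]] + [y[t - i] for i in range(1, p + 1)])
            for t in range(start, len(y))]
    return np.array(rows), y[start:]


def fit_var(y: np.ndarray, p: int, start: int):
    """OLS fit of a VAR(p) with intercept; returns coefficients and ML residual covariance."""
    X, Y = lagged_design(y, p, start)
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)  # shape (1 + m*p, m)
    resid = Y - X @ B
    sigma = resid.T @ resid / len(Y)           # divide by T (ML estimate), not T - k
    return B, sigma


def info_criteria(y: np.ndarray, p: int, p_max: int) -> dict:
    """AIC, BIC, HQ and FPE for lag order p, computed on the same sample for every p."""
    m = y.shape[1]
    T = len(y) - p_max                         # common effective sample size
    _, sigma = fit_var(y, p, start=p_max)
    logdet = np.linalg.slogdet(sigma)[1]       # ln|Sigma_hat(p)|
    n_coef = p * m * m                         # number of lag coefficients
    return {
        "aic": logdet + 2 * n_coef / T,
        "bic": logdet + np.log(T) * n_coef / T,
        "hq": logdet + 2 * np.log(np.log(T)) * n_coef / T,
        "fpe": ((T + m * p + 1) / (T - m * p - 1)) ** m * np.exp(logdet),
    }


# Synthetic, illustrative data: a stable two-variable VAR(1) (NOT the Kaggle data).
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.1], [0.0, 0.3]])
y = np.zeros((500, 2))
for t in range(1, len(y)):
    y[t] = A_true @ y[t - 1] + rng.normal(size=2)

p_max = 5
scores = {p: info_criteria(y, p, p_max) for p in range(1, p_max + 1)}
best_p = min(scores, key=lambda p: scores[p]["aic"])  # order with the lowest AIC
print("order selected by AIC:", best_p)
```

All four criteria share the ln|Σ̂(p)| term and differ only in how heavily they penalize extra lags. For T above about 16, the AIC penalty is the lightest and the BIC penalty the heaviest, with HQ in between.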
Figure 6 shows the predicted COVID-19 confirmed, death and recovered cases, based on the actual confirmed, death and recovered data, with a 95% confidence interval (CI), using MLP. The MLP graph indicates that cases will increase in the future, given the existing case data.

Figure 7 shows the predicted confirmed cases based on the actual confirmed case data, with a 95% CI, using LR. The LR graph indicates that confirmed cases will increase in the future, given the existing case data.

Figure 8 shows the predicted confirmed cases based on the actual confirmed case data, with a 95% CI, using MLP. The MLP graph predicts confirmed cases in an increasing range, based on the existing data for 80 days.

Figure 9 shows the predicted impacts of COVID-19 based on the actual data of confirmed, death and recovered cases, with a 95% CI, using LR. This figure also shows that, based on the input data, the system predicts that confirmed cases will increase day by day.

Figure 10 shows the predicted impacts of COVID-19 based on the actual data of confirmed, death and recovered cases, with a 95% CI, using MLP. According to the MLP prediction, confirmed cases will decrease at a very slow rate, while the recovered and death counts will fluctuate (i.e., sometimes more, sometimes less).

Figure 11 shows the predicted COVID-19 deaths based on the actual death data, with a 95% CI, using LR. The graph indicates that death cases will increase in the future, given the existing case data.

Figure 12 shows the predicted COVID-19 deaths based on the actual death data, with a 95% CI, using MLP. Figure 12 likewise indicates that death cases will increase in the future, given the existing case data.

Figure 13 shows the predicted COVID-19 recoveries based on the actual recovered-case data, with a 95% CI, using LR, and Fig. 14 shows the same prediction with a 95% CI using MLP. Figures 13 and 14 together show that recovered cases will increase in the future.

Figure 15 shows the VAR forecast of the confirmed, recovered and death cases for the next 69 days, with a 95% CI. The model uses an autoregression order of 10, AIC as the information criterion for optimization, and constant and linear trend terms.
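The Figure 15 settings (a constant plus a linear trend, forecasting 69 days ahead) can be illustrated with a minimal NumPy sketch. It assumes ordinary least squares estimation and iterated multi-step forecasting, in which each prediction is fed back as the newest lag. It is not Orange's implementation and it omits the 95% confidence bands. The function names, the lag order of 2 and the synthetic three-series data are hypothetical choices for illustration.

```python
import numpy as np


def fit_var_trend(y: np.ndarray, p: int) -> np.ndarray:
    """OLS fit of y_t = c + d*t + A_1 y_{t-1} + ... + A_p y_{t-p} + e_t.

    Returns B of shape (2 + m*p, m) whose rows are [c, d, A_1^T, ..., A_p^T].
    """
    X = np.array([np.concatenate([[1.0, t]] + [y[t - i] for i in range(1, p + 1)])
                  for t in range(p, len(y))])
    B, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return B


def forecast_var_trend(B: np.ndarray, y: np.ndarray, steps: int) -> np.ndarray:
    """Iterated forecast: each prediction is appended to the history as the newest lag."""
    m = y.shape[1]
    p = (B.shape[0] - 2) // m
    history = list(y[-p:])
    out = []
    for h in range(steps):
        t = len(y) + h                      # the time index continues past the sample
        x = np.concatenate([[1.0, t]] + [history[-i] for i in range(1, p + 1)])
        y_next = x @ B
        out.append(y_next)
        history.append(y_next)
    return np.array(out)


# Synthetic, illustrative series: 80 days, 3 variables (NOT the Kaggle data).
rng = np.random.default_rng(1)
y = np.zeros((80, 3))
for t in range(1, 80):
    y[t] = 0.5 + 0.1 * t + 0.6 * y[t - 1] + rng.normal(scale=0.5, size=3)

B = fit_var_trend(y, p=2)
future = forecast_var_trend(B, y, steps=69)   # days 81..149
print(future.shape)  # (69, 3)
```

Iterating one-step predictions is the usual way to obtain multi-step VAR forecasts. The trend column keeps advancing with t, so a linear drift persists over the whole horizon.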
We provided case data up to the 80th day, i.e., 10 April 2020. Table 1 shows the values of cases (confirmed, death, recovered) predicted by the LR method from the 81st day, i.e., 11 April 2020, for the next 69 days, i.e., up to 18 June 2020. These values are predicted from the actual values given to the system as input. Figures 5, 7, 9, 11 and 13 are generated from the predicted values in Table 1.
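The day numbering above implies that day 1 of the series is 22 January 2020, counting back 79 days from day 80 = 10 April 2020. This is an inference, not something the text states. Under that assumption, the sketch below fits Eq. (1) by ordinary least squares to days 1 to 80 in pure Python and extrapolates it to days 81 to 149. The interval ŷ ± 1.96·s·√(1 + 1/n + (x₀ - x̄)²/Sxx) is a standard normal-approximation prediction interval. Treating it as what the paper's "95% CI" means is an assumption. The helper names and the synthetic series are hypothetical, and Weka's implementation may differ.

```python
from datetime import date, timedelta
from math import sqrt

# Assumption: day 1 of the series is 22 Jan 2020, so day 80 is 10 Apr 2020.
DAY_ONE = date(2020, 1, 22)


def day_to_date(day: int) -> date:
    """Map a 1-based day index to a calendar date."""
    return DAY_ONE + timedelta(days=day - 1)


def fit_line(xs, ys):
    """Least-squares estimates of a and b in Y = bX + a, plus the residual std error."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(sse / (n - 2))
    return {"a": a, "b": b, "s": s, "x_bar": x_bar, "sxx": sxx, "n": n}


def forecast(model, days, z=1.96):
    """Point forecast and approximate 95% prediction interval for each future day."""
    rows = []
    for d in days:
        y_hat = model["a"] + model["b"] * d
        half = z * model["s"] * sqrt(
            1 + 1 / model["n"] + (d - model["x_bar"]) ** 2 / model["sxx"])
        rows.append((day_to_date(d), y_hat, y_hat - half, y_hat + half))
    return rows


# Synthetic, illustrative series for days 1..80 (NOT the Kaggle counts).
days = list(range(1, 81))
counts = [5 + 0.5 * d * d for d in days]
model = fit_line(days, counts)
table = forecast(model, range(81, 150))   # days 81..149
print(len(table), table[0][0], table[-1][0])  # 69 2020-04-11 2020-06-18
```

The interval widens as the forecast day moves away from x̄ = 40.5, which is why straight-line extrapolations over 69 days carry increasingly wide bands.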
Fig. 15 Forecast of confirmed, deaths and recovered cases of COVID-19 using VAR model
We provided case data up to the 80th day, i.e., 10 April 2020. Table 2 shows the values of cases (confirmed, death, recovered) predicted by the MLP method from the 81st day, i.e., 11 April 2020, for the next 69 days, i.e., up to 18 June 2020. The system predicts these values from the actual values given as input.

The curves for the different cases in Fig. 15 are drawn from the Table 3 values for the next 69 days. They depend on the VAR parameters described above.

6 Conclusion

Information and communication technology supports decision making based on past data, from a data analytics and data mining perspective. The volume of available data is huge, and gathering information and extracting interesting patterns from the accumulated data is a challenging task. The available data on confirmed, recovered and death cases across India over time help in predicting and forecasting the near future. The accuracy of the model could be improved by adding related attributes, such as:

- the number of hospitals;
- the immune status of the infected person;
- the age and gender of the patient;
- the steps taken to combat the proliferation of the virus.

These attributes would make the model more informative. For now, it is prudent that the measures taken be stringent and vigilant, handling this crucial situation through social distancing, lockdown, curfew, quarantine and isolation to prevent transmission. Comparing the predicted values with the cases in the Johns Hopkins University¹¹ data, we conclude that, of the methods run in Weka and Orange, MLP gives better predictions than LR and VAR. In future work, deep learning methods could be applied to forecast the time series data and obtain better predictions.

¹¹ https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6.
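The conclusion compares the predicted values with the observed Johns Hopkins counts but does not name an error measure. One hedged way to make such a comparison explicit is to score each model on the same days with the mean absolute error (MAE), root mean squared error (RMSE) and mean absolute percentage error (MAPE). The metric choice, the function name and all numbers below are hypothetical illustrations, not results from this study.

```python
from math import sqrt


def forecast_errors(actual, predicted):
    """MAE, RMSE and MAPE (in %) between observed and forecast counts."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": sqrt(sum(e * e for e in errors) / n),
        # MAPE is undefined when an observed value is zero
        "mape": 100 * sum(abs(e) / a for e, a in zip(errors, actual)) / n,
    }


# Hypothetical observed values and two hypothetical models' forecasts.
observed = [100, 200, 300]
forecasts = {"A": [110, 190, 330], "B": [102, 205, 296]}
scores = {name: forecast_errors(observed, pred) for name, pred in forecasts.items()}
best = min(scores, key=lambda name: scores[name]["rmse"])
print(best)  # B
```

RMSE penalizes large misses more than MAE does. MAPE expresses the error relative to the case counts, which helps when the series grows by orders of magnitude.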
Author contributions All the authors have made substantive contributions to the article and assume full responsibility for its content.

Compliance with ethical standards

Conflict of interest The authors declare that they have no conflict of interest.

References

Barstugan M, Ozkaya U, Ozturk S (2020) Coronavirus (COVID-19) classification using CT images by machine learning methods. arXiv preprint arXiv:2003.09424
Billio M, Casarin R, Rossini L (2019) Bayesian nonparametric sparse VAR models. J Econ 212(1):97–115
Cui H, Singh VP (2017) Application of minimum relative entropy theory for streamflow forecasting. Stoch Env Res Risk Assess 31(3):587–608
Elmousalami HH, Hassanien AE (2020) Day level forecasting for Coronavirus Disease (COVID-19) spread: analysis, modeling and recommendations. arXiv preprint arXiv:2003.07778
Ezzat D, Ella HA (2020) GSA-DenseNet121-COVID-19: a hybrid deep learning architecture for the diagnosis of COVID-19 disease based on gravitational search optimization algorithm. arXiv preprint arXiv:2004.05084
Gautheir TD (2001) Detecting trends using Spearman's rank correlation coefficient. Environ Forens 2(4):359–362
Hajirahimi Z, Khashei M (2019) Hybrid structures in time series modeling and forecasting: a review. Eng Appl Artif Intell 86:83–106
Mu Y, Liu X, Wang L (2018) A Pearson's correlation coefficient-based decision tree and its parallel implementation. Inf Sci 435:40–58
Navares R, Díaz J, Linares C, Aznarte JL (2018) Comparing ARIMA and computational intelligence methods to forecast daily hospital admissions due to circulatory and respiratory causes in Madrid. Stoch Env Res Risk Assess 32(10):2849–2859
Portet S (2020) A primer on the model selection using the Akaike information criterion. Infect Dis Modell 5:111–128
Rezaee MJ, Yousefi S, Eshkevari M, Valipour M, Saberi M (2020) Risk analysis of health, safety and environment in chemical industry integrating linguistic FMEA, fuzzy inference system and fuzzy DEA. Stoch Env Res Risk Assess 34(1):201–218
Rizk-Allah RM, Hassanien AE (2020) COVID-19 forecasting based on an improved interior search algorithm and multi-layer feed forward neural network. arXiv preprint arXiv:2004.05960
Sujatha R, Chatterjee J (2020) A machine learning methodology for forecasting of the COVID-19 cases in India
Tapia JA, Salvador B, Rodríguez JM (2020) Data envelopment analysis with estimated output data: confidence intervals efficiency. Measurement 152:107364
Tareen ADK, Nadeem MSA, Kearfott KJ, Abbas K, Khawaja MA, Rafique M (2019) Descriptive analysis and earthquake prediction using boxplot interpretation of soil radon time-series data. Appl Radiat Isot 154:108861
Torky M, Hassanien AE (2020) COVID-19 blockchain framework: innovative approach. arXiv preprint arXiv:2004.06081
Yamana TK, Shaman J (2019) A framework for evaluating the effects of observational type and quality on vector-borne disease forecast. Epidemics 100359
Yan X, Su X (2009) Linear regression analysis: theory and computing. World Scientific, New York
Yang Z, Zeng Z, Wang K, Wong SS, Liang W, Zanin M, Liu P, Cao X, Gao Z, Mai Z, Liang J (2020) Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions. J Thorac Dis 12(3):165
Zhang P, Krieger AM (1993) Appropriate penalties in the final prediction error criterion: a decision-theoretic approach. Stat Probab Lett 18(3):169–177

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.