Epidemic Outbreak Prediction Using Machine Learning Model
2022 5th International Conference on Advances in Science and Technology (ICAST) | 978-1-6654-9263-8/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICAST55766.2022.10039594

Abstract— The use of intelligent models to predict diseases, and to build models that help doctors prevent diseases from spreading globally, is increasing day by day. When a disease spreads rapidly in a short period of time in a specific area, it is called an epidemic outbreak. An outbreak might start in a single community or spread across multiple countries. It might last anywhere from a few days to several years. Public health organizations (PHOs) take preventative measures to stop diseases from spreading, and they benefit greatly from accurate prediction of infectious disease. With the emergence of big data in health and biomedicine, precise data analysis aids early disease identification and better patient treatment. It is now increasingly viable to use massive computing power to predict and manage outbreaks. Our goal is to investigate and determine how outbreaks spread in villages and suburbs where medical care may be limited. A machine learning model is required to forecast epidemic dynamics and identify where the next outbreak is most likely to occur. Our method considers the climate, geography, and population distribution of the impacted region, because these are important features that contribute subtly to the dynamics of the epidemic. Our approach will assist health authorities in taking the necessary steps to guarantee that there are sufficient resources to fulfil demand and, if feasible, to prevent epidemics from arising.

Keywords— Epidemic, Zika Virus, Outbreak, Cases, Classification, Regression, Weather, Population Density, Prediction

I. INTRODUCTION

The globe is currently dealing with an evolving spectrum of infectious diseases, which has been influenced by organismal evolution, human demographics, global travel, environmental modifications, and the closure of public health institutions. Current infectious disease surveillance and response procedures are insufficient to meet growing needs for prevention, detection, reporting, and response. The ability to predict diseases will equip governments and healthcare practitioners with a way to respond promptly to epidemics, reducing the damage and conserving limited resources. For many infectious diseases, especially those transmitted by arthropod vectors, the potential for epidemics can be anticipated in time and space using sophisticated monitoring and modelling techniques that incorporate environmental data. These tactics can offer useful, timely, and cost-effective tools when paired with communication technologies. This study focuses on the Zika virus outbreak, and we took multiple disease dynamics into account to enable us to make accurate predictions.

II. LITERATURE SURVEY

Effective outbreak prediction models are needed to learn more about the anticipated spread and effects of infectious disease, and to provide insights to governments and other legislative bodies. Prediction models are used to suggest new strategies and to assess the effectiveness of those that have already been put in place (Remuzzi & Remuzzi, 2020) [1].

Model uncertainty is greatly increased by the complexity of community activities in different geographic regions and by changes in control techniques, in addition to the many known and unknown underlying factors in transmission (Darwish, Rahhal, & Jafar, 2020) [2]. Traditional epidemiological models are therefore faced with new challenges in providing more reliable data. Many new models have emerged to address this problem, each of which adds a set of assumptions to the modelling process (Scarpino & Petri, 2019) [3].

SEIR models demonstrated improved accuracy for the Zika and varicella outbreaks by accounting for the lengthy incubation period that infected patients experience (Pan, Huang, & Chen, 2012) [4]. SEIR models treat the incubation duration as a random variable and, like the SIR model, assume a disease-free equilibrium [5].

Due to the complexity and magnitude of the challenge of developing health-related models, machine learning has recently gained importance for producing epidemic prediction models. Machine learning methods aim to produce models with improved generalisation capability and reliable predictions at longer lead times (Yin, Tran, Zhou, Zheng, & Kwoh, 2018) [6].

When using TextBlob in Python to perform sentiment analysis, polarity and subjectivity are two important factors to consider. As described by Das Adhikari et al. (2018) [7], TextBlob covers common tasks such as part-of-speech tagging, noun phrase extraction, text classification, and sentiment analysis. In Python, text supplied to TextBlob is processed as strings for natural language processing. The sentiment analyzer returns a named tuple of the form Sentiment(polarity, subjectivity), with polarity in the range [-1.0, 1.0] and subjectivity in the range [0.0, 1.0].

The outcome of data analysis is greatly improved when using an interactive approach, which also significantly enhances comparative analysis [8]. Buja, Cook, & Swayne (1996) suggested an interactive data visualization approach focused on specific analytic tasks such as comparisons. Plotly, an open-source interactive Python graphing package, is used in this study. It can plot statistical charts, financial charts, scientific charts, and other kinds of charts.

Traditional prediction methodologies make it difficult to manage time components, whereas time series forecasting techniques take time components into consideration along with
other factors. Time series forecasting can produce far more accurate results than traditional prediction approaches. According to the findings of Taylor (2008), there is a lot of potential for using time series forecasting to anticipate future outcomes [9].

Artificial Intelligence model as a Dengue Outbreak Predictor: This model was turned into a graphical user interface intended to help and educate the general population in areas at risk of a dengue outbreak. It employs a Bayesian network machine learning technique with 79-84 percent accuracy (Chenar & Deng, 2018) [10]. A combination of meteorological data and random forest algorithms was used to forecast global African swine fever (ASF) outbreaks [11].

The 12 features used for modelling were subjected to a logistic regression analysis, after which receiver operating characteristic (ROC) curves were produced, and precipitation was judged to have a substantial impact on the onset of outbreaks. The study used random forest algorithms in conjunction with the Subset Evaluator-Best First feature selection method, using ASF outbreak data and weather data from the WorldClim database [12].

III. PROBLEM DEFINITION

COVID-19 has shown the true state of our healthcare infrastructure, planning, and preparedness to deal with a pandemic: limited resources, unprepared personnel, and shattered supply chains. If government agencies are given outbreak predictions ahead of time, they can plan and execute projects in a well-organized manner, maximizing the utilization of employees and resources. The objective of this project is to investigate and develop a multimodal model that can predict the likelihood of an outbreak in a certain place.

Motivation

The sole motivation for this problem statement is the current pandemic, which arose abruptly. Coronavirus disease (COVID-19) is a viral illness caused by a recently discovered coronavirus. The COVID-19 situation could have long-term and severe impacts on countries, humanity, and cooperation between nations, as well as complicating the management of illness and of crises. There are growing signs that the world after the crisis will change and that globalisation will be called into question in a number of situations.

Scope

The scope of our project is broad and global. We will include numerous ML algorithms for predicting the outbreak of an epidemic disease in a certain region. Our methodology considers climate, geography, and the distribution of population in the afflicted area, because these factors are pertinent and subtly influence the dynamics of the disease outbreak.

1. Reduce avoidable suffering from sickness, and reduce the expenditure load on government and healthcare systems by giving them first-hand knowledge of outbreak hotspots and the agents that cause epidemics to spread.

2. The machine learning model must identify the next outbreak-prone locations, and the attributes that greatly aid the propagation of the outbreak, given a region where an epidemic outbreak has already occurred.

A. Project Architecture

Fig.1. Project Architecture

We begin our project by collecting data from several sources and parameters, such as the Zika virus dataset, the weather dataset, and the population density dataset of virus-affected areas. We then perform exploratory data analysis to extract useful information from the datasets and prepare the data accordingly. In addition, we generate two datasets, one for classification models and one for regression models, and perform hyperparameter tuning to pick the best-performing model.

Finally, we design a user interface to display the predictions and results.

IV. IMPLEMENTATION AND RESULTS

A. Technology Stack

For all machine learning activities, we primarily employed the Python programming language. The scikit-learn package is used to build models, along with Pandas, NumPy, and Seaborn. We also generated a dashboard in Tableau. To make the front end of our website responsive, we used HTML, CSS, JavaScript, and Bootstrap. A Flask API serves as the backend and is deployed on Heroku.

B. Data Collection

We collected three major datasets to help us predict the outbreak of Zika virus:

• The Zika Data Repository from the Centers for Disease Control and Prevention (CDC) makes information on the Zika virus accessible to the general public. It gave us enough information to build and test the model.

• Historical weather dataset collected from World Weather Online through its API.

• Population density dataset of the virus-struck zones where we are doing the prediction.

• Also, latitude and longitude information of these areas for plotting purposes.
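
The datasets listed above have to be combined into a single table before any model can be trained. The following is a minimal sketch of one way to do that with Pandas. The column names (`location`, `date`, `cases`, `precip_mm`, `humidity`, `max_temp_c`, `density_per_km2`, `latitude`, `longitude`), the helper `build_modelling_table`, and all values are hypothetical placeholders, not the actual schema or data of the sources named in the list.

```python
import pandas as pd

# Hypothetical stand-ins for the collected datasets.
# All column names and values are made up for illustration only.
cases = pd.DataFrame({
    "location": ["Brazil", "Brazil", "Colombia"],
    "date": pd.to_datetime(["2016-03-01", "2016-03-08", "2016-03-01"]),
    "cases": [120, 150, 80],
})
weather = pd.DataFrame({
    "location": ["Brazil", "Brazil", "Colombia"],
    "date": pd.to_datetime(["2016-03-01", "2016-03-08", "2016-03-01"]),
    "precip_mm": [12.0, 3.5, 20.1],
    "humidity": [81, 74, 88],
    "max_temp_c": [31.0, 33.2, 29.5],
})
population = pd.DataFrame({
    "location": ["Brazil", "Colombia"],
    "density_per_km2": [25.0, 44.0],
    "latitude": [-14.2, 4.6],
    "longitude": [-51.9, -74.1],
})


def build_modelling_table(cases, weather, population):
    """Join case counts with weather (per location and date) and with
    population density and coordinates (per location)."""
    merged = cases.merge(weather, on=["location", "date"], how="left")
    merged = merged.merge(population, on="location", how="left")
    return merged


table = build_modelling_table(cases, weather, population)
print(table.head())
```

A left join keeps every case record even when a weather or population row is missing, so gaps show up as missing values that can be inspected during exploratory analysis instead of silently dropping rows.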
Fig.2. Scatter plot of number of cases vs population density

Effect of Weather on the number of cases:

We noticed that some of the countries in our list are tropical, while others are subtropical. As a result, we decided to track the weather trends in these areas and assess their impact on the number of cases.

Tropical locations are those where a given season falls in the same months consistently. The term "subtropical" refers to areas where there is no definite seasonal period.

Countries belonging to tropical regions: Colombia, Brazil, Ecuador, Dominican Republic, El Salvador, Haiti, Guatemala, Panama, Nicaragua.

Countries belonging to subtropical regions: United States, Argentina, Mexico.

As we know, Zika virus is a mosquito-borne disease, and mosquitoes are most active in the rainy season. So, for weather analysis, we considered three main factors that help us understand whether there was any rain: precipitation, humidity, and maximum temperature of the area.

In the case of Argentina, we can see that precipitation, humidity, and temperature have a negative correlation with the number of cases. We take this to indicate that rainy-season weather has minimal effect on the number of cases in such regions.

Analysis of weather with incubation period of Zika virus:

For any case observed, infection happens 7 to 14 days earlier, so we have to observe the weather trend in the days before the case. The incubation period of Zika virus is generally 3-14 days. In our analysis, we assumed an incubation period of 7 days and examined the weather over those 7 days.

Case 1: When the number of cases is 0, the weather conditions 7 days prior are:
Regression models:

For regression, the following baseline models were built and tested.
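
The individual baseline models are not named in the text at this point. Purely as an illustration of the build-and-test step, the sketch below fits three common scikit-learn regressors on synthetic data and scores each with mean squared error (MSE) and R2, the two metrics reported in this paper. The choice of `LinearRegression`, `Ridge` and `Lasso`, the synthetic features, and the 75/25 split are all assumptions made for the example, not the authors' configuration.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the regression dataset (assumption: four numeric
# features such as weather and population density, one numeric target).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 50 + 10 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical baseline set; default or near-default settings, no tuning.
baselines = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
}

results = {}
for name, model in baselines.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "mse": mean_squared_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    }

for name, scores in results.items():
    print(f"{name}: MSE={scores['mse']:.2f}, R2={scores['r2']:.4f}")
```

Scoring every baseline on the same held-out split gives a like-for-like starting point against which later tuning can be compared.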
After hyperparameter tuning of these models:

TABLE III. PERFORMANCE EVALUATION AFTER TUNING

Model    MSE          R2 score    % increase
Lasso    546999.53    0.0067      7.46

forecast the possibility of an outbreak based on the number of cases.

We may also analyze the pattern of new instances to see if they will rise or fall.
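
The hyperparameter tuning step summarised in TABLE III could be carried out along the lines of the following sketch, which runs a cross-validated grid search over the regularisation strength of a Lasso model with scikit-learn. The synthetic data, the `alpha` grid, the 5-fold setting and the scoring choice are assumptions for illustration. The search procedure actually used is not described in the text, and the "% increase" column is not reproduced here because its definition is not given.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the regression dataset (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 50 + 10 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical search space for the Lasso regularisation strength.
alpha_grid = [0.001, 0.01, 0.1, 1.0, 10.0]

search = GridSearchCV(
    estimator=Lasso(max_iter=10_000),
    param_grid={"alpha": alpha_grid},
    scoring="neg_mean_squared_error",  # higher is better, so MSE is negated
    cv=5,
)
search.fit(X_train, y_train)

# Evaluate the best model on data that was not used during the search.
best_model = search.best_estimator_
pred = best_model.predict(X_test)
tuned_mse = mean_squared_error(y_test, pred)
tuned_r2 = r2_score(y_test, pred)

print("best alpha:", search.best_params_["alpha"])
print(f"MSE={tuned_mse:.2f}, R2={tuned_r2:.4f}")
```

Keeping a separate test split outside the grid search means the reported MSE and R2 are not biased by the parameter selection itself.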
Fig.13. Map of the countries observed for prediction.

Fig.14. List of locations for prediction

We have to select a location from this list and then make the prediction for the number of cases in that location.

V. CONCLUSION

The main aim of this paper is to make people aware of any epidemic diseases that prevail in their surroundings. Our key emphasis was on studying and analyzing the data we gathered before moving on to building prediction models. With Epidemic Outbreak Prediction, not only the general public but also the government can take the steps required to limit epidemic diseases and prevent their spread. This project is very adaptable, and we can use it to anticipate various diseases. Because the UI is simple to use, individuals of all ages may access and use it.

VI. FUTURE SCOPE

We can expand this project to predict Zika virus cases all across the world. Because weather forecast data is available, we can predict the probability of an epidemic a few more days ahead. Given the success of our proof of concept, we may improve it by adding more criteria such as social media symptomatic data, lifestyle, demographic dynamics, and so on. Using the same method, we can investigate and predict a variety of additional epidemics. We can also seek additional data from government and healthcare institutions that isn't currently available in the public domain.

REFERENCES

[1] Remuzzi, A.; Remuzzi, G., “COVID-19 and Italy: what next?”, Lancet, 2020, 395(10231), pp. 1225–1228.
[2] Darwish, A.; Rahhal, Y.; Jafar, A., “A comparative study on predicting influenza outbreaks using different feature spaces: Application of influenza-like illness data from Early Warning Alert and Response System in Syria”, BMC Res. Notes, 2020, 13, pp. 33.
[3] Scarpino, S.V.; Petri, G., “On the predictability of infectious disease outbreaks”, Nat. Commun., 2019, 10, pp. 898.
[4] Pan, J.-R.; Huang, Z.-Q.; Chen, K., “Evaluation of the effect of varicella outbreak control measures through a discrete time delay SEIR model”, Zhonghua Yu Fang Yi Xue Za Zhi, 2012, 46, pp. 343–347.
[5] Dantas, E.; Tosin, M.; Cunha, A., Jr., “Calibration of a SEIR–SEI epidemic model to describe the Zika virus outbreak in Brazil”, Appl. Math. Comput., 2018, 338, pp. 249–259.
[6] Yin, R.; Tran, V.H.; Zhou, X.; Zheng, J.; Kwoh, C.K., “Predicting antigenic variants of H1N1 influenza virus based on epidemics and pandemics using a stacking model”, PLoS ONE, 2018, 13.
[7] Das Adhikari, N.; Alka, A.; Kushwaha, J.; Nayak, A., “Sentiment Classifier and Analysis for Epidemic Prediction”, 2018, doi: 10.5121/csit.2018.81004.
[8] Buja, A.; Cook, D.; Swayne, D.F., “Interactive High-Dimensional Data Visualization”, Journal of Computational and Graphical Statistics, 1996, 5(1), pp. 78–99.
[9] Taylor, J., “A Comparison of Univariate Time Series Methods for Forecasting Intraday Arrivals at a Call Center”, Management Science, 2008, 54, pp. 253–265, doi: 10.1287/mnsc.1070.0786.
[10] Chenar, S.S.; Deng, Z., “Development of genetic programming-based models for predicting oyster norovirus outbreak risks”, Water Res., 2018, 128, pp. 20–37.
[11] Liang, R.; Lu, Y.; Qu, X.; Su, Q.; Li, C.; Xia, S.; Liu, Y.; Zhang, Q.; Cao, X.; Chen, Q.; et al., “Prediction for global African swine fever outbreaks based on a combination of random forest algorithms and meteorological data”, Transbound. Emerg. Dis., 2020, 67, pp. 935–946.
[12] Raja, D.B.; Mallol, R.; Ting, C.Y.; Kamaludin, F.; Ahmad, R.; Ismail, S.; Jayaraj, V.J.; Sundram, B.M., “Artificial Intelligence Model as Predictor for Dengue Outbreaks”, Malays. J. Public Health Med., 2019, 19, pp. 103–108.